Recognizing ads

Post  theotocopulitos on Thu Dec 13, 2012 4:12 pm

Hi.

I am intrigued by the way you recognize the correct issue using the cover image. How do you do that, how do you define what is similar?

Why am I interested in this, you might ask. Well, for many years now I have preferred c2c scans over noads for archiving purposes, though for reading, noads are of course better. I have often thought about ways to tag the ad pages in c2c scans (in ComicRack) and thus have a "definitive" scanned archive, but doing so for so many comics is terribly tedious. I thought that ad pages might show some graphic features not present in comic pages, and that I could automate the detection to some extent, but I really do not know enough about image feature recognition to tackle that.

But your approach made me think that (1) marking the ads in a few comics, (2) extracting the pages marked as ads to a directory, and (3) writing a script that tries to match other comics' pages against those already known (the pages in the directory) might be a feasible approach...

Thoughts, anyone?

theotocopulitos

Posts : 5
Join date : 2012-12-13

Re: Recognizing ads

Post  ComicTagger on Thu Dec 13, 2012 7:09 pm

The cover matching is done using the "Average Hash" algorithm described here:
https://www.memonic.com/user/aengus/folder/coding/id/1qVeq

Despite being very simple, it's quite effective at determining similarity between images, especially when you have a small set to compare the reference to.
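For anyone who wants to see the idea in code, here is a minimal sketch of an average hash. It is not the code from ImageHasher.py: it assumes the page has already been decoded into a 2D list of grayscale values (0-255), and the `shrink` helper is a hypothetical block-averaging stand-in for the image-library resize a real implementation would normally use.

```python
def shrink(pixels, size=8):
    """Downscale a 2D list of grayscale values to size*size cells.

    Assumption: simple block averaging stands in for a proper image resize.
    Returns a flat list of size*size averaged values, row by row.
    """
    height, width = len(pixels), len(pixels[0])
    small = []
    for row in range(size):
        top = row * height // size
        bottom = max((row + 1) * height // size, top + 1)
        for col in range(size):
            left = col * width // size
            right = max((col + 1) * width // size, left + 1)
            block = [pixels[y][x]
                     for y in range(top, bottom)
                     for x in range(left, right)]
            small.append(sum(block) / len(block))
    return small


def average_hash(pixels, size=8):
    """Return the average hash of a grayscale image as an integer.

    Each of the size*size cells contributes one bit:
    1 if the cell is brighter than the mean of all cells, else 0.
    """
    small = shrink(pixels, size)
    mean = sum(small) / len(small)
    bits = 0
    for value in small:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits
```

Because every bit only records "brighter or darker than the mean", small changes in brightness, contrast or scan quality tend to leave most bits unchanged, which is why two scans of the same cover usually end up with nearly identical hashes.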

I don't know how well it would work to detect advertisement pages, though. You would need a huge database of hashes, and I suspect you would get lots of false positives (i.e. small Hamming distances to unrelated pages) when comparing a page in the comic (the one you want to know if it's an ad) to the reference hash list (which could probably contain tens of thousands of ads).
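To make the comparison step concrete, here is a rough sketch of matching one page hash against a list of known ad hashes. The function names, the dictionary layout and the `max_distance=5` cutoff are all assumptions for illustration, not values taken from ComicTagger; the right threshold would have to be found by experiment.

```python
def hamming_distance(hash_a, hash_b):
    """Count the bit positions in which two integer hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


def closest_known_ad(page_hash, ad_hashes, max_distance=5):
    """Find the known ad whose hash is nearest to page_hash.

    ad_hashes maps a label (e.g. a file name) to an integer hash.
    max_distance is an assumed cutoff, not a tuned value.
    Returns (distance, label), or None if nothing is close enough.
    """
    best = None
    for label, ad_hash in ad_hashes.items():
        distance = hamming_distance(page_hash, ad_hash)
        if distance <= max_distance and (best is None or distance < best[0]):
            best = (distance, label)
    return best
```

The larger the reference list gets, the more likely it is that some unrelated ad falls within the cutoff by chance, which is the false-positive risk mentioned above. A stricter `max_distance` reduces that risk at the cost of missing ads whose scans differ more.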

That said, your idea might be worth trying. Have a look at the ImageHasher.py file in the ComicTagger source, and have at it!


Last edited by ComicTagger on Thu Dec 13, 2012 7:18 pm; edited 1 time in total
ComicTagger
Admin

Posts : 208
Join date : 2012-12-02

Re: Recognizing ads

Post  DP812 on Thu Dec 13, 2012 7:14 pm

I don't even know if this is possible, but if it is, that would be awesome. I've got a bunch of complete runs that I think were Marvel's CD-ROM releases and they have the ads included (and double-page spreads split into separate pages). If this could also be used to remove scanner credit pages, that'd be great, too -- some scanners put their credit pages at the beginning of the file so when I open up CBL in cover view mode, I've got tons of scanner pages instead of the actual covers.

DP812

Posts : 74
Join date : 2012-12-08

Re: Recognizing ads

Post  theotocopulitos on Thu Dec 13, 2012 7:26 pm

ComicTagger wrote:
That said, your idea might be worth trying. Have a look at the ImageHasher.py file in the ComicTagger source, and have at it!

Thanks for the explanation... sounds like a fun project for Christmas time ;) I'll try to find some time for this...

theotocopulitos

Posts : 5
Join date : 2012-12-13
