Recognizing ads

theotocopulitos Thu Dec 13, 2012 4:12 pm

Hi.

I am intrigued by the way you recognize the correct issue using the cover image. How do you do that? How do you define what is similar?

Why am I interested in this, you might ask. Well, for many years now I have preferred c2c scans over noads scans for archiving purposes, but for reading, noads are of course better. I have often thought about ways to tag the ads in c2c scans (in ComicRack) and thus have a "definitive" scanned archive... but doing so for so many comics is terribly tedious. My thought was that ad pages might show some graphic features not present in comic pages, and that detecting them could be automated to some extent, but I really do not know enough about image feature recognition to tackle that...

But your approach made me think that (1) marking the ads in a few comics, (2) extracting the pages marked as ads to a directory, and (3) writing a script that tries to match other comics' pages against those already known (the pages in the directory) might be a feasible approach...

Thoughts, anyone?

**ComicTagger** Thu Dec 13, 2012 7:09 pm

The cover matching is done using the "Average Hash" algorithm described here:
https://www.memonic.com/user/aengus/folder/coding/id/1qVeq

Despite being very simple, it's quite effective at determining similarity between images, especially when you have a small set to compare the reference to.
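For readers unfamiliar with the technique, here is a minimal sketch of an average hash. It is not the code from ImageHasher.py; the function names are hypothetical, and it assumes the image has already been shrunk to an 8x8 grayscale grid (a step normally done with an imaging library). Each pixel contributes one bit: 1 if it is brighter than the grid's mean, 0 otherwise. Similarity is then the number of differing bits (the Hamming distance).

```python
def average_hash(pixels):
    """Return a 64-bit average hash for an 8x8 grid of grayscale values (0-255).

    Assumption: the caller has already downscaled the image to 8x8 grayscale.
    """
    flat = [value for row in pixels for value in row]
    if len(flat) != 64:
        raise ValueError("expected an 8x8 grid of grayscale values")
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        # Shift in one bit per pixel: 1 if brighter than the mean, else 0.
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Count the bit positions where two hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


# Toy example: a grid that is dark on the left and bright on the right,
# and a copy of it with a single pixel changed.
original = [[0] * 4 + [255] * 4 for _ in range(8)]
altered = [row[:] for row in original]
altered[0][0] = 255

distance = hamming_distance(average_hash(original), average_hash(altered))
print(distance)  # 1 bit differs
```

A small distance suggests the two images are likely the same picture; a large one suggests they are not.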

I don't know how well it would work for detecting advertisement pages, though. You would need a huge database of hashes, and I suspect you would get lots of false positives (i.e. small Hamming distances) when comparing a page from the comic (the one you want to classify as an ad or not) against the reference hash list (which could contain tens of thousands of ads).

That said, your idea might be worth trying. Have a look at the ImageHasher.py file in the ComicTagger source, and have at it!
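The matching step discussed in this thread (comparing one page's hash against a list of known ad hashes) might look roughly like the sketch below. Everything here is illustrative: `find_ad_match`, the labels, and the `max_distance` threshold of 5 are assumptions, not anything taken from ComicTagger. The threshold is exactly the knob that trades missed ads against the false positives mentioned above.

```python
def hamming_distance(hash_a, hash_b):
    """Count the bit positions where two hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


def find_ad_match(page_hash, known_ad_hashes, max_distance=5):
    """Return (label, distance) for the closest known ad, or None.

    known_ad_hashes maps a hypothetical label to a 64-bit hash.
    max_distance is an assumed cutoff; a real value would need tuning.
    """
    best = None
    for label, ad_hash in known_ad_hashes.items():
        distance = hamming_distance(page_hash, ad_hash)
        if distance <= max_distance and (best is None or distance < best[1]):
            best = (label, distance)
    return best


# Made-up hashes standing in for pages already marked as ads.
known_ads = {
    "ad_a": 0x0F0F0F0F0F0F0F0F,
    "ad_b": 0xFFFF000000000000,
}

near_match = find_ad_match(0x0F0F0F0F0F0F0F0E, known_ads)
no_match = find_ad_match(0xAAAAAAAAAAAAAAAA, known_ads)
print(near_match)  # ('ad_a', 1)
print(no_match)    # None
```

With tens of thousands of reference hashes, a linear scan like this is still cheap per page, but the chance that some unrelated ad falls within the threshold grows with the size of the list.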

DP812 Thu Dec 13, 2012 7:14 pm

I don't even know if this is possible, but if it is, that would be awesome. I've got a bunch of complete runs that I think were Marvel's CD-ROM releases and they have the ads included (and double-page spreads split into separate pages). If this could also be used to remove scanner credit pages, that'd be great, too -- some scanners put their credit pages at the beginning of the file so when I open up CBL in cover view mode, I've got tons of scanner pages instead of the actual covers.

theotocopulitos Thu Dec 13, 2012 7:26 pm

ComicTagger wrote:
That said, your idea might be worth trying. Have a look at the ImageHasher.py file in the ComicTagger source, and have at it!

Thanks for the explanation... sounds like a fun project for Christmas time... ;)

... I'll try to get some time for this...
