Art... enough to match solely on?


Post  anomander on Fri May 17, 2013 8:44 am

Before I make a feature request, I thought it best to throw the theory around here first.

So CT creates a signature of a cover and, after a number of other checks, downloads a short list of online covers from CV and does a mathematical distance match against them.
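As a rough sketch of the distance match described above: if each cover signature is stored as an integer bit string, the distance between two signatures can be taken as the number of differing bits (Hamming distance). The thread does not say which distance measure ComicTagger actually uses, so that choice, the `best_match` helper and the `max_distance` threshold are all assumptions made for illustration.

```python
def hamming_distance(a, b):
    """Count the bits that differ between two integer hashes."""
    return bin(a ^ b).count("1")


def best_match(local_hash, candidates, max_distance=10):
    """Pick the closest cover from a short list.

    candidates is a list of (issue_id, cover_hash) pairs.
    max_distance is a hypothetical cut-off, not a ComicTagger setting.
    Returns (issue_id, distance), or None if nothing is close enough.
    """
    best = None
    for issue_id, cover_hash in candidates:
        distance = hamming_distance(local_hash, cover_hash)
        if distance <= max_distance and (best is None or distance < best[1]):
            best = (issue_id, distance)
    return best
```

A lower distance means a closer visual match, and a distance of zero means the two hashes are identical.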

This works really well as we all know.

But what if we could sometimes skip the stage that creates the short list first?

Obviously we can't do this all the time, since CV can't be queried in this way, but what if we could share art hashes? (I'm not sure what this is really called, so I'll use "hash" as the name in the interim.)

How would/could this work?...

Well, if I create an art hash of cover X from CV, it will be identical to the hash someone else would make from the same image. So if I could pass someone else some data, such as the CV series ID, issue ID and art hash, they could in theory use this as a fast way to match comics. This is interesting as it is completely file-name agnostic.

Some sanity checking would have to be put in here, as you can't implicitly trust random hashes from Joe Bloggs, but it's a thought.
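To make the idea a little more concrete, here is a minimal sketch of a shared lookup table keyed by art hash. Everything in it is hypothetical: the `SharedHashIndex` class, the reporter names and the `min_reports` rule (only trust a hash once several different people have reported the same series and issue for it) are just one possible form of sanity check, not anything CT does today.

```python
from collections import defaultdict


class SharedHashIndex:
    """Hypothetical index mapping an art hash to a (series_id, issue_id) pair."""

    def __init__(self, min_reports=2):
        # Number of distinct reporters needed before a mapping is trusted.
        self.min_reports = min_reports
        # art_hash -> (series_id, issue_id) -> set of reporter names
        self._reports = defaultdict(lambda: defaultdict(set))

    def add_report(self, reporter, art_hash, series_id, issue_id):
        """Record that a reporter matched this hash to this issue."""
        self._reports[art_hash][(series_id, issue_id)].add(reporter)

    def lookup(self, art_hash):
        """Return the best-supported (series_id, issue_id), or None."""
        candidates = self._reports.get(art_hash)
        if not candidates:
            return None
        key, reporters = max(candidates.items(), key=lambda item: len(item[1]))
        return key if len(reporters) >= self.min_reports else None
```

For example, after two different users report hash `6` as series `100`, issue `5`, `lookup(6)` returns `(100, 5)`; with only one report it returns `None`.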


Where this gets very interesting is if we also allow the sharing of hashes generated from user collections. In theory, only the first person to see the cover of an especially tricky-to-match comic would then need to do the manual work, and from there most of the matching would be automatic for everyone else who is given this hash.

How would we share these hashes? I don't know, but that's surely not an impossible hurdle.

Thoughts?

Re: Art... enough to match solely on?

Post  ComicTagger on Sat May 18, 2013 3:25 pm

The algorithm I use is called an average hash, and I found it here: https://www.memonic.com/user/aengus/folder/coding/id/1qVeq
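For readers who have not seen it, an average hash can be sketched in a few lines. This is a generic illustration of the technique, not ComicTagger's actual code: it assumes the cover has already been shrunk to a small grayscale grid (8x8 is typical, giving a 64-bit hash), and the `average_hash` name and input format are made up for the example.

```python
def average_hash(pixels):
    """Compute an average hash from a small grayscale grid.

    pixels is a list of rows, each a list of brightness values (0-255).
    Each pixel contributes one bit: 1 if brighter than the mean, else 0.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits
```

Because the result depends only on the pixel values, two people hashing the same thumbnail with the same resize step would get the same integer.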

I had some thoughts about this very idea of sharing hashes when I was first exploring the image matching. I didn't pursue it for a few reasons:

1. The image matching processing is pretty cheap (in bandwidth and time), as it only grabs the thumbnails from Comic Vine, so it doesn't take many resources. By contrast, most of the time in the tagging process is spent actually writing out the Zip or RAR file.

2. This kind of hash seems to lend itself to comparing images one-on-one, and I started to get the idea that a broad net of hashes would lead to lots of false positives, and you would end up having to narrow it by other metadata anyway.

Now, if we put some work into this, we might find that we could make this matching more efficient, but in the end I don't know if it would be worth it. I'm open to more discussion though.



Re: Art... enough to match solely on?

Post  anomander on Sun May 19, 2013 4:55 am

You make fine points. My focus right now is the hard-to-match stuff, and ideas on how to make the unmatched percentage as small as possible.

As with the previous (of XX) suggestion I would imagine this as a route of last resort for items that have been right through the matching process and not had a hit.

The problem is I can't think of a way to do a proof of concept that isn't a lot of work.

One thing I will point out is that in the real world, if we were to compare 100 users' complete hash lists of the stuff they have tagged from CBZ, there would be a high percentage of exact matches. I will say that again for impact: not close matches, but mathematically identical hashes. If a user does some manual work to tag an issue, it could be interesting to find a way to stop anyone else from having to.
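One cheap way to test that claim, assuming a few users were willing to export their hash lists, would be to measure how many distinct hashes turn up in more than one list. The `shared_hash_fraction` function and the input format below are hypothetical, just a sketch of the measurement.

```python
from collections import Counter


def shared_hash_fraction(user_hashes):
    """Fraction of distinct hashes that appear in more than one user's list.

    user_hashes maps a user name to an iterable of integer art hashes.
    Only exact (identical) hashes are counted, not near matches.
    """
    counts = Counter()
    for hashes in user_hashes.values():
        counts.update(set(hashes))  # count each user at most once per hash
    if not counts:
        return 0.0
    shared = sum(1 for n in counts.values() if n > 1)
    return shared / len(counts)
```

With two users holding hashes `[1, 2, 3]` and `[2, 3, 4]`, two of the four distinct hashes are shared, so the function returns `0.5`.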