
That Looks Oddly Familiar

Perceptual hashing is a fascinating technique for summarising media files. It has little in common with cryptographic hashes such as SHA-1. Two input files which _look similar_ will end up with _different_ cryptographic hashes yet _similar_ perceptual hashes. And by similar, we mean having most bits set the same way.
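To make "most bits set the same way" concrete, here is a minimal sketch of the Hamming distance between two hashes. It assumes hashes are represented as plain Ruby integers; the `hamming` helper and the sample values are made up for illustration and are not taken from the talk.

```ruby
# Hamming distance: the number of bit positions in which two integers differ.
# XOR leaves a 1 in every position where the inputs disagree, so counting
# the ones in the binary representation gives the distance.
def hamming(a, b)
  (a ^ b).to_s(2).count("1")
end

# Two made-up 8-bit "hashes" which differ in a single bit.
hash_a = 0b1011_0110
hash_b = 0b1011_0010

puts hamming(hash_a, hash_b) # prints 1
```

A small distance suggests that the two inputs look alike, while unrelated inputs should disagree on roughly half of the bits.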

In this talk we'll combine pHash and a BK-tree to efficiently search through metric spaces of perceptual hashes. We will use Ruby to implement a simple command line tool. It will scan our photo library, hash all the pictures, and look for similarities. By the end of the talk, we'll have a complete list of similar and near-duplicate images needlessly occupying space on our hard drive.
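The idea of searching a metric space with a BK-tree can be sketched in a few lines of Ruby. This is an illustration under stated assumptions rather than the code from the talk: hashes are plain integers, the metric is the Hamming distance, and the `BKTree` class with its `add` and `search` methods is hypothetical. Computing the hashes themselves with pHash is left out.

```ruby
# Hamming distance between two integer hashes (assumed representation).
def hamming(a, b)
  (a ^ b).to_s(2).count("1")
end

# A minimal BK-tree. Each node keeps its children in a Hash keyed by
# their distance to that node.
class BKTree
  Node = Struct.new(:value, :children)

  def initialize(&distance)
    @distance = distance
    @root = nil
  end

  def add(value)
    if @root.nil?
      @root = Node.new(value, {})
      return self
    end
    node = @root
    loop do
      d = @distance.call(value, node.value)
      return self if d.zero? # this sketch stores each distinct hash once
      child = node.children[d]
      if child
        node = child # descend along the edge labelled with d
      else
        node.children[d] = Node.new(value, {})
        return self
      end
    end
  end

  # Returns all stored values within `radius` of `query`.
  def search(query, radius)
    results = []
    stack = @root ? [@root] : []
    until stack.empty?
      node = stack.pop
      d = @distance.call(query, node.value)
      results << node.value if d <= radius
      # By the triangle inequality, matches can only live under edges
      # whose label lies within [d - radius, d + radius].
      node.children.each do |edge, child|
        stack << child if (edge - d).abs <= radius
      end
    end
    results
  end
end

tree = BKTree.new { |a, b| hamming(a, b) }
[0b0000, 0b0001, 0b0011, 0b1111].each { |h| tree.add(h) }
p tree.search(0b0000, 1).sort # prints [0, 1]
```

A real tool would need to keep file paths alongside each hash, so that pictures with identical hashes are reported as duplicates instead of being skipped on insertion.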

As seen on [Ruby Berlin](http://www.rug-b.de/events/march-meetup-2019-536).

Jan Stępień

March 07, 2019


Transcript