Perceptual hashing is a fascinating technique for summarising media files. It has little in common with cryptographic hashes such as SHA-1: two input files that _look similar_ will end up with _different_ cryptographic hashes yet _similar_ perceptual hashes. Here, similar means that most bits are set the same way, i.e. the Hamming distance between the two hashes is small.
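To make "most bits set the same way" concrete, here is a minimal sketch that measures the Hamming distance between two 64-bit hashes. The hash values are made up for illustration and are not real pHash output. The method name `hamming_distance` is a hypothetical helper. For contrast, the snippet also prints SHA-1 digests of two inputs that differ by a single character, which come out completely unrelated.

```ruby
require "digest"

# Hypothetical 64-bit perceptual hashes of two similar images
# (invented values for illustration, not produced by pHash).
HASH_A = 0xF0E1_D2C3_B4A5_9687
HASH_B = 0xF0E1_D2C3_B4A5_9686 # differs from HASH_A only in the lowest bit

# Hamming distance: the number of bit positions in which two hashes differ.
# XOR leaves a 1 wherever the bits disagree, so we count those 1s.
def hamming_distance(a, b)
  (a ^ b).to_s(2).count("1")
end

puts hamming_distance(HASH_A, HASH_B) # => 1

# A cryptographic hash behaves very differently: a one-character change
# in the input yields an unrelated digest.
puts Digest::SHA1.hexdigest("photo-1")
puts Digest::SHA1.hexdigest("photo-2")
```

A small distance (relative to the 64 bits) suggests the images look alike, whereas comparing SHA-1 digests can only tell you whether two files are byte-for-byte identical.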
In this talk we'll combine pHash and a BK-tree to efficiently search the metric space of perceptual hashes. We will use Ruby to implement a simple command-line tool that scans our photo library, hashes all the pictures, and looks for similarities. By the end of the talk, we'll have a complete list of similar and near-duplicate images needlessly occupying space on our hard drive.
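A BK-tree speeds up "find every hash within distance _r_" queries by exploiting the triangle inequality of a metric such as Hamming distance. The following is a minimal sketch, not the talk's implementation: the `BKTree` class and its `add`/`search` methods are hypothetical names, and the tiny 4-bit integers stand in for real 64-bit pHash values. Each child edge is labelled with its distance to the parent, and a search only descends into edges whose label lies within `[d - r, d + r]`, where `d` is the query's distance to the current node.

```ruby
# Minimal BK-tree over integer hashes, parameterised by a distance function.
class BKTree
  Node = Struct.new(:value, :children) # children: { edge_distance => Node }

  def initialize(&distance)
    @distance = distance
    @root = nil
  end

  # Inserts a hash; identical hashes (distance 0) are stored only once.
  def add(value)
    if @root.nil?
      @root = Node.new(value, {})
      return self
    end
    node = @root
    loop do
      d = @distance.call(value, node.value)
      return self if d.zero?
      child = node.children[d]
      if child.nil?
        node.children[d] = Node.new(value, {})
        return self
      end
      node = child
    end
  end

  # Returns every stored hash within `radius` of `query`.
  def search(query, radius)
    results = []
    return results if @root.nil?
    stack = [@root]
    until stack.empty?
      node = stack.pop
      d = @distance.call(query, node.value)
      results << node.value if d <= radius
      # Triangle inequality: only subtrees whose edge distance lies in
      # [d - radius, d + radius] can contain matches.
      node.children.each do |edge, child|
        stack << child if (edge - d).abs <= radius
      end
    end
    results
  end
end

hamming = ->(a, b) { (a ^ b).to_s(2).count("1") }
tree = BKTree.new(&hamming)
[0b0000, 0b0001, 0b0011, 0b1111].each { |h| tree.add(h) }

p tree.search(0b0000, 1).sort # => [0, 1]
p tree.search(0b0111, 1).sort # => [3, 15]
```

In a photo-library tool you would likely store each hash alongside the file paths that produced it, so that a match within a small radius points back to the near-duplicate images.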
As presented at [Ruby Berlin](http://www.rug-b.de/events/march-meetup-2019-536).