That Looks Oddly Familiar

Perceptual hashing is a fascinating technique for summarising media files. It has little in common with cryptographic hashes such as SHA-1. Two input files which _look similar_ will have _different_ cryptographic hashes yet _similar_ perceptual hashes. By similar, we mean that most of their bits are set the same way.
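
To make "most bits set the same way" concrete, here is a minimal sketch of the usual similarity measure for such hashes, the Hamming distance. It assumes the hashes are available as plain 64-bit integers. The two values below are made up for illustration and are not the output of any real hashing library, and `hamming_distance` is a hypothetical helper name.

```ruby
# Hamming distance: the number of bit positions in which two hashes differ.
# A small distance means the two hashes (and likely the images) are similar.
def hamming_distance(a, b)
  (a ^ b).to_s(2).count("1")
end

# Two made-up 64-bit values standing in for perceptual hashes.
hash_a = 0xF0F0_F0F0_F0F0_F0F0
hash_b = 0xF0F0_F0F0_F0F0_F0F3 # differs from hash_a in the two lowest bits

puts hamming_distance(hash_a, hash_b) # prints 2
```

A cryptographic hash of two similar-looking files would typically differ in roughly half of its bits, which is why this kind of comparison is only meaningful for perceptual hashes.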

In this talk we'll combine pHash and a BK-tree to efficiently search through metric spaces of perceptual hashes. We will use Ruby to implement a simple command line tool. It will scan our photo library, hash all the pictures, and look for similarities. By the end of the talk, we'll have a complete list of similar and near-duplicate images needlessly occupying space on our hard drive.
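
As a rough illustration of the data structure, the sketch below shows one possible BK-tree in Ruby. It is not the code from the talk: the class name `BKTree`, its methods, and the toy 8-bit hashes are assumptions made for this example, and the hashes are hard-coded integers rather than values computed by pHash. The tree only needs a distance function that satisfies the triangle inequality, which the Hamming distance does.

```ruby
# A minimal BK-tree sketch. Each node stores one value, and its children are
# keyed by their distance to that value.
class BKTree
  Node = Struct.new(:value, :children)

  # The block must return a metric distance between two values.
  def initialize(&distance)
    @distance = distance
    @root = nil
  end

  def add(value)
    if @root.nil?
      @root = Node.new(value, {})
      return self
    end
    node = @root
    loop do
      d = @distance.call(value, node.value)
      return self if d.zero? # identical value is already stored
      child = node.children[d]
      if child.nil?
        node.children[d] = Node.new(value, {})
        return self
      end
      node = child # descend along the edge labelled with distance d
    end
  end

  # Returns all stored values within `radius` of `query`.
  def search(query, radius)
    results = []
    stack = @root ? [@root] : []
    until stack.empty?
      node = stack.pop
      d = @distance.call(query, node.value)
      results << node.value if d <= radius
      # Triangle inequality: only subtrees whose edge label lies within
      # `radius` of d can contain matches, so the rest are skipped.
      node.children.each do |edge, child|
        stack << child if (edge - d).abs <= radius
      end
    end
    results
  end
end

hamming = ->(a, b) { (a ^ b).to_s(2).count("1") }
tree = BKTree.new(&hamming)

# Toy 8-bit "hashes" chosen by hand for the example.
[0b0000_0000, 0b0000_0011, 0b1111_0000, 0b1111_1111].each { |h| tree.add(h) }

p tree.search(0b0000_0001, 2).sort # prints [0, 3]
```

The pruning step in `search` is what makes the lookup cheaper than comparing the query against every stored hash. This sketch stores each distinct hash once, so a real tool would also need to keep a mapping from each hash to the files that produced it.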

As seen on [Ruby Berlin](http://www.rug-b.de/events/march-meetup-2019-536).

Jan Stępień

March 07, 2019