
#JJUG - In Search of the Fastest Hash Algorithm in Java


Slides presented at the JJUG Night Seminar "LTs with a Beer in Hand & Summer Evening Party" (Tokyo; the event listing was a call for attendees).
https://jjug.doorkeeper.jp/events/28182


KOMIYA Atsushi

August 10, 2015

Transcript

  1. In Search of the Fastest Hash Algorithm in Java 2015-08-10 JJUG Night seminar @komiya_atsushi

  2. Who am I? (a.k.a. "Who are you, anyway?")

  3. KOMIYA Atsushi @komiya_atsushi

  4. None
  5. bit.ly/WeLoveSmartNews

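  6. Today's topic: hash algorithms (hash functions)

  7. Hash functions?

  8. Hash functions? "Oh, that's the thing I saw in the JDK source during my Shinken Zemi lessons!!"

  9. Hash functions? They are what you rely on whenever you use java.util.HashMap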

  10. Uses of hash functions • Algorithms / data structures: hash tables, Bloom filters, Count-Min sketch • Machine learning: locality-sensitive hashing, feature hashing • Security: message digests / message authentication codes

  11. Uses of hash functions (same list as slide 10) • Callout: we rely on them all the time, not just in HashMap
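
To make the Bloom filter use case concrete, here is a minimal sketch in plain Java. It is not taken from the slides. The class name `SeededBloomFilter` and the helper `seededHash` are hypothetical, and the hash is an assumption chosen for brevity: it mixes `String.hashCode()` with a seed using the MurmurHash3 32-bit finalizer steps, rather than any of the algorithms benchmarked in the talk. The point is only that one hash algorithm plus several seeds gives the k hash functions a Bloom filter needs.

```java
import java.util.BitSet;

/** Minimal Bloom filter sketch: k hash functions derived from one algorithm via k seeds. */
final class SeededBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int numHashes;

    SeededBloomFilter(int size, int numHashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.numHashes = numHashes;
    }

    /** Hypothetical seeded hash: String.hashCode() mixed with the seed (MurmurHash3 fmix32 steps). */
    static int seededHash(String s, int seed) {
        int h = s.hashCode() ^ (seed * 0x9e3779b9);
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    private int index(String s, int seed) {
        // floorMod keeps the index non-negative even for negative hash values.
        return Math.floorMod(seededHash(s, seed), size);
    }

    /** Sets one bit per seed. */
    void add(String s) {
        for (int seed = 0; seed < numHashes; seed++) {
            bits.set(index(s, seed));
        }
    }

    /** False means definitely absent; true means possibly present. */
    boolean mightContain(String s) {
        for (int seed = 0; seed < numHashes; seed++) {
            if (!bits.get(index(s, seed))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        SeededBloomFilter filter = new SeededBloomFilter(1 << 16, 3);
        filter.add("xxHash");
        System.out.println(filter.mightContain("xxHash"));
    }
}
```

Because every lookup hashes the key k times, the speed of the underlying hash function directly bounds the throughput of the filter.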
  12. Features and performance required of a hash algorithm • Features: can compute a hash value for data of variable length; supplying a seed yields variations of the hash function (the same algorithm given the same data produces a different hash value when the seed differs) • Performance: fast; collisions are unlikely

  13. Features and performance required of a hash algorithm (same list as slide 12) • Callout: fast is justice
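  14. ※ Cryptographic hash functions • Hash functions that can be used to generate message digests and message authentication codes • In addition to the properties of ordinary hash functions, they must have properties such as strong collision resistance and weak collision resistance • Examples: MD5, the SHA-xx series, etc.

  15. ※ Cryptographic hash functions (same list as slide 14) • Callout: cryptographic hash functions are not covered today (because they are slow)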
  16. Hash algorithms covered today

  17. Table from https://github.com/Cyan4973/xxHash

  18. Table from https://github.com/Cyan4973/xxHash • Callout: we cover the three algorithms whose Quality is sufficient

  19. MurmurHash series • Since 2008 • Used in many products for a variety of purposes: Nginx, Hadoop, Cassandra, Solr… (from https://en.wikipedia.org/wiki/MurmurHash#Implementations) • Current version: MurmurHash3 • Can compute hash values of up to 128 bits
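
The 32-bit x86 variant of MurmurHash3 is small enough to sketch in pure Java, which also shows how a seed changes the output for identical input. This is an illustrative sketch, not the code from any of the libraries discussed in the talk (the benchmark itself uses the 128-bit version); the class name `Murmur3Sketch` is hypothetical.

```java
import java.nio.charset.StandardCharsets;

/** Minimal pure-Java sketch of MurmurHash3 (x86, 32-bit) with an explicit seed. */
final class Murmur3Sketch {
    private static final int C1 = 0xcc9e2d51;
    private static final int C2 = 0x1b873593;

    static int hash32(byte[] data, int seed) {
        int h1 = seed;
        int len = data.length;
        int blocks = len / 4;

        // Body: consume the input four bytes at a time, little-endian.
        for (int i = 0; i < blocks; i++) {
            int p = i * 4;
            int k1 = (data[p] & 0xff)
                    | (data[p + 1] & 0xff) << 8
                    | (data[p + 2] & 0xff) << 16
                    | (data[p + 3] & 0xff) << 24;
            h1 ^= mixK1(k1);
            h1 = Integer.rotateLeft(h1, 13);
            h1 = h1 * 5 + 0xe6546b64;
        }

        // Tail: the remaining 1 to 3 bytes, if any.
        int tail = blocks * 4;
        int k1 = 0;
        switch (len & 3) {
            case 3:
                k1 ^= (data[tail + 2] & 0xff) << 16; // fall through
            case 2:
                k1 ^= (data[tail + 1] & 0xff) << 8;  // fall through
            case 1:
                k1 ^= (data[tail] & 0xff);
                h1 ^= mixK1(k1);
                break;
            default:
                break;
        }

        // Finalization: mix in the length, then avalanche the bits.
        h1 ^= len;
        return fmix32(h1);
    }

    private static int mixK1(int k1) {
        k1 *= C1;
        k1 = Integer.rotateLeft(k1, 15);
        k1 *= C2;
        return k1;
    }

    private static int fmix32(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    public static void main(String[] args) {
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.printf("seed 0: %08x%n", hash32(data, 0));
        System.out.printf("seed 1: %08x%n", hash32(data, 1));
    }
}
```

Every step applied to the running state is invertible for a fixed input, so two different seeds always give two different 32-bit results for the same data.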
  20. CityHash • Since 2011 • A hash algorithm made by Google • http://google-opensource.blogspot.jp/2011/04/introducing-cityhash.html • "inspired by (…) MurmurHash" • Can compute hash values of up to 128 bits • The release notes say something along the lines of "faster than MurmurHash3", but… • https://code.google.com/p/cityhash/source/browse/trunk/README
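  21. xxHash • Since 2012 • Another extremely fast algorithm, this time for hashing, from the developer of the extremely fast compression algorithm LZ4 (Yann Collet @ Facebook Paris) • The C implementation is reportedly #1 in speed, ahead of MurmurHash3 and the others • Can compute hash values of up to 64 bits • Does not seem to have much of a track record yet • Adoption may be gradually spreading; for example, Presto replaced MurmurHash3 with xxHash as its hash algorithm • https://github.com/facebook/presto/commit/87cb4f2ba8a57a3edb6e4d5a89658b6a3191b3e7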
  22. Java implementations of the hash algorithms

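  23. Implementing hash algorithms in Java • Many recent hash algorithms are designed with CPU instructions in mind • Java runs on the JVM, so it does not necessarily benefit from that design • Even for the same hash algorithm, speed differs depending on how it is implemented: pure Java implementation; sun.misc.Unsafe implementation (so much for it being a "private API"…); native implementation via JNI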
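
One place where the implementation strategy shows up is how input bytes are loaded into 64-bit words. The sketch below contrasts a byte-by-byte load with a word-level load through `java.nio.ByteBuffer`. It only illustrates the two access patterns and is not how any of the libraries in this talk is written; in particular, it does not use `sun.misc.Unsafe`. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Two ways to load a little-endian 64-bit word from a byte array. */
final class WordReadSketch {

    /** Pure Java, byte by byte: eight loads, shifts, and ORs per word. */
    static long readLongBytewise(byte[] b, int off) {
        long v = 0;
        for (int i = 7; i >= 0; i--) {
            v = (v << 8) | (b[off + i] & 0xffL);
        }
        return v;
    }

    /** Word-level read through a ByteBuffer view (absolute index, little-endian). */
    static long readLongBuffer(byte[] b, int off) {
        return ByteBuffer.wrap(b).order(ByteOrder.LITTLE_ENDIAN).getLong(off);
    }

    public static void main(String[] args) {
        byte[] data = {1, 2, 3, 4, 5, 6, 7, 8};
        System.out.printf("%016x%n", readLongBytewise(data, 0));
        System.out.printf("%016x%n", readLongBuffer(data, 0));
    }
}
```

Both methods return the same value; an Unsafe-based or native implementation aims to make the word-level load as cheap as a single machine instruction.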
  24. Guava • 'com.google.guava:guava:18.0' • Google Core Libraries for Java • Contains all sorts of handy features • Of the hash algorithms covered here, it implements only MurmurHash3 • Both the input and output interfaces are rich
  25. Zero-allocation hashing (OpenHFT) • 'net.openhft:zero-allocation-hashing:0.3' • A product from what appears to be an HFT (high-frequency trading) company • Specializes in hash algorithm implementations • Provides implementations of MurmurHash3, CityHash, and xxHash • Like Google Guava, its interfaces are reasonably rich • Uses the sun.misc.Unsafe API
  26. lz4-java • 'net.jpountz.lz4:lz4:1.3.0' • A Java port of LZ4 • It also comes bundled with a Java implementation of xxHash • Provides pure Java, sun.misc.Unsafe, and native implementations
  27. Input interface comparison

                    Guava     OpenHFT                    lz4-java
      byte[]        ○         ○                          ○
      ByteBuffer    -         ○                          -
      String        ○         ○                          -
      long          ○         ○                          -
      int           ○         ○                          -
      Streaming     ○         -                          ○ (byte[])
      others        Object    Other primitives, array    -

  28. Input interface comparison (same table as slide 27) • Callout: Zero-allocation hashing offers by far the richest input interfaces

  29. Input interface comparison (same table as slide 27)
  30. Output interface comparison

                    Guava     OpenHFT    lz4-java
      byte[]        ○         -          -
      long          ○         ○          ○
      int           ○         -          -
      String        ○         -          -

  31. Output interface comparison (same table as slide 30) • Callout: Guava has the best output interfaces
  32. Benchmark

  33. Input data • byte array: three patterns of 8 bytes, 1024 bytes, and 64K bytes • long (primitive) • String: 64K characters

  34. Hash algorithms • The seed is fixed for every hash algorithm • MurmurHash: 128-bit version • CityHash: 64-bit version • xxHash: 64-bit version
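
The shape of such a measurement can be sketched in plain Java. This is a deliberately naive harness and an assumption on my part, not the benchmark behind the slides (see the bit.ly link on the next slide for that); a dedicated harness such as JMH handles JIT warm-up and dead-code elimination far more rigorously. The hash under test here is FNV-1a (64-bit), used purely as a stand-in so the example stays self-contained, and the class name `HashBenchSketch` is hypothetical.

```java
import java.util.Random;

/** Naive timing sketch for a hash function over the three byte-array sizes. */
final class HashBenchSketch {

    /** Stand-in hash (FNV-1a, 64-bit); swap in the function you want to measure. */
    static long fnv1a64(byte[] data) {
        long h = 0xcbf29ce484222325L;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        return h;
    }

    /** Returns the average nanoseconds per call, measured after a warm-up phase. */
    static double nanosPerCall(byte[] input, int warmup, int iterations) {
        long sink = 0;
        for (int i = 0; i < warmup; i++) {
            sink += fnv1a64(input);
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += fnv1a64(input);
        }
        long elapsed = System.nanoTime() - start;
        if (sink == 42) {
            System.out.print(""); // use the result so the JIT cannot discard the loops
        }
        return (double) elapsed / iterations;
    }

    public static void main(String[] args) {
        int[] sizes = {8, 1024, 64 * 1024};
        Random random = new Random(0); // fixed seed so the input data is reproducible
        for (int size : sizes) {
            byte[] input = new byte[size];
            random.nextBytes(input);
            System.out.printf("%6d bytes: %.1f ns/call%n", size, nanosPerCall(input, 10_000, 10_000));
        }
    }
}
```

Numbers from a loop like this are only indicative; they vary with JVM version, hardware, and warm-up length.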
  35. bit.ly/jjug-2015-hash-bench

  36. Benchmark results (byte array)

  37. Byte array (8 bytes)

  38. Byte array (8 bytes) • Callout: the native implementation seems to carry a fairly large overhead and appears to be poor at processing large numbers of short inputs

  39. Byte array (8 bytes)

  40. Byte array (1024 bytes)

  41. Byte array (64K bytes)

  42. Byte array (64K bytes) • Callout: Unsafe is fast

  43. Byte array (64K bytes) • Callout: native is also fast on large data

  44. Byte array (64K bytes) • Callout: xxHash, CityHash, MurmurHash

  45. Benchmark results (primitive long)

  46. Primitive long

  47. Benchmark results (string)

  48. String

  49. Summary

  50. If you remember only one thing from today • Fastest hash algorithm as of August 2015: xxHash • Fastest Java implementation as of August 2015: OpenHFT's Zero-allocation hashing
  51. In practice, it depends on the case • You want a 128-bit hash value: Guava • 64 bits is enough and you want the fastest MurmurHash: Zero-allocation hashing • You want to compute hash values in a streaming fashion: lz4-java or Guava
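
"Streaming" here means feeding the input in chunks and reading the hash at the end, with the same result as hashing the whole input at once. The sketch below shows that contract using FNV-1a (64-bit) as a stand-in algorithm, because its state is a single `long`. It is an assumption for illustration only and does not reproduce the streaming API of lz4-java or Guava; the class name `StreamingHashSketch` is hypothetical.

```java
import java.nio.charset.StandardCharsets;

/** Streaming hash sketch: update() may be called any number of times before getValue(). */
final class StreamingHashSketch {
    private static final long OFFSET_BASIS = 0xcbf29ce484222325L;
    private static final long PRIME = 0x100000001b3L;

    private long state = OFFSET_BASIS;

    /** Folds len bytes starting at off into the running state (FNV-1a step per byte). */
    StreamingHashSketch update(byte[] buf, int off, int len) {
        for (int i = off; i < off + len; i++) {
            state ^= (buf[i] & 0xff);
            state *= PRIME;
        }
        return this;
    }

    /** Returns the hash of everything fed in so far. */
    long getValue() {
        return state;
    }

    /** One-shot convenience: hashes the whole array in a single call. */
    static long hashOneShot(byte[] data) {
        return new StreamingHashSketch().update(data, 0, data.length).getValue();
    }

    public static void main(String[] args) {
        byte[] data = "Happy hashing!!".getBytes(StandardCharsets.UTF_8);
        long chunked = new StreamingHashSketch()
                .update(data, 0, 5)
                .update(data, 5, data.length - 5)
                .getValue();
        System.out.println(chunked == hashOneShot(data));
    }
}
```

Block-oriented algorithms such as xxHash or MurmurHash3 additionally need an internal buffer for partial blocks, which is why a streaming interface is a separate feature rather than something every library offers.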
  52. When in doubt, start with xxHash and Zero-allocation hashing

  53. Thank you & Happy hashing!!

  54. We're hiring! iOS engineers / Android engineers / web application engineers / productivity engineers / machine learning and natural language processing engineers / growth hack engineers / server-side engineers / ad engineers…