Upgrade to Pro — share decks privately, control downloads, hide ads and more …

#JJUG - Java で最速のハッシュアルゴリズムを求めて

#JJUG - Java で最速のハッシュアルゴリズムを求めて

【東京】【聴講者募集】JJUG ナイト・セミナー 「ビール片手にLT&納涼会」の発表資料です。
https://jjug.doorkeeper.jp/events/28182

KOMIYA Atsushi

August 10, 2015
Tweet

More Decks by KOMIYA Atsushi

Other Decks in Programming

Transcript

  1. ϋογϡؔ਺ͷར༻༻్ • ΞϧΰϦζϜ / σʔλߏ଄ • ϋογϡςʔϒϧ • ϒϧʔϜϑΟϧλ •

    Count-Min sketch • ػցֶश • Locality sensitive hashing • Feature hashing • ηΩϡϦςΟ • ϝοηʔδμΠδΣετ / ϝοηʔδೝূූ߸
  2. ϋογϡؔ਺ͷར༻༻్ • ΞϧΰϦζϜ / σʔλߏ଄ • ϋογϡςʔϒϧ • ϒϧʔϜϑΟϧλ •

    Count-Min sketch • ػցֶश • Locality sensitive hashing • Feature hashing • ηΩϡϦςΟ • ϝοηʔδμΠδΣετ / ϝοηʔδೝূූ߸ )BTI.BQҎ֎Ͱ΋ සൟʹ͓ੈ࿩ʹͳͬͯ·͢
  3. MurmurHash series • 2008 ೥ʙ • ༷ʑͳϓϩμΫτͰ৭ʑͳ༻్Ͱ࢖ΘΕ͍ͯΔ • Nginx, Hadoop,

    Cassandra, Solr… • from https://en.wikipedia.org/wiki/ MurmurHash#Implementations • Current version: MurmurHash3 • ࠷େ 128 bit ͷϋογϡ஋Λܭࢉ͢Δ͜ͱ͕Ͱ͖Δ
  4. CityHash • 2011 ೥ʙ • Google ۘ੡ͷϋογϡΞϧΰϦζϜ • http://google-opensource.blogspot.jp/2011/04/introducing- cityhash.html

    • “inspired by (தུ) MurmurHash” • ࠷େ 128 bit ͷϋογϡ஋Λܭࢉ͢Δ͜ͱ͕Ͱ͖Δ • ϦϦʔεϊʔτʹ͸ʮMurmurHash3 ΑΓ଎͍ʯతͳ͜ͱ͕ॻ͔Ε͍ͯΔ ͕… • https://code.google.com/p/cityhash/source/browse/trunk/README
  5. xxHash • 2012 ೥ʙ • Extremely fast ͳѹॖΞϧΰϦζϜ LZ4 Λ։ൃ͍ͯ͠Δํ

    (Yann Collet @ Facebook Paris) ͷɺ͜Ε·ͨ extremely fast ͳϋογϡΞϧΰϦζϜ • C ࣮૷Ͱ͸ MurmurHash3 ͦͷଞΛ཈͑ͯ #1 ͷ଎౓Β͍͠ • ࠷େ 64 bit ͷϋογϡ஋Λܭࢉ͢Δ͜ͱ͕Ͱ͖Δ • ར༻࣮੷͸·ͩଟ͘ͳ͛͞ • Presto ͷϋογϡΞϧΰϦζϜ͕ MurmurHash3 ͔Β xxHash ʹࠩ͠ ସ͑ΒΕΔͳͲɺ࠾༻͸ঃʑʹ޿͕͖͍ͬͯͯΔʁ • https://github.com/facebook/presto/commit/ 87cb4f2ba8a57a3edb6e4d5a89658b6a3191b3e7
  6. Java ͰͷϋογϡΞϧΰϦζϜ࣮૷ • ࠷ۙͷϋογϡΞϧΰϦζϜ͸ CPU ͷ໋ྩΛҙࣝͯ͠ઃܭ͞Εͨ΋ͷ ͕ଟ͍ • JVM ্Ͱಈ͘

    Java ͸ɺͦͷઃܭͷԸܙΛड͚ΒΕΔͱ͸ݶΒͳ͍ • ಉ͡ϋογϡΞϧΰϦζϜͰ΋ɺ࣮૷ํ๏ʹΑͬͯ଎౓͕ࠩੜ͡Δ • Pure Java ࣮૷ • sun.misc.Unsafe ࣮૷ • Private API ͱ͸ͳΜͩͬͨͷ͔… • JNI ܦ༝ͷ native ࣮૷
  7. Guava • ‘com.google.guava:guava:18.0’ • Google Core Libraries for Java •

    ศརͳػೳ͕͍Ζ͍Ζೖ͍ͬͯΔ • ϋογϡΞϧΰϦζϜͷ࣮૷͸ MurmurHash3 ͷΈ • ೖྗɾग़ྗΠϯλϑΣʔεͱ΋ʹॆ࣮͍ͯ͠Δ
  8. Zero-allocation hashing (OpenHFT) • ‘net.openhft:zero-­‐allocation-­‐hashing:0.3’ • HFT (ߴස౓औҾ) ͳձࣾʁͷϓϩμΫτ •

    ϋογϡΞϧΰϦζϜͷ࣮૷ʹಛԽ • MurmurHash3, CityHash, xxHash ͷ࣮૷͕ఏڙ͞Ε͍ͯΔ • Google guava ͱಉ͡Α͏ʹΠϯλϑΣʔε͕ͦͦ͜͜ॆ࣮͠ ͍ͯΔ • sun.misc.Unsafe API Λར༻͍ͯ͠Δ
  9. lz4-java • ‘net.jpountz.lz4:lz4:1.3.0’ • LZ4 ͷ Java ϙʔςΟϯά • ͚ͩͲɺ΋Εͳ͘

    xxHash ͷ Java ࣮૷΋
 ͍ͭͯ͘Δ • Pure Java / sun.misc.Unsafe / Native Ͱͷ
 ֤࣮૷Λఏڙ͍ͯ͠Δ
  10. ೖྗΠϯλϑΣʔεͷൺֱ (VBWB 0QFO)'5 M[KBWB CZUF<> ̋ ̋ ̋ #ZUF#V⒎FS 

    ̋  4USJOH ̋ ̋  MPOH ̋ ̋  JOU ̋ ̋  4USFBNJOH ̋  ̋ CZUF<> PUIFST 0CKFDU 0UIFSQSJNJUJWFT BSSBZ 
  11. ೖྗΠϯλϑΣʔεͷൺֱ (VBWB 0QFO)'5 M[KBWB CZUF<> ̋ ̋ ̋ #ZUF#V⒎FS 

    ̋  4USJOH ̋ ̋  MPOH ̋ ̋  JOU ̋ ̋  4USFBNJOH ̋  ̋ CZUF<> PUIFST 0CKFDU 0UIFSQSJNJUJWFT BSSBZ  ೖྗ*'ͷ๛෋͞͸ ;FSPBMMPDBUJPO IBTIJOH͕ѹ౗త
  12. ೖྗΠϯλϑΣʔεͷൺֱ (VBWB 0QFO)'5 M[KBWB CZUF<> ̋ ̋ ̋ #ZUF#V⒎FS 

    ̋  4USJOH ̋ ̋  MPOH ̋ ̋  JOU ̋ ̋  4USFBNJOH ̋  ̋ CZUF<> PUIFST 0CKFDU 0UIFSQSJNJUJWFT BSSBZ 
  13. ग़ྗΠϯλϑΣʔεͷൺֱ (VBWB 0QFO)'5 M[KBWB CZUF<> ̋   MPOH ̋

    ̋ ̋ JOU ̋   4USJOH ̋   ग़ྗ*'͸ (VBWB͕ ༏Ε͍ͯΔ
  14. ೖྗσʔλ • byte ഑ྻ • 8 byte, 1024 byte, 64K

    byte ͷ 3 ύλʔϯ • long (ϓϦϛςΟϒ) • String • 64K จࣈ
  15. ࣮ࡍ͸έʔεόΠέʔε • 128 bit ͷϋογϡ஋͕ཉ͍͠ • Guava • 64 bit

    Ͱ͍͍ͷͰɺ࠷଎ͷ MurmurHash ͕ཉ͍͠ • Zero-allocation hashing • ετϦʔϜతʹϋογϡ஋Λܭࢉ͍ͨ͠ • lz4-java or Guava
  16. We’re hiring! iOSΤϯδχΞ / AndroidΤϯδχΞ / WebΞϓϦέʔγϣϯΤϯδχΞ / ϓϩμΫςΟϏςΟΤϯδχΞ /

    ػցֶश / ࣗવݴޠॲཧΤϯδχΞ / άϩʔεϋοΫΤϯδχΞ / αʔόαΠυΤϯδχΞ / ޿ࠂΤϯδχΞ…