Slide 1

Slide 1 text

In Search of the Fastest Hash Algorithm in Java 2015-08-10 JJUG Night seminar @komiya_atsushi

Slide 2

Slide 2 text

Who are you, anyway? (self-introduction)

Slide 3

Slide 3 text

KOMIYA Atsushi @komiya_atsushi

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

bit.ly/WeLoveSmartNews

Slide 6

Slide 6 text

Today's topic: hash algorithms (functions)

Slide 7

Slide 7 text

Hash functions?

Slide 8

Slide 8 text

Hash functions? Oh, this is the thing I saw in the JDK source, just like in the Shinken Zemi ads!!

Slide 9

Slide 9 text

Hash functions? They are what you rely on whenever you use java.util.HashMap.

Slide 10

Slide 10 text

Uses of hash functions
• Algorithms / data structures
  • Hash tables
  • Bloom filters
  • Count-Min sketch
• Machine learning
  • Locality sensitive hashing
  • Feature hashing
• Security
  • Message digests / message authentication codes
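To make one of the data-structure uses above concrete, here is a minimal Bloom filter sketch in Java. The class name `TinyBloomFilter`, its parameters, and the seeded FNV-1a-style mixing function are assumptions made for this illustration and do not come from the talk; a real filter would plug in one of the hash algorithms discussed later in the deck.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Hypothetical minimal Bloom filter; names and hash choice are illustrative only. */
final class TinyBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    TinyBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    /** Seeded FNV-1a-style hash with a final mixing step; a simple stand-in, not a tuned algorithm. */
    static int seededHash(byte[] data, int seed) {
        int h = 0x811c9dc5 ^ seed;   // FNV offset basis perturbed by the seed
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x01000193;         // FNV prime
        }
        h ^= h >>> 16;               // extra mixing so that low bits depend on all input bits
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        return h;
    }

    private int index(byte[] data, int seed) {
        // floorMod keeps the index non-negative even when the hash is negative
        return Math.floorMod(seededHash(data, seed), numBits);
    }

    /** Sets one bit per seed, i.e. per hash-function variation. */
    void add(String item) {
        byte[] data = item.getBytes(StandardCharsets.UTF_8);
        for (int seed = 0; seed < numHashes; seed++) {
            bits.set(index(data, seed));
        }
    }

    /** Returns false only if the item was definitely never added; true may be a false positive. */
    boolean mightContain(String item) {
        byte[] data = item.getBytes(StandardCharsets.UTF_8);
        for (int seed = 0; seed < numHashes; seed++) {
            if (!bits.get(index(data, seed))) {
                return false;
            }
        }
        return true;
    }
}
```

The filter calls the hash function once per seed for every operation, which is one reason hash speed matters well beyond `HashMap`.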

Slide 11

Slide 11 text

Uses of hash functions
• Algorithms / data structures
  • Hash tables
  • Bloom filters
  • Count-Min sketch
• Machine learning
  • Locality sensitive hashing
  • Feature hashing
• Security
  • Message digests / message authentication codes
We rely on them all the time, well beyond HashMap.

Slide 12

Slide 12 text

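Features and performance required of a hash algorithm
• Features
  • Can compute a hash value for variable-length data
  • Supplying a seed yields variations of the hash function
    • Even with the same hash algorithm and the same data, a different seed gives a different hash value
• Performance
  • Fast
  • Unlikely to collide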
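The seed requirement above can be sketched with a pure-Java version of the 32-bit MurmurHash3 variant (x86_32), written from the public-domain reference algorithm. The class name `SeededMurmur3` is hypothetical, and this is not the code benchmarked in the talk, which uses library implementations and the 128-bit variant.

```java
/** Hypothetical pure-Java sketch of MurmurHash3 x86_32 with an explicit seed. */
final class SeededMurmur3 {
    private static final int C1 = 0xcc9e2d51;
    private static final int C2 = 0x1b873593;

    static int hash32(byte[] data, int seed) {
        int h = seed;
        int len = data.length;
        int i = 0;

        // Body: consume the input in 4-byte little-endian blocks.
        for (; i + 4 <= len; i += 4) {
            int k = (data[i] & 0xff)
                  | ((data[i + 1] & 0xff) << 8)
                  | ((data[i + 2] & 0xff) << 16)
                  | ((data[i + 3] & 0xff) << 24);
            k *= C1;
            k = Integer.rotateLeft(k, 15);
            k *= C2;
            h ^= k;
            h = Integer.rotateLeft(h, 13);
            h = h * 5 + 0xe6546b64;
        }

        // Tail: mix in the remaining 1 to 3 bytes (intentional fall-through).
        int k = 0;
        switch (len & 3) {
            case 3: k ^= (data[i + 2] & 0xff) << 16;
            case 2: k ^= (data[i + 1] & 0xff) << 8;
            case 1: k ^= (data[i] & 0xff);
                    k *= C1;
                    k = Integer.rotateLeft(k, 15);
                    k *= C2;
                    h ^= k;
        }

        // Finalization: fold in the length, then avalanche the bits.
        h ^= len;
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }
}
```

Calling `SeededMurmur3.hash32(data, 0)` and `SeededMurmur3.hash32(data, 1)` on the same `data` runs the same algorithm from two different starting states, which is how one algorithm yields a family of hash functions.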

Slide 13

Slide 13 text

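Features and performance required of a hash algorithm
• Features
  • Can compute a hash value for variable-length data
  • Supplying a seed yields variations of the hash function
    • Even with the same hash algorithm and the same data, a different seed gives a different hash value
• Performance
  • Fast
  • Unlikely to collide
Fast is justice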

Slide 14

Slide 14 text

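Note: cryptographic hash functions
• Hash functions that can be used to generate message digests and message authentication codes
• In addition to the properties of ordinary hash functions, they must have properties such as strong collision resistance and weak collision resistance
• Examples: MD5, the SHA-xx family, etc.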
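For contrast with the non-cryptographic algorithms in the rest of the deck, the JDK exposes MD5 and the SHA family through `java.security.MessageDigest`. The sketch below computes a hex digest; the class and method names (`DigestExample`, `hexDigest`) are hypothetical helpers introduced only for this illustration.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper that computes a cryptographic digest with the JDK's MessageDigest. */
final class DigestExample {
    /** Returns the digest of data as a lowercase hex string, e.g. for "MD5" or "SHA-256". */
    static String hexDigest(String algorithm, byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            byte[] digest = md.digest(data);
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Surfaces an unsupported algorithm name as an unchecked exception.
            throw new IllegalArgumentException("Unknown algorithm: " + algorithm, e);
        }
    }
}
```

A SHA-256 digest is 256 bits (64 hex characters), far wider than the 64-bit or 128-bit values of the algorithms compared in the talk.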

Slide 15

Slide 15 text

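Note: cryptographic hash functions
• Hash functions that can be used to generate message digests and message authentication codes
• In addition to the properties of ordinary hash functions, they must have properties such as strong collision resistance and weak collision resistance
• Examples: MD5, the SHA-xx family, etc.
Cryptographic hash functions are not covered today (because they are slow).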

Slide 16

Slide 16 text

Hash algorithms covered today

Slide 17

Slide 17 text

Table from https://github.com/Cyan4973/xxHash

Slide 18

Slide 18 text

Table from https://github.com/Cyan4973/xxHash We cover these three, which have sufficient Quality.

Slide 19

Slide 19 text

MurmurHash series
• Since 2008
• Used in many products for a variety of purposes
  • Nginx, Hadoop, Cassandra, Solr…
  • from https://en.wikipedia.org/wiki/MurmurHash#Implementations
• Current version: MurmurHash3
• Can compute hash values of up to 128 bits

Slide 20

Slide 20 text

CityHash
• Since 2011
• A hash algorithm made by Google
  • http://google-opensource.blogspot.jp/2011/04/introducing-cityhash.html
  • “inspired by (...) MurmurHash”
• Can compute hash values of up to 128 bits
• The release notes say something along the lines of "faster than MurmurHash3", but…
  • https://code.google.com/p/cityhash/source/browse/trunk/README

Slide 21

Slide 21 text

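xxHash
• Since 2012
• Another extremely fast algorithm, this time for hashing, from the developer of the extremely fast compression algorithm LZ4 (Yann Collet @ Facebook Paris)
• The C implementation is reportedly the #1 fastest, ahead of MurmurHash3 and others
• Can compute hash values of up to 64 bits
• Does not seem to have much of a track record yet
  • Adoption may be gradually spreading, e.g. Presto replaced MurmurHash3 with xxHash as its hash algorithm
  • https://github.com/facebook/presto/commit/87cb4f2ba8a57a3edb6e4d5a89658b6a3191b3e7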

Slide 22

Slide 22 text

Java implementations of the hash algorithms

Slide 23

Slide 23 text

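Hash algorithm implementations in Java
• Many recent hash algorithms are designed with CPU instructions in mind
  • Java, running on the JVM, does not necessarily benefit from that design
• Even for the same hash algorithm, speed differs depending on how it is implemented
  • Pure Java implementation
  • sun.misc.Unsafe implementation
    • So much for it being a private API…
  • Native implementation via JNI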
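One way to picture the gap between these implementation styles is how the input bytes are read. The sketch below contrasts assembling a 64-bit word byte by byte, as a pure-Java implementation might, with reading it in a single call. It uses `ByteBuffer` as a supported stand-in for the word-at-a-time reads that `sun.misc.Unsafe.getLong` allows; that substitution and the class name `WordReads` are assumptions for illustration, not how any of the libraries in this talk is actually written.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical comparison of byte-wise and word-wise reads of a little-endian long. */
final class WordReads {
    /** Pure-Java style: assemble a little-endian long one byte at a time. */
    static long readLongByteWise(byte[] data, int offset) {
        long v = 0;
        for (int i = 7; i >= 0; i--) {
            // The highest-addressed byte ends up in the most significant position.
            v = (v << 8) | (data[offset + i] & 0xffL);
        }
        return v;
    }

    /** Word-at-a-time style: a single 8-byte read (stand-in for an Unsafe-based read). */
    static long readLongWordWise(byte[] data, int offset) {
        return ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getLong(offset);
    }
}
```

Both methods return the same value for the same input; they differ only in how many operations are spent per word, which is where implementation strategy can affect throughput.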

Slide 24

Slide 24 text

Guava
• ‘com.google.guava:guava:18.0’
• Google Core Libraries for Java
• Bundles a wide range of useful utilities
• Of the hash algorithms covered here, it implements only MurmurHash3
• Rich input and output interfaces

Slide 25

Slide 25 text

Zero-allocation hashing (OpenHFT)
• ‘net.openhft:zero-allocation-hashing:0.3’
• A product from what appears to be an HFT (high-frequency trading) company
• Specializes in hash algorithm implementations
  • Provides implementations of MurmurHash3, CityHash, and xxHash
• Reasonably rich interfaces, much like Google Guava
• Uses the sun.misc.Unsafe API

Slide 26

Slide 26 text

lz4-java
• ‘net.jpountz.lz4:lz4:1.3.0’
• A Java port of LZ4
• It also ships with a Java implementation of xxHash
• Provides Pure Java, sun.misc.Unsafe, and Native implementations

Slide 27

Slide 27 text

Comparison of input interfaces Guava OpenHFT lz4-java byte[] ○ ○ ○ ByteBuffer ○ String ○ ○ long ○ ○ int ○ ○ Streaming ○ ○ byte[] others Object Other primitives array

Slide 28

Slide 28 text

Comparison of input interfaces Guava OpenHFT lz4-java byte[] ○ ○ ○ ByteBuffer ○ String ○ ○ long ○ ○ int ○ ○ Streaming ○ ○ byte[] others Object Other primitives array Zero-allocation hashing is overwhelmingly the richest in input interfaces.

Slide 29

Slide 29 text

Comparison of input interfaces Guava OpenHFT lz4-java byte[] ○ ○ ○ ByteBuffer ○ String ○ ○ long ○ ○ int ○ ○ Streaming ○ ○ byte[] others Object Other primitives array

Slide 30

Slide 30 text

Comparison of output interfaces Guava OpenHFT lz4-java byte[] ○ long ○ ○ ○ int ○ String ○

Slide 31

Slide 31 text

Comparison of output interfaces Guava OpenHFT lz4-java byte[] ○ long ○ ○ ○ int ○ String ○ Guava has the best output interfaces.

Slide 32

Slide 32 text

Benchmark

Slide 33

Slide 33 text

Input data
• byte arrays
  • Three sizes: 8 bytes, 1024 bytes, 64K bytes
• long (primitive)
• String
  • 64K characters

Slide 34

Slide 34 text

Hash algorithms
• The seed is fixed for every hash algorithm
• MurmurHash
  • 128-bit version
• CityHash
  • 64-bit version
• xxHash
  • 64-bit version

Slide 35

Slide 35 text

bit.ly/jjug-2015-hash-bench

Slide 36

Slide 36 text

Benchmark results (byte array)

Slide 37

Slide 37 text

Byte array (8 bytes)

Slide 38

Slide 38 text

Byte array (8 bytes) Native seems to carry a fairly large overhead and appears to struggle when processing many short inputs.

Slide 39

Slide 39 text

Byte array (8 bytes)

Slide 40

Slide 40 text

Byte array (1024 bytes)

Slide 41

Slide 41 text

Byte array (64K bytes)

Slide 42

Slide 42 text

Byte array (64K bytes) Unsafe is fast

Slide 43

Slide 43 text

Byte array (64K bytes) Native is also fast on large data

Slide 44

Slide 44 text

Byte array (64K bytes) xxHash CityHash MurmurHash

Slide 45

Slide 45 text

Benchmark results (primitive long)

Slide 46

Slide 46 text

Primitive long

Slide 47

Slide 47 text

Benchmark results (string)

Slide 48

Slide 48 text

String

Slide 49

Slide 49 text

Summary

Slide 50

Slide 50 text

If you remember only one thing from today
• The fastest hash algorithm as of August 2015
  • xxHash
• The fastest Java implementation as of August 2015
  • OpenHFT's Zero-allocation hashing

Slide 51

Slide 51 text

In practice, it depends on the case
• You want a 128-bit hash value
  • Guava
• 64 bits is enough and you want the fastest MurmurHash
  • Zero-allocation hashing
• You want to compute hash values in a streaming fashion
  • lz4-java or Guava

Slide 52

Slide 52 text

When in doubt, start with xxHash with Zero-allocation hashing

Slide 53

Slide 53 text

Thank you & Happy hashing!!

Slide 54

Slide 54 text

We’re hiring! iOS engineers / Android engineers / web application engineers / productivity engineers / machine learning and natural language processing engineers / growth hack engineers / server-side engineers / ad engineers…