Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
#JJUG - Java で最速のハッシュアルゴリズムを求めて
Search
KOMIYA Atsushi
August 10, 2015
Programming
7
4.1k
#JJUG - Java で最速のハッシュアルゴリズムを求めて
【東京】【聴講者募集】JJUG ナイト・セミナー 「ビール片手にLT&納涼会」の発表資料です。
https://jjug.doorkeeper.jp/events/28182
KOMIYA Atsushi
August 10, 2015
Tweet
Share
More Decks by KOMIYA Atsushi
See All by KOMIYA Atsushi
#JJUG Java における乱数生成器とのつき合い方
komiya_atsushi
5
5.2k
#JJUG Fork/Join フレームワークを効率的に正しく使いたい
komiya_atsushi
0
470
[#JSUG] SmartNews における container friendly な Spring Boot アプリケーション開発
komiya_atsushi
1
11k
Java のデータ圧縮ライブラリを極める #jjug_ccc #ccc_c7
komiya_atsushi
4
4.8k
#devsumi 自然言語処理・機械学習によるファクトチェック業務の支援
komiya_atsushi
1
4.4k
SmartNews Ads における機械学習の活用とその運用 #mlops
komiya_atsushi
3
19k
GBDT によるクリック率予測を高速化したい #オレシカナイト vol.4
komiya_atsushi
5
1.3k
Maven central repository の artifact をランキングする #渋谷java
komiya_atsushi
0
1.3k
確率的データ構造を Java で扱いたい! #JJUG
komiya_atsushi
6
2.2k
Other Decks in Programming
See All in Programming
Serverless Rust: Your Low-Risk Entry Point to Rust in Production (and the benefits are huge)
lmammino
1
140
Jakarta EE meets AI
ivargrimstad
0
160
Datadog DBMでなにができる? JDDUG Meetup#7
nealle
0
130
sappoRo.R #12 初心者セッション
kosugitti
0
270
Code smarter, not harder - How AI Coding Tools Boost Your Productivity | Angular Meetup Berlin
danielsogl
0
100
pylint custom ruleで始めるレビュー自動化
shogoujiie
0
140
15分で学ぶDuckDBの可愛い使い方 DuckDBの最近の更新
notrogue
1
370
CSS Linter による Baseline サポートの仕組み
ryo_manba
1
140
仕様変更に耐えるための"今の"DRY原則を考える
mkmk884
9
3k
Djangoにおける複数ユーザー種別認証の設計アプローチ@DjangoCongress JP 2025
delhi09
PRO
4
440
Ça bouge du côté des animations CSS !
goetter
2
140
Honoとフロントエンドの 型安全性について
yodaka
7
1.4k
Featured
See All Featured
Code Reviewing Like a Champion
maltzj
521
39k
Intergalactic Javascript Robots from Outer Space
tanoku
270
27k
Visualizing Your Data: Incorporating Mongo into Loggly Infrastructure
mongodb
45
9.4k
Helping Users Find Their Own Way: Creating Modern Search Experiences
danielanewman
29
2.4k
Visualization
eitanlees
146
15k
Statistics for Hackers
jakevdp
797
220k
Put a Button on it: Removing Barriers to Going Fast.
kastner
60
3.7k
We Have a Design System, Now What?
morganepeng
51
7.4k
Mobile First: as difficult as doing things right
swwweet
223
9.4k
Reflections from 52 weeks, 52 projects
jeffersonlam
348
20k
Cheating the UX When There Is Nothing More to Optimize - PixelPioneers
stephaniewalter
280
13k
Designing Dashboards & Data Visualisations in Web Apps
destraynor
231
53k
Transcript
Java Ͱ࠷ͷ ϋογϡΞϧΰϦζϜΛٻΊͯ 2015-08-10 JJUG Night seminar @komiya_atsushi
͓·ͩΕ ʢ͓લ୭Αʁʣ
KOMIYA Atsushi @komiya_atsushi
None
bit.ly/WeLoveSmartNews
ຊͷτϐοΫ: ϋογϡΞϧΰϦζϜ (ؔ)
ϋογϡؔʁ
ϋογϡؔʁ ͋ͬɺ͜ΕਐݚθϛͰ+%,ͷ ιʔεͰΈͨͭͩʂʂ
ϋογϡؔʁ KBWBVUJM)BTI.BQ Λ͏ͱ͖ʹ͓ੈʹ ͳͬͯΔΞϨͰ͢
ϋογϡؔͷར༻༻్ • ΞϧΰϦζϜ / σʔλߏ • ϋογϡςʔϒϧ • ϒϧʔϜϑΟϧλ •
Count-Min sketch • ػցֶश • Locality sensitive hashing • Feature hashing • ηΩϡϦςΟ • ϝοηʔδμΠδΣετ / ϝοηʔδೝূූ߸
ϋογϡؔͷར༻༻్ • ΞϧΰϦζϜ / σʔλߏ • ϋογϡςʔϒϧ • ϒϧʔϜϑΟϧλ •
Count-Min sketch • ػցֶश • Locality sensitive hashing • Feature hashing • ηΩϡϦςΟ • ϝοηʔδμΠδΣετ / ϝοηʔδೝূූ߸ )BTI.BQҎ֎Ͱ සൟʹ͓ੈʹͳͬͯ·͢
ϋογϡΞϧΰϦζϜʹٻΊΔػೳɾੑೳ • ػೳ • Մมͷ͞ͷσʔλʹରͯ͠ɺϋογϡΛܭࢉͰ͖Δ • γʔυΛ༩͑Δ͜ͱͰɺϋογϡؔͷόϦΤʔγϣϯΛ࡞ Δ͜ͱ͕Ͱ͖Δ • ಉ͡ϋογϡΞϧΰϦζϜʹಉ͡σʔλΛ༩͑ͯɺ
γʔυ͕ҟͳΔͳΒϋογϡҟͳΔ • ੑೳ • ͍ • িಥ͠ʹ͍͘
ϋογϡΞϧΰϦζϜʹٻΊΔػೳɾੑೳ • ػೳ • Մมͷ͞ͷσʔλʹରͯ͠ɺϋογϡΛܭࢉͰ͖Δ • γʔυΛ༩͑Δ͜ͱͰɺϋογϡؔͷόϦΤʔγϣϯΛ࡞ Δ͜ͱ͕Ͱ͖Δ • ಉ͡ϋογϡΞϧΰϦζϜʹಉ͡σʔλΛ༩͑ͯɺ
γʔυ͕ҟͳΔͳΒϋογϡҟͳΔ • ੑೳ • ͍ • িಥ͠ʹ͍͘ ͍ʹਖ਼ٛ
※҉߸ֶతϋογϡؔ • ϝοηʔδμΠδΣετϝοηʔδೝূූ ߸ͷੜʹར༻Ͱ͖Δϋογϡؔ • ී௨ͷϋογϡؔͷಛੑʹՃ͑ɺڧিಥ ੑऑিಥੑͳͲͷಛੑΛͭඞཁ͕ ͋Δ • ྫɿMD5
SHA-xx γϦʔζͳͲ
※҉߸ֶతϋογϡؔ • ϝοηʔδμΠδΣετϝοηʔδೝূූ ߸ͷੜʹར༻Ͱ͖Δϋογϡؔ • ී௨ͷϋογϡؔͷಛੑʹՃ͑ɺڧিಥ ੑऑিಥੑͳͲͷಛੑΛͭඞཁ͕ ͋Δ • ྫɿMD5
SHA-xx γϦʔζͳͲ ҉߸ֶతϋογϡؔ ຊऔΓѻ͍·ͤΜ ʢ͍ͷͰʣ
ຊऔΓ্͛Δ ϋογϡΞϧΰϦζϜ
5BCMFGSPNIUUQTHJUIVCDPN$ZBOYY)BTI
5BCMFGSPNIUUQTHJUIVCDPN$ZBOYY)BTI 2VBMJUZ͕ेͳ ͜ͷͭΛऔΓ্͛·͢
MurmurHash series • 2008 ʙ • ༷ʑͳϓϩμΫτͰ৭ʑͳ༻్ͰΘΕ͍ͯΔ • Nginx, Hadoop,
Cassandra, Solr… • from https://en.wikipedia.org/wiki/ MurmurHash#Implementations • Current version: MurmurHash3 • ࠷େ 128 bit ͷϋογϡΛܭࢉ͢Δ͜ͱ͕Ͱ͖Δ
CityHash • 2011 ʙ • Google ۘͷϋογϡΞϧΰϦζϜ • http://google-opensource.blogspot.jp/2011/04/introducing- cityhash.html
• “inspired by (தུ) MurmurHash” • ࠷େ 128 bit ͷϋογϡΛܭࢉ͢Δ͜ͱ͕Ͱ͖Δ • ϦϦʔεϊʔτʹʮMurmurHash3 ΑΓ͍ʯతͳ͜ͱ͕ॻ͔Ε͍ͯΔ ͕… • https://code.google.com/p/cityhash/source/browse/trunk/README
xxHash • 2012 ʙ • Extremely fast ͳѹॖΞϧΰϦζϜ LZ4 Λ։ൃ͍ͯ͠Δํ
(Yann Collet @ Facebook Paris) ͷɺ͜Ε·ͨ extremely fast ͳϋογϡΞϧΰϦζϜ • C ࣮Ͱ MurmurHash3 ͦͷଞΛ͑ͯ #1 ͷΒ͍͠ • ࠷େ 64 bit ͷϋογϡΛܭࢉ͢Δ͜ͱ͕Ͱ͖Δ • ར༻࣮·ͩଟ͘ͳ͛͞ • Presto ͷϋογϡΞϧΰϦζϜ͕ MurmurHash3 ͔Β xxHash ʹࠩ͠ ସ͑ΒΕΔͳͲɺ࠾༻ঃʑʹ͕͖͍ͬͯͯΔʁ • https://github.com/facebook/presto/commit/ 87cb4f2ba8a57a3edb6e4d5a89658b6a3191b3e7
֤छϋογϡΞϧΰϦζϜͷ Java ࣮
Java ͰͷϋογϡΞϧΰϦζϜ࣮ • ࠷ۙͷϋογϡΞϧΰϦζϜ CPU ͷ໋ྩΛҙࣝͯ͠ઃܭ͞Εͨͷ ͕ଟ͍ • JVM ্Ͱಈ͘
Java ɺͦͷઃܭͷԸܙΛड͚ΒΕΔͱݶΒͳ͍ • ಉ͡ϋογϡΞϧΰϦζϜͰɺ࣮ํ๏ʹΑ͕ͬͯࠩੜ͡Δ • Pure Java ࣮ • sun.misc.Unsafe ࣮ • Private API ͱͳΜͩͬͨͷ͔… • JNI ܦ༝ͷ native ࣮
Guava • ‘com.google.guava:guava:18.0’ • Google Core Libraries for Java •
ศརͳػೳ͕͍Ζ͍Ζೖ͍ͬͯΔ • ϋογϡΞϧΰϦζϜͷ࣮ MurmurHash3 ͷΈ • ೖྗɾग़ྗΠϯλϑΣʔεͱʹॆ࣮͍ͯ͠Δ
Zero-allocation hashing (OpenHFT) • ‘net.openhft:zero-‐allocation-‐hashing:0.3’ • HFT (ߴසऔҾ) ͳձࣾʁͷϓϩμΫτ •
ϋογϡΞϧΰϦζϜͷ࣮ʹಛԽ • MurmurHash3, CityHash, xxHash ͷ࣮͕ఏڙ͞Ε͍ͯΔ • Google guava ͱಉ͡Α͏ʹΠϯλϑΣʔε͕ͦͦ͜͜ॆ࣮͠ ͍ͯΔ • sun.misc.Unsafe API Λར༻͍ͯ͠Δ
lz4-java • ‘net.jpountz.lz4:lz4:1.3.0’ • LZ4 ͷ Java ϙʔςΟϯά • ͚ͩͲɺΕͳ͘
xxHash ͷ Java ࣮ ͍ͭͯ͘Δ • Pure Java / sun.misc.Unsafe / Native Ͱͷ ֤࣮Λఏڙ͍ͯ͠Δ
ೖྗΠϯλϑΣʔεͷൺֱ (VBWB 0QFO)'5 M[KBWB CZUF<> ̋ ̋ ̋ #ZUF#V⒎FS
̋ 4USJOH ̋ ̋ MPOH ̋ ̋ JOU ̋ ̋ 4USFBNJOH ̋ ̋ CZUF<> PUIFST 0CKFDU 0UIFSQSJNJUJWFT BSSBZ
ೖྗΠϯλϑΣʔεͷൺֱ (VBWB 0QFO)'5 M[KBWB CZUF<> ̋ ̋ ̋ #ZUF#V⒎FS
̋ 4USJOH ̋ ̋ MPOH ̋ ̋ JOU ̋ ̋ 4USFBNJOH ̋ ̋ CZUF<> PUIFST 0CKFDU 0UIFSQSJNJUJWFT BSSBZ ೖྗ*'ͷ๛͞ ;FSPBMMPDBUJPO IBTIJOH͕ѹత
ೖྗΠϯλϑΣʔεͷൺֱ (VBWB 0QFO)'5 M[KBWB CZUF<> ̋ ̋ ̋ #ZUF#V⒎FS
̋ 4USJOH ̋ ̋ MPOH ̋ ̋ JOU ̋ ̋ 4USFBNJOH ̋ ̋ CZUF<> PUIFST 0CKFDU 0UIFSQSJNJUJWFT BSSBZ
ग़ྗΠϯλϑΣʔεͷൺֱ (VBWB 0QFO)'5 M[KBWB CZUF<> ̋ MPOH ̋
̋ ̋ JOU ̋ 4USJOH ̋
ग़ྗΠϯλϑΣʔεͷൺֱ (VBWB 0QFO)'5 M[KBWB CZUF<> ̋ MPOH ̋
̋ ̋ JOU ̋ 4USJOH ̋ ग़ྗ*' (VBWB͕ ༏Ε͍ͯΔ
ϕϯνϚʔΫ
ೖྗσʔλ • byte ྻ • 8 byte, 1024 byte, 64K
byte ͷ 3 ύλʔϯ • long (ϓϦϛςΟϒ) • String • 64K จࣈ
ϋογϡΞϧΰϦζϜ • ͍ͣΕͷϋογϡΞϧΰϦζϜγʔυݻఆ • MurmurHash • 128 bit ൛Λར༻ •
CityHash • 64 bit ൛Λར༻ • xxHash • 64 bit ൛Λར༻
bit.ly/jjug-2015-hash-bench
ϕϯνϚʔΫ݁Ռ (byte array)
Byte array (8 bytes)
Byte array (8 bytes) /BUJWFΦʔόʔϔουେ͖Ί ͳͷ͔ɺ͍σʔλΛͨ͘͞Μ ॲཧ͢Δͷ͕ۤखͬΆ͍
Byte array (8 bytes)
Byte array (1024 bytes)
Byte array (64K bytes)
Byte array (64K bytes) 6OTBGF͍
Byte array (64K bytes) /BUJWFେ͖͍σʔλʹର͍ͯ͠
Byte array (64K bytes) YY)BTI$JUZ)BTI .VSNVS)BTI
ϕϯνϚʔΫ݁Ռ (primitive long)
Primitive long
ϕϯνϚʔΫ݁Ռ (string)
String
·ͱΊ
ࠓ͜Ε͚֮ͩ͑ͯؼ͍ͬͯͩ͘͞ • 20158݄࣌Ͱ࠷ͷϋογϡΞϧΰϦζϜ • xxHash • 20158݄࣌Ͱ࠷ͷ Java ࣮ •
OpenHFT ͷ Zero-allocation hashing
࣮ࡍέʔεόΠέʔε • 128 bit ͷϋογϡ͕ཉ͍͠ • Guava • 64 bit
Ͱ͍͍ͷͰɺ࠷ͷ MurmurHash ͕ཉ͍͠ • Zero-allocation hashing • ετϦʔϜతʹϋογϡΛܭࢉ͍ͨ͠ • lz4-java or Guava
ࠔͬͨΒͱΓ͋͑ͣɺ xxHash Zero-allocation hashing
Thank you & Happy hashing!!
We’re hiring! iOSΤϯδχΞ / AndroidΤϯδχΞ / WebΞϓϦέʔγϣϯΤϯδχΞ / ϓϩμΫςΟϏςΟΤϯδχΞ /
ػցֶश / ࣗવݴޠॲཧΤϯδχΞ / άϩʔεϋοΫΤϯδχΞ / αʔόαΠυΤϯδχΞ / ࠂΤϯδχΞ…