Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
#JJUG - Java で最速のハッシュアルゴリズムを求めて
Search
Sponsored
·
Ship Features Fearlessly
Turn features on and off without deploys. Used by thousands of Ruby developers.
→
KOMIYA Atsushi
August 10, 2015
Programming
7
4.2k
#JJUG - Java で最速のハッシュアルゴリズムを求めて
【東京】【聴講者募集】JJUG ナイト・セミナー 「ビール片手にLT&納涼会」の発表資料です。
https://jjug.doorkeeper.jp/events/28182
KOMIYA Atsushi
August 10, 2015
Tweet
Share
More Decks by KOMIYA Atsushi
See All by KOMIYA Atsushi
#JJUG Java における乱数生成器とのつき合い方
komiya_atsushi
5
5.5k
#JJUG Fork/Join フレームワークを効率的に正しく使いたい
komiya_atsushi
0
570
[#JSUG] SmartNews における container friendly な Spring Boot アプリケーション開発
komiya_atsushi
1
11k
Java のデータ圧縮ライブラリを極める #jjug_ccc #ccc_c7
komiya_atsushi
4
5.3k
#devsumi 自然言語処理・機械学習によるファクトチェック業務の支援
komiya_atsushi
1
4.8k
SmartNews Ads における機械学習の活用とその運用 #mlops
komiya_atsushi
3
21k
GBDT によるクリック率予測を高速化したい #オレシカナイト vol.4
komiya_atsushi
5
1.4k
Maven central repository の artifact をランキングする #渋谷java
komiya_atsushi
0
1.5k
確率的データ構造を Java で扱いたい! #JJUG
komiya_atsushi
6
2.4k
Other Decks in Programming
See All in Programming
20260315 AWSなんもわからん🥲
chiilog
2
180
夢の無限スパゲッティ製造機 -実装篇- #phpstudy
o0h
PRO
0
170
Migration to Signals, Signal Forms, Resource API, and NgRx Signal Store @Angular Days 03/2026 Munich
manfredsteyer
PRO
0
190
Understanding Apache Lucene - More than just full-text search
spinscale
0
140
Goの型安全性で実現する複数プロダクトの権限管理
ishikawa_pro
2
1.4k
S3ストレージクラスの「見える」「ある」「使える」は全部違う ─ 体験から見た、仕様の深淵を覗く
ya_ma23
0
1.2k
AI時代の脳疲弊と向き合う ~言語学としてのPHP~
sakuraikotone
1
1.6k
メッセージングを利用して時間的結合を分離しよう #phperkaigi
kajitack
3
510
AI時代のシステム設計:ドメインモデルで変更しやすさを守る設計戦略
masuda220
PRO
6
1.1k
Tamach-sre-3_ANDPAD-shimaison93
mane12yurks38
0
190
OTP を自動で入力する裏技
megabitsenmzq
0
130
安いハードウェアでVulkan
fadis
1
840
Featured
See All Featured
Noah Learner - AI + Me: how we built a GSC Bulk Export data pipeline
techseoconnect
PRO
0
150
The Hidden Cost of Media on the Web [PixelPalooza 2025]
tammyeverts
2
260
JavaScript: Past, Present, and Future - NDC Porto 2020
reverentgeek
52
5.9k
Navigating the Design Leadership Dip - Product Design Week Design Leaders+ Conference 2024
apolaine
0
250
Imperfection Machines: The Place of Print at Facebook
scottboms
269
14k
What's in a price? How to price your products and services
michaelherold
247
13k
Paper Plane
katiecoart
PRO
0
48k
Being A Developer After 40
akosma
91
590k
ReactJS: Keep Simple. Everything can be a component!
pedronauck
666
130k
Avoiding the “Bad Training, Faster” Trap in the Age of AI
tmiket
0
110
What does AI have to do with Human Rights?
axbom
PRO
1
2.1k
Done Done
chrislema
186
16k
Transcript
Java Ͱ࠷ͷ ϋογϡΞϧΰϦζϜΛٻΊͯ 2015-08-10 JJUG Night seminar @komiya_atsushi
͓·ͩΕ ʢ͓લ୭Αʁʣ
KOMIYA Atsushi @komiya_atsushi
None
bit.ly/WeLoveSmartNews
ຊͷτϐοΫ: ϋογϡΞϧΰϦζϜ (ؔ)
ϋογϡؔʁ
ϋογϡؔʁ ͋ͬɺ͜ΕਐݚθϛͰ+%,ͷ ιʔεͰΈͨͭͩʂʂ
ϋογϡؔʁ KBWBVUJM)BTI.BQ Λ͏ͱ͖ʹ͓ੈʹ ͳͬͯΔΞϨͰ͢
ϋογϡؔͷར༻༻్ • ΞϧΰϦζϜ / σʔλߏ • ϋογϡςʔϒϧ • ϒϧʔϜϑΟϧλ •
Count-Min sketch • ػցֶश • Locality sensitive hashing • Feature hashing • ηΩϡϦςΟ • ϝοηʔδμΠδΣετ / ϝοηʔδೝূූ߸
ϋογϡؔͷར༻༻్ • ΞϧΰϦζϜ / σʔλߏ • ϋογϡςʔϒϧ • ϒϧʔϜϑΟϧλ •
Count-Min sketch • ػցֶश • Locality sensitive hashing • Feature hashing • ηΩϡϦςΟ • ϝοηʔδμΠδΣετ / ϝοηʔδೝূූ߸ )BTI.BQҎ֎Ͱ සൟʹ͓ੈʹͳͬͯ·͢
ϋογϡΞϧΰϦζϜʹٻΊΔػೳɾੑೳ • ػೳ • Մมͷ͞ͷσʔλʹରͯ͠ɺϋογϡΛܭࢉͰ͖Δ • γʔυΛ༩͑Δ͜ͱͰɺϋογϡؔͷόϦΤʔγϣϯΛ࡞ Δ͜ͱ͕Ͱ͖Δ • ಉ͡ϋογϡΞϧΰϦζϜʹಉ͡σʔλΛ༩͑ͯɺ
γʔυ͕ҟͳΔͳΒϋογϡҟͳΔ • ੑೳ • ͍ • িಥ͠ʹ͍͘
ϋογϡΞϧΰϦζϜʹٻΊΔػೳɾੑೳ • ػೳ • Մมͷ͞ͷσʔλʹରͯ͠ɺϋογϡΛܭࢉͰ͖Δ • γʔυΛ༩͑Δ͜ͱͰɺϋογϡؔͷόϦΤʔγϣϯΛ࡞ Δ͜ͱ͕Ͱ͖Δ • ಉ͡ϋογϡΞϧΰϦζϜʹಉ͡σʔλΛ༩͑ͯɺ
γʔυ͕ҟͳΔͳΒϋογϡҟͳΔ • ੑೳ • ͍ • িಥ͠ʹ͍͘ ͍ʹਖ਼ٛ
※҉߸ֶతϋογϡؔ • ϝοηʔδμΠδΣετϝοηʔδೝূූ ߸ͷੜʹར༻Ͱ͖Δϋογϡؔ • ී௨ͷϋογϡؔͷಛੑʹՃ͑ɺڧিಥ ੑऑিಥੑͳͲͷಛੑΛͭඞཁ͕ ͋Δ • ྫɿMD5
SHA-xx γϦʔζͳͲ
※҉߸ֶతϋογϡؔ • ϝοηʔδμΠδΣετϝοηʔδೝূූ ߸ͷੜʹར༻Ͱ͖Δϋογϡؔ • ී௨ͷϋογϡؔͷಛੑʹՃ͑ɺڧিಥ ੑऑিಥੑͳͲͷಛੑΛͭඞཁ͕ ͋Δ • ྫɿMD5
SHA-xx γϦʔζͳͲ ҉߸ֶతϋογϡؔ ຊऔΓѻ͍·ͤΜ ʢ͍ͷͰʣ
ຊऔΓ্͛Δ ϋογϡΞϧΰϦζϜ
5BCMFGSPNIUUQTHJUIVCDPN$ZBOYY)BTI
5BCMFGSPNIUUQTHJUIVCDPN$ZBOYY)BTI 2VBMJUZ͕ेͳ ͜ͷͭΛऔΓ্͛·͢
MurmurHash series • 2008 ʙ • ༷ʑͳϓϩμΫτͰ৭ʑͳ༻్ͰΘΕ͍ͯΔ • Nginx, Hadoop,
Cassandra, Solr… • from https://en.wikipedia.org/wiki/ MurmurHash#Implementations • Current version: MurmurHash3 • ࠷େ 128 bit ͷϋογϡΛܭࢉ͢Δ͜ͱ͕Ͱ͖Δ
CityHash • 2011 ʙ • Google ۘͷϋογϡΞϧΰϦζϜ • http://google-opensource.blogspot.jp/2011/04/introducing- cityhash.html
• “inspired by (தུ) MurmurHash” • ࠷େ 128 bit ͷϋογϡΛܭࢉ͢Δ͜ͱ͕Ͱ͖Δ • ϦϦʔεϊʔτʹʮMurmurHash3 ΑΓ͍ʯతͳ͜ͱ͕ॻ͔Ε͍ͯΔ ͕… • https://code.google.com/p/cityhash/source/browse/trunk/README
xxHash • 2012 ʙ • Extremely fast ͳѹॖΞϧΰϦζϜ LZ4 Λ։ൃ͍ͯ͠Δํ
(Yann Collet @ Facebook Paris) ͷɺ͜Ε·ͨ extremely fast ͳϋογϡΞϧΰϦζϜ • C ࣮Ͱ MurmurHash3 ͦͷଞΛ͑ͯ #1 ͷΒ͍͠ • ࠷େ 64 bit ͷϋογϡΛܭࢉ͢Δ͜ͱ͕Ͱ͖Δ • ར༻࣮·ͩଟ͘ͳ͛͞ • Presto ͷϋογϡΞϧΰϦζϜ͕ MurmurHash3 ͔Β xxHash ʹࠩ͠ ସ͑ΒΕΔͳͲɺ࠾༻ঃʑʹ͕͖͍ͬͯͯΔʁ • https://github.com/facebook/presto/commit/ 87cb4f2ba8a57a3edb6e4d5a89658b6a3191b3e7
֤छϋογϡΞϧΰϦζϜͷ Java ࣮
Java ͰͷϋογϡΞϧΰϦζϜ࣮ • ࠷ۙͷϋογϡΞϧΰϦζϜ CPU ͷ໋ྩΛҙࣝͯ͠ઃܭ͞Εͨͷ ͕ଟ͍ • JVM ্Ͱಈ͘
Java ɺͦͷઃܭͷԸܙΛड͚ΒΕΔͱݶΒͳ͍ • ಉ͡ϋογϡΞϧΰϦζϜͰɺ࣮ํ๏ʹΑ͕ͬͯࠩੜ͡Δ • Pure Java ࣮ • sun.misc.Unsafe ࣮ • Private API ͱͳΜͩͬͨͷ͔… • JNI ܦ༝ͷ native ࣮
Guava • ‘com.google.guava:guava:18.0’ • Google Core Libraries for Java •
ศརͳػೳ͕͍Ζ͍Ζೖ͍ͬͯΔ • ϋογϡΞϧΰϦζϜͷ࣮ MurmurHash3 ͷΈ • ೖྗɾग़ྗΠϯλϑΣʔεͱʹॆ࣮͍ͯ͠Δ
Zero-allocation hashing (OpenHFT) • ‘net.openhft:zero-‐allocation-‐hashing:0.3’ • HFT (ߴසऔҾ) ͳձࣾʁͷϓϩμΫτ •
ϋογϡΞϧΰϦζϜͷ࣮ʹಛԽ • MurmurHash3, CityHash, xxHash ͷ࣮͕ఏڙ͞Ε͍ͯΔ • Google guava ͱಉ͡Α͏ʹΠϯλϑΣʔε͕ͦͦ͜͜ॆ࣮͠ ͍ͯΔ • sun.misc.Unsafe API Λར༻͍ͯ͠Δ
lz4-java • ‘net.jpountz.lz4:lz4:1.3.0’ • LZ4 ͷ Java ϙʔςΟϯά • ͚ͩͲɺΕͳ͘
xxHash ͷ Java ࣮ ͍ͭͯ͘Δ • Pure Java / sun.misc.Unsafe / Native Ͱͷ ֤࣮Λఏڙ͍ͯ͠Δ
ೖྗΠϯλϑΣʔεͷൺֱ (VBWB 0QFO)'5 M[KBWB CZUF<> ̋ ̋ ̋ #ZUF#V⒎FS
̋ 4USJOH ̋ ̋ MPOH ̋ ̋ JOU ̋ ̋ 4USFBNJOH ̋ ̋ CZUF<> PUIFST 0CKFDU 0UIFSQSJNJUJWFT BSSBZ
ೖྗΠϯλϑΣʔεͷൺֱ (VBWB 0QFO)'5 M[KBWB CZUF<> ̋ ̋ ̋ #ZUF#V⒎FS
̋ 4USJOH ̋ ̋ MPOH ̋ ̋ JOU ̋ ̋ 4USFBNJOH ̋ ̋ CZUF<> PUIFST 0CKFDU 0UIFSQSJNJUJWFT BSSBZ ೖྗ*'ͷ๛͞ ;FSPBMMPDBUJPO IBTIJOH͕ѹత
ೖྗΠϯλϑΣʔεͷൺֱ (VBWB 0QFO)'5 M[KBWB CZUF<> ̋ ̋ ̋ #ZUF#V⒎FS
̋ 4USJOH ̋ ̋ MPOH ̋ ̋ JOU ̋ ̋ 4USFBNJOH ̋ ̋ CZUF<> PUIFST 0CKFDU 0UIFSQSJNJUJWFT BSSBZ
ग़ྗΠϯλϑΣʔεͷൺֱ (VBWB 0QFO)'5 M[KBWB CZUF<> ̋ MPOH ̋
̋ ̋ JOU ̋ 4USJOH ̋
ग़ྗΠϯλϑΣʔεͷൺֱ (VBWB 0QFO)'5 M[KBWB CZUF<> ̋ MPOH ̋
̋ ̋ JOU ̋ 4USJOH ̋ ग़ྗ*' (VBWB͕ ༏Ε͍ͯΔ
ϕϯνϚʔΫ
ೖྗσʔλ • byte ྻ • 8 byte, 1024 byte, 64K
byte ͷ 3 ύλʔϯ • long (ϓϦϛςΟϒ) • String • 64K จࣈ
ϋογϡΞϧΰϦζϜ • ͍ͣΕͷϋογϡΞϧΰϦζϜγʔυݻఆ • MurmurHash • 128 bit ൛Λར༻ •
CityHash • 64 bit ൛Λར༻ • xxHash • 64 bit ൛Λར༻
bit.ly/jjug-2015-hash-bench
ϕϯνϚʔΫ݁Ռ (byte array)
Byte array (8 bytes)
Byte array (8 bytes) /BUJWFΦʔόʔϔουେ͖Ί ͳͷ͔ɺ͍σʔλΛͨ͘͞Μ ॲཧ͢Δͷ͕ۤखͬΆ͍
Byte array (8 bytes)
Byte array (1024 bytes)
Byte array (64K bytes)
Byte array (64K bytes) 6OTBGF͍
Byte array (64K bytes) /BUJWFେ͖͍σʔλʹର͍ͯ͠
Byte array (64K bytes) YY)BTI$JUZ)BTI .VSNVS)BTI
ϕϯνϚʔΫ݁Ռ (primitive long)
Primitive long
ϕϯνϚʔΫ݁Ռ (string)
String
·ͱΊ
ࠓ͜Ε͚֮ͩ͑ͯؼ͍ͬͯͩ͘͞ • 20158݄࣌Ͱ࠷ͷϋογϡΞϧΰϦζϜ • xxHash • 20158݄࣌Ͱ࠷ͷ Java ࣮ •
OpenHFT ͷ Zero-allocation hashing
࣮ࡍέʔεόΠέʔε • 128 bit ͷϋογϡ͕ཉ͍͠ • Guava • 64 bit
Ͱ͍͍ͷͰɺ࠷ͷ MurmurHash ͕ཉ͍͠ • Zero-allocation hashing • ετϦʔϜతʹϋογϡΛܭࢉ͍ͨ͠ • lz4-java or Guava
ࠔͬͨΒͱΓ͋͑ͣɺ xxHash Zero-allocation hashing
Thank you & Happy hashing!!
We’re hiring! iOSΤϯδχΞ / AndroidΤϯδχΞ / WebΞϓϦέʔγϣϯΤϯδχΞ / ϓϩμΫςΟϏςΟΤϯδχΞ /
ػցֶश / ࣗવݴޠॲཧΤϯδχΞ / άϩʔεϋοΫΤϯδχΞ / αʔόαΠυΤϯδχΞ / ࠂΤϯδχΞ…