I Want to Use Probabilistic Data Structures in Java! #JJUG
KOMIYA Atsushi
August 23, 2017
Slides from the JJUG Night Seminar "Lightning Talks with a Beer in Hand & Summer Party 2017".
https://jjug.doorkeeper.jp/events/63719
Transcript
I Want to Use Probabilistic Data Structures in Java! 2017-08-23 JJUG night seminar LT KOMIYA Atsushi
@komiya_atsushi
Today’s topic
Probabilistic data structures
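What are probabilistic data structures?
• Data structures that exploit probabilistic properties
• Their goal is to solve a given problem efficiently in time and space (≅ with less memory)
• This talk focuses on "space-efficient data structures"
• Depending on the data structure, the answer may be an approximation rather than an exact solution
• Accuracy and space efficiency are in a trade-off relationship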
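When should you use them?
• When you want to process, online, data that is generated in real time and in large volumes
• When you want to process data too large to fit in memory on an underpowered PC
• If you have an environment that supports distributed processing, there is no need to go out of your way to use probabilistic data structures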
Working with probabilistic data structures in Java
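Implement it yourself? Use a library?
• For most probabilistic data structures, the original paper is freely readable and easy to find with a web search
• Reading it and writing your own implementation is a fine option
• On the other hand, several existing implementations are already published on Maven Central
• Standing on the shoulders of giants is the smart approach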
Java implementations of probabilistic data structures
• stream-lib ‘com.addthis:stream-lib’
  • Membership query / cardinality estimation / frequency counting / quantile estimation
• Google Guava ‘com.google.guava:guava’
  • Membership query
• java-hll ‘net.agkn:hll’
  • Cardinality estimation
• t-digest ‘com.tdunning:t-digest’
  • Quantile estimation
Using probabilistic data structures with stream-lib
http://bit.ly/JJUG-2017-08-probds-code
Membership query
Determine whether an element belongs to a set
Prepare a Set<T>, check whether an element is present with Set#contains(T), and add elements to the set with Set#add(T)
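As a baseline before the probabilistic version, the exact approach described above might look like this in plain Java. The class and method names are hypothetical, chosen only for illustration.

```java
import java.util.HashSet;
import java.util.Set;

class ExactMembership {
    // Exact baseline: the set keeps every distinct element, so its heap usage
    // grows with the number (and size) of the stored elements.
    static Set<String> buildSet(String[] words) {
        Set<String> set = new HashSet<>();
        for (String word : words) {
            set.add(word); // Set#add(T) inserts the element
        }
        return set;
    }

    public static void main(String[] args) {
        Set<String> seen = buildSet("lorem ipsum dolor sit amet".split(" "));
        System.out.println(seen.contains("ipsum"));       // true
        System.out.println(seen.contains("consectetur")); // false
    }
}
```

The answers are always exact; the cost is that every element is held in memory.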
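Bloom filter
• Returns a wrong answer (membership result) with some probability
• False positives (mistaking an absent element for a present one) can occur, but false negatives cannot
• Heap usage can be controlled by specifying "the expected number of distinct elements" and "the acceptable false-positive probability"
• Elements can be added, but removing them is difficult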
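For reference, the standard textbook sizing rule for a Bloom filter (not shown in the slides) derives the bit-array size m and the number of hash functions k from the expected number of elements n and the target false-positive probability p:

```latex
m = \left\lceil -\frac{n \ln p}{(\ln 2)^2} \right\rceil,
\qquad
k = \operatorname{round}\!\left(\frac{m}{n}\,\ln 2\right)
```

For example, n = 1000 and p = 0.01 give m = 9586 bits (about 1.2 KB) and k = 7, regardless of how long the elements themselves are.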
Bloom filter in stream-lib
Create a BloomFilter by specifying the number of elements and the false-positive probability, check membership with BloomFilter#isPresent(String), and add elements with add() just as with a Set
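To make the mechanism concrete, here is a minimal from-scratch sketch in plain Java, without stream-lib. The class name `SimpleBloomFilter`, the FNV-1a plus MurmurHash3-finalizer hash and the double-hashing index scheme are assumptions made for illustration and need not match how stream-lib's BloomFilter works internally; only the add/isPresent usage mirrors the slide.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Minimal Bloom filter sketch (hypothetical class, not stream-lib's BloomFilter).
class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;   // m: size of the bit array
    private final int numHashes; // k: number of hash functions

    SimpleBloomFilter(int expectedElements, double falsePositiveRate) {
        if (expectedElements <= 0 || falsePositiveRate <= 0 || falsePositiveRate >= 1) {
            throw new IllegalArgumentException("invalid parameters");
        }
        double ln2 = Math.log(2);
        this.numBits = (int) Math.ceil(-expectedElements * Math.log(falsePositiveRate) / (ln2 * ln2));
        this.numHashes = Math.max(1, (int) Math.round((double) numBits / expectedElements * ln2));
        this.bits = new BitSet(numBits);
    }

    // 64-bit FNV-1a over the UTF-8 bytes, then the MurmurHash3 fmix64 finalizer
    // to spread the bits (an arbitrary choice for this sketch).
    static long hash64(String s) {
        long h = 0xcbf29ce484222325L;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    // Double hashing: the i-th bit index is (h1 + i * h2) mod m.
    private int index(long hash, int i) {
        int h1 = (int) hash;
        int h2 = (int) (hash >>> 32);
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void add(String item) {
        long h = hash64(item);
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(h, i));
        }
    }

    // false means "definitely absent"; true means "probably present".
    boolean isPresent(String item) {
        long h = hash64(item);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(h, i))) {
                return false;
            }
        }
        return true;
    }

    int sizeInBits() {
        return numBits;
    }

    public static void main(String[] args) {
        SimpleBloomFilter filter = new SimpleBloomFilter(1000, 0.01);
        for (String w : "lorem ipsum dolor sit amet".split(" ")) {
            filter.add(w);
        }
        System.out.println(filter.isPresent("ipsum"));       // true (no false negatives)
        System.out.println(filter.isPresent("consectetur")); // false unless it is a false positive
        System.out.println(filter.sizeInBits());             // 9586
    }
}
```

With n = 1000 and p = 0.01 the bit array is 9586 bits no matter how long the stored strings are, which is where the space saving over a Set comes from.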
Checking heap usage
• Measure heap usage with JOL (Java Object Layout), using "Lorem ipsum" text as an example
  • http://openjdk.java.net/projects/code-tools/jol/
• Set: 6,032 bytes
• stream-lib BloomFilter: 136 bytes
  97.8% smaller!
Cardinality estimation
Count the number of distinct elements
Prepare a Set<T>, keep adding elements with Set#add(), and Set#size() gives the number of distinct elements
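The exact baseline described above can be sketched in a few lines of plain Java; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class ExactCardinality {
    // Exact baseline: every distinct element is kept, so memory is
    // proportional to the number of distinct elements.
    static int countDistinct(Iterable<String> items) {
        Set<String> set = new HashSet<>();
        for (String item : items) {
            set.add(item); // duplicates are ignored by the Set
        }
        return set.size(); // Set#size() is the exact distinct count
    }

    public static void main(String[] args) {
        System.out.println(countDistinct(Arrays.asList("a", "b", "a", "c", "b"))); // 3
    }
}
```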
HyperLogLog++ (1/2)
• A data structure for estimating the number of distinct elements
• The estimate can come out either above or below the true number of distinct elements
• Used in Redshift, BigQuery, Presto, and others as a way to approximate COUNT(DISTINCT x)
  • https://aws.amazon.com/jp/about-aws/whats-new/2013/11/11/amazon-redshift-new-performance-data-loading-security-features/
  • https://cloud.google.com/blog/big-data/2017/07/counting-uniques-faster-in-bigquery-with-hyperloglog
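HyperLogLog++ (2/2)
• Heap usage can be controlled by tuning the "estimation precision p"
  • A larger value gives higher accuracy & worse space efficiency
  • Choose p by considering the expected number of distinct elements, the required accuracy, and heap constraints
• To understand how HyperLogLog works, the following blog entry is recommended
  • http://blog.brainpad.co.jp/entry/2016/06/27/110000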
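In the standard HyperLogLog analysis (a textbook result, not taken from the slides), p is the number of hash bits used to choose a register, so the number of registers m and the relative standard error of the estimate behave roughly as follows:

```latex
m = 2^{p}, \qquad \frac{\sigma}{n} \approx \frac{1.04}{\sqrt{m}}
```

For example, p = 14 gives m = 16384 registers and a relative standard error of about 1.04 / 128 ≈ 0.81%. Each extra bit of p doubles the number of registers and shrinks the error by a factor of √2.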
HyperLogLog++ in stream-lib
Create a HyperLogLogPlus() instance by specifying the precision, add elements with HyperLogLogPlus#offer(), and get the number of distinct elements with HyperLogLogPlus#cardinality()
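A minimal from-scratch sketch of the idea in plain Java, assuming nothing from stream-lib. Note that this is plain HyperLogLog, not HyperLogLog++: it omits the bias correction and sparse representation that HLL++ adds. The class name `SimpleHyperLogLog` and the hash function are hypothetical choices; only the offer/cardinality shape mirrors the slide.

```java
import java.nio.charset.StandardCharsets;

// Minimal plain HyperLogLog sketch (hypothetical class, not stream-lib's HyperLogLogPlus).
class SimpleHyperLogLog {
    private final int p;            // precision: hash bits used to pick a register
    private final int m;            // number of registers, 2^p
    private final byte[] registers; // largest "rank" observed per register

    SimpleHyperLogLog(int p) {
        if (p < 4 || p > 18) {
            throw new IllegalArgumentException("p must be in [4, 18]");
        }
        this.p = p;
        this.m = 1 << p;
        this.registers = new byte[m];
    }

    // 64-bit FNV-1a followed by the MurmurHash3 fmix64 finalizer (arbitrary choice).
    static long hash64(String s) {
        long h = 0xcbf29ce484222325L;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    void offer(String item) {
        long x = hash64(item);
        int index = (int) (x >>> (64 - p)); // top p bits choose the register
        long rest = x << p;                 // the remaining 64 - p bits
        // Rank: 1-based position of the leftmost 1-bit in the remaining bits.
        int rank = Math.min(Long.numberOfLeadingZeros(rest) + 1, 64 - p + 1);
        if (rank > registers[index]) {
            registers[index] = (byte) rank;
        }
    }

    long cardinality() {
        double alpha;
        if (m == 16) {
            alpha = 0.673;
        } else if (m == 32) {
            alpha = 0.697;
        } else if (m == 64) {
            alpha = 0.709;
        } else {
            alpha = 0.7213 / (1 + 1.079 / m);
        }
        double sum = 0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.scalb(1.0, -r); // 2^-r
            if (r == 0) {
                zeros++;
            }
        }
        double estimate = alpha * m * m / sum; // harmonic-mean based raw estimate
        if (estimate <= 2.5 * m && zeros > 0) {
            estimate = m * Math.log((double) m / zeros); // small-range correction (linear counting)
        }
        return Math.round(estimate);
    }

    public static void main(String[] args) {
        SimpleHyperLogLog hll = new SimpleHyperLogLog(14);
        for (int i = 0; i < 100_000; i++) {
            hll.offer("user-" + (i % 50_000)); // 50,000 distinct values, each offered twice
        }
        System.out.println(hll.cardinality()); // roughly 50,000
    }
}
```

With p = 14 the state is a fixed 16384-byte array in this sketch, independent of how many elements are offered; duplicates never change it.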
Frequency counting
Count how often each element occurs
Represent a counter per element with a Map, and simply keep incrementing the counter for each element
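The exact Map-based baseline described above, sketched in plain Java (hypothetical names):

```java
import java.util.HashMap;
import java.util.Map;

class ExactFrequency {
    // Exact baseline: one counter per distinct element.
    static Map<String, Long> countAll(String[] items) {
        Map<String, Long> counts = new HashMap<>();
        for (String item : items) {
            counts.merge(item, 1L, Long::sum); // increment, inserting 1 if absent
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = countAll("to be or not to be".split(" "));
        System.out.println(counts.get("to"));  // 2
        System.out.println(counts.get("not")); // 1
    }
}
```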
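Count-min sketch (1/2)
• One of the data structures for estimating element frequencies
• It may return an estimate larger than the actual frequency, but never a smaller one
• The lower an element's frequency, the more strongly this bias affects it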
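Count-min sketch (2/2)
• Two parameters, width and depth, control space efficiency and accuracy
  • width * depth counters are created
  • The counters are represented as a two-dimensional array
  • A hash function is evaluated once per row (depth times), so depth affects time performance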
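The original Count-Min sketch analysis by Cormode and Muthukrishnan (a standard result, not shown in the slides) ties width w and depth d to an error bound. Assuming only non-negative counts are added, with N the total of all added counts, a_x the true count of element x, and â_x its estimate:

```latex
w = \left\lceil \frac{e}{\varepsilon} \right\rceil,\quad
d = \left\lceil \ln \frac{1}{\delta} \right\rceil
\;\Longrightarrow\;
a_x \le \hat{a}_x
\quad\text{and}\quad
\Pr\!\left[\hat{a}_x \le a_x + \varepsilon N\right] \ge 1 - \delta
```

For example, ε = 0.001 and δ = 0.01 give w = 2719 and d = 5. Because the error term εN is absolute, an element with a small true count suffers a proportionally larger overestimate, which is why low-frequency elements are hit hardest by the bias.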
Count-min sketch in stream-lib
Create a Count-Min sketch with width:10 * depth:30 counters, and count with CountMinSketch#add(String, int)
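A minimal from-scratch sketch of a Count-Min sketch in plain Java, not stream-lib's CountMinSketch. The class name, the constructor argument order (width, depth, seed) and the per-row seeded hashing are this sketch's own assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.SplittableRandom;

// Minimal Count-Min sketch (hypothetical class, not stream-lib's CountMinSketch).
class SimpleCountMinSketch {
    private final int width;      // counters per row
    private final int depth;      // rows, i.e. number of hash functions
    private final long[][] table; // depth x width counters (the 2-D array)
    private final long[] seeds;   // one hash seed per row
    private long total;           // sum of all added counts

    SimpleCountMinSketch(int width, int depth, long seed) {
        if (width <= 0 || depth <= 0) {
            throw new IllegalArgumentException("width and depth must be positive");
        }
        this.width = width;
        this.depth = depth;
        this.table = new long[depth][width];
        this.seeds = new long[depth];
        SplittableRandom random = new SplittableRandom(seed);
        for (int row = 0; row < depth; row++) {
            seeds[row] = random.nextLong();
        }
    }

    // 64-bit FNV-1a of the UTF-8 bytes.
    private static long fnv1a64(String s) {
        long h = 0xcbf29ce484222325L;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        return h;
    }

    // MurmurHash3 finalizer, used to derive a per-row hash from the seeded base hash.
    private static long fmix64(long h) {
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    private int column(long baseHash, int row) {
        return (int) Long.remainderUnsigned(fmix64(baseHash ^ seeds[row]), width);
    }

    void add(String item, long count) {
        if (count < 0) {
            throw new IllegalArgumentException("count must be non-negative");
        }
        long h = fnv1a64(item);
        for (int row = 0; row < depth; row++) { // one hash evaluation per row
            table[row][column(h, row)] += count;
        }
        total += count;
    }

    // Minimum over rows: never below the true count, possibly above it.
    long estimateCount(String item) {
        long h = fnv1a64(item);
        long min = Long.MAX_VALUE;
        for (int row = 0; row < depth; row++) {
            min = Math.min(min, table[row][column(h, row)]);
        }
        return min;
    }

    long size() {
        return total;
    }

    public static void main(String[] args) {
        SimpleCountMinSketch cms = new SimpleCountMinSketch(1000, 5, 42L);
        for (String word : "to be or not to be".split(" ")) {
            cms.add(word, 1);
        }
        System.out.println(cms.estimateCount("to")); // at least 2
        System.out.println(cms.size());              // 6
    }
}
```

Taking the minimum across rows is what makes the estimate one-sided: a collision can only add to a counter, never subtract from it.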
Quantile estimation
Compute percentiles
Store the values in a sorted array; after that, looking up the n-th percentile is all it takes
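The exact sort-based baseline, sketched in plain Java. The nearest-rank definition used here is one of several common percentile definitions (others interpolate between neighbouring values), and the names are hypothetical.

```java
import java.util.Arrays;

class ExactPercentile {
    // Nearest-rank definition: the smallest value such that at least q% of the
    // data is <= it.
    static double percentile(double[] values, double q) {
        if (values.length == 0) {
            throw new IllegalArgumentException("values must not be empty");
        }
        if (q < 0 || q > 100) {
            throw new IllegalArgumentException("q must be in [0, 100]");
        }
        double[] sorted = values.clone(); // keep the caller's array untouched
        Arrays.sort(sorted);              // O(n log n) time, O(n) extra memory
        int rank = (int) Math.ceil(q * sorted.length / 100.0); // 1-based rank
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        double[] latenciesMs = {12, 7, 30, 9, 15, 11, 8, 42, 10, 13};
        System.out.println(percentile(latenciesMs, 50)); // 11.0
        System.out.println(percentile(latenciesMs, 90)); // 30.0
    }
}
```

Every value has to be kept, which is exactly the memory cost that t-digest avoids by approximating the distribution.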
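t-digest
• A data structure for estimating the ranks of values in a numeric sequence
• Approximately represents the empirical distribution
• Percentiles are computed by interpolation from this approximate representation of the empirical distribution
• A "compression parameter" adjusts the trade-off between accuracy and space efficiency
  • Increasing the value improves accuracy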
t-digest in stream-lib
Create a TDigest by specifying the compression parameter, add values with TDigest#add(double), and get percentiles with TDigest#quantile(double)
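Summary
• Probabilistic data structures can make large-scale data processing and online processing efficient (maybe)
• If you want an easy way to use probabilistic data structures in Java, start by considering stream-lib
• Tuning the parameters that control the trade-off between estimation accuracy and space efficiency tends to become a craft of its own
• I recommend tuning them while properly measuring actual space and time efficiency with JOL and JMH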
Thank you!