KOMIYA Atsushi
August 23, 2017

# I want to use probabilistic data structures in Java! #JJUG

Slides presented at the JJUG Night Seminar "LT with a Beer in Hand & Summer Social 2017" (「ビール片手にLT&納涼会 2017」).
https://jjug.doorkeeper.jp/events/63719


## Transcript

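5. ### What is a probabilistic data structure?
• A data structure that exploits probabilistic properties
• Its purpose is to solve a problem efficiently in time or in space (≅ with little memory)
• This talk focuses on space-efficient data structures
• Depending on the data structure, you may get an approximate answer rather than an exact one
• Accuracy and space efficiency trade off against each other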

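7. ### When would you use one?
• You want to process, online, data that arrives in real time and in large volumes
• You want to process large data sets that do not fit in memory on an underpowered PC
• If you have an environment that can do distributed processing, there is no need to go out of your way to use probabilistic data structures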

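9. ### Implement it yourself? Use a library?
• For many probabilistic data structures, the paper is freely readable and turns up quickly with a web search
• Reading it and implementing the structure yourself is fine
• On the other hand, Maven Central already hosts a number of existing implementations
• Standing on the shoulders of giants is the smart approach
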
10. ### Java implementations of probabilistic data structures
• stream-lib ‘com.addthis:stream-lib’
  • Membership query / cardinality estimation / frequency counting / quantile estimation
• Google Guava ‘com.google.guava:guava’
  • Membership query
• java-hll ‘net.agkn:hll’
  • Cardinality estimation
• t-digest ‘com.tdunning:t-digest’
  • Quantile estimation

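17. ### Bloom filter
• Returns a membership (present/absent) answer that is wrong with some probability
• False positives (mistaking an absent element for a present one) occur, but false negatives do not
• Heap usage can be controlled by specifying the expected number of distinct elements and the acceptable false-positive probability
• Elements can be added, but deleting them is difficult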
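
The Bloom filter properties above (no false negatives, sizing driven by an expected element count and a target false-positive probability) can be sketched in plain Java. This is a minimal from-scratch illustration, not the stream-lib or Guava implementation: the class name `BloomFilterSketch`, its methods, and the hashing scheme (FNV-1a plus a MurmurHash3-style finalizer, combined by double hashing) are assumptions chosen for the example. The sizing uses the standard formulas m = -n ln(p) / (ln 2)^2 and k = (m / n) ln 2.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Illustrative Bloom filter; a from-scratch sketch, not a library class. */
final class BloomFilterSketch {
    private final BitSet bits;
    private final int numBits;    // m: size of the bit array
    private final int numHashes;  // k: bit positions used per element

    BloomFilterSketch(int expectedElements, double falsePositiveProbability) {
        double ln2 = Math.log(2);
        // Standard sizing: m = -n * ln(p) / (ln 2)^2 and k = (m / n) * ln 2
        this.numBits = Math.max(1, (int) Math.ceil(
                -expectedElements * Math.log(falsePositiveProbability) / (ln2 * ln2)));
        this.numHashes = Math.max(1,
                (int) Math.round((double) numBits / expectedElements * ln2));
        this.bits = new BitSet(numBits);
    }

    int numBits() { return numBits; }
    int numHashes() { return numHashes; }

    void add(String element) {
        long h = hash64(element);
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(h, i));
        }
    }

    boolean mightContain(String element) {
        long h = hash64(element);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(h, i))) {
                return false; // a cleared bit proves the element was never added
            }
        }
        return true; // all bits set: either present or a false positive
    }

    // Double hashing: derive the i-th position from the two 32-bit halves of one hash.
    private int index(long h, int i) {
        int h1 = (int) h;
        int h2 = (int) (h >>> 32);
        return Math.floorMod(h1 + i * h2, numBits);
    }

    // FNV-1a over the UTF-8 bytes, followed by the MurmurHash3 64-bit finalizer.
    private static long hash64(String s) {
        long h = 0xcbf29ce484222325L;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    public static void main(String[] args) {
        BloomFilterSketch filter = new BloomFilterSketch(1000, 0.01);
        filter.add("lorem");
        System.out.println(filter.mightContain("lorem"));
        System.out.println(filter.numBits() + " bits, " + filter.numHashes() + " hashes");
    }
}
```

Because bits are only ever set and never cleared, an added element can never be reported as absent. That same property is why deletion is hard: clearing a bit could remove evidence of other elements that share it.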

20. ### Checking heap usage
• Heap usage was measured with JOL (Java Object Layout), using the “Lorem ipsum” text as an example
• http://openjdk.java.net/projects/code-tools/jol/
• Set: 6,032 bytes
• stream-lib BloomFilter: 136 bytes (97.8% smaller!)

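25. ### HyperLogLog++ (2/2)
• Heap usage can be controlled by tuning the estimate precision p
• A larger value gives higher accuracy but worse space efficiency
• Choose p by weighing the expected cardinality, the required accuracy, and the heap constraints
• To understand how HyperLogLog works, the following blog entry is recommended
• http://blog.brainpad.co.jp/entry/2016/06/27/110000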
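
To make the precision trade-off concrete, the sketch below tabulates the register count and the theoretical relative standard error for a few values of p. It assumes the usual HyperLogLog relationships, m = 2^p registers and a standard error of roughly 1.04 / sqrt(m). The class and method names are hypothetical. Actual heap usage depends on how a given library encodes its registers, so only the register count is shown.

```java
/** Back-of-the-envelope numbers for choosing p; not tied to any specific library. */
final class HyperLogLogPrecision {
    // Number of registers for precision p: m = 2^p.
    static int registerCount(int p) {
        return 1 << p;
    }

    // Theoretical relative standard error of the estimate: about 1.04 / sqrt(m).
    static double relativeStandardError(int p) {
        return 1.04 / Math.sqrt(registerCount(p));
    }

    public static void main(String[] args) {
        for (int p = 10; p <= 16; p += 2) {
            System.out.printf("p=%d registers=%d error=%.4f%n",
                    p, registerCount(p), relativeStandardError(p));
        }
    }
}
```

Each increase of p by 2 quadruples the number of registers and halves the expected error, which is one way to reason about how much heap a given accuracy target costs.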

[...] becomes more susceptible to the influence
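32. ### Count-min sketch (2/2)
• Two parameters, width and depth, control space efficiency and accuracy
• width * depth counters are created
• The counters are represented as a 2-dimensional array
• One hash function is evaluated per level of depth, so depth affects speed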
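
The structure described above, a depth × width array of counters with one hash evaluation per row, can be sketched as follows. This is a minimal illustration written for this transcript and is not stream-lib's `CountMinSketch`: the class name, the per-row seeding, and the hash function are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;

/** Illustrative count-min sketch; a from-scratch example, not a library class. */
final class CountMinSketchDemo {
    private final int depth;          // number of rows, one hash function per row
    private final int width;          // number of counters per row
    private final long[][] counters;  // the depth x width 2-dimensional array
    private final long[] seeds;       // one seed per row, so rows hash independently

    CountMinSketchDemo(int depth, int width, long seed) {
        this.depth = depth;
        this.width = width;
        this.counters = new long[depth][width];
        this.seeds = new long[depth];
        Random random = new Random(seed);
        for (int row = 0; row < depth; row++) {
            seeds[row] = random.nextLong();
        }
    }

    // Adds a non-negative count to one counter in every row.
    void add(String item, long count) {
        for (int row = 0; row < depth; row++) {
            counters[row][bucket(item, row)] += count;
        }
    }

    // Collisions can only inflate a counter, so the minimum across rows is the best estimate.
    long estimateCount(String item) {
        long min = Long.MAX_VALUE;
        for (int row = 0; row < depth; row++) {
            min = Math.min(min, counters[row][bucket(item, row)]);
        }
        return min;
    }

    // Seeded FNV-1a-style hash with a MurmurHash3-style finalizer, reduced to a column index.
    private int bucket(String item, int row) {
        long h = seeds[row];
        for (byte b : item.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        return (int) Math.floorMod(h, (long) width);
    }

    public static void main(String[] args) {
        CountMinSketchDemo sketch = new CountMinSketchDemo(4, 64, 42L);
        sketch.add("lorem", 3);
        sketch.add("ipsum", 1);
        System.out.println(sketch.estimateCount("lorem"));
    }
}
```

With non-negative counts the estimate never falls below the true frequency. A larger width reduces collisions within a row, while a larger depth adds rows and therefore extra hash evaluations on every `add` and `estimateCount` call.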

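38. ### t-digest
• A data structure that estimates quantiles of a sequence of numbers
• Represents the empirical distribution approximately
• Percentile values are computed by interpolation from this approximate representation of the empirical distribution
• A “compression parameter” adjusts the trade-off between accuracy and space efficiency
• A larger value yields higher accuracy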
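
The interpolation step can be illustrated in isolation. The sketch below assumes the distribution has already been summarized as centroids (a mean and a weight each, sorted by mean) and estimates a quantile by linear interpolation between neighboring centroids. It is a deliberately simplified stand-in and not the actual t-digest algorithm: the clustering that builds the centroids, the compression parameter, and the library's handling of the tails are all omitted, and every name here is hypothetical.

```java
/** Simplified quantile interpolation over weighted centroids; not the real t-digest. */
final class CentroidQuantileSketch {
    /**
     * means must be sorted ascending; weights[i] is the number of samples
     * summarized by centroid i (assumed positive); q is in [0, 1].
     */
    static double quantile(double[] means, double[] weights, double q) {
        if (means.length == 0 || means.length != weights.length || q < 0 || q > 1) {
            throw new IllegalArgumentException("invalid input");
        }
        double total = 0;
        for (double w : weights) {
            total += w;
        }
        double target = q * total;   // rank we are looking for
        double cumulative = 0;       // weight of all centroids before the current one
        double prevPos = 0;
        double prevMean = means[0];
        for (int i = 0; i < means.length; i++) {
            // Treat each centroid's mean as sitting at the middle of its weight range.
            double pos = cumulative + weights[i] / 2.0;
            if (target <= pos) {
                if (i == 0) {
                    return means[0]; // below the first centroid: clamp
                }
                double t = (target - prevPos) / (pos - prevPos);
                return prevMean + t * (means[i] - prevMean);
            }
            prevPos = pos;
            prevMean = means[i];
            cumulative += weights[i];
        }
        return means[means.length - 1]; // above the last centroid: clamp
    }

    public static void main(String[] args) {
        double[] means = {10.0, 20.0, 40.0};
        double[] weights = {5.0, 3.0, 2.0};
        System.out.println(quantile(means, weights, 0.5));
    }
}
```

Fewer centroids mean less memory but coarser interpolation, which is the accuracy versus space trade-off that the compression parameter governs in the real data structure.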

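42. ### Summary
• Probabilistic data structures can (perhaps) make large-scale data processing and online processing efficient
• If you want an easy way to use probabilistic data structures in Java, consider stream-lib first
• Tuning the parameters that control the trade-off between estimation accuracy and space efficiency tends to become a craftsman's art
• I recommend tuning while properly measuring the actual space and time efficiency with JOL and JMH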