Mastering Java Data Compression Libraries #jjug_ccc #ccc_c7
by
KOMIYA Atsushi
Slide 1
Slide 1 text
Mastering Java Data Compression Libraries
JJUG CCC 2018 Spring / 2018-05-26
KOMIYA Atsushi
Slide 2
Slide 2 text
@komiya_atsushi / KOMIYA Atsushi
Slide 3
Slide 3 text
No content
Slide 4
Slide 4 text
Data compression basics
Slide 5
Slide 5 text
Lossless, general-purpose data compression algorithms
Slide 6
Slide 6 text
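Lossless vs. lossy
• Whether the original data can be restored from the compressed data without a single bit of difference
• If even 1 bit differs, the algorithm is "lossy"
• Examples of lossless compression algorithms
  • Deflate (ZIP, gzip, PNG), LZW (GIF, compress)
• Examples of lossy compression algorithms
  • JPEG (ISO/IEC 10918-1:1994), MPEG-1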
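The bit-for-bit criterion on this slide can be checked mechanically. The following is a minimal sketch, assuming the JDK's built-in Deflate implementation (`java.util.zip.Deflater` / `Inflater`) as the algorithm under test; the class and method names are illustrative and not from the talk.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class LosslessRoundTrip {
    /** Compresses the input with the JDK's Deflate implementation at the default level. */
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Restores the original bytes; rejects truncated or otherwise invalid input. */
    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported input");
                }
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException(e);
        } finally {
            inflater.end();
        }
    }

    /** True when the round trip reproduces the input exactly, i.e. no bit differs. */
    static boolean isLossless(byte[] original) {
        return Arrays.equals(original, decompress(compress(original)));
    }

    public static void main(String[] args) {
        byte[] data = "to be or not to be, that is the question".getBytes(StandardCharsets.UTF_8);
        System.out.println("lossless: " + isLossless(data));
    }
}
```

A lossy codec such as JPEG would fail this kind of byte-equality check by design, which is why it has to be evaluated with perceptual quality metrics instead.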
Slide 7
Slide 7 text
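General-purpose compression algorithms
• Compression algorithms that can be applied regardless of the kind of data (text, images, audio, and so on)
• Among the earlier examples, Deflate and LZW qualify
• Algorithms specialized for a particular kind of data, such as JPEG / MPEG, generally perform better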
Slide 8
Slide 8 text
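Evaluating compression algorithms
• Compression ratio
  • How much smaller did the data become relative to its original size (in bytes)?
  • The smaller, the better
  • Note that it is sometimes expressed as "size before compression / size after compression"
    • In that case, a larger value means better performance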
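Because the two conventions point in opposite directions, it helps to see them side by side. This is a small illustrative sketch; the method names and the byte counts are hypothetical and not taken from the talk.

```java
class CompressionRatio {
    /** Compressed size as a percentage of the original size; smaller is better. */
    static double percentOfOriginal(long originalBytes, long compressedBytes) {
        return 100.0 * compressedBytes / originalBytes;
    }

    /** Original size divided by compressed size; larger is better. */
    static double ratioFactor(long originalBytes, long compressedBytes) {
        return (double) originalBytes / compressedBytes;
    }

    public static void main(String[] args) {
        long original = 1_000_000;   // hypothetical size before compression
        long compressed = 250_000;   // hypothetical size after compression
        System.out.println("percent of original: " + percentOfOriginal(original, compressed));
        System.out.println("before/after factor: " + ratioFactor(original, compressed));
    }
}
```

With these numbers the same result reads as 25 under the first convention and as 4 under the second, so a benchmark table should always state which one it uses.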
Slide 9
Slide 9 text
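Evaluating compression algorithms
• Processing speed
  • The amount of data that can be compressed or decompressed per unit of time (e.g. MB/s)
  • Depending on the algorithm, the time needed for compression and for decompression can differ greatly (e.g. slow compression but fast decompression)
  • In addition to the compression and decompression time, throughput is sometimes calculated including the time needed to transfer the compressed data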
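The last point, throughput that includes transfer time, can be made concrete with a little arithmetic. The sketch below assumes the three phases run strictly one after another with no pipelining, and that both speeds are measured in MB of uncompressed data per second; all names and numbers are hypothetical.

```java
class EffectiveThroughput {
    /**
     * End-to-end throughput in MB/s of original data, assuming compression,
     * transfer and decompression happen sequentially (no overlap).
     */
    static double endToEndMBps(double originalMB, double compressedMB,
                               double compressMBps, double decompressMBps,
                               double networkMBps) {
        double compressSec = originalMB / compressMBps;     // time to compress
        double transferSec = compressedMB / networkMBps;    // time to send the compressed bytes
        double decompressSec = originalMB / decompressMBps; // time to restore the original
        return originalMB / (compressSec + transferSec + decompressSec);
    }

    public static void main(String[] args) {
        // Hypothetical scenario: 100 MB shrinks to 25 MB and travels over a 12.5 MB/s link.
        double withCompression = endToEndMBps(100, 25, 100, 400, 12.5);
        double withoutCompression = 100 / 12.5;              // raw transfer only, in seconds
        System.out.println("with compression:    " + withCompression + " MB/s");
        System.out.println("without compression: " + (100 / withoutCompression) + " MB/s");
    }
}
```

In this made-up scenario the compressed path takes 1 s + 2 s + 0.25 s = 3.25 s instead of 8 s for the raw transfer, which is why a slower codec can still win on a slow link and lose on a fast one.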
Slide 10
Slide 10 text
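Evaluating compression algorithms
• Relationship between compression ratio and processing speed
  • They are in a trade-off relationship
  • To get a high compression ratio, the algorithm works the computing resources hard on difficult processing → processing speed drops
  • If you demand fast processing, difficult processing is no longer possible → the compression ratio gets worse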
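One way to observe this trade-off without any third-party library is to vary the compression level of the JDK's `java.util.zip.Deflater`. The following is a rough sketch, not a rigorous benchmark: the sample text is synthetic, the timings include JIT warm-up noise, and the class and method names are illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.Deflater;

class LevelTradeOff {
    /** Returns the number of compressed bytes produced at the given Deflate level (1..9). */
    static int compressedSize(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[8192];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    /** Deterministic synthetic text (an assumption; real data will behave differently). */
    static byte[] sampleText() {
        String[] words = {"compression", "ratio", "speed", "java", "library", "stream", "buffer", "data"};
        Random random = new Random(42);
        StringBuilder sb = new StringBuilder();
        while (sb.length() < 200_000) {
            sb.append(words[random.nextInt(words.length)]).append(' ');
        }
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] input = sampleText();
        for (int level : new int[] {1, 6, 9}) {   // fastest, default, best compression
            long start = System.nanoTime();
            int size = compressedSize(input, level);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println("level " + level + ": " + size + " bytes in " + micros + " us");
        }
    }
}
```

For anything beyond a quick look, a harness such as JMH and the data the application actually handles give far more trustworthy numbers.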
Slide 11
Slide 11 text
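Evaluating compression algorithms
• Memory footprint
  • The maximum amount of working memory required for compression and for decompression respectively
  • Strongly influenced by the compression algorithm and its parameters
  • As with processing speed, the memory footprints of compression and decompression can be asymmetric
    • In many cases decompression needs the smaller memory footprint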
Slide 12
Slide 12 text
The recent state of data compression algorithms
Slide 13
Slide 13 text
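Until the 2010s
• Deflate reigned as the de facto standard
  • Achieves a decent compression ratio at a reasonable speed
  • Its memory footprint is only a few hundred KB even when compressing, which is (by today's standards, very) small
• Where performance was required, LZO was also an option
  • However, its GPLv2 license makes it awkward to use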
Slide 14
Slide 14 text
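Recent trends (2010s onward)
• Brute-force optimization of the compression ratio of existing algorithms
  • Zopfli (Deflate), Guetzli (JPEG)
• Specializing in raw processing speed
  • Snappy, LZ4
• Emphasizing the balance between processing speed and compression ratio
  • Zstandard, Brotli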
Slide 15
Slide 15 text
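Recent trends (2010s onward)
• Brute-force optimization of the compression ratio of existing algorithms
  • Zopfli (Deflate), Guetzli (JPEG)
• Specializing in raw processing speed
  • Snappy, LZ4
• Emphasizing the balance between processing speed and compression ratio
  • Zstandard, Brotli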
Slide 16
Slide 16 text
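Snappy
• Developed by Google
• An algorithm based on LZ77
  • Byte-oriented I/O
• Compression and decompression are both fast
  • In exchange, the compression ratio is only moderate and by no means good
• BSD-type license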
Slide 17
Slide 17 text
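LZ4
• Like Snappy, it is LZ77-based and uses byte-oriented I/O
  • Places particular emphasis on decompression speed
  • Compression ratio is about the same as Snappy, or perhaps slightly worse
• Also offers an HC (high compression) option that sacrifices compression time to obtain a higher compression ratio
  • Can reach a compression ratio approaching that of the Deflate algorithm
• BSD license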
Slide 18
Slide 18 text
LZ4 has a very extensive track record of adoption
Slide 19
Slide 19 text
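Zstandard
• An algorithm combining LZ77 with FSE (Finite State Entropy), an implementation of ANS (asymmetric numeral systems)
  • Huffman coding can also be used instead of FSE
• Achieves a compression ratio and processing speed equal to or better than Deflate
  • Decompression is not as fast as Snappy or LZ4, but still faster than Deflate
• Provides a mechanism that builds a dictionary specialized for the data and uses it to improve the compression ratio
• Same developer as LZ4 (currently at Facebook)
• Dual-licensed under BSD / GPLv2
  • It used to be under that notorious Facebook BSD + Patents license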
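The idea behind dictionary compression can be tried out with the JDK alone, because Deflate also accepts a preset dictionary through `Deflater.setDictionary` and `Inflater.setDictionary`. The sketch below uses Deflate purely as a stand-in: Zstandard trains its dictionaries from sample data and exposes its own API, neither of which is shown here. The dictionary content and the sample message are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class PresetDictionaryDemo {
    // Hypothetical dictionary: byte sequences expected to recur in every message.
    static final byte[] DICTIONARY =
        "{\"user_id\":,\"status\":\"active\",\"country\":\"JP\",\"plan\":\"free\"}"
            .getBytes(StandardCharsets.UTF_8);

    /** A small message that shares most of its bytes with the dictionary. */
    static byte[] sampleMessage() {
        return "{\"user_id\":4217,\"status\":\"active\",\"country\":\"JP\",\"plan\":\"free\"}"
            .getBytes(StandardCharsets.UTF_8);
    }

    /** Compresses with Deflate; pass null to compress without a dictionary. */
    static byte[] compress(byte[] input, byte[] dictionary) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        if (dictionary != null) {
            deflater.setDictionary(dictionary);   // must happen before the first deflate() call
        }
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            out.write(buffer, 0, deflater.deflate(buffer));
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Decompresses data of known, non-zero original length, supplying the dictionary on request. */
    static byte[] decompress(byte[] compressed, byte[] dictionary, int originalLength) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            byte[] result = new byte[originalLength];
            int n = inflater.inflate(result);
            if (n == 0 && inflater.needsDictionary()) {
                inflater.setDictionary(dictionary);   // the stream asked for its preset dictionary
                n = inflater.inflate(result);
            }
            return Arrays.copyOf(result, n);
        } catch (DataFormatException e) {
            throw new IllegalArgumentException(e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        byte[] message = sampleMessage();
        System.out.println("original:           " + message.length + " bytes");
        System.out.println("without dictionary: " + compress(message, null).length + " bytes");
        System.out.println("with dictionary:    " + compress(message, DICTIONARY).length + " bytes");
    }
}
```

The benefit is largest for small payloads like this one, where the data is too short to contain much repetition of its own; both sides must of course hold the identical dictionary.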
Slide 20
Slide 20 text
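Brotli
• Developed by Google
• A combination of LZ77 and Huffman coding that uses second-order statistical modeling
• Like Zstandard, achieves performance equal to or better than Deflate
• Adopted as one of the encodings for HTTP compression
• Comes with a mechanism that improves the compression ratio using a predefined dictionary built into Brotli
• MIT license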
Slide 21
Slide 21 text
Brotli: when you access www.google.com with Chrome…
Slide 22
Slide 22 text
Data compression libraries for Java
Slide 23
Slide 23 text
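Elements and characteristics desired in a library
• Implementation of the compression algorithm (JNI binding vs pure Java)
  • Data compression tends to be CPU-intensive processing
  • In terms of performance, a JNI binding to the native library of the reference implementation is desirable
    • Although prebuilt native libraries then have to be provided for each OS / architecture ☹
  • Pure Java implementations tend to diverge from the reference implementation, not only in performance but also in subtle behavioral differences and in keeping up with the evolution of the algorithm itself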
Slide 24
Slide 24 text
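Elements and characteristics desired in a library
• Interfaces provided
  • In addition to the interface offered by the reference implementation's library, it is desirable to provide interfaces that follow InputStream / OutputStream from the java.io package
• Interoperability with bindings for other languages
  • For example, can data compressed with a Java library be decompressed with a Ruby binding?
  • Compression algorithms that, like LZ4, originally had no standard frame format need particular care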
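What "an interface that follows InputStream / OutputStream" buys you is that compression becomes just another stream decorator. The sketch below shows the pattern with the JDK's built-in `GZIPOutputStream` / `GZIPInputStream`, used here only as a stand-in for the stream classes of the third-party libraries; the class name is illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class StreamInterfaceDemo {
    /** Compresses by wrapping an ordinary OutputStream with a compressing one. */
    static byte[] compress(byte[] input) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = new GZIPOutputStream(sink)) {
            out.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();   // complete only after the stream has been closed
    }

    /** Decompresses by wrapping an ordinary InputStream with a decompressing one. */
    static byte[] decompress(byte[] compressed) {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                result.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result.toByteArray();
    }

    public static void main(String[] args) {
        byte[] original = "stream-style compression".getBytes(StandardCharsets.UTF_8);
        byte[] restored = decompress(compress(original));
        System.out.println(new String(restored, StandardCharsets.UTF_8));
    }
}
```

Because gzip defines a standard container around Deflate, the bytes produced here can be read by gzip implementations in other languages. That is the interoperability property the slide asks for, and the reason a standardized frame format matters for the newer algorithms.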
Slide 25
Slide 25 text
Data compression libraries for Java
• Snappy
  • 'org.xerial.snappy:snappy-java'
• LZ4
  • 'org.lz4:lz4-java'
• Zstandard
  • 'com.github.luben:zstd-jni'
• Brotli
  • 'org.meteogroup.jbrotli:jbrotli'
Slide 26
Slide 26 text
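snappy-java
• Implemented as a JNI binding
  • Supports a wide variety of OSes / architectures
• Provides an implementation of BitShuffle, which improves the compression ratio of array data
• Interoperable interfaces
  • SnappyFramedInputStream
  • SnappyFramedOutputStream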
Slide 27
Slide 27 text
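lz4-java
• Provides three implementations: JNI binding, Unsafe API, and pure Java
  • The JNI binding supports the major OSes / architectures
  • Designed so that the fastest available implementation is used
    • Priority order: JNI binding → Unsafe API → pure Java
• Interoperable interfaces
  • LZ4FrameInputStream
  • LZ4FrameOutputStream
• The latest version (1.4.1) has limitations, such as the High compression option not being specifiable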
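The "fastest available implementation" behavior is a fallback chain: try the candidates in priority order and keep the first one that loads. The following is a generic sketch of that pattern, not lz4-java's actual code; the `Codec` interface, the supplier names and the simulated failure are all hypothetical. In lz4-java itself, this selection is, to my knowledge, what `LZ4Factory.fastestInstance()` performs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

class FastestAvailable {
    /** Hypothetical stand-in for a compressor implementation. */
    interface Codec {
        String name();
    }

    /** Returns the first candidate that can be instantiated, in priority order. */
    static Codec firstAvailable(List<Supplier<Codec>> candidatesInPriorityOrder) {
        for (Supplier<Codec> candidate : candidatesInPriorityOrder) {
            try {
                return candidate.get();
            } catch (RuntimeException | LinkageError e) {
                // e.g. UnsatisfiedLinkError when no native library matches this platform;
                // fall through to the next, slower candidate
            }
        }
        throw new IllegalStateException("no implementation available");
    }

    /** Simulates a platform where the native library cannot be loaded. */
    static String demo() {
        Supplier<Codec> jni = () -> {
            throw new UnsatisfiedLinkError("no native library for this platform");
        };
        Supplier<Codec> unsafe = () -> () -> "unsafe";
        Supplier<Codec> pureJava = () -> () -> "pure-java";
        return firstAvailable(Arrays.asList(jni, unsafe, pureJava)).name();
    }

    public static void main(String[] args) {
        System.out.println("selected: " + demo());
    }
}
```

The practical consequence is that the same application code runs everywhere, but its speed can silently differ between platforms depending on which implementation was picked.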
Slide 28
Slide 28 text
zstd-jni
• Implemented as a JNI binding
  • Supports the major OSes / architectures
• Interoperable interfaces
  • ZstdInputStream
  • ZstdOutputStream
• No buffering mechanism is implemented, so usage patterns such as calling ZstdOutputStream#write(int) frequently may become slow because of JNI overhead
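The usual remedy for an unbuffered stream is to wrap it in `java.io.BufferedOutputStream`, so that many small writes reach the expensive layer as a few large ones. The sketch below demonstrates the effect with a hypothetical `CountingOutputStream` that stands in for the JNI-backed stream; it does not use zstd-jni itself. With the real library the wrapping would presumably look like `new BufferedOutputStream(new ZstdOutputStream(out))`.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

class BufferingDemo {
    /** Stand-in for a JNI-backed stream: counts how often the "expensive" layer is entered. */
    static final class CountingOutputStream extends OutputStream {
        int calls;

        @Override
        public void write(int b) {
            calls++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            calls++;
        }
    }

    /** Writes totalBytes one byte at a time and reports how many calls hit the expensive layer. */
    static int callsWhenWritingByteByByte(int totalBytes, boolean buffered) {
        CountingOutputStream expensive = new CountingOutputStream();
        try (OutputStream out = buffered ? new BufferedOutputStream(expensive, 8192) : expensive) {
            for (int i = 0; i < totalBytes; i++) {
                out.write(i);   // the pathological single-byte pattern from the slide
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return expensive.calls;   // read after close(), so the final flush is included
    }

    public static void main(String[] args) {
        System.out.println("unbuffered: " + callsWhenWritingByteByByte(100_000, false) + " calls");
        System.out.println("buffered:   " + callsWhenWritingByteByByte(100_000, true) + " calls");
    }
}
```

With an 8 KiB buffer the expensive layer is entered roughly once per 8 KiB instead of once per byte, which removes most of the per-call overhead.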
Slide 29
Slide 29 text
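jbrotli
• Implemented as a JNI binding
  • The native library is provided separately from the jar file containing the main code
  • A jar file is provided for each OS / architecture, so you only need to add the library matching your development / runtime environment to your dependencies
  • The BrotliLibraryLoader.loadBrotli() incantation is required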
Slide 30
Slide 30 text
jbrotli
• Interoperable interfaces
  • BrotliInputStream
  • BrotliOutputStream
• The latest version (0.5.0) appears to have a bug: data compressed with BrotliOutputStream sometimes cannot be decompressed with BrotliInputStream
Slide 31
Slide 31 text
Criteria for selecting a compression algorithm in Java applications
Slide 32
Slide 32 text
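Which compression algorithm should you choose?
• Which matters more, processing speed or compression ratio?
  • Processing speed first: LZ4, Snappy
  • Compression ratio first: Zstandard (, Brotli)
• Breadth of supported OSes / architectures
  • Deflate (JDK built-in), LZ4 (pure Java)
  • Runner-up: Snappy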
Slide 33
Slide 33 text
Reference: use in Docker containers
• The combination of Alpine Linux and JNI-binding data compression libraries is prone to problems, so take care
  • https://github.com/xerial/snappy-java/issues/181
  • https://github.com/luben/zstd-jni/issues/38
  • https://github.com/luben/zstd-jni/issues/43
  • (each already resolved in the latest version)
Slide 34
Slide 34 text
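Reference: comparison of compression speed and compression ratio
Table columns: Library | Parameter | Compression speed [MB/s] | Compression ratio [%]
Table rows: deflate (3 rows), lz4-java (2 rows, parameters in KB and MB), snappy-java (1 row, parameter in KB), zstd-jni (3 rows); the parameter and measurement values are missing from this transcript
Evaluation data: the leading portion of the enwik file from the Large Text Compression Benchmark (the size in MB is missing from this transcript)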
Slide 35
Slide 35 text
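Reference: comparison of decompression speed and compression ratio
Table columns: Library | Parameter | Decompression speed [MB/s] | Compression ratio [%]
Table rows: deflate, lz4-java (parameter in KB), snappy-java (parameter in KB), zstd-jni; the parameter and measurement values are missing from this transcript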
Slide 36
Slide 36 text
Summary
Slide 37
Slide 37 text
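Summary
• The appropriate compression algorithm depends on whether you prioritize compression ratio or processing speed
• My personal view…
  • Compression ratio (and its balance with processing speed) first: Zstandard
  • Processing speed first: LZ4
• A compression algorithm and the data can be a better or worse "match" for each other, so it is important to run benchmarks on the data you will actually handle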
Slide 38
Slide 38 text
Thank you!
Slide 39
Slide 39 text
We're hiring!
• All open positions
  • bit.ly/SmartNews-Hiring
• Open position / position search
  • bit.ly/SmartNews-OpenPosition