Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
データ基盤を作るのに使うようなサービスは単発で使うだけでも便利 / Manage to loo...
Search
polamjag
January 23, 2019
Programming
200
1
Share
Embed
Copy iframe code
Copy JS code
Copy link
Start on current slide
データ基盤を作るのに使うようなサービスは単発で使うだけでも便利 / Manage to look over unmanaged data with fully managed services
ref.
http://developer.hatenastaff.com/entry/2019/01/11/110000
polamjag
January 23, 2019
More Decks by polamjag
See All by polamjag
転ばぬ先のXS入門 / YAPC::Fukuoka 2025
polamjag
0
530
障害対応を起点としたもっといい開発と運用のサイクル作りのためにできること / Hatena Enginner Seminar #29
polamjag
0
1k
GitHub Linguistと学ぶ!全国このコードはPerlとRakuどっちでしょうクイズ選手権大会 in YAPC::Hiroshima 2024 #yapcjapan
polamjag
0
880
はてなブログとチーム構成とスクラムのこの1年 #dmm_hatena
polamjag
0
3.4k
Google I/O で見た Android Q 以外の話 / #io19 recap
polamjag
0
170
Other Decks in Programming
See All in Programming
A2UI という光を覗いてみる
satohjohn
1
130
Inside Stream API
skrb
1
680
さぁV100、メモリをお食べ・・・
nilpe
0
140
運用エージェントは "作る" から "育てる" へ - 記憶と自己進化の3層設計パターン / self-evolving-agents-three-layer-agent-design
gawa
12
3.6k
技術記事、AIに書かせるか、自分で書くか? 〜それでも私が自分の手で書く理由〜 / #QiitaConference
jnchito
2
1.3k
[2026年度第1回ORセミナー] 計画最適化ベンチャーと競技プログラミング人材
terryu16
0
260
生成AI時代にこそ効くGo | Why Go Works in the Age of Generative AI
mom0tomo
8
3.2k
軽量Java基盤の設計 DIコンテナに頼らない、長期保守と1秒起動の実現 JJUG CCC 2026 Spring
macha64
0
490
脅威をエンジニアリングの糧にして――現場編 / Turning Threats into Engineering Fuel — Field Edition
nrslib
0
270
Javaの型とAI時代に型が大事な理由 / java types and type in AI era
kishida
2
120
AI駆動開発で崩れていくコードベースを立て直す
kyoko_nr_nr
1
450
Signal Forms: Beyond the Basics @ngBaguette 2026 in Paris
manfredsteyer
PRO
0
240
Featured
See All Featured
brightonSEO & MeasureFest 2025 - Christian Goodrich - Winning strategies for Black Friday CRO & PPC
cargoodrich
3
730
Typedesign – Prime Four
hannesfritz
42
3.1k
エンジニアに許された特別な時間の終わり
watany
107
250k
Amusing Abliteration
ianozsvald
1
200
The Organizational Zoo: Understanding Human Behavior Agility Through Metaphoric Constructive Conversations (based on the works of Arthur Shelley, Ph.D)
kimpetersen
PRO
0
360
Visualization
eitanlees
152
17k
How to build a perfect <img>
jonoalderson
1
5.6k
Paper Plane (Part 1)
katiecoart
PRO
0
8.8k
Thoughts on Productivity
jonyablonski
76
5.2k
How to Align SEO within the Product Triangle To Get Buy-In & Support - #RIMC
aleyda
2
1.5k
The Cult of Friendly URLs
andyhume
79
6.9k
The Invisible Side of Design
smashingmag
302
52k
Transcript
σʔλج൫Λ࡞Δͷʹ͏Α͏ͳαʔϏε ୯ൃͰ͏͚ͩͰศར άϩʔεϋοΫͷจ຺Λఴ͑ͯ Hatena Engineer Seminar #11 2019-01-23 @ Tokyo
/ 2019-01-30 @ Kyoto id:polamjag
whoami • id:polamjag ϙ ϥ Ϝ δ ϟ ά •
ͯͳϒϩά • ΞϓϦέʔγϣϯΤϯδχΞ
ࠓ͢͜ͱ • େ౷Ұσʔλج൫ͷΑ͏ͳଘࡏ͕ແ͍ঢ়ଶͰɺ σʔλج൫Λ࡞Δͷʹ͏Α͏ͳαʔϏε ΞυϗοΫͳੳʹ୯ൃͰ͑Δ͚ͩͰศར • Google BigQuery ͱ Amazon
Athena Ͱ ·ͣઌͷ͔Βղܾ
ࠓ͞ͳ͍͜ͱ • ͍͢͝σʔλج൫Λθϩ͔Β࡞͍ͬͯ͘ํ๏ • ͍͢͝σʔλج൫Λ׆༻͢Δํ๏ • ࣮ࡍͷࢪࡦͦͷͷʹ͍ͭͯ • ࣈͷݟํɺଊ͑ํ
άϩʔεϋοΫ https://trends.google.com/trends/explore?date=all&geo=JP&q=άϩʔεϋοΫ
ͱ͍͑ࣈ
(ผʹάϩʔεϋοΫʹݶΒͣ) ͪΐͬͱ͜͏͍͏ࣈΛ ݟͯΈ͍ͨΜ͚ͩͲɺͱ͍͏ͱ͖
(ผʹάϩʔεϋοΫʹݶΒͣ) ͪΐͬͱ͜͏͍͏ࣈΛ ݟͯΈ͍ͨΜ͚ͩͲɺͱ͍͏ͱ͖ αʔϏεͷσʔλϕʔε͔Βूܭ ϩάϑΝΠϧ܈͔Βूܭ
αʔϏεͷσʔλϕʔε͔Βूܭ ϩάϑΝΠϧ܈͔Βूܭ (ͯͳϒϩά MySQL ͳͷͰͦΕΛલఏʹ)
• SQL Λී௨ʹୟ͘ • ॻࣺͯεΫϦϓτΛॻ͍ͯूܭͨ͠ͷΛεϓϨο υγʔτʹΞοϓϩʔυ • ͦ͏͍͏ͷΛόονͱͯ͠ఆظ࣮ߦ͢Δ • ……
• SQL Λී௨ʹୟ͘ • ෳͷσʔλϕʔεʹ·͕ͨͬͯूܭ͍ͨ͠? • SELECT COUNT(*) ʹ͔͔࣌ؒΔΑ͏ͳςʔϒϧͰͱʹ͔͘ू ܭ͍ͨ͠?
• ूܭΫΤϦΛߴԽ͢ΔͨΊʹՃͰ index ཉ͍͚͠Ͳ ͱΓ͋͑ͣݟͯΈ͍ͨϨϕϧͷूܭͷͨΊʹ ALTER TABLE Ͱ͖Δ? • ྺ࢙తܦҢͰγϦΞϥΠζ͞Εͨσʔλ͕ೖͬͯΔΧϥϜ͕͋Δ ͷΛΫΤϦ͍ͨ͠?
• SQL Λී௨ʹୟ͘ • ୟ͍ͨ݁Ռͷڞ༗ՄࢹԽͱ͔Ͳ͏͢Δ? • ॻࣺͯεΫϦϓτΛॻ͍ͯूܭͨ͠ͷΛεϓϨου γʔτʹΞοϓϩʔυ • ͦ͏͍͏ͷΛόονͱͯ͠ఆظ࣮ߦ͢Δ
• ͦͷεϓϨουγʔτ୭͕ϝϯςφϯε͢Δ? • …………
!!! BigQuery !!! • ͏·͍͍͍҆ۜͷؙ • సૹʹ Embulk Λ͍ɺ DSL
ͷதͰσʔλΛ͔ͯ͠Β όϧΫϩʔσΟϯά͢Δ • ඞཁͳσʔλશ෦ूΊ͖ͯͨΒવ JOIN ͳͲͰ͖Δ • σʔλྔ͕͍ͨͨ͜͠ͱͳ͍[ͲΕ͘Β͍?]ͳΒߟ͑ͳ͍ͱ͍ ͚ͳ͍͜ͱҰͳ͍
ΛϫϯγϣοτͰ͏ • Embulk ͰσʔλૹΔ • 1ճ͖ΓͳΒߟ͑ͳ͍ͱ͍͚ͳ͍͜ͱগͳ͍ • (ఆظ࣮ߦɾߋ৽ͷ͘͠ΈɺΠϯΫϦϝϯλϧϩʔυɺ……) • BigQuery
ͰͳΜΒ͔ूܭ͢Δ • Google εϓϨουγʔτʹΤΫεϙʔτ͢ΔϘλϯ • σʔλϙʔλϧ (Data Studio) ʹಥͬࠐΉ
BigQuery ʹΠϯϙʔτ͠ʹ͍͘ςʔϒϧ • ׂͷׂςʔϒϧΛ͍ʹ͍͘ςʔϒϧ • Ϩίʔυ͕ INSERT ͞Εͨ࣌ࠁɺΛද͢ΧϥϜ͕ແ͍ • ্Ґ֓೦ͷͦ͏͍͏ใΛ
JOIN ͖ͯͯͦ͠ΕͰύʔςΟγϣ ϯ͢Δ? • αʔϏεͷྺ࢙͕ 4000 (≒ 10.96 ) Ҏ্͋Δ • צͰύʔςΟγϣϯͷִؒΛؒҾ͘? • Α͘Α͘ߟ͑ͨΒຊ·ͩύʔςΟγϣχϯάඞཁͳ͍ͱ͔
Google BigQuery ྉۚ • ౦ژϦʔδϣϯͳΒετϨʔδ୯Ձ S3 ΑΓ͍҆ • 0.023USD /
GB / ݄ • ΫΤϦྉۚ 8.55 USD / 1 TB • ࠓͷҝସϨʔτͩͱ͍͍ͩͨ 1 GB ͋ͨΓ 1 ԁऑ • ͪΐͬͱ͍ͨ͠ํͳΒ΄ͱΜͲޡࠩϨϕϧ
͜͜·Ͱͷ·ͱΊ • αʔϏεͷσʔλʹରͯ͠ϔϏʔͳूܭΛͨ͘͠ ͳͬͨͱ͖ͷಀ͛ಓͱͯ͠ͷ Google BigQuery Λ ͝հ͠·ͨ͠
αʔϏεͷσʔλϕʔε͔Βूܭ ϩάϑΝΠϧ܈͔Βूܭ
ͦͷϩάࠓͲ͜ʹ • Ͳ͔͜ͷαʔό্? • ͦΕͦ͜ BigQuery ͱ͔ RedShift ͱ͔? •
Amazon S3 ʹஔ͔Ε͍ͯΔ? • ࣗͷ߹ݟ͔ͨͬͨͷ͕ S3 ʹஔ͔Εͯ ͍ͨ
S3 ʹஔ͍ͯ͋ΔͳΒ Amazon Athena Ͱ • S3 ʹஔ͍ͯ͋ΔϑΝΠϧΛΫΤϦͰ͖ΔϚωʔδυαʔϏε • த
Presto (https://prestodb.io/) • Presto Hive ͷ͕ࣝ͋Δͱ͍͜ͳͤΔͱ͍͏͜ͱ͕ଟ͍ • ϑϧϚωʔδυͳͷͰɺूܭͷͨΊʹετϨʔδͱωοτϫʔΫ ڧΊͷϚγϯΛ༻ҙͨ͠Γɺͱ͍ͬͨ͜ͱΛ͠ͳ͍͍ͯ͘ • ͓ΉΖʹ S3 ͷϑΝΠϧΛࢦఆͯ͠ CREATE TABLE ͯ͠ΫΤ Ϧ͢Δ͚ͩ
• ҙͷϑΥʔϚοτΛύʔεͯ͠ SQL ͰΫΤϦ • CSV/TSV, JSONL, ֤छΧϥϜφ (ྻࢤ) ϑΝΠϧͳͲ
ඪ४αϙʔτ • LTSV ਖ਼نදݱͰύʔεͯ͠ΫΤϦͰ͖Δ • → ਖ਼نදݱͰύʔεͯ͠Λൈ͖ग़ͤΔϑΝΠϧͳ ΒԿͰΫΤϦͰ͖Δ • ύʔε͠ͳ͍͜ͱͰ grep as a service ͱͯ͠׆༻
-- LTSV Λ Athena ͰΫΤϦ͢ΔͨΊʹύʔεͯ͠ΈΑ͏ CREATE EXTERNAL TABLE IF NOT
EXISTS `sugoi-service-logs` ( `time` string, `host` string, `method` string, `path` string, `taken_sec` float, `ua` string, `referer` string ) PARTITIONED BY (date string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' WITH SERDEPROPERTIES ( 'serialization.format' = '1', 'input.regex' = '^time:([^\\t]*)\\thost:([^\\t]*)\\tmethod:([^\\t]*)\\turi: ([^\\t]*)\\ttaken_sec:([^\\t]*)\\tua:([^\\t]*)\\treferer:([^\\t]*)$' ) LOCATION 's3://your-logs-bucket/sugoi-service-logs/' TBLPROPERTIES ('has_encrypted_data'='false')
-- LTSV Λ Athena ͰΫΤϦ͢ΔͨΊʹύʔεͯ͠ΈΑ͏ CREATE EXTERNAL TABLE IF NOT
EXISTS `sugoi-service-logs` ( `time` string, `host` string, `method` string, `path` string, `taken_sec` float, `ua` string, `referer` string ) PARTITIONED BY (date string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' WITH SERDEPROPERTIES ( 'serialization.format' = '1', 'input.regex' = '^time:([^\\t]*)\\thost:([^\\t]*)\\tmethod:([^\\t]*)\\turi: ([^\\t]*)\\ttaken_sec:([^\\t]*)\\tua:([^\\t]*)\\treferer:([^\\t]*)$' ) LOCATION 's3://your-logs-bucket/sugoi-service-logs/' TBLPROPERTIES ('has_encrypted_data'='false')
-- LTSV Λ Athena ͰΫΤϦ͢ΔͨΊʹύʔεͯ͠ΈΑ͏ CREATE EXTERNAL TABLE IF NOT
EXISTS `sugoi-service-logs` ( `time` string, `host` string, `method` string, `path` string, `taken_sec` float, `ua` string, `referer` string ) PARTITIONED BY (date string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' WITH SERDEPROPERTIES ( 'serialization.format' = '1', 'input.regex' = '^time:([^\\t]*)\\thost:([^\\t]*)\\tmethod:([^\\t]*)\\turi: ([^\\t]*)\\ttaken_sec:([^\\t]*)\\tua:([^\\t]*)\\treferer:([^\\t]*)$' ) LOCATION 's3://your-logs-bucket/sugoi-service-logs/' TBLPROPERTIES ('has_encrypted_data'='false') ……
ใ • S3 ʹͲΜͲΜϩάΛอଘ͢Δͱ͖ɺHive ܗࣜͷσΟϨΫτϦͰ อଘͯ͠ΔͱύʔςΟγϣχϯά͕ࣗಈͰߦΘΕͯศར • ྫ: s3://bucket/somelogs/year=2019/month=1/ day=23/engineer_seminar_001.log
• ͦ͏ͳͬͯͳͯ͘ symlink ͰͳΜͱ͔Ͱ͖Δͱ͍͏͋Δ • https://dev.classmethod.jp/cloud/aws/create-athena-table- with-symboliclink/
Amazon Athena ஈ • εΩϟϯͨ͠༰ྔʹରͯ͠՝ۚ (5 USD / 1 TB)
• gzip ѹॖࡁϑΝΠϧͦͷ··ΫΤϦͰ͖Δ • ྻࢤͰͳ͍ϑΝΠϧʹରͯ͠ΫΤϦ͢ΔͳΒɺूܭ ͢ΔΧϥϜΛߜͬͯྉۚઅʹͳΒͳ͍ • ج൫ͱͯ͠׆༻͍ͯ͘͠ײ͡ͳΒɺ ྻࢤϑΝΠϧʹม͢ΔϑΣʔζ͕ͳ͍ͱݫͦ͠͏
͜͜·Ͱͷ·ͱΊ • Amazon Athena Λ͏͜ͱͰɺ S3 ্ʹ͋ΔͳΒ͍͍ͩͨͲΜͳϩάͰͱ͘ʹ લ४උ͍͖ͤͣͳΓूܭͰ͖Δ
• όϧΫϩʔυͯ͠͠·͑ͬͪ͜ͷͷͱ͍͏ ؾ࣋ͪͰരूܭͰ͖Δάοζ • όϧΫϩʔυؾ͕ॏ͍ͱ͍͏ϩάϑΝΠϧͰɺ ϑΝΠϧͷ͋Γ͔Λઃఆ͢Δ͚ͩͰɺ Կ४උΛ͍͖ͤͣͳΓूܭͰ͖Δάοζ • ઌͷ݁ߏ͘͢ղܾͰ͖ΔΑ͏ʹ
• ઌͷΛղܾ͠ଓ͚ͯΔͱී௨ʹج൫ͬΆ͍ ͷ͕ཉ͘͠ͳͬͯ͘Δ͠ɺ͜͏ͳͬͯΔͱྑ͍ ͔ͳͱ͍͏ͷ͕ݟ͑ͯ͘Δ • ج൫͕͋Δͷʹӽͨ͜͠ͱͳ͍ • ͍Ζ͍Ζࢼ͢͜ͱͰσʔλͱͷ͖߹͍ํͷ Πϝʔδ͕ΒΜͩͷྑ͔ͬͨ