Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
データ基盤を作るのに使うようなサービスは単発で使うだけでも便利 / Manage to loo...
Search
polamjag
January 23, 2019
Programming
200
1
Share
Embed
Copy iframe code
Copy JS code
Copy link
Start on current slide
データ基盤を作るのに使うようなサービスは単発で使うだけでも便利 / Manage to look over unmanaged data with fully managed services
ref.
http://developer.hatenastaff.com/entry/2019/01/11/110000
polamjag
January 23, 2019
More Decks by polamjag
See All by polamjag
転ばぬ先のXS入門 / YAPC::Fukuoka 2025
polamjag
0
530
障害対応を起点としたもっといい開発と運用のサイクル作りのためにできること / Hatena Enginner Seminar #29
polamjag
0
1k
GitHub Linguistと学ぶ!全国このコードはPerlとRakuどっちでしょうクイズ選手権大会 in YAPC::Hiroshima 2024 #yapcjapan
polamjag
0
880
はてなブログとチーム構成とスクラムのこの1年 #dmm_hatena
polamjag
0
3.4k
Google I/O で見た Android Q 以外の話 / #io19 recap
polamjag
0
170
Other Decks in Programming
See All in Programming
The NotImplementedError Problem in Ruby
koic
1
700
ローカルLLMを使ってB2Bサービスを作っていての学び
yaotti
0
160
CSC307 Lecture 17
javiergs
PRO
0
320
TAKTでAI駆動開発の品質を設計する
j5ik2o
6
1.2k
Oxcを導入して開発体験が向上した話
yug1224
4
300
TSKaigi Night Talks 2026_TypeScriptでサプライチェーンの整合性を型に閉じ込める
geekplus_tech
0
330
ADKを使って簡単にAIエージェントを作ってみよう
k1mu21
0
250
IBM Bobを活用したレガシーアプリの最新化
oniak3ibm
PRO
1
190
LLMによるContent Moderationの本番運用の裏側と品質担保への挑戦
suikabar
2
530
Vue × Nuxt × Oxc どこまで使える?実運用の現在地
andpad
0
160
Javaの型とAI時代に型が大事な理由 / java types and type in AI era
kishida
2
120
Make SRE Operations Easier with Azure SRE Agent
kkamegawa
0
5.3k
Featured
See All Featured
jQuery: Nuts, Bolts and Bling
dougneiner
66
8.5k
Code Reviewing Like a Champion
maltzj
528
40k
A designer walks into a library…
pauljervisheath
211
24k
SEOcharity - Dark patterns in SEO and UX: How to avoid them and build a more ethical web
sarafernandez
0
200
RailsConf & Balkan Ruby 2019: The Past, Present, and Future of Rails at GitHub
eileencodes
141
35k
Agile Leadership in an Agile Organization
kimpetersen
PRO
0
160
Navigating Algorithm Shifts & AI Overviews - #SMXNext
aleyda
1
1.3k
The Invisible Side of Design
smashingmag
302
52k
The #1 spot is gone: here's how to win anyway
tamaranovitovic
2
1.1k
Facilitating Awesome Meetings
lara
57
7k
The Success of Rails: Ensuring Growth for the Next 100 Years
eileencodes
47
8.2k
The Curious Case for Waylosing
cassininazir
1
380
Transcript
σʔλج൫Λ࡞Δͷʹ͏Α͏ͳαʔϏε ୯ൃͰ͏͚ͩͰศར άϩʔεϋοΫͷจ຺Λఴ͑ͯ Hatena Engineer Seminar #11 2019-01-23 @ Tokyo
/ 2019-01-30 @ Kyoto id:polamjag
whoami • id:polamjag ϙ ϥ Ϝ δ ϟ ά •
ͯͳϒϩά • ΞϓϦέʔγϣϯΤϯδχΞ
ࠓ͢͜ͱ • େ౷Ұσʔλج൫ͷΑ͏ͳଘࡏ͕ແ͍ঢ়ଶͰɺ σʔλج൫Λ࡞Δͷʹ͏Α͏ͳαʔϏε ΞυϗοΫͳੳʹ୯ൃͰ͑Δ͚ͩͰศར • Google BigQuery ͱ Amazon
Athena Ͱ ·ͣઌͷ͔Βղܾ
ࠓ͞ͳ͍͜ͱ • ͍͢͝σʔλج൫Λθϩ͔Β࡞͍ͬͯ͘ํ๏ • ͍͢͝σʔλج൫Λ׆༻͢Δํ๏ • ࣮ࡍͷࢪࡦͦͷͷʹ͍ͭͯ • ࣈͷݟํɺଊ͑ํ
άϩʔεϋοΫ https://trends.google.com/trends/explore?date=all&geo=JP&q=άϩʔεϋοΫ
ͱ͍͑ࣈ
(ผʹάϩʔεϋοΫʹݶΒͣ) ͪΐͬͱ͜͏͍͏ࣈΛ ݟͯΈ͍ͨΜ͚ͩͲɺͱ͍͏ͱ͖
(ผʹάϩʔεϋοΫʹݶΒͣ) ͪΐͬͱ͜͏͍͏ࣈΛ ݟͯΈ͍ͨΜ͚ͩͲɺͱ͍͏ͱ͖ αʔϏεͷσʔλϕʔε͔Βूܭ ϩάϑΝΠϧ܈͔Βूܭ
αʔϏεͷσʔλϕʔε͔Βूܭ ϩάϑΝΠϧ܈͔Βूܭ (ͯͳϒϩά MySQL ͳͷͰͦΕΛલఏʹ)
• SQL Λී௨ʹୟ͘ • ॻࣺͯεΫϦϓτΛॻ͍ͯूܭͨ͠ͷΛεϓϨο υγʔτʹΞοϓϩʔυ • ͦ͏͍͏ͷΛόονͱͯ͠ఆظ࣮ߦ͢Δ • ……
• SQL Λී௨ʹୟ͘ • ෳͷσʔλϕʔεʹ·͕ͨͬͯूܭ͍ͨ͠? • SELECT COUNT(*) ʹ͔͔࣌ؒΔΑ͏ͳςʔϒϧͰͱʹ͔͘ू ܭ͍ͨ͠?
• ूܭΫΤϦΛߴԽ͢ΔͨΊʹՃͰ index ཉ͍͚͠Ͳ ͱΓ͋͑ͣݟͯΈ͍ͨϨϕϧͷूܭͷͨΊʹ ALTER TABLE Ͱ͖Δ? • ྺ࢙తܦҢͰγϦΞϥΠζ͞Εͨσʔλ͕ೖͬͯΔΧϥϜ͕͋Δ ͷΛΫΤϦ͍ͨ͠?
• SQL Λී௨ʹୟ͘ • ୟ͍ͨ݁Ռͷڞ༗ՄࢹԽͱ͔Ͳ͏͢Δ? • ॻࣺͯεΫϦϓτΛॻ͍ͯूܭͨ͠ͷΛεϓϨου γʔτʹΞοϓϩʔυ • ͦ͏͍͏ͷΛόονͱͯ͠ఆظ࣮ߦ͢Δ
• ͦͷεϓϨουγʔτ୭͕ϝϯςφϯε͢Δ? • …………
!!! BigQuery !!! • ͏·͍͍͍҆ۜͷؙ • సૹʹ Embulk Λ͍ɺ DSL
ͷதͰσʔλΛ͔ͯ͠Β όϧΫϩʔσΟϯά͢Δ • ඞཁͳσʔλશ෦ूΊ͖ͯͨΒવ JOIN ͳͲͰ͖Δ • σʔλྔ͕͍ͨͨ͜͠ͱͳ͍[ͲΕ͘Β͍?]ͳΒߟ͑ͳ͍ͱ͍ ͚ͳ͍͜ͱҰͳ͍
ΛϫϯγϣοτͰ͏ • Embulk ͰσʔλૹΔ • 1ճ͖ΓͳΒߟ͑ͳ͍ͱ͍͚ͳ͍͜ͱগͳ͍ • (ఆظ࣮ߦɾߋ৽ͷ͘͠ΈɺΠϯΫϦϝϯλϧϩʔυɺ……) • BigQuery
ͰͳΜΒ͔ूܭ͢Δ • Google εϓϨουγʔτʹΤΫεϙʔτ͢ΔϘλϯ • σʔλϙʔλϧ (Data Studio) ʹಥͬࠐΉ
BigQuery ʹΠϯϙʔτ͠ʹ͍͘ςʔϒϧ • ׂͷׂςʔϒϧΛ͍ʹ͍͘ςʔϒϧ • Ϩίʔυ͕ INSERT ͞Εͨ࣌ࠁɺΛද͢ΧϥϜ͕ແ͍ • ্Ґ֓೦ͷͦ͏͍͏ใΛ
JOIN ͖ͯͯͦ͠ΕͰύʔςΟγϣ ϯ͢Δ? • αʔϏεͷྺ࢙͕ 4000 (≒ 10.96 ) Ҏ্͋Δ • צͰύʔςΟγϣϯͷִؒΛؒҾ͘? • Α͘Α͘ߟ͑ͨΒຊ·ͩύʔςΟγϣχϯάඞཁͳ͍ͱ͔
Google BigQuery ྉۚ • ౦ژϦʔδϣϯͳΒετϨʔδ୯Ձ S3 ΑΓ͍҆ • 0.023USD /
GB / ݄ • ΫΤϦྉۚ 8.55 USD / 1 TB • ࠓͷҝସϨʔτͩͱ͍͍ͩͨ 1 GB ͋ͨΓ 1 ԁऑ • ͪΐͬͱ͍ͨ͠ํͳΒ΄ͱΜͲޡࠩϨϕϧ
͜͜·Ͱͷ·ͱΊ • αʔϏεͷσʔλʹରͯ͠ϔϏʔͳूܭΛͨ͘͠ ͳͬͨͱ͖ͷಀ͛ಓͱͯ͠ͷ Google BigQuery Λ ͝հ͠·ͨ͠
αʔϏεͷσʔλϕʔε͔Βूܭ ϩάϑΝΠϧ܈͔Βूܭ
ͦͷϩάࠓͲ͜ʹ • Ͳ͔͜ͷαʔό্? • ͦΕͦ͜ BigQuery ͱ͔ RedShift ͱ͔? •
Amazon S3 ʹஔ͔Ε͍ͯΔ? • ࣗͷ߹ݟ͔ͨͬͨͷ͕ S3 ʹஔ͔Εͯ ͍ͨ
S3 ʹஔ͍ͯ͋ΔͳΒ Amazon Athena Ͱ • S3 ʹஔ͍ͯ͋ΔϑΝΠϧΛΫΤϦͰ͖ΔϚωʔδυαʔϏε • த
Presto (https://prestodb.io/) • Presto Hive ͷ͕ࣝ͋Δͱ͍͜ͳͤΔͱ͍͏͜ͱ͕ଟ͍ • ϑϧϚωʔδυͳͷͰɺूܭͷͨΊʹετϨʔδͱωοτϫʔΫ ڧΊͷϚγϯΛ༻ҙͨ͠Γɺͱ͍ͬͨ͜ͱΛ͠ͳ͍͍ͯ͘ • ͓ΉΖʹ S3 ͷϑΝΠϧΛࢦఆͯ͠ CREATE TABLE ͯ͠ΫΤ Ϧ͢Δ͚ͩ
• ҙͷϑΥʔϚοτΛύʔεͯ͠ SQL ͰΫΤϦ • CSV/TSV, JSONL, ֤छΧϥϜφ (ྻࢤ) ϑΝΠϧͳͲ
ඪ४αϙʔτ • LTSV ਖ਼نදݱͰύʔεͯ͠ΫΤϦͰ͖Δ • → ਖ਼نදݱͰύʔεͯ͠Λൈ͖ग़ͤΔϑΝΠϧͳ ΒԿͰΫΤϦͰ͖Δ • ύʔε͠ͳ͍͜ͱͰ grep as a service ͱͯ͠׆༻
-- LTSV Λ Athena ͰΫΤϦ͢ΔͨΊʹύʔεͯ͠ΈΑ͏ CREATE EXTERNAL TABLE IF NOT
EXISTS `sugoi-service-logs` ( `time` string, `host` string, `method` string, `path` string, `taken_sec` float, `ua` string, `referer` string ) PARTITIONED BY (date string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' WITH SERDEPROPERTIES ( 'serialization.format' = '1', 'input.regex' = '^time:([^\\t]*)\\thost:([^\\t]*)\\tmethod:([^\\t]*)\\turi: ([^\\t]*)\\ttaken_sec:([^\\t]*)\\tua:([^\\t]*)\\treferer:([^\\t]*)$' ) LOCATION 's3://your-logs-bucket/sugoi-service-logs/' TBLPROPERTIES ('has_encrypted_data'='false')
-- LTSV Λ Athena ͰΫΤϦ͢ΔͨΊʹύʔεͯ͠ΈΑ͏ CREATE EXTERNAL TABLE IF NOT
EXISTS `sugoi-service-logs` ( `time` string, `host` string, `method` string, `path` string, `taken_sec` float, `ua` string, `referer` string ) PARTITIONED BY (date string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' WITH SERDEPROPERTIES ( 'serialization.format' = '1', 'input.regex' = '^time:([^\\t]*)\\thost:([^\\t]*)\\tmethod:([^\\t]*)\\turi: ([^\\t]*)\\ttaken_sec:([^\\t]*)\\tua:([^\\t]*)\\treferer:([^\\t]*)$' ) LOCATION 's3://your-logs-bucket/sugoi-service-logs/' TBLPROPERTIES ('has_encrypted_data'='false')
-- LTSV Λ Athena ͰΫΤϦ͢ΔͨΊʹύʔεͯ͠ΈΑ͏ CREATE EXTERNAL TABLE IF NOT
EXISTS `sugoi-service-logs` ( `time` string, `host` string, `method` string, `path` string, `taken_sec` float, `ua` string, `referer` string ) PARTITIONED BY (date string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' WITH SERDEPROPERTIES ( 'serialization.format' = '1', 'input.regex' = '^time:([^\\t]*)\\thost:([^\\t]*)\\tmethod:([^\\t]*)\\turi: ([^\\t]*)\\ttaken_sec:([^\\t]*)\\tua:([^\\t]*)\\treferer:([^\\t]*)$' ) LOCATION 's3://your-logs-bucket/sugoi-service-logs/' TBLPROPERTIES ('has_encrypted_data'='false') ……
ใ • S3 ʹͲΜͲΜϩάΛอଘ͢Δͱ͖ɺHive ܗࣜͷσΟϨΫτϦͰ อଘͯ͠ΔͱύʔςΟγϣχϯά͕ࣗಈͰߦΘΕͯศར • ྫ: s3://bucket/somelogs/year=2019/month=1/ day=23/engineer_seminar_001.log
• ͦ͏ͳͬͯͳͯ͘ symlink ͰͳΜͱ͔Ͱ͖Δͱ͍͏͋Δ • https://dev.classmethod.jp/cloud/aws/create-athena-table- with-symboliclink/
Amazon Athena ஈ • εΩϟϯͨ͠༰ྔʹରͯ͠՝ۚ (5 USD / 1 TB)
• gzip ѹॖࡁϑΝΠϧͦͷ··ΫΤϦͰ͖Δ • ྻࢤͰͳ͍ϑΝΠϧʹରͯ͠ΫΤϦ͢ΔͳΒɺूܭ ͢ΔΧϥϜΛߜͬͯྉۚઅʹͳΒͳ͍ • ج൫ͱͯ͠׆༻͍ͯ͘͠ײ͡ͳΒɺ ྻࢤϑΝΠϧʹม͢ΔϑΣʔζ͕ͳ͍ͱݫͦ͠͏
͜͜·Ͱͷ·ͱΊ • Amazon Athena Λ͏͜ͱͰɺ S3 ্ʹ͋ΔͳΒ͍͍ͩͨͲΜͳϩάͰͱ͘ʹ લ४උ͍͖ͤͣͳΓूܭͰ͖Δ
• όϧΫϩʔυͯ͠͠·͑ͬͪ͜ͷͷͱ͍͏ ؾ࣋ͪͰരूܭͰ͖Δάοζ • όϧΫϩʔυؾ͕ॏ͍ͱ͍͏ϩάϑΝΠϧͰɺ ϑΝΠϧͷ͋Γ͔Λઃఆ͢Δ͚ͩͰɺ Կ४උΛ͍͖ͤͣͳΓूܭͰ͖Δάοζ • ઌͷ݁ߏ͘͢ղܾͰ͖ΔΑ͏ʹ
• ઌͷΛղܾ͠ଓ͚ͯΔͱී௨ʹج൫ͬΆ͍ ͷ͕ཉ͘͠ͳͬͯ͘Δ͠ɺ͜͏ͳͬͯΔͱྑ͍ ͔ͳͱ͍͏ͷ͕ݟ͑ͯ͘Δ • ج൫͕͋Δͷʹӽͨ͜͠ͱͳ͍ • ͍Ζ͍Ζࢼ͢͜ͱͰσʔλͱͷ͖߹͍ํͷ Πϝʔδ͕ΒΜͩͷྑ͔ͬͨ