Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
dbtとBigQueryで始めるData Vault入門
Search
Kazuki Taniguchi
May 10, 2022
Programming
0
2.1k
dbtとBigQueryで始めるData Vault入門
dbt Tokyo Meeup #3の発表内容です
発表のアーカイブはこちらから
https://youtu.be/SYsiRFR2LGw
#dbt_tokyo
Kazuki Taniguchi
May 10, 2022
Tweet
Share
More Decks by Kazuki Taniguchi
See All by Kazuki Taniguchi
経済学者に知ってほしい機械学習 ~反事実モデルによる予測~ / JEA2020 tutorial CFML
kazk1018
3
2k
CFML関連のライブラリの紹介 / cfml #3 libraries
kazk1018
1
250
CFMLの概要と研究動向 / cfml #1 introduction
kazk1018
5
800
Unsupervised Domain Adaptation by Backpropagation
kazk1018
1
300
Counterfactual Machine Learning 入門 / Introduction to Counterfactual ML
kazk1018
5
2k
【devsumi2017】人工知能の研究開発チームが プロダクト・組織をどのように変えたのか
kazk1018
8
3.4k
Other Decks in Programming
See All in Programming
人口ダッシュボード作成講座資料
jo76shin
0
170
PHPerライフをChrome拡張開発でちょっと便利に / PR TIMES x DMM.com
meihei3
0
190
Faster, greener, and happier- why Quarkus should be your next tech stack
hollycummins
0
130
C#でのPlaywrightを使ったE2Eテストの実際
tomokusaba
0
210
ISUCONってなんだか難しそう……!!でも、初めてのISUCONにPHPで挑戦してきました!
kotomin_m
0
260
SmartHRにおけるプロダクトエンジニア/product_engineer_in_smarthr_20240227
saitoryc
5
180
Laravel標準バリデーションでできること
hmb_ok
1
330
Flutterアプリを GitHub Actions & Xcode Cloud で社内配布する / Distribute Flutter apps internally
takasfz
0
130
とにかくHTTP3をライトニングに話す / Anyway, I'll talk to Lightning about HTTP3.
seike460
PRO
0
110
【KMC春合宿2024】実装視点で見るNeural Radiance Fields
runningoutrate
0
130
PHPアプリケーションのスケーラビリティと 信頼性を革新する nginx+ngx_mrubyとGoの融合
pyama86
2
220
Cloud RunとCloud PubSubでサーバレスなデータ基盤2024 with Terraform / Cloud Run and PubSub with Terraform
shinyorke
7
1.7k
Featured
See All Featured
Writing Fast Ruby
sferik
619
59k
Gamification - CAS2011
davidbonilla
76
4.5k
Thoughts on Productivity
jonyablonski
57
3.7k
Building a Modern Day E-commerce SEO Strategy
aleyda
15
6.3k
ParisWeb 2013: Learning to Love: Crash Course in Emotional UX Design
dotmariusz
101
6.6k
Practical Orchestrator
shlominoach
180
9.6k
Optimizing for Happiness
mojombo
369
69k
[Rails World 2023 - Day 1 Closing Keynote] - The Magic of Rails
eileencodes
1
1.2k
Cheating the UX When There Is Nothing More to Optimize - PixelPioneers
stephaniewalter
272
12k
How to name files
jennybc
62
91k
Easily Structure & Communicate Ideas using Wireframe
afnizarnur
185
15k
Raft: Consensus for Rubyists
vanstee
130
6.2k
Transcript
dbtͱBigQueryͰ࢝ΊΔ Data Vaultೖ dbt Tokyo Meetup #3 Kazuki Taniguchi (@Kazk1018)
Introduction • Kazuki Taniguchi (@Kazk1018) • SWE(Data) @ 10X, Inc
• Careers • Data Scientist @ CyberAgent, Inc • Co-founder @ the Babels, Inc • CEO @ ExpData, LLC https:/kazk1018.github.io/
ຊ͓͢Δ͜ͱ • Stailerͷհ • Stailerʹ͓͚ΔσʔλϞσϦϯάͷ՝ • Data Vaultʹ͍ͭͯ • dbtͱBigQueryΛ༻͍ͨData
Vaultʹ͍ͭͯ
Stailer খചࣄۀऀͷσδλϧԽΛ࣮ݱ͢Δͯ͢ͷγεςϜΛϓϥοτϑΥʔ Ϝͱͯ͠ఏڙ ͓٬༷͚ খചࣄۀऀ͚ ૹۀऀ͚
Our Issues খചࣄۀऀຖʹҟͳΔෳͷγεςϜͷσʔλΛ࿈ܞ͢ΔͨΊʹσʔλ ιʔεͷଟ༷ੑ͕ߴ͍ ใ ൢଅ ૹใ ࡏݿใ ձһใ 4UBJMFS%BUB-BLF
Our Issues খചࣄۀऀຖʹҟͳΔෳͷγεςϜͷσʔλΛ࿈ܞ͢ΔͨΊʹσʔλ ιʔεͷଟ༷ੑ͕ߴ͍ ใ ใ 4UBJMFS%BUB-BLF খചࣄۀऀA খചࣄۀऀB ҟͳΔϑΥʔϚοτ
Data Vault • σʔλΣΞϋεʹ͓͚ΔσʔλϞσϦϯάख๏ͷҰͭͰ2000 ʹDaniel (Dan) LinstedtʹΑͬͯఏҊ͞Εͨ • 2014ʹఏҊऀͷϒϩάͰData Vault
2.0͕հ͞Εͨ (ຊൃදͰData Vault 2.0ʹج͍ͮͯઆ໌͠·͢)
Business Objects ӦۀੳऀͷϏδωεϢʔβʔ͕ར༻͢ΔΦϒδΣΫτΛϢχʔΫ ʹಛఆͰ͖ΔϏδωεΩʔΛઃܭ͢Δඞཁ͕͋Δ 0CKFDU #VTJOFTT,FZT 6TFS VTFS*%PS&NBJM 1SPEVDU ݩ
൪߸ 4IPQ ళฮ໊PSاۀ໊ ళฮ໊ Ex)
Data Vaultʹ͓͍ͯγεςϜ͕ੜ͢ΔओͳΧϥϜ System Fields 'JFMET $PMVNOOBNF %FTDSJQUJPO )BTILFZ \PCKFDU^@IBTILFZ %8)Ͱར༻͢ΔΩʔ
ϏδωεΩʔ͔ΒϋογϡΛ༻͍ͯܭࢉ͢Δ -PBE%BUF5JNF4UBNQ MPBE@EUT %8)͕ॳΊͯϏδωεΦϒδΣΫτΛ ֬ೝͨ࣌͠ 3FDPSE4PVSDF SFDPSE@TPVSDF ֨ೲ͞Εͨσʔλͷσʔλιʔε໊
Example: e-Commerce )VC6TFS )VC4IPQ )VC1SPEVDU -JOL0SEFS 4BU0SEFS 4BU6TFS 4BU1SPEVDU )VC
-JOL 4BUFMMJUF
Hub ֤ϏδωεΦϒδΣΫτͷϏδωεΩʔΛอ࣋͢Δςʔϒϧ )VC6TFS VTFS@IBTILFZ VTFS@JE MPBE@EUT SFDPSE@TPVSDF )VC4IPQ TIPQ@IBTILFZ OBNF
MPBE@EUT SFDPSE@TPVSDF )VC1SPEVDU QSPEVDU@IBTILFZ QSPEVDU@OVNCFS MPBE@EUT SFDPSE@TPVSDF
ෳͷϏδωεΦϒδΣΫτͷؔΛอ࣋͢Δςʔϒϧ -JOL0SEFS VTFS@IBTILFZ QSPEVDU@IBTILFZ TIPQ@LFZ MPBE@EUT SFDPSE@TPVSDF Link
Satellite HubLinkΛઆ໌͢ΔͨΊͷɺ͓ΑͼͦͷཤྺΛอ࣋͢Δςʔϒϧ 4BU6TFS VTFS@IBTILFZ fi STU@OBNF MBTU@OBNF MPBE@EUT SFDPSE@TPVSDF 4BU1SPEVDU
QSPEVDU@IBTILFZ OBNF QSJDF MPBE@EUT SFDPSE@TPVSDF 4BU0SEFS PSEFS@IBTILFZ BNPVOU TIJQQJOH@EBUF PSEFS@EBUF MPBE@EUT SFDPSE@TPVSDF
Satellite ͷཤྺΛอ࣋͢Δ(SCD type2)ׂ͕͋ΔͷͰඞཁʹԠͯ࣍͡ͷ System FieldsΛར༻͢Δ 'JFMET $PMVNOOBNF %FTDSJQUJPO )BTI%J f
)BTIEJ f มߋ͞Ε͔ͨͲ͏͔Λൺֱ͢ΔͨΊͷϋογϡ -PBE&OE%BUF5JNF4UBNQ MPBE@FOE@EUT 1,ຖʹ৽͍͕͠ೖ͖ͬͯͨͱ͖ͷ࣌ ಉ͡1,Ͱ࠷৽ͷߦʹ/6--͕ೖ͍ͬͯΔ
Example: e-Commerce )VC6TFS )VC4IPQ )VC1SPEVDU -JOL0SEFS 4BU0SEFS 4BU6TFS 4BU1SPEVDU )VC
-JOL 4BUFMMJUF
Data Vault Pros • ༷ʑͳσʔλιʔε͕૿͍͑ͯ͘߹Ͱ࠷খݶͷมߋͰ࣮͢Δ͜ ͱ͕ՄೳͰ͋Δ • σʔλؒͷ͕ؔมߋ͞Εͯ༰қʹมߋ͕ՄೳͰ͋Δ • DWHʹ͓͍ͯσʔλιʔεͷ͕ՄೳͰ͋Δ
Data Vault Pros )VC6TFS 4BU6TFS )VC -JOL 4BUFMMJUF 4BU$3. ҟͳΔσʔλιʔεΛՃ͢Δ߹SatelliteΛՃ͢Δ͚ͩͰྑ͍
Data Vault Cons • ຊޠͷใ͕গͳ͍ͷͰӳޠΛಡΊΔඞཁ͕͋Δ • ଞͷσʔλϞσϦϯάʹൺͯൣғ͕͍͜ͱ͋Δ͕ɺߏஙͷͨΊ ʹඞཁͱ͢Δ͕ࣝଟ͍ (ຊൃදͰհͰ͖͍ͯΔ༰جຊతͳ෦ ͚ͩͰ͢)
Data Vault @ 10X dbtͱBigQueryΛ༻͍ͯData VaultΛݕূ͍ͯ͠Δ BigQuery dbt BigQuery
• dbtvault • (ৄࡉޙͷൃදΛ͝ௌߨ͍ͩ͘͞) • ࠓճͷݕূͰௐࠪ·ͰͰ࣮ࡍʹར༻͍ͯ͠·ͤΜ Data Vault using dbt
with BigQuery
Data Vault using dbt with BigQuery • dbtͰͷϑϧεΫϥον • dbtvault͕͋ΔΑ͏ʹςϯϓϨʔτͰSQLΛੜͰ͖Δdbt૬ੑ
͕ඇৗʹྑ͍ • MaterializationͷIncrementalΛ༻͍࣮ͯ͢Δ͜ͱ͕Ͱ͖Δ • (Incrementalʹ͍ͭͯޙͷൃදΛ͝ௌߨ͍ͩ͘͞)
Data Vault using dbt with BigQuery • dbtvaultͰϑϧεΫϥονͰجຊͳ࣮͘Ͱ͖Δ • ARRAYSTRUCTͷѻ͍ʹҙ͢Δ
• Data VaultͰଟ༻͞ΕΔhashdistinct͕ѻ͑ͳ͍ • (dbt snapshotͱ༷ͷͱͯ͠ಉ͡)
(ߋʹৄ͍͠ઃܭৄࡉʹ͍ͭͯԼهͷຊΛࢀߟʹ͍ͯͩ͘͠͞) More Information about Data Vault
Summary • 10Xʹ͓͚ΔDWHߏஙͷ՝ • Data Vaultͷجຊతͳ֓೦ • dbtͱBigQueryΛ༻͍ͨData Vault
References • Books • Building a Scalable Data Warehouse with
Data Vault 2.0 • Articles • A short intro to #datavault 2.0