Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
dbtとBigQueryで始めるData Vault入門
Search
Kazuki Taniguchi
May 10, 2022
Programming
0
3k
dbtとBigQueryで始めるData Vault入門
dbt Tokyo Meeup #3の発表内容です
発表のアーカイブはこちらから
https://youtu.be/SYsiRFR2LGw
#dbt_tokyo
Kazuki Taniguchi
May 10, 2022
Tweet
Share
More Decks by Kazuki Taniguchi
See All by Kazuki Taniguchi
経済学者に知ってほしい機械学習 ~反事実モデルによる予測~ / JEA2020 tutorial CFML
kazk1018
3
2.3k
CFML関連のライブラリの紹介 / cfml #3 libraries
kazk1018
1
290
CFMLの概要と研究動向 / cfml #1 introduction
kazk1018
5
1k
Unsupervised Domain Adaptation by Backpropagation
kazk1018
1
450
Counterfactual Machine Learning 入門 / Introduction to Counterfactual ML
kazk1018
5
2.2k
【devsumi2017】人工知能の研究開発チームが プロダクト・組織をどのように変えたのか
kazk1018
8
3.6k
Other Decks in Programming
See All in Programming
実はマルチモーダルだった。ブラウザの組み込みAI🧠でWebの未来を感じてみよう #jsfes #gemini
n0bisuke2
3
1.4k
Denoのセキュリティに関する仕組みの紹介 (toranoana.deno #23)
uki00a
0
240
rack-attack gemによるリクエスト制限の失敗と学び
pndcat
0
210
Unicodeどうしてる? PHPから見たUnicode対応と他言語での対応についてのお伺い
youkidearitai
PRO
0
720
ELYZA_Findy AI Engineering Summit登壇資料_AIコーディング時代に「ちゃんと」やること_toB LLMプロダクト開発舞台裏_20251216
elyza
2
1.1k
Canon EOS R50 V と R5 Mark II 購入でみえてきた最近のデジイチ VR180 事情、そして VR180 静止画に活路を見出すまで
karad
0
140
クラウドに依存しないS3を使った開発術
simesaba80
0
230
コントリビューターによるDenoのすゝめ / Deno Recommendations by a Contributor
petamoriken
0
170
CSC307 Lecture 04
javiergs
PRO
0
640
ZJIT: The Ruby 4 JIT Compiler / Ruby Release 30th Anniversary Party
k0kubun
1
370
メルカリのリーダビリティチームが取り組む、AI時代のスケーラブルな品質文化
cloverrose
2
470
ThorVG Viewer In VS Code
nors
0
700
Featured
See All Featured
Amusing Abliteration
ianozsvald
0
87
4 Signs Your Business is Dying
shpigford
187
22k
Why Our Code Smells
bkeepers
PRO
340
58k
Effective software design: The role of men in debugging patriarchy in IT @ Voxxed Days AMS
baasie
0
200
Fantastic passwords and where to find them - at NoRuKo
philnash
52
3.6k
Lightning Talk: Beautiful Slides for Beginners
inesmontani
PRO
1
420
16th Malabo Montpellier Forum Presentation
akademiya2063
PRO
0
40
From π to Pie charts
rasagy
0
120
"I'm Feeling Lucky" - Building Great Search Experiences for Today's Users (#IAC19)
danielanewman
231
22k
svc-hook: hooking system calls on ARM64 by binary rewriting
retrage
1
69
Leadership Guide Workshop - DevTernity 2021
reverentgeek
1
190
Ecommerce SEO: The Keys for Success Now & Beyond - #SERPConf2024
aleyda
1
1.8k
Transcript
dbtͱBigQueryͰ࢝ΊΔ Data Vaultೖ dbt Tokyo Meetup #3 Kazuki Taniguchi (@Kazk1018)
Introduction • Kazuki Taniguchi (@Kazk1018) • SWE(Data) @ 10X, Inc
• Careers • Data Scientist @ CyberAgent, Inc • Co-founder @ the Babels, Inc • CEO @ ExpData, LLC https:/kazk1018.github.io/
ຊ͓͢Δ͜ͱ • Stailerͷհ • Stailerʹ͓͚ΔσʔλϞσϦϯάͷ՝ • Data Vaultʹ͍ͭͯ • dbtͱBigQueryΛ༻͍ͨData
Vaultʹ͍ͭͯ
Stailer খചࣄۀऀͷσδλϧԽΛ࣮ݱ͢Δͯ͢ͷγεςϜΛϓϥοτϑΥʔ Ϝͱͯ͠ఏڙ ͓٬༷͚ খചࣄۀऀ͚ ૹۀऀ͚
Our Issues খചࣄۀऀຖʹҟͳΔෳͷγεςϜͷσʔλΛ࿈ܞ͢ΔͨΊʹσʔλ ιʔεͷଟ༷ੑ͕ߴ͍ ใ ൢଅ ૹใ ࡏݿใ ձһใ 4UBJMFS%BUB-BLF
Our Issues খചࣄۀऀຖʹҟͳΔෳͷγεςϜͷσʔλΛ࿈ܞ͢ΔͨΊʹσʔλ ιʔεͷଟ༷ੑ͕ߴ͍ ใ ใ 4UBJMFS%BUB-BLF খചࣄۀऀA খചࣄۀऀB ҟͳΔϑΥʔϚοτ
Data Vault • σʔλΣΞϋεʹ͓͚ΔσʔλϞσϦϯάख๏ͷҰͭͰ2000 ʹDaniel (Dan) LinstedtʹΑͬͯఏҊ͞Εͨ • 2014ʹఏҊऀͷϒϩάͰData Vault
2.0͕հ͞Εͨ (ຊൃදͰData Vault 2.0ʹج͍ͮͯઆ໌͠·͢)
Business Objects ӦۀੳऀͷϏδωεϢʔβʔ͕ར༻͢ΔΦϒδΣΫτΛϢχʔΫ ʹಛఆͰ͖ΔϏδωεΩʔΛઃܭ͢Δඞཁ͕͋Δ 0CKFDU #VTJOFTT,FZT 6TFS VTFS*%PS&NBJM 1SPEVDU ݩ
൪߸ 4IPQ ళฮ໊PSاۀ໊ ళฮ໊ Ex)
Data Vaultʹ͓͍ͯγεςϜ͕ੜ͢ΔओͳΧϥϜ System Fields 'JFMET $PMVNOOBNF %FTDSJQUJPO )BTILFZ \PCKFDU^@IBTILFZ %8)Ͱར༻͢ΔΩʔ
ϏδωεΩʔ͔ΒϋογϡΛ༻͍ͯܭࢉ͢Δ -PBE%BUF5JNF4UBNQ MPBE@EUT %8)͕ॳΊͯϏδωεΦϒδΣΫτΛ ֬ೝͨ࣌͠ 3FDPSE4PVSDF SFDPSE@TPVSDF ֨ೲ͞Εͨσʔλͷσʔλιʔε໊
Example: e-Commerce )VC6TFS )VC4IPQ )VC1SPEVDU -JOL0SEFS 4BU0SEFS 4BU6TFS 4BU1SPEVDU )VC
-JOL 4BUFMMJUF
Hub ֤ϏδωεΦϒδΣΫτͷϏδωεΩʔΛอ࣋͢Δςʔϒϧ )VC6TFS VTFS@IBTILFZ VTFS@JE MPBE@EUT SFDPSE@TPVSDF )VC4IPQ TIPQ@IBTILFZ OBNF
MPBE@EUT SFDPSE@TPVSDF )VC1SPEVDU QSPEVDU@IBTILFZ QSPEVDU@OVNCFS MPBE@EUT SFDPSE@TPVSDF
ෳͷϏδωεΦϒδΣΫτͷؔΛอ࣋͢Δςʔϒϧ -JOL0SEFS VTFS@IBTILFZ QSPEVDU@IBTILFZ TIPQ@LFZ MPBE@EUT SFDPSE@TPVSDF Link
Satellite HubLinkΛઆ໌͢ΔͨΊͷɺ͓ΑͼͦͷཤྺΛอ࣋͢Δςʔϒϧ 4BU6TFS VTFS@IBTILFZ fi STU@OBNF MBTU@OBNF MPBE@EUT SFDPSE@TPVSDF 4BU1SPEVDU
QSPEVDU@IBTILFZ OBNF QSJDF MPBE@EUT SFDPSE@TPVSDF 4BU0SEFS PSEFS@IBTILFZ BNPVOU TIJQQJOH@EBUF PSEFS@EBUF MPBE@EUT SFDPSE@TPVSDF
Satellite ͷཤྺΛอ࣋͢Δ(SCD type2)ׂ͕͋ΔͷͰඞཁʹԠͯ࣍͡ͷ System FieldsΛར༻͢Δ 'JFMET $PMVNOOBNF %FTDSJQUJPO )BTI%J f
)BTIEJ f มߋ͞Ε͔ͨͲ͏͔Λൺֱ͢ΔͨΊͷϋογϡ -PBE&OE%BUF5JNF4UBNQ MPBE@FOE@EUT 1,ຖʹ৽͍͕͠ೖ͖ͬͯͨͱ͖ͷ࣌ ಉ͡1,Ͱ࠷৽ͷߦʹ/6--͕ೖ͍ͬͯΔ
Example: e-Commerce )VC6TFS )VC4IPQ )VC1SPEVDU -JOL0SEFS 4BU0SEFS 4BU6TFS 4BU1SPEVDU )VC
-JOL 4BUFMMJUF
Data Vault Pros • ༷ʑͳσʔλιʔε͕૿͍͑ͯ͘߹Ͱ࠷খݶͷมߋͰ࣮͢Δ͜ ͱ͕ՄೳͰ͋Δ • σʔλؒͷ͕ؔมߋ͞Εͯ༰қʹมߋ͕ՄೳͰ͋Δ • DWHʹ͓͍ͯσʔλιʔεͷ͕ՄೳͰ͋Δ
Data Vault Pros )VC6TFS 4BU6TFS )VC -JOL 4BUFMMJUF 4BU$3. ҟͳΔσʔλιʔεΛՃ͢Δ߹SatelliteΛՃ͢Δ͚ͩͰྑ͍
Data Vault Cons • ຊޠͷใ͕গͳ͍ͷͰӳޠΛಡΊΔඞཁ͕͋Δ • ଞͷσʔλϞσϦϯάʹൺͯൣғ͕͍͜ͱ͋Δ͕ɺߏஙͷͨΊ ʹඞཁͱ͢Δ͕ࣝଟ͍ (ຊൃදͰհͰ͖͍ͯΔ༰جຊతͳ෦ ͚ͩͰ͢)
Data Vault @ 10X dbtͱBigQueryΛ༻͍ͯData VaultΛݕূ͍ͯ͠Δ BigQuery dbt BigQuery
• dbtvault • (ৄࡉޙͷൃදΛ͝ௌߨ͍ͩ͘͞) • ࠓճͷݕূͰௐࠪ·ͰͰ࣮ࡍʹར༻͍ͯ͠·ͤΜ Data Vault using dbt
with BigQuery
Data Vault using dbt with BigQuery • dbtͰͷϑϧεΫϥον • dbtvault͕͋ΔΑ͏ʹςϯϓϨʔτͰSQLΛੜͰ͖Δdbt૬ੑ
͕ඇৗʹྑ͍ • MaterializationͷIncrementalΛ༻͍࣮ͯ͢Δ͜ͱ͕Ͱ͖Δ • (Incrementalʹ͍ͭͯޙͷൃදΛ͝ௌߨ͍ͩ͘͞)
Data Vault using dbt with BigQuery • dbtvaultͰϑϧεΫϥονͰجຊͳ࣮͘Ͱ͖Δ • ARRAYSTRUCTͷѻ͍ʹҙ͢Δ
• Data VaultͰଟ༻͞ΕΔhashdistinct͕ѻ͑ͳ͍ • (dbt snapshotͱ༷ͷͱͯ͠ಉ͡)
(ߋʹৄ͍͠ઃܭৄࡉʹ͍ͭͯԼهͷຊΛࢀߟʹ͍ͯͩ͘͠͞) More Information about Data Vault
Summary • 10Xʹ͓͚ΔDWHߏஙͷ՝ • Data Vaultͷجຊతͳ֓೦ • dbtͱBigQueryΛ༻͍ͨData Vault
References • Books • Building a Scalable Data Warehouse with
Data Vault 2.0 • Articles • A short intro to #datavault 2.0