Getting Started with Data Vault Using dbt and BigQuery (dbtとBigQueryで始めるData Vault入門)
Kazuki Taniguchi
May 10, 2022
Programming
Slides from my talk at dbt Tokyo Meetup #3.
A recording of the talk is available here:
https://youtu.be/SYsiRFR2LGw
#dbt_tokyo
Transcript
Getting Started with Data Vault Using dbt and BigQuery: dbt Tokyo Meetup #3, Kazuki Taniguchi (@Kazk1018)
Introduction • Kazuki Taniguchi (@Kazk1018) • SWE(Data) @ 10X, Inc
• Careers • Data Scientist @ CyberAgent, Inc • Co-founder @ the Babels, Inc • CEO @ ExpData, LLC https://kazk1018.github.io/
Today's Topics • Introducing Stailer • Data modeling challenges at Stailer • About Data Vault • Data Vault with dbt and BigQuery
Stailer: provides, as a platform, all the systems retailers need to go digital • For customers • For retailers • For delivery partners
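Our Issues: Because we integrate data from multiple systems that differ from retailer to retailer, our data sources are highly diverse. (Diagram: promotion, delivery, inventory, membership and other information flowing into the Stailer Data Lake; the same kind of information from Retailer A and Retailer B arrives in different formats.)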
Data Vault • A data modeling methodology for data warehouses, proposed by Daniel (Dan) Linstedt in 2000 • In 2014, Data Vault 2.0 was introduced on its creator's blog (this talk is based on Data Vault 2.0)
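Business Objects: Business keys must be designed so that they uniquely identify the objects that business users, such as sales staff and analysts, work with.
Examples (Object: Business keys)
• User: user ID or email
• Product: product number
• Shop: store name or company name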
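System Fields: the main columns the system generates in Data Vault
• Hash key ({object}_hashkey): the key used in the DWH, computed by hashing the business key
• Load Date Time Stamp (load_dts): the time at which the DWH first saw the business object
• Record Source (record_source): the name of the data source the stored data came from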
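The hash key described above can be sketched in Python as follows. The normalization (trimming and upper-casing), the `||` delimiter for composite keys and the choice of MD5 are common Data Vault 2.0 conventions assumed here, not details from the talk; `hash_key`, `system_fields` and the source name `ec_site` are hypothetical.

```python
import hashlib
from datetime import datetime, timezone


def hash_key(*business_key_parts: str, delimiter: str = "||") -> str:
    """Compute a hex MD5 hash key from one or more business key parts.

    Each part is trimmed and upper-cased so that cosmetic differences
    between source systems do not produce different keys.
    """
    normalized = [part.strip().upper() for part in business_key_parts]
    return hashlib.md5(delimiter.join(normalized).encode("utf-8")).hexdigest()


def system_fields(business_key: str, record_source: str) -> dict:
    """Build the three system fields for a newly seen business object."""
    return {
        "hashkey": hash_key(business_key),
        "load_dts": datetime.now(timezone.utc),  # when the DWH first saw the key
        "record_source": record_source,          # name of the originating source
    }


# The same user ID with different casing/whitespace yields the same hash key.
print(hash_key("u-001") == hash_key("  U-001 "))  # True
```

Joining parts with a delimiter keeps composite keys such as ("a", "b") distinct from a single key "ab".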
Example: e-Commerce (diagram of Hubs, a Link and Satellites) • Hubs: HubUser, HubShop, HubProduct • Link: LinkOrder • Satellites: SatUser, SatProduct, SatOrder
Hub: a table that holds the business key of each business object
• HubUser: user_hashkey, user_id, load_dts, record_source
• HubShop: shop_hashkey, name, load_dts, record_source
• HubProduct: product_hashkey, product_number, load_dts, record_source
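A minimal sketch of loading a Hub, assuming an in-memory dict stands in for the HubUser table and a trim/upper-case + MD5 hash key (a common Data Vault 2.0 convention, not taken from the talk). It illustrates that Hubs are insert-only: a business key is added once, and the load_dts and record_source of that first sighting are never overwritten. `load_hub_user` and the source names `ec_site`/`crm` are hypothetical.

```python
import hashlib
from datetime import datetime


def hash_key(business_key: str) -> str:
    # Assumed normalization: trim + upper-case, then MD5 as hex.
    return hashlib.md5(business_key.strip().upper().encode("utf-8")).hexdigest()


def load_hub_user(hub: dict, staged_rows: list, load_dts: datetime, record_source: str) -> int:
    """Insert-only load into a HubUser-like dict keyed by user_hashkey.

    Existing keys are skipped, so the first load_dts and record_source that
    saw a business key are kept. Returns the number of new business keys.
    """
    inserted = 0
    for row in staged_rows:
        key = hash_key(row["user_id"])
        if key in hub:
            continue  # already known business key: nothing to update
        hub[key] = {
            "user_hashkey": key,
            "user_id": row["user_id"],
            "load_dts": load_dts,
            "record_source": record_source,
        }
        inserted += 1
    return inserted


hub_user = {}
print(load_hub_user(hub_user, [{"user_id": "u-001"}, {"user_id": "u-002"}], datetime(2022, 5, 1), "ec_site"))  # 2
print(load_hub_user(hub_user, [{"user_id": "u-001"}, {"user_id": "u-003"}], datetime(2022, 5, 2), "crm"))  # 1
```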
Link: a table that holds the relationships between multiple business objects
• LinkOrder: user_hashkey, product_hashkey, shop_hashkey, load_dts, record_source
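A sketch of loading LinkOrder under stated assumptions. The slide lists only the hub hash keys, so the link's own hash key (called `order_hashkey` here to match SatOrder) is an assumption; as is common in Data Vault 2.0, it is computed from the business keys of the connected hubs in a fixed order. The staged field names (such as `shop_name`) and `load_link_order` are hypothetical.

```python
import hashlib
from datetime import datetime


def hash_key(*parts: str, delimiter: str = "||") -> str:
    # Assumed convention: trim + upper-case each part, join, then MD5 as hex.
    normalized = [p.strip().upper() for p in parts]
    return hashlib.md5(delimiter.join(normalized).encode("utf-8")).hexdigest()


def load_link_order(link: dict, staged_orders: list, load_dts: datetime, record_source: str) -> int:
    """Insert-only load of LinkOrder; returns the number of new relationships."""
    inserted = 0
    for order in staged_orders:
        # The link key combines the business keys of every connected hub, always in the same order.
        order_hk = hash_key(order["user_id"], order["product_number"], order["shop_name"])
        if order_hk in link:
            continue  # relationship already recorded; links are never updated
        link[order_hk] = {
            "order_hashkey": order_hk,
            "user_hashkey": hash_key(order["user_id"]),
            "product_hashkey": hash_key(order["product_number"]),
            "shop_hashkey": hash_key(order["shop_name"]),
            "load_dts": load_dts,
            "record_source": record_source,
        }
        inserted += 1
    return inserted


orders = [{"user_id": "u-001", "product_number": "p-10", "shop_name": "Shop A"}]
link_order = {}
print(load_link_order(link_order, orders, datetime(2022, 5, 1), "ec_site"))  # 1
print(load_link_order(link_order, orders, datetime(2022, 5, 2), "ec_site"))  # 0
```

Because each hub hash key is derived from the business key alone, the link can be loaded independently of the hubs it points to.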
Satellite: a table that holds the attributes describing a Hub or Link, along with their history
• SatUser: user_hashkey, first_name, last_name, load_dts, record_source
• SatProduct: product_hashkey, name, price, load_dts, record_source
• SatOrder: order_hashkey, amount, shipping_date, order_date, load_dts, record_source
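Because Satellite rows are appended rather than updated, the attributes of a business object at any point in time can be recovered from its history. A minimal sketch, assuming a list of dicts stands in for SatUser; the hash key value `hk-user-1`, the names and `attributes_as_of` are hypothetical.

```python
from datetime import datetime
from typing import Optional

# Append-only SatUser history for one (hypothetical) user hash key.
sat_user = [
    {"user_hashkey": "hk-user-1", "first_name": "Taro", "last_name": "Yamada",
     "load_dts": datetime(2022, 5, 1), "record_source": "ec_site"},
    {"user_hashkey": "hk-user-1", "first_name": "Taro", "last_name": "Suzuki",
     "load_dts": datetime(2022, 5, 10), "record_source": "ec_site"},
]


def attributes_as_of(sat_rows: list, hashkey: str, as_of: datetime) -> Optional[dict]:
    """Return the satellite row that was current for `hashkey` at `as_of`.

    The current row is the one with the latest load_dts not after `as_of`;
    None means no attributes had been loaded for the object at that time.
    """
    candidates = [r for r in sat_rows if r["user_hashkey"] == hashkey and r["load_dts"] <= as_of]
    return max(candidates, key=lambda r: r["load_dts"], default=None)


print(attributes_as_of(sat_user, "hk-user-1", datetime(2022, 5, 5))["last_name"])   # Yamada
print(attributes_as_of(sat_user, "hk-user-1", datetime(2022, 5, 20))["last_name"])  # Suzuki
```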
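Satellite: because a Satellite keeps the history of values (SCD Type 2), use the following system fields as needed
• Hash Diff (hashdiff): a hash used to check whether the values have changed
• Load End Date Time Stamp (load_end_dts): the time at which a new value arrived for the same PK; the latest row for each PK holds NULL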
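The two optional fields can be sketched together for SatProduct: a new row is appended only when the hashdiff of its descriptive columns differs from the latest row for the same hash key, and load_end_dts is derived afterwards as the next load_dts for that key, with None standing in for NULL on the latest row. The trim-and-join MD5 hashdiff, the functions `load_sat_product`/`with_load_end_dts` and the source name `pos` are assumptions for illustration.

```python
import hashlib
from datetime import datetime

DESCRIPTIVE_COLUMNS = ["name", "price"]  # SatProduct attributes covered by the hashdiff


def hash_diff(record: dict) -> str:
    # Serialize the descriptive columns in a fixed order (None -> empty string) and hash them.
    payload = "||".join(
        "" if record.get(col) is None else str(record[col]).strip()
        for col in DESCRIPTIVE_COLUMNS
    )
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def load_sat_product(sat: list, staged: list, load_dts: datetime, record_source: str) -> int:
    """Append rows whose hashdiff differs from the latest row per hash key."""
    latest = {}
    for row in sat:
        key = row["product_hashkey"]
        if key not in latest or row["load_dts"] > latest[key]["load_dts"]:
            latest[key] = row
    inserted = 0
    for record in staged:
        key = record["product_hashkey"]
        diff = hash_diff(record)
        if key in latest and latest[key]["hashdiff"] == diff:
            continue  # unchanged attributes: no new history row
        new_row = {"product_hashkey": key, "name": record["name"], "price": record["price"],
                   "hashdiff": diff, "load_dts": load_dts, "record_source": record_source}
        sat.append(new_row)
        latest[key] = new_row
        inserted += 1
    return inserted


def with_load_end_dts(sat: list) -> list:
    """Return copies of the rows with load_end_dts = next load_dts for the same key, else None."""
    by_key = {}
    for row in sat:
        by_key.setdefault(row["product_hashkey"], []).append(row)
    result = []
    for rows in by_key.values():
        rows.sort(key=lambda r: r["load_dts"])
        for current, following in zip(rows, rows[1:] + [None]):
            result.append({**current, "load_end_dts": following["load_dts"] if following else None})
    return result


sat_product = []
load_sat_product(sat_product, [{"product_hashkey": "hk-p1", "name": "Apple", "price": 100}], datetime(2022, 5, 1), "pos")
load_sat_product(sat_product, [{"product_hashkey": "hk-p1", "name": "Apple", "price": 100}], datetime(2022, 5, 2), "pos")  # unchanged
load_sat_product(sat_product, [{"product_hashkey": "hk-p1", "name": "Apple", "price": 120}], datetime(2022, 5, 3), "pos")
for row in with_load_end_dts(sat_product):
    print(row["price"], row["load_dts"], row["load_end_dts"])
```

Comparing one hash instead of every column keeps change detection cheap as Satellites grow wider.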
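Data Vault Pros • Even as the number of data sources grows, new ones can be integrated with minimal changes • Changes to the relationships between data are easy to accommodate • Data sources remain traceable within the DWH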
Data Vault Pros: When you add a different data source, you only need to add a Satellite. (Diagram: HubUser with its existing SatUser plus a new SatCRM.)
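The idea can be sketched with plain dicts standing in for the tables: a hypothetical CRM source gets its own satellite keyed by the existing user_hashkey, while the Hub and the original Satellite are left untouched. For brevity each satellite here holds only the current row per key, a simplification of real Satellite history; `user_view` and the `segment` attribute are hypothetical.

```python
hub_user = {"hk-user-1": {"user_id": "u-001"}}
sat_user = {"hk-user-1": {"first_name": "Taro", "last_name": "Yamada"}}

# New data source: only this satellite is added; hub_user and sat_user are unchanged.
sat_crm = {"hk-user-1": {"segment": "loyal"}}  # hypothetical CRM attribute


def user_view(hub: dict, *satellites: dict) -> list:
    """Join a hub with any number of satellites on the shared hash key."""
    rows = []
    for hashkey, hub_row in hub.items():
        row = {"user_hashkey": hashkey, **hub_row}
        for sat in satellites:
            row.update(sat.get(hashkey, {}))  # a missing satellite row adds no columns
        rows.append(row)
    return rows


print(user_view(hub_user, sat_user, sat_crm)[0]["segment"])  # loyal
```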
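Data Vault Cons • Little information is available in Japanese, so you need to be able to read English • Its scope can be broader than that of other data modeling approaches, and building one requires a lot of knowledge (this talk covers only the basics)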
Data Vault @ 10X: We are evaluating Data Vault using dbt and BigQuery (diagram: BigQuery → dbt → BigQuery)
Data Vault using dbt with BigQuery • dbtvault • (For details, please see the later talk) • In this evaluation we only researched it and did not actually use it
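Data Vault using dbt with BigQuery • Building from scratch with dbt • As dbtvault demonstrates, Data Vault is a very good fit for dbt, which can generate SQL from templates • It can be implemented with the incremental materialization • (For incremental models, please see the later talk)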
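To illustrate why template-based SQL generation suits Data Vault, here is a Python stand-in for what a dbt Jinja macro might render for any Hub: the hub name, business key and tables are parameters, and an anti-join against the existing hub is appended only for incremental runs, mirroring the role of dbt's incremental materialization. This is neither dbt's nor dbtvault's API; `render_hub_sql`, the table names and the hashing expression are assumptions.

```python
def render_hub_sql(hub_name: str, business_key: str, source_table: str,
                   record_source: str, target_table: str, incremental: bool) -> str:
    """Render BigQuery SQL that loads new business keys into a Hub.

    All arguments are assumed to be trusted identifiers; nothing is escaped.
    """
    hashkey = f"{hub_name}_hashkey"
    sql = f"""WITH staged AS (
  SELECT DISTINCT
    TO_HEX(MD5(UPPER(TRIM(CAST({business_key} AS STRING))))) AS {hashkey},
    {business_key}
  FROM `{source_table}`
  WHERE {business_key} IS NOT NULL
)
SELECT {hashkey}, {business_key}, CURRENT_TIMESTAMP() AS load_dts, '{record_source}' AS record_source
FROM staged"""
    if incremental:
        # Incremental runs only insert keys the hub has not seen yet.
        sql += f"\nWHERE {hashkey} NOT IN (SELECT {hashkey} FROM `{target_table}`)"
    return sql


print(render_hub_sql("user", "user_id", "staging.users", "ec_site", "dwh.hub_user", incremental=True))
```

The same function renders HubShop or HubProduct by changing only its arguments, which is the property that makes templating attractive for the many structurally identical tables in a Data Vault.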
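Data Vault using dbt with BigQuery • Basic implementations can be built either with dbtvault or from scratch • Be careful when handling ARRAY and STRUCT • The hash functions and DISTINCT that Data Vault relies on heavily cannot be applied to them • (The same applies to dbt snapshot)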
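One common workaround for nested values is to serialize them to a canonical string before hashing or de-duplicating (in BigQuery, for example, with TO_JSON_STRING). The Python sketch below shows the idea with nested dicts and lists, which, like ARRAY/STRUCT columns, cannot be hashed or put in a set directly. Sorting object keys is a choice made here; array order is deliberately kept significant. `canonical`, `nested_hash` and `distinct_rows` are hypothetical names.

```python
import hashlib
import json


def canonical(value) -> str:
    # Sorted keys and fixed separators so logically equal nested values serialize identically.
    return json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def nested_hash(value) -> str:
    """MD5 hex digest of the canonical serialization (usable as a hashdiff input)."""
    return hashlib.md5(canonical(value).encode("utf-8")).hexdigest()


def distinct_rows(rows: list) -> list:
    """DISTINCT for rows containing nested values, keeping the first occurrence."""
    seen = set()
    result = []
    for row in rows:
        key = canonical(row)
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


a = {"tags": ["sale", "new"], "address": {"city": "Tokyo", "zip": "100-0001"}}
b = {"address": {"zip": "100-0001", "city": "Tokyo"}, "tags": ["sale", "new"]}
print(nested_hash(a) == nested_hash(b))  # True
print(len(distinct_rows([a, b])))  # 1
```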
More Information about Data Vault (for more detailed design guidance, please refer to the book below)
Summary • Challenges of building a DWH at 10X • Basic concepts of Data Vault • Data Vault with dbt and BigQuery
References • Books: Building a Scalable Data Warehouse with Data Vault 2.0 • Articles: A short intro to #datavault 2.0