Techniques for Prioritizing Data Preparation
nagai shinya
July 11, 2023
Transcript
1 Techniques for Prioritizing Data Preparation 2023/07/11 Nagai Shinya (@__hiza__)
2 Presenter
• Nagai Shinya (@__hiza__)
• Mercari, Inc. / BI Product Team
• Works in a role close to the analysts, improving the analytics environment.
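3 Today's topic: techniques for prioritizing data preparation work
• Prioritization is essential to data preparation.
• How to gather information
◦ Gather quantitative information (audit log analysis)
◦ Gather qualitative information (interviews)
◦ Which work is the data used for? How important is that work?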
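4 Data usage at Mercari: many users and a wide range of uses
Users: 900+ per month
Datasets: 1,500+
Uses: data analysis, ML, marketing, customer support, and more
The platform is built on BigQuery, dbt, Looker, and similar tools.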
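5 The challenge in data preparation: why prioritization is needed
• Preparation work we want to do
◦ Building analysis-friendly intermediate tables, maintaining Looker, testing the data, building out the data catalog, etc.
• Resource constraints
◦ We cannot prepare everything for 900 users × 1,500 datasets at once.
◦ Everything we build comes with maintenance, so we should not prepare everything.
◦ Prioritization is required.
Not every table can be prepared at once, so prioritization is required.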
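6 A prioritization example
• Case: preparing Looker Explores
◦ We built Looker Explores for four especially important fact tables.
◦ That is just four out of 1,500+ datasets.
• Only four fact tables, yet usage keeps growing
◦ Within […] usage grew to 40 users across roughly 30 teams.
◦ Even with a narrow scope, the work is genuinely useful.
Appropriate prioritization dramatically reduces the cost of data preparation.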
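7 Rough steps for prioritization
1. Gather quantitative information (audit log analysis)
◦ For each table, find out how many people referenced it and how often.
◦ Cross-tabulate with team membership information.
2. Gather qualitative information (internal interviews)
◦ Ask what people are doing with the data.
◦ Ask about use cases that are low in volume but important.
3. Set priorities
◦ Sort out who uses which data for what, and which outcomes it leads to → decide the priorities.
Gather information through log analysis and interviews, then prioritize based on company-wide priorities.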
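8 Quantitative analysis (1): how often each table is referenced → shows which tables are simply used the most
From the audit logs (BigQuery's jobs_by_organization and similar), find out how often each table is referenced.
In Mercari's case, only about 40 tables across the 1,500 datasets were referenced by 10% or more of BQ users.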
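As a rough illustration of the per-table reference count described on this slide, here is a minimal Python sketch. It assumes the audit log has already been exported as simple (user, table) rows; the sample rows, table names, and function names are hypothetical and are not taken from the talk.

```python
from collections import defaultdict

# Hypothetical audit-log rows: (user, referenced table).
# In practice these would be exported from an audit log such as
# BigQuery's jobs_by_organization view; the values here are made up.
SAMPLE_JOBS = [
    ("alice", "sales.orders"),
    ("alice", "sales.orders"),
    ("bob", "sales.orders"),
    ("carol", "search.logs"),
    ("dave", "sales.orders"),
    ("alice", "finance.kpi"),
]


def table_usage(jobs):
    """Return {table: (distinct_users, total_references)}."""
    users = defaultdict(set)
    refs = defaultdict(int)
    for user, table in jobs:
        users[table].add(user)
        refs[table] += 1
    return {t: (len(users[t]), refs[t]) for t in refs}


def widely_used_tables(jobs, min_user_share=0.1):
    """Tables referenced by at least min_user_share of all users in the log."""
    all_users = {user for user, _ in jobs}
    usage = table_usage(jobs)
    return sorted(
        t for t, (n_users, _) in usage.items()
        if n_users / len(all_users) >= min_user_share
    )
```

Calling `widely_used_tables(SAMPLE_JOBS, min_user_share=0.5)` on the sample rows keeps only the table that at least half of the users touched. The default of 0.1 mirrors the "10% or more of users" cut mentioned on the slide.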
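9 Quantitative analysis (2): cross-tabulation with team membership → shows which data a specific team relies on
For a given table, examine which columns were accessed, broken down by team.
The marked column shows that "other teams barely use it, but Team D uses it heavily."
Importance that was invisible in the overall volume becomes visible.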
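The cross-tabulation on this slide could be sketched roughly as follows. This is only an illustration: the (team, column) rows, the column names, and the 70% threshold are assumptions made for the example, and only the "Team D" label comes from the slide.

```python
from collections import Counter

# Hypothetical rows for one table: (team, column accessed).
SAMPLE_ACCESS = [
    ("Team A", "price"), ("Team A", "price"), ("Team B", "price"),
    ("Team C", "price"), ("Team D", "price"),
    ("Team D", "shipping_method"), ("Team D", "shipping_method"),
    ("Team D", "shipping_method"), ("Team A", "shipping_method"),
]


def cross_tab(access):
    """Return {column: Counter({team: access_count})}."""
    table = {}
    for team, column in access:
        table.setdefault(column, Counter())[team] += 1
    return table


def team_specific_columns(access, min_share=0.7):
    """Columns where one team accounts for at least min_share of accesses."""
    result = {}
    for column, counts in cross_tab(access).items():
        team, top = counts.most_common(1)[0]
        if top / sum(counts.values()) >= min_share:
            result[column] = team
    return result
```

On the sample rows, `team_specific_columns(SAMPLE_ACCESS)` flags the one column that is dominated by a single team, which is the kind of signal that total volume alone would hide.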
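10 Qualitative analysis (1): interviewing actual users → understanding use cases that are low in volume but important
• Rough flow of the interviews
◦ From the quantitative information, list the main teams that use the data.
◦ Interview each team and compile the findings.
• What to ask
◦ Ask about uses that are low in volume but important.
▪ Example: only two people use it, and only once a quarter, but they use it to compile the KPIs needed for the earnings announcement.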
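11 Qualitative analysis (2): interviewing actual users → investigating use cases and their purpose
• Ask what work is done with the data, and how that work connects to company-wide outcomes.
Example (data → work → outcome):
◦ Data: search logs are joined with purchase logs for analysis.
◦ Work: A/B test how much purchase CVR changes when the search algorithm is changed.
◦ Outcome: the items customers want become easier to find, so shopping gets easier for customers and company revenue improves.
This makes comparisons possible, such as: "From the standpoint of improving revenue, which work's data is most effective to prepare?"
Prioritization becomes possible only once you understand the outcome (purpose) the work is aiming for.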
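One way to keep interview findings comparable is to record each use case as a data → work → outcome triple and group the triples by outcome. The sketch below is an assumption about how that bookkeeping might look, not the speaker's tooling; the class name, field names, and outcome labels are hypothetical, and the two entries paraphrase examples from the slides.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCase:
    data: str     # which data is used
    work: str     # what work the data supports
    outcome: str  # the company-level outcome the work aims at (label is assumed)


# Entries paraphrase the slides' examples; the outcome labels are hypothetical.
USE_CASES = [
    UseCase("search logs joined with purchase logs",
            "A/B test search algorithm changes against purchase CVR",
            "revenue"),
    UseCase("KPI aggregation tables",
            "compile KPIs for the earnings announcement",
            "financial reporting"),
]


def group_by_outcome(use_cases):
    """Group use cases by the outcome they serve, for side-by-side comparison."""
    grouped = defaultdict(list)
    for use_case in use_cases:
        grouped[use_case.outcome].append(use_case)
    return dict(grouped)
```

Grouping this way only organizes the findings. Deciding which outcome matters most remains a judgment call made against company-wide priorities.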
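12 A useful perspective for judging outcomes
• "Results exist only outside the organization" (Peter Drucker)
◦ Customer value is realized outside the company, and business profit is also realized outside the company.
◦ Something counts as an "outcome" only once it has an impact outside the company.
◦ Do not stop at "How does preparing this data help the work?" Also ask "If that work improves, what impact can we have outside the company?"
By taking on this work, what impact can we have outside the company?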
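13 Summary
• Preparing data requires prioritization.
• Audit log analysis and interviews both help with this.
◦ Audit logs
▪ Show which use cases are simply used the most.
▪ Give a rough idea of whom to interview.
◦ Interviews
▪ Reveal important use cases that do not show up in volume.
▪ Show what work each piece of data is used for.
• Deciding priorities
◦ Priorities can be set only once you understand the flow "data → work → outcome."
◦ The people preparing the data also need to ask, and decide, which outcomes the company should achieve.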