Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Understanding the Basics of Data Analysis
Search
Piyush Verma
November 13, 2017
Technology
0
310
Understanding the Basics of Data Analysis
Read more on
http://blog.oogway.in
Piyush Verma
November 13, 2017
Tweet
Share
More Decks by Piyush Verma
See All by Piyush Verma
SLOs that Lie
meson10
0
92
Doing SRE the right way - 2
meson10
0
160
Doing SRE the right way
meson10
0
970
Observability and Control Theory
meson10
1
1k
Reliability
meson10
0
130
Reliability of Distributed Systems
meson10
0
230
My TLS was broken
meson10
0
110
Technology that builds Organizations
meson10
0
110
Namespace.go
meson10
0
130
Other Decks in Technology
See All in Technology
「Chatwork」の認証基盤の移行とログ活用によるプロダクト改善
kubell_hr
1
240
タイミーのデータモデリング事例と今後のチャレンジ
ttccddtoki
4
1.3k
20250625 Snowflake Summit 2025活用事例 レポート / Nowcast Snowflake Summit 2025 Case Study Report
kkuv
1
370
OpenHands🤲にContributeしてみた
kotauchisunsun
1
500
Fabric + Databricks 2025.6 の最新情報ピックアップ
ryomaru0825
1
160
ビギナーであり続ける/beginning
ikuodanaka
1
200
Amazon Bedrockで実現する 新たな学習体験
kzkmaeda
2
680
生成AIで小説を書くためにプロンプトの制約や原則について学ぶ / prompt-engineering-for-ai-fiction
nwiizo
4
3.4k
モバイル界のMCPを考える
naoto33
0
350
生成AI開発案件におけるClineの業務活用事例とTips
shinya337
0
180
生成AI時代 文字コードを学ぶ意義を見出せるか?
hrsued
1
730
プロダクトエンジニアリング組織への歩み、その現在地 / Our journey to becoming a product engineering organization
hiro_torii
0
140
Featured
See All Featured
Thoughts on Productivity
jonyablonski
69
4.7k
jQuery: Nuts, Bolts and Bling
dougneiner
63
7.8k
ReactJS: Keep Simple. Everything can be a component!
pedronauck
667
120k
A designer walks into a library…
pauljervisheath
207
24k
Building Adaptive Systems
keathley
43
2.6k
How STYLIGHT went responsive
nonsquared
100
5.6k
Refactoring Trust on Your Teams (GOTO; Chicago 2020)
rmw
34
3.1k
Side Projects
sachag
455
42k
Designing for humans not robots
tammielis
253
25k
Building Flexible Design Systems
yeseniaperezcruz
328
39k
Making the Leap to Tech Lead
cromwellryan
134
9.4k
A better future with KSS
kneath
239
17k
Transcript
ABC of Distributed Data Processing. Achieving Buzzword Compliance. 1 Piyush
Verma Oogway Consulting
Common thoughts 2
When will my Data become Big Data?
Hive Data Will Save.
How did we reach here? 5
Data :: Business
Data :: Business
Types of Workload
When do I call it Big Enough?
Why bother with Data Engineering? 10
Why do analysis at all?
Descriptive - Historical. - Deterministic. - Inferential. - Managers make
pretty graphs.
Predictive - Future. - Probabilistic. - Based on Descriptive. -
This is what armchair critics do.
Prescriptive
Architecture: Round 1 15
What does data look like?
Storage Choice 1
Storage Choice 2
Challenges: Round 1 19
Scaling
Archival Policy
Oh no
Garbage / Purging
All related entities end up in complex joins
All Relationships complicate over Dimension of time
Anatomy 26
Anatomy
Challenges: Round 2 28
Snowflake Schema
Star Schema
De-Duplication
Bloom Filters Cuckoo Filters - Does not exist for sure.
- May or may not exist.
Slow Changing Dimensions
Batching vs Streaming
Out-of-Order Processing
Cubes • Efficiency of Retrieval • Warehouse:Cube :: DB:Table •
View: Dimension + Measure • Slice, Dice & Rotate
Architecture: Revisited 37
Sample Solution
Thank you! Piyush Verma @meson10 Oogway Consulting http://oogway.in