Understanding the Basics of Data Analysis
Piyush Verma
November 13, 2017
Technology
Read more on http://blog.oogway.in
Transcript
ABC of Distributed Data Processing. Achieving Buzzword Compliance.
Piyush Verma, Oogway Consulting
Common thoughts
When will my Data become Big Data?
Hive Data Will Save.
How did we get here?
Data :: Business
Types of Workloads
When do I call it Big Enough?
Why bother with Data Engineering?
Why do analysis at all?
Descriptive
- Historical.
- Deterministic.
- Inferential.
- Managers make pretty graphs.
Predictive
- Future.
- Probabilistic.
- Based on Descriptive.
- This is what armchair critics do.
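The contrast can be made concrete with a small sketch on a hypothetical series of monthly order counts. The descriptive part summarizes what already happened, and for a fixed history it always gives the same answer. The predictive part builds on those same numbers: it fits a least-squares linear trend and extrapolates one month ahead. The data, the function names and the choice of a linear trend are all assumptions for illustration, not the method from the talk.

```python
from statistics import mean

# Hypothetical monthly order counts (illustrative data, not from the talk).
history = [120, 135, 150, 160, 178, 190]

def describe(series):
    """Descriptive: summarize the past. The same history always yields the same summary."""
    return {"total": sum(series), "mean": mean(series), "max": max(series)}

def predict_next(series):
    """Predictive: fit a least-squares line y = a + b*t to the history and
    extrapolate one step ahead. The estimate is only as good as the trend assumption."""
    n = len(series)
    ts = range(n)
    t_bar, y_bar = mean(ts), mean(series)
    slope = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, series)) / sum(
        (t - t_bar) ** 2 for t in ts
    )
    intercept = y_bar - slope * t_bar
    return intercept + slope * n

print(describe(history))
print(round(predict_next(history), 1))
```

With this history the sketch reports a mean of 155.5 orders and predicts about 204.4 for the next month. That prediction is an estimate resting on the linear-trend assumption, not a certainty.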
Prescriptive
Architecture: Round 1
What does data look like?
Storage Choice 1
Storage Choice 2
Challenges: Round 1
Scaling
Archival Policy
Oh no
Garbage / Purging
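Archival and purging policies are often expressed as age thresholds on time-partitioned data. The sketch below makes several assumptions, none from the talk:

- data lives in daily partitions;
- partitions younger than 30 days stay in the primary store;
- partitions up to 365 days old move to archive;
- anything older is purged.

The thresholds and the `classify_partition` name are hypothetical.

```python
from datetime import date, timedelta

HOT_DAYS = 30        # hypothetical: partitions younger than this stay in the primary store
ARCHIVE_DAYS = 365   # hypothetical: older partitions sit in cheap storage until this age

def classify_partition(partition_day: date, today: date) -> str:
    """Return 'keep', 'archive' or 'purge' for a daily partition based on its age in days."""
    age = (today - partition_day).days
    if age < HOT_DAYS:
        return "keep"
    if age < ARCHIVE_DAYS:
        return "archive"
    return "purge"

today = date(2017, 11, 13)
for days_ago in (1, 45, 400):
    day = today - timedelta(days=days_ago)
    print(day.isoformat(), classify_partition(day, today))
```

This design moves or drops whole partitions at once. That typically turns archival and purging into cheap bulk operations instead of row-by-row deletes.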
All related entities end up in complex joins
All relationships grow more complicated over the dimension of time
Anatomy
Challenges: Round 2
Snowflake Schema
Star Schema
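A star schema keeps one central fact table of measures. Each fact row points at denormalized dimension tables. A snowflake schema goes further and normalizes those dimensions into sub-tables, which adds extra joins to every query. The sketch below builds a tiny star schema in an in-memory SQLite database using Python's standard `sqlite3` module. All table names, column names and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: descriptive attributes, one row per entity.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, day TEXT, month TEXT);
    -- Fact table: numeric measures plus a foreign key into each dimension.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id INTEGER REFERENCES dim_date(date_id),
        quantity INTEGER,
        revenue REAL
    );
    INSERT INTO dim_product VALUES (1, 'Tea', 'Beverage'), (2, 'Cake', 'Bakery');
    INSERT INTO dim_date VALUES (1, '2017-11-01', '2017-11'), (2, '2017-12-01', '2017-12');
    INSERT INTO fact_sales VALUES (1, 1, 10, 50.0), (2, 1, 2, 30.0), (1, 2, 5, 25.0);
    """
)

# A typical star-schema query: join the fact table to its dimensions, then aggregate.
rows = conn.execute(
    """
    SELECT p.category, d.month, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_date d ON d.date_id = f.date_id
    GROUP BY p.category, d.month
    ORDER BY p.category, d.month
    """
).fetchall()
print(rows)
```

In a snowflake variant, `category` would move into its own table referenced from `dim_product`, which adds one more join to the query.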
De-Duplication
Bloom Filters, Cuckoo Filters
- Negative answer: does not exist, for sure.
- Positive answer: may or may not exist.
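The two answers on this slide are the two possible outcomes of a Bloom filter membership test. A negative is definitive, while a positive may be a false positive. That property makes the structure useful for de-duplication: a repeated id is always caught, at the cost of occasionally dropping a genuinely new one.

The sketch below is a minimal Bloom filter. It derives its bit positions by double hashing one SHA-256 digest from `hashlib`. The bit-array size, the number of hash functions and the `deduplicate` helper are assumptions chosen for illustration. Cuckoo filters answer membership the same way and additionally support deletion; they are not shown here.

```python
import hashlib

class BloomFilter:
    """Probabilistic set: 'not present' is certain, 'present' may be a false positive."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, item: str):
        # Double hashing: derive k positions from two 64-bit halves of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(item))


def deduplicate(event_ids, bloom):
    """Hypothetical helper: drop ids the filter has (probably) seen before.
    A false positive can wrongly drop a new id; a repeated id is never let through."""
    unique = []
    for event_id in event_ids:
        if not bloom.might_contain(event_id):
            bloom.add(event_id)
            unique.append(event_id)
    return unique

print(deduplicate(["e1", "e2", "e1", "e3", "e2"], BloomFilter()))
```

With 1024 bits and only three distinct ids, a false positive is very unlikely. This call will typically return the three ids in arrival order.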
Slowly Changing Dimensions
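A widely used way to handle a slowly changing dimension is the so-called Type 2 approach. Instead of overwriting an attribute, it closes the current row and appends a new version with its own validity range. Facts can then be joined to the value that was true at the time they happened. The slide does not say which variant the talk uses. The sketch below assumes Type 2, and its field and function names (`valid_from`, `valid_to`, `is_current`, `upsert_scd2`, `city_at`) are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

@dataclass
class CustomerVersion:
    customer_id: int
    city: str
    valid_from: date
    valid_to: Optional[date] = None  # None means "still valid"
    is_current: bool = True

def upsert_scd2(rows: List[CustomerVersion], customer_id: int, city: str, as_of: date) -> None:
    """Record a new attribute value as a new version instead of overwriting history."""
    current = next((r for r in rows if r.customer_id == customer_id and r.is_current), None)
    if current is not None:
        if current.city == city:
            return  # nothing changed, keep the current version
        current.valid_to = as_of
        current.is_current = False
    rows.append(CustomerVersion(customer_id, city, valid_from=as_of))

def city_at(rows: List[CustomerVersion], customer_id: int, when: date) -> Optional[str]:
    """Point-in-time lookup: return the value from the version valid on `when`."""
    for r in rows:
        if r.customer_id == customer_id and r.valid_from <= when and (
            r.valid_to is None or when < r.valid_to
        ):
            return r.city
    return None

dim: List[CustomerVersion] = []
upsert_scd2(dim, 42, "Pune", date(2017, 1, 1))
upsert_scd2(dim, 42, "Bengaluru", date(2017, 6, 1))
print(city_at(dim, 42, date(2017, 3, 15)), city_at(dim, 42, date(2017, 9, 1)))
```

Keeping every version costs more storage, but it keeps historical reports stable when an attribute changes. Overwriting in place would silently rewrite the past.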
Batching vs Streaming
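The two models differ mainly in when computation happens:

- **Batch:** a job computes an aggregate over a complete, bounded dataset in one pass.
- **Streaming:** a job keeps running state and updates it as each event arrives, so a partial answer is always available.

The sketch below computes the same per-page count both ways over a hypothetical list of click events. It shows that the two agree once a complete, in-order stream has been consumed. The event shape and names are assumptions for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical (page, user) click events.
events = [("home", "u1"), ("cart", "u2"), ("home", "u3"), ("home", "u1")]

def batch_counts(all_events: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Batch: see the whole bounded dataset at once and compute the answer in one pass."""
    return dict(Counter(page for page, _ in all_events))

class StreamingCounter:
    """Streaming: keep running state and update it one event at a time."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = defaultdict(int)

    def on_event(self, event: Tuple[str, str]) -> Dict[str, int]:
        page, _ = event
        self.counts[page] += 1
        return dict(self.counts)  # an up-to-date, possibly partial, answer

stream = StreamingCounter()
latest: Dict[str, int] = {}
for e in events:
    latest = stream.on_event(e)

print(batch_counts(events), latest)
```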
Out-of-Order Processing
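Out-of-order processing usually means grouping events by when they happened (event time) rather than when they arrived. It also means deciding how long to wait for stragglers. The sketch below implements tumbling event-time windows driven by a watermark, with several assumptions:

- windows are 10 units wide;
- the watermark trails the largest event time seen by 5 units;
- events for a window the watermark has already passed are dropped.

Real stream processors typically offer other late-data policies, such as routing late events elsewhere instead of discarding them.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

WINDOW = 10            # hypothetical tumbling window size, in event-time units
ALLOWED_LATENESS = 5   # hypothetical: how far the watermark trails the max event time seen

def process(arrivals: List[Tuple[int, str]]):
    """Assign each (event_time, key) to the window containing its event time, emit a
    window once the watermark passes its end, and drop events whose window is closed."""
    open_windows: Dict[int, List[str]] = defaultdict(list)
    emitted: Dict[int, List[str]] = {}
    dropped: List[Tuple[int, str]] = []
    watermark = float("-inf")
    for event_time, key in arrivals:
        start = event_time - event_time % WINDOW
        if start + WINDOW <= watermark:
            dropped.append((event_time, key))  # too late: the watermark already closed this window
            continue
        open_windows[start].append(key)
        watermark = max(watermark, event_time - ALLOWED_LATENESS)
        for s in sorted(open_windows):  # sorted() copies the keys, so popping is safe
            if s + WINDOW <= watermark:
                emitted[s] = open_windows.pop(s)
    emitted.update(open_windows)  # end of input: flush whatever is still open
    return emitted, dropped

arrivals = [(1, "a"), (4, "b"), (12, "c"), (7, "d"), (16, "e"), (3, "f"), (25, "g")]
windows, late = process(arrivals)
print(windows)
print(late)
```

Event `d` (time 7) arrives after `c` (time 12), but it still lands in window 0 because the watermark has only reached 7 at that point. Event `f` (time 3) arrives after the watermark has passed 10, so window 0 has already been emitted and `f` is dropped.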
Cubes
• Efficiency of Retrieval
• Warehouse:Cube :: DB:Table
• View: Dimension + Measure
• Slice, Dice & Rotate
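A cube organizes a measure by its dimensions so that common views are cheap to retrieve. The sketch below models a tiny cube as a dictionary keyed by (region, product, quarter) tuples, with revenue as the measure. It implements three operations:

- **Slice:** fix one dimension to a single value.
- **Dice:** restrict several dimensions to subsets.
- **Rotate:** pivot two dimensions into a rows-by-columns view.

The dimension names and data are hypothetical, and real OLAP engines store and index cubes very differently.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Cell = Tuple[str, str, str]   # coordinates along the dimensions below
DIMS = ("region", "product", "quarter")

# Measure: revenue per cell (hypothetical data).
cube: Dict[Cell, int] = {
    ("north", "tea", "Q1"): 100, ("north", "cake", "Q1"): 40,
    ("south", "tea", "Q1"): 70,  ("south", "tea", "Q2"): 90,
    ("north", "tea", "Q2"): 120, ("south", "cake", "Q2"): 30,
}

def slice_cube(c: Dict[Cell, int], dim: str, value: str) -> Dict[Cell, int]:
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return {k: v for k, v in c.items() if k[i] == value}

def dice_cube(c: Dict[Cell, int], **allowed: Set[str]) -> Dict[Cell, int]:
    """Dice: keep cells whose coordinates fall in the given subsets of several dimensions."""
    return {
        k: v for k, v in c.items()
        if all(k[DIMS.index(d)] in vals for d, vals in allowed.items())
    }

def rotate(c: Dict[Cell, int], rows: str, cols: str) -> Dict[str, Dict[str, int]]:
    """Rotate (pivot): view the measure as a rows x cols table, summing the remaining dimension."""
    r, col = DIMS.index(rows), DIMS.index(cols)
    table: Dict[str, Dict[str, int]] = defaultdict(dict)
    for k, v in c.items():
        table[k[r]][k[col]] = table[k[r]].get(k[col], 0) + v
    return dict(table)

print(slice_cube(cube, "quarter", "Q1"))
print(dice_cube(cube, region={"south"}, product={"tea"}))
print(rotate(cube, "product", "quarter"))
```

Swapping the `rows` and `cols` arguments of `rotate` gives the transposed view of the same data.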
Architecture: Revisited
Sample Solution
Thank you! Piyush Verma @meson10 Oogway Consulting http://oogway.in