Understanding the Basics of Data Analysis
Piyush Verma
November 13, 2017
Read more on
http://blog.oogway.in
Transcript
ABC of Distributed Data Processing. Achieving Buzzword Compliance.
Piyush Verma, Oogway Consulting
Common thoughts
When will my Data become Big Data?
Hive Data Will Save.
How did we reach here?
Data :: Business
Types of Workload
When do I call it Big Enough?
Why bother with Data Engineering?
Why do analysis at all?
Descriptive
- Historical.
- Deterministic.
- Inferential.
- Managers make pretty graphs.
Predictive
- Future.
- Probabilistic.
- Based on Descriptive.
- This is what armchair critics do.
Prescriptive
Architecture: Round 1
What does data look like?
Storage Choice 1
Storage Choice 2
Challenges: Round 1
Scaling
Archival Policy
Oh no
Garbage / Purging
All related entities end up in complex joins
All relationships grow more complicated over the dimension of time
Anatomy
Challenges: Round 2
Snowflake Schema
Star Schema
De-Duplication
Bloom Filters, Cuckoo Filters
- Does not exist for sure.
- May or may not exist.
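Those two answers are the whole contract of a Bloom filter when it is used for de-duplication: a lookup either says the item was definitely never added, or that it may have been added. A minimal sketch in Python, assuming a fixed-size bit array and salted SHA-256 hashes. The class name, the sizes, and the event IDs are illustrative and not taken from the deck.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: a bit array plus k salted hash positions per item."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch readable; real filters pack bits.
        self.bits = bytearray(size_bits)

    def _positions(self, item):
        # Derive k positions by salting a single hash function with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False: the item was definitely never added.
        # True: the item may have been added (false positives are possible).
        return all(self.bits[pos] for pos in self._positions(item))


seen = BloomFilter()
for event_id in ["evt-1", "evt-2", "evt-3"]:
    seen.add(event_id)

print(seen.might_contain("evt-2"))    # True: added items are never reported absent
print(seen.might_contain("evt-999"))  # usually False; True would be a false positive
```

A "may exist" answer still needs a check against the real store before a record is discarded as a duplicate. A Cuckoo filter offers the same two answers and additionally supports deletion.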
Slowly Changing Dimensions
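The slide names the problem without picking a strategy. One common way to handle it is to keep every version of a dimension row with a validity range (often called a "Type 2" approach). The sketch below assumes that approach; the function names, the row layout, and the customer data are hypothetical.

```python
from datetime import date


def apply_change(history, key, new_attrs, effective):
    """Close the current row for `key` if its attributes changed, then append a new version."""
    current = next(
        (row for row in history if row["key"] == key and row["valid_to"] is None),
        None,
    )
    if current is not None:
        if current["attrs"] == new_attrs:
            return history  # nothing changed, keep the open row
        current["valid_to"] = effective  # close the old version
    history.append(
        {"key": key, "attrs": dict(new_attrs), "valid_from": effective, "valid_to": None}
    )
    return history


def as_of(history, key, when):
    """Return the attributes that were valid for `key` on the date `when`."""
    for row in history:
        ended = row["valid_to"]
        if row["key"] == key and row["valid_from"] <= when and (ended is None or when < ended):
            return row["attrs"]
    return None


history = []
apply_change(history, "cust-1", {"city": "Pune"}, date(2017, 1, 1))
apply_change(history, "cust-1", {"city": "Bangalore"}, date(2017, 6, 1))

print(as_of(history, "cust-1", date(2017, 3, 1)))    # {'city': 'Pune'}
print(as_of(history, "cust-1", date(2017, 11, 13)))  # {'city': 'Bangalore'}
```

Keeping the old row means a fact recorded in March still joins to the city that was true in March, which is the point of tracking the dimension over time.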
Batching vs Streaming
Out-of-Order Processing
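Out-of-order processing usually means grouping events by the time they happened rather than the time they arrived, and deciding how long to wait for stragglers. The sketch below assumes tumbling event-time windows and a simple watermark that trails the largest event time seen so far by a fixed lateness. The function name, parameters, and events are hypothetical and not from the deck.

```python
from collections import defaultdict


def window_counts(events, window_secs=60, allowed_lateness=30):
    """Count events per event-time window; drop events whose window has already closed."""
    counts = defaultdict(int)
    dropped = []
    watermark = float("-inf")
    for event_time, payload in events:  # events are iterated in arrival order
        # The watermark trails the largest event time seen so far.
        watermark = max(watermark, event_time - allowed_lateness)
        window_start = event_time - event_time % window_secs
        window_end = window_start + window_secs
        if window_end <= watermark:
            dropped.append((event_time, payload))  # too late, window is closed
            continue
        counts[window_start] += 1
    return dict(counts), dropped


# (event_time_in_seconds, payload), listed in arrival order
arrivals = [(5, "a"), (65, "b"), (50, "c"), (130, "d"), (10, "e")]
counts, dropped = window_counts(arrivals)

print(counts)   # {0: 2, 60: 1, 120: 1}
print(dropped)  # [(10, 'e')]
```

Event "c" arrives after "b" but still lands in the first window, because the watermark has not yet passed that window's end. Event "e" arrives after the watermark has moved past it and is dropped.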
Cubes
• Efficiency of Retrieval
• Warehouse:Cube :: DB:Table
• View: Dimension + Measure
• Slice, Dice & Rotate
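The cube operations can be sketched over a handful of fact rows. This is a minimal illustration in plain Python, not how a warehouse engine implements cubes. The rows, the dimension names (region, product, quarter), and the helper functions are all hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: three dimensions and one measure (sales).
facts = [
    {"region": "north", "product": "tea", "quarter": "Q1", "sales": 10},
    {"region": "north", "product": "coffee", "quarter": "Q1", "sales": 20},
    {"region": "south", "product": "tea", "quarter": "Q1", "sales": 5},
    {"region": "south", "product": "tea", "quarter": "Q2", "sales": 15},
    {"region": "north", "product": "tea", "quarter": "Q2", "sales": 30},
]


def view(rows, dims, measure="sales"):
    """View: sum one measure, grouped by the chosen dimensions."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row[measure]
    return dict(totals)


def slice_rows(rows, dim, value):
    """Slice: fix one dimension to a single value."""
    return [row for row in rows if row[dim] == value]


def dice_rows(rows, **allowed):
    """Dice: keep rows whose dimensions fall inside the allowed sets."""
    return [row for row in rows if all(row[d] in vals for d, vals in allowed.items())]


by_region_product = view(facts, ["region", "product"])
rotated = view(facts, ["product", "region"])  # rotate: same cells, axes swapped
q1_by_region = view(slice_rows(facts, "quarter", "Q1"), ["region"])
north_tea = view(dice_rows(facts, region={"north"}, product={"tea"}), ["quarter"])

print(by_region_product)  # {('north', 'tea'): 40, ('north', 'coffee'): 20, ('south', 'tea'): 20}
print(q1_by_region)       # {('north',): 30, ('south',): 5}
print(north_tea)          # {('Q1',): 10, ('Q2',): 30}
```

The retrieval efficiency a cube promises comes from precomputing aggregates like these ahead of time, so a query reads a stored total instead of scanning the fact rows.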
Architecture: Revisited
Sample Solution
Thank you! Piyush Verma @meson10 Oogway Consulting http://oogway.in