Understanding the Basics of Data Analysis
Piyush Verma
November 13, 2017
Technology
Read more on
http://blog.oogway.in
Transcript
ABC of Distributed Data Processing: Achieving Buzzword Compliance
Piyush Verma, Oogway Consulting
Common thoughts
When will my Data become Big Data?
Hive Data Will Save.
How did we reach here?
Data :: Business
Types of Workload
When do I call it Big Enough?
Why bother with Data Engineering?
Why do analysis at all?
Descriptive
- Historical.
- Deterministic.
- Inferential.
- Managers make pretty graphs.
Predictive
- Future.
- Probabilistic.
- Based on Descriptive.
- This is what armchair critics do.
Prescriptive
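To make the descriptive/predictive split concrete, here is a small Python sketch over made-up daily order counts. `describe` only summarises the past; `predict_next` extrapolates a least-squares trend line built from those same descriptive statistics. The data, the function names, and the choice of a linear trend are illustrative assumptions, not the deck's method, and a real predictive model would also report its uncertainty rather than a single point estimate.

```python
from statistics import mean

# Hypothetical daily order counts (illustrative data, not from the deck).
history = [120, 130, 125, 140, 150]


def describe(values):
    """Descriptive: summarise what already happened."""
    return {"count": len(values), "mean": mean(values), "max": max(values)}


def predict_next(values):
    """Predictive: extrapolate one step ahead with a least-squares line.

    The fit is built on descriptive statistics (the means of x and y).
    """
    xs = range(len(values))
    x_bar, y_bar = mean(xs), mean(values)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return intercept + slope * len(values)


print(describe(history))
print(predict_next(history))
```

With these numbers the fitted slope is 7 orders per day, so the one-step forecast is 154.0.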
Architecture: Round 1
What does data look like?
Storage Choice 1
Storage Choice 2
Challenges: Round 1
Scaling
Archival Policy
Oh no
Garbage / Purging
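One common way to keep archival and purging cheap is to partition data by day and act on whole partitions instead of deleting individual rows. The sketch below assumes hypothetical thresholds (archive after 30 days, purge after a year) and a hypothetical `classify_partitions` helper; the deck does not state a policy.

```python
from datetime import date, timedelta

# Hypothetical retention thresholds; the deck does not specify a policy.
ARCHIVE_AFTER = timedelta(days=30)
PURGE_AFTER = timedelta(days=365)


def classify_partitions(partition_dates, today):
    """Sort daily partitions into keep / archive / purge buckets by age."""
    plan = {"keep": [], "archive": [], "purge": []}
    for day in sorted(partition_dates):
        age = today - day
        if age > PURGE_AFTER:
            plan["purge"].append(day)    # past retention: drop the whole partition
        elif age > ARCHIVE_AFTER:
            plan["archive"].append(day)  # cold: move to cheaper storage
        else:
            plan["keep"].append(day)     # hot: leave in the primary store
    return plan


today = date(2017, 11, 13)
partitions = [today - timedelta(days=d) for d in (1, 45, 400)]
print(classify_partitions(partitions, today))
```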
All related entities end up in complex joins
All relationships become more complicated over the dimension of time
Anatomy
Challenges: Round 2
Snowflake Schema
Star Schema
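A rough illustration of the difference, using plain Python dicts as stand-in tables (all table names and rows are hypothetical): in the star schema the product dimension carries its category directly, while the snowflake schema normalises category into its own table, so every query pays an extra lookup (join) hop. Both produce the same totals.

```python
# Hypothetical fact table: one row per sale.
sales = [
    {"product_id": 1, "amount": 25.0},
    {"product_id": 2, "amount": 40.0},
    {"product_id": 1, "amount": 15.0},
]

# Star schema: the product dimension is one denormalised table.
star_product = {
    1: {"name": "Pen", "category": "Stationery"},
    2: {"name": "Mug", "category": "Kitchen"},
}

# Snowflake schema: the same dimension normalised into product + category.
snow_product = {
    1: {"name": "Pen", "category_id": 100},
    2: {"name": "Mug", "category_id": 200},
}
snow_category = {100: "Stationery", 200: "Kitchen"}


def revenue_by_category_star(facts):
    totals = {}
    for row in facts:
        category = star_product[row["product_id"]]["category"]  # one lookup
        totals[category] = totals.get(category, 0.0) + row["amount"]
    return totals


def revenue_by_category_snowflake(facts):
    totals = {}
    for row in facts:
        category_id = snow_product[row["product_id"]]["category_id"]
        category = snow_category[category_id]  # extra hop through a second table
        totals[category] = totals.get(category, 0.0) + row["amount"]
    return totals


print(revenue_by_category_star(sales))
print(revenue_by_category_snowflake(sales))
```

The trade-off shown is the usual one: the star layout repeats category names but answers with fewer joins; the snowflake layout stores each category once at the cost of the extra hop.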
De-Duplication
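The simplest form of de-duplication keeps a set of keys already seen. This sketch assumes each event carries a unique `event_id` field (a hypothetical name). The set grows with the number of distinct keys, which is the usual motivation for probabilistic structures such as Bloom filters at larger scale.

```python
def deduplicate(events, key=lambda event: event["event_id"]):
    """Yield each event once, keeping the first occurrence of every key."""
    seen = set()  # grows with the number of distinct keys
    for event in events:
        k = key(event)
        if k not in seen:
            seen.add(k)
            yield event


events = [
    {"event_id": "a", "value": 1},
    {"event_id": "b", "value": 2},
    {"event_id": "a", "value": 1},  # redelivered duplicate
]
print(list(deduplicate(events)))
```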
Bloom Filters, Cuckoo Filters
- "No" means the item does not exist, for sure.
- "Yes" means it may or may not exist.
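A minimal Bloom filter sketch in Python, to show where the asymmetric guarantee comes from: an item is reported present only if all of its k bit positions are set, so a negative answer is certain, while a positive one can be a false positive caused by bits that other items set. The bit-array size, the hash count, and deriving positions from one SHA-256 digest by double hashing are illustrative choices. Cuckoo filters give the same kind of answer and additionally support deletion; they are not shown here.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: a bit array plus k hash positions per item."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive k positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        """False means definitely absent; True means possibly present."""
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


seen = BloomFilter()
for user in ("alice", "bob"):
    seen.add(user)
print(seen.might_contain("alice"))  # True: added items are always reported
print(seen.might_contain("carol"))  # usually False; True would be a false positive
```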
Slowly Changing Dimensions
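One widely used treatment is what Kimball calls Type 2: instead of overwriting a changed attribute, close the current row with an end date and append a new version, so history stays queryable as of any date. A minimal in-memory sketch; the `CustomerVersion` record, its fields, and both helpers are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CustomerVersion:
    customer_id: int
    city: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_change(history: List[CustomerVersion], customer_id: int,
                 new_city: str, change_date: date) -> None:
    """Type 2 update: close the current version and append a new one."""
    for row in history:
        if row.customer_id == customer_id and row.valid_to is None:
            if row.city == new_city:
                return  # attribute unchanged; keep the current version
            row.valid_to = change_date  # close the old version
    history.append(CustomerVersion(customer_id, new_city, change_date))


def city_as_of(history: List[CustomerVersion], customer_id: int,
               day: date) -> Optional[str]:
    """Point-in-time lookup: the city that was valid on `day`."""
    for row in history:
        if (row.customer_id == customer_id and row.valid_from <= day
                and (row.valid_to is None or day < row.valid_to)):
            return row.city
    return None


history: List[CustomerVersion] = []
apply_change(history, 1, "Pune", date(2017, 1, 1))
apply_change(history, 1, "Delhi", date(2017, 6, 1))
print(city_as_of(history, 1, date(2017, 3, 1)))  # Pune
print(city_as_of(history, 1, date(2017, 7, 1)))  # Delhi
```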
Batching vs Streaming
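The contrast in miniature, as a hypothetical sketch: a batch job computes its answer once over a complete dataset, while a streaming job keeps running state and has an up-to-date answer after every record.

```python
from typing import Iterable, List


def batch_total(records: Iterable[float]) -> float:
    """Batch: wait for the complete dataset, then compute once."""
    return sum(records)


class StreamingTotal:
    """Streaming: keep running state and update it per record."""

    def __init__(self) -> None:
        self.total = 0.0

    def update(self, value: float) -> float:
        self.total += value
        return self.total  # an answer is available after every record


readings: List[float] = [1.5, 2.5, 4.0]
print(batch_total(readings))  # 8.0, but only once all data has arrived
stream = StreamingTotal()
print([stream.update(r) for r in readings])  # [1.5, 4.0, 8.0]
```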
Out-of-Order Processing
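A common approach to out-of-order events is to group by event time rather than arrival time, and to track a watermark that trails the latest event time seen by an allowed lateness. This sketch (hypothetical class and parameters, integer timestamps) counts events in fixed "tumbling" windows, finalises a window once the watermark passes its end, and sets aside events that arrive after their window has been finalised.

```python
from collections import defaultdict


class TumblingWindowCounter:
    """Count events per fixed event-time window, tolerating bounded disorder."""

    def __init__(self, window_size: int, allowed_lateness: int) -> None:
        self.window_size = window_size
        self.allowed_lateness = allowed_lateness
        self.max_event_time = float("-inf")
        self.open_windows = defaultdict(int)  # window start -> running count
        self.closed = {}                      # window start -> final count
        self.dropped = []                     # events that arrived too late

    def watermark(self) -> float:
        # Assumption: every event older than this has already arrived.
        return self.max_event_time - self.allowed_lateness

    def process(self, event_time: int) -> None:
        start = event_time - event_time % self.window_size
        if start + self.window_size <= self.watermark():
            self.dropped.append(event_time)  # its window is already final
            return
        self.open_windows[start] += 1
        self.max_event_time = max(self.max_event_time, event_time)
        for s in sorted(self.open_windows):  # sorted() copies, so popping is safe
            if s + self.window_size <= self.watermark():
                self.closed[s] = self.open_windows.pop(s)


counter = TumblingWindowCounter(window_size=10, allowed_lateness=5)
for t in (1, 12, 4, 18, 3, 25, 2):  # arrival order, not event-time order
    counter.process(t)
print(counter.closed)   # {0: 2, 10: 2}
print(counter.dropped)  # [3, 2]
```

Event 4 arrives after 12 but still lands in window 0, because the watermark (7 at that point) has not yet passed that window's end. Events 3 and 2 are dropped because by the time they arrive the watermark has moved past 10.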
Cubes
• Efficiency of Retrieval
• Warehouse:Cube :: DB:Table
• View: Dimension + Measure
• Slice, Dice & Rotate
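A toy sketch of the cube operations over made-up sales rows (dimension names, data, and functions are all hypothetical). `build_cube` aggregates the revenue measure at the finest grain; a real cube would also pre-compute roll-ups, which is where the retrieval efficiency comes from. Slice fixes one dimension, dice keeps a sub-cube over several, and rotate (pivot) re-orients the view onto two chosen dimensions.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, quarter) dimensions + revenue measure.
facts = [
    ("North", "Pen", "Q1", 10),
    ("North", "Mug", "Q1", 20),
    ("South", "Pen", "Q1", 5),
    ("South", "Pen", "Q2", 15),
    ("North", "Pen", "Q2", 30),
]
DIMS = ("region", "product", "quarter")


def build_cube(rows):
    """Aggregate the measure per (region, product, quarter) cell."""
    cube = defaultdict(int)
    for *coords, revenue in rows:
        cube[tuple(coords)] += revenue
    return dict(cube)


def slice_cube(cube, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return {k: v for k, v in cube.items() if k[i] == value}


def dice_cube(cube, **allowed):
    """Dice: keep cells whose named dimensions fall in the given value sets."""
    idx = {DIMS.index(d): set(vals) for d, vals in allowed.items()}
    return {k: v for k, v in cube.items()
            if all(k[i] in s for i, s in idx.items())}


def pivot(cube, rows_dim, cols_dim):
    """Rotate: view the measure by two chosen dimensions, summing the rest."""
    r, c = DIMS.index(rows_dim), DIMS.index(cols_dim)
    table = defaultdict(int)
    for k, v in cube.items():
        table[(k[r], k[c])] += v
    return dict(table)


cube = build_cube(facts)
print(pivot(cube, "region", "quarter"))
```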
Architecture: Revisited
Sample Solution
Thank you!
Piyush Verma (@meson10), Oogway Consulting
http://oogway.in