Understanding the Basics of Data Analysis
Piyush Verma
November 13, 2017
Read more on http://blog.oogway.in
Transcript
ABC of Distributed Data Processing. Achieving Buzzword Compliance.
Piyush Verma, Oogway Consulting
Common thoughts
When will my Data become Big Data?
Hive Data Will Save.
How did we reach here?
Data :: Business
Types of Workload
When do I call it Big Enough?
Why bother with Data Engineering?
Why do analysis at all?
Descriptive
- Historical.
- Deterministic.
- Inferential.
- Managers make pretty graphs.
Predictive
- Future.
- Probabilistic.
- Based on Descriptive.
- This is what armchair critics do.
Prescriptive
Architecture: Round 1
What does data look like?
Storage Choice 1
Storage Choice 2
Challenges: Round 1
Scaling
Archival Policy
Oh no
Garbage / Purging
All related entities end up in complex joins
All relationships grow more complicated over the dimension of time
Anatomy
Challenges: Round 2
Snowflake Schema
Star Schema
De-Duplication
Bloom Filters, Cuckoo Filters
- Does not exist for sure.
- May or may not exist.
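The two possible answers above are what make these filters usable for de-duplication: "does not exist for sure" means a record is safe to admit as new. The following is a minimal Python sketch of a Bloom filter, not taken from the slides. The class name, the sizes (1024 bits, 3 hashes) and the deduplicate helper are illustrative assumptions, and a Cuckoo filter would need a different structure.

```python
import hashlib


class BloomFilter:
    """A tiny Bloom filter: a bit array plus k salted hash positions.

    Illustrative only; num_bits and num_hashes are arbitrary assumptions.
    """

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item: str):
        # Derive k positions by salting a single hash function with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # False: the item does not exist for sure.
        # True: the item may or may not exist (false positives are possible).
        return all(self.bits[pos] for pos in self._positions(item))


def deduplicate(event_ids, seen: BloomFilter):
    """Yield only the IDs that the filter reports as definitely unseen."""
    for event_id in event_ids:
        if not seen.might_contain(event_id):
            seen.add(event_id)
            yield event_id
```

Calling list(deduplicate(["a", "b", "a"], BloomFilter())) drops the repeated "a". The trade-off is that a false positive silently drops a record that was in fact new, so the filter has to be sized for the expected volume.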
Slowly Changing Dimensions
Batching vs Streaming
Out-of-Order Processing
Cubes
• Efficiency of Retrieval
• Warehouse:Cube :: DB:Table
• View: Dimension + Measure
• Slice, Dice & Rotate
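As a rough illustration of "Dimension + Measure" and of slice, dice and rotate, here is a small sketch in plain Python. The fact rows, the dimension names (region, product, quarter) and the function names are hypothetical and not from the slides. A real warehouse would precompute these aggregates in its own engine.

```python
from collections import defaultdict

# Hypothetical fact rows: three dimensions followed by one measure (sales).
FACTS = [
    ("north", "widget", "Q1", 100),
    ("north", "gadget", "Q1", 50),
    ("south", "widget", "Q1", 70),
    ("north", "widget", "Q2", 120),
    ("south", "gadget", "Q2", 30),
]
DIMENSIONS = ("region", "product", "quarter")


def build_cube(facts):
    """Pre-aggregate the measure for each combination of dimension values."""
    cube = defaultdict(int)
    for *dims, measure in facts:
        cube[tuple(dims)] += measure
    return dict(cube)


def slice_cube(cube, dimension, value):
    """Slice: fix one dimension to a single value."""
    idx = DIMENSIONS.index(dimension)
    return {key: total for key, total in cube.items() if key[idx] == value}


def dice_cube(cube, **allowed):
    """Dice: keep cells whose values fall in the allowed set per dimension."""
    def keep(key):
        return all(key[DIMENSIONS.index(dim)] in values
                   for dim, values in allowed.items())
    return {key: total for key, total in cube.items() if keep(key)}


def rotate(cube, row_dim, col_dim):
    """Rotate (pivot): total the measure into a rows-by-columns table."""
    r, c = DIMENSIONS.index(row_dim), DIMENSIONS.index(col_dim)
    table = defaultdict(lambda: defaultdict(int))
    for key, total in cube.items():
        table[key[r]][key[c]] += total
    return {row: dict(cols) for row, cols in table.items()}
```

With these sample rows, rotate(build_cube(FACTS), "region", "quarter") gives sales per region per quarter, and swapping the two dimension arguments turns the same view on its side.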
Architecture: Revisited
Sample Solution
Thank you!
Piyush Verma (@meson10), Oogway Consulting, http://oogway.in