Understanding the Basics of Data Analysis
Piyush Verma
November 13, 2017
Read more on http://blog.oogway.in
Transcript
ABC of Distributed Data Processing. Achieving Buzzword Compliance.
Piyush Verma, Oogway Consulting
Common thoughts
When will my Data become Big Data?
Hive Data Will Save.
How did we reach here?
Data :: Business
Types of Workload
When do I call it Big Enough?
Why bother with Data Engineering?
Why do analysis at all?
Descriptive
- Historical.
- Deterministic.
- Inferential.
- Managers make pretty graphs.
Predictive
- Future.
- Probabilistic.
- Based on Descriptive.
- This is what armchair critics do.
Prescriptive
Architecture: Round 1
What does data look like?
Storage Choice 1
Storage Choice 2
Challenges: Round 1
Scaling
Archival Policy
Oh no
Garbage / Purging
All related entities end up in complex joins
All relationships get more complicated over the dimension of time
Anatomy
Challenges: Round 2
Snowflake Schema
Star Schema
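A star schema keeps one central fact table that points directly at flat dimension tables, so a report is a fact-to-dimension join followed by a group-by. The sketch below is only an illustration: the table names (`fact_sales`, `dim_customer`, `dim_product`), the columns, and the rows are all hypothetical and do not come from the deck.

```python
import sqlite3

# In-memory database with a hypothetical star schema:
# one fact table, two dimension tables joined to it directly.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE fact_sales (
        customer_id INTEGER REFERENCES dim_customer(customer_id),
        product_id  INTEGER REFERENCES dim_product(product_id),
        amount      INTEGER
    );
    INSERT INTO dim_customer VALUES (1, 'Pune'), (2, 'Delhi');
    INSERT INTO dim_product  VALUES (10, 'books'), (11, 'games');
    INSERT INTO fact_sales   VALUES (1, 10, 100), (1, 11, 50), (2, 10, 80);
    """
)


def revenue_by_city_and_category(connection):
    """Join the fact table to both dimensions and sum the measure."""
    query = """
        SELECT c.city, p.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_customer AS c ON c.customer_id = f.customer_id
        JOIN dim_product  AS p ON p.product_id  = f.product_id
        GROUP BY c.city, p.category
        ORDER BY c.city, p.category
    """
    return connection.execute(query).fetchall()
```

In a snowflake schema the dimension tables would themselves be normalised into further tables (for example, city into a separate region table), which adds joins to the same query.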
De-Duplication
Bloom Filters, Cuckoo Filters
- Does not exist for sure.
- May or may not exist.
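The two possible answers on this slide can be seen in a very small Bloom filter. This is a minimal sketch, not the deck's implementation: the class name, the bit-array size, the number of hashes, and the trick of salting SHA-256 to get several hash positions are all assumptions chosen for readability.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the code simple (a real filter packs bits).
        self.bits = bytearray(size_bits)

    def _positions(self, item):
        # Derive num_hashes positions by salting one SHA-256 hash.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means "does not exist for sure".
        # True means "may or may not exist".
        return all(self.bits[pos] for pos in self._positions(item))
```

For de-duplication, a `False` answer lets a record through without touching the store, while a `True` answer still needs an exact lookup before the record is treated as a duplicate.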
Slowly Changing Dimensions
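One common way to handle a dimension that changes slowly is to keep every version of the row with a validity interval (often called a Type 2 approach). The deck does not say which approach it uses, so treat the sketch below as an assumption; `CustomerVersion`, `apply_change`, and `city_as_of` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CustomerVersion:
    customer_id: int
    city: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_change(history: List[CustomerVersion], customer_id: int,
                 new_city: str, changed_on: date) -> List[CustomerVersion]:
    """Close the open version if the value changed, then append a new one."""
    current = next(
        (v for v in history
         if v.customer_id == customer_id and v.valid_to is None),
        None,
    )
    if current is not None:
        if current.city == new_city:
            return history  # nothing changed, keep the open version
        current.valid_to = changed_on
    history.append(CustomerVersion(customer_id, new_city, changed_on))
    return history


def city_as_of(history: List[CustomerVersion], customer_id: int,
               day: date) -> Optional[str]:
    """Return the value that was valid on the given day, if any."""
    for v in history:
        if v.customer_id != customer_id:
            continue
        if v.valid_from <= day and (v.valid_to is None or day < v.valid_to):
            return v.city
    return None
```

Facts recorded in 2016 can then be joined to the 2016 version of the customer instead of whatever the row says today.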
Batching vs Streaming
Out-of-Order Processing
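A frequent way to cope with events that arrive out of order is to buffer them and release them only once a watermark (the largest event time seen so far, minus an allowed lateness) has passed them. The sketch below assumes that strategy; the deck does not spell one out, and `ReorderBuffer` and its parameters are hypothetical.

```python
import heapq


class ReorderBuffer:
    """Buffer events and emit them in event-time order behind a watermark."""

    def __init__(self, allowed_lateness):
        self.allowed_lateness = allowed_lateness
        self.max_event_time = None
        self.heap = []   # min-heap of (event_time, sequence, payload)
        self.late = []   # events that arrived behind the watermark
        self._seq = 0    # tie-breaker so payloads are never compared

    def watermark(self):
        if self.max_event_time is None:
            return None
        return self.max_event_time - self.allowed_lateness

    def push(self, event_time, payload):
        """Accept one event; return the events that are now safe to emit."""
        wm = self.watermark()
        if wm is not None and event_time < wm:
            # Too late: set aside instead of breaking the emitted order.
            self.late.append((event_time, payload))
            return []
        self._seq += 1
        heapq.heappush(self.heap, (event_time, self._seq, payload))
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time
        wm = self.watermark()
        ready = []
        while self.heap and self.heap[0][0] <= wm:
            t, _, p = heapq.heappop(self.heap)
            ready.append((t, p))
        return ready
```

A larger `allowed_lateness` tolerates more disorder at the cost of higher latency, which is the same trade-off that separates batching from streaming.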
Cubes
• Efficiency of Retrieval
• Warehouse:Cube :: DB:Table
• View: Dimension + Measure
• Slice, Dice & Rotate
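The cube operations listed above can be sketched with a plain dictionary keyed by dimension values and holding one summed measure. Everything here is illustrative: the dimensions (`year`, `region`, `product`), the `revenue` measure, the rows, and the function names are assumptions, not data from the deck.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, region, product, revenue).
FACTS = [
    (2016, "north", "books", 100),
    (2016, "south", "books", 80),
    (2016, "north", "games", 50),
    (2017, "north", "books", 120),
    (2017, "south", "games", 70),
]
DIMENSIONS = ("year", "region", "product")


def build_cube(facts):
    """Pre-aggregate the measure for every combination of dimension values."""
    cube = defaultdict(int)
    for *dims, revenue in facts:
        cube[tuple(dims)] += revenue
    return dict(cube)


def slice_cube(cube, dimension, value):
    """Slice: fix one dimension to a single value."""
    i = DIMENSIONS.index(dimension)
    return {k: v for k, v in cube.items() if k[i] == value}


def dice_cube(cube, **allowed):
    """Dice: keep cells whose dimension values fall in the allowed sets."""
    def keep(key):
        return all(key[DIMENSIONS.index(d)] in vals
                   for d, vals in allowed.items())
    return {k: v for k, v in cube.items() if keep(k)}


def rollup(cube, *dimensions):
    """View: sum the measure over the chosen dimensions, in the given order."""
    idx = [DIMENSIONS.index(d) for d in dimensions]
    view = defaultdict(int)
    for key, value in cube.items():
        view[tuple(key[i] for i in idx)] += value
    return dict(view)
```

Rotating (pivoting) is then just a change of axis order: `rollup(cube, "region", "year")` and `rollup(cube, "year", "region")` hold the same totals with the key fields swapped.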
Architecture: Revisited
Sample Solution
Thank you!
Piyush Verma (@meson10), Oogway Consulting
http://oogway.in