Understanding the Basics of Data Analysis
Piyush Verma
November 13, 2017
Read more on
http://blog.oogway.in
Transcript
ABC of Distributed Data Processing. Achieving Buzzword Compliance.
Piyush Verma, Oogway Consulting
Common thoughts
When will my Data become Big Data?
Hive Data Will Save.
How did we reach here?
Data :: Business
Types of Workload
When do I call it Big Enough?
Why bother with Data Engineering?
Why do analysis at all?
Descriptive
- Historical.
- Deterministic.
- Inferential.
- Managers make pretty graphs.
Predictive
- Future.
- Probabilistic.
- Based on Descriptive.
- This is what armchair critics do.
Prescriptive
Architecture: Round 1
What does data look like?
Storage Choice 1
Storage Choice 2
Challenges: Round 1
Scaling
Archival Policy
Oh no
Garbage / Purging
All related entities end up in complex joins
All Relationships complicate over Dimension of time
Anatomy
Challenges: Round 2
Snowflake Schema
Star Schema
De-Duplication
Bloom Filters, Cuckoo Filters
- Does not exist for sure.
- May or may not exist.
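The two answers a Bloom filter can give can be sketched in a few lines of Python. This is a minimal illustration and not taken from the deck: the class name, the bit-array size, the number of hashes, and the `deduplicate` helper are all assumptions chosen for readability. A "no" is certain, while a "yes" only means "possibly", so a de-duplication pass built on it may occasionally drop a genuinely new item.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: a bit array plus k hash positions per item."""

    def __init__(self, size_bits=1024, num_hashes=3):
        # Both parameters are illustrative defaults, not tuned values.
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for clarity

    def _positions(self, item):
        # Derive k positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False: the item was definitely never added.
        # True: the item may or may not have been added.
        return all(self.bits[pos] for pos in self._positions(item))


def deduplicate(events, seen):
    """Yield only events the filter has definitely not seen before."""
    for event in events:
        if not seen.might_contain(event):
            seen.add(event)
            yield event
```

Used as `list(deduplicate(stream, BloomFilter()))`, the filter never lets a repeated event through, at the cost of a small, size-dependent chance of discarding a first-time event.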
Slowly Changing Dimensions
Batching vs Streaming
Out-of-Order Processing
Cubes
• Efficiency of Retrieval
• Warehouse:Cube :: DB:Table
• View: Dimension + Measure
• Slice, Dice & Rotate
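As a rough sketch of what "View: Dimension + Measure" and "Slice, Dice & Rotate" mean in practice, the plain-Python example below aggregates one measure over chosen dimensions. The sales rows, the dimension names, and the `build_view` helper are all hypothetical and exist only for illustration; a real cube engine would precompute these aggregates rather than scan the rows each time.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, quarter, revenue).
SALES = [
    ("north", "widget", "Q1", 100),
    ("north", "gadget", "Q1", 150),
    ("south", "widget", "Q1", 80),
    ("north", "widget", "Q2", 120),
    ("south", "gadget", "Q2", 60),
]
DIMENSIONS = ("region", "product", "quarter")


def build_view(rows, group_by, where=None):
    """Sum the revenue measure over the dimensions listed in group_by.

    where is an optional {dimension: set of allowed values} filter.
    """
    where = where or {}
    totals = defaultdict(int)
    for row in rows:
        record = dict(zip(DIMENSIONS, row[:3]))
        if any(record[dim] not in allowed for dim, allowed in where.items()):
            continue
        key = tuple(record[dim] for dim in group_by)
        totals[key] += row[3]
    return dict(totals)


# Slice: fix one dimension to a single value.
q1_by_region = build_view(SALES, ["region"], where={"quarter": {"Q1"}})

# Dice: restrict more than one dimension to a subset of values.
north_q1 = build_view(
    SALES, ["product"], where={"region": {"north"}, "quarter": {"Q1"}}
)

# Rotate (pivot): same measure, dimensions taken in a different order.
by_product_region = build_view(SALES, ["product", "region"])
```

Each call returns one view: a mapping from a tuple of dimension values to the summed measure.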
Architecture: Revisited
Sample Solution
Thank you! Piyush Verma @meson10 Oogway Consulting http://oogway.in