Understanding the Basics of Data Analysis
Piyush Verma
November 13, 2017
Technology
Read more on http://blog.oogway.in
Transcript
ABC of Distributed Data Processing. Achieving Buzzword Compliance.
Piyush Verma, Oogway Consulting
Common thoughts
When will my Data become Big Data?
Hive Data Will Save.
How did we reach here?
Data :: Business
Types of Workload
When do I call it Big Enough?
Why bother with Data Engineering?
Why do analysis at all?
Descriptive
- Historical.
- Deterministic.
- Inferential.
- Managers make pretty graphs.
Predictive
- Future.
- Probabilistic.
- Based on Descriptive.
- This is what armchair critics do.
Prescriptive
Architecture: Round 1
What does data look like?
Storage Choice 1
Storage Choice 2
Challenges: Round 1
Scaling
Archival Policy
Oh no
Garbage / Purging
All related entities end up in complex joins
All relationships get more complicated over the dimension of time
Anatomy
Challenges: Round 2
Snowflake Schema
Star Schema
De-Duplication
Bloom Filters, Cuckoo Filters
- Does not exist for sure.
- May or may not exist.
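The two possible answers on this slide can be seen in a small sketch. This is a minimal Bloom filter in Python, not something taken from the deck: the class name `BloomFilter`, the bit count, the number of hashes, and the `evt-*` ids are all assumptions chosen for illustration. A lookup that returns False means the item was never added; a lookup that returns True only means it may have been added.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter; the sizes are illustrative, not tuned."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item: str):
        # Derive one position per hash by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # False: does not exist for sure. True: may or may not exist.
        return all(self.bits[pos] for pos in self._positions(item))


# Hypothetical de-duplication use: remember event ids that were already seen.
seen = BloomFilter()
for event_id in ["evt-1", "evt-2", "evt-3"]:
    seen.add(event_id)

print(seen.might_contain("evt-2"))   # True: an added item is never reported absent
print(seen.might_contain("evt-99"))  # usually False; True would be a false positive
```

For de-duplication, a False answer lets a record through without touching the main store, and only a True answer needs a slower exact check.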
Slowly Changing Dimensions
Batching vs Streaming
Out-of-Order Processing
Cubes
• Efficiency of Retrieval
• Warehouse:Cube :: DB:Table
• View: Dimension + Measure
• Slice, Dice & Rotate
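The "Dimension + Measure" view and the slice, dice and rotate operations can be sketched in plain Python. This is an illustrative toy, not the deck's implementation: the `SALES` rows, the dimension names (`region`, `product`, `quarter`), the `amount` measure, and both helper functions are hypothetical. A real cube would precompute these aggregates for fast retrieval rather than scan rows on every query.

```python
from collections import defaultdict

# Hypothetical fact rows: three dimensions and one measure ("amount").
SALES = [
    {"region": "north", "product": "tea", "quarter": "Q1", "amount": 10},
    {"region": "north", "product": "coffee", "quarter": "Q1", "amount": 20},
    {"region": "south", "product": "tea", "quarter": "Q1", "amount": 5},
    {"region": "south", "product": "tea", "quarter": "Q2", "amount": 7},
]


def aggregate(rows, dimensions, measure="amount"):
    """View: sum the measure for every combination of the chosen dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


def slice_rows(rows, **fixed):
    """Slice (one fixed dimension) or dice (several): keep matching rows only."""
    return [r for r in rows if all(r[k] == v for k, v in fixed.items())]


# View over two dimensions.
by_region_product = aggregate(SALES, ["region", "product"])

# Slice: fix quarter to Q1, then view by region.
q1_by_region = aggregate(slice_rows(SALES, quarter="Q1"), ["region"])

# Rotate: the same totals with the dimension order swapped.
by_product_region = aggregate(SALES, ["product", "region"])

print(by_region_product)
print(q1_by_region)
print(by_product_region)
```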
Architecture: Revisited
Sample Solution
Thank you! Piyush Verma (@meson10), Oogway Consulting, http://oogway.in