Understanding the Basics of Data Analysis
Piyush Verma
November 13, 2017
Technology
Read more on http://blog.oogway.in
Transcript
ABC of Distributed Data Processing. Achieving Buzzword Compliance.
Piyush Verma, Oogway Consulting
Common thoughts
When will my Data become Big Data?
Hive Data Will Save.
How did we reach here?
Data :: Business
Data :: Business
Types of Workload
When do I call it Big Enough?
Why bother with Data Engineering?
Why do analysis at all?
Descriptive
- Historical.
- Deterministic.
- Inferential.
- Managers make pretty graphs.
Predictive
- Future.
- Probabilistic.
- Based on Descriptive.
- This is what armchair critics do.
Prescriptive
Architecture: Round 1
What does data look like?
Storage Choice 1
Storage Choice 2
Challenges: Round 1
Scaling
Archival Policy
Oh no
Garbage / Purging
All related entities end up in complex joins
All Relationships complicate over Dimension of time
Anatomy
Anatomy
Challenges: Round 2
Snowflake Schema
Star Schema
De-Duplication
Bloom Filters, Cuckoo Filters
- Does not exist for sure.
- May or may not exist.
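The two answers a Bloom filter can give ("does not exist for sure" and "may or may not exist") can be illustrated with a minimal Python sketch. This is not code from the talk: the class name, the default sizes, the salted SHA-256 hashing scheme and the deduplicate helper are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch: a bit array probed by k hash functions."""

    def __init__(self, size_bits=1024, num_hashes=3):
        # Sizes are illustrative assumptions, not tuned values.
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive k positions by salting a SHA-256 digest with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False: the item was definitely never added.
        # True: the item may or may not have been added (false positives happen).
        return all(self.bits[pos] for pos in self._positions(item))


def deduplicate(events, seen):
    """Yield only events that the filter has definitely not seen before."""
    for event in events:
        if not seen.might_contain(event):
            seen.add(event)
            yield event
```

Used for de-duplication, a "definitely not seen" answer lets a record through without touching the backing store. A "maybe seen" answer still needs an exact lookup if dropping a genuinely new record is unacceptable, which this sketch skips for brevity.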
Slow Changing Dimensions
Batching vs Streaming
Out-of-Order Processing
Cubes
• Efficiency of Retrieval
• Warehouse:Cube :: DB:Table
• View: Dimension + Measure
• Slice, Dice & Rotate
Architecture: Revisited
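The "Dimension + Measure" view and the slice, dice and rotate operations can be sketched in plain Python. This is a toy model under stated assumptions, not how a warehouse engine implements cubes: the fact rows, the dimension names (region, product, quarter), the revenue measure and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: three dimensions and one measure (revenue).
FACTS = [
    {"region": "north", "product": "tea", "quarter": "Q1", "revenue": 100},
    {"region": "north", "product": "tea", "quarter": "Q1", "revenue": 25},
    {"region": "north", "product": "tea", "quarter": "Q2", "revenue": 150},
    {"region": "north", "product": "coffee", "quarter": "Q1", "revenue": 200},
    {"region": "south", "product": "tea", "quarter": "Q1", "revenue": 50},
    {"region": "south", "product": "coffee", "quarter": "Q2", "revenue": 300},
]
DIMENSIONS = ("region", "product", "quarter")


def build_cube(facts, dimensions, measure):
    """Pre-aggregate the measure for every combination of dimension values."""
    cube = defaultdict(int)
    for row in facts:
        cube[tuple(row[d] for d in dimensions)] += row[measure]
    return dict(cube)


def slice_cube(cube, dimensions, dim, value):
    """Slice: fix one dimension to a single value and drop it from the key."""
    idx = dimensions.index(dim)
    out = defaultdict(int)
    for key, total in cube.items():
        if key[idx] == value:
            out[key[:idx] + key[idx + 1:]] += total
    return dict(out)


def dice_cube(cube, dimensions, allowed):
    """Dice: keep cells whose coordinates fall inside the allowed value sets."""
    return {
        key: total
        for key, total in cube.items()
        if all(key[dimensions.index(d)] in values for d, values in allowed.items())
    }


def rotate_cube(cube, dimensions, new_order):
    """Rotate (pivot): reorder the axes without changing any totals."""
    idxs = [dimensions.index(d) for d in new_order]
    return {tuple(key[i] for i in idxs): total for key, total in cube.items()}


cube = build_cube(FACTS, DIMENSIONS, "revenue")
q1_only = slice_cube(cube, DIMENSIONS, "quarter", "Q1")
north_tea = dice_cube(cube, DIMENSIONS, {"region": {"north"}, "product": {"tea"}})
by_quarter = rotate_cube(cube, DIMENSIONS, ("quarter", "region", "product"))
```

Because the totals are computed once in build_cube, each later query is a filter or a key reshuffle over pre-aggregated cells, which is the retrieval efficiency the slide points at.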
Sample Solution
Thank you! Piyush Verma @meson10 Oogway Consulting http://oogway.in