Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Understanding the Basics of Data Analysis
Search
Piyush Verma
November 13, 2017
Technology
0
340
Understanding the Basics of Data Analysis
Read more on
http://blog.oogway.in
Piyush Verma
November 13, 2017
Tweet
Share
More Decks by Piyush Verma
See All by Piyush Verma
SLOs that Lie
meson10
0
120
Doing SRE the right way - 2
meson10
0
170
Doing SRE the right way
meson10
0
1k
Observability and Control Theory
meson10
1
1.1k
Reliability
meson10
0
150
Reliability of Distributed Systems
meson10
0
270
My TLS was broken
meson10
0
140
Technology that builds Organizations
meson10
0
140
Namespace.go
meson10
0
160
Other Decks in Technology
See All in Technology
AWS Network Firewall Proxyを触ってみた
nagisa53
1
240
M&A 後の統合をどう進めるか ─ ナレッジワーク × Poetics が実践した組織とシステムの融合
kworkdev
PRO
1
480
AzureでのIaC - Bicep? Terraform? それ早く言ってよ会議
torumakabe
1
580
Oracle Base Database Service 技術詳細
oracle4engineer
PRO
15
93k
予期せぬコストの急増を障害のように扱う――「コスト版ポストモーテム」の導入とその後の改善
muziyoshiz
1
2k
生成AIを活用した音声文字起こしシステムの2つの構築パターンについて
miu_crescent
PRO
3
210
Red Hat OpenStack Services on OpenShift
tamemiya
0
120
CDK対応したAWS DevOps Agentを試そう_20260201
masakiokuda
1
350
SREじゃなかった僕らがenablingを通じて「SRE実践者」になるまでのリアル / SRE Kaigi 2026
aeonpeople
6
2.5k
プロポーザルに込める段取り八分
shoheimitani
1
470
外部キー制約の知っておいて欲しいこと - RDBMSを正しく使うために必要なこと / FOREIGN KEY Night
soudai
PRO
12
5.6k
モダンUIでフルサーバーレスなAIエージェントをAmplifyとCDKでサクッとデプロイしよう
minorun365
4
220
Featured
See All Featured
Let's Do A Bunch of Simple Stuff to Make Websites Faster
chriscoyier
508
140k
Lightning Talk: Beautiful Slides for Beginners
inesmontani
PRO
1
440
Bridging the Design Gap: How Collaborative Modelling removes blockers to flow between stakeholders and teams @FastFlow conf
baasie
0
450
Docker and Python
trallard
47
3.7k
How to Ace a Technical Interview
jacobian
281
24k
The Myth of the Modular Monolith - Day 2 Keynote - Rails World 2024
eileencodes
26
3.3k
Art, The Web, and Tiny UX
lynnandtonic
304
21k
A designer walks into a library…
pauljervisheath
210
24k
30 Presentation Tips
portentint
PRO
1
220
Groundhog Day: Seeking Process in Gaming for Health
codingconduct
0
94
BBQ
matthewcrist
89
10k
AI: The stuff that nobody shows you
jnunemaker
PRO
2
260
Transcript
ABC of Distributed Data Processing. Achieving Buzzword Compliance. 1 Piyush
Verma Oogway Consulting
Common thoughts 2
When will my Data become Big Data?
Hive Data Will Save.
How did we reach here? 5
Data :: Business
Data :: Business
Types of Workload
When do I call it Big Enough?
Why bother with Data Engineering? 10
Why do analysis at all?
Descriptive - Historical. - Deterministic. - Inferential. - Managers make
pretty graphs.
Predictive - Future. - Probabilistic. - Based on Descriptive. -
This is what armchair critics do.
Prescriptive
Architecture: Round 1 15
What does data look like?
Storage Choice 1
Storage Choice 2
Challenges: Round 1 19
Scaling
Archival Policy
Oh no
Garbage / Purging
All related entities end up in complex joins
All Relationships complicate over Dimension of time
Anatomy 26
Anatomy
Challenges: Round 2 28
Snowflake Schema
Star Schema
De-Duplication
Bloom Filters Cuckoo Filters - Does not exist for sure.
- May or may not exist.
Slow Changing Dimensions
Batching vs Streaming
Out-of-Order Processing
Cubes • Efficiency of Retrieval • Warehouse:Cube :: DB:Table •
View: Dimension + Measure • Slice, Dice & Rotate
Architecture: Revisited 37
Sample Solution
Thank you! Piyush Verma @meson10 Oogway Consulting http://oogway.in