Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Understanding the Basics of Data Analysis
Search
Sponsored
·
Ship Features Fearlessly
Turn features on and off without deploys. Used by thousands of Ruby developers.
→
Piyush Verma
November 13, 2017
Technology
0
340
Understanding the Basics of Data Analysis
Read more on
http://blog.oogway.in
Piyush Verma
November 13, 2017
Tweet
Share
More Decks by Piyush Verma
See All by Piyush Verma
SLOs that Lie
meson10
0
120
Doing SRE the right way - 2
meson10
0
170
Doing SRE the right way
meson10
0
1k
Observability and Control Theory
meson10
1
1.1k
Reliability
meson10
0
150
Reliability of Distributed Systems
meson10
0
270
My TLS was broken
meson10
0
140
Technology that builds Organizations
meson10
0
140
Namespace.go
meson10
0
160
Other Decks in Technology
See All in Technology
日本の85%が使う公共SaaSは、どう育ったのか
taketakekaho
1
230
20260208_第66回 コンピュータビジョン勉強会
keiichiito1978
0
180
22nd ACRi Webinar - NTT Kawahara-san's slide
nao_sumikawa
0
100
SRE Enabling戦記 - 急成長する組織にSREを浸透させる戦いの歴史
markie1009
0
130
OpenShiftでllm-dを動かそう!
jpishikawa
0
130
配列に見る bash と zsh の違い
kazzpapa3
3
160
M&A 後の統合をどう進めるか ─ ナレッジワーク × Poetics が実践した組織とシステムの融合
kworkdev
PRO
1
480
Digitization部 紹介資料
sansan33
PRO
1
6.8k
OCI Database Management サービス詳細
oracle4engineer
PRO
1
7.4k
Bill One 開発エンジニア 紹介資料
sansan33
PRO
5
17k
Introduction to Sansan, inc / Sansan Global Development Center, Inc.
sansan33
PRO
0
3k
小さく始めるBCP ― 多プロダクト環境で始める最初の一歩
kekke_n
1
450
Featured
See All Featured
Mobile First: as difficult as doing things right
swwweet
225
10k
実際に使うSQLの書き方 徹底解説 / pgcon21j-tutorial
soudai
PRO
196
71k
Impact Scores and Hybrid Strategies: The future of link building
tamaranovitovic
0
200
Measuring Dark Social's Impact On Conversion and Attribution
stephenakadiri
1
130
Utilizing Notion as your number one productivity tool
mfonobong
3
220
Stewardship and Sustainability of Urban and Community Forests
pwiseman
0
110
Groundhog Day: Seeking Process in Gaming for Health
codingconduct
0
94
30 Presentation Tips
portentint
PRO
1
220
ラッコキーワード サービス紹介資料
rakko
1
2.3M
Design of three-dimensional binary manipulators for pick-and-place task avoiding obstacles (IECON2024)
konakalab
0
350
Game over? The fight for quality and originality in the time of robots
wayneb77
1
120
Art, The Web, and Tiny UX
lynnandtonic
304
21k
Transcript
ABC of Distributed Data Processing. Achieving Buzzword Compliance. 1 Piyush
Verma Oogway Consulting
Common thoughts 2
When will my Data become Big Data?
Hive Data Will Save.
How did we reach here? 5
Data :: Business
Data :: Business
Types of Workload
When do I call it Big Enough?
Why bother with Data Engineering? 10
Why do analysis at all?
Descriptive - Historical. - Deterministic. - Inferential. - Managers make
pretty graphs.
Predictive - Future. - Probabilistic. - Based on Descriptive. -
This is what armchair critics do.
Prescriptive
Architecture: Round 1 15
What does data look like?
Storage Choice 1
Storage Choice 2
Challenges: Round 1 19
Scaling
Archival Policy
Oh no
Garbage / Purging
All related entities end up in complex joins
All Relationships complicate over Dimension of time
Anatomy 26
Anatomy
Challenges: Round 2 28
Snowflake Schema
Star Schema
De-Duplication
Bloom Filters Cuckoo Filters - Does not exist for sure.
- May or may not exist.
Slow Changing Dimensions
Batching vs Streaming
Out-of-Order Processing
Cubes • Efficiency of Retrieval • Warehouse:Cube :: DB:Table •
View: Dimension + Measure • Slice, Dice & Rotate
Architecture: Revisited 37
Sample Solution
Thank you! Piyush Verma @meson10 Oogway Consulting http://oogway.in