Understanding the Basics of Data Analysis
Piyush Verma
November 13, 2017
Read more on
http://blog.oogway.in
Transcript
ABC of Distributed Data Processing. Achieving Buzzword Compliance.
Piyush Verma, Oogway Consulting
Common thoughts
When will my Data become Big Data?
Hive Data Will Save.
How did we reach here?
Data :: Business
Types of Workload
When do I call it Big Enough?
Why bother with Data Engineering?
Why do analysis at all?
Descriptive
- Historical.
- Deterministic.
- Inferential.
- Managers make pretty graphs.
Predictive
- Future.
- Probabilistic.
- Based on Descriptive.
- This is what armchair critics do.
Prescriptive
Architecture: Round 1
What does data look like?
Storage Choice 1
Storage Choice 2
Challenges: Round 1
Scaling
Archival Policy
Oh no
Garbage / Purging
All related entities end up in complex joins
All relationships get more complicated over the dimension of time
Anatomy
Challenges: Round 2
Snowflake Schema
Star Schema
De-Duplication
Bloom Filters and Cuckoo Filters give one of two answers:
- The item does not exist for sure.
- The item may or may not exist.
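The two answers above can be sketched with a small Bloom filter used for de-duplication. This is a minimal illustration, not the approach from the talk: the class name `BloomFilter`, the helper `deduplicate`, the bit-array size, and the use of SHA-256 with a seed prefix are all assumptions chosen for readability.

```python
import hashlib


class BloomFilter:
    """Hypothetical minimal Bloom filter; sizes are illustrative only."""

    def __init__(self, size_bits=1 << 16, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch simple (not memory efficient).
        self.bits = bytearray(size_bits)

    def _positions(self, item):
        # Derive num_hashes positions by prefixing the item with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False: the item does not exist for sure.
        # True: the item may or may not exist (false positives are possible).
        return all(self.bits[pos] for pos in self._positions(item))


def deduplicate(event_ids, seen=None):
    """Keep the first occurrence of each id, using the filter as the 'seen' set."""
    seen = seen if seen is not None else BloomFilter()
    unique = []
    for event_id in event_ids:
        if not seen.might_contain(event_id):
            seen.add(event_id)
            unique.append(event_id)
    return unique
```

Because a "may exist" answer can be a false positive, this sketch can occasionally drop an event it has never seen. A pipeline that cannot tolerate that would confirm "may exist" answers against an exact store.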
Slowly Changing Dimensions
Batching vs Streaming
Out-of-Order Processing
Cubes
• Efficiency of Retrieval
• Warehouse:Cube :: DB:Table
• View: Dimension + Measure
• Slice, Dice & Rotate
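As a rough sketch of "Dimension + Measure" and of slice, dice and rotate, the code below pre-aggregates a handful of fact rows into an in-memory cube. The sample rows, the dimension names (`region`, `product`, `quarter`), and every function name are hypothetical; real cubes are built by a warehouse or OLAP engine, not by hand.

```python
from collections import defaultdict

# Hypothetical fact rows: three dimensions followed by one measure (sales).
DIMENSIONS = ("region", "product", "quarter")
FACTS = [
    ("north", "widget", "Q1", 100),
    ("north", "gadget", "Q1", 50),
    ("south", "widget", "Q1", 70),
    ("north", "widget", "Q2", 120),
    ("south", "gadget", "Q2", 30),
]


def build_cube(facts):
    """Aggregate the measure for every combination of dimension values."""
    cube = defaultdict(int)
    for *dims, measure in facts:
        cube[tuple(dims)] += measure
    return dict(cube)


def slice_cube(cube, dimension, value):
    """Slice: fix a single dimension to one value."""
    idx = DIMENSIONS.index(dimension)
    return {key: total for key, total in cube.items() if key[idx] == value}


def dice_cube(cube, **allowed):
    """Dice: restrict several dimensions to sets of allowed values."""
    def keep(key):
        return all(key[DIMENSIONS.index(dim)] in values
                   for dim, values in allowed.items())
    return {key: total for key, total in cube.items() if keep(key)}


def rotate(cube, rows, cols):
    """Rotate (pivot): view the measure as a rows-by-cols table."""
    r, c = DIMENSIONS.index(rows), DIMENSIONS.index(cols)
    table = defaultdict(lambda: defaultdict(int))
    for key, total in cube.items():
        table[key[r]][key[c]] += total
    return {row: dict(cells) for row, cells in table.items()}
```

Calling `rotate(cube, "region", "quarter")` and then `rotate(cube, "quarter", "region")` shows the same totals with the axes swapped, which is the point of rotating a view rather than re-querying the underlying rows.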
Architecture: Revisited
Sample Solution
Thank you! Piyush Verma (@meson10), Oogway Consulting, http://oogway.in