Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Understanding the Basics of Data Analysis
Search
Piyush Verma
November 13, 2017
Technology
0
260
Understanding the Basics of Data Analysis
Read more on
http://blog.oogway.in
Piyush Verma
November 13, 2017
Tweet
Share
More Decks by Piyush Verma
See All by Piyush Verma
SLOs that Lie
meson10
0
85
Doing SRE the right way - 2
meson10
0
150
Doing SRE the right way
meson10
0
930
Observability and Control Theory
meson10
1
940
Reliability
meson10
0
94
Reliability of Distributed Systems
meson10
0
180
My TLS was broken
meson10
0
76
Technology that builds Organizations
meson10
0
75
Namespace.go
meson10
0
83
Other Decks in Technology
See All in Technology
AWS Lambdaと歩んだ“サーバーレス”と今後 #lambda_10years
yoshidashingo
1
170
20241120_JAWS_東京_ランチタイムLT#17_AWS認定全冠の先へ
tsumita
2
290
New Relicを活用したSREの最初のステップ / NRUG OKINAWA VOL.3
isaoshimizu
2
610
Terraform未経験の御様に対してどの ように導⼊を進めていったか
tkikuchi
2
450
Introduction to Works of ML Engineer in LY Corporation
lycorp_recruit_jp
0
130
テストコード品質を高めるためにMutation Testingライブラリ・Strykerを実戦導入してみた話
ysknsid25
7
2.6k
AGIについてChatGPTに聞いてみた
blueb
0
130
OCI 運用監視サービス 概要
oracle4engineer
PRO
0
4.8k
SREが投資するAIOps ~ペアーズにおけるLLM for Developerへの取り組み~
takumiogawa
1
360
インフラとバックエンドとフロントエンドをくまなく調べて遅いアプリを早くした件
tubone24
1
430
OCI Vault 概要
oracle4engineer
PRO
0
9.7k
Adopting Jetpack Compose in Your Existing Project - GDG DevFest Bangkok 2024
akexorcist
0
110
Featured
See All Featured
Designing Dashboards & Data Visualisations in Web Apps
destraynor
229
52k
Java REST API Framework Comparison - PWX 2021
mraible
PRO
28
8.2k
GraphQLとの向き合い方2022年版
quramy
43
13k
Building Flexible Design Systems
yeseniaperezcruz
327
38k
The Art of Programming - Codeland 2020
erikaheidi
52
13k
Designing for humans not robots
tammielis
250
25k
The Myth of the Modular Monolith - Day 2 Keynote - Rails World 2024
eileencodes
16
2.1k
Large-scale JavaScript Application Architecture
addyosmani
510
110k
Optimizing for Happiness
mojombo
376
70k
Stop Working from a Prison Cell
hatefulcrawdad
267
20k
The Art of Delivering Value - GDevCon NA Keynote
reverentgeek
8
890
Reflections from 52 weeks, 52 projects
jeffersonlam
346
20k
Transcript
ABC of Distributed Data Processing. Achieving Buzzword Compliance. 1 Piyush
Verma Oogway Consulting
Common thoughts 2
When will my Data become Big Data?
Hive Data Will Save.
How did we reach here? 5
Data :: Business
Data :: Business
Types of Workload
When do I call it Big Enough?
Why bother with Data Engineering? 10
Why do analysis at all?
Descriptive - Historical. - Deterministic. - Inferential. - Managers make
pretty graphs.
Predictive - Future. - Probabilistic. - Based on Descriptive. -
This is what armchair critics do.
Prescriptive
Architecture: Round 1 15
What does data look like?
Storage Choice 1
Storage Choice 2
Challenges: Round 1 19
Scaling
Archival Policy
Oh no
Garbage / Purging
All related entities end up in complex joins
All Relationships complicate over Dimension of time
Anatomy 26
Anatomy
Challenges: Round 2 28
Snowflake Schema
Star Schema
De-Duplication
Bloom Filters Cuckoo Filters - Does not exist for sure.
- May or may not exist.
Slow Changing Dimensions
Batching vs Streaming
Out-of-Order Processing
Cubes • Efficiency of Retrieval • Warehouse:Cube :: DB:Table •
View: Dimension + Measure • Slice, Dice & Rotate
Architecture: Revisited 37
Sample Solution
Thank you! Piyush Verma @meson10 Oogway Consulting http://oogway.in