Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Understanding the Basics of Data Analysis
Search
Piyush Verma
November 13, 2017
Technology
0
320
Understanding the Basics of Data Analysis
Read more on
http://blog.oogway.in
Piyush Verma
November 13, 2017
Tweet
Share
More Decks by Piyush Verma
See All by Piyush Verma
SLOs that Lie
meson10
0
98
Doing SRE the right way - 2
meson10
0
160
Doing SRE the right way
meson10
0
980
Observability and Control Theory
meson10
1
1k
Reliability
meson10
0
140
Reliability of Distributed Systems
meson10
0
240
My TLS was broken
meson10
0
120
Technology that builds Organizations
meson10
0
110
Namespace.go
meson10
0
130
Other Decks in Technology
See All in Technology
o11yツールを乗り換えた話
tak0x00
2
1.7k
20250818_KGX・One Hokkaidoコラボイベント
tohgeyukihiro
0
130
MySQL HeatWave:サービス概要のご紹介
oracle4engineer
PRO
4
1.6k
JOAI発表資料 @ 関東kaggler会
joai_committee
1
170
AWSの最新サービスでAIエージェント構築に楽しく入門しよう
minorun365
PRO
9
540
Telemetry APIから学ぶGoogle Cloud ObservabilityとOpenTelemetryの現在 / getting-started-telemetry-api-with-google-cloud
k6s4i53rx
0
170
生成AI利用プログラミング:誰でもプログラムが書けると 世の中どうなる?/opencampus202508
okana2ki
0
180
ABEMAにおける 生成AI活用の現在地 / The Current Status of Generative AI at ABEMA
dekatotoro
0
550
メルカリIBIS:AIが拓く次世代インシデント対応
0gm
2
490
アカデミーキャンプ 2025 SuuuuuuMMeR「燃えろ!!ロボコン」 / Academy Camp 2025 SuuuuuuMMeR "Burn the Spirit, Robocon!!" DAY 1
ks91
PRO
0
150
文字列の並び順 / String Collation
tmtms
1
120
AIと描く、未来のBacklog 〜プロジェクト管理の次の10年を想像し、創造するセッション〜
hrm_o25
0
120
Featured
See All Featured
Put a Button on it: Removing Barriers to Going Fast.
kastner
60
4k
Performance Is Good for Brains [We Love Speed 2024]
tammyeverts
10
1k
Statistics for Hackers
jakevdp
799
220k
Facilitating Awesome Meetings
lara
55
6.5k
Mobile First: as difficult as doing things right
swwweet
223
9.9k
Visualizing Your Data: Incorporating Mongo into Loggly Infrastructure
mongodb
48
9.6k
[Rails World 2023 - Day 1 Closing Keynote] - The Magic of Rails
eileencodes
36
2.5k
Understanding Cognitive Biases in Performance Measurement
bluesmoon
29
1.8k
4 Signs Your Business is Dying
shpigford
184
22k
Build your cross-platform service in a week with App Engine
jlugia
231
18k
Six Lessons from altMBA
skipperchong
28
4k
"I'm Feeling Lucky" - Building Great Search Experiences for Today's Users (#IAC19)
danielanewman
229
22k
Transcript
ABC of Distributed Data Processing. Achieving Buzzword Compliance. 1 Piyush
Verma Oogway Consulting
Common thoughts 2
When will my Data become Big Data?
Hive Data Will Save.
How did we reach here? 5
Data :: Business
Data :: Business
Types of Workload
When do I call it Big Enough?
Why bother with Data Engineering? 10
Why do analysis at all?
Descriptive - Historical. - Deterministic. - Inferential. - Managers make
pretty graphs.
Predictive - Future. - Probabilistic. - Based on Descriptive. -
This is what armchair critics do.
Prescriptive
Architecture: Round 1 15
What does data look like?
Storage Choice 1
Storage Choice 2
Challenges: Round 1 19
Scaling
Archival Policy
Oh no
Garbage / Purging
All related entities end up in complex joins
All Relationships complicate over Dimension of time
Anatomy 26
Anatomy
Challenges: Round 2 28
Snowflake Schema
Star Schema
De-Duplication
Bloom Filters Cuckoo Filters - Does not exist for sure.
- May or may not exist.
Slow Changing Dimensions
Batching vs Streaming
Out-of-Order Processing
Cubes • Efficiency of Retrieval • Warehouse:Cube :: DB:Table •
View: Dimension + Measure • Slice, Dice & Rotate
Architecture: Revisited 37
Sample Solution
Thank you! Piyush Verma @meson10 Oogway Consulting http://oogway.in