Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Understanding the Basics of Data Analysis
Search
Piyush Verma
November 13, 2017
Technology
0
330
Understanding the Basics of Data Analysis
Read more on
http://blog.oogway.in
Piyush Verma
November 13, 2017
Tweet
Share
More Decks by Piyush Verma
See All by Piyush Verma
SLOs that Lie
meson10
0
100
Doing SRE the right way - 2
meson10
0
160
Doing SRE the right way
meson10
0
990
Observability and Control Theory
meson10
1
1.1k
Reliability
meson10
0
140
Reliability of Distributed Systems
meson10
0
250
My TLS was broken
meson10
0
120
Technology that builds Organizations
meson10
0
120
Namespace.go
meson10
0
150
Other Decks in Technology
See All in Technology
カンファレンスに託児サポートがあるということ / Having Childcare Support at Conferences
nobu09
1
600
Git in Team
kawaguti
PRO
3
380
リセラー企業のテクサポ担当が考える、生成 AI 時代のトラブルシュート 2025
kazzpapa3
1
360
データ戦略部門 紹介資料
sansan33
PRO
1
3.8k
Claude Codeを駆使した初めてのiOSアプリ開発 ~ゼロから3週間でグローバルハッカソンで入賞するまで~
oikon48
10
4.7k
ソースを読むプロセスの例
sat
PRO
15
9k
LLMプロダクトの信頼性を上げるには?LLM Observabilityによる、対話型音声AIアプリケーションの安定運用
ivry_presentationmaterials
0
270
『バイトル』CTOが語る! AIネイティブ世代と切り拓くモノづくり組織
dip_tech
PRO
1
130
サイバーエージェント流クラウドコスト削減施策「みんなで金塊堀太郎」
kurochan
4
2k
AIツールでどこまでデザインを忠実に実装できるのか
oikon48
6
3.5k
Performance Insights 廃止から Database Insights 利用へ/transition-from-performance-insights-to-database-insights
emiki
0
300
スタートアップにおけるこれからの「データ整備」
shomaekawa
2
490
Featured
See All Featured
Building a Scalable Design System with Sketch
lauravandoore
463
33k
A designer walks into a library…
pauljervisheath
209
24k
Product Roadmaps are Hard
iamctodd
PRO
54
11k
GraphQLとの向き合い方2022年版
quramy
49
14k
Documentation Writing (for coders)
carmenintech
75
5.1k
XXLCSS - How to scale CSS and keep your sanity
sugarenia
248
1.3M
Speed Design
sergeychernyshev
32
1.2k
Large-scale JavaScript Application Architecture
addyosmani
514
110k
YesSQL, Process and Tooling at Scale
rocio
173
14k
Cheating the UX When There Is Nothing More to Optimize - PixelPioneers
stephaniewalter
285
14k
Faster Mobile Websites
deanohume
310
31k
Java REST API Framework Comparison - PWX 2021
mraible
34
8.9k
Transcript
ABC of Distributed Data Processing. Achieving Buzzword Compliance. 1 Piyush
Verma Oogway Consulting
Common thoughts 2
When will my Data become Big Data?
Hive Data Will Save.
How did we reach here? 5
Data :: Business
Data :: Business
Types of Workload
When do I call it Big Enough?
Why bother with Data Engineering? 10
Why do analysis at all?
Descriptive - Historical. - Deterministic. - Inferential. - Managers make
pretty graphs.
Predictive - Future. - Probabilistic. - Based on Descriptive. -
This is what armchair critics do.
Prescriptive
Architecture: Round 1 15
What does data look like?
Storage Choice 1
Storage Choice 2
Challenges: Round 1 19
Scaling
Archival Policy
Oh no
Garbage / Purging
All related entities end up in complex joins
All Relationships complicate over Dimension of time
Anatomy 26
Anatomy
Challenges: Round 2 28
Snowflake Schema
Star Schema
De-Duplication
Bloom Filters Cuckoo Filters - Does not exist for sure.
- May or may not exist.
Slow Changing Dimensions
Batching vs Streaming
Out-of-Order Processing
Cubes • Efficiency of Retrieval • Warehouse:Cube :: DB:Table •
View: Dimension + Measure • Slice, Dice & Rotate
Architecture: Revisited 37
Sample Solution
Thank you! Piyush Verma @meson10 Oogway Consulting http://oogway.in