Understanding the Basics of Data Analysis
Piyush Verma
November 13, 2017
Read more on http://blog.oogway.in
Transcript
ABC of Distributed Data Processing. Achieving Buzzword Compliance.
Piyush Verma, Oogway Consulting
Common thoughts
When will my Data become Big Data?
Hive Data Will Save.
How did we reach here?
Data :: Business
Types of Workload
When do I call it Big Enough?
Why bother with Data Engineering?
Why do analysis at all?
Descriptive
- Historical.
- Deterministic.
- Inferential.
- Managers make pretty graphs.
Predictive
- Future.
- Probabilistic.
- Based on Descriptive.
- This is what armchair critics do.
Prescriptive
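The difference between the descriptive and predictive slides can be sketched in a few lines. This is a minimal illustration, not something from the deck: the `monthly_orders` series and the straight-line trend are assumptions made for the example, and the prescriptive step is not covered. The descriptive part summarizes history deterministically; the predictive part builds on that history to estimate a future value. A real forecast would also carry an uncertainty estimate.

```python
# Hypothetical history: orders per month, oldest first (illustrative data).
monthly_orders = [100, 110, 120, 130]


def describe(values):
    """Descriptive: deterministic summary of what already happened."""
    return {
        "count": len(values),
        "mean": sum(values) / len(values),
        "min": min(values),
        "max": max(values),
    }


def fit_trend(values):
    """Least-squares straight line through (0, v0), (1, v1), ..."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    var = sum((x - mean_x) ** 2 for x in range(n))
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(values, steps_ahead=1):
    """Predictive: extrapolate the fitted trend into the future."""
    slope, intercept = fit_trend(values)
    return intercept + slope * (len(values) - 1 + steps_ahead)


summary = describe(monthly_orders)
forecast = predict(monthly_orders)
```

Here `summary` only restates the past, while `forecast` is an estimate for a month that has not happened yet and depends entirely on the assumed trend model.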
Architecture: Round 1 15
What does data look like?
Storage Choice 1
Storage Choice 2
Challenges: Round 1
Scaling
Archival Policy
Oh no
Garbage / Purging
All related entities end up in complex joins
All relationships grow more complicated over the dimension of time
Anatomy
Challenges: Round 2
Snowflake Schema
Star Schema
De-Duplication
Bloom Filters, Cuckoo Filters
- Does not exist for sure.
- May or may not exist.
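The two answers above are what make these filters useful for de-duplication. The following is a minimal Bloom filter sketch, not the deck's implementation: the class name, the sizes (`num_bits`, `num_hashes`) and the salted SHA-256 hashing scheme are all assumptions chosen for readability. A Cuckoo filter gives the same two answers and additionally supports deletion; it is not shown here.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: k hash positions per item over an m-slot array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive k positions by salting a SHA-256 digest with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False -> does not exist for sure; True -> may or may not exist.
        return all(self.bits[pos] for pos in self._positions(item))


def deduplicate(event_ids, seen):
    """Yield ids that the filter has definitely not seen before."""
    for event_id in event_ids:
        if not seen.might_contain(event_id):
            seen.add(event_id)
            yield event_id


unique = list(deduplicate(["a", "b", "a", "c", "b"], BloomFilter()))
```

The trade-off is that a false positive silently drops a genuinely new event, so the filter is usually sized for an acceptable error rate or backed by an exact lookup whenever it answers "may exist".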
Slowly Changing Dimensions
Batching vs Streaming
Out-of-Order Processing
Cubes
• Efficiency of Retrieval
• Warehouse:Cube :: DB:Table
• View: Dimension + Measure
• Slice, Dice & Rotate
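As a rough illustration of "Dimension + Measure" and of slice, dice and rotate, here is a small in-memory sketch. It is not how a warehouse implements cubes: the `facts` rows, the dimension names and the helper functions are all hypothetical. The cube is just the measure pre-aggregated per combination of dimension values, which is where the retrieval efficiency comes from.

```python
from collections import defaultdict

# Hypothetical fact rows: two dimensions (region, month), one measure (sales).
facts = [
    {"region": "north", "month": "jan", "sales": 10},
    {"region": "north", "month": "feb", "sales": 20},
    {"region": "south", "month": "jan", "sales": 5},
    {"region": "south", "month": "feb", "sales": 15},
]


def build_cube(rows, dimensions, measure):
    """Pre-aggregate the measure for every combination of dimension values."""
    cube = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        cube[key] += row[measure]
    return dict(cube)


def slice_cube(cube, dimensions, **fixed):
    """Slice (one fixed dimension) or dice (several): keep matching cells."""
    return {
        key: value
        for key, value in cube.items()
        if all(key[dimensions.index(d)] == v for d, v in fixed.items())
    }


def rotate(cube, dimensions, new_order):
    """Rotate (pivot): re-key the cells with the dimensions in a new order."""
    idx = [dimensions.index(d) for d in new_order]
    return {tuple(key[i] for i in idx): value for key, value in cube.items()}


dims = ["region", "month"]
cube = build_cube(facts, dims, "sales")
january = slice_cube(cube, dims, month="jan")
south_feb = slice_cube(cube, dims, region="south", month="feb")
by_month = rotate(cube, dims, ["month", "region"])
```

Queries then read from the pre-aggregated cells rather than rescanning the fact rows.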
Architecture: Revisited
Sample Solution
Thank you!
Piyush Verma (@meson10), Oogway Consulting
http://oogway.in