$30 off During Our Annual Pro Sale. View Details »
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Understanding the Basics of Data Analysis
Search
Piyush Verma
November 13, 2017
Technology
0
330
Understanding the Basics of Data Analysis
Read more on
http://blog.oogway.in
Piyush Verma
November 13, 2017
Tweet
Share
More Decks by Piyush Verma
See All by Piyush Verma
SLOs that Lie
meson10
0
110
Doing SRE the right way - 2
meson10
0
170
Doing SRE the right way
meson10
0
1k
Observability and Control Theory
meson10
1
1.1k
Reliability
meson10
0
140
Reliability of Distributed Systems
meson10
0
260
My TLS was broken
meson10
0
130
Technology that builds Organizations
meson10
0
130
Namespace.go
meson10
0
150
Other Decks in Technology
See All in Technology
Oracle Cloud Infrastructure:2025年11月度サービス・アップデート
oracle4engineer
PRO
1
120
プロダクトマネジメントの分業が生む「デリバリーの渋滞」を解消するTPMの越境
recruitengineers
PRO
3
460
Docker, Infraestructuras seguras y Hardening
josejuansanchez
0
150
“決まらない”NSM設計への処方箋 〜ビットキーにおける現実的な指標デザイン事例〜 / A Prescription for "Stuck" NSM Design: Bitkey’s Practical Case Study
bitkey
PRO
1
360
Claude Code はじめてガイド -1時間で学べるAI駆動開発の基本と実践-
oikon48
43
26k
ML PM Talk #1 - ML PMの分類に関する考察
lycorptech_jp
PRO
1
550
名刺メーカーDevグループ 紹介資料
sansan33
PRO
0
980
原理から解き明かす AIと人間の成長 - Progate BAR
teba_eleven
2
300
生成AI・AIエージェント時代、データサイエンティストは何をする人なのか?そして、今学生であるあなたは何を学ぶべきか?
kuri8ive
2
1.9k
Eight Engineering Unit 紹介資料
sansan33
PRO
0
5.7k
AI (LLM) を活用する上で必須級のMCPをAmazon Q Developerで学ぼう / 20251127 Ikuma Yamashita
shift_evolve
PRO
2
100
Playwrightのソースコードに見る、自動テストを自動で書く技術
yusukeiwaki
8
2.9k
Featured
See All Featured
Testing 201, or: Great Expectations
jmmastey
46
7.8k
Facilitating Awesome Meetings
lara
57
6.7k
Become a Pro
speakerdeck
PRO
30
5.7k
Responsive Adventures: Dirty Tricks From The Dark Corners of Front-End
smashingmag
253
22k
Put a Button on it: Removing Barriers to Going Fast.
kastner
60
4.1k
実際に使うSQLの書き方 徹底解説 / pgcon21j-tutorial
soudai
PRO
196
69k
Helping Users Find Their Own Way: Creating Modern Search Experiences
danielanewman
31
3k
Chrome DevTools: State of the Union 2024 - Debugging React & Beyond
addyosmani
9
1k
How GitHub (no longer) Works
holman
316
140k
ピンチをチャンスに:未来をつくるプロダクトロードマップ #pmconf2020
aki_iinuma
128
54k
The Web Performance Landscape in 2024 [PerfNow 2024]
tammyeverts
12
960
Automating Front-end Workflow
addyosmani
1371
200k
Transcript
ABC of Distributed Data Processing. Achieving Buzzword Compliance. 1 Piyush
Verma Oogway Consulting
Common thoughts 2
When will my Data become Big Data?
Hive Data Will Save.
How did we reach here? 5
Data :: Business
Data :: Business
Types of Workload
When do I call it Big Enough?
Why bother with Data Engineering? 10
Why do analysis at all?
Descriptive - Historical. - Deterministic. - Inferential. - Managers make
pretty graphs.
Predictive - Future. - Probabilistic. - Based on Descriptive. -
This is what armchair critics do.
Prescriptive
Architecture: Round 1 15
What does data look like?
Storage Choice 1
Storage Choice 2
Challenges: Round 1 19
Scaling
Archival Policy
Oh no
Garbage / Purging
All related entities end up in complex joins
All Relationships complicate over Dimension of time
Anatomy 26
Anatomy
Challenges: Round 2 28
Snowflake Schema
Star Schema
De-Duplication
Bloom Filters Cuckoo Filters - Does not exist for sure.
- May or may not exist.
Slow Changing Dimensions
Batching vs Streaming
Out-of-Order Processing
Cubes • Efficiency of Retrieval • Warehouse:Cube :: DB:Table •
View: Dimension + Measure • Slice, Dice & Rotate
Architecture: Revisited 37
Sample Solution
Thank you! Piyush Verma @meson10 Oogway Consulting http://oogway.in