Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Understanding the Basics of Data Analysis
Search
Piyush Verma
November 13, 2017
Technology
0
320
Understanding the Basics of Data Analysis
Read more on
http://blog.oogway.in
Piyush Verma
November 13, 2017
Tweet
Share
More Decks by Piyush Verma
See All by Piyush Verma
SLOs that Lie
meson10
0
99
Doing SRE the right way - 2
meson10
0
160
Doing SRE the right way
meson10
0
980
Observability and Control Theory
meson10
1
1.1k
Reliability
meson10
0
140
Reliability of Distributed Systems
meson10
0
250
My TLS was broken
meson10
0
120
Technology that builds Organizations
meson10
0
110
Namespace.go
meson10
0
140
Other Decks in Technology
See All in Technology
JTCにおける内製×スクラム開発への挑戦〜内製化率95%達成の舞台裏/JTC's challenge of in-house development with Scrum
aeonpeople
0
240
なぜテストマネージャの視点が 必要なのか? 〜 一歩先へ進むために 〜
moritamasami
0
230
20250913_JAWS_sysad_kobe
takuyay0ne
2
230
スマートファクトリーの第一歩 〜AWSマネージドサービスで 実現する予知保全と生成AI活用まで
ganota
2
220
AWSで始める実践Dagster入門
kitagawaz
1
630
現場で効くClaude Code ─ 最新動向と企業導入
takaakikakei
1
250
Terraformで構築する セルフサービス型データプラットフォーム / terraform-self-service-data-platform
pei0804
1
180
Generative AI Japan 第一回生成AI実践研究会「AI駆動開発の現在地──ブレイクスルーの鍵を握るのはデータ領域」
shisyu_gaku
0
300
AIエージェントで90秒の広告動画を制作!台本・音声・映像・編集をつなぐAWS最新アーキテクチャの実践
nasuvitz
0
190
Webアプリケーションにオブザーバビリティを実装するRust入門ガイド
nwiizo
7
860
「何となくテストする」を卒業するためにプロダクトが動く仕組みを理解しよう
kawabeaver
0
420
Modern Linux
oracle4engineer
PRO
0
110
Featured
See All Featured
個人開発の失敗を避けるイケてる考え方 / tips for indie hackers
panda_program
113
20k
Large-scale JavaScript Application Architecture
addyosmani
513
110k
The Success of Rails: Ensuring Growth for the Next 100 Years
eileencodes
46
7.6k
Agile that works and the tools we love
rasmusluckow
330
21k
Navigating Team Friction
lara
189
15k
Building a Scalable Design System with Sketch
lauravandoore
462
33k
Music & Morning Musume
bryan
46
6.8k
"I'm Feeling Lucky" - Building Great Search Experiences for Today's Users (#IAC19)
danielanewman
229
22k
The Straight Up "How To Draw Better" Workshop
denniskardys
236
140k
Become a Pro
speakerdeck
PRO
29
5.5k
Keith and Marios Guide to Fast Websites
keithpitt
411
22k
The Language of Interfaces
destraynor
161
25k
Transcript
ABC of Distributed Data Processing. Achieving Buzzword Compliance. 1 Piyush
Verma Oogway Consulting
Common thoughts 2
When will my Data become Big Data?
Hive Data Will Save.
How did we reach here? 5
Data :: Business
Data :: Business
Types of Workload
When do I call it Big Enough?
Why bother with Data Engineering? 10
Why do analysis at all?
Descriptive - Historical. - Deterministic. - Inferential. - Managers make
pretty graphs.
Predictive - Future. - Probabilistic. - Based on Descriptive. -
This is what armchair critics do.
Prescriptive
Architecture: Round 1 15
What does data look like?
Storage Choice 1
Storage Choice 2
Challenges: Round 1 19
Scaling
Archival Policy
Oh no
Garbage / Purging
All related entities end up in complex joins
All Relationships complicate over Dimension of time
Anatomy 26
Anatomy
Challenges: Round 2 28
Snowflake Schema
Star Schema
De-Duplication
Bloom Filters Cuckoo Filters - Does not exist for sure.
- May or may not exist.
Slow Changing Dimensions
Batching vs Streaming
Out-of-Order Processing
Cubes • Efficiency of Retrieval • Warehouse:Cube :: DB:Table •
View: Dimension + Measure • Slice, Dice & Rotate
Architecture: Revisited 37
Sample Solution
Thank you! Piyush Verma @meson10 Oogway Consulting http://oogway.in