Understanding the Basics of Data Analysis
Piyush Verma
November 13, 2017
Read more on
http://blog.oogway.in
Transcript
ABC of Distributed Data Processing. Achieving Buzzword Compliance.
Piyush Verma, Oogway Consulting
Common thoughts
When will my Data become Big Data?
Hive Data Will Save.
How did we reach here?
Data :: Business
Types of Workload
When do I call it Big Enough?
Why bother with Data Engineering?
Why do analysis at all?
Descriptive
- Historical.
- Deterministic.
- Inferential.
- Managers make pretty graphs.
Predictive
- Future.
- Probabilistic.
- Based on Descriptive.
- This is what armchair critics do.
Prescriptive
Architecture: Round 1
What does data look like?
Storage Choice 1
Storage Choice 2
Challenges: Round 1
Scaling
Archival Policy
Oh no
Garbage / Purging
All related entities end up in complex joins
All relationships get more complicated over the dimension of time
Anatomy
Challenges: Round 2
Snowflake Schema
Star Schema
De-Duplication
Bloom Filters, Cuckoo Filters
- Does not exist for sure.
- May or may not exist.
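The two answers above ("does not exist for sure" and "may or may not exist") can be sketched with a small Bloom filter. This is a minimal illustration, not the implementation from the talk: the class name, the bit-array size, the number of hashes, and the `evt-*` identifiers are all assumptions made for the example, and Cuckoo filters are not shown.

```python
import hashlib


class BloomFilter:
    """A tiny Bloom filter over a fixed-size bit array (illustrative only)."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive num_hashes positions by salting a SHA-256 digest with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False -> the item was definitely never added.
        # True  -> the item may or may not have been added (false positives happen).
        return all(self.bits[pos] for pos in self._positions(item))


# Hypothetical event ids used for de-duplication.
seen = BloomFilter()
for event_id in ["evt-1", "evt-2", "evt-3"]:
    seen.add(event_id)
```

In a de-duplication pipeline, a `False` from `might_contain` means the record can be accepted without a lookup, while a `True` still needs a check against the real store.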
Slowly Changing Dimensions
Batching vs Streaming
Out-of-Order Processing
Cubes
• Efficiency of Retrieval
• Warehouse:Cube :: DB:Table
• View: Dimension + Measure
• Slice, Dice & Rotate
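The cube operations listed above can be sketched in plain Python. This is a rough illustration under assumed data: the `region` and `month` dimensions, the `sales` measure, and every function name here are hypothetical, not taken from the talk.

```python
from collections import defaultdict

# Hypothetical fact rows: two dimensions (region, month) and one measure (sales).
FACTS = [
    {"region": "north", "month": "jan", "sales": 10},
    {"region": "north", "month": "feb", "sales": 20},
    {"region": "south", "month": "jan", "sales": 5},
    {"region": "south", "month": "feb", "sales": 15},
]


def rollup(rows, dimensions, measure):
    """View: sum the measure for every combination of the given dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


def slice_rows(rows, dimension, value):
    """Slice: fix a single dimension to one value."""
    return [r for r in rows if r[dimension] == value]


def dice_rows(rows, allowed):
    """Dice: keep rows whose dimensions fall inside the allowed value sets."""
    return [r for r in rows if all(r[d] in vals for d, vals in allowed.items())]


by_region = rollup(FACTS, ["region"], "sales")
jan_by_region = rollup(slice_rows(FACTS, "month", "jan"), ["region"], "sales")
north_feb = dice_rows(FACTS, {"region": {"north"}, "month": {"feb"}})

# Rotate (pivot): the same cells, viewed with the dimension order swapped.
region_month = rollup(FACTS, ["region", "month"], "sales")
month_region = rollup(FACTS, ["month", "region"], "sales")
```

A real cube would precompute these aggregates so retrieval stays cheap; here each view is recomputed from the fact rows to keep the sketch short.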
Architecture: Revisited
Sample Solution
Thank you! Piyush Verma @meson10 Oogway Consulting http://oogway.in