Understanding the Basics of Data Analysis
Piyush Verma
November 13, 2017
Read more on http://blog.oogway.in
Transcript
ABC of Distributed Data Processing: Achieving Buzzword Compliance
Piyush Verma, Oogway Consulting
Common thoughts
When will my Data become Big Data?
Hive Data Will Save.
How did we reach here?
Data :: Business
Types of Workload
When do I call it Big Enough?
Why bother with Data Engineering?
Why do analysis at all?
Descriptive
- Historical.
- Deterministic.
- Inferential.
- Managers make pretty graphs.
Predictive
- Future.
- Probabilistic.
- Based on descriptive analysis.
- This is what armchair critics do.
Prescriptive
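To make the descriptive/predictive split concrete, here is a minimal Python sketch on made-up monthly order counts. Descriptive analysis summarises the history as it is; predictive analysis builds on that history to estimate the next value. The least-squares trend line is an assumption chosen for illustration, not a model the slides name, and prescriptive analysis (choosing an action from the forecast) is not shown.

```python
from statistics import mean

# Hypothetical monthly order counts; illustrative data only.
history = [120, 135, 150, 160, 178, 190]

def describe(values):
    """Descriptive: summarise what already happened (deterministic)."""
    return {"count": len(values), "mean": mean(values),
            "min": min(values), "max": max(values)}

def predict_next(values):
    """Predictive: fit a least-squares line to the history and extrapolate one step."""
    n = len(values)
    xs = list(range(n))
    x_bar, y_bar = mean(xs), mean(values)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return intercept + slope * n  # estimate for the next, unseen month

print(describe(history))                # {'count': 6, 'mean': 155.5, 'min': 120, 'max': 190}
print(round(predict_next(history), 1))  # 204.4
```

Note that the forecast is only as good as the descriptive data it is fitted to, which is why predictive work sits on top of descriptive work.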
Architecture: Round 1
What does data look like?
Storage Choice 1
Storage Choice 2
Challenges: Round 1
Scaling
Archival Policy
Oh no
Garbage / Purging
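Archival and purging get much simpler when data is partitioned by time, because whole partitions can be moved or dropped instead of deleting rows one by one. A minimal sketch, assuming daily partitions and two hypothetical thresholds (`HOT_DAYS`, `RETENTION_DAYS`) that the slides do not specify:

```python
from datetime import date, timedelta

# Hypothetical policy values; the slides do not give concrete numbers.
HOT_DAYS = 30          # keep in the primary store
RETENTION_DAYS = 365   # keep in the archive until this age, then purge

def classify_partitions(partitions, today):
    """Split daily partitions into keep / archive / purge buckets by age in days."""
    plan = {"keep": [], "archive": [], "purge": []}
    for day in sorted(partitions):
        age = (today - day).days
        if age < HOT_DAYS:
            plan["keep"].append(day)
        elif age < RETENTION_DAYS:
            plan["archive"].append(day)
        else:
            plan["purge"].append(day)
    return plan

today = date(2017, 11, 13)
partitions = [today - timedelta(days=d) for d in (1, 29, 30, 200, 365, 400)]
plan = classify_partitions(partitions, today)
for bucket, days in plan.items():
    print(bucket, [d.isoformat() for d in days])
```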
All related entities end up in complex joins.
All relationships grow more complicated along the dimension of time.
Anatomy
Challenges: Round 2
Snowflake Schema
Star Schema
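A star schema puts the measures in one central fact table and references each dimension table directly. The sketch below uses Python's built-in `sqlite3` with hypothetical table and column names (`fact_sales`, `dim_product`, `dim_date`) purely for illustration:

```python
import sqlite3

# Hypothetical retail star schema: one fact table referencing denormalised dimensions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, day TEXT, month TEXT);
CREATE TABLE fact_sales  (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id    INTEGER REFERENCES dim_date(date_id),
    quantity   INTEGER,
    amount     REAL
);
INSERT INTO dim_product VALUES (1, 'Tea', 'Beverage'), (2, 'Cake', 'Bakery');
INSERT INTO dim_date    VALUES (1, '2017-11-01', '2017-11'), (2, '2017-12-01', '2017-12');
INSERT INTO fact_sales  VALUES (1, 1, 3, 6.0), (2, 1, 1, 4.5), (1, 2, 2, 4.0);
""")

# Typical star-schema query: join the fact table to each dimension it needs, then aggregate.
rows = conn.execute("""
SELECT p.category, d.month, SUM(f.amount)
FROM fact_sales f
JOIN dim_product p ON p.product_id = f.product_id
JOIN dim_date d    ON d.date_id = f.date_id
GROUP BY p.category, d.month
ORDER BY p.category, d.month
""").fetchall()
print(rows)  # [('Bakery', '2017-11', 4.5), ('Beverage', '2017-11', 6.0), ('Beverage', '2017-12', 4.0)]
```

In a snowflake schema, `category` would instead move into its own normalised table referenced from `dim_product`, which saves repetition at the cost of one more join per query.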
De-Duplication
Bloom Filters, Cuckoo Filters
- "No" means the item does not exist, for sure.
- "Yes" means the item may or may not exist.
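The one-sided guarantee above is what makes these filters useful for de-duplication. Below is a minimal Bloom filter sketch in Python; the sizes (`m_bits`, `k_hashes`) and the salted SHA-256 hashing are illustrative choices, not tuned values. Cuckoo filters give the same kind of answer with a different layout and are not shown here.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash positions over an m-bit array."""

    def __init__(self, m_bits=1024, k_hashes=3):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions from salted SHA-256 digests (simple, not the fastest option).
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False: definitely never added. True: possibly added (false positives allowed).
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

seen = BloomFilter()
for event_id in ["evt-1", "evt-2", "evt-3"]:
    seen.add(event_id)

print(seen.might_contain("evt-2"))    # True: it was added
print(seen.might_contain("evt-999"))  # False unless a false positive occurs
```

For de-duplication, a "no" lets a record through cheaply, while a "yes" should be confirmed against an exact store before the record is dropped, since it may be a false positive.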
Slowly Changing Dimensions
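One common way to handle a slowly changing dimension is the "Type 2" approach: instead of overwriting an attribute, close the current row and append a new version with a validity range, so facts can still be joined to the value that held at the time. A minimal in-memory sketch; the `CustomerVersion` record and its fields are hypothetical:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class CustomerVersion:
    customer_id: int
    city: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version

def apply_change(history, customer_id, new_city, effective):
    """Type 2 SCD: close the current row and append a new version instead of overwriting."""
    current = next((r for r in history
                    if r.customer_id == customer_id and r.valid_to is None), None)
    if current is not None:
        if current.city == new_city:
            return history  # no change, nothing to version
        current.valid_to = effective
    history.append(CustomerVersion(customer_id, new_city, effective))
    return history

def city_as_of(history, customer_id, when):
    """Look up the attribute value that was valid on a given date."""
    for r in history:
        if (r.customer_id == customer_id and r.valid_from <= when
                and (r.valid_to is None or when < r.valid_to)):
            return r.city
    return None

history = []
apply_change(history, 42, "Pune", date(2016, 1, 1))
apply_change(history, 42, "Bengaluru", date(2017, 6, 1))
print(city_as_of(history, 42, date(2017, 1, 1)))    # Pune
print(city_as_of(history, 42, date(2017, 11, 13)))  # Bengaluru
```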
Batching vs Streaming
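The difference is mostly about when an answer exists. A batch job waits for the complete dataset and computes once; a streaming job keeps running state and updates it per event. A minimal sketch with a hypothetical event list, counting event types both ways:

```python
from collections import Counter

events = ["login", "click", "click", "logout", "click"]

def batch_count(all_events):
    """Batch: wait for the whole dataset, then compute the answer once."""
    return Counter(all_events)

class StreamingCount:
    """Streaming: update state per event, so a (partial) answer is available at any time."""

    def __init__(self):
        self.counts = Counter()

    def on_event(self, event):
        self.counts[event] += 1
        return self.counts[event]

stream = StreamingCount()
for e in events:
    stream.on_event(e)

print(batch_count(events) == stream.counts)  # True: same final answer, different latency
```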
Out-of-Order Processing
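In a stream, events can arrive in a different order from when they happened. A common remedy is to group by event time and use a watermark, a lower bound on event times still expected, to decide when a window is final. The sketch below uses tumbling windows and a watermark that trails the largest event time seen; `WINDOW` and `ALLOWED_LATENESS` are illustrative values, and dropping late events is one policy among several:

```python
from collections import defaultdict

WINDOW = 10           # seconds per tumbling window (illustrative)
ALLOWED_LATENESS = 5  # how far the watermark trails the max event time seen (illustrative)

def process(events):
    """Count events per event-time window; emit a window once the watermark passes its end.

    events: iterable of (event_time_seconds, payload) in *arrival* order.
    Returns (emitted window counts keyed by window start, events dropped as too late).
    """
    open_windows = defaultdict(int)
    emitted, dropped = {}, []
    watermark = float("-inf")
    for ts, payload in events:
        start = ts - ts % WINDOW
        if start + WINDOW <= watermark:
            dropped.append((ts, payload))  # its window is already closed
            continue
        open_windows[start] += 1
        watermark = max(watermark, ts - ALLOWED_LATENESS)
        # Close every window whose end is at or before the watermark.
        for s in sorted(open_windows):
            if s + WINDOW <= watermark:
                emitted[s] = open_windows.pop(s)
    # End of input: flush whatever is still open.
    for s in sorted(open_windows):
        emitted[s] = open_windows[s]
    return emitted, dropped

arrivals = [(1, "a"), (12, "b"), (7, "c"), (18, "d"), (25, "e"), (3, "f"), (21, "g")]
emitted, dropped = process(arrivals)
print(emitted)  # {0: 2, 10: 2, 20: 2}
print(dropped)  # [(3, 'f')]
```

Here event 7 arrives after event 12 but is still counted, because its window had not yet closed; event 3 arrives after the watermark has passed its window and is dropped.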
Cubes
- Efficiency of retrieval.
- Warehouse : Cube :: DB : Table (a cube is to a warehouse what a table is to a database).
- View: dimension + measure.
- Slice, dice & rotate.
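A cube pre-aggregates a measure over its dimensions so views can be read back cheaply. The sketch below builds a tiny cube from hypothetical (region, product, quarter, revenue) rows and shows slice (fix one dimension), dice (restrict several dimensions) and rotate, also called pivot (swap which dimensions form rows and columns). It is an in-memory illustration, not how an OLAP engine stores cubes:

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, quarter, revenue).
facts = [
    ("North", "Tea", "Q1", 100), ("North", "Cake", "Q1", 40),
    ("South", "Tea", "Q1", 70),  ("South", "Tea", "Q2", 90),
    ("North", "Tea", "Q2", 120), ("South", "Cake", "Q2", 30),
]
DIMS = ("region", "product", "quarter")

def build_cube(rows):
    """Aggregate the measure (revenue) per combination of dimension values."""
    cube = defaultdict(int)
    for region, product, quarter, revenue in rows:
        cube[(region, product, quarter)] += revenue
    return dict(cube)

def slice_(cube, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return {k: v for k, v in cube.items() if k[i] == value}

def dice(cube, **allowed):
    """Dice: keep a sub-cube by restricting several dimensions to sets of values."""
    idx = {DIMS.index(d): set(vals) for d, vals in allowed.items()}
    return {k: v for k, v in cube.items() if all(k[i] in vals for i, vals in idx.items())}

def pivot(cube, row_dim, col_dim):
    """Rotate: view the measure as a row_dim x col_dim grid, summing over other dimensions."""
    r, c = DIMS.index(row_dim), DIMS.index(col_dim)
    grid = defaultdict(lambda: defaultdict(int))
    for k, v in cube.items():
        grid[k[r]][k[c]] += v
    return {row: dict(cols) for row, cols in grid.items()}

cube = build_cube(facts)
print(sum(slice_(cube, "quarter", "Q1").values()))   # 210
print(dice(cube, region=["North"], product=["Tea"]))  # {('North', 'Tea', 'Q1'): 100, ('North', 'Tea', 'Q2'): 120}
print(pivot(cube, "product", "quarter"))  # {'Tea': {'Q1': 170, 'Q2': 210}, 'Cake': {'Q1': 40, 'Q2': 30}}
print(pivot(cube, "quarter", "product"))  # {'Q1': {'Tea': 170, 'Cake': 40}, 'Q2': {'Tea': 210, 'Cake': 30}}
```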
Architecture: Revisited
Sample Solution
Thank you! Piyush Verma (@meson10), Oogway Consulting, http://oogway.in