Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Understanding the Basics of Data Analysis
Search
Piyush Verma
November 13, 2017
Technology
0
310
Understanding the Basics of Data Analysis
Read more on
http://blog.oogway.in
Piyush Verma
November 13, 2017
Tweet
Share
More Decks by Piyush Verma
See All by Piyush Verma
SLOs that Lie
meson10
0
94
Doing SRE the right way - 2
meson10
0
160
Doing SRE the right way
meson10
0
980
Observability and Control Theory
meson10
1
1k
Reliability
meson10
0
130
Reliability of Distributed Systems
meson10
0
230
My TLS was broken
meson10
0
110
Technology that builds Organizations
meson10
0
110
Namespace.go
meson10
0
130
Other Decks in Technology
See All in Technology
第64回コンピュータビジョン勉強会「The PanAf-FGBG Dataset: Understanding the Impact of Backgrounds in Wildlife Behaviour Recognition」
x_ttyszk
0
140
american aa airlines®️ USA Contact Numbers: Complete 2025 Support Guide
aaguide
0
470
Amplify Gen2から知るAWS CDK Toolkit Libraryの使い方/How to use the AWS CDK Toolkit Library as known from Amplify Gen2
fossamagna
0
160
ゼロからはじめる採用広報
yutadayo
3
1k
【あのMCPって、どんな処理してるの?】 AWS CDKでの開発で便利なAWS MCP Servers特集
yoshimi0227
6
610
NewSQLや分散データベースを支えるRaftの仕組み - 仕組みを理解して知る得意不得意
hacomono
PRO
3
210
AIの全社活用を推進するための安全なレールを敷いた話
shoheimitani
2
610
TLSから見るSREの未来
atpons
2
200
Claude Code に プロジェクト管理やらせたみた
unson
7
4.9k
アクセスピークを制するオートスケール再設計: 障害を乗り越えKEDAで実現したリソース管理の最適化
myamashii
1
260
推し書籍📚 / Books and a QA Engineer
ak1210
0
120
AWS CDK 開発を成功に導くトラブルシューティングガイド
wandora58
3
150
Featured
See All Featured
Gamification - CAS2011
davidbonilla
81
5.4k
Sharpening the Axe: The Primacy of Toolmaking
bcantrill
44
2.4k
Improving Core Web Vitals using Speculation Rules API
sergeychernyshev
18
980
Statistics for Hackers
jakevdp
799
220k
How to Think Like a Performance Engineer
csswizardry
25
1.7k
VelocityConf: Rendering Performance Case Studies
addyosmani
332
24k
Put a Button on it: Removing Barriers to Going Fast.
kastner
60
3.9k
[RailsConf 2023] Rails as a piece of cake
palkan
55
5.7k
RailsConf 2023
tenderlove
30
1.1k
Unsuck your backbone
ammeep
671
58k
Refactoring Trust on Your Teams (GOTO; Chicago 2020)
rmw
34
3.1k
XXLCSS - How to scale CSS and keep your sanity
sugarenia
248
1.3M
Transcript
ABC of Distributed Data Processing. Achieving Buzzword Compliance. 1 Piyush
Verma Oogway Consulting
Common thoughts 2
When will my Data become Big Data?
Hive Data Will Save.
How did we reach here? 5
Data :: Business
Data :: Business
Types of Workload
When do I call it Big Enough?
Why bother with Data Engineering? 10
Why do analysis at all?
Descriptive - Historical. - Deterministic. - Inferential. - Managers make
pretty graphs.
Predictive - Future. - Probabilistic. - Based on Descriptive. -
This is what armchair critics do.
Prescriptive
Architecture: Round 1 15
What does data look like?
Storage Choice 1
Storage Choice 2
Challenges: Round 1 19
Scaling
Archival Policy
Oh no
Garbage / Purging
All related entities end up in complex joins
All Relationships complicate over Dimension of time
Anatomy 26
Anatomy
Challenges: Round 2 28
Snowflake Schema
Star Schema
De-Duplication
Bloom Filters Cuckoo Filters - Does not exist for sure.
- May or may not exist.
Slow Changing Dimensions
Batching vs Streaming
Out-of-Order Processing
Cubes • Efficiency of Retrieval • Warehouse:Cube :: DB:Table •
View: Dimension + Measure • Slice, Dice & Rotate
Architecture: Revisited 37
Sample Solution
Thank you! Piyush Verma @meson10 Oogway Consulting http://oogway.in