Understanding the Basics of Data Analysis
Piyush Verma
November 13, 2017
Technology
Read more at http://blog.oogway.in
Transcript
ABC of Distributed Data Processing. Achieving Buzzword Compliance.
Piyush Verma, Oogway Consulting
Common thoughts
When will my Data become Big Data?
Hive Data Will Save.
How did we get here?
Data :: Business
Types of Workload
When do I call it Big Enough?
Why bother with Data Engineering?
Why do analysis at all?
Descriptive
- Historical.
- Deterministic.
- Inferential.
- Managers make pretty graphs.
Predictive
- Future.
- Probabilistic.
- Based on Descriptive.
- This is what armchair critics do.
Prescriptive
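To make the three kinds of analysis concrete, here is a minimal sketch on a hypothetical series of monthly order counts (the numbers and the 180-order capacity threshold are made up for illustration). The descriptive step summarises the past, the predictive step fits a simple least-squares trend on top of that history, and the prescriptive step turns the forecast into an action. A plain linear trend is only one possible predictive model, chosen here for brevity.

```python
from statistics import mean

# Hypothetical monthly order counts (illustrative only, not from the deck).
monthly_orders = [120, 135, 150, 160, 178]

def describe(values):
    """Descriptive: summarise what already happened."""
    return {"total": sum(values), "mean": mean(values), "max": max(values)}

def predict_next(values):
    """Predictive: fit a least-squares line to the history, extrapolate one step."""
    n = len(values)
    xs = list(range(n))
    x_bar, y_bar = mean(xs), mean(values)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return intercept + slope * n

def prescribe(forecast, capacity=180):
    """Prescriptive: turn a forecast into an action (capacity is a made-up threshold)."""
    return "add capacity" if forecast > capacity else "no action"

forecast = predict_next(monthly_orders)
print(describe(monthly_orders), round(forecast, 1), prescribe(forecast))
```

Note how each layer builds on the previous one: the forecast is computed from the same history the descriptive summary reports on.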
Architecture: Round 1
What does data look like?
Storage Choice 1
Storage Choice 2
Challenges: Round 1
Scaling
Archival Policy
Oh no
Garbage / Purging
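An archival and purging policy usually comes down to classifying records by age. The sketch below assumes two hypothetical thresholds (30 days hot, 365 days archived); the deck does not give any numbers, and a real policy would also consider legal retention rules.

```python
from datetime import date, timedelta

# Hypothetical thresholds: the deck does not specify any values.
HOT_DAYS = 30       # keep in the primary store
ARCHIVE_DAYS = 365  # after this, the record is eligible for purging

def classify(record_date: date, today: date) -> str:
    age = (today - record_date).days
    if age <= HOT_DAYS:
        return "hot"      # stays in the primary database
    if age <= ARCHIVE_DAYS:
        return "archive"  # move to cheaper, slower storage
    return "purge"        # garbage: safe to delete

def apply_policy(records, today):
    buckets = {"hot": [], "archive": [], "purge": []}
    for rec in records:
        buckets[classify(rec["date"], today)].append(rec["id"])
    return buckets

today = date(2017, 11, 13)
records = [
    {"id": 1, "date": today - timedelta(days=5)},
    {"id": 2, "date": today - timedelta(days=100)},
    {"id": 3, "date": today - timedelta(days=400)},
]
print(apply_policy(records, today))
```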
All related entities end up in complex joins
All relationships grow more complicated along the dimension of time
Anatomy
Challenges: Round 2
Snowflake Schema
Star Schema
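The difference between the two schemas can be sketched with plain dictionaries standing in for tables (all table names and rows are hypothetical). Both keep a central fact table of foreign keys plus measures. In the star schema the product dimension stores its category inline, so every attribute is one hop from the fact; in the snowflake schema that dimension is normalised into a separate category table, which adds a join.

```python
from collections import defaultdict

# Star: the product dimension is denormalised (category stored inline).
star_product = {1: {"name": "Widget", "category": "Tools"},
                2: {"name": "Gadget", "category": "Toys"}}

# Snowflake: the same dimension normalised into two tables.
snow_product = {1: {"name": "Widget", "category_id": 10},
                2: {"name": "Gadget", "category_id": 20}}
snow_category = {10: "Tools", 20: "Toys"}

dim_date = {20171113: "2017-11", 20171201: "2017-12"}

# Fact table: foreign keys into the dimensions plus one measure.
fact_sales = [
    {"product_id": 1, "date_id": 20171113, "amount": 10.0},
    {"product_id": 2, "date_id": 20171113, "amount": 4.0},
    {"product_id": 1, "date_id": 20171201, "amount": 6.0},
]

def category_star(pid):
    return star_product[pid]["category"]  # one lookup

def category_snowflake(pid):
    return snow_category[snow_product[pid]["category_id"]]  # two lookups

def revenue(facts, category_of):
    """Total revenue per (category, month), joining facts to dimensions."""
    totals = defaultdict(float)
    for row in facts:
        key = (category_of(row["product_id"]), dim_date[row["date_id"]])
        totals[key] += row["amount"]
    return dict(totals)

print(revenue(fact_sales, category_star))
```

Both schemas give the same answer; the trade-off is storage redundancy (star) versus extra joins at query time (snowflake).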
De-Duplication
Bloom Filters, Cuckoo Filters
- "No" means: does not exist, for sure.
- "Yes" means: may or may not exist.
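A minimal Bloom filter sketch, used here for de-duplication, shows where those two answers come from: `add` sets k bits, and a lookup that finds any of its bits unset proves the item was never added, while all bits set only means "possibly added". The bit-array size, the number of hashes, and salting SHA-256 with the hash index are illustrative choices, not taken from the deck. Cuckoo filters answer the same kind of query but also support deleting items; they are not shown here.

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions by salting one hash function with the index i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: str) -> bool:
        # False: definitely never added. True: possibly added (false positives allowed).
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

def dedupe(events, bf):
    """Drop events the filter has (probably) seen. A false positive drops a new event."""
    out = []
    for e in events:
        if bf.might_contain(e):
            continue
        bf.add(e)
        out.append(e)
    return out

bf = BloomFilter()
print(dedupe(["a", "b", "a", "c", "b"], bf))
```

The filter never produces a false negative, so a repeated event is always caught; the price is that a rare false positive silently discards a genuinely new event.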
Slowly Changing Dimensions
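One common way to handle a slowly changing dimension is "Type 2" versioning: instead of overwriting an attribute, close the current row with an end date and append a new row. The sketch below assumes a hypothetical customer dimension whose `city` changes; the deck does not say which SCD type it uses.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class CustomerRow:
    customer_id: int
    city: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version

def upsert_scd2(table, customer_id, city, as_of):
    """Type 2 update: keep history by closing the old row and adding a new one."""
    current = next(
        (r for r in table if r.customer_id == customer_id and r.valid_to is None), None
    )
    if current is not None and current.city == city:
        return  # nothing changed, no new version needed
    if current is not None:
        current.valid_to = as_of
    table.append(CustomerRow(customer_id, city, as_of))

def city_as_of(table, customer_id, when):
    """Point-in-time lookup: which version was valid on `when`?"""
    for r in table:
        if (r.customer_id == customer_id and r.valid_from <= when
                and (r.valid_to is None or when < r.valid_to)):
            return r.city
    return None

table = []
upsert_scd2(table, 1, "Pune", date(2016, 1, 1))
upsert_scd2(table, 1, "Delhi", date(2017, 6, 1))
print(city_as_of(table, 1, date(2017, 1, 1)), city_as_of(table, 1, date(2017, 11, 13)))
```

The point-in-time lookup is what lets old facts join to the dimension as it was when they happened, rather than as it is today.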
Batching vs Streaming
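The same aggregation can be computed in either mode. This sketch counts hypothetical page views: the batch version waits for the complete dataset and computes once, while the streaming version keeps running state and has an up-to-date answer after every event. The event shape is an assumption for illustration.

```python
from collections import Counter

# Hypothetical (key, count) events.
events = [("page_a", 1), ("page_b", 1), ("page_a", 1)]

def batch_count(all_events):
    """Batch: needs the whole dataset up front, computes the result once."""
    totals = Counter()
    for key, n in all_events:
        totals[key] += n
    return dict(totals)

class StreamingCounter:
    """Streaming: updates state per event; a result is available at any moment."""
    def __init__(self):
        self.totals = Counter()

    def on_event(self, key, n):
        self.totals[key] += n
        return self.totals[key]  # current running total for this key

    def snapshot(self):
        return dict(self.totals)

sc = StreamingCounter()
for key, n in events:
    sc.on_event(key, n)
print(batch_count(events), sc.snapshot())
```

Once every event has arrived the two agree; streaming trades the simplicity of a single pass over complete data for lower latency.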
Out-of-Order Processing
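A common approach to out-of-order events is to bucket by event time and track a watermark: the highest event time seen so far minus an allowed lateness. Events older than the watermark are treated as too late, and a window is final once the watermark passes its end. The window size and lateness below are hypothetical values, and event times are assumed to be non-negative seconds.

```python
from collections import defaultdict

WINDOW = 60            # tumbling window length in seconds (assumed)
ALLOWED_LATENESS = 30  # how far behind the newest event we still accept (assumed)

class EventTimeWindows:
    def __init__(self):
        self.counts = defaultdict(int)  # window start -> event count
        self.max_event_time = 0
        self.dropped = []

    def watermark(self):
        return self.max_event_time - ALLOWED_LATENESS

    def on_event(self, event_time, payload):
        if event_time < self.watermark():
            self.dropped.append(payload)  # too late: its window may already be final
            return
        window_start = event_time - event_time % WINDOW
        self.counts[window_start] += 1
        self.max_event_time = max(self.max_event_time, event_time)

    def closed_windows(self):
        """Windows whose end the watermark has passed; their counts are final."""
        wm = self.watermark()
        return {start: n for start, n in self.counts.items() if start + WINDOW <= wm}

w = EventTimeWindows()
for t, p in [(10, "a"), (70, "b"), (50, "c"), (130, "d"), (20, "e")]:
    w.on_event(t, p)
print(w.closed_windows(), w.dropped)
```

Here "c" arrives after "b" but within the lateness bound, so it still lands in the first window; "e" arrives after the watermark has moved past it and is dropped.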
Cubes
• Efficiency of Retrieval
• Warehouse:Cube :: DB:Table
• View: Dimension + Measure
• Slice, Dice & Rotate
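The cube operations can be sketched over a tiny in-memory fact list (the data and dimension names are hypothetical). A view picks some dimensions and aggregates the measure over the rest; a slice fixes one dimension to a single value; a dice restricts several dimensions to subsets; rotating (pivoting) just changes the order of the dimensions in the view.

```python
from collections import defaultdict

# Each fact: dimension values plus one measure ("sales"). Illustrative data only.
facts = [
    {"region": "North", "product": "Widget", "quarter": "Q1", "sales": 100},
    {"region": "North", "product": "Gadget", "quarter": "Q1", "sales": 50},
    {"region": "South", "product": "Widget", "quarter": "Q2", "sales": 70},
    {"region": "South", "product": "Gadget", "quarter": "Q1", "sales": 30},
]

def view(rows, dims, measure="sales"):
    """View = chosen dimensions + measure summed over all other dimensions."""
    out = defaultdict(int)
    for r in rows:
        out[tuple(r[d] for d in dims)] += r[measure]
    return dict(out)

def slice_(rows, dim, value):
    """Slice: fix one dimension to a single value."""
    return [r for r in rows if r[dim] == value]

def dice(rows, **allowed):
    """Dice: restrict several dimensions to sets of allowed values."""
    return [r for r in rows if all(r[d] in vals for d, vals in allowed.items())]

print(view(facts, ["region"]))
print(view(slice_(facts, "quarter", "Q1"), ["region", "product"]))
print(view(facts, ["product", "region"]))  # rotated: axes swapped
```

A real cube engine precomputes many of these views, which is where the efficiency of retrieval comes from; this sketch recomputes them on demand.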
Architecture: Revisited
Sample Solution
Thank you! Piyush Verma @meson10 Oogway Consulting http://oogway.in