Slide 1
Slide 1 text
ABC of Distributed Data Processing. Achieving Buzzword Compliance. Piyush Verma, Oogway Consulting
Slide 2
Slide 2 text
Common thoughts
Slide 3
Slide 3 text
When will my Data become Big Data?
Slide 4
Slide 4 text
Hive Data Will Save.
Slide 5
Slide 5 text
How did we reach here?
Slide 6
Slide 6 text
Data :: Business
Slide 7
Slide 7 text
Data :: Business
Slide 8
Slide 8 text
Types of Workload
Slide 9
Slide 9 text
When do I call it Big Enough?
Slide 10
Slide 10 text
Why bother with Data Engineering?
Slide 11
Slide 11 text
Why do analysis at all?
Slide 12
Slide 12 text
Descriptive
- Historical.
- Deterministic.
- Inferential.
- Managers make pretty graphs.
Slide 13
Slide 13 text
Predictive
- Future.
- Probabilistic.
- Based on Descriptive.
- This is what armchair critics do.
Slide 14
Slide 14 text
Prescriptive
Slide 15
Slide 15 text
Architecture: Round 1
Slide 16
Slide 16 text
What does data look like?
Slide 17
Slide 17 text
Storage Choice 1
Slide 18
Slide 18 text
Storage Choice 2
Slide 19
Slide 19 text
Challenges: Round 1
Slide 20
Slide 20 text
Scaling
Slide 21
Slide 21 text
Archival Policy
Slide 22
Slide 22 text
Oh no
Slide 23
Slide 23 text
Garbage / Purging
Slide 24
Slide 24 text
All related entities end up in complex joins
Slide 25
Slide 25 text
All Relationships complicate over Dimension of time
Slide 26
Slide 26 text
Anatomy
Slide 27
Slide 27 text
Anatomy
Slide 28
Slide 28 text
Challenges: Round 2
Slide 29
Slide 29 text
Snowflake Schema
Slide 30
Slide 30 text
Star Schema
Slide 31
Slide 31 text
De-Duplication
Slide 32
Slide 32 text
Bloom Filters, Cuckoo Filters
- A negative answer means the item does not exist for sure.
- A positive answer means the item may or may not exist.
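To make the two answers concrete, here is a minimal Bloom filter sketch applied to de-duplication. It is not taken from the talk: the class name `BloomFilter`, the helper `deduplicate`, and the sizes `num_bits` and `num_hashes` are all illustrative assumptions, and a real deployment would size the filter for its expected volume.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: 'no' is always correct, 'yes' may be a false positive."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch simple (a real filter packs bits).
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        # Derive num_hashes positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False: the item was never added. True: it may or may not have been.
        return all(self.bits[pos] for pos in self._positions(item))


def deduplicate(event_ids, seen=None):
    """Keep the first occurrence of each id; hypothetical helper for illustration."""
    seen = seen if seen is not None else BloomFilter()
    unique = []
    for event_id in event_ids:
        if seen.might_contain(event_id):
            continue  # possibly a duplicate; a false positive would drop a new id
        seen.add(event_id)
        unique.append(event_id)
    return unique
```

The trade-off in this design is that a false positive silently drops a genuinely new event, so a pipeline that cannot tolerate loss would usually confirm a "maybe" against an exact store before discarding.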
Slide 33
Slide 33 text
Slowly Changing Dimensions
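The slide names the problem without showing a technique, so the following is only one common way to handle it, usually called a Type 2 dimension: keep every version of a row with a validity interval instead of overwriting it. The function names `apply_scd2` and `as_of`, and the row fields `key`, `attrs`, `valid_from`, `valid_to`, are assumptions made for this sketch.

```python
from datetime import date


def apply_scd2(history, key, new_attrs, effective):
    """Close the current version of `key` if it changed, then append the new one."""
    current = next(
        (row for row in history if row["key"] == key and row["valid_to"] is None),
        None,
    )
    if current is not None:
        if current["attrs"] == new_attrs:
            return history  # nothing changed, keep the open version
        current["valid_to"] = effective  # end the old version at the change date
    history.append(
        {"key": key, "attrs": dict(new_attrs), "valid_from": effective, "valid_to": None}
    )
    return history


def as_of(history, key, when):
    """Return the attributes that were valid for `key` on date `when`, or None."""
    for row in history:
        starts_before = row["valid_from"] <= when
        still_open = row["valid_to"] is None or when < row["valid_to"]
        if row["key"] == key and starts_before and still_open:
            return row["attrs"]
    return None


# Usage: a customer moves city; both versions stay queryable.
customers = []
apply_scd2(customers, "c1", {"city": "Pune"}, date(2020, 1, 1))
apply_scd2(customers, "c1", {"city": "Delhi"}, date(2021, 6, 1))
```

Keeping intervals half-open (`valid_from` inclusive, `valid_to` exclusive) means a lookup on the change date resolves to exactly one version.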
Slide 34
Slide 34 text
Batching vs Streaming
Slide 35
Slide 35 text
Out-of-Order Processing
Slide 36
Slide 36 text
Cubes
● Efficiency of Retrieval
● Warehouse:Cube :: DB:Table
● View: Dimension + Measure
● Slice, Dice & Rotate
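As a rough illustration of dimensions, measures, and the slice, dice, and rotate operations, here is a small in-memory sketch. The fact rows, the dimension names (`region`, `product`, `quarter`), the `revenue` measure, and all function names are made up for the example; an actual cube engine precomputes and indexes these aggregates rather than scanning a dict.

```python
from collections import defaultdict

# Hypothetical fact rows: three dimensions and one measure (revenue).
FACTS = [
    {"region": "north", "product": "tea", "quarter": "Q1", "revenue": 100},
    {"region": "north", "product": "coffee", "quarter": "Q1", "revenue": 150},
    {"region": "south", "product": "tea", "quarter": "Q1", "revenue": 80},
    {"region": "south", "product": "tea", "quarter": "Q2", "revenue": 120},
    {"region": "north", "product": "coffee", "quarter": "Q2", "revenue": 170},
]
DIMS = ("region", "product", "quarter")


def build_cube(facts, dimensions, measure):
    """Aggregate the measure into cells keyed by a tuple of dimension values."""
    cube = defaultdict(int)
    for row in facts:
        cube[tuple(row[d] for d in dimensions)] += row[measure]
    return dict(cube)


def slice_cube(cube, dimensions, dim, value):
    """Slice: fix one dimension to a single value and drop it from the key."""
    i = dimensions.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cube.items() if k[i] == value}


def dice_cube(cube, dimensions, allowed):
    """Dice: keep cells whose coordinates fall inside the allowed value sets."""
    return {
        k: v
        for k, v in cube.items()
        if all(k[dimensions.index(d)] in values for d, values in allowed.items())
    }


def rotate_cube(cube, dimensions, new_order):
    """Rotate (pivot): reorder the dimensions of every cell key."""
    idx = [dimensions.index(d) for d in new_order]
    return {tuple(k[i] for i in idx): v for k, v in cube.items()}


cube = build_cube(FACTS, DIMS, "revenue")
q1_only = slice_cube(cube, DIMS, "quarter", "Q1")
south_q2 = dice_cube(cube, DIMS, {"region": {"south"}, "quarter": {"Q2"}})
by_quarter = rotate_cube(cube, DIMS, ("quarter", "region", "product"))
```

Here `q1_only` is keyed by (region, product) only, `south_q2` keeps the full three-part key, and `by_quarter` holds the same cells with quarter moved to the front.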
Slide 37
Slide 37 text
Architecture: Revisited
Slide 38
Slide 38 text
Sample Solution
Slide 39
Slide 39 text
Thank you! Piyush Verma @meson10 Oogway Consulting http://oogway.in