Read more on http://blog.oogway.in
ABC of Distributed Data Processing. Achieving Buzzword Compliance.
Piyush Verma
Oogway Consulting
Common thoughts
When will my Data become Big Data?
Hive Data Will Save.
How did we reach here?
Data :: Business
Types of Workload
When do I call it Big Enough?
Why bother with Data Engineering?
Why do analysis at all?
Descriptive
- Historical.
- Deterministic.
- Inferential.
- Managers make pretty graphs.
Predictive
- Future.
- Probabilistic.
- Based on Descriptive.
- This is what armchair critics do.
Prescriptive
Architecture: Round 1
What does data look like?
Storage Choice 1
Storage Choice 2
Challenges: Round 1
Scaling
Archival Policy
Oh no
Garbage / Purging
All related entities end up in complex joins
All relationships grow more complex over the dimension of time
Anatomy
Anatomy
Challenges: Round 2
Snowflake Schema
Star Schema
De-Duplication
Bloom Filters
Cuckoo Filters
- Does not exist for sure.
- May or may not exist.
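The two possible answers on this slide can be sketched with a minimal Bloom filter. This is an illustrative sketch, not the deck's implementation: the class name `BloomFilter`, the helper `deduplicate`, the bit-array size, and the salted SHA-256 hashing are all assumptions chosen for readability. It covers only the Bloom filter; a Cuckoo filter gives the same two answers but also supports deletion.

```python
import hashlib


class BloomFilter:
    """A tiny Bloom filter: a bit array plus k hash positions per item."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive k positions by salting a single hash with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False: the item does not exist for sure.
        # True: the item may or may not exist (false positives are possible).
        return all(self.bits[pos] for pos in self._positions(item))


def deduplicate(events, seen=None):
    """Yield only events the filter has definitely not seen before.

    Caveat: a false positive silently drops a genuinely new event,
    so the filter must be sized for the expected number of keys.
    """
    if seen is None:
        seen = BloomFilter()
    for event in events:
        if not seen.might_contain(event):
            seen.add(event)
            yield event
```

Used for de-duplication, a "does not exist" answer is safe to act on immediately, while a "may exist" answer can either be treated as a duplicate (as above) or confirmed against the real store when dropping new records is not acceptable.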
Slowly Changing Dimensions
Batching vs Streaming
Out-of-Order Processing
Cubes
● Efficiency of Retrieval
● Warehouse:Cube :: DB:Table
● View: Dimension + Measure
● Slice, Dice & Rotate
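The cube operations named above can be illustrated on a toy dataset. This is a rough sketch using only the standard library: the fact rows, the dimension names (`region`, `product`, `month`), the measure (`sales`), and every function name are hypothetical, and a real warehouse would precompute and index these aggregates rather than scan dictionaries.

```python
from collections import defaultdict

# Hypothetical fact rows: three dimensions and one measure (sales).
DIMS = ["region", "product", "month"]
FACTS = [
    {"region": "north", "product": "tea", "month": "jan", "sales": 10},
    {"region": "north", "product": "coffee", "month": "jan", "sales": 20},
    {"region": "south", "product": "tea", "month": "jan", "sales": 5},
    {"region": "south", "product": "tea", "month": "feb", "sales": 7},
    {"region": "north", "product": "tea", "month": "feb", "sales": 3},
]


def build_cube(rows, dimensions, measure):
    """Pre-aggregate the measure for every combination of dimension values."""
    cube = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        cube[key] += row[measure]
    return dict(cube)


def slice_cube(cube, dimensions, dim, value):
    """Slice: fix one dimension to a single value and drop it from the key."""
    i = dimensions.index(dim)
    remaining = dimensions[:i] + dimensions[i + 1:]
    sliced = {k[:i] + k[i + 1:]: v for k, v in cube.items() if k[i] == value}
    return sliced, remaining


def dice_cube(cube, dimensions, allowed):
    """Dice: keep cells whose values fall in the allowed set per dimension."""
    def keep(key):
        return all(key[dimensions.index(d)] in vals for d, vals in allowed.items())
    return {k: v for k, v in cube.items() if keep(k)}


def rotate_cube(cube, dimensions, new_order):
    """Rotate (pivot): reorder the dimensions; the measures are unchanged."""
    idx = [dimensions.index(d) for d in new_order]
    rotated = {tuple(k[i] for i in idx): v for k, v in cube.items()}
    return rotated, list(new_order)
```

The retrieval efficiency comes from `build_cube` running once up front: slicing, dicing, and rotating then read pre-aggregated cells instead of rescanning the raw facts.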
Architecture: Revisited
Sample Solution
Thank you!
Piyush Verma
@meson10
Oogway Consulting
http://oogway.in