Slide 1

Slide 1 text

ABC of Distributed Data Processing. Achieving Buzzword Compliance. Piyush Verma, Oogway Consulting

Slide 2

Slide 2 text

Common thoughts

Slide 3

Slide 3 text

When will my Data become Big Data?

Slide 4

Slide 4 text

Hive Data Will Save.

Slide 5

Slide 5 text

How did we reach here?

Slide 6

Slide 6 text

Data :: Business

Slide 7

Slide 7 text

Data :: Business

Slide 8

Slide 8 text

Types of Workload

Slide 9

Slide 9 text

When do I call it Big Enough?

Slide 10

Slide 10 text

Why bother with Data Engineering?

Slide 11

Slide 11 text

Why do analysis at all?

Slide 12

Slide 12 text

Descriptive
- Historical.
- Deterministic.
- Inferential.
- Managers make pretty graphs.

Slide 13

Slide 13 text

Predictive
- Future.
- Probabilistic.
- Based on Descriptive.
- This is what armchair critics do.

Slide 14

Slide 14 text

Prescriptive

Slide 15

Slide 15 text

Architecture: Round 1

Slide 16

Slide 16 text

What does data look like?

Slide 17

Slide 17 text

Storage Choice 1

Slide 18

Slide 18 text

Storage Choice 2

Slide 19

Slide 19 text

Challenges: Round 1

Slide 20

Slide 20 text

Scaling

Slide 21

Slide 21 text

Archival Policy

Slide 22

Slide 22 text

Oh no

Slide 23

Slide 23 text

Garbage / Purging

Slide 24

Slide 24 text

All related entities end up in complex joins

Slide 25

Slide 25 text

All Relationships complicate over Dimension of time

Slide 26

Slide 26 text

Anatomy

Slide 27

Slide 27 text

Anatomy

Slide 28

Slide 28 text

Challenges: Round 2

Slide 29

Slide 29 text

Snowflake Schema

Slide 30

Slide 30 text

Star Schema

Slide 31

Slide 31 text

De-Duplication

Slide 32

Slide 32 text

Bloom Filters, Cuckoo Filters
- Does not exist for sure.
- May or may not exist.
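The two answers a Bloom filter can give ("does not exist for sure" and "may or may not exist") can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class name BloomFilter, the bit-array size, the number of hashes, and the use of salted SHA-256 are not from the talk, and a production system would more likely use an existing library.

```python
import hashlib


class BloomFilter:
    """Tiny illustrative Bloom filter: k hash positions over an m-slot array."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch readable; a real filter packs bits.
        self.bits = bytearray(num_bits)

    def _positions(self, item: str):
        # Derive k positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # False means "does not exist for sure".
        # True means "may or may not exist" (false positives are possible).
        return all(self.bits[pos] for pos in self._positions(item))


# De-duplicate a stream of hypothetical event ids.
seen = BloomFilter()
events = ["evt-1", "evt-2", "evt-1", "evt-3"]
unique = []
for event_id in events:
    if not seen.might_contain(event_id):
        unique.append(event_id)
        seen.add(event_id)
```

Because a "maybe" answer can be a false positive, a pipeline that must not drop new records would typically confirm a "maybe" against an exact store before discarding the record.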

Slide 33

Slide 33 text

Slowly Changing Dimensions

Slide 34

Slide 34 text

Batching vs Streaming

Slide 35

Slide 35 text

Out-of-Order Processing

Slide 36

Slide 36 text

Cubes
● Efficiency of Retrieval
● Warehouse:Cube :: DB:Table
● View: Dimension + Measure
● Slice, Dice & Rotate
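As a rough sketch of the cube idea (dimensions plus a measure, pre-aggregated so that retrieval is a lookup rather than a scan), the following builds every dimension combination over a handful of rows. The fact rows, the dimension names (region, product, month), and the build_cube helper are hypothetical and chosen only for illustration; real OLAP engines store and index cubes very differently.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: three dimensions plus one measure ("sales").
facts = [
    {"region": "north", "product": "tea", "month": "jan", "sales": 10},
    {"region": "north", "product": "coffee", "month": "jan", "sales": 5},
    {"region": "south", "product": "tea", "month": "feb", "sales": 7},
    {"region": "south", "product": "tea", "month": "jan", "sales": 3},
]
DIMENSIONS = ("region", "product", "month")


def build_cube(rows, dimensions, measure):
    """Pre-aggregate the measure for every subset of the dimensions."""
    cube = defaultdict(int)
    for row in rows:
        for size in range(len(dimensions) + 1):
            for dims in combinations(dimensions, size):
                key = (dims, tuple(row[d] for d in dims))
                cube[key] += row[measure]
    return dict(cube)


cube = build_cube(facts, DIMENSIONS, "sales")

# Retrieval is now a dictionary lookup.
total = cube[((), ())]                                     # all rows
tea_slice = cube[(("product",), ("tea",))]                 # slice: fix one dimension
north_jan = cube[(("region", "month"), ("north", "jan"))]  # dice: fix several
```

With these rows, total is 25, tea_slice is 20, and north_jan is 15. The trade-off is that the number of pre-aggregated cells grows quickly with the number of dimensions, which is why real cubes are usually materialized selectively.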

Slide 37

Slide 37 text

Architecture: Revisited

Slide 38

Slide 38 text

Sample Solution

Slide 39

Slide 39 text

Thank you! Piyush Verma (@meson10), Oogway Consulting, http://oogway.in