Understanding the Basics of Data Analysis

Piyush Verma
November 13, 2017

Read more on http://blog.oogway.in

Transcript

  1. ABC of Distributed Data Processing.
    Achieving Buzzword Compliance.
    Piyush Verma
    Oogway Consulting

  2. Common thoughts

  3. When will my Data become Big Data?

  4. Hive Data Will Save.

  5. How did we reach here?

  6. Data :: Business

  7. Data :: Business

  8. Types of Workload

  9. When do I call it Big Enough?

  10. Why bother with Data Engineering?

  11. Why do analysis at all?

  12. Descriptive
    - Historical.
    - Deterministic.
    - Inferential.
    - Managers make pretty graphs.

  13. Predictive
    - Future.
    - Probabilistic.
    - Based on Descriptive.
    - This is what armchair critics do.

  14. Prescriptive

  15. Architecture: Round 1

  16. What does data look like?

  17. Storage Choice 1

  18. Storage Choice 2

  19. Challenges: Round 1

  20. Scaling

  21. Archival Policy

  22. Oh no

  23. Garbage / Purging

  24. All related entities end up in complex joins

  25. All relationships complicate over the dimension of time

  26. Anatomy

  27. Anatomy

  28. Challenges: Round 2

  29. Snowflake Schema

  30. Star Schema

  31. De-Duplication

  32. Bloom Filters
    Cuckoo Filters
    - Does not exist for sure.
    - May or may not exist.
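
The two answers on this slide are the whole contract of a Bloom filter: a negative lookup is certain, a positive lookup is only a maybe. A minimal sketch of that behaviour, useful as a cheap pre-check before an expensive de-duplication lookup. The class name, bit-array size, hash count, and event ids are illustrative assumptions, not taken from the talk, and a cuckoo filter (which also supports deletion) is not shown.

```python
import hashlib


class BloomFilter:
    """A tiny Bloom filter: k hash positions over a fixed-size bit array."""

    def __init__(self, size_bits=1024, num_hashes=3):
        # Sizes are arbitrary illustrative defaults, not tuned values.
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive k positions by salting a SHA-256 digest with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False => does not exist for sure. True => may or may not exist.
        return all(self.bits[pos] for pos in self._positions(item))


# Hypothetical event ids that have already been processed.
seen = BloomFilter()
for event_id in ["evt-1", "evt-2", "evt-3"]:
    seen.add(event_id)
```

Only when `might_contain` returns True does the caller need to consult the authoritative store; a False answer lets it skip that lookup entirely.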

  33. Slowly Changing Dimensions

  34. Batching vs Streaming

  35. Out-of-Order Processing

  36. Cubes
    ● Efficiency of Retrieval
    ● Warehouse:Cube :: DB:Table
    ● View: Dimension + Measure
    ● Slice, Dice & Rotate
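
The cube operations named above can be sketched in plain Python. This is an illustration under stated assumptions, not the talk's implementation: the fact rows, the dimension names (`region`, `product`, `quarter`), the `revenue` measure, and all helper functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, quarter, revenue).
FACTS = [
    ("north", "widget", "Q1", 100),
    ("north", "gadget", "Q1", 150),
    ("south", "widget", "Q1", 80),
    ("south", "widget", "Q2", 120),
    ("north", "widget", "Q2", 90),
    ("north", "widget", "Q1", 10),
]
DIMENSIONS = ("region", "product", "quarter")


def build_cube(facts):
    """Pre-aggregate the measure (revenue) for every dimension combination."""
    cube = defaultdict(int)
    for *dims, revenue in facts:
        cube[tuple(dims)] += revenue
    return dict(cube)


def slice_cube(cube, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMENSIONS.index(dim)
    return {k: v for k, v in cube.items() if k[i] == value}


def dice_cube(cube, **allowed):
    """Dice: keep cells whose dimensions fall within the given value sets."""
    return {
        k: v for k, v in cube.items()
        if all(k[DIMENSIONS.index(d)] in vals for d, vals in allowed.items())
    }


def view(cube, row_dim, col_dim):
    """A 2-D view: two dimensions as axes, the measure summed in each cell."""
    r, c = DIMENSIONS.index(row_dim), DIMENSIONS.index(col_dim)
    table = defaultdict(lambda: defaultdict(int))
    for k, v in cube.items():
        table[k[r]][k[c]] += v
    return {row: dict(cols) for row, cols in table.items()}


cube = build_cube(FACTS)
q1 = slice_cube(cube, "quarter", "Q1")
north_widgets = dice_cube(cube, region={"north"}, product={"widget"})
by_region = view(cube, "region", "quarter")
rotated = view(cube, "quarter", "region")  # rotate: swap the two axes
```

Retrieval is efficient because the aggregation happens once in `build_cube`; slicing, dicing, and rotating then work on the pre-aggregated cells rather than rescanning the raw facts.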

  37. Architecture: Revisited

  38. Sample Solution

  39. Thank you!
    Piyush Verma
    @meson10
    Oogway Consulting
    http://oogway.in
