Understanding the Basics of Data Analysis

Piyush Verma
November 13, 2017

Read more on http://blog.oogway.in

Transcript

  1. ABC of Distributed Data Processing. Achieving Buzzword Compliance. Piyush Verma, Oogway Consulting
  2. Common thoughts

  3. When will my Data become Big Data?

  4. Hive Data Will Save.

  5. How did we reach here?

  6. Data :: Business

  7. Data :: Business

  8. Types of Workload

  9. When do I call it Big Enough?

  10. Why bother with Data Engineering?

  11. Why do analysis at all?

  12. Descriptive
    - Historical.
    - Deterministic.
    - Inferential.
    - Managers make pretty graphs.
  13. Predictive
    - Future.
    - Probabilistic.
    - Based on Descriptive.
    - This is what armchair critics do.
  14. Prescriptive

  15. Architecture: Round 1

  16. What does data look like?

  17. Storage Choice 1

  18. Storage Choice 2

  19. Challenges: Round 1

  20. Scaling

  21. Archival Policy

  22. Oh no

  23. Garbage / Purging
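
The archival and purging slides do not give a concrete policy. Here is a minimal sketch that assumes a hypothetical age-based policy: recent rows stay hot, older rows are archived, and the oldest are purged. `Record`, `apply_retention`, `hot_days` and `archive_days` are illustrative names and do not come from the talk.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Record:
    key: str
    created_at: datetime


def apply_retention(records, now, hot_days=30, archive_days=365):
    """Split records into hot, archive and purge buckets by age.

    hot_days and archive_days are hypothetical policy thresholds.
    """
    hot, archive, purge = [], [], []
    for r in records:
        age = now - r.created_at
        if age <= timedelta(days=hot_days):
            hot.append(r)          # stays in the primary store
        elif age <= timedelta(days=archive_days):
            archive.append(r)      # candidate for cheaper storage
        else:
            purge.append(r)        # past retention, safe to delete
    return hot, archive, purge
```

A real pipeline would copy the archive bucket to cold storage before deleting it from the primary store. That step is outside this sketch.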

  24. All related entities end up in complex joins

  25. All relationships grow more complicated over the dimension of time

  26. Anatomy

  27. Anatomy

  28. Challenges: Round 2

  29. Snowflake Schema

  30. Star Schema
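
The two schema slides are titles only, so this is an illustration built on assumed example data. It stores the same product dimension in two ways. In the star layout the dimension is denormalized, so each fact needs one lookup per dimension. In the snowflake layout the dimension is normalized into product and category tables, which costs one extra lookup. All table and field names are hypothetical.

```python
from collections import defaultdict

# Star schema: the product dimension is denormalized, so category sits on the row.
star_product = {
    1: {"name": "Tea", "category": "Beverages"},
    2: {"name": "Cake", "category": "Bakery"},
}

# Snowflake schema: the same dimension normalized into product -> category.
snow_product = {
    1: {"name": "Tea", "category_id": 10},
    2: {"name": "Cake", "category_id": 20},
}
snow_category = {10: {"category": "Beverages"}, 20: {"category": "Bakery"}}

# Date dimension shared by both layouts.
dim_date = {
    20171113: {"year": 2017, "month": 11},
    20171201: {"year": 2017, "month": 12},
}

# Fact table: foreign keys into the dimensions plus one measure (amount).
fact_sales = [
    {"product_id": 1, "date_id": 20171113, "amount": 40},
    {"product_id": 2, "date_id": 20171113, "amount": 120},
    {"product_id": 1, "date_id": 20171201, "amount": 60},
]


def revenue_star(facts):
    """Total amount per (category, month) with one lookup per dimension."""
    totals = defaultdict(int)
    for f in facts:
        category = star_product[f["product_id"]]["category"]
        month = dim_date[f["date_id"]]["month"]
        totals[(category, month)] += f["amount"]
    return dict(totals)


def revenue_snowflake(facts):
    """Same query, but category needs a second hop through snow_category."""
    totals = defaultdict(int)
    for f in facts:
        category_id = snow_product[f["product_id"]]["category_id"]
        category = snow_category[category_id]["category"]
        month = dim_date[f["date_id"]]["month"]
        totals[(category, month)] += f["amount"]
    return dict(totals)
```

Both functions return the same totals. The snowflake form saves redundant storage of the category name but adds a join for every query that needs it.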

  31. De-Duplication
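
Here is a minimal in-memory de-duplication sketch. It assumes each event carries a unique `id` field (a hypothetical schema) and treats any later event whose id has already been seen as a duplicate, so the first occurrence wins. The `seen` set grows with every distinct key. That memory cost is what probabilistic structures such as the Bloom filters on slide 32 trade against exactness.

```python
def deduplicate(events, key=lambda e: e["id"]):
    """Return events in arrival order, keeping only the first per key."""
    seen = set()
    unique = []
    for e in events:
        k = key(e)
        if k not in seen:
            seen.add(k)
            unique.append(e)
    return unique
```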

  32. Bloom Filters / Cuckoo Filters
    - "No": does not exist for sure.
    - "Yes": may or may not exist.
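
A Bloom filter sets k bits per inserted item. A lookup that finds any of its k bits unset proves the item was never added, while a lookup that finds all bits set can be a false positive. This is a minimal sketch, not a tuned implementation: `size_bits` and `num_hashes` are arbitrary assumed defaults, and the k positions come from salted SHA-256 digests for simplicity. Cuckoo filters give the same one-sided guarantee and also support deletion; they are not shown here.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions from salted SHA-256 digests of the item.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: str) -> bool:
        # False: definitely never added. True: possibly added.
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))
```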
  33. Slowly Changing Dimensions
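
Slowly changing dimensions are dimension attributes, such as a customer's city, that change occasionally. One common approach, usually called Type 2, keeps every version with a validity interval so that facts can be matched to the value that was true at the time. The talk does not say which approach it recommends, so this sketch assumes Type 2. `CustomerVersion`, `scd2_update` and `city_as_of` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CustomerVersion:
    customer_id: int
    city: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def scd2_update(history, customer_id, new_city, as_of):
    """Type 2 update: close the current version and append a new one."""
    current = next((r for r in history
                    if r.customer_id == customer_id and r.valid_to is None), None)
    if current is not None:
        if current.city == new_city:
            return history  # no change, keep the current version open
        current.valid_to = as_of
    history.append(CustomerVersion(customer_id, new_city, as_of))
    return history


def city_as_of(history, customer_id, when):
    """Return the city valid on `when`, using half-open [valid_from, valid_to)."""
    for r in history:
        if (r.customer_id == customer_id and r.valid_from <= when
                and (r.valid_to is None or when < r.valid_to)):
            return r.city
    return None
```

For example, after recording "Pune" from 2016-01-01 and "Bangalore" from 2017-06-01, `city_as_of` returns "Pune" for 2017-01-01 and "Bangalore" for 2017-11-13.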

  34. Batching vs Streaming
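
This example computes the same page-view count both ways to make the contrast concrete. A batch job sees a bounded dataset and computes the answer once. A streaming job sees one event at a time, keeps running state, and emits an updated answer after each event. The event shape (`{"page": ...}`) and the names are assumptions for illustration. Real systems also need windowing, state checkpointing and fault tolerance, none of which is shown here.

```python
from collections import Counter


def batch_counts(events):
    # Batch: the whole bounded dataset is available; count in one pass.
    return Counter(e["page"] for e in events)


class StreamingCounter:
    # Streaming: events arrive one by one; keep state and emit updates.
    def __init__(self):
        self.counts = Counter()

    def on_event(self, event):
        self.counts[event["page"]] += 1
        return self.counts[event["page"]]  # latest count for this page
```

Once the stream has consumed every event, its state matches the batch result.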

  35. Out-of-Order Processing
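
A common way to handle out-of-order arrival is to group events by event time rather than arrival time, and to use a watermark to decide when a window is complete. This sketch makes four assumptions:

- timestamps are integers;
- windows are fixed-size and non-overlapping (tumbling);
- the watermark trails the largest event time seen by a fixed `max_delay`;
- events for a window that has already closed are counted as late and dropped.

The class and parameter names are illustrative and are not the API of any specific framework.

```python
from collections import defaultdict


class EventTimeWindows:
    """Tumbling event-time windows finalized by a simple watermark."""

    def __init__(self, window_size, max_delay):
        self.window_size = window_size
        self.max_delay = max_delay
        self.open = defaultdict(int)  # window start -> event count
        self.max_event_time = None
        self.late = 0

    def watermark(self):
        if self.max_event_time is None:
            return None
        return self.max_event_time - self.max_delay

    def on_event(self, event_time):
        start = event_time - event_time % self.window_size
        wm = self.watermark()
        if wm is not None and start + self.window_size <= wm:
            self.late += 1  # its window was already emitted
            return []
        self.open[start] += 1
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time
        return self._emit_ready()

    def _emit_ready(self):
        # Emit (window_start, count) for every window the watermark has passed.
        wm = self.watermark()
        ready = sorted(s for s in self.open if s + self.window_size <= wm)
        return [(s, self.open.pop(s)) for s in ready]
```

With `window_size=10` and `max_delay=5`, feeding the timestamps `1, 4, 12, 3, 16, 2, 25` emits `(0, 3)` and then `(10, 2)`. The out-of-order `3` is still accepted. The `2` arrives after window 0 has closed and is dropped as late.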

  36. Cubes
    - Efficiency of Retrieval
    - Warehouse:Cube :: DB:Table
    - View: Dimension + Measure
    - Slice, Dice & Rotate
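
The cube slide names slice, dice and rotate. This toy in-memory sketch uses hypothetical facts with three dimensions (region, product, month) and one measure (sales):

- Slicing fixes one dimension to a single value.
- Dicing restricts several dimensions to sets of values.
- `roll_up` (an added helper, not from the talk) aggregates away the dimensions you do not keep.

Calling `roll_up` with its axes in a different order gives the rotated (pivoted) view.

```python
from collections import defaultdict

# Hypothetical facts: (region, product, month, sales).
FACTS = [
    ("North", "Tea", 11, 40),
    ("North", "Cake", 11, 120),
    ("South", "Tea", 11, 70),
    ("South", "Tea", 12, 60),
]
DIMS = ("region", "product", "month")


def build_cube(facts):
    """Pre-aggregate the measure by every combination of dimension values."""
    cube = defaultdict(int)
    for region, product, month, amount in facts:
        cube[(region, product, month)] += amount
    return dict(cube)


def slice_(cube, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return {k: v for k, v in cube.items() if k[i] == value}


def dice(cube, **allowed):
    """Dice: keep cells whose dimension values fall in the given sets."""
    idx = {DIMS.index(d): set(vals) for d, vals in allowed.items()}
    return {k: v for k, v in cube.items()
            if all(k[i] in s for i, s in idx.items())}


def roll_up(cube, *keep):
    """Sum the measure over every dimension not listed in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(int)
    for k, v in cube.items():
        out[tuple(k[i] for i in idx)] += v
    return dict(out)
```

`roll_up(cube, "region", "product")` and `roll_up(cube, "product", "region")` hold the same numbers with the axes swapped, which is all a rotate does.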
  37. Architecture: Revisited

  38. Sample Solution

  39. Thank you! Piyush Verma @meson10 Oogway Consulting http://oogway.in