ABC of Distributed Data Processing

New to the Big Data world? Too many frameworks? Let's take a step back to learn what is done, why it is done, and how we got here.

Piyush Verma

July 22, 2017

Transcript

  1. ABC of Distributed Data Processing. Achieving Buzzword Compliance. Oogway Consulting

  2. Hive Data Will Save.

  3. Why are we Here?

  4. Types of Workload

  5. Why do analysis at all?

  6. Descriptive - Historical. - Deterministic. - Inferential. - Managers make pretty graphs.

  7. Predictive - Future. - Probabilistic. - Based on Descriptive. - This is how pundits predict stocks.

  8. Prescriptive

  9. Enough. What does this data look like?

  10. Storage Choice 1

  11. Storage Choice 2

  12. Star Schema

  13. Snowflake Schema

  14. Oh no

  15. What are we solving?

  16. Challenges

  17. Scaling

  18. Archival Policy

  19. Garbage / Purging

  20. All related entities end up in complex joins

  21. All Relationships complicate over Dimension of time

  22. Anatomy

  23. One such solution.

  24. Thank you. - Piyush Verma - piyush@oogway.in - Twitter: meson10