Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017

Insights can only be as good as the data. The data quality domain is enormous, so you need to understand your company's pain points to know what to focus on first.

https://www.bigdataspain.org/2017/talk/big-data-big-quality

Big Data Spain 2017
November 16th - 17th Kinépolis Madrid

Big Data Spain

December 05, 2017

Transcript

  1. None
  2. Irene Gonzálvez, Product Manager at Spotify: Big Data, Big Quality?

  3. None
  4. Irene Gonzálvez, Product Manager, Data Infrastructure

  5. Music Streaming Service. Launched in 2008. Premium and Free Tiers. Available in 61 Countries
  6. Over 140M Monthly Active Users

  7. More than 30M Songs

  8. Over 1 billion plays per day

  9. Data enables recommendations, advertising, label and artist payments, and more
  10. Data First

  11. Data of Good Quality First

  12. Data quality problems cost US business $600B a year! (Data Warehouse Institute)
  13. None
  14. Data Quality Dimensions: Timely, Correctness, Completeness, Consistency

  15. None
  16. None
  17. DataMon

  18. Data Counters

  19. MetriLab

  20. MetriLab

  21. MetriLab

  22. Data Quality Dimensions: Timely, Correctness, Completeness, Consistency. Datamon, Data Counters, MetriLab
  23. TC4D: Test Certified for Data. Level 1: Set-up, monitoring, alerting and documentation. Level 2: Data management and unit tests. Level 3: Build your defenses
  24. What’s next? Build an algorithm library for anomaly detection (ML4ALL). Provide the infrastructure to ‘plug&play’ more algorithms. Provide parameter recommendations to tweak the algorithms
  25. What’s next? Spotify-wide strategy: have metrics to understand when a dataset qualifies as ‘good’ quality; identify which datasets are critical/central to Spotify and make them of ‘good’ quality
  26. Key Takeaways

  27. Lesson #1: Think Big. Understand your org’s pain points

  28. Lesson #2: Start small. And start NOW!

  29. Lesson #3: Data Quality is not an add-on. Insights can ONLY be as good as the data
  30. Data will increase 10x by 2025 (International Data Corp). 1 ZB = 1 trillion GB
  31. 20% Critical Data, 10% Hypercritical Data

  32. Q&A. Irene Gonzálvez, Product Manager, Spotify. irene@spotify.com
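
Slides 18 and 24 mention data counters and a ‘plug&play’ library of anomaly detection algorithms with tweakable parameters. The slides do not show how these work, so the following is only a minimal sketch of the general idea, not Spotify's implementation. The registry, the detector names (`zscore`, `relative_drop`), their thresholds, and the sample counts are all hypothetical.

```python
from statistics import mean, stdev
from typing import Callable, Dict, List

# Hypothetical registry: detector name -> function(history, today) -> bool.
# New algorithms are "plugged in" by registering them under a name.
Detector = Callable[[List[int], int], bool]
DETECTORS: Dict[str, Detector] = {}


def register(name: str) -> Callable[[Detector], Detector]:
    """Decorator that adds a detector to the registry under `name`."""
    def wrap(fn: Detector) -> Detector:
        DETECTORS[name] = fn
        return fn
    return wrap


@register("zscore")
def zscore_detector(history: List[int], today: int, threshold: float = 3.0) -> bool:
    """Flag today's count if it lies more than `threshold` sample standard
    deviations away from the mean of the historical counts."""
    if len(history) < 2:
        return False  # not enough history to estimate the spread
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today != mu  # flat history: any change is unusual
    return abs(today - mu) / sigma > threshold


@register("relative_drop")
def relative_drop_detector(history: List[int], today: int, max_drop: float = 0.5) -> bool:
    """Flag today's count if it fell by more than `max_drop` (as a fraction)
    compared with the most recent historical count."""
    if not history:
        return False
    return today < history[-1] * (1 - max_drop)


def check(history: List[int], today: int) -> Dict[str, bool]:
    """Run every registered detector with its default parameters."""
    return {name: fn(history, today) for name, fn in DETECTORS.items()}


if __name__ == "__main__":
    daily_counts = [1000, 1020, 980, 1010, 990]  # made-up record counts per day
    print(check(daily_counts, 400))   # a sharp drop in records
    print(check(daily_counts, 1005))  # a normal day
```

Keeping each detector behind the same `(history, today)` signature is what makes it cheap to add another algorithm; the keyword arguments (`threshold`, `max_drop`) are the kind of parameters for which slide 24 suggests providing recommendations.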