Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017

Insights can only be as good as the data behind them. The data quality domain is enormous, so you need to understand your company's pain points to know what to focus on first.

https://www.bigdataspain.org/2017/talk/big-data-big-quality

Big Data Spain 2017
November 16th-17th, Kinépolis Madrid

Big Data Spain

December 05, 2017

Transcript

  2. Irene Gonzálvez, Product Manager at Spotify
    Big Data,
    Big Quality?

  4. Irene Gonzálvez
    Product Manager
    Data Infrastructure

  5. Music Streaming Service
    Launched in 2008
    Premium and Free Tiers
    Available in 61 Countries

  6. Over 140M Monthly
    Active Users

  7. More than 30M Songs

  8. Over 1 billion plays per day

  9. Data enables recommendations, advertising,
    label and artist payments, and more

  10. Data First

  11. Data of Good Quality First

  12. Data quality problems cost US
    businesses $600B a year!
    Source: The Data Warehousing Institute (TDWI)

  14. Data Quality Dimensions
    Timeliness
    Correctness
    Completeness
    Consistency
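
The four dimensions named on this slide can each be turned into a simple automated check. The sketch below is only an illustration of that idea, not how Spotify implements it: the `PlayEvent` record shape, the function names, the 30-minute grace period, and the sample data are all assumptions made up for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class PlayEvent:
    """Hypothetical record shape, invented for this illustration."""
    user_id: Optional[str]
    track_id: Optional[str]
    ms_played: int


def is_timely(delivered_at: datetime, due_at: datetime, grace: timedelta) -> bool:
    """Timeliness: the dataset arrived no later than its deadline plus a grace period."""
    return delivered_at <= due_at + grace


def completeness(events: List[PlayEvent]) -> float:
    """Completeness: fraction of events with every required field present."""
    if not events:
        return 0.0
    filled = sum(1 for e in events if e.user_id and e.track_id)
    return filled / len(events)


def correctness(events: List[PlayEvent]) -> float:
    """Correctness: fraction of events whose values pass a basic validity rule."""
    if not events:
        return 0.0
    valid = sum(1 for e in events if e.ms_played >= 0)
    return valid / len(events)


def is_consistent(upstream_count: int, downstream_count: int) -> bool:
    """Consistency: two datasets describing the same events agree on their size."""
    return upstream_count == downstream_count


# Made-up sample data: one event misses a field, one has an invalid value.
events = [
    PlayEvent("u1", "t1", 30_000),
    PlayEvent("u2", None, 12_000),   # missing track_id
    PlayEvent("u3", "t3", -5),       # negative play time
    PlayEvent("u4", "t4", 45_000),
]
due = datetime(2017, 11, 16, 6, 0)
report = {
    "timely": is_timely(datetime(2017, 11, 16, 6, 20), due, timedelta(minutes=30)),
    "completeness": completeness(events),
    "correctness": correctness(events),
    "consistent": is_consistent(upstream_count=4, downstream_count=len(events)),
}
print(report)
```

Reporting each dimension as its own value keeps the checks independent, so a team can start with the one that matches its most pressing pain point and add the others later.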

  17. DataMon

  18. Data Counters

  19. MetriLab

  20. MetriLab

  21. MetriLab

  22. Data Quality Dimensions
    Timeliness
    Correctness
    Completeness
    Consistency
    Tools: DataMon, Data Counters, MetriLab

  23. TC4D: Test Certified for Data
    Level 1: Set-up, monitoring, alerting and documentation
    Level 2: Data management and Unit tests
    Level 3: Build your defenses

  24. What’s next?
    Build an algorithm library for anomaly detection (ML4ALL)
    Provide the infrastructure to ‘plug&play’ more algorithms
    Provide parameter recommendations to tweak the algorithms
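
One way to picture a 'plug&play' algorithm library is a registry that detectors add themselves to, all sharing one call signature with a tunable parameter. The sketch below is a hypothetical illustration only: the `register` decorator, the `DETECTORS` registry, the two detectors (z-score and median absolute deviation), the thresholds, and the sample row counts are assumptions for this example, and nothing here reflects the actual ML4ALL design.

```python
from statistics import mean, median, pstdev
from typing import Callable, Dict, List

# A detector takes a series and a threshold and returns the anomalous indices.
Detector = Callable[[List[float], float], List[int]]

# Hypothetical registry: adding an algorithm means registering one function.
DETECTORS: Dict[str, Detector] = {}


def register(name: str) -> Callable[[Detector], Detector]:
    """Decorator that plugs a detector into the registry under a name."""
    def wrap(fn: Detector) -> Detector:
        DETECTORS[name] = fn
        return fn
    return wrap


@register("zscore")
def zscore(values: List[float], threshold: float) -> List[int]:
    """Flag points more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


@register("mad")
def mad(values: List[float], threshold: float) -> List[int]:
    """Flag points more than `threshold` median absolute deviations from the median."""
    med = median(values)
    spread = median([abs(v - med) for v in values])
    if spread == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - med) / spread > threshold]


def detect(name: str, values: List[float], threshold: float) -> List[int]:
    """Run the named detector; `threshold` is the parameter a user would tweak."""
    return DETECTORS[name](values, threshold)


# Made-up daily row counts for a dataset; the last day drops sharply.
daily_row_counts = [1000, 1020, 990, 1010, 995, 1005, 120]
print(detect("zscore", daily_row_counts, 2.0))
print(detect("mad", daily_row_counts, 3.0))
```

Because every detector shares the same signature, a parameter recommendation can be expressed as a suggested `threshold` per algorithm and dataset without changing the calling code.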

  25. What’s next?
    Spotify-wide strategy
    ● Have metrics to understand when a dataset qualifies as ‘good’ quality.
    ● Identify which datasets are critical/central to Spotify and make them of ‘good’ quality.

  26. Key Takeaways

  27. Lesson #1: Think Big
    Understand your org’s pain points

  28. Lesson #2: Start small
    And start NOW!

  29. Lesson #3: Data Quality is
    not an add-on
    Insights can ONLY be as good as the data

  30. Data will increase
    10x by 2025
    Source: International Data Corporation (IDC)
    1 ZB = 1 trillion GB

  31. 20% Critical Data
    10% Hypercritical Data

  32. Q&A
    Irene Gonzálvez
    Product Manager,
    Spotify
    spotify.com
