Testing Machine Learning Systems in Staging by John Cragg

Shannon
January 23, 2019

Testing the qualitative performance of an ML model is a difficult task, usually done only by measuring fluctuations of key business metrics in production. In this session John will discuss his team's attempts to produce a deterministic test of algorithmic performance in Depop's staging environment for their product recommendation system.

Transcript

  1. Testing Machine Learning Models In Staging
    John Cragg
    24/01/2019

  2. Product Recommendations
    We recommend products to users based upon their in-app interactions
    (likes, saves, messages, comments and purchases).
    We expect similar users to like similar products.
    Depop Product Recommendations

  3. Testing is important!

  4. Testing ML Models
    It isn’t simple

  5. So how is it usually done?
    Unit testing
    Monitoring the effects on business metrics
    Validating big data & ML pipelines

  6. But wait John, that’s too late!
    A/B testing can cause a bad user experience

  7. Can we do better?

  8. Staging needs to mirror production

  9. What are the differences between staging and production?
    It’s mock data

  10. What are the differences between staging and production?
    The data isn’t dynamic

  11. What are the differences between staging and production?
    The size of the data

                  #Products    #Users
    Staging       ~0.5 mil     ~0.5 mil
    Production    ~93 mil      ~12 mil

  12. How did we do it?
    Similar users should like similar products
    We partitioned users and products into classes
    [Diagram: user classes U0, U1, U2, U3 and U4, with example users User Id 30320 and User Id 4356]

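The slide does not say how IDs are assigned to classes. As a minimal sketch, assuming a simple modulo rule (an assumption, not something the deck confirms), a deterministic partition could look like this. The names `NUM_CLASSES`, `user_class` and `product_class` are hypothetical.

```python
NUM_CLASSES = 5  # the slide shows five user classes, U0 to U4


def user_class(user_id: int) -> int:
    """Assumed rule: the class index is the ID modulo the number of classes."""
    return user_id % NUM_CLASSES


def product_class(product_id: int) -> int:
    """Same assumed rule, applied to product IDs."""
    return product_id % NUM_CLASSES


if __name__ == "__main__":
    # The two user IDs that appear on the slide, under the assumed rule.
    for uid in (30320, 4356):
        print(f"User Id {uid} -> U{user_class(uid)}")
```

A rule like this needs no lookup table, so any service in staging can recover the class of a user or product from its ID alone.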

  13. How did we do it?
    We created preferences between user and product classes
    [Diagram: user class U0 linked to product classes P0, P1, P2, P3 and P4, with preference probabilities 95%, 50%, 15%, 5% and 1%]

  14. How did we do it?
    In general we compute the preference probability as

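The formula on this slide did not survive in the transcript, so the following is a stand-in and not the talk's formula. It assumes, purely for illustration, that the preference probability depends on the circular distance between the user class and the product class, and it reuses the five percentages from the previous slide. The names `PREFERENCE_BY_DISTANCE`, `preference_probability` and `simulate_interaction` are hypothetical.

```python
import random

NUM_CLASSES = 5

# The five probabilities shown for user class U0 on the previous slide.
# Which product class receives which value is an assumption made here.
PREFERENCE_BY_DISTANCE = [0.95, 0.50, 0.15, 0.05, 0.01]


def preference_probability(user_cls: int, product_cls: int) -> float:
    """Assumed rule: look up the probability by circular class distance."""
    distance = (product_cls - user_cls) % NUM_CLASSES
    return PREFERENCE_BY_DISTANCE[distance]


def simulate_interaction(user_cls: int, product_cls: int, rng: random.Random) -> bool:
    """Draw one synthetic interaction: True if the user 'likes' the product."""
    return rng.random() < preference_probability(user_cls, product_cls)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed keeps the synthetic data reproducible
    likes = sum(simulate_interaction(0, 0, rng) for _ in range(1000))
    print(f"U0 liked {likes} of 1000 products from its preferred class")
```

Seeding the generator is what makes a test built on this data repeatable from one staging run to the next.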

  15. User Product Interactions Lambda
    That's me!

  16. Staging User Profile
    Each number in the selling tab, n, is a product of class n.

  17. User Product Interactions Lambda
    Interaction events are streamed to the data lake via Kinesis.
    The user and product IDs are logged in a Slack channel for internal use.
    This bit!

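The deck does not show the Lambda's code. As a rough sketch of the streaming step, the function below sends one interaction event through a Kinesis client's `put_record` call, which in boto3 takes `StreamName`, `Data` and `PartitionKey`. The event fields, the stream name and the `RecordingClient` stub are all hypothetical; the stub only lets the example run without AWS access.

```python
import json


def build_interaction_event(user_id: int, product_id: int, kind: str) -> dict:
    """Build a synthetic interaction event (field names are assumptions)."""
    return {"user_id": user_id, "product_id": product_id, "interaction": kind}


def send_interaction(kinesis_client, stream_name: str, event: dict) -> None:
    """Send one event to a Kinesis stream, partitioned by user ID."""
    kinesis_client.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=str(event["user_id"]),
    )


class RecordingClient:
    """Stand-in for a real Kinesis client; it just remembers what was sent."""

    def __init__(self) -> None:
        self.records = []

    def put_record(self, StreamName: str, Data: bytes, PartitionKey: str) -> None:
        self.records.append((StreamName, Data, PartitionKey))


if __name__ == "__main__":
    client = RecordingClient()
    event = build_interaction_event(30320, 12, "like")
    send_interaction(client, "staging-interactions", event)  # hypothetical stream name
    print(client.records)
```

In a real Lambda the stub would be replaced by `boto3.client("kinesis")`; passing the client in as a parameter keeps the function testable either way.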

  18. Staging Recommendations

  19. So does it work?

  20. How can we use it?
    Remember: We want similar users to be recommended similar products.

  21. How can we use it?
    V1: Original version
    V2: Slightly worse on the test data. This may or may not be okay.

  22. How can we use it?
    V1: Original version
    V3: Performs very poorly on the test data. We should not continue with this deployment.

  23. How can we use it?
    V1: Original version
    V4: All recommendations filtered out. We should not continue with this deployment.

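The V1 to V4 comparisons above suggest a pre-deployment gate. The sketch below is one possible shape for it, not the check Depop runs: it scores a model by the fraction of recommended products whose class matches the user's class, then blocks a candidate that returns nothing or scores much worse than the baseline. The modulo class rule, the metric and the `max_drop` threshold are all assumptions.

```python
NUM_CLASSES = 5


def class_of(entity_id: int) -> int:
    """Assumed class rule for both users and products: ID modulo class count."""
    return entity_id % NUM_CLASSES


def class_match_rate(recommendations: dict):
    """Fraction of recommended products in the same class as their user.

    `recommendations` maps user ID -> list of recommended product IDs.
    Returns None when no products were recommended at all.
    """
    pairs = [(u, p) for u, products in recommendations.items() for p in products]
    if not pairs:
        return None
    hits = sum(1 for u, p in pairs if class_of(u) == class_of(p))
    return hits / len(pairs)


def should_deploy(baseline_rate: float, candidate_rate, max_drop: float = 0.05) -> bool:
    """Allow deployment only if the candidate is not much worse than baseline."""
    if candidate_rate is None:  # every recommendation was filtered out
        return False
    return candidate_rate >= baseline_rate - max_drop


if __name__ == "__main__":
    v1 = class_match_rate({0: [5, 10], 1: [6, 11]})  # all classes match
    v3 = class_match_rate({0: [1, 2], 1: [3, 4]})    # no classes match
    v4 = class_match_rate({0: [], 1: []})            # nothing recommended
    print(should_deploy(v1, v3), should_deploy(v1, v4))
```

A small tolerated drop leaves room for the "slightly worse, may or may not be okay" case, which still calls for a human decision rather than an automatic block.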

  24. Conclusion
    Try to test before production
    Create data that reflects production
    Sanity checking models maintains good UX

  25. Any questions?
    https://twitter.com/DepopEng
    https://engineering.depop.com
    We’re hiring
    https://www.depop.com/about/jobs
