Reclaim your time: Automating Canary Analysis

Automating canary analysis can help you deploy more safely and with more confidence. You can reclaim the time you would have spent verifying deploys or handling incidents. Learn how to use statistical methods to automate the analysis of your canaries.

megankanne

June 05, 2018

Transcript

  1. Reclaim your time
    Automating Canary Analysis
    @megankanne
    June 2018
    #autocanary • http://bit.ly/autocanary
    by cjaphotography on Flickr

  2. Photo by Paul Fisher in SOMA

  3. by quintanomedia on Flickr

  4. Reclaim your time
    verification
    reliability
    safety
    confidence
    #autocanary

  5. Is my build “healthy”?
    1. “We found out when customers complained”
    2. “We got alerted about it”
    3. “We caught it before it caused any issues”
    #autocanary

  6. 1913
    John Scott Haldane
    Wikipedia
    George McCaa, U.S. Bureau of Mines

  7. definition
    A partial deployment of new code
    by quimby on Flickr
    by midom on Flickr

  8. Portion of Production
    production
    canary
    #autocanary

  9. by gt_hawk63 on Flickr

  10. Canary Cluster
    production cluster
    canary cluster
    #autocanary

  11. Canary Cluster
    production cluster
    canary cluster
    proxy
    #autocanary

  12. Canary Cluster
    production cluster
    canary cluster
    proxy
    #autocanary

  13. Canary Cluster v2
    production cluster
    canary cluster
    proxy
    prod build candidate
    #autocanary

  14. Visual Pattern Matching
    #autocanary

  15. Tap Compare
    response A == response B
    #autocanary
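
The comparison on this slide (response A == response B) can be sketched as follows. This is a minimal illustration, not the speaker's tooling: it assumes the same request can be replayed against two handlers, and the names `tap_compare`, `production`, and `candidate` are hypothetical.

```python
from typing import Callable, Iterable

Handler = Callable[[str], str]


def tap_compare(requests: Iterable[str], handler_a: Handler, handler_b: Handler) -> float:
    """Send each request to both handlers and return the fraction of mismatched responses."""
    total = 0
    mismatches = 0
    for request in requests:
        total += 1
        if handler_a(request) != handler_b(request):
            mismatches += 1
    return mismatches / total if total else 0.0


# Hypothetical stand-ins for the production build and the build candidate.
def production(request: str) -> str:
    return request.upper()


def candidate(request: str) -> str:
    # Simulated regression: requests containing a digit are returned unchanged.
    if any(ch.isdigit() for ch in request):
        return request
    return request.upper()


mismatch_rate = tap_compare(["a", "b", "c1", "d"], production, candidate)
print(f"mismatch rate: {mismatch_rate:.2f}")
```

A real setup would likely need to ignore fields that legitimately differ between responses, such as timestamps or request IDs.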

  16. Machine Learning
    by clintadair on Unsplash
    #autocanary

  17. #statistics
    #autocanary

  18. “Does This Shard Look Like The Others?”
    #autocanary

  19. MAD
    median absolute deviation
    maxPercentile
    Do
    metrics that group
    Ex: success rates, latencies
    Don’t
    horizontally offset
    Ex: memory used
    From NIST.gov (panels A, B, C)
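
A median absolute deviation check across shards might look like the sketch below. It is an assumption-laden illustration: the `threshold` parameter and the function name `mad_outliers` are hypothetical, and the slide's `maxPercentile` setting is not modeled because the deck does not define it.

```python
from statistics import median


def mad_outliers(values, threshold=3.0):
    """Return indices of values farther than threshold * MAD from the median."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # More than half the values equal the median: flag anything that differs.
        return [i for i, v in enumerate(values) if v != center]
    return [i for i, v in enumerate(values) if abs(v - center) / mad > threshold]


# Hypothetical per-shard success rates (percent); the last shard is the canary.
success_rates = [99.1, 99.3, 99.2, 99.0, 99.2, 94.5]
outliers = mad_outliers(success_rates)
print("outlier shard indices:", outliers)
```

Because both the center and the spread are medians, a single bad shard barely shifts either one, which is why this suits metrics that group tightly.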

  20. DBSCAN
    density-based spatial clustering of applications with noise
    toleranceFactor
    Do
    metrics that group
    Ex: success rates, latencies
    Don’t
    oscillate but don’t group
    Ex: memory used
    from hdbscan docs
    #autocanary
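
To make the clustering idea concrete, here is a small one-dimensional DBSCAN written from the textbook algorithm. It is a sketch under stated assumptions: `eps` and `min_samples` are the standard DBSCAN parameters, not the slide's `toleranceFactor`, whose exact meaning the deck does not give, and `dbscan_1d` is a hypothetical name.

```python
def dbscan_1d(values, eps, min_samples):
    """Label each value with a cluster id, or -1 for noise (standard DBSCAN, 1-D distance)."""
    n = len(values)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        # A point counts as its own neighbor, as in the usual formulation.
        return [j for j in range(n) if abs(values[i] - values[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_samples:
                queue.extend(reachable)  # j is a core point, keep expanding
    return labels


# Hypothetical per-shard latencies in milliseconds; the last shard is the canary.
latencies_ms = [101, 103, 102, 100, 104, 180]
labels = dbscan_1d(latencies_ms, eps=5.0, min_samples=3)
print(labels)
```

A shard labeled -1 does not look like the others. This brute-force version is quadratic in the number of shards, which is acceptable for a sketch but not for large fleets.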

  21. HDBSCAN
    hierarchical dbscan
    minSimilarShardsPercent
    Do
    metrics that group
    Ex: success rates, latencies
    Don’t
    oscillate but don’t group
    Ex: memory used
    from hdbscan docs
    #autocanary

  22. Mann-Whitney U Test
    tolerance
    confidenceLevel
    direction
    Do
    metrics that group
    Ex: success rates, latencies
    Don’t
    oscillate but don’t group
    Ex: memory used
    Kayenta
    #autocanary
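
The Mann-Whitney U test compares two samples without assuming a normal distribution. The sketch below uses the large-sample normal approximation with no tie or continuity correction, so it is only a rough guide for small or heavily tied samples. The function name and the `confidence_level` variable are hypothetical, and the slide's `tolerance` and `direction` settings are not modeled.

```python
import math


def mann_whitney_u(baseline, canary):
    """Two-sided Mann-Whitney U test; returns (U for the canary sample, approximate p-value)."""
    n1, n2 = len(baseline), len(canary)
    # U counts the pairs in which the canary value exceeds the baseline value; ties count half.
    u = sum(
        1.0 if c > b else 0.5 if c == b else 0.0
        for c in canary
        for b in baseline
    )
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    # Two-sided tail probability of a standard normal at |z|.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u, p_value


# Hypothetical latency samples in milliseconds.
baseline_latency = [100, 102, 98, 101, 99, 103, 97, 100.5]
canary_latency = [120, 118, 125, 119, 121, 117, 123, 122]

u_stat, p_value = mann_whitney_u(baseline_latency, canary_latency)
confidence_level = 0.99  # assumed threshold, for illustration only
canary_differs = p_value < 1 - confidence_level
print(u_stat, canary_differs)
```

In practice a library routine such as `scipy.stats.mannwhitneyu` handles ties and exact small-sample p-values and would be preferable to this hand-rolled version.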

  23. Success Story
    #autocanary

  24. #Tips
    Simplify Configuration
    #autocanary

  25. #Tips
    Choosing Metrics
    #autocanary

  26. #Tips
    User Trust
    #autocanary

  27. …but you told me the build was fine?!
    #autocanary

  28. Other Uses
    Per Pull Request
    #autocanary

  29. Examples
    Kayenta
    Twitter
    #autocanary

  30. Twitter
    (un)block
    deploy
    get metrics
    analyze
    clean
    For more info on workflows see https://www.youtube.com/watch?v=w36TOkuyAVc
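
The steps named on this slide could be wired together roughly as below. This is a hypothetical sketch: the slide does not state the order of the steps or how they are implemented, so the ordering, the function `run_canary`, and every callable passed to it are assumptions for illustration.

```python
from typing import Callable, Sequence, Tuple

Metrics = Tuple[Sequence[float], Sequence[float]]


def run_canary(
    deploy: Callable[[], None],
    get_metrics: Callable[[], Metrics],
    analyze: Callable[[Sequence[float], Sequence[float]], bool],
    clean: Callable[[], None],
) -> bool:
    """Deploy the canary, analyze its metrics, always clean up, and return the verdict."""
    deploy()
    try:
        baseline, canary = get_metrics()
        healthy = analyze(baseline, canary)
    finally:
        clean()  # tear the canary down even if metric collection or analysis fails
    return healthy  # the caller unblocks the rollout on True and keeps it blocked on False


# Hypothetical stand-ins that record the order in which the steps run.
log = []
result = run_canary(
    deploy=lambda: log.append("deploy"),
    get_metrics=lambda: (log.append("get metrics"), ([1.0, 1.1], [1.0, 1.2]))[1],
    analyze=lambda baseline, canary: (log.append("analyze"), max(canary) < 2.0)[1],
    clean=lambda: log.append("clean"),
)
print(log, "unblock" if result else "block")
```

Putting the cleanup in a `finally` block means a failed analysis never leaves a stray canary serving traffic.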

  31. Future Work
    better statistics
    no config
    scale to all metrics
    per pull request
    #autocanary

  32. Reclaim Your Time With Automated Canary Analysis
    safety
    confidence
    verification
    reliability
    Statistics
    #autocanary

  33. Thanks
    Dylan Dignan
    Rohit Khansili
    Chris Regado
    Akshay Thejaswi
    Ratheesh Vijayan
    Rich Vincelette
    @megankanne #autocanary http://bit.ly/autocanary
