
Practices and Obstacles to Conducting Experiments in Production

xLeitix
September 09, 2016

Slides of a talk I gave at the CHOOSE forum 2016.

Transcript

  1. software evolution & architecture lab, University of Zurich, Switzerland. Practices of and Obstacles to Conducting Experiments in Production. Dr. Philipp Leitner @xLeitix
  2. Talk Outline • Part I - An Intro to CD and Live Experimentation • Part II - Live Experimentation in the Wild • Part III - Release-as-Code with Bifrost
  3. Two Keys to Moving Fast (without breaking too much) • Software-as-a-Service • Continuous Delivery and Deployment
  4. But: velocity should come with confidence. [Quadrant chart plotting velocity (low/high) against confidence (low/high); the quadrants are labeled Problematic, Madness, Cautious, and Balanced.] Gerald Schermann, Jürgen Cito, Philipp Leitner, Harald C. Gall (2016). Towards Quality Gates in Continuous Delivery and Deployment. In Proceedings of the 24th IEEE International Conference on Program Comprehension (ICPC), Best Short Paper Award.
  5. Delivery Pipeline @ Facebook. Dror G. Feitelson, Eitan Frachtenberg, Kent L. Beck, "Development and Deployment at Facebook," IEEE Internet Computing, vol. 17, no. 4, pp. 8-17, July-Aug. 2013.
  6. Live Experimentation • Basic Idea: "Listen to your customers, not the opinion of the highest-paid person in the room." Even more: listen to what your customers do, not what they say! Ron Kohavi, Randal M. Henne, and Dan Sommerfield. 2007. Practical guide to controlled experiments on the web: listen to your customers not to the hippo. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '07). ACM, New York, NY, USA, 959-967. DOI: http://dx.doi.org/10.1145/1281192.1281295
  7. Academic sources?
    • Ron Kohavi, Randal M. Henne, and Dan Sommerfield. 2007. Practical guide to controlled experiments on the web: listen to your customers not to the hippo. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '07). ACM, 959-967. (Case studies from Microsoft)
    • Dror G. Feitelson, Eitan Frachtenberg, Kent L. Beck, "Development and Deployment at Facebook," IEEE Internet Computing, vol. 17, no. 4, pp. 8-17, July-Aug. 2013. (Case study from Facebook)
    • T. Savor, M. Douglas, M. Gentili, L. Williams, K. Beck, and M. Stumm. Continuous Deployment at Facebook and OANDA. In Proceedings of the 38th International Conference on Software Engineering Companion (ICSE '16), pages 21-30, ACM, 2016. (Case studies from Facebook and OANDA)
    • Chunqiang Tang, Thawan Kooburat, Pradeep Venkatachalam, Akshay Chander, Zhe Wen, Aravind Narayanan, Patrick Dowell, and Robert Karl. 2015. Holistic configuration management at Facebook. In Proceedings of the 25th Symposium on Operating Systems Principles (SOSP '15). ACM, 328-343. (Case study from Facebook)
  8. Research Questions • RQ1: What principles and practices enable organizations to leverage live experimentation? • RQ2: What are the different flavors of live experimentation, and how do they differ?
  9. Method: Mixed-Method Study (4 Phases). Pre-study • Scientific literature • Company blogs (Netflix, Etsy, …) • Hacker News articles. Qualitative Interviews (1) • 20 interviews / 18 companies • Software developers and release engineers • Various domains and company sizes • Open coding & card sorting (683 cards)
  10. Method: Mixed-Method Study (4 Phases). Qualitative Interviews (2) / Deep-Dive • 11 interviews / 9 companies • Mainly web applications • Software developers and release engineers. Quantitative Survey • 187 complete responses • 12 min on average • Multiple/single-choice, Likert-scale, and free-text questions
  11. Basic Techniques • Dogfooding ("eat your own dog food") • Blue/Green (or: Red/Black) Deployments • Canary Releases (or: Partial Rollouts) • Gradual Rollouts • Dark (or: Shadow) Launches • A/B Testing
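To make the differences between these techniques concrete, here is a minimal, hypothetical Python sketch of weight-based traffic splitting (my illustration, not from the talk): a blue/green deployment flips all traffic at once, a canary release sends a small fixed share to the new version, and a gradual rollout increases that share step by step. All names are illustrative.

    import random

    # Hypothetical router that splits incoming requests between the old and the
    # new version of a service according to a configurable weight (0.0 to 1.0).
    class TrafficSplitter:
        def __init__(self, new_version_weight=0.0):
            self.new_version_weight = new_version_weight

        def route(self):
            """Return which version should serve the next request."""
            return "new" if random.random() < self.new_version_weight else "old"

    splitter = TrafficSplitter()

    # Blue/green (red/black) deployment: switch all traffic at once.
    splitter.new_version_weight = 1.0

    # Canary release / partial rollout: expose only a small share of requests.
    splitter.new_version_weight = 0.05

    # Gradual rollout: increase the share step by step while watching monitoring data.
    for share in (0.05, 0.25, 0.5, 1.0):
        splitter.new_version_weight = share
        # ... observe metrics for a while, roll back if problems appear ...

    # Dark (shadow) launch: the new version would additionally receive a copy of live
    # traffic, but its responses are discarded; A/B testing assigns users to fixed
    # groups instead of splitting randomly per request.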
  12. Architecture. "It is difficult to release individual parts of the system as dependencies between new code and the system in the back are just too high" - Study Participant. The enabler: small, independently deployable services.
  13. Monitoring. "The decision whether to continue rolling out is based on monitoring data. We look at log files, has something happened, did we get any customer feedback, if there is nothing for a couple of days, then we move on." - Study Participant. Monitoring is used to (A) determine that everything runs as expected (e.g., health checks) and (B) decide about experiments (continue/stop, interpret results).
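As an illustration of the second role of monitoring, deciding whether to continue or stop a rollout, here is a minimal sketch under my own assumptions (the metric source and the 10% threshold are placeholders, not from the study): the new version's error rate is compared against the baseline.

    # Hypothetical continue/stop gate for a partial rollout, driven by monitoring
    # data. fetch_error_rate() stands in for a real metrics query.
    def fetch_error_rate(version: str) -> float:
        """Placeholder for a monitoring query (log aggregation, APM, ...)."""
        return {"old": 0.010, "new": 0.012}[version]   # dummy values for illustration

    def rollout_decision(max_relative_increase: float = 0.10) -> str:
        baseline = fetch_error_rate("old")
        canary = fetch_error_rate("new")
        # Stop and roll back if the new version is noticeably worse than the baseline.
        if canary > baseline * (1 + max_relative_increase):
            return "rollback"
        return "continue"

    print(rollout_decision())   # with the dummy values above: "rollback"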
  14. Expert Teams & Consultants. Teams are often not staffed with data science experts; centralized support teams (e.g., data scientists, DevOps) help, for example, to identify which metrics to look at and to interpret results.
  15. Live Experimentation: Regression-Driven vs. Business-Driven Experiments (comparison built up over the next slides). Main Goal: mitigation of technical problems (regression-driven) vs. evaluation from a business perspective, e.g., monetary incentives (business-driven).
  16. (comparison continued) Practices: Canary Releases, Dark Launches, Gradual Rollouts, Blue/Green Deployments (regression-driven) vs. A/B Testing (business-driven).
  17. (comparison continued) Data Interpretation: often intuitive, less process-driven (regression-driven) vs. hypothesis- and data-driven (business-driven).
  18. (comparison continued) Duration: minutes to multiple days (regression-driven) vs. on the order of weeks (business-driven).
  19. (comparison continued) User Selection: small scope, sometimes gradually increased (regression-driven) vs. two or more groups of constant size (business-driven).
  20. (comparison continued) Responsibility: siloization (regression-driven) vs. multiple teams and services (business-driven).
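For business-driven experiments, the constant-size user groups mentioned above are commonly formed by deterministic bucketing, so each user consistently sees the same variant. The following sketch is my own illustration (the experiment name and group count are made up), not something described by the study participants.

    import hashlib
    import string

    # Hypothetical assignment of users to constant-size experiment groups.
    # Hashing the user id together with the experiment name keeps the assignment
    # stable across sessions without storing any per-user state.
    def assign_group(user_id: str, experiment: str = "checkout-redesign", groups: int = 2) -> str:
        digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
        return string.ascii_uppercase[int(digest, 16) % groups]   # "A", "B", ...

    # The same user always lands in the same group.
    assert assign_group("user-42") == assign_group("user-42")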
  21. Bifrost: CD with Multi-Phase Live Testing Strategies. Gerald Schermann, Dominik Schöni, Philipp Leitner, and Harald C. Gall: Bifrost - Supporting Continuous Deployment with Automated Enactment of Multi-Phase Live Testing Strategies. In Proceedings of the 2016 ACM/IFIP/USENIX Middleware Conference (to appear).
  22. Bifrost. Live experimentation is often a daunting task: multiple versions have to be operated in parallel, metrics are the basis for rollout/rollback decisions, and it is "expensive" when administered manually.
  23. (continued) Bifrost: middleware for executing multi-phased release strategies; designed for microservices-based applications; based on a formal model with a DSL on top; non-intrusive, requiring no code-level changes.
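The paper expresses such strategies in Bifrost's own DSL; since that syntax is not reproduced here, the following is only a hypothetical, Python-flavoured sketch of what a multi-phase strategy combining a regression-driven canary phase with a business-driven A/B phase could look like as data.

    # Hypothetical multi-phase release strategy (illustrative only; this is NOT
    # Bifrost's actual DSL): a regression-driven canary phase followed by a
    # business-driven A/B phase, each with its own success criterion.
    release_strategy = {
        "service": "checkout",
        "phases": [
            {
                "type": "canary",                     # regression-driven
                "traffic_to_new_version": 0.05,
                "duration": "2h",
                "success": "error_rate(new) <= 1.1 * error_rate(old)",
                "on_failure": "rollback",
            },
            {
                "type": "ab_test",                    # business-driven
                "groups": {"A": "old", "B": "new"},
                "duration": "14d",
                "success": "conversion_rate(B) > conversion_rate(A)",
                "on_failure": "rollback",
            },
        ],
        "on_success": "full_rollout",
    }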
  24. Bifrost - Architecture. [Diagram: each service (Service 1 … Service n) sits behind its own Bifrost Proxy; the Bifrost Engine configures the proxies and collects metrics from a Metrics Provider; the developer / release engineer writes release strategies in the DSL and submits them through the Bifrost CLI or the Bifrost Dashboard, which interact with the Engine's API and receive status updates.]
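Read against the diagram, the engine/proxy interaction could work roughly as follows; this control loop is a schematic sketch under my own assumptions (class and method names are invented), not Bifrost's actual implementation.

    import time

    def parse_duration(text: str) -> float:
        """Convert strings such as "2h" or "14d" into seconds."""
        units = {"s": 1, "m": 60, "h": 3600, "d": 86400}
        return float(text[:-1]) * units[text[-1]]

    class Proxy:
        """Stands in for a per-service proxy: routes a share of traffic to the new version."""
        def __init__(self):
            self.new_weight = 0.0

        def configure(self, new_weight: float):
            self.new_weight = new_weight

    class MetricsProvider:
        """Placeholder for the metrics provider; evaluates a phase's success condition."""
        def phase_succeeded(self, condition: str) -> bool:
            return True   # a real implementation would query the collected metrics

    def enact(strategy, proxy, metrics, wait=time.sleep):
        """Walk through a strategy's phases, reconfiguring the proxy and checking metrics."""
        for phase in strategy["phases"]:
            proxy.configure(phase.get("traffic_to_new_version", 0.5))
            wait(parse_duration(phase["duration"]))
            if not metrics.phase_succeeded(phase["success"]):
                proxy.configure(0.0)     # roll back: all traffic to the old version
                return "rolled back"
        proxy.configure(1.0)             # all phases passed: full rollout
        return "fully rolled out"

Because the waiting function is injectable, the loop can be dry-run in tests (e.g., wait=lambda s: None) without actually sleeping through a phase.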
  25. software evolution & architecture lab, University of Zurich, Switzerland. Summary.
    Main contacts: Dr. Philipp Leitner @xLeitix, Gerald Schermann @sh3llcat, Harald C. Gall @hcgall
    More info:
    • Gerald Schermann, Jürgen Cito, Philipp Leitner, Harald C. Gall (2016). Towards Quality Gates in Continuous Delivery and Deployment. In Proceedings of the 24th IEEE International Conference on Program Comprehension (ICPC).
    • Gerald Schermann, Dominik Schöni, Philipp Leitner, Harald C. Gall (2016). Bifrost - Supporting Continuous Deployment with Automated Enactment of Multi-Phase Live Testing Strategies. In Proceedings of the 2016 ACM/IFIP/USENIX Middleware Conference.
    • Gerald Schermann, Jürgen Cito, Philipp Leitner, Uwe Zdun, Harald C. Gall (2016). An Empirical Study on Principles and Practices of Continuous Delivery and Deployment. PeerJ Preprints 4:e1889v1.