Production: Designing for Testability

A major part of our lives is working safely with production, yet few organizations today design production to enable higher quality and end-to-end verification of the code we write and deploy. In this talk, we build on the foundation of great microservice architectures to include the first-class design of testability as one of the most important artifacts that high-velocity, high-quality teams should consider. In particular, we’ll explore what it’s like to build quality software with no development, QA, or staging environments. We’ll include a deep dive into “verification in production” and what it really takes to build software that can safely be tested continuously in production. Let’s build developer happiness by *knowing* production is correct.


Michael Bryzek

June 27, 2017


  1. Production: Designing for Testability • Michael Bryzek / @mbryzek • Cofounder / CTO Flow • Cofounder / ex-CTO Gilt
  2. Think about something you deployed recently to production…

  3. Is It Working?

  4. Right now in production?

  5. How do you know?

  6. Feeling anxious?

  7. Let’s remove that anxiety

  8. About Me

  9. Software Quality is Hard • Think end to end for entire lifecycle of code • Verification in Production is a powerful technique to help us build quality software
  10. True continuous delivery • No staging environments • Don’t run code locally • Life at Flow: Delivering Quality Software
  11. True Continuous Delivery • Automated tests / No safeguards • 1 way to do something • Assume Continuous Delivery in Design Process
  12. “I love my staging environment” Said Nobody Ever

  13. No Staging Environments • Bottlenecks • Fragile • Difficult to understand failure • Expensive (30-40% of budget common) • Create the wrong incentives
  14. Don’t run code locally • If unsure, write the test! • Learn to trust your tests
  15. Quality Through Architecture • Extreme Isolation • Rich event streams • Own DNS, load balancer • Private database • No consul/zk/shared state • Stop cascading failures • “Delay” not “Outage”
  16. Let’s look at real examples • Successfully “testing in production”

  17. Example: Know That Checkout Works • Bot places an order every few minutes • Identify test orders and immediately cancel
  18. Example: Support “Sandbox” Accounts • “SaaS” – even for internal accounts • Mark individual accounts as sandbox • One API Key for all sandbox accounts • “every service is a third party”
  19. Example: End to End Integration Tests • Create Sandbox Org • Run tests • Delete Sandbox Org • “Safe and Repeatable”
  20. Example: Using Sandbox Account for Test Orders

  21. Example: Verifying Proxy Server Works as Expected

  22. Operating As Expected

  23. But sometimes things go wrong Even to the best of

  24. Considerations • Make production access explicit (not the default) • Use defined paths (e.g. API calls) • Restrict sensitive data • Design for side effects
  25. Unexpected Benefits

  26. Perfect Documentation

  27. Capture request/response of API Calls

  28. Tooling: API Builder (formerly known as apidoc) • Version control for APIs • Backwards Compatibility • High Quality Mocks
  29. High Quality Mocks – From Specs • Full Mock Generated • Implement Only What You Need To Test
  30. Tooling: Real Time DB Monitoring

  31. Tooling: Super Simple Alerts from Log • Log a prefix • Schedule a real time alert
  32. Key Takeaways – Design Production to be Testable • Trust your tests, run subset in production • Invest in continuous delivery • Sandbox accounts are powerful • High quality, trustworthy mocks • Real-time feedback from production
  33. Thank You! • We’re hiring: • Michael Bryzek / @mbryzek • Cofounder / CTO Flow • Cofounder / ex-CTO Gilt
  34. Appendix • Michael Bryzek / @mbryzek • Cofounder / CTO Flow Commerce • Cofounder / ex-CTO Gilt Groupe
  35. Knowing production works Frees your mind To build new applications And Live in Bliss
  36. Example: Deploy, Then Release • Separate deploy from real traffic • Rollout incrementally • Splitter -
  37. Culture • Failure is a mandatory component of success • Progress

  38. Define Failure – shift to model cost • Never forget there is cost to doing nothing
  39. No Private Network • Nothing special about our network
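
The end-to-end integration test flow on slide 19 (create a sandbox org, run tests, delete the sandbox org) could be sketched roughly as follows. This is a minimal illustration, not Flow's actual implementation: `FakePlatformClient`, `sandbox_org`, `run_checkout_test`, and every method name are hypothetical, and the client is an in-memory stand-in so the sketch runs without a real API. The point it tries to show is that cleanup is guaranteed even when a test fails, which is one way to make the run "safe and repeatable".

```python
import uuid
from contextlib import contextmanager


class FakePlatformClient:
    """In-memory stand-in for a production API client (hypothetical)."""

    def __init__(self):
        self.orgs = {}

    def create_org(self, environment):
        # A unique id per run keeps repeated or concurrent test runs isolated.
        org_id = f"test-{uuid.uuid4().hex[:8]}"
        self.orgs[org_id] = {"environment": environment, "orders": []}
        return org_id

    def place_order(self, org_id, item):
        self.orgs[org_id]["orders"].append(item)
        return len(self.orgs[org_id]["orders"])

    def delete_org(self, org_id):
        del self.orgs[org_id]


@contextmanager
def sandbox_org(client):
    """Create a sandbox org, yield its id, and always delete it afterwards."""
    org_id = client.create_org(environment="sandbox")
    try:
        yield org_id
    finally:
        # Runs on success and on failure, so no test data is left behind.
        client.delete_org(org_id)


def run_checkout_test(client):
    """Example test body: place one order inside a throwaway sandbox org."""
    with sandbox_org(client) as org_id:
        order_count = client.place_order(org_id, "sku-123")
        assert order_count == 1
        return org_id
```

Against a real service, the same shape would apply with the fake client swapped for real API calls made through the defined paths mentioned on slide 24.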
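
The checkout bot on slide 17 (place an order every few minutes, identify test orders, cancel them immediately) might look something like the sketch below. Everything here is an assumption made for illustration: the slides do not say how test orders are identified, so the email-suffix marker `TEST_EMAIL_SUFFIX`, the `Order` fields, and the `FakeOrderApi` methods are all hypothetical. A scheduler would call `verify_checkout` on an interval; that part is omitted.

```python
from dataclasses import dataclass
from itertools import count

# Hypothetical marker used to recognise orders placed by the bot.
TEST_EMAIL_SUFFIX = "@checkout-bot.example.com"


@dataclass
class Order:
    id: int
    email: str
    status: str = "submitted"


class FakeOrderApi:
    """In-memory stand-in for the production order API (hypothetical)."""

    def __init__(self):
        self._ids = count(1)
        self.orders = {}

    def place_order(self, email):
        order = Order(id=next(self._ids), email=email)
        self.orders[order.id] = order
        return order

    def cancel(self, order_id):
        self.orders[order_id].status = "cancelled"


def is_test_order(order):
    """Identify bot orders by their marker so real orders are never touched."""
    return order.email.endswith(TEST_EMAIL_SUFFIX)


def verify_checkout(api):
    """Place one test order, record whether it succeeded, then cancel it."""
    order = api.place_order("bot" + TEST_EMAIL_SUFFIX)
    placed_ok = order.status == "submitted"
    if is_test_order(order):
        api.cancel(order.id)
    return placed_ok
```

Checking `is_test_order` before cancelling is a deliberate guard: the cancel path only ever acts on orders that carry the bot's marker.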