Fear of the Computer

Have you ever worked on a computer system that was so fragile it was frightening to make changes to? Maybe it was challenging to deploy, difficult to delete code, or changing one piece would cause surprising cascading failures.

Maggie Zhou

April 24, 2018

Transcript

  1. Channel your Fear of the Computer into confident engineering
    Maggie Zhou, The Lead Developer New York 2018
  2. None
  3. todo: photo of self, photo of SF, photo of Slack
  4. We rely on so many software systems to build our products.
  5. Change is required to make progress.

  6. But sometimes things break. (photo here)

  7. Comic: Alex Norris of webcomicname.com

  8. None
  9. Techniques for confidence
    • Measure your code. But how?
    • Ramp up your work incrementally. But why?
  10. Case Study 1: How to modify a black box with confidence
    • Instrument the code to understand your baseline
    • Ramp up gradually and enable ramping back down
  11. Instrument the code
    Before:
        func connect_to_db(db_xyz):
            return connect(xyz)
    After:
        func connect_to_db(db_xyz):
            time = now()
            conn = connect(xyz)
            metric(now() - time, db_xyz)
            return conn
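The instrumented pseudocode on this slide could look roughly like the following in Python. This is a minimal sketch, not the code from the talk: `TIMINGS`, `metric`, and `connect` are hypothetical stand-ins for a real metrics client and a real database driver.

```python
import time
from collections import defaultdict

# Hypothetical in-memory metrics sink. A real system would ship these
# samples to a metrics service instead of keeping them in a dict.
TIMINGS = defaultdict(list)


def metric(elapsed_seconds, key):
    """Record one timing sample under the given key."""
    TIMINGS[key].append(elapsed_seconds)


def connect(db_name):
    """Stand-in for the real connection call (an assumption of this sketch)."""
    return {"db": db_name}


def connect_to_db(db_name):
    """Connect to a database and record how long the connection took."""
    start = time.perf_counter()
    try:
        return connect(db_name)
    finally:
        # Record the duration even when connect() raises, so failed
        # attempts also show up in the baseline.
        metric(time.perf_counter() - start, db_name)
```

Collecting these samples before changing anything gives the baseline that later ramp-ups are compared against.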
  12. Feature Flags!

  13. Why feature flags?
    • Gate user visible client-side features with server-side configuration
    • Do A/B testing
    • Merge code that isn’t ready for full-feature release
    • Incremental ramp-ups
    • Easy, well-socialized place to ramp down and turn it off
  14. Why feature flags for infrastructure changes?
    • Gate user visible client-side features with server-side configuration
    • Do A/B testing
    • Merge code that isn’t ready for full-feature release
    • Incremental ramp-ups
    • Easy, well-socialized place to ramp down and turn it off
  15. Build confidence from incremental ramp-ups and watching your metrics.

  16. Ramp up incrementally
        func connect_to_db(db_xyz):
            time = now()
            conn = connect_old_way(xyz)
            metric(now() - time, db_xyz)
            return conn
  17. Ramp up incrementally
        func connect_to_db(db_xyz):
            time = now()
            if (Feature::isOn('db_new')):
                conn = connect_new_way(xyz)
            else:
                conn = connect_old_way(xyz)
            metric(now() - time, db_xyz)
            return conn

        …
        …
        $config['db_new'] = array(
            'on' => 1  # Ramp up to 1% of traffic
        )
        …
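One common way to implement a percentage ramp-up like the `'on' => 1` setting above is to hash a stable key into a bucket from 0 to 99 and compare it to the configured percentage. The sketch below assumes that approach; it is not the Feature library from the talk. `CONFIG`, `bucket`, and `is_on` are hypothetical names, and the explicit `key` argument (for example a user or request ID) is also an assumption.

```python
import hashlib

# Hypothetical server-side config: the percentage of traffic that takes
# the new code path. Setting "on" to 0 is the off-switch.
CONFIG = {"db_new": {"on": 1}}  # ramp up to 1% of traffic


def bucket(flag, key):
    """Map (flag, key) to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag}:{key}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_on(flag, key, config=CONFIG):
    """Return True if this key falls inside the flag's ramp-up percentage."""
    percent = config.get(flag, {}).get("on", 0)
    return bucket(flag, key) < percent
```

Because the bucket depends only on the flag name and the key, raising the percentage keeps everyone who was already in the ramp-up in it, and lowering it back to 0 turns the change off for everyone.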
  18. Ramp up incrementally
    This triggered out of memory errors! Why?
  19. Actually… A hidden recursive dependency on the ORM inside the Feature library!
  20. Ramp up incrementally
        func connect_to_db(db_xyz):
            time = now()
            conn = connect_old_way(xyz)
            metric(now() - time, db_xyz)
            return conn
  21. Ramp up incrementally
        func connect_to_db(db_xyz):
            time = now()
            if (host in rampup_list):
                conn = connect_new_way(xyz)
            else:
                conn = connect_old_way(xyz)
            metric(now() - time, db_xyz)
            return conn
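The host-based check on this slide avoids calling the feature-flag library from inside the connection path. A minimal Python sketch of that idea follows, assuming the ramp-up list is a plain constant. The host names and the `use_new_connection` helper are hypothetical.

```python
import socket

# Hypothetical static ramp-up list. It is deliberately a plain constant
# so that deciding how to connect does not depend on any library that
# might itself need a database connection.
RAMPUP_HOSTS = frozenset({"web-001", "web-002"})


def use_new_connection(host=None, rampup_hosts=RAMPUP_HOSTS):
    """Return True if this host should use the new connection path."""
    if host is None:
        host = socket.gethostname()
    return host in rampup_hosts
```

Ramping up means adding hosts to the list, and ramping down means removing them, so the off-switch stays in one easy-to-find place.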
  22. Confidence from knowing where the off-switch is.

  23. Releasing to 100% should be boring.

  24. Minimize user impact by making it easy to ramp down your change.
  25. -Allison Kaptur, in ‘Love your Bugs’
    “…if you’re writing a mobile or a desktop application: You need server-side feature gating and server-side flags. When you discover a problem and you don’t have server-side controls, the resolution might take days or weeks as you push out a new release or submit a new version to the app store. That’s a bad situation to be in.”
  26. Confidence from:
    • Ramping up my change incrementally
    • Knowing how to ramp down my change quickly
    • Knowing that my team has my back and that I’ve socialized how to turn off my change.
    • Understanding my baseline performance and watching metrics as my change is released.
  27. Case Study 2: Build confidence by understanding your changes. How?
    • Stare really hard at the code?
  28. Case Study 2: Build confidence by understanding your changes. How?
    • Stare really hard at the code?
    • Reduce risk by dogfooding
    • Have analytics available just for your dogfood group with the changes you made
    • Measure your yardsticks
  29. None
  30. Chart: one API endpoint (ms)

  31. Segment your metrics by your ramp-up key to catch performance regressions early!
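Segmenting by ramp-up key can be as simple as tagging every timing sample with the cohort that produced it, then comparing cohorts for the same endpoint. The following is a minimal in-memory sketch of that idea. The `SegmentedTimings` class, the cohort labels "old" and "new", and the endpoint name in the usage note are all hypothetical.

```python
from collections import defaultdict
from statistics import median


class SegmentedTimings:
    """Timing samples keyed by (endpoint, cohort)."""

    def __init__(self):
        self._samples = defaultdict(list)

    def record(self, endpoint, cohort, elapsed_ms):
        """Store one sample for the cohort that served this request."""
        self._samples[(endpoint, cohort)].append(elapsed_ms)

    def median_ms(self, endpoint, cohort):
        """Median latency for one cohort (raises if it has no samples)."""
        return median(self._samples.get((endpoint, cohort), []))

    def regression_ratio(self, endpoint, new="new", old="old"):
        """Ratio of new-cohort to old-cohort median latency."""
        return self.median_ms(endpoint, new) / self.median_ms(endpoint, old)
```

With a 1% ramp-up, a regression in the new cohort barely moves the overall median for an endpoint, but a ratio well above 1.0 from `regression_ratio("/api/example")` makes it visible right away.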
  32. But what’s going on? Measurements at the data access layer showed no performance regressions.
  33. None
  34. -Carlos Bueno, in Mature Optimization
    “Measurement software is just as likely to have bugs as anything else we write, and we should take more care than usual. A bug in user facing code results in a bad experience. A bug in measurement code results in bad decisions.”
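One way to take that extra care is to test the measurement code itself against a duration that is known in advance. The sketch below assumes a timing helper that accepts an injectable clock, so a fake clock can stand in for real time during tests. `timed` and `FakeClock` are hypothetical names, not something shown in the talk.

```python
import time


def timed(fn, clock=time.perf_counter):
    """Run fn() and return (result, elapsed_seconds) using the given clock."""
    start = clock()
    result = fn()
    return result, clock() - start


class FakeClock:
    """Deterministic clock for tests: time moves only when told to."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds
```

In a test, the function being timed calls `clock.advance(0.25)`, and the test then checks that `timed` reports exactly 0.25 seconds. If the yardstick disagrees with a duration you control, the bug is in the measurement and not in the code being measured.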

  36. Make big infrastructure changes with confidence
    • Instrument your code.
    • Understand your baseline.
    • Check your yardsticks.
    • Segment your metrics into populations that match your ramp-ups.
    • Ramp up incrementally.
    • Make it easy and well-understood how to config your changes off.
  37. Make big infrastructure changes with confidence.

  38. Maggie Zhou, The Lead Developer New York 2018
    Resources: bit.ly/zmagg_leaddev2018
    Find me on the Internet @zmagg
    ✨Thank you✨ to my teams at Slack and Etsy for all the lessons learned. Special thanks to Kiran Bhattaram, Kiné Camara, Julia Evans, Duretti Hirpa, Andrew Morrison, Bhaskar Mookerji, Tracy Stampfil, Meri Williams and Lydia Wagner for your early feedback and support on drafts of this talk.