An Engineer's Guide to a Good Night's Sleep

An updated version of my talk about practices to avoid that dreaded 3am call.

As organisations look to empower engineers and embrace DevOps practices, the support role has changed considerably too. Developers are moving from being purely third-line support to working more collaboratively with engineers and operational staff. And as we move to cloud-native microservice solutions, the increased complexity and diversity of our production landscape means operational staff may well rely more heavily on engineers, particularly out of hours.

I have spent the last 18 years working across a plethora of industries, using a myriad of technologies and approaches. Having worked on everything from trading applications to content enrichment APIs, I have seen many approaches and processes that try to minimise operational support for developers.

In this talk, I will explore and discuss some of my top approaches and techniques to help reduce the risk of that dreaded 3am call! You will gain some practical insight into how to handle failure in today's more complex distributed microservice systems. This will include looking at approaches to resiliency, understanding your system, understanding the requirements for fault tolerance, and the developer mindset necessary for this. I will be peppering this talk with real-world examples and the occasional war story along the way.

Nicky Wrightson

September 16, 2019

Transcript

  1. @nickywrightson

    2014: Consumers add a caching layer to protect against our outages
    2017: Our services were given an SLA of 15 mins recovery time
    2018: Migration to Kubernetes completed
    2019: Out-of-hours calls to 3rd line have all but disappeared
  2. “Only have alerts that you need to action”

    Sarah Wells - Director of Operations and Reliability at FT
  3. Use tracing to monitor your critical flows

    Ben Sigelman, "Restoring Confidence in Microservices: Tracing That's More Than Traces"
  4. “a method of experimenting on infrastructure that lets you expose weaknesses before they become a real problem.”
  5. Resources

    Testing Microservices, the sane way by Cindy Sridharan: https://medium.com/@copyconstruct/testing-microservices-the-sane-way-9bb31d158c16
    Microservices trade offs by Martin Fowler: https://martinfowler.com/articles/microservice-trade-offs.html
    Ben Sigelman @ QCon 2019: https://www.infoq.com/presentations/microservices-distributed-tracing?itm_source=infoq&itm_medium=QCon_EarlyAccessVideos&itm_campaign=QConLondon2019
    James Governor on progressive delivery: https://redmonk.com/jgovernor/2018/08/06/towards-progressive-delivery/
    Charity Majors on Friday freezes: https://charity.wtf/2019/05/01/friday-deploy-freezes-are-exactly-like-murdering-puppies/