Microservices can be a great way to work: the services are simple, you can use the right technology for the job, and deployments become smaller and less risky. Unfortunately, other things become more complex. You probably took some time to design a deployment pipeline and set up self-service provisioning, for example. But did the rest of your thinking about what “done” means catch up? Are you still setting up alerts, run books, and monitoring for each microservice as though it was a monolith?
Two years ago, a team at the FT started out building a microservices-based system from scratch. Their initial naive approach to monitoring meant that an underlying network issue could mean 20 people each receiving 10,000 alert emails overnight. With that volume, you can’t pick out the important stuff. In fact, your inbox is unusable unless you have everything filtered away where you’ll never see it. Furthermore, you have information radiators all over the place, but there’s always something flashing or the wrong colour. You can spend the whole day moving from one attention-grabbing screen to another.
That team now has over 150 microservices in production. So how they get themselves out of that mess and regain control of their inboxes and their time? First, you have to work out what’s important, and then you have to ruthlessly narrow down on that. You need to be able to see only the things you need to take action on in a way that tells you exactly what you need to do. Sarah shares how her team regained control and offers some tips and tricks.