Imagine you are responsible for a system: the system breaks, the user notices and informs you, but you have no clue what is going on. And guess what? The clock is ticking. Welcome to operations hell!
Two years ago we were in exactly that situation, and more than once. Let me tell you: it is as horrible as it sounds, and we never want to be there again. So we started our journey to escape this hell.
Of course, our system still breaks, but we have learned and improved a lot. Sometimes we can keep the user from experiencing any impact at all, or at least keep the mean time to recovery short.
Join me on this exciting journey from being called unexpectedly to preventing outages, and see what helped us escape and defeat all the demons we met along the way.