test add log statements until you can verify what causes the broken state. if the bug did work at some point, find the point at which it did work. write tests to represent the configuration and flow of the fixed state
not connect to server: Connection refused was bubbling up all over the place. Jobs won’t run, emails won’t send, every submit button on the site fatal errored.” on-call log WILDLY CHAOTIC
Julia Evans • Systems Performance, Brendan Gregg • “Why Do Computers Stop and What Can Be Done About It?”, Jim Gray • “Debug Patterns for Efficient High- levelSystemC Debugging”, Frank Rogin, Erhard Fehlauer, Christian Haufe, Sebastian Ohnewald