How does your system react when a key resource fails? Say the database becomes unavailable, or the message broker goes down. What if you get a sudden surge of load that you have to keep up with? What if a badly worded error message results in a billion-dollar fire?
Real-world engineering disciplines can teach us a thing or two about designing for resilience. Learn which techniques and patterns you can borrow from other areas of engineering, and how to apply them to your own systems.