monitoring is an absolute minimum
• ICMP ping is not good enough
• Need to at least check the status code
• Should really check for a content snippet
• You should outsource this
  – Pingdom
  – UptimeRobot
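A minimal sketch of what "check the status code" and "check for a content snippet" could look like, using only the Python standard library. The function names `evaluate_response` and `check_url` are hypothetical, and treating anything other than 200 as a failure is an assumption; a hosted service such as the ones named above would normally do this from outside your own network.

```python
import urllib.error
import urllib.request


def evaluate_response(status, body, snippet):
    """Decide whether a fetched page counts as 'up'.

    Assumption: only a 200 status with the expected snippet in the body
    is treated as healthy.
    """
    if status != 200:
        return False, f"unexpected status {status}"
    if snippet not in body:
        return False, "content snippet missing"
    return True, "ok"


def check_url(url, snippet, timeout=10.0):
    """Fetch url and apply the status and content checks."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return evaluate_response(resp.status, body, snippet)
    except urllib.error.HTTPError as exc:
        # urlopen raises HTTPError for 4xx and 5xx responses.
        return False, f"unexpected status {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # DNS failure, refused connection, timeout, and similar.
        return False, f"request failed: {exc}"
```

A call such as `check_url("https://example.com/", "Example Domain")` returns a `(bool, reason)` pair. The content check matters because a page can answer with 200 while rendering an error message, which a ping or a status-only check would miss.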
or above / total requests
• Uncaught exceptions: E_Fatal
• What’s acceptable?
  – 0% is not realistic
  – 1% is a good place to start
  – 0.5% is what we use
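The ratio on this slide can be sketched as a small calculation. The start of the slide title is cut off, so the numerator used here (responses with a status code of 500 or above) is an assumption, as are the function names `error_rate` and `exceeds_budget`. The default budget of 1% is the slide's suggested starting point.

```python
def error_rate(status_codes, threshold=500):
    """Fraction of requests whose status code is at or above threshold.

    Assumption: 'error' means status >= 500; the slide's exact
    definition is truncated.
    """
    codes = list(status_codes)
    if not codes:
        return 0.0  # no traffic, so nothing to report
    errors = sum(1 for code in codes if code >= threshold)
    return errors / len(codes)


def exceeds_budget(status_codes, budget=0.01):
    """True when the error rate is above the acceptable budget (1% by default)."""
    return error_rate(status_codes) > budget
```

For example, 3 server errors in 200 requests is a rate of 1.5%, which is over a 1% budget and well over the stricter 0.5% one.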
your error rate?
• How many times do you drop the exception?
• Would you even know if your password reset page was throwing a 500 error?
• Even the best testing can’t fix stupid users
this stuff
• Identify problems with stats
• Investigate problems with logs
• Revisit your data collection when you encounter anything serious
• Get tools to help you