Strong instrumentation of your services lets you quickly understand what broke during an outage. This presentation gives an introduction to StatsD and Graphite and shows how to instrument your services effectively.
orders is: PGRES_COMMAND_OK
I, [2013-07-09T19:22:40.996657 #27430] INFO -- : Import of: orders done in 156.915290356 seconds
I, [2013-07-09T19:22:40.997121 #27430] INFO -- : Aggregating data from orders into orders_hourly
I, [2013-07-09T19:22:41.000360 #27430] INFO -- : Deleting data in orders_hourly since 2013-07-08 07:00:00 UTC
I, [2013-07-09T19:23:08.361929 #27430] INFO -- : Done aggregating data from orders into orders_hourly took: 27.364678771 secs.
I, [2013-07-09T19:23:08.362411 #27430] INFO -- : Aggregating data from orders into orders_three_hourly
I, [2013-07-09T19:23:08.364060 #27430] INFO -- : Deleting data in orders_three_hourly since 2013-07-08 06:00:00 UTC
I, [2013-07-09T19:24:05.365206 #27430] INFO -- : Done aggregating data from orders into orders_three_hourly took: 57.002684493 secs.
I, [2013-07-09T19:24:05.365590 #27430] INFO -- : Aggregating data from orders into orders_daily
I, [2013-07-09T19:24:05.367022 #27430] INFO -- : Deleting data in orders_daily since 2013-07-08 00:00:00 UTC
I, [2013-07-09T19:24:27.067976 #27430] INFO -- : Done aggregating data from orders into orders_daily took: 21.70228348 secs.
I, [2013-07-09T19:24:27.068373 #27430] INFO -- : Aggregating data from orders into orders_daily_with_inactive
I, [2013-07-09T19:24:27.069709 #27430] INFO -- : Deleting data in orders_daily_with_inactive since 2013-07-08 00:00:00 UTC
I, [2013-07-09T19:24:48.167146 #27430] INFO -- : Done aggregating data from orders into orders_daily_with_inactive took: 21.098650945 secs.
I, [2013-07-09T19:24:48.167536 #27430] INFO -- : Aggregating data from orders into orders_weekly
I, [2013-07-09T19:24:48.169052 #27430] INFO -- : Deleting data in orders_weekly since 2013-07-06 00:00:00 UTC
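Durations like the ones in the log above are a natural fit for StatsD timers, which Graphite can then plot over time. The following is a minimal sketch, not the presenter's implementation: it assumes a StatsD daemon listening on UDP at 127.0.0.1:8125 (the usual default), and the helper names `format_timer` and `timed` as well as the metric name are hypothetical. It relies only on the StatsD line format `<name>:<value>|ms`.

```python
import socket
import time
from contextlib import contextmanager


def format_timer(name, millis):
    """Build a StatsD timer datagram of the form '<name>:<value>|ms'."""
    return f"{name}:{millis:.0f}|ms"


@contextmanager
def timed(name, host="127.0.0.1", port=8125):
    """Time the enclosed block and send the duration as a StatsD timer.

    host/port are assumptions; point them at your own StatsD daemon.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    start = time.monotonic()
    try:
        yield
    finally:
        millis = (time.monotonic() - start) * 1000.0
        try:
            # UDP is fire-and-forget: instrumentation should never
            # break the job it is measuring.
            sock.sendto(format_timer(name, millis).encode("ascii"), (host, port))
        except OSError:
            pass
        finally:
            sock.close()


if __name__ == "__main__":
    # Hypothetical metric name mirroring the "orders_hourly" step in the log.
    with timed("etl.orders.aggregate.hourly"):
        time.sleep(0.05)  # stand-in for the real aggregation work
```

Because the metric is sent over UDP, a missing or slow StatsD daemon does not slow down or fail the job being measured; the trade-off is that individual samples can be lost.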