(within your organization).
2. Yes, everyone!

• Self Service Alerting
1. What it is not:
  • Committing and pushing Nagios configs, and then restarting Nagios.
  • Needing a master's degree in Zabbix to set up a trigger, action, email template, etc.
2. What it is:
  • The ability for everyone to create a basic alert and get notified by it.

• Key Performance Indicator
1. A metric that is a proxy for the success of
  • a company
  • a particular department
  • a system
• WebOps = Web Site Performance / Availability
• ServerEng = CPU / Mem / Disk
• NetEng = Bandwidth / packets per sec
• OMS = Order Count
• Communications = Max concurrent calls
• Customer Service = Calls answered in under X seconds
• Finance = Time it takes to close an accounting period

Sometimes you’re lucky. Sometimes you’re not.
Everyone
• Why?
  • Alerting should be tuned by the people that care about it most.
  • They can adjust it to a reasonable threshold that they are happy with.
  • They should be able to see how it performs over time so they can improve it.
over the fence! We aren’t saying to start sending every alert straight to the beeper. The alert creator should also be the first to receive the alert and react to it. This helps with awareness and tuning.
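As a rough illustration of the idea above, a self-service alert can be as small as a metric, a threshold, and a subscriber list that always starts with the creator. This is a minimal sketch, not any particular tool's API: the `Alert` class, its fields, the metric name, and the email address are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    """A basic user-created alert: one metric, one threshold, a subscriber list.

    Hypothetical structure for illustration only.
    """
    name: str
    metric: str
    threshold: float
    creator: str
    subscribers: list = field(default_factory=list)

    def __post_init__(self):
        # The creator is always the first recipient, so they see the noise
        # themselves and have a reason to tune the threshold.
        if self.creator not in self.subscribers:
            self.subscribers.insert(0, self.creator)

    def check(self, value):
        """Return the recipients to notify, or an empty list if no breach."""
        return list(self.subscribers) if value > self.threshold else []


# Assumed example values: an order-backlog KPI owned by one user.
alert = Alert(name="order backlog", metric="oms.orders.pending",
              threshold=500, creator="alice@example.com")

print(alert.check(120))  # below threshold, nobody is notified
print(alert.check(750))  # above threshold, the creator is notified
```

Because the creator cannot be left off the subscriber list, nobody can create an alert that only wakes up someone else.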
user-created alerts, now what? The alert history can be reviewed for signal-to-noise ratio. After an alert has proven itself useful to a user or team, promote it! Promoting it might mean migrating the alert to your normal alerting system, or it could be as simple as expanding the subscription list of the current alert.
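One way to review alert history is sketched below. It assumes each firing is recorded with a flag saying whether anyone acted on it, and it uses the fraction of acted-on firings as a simple stand-in for signal-to-noise. The history records, function names, and the `min_ratio` / `min_firings` cutoffs are illustrative assumptions, not values from any real system.

```python
from collections import Counter

# Hypothetical history: (alert name, whether the recipient acted on it).
history = [
    ("disk full", True),
    ("disk full", True),
    ("disk full", False),
    ("cpu spike", False),
    ("cpu spike", False),
    ("cpu spike", True),
]


def signal_ratio(history):
    """Fraction of firings per alert that someone actually acted on."""
    fired = Counter(name for name, _ in history)
    acted = Counter(name for name, was_acted_on in history if was_acted_on)
    return {name: acted[name] / fired[name] for name in fired}


def promotion_candidates(history, min_ratio=0.6, min_firings=3):
    """Alerts that fired often enough and were mostly useful."""
    fired = Counter(name for name, _ in history)
    ratios = signal_ratio(history)
    return sorted(
        name for name, ratio in ratios.items()
        if ratio >= min_ratio and fired[name] >= min_firings
    )


print(promotion_candidates(history))  # ['disk full']
```

Alerts that fall well below the cutoff are candidates for retuning or deletion rather than promotion.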
star, Pro Tip: Set up alerts and dashboards for the number of alerts you have and the number of alerts that are being sent out. Someone has to watch the watchers.
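Watching the watchers can start with counting notifications per hour and flagging the hours that exceed a limit. The sketch below assumes you can export the send time of each notification; the function names, the sample timestamps, and the `max_per_hour` limit are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def alerts_sent_per_hour(sent_at):
    """Count sent notifications in each clock-hour bucket."""
    return Counter(
        ts.replace(minute=0, second=0, microsecond=0) for ts in sent_at
    )


def noisy_hours(sent_at, max_per_hour=5):
    """Hours in which more notifications went out than the allowed limit."""
    counts = alerts_sent_per_hour(sent_at)
    return sorted(hour for hour, n in counts.items() if n > max_per_hour)


# Assumed sample data: one notification every 10 minutes from 09:00 to 10:30.
start = datetime(2024, 1, 1, 9, 0)
sent_at = [start + timedelta(minutes=10 * i) for i in range(10)]

for hour in noisy_hours(sent_at):
    print(f"{hour:%Y-%m-%d %H:00} exceeded the alert volume limit")
```

The same per-hour counts can feed a dashboard, so a sudden jump in alert volume is visible even before it crosses the limit.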
+ Elasticsearch + Kibana = Logging done well. It's similar to StatsD + Graphite = Metrics done well. But it has a gap similar to the one the Metrics stack had early on.