Four Golden Signals 18 The four golden signals of monitoring are latency, traffic, errors, and saturation. If you can only measure four metrics of your user-facing system, focus on these four. http://landing.google.com/sre/book/chapters/monitoring-distributed-systems.html
Four Golden Signals 19 1. latency 2. traffic 3. errors 4. saturation ← How "full" your service is. (e.g., in a memory-constrained system, show memory; in an I/O-constrained system, show I/O)
There are alerts, which say a human must take action right now. Something that is happening or about to happen, that a human needs to take action immediately to improve the situation. The second category is tickets. A human needs to take action, but not immediately. You have maybe hours, typically, days, but some human action is required. The third category is logging. No one ever needs to look at this information, but it is available for diagnostic or forensic purposes. The expectation is that no one reads it. What is ‘Site Reliability Engineering’? https://landing.google.com/sre/interview/ben-treynor.html