develop by themselves
• No need to ask SREs
• We SREs provide the process
• Design Doc
• Production Readiness Check
• Delegated infrastructure management (Terraform)
• SLI/SLO
• Alerting
Service teams define their own SLIs/SLOs and review them weekly
https://sre-next.dev/schedule#c4
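As a rough illustration of what a weekly SLI/SLO review might compute, here is a minimal sketch. The class name `WeeklyWindow`, both function names, and the 99.9% target in the usage note are hypothetical and not taken from the talk; it assumes a simple request-based availability SLI.

```python
from dataclasses import dataclass


@dataclass
class WeeklyWindow:
    """Request counts for one review window (hypothetical structure)."""
    good_requests: int   # requests that met the success criterion
    total_requests: int  # all valid requests in the window


def availability_sli(window: WeeklyWindow) -> float:
    """Fraction of good requests; an empty window counts as fully available."""
    if window.total_requests == 0:
        return 1.0
    return window.good_requests / window.total_requests


def error_budget_remaining(window: WeeklyWindow, slo_target: float) -> float:
    """Share of the error budget still unspent (negative means overspent)."""
    allowed_bad = (1.0 - slo_target) * window.total_requests
    actual_bad = window.total_requests - window.good_requests
    if allowed_bad == 0:
        # No budget at all: any failure exhausts it.
        return 0.0 if actual_bad else 1.0
    return 1.0 - actual_bad / allowed_bad
```

For example, with an assumed SLO target of 0.999, a week with 1,000,000 requests of which 999,500 were good gives an SLI of 0.9995 and leaves about half of the error budget.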
cause SLO violations
• CPU usage is high
• OOM Killer is invoked
• Pods are unavailable
• Unicorn backlog is increasing
• Service teams check only SLO alerts
• It is better to have insights at hand when you receive an alert
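One way to picture "insights attached to an SLO alert" is the sketch below. It is only an illustration: the `Signals` fields, the `insights_for_alert` function, and the 80% CPU threshold are assumptions, not the tooling described in the talk. The four checks mirror the causes listed above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Signals:
    """Snapshot of cause-level signals at alert time (hypothetical fields)."""
    cpu_utilization: float        # fraction of the CPU limit in use, 0.0 to 1.0
    oom_kills: int                # OOM kills seen in the alert window
    unavailable_pods: int         # pods currently not ready
    unicorn_backlog_before: int   # queued requests at the start of the window
    unicorn_backlog_now: int      # queued requests now


def insights_for_alert(s: Signals, cpu_threshold: float = 0.8) -> List[str]:
    """Return human-readable hints to show alongside an SLO alert."""
    insights = []
    if s.cpu_utilization >= cpu_threshold:
        insights.append(f"CPU usage is high ({s.cpu_utilization:.0%})")
    if s.oom_kills > 0:
        insights.append(f"OOM Killer was invoked {s.oom_kills} time(s)")
    if s.unavailable_pods > 0:
        insights.append(f"{s.unavailable_pods} pod(s) unavailable")
    if s.unicorn_backlog_now > s.unicorn_backlog_before:
        insights.append(
            "Unicorn backlog is increasing "
            f"({s.unicorn_backlog_before} -> {s.unicorn_backlog_now})"
        )
    return insights
```

The service team still pages only on the SLO alert; the returned list is extra context to speed up diagnosis, not a set of additional alerts.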