a. Netflix: 190 Countries and 5 CORE SREs
b. SREcon16 - Performance Checklists for SREs
c. SREcon16 - The Realities of the Job of Delivering Reliability
d. SouthBay SRE: Cloud Capacity Planning - August 9th 2016
e. Site Reliability Engineering at Dropbox
great geek. -- Rick Hwang

Be warned that being an expert is more than understanding how a system is supposed to work. Expertise is gained by investigating why a system doesn’t work. -- Brian Redman

A system written by a god would have no landmines.
[Diagram: Web App ASG. Components: Web Servers, ELB (Internal ELB), App Servers, Third Party Services, Health-Checker, Service A, Service B. Check labels: Light Health Check, Layer Health Check, Deep Health Check.]
Check - Application Self
• Layer Health Check - App to App
• Deep Health Check: Service Self
• Service Health Check: Service to Services

Ref: 淺談系統監控與 AWS CloudWatch 的應用 (A Brief Talk on System Monitoring and the Use of AWS CloudWatch)
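
The health check levels listed above could be sketched roughly as follows. This is a minimal illustration, not the speaker's implementation: `LEVEL_SCOPE`, `HealthReport`, `run_check`, and the probe names are all hypothetical, and the `light` key is an assumption taken from the diagram labels, since the first entry on the slide is cut off.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Scope of each level as listed on the slide. The "light" key is an
# assumption: the slide text for the first entry is truncated.
LEVEL_SCOPE: Dict[str, str] = {
    "light": "application self",
    "layer": "app to app",
    "deep": "service self",
    "service": "service to services",
}

Probe = Callable[[], bool]


@dataclass
class HealthReport:
    level: str
    scope: str
    healthy: bool
    results: Dict[str, bool]


def run_check(level: str, probes: Dict[str, Probe]) -> HealthReport:
    """Run every probe for one level; a probe that raises counts as failed."""
    if level not in LEVEL_SCOPE:
        raise ValueError(f"unknown health check level: {level}")
    results: Dict[str, bool] = {}
    for name, probe in probes.items():
        try:
            results[name] = bool(probe())
        except Exception:
            # Treat any error (timeout, refused connection, ...) as unhealthy.
            results[name] = False
    return HealthReport(level, LEVEL_SCOPE[level], all(results.values()), results)


def failing_probe() -> bool:
    # Hypothetical stand-in for an app server that cannot be reached.
    raise ConnectionError("app server unreachable")


# Light check: the application only inspects itself.
light = run_check("light", {"process_up": lambda: True})
# Layer check: the web tier probes the app tier behind the internal ELB.
layer = run_check("layer", {"app_server": failing_probe})
```

Keeping the levels separate lets a load balancer poll the cheap light check often, while the broader checks, which touch other tiers or services, can run less frequently.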