Steps toward self-service operations in eureka

fukubaka0825

May 14, 2022

Transcript

  1. Steps toward self-service operations in eureka
     SRE NEXT 2022, 2022/05/14
     © 2021 eureka, Inc. All Rights Reserved. CONFIDENTIAL INFORMATION: Not for Public Distribution - Do Not Copy. All Hands Meeting
  2. Who am I
     wapper/nari
     • Site Reliability Engineer at eureka, inc.
     • Favorites: VR/Hip Hop/Skateboard/Sauna
     • Twitter
       ◦ Real: @fukubaka0825
       ◦ VR: @wapper0825
  3. Eureka’s current situation
  4. Products: 2, Regions: 3, Developers: 50+
  5. Our SRE Team Practice Overview: Old (through 2020)
  6. Our SRE Team Practice Overview: New
  7. Today’s topic scope: “Self-Service” Operation Design
  8. Conclusion
     • Good “Self-Service” Operations are
       ◦ Low Cognitive Load
       ◦ Low Operational Load for “Users”
       ◦ Secure and Auditable
  9. What/Why/How “Self-Service” Operations
  10. What are “Self-Service” Operations?
  11. Why “Self-Service” Operations?
  12. How to build “Self-Service” Operations
      Cognitive Load⬇ Operational Load⬇ Secure⬆ Auditable⬆
  13. 3 “Self-Service” Operations Examples in eureka
      1. Infrastructure as Code (Terraform) Operation
      2. Batch Container Operation
      3. Incident Response Operation
  14. 3 “Self-Service” Operations Examples in eureka
      1. Infrastructure as Code (Terraform) Operation 👈
      2. Batch Container Operation
      3. Incident Response Operation
  15. Overview
      • Provide an IaC platform, built on Terraform, that lets developers develop and operate infrastructure through the Software Development Life Cycle (SDLC)
  16. Policy as Code with Conftest/Rego
      • Introduce Policy as Code to automatically review semantic problems that existing static analysis tools cannot catch, without relying on review by specific people
      Operational Load⬇
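
The kind of semantic check described on slide 16 can be sketched as follows. Conftest policies are written in Rego; this Python version is only an illustration of the idea, applied to the JSON form of a Terraform plan (the `resource_changes` entries produced by `terraform show -json`). The rule itself (every created or updated resource must carry an `Owner` tag) and the function name are hypothetical and do not come from the deck.

```python
from typing import Any


def find_missing_tag_violations(plan: dict[str, Any], required_tag: str = "Owner") -> list[str]:
    """Return one message per created/updated resource that lacks the required tag.

    `plan` is assumed to be the parsed output of `terraform show -json <planfile>`.
    The tag rule is a hypothetical policy used only for illustration.
    """
    violations = []
    for resource in plan.get("resource_changes", []):
        change = resource.get("change", {})
        actions = change.get("actions", [])
        # Only resources that will exist after apply are checked.
        if "create" not in actions and "update" not in actions:
            continue
        after = change.get("after") or {}
        tags = after.get("tags") or {}
        if required_tag not in tags:
            violations.append(f"{resource['address']}: missing required tag '{required_tag}'")
    return violations
```

A CI step could fail the build whenever the returned list is non-empty, which is roughly the role a deny rule plays in a Rego policy.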
  17. User-friendly CI Notification
      • Notify users of the results of the Terraform and conftest commands run in CI, in a form that makes it easy to understand what to change and how to change it
      • https://github.com/suzuki-shunsuke/tfcmt
      • https://github.com/suzuki-shunsuke/github-comment
      Cognitive Load⬇
  18. Terraform/AWS Workshop for Developers
      • Held workshops to raise developers’ knowledge of Terraform and cloud infrastructure
      Cognitive Load⬇
  19. 3 “Self-Service” Operations Examples in eureka
      1. Infrastructure as Code (Terraform) Operation
      2. Batch Container Operation 👈
      3. Incident Response Operation
  20. Overview
      • Provide a batch container platform for developers with AWS Fargate + Amazon EventBridge + AWS Lambda
        ◦ to manage batch schedules and compute resources through the SDLC by adding simple parameters in Terraform
        ◦ to execute ad hoc batch tasks using GitHub Actions
  21. ECS Fargate worker task auto scaler with AWS Lambda
      • Autoscaling based on the number of currently running Fargate tasks and the SQS queue depth
        ◦ Determine the number of tasks to run from the difference between the “Backlog” (visible message count) and the “Appropriate-Backlog” (currently running tasks × the capacity specified per task)
      • Eliminates the need for detailed capacity planning
      Operational Load⬇
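
The scaling rule on slide 21 can be sketched as a small pure function. This is a minimal sketch under stated assumptions, not the Lambda function used at eureka: the function and parameter names, the rounding up to whole tasks, and the `min_tasks`/`max_tasks` bounds are all hypothetical. Reading the queue depth from SQS and applying the result to the ECS service are left out.

```python
def desired_task_count(
    visible_messages: int,
    running_tasks: int,
    capacity_per_task: int,
    min_tasks: int = 0,
    max_tasks: int = 10,
) -> int:
    """Return how many worker tasks should run for the current backlog.

    Backlog             = visible_messages
    Appropriate-Backlog = running_tasks * capacity_per_task
    The difference between the two is converted into whole tasks to add
    (positive) or remove (negative). The min/max bounds are an assumption.
    """
    if capacity_per_task <= 0:
        raise ValueError("capacity_per_task must be positive")
    appropriate_backlog = running_tasks * capacity_per_task
    difference = visible_messages - appropriate_backlog
    # Integer ceiling division, so a partial task's worth of backlog still gets a task.
    task_delta = -(-difference // capacity_per_task)
    return max(min_tasks, min(max_tasks, running_tasks + task_delta))
```

For example, with 2 running tasks that can each handle 10 messages and 100 visible messages, the difference is 80 messages, so the sketch asks for 8 more tasks (10 in total, subject to `max_tasks`).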
  22. Terraform module with few required parameters
      • Developers can deploy a resource simply by calling the module with a minimal list of variables
      • Developers can override CPU/Memory/Task Count and other parameters as needed
      Cognitive Load⬇
  23. Adhoc batch task runner with GitHub Actions Workflow Dispatch
      • In the first step of the job, validate whether the user is allowed to run the task, using the GitHub User ID (Team ID)
      • Easily track the history of who did what
      Secure⬆ Auditable⬆
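
The first-step validation on slide 23 could look roughly like the sketch below. It assumes the job reads the triggering user from the `GITHUB_ACTOR` environment variable that GitHub Actions sets by default. The team names, the membership table, and the function names are hypothetical; a real job would look up team membership from GitHub rather than from a hard-coded dictionary.

```python
from collections.abc import Mapping

# Hypothetical data for illustration only.
ALLOWED_TEAMS = {"batch-operators"}
TEAM_MEMBERS = {
    "batch-operators": {"alice", "bob"},
    "frontend": {"carol"},
}


def is_authorized(actor: str, team_members: Mapping[str, set[str]], allowed_teams: set[str]) -> bool:
    """Return True if the actor belongs to at least one allowed team."""
    return any(actor in team_members.get(team, set()) for team in allowed_teams)


def check_actor(env: Mapping[str, str]) -> bool:
    """Validate the user who triggered the workflow, taken from GITHUB_ACTOR."""
    actor = env.get("GITHUB_ACTOR", "")
    return bool(actor) and is_authorized(actor, TEAM_MEMBERS, ALLOWED_TEAMS)
```

Calling `check_actor(os.environ)` in the first step and exiting non-zero on `False` stops the job before the batch task starts, and the workflow run history keeps a record of who triggered each run.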
  24. 3 “Self-Service” Operations Examples in eureka
      1. Infrastructure as Code (Terraform) Operation
      2. Batch Container Operation
      3. Incident Response Operation 👈
  25. Overview
      • Provide an Incident Response platform with a ChatOps interface to reduce the burden of responding to incidents, shorten MTTR as much as possible, and complete the Postmortem process
  26. ChatOps to issue Incident ticket/channel
      • Integrate with Slack, which everyone is familiar with, so that incidents can be reported with commands and steps that are as simple as possible
      Cognitive Load⬇
  27. Add Incident Response flow to General On-boarding Process
      • Having a bot introduce the incident response flow as part of the onboarding process saves labor and keeps everyone continuously aware of the flow
      Cognitive Load⬇
  28. Postmortem Template
      • Postmortems can be created from a template with one click of a button in Confluence
      Operational Load⬇
  29. Future Prospects
      (Quoted from O’Reilly | Seeking SRE, Chapter 4)
      • Introduce the “Timeline Model” to automate more of the incident response flow
      • Measure the time between “Response”, “Mitigate”, and “Repair”, and analyze it to shorten MTTR
      Operational Load⬇
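
The measurement proposed on slide 29 can be sketched as follows, assuming each incident records one timestamp for each of the three phases named on the slide. The function names and dictionary keys are hypothetical, and the sketch does not reproduce the Timeline Model from Seeking SRE; it only shows the interval arithmetic.

```python
from datetime import datetime, timedelta


def phase_durations(response: datetime, mitigate: datetime, repair: datetime) -> dict[str, timedelta]:
    """Return the time spent between the Response, Mitigate, and Repair points of one incident."""
    if not response <= mitigate <= repair:
        raise ValueError("timestamps must satisfy response <= mitigate <= repair")
    return {
        "response_to_mitigate": mitigate - response,
        "mitigate_to_repair": repair - mitigate,
        "response_to_repair": repair - response,
    }


def mean_duration(durations: list[timedelta]) -> timedelta:
    """Average a list of durations, e.g. one interval collected across many incidents."""
    if not durations:
        raise ValueError("durations must not be empty")
    return sum(durations, timedelta()) / len(durations)
```

Averaging each interval separately across incidents shows which phase contributes most to the overall time to recovery, which is where automation effort would pay off first.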
  30. Conclusion
      • Good “Self-Service” Operations are
        ◦ Low Cognitive Load
        ◦ Low Operational Load for “Users”
        ◦ Secure and Auditable
  31. Reference
      • Self-Service Operations
      • Changes to the Terraform Component Delivery Process at eureka over the past year: facing the Productivity, Security, and Privacy of a rapidly growing Product platform
      • Introducing Conftest and setting up CI with GitHub Actions to automate Terraform reviews
      • Scaling based on Amazon SQS
      • Self-Service, Siloing and Organizational Structure
      • Management to achieve SRE
      • Seeking SRE
      • Achieving security measures in which humans and machines work as one, using a Slack Bot that supports incident response through automation