Slide 1

Slide 1 text

© 2021 eureka, Inc. All Rights Reserved. CONFIDENTIAL INFORMATION: Not for Public Distribution - Do Not Copy All Hands Meeting
Steps toward self-service operations in eureka
SRE NEXT 2022, 2022/05/14

Slide 2

Slide 2 text

Who am I
wapper/nari
● Site Reliability Engineer at eureka, Inc.
● Favorites: VR/Hip Hop/Skateboarding/Sauna
● Twitter
  ○ Real: @fukubaka0825
  ○ VR: @wapper0825

Slide 3

Slide 3 text

Eureka’s current situation

Slide 4

Slide 4 text

● Products: 2
● Regions: 3
● Developers: 50+

Slide 5

Slide 5 text

Our SRE Team Practice Overview: Old (until 2020)

Slide 6

Slide 6 text

Our SRE Team Practice Overview: New

Slide 7

Slide 7 text

Today’s topic scope
“Self-Service” Operation Design

Slide 8

Slide 8 text

Conclusion
● Good “Self-Service” Operations are
  ○ Low Cognitive Load
  ○ Low Operational Load for “Users”
  ○ Secure and Auditable

Slide 9

Slide 9 text

What/Why/How “Self-Service” Operations

Slide 10

Slide 10 text

What are “Self-Service” Operations?

Slide 11

Slide 11 text

Why “Self-Service” Operations?

Slide 12

Slide 12 text

How to build “Self-Service” Operations
Cognitive Load⬇ Operational Load⬇ Secure⬆ Auditable⬆

Slide 13

Slide 13 text

3 “Self-Service” Operations Examples in eureka
1. Infrastructure as Code (Terraform) Operation
2. Batch Container Operation
3. Incident Response Operation

Slide 14

Slide 14 text

3 “Self-Service” Operations Examples in eureka
1. Infrastructure as Code (Terraform) Operation 👈
2. Batch Container Operation
3. Incident Response Operation

Slide 15

Slide 15 text

Overview
● Provide an IaC platform (built on Terraform) that lets developers develop and operate infrastructure following the Software Development Life Cycle (SDLC)

Slide 16

Slide 16 text

Policy as Code with Conftest/Rego
● Introducing Policy as Code automates the review of semantic problems that existing static analysis tools cannot catch, without relying on manual human review
Operational Load⬇
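The kind of semantic check described above can be sketched as follows. Real Conftest policies are written in Rego, not Python; this is only an illustration of the idea, run against the `resource_changes` list of a Terraform plan exported as JSON. The tagging rule, the `REQUIRED_TAGS` set, and the `deny` function name are assumptions for illustration and are not taken from the talk.

```python
from typing import Any

# Hypothetical policy: every taggable resource must carry these tags.
REQUIRED_TAGS = {"Owner", "Service"}


def deny(plan: dict[str, Any]) -> list[str]:
    """Return one message per planned resource that violates the tag policy."""
    messages = []
    for rc in plan.get("resource_changes", []):
        # "after" is null for resources that are being destroyed.
        after = (rc.get("change") or {}).get("after") or {}
        if "tags" not in after:
            continue  # this sketch skips resources without a tags attribute
        missing = REQUIRED_TAGS - set(after.get("tags") or {})
        if missing:
            messages.append(
                f"{rc['address']}: missing required tags {sorted(missing)}"
            )
    return messages
```

In CI, a non-empty result would fail the job, so the rule is enforced on every pull request instead of depending on a reviewer noticing it.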

Slide 17

Slide 17 text

User-friendly CI Notification
● Report the results of the Terraform and Conftest commands run in CI in a form that makes it easy for users to see what to change and how to change it
● https://github.com/suzuki-shunsuke/tfcmt
● https://github.com/suzuki-shunsuke/github-comment
Cognitive Load⬇

Slide 18

Slide 18 text

Terraform/AWS Workshop for Developers
● Held workshops to raise developers’ knowledge of Terraform and cloud infrastructure
Cognitive Load⬇

Slide 19

Slide 19 text

3 “Self-Service” Operations Examples in eureka
1. Infrastructure as Code (Terraform) Operation
2. Batch Container Operation 👈
3. Incident Response Operation

Slide 20

Slide 20 text

Overview
● Provide a batch container platform for developers, built on AWS Fargate + Amazon EventBridge + AWS Lambda, that lets them
  ○ manage batch schedules and compute resources through the SDLC by adding a few simple parameters in Terraform
  ○ run ad hoc batch tasks from GitHub Actions

Slide 21

Slide 21 text

ECS Fargate worker task auto scaler with AWS Lambda
● Autoscaling based on the number of running Fargate tasks and the SQS queue depth
  ○ Determine the number of tasks to run from the difference between the “Backlog (VisibleMsg Count)” and the “Appropriate-Backlog (currently running tasks × specified capacity per task)”
● Eliminates the need for detailed capacity planning
Operational Load⬇
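The backlog comparison above can be sketched as a small pure function. This is a minimal sketch, not the actual Lambda: the function name, the rounding rules (round up when scaling out, whole tasks only when scaling in), and the `min_tasks`/`max_tasks` bounds are assumptions added for illustration.

```python
import math


def desired_task_count(
    visible_messages: int,
    running_tasks: int,
    capacity_per_task: int,
    min_tasks: int = 0,   # assumed lower bound
    max_tasks: int = 50,  # assumed upper bound
) -> int:
    """Return the task count that brings the backlog back to the appropriate level."""
    if capacity_per_task <= 0:
        raise ValueError("capacity_per_task must be positive")

    # Appropriate-Backlog: what the currently running tasks can absorb.
    appropriate_backlog = running_tasks * capacity_per_task
    diff = visible_messages - appropriate_backlog

    if diff > 0:
        # Backlog exceeds capacity: add enough tasks to cover the excess.
        delta = math.ceil(diff / capacity_per_task)
    else:
        # Spare capacity: remove only tasks that are entirely unneeded.
        delta = -((-diff) // capacity_per_task)

    return max(min_tasks, min(max_tasks, running_tasks + delta))
```

For example, with 250 visible messages, 2 running tasks, and a capacity of 100 messages per task, the appropriate backlog is 200, the excess is 50, and the function asks for one more task.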

Slide 22

Slide 22 text

Terraform module with few required parameters
● Developers can deploy a resource by calling the module with a minimal set of required variables
● Developers can override CPU/Memory/Task Count and other parameters as needed
Cognitive Load⬇
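The "few required parameters, sensible defaults for the rest" interface can be illustrated outside Terraform as well. The sketch below models it as a Python dataclass; the real module is written in Terraform, and every field name and default value here (`name`, `schedule_expression`, `command`, `cpu`, `memory`, `task_count`) is a hypothetical stand-in rather than the module's actual variables.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BatchJob:
    # Required: the minimal set a developer has to provide (hypothetical names).
    name: str
    schedule_expression: str
    command: tuple[str, ...]
    # Optional: platform defaults that can be overridden when needed.
    cpu: int = 256
    memory: int = 512
    task_count: int = 1


# Most jobs only set the required fields.
nightly = BatchJob(
    name="nightly-report",
    schedule_expression="cron(0 18 * * ? *)",
    command=("report", "--daily"),
)

# A heavier job overrides only the parameters it cares about.
heavy = replace(nightly, name="heavy-report", cpu=1024, memory=4096)
```

Keeping the required list short means a developer can add a job without understanding every underlying resource, while the overrides remain available for the cases that need them.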

Slide 23

Slide 23 text

Adhoc batch task runner with GitHub Actions Workflow Dispatch
● The first step of the job checks the GitHub User ID (Team ID) to validate that the user is allowed to run the task
● The history of who did what is easy to track
Secure⬆ Auditable⬆
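A first-step authorization check of this kind might look like the sketch below. It assumes the check is a script that reads `GITHUB_ACTOR` (the login of the user who triggered the workflow, which GitHub Actions sets by default); the `TASK_NAME` variable, the allow-list contents, and the function names are hypothetical. The talk mentions team IDs, and a real implementation would likely look up team membership instead of using a static list.

```python
from typing import Mapping

# Hypothetical allow-list: task name -> GitHub logins permitted to run it.
ALLOWED_ACTORS: dict[str, set[str]] = {
    "reindex-users": {"alice", "bob"},
}


def is_authorized(actor: str, task: str,
                  allowed: Mapping[str, set[str]] = ALLOWED_ACTORS) -> bool:
    """Return True only if the actor is explicitly listed for the task."""
    return actor in allowed.get(task, set())


def first_step(env: Mapping[str, str]) -> str:
    """Fail fast before any batch work starts; return an audit line on success."""
    actor = env.get("GITHUB_ACTOR", "")
    task = env.get("TASK_NAME", "")  # assumed to come from a workflow input
    if not is_authorized(actor, task):
        raise PermissionError(f"{actor!r} is not allowed to run {task!r}")
    return f"actor={actor} task={task}"
```

In a workflow, the step would call `first_step(os.environ)` and let the exception fail the job. Because every run is tied to a GitHub login, the workflow run history doubles as the audit trail.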

Slide 24

Slide 24 text

3 “Self-Service” Operations Examples in eureka
1. Infrastructure as Code (Terraform) Operation
2. Batch Container Operation
3. Incident Response Operation 👈

Slide 25

Slide 25 text

Overview
● Provide an Incident Response platform with a ChatOps interface to reduce the burden of responding to incidents, shorten MTTR as much as possible, and see the postmortem process through to completion

Slide 26

Slide 26 text

ChatOps to open an incident ticket/channel
● Integrate with Slack, which everyone already uses, so that incidents can be reported with the simplest possible commands and steps
Cognitive Load⬇

Slide 27

Slide 27 text

Add Incident Response flow to General Onboarding Process
● A bot introduces the incident response flow as part of the onboarding process, which saves labor and keeps everyone aware of the flow on an ongoing basis
Cognitive Load⬇

Slide 28

Slide 28 text

Postmortem Template
● Postmortems can be created from a template with one click in Confluence
Operational Load⬇

Slide 29

Slide 29 text

Future Prospects
(Quoted from O’Reilly, Seeking SRE, Chapter 4)
● Introduce the “Timeline Model” to further automate the incident response flow
● Measure the time between “Response”, “Mitigate”, and “Repair”, and analyze it to shorten MTTR
Operational Load⬇
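The measurement in the second bullet can be sketched as a difference between consecutive timestamps. This is a minimal sketch that assumes each incident records one timestamp per stage; the stage keys and the `stage_durations` function are illustrative names, not part of an existing tool.

```python
from datetime import datetime, timedelta

# Stage names follow the slide; the lowercase keys are an assumption.
STAGES = ("response", "mitigate", "repair")


def stage_durations(timeline: dict[str, datetime]) -> dict[str, timedelta]:
    """Return the elapsed time between each pair of consecutive stages."""
    durations = {}
    for start, end in zip(STAGES, STAGES[1:]):
        durations[f"{start}->{end}"] = timeline[end] - timeline[start]
    return durations


incident = {
    "response": datetime(2022, 5, 14, 10, 0),
    "mitigate": datetime(2022, 5, 14, 10, 25),
    "repair": datetime(2022, 5, 14, 12, 0),
}
durations = stage_durations(incident)
```

Collecting these durations across incidents shows which stage dominates the total, which is where further automation is most likely to shorten MTTR.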

Slide 30

Slide 30 text

Conclusion
● Good “Self-Service” Operations are
  ○ Low Cognitive Load
  ○ Low Operational Load for “Users”
  ○ Secure and Auditable

Slide 31

Slide 31 text

Reference
● Self-Service Operations
● Changes in eureka's Terraform Component Delivery Process over the past year: facing Productivity, Security, and Privacy on a rapidly growing Product platform
● Introducing Conftest and setting up CI with GitHub Actions to automate Terraform reviews
● Scaling based on Amazon SQS
● Self-Service, Siloing and Organizational Structure
● Management to achieve SRE
● Seeking SRE
● Achieving human-machine-unified security measures with a Slack Bot that supports incident response through automation
