Steps toward self-service operations in eureka

fukubaka0825

May 14, 2022

Transcript

  1. 1
    © 2021 eureka, Inc. All Rights Reserved.
    CONFIDENTIAL INFORMATION: Not for Public Distribution - Do Not Copy
    All Hands Meeting
    Steps toward self-service
    operations in eureka
    SRE NEXT 2022

    2022/05/14

  2. 2
    Who am I
    wapper/nari
    ● Site Reliability Engineer at eureka, Inc.
    ● Favorites: VR/Hip Hop/Skateboarding/Sauna
    ● Twitter
    ○ Real: @fukubaka0825
    ○ VR: @wapper0825

  3. 3
    Eureka’s current situation

  4. 4
    Products: 2
    Regions: 3
    Developers: 50+

  5. 5
    Overview of our SRE team practices: old (until 2020)

  6. 6
    Overview of our SRE team practices: new

  7. 7
    Today’s topic scope
    “Self-Service” Operation Design

  8. 8
    Conclusion
    ● Good “Self-Service” Operations are
    ○ Low Cognitive Load
    ○ Low Operational Load for “Users”
    ○ Secure and Auditable

  9. 9
    What/Why/How “Self-Service” Operations

  10. 10
    What are “Self-Service” Operations?

  11. 11
    Why “Self-Service” Operations?

  12. 12
    How to build “Self-Service” Operations
    Cognitive Load⬇
    Operational Load⬇
    Secure⬆
    Auditable

  13. 13
    3 “Self-Service” Operations Examples in eureka
    1. Infrastructure as Code (Terraform) Operation
    2. Batch Container Operation
    3. Incident Response Operation

  14. 14
    3 “Self-Service” Operations Examples in eureka
    1. Infrastructure as Code (Terraform) Operation 👈
    2. Batch Container Operation
    3. Incident Response Operation

  15. 15
    Overview
    ● Provide an IaC platform (built on Terraform) that lets developers develop and
    operate infrastructure following the Software Development Life Cycle

  16. 16
    Policy as Code with Conftest/Rego
    ● Introduce Policy as Code to automatically review semantic problems that existing
    static analysis tools cannot catch, without relying on review by specific people
    Operational Load⬇
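To illustrate the kind of check that Policy as Code can automate, here is a minimal sketch written in Python rather than Rego. It assumes the input is the JSON form of a Terraform plan (as produced by `terraform show -json`). The rule itself (no SSH ingress from 0.0.0.0/0) and the function name `deny_open_ssh` are hypothetical choices for this example, not policies taken from the talk.

```python
from typing import Any

OPEN_CIDR = "0.0.0.0/0"
SSH_PORT = 22


def deny_open_ssh(plan: dict[str, Any]) -> list[str]:
    """Return one message per ingress rule that exposes SSH to the internet.

    Hypothetical policy for illustration only.
    """
    violations = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") != "aws_security_group":
            continue
        # "after" is null for resources that are being destroyed.
        after = (rc.get("change") or {}).get("after") or {}
        for rule in after.get("ingress") or []:
            from_port = rule.get("from_port") or 0
            to_port = rule.get("to_port") or 0
            covers_ssh = from_port <= SSH_PORT <= to_port
            if covers_ssh and OPEN_CIDR in (rule.get("cidr_blocks") or []):
                violations.append(f"{rc['address']}: SSH is open to {OPEN_CIDR}")
    return violations
```

A CI job could fail the build whenever the returned list is non-empty, which is roughly what a `deny` rule does in Conftest.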

  17. 17
    User-friendly CI Notification
    ● Post the results of the Terraform and Conftest commands run in CI as notifications
    that make it easy for users to see what to change and how to change it
    ● https://github.com/suzuki-shunsuke/tfcmt
    ● https://github.com/suzuki-shunsuke/github-comment
    Cognitive Load⬇
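The idea of turning raw CI output into a readable notification can be sketched as follows. This is not how tfcmt or github-comment are implemented. It is a small Python illustration that assumes a Terraform plan in JSON form (`terraform show -json`) and uses a comment layout invented for this example.

```python
from collections import Counter
from typing import Any


def format_plan_comment(plan: dict[str, Any]) -> str:
    """Build a short Markdown summary of a Terraform plan (hypothetical layout)."""
    counts = Counter()
    details = []
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        if actions in (["no-op"], ["read"]):
            continue  # nothing the reader needs to act on
        # A delete+create pair means the resource is being replaced.
        label = "replace" if set(actions) == {"create", "delete"} else actions[0]
        counts[label] += 1
        details.append(f"- `{rc['address']}` will be **{label}d**")
    if not details:
        return "No changes. Infrastructure matches the configuration."
    header = (
        f"Plan: {counts['create']} to create, {counts['update']} to update, "
        f"{counts['replace']} to replace, {counts['delete']} to delete"
    )
    return "\n".join([header, "", *details])
```

The returned string could then be posted as a pull request comment by whatever mechanism the CI system provides.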

  18. 18
    Terraform/AWS Workshop for Developers
    ● Held workshops to raise developers' knowledge of Terraform and cloud
    infrastructure
    Cognitive Load⬇

  19. 19
    3 “Self-Service” Operations Examples in eureka
    1. Infrastructure as Code (Terraform) Operation
    2. Batch Container Operation 👈
    3. Incident Response Operation

  20. 20
    Overview
    ● Provide a batch container platform for developers, built on AWS Fargate, Amazon
    EventBridge, and AWS Lambda, that lets them
    ○ manage batch schedules and compute resources through the SDLC by
    adding a few simple parameters in Terraform
    ○ run ad hoc batch tasks with GitHub Actions

  21. 21
    ECS Fargate worker task autoscaler with AWS Lambda
    ● Autoscaling based on the number of running Fargate tasks and the SQS queue depth
    ○ Determine the number of tasks to run from the difference between
    the “Backlog” (visible message count) and the “Appropriate Backlog” (currently
    running tasks × specified capacity per task)
    ● Eliminates the need for detailed capacity planning
    Operational Load⬇
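The backlog comparison described on this slide can be sketched in a few lines of Python. This is an illustration under stated assumptions, not eureka's Lambda code: the function name, the rounding rule, and the `max_tasks` cap are all hypothetical, and the inputs would in practice come from the SQS and ECS APIs.

```python
import math


def tasks_to_start(visible_messages: int, running_tasks: int,
                   capacity_per_task: int, max_tasks: int = 10) -> int:
    """Return how many extra worker tasks to launch for the current backlog."""
    if capacity_per_task <= 0:
        raise ValueError("capacity_per_task must be positive")
    # Backlog that the tasks already running are expected to absorb.
    appropriate_backlog = running_tasks * capacity_per_task
    excess = visible_messages - appropriate_backlog
    if excess <= 0:
        return 0  # current tasks can keep up
    wanted = math.ceil(excess / capacity_per_task)
    # Hypothetical safety cap on the total number of tasks.
    headroom = max(0, max_tasks - running_tasks)
    return min(wanted, headroom)
```

For example, with 120 visible messages, 2 running tasks, and a capacity of 25 messages per task, the appropriate backlog is 50, the excess is 70, and the function asks for 3 more tasks.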

  22. 22
    Terraform module with few required parameters
    ● Developers can deploy a resource just by calling the module with a minimal set of
    variables
    ● Developers can override CPU, memory, task count, and other parameters as
    needed
    Cognitive Load⬇

  23. 23
    Ad hoc batch task runner with GitHub Actions Workflow Dispatch
    ● In the first step of the job, check the GitHub user ID (or team ID) to validate
    that the user is allowed to run the task
    ● Easily track the history of who did what
    Secure⬆
    Auditable
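A first-step permission check of this kind might look like the sketch below. The allow list, function names, and record fields are hypothetical. The only assumption about GitHub Actions is that the triggering user's login is available to the job (for example through the `GITHUB_ACTOR` environment variable).

```python
from datetime import datetime, timezone


def authorize(actor: str, allowed_actors: set[str]) -> bool:
    """Return True if the GitHub login is on the allow list (case-insensitive)."""
    return actor.lower() in {a.lower() for a in allowed_actors}


def audit_entry(actor: str, task: str, allowed: bool, at: datetime) -> dict:
    """Build a record of who tried to run which task, and whether it was allowed."""
    return {
        "at": at.isoformat(),
        "actor": actor,
        "task": task,
        "allowed": allowed,
    }


# Example: decide and record in one place (names are illustrative).
decision = authorize("alice", {"alice", "bob"})
entry = audit_entry("alice", "reindex-users", decision,
                    datetime(2022, 5, 14, 9, 0, tzinfo=timezone.utc))
```

In a workflow, the job would stop right after this step when `authorize` returns False, while the workflow run history itself keeps a trace of who triggered what.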

  24. 24
    3 “Self-Service” Operations Examples in eureka
    1. Infrastructure as Code (Terraform) Operation
    2. Batch Container Operation
    3. Incident Response Operation 👈

  25. 25
    Overview
    ● Provide an incident response platform with a ChatOps interface to reduce the
    burden of responding to incidents, shorten MTTR as much as possible, and
    make sure the postmortem process is completed

  26. 26
    ChatOps to create an incident ticket/channel
    ● Integrate with Slack, which everyone already uses, so that incidents can be
    reported with the simplest possible commands and steps
    Cognitive Load⬇
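One way to keep the reporting command simple is to accept a single line of text and derive everything else from it. The sketch below assumes a hypothetical command format (`<severity> <summary>`) and a hypothetical channel naming scheme. It does not call the Slack API and is not the bot described in the talk.

```python
import re
from dataclasses import dataclass
from datetime import date

SEVERITIES = ("sev1", "sev2", "sev3")  # hypothetical severity levels


@dataclass(frozen=True)
class IncidentReport:
    severity: str
    summary: str
    channel: str


def parse_incident_command(text: str, today: date) -> IncidentReport:
    """Parse '<severity> <summary>' and derive a channel name for the incident."""
    parts = text.strip().split(maxsplit=1)
    if len(parts) != 2 or parts[0].lower() not in SEVERITIES:
        raise ValueError("usage: <sev1|sev2|sev3> <summary>")
    severity = parts[0].lower()
    summary = parts[1].strip()
    # Keep only lowercase letters, digits, and hyphens in the channel slug.
    slug = re.sub(r"[^a-z0-9]+", "-", summary.lower()).strip("-")[:40].strip("-")
    channel = f"inc-{today:%Y%m%d}-{severity}-{slug}"
    return IncidentReport(severity, summary, channel)
```

The reporter only has to supply a severity and a sentence, and the bot can use the derived fields to open the ticket and the channel.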

  27. 27
    Add Incident Response flow to General On-boarding Process
    ● Have a bot introduce the incident response flow as part of the onboarding process,
    which saves labor and keeps everyone continuously aware of the flow
    Cognitive Load⬇

  28. 28
    Postmortem Template
    ● Postmortems can be created from a template with one click of a button in
    Confluence
    Operational Load⬇

  29. 29
    Future Prospects
    (Quoted from O’Reilly, Seeking SRE, Chapter 4)
    ● Introduce the “Timeline Model” to automate more of the incident response flow
    ● Measure and analyze the time between “Response”, “Mitigate”, and “Repair”
    to shorten MTTR
    Operational Load⬇
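Measuring the intervals between phases is simple once each phase has a timestamp. The sketch below assumes three recorded timestamps named after the phases on this slide. The function name and result keys are hypothetical, and how the timestamps are captured is left open.

```python
from datetime import datetime, timedelta


def phase_durations(response: datetime, mitigate: datetime,
                    repair: datetime) -> dict[str, timedelta]:
    """Return the time spent between the Response, Mitigate, and Repair points."""
    if not (response <= mitigate <= repair):
        raise ValueError("timestamps must be in order: response, mitigate, repair")
    return {
        "response_to_mitigate": mitigate - response,
        "mitigate_to_repair": repair - mitigate,
        "response_to_repair": repair - response,
    }


durations = phase_durations(
    datetime(2022, 5, 14, 10, 0),
    datetime(2022, 5, 14, 10, 25),
    datetime(2022, 5, 14, 11, 40),
)
```

Collecting these values across incidents would show which phase takes the most time, which is where automation is most likely to shorten MTTR.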

  30. 30
    Conclusion
    ● Good “Self-Service” Operations are
    ○ Low Cognitive Load
    ○ Low Operational Load for “Users”
    ○ Secure and Auditable

  31. 31
    Reference
    ● Self-Service Operations
    ● Changes in the Terraform Component Delivery Process at eureka over the past year:
    facing Productivity, Security, and Privacy on a rapidly growing product platform
    ● Introducing Conftest and setting up CI with GitHub Actions to automate Terraform
    reviews
    ● Scaling based on Amazon SQS
    ● Self-Service, Siloing and Organizational Structure
    ● Management to achieve SRE
    ● Seeking SRE
    ● Achieving human-machine-integrated security measures with a Slack bot that
    supports incident response through automation
