SRE Activities at Nulab

Yusuke Matsuura
April 08, 2024

Transcript

  1. SRE Activities at Nulab
    Backlog Meetup in Hanoi
    SRE Section, Development Dept, Nulab Inc.
    Yusuke Matsuura
  2. Agenda
    1. Introduction
    2. What is SRE?
    3. The Role of the SRE at Nulab
    4. What Nulab's SRE has been working on so far
    5. Future Challenges for SRE Activities at Nulab
    6. Summary
  3. Self Introduction
    Yusuke Matsuura
    Engineering Manager, SRE - Nulab Inc.
    GitHub: https://github.com/matsuzj
    X: https://twitter.com/matsuzj
    I started my career as an Application Engineer and was hired as an Infrastructure Engineer at Nulab. In 2019, I started the SRE team to solve various issues, and I now work as the Engineering Manager for the SRE team.
  4. What I want to tell you
    I would like to introduce the role of the SRE in a Japanese SaaS development company.
  5. What is SRE?
    A site reliability engineer, or SRE, is a role that encompasses aspects of both software engineering and operations/infrastructure. It also encompasses a strategy and a set of practices and principles across service offerings, and is closely tied to DevOps and operations.
    The term "site reliability engineering" first came into existence at Google in 2003, when a site reliability team was created.
  6. Improved service reliability
    Build and improve reliability metrics (SLI)
    Determine reliability objectives (SLO) with product owners, and spread the concept of SLO as a basis for decision making in the release cycle.
  7. SRE Culture Fostering (Enabling)
    Build an organized incident response system (less reliance on a specific person)
    Build and maintain stable teams for Backlog / Cacoo / Nulab Apps
    Share and deploy knowledge through product collaboration
  8. Improving Developers' Development Efficiency
    Anything that leads to a shorter lead time from the time a Product Backlog item is started to the time it is released.
    CI / CD improvements
  9. Cost-effective infrastructure enhancement
    Assistance in designing application architecture from a service reliability perspective
    Continuous improvement and optimization of infrastructure architecture
    Cost optimization
  10. Continuous improvement of operation and monitoring systems
    Building a continuous improvement cycle by incorporating postmortems
    Establishment of mechanisms to facilitate detection of problems
    Establishment of written procedures for symptoms of failure
    Establishment of an appropriate on-call system
  11. What we cherish
    We have been making improvements based on what is described in Beyond the Twelve-Factor App:
    Going stateless
    Moving to containers
    Reduce the number of managed servers
    Use managed services as much as possible
  12. What we actually solved
    Switching application frameworks
    Containerization
    Replacement with managed services
    Email Improvements
    Fostering an on-call culture
  13. 5. Future Challenges for SRE Activities at Nulab
    Involve product owners in determining reliability objectives (SLO), and instill the concept of SLO as a basis for decision making in the release cycle.
    Establish an organized incident response system (less reliance on specific people).
    Establish an appropriate on-call system
    Establish a system that facilitates problem detection
    Build and maintain stable teams for Backlog / Cacoo / Nulab Apps
  14. Summary
    SRE practices have had a significant impact on Nulab's development process and product reliability.
    Backlog has been in service since 2006, and not only its functional requirements but also its non-functional requirements continue to evolve.
    We will continue to make improvements so that our service can be used with peace of mind even as the speed of its evolution increases.
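
The slides on service reliability describe using SLIs and SLOs as a basis for decision making in the release cycle. As a rough illustration only, and not a description of Nulab's actual tooling, the sketch below computes an availability SLI and the remaining error budget from request counts. The names `SloReport`, `evaluate_slo`, and `can_release`, the request-based SLI, and the 99.9% target are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    """Hypothetical summary of one SLO evaluation window."""
    sli: float               # measured fraction of good requests
    allowed_failures: float  # error budget, expressed in requests
    budget_remaining: float  # fraction of the budget left (negative if overspent)


def evaluate_slo(good: int, total: int, slo_target: float) -> SloReport:
    """Compare a request-based SLI against an SLO target (assumed 0 < target < 1)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= good <= total:
        raise ValueError("good must be between 0 and total")
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")

    sli = good / total
    # The error budget is the share of requests the SLO allows to fail.
    allowed_failures = (1 - slo_target) * total
    actual_failures = total - good
    budget_remaining = 1 - actual_failures / allowed_failures
    return SloReport(sli, allowed_failures, budget_remaining)


def can_release(report: SloReport, min_budget: float = 0.0) -> bool:
    """Illustrative release gate: proceed only while some error budget is left."""
    return report.budget_remaining > min_budget


if __name__ == "__main__":
    # Made-up numbers: 100,000 requests in the window, 50 of them failed.
    report = evaluate_slo(good=99_950, total=100_000, slo_target=0.999)
    print(f"SLI: {report.sli:.4%}")
    print(f"Error budget remaining: {report.budget_remaining:.1%}")
    print("Release allowed:", can_release(report))
```

A gate like `can_release` is one possible way to make an SLO actionable for product owners: when the budget is spent, the team would prioritize reliability work over new releases until the window recovers.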