SRE Activities at Nulab

Backlog Meetup in Hanoi
SRE Section, Development Dept, Nulab Inc.
Yusuke Matsuura

Agenda
1. Introduction
2. What is SRE?
3. The Role of the SRE at Nulab
4. What Nulab's SRE has been working on so far
5. Future Challenges for SRE Activities at Nulab
6. Summary

1. Introduction

Self Introduction
Yusuke Matsuura
Engineering Manager, SRE - Nulab Inc.
GitHub: https://github.com/matsuzj
X: https://twitter.com/matsuzj
I started my career as an application engineer and joined Nulab as an infrastructure engineer. In 2019 I started the SRE team to solve various issues, and I now work as its Engineering Manager.

What I want to tell you
I would like to introduce the role of the SRE in a Japanese SaaS development company.

2. What is SRE?

What is SRE?
A site reliability engineer, or SRE, is a role that encompasses aspects of both software engineering and operations/infrastructure. Site reliability engineering is also a strategy and a set of practices and principles applied across service offerings, and it is closely tied to DevOps and operations. The term originated at Google in 2003, when the first site reliability team was created.

SRE Scope of Responsibility
Capacity Planning
Availability
Performance
Monitoring
Incident Response
On-call Support
Post-Mortem

3. The Role of the SRE at Nulab

Improved service reliability
Build and improve reliability metrics (SLI)
Determine reliability objectives (SLO) with product owners, and spread the concept of SLO as a basis for decision making in the release cycle.
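One common way to turn an SLO into a release-cycle decision is an error budget: the amount of failure the SLO still allows within a window. The sketch below is illustrative only and is not taken from the deck. The `SloWindow` type, the function names, the request counts, and the 99.9% target are all hypothetical, and the deck does not say how Nulab actually computes its SLIs.

```python
from dataclasses import dataclass


@dataclass
class SloWindow:
    """Request counts observed over one SLO window (hypothetical shape)."""
    total_requests: int
    good_requests: int


def availability_sli(window: SloWindow) -> float:
    """SLI: fraction of requests that were served successfully."""
    if window.total_requests == 0:
        return 1.0  # assumption: no traffic counts as fully available
    return window.good_requests / window.total_requests


def error_budget_remaining(window: SloWindow, slo_target: float) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    allowed_failures = (1.0 - slo_target) * window.total_requests
    if allowed_failures == 0:
        return 0.0  # no traffic or a 100% target leaves no budget to spend
    actual_failures = window.total_requests - window.good_requests
    return 1.0 - actual_failures / allowed_failures


def can_release(window: SloWindow, slo_target: float, min_budget: float = 0.0) -> bool:
    """Release gate: allow risky changes only while budget remains."""
    return error_budget_remaining(window, slo_target) > min_budget


if __name__ == "__main__":
    window = SloWindow(total_requests=1_000_000, good_requests=999_500)
    print(f"SLI: {availability_sli(window):.4%}")
    print(f"Budget remaining: {error_budget_remaining(window, 0.999):.1%}")
    print(f"OK to release: {can_release(window, 0.999)}")
```

With a gate like this, product owners and SREs can agree in advance that feature releases pause once the budget is spent, which is one way the SLO becomes a shared basis for decisions.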

SRE Culture Fostering (Enabling)
Build an organized incident response system (less reliance on a specific person)
Build and maintain stable teams for Backlog / Cacoo / Nulab Apps
Share and deploy knowledge through product collaboration

Improving Developers' Development Efficiency
Anything that shortens the lead time from when a Product Backlog item is started to when it is released
CI / CD improvements

Toil Reduction
Automate repetitive tasks
Deactivation and replacement of unnecessary services

Cost-effective infrastructure enhancement
Assistance in designing application architecture from a service reliability perspective
Continuous improvement and optimization of infrastructure architecture
Cost optimization

Continuous improvement of operation and monitoring systems
Building a continuous improvement cycle by incorporating postmortems
Establishing mechanisms that make problems easier to detect
Establishing written procedures for each failure symptom
Establishing an appropriate on-call system

4. What Nulab's SRE has been working on so far

What we cherish
We have been making improvements guided by what is described in Beyond the Twelve-Factor App.
Going stateless
Moving to containers
Reducing the number of managed servers
Using managed services as much as possible

What we actually solved
Switching application frameworks
Containerization
Replacement with managed services
Email improvements
Fostering an on-call culture

5. Future Challenges for SRE Activities at Nulab

Involve product owners in determining reliability objectives (SLO), and instill the concept of SLO as a basis for decision making in the release cycle.
Establish an organized incident response system (less reliance on specific people).
Establish an appropriate on-call system.
Establish a system that facilitates problem detection.
Build and maintain stable teams for Backlog / Cacoo / Nulab Apps.

6. Summary

SRE practices have had a significant impact on Nulab's development process and product reliability.
Backlog has been in service since 2006, and both its functional and non-functional requirements continue to evolve.
We will continue to make improvements so that the service can be used with peace of mind even as it evolves faster.
