GitHub: https://github.com/matsuzj
X: https://twitter.com/matsuzj
I started my career as an Application Engineer and was hired at Nulab as an Infrastructure Engineer. In 2019, I started the SRE team to solve various issues, and I now work as the Engineering Manager for the SRE team.
a role that encompasses aspects of both software engineering and operations/infrastructure. It also encompasses a strategy and a set of practices and principles across service offerings, and is closely tied to DevOps and operations. The term site reliability engineering first came into existence at Google in 2003, when a site reliability team was created.
(less reliance on a specific person)
Build and maintain stable teams for Backlog / Cacoo / Nulab Apps
Share and deploy knowledge through product collaboration
improvement cycle by incorporating postmortems
Establishment of mechanisms to facilitate detection of problems
Establishment of written procedures for symptoms of failure
Establishment of an appropriate on-call system
is described in Beyond the Twelve-Factor App
Go stateless
Move to containers
Reduce the number of managed servers
Use managed services as much as possible
owners in determining reliability objectives (SLOs), and instill the concept of SLOs as a basis for decision making in the release cycle.
Establish an organized incident response system (less reliance on specific people).
Establish an appropriate on-call system.
Establish a system that facilitates problem detection.
Build and maintain stable teams for Backlog / Cacoo / Nulab Apps.
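One common way to use an SLO as a basis for release decisions is an error budget: the share of requests that is allowed to fail under the SLO. The sketch below only illustrates that idea. The function names, the 99.9% target, and the request counts are hypothetical and are not taken from this talk or from Nulab's actual process.

```python
def error_budget_remaining(slo_target: float, total_requests: int,
                           failed_requests: int) -> float:
    """Return the unspent fraction of the error budget (negative if overspent).

    slo_target is the success-rate objective, e.g. 0.999 for 99.9%
    (an assumed example value, not one from the talk).
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        # No traffic in the window, so none of the budget has been spent.
        return 1.0
    # Number of failures the SLO tolerates over this window.
    allowed_failures = (1.0 - slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures


def can_release(slo_target: float, total_requests: int,
                failed_requests: int, min_remaining: float = 0.0) -> bool:
    """Hypothetical release gate: allow a release only while budget remains."""
    remaining = error_budget_remaining(slo_target, total_requests, failed_requests)
    return remaining > min_remaining
```

For example, with a 99.9% target and 1,000,000 requests in the window, about 1,000 failures are tolerated. Under those assumed numbers, 250 failures leave roughly three quarters of the budget and the gate stays open, while 1,500 failures overspend the budget and the gate closes until reliability recovers.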
process and product reliability. Backlog has been in service since 2006, and both its functional and non-functional requirements continue to evolve. We will keep making improvements so that the service can be used with peace of mind even as its pace of evolution increases.