Slide 1

Slide 1 text

Introduction to SRE

Slide 2

Slide 2 text

AGENDA
● What is SRE?
● SRE & DevOps
● Principles
● Practices
● SLOs: Vocabulary, Error Budgets
● State of the Art
● A Day in the Life of an SRE

Slide 3

Slide 3 text

TITANIC QUEBEC BRIDGE

Slide 4

Slide 4 text

WHAT DO THESE CASES HAVE IN COMMON?

Slide 5

Slide 5 text

Those systems were not RELIABLE! Reliability describes the ability of a system or component to function under stated conditions for a specified period of time.

Slide 6

Slide 6 text

RELIABILITY ENGINEERING
Reliability engineering is the discipline that applies scientific know-how to a component, product, or process to ensure that it performs its intended function without failure for a specified time.

Slide 7

Slide 7 text

TOOLING
OPERATING MODEL CLOUD SQUAD SRE
DEVOPS FEATURE SQUAD 1, DEVOPS FEATURE SQUAD 2, DEVOPS FEATURE SQUAD N, SRE SQUAD

KPIs
● Performance
● Uptime
● Minimal Cost per Service
● Deployability

SERVICES
● Automation Framework
● Self-Service Infrastructure
● Logging & Metrics Monitoring & Reporting
● Scaling

PRACTICES
● Know the Service Level
● Embrace Risk
● Eliminate Toil
● Know What's Broken and Why
● Stuff Happens
● Automate [Almost] Everything
● Reliable Releases
● Keep it Simple
● Chaos Engineering

Slide 8

Slide 8 text

AREAS OF EXPERTISE
Introduction

Slide 9

Slide 9 text

HISTORY OF SRE
● 2003: Ben Treynor coins the term SRE
● DevOps is born
● 2014: First conference about SRE: SREcon
● 2016-2018: SRE books are released
● 2019: SRE goes mainstream

Slide 10

Slide 10 text

3 CHARACTERISTICS
● NALSD (Non-Abstract Large System Design)
● Avoid unnecessary work: it's toil!
● Say yes to curiosity

Slide 11

Slide 11 text

DEVOPS OR SRE

Slide 12

Slide 12 text

DEVOPS OR SRE

Slide 13

Slide 13 text

PRINCIPLES

Slide 14

Slide 14 text

PRINCIPLES
● Embracing Risk
● Monitoring Distributed Systems
● Service Level Objectives
● Eliminating Toil

Slide 15

Slide 15 text

PRINCIPLES
● Automation
● Release Engineering
● Simplicity

Slide 16

Slide 16 text

EMBRACING RISK
You might expect to build 100% reliable services, ones that never fail. However, past a certain point, increasing reliability is worse for a service rather than better! Extreme reliability comes at a cost! Embrace the risk! I don't know … I like the adrenaline.
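The cost of extreme reliability can be made concrete with an error budget, the amount of unavailability an availability target still permits. The following is a minimal sketch, not part of the original slides: the function name `error_budget_minutes`, the 30-day window, and the example targets are all assumptions chosen for illustration.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability for an availability SLO over a window.

    `slo` is a fraction such as 0.999; the 30-day default window is an
    assumption made for this example.
    """
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be in the range (0, 1]")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo) * total_minutes


# Illustrative targets only; each extra "nine" shrinks the budget tenfold.
for target in (0.99, 0.999, 0.9999):
    budget = error_budget_minutes(target)
    print(f"{target:.2%} availability -> {budget:.1f} minutes per 30 days")
```

Under these assumptions a 99.9% target leaves roughly 43 minutes of downtime per 30 days, and a 100% target leaves none, which is why a perfect target leaves no room for releases or experiments.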

Slide 17

Slide 17 text

ELIMINATING TOIL
Toil is not just "work I don't like to do." If a human operator needs to touch your system during normal operations, you have a bug. At Google, at least 50% of each SRE's time should be spent on engineering project work that will either reduce future toil or add service features.
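The 50% guideline can be checked against a simple time log. This is a hedged sketch rather than a prescribed method: the category names, the weekly hours, and the helper `toil_fraction` are hypothetical and exist only for this example.

```python
TOIL_CAP = 0.5  # ceiling implied by the 50% guideline quoted on the slide


def toil_fraction(hours_by_category: dict[str, float],
                  toil_categories: set[str]) -> float:
    """Share of logged hours that falls into categories counted as toil."""
    total = sum(hours_by_category.values())
    if total <= 0:
        raise ValueError("no hours logged")
    toil = sum(hours for category, hours in hours_by_category.items()
               if category in toil_categories)
    return toil / total


# Hypothetical week for one engineer (hours per category).
week = {
    "tickets": 10.0,
    "manual_deploys": 6.0,
    "on_call_interrupts": 4.0,
    "project_work": 20.0,
}
fraction = toil_fraction(week, {"tickets", "manual_deploys", "on_call_interrupts"})
print(f"toil share: {fraction:.0%}, over cap: {fraction > TOIL_CAP}")
```

In this made-up week toil takes exactly half of the logged hours, so the engineer sits at the cap without exceeding it.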

Slide 18

Slide 18 text

MONITORING
Monitoring means collecting, processing, aggregating, and displaying real-time quantitative data about a system, such as query counts and types, error counts and types, processing times, and server lifetimes. A Google SRE team with 10–12 members typically has one or sometimes two members whose primary assignment is to build and maintain monitoring systems for their service.
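The aggregation step described above can be sketched in a few lines. This is an illustration under stated assumptions, not a real monitoring system: the `Request` record, the rule that status codes of 500 and above count as errors, and the nearest-rank 95th percentile are all choices made for this example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    kind: str          # query type, e.g. "read" or "write" (hypothetical labels)
    status: int        # HTTP-like status code; >= 500 is treated as an error here
    latency_ms: float  # processing time in milliseconds


def summarize(requests: list[Request]) -> dict:
    """Aggregate query counts, error counts, error rate, and p95 latency."""
    if not requests:
        raise ValueError("need at least one request")
    total = len(requests)
    query_counts = Counter(r.kind for r in requests)
    error_counts = Counter(r.status for r in requests if r.status >= 500)
    latencies = sorted(r.latency_ms for r in requests)
    # Nearest-rank percentile: rank = ceil(0.95 * total), in integer arithmetic.
    rank = (95 * total + 99) // 100
    return {
        "query_counts": dict(query_counts),
        "error_counts": dict(error_counts),
        "error_rate": sum(error_counts.values()) / total,
        "p95_latency_ms": latencies[rank - 1],
    }


# Made-up sample: 18 fast reads, one failed slow write, one successful write.
sample = [Request("read", 200, 10.0)] * 18 + [
    Request("write", 500, 250.0),
    Request("write", 200, 40.0),
]
summary = summarize(sample)
print(summary)
```

A production setup would normally rely on a dedicated metrics system; the point here is only how raw events turn into counts by type, an error rate, and a latency percentile.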

Slide 19

Slide 19 text

EMBRACING RISK

Slide 20

Slide 20 text

AUTOMATING
Automate Yourself Out of a Job: Automate ALL the Things!
● User account creation.
● Software or hardware installation preparation and decommissioning.
● Rollouts of new software versions.
● Runtime configuration changes.
Automate the Resilience!
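Taking user account creation from the list above, one way to automate it safely is to make the operation idempotent, so that running it again changes nothing. This is a sketch under assumptions: the in-memory `directory` dict stands in for a real identity system, and `ensure_account` is a hypothetical helper, not an existing API.

```python
def ensure_account(directory: dict[str, dict], username: str,
                   groups: set[str]) -> bool:
    """Bring an account to the desired state.

    Returns True if a change was made and False if the account already
    matched, so the call is safe to repeat from a scheduler or pipeline.
    """
    desired = {"groups": set(groups)}
    if directory.get(username) == desired:
        return False  # already in the desired state, nothing to do
    directory[username] = desired
    return True


# Hypothetical directory and user, for illustration only.
directory: dict[str, dict] = {}
first_run = ensure_account(directory, "alice", {"sre", "oncall"})
second_run = ensure_account(directory, "alice", {"sre", "oncall"})
print(first_run, second_run)
```

Describing the desired end state instead of the steps to reach it is a design choice: the same call can create, repair, or leave an account alone without the operator checking first.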

Slide 21

Slide 21 text

RELEASE ENGINEERING
Continuous Build and Deployment! Release engineers have a solid understanding of source code management, compilers, build configuration languages, automated build tools, package managers, and installers. 4 principles: Self-Service Model, High Velocity, Hermetic Builds, Enforcement of Policies and Procedures.

Slide 22

Slide 22 text

PRACTICES

Slide 23

Slide 23 text

PRACTICES
● Postmortem Culture
● Being On-Call
● Incident Response
● Load Management

Slide 24

Slide 24 text

PRACTICES
● Testing for Reliability
● Simplicity
● Software Engineering
● Effective Troubleshooting

Slide 25

Slide 25 text

PRACTICES
● Pipelines
● Chaos Engineering
● Data Integrity
● Canary Releases

Slide 26

Slide 26 text

PRACTICES

Slide 27

Slide 27 text

PRACTICES

Slide 28

Slide 28 text

SKILLS
● Software Engineering
● Distributed Systems Design
● Operating Systems
● Networking
● Databases
● Security
● Reliability
● Troubleshooting
● Customer Support

Slide 29

Slide 29 text

The best way to promote a DevOps & SRE culture is to adopt a new view, one focused on the symptoms, not on the causes ...