Slide 1

Slide 1 text

Observing & Understanding Failures: Training SRE Apprentices
Tammy Bryant Butow, SRE @ Gremlin, @tambryantbutow

Slide 2

Slide 2 text

Is it difficult to develop skills to observe and understand failures? Why is training from someone more experienced helpful?

Slide 3

Slide 3 text

Luke before he met Yoda: could barely use the Force

Slide 4

Slide 4 text

Luke after he met Yoda: lifts a ship out of the swamp after doing a handstand

Slide 5

Slide 5 text

SRE Apprentice (Padawan)
We created an SRE Apprentice Program to hire and train new SREs. Apprentices come from different backgrounds; for example, we hired a math teacher who had completed a coding bootcamp (Hackbright Academy). We matched SRE Apprentices with SRE Teachers.

Slide 6

Slide 6 text

SRE Apprentice (Padawan) and SRE Teacher (Jedi)
SRE Apprentices receive one-on-one instruction in the ways of the SRE.

Slide 7

Slide 7 text

SRE Apprentice (Padawan) and SRE Teacher (Jedi)
When an Apprentice’s training is complete, they must pass the SRE interview loop to become an SRE. They then continue to develop their skills, and one day they will find a Padawan of their own to train.

Slide 8

Slide 8 text

Let’s check our assumptions about SRE Apprentices: Rona, Krishelle, Thomissa, John

Slide 9

Slide 9 text

How do we match SRE Apprentices (Padawans) to SRE Teachers (Jedi)?

Slide 10

Slide 10 text

SRE Apprentice (Padawan): Rona Chong
"The SRE apprenticeship was critical for my career - it was my foot into the door of the tech industry, when it can be hard to break in as a newcomer without the usual credentials. But getting your foot in the door is just the first step. To help set me up for success, Tammy was several things for me: a mentor, a source of emotional support, and an advocate. Find people who are truly dedicated to being there for others! I'm so glad that Tammy and her folks were there for me, checking in, thinking outside the box and pushing for growth and change in our communities." - Rona Chong, Padawan // SRE Apprentice

Slide 11

Slide 11 text

SRE Apprentice (Padawan) - Rona Chong: Has a desire to learn everything she can about the Force.
SRE Teacher (Jedi) - Tammy Bryant Butow: Can feel disturbances in the Force and use skills effectively to achieve desired results.

Slide 12

Slide 12 text

SRE Teacher: What makes someone suitable?
● Set your Apprentice up for success
● Be a mentor
● Be a source of emotional support
● Be an advocate
● Check in
● Think outside the box
● Push for growth and change in our communities
● 2+ years SRE experience

Slide 13

Slide 13 text

Psychological Safety
What is the S.A.F.E.T.Y model and how can it help me teach my apprentice?
Self Assessment: https://academy-bbl.com/safety-assessment/

Slide 14

Slide 14 text

SRE Apprentice (Padawan)
An approach to learning to observe and understand failures in Production:
● Learn by training
● Learn by shadowing
● Learn by practice
● Learn by community
* Mandalorian-style apprentice program featuring Grogu aka Baby Yoda and a Jedi trainer

Slide 15

Slide 15 text

SRE Apprentice: Learn by training. Start with demo apps & lab environments before production.

Slide 16

Slide 16 text

burningion/ecommerce-observability

Slide 17

Slide 17 text

localhost:3000/cart?variant_id=5
The ads service code is throwing errors from a specific ERB file.

Slide 18

Slide 18 text

discounts-service/discounts.py
The discounts service code is causing performance issues.

Slide 19

Slide 19 text

gremlin/microservices-demo

Slide 20

Slide 20 text

Architecture 12 Services - What matters most to our customers? @tammyxbryant

Slide 21

Slide 21 text

Architecture: What can we remove from the critical path?

Slide 22

Slide 22 text

OH: “Getting out of the critical path is a good thing” (gtk)

Slide 23

Slide 23 text

Architecture: Service Not Found
Does blackholing a non-critical-path service like the Recommendation Service cause unexpected failures for critical services like the Product Catalogue or Frontend?

Slide 24

Slide 24 text

Blackhole → Ads

Slide 25

Slide 25 text

@tambryantbutow

Slide 26

Slide 26 text

@tambryantbutow

Slide 27

Slide 27 text

Does blackholing a non-critical-path service like the Ad Service result in graceful degradation of the customer experience?

Slide 28

Slide 28 text

@tambryantbutow

Slide 29

Slide 29 text

Graceful Degradation: Yes, our experiment was successful, and the results matched our expectations.
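The kind of behavior the blackhole experiment checks for can be sketched in a few lines of Python. This is a minimal illustration and not code from the demo app: `render_product_page`, `fetch_ads_blackholed`, and `fetch_ads_healthy` are hypothetical names, and the blackhole is simulated by raising `TimeoutError` instead of dropping real network traffic.

```python
def fetch_ads_blackholed(timeout_s):
    """Hypothetical ad client: simulates a blackholed service that never answers."""
    raise TimeoutError(f"ad service did not respond within {timeout_s}s")


def fetch_ads_healthy(timeout_s):
    """Hypothetical ad client: simulates a healthy ad service."""
    return ["ad: 20% off mugs"]


def render_product_page(product, fetch_ads, timeout_s=0.5):
    """Render the critical content and treat ads as optional."""
    degraded = False
    try:
        ads = fetch_ads(timeout_s)
    except (TimeoutError, ConnectionError):
        # Non-critical dependency is unreachable: drop the ads, keep the page.
        ads, degraded = [], True
    return {"product": product, "ads": ads, "degraded": degraded}
```

With the blackholed client the page still carries the product and simply has no ads, which is the graceful degradation the experiment expects to observe.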

Slide 30

Slide 30 text

SRE Apprentice: Learn by shadowing (on-call)

Slide 31

Slide 31 text

twitter.com/tambryantbutow/status/1387778372688830465

Slide 32

Slide 32 text

twitter.com/tambryantbutow/status/1387778372688830465

Slide 33

Slide 33 text

On-Call Rotation Example: SRE Apprentice on-call shadow role
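One way a rotation with a shadow slot could be laid out is sketched below. The slide does not spell out the schedule, so everything here is an assumption for illustration: weekly shifts, a primary and a secondary responder, and the hypothetical helper `build_rotation`.

```python
from datetime import date, timedelta


def build_rotation(engineers, apprentices, start, weeks):
    """Return one entry per week with a primary, a secondary, and a shadow.

    The shadow is an apprentice who follows the shift alongside the
    responders but is not the responder of record (an assumed policy).
    """
    if len(engineers) < 2 or not apprentices:
        raise ValueError("need at least two engineers and one apprentice")
    schedule = []
    for week in range(weeks):
        schedule.append({
            "week_of": start + timedelta(weeks=week),
            "primary": engineers[week % len(engineers)],
            "secondary": engineers[(week + 1) % len(engineers)],
            "shadow": apprentices[week % len(apprentices)],
        })
    return schedule
```

Calling `build_rotation(["Jedi A", "Jedi B", "Jedi C"], ["Padawan A"], date(2021, 5, 3), 4)` yields four weekly shifts in which the apprentice shadows every week while the engineers rotate through primary and secondary.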

Slide 34

Slide 34 text

@secnerdette @tambryantbutow

Slide 35

Slide 35 text

Partner with your local community to teach the skills engineers need: holbertonschool.com

Slide 36

Slide 36 text

SRE Apprentice: Learn by practice (code + code reviews)

Slide 37

Slide 37 text

Partner with your local community to teach the skills engineers need

Slide 38

Slide 38 text

@tambryantbutow ✉

Slide 39

Slide 39 text

Craft a daily email using Python, sent via a cron job, that includes key metrics: disk capacity, availability, latency, etc.
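A minimal sketch of what this exercise might look like, using only the Python standard library. Several details are assumptions rather than anything from the talk: the addresses, the local SMTP relay on port 25, the script path in the crontab comment, and the idea that availability and latency figures are passed in from whatever monitoring system is in use.

```python
import shutil
import smtplib
from email.message import EmailMessage


def disk_used_percent(path="/"):
    """Percentage of the filesystem at `path` that is currently used."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def build_report(availability_pct, p99_latency_ms, path="/",
                 sender="sre-reports@example.com",
                 recipient="sre-team@example.com"):
    """Build the daily email; addresses are placeholder assumptions."""
    msg = EmailMessage()
    msg["Subject"] = "Daily SRE metrics report"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(
        f"Disk used ({path}): {disk_used_percent(path):.1f}%\n"
        f"Availability: {availability_pct:.2f}%\n"
        f"p99 latency: {p99_latency_ms:.0f} ms\n"
    )
    return msg


def send_report(msg, host="localhost", port=25):
    """Send via an SMTP relay (assumed to be reachable at host:port)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)


# Example crontab entry (assumed script path), 08:00 every day:
# 0 8 * * * /usr/bin/python3 /opt/sre/daily_report.py
```

Keeping `build_report` separate from `send_report` lets an apprentice review and test the message contents without a mail server.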

Slide 40

Slide 40 text

Build a web page that uses the PagerDuty API to display the most common alerts ordered by frequency. This lets us apply the Pareto Principle to improve system reliability.
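The data side of this exercise could be sketched as follows; the web page itself is left out. The sketch assumes the PagerDuty REST API v2 incidents endpoint with token authentication and offset pagination, and it groups alerts by incident title, which is one possible choice. `fetch_incident_titles` and `rank_alerts` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter

API_URL = "https://api.pagerduty.com/incidents"


def fetch_incident_titles(api_key, since, until, limit=100):
    """Page through the incidents endpoint and collect incident titles."""
    titles, offset = [], 0
    while True:
        query = urllib.parse.urlencode(
            {"since": since, "until": until, "limit": limit, "offset": offset})
        request = urllib.request.Request(f"{API_URL}?{query}", headers={
            "Authorization": f"Token token={api_key}",
            "Accept": "application/vnd.pagerduty+json;version=2",
        })
        with urllib.request.urlopen(request) as response:
            page = json.load(response)
        titles.extend(incident["title"] for incident in page["incidents"])
        if not page.get("more"):
            return titles
        offset += limit


def rank_alerts(titles):
    """Return (title, count, cumulative share) rows, most frequent first."""
    counts = Counter(titles)
    total = sum(counts.values())
    rows, running = [], 0
    for title, count in counts.most_common():
        running += count
        rows.append((title, count, running / total))
    return rows
```

The cumulative share column is what makes the Pareto view useful: the rows at the top that together reach roughly 80% of all alerts are the first candidates for reliability work.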

Slide 41

Slide 41 text

SRE Apprentice: Learn by community (peer mentoring, lunch and learns, study hall, AMAs & Slack)

Slide 42

Slide 42 text

7000+ engineers. Channels: #learning, #donut-intros, #questions. gremlin.com/slack

Slide 43

Slide 43 text

SRE Apprentices share tips for mentors

Slide 44

Slide 44 text

Tips for SRE Teachers:
1. Meet your apprentice where they are and help them get where they need to be.
2. Don't limit yourself as a mentor: your apprentice's role isn't limited to code, so why should your mentorship be?
3. Don't forget to establish open lanes of communication.

Slide 45

Slide 45 text

funretrospectives.com/draw-your-feelings/

Slide 46

Slide 46 text

SRE Apprentices share tips for future Padawans

Slide 47

Slide 47 text

Tips for Apprentices:
1. One of the secrets to being effective is being willing to ask questions.
2. Don't just accept a task.
3. Unrealistic expectations happen all the time in software.
4. Invest in your relationships with your colleagues.

Slide 48

Slide 48 text

Thank you
Get some chaos community stickers based on the path you’ll be taking:
Padawan: gremlin.com/talk/padawan
Jedi: gremlin.com/talk/jedi