Slide 1

Slide 1 text

Yuta Miyama, Apr 15th 2022
SRE bridge the gap: Feature development to Core API
SRE as a bridge between feature development teams and Core API teams

Slide 2

Slide 2 text

Who I am
Yuta Miyama
Student - Entrepreneur
Maker - Self-taught programmer
Now - Around-the-world migrants

Slide 3

Slide 3 text

Around the world migrants
2010 - Started programming career in Japan
2016 - Moved to Berlin
2020 - Moved to Toronto
2022 - Back to Japan
Photo by Amy Humphries on Unsplash

Slide 4

Slide 4 text

What I want to talk about
• Introduce you to Shopify’s production engineering practice
  (An introduction to Shopify's production engineering organization)
• Encourage cross-discipline moves between feature development and production engineering
  (How changing teams across disciplines promotes the growth of both the organization and the individual)

Slide 5

Slide 5 text

Shopify’s history
2004 - 2006 - Shopify was born on Rails 1.x
2022 - Becoming a “Retail Operating System”
Size - $175.4 billion GMV (Gross Merchandise Volume) in 2021
Entrepreneurship - $3 billion in “Shopify Capital” funding since 2016
Global - “Shopify Market” Cross border commerce from day one

Slide 6

Slide 6 text

Production Engineering at Shopify
2015 ~ 2016 - Shopify adopted the Production Engineering model
Problem
• Misalignments among distinct teams
Outcome
• Self-service tooling for feature dev, esp. monitoring and alerting
• Infra components ownership centralized
• 3x deploy speed and frequency (150 / day)
[Diagram labels: Feature dev: Scale, Monitor, Maintenance. Feature dev / Prod Eng: Self service, Monitoring / Alerting, Next-gen Infrastructure]

Slide 7

Slide 7 text

Resiliency at Production Engineering
2020 - The need for a specialized team on Resiliency
Incident Manager On Call, a.k.a. IMOC
• Core incident handling
• Follow the sun model
Deep dive into “cracks” of distributed systems
• Edge, Ingress, Routing, Application, …
Photo by Alexas_Fotos on Unsplash

Slide 8

Slide 8 text

“We learned to absorb these shocks and become stronger as a result. [..] The school of hard knocks has taught us well.”
Tobi Lütke, CEO, in an internal essay on why we optimize for flash sales

Slide 9

Slide 9 text

Complexities of Shopify
Highly dynamic traffic
• BFCM
• Flash sales / bots
Highly configurable shops
• Liquid
• Script
• API endpoints (Headless, ...

Slide 10

Slide 10 text

Taming the large distributed systems
• Semian
• Load Shedder
• Toxiproxy / Game day
Photo by Omar Flores on Unsplash
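Semian, named on this slide, is Shopify's open-source Ruby resiliency library, and circuit breakers are one of the patterns it provides. The slide does not show any code, so what follows is only a minimal Python sketch of the generic circuit breaker pattern. It is not Semian's API. The class name, the parameters (`error_threshold`, `reset_timeout`, `clock`) and the error type are all hypothetical, chosen for illustration.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying the dependency."""


class CircuitBreaker:
    """Hypothetical, illustrative breaker; not the API of any real library."""

    def __init__(self, error_threshold=3, reset_timeout=5.0, clock=time.monotonic):
        self.error_threshold = error_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout      # seconds to fail fast before a trial call
        self.clock = clock                      # injectable so tests can control time
        self.failures = 0
        self.opened_at = None                   # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Open: protect the caller and the struggling dependency.
                raise CircuitOpenError("dependency marked unhealthy; failing fast")
            # Timeout elapsed: half-open, so let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call re-opens immediately; otherwise open at the threshold.
            if self.opened_at is not None or self.failures >= self.error_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each request to a dependency in `breaker.call(...)` and treat `CircuitOpenError` as a signal to serve a fallback. Failing fast in this way is what keeps one slow dependency from exhausting the workers of every application that talks to it.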

Slide 11

Slide 11 text

Culture and process
• Follow the sun model also applies to Root Cause Analysis
• Autonomy based on “trust batteries”
• Lean on ChatOps enabling async learning
Photo by Jay Heike on Unsplash

Slide 12

Slide 12 text

Developing a “Journey Map”
Observing “three different paths” for ICs
1. Feature dev
2. Core API maker
3. SRE
The analogy to “Swordsman”
Photo by Javier Allegue Barros on Unsplash

Slide 13

Slide 13 text

Feature development teams
Deliver high impact product features to the merchants quickly
Aim -> Scope -> Execute
“How can we iterate quickly, so that we can learn?”
Product development is the main battlefield
“Lean”
• Until market fit, iterate quickly with limited resources
• Provide added value to maturing products
Photo by Krys Amon on Unsplash

Slide 14

Slide 14 text

Core API makers
Long term bets on fundamental components
1. Backbones of web application architecture
2. Investing in “Commerce Primitive” components
Domain experts: people who have reached the top of their field of interest
Photo by Jonny Gios on Unsplash

Slide 15

Slide 15 text

SRE
We connect the dots when a distributed system fails
• IMOC
• Investigate the “seams” of the running system
• Collaborate / communicate to drive resolution on “cracks”
Experts in the failure patterns of distributed systems
Photo purchased from iStock

Slide 16

Slide 16 text

Multiplication brings value
App dev and SRE
• Brings high velocity project scoping
• Distributed system 101
Core API dev and App dev
• User and Maker feedback
Core API dev and SRE
• High-level overview vs. investing in your core interests

Slide 17

Slide 17 text

We are all one team
Growth brings specialization and operational efficiency
Imagine the dysfunctional feedback loop:
• Highly scalable system without the user growth
• Growing features without resiliency toolkit
• Exponential domain onboarding cost without simple interface to Core API
Photo by Kier In Sight on Unsplash

Slide 18

Slide 18 text

Chaos Engineer your org
Hybrid (bridging) developers can disrupt specialization
• Early adoption is quicker and better than an afterthought
• It's easily adaptable, since the underlying failure is common across multiple applications
• Usually ICs have an appetite for resiliency toolkits
More bridging developers lead to organic early planning: a key to both speed and quality
Photo by Olivier Guillard on Unsplash

Slide 19

Slide 19 text

Ensuring mobility contributes to the competitiveness of both the company and the individual
Shopify's Jungle Gym
Feature Development, Core API architects, SREs

Slide 20

Slide 20 text

What’s next?
Shopify is attracting talent from all over the world.
• APAC is growing strong!
• We embrace a fully distributed environment
Develop products that change the livelihoods of millions of entrepreneurs
• Huge potential in cross border commerce (my former team)
Contribute to one of the most powerful web app stacks
• Ruby, (not only) Rails, MySQL (KateSQL), k8s

Slide 21

Slide 21 text

Thank you! @kenzan100 @jp_miyama

Slide 22

Slide 22 text

Bonus track - How hard was the transition?
Shopify managers accept its “Jungle Gym”
1. Charge your “trust battery”
2. Look for opportunities
3. Probe with the managers