SRE bridge the gap: Feature development to Core API / SRE as a bridge between feature development teams and the Core API team

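I will talk, from two perspectives, about how SRE and Production Engineering practices have grown at Shopify, a company that is growing on a global scale.

The first is the history of the company as a whole: at what point the Production Engineering model was introduced, and what Resiliency, the SRE team I currently belong to, is responsible for within it.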

May 18, 2022

  1. Yuta Miyama, Apr 15th 2022. SRE bridge the gap: Feature development to Core API (SRE as a bridge between feature development teams and the Core API team)
  2. Who I am: Yuta Miyama. Student - Entrepreneur; Maker - Self-taught programmer; Now - Around-the-world migrant
  3. Around the world migrants: 2010 - Started programming career in Japan; 2016 - Moved to Berlin; 2020 - Moved to Toronto; 2022 - Back to Japan. Photo by Amy Humphries on Unsplash
  4. What I want to talk about: Introduce you to Shopify’s production engineering practice (an introduction to Shopify's production engineering organization). Encourage the cross-discipline moves between feature development and production engineering (how changing teams across disciplines promotes the growth of both the organization and the individual)
  5. Shopify’s history: 2004 - https://snowdevil.ca; 2006 - Shopify was born on Rails 1.x; 2022 - Becoming a “Retail Operating System”. Size - $175.4 billion GMV (Gross Merchandise Volume) in 2021. Entrepreneurship - $3 billion in “Shopify Capital” funding since 2016. Global - “Shopify Market”, cross border commerce from day one
  6. Production Engineering at Shopify. Problem: misalignments among distinct teams. Outcome: self-service tooling for feature dev, esp. monitoring and alerting; infra components ownership centralized; 3x deploy speed and frequency (150 / day). 2015 ~ 2016 - Shopify adopted the Production Engineering model. Diagram labels: Feature dev, Scale, Monitor, Maintenance; Feature dev, Prod Eng, Self service, Monitoring / Alerting, Next-gen Infrastructure
  7. Resiliency at Production Engineering. 2020 - The need for a specialized team on Resiliency. Incident Manager On Call, a.k.a. IMOC: core incident handling; follow-the-sun model. Deep dive into “cracks” of distributed systems: Edge, Ingress, Routing, Application, … Photo by Alexas_Fotos on Unsplash
  8. “We learned to absorb these shocks and become stronger as a result. [..] The school of hard knocks has taught us well.” - Tobi Lütke, CEO, in an internal essay on why we optimize for flash sales. https://speakerdeck.com/sirupsen/goto-copenhagen-2017-shopifys-architecture-to-handle-80k-rps-sales?slide=3
  9. Complexities of Shopify. Highly dynamic traffic: BFCM; flash sales / bots. Highly configurable shops: Liquid; Script; API endpoints (Headless, ... https://shopify.engineering/cloud-load-modular-code-shopify-2022
  10. Taming the large distributed systems: Semian; Load Shedder; Toxiproxy / Game day. Photo by Omar Flores on Unsplash
  11. Culture and process: Follow-the-sun model also applies to Root Cause Analysis; autonomy based on “trust batteries”; lean on ChatOps enabling async learning. Photo by Jay Heike on Unsplash
  12. Developing a “Journey Map”. Observing “three different paths” for ICs: 1. Feature dev, 2. Core API maker, 3. SRE. The analogy to “Swordsman”. Photo by Javier Allegue Barros on Unsplash
  13. Feature development teams: Deliver high impact product features to the merchants quickly. Aim -> Scope -> Execute. “How can we iterate quickly, so that we can learn?” Product development is the main battlefield. “Lean”: iterate quickly with limited resources until market fit. Provide added value to maturing products. Photo by Krys Amon on Unsplash
  14. Core API makers: Long term bets on fundamental components: 1. Backbones of web application architecture, 2. Investing in “Commerce Primitive” components. Domain experts: people who have risen to the top of their field of interest. Photo by Jonny Gios on Unsplash
  15. SRE: We connect dots when a distributed system fails • IMOC • Investigate the “seams” of the running system • Collaborate / communicate to drive resolution on “cracks”. Experts in the failure patterns of distributed systems. Photo purchased from iStock
  16. Multiplication brings value. App dev and SRE: • Brings the high-velocity project scoping • Distributed system 101. Core API dev and App dev: • User and Maker feedback. Core API dev and SRE: • High-level overview vs. investing in your core interests
  17. We are all one team: Growth brings specialization and operational efficiency. Imagine the dysfunctional feedback loop: • Highly scalable system without the user growth • Growing features without resiliency toolkit • Exponential domain onboarding cost without simple interface to Core API. Photo by Kier In Sight on Unsplash
  18. Chaos Engineer your org: Hybrid (bridging) developers can disrupt specialization • Early adoption is quicker and better than an afterthought • It's easily adaptable, since the underlying failure is common across multiple applications • ICs usually have an appetite for resiliency toolkits. More bridging developers lead to organic early planning: a key to both speed and quality. Photo by Olivier Guillard on Unsplash
  19. Ensuring mobility contributes to the competitiveness of both the company and the individual. Shopify's Jungle Gym: Feature Development, Core API architects,

  20. What’s next? Shopify’s attracting talent from all over the world: • APAC is growing strong! • We embrace a fully distributed environment. Develop products that change the livelihood of millions of entrepreneurs: • Huge potential in cross border commerce (my former team). Contribute to one of the most powerful web app stacks: • Ruby, (not only) Rails, MySQL (KateSQL), k8s
  21. Thank you! @kenzan100 @jp_miyama

  22. Bonus track - How hard was the transition? Shopify managers accept its “Jungle Gym”: 1. Charge your “trust battery”, 2. Look for opportunities, 3. Probe with the managers