SRE bridge the gap: Feature development to Core API / SRE as a bridge between feature development teams and the Core API team

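Shopify is a company growing on a global scale. I will talk about how its SRE and Production Engineering practices have developed, from two perspectives.

The first is the history of the company as a whole: at what point the Production Engineering model was introduced, and what Resiliency, the SRE team I currently belong to, is responsible for within it.

The second is the career development of a single software engineer: how, as a company grows, "actively changing positions across disciplines" helps both the organization and the individual grow.

As a company gets larger, division of roles and turnover advance, and the "knowledge and wisdom of the whole system" that a single team originally held becomes fragmented. If SRE is the antithesis to that, how should an individual act to make good use of it and level up?

I am a newcomer to SRE, but I would like to think this through together with you.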

kenzan100

May 18, 2022

Transcript

  1. Yuta Miyama, Apr 15th 2022
    SRE bridge the gap:
    Feature development to Core API
    SRE as a bridge between feature development teams and the Core API team

  2. Who I am
    Yuta Miyama
    Student - Entrepreneur

    Maker - Self-taught programmer

    Now - Around-the-world migrants

  3. Around the world migrants
    2010 - Started programming career in Japan

    2016 - Moved to Berlin

    2020 - Moved to Toronto

    2022 - Back to Japan
    Photo by Amy Humphries on Unsplash

  4. What I want to talk about
    Introduce you to Shopify’s production engineering practice
    An introduction to Shopify's production engineering organization

    Encourage the cross discipline moves between feature development and
    production engineering
    How changing teams across disciplines promotes the growth of both the organization and the individual

  5. Shopify’s history
    2004 - https://snowdevil.ca

    2006 - Shopify was born on Rails 1.x

    2022 - Becoming a “Retail Operating System”
    Size - $175.4 billion GMV (Gross Merchandise Volume) in 2021

    Entrepreneurship - $3 billion in “Shopify Capital” funding since 2016

    Global - “Shopify Market” Cross border commerce from day one

  6. Production Engineering at Shopify
    Problem
    Misalignments among distinct teams

    Outcome
    Self-service tooling for feature dev, esp. monitoring and alerting

    Infra components ownership centralized

    3x deploy speed and frequency (150 / day)
    2015 ~ 2016 - Shopify adopted the Production Engineering model
    Feature dev
    Scale

    Monitor

    Maintenance
    Feature dev
    Prod Eng
    Self-service Monitoring / Alerting
    Next-gen Infrastructure

  7. Incident Manager On Call, a.k.a. IMOC
    Core incident handling

    Follow the sun model

    Deep dive into “cracks” of distributed systems
    Edge, Ingress, Routing, Application, …
    2020 - The need for a specialized team on Resiliency
    Resiliency at Production Engineering
    Photo by Alexas_Fotos on Unsplash

  8. — Tobi Lütke, CEO, in an internal essay on why we optimize for flash sales
    “We learned to absorb these shocks and
    become stronger as a result. [..] The
    school of hard knocks has taught us well.”
    https://speakerdeck.com/sirupsen/goto-copenhagen-2017-shopifys-architecture-to-handle-80k-rps-sales?slide=3

  9. Complexities of Shopify
    Highly dynamic traffic
    BFCM

    Flash sales / bots

    Highly configurable shops
    Liquid

    Script

    API endpoints (Headless, ...
    https://shopify.engineering/cloud-load-modular-code-shopify-2022

  10. Semian
    Load Shedder
    Toxiproxy / Game day
    Taming large distributed systems
    Photo by Omar Flores on Unsplash

  11. Culture and process
    Follow the sun model also applies to Root
    Cause Analysis

    Autonomy based on “trust batteries”

    Lean on ChatOps enabling async learning
    Photo by Jay Heike on Unsplash

  12. Developing a “Journey Map”
    Observing “three different paths” for ICs
    1. Feature dev

    2. Core API maker

    3. SRE

    The analogy to “Swordsman”
    Photo by Javier Allegue Barros on Unsplash

  13. Feature development teams
    Deliver high impact product features to the merchants quickly

    Aim -> Scope -> Execute

    “How can we iterate quickly, so that we can learn?”

    Product development is the main battlefield: “Lean”

    Iterate quickly with limited resources until market fit

    Provide added value to a maturing product
    Photo by Krys Amon on Unsplash

  14. Core API makers
    Long term bets on fundamental components

    1. Backbones of web application architecture

    2. Investing in “Commerce Primitive” components

    Domain experts

    People who have risen to the top of their field of interest
    Photo by Jonny Gios on Unsplash

  15. SRE
    We connect the dots when distributed systems fail

    • IMOC

    • Investigate the “seams” of running systems

    • Collaborate / communicate to drive resolution on “cracks”

    Experts in the failure patterns of distributed systems
    Photo purchased from iStock

  16. Multiplication brings value
    App dev and SRE

    • Brings high-velocity project scoping

    • Distributed system 101

    Core API dev and App dev

    • User and Maker feedback

    Core API dev and SRE

    • High-level overview vs. investing in your core interests

  17. We are all one team
    Growth brings specialization and operational efficiency

    Imagine the dysfunctional feedback loop:
    • Highly scalable system without the user growth

    • Growing features without resiliency toolkit

    • Exponential domain onboarding cost without
    simple interface to Core API
    Photo by Kier In Sight on Unsplash

  18. Chaos Engineer your org
    Hybrid (bridging) developers can disrupt specialization
    • Early adoption is quicker and better than an
    afterthought

    • It's easily adaptable, since the underlying failure is
    common across multiple applications

    • ICs usually have an appetite for resiliency toolkits

    More bridging developers lead to organic early planning:
    a key to both speed and quality
    Photo by Olivier Guillard on Unsplash

  19. Ensuring mobility contributes to the competitiveness of both the company and the individual
    Shopify's Jungle Gym
    Feature Development, Core API architects, SREs

  20. What’s next?
    Shopify is attracting talent from all over the world.
    • APAC is growing strong!

    • We embrace a fully distributed environment

    Develop products that change the livelihoods of millions of entrepreneurs
    • Huge potential in cross-border commerce (my former team)

    Contribute to one of the most powerful web app stacks
    • Ruby, (not only) Rails, MySQL (KateSQL), k8s

  21. Thank you!
    @kenzan100
    @jp_miyama

  22. Bonus track - How hard was the transition?
    Shopify managers accept its “Jungle Gym”

    1. Charge your “trust battery”

    2. Look for opportunities

    3. Probe with the managers
