
SRE bridge the gap: Feature development to Core API / SRE as a bridge between feature development teams and Core API teams


Shopify is a company growing on a global scale. I will talk about how its SRE and Production Engineering practices have developed, from two perspectives.

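The first is the history of the company as a whole: at what point the Production Engineering model was introduced, and what Resiliency, the SRE team I currently belong to, is responsible for within it.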





May 18, 2022



  1. Yuta Miyama, Apr 15th 2022
    SRE bridge the gap:
    Feature development to Core API


  2. Who I am
    Yuta Miyama
    Student - Entrepreneur

    Maker - Self-taught programmer

    Now - Around-the-world migrants


  3. Around the world migrants
    2010 - Started programming career in Japan

    2016 - Moved to Berlin

    2020 - Moved to Toronto

    2022 - Back to Japan
    Photo by Amy Humphries on Unsplash


  4. What I want to talk about
    Introduce you to Shopify’s production engineering practice

    Encourage cross-discipline moves between feature development and
    production engineering


  5. Shopify’s history
    2004 - https://snowdevil.ca

    2006 - Shopify was born on Rails 1.x

    2022 - Becoming a “Retail Operating System”
    Size - $175.4 billion GMV (Gross Merchandise Volume) in 2021

    Entrepreneurship - $3 billion in “Shopify Capital” funding since 2016

    Global - “Shopify Market” Cross border commerce from day one


  6. Production Engineering at Shopify
    Misalignments among distinct teams

    Self-service tooling for feature dev, esp.
    monitoring and alerting

    Infra components ownership centralized

    3x deploy speed and frequency (150 / day)
    2015 ~ 2016 - Shopify adopted the Production Engineering model
    [Diagram labels: Feature dev; Self service; Monitoring / Alerting]


  7. Incident Manager On Call, a.k.a. IMOC
    Core incident handling

    Follow the sun model

    Deep dive into “cracks” of distributed systems
    Edge, Ingress, Routing, Application, …
    2020 - The need for a specialized team on Resiliency
    Resiliency at Production Engineering
    Photo by Alexas_Fotos on Unsplash


  8. — Tobi Lütke, CEO, in an internal essay on why we optimize for flash sales
    “We learned to absorb these shocks and
    become stronger as a result. [..] The
    school of hard knocks has taught us well.”


  9. Complexities of Shopify
    Highly dynamic traffic

    Flash sales / bots

    Highly configurable shops


    API endpoints (Headless, ...


  10. Semian
    Load Shedder
    Toxiproxy / Game day
    Taming the large distributed systems
    Photo by Omar Flores on Unsplash
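
The slide only names the tools. As a rough illustration of the circuit-breaker idea that tools in this space build on, here is a minimal Python sketch. It is not Semian's API (Semian is a Ruby library), and the names `CircuitBreaker`, `CircuitOpenError`, `error_threshold`, `reset_timeout` and `clock` are hypothetical, chosen only for this example. The assumption is the common closed / open / half-open behaviour: after a number of consecutive failures, calls fail fast for a while, then a single trial call decides whether to close the circuit again.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying the dependency."""


class CircuitBreaker:
    """Hypothetical minimal circuit breaker, for illustration only."""

    def __init__(self, error_threshold=3, reset_timeout=10.0, clock=time.monotonic):
        self.error_threshold = error_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout      # seconds to fail fast once open
        self.clock = clock                      # injectable so tests can fake time
        self.failures = 0
        self.opened_at = None                   # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Open: protect the struggling dependency by not calling it.
                raise CircuitOpenError("circuit open, failing fast")
            # Timeout elapsed: half-open, so let this one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call re-opens the circuit immediately;
            # otherwise open once the failure threshold is reached.
            if self.opened_at is not None or self.failures >= self.error_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each request to a dependency, for example `breaker.call(fetch_inventory, shop_id)` with a hypothetical `fetch_inventory` function, and treat `CircuitOpenError` as a signal to serve a fallback instead of waiting on a dependency that is already failing.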


  11. Culture and process
    Follow the sun model also applies to Root
    Cause Analysis

    Autonomy based on “trust batteries”

    Lean on ChatOps enabling async learning
    Photo by Jay Heike on Unsplash


  12. Developing a “Journey Map”
    Observing “three different paths” for ICs
    1. Feature dev

    2. Core API maker

    3. SRE

    The analogy to “Swordsman”
    Photo by Javier Allegue Barros on Unsplash


  13. Feature development teams
    Deliver high impact product features to the merchants quickly

    Aim -> Scope -> Execute

    “How can we iterate quickly, so that we can learn?”

    Product development is the main battlefield: “Lean”


    Photo by Krys Amon on Unsplash


  14. Core API makers
    Long term bets on fundamental components

    1. Backbones of web application architecture

    2. Investing in “Commerce Primitive” components


    Photo by Jonny Gios on Unsplash


  15. SRE
    We connect the dots when a distributed system fails

    • IMOC

    • Investigate the “seams” of a running system

    • Collaborate / communicate to drive resolution on

    Photo purchased from iStock


  16. Multiplication brings value
    App dev and SRE

    • Brings high-velocity project scoping

    • Distributed system 101

    Core API dev and App dev

    • User and Maker feedback

    Core API dev and SRE

    • High-level overview vs. investing in your core interests


  17. We are all one team
    Growth brings specialization and operational efficiency

    Imagine the dysfunctional feedback loop:
    • Highly scalable system without the user growth

    • Growing features without resiliency toolkit

    • Exponential domain onboarding cost without
    simple interface to Core API
    Photo by Kier In Sight on Unsplash


  18. Chaos Engineer your org
    Hybrid (bridging) developers can disrupt specialization
    • Early adaption is quicker and better than an

    • It's easily adaptable, since the underlying failure is
    common across multiple applications

    • ICs usually have an appetite for resiliency toolkits

    More bridging developers lead to organic early planning:
    a key to both speed and quality
    Photo by Olivier Guillard on Unsplash


  19. Ensuring mobility is ...

    Shopify's Jungle Gym
    Feature Development, Core API architects, SREs


  20. What’s next?
    Shopify is attracting talent from all over the world.
    • APAC is growing strong!

    • We embrace a fully distributed environment

    Develop products that change the livelihoods of millions of entrepreneurs
    • Huge potential in the cross border commerce (my former team)

    Contribute to one of the most powerful web app stacks
    • Ruby, (not only) Rails, MySQL (KateSQL), k8s


  21. Thank you!


  22. Bonus track - How hard was the transition?
    Shopify managers accept its “Jungle Gym”

    1. Charge your “trust battery”

    2. Look for opportunities

    3. Probe with the managers
