SRECont16

Shared notes on the SREcon16 sessions that caught my interest

tsuyoshi nakamura

August 31, 2016

Transcript

  1. SREcon
    2016-08-26
    In-house study session
    Tsuyoshi Nakamura

  2. https://www.usenix.org/conference/srecon16

  3. I first heard about SREcon at a study session, then worked hard to
    follow the videos and slides of each session

  4. Agenda
    1.  Learn about SRE at other companies
        1.  In case of Microsoft Azure SRE
        2.  In case of New Relic
        3.  In case of Pinterest
        4.  In case of Netflix
    2.  Closing summary, of sorts

  5. In case of Microsoft Azure SRE
    Caskey L. Dickson and Jake Welch
    https://www.usenix.org/sites/default/files/conference/protected-files/srecon16_slides_welch.pdf
    https://www.usenix.org/conference/srecon16/program/presentation/dickson

  6. Service Roast
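    Purpose: understand and clearly state the product's issues, such as
    shortcomings, design oversights, and problems everyone already knows about
    Grasp the lifecycle of the whole service, from Dev through disaster recovery
    List the points that need improving and keep working on them continuously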

  7. Why do it?
    •  Builds relationships and trust between the teams
    •  SRE learns about the service
    •  Dramatically speeds up ‘newbie to expert’ process
    •  Grows the product at an accelerating pace
    •  Exposes details that otherwise would be difficult (or painful) to learn of
    •  Eliminates "secret sauce" knowledge that lives only in people's heads
    •  Creates a shared backlog of improvements
    •  Issues are shared

  8. Tone
    •  Not an attack on the service
    •  Not a judgment of past choices
    •  Focus on ‘How’ questions not ‘Why’ questions
    •  Why’s can be seen as judgmental
    •  Every participant must understand this
    •  Managing emotions is critical to a safe discussion environment

  10. In case of New Relic
    Alice Goldfuss
    https://www.usenix.org/conference/srecon16/program/presentation/goldfuss
    https://www.usenix.org/sites/default/files/conference/protected-files/srecon16_slides_goldfuss_0.pdf

  12. Summary
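    •  A team structure adapted from the incident-response processes of government and the military
    •  An application of the Incident Command System
    •  Apparently quite well known in the US
    •  Each role is clearly defined
    •  Overall impact is given particular consideration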

  13. In case of Pinterest
    Ernie Souhrada
    https://www.usenix.org/conference/srecon16/program/presentation/souhrada
    https://www.usenix.org/sites/default/files/conference/protected-files/srecon16_slides_souhrada.pdf

  15. History
    •  Now 100% hosted on AWS, but previously ran in an on-premises environment
    •  This is from before cloud services became widespread
    •  1. Individual servers matter.
    •  2. Failure is expensive, so it must be prevented.
    •  3. Capacity planning can make or break you.
    •  4. Sometimes your destiny is still outside your control.
    Operational Materialism
    (in Japanese, perhaps 運用物質主義?)

  16. Now
    •  1. Cloud servers can, and do, fail at any time, for any reason.
    •  2. Trying to prevent this server failure is an endless source of suffering
    for SREs and DBAs alike.
    •  Trying to prevent server failure leads only to suffering
    •  3. Accepting the impermanence of our servers, we should design
    systems that are failure-resilient, not failure-resistant.
    •  Cloud-based servers can fail at any time, for any reason.
    •  Automated replacement
    •  Configuration management tools
    •  4. We can break the cycle of suffering and create a better experience for
    end users, internal customers, and colleagues
    Operational Buddhism
    Keep watching over things with a calm, Buddha-like mind? (lol)
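
The "failure-resilient, not failure-resistant" idea with automated replacement can be sketched roughly as below. This is only an illustration under assumptions, not Pinterest's tooling: `Server`, `launch_replacement`, and `reconcile` are hypothetical names, and `launch_replacement` stands in for whatever provisioning and configuration management step a real system would run.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    healthy: bool = True


# Hypothetical name generator; a real system would get IDs from its provider.
_ids = itertools.count(1)


def launch_replacement() -> Server:
    """Hypothetical stand-in for provisioning plus configuration management."""
    return Server(name=f"db-{next(_ids)}")


def reconcile(fleet: list[Server], desired: int) -> list[Server]:
    """Discard unhealthy servers and launch new ones until the fleet is back at size.

    No attempt is made to repair or protect an individual server.
    """
    survivors = [s for s in fleet if s.healthy]
    while len(survivors) < desired:
        survivors.append(launch_replacement())
    return survivors


# Simulate one server failing "at any time, for any reason".
fleet = [launch_replacement() for _ in range(3)]
fleet[1].healthy = False
fleet = reconcile(fleet, desired=3)
print([s.name for s in fleet])
```

The failed server is simply dropped and a fresh one takes its place, so the fleet size is restored without anyone trying to prevent the failure.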

  17. In case of Netflix
    Jonah Horowitz
    https://www.usenix.org/conference/srecon16/program/presentation/horowitz
    https://www.usenix.org/sites/default/files/conference/protected-files/srecon16_slides_horowitz.pdf

  19. topic
    •  Service in 190 countries, yet only 5 SREs!?
    •  SREs are expensive & hard to find
    •  Freedom & Responsibility

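  21. Closing summary, of sorts
    •  Naturally, what the role covers differs from company to company
    •  As I also felt with DevOps, when you grow a service at speed, some work
    inevitably drops between teams, like a bloop hit landing between fielders
    •  It feels like SRE starts from how you go about picking up those dropped balls
    •  If you act with the team as your priority, you naturally end up doing SRE-like tasks;
    I partly think the "SRE" tag was attached so that this work gets properly valued
    •  Is mindset perhaps more important than technical skill?!
    •  It also seems to include a good many PM-like elements
    •  “SRE should not be a Servant”
    •  Useful resources for learning more
    •  https://github.com/dastergon/awesome-sre/blob/master/README.md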
