SREcon19 Americas Attendance Report #srefukuoka / srecon19-americas-report


Manabu Matsuzaki

June 12, 2019


  1. SREcon19 Americas Attendance Report • SRE meetup at Fukuoka vol2 • 2019/06/12 • @matsumana

  2. About me • Name: Manabu Matsuzaki • Work at: LINE Fukuoka Corporation • Role: SRE • Twitter: @matsumana
  3. Agenda • Conference overview • Looking back at SREcon19 Americas in photos • Highlights from a few sessions

  4. Conference overview

  5. Conference overview • Official site: https://www.usenix.org/conference/srecon19americas • Dates: 2019/03/25~27 • Venue: New York Marriott (Brooklyn, New York) • Number of sessions: about 50
 (slides and videos are publicly available)
 https://www.usenix.org/conference/srecon19americas/program
  6. Conference overview • Attendees: 646 • AM: Americas • AP: Asia/Pacific • Europe/Middle East/Africa • See also: https://www.usenix.org/conferences/byname/925
  7. Conference overview • GREE's attendance report:
 https://labs.gree.jp/blog/2019/04/18053/

  8. Looking back at SREcon19 Americas in photos

  9. Outside the venue

  10. Sponsor booths

  11. Getting ready for the keynote
 Liz, familiar from What's the Difference Between DevOps and SRE? (https://www.youtube.com/watch?v=uTEL8Ff1Zvk), was there too (she was one of this year's organizers)

  12. Live captioning during sessions

  13. Breakfast

  14. Lunch

  15. Breaks

  16. Reception party

  17. Highlights from a few sessions

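  18. Overall impressions • Very few deeply technical sessions • Most sessions were about practical SRE practices • A large share of attendees said they had already adopted SLOs and error budgets
 (several speakers asked for a show of hands at the start of their sessions)

    • SRE is of course technical work, but building an SRE culture around SLOs, error budgets, on-call and so on is also an important part of the job, so I'm glad I attended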
  19. What Breaks Our Systems: A Taxonomy of Black Swans • Session overview:
 https://www.usenix.org/conference/srecon19americas/presentation/nolan-taxonomy • Speaker:
 Laura Nolan, Slack • Slides:
 https://www.usenix.org/sites/default/files/conference/protected-files/sre19amer_slides_nolan.pdf
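  20. What Breaks Our Systems: A Taxonomy of Black Swans • What is a black swan? • An unusual event • Hard to predict • Severe impact • The session walked through several real service outages and presented patterns for preventing that kind of failure
 • Even with these techniques in place, I think it is hard to prevent every unusual, hard-to-predict event, but the talk is still a useful reference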
  21. What Breaks Our Systems: A Taxonomy of Black Swans • Types of black swans and how to defend against them • Hitting limits: load and capacity testing, monitoring • Spreading slowness: fail fast, use dashboards • Thundering herds: plan and test
  22. What Breaks Our Systems: A Taxonomy of Black Swans • Types of black swans and how to defend against them (continued) • Automation interactions: control • Cyberattacks: smaller blast radius • Dependency problems: layer and test
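The slides only say "plan and test" for thundering herds. One widely used mitigation, assumed here rather than taken from the talk, is to retry with capped exponential backoff plus random jitter, so clients that failed at the same moment do not all come back at the same instant. A minimal Python sketch; `backoff_delays` and its default parameters are hypothetical:

```python
import random
from typing import Optional


def backoff_delays(attempts: int, base: float = 0.1, cap: float = 10.0,
                   rng: Optional[random.Random] = None) -> list[float]:
    """Return one randomized sleep time (in seconds) per retry attempt.

    "Full jitter": the delay for attempt n is drawn uniformly from
    [0, min(cap, base * 2**n)], which spreads out retries from clients
    that all failed at once instead of letting them arrive together.
    """
    if rng is None:
        rng = random.Random()
    delays = []
    for n in range(attempts):
        ceiling = min(cap, base * (2 ** n))  # exponential growth, capped
        delays.append(rng.uniform(0.0, ceiling))
    return delays
```

A caller would sleep for each delay in turn between failed attempts. The cap keeps the worst-case wait bounded, and the randomness matters more than the exact curve.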
  23. Keeping the Balance:
 Internet-Scale Loadbalancing Demystified • Session overview:
 https://www.usenix.org/conference/srecon19americas/presentation/nolan-loadbalancing

    • Speakers:
 Laura Nolan, Slack
 Murali Suriar, Google • Slides:
 https://www.usenix.org/sites/default/files/conference/protected-files/sre19amer_slides_nolan-load-balancing.pdf
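  24. Keeping the Balance:
 Internet-Scale Loadbalancing Demystified • An introduction to load balancing basics • DNS round robin • Proxy-based load balancing • L2DSR (Layer 2 Direct Server Return) • L3DSR (Layer 3 Direct Server Return) • DNS geo load balancing • Client-side load balancing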
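Of the techniques listed, DNS round robin is the simplest to model: the DNS server returns all of a service's addresses but rotates their order on every query, and many clients connect to the first address in the answer. A toy Python sketch, assuming three hypothetical backends from the 192.0.2.0/24 documentation range; it ignores TTLs and resolver caching, which make the real-world spread less even:

```python
from collections import deque
from typing import Iterable


class RoundRobinResolver:
    """Toy model of DNS round robin (hypothetical class, not a real DNS API).

    Every answer contains all records, but the order is rotated by one
    position per query, so successive clients see a different first address.
    """

    def __init__(self, addresses: Iterable[str]) -> None:
        self._records = deque(addresses)
        if not self._records:
            raise ValueError("need at least one address")

    def resolve(self) -> list[str]:
        answer = list(self._records)
        self._records.rotate(-1)  # move the current first record to the back
        return answer


resolver = RoundRobinResolver(["192.0.2.1", "192.0.2.2", "192.0.2.3"])
first_choices = [resolver.resolve()[0] for _ in range(4)]
```

Because the server only reorders the answer and never sees backend load, this spreads clients roughly evenly but cannot react to a slow or overloaded backend, which is what the proxy-based and client-side approaches in the list address.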
  25. Aperture: A Non-Cooperative, Client-Side Load Balancing Algorithm • Session overview:
 https://www.usenix.org/conference/srecon19americas/presentation/oanta

    • Speaker:
 Ruben Oanta, Twitter • Slides:
 https://www.usenix.org/sites/default/files/conference/protected-files/sre19amer_slides_oanta.pdf
  26. Aperture: A Non-Cooperative, Client-Side Load Balancing Algorithm • About Finagle, the RPC framework for the JVM developed at Twitter •

    Several load balancing algorithms are available • P2C • Aperture + Least Loaded • etc. • Official documentation:
 https://twitter.github.io/finagle/guide/Clients.html#load-balancing
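The idea behind P2C ("power of two choices") with a least-loaded metric can be sketched in a few lines: pick two distinct backends at random and send the request to the one with fewer outstanding requests. This is a simplified Python illustration, not Finagle's Scala implementation; the `Backend` class and `p2c_pick` function are hypothetical:

```python
import random
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    outstanding: int = 0  # in-flight requests this client has sent to it


def p2c_pick(backends: list[Backend], rng: random.Random) -> Backend:
    """Power of two choices: sample two distinct backends uniformly at
    random and return the one with fewer outstanding requests."""
    if not backends:
        raise ValueError("no backends available")
    if len(backends) == 1:
        return backends[0]
    a, b = rng.sample(backends, 2)
    return a if a.outstanding <= b.outstanding else b


rng = random.Random(0)
pool = [Backend("a", 0), Backend("b", 1), Backend("c", 10)]
chosen = p2c_pick(pool, rng)
chosen.outstanding += 1  # caller decrements this again when the response arrives
```

Sampling two candidates instead of scanning every backend keeps each decision cheap, yet a backend that is strictly busier than all others can never be chosen, because any pair containing it also contains a less loaded one.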
  27. Aperture: A Non-Cooperative, Client-Side Load Balancing Algorithm • Results of introducing Aperture load balancers to improve server resource usage

    • 78% reduction in standard deviation for requests/sec • 91% drop in aggregate connections (~280k to ~25k) • 75% fewer failures • ~20% reduction in 99.9th percentile latency • 20~25% less CPU used • Total GC time cut in half
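Aperture balancing means each client talks to only a subset (its "aperture") of the servers rather than to all of them, which is where the large drop in aggregate connections comes from. The sketch below shows a simplified deterministic subsetting scheme, assuming each client knows its own index and the total number of clients; the function and parameter names are hypothetical, and Finagle's real implementation is more involved (see the official documentation linked above):

```python
def aperture(client_index: int, num_clients: int,
             num_servers: int, width: int) -> list[int]:
    """Return the indices of the servers this client should connect to.

    Clients are spaced evenly around a ring of servers. Each client takes a
    contiguous window of `width` servers starting at its own offset, so the
    total connection count is num_clients * width instead of
    num_clients * num_servers.
    """
    if not 0 <= client_index < num_clients:
        raise ValueError("client_index out of range")
    if not 1 <= width <= num_servers:
        raise ValueError("width must be between 1 and num_servers")
    start = (client_index * num_servers) // num_clients
    return [(start + k) % num_servers for k in range(width)]


def connections_per_server(num_clients: int, num_servers: int, width: int) -> list[int]:
    """Count how many clients end up connected to each server."""
    loads = [0] * num_servers
    for i in range(num_clients):
        for s in aperture(i, num_clients, num_servers, width):
            loads[s] += 1
    return loads
```

For example, with 10 clients, 4 servers and a width of 2, each client holds 2 connections instead of 4 (20 in total instead of 40), and `connections_per_server(10, 4, 2)` shows every server receiving 5 of them.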
  28. Tracing, Fast and Slow: Digging into and Improving Your Web Service's Performance • Session overview:
 https://www.usenix.org/conference/srecon19americas/presentation/root • Speaker:
 Lynn Root, Spotify • Slides:
 https://www.usenix.org/sites/default/files/conference/protected-files/sre19amer_slides_root.pdf
  29. Tracing, Fast and Slow: Digging into and Improving Your Web Service's Performance • Explains the basics of distributed tracing • Recommended for anyone who hasn't used Zipkin or a similar tool yet
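The basic data model behind Zipkin-style distributed tracing is small: every request gets a trace ID shared by all of its spans, each unit of work gets its own span ID, and a child span records its parent's span ID. These IDs travel between services in request headers. A minimal Python sketch, assuming Zipkin's B3 multi-header names; the helper names are hypothetical, and a real service would normally use a tracing library instead of hand-rolling this:

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpanContext:
    trace_id: str                    # shared by every span in one request's trace
    span_id: str                     # unique to this unit of work
    parent_id: Optional[str] = None  # span that caused this one (None for the root)


def new_id() -> str:
    """64-bit random identifier as 16 lowercase hex characters."""
    return secrets.token_hex(8)


def start_root() -> SpanContext:
    """Span for the first service that sees the incoming request."""
    return SpanContext(trace_id=new_id(), span_id=new_id())


def child_of(parent: SpanContext) -> SpanContext:
    """Same trace, fresh span ID, and a pointer back to the parent span."""
    return SpanContext(parent.trace_id, new_id(), parent.span_id)


def to_b3_headers(ctx: SpanContext) -> dict[str, str]:
    """Headers to attach to an outgoing call (B3 multi-header format)."""
    headers = {"X-B3-TraceId": ctx.trace_id, "X-B3-SpanId": ctx.span_id, "X-B3-Sampled": "1"}
    if ctx.parent_id is not None:
        headers["X-B3-ParentSpanId"] = ctx.parent_id
    return headers
```

Because every span carries the same trace ID and a parent pointer, a tracing backend can reassemble the spans reported by different services into one tree and show where the request's time was spent.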

  30. What I Wish I Knew before Going On-call • Session overview:
 https://www.usenix.org/conference/srecon19americas/presentation/shu • Speakers:
 Chie Shu and Wenting Wang, Yelp • Slides:
 https://www.usenix.org/sites/default/files/conference/protected-files/srecon19americas_slides_wang.pdf
  31. What I Wish I Knew before Going On-call • Yelp's on-call onboarding process

    • Personally, this was my favorite session
  32. Fully prepared before your first on-call?

  33. Why not? • Afraid of unknown situations • Lack of confidence

    • Poor understanding of systems • Lack of protocol • Afraid of asking for help • etc.
  34. Misconceptions about on-call • You have to know everything
 → No • You have to solve every problem by yourself
 → No • etc.
  35. The onboarding process needs the right goals

  36. On-call onboarding goals • Remove excessive fear of on-call • Make on-call more productive and efficient

  37. To get there, building a training program is recommended

  38. How to build a training program • Create a curriculum • Don't cram in too much information • Introduction • Simple diagrams • System overview

    • What the system does • What it depends on
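  39. How to build a training program • Explain the types of alerts • Also document how to respond to each • Past postmortems make good teaching material • Explain the tools you need • Monitoring tools, etc.

    • Also explain how to read the dashboards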
  40. Knowledge sharing is needed too

  41. Knowledge sharing • How past incidents were resolved • Learning from past postmortems • Running incident simulations as a group • Doing them in a staging environment or similar keeps them safe

  42. Runbooks matter too

  43. What to include in a runbook • Technical content • Impact assessment • Commands to run • Non-technical content • Each person's role

    • Communication methods • Escalation policy
  44. Thank you :)