Slide 1

Slide 1 text

SREcon19 Americas Attendance Report SRE meetup at Fukuoka vol2 2019/06/12 @matsumana

Slide 2

Slide 2 text

About me • Name: Manabu Matsuzaki • Work at: LINE Fukuoka Corporation • Role: SRE • Twitter: @matsumana

Slide 3

Slide 3 text

Agenda • Conference overview • Looking back at SREcon19 Americas in photos • Introducing a few sessions

Slide 4

Slide 4 text

Conference overview

Slide 5

Slide 5 text

Conference overview • Official site:
 https://www.usenix.org/conference/srecon19americas • Dates: 2019/03/25-27 • Venue: New York Marriott (Brooklyn, New York) • Number of sessions: about 50
 (slides and videos are published)
 https://www.usenix.org/conference/srecon19americas/program

Slide 6

Slide 6 text

Conference overview • Attendees: 646 • AM: Americas • AP: Asia/Pacific • EMEA: Europe/Middle East/Africa see also: https://www.usenix.org/conferences/byname/925

Slide 7

Slide 7 text

Conference overview • GREE's attendance report:
 https://labs.gree.jp/blog/2019/04/18053/

Slide 8

Slide 8 text

Looking back at SREcon19 Americas in photos

Slide 9

Slide 9 text

Venue exterior

Slide 10

Slide 10 text

Sponsor booths

Slide 11

Slide 11 text

Keynote setup in progress. Liz, familiar from "What's the Difference Between DevOps and SRE?"
 (https://www.youtube.com/watch?v=uTEL8Ff1Zvk), was there too (an Organizer of this conference)

Slide 12

Slide 12 text

Live captions during a session

Slide 13

Slide 13 text

Breakfast

Slide 14

Slide 14 text

Lunch

Slide 15

Slide 15 text

Break

Slide 16

Slide 16 text

Reception party

Slide 17

Slide 17 text

Introducing a few sessions

Slide 18

Slide 18 text

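Overall impressions • There were very few purely technical sessions • Most sessions covered hands-on practices for working as an SRE • Quite a lot of attendees said they had already adopted SLOs and error budgets
 (several speakers asked the audience for a show of hands at the start of their sessions) • SRE work is of course technical, but I think building an SRE culture around SLOs, error budgets, on-call, and so on is an important part of the job too, so I'm glad I attended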
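
For readers who have not worked with error budgets yet, the arithmetic behind the term can be sketched as below. This is a minimal illustration and not something shown at the conference: the function names, the 30-day window, and the example figures are all assumptions made for this sketch.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Allowed downtime, in minutes, for an availability SLO over a window.

    `slo` is a fraction such as 0.999; the 30-day default is an assumption.
    """
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo) * total_minutes


def budget_remaining(slo: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of a request-based error budget still unspent.

    A negative result means the budget has been overspent.
    """
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    allowed_failures = (1.0 - slo) * total_requests
    return 1.0 - failed_requests / allowed_failures


if __name__ == "__main__":
    # Hypothetical numbers: a 99.9% SLO and 250 failures out of 1,000,000 requests.
    print(round(error_budget_minutes(0.999), 1), "minutes of budget per 30 days")
    print(round(budget_remaining(0.999, 1_000_000, 250), 2), "of the budget left")
```

The point of the second function is that the budget is something a team spends: once the remaining fraction approaches zero, the usual practice is to slow down risky changes.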

Slide 19

Slide 19 text

What Breaks Our Systems: A Taxonomy of Black Swans • Session overview
 https://www.usenix.org/conference/srecon19americas/presentation/nolan-taxonomy • Speaker
 Laura Nolan, Slack • Slides
 https://www.usenix.org/sites/default/files/conference/protected-files/sre19amer_slides_nolan.pdf

Slide 20

Slide 20 text

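What Breaks Our Systems: A Taxonomy of Black Swans • What is a black swan? • An outlier event • Hard to predict • Severe impact • Using several real service outages as examples, this session introduced
 patterns for preventing that kind of failure • Even with the techniques from the talk in place, I think it is hard to prevent every hard-to-predict outlier event, but they are still a useful reference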

Slide 21

Slide 21 text

What Breaks Our Systems: A Taxonomy of Black Swans • Types of black swans and how to defend against them • Hitting limits: load and capacity testing, monitoring • Spreading slowness: fail fast, use dashboards • Thundering herds: plan and test
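
The slide only says "Plan and test" for thundering herds. One widely used mitigation worth planning for is retrying with exponential backoff and random jitter, so that clients which failed at the same moment do not all come back at the same moment. The sketch below is an illustration under that assumption, not a technique quoted from the talk; the function name and the default values are hypothetical.

```python
import random


def backoff_delays(attempts, base=0.1, cap=10.0, rng=None):
    """Return one randomized delay (in seconds) per retry attempt.

    Each delay is drawn uniformly from [0, min(cap, base * 2**attempt)],
    so the upper bound grows exponentially while the jitter spreads
    clients apart in time.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        delays.append(rng.uniform(0.0, ceiling))
    return delays


if __name__ == "__main__":
    # Two hypothetical clients retrying after the same failure get different schedules.
    print(backoff_delays(5, rng=random.Random(1)))
    print(backoff_delays(5, rng=random.Random(2)))
```

Passing an explicit `rng` keeps the schedule reproducible, which helps when testing the retry behavior under load.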

Slide 22

Slide 22 text

What Breaks Our Systems: A Taxonomy of Black Swans • Types of black swans and how to defend against them • Automation interactions: control • Cyberattacks: smaller blast radius • Dependency problems: layer and test

Slide 23

Slide 23 text

Keeping the Balance:
 Internet-Scale Loadbalancing Demystified • Session overview
 https://www.usenix.org/conference/srecon19americas/presentation/nolan-loadbalancing • Speakers
 Laura Nolan, Slack
 Murali Suriar, Google • Slides
 https://www.usenix.org/sites/default/files/conference/protected-files/sre19amer_slides_nolan-load-balancing.pdf

Slide 24

Slide 24 text

Keeping the Balance:
 Internet-Scale Loadbalancing Demystified • Introduces the basics of load balancing • DNS round robin • Proxy-based • L2DSR • L3DSR • DNS geo load balancing • Client-side load balancing
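
As a reminder of how the first item in that list behaves, here is a minimal sketch of DNS round robin: the server answers with the same address set every time but rotates the order, and clients that take the first address end up spread across the backends. The class name and the addresses (taken from the 192.0.2.0/24 documentation range) are assumptions for illustration; real resolvers and client caches complicate the picture.

```python
from collections import deque


class RoundRobinResolver:
    """Toy model of a DNS server that rotates its answer list per query."""

    def __init__(self, addresses):
        if not addresses:
            raise ValueError("at least one address is required")
        self._addresses = deque(addresses)

    def resolve(self):
        """Return all addresses, then rotate so the next query starts one later."""
        answer = list(self._addresses)
        self._addresses.rotate(-1)
        return answer


if __name__ == "__main__":
    resolver = RoundRobinResolver(["192.0.2.1", "192.0.2.2", "192.0.2.3"])
    for _ in range(4):
        # A naive client connects to the first address in each answer.
        print(resolver.resolve()[0])
```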

Slide 25

Slide 25 text

Aperture: A Non-Cooperative, Client-Side Load Balancing Algorithm • Session overview
 https://www.usenix.org/conference/srecon19americas/presentation/oanta • Speaker
 Ruben Oanta, Twitter • Slides
 https://www.usenix.org/sites/default/files/conference/protected-files/sre19amer_slides_oanta.pdf

Slide 26

Slide 26 text

Aperture: A Non-Cooperative, Client-Side Load Balancing Algorithm • A talk about Finagle (web framework), which is developed at Twitter • Several load balancing algorithms can be selected • P2C • Aperture + Least Loaded • etc • Official documentation
 https://twitter.github.io/finagle/guide/Clients.html#load-balancing
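
P2C stands for "power of two choices": pick two backends at random and send the request to the less loaded of the two. The sketch below shows the idea only. It is not Finagle's implementation, and the function name and the use of a plain dict of outstanding-request counts as the load metric are assumptions made for this example.

```python
import random


def pick_p2c(loads, rng=None):
    """Pick a backend using power of two choices.

    `loads` maps backend name -> number of outstanding requests.
    Two distinct backends are sampled at random and the one with
    the lower load wins (ties go to the first one sampled).
    """
    rng = rng or random
    if not loads:
        raise ValueError("no backends available")
    backends = list(loads)
    if len(backends) == 1:
        return backends[0]
    first, second = rng.sample(backends, 2)
    return first if loads[first] <= loads[second] else second


if __name__ == "__main__":
    # Hypothetical load snapshot; "a" is the busiest and is never chosen.
    snapshot = {"a": 9, "b": 1, "c": 2}
    print([pick_p2c(snapshot) for _ in range(10)])
```

Compared with scanning every backend for the global minimum, this needs only two load lookups per request while still steering traffic away from the busiest hosts.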

Slide 27

Slide 27 text

Aperture: A Non-Cooperative, Client-Side Load Balancing Algorithm • Implementing Aperture load balancers improved server resource usage • 78% reduction in standard deviation for requests/sec • 91% drop in aggregate connections (~280k to ~25k) • 75% fewer failures • ~20% reduction in latency at 99.9%tile • 20~25% less CPU used • Total GC time cut in half
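
One way to see why aggregate connections can drop so sharply: if every client connects to every server, connections grow as clients × servers, whereas an aperture limits each client to a small subset. The sketch below picks that subset at random, which is a simplification and not a description of how Finagle chooses it; the function names and the cluster sizes are hypothetical and unrelated to the figures on the slide.

```python
import random


def choose_aperture(servers, size, rng=None):
    """Return a random subset of `servers` that one client will connect to."""
    rng = rng or random
    return rng.sample(servers, min(size, len(servers)))


def total_connections(num_clients, num_servers, aperture_size=None):
    """Aggregate connection count with or without an aperture limit."""
    if aperture_size is None:
        per_client = num_servers  # full mesh: every client talks to every server
    else:
        per_client = min(aperture_size, num_servers)
    return num_clients * per_client


if __name__ == "__main__":
    servers = [f"server-{i}" for i in range(50)]
    print(choose_aperture(servers, 5))
    print(total_connections(100, 50))      # full mesh
    print(total_connections(100, 50, 5))   # aperture of 5 per client
```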

Slide 28

Slide 28 text

Tracing, Fast and Slow: Digging into and Improving Your Web Service's Performance • Session overview
 https://www.usenix.org/conference/srecon19americas/presentation/root • Speaker
 Lynn Root, Spotify • Slides
 https://www.usenix.org/sites/default/files/conference/protected-files/sre19amer_slides_root.pdf

Slide 29

Slide 29 text

Tracing, Fast and Slow: Digging into and Improving Your Web Service's Performance • Explains the basics of distributed tracing • Recommended for anyone who has not yet used Zipkin or similar tools
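
For anyone new to the topic, the core idea of distributed tracing fits in a few lines: every unit of work is a span, all spans in one request share a trace ID, and each child span records its parent's span ID so the call tree can be rebuilt later. The sketch below is a toy data model written for this report, not Zipkin's API; the class, its fields, and the ID lengths are assumptions.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """One timed unit of work inside a trace."""

    name: str
    trace_id: str
    parent_id: Optional[str] = None
    span_id: str = field(default_factory=lambda: secrets.token_hex(8))
    start: float = field(default_factory=time.monotonic)
    end: Optional[float] = None

    def child(self, name: str) -> "Span":
        """Start a child span that shares this span's trace ID."""
        return Span(name=name, trace_id=self.trace_id, parent_id=self.span_id)

    def finish(self) -> float:
        """Mark the span as done and return its duration in seconds."""
        self.end = time.monotonic()
        return self.end - self.start


def start_trace(name: str) -> Span:
    """Create the root span of a new trace with a fresh trace ID."""
    return Span(name=name, trace_id=secrets.token_hex(16))


if __name__ == "__main__":
    root = start_trace("GET /playlist")   # hypothetical request name
    db = root.child("db.query")
    db.finish()
    root.finish()
    print(root.trace_id == db.trace_id, db.parent_id == root.span_id)
```

In a real system the trace ID and parent span ID travel between services in request headers, and finished spans are shipped to a collector for visualization.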

Slide 30

Slide 30 text

What I Wish I Knew before Going On-call • Session overview
 https://www.usenix.org/conference/srecon19americas/presentation/shu • Speakers
 Chie Shu and Wenting Wang, Yelp • Slides
 https://www.usenix.org/sites/default/files/conference/protected-files/srecon19americas_slides_wang.pdf

Slide 31

Slide 31 text

What I Wish I Knew before Going On-call • The on-call onboarding process at Yelp • Personally, this was my favorite session

Slide 32

Slide 32 text

Fully prepared before your first on-call?

Slide 33

Slide 33 text

Why not? • Afraid of unknown situations • Lack of confidence • Poor understanding of systems • Lack of protocol • Afraid of asking for help • etc

Slide 34

Slide 34 text

Misconceptions about on-call • You have to know everything
 → No • You have to solve every problem yourself
 → No • etc

Slide 35

Slide 35 text

The onboarding process needs to have the right goals set for it

Slide 36

Slide 36 text

Goals of onboarding • Remove unnecessary fear of on-call • A more productive and efficient on-call

Slide 37

Slide 37 text

To get there, creating a training program is recommended

Slide 38

Slide 38 text

How to build a training program • Create a curriculum • Don't cram in too much information • Introduction • Simple diagrams • System overview • What the system does • What it depends on

Slide 39

Slide 39 text

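How to build a training program • Explain the types of alerts • Also write down how to respond • Using past postmortems for the explanation works well • Explain the necessary tools • Monitoring tools, etc. • Also explain how to read the dashboards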

Slide 40

Slide 40 text

Knowledge sharing is needed too

Slide 41

Slide 41 text

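Knowledge sharing • How past incidents were resolved • Learn from past postmortems • Run incident simulations with several people • Running them in a staging environment or similar keeps them safe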

Slide 42

Slide 42 text

Runbooks matter too

Slide 43

Slide 43 text

What to include in a runbook • Technical content • Impact assessment • Commands to run • Non-technical content • Each person's role • How to communicate • Escalation policy

Slide 44

Slide 44 text

Thank you :)