
Efforts to Protect the Reliability of Web Services / jtf-2017-site-reliability-engineering

rrreeeyyy
August 27, 2017


Presented at #jtf2017 ( http://2017.techfesta.jp/ ) under the title "Efforts to Protect the Reliability of Web Services".



Transcript

 1. Efforts to Protect the Reliability of Web Services
  Cookpad Inc., Infrastructure Department, SRE Group
  Yoshikawa Ryota ( @rrreeeyyy )
  July Tech Festa 2017 (2017/08/27) | Yoshikawa Ryota ( @rrreeeyyy )
 2. Agenda
  • What Site Reliability Engineering (SRE) is
    • SRE activities in general
    • SRE activities at Cookpad
  • The JTF2017 theme: "The essence of IT engineering"
    • What does "engineering" mean in SRE?
    • Considering the essence of IT engineering from the SRE context
 3. Me
  @rrreeeyyy | @rrreeeyyy | https://rrreeeyyy.com | Yoshikawa Ryota
  • Yoshikawa Ryota ( @rrreeeyyy [reɪ] )
  • Cookpad Inc. (since 2017/01)
    • Infrastructure Department, SRE Group
  • Interests
    • Monitoring, time-series databases
    • Distributed systems, load balancers
  • Hobbies
    • League of Legends, Puyo Puyo Chronicle
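 5. What Is Site Reliability Engineering?
  • SRE takes responsibility for everything that affects the site's reliability
  • In particular, it solves problems through software and systems engineering
  • Example: performance
    • SREs themselves write code to fix code that has performance problems
  • Example: scalability and availability
    • Automate manual human work with code
    • Remove dependence on specific individuals to stabilize operations; scale the service without adding people
    • Keep the service stable even as its scale grows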
 6. SRE Activities
  SRE work is generally said to fall into the following four categories:
  • Software Engineering
  • Systems Engineering
  • Toil
  • Overhead
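 7. SRE Activities
  • Software Engineering
    • Writing and modifying code, plus the related design and documentation work
    • Writing automation scripts, building tools and frameworks ...
    • Adding features that improve the service's reliability ...
  • Systems Engineering
    • Work that produces a lasting improvement from a one-time effort, such as:
    • Changing the configuration of production systems ...
    • Designing service architecture for development teams ...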
 8. SRE Activities
  • Toil
    • Manual work performed repeatedly just to keep the service running
    • It has no long-term value and grows O(n) with the service ...
    • Even boring work is not toil if it has long-term value
  • Overhead
    • Administrative work not directly tied to running the service
    • Hiring, meetings, performance reviews, training ...
 9. SRE Activities at Cookpad
  • At Cookpad we carry out many of the kinds of activities described in the SRE book
  • This talk introduces a few of them:
    • How SRE works with new services
    • Improving monitoring
    • Improving load balancing
    • New-graduate training
    • ...
 10. How SRE Works with New Services
  • Larger features are split out and developed as separate services
  • Depending on the case, one SRE is assigned to the service
  • That SRE handles essentially everything related to the service's reliability
    • Design consultation and review
    • Pre-release load testing and capacity planning
    • Performance work
    • ...
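 11. Design Consultation and Review
  • Before development starts, the team shares the following and we discuss it:
    • What kind of application will be built
    • Which components are needed
    • What architecture and scaling strategy to use
  • We often ask the team to write a Design Document and use it as the basis for discussion
  • A Design Document is usually written when developing larger features (SREs write them too)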
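 12. Load Testing
  • Once the service has taken shape to some degree, we start discussing load testing
    • How many requests are expected
    • What kinds of user requests are expected
  • The SRE side builds the load-test scenarios and environment
  • While running load tests, we estimate the capacity needed in production
  • We look for where the bottlenecks are in the application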
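As a rough illustration of the capacity estimate described on this slide, the sketch below turns a load-test result into a server count. The function name, the headroom factor, and every number are hypothetical and are not taken from the talk.

```python
import math


def servers_needed(peak_rps: float, per_server_rps: float, headroom: float = 0.3) -> int:
    """Estimate how many servers are needed to serve peak_rps.

    per_server_rps is the throughput one server sustained in the load test.
    headroom is the fraction of each server's capacity kept in reserve
    (hypothetical default; choose a value that fits your own risk tolerance).
    """
    if per_server_rps <= 0 or not 0 <= headroom < 1:
        raise ValueError("per_server_rps must be > 0 and headroom in [0, 1)")
    usable_rps = per_server_rps * (1 - headroom)
    return math.ceil(peak_rps / usable_rps)


# Hypothetical inputs: 1,200 req/s expected at peak, 150 req/s per server measured.
print(servers_needed(1200, 150))
```

The estimate is only as good as the load-test scenario: if the scenario does not resemble real user requests, the measured per-server throughput will not carry over to production.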
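 13. Performance Work
  • When there is a reliability problem, we write the code ourselves
    • For example, performance problems
  • Understand the application's specification and take an appropriate approach
    • For example: which query on which screen is it?
    • What information is needed, and what constraints apply?
  • Understand the specification of the middleware in use as well
    • For example, when improving SQL, it helps to know MySQL's index structure and internal processing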
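 14. How SRE Works with New Services
  • So far, this is just doing the obvious things in the ordinary way
  • We think there is a lot of room for improvement
    • For example, running load tests automatically
  • Understand the application and write code when necessary
    • SRE is accountable for everything that affects the site's reliability
    • Systems Engineering: understanding the application's characteristics and design, and combining middleware accordingly
    • Software Engineering: understanding the specification and the middleware, and actually writing application logic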
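 15. Improving Monitoring
  • We currently use Zabbix to collect metrics
  • We want something that scales better and lets us handle metrics and alerts in a high-level language
    • We want to reach the state described in Chapter 10 of the SRE book
  • We are gradually introducing Prometheus
    • We collect metrics at a finer resolution, for example every 15 seconds
    • Metrics can be collected through a simple HTTP response
    • Metrics from the application itself can be collected relatively easily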
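To make "metrics through a simple HTTP response" concrete, here is a minimal sketch of an endpoint that serves one counter in the Prometheus text exposition format, using only the Python standard library. The metric name `app_requests_total`, its value, and the helper `scrape_once` are illustrative assumptions; a real application would more likely use an official client library, and a Prometheus server would be configured to scrape the endpoint (for example with `scrape_interval: 15s`).

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical application counter; a real service would update it per request.
REQUESTS_TOTAL = 3


def render_metrics() -> str:
    """Render the counter in the Prometheus text exposition format."""
    return (
        "# HELP app_requests_total Total requests handled by the app.\n"
        "# TYPE app_requests_total counter\n"
        f"app_requests_total {REQUESTS_TOTAL}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def scrape_once() -> str:
    """Serve on a free local port, fetch /metrics once (as a scraper would), then stop."""
    server = HTTPServer(("127.0.0.1", 0), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request("GET", "/metrics")
        body = conn.getresponse().read().decode("utf-8")
        conn.close()
        return body
    finally:
        server.shutdown()
        server.server_close()


print(scrape_once())
```

Because the monitoring system pulls plain text over HTTP, exposing application-level metrics needs nothing more than one extra route in the application.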
 16. Improving Monitoring
  • How do we choose middleware?
  • For monitoring, choose only after gaining a deep understanding of things such as time-series databases
    • What are its characteristics? Does it fit our scale? Will it scale?
    • Are any components missing? If so, are they small enough to build ourselves?
    • With Prometheus, for example, the components around long-term storage
  • Make the selection using solid Systems Engineering knowledge
 17. Improving Load Balancing
  • Until now we used haproxy to load-balance the application servers
    • Balancing was done with weighted round-robin (wrr)
  • When some endpoints are light and others heavy, the load becomes uneven
    • With wrr, requests to heavy endpoints can concentrate on a particular server
  • leastconn looks like the answer, but it has a problem when HTTP keepalive is in use
    • Connections stay open, so the connection count diverges from the number of requests actually being processed
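The unevenness described on this slide can be reproduced with a toy model. The sketch below uses plain round-robin with equal weights (a simplification of wrr) and hypothetical request costs; the server names and numbers are made up for illustration.

```python
from itertools import cycle


def round_robin_load(costs, servers):
    """Assign each request to the next server in turn and sum the cost per server."""
    load = {name: 0 for name in servers}
    for name, cost in zip(cycle(servers), costs):
        load[name] += cost
    return load


# Hypothetical traffic: a cheap endpoint (cost 1) and an expensive one (cost 10)
# happen to arrive alternately.
costs = [1, 10] * 4
print(round_robin_load(costs, ["app-1", "app-2"]))
# → {'app-1': 4, 'app-2': 40}
```

Both servers receive the same number of requests, yet one of them does ten times the work, because round-robin looks only at request order and not at how expensive each request is.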
 18. ryotarai/simproxy
  • https://github.com/ryotarai/simproxy
  • A simple reverse proxy written in Go
    • It has basic features such as health checks
  • It implements leastreq as a balancing method
    • Instead of the plain connection count, it uses the following value as its metric:
    • The number of requests a backend has received but not yet answered
    • This enables leastconn-like balancing even with HTTP keepalive
  • Systems Engineering on load-balancing algorithms and the like, plus Software Engineering to implement them
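The leastreq idea can be sketched in a few lines. This is not simproxy's code (simproxy is written in Go, and its internals are not shown in the talk); it is a simplified, single-threaded Python illustration with hypothetical class and backend names.

```python
class LeastRequestBalancer:
    """Pick the backend with the fewest in-flight (unanswered) requests."""

    def __init__(self, backends):
        self.in_flight = {name: 0 for name in backends}

    def acquire(self):
        # min() returns the first backend among ties, in insertion order.
        name = min(self.in_flight, key=self.in_flight.get)
        self.in_flight[name] += 1
        return name

    def release(self, name):
        # Call once the backend's response has been returned.
        self.in_flight[name] -= 1


lb = LeastRequestBalancer(["app-1", "app-2"])
first = lb.acquire()    # app-1 (tie, first in order)
second = lb.acquire()   # app-2
lb.release(second)      # app-2 answers quickly; app-1 is still busy
third = lb.acquire()    # app-2 again, since app-1 still has a request in flight
print(first, second, third)
# → app-1 app-2 app-2
```

The counter changes per request rather than per connection, so an idle keepalive connection does not make a backend look busy. A real proxy would also need locking around the counters and would release them on errors and timeouts.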
 19. New-Graduate Training
  • We run three days of infrastructure training for new graduates
    • Understand the structure of MySQL indexes and tune queries ...
    • Scale your own application using AWS services such as ALB and RDS ...
  • It is better when the members who write the actual application understand infrastructure
    • Having SRE check all application code does not scale
  • It may be classified as Overhead, but it is necessary for scaling the service
  • In practice, you cannot teach this without deep Systems Engineering knowledge
 20. "Engineering" in SRE
  • The work of reducing toil and scaling up services is the "Engineering" in Site Reliability Engineering.
  • Engineering work is novel and intrinsically requires human judgment. It produces a permanent improvement in your service, and is guided by a strategy. It is frequently creative and innovative, taking a design-driven approach to solving a problem—the more generalized, the better.
  — Site Reliability Engineering, Chapter 5 - Eliminating Toil
 21. "Engineering" in SRE
  • Reducing toil and scaling the service is precisely the "engineering" in SRE
  • Engineering work ...
    • does something new
    • intrinsically requires human judgment
    • gives the service a permanent improvement
    • is guided by a strategy
    • is creative and innovative
    • takes a design-driven approach to solving a problem
    • is better the more general it is
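 22. What I Think Is the Essence of IT Engineering
  • Among the engineering disciplines, IT engineering is the one that deals specifically with "information"
  • Information is (from the weblio dictionary):
    • The content or state of things and events; also, notice of them.
    • Commands or signals given to mechanical or biological systems.
    • Material or knowledge that helps in making appropriate judgments or decisions about action for a particular purpose.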
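 23. What I Think Is the Essence of IT Engineering
  • Provide value to users through means of conveying information, such as web services
    • For example, sharing recipes around the world
  • Give machines various commands and signals to make human work more efficient
    • For example, automation through programming and efficiency through good design
  • → Create value by having machines carry out information transfer, human thinking, and human judgment
  • I think the SRE culture is closer to this essence than what came before it
    • Advanced automation and systems that run on their own like living organisms are my areas of interest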
 24. Summary
  • Introduced SRE activities in general and SRE efforts at Cookpad
    • SRE work in general centers on Software and Systems Engineering, alongside Toil and Overhead
    • At Cookpad we adopt the essentials found in the SRE book while doing engineering suited to our own goals and scale
  • Discussed engineering in SRE and what I think is the essence of IT engineering
    • Reducing toil and scaling the service
    • Creating value by having machines efficiently carry out information transfer, human thinking, and human judgment