Efforts to Protect the Reliability of Web Services (Web サービスの信頼性を守るための取り組み) / jtf-2017-site-reliability-engineering

Presented at #jtf2017 ( http://2017.techfesta.jp/ ) under the title "Efforts to Protect the Reliability of Web Services".

rrreeeyyy

August 27, 2017

Transcript

  1. Efforts to Protect the Reliability of Web Services
     Cookpad Inc., Infrastructure Department, SRE Group
     Yoshikawa Ryota ( @rrreeeyyy )
     July Tech Festa 2017 (2017/08/27) | Yoshikawa Ryota ( @rrreeeyyy )
  2. Agenda
     • What is Site Reliability Engineering (SRE)?
       • SRE activities in general
       • SRE activities at Cookpad
     • The JTF2017 theme: "The essence of IT engineering"
       • What does "engineering" mean in SRE?
       • Considering the essence of IT engineering from the SRE context
  3. Me
     @rrreeeyyy · https://rrreeeyyy.com · Yoshikawa Ryota
     • Yoshikawa Ryota ( @rrreeeyyy [reɪ] )
     • Cookpad Inc. (since 2017/01)
       • Infrastructure Department, SRE Group
     • Interests
       • Monitoring, time-series databases
       • Distributed systems, load balancers
     • Hobbies
       • League of Legends, Puyo Puyo Chronicle
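  5. What is Site Reliability Engineering?
     • SRE is responsible for every aspect of a site's reliability
     • In particular, it solves problems by applying software and systems engineering
     • Example: performance
       • SREs themselves write code to fix code that has performance problems
     • Example: scalability and availability
       • Automate manual human work with code
       • Remove dependence on specific individuals to stabilize operations, and scale the service without adding people
       • Keep the service stable even as its scale grows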
  6. SRE activities
     SRE activities are generally said to fall into four categories:
     • Software Engineering
     • Systems Engineering
     • Toil
     • Overhead
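  7. SRE activities
     • Software Engineering
       • Writing or modifying code, plus the associated design and documentation work
       • Writing automation scripts, creating tools and frameworks ...
       • Adding features that improve the service's reliability ...
     • Systems Engineering
       • Work that produces a lasting improvement from a one-time effort, such as:
       • Configuration changes to production systems ...
       • Service architecture design for development teams ...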
  8. SRE activities
     • Toil
       • Manual work performed repeatedly just to keep the service running
       • It has no long-term value and is O(n) with respect to service growth ...
       • Even boring work is not toil if it has long-term value
     • Overhead
       • Administrative work not directly tied to running the service
       • Hiring, meetings, reviews, training ...
  9. SRE activities at Cookpad
     • Cookpad carries out many of the kinds of activities described in the SRE book
     • A few of Cookpad's SRE activities are introduced here
       • How SRE engages with new services
       • Improving monitoring
       • Improving load balancing
       • New-graduate training
       • ...
  10. How SRE engages with new services
     • Larger features are split out and developed as separate services
     • Depending on the case, one SRE is assigned
     • That SRE handles essentially everything related to the service's reliability
       • Design consultation and review
       • Pre-release load testing and capacity planning
       • Approaches to performance
       • ...
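  11. Design consultation and review
     • Before development starts, the team shares and discusses points such as:
       • What kind of application will be built
       • What components are needed
       • What architecture and scaling strategy to adopt
     • The team writes a Design Document, which serves as the basis for discussion
       • A Design Document is usually written when developing larger features (SREs write them too)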
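  12. Load testing
     • Once the service has taken shape to some degree, load testing is discussed
       • How many requests are expected
       • What kinds of user requests are expected
     • The SRE side builds the load test scenarios and environment
     • Production capacity is estimated while running the load tests
     • The tests also show where the application's bottlenecks are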
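Editor's sketch: the capacity estimate mentioned on slide 12 can be illustrated with simple arithmetic. The function name, the headroom parameter, and all numbers below are hypothetical and do not come from the talk; the sketch only assumes that a load test yields a sustainable requests-per-second figure for one server.

```python
import math


def servers_needed(peak_rps: float, per_server_rps: float, headroom: float = 0.3) -> int:
    """Estimate how many servers are required for an expected peak.

    peak_rps:       expected peak requests per second (an assumption agreed with developers)
    per_server_rps: throughput one server sustained in the load test
    headroom:       fraction of each server's capacity kept in reserve (hypothetical default)
    """
    if per_server_rps <= 0:
        raise ValueError("per_server_rps must be positive")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    usable_rps = per_server_rps * (1 - headroom)
    return math.ceil(peak_rps / usable_rps)


# Hypothetical figures: 1200 req/s expected at peak, 150 req/s measured per server.
estimate = servers_needed(peak_rps=1200, per_server_rps=150, headroom=0.3)
print(f"estimated servers: {estimate}")
```

A real estimate would also account for bottlenecks outside the application servers, such as the database, which the load test is meant to reveal.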
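  13. Approaches to performance
     • When there is a reliability problem, SREs write the code themselves
       • For example, performance problems
     • Understand the application's specifications and take an appropriate approach
       • For example, which query serves which screen
       • What information is wanted and what constraints apply
     • Understand the specifications of the middleware in use and take an appropriate approach
       • For example, improving SQL is easier with knowledge of MySQL's index structure and internal processing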
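  14. How SRE engages with new services
     • All of this is simply doing the obvious things in the ordinary way
     • We believe there is still plenty of room for improvement
       • For example, running load tests automatically
     • Understand the application and write code when necessary
       • SRE bears responsibility for everything that affects the site's reliability
       • Systems Engineering: understand the application's characteristics and design, and combine middleware accordingly
       • Software Engineering: understand the specifications and middleware, and actually write application logic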
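  15. Improving monitoring
     • Zabbix is currently used to collect metrics
     • We want something that scales better and lets us handle metrics and alerts in a high-level language
       • We want to reach the state described in Chapter 10 of the SRE book
     • Prometheus is being introduced gradually
       • Metrics are collected at a finer resolution, for example every 15 seconds
       • Metrics can be collected through a simple HTTP response
       • Metrics from the application itself can be collected relatively easily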
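Editor's sketch: slide 15 notes that Prometheus collects metrics from a plain HTTP response. The example below shows what such a response can look like, using the Prometheus text exposition format and only the Python standard library. The metric name, labels, counter values, and the `serve` helper are hypothetical and not taken from the talk; a real application would more likely use an official client library.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical application counters; a real app would update these while serving traffic.
REQUEST_COUNTS = {("GET", "200"): 1027, ("POST", "500"): 3}


def render_metrics(counts):
    """Render counters in the Prometheus text exposition format."""
    lines = [
        "# HELP app_http_requests_total Total HTTP requests handled.",
        "# TYPE app_http_requests_total counter",
    ]
    for (method, code), value in sorted(counts.items()):
        lines.append(
            f'app_http_requests_total{{method="{method}",code="{code}"}} {value}'
        )
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serve the current counters at /metrics for a Prometheus server to scrape."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUEST_COUNTS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Start the metrics endpoint; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), MetricsHandler).serve_forever()
```

Calling `serve()` exposes the endpoint locally. The collection interval, such as the 15 seconds mentioned on the slide, is set on the Prometheus server side, not in the application.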
  16. Improving monitoring
     • How to choose middleware
       • For monitoring, choose only after gaining a deep understanding of areas such as time-series databases
       • What are its characteristics? Does it fit our scale? Will it scale?
       • Are any components missing? If so, are they small enough to build ourselves?
       • With Prometheus, for example, the components around Long-Term Storage
     • Make the selection using solid Systems Engineering knowledge
  17. Improving load balancing
     • Previously, haproxy was used to load balance the application servers
       • Balancing was done with Weighted Round-robin (wrr)
     • When some endpoints are light and others heavy, load becomes uneven
       • With wrr, requests to heavy endpoints can concentrate on particular servers
     • leastconn looks like the answer, but it has a problem when HTTP keepalive is in use
       • Connections stay open, so the connection count diverges from the number of requests actually being processed
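Editor's sketch: the skew described on slide 17 can be reproduced with a toy model. It assumes two servers of equal weight (so weighted round-robin reduces to plain rotation) and a hypothetical traffic pattern in which a cheap endpoint (cost 1) and an expensive one (cost 10) alternate. The costs and the function name are illustrative only.

```python
from itertools import cycle


def round_robin_load(request_costs, n_servers):
    """Assign requests to servers in strict rotation and return the total cost per server."""
    totals = [0] * n_servers
    for server, cost in zip(cycle(range(n_servers)), request_costs):
        totals[server] += cost
    return totals


# Hypothetical traffic: light (cost 1) and heavy (cost 10) requests alternating.
requests = [1, 10] * 4
print(round_robin_load(requests, 2))  # → [4, 40]
```

Each server receives the same number of requests, yet one of them ends up with every heavy request. Rotation ignores how much work each request represents.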
  18. ryotarai/simproxy
     • https://github.com/ryotarai/simproxy
     • A simple reverse proxy written in Go
       • It has basic features such as health checks
     • It implements leastreq as a Balancing Method
       • Instead of the raw connection count, it uses the following value as its metric:
       • The number of requests a backend has received but not yet responded to
       • This enables leastconn-like balancing even with HTTP keepalive
     • Systems Engineering for the load balancing algorithm, and Software Engineering to implement it
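Editor's sketch: the leastreq idea on slide 18 is to pick the backend with the fewest requests still awaiting a response, which does not depend on how many keepalive connections are open. The class below is a minimal Python illustration of that selection rule only. It is not simproxy's code (simproxy is written in Go), and the class and method names are hypothetical.

```python
import threading


class LeastRequestBalancer:
    """Choose the backend with the fewest in-flight (unanswered) requests."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self._inflight = {backend: 0 for backend in backends}
        self._lock = threading.Lock()

    def acquire(self):
        """Pick a backend and count the new request as in flight."""
        with self._lock:
            # Ties go to the backend listed first.
            backend = min(self._inflight, key=self._inflight.get)
            self._inflight[backend] += 1
            return backend

    def release(self, backend):
        """Call when the backend's response has been received."""
        with self._lock:
            if self._inflight[backend] <= 0:
                raise RuntimeError(f"no in-flight request recorded for {backend}")
            self._inflight[backend] -= 1

    def inflight(self):
        """Return a snapshot of in-flight counts per backend."""
        with self._lock:
            return dict(self._inflight)
```

A proxy built on this would call `acquire()` before forwarding a request and `release()` once the response arrives, so an idle keepalive connection contributes nothing to a backend's count.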
  19. New-graduate training
     • New graduates receive three days of infrastructure training
       • Understanding MySQL's index structure and tuning queries ...
       • Scaling their own application using AWS services such as ALB and RDS ...
     • It is better when the people who actually write the application understand infrastructure
       • Having SREs check all application code does not scale
     • It may be classified as Overhead, but it is necessary for scaling the service
     • In practice, it cannot be taught without deep Systems Engineering knowledge
  20. "Engineering" in SRE
     • The work of reducing toil and scaling up services is the "Engineering" in Site Reliability Engineering.
     • Engineering work is novel and intrinsically requires human judgment. It produces a permanent improvement in your service, and is guided by a strategy. It is frequently creative and innovative, taking a design-driven approach to solving a problem: the more generalized, the better.
     (Site Reliability Engineering, Chapter 5, "Eliminating Toil")
  21. "Engineering" in SRE
     • Reducing toil and scaling the service is precisely the "engineering" of SRE
     • Engineering work ...
       • Does something new
       • Intrinsically requires human judgment
       • Gives the service a permanent improvement
       • Is guided by a strategy
       • Is creative and innovative
       • Takes a design-driven approach to solving problems
       • Is better the more general it is
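  22. What I consider the essence of IT engineering
     • Among the engineering disciplines, IT engineering deals specifically with "information"
     • Information is (from the Weblio dictionary):
       • The content or state of things and events; also, news of them.
       • Commands or signals given to mechanical or biological systems.
       • Materials or knowledge useful for making appropriate judgments or deciding on actions for a particular purpose.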
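  23. What I consider the essence of IT engineering
     • Providing value to users through means of conveying information, such as web services
       • For example, sharing recipes across the whole world
     • Giving machines various commands and signals to make human work more efficient
       • For example, automation through programming and efficiency gains through good design
     • → Creating value by having machines carry out information transfer and human thinking and judgment
     • I think the culture of SRE is closer to this essence than what came before
     • Advanced automation and systems that operate autonomously like living organisms are my areas of interest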
  24. Summary
     • Introduced SRE activities in general and SRE efforts at Cookpad
       • General SRE activities center on Software and Systems Engineering, alongside Toil and Overhead
       • Cookpad adopts the essentials found in the SRE book while doing engineering suited to its own goals and scale
     • Discussed engineering in SRE and what I consider the essence of IT engineering
       • Reducing toil and scaling the service
       • Creating value by having machines efficiently carry out information transfer and human thinking and judgment