SRE at Cookpad

rrreeeyyy
November 29, 2017

I gave a talk on SRE at the 79th #hbstudy.

Transcript

  1. SRE at Cookpad
    Cookpad Inc., Infrastructure Department, SRE Group
    Yoshikawa Ryota (@rrreeeyyy)
    hbstudy#79 (2017/11/20) | Yoshikawa Ryota (@rrreeeyyy)
  2. Agenda
    • About SRE (a recap)
    • Infrastructure at Cookpad
      • EC2 with spot instances
      • hako + ECS environment
    • SRE activities at Cookpad
      • Workflow and initiatives
  3. Me
    @rrreeeyyy · https://rrreeeyyy.com · Yoshikawa Ryota
    • Yoshikawa Ryota (@rrreeeyyy [reɪ])
    • HEARTBEATS Corporation (2010/11 to 2016/12)
    • Cookpad Inc. (2017/01 onward)
      • Infrastructure Department, SRE Group
    • Areas of interest
      • Monitoring, time-series databases
      • Distributed systems, load balancers
    • Hobbies
      • League of Legends, Puyo Puyo Chronicle
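  4. (Recap) Site Reliability Engineering
    • Takes responsibility for everything concerning the site's reliability
    • In particular, solves problems by applying software and systems engineering
    • For example, performance
      • SREs themselves write code to fix code that has performance problems
    • For example, scalability and availability
      • Automate manual human work with code
      • Remove dependence on specific individuals to stabilize operations, and scale the service without adding people
      • Keep the service stable even as its scale grows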
  5. SRE activities
    SRE activities are said to fall broadly into the following four categories:
    • Software Engineering
    • Systems Engineering
    • Toil
    • Overhead
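  6. SRE activities
    • Software Engineering
      • Writing or modifying code, plus the related design and documentation work
      • Writing automation scripts, building tools and frameworks ...
      • Adding features that improve the service's reliability ...
    • Systems Engineering
      • One-time work that produces a lasting improvement, such as:
      • Changing the configuration of production systems ...
      • Designing service architecture for development teams ...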
  7. SRE activities
    • Toil
      • Manual work performed repeatedly just to keep the service running
      • Has no long-term value and is O(n) with respect to service growth ...
      • Even boring work is not toil if it has long-term value
    • Overhead
      • Administrative work not directly tied to running the service
      • Hiring, meetings, evaluations, training ...
  8. SRE at Cookpad
    • Infrastructure Department
      • The SRE Group sits inside it
      • DWH, Security, and corporate IT are in the same department
    • About 10 members in SRE alone, plus about 2 overseas
      • A ratio of roughly 70% domestic to 30% global [1]
    [1] https://speakerdeck.com/sorah/building-infrastructure-for-our-global-service
  9. Infrastructure at Cookpad
    • There are many variations, but the main workloads run on two broad setups
    • EC2 + spotscaler environment
      • Applications are deployed directly onto EC2
      • The EC2 instances are spot instances
    • ECS + hako environment
      • Containers are deployed to ECS clusters
      • eagletmt/hako is used for deployment
  10. EC2 + spotscaler environment
    • This covers what you might call the screen-rendering parts of Cookpad
      • The PC site, the API for smartphones, and so on
    • A typical AWS setup of Route53 + ALB + EC2
    • EC2 spot instances are managed with spotscaler [2]
    • Deployment is done with serf + sorah/mamiya [3]
    [3] https://github.com/eagletmt/hako
    [2] https://speakerdeck.com/ryotarai/spot-instances-in-cookpad-number-cookpadtechconf-2017
  11. ECS + hako environment
    • This covers the various microservices inside Cookpad
      • Timeline, ads, authentication, and so on
      • The machine learning platform also runs here (an ECS cluster of g2 instances)
    • Deployments to the ECS clusters are done with hako [3]
      • Developers open a PR with the application definition against a repository on GHE
      • Developers can, for the most part, deploy freely from Slack
      • Per-PR staging, an integrated console, tracing, and more are available [4]
    [4] http://techlife.cookpad.com/entry/2016/03/16/100043
    [3] https://github.com/eagletmt/hako
  12. YAML configuration in the ECS + hako environment

        scheduler:
          <<: !include schedulers/default.yml
          elb_v2:
            vpc_id: vpc-xxxxxx
            health_check_path: /healthcheck
            listeners:
              - port: 80
                protocol: HTTP
              # :
          autoscaling:
            min_capacity: 2
            max_capacity: 24
            policies:
              # : policies for autoscaling
  13. YAML configuration in the ECS + hako environment

        app:
          image: xxxxxxxxxx/service  # usually points at ECR
          cpu: 128
          memory: 512
          links:
            - fluentd
          env:
            $providers:
              - <<: !include env_providers/vault.yml
                directory: hako/group/service
                # values under hako/group/service in Vault can be set
                # freely by anyone who belongs to that group
              - !include env_providers/vault_shared.yml
            RAILS_ENV: production
            WORKER_NUM: '2'
        additional_containers:
          front: !include containers/front.yml
          fluentd: !include containers/fluentd.yml
        volumes:
          td-agent:
            source_path: /var/lib/hako/td-agent/service
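  14. Microservices in practice at Cookpad
    • The basic approach is to carve pieces of the main application out into the hako environment
      • Newly created applications use the hako environment as the standard
    • At the same time, we are working on several problems that come with moving to microservices
      • AWS X-Ray is used for tracing between services
      • envoyproxy/envoy is used to build the service mesh
    • Some issues remain, but we are running this for real and moving the separation forward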
  15. SRE activities at Cookpad
    • An introduction to some of what Cookpad's SREs do
      • Managing and changing server configuration
      • How SRE works with new services
      • Improving monitoring
      • Improving load balancing
      • New-graduate training
      • ...
  16. Changing and managing server configuration
    • Server configuration is, for the most part, entirely codified
      • Codenize.tools [5], Itamae [6], Terraform, and so on
      • IAM users and Route53 are also managed on GHE as a matter of course
      • Developers are free to open PRs
    • Changes to server configuration are submitted as pull requests against the code
      • They are reviewed, merged, and then applied
      • Some are applied automatically on merge
    [6] https://github.com/itamae-kitchen/itamae
    [5] https://codenize.tools/
  17. How SRE works with new services
    • Larger features are split out and developed as separate services
    • Depending on the case, one SRE is assigned
    • That SRE does essentially everything related to the service's reliability
      • Design consultation and review
      • Load testing and capacity planning before release
      • Working on performance
      • ...
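  18. Design consultation and review
    • Before development starts, the team shares and discusses things such as:
      • What kind of application will be built
      • What components are needed
      • What architecture and scaling strategy to use
    • The team may write a Design Document, which becomes the basis for discussion
      • A Design Document is usually written when developing larger features (by SREs too)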
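  19. Load testing
    • Once the service has taken shape to some extent, we start talking about load testing
      • How many requests are expected
      • What kinds of user requests are expected
    • The SRE side builds the load-test scenarios and environment
      • Estimate the capacity needed in production while running the load tests
      • Look for where the bottlenecks in the application are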
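The capacity estimate mentioned on this slide comes down to simple arithmetic once a load test has produced a per-instance throughput. The following is a minimal Go sketch under stated assumptions: the function name, the input figures, and the idea of a target utilization factor are all hypothetical and do not come from the talk.

```go
package main

import (
	"fmt"
	"math"
)

// requiredInstances estimates how many instances are needed to serve peakRPS,
// given the throughput one instance sustained in a load test (perInstanceRPS)
// and the fraction of that throughput we are willing to use (targetUtilization).
// All three inputs are hypothetical; the talk gives no concrete figures.
func requiredInstances(peakRPS, perInstanceRPS, targetUtilization float64) int {
	usable := perInstanceRPS * targetUtilization
	return int(math.Ceil(peakRPS / usable))
}

func main() {
	// Example: 1200 req/s expected, 150 req/s per instance, run at 50% utilization.
	fmt.Println(requiredInstances(1200, 150, 0.5)) // prints 16
}
```

Leaving headroom through a utilization factor is one common choice; the right value depends on how spiky the traffic is and how quickly new instances can be added.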
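  20. Working on performance
    • When there is a reliability problem, SREs write the code themselves
      • For example, a performance problem
    • Understand the application's specification and take an appropriate approach
      • For example, which query on which screen is it
      • What information is wanted, and under what constraints
    • Understand the behavior of the middleware in use as well, and take an appropriate approach
      • For example, when improving SQL it helps to know MySQL's index structure and internal processing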
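  21. How SRE works with new services
    • All of this is just doing the obvious things in the ordinary way
      • We believe there is plenty of room for improvement
      • For example, running load tests automatically
    • Understand the application and write code when necessary
      • SRE is responsible for everything that affects the site's reliability
      • Systems Engineering: understanding the application's characteristics and design, and combining middleware accordingly
      • Software Engineering: understanding the specification and middleware, and actually writing application logic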
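  22. Improving monitoring
    • Zabbix is currently used to collect metrics
    • We want something that scales better and lets us handle metrics and alerts in a high-level language
      • We want to reach the state described in Chapter 10 of the SRE book
    • We are gradually introducing Prometheus
      • Metrics are collected at a finer resolution, such as every 15 seconds
      • Metrics can be collected via a plain HTTP response
      • Metrics from the application itself are comparatively easy to collect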
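To illustrate what "collecting metrics via a plain HTTP response" can look like, here is a minimal Go sketch that serves one counter in the Prometheus text exposition format using only the standard library. The metric name `app_requests_total` and the handler names are assumptions made for illustration; a real service would more likely use an official Prometheus client library, and nothing here is taken from Cookpad's setup.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// requestCount is a hypothetical application-level counter.
var requestCount uint64

// renderMetrics formats the counter in the Prometheus text exposition format:
// a HELP line, a TYPE line, and one sample line.
func renderMetrics(count uint64) string {
	return "# HELP app_requests_total Total requests handled.\n" +
		"# TYPE app_requests_total counter\n" +
		fmt.Sprintf("app_requests_total %d\n", count)
}

// metricsHandler is what a Prometheus server would scrape over HTTP.
func metricsHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/plain; version=0.0.4")
	fmt.Fprint(w, renderMetrics(atomic.LoadUint64(&requestCount)))
}

func main() {
	// Simulate three handled requests, then scrape the endpoint in-process.
	atomic.AddUint64(&requestCount, 3)
	rec := httptest.NewRecorder()
	metricsHandler(rec, httptest.NewRequest(http.MethodGet, "/metrics", nil))
	fmt.Print(rec.Body.String())
}
```

In a running service the handler would be registered with `http.HandleFunc("/metrics", metricsHandler)` and scraped at whatever interval Prometheus is configured for, such as the 15 seconds mentioned on the slide.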
  23. Improving load balancing
    • Previously, haproxy was used to load-balance the application servers
      • Balancing used Weighted Round-robin (wrr)
    • When some endpoints are light and others heavy, load becomes uneven
      • With wrr, requests to heavy endpoints can concentrate on a particular server
    • leastconn looks like the answer, but it has a problem when HTTP keepalive is in use
      • Connections stay open, so the connection count diverges from the number of requests actually being processed
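The skew described on this slide is easy to reproduce. The following Go sketch is a deliberately simplified model, assuming two servers with equal weights and a hypothetical traffic pattern in which a heavy endpoint (cost 100) alternates with a light one (cost 1); the costs and the function name are made up for illustration.

```go
package main

import "fmt"

// roundRobinLoad distributes request costs over n servers in strict rotation
// (round-robin with equal weights) and returns the total cost per server.
func roundRobinLoad(costs []int, n int) []int {
	load := make([]int, n)
	for i, c := range costs {
		load[i%n] += c
	}
	return load
}

func main() {
	// Hypothetical traffic: heavy and light requests arrive alternately.
	costs := []int{100, 1, 100, 1, 100, 1, 100, 1}
	fmt.Println(roundRobinLoad(costs, 2)) // prints [400 4]
}
```

Both servers receive the same number of requests, yet one ends up with all of the heavy ones, because rotation ignores how much work each request represents.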
  24. ryotarai/simproxy
    • https://github.com/ryotarai/simproxy
    • A simple reverse proxy written in Go
      • It has the basic features, such as health checks
    • Implements leastreq as a balancing method
      • Instead of the plain connection count, it uses the following value as its metric:
      • The number of requests a backend has received but not yet answered
      • This allows leastconn-style balancing even with HTTP keepalive
    • Systems Engineering (load-balancing algorithms and the like) combined with Software Engineering (implementing them)
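The leastreq idea, picking the backend with the fewest requests that have been forwarded but not yet answered, can be sketched as follows. This is an illustrative Go sketch only, not simproxy's actual implementation; the type names, method names, and backend names are all hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// backend tracks how many requests it has received but not yet answered.
type backend struct {
	name     string
	inflight int
}

// leastReq selects backends by in-flight request count, not open connections.
type leastReq struct {
	mu       sync.Mutex
	backends []*backend
}

func newLeastReq(names ...string) *leastReq {
	l := &leastReq{}
	for _, n := range names {
		l.backends = append(l.backends, &backend{name: n})
	}
	return l
}

// acquire picks the backend with the fewest in-flight requests
// (the first one wins ties) and counts the new request against it.
func (l *leastReq) acquire() *backend {
	l.mu.Lock()
	defer l.mu.Unlock()
	best := l.backends[0]
	for _, b := range l.backends[1:] {
		if b.inflight < best.inflight {
			best = b
		}
	}
	best.inflight++
	return best
}

// release is called once the backend's response has been received.
func (l *leastReq) release(b *backend) {
	l.mu.Lock()
	defer l.mu.Unlock()
	b.inflight--
}

func main() {
	lb := newLeastReq("app-1", "app-2")
	slow := lb.acquire() // goes to app-1 and stays in flight
	fast := lb.acquire() // goes to app-2
	lb.release(fast)     // app-2 has answered
	next := lb.acquire() // app-2 again, because app-1 is still busy
	fmt.Println(slow.name, fast.name, next.name) // prints app-1 app-2 app-2
}
```

Because the counter changes per request rather than per connection, a keepalive connection that sits idle does not make a backend look busy.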
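  25. New-graduate training
    • We run a three-day infrastructure training for new graduates
      • Understand the structure of MySQL indexes and tune queries ...
      • Scale your own application using AWS services such as ALB and RDS ...
    • It is better when the people who actually write the applications understand the infrastructure
      • Having SRE check all application code does not scale
      • By category this may be Overhead, but it is necessary for scaling the service
      • In practice, it cannot be taught without deep Systems Engineering knowledge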