A Concept for Coordinated Scaling Between Services
Using Service Performance Figures and Dependencies
/ Web System Architecture #1

itkq
December 23, 2017

Web System Architecture Study Group #1

Transcript

  1. A Concept for Coordinated Scaling Between Services
 Using Service Performance Figures and Dependencies
 Web System Architecture Study Group #1, @itkq

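  2. Me
 @itkq, Takuya Kosugiyama
 Tokyo Tech, Information and Communications Engineering, M2 (second-year master's student)
 Part-time job: SRE
 Handle read as "itako (kyū)"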
  3–4. Why I'm attending
 • Interested in the possibilities, limits, and academic value of web systems
 • Want to discuss with all kinds of people
 • Want to broaden my world while I'm still young [citation needed]
 • Came to see id:y_uuki in real life
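  5. Talk outline
 • Recent trends in web architecture, and what I thought about after reading the SRE book
 • A foothold toward solving operational scaling problems by leveraging a container-based service mesh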
  6. Background: application containers
 • Container-based virtualization
   • Low overhead
 • Container as a Service
   • Production ready
   • Docker, Kubernetes
   • Managed services (GKE, ECS, EKS)
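  7. Background: service-oriented architecture
 • Monoliths grow large and hit their limits
   • Maintainability, development efficiency, …
 • Service-oriented architecture
   • Split functionality out into services
   • Services cooperate with each other
 Q. Service discovery? Retries? Timeouts?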
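  8. Background: service mesh
 • Service-to-service communication goes through a proxy
 • Service mesh: a network abstraction layer built from L7 proxies
 [Figure: Service A and Service B each talk through their own Proxy (data plane); a Controller manages the proxies (control plane)]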
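  9. Background: what a service mesh is used for
 • Envoy, Linkerd
 • Advanced load balancing
 • Circuit breaking
 • Rate limiting
 • Service discovery
 • Observation
   • Statistics
   • Logging
   • Tracing
 These are close to the domain of development platforms; there is room to think about applying them to operations.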
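  10. Operational challenges of containers and microservices
 1. Host capacity planning
   • Scaling containers horizontally is easy
   • But host capacity is still required
 2. Deciding container resources
   • Who decides the “desired count”, and how?
 3. Coordinated scaling between services
   • The capacity of dependency (downstream) services must also be considered
 All of these can be reduced to a single capacity problem.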
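  11. Where capacity planning should be heading
 • Site Reliability Engineering 18.2
   • Intent-based capacity targets
   • An optimization problem that minimizes cost under resource constraints
   • Understanding dependencies and performance metrics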
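To make "an optimization problem under resource constraints" concrete for challenge 1 (host capacity), here is a rough sketch that estimates how many identical hosts a set of containers needs. It uses first-fit decreasing, a simple bin-packing heuristic that is not guaranteed optimal and is not the intent-based approach the SRE book describes. The `Container` type, `pack_hosts`, the container sizes, counts, and host size are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Container:
    """Resource request of one container (hypothetical units)."""
    name: str
    cpu: float      # vCPUs
    mem_gib: float  # GiB of memory


def pack_hosts(containers, host_cpu, host_mem_gib):
    """First-fit decreasing onto identical hosts.

    Containers are placed largest-first (by CPU, then memory) on the first host
    with enough free CPU and memory; a new host is opened when none fits.
    Returns one list of containers per host. A heuristic, not an optimum.
    """
    hosts = []  # each entry: [free_cpu, free_mem_gib, placed_containers]
    for c in sorted(containers, key=lambda x: (x.cpu, x.mem_gib), reverse=True):
        if c.cpu > host_cpu or c.mem_gib > host_mem_gib:
            raise ValueError(f"{c.name} is larger than a whole host")
        for h in hosts:
            if h[0] >= c.cpu and h[1] >= c.mem_gib:
                h[0] -= c.cpu
                h[1] -= c.mem_gib
                h[2].append(c)
                break
        else:  # no existing host had room
            hosts.append([host_cpu - c.cpu, host_mem_gib - c.mem_gib, [c]])
    return [h[2] for h in hosts]


# Hypothetical desired counts and sizes: name -> (count, cpu, mem_gib).
desired = {"A": (6, 1.0, 1.0), "B": (2, 2.0, 2.0), "C": (2, 0.5, 1.0), "D": (1, 0.5, 0.5)}
containers = [Container(f"{name}-{i}", cpu, mem)
              for name, (count, cpu, mem) in desired.items()
              for i in range(count)]
hosts = pack_hosts(containers, host_cpu=4.0, host_mem_gib=8.0)
print(len(hosts))  # 3
```

Here the total CPU request is 11.5 vCPUs, so 3 hosts of 4 vCPUs is also the lower bound; in general first-fit decreasing can use more hosts than the optimum.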

  13–14. Recap: operational challenges
 1. Host capacity planning
 2. Deciding container resources
 3. Coordinated scaling between services
 Preliminary study:
 • Quantifying performance
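  15. Problem illustration: assumed scenario
 [Figure: a system of services A, B, C and D; A is user facing and runs ×2 containers; B and C are internal and run ×1 each; D runs ×1]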
  16–19. Problem illustration 1
 Example: at peak, A's load triples
 [Figure: A is scaled ×3 to a desired count of 6 and B ×3 to a desired count of 3, but C was overlooked]
  20–23. Problem illustration 2
 Example: at peak, A's load triples
 [Figure: A is scaled to a desired count of 6 and B and C to a desired count of 3 each (all ×3), but D was overlooked]
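  24. Solution illustration: assumed scenario
 [Figure: the same system, annotated with per-container throughput (A 100 rps/container, B 200 rps/container, C 100 rps/container, D 100 rps/container) and dependency ratios 0.4, 0.2 and 0.7 on the call edges; A runs ×2, B, C and D ×1 each]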
  25–27. Solution illustration
 Example: at peak, A's load triples, so A goes from ×2 to ×6
 Each dependency needs (incoming peak rps) / (its own rps per container) containers, rounded up to a whole container, where the incoming rps is each caller's capacity (rps per container × containers) multiplied by the dependency ratio:
 • $\lceil (100 \times 6 \times 0.4 + 100 \times 0.7) / 200 \rceil = \lceil 1.55 \rceil = 2$
 • $\lceil (100 \times 6 \times 0.2) / 100 \rceil = \lceil 1.2 \rceil = 2$
 [Figure: resulting container counts ×6, ×2, ×2]
 Performance figures + dependency ratios → the minimum number of containers required
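The calculation above can be written as a small program. This is a minimal sketch, assuming the slide's rule that a dependency needs ceil(Σ caller capacity × dependency ratio / its own rps per container) containers. The transcript gives the ratios 0.4, 0.2 and 0.7 but not which edges they belong to, so the edge list (A→B 0.4, A→C 0.2, D→B 0.7) is a hypothetical reading chosen because it reproduces the slide's numbers; `required_containers` is likewise an illustrative name. `graphlib` (Python 3.9+) orders the services so each one is sized after all of its callers.

```python
import math
from graphlib import TopologicalSorter  # Python 3.9+

# Per-container throughput from the slide.
RPS_PER_CONTAINER = {"A": 100, "B": 200, "C": 100, "D": 100}
# Hypothetical call edges: (caller, callee) -> dependency ratio.
# The slide gives the ratios 0.4, 0.2 and 0.7 but not the edges.
EDGES = {("A", "B"): 0.4, ("A", "C"): 0.2, ("D", "B"): 0.7}


def required_containers(rps_per_container, edges, fixed_counts):
    """Minimum container count per service.

    Services in fixed_counts keep that count. Every other service gets
    ceil(inbound / its rps per container), where inbound is the sum over
    its callers of (caller rps per container * caller count * ratio).
    """
    callers = {svc: [] for svc in rps_per_container}
    for (src, dst), ratio in edges.items():
        callers[dst].append((src, ratio))
    # TopologicalSorter takes node -> predecessors, so callers come first.
    graph = {svc: [src for src, _ in callers[svc]] for svc in callers}
    counts = {}
    for svc in TopologicalSorter(graph).static_order():
        if svc in fixed_counts:
            counts[svc] = fixed_counts[svc]
            continue
        inbound = sum(rps_per_container[src] * counts[src] * ratio
                      for src, ratio in callers[svc])
        # The tolerance keeps float noise (e.g. 2.0000000000000004) from adding a
        # container; max(1, ...) keeps at least one running.
        counts[svc] = max(1, math.ceil(inbound / rps_per_container[svc] - 1e-9))
    return counts


# Peak: A's load triples (2 -> 6 containers); D is not rescaled.
result = required_containers(RPS_PER_CONTAINER, EDGES, {"A": 6, "D": 1})
print(sorted(result.items()))  # [('A', 6), ('B', 2), ('C', 2), ('D', 1)]
```

B receives 240 + 70 = 310 rps and C receives 120 rps, which gives the slide's ×2 and ×2. A cyclic call graph would make `static_order()` raise `graphlib.CycleError`; handling cycles is outside this sketch.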
  28–29. Problem: quantifying performance and dependency ratios
 • On quantification
   • How should it be measured?
   • Is rps the right metric?
   • rps against which endpoint?
   • Is the request pattern the same as in prod?
 • How are the dependency ratios between services determined?
 There are few real-world examples of automatically measuring and quantifying performance → service mesh + shadowing
  30. Example service mesh configuration
 [Figure: Request → front-proxy → A proxy → B proxy, with "stats" and "discovery" components]
 Per-proxy metrics:
 • connection / request count
 • 1xx, 2xx, … 5xx count
 • etc.
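The slides leave the dependency ratio as an open question. One possible way to estimate it from per-proxy request counts, offered as an assumption rather than the author's method: over the same time window, divide the number of requests a service's proxy sent to a dependency by the number of requests that proxy received. The counter names are modeled on Envoy's statistics (`http.<stat_prefix>.downstream_rq_total`, `cluster.<name>.upstream_rq_total`); the stat prefix `ingress`, the cluster `service_b`, and the snapshot values are hypothetical.

```python
def dependency_ratio(before, after, inbound_key, outbound_key):
    """Estimate the caller->callee dependency ratio from two counter snapshots.

    before/after: dicts of monotonically increasing counters taken from the
    caller's proxy at the start and end of a measurement window.
    """
    inbound = after[inbound_key] - before[inbound_key]     # requests the caller received
    outbound = after[outbound_key] - before[outbound_key]  # requests it sent to the callee
    if inbound <= 0:
        raise ValueError("no inbound requests in the window")
    return outbound / inbound


# Hypothetical snapshots from service A's proxy.
before = {"http.ingress.downstream_rq_total": 1000, "cluster.service_b.upstream_rq_total": 400}
after = {"http.ingress.downstream_rq_total": 3000, "cluster.service_b.upstream_rq_total": 1200}
print(dependency_ratio(before, after,
                       "http.ingress.downstream_rq_total",
                       "cluster.service_b.upstream_rq_total"))  # 0.4
```

The result is an average fan-out, so it can exceed 1 when one inbound request triggers several calls; retries to the dependency would also inflate it if the proxy counts them as extra upstream requests.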
  31. Shadowing
 [Figure: Request → front-proxy → A proxy → B proxy on the Prod side; traffic is also shadowed to copies A′ and B′ on the Shadow side, and the request log (front->A: GET /, front->A: GET /users, A->B: GET /awesome_process) is recorded to storage]
    # routes
    {
      "virtual_hosts": [
        {
          "name": "service_a",
          "domains": [ "*" ],
          "routes": [
            {
              "prefix": "/",
              "cluster": "service_a",
              "shadow": { "cluster": "service_a_prime" }
    ...
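The talk proposes deriving the request path weight distribution d from what the mesh records. As a minimal sketch, assuming the recorded log uses the `caller->callee: METHOD path` form shown in the figure (a hypothetical format, not a real Envoy access-log format), d for one service is the share of each method and path sent to it. `path_distribution` is an illustrative name; the first three log lines come from the figure and the last two are made up.

```python
from collections import Counter


def path_distribution(log_lines, service):
    """Request path weight distribution d for one service.

    Assumes the hypothetical log format from the figure:
    "<caller>-><callee>: <METHOD> <path>". Returns {"METHOD path": weight}
    with weights summing to 1, or {} if nothing was sent to `service`.
    """
    counts = Counter()
    for line in log_lines:
        edge, sep, request = line.partition(": ")
        _caller, arrow, callee = edge.partition("->")
        if sep and arrow and callee.strip() == service and request.strip():
            counts[request.strip()] += 1
    total = sum(counts.values())
    return {req: n / total for req, n in counts.items()} if total else {}


recorded = [
    "front->A: GET /",
    "front->A: GET /users",
    "A->B: GET /awesome_process",
    "front->A: GET /",
    "front->A: POST /users",
]
print(path_distribution(recorded, "A"))
# {'GET /': 0.5, 'GET /users': 0.25, 'POST /users': 0.25}
```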
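  32. Proposed performance measurement
 • Input
   • Request path weight distribution d
   • Number of concurrent connections c
   • Response time target r
 • Output: metrics
   • Send requests by WRR according to d, c in parallel
   • $\mathrm{rps} = \dfrac{\text{number of requests with } \mathrm{RTT} \le r \text{ and a non-5xx status}}{\text{unit time}}$
 Example d: GET / 0.3, GET /users 0.2, POST /users 0.2, GET /depends_b 0.1, …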
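The measurement procedure above can be sketched as follows. It assumes a caller-supplied `send(request) -> status` function (hypothetical; a real harness would issue HTTP requests, for example against a shadow copy of the service), picks paths with the smooth variant of weighted round-robin, runs c requests at a time in a thread pool, and counts a request only if its RTT is at most r and its status is not 5xx. Weights need not sum to 1, which matters because the slide's example d is truncated. `smooth_wrr`, `rps_from_results`, `measure`, and `fake_send` are illustrative names.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def smooth_wrr(weights):
    """Yield keys of `weights` forever in proportion to their weights,
    using smooth weighted round-robin. Weights need not sum to 1."""
    items = list(weights.items())
    total = sum(w for _, w in items)
    current = [0.0] * len(items)
    while True:
        for i, (_, w) in enumerate(items):
            current[i] += w
        best = max(range(len(items)), key=current.__getitem__)
        current[best] -= total
        yield items[best][0]


def rps_from_results(results, r, window_s):
    """Requests with RTT <= r and a non-5xx status, divided by window_s seconds."""
    good = sum(1 for rtt, status in results
               if rtt <= r and not 500 <= status <= 599)
    return good / window_s


def measure(send, d, c, n, r):
    """Send n requests picked by smooth WRR over d, c at a time; return rps.

    `send(request) -> HTTP status` is a hypothetical caller-supplied function.
    """
    def timed(request):
        start = time.monotonic()
        status = send(request)
        return time.monotonic() - start, status

    started = time.monotonic()
    with ThreadPoolExecutor(max_workers=c) as pool:
        results = list(pool.map(timed, islice(smooth_wrr(d), n)))
    return rps_from_results(results, r, time.monotonic() - started)


def fake_send(request):
    """Stand-in for a real HTTP call; always succeeds after 1 ms."""
    time.sleep(0.001)
    return 200


# The slide's example d (truncated there with "…").
d = {"GET /": 0.3, "GET /users": 0.2, "POST /users": 0.2, "GET /depends_b": 0.1}
print(measure(fake_send, d, c=4, n=200, r=0.1))
```

A deterministic WRR keeps short runs close to the target mix: with integer weights 3:1, every four consecutive picks contain exactly three of the first path, whereas random sampling would only match on average.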
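  33. Summary
 • Service meshes have emerged alongside microservices
   • There is room to explore applying them to real-world operations
 • As a foothold toward the best possible capacity planning, this talk focuses on quantifying performance
   • Use the service mesh to derive the request path weight distribution, then compute metrics from it
   • Actually measuring performance, and determining dependency ratios, are future work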