
Hiroki Sakamoto
September 29, 2020

Key considerations for ensuring availability and scalability in k8s / Best practices for ensuring availability and scalability for k8s

Transcript

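  1. Self-introduction: Hiroki Sakamoto (Twitter: taisho6339, GitHub: taisho6339)

    Career: Yahoo → Recruit Technologies → freelance. Current work: building and operating a k8s-based platform for microservices. Going forward: considering a full-time position so I can work with more autonomy.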
  2. Keeping latency low ~Scaling Pods up~

    Solution 1: scale Pods up
    • Resources such as CPU and memory can be specified
    • They are set in the Pod definition with request and limit

        containers:
        - ...
          resources:
            limits:
              cpu: 1.0
              memory: 512Mi
            requests:
              cpu: 0.2
              memory: 512Mi
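  3. Keeping latency low ~Scaling Pods up~

    Specifying a limit
    • The maximum amount of a resource the Pod can actually use
      ◦ Usage can exceed the amount reserved by the request
      ◦ If only a limit is set, the request defaults to the same value; if the limit is omitted, usage is uncapped unless a LimitRange in the namespace supplies a default
      ◦ Useful for workloads that are usually light but may burst temporarily
    • A container that tries to exceed its CPU limit is throttled, which caps its usage (exceeding a memory limit gets the container OOM-killed instead)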
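A minimal Pod that carries the resource settings from slide 2 could look like the sketch below. The Pod name, container name, and nginx image are placeholders. The CPU request is 0.2 and the limit is 1.0, so the container can burst up to one core and is throttled beyond that. Memory request and limit are equal, so memory cannot burst. Because request and limit differ for CPU, the Pod is in the Burstable QoS class.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: latency-sample      # hypothetical name
spec:
  containers:
  - name: app               # hypothetical container name
    image: nginx:1.25       # placeholder image
    resources:
      requests:
        cpu: "200m"         # same as 0.2 CPU; reserved at scheduling time
        memory: 512Mi
      limits:
        cpu: "1"            # may burst up to 1 CPU, throttled beyond that
        memory: 512Mi       # exceeding this gets the container OOM-killed
```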
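  4. Keeping latency low ~Placing Pods on the right nodes~

    Ways to choose the node
    • nodeSelector
      ◦ Places the Pod on Nodes that have specific labels
    • NodeAffinity
      ◦ Also places the Pod on Nodes with specific labels, but is more flexible
    • Taint + Toleration
      ◦ Adds a placement restriction to a Node so that only Pods declaring a matching toleration can be placed there
    Reference link: Assigning Pods to Nodes (Kubernetes documentation)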
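The sketch below combines the three mechanisms in one Pod so they can be compared side by side. The disktype label, the dedicated=high-cpu taint, and all names are hypothetical. kubernetes.io/os and topology.kubernetes.io/zone are standard node labels. Keep in mind that a toleration only permits placement on a tainted node. To pin a Pod to those nodes, pair the toleration with a selector or affinity.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: placement-sample              # hypothetical name
spec:
  # nodeSelector: exact label match only
  nodeSelector:
    kubernetes.io/os: linux
  affinity:
    nodeAffinity:
      # Hard rule with set-based operators (more flexible than nodeSelector)
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype             # hypothetical node label
            operator: In
            values: ["ssd", "nvme"]
      # Soft rule: prefer this zone when possible
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 50
        preference:
          matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values: ["asia-northeast1-a"]
  # Allows (but does not force) scheduling onto nodes tainted with
  #   kubectl taint nodes <node-name> dedicated=high-cpu:NoSchedule
  # (hypothetical taint)
  tolerations:
  - key: dedicated
    operator: Equal
    value: high-cpu
    effect: NoSchedule
  containers:
  - name: app
    image: nginx:1.25                 # placeholder image
```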
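  5. Ensuring throughput

    [Diagram: LB → Ingress Gateway → Service A, B, C; a microservices + API Gateway pattern] Basic approach: verify each layer in turn. First: can the LB sustain 6000 RPS? (Nginx returning static content serves as the backend.)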
  6. Ensuring throughput

    [Diagram: LB → Ingress Gateway → Service A, B, C; a microservices + API Gateway pattern] Basic approach, next layer: can the Ingress Gateway sustain 6000 RPS?
  7. Ensuring throughput

    [Diagram: LB → Ingress Gateway → Service A, B, C; a microservices + API Gateway pattern] Basic approach, next layer: can the services sustain 6000 RPS?
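The layer-by-layer check can use a trivial Nginx backend as a stand-in, so that the layer under test (first the LB, then the Ingress Gateway) is the only component that could limit throughput. The manifest below is a sketch of such a backend. Its names, replica count, and resource values are assumptions, and it should be sized so that it never becomes the bottleneck itself.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: static-nginx          # hypothetical name
spec:
  replicas: 3                 # assumption; scale up until it is not the bottleneck
  selector:
    matchLabels:
      app: static-nginx
  template:
    metadata:
      labels:
        app: static-nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.25     # stock image serves a static welcome page
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: "500m"       # assumption
            memory: 128Mi
---
apiVersion: v1
kind: Service
metadata:
  name: static-nginx
spec:
  selector:
    app: static-nginx
  ports:
  - port: 80
    targetPort: 80
```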
  8. Preparing for spikes ~Horizontal Pod Autoscaler~

    HPA formula: desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]
    Worked example
    • Metric: CPU utilization
    • Target CPU utilization: 60%
    • Current replicas: 4
    • Current average CPU utilization across Pods: 90%
    • Replicas after scaling = ceil[4 * (90 / 60)] = 6
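An HPA that matches the worked example (target 60% CPU, starting from 4 replicas) might be declared as follows. The names and maxReplicas are assumptions. Utilization is measured against the containers' CPU requests, so the target Deployment must set them. autoscaling/v2 is GA from Kubernetes 1.23; older clusters expose the same fields under autoscaling/v2beta2.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: sample                  # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sample                # hypothetical Deployment
  minReplicas: 4
  maxReplicas: 10               # assumption
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60  # desiredMetricValue in the formula
```

With 4 Pods averaging 90% CPU, the controller computes ceil(4 × 90 / 60) = 6 replicas, as on the slide.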
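  9. Preparing for spikes

    Countermeasures
    • When the timing is predictable (TV commercials, being featured on the Yahoo! JAPAN top page, etc.)
      ◦ Raise the HPA's minReplicas at specific times, for example with a CronJob
    • When the timing is unpredictable
      ◦ For example, set a generous (lower) CPU utilization target so there is headroom
      ◦ Keep a higher minReplicas in advance
      ◦ Revisit your caching strategy, for example with a CDN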
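For the predictable case, one possible shape is a CronJob that patches the HPA shortly before the expected spike. Everything here is an assumption for illustration: the HPA name sample, the default namespace, the schedule, the value 10 (the HPA's maxReplicas must be at least 10), and the bitnami/kubectl image. A second CronJob would be needed to lower minReplicas again afterwards.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: hpa-scaler              # hypothetical
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: hpa-scaler
  namespace: default
rules:
- apiGroups: ["autoscaling"]
  resources: ["horizontalpodautoscalers"]
  verbs: ["get", "patch"]       # just enough to patch the HPA
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: hpa-scaler
  namespace: default
subjects:
- kind: ServiceAccount
  name: hpa-scaler
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: hpa-scaler
---
apiVersion: batch/v1            # CronJob is GA in batch/v1 from Kubernetes 1.21
kind: CronJob
metadata:
  name: raise-hpa-min
  namespace: default
spec:
  schedule: "50 19 * * *"       # daily 19:50 in the controller's time zone (assumption)
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: hpa-scaler
          restartPolicy: OnFailure
          containers:
          - name: kubectl
            image: bitnami/kubectl:latest   # assumption: any image with kubectl works
            command:
            - kubectl
            - patch
            - hpa
            - sample
            - --type=merge
            - -p
            - '{"spec":{"minReplicas":10}}'
```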
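  10. Preparing for node failures

    Countermeasures against node failures
    1. Pod redundancy and zone spreading
    2. Stopping Pods safely
       ◦ Configure graceful shutdown
       ◦ Use an appropriate eviction strategy for Pods
    3. Configuring the cluster correctly
       ◦ Set up maintenance windows and surge upgrades
       ◦ Operate nodes without availability guarantees correctly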
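A common pattern for graceful shutdown is a preStop hook combined with a termination grace period. Kubernetes runs the preStop hook before sending SIGTERM, and the hook's run time counts against terminationGracePeriodSeconds (default 30s). A short sleep gives Service endpoints and load balancers time to stop routing to the Pod. The image, names, and durations below are assumptions, and the image must contain a sleep binary.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample                          # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: sample
  template:
    metadata:
      labels:
        app: sample
    spec:
      # Time from the start of termination until SIGKILL; includes preStop
      terminationGracePeriodSeconds: 60
      containers:
      - name: app
        image: example.com/sample:1.0   # hypothetical image containing `sleep`
        lifecycle:
          preStop:
            exec:
              # Let traffic drain away before the app receives SIGTERM
              command: ["sleep", "10"]
```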
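  11. Preparing for node failures ~Pod redundancy and zone spreading~

    Pod spreading strategy ~Spreading across nodes~ [Diagram: Node1 and Node2, each running a serviceA Pod] Use Pod Anti-Affinity so that Pods of the same service are, as far as possible, not placed on the same node.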
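A sketch of this rule for a hypothetical service-a Deployment is shown below. The "preferred" form means the scheduler avoids putting two replicas on one node when it can, but still schedules them if no other node fits. topologyKey kubernetes.io/hostname makes each node its own topology domain.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: service-a                 # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: service-a
  template:
    metadata:
      labels:
        app: service-a
    spec:
      affinity:
        podAntiAffinity:
          # Soft rule: avoid co-locating replicas, but do not block scheduling
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: service-a
              topologyKey: kubernetes.io/hostname   # one node = one domain
      containers:
      - name: app
        image: nginx:1.25         # placeholder image
```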
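  12. Preparing for node failures ~Pod redundancy and zone spreading~

    Pod spreading strategy ~Spreading across zones~ [Diagram: Node1 in asia-northeast1-a and Node2 in asia-northeast1-b, each running a serviceA Pod] Use Pod Anti-Affinity so that Pods of the same service are, as far as possible, not placed in the same zone. On Kubernetes 1.18 and later, Topology Spread Constraints are recommended! Use a regional cluster!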
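With Topology Spread Constraints, zone spreading for the same hypothetical service-a can be written as below. maxSkew: 1 asks that replica counts per zone differ by at most one. whenUnsatisfiable: ScheduleAnyway makes this a preference rather than a hard rule. The replica count and names are assumptions.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: service-a                 # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: service-a
  template:
    metadata:
      labels:
        app: service-a
    spec:
      topologySpreadConstraints:
      - maxSkew: 1                              # zones differ by at most 1 Pod
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway       # use DoNotSchedule for a hard rule
        labelSelector:
          matchLabels:
            app: service-a
      containers:
      - name: app
        image: nginx:1.25         # placeholder image
```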
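  13. Preparing for node failures ~Stopping Pods safely~ Pod Disruption Budget

    A resource that controls how many Pods may be stopped at the same time when a node evicts its Pods. [Diagram: the Node evicts Pods a specified number at a time] Reference: Disruptions (Kubernetes documentation)

        apiVersion: policy/v1
        kind: PodDisruptionBudget
        metadata:
          name: sample
        spec:
          maxUnavailable: "25%"
          selector:
            matchLabels:
              app: sample

    policy/v1 requires Kubernetes 1.21 or later. Clusters of the talk's era (2020) use policy/v1beta1, which was removed in 1.25.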
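  14. Preparing for node failures ~Operating nodes without availability guarantees~ Running preemptible nodes

    [Diagram: a regular Node Pool (Node1, Node2) and a Preemptible Node Pool (Node1, Node2), each running ServiceA and ServiceB]
    • Place Pods in both node pools
    • Do not place critical Pods on preemptible nodes
    • Limit preemptible nodes to a portion of the total node count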
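On GKE, preemptible nodes carry the label cloud.google.com/gke-preemptible=true. A critical workload can therefore be kept off them with a required node-affinity rule using DoesNotExist. The Deployment name and image below are hypothetical.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: critical-service          # hypothetical critical workload
spec:
  replicas: 3
  selector:
    matchLabels:
      app: critical-service
  template:
    metadata:
      labels:
        app: critical-service
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              # Preemptible GKE nodes have this label; DoesNotExist keeps
              # the Pod on regular node pools only
              - key: cloud.google.com/gke-preemptible
                operator: DoesNotExist
      containers:
      - name: app
        image: nginx:1.25         # placeholder image
```

For non-critical services that should land in both pools, one option is a topologySpreadConstraints entry whose topologyKey is the GKE node-pool label cloud.google.com/gke-nodepool.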
  15. Improving availability with multiple clusters

    [Diagram: an LB in front of two clusters, each running Ingress Gateway → Service A, B, C] Make the cluster itself redundant, with an LB that provides a single VIP.
  16. Improving availability with multiple clusters

    [Same diagram] When a cluster needs to be updated:
  17. Improving availability with multiple clusters

    [Same diagram] Detach one cluster from the LB and update it.
  18. Improving availability with multiple clusters

    [Same diagram] Re-route traffic to the updated cluster.
  19. Improving availability with multiple clusters

    [Same diagram] If there are no problems, update the other cluster as well.
  20. Improving availability with multiple clusters

    [Same diagram] Re-route traffic again.
  21. Improving availability with multiple clusters

    [Same diagram] This achieves a rolling update across clusters.
  22. Improving availability with multiple clusters

    [Same diagram] It also enables failover during a regional outage.
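  23. How to implement multi-cluster: GCP Anthos

    • Automatically builds routing with GCLB (Google Cloud Load Balancing) + NEG (network endpoint groups)
    • Provides a managed Service Mesh
    • Ensures observability, including SLO/SLI monitoring
    • Synchronizes cluster resources across clusters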
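The deck names GCP Anthos but no specific resource. One way to get a single VIP in front of redundant clusters on GKE is Multi Cluster Ingress. The sketch below assumes that feature is enabled for the fleet and that the manifests are applied to its config cluster. All names, the namespace, the ports, and the cluster links are hypothetical, and the link format should be checked against the GKE documentation. Listing member clusters under clusters is what makes the rolling procedure from the previous slides possible: remove one entry to detach that cluster for an upgrade, then add it back to re-route traffic.

```yaml
apiVersion: networking.gke.io/v1
kind: MultiClusterService
metadata:
  name: gateway-mcs               # hypothetical
  namespace: ingress              # hypothetical
spec:
  template:
    spec:
      selector:
        app: ingress-gateway      # hypothetical label on the gateway Pods
      ports:
      - name: http
        protocol: TCP
        port: 80
        targetPort: 8080
  # Clusters that receive traffic; drop one entry to detach it from the LB
  clusters:
  - link: "asia-northeast1/cluster-a"   # hypothetical location/cluster
  - link: "asia-northeast2/cluster-b"   # hypothetical location/cluster
---
apiVersion: networking.gke.io/v1
kind: MultiClusterIngress
metadata:
  name: gateway-mci
  namespace: ingress
spec:
  template:
    spec:
      backend:
        serviceName: gateway-mcs  # the MultiClusterService above
        servicePort: 80
```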