Kubernetes 運用設計ガイド / A design guide for Kubernetes in production (Japanese)

Revised and expanded version of a talk given at JAPAN CONTAINER DAYS V18.04 (https://containerdays.jp/) on 2018/04/19.

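Abstract: Kubernetes' rich feature set and high extensibility let it handle a wide range of real-world use cases, but that same breadth leaves some people unsure how best to use it. For people who have learned the basics of Kubernetes or have just started running it in production, I present design guidelines from three perspectives (application operations, infrastructure operations, and organization), based on my experience running Kubernetes in production at Mercari.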

Seigo Uchida

April 23, 2018

Transcript

  1. 11.

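    Terrorist attack: pyramid-style, centralized decision-making

    1. Detect the attack
    2. Report up to senior leadership
    3. Decision
    4. Relay to the field
    5. Action

    Reference: TEAM OF TEAMS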
  2. 16.

    Does this diverge from what Kubernetes itself aims for?

    Kubernetes is more than just a “container orchestrator”. It aims to eliminate the burden of orchestrating physical/virtual compute, network, and storage infrastructure, and enable application operators and developers to focus entirely on container-centric primitives for self-service operation.

    Quoted from Kubernetes Design and Architecture: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/architecture/architecture.md
  3. 17.

    Does this diverge from what Kubernetes itself aims for?

    1. Portable
    2. General-Purpose
    3. Meet users partway
    4. Flexible
    5. Extensible
    6. Automatable
    7. Advance the state of the art

    Quoted from Kubernetes Design and Architecture: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/architecture/architecture.md
  4. 32.

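    Each system requires its own engineering work

    Infrastructure / common platform side:
    • Cluster maintenance
    • Maintenance of common components
    • Deploy pipeline
    • Cluster-level monitoring
    • Cluster-level security
    • etc.

    Application side:
    • Application code
    • Tests
    • Containerization
    • Deployment
    • Application-level monitoring
    • etc.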
  5. 43.

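    Don't treat environments as special

    [Diagram: Service A Development, Service B Production, Service A Production, Service B Production]

    Even if you separate the development and production environments, you still have to keep one service from affecting another. If so, it makes no difference whether development and production are mixed together.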
  6. 44.

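    Create exactly one cluster per region

    1. AWS, GCP, and Heroku have no separate front door dedicated to development environments
    2. Users simply create accounts "for development"
    3. Both Google and GitHub run one cluster per region
    4. Should security-sensitive services go into the same cluster too?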
  7. 46.

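    Split clusters per product or service?

    [Diagram: inter-cluster communication between a Secure cluster and a Service A cluster, versus intra-cluster communication among Secure, Service A, and Default]

    Thanks to Network Policy, Istio, and similar tools, intra-cluster traffic is currently easier to control at the network level than inter-cluster traffic.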
  8. 50.

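    Cluster design summary

    1. Create exactly one cluster per region
    2. Separate environments using isolation techniques inside the cluster
    3. Provide nodes dedicated to a specific service only when truly necessary
    4. Creating a cluster per product/service is a last resort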
  9. 55.

    [Diagram: Service A Development, Service B Production, Service A Production, Service B Production]

    We want to isolate services as well as environments.
  10. 58.

    [Diagram: Service A Development, Service B Production, Service A Production, Service B Production]

    Network Policy lets you control traffic between Pods.
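
    As a minimal sketch of Pod-to-Pod traffic control, assuming a namespace called `service-a-production` (a hypothetical name) and a network plugin that enforces NetworkPolicy, a policy written against the `networking.k8s.io/v1` API could admit ingress only from Pods in the same namespace:

    ```yaml
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: allow-same-namespace-only   # hypothetical name
      namespace: service-a-production   # hypothetical namespace
    spec:
      podSelector: {}          # the policy applies to every Pod in this namespace
      policyTypes:
        - Ingress
      ingress:
        - from:
            - podSelector: {}  # only Pods from this same namespace may connect
    ```

    Traffic from other namespaces is dropped once this policy selects the Pods, which is one way to keep Service B from reaching Service A inside a shared cluster.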
  11. 63.

    RBAC lets you manage permissions in Kubernetes

    • Creating Deployments
    • Viewing Secrets
    • Deleting PVCs

    [Diagram: user account foo, Role, RoleBinding]
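
    The relationship between a user account, a Role, and a RoleBinding can be sketched as follows. The user `foo` comes from the slide; the namespace `service-a`, the role name, and the exact verbs are assumptions for illustration, using the `rbac.authorization.k8s.io/v1` API:

    ```yaml
    # Role: what may be done, scoped to one namespace
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: deployment-creator   # hypothetical name
      namespace: service-a       # hypothetical namespace
    rules:
      - apiGroups: ["apps"]
        resources: ["deployments"]
        verbs: ["create", "get", "list"]
    ---
    # RoleBinding: who gets the Role
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: foo-deployment-creator
      namespace: service-a
    subjects:
      - kind: User
        name: foo
        apiGroup: rbac.authorization.k8s.io
    roleRef:
      kind: Role
      name: deployment-creator
      apiGroup: rbac.authorization.k8s.io
    ```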
  12. 65.

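    Namespace Admin Role

    1. Grant the preset "admin" role to the product team
    2. Delegate administrative permissions under a specific Namespace to the product team
    3. The product team creates edit (read-write) and view (read-only) grants as needed
    4. Clearly defining areas of responsibility in the organization and expressing them in the system are two wheels of the same cart; neither can be missing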
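
    One possible way to express this delegation is a RoleBinding that attaches the built-in `admin` ClusterRole to the product team inside a single namespace. The group name `product-team-a` and the namespace `service-a` are hypothetical:

    ```yaml
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: product-team-admin
      namespace: service-a        # permissions apply only inside this namespace
    subjects:
      - kind: Group
        name: product-team-a      # hypothetical group from your identity provider
        apiGroup: rbac.authorization.k8s.io
    roleRef:
      kind: ClusterRole
      name: admin                 # built-in preset role
      apiGroup: rbac.authorization.k8s.io
    ```

    Because the binding is a namespaced RoleBinding, the team can manage resources and further bindings (for example to `edit` or `view`) in `service-a` without gaining any cluster-wide rights.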
  13. 66.

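    Custom Cluster Admin Role

    1. The preset "cluster-admin" role can do anything
    2. Holding a permission also means holding the responsibility that comes with it
    3. Prepare a "custom-cluster-admin" role with permissions stripped down from "cluster-admin"
    4. Grant the permissions needed to manage nodes and the namespaces used by the common platform
    5. For service/product namespaces, grant view permission, excluding secrets
    6. Stick strictly to managing the cluster and leave product-related matters to the product teams
    7. When a product's reliability needs to be ensured, SREs work as members of the product team rather than as cluster admins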
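
    A rough sketch of what such a reduced role might look like follows. The rule set, the group name `cluster-admins`, and the namespace `service-a` are all assumptions, not the author's actual configuration; the built-in `view` ClusterRole is used because it already omits Secrets:

    ```yaml
    # Cluster-wide permissions limited to node management (assumed rule set)
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: custom-cluster-admin
    rules:
      - apiGroups: [""]
        resources: ["nodes"]
        verbs: ["get", "list", "watch", "patch", "update"]
    ---
    # Read-only access to one product namespace via the built-in "view" role
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: cluster-admins-view
      namespace: service-a        # hypothetical product namespace
    subjects:
      - kind: Group
        name: cluster-admins      # hypothetical group
        apiGroup: rbac.authorization.k8s.io
    roleRef:
      kind: ClusterRole
      name: view
      apiGroup: rbac.authorization.k8s.io
    ```

    The ClusterRole takes effect only once a ClusterRoleBinding attaches it to the same group; that binding is omitted here for brevity.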
  14. 70.

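    Elements required for automatic / manual recovery and automatic / manual scaling

    1. Observable: Can you tell whether a container is healthy or unhealthy, and can you find the root cause when a problem occurs?
    2. Disposable: Can you immediately throw away containers that exited abnormally or that sit on a failing node?
    3. Immutable: Can you guarantee that the same container starts on rollback and on scale-out?
    4. Scalable: Have scaling thresholds been determined through load testing?
    5. Loosely Coupled: Can you deploy, roll back, and scale without having to consider dependencies?
    6. Graceful: Can containers start and stop without causing adverse effects?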
  15. 71.

    ① Observable

    1. Use a Liveness Probe
    2. Use a Readiness Probe
    3. Collect logs
    4. Collect metrics
    5. Tracing

    [Diagram: a container emitting logs, metrics, traces, and health checks]
  16. 72.

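    Liveness Probe

    With a Liveness Probe, Kubernetes restarts the Pod (container) when the health check fails. It is therefore used to determine whether the application has started correctly. Put the other way around, it is where you express the conditions under which you want an automatic restart, and it is part of the self-healing mechanism built into Kubernetes. It is also a double-edged sword: a loosely tuned health check causes unnecessary restarts.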
  17. 74.

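    Readiness Probe

    With a Readiness Probe, Kubernetes registers the Pod (container) with the Service (load balancer) once the health check passes. It is therefore used to determine whether the application, including things like its database connection, is in a state where it can return responses correctly, that is, whether it is Ready.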
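
    Both probes can be declared on the same container. The sketch below assumes a hypothetical image that serves `/healthz` (process is alive) and `/ready` (dependencies such as the database are reachable) on port 8080; the paths, port, and timings are illustrative and should be tuned per application:

    ```yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: probe-example
    spec:
      containers:
        - name: app
          image: example.com/app:1.0.0   # hypothetical image
          ports:
            - containerPort: 8080
          livenessProbe:                 # failure leads to a container restart
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 10      # give the process time to boot
            periodSeconds: 10
            failureThreshold: 3          # tolerate brief blips before restarting
          readinessProbe:                # failure removes the Pod from Service endpoints
            httpGet:
              path: /ready
              port: 8080
            periodSeconds: 5
    ```

    Keeping dependency checks in the readiness endpoint rather than the liveness endpoint is one way to avoid the unnecessary restarts mentioned for loosely tuned liveness checks.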
  18. 78.

    ④ Scalable

    1. Run load tests before release to determine scaling thresholds
    2. Use the Horizontal Pod Autoscaler
    3. (Use the Vertical Pod Autoscaler)
    4. Use a Pod Disruption Budget
    5. Pod Priority
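
    A Horizontal Pod Autoscaler for a Deployment might look like the sketch below, written against the `autoscaling/v2` API. The Deployment name `service-a`, the replica bounds, and the 70% CPU target are placeholders; in practice the threshold would come from the load tests in step 1:

    ```yaml
    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: service-a
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: service-a            # hypothetical Deployment
      minReplicas: 3
      maxReplicas: 10
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 70   # assumed threshold, derive yours from load tests
    ```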
  19. 80.

    Vertical Pod Autoscaler

    [Diagram: four containers with different resource allocations: CPU: 1 / Memory: 1GB; CPU: 1 / Memory: 2GB; CPU: 1 / Memory: 1GB; CPU: 2 / Memory: 2GB]
  20. 81.

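    Pod Disruption Budget

    PDB example: during planned node downtime such as node maintenance, do not allow the number of containers to drop by more than 20% of the total.

    • Number of containers: 10
    • Running containers: 8
    • Stopped containers: 2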
  21. 82.

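    Pod Disruption Budget

    When the cluster admin team evacuates containers from a node for planned maintenance (kubectl drain), Kubernetes evicts them while making sure the number of Pods never falls below what the PDB specifies. This is a mechanism that keeps the product team and the cluster admin team from affecting each other; in other words, it is a form of loose coupling.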
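
    The 20% rule from the example can be written as a PodDisruptionBudget. This sketch uses the `policy/v1` API and assumes the Pods carry a hypothetical `app: service-a` label:

    ```yaml
    apiVersion: policy/v1
    kind: PodDisruptionBudget
    metadata:
      name: service-a
    spec:
      maxUnavailable: 20%      # with 10 replicas, at most 2 may be evicted at once
      selector:
        matchLabels:
          app: service-a       # hypothetical label on the product team's Pods
    ```

    The product team owns this object, and `kubectl drain` run by the cluster admin team honors it, so neither side needs to coordinate manually for routine node maintenance.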
  22. 83.

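    ⑤ Loosely Coupled

    1. Give each container exactly one job
    2. Avoid hard-coding and dependencies as much as possible
       1. Use Labels
       2. Having no deploy order is better than having one
       3. Avoid Node Affinity as much as possible too
       4. Use a Service so that fixed IPs are also avoided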
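
    Labels and a Service together remove the need for fixed IPs. In the sketch below, the name `service-a`, the label, and the ports are assumptions; clients would connect to the Service's DNS name instead of any Pod address:

    ```yaml
    apiVersion: v1
    kind: Service
    metadata:
      name: service-a          # clients resolve this name via cluster DNS
    spec:
      selector:
        app: service-a         # any Pod with this label becomes a backend
      ports:
        - port: 80             # port exposed by the Service
          targetPort: 8080     # port the container listens on (assumed)
    ```

    Pods can be replaced, rescheduled, or scaled without clients noticing, since membership is decided by the label rather than by address.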
  23. 85.

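    Give each container exactly one job

    1. One process per container is the baseline
    2. When a single container (image) has multiple responsibilities, the benefits of containers are lost
       1. The Dockerfile becomes complicated
       2. Scaling conditions become complicated
       3. Monitoring becomes complicated
       4. Startup and shutdown handling become complicated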
  24. 89.

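    Designing operations

    1. What does "operation" mean here?
       1. Deploying containers (e.g. creating a Deployment)
       2. Maintaining the common platform (e.g. upgrading Nodes)
    2. The guideline is to leave as much as possible to autonomous systems and to reduce manual work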
  25. 90.

    Control Loop

    At the core of Kubernetes is a mechanism called the Control Loop. When Kubernetes manages a resource such as a pod, it holds two things, the desired state and the actual state, and it endlessly repeats the work of bringing the actual state closer to the desired state.
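
    The two states are visible on stored objects as `spec` and `status`. The excerpt below is an illustrative fragment of what reading back a Deployment might show, not a complete manifest; the name and the numbers are made up:

    ```yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: service-a      # hypothetical
    spec:
      replicas: 3          # desired state, written by the user
    status:
      replicas: 3
      readyReplicas: 2     # actual state, written by controllers: one Pod not yet ready
    ```

    In this snapshot the controllers still have work to do, and they keep acting until the reported status matches the spec.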
  26. 97.

    Take a declarative approach

    The declarative approach is key to the system’s self-healing and autonomic capabilities.

    Quoted from Kubernetes Design and Architecture: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/architecture/architecture.md
  27. 98.

    Take a declarative approach

    In particular, it should be straightforward (but not required) to manage declarative intent under version control, which is standard industry best practice and what Google does internally. Version control facilitates reproducibility, reversibility, and an audit trail. ... Version control enables the use of familiar tools and processes for change control, review, and conflict resolution.

    Quoted from Declarative application management in Kubernetes: https://docs.google.com/document/d/…
  28. 99.

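    Why is Kubernetes YAML-based?

    1. Compared with a DSL, YAML is supported in many more languages
    2. Many lint tools exist
    3. You cannot write it without learning the details of the API schema, but...
       1. Sooner or later you will want to, or need to, learn the API schema and Kubernetes concepts, so the effort is not wasted
       2. If anything, the consistency is a good thing

    Reference: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/architecture/architecture.md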
  29. 100.

    If you can write a YAML manifest...

        echo '
        apiVersion: apps/v…beta…
        kind: Deployment
        metadata:
          name: deployment-example
        spec:
          replicas: …
          revisionHistoryLimit: …
          template:
            metadata:
              labels:
                app: nginx
            spec:
              containers:
              - name: nginx
                image: nginx…
                ports:
                - containerPort: …
        ' | kubectl create -f -
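
    For reference, a complete manifest in the same spirit is sketched below against the stable `apps/v1` API rather than the beta version on the slide. The replica count, history limit, image tag, and port are assumptions chosen for illustration, and `selector` is included because `apps/v1` requires it:

    ```yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: deployment-example
    spec:
      replicas: 3                  # assumed value
      revisionHistoryLimit: 10     # assumed value
      selector:
        matchLabels:
          app: nginx               # must match the template labels below
      template:
        metadata:
          labels:
            app: nginx
        spec:
          containers:
            - name: nginx
              image: nginx:1.25    # assumed tag
              ports:
                - containerPort: 80   # assumed port
    ```

    Saved as a file, this can be submitted with `kubectl apply -f <file>`.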
  30. 101.

    The REST API is also available

        curl -X POST -H 'Content-Type: application/yaml' --data '
        apiVersion: apps/v…beta…
        kind: Deployment
        metadata:
          name: deployment-example
        spec:
          replicas: …
          revisionHistoryLimit: …
          template:
            metadata:
              labels:
                app: nginx
            spec:
              containers:
              - name: nginx
                image: nginx…
                ports:
                - containerPort: …
        ' http://…/apis/apps/v…/namespaces/default/deployment…
  31. 102.

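    Let's write YAML

    1. Isn't writing manifests that have been abstracted through a DSL, GUI, or the like actually the long way around?
    2. In response to the opinion that writing YAML is not engineering:
       1. The act of writing YAML may not be engineering in itself, but that view looks only at the writing
       2. The engineering is encapsulated by Kubernetes behind the YAML file
       3. Writing YAML alone completes the infrastructure work; manual work is no longer needed
       4. The time saved should go to other engineering tasks; building an abstraction on the REST API just to avoid writing YAML gets the priorities backwards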
  32. 103.

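    One resource per YAML file

    1. It is clear which resource is defined in which file
    2. It is clear which resource an operation targets
    3. Reusability is high
    4. When adding a resource, it is clear where to write it

    Reference: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/architecture/architecture.md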
  33. 110.
  34. 111.
  35. 122.

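    Summary and additional notes

    1. If you keep in mind what you want to achieve with Kubernetes and what its designers intended, the way to use each feature and the reason each element exists become clear
    2. Some topics, such as cluster network design, nodes, security, and monitoring details, are not covered, but the slide count was getting enormous, so this is where it stops for now
    3. Trying to apply every design and practice introduced here would take forever and you would never get to production, so strike a balance between what you do beforehand and what you do afterwards
       1. As for where the boundary lies, I think it is good to prepare enough that, after going to production, you have the slack (50% of your total capacity) to apply automation and the practices described here
       2. If you go to production with no preparation at all, you get buried in incident response and manual work and fall into a downward spiral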
  36. 123.

    End