Pod and Container Runtime Architecture

Hosting Casual Talk #5 @ SAKURA internet Fukuoka Office

2019/03/22
SAKURA internet Inc.
SAKURA internet Research Center

Ryosuke Matsumoto (松本亮介 / まつもとりー) / @matsumotory

MATSUMOTO Ryosuke

March 22, 2019
Transcript

  1. Pod and Container Runtime Architecture
     • SAKURA internet Inc. (C) Copyright 1996-2019 SAKURA Internet Inc
     • SAKURA internet Research Center, 2019/03/25
     • Senior Researcher Ryosuke Matsumoto
     • Hosting Casual Talk #5 @ SAKURA internet Fukuoka Office
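  2. Ryosuke Matsumoto / まつもとりー / @matsumotory
     • Senior Researcher, SAKURA internet Research Center
     • Technical Advisor, Forkwell (Grooves Inc.)
     • Visiting Researcher and Research Advisor, Pepabo Research and Development Institute
     • Lecturer, Security Camp
     • Committee member, IPSJ Special Interest Group on Internet and Operation Technology
     • Ph.D. in Informatics, Kyoto University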
  3. Agenda
     1. Classifying container runtimes
     2. Pod and container runtime architecture
     3. Summary

  4. 1. Classifying container runtimes

  5. A layered model of container runtimes
     • Common definition: CRI → container runtime → runtime. This is somewhat hard to follow, because it places runtimes such as runc inside the "container runtime".
     • Layering by role instead: CRI → CRI runtime → OCI → OCI runtime. This talk defines a CRI runtime and an OCI runtime (*1) and calls the two together the container runtime.
     • CRI: Container Runtime Interface
     • OCI: Open Container Initiative Runtime/Image Format Specification
     • *1 Ian Lewis of Google Cloud calls the CRI runtime the High-Level Runtime and the OCI runtime the Low-Level Runtime: https://www.ianlewis.org/en/container-runtimes-part-1-introduction-container-r
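  6. Basic layer model around containers
     • Stack: orchestration → CRI → CRI runtime → OCI → OCI runtime → Pod and its containers
     • CRI runtime (cri-o, containerd, etc.): receives container configuration from the orchestration layer over CRI and manages Pods and container images
     • OCI runtime (runC, runsc, runnc, runV, kata-runtime, cc-runtime, etc.): takes the container configuration, image and so on, applies resource allocation and privilege separation, and starts the container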
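  7. Example: basic layer model around containers
     • Stack: kubelet → CRI → containerd → OCI → runC → Pod and its containers
     • As long as the layers conform to CRI and OCI, you can keep Kubernetes as the orchestration layer and swap the CRI runtime or the OCI runtime freely
     • CRI runtime (cri-o, containerd, etc.): receives container configuration from the orchestration layer over CRI and manages Pods and container images
     • OCI runtime (runC, runsc, runnc, runV, kata-runtime, cc-runtime, etc.): takes the container configuration, image and so on, applies resource allocation and privilege separation, and starts the container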
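The swappable layering described on this slide can be sketched in Go. This is a toy model under stated assumptions: `OCIRuntime`, `CRIRuntime`, `fakeOCI` and `StartPod` are hypothetical names invented for illustration and are not the real CRI or OCI APIs. It only shows that the orchestration-side call stays the same while the runtimes underneath are exchanged.

```go
package main

import "fmt"

// OCIRuntime is a hypothetical, heavily reduced stand-in for a low-level
// runtime such as runC or kata-runtime: it only starts one container.
type OCIRuntime interface {
	Run(containerID string) string
}

// fakeOCI pretends to start containers and reports what it did.
type fakeOCI struct{ name string }

func (f fakeOCI) Run(id string) string {
	return fmt.Sprintf("%s started container %s", f.name, id)
}

// CRIRuntime is a hypothetical stand-in for a high-level runtime such as
// containerd or cri-o: it manages the Pod and delegates container start-up.
type CRIRuntime struct {
	name string
	oci  OCIRuntime
}

// StartPod records the Pod, then starts every container through the OCI layer.
func (c CRIRuntime) StartPod(pod string, containers []string) []string {
	log := []string{fmt.Sprintf("%s created pod %s", c.name, pod)}
	for _, id := range containers {
		log = append(log, c.oci.Run(id))
	}
	return log
}

func main() {
	// The caller (the orchestration layer) issues the same call for both stacks.
	stacks := []CRIRuntime{
		{name: "containerd", oci: fakeOCI{name: "runC"}},
		{name: "cri-o", oci: fakeOCI{name: "kata-runtime"}},
	}
	for _, rt := range stacks {
		for _, line := range rt.StartPod("web", []string{"app", "sidecar"}) {
			fmt.Println(line)
		}
	}
}
```

Because `CRIRuntime` depends only on the `OCIRuntime` interface, replacing runC with another runtime does not touch the caller, which is the point the slide makes about CRI and OCI conformance.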
  8. 2. Pod and container runtime architecture

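  9. Pods and containers
     • Kubernetes is being standardized as an orchestration tool under the CNCF
     • A Pod groups multiple containers that can reach one another
     • A Pod is a sandbox built with cgroups and unshare()
       • In code, a Pod is almost always named Sandbox
     • Requirements placed on Pods
       • Security, performance, server consolidation efficiency, operability, and so on
     • Pods are becoming extremely important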
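  10. Pods matter more and attract more attention
     • The Pod implementation strongly affects security, performance, consolidation efficiency, and operations
     • Vendors have started implementing and releasing a variety of Pod-related software
       • Google gVisor (isolates containers through access control in userland)
       • Nabla Containers (isolates containers with a userland unikernel)
       • AWS Firecracker (isolates containers with a MicroVM)
       • Kata-Containers (isolates containers with a VM)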
  11. Basics of Pods and CRI / OCI runtimes
     • Pods are basically created by the CRI runtime
     • The Pod API is defined in the CRI specification
       • https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/apis/cri/runtime/v1alpha2/api.proto
     • The CRI runtime creates a Pod in response to, for example, `crictl runp`
       • In the CRI spec and in containerd's Go code this is `RunPodSandbox()`
     • Once the Pod exists, the OCI runtime (runc) starts containers inside the Pod
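  12. Pods and OCI runtimes
     • The OCI spec does not mention Pods at all
       • https://www.opencontainers.org/release-notices/v1-0-0
     • The OCI spec defines things such as the parameters and handlers needed to start a container
       • runtime-spec and image-spec
     • Pods and containers are closely related
     • So when you want to change how a Pod behaves, should the extra implementation go into the CRI runtime?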
  13. Reading containerd's RunPodSandbox
     • containerd is a representative CRI runtime implementation
     • Inside RunPodSandbox(), getSandboxRuntime() calls the runtime-specific implementation
     • That runtime-specific implementation, the runtime handler, is called from the OCI runtime
     • `ociRuntime, err := c.getSandboxRuntime(config, r.GetRuntimeHandler())`
  14. Reading containerd's RunPodSandbox
     • getSandboxRuntime() checks the workload and selects the runtime to call

         if untrustedWorkload(config) {
             // An untrusted Pod may not also name a different runtime handler.
             if runtimeHandler != "" && runtimeHandler != criconfig.RuntimeUntrusted {
                 return criconfig.Runtime{}, errors.New("untrusted workload with explicit runtime handler is not allowed")
             }
             // An untrusted Pod may not access the host.
             if hostAccessingSandbox(config) {
                 return criconfig.Runtime{}, errors.New("untrusted workload with host access is not allowed")
             }
             // Prefer the dedicated untrusted-workload runtime when configured.
             if c.config.ContainerdConfig.UntrustedWorkloadRuntime.Type != "" {
                 return c.config.ContainerdConfig.UntrustedWorkloadRuntime, nil
             }
             runtimeHandler = criconfig.RuntimeUntrusted
         }
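  15. A setting exists for bypassing Pod handling to the OCI runtime
     • The `untrusted-workload` setting exists to bypass Pod handling
     • Rather than reimplementing Pods inside the CRI runtime, the OCI runtime's implementation is used
     • `untrusted-workload` bypasses Pod handling to the OCI runtime
     • This also makes the sandbox feature available when the runtime is used from docker or on its own

         apiVersion: v1
         kind: Pod
         metadata:
           name: container-untrusted
           annotations:
             io.kubernetes.cri.untrusted-workload: "true"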
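To make the selection logic easier to try out, here is a self-contained Go sketch that follows the shape of the containerd excerpt and reads the annotation from the manifest. It is not containerd's code: `Config`, `selectRuntime`, the handler name `"untrusted"` and the final handler lookup are simplified assumptions added for illustration. Only the annotation key and the two error messages come from the material above.

```go
package main

import (
	"errors"
	"fmt"
)

// Annotation key as it appears in the Pod manifest.
const untrustedAnnotation = "io.kubernetes.cri.untrusted-workload"

// runtimeUntrusted is a hypothetical handler name used only in this sketch.
const runtimeUntrusted = "untrusted"

// Config is a hypothetical, flattened stand-in for the CRI plugin config.
type Config struct {
	DefaultRuntime   string
	UntrustedRuntime string            // empty when not configured
	Handlers         map[string]string // handler name -> OCI runtime
}

// selectRuntime decides which OCI runtime should create the Pod sandbox.
func selectRuntime(cfg Config, annotations map[string]string, handler string, hostAccess bool) (string, error) {
	if annotations[untrustedAnnotation] == "true" {
		if handler != "" && handler != runtimeUntrusted {
			return "", errors.New("untrusted workload with explicit runtime handler is not allowed")
		}
		if hostAccess {
			return "", errors.New("untrusted workload with host access is not allowed")
		}
		if cfg.UntrustedRuntime != "" {
			return cfg.UntrustedRuntime, nil
		}
		handler = runtimeUntrusted
	}
	if handler == "" {
		return cfg.DefaultRuntime, nil
	}
	// Assumed fallback: look the handler up in a table of configured runtimes.
	rt, ok := cfg.Handlers[handler]
	if !ok {
		return "", fmt.Errorf("no runtime for %q is configured", handler)
	}
	return rt, nil
}

func main() {
	cfg := Config{DefaultRuntime: "runc", UntrustedRuntime: "runsc"}
	annotated := map[string]string{untrustedAnnotation: "true"}
	rt, err := selectRuntime(cfg, annotated, "", false)
	fmt.Println(rt, err)
}
```

With this configuration an annotated Pod is routed to `runsc`, while a Pod without the annotation and without a handler falls through to the default runtime.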
  16. Implementing Pod handling on the OCI runtime side
     • gVisor, for example, creates a userland kernel as the Pod
       • `crictl runp --runtime=runsc pod-config.json`
     • Pod handling is bypassed to runsc, gVisor's OCI runtime
       • gvisor-containerd-shim hooks into containerd's runtime handler
     • runsc receives the Pod request and creates a sandbox equivalent to a Pod
       • `createSandboxProcess()` in `gvisor/runsc/sandbox/sandbox.go`
  17. The extension mechanism differs by containerd version
     1. The Untrusted Workload CRI extension of containerd v1.1 and later is deprecated
     2. containerd v1.2 and later: bypass to the OCI runtime through a CRI runtime handler
        • https://github.com/google/gvisor-containerd-shim/blob/master/docs/runtime-handler-quickstart.md
     3. containerd v1.2 and later: bypass through a CRI runtime handler that uses shim v2
        • https://github.com/google/gvisor-containerd-shim/blob/master/docs/runtime-handler-shim-v2-quickstart.md
        • Runtime v2
          • https://github.com/containerd/containerd/tree/master/runtime/v2
        • Using containerd-shim-runsc-v1 also keeps the configuration simple
  18. Kata-Containers works the same way
     • Pod management (`CreateSandbox()` in code) is bypassed from the CRI runtime
     • kata-runtime also takes over Pod management from cri-o, the CRI runtime
     • CRI → RunPodSandbox() → cri-o → create subcommand → kata-runtime → CreateSandbox() → virtcontainers → various VM settings → hypervisor → start proxy → start shim-pod → start agent inside the VM → kata-runtime → Pod start-up complete → notify cri-o of completion
     • https://github.com/kata-containers/runtime/blob/master/cli/create.go#L89
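  19. Starting a container from Docker
     • When run through the Docker command, the VM is likewise started before the container
     • The OCI runtime's `Create` command starts both the VM and the container
       • The container is started after CreateSandbox()
     • The VM is started by inserting work into the hooks that the OCI spec defines for container start-up
       • Specifically, the `pre-start` hook performs the processing needed to start the VM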
  20. The Pod bypass also becomes clean with shim v2
     • ref: https://github.com/kata-containers/documentation/blob/master/architecture.md

  21. 3. Summary

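  22. Pod and container runtime architecture
     • k8s and containerd's CRI runtime control Pods
     • For untrusted workloads, Pod management is bypassed to the OCI runtime
       • `crictl runp` is bypassed to the OCI runtime, which creates and starts the Pod with functions such as `CreateSandbox()` and `StartSandbox()`
     • The Pod is not specified in the OCI spec, yet every implementation realizes it in the OCI runtime
       • gVisor, Kata-Containers, Firecracker, Nabla Containers, and others
     • My reading is that this is done so that the OCI runtime can provide not just Pods but sandboxing in general, even when used on its own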