Hosting Casual Talk #5 @ SAKURA Internet Fukuoka Office
2019/03/22 SAKURA Internet Inc., SAKURA Internet Research Center
Ryosuke Matsumoto / matsumotory / @matsumotory
Pod and Container Runtime Architecture
Ryosuke Matsumoto, Senior Researcher, SAKURA Internet Research Center
2019/03/25, Hosting Casual Talk #5 @ SAKURA Internet Fukuoka Office
(C) Copyright 1996-2019 SAKURA Internet Inc
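Ryosuke Matsumoto / matsumotory / @matsumotory
• Senior Researcher, SAKURA Internet Research Center
• Technical Advisor, Forkwell (Grooves Inc.)
• Visiting Researcher and Research Advisor, Pepabo Research Institute
• Lecturer, Security Camp
• Committee member, IPSJ Special Interest Group on Internet and Operation Technology
• Ph.D. in Informatics, Kyoto University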
Contents
1. Classification of Container Runtimes
2. Pod and Container Runtime Architecture
3. Summary
1. Classification of Container Runtimes
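A layered model of container runtimes
Common definition: CRI → container runtime → runtime
• Runtimes are often described this way, but saying that a runtime such as runc lives inside a "container runtime" is somewhat confusing.
Definition used in this talk: CRI → CRI runtime → OCI → OCI runtime
• Container runtimes are layered by role and split into a CRI runtime and an OCI runtime (*1). The two together are called the container runtime.
• CRI: Container Runtime Interface
• OCI: Open Container Initiative Runtime/Image Format Specification
*1 Ian Lewis of Google Cloud calls CRI runtimes High-Level Runtimes and OCI runtimes Low-Level Runtimes: https://www.ianlewis.org/en/container-runtimes-part-1-introduction-container-r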
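The basic layer model around containers
Orchestration → CRI → CRI runtime → OCI → OCI runtime → Pod and its containers
• CRI runtime (cri-o, containerd, etc.): receives container configuration from the orchestration layer via CRI, and manages Pods and container images.
• OCI runtime (runC, runsc, runnc, runV, kata-runtime, cc-runtime, etc.): starts containers after allocating resources and setting up permissions based on the container configuration, image, and so on.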
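Example: the basic layer model around containers
kubelet → CRI → containerd → OCI → runC → Pod and its containers
• CRI runtime (cri-o, containerd, etc.): receives container configuration from the orchestration layer via CRI, and manages Pods and container images.
• OCI runtime (runC, runsc, runnc, runV, kata-runtime, cc-runtime, etc.): starts containers after allocating resources and setting up permissions based on the container configuration, image, and so on.
• As long as each layer conforms to CRI and OCI, you can keep Kubernetes as the orchestration layer and freely swap the CRI runtime or the OCI runtime.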
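The swappability described above can be illustrated with a small Go sketch. This is not code from kubelet, containerd, or any real runtime: the `OCIRuntime` interface, the `CRIRuntime` struct, and the `namedOCI` type are hypothetical names invented for illustration, and the "start" operation only returns a string.

```go
package main

import "fmt"

// OCIRuntime is a hypothetical stand-in for a low-level runtime
// such as runC or runsc. It is not a real API.
type OCIRuntime interface {
	Start(containerID string) string
}

// namedOCI is a toy OCI runtime that only reports what it would do.
type namedOCI string

func (n namedOCI) Start(id string) string {
	return fmt.Sprintf("%s started %s", string(n), id)
}

// CRIRuntime is a hypothetical stand-in for a high-level runtime
// such as containerd or cri-o. It holds whichever OCI runtime is plugged in.
type CRIRuntime struct {
	name string
	oci  OCIRuntime
}

// RunContainer delegates the actual start to the configured OCI runtime.
func (c CRIRuntime) RunContainer(id string) string {
	return fmt.Sprintf("%s -> %s", c.name, c.oci.Start(id))
}

func main() {
	// The orchestrator side (kubelet here) stays the same;
	// only the lower layers are swapped.
	stacks := []CRIRuntime{
		{name: "containerd", oci: namedOCI("runc")},
		{name: "containerd", oci: namedOCI("runsc")},
		{name: "cri-o", oci: namedOCI("kata-runtime")},
	}
	for _, rt := range stacks {
		fmt.Println("kubelet -> " + rt.RunContainer("c1"))
	}
}
```

The point of the sketch is only that the caller depends on an interface at each boundary, so either layer can be replaced without touching the one above it.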
2. Pod and Container Runtime Architecture
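Pods and containers
• Kubernetes is being standardized as an orchestration tool under the CNCF
• A Pod groups multiple containers that are connected to one another
• A Pod is a sandbox built with cgroups and unshare()
  • In code it is almost always named PodSandbox
• Requirements placed on Pods
  • Security, performance, server consolidation efficiency, operations, and so on
• Pods are becoming extremely important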
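The growing importance of Pods
• The Pod implementation largely determines security, performance, consolidation efficiency, and operations
• Vendors have started implementing and releasing various Pod-related software
  • Google's gVisor (isolates containers with access control in userland)
  • Nabla-Containers (isolates containers with a userland unikernel)
  • AWS's Firecracker (isolates containers with a MicroVM)
  • Kata-Containers (isolates containers with a VM)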
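Basics of Pods and CRI / OCI runtimes
• Pods are basically created by the CRI runtime
• The API for Pods is specified in the CRI spec
  • https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/apis/cri/runtime/v1alpha2/api.proto
• The CRI runtime creates a Pod in response to, for example, `crictl runp`
  • In the CRI spec and in containerd's Go code this is `RunPodSandbox()`
• After the Pod is created, the OCI runtime (runc) starts containers inside it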
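Pods and OCI runtimes
• The OCI spec does not mention Pods at all
  • https://www.opencontainers.org/release-notices/v1-0-0
• The OCI spec defines the parameters needed to start a container, handlers, and so on
  • runtime-spec and image-spec
• Pods and containers are tightly related
• When you want to change Pod behavior, should you add the implementation to the CRI runtime?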
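Reading containerd's RunPodSandbox
• containerd is a representative CRI runtime implementation
• Inside RunPodSandbox(), getSandboxRuntime() selects the runtime-specific implementation
• That runtime-specific implementation, the runtime handler, is picked from the configured OCI runtimes
  • `ociRuntime, err := c.getSandboxRuntime(config, r.GetRuntimeHandler())`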
Reading containerd's RunPodSandbox
• getSandboxRuntime() checks the workload and then selects the runtime

    if untrustedWorkload(config) {
        // An untrusted workload must not also name an explicit handler.
        if runtimeHandler != "" && runtimeHandler != criconfig.RuntimeUntrusted {
            return criconfig.Runtime{}, errors.New("untrusted workload with explicit runtime handler is not allowed")
        }
        // An untrusted workload must not access the host.
        if hostAccessingSandbox(config) {
            return criconfig.Runtime{}, errors.New("untrusted workload with host access is not allowed")
        }
        // Prefer the dedicated untrusted-workload runtime when one is configured.
        if c.config.ContainerdConfig.UntrustedWorkloadRuntime.Type != "" {
            return c.config.ContainerdConfig.UntrustedWorkloadRuntime, nil
        }
        runtimeHandler = criconfig.RuntimeUntrusted
    }
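To make the selection logic above easier to experiment with, here is a minimal, self-contained Go sketch that mirrors it with simplified types. Everything here is an assumption made for illustration: the `service`, `sandboxConfig`, and `Runtime` types, the `"untrusted"` handler name, the runtime `Type` strings, and the fallback to a default runtime and a handler map after the untrusted block are not taken from the slide and are not containerd's actual definitions.

```go
package main

import (
	"errors"
	"fmt"
)

// Runtime and sandboxConfig are simplified, hypothetical stand-ins
// for the containerd CRI config and Pod sandbox config types.
type Runtime struct{ Type string }

type sandboxConfig struct {
	Annotations map[string]string
	HostAccess  bool
}

const (
	runtimeUntrusted    = "untrusted" // assumed handler name
	untrustedAnnotation = "io.kubernetes.cri.untrusted-workload"
)

type service struct {
	untrustedRuntime Runtime
	defaultRuntime   Runtime
	handlers         map[string]Runtime
}

// sandboxRuntime picks a runtime for a Pod sandbox. The untrusted branch
// follows the excerpt above; the default/handler lookup is an assumption.
func (s service) sandboxRuntime(cfg sandboxConfig, handler string) (Runtime, error) {
	if cfg.Annotations[untrustedAnnotation] == "true" {
		if handler != "" && handler != runtimeUntrusted {
			return Runtime{}, errors.New("untrusted workload with explicit runtime handler is not allowed")
		}
		if cfg.HostAccess {
			return Runtime{}, errors.New("untrusted workload with host access is not allowed")
		}
		if s.untrustedRuntime.Type != "" {
			return s.untrustedRuntime, nil
		}
		handler = runtimeUntrusted
	}
	if handler == "" {
		return s.defaultRuntime, nil
	}
	rt, ok := s.handlers[handler]
	if !ok {
		return Runtime{}, fmt.Errorf("no runtime for %q is configured", handler)
	}
	return rt, nil
}

func main() {
	s := service{
		untrustedRuntime: Runtime{Type: "runsc"},
		defaultRuntime:   Runtime{Type: "runc"},
		handlers:         map[string]Runtime{"kata": {Type: "kata-runtime"}},
	}
	untrusted := sandboxConfig{Annotations: map[string]string{untrustedAnnotation: "true"}}

	rt, err := s.sandboxRuntime(untrusted, "")
	fmt.Println("untrusted pod:", rt.Type, err)

	rt, err = s.sandboxRuntime(sandboxConfig{}, "kata")
	fmt.Println("explicit handler:", rt.Type, err)

	_, err = s.sandboxRuntime(untrusted, "kata")
	fmt.Println("untrusted + explicit handler:", err)
}
```

The sketch shows the key design choice: the CRI runtime does not implement sandboxing itself, it only decides which OCI runtime receives the Pod.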
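A definition exists for bypassing Pod handling to the OCI runtime
• The `untrusted-workload` setting exists to bypass Pod handling
• Instead of reimplementing Pods in the CRI runtime, the OCI runtime's implementation is used
• `untrusted-workload` hands Pod handling over to the OCI runtime
• This also lets the sandbox feature be offered when the OCI runtime is used with docker or on its own

    apiVersion: v1
    kind: Pod
    metadata:
      name: container-untrusted
      annotations:
        io.kubernetes.cri.untrusted-workload: "true"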
Implementing Pod handling on the OCI runtime side
• For example, gVisor creates a userland kernel as the Pod
  • `crictl runp --runtime=runsc pod-config.json`
• Pod handling is bypassed to runsc, gVisor's OCI runtime
  • gvisor-containerd-shim hooks into containerd's runtime handler
• runsc receives the Pod request and creates a sandbox equivalent to a Pod
  • `createSandboxProcess()` in `gvisor/runsc/sandbox/sandbox.go`
The extension mechanism differs by containerd version
1. The Untrusted Workload CRI extension (containerd v1.1 and later) is deprecated
2. In containerd v1.2 and later, bypass to the OCI runtime with a CRI runtime handler
  • https://github.com/google/gvisor-containerd-shim/blob/master/docs/runtime-handler-quickstart.md
3. In containerd v1.2 and later, bypass with a CRI runtime handler that uses shim v2
  • https://github.com/google/gvisor-containerd-shim/blob/master/docs/runtime-handler-shim-v2-quickstart.md
  • Runtime v2
    • https://github.com/containerd/containerd/tree/master/runtime/v2
  • Using containerd-shim-runsc-v1 also keeps the configuration simple
Kata-Containers works the same way
• Pod handling (CreateSandbox() in code) is bypassed from the CRI runtime
• kata-runtime receives Pod handling from cri-o, the CRI runtime
• CRI → RunPodSandbox() → cri-o → create subcommand → kata-runtime → CreateSandbox() → virtcontainers → various VM settings → hypervisor → start proxy → start shim-pod → start VM and agent → kata-runtime → Pod startup complete → notify cri-o of completion
• https://github.com/kata-containers/runtime/blob/master/cli/create.go#L89
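Starting a container from Docker
• When run through the Docker command, the VM is started first and the container afterwards
• The OCI runtime's `Create` command starts both the VM and the container
  • It calls CreateSandbox() and then starts the container
• The VM is started by inserting work into the container startup hooks defined in the OCI spec
  • Specifically, the `pre-start` hook performs the work needed to start the VM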
Pod bypassing becomes clean with shim v2
ref: https://github.com/kata-containers/documentation/blob/master/architecture.md
3. Summary
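Pod and container runtime architecture
• With k8s and containerd, the CRI runtime controls Pods
• For untrusted workloads, Pod handling is bypassed to the OCI runtime
  • `crictl runp` is passed through to the OCI runtime, which creates and starts the Pod with functions such as `CreateSandbox()` and `StartSandbox()`
• The OCI spec says nothing about Pods, yet every implementation realizes them in the OCI runtime
  • gVisor, Kata-Containers, Firecracker, Nabla-Containers, and others
  • The likely reason is to provide not only Pods but also sandbox functionality from the OCI runtime alone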