Architecture of Pods and Container Runtimes

Hosting Casual Talk #5 @ SAKURA internet Fukuoka Office

2019/03/22
SAKURA internet Inc.
SAKURA internet Research Center

Ryosuke Matsumoto / Matsumotory / @matsumotory

MATSUMOTO Ryosuke

March 22, 2019

Transcript

  1. SAKURA internet Inc.
    (C) Copyright 1996-2019 SAKURA Internet Inc
    SAKURA internet Research Center
    Architecture of Pods and Container Runtimes
    2019/03/25 Senior Researcher Ryosuke Matsumoto
    Hosting Casual Talk #5 @ SAKURA internet Fukuoka Office

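  2. Ryosuke Matsumoto / Matsumotory / @matsumotory
    • Senior Researcher, SAKURA internet Research Center
    • Technical Advisor, Forkwell, Grooves Inc.
    • Visiting Researcher and Research Advisor, Pepabo Research and Development Institute
    • Lecturer, Security Camp
    • Committee member, IPSJ Special Interest Group on Internet and Operation Technology
    • Ph.D. in Informatics, Kyoto University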

  3. Table of contents
    1. Classification of container runtimes
    2. Pod and container runtime architecture
    3. Summary

  4. Section 1: Classification of container runtimes

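  5. Modeling container runtimes as layers
    Common definition: CRI → container runtime → runtime
    Runtimes are often defined this way, but saying that a container runtime contains another runtime such as runc is somewhat hard to follow.
    Layer model by runtime role: CRI → CRI runtime → OCI → OCI runtime
    Here the two layers are defined as the CRI runtime and the OCI runtime ※1, and the two together are called the container runtime.
    CRI: Container Runtime Interface
    OCI: Open Container Initiative Runtime/Image Format Specification
    ※1 Ian Lewis of Google Cloud defines the CRI runtime as the High-Level Runtime and the OCI runtime as the Low-Level Runtime
    https://www.ianlewis.org/en/container-runtimes-part-1-introduction-container-r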

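  6. Basic layer model around containers
    Layers: orchestration → CRI → CRI runtime → OCI → OCI runtime → Pod and its containers
    • CRI runtime (cri-o, containerd, etc.): receives container configuration from the orchestration layer via CRI and manages Pods and container images
    • OCI runtime (runC, runsc, runnc, runV, kata-runtime, cc-runtime, etc.): takes the container configuration, image, and so on, allocates resources and separates privileges for the container, and starts it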

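  7. Example: basic layer model around containers
    Layers: kubelet → CRI → containerd → OCI → runC → Pod and its containers
    • As long as the components conform to CRI and OCI, the orchestration layer can stay on Kubernetes while the CRI runtime and the OCI runtime are swapped freely
    • CRI runtime (cri-o, containerd, etc.): receives container configuration from the orchestration layer via CRI and manages Pods and container images
    • OCI runtime (runC, runsc, runnc, runV, kata-runtime, cc-runtime, etc.): takes the container configuration, image, and so on, allocates resources and separates privileges for the container, and starts it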
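
The swap-ability described on this slide can be sketched in Go. This is only a toy model under stated assumptions: `OCIRuntime`, `CRIRuntime`, `namedOCI`, and `StartContainer` are hypothetical names invented for illustration and are not the real CRI or OCI interfaces. The point is that the caller on the orchestration side keeps making the same call while the layers underneath change.

```go
package main

import "fmt"

// OCIRuntime is a hypothetical stand-in for a low-level runtime
// such as runC, runsc, or kata-runtime.
type OCIRuntime interface {
	Run(containerID string) string
}

// namedOCI is a toy OCI runtime that only reports what it would do.
type namedOCI string

func (n namedOCI) Run(id string) string {
	return fmt.Sprintf("%s started container %s", string(n), id)
}

// CRIRuntime is a hypothetical stand-in for a high-level runtime
// such as containerd or cri-o. It holds whichever OCI runtime is configured.
type CRIRuntime struct {
	Name string
	OCI  OCIRuntime
}

// StartContainer models the CRI runtime delegating the actual start
// to the OCI runtime below it.
func (c CRIRuntime) StartContainer(id string) string {
	return fmt.Sprintf("%s -> %s", c.Name, c.OCI.Run(id))
}

func main() {
	// The same orchestration-facing call, with different runtimes underneath.
	stacks := []CRIRuntime{
		{Name: "containerd", OCI: namedOCI("runC")},
		{Name: "containerd", OCI: namedOCI("runsc")},
		{Name: "cri-o", OCI: namedOCI("kata-runtime")},
	}
	for _, s := range stacks {
		fmt.Println(s.StartContainer("c1"))
	}
}
```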

  8. Section 2: Pod and container runtime architecture

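  9. Pods and containers
    • Kubernetes is being standardized as the orchestration tool under the CNCF
    • A Pod groups multiple containers that are connected to one another
    • A Pod is a sandbox built with cgroup(), unshare(), and similar calls
    • In code, a Pod is almost always named Sandbox
    • Requirements placed on Pods
      • Security, performance, packing efficiency on servers, operational techniques, and so on
    • Pods are becoming extremely important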

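  10. The importance of Pods, and attention to them, is growing
    • Security, performance, packing efficiency, and operational techniques all depend heavily on how the Pod is built
    • Several companies have started implementing and releasing Pod-related software
      • Google's gVisor (isolates containers with access control in userland)
      • Nabla-Containers (isolates containers with a userland unikernel)
      • AWS's Firecracker (isolates containers with a MicroVM)
      • Kata-Containers (isolates containers with a VM)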

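  11. Basics of Pods and the CRI / OCI runtimes
    • A Pod is basically created by the CRI runtime
    • The API for Pods is defined in the CRI specification
      • https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/apis/cri/runtime/v1alpha2/api.proto
    • The CRI runtime creates a Pod in response to a request such as `crictl runp`
      • In the CRI spec and in containerd's Go code this is `RunPodSandbox()`
    • After the Pod is created, the OCI runtime (runc) starts containers inside the Pod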
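
The ordering on this slide (Pod sandbox first, containers second) can be sketched as a small in-memory model. Everything here is an assumption for illustration: `criRuntime`, `sandbox`, and the ID format are hypothetical, and only the method name `RunPodSandbox` is borrowed from the CRI spec. It is not containerd's implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// sandbox is a hypothetical in-memory stand-in for a Pod sandbox.
type sandbox struct {
	containers []string
}

// criRuntime is a toy model of a CRI runtime; it is not containerd's API.
type criRuntime struct {
	sandboxes map[string]*sandbox
}

func newCRIRuntime() *criRuntime {
	return &criRuntime{sandboxes: map[string]*sandbox{}}
}

// RunPodSandbox models the CRI call of the same name: it creates the
// Pod sandbox and returns its ID (the ID format is made up).
func (r *criRuntime) RunPodSandbox(name string) string {
	id := "sandbox-" + name
	r.sandboxes[id] = &sandbox{}
	return id
}

// StartContainer models handing a container to the OCI runtime.
// It fails unless the Pod sandbox already exists.
func (r *criRuntime) StartContainer(sandboxID, name string) error {
	sb, ok := r.sandboxes[sandboxID]
	if !ok {
		return errors.New("pod sandbox not found: " + sandboxID)
	}
	sb.containers = append(sb.containers, name)
	return nil
}

func main() {
	rt := newCRIRuntime()

	// Starting a container before the Pod exists is rejected.
	if err := rt.StartContainer("sandbox-web", "nginx"); err != nil {
		fmt.Println("before RunPodSandbox:", err)
	}

	// Create the Pod sandbox, then start a container inside it.
	id := rt.RunPodSandbox("web")
	if err := rt.StartContainer(id, "nginx"); err != nil {
		fmt.Println("unexpected error:", err)
		return
	}
	fmt.Println("containers in", id, "=", rt.sandboxes[id].containers)
}
```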

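  12. Pods and the OCI runtime
    • The OCI spec does not mention Pods at all
      • https://www.opencontainers.org/release-notices/v1-0-0
    • The OCI spec defines things such as the parameters and handlers needed to start a container
      • runtime-spec and image-spec
    • Pods and containers are closely related
    • When the behavior of a Pod needs to change, should the change be implemented in the CRI runtime?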

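  13. A look at containerd's RunPodSandbox
    • containerd is a representative implementation of a CRI runtime
    • Inside RunPodSandbox(), getSandboxRuntime() selects the runtime-specific implementation to call
    • That runtime-specific implementation, the runtime handler, is invoked from the OCI runtime
    • `ociRuntime, err := c.getSandboxRuntime(config, r.GetRuntimeHandler())`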

  14. A look at containerd's RunPodSandbox
    • getSandboxRuntime() checks the workload and selects the runtime to call
        if untrustedWorkload(config) {
            // An untrusted workload may not name any handler other than the untrusted one.
            if runtimeHandler != "" && runtimeHandler != criconfig.RuntimeUntrusted {
                return criconfig.Runtime{}, errors.New("untrusted workload with explicit runtime handler is not allowed")
            }
            // An untrusted workload may not access the host.
            if hostAccessingSandbox(config) {
                return criconfig.Runtime{}, errors.New("untrusted workload with host access is not allowed")
            }
            // Use the dedicated untrusted-workload runtime when one is configured.
            if c.config.ContainerdConfig.UntrustedWorkloadRuntime.Type != "" {
                return c.config.ContainerdConfig.UntrustedWorkloadRuntime, nil
            }
            runtimeHandler = criconfig.RuntimeUntrusted
        }
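
The excerpt above depends on containerd internals, so here is a self-contained sketch of the same selection logic that can be run on its own. The types `Runtime`, `Config`, and `Sandbox`, the function `selectRuntime`, the constant value `"untrusted"`, and the fallback after the excerpt ends (default runtime, then lookup by handler name) are all assumptions made for illustration. Only the three checks inside the `if` mirror the slide.

```go
package main

import (
	"errors"
	"fmt"
)

// runtimeUntrusted stands in for criconfig.RuntimeUntrusted (value assumed).
const runtimeUntrusted = "untrusted"

// Runtime, Config and Sandbox are simplified, hypothetical types.
type Runtime struct{ Type string }

type Config struct {
	DefaultRuntime           Runtime
	UntrustedWorkloadRuntime Runtime
	Runtimes                 map[string]Runtime
}

type Sandbox struct {
	Untrusted  bool // the Pod asked to be treated as an untrusted workload
	HostAccess bool // the Pod shares host namespaces
}

// selectRuntime mirrors the checks in the excerpt, then falls back to an
// assumed default/lookup step that the slide does not show.
func selectRuntime(cfg Config, sb Sandbox, handler string) (Runtime, error) {
	if sb.Untrusted {
		if handler != "" && handler != runtimeUntrusted {
			return Runtime{}, errors.New("untrusted workload with explicit runtime handler is not allowed")
		}
		if sb.HostAccess {
			return Runtime{}, errors.New("untrusted workload with host access is not allowed")
		}
		if cfg.UntrustedWorkloadRuntime.Type != "" {
			return cfg.UntrustedWorkloadRuntime, nil
		}
		handler = runtimeUntrusted
	}
	if handler == "" {
		return cfg.DefaultRuntime, nil
	}
	rt, ok := cfg.Runtimes[handler]
	if !ok {
		return Runtime{}, fmt.Errorf("no runtime for %q is configured", handler)
	}
	return rt, nil
}

func main() {
	cfg := Config{
		DefaultRuntime: Runtime{Type: "runc"},
		Runtimes:       map[string]Runtime{"runsc": {Type: "runsc"}},
	}
	fmt.Println(selectRuntime(cfg, Sandbox{}, ""))
	fmt.Println(selectRuntime(cfg, Sandbox{}, "runsc"))
	fmt.Println(selectRuntime(cfg, Sandbox{Untrusted: true}, "runsc"))
}
```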

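  15. A setting exists for bypassing Pod handling to the OCI runtime
    • The `untrusted-workload` setting bypasses Pod handling
    • Instead of re-implementing Pods in the CRI runtime, the OCI runtime's implementation is used
    • `untrusted-workload` bypasses Pod handling to the OCI runtime
    • This also lets the sandbox feature be offered when the OCI runtime is used from docker or on its own
        apiVersion: v1
        kind: Pod
        metadata:
          name: container-untrusted
          annotations:
            io.kubernetes.cri.untrusted-workload: "true"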
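
How a runtime might read this annotation can be sketched in a few lines of Go. The annotation key is taken from the manifest above. The helper name `isUntrusted` and the rule that only the exact string `"true"` counts are assumptions for illustration, not a statement about containerd's code.

```go
package main

import "fmt"

// The annotation key shown in the Pod manifest on the slide.
const untrustedWorkloadAnnotation = "io.kubernetes.cri.untrusted-workload"

// isUntrusted is a hypothetical helper: it reports whether the Pod
// annotations request the untrusted-workload path. Treating only the
// exact string "true" as a match is an assumption.
func isUntrusted(annotations map[string]string) bool {
	return annotations[untrustedWorkloadAnnotation] == "true"
}

func main() {
	annotated := map[string]string{untrustedWorkloadAnnotation: "true"}
	plain := map[string]string{}

	fmt.Println("annotated pod untrusted:", isUntrusted(annotated))
	fmt.Println("plain pod untrusted:", isUntrusted(plain))
}
```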

  16. Implementing Pod handling on the OCI runtime side
    • For example, gVisor creates a userland kernel as the Pod
      • `crictl runp --runtime=runsc pod-config.json`
    • Pod handling is bypassed to runsc, gVisor's OCI runtime
      • gvisor-containerd-shim hooks into containerd's runtime handler
    • runsc receives the Pod request and creates a sandbox equivalent to a Pod
      • `createSandboxProcess()` in `gvisor/runsc/sandbox/sandbox.go`

  17. The extension mechanism differs by containerd version
    1. The Untrusted Workload CRI extension of containerd v1.1 and later is deprecated
    2. containerd v1.2 and later: bypass to the OCI runtime with a CRI runtime handler
      • https://github.com/google/gvisor-containerd-shim/blob/master/docs/runtime-handler-quickstart.md
    3. containerd v1.2 and later: bypass with a CRI runtime handler that uses shim v2
      • https://github.com/google/gvisor-containerd-shim/blob/master/docs/runtime-handler-shim-v2-quickstart.md
      • Runtime v2
        • https://github.com/containerd/containerd/tree/master/runtime/v2
      • With containerd-shim-runsc-v1 the configuration also becomes simple

  18. The same applies to Kata-Containers
    • Management of the Pod (CreateSandbox() in code) is bypassed from the CRI runtime
    • kata-runtime also takes over Pod management from the CRI runtime cri-o
    • CRI → RunPodSandbox() → cri-o → create subcommand → kata-runtime → CreateSandbox() → virtcontainers → various VM settings → hypervisor → proxy start → shim-pod start → agent start inside the VM → kata-runtime → Pod start complete → completion reported to cri-o
    • https://github.com/kata-containers/runtime/blob/master/cli/create.go#L89

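  19. Starting a container from Docker
    • When run through the Docker command as well, the VM is started first and then the container
    • The OCI runtime's `Create` command starts both the VM and the container
      • CreateSandbox() runs first, then the container is started
    • The VM is started by inserting processing into the hooks that the OCI spec defines for container startup
      • Specifically, the `pre-start` hook performs the processing needed to start the VM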

  20. Bypassing Pod handling also becomes clean with shim v2
    ref: https://github.com/kata-containers/documentation/blob/master/architecture.md

  21. Section 3: Summary

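  22. Architecture of Pods and container runtimes
    • The CRI runtime side, k8s and containerd, controls Pods
    • For untrusted workloads, Pod management is bypassed to the OCI runtime
      • `crictl runp` is bypassed to the OCI runtime, which creates and starts the Pod with functions such as `CreateSandbox()` and `StartSandbox()`
    • The OCI spec does not cover Pods, yet every implementation realizes them in the OCI runtime
      • gVisor, Kata-Containers, Firecracker, Nabla-Containers, and others
      • The likely reason is to offer sandbox functionality, not only Pods, from the OCI runtime on its own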
