Superorganism-Type Data Center OS and OCI Runtimes

The Future of Technology as Envisioned by Hatena × Sakura: Containers and Distributed Data Centers

2019/03/20
SAKURA Internet Inc.
SAKURA Internet Research Center
Senior Researcher
Ryosuke Matsumoto / Matsumotory / @matsumotory

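  Slide 1

    SAKURA Internet Inc. / SAKURA Internet Research Center
    Superorganism-Type Data Center OS and OCI Runtimes
    2019/03/20, Senior Researcher Ryosuke Matsumoto
    The Future of Technology as Envisioned by Hatena × Sakura: Containers and Distributed Data Centers
    (C) Copyright 1996-2019 SAKURA Internet Inc.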
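  Slide 2

    Ryosuke Matsumoto / Matsumotory / @matsumotory
    • Senior Researcher, SAKURA Internet Research Center
    • Technical Advisor, Forkwell, Grooves Inc.
    • Visiting Researcher and Research Advisor, Pepabo Research Institute
    • Lecturer, Security Camp
    • Committee member, IPSJ Special Interest Group on Internet and Operation Technology
    • Ph.D. (Informatics), Kyoto University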
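  Slide 3

    Contents
    1. Background and Goals
    2. The Superorganism-Type Data Center
    3. The Superorganism-Type Data Center OS and Containers
    4. Survey and Experiments on OCI Runtimes for Containers
    5. Summary
    * This talk is based on: Ryosuke Matsumoto, Yuuki Tsubouchi, Gosuke Miyashita, "A Reactive Container Execution Platform toward a Distributed Data Center OS", IPSJ SIG Technical Report, Internet and Operation Technology (IOT), No.2019-IOT-44, Vol.27, pp.1-8, March 2019.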
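  Slide 5

    Concentration into large-scale data centers
    • Data centers are growing larger and becoming more concentrated
    • Computing resources and costs become more efficient
    • Cloud adoption has accelerated considerably over the past few years
    • As the technical landscape shifts, OSS and cloud services are also changing rapidly
    • Designs that are resilient to change and do not depend heavily on particular software or vendors are urgently needed
    • Abstraction of service functions and loosely coupled designs have become widespread
    • Cloud native, multi-cloud, and microservices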
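  Slide 6

    From concentrated to distributed data centers
    • With the cloud as a premise, service design is moving from monoliths to microservices
    • Each function of a service is implemented as a small service and connected via gRPC or similar
    • Development and operation by diverse teams per microservice become more efficient
    • The impact of scaling and failures is localized
    • Bandwidth shortages and latency between microservices must be reduced
    • Even the distance between large data centers (for example, between Tokyo and Ishikari) is starting to be discussed
    • Bandwidth is also becoming scarce as sensors and devices, not only servers, grow more capable and more numerous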
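  Slide 8

    This talk
    • Introduces the concept and vision of the superorganism-type data center
    • Asks what a superorganism-type data center OS requires
    • Examined first in the context of realistic web applications
    • How containers should behave when computing resources are distributed
    • Containers as the processes and threads of a data center OS
    • Discusses why container reactivity is important
    • Surveys current industry efforts and classifies container runtimes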
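  Slide 17

    Transparency and container reactivity
    • Containers need to process work transparently and organically across data centers
    • Treat the various container runtimes as if they were processes or threads
    • Containers must be able to change their state reactively
    • Match resource allocation to changes in access reactively, not by predicting them in advance
    • Organic coordination across highly distributed, superorganism-like data centers
    • Containers need to coordinate with each other and move quickly between servers and between data centers
    • Reactivity comparable to that of processes and threads in an OS will be required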
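The core idea of this slide, allocating resources in reaction to observed access rather than by predicting it in advance, can be illustrated with a minimal sketch. It assumes a hypothetical fixed capacity per container (requests per second) and recomputes the desired number of containers from the latest observation only; `desired_containers` and `reconcile` are illustrative names, not part of the original work.

```python
import math


def desired_containers(observed_rps: float, capacity_per_container: float,
                       min_containers: int = 0, max_containers: int = 100) -> int:
    """Return how many containers should run for the load just observed.

    Reactive, not predictive: only the latest measurement is used, so the
    allocation follows changes in access as they happen.
    """
    if capacity_per_container <= 0:
        raise ValueError("capacity_per_container must be positive")
    needed = math.ceil(observed_rps / capacity_per_container)
    # Clamp to the allowed range (min_containers=0 permits scaling to zero).
    return max(min_containers, min(max_containers, needed))


def reconcile(current: int, desired: int) -> int:
    """Return the change to apply: positive means start, negative means stop."""
    return desired - current


if __name__ == "__main__":
    running = 0
    # A burst of traffic arrives and then disappears.
    for rps in [0, 50, 480, 120, 0]:
        target = desired_containers(rps, capacity_per_container=100)
        delta = reconcile(running, target)
        running = target
        print(f"rps={rps:>4} -> containers={target} (delta {delta:+d})")
```

Because nothing is forecast, the allocation trails the load by one observation; how small that lag can be depends on how fast a container can be started, moved, or stopped.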
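  Slide 19

    Web service platform model for the container era
    Ryosuke Matsumoto, Uchio Kondo, Yusuke Miyake, Kenji Rikitake, Kentaro Kuribayashi, "FastContainer: A System Architecture with Homeostasis That Adapts Quickly to Changes in the Execution Environment", Proceedings of the Internet and Operation Technology Symposium 2017, 2017, pp.89-97 (2017-11-30), December 2017.
    ← This talk digs deeper into the part marked on the figure.
    [Figure: layers of the model and example components]
    • Orchestration Layer: GKE, ECS, Marathon, Kubernetes, Docker Swarm
    • Strategy Layer: Rancher, FastContainer
    • Service Layer: web application or service on containers
    • Infrastructure Layer: GCP, Azure, AWS, OpenStack, Mesos, bare metal, LinuxKit
    • Container Runtime Layer: Docker, containerd, LXC, Haconiwa, gVisor, Kata Containers
    • Container Runtime Interface (CRI)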
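  Slide 20

    Modeling container runtimes as layers
    Common definition: CRI → container runtime → runtime. Container runtimes are often defined this way, but saying that a container runtime contains yet another "runtime" such as runc is somewhat confusing.
    Proposed: CRI → CRI runtime → OCI → OCI runtime. We model container runtimes as layers according to the role each runtime plays and define them as the CRI runtime and the OCI runtime (*1). Together, these two runtimes make up the container runtime.
    CRI: Container Runtime Interface
    OCI: Open Container Initiative Runtime/Image Format Specification
    *1 Ian Lewis of Google Cloud calls CRI runtimes "high-level runtimes" and OCI runtimes "low-level runtimes". https://www.ianlewis.org/en/container-runtimes-part-1-introduction-container-r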
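  Slide 21

    Basic layer model around containers
    Orchestration → CRI → CRI runtime → OCI → OCI runtime → Pods and containers
    • CRI runtime (cri-o, containerd, etc.): receives container configuration via CRI as directed by the orchestration layer, and manages container images
    • OCI runtime (runC, runsc, runnc, runV, kata-runtime, cc-runtime, etc.): starts containers from their configuration and images, performing resource allocation and privilege separation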
  Slide 22

    Example: basic layer model around containers
    kubelet → CRI → containerd → OCI → runC → Pods and containers
    • containerd plays the CRI runtime role (receiving container configuration via CRI and managing images)
    • runC plays the OCI runtime role (starting containers with resource allocation and privilege separation)
    As long as the components conform to CRI and OCI, the orchestration layer can keep using Kubernetes while the CRI runtime and the OCI runtime are swapped freely.
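The layering on these slides, where the orchestrator talks to a CRI runtime and the CRI runtime hands container creation to an OCI runtime, can be modeled as two small interfaces. This is a toy model with hypothetical class names; it does not call containerd, cri-o, or runc. It only illustrates why conforming to a shared interface lets the lower layer be swapped without changing the layers above it.

```python
from typing import Dict, Protocol


class OCIRuntime(Protocol):
    """Low-level runtime role: start a container from a bundle (rootfs + config)."""

    name: str

    def run(self, container_id: str, bundle: str) -> str:
        ...


class FakeOCIRuntime:
    """Hypothetical stand-in for runC, runsc, kata-runtime, etc. (no real calls)."""

    def __init__(self, name: str) -> None:
        self.name = name

    def run(self, container_id: str, bundle: str) -> str:
        return f"{self.name} started {container_id} from {bundle}"


class CRIRuntime:
    """High-level runtime role: manage images, then delegate to an OCI runtime."""

    def __init__(self, oci_runtimes: Dict[str, OCIRuntime]) -> None:
        self.oci_runtimes = oci_runtimes

    def prepare_bundle(self, image: str) -> str:
        # Toy image handling: map an image reference to a bundle directory name.
        return "/bundles/" + image.replace("/", "_").replace(":", "_")

    def run_container(self, image: str, container_id: str, runtime: str) -> str:
        bundle = self.prepare_bundle(image)
        # The only contract with the lower layer is OCIRuntime.run().
        return self.oci_runtimes[runtime].run(container_id, bundle)


if __name__ == "__main__":
    cri = CRIRuntime({n: FakeOCIRuntime(n) for n in ("runc", "runsc", "kata-runtime")})
    # The caller (for example an orchestrator) changes only the runtime name.
    for rt in cri.oci_runtimes:
        print(cri.run_container("mizzy/hello:latest", "foo", rt))
```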
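  Slide 25

    Survey and experiments on OCI container runtimes
    • Surveyed the current state of runC, gVisor, Nabla-Containers, Firecracker, and Kata-Containers
    • Built one Docker image that prints Hello World and one that loops
    • Ran Hello World (written in C) on each OCI container runtime
    • Measured the time for Pod startup + container startup + Hello World execution with the time command
    • Started the loop container and measured its memory size (RSS)
    • Test host: EC2 i3.metal instance, 72 vCPUs, 512 GB memory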
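The two measurements described here, the elapsed (real) time reported by `time` and the RSS of a running process, can be approximated on Linux as follows. This is a sketch, not the script used in the experiments: it times a command with `time.perf_counter()` and reads the `VmRSS` line from `/proc/<pid>/status`. Which command to time and which process IDs to sample are left to the caller.

```python
import os
import subprocess
import sys
import time
from typing import List, Optional


def time_command(argv: List[str]) -> float:
    """Run a command to completion and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def parse_vmrss_kb(status_text: str) -> Optional[int]:
    """Return VmRSS in kB from the text of /proc/<pid>/status, or None if absent."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:\t   10216 kB".
            return int(line.split()[1])
    return None  # e.g. kernel threads report no VmRSS


def rss_kb(pid: int) -> Optional[int]:
    """Read the resident set size of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vmrss_kb(f.read())


if __name__ == "__main__":
    # In the experiments the command would be e.g. ["sudo", "runc", "run", "bundle"];
    # a harmless command is used here instead.
    print(f"elapsed: {time_command([sys.executable, '-c', 'pass']):.3f} s")
    if sys.platform.startswith("linux"):
        print(f"RSS of this process: {rss_kb(os.getpid())} kB")
```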
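  Slide 26

    Running OCI container runtime commands directly
    Preparing the bundle from the Docker image:
      cid=`sudo docker create mizzy/hello:latest`
      mkdir -p bundle/rootfs
      sudo docker export $cid | tar -C bundle/rootfs -xvf -
      # The OCI runtime also needs a config.json next to rootfs (for example, generated with `runc spec`).
    Timed runs ("bundle" is the container ID; the bundle directory defaults to the current directory unless --bundle is given):
      time sudo runc run bundle
      time sudo runsc -log /dev/null run bundle
      time sudo kata-runtime run bundle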
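The commands above only fill `bundle/rootfs`; the OCI runtime also reads `config.json` from the bundle. A default one can be generated with `runc spec`, whose `process.args` is `["sh"]` and which requests a terminal. The sketch below is an assumption about how one might adapt it, not the procedure used in the talk: it rewrites only `process.args` (to `/hello`) and `process.terminal`, leaving every other field untouched.

```python
import copy
import json
from pathlib import Path
from typing import List


def with_process_args(config: dict, args: List[str], terminal: bool = False) -> dict:
    """Return a copy of an OCI runtime config with process.args/terminal replaced."""
    updated = copy.deepcopy(config)
    process = updated.setdefault("process", {})
    process["args"] = args
    process["terminal"] = terminal  # a one-shot hello world needs no TTY
    return updated


def update_bundle_config(bundle: Path, args: List[str]) -> dict:
    """Rewrite <bundle>/config.json in place (e.g. one generated by `runc spec`)."""
    path = bundle / "config.json"
    config = with_process_args(json.loads(path.read_text()), args)
    path.write_text(json.dumps(config, indent=2))
    return config


if __name__ == "__main__":
    bundle = Path("bundle")
    if (bundle / "config.json").exists():
        print(update_bundle_config(bundle, ["/hello"])["process"])
    else:
        print("run `runc spec` inside ./bundle first")
```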
  Slide 28

    | Type | Security | Implementation | Hello World performance (Pod + container startup) | Density (memory footprint per container) |
    |---|---|---|---|---|
    | Process | Namespace isolation | runC | 0.159 s | runc: 10216 KB; total ≈ 10 MB |
    | Sandbox | User-space kernel, system call access control | gVisor (runsc) | 0.197 s | runsc: 117748 KB, runsc-gopher: 13028 KB, runsc-sandbox: 18404 KB; total ≈ 150 MB |
    | Unikernel | Unikernel isolation (dedicated app image, restricted to a minimal set of system calls) | Nabla-Containers (runnc) | Not measured: runnc does not wait for container execution to finish | Not measured (same reason) |
    | microVM | microVM (virtio-net, virtio-block, serial console, a 1-button keyboard controller) | Firecracker | Not measured: it cannot currently be driven directly by a runc-equivalent command | Not measured (same reason) |
    | VM | VM | Kata-Containers | 1.392 s | kata-runtime: 28424 KB, qemu-lite-system-x86_64: 222208 KB, kata-proxy: 6884 KB, kata-shim: 19124 KB; total ≈ 280 MB |
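The totals in the last column are sums of the per-process RSS values. The helper below makes that arithmetic explicit using the numbers copied from the table; treating 1 MB as 1000 KB is an assumption that matches the slide's rounded totals.

```python
from typing import Dict

# Per-process RSS (KB) from the direct-execution measurements above.
RSS_KB: Dict[str, Dict[str, int]] = {
    "runC": {"runc": 10216},
    "gVisor (runsc)": {"runsc": 117748, "runsc-gopher": 13028, "runsc-sandbox": 18404},
    "Kata-Containers": {
        "kata-runtime": 28424,
        "qemu-lite-system-x86_64": 222208,
        "kata-proxy": 6884,
        "kata-shim": 19124,
    },
}


def total_mb(processes: Dict[str, int]) -> float:
    """Sum per-process RSS in KB and express it in MB (1 MB = 1000 KB)."""
    return sum(processes.values()) / 1000


if __name__ == "__main__":
    for runtime, procs in RSS_KB.items():
        print(f"{runtime:<16} {total_mb(procs):7.1f} MB")
```

This gives about 10.2 MB, 149.2 MB, and 276.6 MB, consistent with the slide's "≈ 10 MB", "≈ 150 MB", and "≈ 280 MB".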
  Slide 29

    Running via containerd
      # ctr run expects the image to be present locally; fetch it first, e.g. sudo ctr image pull docker.io/mizzy/hello:latest
      time sudo ctr run \
        --rm --runtime io.containerd.runc.v1 \
        docker.io/mizzy/hello:latest \
        foo /hello
      time sudo ctr run \
        --rm \
        --runtime io.containerd.runsc.v1 \
        docker.io/mizzy/hello:latest \
        ba /hello
      time sudo ctr run \
        --rm \
        --runtime io.containerd.kata.v2 \
        docker.io/mizzy/hello:latest \
        baz /hello
      time sudo ctr run \
        --rm \
        --runtime io.containerd.runtime.v1.linux \
        docker.io/mizzy/hello:latest \
        foo /hello
      time sudo ctr run \
        --rm \
        --snapshotter firecracker-naive \
        --runtime aws.firecracker \
        docker.io/mizzy/hello:latest \
        foo /hello
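The five `ctr run` invocations differ only in the runtime name, an optional snapshotter, and the container ID, so a benchmark harness can generate them from a table. The sketch only builds the argument lists and does not execute anything; the runtime and snapshotter names come from the slide, while `CtrCase` and `ctr_argv` are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import List, Optional

IMAGE = "docker.io/mizzy/hello:latest"


@dataclass
class CtrCase:
    runtime: str                       # value passed to --runtime
    container_id: str
    snapshotter: Optional[str] = None  # only the Firecracker case sets one


CASES = [
    CtrCase("io.containerd.runc.v1", "foo"),
    CtrCase("io.containerd.runsc.v1", "ba"),
    CtrCase("io.containerd.kata.v2", "baz"),
    CtrCase("io.containerd.runtime.v1.linux", "foo"),
    CtrCase("aws.firecracker", "foo", snapshotter="firecracker-naive"),
]


def ctr_argv(case: CtrCase, image: str = IMAGE, cmd: str = "/hello") -> List[str]:
    """Build the argv for `sudo ctr run --rm ...` in the same order as the slide."""
    argv = ["sudo", "ctr", "run", "--rm"]
    if case.snapshotter:
        argv += ["--snapshotter", case.snapshotter]
    argv += ["--runtime", case.runtime, image, case.container_id, cmd]
    return argv


if __name__ == "__main__":
    for c in CASES:
        print(" ".join(ctr_argv(c)))
```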
  Slide 31

    | Type | Security | Implementation | Hello World performance (Pod + container startup) | Density (memory footprint per container) |
    |---|---|---|---|---|
    | Process | Namespace isolation | runC | 0.361 s | ctr: 26592 KB; total ≈ 26 MB |
    | Sandbox | User-space kernel, system call access control | gVisor (runsc) | 0.422 s | ctr: 26600 KB, runsc: 12296 KB, containerd-shim-runsc-v1: 6908 KB, runsc-gopher: 12296 KB, runsc-sandbox: 18124 KB; total ≈ 75 MB |
    | Unikernel | Unikernel isolation (dedicated app image, restricted to a minimal set of system calls) | Nabla-Containers (runnc) | Not measurable: runnc does not support the containerd shim API v2 | Not measurable (same reason) |
    | microVM | microVM (virtio-net, virtio-block, serial console, a 1-button keyboard controller) | Firecracker (naive snapshotter) | 8.117 s | ctr: 26120 KB, containerd-shim-aws-firecracker: 13748 KB, firecracker: 59152 KB; total ≈ 100 MB (naive_snapshotter: 11400 KB) |
    | VM | VM | Kata-Containers | 1.570 s | ctr: 26572 KB, containerd-shim-kata-v2: 19780 KB, qemu-lite-system-x86_64: 195864 KB; total ≈ 241 MB |
  Slide 32

    Running via dockerd
      # Runtimes other than the default must be registered with dockerd (e.g. under "runtimes" in /etc/docker/daemon.json).
      time sudo docker run --rm mizzy/hello:latest /hello
      time sudo docker run --rm --runtime=runsc mizzy/hello:latest /hello
      time sudo docker run --rm --runtime=kata-runtime mizzy/hello:latest /hello
      time sudo docker run --rm --runtime=runnc mizzy/hello:latest /hello.nabla
      time sudo docker run --rm --runtime=kata-fc mizzy/hello:latest /hello
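`docker run --runtime=<name>` works only for runtimes registered with dockerd, commonly via the `runtimes` key in `/etc/docker/daemon.json` followed by a daemon restart. The sketch merges such entries into an existing configuration. The binary paths are placeholders (assumptions), and `kata-fc` is left out because its setup depends on how Kata Containers was configured to use Firecracker.

```python
import json
from pathlib import Path
from typing import Dict

# Placeholder install paths (assumptions); use wherever the binaries really live.
RUNTIMES: Dict[str, Dict[str, str]] = {
    "runsc": {"path": "/usr/local/bin/runsc"},
    "kata-runtime": {"path": "/usr/bin/kata-runtime"},
    "runnc": {"path": "/usr/local/bin/runnc"},
}


def merge_runtimes(daemon_config: dict, runtimes: Dict[str, Dict[str, str]]) -> dict:
    """Return a copy of a daemon.json dict with the given runtimes added or replaced."""
    merged = dict(daemon_config)
    merged["runtimes"] = {**daemon_config.get("runtimes", {}), **runtimes}
    return merged


def update_daemon_json(path: Path, runtimes: Dict[str, Dict[str, str]]) -> dict:
    """Rewrite daemon.json on disk; dockerd must be restarted to pick it up."""
    current = json.loads(path.read_text()) if path.exists() else {}
    updated = merge_runtimes(current, runtimes)
    path.write_text(json.dumps(updated, indent=2) + "\n")
    return updated


if __name__ == "__main__":
    # Print the result instead of touching /etc/docker/daemon.json (needs root).
    print(json.dumps(merge_runtimes({"debug": False}, RUNTIMES), indent=2))
```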
  Slide 34

    | Type | Security | Implementation | Hello World performance (Pod + container startup) | Density (memory footprint per container) |
    |---|---|---|---|---|
    | Process | Namespace isolation | runC | 0.847 s | docker: 50356 KB, containerd-shim: 6124 KB; total ≈ 56 MB |
    | Sandbox | User-space kernel, system call access control | gVisor (runsc) | 1.034 s | docker: 50532 KB, containerd-shim: 5812 KB, runsc-gopher: 12296 KB, runsc-sandbox: 18124 KB; total ≈ 85 MB |
    | Unikernel | Unikernel isolation (dedicated app image, restricted to a minimal set of system calls) | Nabla-Containers (runnc) | 0.897 s | docker: 50720 KB, containerd-shim: 5512 KB, nabla-run: 6684 KB; total ≈ 62 MB |
    | microVM | microVM (virtio-net, virtio-block, serial console, a 1-button keyboard controller) | Firecracker (devmapper snapshotter) (Kata plugin) | 3.889 s | docker: 1170808 KB, docker-containerd-shim: 9960 KB, kata-shim: 455664 KB, firecracker: 145952 KB; total ≈ 1700 MB |
    | VM | VM | Kata-Containers | 2.415 s | docker: 51056 KB, containerd-shim: 6060 KB, qemu-lite-system-x86_64: 227316 KB, kata-proxy: 6132 KB, kata-shim: 19536 KB; total ≈ 310 MB |
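Lining up the hello-world times from the three invocation paths gives a rough idea of what each extra layer adds on top of calling the OCI runtime directly. Only runtimes measured on all three paths are included, the figures are copied from the tables, and the differences are a coarse reading of single measurements rather than a breakdown of where the time goes.

```python
from typing import Dict, Tuple

# Hello-world times in seconds: (direct OCI runtime, via containerd, via dockerd).
TIMES: Dict[str, Tuple[float, float, float]] = {
    "runC": (0.159, 0.361, 0.847),
    "gVisor (runsc)": (0.197, 0.422, 1.034),
    "Kata-Containers": (1.392, 1.570, 2.415),
}


def layer_overheads(direct: float, containerd: float, docker: float) -> Tuple[float, float]:
    """Extra seconds when going through containerd, and through dockerd, vs. direct."""
    return containerd - direct, docker - direct


if __name__ == "__main__":
    for name, (d, c, k) in TIMES.items():
        via_ctr, via_docker = layer_overheads(d, c, k)
        print(f"{name:<16} +{via_ctr:.3f} s via containerd, +{via_docker:.3f} s via dockerd")
```

For these numbers, the containerd path adds roughly 0.2 s for each runtime, while the dockerd path adds about 0.7 to 1.0 s.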
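  Slide 35

    Pod and container startup time vs. application performance
    • VM and microVM approaches take longer to start a Pod
    • Once the Pod is up, access control over the app is not strict
    • Web apps in the container therefore tend to perform relatively well
    • Sandbox and unikernel approaches start Pods quickly
    • They monitor the app's system calls and file access and enforce strict access control
    • Web apps in the container therefore tend to perform relatively poorly
    → We plan to measure the performance of applications running in containers in future work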
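  Slide 36

    Containers in the superorganism-type data center
    • Start each container with the appropriate OCI runtime when needed, according to its requirements
    • Just as processes and threads are used selectively in an OS
    • Discuss the trade-off between container startup speed and application speed after startup
    • Respond reactively to external access trends and unpredictable changes
    • Handle coordination between processes and threads transparently across hosts
    • Research is needed on changing container state and moving containers quickly [1]
    [1] Ryosuke Matsumoto, Yuuki Tsubouchi, Gosuke Miyashita, "A Low-Cost, High-Speed Scheduling Method That Relocates Containers per HTTP Request Using CRIU", IOT44, March 2019.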
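Choosing an OCI runtime per workload, much as a program chooses between a process and a thread, can be sketched as a small selection over the trade-off discussed above. The two boolean properties are a simplification of slide 35 (strict app access control for the sandbox approach, a VM boundary for the VM approach), the startup times are the direct-execution hello-world figures, and the rule "fastest runtime that meets every requirement" is a hypothetical policy, not the author's.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class RuntimeProfile:
    name: str
    startup_s: float          # direct-execution hello-world time from the measurements
    strict_app_control: bool  # app syscalls/file access monitored (sandbox approach)
    vm_boundary: bool         # workload isolated behind a hypervisor (VM approach)


PROFILES: Tuple[RuntimeProfile, ...] = (
    RuntimeProfile("runc", 0.159, strict_app_control=False, vm_boundary=False),
    RuntimeProfile("runsc", 0.197, strict_app_control=True, vm_boundary=False),
    RuntimeProfile("kata-runtime", 1.392, strict_app_control=False, vm_boundary=True),
)


def choose_runtime(need_strict_app_control: bool = False,
                   need_vm_boundary: bool = False,
                   max_startup_s: float = float("inf"),
                   profiles: Sequence[RuntimeProfile] = PROFILES) -> Optional[str]:
    """Hypothetical policy: fastest-starting runtime that meets every requirement."""
    candidates = [
        p for p in profiles
        if (p.strict_app_control or not need_strict_app_control)
        and (p.vm_boundary or not need_vm_boundary)
        and p.startup_s <= max_startup_s
    ]
    if not candidates:
        return None  # nothing fits; some requirement has to be relaxed
    return min(candidates, key=lambda p: p.startup_s).name


if __name__ == "__main__":
    print(choose_runtime())
    print(choose_runtime(need_strict_app_control=True))
    print(choose_runtime(need_vm_boundary=True))
```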
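  Slide 38

    Toward a superorganism-type data center OS
    • Introduced the concept of the superorganism-type data center
    • Data center functions blend into society while drawing on the machine power of the cloud
    • Presented a concrete vision
    • Discussed it from the viewpoint of scheduling data centers and containers
    • Examined why it matters to be able to change state reactively
    • Introduced OCI runtime implementations for containers from various companies and evaluated their current state experimentally
    • Organized a classification that treats containers as threads or processes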
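  Slide 39

    Future work
    • Consider the classification of OCI runtimes further
    • Is there a more appropriate form for Pods and containers?
    • Further examine the balance among density, performance, security, and usability
    • Discuss the balance between Pod startup speed and the performance cost of container access control
    • Design and implement a framework for managing information about highly distributed containers
    • From ps- and top-like commands to more advanced tools
    • Explore concepts that make processes and threads easier to handle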
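As a very small illustration of the "ps/top for highly distributed containers" direction, the sketch merges per-host container listings into one ps-style table sorted by memory use. The data model and host names are hypothetical; a real tool would need to collect these listings from each host's CRI or OCI runtime.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Row = Tuple[str, str, str, int]  # (host, container ID, OCI runtime, RSS in KB)


@dataclass(frozen=True)
class ContainerInfo:
    container_id: str
    runtime: str
    rss_kb: int


def merge_listings(per_host: Dict[str, List[ContainerInfo]]) -> List[Row]:
    """Flatten {host: [containers]} into rows, largest memory user first."""
    rows = [
        (host, c.container_id, c.runtime, c.rss_kb)
        for host, containers in per_host.items()
        for c in containers
    ]
    return sorted(rows, key=lambda r: r[3], reverse=True)


def format_ps(rows: List[Row]) -> str:
    """Render rows like a minimal `ps`, one line per container across all hosts."""
    lines = [f"{'HOST':<12} {'CONTAINER':<10} {'RUNTIME':<14} {'RSS(KB)':>8}"]
    lines += [f"{h:<12} {cid:<10} {rt:<14} {rss:>8}" for h, cid, rt, rss in rows]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical hosts and containers.
    listing = {
        "host-a": [ContainerInfo("foo", "runc", 10216)],
        "host-b": [ContainerInfo("bar", "runsc", 149180),
                   ContainerInfo("baz", "kata-runtime", 276640)],
    }
    print(format_ps(merge_listings(listing)))
```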