Deep Dive into Runtime Shim

moricho
August 22, 2020


Transcript

  1. 01. Deep Dive into Runtime Shim. ContainerRuntime Meetup #2, August 22, 2020, by @_moricho_
  2. 02. Morito Ikeda. Twitter: @_moricho_, GitHub: moricho

  3. 03. Topics: an overview of high/low level runtimes, and what a runtime shim is.
    Takeaways: how a container's stdout/stderr is managed, and how container processes are managed.
  4. 04. Section 1: Introduction, a review of high/low level runtimes

  5. 05. High level runtime (CRI runtime): a gRPC service that manages images (pull, rm, …) and kicks off every container operation.
    It is called by the kubelet through the CRI (Container Runtime Interface): https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.proto
    For the actual container operations it uses a low level runtime (described later).
    Typical examples are containerd and cri-o.
  6. 06. High level runtime (CRI runtime): called by the kubelet through the CRI (Container Runtime Interface).
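  7. 07. Low level runtime (OCI runtime): the part that actually runs the container process, on instruction from the high level runtime.
    Typical examples are runc and runsc (gVisor). It is just a binary.
    It provides the state, create, start, kill, and delete operations.
    See runtime.md in opencontainers/runtime-spec.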
  8. 08. Low level runtime (OCI runtime): at create time it is handed a config.json that holds the information needed to run the container, such as capabilities, hostname, and mounts.
    For details, see config.md in opencontainers/runtime-spec.
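
The create step and its config.json can be sketched as follows. This is a minimal illustration in Python, not taken from the slides: the helper names minimal_config and lifecycle_commands, the container ID, the bundle path, and the chosen capabilities are all hypothetical, and the dict covers only a small subset of the fields described in config.md. The commands are built as argument lists and are not executed.

```python
import json


def minimal_config(hostname, args):
    """Return a trimmed-down config.json as a dict (illustrative subset only)."""
    caps = ["CAP_KILL", "CAP_NET_BIND_SERVICE"]  # hypothetical choice
    return {
        "ociVersion": "1.0.2",
        "process": {
            "args": args,
            "cwd": "/",
            "capabilities": {
                "bounding": caps,
                "effective": caps,
                "permitted": caps,
            },
        },
        "root": {"path": "rootfs"},
        "hostname": hostname,
        "mounts": [
            {"destination": "/proc", "type": "proc", "source": "proc"},
        ],
    }


def lifecycle_commands(container_id, bundle):
    """Build (but do not run) one runc invocation per OCI operation."""
    return [
        ["runc", "create", "--bundle", bundle, container_id],
        ["runc", "state", container_id],
        ["runc", "start", container_id],
        ["runc", "kill", container_id, "KILL"],
        ["runc", "delete", container_id],
    ]


# Hypothetical container ID and bundle directory, for illustration only.
config_text = json.dumps(minimal_config("demo", ["sleep", "60"]), indent=2)
commands = lifecycle_commands("demo", "/tmp/demo-bundle")
```

In a real bundle, config_text would be written to config.json inside the bundle directory, next to the rootfs, before create is called.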
  9. 09. Section 2: Runtime Shim, today's main topic

  10. 10. What is a runtime shim?
    An API that mediates communication between the container process and the high level runtime (containerd, etc.).
    A daemon that looks after the container.

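  11. 11. Where did the low level runtime go?
    Example: starting runc in detached mode (diagram on the right; labels: runc, container).
    The low level runtime exits as soon as it has started the container.
    By default the container is then reparented to the host's init process (or to the high level runtime, if that is what launched it).
    => When the container process (now an orphan) dies, its exit cannot be tracked, and restarting or stopping the high level runtime kills the container as well.
    https://iximiuz.com/en/posts/implementing-container-runtime-shim/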
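  12. 12. Role of the Runtime Shim: the shim launches the low level runtime, and keeps looking after the container even after the low level runtime has exited.
    - Error handling and status reporting at container create time
    - Streaming the container's stdout/stderr to a log file
    - Tracking the exit code
    It shares all of these with the high level runtime. (Diagram labels: runc, shim.)
    https://iximiuz.com/en/posts/implementing-container-runtime-shim/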
  13. 13. Role of the Runtime Shim: in the case of containerd, for example, a component called containerd-shim is implemented in the same repository.

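  14. 14. Aside: there is also a shim for working with runsc (gVisor), maintained on the gVisor side.
    It used to live in a separate repository called "gvisor-containerd-shim", but the two were unified a little while ago.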

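  15. 15. Aside: an issue in gvisor-containerd-shim.
    Containers appeared to be dying from OOM, but no log showed up on the Kubernetes side.
    Investigation showed that the gVisor shim was missing the epoll implementation used to propagate OOM events.
    The shim is responsible for correctly relaying the container's state and logs to the layers above it.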
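  16. 16. Role of the Runtime Shim (subreaper)
    Problem: when the low level runtime exits, the container process is reparented to the host's init process.
    Making the shim process a subreaper means the container is reparented to the shim process instead of init.
    => The shim process tracks the container's exit and writes it to a file or similar, which the high level runtime reads later.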
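  17. 17. Role of the Runtime Shim (subreaper)
    Suppose a child process forks again to create a grandchild process, and the child process then dies.
    => The grandchild becomes an orphan process and is automatically reparented to PID 1.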

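  18. 18. Role of the Runtime Shim (subreaper)
    With a subreaper:
    The original process calls prctl(2) with "PR_SET_CHILD_SUBREAPER" as its argument.
    Every child of this process, and every descendant of those children, is then marked as having a subreaper.
    When an orphaned process dies
    => "SIGCHLD" is sent to the nearest ancestor subreaper process, which uses wait to learn the exit status.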
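
The subreaper behaviour described in slides 16 to 18 can be tried out with a small Python sketch. It is Linux-only and is not the code of any real shim: the function names become_subreaper and run_demo are hypothetical, prctl(2) is reached through ctypes, and the constant 36 is the value of PR_SET_CHILD_SUBREAPER in <linux/prctl.h>. The intermediate child stands in for the low level runtime and the grandchild for the container process.

```python
import ctypes
import os
import time

PR_SET_CHILD_SUBREAPER = 36  # value from <linux/prctl.h>


def become_subreaper():
    """Mark the calling process as a child subreaper via prctl(2)."""
    libc = ctypes.CDLL(None, use_errno=True)
    zero = ctypes.c_ulong(0)
    rc = libc.prctl(PR_SET_CHILD_SUBREAPER, ctypes.c_ulong(1), zero, zero, zero)
    if rc != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def run_demo():
    """Fork a child that forks a grandchild and exits; reap both here."""
    become_subreaper()
    read_fd, write_fd = os.pipe()
    child = os.fork()
    if child == 0:
        # Child: plays the low level runtime. It forks and exits at once.
        os.close(read_fd)
        grandchild = os.fork()
        if grandchild == 0:
            # Grandchild: plays the container process.
            os.close(write_fd)
            time.sleep(0.2)
            os._exit(7)
        # Report the grandchild's PID to the original process, then exit.
        os.write(write_fd, str(grandchild).encode())
        os._exit(0)

    # Original process: plays the shim.
    os.close(write_fd)
    grandchild_pid = int(os.read(read_fd, 32).decode())
    os.close(read_fd)

    exit_codes = {}
    # Once the child has exited, the orphaned grandchild has been
    # reparented to this process, so waitpid() on it is allowed.
    for pid in (child, grandchild_pid):
        _, status = os.waitpid(pid, 0)
        exit_codes[pid] = os.WEXITSTATUS(status)
    return child, grandchild_pid, exit_codes


if __name__ == "__main__":
    child_pid, grandchild_pid, codes = run_demo()
    print("child exit code:", codes[child_pid])
    print("grandchild exit code:", codes[grandchild_pid])
```

Without the call to become_subreaper, the grandchild would be reparented to another process (normally PID 1), and the second waitpid call would fail with ChildProcessError because the grandchild would not be a child of the calling process.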
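  19. 19. Role of the Runtime Shim (keeping the container's stdout/stderr)
    Even if the high level runtime restarts or stops, the shim keeps streaming the container's stdout/stderr to a specific file.
    This is what docker logs and kubectl logs rely on. (Diagram labels: Container, Shim, log.)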
  20. 20. Role of the Runtime Shim (keeping the container's stdout/stderr). Diagram labels: Container, Shim, log, kubectl logs <pod> -c hoge
  21. 21. Role of the Runtime Shim (keeping the container's stdout/stderr). Diagram labels: Container, Shim, log, kubectl logs <pod> -c hoge, kubelet
  22. 22. Role of the Runtime Shim (keeping the container's stdout/stderr). Diagram labels: Container, Shim, log, kubectl logs <pod> -c hoge, kubelet, High level
  23. 23. Role of the Runtime Shim (keeping the container's stdout/stderr). Diagram labels: Container, Shim, log, kubectl logs <pod> -c hoge, kubelet, High level
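
The log-keeping role in slides 19 to 23 can be sketched like this. It is a simplified Python illustration under stated assumptions, not how containerd-shim or any other real shim is implemented: pump and run_and_log are hypothetical names, the "stream name, space, line" log format is made up for this example, and an ordinary child process stands in for the container.

```python
import subprocess
import threading


def pump(stream, name, log, lock):
    """Copy one pipe into the shared log, tagging each line with its stream."""
    for line in stream:
        with lock:
            log.write(name + " " + line.rstrip("\n") + "\n")
            log.flush()


def run_and_log(argv, log_path):
    """Run argv, stream its stdout/stderr into log_path, return the exit code."""
    lock = threading.Lock()
    with open(log_path, "a", encoding="utf-8") as log:
        proc = subprocess.Popen(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            text=True,
        )
        threads = [
            threading.Thread(target=pump, args=(proc.stdout, "stdout", log, lock)),
            threading.Thread(target=pump, args=(proc.stderr, "stderr", log, lock)),
        ]
        for thread in threads:
            thread.start()
        # Both pipes reach EOF once the process has closed them.
        for thread in threads:
            thread.join()
        proc.stdout.close()
        proc.stderr.close()
        return proc.wait()
```

Because the output lands in a file owned by the shim-like process, a later reader (the counterpart of docker logs or kubectl logs) only needs to read that file and does not have to be running while the container writes.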
  24. 23. Role of the Runtime Shim: the shim takes over the various tasks around container management.
    => The high level runtime can concentrate on launching containers and managing images.

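  25. 24. Wrap Up: Runtime Shim
    - High/low level runtimes tend to get the attention, but the shim is an essential component.
    - The low level runtime exits right after creating the container => the shim looks after it.
    - The shim shares information about the container with the high level runtime.
    - There is not much information about it available in Japanese.
    Recommended article in English: https://iximiuz.com/en/posts/implementing-container-runtime-shim/
    It implements a minimal runtime shim in Rust.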
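  26. 25. Announcements
    - An e-book on building your own container is coming out from Impress, co-authored with @gorilla0513.
    - I will be speaking about gVisor at CNDT2020, so please come along.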

  27. 26. References
    - Implementing Container Runtime Shim: runc https://iximiuz.com/en/posts/implementing-container-runtime-shim/
    - Don't Fear the Subreaper https://medium.com/@william.la.martin/dont-fear-the-subreaper-19c8127c031e
    - Dealing with process termination in Linux (with Rust examples) https://iximiuz.com/en/posts/dealing-with-processes-termination-in-Linux/#awaiting-a-grandchild-process-termination
    - prctl(2), Linux manual page https://man7.org/linux/man-pages/man2/prctl.2.html