
3 state-of-the-art technologies in Linux and future of the containers #SECKUN

KONDO Uchio
February 21, 2021

2021.02.21 "New Security Business Careers" symposium (「新しいセキュリティビジネスキャリア」シンポジウム)


Transcript

  1. The future of Linux containers, considered from the state of the art of their building-block technologies
    KONDO Uchio (近藤宇智朗) / GMO Pepabo, Inc.
    2021.02.21 "New Security Business Careers" symposium
  2. GMO Pepabo, Inc., Senior Principal, Technical Infrastructure Team, Engineering Department: KONDO Uchio (@udzura)

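  3. KONDO Uchio: profile
    • From Mikawa Province (三河國). Graduated from Jishukan High School, the former Yoshida domain school, then earned a bachelor's degree (2007) in Japanese Language and Literature at the Faculty of Letters, the University of Tokyo.
    • After working as an in-house SE at a mass media company, in e-commerce site development, and in online game development, joined the current company in 2013 and moved to Fukuoka the same year.
    • Active in the Ruby, container, and cloud native technology communities. Author of 「Webで使えるmrubyシステムプログラミング入門」 (C&R研究所).
    • Favorite system call (lately): socketpair(2).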
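  4. Today's talk
    • Aimed at people who gained a (rough) understanding of the building-block technologies of containers from my lecture. For explanations of the building blocks themselves, see the references.
    • Reference 1: https://container-security.dev/
    • Reference 2: 『コンテナ型仮想化概論』 (川口, カットシステム)
    • Introduces the new technologies in recent kernels that relate to containers.
    • If time permits, considers the future of containers on that basis.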
  5. cgroup v2

  6. cgroup review
    • One of the basic technologies of the Linux kernel.
    • Groups processes and applies resource usage limits per group: CPU, memory, IO, number of processes...
    Source: gihyo.jp (admin/serial, linux_containers)

  7. cgroup v1 example
    • Operated by calling mkdir(2), read(2), write(2), etc. on a filesystem called cgroupfs.
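The mkdir(2) / write(2) / read(2) sequence can be sketched as below. This is a minimal illustration, not the slide's original example: the helper names are hypothetical, and the demo runs against a scratch directory standing in for /sys/fs/cgroup, since writing to a real cgroupfs needs root. `memory.limit_in_bytes` is the v1 memory controller's limit file.

```python
import os
import tempfile


def create_v1_group(root, controller, group, settings):
    """Create <root>/<controller>/<group> and write each setting file.

    On a real cgroupfs the kernel populates the control files when the
    directory is created; this sketch only performs the file operations.
    """
    path = os.path.join(root, controller, group)
    os.makedirs(path, exist_ok=True)              # mkdir(2)
    for filename, value in settings.items():
        with open(os.path.join(path, filename), "w") as f:
            f.write(str(value))                   # write(2)
    return path


def read_setting(path, filename):
    """Read one control file back as a stripped string."""
    with open(os.path.join(path, filename)) as f:
        return f.read().strip()                   # read(2)


# Demo: a scratch directory plays the role of /sys/fs/cgroup (assumption).
scratch = tempfile.mkdtemp()
demo_group = create_v1_group(
    scratch, "memory", "group-a",
    {"memory.limit_in_bytes": 256 * 1024 * 1024},
)
demo_limit = read_setting(demo_group, "memory.limit_in_bytes")
```

On a real v1 host, a process would then be placed in the group by writing its PID to the group's `cgroup.procs` file.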

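  8. cgroup v2
    • Developed to overcome several shortcomings of v1, chiefly the requirement to split directories per controlled resource (controller).
    • The major difference: in v1, a directory is mounted per controller and a process can belong to a group in each one separately; in v2, only a single directory covering all controllers is mounted, and groups are created across all of them together.
    • This suits containers better.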
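  9. Unified hierarchy
    [Diagram comparing two layouts under /sys/fs/cgroup. v1: separate /cpu, /memory, /blkio trees, each holding its own groups (/group-a, /group-b, /group-c). v2: /group-a and /group-b directly under /sys/fs/cgroup, each holding cpu.*, memory.*, io.* ... files.]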
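The difference between the two layouts can be expressed as path arithmetic. The function names below are hypothetical and the sketch only builds path strings; it does not touch the filesystem.

```python
def v1_group_dirs(group, controllers, root="/sys/fs/cgroup"):
    """v1: one directory per controller hierarchy for the same group name."""
    return [f"{root}/{controller}/{group}" for controller in controllers]


def v2_group_dir(group, root="/sys/fs/cgroup"):
    """v2: a single directory that carries every controller's files."""
    return f"{root}/{group}"
```

For example, `v1_group_dirs("group-a", ["cpu", "memory"])` yields two directories to manage, while `v2_group_dir("group-a")` yields one.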
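  10. How container runtimes use cgroup
    • (OCI-family) runtimes have the following two settings.
    • Cgroup Driver: how to control the cgroup assigned to a container
      • cgroupfs: direct file operations on cgroupfs
      • systemd: management by systemd
    • Cgroup Version: whether to use v1 or v2 for resource limits
      • Determined by which filesystem is mounted at /sys/fs/cgroup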
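The version check can be sketched by inspecting the filesystem type mounted at /sys/fs/cgroup. This sketch parses text in the /proc/mounts format rather than calling statfs(2), and it assumes the common layouts: type `cgroup2` at the mount point for v2, and a `tmpfs` holding per-controller mounts for v1 (which also covers hybrid setups). The function name is hypothetical.

```python
def detect_cgroup_version(mounts_text, mountpoint="/sys/fs/cgroup"):
    """Guess the cgroup version from /proc/mounts-style text.

    Each line is: <device> <mountpoint> <fstype> <options> <dump> <pass>
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[1] != mountpoint:
            continue
        if fields[2] == "cgroup2":
            return "v2"
        if fields[2] == "tmpfs":
            # v1 (or hybrid): controllers are mounted below this tmpfs.
            return "v1"
    return "unknown"


V2_SAMPLE = "cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime,nsdelegate 0 0\n"
V1_SAMPLE = (
    "tmpfs /sys/fs/cgroup tmpfs ro,nosuid,nodev,noexec,mode=755 0 0\n"
    "cgroup /sys/fs/cgroup/memory cgroup rw,nosuid,nodev,noexec,relatime,memory 0 0\n"
)
```

On a live system the input would be the contents of `/proc/mounts`.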
  11. New features in v2
    • Unified Hierarchy
    • PSI (Pressure Stall Information)
    • The cgroup ID can be obtained from eBPF
    • nsdelegate (important for unprivileged containers)
    • clone3(2) can create a process directly inside a specific cgroup
    • and so on...
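  12. e.g. PSI (Pressure Stall Information)
    • A load metric available system-wide or per cgroup.
    • Measures the share of a unit of time spent stalled on CPU, memory, or IO.
    • e.g. if some process in the group was delayed because of CPU for 45 seconds out of 1 minute: cpu some: 75.00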
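The 75.00 figure is simply 45 / 60 expressed as a percentage. The sketch below does that arithmetic and parses a line in the format the kernel exposes in pressure files such as `cpu.pressure`. The helper names and the sample line are illustrative; real `avg60` values are running averages, so they only approximate this ratio.

```python
def stall_percent(stalled_seconds, window_seconds):
    """Share of the window spent stalled, as a percentage."""
    return 100.0 * stalled_seconds / window_seconds


def parse_psi(text):
    """Parse PSI lines like 'some avg10=0.00 avg60=75.00 avg300=0.00 total=45000000'."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        kind, fields = parts[0], parts[1:]
        result[kind] = {
            key: float(value)
            for key, value in (field.split("=", 1) for field in fields)
        }
    return result


# Illustrative sample only; 'total' is cumulative stall time in microseconds.
SAMPLE = "some avg10=0.00 avg60=75.00 avg300=0.00 total=45000000\n"
```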
  13. e.g. tracking information in eBPF
    • The bpf_get_current_cgroup_id(void) helper
    • Returns the ID of the cgroup (v2) to which the task that triggered the eBPF event belongs.
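On the user-space side, the ID reported by the helper has to be matched against known cgroups. The sketch below assumes that the ID equals the inode number of the cgroup's directory, which is commonly the case on recent 64-bit kernels but is an assumption here, not something the slides state. The function names are hypothetical.

```python
import os
import tempfile


def cgroup_ids_for(paths):
    """Map cgroup directories to IDs, assuming ID == directory inode number."""
    return {os.stat(path).st_ino for path in paths}


def belongs_to(event_cgroup_id, wanted_ids):
    """True if an event's cgroup ID is one of the cgroups being traced."""
    return event_cgroup_id in wanted_ids


# Demo with a scratch directory instead of a real /sys/fs/cgroup/<group>.
demo_dir = tempfile.mkdtemp()
demo_ids = cgroup_ids_for([demo_dir])
```

A tracer would call `belongs_to` on each event's ID and drop events from cgroups it is not watching.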

  14. How cgroup-v2 and PSI Impacts Cloud Native? Uchio Kondo /

    GMO Pepabo, Inc. 2019.07.23 CloudNative Days Tokyo 2019 Image from pixabay: https://pixabay.com/images/id-3193865/
  15. eBPF per containers

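  16. What is eBPF?
    • One of the technologies for running programs written in user space inside the kernel.
    • Introduced to seccomp in 2012, applied to SDN on Linux in 2013, and has matured since then.
    • Good at filtering (tcpdump, seccomp, bpftrace).
    • Can access kernel information, while safety is ensured to a certain degree, e.g. dangerous code does not run.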

  17. Applications of eBPF

  18. eBPF tracing strategies for containers
    • There are several strategies.
    • Information from Linux namespaces or cgroup (v2) can be used.

  19. Example 1: follow the information in task_struct
    • Obtain namespace information via task_struct→nsproxy and filter on it (cxray).
    Source: cxray on GitHub (pkg/tracer/open/open.go)

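  20. Example 2: compare the PID inside the namespace with the PID on the host
    • Compare the tid obtained in the BPF program with the tid on the host, and treat the task as a container if they do not match (Tracee).
    • Depends on task_struct.
    Source: github.com/aquasecurity/tracee (tracee/tracee.bpf.c)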
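A user-space analogue of this comparison can be sketched with the `NSpid:` line of /proc/<pid>/status, which lists a process's PID in each nested PID namespace from the outermost to the innermost. This is not Tracee's implementation, which works inside the BPF program; the function names and the sample text are hypothetical.

```python
def pid_chain(status_text):
    """Return the PIDs from the 'NSpid:' line, outermost namespace first."""
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(value) for value in line.split()[1:]]
    return []


def looks_containerized(status_text):
    """More than one entry means the process sees itself under another PID.

    The host-side PID (first entry) and the in-namespace PID (last entry)
    normally differ in that case.
    """
    return len(pid_chain(status_text)) > 1


HOST_SAMPLE = "Name:\tsshd\nPid:\t812\nNSpid:\t812\n"
CONTAINER_SAMPLE = "Name:\tnginx\nPid:\t4021\nNSpid:\t4021\t1\n"
```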

  21. Example 3: use the cgroup helper
    Source: github.com/udzura/copenclose (src/bpf/copenclose.bpf.c)

  22. Implementation example
    • copenclose(8)
    • Kondo's PoC (BPF + Rust)
    • A flag switches between task_struct and the cgroup v2 ID

  23. The xxx relationship between runtimes and cgroup, with a side of bpf_get_current_cgroup_id(void)
    Uchio Kondo / Container Runtime Meetup #3 * Photo by Fukuoka City
  24. seccomp

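  25. seccomp review
    • Filters the system calls a program makes.
    • The decision can vary with conditions on the system call's arguments.
    • Can implement a blacklist (denylist), a whitelist (allowlist), and so on.
    • Fine-grained flags exist, e.g. only write an audit log for the system call, or make it return an arbitrary errno.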
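The decision logic described above can be modelled in a few lines. This is a toy model in plain Python, not the kernel's BPF filter and not the libseccomp API: the rule format, the `evaluate` function and the first-match behaviour are assumptions made for illustration. The action names mirror the ideas of allowing, logging only, and returning an errno.

```python
import errno

ALLOW, LOG, ERRNO = "ALLOW", "LOG", "ERRNO"

AF_NETLINK = 16  # Linux value, written out so the sketch runs anywhere


def evaluate(rules, default, syscall, args):
    """Return the action of the first rule matching the syscall and its args."""
    for rule in rules:
        if rule["syscall"] != syscall:
            continue
        condition = rule.get("when")
        if condition is None or condition(args):
            return rule["action"]
    return default


# Denylist style: allow by default, restrict only the listed calls.
DENYLIST = [
    {   # refuse netlink sockets with EPERM, leave other socket() calls alone
        "syscall": "socket",
        "when": lambda args: args[0] == AF_NETLINK,
        "action": (ERRNO, errno.EPERM),
    },
    {   # let ptrace() through, but record it
        "syscall": "ptrace",
        "action": (LOG, None),
    },
]
DEFAULT_ALLOW = (ALLOW, None)
```

An allowlist is the same structure with the default flipped to an errno or kill action and `ALLOW` rules for the permitted calls.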

  26. Using seccomp (mruby)

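  27. User space notification
    • A technique in which seccomp detects a system call and delegates its handling to a userland program.
    • Introduced in Linux 5.0 (2019/3).
    • The system call blocks until a decision is made.
    • e.g. controlling device access in LXC
    Source: gihyo.jp (admin/serial, linux_containers)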
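  28. User space notification
    Source: gihyo.jp (admin/serial, linux_containers)
    • From "LXCで学ぶコンテナ入門 第47回 非特権コンテナの可能性を広げるseccomp notify機能" (Learning Containers with LXC, part 47: the seccomp notify feature that widens the possibilities of unprivileged containers)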

  29. Implementation example (using mruby)
    • Prepare an acceptor.rb like the following.

  30. Implementation example (using mruby)
    • Launch the program via the following invoker and call listen(2).

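  31. Detecting the listen(2) call
    • Allow/deny can be controlled from the console on the acceptor.rb side.
    • If denied, startup fails and the invoker process exits.
    • If allowed, startup continues and the program listens as if nothing had happened.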
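The allow/deny flow can be simulated with two threads and a queue. This is only a model of the protocol, not the mruby acceptor.rb / invoker code from the slides and not a real seccomp filter: on Linux the supervisor would hold a listener fd obtained with SECCOMP_FILTER_FLAG_NEW_LISTENER and answer through the SECCOMP_IOCTL_NOTIF_RECV / SECCOMP_IOCTL_NOTIF_SEND ioctls. All function names below are hypothetical.

```python
import queue
import threading


def supervised_call(requests, name, args):
    """Stand-in for a trapped syscall: notify, then block for the verdict."""
    reply = queue.Queue(maxsize=1)
    requests.put((name, args, reply))
    allowed = reply.get()          # blocks until the supervisor decides
    if not allowed:
        raise PermissionError(f"{name} denied by supervisor")
    return 0


def supervisor(requests, decide):
    """Acceptor side: answer each notification; None ends the loop."""
    while True:
        item = requests.get()
        if item is None:
            break
        name, args, reply = item
        reply.put(decide(name, args))


def run_demo(decide):
    """Invoker side: try to 'listen' and report how startup went."""
    requests = queue.Queue()
    worker = threading.Thread(target=supervisor, args=(requests, decide))
    worker.start()
    try:
        supervised_call(requests, "listen", (3, 128))
        outcome = "listening"
    except PermissionError:
        outcome = "startup failed"
    requests.put(None)             # stop the supervisor thread
    worker.join()
    return outcome
```

Passing `decide=lambda name, args: name != "listen"` plays the deny case; a function that always returns `True` plays the allow case.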

  32. Detecting the listen(2) call
    [Screenshots: output when denied; output when allowed]

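  33. Applications?
    • Ran an experiment that stops a process at "an arbitrary library function call" and creates a process dump with CRIU(*).
    • LD_PRELOAD + wrapper function + a "do nothing" syscall + seccomp
    Source: udzura.hatenablog.jp
    (*) Checkpoint and Restore In Userspace: a technique that saves the state of a process and resumes it from there. https://criu.org/Main_Page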
  34. Discussion

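  35. New technologies increase what is possible, but...
    • There is a lag between when a new technology "appears" and when it "spreads".
    • For example, cgroup v2 first appeared in 2013.
    • In some respects, runtime support only advanced in 2019 to 2020.
    • Surrounding tools will likely appear even later.
    • It is highly worthwhile to examine a technology on its own while it is emerging and check what problems (including security) and what possibilities it has.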
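  36. eBPF is becoming a basic technology of Linux
    • Its range of application keeps widening.
    • Besides tracing, bandwidth control, and networking: cgroup (v2) device decisions, LSM BPF programs, and so on...
    • In the security context, it is likely to become indispensable for tracing, auditing, and anomaly detection.
    • Getting hands-on with even a single prog type should give a feel for it.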