Learning the Internals of Linux Containers by Building One (つくって学ぶLinuxコンテナの裏側)

YAPC::ASIA Hachioji 2016 mid in Shinagawa

Hayato Imai

July 02, 2016

Transcript

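  1. Linux container overview: hardware virtualization and OS-level virtualization (1)
    Hardware virtualization
    • A virtualization approach that runs multiple virtualized hardware instances
    • Used by VMware, VirtualBox, Xen, KVM, Hyper-V, and others
    OS-level virtualization
    • A virtualization approach that runs multiple virtualized OS environments
    • Known as Container (Linux), Jail (FreeBSD), and Zone (Solaris)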
  2. Linux container overview: representative container implementations
    • Docker (runC) https://www.docker.com
    • LXC https://linuxcontainers.org
    • rkt (CoreOS Rocket) https://coreos.com/rkt
    • systemd https://www.freedesktop.org/wiki/Software/systemd
  3. Linux container overview: lightweight container implementations
    • jailing https://github.com/kazuho/jailing
    • droot https://github.com/yuuki/droot
    • mincs https://github.com/mhiramat/mincs
    • Firejail https://firejail.wordpress.com
    • pflask https://github.com/ghedo/pflask
  4. Let's build a container engine: the container engine yapC
    $ sudo yapc /bin/bash
    $ sudo YAPC_CPU_QUOTA=50000 yapc \
        /bin/bash -c "yes >/dev/null"
    $ sudo YAPC_CAPS="cap_net_raw" yapc ping 127.0.0.1
    $ sudo YAPC_ROOT=centos yapc yum --help
    • A shell script
    • Runs a program in an environment with isolated resources
    • Can limit system resources such as CPU and memory
    • Can restrict privileges inside the container
    • Can specify the root filesystem
    https://github.com/hayajo/yapC/tree/…
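  5. Let's build a container engine: resource isolation. Types of namespaces
    Name      Kernel   Overview
    Mount     …        Isolates the set of mount points.
    UTS       …        Isolates the host name and NIS domain name.
    IPC       …        Isolates System V IPC objects and POSIX message queues.
    PID       …        Isolates the PID space.
    Network   …        Isolates network-related system resources.
    User      …        Isolates UIDs, GIDs, and related identifiers.
    Cgroup    …        Isolates the cgroup root directory.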
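  6. Let's build a container engine: resource isolation. Namespace operations
    • clone(2): creates a new process and places it in new namespaces
    • unshare(2): moves the current process into new namespaces (*)
    • setns(2): moves the calling process into an existing namespace, such as that of a specified process (*)
    • unshare(1): the command-line interface to unshare(2)
    (*) For the PID namespace, it is the child processes whose namespace is separated or moved, not the caller itself.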
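As a rough illustration of the operations above, the sketch below inspects namespace membership through `/proc/<pid>/ns`, where each entry is a symlink such as `uts:[4026531838]`; two processes share a namespace when the link targets match. The helper names `ns_of` and `same_ns` are made up for this example and are not part of yapc. The commented `unshare` line assumes util-linux and root privileges.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of yapc): inspect namespace membership via /proc.

# Print the namespace identifier of a process, e.g. "uts:[4026531838]".
# $1 = PID (or "self"), $2 = namespace type (mnt, uts, ipc, pid, net, user, cgroup)
ns_of() {
  readlink "/proc/$1/ns/$2"
}

# Succeed when two processes are in the same namespace of the given type.
# $1 = first PID, $2 = second PID, $3 = namespace type
same_ns() {
  [ "$(ns_of "$1" "$3")" = "$(ns_of "$2" "$3")" ]
}

# Usage (needs root): print the UTS namespace seen inside freshly unshared
# namespaces and compare it with the output of `ns_of self uts` on the host.
#   sudo unshare --uts --pid --fork --mount-proc readlink /proc/self/ns/uts
```

If the two identifiers differ, the command ran in its own UTS namespace, so changing the host name there would not affect the host.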
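  7. Let's build a container engine: resource control. The cgroup filesystem
    • Mount cgroupfs and express groups as a directory hierarchy
    • A single resource such as cpu or memory is called a subsystem
    • Several different hierarchies can coexist (one per subsystem such as cpu or memory)
    • A hierarchy combining several subsystems can also be created (for example cpu+memory)
    • Usually mounted under /sys/fs/cgroup/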
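  8. Let's build a container engine: resource control. Types of subsystems (1)
    Subsystem   Kernel   Overview
    cpu         …        Controls CPU execution time
    cpuacct     …        Generates CPU usage reports
    cpuset      …        Assigns CPU cores
    devices     …        Controls access to device files
    freezer     …        Suspends and resumes processes
    memory      …        Controls the memory limit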
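  9. Let's build a container engine: resource control. Types of subsystems (2)
    Subsystem    Kernel   Overview
    net_cls      …        Tags network packets
    blkio        …        Controls block device I/O
    perf_event   …        Controls monitoring with the perf tool
    net_prio     …        Controls the priority of network traffic
    hugetlb      …        Handles large virtual memory pages (huge pages)
    pids         …        Controls the process count limit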
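  10. Let's build a container engine: resource control. yapC's cgroup implementation
    • cgroups are manipulated with cgcreate, cgset, and cgdelete
      https://github.com/hayajo/yapC/blob/…/yapc#L…-L…
      https://github.com/hayajo/yapC/blob/…/yapc#L…-L…
    • To control the resources of the container, which is started as a child process, unshare re-invokes the script itself ($0), and the script then checks its PID (with a PID namespace plus fork, the PID inside the container is 1)
      https://github.com/hayajo/yapC/blob/…/yapc#L…-L…
    • A trap deletes the created group when the process exits
      https://github.com/hayajo/yapC/blob/…/yapc#L…
      https://github.com/hayajo/yapC/blob/…/yapc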
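The create, limit, run, delete cycle described above can be sketched as a dry run that only prints the libcgroup commands instead of executing them, so it needs neither root nor a mounted cgroup hierarchy. This is not the yapc source: the function names are invented, it assumes cgroup v1 with the `cpu` subsystem, and mapping `YAPC_CPU_QUOTA` to `cpu.cfs_quota_us` is an assumption based on the value 50000 used on the earlier slide.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not the actual yapc source): print the libcgroup commands
# that would cap a command's CPU time. Nothing is executed here.

# $1 = group name, $2 = cpu.cfs_quota_us value, remaining args = command to run
cgroup_plan() {
  local group=$1 quota=$2
  shift 2
  echo "cgcreate -g cpu:$group"
  echo "cgset -r cpu.cfs_quota_us=$quota $group"
  echo "cgexec -g cpu:$group $*"
  echo "cgdelete -g cpu:$group"
}

# Share of one CPU allowed by a quota, assuming the default period of 100000 us.
quota_percent() {
  echo $(( $1 * 100 / 100000 ))
}

cgroup_plan "yapc-$$" 50000 /bin/bash -c 'yes >/dev/null'
echo "CPU share: $(quota_percent 50000)%"
```

In a real script the final `cgdelete` would typically be registered with `trap ... EXIT` so the group is removed even when the command fails.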
  11. Let's build a container engine: restricting privileges. What is a capability? (2)
    Example: granting the CAP_NET_RAW capability to a ping program that is not set-user-ID-root
    vagrant@vagrant:~$ ls -l /bin/ping
    -rwsr-xr-x 1 root root 44168 May 7 2014 /bin/ping
    vagrant@vagrant:~$ cp /bin/ping .
    vagrant@vagrant:~$ ls -l ./ping
    -rwxr-xr-x 1 vagrant vagrant 44168 Jun 29 23:51 ./ping
    vagrant@vagrant:~$ ./ping 127.0.0.1
    ping: icmp open socket: Operation not permitted
    vagrant@vagrant:~$ sudo setcap CAP_NET_RAW+ep ./ping
    vagrant@vagrant:~$ getcap ./ping
    ./ping = cap_net_raw+ep
    vagrant@vagrant:~$ ./ping -c 3 127.0.0.1
    PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
    64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.014 ms
    ...
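  12. Let's build a container engine: restricting privileges. Types of capabilities (2)
    Capabilities enabled by default in Docker (… kinds)
    Capability             Overview
    CAP_AUDIT_WRITE        Write records to the kernel audit log
    CAP_CHOWN              Make arbitrary changes to file UIDs and GIDs
    CAP_DAC_OVERRIDE       Bypass file read, write, and execute permission checks
    CAP_FOWNER             Bypass permission checks on operations that normally require the process UID to match the file's owner
    CAP_FSETID             Do not clear the set-user-ID and set-group-ID bits when a file is modified
    CAP_KILL               Bypass permission checks when sending signals
    CAP_MKNOD              Create special files with mknod(2)
    CAP_NET_BIND_SERVICE   Bind to well-known ports
    CAP_NET_RAW            Use RAW and PACKET sockets
    CAP_SETFCAP            Set file capabilities
    CAP_SETGID             Manipulate process GIDs and the supplementary GID list
    CAP_SETPCAP            Manipulate process capabilities
    CAP_SETUID             Manipulate process UIDs
    CAP_SYS_CHROOT         Call chroot(2)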
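  13. Let's build a container engine: restricting privileges. Capability sets
    Each process and each file has its own capability sets.
    Capability set     Process   File   Description
    Permitted (Prm)    ✔         ✔      The capabilities that Eff and Inh are allowed to hold. Once a capability is dropped, the process cannot set it again by itself.
    Inheritable (Inh)  ✔         ✔      The capabilities inherited across execve(2)
    Effective (Eff)    ✔         ✔      The capabilities actually checked (a single bit for files)
    Bounding (Bnd)     ✔                A set that limits the capabilities that can be gained
    Ambient (Amb)      ✔                The capabilities preserved across execve(2) of an unprivileged program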
  14. Let's build a container engine: restricting privileges. Checking capability sets
    Checking the capability sets of a process:
    $ cat /proc/self/status | grep ^Cap
    CapInh: 0000000000000000
    CapPrm: 0000000000000000
    CapEff: 0000000000000000
    CapBnd: 0000003fffffffff
    CapAmb: 0000000000000000
    $ sudo cat /proc/self/status | grep ^Cap
    CapInh: 0000000000000000
    CapPrm: 0000003fffffffff
    CapEff: 0000003fffffffff
    CapBnd: 0000003fffffffff
    CapAmb: 0000000000000000
    Checking the capability sets of a file:
    $ getcap ./ping
    ./ping = cap_net_raw+ep
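The hexadecimal masks printed in `/proc/<pid>/status` are bit fields in which bit N corresponds to capability number N. As a minimal sketch, the hypothetical helper `has_cap` below tests a single bit using bash arithmetic; it relies on `CAP_NET_RAW` being capability number 13 in `<linux/capability.h>`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: test one capability bit in a /proc/<pid>/status hex mask.

CAP_NET_RAW=13   # bit number from <linux/capability.h>

# $1 = hex mask without 0x prefix (as printed for CapEff etc.)
# $2 = capability bit number
# Exit status 0 when the bit is set, 1 otherwise.
has_cap() {
  (( (16#$1 >> $2) & 1 ))
}

for mask in 0000000000000000 0000003fffffffff; do
  if has_cap "$mask" "$CAP_NET_RAW"; then
    echo "$mask: cap_net_raw set"
  else
    echo "$mask: cap_net_raw not set"
  fi
done
```

Where libcap is installed, `capsh --decode=0000003fffffffff` expands a whole mask into capability names in one step.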
  15. Let's build a container engine: restricting privileges. How capabilities are computed
    P'(Amb) = (file is privileged) ? 0 : P(Amb)
    P'(Prm) = (P(Inh) & F(Inh)) | (F(Prm) & cap_bset) | P'(Amb)
    P'(Eff) = F(Eff) ? P'(Prm) : P'(Amb)
    P'(Inh) = P(Inh)
    P: capability sets of the process before execve(2)
    P': capability sets of the process after execve(2)
    F: capability sets of the file
    cap_bset: bounding set of the process
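To make the formulas above concrete, here is a small sketch that evaluates them with bash integer bit masks. The function name `cap_execve` and its argument order are invented for illustration. The worked case is the earlier `./ping` example: an unprivileged process (all sets empty except the bounding set) executes a file carrying `cap_net_raw+ep`, so F(Prm) has bit 13 set and F(Eff) is 1.

```shell
#!/usr/bin/env bash
# Sketch of the execve(2) capability transformation using bash integer masks.
# Arguments:
#   $1 P(Inh)  $2 P(Amb)  $3 F(Inh)  $4 F(Prm)
#   $5 F(Eff): the file's single effective bit (0 or 1)
#   $6 cap_bset: the bounding set
#   $7 1 when the file is privileged (has capabilities or set-user/group-ID bits)
cap_execve() {
  local P_INH=$1 P_AMB=$2 F_INH=$3 F_PRM=$4 F_EFF=$5 BSET=$6 PRIVILEGED=$7
  local n_amb n_prm n_eff
  n_amb=$(( PRIVILEGED ? 0 : P_AMB ))
  n_prm=$(( (P_INH & F_INH) | (F_PRM & BSET) | n_amb ))
  n_eff=$(( F_EFF ? n_prm : n_amb ))
  # P'(Inh) is simply P(Inh).
  printf 'Inh=%x Prm=%x Eff=%x Amb=%x\n' "$P_INH" "$n_prm" "$n_eff" "$n_amb"
}

# ./ping with cap_net_raw+ep, run by a process with no capabilities.
cap_execve 0 0 0 $((1 << 13)) 1 $((16#3fffffffff)) 1
# prints: Inh=0 Prm=2000 Eff=2000 Amb=0
```

The result 0x2000 is bit 13 alone, which is why the copied `ping` works after `setcap` even though the invoking shell holds no capabilities.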
  16. Let's build a container engine: restricting privileges. yapC capability demo
    $ sudo /vagrant/yapc.3 ping 127.0.0.1
    ✔ Confirm that ping does not work
    $ sudo YAPC_CAPS="cap_net_raw" \
        /vagrant/yapc.3 ping 127.0.0.1
    ✔ Confirm that ping works
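  17. Let's build a container engine: changing the root filesystem. Conditions for pivot_root
    The new root filesystem (new_root) and the location where the old root filesystem is moved (put_old), both passed to pivot_root, are subject to the following restrictions:
    • Both must be directories
    • new_root and put_old must not be on the same filesystem as the current root
    • put_old must be at or below new_root
    • No other filesystem may be mounted on put_old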
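Two of the restrictions above can be checked before calling `pivot_root`, which gives clearer errors than the bare "Invalid argument" the call itself reports. The sketch below is a hypothetical pre-flight helper, `check_pivot_args`, that covers only the "must be directories" and "put_old at or below new_root" conditions; the filesystem and mount-point conditions are not checked.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for pivot_root arguments. Covers only the
# directory condition and the "put_old at or below new_root" condition.
# $1 = new_root, $2 = put_old
check_pivot_args() {
  local new_root put_old
  [ -d "$1" ] && [ -d "$2" ] || { echo "both arguments must be directories"; return 1; }
  # Resolve symlinks and relative components before comparing paths.
  new_root=$(cd "$1" && pwd -P) || return 1
  put_old=$(cd "$2" && pwd -P) || return 1
  case "$put_old/" in
    "$new_root"/*) echo "ok" ;;
    *) echo "put_old must be at or below new_root"; return 1 ;;
  esac
}

# Example with throwaway directories (no root required).
demo_root=$(mktemp -d)
mkdir "$demo_root/.put_old"
check_pivot_args "$demo_root" "$demo_root/.put_old"
rm -rf "$demo_root"
```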
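  18. Let's build a container engine: changing the root filesystem. overlayfs lower, upper, and work
    lower
    • The lower-layer directory; read-only
    • The directory that serves as the base of the filesystem
    upper
    • The upper-layer directory; writable
    • Newly created and updated files are written here
    work
    • A working directory
    • Must reside on the same filesystem as upper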
  19. Let's build a container engine: changing the root filesystem. overlayfs operations
    • mount(2), mount(8)
    $ CLONE_DIR=$(mktemp -d)
    $ for d in upper work root; do mkdir "$CLONE_DIR/$d"; done
    $ sudo mount \
        -t overlay \
        -o "lowerdir=/,upperdir=$CLONE_DIR/upper,workdir=$CLONE_DIR/work" \
        overlayfs \
        "$CLONE_DIR/root"
  20. Let's build a container engine: changing the root filesystem. yapC's pivot_root and overlayfs implementation
    • overlayfs: mount
      https://github.com/hayajo/yapC/blob/…/yapc#L…-L…
    • pivot_root
      https://github.com/hayajo/yapC/blob/…/yapc#L…-L…
    • bind mount
      https://github.com/hayajo/yapC/blob/…/yapc#L…-L…
      ‣ Makes a file or directory visible at another location in the filesystem
      ‣ Unlike a symlink, it is not constrained by chroot or pivot_root
      ‣ Unlike a hard link, it also works for directories
    https://github.com/hayajo/yapC/blob/…/yapc
  21. Let's build a container engine: changing the root filesystem. yapC pivot_root and overlayfs demo (1)
    $ sudo /vagrant/yapc.4 /bin/bash
    CONT# touch /yapc; ls -l /yapc
    ✔ Confirm that /yapc does not exist on the host
    ✔ Confirm that /tmp/yapc-<PID>.XXXXXX/upper/yapc exists on the host
  22. Let's build a container engine: changing the root filesystem. yapC pivot_root and overlayfs demo (2)
    $ sudo YAPC_ROOT=centos /vagrant/yapc.4 /bin/bash
    CONT# yum install -y epel-release; yum install -y sl; sl
    ✔ Beforehand, create an archive of the centos root filesystem with docker export and extract it into ~/centos on the host:
      docker export $(docker create centos) > centos.tar
    ✔ Confirm that the yum command works
  23. Let's build a container engine: Appendix, network isolation. Network namespace operations
    • ip-netns(8)
      • ip netns [list]: list network namespaces
      • ip netns add: create a network namespace
      • ip netns del: delete a network namespace
      • ip netns exec: run a command in the specified network namespace
    • and more...
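  24. Let's build a container engine: Appendix, network isolation. yapC's network namespace and veth implementation
    • ip-netns(8), ip-link(8)
      https://github.com/hayajo/yapC/blob/…#L…-L…
      ‣ ip netns exec unmounts /sys → cgroups can no longer be configured
      ‣ pivot_root hides the host's /run (/var/run) → the network namespace can no longer be used
      ‣ Because of these two points the implementation became rather painful (if you have a better idea, please let me know)
    https://github.com/hayajo/yapC/blob/…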
  25. Let's build a container engine: Appendix, network isolation. yapC network namespace and veth demo (2)
    Create the bridge:
    $ sudo ip link add name yapc0 type bridge
    $ sudo ip link set dev yapc0 up
    $ sudo ip a add 10.0.0.1/24 \
        broadcast 10.0.0.255 \
        label yapc0 \
        dev yapc0
  26. Let's build a container engine: Appendix, network isolation. yapC network namespace and veth demo (3)
    Attach the host-side veth to the bridge:
    $ ip link # check the device name
    $ sudo ip link set dev vethXXXXXXX up
    $ sudo ip link set dev vethXXXXXXX master yapc0
    Assign an IP address to the container-side veth:
    CONT# ip link # confirm that eth0 exists
    CONT# ip link set dev eth0 up
    CONT# ip a add 10.0.0.10/24 dev eth0
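  27. Summary
    • Linux containers are built from a combination of several kernel features
    • Resource isolation
      ‣ Namespace
    • Resource limits
      ‣ cgroup
    • Privilege restriction
      ‣ Capability
    • Changing the root filesystem
      ‣ chroot / pivot_root, overlayfs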
  28. Summary
    • "My host was taken over through Docker" http://qiita.com/…
    • Unprivileged containers
    • MAC
    • Seccomp
    • PR_SET_NO_NEW_PRIVS
    If there is an opportunity, I would like to talk about container security features and countermeasures.