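Slides from the talk given at 第10回 コンテナ型仮想化の情報交換会@東京 (the 10th Container Virtualization Information Exchange Meeting @ Tokyo). The PDF links to reference material, but links inside the slides cannot be clicked on Speaker Deck, so please download the PDF to follow them.
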
cgroup v2
第10回 コンテナ型仮想化の情報交換会@東京 (10th Container Virtualization Information Exchange Meeting @ Tokyo)
加藤泰文 (KATOH Yasufumi)
2016-10-29

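Self-introduction
• 加藤泰文 (KATOH Yasufumi)
• http://www.ten-forward.ws/
• @ten_forward
• http://gplus.to/tenforward
• https://github.com/tenforward
• http://d.hatena.ne.jp/defiant/ (technical blog)
• Plamo Linux maintainer
• Author of the gihyo.jp series "LXCで学ぶコンテナ入門 -軽量仮想化環境を実現する技術" (An Introduction to Containers with LXC: Technology for Lightweight Virtualization Environments)
• Occasional contributor to LXC/LXD development
  • Japanese translation of the man pages
  • Japanese translation of the official site (linuxcontainers.org)
  • A small amount of code, such as bug fixes
  • Japanese messages for LXD

Today's goal
• Introduce the very basics of cgroup v2
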
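cgroup refresher

What is cgroup?
A mechanism that groups processes and applies resource limits to each group. It is not specific to containers.
• Functionality is split into subsystems (controllers)
• cgroupfs is mounted, and each directory represents a group
• Groups are operated on by reading and writing the files in those directories
• There are two versions: cgroup v1, which is the one in wide use today, and cgroup v2, which became stable in kernel 4.5

Terminology
The cgroup v2 documentation now defines the terms explicitly.
• "cgroup" is always lowercase
• "cgroup" or "cgroups"?
  • Singular: when referring to the feature as a whole, and as a qualifier (as in "cgroup controllers")
  • Plural: only when explicitly referring to multiple individual cgroups
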
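cgroup hierarchy structure

Characteristics of cgroup v1
• Multiple hierarchies are possible: cgroupfs can be mounted more than once
• Membership in a group is per thread
• Tasks can be attached to a node (directory) at any level of the cgroupfs tree

cgroup v1

Multiple hierarchies
• Any number of hierarchies can be created, and any number of controllers can be attached to each
• That should be flexible and convenient, but...

Controller restrictions
However:
• A controller can belong to only one hierarchy
• A controller that would be useful in several hierarchies (freezer, for example) can be used only in one particular hierarchy
• Once a controller is attached to a hierarchy, it cannot be moved
• All controllers attached to the same hierarchy must share the same group structure

In the end, there was not much flexibility after all
• Creating one hierarchy per controller became the common practice
• Only closely related controllers, for which sharing the same grouping makes sense, are attached to the same hierarchy

Relationships between controllers
• Controllers do not cooperate
  • Multiple controllers cannot be made to work together
• Behavior differs from controller to controller
  • The cgroup.clone_children file of cpuset
  • The memory.use_hierarchy file of memory
  • and so on...

How tasks are handled
• Tasks can belong to a node at any level of the hierarchy
  • When tasks belong to both a parent cgroup and its child, resource distribution between them becomes chaotic
• The unit of a task is a thread
  • For some controllers this granularity is meaningless
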
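cgroup v2 overview

History
• Introduced in 3.16 as the "unified control group hierarchy"
• At that time it was mounted with the __DEVEL__sane_behavior option (a "sane behavior" option!!)
• References:
  • The unified control group hierarchy in 3.16 (lwn.net)
  • All About the Linux Kernel: Cgroup's Redesign (linux.com)
• Became stable in 4.5

Features
• A single hierarchy
• Managed per process
• Can coexist with v1
  • For example, some controllers can be used through v2 while the others are used through v1
• Operated the same way as v1 (groups are managed as directories and controlled by reading and writing files)

Implementation
• As of 4.8, only the memory, io, and pids controllers are available
• The documentation describes a cpu controller, but it has not been merged

cgroup v2 operations
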
Using cgroup v2
• Mount it
    # mount -t cgroup2 cgroup2 /sys/fs/cgroup

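Before mounting, it can help to check whether a cgroup v2 hierarchy is already mounted somewhere. The following is a minimal sketch that assumes the usual /proc/mounts layout (device, mount point, filesystem type, ...). The function name cgroup2_mountpoint and its optional file argument are made up for this illustration and are not part of any standard tool.

```shell
# Print the mount point of the first cgroup2 filesystem found in a
# mounts table (default: /proc/mounts). Prints nothing and returns
# non-zero when no cgroup v2 hierarchy is mounted.
# cgroup2_mountpoint is a hypothetical helper name.
cgroup2_mountpoint() {
    awk '$3 == "cgroup2" { print $2; found = 1; exit }
         END { exit !found }' "${1:-/proc/mounts}"
}
```

A script could call cgroup2_mountpoint first and run the mount command from the slide only when it returns non-zero.
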
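Creating and removing cgroups
• Right after mounting, only the root cgroup exists
• Creating a directory creates a group
    # mkdir cgroup_name
• Removing a directory removes the group
    # rmdir cgroup_name
• Only a cgroup that has no child cgroups and no processes can be removed
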
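The removal rule above can be checked before calling rmdir. This is a sketch only: cgroup_is_removable is a hypothetical helper name, and it assumes that child cgroups appear as subdirectories and that member processes are listed in cgroup.procs, as the slides describe.

```shell
# Return 0 if the cgroup directory $1 could be removed with rmdir:
# it has no child cgroups (subdirectories) and no member processes
# (its cgroup.procs file is empty).
# cgroup_is_removable is a hypothetical helper name.
cgroup_is_removable() {
    dir=$1
    # Any subdirectory is a child cgroup.
    for child in "$dir"/*/; do
        [ -d "$child" ] && return 1
    done
    # A non-empty cgroup.procs means processes still belong to the group.
    [ -z "$(cat "$dir/cgroup.procs" 2>/dev/null)" ]
}
```
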
Putting a process into a cgroup
• Write the PID to cgroup.procs
    # echo $PID > /path/to/cgroup/cgroup.procs
• The cgroup a process belongs to is listed in /proc/$PID/cgroup
    # cat /proc/$$/cgroup
    0::/cgroup_name

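On a system where v1 hierarchies are mounted as well, /proc/$PID/cgroup contains one line per hierarchy, and the v2 entry is the one of the form "0::/path" shown above. A small sketch for extracting just that path follows. The name cgroup2_path_of is an assumption made for this example.

```shell
# Print the cgroup v2 path of a process from a /proc/PID/cgroup-style
# file (default: the entry of the current shell). The v2 entry is the
# line with hierarchy ID 0 and an empty controller list: "0::/path".
# cgroup2_path_of is a hypothetical helper name.
cgroup2_path_of() {
    sed -n 's/^0:://p' "${1:-/proc/$$/cgroup}"
}
```
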
Monitoring the state of a cgroup
• Every cgroup except the root has a cgroup.events file
    # cat cgroup.events
    populated 0                (populated is 0 when neither this cgroup nor any descendant contains a process)
    # echo $$ > cgroup.procs   (add a process and...)
    # cat cgroup.events
    populated 1                (populated is 1 when a process exists)
• When the value of populated changes, a file-modified event is generated (poll, dnotify, inotify)
    $ inotifywait -m /sys/fs/cgroup/test01/cgroup.events
    (add a process to test01)
    /sys/fs/cgroup/test01/cgroup.events MODIFY

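A script reacting to these events usually needs the current value of populated rather than the whole file. The sketch below reads it with awk. The function name cgroup_populated is made up for this example, and it assumes the "key value" line format shown in the session above.

```shell
# Print the value of the "populated" key (0 or 1) from the
# cgroup.events file given as $1.
# cgroup_populated is a hypothetical helper name.
cgroup_populated() {
    awk '$1 == "populated" { print $2 }' "$1"
}
```

It could be called each time inotifywait reports MODIFY for the file, for instance to clean up a group once the value drops back to 0.
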
Controlling controllers
• The controllers available in each cgroup are listed in cgroup.controllers
    # cat cgroup.controllers
    io memory pids
• The controllers to be used in child cgroups are controlled through cgroup.subtree_control (prefix a controller with "+" to enable it and with "-" to disable it)
    # echo "-pids +memory +io" > cgroup.subtree_control
    # cat cgroup.subtree_control
    io memory

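Since only controllers listed in cgroup.controllers can be enabled for children, a script may want to verify a request before writing it to cgroup.subtree_control. The following is a sketch under that assumption. controllers_available is a hypothetical helper, not an existing command.

```shell
# Return 0 if every controller named in $2.. appears in the
# cgroup.controllers file $1. Otherwise report the missing ones on
# stderr and return 1.
# controllers_available is a hypothetical helper name.
controllers_available() {
    file=$1
    shift
    missing=0
    for c in "$@"; do
        # Split the space-separated list into lines and match exactly.
        if ! tr -s ' ' '\n' < "$file" | grep -qxF "$c"; then
            echo "controller not available: $c" >&2
            missing=1
        fi
    done
    return "$missing"
}
```
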
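Constraints when a cgroup has children
• A cgroup can distribute resources to its children only when it holds no processes itself
• Only a cgroup without processes can enable controllers through its cgroup.subtree_control file
• The root is exempt from this constraint
    # cat cgroup.procs
    3541
    3577
    # mkdir child
    # echo "+io" > cgroup.subtree_control
    bash: echo: write error: Device or resource busy
    # echo $$ > /sys/fs/cgroup/cgroup.procs   (move the process to the root, removing it from the current group)
    # echo "+io" > cgroup.subtree_control
    # cat cgroup.subtree_control
    io

Delegating to unprivileged users
• A cgroup is delegated to an unprivileged user by granting that user write permission on the cgroup's directory and its cgroup.procs file
• The user can then freely distribute resources below that group
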
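One way to grant that write permission is to change the owner of the directory and of cgroup.procs. The sketch below takes that approach. Using chown (rather than, say, group permissions) and the function name delegate_cgroup are assumptions made for this example, and on a real cgroupfs the command has to be run as root.

```shell
# Hand the cgroup directory $1 over to the user $2 (name or numeric
# UID) by changing the owner of the directory itself and of its
# cgroup.procs file, the two objects named on the slide.
# delegate_cgroup is a hypothetical helper name.
delegate_cgroup() {
    chown "$2" "$1" "$1/cgroup.procs"
}
```

After delegation the user can create child cgroups with mkdir inside the directory and move processes among them.
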
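Resource distribution models
• Weights
  • Resources are divided according to the ratio of weights between groups
• Limits
  • A group can use the resource up to the configured amount (overcommit possible)
• Protections
  • The allocation is guaranteed as long as the ancestors have not exceeded their own limits (overcommit possible)
• Allocation
  • Allocation of a finite resource (overcommit not possible)
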
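For the weight model, each sibling's share is its weight divided by the sum of the weights of all siblings competing for the resource. The sketch below works this out for a list of weights. weight_shares is a name made up for this example, and the function expects at least one positive weight as an argument.

```shell
# Given the weights of sibling cgroups as arguments, print each
# sibling's share of the resource as a percentage of the total weight,
# one value per line and in argument order.
# weight_shares is a hypothetical helper name.
weight_shares() {
    printf '%s\n' "$@" | awk '
        { w[NR] = $1; total += $1 }
        END { for (i = 1; i <= NR; i++) printf "%.1f\n", 100 * w[i] / total }'
}
```

For example, three siblings with weights 100, 200 and 100 would receive 25%, 50% and 25% respectively while all three are busy.
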
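Control files: file names
• For weight-based resource distribution, the file is named "weight"
• For absolute resource guarantees and limits, the files are named "min" and "max" respectively
• For best-effort resource guarantees and limits, the files are named "low" and "high" respectively

How to set resource limits
• Almost the same as v1
• Refer to the usage notes of each controller
• For weights, the default is 100 and the valid range is 1 to 10000
• Where a default can be configured, it is set with the "default" keyword
    echo "default 100" > control_file
    echo "8:0 150" > control_file       (set the value for "8:0" to 150)
    echo "8:0 default" > control_file   (reset "8:0" to the default)
• Use "max" to specify no limit
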
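When reading such a keyed file back, the value that applies to a given key is either its own entry or, if there is none, the "default" line. The following sketch performs that lookup. It assumes one "key value" pair per line, as in the echo commands above, and effective_value is a hypothetical helper name.

```shell
# Look up the effective value for key $2 (for example a device number
# such as "8:0") in the keyed control file $1, falling back to the
# "default" line when the key has no entry of its own.
# effective_value is a hypothetical helper name.
effective_value() {
    awk -v key="$2" '
        $1 == key       { val = $2; found = 1 }
        $1 == "default" { def = $2 }
        END { print (found ? val : def) }' "$1"
}
```
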
Configuration example
    # cat pids.max                  (show the limit on the number of processes)
    max                             (the default)
    # echo "2" > pids.max           (set the limit on the number of processes to 2)
    # cat pids.max
    2
    # echo $$ > cgroup.procs
    # cat pids.current              (show the current number of processes)
    2
    # ( echo "test" | cat )
    bash: fork: retry: No child processes
    bash: fork: retry: No child processes
    bash: fork: retry: No child processes
    bash: fork: retry: No child processes
    bash: fork: Resource temporarily unavailable
    Terminated
    # echo "max" > pids.max         (remove the limit on the number of processes)
    # cat pids.max
    max
    # ( echo "test" | cat )
    test

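The fork failures in this session occur because the group is already at its limit. A script can check the remaining headroom in advance by comparing pids.max with pids.current. This is a sketch: pids_headroom is a name invented for this example, and it assumes both files hold a single value as shown above.

```shell
# Print how many more processes the cgroup in directory $1 may create,
# based on its pids.max and pids.current files, or "unlimited" when
# pids.max contains the keyword "max".
# pids_headroom is a hypothetical helper name.
pids_headroom() {
    max=$(cat "$1/pids.max")
    cur=$(cat "$1/pids.current")
    if [ "$max" = "max" ]; then
        echo unlimited
    else
        echo $(($max - $cur))
    fi
}
```
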
Status of cgroup v2
• It is stable, but...
• The CPU controller has met with opposition and has not been merged
  • The case of the stalled CPU controller (lwn.net)
  • [Documentation] State of CPU controller in cgroup v2 (lwn.net) (a summary by Tejun Heo)