cgroups and Process Creation/Exit Handling

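Slides presented at the 12th Container Technology Information Exchange Meeting @ Online (第12回 コンテナ技術の情報交換会@オンライン), explaining how cgroups are handled when a process is created and when it exits.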

https://ct-study.connpass.com/event/188440/

Masami Ichikawa

October 17, 2020

Transcript

  1. The task_struct structure and cgroups
     • task_struct holds two main pieces of cgroups data.

         #ifdef CONFIG_CGROUPS
             /* Control Group info protected by css_set_lock: */
             struct css_set __rcu *cgroups;
             /* cg_list protected by css_set_lock and tsk->alloc_lock: */
             struct list_head cg_list;
         #endif

       https://elixir.bootlin.com/linux/v4.19/source/include/linux/sched.h#L982
  2. Registering cgroup_subsys structures
     • cgroup_subsys structures are registered through the SUBSYS macro.
       ◦ The macro is given several different definitions by repeated #ifdef/#undef, so reading the preprocessed output (cgroup.i) is probably the best way to follow it, although that is a bit of a chore.
     • Registration is done for every subsystem in include/linux/cgroup_subsys.h.
     • Each subsystem declares and initializes its variable in the usual way.

       Defined in 8 files:
         include/linux/cgroup-defs.h, line 40 (as a macro)
         include/linux/cgroup.h, line 70 (as a macro)
         include/linux/cgroup.h, line 74 (as a macro)
         kernel/cgroup/cgroup.c, line 117 (as a macro)
         kernel/cgroup/cgroup.c, line 124 (as a macro)
         kernel/cgroup/cgroup.c, line 131 (as a macro)
         kernel/cgroup/cgroup.c, line 139 (as a macro)
         kernel/cgroup/cgroup.c, line 145 (as a macro)

         #if IS_ENABLED(CONFIG_CGROUP_PIDS)
         SUBSYS(pids)
         #endif

         struct cgroup_subsys pids_cgrp_subsys = {
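The repeated redefinition of SUBSYS described on this slide is the classic X-macro technique. The following is a minimal userspace sketch of that idea, not the kernel's actual code: `SUBSYS_LIST`, `struct subsys`, `subsys_table`, and `subsys_id_by_name()` are hypothetical names, and the list of subsystems is kept in a single macro rather than in a separate header that is included several times.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for include/linux/cgroup_subsys.h:
 * one SUBSYS() entry per subsystem. */
#define SUBSYS_LIST \
    SUBSYS(cpu)     \
    SUBSYS(memory)  \
    SUBSYS(pids)

/* Simplified stand-in for struct cgroup_subsys (assumption). */
struct subsys {
    const char *name;
    int id;
};

/* Expansion 1: generate one enum constant per subsystem. */
#define SUBSYS(x) x##_cgrp_id,
enum subsys_id {
    SUBSYS_LIST
    SUBSYS_COUNT
};
#undef SUBSYS

/* Each subsystem declares and initializes its own variable as usual. */
struct subsys cpu_cgrp_subsys    = { .name = "cpu",    .id = cpu_cgrp_id };
struct subsys memory_cgrp_subsys = { .name = "memory", .id = memory_cgrp_id };
struct subsys pids_cgrp_subsys   = { .name = "pids",   .id = pids_cgrp_id };

/* Expansion 2: build a table indexed by the generated ids. */
#define SUBSYS(x) [x##_cgrp_id] = &x##_cgrp_subsys,
struct subsys *subsys_table[] = {
    SUBSYS_LIST
};
#undef SUBSYS

/* Look up a subsystem id by name; returns -1 when the name is unknown. */
int subsys_id_by_name(const char *name)
{
    assert(name != NULL);
    for (int i = 0; i < SUBSYS_COUNT; i++)
        if (strcmp(subsys_table[i]->name, name) == 0)
            return i;
    return -1;
}
```

Adding a subsystem in this sketch means adding one `SUBSYS()` entry and one variable definition; the enum and the table follow automatically, which is the point of the pattern.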
  3. Process creation
     • Several cgroups-related steps run during process creation, but the substantive work is done in cgroup_post_fork().

         _do_fork()
           -> copy_process()
             -> cgroup_fork()                      -> initialize variables
             -> cgroup_threadgroup_change_begin()  -> acquire the semaphore
             -> cgroup_can_fork()                  -> checks
             -> cgroup_post_fork()                 -> main processing
             -> cgroup_threadgroup_change_end()    -> release the semaphore
  4. cgroup_fork()
     • The child argument is the task_struct of the process being created.
     • The cgroups member is set to point at init_css_set.
       ◦ init_css_set is the default css_set structure.
       ◦ It is declared and initialized in kernel/cgroup/cgroup.c.
     • The cg_list linked list is initialized.

         void cgroup_fork(struct task_struct *child)
         {
             RCU_INIT_POINTER(child->cgroups, &init_css_set);
             INIT_LIST_HEAD(&child->cg_list);
         }
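  5. cgroup_can_fork()
     • Asks each subsystem whether the process may be created.
       ◦ A subsystem that wants to run a check at process creation implements can_fork().
       ◦ If the process cannot be created, cancel_fork() is run.
         ▪ Execution jumps to the out_revert label for the cancellation handling.
     • If cgroup_can_fork() returns a nonzero value, fork() fails as well.

         int cgroup_can_fork(struct task_struct *child)
         {
             /* ... omitted ... */
             do_each_subsys_mask(ss, i, have_canfork_callback) {
                 ret = ss->can_fork(child);
                 if (ret)
                     goto out_revert;
             } while_each_subsys_mask();

             return 0;
             /* ... omitted ... */
         }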
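The "ask every subsystem, and cancel on refusal" flow above can be sketched in plain C. This is a simplified userspace model under stated assumptions, not the kernel code: `struct task`, `struct subsys`, `can_fork_all()`, and the two example subsystems `counting` and `denying` are hypothetical, and an ordinary array loop replaces the `do_each_subsys_mask()` iteration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the task being created. */
struct task {
    int charged; /* how many subsystems currently hold a charge for it */
};

/* Simplified stand-in for struct cgroup_subsys; callbacks may be NULL. */
struct subsys {
    int  (*can_fork)(struct task *child);    /* 0 = allow, nonzero = refuse */
    void (*cancel_fork)(struct task *child); /* undo what can_fork() did */
};

static int  count_can_fork(struct task *child)    { child->charged++; return 0; }
static void count_cancel_fork(struct task *child) { child->charged--; }
static int  deny_can_fork(struct task *child)     { (void)child; return -1; }

/* Example subsystems: one that charges and can undo, one that always refuses. */
struct subsys counting = { count_can_fork, count_cancel_fork };
struct subsys denying  = { deny_can_fork, NULL };

/* Ask each subsystem in order; on the first refusal, cancel the earlier ones. */
int can_fork_all(struct subsys *const ss[], size_t n, struct task *child)
{
    size_t i, j;
    int ret = 0;

    assert(child != NULL);
    for (i = 0; i < n; i++) {
        if (!ss[i]->can_fork)
            continue;
        ret = ss[i]->can_fork(child);
        if (ret)
            goto out_revert;
    }
    return 0;

out_revert:
    /* Only subsystems consulted before the failing one are reverted. */
    for (j = 0; j < i; j++)
        if (ss[j]->cancel_fork)
            ss[j]->cancel_fork(child);
    return ret;
}
```

With the list `{ &counting, &denying, &counting }`, the first subsystem's charge is rolled back and the third is never consulted, so the task ends with no charge and the caller sees the nonzero return value.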
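  6. When can a process not be created?
     • In v4.19, the pids cgroup is the only subsystem that implements can_fork().
       ◦ The pids cgroup limits the number of processes that can be started.
     • pids_can_fork() calls pids_try_charge(), which walks up the pids cgroup hierarchy, adds 1 to the process count at each level, and checks that the limit is not exceeded.
       ◦ If the limit is exceeded, pids_try_charge() subtracts the counts it added and restores the original state.

         static int pids_can_fork(struct task_struct *task)
         {
             /* ... omitted ... */
             css = task_css_check(current, pids_cgrp_id, true);
             pids = css_pids(css);
             err = pids_try_charge(pids, 1);
             if (err) {
                 /* Only log the first time events_limit is incremented. */
                 /* ... omitted ... */
                 cgroup_file_notify(&pids->events_file);
             }
             return err;
         }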
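The charge-then-revert walk up the hierarchy can be illustrated with a small userspace sketch. It is a model under assumptions, not the kernel's pids_try_charge(): `struct pids_node` and `try_charge()` are hypothetical names, plain `long` counters replace the kernel's atomic counters, and every level including the topmost node is charged.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for one level of the pids cgroup hierarchy. */
struct pids_node {
    struct pids_node *parent; /* NULL at the top of the hierarchy */
    long counter;             /* processes currently charged here */
    long limit;               /* maximum allowed at this level */
};

/* Charge num processes at pids and at every ancestor.
 * Returns 0 on success, or -EAGAIN after undoing all partial charges. */
int try_charge(struct pids_node *pids, long num)
{
    struct pids_node *p, *q;

    assert(pids != NULL);
    for (p = pids; p != NULL; p = p->parent) {
        p->counter += num;
        if (p->counter > p->limit)
            goto revert;
    }
    return 0;

revert:
    /* Undo every level below the one that overflowed, then that level itself. */
    for (q = pids; q != p; q = q->parent)
        q->counter -= num;
    p->counter -= num;
    return -EAGAIN;
}
```

For a child with limit 10 under a parent with limit 2, the first two charges succeed and the third fails at the parent; after the failure both counters are back at 2.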
  7. cgroup_post_fork()
     • The main cgroups processing at process creation.
       ◦ That said, it does nothing very complicated; this is about all there is to it.

         void cgroup_post_fork(struct task_struct *child)
         {
             struct cgroup_subsys *ss;
             /* ... omitted ... */
             if (use_task_css_set_links) {
                 struct css_set *cset;

                 spin_lock_irq(&css_set_lock);
                 cset = task_css_set(current);
                 if (list_empty(&child->cg_list)) {
                     get_css_set(cset);
                     cset->nr_tasks++;
                     css_set_move_task(child, NULL, cset, false);
                 }
                 spin_unlock_irq(&css_set_lock);
             }
             /* ... omitted ... */
             do_each_subsys_mask(ss, i, have_fork_callback) {
                 ss->fork(child);
             } while_each_subsys_mask();
         }
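  8. cgroup_post_fork() (cont’d)
     • The work done by cgroup_post_fork() falls into two parts.
       ◦ The part inside the if block that handles the linked list (①)
       ◦ The part that calls the subsystem-specific cgroups handlers (②)

         void cgroup_post_fork(struct task_struct *child)
         {
             /* ... omitted ... */
             if (use_task_css_set_links) {   /* ① */
                 /* ... omitted ... */
             }
             /* ... omitted ... */
             do_each_subsys_mask(ss, i, have_fork_callback) {
                 ss->fork(child);            /* ② */
             } while_each_subsys_mask();
         }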
  9. cgroup_post_fork() (cont’d)
     • The use_task_css_set_links variable
       ◦ It is set to true when a cgroup (cgroups v1) or cgroup2 (cgroups v2) filesystem is mounted.
         ▪ The default value is false.
         ▪ Using cgroups requires mounting a cgroup/cgroup2 filesystem.
       ◦ If neither filesystem is mounted, cgroups are not in use.
         ▪ In that case the list does not need to be maintained, so the flag is used to make process creation cheaper.
           • https://elixir.bootlin.com/linux/v4.19/source/kernel/cgroup/cgroup.c#L1798

         if (use_task_css_set_links) {
             struct css_set *cset;
             /* ... omitted ... */
         }
  10. cgroup_post_fork() (cont’d)
     • The if block runs when cg_list is empty.
       ◦ At process creation the list is empty, because cgroup_fork() has just initialized it.
       ◦ The css_set structure of the current process is obtained.
       ◦ get_css_set() increments the reference count of cset.
       ◦ nr_tasks is incremented to raise the number of tasks managed by cset.
       ◦ css_set_move_task() adds the process being created to cset's linked list.

         cset = task_css_set(current);
         if (list_empty(&child->cg_list)) {
             get_css_set(cset);
             cset->nr_tasks++;
             css_set_move_task(child, NULL, cset, false);
         }
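  11. css_set_move_task()
     • A function used when a process is created, exits, or is moved.
     • Moves task from from_cset to to_cset.
       ◦ Removes task from from_cset and adds it to to_cset.
       ◦ If from_cset is NULL, task is only registered with to_cset.
         ▪ This is the process-creation case.
       ◦ If to_cset is NULL, task is only removed from from_cset.
         ▪ This is the process-exit case.
       ◦ use_mg_tasks is set to true when a process is being moved.
         ▪ Its value decides which list of to_cset the task is registered on.
           • mg_tasks if true, tasks if false

         static void css_set_move_task(struct task_struct *task,
                                       struct css_set *from_cset,
                                       struct css_set *to_cset,
                                       bool use_mg_tasks)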
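The three cases above (create, exit, move) can be tried out with a small userspace model. This is a sketch under assumptions rather than the kernel function: `struct cset`, `struct task`, `cset_init()`, and `move_task()` are hypothetical names, the list helpers are minimal re-implementations of a circular doubly linked list, and locking, RCU, and the populated-state bookkeeping are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list (userspace re-implementation). */
struct list_head { struct list_head *prev, *next; };

void list_init(struct list_head *h) { h->prev = h->next = h; }
bool list_is_empty(const struct list_head *h) { return h->next == h; }

void list_add_tail(struct list_head *n, struct list_head *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

void list_del_init(struct list_head *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    list_init(n);
}

/* Hypothetical stand-ins for css_set and task_struct. */
struct cset { struct list_head tasks, mg_tasks; };
struct task { struct cset *cgroups; struct list_head cg_list; };

void cset_init(struct cset *c)
{
    list_init(&c->tasks);
    list_init(&c->mg_tasks);
}

/* from == NULL: creation; to == NULL: exit; both set: migration. */
void move_task(struct task *task, struct cset *from, struct cset *to,
               bool use_mg_tasks)
{
    if (from) {
        assert(!list_is_empty(&task->cg_list)); /* must be linked somewhere */
        list_del_init(&task->cg_list);
    } else {
        assert(list_is_empty(&task->cg_list));  /* must not be linked yet */
    }
    if (to) {
        task->cgroups = to;
        list_add_tail(&task->cg_list,
                      use_mg_tasks ? &to->mg_tasks : &to->tasks);
    }
}
```

Calling `move_task(&t, NULL, &a, false)` links a new task onto `a.tasks`, `move_task(&t, &a, &b, true)` migrates it onto `b.mg_tasks`, and `move_task(&t, &b, NULL, false)` unlinks it again.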
  12. cgroup_exit()
     • With the variable declarations and comments removed, this is about all there is to it.
     • The work falls into two parts.
     • The tsk argument is the process being removed.

         void cgroup_exit(struct task_struct *tsk)
         {
             /* ... omitted ... */
             cset = task_css_set(tsk);
             if (!list_empty(&tsk->cg_list)) {
                 spin_lock_irq(&css_set_lock);
                 css_set_move_task(tsk, cset, NULL, false);
                 cset->nr_tasks--;
                 spin_unlock_irq(&css_set_lock);
             } else {
                 get_css_set(cset);
             }

             do_each_subsys_mask(ss, i, have_exit_callback) {
                 ss->exit(tsk);
             } while_each_subsys_mask();
         }
  13. cgroup_exit() (cont’d)
     • When the process's cg_list is not empty
       ◦ css_set_move_task() removes the process from the css_set structure.
       ◦ The task count of the css_set structure is decremented.
     • When the process's cg_list is empty
       ◦ The reference count of the css_set structure is incremented.
         ▪ This appears to be there to handle zombie processes.

         if (!list_empty(&tsk->cg_list)) {
             spin_lock_irq(&css_set_lock);
             css_set_move_task(tsk, cset, NULL, false);
             cset->nr_tasks--;
             spin_unlock_irq(&css_set_lock);
         } else {
             get_css_set(cset);
         }
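The counter bookkeeping across creation and exit can be followed with a small userspace model. This is a sketch under assumptions, not kernel code: every `model_*` function and both structs are hypothetical, a `linked` flag stands in for "cg_list is not empty", and `model_free()` assumes that one css_set reference is dropped later, when the task structure itself is freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: only the two counters discussed in the slides. */
struct cset { int refcount; int nr_tasks; };
struct task {
    struct cset *cgroups;
    bool linked; /* stands in for !list_empty(&cg_list) */
};

/* Models cgroup_fork(): point at the default set, not linked yet. */
void model_fork(struct task *child, struct cset *init_cset)
{
    child->cgroups = init_cset;
    child->linked = false;
}

/* Models part ① of cgroup_post_fork(); use_links mirrors use_task_css_set_links. */
void model_post_fork(const struct task *parent, struct task *child, bool use_links)
{
    if (use_links && !child->linked) {
        struct cset *cset = parent->cgroups;

        cset->refcount++;  /* get_css_set() */
        cset->nr_tasks++;
        child->cgroups = cset;
        child->linked = true;
    }
}

/* Models the two branches of cgroup_exit(). */
void model_exit(struct task *tsk)
{
    struct cset *cset = tsk->cgroups;

    assert(cset != NULL);
    if (tsk->linked) {
        tsk->linked = false; /* unlink from the css_set's list */
        cset->nr_tasks--;
    } else {
        cset->refcount++;    /* take a reference so a later put is balanced */
    }
}

/* Assumption: one reference is dropped when the task structure is freed. */
void model_free(struct task *tsk)
{
    tsk->cgroups->refcount--;
}
```

In this model both paths end balanced: a linked task takes its reference in `model_post_fork()`, an unlinked task takes it in `model_exit()`, and in either case `model_free()` returns the reference count to its starting value.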