Container Migration and CRIU Details

Join Adrian Reber, Red Hat, to get a detailed technical background on how Checkpoint/Restore In Userspace (CRIU) works and how CRIU enables container migration.

Red Hat Livestreaming

June 30, 2020

Transcript

  1. Container Live Migration
    Adrian Reber
    2020, June 30

  2. Red Hat Blog:
    Container migration with Podman on RHEL
    https://www.redhat.com/en/blog/container-migration-podman-rhel

  3. Agenda
    Use cases
    Details
    Demos
    Future

  4. Definition:
    Container Live Migration

  5. Transfer Running Container

  6. Serialize on Source System

  7. Transfer to Destination System

  8. Checkpoint/Restore in Userspace
    CRIU

  9. Multiple Integrations Exist

  10. Use Cases

  11. Reboot and Save State

  12. Host
    Container

  13. Host
    Container

  14. Host
    Container

  15. Host
    Container

  16. Quick Startup

  17. Host
    Container

  18. Host
    Container Container

  19. Host
    Container Container
    Container
    Container

  20. Container Live Migration

  21. Source
    Container
    Destination

  22. Source
    Container Container
    Destination

  23. Source
    Container Container
    Destination
    Container
    Container

  24. CRIU

  25. First Step: Checkpointing

  26. Seize Process Using
    ptrace()
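
A minimal sketch of the seize step, assuming a forked child stands in for the process to be checkpointed. The slide only names ptrace(); the function name `seize_and_release_child` is hypothetical, and the exact sequence of requests CRIU issues is not shown in the deck. PTRACE_SEIZE attaches without sending the tracee a signal, and PTRACE_INTERRUPT then stops it.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that only sleeps, seize it, stop it, then release it.
 * Returns 0 if every step succeeded, -1 otherwise. */
int seize_and_release_child(void)
{
    int ret = -1;
    int status = 0;

    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        for (;;)
            pause();              /* stand-in for the real workload */
    }
    assert(child > 0);            /* only the parent gets here */

    /* Attach without stopping the tracee and without sending SIGSTOP. */
    if (ptrace(PTRACE_SEIZE, child, NULL, NULL) == -1)
        goto out;

    /* Ask the seized tracee to stop. */
    if (ptrace(PTRACE_INTERRUPT, child, NULL, NULL) == -1)
        goto out;

    /* Wait until the tracee is actually stopped. */
    if (waitpid(child, &status, 0) != child || !WIFSTOPPED(status))
        goto out;

    /* A checkpointing tool would collect the process state here. */

    if (ptrace(PTRACE_DETACH, child, NULL, NULL) == -1)
        goto out;
    ret = 0;
out:
    /* The demo child is not needed afterwards. */
    kill(child, SIGKILL);
    waitpid(child, NULL, 0);
    return ret;
}
```

Tracing a direct child like this normally works without extra privileges; seizing an unrelated process depends on the system's ptrace restrictions.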

  27. Collect Details From
    /proc/PID/*
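
As an illustration of collecting details from /proc, the sketch below reads one field from /proc/PID/status. The slide does not list which files CRIU parses, so the choice of `status` and the helper names `proc_status_field` and `own_pid_from_proc` are assumptions made for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy the value of one "Key:<whitespace>value" line from
 * /proc/<pid>/status into out. Returns 0 on success, -1 if the file
 * cannot be opened or the key is not present. */
int proc_status_field(pid_t pid, const char *key, char *out, size_t outlen)
{
    char path[64];
    char line[256];
    size_t keylen = strlen(key);
    int ret = -1;

    assert(out != NULL && outlen > 0);
    snprintf(path, sizeof(path), "/proc/%d/status", (int)pid);

    FILE *f = fopen(path, "r");
    if (!f)
        return -1;

    while (fgets(line, sizeof(line), f)) {
        if (strncmp(line, key, keylen) != 0 || line[keylen] != ':')
            continue;
        const char *val = line + keylen + 1;
        while (*val == ' ' || *val == '\t')
            val++;
        snprintf(out, outlen, "%s", val);
        out[strcspn(out, "\n")] = '\0';   /* drop the trailing newline */
        ret = 0;
        break;
    }
    fclose(f);
    return ret;
}

/* Usage example: read this process's own PID back from /proc. */
long own_pid_from_proc(void)
{
    char buf[32];
    if (proc_status_field(getpid(), "Pid", buf, sizeof(buf)) != 0)
        return -1;
    return strtol(buf, NULL, 10);
}
```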

  28. Parasite Code

  29. Parasite Code
    Most favorite part

  30. Parasite Code
    And the craziest

  31. Parasite Code
    Injected into the process

  32. Parasite Code
    Daemon waiting for commands

  33. Parasite Code
    Removed after usage
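
The "daemon waiting for commands" idea can be sketched as a small command loop over a Unix socket. This is not CRIU's parasite: here fork() stands in for injecting code into the target, and the command set (`CMD_GETPID`, `CMD_FINI`) and all function names are hypothetical. The sketch only shows the pattern of a helper that runs inside the target process, answers requests, and leaves when told to.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum cmd { CMD_GETPID = 1, CMD_FINI = 2 };   /* hypothetical commands */

/* Runs inside the "target": wait for a command, answer it,
 * and leave the loop on CMD_FINI. */
static void daemon_loop(int sock)
{
    int cmd;
    while (read(sock, &cmd, sizeof(cmd)) == (ssize_t)sizeof(cmd)) {
        long reply = -1;
        if (cmd == CMD_GETPID)
            reply = (long)getpid();
        else if (cmd == CMD_FINI)
            reply = 0;
        if (write(sock, &reply, sizeof(reply)) != (ssize_t)sizeof(reply))
            break;
        if (cmd == CMD_FINI)
            break;
    }
}

/* Runs in the controlling process: send one command, read the reply. */
static long send_cmd(int sock, int cmd)
{
    long reply = -1;
    if (write(sock, &cmd, sizeof(cmd)) != (ssize_t)sizeof(cmd))
        return -1;
    if (read(sock, &reply, sizeof(reply)) != (ssize_t)sizeof(reply))
        return -1;
    return reply;
}

/* Returns 0 if the helper reported the PID of the process it runs in
 * and exited cleanly after CMD_FINI, -1 otherwise. */
int parasite_demo(void)
{
    int sv[2];
    int status = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    pid_t child = fork();
    if (child < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (child == 0) {
        close(sv[0]);
        daemon_loop(sv[1]);
        _exit(0);
    }
    assert(child > 0);
    close(sv[1]);

    long pid = send_cmd(sv[0], CMD_GETPID);
    long fini = send_cmd(sv[0], CMD_FINI);
    close(sv[0]);

    if (waitpid(child, &status, 0) != child)
        return -1;
    if (pid != (long)child || fini != 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```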

  34. Original Process Code To Be Checkpointed (diagram)

  35. Original Process Code To Be Checkpointed (diagram)

  36. Original Process Parasite Code To Be Checkpointed (diagram)

  37. Original Process Code To Be Checkpointed (diagram)

  38. Checkpointing Finished

  39. Checkpointing Finished
    All relevant information written

  40. Checkpointing Finished
    Target process is killed

  41. Checkpointing Finished
    Or continues to run

  42. Container Live Migration
    SELinux
    Linux Security Summit EU 2019
    https://sched.co/Tymj

  43. 2015: CRIU LSM support

  44. During checkpointing
    Read /proc/PID/attr/current

  45. During restore
    Write /proc/PID/attr/current
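
The read-on-checkpoint, write-on-restore pattern from the two slides above can be sketched as follows. The helper names are hypothetical and this is not CRIU's code. The helpers take a path so they can be tried on an ordinary file; on a real system the path would come from `attr_current_path`. Whether the kernel accepts a write to that file depends on the active LSM and its policy.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Build "/proc/<pid>/attr/current" in buf.
 * Returns 0, or -1 if the path does not fit. */
int attr_current_path(pid_t pid, char *buf, size_t len)
{
    int n = snprintf(buf, len, "/proc/%d/attr/current", (int)pid);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Checkpoint side: read the label stored at path into buf.
 * Returns the label length, or -1 on error. */
ssize_t read_label(const char *path, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, buf, len - 1);
    close(fd);
    if (n < 0)
        return -1;
    buf[n] = '\0';
    /* Strip a trailing newline or NUL byte, if present. */
    while (n > 0 && (buf[n - 1] == '\n' || buf[n - 1] == '\0'))
        buf[--n] = '\0';
    return n;
}

/* Restore side: write the saved label back.
 * Returns 0 on success, -1 on error. */
int write_label(const char *path, const char *label)
{
    size_t len = strlen(label);

    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, label, len);
    close(fd);
    return (n == (ssize_t)len) ? 0 : -1;
}
```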

  46. /* Only contexts starting with "unconfined_" were accepted */
    if (!strstartswith(last, "unconfined_")) {
        pr_err("Non unconfined selinux contexts not supported %s\n", last);
        freecon(ctx);
        return -1;
    }

  47. • setsockcreatecon(3) for parasite daemon
    • Write to /proc/PID/attr/current
    • Allow dyntransition
    • Do not set context of threads
    • Allow writing to /proc/sys/kernel/ns_last_pid
    • Fix socket labels
    • Pre-create CRIU log files with appropriate labels
    • Fix file descriptor leaks

  48. Second/Last Step: Restoring

  49. Read Checkpoint Images

  50. clone() For Each PID/TID
    LPC: CRIU and the PID dance
    clone3() with Linux 5.5
    https://linuxplumbersconf.org/event/4/contributions/472/

  51. PID dance
    open() /proc/sys/kernel/ns_last_pid
    write() (PID - 1) to ns_last_pid
    close() ns_last_pid
    clone()
    getpid()
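
The PID dance steps listed above can be sketched as follows. The function names are hypothetical, and fork() is swapped in for clone() to keep the example short. Writing ns_last_pid requires privileges (typically CAP_SYS_ADMIN), and the sequence is racy: another process created between the write and the fork takes the wanted PID, which is why the result is checked with getpid().

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define NS_LAST_PID "/proc/sys/kernel/ns_last_pid"

/* Format the value to write. The kernel hands out last_pid + 1 next,
 * so the wanted PID minus one is written. Returns 0 or -1. */
int format_last_pid(char *buf, size_t len, pid_t want)
{
    int n = snprintf(buf, len, "%d", (int)want - 1);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Create a child that should get PID want. Returns like fork();
 * -1 also covers a failure to open or write ns_last_pid.
 * A child that ended up with a different PID exits with status 1. */
pid_t fork_with_pid(pid_t want)
{
    char buf[32];

    assert(want > 1);
    if (format_last_pid(buf, sizeof(buf), want) != 0)
        return -1;

    int fd = open(NS_LAST_PID, O_WRONLY);              /* open()  */
    if (fd < 0)
        return -1;
    ssize_t len = (ssize_t)strlen(buf);
    if (write(fd, buf, (size_t)len) != len) {          /* write() */
        close(fd);
        return -1;
    }
    close(fd);                                         /* close() */

    pid_t pid = fork();                                /* clone() on the slide */
    if (pid == 0 && getpid() != want)                  /* getpid() */
        _exit(1);                                      /* lost the race */
    return pid;
}
```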

  52. Avoiding the PID dance (2010):
    eclone()
    https://lore.kernel.org/patchwork/patch/198220/

  53. Avoiding the PID dance (2019):
    clone3()

  54. “In general, clone3() is extensible and allows for the implementation of new features.”
    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7f192e3cd316ba58c

  55. clone3() with set_tid

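  56. struct clone_args args = {0};
    pid_t set_tid[1] = { 2020 };                   /* PID requested for the new process */
    args.set_tid = (uint64_t)(uintptr_t)set_tid;   /* the kernel expects the pointer as a 64-bit integer */
    args.set_tid_size = 1;                         /* one entry: a single PID namespace level */
    syscall(__NR_clone3, &args, sizeof(struct clone_args));   /* clone3() takes a pointer to the arguments */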
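
A fuller, compilable sketch of the same call, under a few assumptions: `clone3_with_pid` and `struct clone3_args` are names made up for this example, and the struct is a local copy of the kernel's `struct clone_args` layout as of Linux 5.5 so that the code also builds against older headers. The call needs privileges over the PID namespace (CAP_SYS_ADMIN on Linux 5.5) and fails otherwise.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef __NR_clone3
#define __NR_clone3 435   /* value on x86-64 and most other architectures */
#endif

/* Local copy of the kernel's struct clone_args up to and including
 * the set_tid fields added in Linux 5.5 (80 bytes). */
struct clone3_args {
    uint64_t flags;
    uint64_t pidfd;
    uint64_t child_tid;
    uint64_t parent_tid;
    uint64_t exit_signal;
    uint64_t stack;
    uint64_t stack_size;
    uint64_t tls;
    uint64_t set_tid;
    uint64_t set_tid_size;
};

/* Create a child process with the requested PID.
 * Returns like fork(): the child's PID in the parent, 0 in the child,
 * or -1 with errno set (for example EPERM without privileges, or
 * EEXIST if the PID is already in use). */
pid_t clone3_with_pid(pid_t want)
{
    pid_t set_tid[1] = { want };
    struct clone3_args args = {0};

    assert(want > 0);
    args.exit_signal = SIGCHLD;                     /* behave like fork() for wait() */
    args.set_tid = (uint64_t)(uintptr_t)set_tid;    /* pointer passed as an integer */
    args.set_tid_size = 1;                          /* one PID namespace level */

    return (pid_t)syscall(__NR_clone3, &args, sizeof(args));
}
```

Compared with the PID dance, the wanted PID is passed in the same call that creates the process, so there is no window in which another process can take it.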

  57. CRIU Morphs Itself
    Open and position file descriptors

  58. CRIU Morphs Itself
    Map memory pages

  59. CRIU Morphs Itself
    Load security settings

  60. CRIU Morphs Itself
    Jump into restored process
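
Two of the restore steps named above, reopening a file descriptor at its old position and mapping memory back at its old address, can be sketched with plain system calls. The function names `restore_fd` and `restore_pages` are hypothetical and the code is a simplified illustration, not CRIU's restorer.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Reopen path, seek to the saved offset, and move the descriptor to the
 * number it had before the checkpoint. Returns want_fd, or -1 on error. */
int restore_fd(const char *path, int flags, off_t pos, int want_fd)
{
    int fd = open(path, flags);
    if (fd < 0)
        return -1;
    if (lseek(fd, pos, SEEK_SET) == (off_t)-1) {
        close(fd);
        return -1;
    }
    if (fd == want_fd)
        return fd;
    /* dup2() shares the file offset with the original descriptor. */
    if (dup2(fd, want_fd) < 0) {
        close(fd);
        return -1;
    }
    close(fd);
    return want_fd;
}

/* Map anonymous memory at exactly addr and fill it with the saved
 * contents. MAP_FIXED replaces whatever was mapped there before.
 * Returns the mapping, or NULL on error. */
void *restore_pages(void *addr, const void *data, size_t len)
{
    assert(data != NULL && len > 0);

    void *p = mmap(addr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    memcpy(p, data, len);
    return p;
}
```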

  61. Container Live Migration

  62. Container Live Migration
    OpenVZ

  63. Container Live Migration
    Borg

  64. Container Live Migration
    LXC/LXD

  65. Container Live Migration
    Docker

  66. Container Live Migration
    Podman

  67. Podman: daemonless

  68. Podman: rootless

  69. Podman: Checkpoint/Restore
    October 2018

  70. Podman: Checkpoint/Restore
    Required runc and CRIU changes

  71. Podman: Container Live Migration
    June 2019

  72. Podman: Container Live Migration
    Required runc, CRIU, and SELinux changes

  73. Checkpoint Includes File System Changes

  74. # podman run --rm -d adrianreber/wildfly-hello
    699f33eb7fecbc5bbb00400be0aa79c888dbc63a54cac7bd2eed836a57d8a68a
    # podman inspect -l --format "{{.NetworkSettings.IPAddress}}"
    10.88.0.247
    # curl 10.88.0.247:8080/helloworld/
    0
    # curl 10.88.0.247:8080/helloworld/
    1
    # podman container checkpoint -l --export=/tmp/chkpt.tar.gz
    699f33eb7fecbc5bbb00400be0aa79c888dbc63a54cac7bd2eed836a57d8a68a
    # scp /tmp/chkpt.tar.gz rhel08:/tmp

  75. # podman container restore --import=/tmp/chkpt.tar.gz
    699f33eb7fecbc5bbb00400be0aa79c888dbc63a54cac7bd2eed836a57d8a68a
    # podman inspect -l --format "{{.NetworkSettings.IPAddress}}"
    10.88.0.247
    # curl 10.88.0.247:8080/helloworld/
    2
    # curl 10.88.0.247:8080/helloworld/
    3

  76. # podman container restore --import=/tmp/chkpt.tar.gz -n hello1
    d02feeec894d77f66cc82484fe77ae369396a85f6d05594dc156c21e685942dd
    # podman container restore --import=/tmp/chkpt.tar.gz -n hello2
    735efb4fee6961d3eee069beb28dde5cbc6fc46c1a32a43ecc993d04c02015b2
    # podman inspect --format "{{.NetworkSettings.IPAddress}}" hello1
    10.88.0.248
    # podman inspect --format "{{.NetworkSettings.IPAddress}}" hello2
    10.88.0.249
    # curl 10.88.0.248:8080/helloworld/
    2
    # curl 10.88.0.249:8080/helloworld/
    2

  77. Future:
    kubectl migrate

  78. Future:
    Non-root checkpoint/restore

  79. Summary
    • CRIU can checkpoint and restore containers
    • Integrated in different container engines
    • Used in production
    • Reboot into new kernel without losing container state
    • Start multiple copies
    • Migrate running containers

  80. https://lisas.de/~adrian/container-live-migration-article.pdf
    https://asciinema.org/a/249922
    https://asciinema.org/a/249918
    https://lisas.de/~adrian/posts/2019-Apr-10-criu-and-selinux.html
    https://criu.org/Podman
    https://twitter.com/adrian__reber
    https://www.redhat.com/en/blog/container-migration-podman-rhel
    https://cfp.all-systems-go.io/ASG2019/talk/E88Z7V/
    https://sched.co/Tymj
    https://linuxplumbersconf.org/event/4/contributions/472/

  81. Thank you
