ConFoo 2020: systemd for developers and devops

https://confoo.ca/en/yul2020/session/systemd-for-developers-and-devops

systemd is the default init system on several Linux distros and can do more than just start services on boot. It maintains dependencies between services, can restart failing services or activate them on demand, handle temp files, and much more. Services can also interact directly with its API. I'll show you how to use systemd to run and monitor your complex application and how to tighten security with features like a private /tmp or resource restrictions.

Christian Heimes

February 26, 2020

Transcript

  1. systemd, ConFoo 2020, @ChristianHeimes, CC BY-SA 4.0 Is systemd evil?

    Does using systemd make me a bad developer? Is Lennart Poettering the Antichrist?
  2. Who am I?

    • from Hamburg/Germany
    • Python core developer / Python security team
    • Principal Software Engineer at Red Hat Identity Management and Platform Security
  3. Takeaways

    • Why systemd?
    • How to
      • create and customize service configuration
      • manage services
      • improve security & server stability
      • run applications
    • best practices
    https://speakerdeck.com/tiran/
  4. Agenda

    • POSIX 101 / old SysV init
    • what is systemd?
    • systemd services and units
    • resource management
    • security & sandboxing
    • tips & tricks
    • ~ 5 minutes Q&A
  5. Is Lennart Poettering the Antichrist?

    <kerio> how expensive is a hitman?
    <kerio> we can pool some bitcoins
    <Pali> cut his hands, so he will not be able to write any new line of code
    <kerio> Pali: there's speech-to-text now
    <kerio> i believe that any solution has to be more... radical
    <DocScrutinizer05> i'm afraid redhat would pull a clone out of their fridge if anything ever happened to poettering
  6. “The Tragedy of systemd”

    Benno Rice
    Linux.conf.au 2019
    BSDCan 2018 Keynote
  7. Kernel & user space

    • permissions
    • capabilities (CAP_SYS_ADMIN)
    • seccomp filter
    [Diagram labels: hardware, kernel, userspace, syscall]
    man: syscalls(2)
    man: capabilities(7)
  8. process

    • fork(), clone()
    • exec()
    • exit(), signal (SIGTERM)
    • parent must call waitpid()
    • zombies…
    • PID 1 becomes parent of orphans
    man: execve(2)
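
The fork()/waitpid() relationship on this slide can be sketched in a few lines of Python. This is a minimal illustration, not part of the talk; the function name `run_child_and_reap` is made up for the example.

```python
import os


def run_child_and_reap(exit_code: int) -> int:
    """Fork a child that exits with exit_code, then reap it in the parent."""
    pid = os.fork()
    if pid == 0:
        # Child: terminate immediately, skipping Python-level cleanup handlers.
        os._exit(exit_code)
    # Parent: waitpid() collects the exit status. Without this call the
    # terminated child would linger as a zombie until the parent exits.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    # The child was killed by a signal; report the negative signal number.
    return -os.WTERMSIG(status)
```

If the parent exited without calling waitpid(), the child would be re-parented to PID 1, which is then responsible for reaping it.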
  9. file descriptor

    • handle to a kernel resource
      • file, socket, pipe, epoll, inotify, …
    • inherited from parent process
    • shared over AF_UNIX socket
    • usually one permission check on access
    man: open(2)
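
Sharing a descriptor over an AF_UNIX socket can be tried out inside a single process. The sketch below assumes Python 3.9 or newer (for `socket.send_fds` and `socket.recv_fds`); the function name `pass_fd_over_unix_socket` is hypothetical.

```python
import os
import socket


def pass_fd_over_unix_socket(message: bytes) -> bytes:
    """Send a pipe's write end across a socketpair and write via the copy."""
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_end, write_end = os.pipe()
    try:
        # The descriptor travels as SCM_RIGHTS ancillary data next to one
        # byte of ordinary payload.
        socket.send_fds(sender, [b"x"], [write_end])
        _, fds, _, _ = socket.recv_fds(receiver, 1, 1)
        received_fd = fds[0]
        # The received descriptor is a new handle to the same pipe, so the
        # original write end can be closed.
        os.close(write_end)
        os.write(received_fd, message)
        os.close(received_fd)
        return os.read(read_end, len(message))
    finally:
        os.close(read_end)
        sender.close()
        receiver.close()
```

In a real setup the two socket ends would live in different processes; the kernel installs a fresh descriptor in the receiver that refers to the same open file.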
  10. cgroups & namespaces

    • control groups
      • resource limiting (memory, tasks, ...)
      • prioritization (CPU, IO, ...)
      • resource accounting (CPU, memory, network)
    • namespace isolation
      • mount, pid, network, user, UTS, IPC, cgroups
    • hierarchical
    man: cgroups(7)
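
A process can look up its own control groups in /proc/self/cgroup, whose lines have the form `hierarchy-ID:controller-list:cgroup-path` according to cgroups(7). The parser below is a small sketch; `parse_proc_cgroup` is a name chosen for this example.

```python
def parse_proc_cgroup(text: str):
    """Parse /proc/<pid>/cgroup content into (id, controllers, path) tuples."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split only twice: the cgroup path itself may contain colons.
        hierarchy_id, controllers, path = line.split(":", 2)
        names = controllers.split(",") if controllers else []
        entries.append((int(hierarchy_id), names, path))
    return entries
```

On a cgroup v2 system the file holds a single line with hierarchy ID 0 and an empty controller list, for example `0::/system.slice/httpd.service`. Typical usage would be `parse_proc_cgroup(open("/proc/self/cgroup").read())`.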
  11. Old init

    • complicated shell scripts (sh, not bash!)
    • serialized start
    • inflexible ordering, no dependencies
    • limited respawn with inittab
    • no service management
    • singleton processes are hard (stale lock files)
  12. Unix daemons

    0) [ident] fork(), exec()
    1) [p] close fds
    2) [p] reset signal handlers
    3) [p] sigprocmask()
    4) [p] sanitize env vars
    5) [p] create pipe
    6) [p] fork() to create background task
    7) [c] setsid()
    8) [c] fork() to detach terminal
    9) [c] exit() to re-parent [d] to PID 1
    10) [d] stdin, stdout, stderr → /dev/null
    11) [d] reset umask
    12) [d] chdir("/")
    13) [d] write PID file
    14) [d] drop privileges
    15) [d] notify parent [p] via pipe
    16) [d] close pipe
    17) [p] exit()
    18) [ident] … continue
    man: daemon(7)
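
The core of the sequence above (steps 5 to 12 and 15 to 16) can be sketched in Python. This is a simplified illustration under stated assumptions: it skips signal handling, the PID file, and privilege dropping, the daemon exits right after reporting, and the name `daemonize_and_report` is hypothetical.

```python
import os


def daemonize_and_report():
    """Double-fork a short-lived daemon; return its (pid, session id)."""
    read_fd, write_fd = os.pipe()          # 5) [p] create pipe
    pid = os.fork()                        # 6) [p] fork background task
    if pid > 0:
        os.close(write_fd)
        os.waitpid(pid, 0)                 # reap the intermediate child [c]
        with os.fdopen(read_fd) as pipe:   # 15) wait for the daemon's report
            daemon_pid, daemon_sid = map(int, pipe.read().split())
        return daemon_pid, daemon_sid
    try:
        os.close(read_fd)
        os.setsid()                        # 7) [c] new session
        if os.fork() > 0:                  # 8) [c] fork to detach terminal
            os._exit(0)                    # 9) [c] exit, [d] is re-parented
        devnull = os.open(os.devnull, os.O_RDWR)
        for fd in (0, 1, 2):               # 10) [d] stdio to /dev/null
            os.dup2(devnull, fd)
        os.umask(0o022)                    # 11) [d] reset umask
        os.chdir("/")                      # 12) [d] chdir("/")
        report = f"{os.getpid()} {os.getsid(0)}".encode()
        os.write(write_fd, report)         # 15) [d] notify parent via pipe
        os.close(write_fd)                 # 16) [d] close pipe
    finally:
        os._exit(0)                        # never return into the caller
```

With systemd none of this is needed: a service with Type=simple or Type=notify stays in the foreground and the service manager takes care of the environment.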
  13. Security problem

    By default a service running as root is able to
    • overwrite system files, /usr, Kernel, …
    • read your email, bitcoins, SSH private keys
    • change time, hostname, firewall settings
    • connect to any host or local service
    • consume all memory or CPU resources
    • write /dev/sda
    (SELinux confined services or AppArmor prevent some problems)
  14. "Rethinking PID 1"

    • Start less
    • Start more in parallel
    • React to hardware and software events
    • Keep track of processes
    • Control and secure process environments (sandboxing)
    • API
    • (declarative) configuration
    https://0pointer.de/blog/projects/systemd.html
  15. What is systemd?

    • an init system for Linux
    • a system and service manager
    • a collection of services and tools
      • > 50 binaries
    • an active and mature Open Source project
      • 10 years old
      • > 1,000 contributors
    • Arch, Debian, Gentoo, Red Hat, SuSE, Ubuntu, ...
  16. units

    • service
    • socket
    • target
    • path
    • timer
    • slice
    • scope
    • device
    • mount
    • automount
    • swap
    man: systemd.unit(5)
  17. components

    • systemd: PID 1, user instances
    • dbus: RPC
    • systemd-journald: logging
    • systemd-udevd: hardware events, firmware
    • systemd-logind: login manager
    • systemd-localed: locale and key mappings
    • systemd-machined: VM and container registration manager
    • systemd-timedated: time, timezone
    • …
  18. systemd-logind

    • policy for shutdown/sleep
    • shutdown/sleep inhibition
    • multi-seat support
    • user services
      • per-user instances: dbus, pulseaudio, systemd
      • runtime directories: /run/user/1000
    • user sessions
      • Gnome, KDE, terminal, …
    • lingering user services

    $ loginctl
    SESSION  UID USER   SEAT  TTY
          3 1000 heimes seat0
          4 1000 heimes seat0 tty2

    man: systemd-logind(8)
  19. simple service

    # simple.service
    [Unit]
    Description=Simple service

    [Service]
    ExecStart=sleep 20

    [Install]
    WantedBy=multi-user.target

    # cp simple.service /etc/systemd/system/
    # systemctl daemon-reload

    # systemctl enable simple.service
    Created symlink /etc/systemd/system/multi-user.target.wants/simple.service → /etc/systemd/system/simple.service.
    # systemctl start simple.service

    man: systemd.service(5)
  20. simple service (2)

    # systemctl status simple.service
    • simple.service - Simple service
       Loaded: loaded (/etc/systemd/system/simple.service; enabled; vendor preset: disabled)
       Active: active (running) since Wed 2020-01-08 11:11:27 CET; 13s ago
     Main PID: 866159 (sleep)
        Tasks: 1 (limit: 19012)
       Memory: 248.0K
       CGroup: /system.slice/simple.service
               └─866159 /usr/bin/sleep 20

    # systemctl status simple.service
    • simple.service - Simple service
       Loaded: loaded (/etc/systemd/system/simple.service; enabled; vendor preset: disabled)
       Active: inactive (dead) since Wed 2020-01-08 11:11:47 CET; 6s ago
      Process: 866159 ExecStart=/usr/bin/sleep 20 (code=exited, status=0/SUCCESS)
     Main PID: 866159 (code=exited, status=0/SUCCESS)

    # journalctl -u simple.service
    -- Logs begin at Wed 2019-08-28 09:54:44 CEST, end at Wed 2020-01-08 11:15:11 CET. --
    Jan 08 11:11:27 seneca systemd[1]: Started Simple service.
    Jan 08 11:11:47 seneca systemd[1]: simple.service: Succeeded.
  21. systemctl

    reload configuration
      # systemctl daemon-reload

    unit start/stop
      # systemctl start simple.service
      # systemctl stop simple.service
      # systemctl restart simple.service
      # systemctl reload simple.service
      # systemctl kill simple.service

    block unit (no implicit start)
      # systemctl mask simple.service
      # systemctl unmask simple.service

    unit explicit auto-start
      # systemctl enable simple.service
      # systemctl disable simple.service

    man: systemctl(1)
  22. systemctl

    unit status
      # systemctl status simple.service
      # systemctl show simple.service
      # systemctl is-active simple.service
      # systemctl is-enabled simple.service

      # systemctl enable --now simple.service

    man: systemctl(1)
  23. unit search paths

    /etc/systemd/system.control/
    /run/systemd/system.control/
    /run/systemd/transient/
    /run/systemd/generator.early/
    /etc/systemd/system/
    /etc/systemd/system.attached/
    /run/systemd/system/
    /run/systemd/system.attached/
    /run/systemd/generator/
    ...
    /usr/lib/systemd/system/
    /run/systemd/generator.late/

    • admin customization: /etc/systemd/system/
    • non-persistent customization: /run/systemd/system/
    • package maintainer: /usr/lib/systemd/system/
    man: systemd.unit(5)
  24. customize service

    # mkdir /etc/systemd/system/simple.service.d
    # edit /etc/systemd/system/simple.service.d/custom.conf

    [Unit]
    Wants=network-online.target remote-fs.target
    After=network-online.target remote-fs.target

    [Service]
    Restart=always

    # systemctl daemon-reload
    # systemctl restart simple.service

    # systemctl edit simple.service
  25. customize service (2)

    # systemctl status simple.service
    • simple.service - Simple service
       Loaded: loaded (/etc/systemd/system/simple.service; enabled; vendor preset: disabled)
      Drop-In: /etc/systemd/system/simple.service.d
               └─custom.conf

    # systemctl show -p Restart simple.service
    Restart=always
  26. customize service (3)

    # systemctl cat simple.service
    # /etc/systemd/system/simple.service
    [Unit]
    Description=Simple service

    [Service]
    ExecStart=sleep 20

    [Install]
    WantedBy=multi-user.target

    # /etc/systemd/system/simple.service.d/custom.conf
    [Unit]
    Wants=network-online.target remote-fs.target
    After=network-online.target remote-fs.target

    [Service]
    Restart=always
  27. Target unit (.target)

    • grouping of units
    • synchronization points
    • replacement for SysV runlevels (rc3 → multi-user.target)
    • multiple targets can be active

    [Unit]
    Description=Emergency Mode
    Documentation=man:systemd.special(7)
    Requires=emergency.service
    After=emergency.service
    AllowIsolate=yes

    man: systemd.target(5)
  28. Default units

    # systemctl list-units --type=target
    UNIT                   LOAD   ACTIVE SUB    DESCRIPTION
    basic.target           loaded active active Basic System
    bluetooth.target       loaded active active Bluetooth
    cryptsetup.target      loaded active active Local Encrypted Volumes
    getty.target           loaded active active Login Prompts
    graphical.target       loaded active active Graphical Interface
    local-fs-pre.target    loaded active active Local File Systems (Pre)
    local-fs.target        loaded active active Local File Systems
    multi-user.target      loaded active active Multi-User System
    network-online.target  loaded active active Network is Online
    network-pre.target     loaded active active Network (Pre)
    network.target         loaded active active Network
    nss-user-lookup.target loaded active active User and Group Name Lookups
    paths.target           loaded active active Paths
    remote-fs.target       loaded active active Remote File Systems
    slices.target          loaded active active Slices
    sockets.target         loaded active active Sockets
    sound.target           loaded active active Sound Card
    sshd-keygen.target     loaded active active sshd-keygen.target
    swap.target            loaded active active Swap
    sysinit.target         loaded active active System Initialization
    timers.target          loaded active active Timers

    man: bootup(7)
  29. SysV daemon

    [Service]
    Type=forking
    ExecStart=/usr/bin/myservice
    PIDFile=/run/myservice.pid
    ExecReload=/bin/kill -HUP $MAINPID
    Environment=MYSERVICE_LOGGING=verbose
    EnvironmentFile=/etc/sysconfig/myservice
    EnvironmentFile=-/etc/sysconfig/myservice-override

    man: systemd.exec(5)
  30. Oneshot service

    short-duration process

    [Service]
    Type=oneshot
    ExecStart=/usr/bin/custom-firewall.sh start
    ExecStop=/usr/bin/custom-firewall.sh stop
    ExecReload=/usr/bin/custom-firewall.sh reload
    RemainAfterExit=yes

    [Unit]
    Description=One-time temporary TLS key generation for httpd.service
    ConditionPathExists=|!/etc/pki/tls/certs/localhost.crt
    ConditionPathExists=|!/etc/pki/tls/private/localhost.key

    [Service]
    Type=oneshot
    ExecStart=/usr/libexec/httpd-ssl-gencerts
  31. Status notification

    [Service]
    Type=notify
    ExecStart=/usr/sbin/httpd

    /* server is ready to handle requests */
    sd_notify(0, "READY=1");

    from systemd.daemon import notify
    notify("READY=1")

    # systemctl status httpd | grep Status
    Status: "Running, listening on: port 80, port 443"
    # systemctl status httpd | grep Status
    Status: "Total requests: 10000; Idle/Busy workers 100/0;Requests/sec: 0.141; Bytes served/sec: 831 B/sec"

    man: sd_notify(3)
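
The notification protocol is simple enough to speak without the systemd Python bindings: the service manager passes a datagram socket address in the NOTIFY_SOCKET environment variable, and the service sends text such as READY=1 to it. The following is a minimal stdlib-only sketch, assuming an AF_UNIX address; the `environ` parameter is an addition of this example so the function can be exercised outside of systemd.

```python
import os
import socket


def sd_notify(state: str, environ=os.environ) -> bool:
    """Send a state string to the socket named in NOTIFY_SOCKET.

    Returns False when the variable is unset, i.e. when the process was
    not started by a service manager that expects notifications.
    """
    path = environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    address = path.encode()
    if address.startswith(b"@"):
        # A leading "@" denotes a Linux abstract socket (leading NUL byte).
        address = b"\0" + address[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode(), address)
    return True
```

A service with Type=notify would call `sd_notify("READY=1")` once it can handle requests and may send `STATUS=...` lines later to update the text shown by systemctl status.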
  32. Service Type

    process monitoring & readiness

    • Type=simple
      • foreground service
    • Type=forking
      • Unix daemon mode (PIDFile=/path/to/file)
    • Type=oneshot
      • short-running script
    • Type=notify
      • sd_notify()
    • Type=dbus
      • BusName=org.freedesktop.example
  33. Service Exec

    Execute command

    • ExecStart=/path/to/binary
      • exactly one
      • oneshot: 0...n
    • ExecStartPre, ExecStartPost
    • ExecStopPost
    • ExecCondition
    • ExecReload
    • ExecStop
  34. Service Restart / Watchdog

    Restart crashed processes

    • Restart
      • no, always, on-success, on-failure, on-abnormal, on-abort, on-watchdog
    • RestartSec
    • WatchdogSec

    import os
    from threading import Thread
    from time import sleep

    from systemd.daemon import notify

    def watchdog_task():
        while True:
            if check_app():  # application-specific health check
                notify("WATCHDOG=1")  # ok
            else:
                notify("WATCHDOG=trigger")  # trigger failure
            wait = int(os.environ["WATCHDOG_USEC"]) / 1_000_000
            sleep(wait / 2)

    Thread(target=watchdog_task).start()

    man: sd_watchdog_enabled(3)
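
The interval calculation from the watchdog loop can be factored into a helper that also covers the case where no watchdog is configured. This is a sketch modelled on the behaviour described in sd_watchdog_enabled(3); the function name and the `environ` and `pid` parameters are assumptions made for testability.

```python
import os


def watchdog_interval(environ=os.environ, pid=None):
    """Return the recommended keep-alive interval in seconds, or None.

    systemd sets WATCHDOG_USEC to the timeout in microseconds. If
    WATCHDOG_PID is also set, the timeout only applies to that process.
    The keep-alive should be sent at half the timeout.
    """
    usec = environ.get("WATCHDOG_USEC")
    if not usec:
        return None
    expected_pid = environ.get("WATCHDOG_PID")
    current_pid = os.getpid() if pid is None else pid
    if expected_pid and int(expected_pid) != current_pid:
        return None
    return int(usec) / 1_000_000 / 2
```

With WatchdogSec=30 in the unit file, the helper returns 15.0, so the service would send WATCHDOG=1 every 15 seconds.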
  35. timer unit (.timer)

    cron & at

    # borgmatic.timer
    [Unit]
    Description=Run borgmatic backup

    [Timer]
    OnCalendar=daily
    Persistent=true

    [Install]
    WantedBy=timers.target

    # borgmatic.service
    [Unit]
    Description=borgmatic backup
    Wants=network-online.target
    After=network-online.target
    ConditionACPower=true

    [Service]
    Type=oneshot
    ExecStart=/usr/bin/borgmatic

    # systemctl enable --now borgmatic.timer

    man: systemd.timer(5)
  36. Timer options

    • Monotonic timers
      • OnActiveSec, OnBootSec, OnStartupSec
      • OnUnitActiveSec, OnUnitInactiveSec
    • Wallclock timers
      • OnCalendar=Mon..Fri, 08:00
    • Randomization
      • AccuracySec=1h
      • RandomizedDelaySec
    • … more
      • OnClockChange, OnTimezoneChange
      • WakeSystem

    $ systemd-analyze calendar "Mon..Fri 08:00"
      Original form: Mon..Fri 08:00
    Normalized form: Mon..Fri *-*-* 08:00:00
        Next elapse: Mon 2020-01-17 08:00:00 CET
           (in UTC): Mon 2020-01-17 07:00:00 UTC
           From now: 2 days left

    man: systemd.timer(5)
    man: systemd.time(7)
  37. Path activation unit (.path)

    • PathExists
    • PathExistsGlob
    • PathChanged
    • PathModified
    • DirectoryNotEmpty

    # myservice-import.path
    [Unit]
    Description=My Service data importer

    [Path]
    DirectoryNotEmpty=/var/lib/myservice/import

    [Install]
    WantedBy=multi-user.target

    man: systemd.path(5)
  38. Socket activation unit (.socket)

    # myserver.socket
    [Unit]
    Description=My Server socket

    [Socket]
    ListenStream=9000
    BindIPv6Only=both

    [Install]
    WantedBy=sockets.target

    # myserver.service
    [Unit]
    Description=My Server application
    Wants=network-online.target
    After=network-online.target

    [Service]
    ExecStart=/usr/bin/myserver

    # systemctl enable --now myserver.socket

    man: systemd.socket(5)
  39. socket activation

    from socket import socket
    from systemd.daemon import listen_fds

    fds = listen_fds()
    srv_sock = socket(fileno=fds[0])
    while True:
        client_conn, addr = srv_sock.accept()
        ...

    $ systemd-socket-activate -l 9000 \
        python3 -c \
        "from systemd.daemon import listen_fds; from socket import socket; print(socket(fileno=listen_fds()[0]))"
    ...
    <socket.socket fd=3, family=AddressFamily.AF_INET6, type=SocketKind.SOCK_STREAM, proto=6, laddr=('::', 9000, 0, 0)>

    man: sd_listen_fds(3)
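
For readers without the systemd Python bindings, the lookup that `listen_fds()` performs can be approximated with the standard library, following the protocol in sd_listen_fds(3): LISTEN_PID must match the current process, LISTEN_FDS holds the number of passed sockets, and the first one is file descriptor 3. The `environ` and `pid` parameters are additions of this sketch.

```python
import os

SD_LISTEN_FDS_START = 3  # first passed descriptor, after stdin/stdout/stderr


def listen_fds(environ=os.environ, pid=None):
    """Return the file descriptors passed in by socket activation."""
    if pid is None:
        pid = os.getpid()
    # The variables are only meant for the process systemd started; a child
    # that merely inherited them must ignore them.
    if environ.get("LISTEN_PID") != str(pid):
        return []
    count = int(environ.get("LISTEN_FDS", "0"))
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))
```

The returned numbers can be wrapped exactly as on the slide, for example `socket(fileno=listen_fds()[0])`.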
  40. Socket activation features

    • Socket families and types
      • TCP, UDP, AF_UNIX, FIFO, netlink, special, …
    • sockets survive:
      • crash
      • service restart
      • package upgrade
    • port < 1024 for unprivileged processes
  41. "root" services are a security problem

    By default a service running as root is able to
    • overwrite system files, /usr, Kernel, …
    • read your email, bitcoins, SSH private keys
    • change time, hostname, firewall settings
    • connect to any host or local service
    • consume all memory or CPU resources
    • write /dev/sda
    (SELinux confined services or AppArmor prevent some problems)
  42. User, groups, directories

    seteuid / user namespace

    • User=myuser (effective uid)
    • Group (effective gid)
    • DynamicUser
      • dynamically allocated, unique UID/GID
    man: systemd.exec(5)
  43. File system protection

    mount namespace

    • PrivateTmp=yes
      • Private namespace /tmp and /var/tmp
      • auto-cleanup
      • JoinsNamespaceOf=parent.service
    • ProtectSystem={yes, full, strict}
      • yes: /usr and /boot read-only, full: also /etc, strict: everything except /dev, /proc and /sys
    • ProtectHome={yes, read-only}
      • /home, /root, /run/user appear empty
    • PrivateDevices=yes
      • Minimal /dev (zero, null, urandom, …), drop CAP_MKNOD
    man: systemd.exec(5)
  44. Directories

    mount namespace

    • ReadWriteDirectories, ReadOnlyDirectories, InaccessibleDirectories
    • RuntimeDirectory, StateDirectory, CacheDirectory, LogsDirectory
      • automatically created and chowned
      • StateDirectory=myservice → /var/lib/myservice
      • STATE_DIRECTORY env var
    • systemd-tmpfiles service
    man: systemd.exec(5)
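
A service can pick up the directories that systemd created for it from the environment instead of hard-coding paths. The helper below is a sketch with a hypothetical name; it relies on the behaviour documented in systemd.exec(5), where several configured directories are joined with colons.

```python
import os
from pathlib import Path


def service_directories(variable: str, environ=os.environ):
    """Return the directories listed in e.g. STATE_DIRECTORY as Path objects."""
    raw = environ.get(variable, "")
    # Several directories are separated by ":"; unset or empty yields [].
    return [Path(entry) for entry in raw.split(":") if entry]
```

With StateDirectory=myservice in the unit file, `service_directories("STATE_DIRECTORY")` would yield a single entry, /var/lib/myservice. The same helper works for RUNTIME_DIRECTORY, CACHE_DIRECTORY and LOGS_DIRECTORY.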
  45. Many more security flags

    network ns, UTS ns, seccomp, ...

    • PrivateNetwork
      • only private loopback network device
      • JoinsNamespaceOf=parent.service
    • ProtectHostname
    • SystemCallFilter=@raw-io @module
    • CapabilityBoundingSet=~CAP_SYS_ADMIN
    • NoNewPrivileges
      • suid / sgid binaries
    • MemoryDenyWriteExecute=yes
    • ...
    man: systemd.exec(5)
  46. Resource limits, scheduling

    ulimit, rlimit, priority, ...

    • ulimit
      • ulimit -n 1024 → LimitNOFILE=1024
    • nice
    • OOMScoreAdjust
      • Out-Of-Memory killer score
    • CPU scheduler and affinity
      • CPUAffinity=0-3
    • CPU NUMA policy
    • IO scheduler
      • IOSchedulingClass=idle
    man: systemd.exec(5)
  47. Security analyzer

    # systemd-analyze security systemd-journald.service
      NAME                                                 DESCRIPTION                                                              EXPOSURE
    ✗ PrivateNetwork=                                      Service has access to the host's network                                 0.6
    ✗ User=/DynamicUser=                                   Service runs as root user                                                0.5
    ✗ CapabilityBoundingSet=~CAP_SET(UID|GID|PCAP)         Service may change UID/GID identities/capabilities                       0.4
    ✗ CapabilityBoundingSet=~CAP_SYS_ADMIN                 Service has administrator privileges                                     0.4
    ✗ CapabilityBoundingSet=~CAP_SYS_PTRACE                Service has ptrace() debugging abilities                                 0.4
    ✓ RestrictAddressFamilies=~AF_(INET|INET6)             Service cannot allocate Internet sockets
    ✓ RestrictNamespaces=~CLONE_NEWUSER                    Service cannot create user namespaces
    ✓ RestrictAddressFamilies=~…                           Service cannot allocate exotic sockets
    ✗ CapabilityBoundingSet=~CAP_(CHOWN|FSETID|SETFCAP)    Service may change file ownership/access mode/capabilities unrestricted  0.3
    ✗ CapabilityBoundingSet=~CAP_(DAC_*|FOWNER|IPC_OWNER)  Service may override UNIX file/IPC permission checks                     0.3
    ✓ CapabilityBoundingSet=~CAP_NET_ADMIN                 Service has no network configuration privileges
    ✓ CapabilityBoundingSet=~CAP_RAWIO                     Service has no raw I/O access
    ✓ CapabilityBoundingSet=~CAP_SYS_MODULE                Service cannot load kernel modules
    ✓ CapabilityBoundingSet=~CAP_SYS_TIME                  Service processes cannot change the system clock
    ✗ DeviceAllow=                                         Service has no device ACL                                                0.3
    ✓ IPAddressDeny=                                       Service blocks all IP address ranges
    ✓ KeyringMode=                                         Service doesn't share key material with other services
    ✗ NoNewPrivileges=                                     Service processes may acquire new privileges                             0.3
    ✓ NotifyAccess=                                        Service child processes cannot alter service state
    ✗ PrivateDevices=                                      Service potentially has access to hardware devices                       0.3
    ✗ PrivateMounts=                                       Service may install system mounts                                        0.3
      PrivateTmp=                                          Service runs in special boot phase, option does not apply
    ✗ PrivateUsers=                                        Service has access to other users                                        0.3
    ✗ ProtectControlGroups=                                Service may modify the control group file system                         0.3
      ProtectHome=                                         Service runs in special boot phase, option does not apply
    ✗ ProtectKernelModules=                                Service may load or read kernel modules                                  0.3
    ✗ ProtectKernelTunables=                               Service may alter kernel tunables                                        0.3
      ProtectSystem=                                       Service runs in special boot phase, option does not apply
    ✓ RestrictAddressFamilies=~AF_PACKET                   Service cannot allocate packet sockets

    man: systemd-analyze(1)
  48. "cgroups made easy"

    • Slice: resource group in cgroup hierarchy
      • -.slice (root)
      • system.slice
      • user.slice
      • machine.slice
    • Service: process (group) managed by systemd
    • Scope: group of external processes
      • user session
    man: systemd.resource-control(5)
  49. systemd cgroups

    # systemd-cgls
    Control group /:
    -.slice
    ├─user.slice
    │ └─user-1000.slice
    │   ├─user@1000.service
    │   │ └─ ...
    │   └─session-3.scope
    │     ├─ kded5
    │     └─ ...
    ├─init.scope
    │ └─1 /usr/lib/systemd/systemd --switched-root --system --deserialize 30
    └─system.slice
      ├─sshd.service
      │ └─1185 /usr/sbin/sshd -D
      ├─httpd.service
      │ ├─21895 /usr/sbin/httpd -DFOREGROUND
      ...

    $ loginctl
    SESSION  UID USER   SEAT  TTY
          3 1000 heimes seat0

    man: systemd-cgls(1)
    man: cgroups(7)
  50. cgroups top

    # systemd-cgtop
    Control Group                               Tasks  %CPU  Memory  Input/s  Output/s
    user.slice                                   1181  31,7    5.0G        -         -
    user.slice/user-1000.slice                   1181  31,7    5.0G        -         -
    user.slice/user-1000.slice/session-3.scope    991  30,0    4.6G        -         -
    /                                            1719  28,9    6.6G        -         -
    user.slice/use…000.slice/user@1000.service    190   1,8  364.9M        -         -
    system.slice                                  404   0,6    1.4G        -         -
    system.slice/sddm.service                      15   0,4  237.4M        -         -
    system.slice/httpd.service                    278   0,1   14.5M        -         -
    system.slice/rsyslog.service                    3   0,0    3.7M        -         -
    system.slice/wpa_supplicant.service             1   0,0    4.2M        -         -
    init.scope                                      1     -   21.2M        -         -
    system.slice/ModemManager.service               3     -    3.9M        -         -
    ...

    man: systemd-cgtop(1)
  51. Service resource control

    # httpd.service.d/custom.conf
    [Service]
    # 1 GB memory
    MemoryMax=1G
    # 100 tasks (threads / processes)
    TasksMax=100
    # 3 CPU cores
    CPUQuota=300%
    # double IO priority
    IOWeight=200
    # monitor network traffic
    IPAccounting=yes

    # systemctl stop httpd.service
    # journalctl -f -o cat -u httpd.service
    ...
    Stopped The Apache HTTP Server.
    httpd.service: Consumed 4.927s CPU time, received 3.2M IP traffic, sent 58.3M IP traffic.

    man: systemd.resource-control(5)
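To make the units of these settings concrete, here is a minimal Python sketch that turns values like the ones in the drop-in into plain numbers. The helper names `parse_memory` and `cpu_quota_to_cores` are invented for this illustration; the base-1024 suffixes follow systemd.resource-control(5), and percentage values for MemoryMax= are deliberately not handled.

```python
# Hypothetical helpers for illustration; they are not part of systemd.
_SUFFIXES = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_memory(value: str):
    """Convert a MemoryMax= style value to bytes; None means 'no limit'."""
    value = value.strip()
    if value == "infinity":
        return None
    if value and value[-1] in _SUFFIXES:
        # systemd interprets K, M, G, T with base 1024
        return int(value[:-1]) * _SUFFIXES[value[-1]]
    return int(value)


def cpu_quota_to_cores(value: str) -> float:
    """Convert a CPUQuota= percentage to the equivalent number of CPUs."""
    return float(value.strip().rstrip("%")) / 100.0


print(parse_memory("1G"))          # bytes allowed by MemoryMax=1G
print(cpu_quota_to_cores("300%"))  # CPUs' worth of time for CPUQuota=300%
```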
  52. Custom slice (.slice)

    # mydb.service
    [Service]
    Slice=system-myapp.slice

    # myserver.service
    [Service]
    Slice=system-myapp.slice

    # system-myapp.slice
    [Unit]
    Description=Limit DB and app to 2 GB memory in total
    [Slice]
    MemoryMax=2G

    └─system.slice
      └─system-myapp.slice
        ├─mydb.service
        │ └─12345 mydb
        └─myserver.service
          └─12347 myserver

    man: systemd.slice(5)
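The slice name itself encodes its position in the tree: per systemd.slice(5), the dashes separate path components, so system-myapp.slice sits inside system.slice. A small Python sketch of that naming rule follows; the function name `slice_to_cgroup_path` is made up here, and escaped characters in slice names are ignored for simplicity.

```python
def slice_to_cgroup_path(name: str) -> str:
    """Map a slice unit name to its location in the slice hierarchy."""
    if not name.endswith(".slice"):
        raise ValueError(f"not a slice unit: {name!r}")
    stem = name[: -len(".slice")]
    if stem == "-":
        # -.slice is the root of the hierarchy
        return "/"
    parts = stem.split("-")
    # every dash-separated prefix names one ancestor slice
    ancestors = ["-".join(parts[: i + 1]) + ".slice" for i in range(len(parts))]
    return "/" + "/".join(ancestors)


print(slice_to_cgroup_path("system-myapp.slice"))
```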
  53. Runtime control

    Limit slice to 1 GB
    # systemctl set-property system-myapp.slice MemoryMax=1G

    Remove limitation until next reboot
    # systemctl set-property --runtime system-myapp.slice MemoryMax=
    # systemctl show -p MemoryMax system-myapp.slice
    MemoryMax=infinity

    systemctl set-property httpd.service CPUWeight=200 IPAccounting=yes
    systemctl set-property --runtime httpd.service IPAddressDeny="10.0.0.0/8"

    man: systemctl(1)
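When these calls are driven from a deployment script, it can help to build the argument list in one place. The following Python sketch only assembles the command line shown on the slide; `set_property_argv` is a hypothetical helper name, and actually running the result (for example with `subprocess.run(argv, check=True)`) needs root privileges.

```python
import shlex


def set_property_argv(unit, runtime=False, **properties):
    """Build the argv for 'systemctl set-property'; nothing is executed here."""
    argv = ["systemctl", "set-property"]
    if runtime:
        # --runtime: the change is dropped again at the next reboot
        argv.append("--runtime")
    argv.append(unit)
    argv.extend(f"{key}={value}" for key, value in properties.items())
    return argv


argv = set_property_argv("httpd.service", CPUWeight=200, IPAccounting="yes")
print(shlex.join(argv))
```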
  54. Unix daemons

     0) [ident] fork(), exec()
     1) [p] close fds
     2) [p] reset signal handlers
     3) [p] sigprocmask()
     4) [p] sanitize env vars
     5) [p] create pipe
     6) [p] fork() to create background task
     7) [c] setsid()
     8) [c] fork() to detach terminal
     9) [c] exit() to re-parent [d] to PID 1
    10) [d] stdin, stdout, stderr → /dev/null
    11) [d] reset umask
    12) [d] chdir("/")
    13) [d] write PID file
    14) [d] drop privileges
    15) [d] notify parent [p] via pipe
    16) [d] close pipe
    17) [p] exit()
    18) [ident] … continue

    man: daemon(7)
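For readers who have never written this by hand, the core of the sequence (steps 5 to 17) can be sketched in Python roughly as follows. This is a simplified illustration, not a production daemonizer: closing inherited file descriptors, signal handling, the PID file and the privilege drop are left out, and the function name `daemonize` is an assumption of this sketch.

```python
import os


def daemonize() -> None:
    """Classic SysV double fork. Returns only in the daemon process [d]."""
    # 5) [p] create a pipe so the daemon can report readiness
    read_fd, write_fd = os.pipe()

    # 6) [p] fork() to create the background task
    if os.fork() > 0:
        os.close(write_fd)
        os.read(read_fd, 1)  # block until [d] notifies (or the pipe closes)
        os._exit(0)          # 17) [p] exit()

    # 7) [c] setsid(): new session without a controlling terminal
    os.close(read_fd)
    os.setsid()

    # 8) [c] fork() again; [d] is not a session leader and
    #    therefore cannot reacquire a terminal
    if os.fork() > 0:
        os._exit(0)          # 9) [c] exit() so [d] gets re-parented

    # 10) [d] stdin, stdout, stderr -> /dev/null
    devnull = os.open(os.devnull, os.O_RDWR)
    for fd in (0, 1, 2):
        os.dup2(devnull, fd)
    if devnull > 2:
        os.close(devnull)

    # 11) [d] reset umask, 12) [d] chdir("/")
    os.umask(0)
    os.chdir("/")

    # 13) PID file and 14) privilege drop are omitted in this sketch

    # 15) [d] notify parent [p] via pipe, 16) [d] close pipe
    os.write(write_fd, b"1")
    os.close(write_fd)


# Usage (not executed here because it terminates the calling process):
#     daemonize()
#     run_service_loop()   # hypothetical main loop of the daemon
```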
  55. How to port a service to systemd

    • don't daemonize
    • follow LSB specification (Linux Standard Base)
      • exit codes
      • signals
      • paths
    • let systemd drop privileges
    • log to stdout / stderr
    • write a service file
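Logging to stdout / stderr does not have to mean losing log levels: sd-daemon(3) describes a "<N>" line prefix that journald interprets as the syslog priority when the service's output is connected to the journal. A minimal Python sketch, assuming the default SyslogLevelPrefix= behaviour; the helper names `format_line` and `log` are made up for this example.

```python
import sys

# syslog priorities as used by the sd-daemon(3) prefixes
SD_ERR, SD_WARNING, SD_NOTICE, SD_INFO, SD_DEBUG = 3, 4, 5, 6, 7


def format_line(priority: int, message: str) -> str:
    """Prefix one line with '<N>' so journald can pick up its priority."""
    return f"<{priority}>{message}"


def log(priority: int, message: str, stream=None) -> None:
    """Write a message to stderr (or the given stream), one prefixed line each."""
    stream = sys.stderr if stream is None else stream
    for line in message.splitlines() or [""]:
        print(format_line(priority, line), file=stream, flush=True)


log(SD_NOTICE, "Hello ConFoo")
```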
  56. Consider advanced features

    • sd_notify()
      • service ready
      • watchdog
      • status information
    • socket activation
    • oneshot services for initialization / upgrades
    • sd_journal_send() for improved logging
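The notification protocol behind sd_notify() is simple enough to speak without a library: the service manager passes a datagram socket in the NOTIFY_SOCKET environment variable and the service sends short KEY=VALUE messages to it. The following is a minimal pure-Python sketch of that idea (the Python function name `sd_notify` mirrors the C API but is defined here); it assumes a unit with Type=notify and is not a replacement for the official bindings.

```python
import os
import socket


def sd_notify(state: str) -> bool:
    """Send a state string to the service manager.

    Returns False when NOTIFY_SOCKET is unset, i.e. the process was not
    started by a service manager that expects notifications.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # a leading '@' denotes a Linux abstract namespace socket
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode("utf-8"), path)
    return True


# Typical calls from a service:
#     sd_notify("READY=1")               service ready
#     sd_notify("STATUS=serving")        status information
#     sd_notify("WATCHDOG=1")            watchdog keep-alive
```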
  57. Journald logging

    from systemd.journal import send, LOG_NOTICE
    send("Hello ConFoo", PRIORITY=LOG_NOTICE,
         CITY="Montreal", COUNTRY="Canada")

    $ journalctl -f -o json-pretty
    {
        "_AUDIT_LOGINUID" : "1000",
        "MESSAGE" : "Hello ConFoo",
        "SYSLOG_IDENTIFIER" : "python3",
        "_PID" : "91755",
        "CITY" : "Montreal",
        "COUNTRY" : "Canada",
        "PRIORITY" : "5",
        "_SELINUX_CONTEXT" : "unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023",
        ...
        "_EXE" : "/usr/bin/python3.7",
        "_SYSTEMD_CGROUP" : "/user.slice/user-1000.slice/session-3.scope",
        "_UID" : "1000",
        "_BOOT_ID" : "fe72a9b80a9b4b6ea9c7e1887bf827f7",
        "_SYSTEMD_UNIT" : "session-3.scope",
        "_COMM" : "python3"
    }

    man: sd_journal_send(3)
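The custom fields are what make structured logging useful: they can be matched later, either directly with journalctl (for example `journalctl CITY=Montreal`) or by post-processing JSON output. As a rough sketch, the following Python code filters entries in the one-object-per-line format produced by `journalctl -o json`; the second sample entry and the helper name `matching_entries` are invented for the illustration.

```python
import json


def matching_entries(lines, **fields):
    """Yield journal entries (dicts) whose fields equal the given values."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if all(entry.get(key) == value for key, value in fields.items()):
            yield entry


# Sample input; the first entry uses the field values from the slide,
# the second one is made up.
sample = [
    '{"MESSAGE": "Hello ConFoo", "PRIORITY": "5", "CITY": "Montreal", "COUNTRY": "Canada"}',
    '{"MESSAGE": "unrelated message", "PRIORITY": "6"}',
]

for entry in matching_entries(sample, CITY="Montreal"):
    print(entry["MESSAGE"])
```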
  58. Human readable name & documentation

    [Unit]
    Description=The Apache HTTP Server
    Documentation=man:httpd.service(8)

    $ systemctl help httpd.service
    $ systemctl help $(pidof httpd)
  59. Dependencies & Ordering

    Dependencies
    • Wants
    • Requires
    • Requisite
    • BindsTo
    • PartOf
    • Conflicts

    Ordering (start and stop order)
    • Before
    • After

    [Unit]
    Requires=database.service

    [Unit]
    Requires=database.service
    After=database.service
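The two examples differ in an important way: Requires= alone only pulls database.service into the transaction, while After= is what makes systemd wait for it before starting the dependent unit. The ordering part can be pictured as a topological sort, sketched here with Python's graphlib (Python 3.9 or newer). The unit name myserver.service is an assumption for the unit that carries these settings, and real systemd job scheduling is considerably more involved than this.

```python
from graphlib import TopologicalSorter

# Ordering edges only: each unit maps to the units listed in its After=.
# "myserver.service" is a hypothetical name for the unit shown above.
after = {
    "myserver.service": {"database.service"},
    "database.service": set(),
}


def start_order(after_edges):
    """Return units in an order that respects all After= relations."""
    return list(TopologicalSorter(after_edges).static_order())


order = start_order(after)
print("start:", order)
print("stop: ", list(reversed(order)))  # stop order is the reverse
```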
  60. Conditionals, mounts

    # don't run in battery mode
    ConditionACPower=true
    # file must exist
    ConditionPathExists=/path/to/file
    # more than 500 MB of system memory
    ConditionMemory= >500M

    # /var/lib/database file system mounted
    Requires=var-lib-database.mount
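The mount unit name is derived from the mount point: leading and trailing slashes are dropped, the remaining slashes become dashes, and other special characters are hex-escaped. On a real system `systemd-escape --path` performs this conversion; the Python sketch below is a simplified approximation of the rule (it does not collapse repeated slashes), with the function names chosen for this example.

```python
import string

# characters that are kept as-is in unit names
_SAFE = set(string.ascii_letters + string.digits + ":_.")


def escape_path(path: str) -> str:
    """Approximate systemd's path escaping for unit names."""
    trimmed = path.strip("/")
    if not trimmed:
        return "-"  # the root directory
    out = []
    for index, byte in enumerate(trimmed.encode("utf-8")):
        char = chr(byte)
        if char == "/":
            out.append("-")
        elif char in _SAFE and not (index == 0 and char == "."):
            out.append(char)
        else:
            # everything else, including a literal '-', becomes \xNN
            out.append(f"\\x{byte:02x}")
    return "".join(out)


def mount_unit_for(path: str) -> str:
    """Name of the .mount unit that corresponds to a mount point."""
    return escape_path(path) + ".mount"


print(mount_unit_for("/var/lib/database"))
```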
  61. Multi-service application

    • application .target
    • application .slice
    • initial configuration .service
    • upgrade .service
  62. Templated service (@.service)

    # sshd-keygen.target
    [email protected]
    [email protected]
    [email protected]

    # sshd.service
    Wants=sshd-keygen.target

    # sshd-keygen@.service
    [Unit]
    Description=OpenSSH %i Server Key Generation
    [Service]
    Type=oneshot
    ExecStart=/usr/libexec/openssh/sshd-keygen %i
    [Install]
    WantedBy=sshd-keygen.target

    man: systemd.service(5)
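A template unit is instantiated by putting an instance name between "@" and the unit suffix; systemd then substitutes that name for %i. The Python sketch below mimics this lookup and substitution, assuming the template file is named sshd-keygen@.service. The instance name "rsa" is an assumption for illustration, the helper names are invented, and other specifiers such as %I are not handled.

```python
def split_instance(unit: str):
    """Split 'name@instance.suffix' into (template unit name, instance)."""
    name, dot, suffix = unit.rpartition(".")
    prefix, at, instance = name.partition("@")
    if not dot or not at:
        raise ValueError(f"not an instantiated unit: {unit!r}")
    return f"{prefix}@.{suffix}", instance


def expand(text: str, instance: str) -> str:
    """Replace the %i specifier with the instance name."""
    return text.replace("%i", instance)


# "rsa" is a hypothetical instance name
template, instance = split_instance("sshd-keygen@rsa.service")
print(template)
print(expand("ExecStart=/usr/libexec/openssh/sshd-keygen %i", instance))
```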
  63. systemd in containers?

    • Don't – keep it small
    • … unless it's a multi-service container
    • check out OCI-Hooks
  64. User services

    • Started on log-in of first session
    • Stopped on log-off of last session
    • Linger with loginctl

    ~/.config/systemd/user/
    /etc/systemd/user/
    ~/.local/share/systemd/user/
    ...
    /usr/lib/systemd/user/

    $ systemctl --user daemon-reload
    $ systemctl --user enable --now myuserapp.service

    man: systemd.unit(5)
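The directories are searched in order, so a unit file in ~/.config/systemd/user/ shadows a file of the same name further down the list. A small Python sketch of that first-match lookup follows; it only knows the four directories named on the slide (the full list in systemd.unit(5) is longer), and `find_unit` is a hypothetical helper name.

```python
from pathlib import Path

# Partial search path, in priority order, as listed on the slide
USER_UNIT_DIRS = [
    "~/.config/systemd/user",
    "/etc/systemd/user",
    "~/.local/share/systemd/user",
    "/usr/lib/systemd/user",
]


def find_unit(name, search_dirs=USER_UNIT_DIRS):
    """Return the first existing unit file along the search path, or None."""
    for directory in search_dirs:
        candidate = Path(directory).expanduser() / name
        if candidate.is_file():
            return candidate
    return None


print(find_unit("myuserapp.service"))
```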
  65. D-Bus scripting

    # busctl introspect org.freedesktop.systemd1 /org/freedesktop/systemd1

    # busctl call org.freedesktop.systemd1 /org/freedesktop/systemd1 \
        org.freedesktop.systemd1.Manager \
        GetUnit s "sssd.service"
    o "/org/freedesktop/systemd1/unit/sssd_2eservice"

    # busctl get-property org.freedesktop.systemd1 \
        /org/freedesktop/systemd1/unit/sssd_2eservice \
        org.freedesktop.systemd1.Unit ActiveState
    s "active"

    # busctl call org.freedesktop.systemd1 /org/freedesktop/systemd1 \
        org.freedesktop.systemd1.Manager \
        StopUnit ss "sssd.service" "replace"
    o "/org/freedesktop/systemd1/job/102829"

    man: busctl(1)
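The object path returned by GetUnit follows a fixed escaping scheme: characters that are not valid in a D-Bus path element are written as an underscore plus two hex digits, which is why "sssd.service" becomes "sssd_2eservice". Calling GetUnit remains the authoritative way to obtain the path; the Python sketch below only reproduces the escaping to show how the name is formed, and its function names are chosen for this example.

```python
import string

_LETTERS = set(string.ascii_letters)
_DIGITS = set(string.digits)


def bus_label_escape(label: str) -> str:
    """Escape a unit name for use as one element of a D-Bus object path."""
    if not label:
        return "_"
    out = []
    for index, byte in enumerate(label.encode("utf-8")):
        char = chr(byte)
        # letters are kept; digits are kept except in first position
        if char in _LETTERS or (char in _DIGITS and index > 0):
            out.append(char)
        else:
            out.append(f"_{byte:02x}")
    return "".join(out)


def unit_object_path(unit: str) -> str:
    """Object path of a unit below the systemd1 manager object."""
    return "/org/freedesktop/systemd1/unit/" + bus_label_escape(unit)


print(unit_object_path("sssd.service"))
```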
  66. Useful commands

    • systemctl
      • … cat unit.service
    • systemd-delta
    • systemd-analyze
      • … plot
      • … dot
      • … verify filename
      • … security
    • systemd-tmpfiles
  67. Summary

    • systemd is all about units
      • process: service
      • activation: time, path, socket
      • grouping: target
      • resource control: cgroups, slice, service units
      • security & sandboxing: namespaces, filters
  68. Resources

    • man: systemd(1)
    • https://www.freedesktop.org/wiki/Software/systemd/
      • The systemd for Administrators Blog Series
      • The systemd for Developers Series
      • Documentation for Developers
    • Presentations (YouTube)
      • "Demystifying systemd"
      • "The Tragedy of systemd"
      • "The Six Stages of systemd"
    • Slides: https://speakerdeck.com/tiran/