
systemd for sysadmins. what to expect from your new service overlord

Presented at CentOS Dojo Santa Clara, Mar 31, 2014. This talk gives an overview of our experience running systemd on production servers for over 2 years. The audience is sysadmins who will soon be introduced to systemd as it replaces sysvinit in RHEL 7, CentOS 7, Debian, and Ubuntu.

joemiller

March 31, 2014

Transcript

  1. Two years living your future: systemd is coming, what to expect
    CentOS Dojo - Santa Clara - March 2014
  2. Joe Miller
    Ops @ Pantheon (getpantheon.com)
    @miller_joe
    github.com/joemiller

  3. Pantheon: a platform for professional web developers
    • systemd + Fedora in production for > 2 years
    • 600,000+ systemd-managed services
    • CTO David Strauss is a systemd core committer
  4. “systemd is a system manager more than just an init system” - Lennart Poettering
    systemd manages more than just services: automounts, mounts, timers (think cron), logging, devices, hostname, networking (soon), targets (runlevels), and more.
  5. systemd seeks to provide common infrastructure that is often re-implemented across applications
    An example is systemd’s helpers for services:
    • no code needed for dropping privileges
    • no code needed for proper fork()’ing
    • no code needed for logging: just print to stdout/stderr
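Because journald captures whatever a service writes to stdout/stderr, a daemon can skip logging libraries entirely. The minimal Python sketch below assumes the unit keeps the default `StandardOutput=journal` and `SyslogLevelPrefix=yes` (check systemd.exec(5) for your version); under those settings journald strips a kernel-style `<N>` prefix from each line and records `N` as the entry's priority. The helper names `format_line` and `log` are hypothetical.

```python
import sys

# syslog(3) priority numbers, as used by journald's "<N>" line prefixes
EMERG, ALERT, CRIT, ERR, WARNING, NOTICE, INFO, DEBUG = range(8)


def format_line(priority: int, message: str) -> str:
    """Prefix a message with "<N>" so journald can record its priority."""
    if not 0 <= priority <= 7:
        raise ValueError(f"priority must be 0-7, got {priority}")
    return f"<{priority}>{message}"


def log(priority: int, message: str) -> None:
    # stdout is block-buffered when it is not a terminal (as under systemd),
    # so flush explicitly or lines may reach the journal late.
    print(format_line(priority, message), file=sys.stdout, flush=True)


if __name__ == "__main__":
    log(INFO, "service started")
    log(ERR, "could not connect to backend")
```

Setting `PYTHONUNBUFFERED=1` via `Environment=` in the unit is an alternative to the explicit `flush=True`.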
  6. systemd adoption is accelerating
    • systemd will replace init in RHEL 7 (CentOS 7)
    • Debian: next release
    • Ubuntu: future release
    http://en.wikipedia.org/wiki/Systemd (Mar 22, 2014)
  7. But transitioning is going to be a pain? Not really: lots of backwards compatibility
    If you like your init scripts, you can keep them, but you won’t want to.
    • distro provides .service files for most things
    • systemd can control legacy init scripts
    • systemd reads legacy /etc/fstab
    • journald forwards to syslog
  8. The things we (as sysadmins) like about systemd:
    • NO MORE FIGHTING WITH INIT SCRIPTS
    • fast, parallel boot
    • declarative service configs
    • cgroups integration
    • powerful dependency model
    • journald (rich logging)
  9. fast, parallel boot

  10. services

  11. declarative service definitions
      # /etc/systemd/system/cassandra.service
      [Unit]
      Description=Cassandra

      [Service]
      Type=simple
      User=cassandra
      Group=cassandra
      Environment=CASSANDRA_HOME=/usr/share/cassandra/
      Environment=CASSANDRA_INCLUDE=/usr/share/cassandra/cassandra.in.sh
      Environment=MAX_HEAP_SIZE=6G
      ExecStart=/usr/sbin/cassandra -f
      LimitNOFILE=32768
      CPUShares=1000
      MemoryLimit=M
      MemorySoftLimit=M
      BlockIOWeight=100

      [Install]
      WantedBy=multi-user.target
    Unit files live in /etc/systemd/system/ (local admin) and /usr/lib/systemd/system/ (distro packages); a file in /etc/systemd/system/ overrides one with the same name in /usr/lib/systemd/system/.
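Unit files are plain INI-style text, which makes them easy to inspect from scripts. Python's `configparser` is a poor fit because a key such as `Environment=` may repeat, with each assignment adding to the list, so the sketch below uses a small hand-rolled reader. It is a simplification under stated assumptions: it ignores line continuations, quoting, specifiers and the "empty assignment resets the list" rule that systemd's real parser handles. `parse_unit` is a hypothetical name.

```python
from collections import defaultdict


def parse_unit(text: str) -> dict:
    """Return {section: {key: [values...]}} for a systemd-style unit file."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], defaultdict(list))
            continue
        if current is None or "=" not in line:
            raise ValueError(f"unexpected line: {raw!r}")
        key, value = line.split("=", 1)
        # Repeated keys accumulate instead of overwriting each other.
        current[key.strip()].append(value.strip())
    return sections


UNIT = """
[Unit]
Description=Cassandra

[Service]
User=cassandra
Environment=CASSANDRA_HOME=/usr/share/cassandra/
Environment=MAX_HEAP_SIZE=6G
ExecStart=/usr/sbin/cassandra -f
"""

if __name__ == "__main__":
    print(parse_unit(UNIT)["Service"]["Environment"])
```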
  12. basic service settings
    User, Group, Supplementary Groups, Environment, EnvironmentFile, Nice Level, CPU Affinity, CPU Priority, IO Scheduling, IO Priority, OOM Score Adjust, ulimits
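Each setting on this slide corresponds to a `[Service]` directive in systemd.exec(5), for example `User=`, `Group=`, `SupplementaryGroups=`, `Environment=`, `EnvironmentFile=`, `Nice=`, `CPUAffinity=`, `CPUSchedulingPriority=`, `IOSchedulingClass=`, `IOSchedulingPriority=`, `OOMScoreAdjust=`, and the `Limit*=` family for ulimits (such as `LimitNOFILE=`). As a sketch of how a config-management tool might emit them, the hypothetical `render_unit` helper below turns a dict into unit-file text, writing list values as repeated lines. The `Nice` and `OOMScoreAdjust` values in the example are illustrative, not from the talk.

```python
def render_unit(sections: dict) -> str:
    """Render {section: {key: value-or-list}} as systemd unit-file text."""
    out = []
    for section, settings in sections.items():
        out.append(f"[{section}]")
        for key, value in settings.items():
            values = value if isinstance(value, list) else [value]
            for v in values:  # lists become repeated Key= lines
                out.append(f"{key}={v}")
        out.append("")  # blank line after each section
    return "\n".join(out)


if __name__ == "__main__":
    print(render_unit({
        "Unit": {"Description": "Cassandra"},
        "Service": {
            "User": "cassandra",
            "Group": "cassandra",
            "Environment": ["MAX_HEAP_SIZE=6G", "CASSANDRA_HOME=/usr/share/cassandra/"],
            "Nice": 5,              # illustrative value
            "OOMScoreAdjust": -500,  # illustrative value
            "LimitNOFILE": 32768,
        },
        "Install": {"WantedBy": "multi-user.target"},
    }))
```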
  13. security settings
    chroot(), Set & limit capabilities(7), Read/Write directories, Read-only directories, Inaccessible directories, Private Tmp, Private Dev, SELinux Context, AppArmor Profile, Syscall Filtering, ...
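These map to systemd.exec(5) directives such as `RootDirectory=`, `CapabilityBoundingSet=`, `ReadWriteDirectories=`, `ReadOnlyDirectories=` and `InaccessibleDirectories=` (renamed to `*Paths=` in later releases), `PrivateTmp=`, `PrivateDevices=`, `SELinuxContext=`, `AppArmorProfile=` and `SystemCallFilter=`. Rather than editing a packaged unit, such settings can go in a drop-in file under `/etc/systemd/system/<unit>.d/*.conf`, which systemd merges over the unit after a `daemon-reload`. The sketch below writes one; the helper `write_dropin`, the file name `hardening.conf` and the chosen settings are illustrative assumptions.

```python
from pathlib import Path
from typing import Dict

# Illustrative hardening settings (assumed values; see systemd.exec(5)).
HARDENING = {
    "PrivateTmp": "yes",
    "PrivateDevices": "yes",
    "ReadOnlyDirectories": "/usr",
    "InaccessibleDirectories": "/home",
}


def write_dropin(unit: str, settings: Dict[str, str],
                 root: str = "/etc/systemd/system",
                 name: str = "hardening.conf") -> Path:
    """Write settings as a [Service] drop-in for `unit`; return the file path."""
    dropin_dir = Path(root) / f"{unit}.d"
    dropin_dir.mkdir(parents=True, exist_ok=True)
    body = "[Service]\n" + "".join(f"{k}={v}\n" for k, v in settings.items())
    path = dropin_dir / name
    path.write_text(body)
    # systemd only sees the change after `systemctl daemon-reload`.
    return path
```

Pointing `root` at a temporary directory is a cheap way to preview the generated file before touching /etc.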
  14. advanced resource control (cgroups)
    CPU Accounting, CPU Shares, Memory Accounting, Memory Limiting, Block IO Weight (global, per-device), Block IO bandwidth limit
    Can be set per-service, or shared among groups of services using .slices.
    NOTE: not all cgroups settings are available yet; more will be exposed as the cgroups interface stabilizes.
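`CPUShares=` is a relative weight (the kernel default is 1024), not a cap: it only matters when the CPU is contended, and then each busy cgroup gets roughly its shares divided by the sum of the shares of all busy siblings. A back-of-envelope helper (hypothetical name `cpu_fractions`), assuming every listed group is CPU-bound and all are siblings under the same parent:

```python
def cpu_fractions(shares: dict) -> dict:
    """Approximate CPU fraction per busy sibling cgroup under full contention."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("shares must sum to a positive number")
    return {name: s / total for name, s in shares.items()}


if __name__ == "__main__":
    # cassandra.service (CPUShares=1000) competing with two default-weight services
    for name, frac in cpu_fractions({"cassandra": 1000, "a": 1024, "b": 1024}).items():
        print(f"{name}: {frac:.1%}")
```

When nothing else wants the CPU, a low-share service can still use all of it. On later systemd versions using the unified cgroup hierarchy, the equivalent knob is `CPUWeight=` (default 100).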
  15. starting / stopping / restarting
      $ systemctl daemon-reload   # must run this after modifying unit files
      $ systemctl start cassandra.service
      $ systemctl stop cassandra.service
      $ systemctl restart cassandra.service
      $ systemctl try-restart cassandra.service   # restart only if service is running
      $ systemctl status cassandra.service
      cassandra.service - Cassandra
         Loaded: loaded (/etc/systemd/system/cassandra.service; enabled)
         Active: active (running) since Mon, 10 Mar 2014 02:38:01 +0000; 2 minutes
       Main PID: 21247 (java)
         CGroup: name=systemd:/system/cassandra.service
                 └ 21247 java -ea -javaagent:/usr/share/cassandra//lib/jamm-0.2.5.jar
      Mar 26 01:00:01 valhalla1f cassandra[21247]: INFO 01:00:01,016 Started hinted handof
      Mar 26 01:00:01 valhalla1f cassandra[21247]: INFO 01:00:01,018 Finished hinted hando
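The `daemon-reload` step is easy to forget when automating deploys. A sketch of a hypothetical deploy helper, `install_unit`, that writes the unit file, reloads systemd, then runs `try-restart` so a stopped service stays stopped. The command runner is injectable so the ordering can be checked without a real systemd.

```python
import subprocess
from pathlib import Path
from typing import Callable, List

Runner = Callable[[List[str]], None]


def run(cmd: List[str]) -> None:
    subprocess.run(cmd, check=True)  # raise if systemctl exits non-zero


def install_unit(name: str, text: str, runner: Runner = run,
                 unit_dir: str = "/etc/systemd/system") -> None:
    """Write a unit file, reload systemd, then restart the unit if it is running."""
    Path(unit_dir, name).write_text(text)
    runner(["systemctl", "daemon-reload"])      # pick up the modified unit file
    runner(["systemctl", "try-restart", name])  # no-op if the unit is stopped
```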
  16. .targets replace run-levels
    • no more run-levels
    • many targets can be active at once
    • useful for grouping services or ordering startup
    • multi-user.target is the most common target for your services
  17. enabling services at boot
      ### /etc/systemd/system/cassandra.service
      [Unit]
      Description=Cassandra
      …
      [Install]
      WantedBy=multi-user.target

      $ systemctl enable cassandra.service
      ln -s '/etc/systemd/system/cassandra.service' \
        '/etc/systemd/system/multi-user.target.wants/cassandra.service'
      $ systemctl disable cassandra.service
      rm '/etc/systemd/system/multi-user.target.wants/cassandra.service'
      $ systemctl is-enabled cassandra.service ; echo $?
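As the `ln -s` output shows, `enable` does nothing more than create symlinks derived from the `[Install]` section. A simplified model (hypothetical `enable_links`) of the links computed for `WantedBy=` and `RequiredBy=`; the real `systemctl enable` also handles `Also=`, `Alias=` and template instances.

```python
from typing import Dict, List, Tuple


def enable_links(unit: str, install: Dict[str, List[str]], unit_path: str,
                 etc: str = "/etc/systemd/system") -> List[Tuple[str, str]]:
    """Return (link, target) pairs that enabling `unit` would create."""
    links = []
    for target in install.get("WantedBy", []):
        links.append((f"{etc}/{target}.wants/{unit}", unit_path))
    for target in install.get("RequiredBy", []):
        links.append((f"{etc}/{target}.requires/{unit}", unit_path))
    return links
```

`systemctl is-enabled` exits 0 when the unit is enabled and non-zero otherwise, which is what the `echo $?` on the slide displays.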
  18. listing services
      $ systemctl --all
      UNIT                               LOAD    ACTIVE    SUB      DESCRIPTION
      proc-sys-fs-binfmt_misc.automount  loaded  active    running  Arbitrary
      bloomd.service                     loaded  inactive  dead     SYSV: Bloomd
      cassandra.service                  loaded  active    running  Cassandra
      chronyd.service                    loaded  inactive  dead     NTP client
      sys-fs-fuse-connections.mount      loaded  active    mounted  FUSE Control
      sys-kernel-config.mount            loaded  active    mounted  Configuration
      sys-kernel-debug.mount             loaded  active    mounted  Debug File
      tmp.mount                          loaded  active    mounted  Temporary
      systemd-ask-password-console.path  loaded  inactive  dead     Dispatch
      core-file-cleanup.service          loaded  inactive  dead     Remove old
      crond.service                      loaded  active    running  Command
      dbus.service                       loaded  active    running  D-Bus System
      mysql.service                      loaded  active    running  MySQL database
      netconsole.service                 loaded  inactive  dead     SYSV: initia
      network.service                    loaded  active    exited   LSB: Bring
      NetworkManager-dispatcher.service  loaded  inactive  dead     Network
      newrelic-daemon.service            loaded  inactive  dead     SYSV: Starts
  19. listing failed services
      $ systemctl --failed
      UNIT                   LOAD    ACTIVE  SUB     DESCRIPTION
      elasticsearch.service  loaded  failed  failed  ElasticSearch
    Great for writing monitoring checks.
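Because `systemctl --failed` output is easy to parse, it makes a simple monitoring check. A sketch assuming Nagios/Sensu-style exit codes (0 OK, 2 CRITICAL) and output produced with `--no-legend`, which suppresses the header and footer lines; some systemctl versions prefix failed units with a `●` marker, which the parser skips. `parse_failed` and `check_failed` are hypothetical names.

```python
import subprocess
import sys
from typing import List, Tuple

OK, CRITICAL = 0, 2  # Nagios/Sensu exit codes


def parse_failed(output: str) -> List[str]:
    """Extract unit names from `systemctl --failed --no-legend` output."""
    units = []
    for line in output.splitlines():
        fields = line.split()
        if fields and fields[0] in ("●", "*"):
            fields = fields[1:]  # skip the status marker some versions print
        if fields:
            units.append(fields[0])
    return units


def check_failed(output: str) -> Tuple[int, str]:
    units = parse_failed(output)
    if units:
        return CRITICAL, f"CRITICAL: failed units: {', '.join(units)}"
    return OK, "OK: no failed units"


if __name__ == "__main__":
    out = subprocess.run(["systemctl", "--failed", "--no-legend"],
                         capture_output=True, text=True, check=True).stdout
    status, message = check_failed(out)
    print(message)
    sys.exit(status)
```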
  20. instantiated services: single template, many instances
      ### getty@.service
      [Unit]
      Description=Getty on %I
      [Service]
      ExecStart=-/sbin/agetty --noclear %I
      ...

      $ systemctl start getty@tty2.service
      $ systemctl start getty@tty3.service
      $ systemctl enable getty@tty2.service
      $ systemctl enable getty@tty3.service
      $ systemctl --all | grep getty
      getty@tty1.service  loaded  active  running  Getty on tty1
      getty@tty2.service  loaded  active  running  Getty on tty2
      getty@tty3.service  loaded  active  running  Getty on tty3
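Template units are resolved by name: everything between `@` and the unit suffix is the instance, and specifiers such as `%i` (instance name as written) and `%I` (instance with escaping undone) are substituted into the template. A simplified sketch with hypothetical helpers `split_instance`, `unescape` and `expand`; it handles only `%i`, `%I` and `%%` and only `\xNN`-style unescaping, not every specifier systemd supports.

```python
import re
from typing import Tuple


def split_instance(unit: str) -> Tuple[str, str]:
    """'getty@tty2.service' -> ('getty@.service', 'tty2')."""
    prefix, sep, rest = unit.partition("@")
    if not sep:
        raise ValueError(f"{unit!r} is not an instance of a template")
    instance, dot, suffix = rest.rpartition(".")
    if not dot or not instance:
        raise ValueError(f"{unit!r} has no instance name")
    return f"{prefix}@.{suffix}", instance


def unescape(name: str) -> str:
    """Undo \\xNN escapes, e.g. 'foo\\x2dbar' -> 'foo-bar'."""
    return re.sub(r"\\x([0-9a-fA-F]{2})", lambda m: chr(int(m.group(1), 16)), name)


def expand(line: str, instance: str) -> str:
    """Substitute %i, %I and %% in one line of a template unit."""
    table = {"i": instance, "I": unescape(instance), "%": "%"}
    return re.sub(r"%([iI%])", lambda m: table[m.group(1)], line)


if __name__ == "__main__":
    template, inst = split_instance("getty@tty2.service")
    print(template, expand("ExecStart=-/sbin/agetty --noclear %I", inst))
```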
  21. dependency management
    Dependencies are not only between services but between all unit types, e.g. wait for a remote .mount, then start a .service.
    Flexibility (loose / strong, forwards / backwards): Wants, Wanted, WantedBy, Before, After, Requires, RequiredBy, Also, PartOf, ...
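`After=`/`Before=` only order units, while `Wants=`/`Requires=` pull them in. A toy model of the ordering half, assuming every unit is already part of the transaction: Python's `graphlib.TopologicalSorter` (3.9+) yields a valid start order, and units that become ready in the same batch are ones systemd is free to start in parallel. This is an illustration only; systemd's job engine also propagates requirements, resolves conflicts and breaks ordering cycles. The unit names in the example are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def start_batches(after: Dict[str, Set[str]]) -> List[List[str]]:
    """Group units into batches; each batch depends only on earlier batches.

    `after` maps a unit to the units it is ordered After=.
    """
    ts = TopologicalSorter(after)
    ts.prepare()  # raises graphlib.CycleError on an ordering cycle
    batches = []
    while ts.is_active():
        ready = sorted(ts.get_ready())  # these could all start in parallel
        batches.append(ready)
        ts.done(*ready)
    return batches


if __name__ == "__main__":
    print(start_batches({
        "app.service": {"network.target", "remote-fs.target"},
        "remote-fs.target": {"network.target"},
        "cron.service": set(),
    }))
```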
  22. dependency management example: wait for admin services to boot, then start all container services
      ### /etc/systemd/system/containers.timer
      [Unit]
      Before=containers.target shutdown.target
      After=multi-user.target sysinit.target
      [Timer]
      OnActiveSec=60
      Unit=containers.target
      [Install]
      WantedBy=multi-user.target

      ### /etc/systemd/system/containers.target
      [Unit]
      Description=Starts containers.

      ### /etc/systemd/system/container1.service
      ...
      [Install]
      WantedBy=containers.target
  23. cgroups
    • resource control via cgroups, automatically managed by systemd
    • can be set per .service, or shared between multiple services using .slice units
    • changeable at runtime (systemctl set-property)
  24. cgroups
      + user_1.slice
        |_ nginx_user_1.service
        |_ mariadb_user_1.service

      ### /etc/systemd/system/user_1.slice
      [Unit]
      Description=User One cgroup
      [Slice]
      CPUShares=512
      MemoryLimit=2G

      ### /etc/systemd/system/nginx_user_1.service
      ...
      Slice=user_1.slice

      ### /etc/systemd/system/mariadb_user_1.service
      Slice=user_1.slice
      MemoryLimit=1.5G
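`MemoryLimit=` takes a byte count with an optional K, M, G or T suffix (base 1024). Because a slice's limit caps the combined usage of everything inside it, a member service whose own limit is higher than the slice's is effectively capped by the slice, which is easy to misread. A small sketch with hypothetical helpers `parse_size` and `check_slice`; it accepts fractional values such as the slide's `1.5G`, but whether your systemd version does should be verified.

```python
import re
from typing import Dict, List

_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(value: str) -> int:
    """Convert '2G', '1.5G', '512M' or '4096' to bytes (base 1024)."""
    m = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMGT]?)\s*", value)
    if not m:
        raise ValueError(f"bad size: {value!r}")
    number, suffix = m.groups()
    return int(float(number) * _UNITS[suffix])


def check_slice(slice_limit: str, service_limits: Dict[str, str]) -> List[str]:
    """Return services whose MemoryLimit= exceeds their slice's limit."""
    cap = parse_size(slice_limit)
    return sorted(name for name, lim in service_limits.items()
                  if parse_size(lim) > cap)
```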
  25. cgroups: systemd-cgtop(1)
      $ systemd-cgtop
      Path                                 Tasks  %CPU  Memory  Input/s  Output/s
      /                                    230    -     7.0G    -        -
      /system.slice                        3      -     -       -        -
      /system.slice/atlas-nginx.service    2      -     -       -        -
      /system.slice/atlas-php-fpm.service  6      -     -       -        -
      /system.slice/auditd.service         1      -     -       -        -
      /system.slice/avahi-daemon.service   2      -     -       -        -
      /system.slice/cassandra.service      1      -     -       -        -
      /system.slice/crond.service          19     -     -       -        -
      /system.slice/dbus.service           1      -     -       -        -
  26. cgroups: systemd-cgls(1)
      $ systemd-cgls
      ├─1 /usr/lib/systemd/systemd --system --deserialize 20
      └─system.slice
        ├─tomcat.service
        │ ├─ 9914 /usr/bin/java -Djavax.sql.DataSource.Factory=org.apache
        ├─nginx.service
        │ ├─9776 nginx: master process /usr/sbin/ngin
        │ ├─9777 nginx: worker proces
        │ ├─9778 nginx: worker proces
        │ ├─9779 nginx: worker proces
        │ └─9780 nginx: worker proces
        ├─logstash_server.service
        │ └─10999 /usr/bin/java -server -Xms128m -Xmx128m
        ├─elasticsearch.service
        │ └─10967 /bin/java -server -Djava.net.preferIPv4Stack=true
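The tree above is built from each process's cgroup membership, which the kernel exposes in `/proc/<pid>/cgroup` as one `hierarchy-ID:controllers:path` line per hierarchy (see cgroups(7)). A sketch with hypothetical helpers `unit_from_cgroup` and `unit_for_pid` that maps a PID back to its unit, assuming systemd's path layout (`/system.slice/<unit>`) and reading either the cgroup v1 `name=systemd` hierarchy or the v2 unified line with an empty controller list.

```python
from typing import Optional

UNIT_SUFFIXES = (".service", ".scope", ".slice")


def unit_from_cgroup(text: str) -> Optional[str]:
    """Return the innermost unit name found in /proc/<pid>/cgroup content."""
    for line in text.splitlines():
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        _hier_id, controllers, path = parts
        # cgroup v1: the "name=systemd" hierarchy; cgroup v2: empty controller list
        if controllers not in ("name=systemd", ""):
            continue
        for component in reversed(path.split("/")):
            if component.endswith(UNIT_SUFFIXES):
                return component
    return None


def unit_for_pid(pid: int) -> Optional[str]:
    with open(f"/proc/{pid}/cgroup") as f:
        return unit_from_cgroup(f.read())
```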
  27. socket activation
    Think inetd: systemd listens on a socket on behalf of a service. Upon connection, systemd starts the service and passes in the socket fds. systemd can listen on low ports and hand off the sockets to unprivileged services.
  28. socket activation
      ### /etc/systemd/system/php_fpm_user1.socket
      [Socket]
      ListenStream=/home/user1/run/php-fpm.sock

      ### /etc/systemd/system/php_fpm_user1.service
      [Unit]
      Requires=php_fpm_user1.socket
      After=php_fpm_user1.socket
      [Service]
      User=user1
      ExecStart=/bin/php-fpm --fpm-config=/home/user1/php-fpm.conf
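On the service side, systemd passes activated sockets as file descriptors starting at 3 and sets `LISTEN_FDS` (how many) and `LISTEN_PID` (which process they are meant for), per the protocol documented in sd_listen_fds(3). A minimal Python sketch of a service that honors it and falls back to binding its own socket when started by hand; the helper names and the fallback path `/tmp/demo.sock` are illustrative assumptions. Unlike `sd_listen_fds()`, it does not unset the variables for child processes.

```python
import contextlib
import os
import socket
from typing import List, Mapping, Optional

SD_LISTEN_FDS_START = 3  # first passed fd, per sd_listen_fds(3)


def listen_fds(environ: Mapping[str, str], pid: Optional[int] = None) -> List[int]:
    """Return the fds systemd passed to this process, or [] if there are none."""
    pid = os.getpid() if pid is None else pid
    try:
        if int(environ.get("LISTEN_PID", "")) != pid:
            return []  # the variables were meant for a different process
        count = int(environ.get("LISTEN_FDS", "0"))
    except ValueError:
        return []  # variables missing or malformed: not socket-activated
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))


def get_listener(fallback_path: str = "/tmp/demo.sock") -> socket.socket:
    """Use the first socket systemd passed in, else bind our own Unix socket."""
    fds = listen_fds(os.environ)
    if fds:
        # Python 3.7+ detects family and type from an existing fd.
        return socket.socket(fileno=fds[0])
    with contextlib.suppress(FileNotFoundError):
        os.unlink(fallback_path)  # remove a stale socket file
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind(fallback_path)
    sock.listen()
    return sock
```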
  29. journald

  30. journald is a replacement for syslog
    • rich logging: arbitrary key=values on messages
    • automatic logging for services: stdout/stderr automatically stored in the journal
    • optional Forward Secure Sealing for tamper-evident logging
    • forwards to rsyslog by default (in Fedora) for backwards compatibility
  31. journald captures stdout/stderr: free logging for services
      $ systemctl status elasticsearch
      elasticsearch.service - ElasticSearch
         Loaded: loaded (/etc/systemd/system/elasticsearch.service; enabled)
         Active: active (running) since Wed 2014-03-26 23:31:56 UTC; 1 day 2h ago
       Main PID: 10967 (java)
         CGroup: /system.slice/elasticsearch.service
                 └─10967 /bin/java -server -Djava.net.preferIPv4Stack=true
      Mar 26 23:54:20 host1 elasticsearch[10967]: [2014-03-26 23:54:05,143][WARN...n
      Mar 26 23:54:20 host1 elasticsearch[10967]: [2014-03-26 23:54:11,254][WARN...)
      Mar 26 23:54:20 host1 elasticsearch[10967]: [2014-03-26 23:54:14,773][WARN...n
      Mar 26 23:54:20 host1 elasticsearch[10967]: [2014-03-26 23:54:16,312][WARN...)
      Mar 26 23:54:20 host1 elasticsearch[10967]: [2014-03-26 23:54:12,847][WARN...)
      Mar 26 23:54:58 host1 elasticsearch[10967]: [2014-03-26 23:54:12,847][WARN...e
      Mar 26 23:55:33 host1 elasticsearch[10967]: Exception in thread "elasticse...e
      Mar 26 23:55:47 host1 elasticsearch[10967]: Exception in thread "elasticse...e
  32. journald - journalctl(1)
      $ journalctl -u elasticsearch.service -f
      Mar 26 23:54:20 host1 elasticsearch[10967]: [2014-03-26 23:54:05,143][WARN...n
      Mar 26 23:54:20 host1 elasticsearch[10967]: [2014-03-26 23:54:11,254][WARN...)
      Mar 26 23:54:20 host1 elasticsearch[10967]: [2014-03-26 23:54:14,773][WARN...n
      Mar 26 23:54:20 host1 elasticsearch[10967]: [2014-03-26 23:54:16,312][WARN...)
      Mar 26 23:54:20 host1 elasticsearch[10967]: [2014-03-26 23:54:12,847][WARN...)
      Mar 26 23:54:58 host1 elasticsearch[10967]: [2014-03-26 23:54:12,847][WARN...e
      Mar 26 23:55:33 host1 elasticsearch[10967]: Exception in thread "elasticse...e
      Mar 26 23:55:47 host1 elasticsearch[10967]: Exception in thread "elasticse...e

      $ journalctl -u elasticsearch.service -o verbose -f
      Wed 2014-03-26 23:56:55.116468 UTC [s=d309a8ac4d054931ac687cbdef5c444c;i=81c235;b=e77996bef6aa452fa456b07f037b6a55;m=3c76fadbe01;t=4f58b392966b4;x=dff372bece62bee3]
          PRIORITY=6
          SYSLOG_FACILITY=3
          _HOSTNAME=host1
          _TRANSPORT=stdout
          _SYSTEMD_SLICE=system.slice
          _COMM=java
          _EXE=/usr/java/jdk1.6.0_35/bin/java
          SYSLOG_IDENTIFIER=elasticsearch
          _SYSTEMD_CGROUP=/system.slice/elasticsearch.service
          _SYSTEMD_UNIT=elasticsearch.service
          _PID=10967
          _CMDLINE=[444B blob data]
          MESSAGE=Exception in thread "elasticsearch[joe-onebox-f19][transport_server_worker] [T#3]" java.lang.OutOfMemoryError: Java heap space
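The same fields are available in machine-readable form with `journalctl -o json`, which prints one JSON object per entry. A sketch (hypothetical `errors_from_journal`) that keeps entries at priority `err` (3) or more severe, using field names from the verbose output above. Note that the `OutOfMemoryError` above was stored at `PRIORITY=6` (info) because it arrived as a plain stdout line, so a priority filter alone would miss it. The sketch assumes `MESSAGE` is a string; binary or oversized values can be emitted differently, and it skips those.

```python
import json
from typing import Iterable, Iterator, Tuple


def errors_from_journal(lines: Iterable[str],
                        max_priority: int = 3) -> Iterator[Tuple[str, str]]:
    """Yield (unit, message) for entries with PRIORITY <= max_priority."""
    for line in lines:
        if not line.strip():
            continue
        entry = json.loads(line)
        # Field values are strings in JSON output, including PRIORITY.
        if int(entry.get("PRIORITY", 6)) > max_priority:
            continue
        message = entry.get("MESSAGE")
        if not isinstance(message, str):
            continue  # skip binary/oversized payloads
        yield entry.get("_SYSTEMD_UNIT", "?"), message


if __name__ == "__main__":
    import subprocess
    proc = subprocess.run(
        ["journalctl", "-u", "elasticsearch.service", "-o", "json", "-n", "100"],
        capture_output=True, text=True, check=True,
    )
    for unit, msg in errors_from_journal(proc.stdout.splitlines()):
        print(unit, msg)
```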
  33. journald - journalctl(1)
    The `--since` argument simplifies log-check utilities: there is no need to keep track of the log position between runs.
    check-journal.rb (sensu/nagios check): https://github.com/sensu/sensu-community-plugins/blob/master/plugins/logging/check-journal.rb
      $ check-journal.rb --journalctl_args='-u elasticsearch.service' -q Error \
          --since=-5minutes
      CheckJournal CRITICAL: 20 matches found for ‘Error’ in (threshold 1)
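The stateless pattern behind that check is easy to reproduce: let `journalctl --since` choose the time window, count matching lines, and compare against thresholds. A Python sketch of the same idea, not a port of check-journal.rb; `count_matches`, `evaluate`, `journal_lines` and the default thresholds are hypothetical. It reads messages with `-o cat`, which prints only the message text.

```python
import re
import subprocess
from typing import Iterable, List, Tuple

OK, WARNING, CRITICAL = 0, 1, 2


def count_matches(lines: Iterable[str], pattern: str) -> int:
    regex = re.compile(pattern)
    return sum(1 for line in lines if regex.search(line))


def evaluate(count: int, pattern: str, warn: int = 1, crit: int = 5) -> Tuple[int, str]:
    if count >= crit:
        return CRITICAL, f"CRITICAL: {count} matches for {pattern!r}"
    if count >= warn:
        return WARNING, f"WARNING: {count} matches for {pattern!r}"
    return OK, f"OK: {count} matches for {pattern!r}"


def journal_lines(unit: str, since: str = "-5min") -> List[str]:
    # journalctl picks the time window, so no log-position state is kept.
    out = subprocess.run(
        ["journalctl", "-u", unit, f"--since={since}", "-o", "cat", "-q"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.splitlines()
```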
  34. systemd-cat(1): similar to logger(1)
      $ crontab -l
      0 1 * * * /bin/sh /usr/bin/logstash_optimize_daily.sh | systemd-cat -t logstash_optimize_daily

      ### capture both stdout and stderr:
      $ systemd-cat -t my_script /bin/do_something.sh
  35. journald - rich logging
    Adding key=vals to logs requires client library support.
    • official support: C/C++, Python
    • third party: Go, node.js, Ruby, ...
  36. journald + logstash + kibana better together

  37. journald + logstash + kibana

  38. journald + logstash + kibana
    Getting logs from journald -> logstash:
    • journal2gelf - https://github.com/systemd/journal2gelf
    • beaver (WIP) - https://github.com/joemiller/beaver (my fork, not upstream yet)
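Bridges like journal2gelf translate journal entries into Graylog's GELF format, which Logstash can ingest with its `gelf` input. A rough sketch of that mapping, not journal2gelf's actual code; `journal_to_gelf` is a hypothetical name. It assumes GELF 1.1 field names (`version`, `host`, `short_message`, `timestamp`, `level`, and underscore-prefixed additional fields, where `_id` is not allowed) and journal entries from `journalctl -o json`, where `__REALTIME_TIMESTAMP` is microseconds since the epoch.

```python
import json
from typing import Any, Dict


def journal_to_gelf(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Map one `journalctl -o json` entry to a GELF 1.1 message dict."""
    gelf = {
        "version": "1.1",
        "host": entry.get("_HOSTNAME", "unknown"),
        "short_message": entry.get("MESSAGE", ""),
        # journald stores microseconds since the epoch as a string
        "timestamp": int(entry["__REALTIME_TIMESTAMP"]) / 1_000_000,
        "level": int(entry.get("PRIORITY", 6)),
    }
    for key, value in entry.items():
        if key in ("MESSAGE", "_HOSTNAME", "PRIORITY") or key.startswith("__"):
            continue  # already mapped, or journal-internal (cursor, timestamps)
        name = "_" + key.lstrip("_").lower()
        if name != "_id" and isinstance(value, (str, int, float)):
            gelf[name] = value
    return gelf


if __name__ == "__main__":
    line = ('{"__REALTIME_TIMESTAMP":"1395878215116468","_HOSTNAME":"host1",'
            '"PRIORITY":"6","MESSAGE":"Java heap space",'
            '"_SYSTEMD_UNIT":"elasticsearch.service","_PID":"10967"}')
    print(json.dumps(journal_to_gelf(json.loads(line))))
```

The sample timestamp is the `t=4f58b392966b4` value from the verbose output on slide 32, converted from hex.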
  39. journald caveat
    The journal is not yet very performant: CPU usage is high under heavy traffic. It is not currently a good choice for high-throughput streams such as a busy web server's access logs.
  40. Thank you! Q & A
    Additional systemd reading:
    • all of the man pages, especially systemd.service(5), systemd.exec(5), systemd.unit(5), systemd.resource-control(5)
    • http://0pointer.de/blog/projects/socket-activated-containers.html
    • http://dynacont.net/documentation/linux/Useful_SystemD_commands/