Don't reboot, debug!

Joshua Thijssen
September 18, 2015

Transcript

  1. Don't reboot, debug!
    A medic first aid course in debugging your server
    Joshua Thijssen
    @JayTaph

  2. It is
    It's not

  3. Have you tried turning it off and on again?

  4. Find the culprit

  6. Bottleneck Troubleshooting Flowchart (BTF)
    Site is slow or not responding.
    It’s your DB

  11. Other causes:
    ➡ Apache / PHP / nginx/php-fpm
    ➡ Monitoring / backup
    ➡ Hanging cron jobs & runaway tools
    ➡ Connectivity / DNS problems

  12. Linux 101
    201

  16. ➡ Isolated user space.
    ➡ PID (process id) and state.
    ➡ Kernel “preempts”, or process yields.
    ➡ Multitasking.

  20. ➡ R: Running or runnable
    ➡ S: Interruptible sleep
    ➡ D: Uninterruptible sleep
    ➡ Z: Defunct process (zombie)
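As a rough illustration of where these state letters come from: on Linux the third field of `/proc/<pid>/stat` holds the state. The sketch below parses such a line. The helper names (`parse_state`, `describe`, `current_process_state`) are made up for this example and the sample line is invented.

```python
# Map the state letters listed above to their descriptions.
STATE_NAMES = {
    "R": "running or runnable",
    "S": "interruptible sleep",
    "D": "uninterruptible sleep",
    "Z": "defunct (zombie)",
}


def parse_state(stat_line):
    """Return (pid, command, state letter) from one /proc/<pid>/stat line."""
    # The command sits between parentheses and may itself contain spaces
    # or ')' characters, so split on the LAST closing parenthesis.
    left, right = stat_line.rsplit(")", 1)
    pid, command = left.split(" (", 1)
    state = right.split()[0]
    return int(pid), command, state


def describe(stat_line):
    """Turn a stat line into a short human-readable summary."""
    pid, command, state = parse_state(stat_line)
    return f"{pid} {command}: {STATE_NAMES.get(state, 'other state ' + state)}"


def current_process_state():
    """Linux only: describe the process that runs this script."""
    with open("/proc/self/stat") as handle:
        return describe(handle.read())


# Invented sample line, shortened to the first few fields.
print(describe("1234 (php-fpm: pool www) S 1 1234 1234 0 -1"))
```

On a Linux machine, `current_process_state()` would normally report the running script as `R`, since it is executing at the moment it reads its own entry.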

  23. ➡ Most processes are sleeping.
    ➡ External processes (and the kernel) can “wake up” a process at any time by sending “signals”.
    ➡ Fire signals with “kill”.
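A minimal sketch of signal delivery, assuming a Unix system and a script running in its main thread. The process installs a handler and then signals itself, which is what `kill -USR1 <pid>` would do from a shell. The names `received` and `on_signal` are only for illustration.

```python
import os
import signal

received = []


def on_signal(signum, frame):
    # Runs in the receiving process when the signal is delivered.
    received.append(signal.Signals(signum).name)


# Install a handler for SIGUSR1, then send that signal to ourselves.
signal.signal(signal.SIGUSR1, on_signal)
os.kill(os.getpid(), signal.SIGUSR1)

print(received)
```

SIGUSR1 is used here because it has no predefined meaning. A signal such as SIGKILL cannot be caught by a handler at all.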

  26. ➡ Uninterruptible means it won’t handle signals (directly), but waits on its task to finish (it must wake up by itself).
    ➡ Used for high-performance loops that need to focus (like I/O).
    ➡ Can still be preempted by the kernel.

  29. ➡ Zombies aren’t bad.
    ➡ It’s just bad programming or administration that creates zombies.
    ➡ But there shouldn’t be many.

  30. Load average

  33. ➡ 1-minute, 5-minute and 15-minute averages
    ➡ Calculated as the number of runnable processes (but has more sources nowadays; Linux also counts processes in uninterruptible sleep).
    ➡ Depends on the number of CPUs!

  39. 14:57:22 up 35 days, 18:57, 1 user, load average: 1.52, 0.66, 0.27
    ➡ 1.52 average runnable processes in the last minute.
    ➡ 0.66 average in the last 5 minutes.
    ➡ 0.27 average in the last 15 minutes.
    ➡ Single CPU: 52% more than it can handle.
    ➡ Quad-core system: not doing very much.
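The single-CPU versus quad-core reading comes down to dividing the load by the number of CPUs. A small sketch of that arithmetic follows. `load_per_cpu` and `current_load_per_cpu` are illustrative names, while `os.getloadavg()` and `os.cpu_count()` are standard-library calls (the former on Unix only).

```python
import os


def load_per_cpu(load, cpu_count):
    """Express a load average relative to the number of CPUs."""
    return load / cpu_count


def current_load_per_cpu():
    """Return the 1, 5 and 15 minute averages of this machine, per CPU."""
    cpus = os.cpu_count() or 1
    return tuple(load_per_cpu(value, cpus) for value in os.getloadavg())


# The 1-minute figure from the uptime line above.
print(f"single CPU: {load_per_cpu(1.52, 1):.2f}")
print(f"quad core:  {load_per_cpu(1.52, 4):.2f}")
```

A value above 1.0 per CPU suggests more runnable work than the machine can handle. The quad-core case works out to 0.38 per CPU.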

  40. Q: How much memory does this process use?
    This is a REALLY hard question to answer!
    It depends on many factors!

  41. ➡ Virtual memory (VIRT)
    ➡ Shared memory (SHR, SHRD)
    ➡ Resident memory (RES, RSS)
    ➡ Swapped memory (SWP, SWAP)

  45. (on a 32-bit system)
    ➡ Each process has a usable memory space of 4 GB.
    ➡ Even if you have less memory installed.
    ➡ 1 GB is reserved for the kernel.

  48. [Diagram: virtual memory runs from 0x00000000 to 0xFFFFFFFF, with 3 GB below 0xC0000000 and 1 GB above it; a translation table maps virtual memory onto physical memory]

  53. [Diagram: Process A, Process B and Process C next to physical memory]

  54. Allocating memory
    ➡ New phone book entries are created.
    ➡ VIRT will increase.
    ➡ Allocating memory != using memory.
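One way to see that allocating is not the same as using, assuming Linux, where `/proc/self/status` reports `VmSize` (virtual) and `VmRSS` (resident) in kB. The sketch maps a region, then touches every page. The helper names are hypothetical and the 64 MB size is arbitrary.

```python
import mmap


def parse_vm_kb(status_text):
    """Extract VmSize and VmRSS (in kB) from /proc/<pid>/status text."""
    values = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("VmSize", "VmRSS"):
            values[key] = int(rest.split()[0])
    return values


def read_own_memory():
    """Linux only: virtual and resident size of the current process."""
    with open("/proc/self/status") as handle:
        return parse_vm_kb(handle.read())


def allocate_then_touch(size=64 * 1024 * 1024):
    """Return memory figures before allocating, after allocating, after using."""
    before = read_own_memory()
    region = mmap.mmap(-1, size)        # allocate: anonymous mapping
    allocated = read_own_memory()
    for offset in range(0, size, mmap.PAGESIZE):
        region[offset] = 1              # use: write one byte to every page
    touched = read_own_memory()
    region.close()
    return before, allocated, touched
```

Calling `allocate_then_touch()` on a Linux box would typically show `VmSize` growing right after the mapping is created, while `VmRSS` only grows once the pages have been written to.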

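  55. $pid = pcntl_fork();
    if ($pid === -1) {
        // pcntl_fork() returns -1 when the fork failed
        exit("Could not fork\n");
    } elseif ($pid) {
        // Parent: $pid is the child's process id
        echo "Hello, this is the parent process\n";
        pcntl_wait($status); // reap the child so it does not become a zombie
    } else {
        // Child: pcntl_fork() returned 0
        echo "Hello, this is the child process\n";
    }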

  57. [Diagram: fork() on Process A results in Process A and Process B]

  58. [Diagram: fork() => two virtual memory spaces, with pages A1, B1, C1 and A1`, B1`, C1`, both backed by the same physical pages A1, B1, C1]

  59. [Diagram: fork() => one virtual memory space now holds B2 instead of B1`; physical memory holds A1, B1, C1 plus a new page B2]
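The visible effect of the fork diagrams can be sketched as follows, assuming a Unix system. The child changes `B1` into `B2` and the parent never notices. This does not show physical pages being shared (that happens inside the kernel); it only shows that each process sees its own copy after a write. `fork_and_modify` is an illustrative name.

```python
import os


def fork_and_modify():
    """Fork, let the child change one value, and report what each side sees."""
    data = ["A1", "B1", "C1"]
    read_end, write_end = os.pipe()

    pid = os.fork()
    if pid == 0:
        # Child: this write only changes the child's copy of the list.
        os.close(read_end)
        data[1] = "B2"
        os.write(write_end, ",".join(data).encode())
        os._exit(0)

    # Parent: read what the child saw, then reap it so no zombie is left.
    os.close(write_end)
    child_view = os.read(read_end, 1024).decode().split(",")
    os.close(read_end)
    os.waitpid(pid, 0)
    return data, child_view


parent_view, child_view = fork_and_modify()
print("parent:", parent_view)
print("child: ", child_view)
```

The pipe is only there to get the child's view back to the parent, because after `fork()` the two processes no longer share writable memory.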

  60. How much memory is our server using?

  62. $ free -m
                 total       used       free     shared    buffers     cached
    Mem:          3963       3500        462          0        722       1263
    -/+ buffers/cache:       1515       2448
    Swap:          400         20        379
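The `-/+ buffers/cache` row can be recomputed from the `Mem` row: buffers and cache are subtracted from "used" and added to "free". A short sketch of that arithmetic, with `buffers_cache_row` as a made-up helper name:

```python
def buffers_cache_row(used, free, buffers, cached):
    """Recompute the '-/+ buffers/cache' row of `free` from the 'Mem' row."""
    used_by_programs = used - buffers - cached
    available_to_programs = free + buffers + cached
    return used_by_programs, available_to_programs


# Values in MB, taken from the Mem row above.
print(buffers_cache_row(used=3500, free=462, buffers=722, cached=1263))
```

This gives 1515 MB used and 2447 MB free. The output above shows 2448, and the 1 MB difference is presumably rounding, since `free -m` rounds each column separately. So only about 1.5 GB of the 3.5 GB "used" is held by programs.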

  63. Monitoring

  64. ➡ Monitor everything!
    ➡ System / infrastructure
    ➡ Application level

  65. ➡ With monitoring you have an excellent idea of:
    ➡ what is happening
    ➡ what happened
    ➡ what is likely to happen

  66. Logging

  67. ➡ Log everything from everywhere.
    ➡ Filter later.

  68. ➡ syslog
    ➡ files
    ➡ mail
    ➡ Slack / HipChat / IRC
    ➡ Logstash
    $ php composer.phar require monolog/monolog
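The "log everything, filter later" idea can be sketched with one logger that accepts every level and several handlers that each apply their own threshold. The slide points at Monolog for PHP; the sketch below uses Python's standard `logging` module instead, with in-memory streams standing in for the real destinations. The channel name `app` and the function `build_logger` are placeholders.

```python
import io
import logging


def build_logger(everything, problems_only):
    """Send every record to one stream and only warnings and up to another."""
    logger = logging.getLogger("app")
    logger.setLevel(logging.DEBUG)       # log everything...
    logger.handlers.clear()
    logger.propagate = False

    full = logging.StreamHandler(everything)
    full.setLevel(logging.DEBUG)

    filtered = logging.StreamHandler(problems_only)
    filtered.setLevel(logging.WARNING)   # ...and filter later, per destination

    for handler in (full, filtered):
        handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


all_records, problems = io.StringIO(), io.StringIO()
log = build_logger(all_records, problems)
log.debug("cache miss for key %s", "acl")
log.error("database connection failed")

print(all_records.getvalue(), end="")
print(problems.getvalue(), end="")
```

In a real setup the two streams would be replaced by handlers for the destinations listed above, for example `logging.handlers.SysLogHandler` or `logging.FileHandler`.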

  69. System tools

  70. ➡ Most daemons will log into /var/log
    ➡ tail -f /var/log/messages

  71. ➡ strace displays system calls and signals
    ➡ Communication between applications and the kernel.

  72. $ strace -ff -p <pid>
    ....
    socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 20
    fcntl(20, F_GETFL) = 0x2 (flags O_RDWR)
    fcntl(20, F_SETFL, O_RDWR|O_NONBLOCK) = 0
    connect(20, {sa_family=AF_INET, sin_port=htons(11211), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
    poll([{fd=20, events=POLLOUT}], 1, -1) = 1 ([{fd=20, revents=POLLOUT}])
    write(20, "get ez_client1/acls/"..., 44) = 44
    read(20, "END\r\n", 8196) = 5
    write(20, "get ez_client1/acl/g"..., 40) = 40
    read(20, "END\r\n", 8196) = 5
    write(20, "quit\r\n", 6) = 6
    shutdown(20, 2 /* send and receive */) = 0
    close(20) = 0
    mkdir("/tmp/smarty", 0777) = -1 EEXIST (File exists)
    chmod("/tmp/smarty", 0777) = 0
    mkdir("/tmp/smarty", 0777) = -1 EEXIST (File exists)
    chmod("/tmp/smarty", 0777) = 0
    access("/userdata/client1/user/templates/nl/block-right-last.tpl", F_OK) = -1 ENOENT (No such file or directory)
    access("/userdata/client1/user/templates/block-right-last.tpl", F_OK) = -1 ENOENT (No such file or directory)
    access("/userdata/client1/theme/templates/nl/block-right-last.tpl", F_OK) = -1 ENOENT (No such file or directory)
    access("/userdata/client1/theme/templates/block-right-last.tpl", F_OK) = -1 ENOENT (No such file or directory)
    access("/etc/noxlogic/root/themes/ezshopping/templates/nl/block-right-last.tpl", F_OK) = -1 ENOENT (No such file or directory)
    access("/etc/noxlogic/root/themes/ezshopping/templates/block-right-last.tpl", F_OK) = -1 ENOENT (No such file or directory)
    mkdir("/tmp/smarty", 0777) = -1 EEXIST (File exists)
    chmod("/tmp/smarty", 0777) = 0
    access("/userdata/client1/user/templates/nl/block-right.tpl", F_OK) = -1 ENOENT (No such file or directory)
    access("/userdata/client1/user/templates/block-right.tpl", F_OK) = -1 ENOENT (No such file or directory)
    access("/userdata/client1/theme/templates/nl/block-right.tpl", F_OK) = -1 ENOENT (No such file or directory)
    access("/userdata/client1/theme/templates/block-right.tpl", F_OK) = -1 ENOENT (No such file or directory)
    access("/etc/noxlogic/root/themes/ezshopping/templates/nl/block-right.tpl", F_OK) = -1 ENOENT (No such file or directory)
    access("/etc/noxlogic/root/themes/ezshopping/templates/block-right.tpl", F_OK) = -1 ENOENT (No such file or directory)
    mkdir("/tmp/smarty", 0777) = -1 EEXIST (File exists)

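A trace like the one above gets long quickly, so it can help to summarise the failing calls. The sketch below counts failed system calls per call name and error code. It assumes the plain `name(args) = -1 ERRNO (...)` layout shown above and does not handle `[pid N]` prefixes or unfinished calls. `count_failures` is a hypothetical helper and the sample lines are copied from the trace.

```python
import re
from collections import Counter

# Matches e.g.: access("/path", F_OK) = -1 ENOENT (No such file or directory)
FAILED_CALL = re.compile(r"^(\w+)\(.*\)\s+=\s+-1\s+(\w+)")


def count_failures(strace_lines):
    """Count failed system calls per (syscall name, errno name)."""
    failures = Counter()
    for line in strace_lines:
        match = FAILED_CALL.match(line.strip())
        if match:
            failures[match.groups()] += 1
    return failures


sample = [
    'mkdir("/tmp/smarty", 0777) = -1 EEXIST (File exists)',
    'chmod("/tmp/smarty", 0777) = 0',
    'access("/userdata/client1/user/templates/nl/block-right.tpl", F_OK) = -1 ENOENT (No such file or directory)',
    'access("/userdata/client1/user/templates/block-right.tpl", F_OK) = -1 ENOENT (No such file or directory)',
]

for (syscall, error), count in count_failures(sample).most_common():
    print(syscall, error, count)
```

Run against the full trace, a summary like this makes the repeated `access()` misses on template files stand out: the application probes several directories for every template it renders.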

  73. $ strace ping www.google.com
    ....
    mprotect(0xb757f000, 4096, PROT_READ) = 0
    munmap(0xb76d8000, 44104) = 0
    stat64("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=59, ...}) = 0
    socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 3
    connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.178.4")}, 16) = 0
    gettimeofday({1347446161, 382120}, NULL) = 0
    poll([{fd=3, events=POLLOUT}], 1, 0) = 1 ([{fd=3, revents=POLLOUT}])
    send(3, "u\205\1\0\0\1\0\0\0\0\0\0\3www\6google\3com\0\0\1\0\1", 32, MSG_NOSIGNAL) = 32
    poll([{fd=3, events=POLLIN}], 1, 5000


  74. ➡ strace -e trace=open
    ➡ strace -ff -p <pid>

  75. SystemTap / DTrace

  76. ➡ Unobtrusive probes inside the kernel
    ➡ Scripts written in the D language.
    ➡ Sun / Solaris origin; not in the mainline Linux kernel (licensing)

  77. ➡ SystemTap
    ➡ “GPL” version of DTrace
    ➡ Awesome, but complex
    ➡ But you need / want debug info packages

  78. probe syscall.open {
        printf("%s(%d) open (%s)\n", execname(), pid(), argstr);
    }
    $ stap syscall.stp

  79. ➡ There are some “providers” in the PHP core (zend_dtrace.{c,h,d})
    ➡ file / line
    ➡ function entry / exit
    ➡ exception caught / thrown

  80. Other tools
    ➡ Valgrind
    ➡ GDB
    ➡ Xdebug / Profiler
    ➡ MySQL Proxy / Charles

  81. Find me on twitter: @jaytaph
    Find me for development and training: www.noxlogic.nl
    Find me on email: [email protected]
    Find me for blogs: www.adayinthelifeof.nl
    Thank You!
    https://joind.in/talk/view/15191
