Slide 1

Slide 1 text

1 Don't reboot, debug! A medic first aid course in debugging your server Joshua Thijssen @JayTaph

Slide 2

Slide 2 text

2

Slide 3

Slide 3 text

3

Slide 4

Slide 4 text

4 It's not

Slide 5

Slide 5 text

4 It is It's not

Slide 6

Slide 6 text

Have you tried turning it off and on again? 5

Slide 7

Slide 7 text

Find the culprit 6

Slide 8

Slide 8 text

Bottleneck Troubleshooting Flowchart (BTF) 7

Slide 9

Slide 9 text

Site is slow or not responding. It’s your DB Bottleneck Troubleshooting Flowchart (BTF) 7

Slide 10

Slide 10 text

8 Other causes:

Slide 11

Slide 11 text

➡ Apache / PHP / nginx/php-fpm 8 Other causes:

Slide 12

Slide 12 text

➡ Apache / PHP / nginx/php-fpm ➡ Monitoring / backup 8 Other causes:

Slide 13

Slide 13 text

➡ Apache / PHP / nginx/php-fpm ➡ Monitoring / backup ➡ Hanging cron jobs & runaway tools 8 Other causes:

Slide 14

Slide 14 text

➡ Apache / PHP / nginx/php-fpm ➡ Monitoring / backup ➡ Hanging cron jobs & runaway tools ➡ Connectivity / DNS problems 8 Other causes:

Slide 15

Slide 15 text

9 Linux 101

Slide 16

Slide 16 text

9 Linux 101 201

Slide 17

Slide 17 text

10 Processes

Slide 18

Slide 18 text

11

Slide 19

Slide 19 text

11 ➡ Isolated user space.

Slide 20

Slide 20 text

11 ➡ Isolated user space. ➡ PID (process id) and state.

Slide 21

Slide 21 text

11 ➡ Isolated user space. ➡ PID (process id) and state. ➡ Kernel “preempts”, or process yields.

Slide 22

Slide 22 text

11 ➡ Isolated user space. ➡ PID (process id) and state. ➡ Kernel “preempts”, or process yields. ➡ Multitasking.

Slide 23

Slide 23 text

12

Slide 24

Slide 24 text

12 ➡ R Running or runnable

Slide 25

Slide 25 text

12 ➡ R Running or runnable ➡ S Interruptible sleep

Slide 26

Slide 26 text

12 ➡ R Running or runnable ➡ S Interruptible sleep ➡ D Uninterruptible sleep

Slide 27

Slide 27 text

12 ➡ R Running or runnable ➡ S Interruptible sleep ➡ D Uninterruptible sleep ➡ Z Defunct process (zombies)
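
The state letters above are what `ps` prints in its STAT column. A minimal Python sketch, assuming text in the shape of `ps -eo pid,stat,comm` output, that tallies processes per state. The sample rows are made up for illustration and do not come from the talk.

```python
from collections import Counter

# State letters from the slide; ps may append modifiers such as "s" or "+".
STATE_NAMES = {
    "R": "running or runnable",
    "S": "interruptible sleep",
    "D": "uninterruptible sleep",
    "Z": "defunct (zombie)",
}


def count_states(ps_output):
    """Tally processes per state letter from `ps -eo pid,stat,comm` style text."""
    counts = Counter()
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 2:
            counts[fields[1][0]] += 1  # first character is the main state
    return counts


# Hypothetical sample, not taken from a real server.
SAMPLE = """\
  PID STAT COMMAND
    1 Ss   init
  812 S    nginx
  901 R+   ps
  955 D    rsync
  977 Z    worker
"""

if __name__ == "__main__":
    for state, total in sorted(count_states(SAMPLE).items()):
        print(state, STATE_NAMES.get(state, "other"), total)
```

A growing D or Z count is usually the interesting part; a large S count is normal.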

Slide 28

Slide 28 text

13

Slide 29

Slide 29 text

14

Slide 30

Slide 30 text

14 ➡ Most processes are sleeping.

Slide 31

Slide 31 text

14 ➡ Most processes are sleeping. ➡ External processes (and the kernel) can “wake up” a process at any time by sending “signals”.

Slide 32

Slide 32 text

14 ➡ Most processes are sleeping. ➡ External processes (and the kernel) can “wake up” a process at any time by sending “signals”. ➡ Fire signals with “kill”.
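
What `kill` does can also be done from code. A small Python sketch, assuming a Unix system and a script running in its main thread: it registers a handler for SIGUSR1 and then sends that signal to its own PID, much like running `kill -USR1 <pid>` from a shell. The function name is hypothetical.

```python
import os
import signal
import time


def send_usr1_to_self():
    """Send SIGUSR1 to this process and return the signal numbers handled."""
    received = []

    def on_usr1(signum, frame):
        # Runs when the signal is delivered to this process.
        received.append(signum)

    previous = signal.signal(signal.SIGUSR1, on_usr1)
    try:
        os.kill(os.getpid(), signal.SIGUSR1)
        # Give the interpreter a moment to run the Python-level handler.
        deadline = time.monotonic() + 1.0
        while not received and time.monotonic() < deadline:
            time.sleep(0.01)
    finally:
        signal.signal(signal.SIGUSR1, previous)  # restore the old handler
    return received


if __name__ == "__main__":
    print("handled signals:", send_usr1_to_self())
```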

Slide 33

Slide 33 text

15

Slide 34

Slide 34 text

15 ➡ Uninterruptible means it won’t handle signals (directly), but waits on its task to finish (it must wake up by itself).

Slide 35

Slide 35 text

15 ➡ Uninterruptible means it won’t handle signals (directly), but waits on its task to finish (it must wake up by itself). ➡ Used for high-performance loops that need to focus (like I/O).

Slide 36

Slide 36 text

15 ➡ Uninterruptible means it won’t handle signals (directly), but waits on its task to finish (it must wake up by itself). ➡ Used for high-performance loops that need to focus (like I/O). ➡ Still can be preempted by the kernel.

Slide 37

Slide 37 text

16

Slide 38

Slide 38 text

16 ➡ Zombies aren’t bad.

Slide 39

Slide 39 text

16 ➡ Zombies aren’t bad. ➡ It’s just bad programming or administration that creates zombies.

Slide 40

Slide 40 text

16 ➡ Zombies aren’t bad. ➡ It’s just bad programming or administration that creates zombies. ➡ But there shouldn’t be many.

Slide 41

Slide 41 text

17 Load average

Slide 42

Slide 42 text

18

Slide 43

Slide 43 text

18 ➡ 1 minute, 5 minutes, 15 minutes averages

Slide 44

Slide 44 text

18 ➡ 1 minute, 5 minutes, 15 minutes averages ➡ Calculated from the number of runnable processes (Linux also counts processes in uninterruptible sleep).

Slide 45

Slide 45 text

18 ➡ 1 minute, 5 minutes, 15 minutes averages ➡ Calculated from the number of runnable processes (Linux also counts processes in uninterruptible sleep). ➡ Depends on the number of CPUs!

Slide 46

Slide 46 text

19 14:57:22 up 35 days, 18:57, 1 user, load average: 1.52, 0.66, 0.27

Slide 47

Slide 47 text

19 14:57:22 up 35 days, 18:57, 1 user, load average: 1.52, 0.66, 0.27 ➡ 1.52 average runnable processes in the last minute.

Slide 48

Slide 48 text

19 14:57:22 up 35 days, 18:57, 1 user, load average: 1.52, 0.66, 0.27 ➡ 1.52 average runnable processes in the last minute. ➡ 0.66 average in 5 minutes

Slide 49

Slide 49 text

19 14:57:22 up 35 days, 18:57, 1 user, load average: 1.52, 0.66, 0.27 ➡ 1.52 average runnable processes in the last minute. ➡ 0.66 average in 5 minutes ➡ 0.27 average in 15 minutes.

Slide 50

Slide 50 text

19 14:57:22 up 35 days, 18:57, 1 user, load average: 1.52, 0.66, 0.27 ➡ 1.52 average runnable processes in the last minute. ➡ 0.66 average in 5 minutes ➡ 0.27 average in 15 minutes. ➡ Single CPU: 52% more than it can handle.

Slide 51

Slide 51 text

19 14:57:22 up 35 days, 18:57, 1 user, load average: 1.52, 0.66, 0.27 ➡ 1.52 average runnable processes in the last minute. ➡ 0.66 average in 5 minutes ➡ 0.27 average in 15 minutes. ➡ Single CPU: 52% more than it can handle. ➡ Quad core system: not doing very much
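
The single CPU versus quad core comparison comes down to dividing the load by the number of CPUs. A minimal Python sketch, assuming the Linux `uptime` line format shown on the slide; the helper names are hypothetical. For live values, `os.getloadavg()` and `os.cpu_count()` can replace the parsed line and the hard-coded CPU counts.

```python
def parse_load_averages(uptime_line):
    """Extract the 1, 5 and 15 minute load averages from an `uptime` line."""
    _, _, tail = uptime_line.partition("load average:")
    return tuple(float(part) for part in tail.split(","))


def load_per_cpu(load, cpu_count):
    """Load relative to capacity: above 1.0 means more work than CPUs."""
    return load / cpu_count


# The line from the slide.
UPTIME_LINE = "14:57:22 up 35 days, 18:57, 1 user, load average: 1.52, 0.66, 0.27"

if __name__ == "__main__":
    one_minute, five_minutes, fifteen_minutes = parse_load_averages(UPTIME_LINE)
    for cpus in (1, 4):
        print(cpus, "CPU(s):", round(load_per_cpu(one_minute, cpus), 2))
```

With one CPU the ratio is 1.52, so 52% over capacity; with four CPUs it is 0.38.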

Slide 52

Slide 52 text

20 Memory

Slide 53

Slide 53 text

21 Q: How much memory does this process use? This is a REALLY hard question to answer! It depends on many factors!

Slide 54

Slide 54 text

22

Slide 55

Slide 55 text

22

Slide 56

Slide 56 text

23 ➡ Virtual memory (VIRT) ➡ Shared memory (SHR, SHRD) ➡ Resident memory (RES or RSS) ➡ Swapped memory (SWP, SWAP)

Slide 57

Slide 57 text

24 (on a 32bit system)

Slide 58

Slide 58 text

24 ➡ Each process has 4GB memory space usable. (on a 32bit system)

Slide 59

Slide 59 text

24 ➡ Each process has 4GB memory space usable. ➡ Even if you have less memory installed. (on a 32bit system)

Slide 60

Slide 60 text

24 ➡ Each process has 4GB memory space usable. ➡ Even if you have less memory installed. ➡ 1GB is reserved for the kernel. (on a 32bit system)

Slide 61

Slide 61 text

25 0x00000000 0xC0000000 0xFFFFFFFF 1 GB 3 GB Virtual memory

Slide 62

Slide 62 text

25 0x00000000 0xC0000000 0xFFFFFFFF 1 GB 3 GB Virtual memory Translation table

Slide 63

Slide 63 text

25 0x00000000 0xC0000000 0xFFFFFFFF 1 GB 3 GB Virtual memory Translation table Physical memory
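
The 3 GB / 1 GB labels follow directly from the addresses on the slide. A small Python sketch of the arithmetic, assuming the boundary at 0xC0000000 shown above; the function name is hypothetical.

```python
GIB = 1024 ** 3
ADDRESS_SPACE = 2 ** 32    # 32-bit addresses: 0x00000000 to 0xFFFFFFFF
KERNEL_START = 0xC0000000  # boundary shown on the slide


def split_in_gib(boundary=KERNEL_START, total=ADDRESS_SPACE):
    """Return (size below the boundary, size above it) in GiB."""
    return boundary / GIB, (total - boundary) / GIB


if __name__ == "__main__":
    below, above = split_in_gib()
    print("below boundary:", below, "GiB")
    print("above boundary:", above, "GiB")
```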

Slide 64

Slide 64 text

26 Process A Process B Process C Physical Memory

Slide 65

Slide 65 text

26 Process A Process B Process C Physical Memory & & &

Slide 69

Slide 69 text

27 Allocating memory ➡ New phone book entries are created. ➡ VIRT will increase. ➡ Allocating memory != using memory.
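
The difference between allocating and using can be observed on Linux through `/proc/self/status`, where VmSize corresponds to VIRT and VmRSS to RES. A Python sketch under that assumption: the parser works on any text in that format, the BEFORE and AFTER samples are invented numbers, and the live part only runs where `/proc` exists.

```python
import mmap
import os


def parse_status_kb(status_text):
    """Return VmSize and VmRSS in kB from /proc/<pid>/status style text."""
    wanted = {"VmSize", "VmRSS"}
    values = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in wanted:
            values[key] = int(rest.split()[0])  # value is followed by "kB"
    return values


# Hypothetical snapshots around a 256 MiB allocation that is never written to.
BEFORE = "Name:\tdemo\nVmSize:\t   20480 kB\nVmRSS:\t    8192 kB\n"
AFTER = "Name:\tdemo\nVmSize:\t  282624 kB\nVmRSS:\t    8200 kB\n"

if __name__ == "__main__" and os.path.exists("/proc/self/status"):
    def snapshot():
        with open("/proc/self/status") as handle:
            return parse_status_kb(handle.read())

    before = snapshot()
    region = mmap.mmap(-1, 256 * 1024 * 1024)  # allocate, but never touch it
    after = snapshot()
    print("VIRT grew by", after["VmSize"] - before["VmSize"], "kB")
    print("RES grew by", after["VmRSS"] - before["VmRSS"], "kB")
```

Writing to the mapped region afterwards is what makes the resident size catch up.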

Slide 70

Slide 70 text

28

Slide 71

Slide 71 text

Slide 72

Slide 72 text

30 Process A

Slide 73

Slide 73 text

30 Process A fork()

Slide 74

Slide 74 text

30 Process A Process B fork()

Slide 75

Slide 75 text

31 C1 B1 A1 C1` B1` A1` A1 B1 C1 Physical Virtual Virtual fork() =>

Slide 76

Slide 76 text

32 C1 B1 A1 C1` B2 A1` A1 B1 C1 Physical Virtual Virtual fork() => B2
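
The effect in the diagram can be seen from user space with `fork()`. A Python sketch, assuming a Unix system, with the page names A1, B1, C1 borrowed from the slide as list entries: one process changes B1 to B2 after the fork and the other still sees B1. Having the child do the write is an assumption for the example. The copy-on-write itself happens in the kernel at page level and is not visible here; the sketch only shows that the write stays private.

```python
import os


def fork_and_modify():
    """Fork, change one entry in the child, return (parent view, child view)."""
    data = ["A1", "B1", "C1"]
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: writes to its own copy and reports what it sees.
        os.close(read_fd)
        data[1] = "B2"
        os.write(write_fd, ",".join(data).encode())
        os.close(write_fd)
        os._exit(0)
    # Parent: its view of the data is unaffected by the child's write.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        child_view = reader.read().split(",")
    os.waitpid(pid, 0)  # reap the child so it does not linger as a zombie
    return data, child_view


if __name__ == "__main__":
    parent_view, child_view = fork_and_modify()
    print("parent:", parent_view)
    print("child: ", child_view)
```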

Slide 77

Slide 77 text

33

Slide 78

Slide 78 text

34 How much memory is our server using?

Slide 79

Slide 79 text

35
$ free -m
             total       used       free     shared    buffers     cached
Mem:          3963       3500        462          0        722       1263
-/+ buffers/cache:       1515       2448
Swap:          400         20        379
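
The "-/+ buffers/cache" row is derived from the "Mem" row: buffers and cache can be reclaimed, so they count as available to applications. A minimal Python sketch of that arithmetic using the numbers from the slide; the function name is hypothetical and the column layout is the older `free` format shown above.

```python
def memory_summary(used, free, buffers, cached):
    """Values in MB, as printed by `free -m` in the older column layout."""
    reclaimable = buffers + cached
    return {
        "used_by_applications": used - reclaimable,
        "available_for_applications": free + reclaimable,
    }


if __name__ == "__main__":
    # Numbers from the "Mem" row on the slide.
    print(memory_summary(used=3500, free=462, buffers=722, cached=1263))
```

The sum for available memory comes out at 2447 rather than the 2448 on the slide, because `free -m` rounds each column separately.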

Slide 81

Slide 81 text

36 Monitoring

Slide 82

Slide 82 text

37 ➡ Monitor everything! ➡ System / infrastructure ➡ Application level

Slide 83

Slide 83 text

38

Slide 84

Slide 84 text

39

Slide 85

Slide 85 text

40 ➡ With monitoring you have an excellent idea of: ➡ what is happening ➡ what happened ➡ what will likely happen

Slide 86

Slide 86 text

41 Logging

Slide 87

Slide 87 text

42 ➡ Log everything from everywhere. ➡ filter later.

Slide 88

Slide 88 text

43 ➡ syslog ➡ files ➡ mail ➡ slack / hipchat / irc ➡ logstash
$ php composer.phar require monolog/monolog
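
The talk installs Monolog for PHP. The same "log everything, filter later" idea can be sketched with Python's standard `logging` module, which is an analogue and not Monolog's API: the logger records everything, and each destination decides what it keeps. The logger name and the two streams are hypothetical stand-ins for a log file and an alert channel.

```python
import io
import logging


def build_logger(full_stream, alert_stream):
    """Log everything to one destination, only errors to another."""
    logger = logging.getLogger("app.demo")  # hypothetical logger name
    logger.setLevel(logging.DEBUG)          # record everything
    logger.handlers.clear()
    logger.propagate = False

    full = logging.StreamHandler(full_stream)
    full.setLevel(logging.DEBUG)

    alerts = logging.StreamHandler(alert_stream)
    alerts.setLevel(logging.ERROR)          # filter per destination

    for handler in (full, alerts):
        handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    everything, urgent = io.StringIO(), io.StringIO()
    log = build_logger(everything, urgent)
    log.debug("cache miss")
    log.error("database unreachable")
    print(everything.getvalue(), end="")
    print("alerts only:", urgent.getvalue(), end="")
```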

Slide 89

Slide 89 text

44 System tools

Slide 90

Slide 90 text

45

Slide 91

Slide 91 text

TAIL 46

Slide 92

Slide 92 text

47 ➡ Most daemons will log into /var/log ➡ tail -f /var/log/messages
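
What `tail -f` does can be approximated in a few lines: open the file, jump to the end, and keep reading whatever gets appended. A rough Python sketch with hypothetical helper names; unlike the real tool it does not handle log rotation or truncation.

```python
import os
import tempfile
import time


def open_at_end(path):
    """Open a log file positioned at its current end."""
    handle = open(path, "r")
    handle.seek(0, os.SEEK_END)
    return handle


def read_appended_lines(handle):
    """Return lines added to the file since the handle's current position."""
    return [line.rstrip("\n") for line in handle.readlines()]


def follow(path, poll_seconds=1.0):
    """Print new lines forever, roughly like `tail -f` (stop with Ctrl-C)."""
    with open_at_end(path) as handle:
        while True:
            for line in read_appended_lines(handle):
                print(line)
            time.sleep(poll_seconds)
```

Calling `follow("/var/log/messages")` would mimic the command on the slide, given read permission on that file.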

Slide 93

Slide 93 text

48 strace

Slide 94

Slide 94 text

49 ➡ strace displays system calls and signals ➡ Communication between applications and the kernel.

Slide 95

Slide 95 text

50
$ strace -ff -p ....
socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 20
fcntl(20, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(20, F_SETFL, O_RDWR|O_NONBLOCK) = 0
connect(20, {sa_family=AF_INET, sin_port=htons(11211), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
poll([{fd=20, events=POLLOUT}], 1, -1) = 1 ([{fd=20, revents=POLLOUT}])
write(20, "get ez_client1/acls/"..., 44) = 44
read(20, "END\r\n", 8196) = 5
write(20, "get ez_client1/acl/g"..., 40) = 40
read(20, "END\r\n", 8196) = 5
write(20, "quit\r\n", 6) = 6
shutdown(20, 2 /* send and receive */) = 0
close(20) = 0
mkdir("/tmp/smarty", 0777) = -1 EEXIST (File exists)
chmod("/tmp/smarty", 0777) = 0
mkdir("/tmp/smarty", 0777) = -1 EEXIST (File exists)
chmod("/tmp/smarty", 0777) = 0
access("/userdata/client1/user/templates/nl/block-right-last.tpl", F_OK) = -1 ENOENT (No such file or directory)
access("/userdata/client1/user/templates/block-right-last.tpl", F_OK) = -1 ENOENT (No such file or directory)
access("/userdata/client1/theme/templates/nl/block-right-last.tpl", F_OK) = -1 ENOENT (No such file or directory)
access("/userdata/client1/theme/templates/block-right-last.tpl", F_OK) = -1 ENOENT (No such file or directory)
access("/etc/noxlogic/root/themes/ezshopping/templates/nl/block-right-last.tpl", F_OK) = -1 ENOENT (No such file or directory)
access("/etc/noxlogic/root/themes/ezshopping/templates/block-right-last.tpl", F_OK) = -1 ENOENT (No such file or directory)
mkdir("/tmp/smarty", 0777) = -1 EEXIST (File exists)
chmod("/tmp/smarty", 0777) = 0
access("/userdata/client1/user/templates/nl/block-right.tpl", F_OK) = -1 ENOENT (No such file or directory)
access("/userdata/client1/user/templates/block-right.tpl", F_OK) = -1 ENOENT (No such file or directory)
access("/userdata/client1/theme/templates/nl/block-right.tpl", F_OK) = -1 ENOENT (No such file or directory)
access("/userdata/client1/theme/templates/block-right.tpl", F_OK) = -1 ENOENT (No such file or directory)
access("/etc/noxlogic/root/themes/ezshopping/templates/nl/block-right.tpl", F_OK) = -1 ENOENT (No such file or directory)
access("/etc/noxlogic/root/themes/ezshopping/templates/block-right.tpl", F_OK) = -1 ENOENT (No such file or directory)
mkdir("/tmp/smarty", 0777) = -1 EEXIST (File exists)
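
A trace like this gets long quickly, so it helps to summarise which calls fail and why. A minimal Python sketch, assuming one system call per line in strace's default output format; the regular expression and function name are my own, and the sample lines are copied from the trace above. Output captured with `-ff -p` may carry extra prefixes that this pattern does not handle.

```python
import re
from collections import Counter

# Matches lines such as: access("...", F_OK) = -1 ENOENT (No such file ...)
FAILED_CALL = re.compile(r"^(?P<name>\w+)\(.*\)\s+=\s+-1\s+(?P<errno>[A-Z0-9]+)")


def count_failures(strace_output):
    """Count failed system calls per (call name, errno) pair."""
    counts = Counter()
    for line in strace_output.splitlines():
        match = FAILED_CALL.match(line.strip())
        if match:
            counts[(match["name"], match["errno"])] += 1
    return counts


# A few lines taken from the trace on the slide.
SAMPLE = '''\
close(20) = 0
mkdir("/tmp/smarty", 0777) = -1 EEXIST (File exists)
chmod("/tmp/smarty", 0777) = 0
access("/userdata/client1/user/templates/nl/block-right-last.tpl", F_OK) = -1 ENOENT (No such file or directory)
access("/userdata/client1/user/templates/block-right-last.tpl", F_OK) = -1 ENOENT (No such file or directory)
'''

if __name__ == "__main__":
    for (name, errno_name), total in count_failures(SAMPLE).most_common():
        print(name, errno_name, total)
```

On the full trace, the repeated `access` ENOENT entries stand out: the same template is looked up in six locations before it is found or given up on.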

Slide 96

Slide 96 text

51
$ strace ping www.google.com
....
mprotect(0xb757f000, 4096, PROT_READ) = 0
munmap(0xb76d8000, 44104) = 0
stat64("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=59, ...}) = 0
socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 3
connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.178.4")}, 16) = 0
gettimeofday({1347446161, 382120}, NULL) = 0
poll([{fd=3, events=POLLOUT}], 1, 0) = 1 ([{fd=3, revents=POLLOUT}])
send(3, "u\205\1\0\0\1\0\0\0\0\0\0\3www\6google\3com\0\0\1\0\1", 32, MSG_NOSIGNAL) = 32
poll([{fd=3, events=POLLIN}], 1, 5000

Slide 97

Slide 97 text

52 ➡ strace -e trace=open ➡ strace -ff -p <pid>

Slide 98

Slide 98 text

53 SystemTap / DTrace

Slide 99

Slide 99 text

54 ➡ Unobtrusive probes inside the kernel ➡ Scripts written in D language. ➡ Sun / Solaris origin; not in the mainline Linux kernel (CDDL licensing)

Slide 100

Slide 100 text

55 ➡ SystemTAP ➡ “GPL” version of dtrace ➡ Awesome, but complex ➡ But you need / want debug info packages

Slide 101

Slide 101 text

56
probe syscall.open {
    printf("%s(%d) open (%s)\n", execname(), pid(), argstr);
}

$ stap syscall.stp

Slide 102

Slide 102 text

57 ➡ There are some “providers” in the PHP core (zend_dtrace.{c,h,d}) ➡ file / line ➡ function entry / exit ➡ exception caught / thrown

Slide 103

Slide 103 text

Other tools ➡ Valgrind ➡ GDB ➡ XDebug / Profiler ➡ MySQL Proxy / Charles 58

Slide 104

Slide 104 text

59

Slide 105

Slide 105 text

Find me on twitter: @jaytaph
Find me for development and training: www.noxlogic.nl
Find me on email: jthijssen@noxlogic.nl
Find me for blogs: www.adayinthelifeof.nl
Thank You!
https://joind.in/talk/view/15191