
Don't reboot, debug! 4developers edition

Joshua Thijssen
April 12, 2013

Transcript

  1. .whoami: Joshua Thijssen

    Freelance consultant, developer and trainer @ NoxLogic. Founder of the Dutch Web Alliance. Development in PHP, Python, Perl, C, Java. Lead developer of Saffire. Blog: http://adayinthelifeof.nl Email: [email protected] Twitter: @jaytaph
  2. 5

  3. We will deal with the problem later!

    ➡ Is it reproducible later? Probably not.
    ➡ Are you solving the problem, or desperately trying to remove a symptom?
    ➡ Short-term relief vs. long-term solution.
  4. Deal with the problem now

    ➡ Actually analyze, maybe fix the problem.
    ➡ It will cost less to analyze/fix it now than to fix it later.
    ➡ You just saved a few gazillion dollars!
  5. ➡ We reboot our system every night!
    ➡ Why? Memory leaks? Just crappy code?
    ➡ There is some state not handled correctly! Fix it!
    ➡ What happens when the number of users increases by 200%? Restart every 12 hours?
    ➡ Here’s hoping you never get many visitors!
  6. MySQL is easy to set up and configure

    ➡ We use MySQL because it’s so easy to set up and use.
    ➡ No, it’s not...
  7. Other usual suspects

    ➡ Apache / PHP
    ➡ Monitoring / backup
    ➡ Hanging cron jobs
    ➡ Runaway tools
    ➡ Connectivity / DNS problems
  8. Processes

    ➡ Isolated userspace.
    ➡ PID and state.
    ➡ Kernel “preempts”, or process yields.
    ➡ Multitasking.
  9. 29

  10. Process states

    ➡ R Running or runnable
    ➡ S Interruptible sleep
    ➡ D Uninterruptible sleep
    ➡ T Stopped
    ➡ Z Defunct process (zombies)
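
The state letters above are what `ps` shows as the first character of its STAT column, and what Linux exposes as the third field of `/proc/<pid>/stat`. A minimal sketch of a lookup, assuming only the five states listed on the slide (the function names are illustrative, not from the talk):

```python
# State letters as listed on the slide (first character of the STAT column
# in `ps`, or the third field of /proc/<pid>/stat on Linux).
PROCESS_STATES = {
    "R": "Running or runnable",
    "S": "Interruptible sleep",
    "D": "Uninterruptible sleep",
    "T": "Stopped",
    "Z": "Defunct process (zombie)",
}


def describe_state(stat):
    """Describe a ps-style STAT value such as 'Ss' or 'R+'."""
    letter = stat[:1]
    return PROCESS_STATES.get(letter, "Unknown state %r" % letter)


def state_from_proc_stat(line):
    """Extract the state letter from one /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split after the LAST ')'.
    """
    return line.rsplit(")", 1)[1].split()[0]
```

On a Linux box, `state_from_proc_stat(open("/proc/self/stat").read())` would normally report `R`, since the process is running while it reads the file.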
  11. Process states

    ➡ Most processes are sleeping.
    ➡ External processes (and the kernel) can “wake up” a process at any time.
    ➡ Fire signals with “kill”.
  12. Process states

    ➡ Uninterruptible means it won’t handle signals (directly).
    ➡ Used for high-performance loops that need to focus (like I/O).
    ➡ Can still be preempted by the scheduler!
  13. Zombies

    ➡ Zombies aren’t bad.
    ➡ It’s just bad programming or administration that creates zombies.
    ➡ They will not eat brains (at least not much).
    ➡ But there shouldn’t be many.
  14. Load average

    ➡ 1-minute, 5-minute and 15-minute averages.
    ➡ Calculated as the number of runnable processes.
    ➡ Linux also adds uninterruptible sleeps.
    ➡ How to read it depends on the number of CPUs!
  15. Load average (./uptime)

        14:57:22 up 35 days, 18:57, 1 user, load average: 1.52, 0.66, 0.27

    ➡ 1.52 average runnable (or blocking) processes in the last minute.
    ➡ 0.66 average in 5 minutes.
    ➡ 0.27 average in 15 minutes.
    ➡ Single CPU: 52% more than it can handle.
    ➡ Quad-core system: not doing very much.
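
The single-CPU versus quad-core reading comes down to dividing the load by the number of CPUs. A small sketch of that arithmetic, assuming the `uptime` line format shown on the slide (helper names are illustrative):

```python
import re


def parse_load_averages(uptime_line):
    """Return the 1-, 5- and 15-minute load averages from an `uptime` line."""
    match = re.search(
        r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)", uptime_line
    )
    if match is None:
        raise ValueError("no load average found in: %r" % uptime_line)
    return [float(value) for value in match.groups()]


def load_per_cpu(load, cpu_count):
    """Load divided by CPU count; above 1.0 means work is queueing up."""
    return load / cpu_count


line = "14:57:22 up 35 days, 18:57, 1 user, load average: 1.52, 0.66, 0.27"
one_minute = parse_load_averages(line)[0]
print(round(load_per_cpu(one_minute, 1), 2))  # single CPU
print(round(load_per_cpu(one_minute, 4), 2))  # quad core
```

With one CPU the ratio is 1.52, so 52% more work than the CPU can run at once; with four CPUs it is 0.38, well under capacity. On a live system, `os.cpu_count()` and `os.getloadavg()` supply the same inputs without parsing.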
  16. Memory

    Q: How much memory does this process use? This is a REALLY hard question to answer! It depends on many factors!
  17. Memory

    ➡ 4GB memory space, even if you have less memory installed.
    ➡ Kernel can swap out memory.
    ➡ CPU page-faults, and the pages are loaded back.
  18. Memory

    ➡ A process can allocate memory, but does not necessarily use it (for instance: preallocation).
    ➡ VIRT will increase!
  19. Memory (as seen in “top”)

    ➡ Virtual memory
    ➡ Resident memory
    ➡ Shared memory
    ➡ Swapped memory
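
On Linux, the same figures can be read per process from `/proc/<pid>/status`. The sketch below assumes the `VmSize`, `VmRSS` and `VmSwap` fields of that file correspond to top's VIRT, RES and SWAP columns; shared memory is left out because top derives it from a different file. The sample text uses made-up numbers purely for illustration.

```python
def parse_status_kb(status_text):
    """Return every 'Name: <n> kB' field of /proc/<pid>/status as {name: kB}."""
    fields = {}
    for line in status_text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def top_style_summary(fields):
    """Map status fields onto the column names used by top (values in kB)."""
    return {
        "VIRT": fields.get("VmSize"),
        "RES": fields.get("VmRSS"),
        "SWAP": fields.get("VmSwap"),
    }


# Hypothetical sample; real input would be open("/proc/self/status").read().
SAMPLE_STATUS = (
    "Name:\tpython3\n"
    "VmSize:\t   24680 kB\n"
    "VmRSS:\t    9120 kB\n"
    "VmSwap:\t       0 kB\n"
)
print(top_style_summary(parse_status_kb(SAMPLE_STATUS)))
```

A process that preallocates memory without touching it would show `VIRT` growing while `RES` stays roughly flat.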
  20. Memory

    Q: How much free memory does this system have? This is an easier, but still hard question to answer!
  21. $ free -m

                     total       used       free     shared    buffers     cached
        Mem:           375        349         25          0        111         94
        -/+ buffers/cache:        143        231
        Swap:          400          7        392
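
The `-/+ buffers/cache` row is the one that answers the question: buffers and cache can be reclaimed, so they count as free. A sketch of that calculation from the `Mem:` row, assuming the six-column layout shown on the slide (newer versions of `free` print different columns):

```python
def parse_free_mem_line(line):
    """Parse the 'Mem:' row of `free -m` output with the slide's six columns."""
    columns = ["total", "used", "free", "shared", "buffers", "cached"]
    label, *values = line.split()
    if label != "Mem:" or len(values) != len(columns):
        raise ValueError("unexpected free output: %r" % line)
    return dict(zip(columns, map(int, values)))


def without_buffers_and_cache(mem):
    """Treat buffers and cache as reclaimable: subtract from used, add to free."""
    reclaimable = mem["buffers"] + mem["cached"]
    return {"used": mem["used"] - reclaimable, "free": mem["free"] + reclaimable}


mem = parse_free_mem_line("Mem: 375 349 25 0 111 94")
print(without_buffers_and_cache(mem))
```

This gives 144 MB used and 230 MB free. The slide shows 143 and 231; the one-megabyte difference is most likely because `free` converts each column to megabytes separately.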
  22. 46

  23. 47

  24. ➡ Most daemons will log into /var/log/*
    ➡ tail -f /var/log/messages
    ➡ Many times, this is ALL you need!
  25. ➡ Know your tools (top, htop, vmstat, iostat, ps).
    ➡ Know the /proc filesystem.
    ➡ Sniff around with tcpdump, netstat, nc etc...
    ➡ man <keyword>
  26. strace -ff -p <apache_pid>

        socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 20
        fcntl(20, F_GETFL) = 0x2 (flags O_RDWR)
        fcntl(20, F_SETFL, O_RDWR|O_NONBLOCK) = 0
        connect(20, {sa_family=AF_INET, sin_port=htons(11211), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
        poll([{fd=20, events=POLLOUT}], 1, -1) = 1 ([{fd=20, revents=POLLOUT}])
        write(20, "get ez_borentappenschuren-/acls/"..., 44) = 44
        read(20, "END\r\n", 8196) = 5
        write(20, "get ez_borentappenschuren-/acl/g"..., 40) = 40
        read(20, "END\r\n", 8196) = 5
        write(20, "quit\r\n", 6) = 6
        shutdown(20, 2 /* send and receive */) = 0
        close(20) = 0
        mkdir("/tmp/smarty", 0777) = -1 EEXIST (File exists)
        chmod("/tmp/smarty", 0777) = 0
        mkdir("/tmp/smarty", 0777) = -1 EEXIST (File exists)
        chmod("/tmp/smarty", 0777) = 0
        access("/userdata/borentappenschuren/user/templates/nl/block-right-last.tpl", F_OK) = -1 ENOENT (No such file or directory)
        access("/userdata/borentappenschuren/user/templates/block-right-last.tpl", F_OK) = -1 ENOENT (No such file or directory)
        access("/userdata/borentappenschuren/theme/templates/nl/block-right-last.tpl", F_OK) = -1 ENOENT (No such file or directory)
        access("/userdata/borentappenschuren/theme/templates/block-right-last.tpl", F_OK) = -1 ENOENT (No such file or directory)
        access("/etc/noxlogic/root/themes/ezshopping/templates/nl/block-right-last.tpl", F_OK) = -1 ENOENT (No such file or directory)
        access("/etc/noxlogic/root/themes/ezshopping/templates/block-right-last.tpl", F_OK) = -1 ENOENT (No such file or directory)
        mkdir("/tmp/smarty", 0777) = -1 EEXIST (File exists)
        chmod("/tmp/smarty", 0777) = 0
        access("/userdata/borentappenschuren/user/templates/nl/block-right.tpl", F_OK) = -1 ENOENT (No such file or directory)
        access("/userdata/borentappenschuren/user/templates/block-right.tpl", F_OK) = -1 ENOENT (No such file or directory)
        access("/userdata/borentappenschuren/theme/templates/nl/block-right.tpl", F_OK) = -1 ENOENT (No such file or directory)
        access("/userdata/borentappenschuren/theme/templates/block-right.tpl", F_OK) = -1 ENOENT (No such file or directory)
        access("/etc/noxlogic/root/themes/ezshopping/templates/nl/block-right.tpl", F_OK) = -1 ENOENT (No such file or directory)
        access("/etc/noxlogic/root/themes/ezshopping/templates/block-right.tpl", F_OK) = -1 ENOENT (No such file or directory)
        mkdir("/tmp/smarty", 0777) = -1 EEXIST (File exists)
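
A trace like this gets long quickly, so it helps to summarize which calls fail and why. The sketch below is one way to count failed syscalls per errno; it assumes strace's default line format, where a failed call ends in `= -1 ERRNO (description)`, and uses a few lines copied from the trace above as sample input.

```python
import re
from collections import Counter

# A failed call looks like: name(args) = -1 ENOENT (No such file or directory)
# With -f, strace may prefix each line with "[pid 1234] ".
FAILED_CALL = re.compile(
    r"^(?:\[pid\s+\d+\]\s+)?(\w+)\(.*\)\s+=\s+-1\s+(E[A-Z0-9]+)\b"
)


def count_failures(trace_lines):
    """Count failed syscalls, keyed by (syscall name, errno name)."""
    counts = Counter()
    for line in trace_lines:
        match = FAILED_CALL.match(line.strip())
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


SAMPLE_TRACE = [
    'mkdir("/tmp/smarty", 0777) = -1 EEXIST (File exists)',
    'chmod("/tmp/smarty", 0777) = 0',
    'access("/userdata/borentappenschuren/user/templates/nl/block-right.tpl", F_OK) = -1 ENOENT (No such file or directory)',
    'access("/userdata/borentappenschuren/user/templates/block-right.tpl", F_OK) = -1 ENOENT (No such file or directory)',
]

for (syscall, errno_name), count in count_failures(SAMPLE_TRACE).most_common():
    print(count, syscall, errno_name)
```

Run over the whole trace, a summary like this makes the pattern stand out: the same template is looked up in six directories and never found, and `/tmp/smarty` is re-created on every lookup.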
  27. strace ping www.google.com

        mprotect(0xb757f000, 4096, PROT_READ) = 0
        munmap(0xb76d8000, 44104) = 0
        stat64("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=59, ...}) = 0
        socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 3
        connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.178.4")}, 16) = 0
        gettimeofday({1347446161, 382120}, NULL) = 0
        poll([{fd=3, events=POLLOUT}], 1, 0) = 1 ([{fd=3, revents=POLLOUT}])
        send(3, "u\205\1\0\0\1\0\0\0\0\0\0\3www\6google\3com\0\0\1\0\1", 32, MSG_NOSIGNAL) = 32
        poll([{fd=3, events=POLLIN}], 1, 5000
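
The 32 bytes passed to `send()` are a plain DNS query going to port 53, and the trace stops in a `poll()` that is still waiting for the answer. As a sketch, the payload can be decoded with the standard DNS wire layout (12-byte header, length-prefixed labels, then type and class); the decoder below handles only this simple uncompressed case and is an illustration, not part of the talk.

```python
import struct

# Payload of the send() call in the trace, copied from strace's octal-escaped
# string. Python bytes literals accept the same escapes.
QUERY = b"u\205\1\0\0\1\0\0\0\0\0\0\3www\6google\3com\0\0\1\0\1"


def decode_dns_question(packet):
    """Decode the header and first question of a DNS query (no compression)."""
    ident, flags, qdcount, _an, _ns, _ar = struct.unpack("!6H", packet[:12])
    labels = []
    pos = 12
    while packet[pos] != 0:  # a zero-length label terminates the name
        length = packet[pos]
        labels.append(packet[pos + 1 : pos + 1 + length].decode("ascii"))
        pos += 1 + length
    qtype, qclass = struct.unpack("!2H", packet[pos + 1 : pos + 5])
    return {
        "id": ident,
        "recursion_desired": bool(flags & 0x0100),
        "questions": qdcount,
        "name": ".".join(labels),
        "qtype": qtype,    # 1 = A record
        "qclass": qclass,  # 1 = IN (Internet)
    }


print(decode_dns_question(QUERY))
```

The decoded question is an A-record lookup for `www.google.com`, which points at the resolver at 192.168.178.4 (or the path to it) as the place to look when `ping` appears to hang.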
  28. ➡ Traces library calls.
    ➡ Grep and filter.
    ➡ Careful: ltrace php -r 'echo "hello world";' outputs 92476 lines!
  29. Dtrace

    ➡ Unobtrusive probes inside the kernel.
    ➡ Scripts written in D language.
    ➡ SUN / Solaris only (licensing).
  30. systemtap

    ➡ “GPL” version of dtrace.
    ➡ Awesome, but complex.
    ➡ But you need / want debug info packages.
  31. Other really cool tools to look at

    ➡ Valgrind
    ➡ GDB
    ➡ XDebug / profiler
    ➡ MySQL proxy
  32. ➡ Design for (horizontal) scalability.
    ➡ Remove SPOFs.
    ➡ Vertical scalability is easier, but more restrictive.
    ➡ Configuration is key.
    ➡ Don’t run at full capacity. Have a contingency buffer for peaks.
  33. ➡ Don’t reboot, debug!
    ➡ Analyze what’s going on, and find and isolate the culprit.
    ➡ Treat the problem, not the symptoms.
  34. ➡ There are many tools out there to analyze your system in real time.
    ➡ Know your running environment (even if it’s “not your business”).
    ➡ Ask for third-party help if needed.
  35. ➡ One machine for one purpose (app / mail / cron / db / etc).
    ➡ Virtual machines are easy to set up and maintain (puppet) and are cheap.
    ➡ Try to do as much as possible asynchronously.
    ➡ Message queues are easy to implement (gearman / *MQ etc).
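
The idea behind the last two points is that the caller hands work to a queue and moves on while a worker picks it up. The sketch below shows that shape with Python's in-process `queue.Queue` as a stand-in; it is not gearman or an *MQ broker, and the job (upper-casing a string) is a placeholder for real work such as sending mail.

```python
import queue
import threading


def run_worker(jobs, results):
    """Consume jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            break
        results.append(job.upper())  # placeholder for the real, slow work


def process_async(items):
    """Queue every item for a background worker and collect the results."""
    jobs = queue.Queue()
    results = []
    worker = threading.Thread(target=run_worker, args=(jobs, results))
    worker.start()
    for item in items:
        jobs.put(item)  # returns immediately; the worker runs in the background
    jobs.put(None)      # tell the worker to stop once the queue is drained
    worker.join()
    return results


print(process_async(["welcome mail", "invoice mail"]))
```

With a real broker the producer and the worker would be separate processes, often on separate machines, which is what makes the one-machine-per-purpose layout workable.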
  36. Thank You!

    Find me on twitter: @jaytaph
    Find me for development and training: www.noxlogic.nl
    Find me on email: [email protected]
    Find me for blogs: www.adayinthelifeof.nl