
Don't reboot, debug! 4developers edition

Joshua Thijssen
April 12, 2013

Transcript

  1. .whoami
    Joshua Thijssen: freelance consultant, developer and trainer @ NoxLogic. Founder of the Dutch Web Alliance. Development in PHP, Python, Perl, C, Java. Lead developer of Saffire.
    Blog: http://adayinthelifeof.nl
    Email: [email protected]
    Twitter: @jaytaph
  2. (slide 5: no text)

  3. We will deal with the problem later! ➡ Is it reproducible later? Probably not. ➡ Are you solving the problem, or desperately trying to remove a symptom? ➡ Short-term relief vs. long-term solution.
  4. Deal with the problem now ➡ Actually analyze, and maybe fix, the problem. ➡ It will cost less to analyze and fix it now than to fix it later. ➡ You just saved a few gazillion dollars!
  5. “We reboot our system every night!” ➡ Why? Memory leaks? Just crappy code? ➡ There is some state that is not handled correctly! Fix it! ➡ What happens when the number of users grows to 200% of today? Restart every 12 hours? ➡ Let’s hope you never get many visitors!
  6. MySQL is easy to set up and configure ➡ “We use MySQL because it’s so easy to set up and use.” ➡ No, it’s not...
  7. Other usual suspects ➡ Apache / PHP ➡ Monitoring / backup ➡ Hanging cron jobs ➡ Runaway tools ➡ Connectivity / DNS problems
  8. Processes ➡ Isolated userspace: each process has its own memory. ➡ Each process has a PID and a state. ➡ The kernel “preempts” a process, or the process yields. ➡ Multitasking
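To make “isolated userspace” concrete, here is a minimal sketch in Python, assuming a POSIX system such as Linux (`os.fork` does not exist on Windows); the function name is made up for illustration. After `os.fork()` the child works on its own copy of the parent’s memory, so a variable it changes keeps its old value in the parent.

```python
import os

def child_sees_private_copy() -> tuple:
    """Fork; the child changes a variable. Return (parent_value, child_value)."""
    counter = 1
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: change *its own copy* of counter and report it via the pipe.
        os.close(read_fd)
        counter = 2
        os.write(write_fd, str(counter).encode())
        os.close(write_fd)
        os._exit(0)
    # Parent: read the child's value, then reap the child with waitpid().
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        child_value = int(pipe.read())
    os.waitpid(pid, 0)
    return counter, child_value

if __name__ == "__main__":
    print(child_sees_private_copy())  # (1, 2): the parent never sees the change
```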
  9. (slide 29: no text)

  10. Process states ➡ R: running or runnable ➡ S: interruptible sleep ➡ D: uninterruptible sleep ➡ T: stopped ➡ Z: defunct process (zombie)
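The state letters above are what `ps` and `top` display; on Linux they come from field 3 of `/proc/<pid>/stat`. A minimal sketch, assuming Linux and Python 3 (the helper names and the sample line are hypothetical), that extracts that letter. The command name in field 2 is wrapped in parentheses and may itself contain spaces or parentheses, so the parser splits on the last `)` instead of on whitespace.

```python
import os

# Meaning of the state letters from the slide (as shown by ps/top).
STATE_NAMES = {
    "R": "running or runnable",
    "S": "interruptible sleep",
    "D": "uninterruptible sleep",
    "T": "stopped",
    "Z": "defunct (zombie)",
}

def parse_stat_state(stat_line: str) -> tuple:
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat.

    Field 2 (comm) is wrapped in parentheses and may contain spaces or ')',
    so split on the *last* ')' rather than on whitespace.
    """
    left, _, right = stat_line.rpartition(")")
    pid_str, _, comm = left.partition(" (")
    state = right.split()[0]
    return int(pid_str), comm, state

def state_of(pid: int) -> str:
    """Current state letter of a live process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_stat_state(f.read())[2]

if __name__ == "__main__":
    sample = "1234 (php-fpm: pool www) S 1 1234 1234 0 -1"   # made-up example line
    pid, comm, state = parse_stat_state(sample)
    print(pid, comm, state, STATE_NAMES[state])  # 1234 php-fpm: pool www S interruptible sleep
    # This process is executing while it reads its own stat file, so it shows R.
    print(state_of(os.getpid()))
```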
  11. Process states ➡ Most processes are sleeping. ➡ Other processes (and the kernel) can “wake up” a process at any time. ➡ Fire signals at a process with “kill”.
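A sleeping process can be woken from outside with a signal. The sketch below assumes a POSIX system and Python 3, and the helper name is hypothetical: it starts a child Python process that sleeps (interruptible sleep, state S), sends it SIGTERM with `os.kill` (the system call behind the `kill` command), and reads back how the child ended.

```python
import os
import signal
import subprocess
import sys
import time

def kill_sleeper(sig: int = signal.SIGTERM) -> int:
    """Start a child that sleeps, send it `sig`, and return its exit code.

    time.sleep() puts the child in interruptible sleep (S), so the signal is
    delivered right away; the default action for SIGTERM terminates it.
    subprocess reports "killed by signal N" as a return code of -N.
    """
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    time.sleep(0.5)                  # give the child time to start and fall asleep
    os.kill(child.pid, sig)          # same as running: kill -TERM <pid>
    return child.wait(timeout=10)

if __name__ == "__main__":
    print(kill_sleeper())            # -15 on Linux (SIGTERM is signal 15)
```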
  12. Process states ➡ Uninterruptible means the process won’t handle signals (directly). ➡ Used for short waits that must not be interrupted, typically I/O. ➡ The scheduler still runs other processes in the meantime!
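  13. Zombies ➡ Zombies aren’t bad. ➡ A zombie is a child process that has exited, but whose parent has not yet collected its exit status with wait(). ➡ It’s just bad programming or administration that creates zombies. ➡ They will not eat brains (at least not much). ➡ But there shouldn’t be many.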
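The “bad programming” behind zombies is usually a parent that never calls `wait()`/`waitpid()`. This sketch creates a zombie on purpose and then reaps it; it assumes Linux, because it reads the child’s state from `/proc`, and the function name is hypothetical.

```python
import os
import time

def make_and_reap_zombie() -> tuple:
    """Create a zombie on purpose, then reap it.

    Returns (state_seen_before_waitpid, child_exit_code). Linux only,
    because it reads the child's state from /proc/<pid>/stat.
    """
    pid = os.fork()
    if pid == 0:
        os._exit(3)                   # child: exit immediately with status 3

    # Parent "forgets" to wait for a moment: the exited child stays a zombie.
    state = "?"
    for _ in range(50):               # poll for up to ~5 seconds
        with open(f"/proc/{pid}/stat") as f:
            state = f.read().rpartition(")")[2].split()[0]
        if state == "Z":
            break
        time.sleep(0.1)

    # Reaping: waitpid() collects the exit status and the zombie disappears.
    _, status = os.waitpid(pid, 0)
    return state, os.WEXITSTATUS(status)

if __name__ == "__main__":
    print(make_and_reap_zombie())     # ('Z', 3)
```

In a long-running daemon the usual remedy is to call `waitpid()` for every child it starts, for example from a SIGCHLD handler.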
  14. Load average ➡ 1-minute, 5-minute and 15-minute averages ➡ Based on the number of runnable processes. ➡ Linux also counts processes in uninterruptible sleep. ➡ What counts as “high” depends on the number of CPUs!
  15. Load average ($ uptime)
    14:57:22 up 35 days, 18:57, 1 user, load average: 1.52, 0.66, 0.27
    ➡ 1.52: on average 1.52 runnable (or blocked) processes over the last minute. ➡ 0.66: average over the last 5 minutes. ➡ 0.27: average over the last 15 minutes. ➡ Single CPU: 52% more work than it can handle. ➡ Quad-core system: not doing very much.
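The per-CPU reading of the load average on the slide is plain division. A small sketch with a hypothetical helper name; keep in mind that on Linux the figure also includes processes in uninterruptible sleep (usually waiting for disk), so a high load does not always mean a busy CPU.

```python
import os

def load_per_cpu(load: float, cpus: int) -> float:
    """Load average per CPU: about 1.0 means fully busy, above 1.0 means work is queueing."""
    return load / cpus

if __name__ == "__main__":
    # The 1-minute value from the uptime output on the slide.
    print(round(load_per_cpu(1.52, 1), 2))   # 1.52: 52% more than one CPU can handle
    print(round(load_per_cpu(1.52, 4), 2))   # 0.38: a quad core is mostly idle
    # Live values for this machine (os.getloadavg is Unix-only).
    one_min, _, _ = os.getloadavg()
    print(round(load_per_cpu(one_min, os.cpu_count() or 1), 2))
```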
  16. Memory
    Q: How much memory does this process use?
    This is a REALLY hard question to answer! It depends on many factors!
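  17. Memory ➡ Each process sees a 4GB memory space (on a 32-bit system), even if less memory is installed. ➡ The kernel can swap memory out to disk. ➡ When a swapped-out page is accessed, the CPU raises a page fault and the kernel loads the page back in.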
  18. Memory ➡ A process can allocate memory without actually using it (for instance: preallocation). ➡ VIRT will increase, while RES only grows once the pages are touched!
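The VIRT versus RES difference can be observed from inside a process. A minimal sketch, assuming Linux (it reads `VmSize` and `VmRSS` from `/proc/self/status`, the values `top` shows as VIRT and RES) and hypothetical helper names: mapping 64 MB of anonymous memory immediately raises VmSize, and only after one byte has been written to every page does VmRSS follow.

```python
import mmap

PAGE = mmap.PAGESIZE

def vm_kb(field: str) -> int:
    """Read a field such as VmSize (top's VIRT) or VmRSS (top's RES), in kB."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith(field + ":"):
                return int(line.split()[1])
    raise KeyError(field)

def allocate_then_touch(size: int = 64 * 1024 * 1024) -> dict:
    """Map `size` bytes of anonymous memory, then write one byte per page.

    Returns (VmSize, VmRSS) in kB at three points: before mapping, right
    after mapping, and after every page has been touched.
    """
    snap = {"start": (vm_kb("VmSize"), vm_kb("VmRSS"))}
    buf = mmap.mmap(-1, size)                 # allocated, but not used yet
    snap["mapped"] = (vm_kb("VmSize"), vm_kb("VmRSS"))
    for offset in range(0, size, PAGE):       # first write to a page faults it in
        buf[offset] = 1
    snap["touched"] = (vm_kb("VmSize"), vm_kb("VmRSS"))
    buf.close()
    return snap

if __name__ == "__main__":
    for moment, (virt, res) in allocate_then_touch().items():
        print(f"{moment:8} VIRT={virt} kB RES={res} kB")
```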
  19. Memory (as seen in “top”) ➡ Virtual memory (VIRT) ➡ Resident memory (RES) ➡ Shared memory (SHR) ➡ Swapped memory (SWAP)
  20. Memory
    Q: How much free memory does this system have?
    This is an easier, but still hard, question to answer!
  21. $ free -m
                 total       used       free     shared    buffers     cached
    Mem:           375        349         25          0        111         94
    -/+ buffers/cache:        143        231
    Swap:          400          7        392
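The `-/+ buffers/cache` line is derived from the `Mem:` line: buffers and page cache can be reclaimed when applications need memory, so from an application’s point of view they count as free. A sketch of that arithmetic with a hypothetical function name (newer kernels, from Linux 3.14 on, also export a `MemAvailable` estimate in `/proc/meminfo`):

```python
def without_buffers_cache(used: int, free: int, buffers: int, cached: int) -> tuple:
    """Recompute the '-/+ buffers/cache' line of the (old) `free` output.

    Buffers and page cache can be dropped by the kernel when applications
    need memory, so they move from the 'used' column to the 'free' column.
    """
    return used - buffers - cached, free + buffers + cached

if __name__ == "__main__":
    # MB values from the `free -m` output on the slide.
    print(without_buffers_cache(used=349, free=25, buffers=111, cached=94))  # (144, 230)
    # free shows 143 and 231: it does this arithmetic on kB values and converts
    # each column to MB afterwards, so the last digit can be off by one.
```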
  22. (slide 46: no text)

  23. (slide 47: no text)

  24. ➡ Most daemons log to /var/log/* ➡ tail -f /var/log/messages ➡ Many times, this is ALL you need!
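`tail -f` simply keeps reading a file from its end and waits whenever there is nothing new. A minimal Python sketch of the same loop; the function name and polling interval are assumptions, and unlike GNU `tail -F` it does not notice log rotation.

```python
import os
import time
from typing import Iterator

def follow(path: str, poll_interval: float = 0.5) -> Iterator[str]:
    """Yield lines appended to `path` after this call, like `tail -f`.

    The file is opened and positioned at its end right away, so only new
    lines are returned. Log rotation is not handled.
    """
    f = open(path, "r")
    f.seek(0, os.SEEK_END)

    def lines() -> Iterator[str]:
        pending = ""
        with f:
            while True:
                chunk = f.readline()
                if not chunk:
                    time.sleep(poll_interval)     # nothing new yet: wait, then retry
                    continue
                pending += chunk
                if pending.endswith("\n"):        # only yield complete lines
                    yield pending[:-1]
                    pending = ""

    return lines()
```

Usage would be along the lines of `for line in follow("/var/log/messages"): print(line)`, which needs read permission on that file.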
  25. ➡ Know your tools (top, htop, vmstat, iostat, ps). ➡ Know the /proc filesystem. ➡ Sniff around with tcpdump, netstat, nc, etc. ➡ man <keyword>
  26. strace -ff -p <apache_pid>
    socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 20
    fcntl(20, F_GETFL) = 0x2 (flags O_RDWR)
    fcntl(20, F_SETFL, O_RDWR|O_NONBLOCK) = 0
    connect(20, {sa_family=AF_INET, sin_port=htons(11211), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
    poll([{fd=20, events=POLLOUT}], 1, -1) = 1 ([{fd=20, revents=POLLOUT}])
    write(20, "get ez_borentappenschuren-/acls/"..., 44) = 44
    read(20, "END\r\n", 8196) = 5
    write(20, "get ez_borentappenschuren-/acl/g"..., 40) = 40
    read(20, "END\r\n", 8196) = 5
    write(20, "quit\r\n", 6) = 6
    shutdown(20, 2 /* send and receive */) = 0
    close(20) = 0
    mkdir("/tmp/smarty", 0777) = -1 EEXIST (File exists)
    chmod("/tmp/smarty", 0777) = 0
    mkdir("/tmp/smarty", 0777) = -1 EEXIST (File exists)
    chmod("/tmp/smarty", 0777) = 0
    access("/userdata/borentappenschuren/user/templates/nl/block-right-last.tpl", F_OK) = -1 ENOENT (No such file or directory)
    access("/userdata/borentappenschuren/user/templates/block-right-last.tpl", F_OK) = -1 ENOENT (No such file or directory)
    access("/userdata/borentappenschuren/theme/templates/nl/block-right-last.tpl", F_OK) = -1 ENOENT (No such file or directory)
    access("/userdata/borentappenschuren/theme/templates/block-right-last.tpl", F_OK) = -1 ENOENT (No such file or directory)
    access("/etc/noxlogic/root/themes/ezshopping/templates/nl/block-right-last.tpl", F_OK) = -1 ENOENT (No such file or directory)
    access("/etc/noxlogic/root/themes/ezshopping/templates/block-right-last.tpl", F_OK) = -1 ENOENT (No such file or directory)
    mkdir("/tmp/smarty", 0777) = -1 EEXIST (File exists)
    chmod("/tmp/smarty", 0777) = 0
    access("/userdata/borentappenschuren/user/templates/nl/block-right.tpl", F_OK) = -1 ENOENT (No such file or directory)
    access("/userdata/borentappenschuren/user/templates/block-right.tpl", F_OK) = -1 ENOENT (No such file or directory)
    access("/userdata/borentappenschuren/theme/templates/nl/block-right.tpl", F_OK) = -1 ENOENT (No such file or directory)
    access("/userdata/borentappenschuren/theme/templates/block-right.tpl", F_OK) = -1 ENOENT (No such file or directory)
    access("/etc/noxlogic/root/themes/ezshopping/templates/nl/block-right.tpl", F_OK) = -1 ENOENT (No such file or directory)
    access("/etc/noxlogic/root/themes/ezshopping/templates/block-right.tpl", F_OK) = -1 ENOENT (No such file or directory)
    mkdir("/tmp/smarty", 0777) = -1 EEXIST (File exists)
  27. strace ping www.google.com
    mprotect(0xb757f000, 4096, PROT_READ) = 0
    munmap(0xb76d8000, 44104) = 0
    stat64("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=59, ...}) = 0
    socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 3
    connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.178.4")}, 16) = 0
    gettimeofday({1347446161, 382120}, NULL) = 0
    poll([{fd=3, events=POLLOUT}], 1, 0) = 1 ([{fd=3, revents=POLLOUT}])
    send(3, "u\205\1\0\0\1\0\0\0\0\0\0\3www\6google\3com\0\0\1\0\1", 32, MSG_NOSIGNAL) = 32
    poll([{fd=3, events=POLLIN}], 1, 5000
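Traces like these quickly grow to thousands of lines, and the interesting part is often a pattern: the same `mkdir`/`chmod` pair and a chain of `access()` calls failing with ENOENT, repeated for every template. `strace -c` already prints a per-syscall count of calls and errors; the sketch below (Python 3, hypothetical helper name) groups by syscall and errno instead, for example on a trace saved with `strace -o`.

```python
import re
from collections import Counter

# name(args) = result [ERRNO ...]; an optional "[pid N]" prefix appears with -f.
# Unfinished calls (like a poll that is still blocking) have no "= result"
# yet, so they do not match and are skipped.
SYSCALL_RE = re.compile(
    r"^(?:\[pid\s+\d+\]\s+)?(\w+)\(.*\)\s+=\s+(-?\w+)(?:\s+(E[A-Z0-9]+))?"
)

def summarize(lines) -> Counter:
    """Count (syscall, errno or 'ok') pairs so repeated failures stand out."""
    counts = Counter()
    for line in lines:
        m = SYSCALL_RE.match(line.strip())
        if m:
            name, _result, errno = m.groups()
            counts[(name, errno or "ok")] += 1
    return counts

if __name__ == "__main__":
    sample = [
        'mkdir("/tmp/smarty", 0777) = -1 EEXIST (File exists)',
        'chmod("/tmp/smarty", 0777) = 0',
        'mkdir("/tmp/smarty", 0777) = -1 EEXIST (File exists)',
        'access("/etc/noxlogic/root/themes/ezshopping/templates/block-right.tpl", F_OK) = -1 ENOENT (No such file or directory)',
        'poll([{fd=3, events=POLLIN}], 1, 5000',
    ]
    for (name, err), n in summarize(sample).most_common():
        print(f"{n:5}  {name:10} {err}")
```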
  28. ➡ ltrace traces library calls. ➡ Grep and filter its output. ➡ Careful: ltrace php -r 'echo "hello world";' outputs 92476 lines!
  29. DTrace ➡ Unobtrusive probes inside the kernel. ➡ Scripts are written in the D language. ➡ Sun / Solaris only (licensing).
  30. SystemTap ➡ “GPL” version of DTrace ➡ Awesome, but complex. ➡ You need (or at least want) debug-info packages.
  31. Other really cool tools to look at ➡ Valgrind ➡ GDB ➡ Xdebug / profiler ➡ MySQL Proxy
  32. ➡ Design for (horizontal) scalability. ➡ Remove SPOFs (single points of failure). ➡ Vertical scalability is easier, but more restrictive. ➡ Configuration is key. ➡ Don’t run at full capacity: keep a contingency buffer for peaks.
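“Don’t run at full capacity” and “remove SPOFs” can be turned into simple sizing arithmetic. A hedged sketch: the 70% target utilization, the single spare server and the request rates in the example are assumptions for illustration, not figures from the talk, and the function name is hypothetical.

```python
import math

def servers_needed(peak_rps: float, rps_per_server: float,
                   target_utilization: float = 0.7, spare: int = 1) -> int:
    """Servers needed so that, at peak, none runs above `target_utilization`.

    `spare` adds extra machines so that losing one is not a single point of
    failure (N+1). Both defaults are illustrative assumptions.
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    return math.ceil(peak_rps / (rps_per_server * target_utilization)) + spare

if __name__ == "__main__":
    # Hypothetical numbers: 1000 requests/s at peak, 150 requests/s per server.
    print(servers_needed(1000, 150))                                   # 11
    print(servers_needed(1000, 150, target_utilization=1.0, spare=0))  # 7, no headroom at all
```

Horizontal scaling is what makes this kind of calculation useful: adding headroom means adding machines rather than replacing one with a bigger one.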
  33. ➡ Don’t reboot, debug! ➡ Analyze what’s going on, then find and isolate the culprit. ➡ Treat the problem, not the symptoms.
  34. ➡ There are many tools out there to analyze your system in real time. ➡ Know your running environment (even if it’s “not your business”). ➡ Ask for third-party help if needed.
  35. ➡ One machine for one purpose (app / mail / cron / db / etc.). ➡ Virtual machines are cheap and easy to set up and maintain (e.g. with Puppet). ➡ Do as much as possible asynchronously. ➡ Message queues are easy to implement (Gearman / *MQ, etc.).
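The idea behind doing work asynchronously is that the request only drops a job on a queue while workers handle the slow part. A minimal in-process sketch using Python’s `queue` and `threading` modules; it is only a stand-in for a real broker such as Gearman or a *MQ server, where producers and workers would run as separate processes or machines, and the function name is hypothetical.

```python
import queue
import threading

def run_async(jobs, handle, workers: int = 2) -> list:
    """Enqueue `jobs` and let background worker threads apply `handle` to them."""
    q = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker() -> None:
        while True:
            job = q.get()
            if job is None:                 # sentinel: no more work for this worker
                q.task_done()
                return
            out = handle(job)               # the slow part, off the request path
            with lock:
                results.append(out)
            q.task_done()

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(workers)]
    for t in threads:
        t.start()
    for job in jobs:
        q.put(job)       # a web request handler would return right after enqueuing
    # This demo additionally waits for the workers so it can return the results.
    for _ in threads:
        q.put(None)
    for t in threads:
        t.join()
    return results

if __name__ == "__main__":
    print(sorted(run_async(["resize.jpg", "mail-42", "report.pdf"], str.upper)))
```

With a real broker the queue survives restarts and workers can be scaled independently, which also fits the “one machine for one purpose” advice.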
  36. Thank You!
    Find me on Twitter: @jaytaph
    Find me for development and training: www.noxlogic.nl
    Find me on email: [email protected]
    Find me for blogs: www.adayinthelifeof.nl