
Don't reboot, debug! - PHPNW12


Joshua Thijssen

October 07, 2012

Transcript

  1. Don’t reboot, debug!
    Joshua Thijssen
    http://www.techademy.nl
  2. .whoami
    Joshua Thijssen / The Netherlands
    Freelance consultant, developer and trainer @ NoxLogic / TechAdemy
    Development in PHP, Python, Perl, C, Java... also sysadmin.
    Lead developer of Saffire
    Blog: http://adayinthelifeof.nl
    Email: jthijssen@noxlogic.nl
    Twitter: @jaytaph
  3. The most (wrong / incorrect / irritating / annoying / stupendous / evil / improper / unethical / immoral / unjust / wicked / inaccurate) question you can ask:
  4. Have you tried turning it off and on again?
  10. Have you tried turning it off and on again?

  13. Fix it! Every minute we’re losing money!

  15. Deal now or deal later?

  16. We will deal with the problem later!
    ➡ Is it reproducible later? Probably not.
    ➡ Are you solving the problem, or desperately trying to remove a symptom?
    ➡ Short-term relief vs long-term solution
  17. Deal with the problem now
    ➡ Actually analyze, maybe fix the problem.
    ➡ It will cost less to analyze/fix it now than to fix it later.
    ➡ You just saved a few gazillion dollars!
  19. ➡ We reboot our system every night!
    ➡ Why? Memory leaks? Just crappy code?
    ➡ There is some state not handled correctly! Fix it!
    ➡ What happens when the number of users increases by 200%? Restart every 12 hours?
    ➡ Here’s hoping you never get many visitors!
  20. Find the culprit

  21. Bottleneck Troubleshooting Flowchart (BTF)
    Site is slow or not responding. It’s your DB.
  22. MySQL

  23. MySQL is easy to set up and configure
    ➡ We use MySQL because it’s so easy to set up and use.
    ➡ No, it’s not...
  24. my.cnf
    max_heap_table_size = 16M
    tmp_table_size = 32M
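
The slide does not spell out why these two my.cnf values matter together. One likely reading is that MySQL caps in-memory temporary tables at the smaller of `tmp_table_size` and `max_heap_table_size`, so the 32M setting is silently limited to 16M. A minimal sketch of that rule, assuming that reading; `parse_size` and `effective_tmp_table_limit` are hypothetical helper names, not MySQL functions:

```python
import re

# Multipliers for the K/M/G suffixes accepted in my.cnf size values.
_UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(value):
    """Convert a my.cnf size such as '16M' into bytes."""
    match = re.fullmatch(r"(\d+)([KMG]?)", value.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised size: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]


def effective_tmp_table_limit(settings):
    """Return the in-memory temp table cap: the smaller of the two settings."""
    return min(
        parse_size(settings["tmp_table_size"]),
        parse_size(settings["max_heap_table_size"]),
    )


# Values copied from the slide.
slide_settings = {"max_heap_table_size": "16M", "tmp_table_size": "32M"}
limit = effective_tmp_table_limit(slide_settings)
print(limit // 1024 ** 2, "MiB")  # prints: 16 MiB
```

With the slide's values, temporary tables larger than 16 MiB would be converted to on-disk tables even though `tmp_table_size` suggests 32M.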

  25. Other usual suspects
    ➡ Apache / PHP
    ➡ Monitoring / backup
    ➡ Hanging cron jobs
    ➡ Runaway tools
    ➡ Connectivity / DNS problems
  26. Linux 101

  27. Processes

  28. Processes
    ➡ Isolated userspace
    ➡ PID and state.
    ➡ Kernel “preempts”, or process yields.
    ➡ Multitasking
  30. Process states
    ➡ R Running or runnable
    ➡ S Interruptible sleep
    ➡ D Uninterruptible sleep
    ➡ T Stopped
    ➡ Z Defunct process (zombies)
  31. Process states
    ➡ Most processes are sleeping.
    ➡ External processes (and the kernel) can “wake up” a process at any time.
    ➡ Fire signals with “kill”
  32. Process states
    ➡ Uninterruptible means it won’t handle signals (directly).
    ➡ Used for high-performance loops that need to focus (like I/O).
    ➡ Still can be preempted by the scheduler!
  33. Zombies
    ➡ Zombies aren’t bad.
    ➡ It’s just bad programming or administration that creates zombies.
    ➡ They will not eat brains (at least not much).
    ➡ But there shouldn’t be many.
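
The state letters above are the same ones Linux exposes in the third field of `/proc/<pid>/stat`. A minimal sketch, not from the talk, of tallying processes per state (for example to see how many zombies there are); the sample lines are made up and shortened, and `parse_stat` and `count_states` are hypothetical helper names:

```python
STATE_NAMES = {
    "R": "running or runnable",
    "S": "interruptible sleep",
    "D": "uninterruptible sleep",
    "T": "stopped",
    "Z": "defunct (zombie)",
}


def parse_stat(stat_line):
    """Return (pid, command, state letter) from one /proc/<pid>/stat line."""
    # The command is wrapped in parentheses and may itself contain spaces
    # or parentheses, so split on the first "(" and the last ")".
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    pid = int(stat_line[:open_paren])
    command = stat_line[open_paren + 1:close_paren]
    state = stat_line[close_paren + 1:].split()[0]
    return pid, command, state


def count_states(stat_lines):
    """Count how many processes are in each state."""
    counts = {}
    for line in stat_lines:
        _, _, state = parse_stat(line)
        counts[state] = counts.get(state, 0) + 1
    return counts


# Made-up, truncated sample lines (real stat lines have many more fields).
sample = [
    "1 (init) S 0 1 1 0 -1",
    "812 (apache2) S 1 812 812 0 -1",
    "901 (php worker) R 812 812 812 0 -1",
    "977 (sh) Z 812 812 812 0 -1",
]

for state, number in sorted(count_states(sample).items()):
    print(state, STATE_NAMES.get(state, "other"), number)
```

On a live Linux system the same functions could be fed the contents of every `/proc/[0-9]*/stat` file; a steadily growing `Z` count points at a parent that never reaps its children.
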
  34. Load average

  35. Load average
    ➡ 1-minute, 5-minute and 15-minute averages
    ➡ Calculated as the number of runnable processes.
    ➡ Depends on the number of CPUs!
    ➡ Linux also counts uninterruptible sleeps
  36. Load average (./uptime)
    14:57:22 up 35 days, 18:57, 1 user, load average: 1.52, 0.66, 0.27
    ➡ 1.52 average runnable (or blocking) processes in the last minute.
    ➡ 0.66 average in 5 minutes
    ➡ 0.27 average in 15 minutes.
    ➡ Single CPU: 52% more than it can handle.
    ➡ Quad-core system: not doing very much
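
The single-CPU versus quad-core reading above comes down to dividing the load by the number of CPUs. A minimal sketch of that arithmetic using the `uptime` line from the slide; `parse_load_averages` and `load_per_cpu` are hypothetical helper names:

```python
import re


def parse_load_averages(uptime_line):
    """Return the (1 min, 5 min, 15 min) load averages from an uptime line."""
    match = re.search(
        r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)", uptime_line
    )
    if match is None:
        raise ValueError("no load averages found")
    return tuple(float(value) for value in match.groups())


def load_per_cpu(load, cpu_count):
    """Load relative to capacity: above 1.0 means more work than CPUs."""
    return load / cpu_count


# The uptime output shown on the slide.
line = "14:57:22 up 35 days, 18:57, 1 user, load average: 1.52, 0.66, 0.27"
one_minute, five_minutes, fifteen_minutes = parse_load_averages(line)

for cpus in (1, 4):
    print(f"{cpus} CPU(s): {load_per_cpu(one_minute, cpus):.2f} per CPU")
# prints:
# 1 CPU(s): 1.52 per CPU
# 4 CPU(s): 0.38 per CPU
```

On a live system, `os.getloadavg()` and `os.cpu_count()` from the Python standard library give the same inputs without parsing text.
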
  37. Memory

  38. Memory
    Q: How much memory does this process use?
    This is a REALLY hard question to answer! It depends on many factors!
  39. Memory
    ➡ 4GB address space per process (on 32-bit systems), even if you have less memory installed
    ➡ Kernel can swap out memory
    ➡ CPU raises a page fault and the kernel loads the pages back in
  40. Memory
    ➡ A process can allocate memory, but does not necessarily use it (for instance: preallocation)
    ➡ VIRT will increase!
  41. Memory (as seen in “top”)
    ➡ Virtual memory
    ➡ Resident memory
    ➡ Shared memory
    ➡ Swapped memory
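
Three of the numbers that `top` shows can also be read per process from `/proc/<pid>/status`: `VmSize` (virtual), `VmRSS` (resident) and `VmSwap` (swapped). A minimal sketch, not from the talk, using a made-up sample; shared memory is left out because its field names differ between kernel versions, and `parse_status_memory` is a hypothetical helper name:

```python
def parse_status_memory(status_text):
    """Extract virtual, resident and swapped sizes (in kB) from status text."""
    wanted = {"VmSize": "virtual", "VmRSS": "resident", "VmSwap": "swapped"}
    result = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in wanted:
            # Lines look like "VmRSS:     18000 kB"; keep only the number.
            result[wanted[key]] = int(rest.split()[0])
    return result


# Made-up sample in the layout of /proc/<pid>/status.
sample_status = (
    "Name:\tapache2\n"
    "VmSize:\t  250000 kB\n"
    "VmRSS:\t   18000 kB\n"
    "VmSwap:\t     512 kB\n"
)

memory = parse_status_memory(sample_status)
print(memory)
```

A large gap between the virtual and resident values is the preallocation effect described earlier: the address space is reserved, but most of it is not backed by physical memory.
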
  42. Memory
    Q: How much free memory does this system have?
    This is an easier, but still hard question to answer!
  43. $ free -m
                 total       used       free     shared    buffers     cached
    Mem:           375        349         25          0        111         94
    -/+ buffers/cache:        143        231
    Swap:          400          7        392
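
The `-/+ buffers/cache` line is the one that answers the question: buffers and cache can be reclaimed, so they count as available. A minimal sketch of that calculation, assuming the older `free` column layout shown on the slide (newer versions print different columns); `available_memory` is a hypothetical helper name:

```python
def available_memory(free_output):
    """Return (free, free + buffers + cached) in MB from old-style `free -m`."""
    for line in free_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "Mem:":
            total, used, free, shared, buffers, cached = map(int, fields[1:7])
            return free, free + buffers + cached
    raise ValueError("no 'Mem:' line found")


# The output shown on the slide.
slide_output = """\
             total       used       free     shared    buffers     cached
Mem:           375        349         25          0        111         94
-/+ buffers/cache:        143        231
Swap:          400          7        392
"""

strictly_free, reclaimable = available_memory(slide_output)
print(strictly_free, reclaimable)  # prints: 25 230
```

The result is 230 rather than the 231 that `free` prints, because `free` rounds each column to whole megabytes separately. Either way, the system has roughly nine times more usable memory than the 25 MB in the `free` column suggests.
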
  44. Monitoring

  45. ➡ Monitor everything
    ➡ System / infra monitoring
    ➡ Application monitoring
  48. Logging

  49. ➡ Log EVERYTHING from all sources.
    ➡ Filter later.

  50. Logstash / Graylog2 / wtf

  51. System tools

  52. TAIL

  53. ➡ Most daemons will log into /var/log/*
    ➡ tail -f /var/log/messages
    ➡ Many times, this is ALL you need!
  54. ➡ Know your tools (top, htop, vmstat, iostat, ps)
    ➡ Know the /proc filesystem
    ➡ Sniff around with tcpdump, netstat, nc etc...
    ➡ man <keyword>
  55. strace

  56. ➡ strace displays system calls and signals
    ➡ Communication between applications and the kernel.
  57. strace -ff -p <apache_pid>
    socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 20
    fcntl(20, F_GETFL) = 0x2 (flags O_RDWR)
    fcntl(20, F_SETFL, O_RDWR|O_NONBLOCK) = 0
    connect(20, {sa_family=AF_INET, sin_port=htons(11211), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
    poll([{fd=20, events=POLLOUT}], 1, -1) = 1 ([{fd=20, revents=POLLOUT}])
    write(20, "get ez_borentappenschuren-/acls/"..., 44) = 44
    read(20, "END\r\n", 8196) = 5
    write(20, "get ez_borentappenschuren-/acl/g"..., 40) = 40
    read(20, "END\r\n", 8196) = 5
    write(20, "quit\r\n", 6) = 6
    shutdown(20, 2 /* send and receive */) = 0
    close(20) = 0
    mkdir("/tmp/smarty", 0777) = -1 EEXIST (File exists)
    chmod("/tmp/smarty", 0777) = 0
    mkdir("/tmp/smarty", 0777) = -1 EEXIST (File exists)
    chmod("/tmp/smarty", 0777) = 0
    access("/userdata/borentappenschuren/user/templates/nl/block-right-last.tpl", F_OK) = -1 ENOENT (No such file or directory)
    access("/userdata/borentappenschuren/user/templates/block-right-last.tpl", F_OK) = -1 ENOENT (No such file or directory)
    access("/userdata/borentappenschuren/theme/templates/nl/block-right-last.tpl", F_OK) = -1 ENOENT (No such file or directory)
    access("/userdata/borentappenschuren/theme/templates/block-right-last.tpl", F_OK) = -1 ENOENT (No such file or directory)
    access("/etc/noxlogic/root/themes/ezshopping/templates/nl/block-right-last.tpl", F_OK) = -1 ENOENT (No such file or directory)
    access("/etc/noxlogic/root/themes/ezshopping/templates/block-right-last.tpl", F_OK) = -1 ENOENT (No such file or directory)
    mkdir("/tmp/smarty", 0777) = -1 EEXIST (File exists)
    chmod("/tmp/smarty", 0777) = 0
    access("/userdata/borentappenschuren/user/templates/nl/block-right.tpl", F_OK) = -1 ENOENT (No such file or directory)
    access("/userdata/borentappenschuren/user/templates/block-right.tpl", F_OK) = -1 ENOENT (No such file or directory)
    access("/userdata/borentappenschuren/theme/templates/nl/block-right.tpl", F_OK) = -1 ENOENT (No such file or directory)
    access("/userdata/borentappenschuren/theme/templates/block-right.tpl", F_OK) = -1 ENOENT (No such file or directory)
    access("/etc/noxlogic/root/themes/ezshopping/templates/nl/block-right.tpl", F_OK) = -1 ENOENT (No such file or directory)
    access("/etc/noxlogic/root/themes/ezshopping/templates/block-right.tpl", F_OK) = -1 ENOENT (No such file or directory)
    mkdir("/tmp/smarty", 0777) = -1 EEXIST (File exists)
  58. strace ping www.google.com
    mprotect(0xb757f000, 4096, PROT_READ) = 0
    munmap(0xb76d8000, 44104) = 0
    stat64("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=59, ...}) = 0
    socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 3
    connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.178.4")}, 16) = 0
    gettimeofday({1347446161, 382120}, NULL) = 0
    poll([{fd=3, events=POLLOUT}], 1, 0) = 1 ([{fd=3, revents=POLLOUT}])
    send(3, "u\205\1\0\0\1\0\0\0\0\0\0\3www\6google\3com\0\0\1\0\1", 32, MSG_NOSIGNAL) = 32
    poll([{fd=3, events=POLLIN}], 1, 5000
  59. ➡ strace -e trace=open
    ➡ strace -ff -p <pid>
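
Traces like the Apache one above get long quickly, and the interesting part is often the calls that fail. A minimal sketch, not from the talk, that counts failed system calls per error code in saved strace output; the sample lines are copied from the slides, and `count_failures` is a hypothetical helper name:

```python
import re
from collections import Counter

# Matches lines such as: access("/path", F_OK) = -1 ENOENT (No such file ...)
# An optional "[pid 123]" prefix (added by strace -f) is skipped.
FAILED_CALL = re.compile(
    r"^(?:\[pid\s+\d+\]\s+)?(\w+)\(.*\)\s+=\s+-1\s+(E[A-Z0-9]+)"
)


def count_failures(strace_lines):
    """Count failed system calls, keyed by (syscall name, errno name)."""
    counts = Counter()
    for line in strace_lines:
        match = FAILED_CALL.match(line.strip())
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# Sample lines copied from the strace output on the slides.
sample = [
    'mkdir("/tmp/smarty", 0777) = -1 EEXIST (File exists)',
    'chmod("/tmp/smarty", 0777) = 0',
    'access("/userdata/borentappenschuren/user/templates/nl/block-right.tpl", F_OK) = -1 ENOENT (No such file or directory)',
    'access("/userdata/borentappenschuren/user/templates/block-right.tpl", F_OK) = -1 ENOENT (No such file or directory)',
    r'read(20, "END\r\n", 8196) = 5',
]

for (syscall, error), number in count_failures(sample).most_common():
    print(f"{number:4d}  {syscall}  {error}")
```

The input could come from a file written with `strace -o`. A high `access ... ENOENT` count, as in the template lookups above, shows the application probing many paths before it finds a file.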

  60. ltrace

  61. ➡ Traces library calls
    ➡ Grep and filter.
    ➡ Careful: ltrace php -r 'echo "hello world";' outputs 92476 lines!
  62. System tap / dtrace

  63. Dtrace
    ➡ Unobtrusive probes inside the kernel
    ➡ Scripts written in D language.
    ➡ SUN / Solaris only (licensing)
  64. systemtap
    ➡ “GPL” version of dtrace
    ➡ Awesome, but complex
    ➡ But you need / want debug info packages
  65. systemtap example
    probe syscall.open {
        printf("%s(%d) open (%s)\n", execname(), pid(), argstr);
    }
    stap syscall.stp
  66. systemtap / dtrace
    ➡ There are some “providers” in the PHP core (zend_dtrace.{c,h,d})
  67. Other really cool tools to look at
    ➡ Valgrind
    ➡ GDB
    ➡ XDebug / profiler
    ➡ MySQL proxy
  68. Think about your app / infra BEFORE going live...

  69. ➡ Design for (vertical) scalability.
    ➡ Remove SPOFs.
    ➡ Horizontal scalability is easier, but more restrictive.
    ➡ Configuration is key.
    ➡ Don’t run at full capacity. Have a contingency buffer for peaks.
  70. Make a plan

  71. Recap / Conclusion

  73. ➡ Don’t reboot, debug!
    ➡ Analyze what’s going on,
    ➡ and find and isolate the culprit.
    ➡ Treat the problem, not the symptoms.
  74. ➡ There are many tools out there to analyze your system in real time.
    ➡ Know your running environment (even if it’s “not your business”).
    ➡ Ask for third-party help if needed.
  75. ➡ One machine for one purpose (app / mail / cron / db / etc).
    ➡ Virtual machines are easy to set up and maintain (puppet) and are cheap.
    ➡ Try to async as much as possible.
    ➡ Message queues are easy to implement (gearman / *MQ etc).
  76. Questions?

  77. Thank You!
    Find me on twitter: @jaytaph
    Find me for development and training: www.noxlogic.nl
    Find me on email: jthijssen@noxlogic.nl
    Find me for blogs: www.adayinthelifeof.nl
    https://joind.in/6939