Introspecting Erlang and Haskell

Introspecting Erlang and Haskell

Debugging must be simple.

743e95b3e0c9b48e5162d9248efa893b?s=128

Vladimir Kirillov

May 31, 2013
Tweet

Transcript

  1. Introspecting Erlang & Haskell @darkproger

  2. you don’t need to know erlang and haskell but need

    to know what are they about assumptions
  3. web-like tasks and workloads (juggling concurrent queues) assumptions

  4. production-like environments -> no blind code reloads -> no random

    restarts -> no profiling builds assumptions
  5. A software system

  6. A (single-node) software system

  7. None
  8. None
  9. net hardware kernel app HVM?

  10. HVM? net hardware kernel app

  11. kernel app

  12. http://www.makelinux.net/kernel_map/

  13. watching the (linux) kernel proc(5) netlink(7) sysfs

  14. watching the kernel vmstat(1) iostat(1) netstat(1) mpstat(1) *top(1) ...

  15. # iostat -xnz 1 extended device statistics r/s w/s kr/s

    kw/s wait actv wsvc_t asvc_t %w %b device 919.1 0.0 1064.7 0.0 0.0 0.2 0.0 0.2 0 17 c2t0d0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 610.8 116.0 531.8 865.2 0.0 0.1 0.0 0.2 0 13 c2t0d0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 633.1 0.0 684.6 0.0 0.0 0.2 0.0 0.2 0 14 c2t0d0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 812.9 0.0 406.5 0.0 0.0 0.1 0.0 0.1 0 10 c2t0d0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 677.9 101.1 1230.1 480.6 0.0 0.1 0.0 0.2 1 12 c2t0d0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 634.0 0.0 1982.9 0.0 0.0 0.1 0.0 0.2 0 13 c2t0d0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
  16. kernel proc0 proc1 proc2 proc3 proc4

  17. app <-> world interaction see http://kirillov.im/python_debugging.pdf Guessing app activity

  18. HVM? net hardware kernel app

  19. HVM? net hardware kernel app vm/rts

  20. Let’s look on a VM!

  21. vm (jvm) 1:1

  22. event loop + helper threads, maybe a GIL

  23. python python python python node, ruby, php, whatever

  24. node node node node

  25. node node node node IPC

  26. None
  27. Debugging Challenges async pieces running on different threads shared data

    flying callback monsters
  28. vm sched sched sched sched M:N

  29. Erlang the Operating System

  30. (woot@clu)1> i(). Pid Initial Call Heap Reds Msgs Registered Current

    Function Stack <0.0.0> otp_ring0:start/2 2584 4888 0 init init:loop/1 2 <0.3.0> erlang:apply/2 2584 145247 0 erl_prim_loader erl_prim_loader:loop/3 6 <0.5.0> gen_event:init_it/6 610 264 0 error_logger gen_event:fetch_msg/5 8 <0.6.0> erlang:apply/2 1597 434 0 application_controlle gen_server:loop/6 7 <0.8.0> application_master:init/4 377 44 0 application_master:main_loop/2 6 <0.9.0> application_master:start_it/4 233 69 0 application_master:loop_it/4 5 <0.10.0> supervisor:kernel/1 4181 43928 0 kernel_sup gen_server:loop/6 9 <0.11.0> rpc:init/1 233 35 0 rex gen_server:loop/6 9 <0.12.0> global:init/1 233 50 0 global_name_server gen_server:loop/6 9 <0.13.0> erlang:apply/2 233 20 0 global:loop_the_locker/1 4
  31. shared nothing

  32. shared nothing concurrent GC separate heaps messages always copied except

    binaries >64b
  33. sync https://github.com/rustyio/sync

  34. sync ⌃X ⌃S

  35. sync

  36. % dtrace -P erlang$(pgrep -f 'beam.*sname.ewcf') dtrace: description 'erlang4354' matched

    1082 probes CPU ID FUNCTION:NAME 2 214164 process_main:process-scheduled 2 214138 process_main:local-function-entry 2 214138 process_main:local-function-entry 2 214138 process_main:local-function-entry 2 214134 fixed_apply:global-function-entry 2 214138 process_main:local-function-entry 2 214138 process_main:local-function-entry 2 214043 copy_struct:copy-struct 2 214136 process_main:global-function-entry ... wtf? the node seems to be doing nothing... run for a little while
  37. dtrace -p $pid -Cs darwinkit/sys/machtrace.d ... 4354/0x13e9b: 10429636 11 stat64("/tank/proger/dev/net_tcp/src/net_tcp_app.erl\0",

    0xB032D5A8, 0x7F9EC2169E20) = 0 0 4354/0x13e9b: 10429638 4 access("/tank/proger/dev/net_tcp/src/net_tcp_app.erl\0", 0x4, 0x0) = 0 0 4354/0x13e9b: 10429640 4 access("/tank/proger/dev/net_tcp/src/net_tcp_app.erl\0", 0x2, 0x0) = 0 0 4354/0x13e9b: 10429666 6 stat64("/tank/proger/dev/net_tcp/src/net_tcp_sup.erl\0", 0xB032D5A8, 0x7F9EC2169E20) = 0 0 4354/0x13e9b: 10429668 4 access("/tank/proger/dev/net_tcp/src/net_tcp_sup.erl\0", 0x4, 0x0) = 0 0 ... syscall close_nocancel 30 syscall fstatfs64 30 syscall open_nocancel 30 syscall getdirentries64 60 syscall select 114 syscall stat64 239 syscall access 382 wtf?! run for a little while
  38. github: proger/darwinkit https://github.com/proger/darwinkit

  39. % dtrace -p $pid \ -n 'stat64:entry /pid == $target

    && strstr(copyinstr(arg0), "src/nmf.erl") != 0/ { self->t = 1; trace(copyinstr(arg0)); }' \ -n 'erlang$target:::process-unscheduled \ /self->t/ \ { printf("%s", copyinstr(arg0)); exit(0); }' CPU ID FUNCTION:NAME 2 815 stat64:entry /tank/proger/dev/net_tcp/src/nmf.erl 2 214166 schedule:process-unscheduled <0.23.0> this guy
  40. > erlang:process_info(pid(0,23,0)). [{registered_name,file_server_2}, {current_function,{gen_server,loop,6}}, {initial_call,{proc_lib,init_p,5}}, ... ] is that guy

  41. % dtrace -p $pid \ -n 'erlang$target:::message-send /copyinstr(arg1) == "<0.23.0>"/

    { @[copyinstr(arg0)] = count() }' aggregate by sender pid look for all messages to file_server_2
  42. % dtrace -p $pid \ -n 'erlang$target:::message-send /copyinstr(arg1) == "<0.23.0>"/

    { @[copyinstr(arg0)] = count() }' dtrace: description 'erlang$target:::message-send ' matched 3 probes ^C <0.63.0> 34 run for a little while aggregate by sender pid look for all messages to file_server_2 this guy this many times
  43. > erlang:process_info(pid(0,63,0)). [{registered_name,sync_scanner}, {current_function,{gen_server,loop,6}}, {initial_call,{proc_lib,init_p,5}}, {status,waiting}, {message_queue_len,0}, {messages,[]}, ... ]

    this guy is that sync app that does polling which kinda sucks
  44. watching erlang erlang(3) ets(3) dbg(3) etop(3) dtrace -P erlang'*'

  45. github: proger/erlkit https://github.com/proger/erlkit

  46. None
  47. None
  48. Prelude.head: empty list

  49. {-# LANGUAGE NoImplicitPrelude #-} import Safe http://hackage.haskell.org/package/safe

  50. Haskell Runtime GHC RTS embedded into every binary ghc-options: -rtsopts

    -with-rtsopts=-T -debug -prof -with-rtsopts=-xc (--enable-executable-profiling)
  51. Concurrent Haskell tasks threads (TSO) capabilities sparks (par thunks) http://hackage.haskell.org/trac/ghc/wiki/Commentary/Rts/Scheduler

  52. Concurrent Haskell import Control.Concurrent import GHC.Conc

  53. \> :info ThreadId data ThreadId = ThreadId GHC.Prim.ThreadId# \> :t

    GHC.Conc.threadStatus GHC.Conc.threadStatus :: ThreadId -> IO ThreadStatus \> :info ThreadStatus data ThreadStatus = ThreadRunning | ThreadFinished | ThreadBlocked BlockReason | ThreadDied \> :info BlockReason data BlockReason = BlockedOnMVar | BlockedOnBlackHole | BlockedOnException | BlockedOnSTM | BlockedOnForeignCall | BlockedOnOther
  54. \> :t labelThread labelThread :: ThreadId -> String -> IO

    ()
  55. ghc/rts/ThreadLabels.c: #if defined(DEBUG) static HashTable * threadLabels = NULL; //

    ...
  56. % dtrace -m hswebq -l ID PROVIDER MODULE FUNCTION NAME

    14853 HaskellEvent12585 hswebq traceCapDelete cap-delete 14854 HaskellEvent12585 hswebq traceCapsetAssignCap capset-assign-cap 14855 HaskellEvent12585 hswebq traceCapsetCreate capset-create 14856 HaskellEvent12585 hswebq traceCapsetDelete capset-delete 14857 HaskellEvent12585 hswebq traceCapsetRemoveCap capset-remove-cap 14858 HaskellEvent12585 hswebq traceEventCreateThread create-thread 14859 HaskellEvent12585 hswebq traceEventGcDone gc-done 14860 HaskellEvent12585 hswebq traceEventGcEndAtT gc-end 14861 HaskellEvent12585 hswebq traceEventGcIdle gc-idle 14862 HaskellEvent12585 hswebq traceEventGcStartAtT gc-start 14863 HaskellEvent12585 hswebq traceEventGcWork gc-work 14864 HaskellEvent12585 hswebq traceEventMigrateThread migrate-thread 14865 HaskellEvent12585 hswebq traceEventRunThread run-thread 14866 HaskellEvent12585 hswebq dtraceStartup startup 14867 HaskellEvent12585 hswebq traceEventStopThread stop-thread 14868 HaskellEvent12585 hswebq traceThreadLabel thread-label 14869 HaskellEvent12585 hswebq traceEventThreadRunnable thread-runnable 14870 HaskellEvent12585 hswebq traceEventThreadWakeup thread_wakeup 14871 HaskellEvent12585 hswebq traceFormatUserMsg user-msg
  57. EKG http://hackage.haskell.org/package/ekg

  58. github: proger/hswebq https://github.com/proger/hswebq (demo)

  59. type PubSubQ = Map Text (Bool, Set ThreadId) type QThreadStatus

    = (Text, ThreadId, ThreadStatus) threadStatus' :: Text -> ThreadId -> IO QThreadStatus threadStatus' k t = threadStatus t >>= \x -> return (k, t, x) qstats :: PubSubQ -> [IO QThreadStatus] qstats wq = concatMap (\(key, (_, threads)) -> map (threadStatus' key) $ toList threads) $ Map.toList wq
  60. takeaway! in production you guess app activity by watching system

    activity that kinda sucks
  61. takeaway! each complex app has as much watch points as

    an OS kernel (but both manage different resources)
  62. takeaway! callback monsters are tough threads tagging ftw

  63. takeaway! erlang beam is hardcore haskell needs babysitting ghc patch

    pending :)
  64. None
  65. k thx bai! @darkproger http://kirillov.im

  66. this slide is intentionally left almost blank

  67. http://farm9.staticflickr.com/8078/8291970818_eaafcc00f4_z.jpg http://www.weblancer.net/files/forum/228/22846/422441_141954.jpg