Slide 1

Slide 1 text

Introspecting Erlang & Haskell @darkproger

Slide 2

Slide 2 text

you don’t need to know erlang and haskell but need to know what are they about assumptions

Slide 3

Slide 3 text

web-like tasks and workloads (juggling concurrent queues) assumptions

Slide 4

Slide 4 text

production-like environments -> no blind code reloads -> no random restarts -> no profiling builds assumptions

Slide 5

Slide 5 text

A software system

Slide 6

Slide 6 text

A (single-node) software system

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

net hardware kernel app HVM?

Slide 10

Slide 10 text

HVM? net hardware kernel app

Slide 11

Slide 11 text

kernel app

Slide 12

Slide 12 text

http://www.makelinux.net/kernel_map/

Slide 13

Slide 13 text

watching the (linux) kernel proc(5) netlink(7) sysfs

Slide 14

Slide 14 text

watching the kernel vmstat(1) iostat(1) netstat(1) mpstat(1) *top(1) ...

Slide 15

Slide 15 text

# iostat -xnz 1 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 919.1 0.0 1064.7 0.0 0.0 0.2 0.0 0.2 0 17 c2t0d0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 610.8 116.0 531.8 865.2 0.0 0.1 0.0 0.2 0 13 c2t0d0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 633.1 0.0 684.6 0.0 0.0 0.2 0.0 0.2 0 14 c2t0d0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 812.9 0.0 406.5 0.0 0.0 0.1 0.0 0.1 0 10 c2t0d0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 677.9 101.1 1230.1 480.6 0.0 0.1 0.0 0.2 1 12 c2t0d0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 634.0 0.0 1982.9 0.0 0.0 0.1 0.0 0.2 0 13 c2t0d0 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device

Slide 16

Slide 16 text

kernel proc0 proc1 proc2 proc3 proc4

Slide 17

Slide 17 text

app <-> world interaction see http://kirillov.im/python_debugging.pdf Guessing app activity

Slide 18

Slide 18 text

HVM? net hardware kernel app

Slide 19

Slide 19 text

HVM? net hardware kernel app vm/rts

Slide 20

Slide 20 text

Let’s look on a VM!

Slide 21

Slide 21 text

vm (jvm) 1:1

Slide 22

Slide 22 text

event loop + helper threads, maybe a GIL

Slide 23

Slide 23 text

python python python python node, ruby, php, whatever

Slide 24

Slide 24 text

node node node node

Slide 25

Slide 25 text

node node node node IPC

Slide 26

Slide 26 text

No content

Slide 27

Slide 27 text

Debugging Challenges async pieces running on different threads shared data flying callback monsters

Slide 28

Slide 28 text

vm sched sched sched sched M:N

Slide 29

Slide 29 text

Erlang the Operating System

Slide 30

Slide 30 text

(woot@clu)1> i(). Pid Initial Call Heap Reds Msgs Registered Current Function Stack <0.0.0> otp_ring0:start/2 2584 4888 0 init init:loop/1 2 <0.3.0> erlang:apply/2 2584 145247 0 erl_prim_loader erl_prim_loader:loop/3 6 <0.5.0> gen_event:init_it/6 610 264 0 error_logger gen_event:fetch_msg/5 8 <0.6.0> erlang:apply/2 1597 434 0 application_controlle gen_server:loop/6 7 <0.8.0> application_master:init/4 377 44 0 application_master:main_loop/2 6 <0.9.0> application_master:start_it/4 233 69 0 application_master:loop_it/4 5 <0.10.0> supervisor:kernel/1 4181 43928 0 kernel_sup gen_server:loop/6 9 <0.11.0> rpc:init/1 233 35 0 rex gen_server:loop/6 9 <0.12.0> global:init/1 233 50 0 global_name_server gen_server:loop/6 9 <0.13.0> erlang:apply/2 233 20 0 global:loop_the_locker/1 4

Slide 31

Slide 31 text

shared nothing

Slide 32

Slide 32 text

shared nothing concurrent GC separate heaps messages always copied except binaries >64b

Slide 33

Slide 33 text

sync https://github.com/rustyio/sync

Slide 34

Slide 34 text

sync ⌃X ⌃S

Slide 35

Slide 35 text

sync

Slide 36

Slide 36 text

% dtrace -P erlang$(pgrep -f 'beam.*sname.ewcf') dtrace: description 'erlang4354' matched 1082 probes CPU ID FUNCTION:NAME 2 214164 process_main:process-scheduled 2 214138 process_main:local-function-entry 2 214138 process_main:local-function-entry 2 214138 process_main:local-function-entry 2 214134 fixed_apply:global-function-entry 2 214138 process_main:local-function-entry 2 214138 process_main:local-function-entry 2 214043 copy_struct:copy-struct 2 214136 process_main:global-function-entry ... wtf? the node seems to be doing nothing... run for a little while

Slide 37

Slide 37 text

dtrace -p $pid -Cs darwinkit/sys/machtrace.d ... 4354/0x13e9b: 10429636 11 stat64("/tank/proger/dev/net_tcp/src/net_tcp_app.erl\0", 0xB032D5A8, 0x7F9EC2169E20) = 0 0 4354/0x13e9b: 10429638 4 access("/tank/proger/dev/net_tcp/src/net_tcp_app.erl\0", 0x4, 0x0) = 0 0 4354/0x13e9b: 10429640 4 access("/tank/proger/dev/net_tcp/src/net_tcp_app.erl\0", 0x2, 0x0) = 0 0 4354/0x13e9b: 10429666 6 stat64("/tank/proger/dev/net_tcp/src/net_tcp_sup.erl\0", 0xB032D5A8, 0x7F9EC2169E20) = 0 0 4354/0x13e9b: 10429668 4 access("/tank/proger/dev/net_tcp/src/net_tcp_sup.erl\0", 0x4, 0x0) = 0 0 ... syscall close_nocancel 30 syscall fstatfs64 30 syscall open_nocancel 30 syscall getdirentries64 60 syscall select 114 syscall stat64 239 syscall access 382 wtf?! run for a little while

Slide 38

Slide 38 text

github: proger/darwinkit https://github.com/proger/darwinkit

Slide 39

Slide 39 text

% dtrace -p $pid \ -n 'stat64:entry /pid == $target && strstr(copyinstr(arg0), "src/nmf.erl") != 0/ { self->t = 1; trace(copyinstr(arg0)); }' \ -n 'erlang$target:::process-unscheduled \ /self->t/ \ { printf("%s", copyinstr(arg0)); exit(0); }' CPU ID FUNCTION:NAME 2 815 stat64:entry /tank/proger/dev/net_tcp/src/nmf.erl 2 214166 schedule:process-unscheduled <0.23.0> this guy

Slide 40

Slide 40 text

> erlang:process_info(pid(0,23,0)). [{registered_name,file_server_2}, {current_function,{gen_server,loop,6}}, {initial_call,{proc_lib,init_p,5}}, ... ] is that guy

Slide 41

Slide 41 text

% dtrace -p $pid \ -n 'erlang$target:::message-send /copyinstr(arg1) == "<0.23.0>"/ { @[copyinstr(arg0)] = count() }' aggregate by sender pid look for all messages to file_server_2

Slide 42

Slide 42 text

% dtrace -p $pid \ -n 'erlang$target:::message-send /copyinstr(arg1) == "<0.23.0>"/ { @[copyinstr(arg0)] = count() }' dtrace: description 'erlang$target:::message-send ' matched 3 probes ^C <0.63.0> 34 run for a little while aggregate by sender pid look for all messages to file_server_2 this guy this many times

Slide 43

Slide 43 text

> erlang:process_info(pid(0,63,0)). [{registered_name,sync_scanner}, {current_function,{gen_server,loop,6}}, {initial_call,{proc_lib,init_p,5}}, {status,waiting}, {message_queue_len,0}, {messages,[]}, ... ] this guy is that sync app that does polling which kinda sucks

Slide 44

Slide 44 text

watching erlang erlang(3) ets(3) dbg(3) etop(3) dtrace -P erlang'*'

Slide 45

Slide 45 text

github: proger/erlkit https://github.com/proger/erlkit

Slide 46

Slide 46 text

No content

Slide 47

Slide 47 text

No content

Slide 48

Slide 48 text

Prelude.head: empty list

Slide 49

Slide 49 text

{-# LANGUAGE NoImplicitPrelude #-} import Safe http://hackage.haskell.org/package/safe

Slide 50

Slide 50 text

Haskell Runtime GHC RTS embedded into every binary ghc-options: -rtsopts -with-rtsopts=-T -debug -prof -with-rtsopts=-xc (--enable-executable-profiling)

Slide 51

Slide 51 text

Concurrent Haskell tasks threads (TSO) capabilities sparks (par thunks) http://hackage.haskell.org/trac/ghc/wiki/Commentary/Rts/Scheduler

Slide 52

Slide 52 text

Concurrent Haskell import Control.Concurrent import GHC.Conc

Slide 53

Slide 53 text

\> :info ThreadId data ThreadId = ThreadId GHC.Prim.ThreadId# \> :t GHC.Conc.threadStatus GHC.Conc.threadStatus :: ThreadId -> IO ThreadStatus \> :info ThreadStatus data ThreadStatus = ThreadRunning | ThreadFinished | ThreadBlocked BlockReason | ThreadDied \> :info BlockReason data BlockReason = BlockedOnMVar | BlockedOnBlackHole | BlockedOnException | BlockedOnSTM | BlockedOnForeignCall | BlockedOnOther

Slide 54

Slide 54 text

\> :t labelThread labelThread :: ThreadId -> String -> IO ()

Slide 55

Slide 55 text

ghc/rts/ThreadLabels.c: #if defined(DEBUG) static HashTable * threadLabels = NULL; // ...

Slide 56

Slide 56 text

% dtrace -m hswebq -l ID PROVIDER MODULE FUNCTION NAME 14853 HaskellEvent12585 hswebq traceCapDelete cap-delete 14854 HaskellEvent12585 hswebq traceCapsetAssignCap capset-assign-cap 14855 HaskellEvent12585 hswebq traceCapsetCreate capset-create 14856 HaskellEvent12585 hswebq traceCapsetDelete capset-delete 14857 HaskellEvent12585 hswebq traceCapsetRemoveCap capset-remove-cap 14858 HaskellEvent12585 hswebq traceEventCreateThread create-thread 14859 HaskellEvent12585 hswebq traceEventGcDone gc-done 14860 HaskellEvent12585 hswebq traceEventGcEndAtT gc-end 14861 HaskellEvent12585 hswebq traceEventGcIdle gc-idle 14862 HaskellEvent12585 hswebq traceEventGcStartAtT gc-start 14863 HaskellEvent12585 hswebq traceEventGcWork gc-work 14864 HaskellEvent12585 hswebq traceEventMigrateThread migrate-thread 14865 HaskellEvent12585 hswebq traceEventRunThread run-thread 14866 HaskellEvent12585 hswebq dtraceStartup startup 14867 HaskellEvent12585 hswebq traceEventStopThread stop-thread 14868 HaskellEvent12585 hswebq traceThreadLabel thread-label 14869 HaskellEvent12585 hswebq traceEventThreadRunnable thread-runnable 14870 HaskellEvent12585 hswebq traceEventThreadWakeup thread_wakeup 14871 HaskellEvent12585 hswebq traceFormatUserMsg user-msg

Slide 57

Slide 57 text

EKG http://hackage.haskell.org/package/ekg

Slide 58

Slide 58 text

github: proger/hswebq https://github.com/proger/hswebq (demo)

Slide 59

Slide 59 text

type PubSubQ = Map Text (Bool, Set ThreadId) type QThreadStatus = (Text, ThreadId, ThreadStatus) threadStatus' :: Text -> ThreadId -> IO QThreadStatus threadStatus' k t = threadStatus t >>= \x -> return (k, t, x) qstats :: PubSubQ -> [IO QThreadStatus] qstats wq = concatMap (\(key, (_, threads)) -> map (threadStatus' key) $ toList threads) $ Map.toList wq

Slide 60

Slide 60 text

takeaway! in production you guess app activity by watching system activity that kinda sucks

Slide 61

Slide 61 text

takeaway! each complex app has as much watch points as an OS kernel (but both manage different resources)

Slide 62

Slide 62 text

takeaway! callback monsters are tough threads tagging ftw

Slide 63

Slide 63 text

takeaway! erlang beam is hardcore haskell needs babysitting ghc patch pending :)

Slide 64

Slide 64 text

No content

Slide 65

Slide 65 text

k thx bai! @darkproger http://kirillov.im

Slide 66

Slide 66 text

this slide is intentionally left almost blank

Slide 67

Slide 67 text

http://farm9.staticflickr.com/8078/8291970818_eaafcc00f4_z.jpg http://www.weblancer.net/files/forum/228/22846/422441_141954.jpg