Raku Memory Manglement: Checking yourself out.

You cannot manage what you cannot measure. Which is fine, until you can't measure something...

The ProcStats module fills the gap for Raku on *NIX by polling rusage(2) to report initial, incremental, and final usage values.

This talk describes the rusage values and shows how they are acquired and output in Raku, including the data handling facilities in Raku used to manage the results.

Steven Lembark

July 09, 2022
Transcript

  1. Yes, size() matters. But it’s hard to find.

    Process-level stats. Mainly “RSS”. getrusage(2). Acquiring & analyzing data. Raku tools.
  2. Raku?

    Started out as “Perl6”. Then became an entirely new language. 20/20 hindsight is a wonderful design tool.
  3. RSS?

    “Resident Set Size”: virtual pages in physical memory, accessible without a page fault. Non-resident VM may be swapped out and requires a page fault to access.
  4. getrusage(2): POSIX

        struct rusage {
            struct timeval ru_utime; /* user CPU time used */
            struct timeval ru_stime; /* system CPU time used */
            long ru_maxrss;          /* maximum resident set size */
            long ru_ixrss;           /* integral shared memory size */
            long ru_idrss;           /* integral unshared data size */
            long ru_isrss;           /* integral unshared stack size */
            long ru_minflt;          /* page reclaims (soft page faults) */
            long ru_majflt;          /* page faults (hard page faults) */
            long ru_nswap;           /* swaps */
            long ru_inblock;         /* block input operations */
            long ru_oublock;         /* block output operations */
            long ru_msgsnd;          /* IPC messages sent */
            long ru_msgrcv;          /* IPC messages received */
            long ru_nsignals;        /* signals received */
            long ru_nvcsw;           /* voluntary context switches */
            long ru_nivcsw;          /* involuntary context switches */
        };
  5. getrusage(2): Linux

    Same struct as slide 4. Linux leaves several of these fields unused; slide 6 keeps the rest.
  6. getrusage(2)

    Only max RSS. No way to track reduction.

        struct rusage {
            struct timeval ru_utime; /* user CPU time used */
            struct timeval ru_stime; /* system CPU time used */
            long ru_maxrss;          /* maximum resident set size */
            long ru_minflt;          /* page reclaims (soft page faults) */
            long ru_majflt;          /* page faults (hard page faults) */
            long ru_inblock;         /* block input operations */
            long ru_oublock;         /* block output operations */
            long ru_nvcsw;           /* voluntary context switches */
            long ru_nivcsw;          /* involuntary context switches */
        };
  7. getrusage(2)

    Total fault counts, not periodic. Same reduced struct as slide 6.
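
Because ru_maxrss is a high-water mark, successive samples can only stay level or rise. A minimal sketch of that behaviour, using made-up RSS readings (the @rss values and the variable names are illustrative assumptions, not ProcStats output):

```raku
# Hypothetical "true" RSS readings in KiB, taken at successive moments.
my @rss = 100, 120, 90, 130;

# ru_maxrss behaves like a running maximum of those readings.
my @high-water = [\max] @rss;

# Differences between successive maxrss samples are never negative,
# so the drop from 120 to 90 is invisible to getrusage(2).
my @growth = @high-water.rotor( 2 => -1 ).map( { .[1] - .[0] } );

say @high-water;    # [100 120 120 130]
say @growth;        # [20 0 10]
```

The same reasoning applies to the fault counters: they are running totals, so per-interval figures have to be derived by subtracting one sample from another.
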
  8. Viewing RSS: Telemetry module (Raku)

    Takes periodic snapshots. Allows inserting a label to track events. In core, built with nqp. Not synchronous with tasks.
  9. Viewing RSS: ProcStats

    Exports “dump-rusage”. Reports differences from the first sample. Outputs only the values that changed. Tracks wallclock time. Optional label.
  10. ProcStats: :final

    Output all stats compared to first sample.

        sub dump-rusage
        (
            Bool()    :$final = False,
            Bool()    :$first = $final,
            Bool()    :$force = $final,
            Stringy() :$label = $final ?? 'Final' !! ''
        )
        is export( :DEFAULT )
  11. ProcStats: :first

    Values compared to first sample (vs. prior). Same signature as slide 10.
  12. ProcStats: :force

    Write all stats (vs. only changed). Same signature as slide 10.
  13. ProcStats: :label

    Add “label” key (default from :final). Same signature as slide 10.
  14. Wallclock time

    Elapsed vs. CPU.

        sub dump-rusage
        (
            Bool()    :$final = False,
            Bool()    :$first = $final,
            Bool()    :$force = $final,
            Stringy() :$label = $final ?? 'Final' !! ''
        )
        is export( :DEFAULT )
        {
            my $wtime = now.Num;
  15. Wallclock time

    Sample at top to avoid time-shift. Same code as slide 14.
  16. Values from RSS

    REPORT (the keys to compare) avoids reporting on CPU switches.

        constant FIELDS =
        <
            maxrss ixrss idrss isrss minflt majflt nswap
            inblock oublock msgsnd msgrcv nsignals nvcsw nivcsw
        >;
        constant IGNORE = <ixrss idrss isrss ...>;
        constant REPORT = < maxrss majflt minflt inblock oublock >;
        constant MICRO  = 10 ** -6;
  17. Acquire data

    Times are sec + µsec, deal with them separately. “Z=>” zips fields & values into a hash.

        use nqp;

        nqp::getrusage( my int @raw );

        my ( $user_s, $user_us, $syst_s, $syst_us ) = splice @raw, 0, 4;

        my %sample = FIELDS Z=> @raw;
        %sample{ IGNORE } :delete;
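
The splice, zip, and slice-delete steps can be tried without nqp. The sketch below is not the module's code: it assumes a shortened field list and a hand-written @raw array standing in for whatever nqp::getrusage would fill in.

```raku
# Shortened, illustrative field lists (the real FIELDS has more entries).
constant FIELDS = < maxrss ixrss idrss minflt majflt >;
constant IGNORE = < ixrss idrss >;

# Stand-in data: four time values (sec and µsec for user and system)
# followed by one value per field. These numbers are made up.
my @raw = 0, 344793, 0, 20896, 99732, 0, 0, 25039, 0;

# splice removes the first four entries from @raw and returns them.
my ( $user_s, $user_us, $syst_s, $syst_us ) = splice @raw, 0, 4;

# Z=> pairs each field name with the value in the same position.
my %sample = FIELDS Z=> @raw;

# Deleting a hash slice drops every ignored key at once.
%sample{ IGNORE } :delete;

say %sample.sort;   # (majflt => 0 maxrss => 99732 minflt => 25039)
```
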
  18. Making time

    User & system time begin as two ints. Round gives reasonable precision in output.

        my $utime = ( $user_s + $user_us / 1_000_000 ).round( MICRO );
        my $stime = ( $syst_s + $syst_us / 1_000_000 ).round( MICRO );
  19. Store baseline values.

    First is never updated. Get a working “last” value on the first pass.

        state %last =
        state %first =
        (
            |%sample,
            :$wtime, :$utime, :$stime,
        );
  20. Store baseline values.

    Flatten %sample into pairs. Same code as slide 19.
  21. Store baseline values.

    Times as pairs. Same code as slide 19.
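
State variables are initialised once, on the first call, and keep their contents afterwards. A small sketch of the same chained declaration, using a hypothetical track sub and a single "value" key in place of the full sample:

```raku
# Hypothetical helper: report a value relative to the first call and to
# the previous call. Both hashes are initialised on the first call only.
sub track ( Int $value )
{
    state %last = state %first = ( :$value );

    my $from-first = $value - %first<value>;
    my $from-last  = $value - %last<value>;

    %last<value> = $value;      # %first is never touched again

    return $from-first, $from-last;
}

say track( 10 );    # (0 0)
say track( 15 );    # (5 5)
say track( 12 );    # (2 -3)
```

On the first call both differences are zero because %last starts as a copy of %first; from then on the two baselines diverge.
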
  22. First is last at first. After first, last is last.

    With :first the comparison baseline is %first; otherwise it is %last.

        my %prior = $first ?? %first !! %last;
  23. What to compare?

    Force reports the full sample. Otherwise REPORT limits the keys that are compared to %prior and output.

        my %curr
        = ( $force || ! $passes )
        ?? %sample
        !! do
        {
            my @diffs = REPORT.grep( { %sample{$_} != %prior{$_} } );

            @diffs Z=> %sample{ @diffs }
        };
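
The grep-then-zip step can be run on its own. This sketch assumes two hand-written samples (values chosen to echo the sample output later in the deck) in which only maxrss and minflt differ:

```raku
constant REPORT = < maxrss majflt minflt inblock oublock >;

# Illustrative samples, not real measurements.
my %prior  = maxrss =>  99732, majflt => 0, minflt => 25039, inblock => 0, oublock => 64;
my %sample = maxrss => 101320, majflt => 0, minflt => 25294, inblock => 0, oublock => 64;

# Keep the names of the reported fields whose values changed ...
my @diffs = REPORT.grep( { %sample{$_} != %prior{$_} } );

# ... then zip those names with a hash slice of their current values.
my %curr = @diffs Z=> %sample{ @diffs };

say %curr.sort;     # (maxrss => 101320 minflt => 25294)
```
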
  24. Write out one stat heading & value.

    Compute column width once during execution.

        sub write-stat ( Pair $p )
        {
            note sprintf '%-*s : %s',
                once {FIELDS».chars.max},
                $p.key,
                $p.value
            ;
        }
  25. Write progressive value

    Numerics compared to starting baseline. Simplifies tracking code results.

        sub write-diff ( Pair $p )
        {
            my $k = $p.key;
            my $v = $p.value - %first{ $k };

            write-stat $k => $v;
        }
  26. First pass writes all stats.

    First pass has to report baseline values.

        state &write = &write-stat;
  27. First pass writes all stats.

    First pass has to report baseline values. After that report differences.

        state &write = &write-stat;
        ...
        write $stat;
        ...
        once { &write = &write-diff };
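
Swapping the code held in &write changes what later calls do without any branching at the call site. A self-contained sketch of the idea, assuming a one-key baseline and collecting lines in an array instead of writing to stderr; a plain assignment in the loop stands in for the once block:

```raku
my %first = maxrss => 99732;    # illustrative baseline
my @lines;

sub write-stat ( Pair $p ) { @lines.push: $p.key ~ ' : ' ~ $p.value }

sub write-diff ( Pair $p )
{
    my $k = $p.key;
    my $v = $p.value - %first{ $k };

    write-stat $k => $v;
}

# Start with the absolute writer ...
my &write = &write-stat;

for 99732, 101320, 101320 -> $maxrss
{
    my $stat = maxrss => $maxrss;

    write $stat;

    # ... and use the relative writer from the second pass on.
    &write = &write-diff;
}

.say for @lines;
# maxrss : 99732
# maxrss : 1588
# maxrss : 1588
```
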
  28.

        for %curr.sort -> $stat
        {
            FIRST
            {
                note '---';
                write-stat ( output => $++ );
                write-stat ( :$passes );
                write-stat ( :$label ) if $label;
                write-diff ( :$wtime );
                write-diff ( :$utime );
                write-diff ( :$stime );
            }

            write $stat
        }
  33. Last steps

    Up total count. Store current sample for re-use.

        ++$passes;
        %last = %sample;
        once { &write = &write-diff };
  34. Baseline usage

    Bare for-loop. Shows overhead of rusage output.

        #!/usr/bin/env raku
        use v6.d;
        use FindBin::libs;
        use ProcStats;

        dump-rusage for 1 .. 1_000;
        dump-rusage( :final );
  35. Sample 0

    Pass 0 has all values. Baseline for RSS & friends.

        ---
        output   : 0
        passes   : 0
        wtime    : 1560968261.746507
        utime    : 0.344793
        stime    : 0.020896
        inblock  : 0
        majflt   : 0
        maxrss   : 99732
        minflt   : 25039
        nivcsw   : 10
        nvcsw    : 204
        oublock  : 64
  36. Sample 0

    wtime is ‘real world’. Reasonable candidate key for sample history. Same output as slide 35.
  37. Sample 0

    RSS is ~100MiB at startup. Same output as slide 35.
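
The "~100MiB" figure follows from the units: on Linux, getrusage(2) reports ru_maxrss in KiB. A small conversion sketch (kib-to-mib is a hypothetical helper, not part of ProcStats), applied to the maxrss values shown in the deck:

```raku
# Assumes Linux, where ru_maxrss is reported in KiB.
sub kib-to-mib ( Int $kib ) { ( $kib / 1024 ).round( 0.1 ) }

say kib-to-mib( 99732 );    # 97.4  (startup, sample 0)
say kib-to-mib(  6996 );    # 6.8   (growth in the bare-loop final sample)
say kib-to-mib( 18404 );    # 18    (growth in the user-tracking final sample)
```
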
  38. Output: sample N

    Outputs 1+ are relative to %first.

        ---
        output   : 1
        passes   : 1
        wtime    : 0.0081639
        utime    : 0.007295
        stime    : 0.000228
        maxrss   : 1588
        minflt   : 255
        ---
        ...
  39. Output

    Outputs 1+ are relative to %first. maxrss & minflt cause output. Same output as slide 38.
  40. Output: intermediate passes

    Output #130: minflt 1758 -> 1759.

        ---
        output   : 129
        passes   : 812
        wtime    : 0.4603018
        utime    : 0.60607
        stime    : 0.000175
        minflt   : 1758
        ---
        output   : 130
        passes   : 813
        wtime    : 0.4636268
        utime    : 0.609417
        stime    : 0.000175
        minflt   : 1759
        ---
  41. Output: “Final” sample

    dump-rusage( :final ); shows all fields. About 1/8 of passes had output.

        ---
        output   : 131
        passes   : 1000
        label    : Final
        wtime    : 0.5086002
        utime    : 0.654374
        stime    : 0.000175
        inblock  : 0
        majflt   : 0
        maxrss   : 6996
        minflt   : 1759
        nivcsw   : 2
        nvcsw    : 35
        oublock  : 0
  42. “Final” sample

    Default label. Same output as slide 41.
  43. “Final” sample

    Fairly low overhead. Same output as slide 41.
  44. “Final” sample

    Multiple threads: wallclock < user. Same output as slide 41.
  45. “Final” sample

    RSS grew by ~7MiB. Same output as slide 41.
  46. Really do something...

    Simulate tracking userids on a web server: Add a hash key. Increment a random value. Drop a key.
  47. Roll your own

    Random hash key via random sample.

        sub random-string
        (
            Int() :$size = ( 1 .. 10 ).pick
            --> Str
        )
        {
            constant alpha = [ 'a' ... 'z', 'A' ... 'Z' ];

            alpha.roll( $size ).join;
        }
  48. Roll your own

    pick() with no argument returns a single, random value. Same code as slide 47.
  49. Roll your own

    roll() returns a random sample, drawn with replacement. Same code as slide 47.
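
The difference between pick and roll shows up once more items are requested than the list holds. A short sketch, assuming an @alpha array built from two ranges rather than the deck's constant:

```raku
my @alpha = flat 'a' .. 'z', 'A' .. 'Z';

# pick with no argument: one random element.
my $size = ( 1 .. 10 ).pick;

# pick( N ): N distinct elements, never more than the list holds.
my @picked = @alpha.pick( 100 );

# roll( N ): N independent draws, so repeats are possible.
my @rolled = @alpha.roll( 100 );

say @picked.elems;              # 52
say @rolled.elems;              # 100
say @alpha.roll( $size ).join;  # a random string of 1 to 10 letters
```

Using roll for the string means a letter can appear more than once, which is what a random key generator normally wants.
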
  50. Fake userid

    Track key counts, active keys.

        sub user-add
        {
            ++%user-data{ random-string };
            ++$adds;
            $max-keys = max $max-keys, %user-data.elems;
        }
  51. Random key selection

        sub user-drop
        {
            %user-data or return;
            ++$drops;
            %user-data{ %user-data.pick.key } :delete;
        }

        sub user-op
        {
            %user-data or return;
            ++$ops;
            ++%user-data{ %user-data.pick.key };
        }
  52. Randomized, weighted trial.

        for 1 .. 1000
        {
            constant weighted_operations =
            (
                &user-add  => 0.10,
                &user-drop => 0.10,
                &user-op   => 0.80,
            ).Mix;

            weighted_operations.roll( 1_000 )».();

            dump-rusage( label => 'Keys: ' ~ %user-data.elems );
        }
  53. Define ops and weights.

    Same code as slide 52.
  54. 1000 iterations of trial.

    Same code as slide 52.
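
A Mix maps keys to weights, and roll draws keys in proportion to those weights. The sketch below is a stand-alone illustration with three hypothetical counting subs in place of user-add, user-drop, and user-op; it calls each drawn sub in a plain loop rather than with a hyper operator.

```raku
my ( $adds, $drops, $updates ) = 0, 0, 0;

# Hypothetical operations that only count how often they are called.
sub op-add    { ++$adds    }
sub op-drop   { ++$drops   }
sub op-update { ++$updates }

# Keys are the subs themselves, values are their relative weights.
my $operations = ( &op-add => 0.10, &op-drop => 0.10, &op-update => 0.80 ).Mix;

# roll( N ) draws N keys with probability proportional to weight;
# each drawn sub is then called.
$_() for $operations.roll( 1_000 );

say "adds: $adds, drops: $drops, updates: $updates";
```

The counts vary from run to run but should land near 100, 100, and 800, in the same proportions as the deck's reported totals.
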
  55. Report summary

    “say” is stdout, dump-rusage is stderr. :final uses %first as reference for values.

        dump-rusage( :final );

        say 'Total adds: '  ~ $adds;
        say 'Total drops: ' ~ $drops;
        say 'Total ops: '   ~ $ops;
        say 'Max keys: '    ~ $max-keys;
        say 'Final keys: '  ~ %user-data.elems;
  56. Stats results

    Final results from “say”.

        Total adds: 99738
        Total drops: 98755
        Total ops: 787133
        Max keys: 213
        Final keys: 144
  57. Stats results

    Final sample: 1000 iterations/pass. Extra time from threading. ~18MiB RSS growth.

        ---
        output   : 518
        passes   : 1001
        label    : Final
        wtime    : 18.668069
        utime    : 18.846082
        stime    : 0.01101
        inblock  : 0
        majflt   : 0
        maxrss   : 18404
        minflt   : 5522
        nivcsw   : 61
        nvcsw    : 83
        oublock  : 128
  58. What you see is all you get.

    RSS, faults. Per-process totals. Not per structure. Randomized trials are simple in Raku. Monitor results after specific operations.