
gathers-and-takes.pdf

Steven Lembark

November 03, 2023

Transcript

  1. Hypers & Gathers & Takes! Oh My! The Yellow Brick

    Road to Raku ETL. Steven Lembark Workhorse Computing [email protected]
  2. nr.gz is available on NCBI Non-redundant sequence data from NIH.

    123GB of BLAST format AA sequences. 257_100_652 sequences (recently). From a dozen to ~100K chars in length.
  3. Q: Is it really unique? NIH doesn’t do its own quality

    control. Catch: There can be duplicate sequences. How to validate uniqueness?
  4. BLAST format Devised by biologists to make us all miserable:

    Prefix ‘>’ starts a sequence. EOF ends the last sequence.

        > arbitrary text...\n
        sequence line...\n
        sequence line...\n
        > and so it goes...\n
        sequence line...\n
  5. Comparing sequences Extract the sequences. Compare all 257_100_652 of them.

    Only 33_050_372_629_412_552 pairs! About 1_048_020 years at 1KHz.
  6. Comparing sequences But wait! Just SHA the sequences! SHA512 has

    collisions. No one-step way to compare via digests.
  7. Comparing sequences Ignore headers. Extract sequences. Generate two outputs: ID

    + Length + Digest ID + Sequence Merge by length + digest. Compare collision sequences.
  8. Read sequences No good way to thread reading: Sequences are

    variable length. Most larger than a filesystem block.
  9. Pick a format. xz: half the space in twice the time:

        91483591675 nr.gz    # gzip -9 -> 91GB
        51888998124 nr.xz    # xz -9e  -> 51GB

        # time xzcat < nr.xz > /dev/null;
        real 110m3.088s    user 109m3.145s    sys 0m58.175s

        # time gzip -dc < nr.gz > /dev/null;
        real 41m28.595s    user 40m16.881s    sys 1m11.103s
  10. Pick a format. De-compressing on NVMe is non-trivial in both cases.

        91483591675 nr.gz    # gzip -9 -> 91GB
        51888998124 nr.xz    # xz -9e  -> 51GB

        # time xzcat < nr.xz > /dev/null;
        real 110m3.088s    user 109m3.145s    sys 0m58.175s

        # time gzip -dc < nr.gz > /dev/null;
        real 41m28.595s    user 40m16.881s    sys 1m11.103s
  11. Processing sequences Result: Single-stream read with parallel processing. Chunk the

    input, pass chunks off to worker threads. Ooooh, worker pools! Q: Who here enjoys fork-exec? IPC? Data pipes? Semaphores?
  12. Processing sequences Result: Single-stream read with parallel processing. Chunk the

    input, pass chunks off to worker threads. A: Nobody. Raku ends your pain.
  13. Processing sequences One approach: sequence handlers subscribe; the reader writes

    each sequence to a channel. Graceful delivery, bookkeeping all under the hood. Nicely described in docs: https://docs.raku.org/language/concurrency
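    A minimal sketch of that channel pattern, separate from the deck's code: workers subscribe to a Channel, the reader sends work, and close lets the workers drain and exit. The worker count and the say body are stand-ins.

        my $seqs = Channel.new;

        my @workers = ( ^4 ).map:
        {
            start
            {
                react
                {
                    whenever $seqs -> $seq
                    {
                        say "processed: $seq";   # stand-in for real sequence work
                    }
                }
            }
        };

        $seqs.send( $_ ) for < one two three >;
        $seqs.close;       # lets the workers finish
        await @workers;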
  14. First step: Read the input. Deal with input format. Chunk

    the data. Pass chunks to worker threads.
  15. Command line arguments --path='X' is input path of data. --threads=N

    count of worker threads. --chunk_size=N size of read chunk.

        unit sub MAIN
        (
            IO::Path:D :$path = $*PROGRAM.dirname.IO.add( '../data/nr.gz' )
          , Int:D :$threads    = 1
          , Int:D :$chunk_size = 18
        );
  16. Command line arguments $path defaults to /exec/path/../data/nr.gz. unit sub MAIN

    ( IO::Path:D :$path = $*PROGRAM.dirname.IO.add( '../data/nr.gz' ) ... );
  17. Command line arguments Splat ('*') is a “twigil”: it marks a

    pre-defined dynamic variable. unit sub MAIN ( IO::Path:D :$path = $*PROGRAM.dirname.IO.add( '../data/nr.gz' ) ... );
  18. Command line arguments Dots are method calls. unit sub MAIN

    ( IO::Path:D :$path = $*PROGRAM.dirname.IO.add( '../data/nr.gz' ) ... );
  19. Command line arguments Dots are method calls. dirname returns String.

    unit sub MAIN ( IO::Path:D :$path = $*PROGRAM.dirname.IO.add( '../data/nr.gz' ) ... );
  20. Command line arguments Dots are method calls. dirname returns String.

    .IO is a constructor. unit sub MAIN ( IO::Path:D :$path = $*PROGRAM.dirname.IO.add( '../data/nr.gz' ) ... );
  21. Command line arguments Dots are method calls. dirname returns String.

    .IO.add( … ) is a method call to the IO object. unit sub MAIN ( IO::Path:D :$path = $*PROGRAM.dirname.IO.add( '../data/nr.gz' ) ... );
  22. ROOT of all evil ROOT is a value. It cannot

    be changed. Re-assignment is a compile-time error. my \ROOT = $*PROGRAM.dirname.IO.add( '../out' ).IO.cleanup.add( $*PROGRAM.basename );
  23. ROOT of all evil Base for paths of output files.

    cleanup resolves “..” and “//”. Not abspath. my \ROOT = $*PROGRAM.dirname.IO.add( '../out' ).IO.cleanup.add( $*PROGRAM.basename );
  24. Housekeeping with localizes its argument. Assigns to $_ by default.

    with ROOT.basename ~ '.' -> $match { .unlink for ROOT.dirname.IO.dir ( test => { .IO.basename ~~ rx{^ $match } } ); }
  25. Housekeeping ROOT is a value: no sigil. with ROOT.basename ~

    '.' -> $match { .unlink for ROOT.dirname.IO.dir ( test => { .IO.basename ~~ rx{^ $match } } ); }
  26. Housekeeping ‘~’ is catenate. with ROOT.basename ~ '.' -> $match

    { .unlink for ROOT.dirname.IO.dir ( test => { .IO.basename ~~ rx{^ $match } } ); }
  27. Housekeeping -> binds the value to the block parameter. with ROOT.basename ~ '.' -> $match

    { .unlink for ROOT.dirname.IO.dir ( test => { .IO.basename ~~ rx{^ $match } } ); }
  28. Housekeeping postfix for iterates $_.unlink. with ROOT.basename ~ '.' ->

    $match { .unlink for ROOT.dirname.IO.dir ( test => { .IO.basename ~~ rx{^ $match } } ); }
  29. Housekeeping Directory scan. with ROOT.basename ~ '.' -> $match {

    .unlink for ROOT.dirname.IO.dir ( test => { .IO.basename ~~ rx{^ $match } } ); }
  30. Housekeeping ‘:’ is syntactic sugar for trailing ( … ).

    Raku isn’t ((((LISP )))):-) with ROOT.basename ~ '.' -> $match { .unlink for ROOT.dirname.IO.dir: test => { .IO.basename ~~ rx{^ $match } } ; }
  31. Pick a format, any format... Input path defines extract command.

        my $extract = gather given $path
        {
            when /[.]gz   $/ { take 'gzip -dc'  }
            when /[.]bz2? $/ { take 'bzip2 -dc' }
            when /[.]xz   $/ { take 'xzcat'     }

            default { die "Unknown file type:'$path' (gz,bz,xz)." }
        };
  32. Pick a format, any format... given localizes its argument to

    $_. when does a smartmatch.

        my $extract = gather given $path
        {
            when /[.]gz   $/ { take 'gzip -dc'  }
            when /[.]bz2? $/ { take 'bzip2 -dc' }
            when /[.]xz   $/ { take 'xzcat'     }

            default { die "Unknown file type:'$path' (gz,bz,xz)." }
        };
  33. Pick a format, any format... gather accumulates the result of

    takes. take can be anywhere: block, sub-call...

        my $extract = gather given $path
        {
            when /[.]gz   $/ { take 'gzip -dc'  }
            when /[.]bz2? $/ { take 'bzip2 -dc' }
            when /[.]xz   $/ { take 'xzcat'     }

            default { die "Unknown file type:'$path' (gz,bz,xz)." }
        };
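    A toy example of that freedom, apart from the deck: take fires inside a sub called under the gather, and the values still land in the result. The sub name and data are invented.

        sub moods ( $day )
        {
            take 'grumpy' if $day eq 'Monday';
            take 'cheery';
        }

        my @all = gather { moods( $_ ) for < Monday Friday > };
        say @all;   # [grumpy cheery cheery]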
  34. Reading the input run forks a shell command. :out makes

    stdout available.

        with run( "$extract $path", :out ).out -> $input
        {
            # read from $input
        }
        else
        {
            die "Failed '$extract $path', $!\n";
        }
  35. Reading the input $input is assigned the stdout handle from

    run.

        with run( "$extract $path", :out ).out -> $input
        {
            # read from $input
        }
        else
        {
            die "Failed '$extract $path', $!\n";
        }
  36. Reading the input The alternative of with is an undefined value in

    the binding. Lands in the else-block to handle failed execution.

        with run( "$extract $path", :out ).out -> $input
        {
            # read from $input
        }
        else
        {
            die "Failed '$extract $path', $!\n";
        }
  37. Reading the input Catch: run over-buffers the input. Works fine

    for “ls -l” or “git status” as commands. Looks like a memory leak with large input. Fix: Use a named pipe.
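    The slides assume the pipe already exists; creating it is one extra step. A sketch, using mkfifo(1) from the OS rather than anything built into Raku:

        my $pipe = $*PROGRAM.dirname.IO.add( 'read.p' );

        $pipe.e or do
        {
            my $proc = run 'mkfifo', $pipe.Str;
            $proc.exitcode == 0 or die "Failed mkfifo: '$pipe'.";
        };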
  38. Find the pipe Generate & sanity check a pipe.

        my $pipe = gather
        with $*PROGRAM.dirname.IO.add( 'read.p' ) -> $sanity
        {
            $sanity.e or die "Non-existent: '$sanity'.";
            $sanity.r or die "Non-readable: '$sanity'.";

            take $sanity;   # doesn’t need to be at the end!

            say "Pipe: '$sanity'";
        }
  39. Write the pipe Shell syntax for extracting from a file

    into a named pipe. with "$extract < $path > $pipe &" -> $cmd { say "Extract: '$cmd'"; start shell $cmd; }
  40. Write the pipe Shell handles fork/exec. with "$extract < $path

    > $pipe &" -> $cmd { say "Extract: '$cmd'"; start shell $cmd; }
  41. Write the pipe start creates a thread. with "$extract <

    $path > $pipe &" -> $cmd { say "Extract: '$cmd'"; start shell $cmd; }
  42. Read the pipe Reading the named pipe: open a file.

        with open $pipe, :r -> $input
        {
            # ... process $input
        }
        else
        {
            die "Failed open: '$pipe', $!";
        }
  43. Read the pipe Raku files don’t auto-close!

        with open $pipe, :r -> $input
        {
            # ... process $input

            close $input;
        }
        else
        {
            die "Failed open: '$pipe', $!";
        }
  44. Progress meter Check if processing stalls. say adds a newline.

        my $chunks = 0;

        start loop
        {
            sleep 5;
            say "$chunks chunks";
        };
  45. Reading the input. Balance chunk size with thread count. Ideally

    keep everything running. One option: readline with m{ ^ > } as end-of-record. Way too slow.
  46. Reading the input. Balance chunk size with thread count. Ideally

    keep everything running. Faster option: Fixed-size, unbuffered read. Chunk the data, deal with trailing records.
  47. Containers & Data Fixed data has a few advantages: Not

    an lvalue, causes compile-time error. Faster: No container object to un-wrap. Variables: update & interpolate.

        my \READ_BYTES = 2 ** 16;
        my \READ_COUNT = 2 ** ( $chunk_size - 16 );
        my \INTERVAL   = 2 ** ( 25 - $chunk_size );
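    A toy illustration of the value / container split, apart from the deck's constants:

        my \LIMIT = 2 ** 16;   # value: no container to unwrap
        my $count = 0;         # container: updatable, interpolates as "$count"

        $count = LIMIT;        # fine: assignment into the container
        # LIMIT = 0;           # error: LIMIT is a value, not an lvalue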
  48. Chunky data read returns a Buffer. Buffers are blobs, not

    text.

        my $chunk = '';

        for ( 1 .. READ_COUNT )
        {
            $chunk ~= $input.read( READ_BYTES ).decode( 'ascii' );
            $input.eof and last;
        }
  49. Chunky data decode produces text. ‘ascii’ is lowest-overhead.

        my $chunk = '';

        for ( 1 .. READ_COUNT )
        {
            $chunk ~= $input.read( READ_BYTES ).decode( 'ascii' );
            $input.eof and last;
        }
  50. Chunky data decode produces text. ‘ascii’ is lowest-overhead. Decoded text

    is appended to the buffer.

        my $chunk = '';

        for ( 1 .. READ_COUNT )
        {
            $chunk ~= $input.read( READ_BYTES ).decode( 'ascii' );
            $input.eof and last;
        }
  51. Chunky data Chunks don’t align with records. state value preserved

    between calls.

        state $next  = '';
        state $chunk = '';

        my \offset = $input.tell - $next.chars;

        $chunk = $next;

        for ( 1 .. READ_COUNT ) { ... }
  52. Chunky data Take feeds gather. No need to know where

    it goes.

        given $chunk.chars
        {
            when CHUNK_CHARS    # complete chunk
            {
                my $i = 1 + $chunk.rindex( '>' );
                $next = $chunk.substr( $i );
                take( offset, $chunk.substr( 0, $i ) );
            }
            when 0              # empty $next after EOF.
            {
                put "\tInput complete, final chunk at $chunks.";
            }
            default             # partial chunk on EOF.
            {
                $next = '';
                take( offset, $chunk ~ '>' );
            }
        }
  53. Chunky data Full, empty, or small chunk.

        given $chunk.chars
        {
            when CHUNK_CHARS    # complete chunk
            {
                my $i = 1 + $chunk.rindex( '>' );
                $next = $chunk.substr( $i );
                take( offset, $chunk.substr( 0, $i ) );
            }
            when 0              # empty $next after EOF.
            {
                put "\tInput complete, final chunk at $chunks.";
            }
            default             # partial chunk on EOF.
            {
                $next = '';
                take( offset, $chunk ~ '>' );
            }
        }
  54. Chunky data Truncate chunk at final ‘>’.

        given $chunk.chars
        {
            when CHUNK_CHARS    # complete chunk
            {
                my $i = 1 + $chunk.rindex( '>' );
                $next = $chunk.substr( $i );
                take( offset, $chunk.substr( 0, $i ) );
            }
            when 0              # empty $next after EOF.
            {
                put "\tInput complete, final chunk at $chunks.";
            }
            default             # partial chunk on EOF.
            {
                $next = '';
                take( offset, $chunk ~ '>' );
            }
        }
  55. Chunky data Take hands back two values.

        given $chunk.chars
        {
            when CHUNK_CHARS    # complete chunk
            {
                my $i = 1 + $chunk.rindex( '>' );
                $next = $chunk.substr( $i );
                take( offset, $chunk.substr( 0, $i ) );
            }
            when 0              # empty $next after EOF.
            {
                put "\tInput complete, final chunk at $chunks.";
            }
            default             # partial chunk on EOF.
            {
                $next = '';
                take( offset, $chunk ~ '>' );
            }
        }
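    The same split on a literal chunk, just to show the arithmetic (example data invented):

        my $chunk = ">a\nAAA\n>b\nCCC\n>c\nGG";
        my $i     = 1 + $chunk.rindex( '>' );

        say $chunk.substr( 0, $i ).raku;   # ">a\nAAA\n>b\nCCC\n>" complete records
        say $chunk.substr( $i ).raku;      # "c\nGG" partial record, carried as $next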
  56. Chunky data Subs are lexical to their defining scope. with

    open $pipe, :r -> $input { sub read-chunk ( --> Bool ) { $input.read( ... ) ... $chunk.Bool } }
  57. Chunky data Signatures are objects, define args & return type.

    with open $pipe, :r -> $input { sub read-chunk ( --> Bool ) { $input.read( ... ) ... $chunk.Bool } }
  58. Chunky data Return value requires explicit cast. with open $pipe,

    :r -> $input { sub read-chunk ( --> Bool ) { $input.read( ... )... $chunk.Bool } }
  59. Chunky data Chunked input handler: take accumulates data for caller.

    False from empty $chunk on EOF. Interface is simple and fast. Takes are passed back...

        sub read-chunk ( --> Bool )
        {
            state $next  = '';
            state $chunk = '';

            my \offset = $input.tell - $next.chars;

            $chunk = $next;

            for ( 1 .. READ_COUNT )
            {
                $chunk ~= $input.read( READ_BYTES ).decode( 'ascii' );
                $input.eof and last;
            }

            given $chunk.chars
            {
                when CHUNK_CHARS
                {
                    my $i = 1 + $chunk.rindex( EOR );
                    $next = $chunk.substr( $i );
                    take( offset, $chunk.substr( 0, $i ) );
                }
                when 0
                {
                    put "\tInput complete, final chunk at $chunks.";
                }
                default
                {
                    $next = '';
                    take( offset, $chunk ~ '>' );
                }
            }

            $chunk.Bool
        }
  60. Sipping from a firehose … to the gather outside them.

    my @chunked-input = gather loop { read-chunk $input or last; };
  61. Sipping from a firehose Catch: @chunked-input reads all of the

    data. my @chunked-input = gather loop { read-chunk $input or last; };
  62. Sipping from a firehose lazy returns a promise. Populated on

    demand. my @chunked-input = lazy gather loop { read-chunk $input or last; };
  63. Sipping from a firehose Catch: @chunked-input still buffers input. Fix:

    \chunked-input is a bare promise. Delivers chunks without buffering them. my \chunked-input = lazy gather loop { read-chunk $input or last; };
  64. Sipping from a firehose Result: single-stream reader. Agnostic to take

    contents. my \chunked-input = lazy gather loop { read-chunk $input or last; };
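    The laziness is easy to see on a toy case, apart from the deck: the infinite loop below never runs ahead of demand.

        my \squares = lazy gather for 1 .. * -> $n
        {
            take $n ** 2;
        };

        say squares[ ^5 ];   # (1 4 9 16 25): only five takes ever fire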
  65. Sipping from a firehose Count chunks for the watchdog thread.

    my \chunked-input = lazy gather loop { read-chunk $input or last; ++$chunks; };
  66. Sipping from a firehose chunked-input promises data. map reads it.

    One chunk at a time. chunked-input .map: { process-chunk |$_ } ;
  67. Sipping from a firehose take( $offset, $chunk ) saves a list object.

    slip flattens object contents. Coordination between reader & processor, not map. chunked-input .map: { process-chunk |$_ } ;
  68. Two-fisted drinking. hyper parallelizes an operator. Output guaranteed to be

    in the order of inputs. Order bookkeeping done under the hood. chunked-input .hyper( degree => $threads, batch => 1 ) .map: { process-chunk |$_ } ;
  69. Two-fisted drinking. Chunks are independent. race() doesn’t care about input

    order. Faster with independent inputs. chunked-input .race( degree => $threads ) .map: { process-chunk |$_ } ;
  70. Two-fisted drinking. Generic parallel dispatch in Raku: promise supplies data.

    map consumes it. chunked-input .race( degree => $threads ) .map: { process-chunk |$_ } ;
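    A toy comparison, independent of the reader: hyper keeps input order, race hands results back as they finish. Degree, batch, and data are invented for the demo.

        my @in = 1 .. 20;

        my @ordered   = @in.hyper( degree => 4, batch => 1 ).map( * ** 2 );
        my @unordered = @in.race(  degree => 4, batch => 1 ).map( * ** 2 );

        say @ordered;          # always 1 4 9 ... 400
        say @unordered.sort;   # same values, arrival order not guaranteed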
  71. Generic Raku ETL. Lazy reader: reader takes a list.

        my \inputs = lazy gather loop { read-next or last };

        sub read-next ( --> Bool ) { ... take (...); $data.Bool };

    Threaded processor: slip flattens it out.

        inputs.hyper( degree => $x ).map: { process |$_ };
        inputs.race(  degree => $x ).map: { process |$_ };
  72. Generic Raku ETL. Two lines of code:

        my \inputs = lazy gather loop { read-next or last };

        inputs.race( degree => $x ).map: { process |$_ };
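    The same skeleton instantiated on something trivial, counting line lengths. The file name and the worker body are invented for the example:

        my $fh = open 'input.txt', :r or die "No input.txt";

        sub read-next ( --> Bool )
        {
            state $n = 0;

            with $fh.get -> $line   # get returns Nil at EOF
            {
                take ( ++$n, $line );
                True
            }
            else
            {
                False
            }
        }

        my \inputs = lazy gather loop { read-next or last };

        inputs.race( degree => 4 ).map: -> ( $n, $line )
        {
            say "$n: { $line.chars } chars";
        };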
  73. Process sequences For each chunk: Locate records. Skip headers. Extract

    sequence. Save Offset, Length, Digest, Sequence. Separate files for staged processing.
  74. Process sequences Two general approaches: Accumulate all records first, open

    and write files. Open files and write records as they are generated. Tradeoff: memory vs. system call rate. Buffer all sequence data, or lots of separate kernel calls.
  75. Take what you need Arguments expanded from |$_. read-chunk’s take

    == process-chunk’s args.

        sub process-chunk
        (
            Int:D \chunk_off
          , Str:D \chunk
          --> Nil
        )
        { ... }
  76. Take what you need Arguments expanded from |$_. Returns nothing

    (map wasn’t assigned).

        sub process-chunk
        (
            Int:D \chunk_off
          , Str:D \chunk
          --> Nil
        )
        { ... }
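    The expansion itself in miniature (sub name and data invented): |$_ flattens each two-element list into the positional arguments.

        sub show ( Int:D \off, Str:D \text --> Nil )
        {
            say "offset {off}: {text}";
        }

        ( ( 0, 'first' ), ( 64, 'second' ) ).map: { show |$_ };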
  77. Basename for output { my \stub = sprintf '%s.%08x', ROOT,

    offset; ... offset makes finding sequences easy. %x sorts lexically.
  78. Take what you need $i & $j are updated, need

    variables. start is static through the loop. my $i = 0; my $j = 0; my @seqs = gather loop { $i = chunk.index( "\n", $j ) or last; $j = chunk.index( '>', $i ) or die ... ; my \start = chunk_off + $j + 1;
  79. Take what you need substr & subst return new Str objects.

    Suitable for daisy-chaining. my \seq = chunk.substr( $i, $j - $i ).subst( "\n", '', :g ); use Digest::MurmurHash3; my \hash = sprintf "%08x\t%08x", seq.chars, murmurhash3_32(seq,0); take [ start, hash, seq ]; }
  80. Take what you need use is lexically scoped. Avoids version

    collisions in multiple parts of code. my \seq = chunk.substr( $i, $j - $i ).subst( "\n", '', :g ); use Digest::MurmurHash3; my \hash = sprintf "%08x\t%08x", seq.chars, murmurhash3_32(seq,0); take [ start, hash, seq ]; }
  81. Take what you need take snags an array. Allows indexing

    on output. my \seq = chunk.substr( $i, $j - $i ).subst( "\n", '', :g ); use Digest::MurmurHash3; my \hash = sprintf "%08x\t%08x", seq.chars, murmurhash3_32(seq,0); take [ start, hash, seq ]; }
  82. Output what’s gathered < quoted word list > for <

    digest 1 4096 sequence 2 4096 > -> Str \name, Int \field, Int \buffsize { }
  83. Output what’s gathered Block signature takes three at a time.

    for < digest 1 4096 sequence 2 4096 > -> Str \name, Int \field, Int \bytes { }
  84. Output what’s gathered Adjust output buffer size. “True” is default,

    “False” is sync. for < digest 1 4096 sequence 2 4096 > -> Str \name, Int \field, Int \bytes { }
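    The three-at-a-time walk in miniature (names and sizes invented): the pointy block's arity pulls triples off the flat word list, and the allomorphic <...> literals satisfy the Int constraints.

        for < alpha 1 4096 beta 2 8192 > -> Str \name, Int \field, Int \bytes
        {
            say "{name}: field {field}, {bytes}-byte buffer";
        }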
  85. Output what’s gathered ( list ).join.IO yields a file. Variable

    simplifies interpolation.

        my $file = ( stub, name, 'tsv' ).join('.').IO;
        $file.e and die "Collision: '$file'";

        my $fh will leave { .close }
        = $file.open: :w, :enc('ascii'), :out-buffer( bytes );

        $fh.say: $_[ 0, field ].join( "\t" ) for @seqs;

        say "\t$fh";
  86. Output what’s gathered Open for write.

        my $file = ( stub, name, 'tsv' ).join('.').IO;
        $file.e and die "Collision: '$file'";

        my $fh will leave { .close }
        = $file.open: :w, :enc('ascii'), :out-buffer( bytes );

        $fh.say: $_[ 0, field ].join( "\t" ) for @seqs;

        say "\t$fh";
  87. Output what’s gathered Phaser LEAVE called on exit from scope.

    Self-closing file handle.

        my $file = ( stub, name, 'tsv' ).join('.').IO;
        $file.e and die "Collision: '$file'";

        my $fh will leave { .close }
        = $file.open: :w, :enc('ascii'), :out-buffer( bytes );

        $fh.say: $_[ 0, field ].join( "\t" ) for @seqs;

        say "\t$fh";
  88. Output what’s gathered Array slice uses ‘$’ not ‘@’. Where

    the sprintf "\t" comes in.

        my $file = ( stub, name, 'tsv' ).join('.').IO;
        $file.e and die "Collision: '$file'";

        my $fh will leave { .close }
        = $file.open: :w, :enc('ascii'), :out-buffer( bytes );

        $fh.say: $_[ 0, field ].join( "\t" ) for @seqs;

        say "\t$fh";
  89. Output what’s gathered Stringy filehandle is the path.

        my $file = ( stub, name, 'tsv' ).join('.').IO;
        $file.e and die "Collision: '$file'";

        my $fh will leave { .close }
        = $file.open: :w, :enc('ascii'), :out-buffer( bytes );

        $fh.say: $_[ 0, field ].join( "\t" ) for @seqs;

        say "\t$fh";
  90. Output what’s gathered Parallel writes. for < digest 1 …

    > { ... start { my $fh will leave { .close } = ...; $fh.say: $_[ 0, field ].join( "\t" ) for @seqs; } }
  91. More than one way... Buffering all of @seqs can use

    a lot of memory. Nice if take is in a closure or metadata. Allows generic process-chunks. Alternative: Immediate write.
  92. Instant gratification Store open files: no “will leave”. Requires

    separate close.

        my @filz = < digest 1 sequence 2 >
        .map: -> Str \name, Int \field
        {
            my $file = ( stub, name, 'tsv' ).join('.').IO;
            $file.e and die "Collision: '$file'";

            |( $file.open( :w, :enc('ascii') ), field )
        };
  93. Instant gratification Block signature takes file handle & offset. loop

    { ... my \start = chunk_off + $j; ... for @filz -> \fh, \field { fh.say: ( start, hash, seq )[ 0, field ] } }
  94. Fewer files 14_000 to 200_000 files in one dir? XFS

    doesn’t mind. Simple to locate sequences for comparison. Smaller files speed up sequence compares.
  95. Fewer files Alternate: Re-cycle files by thread. my \stub =

    sprintf '%s.%02x', ROOT, $*THREAD.id; ... my $fh will leave { .close } = $file.open( :a, :enc('ascii') );
  96. Fewer files Basename uses thread number. Open in append mode

    at start of thread. my \stub = sprintf '%s.%02x', ROOT, $*THREAD.id; ... my $fh will leave { .close } = $file.open( :a, :enc('ascii') );
  97. Fewer files Catch: It is sloooooow. Overhead of append. With

    XFS using many files is twice the speed. my \stub = sprintf '%s.%02x', ROOT, $*THREAD.id; ... my $fh will leave { .close } = $file.open( :a, :enc('ascii') );
  98. Logging

        my \chunked-input
        = lazy gather loop
        {
            state $prior = now;
            state $last  = 0;

            read-chunk $input or last;

            $chunks % INTERVAL or do
            {
                my \after = now;
                my \curr  = $chunks;
                my \rate  = ( ( curr - $last ) * CHUNK_CHARS / Mi / ( after - $prior ) ).Int;

                $prior = after;
                $last  = curr;

                print-stats label => "Chunk $chunks {rate} MiBaud.";
                VM.request-garbage-collection;
            };

            ++$chunks;
        };
  99. Logging Feel free to mix your metaphors. $foo or {value}:

    both work. This is the real power: Choosing what makes sense.

        my \rate = ( ( curr - $last ) * CHUNK_CHARS / Mi / ( after - $prior ) ).Int;

        $prior = after;
        $last  = curr;

        print-stats label => "Chunk $chunks {rate} MiBaud.";
        VM.request-garbage-collection;
        };
        ++$chunks;
        };
  100. Benchmark Gather + Write vs. Immediate Write. 36 cores (via

    taskset) on 3 CPUs. 64 threads in race(). Chunk size 18, 19, 20, 21, 22, 24.
  101. Benchmark

    Wallclock          Minimize.
    System time        Hopefully small.
    User / Wallclock   ~ CPU utilization.
    MBaud              Processing rate.

    Time includes housekeeping. Tradeoffs between file size and inode count.
  102. Benchmark Note: MOAR only uses threads it needs. race( degree=X

    ) != threads dispatched. degree=1000 dispatches ~60 threads.
  103. Benchmark Note: MOAR only uses threads it needs. race( degree=X

    ) != threads dispatched. degree=1000 dispatches ~60 threads. Ideally chunk processing ~ read time.
  104. Benchmark Pass1: Gather + write, 2 ** 17 chunk.

    label  : Chunk 45056 20 MiBaud.
    output : 11
    sample :
        maxrss : +104472
        minflt : +59642
        stime  : +13.371429     13 / 929 ~ small.
        utime  : +929.906223    929 / 25 ~ 36 cores active
        wtime  : +25.104481
  105. Benchmark Pass1: Gather + write, 2 ** 18 chunk. Above

    this it slows down.

    label  : Chunk 212992 21 MiBaud.
    output : 104
    sample :
        maxrss : +3728
        minflt : +33320
        stime  : +12.577622
        utime  : +902.396013    902 / 24 ~ 36 cores active
        wtime  : +24.161267
  106. Benchmark Pass2: Immediate Write, 2**20 chunk. 5380 / 173 ~

    31 active threads, 23 MByte/sec.

    label  : Chunk 286720 23 MiBaud.
    output : 70
    sample :
        maxrss : +23016
        minflt : +243763
        stime  : +60.411462     vs. 12.5 in Pass1.
        utime  : +5380.4308     5380 / 173 ~ 31 cores
        wtime  : +173.413913
  107. Benchmark Pass2: Immediate Write, 2**24 chunk. Best throughput so far.

    label  : Chunk 10240 26 MiBaud.
    output : 40
    sample :
        majflt : +287
        minflt : +1904178
        stime  : +66.829123     66 / 6600 ~ 10% system
        utime  : +6485.117116   6485 / 155 = 41 cores
        wtime  : +155.685644
  108. Benchmark Pass2: Immediate Write, chunk = 2**24

    label  : Total chunks: 141530 (1 .. 228da)
    Final  : True
    output : 70    141 * 2 ** 21 / 12 ~ 25 MiB / sec
    sample :
        inblock : +0
        majflt  : +17
        maxrss  : +13813668
        minflt  : +23557458
        oublock : +0
        stime   : +3955.09245
        utime   : +379836.841958    380 / 12 ~ 32 cores
  109. Benchmark Take + write handles smaller chunks: Less memory. More,

    smaller files. Better core utilization. Immediate write does better with bigger chunks: Fewer threads & files.
  110. Summary Raku is well suited to ETL: Threading, data management

    straightforward. Handles large inputs. Declarative code offers convenient syntax. 20+ years of 20/20 hindsight in dynamic languages.
  111. Summary Even more than more than one way to do

    it: Functional programming is an option with values. Speed advantage on really large tasks. Signatures make list handling more flexible. Use them or don’t, whichever works.
  112. https://docs.raku.org/ Nice documentation. Syntax & how-to. https://docs.raku.org/language.html X to Raku

    guides (show your friends): https://docs.raku.org/language/5to6-nutshell Tutorials. https://docs.raku.org/language/concurrency