my $extract = gather given $path { when /[.]gz $/ { take 'gzip -dc' } when /[.]bz 2? $/ { take 'bzip2 -dc' } when /[.]xz $/ { take 'xzcat' } default { die "Unknown file type:'$path' (gz,bz,xz)." } };
$_. when does a smartmatch. my $extract = gather given $path { when /[.]gz $/ { take 'gzip -dc' } when /[.]bz 2? $/ { take 'bzip2 -dc' } when /[.]xz $/ { take 'xzcat' } default { die "Unknown file type:'$path' (gz,bz,xz)." } };
takes. take can be anywhere: block, sub-call... my $extract = gather given $path { when /[.]gz $/ { take 'gzip -dc' } when /[.]bz 2? $/ { take 'bzip2 -dc' } when /[.]xz $/ { take 'xzcat' } default { die "Unknown file type:'$path' (gz,bz,xz)." } };
$pipe = gather with $*PROGRAM.dirname.IO.add( 'read.p' )->$sanity { $sanity.e or die "Non-existant: '$sanity'."; $sanity.r or die "Non-readable: '$sanity'."; take $sanity; # doesn’t need to be at the end! say "Pipe: '$sanity'"; }
and write files. Open files and write records as they are generated. Tradeoff: memory vs. system call rate. Buffer all sequence data or Lots of separate kernel calls.
variables. start is static through the loop. my $i = 0; my $j = 0; my @seqs = gather loop { $i = chunk.index( "\n", $j ) or last; $j = chunk.index( '>', $i ) or die ... ; my \start = chunk_off + $j + 1;
collisions in multiple parts of code. my \seq = chunk.substr( $i, $j - $i ).subst( "\n", '', :g ); use Digest::MurmurHash3; my \hash = sprintf "%08x\t%08x", seq.chars, murmurhash3_32(seq,0); take [ start, hash, seq ]; }
stub, name, 'tsv' ).join('.').IO; $file.e and die "Collision: '$file'"; my $fh will leave { .close } = $file.open: :w, :enc('ascii') ), :out-buffer( bytes ); $fh.say: $_[ 0, field ].join( "\t" ) for @seqs; say "\t$fh”;
the sprintf “\t” comes in. my $file = ( stub, name, 'tsv' ).join('.').IO; $file.e and die "Collision: '$file'"; my $fh will leave { .close } = $file.open: :w, :enc('ascii') ), :out-buffer( bytes ); $fh.say: $_[ 0, field ].join( "\t" ) for @seqs; say "\t$fh”;
XFS using many files is twice the speed. my \stub = sprintf '%s.%02x', ROOT, $*THREAD.id; ... my $fh will leave { .close } = $file.open( :a, :enc('ascii') );
both work. This is the real power: Choosing what makes sense. my \rate = ( ( curr - $last ) * CHUNK_CHARS / Mi / ( after - $prior ) ).Int; $prior = after; $last = curr; print-stats label => "Chunk $chunks {rate} MiBaud."; VM.request-garbage-collection; }; ++$chunks; };
it: Functional programming an option with values. Speed advantage on really large tasks. Signitures make list handling more flexible. Use them or don’t, whichever works.