
Command-line Filters: Reprised

Starting with the bare basics of command-line pipes and redirection, I go on to cover the standard "Unix filter" form of code and the benefits of nudging old code into this form.

These techniques are my bread-and-butter for exploratory programming and data analysis, and include a few of my best tricks.

Slides will cover variants for (Raku|Perl|Python) and (Mac|Windows|Unix).

Filters are the basic tools of command-line data munging. grep head cut sort uniq wc diff, pipes + redirects, intro + advanced tips. Why stop there? *You* can write your *own*. Perl, Raku, & Python have strong support for writing filters. Existing code often works better when refactored as a filter.

Bruce Gray

July 12, 2023

Transcript

  1. Order of Battle • What are filters? • How to use filters? < | > >>
  2. Order of Battle • What are filters? • How to use filters? < | > >> • Really?
  3. Order of Battle • What are filters? • How to use filters? < | > >> • Really!
  4. Order of Battle • What are filters? • How to use filters? < | > >> • Really! • HTF?
  5. Order of Battle • What are filters? • How to use filters? < | > >> • Really! • HTF? • Now You!
  6. Tedious, and Well Covered Elsewhere • Getting a command line (Terminal, CMD, xterm...) • Current directory (and Windows "Drive Letter") • Text Editor (not Word Processor!) • wsl --install
  7. Very Useful, but Not Covered This Year • Command-line Perl • -e and -E • -n and -p • -a and -F • -l and -0777 • BEGIN and END
  8. perl -0777 -wnE 's{\rMSH}{\nMSH}g; s{\r\z}{\n}; my @L = split "\n";
         s{\A.+\r(OBR)}{$1} for @L; s{\r.+\z}{} for @L; tr{|}{\t} for @L;
         for (@L) {my @F = split "\t"; $ARGV =~ /lab_(\d\d\d\d)-/ or warn; say "$1\t$F[20]"}' lab_*.txt > ../assessions.txt
  9. Redirect, Pipe • seq 6 12 • seq 6 12 > a.txt • seq 6 12 >> a.txt • less a.txt • less < a.txt • seq 6 12 | less
  14. $ seq 6 12 > a.a
      $ grep 1 a.a
      10
      11
      12
      $ grep 1 <a.a
  15. $ seq 6 12 > a.a
      $ grep 1 a.a
      10
      11
      12
      $ grep 1 <a.a
      $ cat a.a | grep 1
  18. $ seq 6 12 > a.a
      $ grep 1 a.a
      10
      11
      12
      $ grep 1 <a.a
      $ cat a.a | grep 1
      $ seq 6 12 | grep 1
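
The commands on slides 14 to 18 all print the same three lines, because a filter only ever sees a stream of lines and cannot tell how the shell supplied it. The following is a minimal Python sketch of that idea, not code from the deck: `grep_lines` is a hypothetical stand-in for `grep 1`, and the in-memory objects are assumptions that play the roles of `< a.a` and of a pipe.

```python
import io

def grep_lines(pattern, stream):
    """Return the lines of `stream` containing `pattern` (hypothetical stand-in for grep)."""
    return [line for line in stream if pattern in line]

# Stand-in for the contents of a.a, i.e. what `seq 6 12` writes.
data = "".join(f"{n}\n" for n in range(6, 13))

# A file-like object, as when the shell attaches a file with `< a.a`.
from_redirect = grep_lines("1", io.StringIO(data))

# A plain iterator of lines, as when another program feeds a pipe.
from_pipe = grep_lines("1", iter(data.splitlines(keepends=True)))

print("".join(from_redirect), end="")
```

In a real script the same function could be handed `sys.stdin`, which is what lets the shell decide where the data comes from.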
  19. https://csvkit.readthedocs.io/ • csvcut • csvgrep • csvclean • csvjoin • csvsort • csvstack • csvformat • csvjson • csvlook • csvpy • csvsql • csvstat • in2csv • sql2csv
  21. my_prog > a.txt
      (edit)
      my_prog > b.txt; diff -u {a,b}.txt | wc
      0
      On Windows, you may need `a.txt b.txt` instead of `{a,b}.txt`
  22. my_prog > a.txt
      (edit)
      my_prog > b.txt; diff -u {a,b}.txt | wc
      my_prog > b.txt; diff -u {a,b}.txt > ab.diff; wc ab.diff
  23. my_prog > a.txt
      (edit)
      my_prog > b.txt; diff -u {a,b}.txt | wc
      my_prog > b.txt; diff -u {a,b}.txt > ab.diff; wc ab.diff
      e ab.diff
  24. my_prog > a.txt
      (edit)
      my_prog > b.txt; diff -u {a,b}.txt | wc
      my_prog > b.txt; diff -u {a,b}.txt > ab.diff; wc ab.diff
      e ab.diff
      mv b.txt a.txt
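
The workflow on slides 21 to 24 captures a program's output before and after an edit, then treats an empty diff as evidence that behavior did not change. A rough Python sketch of the same check follows, using the standard `difflib` module in place of `diff -u | wc`. The function name `diff_line_count` and the sample strings are hypothetical, standing in for the contents of `a.txt` and `b.txt`.

```python
import difflib

def diff_line_count(before, after):
    """Count the lines of a unified diff between two captured outputs."""
    diff = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile="a.txt",
        tofile="b.txt",
    )
    return len(list(diff))

# Hypothetical captured outputs of my_prog before and after an edit.
baseline = "alpha\nbeta\n"
unchanged = "alpha\nbeta\n"
changed = "alpha\ngamma\n"

print(diff_line_count(baseline, unchanged))    # identical outputs give an empty diff
print(diff_line_count(baseline, changed) > 0)  # any difference gives a non-empty diff
```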
  25. grep -Rh '#!' . | sort | uniq -c | sort -nr | head -14
       556 #!./perl
       322 #!/usr/bin/perl -w
       300 #!/usr/bin/perl
       196 #!perl
       178 #!./perl -w
       126 #!perl -w
        37 #!/bin/sh
        31 #!/usr/local/bin/perl
        24 #!/pro/bin/perl
        20 #!/usr/bin/perl -Tw
        19 #!/usr/bin/perl
        15 #! /bin/sh
        13 #!/usr/bin/perl -wT
        13 #!/usr/bin/env perl
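
The `sort | uniq -c | sort -nr | head -14` idiom is a frequency count: group identical lines, count each group, and list the largest groups first. As an illustration only, here is one way the counting stage might look in Python with `collections.Counter`. The function name `top_counts` and the sample lines are hypothetical, and ties may come out in a different order than from `sort -nr`.

```python
from collections import Counter

def top_counts(lines, limit=14):
    """Tally identical lines and return the `limit` most frequent as (line, count) pairs."""
    counts = Counter(line.rstrip("\n") for line in lines)
    return counts.most_common(limit)

# Hypothetical input; in the deck the lines come from `grep -Rh '#!' .`.
sample = ["#!/bin/sh\n", "#!perl\n", "#!perl\n", "#!/bin/sh\n", "#!perl\n", "#!./perl\n"]

# Print in the same "count line" layout that `uniq -c` uses.
for line, count in top_counts(sample):
    print(f"{count:4d} {line}")
```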
  27. __END__
      Adapted from original one-liner:
      perl -wpE 'if ($. != 1) {s/^"\d+",// or die}' v_o.csv | sort | uniq -c | sort -nr
  28. How it works • open my $fh, '<', 'my file.txt'; • Magic cookie
  29. How it works • open my $fh, '<', 'my file.txt'; • Magic cookie • 0,1,2 • STDIN, STDOUT, STDERR • $*IN, $*OUT, $*ERR • Sane defaults, invention of STDERR
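
Slide 29 names the three standard streams, conventionally descriptors 0, 1, and 2. A separate error stream is useful because diagnostics can then reach the terminal without contaminating the data flowing down a pipe. The sketch below is an assumption-laden illustration in Python rather than anything from the deck: `number_lines` is a hypothetical filter, and in-memory streams replace the real ones so that the split is easy to see.

```python
import io
import sys

def number_lines(lines, out=None, err=None):
    """Write numbered non-blank lines to `out`; report blank lines on `err`."""
    out = sys.stdout if out is None else out  # conventionally descriptor 1
    err = sys.stderr if err is None else err  # conventionally descriptor 2
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if text:
            print(f"{number}\t{text}", file=out)
        else:
            print(f"line {number}: blank, skipped", file=err)

# In-memory streams stand in for the real ones so the split is visible.
out, err = io.StringIO(), io.StringIO()
number_lines(["a\n", "\n", "b\n"], out=out, err=err)
print(out.getvalue(), end="")
```

Called with no stream arguments, the same function writes data to standard output and complaints to standard error, so `| sort` downstream would receive only the data.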
  30. use v5.10;    # enables say
      use autodie;  # open() dies on failure
      my $input_path  = './name1_to_edit_by_hand.txt';
      my $output_path = './name2_to_edit_by_hand.txt';
      open my $fh_in,  '<', $input_path;
      open my $fh_out, '>', $output_path;
      while ( my $line = <$fh_in> ) {
          chomp $line;
          my $output = do_something_with($line);  # defined elsewhere
          say {$fh_out} $output;
      }
  33. #!/usr/bin/env python3
      import fileinput
      for line in fileinput.input():
          line = line.rstrip('\r\n')
          print(">>", line, "<<")
      https://docs.python.org/3/library/fileinput.html

      #!/usr/bin/env perl
      use 5.36.0;
      while (<>) {
          chomp;
          say ">> $_ <<";
      }
      https://perldoc.perl.org/perlop#I/O-Operators

      #!/usr/bin/env raku
      use v6;
      for lines() {
          say ">> $_ <<";
      }
      https://docs.raku.org/type/IO::CatHandle
  37. # XXX Remove after testing !!!
      my $default_file = 'C:/path/to/trimmed.dat';
      push @ARGV, $default_file if not @ARGV;
      @ARGV ||= $default_file;
  38. # XXX Remove after testing !!!
      my $default_file = 'C:/path/to/trimmed.dat';
      push @ARGV, $default_file if not @ARGV;
      @ARGV ||= $default_file; # Fails sometimes
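
The trick on slides 37 and 38 supplies a default input file while a filter is being developed, so the script can be run with no arguments. A possible Python counterpart, sketched here under assumptions, passes an explicit file list to `fileinput.input()`. The helper `files_to_read` is hypothetical, and the temporary file exists only to keep the sketch self-contained.

```python
import fileinput
import os
import tempfile

def files_to_read(args, default_file):
    """Return the command-line file arguments, or the default when none were given."""
    return list(args) or [default_file]

# Hypothetical default file, created here only so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".dat", delete=False) as tmp:
    tmp.write("first\nsecond\n")
default_file = tmp.name

# An empty list plays the role of an empty sys.argv[1:].
with fileinput.input(files=files_to_read([], default_file)) as stream:
    lines = [line.rstrip("\r\n") for line in stream]

os.remove(default_file)
print(lines)
```

As with the Perl version, the fallback should come out after testing: while it is in place, a run with no file arguments reads the default file instead of standard input.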
  39. Further Reading
      • https://utcc.utoronto.ca/~cks/space/blog/python/ProgramFilterVsWrapper
        Programs as wrappers versus filters of other programs
      • http://catb.org/~esr/writings/taoup/html/ch01s06.html
        Basics of the Unix Philosophy
  40. Copyright Information: Images and Video
      • Camelia
        © 2009 by Larry Wall
        http://github.com/perl6/mu/raw/master/misc/camelia.txt
      • Yellow Laundry Ticket
        © (assumed) TC Dry Clean Supply of Houston, TX
        Used without permission, even though we are *in* Houston!
        https://drycleansupply.com/product-category/invoice-tag-thermal-ribbon/laundry-ticket-tag/
  41. Copyright Information: This Talk
      This work is licensed under a Creative Commons Attribution 4.0 International License. CC BY
      https://creativecommons.org/licenses/by/4.0/
      (email me for the original Apple Keynote .key file)
  42. History
      • v 0.98 2022-06-02
        Presented incomplete version to Atlanta Perlmongers
      • v 1.00 2022-06-24
        Presented final version to The Perl and Raku Conference in Houston, TX, USA
      • v 1.01 2023-07-12
        Reprised at The Perl and Raku Conference in Toronto, Ontario, CA