Slide 1

Slide 1 text

Refactoring and Readability:
 Crouching Regex, Hidden Structures
 Perl & Raku Conference in the Cloud
 
 2020-06-24

Slide 2

Slide 2 text

Refactoring and Readability 1.5:
 The Pre-Sequel http://speakerdeck.com/util
 
 <<< >>> >>> <<<

Slide 3

Slide 3 text

Refactoring and Readability 2:
 Crouching Regex, Hidden Structures

Slide 4

Slide 4 text

Or

Slide 5

Slide 5 text

Refactoring and Readability 1.5 :
 The Pre-Sequel

Slide 6

Slide 6 text

/me
 
 Bruce Gray
 
 'Util'

Slide 7

Slide 7 text

Perl 6 ==> Raku

Slide 8

Slide 8 text

Face Off ( Rubin's Vase )

Slide 9

Slide 9 text

Fun for the whole family • Regular Expressions (Regex) • Multi-dimensional Data Structures • Trees •

Slide 13

Slide 13 text

Fun for the whole family • Regular Expressions (Regex) • Multi-dimensional Data Structures • Trees • Q & A + Bonus!!!

Slide 14

Slide 14 text

Sinkholes
 in the
 space/code continuum or, how do you decide what to learn, when you cannot judge its value until *after* you learn it?

Slide 15

Slide 15 text

Last time, on Dragonball Z

Slide 16

Slide 16 text

Modes Fix Add Refactor

Slide 17

Slide 17 text


 Forces Surrounding Code Code Easy to write Standard form Ease of change Easy to read Performance Boundaries of Responsibility

Slide 18

Slide 18 text


 Forces Surrounding Code Code Easy to write Standard form Ease of change Easy to read Performance Boundaries of Responsibility

Slide 19

Slide 19 text

Regular Expressions
 Regex

Slide 20

Slide 20 text

–Jamie Zawinski “Some people, when confronted with a problem, think
 `I know, I'll use regular expressions.`
 Now they have two problems.”

Slide 21

Slide 21 text

         File Globbing             Regex
.        Literal dot               Any single character
?        Any single character      Quantifier: Once or none
*        Any string without '/'    Quantifier: Zero or more
+                                  Quantifier: One or more
{2,4}    Alternation: '2' or '4'   Quantifier: Two to four times
(2|4)                              Alternation: '2' or '4'
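The differences in that comparison bite hardest when a globbing habit leaks into a regex. A minimal Perl sketch, using file names that are made up for illustration, shows the unescaped dot and the star behaving as regex metacharacters rather than as glob symbols:

```perl
use strict;
use warnings;

# Hypothetical file names, made up for illustration.
my @files = ( 'notes.txt', 'notes_txt', 'note.txt', 'nooootes.txt' );

# In a regex, an unescaped '.' matches any single character...
my @any_char    = grep { /^notes.txt$/ } @files;
# ...so a literal dot has to be escaped.
my @literal_dot = grep { /^notes\.txt$/ } @files;
# In a regex, '*' repeats the previous item; it does not mean "any string".
my @star        = grep { /^no*tes\.txt$/ } @files;

print "any char:    @any_char\n";
print "literal dot: @literal_dot\n";
print "star:        @star\n";
```

The first pattern also accepts `notes_txt`, which is usually not what was intended.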

Slide 22

Slide 22 text

ack  https://beyondgrep.com/
ag   https://github.com/ggreer/the_silver_searcher
rg   https://blog.burntsushi.net/ripgrep/

Slide 23

Slide 23 text

$ find . -not -type d
./SS_20200503/Screen Shot 2020-05-01 at 4.45.30 PM.png
./SS_20200503/Screen Shot 2020-04-28 at 1.29.14 PM.png
./SS_20200503/Screen Shot 2020-04-24 at 8.08.38 PM.png
./SS_20200503/Screen Shot 2020-04-28 at 2.35.18 PM.png
./SS_20200527/Screen Shot 2020-05-03 at 7.40.14 PM.png
…

Slide 24

Slide 24 text

./SS_20200503/Screen Shot 2020-05-01 at 4.45.30 PM.png
./SS_20200503/Screen Shot 2020-04-28 at 1.29.14 PM.png
./SS_20200503/Screen Shot 2020-04-24 at 8.08.38 PM.png
./SS_20200503/Screen Shot 2020-04-28 at 2.35.18 PM.png
./SS_20200527/Screen Shot 2020-05-03 at 7.40.14 PM.png
…

Slide 26

Slide 26 text

./SS_20200503/Screen Shot 2020-05-01 at 4.45.30 PM.png

Slide 27

Slide 27 text

./SS_20200503/Screen Shot 2020-05-01 at 4.45.30 PM.png

Slide 28

Slide 28 text

./SS_20200503/Screen Shot 2020-05-01 at 4.45.30 PM.png

Slide 29

Slide 29 text

Screen Shot 2020-05-01 at 4.45.30 PM.png

Slide 30

Slide 30 text

Screen Shot 2020-05-01 at 4.45.30 PM.png

Slide 31

Slide 31 text

Screen Shot 2020-05-01 at 4.45.30 PM.png

Slide 32

Slide 32 text

Screen Shot 2020-05-01 at 4.45.30 PM.png ^ ^ ^

Slide 33

Slide 33 text

Screen Shot 2020-05-01 at 4.45.30 PM.png

Slide 34

Slide 34 text

Screen Shot 2020-\d5-01 at 4.45.30 PM.png

Slide 35

Slide 35 text

Screen Shot 2020-\d\d-01 at 4.45.30 PM.png

Slide 36

Slide 36 text

Screen Shot 2020-\d\d-\d1 at 4.45.30 PM.png

Slide 37

Slide 37 text

Screen Shot 2020-\d\d-\d\d at 4.45.30 PM.png

Slide 38

Slide 38 text

Screen Shot 2020-\d\d-\d\d at \d.45.30 PM.png

Slide 39

Slide 39 text

Screen Shot 2020-\d\d-\d\d at \d\.45.30 PM.png

Slide 40

Slide 40 text

Screen Shot 2020-\d\d-\d\d at \d\.\d5.30 PM.png

Slide 41

Slide 41 text

Screen Shot 2020-\d\d-\d\d at \d\.\d\d.30 PM.png

Slide 42

Slide 42 text

Screen Shot 2020-\d\d-\d\d at \d\.\d\d\.30 PM.png

Slide 43

Slide 43 text

Screen Shot 2020-\d\d-\d\d at \d\.\d\d\.\d0 PM.png

Slide 44

Slide 44 text

Screen Shot 2020-\d\d-\d\d at \d\.\d\d\.\d\d PM.png

Slide 45

Slide 45 text

Screen Shot 2020-\d\d-\d\d at \d\.\d\d\.\d\d PM\.png

Slide 46

Slide 46 text

Screen Shot 2020-\d\d-\d\d at \d\.\d\d\.\d\d PM\.png$

Slide 47

Slide 47 text

Screen Shot 2020-\d\d-\d\d at \d\.\d\d\.\d\d PM\.png$

Slide 48

Slide 48 text

Screen Shot 2020-\d\d-\d\d at \d\.\d\d\.\d\d PM\.png$

Slide 49

Slide 49 text

Screen Shot 2020-\d\d-\d\d at \d\.\d\d\.\d\d PM\.png$

Slide 50

Slide 50 text

'Screen Shot 2020-\d\d-\d\d at \d\.\d\d\.\d\d PM\.png$'

Slide 51

Slide 51 text

$ find . -not -type d | ack -v \
    'Screen Shot 2020-\d\d-\d\d at \d\.\d\d\.\d\d PM\.png$'

Slide 58

Slide 58 text

$ find . -not -type d | ack -v \
    'Screen Shot 2020-\d\d-\d\d at \d\.\d\d\.\d\d PM\.png$'
./SS_20190828/Screen Shot 2019-08-10 at 9.28.52 PM.png
./SS_20190828/Screen Shot 2019-08-15 at 8.59.19 PM.png
…

Slide 61

Slide 61 text

$ find . -not -type d | ack -v \
    'Screen Shot 20\d\d-\d\d-\d\d at \d\.\d\d\.\d\d PM\.png$'

Slide 62

Slide 62 text

$ find . -not -type d | ack -v \
    'Screen Shot 20\d\d-\d\d-\d\d at \d\.\d\d\.\d\d PM\.png$'
./SS_20200503/Screen Shot 2020-04-27 at 7.14.42 AM.png
./SS_20190828/Screen Shot 2019-08-21 at 8.04.16 AM.png
…

Slide 65

Slide 65 text

$ find . -not -type d | ack -v \
    'Screen Shot 20\d\d-\d\d-\d\d at \d\.\d\d\.\d\d [AP]M\.png$'
./SS_20200503/Screen Shot 2020-04-27 at 7.14.42 AM.png
./SS_20190828/Screen Shot 2019-08-21 at 8.04.16 AM.png
…

Slide 66

Slide 66 text

$ find . -not -type d | ack -v \
    'Screen Shot 20\d\d-\d\d-\d\d at \d\.\d\d\.\d\d [AP]M\.png$'

Slide 67

Slide 67 text

$ find . -not -type d | ack -v \
    'Screen Shot 20\d\d-\d\d-\d\d at \d\.\d\d\.\d\d [AP]M\.png$'
./SS_20200503/Screen Shot 2020-04-29 at 11.52.35 PM.png
./SS_20190828/Screen Shot 2019-07-31 at 10.14.57 AM.png

Slide 70

Slide 70 text

$ find . -not -type d | ack -v \
    'Screen Shot 20\d\d-\d\d-\d\d at \d{}\.\d\d\.\d\d [AP]M\.png$'
./SS_20200503/Screen Shot 2020-04-29 at 11.52.35 PM.png
./SS_20190828/Screen Shot 2019-07-31 at 10.14.57 AM.png

Slide 72

Slide 72 text

$ find . -not -type d | ack -v \
    'Screen Shot 20\d\d-\d\d-\d\d at \d{1,2}\.\d\d\.\d\d [AP]M\.png$'
./SS_20200503/Screen Shot 2020-04-29 at 11.52.35 PM.png
./SS_20190828/Screen Shot 2019-07-31 at 10.14.57 AM.png

Slide 73

Slide 73 text

$ find . -not -type d | ack -v \
    'Screen Shot 20\d\d-\d\d-\d\d at \d{1,2}\.\d\d\.\d\d [AP]M\.png$'

Slide 74

Slide 74 text

$ find . -not -type d | ack -v \
    'Screen Shot 20\d\d-\d\d-\d\d at \d{1,2}\.\d\d\.\d\d [AP]M\.png$'
./SS_20200111/color_wheel_2.numbers
./SS_20200111/color_wheel_1.numbers
./SS_20200131/RenFlorenceWalk.mp3
./SS_20200131/spa-barcelona-city.mp3
./SS_20200131/six_flags_20190112.txt
./SS_20190716/c_04.png
./SS_20190716/c_77_edge.png
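The same filter can be sketched in plain Perl, which makes it easier to space the pattern out and comment it. This is an illustrative equivalent of the `ack -v` step, not part of the original command line; the variable names are assumptions, and the paths are a handful taken from the listings above.

```perl
use strict;
use warnings;

# The final pattern, written with /x so it can be spaced out.
# In /x mode a literal space must be written as '[ ]'.
my $screenshot_re = qr{
    Screen [ ] Shot [ ]
    20\d\d - \d\d - \d\d        # date: YYYY-MM-DD
    [ ] at [ ]
    \d{1,2} \. \d\d \. \d\d     # time: H.MM.SS or HH.MM.SS
    [ ] [AP]M
    \.png $
}x;

my @paths = (
    './SS_20200503/Screen Shot 2020-05-01 at 4.45.30 PM.png',
    './SS_20200503/Screen Shot 2020-04-27 at 7.14.42 AM.png',
    './SS_20200503/Screen Shot 2020-04-29 at 11.52.35 PM.png',
    './SS_20200111/color_wheel_2.numbers',
    './SS_20190716/c_04.png',
);

# Same idea as `ack -v`: keep only the lines that do NOT match.
my @unexpected = grep { !/$screenshot_re/ } @paths;
print "$_\n" for @unexpected;
```

Only the two non-screenshot paths survive the filter.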

Slide 75

Slide 75 text

https://xkcd.com/208/

Slide 76

Slide 76 text

No content

Slide 77

Slide 77 text

No content

Slide 78

Slide 78 text

Wait, forgot to escape a space. Wheeeeee[taptaptap]eeeeee.

Slide 79

Slide 79 text

http://regex101.com/

Slide 80

Slide 80 text

https://regexper.com/

Slide 81

Slide 81 text

$ find . -not -type d | ack -v \
    'Screen Shot 20\d\d-\d\d-\d\d at \d{1,2}\.\d\d\.\d\d [AP]M\.png$'
./SS_20200111/color_wheel_2.numbers
./SS_20200111/color_wheel_1.numbers
./SS_20200131/RenFlorenceWalk.mp3
./SS_20200131/spa-barcelona-city.mp3
./SS_20200131/six_flags_20190112.txt
./SS_20190716/c_04.png
./SS_20190716/c_77_edge.png

Slide 82

Slide 82 text

• Navigation • Extraction • Exploration • Validation •

Slide 83

Slide 83 text

ftp://www13.warehouse.org/path/file-listing
http://www.lexcorp.com/
http://www.umbrella.com/hive/plan_for_reopening

Slide 84

Slide 84 text

my $end_of_prefix = -1;
for my $scheme ( qw( ftp http https ) ) {
    my $prefix = $scheme . '://';
    my $p_len  = length $prefix;
    if ( substr( $url, 0, $p_len ) eq $prefix ) {
        $end_of_prefix = $p_len;
        last;
    }
}
if ( $end_of_prefix == -1 ) {
    warn "Unexpected URL format: $url";
    next;
}
my $slash_pos = index $url, '/', $end_of_prefix;
if ( $slash_pos == -1 ) {
    warn "Unexpected URL format: $url";
    next;
}
my $dot_pos = rindex $url, '.', $slash_pos;
if ( $dot_pos == -1 ) {
    warn "Unexpected URL format: $url";
    next;
}
my $start = $end_of_prefix;
if ( my $i = index substr( $url, $start, $dot_pos - $start ), '.' ) {
    $start += $i + 1 if $i != -1;
}
say substr $url, $start, $dot_pos - $start;

Slide 85

Slide 85 text

my $url_re = qr{
    ^
    (https?|ftp) ://
    ([^/]+ \.)?
    ([^/]+) \.
    ([^/\.]+)
    /
}msx;
$url =~ /$url_re/
    or warn "Unexpected URL format: $url"
    and next;
say $3;
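A runnable sketch of the same extraction, applied to the three sample URLs. It keeps the shape of the pattern above but swaps the numbered groups for named captures; the capture names (`scheme`, `subdomain`, `domain`, `tld`) are assumptions chosen for this illustration.

```perl
use strict;
use warnings;

# Same shape as the pattern above, with named captures.
# The capture names are assumptions chosen for this sketch.
my $url_re = qr{
    ^
    (?<scheme> https? | ftp ) ://
    (?<subdomain> [^/]+ \. )?
    (?<domain> [^/]+ ) \.
    (?<tld> [^/.]+ )
    /
}x;

my @urls = (
    'ftp://www13.warehouse.org/path/file-listing',
    'http://www.lexcorp.com/',
    'http://www.umbrella.com/hive/plan_for_reopening',
);

my @domains;
for my $url (@urls) {
    if ( $url =~ $url_re ) {
        push @domains, $+{domain};
    }
    else {
        warn "Unexpected URL format: $url";
    }
}
print "$_\n" for @domains;
```

Reading `$+{domain}` says what is being extracted, where `$3` requires counting parentheses.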

Slide 86

Slide 86 text

• Navigation • Extraction • Exploration • Validation •

Slide 87

Slide 87 text

• Navigation • Extraction • Exploration • Validation • Understanding

Slide 88

Slide 88 text

find . -ls

Slide 89

Slide 89 text

$ find . -ls
48707625 0 drwxr-xr-x 38 bruce_pro staff 1216 Jun 11 09:32 ./SS_20200503
37742599 0 drwxr-xr-x 73 bruce_pro staff 2336 Jul 31 2019 ./SS_20190828 - HPMoR

Slide 90

Slide 90 text

48707625 0 drwxr-xr-x 38 bruce_pro staff 1216 Jun 11 09:32 ./SS_20200503
37742599 0 drwxr-xr-x 73 bruce_pro staff 2336 Jul 31 2019 ./SS_20190828 - HPMoR

my ( $inode, $blocks, $perms, $links, $owner, $group, $size, $mod_time, $path )
    = split /\s+/, $line;

Slide 91

Slide 91 text

48707625 0 drwxr-xr-x 38 bruce_pro staff 1216 Jun 11 09:32 ./SS_20200503 37742599 0 drwxr-xr-x 73 bruce_pro staff 2336 Jul 31 2019 ./SS_20190828 - HPMoR ^^ ^^ my ( $inode, $blocks, $perms, $links, $owner, $group, $size, $mod_time, $path ) = split /\s+/, $line;

Slide 92

Slide 92 text

48707625 0 drwxr-xr-x 38 bruce_pro staff 1216 Jun 11 09:32 ./SS_20200503
37742599 0 drwxr-xr-x 73 bruce_pro staff 2336 Jul 31 2019 ./SS_20190828 - HPMoR

my ( $inode, $blocks, $perms, $links, $owner, $group, $size, $mod_time, $path )
    = split /\s+/, $line, 9;

Slide 94

Slide 94 text

my @F = split /\s+/, $line, 9;
shift @F if $F[0] eq '';
my ( $inode, $blocks, $perms, $links, $owner, $group, $size, $mod_time, $path )
    = @F;

Slide 95

Slide 95 text

my @F = split /\s+/, $line, 9;
if ( $F[0] eq '' ) {
    ( my $empty, @F ) = split /\s+/, $line, 10;
}
my ( $inode, $blocks, $perms, $links, $owner, $group, $size, $mod_time, $path )
    = @F;

Slide 96

Slide 96 text

$line =~ s{^\s+}{};
my ( $inode, $blocks, $perms, $links, $owner, $group, $size, $mod_time, $path )
    = split /\s+/, $line, 9;
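A variation worth knowing, offered as a sketch rather than as the code from the slides: `split ' '` (a single-space string, not a regex) skips leading whitespace by itself, so the separate trimming step is not needed. The sample line is adapted from the listing above with leading whitespace added, and the field list here assumes the date always takes three whitespace-separated fields; the names `$month`, `$day`, and `$time_or_year` are hypothetical.

```perl
use strict;
use warnings;

# Sample line with leading whitespace and a path that contains spaces.
my $line = '  37742599 0 drwxr-xr-x 73 bruce_pro staff 2336 Jul 31  2019 ./SS_20190828 - HPMoR';

# split ' ' ignores leading whitespace and splits on runs of whitespace.
# Assumption: the date always occupies three fields, giving 11 fields in all.
# The limit of 11 keeps the rest of the line, spaces included, in $path.
my ( $inode, $blocks, $perms, $links, $owner, $group, $size,
     $month, $day, $time_or_year, $path ) = split ' ', $line, 11;

print "size: $size\n";
print "date: $month $day $time_or_year\n";
print "path: $path\n";
```

The limit argument is what protects the path: without it, `./SS_20190828 - HPMoR` would be split into three pieces.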

Slide 98

Slide 98 text

48707625 0 drwxr-xr-x 38 bruce_pro staff 1216 Jun 11 09:32 ./SS_20200503
37742599 0 drwxr-xr-x 73 bruce_pro staff 2336 Jul 31 2019 ./SS_20190828 - HPMoR

$line =~ s{^\s+}{};
my ( $inode, $blocks, $perms, $links, $owner, $group, $size, $mod_time, $path )
    = split /\s+/, $line, 9;

Slide 99

Slide 99 text

$line =~ s{^\s+}{};
my ( $inode, $blocks, $perms, $links, $owner, $group, $size, $mod_time, $path )
    = split /\s+/, $line, 9;
my ( $year, $hour, $minute );
if ( substr($mod_time, 2, 1) eq ':' ) {
    ($hour, $minute) = split ':', $mod_time;
}
else {
    $year = $mod_time
}

Slide 101

Slide 101 text

48707625 0 drwxr-xr-x 38 bruce_pro staff 1216 Jun 11 09:32 ./SS_20200503
37742599 0 drwxr-xr-x 73 bruce_pro staff also 2336 Jul 31 2019 ./SS_20190828 - HPMoR

$line =~ s{^\s+}{};
my ( $inode, $blocks, $perms, $links, $owner, $group, $size, $mod_time, $path )
    = split /\s+/, $line, 9;
my ( $year, $hour, $minute );
if ( substr($mod_time, 2, 1) eq ':' ) {
    ($hour, $minute) = split ':', $mod_time;
}
else {
    $year = $mod_time
}

Slide 102

Slide 102 text

48707625 0 drwxr-xr-x 38 bruce_pro staff 1216 Jun 11 09:32 ./SS_20200503
37742599 0 drwxr-xr-x 73 bruce_pro staff 2336 Jul 31 2019 ./SS_20190828 - HPMoR

my $mod_re = qr{
    (?<MONTH> Jan|Feb|Mar|Apr|May|Jun|
              Jul|Aug|Sep|Oct|Nov|Dec)
    [ ]{1,2}
    (?<DAY> [1-3]?\d)
    [ ]
    (?:
        (?<TIME> [0-2]\d:[0-5]\d)
      | [ ](?<YEAR> \d\d\d\d)
    )
}msx;

Slide 108

Slide 108 text

my $find_ls_re = qr{
    \A
    \s*
    (?<INODE>    \d+ )   \s+
    (?<BLOCKS>   \d+ )   \s+
    (?<PERMS>    [-dlcbsp]
                 [-r] [-w] [-xs]
                 [-r] [-w] [-xsS]
                 [-r] [-w] [-xtT] )   \s+
    (?<LINKS>    \d+ )   \s+
    (?<OWNER>    \w+ )   \s+
    (?<GROUP>    \w+ )   \s+
    (?<SIZE>     \d+ )   \s+
    (?<MODIFIED> $mod_re )   \s+
    (?<PATH>     .+? )
    \s*
    \Z
}msx;

Slide 120

Slide 120 text

$line =~ /$find_ls_re/
    or die "Failed to match '$line'";
print $line if $+{SIZE} > 1024;
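To see the pieces working together, here is a self-contained sketch that composes a date sub-pattern into a line pattern and reads fields back through `%+`. Only the capture names `PERMS` and `SIZE` are attested in the slides; every other capture name is an assumption made for this example. The second sample line uses two spaces before the year, as the date sub-pattern expects.

```perl
use strict;
use warnings;

# Capture names other than PERMS and SIZE are assumptions for this sketch.
my $mod_re = qr{
    (?<MONTH> Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec )
    [ ]{1,2}
    (?<DAY> [1-3]?\d )
    [ ]
    (?: (?<TIME> [0-2]\d:[0-5]\d ) | [ ](?<YEAR> \d\d\d\d ) )
}x;

my $find_ls_re = qr{
    \A \s*
    (?<INODE>    \d+ ) \s+
    (?<BLOCKS>   \d+ ) \s+
    (?<PERMS>    [-dlcbsp] [-r][-w][-xsS] [-r][-w][-xsS] [-r][-w][-xtT] ) \s+
    (?<LINKS>    \d+ ) \s+
    (?<OWNER>    \w+ ) \s+
    (?<GROUP>    \w+ ) \s+
    (?<SIZE>     \d+ ) \s+
    (?<MODIFIED> $mod_re ) \s+
    (?<PATH>     .+? )
    \s* \z
}x;

my @lines = (
    '48707625 0 drwxr-xr-x 38 bruce_pro staff 1216 Jun 11 09:32 ./SS_20200503',
    '37742599 0 drwxr-xr-x 73 bruce_pro staff 2336 Jul 31  2019 ./SS_20190828 - HPMoR',
);

# Collect the paths of entries larger than 1024 bytes.
my @big;
for my $line (@lines) {
    $line =~ $find_ls_re or die "Failed to match '$line'";
    push @big, $+{PATH} if $+{SIZE} > 1024;
}
print "$_\n" for @big;
```

Because the pattern describes the whole line, a line that does not fit fails loudly instead of silently producing shifted fields.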

Slide 122

Slide 122 text

grammar find_dash_ls {
    rule TOP {
        \s* <inode> <blocks> <perms> <links> <owner> <group> <size> <modified> <path>
    }
    token inode  { \d+ }
    token blocks { \d+ }
    token perms  {
        <[\-dlcbsp]>
        <[\-r]> <[\-w]> <[\-xs]>
        <[\-r]> <[\-w]> <[\-xsS]>
        <[\-r]> <[\-w]> <[\-xtT]>
    }
    token links  { \d+ }
    token owner  { \w+ }
    token group  { \w+ }
    token size   { \d+ }
    token path   { \S .* }

    constant @days   = 1 .. 31;
    constant @months = <Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec>;
    token month  { @months }
    token day    { @days }
    token hour   { <[0..2]> \d }
    token minute { <[0..5]> \d }
    token year   { \d\d\d\d }
    rule modified {
        <month> <day> [ <year> || <hour> ':' <minute> ]
    }
}

Slide 124

Slide 124 text

48707625 0 drwxr-xr-x 38 bruce_pro staff 1216 Jun 11 09:32 ./SS_20200503
37742599 0 drwxr-xr-x 73 bruce_pro staff also 2336 Jul 31 2019 ./SS_20190828 - HPMoR

Slide 125

Slide 125 text

48707625 0 drwxr-xr-x 38 bruce_pro staff 1216 Jun 11 09:32 ./SS_20200503
37742599 0 drwxr-xr-x 73 bruce_pro staff also 2336 Jul 31 2019 ./SS_20190828 - HPMoR

$line =~ /$find_ls_re/
    or die "Failed to match '$line'";

Slide 126

Slide 126 text

48707625 0 drwxr-xr-x 38 bruce_pro staff 1216 Jun 11 09:32 ./SS_20200503
37742599 0 drwxr-xr-x 73 bruce_pro staff also 2336 Jul 31 2019 ./SS_20190828 - HPMoR

#use re 'debug'; # Uncomment to get a trace.
$line =~ /$find_ls_re/
    or die "Failed to match '$line'";

18 < 0 > | 5| 20:OPEN3 'PERMS'(22)
18 < 0 > | 5| 22:ANYOF[\-bcdlps](33)
19 < 0 d> | 5| 33:ANYOF[\-r](44)

Slide 127

Slide 127 text

• Navigation • Extraction • Exploration • Validation • Understanding

Slide 128

Slide 128 text

No content

Slide 129

Slide 129 text

No content

Slide 130

Slide 130 text

Regex Resources
 • http://regex.info/book.html
   The O'Reilly book "Mastering Regular Expressions"
   - Perl, Python, and more…even covers .NET!
 • https://www.rubyguides.com/2015/06/ruby-regex/
   Ruby-oriented
 • https://towardsdatascience.com/regular-expressions-explained-c9bce508e672
   Python-oriented
 • https://learning.oreilly.com/videos/understanding-regular-expressions/9781491996300
   Damian Conway's 5-hour video tutorial (subscriber-only)

Slide 131

Slide 131 text

Multi-Dimensional Data Structures

Slide 132

Slide 132 text

Single-Dimensional Data Structures • Array: List, Vector, Stack, Queue…
 
 
 • Hash: Map, Record, Dictionary…

Slide 133

Slide 133 text

Single-Dimensional Data Structures • Array; List, Vector, Stack, Queue…
 
 
 • Hash; Map, Record, Dictionary…

Slide 134

Slide 134 text

AoWhat?
 • AoA      Array of Arrays (or List of Lists)
 • AoH      Array of Hashes
 • HoA      Hash of Arrays
 • HoH      Hash of Hashes
 • AoHoHoA  Array of Hashes of Hashes of Arrays

Slide 135

Slide 135 text

Atlanta.pm says: • Who could conceptualize more than 3 levels?

Slide 136

Slide 136 text

Atlanta.pm says: • Who could conceptualize more than 3 levels?

Slide 137

Slide 137 text

Atlanta.pm says: • Who could conceptualize more than 3 levels?

Slide 138

Slide 138 text

Util 1.1.1

Slide 139

Slide 139 text

AoHoHoA • A Hospital •

Slide 140

Slide 140 text

AoHoHoA • A Hospital • has multiple numbered floors •

Slide 141

Slide 141 text

AoHoHoA • A Hospital • has multiple numbered floors • with multiple wards/units per floor •

Slide 142

Slide 142 text

AoHoHoA • A Hospital • has multiple numbered floors • with multiple wards/units per floor • with multiple nurses per ward •

Slide 143

Slide 143 text

AoHoHoA • A Hospital • has multiple numbered floors • with multiple wards/units per floor • with multiple nurses per ward • with a list of patients, in room-number order •

Slide 144

Slide 144 text

AoHoHoA • A Hospital • has multiple numbered floors • with multiple wards/units per floor • with multiple nurses per ward • with a list of patients, in room-number order • $hospital->[3]{'MedSurg'}{'Sarah'}[4];
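A small sketch of how such a structure might be built and read. The nurse and patient names below are hypothetical placeholders, and only one floor is populated; the point is that each bracket or brace in the access path peels off one level of the description.

```perl
use strict;
use warnings;

# All names and patients below are hypothetical, made up for illustration.
my $hospital = [];    # Array: one element per floor number

# Autovivification creates the intermediate hashes and arrays on first use.
push @{ $hospital->[3]{'MedSurg'}{'Sarah'} },
    'Patient A', 'Patient B', 'Patient C', 'Patient D', 'Patient E';
push @{ $hospital->[3]{'MedSurg'}{'Tom'} }, 'Patient F';

# Floor 3, MedSurg ward, nurse Sarah, fifth patient (index 4).
my $patient = $hospital->[3]{'MedSurg'}{'Sarah'}[4];
print "$patient\n";

# Walk one level at a time: the nurses on a given ward.
my @nurses = sort keys %{ $hospital->[3]{'MedSurg'} };
print "Nurses: @nurses\n";
```

Reading the access expression left to right mirrors the bullet list: floor, ward, nurse, patient.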

Slide 145

Slide 145 text

Herman Munster
1313 Mockingbird Lane
Mockingbird Heights

Sherlock Holmes
221b Baker Street
London
…

Slide 146

Slide 146 text

@name = (
    'Herman Munster',
    'Sherlock Holmes',
    # …
);
@addr = …
@city = …

Slide 147

Slide 147 text

for my $i ( keys @name ) {
    say join ':', $name[$i], $addr[$i], $city[$i];
}

Slide 148

Slide 148 text

for my $i ( reverse keys @name ) {
    next if $name[$i] !~ /herman/i;
    splice @name, $i, 1;
    splice @addr, $i, 1;
    splice @city, $i, 1;
}

Slide 150

Slide 150 text

my @to_keep = grep { $name[$_] !~ /herman/i } keys @name;
@name = @name[@to_keep];
@addr = @addr[@to_keep];
@city = @city[@to_keep];

Slide 152

Slide 152 text

https://perldoc.perl.org/functions/keys.html

keys HASH
keys ARRAY

Called in list context, returns a list consisting of all the keys of the named hash, or in Perl 5.12 or later only, the indices of an array.
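A quick sketch of what that documentation means in practice, reusing the sample names and cities from earlier: on an array, `keys` yields the indices, which is what makes it convenient for walking parallel arrays in step.

```perl
use 5.012;    # keys on arrays needs Perl 5.12 or later
use warnings;

my @name = ( 'Herman Munster', 'Sherlock Holmes' );
my @city = ( 'Mockingbird Heights', 'London' );

# keys @array returns the indices; the same list as 0 .. $#name.
my @indices = keys @name;
say "indices: @indices";

# Use the shared index to pair up the parallel arrays.
my @pairs = map { "$name[$_]:$city[$_]" } keys @name;
say for @pairs;
```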

Slide 153

Slide 153 text

@name = (
    'Herman Munster',
    'Sherlock Holmes',
    # …
);
@addr = …
@city = …

Slide 154

Slide 154 text

my @people_AoH = (
    {
        NAME => 'Herman Munster',
        ADDR => '1313 Mockingbird Lane',
        CITY => 'Mockingbird Heights',
    },
    {
        NAME => 'Sherlock Holmes',
        ADDR => '221b Baker Street',
        CITY => 'London',
    },
    # … many more people loaded from the file …
);

Slide 157

Slide 157 text

my @to_keep = grep { $name[$_] !~ /herman/i } keys @name;
@name = @name[@to_keep];
@addr = @addr[@to_keep];
@city = @city[@to_keep];

Slide 158

Slide 158 text

@people_AoH = grep { $_->{NAME} !~ /herman/i } @people_AoH;
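Put together as a runnable sketch with the two sample records, the Array of Hashes version needs a single `grep`, because each person travels as one unit:

```perl
use strict;
use warnings;

my @people_AoH = (
    { NAME => 'Herman Munster',  ADDR => '1313 Mockingbird Lane', CITY => 'Mockingbird Heights' },
    { NAME => 'Sherlock Holmes', ADDR => '221b Baker Street',     CITY => 'London' },
);

# One grep removes the whole record; there are no parallel arrays to keep in sync.
@people_AoH = grep { $_->{NAME} !~ /herman/i } @people_AoH;

# A hash slice pulls the fields out in a fixed order.
print join( ':', @{$_}{qw( NAME ADDR CITY )} ), "\n" for @people_AoH;
```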

Slide 159

Slide 159 text

@people_AoH .= grep: *.<NAME> !~~ m:i/herman/;

Slide 160

Slide 160 text

Harry Potter

Slide 161

Slide 161 text

HPMoR: Harry Potter and the Methods of Rationality

Slide 162

Slide 162 text

Seventh Horcrux

Slide 163

Slide 163 text

Potter Who and the Wossname's Thingummy

Slide 164

Slide 164 text

half-blood prince
Harry Potter
"He accused me of being Dumbledore's man through and through.
... I told him I was."

HPMoR
Narrator
And from that day onward, no matter what Hermione tried to tell anyone,
it would be an accepted legend of Hogwarts that Harry Potter could make absolutely anything happen by snapping his fingers.

HPMoR
Narrator
Either Harry Potter had thought of a lot of very good ideas very fast,
or for some unimaginable reason he'd already spent a lot of time working out how to fight underwater.

HPMoR
Harry Potter
Harry's Internal Critic promptly awarded him the All-Time Award
for the Worst Acting in the History of Ever.

wossnames thingummy
The Potter
M:"...they called me...do you know what they called me?"
Flame and air? "The Sorting Hat called you brilliant," he said. "That's a Ravenclaw tie you're wearing. Where's your wand, Myrtle Smith?"

wossnames thingummy
The Potter
"90% of a cup of coffee is the smell. And this is 200% coffee, Jamaican Blue Mountain."
And it was working: Myrtle Smith was now available in colour.

seventh horcrux
Potter
In retrospect, I have absolutely no idea how Horcruxes work.

seventh horcrux
Potter
"Gryffindor."
S.Hat: "But you would do so well in Slytherin" "I've already done well in Slytherin. Now I want to do well, in *Gryffindor*."

seventh horcrux
Potter
"Hermione," I said sweetly, "Do you want to be friends?"
Merlin bless the simple interactions of children.

HPMoR
Harry Potter (to Umbridge)
"I make you this one offer," said the Boy-Who-Lived.
"I never learn that you've been interfering with me or any of mine. And you never find out why the unkillable soul-eating monster is scared of me. Now sit down and shut up."

HPMoR
Greengrass
The enemy is attacking Hogwarts students...
And Hogwarts, is going to *fight* *back*.”

Slide 165

Slide 165 text

HPMoR Harry Potter Harry's Internal Critic promptly awarded him the All-Time Award for the Worst Acting in the History of Ever.

Slide 167

Slide 167 text

HPMoR	Greengrass	The enemy is attacking Hogwarts students...
HPMoR	Harry Potter	Harry's Internal Critic promptly awarded him the All-Time Award
HPMoR	Harry Potter (to Umbridge)	"I make you this one offer," said the Boy-Who-Lived.
HPMoR	Narrator	And from that day onward, no matter what Hermione tried to tell anyone,
HPMoR	Narrator	Either Harry Potter had thought of a lot of very good ideas very fast,

half-blood prince	Harry Potter	"He accused me of being Dumbledore's man through and through.

seventh horcrux	Potter	In retrospect, I have absolutely no idea how Horcruxes work.
seventh horcrux	Potter	"Gryffindor."
seventh horcrux	Potter	"Hermione," I said sweetly, "Do you want to be friends?"

wossnames thingummy	The Potter	M:"...they called me...do you know what they called me?"
wossnames thingummy	The Potter	"90% of a cup of coffee is the smell. And this is 200% coffee, Jamaican Blue Mountain."

Slide 168

Slide 168 text

HPMoR                                                              Book Name
Harry Potter                                                       Character Name
Harry's Internal Critic promptly awarded him the All-Time Award    One or more lines
for the Worst Acting in the History of Ever.                       describing the Moment

Slide 169

Slide 169 text

Book	Character	First line that described the Moment
(separated by Tabs)

HPMoR	Harry Potter	Harry's Internal Critic promptly awarded h…

Slide 170

Slide 170 text

HPMoR Harry Potter Harry's Internal Critic promptly awarded him the All-Time Award for the Worst Acting in the History of Ever. Book Name Character Name One or more lines describing the Moment Book Character First line that described the Moment Tabs HPMoR Harry Potter Harry's Internal Critic promptly awarded h…

Slide 171

Slide 171 text

idea from
 
 TVTropes.org

Slide 172

Slide 172 text

SQLite FTW?

Slide 173

Slide 173 text

GROUP BY book, character ORDER BY book, character

Slide 174

Slide 174 text

# After writing @lines out to tempfile:
@lines = `sort -nr -k 5,7 tempfile`;

# …versus keeping it in Perl:
@lines = sort {
       $b->[4] <=> $a->[4]
    or $b->[6] <=> $a->[6]
} @lines;
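A minimal sketch of the in-Perl, multi-key sort from this slide. It assumes each element of @lines is an array reference whose fields 4 and 6 (zero-based) are numeric. The sample rows and the @sorted name are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical rows: only fields 4 and 6 matter for the ordering.
my @lines = (
    [ 'a', 'x', 'x', 'x', 10, 'x', 3 ],
    [ 'b', 'x', 'x', 'x', 20, 'x', 1 ],
    [ 'c', 'x', 'x', 'x', 10, 'x', 7 ],
);

# Descending by field 4; ties are broken by field 6, also descending.
my @sorted = sort {
       $b->[4] <=> $a->[4]
    or $b->[6] <=> $a->[6]
} @lines;

print join( ' ', map { $_->[0] } @sorted ), "\n";
```

Keeping the sort in Perl avoids the round trip through a temporary file, and the comparison block states the key order explicitly.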

Slide 175

Slide 175 text

Summary should be… • clustered by Book • clustered by Character • •

Slide 176

Slide 176 text

Remaining data looks like… • • • multiple (kept-in-order) Moments • multiple (kept-in-order) lines of text per Moment.

Slide 177

Slide 177 text

HoHoAoA • clustered by Book • clustered by Character • multiple (kept-in-order) Moments • multiple (kept-in-order) lines of text per Moment.

Slide 178

Slide 178 text

%main{book}{char}[moment_num][line_num]
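A small sketch of how the four levels of the HoHoAoA line up in Perl 5 syntax. The book, character, and text values are placeholders, not data from the talk.

```perl
use strict;
use warnings;

my %main;

# Autovivification creates the intermediate hashes and arrays on first use.
push @{ $main{'HPMoR'}{'Narrator'} }, [ 'First moment, line 1', 'First moment, line 2' ];
push @{ $main{'HPMoR'}{'Narrator'} }, [ 'Second moment, line 1' ];

# book -> character -> moment number -> line number
my $line    = $main{'HPMoR'}{'Narrator'}[0][1];
my $moments = scalar @{ $main{'HPMoR'}{'Narrator'} };

print "$line\n";
print "$moments moments\n";
```

The two hash levels give the clustering by Book and Character. The two array levels keep Moments, and the lines within each Moment, in their original order.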

Slide 179

Slide 179 text

%main{book}{char}[moment_num][line_num]

Slide 180

Slide 180 text

Took a Level in Badass

Slide 181

Slide 181 text

use 5.010;
$/ = ''; # Paragraph mode
my %book_char_lines_HoHoAoA;
while (<>) {
    chomp;
    my ( $book, $character, @lines ) = split "\n";
    push @{ $book_char_lines_HoHoAoA{$book}{$character} }, [@lines];
}
for my $book ( sort keys %book_char_lines_HoHoAoA ) {
    for my $char ( sort keys %{ $book_char_lines_HoHoAoA{$book} } ) {
        for my $aref ( @{ $book_char_lines_HoHoAoA{$book}{$char} } ) {
            say join "\t", $book, $char, $aref->[0];
        }
    }
    say '';
}

Slide 182

Slide 182 text

use 5.024;
$/ = ''; # Paragraph mode
my %book_char_lines_HoHoAoA;
while (<>) {
    chomp;
    my ( $book, $character, @lines ) = split "\n";
    push $book_char_lines_HoHoAoA{$book}{$character}->@*, [@lines];
}
for my $book ( sort keys %book_char_lines_HoHoAoA ) {
    for my $char ( sort keys $book_char_lines_HoHoAoA{$book}->%* ) {
        for my $aref ( $book_char_lines_HoHoAoA{$book}{$char}->@* ) {
            say join "\t", $book, $char, $aref->[0];
        }
    }
    say '';
}

Slide 183

Slide 183 text

my %book_char_lines_HoHoAoA;
for 'cmoa.txt'.IO.slurp.split("\n\n") {
    my ( $book, $character, @lines ) = .split: "\n";
    push %book_char_lines_HoHoAoA{$book}{$character}, [@lines];
}
for %book_char_lines_HoHoAoA.keys.sort -> $book {
    for %book_char_lines_HoHoAoA{$book}.keys.sort -> $char {
        for %book_char_lines_HoHoAoA{$book}{$char}.list -> @lines {
            say join "\t", $book, $char, @lines[0];
        }
    }
    say '';
}

Slide 184

Slide 184 text

my %book_char_lines_HoHoAoA;
for 'cmoa.txt'.IO.slurp.split("\n\n") {
    my ( $book, $character, @lines ) = .split: "\n";
    push %book_char_lines_HoHoAoA{$book}{$character}, [@lines];
}
# More efficient. Less readable, or more readable?
for %book_char_lines_HoHoAoA.sort -> (:key($book), :value(%book_hash)) {
    for %book_hash.sort -> (:key($char), :value(@char_array)) {
        for @char_array -> @lines {
            say join "\t", $book, $char, @lines[0];
        }
    }
    say '';
}

Slide 185

Slide 185 text

If that sounds good…

Slide 186

Slide 186 text

Deep DS Resources
• https://perldoc.perl.org/perldsc.html
 Perl: see also perllol, perlref, and perlreftut
• https://www.perlmonks.org/?node_id=172833
 Auto-Vivification Explanation
• https://web.stanford.edu/class/archive/cs/cs106ap/cs106ap.1198/lectures/15-NestedCollections/15-Nested_Data_Structures.pdf
 Python: Nested DS starts at page 70
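One autovivification detail worth seeing in code: merely testing a nested key creates the intermediate level. This is a minimal sketch with invented hash names and keys.

```perl
use strict;
use warnings;

my %h;

# Testing a nested key silently creates the intermediate hash $h{book}.
my $found    = exists $h{book}{char} ? 1 : 0;
my $vivified = exists $h{book}       ? 1 : 0;

# Guarding each level avoids the side effect.
my %g;
my $safe_found  = ( exists $g{book} && exists $g{book}{char} ) ? 1 : 0;
my $still_empty = %g ? 0 : 1;

print "found=$found vivified=$vivified\n";
print "safe_found=$safe_found still_empty=$still_empty\n";
```

The same mechanism that makes a push onto a nested array so convenient also means a read-only check can change the structure, which matters when you later iterate over the keys.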

Slide 187

Slide 187 text

Trees

Slide 188

Slide 188 text

%main{book}{char}[moment_num][line_num]

Slide 189

Slide 189 text

Trees in the wild • Objects (Nodes) • Direction (Edges) • Hierarchy (Parent/Child) • Order (Siblings) • Navigation (Methods)

Slide 190

Slide 190 text

Doc 1 Stuff
2000-08-17

Slide 191

Slide 191 text

Doc 1 Stuff
2000-08-17 Doc 1 Stuff
2000-08-17

Slide 192

Slide 192 text

Doc 1 Stuff
2000-08-17
• html
  • head
    • title
      • "Doc 1"
  • body
    • "Stuff"
    • hr
    • "2000-08-17"
Doc 1 Stuff
2000-08-17

Slide 193

Slide 193 text

Doc 1 Stuff
2000-08-17
• html
  • head
    • title
      • "Doc 1"
  • body
    • "Stuff"
    • hr
    • "2000-08-17"

        html
       /    \
    head     body
     /      /  |  \
  title "Stuff" hr "2000-08-17"
    |
 "Doc 1"

Doc 1 Stuff
2000-08-17

Slide 194

Slide 194 text

        html
       /    \
    head     body
     /      /  |  \
  title "Stuff" hr "2000-08-17"
    |
 "Doc 1"
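A sketch of this document tree built from plain Perl hashes, to make the node, edge, parent/child, and sibling-order ideas concrete. The node and walk helpers are hypothetical names for this example; they are not taken from HTML::TreeBuilder or any other module.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

# Each node is a hash: a name, an ordered list of children, and a parent link.
sub node {
    my ( $name, @children ) = @_;
    my $self = { name => $name, children => [@children], parent => undef };
    # Weak back-reference, so parent and child do not keep each other alive.
    weaken( $_->{parent} = $self ) for @children;
    return $self;
}

my $tree = node( 'html',
    node( 'head', node( 'title', node('"Doc 1"') ) ),
    node( 'body', node('"Stuff"'), node('hr'), node('"2000-08-17"') ),
);

# Depth-first walk: visit a node, then its children in sibling order.
sub walk {
    my ( $node, $depth, $out ) = @_;
    push @$out, ( '  ' x $depth ) . $node->{name};
    walk( $_, $depth + 1, $out ) for @{ $node->{children} };
    return $out;
}

my @outline = @{ walk( $tree, 0, [] ) };
print "$_\n" for @outline;
```

The walk prints the same indented outline as the bulleted form of the document, which shows that the outline and the diagram are two views of one structure.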

Slide 196

Slide 196 text

Doc 1 Stuff
2000-08-17

Slide 197

Slide 197 text

No content

Slide 198

Slide 198 text

No content

Slide 199

Slide 199 text

No content

Slide 200

Slide 200 text

No content

Slide 201

Slide 201 text

No content

Slide 202

Slide 202 text

No content

Slide 203

Slide 203 text

No content

Slide 204

Slide 204 text

curl -o weather.html 'https://www.wunderground.com/weather/us/al/auburn'

Slide 205

Slide 205 text

No content

Slide 206

Slide 206 text

N
0

Gusts 2 mph

Slide 207

Slide 207 text

N
0

Gusts 2 mph

Slide 208

Slide 208 text

N
0

Gusts 2 mph

Slide 209

Slide 209 text

N
0

Gusts 2 mph

Slide 210

Slide 210 text

N
0

Gusts 2 mph

Slide 211

Slide 211 text

use Modern::Perl;
use File::Slurp;
$_ = read_file('weather.html');
s{\A.+<div[^>]*class="condition-wind[^>]*>((?:.+?</strong>){6}).+\z}{$1}ms;
s{<[^>]+>}{ }msg;
s{&nbsp;}{ }msg;
tr{ }{}s;
s{^\s+}{};
s{\s+$}{};
say;

Slide 217

Slide 217 text

N 0 Gusts 2 mph

Slide 218

Slide 218 text

my $from_div_to_strong = /
    '<div' <-[>]>* 'class="condition-wind' <-[>]>* '>'
    [ .+? '</strong>' ] ** 6
/;
my $tag_or_nbsp = / '<' '/'? <-[>]>+ '>' | '&nbsp;' /;
say 'weather.html'.IO.slurp\
    .match( $from_div_to_strong )\
    .subst( $tag_or_nbsp, ' ', :global )\
    .trans( ' ' => ' ', :squash )\
    .trim;

Slide 219

Slide 219 text

use TokeParser;

Slide 220

Slide 220 text

primary clank in this location

Slide 221

Slide 221 text

primary clank in this location

Slide 222

Slide 222 text

primary clank in this location

Slide 223

Slide 223 text

primary clank in this location

Slide 224

Slide 224 text

primary clank in this location

Slide 225

Slide 225 text

use Modern::Perl;
use HTML::TokeParser;
my $p = HTML::TokeParser->new('weather.html') or die "Can't open: $!";
$p->empty_element_tags(1);
my $in_wind = 0;
my $level = 0;
my @texts;
# this_loop_is_on_the_next_slide();
s/&nbsp;/ /g for @texts;
say @texts;

Slide 226

Slide 226 text

while ( my $t = $p->get_token ) {
    my $type = $t->[0];
    if ( $type eq 'S' ) {
        my ( $tag, $attr ) = @{$t}[1,2];
        $in_wind = 1 if $tag eq 'div'
                    and ($attr->{class} // '') =~ /^condition-wind /;
        $level++ if $in_wind;
    }
    elsif ( $type eq 'E' and $in_wind ) {
        my $tag = $t->[1];
        $level--;
        $in_wind = 0 if $level == 0;
    }
    elsif ( $type eq 'T' and $in_wind ) {
        push @texts, $t->[1];
    }
}

Slide 227

Slide 227 text

while ( my $t = $p->get_token ) {
    last if $t->[0] eq 'S'
        and $t->[1] eq 'div'
        and ($t->[2]{class} // '') =~ /^condition-wind /;
}
my ( $level, @texts ) = (1);
while ( my $t = $p->get_token ) {
    if    ( $t->[0] eq 'S' ) { $level++ }
    elsif ( $t->[0] eq 'E' ) { $level--; last if !$level; }
    elsif ( $t->[0] eq 'T' ) { push @texts, $t->[1]; }
}
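To try the two-phase loop without HTML::TokeParser or a saved web page, here is a sketch that runs the same logic over hand-built tokens. The token list is invented and simplified: real get_token results carry extra trailing fields. Shifting from @tokens stands in for $p->get_token.

```perl
use strict;
use warnings;

# Simplified tokens: [ 'S', $tag, \%attr ], [ 'E', $tag ], [ 'T', $text ].
my @tokens = (
    [ 'S', 'div', { class => 'other' } ],
    [ 'T', 'ignored' ],
    [ 'E', 'div' ],
    [ 'S', 'div', { class => 'condition-wind small' } ],
    [ 'S', 'strong', {} ],
    [ 'T', 'N' ],
    [ 'E', 'strong' ],
    [ 'S', 'div', {} ],
    [ 'T', 'Gusts 2 mph' ],
    [ 'E', 'div' ],
    [ 'E', 'div' ],
    [ 'T', 'also ignored' ],
);

# Phase 1: skip forward to the start tag we care about.
while ( my $t = shift @tokens ) {
    last if $t->[0] eq 'S'
        and $t->[1] eq 'div'
        and ( $t->[2]{class} // '' ) =~ /^condition-wind /;
}

# Phase 2: collect text until the end tag that closes that div.
my ( $level, @texts ) = (1);
while ( my $t = shift @tokens ) {
    if    ( $t->[0] eq 'S' ) { $level++ }
    elsif ( $t->[0] eq 'E' ) { $level--; last if !$level }
    elsif ( $t->[0] eq 'T' ) { push @texts, $t->[1] }
}

print join( ' ', @texts ), "\n";
```

Splitting the work into "find the start" and "collect until balanced" removes the $in_wind flag, so the second loop only has to track nesting depth.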

Slide 241

Slide 241 text

use HTML::TreeBuilder;

Slide 242

Slide 242 text

use Modern::Perl;
use HTML::TreeBuilder;
my $tree = HTML::TreeBuilder->new_from_file('weather.html') or die;

Slide 243

Slide 243 text

use Modern::Perl;
use HTML::TreeBuilder;
my $tree = HTML::TreeBuilder->new_from_file('weather.html') or die;
my $wind = $tree->look_down(

Slide 244

Slide 244 text

use Modern::Perl;
use HTML::TreeBuilder;
my $tree = HTML::TreeBuilder->new_from_file('weather.html') or die;
my $wind = $tree->look_down(
    _tag => 'div',

Slide 245

Slide 245 text

use Modern::Perl;
use HTML::TreeBuilder;
my $tree = HTML::TreeBuilder->new_from_file('weather.html') or die;
my $wind = $tree->look_down(
    _tag  => 'div',
    class => qr/^condition-wind /,
);

Slide 246

Slide 246 text

use Modern::Perl;
use HTML::TreeBuilder;
my $tree = HTML::TreeBuilder->new_from_file('weather.html') or die;
my $wind = $tree->look_down(
    _tag  => 'div',
    class => qr/^condition-wind /,
);
$wind

Slide 247

Slide 247 text

use Modern::Perl;
use HTML::TreeBuilder;
my $tree = HTML::TreeBuilder->new_from_file('weather.html') or die;
my $wind = $tree->look_down(
    _tag  => 'div',
    class => qr/^condition-wind /,
);
$wind->as_trimmed_text

Slide 248

Slide 248 text

use Modern::Perl;
use HTML::TreeBuilder;
my $tree = HTML::TreeBuilder->new_from_file('weather.html') or die;
my $wind = $tree->look_down(
    _tag  => 'div',
    class => qr/^condition-wind /,
);
$wind->as_trimmed_text( extra_chars => '\xA0' );

Slide 249

Slide 249 text

use Modern::Perl;
use HTML::TreeBuilder;
my $tree = HTML::TreeBuilder->new_from_file('weather.html') or die;
my $wind = $tree->look_down(
    _tag  => 'div',
    class => qr/^condition-wind /,
);
say $wind->as_trimmed_text( extra_chars => '\xA0' );

Slide 250

Slide 250 text

N 0 Gusts 2 mph

Slide 253

Slide 253 text

Micro-languages • Strings • Regex • Trees • XPath • JSON • jq • JSONPath
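JSON is one of the tree-shaped formats on this list. As a sketch, the core JSON::PP module turns a JSON document into the same nested hashes and arrays used earlier, so the data can be navigated rather than pattern-matched. The JSON text and its field names are invented for this example.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Hypothetical JSON; the field names are made up for this example.
my $json = '{"wind":{"direction":"N","speed":0,"gusts":{"value":2,"unit":"mph"}}}';

# Decode once, then walk the structure by key.
my $data  = decode_json($json);
my $gusts = $data->{wind}{gusts};
my $text  = "Gusts $gusts->{value} $gusts->{unit}";

print "$text\n";
```

Tools such as jq and JSONPath express the same kind of navigation as a short path string instead of chained hash lookups.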

Slide 254

Slide 254 text

Tree Resources
• https://metacpan.org/pod/HTML::Tree::Scanning
• https://www.perlmonks.org/?node_id=153259
 Introduction to the more generic Tree::DAG_Node
• https://perlhacks.com/2014/04/data-munging-perl/
 Book: Data Munging with Perl
 Now freely available for download!
• https://docs.python-guide.org/scenarios/scrape/
• https://stackoverflow.com/questions/14172028/html-parse-tree-using-python-2-7
 Answers also for JavaScript

Slide 255

Slide 255 text

Q&A

Slide 256

Slide 256 text

Refactoring and Readability 1.5:
 The Pre-Sequel http://speakerdeck.com/util
 
 <<< >>> >>> <<<

Slide 257

Slide 257 text

Bonus Round

Slide 258

Slide 258 text

Static Analysis • Lint (and other linters) • Scanners for style issues and common bugs: • Perl::Critic • RubyCritic • Rubocop • Prospector
 • Scanners for security issues: • Brakeman • Bandit • Assisted Refactoring • PyCharm • ReSharper

Slide 259

Slide 259 text

Static Analysis • Lint (and other linters) • Scanners for style issues and common bugs: • Perl::Critic • RubyCritic • Rubocop • Prospector
 • Scanners for security issues: • Brakeman • Bandit • Assisted Refactoring • PyCharm • ReSharper • Automated Refactoring • Blue Tiger

Slide 260

Slide 260 text

Refactoring and Readability 1.5:
 The Pre-Sequel http://speakerdeck.com/util
 
 <<< >>> >>> <<<

Slide 261

Slide 261 text

Thanks!

Slide 262

Slide 262 text

Copyrights

Slide 263

Slide 263 text

Copyright Information: Images
• Camelia
  • (c) 2009 by Larry Wall
 http://github.com/perl6/mu/raw/master/misc/camelia.txt
• Regular Expressions
  • © Randall Munroe
 https://xkcd.com/208/
• Brantley Grayson Wolters
  • (c) 2011 by his mother, Amy Wolters
 Util 1.1.1 is my eldest grandchild
 (in RCS numbering)

Slide 264

Slide 264 text

Copyright Information: This Talk This work is licensed under a Creative Commons Attribution 4.0 International License. CC BY https://creativecommons.org/licenses/by/4.0/ (email me for the original Apple Keynote .key file)

Slide 265

Slide 265 text

History
• v 1.01 2020-06-12
 Presented at Southeast LinuxFest
 60 minutes with Q&A
• v 1.02 2020-06-24
 Presented at Perl & Raku Conference in the Cloud
 50 minutes with Q&A

Slide 266

Slide 266 text

Removed
 (Not presented, 
 but maybe worth reading)

Slide 267

Slide 267 text

Key knowledge (Perl)
• List::Util      first max min sum all any none uniq shuffle
• List::UtilsBy   max_by min_by count_by
• perlfunc        Perl Functions by Category
• Test::Tutorial  Intro to Automated Testing
• Devel::Cover    Shows code that lacks testing
• Benchmark       Performance comparisons
• Devel::NYTProf  Performance profiler
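A quick sketch of a few of the List::Util functions named above, using a made-up list of numbers. Each call replaces a hand-written loop.

```perl
use strict;
use warnings;
use List::Util qw(first max min sum);

# Invented sample data.
my @gusts = ( 2, 7, 0, 12, 5 );

my $highest   = max @gusts;                 # largest value
my $lowest    = min @gusts;                 # smallest value
my $total     = sum @gusts;                 # arithmetic total
my $first_big = first { $_ > 5 } @gusts;    # first element passing the test

print "max=$highest min=$lowest sum=$total first>5=$first_big\n";
```

Unlike grep, first stops at the first match, so it states the intent and skips the rest of the list.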

Slide 268

Slide 268 text

my @to_keep = grep { $name[$_] !~ /herman/i } keys @name;
@name = @name[@to_keep];
@addr = @addr[@to_keep];
@city = @city[@to_keep];
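A runnable sketch of this parallel-array filter, assuming the intent is to drop every record whose name matches "herman" and to test each index through grep's $_. The sample names, addresses, and cities are invented.

```perl
use strict;
use warnings;

# Three parallel arrays: index N in each describes the same record.
my @name = ( 'Herman', 'Lily',   'herman jr', 'Eddie'  );
my @addr = ( '1 A St', '2 B St', '3 C St',    '4 D St' );
my @city = ( 'Alpha',  'Beta',   'Gamma',     'Delta'  );

# Collect the indexes to keep, then slice every array with the same list.
my @to_keep = grep { $name[$_] !~ /herman/i } 0 .. $#name;
@name = @name[@to_keep];
@addr = @addr[@to_keep];
@city = @city[@to_keep];

print join( ', ', map { "$name[$_] ($addr[$_], $city[$_])" } 0 .. $#name ), "\n";
```

Filtering on indexes rather than values keeps the three arrays aligned, because every array is sliced with the same index list.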

Slide 269

Slide 269 text

"N" "0"

" Gusts " "2" "mph"

Slide 270

Slide 270 text

XXX If that sounds good…

Slide 271

Slide 271 text

Static Analysis • Lint (and other linters) • Scanners for style issues and common bugs: • Perl::Critic • RubyCritic • Rubocop • Prospector
 • Scanners for security issues: • Brakeman • Bandit • Assisted Refactoring • PyCharm • ReSharper • Automated Refactoring • Blue Tiger

Slide 272

Slide 272 text

• TODO: • Add Regex history • Credit Damian on P6 Regex • Review all long slide text, and split

Slide 273

Slide 273 text

Raku == 
 Perl 5 minus Warts plus Awesome