Raku Regexen: Reducing line noise in your code. - Speaker Deck

Slide 1

Slide 1 text

Raku Regexen: Reducing line noise in your code.
Steven Lembark
Workhorse Computing

Slide 2

Slide 2 text

The difference … prefer readability over compactness. – Larry Wall
Regexes you can read.
Code you can maintain.
Do what I need, not just what I mean.

Slide 3

Slide 3 text

Aside: Just what is line noise? Random g88;d^C\@x*fhe screen. Due to acous*^&(*&;dljfa;dskjf tic interference.

Slide 4

Slide 4 text

Raku
Is more than "just syntax".
New way to approach problems.
Also true with regexen.
Not just isolated "pattern matching".

Slide 5

Slide 5 text

The old way
RX-from-hell with alternations.
Regex tokenizes, code parses.
Intertwined mess of code and regexes.
Order-of-execution errors abound.

Slide 6

Slide 6 text

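Parse *.ini file

    # Named captures (?<...>) are read back through %+ ($+{name}, $+{key}, $+{value}).
    my $header     = qr{ ^ \h* \[ (?<name> [^][]+ ) \] \h* $ }xm;
    my $property   = qr{ ^ \h* (?<key> .+? ) \h* = \h* (?<value> .+ ) $ }xm;
    my $comment    = qr{ ^ \h* \# }xm;
    my $empty_line = qr{ ^ \h* $ }xm;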

Slide 7

Slide 7 text

Parse *.ini file

    # Assumes $INI_FILE is an already-open filehandle.
    my (%config, $section);

    # while (not for): reads one line at a time, so $. is the current line number.
    while (my $nextline = readline($INI_FILE)) {
        # If it's a header line, start a new section...
        if ($nextline =~ /$header/) {
            $section = $config{ $+{name} } //= {};
        }
        # If it's a property, add the key and value to the current section...
        elsif ($nextline =~ /$property/) {
            $section->{ $+{key} } = $+{value};
        }
        # Ignore comments or empty lines
        elsif ($nextline =~ /$comment|$empty_line/) {
            # Do nothing
        }
        # Report anything else as a probable error...
        else {
            warn "Invalid data in INI file at line $.\n"
               . "\t$nextline\n";
        }
    }

Slide 8

Slide 8 text

Maintainable?
Interrelated order of code and rx.
Code changes affect rx?
Rx changes affect code?
Find out: try it and see...

Slide 9

Slide 9 text

Raku: Grammars
Structure in one place.
Declarative.
No iterative code.

Slide 10

Slide 10 text

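Tokens & Structure

    grammar INI {
        token TOP        { <section>* }
        # The first section may lack a [header]: it starts at ^, the start of the string.
        token section    { [ ^ | <header> ] <block> }
        token header     { '[' $<name>=<-[ \[ \] \n ]>+ ']' \h* \n }
        # A leading dot (<.comment>) matches the subrule without capturing it.
        token block      { [ <property> | <.emptylines> | <.comment> ]* }
        token property   { \h* $<key>=\N+? \h* '=' \h* $<value>=\N+ \n }
        token comment    { ^^ \h* '#' \N* \n }
        token emptylines { [ ^^ \h* \n ]+ }
    }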
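The grammar only describes structure; calling `.parse` on it returns a `Match` tree whose named captures mirror the token names. A minimal sketch, using a hypothetical cut-down `KV` grammar (one `key = value` pair per line, no sections or comments) rather than the full INI grammar:

```raku
# Hypothetical cut-down grammar: one "key = value" pair per line.
grammar KV {
    token TOP  { <pair>* }                       # zero or more pairs
    token pair { \h* $<key>=\w+ \h* '=' \h* $<value>=\N+ \n }
}

my $text  = "host = localhost\nport = 5432\n";
my $match = KV.parse($text);                     # Nil if the whole string does not match

# Each named token becomes a named capture; <pair>* yields a list of them.
for $match<pair>.list -> $p {
    say "$p<key> => $p<value>";
}
```

`.parse` requires `TOP` to consume the entire input, so a single malformed line makes the whole parse return `Nil`.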

Slide 16

Slide 16 text

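Process the content

    class INI::hash_builder {
        # Each method runs after its token matches; make attaches a value
        # that the parent retrieves with .ast.
        method TOP      ($/) { make %( $<section>».ast ) }
        # A header-less leading section is stored under the empty key ''.
        method section  ($/) { make ~($<header><name> // '') => $<block>.ast }
        method block    ($/) { make %( $<property>».ast ) }
        method property ($/) { make ~$<key> => ~$<value> }
    }

    my %config = INI.parsefile(
        'example.ini',
        :actions(INI::hash_builder)
    ).ast;

    say %config.perl;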
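The same pattern can be tried on an inline string. This sketch reuses a hypothetical cut-down `KV` grammar (not the author's INI grammar) with a hypothetical `KVActions` class: each method runs after the token of the same name matches, `make` attaches a value, and the parent collects its children's values with `.ast`.

```raku
# Hypothetical cut-down grammar: one "key = value" pair per line.
grammar KV {
    token TOP  { <pair>* }
    token pair { \h* $<key>=\w+ \h* '=' \h* $<value>=\N+ \n }
}

# Hypothetical actions class: one method per token name.
class KVActions {
    method pair ($/) { make ~$<key> => ~$<value> }   # one Pair per line
    method TOP  ($/) { make %( $<pair>».ast ) }      # collect the Pairs into a Hash
}

my %config = KV.parse(
    "host = localhost\nport = 5432\n",
    :actions(KVActions),
).ast;

say %config<port>;   # 5432
```

The grammar stays purely declarative; all of the data-building lives in the actions class, which can be swapped without touching the grammar.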

Slide 23

Slide 23 text

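Care to guess what this does?
/{~}!@#$%^(&*)-+=[\/]:;"'<.,>?/
Q: Which chars match themselves?
A: None, in Raku, since they are all punctuation. Only letters, digits, and the underscore match themselves; every other character must be quoted or backslash-escaped.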
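A quick way to see the rule in action: the sketch below matches punctuation only after quoting or backslash-escaping it, while letters, digits, and the underscore still match as themselves.

```raku
say so 'a_1' ~~ / a_1 /;         # True: letters, digits and _ are literal
say so '[x]' ~~ / '[' x ']' /;   # True: quoted punctuation
say so 'a+b' ~~ / a \+ b /;      # True: backslash-escaped punctuation
say so 'a+b' ~~ / 'a+b' /;       # True: a whole quoted string
```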

Slide 29

Slide 29 text

Saner metachars
Match integers enclosed in square brackets: [ 1, 2, 3 ]

Perl 5:

    / \[ \s* (?: \d+ (?: \s* , \s* \d+ )* \s* )? \] /x

Raku:

    / :s '[' [\d+]* % ',' ']' /

    :s          consume whitespace (sigspace)
    '[' ']'     literals
    [ ... ]     non-capturing match
    % ','       separator between repetitions
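If the integers are wanted as well as the match, the group can capture instead. This is a sketch of a variant, not the slide's pattern: it spells out whitespace with explicit `\s*` rather than relying on `:s`, and uses a capturing `( \d+ )` so each repetition adds one entry to `$0`.

```raku
my $input = '[ 1, 2, 3 ]';
my @numbers;

# '[' and ']' are quoted literals, [ ... ] groups without capturing,
# and % puts the separator between repetitions of the ( \d+ ) capture.
if $input ~~ / '[' \s* ( \d+ )* % [ \s* ',' \s* ] \s* ']' / {
    # A quantified capture holds one Match per repetition.
    @numbers = $0».Int;
}

say @numbers;    # [1 2 3]
```

Because `%` only allows the separator between items, a trailing comma such as `[ 1, 2, ]` does not match.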

Slide 35

Slide 35 text

Aside: Future speed
Raku regexen execute as DFA where possible.
Opportunity for much faster execution.
For some definition of ...

Slide 36

Slide 36 text

Aside: Future speed
Raku regexen execute as DFA where possible.
Also benefit from saner results: a DFA is more likely to Do What I Mean.
Alternations with | pick the longest matching alternative, not the first one listed.
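Raku still offers the Perl 5 behaviour when it is wanted: `||` tries alternatives in the order written, while `|` uses longest-token matching. A small sketch contrasting the two on the same string:

```raku
my $text = 'smartest';

# | : longest-token matching picks the longest alternative that matches here
my $longest = $text ~~ / smart | smarter | smartest /;

# || : sequential alternation takes the first alternative, in source order, that matches
my $first = $text ~~ / smart || smarter || smartest /;

say ~$longest;   # smartest
say ~$first;     # smart
```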

Slide 39

Slide 39 text

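Smartest matching

Perl 5 (prints "smart": the first alternative that matches wins):

    use 5.010;
    say $& if 'Which regex engine is smartest?' =~ /smart|smarter|smartest/;

Perl 6 (prints ｢smartest｣: the longest alternative wins):

    say $/ if 'Which regex engine is smartest?' ~~ /smart|smarter|smartest/;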

Slide 42

Slide 42 text

Perl5 nested structure

    #! /usr/bin/env perl
    use 5.010;

    # We're going to need this to extract hierarchical data structures...
    our @stack = [];

    my $LIST = qr{
        # Match this...
        (?&NESTED)

        # Which is defined as...
        (?(DEFINE)
            (?<NESTED>
                # Keep track of recursions on a stack...
                (?{ local @::stack = (@::stack, []); })

                # Match a list of items...
                \[ \s* (?>
                    (?&ITEM)
                    (?: \s* , \s* (?&ITEM) )*+
                )? \s* \]

                # Pop the stack and add that frame to the growing data structure...
                (?{ local @::stack = @::stack;
                    my $nested = pop @stack;
                    push @{$::stack[-1]}, $nested;
                })
            )

            # For each item, push it onto the stack if it's a leaf node...
            (?<ITEM>
                (\d+) (?{ push @{$stack[-1]}, $^N })
              | (?&NESTED)
            )
        )
    }x;

    # Match, extracting a data structure...
    '[1,2,[3,3,[4,4]],5]' =~ /$LIST/;

    # Retrieve the data structure...
    my $parse_tree = pop @stack;

    # Show it...
    use Data::Dumper 'Dumper';
    say Dumper($parse_tree);

Slide 43

Slide 43 text

Perl6 nested grammar

    #! /usr/bin/env rakudo
    use v6;

    # Define the structure of a list...
    grammar LIST {
        rule  TOP  { '[' <ITEM>* % ',' ']' }
        token ITEM { \d+ | <TOP> }
    }

    # Define how to convert list elements to a suitable data structure...
    class TREE {
        method TOP  ($/) { make [ $<ITEM>».ast ] }
        method ITEM ($/) { make $<TOP>.ast // +$/ }
    }

    # Parse, extracting the data structure...
    my $parse_tree = LIST.parse('[1,2,[3,3,[4,4]],5]', :actions(TREE)).ast;
    say $parse_tree.perl;

Slide 45

Slide 45 text

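Perl6 nested regex

    #! /usr/bin/env rakudo
    use v6;

    # <~~> recursively matches the whole regex again (for a nested list);
    # $<0>= stores that nested match in positional capture 0, next to the (\d+) captures.
    '[1,2,[3,3,[4,4]],5]' ~~ / '[' [ (\d+) | $<0>=<~~> ]* % ',' ']' /;
    say $/;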

Slide 50

Slide 50 text

In Raku
Regexen are saner.
Grammars offer cleaner code.
Smart matching works.
Objects have useful methods.
It's new.

Slide 51

Slide 51 text

In Raku
Regexen are saner.
Grammars offer cleaner code.
Smart matching works.
Objects have useful methods.
It's worth learning.
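As one illustration of "objects have useful methods": a successful smartmatch returns a `Match` object that records where it matched as well as what. A short sketch using the standard `Match` methods `from`, `to`, `prematch`, and `postmatch`:

```raku
my $match = 'Hello world' ~~ / wor /;

say $match.Str;         # wor
say $match.from;        # 6  (offset where the match starts)
say $match.to;          # 9  (offset just past the match)
say $match.prematch;    # Hello  (text before the match, with a trailing space)
say $match.postmatch;   # ld
```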