Raku Regexen: Reducing line noise in your code.

Perl's regular expression engine is the envy of all other languages -- notice how many use PCRE. You can parse almost anything with Perl's regexes, if you don't go cross-eyed trying to read them first...

A bit of 20/20 hindsight is helping all of our vision these days, with Raku and its grammars. They allow for saner, layered processing of even complicated content.

This talk looks at the basics of Raku regexes and how they are used in grammars to parse some real-world examples.

Steven Lembark

July 09, 2022

Transcript

  1. The difference … prefer readability over compactness. – Larry Wall

    Regexes you can read. Code you can maintain. Do what I mean (need).
  2. Aside: Just what is line noise? Random g88;d^C\@x*fhe screen. Due

    to acous*^&(*&;dljfa;dskjf tic interference.
  3. Raku is more than "just syntax"

    A new way to approach problems. Also true with regexen: not just isolated "pattern matching".
  4. The old way

    RX-from-hell with alternations. Regex tokenizes, code parses. Intertwined mess of code and regexes. Order-of-execution errors abound.
  5. Parse *.ini file

        my $header     = qr{ ^ \h* \[ (?<name> [^][]+ ) \] \h* $ }xm;
        my $property   = qr{ ^ \h* (?<key> .+? ) \h* = \h* (?<value> .+ ) $ }xm;
        my $comment    = qr{ ^ \h* \# }xm;
        my $empty_line = qr{ ^ \h* $ }xm;
  6. Parse *.ini file

        my %config;
        my $section = $config{''} = {};    # properties that appear before any header

        # Read one line at a time (not a list via for), so that $. is the current line number.
        while (my $nextline = readline($INI_FILE)) {
            # If it's a header line, start a new section...
            if ($nextline =~ /$header/) {
                $section = $config{ $+{name} } //= {};
            }
            # If it's a property, add the key and value to the current section...
            elsif ($nextline =~ /$property/) {
                $section->{ $+{key} } = $+{value};
            }
            # Ignore comments or empty lines
            elsif ($nextline =~ /$comment|$empty_line/) {
                # Do nothing
            }
            # Report anything else as a probable error...
            else {
                warn "Invalid data in INI file at line $.\n"
                   . "\t$nextline\n";
            }
        }
  7. Maintainable? Inter-related order of code and rx. Code changes affect

    rx? Rx changes affect code? Find out: Try it and see...
  8. Tokens & Structure

        grammar INI {
            token TOP        { <section>* }
            token section    { [ ^ | <header> ] <block> }
            token header     { '[' $<ID> = <-[ \[ \] \n ]>+ ']' \h* \n }
            token block      { [ <property> | <.emptylines> | <.comment> ]* }
            token property   { \h* $<name>=\N+? \h* '=' \h* $<value>=\N+ \n }
            token comment    { ^^ \h* '#' \N* \n }
            token emptylines { [ ^^ \h* \n ]+ }
        }
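
Because every token is a named, separately callable unit, the grammar can be tested one piece at a time. A minimal sketch, assuming Rakudo and the INI grammar above copied unchanged; it uses the :rule argument of Grammar.parse to start matching at a token other than TOP, and the sample strings are invented for illustration:

```raku
# The INI grammar from the slide, unchanged.
grammar INI {
    token TOP        { <section>* }
    token section    { [ ^ | <header> ] <block> }
    token header     { '[' $<ID> = <-[ \[ \] \n ]>+ ']' \h* \n }
    token block      { [ <property> | <.emptylines> | <.comment> ]* }
    token property   { \h* $<name>=\N+? \h* '=' \h* $<value>=\N+ \n }
    token comment    { ^^ \h* '#' \N* \n }
    token emptylines { [ ^^ \h* \n ]+ }
}

# Start at the 'header' token instead of TOP; the whole string must match.
my $h = INI.parse("[database]\n", :rule<header>);
say ~$h<ID>;                              # database

# Same for a single property line.
my $p = INI.parse("port = 5432\n", :rule<property>);
say ~$p<name>, ' -> ', ~$p<value>;        # port -> 5432

# A header without its closing bracket fails, so parse returns Nil.
say INI.parse("[database\n", :rule<header>).defined;   # False
```

Testing tokens in isolation like this is one way to find out whether a change to one rule breaks another, without the "try it and see" loop of the hand-written Perl 5 parser.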
  14. Process the content

        class INI::hash_builder {
            method TOP      ($/) { make %( $<section>».ast ) }
            method section  ($/) { make ~($<header><ID>//'') => $<block>.ast }
            method block    ($/) { make %( $<property>».ast ) }
            method property ($/) { make ~$<name> => ~$<value> }
        }

        my %config = INI.parsefile( 'example.ini', :actions(INI::hash_builder) ).ast;
        say %config.perl;
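
To see the actions class build the nested hash without needing an example.ini on disk, the same grammar and actions can be run on an inline string with Grammar.parse instead of parsefile. A minimal sketch, assuming Rakudo; the INI text is an invented sample, and properties before the first header end up under the empty-string key, as the section method above implies:

```raku
grammar INI {
    token TOP        { <section>* }
    token section    { [ ^ | <header> ] <block> }
    token header     { '[' $<ID> = <-[ \[ \] \n ]>+ ']' \h* \n }
    token block      { [ <property> | <.emptylines> | <.comment> ]* }
    token property   { \h* $<name>=\N+? \h* '=' \h* $<value>=\N+ \n }
    token comment    { ^^ \h* '#' \N* \n }
    token emptylines { [ ^^ \h* \n ]+ }
}

class INI::hash_builder {
    method TOP      ($/) { make %( $<section>».ast ) }
    method section  ($/) { make ~($<header><ID>//'') => $<block>.ast }
    method block    ($/) { make %( $<property>».ast ) }
    method property ($/) { make ~$<name> => ~$<value> }
}

# Hypothetical sample input. Every line, including the last, ends in a
# newline, which the header/property/comment tokens require.
my $text = q:to/END/;
    name = demo
    # a comment
    [database]
    host = localhost
    port = 5432
    END

my %config = INI.parse($text, :actions(INI::hash_builder)).ast;
say %config<database><host>;    # localhost
say %config{''}<name>;          # demo
```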
  20. Care to guess what this does?

    /{~}!@#$%^(&*)-+=[\/]:;"'<.,>?/

    Q: Which chars match themselves?
    A: None, in Raku, since they are punctuation.
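
The rule behind that answer: in a Raku regex, alphanumerics match themselves, whitespace in the pattern is ignored by default, and punctuation is metasyntax, so it has to be quoted or backslash-escaped to match literally. A small sketch, assuming Rakudo; the strings are arbitrary examples:

```raku
# Letters match themselves; the space between them is not significant.
say so 'ab'  ~~ / a b /;          # True

# Unescaped '+' is a quantifier, not a literal plus sign...
say so 'aab' ~~ / ^ a+b $ /;      # True
say so 'a+b' ~~ / ^ a+b $ /;      # False

# ...so a literal '+' has to be quoted or escaped.
say so 'a+b' ~~ / ^ 'a+b' $ /;    # True
say so 'a+b' ~~ / ^ a \+ b $ /;   # True
```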
  26. Saner metachars: match integers enclosed in brackets, e.g. [ 1, 2, 3 ]

    Perl 5: / \[ \s* (?: \d+ (?: \s* , \s* \d+ )* \s* )? \] /x
    Raku:   / :s '[' [\d+]* % ',' ']' /

    :s           Consume WS (whitespace in the pattern matches whitespace in the text)
    '[' and ']'  Literals
    [ ... ]      Non-capturing match
    % ','        Separator
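
The % separator modifier can be tried on its own. A sketch, assuming Rakudo; it spells out the whitespace with \s* rather than relying on :s, the capture inside the quantified group collects one match per integer, and the sample strings are arbitrary:

```raku
# '%' allows the separator only *between* repetitions...
say so '1,2,3'  ~~ / ^ [\d+]+ % ',' $ /;    # True
say so '1,2,3,' ~~ / ^ [\d+]+ % ',' $ /;    # False

# ...while '%%' also accepts one trailing separator.
say so '1,2,3,' ~~ / ^ [\d+]+ %% ',' $ /;   # True

# Bracketed list with explicit whitespace handling; $m[0] holds one
# Match per integer because the capture sits in a quantified group.
my $m = '[ 1, 2, 3 ]' ~~ / '[' \s* [ (\d+) ]* % [ \s* ',' \s* ] \s* ']' /;
say $m[0].join(',');                        # 1,2,3
```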
  32. Aside: Future speed Raku regexen execute as DFA where possible.

    Opportunity for much faster execution. For some definition of ...
  33. Aside: Future speed

    Raku regexen execute as DFA where possible. They also benefit from saner results: a DFA is more likely to Do What I Mean. Alternations with | match the longest alternative, not the first one that matches.
  36. Smartest matching

    Perl 5 (with use 5.010 for say) prints "smart":

        say $& if 'Which regex engine is smartest?' =~ /smart|smarter|smartest/;

    Raku (Perl 6) prints ｢smartest｣:

        say $/ if 'Which regex engine is smartest?' ~~ /smart|smarter|smartest/;
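
Both behaviours can be reproduced in Raku alone: | is longest-token alternation, and || is ordered alternation that keeps the first alternative that matches, like Perl 5's |. A sketch, assuming Rakudo:

```raku
my $text = 'Which regex engine is smartest?';

# '|' : longest-token matching, the longest alternative wins.
say ~($text ~~ / smart | smarter | smartest /);     # smartest

# '||': alternatives tried in order, the first that matches wins.
say ~($text ~~ / smart || smarter || smartest /);   # smart
```

Keeping || available means code that really does depend on first-match order can still say so explicitly.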
  39. Perl5 nested structure

        #! /usr/bin/env perl
        use 5.010;

        # We're going to need this to extract hierarchical data structures...
        our @stack = [];

        my $LIST = qr{
            # Match this...
            (?&NESTED)

            # Which is defined as...
            (?(DEFINE)
                (?<NESTED>
                    # Keep track of recursions on a stack...
                    (?{ local @::stack = (@::stack, []); })

                    # Match a list of items...
                    \[ \s* (?>
                        (?&ITEM)
                        (?: \s* , \s* (?&ITEM) )*+
                    )? \s* \]

                    # Pop the stack and add that frame to the growing data structure...
                    (?{ local @::stack = @::stack;
                        my $nested = pop @stack;
                        push @{$::stack[-1]}, $nested;
                    })
                )

                # For each item, push it onto the stack if it's a leaf node...
                (?<ITEM>
                    (\d+) (?{ push @{$stack[-1]}, $^N })
                  | (?&NESTED)
                )
            )
        }x;

        # Match, extracting a data structure...
        '[1,2,[3,3,[4,4]],5]' =~ /$LIST/;

        # Retrieve the data structure...
        my $parse_tree = pop @stack;

        # Show it...
        use Data::Dumper 'Dumper';
        say Dumper($parse_tree);
  40. Perl6 nested grammar

        #! /usr/bin/env rakudo
        use v6;

        # Define the structure of a list...
        grammar LIST {
            rule  TOP  { '[' <ITEM>* % ',' ']' }
            token ITEM { \d+ | <TOP> }
        }

        # Define how to convert list elements to a suitable data structure...
        class TREE {
            method TOP  ($/) { make [ $<ITEM>».ast ] }
            method ITEM ($/) { make $<TOP>.ast // +$/ }
        }

        # Parse, extracting the data structure...
        my $parse_tree = LIST.parse('[1,2,[3,3,[4,4]],5]', :actions(TREE)).ast;
        say $parse_tree.perl;
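
The result of .ast is an ordinary nested Array, so it can be indexed directly, and malformed input simply fails to parse. A short usage sketch, assuming Rakudo and the LIST grammar and TREE actions above; the malformed string is an invented example:

```raku
grammar LIST {
    rule  TOP  { '[' <ITEM>* % ',' ']' }
    token ITEM { \d+ | <TOP> }
}

class TREE {
    method TOP  ($/) { make [ $<ITEM>».ast ] }
    method ITEM ($/) { make $<TOP>.ast // +$/ }
}

my $tree = LIST.parse('[1,2,[3,3,[4,4]],5]', :actions(TREE)).ast;
say $tree.elems;        # 4
say $tree[2][2][1];     # 4  (the second 4 in the innermost list)

# Missing the outer closing bracket: parse returns Nil.
say LIST.parse('[1,2,[3,3]', :actions(TREE)).defined;   # False
```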
  42. Perl6 nested regex

        #! /usr/bin/env rakudo
        use v6;

        '[1,2,[3,3,[4,4]],5]' ~~ /'[' [ (\d+) | $<0>=<~~> ]* % ',' ']' /;
        say $/;
  44. In Raku Regexen are saner. Grammars offer cleaner code. Smart

    matching works. Objects have useful methods. It's new.
  45. In Raku Regexen are saner. Grammars offer cleaner code. Smart

    matching works. Objects have useful methods. It's worth learning.