Raku Regexen: Reducing line noise in your code.

Perl's regular expression engine is the envy of other languages; notice how many of them use PCRE (Perl Compatible Regular Expressions). You can parse almost anything with Perl's regexes, if you don't go cross-eyed trying to read them first...

A bit of 20/20 hindsight is helping all of our vision these days, with Raku and its grammars. They allow for saner, layered processing of even complicated content.

This talk looks at the basics of Raku regexes and how they are used in grammars to parse some real-world examples.

Steven Lembark

July 09, 2022

Transcript

  1. Raku Regexen:
    Reducing line noise in your code.
    Steven Lembark
    Workhorse Computing

  2. The difference
    … prefer readability over compactness.
    – Larry Wall
    Regexes you can read.
    Code you can maintain.
    Do what I mean need.

  3. Aside: Just what is line noise?
    Random g88;d^C\@x*fhe screen.
    Due to acous*^&(*&;dljfa;dskjf
    tic interference.

  4. Raku
    Is more than "just syntax".
    New way to approach problems.
    Also true with regexen.
    Not just isolated "pattern matching".

  5. The old way
    RX-from-hell with alternations.
    Regex tokenizes, code parses.
    Intertwined mess of code and regexes.
    Order-of-execution errors abound.

  6. Parse *.ini file
    my $header     = qr{ ^ \h* \[ (?<name> [^][]+ ) \] \h* $ }xm;
    my $property   = qr{ ^ \h* (?<key> .+? ) \h* = \h* (?<value> .+ ) $ }xm;
    my $comment    = qr{ ^ \h* \# }xm;
    my $empty_line = qr{ ^ \h* $ }xm;

  7. Parse *.ini file
    my ( %config, $section );
    # Read one line at a time, so $. holds the current line number
    while ( my $nextline = readline($INI_FILE) ) {
        # If it's a header line, start a new section...
        if ( $nextline =~ /$header/ ) {
            $section = $config{ $+{name} } //= {};
        }
        # If it's a property, add the key and value to the current section...
        elsif ( $nextline =~ /$property/ ) {
            $section->{ $+{key} } = $+{value};
        }
        # Ignore comments or empty lines
        elsif ( $nextline =~ /$comment|$empty_line/ ) {
            # Do nothing
        }
        # Report anything else as a probable error...
        else {
            warn "Invalid data in INI file at line $.\n"
               . "\t$nextline\n";
        }
    }

  8. Maintainable?
    Inter-related order of code and rx.
    Code changes affect rx?
    Rx changes affect code?
    Find out: Try it and see...

  9. Raku: Grammars
    Structure in one place.
    Declarative.
    No iterative code.

  10. Tokens & Structure
    grammar INI
    {
        token TOP        { <section>* }
        token section    { [ ^ | <header> ] <block> }
        token header     { '[' $<name>=<-[ \[ \] \n ]>+ ']' \h* \n }
        token block      { [ <property> | <.emptylines> | <.comment> ]* }
        token property   { \h* $<key>=\N+? \h* '=' \h* $<value>=\N+ \n }
        token comment    { ^^ \h* '#' \N* \n }
        token emptylines { [ ^^ \h* \n ]+ }
    }
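Without an actions class, `.parse` returns a tree of Match objects keyed by token name. The sketch below walks that tree directly. It assumes the capture names reconstructed above (`<section>`, `<header>`, `<name>`, `<block>`, `<property>`, `<key>`, `<value>`), and the sample file contents are made up for illustration:

```raku
# INI grammar as reconstructed above (capture names are an assumption).
grammar INI {
    token TOP        { <section>* }
    token section    { [ ^ | <header> ] <block> }
    token header     { '[' $<name>=<-[ \[ \] \n ]>+ ']' \h* \n }
    token block      { [ <property> | <.emptylines> | <.comment> ]* }
    token property   { \h* $<key>=\N+? \h* '=' \h* $<value>=\N+ \n }
    token comment    { ^^ \h* '#' \N* \n }
    token emptylines { [ ^^ \h* \n ]+ }
}

# Hypothetical input: a headerless section, then [db] with a blank line and a comment.
my $text = q:to/END/;
name = demo
[db]
host = localhost

# a comment line
port = 5432
END

my $match = INI.parse($text);
die 'parse failed' unless $match;

# Walk the Match tree: each section has an optional <header> and one <block>.
my @lines;
for $match<section>.list -> $section {
    # The leading section matched via ^ and has no header (Nil is false).
    my $name = $section<header> ?? ~$section<header><name> !! '(none)';
    for $section<block><property>.list -> $prop {
        @lines.push: $name ~ ': ' ~ $prop<key> ~ ' = ' ~ $prop<value>;
    }
}
.say for @lines;
```

Comments and blank lines disappear from the tree because `block` calls them as `<.comment>` and `<.emptylines>`; the leading dot suppresses the capture.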

  16. Process the content
    class INI::hash_builder
    {
        method TOP      ($/) { make %( $<section>».ast ) }
        method section  ($/) { make ~($<header><name> // '') => $<block>.ast }
        method block    ($/) { make %( $<property>».ast ) }
        method property ($/) { make ~$<key> => ~$<value> }
    }
    my %config = INI.parsefile( 'example.ini', :actions(INI::hash_builder) ).ast;
    say %config.raku;
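The slide reads `example.ini` from disk. A self-contained variant, assuming the same reconstructed capture names and an inline string in place of the file (`.parse` takes the text itself, `.parsefile` takes a path), might look like this:

```raku
# Grammar and actions as reconstructed on the slides (capture names assumed).
grammar INI {
    token TOP        { <section>* }
    token section    { [ ^ | <header> ] <block> }
    token header     { '[' $<name>=<-[ \[ \] \n ]>+ ']' \h* \n }
    token block      { [ <property> | <.emptylines> | <.comment> ]* }
    token property   { \h* $<key>=\N+? \h* '=' \h* $<value>=\N+ \n }
    token comment    { ^^ \h* '#' \N* \n }
    token emptylines { [ ^^ \h* \n ]+ }
}

class INI::hash_builder {
    # Each method builds its piece with `make`; parents collect it with `.ast`.
    method TOP      ($/) { make %( $<section>».ast ) }
    method section  ($/) { make ~($<header><name> // '') => $<block>.ast }
    method block    ($/) { make %( $<property>».ast ) }
    method property ($/) { make ~$<key> => ~$<value> }
}

# Hypothetical input: properties before any header land under the '' key.
my $text = "name = demo\n[db]\nhost = localhost\nport = 5432\n";
my %config = INI.parse($text, :actions(INI::hash_builder)).ast;
say %config.raku;
```

Hash order in the `.raku` output is not fixed, so check individual keys (for example `%config<db><host>`) rather than the printed string.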

  23. Care to guess what this does?
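    Q: Which characters match themselves?
    A: None. In Raku, only letters, digits and underscore match literally; every other character is reserved, so quote it ('#') or escape it (\#) to match it.
    /{~}[email protected]#$%^(&*)-+=[\/]:;"'<.,>?/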
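A few checks that illustrate the rule; the sample string is made up for the illustration:

```raku
my $s = 'Total: $5.00 (approx.)';

# Letters, digits and underscore match themselves.
my $word    = $s ~~ / Total /;
# Punctuation must be quoted...
my $quoted  = $s ~~ / '$5.00' /;
# ...or backslash-escaped.
my $escaped = $s ~~ / \$ \d+ \. \d+ /;
# An unescaped '.' still means "any character" (whitespace in the pattern is ignored).
my $any     = $s ~~ / 5 . 00 /;

say ~$quoted;
say ~$escaped;
say ~$any;
```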

  29. Saner metachars
    Match integers enclosed in square brackets: [ 1, 2, 3 ]
    Perl 5:
    / \[ \s* (?: \d+ (?: \s* , \s* \d+ )* \s* )? \] /x
    Raku:
    / :s '[' [\d+]* % ',' ']' /
      :s        Consume WS (whitespace in the pattern matches whitespace in the text)
      '[' ']'   Literals
      [ ... ]   Non-capturing match
      % ','     Separator
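The `%` after a quantifier names the separator that must appear between repetitions; `%%` additionally allows one trailing separator. This sketch uses made-up sample strings and spells the whitespace out with `\s*` instead of the slide's `:s`, so every piece is explicit:

```raku
# `%`: separator between repetitions only.
my $plain    = so '1,2,3'  ~~ / ^ [\d+]+ %  ',' $ /;   # matches
my $trailing = so '1,2,3,' ~~ / ^ [\d+]+ %  ',' $ /;   # fails: trailing ','
# `%%`: one trailing separator is also accepted.
my $allowed  = so '1,2,3,' ~~ / ^ [\d+]+ %% ',' $ /;   # matches

# The slide's bracketed list, capturing the numbers positionally in $0.
my $list = '[ 1, 2, 3 ]';
my @nums;
if $list ~~ / '[' \s* (\d+)* % [ \s* ',' \s* ] \s* ']' / {
    # A quantified capture group yields a list of Match objects.
    @nums = $0.map(*.Int);
}
say $plain, ' ', $trailing, ' ', $allowed;
say @nums;
```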

  35. Aside: Future speed
    Raku regexen execute as DFA where possible.
    Opportunity for much faster execution.
    For some definition of ...

  36. Aside: Future speed
    Raku regexen execute as DFA where possible.
    Also benefit from saner results.
    DFA more likely to Do What I Mean.
    Alternations (|) pick the longest match, not the first alternative that matches.

  39. Smartest matching
    Perl 5 (first alternative that matches wins):
    say $& if 'Which regex engine is smartest?'
    =~ /smart|smarter|smartest/;      # smart
    Raku (longest alternative wins):
    say $/ if 'Which regex engine is smartest?'
    ~~ /smart|smarter|smartest/;      # ｢smartest｣
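Raku still offers Perl 5's ordered behavior when it is wanted: `||` tries alternatives left to right and keeps the first that matches, while `|` uses longest-token matching. A minimal comparison on the slide's sentence:

```raku
my $text = 'Which regex engine is smartest?';

# `|`: longest-token matching; the longest alternative that matches wins.
my $longest = $text ~~ / smart | smarter | smartest /;

# `||`: sequential alternation, like Perl 5's `|`; the first success wins.
my $first   = $text ~~ / smart || smarter || smartest /;

say ~$longest;   # smartest
say ~$first;     # smart
```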

  42. Perl5 nested structure
    #! /usr/bin/env perl
    use 5.010;
    # We're going to need this to extract hierarchical data structures...
    our @stack = [];
    my $LIST = qr{
        # Match this...
        (?&NESTED)
        # Which is defined as...
        (?(DEFINE)
            (?<NESTED>
                # Keep track of recursions on a stack...
                (?{ local @::stack = (@::stack, []); })
                # Match a list of items...
                \[ \s* (?>
                    (?&ITEM)
                    (?:
                        \s* , \s* (?&ITEM)
                    )*+
                )? \s*
                \]
                # Pop the stack and add that frame to the growing data structure...
                (?{ local @::stack = @::stack;
                    my $nested = pop @stack;
                    push @{$::stack[-1]}, $nested;
                })
            )
            # For each item, push it onto the stack if it's a leaf node...
            (?<ITEM>
                (\d+) (?{ push @{$stack[-1]}, $^N })
              | (?&NESTED)
            )
        )
    }x;
    # Match, extracting a data structure...
    '[1,2,[3,3,[4,4]],5]' =~ /$LIST/;
    # Retrieve the data structure...
    my $parse_tree = pop @stack;
    # Show it...
    use Data::Dumper 'Dumper';
    say Dumper($parse_tree);

  43. Perl6 nested grammar
    #! /usr/bin/env rakudo
    use v6;
    # Define the structure of a list...
    grammar LIST {
        rule  TOP  { '[' <ITEM>* % ',' ']' }
        token ITEM { \d+ | <TOP> }
    }
    # Define how to convert list elements to a suitable data structure...
    class TREE {
        method TOP  ($/) { make [ $<ITEM>».ast ] }
        method ITEM ($/) { make $<TOP>.ast // +$/ }
    }
    # Parse, extracting the data structure...
    my $parse_tree = LIST.parse('[1,2,[3,3,[4,4]],5]', :actions(TREE)).ast;
    say $parse_tree.raku;
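A self-contained run of the grammar above, with the stripped capture names (`<ITEM>`, `<TOP>`) restored as an assumption, plus a malformed input to show that `.parse` yields a false value (Nil) when the whole string does not match, so it is worth checking before calling `.ast`:

```raku
grammar LIST {
    rule  TOP  { '[' <ITEM>* % ',' ']' }
    token ITEM { \d+ | <TOP> }          # a number, or a nested list
}

class TREE {
    method TOP  ($/) { make [ $<ITEM>».ast ] }     # one Array per list level
    method ITEM ($/) { make $<TOP>.ast // +$/ }    # nested Array, else the number
}

my $parse_tree = LIST.parse('[1,2,[3,3,[4,4]],5]', :actions(TREE)).ast;
say $parse_tree.raku;

# Missing the final ']': the parse fails as a whole.
my $bad = LIST.parse('[1,2,[3]', :actions(TREE));
say $bad ?? 'parsed' !! 'no match';
```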

  45. Perl6 nested regex
    #! /usr/bin/env rakudo
    use v6;
    '[1,2,[3,3,[4,4]],5]'
    ~~ /'[' [ (\d+) | $<0>=<~~> ]* % ',' ']' /;
    say $/;

  50. In Raku
    Regexen are saner.
    Grammars offer cleaner code.
    Smart matching works.
    Objects have useful methods.
    It's new.

  51. In Raku
    Regexen are saner.
    Grammars offer cleaner code.
    Smart matching works.
    Objects have useful methods.
    It's worth learning.
