Raku Regexen: Reducing line noise in your code.

Steven Lembark

July 09, 2022

Perl's Regular Expression engine is the envy of all other languages -- notice how many use PCRE. You can parse almost anything with Perl's regexes, if you don't go cross-eyed trying to read them first...

A bit of 20/20 hindsight is helping all of our vision these days, with Raku and its grammars. They allow for saner, layered processing of even complicated content.

This talk looks at the basics of Raku regexes and how they are used in grammars to parse some real-world examples.

Transcript

Raku Regexen: Reducing line noise in your code. Steven Lembark
Workhorse Computing [email protected]
The difference … prefer readability over compactness. – Larry Wall
Regexes you can read. Code you can maintain. Do what I mean need.
Aside: Just what is line noise? Random g88;d^C\@x*fhe screen. Due
to acous*^&(*&;dljfa;dskjf tic interference.
Raku Is more than "just syntax". New way to approach
problems. Also true with regexen. Not just isolated "pattern matching"
The old way RX-from-hell with alternations. Regex tokenizes, code parses.
Intertwined mess of code and regexes. Order-of-execution errors abound.
Parse *.ini file
    my $header     = qr{ ^ \h* \[ (?<name> [^][]+ ) \] \h* $ }xm;
    my $property   = qr{ ^ \h* (?<key> .+? ) \h* = \h* (?<value> .+ ) $ }xm;
    my $comment    = qr{ ^ \h* \# }xm;
    my $empty_line = qr{ ^ \h* $ }xm;
Parse *.ini file
    for my $nextline (readline($INI_FILE)) {
        # If it's a header line, start a new section...
        if ($nextline =~ /$header/) {
            $section = $config{ $+{name} } //= {};
        }
        # If it's a property, add the key and value to the current section...
        elsif ($nextline =~ /$property/) {
            $section->{ $+{key} } = $+{value};
        }
        # Ignore comments or empty lines
        elsif ($nextline =~ /$comment|$empty_line/) {
            # Do nothing
        }
        # Report anything else as a probable error...
        else {
            warn "Invalid data in INI file at line $.\n" . "\t$nextline\n";
        }
    }
Maintainable? Inter-related order of code and rx. Code changes affect
rx? Rx changes affect code? Find out: Try it and see...
Raku: Grammars Structure in one place. Declarative. No iterative code.
Tokens & Structure
    grammar INI {
        token TOP        { <section>* }
        token section    { [ ^ | <header> ] <block> }
        token header     { '[' $<ID> = <-[ \[ \] \n ]>+ ']' \h* \n }
        token block      { [ <property> | <.emptylines> | <.comment> ]* }
        token property   { \h* $<name>=\N+? \h* '=' \h* $<value>=\N+ \n }
        token comment    { ^^ \h* '#' \N* \n }
        token emptylines { [ ^^ \h* \n ]+ }
    }
Process the content
    class INI::hash_builder {
        method TOP      ($/) { make %( $<section>».ast ) }
        method section  ($/) { make ~($<header><ID>//'') => $<block>.ast }
        method block    ($/) { make %( $<property>».ast ) }
        method property ($/) { make ~$<name> => ~$<value> }
    }
    my %config = INI.parsefile( 'example.ini', :actions(INI::hash_builder) ).ast;
    say %config.perl;
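The grammar-plus-actions pattern on these slides reads from a file, which makes it awkward to try out. The following is a minimal sketch of the same idea on an in-memory string. The names `KeyValue` and `KeyValueActions` and the sample text are hypothetical and not part of the talk; the sketch handles only bare `key = value` lines, not full INI syntax.

```raku
# Hypothetical, cut-down grammar: one "key = value" pair per line.
grammar KeyValue {
    token TOP   { [ <pair> \n ]+ }
    token pair  { <key> \h* '=' \h* <value> }
    token key   { \w+ }
    token value { \N+ }
}

# Actions class: each method runs when the token of the same name matches.
class KeyValueActions {
    # Turn the list of Pair objects into a Hash.
    method TOP  ($/) { make %( $<pair>».made ) }
    # Stringify both captures and build a Pair.
    method pair ($/) { make ~$<key> => ~$<value> }
}

my $text   = "name = Raku\nkind = language\n";
my %config = KeyValue.parse($text, :actions(KeyValueActions)).made;

for %config.sort(*.key) -> $pair {
    say $pair.key, ' => ', $pair.value;
}
```

The grammar only describes structure; all data shaping lives in the actions class, so either half can change without touching the other.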
Care to guess what this does?
    /{~}!@#$%^(&*)-+=[\/]:;"'<.,>?/
Q: Which chars match themselves?
A: None, in Raku, since they are punctuation.
Saner metachars
Match integers enclosed in square brackets: [ 1, 2, 3 ]
Perl 5:
    / \[ \s* (?: \d+ (?: \s* , \s* \d+ )* \s* )? \] /x
Raku:
    / :s '[' [\d+]* % ',' ']' /
In the Raku version:
    :s            Consume WS
    '[' ',' ']'   Literals
    [ ... ]       Non-capturing match
    %             Separator
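To experiment with the Raku pattern from this slide, here is a small sketch. It anchors the slide's regex with `^` and `$` so partial matches do not count, then shows how wrapping `\d+` in a capture collects the elements. The variable `$int-list`, the helper `is-int-list`, and the sample inputs are assumptions added for illustration.

```raku
# The slide's pattern, anchored at both ends.
my $int-list = rx/ :s ^ '[' [\d+]* % ',' ']' $ /;

# Hypothetical helper: True when the whole string is a bracketed integer list.
sub is-int-list(Str $text --> Bool) { so $text ~~ $int-list }

# Print the verdict for a few candidate strings.
for '[1,2,3]', '[ 1, 2, 3 ]', '[]', '[1,,2]', '[1 2]' -> $candidate {
    say "$candidate => ", is-int-list($candidate);
}

# A capture inside the quantified group yields one Match per element.
my @numbers;
if '[10,20,30]' ~~ / '[' [ (\d+) ]* % ',' ']' / {
    @numbers = $0.map(*.Int);
}
say @numbers;
```

The `%` modifier keeps the separator out of the repeated atom, so there is no need for the "first item, then comma-item pairs" shape that the Perl 5 version spells out by hand.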
Aside: Future speed
Raku regexen execute as DFA where possible. Opportunity for much faster execution. For some definition of ...
Also benefit from saner results. DFA more likely to Do What I Mean. Alternations match best, not first match.
Smartest matching
Perl 5:
    say $& if 'Which regex engine is smartest?' =~ /smart|smarter|smartest/;
Raku:
    say $/ if 'Which regex engine is smartest?' ~~ /smart|smarter|smartest/;
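The "best, not first" behaviour comes from Raku's `|` alternation, which prefers the longest matching alternative. Raku also provides `||`, which tries alternatives left to right the way Perl 5's `|` does. A short sketch contrasting the two on the slide's sentence follows; the variable names are illustrative only.

```raku
my $text = 'Which regex engine is smartest?';

# `|` considers all alternatives together and keeps the longest match.
my $longest = ~($text ~~ / smart | smarter | smartest /);

# `||` tries alternatives in order and keeps the first one that matches.
my $first = ~($text ~~ / smart || smarter || smartest /);

say "longest-token: $longest";   # smartest
say "first-match:   $first";     # smart
```

With `|` the order in which the alternatives are written does not change the result here, which removes one source of order-of-execution bugs.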
Perl5 nested structure
    #! /usr/bin/env perl
    use 5.010;
    # We're going to need this to extract hierarchical data structures...
    our @stack = [];
    my $LIST = qr{
        # Match this...
        (?&NESTED)
        # Which is defined as...
        (?(DEFINE)
            (?<NESTED>
                # Keep track of recursions on a stack...
                (?{ local @::stack = (@::stack, []); })
                # Match a list of items...
                \[ \s* (?> (?&ITEM) (?: \s* , \s* (?&ITEM) )*+ )? \s* \]
                # Pop the stack and add that frame to the growing data structure...
                (?{ local @::stack = @::stack;
                    my $nested = pop @stack;
                    push @{$::stack[-1]}, $nested; })
            )
            # For each item, push it onto the stack if it's a leaf node...
            (?<ITEM>
                (\d+) (?{ push @{$stack[-1]}, $^N })
              | (?&NESTED)
            )
        )
    }x;
    # Match, extracting a data structure...
    '[1,2,[3,3,[4,4]],5]' =~ /$LIST/;
    # Retrieve the data structure...
    my $parse_tree = pop @stack;
    # Show it...
    use Data::Dumper 'Dumper';
    say Dumper($parse_tree);
Perl6 nested grammar
    #! /usr/bin/env rakudo
    use v6;
    # Define the structure of a list...
    grammar LIST {
        rule  TOP  { '[' <ITEM>* % ',' ']' }
        token ITEM { \d+ | <TOP> }
    }
    # Define how to convert list elements to a suitable data structure...
    class TREE {
        method TOP  ($/) { make [ $<ITEM>».ast ] }
        method ITEM ($/) { make $<TOP>.ast // +$/ }
    }
    # Parse, extracting the data structure...
    my $parse_tree = LIST.parse('[1,2,[3,3,[4,4]],5]', :actions(TREE)).ast;
    say $parse_tree.perl;
Perl6 nested regex
    #! /usr/bin/env rakudo
    use v6;
    '[1,2,[3,3,[4,4]],5]' ~~ /'[' [ (\d+) | $<0>=<~~> ]* % ',' ']' /;
    say $/;
In Raku Regexen are saner. Grammars offer cleaner code. Smart
matching works. Objects have useful methods. It's new.
In Raku Regexen are saner. Grammars offer cleaner code. Smart
matching works. Objects have useful methods. It's worth learning.