A PCRE-centric presentation on regular expressions: an introduction and a conceptual analysis of use cases, including a dive into "Is it the right tool for the job?"
Steep Learning Curve
• Did my modem just throw up into my code?
• “regex”, “regexp”, “regex engine”

Example: 0 - 255, with or without leading zeros
    (25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})
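A minimal sketch of trying the octet pattern out. The `\A` and `\z` anchors and the helper name `is_octet` are assumptions added for this sketch, not part of the slide. Without anchors the pattern happily matches the "25" inside "256".

```perl
use strict;
use warnings;

# Anchored variant of the slide's octet pattern. \A and \z are added here
# (an assumption, not on the slide) so that only whole strings are accepted.
my $octet = qr/\A(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})\z/;

# is_octet is a hypothetical helper name used only for this sketch.
sub is_octet {
    my ($s) = @_;
    return $s =~ $octet ? 1 : 0;
}

for my $candidate (qw(0 7 007 99 199 200 249 255 256 300 1234)) {
    printf "%-5s %s\n", $candidate, is_octet($candidate) ? 'ok' : 'rejected';
}
```

Everything up to 255 prints "ok"; 256, 300 and 1234 print "rejected".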
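g  Globally match (don’t stop at the first match)
m  Treat the string as multiple lines: ^ and $ match at the start and end of every line (treating the string as a single line, so that . also matches newlines, is the s modifier)
x  Extend readability by allowing whitespace and comments

    $string =~ m/^a/i;   # /i makes the match case-insensitive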
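A short sketch of the modifiers in action. The sample text and variable names are made up for illustration.

```perl
use strict;
use warnings;

my $text = "apple\nAvocado\nbanana";

# /i: case-insensitive, so the capital "A" in "Avocado" counts too.
# /g in list context returns every match instead of stopping at the first.
my @all_a = $text =~ /a/gi;

# /m: ^ anchors at the start of every line, not only the start of the string.
my @line_starts = $text =~ /^(\w)/mg;

# /x: whitespace and comments inside the pattern are ignored.
my ($word) = $text =~ /
    ^ (b \w+)   # a whole word that starts a line with "b"
/mx;

print scalar(@all_a), " a's; line starts: @line_starts; word: $word\n";
# prints: 6 a's; line starts: a A b; word: banana
```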
abcdefghijklmnopqrstuvwxyz
[^a-z]   Inverted Character Class   Matches characters except those specified
(…)      Grouping w/ Capture        Stores the matched substring in $1, $2, …
(?:…)    Grouping w/o Capture       Allows a programmer to group without capturing
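To make the difference between the two kinds of grouping concrete, here is a small sketch. The `key=value` strings and the variable names are invented for the example.

```perl
use strict;
use warnings;

my $entry = 'color=red';   # made-up sample data

# (...) captures: the two groups land in $1 and $2.
my ($key, $value);
if ($entry =~ /^([a-z]+)=([a-z]+)$/) {
    ($key, $value) = ($1, $2);
}

# (?:...) groups the alternation but does not capture, so the word after
# the "=" is still the first (and only) capture.
my ($after_equals) = 'colour=blue' =~ /^(?:color|colour)=([a-z]+)$/;

# [^a-z] matches any single character that is NOT a lowercase letter.
my $has_other = 'abc1def' =~ /[^a-z]/ ? 1 : 0;

print "$key -> $value; $after_equals; non-lowercase present: $has_other\n";
# prints: color -> red; blue; non-lowercase present: 1
```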
Class   Matches                                                                  Inverse
\w      Matches any word character [a-zA-Z0-9_] (includes utf8 if applicable)    \W
\d      Matches any digit [0-9] (includes utf8 if applicable)                    \D
\s      Matches all whitespace characters (includes utf8 if applicable)          \S
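A quick sketch of the shorthand classes on a made-up log line (the string and variable names are assumptions for illustration):

```perl
use strict;
use warnings;

my $line = "user_42 logged in\tat 09:15";

my @words  = $line =~ /(\w+)/g;    # runs of word characters
my @digits = $line =~ /(\d+)/g;    # runs of digits
my @fields = split /\s+/, $line;   # split on any run of whitespace

# \D is the inverse of \d: deleting every non-digit leaves only the digits.
(my $digits_only = $line) =~ s/\D//g;

print scalar(@words), " words, ", scalar(@fields), " fields, digits: $digits_only\n";
# prints: 6 words, 5 fields, digits: 420915
```

Note that `09:15` is one whitespace-separated field but two `\w+` runs, because `:` is not a word character.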
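*       Matches if found 0 or more times
+       Matches if found 1 or more times
?       Matches if found 0 or 1 times
{x,y}   Matches if found between x and y times
{x,}    Matches if found at least x times
{,y}    Matches if found no more than y times (Perl accepts the empty lower bound only from 5.34 on; older versions treat {,y} as literal text, so {0,y} is the portable spelling)
{x}     Matches if found exactly x times

*** These are all greedy quantifiers ***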
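One way to sanity-check the table is to anchor each quantifier so it has to account for the whole string. This is only a sketch with invented test strings; `{0,2}` is used in place of `{,2}` so it behaves the same on any Perl version.

```perl
use strict;
use warnings;

# Each case: an anchored pattern and a string it is expected to match.
my @cases = (
    [ qr/\Aab*\z/,     'a'      ],   # *     zero b's is fine
    [ qr/\Aab+\z/,     'abbb'   ],   # +     at least one b
    [ qr/\Aab?\z/,     'ab'     ],   # ?     zero or one b
    [ qr/\Aab{2,3}\z/, 'abbb'   ],   # {x,y} between 2 and 3 b's
    [ qr/\Aab{2,}\z/,  'abbbbb' ],   # {x,}  at least 2 b's
    [ qr/\Aab{0,2}\z/, 'abb'    ],   # portable spelling of {,2}
    [ qr/\Aab{3}\z/,   'abbb'   ],   # {x}   exactly 3 b's
);

my $matched = 0;
for my $case (@cases) {
    my ($re, $str) = @$case;
    my $ok = $str =~ $re ? 1 : 0;
    $matched += $ok;
    printf "%-22s %-8s %s\n", $re, $str, $ok ? 'match' : 'no match';
}
```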
    my $string = 'abcdefgh';
    $string =~ m/.*abc/;
    # READ AS:
    # .* takes 'abcdefgh' = Match Fails
    # .* gives back 'h' = Match Fails
    # .* gives back 'g' = Match Fails
    # ...
    # After .* gives back 'a'
    # Engine checks,
    #   'a' => SUCCESS
    #   followed by 'b' => SUCCESS
    #   followed by 'c' => SUCCESS
    # MATCH SUCCESS
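The give-back behaviour can be observed by wrapping `.*` in a capture group. This is a sketch of the visible result only, not of how the engine is implemented; the second string is made up to show the opposite case.

```perl
use strict;
use warnings;

my $string = 'abcdefgh';

# .* first grabs everything, then gives characters back until "abc" can
# match. Here it has to give back the whole string.
my ($consumed) = $string =~ /(.*)abc/;
print "'.*' ended up with ", length($consumed), " characters\n";   # 0

# With several candidates, greedy .* keeps as much as it can and stops
# giving back as soon as the LAST "abc" fits.
my ($greedy) = 'abc-abc-abc' =~ /(.*)abc/;
print "'.*' ended up with '$greedy'\n";   # 'abc-abc-'
```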
    Burgundy - Stay Classy';

    # Check if $string starts with a digit
    my $test1 = $string =~ /^[0-9]/; # 1

    # Check if $string starts with 1 or more digits
    my $test2 = $string =~ /^[0-9]+/; # 1

    # Check if $string starts with 3 digits
    my $test3 = $string =~ /^[0-9]{3}/; # 1
    Ron Burgundy - Go Fuck Yourself';

    # Check if $string starts with something other than a digit
    my $test1 = $string =~ /^[^0-9]/; # false (no match)

    # Contains "Bad data"? (input sanitation)
    my $test2 = $string =~ /[^a-zA-Z0-9 \-]/; # false (no match)
    # Is it an IP
    my $isip = $string =~ /\d+\.\d+\.\d+\.\d+/;
    # $isip = 1
    # Also matches 888.888.888.888
    # or 8.888888888.8888888888.8

    /\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/;
    # Still matches 888.888.888.888

    # We can shorten it:
    /\d{1,3}(\.\d{1,3}){3}/;
    # Might be "good enough"
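If "good enough" is not good enough, one possible stricter check reuses the 0 - 255 octet pattern from the earlier slide. The anchoring and the variable names below are assumptions of this sketch, not something the slides show.

```perl
use strict;
use warnings;

# The 0 - 255 octet pattern from the earlier slide, without a capture group.
my $octet = qr/(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})/;

# Four octets separated by dots, anchored so nothing else may surround them.
my $strict = qr/\A$octet(?:\.$octet){3}\z/;

# The shortened pattern from the slide, for comparison.
my $good_enough = qr/\d{1,3}(\.\d{1,3}){3}/;

for my $candidate (qw(192.168.0.1 888.888.888.888 10.0.0.256 1.2.3)) {
    printf "%-16s strict=%d good_enough=%d\n",
        $candidate,
        ($candidate =~ $strict      ? 1 : 0),
        ($candidate =~ $good_enough ? 1 : 0);
}
```

Only `192.168.0.1` passes the strict check, while the shortened pattern also accepts `888.888.888.888` and `10.0.0.256`.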
    -98%
    good_enough 581395/s 5058% --

    $ perl --version
    This is perl 5, version 12, subversion 2 (v5.12.2) built for i686-linux
    Copyright 1987-2010, Larry Wall
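A table in this format is what the core `Benchmark` module's `cmpthese` prints. The sketch below shows one way such a comparison could be set up; it is not the benchmark behind the numbers above. The other label is cut off in the table, so `strict`, its pattern, and the sample address are made-up stand-ins, and the rates will differ by machine and Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical "strict" contender: anchored 0 - 255 octets.
my $octet       = qr/(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})/;
my $strict      = qr/\A$octet(?:\.$octet){3}\z/;
my $good_enough = qr/\d{1,3}(\.\d{1,3}){3}/;

my $ip = '192.168.10.254';   # made-up sample input

# A negative count asks Benchmark to run each sub for at least that many
# CPU seconds; cmpthese then prints a rate comparison table.
my $rows = cmpthese(-1, {
    strict      => sub { my $ok = $ip =~ $strict },
    good_enough => sub { my $ok = $ip =~ $good_enough },
});
```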
• Greedy Quantifiers consume as much as they can while still allowing the ENTIRE regex to match
• Non-greedy Quantifiers are lazy and consume only as much of the string as is necessary to allow the ENTIRE regex to match
*?       Matches 0 times, or more times if needed
+?       Matches 1 time, or more times if needed
{x,y}?   Matches x times, up to y times if needed
{x,}?    Matches x times, more if needed
{,y}?    Matches 0 times, up to y times if needed (write {0,y}? on Perl older than 5.34)
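A side-by-side sketch of greedy and lazy quantifiers on the same input. The sample markup and variable names are invented for the example.

```perl
use strict;
use warnings;

my $html = '<b>bold</b> and <i>italic</i>';   # made-up sample

# Greedy: .+ runs to the end and backs up only as far as the LAST ">".
my ($greedy) = $html =~ /<(.+)>/;

# Lazy: .+? takes one character, then grows only until the FIRST ">" fits.
my ($lazy) = $html =~ /<(.+?)>/;

# Combined with /g, the lazy form picks out each tag separately.
my @tags = $html =~ /<(.+?)>/g;

# The same idea with a counted quantifier.
my ($max) = 'aaaaa' =~ /\A(a{2,4})/;    # as many as allowed
my ($min) = 'aaaaa' =~ /\A(a{2,4}?)/;   # as few as allowed

print "greedy: $greedy\nlazy: $lazy\ntags: @tags\nmax: $max min: $min\n";
# greedy: b>bold</b> and <i>italic</i
# lazy: b
# tags: b /b i /i
# max: aaaa min: aa
```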