Perl Regular Expressions

ynonperek
January 04, 2012

Working with regular expressions in Perl

Transcript

  1. The Regular Problem

    Regular Expressions provide a language that describes text and other languages
    It’s a mini language of its own, with syntax rules
    Regular Expressions are a power tool for solving text-related problems
    Thursday, December 22, 2011
  2. The Regexp Story

    Started in Mathematics
    1968: Entered the Unix world through Ken Thompson’s qed
    1984: Standardized by Henry Spencer’s implementation
  3. Regular Expressions & Unix

    Many UNIX tools take regular expressions
    grep/egrep filter their input based on regular expressions
    more/less/most search uses regular expressions
    vi/vim search and replace use regular expressions
  4. Regular Expressions Today

    Used by most major programming languages, including:
    PHP, Python, Perl, Tcl
    JavaScript, ActionScript, Microsoft .NET, Oracle Java
    Objective-C
    And more
  5. Examples

    Expression    Meaning
    foo           Match only input lines that include the text ‘foo’
    unix          Match only input lines that include the text ‘unix’
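
A minimal sketch of the two example patterns used as line filters in Perl. The sample lines are made up for illustration. Note that a plain pattern matches anywhere in the line, so 'food court' also passes the foo filter.

```perl
use strict;
use warnings;

# Hypothetical sample input; any list of lines would do.
my @lines = ('foo bar', 'unix rules', 'food court', 'nothing here');

# Keep only the lines in which the pattern matches somewhere.
my @with_foo  = grep { /foo/ }  @lines;
my @with_unix = grep { /unix/ } @lines;

print "foo:  $_\n" for @with_foo;
print "unix: $_\n" for @with_unix;
```
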
  6. Rule #2

    A character class matches a single character from the class
  7. Character Class Syntax

    A class is denoted by [...]
    Can use any character sequence inside the square brackets: [012], [abc], [aAbBcZ]
    Can use ranges inside the square brackets: [0-9], [a-z], [a-zA-Z], [0-9ab]
  8. Examples

    Expression          Meaning
    [0-9][0-9]          Match only input lines that include at least two consecutive digits
    [Uu][Nn][Ii][Xx]    Match only input lines that include the text ‘unix’ in any casing
  9. Which of these match?

    Pattern            Input
    hello [ux][012]    hello world
    hello [ux][012]    hello unix
    hello [ux][012]    hello u2
    hello [ux][012]    hello x10
    hello [ux][012]    HELLO U2
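
One way to check the quiz is to run the pattern over the five inputs. This is only a sketch; the variable names are chosen for illustration. Each class consumes exactly one character and matching is case sensitive by default, so only 'hello u2' and 'hello x10' pass.

```perl
use strict;
use warnings;

my $pattern = qr/hello [ux][012]/;

# The five inputs from the slide.
my @inputs = ('hello world', 'hello unix', 'hello u2', 'hello x10', 'HELLO U2');

# Collect the inputs in which the pattern matches.
my @matched = grep { $_ =~ $pattern } @inputs;

print "matched: $_\n" for @matched;
```
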
  11. Predefined Character Classes

    [:alnum:] [:alpha:] [:cntrl:] [:digit:] [:graph:] [:lower:] [:print:] [:punct:] [:space:] [:upper:] [:xdigit:]
    cheat sheet at: http://www.petefreitag.com/cheatsheets/regex/character-classes/
  12. Predefined Character Classes

    \w matches a word character, [0-9a-zA-Z_]; \W matches any other character
    \s matches a white space character; \S matches any other character
    \d matches a digit; \D matches any other character
    cheat sheet at: http://www.petefreitag.com/cheatsheets/regex/character-classes/
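
A short sketch of the shorthand classes in Perl, using a made-up sample string. With the g flag in list context, the match operator returns every match, which makes it easy to see what each class picks up.

```perl
use strict;
use warnings;

# Hypothetical sample text containing a space and a tab.
my $text = "room 42\tis_open";

my @digits = $text =~ /\d/g;     # every single digit
my @words  = $text =~ /\w+/g;    # runs of word characters (underscore included)
my @gaps   = $text =~ /\s/g;     # every whitespace character
my @others = $text =~ /\W/g;     # every character that is not a word character

printf "%d digits, %d words, %d whitespace characters\n",
    scalar @digits, scalar @words, scalar @gaps;
```
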
  13. Predefined Character Classes

    Note that the brackets are part of the class name, so the correct use is: [[:digit:]]
    This allows using: [[:digit:][:lower:]]
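
The combined class from the slide can be tried out as follows. The anchors and the + quantifier are additions for this sketch, so that the whole string has to consist of digits and lowercase letters; the candidate strings are made up.

```perl
use strict;
use warnings;

# POSIX classes are written inside an enclosing bracketed class.
my $digit_or_lower = qr/^[[:digit:][:lower:]]+$/;

# Hypothetical candidates.
my @candidates = ('abc123', 'ABC123', '42', 'a b');
my @accepted   = grep { $_ =~ $digit_or_lower } @candidates;

print "accepted: $_\n" for @accepted;
```
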
  14. Rule #3

    A quantifier denotes how many times the preceding element may match
  15. Quantifiers Syntax

    * means match zero or more times
    + means match one or more times
    ? means match zero or one time
    {n,m} means match at least n but no more than m times
    {n} means match exactly n times
  16. Which of these match?

    Pattern                           Input
    [[:digit:]]{2}-?[[:digit:]]{7}    08-9112232
    [[:digit:]]{2}-?[[:digit:]]{7}    421121212
    [[:digit:]]{2}-?[[:digit:]]{7}    054-2201121
    [[:digit:]]{2}-?[[:digit:]]{7}    Phone: 03-9112121
    [[:digit:]]{2}-?[[:digit:]]{7}    Bond 007
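
The quiz can be checked with a small script that also records which part of each input matched. This is a sketch with illustrative variable names. Because the pattern is not anchored, '054-2201121' still matches: the engine skips the leading 0 and matches '54-2201121'.

```perl
use strict;
use warnings;

my $phone = qr/[[:digit:]]{2}-?[[:digit:]]{7}/;

my @inputs = ('08-9112232', '421121212', '054-2201121',
              'Phone: 03-9112121', 'Bond 007');

# Map each input to the text that matched, or undef when nothing matched.
my %found;
for my $input (@inputs) {
    my ($hit) = $input =~ /($phone)/;
    $found{$input} = $hit;
    printf "%-20s => %s\n", $input, $hit // '(no match)';
}
```
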
  18. Which of these match?

    Pattern                         Input
    (http://)?w{3}\.[a-z]+\.com     www.google.com
    (http://)?w{3}\.[a-z]+\.com     www.ynet.co.il
    (http://)?w{3}\.[a-z]+\.com     http://mail.google.com
    (http://)?w{3}\.[a-z]+\.com     http://www.home.com
    (http://)?w{3}\.[a-z]+\.com     http://www.tel-aviv.com
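
A sketch for checking this quiz in Perl. The pattern contains slashes, so it is written with qr{...} instead of /.../ to avoid escaping them. The class [a-z]+ accepts neither a dot nor a hyphen, which is why only 'www.google.com' and 'http://www.home.com' match.

```perl
use strict;
use warnings;

# Braces as delimiters; the nested {3} is fine because the braces balance.
my $site = qr{(http://)?w{3}\.[a-z]+\.com};

my @inputs = ('www.google.com', 'www.ynet.co.il', 'http://mail.google.com',
              'http://www.home.com', 'http://www.tel-aviv.com');

my @matched = grep { $_ =~ $site } @inputs;

print "matched: $_\n" for @matched;
```
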
  20. Backtracking

    When the engine encounters a quantifier, it keeps adding matches to the quantified element for as long as possible
    If a match failure occurs later on, the engine backtracks: it gives back part of what the quantifier matched and tries again
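
Backtracking is easiest to see through what a capture group ends up holding. The two inputs below are made up for this sketch. In both cases the quantifier first takes as much as it can and then gives characters back until the rest of the pattern fits.

```perl
use strict;
use warnings;

# a+ first takes all three 'a' characters, but then the literal "ab" cannot
# match, so the engine gives one 'a' back and tries again.
my ($kept) = 'aaab' =~ /^(a+)ab$/;
print "a+ kept: $kept\n";

# .* first runs to the end of the string, then backs off until '>' can match.
# The result spans up to the LAST '>', not the first one.
my ($inner) = '<b>bold</b>' =~ /<(.*)>/;
print ".* kept: $inner\n";
```
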
  21. Rule #4

    An assertion matches on a condition, without consuming input characters
  22. Assertions

    ^ matches the beginning of a line
    $ matches the end of a line
    \b matches a word boundary
  23. Which of these match?

    Pattern    Input
    ^d         drwxr-xr-x dive
    ^d         -rwxr-xr-x dive
    ^d         lrwxr-xr-x dive
    ^d         drwxr-xr-x /home
    ^d         -rwxr-xr-x /etc/passwd
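
The three assertions can be tried on the listing lines from the quiz. The $ and \b examples, and the extra sample strings for \b, are additions made for this sketch.

```perl
use strict;
use warnings;

my @listing = ('drwxr-xr-x dive', '-rwxr-xr-x dive', 'lrwxr-xr-x dive',
               'drwxr-xr-x /home', '-rwxr-xr-x /etc/passwd');

# ^d only matches a 'd' at the very start, i.e. the directory entries.
my @dirs = grep { /^d/ } @listing;

# dive$ only matches when the line ends with "dive".
my @named_dive = grep { /dive$/ } @listing;

# \b keeps "cat" from matching inside a longer word (hypothetical samples).
my @whole_word = grep { /\bcat\b/ } ('cat', 'concatenate', 'the cat sat');

print "directory: $_\n" for @dirs;
```
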
  25. Which of these match?

    Pattern    Input
    ^.$        x
    ^.$        mmm
    ^.$        42
    ^.$        9
    ^.$        ...
  27. Capturing Parens

    Use parens to capture a matched expression
    Use \1, \2, etc. to refer to a captured match
  28. Which of these match?

    Pattern          Input
    (\d)(\d)\2\1     1111
    (\d)(\d)\2\1     1001
    (\d)(\d)\2\1     1414
    (\d)(\d)\2\1     12321
    (\d)(\d)\2\1     9889
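
A sketch for checking the backreference quiz. The pattern asks for two digits followed by the same two digits in reverse order, so it finds four-digit mirror sequences such as 1001 and 9889. '1414' and '12321' contain no such sequence.

```perl
use strict;
use warnings;

# \2 must repeat the second captured digit, \1 the first.
my $mirror = qr/(\d)(\d)\2\1/;

my @inputs  = qw(1111 1001 1414 12321 9889);
my @matched = grep { $_ =~ $mirror } @inputs;

print "matched: $_\n" for @matched;
```
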
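  30. Define A Regexp

    Use qr to define a regular expression
    my $DIGITS_RE = qr { ^ \d+ $ }xms;
    The x, m and s flags make the regexp more readable and work better on multiline strings: x allows whitespace and comments in the pattern, m makes ^ and $ match at line boundaries, and s lets . match a newline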
  31. Match Against a Regexp

    $text =~ $DIGITS_RE returns true if $text matches the pattern
    Can also use an inline regexp: $text =~ /^\d+$/;
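
Putting the last two slides together, a minimal sketch of defining a pattern with qr and matching against it. The sample strings are made up. The third sample shows the effect of the m flag: the string as a whole is not all digits, but its second line is, and with m the anchors also match at line boundaries.

```perl
use strict;
use warnings;

# Same pattern as on the slide; with the x flag the spaces are ignored.
my $DIGITS_RE = qr{ ^ \d+ $ }xms;

# Hypothetical samples: all digits, mixed, and a two-line string.
my @samples = ('2012', '20x12', "abc\n42\n");

# 1 when the sample matches, 0 otherwise.
my @results = map { $_ =~ $DIGITS_RE ? 1 : 0 } @samples;
print "results: @results\n";

# The inline form has no m flag, so it only accepts an all-digit string.
my $inline_ok = "abc\n42\n" =~ /^\d+$/ ? 1 : 0;
print "inline on two-line string: $inline_ok\n";
```
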
  32. Match With Capture

    If a regexp has captures, the match operator in list context returns a list of the captured groups
    my ($key, $value) = $line =~ $CONFIG_LINE;
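
The slide uses $CONFIG_LINE without showing its definition. The sketch below assumes a simple "key = value" format and a pattern written for this example only; the real pattern in the deck may differ.

```perl
use strict;
use warnings;

# Hypothetical definition (not from the slides): a word, an equals sign,
# then the value with surrounding whitespace trimmed.
my $CONFIG_LINE = qr{ ^ \s* (\w+) \s* = \s* (.*?) \s* $ }xms;

my $line = 'color = dark blue';
my ($key, $value) = $line =~ $CONFIG_LINE;
print "key=$key value=$value\n";

# When the match fails, the list is empty and both variables stay undefined.
my ($no_key, $no_value) = '# just a comment' =~ $CONFIG_LINE;
print "comment line did not match\n" unless defined $no_key;
```
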
  33. Search & Replace

    Use s/// to perform a search & replace operation
    Replace the first occurrence of $PATTERN in $text with the contents of $new:
    $text =~ s/$PATTERN/$new/;
  34. Search & Replace

    Replace all occurrences of $PATTERN in $text with the contents of $new:
    $text =~ s/$PATTERN/$new/g;
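
A small sketch contrasting the two forms of s///. The values of $PATTERN, $new and the sample text are assumptions made for this example. Note that s/// changes the variable in place and returns the number of substitutions made.

```perl
use strict;
use warnings;

# Hypothetical pattern and replacement.
my $PATTERN = qr/\d+/;
my $new     = '#';

my $text = 'a1 b22 c333';

# Without g, only the first run of digits is replaced.
my $first_only = $text;
$first_only =~ s/$PATTERN/$new/;

# With g, every run of digits is replaced.
my $every = $text;
my $count = ($every =~ s/$PATTERN/$new/g);

print "first only: $first_only\n";
print "all ($count replaced): $every\n";
```
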