Slide 1

Regular Expressions
Write less, say more
Thursday, December 22, 2011

Slide 2

The Regular Problem

Slide 3

The Regular Problem
Regular expressions provide a language that describes text and other languages
It's a mini language of its own, with its own syntax rules
Regular expressions are a power tool for solving text-related problems

Slide 4

The Regexp Story
Started in mathematics
1968: entered the Unix world through Ken Thompson's qed
1984: standardized by Henry Spencer's implementation

Slide 5

Alternatives to Regular Expressions
Shell wildcards, e.g. *.txt or x*x[0-9]
A dedicated Perl/C/Java program

Slide 6

Regular Expressions & Unix
Many UNIX tools take regular expressions:
grep/egrep filter their input based on regular expressions
The search commands of more/less/most use regular expressions
vi/vim search and replace use regular expressions

Slide 7

Regular Expressions Today
Supported by most programming languages, including:
PHP, Python, Perl, Tcl
JavaScript, ActionScript, Microsoft .NET, Oracle Java
Objective-C
And more

Slide 8

Regular Expressions: The Rules

Slide 9

Rule #1: A simple character matches itself

Slide 10

Examples
Expression    Meaning
foo           Match only input lines that contain the string 'foo' (so 'food' matches too)
unix          Match only input lines that contain the string 'unix'
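
As a rough illustration of Rule #1 in Perl (the language the deck turns to later), this sketch filters a few made-up lines with the two literal patterns from the table. The sample lines are hypothetical; the point is that a plain pattern matches anywhere in the line and is case-sensitive.

```perl
use strict;
use warnings;

# Hypothetical input lines, invented for this example
my @lines = ('foo bar', 'food court', 'unix tools', 'Unix', 'nothing here');

# A plain pattern matches its characters literally, anywhere in the line
my @foo  = grep { /foo/ }  @lines;
my @unix = grep { /unix/ } @lines;

print join(', ', @foo),  "\n";   # foo bar, food court
print join(', ', @unix), "\n";   # unix tools (matching is case-sensitive)
```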

Slide 11

Rule #2: A character class matches a single character from the class

Slide 12

Character Classes
Expression: [abcdABCD][0123]7
Matches: a07, B27, d17

Slide 13

Character Class Syntax
A class is denoted by [...]
Any sequence of characters can appear inside the square brackets: [012], [abc], [aAbBcZ]
Ranges can be used inside the square brackets: [0-9], [a-z], [a-zA-Z], [0-9ab]

Slide 14

Examples
Expression          Meaning
[0-9][0-9]          Match only input lines that contain at least two consecutive digits
[Uu][Nn][Ii][Xx]    Match only input lines that contain 'unix' in any combination of upper and lower case
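
A minimal Perl sketch of the two class examples, run over a few hypothetical lines. Note that 'linux' is rejected: it contains u, n, i, x, but not in that order.

```perl
use strict;
use warnings;

# Hypothetical input lines
my @lines = ('room 42', 'room 7', 'I love UNIX', 'Unix-like', 'linux');

# [0-9][0-9]: two digits next to each other somewhere in the line
my @two_digits = grep { /[0-9][0-9]/ } @lines;

# [Uu][Nn][Ii][Xx]: u, n, i, x in order, each in either case
my @unix_any_case = grep { /[Uu][Nn][Ii][Xx]/ } @lines;

print join(', ', @two_digits),    "\n";   # room 42
print join(', ', @unix_any_case), "\n";   # I love UNIX, Unix-like
```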

Slide 15

Which of these match?
Expression: hello [ux][012]
hello world
hello unix
hello u2
hello x10
HELLO U2

Slide 16

Which of these match? (answers)
Expression: hello [ux][012]
hello world: no ('w' is not in [ux])
hello unix: no ('n' is not in [012])
hello u2: yes
hello x10: yes (the match is 'hello x1'; the rest of the line does not matter)
HELLO U2: no (matching is case-sensitive)
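
This quiz can be checked mechanically. The Perl sketch below runs the slide's pattern against its five inputs and prints a verdict for each; only the loop and variable names are additions.

```perl
use strict;
use warnings;

my @inputs = ('hello world', 'hello unix', 'hello u2', 'hello x10', 'HELLO U2');
my $re = qr/hello [ux][012]/;

# Print a verdict per input line
for my $s (@inputs) {
    printf "%-12s %s\n", $s, ($s =~ $re ? 'match' : 'no match');
}

# Keep the matching inputs for later use
my @matching = grep { $_ =~ $re } @inputs;
```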

Slide 17

Predefined Character Classes
[:alnum:] [:alpha:] [:cntrl:] [:digit:] [:graph:] [:lower:] [:print:] [:punct:] [:space:] [:upper:] [:xdigit:]
Cheat sheet: http://www.petefreitag.com/cheatsheets/regex/character-classes/

Slide 18

Predefined Character Classes
\w matches a word character, [0-9a-zA-Z_]; \W matches any other character
\s matches a whitespace character; \S matches any other character
\d matches a digit; \D matches any other character
Cheat sheet: http://www.petefreitag.com/cheatsheets/regex/character-classes/

Slide 19

Predefined Character Classes
Note that the brackets in [:digit:] are part of the class name, so the class must be placed inside a bracket expression: [[:digit:]]
This allows combining classes: [[:digit:][:lower:]]
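
Perl accepts both styles of predefined class. This sketch, using a made-up ASCII string (for non-ASCII text Perl's \w covers more than [0-9a-zA-Z_]), combines two POSIX classes inside one bracket expression and then uses the backslash shorthands on their own.

```perl
use strict;
use warnings;

my $s = 'abc 123_X';   # hypothetical sample text

# POSIX classes work only inside a bracket expression; two are combined here
my @runs = $s =~ /([[:digit:][:lower:]]+)/g;   # ('abc', '123')

# The backslash shorthands stand on their own
my @words  = $s =~ /(\w+)/g;     # ('abc', '123_X'): \w includes digits and '_'
my $spaces = () = $s =~ /\s/g;   # number of whitespace characters: 1

print join(', ', @runs), "\n";
print join(', ', @words), "\n";
print "$spaces\n";
```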

Slide 20

Rule #3: A quantifier denotes how many times the preceding element (a character, class, or group) may match

Slide 21

Quantifiers
Expression: a+b
ab, aaab, aye, cabtain (ab, aaab and cabtain match; aye does not)

Slide 22

Quantifiers Syntax
*      match zero or more times
+      match one or more times
?      match zero or one time
{n,m}  match at least n but no more than m times
{n}    match exactly n times
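
One quick way to see each quantifier in action is to anchor a pattern to the whole string and test a single input. The helper name full_match and the sample pairs below are hypothetical, chosen so each row exercises one quantifier from the table.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if $re matches the whole of $s (\A and \z anchor both ends)
sub full_match {
    my ($re, $s) = @_;
    return $s =~ /\A(?:$re)\z/ ? 1 : 0;
}

my @checks = (
    ['ab*c',     'ac'],      # *      zero or more b's: match
    ['ab+c',     'ac'],      # +      one or more b's: no match
    ['colou?r',  'color'],   # ?      zero or one u: match
    ['a{2,3}',   'aaaa'],    # {n,m}  two to three a's: no match
    ['[0-9]{4}', '2011'],    # {n}    exactly four digits: match
);

for my $c (@checks) {
    my ($re, $s) = @$c;
    printf "%-10s %-6s %s\n", $re, $s, full_match($re, $s) ? 'match' : 'no match';
}
```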

Slide 23

Which of these match?
Expression: [[:digit:]]{2}-?[[:digit:]]{7}
08-9112232
421121212
054-2201121
Phone: 03-9112121
Bond 007

Slide 24

Which of these match? (answers)
Expression: [[:digit:]]{2}-?[[:digit:]]{7}
08-9112232: yes
421121212: yes (the hyphen is optional)
054-2201121: yes (the pattern is not anchored, so it matches '54-2201121')
Phone: 03-9112121: yes (the match is '03-9112121')
Bond 007: no (only three digits)
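
The Perl sketch below runs the phone-number quiz and prints the part of each input that actually matched, which makes the effect of the missing anchors visible. The extra capturing parentheses are an addition for display only.

```perl
use strict;
use warnings;

my @inputs = ('08-9112232', '421121212', '054-2201121', 'Phone: 03-9112121', 'Bond 007');
my $phone  = qr/([[:digit:]]{2}-?[[:digit:]]{7})/;   # parens added to capture the matched part

my @matching;
for my $s (@inputs) {
    if ($s =~ $phone) {
        print "$s: matched '$1'\n";
        push @matching, $s;
    } else {
        print "$s: no match\n";
    }
}
```

Anchoring the pattern as ^...$ would leave only the first two inputs matching.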

Slide 25

Which of these match?
Expression: (http://)?w{3}\.[a-z]+\.com
www.google.com
www.ynet.co.il
http://mail.google.com
http://www.home.com
http://www.tel-aviv.com

Slide 26

Which of these match? (answers)
Expression: (http://)?w{3}\.[a-z]+\.com
www.google.com: yes
www.ynet.co.il: no ('.co.il' is not '.com')
http://mail.google.com: no (there is no 'www.')
http://www.home.com: yes
http://www.tel-aviv.com: no ([a-z]+ does not match '-')
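
A short Perl check of the URL quiz. Using braces as the qr delimiter (a stylistic choice, not from the slide) means the slashes in http:// need no escaping.

```perl
use strict;
use warnings;

my @inputs = ('www.google.com', 'www.ynet.co.il', 'http://mail.google.com',
              'http://www.home.com', 'http://www.tel-aviv.com');

# Braces as delimiters, so '/' needs no escaping
my $url = qr{(http://)?w{3}\.[a-z]+\.com};

my @matching = grep { $_ =~ $url } @inputs;
print "$_\n" for @matching;
```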

Slide 27

Backtracking
When the engine encounters a quantifier, it makes the quantified element match as many times as possible (greedy matching)
If a later part of the pattern then fails, the engine backtracks: the quantified element gives back one match at a time and the rest of the pattern is retried

Slide 28

Backtracking
Examine the expression: [a-z]*b+c
Input string: aaaaaaaaaaabbbbbbbbbbbbbcccccccccccc

Slide 29

Backtracking
Examine the expression: [a-z]*b*c
Input string: aaaaaaaaaaabbbbbbbbbbbbbcccccccccccc
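
The two backtracking examples end with different matches, which Perl can show directly. In both, [a-z]* first swallows the whole input; with b+c it must give characters back until the remainder starts with "bc", while with b*c giving back a single 'c' is enough. This sketch only prints the final matches.

```perl
use strict;
use warnings;

my $input = 'aaaaaaaaaaabbbbbbbbbbbbbcccccccccccc';

# [a-z]* first consumes the whole input, then gives characters back one at a time
my ($plus) = $input =~ /([a-z]*b+c)/;   # backs off until 'b+c' fits: match ends at the first 'c'
my ($star) = $input =~ /([a-z]*b*c)/;   # backs off one character: match is the whole input

print "$plus\n";
print "$star\n";
```

To watch the individual steps, Perl's `use re 'debug';` pragma prints the engine's trace, though the output is verbose.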

Slide 30

Rule #4: An assertion matches a position that satisfies a condition; it does not consume any input characters

Slide 31

Assertions
^ matches the beginning of a line
$ matches the end of a line
\b matches a word boundary
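
A small Perl sketch of the three assertions on a hypothetical two-line string. One Perl-specific detail: ^ matches at every line start only under the /m flag; without it, ^ and $ refer to the start and end of the whole string.

```perl
use strict;
use warnings;

my $text = "cat concat\ncatalog";   # hypothetical sample

# With /m, ^ matches at the start of every line, not only the start of the string
my @line_starts = $text =~ /^(\w+)/mg;       # ('cat', 'catalog')

# \b on both sides: 'cat' only as a whole word
my @whole_words = $text =~ /\bcat\b/g;       # ('cat'): 'concat' and 'catalog' are skipped

# $ (without /m) matches at the end of the string
my $ends_in_log = $text =~ /log$/ ? 1 : 0;   # 1

print join(', ', @line_starts), "\n";
print scalar(@whole_words), "\n";
print "$ends_in_log\n";
```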

Slide 32

Which of these match?
Expression: ^d
drwxr-xr-x dive
-rwxr-xr-x dive
lrwxr-xr-x dive
drwxr-xr-x /home
-rwxr-xr-x /etc/passwd

Slide 33

Which of these match? (answers)
Expression: ^d
drwxr-xr-x dive: yes
-rwxr-xr-x dive: no
lrwxr-xr-x dive: no
drwxr-xr-x /home: yes
-rwxr-xr-x /etc/passwd: no
In ls -l output, ^d selects the directory entries
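
The same filter in Perl, applied to the slide's ls -l style lines. Only the variable names are additions.

```perl
use strict;
use warnings;

# Lines in the style of `ls -l` output, as on the slide
my @listing = ('drwxr-xr-x dive', '-rwxr-xr-x dive', 'lrwxr-xr-x dive',
               'drwxr-xr-x /home', '-rwxr-xr-x /etc/passwd');

my @dirs = grep { /^d/ } @listing;   # first character is 'd'
print "$_\n" for @dirs;
```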

Slide 34

Which of these match?
Expression: ^.$
x
mmm
42
9
...

Slide 35

Which of these match? (answers)
Expression: ^.$
x: yes
mmm: no
42: no
9: yes
...: no (three characters; . matches any single character, including a literal '.')
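
A Perl check of the single-character quiz. Note that in Perl, $ also matches just before a trailing newline, so "x\n" would pass too; \A.\z is the stricter form.

```perl
use strict;
use warnings;

my @inputs = ('x', 'mmm', '42', '9', '...');

my @single = grep { /^.$/ } @inputs;   # exactly one character
print "$_\n" for @single;
```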

Slide 36

Captures

Slide 37

Capturing Parens
Use parentheses to capture the text matched by a subexpression
Use \1, \2, etc. to refer back to the captured text

Slide 38

Capturing Parens
Input: Paris in the the spring
Expression: (\b\w+\b) \1
The expression finds the doubled word 'the the'
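
A minimal Perl sketch of the doubled-word example: it reports the repeated word, then removes the duplicate with s///. In the replacement part Perl uses $1 rather than \1.

```perl
use strict;
use warnings;

my $text = 'Paris in the the spring';

# \1 must repeat exactly what the first group captured
my $doubled;
if ($text =~ /(\b\w+\b) \1/) {
    $doubled = $1;
    print "doubled word: $doubled\n";   # the
}

# In a replacement, Perl refers to the group as $1 rather than \1
(my $fixed = $text) =~ s/(\b\w+\b) \1/$1/g;
print "$fixed\n";                       # Paris in the spring
```

Because there is no \b after \1, this pattern would also flag "the theory"; adding \b after \1 tightens it.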

Slide 39

Which of these match?
Expression: (\d)(\d)\2\1
1111
1001
1414
12321
9889

Slide 40

Which of these match? (answers)
Expression: (\d)(\d)\2\1
1111: yes
1001: yes
1414: no (\2 would require a second 4, but the next digit is 1)
12321: no (no four consecutive digits have the form xyyx)
9889: yes
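
The backreference quiz checked in Perl: the pattern accepts any four consecutive digits of the form xyyx.

```perl
use strict;
use warnings;

my @inputs = qw(1111 1001 1414 12321 9889);

my @matching = grep { /(\d)(\d)\2\1/ } @inputs;   # four digits of the form xyyx
print "$_\n" for @matching;
```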

Slide 41

Q & A
Regular Expressions
Classes
Quantifiers
Assertions
Captures

Slide 42

Let's Talk Perl

Slide 43

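Define A Regexp
Use qr to define a regular expression:
    my $DIGITS_RE = qr{ ^ \d+ $ }xms;
The xms flags make the regexp more readable and better behaved on multiline strings: x allows whitespace and comments inside the pattern, m makes ^ and $ match at the start and end of every line, and s lets . match a newline
Under m, a multiline string matches if any one of its lines is all digits; use \A and \z to anchor to the whole string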
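
A sketch of the qr definition using /x comments, next to a hypothetical stricter variant ($ONLY_DIGITS_RE is not from the slides) anchored with \A and \z. The third input shows how /m changes what ^ and $ mean.

```perl
use strict;
use warnings;

# /x allows whitespace and comments inside the pattern
my $DIGITS_RE = qr{
    ^ \d+ $        # one or more digits spanning a whole line
}xms;

# Hypothetical stricter variant: \A and \z anchor to the whole string even under /m
my $ONLY_DIGITS_RE = qr{ \A \d+ \z }xms;

for my $s ("12345", "12a45", "abc\n123") {
    (my $shown = $s) =~ s/\n/\\n/g;    # make the newline visible in the output
    printf "%-10s DIGITS_RE: %d  ONLY_DIGITS_RE: %d\n",
        $shown, ($s =~ $DIGITS_RE ? 1 : 0), ($s =~ $ONLY_DIGITS_RE ? 1 : 0);
}
```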

Slide 44

Match Against a Regexp
$text =~ $DIGITS_RE returns true if $text matches the pattern
You can also use an inline regexp:
    $text =~ /^\d+$/;

Slide 45

Match With Capture
In list context, a successful match returns the list of captured groups (and an empty list on failure):
    my ($key, $value) = $line =~ $CONFIG_LINE;
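
The slide leaves $CONFIG_LINE undefined. Here is one possible definition, a hypothetical pattern for "key = value" lines, together with the list-context match from the slide and what happens when the match fails.

```perl
use strict;
use warnings;

# Hypothetical pattern for "key = value" config lines; not from the original slides
my $CONFIG_LINE = qr{ \A \s* (\w+) \s* = \s* (.*?) \s* \z }xms;

my $line = 'timeout = 30';
my ($key, $value) = $line =~ $CONFIG_LINE;   # list context: returns ($1, $2)
print "$key => $value\n";

# A failed match returns an empty list, so both variables stay undef
my ($k2, $v2) = 'no equals sign here' =~ $CONFIG_LINE;
print((defined $k2) ? "parsed\n" : "not a config line\n");
```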

Slide 46

Search & Replace
Use s/// to perform a search & replace operation
Replace the first occurrence of $PATTERN in $text with the contents of $new:
    $text =~ s/$PATTERN/$new/;

Slide 47

Search & Replace
Replace all occurrences of $PATTERN in $text with the contents of $new:
    $text =~ s/$PATTERN/$new/g;
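
A side-by-side sketch of s/// and s///g, with hypothetical values for $PATTERN and $new. In scalar context s///g also returns the number of substitutions it made.

```perl
use strict;
use warnings;

my $PATTERN = qr/cat/;    # hypothetical values for illustration
my $new     = 'dog';

my $text_once = 'cat and cat and cat';
my $text_all  = $text_once;

$text_once =~ s/$PATTERN/$new/;                  # first occurrence only
my $count = ($text_all =~ s/$PATTERN/$new/g);    # every occurrence; returns the count

print "$text_once\n";   # dog and cat and cat
print "$text_all\n";    # dog and dog and dog
print "$count\n";       # 3
```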

Slide 48

Q & A

Slide 49

Thank You
Ynon Perek
ynonperek.com
All Rights Reserved to Ynon Perek