Regular Expressions

Karmen Blake
September 26, 2011

Transcript

  1. Pattern Matching and Text Processing Using Regular Expressions, by Karmen Blake

  2. Regular Expressions: What is a regular expression?

    • A pattern of characters that may or may not predict (match) a given string.
    • Use the match/no-match result for conditional branching
    • Scan for a pattern
    • String substitutions
    • Split strings
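
The four uses listed above can be sketched in Ruby roughly as follows. The sample log line and the variable names are made up for illustration; `scan`, `gsub`, and `split` are standard String methods.

```ruby
line = "2011-09-26 ERROR disk full"

# Conditional branching on match / no match
status = (line =~ /ERROR/) ? "failed" : "ok"

# Scan for every occurrence of a pattern
numbers = line.scan(/\d+/)

# String substitution: mask every digit
masked = line.gsub(/\d/, "#")

# Split on runs of whitespace
fields = line.split(/\s+/)

puts status          # failed
puts numbers.inspect # ["2011", "09", "26"]
puts masked          # ####-##-## ERROR disk full
puts fields.inspect  # ["2011-09-26", "ERROR", "disk", "full"]
```
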
  3. Regular Expressions: Examples of what a "pattern" is

    • The letter a, followed by a digit
    • Any uppercase letter, followed by at least one lowercase letter
    • Three digits, followed by a hyphen, followed by four digits
    • The beginning of a line, followed by one or more whitespace characters
    • The character . (period) at the end of a string
    • An uppercase letter at the beginning of a word
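
As a preview, here is one possible Ruby translation of each description above. These are illustrative guesses at what the slide has in mind, not the only way to write them, and they rely on syntax (\d, \s, \b, quantifiers, anchors) that the later slides introduce. The variable names and sample strings are hypothetical.

```ruby
a_then_digit     = /a\d/           # the letter a, followed by a digit
capitalized      = /[A-Z][a-z]+/   # uppercase letter, then at least one lowercase letter
phone            = /\d{3}-\d{4}/   # three digits, a hyphen, four digits
leading_space    = /^\s+/          # beginning of a line, then one or more whitespace characters
trailing_period  = /\.\z/          # a period at the end of the string
word_initial_cap = /\b[A-Z]/       # an uppercase letter at the beginning of a word

puts "a1"          =~ a_then_digit      # 0
puts "Hello"       =~ capitalized       # 0
puts "555-1234"    =~ phone             # 0
puts "  indented"  =~ leading_space     # 0
puts "The end."    =~ trailing_period   # 7
puts "hello World" =~ word_initial_cap  # 6
```
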
  4. Regular Expressions: Regular expression literal

    //
    //.class in irb gives you Regexp.
    Between the slashes is where you put patterns:
    /some-pattern-here/
  5. Regular Expressions: A pattern matching adventure requires two creatures

    • a regular expression
    • a string
    The regular expression makes its predictions on the string. The predictions either match or don't.
  6. Regular Expressions: "A match made in heaven"

    puts "Neo" if /Neo/.match("Neo is in the Matrix")
    puts "Neo" if "Neo is in the Matrix".match(/Neo/)

    match returns a match object (more on this later) when the pattern matches. Running /Neo/.match("Neo is in the Matrix") in irb shows the match object that is returned. If no match occurs, nil is returned, which Ruby treats as false in a conditional.
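
A small sketch of the two possible outcomes of match. The second pattern, /Trinity/, and the variable names are added here purely for illustration.

```ruby
sentence = "Neo is in the Matrix"

hit  = /Neo/.match(sentence)      # a MatchData object
miss = /Trinity/.match(sentence)  # nil, because the pattern does not occur

puts hit.class     # MatchData
puts hit[0]        # Neo
puts miss.inspect  # nil

# nil is falsy, so the match result can drive a conditional directly.
puts "Neo" if hit
puts "Trinity" if miss  # prints nothing
```
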
  7. Regular Expressions

    puts "Neo" if /Neo/ =~ "Neo is in the Matrix"
    puts "Neo" if "Neo is in the Matrix" =~ /Neo/

    Using =~ differs in that it returns the index in the string where the regular expression was first found, or nil if there is no match. /Neo/ =~ "Neo is in the Matrix" in irb gives us 0.
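
To make the return values of =~ concrete, here is a short sketch. The /Matrix/ and /Trinity/ patterns and the variable names are illustrative additions.

```ruby
sentence = "Neo is in the Matrix"

at_start = (/Neo/ =~ sentence)      # 0: match begins at the first character
later    = (/Matrix/ =~ sentence)   # 14: match begins further into the string
missing  = (/Trinity/ =~ sentence)  # nil: no match

puts at_start         # 0
puts later            # 14
puts missing.inspect  # nil

# 0 is truthy in Ruby, so a match at index 0 still passes an `if`.
puts "found at #{at_start}" if at_start
```
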
  8. Regular Expressions: Building a pattern

    • Literal characters: "match this character"
    • The dot wildcard character (.): "match any character"
    • Character classes: "match one of these characters"
  9. Regular Expressions: Literal characters

    /a/ matches the string "a" as well as any string containing the letter "a".
    Characters such as ^, $, ?, ., /, \, [, ], {, }, (, ), +, and * must be preceded by a backslash (\) to be matched literally, because they have special meaning in regular expression syntax. To match a literal ?, the regular expression has to look like this: /\?/
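
The effect of escaping can be seen by comparing an escaped and an unescaped dot. The sample strings are made up; Regexp.escape is a standard Ruby method that adds the backslashes for you, shown here as an optional convenience rather than something the slides cover.

```ruby
question_index = "Really?" =~ /\?/   # 6: index of the literal question mark

loose_hit  = "5x00" =~ /5.00/    # 0: unescaped dot accepts any character
strict_hit = "5x00" =~ /5\.00/   # nil: escaped dot accepts only a period
strict_ok  = "5.00" =~ /5\.00/   # 0

# Regexp.escape backslash-escapes the special characters in a plain string.
escaped = Regexp.escape("a.b*c")

puts question_index      # 6
puts loose_hit           # 0
puts strict_hit.inspect  # nil
puts strict_ok           # 0
puts escaped             # a\.b\*c
```
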
  10. Regular Expressions: The wildcard character . (dot)

    Matches any single character (except a newline, by default).
    /.pen/
    Valid matches:
    /.pen/ =~ "the bank is open"
    /.pen/ =~ "open"
    /.ejected/ =~ "dejected"
    /.ejected/ =~ "rejected"
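
A brief sketch of where the dot does and does not match. The "pen" and "\npen" strings are extra cases added for illustration.

```ruby
open_at = "the bank is open" =~ /.pen/   # 12: the "o" of "open" fills the dot
bare    = "pen" =~ /.pen/                # nil: nothing precedes "pen" for the dot to consume
newline = "\npen" =~ /.pen/              # nil: the dot does not match a newline by default

puts open_at          # 12
puts bare.inspect     # nil
puts newline.inspect  # nil
```
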
  11. Regular Expressions: Character classes

    An explicit list of characters placed inside square brackets:
    /[dr]ejected/
    Matches either 'd' or 'r', and no other character, followed by 'ejected'.
    Match:
    /[dr]ejected/ =~ "dejected"
    /[dr]ejected/ =~ "rejected"
    No match:
    /[dr]ejected/ =~ "bejected"
  12. Regular Expressions: Character classes

    Range of characters:
    /[a-z]/
    Match any character a through f (upper or lower case) or any digit:
    /[A-Fa-f0-9]/
    Negating a character class (^):
    /[^A-Fa-f0-9]/
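
The range and its negation can be compared side by side. This is a minimal sketch; the variable names and test strings are hypothetical.

```ruby
hex_digit = /[A-Fa-f0-9]/    # any one hexadecimal digit
non_hex   = /[^A-Fa-f0-9]/   # any one character that is NOT a hexadecimal digit

first_hex_in_beef = "beef" =~ hex_digit    # 0
first_hex_in_xyz9 = "xyz9" =~ hex_digit    # 3: only the 9 qualifies
stray_in_beef     = "beef" =~ non_hex      # nil: every character is a hex digit
stray_in_be_ef    = "be ef" =~ non_hex     # 2: the space

puts first_hex_in_beef      # 0
puts first_hex_in_xyz9      # 3
puts stray_in_beef.inspect  # nil
puts stray_in_be_ef         # 2
```
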
  13. Regular Expressions: Character classes, special escape sequences

    To match any digit you can do this: /[0-9]/
    You can accomplish the same thing with: /\d/
    Other useful escape sequences are:
    \w matches any digit, alphabetical character, or _
    \s matches any whitespace character (space, tab, newline)
  14. Regular Expressions: Character classes, negated special escape sequences

    \D matches any character that is not a digit
    \W matches any character other than a digit, alphabetical character, or _
    \S matches any non-whitespace character
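
To see the escape sequences and their negated forms together, the sketch below runs each one over the same string with scan. The sample text is made up for illustration.

```ruby
text = "Room 101, floor_2"

digits      = text.scan(/\d/)    # each digit
words       = text.scan(/\w+/)   # runs of letters, digits, or _
spaces      = text.scan(/\s/)    # each whitespace character
non_digits  = text.scan(/\D+/)   # runs of anything except digits
non_words   = text.scan(/\W+/)   # runs of anything except letters, digits, or _
non_spaces  = text.scan(/\S+/)   # runs of non-whitespace

puts digits.inspect      # ["1", "0", "1", "2"]
puts words.inspect       # ["Room", "101", "floor_2"]
puts spaces.inspect      # [" ", " "]
puts non_digits.inspect  # ["Room ", ", floor_"]
puts non_words.inspect   # [" ", ", "]
puts non_spaces.inspect  # ["Room", "101,", "floor_2"]
```
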
  15. Regular Expressions: Matching and MatchData: getting beyond yes/no success/failure stuff...

    Strings to match:
    blake,karmen
    blake, karmen
    English pattern: "last name followed by a comma followed by an optional space followed by first name"
    /^[A-Za-z]+,\s*[A-Za-z]+$/
    * optional (zero or more)
    + (one or more, or at least one)
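
A quick way to check the pattern is to run it against a handful of inputs. The extra sample strings beyond the two on the slide are hypothetical test cases.

```ruby
name_format = /^[A-Za-z]+,\s*[A-Za-z]+$/

samples = [
  "blake,karmen",     # no space after the comma
  "blake, karmen",    # one space
  "blake,   karmen",  # several spaces: \s* allows zero or more
  "blake karmen",     # comma missing
  "blake,"            # first name missing: + needs at least one letter
]

accepted = samples.select { |s| s =~ name_format }
rejected = samples - accepted

puts accepted.inspect  # ["blake,karmen", "blake, karmen", "blake,   karmen"]
puts rejected.inspect  # ["blake karmen", "blake,"]
```
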
  16. Regular Expressions: MatchData

    Parenthetical groupings: add parens to a rule and get the contents out after evaluation. :-)
    This
    /^[A-Za-z]+,\s*[A-Za-z]+$/
    may turn into
    /(^[A-Za-z]+),\s*([A-Za-z]+$)/
    Test it in irb:
    /(^[A-Za-z]+),\s*([A-Za-z]+$)/.match "blake, karmen"
    puts $1 outputs "blake"
    puts $2 outputs "karmen"
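
One way the captured groups might be put to work is reordering "last, first" into a display name. The helper display_name is hypothetical and written only for this sketch; MatchData#captures is a standard method that returns the groups as an array.

```ruby
# Hypothetical helper: turns "blake, karmen" into "Karmen Blake".
# Returns nil when the input does not fit the "last, first" format.
def display_name(name)
  m = /(^[A-Za-z]+),\s*([A-Za-z]+$)/.match(name)
  return nil unless m

  last, first = m.captures  # ["blake", "karmen"]
  "#{first.capitalize} #{last.capitalize}"
end

puts display_name("blake, karmen")     # Karmen Blake
puts display_name("blake,karmen")      # Karmen Blake
puts display_name("no comma").inspect  # nil
```
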
  17. Regular Expressions: Capturing more data from a match

    name = "blake, karmen"
    name_format = /(^[A-Za-z]+),\s*([A-Za-z]+$)/
    name_match = name_format.match(name) # save match
    name_match[0]       # "blake, karmen" entire match
    name_match[1]       # "blake" first capture
    name_match.begin(1) # 0
    name_match.end(1)   # 5
    name_match[2]       # "karmen" second capture
    name_match.begin(2) # 7
    name_match.end(2)   # 13
  18. Regular Expressions: What the heck is this???!!!??

    /^x?[yz]{2}.*\z/
    You will learn soon, my young padawans. Quantifiers!!

    Zero or one
    I want to match Mr, Mr., Mrs, Mrs.
    English version: the character M, followed by the character r, followed by zero or one of the character s, followed by zero or one of the character '.'
  19. Regular Expressions: ? to the rescue

    /Mrs?\.?/
    Rock and Roll!! Valid matches!
    /Mrs?\.?/ =~ "Mr"
    /Mrs?\.?/ =~ "Mr."
    /Mrs?\.?/ =~ "Mrs"
    /Mrs?\.?/ =~ "Mrs."
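
To see exactly what the ? quantifier picks up, the sketch below prints the matched text for each title. "Ms" is an extra input added to show a non-match.

```ruby
title  = /Mrs?\.?/
inputs = %w[Mr Mr. Mrs Mrs. Ms]

# For each input, keep the matched text, or nil when there is no match.
matched = inputs.map do |input|
  m = title.match(input)
  m && m[0]
end

puts matched.inspect  # ["Mr", "Mr.", "Mrs", "Mrs.", nil]
```

"Ms" fails because the r is not optional: only the s and the period carry a ?.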
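  20. Regular Expressions: Zero or more *

    How do you spell boo?
    /booo*/
    Rock and Roll!! Valid matches!
    /booo*/ =~ "boo"
    /booo*/ =~ "booo"
    /booo*/ =~ "boooo"
    /booo*/ =~ "booooo"
    /booo*/ =~ "boooooo"
    (The shorter /boo*/ also matches all of these, but it accepts "bo" as well, because * applies only to the last o.)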
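
Because * applies only to the character right before it, /boo*/ and /booo*/ differ in the shortest string they accept. A minimal sketch, with illustrative variable names:

```ruby
short = /boo*/    # "bo" followed by zero or more o
long  = /booo*/   # "boo" followed by zero or more o

bo_short  = "bo" =~ short     # 0: zero extra o's is acceptable
bo_long   = "bo" =~ long      # nil: needs at least "boo"
boo_long  = "boo" =~ long     # 0
whole_run = "boooo".match(long)[0]  # * is greedy, so it takes every o

puts bo_short         # 0
puts bo_long.inspect  # nil
puts boo_long         # 0
puts whole_run        # boooo
```
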
  21. Regular Expressions: One or more +

    How many digits?
    /\d+/
    Rock and Roll!! Valid matches!
    /\d+/ =~ "2"
    /\d+/ =~ "34"
    /\d+/ =~ "3566"
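
The difference between + and * shows up on input that contains no digits at all. The strings here are made up for illustration.

```ruby
plus_on_letters = "abc" =~ /\d+/   # nil: + demands at least one digit
star_on_letters = "abc" =~ /\d*/   # 0: * is satisfied by an empty match at the start
digit_run       = "room 3566".match(/\d+/)[0]  # the whole run of digits

puts plus_on_letters.inspect  # nil
puts star_on_letters          # 0
puts digit_run                # 3566
```
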
  22. Regular Expressions: Number of repetitions

    For example, a basic phone number pattern:
    3 digits followed by a hyphen followed by 4 digits
    /\d{3}-\d{4}/
    Valid match:
    /\d{3}-\d{4}/ =~ "333-4444"
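
Note that the pattern above is not anchored, so it also matches inside a longer run of digits. The sketch below contrasts it with an anchored variant; the anchored pattern (using \A and \z for start and end of string) is an addition for illustration, not something this slide shows.

```ruby
phone        = /\d{3}-\d{4}/        # as on the slide
strict_phone = /\A\d{3}-\d{4}\z/    # assumption: the whole string must be the number

exact        = "333-4444" =~ phone           # 0
inside       = "1333-44445" =~ phone         # 1: matches "333-4444" in the middle
strict_exact = "333-4444" =~ strict_phone    # 0
strict_long  = "1333-44445" =~ strict_phone  # nil: extra digits are rejected

puts exact                # 0
puts inside               # 1
puts strict_exact         # 0
puts strict_long.inspect  # nil
```
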
  23. Regular Expressions: Let's get real!

    Get ids out of a file
    "123 karmen\n234 john\n456 mary".scan(/\d{3}/)
    => ["123", "234", "456"]
    Create permalink
    "john doe 1234 hello message".gsub(/\s/,"-")
    => "john-doe-1234-hello-message"
    Capitalize my string
    "a title of a book".gsub(/\b\w/) {|s| s.upcase}
    => "A Title Of A Book"
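
Combining scan with the parenthetical groupings from earlier gives both fields of each line at once. This is a sketch of one possible extension of the id example; the variable names are hypothetical.

```ruby
file = "123 karmen\n234 john\n456 mary"

# With groups in the pattern, scan returns one array of captures per match.
pairs = file.scan(/(\d{3}) (\w+)/)
ids   = pairs.map { |id, _name| id }
names = pairs.map { |_id, name| name }

puts pairs.inspect  # [["123", "karmen"], ["234", "john"], ["456", "mary"]]
puts ids.inspect    # ["123", "234", "456"]
puts names.inspect  # ["karmen", "john", "mary"]
```
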
  24. Regular Expressions: Let's get real! Grepilicious

    Uses regular expressions to extract information out of collections.
    ["JOHN","Doe","Mary","SWANSON"].find_all {|name| /[a-z]/ =~ name}
    OR
    ["JOHN","Doe","Mary","SWANSON"].grep(/[a-z]/)
    Both return a result array:
    => ["Doe", "Mary"]
  25. Regular Expressions: Let's get real! Grepilicious

    ["JOHN","Doe","Mary","SWANSON"].find_all{|name| name =~ /[a-z]/}.collect{|name| name.upcase}
    OR
    ["JOHN","Doe","Mary","SWANSON"].grep(/[a-z]/) {|name| name.upcase}
    Both return array:
    => ["DOE", "MARY"]
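
grep selects the elements for which pattern === element is true, so any of the patterns from earlier slides can be dropped in. A small sketch with two further variations; the all-caps pattern and the length block are illustrative additions.

```ruby
names = ["JOHN", "Doe", "Mary", "SWANSON"]

has_lowercase = names.grep(/[a-z]/)         # at least one lowercase letter
all_caps      = names.grep(/\A[A-Z]+\z/)    # uppercase letters only, start to end
lengths       = names.grep(/[a-z]/) { |name| name.length }  # block transforms each hit

puts has_lowercase.inspect  # ["Doe", "Mary"]
puts all_caps.inspect       # ["JOHN", "SWANSON"]
puts lengths.inspect        # [3, 4]
```
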