Unix Regular Expressions

ynonperek
October 12, 2011

Transcript

  1. The Regular Problem

    Regular Expressions provide a language that describes text and other languages
    It’s a mini language of its own, with syntax rules
    Regular Expressions are a power tool for solving text-related problems
    Thursday, January 26, 2012
  2. The Regexp Story

    Started in Mathematics
    1968: Entered the Unix world through Ken Thompson’s qed
    1984: Standardized by Henry Spencer’s implementation
  3. Regular Expressions & Unix

    Many UNIX tools take regular expressions
    grep/egrep filters its input based on regular expressions
    more/less/most search uses regular expressions
    vi/vim search and replace use regular expressions
  4. Regular Expressions Today

    Supported by most mainstream programming languages, including:
    PHP, Python, Perl, Tcl
    JavaScript, ActionScript, Microsoft .NET, Oracle Java
    Objective C
    And more
  5. Examples

    Command       Meaning
    egrep foo     Display only input lines that include the text ‘foo’
    egrep unix    Display only input lines that include the text ‘unix’
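The first command can be tried out with a short sketch like the one below. It uses `grep -E`, the current spelling of `egrep`, and the sample lines are made up for illustration. Note that the pattern matches the text anywhere on the line, including inside a longer word.

```shell
# Hypothetical sample input, one entry per line.
input='foo bar
hello unix
seafood platter
nothing here'

# Keep only the lines that contain the text "foo".
result=$(printf '%s\n' "$input" | grep -E 'foo')

printf '%s\n' "$result"
# prints:
# foo bar
# seafood platter
```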
  6. Rule #2

    A character class matches a single character from the class
  7. Character Class Syntax

    A class is denoted by [...]
    Can use any character sequence inside the square brackets: [012], [abc], [aAbBcZ]
    Can use ranges inside the square brackets: [0-9], [a-z], [a-zA-Z], [0-9ab]
    Can negate a class with a leading ^: [^abc], [^0-9]
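As a rough illustration of ranges and negation, the sketch below filters a few made-up lines with `grep -E`. Setting `LC_ALL=C` is an assumption added here so that ranges such as `a-c` follow plain ASCII order.

```shell
# Assumption: force the C locale so [a-c] means exactly a, b, c.
export LC_ALL=C

# Hypothetical sample input.
input='a1
b7
zz
Q9
c-'

# A letter from the range a-c followed by one digit.
range_hits=$(printf '%s\n' "$input" | grep -E '[a-c][0-9]')

# A letter from the range a-c followed by any character that is NOT a digit.
negated_hits=$(printf '%s\n' "$input" | grep -E '[a-c][^0-9]')

printf 'range:\n%s\n' "$range_hits"      # a1 and b7
printf 'negated:\n%s\n' "$negated_hits"  # c-
```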
  8. Examples

    Command                     Meaning
    egrep '[0-9][0-9]'          Display only input lines that include at least two consecutive digits
    egrep '[Uu][Nn][Ii][Xx]'    Display only input lines that include the word ‘unix’ in any casing
  9. Which of these match?

    Pattern            Input
    hello [ux][012]    hello world
    hello [ux][012]    hello unix
    hello [ux][012]    hello u2
    hello [ux][012]    hello x10
    hello [ux][012]    HELLO U2
  10. Which of these match?

    Pattern            Input
    hello [ux][012]    hello world
    hello [ux][012]    hello uni0
    hello [ux][012]    hello u2
    hello [ux][012]    hello x10
    hello [ux][012]    HELLO U2
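One way to check the answers is to feed the candidate lines from both slides to `grep -E`. This is only a sketch: the variable names are illustrative.

```shell
# Candidate lines taken from the two slides above.
input='hello world
hello unix
hello uni0
hello u2
hello x10
HELLO U2'

# "hello ", then one of u or x, then one of 0, 1 or 2.
matches=$(printf '%s\n' "$input" | grep -E 'hello [ux][012]')

printf '%s\n' "$matches"
# prints:
# hello u2
# hello x10
```

`hello unix` and `hello uni0` fail because the character after `u` is `n`, which is not in `[012]`. `HELLO U2` fails because matching is case sensitive.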
  11. Predefined Character Classes

    [:alnum:] [:alpha:] [:cntrl:] [:digit:] [:graph:] [:lower:]
    [:print:] [:punct:] [:space:] [:upper:] [:xdigit:] .
    Cheat sheet at: http://www.petefreitag.com/cheatsheets/regex/character-classes/
  12. Predefined Character Classes

    Note that brackets are part of the class name, therefore the correct use is: [[:digit:]]
    This allows using: [[:digit:][:lower:]]
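The double-bracket form can be sketched as follows, with made-up input lines. The first pattern uses a single predefined class, the second combines two of them inside one bracket expression.

```shell
# Hypothetical sample input.
input='room 42
no digits here
ROOM B
x9'

# Lines containing at least one digit.
digit_lines=$(printf '%s\n' "$input" | grep -E '[[:digit:]]')

# Lines containing at least one character that is a digit OR a lowercase letter.
digit_or_lower=$(printf '%s\n' "$input" | grep -E '[[:digit:][:lower:]]')

printf 'digit:\n%s\n' "$digit_lines"              # room 42, x9
printf 'digit or lower:\n%s\n' "$digit_or_lower"  # every line except ROOM B
```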
  13. Rule #3

    A quantifier denotes how many times the preceding element (a character, class or group) will match
  14. Quantifiers Syntax

    * means match zero or more times, same as {0,}
    + means match one or more times, same as {1,}
    ? means match zero or one time, same as {0,1}
    {n,m} means match at least n but no more than m times
    {n} means match exactly n times
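The quantifiers listed above can be compared side by side with a small sketch. The `filter` helper and the sample lines are hypothetical, chosen so that each line differs only in the number of `b` characters.

```shell
# Hypothetical sample input: "a", then zero to four "b", then "c".
input='ac
abc
abbc
abbbc
abbbbc'

# Hypothetical helper: print the sample lines that match the pattern in $1.
filter() { printf '%s\n' "$input" | grep -E "$1"; }

star=$(filter 'ab*c')       # zero or more b: all five lines
plus=$(filter 'ab+c')       # one or more b: everything except ac
opt=$(filter 'ab?c')        # zero or one b: ac, abc
range=$(filter 'ab{2,3}c')  # two or three b: abbc, abbbc
exact=$(filter 'ab{4}c')    # exactly four b: abbbbc

printf 'ab{2,3}c matches:\n%s\n' "$range"
```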
  15. Which of these match?

    Pattern                            Input
    [[:digit:]]{2}-?[[:digit:]]{7}     08-9112232
    [[:digit:]]{2}-?[[:digit:]]{7}     421121212
    [[:digit:]]{2}-?[[:digit:]]{7}     054-2201121
    [[:digit:]]{2}-?[[:digit:]]{7}     Phone: 03-9112121
    [[:digit:]]{2}-?[[:digit:]]{7}     Bond 007
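The phone-number pattern can be checked against the candidate lines with a sketch like this one. The variable names are illustrative.

```shell
# Candidate lines taken from the slide above.
input='08-9112232
421121212
054-2201121
Phone: 03-9112121
Bond 007'

# Two digits, an optional dash, then seven digits, anywhere on the line.
matches=$(printf '%s\n' "$input" | grep -E '[[:digit:]]{2}-?[[:digit:]]{7}')

printf '%s\n' "$matches"
# prints:
# 08-9112232
# 421121212
# 054-2201121
# Phone: 03-9112121
```

Only `Bond 007` is rejected. `054-2201121` is accepted because the pattern is not anchored, so the part `54-2201121` is enough for the line to match. `421121212` is accepted because the dash is optional.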
  17. Which of these match?

    Pattern                          Input
    (http://)?w{3}\.[a-z]+\.com      www.google.com
    (http://)?w{3}\.[a-z]+\.com      www.ynet.co.il
    (http://)?w{3}\.[a-z]+\.com      http://mail.google.com
    (http://)?w{3}\.[a-z]+\.com      http://www.home.com
    (http://)?w{3}\.[a-z]+\.com      http://www.tel-aviv.com
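The same kind of check works for the URL pattern. In this sketch `LC_ALL=C` is an added assumption so that `[a-z]` covers exactly the lowercase ASCII letters.

```shell
# Assumption: force the C locale so [a-z] means the 26 lowercase ASCII letters.
export LC_ALL=C

# Candidate lines taken from the slide above.
input='www.google.com
www.ynet.co.il
http://mail.google.com
http://www.home.com
http://www.tel-aviv.com'

# Optional "http://", then "www.", one or more lowercase letters, then ".com".
matches=$(printf '%s\n' "$input" | grep -E '(http://)?w{3}\.[a-z]+\.com')

printf '%s\n' "$matches"
# prints:
# www.google.com
# http://www.home.com
```

`www.ynet.co.il` has no `.com`, `http://mail.google.com` has no `www.`, and the dash in `tel-aviv` is not in `[a-z]`.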
  19. Backtracking

    When the engine encounters a quantifier, it will keep on adding matches to the quantified element as long as possible
    If a match failure occurs later on, the engine will backtrack
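The effect of backtracking can be observed, at least from the outside, with the `-o` option, which prints only the matched part of each line. This is a sketch under two assumptions: `-o` is an extension available in GNU and BSD grep rather than part of POSIX, and a given grep may use an automaton internally instead of literally backtracking, although the text it matches is the same.

```shell
# a+ first takes all three a's; the literal "a" that follows then fails,
# so one "a" has to be given back before the rest of the pattern can match.
whole=$(printf '%s\n' 'aaab' | grep -oE 'a+ab')

# .* first runs to the end of the line, then backs up to the last "=".
greedy=$(printf '%s\n' 'key=value=more' | grep -oE '.*=')

printf '%s\n' "$whole"   # prints: aaab
printf '%s\n' "$greedy"  # prints: key=value=
```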
  20. Rule #4

    An assertion matches on a condition, without consuming input characters
  21. Assertions

    ^ matches the beginning of a line
    $ matches the end of a line
  22. Which of these match?

    Pattern    Input
    ^d         drwxr-xr-x dive
    ^d         -rwxr-xr-x dive
    ^d         lrwxr-xr-x dive
    ^d         drwxr-xr-x /home
    ^d         -rwxr-xr-x /etc/passwd
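Both assertions can be tried on the candidate lines with a sketch like the following. The `dive$` pattern is an extra illustration of `$` and is not part of the slide.

```shell
# Candidate lines taken from the slide above.
input='drwxr-xr-x dive
-rwxr-xr-x dive
lrwxr-xr-x dive
drwxr-xr-x /home
-rwxr-xr-x /etc/passwd'

# ^d: the line must START with "d" (directories in this listing).
starts_with_d=$(printf '%s\n' "$input" | grep -E '^d')

# dive$: the line must END with "dive" (illustrative pattern).
ends_with_dive=$(printf '%s\n' "$input" | grep -E 'dive$')

printf '%s\n' "$starts_with_d"
# prints:
# drwxr-xr-x dive
# drwxr-xr-x /home
```

The name `dive` also contains a `d`, but `^d` only looks at the first character of the line, so the `-rwxr-xr-x dive` and `lrwxr-xr-x dive` lines do not match.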
  24. Regular Expressions Variants

    Old style regexps didn’t have a + or {}
    Therefore, in grep, these have to be backslashed
    Use egrep when possible for cleaner syntax
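The difference between the two syntaxes can be sketched with the brace quantifier. Only `\{ \}` is shown for plain grep, since `\+` is a GNU extension rather than part of the basic syntax. The sample lines are made up.

```shell
# Hypothetical sample input.
input='ab
aab
aaab'

# Basic (old-style) syntax: braces must be backslashed to act as a quantifier.
basic=$(printf '%s\n' "$input" | grep 'a\{2\}b')

# Extended syntax (egrep / grep -E): the same quantifier without backslashes.
extended=$(printf '%s\n' "$input" | grep -E 'a{2}b')

# In basic syntax, unescaped braces are ordinary characters.
literal=$(printf '%s\n' 'a{2}b' | grep 'a{2}b')

printf 'basic:\n%s\n' "$basic"        # aab, aaab
printf 'extended:\n%s\n' "$extended"  # aab, aaab
printf 'literal:\n%s\n' "$literal"    # a{2}b
```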