Regex-fu

Presented on September 10, 2020, at the PHPBenelux virtual meetup.
https://www.meetup.com/phpbenelux/events/273015264/
---------------------------------------------------------------
Regular expressions: you either hate them or you love them, but do you really know how to harness their power? Based on the PCRE implementation, this talk will show you how to get the most out of your /^regex(es)?$/, how modifiers affect your results, how to be less greedy, how to assert your power and, let's not forget, when *not* to use regex.
---------------------------------------------------------------

Juliette Reinders Folmer

September 10, 2020

Transcript

  1. RegEx Fu
    Juliette Reinders Folmer
    @jrf_nl
    regexcheatsheets.com

  2.

  3. Wildcards on Steroids

  4. Pattern Recognition

  5. Regex Engines
    POSIX
    PCRE
    ECMAScript
    Oniguruma
    Boost
    DEELX
    RE2
    TRE
    Pattwo
    GRETA
    GLib/GRegex
    FREJ
    RGX
    Qt
    CL-PPCRE
    Jakarta
    Henry Spencer’s regex

  6. Regex Engines
    Boost
    DEELX
    RE2
    TRE
    Pattwo
    GRETA
    GLib/GRegex
    FREJ
    RGX
    Qt
    CL-PPCRE
    Jakarta
    Henry Spencer’s regex
    Oniguruma
    POSIX
    ECMAScript
    PCRE

  7. Syntax Overlap

  8. PCRE

  9. Terminology
    /[a-z0-9]+/im – Regular Expression
    /[a-z0-9]+/im – Delimiters
    /[a-z0-9]+/im – Modifiers
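The same three parts exist in JavaScript, although its ECMAScript engine is not PCRE. The sketch below is an illustration, not part of the talk: a regex literal keeps the slash delimiters, and the modifiers (called flags in JavaScript) follow the closing delimiter. PHP's preg_* functions, by contrast, expect delimiters and modifiers inside the pattern string, as in '/[a-z0-9]+/im'.

```javascript
// A regex literal: pattern between the / delimiters, modifiers (flags) after.
const literal = /[a-z0-9]+/im;

console.log(literal.source); // [a-z0-9]+
console.log(literal.flags); // im

// The RegExp constructor takes pattern and modifiers as separate strings,
// so no delimiters are involved at all.
const constructed = new RegExp("[a-z0-9]+", "im");
console.log(constructed.source === literal.source); // true
```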

  10. Basic Syntax
    A a 1          Literals
    .              Wildcard
    ? * + {#}      Quantifiers
    [...]          Character ranges
    ( ... | ... )  Grouping and alternation
    ^ ... $        Anchors
    \w \d \s       Shorthand character codes
    g m s i        Modifiers
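One way to get a feel for each element is to test it in isolation. This JavaScript sketch (sample strings chosen arbitrarily) exercises one element per entry. Note that the g modifier exists in JavaScript but not in PCRE; in PHP, global matching is done with preg_match_all() instead.

```javascript
// One check per element of the basic syntax; every entry should be true.
const checks = {
  literals: /abc/.test("xxabcxx"), // A a 1
  wildcard: /c.t/.test("cut"), // .
  quantifiers: /^colou?r$/.test("color") && /^a{2,3}$/.test("aaa"), // ? * + {#}
  ranges: /^[a-f0-9]+$/.test("c0ffee"), // [...]
  alternation: /^(cat|dog)$/.test("dog"), // ( ... | ... )
  anchors: /^abc$/.test("abc") && !/^bc/.test("abc"), // ^ ... $
  shorthands: /^\d\s\w$/.test("1 x"), // \w \d \s
  modifiers: /abc/i.test("ABC"), // g m s i
};

for (const [element, passed] of Object.entries(checks)) {
  console.log(`${element}: ${passed}`);
}
```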

  11. Tips & Tricks

  12. Photo by Scott Liddell
    1.
    If you need a screwdriver,
    why use a hammer?
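For simple checks, a plain string method is often the screwdriver. A small JavaScript illustration, with a made-up file name, comparing a regex with the equivalent string method:

```javascript
const fileName = "report.php"; // example value

// Regex version: works, but needs an escaped dot and an anchor to be correct.
const viaRegex = /\.php$/.test(fileName);

// Without the escape, the dot is a wildcard and "reportXphp" also passes.
const sloppy = /.php$/.test("reportXphp");

// String method version: no pattern syntax to get wrong.
const viaString = fileName.endsWith(".php");

console.log(viaRegex, sloppy, viaString); // true true true
```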

  13. Jamie Zawinski, August 1997
    alt.religion.emacs
    Some people, when confronted with a
    problem, think
    "I know, I'll use regular expressions."
    Now they have two problems.

  14. 2.
    Nothing in life is to
    be feared. It is only
    to be understood.
    Marie Curie

  15. Allow listing    Deny listing
    Input string       Input string
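To make the difference concrete, here is a JavaScript sketch with two hypothetical validation rules, neither taken from the talk: one allow-lists the characters a username may contain, the other deny-lists a few characters that are dangerous in HTML.

```javascript
// Hypothetical allow list: 3 to 16 lowercase letters, digits or underscores.
const allowList = /^[a-z0-9_]{3,16}$/;
// Hypothetical deny list: a few characters that are dangerous in HTML.
const denyList = /[<>"']/;

const passesAllowList = (input) => allowList.test(input);
const passesDenyList = (input) => !denyList.test(input);

for (const input of ["jrf_nl", "<script>", "a&b"]) {
  console.log(input, passesAllowList(input), passesDenyList(input));
}
// "a&b" gets past the deny list, because nobody thought of "&",
// but the allow list rejects it.
```

The allow list only has to describe what valid input looks like, while the deny list has to anticipate every bad input, which is why it tends to leak.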

  16. 3.
    Not all matches are made in heaven...
    Photo by Petr Kratochvil

  17. 4.
    Only
    Elephants
    Remember
    Everything
    © Photo by Juliette Reinders Folmer

  18. (?:)
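Wrapping a group in (?:...) groups without capturing, so the engine does not have to remember that part of the match and the numbering of the groups you do care about stays stable. A JavaScript sketch, using an example URL:

```javascript
const url = "https://example.com/talks/regex-fu";

// Capturing group around the scheme: the host ends up in group 2.
const capturing = url.match(/^(https?|ftp):\/\/([^\/]+)/);
console.log(capturing[1], capturing[2]); // https example.com

// Non-capturing group: same match, but the host is now group 1.
const nonCapturing = url.match(/^(?:https?|ftp):\/\/([^\/]+)/);
console.log(nonCapturing.length, nonCapturing[1]); // 2 example.com
```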

  19. Less
    is the
    new more
    5

  20. / /
    o
    on
    one
    one.
    one.*
    one.*s
    one.*s.
    one.*s.?
    one.*s.?t
    one.*s.?t[a-z]
    one.*s.?t[a-z]+
    one.*s.?t[a-z]+p
    one.*s.?t[a-z]+p␣
    one.*s.?t[a-z]+p␣.
    one.*s.?t[a-z]+p␣.{2,}
    one.*s.?t[a-z]+p␣.{2,},
    ␣ = space
    We take one step forward, two steps back

  21. / /
    We take one step back, two steps forward

  22. Reluctant Quantifiers
    {,m}?
    {n,}?
    {n,m}?
    *?
    +?
    ??
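The subject strings from slides 20 and 21 make the difference visible. For these simple patterns, JavaScript's backtracking engine behaves the same way PCRE does, so this sketch compares a greedy and a reluctant .* and then runs the complete pattern from slide 20 (reading ␣ as a literal space) against both sentences:

```javascript
const subject = "We take one step forward, two steps back";

// Greedy: .* first grabs the rest of the string, then backtracks to the LAST "s".
const greedy = subject.match(/one.*s/)[0];
// Reluctant: .*? grows one character at a time and stops at the FIRST "s".
const lazy = subject.match(/one.*?s/)[0];

// The full pattern from slide 20: .* has to give back almost everything it took.
const pattern = /one.*s.?t[a-z]+p .{2,},/;
const forward = subject.match(pattern)[0];
const back = "We take one step back, two steps forward".match(pattern)[0];

console.log(greedy); // one step forward, two steps
console.log(lazy); // one s
console.log(forward); // one step forward,
console.log(back); // one step back,
```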

  23. 6.
    Being negative
    isn't always a bad
    thing
    © Photo by Juliette Reinders Folmer

  24. [^]
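A negated character class states what may *not* appear, which is often a better tool than a greedy or reluctant wildcard. A JavaScript sketch that matches double-quoted strings in an example sentence:

```javascript
const text = 'He said "hi" and then "bye".';

// Greedy: .* runs to the end and backtracks to the LAST quote.
const greedy = text.match(/".*"/g);
// Negated class: [^"]* can never run past a quote.
const negated = text.match(/"[^"]*"/g);

console.log(greedy); // [ '"hi" and then "bye"' ]
console.log(negated); // [ '"hi"', '"bye"' ]
```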

  25. 7.
    Explore
    Your
    Boundaries
    Photo by Miguel A.C. Domingo

  26. Beginning of string
    Beginning of line
    Word boundaries
    End of string
    End of line
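PCRE offers \A and \z for the start and end of the string regardless of the m modifier; JavaScript has no equivalent, so this sketch sticks to ^, $ and \b and shows how the m modifier changes the meaning of ^ and $ (sample text made up):

```javascript
const list = "apple 1\nbanana 22\ncherry 333";

// Without m: ^ and $ only match at the start and end of the whole string.
const withoutM = list.match(/^\w+ \d+$/g);
// With m: ^ and $ also match at the start and end of every line.
const withM = list.match(/^\w+ \d+$/gm);
// \b: a word boundary, so "cat" inside "concatenate" is skipped.
const words = "cat concatenate cat".match(/\bcat\b/g);

console.log(withoutM); // null
console.log(withM); // [ 'apple 1', 'banana 22', 'cherry 333' ]
console.log(words); // [ 'cat', 'cat' ]
```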

  27. 8. The first love
    is the deepest...

  28. /#?([A-F0-9]{6}|[A-F0-9]{3})/i
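Alternatives are tried from left to right, and the first one that fits wins, even when a later alternative would have matched more. This JavaScript sketch runs the slide's pattern and a variant with the alternatives swapped against an example colour:

```javascript
// The slide's pattern: the 6-digit alternative comes first.
const longFirst = /#?([A-F0-9]{6}|[A-F0-9]{3})/i;
// Swapped alternatives: the 3-digit form is tried first and wins.
const shortFirst = /#?([A-F0-9]{3}|[A-F0-9]{6})/i;

const color = "#c0ffee";
console.log(longFirst.exec(color)[1]); // c0ffee
console.log(shortFirst.exec(color)[1]); // c0f
```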

  29. 9.
    What's this
    global village
    people
    keep talking
    about ???

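  30. Character classes
    Class            Negated class      PCRE     POSIX          Negated POSIX
    [0-9]            [^0-9]             \d \D    [[:digit:]]    [^[:digit:]]
    [A-Za-z0-9_]     [^A-Za-z0-9_]      \w \W    [[:word:]]     [^[:word:]]
    [\t\n\x0B\f\r ]  [^\t\n\x0B\f\r ]   \s \S    [[:space:]]    [^[:space:]]
    [\t ]            [^\t ]             \h \H    [[:blank:]]    [^[:blank:]]
    [\n\x0B\f\r]     [^\n\x0B\f\r]      \v \V    -              -
    (Classes list ASCII characters only; \h and \v, and \s in Unicode mode, also match some non-ASCII whitespace.)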

  31. déjà vu [\w ]+
    French (fr)
    déjà vu [\w ]+
    English (en)
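JavaScript ignores the locale entirely: \w is always [A-Za-z0-9_]. Matching letters from any language there takes a Unicode property escape such as \p{L}, which requires the u flag. A sketch of the slide's example (the escapes in the string just guarantee precomposed é and à):

```javascript
const phrase = "d\u00e9j\u00e0 vu"; // "déjà vu" with precomposed é and à

// \w is always [A-Za-z0-9_] in JavaScript, so matching stops at the é.
const asciiWord = phrase.match(/[\w ]+/)[0];
// \p{L} (any letter, requires the u flag) covers accented letters too.
const unicodeLetters = phrase.match(/[\p{L} ]+/u)[0];

console.log(asciiWord); // d
console.log(unicodeLetters); // déjà vu
```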

  32. 10.
    Escape
    and
    escape again

  33. What to Escape?
    String delimiter
    - for the programming language
    Regex delimiter
    - for the regex
    - for the programming language
    Meta-characters
    - for the regex
    - for the programming language

  34. Escaping Meta Characters
    Special meaning: [ ] ( ) | . ? * + { } ^ $ \ / (delimiter)
    Literals: \[ \] \( \) \| \. \? \* \+ \{ \} \^ \$ \\ \/

  35. Escaping Meta Characters
    Special meaning: [ ] ( ) | . ? * + { } ^ $ \ / (delimiter)
    Literals: [(] [)] [|] [.] [?] [*] [+] [{] [}] [$] [/]
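Both escaping styles give the same result. A JavaScript sketch with a made-up price string, where an unescaped dot would also accept other characters:

```javascript
const text = "3.50 or 3x50";

// Unescaped: the dot is a wildcard and matches the "x" as well.
const wildcard = text.match(/3.50/g);
// Escaped with a backslash, or wrapped in a character class: literal dot only.
const backslash = text.match(/3\.50/g);
const charClass = text.match(/3[.]50/g);

console.log(wildcard); // [ '3.50', '3x50' ]
console.log(backslash); // [ '3.50' ]
console.log(charClass); // [ '3.50' ]
```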

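  36. Escaping Arbitrary Strings
    Java: Pattern.quote(), Matcher.quoteReplacement()
    PHP: preg_quote()
    Matlab: regexptranslate()
    Python: re.escape()
    Objective-C: NSRegularExpression escapedPatternForString:, escapedTemplateForString:
    Ruby: Regexp.escape(), Regexp.quote()
    // JavaScript:
    function escapeInputString(str) {
      // Prefix every regex meta-character (and the / delimiter) with a backslash;
      // "$&" in the replacement string stands for the matched character.
      return str.replace(/[[\]\/\\{}()|?+^$*.-]/g, "\\$&");
    }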

  37. Matching a Literal Backslash
    \\\\
    \ – the actual backslash
    \\ – escaped for use in the regex
    \\\\ – escaped again for the string literal
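JavaScript shows the two escaping levels side by side: a regex literal needs only the regex escape, while a pattern passed to the RegExp constructor as a string needs both. A sketch with an example Windows path:

```javascript
const path = "C:\\Users\\jrf"; // the string C:\Users\jrf

// Regex literal: one level of escaping, \\ matches a single backslash.
const viaLiteral = path.split(/\\/);

// RegExp constructor: "\\\\" in source code is the two-character pattern \\ ,
// which again matches a single backslash.
const pattern = new RegExp("\\\\");
const viaConstructor = path.split(pattern);

console.log(pattern.source); // \\
console.log(viaLiteral); // [ 'C:', 'Users', 'jrf' ]
console.log(viaConstructor); // [ 'C:', 'Users', 'jrf' ]
```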

  38. Modify
    your
    behaviour
    11

  39.

  40. Inline Modifiers
    Setting: (?i)
    Unsetting: (?-i)
    Combined: (?im-sx)
    Apply to subpattern (non-capturing): (?i:subp)

  41. Explore

  42.

  43. /^((
    25[0-5]| # Match 250-255 range
    2[0-4][0-9]| # Match 200-249 range
    [01]?[0-9]{1,2} # Match 0-199 range
    )\.){3} # Repeat 3 times with period
    (25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2}) # and once without
    $/x
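JavaScript has no x (extended) modifier, so inline # comments are not available there. A sketch of one workaround: assemble the same IPv4 pattern from commented string pieces.

```javascript
// Each alternative for one octet, commented like the x-mode version.
const octet = [
  "25[0-5]", // 250-255
  "2[0-4][0-9]", // 200-249
  "[01]?[0-9]{1,2}", // 0-199
].join("|");

// Three octets followed by a period, then one more without.
const ipv4 = new RegExp(`^((${octet})\\.){3}(${octet})$`);

for (const candidate of ["192.168.0.1", "255.255.255.255", "256.1.1.1", "1.2.3"]) {
  console.log(candidate, ipv4.test(candidate));
}
```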

  44.

  45. Match Array
    [0] – Complete match
    [1] – Match against sub-pattern 1
    [2] – Match against sub-pattern 2
    [3] – Match against sub-pattern 3
    ...
    Photo by Petr Kratochvil
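PHP's preg_match() fills its $matches argument in this order. In JavaScript, String.prototype.match() without the g flag returns an array with the same layout, as this sketch with an example date shows:

```javascript
const match = "Presented on 2020-09-10".match(/(\d{4})-(\d{2})-(\d{2})/);

console.log(match[0]); // 2020-09-10 (complete match)
console.log(match[1]); // 2020 (sub-pattern 1)
console.log(match[2]); // 09 (sub-pattern 2)
console.log(match[3]); // 10 (sub-pattern 3)
```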

  46. (?)
    (?P>name)

  47. Match Array
    [0] – Complete match
    [firstname] – Match against named sub-pattern firstname
    [lastname] – Match against named sub-pattern lastname
    ...
    Photo by Petr Kratochvil
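PCRE accepts several spellings for named groups, including (?<name>...) and (?P<name>...); JavaScript supports only (?<name>...). Where PHP's preg_match() puts named groups directly in the match array (next to their numeric keys), JavaScript exposes them on a separate groups object. A sketch with an example name:

```javascript
const match = "Jane Doe".match(/(?<firstname>\w+) (?<lastname>\w+)/);

// Named groups are available by name via .groups ...
console.log(match.groups.firstname); // Jane
console.log(match.groups.lastname); // Doe
// ... and still by number.
console.log(match[1], match[2]); // Jane Doe
```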

  48. Image by Gerd Altmann

  49. What I cannot create,
    I do not understand.
    Know how to solve every
    problem that has been solved.
    Richard Feynman
    Photo by Gleick, J. Genius. p. 310f

  50. Advanced Features
    Lookaround
    Conditional sub-patterns
    Recursion
    Inline comments
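JavaScript supports lookahead and lookbehind, but not the conditional sub-patterns, recursion or inline (?#...) comments PCRE offers, so this sketch only illustrates lookaround, using a made-up price list:

```javascript
const prices = "Tickets: 25 EUR, 30 USD, 12 EUR";

// Positive lookahead: digits followed by " EUR"; the currency is not part of the match.
const euros = prices.match(/\d+(?= EUR)/g);
// Negative lookahead: whole numbers NOT followed by " EUR".
const others = prices.match(/\b\d+\b(?! EUR)/g);
// Lookbehind: the number that follows "Tickets: ".
const first = prices.match(/(?<=Tickets: )\d+/)[0];

console.log(euros); // [ '25', '12' ]
console.log(others); // [ '30' ]
console.log(first); // 25
```

The \b anchors in the negative lookahead matter: without them, the engine would backtrack to "2" inside "25", which is not followed by " EUR".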

  51. Thanks!
    Any questions?
    Feedback:
    https://joind.in/talk/462b2
    Slides:
    https://speakerdeck.com/jrf
    Course:
    https://www.pluralsight.com/courses/regular-expressions-fundamentals

  52.