Slide 1

Slide 1 text

RegEx-Fu Juliette Reinders Folmer @jrf_nl AMSTERDAM | MAY 8-9, 2018

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

Wildcards on Steroids

Slide 4

Slide 4 text

Pattern Recognition

Slide 5

Slide 5 text

Regex Engines: POSIX, PCRE, ECMAScript, Oniguruma, Boost, DEELX, RE2, TRE, Pattwo, GRETA, GLib/GRegex, FREJ, RGX, QT, CL-PPCRE, Jakarta, Henry Spencer’s regex

Slide 6

Slide 6 text

Syntax Overlap

Slide 7

Slide 7 text

PCRE

Slide 8

Slide 8 text

Terminology
/[a-z0-9]+/im
Regular Expression: [a-z0-9]+
Delimiters: the opening and closing /
Modifiers: im
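
The three parts named above map onto other regex flavours too. A minimal sketch in Python, which is not PCRE: it has no delimiters, and the modifiers i and m become flags. The sample subject string is an assumption made for illustration.

```python
import re

# PCRE notation: /[a-z0-9]+/im
#   regular expression: [a-z0-9]+
#   delimiters:         the two slashes (Python has none)
#   modifiers:          i (case-insensitive) and m (multi-line) -> flags
pattern = re.compile(r"[a-z0-9]+", re.IGNORECASE | re.MULTILINE)

first = pattern.search("Hello World 42")
print(first.group(0))  # Hello
```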

Slide 9

Slide 9 text

Tips & Tricks

Slide 10

Slide 10 text

1. If you need a screwdriver, why use a hammer?
Photo by Scott Liddell

Slide 11

Slide 11 text

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
Jamie Zawinski, August 1997, alt.religion.emacs

Slide 12

Slide 12 text

2. Nothing in life is to be feared. It is only to be understood. Marie Curie

Slide 13

Slide 13 text

Whitelisting: Input string ?
Blacklisting: Input string ?
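
One way to read the two approaches: a whitelist describes what valid input looks like, a blacklist describes what is known to be bad. The sketch below is an illustration only; the patterns, the helper names and the sample inputs are all assumptions, not taken from the slides.

```python
import re

# Whitelist: the whole input must consist of allowed characters only.
ALLOWED = re.compile(r"[a-z0-9_]{3,16}")
# Blacklist: the input is rejected only if a known-bad character shows up.
FORBIDDEN = re.compile(r"[<>;]")


def passes_whitelist(value):
    return ALLOWED.fullmatch(value) is not None


def passes_blacklist(value):
    return FORBIDDEN.search(value) is None


samples = ["jrf_nl", "jrf nl", "<script>", "jrf`nl"]
results = {s: (passes_whitelist(s), passes_blacklist(s)) for s in samples}
for sample, (white, black) in results.items():
    print(f"{sample!r}: whitelist={white} blacklist={black}")
```

The last sample passes the blacklist because nobody thought to forbid the backtick, while the whitelist rejects it without needing to know about it.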

Slide 14

Slide 14 text

3. Not all matches are made in heaven... Photo by Petr Kratochvil

Slide 15

Slide 15 text

4. Only Elephants Remember Everything © Photo by Juliette Reinders Folmer

Slide 16

Slide 16 text

(?:)
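
A small sketch of what the non-capturing group (?:) changes: groups still group, but only the ones worth remembering end up in the match result. Python is used here as a stand-in for PCRE; the date string is an assumed example.

```python
import re

subject = "2018-05-08"

# Every pair of parentheses is remembered.
capturing = re.match(r"(\d{4})-(\d{2})-(\d{2})", subject)
# Only the day is remembered; the other groups are non-capturing.
non_capturing = re.match(r"(?:\d{4})-(?:\d{2})-(\d{2})", subject)

print(capturing.groups())      # ('2018', '05', '08')
print(non_capturing.groups())  # ('08',)
```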

Slide 17

Slide 17 text

5. Less is the new more

Slide 18

Slide 18 text

Subject: We take one step forward, two steps back
Pattern, built up one token at a time (␣ = space):
o
on
one
one.
one.*
one.*s
one.*s.
one.*s.?
one.*s.?t
one.*s.?t[a-z]
one.*s.?t[a-z]+
one.*s.?t[a-z]+p
one.*s.?t[a-z]+p␣
one.*s.?t[a-z]+p␣.
one.*s.?t[a-z]+p␣.{2,}
one.*s.?t[a-z]+p␣.{2,},
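
The greedy behaviour stepped through on this slide can be reproduced in any backtracking engine. A minimal sketch in Python, using the slide's final pattern and subject: the greedy .* first swallows the rest of the string and then gives characters back until the remainder of the pattern can match.

```python
import re

subject = "We take one step forward, two steps back"
pattern = r"one.*s.?t[a-z]+p .{2,},"

match = re.search(pattern, subject)
print(match.group(0))  # one step forward,
```

Both "s" characters in "steps" are tried and rejected before the engine backs up far enough to reach "step", which is why greedy patterns on long input can be slow.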

Slide 19

Slide 19 text

/ / We take one step back, two steps forward

Slide 20

Slide 20 text

Reluctant Quantifiers: *? +? ?? {n,}? {n,m}? {,m}?
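
A short sketch of the difference a reluctant quantifier makes, in Python (the same *? and +? syntax as PCRE). The subject strings are assumed examples.

```python
import re

subject = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ grabs as much as possible, so the match runs to the last ">".
greedy = re.search(r"<.+>", subject).group(0)
# Reluctant: .+? grabs as little as possible, so the match stops at the first ">".
reluctant = re.search(r"<.+?>", subject).group(0)

# The same applies to bounded quantifiers.
greedy_digits = re.search(r"\d{2,4}", "12345").group(0)
reluctant_digits = re.search(r"\d{2,4}?", "12345").group(0)

print(greedy)            # <b>bold</b> and <i>italic</i>
print(reluctant)         # <b>
print(greedy_digits)     # 1234
print(reluctant_digits)  # 12
```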

Slide 21

Slide 21 text

6. Being negative isn't always a bad thing © Photo by Juliette Reinders Folmer

Slide 22

Slide 22 text

[^]
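
A negated character class is often a cheaper alternative to a greedy or reluctant dot: instead of "anything, then back off", it says "anything except the terminator". A minimal sketch in Python with an assumed subject string:

```python
import re

subject = 'say "hello" and "goodbye"'

# [^"]* can never run past a closing quote, so no backtracking is needed.
quoted = re.findall(r'"([^"]*)"', subject)
# For comparison: the greedy dot runs from the first quote to the last one.
greedy = re.findall(r'"(.*)"', subject)

print(quoted)  # ['hello', 'goodbye']
print(greedy)  # ['hello" and "goodbye']
```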

Slide 23

Slide 23 text

7. Explore Your Boundaries Photo by Miguel A.C. Domingo

Slide 24

Slide 24 text

Beginning of string
Beginning of line
Word boundaries
End of string
End of line
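
The boundary symbols themselves did not survive in this transcript, so the sketch below uses the anchors Python provides, as an assumption about what the slide showed: \A and \Z for the string, ^ and $ (with the multi-line flag) for lines, and \b for word boundaries. Note that PCRE's \Z also matches before a final newline, whereas Python's \Z only matches at the very end.

```python
import re

text = "cat scatter\ncat"

# ^ with MULTILINE matches at the start of every line; \A only at the start of the string.
line_starts = [m.start() for m in re.finditer(r"^cat", text, re.MULTILINE)]
string_start = [m.start() for m in re.finditer(r"\Acat", text, re.MULTILINE)]

# \b keeps "cat" from matching inside "scatter".
whole_words = re.findall(r"\bcat\b", text)

# $ with MULTILINE matches at the end of every line; \Z only at the end of the string.
line_ends = [m.end() for m in re.finditer(r"\w+$", text, re.MULTILINE)]
string_end = [m.end() for m in re.finditer(r"\w+\Z", text, re.MULTILINE)]

print(line_starts, string_start)  # [0, 12] [0]
print(whole_words)                # ['cat', 'cat']
print(line_ends, string_end)      # [11, 15] [15]
```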

Slide 25

Slide 25 text

8. The first love is the deepest...

Slide 26

Slide 26 text

/#?([A-F0-9]{6}|[A-F0-9]{3})/i
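
In a backtracking engine the first alternative that leads to a match wins, which is why the six-character alternative comes first in the pattern above. A minimal sketch in Python; the swapped pattern and the colour value are assumptions added for comparison.

```python
import re

subject = "#FFCC00"

# Longest alternative first, as on the slide.
six_first = re.search(r"#?([A-F0-9]{6}|[A-F0-9]{3})", subject, re.IGNORECASE)
# Alternatives swapped: the three-character branch matches first and wins.
three_first = re.search(r"#?([A-F0-9]{3}|[A-F0-9]{6})", subject, re.IGNORECASE)

print(six_first.group(1))    # FFCC00
print(three_first.group(1))  # FFC
```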

Slide 27

Slide 27 text

9. What's this global village people keep talking about ???

Slide 28

Slide 28 text

Character classes

Character class   Negated            PCRE   Negated   POSIX         Negated
[0-9]             [^0-9]             \d     \D        [[:digit:]]   [^[:digit:]]
[A-Za-z0-9_]      [^A-Za-z0-9_]      \w     \W        [[:word:]]    [^[:word:]]
[\t\f\r\n \v]     [^\t\f\r\n \v]     \s     \S        [[:space:]]   [^[:space:]]
[\t ]             [^\t ]             \h     \H        [[:blank:]]   [^[:blank:]]
[\r\n]            [^\r\n]            \v     \V        -             -

Slide 29

Slide 29 text

déjà vu   [\w ]+   French (fr)
déjà vu   [\w ]+   English (en)
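
What \w matches depends on the engine's idea of a "word character". PCRE ties this to the locale (or to Unicode mode); Python has no direct equivalent of the fr/en comparison, so the sketch below uses its Unicode and ASCII modes as an analogy, not as a reproduction of the slide.

```python
import re

subject = "d\u00e9j\u00e0 vu"  # "déjà vu", with precomposed accented letters

# Unicode mode (Python's default for str patterns): accented letters count as \w.
unicode_match = re.fullmatch(r"[\w ]+", subject)
# ASCII mode: \w is limited to [A-Za-z0-9_], so the match stops at the first accent.
ascii_match = re.fullmatch(r"[\w ]+", subject, re.ASCII)
ascii_partial = re.match(r"[\w ]+", subject, re.ASCII).group(0)

print(unicode_match is not None)  # True
print(ascii_match is not None)    # False
print(ascii_partial)              # d
```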

Slide 30

Slide 30 text

10. Escape and escape again

Slide 31

Slide 31 text

What to Escape?
String delimiter: for the programming language
Regex delimiter: for the regex, for the programming language
Meta-characters: for the regex, for the programming language

Slide 32

Slide 32 text

Escaping Meta Characters
Special meaning: [ ] ( ) | . ? * + { } ^ $ \ / (delimiter)
Literals: \[ \] \( \) \| \. \? \* \+ \{ \} \^ \$ \\ \/

Slide 33

Slide 33 text

Escaping Meta Characters
Special meaning: [ ] ( ) | . ? * + { } ^ $ \ / (delimiter)
Literals: [(] [)] [|] [.] [?] [*] [+] [{] [}] [$] [/]

Slide 34

Slide 34 text

Escaping Arbitrary Strings
Java: Pattern.quote(), Matcher.quoteReplacement()
PHP: preg_quote()
Matlab: regexptranslate()
Python: re.escape()
Objective-C: escapedTemplateForString(), escapedPatternForString()
Ruby: Regexp.escape(), Regexp.quote()

// Javascript:
function escapeInputString( str ) {
    return str.replace(/[[\]\/\\{}()|?+^$*.-]/g, "\\$&");
}
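
A brief sketch of why escaping matters when user input ends up inside a pattern, using Python's re.escape() from the list above. The input and haystack strings are assumed examples.

```python
import re

user_input = "1+1=2 (really?)"
haystack = "proof: 1+1=2 (really?)"

# Unescaped, the +, ( ) and ? are treated as meta-characters and the search fails.
unescaped = re.search(user_input, haystack)
# Escaped, every character is taken literally.
escaped = re.search(re.escape(user_input), haystack)

print(unescaped)         # None
print(escaped.group(0))  # 1+1=2 (really?)
```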

Slide 35

Slide 35 text

Matching a Literal Backslash
\\\\
Of these four characters, one is the actual backslash, one escapes it for use in the regex, and the other two are string escapes.
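
The double layer of escaping can be checked directly. A minimal sketch in Python, where a raw string removes the string-escaping layer; the path is an assumed example.

```python
import re

path = "C:\\temp"  # string escape: the value holds ONE backslash (C:\temp)

as_plain_string = "\\\\"  # four characters in source, two reach the regex engine
as_raw_string = r"\\"     # raw string: no string escaping, only the regex escape

matches = re.findall(as_plain_string, path)

print(len(path))                         # 7
print(as_plain_string == as_raw_string)  # True
print(len(matches))                      # 1
```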

Slide 36

Slide 36 text

11. Modify your behaviour

Slide 37

Slide 37 text

No content

Slide 38

Slide 38 text

Inline Modifiers
Setting: (?i)
Unsetting: (?-i)
Combined: (?im-sx)
Apply to subpattern (non-capturing): (?i:subp)
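
A sketch of inline modifiers in Python, which supports a subset of the PCRE forms: a global (?i) is only allowed at the very start of the pattern, and unsetting is only available in the scoped form (?-i:...), not as a standalone (?-i). The subject strings are assumed examples.

```python
import re

# Setting for the whole pattern.
whole = re.fullmatch(r"(?i)regex-fu", "RegEx-Fu")

# Applying to a sub-pattern only: "regex" is case-insensitive, "-Fu" is not.
scoped_ok = re.fullmatch(r"(?i:regex)-Fu", "REGEX-Fu")
scoped_fail = re.fullmatch(r"(?i:regex)-Fu", "REGEX-FU")

# Unsetting for a sub-pattern inside a case-insensitive pattern.
unset_ok = re.fullmatch(r"(?i)regex-(?-i:Fu)", "REGEX-Fu")
unset_fail = re.fullmatch(r"(?i)regex-(?-i:Fu)", "REGEX-FU")

print(whole is not None)                             # True
print(scoped_ok is not None, scoped_fail is not None)  # True False
print(unset_ok is not None, unset_fail is not None)    # True False
```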

Slide 39

Slide 39 text

Explore

Slide 40

Slide 40 text

No content

Slide 41

Slide 41 text

/^((
    25[0-5]|            # Match 250-255 range
    2[0-4][0-9]|        # Match 200-249 range
    [01]?[0-9]{1,2}     # Match 0-199 range
)\.){3}                 # Repeat 3 times with period
(25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})   # and once without
$/x
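
The x modifier has a direct counterpart in Python, re.VERBOSE, so the pattern above can be tried out almost unchanged. A minimal sketch; the name IPV4 and the test addresses are assumptions.

```python
import re

IPV4 = re.compile(
    r"""
    ^((
        25[0-5]|            # Match 250-255 range
        2[0-4][0-9]|        # Match 200-249 range
        [01]?[0-9]{1,2}     # Match 0-199 range
    )\.){3}                 # Repeat 3 times with period
    (25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})   # and once without
    $
    """,
    re.VERBOSE,
)

for candidate in ["192.168.0.1", "10.0.0.255", "256.1.1.1", "1.2.3"]:
    print(candidate, IPV4.match(candidate) is not None)
```

In extended mode whitespace and # comments are ignored, so a literal space or # in the pattern has to be escaped or placed in a character class.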

Slide 42

Slide 42 text

No content

Slide 43

Slide 43 text

Match Array
[0] – Complete match
[1] – Match against sub-pattern 1
[2] – Match against sub-pattern 2
[3] – Match against sub-pattern 3
...
Photo by Petr Kratochvil
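
The numbered layout above can be rebuilt from a Python match object, as a sketch: group(0) is the complete match and groups() holds the sub-patterns in order. The subject string and the variable names are assumptions.

```python
import re

subject = "Doors open at 09:30:00 sharp"
match = re.search(r"(\d{2}):(\d{2}):(\d{2})", subject)

# Index 0 is the complete match; 1..n are the capturing sub-patterns.
match_array = [match.group(0), *match.groups()]

print(match_array)  # ['09:30:00', '09', '30', '00']
```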

Slide 44

Slide 44 text

(?<name>) (?P>name)

Slide 45

Slide 45 text

Match Array
[0] – Complete match
[firstname] – Match against named sub-pattern firstname
[lastname] – Match against named sub-pattern lastname
...
Photo by Petr Kratochvil
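
A sketch of named sub-patterns in Python, which uses the (?P<name>...) spelling; PCRE accepts that form as well as (?<name>...). The pattern and the subject are assumed examples built around the firstname and lastname keys shown above.

```python
import re

match = re.fullmatch(
    r"(?P<firstname>\w+) (?P<lastname>.+)",
    "Juliette Reinders Folmer",
)

print(match.group(0))            # Juliette Reinders Folmer
print(match.group("firstname"))  # Juliette
print(match.groupdict())         # {'firstname': 'Juliette', 'lastname': 'Reinders Folmer'}
```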

Slide 46

Slide 46 text

Image by Gerd Altmann

Slide 47

Slide 47 text

Know how to solve every problem that has been solved.
What I cannot create, I do not understand.
Richard Feynman
Photo by Gleick, J. Genius. p. 310f

Slide 48

Slide 48 text

Advanced Features
Look around
Conditional sub-patterns
Recursion
Inline comments
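
Three of these four features can be tried in Python's re module; recursion cannot, as it is a PCRE feature that Python's standard library does not implement. The patterns and subjects below are assumed examples, meant as a first taste and not a full treatment.

```python
import re

# Look around: assert that something is (or was) there without consuming it.
has_digit = re.fullmatch(r"(?=.*\d).{8,}", "regexfu2018")       # lookahead
no_digit = re.fullmatch(r"(?=.*\d).{8,}", "regexfuuu")
dollar_amounts = re.findall(r"(?<=\$)\d+", "$15 and 20 and $7")  # lookbehind

# Conditional sub-pattern: require ")" only if group 1 matched "(".
conditional = re.compile(r"(\()?\d+(?(1)\))")
balanced = conditional.fullmatch("(42)")
bare = conditional.fullmatch("42")
unbalanced = conditional.fullmatch("(42")

# Inline comments: (?#...) is ignored by the engine.
commented = re.fullmatch(r"\d{4}(?#year)-\d{2}(?#month)", "2018-05")

print(has_digit is not None, no_digit is not None)  # True False
print(dollar_amounts)                               # ['15', '7']
print(balanced is not None, bare is not None, unbalanced is not None)  # True True False
print(commented is not None)                        # True
```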

Slide 49

Slide 49 text

Thanks! Any questions?
Slides: https://speakerdeck.com/jrf
Course: https://www.pluralsight.com/courses/regular-expressions-fundamentals