Slide 1

Slide 1 text

Regular Expression Fundamentals Juliette Reinders Folmer @jrf_nl regexcheatsheets.com

Slide 2

Slide 2 text

pluralsight.com

Slide 3

Slide 3 text

Wildcards on Steroids

Slide 4

Slide 4 text

Pattern Recognition

Slide 5

Slide 5 text

Patterns in text strings: Serial numbers, Barcodes, Flight numbers, CSV files, Log files, Email headers, Twitter handles, Facebook usernames, Skype usernames, MD5 hashes, Sentences, Good passwords, ISBN numbers, HTML code, HTML tags, HTML attributes, CSS code, URLs, Email addresses, File names, File extensions, Directory paths, Postal codes, Telephone numbers, Number plates, Credit card numbers, Bank account numbers, Mathematical formulas, Elements from the periodic table

Slide 6

Slide 6 text

Whitelisting Blacklisting Input string Input string ? ?

Slide 7

Slide 7 text

Typical Uses for Regular Expressions: Search (and replace), String parsing, Data mapping, Syntax highlighting, Data scraping, Input validation

Slide 8

Slide 8 text

Users of Regular Expressions: (Sys-)Admins (File system, Server directives); Data Professionals (Query data); Developers (Working with strings)

Slide 9

Slide 9 text

How It Works

Slide 10

Slide 10 text

Subject String Subject

Slide 11

Slide 11 text

How It Works Pattern Regex Subject Function Engine Result

Slide 12

Slide 12 text

Result Types: What are the matches? How many matches have been found? Does it match?
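The three result types can be sketched with Python's `re` module. The subject string and pattern below are made up for illustration; other engines expose the same three questions through differently named functions.

```python
import re

pattern = re.compile(r"[a-z0-9]+", re.IGNORECASE)
subject = "Flight KL1234 departs at 0915"  # illustrative subject string

# Does it match?
has_match = pattern.search(subject) is not None

# What are the matches?
matches = pattern.findall(subject)

# How many matches have been found?
match_count = len(matches)

print(has_match, match_count, matches)
```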

Slide 13

Slide 13 text

Regex Engines: POSIX, PCRE, ECMAScript, Oniguruma, Boost, DEELX, RE2, TRE, Pattwo, GRETA, GLib/GRegex, FREJ, RGX, Qt, CL-PPCRE, Jakarta, Henry Spencer’s regex

Slide 14

Slide 14 text

Syntax Overlap

Slide 15

Slide 15 text

Still with me?

Slide 16

Slide 16 text

Terminology: in /[a-z0-9]+/im, [a-z0-9]+ is the Regular Expression, the two slashes are the Delimiters, and the trailing im are the Modifiers
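Not every language uses delimiters. As a rough sketch, this is how the same expression might look in Python, where the pattern is a plain string and the modifiers are passed as flags. The subject string is an assumption for illustration.

```python
import re

# /[a-z0-9]+/im: Python has no delimiters, so only the part between the
# slashes goes into the pattern; the modifiers i and m become flags.
pattern = re.compile(r"[a-z0-9]+", re.IGNORECASE | re.MULTILINE)

first_match = pattern.search("--ABC123--").group(0)
print(first_match)
```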

Slide 17

Slide 17 text

Basic Syntax
Literals: A a 1
Wildcard: .
Quantifiers: ? * + {#}
Character ranges: [...]
Grouping and alternation: ( ... | ... )
Anchors: ^ ... $
Shorthand character codes: \w \d \s
Modifiers: g m s i
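One small example per syntax category, as a sketch in Python. The patterns and subject strings are invented for illustration and are not taken from the talk.

```python
import re

# category -> (pattern, subject); all values are illustrative
examples = {
    "literal": (r"cat", "concatenate"),
    "wildcard": (r"c.t", "cut"),
    "quantifier": (r"\d{3}", "room 101"),
    "character range": (r"[a-f]+", "facade!"),
    "grouping and alternation": (r"(cat|dog)s", "hotdogs"),
    "anchors": (r"^abc$", "abc"),
    "shorthand": (r"\w+\s\d+", "route 66"),
}

# Store the matched text for every category.
results = {
    name: re.search(pattern, subject).group(0)
    for name, (pattern, subject) in examples.items()
}

for name, matched in results.items():
    print(f"{name}: {matched!r}")
```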

Slide 18

Slide 18 text

No content

Slide 19

Slide 19 text

The Pattern # AB 12 34

Slide 20

Slide 20 text

No content

Slide 21

Slide 21 text

Visualization of the Pattern

Slide 22

Slide 22 text

Tips & Tricks

Slide 23

Slide 23 text

1. If you need a screwdriver, why use a hammer? Photo by Scott Liddell

Slide 24

Slide 24 text

Jamie Zawinski, August 1997 alt.religion.emacs Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

Slide 25

Slide 25 text

2. Not all matches are made in heaven... Photo by Petr Kratochvil

Slide 26

Slide 26 text

3. Only Elephants Remember Everything © Photo by Juliette Reinders Folmer

Slide 27

Slide 27 text

(?:)
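A non-capturing group `(?:...)` groups without remembering what it matched. A minimal sketch in Python, with an assumed date string as the subject:

```python
import re

subject = "2024-05-17"  # illustrative subject

# Every pair of parentheses is remembered as a sub-match.
capturing = re.search(r"(\d{4})-(\d{2})-(\d{2})", subject)

# Only the day is remembered; year and month are grouped but not captured.
non_capturing = re.search(r"(?:\d{4})-(?:\d{2})-(\d{2})", subject)

print(capturing.groups())
print(non_capturing.groups())
```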

Slide 28

Slide 28 text

4. Being negative isn't always a bad thing © Photo by Juliette Reinders Folmer

Slide 29

Slide 29 text

[^]
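A negated character class often says what you mean more directly than a wildcard: "anything except the closing quote". A sketch in Python, with an assumed attribute string as the subject:

```python
import re

subject = 'name="first" id="second"'  # illustrative subject

# [^"]* matches any run of characters that are not a double quote,
# so each match stops at the nearest closing quote.
values = re.findall(r'"([^"]*)"', subject)

print(values)
```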

Slide 30

Slide 30 text

5. Less is the new more

Slide 31

Slide 31 text

Pattern (built up one token at a time): /one.*s.?t[a-z]+p .{2,},/ (the character after p is a literal space)
Subject: We take one step forward, two steps back

Slide 32

Slide 32 text

/ / We take one step back, two steps forward

Slide 33

Slide 33 text

Reluctant Quantifiers: {,m}? {n,}? {n,m}? *? +? ??
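The difference between greedy and reluctant quantifiers, sketched in Python using the subject string from the earlier slide. The shortened patterns are assumptions chosen to make the difference visible in the match itself.

```python
import re

subject = "We take one step forward, two steps back"

# Greedy: .* grabs as much as possible, then backtracks to the LAST "s".
greedy = re.search(r"one.*s", subject).group(0)

# Reluctant: .*? grabs as little as possible and stops at the FIRST "s".
reluctant = re.search(r"one.*?s", subject).group(0)

# The same applies to counted quantifiers.
greedy_count = re.search(r"\d{2,4}", "12345").group(0)
reluctant_count = re.search(r"\d{2,4}?", "12345").group(0)

print(repr(greedy), repr(reluctant), greedy_count, reluctant_count)
```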

Slide 34

Slide 34 text

6. Explore Your Boundaries Photo by Miguel A.C. Domingo

Slide 35

Slide 35 text

Beginning of string Beginning of line Word boundaries End of string End of line
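A sketch of these boundaries in Python, where `^` and `$` mean beginning and end of string by default and beginning and end of line with the multiline modifier. The two-line subject is made up for illustration.

```python
import re

text = "cat concat\ncatalog cat"  # two lines, illustrative

# Beginning of string vs. beginning of line
string_start = re.findall(r"^cat", text)
line_start = re.findall(r"^cat", text, re.MULTILINE)

# End of string vs. end of line
string_end = re.findall(r"cat$", text)
line_end = re.findall(r"cat$", text, re.MULTILINE)

# Word boundaries: "cat" only as a whole word
whole_words = re.findall(r"\bcat\b", text)

print(len(string_start), len(line_start),
      len(string_end), len(line_end), len(whole_words))
```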

Slide 36

Slide 36 text

7. The first love is the deepest...

Slide 37

Slide 37 text

/#?([A-F0-9]{6}|[A-F0-9]{3})/i
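One reading of this slide is that the order of alternatives matters: the engine takes the first alternative that matches, so the longer option goes first. A sketch in Python, where the reversed pattern is an assumption added for contrast:

```python
import re

# Pattern from the slide: six hex digits are tried before three.
six_first = re.compile(r"#?([A-F0-9]{6}|[A-F0-9]{3})", re.IGNORECASE)

# Reversed order (hypothetical, for contrast): three digits are tried first.
three_first = re.compile(r"#?([A-F0-9]{3}|[A-F0-9]{6})", re.IGNORECASE)

long_ok = six_first.search("#FFAA00").group(1)
long_cut = three_first.search("#FFAA00").group(1)
short_ok = six_first.search("#fa0").group(1)

print(long_ok, long_cut, short_ok)
```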

Slide 38

Slide 38 text

8. What's this global village people keep talking about ???

Slide 39

Slide 39 text

Character classes | PCRE | POSIX
[0-9] | \d | [[:digit:]]
[^0-9] | \D | [^[:digit:]]
[A-Za-z0-9_] | \w | [[:word:]]
[^A-Za-z0-9_] | \W | [^[:word:]]
[\t\f\r\n \v] | \s | [[:space:]]
[^\t\f\r\n \v] | \S | [^[:space:]]
[\t\f ] | \h | [[:blank:]]
[^\t\f ] | \H | [^[:blank:]]
[\r\n] | \v | -
[^\r\n] | \V | -

Slide 40

Slide 40 text

déjà vu [\w ]+ French (fr) déjà vu [\w ]+ English (en)
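What `\w` matches depends on the engine's idea of a "word character". Python has no per-locale switch for string patterns, so the sketch below uses the closest analogue, Unicode matching (the default) versus the `re.ASCII` flag. This is an analogy to the locale example above, not a reproduction of PCRE's locale behaviour.

```python
import re

subject = "d\u00e9j\u00e0 vu"  # "déjà vu" with precomposed accented letters

# Default for str patterns: \w is Unicode-aware and accepts é and à.
unicode_match = re.fullmatch(r"[\w ]+", subject)

# With re.ASCII, \w is limited to [A-Za-z0-9_], so the accents break the match.
ascii_match = re.fullmatch(r"[\w ]+", subject, re.ASCII)
ascii_parts = re.findall(r"[\w ]+", subject, re.ASCII)

print(unicode_match is not None, ascii_match is not None, ascii_parts)
```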

Slide 41

Slide 41 text

9. Escape and escape again

Slide 42

Slide 42 text

Escaping Meta Characters
Special Meaning: [ ] ( ) | . ? * + { } ^ $ \ / (delimiter)
Literals: \[ \] \( \) \| \. \? \* \+ \{ \} \^ \$ \\ \/

Slide 43

Slide 43 text

Escaping Meta Characters (using character classes)
Special Meaning: [ ] ( ) | . ? * + { } ^ $ \ / (delimiter)
Literals: [(] [)] [|] [.] [?] [*] [+] [{] [}] [$] [/]
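Both escaping styles give the same result for a literal dot. A short Python sketch, with made-up file names as the subject:

```python
import re

subject = "file.txt fileXtxt"  # illustrative subject

# Unescaped: the dot is a wildcard and also matches the "X".
unescaped = re.findall(r"file.txt", subject)

# Backslash escape and single-character class both mean a literal dot.
backslash = re.findall(r"file\.txt", subject)
char_class = re.findall(r"file[.]txt", subject)

print(unescaped, backslash, char_class)
```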

Slide 44

Slide 44 text

Escaping Arbitrary Strings
Java: Pattern.quote(), Matcher.quoteReplacement()
PHP: preg_quote()
Matlab: regexptranslate()
Python: re.escape()
Objective-C: escapedTemplateForString(), escapedPatternForString()
Ruby: Regexp.escape(), Regexp.quote()
// Javascript:
function escapeInputString( str ) {
    return str.replace(/[[\]\/\\{}()|?+^$*.-]/g, "\\$&");
}
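Why escaping arbitrary input matters, sketched with Python's `re.escape()`. The user input below is invented for illustration.

```python
import re

user_input = "1+1=2 (approx.)"  # illustrative input containing meta characters
subject = "Everyone knows that 1+1=2 (approx.)"

# Used as-is, "+", "(", ")" and "." keep their special meaning,
# so the input does not match itself.
unsafe = re.search(user_input, subject)

# Escaped, every meta character is treated as a literal.
safe = re.search(re.escape(user_input), subject)

print(unsafe, safe.group(0))
```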

Slide 45

Slide 45 text

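Matching a Literal Backslash (pattern written as an ordinary string literal)
\\\\ String escape: the string literal \\\\ reaches the regex engine as \\
\\ Escaping for use in regex: the first backslash escapes the second
\ The actual backslash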
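The two layers of escaping, sketched in Python. Raw strings (`r"..."`) are a Python feature that removes the string-escape layer; the path is made up for illustration.

```python
import re

path = "C:\\temp"  # the string holds a single backslash: C:\temp

# Ordinary string literal: four backslashes in source, two reach the engine.
normal_pattern = "\\\\"

# Raw string literal: what you type is what the engine gets.
raw_pattern = r"\\"

same_pattern = normal_pattern == raw_pattern
found = re.findall(raw_pattern, path)

print(same_pattern, len(normal_pattern), len(found))
```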

Slide 46

Slide 46 text

10. Modify your behaviour

Slide 47

Slide 47 text

No content

Slide 48

Slide 48 text

Inline Modifiers
Setting: (?i)
Unsetting: (?-i)
Combined: (?im-sx)
Apply to subpattern (non-capturing): (?i:subp)
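Support for inline modifiers differs per engine. The sketch below uses Python, which accepts `(?i)` only at the start of the pattern and supports setting and unsetting only in the scoped forms `(?i:...)` and `(?-i:...)`; a standalone `(?-i)` or `(?im-sx)` as shown above would need an engine such as PCRE.

```python
import re

# (?i) at the start: the whole pattern is case-insensitive.
whole = re.fullmatch(r"(?i)hello world", "HeLLo WORLD")

# (?i:subp): only the subpattern is case-insensitive.
scoped_ok = re.fullmatch(r"(?i:hello) world", "HELLO world")
scoped_fail = re.fullmatch(r"(?i:hello) world", "HELLO WORLD")

# (?-i:subp): switch the modifier off again for one subpattern.
unset_ok = re.fullmatch(r"(?i)hello (?-i:world)", "HELLO world")
unset_fail = re.fullmatch(r"(?i)hello (?-i:world)", "HELLO WORLD")

print(bool(whole), bool(scoped_ok), bool(scoped_fail),
      bool(unset_ok), bool(unset_fail))
```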

Slide 49

Slide 49 text

No content

Slide 50

Slide 50 text

Advanced Features: Look around, Named sub-matches, Conditional sub-patterns, Recursion, Inline comments
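Three of these features combined in one sketch: a named sub-match, a lookahead, and inline comments (via Python's `re.VERBOSE` flag). The price format is an assumption for illustration. Recursion is not shown because Python's built-in `re` module does not support it.

```python
import re

price = re.compile(
    r"""
    (?P<amount>\d+(?:\.\d{2})?)   # named sub-match: the number itself
    (?=\s?EUR)                    # lookahead: must be followed by EUR
    """,
    re.VERBOSE,
)

euro = price.search("Total: 42.50 EUR")
dollar = price.search("Total: 42.50 USD")

# The lookahead checks for "EUR" but does not become part of the match.
print(euro.group("amount"), euro.group(0), dollar)
```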

Slide 51

Slide 51 text

Thanks! Any questions? Slides: https://speakerdeck.com/jrf Course: https://www.pluralsight.com/courses/regular-expressions-fundamentals