The Swiss Army knife of string manipulation
REGEX 101
@matthiasmullie
Regular expressions 101
INTRODUCTION
What are regular expressions?
Google
Regular expressions are special
characters that match or capture
portions of a field, as well as the rules
that govern all characters.
Wikipedia
A regular expression provides a
concise and flexible means for
"matching" strings of text, such as
particular characters, words, or
patterns of characters.
Me
Regular expressions find patterns in
strings.
Neque porro quisquam est qui
dolorem ipsum quia dolor sit amet,
consectetur, adipisci velit...
‣ /[a-z]/i
‣ /[^\w]/i
‣ /ipsum/
‣ /(est|qui)/
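To see the four patterns at work, here is a minimal sketch that runs them over the sample text. It assumes Python's re module rather than the PHP preg_* functions used later in this deck, so the / delimiters disappear and the i modifier becomes the re.IGNORECASE flag.

```python
import re

text = ("Neque porro quisquam est qui dolorem ipsum quia dolor sit amet, "
        "consectetur, adipisci velit...")

# /[a-z]/i : any single letter, case-insensitive
first_letter = re.search(r"[a-z]", text, re.IGNORECASE).group()

# /[^\w]/i : any single character that is not a word character
first_non_word = re.search(r"[^\w]", text, re.IGNORECASE).group()

# /ipsum/ : a literal word
has_ipsum = re.search(r"ipsum", text) is not None

# /(est|qui)/ : either alternative, anywhere in the string
est_or_qui = re.findall(r"(est|qui)", text)

print(repr(first_letter), repr(first_non_word), has_ipsum, est_or_qui)
```

Note that (est|qui) also matches inside longer words such as "quisquam" and "quia"; the pattern by itself has no notion of word boundaries.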
BASICS
The syntax everyone should know already
/Delimiter/
‣ Any [^a-zA-Z0-9\\\s] character
‣ Opening char == terminating char
‣ Except for [ ], ( ), { } and < >
Use / (uniformity, you know)
Subpatterns
/([a-z0-9]*)@([a-z0-9\.]*\.[a-z0-9]{2,3})/i
‣ Whole match: email
‣ First subpattern: user
‣ Second subpattern: hostname
Note: this regex only barely satisfies my needs for this particular example; do not use it to find real email addresses, as it does not fully satisfy RFC 5321 and RFC 5322.
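A short sketch of how the two subpatterns come back as separate captures. It assumes Python's re module in place of PHP's preg_match, and the address is made up for illustration; the pattern itself is the one shown above.

```python
import re

# The i modifier becomes re.IGNORECASE; the / delimiters are not needed.
EMAIL = re.compile(r"([a-z0-9]*)@([a-z0-9\.]*\.[a-z0-9]{2,3})", re.IGNORECASE)

match = EMAIL.search("Contact: john@example.com")  # hypothetical input

email = match.group(0)     # whole match
user = match.group(1)      # first subpattern: everything before the @
hostname = match.group(2)  # second subpattern: everything after the @

print(email, user, hostname)
```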
Questions?
ADVANCED
The juicy stuff you never knew about, until now
Back references
Solution: /href=(['"])(.*?)\1/i
\1 references first subpattern!
Don’t forget to also string-escape in PHP:
preg_match('/href=([\'"])(.*?)\\1/i', ...);
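The same back reference can be tried out in a few lines. This is a sketch assuming Python's re module rather than PHP; the HTML snippet is invented for the example. A raw triple-quoted string avoids the extra layer of string escaping mentioned above.

```python
import re

# \1 must repeat whatever quote character the first subpattern captured.
HREF = re.compile(r"""href=(['"])(.*?)\1""", re.IGNORECASE)

# Hypothetical input mixing double quotes, single quotes and a quote inside a value.
html = """<a href="a.html">A</a> <a href='b.html'>B</a> <a href="it's.html">C</a>"""

# findall returns one (quote, url) tuple per match; keep only the URL.
urls = [url for _quote, url in HREF.findall(html)]
print(urls)
```

The third link shows why the back reference matters: the single quote inside the value does not end the match, because the closing quote has to be the same character as the opening one.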
Named subpatterns
Scenario: parsing large CSV
1,a title,5.00,92,green
2,another title,3.50,4,blue
3,one more,33699.99,15,white
...
/([0-9]+),(.*?),([0-9]+\.[0-9]{2}),([0-9]+),([a-z]+)/i
Result excerpt:
[1] => string(1) "1"
[2] => string(7) "a title"
[3] => string(4) "5.00"
[4] => string(2) "92"
[5] => string(5) "green"
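Numeric indexes like these get hard to read as columns pile up, which is where naming the subpatterns helps. The sketch below assumes Python's re module, and the column names (id, title, price, stock, color) are guesses made for illustration; the CSV itself does not say what each column means.

```python
import re

# Same pattern as above, with each subpattern given a (hypothetical) name.
ROW = re.compile(
    r"(?P<id>[0-9]+),"
    r"(?P<title>.*?),"
    r"(?P<price>[0-9]+\.[0-9]{2}),"
    r"(?P<stock>[0-9]+),"
    r"(?P<color>[a-z]+)",
    re.IGNORECASE,
)

lines = [
    "1,a title,5.00,92,green",
    "2,another title,3.50,4,blue",
    "3,one more,33699.99,15,white",
]

# groupdict() maps each subpattern name to the text it captured.
rows = [ROW.match(line).groupdict() for line in lines]
print(rows[0])
```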
Lookahead/-behind assertions
Scenario: find all occurrences of “here”
“Where can I find here, not there?”
Deduction:
Find all here’s, not preceded or followed by
an alphabetic character.
Solution: /(?<![a-z])here(?![a-z])/i
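The deduction translates into a negative lookbehind plus a negative lookahead. The sketch below assumes Python's re module, and the pattern is built from the deduction itself, so treat it as one possible reading rather than the only way to write it.

```python
import re

sentence = "Where can I find here, not there?"

# Naive search: also hits the "here" inside "Where" and "there".
naive_hits = re.findall(r"here", sentence, re.IGNORECASE)

# (?<![a-z]) : not preceded by a letter; (?![a-z]) : not followed by a letter.
# Assertions check their surroundings without consuming any characters.
STANDALONE = re.compile(r"(?<![a-z])here(?![a-z])", re.IGNORECASE)
starts = [m.start() for m in STANDALONE.finditer(sentence)]

print(len(naive_hits), starts)
```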
Conditional subpatterns
Scenario: match all (x|ht)ml tags
Caution!
‣
‣
Solution: if then else
/<(?P<tag>[a-z]+).*?(?P<self>\/)?>(?(self)|.*?<\/(?P=tag)>)/i
Named patterns
If self-closing, then do nothing,
else, find matching end tag
‣ With subpattern (named or by id):
  ‣ (?(pattern)then)
  ‣ (?(pattern)then|else)
‣ With lookahead/-behind:
  ‣ (?(?=assertion)then)
  ‣ (?(?=assertion)then|else)
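A sketch of the if-then-else tag pattern in action, using the subpattern names tag and self that the solution refers to. It assumes Python's re module, which supports the subpattern form of the condition but not the lookahead/-behind form; the markup is invented for the example.

```python
import re

TAG = re.compile(
    r"<(?P<tag>[a-z]+).*?(?P<self>/)?>"  # opening tag, optional self-closing slash
    r"(?(self)"                          # if the "self" subpattern matched:
    r"|"                                 #   then: nothing more to match
    r".*?</(?P=tag)>)",                  #   else: find the matching end tag
    re.IGNORECASE,
)

# Hypothetical input with one self-closing and two paired tags.
markup = 'A <br/> then <p class="x">hi</p> and <b>bold</b>.'

found = [(m.group("tag"), m.group(0)) for m in TAG.finditer(markup)]
for name, whole in found:
    print(name, "->", whole)
```

An empty "then" branch is what makes self-closing tags stop right after the closing angle bracket, while every other tag has to continue until its own end tag.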
Slide 38
Slide 38 text
Regular expressions 101 » Comments
Comments
/
# match currency symbols for USD, EUR, GBP & YEN
[$€£¥]
# must be followed by a number to indicate a price
(?=[0-9])
# pattern modifiers:
# u for UTF-8 interpretation (currency symbols),
# x to ignore whitespace (for comments)
/ux
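For comparison, here is a sketch of the same commented pattern assuming Python's re module. The x modifier corresponds to re.VERBOSE; Python 3 strings are already Unicode, so nothing equivalent to the u modifier is needed. The price list is made up.

```python
import re

PRICE_SYMBOL = re.compile(
    r"""
    # match currency symbols for USD, EUR, GBP & YEN
    [$€£¥]
    # must be followed by a number to indicate a price
    (?=[0-9])
    """,
    re.VERBOSE,  # ignore whitespace and allow # comments inside the pattern
)

# Hypothetical input: "£ 3" and "$x" should not count as prices.
symbols = PRICE_SYMBOL.findall("Prices: $5, €10, £ 3, ¥100, $x")
print(symbols)
```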