PCRE With PHP
Thomas Weinert
January 24, 2015
Programming
PHP Benelux 2015
Transcript
PCRE WITH PHP @Thomas Weinert
ABOUT
    PHP functions and classes
    PCRE syntax
WARNING!
    Slides contain a lot of example source.
    Most of the examples are really stupid.
PREG_MATCH()
    Pattern
    Subject
    Matches
    Flags
    Offset
PREG_MATCH() EXAMPLE
    preg_match('(a.?)', 'abac', $match);
    var_dump($match);

    array(1) {
      [0]=> string(2) "ab"
    }
FLAG: PREG_OFFSET_CAPTURE
    preg_match('(a.?)', 'abac', $match, PREG_OFFSET_CAPTURE, 2);
    var_dump($match);

    array(1) {
      [0]=> array(2) {
        [0]=> string(2) "ac"
        [1]=> int(2)
      }
    }
OFFSET
    $subject = 'aa ab ac ad';
    $offset = 0;
    $length = strlen($subject);
    while ($offset < $length) {
      if (preg_match('(a.)', $subject, $match, PREG_OFFSET_CAPTURE, $offset)) {
        $offset = $match[0][1] + strlen($match[0][0]);
        var_dump($match[0][0]);
      } else {
        break;
      }
    }

    string(2) "aa"
    string(2) "ab"
    string(2) "ac"
    string(2) "ad"
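One detail of the offset argument is easy to miss: ^ still refers to the start of the subject, while \G anchors at the position where matching starts. The following is a minimal sketch with a made-up subject and offset, not part of the original slides.

```php
<?php
// Hypothetical subject; the match attempt starts at byte offset 3 ("ab").
$subject = 'aa ab';
$offset = 3;

// ^ means "start of subject", so it cannot match when starting at offset 3.
$caret = preg_match('(^a.)', $subject, $m1, 0, $offset);

// \G anchors at the offset where matching starts.
$anchored = preg_match('(\Ga.)', $subject, $m2, 0, $offset);

// Captured offsets are always relative to the start of the subject.
$scan = preg_match('(b)', $subject, $m3, PREG_OFFSET_CAPTURE, $offset);

var_dump($caret);      // int(0)
var_dump($anchored);   // int(1)
var_dump($m2[0]);      // string(2) "ab"
var_dump($m3[0][1]);   // int(4)
```

Using \G in a loop like the one above keeps each match glued to the end of the previous one, which is useful for simple tokenizers.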
PATTERN
    Delimiter
    Expression
    Modifiers

    /expression/x
DELIMITER
    Any non-alphanumeric character
    Escaping
    Special meaning
    Brackets
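The choice of delimiter decides what has to be escaped inside the expression. A small sketch, assuming a hypothetical subject that contains slashes:

```php
<?php
// Hypothetical subject: a path with slashes.
$subject = 'path/to/file';

// With "/" as delimiter, every literal slash in the expression needs a backslash.
$withSlash = preg_match('/to\/file/', $subject);

// A different delimiter avoids the escaping.
$withHash = preg_match('#to/file#', $subject);

// Bracket delimiters come in pairs; groups inside them work as usual.
$withBraces = preg_match('{to/(file)}', $subject, $match);

var_dump($withSlash, $withHash, $withBraces); // int(1) int(1) int(1)
var_dump($match[1]);                          // string(4) "file"
```

Bracket delimiters have the side benefit that the closing character differs from the opening one, so the pattern reads like one large group.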
DELIMITER: BRACKETS
    preg_match('((one)(two))', 'onetwo', $match);
    var_dump($match);

    array(3) {
      [0]=> string(6) "onetwo"
      [1]=> string(3) "one"
      [2]=> string(3) "two"
    }
PATTERN: String Escaping
    $pattern = '(\\\n)';
    $text = <<<'TEXT'
    foo\nbar
    TEXT;
    preg_match($pattern, $text, $match);
    var_dump($pattern, $text, $match);

    string(5) "(\\n)"
    string(8) "foo\nbar"
    array(1) {
      [0]=> string(2) "\n"
    }
MODIFIERS
    x - PCRE_EXTENDED
    u - PCRE_UTF8
    D - PCRE_DOLLAR_ENDONLY
    s - PCRE_DOTALL
    m - PCRE_MULTILINE
    i - PCRE_CASELESS
    ...
PCRE_EXTENDED
    $pattern = <<<'REGEX'
    (^
      (d-)?    # optional country prefix
      (\d{5})  # German zip code
    $)Dix
    REGEX;
    var_dump((bool)preg_match($pattern, 'D-50670'));

    bool(true)
PCRE_UTF8
    (^.*$)u

    Pattern and subject need to be valid UTF-8!
UTF-8
    A character is encoded in 1 to 4 bytes (5- and 6-byte sequences are invalid).
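What "need to be valid UTF-8" means in practice can be sketched as follows. The sample strings are made up for illustration; the point is that an invalid subject makes preg_match() return false rather than 0, and that preg_last_error() reports the reason.

```php
<?php
$valid = "caf\xC3\xA9";   // "café" encoded as UTF-8 (5 bytes, 4 characters)
$invalid = "caf\xE9";     // the same word in ISO-8859-1, not valid UTF-8

$okResult = preg_match('(^.*$)u', $valid);
$badResult = preg_match('(^.*$)u', $invalid);
$error = preg_last_error();   // read it before the next preg_* call resets it

var_dump($okResult);                        // int(1)
var_dump($badResult);                       // bool(false)
var_dump($error === PREG_BAD_UTF8_ERROR);   // bool(true)

// Without "u" the dot matches single bytes, with "u" it matches characters.
$byteCount = preg_match_all('(.)', $valid);
$charCount = preg_match_all('(.)u', $valid);
var_dump($byteCount, $charCount);           // int(5) int(4)
```

Because 0 and false are both falsy, a strict comparison (=== false) is the safer way to tell "no match" from "matching failed".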
PCRE_DOLLAR_ENDONLY
    $examples = [
      ["(^\\d+$)", "123"],
      ["(^\\d+$)", "123\n"],
      ["(^\\d+$)D", "123\n"],
      ["(\\A\\d+\\z)", "123\n"]
    ];
    foreach ($examples as $example) {
      var_dump((bool)preg_match($example[0], $example[1], $match));
    }

    bool(true)
    bool(true)
    bool(false)
    bool(false)
PCRE_DOTALL
    $examples = [
      ["(^.+$)", "123"],
      ["(^.+$)", "123\n456"],
      ["(^.+$)s", "123\n456"]
    ];
    foreach ($examples as $example) {
      preg_match($example[0], $example[1], $match);
      var_dump($match);
    }

    array(1) {
      [0]=> string(3) "123"
    }
    array(0) {
    }
    array(1) {
      [0]=> string(7) "123
    456"
    }
PCRE_MULTILINE
    $examples = [
      ["(^.+$)", "123"],
      ["(^.+$)", "123\n456"],
      ["(^.+$)m", "123\n456"]
    ];
    foreach ($examples as $example) {
      preg_match($example[0], $example[1], $match);
      var_dump($match);
    }

    array(1) {
      [0]=> string(3) "123"
    }
    array(0) {
    }
    array(1) {
      [0]=> string(3) "123"
    }
PCRE_CASELESS
    $examples = [
      ["(foo)", "foo"],
      ["(foo)", "FOO"],
      ["(foo)i", "FOO"]
    ];
    foreach ($examples as $example) {
      preg_match($example[0], $example[1], $match);
      var_dump($match);
    }

    array(1) {
      [0]=> string(3) "foo"
    }
    array(0) {
    }
    array(1) {
      [0]=> string(3) "FOO"
    }
PREG_MATCH_ALL()
    $subject = 'aa ab ac ad';
    preg_match_all('(a.)', $subject, $match);
    var_dump($match);

    array(1) {
      [0]=> array(4) {
        [0]=> string(2) "aa"
        [1]=> string(2) "ab"
        [2]=> string(2) "ac"
        [3]=> string(2) "ad"
      }
    }
PREG_PATTERN_ORDER
    $subject = 'ab ac';
    preg_match_all('(a(.))', $subject, $match);
    var_dump($match);

    array(2) {
      [0]=> array(2) {
        [0]=> string(2) "ab"
        [1]=> string(2) "ac"
      }
      [1]=> array(2) {
        [0]=> string(1) "b"
        [1]=> string(1) "c"
      }
    }
PREG_SET_ORDER
    $subject = 'ab ac';
    preg_match_all('(a(.))', $subject, $match, PREG_SET_ORDER);
    var_dump($match);

    array(2) {
      [0]=> array(2) {
        [0]=> string(2) "ab"
        [1]=> string(1) "b"
      }
      [1]=> array(2) {
        [0]=> string(2) "ac"
        [1]=> string(1) "c"
      }
    }
PREG_REPLACE()
    var_dump(
      preg_replace("(')", '"', "'Hello'")
    );

    string(7) ""Hello""
ARRAY ARGUMENTS
    var_dump(
      preg_replace(['(\\\r)', '(\\\n)'], ['CR', 'LF'], '\\r and \\n')
    );

    string(9) "CR and LF"
REFERENCING SUBPATTERNS
    var_dump(
      preg_replace('(a(.))', 'a#${1}#', 'ab ac')
    );

    string(9) "a#b# a#c#"

    Reference syntax: \\1, $1, ${1}
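The three reference forms are interchangeable until a digit follows the reference. A short sketch of that case, using the same pattern and subject as the slide:

```php
<?php
// Goal: output the captured letter followed by a literal "1".

// ${1} ends the reference explicitly, so the following "1" stays literal.
$braces = preg_replace('(a(.))', '${1}1', 'ab ac');

// $11 is read as a reference to group 11, which does not exist here,
// so each match is replaced with an empty string.
$ambiguous = preg_replace('(a(.))', '$11', 'ab ac');

// The backslash form works as well when no digit follows.
$backslash = preg_replace('(a(.))', '\1!', 'ab ac');

var_dump($braces);     // string(5) "b1 c1"
var_dump($ambiguous);  // string(1) " "
var_dump($backslash);  // string(5) "b! c!"
```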
PREG_REPLACE_CALLBACK()
    No need for modifier "e" (PREG_REPLACE_EVAL)

    var_dump(
      preg_replace_callback(
        '(a(.))',
        function ($match) {
          return strtoupper($match[1]);
        },
        'ab ac'
      )
    );

    string(3) "B C"
FUNCTOR
    class Replacer {
      public function __invoke($match) {
        return strtoupper($match[1]);
      }
    }
    var_dump(
      preg_replace_callback(
        '(a(.))',
        new Replacer(),
        'ab ac'
      )
    );
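One reason to prefer an invokable object over a closure is that it can carry state between calls. The class below is a hypothetical illustration (CountingReplacer is not from the slides); it numbers the matches and remembers how many it replaced.

```php
<?php
// Hypothetical helper: numbers each match and counts the replacements.
class CountingReplacer
{
    public $count = 0;

    public function __invoke(array $match)
    {
        $this->count++;
        return $this->count . ':' . strtoupper($match[1]);
    }
}

$replacer = new CountingReplacer();
$result = preg_replace_callback('(a(.))', $replacer, 'ab ac');

var_dump($result);           // string(7) "1:B 2:C"
var_dump($replacer->count);  // int(2)
```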
PREG_SPLIT()
    $pattern = '(\\R)u';
    $subject = "one\rtwo\n\nthree\r\nfour";
    $match = preg_split($pattern, $subject);
    var_dump($match);

    array(5) {
      [0]=> string(3) "one"
      [1]=> string(3) "two"
      [2]=> string(0) ""
      [3]=> string(5) "three"
      [4]=> string(4) "four"
    }
PREG_SPLIT_NO_EMPTY
    $pattern = '(\\R)u';
    $subject = "one\rtwo\n\nthree\r\nfour";
    $match = preg_split($pattern, $subject, -1, PREG_SPLIT_NO_EMPTY);
    var_dump($match);

    array(4) {
      [0]=> string(3) "one"
      [1]=> string(3) "two"
      [2]=> string(5) "three"
      [3]=> string(4) "four"
    }
PREG_SPLIT_OFFSET_CAPTURE
    $pattern = '(\\R)u';
    $subject = "one\rtwo\n\nthree";
    $flags = PREG_SPLIT_NO_EMPTY | PREG_SPLIT_OFFSET_CAPTURE;
    $match = preg_split($pattern, $subject, -1, $flags);
    var_dump($match);

    array(3) {
      [0]=> array(2) {
        [0]=> string(3) "one"
        [1]=> int(0)
      }
      [1]=> array(2) {
        [0]=> string(3) "two"
        [1]=> int(4)
      }
      [2]=> array(2) {
        [0]=> string(5) "three"
        [1]=> int(9)
      }
    }
PREG_SPLIT_DELIM_CAPTURE
    $highlights = ['small' => '*', 'short' => '_'];
    $pattern = '((small|short))u';
    $subject = "A small, short example";
    $match = preg_split($pattern, $subject, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($match as $part) {
      if (isset($highlights[$part])) {
        echo $highlights[$part], $part, $highlights[$part];
      } else {
        echo $part;
      }
    }

    A *small*, _short_ example
PREG_QUOTE()
    var_dump('('.preg_quote('/.*/').')');

    string(8) "(/\.\*/)"
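A typical use of preg_quote() is embedding text that should be matched literally. The sketch below uses made-up input; it also shows the optional second argument, which additionally escapes the chosen delimiter.

```php
<?php
// Hypothetical user input that contains regex metacharacters.
$input = '1+1=2 (really?)';
$subject = 'They say 1+1=2 (really?) every time.';

// Unquoted, "+", "(", ")" and "?" keep their special meaning: no match.
$raw = preg_match('(' . $input . ')', $subject);

// Quoted, the input is matched character by character.
$quoted = preg_match('(' . preg_quote($input) . ')', $subject);

// "/" is not escaped by default; pass it as the delimiter argument.
$slashPattern = '/' . preg_quote('a/b', '/') . '/';
$slash = preg_match($slashPattern, 'a/b');

var_dump($raw, $quoted);   // int(0) int(1)
var_dump($slashPattern);   // string(6) "/a\/b/"
var_dump($slash);          // int(1)
```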
REGEXITERATOR
    $data = new ArrayIterator(['aa', 'ab']);
    $iterator = new RegexIterator(
      $data, '(.(.))', RegexIterator::REPLACE
    );
    $iterator->replacement = '$1';
    var_dump(iterator_to_array($iterator));

    array(2) {
      [0] => string(1) "a"
      [1] => string(1) "b"
    }
REGEXITERATOR MODES
    Modes: MATCH, GET_MATCH, ALL_MATCHES, SPLIT, REPLACE
    Flag: USE_KEY
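Two of the modes, plus the USE_KEY flag, can be sketched like this. The array contents are invented for illustration.

```php
<?php
$data = new ArrayIterator([
    'apple'   => 'red',
    'banana'  => 'yellow',
    'avocado' => 'green',
]);

// MATCH (the default mode) filters entries; USE_KEY tests the keys
// instead of the values.
$byKey = new RegexIterator($data, '(^a)', RegexIterator::MATCH, RegexIterator::USE_KEY);
$keys = iterator_to_array($byKey);

// GET_MATCH yields the match array for each matching value.
$words = new ArrayIterator(['aa', 'ab', 'xy']);
$matches = iterator_to_array(
    new RegexIterator($words, '(a(.))', RegexIterator::GET_MATCH)
);

var_dump($keys);     // 'apple' => 'red', 'avocado' => 'green'
var_dump($matches);  // 0 => ['aa', 'a'], 1 => ['ab', 'b']
```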
UNICODE
    Modifier: u
    All: \X
    Token: \x{A9}
    Category: \p{L}
    Negation: \P{L}, \p{^L}
    Scripts: \p{Hangul}
    Blocks: \p{Arrows}
UNICODE EXAMPLE
    $data = <<<'DATA'
    English German 한국어 日本語
    DATA;
    preg_match_all('(\\pL+)u', $data, $match);
    var_dump($match[0]);

    array(4) {
      [0] => string(7) "English"
      [1] => string(6) "German"
      [2] => string(9) "한국어"
      [3] => string(9) "日本語"
    }
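A few more of the Unicode tokens in action. This is a sketch with invented sample data; the "\u{...}" string escape assumes PHP 7.0 or later.

```php
<?php
$data = "English 한국어 日本語 \u{A9} 2015";

// Script property: only the Hangul word matches.
preg_match_all('(\p{Hangul}+)u', $data, $hangul);

// A single code point by number: U+00A9 is the copyright sign.
$hasCopyright = preg_match('(\x{A9})u', $data);

// "e" followed by a combining acute accent (U+0301) is two code points
// but one grapheme cluster: "." counts the former, \X the latter.
$decomposed = "e\u{301}";
$codePoints = preg_match_all('(.)u', $decomposed);
$graphemes = preg_match_all('(\X)u', $decomposed);

var_dump($hangul[0]);                // array with one entry: "한국어"
var_dump($hasCopyright);             // int(1)
var_dump($codePoints, $graphemes);   // int(2) int(1)
```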
NON-CAPTURING SUBPATTERNS
    preg_match('((?:one)(two))', 'onetwo', $match);
    var_dump($match);

    array(2) {
      [0]=> string(6) "onetwo"
      [1]=> string(3) "two"
    }
SUBPATTERN MODIFIERS
    (?i-sm)

    $examples = [
      ["((?i)foo)", "FOO"],
      ["((?-i)foo)i", "FOO"]
    ];
    foreach ($examples as $example) {
      preg_match($example[0], $example[1], $match);
      var_dump($match);
    }

    array(1) {
      [0]=> string(3) "FOO"
    }
    array(0) {
    }
NAMED SUBPATTERNS
    $pattern = "(^
      (?P<year>\d{4})
      (?:-(?<month>\d{1,2}))?
      (?:-(?'day'\d{1,2}))?
    )x";
    preg_match($pattern, "2015-01-24", $match);
    var_dump($match);

    array(7) {
      [0]=> string(10) "2015-01-24"
      ["year"]=> string(4) "2015"
      [1]=> string(4) "2015"
      ["month"]=> string(2) "01"
      [2]=> string(2) "01"
      ["day"]=> string(2) "24"
      [3]=> string(2) "24"
    }
PRE-DEFINED SUBROUTINES
    $pattern = "(
      ^
      (?&number)
      (?:\\.(?&number)){3}
      $
      (?(DEFINE)
        (?'number'25[0-5]|2[0-4]\d|1\d{2}|\d{1,2})
      )
    )x";
    var_dump((bool)preg_match($pattern, "127.0.0.1", $match));
    var_dump((bool)preg_match($pattern, "355.0.0.1", $match));

    bool(true)
    bool(false)
ASSERTIONS
    Look Around
    Look Ahead
    Look Behind
LOOK AHEAD
    $examples = [
      ["(h(?=e))", "hello"],
      ["(h(?=e)llo)", "hello"],
      ["(h(?=e).llo)", "hello"]
    ];
    foreach ($examples as $example) {
      preg_match($example[0], $example[1], $match);
      var_dump($match);
    }

    array(1) {
      [0]=> string(1) "h"
    }
    array(0) {
    }
    array(1) {
      [0]=> string(5) "hello"
    }
LOOK AHEAD - NEGATION
    $examples = [
      ["(h(?!e))", "hello"],
      ["(h(?!e))", "hallo"]
    ];
    foreach ($examples as $example) {
      preg_match($example[0], $example[1], $match);
      var_dump($match);
    }

    array(0) {
    }
    array(1) {
      [0]=> string(1) "h"
    }
LOOK BEHIND
    $examples = [
      ["((?<=h).)", "hello"],
      ["((?<!h).)", "hallo"]
    ];
    foreach ($examples as $example) {
      preg_match($example[0], $example[1], $match);
      var_dump($match);
    }

    array(1) {
      [0]=> string(1) "e"
    }
    array(1) {
      [0]=> string(1) "h"
    }
LOOK BEHIND - ALTERNATIVES
    $examples = [
      ["((?<=e|ha|.{2})l)", "hello"],
      ["((?<=e|ha)l)", "hallo"],
      ["((?<=e|.{2})l)", "hallo"]
    ];
    foreach ($examples as $example) {
      preg_match($example[0], $example[1], $match);
      var_dump($match);
    }

    array(1) {
      [0]=> string(1) "l"
    }
    array(1) {
      [0]=> string(1) "l"
    }
    array(1) {
      [0]=> string(1) "l"
    }
LOOK BEHIND - UNKNOWN LENGTH
    preg_match("((?<=.{2,})l)", 'hello', $match);

    Warning: preg_match(): Compilation failed: lookbehind assertion is not fixed length at offset 9 in /tmp... on line 2
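One common workaround for an unbounded look behind is \K, which resets the start of the reported match. This is a sketch of that alternative technique, not something shown on the slides, and it is not a full replacement: the text before \K is still consumed, which matters for overlapping matches.

```php
<?php
// Same intent as the failing (?<=.{2,})l:
// an "l" with at least two characters in front of it.
$found = preg_match('(.{2,}?\Kl)', 'hello', $match, PREG_OFFSET_CAPTURE);

var_dump($found);        // int(1)
var_dump($match[0][0]);  // string(1) "l"
var_dump($match[0][1]);  // int(2)
```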
CONDITIONALS
    $pattern = '((?<quote>[\'"])?(?(quote).*?\\k<quote>|\\w+))';
    $data = ['foo', '"foo"', "'foo'", 'foo bar', '"foo bar"'];
    foreach ($data as $subject) {
      if (preg_match($pattern, $subject, $match)) {
        echo $match[0], "\n";
      }
    }

    foo
    "foo"
    'foo'
    foo
    "foo bar"
RECURSIONS
    $pattern = <<<'PCRE'
    (
      \( ( (?>[^()]+) | (?R) )* \)
    )Ux
    PCRE;
    preg_match_all($pattern, '(ab(cd)ef)(gh)', $match);
    var_dump($match);

    array(2) {
      [0] => array(2) {
        [0] => string(10) "(ab(cd)ef)"
        [1] => string(4) "(gh)"
      }
      [1] => array(2) {
        [0] => string(1) "f"
        [1] => string(1) "h"
      }
    }
START OF PATTERN MODIFIERS
    (*UTF), (*UTF8), (*UTF16), (*UTF32)
    (*UTF)(*UCP) = u
    (*CR), (*LF), (*CRLF), (*ANYCRLF), (*ANY)
    (*BSR_ANYCRLF), (*BSR_UNICODE) - \R
    (*LIMIT_MATCH=x), (*LIMIT_RECURSION=d)
    (*NO_AUTO_POSSESS), (*NO_START_OPT)
    (*NOTEMPTY), (*NOTEMPTY_ATSTART)
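As an illustration of one of these settings, (*ANYCRLF) makes ^ and $ in multiline mode recognise CR, LF and CRLF as line breaks. A minimal sketch with a made-up subject; the result without the setting depends on the default newline convention of the PCRE build, so it is printed but not asserted here.

```php
<?php
// Hypothetical subject mixing CRLF and a bare CR as line breaks.
$subject = "one\r\ntwo\rthree";

// Build-dependent: with a default of LF only, none of the lines match.
$defaultCount = preg_match_all('(^\w+$)m', $subject, $plain);

// The setting must come first inside the delimiters.
$count = preg_match_all('((*ANYCRLF)^\w+$)m', $subject, $lines);

var_dump($defaultCount);
var_dump($count);      // int(3)
var_dump($lines[0]);   // "one", "two", "three"
```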
CONTROL VERBS
    (*SKIP)(?!)
    (*PRUNE)
    (*THEN)
    (*COMMIT)
    (*ACCEPT)

    http://perldoc.perl.org/perlre.html#Special-Backtracking-Control-Verbs
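The combination of (*SKIP) with a forced failure is often used to exclude parts of a subject from matching. A sketch of that idiom with an invented subject, using (*FAIL), which is equivalent to (?!): quoted sections are consumed and thrown away, everything else is matched as words.

```php
<?php
// Hypothetical subject: words, some of them inside double quotes.
$subject = 'foo "bar baz" qux';

// First alternative: match a quoted section, then (*SKIP)(*FAIL) discards
// it and continues after it. Second alternative: a plain word.
$pattern = '("[^"]*"(*SKIP)(*FAIL)|\w+)';

preg_match_all($pattern, $subject, $match);

var_dump($match[0]);   // "foo", "qux"
```

Without (*SKIP), the engine would retry inside the quoted section after the failure and report "bar" and "baz" as well.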
REGEX101.COM
VERSIONS
    PCRE2 10.0 2015-01-05
    PCRE 8.36 2014-09-26
    3V4L.ORG
    PHP7, HHVM >= 3.3: 8.35 2014-04-04
    PHP >= 5.5.10: 8.34 2013-12-15
LINKS
    http://www.rexegg.com/
    http://www.regular-expressions.info/
    https://www.regex101.com/
THANKS