PCRE With PHP
Thomas Weinert
January 24, 2015
PHP Benelux 2015
Transcript
PCRE WITH PHP @Thomas Weinert

ABOUT
  PHP functions and classes
  PCRE syntax

WARNING!
  The slides contain a lot of example source code.
  Most of the examples are really stupid.

PREG_MATCH()
  Pattern
  Subject
  Matches
  Flags
  Offset

PREG_MATCH() EXAMPLE
    preg_match('(a.?)', 'abac', $match);
    var_dump($match);

Output:
    array(1) {
      [0]=>
      string(2) "ab"
    }

FLAG: PREG_OFFSET_CAPTURE
    preg_match('(a.?)', 'abac', $match, PREG_OFFSET_CAPTURE, 2);
    var_dump($match);

Output:
    array(1) {
      [0]=>
      array(2) {
        [0]=>
        string(2) "ac"
        [1]=>
        int(2)
      }
    }

OFFSET
    $subject = 'aa ab ac ad';
    $offset = 0;
    $length = strlen($subject);
    while ($offset < $length) {
        if (preg_match('(a.)', $subject, $match, PREG_OFFSET_CAPTURE, $offset)) {
            // continue directly after the end of the current match
            $offset = $match[0][1] + strlen($match[0][0]);
            var_dump($match[0][0]);
        } else {
            break;
        }
    }

Output:
    string(2) "aa"
    string(2) "ab"
    string(2) "ac"
    string(2) "ad"

PATTERN
  Delimiter
  Expression
  Modifiers
  /expression/x

DELIMITER
  Any non-alphanumeric character
  Escaping
  Special meaning
  Brackets
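
The escaping point above can be illustrated with a minimal sketch. The subject string and variable names are made up for this example. It compares a slash delimiter, a hash delimiter and bracket delimiters on the same expression.

```php
<?php
// Hypothetical subject: a URL path we want to take apart.
$subject = '/blog/2015/pcre';

// With "/" as delimiter, every literal slash in the expression must be escaped.
preg_match('/^\/blog\/(\d{4})\//', $subject, $withSlash);

// With "#" as delimiter the slashes need no escaping.
preg_match('#^/blog/(\d{4})/#', $subject, $withHash);

// Bracket delimiters: the opening "(" pairs with the closing ")".
// They are delimiters, not a capture group, so the year is still group 1.
preg_match('(^/blog/(\d{4})/)', $subject, $withBrackets);

var_dump($withSlash === $withHash, $withHash === $withBrackets, $withHash[1]);
```

All three calls should fill the match array identically. Picking a delimiter that does not occur in the expression mainly saves escaping.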

DELIMITER: BRACKETS
    preg_match('((one)(two))', 'onetwo', $match);
    var_dump($match);

Output:
    array(3) {
      [0]=>
      string(6) "onetwo"
      [1]=>
      string(3) "one"
      [2]=>
      string(3) "two"
    }

PATTERN
  String Escaping
    $pattern = '(\\\n)';
    $text = <<<'TEXT'
    foo\nbar
    TEXT;
    preg_match($pattern, $text, $match);
    var_dump($pattern, $text, $match);

Output:
    string(5) "(\\n)"
    string(8) "foo\nbar"
    array(1) {
      [0]=>
      string(2) "\n"
    }

MODIFIERS
  x - PCRE_EXTENDED
  u - PCRE_UTF8
  D - PCRE_DOLLAR_ENDONLY
  s - PCRE_DOTALL
  m - PCRE_MULTILINE
  i - PCRE_CASELESS
  ...

PCRE_EXTENDED
    $pattern = <<<'REGEX'
    (^
      (d-)?    # optional country prefix
      (\d{5})  # German zip code
    $)Dix
    REGEX;
    var_dump((bool)preg_match($pattern, 'D-50670'));

Output:
    bool(true)

PCRE_UTF8
  (^.*$)u
  Pattern and subject need to be valid UTF-8!

UTF-8
  1 to 4 bytes per character (5- and 6-byte sequences are invalid)
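
A small sketch of what the u modifier changes in practice. The byte strings are made up for illustration and written with escapes so that the file encoding does not matter. It shows the dot matching one byte versus one character, and an invalid subject being rejected.

```php
<?php
// "ä" followed by "b": the "ä" takes two bytes in UTF-8, so strlen() is 3.
$subject = "\xC3\xA4b";

// Without "u" the dot matches a single byte, with "u" a whole character.
preg_match('(^.)', $subject, $bytes);
preg_match('(^.)u', $subject, $chars);
var_dump(strlen($bytes[0]), strlen($chars[0]));

// 0xC3 must be followed by a continuation byte, so this is not valid UTF-8.
$invalid = "\xC3\x28";
$result = preg_match('(.)u', $invalid);

// preg_match() reports the problem through its return value and preg_last_error().
var_dump($result, preg_last_error() === PREG_BAD_UTF8_ERROR);
```

Because the failure is reported as false rather than 0, a strict comparison is needed to tell "no match" apart from "invalid input".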

PCRE_DOLLAR_ENDONLY
    $examples = [
        ["(^\\d+$)", "123"],
        ["(^\\d+$)", "123\n"],
        ["(^\\d+$)D", "123\n"],
        ["(\\A\\d+\\z)", "123\n"]
    ];
    foreach ($examples as $example) {
        var_dump((bool)preg_match($example[0], $example[1], $match));
    }

Output:
    bool(true)
    bool(true)
    bool(false)
    bool(false)

PCRE_DOTALL
    $examples = [
        ["(^.+$)", "123"],
        ["(^.+$)", "123\n456"],
        ["(^.+$)s", "123\n456"]
    ];
    foreach ($examples as $example) {
        preg_match($example[0], $example[1], $match);
        var_dump($match);
    }

Output:
    array(1) {
      [0]=>
      string(3) "123"
    }
    array(0) {
    }
    array(1) {
      [0]=>
      string(7) "123
    456"
    }

PCRE_MULTILINE
    $examples = [
        ["(^.+$)", "123"],
        ["(^.+$)", "123\n456"],
        ["(^.+$)m", "123\n456"]
    ];
    foreach ($examples as $example) {
        preg_match($example[0], $example[1], $match);
        var_dump($match);
    }

Output:
    array(1) {
      [0]=>
      string(3) "123"
    }
    array(0) {
    }
    array(1) {
      [0]=>
      string(3) "123"
    }

PCRE_CASELESS
    $examples = [
        ["(foo)", "foo"],
        ["(foo)", "FOO"],
        ["(foo)i", "FOO"]
    ];
    foreach ($examples as $example) {
        preg_match($example[0], $example[1], $match);
        var_dump($match);
    }

Output:
    array(1) {
      [0]=>
      string(3) "foo"
    }
    array(0) {
    }
    array(1) {
      [0]=>
      string(3) "FOO"
    }

PREG_MATCH_ALL()
    $subject = 'aa ab ac ad';
    preg_match_all('(a.)', $subject, $match);
    var_dump($match);

Output:
    array(1) {
      [0]=>
      array(4) {
        [0]=>
        string(2) "aa"
        [1]=>
        string(2) "ab"
        [2]=>
        string(2) "ac"
        [3]=>
        string(2) "ad"
      }
    }

PREG_PATTERN_ORDER
    $subject = 'ab ac';
    preg_match_all('(a(.))', $subject, $match);
    var_dump($match);

Output:
    array(2) {
      [0]=>
      array(2) {
        [0]=>
        string(2) "ab"
        [1]=>
        string(2) "ac"
      }
      [1]=>
      array(2) {
        [0]=>
        string(1) "b"
        [1]=>
        string(1) "c"
      }
    }

PREG_SET_ORDER
    $subject = 'ab ac';
    preg_match_all('(a(.))', $subject, $match, PREG_SET_ORDER);
    var_dump($match);

Output:
    array(2) {
      [0]=>
      array(2) {
        [0]=>
        string(2) "ab"
        [1]=>
        string(1) "b"
      }
      [1]=>
      array(2) {
        [0]=>
        string(2) "ac"
        [1]=>
        string(1) "c"
      }
    }
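
The set order is convenient when each match should be handled as one unit. The following sketch uses a made-up subject and made-up group names (key, value). It turns key=value pairs into an associative array by looping over the sets.

```php
<?php
// Hypothetical input: space separated key=value pairs.
$subject = 'width=100 height=50';

// With PREG_SET_ORDER every element of $sets is one complete match,
// including its named groups.
preg_match_all('((?<key>\w+)=(?<value>\d+))', $subject, $sets, PREG_SET_ORDER);

$pairs = [];
foreach ($sets as $set) {
    $pairs[$set['key']] = (int) $set['value'];
}
var_dump($pairs);
```

With the default PREG_PATTERN_ORDER the same loop would have to walk two parallel arrays by index.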

PREG_REPLACE()
    var_dump(
        preg_replace("(')", '"', "'Hello'")
    );

Output:
    string(7) ""Hello""

ARRAY ARGUMENTS
    var_dump(
        preg_replace(['(\\\r)', '(\\\n)'], ['CR', 'LF'], '\\r and \\n')
    );

Output:
    string(9) "CR and LF"

REFERENCING SUBPATTERNS
    var_dump(
        preg_replace('(a(.))', 'a#${1}#', 'ab ac')
    );

Output:
    string(9) "a#b# a#c#"

Reference syntax:
  \\1
  $1
  ${1}
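
The three reference forms are mostly interchangeable. The braces matter when a digit follows the reference directly. The sketch below reuses the pattern from the slide. The replacement strings are made up for illustration.

```php
<?php
// Goal: append the digit 1 directly after the captured character.

// '$11' is read as a reference to group 11, which does not exist in this
// pattern, so it is replaced by an empty string.
$ambiguous = preg_replace('(a(.))', 'a$11', 'ab ac');

// '${1}1' ends the reference explicitly, so the literal 1 survives.
$explicit = preg_replace('(a(.))', 'a${1}1', 'ab ac');

var_dump($ambiguous, $explicit);
```

Single-quoted strings are used on purpose here, so that PHP itself does not try to interpolate `${1}`.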

PREG_REPLACE_CALLBACK()
  No need for modifier "e" (PREG_REPLACE_EVAL)
    var_dump(
        preg_replace_callback(
            '(a(.))',
            function ($match) {
                return strtoupper($match[1]);
            },
            'ab ac'
        )
    );

Output:
    string(3) "B C"

FUNCTOR
    class Replacer {
        public function __invoke($match) {
            return strtoupper($match[1]);
        }
    }
    var_dump(
        preg_replace_callback(
            '(a(.))',
            new Replacer(),
            'ab ac'
        )
    );
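
One reason to prefer a functor over a closure is that it can carry state between calls. The class below is a hypothetical variation of the Replacer idea, not taken from the slides. It numbers each replacement.

```php
<?php
// Hypothetical functor that numbers each match it replaces.
class CountingReplacer
{
    private $count = 0;

    // Called once per match by preg_replace_callback().
    public function __invoke($match)
    {
        $this->count++;
        return strtoupper($match[1]) . $this->count;
    }

    public function getCount()
    {
        return $this->count;
    }
}

$replacer = new CountingReplacer();
$result = preg_replace_callback('(a(.))', $replacer, 'ab ac');
var_dump($result, $replacer->getCount());
```

After the call the object still holds the counter, which a plain anonymous function could only provide through a by-reference `use` variable.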

PREG_SPLIT()
    $pattern = '(\\R)u';
    $subject = "one\rtwo\n\nthree\r\nfour";
    $match = preg_split($pattern, $subject);
    var_dump($match);

Output:
    array(5) {
      [0]=>
      string(3) "one"
      [1]=>
      string(3) "two"
      [2]=>
      string(0) ""
      [3]=>
      string(5) "three"
      [4]=>
      string(4) "four"
    }

PREG_SPLIT_NO_EMPTY
    $pattern = '(\\R)u';
    $subject = "one\rtwo\n\nthree\r\nfour";
    $match = preg_split($pattern, $subject, -1, PREG_SPLIT_NO_EMPTY);
    var_dump($match);

Output:
    array(4) {
      [0]=>
      string(3) "one"
      [1]=>
      string(3) "two"
      [2]=>
      string(5) "three"
      [3]=>
      string(4) "four"
    }

PREG_SPLIT_OFFSET_CAPTURE
    $pattern = '(\\R)u';
    $subject = "one\rtwo\n\nthree";
    $flags = PREG_SPLIT_NO_EMPTY | PREG_SPLIT_OFFSET_CAPTURE;
    $match = preg_split($pattern, $subject, -1, $flags);
    var_dump($match);

Output:
    array(3) {
      [0]=>
      array(2) {
        [0]=>
        string(3) "one"
        [1]=>
        int(0)
      }
      [1]=>
      array(2) {
        [0]=>
        string(3) "two"
        [1]=>
        int(4)
      }
      [2]=>
      array(2) {
        [0]=>
        string(5) "three"
        [1]=>
        int(9)
      }
    }

PREG_SPLIT_DELIM_CAPTURE
    $highlights = ['small' => '*', 'short' => '_'];
    $pattern = '((small|short))u';
    $subject = "A small, short example";
    $match = preg_split($pattern, $subject, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($match as $part) {
        if (isset($highlights[$part])) {
            echo $highlights[$part], $part, $highlights[$part];
        } else {
            echo $part;
        }
    }

Output:
    A *small*, _short_ example

PREG_QUOTE()
    var_dump('('.preg_quote('/.*/').')');

Output:
    string(8) "(/\.\*/)"
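
In the slide's example the slashes stay unescaped because no delimiter was passed to preg_quote(). When the quoted text may contain the delimiter itself, the optional second argument takes care of it. A minimal sketch with a made-up input string:

```php
<?php
// Hypothetical user input that contains the delimiter and metacharacters.
$input = '1/2 (approx.)';

// The second argument names the delimiter so that it gets escaped as well.
$pattern = '/' . preg_quote($input, '/') . '/';

var_dump($pattern);
var_dump(preg_match($pattern, 'about 1/2 (approx.) cup'));
```

The quoted input is then matched literally, including the brackets and the dot.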

REGEXITERATOR
    $data = new ArrayIterator(['aa', 'ab']);
    $iterator = new RegexIterator(
        $data, '(.(.))', RegexIterator::REPLACE
    );
    $iterator->replacement = '$1';
    var_dump(iterator_to_array($iterator));

Output:
    array(2) {
      [0]=>
      string(1) "a"
      [1]=>
      string(1) "b"
    }

REGEXITERATOR MODES
  MATCH
  GET_MATCH
  ALL_MATCHES
  SPLIT
  REPLACE
  Flag: USE_KEY
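
Two of the entries above, sketched with made-up data: the USE_KEY flag, which matches against keys instead of values, and the GET_MATCH mode, which yields the match arrays instead of the original values.

```php
<?php
// Hypothetical data keyed by name.
$data = new ArrayIterator(['apple' => 1, 'banana' => 2, 'avocado' => 3]);

// USE_KEY is passed as the fourth argument (flags), after the mode.
$byKey = new RegexIterator($data, '(^a)', RegexIterator::MATCH, RegexIterator::USE_KEY);
$keyResult = iterator_to_array($byKey);
var_dump($keyResult);

// GET_MATCH yields the match array that preg_match() would fill.
$words = new ArrayIterator(['aa', 'ab', 'xy']);
$matches = new RegexIterator($words, '(a(.))', RegexIterator::GET_MATCH);
$matchResult = iterator_to_array($matches);
var_dump($matchResult);
```

Entries that do not match are skipped in both cases, and the original keys are preserved.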

UNICODE
  Modifier u
  All: \X
  Token: \x{A9}
  Category: \p{L}
  Negation: \P{L}, \p{^L}
  Scripts: \p{Hangul}
  Blocks: \p{Arrows} (Unicode blocks are not supported by PCRE)
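
A short sketch of the script and negation properties listed above. The subject is made up, and the example assumes the source file is saved as UTF-8.

```php
<?php
// Hypothetical subject mixing Latin, Hangul and Han characters.
$subject = 'PHP 한국어 日本語';

// Script properties select characters by writing system.
preg_match_all('(\p{Hangul}+)u', $subject, $hangul);
preg_match_all('(\p{Han}+)u', $subject, $han);

// \P{L} is the negation of \p{L}: everything that is not a letter.
preg_match_all('(\P{L}+)u', $subject, $nonLetters);

var_dump($hangul[0], $han[0], $nonLetters[0]);
```

Here the only non-letters are the two spaces, so the last call returns two single-space matches.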

UNICODE EXAMPLE
    $data = <<<'DATA'
    English German 한국어 日本語
    DATA;
    preg_match_all('(\\pL+)u', $data, $match);
    var_dump($match[0]);

Output:
    array(4) {
      [0]=>
      string(7) "English"
      [1]=>
      string(6) "German"
      [2]=>
      string(9) "한국어"
      [3]=>
      string(9) "日本語"
    }

NON-CAPTURING SUBPATTERNS
    preg_match('((?:one)(two))', 'onetwo', $match);
    var_dump($match);

Output:
    array(2) {
      [0]=>
      string(6) "onetwo"
      [1]=>
      string(3) "two"
    }

SUBPATTERN MODIFIERS
  (?i-sm)
    $examples = [
        ["((?i)foo)", "FOO"],
        ["((?-i)foo)i", "FOO"]
    ];
    foreach ($examples as $example) {
        preg_match($example[0], $example[1], $match);
        var_dump($match);
    }

Output:
    array(1) {
      [0]=>
      string(3) "FOO"
    }
    array(0) {
    }

NAMED SUBPATTERNS
    $pattern = "(^
      (?P<year>\d{4})
      (?:-(?<month>\d{1,2}))?
      (?:-(?'day'\d{1,2}))?
    )x";
    preg_match($pattern, "2015-01-24", $match);
    var_dump($match);

Output:
    array(7) {
      [0]=>
      string(10) "2015-01-24"
      ["year"]=>
      string(4) "2015"
      [1]=>
      string(4) "2015"
      ["month"]=>
      string(2) "01"
      [2]=>
      string(2) "01"
      ["day"]=>
      string(2) "24"
      [3]=>
      string(2) "24"
    }

PRE-DEFINED SUBROUTINES
    $pattern = "(
      ^
      (?&number)
      (?:\\.(?&number)){3}
      $
      (?(DEFINE)
        (?'number'25[0-5]|2[0-4]\d|1\d{2}|\d{1,2})
      )
    )x";
    var_dump((bool)preg_match($pattern, "127.0.0.1", $match));
    var_dump((bool)preg_match($pattern, "355.0.0.1", $match));

Output:
    bool(true)
    bool(false)

ASSERTIONS
  Look Around
  Look Ahead
  Look Behind

LOOK AHEAD
    $examples = [
        ["(h(?=e))", "hello"],
        ["(h(?=e)llo)", "hello"],
        ["(h(?=e).llo)", "hello"]
    ];
    foreach ($examples as $example) {
        preg_match($example[0], $example[1], $match);
        var_dump($match);
    }

Output:
    array(1) {
      [0]=>
      string(1) "h"
    }
    array(0) {
    }
    array(1) {
      [0]=>
      string(5) "hello"
    }

LOOK AHEAD - NEGATION
    $examples = [
        ["(h(?!e))", "hello"],
        ["(h(?!e))", "hallo"]
    ];
    foreach ($examples as $example) {
        preg_match($example[0], $example[1], $match);
        var_dump($match);
    }

Output:
    array(0) {
    }
    array(1) {
      [0]=>
      string(1) "h"
    }

LOOK BEHIND
    $examples = [
        ["((?<=h).)", "hello"],
        ["((?<!h).)", "hallo"]
    ];
    foreach ($examples as $example) {
        preg_match($example[0], $example[1], $match);
        var_dump($match);
    }

Output:
    array(1) {
      [0]=>
      string(1) "e"
    }
    array(1) {
      [0]=>
      string(1) "h"
    }

LOOK BEHIND - ALTERNATIVES
    $examples = [
        ["((?<=e|ha|.{2})l)", "hello"],
        ["((?<=e|ha)l)", "hallo"],
        ["((?<=e|.{2})l)", "hallo"]
    ];
    foreach ($examples as $example) {
        preg_match($example[0], $example[1], $match);
        var_dump($match);
    }

Output:
    array(1) {
      [0]=>
      string(1) "l"
    }
    array(1) {
      [0]=>
      string(1) "l"
    }
    array(1) {
      [0]=>
      string(1) "l"
    }

LOOK BEHIND - UNKNOWN LENGTH
    preg_match("((?<=.{2,})l)", 'hello', $match);

Output:
    Warning: preg_match(): Compilation failed: lookbehind assertion is not fixed length at offset 9 in /tmp... on line 2
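
A common workaround for this limitation, not shown on the slides, is `\K`, which resets the start of the reported match. The sketch below is an approximation under that assumption. Unlike a look-behind, the text before `\K` is still consumed, so the two are not interchangeable when matches overlap.

```php
<?php
// ".{2,}?" requires at least two characters before the "l" (lazily),
// "\K" then drops them from the reported match.
preg_match('(.{2,}?\Kl)', 'hello', $match, PREG_OFFSET_CAPTURE);

// Only the "l" is reported, together with its byte offset in the subject.
var_dump($match[0]);
```

The reported offset shows that the match starts at the first "l", not at the beginning of the subject.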

CONDITIONALS
    $pattern = '((?<quote>[\'"])?(?(quote).*?\\k<quote>|\\w+))';
    $data = ['foo', '"foo"', "'foo'", 'foo bar', '"foo bar"'];
    foreach ($data as $subject) {
        if (preg_match($pattern, $subject, $match)) {
            echo $match[0], "\n";
        }
    }

Output:
    foo
    "foo"
    'foo'
    foo
    "foo bar"

RECURSIONS
    $pattern = <<<'PCRE'
    (
      \(
      (
        (?>[^()]+)
        |
        (?R)
      )*
      \)
    )Ux
    PCRE;
    preg_match_all($pattern, '(ab(cd)ef)(gh)', $match);
    var_dump($match);

Output:
    array(2) {
      [0]=>
      array(2) {
        [0]=>
        string(10) "(ab(cd)ef)"
        [1]=>
        string(4) "(gh)"
      }
      [1]=>
      array(2) {
        [0]=>
        string(1) "f"
        [1]=>
        string(1) "h"
      }
    }

START OF PATTERN MODIFIERS
  (*UTF), (*UTF8), (*UTF16), (*UTF32)
  (*UTF)(*UCP) = u
  (*CR), (*LF), (*CRLF), (*ANYCRLF), (*ANY)
  (*BSR_ANYCRLF), (*BSR_UNICODE) - \R
  (*LIMIT_MATCH=x), (*LIMIT_RECURSION=d)
  (*NO_AUTO_POSSESS), (*NO_START_OPT)
  (*NOTEMPTY), (*NOTEMPTY_ATSTART)
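
As an illustration of the newline verbs, the sketch below applies (*ANYCRLF) to a made-up subject that uses a bare carriage return as line separator. It assumes the PCRE library was built with LF as its default newline, which is common but build-dependent. That is why only the second result is commented on.

```php
<?php
// Hypothetical subject: two one-letter lines separated by a bare CR.
$subject = "a\rb";

// Depends on the library's default newline setting (often LF).
$default = preg_match_all('/^\w$/m', $subject);

// (*ANYCRLF) makes CR, LF and CRLF all count as line endings for ^ and $.
$anyCrLf = preg_match_all('/(*ANYCRLF)^\w$/m', $subject);

var_dump($default, $anyCrLf);
```

With (*ANYCRLF) both lines are found. With an LF default the first call finds none, because the carriage return is not treated as a line break.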

CONTROL VERBS
  (*SKIP)(?!)
  (*PRUNE)
  (*THEN)
  (*COMMIT)
  (*ACCEPT)
  http://perldoc.perl.org/perlre.html#Special-Backtracking-Control-Verbs
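
The first entry is usually seen as the "skip and fail" idiom: match what should be ignored, then force a failure and skip past it. The sketch below uses a made-up subject and the equivalent spelling (*SKIP)(*FAIL). It collects words that are not inside double quotes.

```php
<?php
// Hypothetical subject with one quoted section.
$subject = 'foo "bar baz" qux';

// First alternative: consume a quoted string, then fail and skip past it.
// Second alternative: a plain word outside of quotes.
preg_match_all('~"[^"]*"(*SKIP)(*FAIL)|\w+~', $subject, $match);

var_dump($match[0]);
```

Without the verbs, the engine would retry inside the quoted section after the failure and report "bar" and "baz" as well.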
REGEX101.COM

VERSIONS
  PCRE2 10.0 2015-01-05
  PCRE 8.36 2014-09-26
  3V4L.ORG
  PHP7, HHVM >= 3.3: 8.35 2014-04-04
  PHP >= 5.5.10: 8.34 2013-12-15

LINKS
  http://www.rexegg.com/
  http://www.regular-expressions.info/
  https://www.regex101.com/
THANKS