Slide 1

Slide 1 text

PCRE MATCHING PATTERNS
@ThomasWeinert

Slide 2

Slide 2 text

TASK REPOSITORY
bitbucket.org/thomasweinert/workshop-pcre-tasks

Slide 3

Slide 3 text

MATCHING: PHP FUNCTIONS
preg_match
preg_match_all

Slide 4

Slide 4 text

PREG_MATCH
Find first match

    preg_match($pattern, $subject);
    preg_match($pattern, $subject, $matches);
    preg_match($pattern, $subject, $matches, $flags, $offset);

Slide 5

Slide 5 text

PREG_MATCH - RETURN VALUES
Match count - 0 or 1
FALSE for errors
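
The three possible return values can be told apart as in the sketch below. The subjects and variable names are made up for illustration; the broken pattern '/(/' is only there to provoke a compile error.

```php
<?php
// 1: the pattern matched (preg_match stops after the first match).
$found = preg_match('/\d+/', 'abc 123 def 456', $matches);

// 0: no match. This is a regular result, not an error.
$missing = preg_match('/\d+/', 'no digits here');

// false: the pattern could not be compiled (unclosed group).
// The @ only hides the warning that PHP raises for it.
$error = @preg_match('/(/', 'anything');

// Use === because 0 and false are both falsy.
if ($error === false) {
    echo "pattern error\n";
}
echo $found, ' ', $matches[0], ' ', $missing, "\n";
```

A plain `if (!preg_match(...))` cannot distinguish "no match" from "broken pattern", which is why the strict comparison is used here.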

Slide 6

Slide 6 text

PREG_MATCH_ALL
Find all matches

    preg_match_all($pattern, $subject);
    preg_match_all($pattern, $subject, $matches);
    preg_match_all($pattern, $subject, $matches, $flags, $offset);

Slide 7

Slide 7 text

PREG_MATCH_ALL - RETURN VALUES
Match count - 0 to n
FALSE for errors

Slide 8

Slide 8 text

$MATCHES
preg_match() - array of matched groups
preg_match_all() - array with
  PREG_PATTERN_ORDER - an array for each group in the pattern
  PREG_SET_ORDER - an array for each match
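
To make the two orderings concrete, here is a small sketch with an invented subject and pattern. PREG_PATTERN_ORDER is the default when no flag is passed.

```php
<?php
$subject = '1a 2b';
$pattern = '/(\d)(\w)/';

// One array per group: index 0 holds all full matches,
// index 1 all values of the first group, and so on.
preg_match_all($pattern, $subject, $byGroup, PREG_PATTERN_ORDER);
// $byGroup is [['1a', '2b'], ['1', '2'], ['a', 'b']]

// One array per match: each entry looks like a preg_match() result.
preg_match_all($pattern, $subject, $byMatch, PREG_SET_ORDER);
// $byMatch is [['1a', '1', 'a'], ['2b', '2', 'b']]
```

PREG_SET_ORDER is usually the more convenient choice when each match is processed on its own in a loop.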

Slide 9

Slide 9 text

PATTERN ARGUMENT

    /string/u
    ││     │└ Modifier
    ││     └ Delimiter
    │└ Pattern
    └ Delimiter

Slide 10

Slide 10 text

TASK: MATCH A STRING
Match the string nevercodealone. This is case sensitive.

    $pattern = '';
    $result = preg_match_all(
        $pattern, 'https://nevercodealone.de', $matches
    );
    if ($result && count($matches[0]) == 1) {
        echo 'SUCCESS';
    } else {
        echo 'FAIL';
    }

Slide 11

Slide 11 text

TRY DIFFERENT DELIMITERS
ASCII letters and digits do NOT work.
Brackets work as a matching pair of delimiters.
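
A quick way to try this out is sketched below. The subject string and the choice of delimiters are arbitrary examples.

```php
<?php
$subject = 'never code alone';

$results = [
    preg_match('/code/', $subject),
    preg_match('#code#', $subject),
    preg_match('~code~', $subject),
    // An opening bracket is closed by its counterpart.
    preg_match('(code)', $subject, $m),
    preg_match('{code}i', $subject),
];

// The parentheses above are delimiters, not a capture group,
// so $m contains only the full match at index 0.
$groupCount = count($m);

// A letter as delimiter is rejected: PHP raises a warning
// (hidden by @ here) and returns false.
$invalid = @preg_match('xcodex', $subject);
```

Bracket delimiters have the advantage that a literal / in the pattern (URLs, paths) needs no escaping.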

Slide 12

Slide 12 text

MODIFIER
U - ungreedy mode
i - case insensitive
u - utf-8 mode
x - modifies whitespace behaviour
s - modifies dot behaviour
m - modifies anchor behaviour
D - modifies behaviour of $ anchor
...
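
The effect of some of these modifiers can be observed with a few one-liners. This is only a sketch with made-up subjects; s, m and D are left out here because they concern the dot and the anchors.

```php
<?php
// i: letter case is ignored.
$insensitive = preg_match('/code/i', 'CODE');

// U: quantifiers become ungreedy and match as little as possible.
preg_match('/a+/', 'aaa', $greedy);   // $greedy[0] is 'aaa'
preg_match('/a+/U', 'aaa', $lazy);    // $lazy[0] is 'a'

// x: whitespace inside the pattern is ignored.
$extended = preg_match('/ a b c /x', 'abc');
// Without x the spaces are literal and must appear in the subject.
$literalSpaces = preg_match('/ a b c /', 'abc');
```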

Slide 13

Slide 13 text

TASK: MATCH A STRING CASE INSENSITIVE
The modifier i allows case insensitive matches.
Match the string code. This is case insensitive.

    $pattern = '';
    $result = preg_match_all(
        $pattern, 'code CODE Code', $matches
    );
    if ($result && count($matches[0]) == 3) {
        echo 'SUCCESS';
    } else {
        echo 'FAIL';
    }

Slide 14

Slide 14 text

THE DOT
Matches anything except a newline
Matches anything if modifier "s" is set
Escape . with \ to match an actual .
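
The three rules above, tried on an invented subject:

```php
<?php
$subject = 'abc a.c a-c';

// An unescaped dot matches b, . and - alike.
$any = preg_match_all('/a.c/', $subject);

// An escaped dot matches only the literal dot.
$literal = preg_match_all('/a\.c/', $subject);

// By default the dot stops at a newline ...
$noNewline = preg_match('/a.c/', "a\nc");
// ... unless the s modifier is set.
$dotAll = preg_match('/a.c/s', "a\nc");
```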

Slide 15

Slide 15 text

TASK: MATCH ANYTHING BUT NEWLINES
Match the string cc.cc.cc.cc. "c" can be any character except a newline.

    $pattern = '()';
    $result = preg_match_all(
        $pattern, "ab.cd.ef.gh\na\n.b\n.d\n.e\n\nabcdefghiklm", $matches
    );
    if ($result && count($matches[0]) == 1) {
        echo 'SUCCESS';
    } else {
        echo 'FAIL';
    }

Slide 16

Slide 16 text

QUALIFIER
What will be matched?
Define bytes/characters that will be matched.
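
A few qualifiers side by side, as a sketch. \d and \D appear in the task that follows; \w (word character) and \s (whitespace) are not mentioned on the slides and are added here only for comparison.

```php
<?php
$subject = 'a1 b22';

$digits    = preg_match_all('/\d/', $subject); // 1, 2, 2
$nonDigits = preg_match_all('/\D/', $subject); // a, space, b
$wordChars = preg_match_all('/\w/', $subject); // everything but the space
$spaces    = preg_match_all('/\s/', $subject); // the single space
```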

Slide 17

Slide 17 text

TASK: MATCH DIGITS AND NON-DIGITS
The qualifier \d matches any digit (0-9).
The qualifier \D matches anything except a digit.
Match a string with the structure xxXxxXxxxx. "x" represents a digit, "X" a non-digit.

    $pattern = '()';
    $result = preg_match_all(
        $pattern, "12.34.5678\n123456789\nab.cd.efgh", $matches
    );
    if ($result && count($matches[0]) == 1) {
        echo 'SUCCESS';
    } else {
        echo 'FAIL';
    }

Slide 18

Slide 18 text

ANCHORS
Anchor your pattern to the start and/or end of the subject.
^ - string start
$ - string end
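
A minimal sketch of both anchors on an invented subject:

```php
<?php
$subject = 'foo bar';

$start    = preg_match('/^foo/', $subject);  // subject starts with foo
$notStart = preg_match('/^bar/', $subject);  // bar is not at the start
$end      = preg_match('/bar$/', $subject);  // subject ends with bar
$whole    = preg_match('/^foo$/', $subject); // subject is not exactly foo
```

Using both anchors turns a search ("contains") into a validation ("is exactly").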

Slide 19

Slide 19 text

TASK: VALIDATE STRING START
The ^ anchors the pattern to the string start.
Validate that the string starts with a digit.

    $pattern = '()';
    $subjects = [
        '1. match' => TRUE,
        '2. match' => TRUE,
        '42' => TRUE,
        'no match' => FALSE,
        "a 345 b" => FALSE,
        "end 3" => FALSE
    ];
    foreach ($subjects as $subject => $shouldMatch) {
        if ($shouldMatch == preg_match($pattern, $subject)) {
            echo "SUCCESS\n";
        } else {
            echo "FAIL\n";
        }
    }

Slide 20

Slide 20 text

TASK: VALIDATE STRING END
The $ anchors the pattern to the string end.
Validate that the string ends with a digit.

    $pattern = '()';
    $subjects = [
        'match 1' => TRUE,
        'match 2' => TRUE,
        '42' => TRUE,
        "21\n" => TRUE,
        'no match' => FALSE,
        "a 345 b" => FALSE,
        "3 start" => FALSE
    ];
    foreach ($subjects as $subject => $shouldMatch) {
        if ($shouldMatch == preg_match($pattern, $subject)) {
            echo "SUCCESS\n";
        } else {
            echo "FAIL\n";
        }
    }

Slide 21

Slide 21 text

TASK: VALIDATE A GERMAN ZIP CODE
The modifier D makes sure that a linefeed at the end of the subject is not ignored.
Validate that the subject is a German zip code. It consists of 5 digits.

    $pattern = '()';
    $subjects = [
        '01234' => TRUE,
        '50670' => TRUE,
        '40213' => TRUE,
        'abcdef' => FALSE,
        "50670\n" => FALSE,
        "123456" => FALSE
    ];
    foreach ($subjects as $subject => $shouldMatch) {
        if ($shouldMatch == preg_match($pattern, $subject)) {
            echo "SUCCESS\n";
        } else {
            echo "FAIL\n";
        }
    }

Slide 22

Slide 22 text

MODIFIER AND ALTERNATIVES
Modifier m - line anchors
\A - string start
\Z - string end, ignore linefeed
\z - string end, recognize linefeed
\b - word boundary
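
The differences between these anchors show up once the subject contains linefeeds. A sketch with invented subjects:

```php
<?php
$subject = "bar\n";

// $ and \Z tolerate one linefeed at the very end of the subject.
$dollar = preg_match('/bar$/', $subject);
$bigZ   = preg_match('/bar\Z/', $subject);
// $ with modifier D and \z do not.
$dollarD = preg_match('/bar$/D', $subject);
$smallZ  = preg_match('/bar\z/', $subject);

// With m, ^ matches at every line start; \A stays at the string start.
$lineStarts  = preg_match_all('/^\w/m', "a\nb\nc");
$stringStart = preg_match_all('/\A\w/m', "a\nb\nc");

// \b only matches between a word character and a non-word character.
$words = preg_match_all('/\bcat\b/', 'cat concat cats');
```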

Slide 23

Slide 23 text

CHARACTER CLASSES
Square brackets: []
- for ranges
^ for negative matches
Many special characters lose their special meaning inside a class.
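
The points above, sketched with arbitrary subjects:

```php
<?php
// A range: any one of a, b or c.
$range = preg_match_all('/[a-c]/', 'abcdef');

// A leading ^ negates the class: anything except a, b or c.
$negated = preg_match_all('/[^a-c]/', 'abcdef');

// Inside a class . + and * are plain characters.
$literals = preg_match_all('/[.+*]/', 'a.b+c*d');

// Ranges and single characters can be combined.
$combined = preg_match_all('/[a-c0-9_]/', 'a_1-z');
```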

Slide 24

Slide 24 text

TASK: MATCH VOWELS
Match all the vowels (aeiou) in the string.

    $pattern = '()';
    $result = preg_match_all(
        $pattern, 'https://nevercodealone.de', $matches
    );
    if ($result && count($matches[0]) == 8) {
        echo 'SUCCESS';
    } else {
        echo 'FAIL';
    }

Slide 25

Slide 25 text

TASK: MATCH NON-VOWELS
Match all the non-vowels in the string.

    $pattern = '()';
    $result = preg_match_all(
        $pattern, 'https://nevercodealone.de', $matches
    );
    if ($result && count($matches[0]) == 17) {
        echo 'SUCCESS';
    } else {
        echo 'FAIL';
    }

Slide 26

Slide 26 text

TASK: VALIDATE HEXADECIMAL BYTES
Validate that the string consists of two characters. The characters can be digits or a letter between a and f.

    $pattern = '()';
    $subjects = [
        '01' => TRUE,
        '0f' => TRUE,
        'FA' => TRUE,
        'az' => FALSE,
        "foo" => FALSE,
        "123" => FALSE
    ];
    foreach ($subjects as $subject => $shouldMatch) {
        if ($shouldMatch == preg_match($pattern, $subject)) {
            echo "SUCCESS\n";
        } else {
            echo "FAIL\n";
        }
    }

Slide 27

Slide 27 text

QUANTIFIER
How often will it be matched?
* - any count
? - maximum of 1
+ - minimum of 1
{n} - exactly n
{n,m} - minimum of n, maximum of m
{n,} - minimum of n
{0,m} - maximum of m
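
Each quantifier from the list, sketched with a small anchored pattern. The subjects are invented; the anchors make sure the whole subject has to fit.

```php
<?php
$optionalAbsent  = preg_match('/^colou?r$/', 'color');  // ? allows none
$optionalPresent = preg_match('/^colou?r$/', 'colour'); // ? allows one
$star            = preg_match('/^ab*c$/', 'ac');        // * allows none
$plus            = preg_match('/^ab+c$/', 'ac');        // + needs one
$exact           = preg_match('/^a{3}$/', 'aaa');       // exactly 3
$tooMany         = preg_match('/^a{2,3}$/', 'aaaa');    // 4 exceeds 3
$atLeast         = preg_match('/^a{2,}$/', 'aaaa');     // 2 or more
$atMost          = preg_match('/^a{0,2}$/', '');        // 0 is allowed
```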

Slide 28

Slide 28 text

TASK: VALIDATE A GERMAN ZIP CODE
The {n} syntax allows you to match a fixed number of repetitions of a qualifier.
Validate that the subject is a German zip code. It consists of 5 digits.

    $pattern = '()';
    $subjects = [
        '01234' => TRUE,
        '50670' => TRUE,
        '40213' => TRUE,
        'abcdef' => FALSE,
        "50670\n" => FALSE,
        "123456" => FALSE
    ];
    foreach ($subjects as $subject => $shouldMatch) {
        if ($shouldMatch == preg_match($pattern, $subject)) {
            echo "SUCCESS\n";
        } else {
            echo "FAIL\n";
        }
    }

Slide 29

Slide 29 text

TASK: VALIDATE A LANGUAGE CODE
The {n,m} syntax allows you to set a minimum and a maximum number of repetitions.
Validate that the subject is a 2 or 3 letter language code.

    $pattern = '()';
    $subjects = [
        'en' => TRUE,
        'de' => TRUE,
        'eng' => TRUE,
        'deu' => TRUE,
        '123' => FALSE,
        "en-US" => FALSE
    ];
    foreach ($subjects as $subject => $shouldMatch) {
        if ($shouldMatch == preg_match($pattern, $subject)) {
            echo "SUCCESS\n";
        } else {
            echo "FAIL\n";
        }
    }

Slide 30

Slide 30 text

TASK: VALIDATE AN INTEGER
? matches one or none.
+ matches at least one repetition.
Validate an integer including an optional leading sign.

    $pattern = '()';
    $subjects = [
        '1' => TRUE,
        '123' => TRUE,
        '+123' => TRUE,
        '-456' => TRUE,
        '1.1' => FALSE,
        "abc" => FALSE,
        "123 456" => FALSE
    ];
    foreach ($subjects as $subject => $shouldMatch) {
        if ($shouldMatch == preg_match($pattern, $subject)) {
            echo "SUCCESS\n";
        } else {
            echo "FAIL\n";
        }
    }

Slide 31

Slide 31 text

UNICODE
The modifier u activates Unicode UTF-8 mode.
\X - extended unicode grapheme sequence
\p{xx}, \px - character with unicode property
\p{^xx}, \P{xx} - character without unicode property
\p{script} - character from script
\x{FFFF} - code point
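
A sketch of these escapes in use. It assumes the source file is saved as UTF-8; the subjects and the properties Lu (uppercase letter) and Greek are picked only as examples.

```php
<?php
// \p{Lu}: uppercase letters, here the two umlauts.
$upper = preg_match_all('/\p{Lu}/u', 'Äpfel und Öl');

// \P{L}: anything that is not a letter, here the comma and the space.
$nonLetters = preg_match_all('/\P{L}/u', 'Öl, Ei');

// \p{Greek}: characters from the Greek script.
$greek = preg_match_all('/\p{Greek}/u', 'abc αβγ');

// \x{20AC}: the code point of the euro sign.
$euro = preg_match('/\x{20AC}/u', '€');

// "e" followed by a combining acute accent: two code points,
// but a single grapheme for \X.
$decomposed = "e\u{301}";
$codePoints = preg_match_all('/./u', $decomposed);
$graphemes  = preg_match_all('/\X/u', $decomposed);
```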

Slide 32

Slide 32 text

TASK: MATCH UNICODE LETTERS
Use the unicode property L to match any letter in the string "English, Русский, 中文".

    $pattern = '()';
    $result = preg_match_all(
        $pattern, 'English, Русский, 中文', $matches
    );
    if ($result && count($matches[0]) == 16) {
        echo 'SUCCESS';
    } else {
        echo 'FAIL';
    }

Slide 33

Slide 33 text

TASK: MATCH CYRILLIC LETTERS
Match any Cyrillic letter in the subject.

    $pattern = '()';
    $result = preg_match_all(
        $pattern, 'English, Русский, 中文', $matches
    );
    if ($result && count($matches[0]) == 7) {
        echo 'SUCCESS';
    } else {
        echo 'FAIL';
    }

Slide 34

Slide 34 text

GROUPS
(...) - captured group
(?<group_name>...) - named group
(?:...) - group without capture
((?i)...), (?i:...) - group modifiers
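
A sketch of the group types with invented subjects. The group names key and value are examples only.

```php
<?php
// Named groups show up under their name and under their index.
preg_match('/(?<key>\w+)=(?<value>\w+)/', 'lang=php', $pair);
// $pair['key'] is 'lang', $pair['value'] is 'php'

// (?:...) groups for the quantifier without adding to $matches.
preg_match('/(?:ab)+(c)/', 'ababc', $plain);
// $plain is ['ababc', 'c']

// (?i:...) makes only the b case insensitive.
$inner = preg_match('/a(?i:b)c/', 'aBc');
$outer = preg_match('/a(?i:b)c/', 'ABc');
```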

Slide 35

Slide 35 text

TASK: MATCH A DATE
Match a date in the format "YYYY-MM-DD". Capture each part into a named group (year, month, day).

    $pattern = '()';
    if (
        preg_match($pattern, '2017-02-27', $match) &&
        (isset($match['year']) && $match['year'] == '2017') &&
        (isset($match['month']) && $match['month'] == '02') &&
        (isset($match['day']) && $match['day'] == '27')
    ) {
        echo "SUCCESS\n";
    } else {
        echo "FAIL\n";
    }

Slide 36

Slide 36 text

TASK: VALIDATE CONSECUTIVE UGHS
Validate that the string contains 3 consecutive "ugh"s.

    $pattern = '';
    $subjects = [
        'ughughugh' => TRUE,
        'ughughughugh' => TRUE,
        'ughugahugh' => FALSE,
        "ughughugah" => FALSE,
        "ughughughugah" => TRUE
    ];
    foreach ($subjects as $subject => $shouldMatch) {
        if ($shouldMatch == preg_match($pattern, $subject)) {
            echo "SUCCESS\n";
        } else {
            echo "FAIL\n";
        }
    }

Slide 37

Slide 37 text


Slide 38

Slide 38 text

ALTERNATIVES
| - alternative patterns
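
One thing worth trying with alternatives is how they interact with anchors. In the sketch below (invented subjects) the | splits the whole pattern, so a group is needed to limit its reach.

```php
<?php
// Read as "(^cat) or (dog$)": catfish starts with cat, so it matches.
$loose = preg_match('/^cat|dog$/', 'catfish');

// The group confines the alternatives; both anchors now apply to each.
$strict   = preg_match('/^(?:cat|dog)$/', 'catfish');
$strictOk = preg_match('/^(?:cat|dog)$/', 'dog');
```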

Slide 39

Slide 39 text

TASK: VALIDATE TITLE AND NAME
Match strings that start with a title ('Mr.', 'Ms.', 'Mrs.'), followed by a space and a string that contains at least one letter.

    $pattern = '()';
    $subjects = [
        'Mr. Doe' => TRUE,
        'Mrs. Jane Doe' => TRUE,
        'Ms. Marple' => TRUE,
        'Mr. ' => FALSE,
        "Mrs. 1" => FALSE,
        "1. Mr. Doe" => FALSE
    ];
    foreach ($subjects as $subject => $shouldMatch) {
        if ($shouldMatch == preg_match($pattern, $subject)) {
            echo "SUCCESS\n";
        } else {
            echo "FAIL\n";
        }
    }

Slide 40

Slide 40 text

FORMAT AND COMMENT
Modifier x allows formatting
# - single line comment
(?#...) - comment group
\Q...\E - remove special meaning
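
A short sketch of the three features. The time pattern and the group names hours and minutes are invented for illustration.

```php
<?php
// With x, whitespace is ignored and # starts a comment
// that runs to the end of the line.
$time = '/
    ^ (?<hours> \d{2} )     # two digits
    :                       # literal separator
    (?<minutes> \d{2} ) $   # two digits
/x';
preg_match($time, '09:30', $parts);

// A comment group works without the x modifier.
$commented = preg_match('/\d+(?# one or more digits)/', '42');

// Between \Q and \E the + is a plain character.
$quoted   = preg_match('/^\Q1+1=2\E$/', '1+1=2');
// Without them, + is a quantifier and the subject no longer fits.
$unquoted = preg_match('/^1+1=2$/', '1+1=2');
```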

Slide 41

Slide 41 text

EXAMPLE: FORMAT AND COMMENT

    $pattern = '(/
      (?:[a-zA-Z\\d_-]+\\.) #title
      (?<mode>media|download|thumb)\\. # mode
      (?:(?<preview>preview)\\.)? # is preview
      (?<...>
        (?<id>[A-Fa-f\\d]{32}) #id
        (?:v(?<version>\\d+))? #version
        (?:\\.[a-zA-Z\\d]+)? #extension
      )
    $)Dix';

Slide 42

Slide 42 text

BACK REFERENCES
\1, \g{1} - reference group by index
(?P=name), \g{name} - reference group by name
\g{-1} - relative group reference
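
The reference styles can be compared on a pattern that looks for a doubled character. Subjects and the group name ch are invented for this sketch.

```php
<?php
// By index: the same word character twice in a row.
preg_match('/(\w)\1/', 'hello', $byIndex);

// By name.
preg_match('/(?<ch>\w)(?P=ch)/', 'hello', $byName);

// Relative: -1 is the group that was opened most recently.
preg_match('/(\w)\g{-1}/', 'hello', $relative);

// A closing tag has to repeat the name of the opening tag.
$balanced   = preg_match('#<(\w+)>.*?</\1>#', '<b>bold</b>');
$unbalanced = preg_match('#<(\w+)>.*?</\1>#', '<b>bold</i>');
```

A back reference repeats the text that the group matched, not the group's pattern.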

Slide 43

Slide 43 text

TASK: VALIDATE DRUNKEN NUMBERS
Validate strings that consist of any count of the same digit (11, 444, ...).

    $pattern = '()';
    $subjects = [
        '7' => TRUE,
        '11' => TRUE,
        '444' => TRUE,
        '8888' => TRUE,
        '12' => FALSE,
        "456" => FALSE,
        "ugh" => FALSE
    ];
    foreach ($subjects as $subject => $shouldMatch) {
        if ($shouldMatch == preg_match($pattern, $subject)) {
            echo "SUCCESS\n";
        } else {
            echo "FAIL\n";
        }
    }

Slide 44

Slide 44 text

TEMPLATES
(?(DEFINE)(?<name>...)) - define a template
(?&name) - use a template
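
As a sketch of the syntax, the pattern below defines a template for one hexadecimal byte and reuses it six times to check a MAC-address-like string. The template name byte and the subjects are invented for this example.

```php
<?php
$pattern = '/
    (?(DEFINE) (?<byte> [0-9A-Fa-f]{2} ) )  # template only, consumes nothing
    ^ (?&byte) (?: : (?&byte) ){5} $        # six bytes separated by colons
/Dx';

$valid    = preg_match($pattern, '00:1A:2b:3C:4d:5E');
$tooShort = preg_match($pattern, '00:1A:2b:3C:4d');
$notHex   = preg_match($pattern, '00:1A:2b:3C:4d:5G');
```

Unlike a back reference, (?&name) runs the template's pattern again, so each byte may have a different value.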

Slide 45

Slide 45 text

TASK: VALIDATE IPV4
Define a template that matches a number between 0 and 255. Use the template to match an IP.

    $pattern = '';
    $subjects = [
        '127.0.0.1' => TRUE,
        '0.0.0.0' => TRUE,
        '255.255.255.0' => TRUE,
        '1.1.1.256' => FALSE,
        "1.1.1.a" => FALSE,
        "-1.1.1.1" => FALSE
    ];
    foreach ($subjects as $subject => $shouldMatch) {
        if ($shouldMatch == preg_match($pattern, $subject)) {
            echo "SUCCESS\n";
        } else {
            echo "FAIL\n";
        }
    }

Slide 46

Slide 46 text

PATTERN: IPV4

    $pattern = '(^
      (?:(?&number)\\.){3}(?&number)
      (?(DEFINE)
        (?<number>
          25[0-5]|     # 250 - 255
          2[0-4]\\d|   # 200 - 249
          1?\\d{1,2}   # 0 - 199
        )
      )
    $)Dx';

Slide 47

Slide 47 text

ASSERTIONS
(?=...), (?!...) - Lookahead
(?<=...), (?<!...) - Lookbehind
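
Assertions check the text next to a match without making it part of the match. A sketch with invented subjects, using the positive and negative forms of lookahead and lookbehind:

```php
<?php
$amounts = '100 EUR, 200 USD';

// Lookahead: digits that are followed by " EUR".
preg_match('/\d+(?= EUR)/', $amounts, $ahead);

// Negative lookahead: a whole number that is not followed by " EUR".
preg_match('/\b\d+\b(?! EUR)/', $amounts, $notAhead);

$prices = 'cost $42 or 17';

// Lookbehind: digits directly preceded by a dollar sign.
preg_match('/(?<=\$)\d+/', $prices, $behind);

// Negative lookbehind: a number not preceded by a dollar sign.
preg_match('/(?<!\$)\b\d+/', $prices, $notBehind);
```

The \b in the negative variants keeps the engine from settling for a part of a number, such as the 10 in 100.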

Slide 48

Slide 48 text

LINKS
http://www.php.net/manual/de/pcre.pattern.php
https://www.hackerrank.com/
https://regex101.com/
http://www.regular-expressions.info/
http://www.rexegg.com/