PCRE - Matching Patterns

PCRE introduction, interactive workshop. A wireless keyboard is passed around and attendees solve the tasks on the slides.

Thomas Weinert

March 04, 2017

Transcript

  1. 8.

    $MATCHES

    preg_match() - array of the matched groups
    preg_match_all() - array of arrays, depending on the flag:
        PREG_PATTERN_ORDER (default) - an array for each group in the pattern
        PREG_SET_ORDER - an array for each match
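A minimal sketch of the three $matches layouts described above. The subject 'a1 b2 c3' and the pattern are made up for illustration; note that the outer parentheses of '(([a-z])(\d))' are the delimiters, so the pattern has two groups.

```php
<?php
// Made-up subject and pattern: a letter (group 1) followed by a digit (group 2).
$subject = 'a1 b2 c3';
$pattern = '(([a-z])(\d))';

// preg_match(): first match only; [0] is the whole match, [1] and [2] the groups.
preg_match($pattern, $subject, $single);
// $single === ['a1', 'a', '1']

// preg_match_all() with the default PREG_PATTERN_ORDER:
// one array per group, listing that group's value for every match.
$count = preg_match_all($pattern, $subject, $byGroup);
// $byGroup[0] === ['a1', 'b2', 'c3'], $byGroup[1] === ['a', 'b', 'c']

// PREG_SET_ORDER: one array per match, holding all groups of that match.
preg_match_all($pattern, $subject, $byMatch, PREG_SET_ORDER);
// $byMatch[1] === ['b2', 'b', '2']
```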
  2. 9.

    PATTERN ARGUMENT

        /string/u
        │  │   │└ Modifier
        │  │   └ Delimiter
        │  └ Pattern
        └ Delimiter
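The delimiter can be chosen freely, which matters when the pattern itself contains "/". This sketch (made-up subject) runs the same kind of check with three delimiters and uses preg_quote() to escape a literal string; the variable names are illustrative only.

```php
<?php
$subject = 'https://nevercodealone.de';

// With "/" as delimiter, every "/" inside the pattern must be escaped.
$slash = preg_match('/^https:\/\//', $subject);
// With "#" as delimiter, the slashes need no escaping.
$hash = preg_match('#^https://#', $subject);
// Bracket-style delimiters are allowed too; the modifier follows the closing ")".
$paren = preg_match('(^HTTPS://)i', $subject);

// preg_quote() escapes special characters (and the given delimiter)
// so arbitrary text can be matched literally.
$literal = preg_quote('nevercodealone.de', '/');
$quoted = preg_match('/' . $literal . '$/', $subject);
```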
  3. 10.

    TASK: MATCH A STRING

    Match the string nevercodealone. This is case sensitive.

        $pattern = '';
        $result = preg_match_all(
            $pattern, 'https://nevercodealone.de', $matches
        );
        if ($result && count($matches[0]) == 1) {
            echo 'SUCCESS';
        } else {
            echo 'FAIL';
        }
  4. 12.

    MODIFIER

    U - ungreedy mode
    i - case insensitive
    u - UTF-8 mode
    x - modifies whitespace behaviour
    s - modifies dot behaviour
    m - modifies anchor behaviour
    D - modifies behaviour of $ anchor
    ...
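A small sketch of two of the modifiers listed above, i and U, on a made-up HTML snippet. It borrows the "+" quantifier, which is covered later in the deck.

```php
<?php
$subject = '<b>bold</b> and <b>more</b>';

// Greedy by default: .+ runs as far as possible, so the match spans both tags.
preg_match('(<b>.+</b>)', $subject, $greedy);
// U (ungreedy) inverts greediness: .+ now stops at the first possible </b>.
preg_match('(<b>.+</b>)U', $subject, $ungreedy);
// i makes letters match regardless of case.
$caseless = preg_match('(<B>)i', $subject);
```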
  5. 13.

    TASK: MATCH A STRING CASE INSENSITIVE

    The modifier i allows case-insensitive matches. Match the string code. This is case insensitive.

        $pattern = '';
        $result = preg_match_all(
            $pattern, 'code CODE Code', $matches
        );
        if ($result && count($matches[0]) == 3) {
            echo 'SUCCESS';
        } else {
            echo 'FAIL';
        }
  6. 14.

    THE DOT

    Matches anything except a newline.
    Matches anything if the modifier "s" is set.
    Escape . with \ to match an actual "." character.
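A short sketch of the three dot rules above, using made-up subjects (the "*" quantifier is introduced later in the deck).

```php
<?php
$subject = "first line\nsecond line";

// Without s, "." does not match "\n", so the match ends with line one.
preg_match('(first.*)', $subject, $withoutS);
// With s, "." matches "\n" as well.
preg_match('(first.*)s', $subject, $withS);
// "\." matches only a literal dot, "." matches any character.
$literalDot = preg_match_all('(\.)', 'a.b.c');
$anyChar = preg_match_all('(.)', 'a.b.c');
```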
  7. 15.

    TASK: MATCH ANYTHING BUT NEWLINES

    Match the string cc.cc.cc.cc, where "c" can be any character except a newline.

        $pattern = '()';
        $result = preg_match_all(
            $pattern,
            "ab.cd.ef.gh\na\n.b\n.d\n.e\n\nabcdefghiklm",
            $matches
        );
        if ($result && count($matches[0]) == 1) {
            echo 'SUCCESS';
        } else {
            echo 'FAIL';
        }
  8. 17.

    TASK: MATCH DIGITS AND NON-DIGITS

    The escape sequence \d matches any digit (0-9). \D matches anything except a digit. Match a string with the structure xxXxxXxxxx, where "x" represents a digit and "X" a non-digit.

        $pattern = '()';
        $result = preg_match_all(
            $pattern, "12.34.5678\n123456789\nab.cd.efgh", $matches
        );
        if ($result && count($matches[0]) == 1) {
            echo 'SUCCESS';
        } else {
            echo 'FAIL';
        }
  9. 18.

    ANCHORS

    Anchor your pattern to the start and/or end of the subject.

    ^ - string start
    $ - string end
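A minimal sketch of how ^ and $ change what counts as a match; the subjects are made up. The last line shows the default behaviour the later zip-code slide deals with: $ also accepts a final linefeed.

```php
<?php
// preg_match() returns 1 for a match and 0 otherwise.
$anywhere = preg_match('(\d+)', 'abc 123 def');       // digits somewhere: 1
$wholeString = preg_match('(^\d+$)', 'abc 123 def');  // digits only: 0
$digitsOnly = preg_match('(^\d+$)', '12345');         // 1
$startsWith = preg_match('(^abc)', 'abc 123 def');    // 1
$endsWith = preg_match('(def$)', 'abc 123 def');      // 1
// By default $ also matches right before a final "\n".
$trailingLf = preg_match('(^\d+$)', "12345\n");       // 1
```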
  10. 19.

    TASK: VALIDATE STRING START

    The ^ anchors the pattern to the string start. Validate that the string starts with a digit.

        $pattern = '()';
        $subjects = [
            '1. match' => TRUE, '2. match' => TRUE, '42' => TRUE,
            'no match' => FALSE, "a 345 b" => FALSE, "end 3" => FALSE
        ];
        foreach ($subjects as $subject => $shouldMatch) {
            if ($shouldMatch == preg_match($pattern, $subject)) {
                echo "SUCCESS\n";
            } else {
                echo "FAIL\n";
            }
        }
  11. 20.

    TASK: VALIDATE STRING END

    The $ anchors the pattern to the string end. Validate that the string ends with a digit.

        $pattern = '()';
        $subjects = [
            'match 1' => TRUE, 'match 2' => TRUE, '42' => TRUE, "21\n" => TRUE,
            'no match' => FALSE, "a 345 b" => FALSE, "3 start" => FALSE
        ];
        foreach ($subjects as $subject => $shouldMatch) {
            if ($shouldMatch == preg_match($pattern, $subject)) {
                echo "SUCCESS\n";
            } else {
                echo "FAIL\n";
            }
        }
  12. 21.

    TASK: VALIDATE A GERMAN ZIP CODE

    The modifier D makes sure that a linefeed at the end of the subject is not ignored. Validate that the subject is a German zip code. It consists of 5 digits.

        $pattern = '()';
        $subjects = [
            '01234' => TRUE, '50670' => TRUE, '40213' => TRUE,
            'abcdef' => FALSE, "50670\n" => FALSE, "123456" => FALSE
        ];
        foreach ($subjects as $subject => $shouldMatch) {
            if ($shouldMatch == preg_match($pattern, $subject)) {
                echo "SUCCESS\n";
            } else {
                echo "FAIL\n";
            }
        }
  13. 22.

    MODIFIER AND ALTERNATIVES

    Modifier m - line anchors (^ and $ match at the start and end of every line)
    \A - string start
    \Z - string end, also matches before a final linefeed
    \z - string end, does not match before a final linefeed
    \b - word boundary
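A sketch of the m modifier and the alternative anchors on a made-up two-line subject that ends with a linefeed. The \w shorthand (word character) is used here for brevity.

```php
<?php
$subject = "foo 1\nbar 2\n";

// Without m, ^ matches only at the very start of the subject.
$noM = preg_match_all('(^\w+)', $subject);           // "foo"
// With m, ^ also matches after every "\n".
$withM = preg_match_all('(^\w+)m', $subject);        // "foo", "bar"
// \z and \Z ignore m; the subject ends with "\n".
$zLower = preg_match('(\d\z)m', $subject);           // 0: "\n" comes after the "2"
$zUpper = preg_match('(\d\Z)m', $subject);           // 1: \Z accepts a final "\n"
// \b sits between a word character and a non-word character.
$words = preg_match_all('(\bbar\b)', 'bar barn crowbar bar.');
```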
  14. 23.

    CHARACTER CLASSES

    Square brackets [] define a class of characters; ranges such as a-z are allowed
    ^ as the first character negates the class
    Many special characters lose their function inside the brackets
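A short sketch of ranges, negation and literal special characters inside a class; the subject is made up and the counts come from preg_match_all().

```php
<?php
$subject = 'PHP 8.3, PCRE2!';

// [0-9] is a range: any digit.
$digits = preg_match_all('([0-9])', $subject);      // 8, 3, 2
// [^...] negates: anything that is neither an uppercase letter nor a space.
$other = preg_match_all('([^A-Z ])', $subject);     // 8 . 3 , 2 !
// Inside the brackets "." is just a dot, not "any character".
$punct = preg_match_all('([.!,])', $subject);       // . , !
```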
  15. 24.

    TASK: MATCH VOWELS

    Match all the vowels (aeiou) in the string.

        $pattern = '()';
        $result = preg_match_all(
            $pattern, 'https://nevercodealone.de', $matches
        );
        if ($result && count($matches[0]) == 8) {
            echo 'SUCCESS';
        } else {
            echo 'FAIL';
        }
  16. 25.

    TASK: MATCH NON-VOWELS

    Match all the non-vowels in the string.

        $pattern = '()';
        $result = preg_match_all(
            $pattern, 'https://nevercodealone.de', $matches
        );
        if ($result && count($matches[0]) == 17) {
            echo 'SUCCESS';
        } else {
            echo 'FAIL';
        }
  17. 26.

    TASK: VALIDATE HEXADECIMAL BYTES

    Validate that the string consists of two characters. Each character can be a digit or a letter between a and f, in upper or lower case.

        $pattern = '()';
        $subjects = [
            '01' => TRUE, '0f' => TRUE, 'FA' => TRUE,
            'az' => FALSE, "foo" => FALSE, "123" => FALSE
        ];
        foreach ($subjects as $subject => $shouldMatch) {
            if ($shouldMatch == preg_match($pattern, $subject)) {
                echo "SUCCESS\n";
            } else {
                echo "FAIL\n";
            }
        }
  18. 27.

    QUANTIFIER

    How often will it be matched?

    * - any count
    ? - maximum of 1
    + - minimum of 1
    {n} - exactly n
    {n,m} - minimum of n, maximum of m
    {n,} - minimum of n
    {0,m} - maximum of m
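A sketch that applies several of the quantifiers above to the same made-up subject; the comments give the first match. The U line ties back to the ungreedy modifier from the modifier slide.

```php
<?php
$subject = 'aaaa';

preg_match('(a*)', $subject, $star);        // any count: "aaaa"
preg_match('(a?)', $subject, $optional);    // maximum of 1: "a"
preg_match('(a+)', $subject, $plus);        // minimum of 1: "aaaa"
preg_match('(a{2,3})', $subject, $range);   // 2 to 3, as many as possible: "aaa"
preg_match('(a{2,3})U', $subject, $lazy);   // with U, as few as possible: "aa"
$exact = preg_match('(^a{3}$)', $subject);  // exactly 3, whole subject: 0
```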
  19. 28.

    TASK: VALIDATE A GERMAN ZIP CODE

    The {n} syntax allows you to match a fixed number of repetitions. Validate that the subject is a German zip code. It consists of 5 digits.

        $pattern = '()';
        $subjects = [
            '01234' => TRUE, '50670' => TRUE, '40213' => TRUE,
            'abcdef' => FALSE, "50670\n" => FALSE, "123456" => FALSE
        ];
        foreach ($subjects as $subject => $shouldMatch) {
            if ($shouldMatch == preg_match($pattern, $subject)) {
                echo "SUCCESS\n";
            } else {
                echo "FAIL\n";
            }
        }
  20. 29.

    TASK: VALIDATE A LANGUAGE CODE

    The {n,m} syntax allows you to specify a minimum and a maximum number of repetitions. Validate that the subject is a 2 or 3 letter language code.

        $pattern = '()';
        $subjects = [
            'en' => TRUE, 'de' => TRUE, 'eng' => TRUE, 'deu' => TRUE,
            '123' => FALSE, "en-US" => FALSE
        ];
        foreach ($subjects as $subject => $shouldMatch) {
            if ($shouldMatch == preg_match($pattern, $subject)) {
                echo "SUCCESS\n";
            } else {
                echo "FAIL\n";
            }
        }
  21. 30.

    TASK: VALIDATE AN INTEGER

    ? matches one or none. + matches at least one repetition. Validate an integer, including an optional leading sign.

        $pattern = '()';
        $subjects = [
            '1' => TRUE, '123' => TRUE, '+123' => TRUE, '-456' => TRUE,
            '1.1' => FALSE, "abc" => FALSE, "123 456" => FALSE
        ];
        foreach ($subjects as $subject => $shouldMatch) {
            if ($shouldMatch == preg_match($pattern, $subject)) {
                echo "SUCCESS\n";
            } else {
                echo "FAIL\n";
            }
        }
  22. 31.

    UNICODE

    The modifier u activates Unicode (UTF-8) mode.

    \X - extended Unicode grapheme sequence
    \p{xx}, \px - character with Unicode property xx (the short form takes a one-letter property, e.g. \pL)
    \p{^xx}, \P{xx} - character without Unicode property xx
    \p{script} - character from a script
    \x{FFFF} - code point
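A sketch of how the u modifier and the Unicode escapes count the same made-up subject differently. The subject is written with \u{...} escapes so the byte sequence does not depend on the source file encoding: it is "Cafe" with a combining acute accent (U+0301) on the "e", a space, and the Cyrillic word "Мир".

```php
<?php
// "Café Мир", with the accent as a separate combining code point.
$subject = "Cafe\u{0301} \u{041C}\u{0438}\u{0440}";

$bytes = preg_match_all('(.)s', $subject);        // without u: one match per byte
$codePoints = preg_match_all('(.)su', $subject);  // with u: one match per code point
$graphemes = preg_match_all('(\X)u', $subject);   // \X: "e" + accent count as one
$letters = preg_match_all('(\pL)u', $subject);    // \pL: letters only, not the accent
$cyrillic = preg_match_all('(\p{Cyrillic})u', $subject);
$acute = preg_match('(\x{0301})u', $subject);     // a single code point by hex value
```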
  23. 32.

    TASK: MATCH UNICODE LETTERS

    Use the Unicode property L to match any letter in the string "English, Русский, 中文".

        $pattern = '()';
        $result = preg_match_all(
            $pattern, 'English, Русский, 中文', $matches
        );
        if ($result && count($matches[0]) == 16) {
            echo 'SUCCESS';
        } else {
            echo 'FAIL';
        }
  24. 33.

    TASK: MATCH CYRILLIC LETTERS

    Match any Cyrillic letter in the subject.

        $pattern = '()';
        $result = preg_match_all(
            $pattern, 'English, Русский, 中文', $matches
        );
        if ($result && count($matches[0]) == 7) {
            echo 'SUCCESS';
        } else {
            echo 'FAIL';
        }
  25. 34.

    GROUPS

    (...) - captured group
    (?<group_name>...) - named group
    (?:...) - group without capture
    ((?i)...), (?i:...) - group modifiers
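A sketch combining the group types above on a made-up version string. Named groups show up in $m under both their name and their number; the (?:...) group adds no entry, and (?i:...) applies i only inside that group.

```php
<?php
$subject = 'Version 8.3';

// (?i:version) - case-insensitive only inside the group
// (?<major>...), (?<minor>...) - named groups
// (?:\.(?<minor>...))? - optional group without its own capture
$found = preg_match(
    '((?i:version)\s(?<major>\d+)(?:\.(?<minor>\d+))?)',
    $subject,
    $m
);
// $m contains: 0, 'major', 1, 'minor', 2
```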
  26. 35.

    TASK: MATCH A DATE

    Match a date in the format "YYYY-MM-DD". Capture each part into a named group (year, month, day).

        $pattern = '()';
        if (
            preg_match($pattern, '2017-02-27', $match) &&
            (isset($match['year']) && $match['year'] == '2017') &&
            (isset($match['month']) && $match['month'] == '02') &&
            (isset($match['day']) && $match['day'] == '27')
        ) {
            echo "SUCCESS\n";
        } else {
            echo "FAIL\n";
        }
  27. 36.

    TASK: VALIDATE CONSECUTIVE UGHS

    Validate that the string contains 3 consecutive "ugh"s.

        $pattern = '';
        $subjects = [
            'ughughugh' => TRUE, 'ughughughugh' => TRUE, 'ughugahugh' => FALSE,
            "ughughugah" => FALSE, "ughughughugah" => TRUE
        ];
        foreach ($subjects as $subject => $shouldMatch) {
            if ($shouldMatch == preg_match($pattern, $subject)) {
                echo "SUCCESS\n";
            } else {
                echo "FAIL\n";
            }
        }
  28. 37.
  29. 39.

    TASK: VALIDATE TITLE AND NAME

    Match strings that start with a title ('Mr.', 'Ms.', 'Mrs.'), followed by a space and a string that contains at least one letter.

        $pattern = '()';
        $subjects = [
            'Mr. Doe' => TRUE, 'Mrs. Jane Doe' => TRUE, 'Ms. Marple' => TRUE,
            'Mr. ' => FALSE, "Mrs. 1" => FALSE, "1. Mr. Doe" => FALSE
        ];
        foreach ($subjects as $subject => $shouldMatch) {
            if ($shouldMatch == preg_match($pattern, $subject)) {
                echo "SUCCESS\n";
            } else {
                echo "FAIL\n";
            }
        }
  30. 40.

    FORMAT AND COMMENT

    Modifier x allows formatting
    # - single line comment
    (?#...) - comment group
    \Q...\E - remove special meaning
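A small sketch of the three comment and quoting features above, using made-up subjects: a formatted date pattern with the x modifier, an inline (?#...) comment without x, and \Q...\E to match "+" and "?" literally.

```php
<?php
// With x, whitespace in the pattern is ignored and "#" comments run to end of line.
$pattern = '(
  ^ \d{4}   # year
  - \d{2}   # month
  - \d{2}   # day
  $
)Dx';
$date = preg_match($pattern, '2017-02-27');

// (?#...) is an inline comment that also works without x.
$inline = preg_match('(^\d+(?# digits only)$)', '42');

// Between \Q and \E, "+" and "?" lose their special meaning.
$literal = preg_match('(^\Q1+1=2?\E$)', '1+1=2?');
$notLiteral = preg_match('(^\Q1+1=2?\E$)', '11=2');
```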
  31. 41.

    EXAMPLE: FORMAT AND COMMENT

        $pattern = '(/
          (?:[a-zA-Z\\d_-]+\\.)              #title
          (?<mode>media|download|thumb)\\.   # mode
          (?:(?<preview>preview)\\.)?        # is preview
          (?<media_uri>
            (?<id>[A-Fa-f\\d]{32})           #id
            (?:v(?<version>\\d+))?           #version
            (?:\\.[a-zA-Z\\d]+)?             #extension
          )
        $)Dix';
  32. 42.

    BACK REFERENCES

    \1, \g{1} - reference group by index
    (?P=name), \g{name} - reference group by name
    \g{-1} - relative group reference
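A sketch of each back reference form above, on made-up subjects. A back reference matches the same text that the referenced group matched, not the group's pattern again.

```php
<?php
// \1: a word character followed by the same character again.
$doubled = preg_match_all('((\w)\1)', 'bookkeeper', $pairs);
// \g{quote}: the closing quote must be the same character as the opening one.
$quoted = preg_match('((?<quote>["\'])\w+\g{quote})', 'say "hi" now', $m);
// (?P=name): alternative syntax for a reference by name.
$same = preg_match('(^(?<d>\d)(?P=d)$)', '77');
// \g{-1}: the closest group opened before the reference, here (cd).
$relative = preg_match('(^(ab)(cd)\g{-1}$)', 'abcdcd');
```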
  33. 43.

    TASK: VALIDATE DRUNKEN NUMBERS

    Validate strings that consist of any number of repetitions of the same digit (11, 444, ...).

        $pattern = '()';
        $subjects = [
            '7' => TRUE, '11' => TRUE, '444' => TRUE, '8888' => TRUE,
            '12' => FALSE, "456" => FALSE, "ugh" => FALSE
        ];
        foreach ($subjects as $subject => $shouldMatch) {
            if ($shouldMatch == preg_match($pattern, $subject)) {
                echo "SUCCESS\n";
            } else {
                echo "FAIL\n";
            }
        }
  34. 45.

    TASK: VALIDATE IPV4

    Define a template that matches a number between 0 and 255. Use the template to match an IPv4 address.

        $pattern = '';
        $subjects = [
            '127.0.0.1' => TRUE, '0.0.0.0' => TRUE, '255.255.255.0' => TRUE,
            '1.1.1.256' => FALSE, "1.1.1.a" => FALSE, "-1.1.1.1" => FALSE
        ];
        foreach ($subjects as $subject => $shouldMatch) {
            if ($shouldMatch == preg_match($pattern, $subject)) {
                echo "SUCCESS\n";
            } else {
                echo "FAIL\n";
            }
        }
  35. 46.

    PATTERN: IPV4

        $pattern = '(^
          (?:(?&number)\\.){3}(?&number)
          (?(DEFINE)
            (?<number>
              25[0-5]|      # 250 - 255
              2[0-4]\\d|    # 200 - 249
              1?\\d{1,2}    # 0 - 199
            )
          )
        $)Dx';
  36. 47.

    ASSERTIONS

    (?=...), (?!...) - Lookahead
    (?<=...), (?<!...) - Lookbehind
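A sketch of the four assertion forms on made-up subjects. Assertions check what comes before or after the current position without making it part of the match.

```php
<?php
$subject = '100 EUR, 200 USD, 300 EUR';

// (?=...) positive lookahead: amounts followed by " EUR"; " EUR" is not captured.
preg_match_all('(\d+(?= EUR))', $subject, $eur);
// (?<=...) positive lookbehind: three capitals preceded by a digit and a space.
preg_match_all('((?<=\d )[A-Z]{3})', $subject, $currencies);
// (?<!...) negative lookbehind: "bar" not preceded by "foo".
preg_match_all('((?<!foo)bar)', 'foobar bar', $bars);
// (?!...) negative lookahead: "foo" not followed by "bar".
$notFollowed = preg_match_all('(foo(?!bar))', 'foobar foobaz');
```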