PCRE - Matching Patterns

PCRE introduction as an interactive workshop: a wireless keyboard is passed around and attendees solve the tasks on the slides.

Thomas Weinert

March 04, 2017

Transcript

  1. PCRE MATCHING PATTERNS @ThomasWeinert

  2. TASK REPOSITORY bitbucket.org/thomasweinert/workshop-pcre-tasks

  3. MATCHING: PHP FUNCTIONS
    preg_match
    preg_match_all

  4. PREG_MATCH
    Find first match

        preg_match($pattern, $subject);
        preg_match($pattern, $subject, $matches);
        preg_match($pattern, $subject, $matches, $flags, $offset);
  5. PREG_MATCH - RETURN VALUES
    Match count - 0 or 1
    FALSE for errors
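
The three return values can be observed directly. This is a minimal sketch; the subjects and patterns are hypothetical and chosen only for illustration.

```php
<?php
// Hypothetical subject, used only for illustration.
$subject = 'PCRE workshop 2017';

// A match: returns 1 and fills $matches with the first match.
$found = preg_match('/\d+/', $subject, $matches);

// No match: returns 0.
$missing = preg_match('/xyz/', $subject);

// Invalid pattern (unterminated character class): returns FALSE
// and raises a warning, suppressed here with @.
$error = @preg_match('/[/', $subject);

// The fifth argument starts the search at a byte offset.
preg_match('/\d+/', 'a1 b22', $fromOffset, 0, 2);

var_dump($found, $matches[0], $missing, $error, $fromOffset[0]);
```

Comparing with `===` helps here, because both 0 and FALSE are falsy in a plain `if`.
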
  6. PREG_MATCH_ALL
    Find all matches

        preg_match_all($pattern, $subject);
        preg_match_all($pattern, $subject, $matches);
        preg_match_all($pattern, $subject, $matches, $flags, $offset);
  7. PREG_MATCH_ALL - RETURN VALUES
    Match count - 0 to n
    FALSE for errors
  8. $MATCHES
    preg_match() - array, matched groups
    preg_match_all() - array with
      PREG_PATTERN_ORDER - an array for each group in the pattern
      PREG_SET_ORDER - an array for each match
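
The difference between the two orderings is easiest to see side by side. A small sketch with a hypothetical subject and pattern:

```php
<?php
// Hypothetical subject and pattern, for illustration only.
$subject = 'a=1, b=2';
$pattern = '/(\w)=(\d)/';

// Default flag PREG_PATTERN_ORDER: one array per group.
$count = preg_match_all($pattern, $subject, $byGroup);
// $byGroup[0] holds the full matches, $byGroup[1] the first group, ...

// PREG_SET_ORDER: one array per match.
preg_match_all($pattern, $subject, $byMatch, PREG_SET_ORDER);
// $byMatch[0] holds everything about the first match, ...

print_r($byGroup);
print_r($byMatch);
```
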
  9. PATTERN ARGUMENT

        /string/u
        ││     │└ Modifier
        ││     └ Delimiter
        │└ Pattern
        └ Delimiter
  10. TASK: MATCH A STRING
    Match the string nevercodealone. This is case sensitive.

        $pattern = '';
        $result = preg_match_all(
          $pattern, 'https://nevercodealone.de', $matches
        );
        if ($result && count($matches[0]) == 1) {
          echo 'SUCCESS';
        } else {
          echo 'FAIL';
        }
  11. TRY DIFFERENT DELIMITERS
    ASCII letters and digits do NOT work.
    Brackets work too, as matching pairs such as ( and ).
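
A short sketch of how the delimiter choice affects escaping. The subject is borrowed from the tasks; the patterns are illustrative and are not task solutions.

```php
<?php
$subject = 'https://nevercodealone.de';

// With slash delimiters, slashes inside the pattern must be escaped.
$withSlashes = preg_match('/https:\/\//', $subject);

// Other punctuation or paired brackets avoid that escaping.
$withHash   = preg_match('#https://#', $subject);
$withParens = preg_match('(https://)', $subject);
$withBraces = preg_match('{https://}', $subject);

// A letter as delimiter is rejected: FALSE plus a warning (suppressed with @).
$withLetter = @preg_match('ahttps://a', $subject);

var_dump($withSlashes, $withHash, $withParens, $withBraces, $withLetter);
```
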
  12. MODIFIER
    U - ungreedy mode
    i - case insensitive
    u - utf-8 mode
    x - modifies whitespace behaviour
    s - modifies dot behaviour
    m - modifies anchor behaviour
    D - modifies behaviour of $ anchor
    ...
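
Two of these modifiers in action, as a minimal sketch with hypothetical subjects:

```php
<?php
// i: case insensitive matching.
$caseSensitive   = preg_match('/pcre/', 'PCRE');
$caseInsensitive = preg_match('/pcre/i', 'PCRE');

// U: quantifiers become ungreedy and stop at the first possible end.
preg_match('/<.+>/', '<a><b>', $greedy);
preg_match('/<.+>/U', '<a><b>', $ungreedy);

var_dump($caseSensitive, $caseInsensitive, $greedy[0], $ungreedy[0]);
```
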
  13. TASK: MATCH A STRING CASE INSENSITIVE
    The modifier i allows case insensitive matches.
    Match the string code. This is case insensitive.

        $pattern = '';
        $result = preg_match_all(
          $pattern, 'code CODE Code', $matches
        );
        if ($result && count($matches[0]) == 3) {
          echo 'SUCCESS';
        } else {
          echo 'FAIL';
        }
  14. THE DOT
    Matches anything except a newline
    Matches anything if modifier "s" is set
    Escape . with \ to match an actual .
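
The three statements about the dot can be checked with a few calls. The subjects are hypothetical.

```php
<?php
$subject = "a\nb";

// Without s, the dot does not match the newline between a and b.
$default = preg_match('/a.b/', $subject);

// With s, the dot matches the newline as well.
$dotAll = preg_match('/a.b/s', $subject);

// An escaped dot matches only literal dots, an unescaped dot every character.
$escaped   = preg_match_all('/\./', 'a.b.c');
$unescaped = preg_match_all('/./', 'a.b.c');

var_dump($default, $dotAll, $escaped, $unescaped);
```
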
  15. TASK: MATCH ANYTHING BUT NEWLINES
    Match the string cc.cc.cc.cc. "c" can be any character except a newline.

        $pattern = '()';
        $result = preg_match_all(
          $pattern, "ab.cd.ef.gh\na\n.b\n.d\n.e\n\nabcdefghiklm", $matches
        );
        if ($result && count($matches[0]) == 1) {
          echo 'SUCCESS';
        } else {
          echo 'FAIL';
        }
  16. QUALIFIER
    What will be matched?
    Define bytes/characters that will be matched.
  17. TASK: MATCH DIGITS AND NON-DIGITS
    The qualifier \d matches any digit (0-9). The qualifier \D matches anything except a digit.
    Match a string with the structure xxXxxXxxxx. "x" represents a digit, "X" a non-digit.

        $pattern = '()';
        $result = preg_match_all(
          $pattern, "12.34.5678\n123456789\nab.cd.efgh", $matches
        );
        if ($result && count($matches[0]) == 1) {
          echo 'SUCCESS';
        } else {
          echo 'FAIL';
        }
  18. ANCHORS
    Anchor your pattern to the start and/or end of the subject.
    ^ - string start
    $ - string end
  19. TASK: VALIDATE STRING START
    The ^ anchors the pattern to the string start.
    Validate that the string starts with a digit.

        $pattern = '()';
        $subjects = [
          '1. match' => TRUE, '2. match' => TRUE, '42' => TRUE,
          'no match' => FALSE, "a 345 b" => FALSE, "end 3" => FALSE
        ];
        foreach ($subjects as $subject => $shouldMatch) {
          if ($shouldMatch == preg_match($pattern, $subject)) {
            echo "SUCCESS\n";
          } else {
            echo "FAIL\n";
          }
        }
  20. TASK: VALIDATE STRING END
    The $ anchors the pattern to the string end.
    Validate that the string ends with a digit.

        $pattern = '()';
        $subjects = [
          'match 1' => TRUE, 'match 2' => TRUE, '42' => TRUE, "21\n" => TRUE,
          'no match' => FALSE, "a 345 b" => FALSE, "3 start" => FALSE
        ];
        foreach ($subjects as $subject => $shouldMatch) {
          if ($shouldMatch == preg_match($pattern, $subject)) {
            echo "SUCCESS\n";
          } else {
            echo "FAIL\n";
          }
        }
  21. TASK: VALIDATE A GERMAN ZIP CODE
    The modifier D makes sure that a linefeed at the end of the subject is not ignored.
    Validate that the subject is a German zip code. It consists of 5 digits.

        $pattern = '()';
        $subjects = [
          '01234' => TRUE, '50670' => TRUE, '40213' => TRUE,
          'abcdef' => FALSE, "50670\n" => FALSE, "123456" => FALSE
        ];
        foreach ($subjects as $subject => $shouldMatch) {
          if ($shouldMatch == preg_match($pattern, $subject)) {
            echo "SUCCESS\n";
          } else {
            echo "FAIL\n";
          }
        }
  22. MODIFIER AND ALTERNATIVES
    Modifier m - line anchors
    \A - string start
    \Z - string end, ignore linefeed
    \z - string end, recognize linefeed
    \b - word boundary
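
How these anchors differ shows up when the subject ends with a linefeed. A minimal sketch with hypothetical subjects:

```php
<?php
$subject = "42\n";

// A plain $ also matches before a final linefeed.
$dollar = preg_match('/^\d+$/', $subject);

// With D, $ matches only at the very end of the subject.
$dollarEndOnly = preg_match('/^\d+$/D', $subject);

// \Z ignores a final linefeed, \z does not.
$upperZ = preg_match('/\A\d+\Z/', $subject);
$lowerZ = preg_match('/\A\d+\z/', $subject);

// With m, ^ and $ match at the start and end of every line.
$lineCount = preg_match_all('/^\d+$/m', "1\n22\nx3", $lines);

// \b matches between a word character and a non-word character.
$words = preg_match_all('/\bcode\b/', 'code nevercodealone code');

var_dump($dollar, $dollarEndOnly, $upperZ, $lowerZ, $lines[0], $words);
```
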
  23. CHARACTER CLASSES
    Square brackets: []
    Hyphen (-) for ranges
    Caret (^) for negative matches
    Many special characters lose their function
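
A small sketch of ranges, negation, and special characters inside a class. The subject is hypothetical.

```php
<?php
// Hypothetical subject, for illustration only.
$subject = 'v1.2 (beta)';

// A range inside the class.
$digits = preg_match_all('/[0-9]/', $subject);

// A leading ^ negates the class.
$nonDigits = preg_match_all('/[^0-9]/', $subject);

// Dot and parentheses are literal characters inside a class.
$punctuation = preg_match_all('/[.()]/', $subject);

// Letter range a to e.
$lowLetters = preg_match_all('/[a-e]/', $subject);

var_dump($digits, $nonDigits, $punctuation, $lowLetters);
```
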
  24. TASK: MATCH VOWELS
    Match all the vowels (aeiou) in the string.

        $pattern = '()';
        $result = preg_match_all(
          $pattern, 'https://nevercodealone.de', $matches
        );
        if ($result && count($matches[0]) == 8) {
          echo 'SUCCESS';
        } else {
          echo 'FAIL';
        }
  25. TASK: MATCH NON-VOWELS
    Match all the non-vowels in the string.

        $pattern = '()';
        $result = preg_match_all(
          $pattern, 'https://nevercodealone.de', $matches
        );
        if ($result && count($matches[0]) == 17) {
          echo 'SUCCESS';
        } else {
          echo 'FAIL';
        }
  26. TASK: VALIDATE HEXADECIMAL BYTES
    Validate that the string consists of two characters. The characters can be digits or a letter between a and f.

        $pattern = '()';
        $subjects = [
          '01' => TRUE, '0f' => TRUE, 'FA' => TRUE,
          'az' => FALSE, "foo" => FALSE, "123" => FALSE
        ];
        foreach ($subjects as $subject => $shouldMatch) {
          if ($shouldMatch == preg_match($pattern, $subject)) {
            echo "SUCCESS\n";
          } else {
            echo "FAIL\n";
          }
        }
  27. QUANTIFIER
    How often will it be matched?
    * - any count
    ? - maximum of 1
    + - minimum of 1
    {n} - exactly n
    {n,m} - minimum of n, maximum of m
    {n,} - minimum of n
    {0,m} - maximum of m
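
Each quantifier from the list, applied to a hypothetical subject. The patterns are anchored so that the counts are enforced on the whole string.

```php
<?php
$optional = preg_match('/^colou?r$/', 'color'); // ? allows zero or one "u"
$star     = preg_match('/^ab*$/', 'a');         // * allows zero "b"s
$plus     = preg_match('/^ab+$/', 'a');         // + needs at least one "b"
$exact    = preg_match('/^a{3}$/', 'aaa');      // exactly three
$range    = preg_match('/^a{2,3}$/', 'aaaa');   // four is more than the maximum
$atLeast  = preg_match('/^a{2,}$/', 'aaaa');    // two or more
$atMost   = preg_match('/^a{0,2}$/', '');       // zero to two, empty string is fine

var_dump($optional, $star, $plus, $exact, $range, $atLeast, $atMost);
```
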
  28. TASK: VALIDATE A GERMAN ZIP CODE
    The {n} syntax lets you match a fixed number of repetitions of a qualifier.
    Validate that the subject is a German zip code. It consists of 5 digits.

        $pattern = '()';
        $subjects = [
          '01234' => TRUE, '50670' => TRUE, '40213' => TRUE,
          'abcdef' => FALSE, "50670\n" => FALSE, "123456" => FALSE
        ];
        foreach ($subjects as $subject => $shouldMatch) {
          if ($shouldMatch == preg_match($pattern, $subject)) {
            echo "SUCCESS\n";
          } else {
            echo "FAIL\n";
          }
        }
  29. TASK: VALIDATE A LANGUAGE CODE
    The {n,m} syntax lets you specify a minimum and a maximum number of repetitions.
    Validate that the subject is a 2 or 3 letter language code.

        $pattern = '()';
        $subjects = [
          'en' => TRUE, 'de' => TRUE, 'eng' => TRUE, 'deu' => TRUE,
          '123' => FALSE, "en-US" => FALSE
        ];
        foreach ($subjects as $subject => $shouldMatch) {
          if ($shouldMatch == preg_match($pattern, $subject)) {
            echo "SUCCESS\n";
          } else {
            echo "FAIL\n";
          }
        }
  30. TASK: VALIDATE AN INTEGER
    ? matches one or none. + matches at least one repetition.
    Validate an integer including an optional leading sign.

        $pattern = '()';
        $subjects = [
          '1' => TRUE, '123' => TRUE, '+123' => TRUE, '-456' => TRUE,
          '1.1' => FALSE, "abc" => FALSE, "123 456" => FALSE
        ];
        foreach ($subjects as $subject => $shouldMatch) {
          if ($shouldMatch == preg_match($pattern, $subject)) {
            echo "SUCCESS\n";
          } else {
            echo "FAIL\n";
          }
        }
  31. UNICODE
    The modifier u activates Unicode UTF-8 mode.
    \X - extended unicode grapheme sequence
    \p{xx}, \px - character with unicode property
    \p{^xx}, \P{xx} - character without unicode property
    \p{script} - character from script
    \x{FFFF} - code point
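
A minimal sketch of these escapes. The subjects are hypothetical and written with "\u{...}" string escapes (PHP 7 and later) so the example does not depend on the file encoding.

```php
<?php
// "\u{FC}" is the letter u with umlaut, two bytes in UTF-8.
$umlaut = "\u{FC}";
$bytes = preg_match_all('/./', $umlaut);   // without u: counts bytes
$chars = preg_match_all('/./u', $umlaut);  // with u: counts characters

// Latin a, digit, Cyrillic Zhe (U+0416), digit.
$mixed = "a1\u{416}2";
$letters    = preg_match_all('/\p{L}/u', $mixed);
$nonLetters = preg_match_all('/\P{L}/u', $mixed);
$cyrillic   = preg_match_all('/\p{Cyrillic}/u', $mixed);
$codePoint  = preg_match('/\x{416}/u', $mixed);

// "e" followed by a combining acute accent: two code points, one grapheme.
$accented = "e\u{301}";
$codePoints = preg_match_all('/./u', $accented);
$graphemes  = preg_match_all('/\X/u', $accented);

var_dump($bytes, $chars, $letters, $nonLetters, $cyrillic, $codePoint);
var_dump($codePoints, $graphemes);
```
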
  32. TASK: MATCH UNICODE LETTERS
    Use the unicode property L to match any letter in the string "English, Русский, 中文".

        $pattern = '()';
        $result = preg_match_all(
          $pattern, 'English, Русский, 中文', $matches
        );
        if ($result && count($matches[0]) == 16) {
          echo 'SUCCESS';
        } else {
          echo 'FAIL';
        }
  33. TASK: MATCH CYRILLIC LETTERS
    Match any Cyrillic letter in the subject.

        $pattern = '()';
        $result = preg_match_all(
          $pattern, 'English, Русский, 中文', $matches
        );
        if ($result && count($matches[0]) == 7) {
          echo 'SUCCESS';
        } else {
          echo 'FAIL';
        }
  34. GROUPS
    (...) - captured group
    (?<group_name>...) - named group
    (?:...) - group without capture
    ((?i)...), (?i:...) - group modifiers
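
What each group type leaves in $matches, as a minimal sketch. The subject and the group name key are hypothetical.

```php
<?php
// A named group shows up under its name and under its index.
preg_match('/(?<key>\w+)=(\d+)/', 'width=42', $named);

// A non-capturing group groups without adding an entry.
preg_match('/(?:\w+)=(\d+)/', 'width=42', $nonCapturing);

// A group modifier applies only inside its group.
$firstLetterOnly = preg_match('/(?i:w)idth/', 'Width');
$restIsStrict    = preg_match('/(?i:w)idth/', 'WIDTH');

print_r($named);
print_r($nonCapturing);
var_dump($firstLetterOnly, $restIsStrict);
```
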
  35. TASK: MATCH A DATE
    Match a date in the format "YYYY-MM-DD". Capture each part into a named group (year, month, day).

        $pattern = '()';
        if (
          preg_match($pattern, '2017-02-27', $match) &&
          (isset($match['year']) && $match['year'] == '2017') &&
          (isset($match['month']) && $match['month'] == '02') &&
          (isset($match['day']) && $match['day'] == '27')
        ) {
          echo "SUCCESS\n";
        } else {
          echo "FAIL\n";
        }
  36. TASK: VALIDATE CONSECUTIVE UGHS
    Validate that the string contains 3 consecutive "ugh"s.

        $pattern = '';
        $subjects = [
          'ughughugh' => TRUE, 'ughughughugh' => TRUE,
          'ughugahugh' => FALSE, "ughughugah" => FALSE,
          "ughughughugah" => TRUE
        ];
        foreach ($subjects as $subject => $shouldMatch) {
          if ($shouldMatch == preg_match($pattern, $subject)) {
            echo "SUCCESS\n";
          } else {
            echo "FAIL\n";
          }
        }
  37. 12 . 4

  38. ALTERNATIVES
    | - alternative patterns

  39. TASK: VALIDATE TITLE AND NAME
    Match strings that start with a title ('Mr.', 'Ms.', 'Mrs.'), followed by a space and a string that contains at least one letter.

        $pattern = '()';
        $subjects = [
          'Mr. Doe' => TRUE, 'Mrs. Jane Doe' => TRUE, 'Ms. Marple' => TRUE,
          'Mr. ' => FALSE, "Mrs. 1" => FALSE, "1. Mr. Doe" => FALSE
        ];
        foreach ($subjects as $subject => $shouldMatch) {
          if ($shouldMatch == preg_match($pattern, $subject)) {
            echo "SUCCESS\n";
          } else {
            echo "FAIL\n";
          }
        }
  40. FORMAT AND COMMENT
    Modifier x allows formatting
    # - single line comment
    (?#...) - comment group
    \Q...\E - remove special meaning
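
A minimal sketch of a formatted pattern and of \Q...\E, using hypothetical subjects. With x, whitespace in the pattern is ignored and # starts a comment that runs to the end of the line.

```php
<?php
$pattern = '/
  (\d+)   # integer part
  \.      # literal dot
  (\d+)   # fraction
  (?# a comment group can sit anywhere in the pattern )
/x';
preg_match($pattern, 'pi is 3.14', $number);

// Between \Q and \E the plus sign is a literal character.
$quoted   = preg_match('/\Q1+1=2\E/', 'so 1+1=2');
// Without quoting, "1+" means "one or more 1s" and the subject does not match.
$unquoted = preg_match('/1+1=2/', 'so 1+1=2');

print_r($number);
var_dump($quoted, $unquoted);
```
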
  41. EXAMPLE: FORMAT AND COMMENT

        $pattern = '(/
          (?:[a-zA-Z\\d_-]+\\.) #title
          (?<mode>media|download|thumb)\\. # mode
          (?:(?<preview>preview)\\.)? # is preview
          (?<media_uri>
            (?<id>[A-Fa-f\\d]{32}) #id
            (?:v(?<version>\\d+))? #version
            (?:\\.[a-zA-Z\\d]+)? #extension
          )
        $)Dix';
  42. BACK REFERENCES
    \1, \g{1} - reference group by index
    (?P=name), \g{name} - reference group by name
    \g{-1} - relative group reference
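
All of these notations can express the same thing. The sketch below looks for a doubled character in a hypothetical subject; the group name char is an assumption made for the example.

```php
<?php
$subject = 'hello';

$byIndex  = preg_match('/(\w)\1/', $subject, $doubled);
$byBraces = preg_match('/(\w)\g{1}/', $subject);
$byName   = preg_match('/(?<char>\w)\g{char}/', $subject);
$byNameP  = preg_match('/(?<char>\w)(?P=char)/', $subject);
$relative = preg_match('/(\w)\g{-1}/', $subject); // -1: the group just before

// No character is doubled in this subject.
$none = preg_match('/(\w)\1/', 'world');

var_dump($doubled[0], $byIndex, $byBraces, $byName, $byNameP, $relative, $none);
```
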
  43. TASK: VALIDATE DRUNKEN NUMBERS
    Validate strings that consist of any number of repetitions of the same digit (11, 444, ...).

        $pattern = '()';
        $subjects = [
          '7' => TRUE, '11' => TRUE, '444' => TRUE, '8888' => TRUE,
          '12' => FALSE, "456" => FALSE, "ugh" => FALSE
        ];
        foreach ($subjects as $subject => $shouldMatch) {
          if ($shouldMatch == preg_match($pattern, $subject)) {
            echo "SUCCESS\n";
          } else {
            echo "FAIL\n";
          }
        }
  44. TEMPLATES
    (?(DEFINE)(?<name>...))
    (?&name)
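
A minimal sketch of a template, assuming a hypothetical format of two-digit hex values separated by colons. The group name byte is made up for this example. The DEFINE group matches nothing by itself; it only makes the named subpattern available for (?&byte).

```php
<?php
$pattern = '/
  (?(DEFINE) (?<byte> [0-9a-f]{2} ) )  # template: two hex digits
  ^ (?&byte) (?: : (?&byte) )* $       # one byte, then any count of ":byte"
/ix';

$valid   = preg_match($pattern, 'de:ad:be:ef');
$invalid = preg_match($pattern, 'de:ad:be:e');

var_dump($valid, $invalid);
```
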
  45. TASK: VALIDATE IPV4
    Define a template that matches a number between 0 and 255. Use the template to match an IP.

        $pattern = '';
        $subjects = [
          '127.0.0.1' => TRUE, '0.0.0.0' => TRUE, '255.255.255.0' => TRUE,
          '1.1.1.256' => FALSE, "1.1.1.a" => FALSE, "-1.1.1.1" => FALSE
        ];
        foreach ($subjects as $subject => $shouldMatch) {
          if ($shouldMatch == preg_match($pattern, $subject)) {
            echo "SUCCESS\n";
          } else {
            echo "FAIL\n";
          }
        }
  46. PATTERN: IPV4

        $pattern = '(^
          (?:(?&number)\\.){3}(?&number)
          (?(DEFINE)
            (?<number>
              25[0-5]|   # 250 - 255
              2[0-4]\\d| # 200 - 249
              1?\\d{1,2} # 0 - 199
            )
          )
        $)Dx';
  47. ASSERTIONS
    (?=...), (?!...) - Lookahead
    (?<=...), (?<!...) - Lookbehind
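
Assertions check the text around a match without becoming part of it. A minimal sketch with hypothetical subjects:

```php
<?php
$prices = '10 EUR, 20 USD, 30 EUR';

// Positive lookahead: numbers followed by " EUR".
preg_match_all('/\d+(?= EUR)/', $prices, $euro);

// Negative lookahead: whole numbers not followed by " EUR".
// The \b keeps the engine from backtracking into a shorter number.
preg_match_all('/\b\d+\b(?! EUR)/', $prices, $notEuro);

$costs = 'cost $15 or 20';

// Positive lookbehind: numbers preceded by a dollar sign.
preg_match_all('/(?<=\$)\d+/', $costs, $dollars);

// Negative lookbehind: numbers not preceded by a dollar sign or a digit.
preg_match_all('/(?<![\$\d])\d+/', $costs, $plain);

print_r([$euro[0], $notEuro[0], $dollars[0], $plain[0]]);
```
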
  48. LINKS
    http://www.php.net/manual/de/pcre.pattern.php
    https://www.hackerrank.com/
    https://regex101.com/
    http://www.regular-expressions.info/
    http://www.rexegg.com/