PCRE - Matching Patterns

PCRE introduction, interactive workshop. A wireless keyboard is passed around and attendees solve the tasks on the slides.

Thomas Weinert

March 04, 2017

Transcript

  1. $MATCHES

    preg_match() - array, matched groups
    preg_match_all() - array with
      PREG_PATTERN_ORDER - an array for each group in the pattern
      PREG_SET_ORDER - an array for each match
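The difference between the two orderings is easiest to see side by side. The following is a minimal sketch; the subject and pattern are made up for illustration and are not part of the workshop tasks.

```php
<?php
// Hypothetical subject and pattern, chosen only for illustration.
$subject = 'a1 b2 c3';
$pattern = '/([a-z])(\d)/';

// preg_match(): $match holds the first match only.
// Index 0 is the full match, the following indexes are the groups.
preg_match($pattern, $subject, $match);

// PREG_PATTERN_ORDER (the default): one array per group in the pattern.
preg_match_all($pattern, $subject, $byGroup, PREG_PATTERN_ORDER);

// PREG_SET_ORDER: one array per match.
preg_match_all($pattern, $subject, $byMatch, PREG_SET_ORDER);

print_r($match);
print_r($byGroup);
print_r($byMatch);
```

With PREG_PATTERN_ORDER, `$byGroup[1]` collects the first group of every match; with PREG_SET_ORDER, `$byMatch[1]` is the complete second match.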
  2. PATTERN ARGUMENT

    /string/u
    │ │    │└ Modifier
    │ │    └ Delimiter
    │ └ Pattern
    └ Delimiter
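The delimiter does not have to be a slash. As a small sketch (the subject is borrowed from the tasks, the patterns are illustrative), the same pattern can be written with different delimiters; later slides in this deck use parentheses.

```php
<?php
$subject = 'https://nevercodealone.de';

// Slash delimiters: every slash inside the pattern needs escaping.
$withSlashes = preg_match('/https:\/\//u', $subject);

// Parentheses as delimiters: slashes need no escaping.
$withParens = preg_match('(https://)u', $subject);

// A hash sign works as a delimiter as well.
$withHash = preg_match('#https://#u', $subject);

var_dump($withSlashes, $withParens, $withHash);
```

All three calls describe the same pattern, so choosing a delimiter that does not occur in the pattern keeps it readable.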
  3. TASK: MATCH A STRING

    Match the string nevercodealone. This is case sensitive.

    $pattern = '';
    $result = preg_match_all(
      $pattern, 'https://nevercodealone.de', $matches
    );
    if ($result && count($matches[0]) == 1) {
      echo 'SUCCESS';
    } else {
      echo 'FAIL';
    }
  4. MODIFIER

    U - ungreedy mode
    i - case insensitive
    u - utf-8 mode
    x - modifies whitespace behaviour
    s - modifies dot behaviour
    m - modifies anchor behaviour
    D - modifies behaviour of $ anchor
    ...
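A few of these modifiers can be observed directly. This is a minimal sketch with made-up subjects, not a solution to any of the tasks.

```php
<?php
// Hypothetical subjects, chosen only to make each modifier visible.
$html = '<b>one</b> <b>two</b>';

// Greedy by default: .* extends to the last closing tag.
preg_match('(<b>.*</b>)', $html, $greedy);

// U inverts greediness: .* stops at the first closing tag.
preg_match('(<b>.*</b>)U', $html, $ungreedy);

// i ignores case.
$caseInsensitive = preg_match_all('(php)i', 'php PHP Php');

// x ignores unescaped whitespace inside the pattern.
$extended = preg_match('( \d+ \s px )x', '12 px');

echo $greedy[0], "\n", $ungreedy[0], "\n";
echo $caseInsensitive, "\n", $extended, "\n";
```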
  5. TASK: MATCH A STRING CASE INSENSITIVE

    The modifier i allows case insensitive matches.

    Match the string code. This is case insensitive.

    $pattern = '';
    $result = preg_match_all(
      $pattern, 'code CODE Code', $matches
    );
    if ($result && count($matches[0]) == 3) {
      echo 'SUCCESS';
    } else {
      echo 'FAIL';
    }
  6. THE DOT

    Matches anything except a newline
    Matches anything if modifier "s" is set
    Escape . with \ to match an actual .
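The three rules for the dot can be checked with a short sketch; the subjects are illustrative assumptions.

```php
<?php
$subject = "a\nb";

// Without modifiers the dot does not match the newline.
$default = preg_match('(a.b)', $subject);

// With the s modifier the dot matches the newline too.
$dotAll = preg_match('(a.b)s', $subject);

// An escaped dot matches only a literal dot.
$literalDots = preg_match_all('(\.)', 'v1.2.3');

var_dump($default, $dotAll, $literalDots);
```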
  7. TASK: MATCH ANYTHING BUT NEWLINES

    Match the string cc.cc.cc.cc. "c" can be any character except a newline.

    $pattern = '()';
    $result = preg_match_all(
      $pattern, "ab.cd.ef.gh\na\n.b\n.d\n.e\n\nabcdefghiklm", $matches
    );
    if ($result && count($matches[0]) == 1) {
      echo 'SUCCESS';
    } else {
      echo 'FAIL';
    }
  8. TASK: MATCH DIGITS AND NON-DIGITS

    The qualifier \d matches any digit (0-9). The qualifier \D matches anything except a digit.

    Match a string with the structure xxXxxXxxxx. "x" represents a digit, "X" a non-digit.

    $pattern = '()';
    $result = preg_match_all(
      $pattern, "12.34.5678\n123456789\nab.cd.efgh", $matches
    );
    if ($result && count($matches[0]) == 1) {
      echo 'SUCCESS';
    } else {
      echo 'FAIL';
    }
  9. ANCHORS

    Anchor your pattern to the start and/or end of the subject.

    ^ - string start
    $ - string end
  10. TASK: VALIDATE STRING START

    The ^ anchors the pattern to the string start.

    Validate that the string starts with a digit.

    $pattern = '()';
    $subjects = [
      '1. match' => TRUE,
      '2. match' => TRUE,
      '42' => TRUE,
      'no match' => FALSE,
      "a 345 b" => FALSE,
      "end 3" => FALSE
    ];
    foreach ($subjects as $subject => $shouldMatch) {
      if ($shouldMatch == preg_match($pattern, $subject)) {
        echo "SUCCESS\n";
      } else {
        echo "FAIL\n";
      }
    }
  11. TASK: VALIDATE STRING END

    The $ anchors the pattern to the string end.

    Validate that the string ends with a digit.

    $pattern = '()';
    $subjects = [
      'match 1' => TRUE,
      'match 2' => TRUE,
      '42' => TRUE,
      "21\n" => TRUE,
      'no match' => FALSE,
      "a 345 b" => FALSE,
      "3 start" => FALSE
    ];
    foreach ($subjects as $subject => $shouldMatch) {
      if ($shouldMatch == preg_match($pattern, $subject)) {
        echo "SUCCESS\n";
      } else {
        echo "FAIL\n";
      }
    }
  12. TASK: VALIDATE A GERMAN ZIP CODE

    The modifier D makes sure that a linefeed at the end of the subject is not ignored.

    Validate that the subject is a German zip code. It consists of 5 digits.

    $pattern = '()';
    $subjects = [
      '01234' => TRUE,
      '50670' => TRUE,
      '40213' => TRUE,
      'abcdef' => FALSE,
      "50670\n" => FALSE,
      "123456" => FALSE
    ];
    foreach ($subjects as $subject => $shouldMatch) {
      if ($shouldMatch == preg_match($pattern, $subject)) {
        echo "SUCCESS\n";
      } else {
        echo "FAIL\n";
      }
    }
  13. MODIFIER AND ALTERNATIVES

    Modifier m - line anchors
    \A - string start
    \Z - string end, ignore linefeed
    \z - string end, recognize linefeed
    \b - word boundary
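The differences between these anchors only show up when the subject ends with a linefeed or spans several lines. A minimal sketch, using made-up subjects:

```php
<?php
// Hypothetical subject with a trailing linefeed.
$subject = "abc\n";

// $ tolerates a single linefeed at the very end of the subject.
$dollar = preg_match('(^abc$)', $subject);

// With the D modifier, $ matches only at the very end.
$dollarD = preg_match('(^abc$)D', $subject);

// \Z behaves like the default $, \z like $ with D.
$upperZ = preg_match('(\Aabc\Z)', $subject);
$lowerZ = preg_match('(\Aabc\z)', $subject);

// With m, ^ and $ match at the start and end of every line.
$lines = preg_match_all('(^\w+$)m', "one\ntwo\nthree");

// \b matches between a word character and a non-word character.
$separateWord = preg_match('(\bcode\b)', 'never code alone');
$insideWord = preg_match('(\bcode\b)', 'nevercodealone');

var_dump($dollar, $dollarD, $upperZ, $lowerZ, $lines, $separateWord, $insideWord);
```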
  14. CHARACTER CLASSES

    Square brackets: []
    - for ranges
    ^ for negative matches
    many special characters lose function
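As an illustration of these rules, the sketch below runs three classes against a made-up subject. Note how `.`, `^` (when not first) and `-` (when last) are plain characters inside the brackets.

```php
<?php
// Hypothetical subject containing letters and special characters.
$subject = 'a-b.c^d';

// A range: any letter from a to d.
preg_match_all('([a-d])', $subject, $letters);

// A negated class: anything that is not a letter from a to d.
preg_match_all('([^a-d])', $subject, $others);

// Inside a class the dot is literal; so are ^ (not in first position)
// and - (in last position).
preg_match_all('([.^-])', $subject, $special);

print_r($letters[0]);
print_r($others[0]);
print_r($special[0]);
```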
  15. TASK: MATCH VOWELS

    Match all the vowels (aeiou) in the string.

    $pattern = '()';
    $result = preg_match_all(
      $pattern, 'https://nevercodealone.de', $matches
    );
    if ($result && count($matches[0]) == 8) {
      echo 'SUCCESS';
    } else {
      echo 'FAIL';
    }
  16. TASK: MATCH NON-VOWELS

    Match all the non-vowels in the string.

    $pattern = '()';
    $result = preg_match_all(
      $pattern, 'https://nevercodealone.de', $matches
    );
    if ($result && count($matches[0]) == 17) {
      echo 'SUCCESS';
    } else {
      echo 'FAIL';
    }
  17. TASK: VALIDATE HEXADECIMAL BYTES

    Validate that the string consists of two characters. Each character can be a digit or a letter between a and f.

    $pattern = '()';
    $subjects = [
      '01' => TRUE,
      '0f' => TRUE,
      'FA' => TRUE,
      'az' => FALSE,
      "foo" => FALSE,
      "123" => FALSE
    ];
    foreach ($subjects as $subject => $shouldMatch) {
      if ($shouldMatch == preg_match($pattern, $subject)) {
        echo "SUCCESS\n";
      } else {
        echo "FAIL\n";
      }
    }
  18. QUANTIFIER

    How often will it be matched?

    * - any count
    ? - maximum of 1
    + - minimum of 1
    {n} - exactly n
    {n,m} - minimum of n, maximum of m
    {n,} - minimum of n
    {0,m} - maximum of m
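Each quantifier in the list can be tried against a simple subject. The sketch below uses a small helper, `matches()`, which is a hypothetical convenience function for this example and not part of PHP or the deck.

```php
<?php
// Hypothetical helper: TRUE if the whole subject matches the pattern.
function matches(string $pattern, string $subject): bool
{
    return preg_match($pattern, $subject) === 1;
}

$results = [
    'star accepts zero'     => matches('(^ab*c$)D', 'ac'),
    'plus rejects zero'     => matches('(^ab+c$)D', 'ac'),
    'optional rejects two'  => matches('(^ab?c$)D', 'abbc'),
    'exactly two'           => matches('(^ab{2}c$)D', 'abbc'),
    'range rejects three'   => matches('(^ab{1,2}c$)D', 'abbbc'),
    'at least two'          => matches('(^ab{2,}c$)D', 'abbbbc'),
    'at most two, zero ok'  => matches('(^ab{0,2}c$)D', 'ac'),
];

var_dump($results);
```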
  19. TASK: VALIDATE A GERMAN ZIP CODE

    The {n} syntax allows you to match a fixed repeat of qualifiers.

    Validate that the subject is a German zip code. It consists of 5 digits.

    $pattern = '()';
    $subjects = [
      '01234' => TRUE,
      '50670' => TRUE,
      '40213' => TRUE,
      'abcdef' => FALSE,
      "50670\n" => FALSE,
      "123456" => FALSE
    ];
    foreach ($subjects as $subject => $shouldMatch) {
      if ($shouldMatch == preg_match($pattern, $subject)) {
        echo "SUCCESS\n";
      } else {
        echo "FAIL\n";
      }
    }
  20. TASK: VALIDATE A LANGUAGE CODE

    The {n,m} syntax allows you to specify a minimum and a maximum number of repetitions.

    Validate that the subject is a 2 or 3 letter language code.

    $pattern = '()';
    $subjects = [
      'en' => TRUE,
      'de' => TRUE,
      'eng' => TRUE,
      'deu' => TRUE,
      '123' => FALSE,
      "en-US" => FALSE
    ];
    foreach ($subjects as $subject => $shouldMatch) {
      if ($shouldMatch == preg_match($pattern, $subject)) {
        echo "SUCCESS\n";
      } else {
        echo "FAIL\n";
      }
    }
  21. TASK: VALIDATE AN INTEGER

    ? matches one or none. + matches at least one repetition.

    Validate an integer including an optional leading sign.

    $pattern = '()';
    $subjects = [
      '1' => TRUE,
      '123' => TRUE,
      '+123' => TRUE,
      '-456' => TRUE,
      '1.1' => FALSE,
      "abc" => FALSE,
      "123 456" => FALSE
    ];
    foreach ($subjects as $subject => $shouldMatch) {
      if ($shouldMatch == preg_match($pattern, $subject)) {
        echo "SUCCESS\n";
      } else {
        echo "FAIL\n";
      }
    }
  22. UNICODE

    The modifier u activates Unicode UTF-8 mode.

    \X - extended Unicode grapheme sequence
    \p{xx}, \px - character with Unicode property
    \p{^xx}, \P{xx} - character without Unicode property
    \p{script} - character from script
    \x{FFFF} - code point
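A short sketch of these escapes in action follows. The subjects are illustrative and written with PHP's `\u{...}` string escapes (PHP 7 or later) so the example does not depend on the file encoding.

```php
<?php
// "ä" is one code point but two bytes in UTF-8.
$umlaut = "\u{00E4}";
$bytes = preg_match_all('(.)', $umlaut);       // without u: counts bytes
$codePoints = preg_match_all('(.)u', $umlaut); // with u: counts code points

// \p{Lu}: uppercase letters, here in "Äpfel Und Birnen".
$upper = preg_match_all('(\p{Lu})u', "\u{00C4}pfel Und Birnen");

// \P{L}: anything that is not a letter.
$nonLetters = preg_match_all('(\P{L})u', 'a1 b2');

// \p{Greek}: characters from the Greek script, here alpha, beta, gamma.
$greek = preg_match_all('(\p{Greek})u', "\u{03B1}\u{03B2}\u{03B3} abc");

// \x{20AC}: the euro sign by code point.
$euro = preg_match('(\x{20AC})u', "5 \u{20AC}");

// \X: "e" plus a combining acute accent is a single grapheme.
$graphemes = preg_match_all('(\X)u', "e\u{0301}");

var_dump($bytes, $codePoints, $upper, $nonLetters, $greek, $euro, $graphemes);
```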
  23. TASK: MATCH UNICODE LETTERS

    Use the Unicode property L to match any letter in the string "English, Русский, 中文".

    $pattern = '()';
    $result = preg_match_all(
      $pattern, 'English, Русский, 中文', $matches
    );
    if ($result && count($matches[0]) == 16) {
      echo 'SUCCESS';
    } else {
      echo 'FAIL';
    }
  24. TASK: MATCH CYRILLIC LETTERS

    Match any Cyrillic letter in the subject.

    $pattern = '()';
    $result = preg_match_all(
      $pattern, 'English, Русский, 中文', $matches
    );
    if ($result && count($matches[0]) == 7) {
      echo 'SUCCESS';
    } else {
      echo 'FAIL';
    }
  25. GROUPS

    (...) - captured group
    (?<group_name>...) - named group
    (?:...) - group without capture
    ((?i)...), (?i:...) - group modifiers
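To see how the group types differ in the result array, consider this minimal sketch. The subjects and group name are made up for the example.

```php
<?php
// A named group, a plain captured group and a group without capture.
$pattern = '(^(?<name>\w+)=(\d+)(?:px|em)$)D';
preg_match($pattern, 'size=12px', $match);

// A named group appears under its name and under its index;
// the (?:...) group leaves no entry in $match.
print_r($match);

// A group modifier applies only inside its group:
// "mr" is case insensitive here, "Doe" is not.
$titleAnyCase = preg_match('(^(?i:mr)\. Doe$)D', 'MR. Doe');
$nameAnyCase = preg_match('(^(?i:mr)\. Doe$)D', 'MR. DOE');

var_dump($titleAnyCase, $nameAnyCase);
```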
  26. TASK: MATCH A DATE

    Match a date in the format "YYYY-MM-DD". Capture each part into a named group (year, month, day).

    $pattern = '()';
    if (
      preg_match($pattern, '2017-02-27', $match) &&
      (isset($match['year']) && $match['year'] == '2017') &&
      (isset($match['month']) && $match['month'] == '02') &&
      (isset($match['day']) && $match['day'] == '27')
    ) {
      echo "SUCCESS\n";
    } else {
      echo "FAIL\n";
    }
  27. TASK: VALIDATE CONSECUTIVE UGHS

    Validate that the string contains 3 consecutive "ugh"s.

    $pattern = '';
    $subjects = [
      'ughughugh' => TRUE,
      'ughughughugh' => TRUE,
      'ughugahugh' => FALSE,
      "ughughugah" => FALSE,
      "ughughughugah" => TRUE
    ];
    foreach ($subjects as $subject => $shouldMatch) {
      if ($shouldMatch == preg_match($pattern, $subject)) {
        echo "SUCCESS\n";
      } else {
        echo "FAIL\n";
      }
    }
  28. TASK: VALIDATE TITLE AND NAME

    Match strings that start with a title ('Mr.', 'Ms.', 'Mrs.'), followed by a space and a string that contains at least one letter.

    $pattern = '()';
    $subjects = [
      'Mr. Doe' => TRUE,
      'Mrs. Jane Doe' => TRUE,
      'Ms. Marple' => TRUE,
      'Mr. ' => FALSE,
      "Mrs. 1" => FALSE,
      "1. Mr. Doe" => FALSE
    ];
    foreach ($subjects as $subject => $shouldMatch) {
      if ($shouldMatch == preg_match($pattern, $subject)) {
        echo "SUCCESS\n";
      } else {
        echo "FAIL\n";
      }
    }
  29. FORMAT AND COMMENT

    Modifier x allows formatting
    # - single line comment
    (?#...) - comment group
    \Q...\E - remove special meaning
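A small sketch of these three features, using a made-up date pattern and a made-up arithmetic string. The comments inside the pattern avoid parentheses because the pattern itself uses parentheses as delimiters.

```php
<?php
// x mode: whitespace is ignored and # starts a comment until the line end.
$pattern = '(
    ^\d{4}    # year
    -\d{2}    # month
    (?# a comment group, it also works without the x modifier)
    -\d{2}$   # day
)x';
$formatted = preg_match($pattern, '2017-02-27');

// \Q...\E: everything in between is taken literally.
$quoted = preg_match('(^\Q1+1=2\E$)', '1+1=2');

// Without \Q...\E the + is a quantifier, so the literal text does not match.
$unquoted = preg_match('(^1+1=2$)', '1+1=2');

var_dump($formatted, $quoted, $unquoted);
```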
  30. EXAMPLE: FORMAT AND COMMENT

    $pattern = '(/
      (?:[a-zA-Z\\d_-]+\\.) #title
      (?<mode>media|download|thumb)\\. # mode
      (?:(?<preview>preview)\\.)? # is preview
      (?<media_uri>
        (?<id>[A-Fa-f\\d]{32}) #id
        (?:v(?<version>\\d+))? #version
        (?:\\.[a-zA-Z\\d]+)? #extension
      )
    $)Dix';
  31. BACK REFERENCES

    \1, \g{1} - reference group by index
    (?P=name), \g{name} - reference group by name
    \g{-1} - relative group reference
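As an illustration, the sketch below uses back references to find a doubled word and a doubled letter. The subjects and the group name `word` are assumptions made for this example.

```php
<?php
// Hypothetical subject with a repeated word.
$subject = 'this is is a test';

// By index: \1 or \g{1} repeats whatever group 1 matched.
preg_match('(\b(\w+)\s+\1\b)', $subject, $byIndex);
preg_match('(\b(\w+)\s+\g{1}\b)', $subject, $byBraces);

// By name: (?P=word) or \g{word}.
preg_match('(\b(?<word>\w+)\s+(?P=word)\b)', $subject, $byNameP);
preg_match('(\b(?<word>\w+)\s+\g{word}\b)', $subject, $byNameG);

// Relative: \g{-1} refers to the group that was opened most recently.
preg_match('((\w)\g{-1})', 'hello', $relative);

echo $byIndex[0], "\n", $byNameG[0], "\n", $relative[0], "\n";
```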
  32. TASK: VALIDATE DRUNKEN NUMBERS

    Validate strings that consist of any count of the same digit (11, 444, ...).

    $pattern = '()';
    $subjects = [
      '7' => TRUE,
      '11' => TRUE,
      '444' => TRUE,
      '8888' => TRUE,
      '12' => FALSE,
      "456" => FALSE,
      "ugh" => FALSE
    ];
    foreach ($subjects as $subject => $shouldMatch) {
      if ($shouldMatch == preg_match($pattern, $subject)) {
        echo "SUCCESS\n";
      } else {
        echo "FAIL\n";
      }
    }
  33. TASK: VALIDATE IPV4

    Define a template that matches a number between 0 and 255. Use the template to match an IP.

    $pattern = '';
    $subjects = [
      '127.0.0.1' => TRUE,
      '0.0.0.0' => TRUE,
      '255.255.255.0' => TRUE,
      '1.1.1.256' => FALSE,
      "1.1.1.a" => FALSE,
      "-1.1.1.1" => FALSE
    ];
    foreach ($subjects as $subject => $shouldMatch) {
      if ($shouldMatch == preg_match($pattern, $subject)) {
        echo "SUCCESS\n";
      } else {
        echo "FAIL\n";
      }
    }
  34. PATTERN: IPV4

    $pattern = '(^
      (?:(?&number)\\.){3}(?&number)
      (?(DEFINE)
        (?<number>
          25[0-5]|    # 250 - 255
          2[0-4]\\d|  # 200 - 249
          1?\\d{1,2}  # 0 - 199
        )
      )
    $)Dx';
  35. ASSERTIONS

    (?=...), (?!...) - Lookahead
    (?<=...), (?<!...) - Lookbehind
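Assertions check the surrounding text without making it part of the match. A minimal sketch with made-up price strings:

```php
<?php
// Hypothetical subject with amounts in two currencies.
$prices = '10 EUR, 20 USD, 30 EUR';

// Lookahead: digits that are followed by " EUR".
preg_match_all('(\d+(?= EUR))', $prices, $euros);

// Negative lookahead: complete numbers that are not followed by " EUR".
preg_match_all('(\d+\b(?! EUR))', $prices, $notEuros);

// Lookbehind: digits that are preceded by a dollar sign.
preg_match_all('((?<=\$)\d+)', 'cost $15 or 20', $dollars);

// Negative lookbehind: complete numbers not preceded by a dollar sign.
preg_match_all('((?<!\$)\b\d+)', 'cost $15 or 20', $notDollars);

print_r($euros[0]);
print_r($notEuros[0]);
print_r($dollars[0]);
print_r($notDollars[0]);
```

In each case the currency marker is checked but stays out of `$matches[0]`, which contains only the digits.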