PCRE With PHP

PCRE With PHP

PHP Benelux 2015

3f2fb8bbcd44609346e1cc0c06d0a39b?s=128

Thomas Weinert

January 24, 2015
Tweet

Transcript

  1. PCRE WITH PHP @Thomas Weinert

  2. ABOUT PHP functions and classes PCRE syntax

  3. WARNING! Slides contain a lot of example source Most of

    the examples are really stupid
  4. PREG_MATCH() Pattern Subject Matches Flags Offset

  5. PREG_MATCH() EXAMPLE preg_match('(a.?)', 'abac', $match); var_dump($match); array(1) { [0]=> string(2)

    "ab" }
  6. FLAG: PREG_OFFSET_CAPTURE preg_match('(a.?)', 'abac', $match, PREG_OFFSET_CAPTURE, 2); var_dump($match); array(1) {

    [0]=> array(2) { [0]=> string(2) "ac" [1]=> int(2) } }
  7. OFFSET $subject = 'aa ab ac ad'; $offset = 0;

    $length = strlen($subject); while ($offset < $length) { if (preg_match('(a.)', $subject, $match, PREG_OFFSET_CAPTURE, $offset)) { $offset = $match[0][1] + strlen($match[0][0]); var_dump($match[0][0]); } else { break; } } string(2) "aa" string(2) "ab" string(2) "ac" string(2) "ad"
  8. PATTERN Delimiter Expression Modifiers /expression/x

  9. DELIMITER Any non alphanumeric character Escaping Special meaning Brackets

  10. DELIMITER: BRACKETS preg_match('((one)(two))', 'onetwo', $match); var_dump($match); array(3) { [0]=> string(6)

    "onetwo" [1]=> string(3) "one" [2]=> string(3) "two" }
  11. PATTERN String Escaping $pattern = '(\\\n)'; $text = <<<'TEXT' foo\nbar

    TEXT; preg_match($pattern, $text, $match); var_dump($pattern, $text, $match); string(5) "(\\n)" string(8) "foo\nbar" array(1) { [0]=> string(2) "\n" }
  12. MODIFIERS x - PCRE_EXTENDED u - PCRE_UTF8 D - PCRE_DOLLAR_ENDONLY

    s - PCRE_DOTALL m - PCRE_MULTILINE i - PCRE_CASELESS ...
  13. PCRE_EXTENDED $pattern = <<<'REGEX' (^ (d‐)? # optional country prefix

    (\d{5}) # german zip code $)Dix REGEX; var_dump((bool)preg_match($pattern, 'D‐50670')); bool(true)
  14. PCRE_UTF8 (^.*$)u Pattern and subject need to be valid UTF-8!

    UTF-8 1 to 4 (5 and 6 are invalid)
  15. PCRE_DOLLAR_ENDONLY $examples = [ ["(^\\d+$)", "123"], ["(^\\d+$)", "123\n"], ["(^\\d+$)D", "123\n"],

    ["(\\A\\d+\\G)", "123\n"] ]; foreach ($examples as $example) { var_dump((bool)preg_match($example[0], $example[1], $match)); } bool(true) bool(true) bool(false) bool(false)
  16. PCRE_DOTALL $examples = [ ["(^.+$)", "123"], ["(^.+$)", "123\n456"], ["(^.+$)s", "123\n456"]

    ]; foreach ($examples as $example) { preg_match($example[0], $example[1], $match); var_dump($match); } array(1) { [0]=> string(3) "123" } array(0) { } array(1) { [0]=> string(7) "123 456" }
  17. PCRE_MULTILINE $examples = [ ["(^.+$)", "123"], ["(^.+$)", "123\n456"], ["(^.+$)m", "123\n456"]

    ]; foreach ($examples as $example) { preg_match($example[0], $example[1], $match); var_dump($match); } array(1) { [0]=> string(3) "123" } array(0) { } array(1) { [0]=> string(3) "123" }
  18. PCRE_CASELESS $examples = [ ["(foo)", "foo"], ["(foo)", "FOO"], ["(foo)i", "FOO"]

    ]; foreach ($examples as $example) { preg_match($example[0], $example[1], $match); var_dump($match); } array(1) { [0]=> string(3) "foo" } array(0) { } array(1) { [0]=> string(3) "FOO" }
  19. PREG_MATCH_ALL() $subject = 'aa ab ac ad'; preg_match_all('(a.)', $subject, $match);

    var_dump($match); array(1) { [0]=> array(4) { [0]=> string(2) "aa" [1]=> string(2) "ab" [2]=> string(2) "ac" [3]=> string(2) "ad" } }
  20. PREG_PATTERN_ORDER $subject = 'ab ac'; preg_match_all('(a(.))', $subject, $match); var_dump($match); array(2)

    { [0]=> array(2) { [0]=> string(2) "ab" [1]=> string(2) "ac" } [1]=> array(2) { [0]=> string(1) "b" [1]=> string(1) "c" } }
  21. PREG_SET_ORDER $subject = 'ab ac'; preg_match_all('(a(.))', $subject, $match, PREG_SET_ORDER); var_dump($match);

    array(2) { [0]=> array(2) { [0]=> string(2) "ab" [1]=> string(1) "b" } [1]=> array(2) { [0]=> string(2) "ac" [1]=> string(1) "c" } }
  22. PREG_REPLACE() var_dump( preg_replace("(')", '"', "'Hello'") ); string(7) ""Hello""

  23. ARRAY ARGUMENTS var_dump( preg_replace(['(\\\r)', '(\\\n)'], ['CR', 'LF'], '\\r and \\n')

    ); string(9) "CR and LF"
  24. REFERENCING SUBPATTERNS var_dump( preg_replace('(a(.))', 'a#${1}#', 'ab ac') ); string(9) "a#b#

    a#c#" \\1 $1 ${1}
  25. PREG_REPLACE_CALLBACK() No need for modifier "e" (PREG_REPLACE_EVAL) var_dump( preg_replace_callback( '(a(.))',

    function ($match) { return strtoupper($match[1]); }, 'ab ac' ) ); string(3) "B C"
  26. FUNCTOR class Replacer { public function __invoke($match) { return strtoupper($match[1]);

    } } var_dump( preg_replace_callback( '(a(.))', new Replacer(), 'ab ac' ) );
  27. PREG_SPLIT() $pattern = '(\\R)u'; $subject = "one\rtwo\n\nthree\r\nfour"; $match = preg_split($pattern,

    $subject); var_dump($match); array(5) { [0]=> string(3) "one" [1]=> string(3) "two" [2]=> string(0) "" [3]=> string(5) "three" [4]=> string(4) "four" }
  28. PREG_SPLIT_NO_EMPTY $pattern = '(\\R)u'; $subject = "one\rtwo\n\nthree\r\nfour"; $match = preg_split($pattern,

    $subject, ‐1, PREG_SPLIT_NO_EMPTY); var_dump($match); array(4) { [0]=> string(3) "one" [1]=> string(3) "two" [2]=> string(5) "three" [3]=> string(4) "four" }
  29. PREG_SPLIT_OFFSET_CAPTURE $pattern = '(\\R)u'; $subject = "one\rtwo\n\nthree"; $flags = PREG_SPLIT_NO_EMPTY

    | PREG_SPLIT_OFFSET_CAPTURE; $match = preg_split($pattern, $subject, ‐1, $flags); var_dump($match); array(3) { [0]=> array(2) { [0]=> string(3) "one" [1]=> int(0) } [1]=> array(2) { [0]=> string(3) "two" [1]=> int(4) } [2]=> array(2) { [0]=> string(5) "three" [1]=> int(9) } }
  30. PREG_SPLIT_DELIM_CAPTURE $highlights = ['small' => '*', 'short' => '_']; $pattern

    = '((small|short))u'; $subject = "A small, short example"; $match = preg_split($pattern, $subject, ‐1, PREG_SPLIT_DELIM_CAPTURE); foreach ($match as $part) { if (isset($highlights[$part])) { echo $highlights[$part], $part, $highlights[$part]; } else { echo $part; } } A *small*, _short_ example
  31. PREG_QUOTE() var_dump('('.preg_quote('/.*/').')'); string(8) "(/\.\*/)"

  32. REGEXITERATOR $data = new ArrayIterator(['aa', 'ab']); $iterator = new RegexIterator(

    $data, '(.(.))', RegexIterator::REPLACE ); $iterator‐>replacement = '$1'; var_dump(iterator_to_array($iterator)); array(2) { [0] => string(1) "a" [1] => string(1) "b" }
  33. REGEXITERATOR MODES MATCH GET_MATCH ALL_MATCHES SPLIT REPLACE USE_KEY

  34. UNICODE Modifier u All: \X Token: \x{A9} Category: \p{L} Negation:

    \P{L}, \p{^L} Scripts: \p{Hangul} Blocks: \p{Arrows}
  35. UNICODE EXAMPLE $data = <<<'DATA' English German 한국어 日本語 DATA;

    preg_match_all('(\\pL+)u', $data, $match); var_dump($match[0]); array(4) { [0] => string(7) "English" [1] => string(6) "German" [2] => string(9) "한국어" [3] => string(9) "日本語" }
  36. NON CATCHING SUBPATTERNS preg_match('((?:one)(two))', 'onetwo', $match); var_dump($match); array(2) { [0]=>

    string(6) "onetwo" [1]=> string(3) "two" }
  37. SUBPATTERN MODIFIERS (?i‐sm) $examples = [ ["((?i)foo)", "FOO"], ["((?‐i)foo)i", "FOO"]

    ]; foreach ($examples as $example) { preg_match($example[0], $example[1], $match); var_dump($match); } array(1) { [0]=> string(3) "FOO" } array(0) { }
  38. NAMED SUBPATTERNS $pattern = "(^ (?P<year>\d{4}) (?:‐(?<month>\d{1,2}))? (?:‐(?'day'\d{1,2}))? )x"; preg_match($pattern,

    "2015‐01‐24", $match); var_dump($match);</month></year> array(7) { [0]=> string(10) "2015‐01‐24" ["year"]=> string(4) "2015" [1]=> string(4) "2015" ["month"]=> string(2) "01" [2]=> string(2) "01" ["day"]=> string(2) "24" [3]=> string(2) "24" }
  39. PRE-DEFINED SUBROUTINES $pattern = "( ^ (?&number) (?:\\.(?&number)){3} $ (?(DEFINE)

    (?'number'25[0‐5]|2[1‐4]\d|1\d{2}|\d{1,2}) ) )x"; var_dump((bool)preg_match($pattern, "127.0.0.1", $match)); var_dump((bool)preg_match($pattern, "355.0.0.1", $match)); bool(true) bool(false)
  40. ASSERTIONS Look Around Look Ahead Look Behind

  41. LOOK AHEAD $examples = [ ["(h(?=e))", "hello"], ["(h(?=e)llo)", "hello"], ["(h(?=e).llo)",

    "hello"] ]; foreach ($examples as $example) { preg_match($example[0], $example[1], $match); var_dump($match); } array(1) { [0]=> string(1) "h" } array(0) { } array(1) { [0]=> string(5) "hello" }
  42. LOOK AHEAD - NEGATION $examples = [ ["(h(?!e))", "hello"], ["(h(?!e))",

    "hallo"] ]; foreach ($examples as $example) { preg_match($example[0], $example[1], $match); var_dump($match); } array(0) { } array(1) { [0]=> string(1) "h" }
  43. LOOK BEHIND $examples = [ ["((?<=h).)", "hello"], ["((?<!h).)", "hallo"] ];

    foreach ($examples as $example) { preg_match($example[0], $example[1], $match); var_dump($match); } array(1) { [0]=> string(1) "e" } array(1) { [0]=> string(1) "h" }
  44. LOOK BEHIND - ALTERNATIVES $examples = [ ["((?<=e|ha|.{2})l)", "hello"], ["((?<=e|ha)l)",

    "hallo"], ["((?<=e|.{2})l)", "hallo"] ]; foreach ($examples as $example) { preg_match($example[0], $example[1], $match); var_dump($match); } array(1) { [0]=> string(1) "l" } array(1) { [0]=> string(1) "l" } array(1) { [0]=> string(1) "l" }
  45. LOOK BEHIND - UNKNOWN LENGTH preg_match("((?<=.{2,})l)", 'hello', $match); Warning: preg_match():

    Compilation failed: lookbehind assertion is not fixed length at offset 9 in /tmp... on line 2
  46. CONDITIONALS $pattern = '((?<quote>[\'"])?(?(quote).*?\\k<quote>|\\w+))'; $data = ['foo', '"foo"', "'foo'", 'foo

    bar', '"foo bar"']; foreach ($data as $subject) { if (preg_match($pattern, $subject, $match)) { echo $match[0], "\n"; } }</quote></quote> foo "foo" 'foo' foo "foo bar"
  47. RECURSIONS $pattern = <<<'PCRE' ( \( ( (?>[^()]+) | (?R)

    )* \) )Ux PCRE; preg_match_all($pattern, '(ab(cd)ef)(gh)', $match); var_dump($match); array(2) { [0] => array(2) { [0] => string(10) "(ab(cd)ef)" [1] => string(4) "(gh)" } [1] => array(2) { [0] => string(1) "f" [1] => string(1) "h" } }
  48. START OF PATTERN MODIFIERS (*UTF), (*UTF8), (*UTF16), (*UTF32) (*UTF)(*UCP) =

    u (*CR), (*LF), (*CRLF), (*ANYCRLF), (*ANY) (*BSR_ANYCRLF), (*BSR_UNICODE) - \R (*LIMIT_MATCH=x), (*LIMIT_RECURSION=d) (*NO_AUTO_POSSESS), (*NO_START_OPT) (*NOTEMPTY), (*NOTEMPTY_ATSTART)
  49. CONTROL VERBS (SKIP*)(?!) (PRUNE*) (THEN*) (COMMIT*) (ACCEPT*) http://perldoc.perl.org/perlre .html#Special-Backtracking-Control-Verbs

  50. REGEX101.COM

  51. VERSIONS PCRE2 10.0 2015-01-05 PCRE 8.36 2014-09-26 3V4L.ORG PHP7, HHVM

    >= 3.3: 8.35 2014-04-04 PHP >= 5.5.10: 8.34 2013-12-15
  52. LINKS http://www.rexegg.com/ http://www.regular-expressions.info/ https://www.regex101.com/

  53. THANKS