Advanced Regular Expressions in .NET

Not all regular expression engines are created equal. They all have differences and nuances. The .NET regex engine has some amazing advanced features that make it one of the most powerful implementations in existence today. Not all problems should be solved with regular expressions, but after this talk you will have to think harder to find one that can't.

Patrick Delancy

November 14, 2015

Transcript

  1. NOTICE!!! This slide deck has been adapted from a presentation that was intended to be given live, in person… like with a real person in front of real people. You know… breathing the same air and all that. The key points have been transcribed onto separate slides, so you still get some benefit from reading through it all, but you are still missing out on all of the great stories, witty banter, hilarious costumes, stunning arias… or something like that. If you REALLY want to get the most out of this presentation, go to patrickdelancy.com and ask him to come give it to your group!
  2. Then you can make a more intelligent decision about when you should and should not use Regex.
  3. Look Ahead

    \b\w+(?=\.)   # match the word at the end of each sentence, but don’t capture the period

    Input: See Dick. See Jane. See Dick and Jane run.
    Matches: Dick, Jane, run
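As a rough illustration of the look-ahead slide, the sketch below runs the same pattern through `Regex.Matches`. The class and method names (`LookAheadDemo`, `LastWords`) are made up for this example and are not part of the talk.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookAheadDemo
{
    // Hypothetical helper: returns the word that ends each sentence.
    // The look-ahead (?=\.) requires a period to follow, but the period
    // itself is not included in the match.
    public static string[] LastWords(string text)
    {
        return Regex.Matches(text, @"\b\w+(?=\.)")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }
}
```

Calling `LookAheadDemo.LastWords("See Dick. See Jane. See Dick and Jane run.")` should give `Dick`, `Jane` and `run`, matching the slide.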
  4. Look Behind

    (?<=\b19)\d{2}\b   # match all years in the 1900s, capturing only the 2-digit year

    Input: 1842 1902 1776 1985 2003 1999
    Matches: 02, 85, 99
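The look-behind slide can be tried the same way. This is a minimal sketch; `LookBehindDemo` and `TwoDigitYears` are illustrative names, not something from the presentation.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookBehindDemo
{
    // Hypothetical helper: returns the last two digits of every
    // four-digit year that starts with 19. The look-behind (?<=\b19)
    // checks for the "19" prefix without making it part of the match.
    public static string[] TwoDigitYears(string text)
    {
        return Regex.Matches(text, @"(?<=\b19)\d{2}\b")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }
}
```

With the slide's input, `1842 1902 1776 1985 2003 1999`, this should yield `02`, `85` and `99`.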
  5. Free Spacing (Ignore Pattern Whitespace)

    new Regex(@"
        \b[^@]+    # pattern can now span multiple lines
        @
        [^\b]+\b   # and include white space for readability
    ", RegexOptions.IgnorePatternWhitespace);
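Here is a small, self-contained sketch of free spacing in use. The pattern is a simplified stand-in (an assumption for illustration, not the pattern from the slide), and the names `FreeSpacingDemo`, `Parse`, `user` and `host` are invented. Note that end-of-line `#` comments are only recognized when `RegexOptions.IgnorePatternWhitespace` (or the inline `(?x)`) is active.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FreeSpacingDemo
{
    // Illustrative pattern: whitespace and # comments in the pattern text
    // are ignored because IgnorePatternWhitespace is set.
    private static readonly Regex Address = new Regex(@"
        (?<user> [^@\s]+ )   # everything before the @
        @
        (?<host> [^@\s]+ )   # everything after the @
    ", RegexOptions.IgnorePatternWhitespace);

    // Hypothetical helper: returns the Match so callers can read the groups.
    public static Match Parse(string input)
    {
        return Address.Match(input);
    }
}
```

For example, `FreeSpacingDemo.Parse("user@example.com").Groups["host"].Value` should be `example.com`.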
  6. Less-Common Features ...in more advanced engines

    • Named Captures
    • Comments
    • Inline Directives
    • Conditional Alternation
    • Atomic Groups
    • Compiled Patterns
    • Unicode Categories and Named Character Blocks
  7. Comments

    ^.*@.*$   # comment to the end of the line
    ^.*@(?# this is an inline comment).*$
  8. Inline Directives

    John the (?ix) (?: wiser | better and greater | privy )

    Input: John the Wiser, John the BetterAndGreater, john the privy, John the Better and Greater
    Matches: John the Wiser, John the BetterAndGreater
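To see why only two of the four names match, here is a sketch using the slide's pattern. Everything before `(?ix)` stays case-sensitive with literal spaces; everything after it is case-insensitive and ignores pattern whitespace. `InlineDirectiveDemo` and `FindTitles` are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class InlineDirectiveDemo
{
    // "John the " is matched literally and case-sensitively. After (?ix),
    // the alternatives are effectively wiser|betterandgreater|privy,
    // matched without regard to case.
    private const string Pattern =
        @"John the (?ix) (?: wiser | better and greater | privy )";

    // Hypothetical helper: returns every match found in the text.
    public static string[] FindTitles(string text)
    {
        return Regex.Matches(text, Pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }
}
```

On the slide's input, `john the privy` fails because the lowercase `j` comes before the directive takes effect, and `John the Better and Greater` fails because the spaces inside the alternative were ignored when the pattern was parsed.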
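  9. Compiled Patterns

    Instance method (the Regex object is constructed once and reused):
    var pattern = new Regex(@"a+h+!+"); return pattern.IsMatch(value);

    Static method (the input comes first, then the pattern):
    var pattern = @"a+h+!+"; return Regex.IsMatch(value, pattern);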
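The slide's instance example does not pass any options, so the sketch below assumes that "compiled" refers to `RegexOptions.Compiled`, which asks .NET to compile the pattern to IL when the `Regex` is constructed. The class and method names are invented for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CompiledPatternDemo
{
    // Constructed once and reused; RegexOptions.Compiled trades a slower
    // start-up for faster matching on repeated calls.
    private static readonly Regex Scream =
        new Regex(@"a+h+!+", RegexOptions.Compiled);

    // Instance call on the shared, compiled Regex.
    public static bool IsScream(string value)
    {
        return Scream.IsMatch(value);
    }

    // Static call: the input string is the first argument, the pattern
    // is the second.
    public static bool IsScreamStatic(string value)
    {
        return Regex.IsMatch(value, @"a+h+!+");
    }
}
```

Both methods should agree on every input; which one is faster depends on how often the pattern is used, so it is worth measuring before committing to `RegexOptions.Compiled`.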
  10. Unique Features ...in the .NET RegEx engine

    • Balancing Groups
    • Character Class Subtraction
    • Explicit Capture Only
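Character class subtraction has no slide of its own in this transcript, so here is a minimal sketch of the syntax, `[base-[excluded]]`. The example pattern and the names `SubtractionDemo` and `Consonants` are assumptions made for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SubtractionDemo
{
    // [a-z-[aeiou]] means: any lowercase ASCII letter, minus the vowels.
    // Hypothetical helper: keeps only the characters that match.
    public static string Consonants(string text)
    {
        return string.Concat(
            Regex.Matches(text, @"[a-z-[aeiou]]")
                 .Cast<Match>()
                 .Select(m => m.Value));
    }
}
```

For instance, `SubtractionDemo.Consonants("regex")` should return `rgx`.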
  11. Balancing Groups

    ^(?:[^{}]|(?<open>{)|(?<-open>}))*(?(open)(?!))$

    Balanced, so the pattern matches:
    { if (true) { return "A"; } else { return "B"; } }

    Missing the final closing brace, so the pattern does not match:
    { if (true) { return "A"; } else { return "B"; }
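The balancing-group pattern can be wrapped in a small check like the one below. This is only a sketch: the braces are escaped here for readability, and `BalancingDemo` and `IsBalanced` are made-up names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BalancingDemo
{
    // (?<open>\{)   pushes a capture onto the "open" stack for each {
    // (?<-open>\})  pops one for each }, and fails if the stack is empty
    // (?(open)(?!)) fails the match if any { is still unclosed at the end
    private static readonly Regex Balanced = new Regex(
        @"^(?:[^{}]|(?<open>\{)|(?<-open>\}))*(?(open)(?!))$");

    // Hypothetical helper: true when every { has a matching }.
    public static bool IsBalanced(string code)
    {
        return Balanced.IsMatch(code);
    }
}
```

With the two snippets from the slide, the first (balanced) input should return `true` and the second (missing its final brace) should return `false`.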
  12. Explicit Capture Only

    Without explicit capture:
    ^(?<name>[^@\+]+(\+[^\+]+)?)@(?<domain>(\w+)\.(com|net|org))$
    [email protected] [email protected]
    [name] = e+mail, [domain] = ddress.com, [1] = +mail, [2] = ddress, [3] = com
    (.NET numbers unnamed groups first, then named groups.)

    With explicit capture, (?n):
    (?n)^(?<name>[^@\+]+(\+[^\+]+)?)@(?<domain>(\w+)\.(com|net|org))$
    [email protected] [email protected]
    [name] = e+mail, [domain] = ddress.com
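The effect of `(?n)` is easy to check by counting groups. In this sketch the sample address `e+mail@ddress.com` is an assumption, chosen only because it is consistent with the captures listed on the slide; `ExplicitCaptureDemo` and its members are hypothetical names. Keep in mind that .NET numbers unnamed groups first and named groups after them.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ExplicitCaptureDemo
{
    // Pattern from the slide, written on one line.
    public const string Pattern =
        @"^(?<name>[^@\+]+(\+[^\+]+)?)@(?<domain>(\w+)\.(com|net|org))$";

    // Default behavior: unnamed parentheses capture too.
    public static Match Default(string input)
    {
        return Regex.Match(input, Pattern);
    }

    // (?n) turns unnamed parentheses into non-capturing groups.
    public static Match NamedOnly(string input)
    {
        return Regex.Match(input, "(?n)" + Pattern);
    }
}
```

For the assumed input, `Default(...)` should report six groups (the whole match, three unnamed groups and the two named ones), while `NamedOnly(...)` should report three (the whole match, `name` and `domain`). `RegexOptions.ExplicitCapture` has the same effect as the inline `(?n)`.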
  13. Some Additional Resources

    • https://en.wikipedia.org/wiki/Comparison_of_regular_expression_engines - This is a little outdated, but still a good overview of how Regex implementations vary.
    • https://msdn.microsoft.com/en-us/library/20bw873z(v=vs.110).aspx#SupportedNamedBlocks - Here is a reference of all of the named Unicode blocks that .NET supports in Regex. Linked here because I told you I would : )
    • http://www.regular-expressions.info/refflavors.html - This is a very comprehensive reference for many common Regex engines. Some content may be out of date as new versions of each platform are released.
    • http://www.regexplanet.com/ - An online pattern tester. Not the best interface, but very capable and has some nice features.