up with 10 ways of matching your assigned regex? (Try regex101.com if you aren’t sure what will match.)
• One or more x’s
• Zero or more x’s
• Two x’s?
Note: for this game, you can only have two x’s in each regex.
form
\D  Not digit  [^0-9]
\S  Not whitespace  [^\s]
\W  Not word char  [^a-zA-Z0-9_]
[1-3[x-z]]  Union  [1-3x-z]
[[m-p]&&[l-n]]  Intersection  [mn]
[m-p&&[^o]]  Subtraction  [mnp]
Clarity Understanding
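The character class rows above can be checked with a short sketch. The class name `CharClassDemo`, the helper `keep`, and the sample inputs are made up for illustration; only `Pattern` and `Matcher` come from the JDK.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // Returns only the characters of input that match the given character class.
    static String keep(String charClass, String input) {
        var result = new StringBuilder();
        var matcher = Pattern.compile(charClass).matcher(input);
        while (matcher.find()) {
            result.append(matcher.group());
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(keep("[1-3[x-z]]", "0123wxyz"));    // union: 123xyz
        System.out.println(keep("[[m-p]&&[l-n]]", "klmnopq")); // intersection: mn
        System.out.println(keep("[m-p&&[^o]]", "klmnopq"));    // subtraction: mnp
        System.out.println(keep("\\W", "a_b-c"));              // underscore is a word char: -
    }
}
```

The last call shows why the expanded form of \W needs the underscore: only the dash is reported as a non-word character.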
pattern = Pattern.compile(
    "[0-9]{3}-[0-9]{3}-[0-9]{4}");
var matcher = pattern.matcher(phoneNumbers);
while (matcher.find())
    System.out.println(matcher.group());
Match a Pattern
You promised readable regex!
areaCodeGroup = "(\\d{3})";
var threeDigits = "\\d{3}";
var fourDigits = "\\d{4}";
var dash = "-";
var regex = areaCodeGroup + dash + threeDigits + dash + fourDigits;
var pattern = Pattern.compile(regex);
var matcher = pattern.matcher(phoneNumbers);
while (matcher.find())
    System.out.println(matcher.group(1));
Groups
var pattern = Pattern.compile(regex);
var matcher = pattern.matcher(numbers);
while (matcher.find()) {
    System.out.format("%s %s ", matcher.group(), matcher.group(0));
    System.out.format("%s %s ", matcher.group(1), matcher.group(2));
    System.out.format("%s %s", matcher.group(3), matcher.group(4));
}
What is the output?
012 012 01 0
IndexOutOfBoundsException: No group 4
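Group numbering can be sketched as follows. The regex `((a)(b))(c)` and the names `GroupNumberingDemo` and `describe` are illustrative assumptions, not the slide's own example. Groups are numbered by the position of their opening parenthesis, group 0 is the whole match, and `groupCount()` tells you the highest valid index.

```java
import java.util.regex.Pattern;

class GroupNumberingDemo {
    // Lists every group of the first match as "index=value", starting with group 0.
    static String describe(String regex, String input) {
        var matcher = Pattern.compile(regex).matcher(input);
        if (!matcher.find()) {
            return "no match";
        }
        var out = new StringBuilder();
        for (int i = 0; i <= matcher.groupCount(); i++) {
            out.append(i).append('=').append(matcher.group(i)).append(' ');
        }
        return out.toString().strip();
    }

    public static void main(String[] args) {
        // Prints: 0=abc 1=ab 2=a 3=b 4=c
        System.out.println(describe("((a)(b))(c)", "abc"));

        var matcher = Pattern.compile("((a)(b))(c)").matcher("abc");
        matcher.find();
        try {
            matcher.group(5); // one past groupCount()
        } catch (IndexOutOfBoundsException e) {
            System.out.println("No such group");
        }
    }
}
```

Looping up to `groupCount()` avoids the exception entirely.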
= "(?<areaCode>\\d{3})";
var threeDigits = "\\d{3}";
var fourDigits = "\\d{4}";
var dash = "-";
var regex = areaCodeGroup + dash + threeDigits + dash + fourDigits;
var pattern = Pattern.compile(regex);
var matcher = pattern.matcher(phoneNumbers);
while (matcher.find())
    System.out.println(
        matcher.group("areaCode"));
Named Capturing Groups
string.replaceAll("^\\w+", "");
string = string.replaceAll("\\w+$", "");
string = string.strip();
System.out.println(string);
What does this print?
High
= "\\b\\w+";
string = string.replaceAll(
    boundaryAndWord, "");
string = string.strip();
System.out.println(string);
What about now?
Blank. There is a word boundary at the start of the string and after each space, so every word matches and is removed.
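A side-by-side sketch of anchors versus word boundaries. The input "one two three" and the names `BoundaryDemo`, `removeFirstWord`, `removeLastWord`, and `removeEveryWord` are assumptions for illustration, since the slide's own string is not shown here.

```java
class BoundaryDemo {
    // ^ anchors at the start of the input, so only the first word is removed.
    static String removeFirstWord(String s) {
        return s.replaceAll("^\\w+", "").strip();
    }

    // $ anchors at the end of the input, so only the last word is removed.
    static String removeLastWord(String s) {
        return s.replaceAll("\\w+$", "").strip();
    }

    // \b matches before every word, so every word is removed.
    static String removeEveryWord(String s) {
        return s.replaceAll("\\b\\w+", "").strip();
    }

    public static void main(String[] args) {
        var text = "one two three";
        System.out.println(removeFirstWord(text));             // two three
        System.out.println(removeLastWord(text));              // one two
        System.out.println("[" + removeEveryWord(text) + "]"); // []
    }
}
```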
insensitive ASCII
(?m)  MULTILINE  ^ and $ also match at line breaks
(?s)  DOTALL  . also matches line breaks
(?d)  UNIX_LINES  Only \n counts as a line break
(?x)  COMMENTS  Ignores whitespace and # to end of line
+ Unicode ones
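A small sketch of how the flags change matching, using both the inline form and the `Pattern` constants. The class `FlagDemo`, the helper `count`, and the sample text are assumptions for illustration; `Matcher.results()` needs Java 9 or later.

```java
import java.util.regex.Pattern;

class FlagDemo {
    // Counts how many times the pattern is found in the input.
    static long count(Pattern pattern, String input) {
        return pattern.matcher(input).results().count();
    }

    public static void main(String[] args) {
        var text = "one\nTwo\nONE";

        // No flags: ^ and $ only match at the ends of the whole input.
        System.out.println(count(Pattern.compile("^one$"), text));      // 0
        // MULTILINE: ^ and $ also match around each line break.
        System.out.println(count(Pattern.compile("(?m)^one$"), text));  // 1
        // Inline flags can be combined.
        System.out.println(count(Pattern.compile("(?mi)^one$"), text)); // 2
        // Equivalent using constants instead of inline flags.
        System.out.println(count(Pattern.compile("^one$",
                Pattern.MULTILINE | Pattern.CASE_INSENSITIVE), text));  // 2

        // DOTALL: . also matches a line break.
        System.out.println(Pattern.compile("a.b").matcher("a\nb").matches());     // false
        System.out.println(Pattern.compile("(?s)a.b").matcher("a\nb").matches()); // true
    }
}
```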
var regex = fiveDigits + optionalFourDigitSuffix;
var pattern = Pattern.compile(regex);

var regex = """
    \\d{5}      # five digits
    (-\\d{4})?  # optional four digits
    """;
var pattern = Pattern.compile(regex, Pattern.COMMENTS);
Comments
Which is more readable? When would the other be?
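To confirm that the commented form behaves like a compact one, here is a minimal sketch. The names `CommentsDemo`, `COMPACT`, `COMMENTED`, and `isZip` are made up for illustration, and the text block requires Java 15 or later.

```java
import java.util.regex.Pattern;

class CommentsDemo {
    // Compact form: everything on one line, no explanation.
    static final Pattern COMPACT = Pattern.compile("\\d{5}(-\\d{4})?");

    // COMMENTS form: whitespace and # comments inside the regex are ignored.
    static final Pattern COMMENTED = Pattern.compile("""
            \\d{5}      # five digits
            (-\\d{4})?  # optional four digits
            """, Pattern.COMMENTS);

    static boolean isZip(String input) {
        return COMMENTED.matcher(input).matches();
    }

    public static void main(String[] args) {
        for (var candidate : new String[] {"12345", "12345-6789", "1234"}) {
            System.out.println(candidate + " "
                    + COMPACT.matcher(candidate).matches() + " "
                    + isZip(candidate));
        }
    }
}
```

One caveat with COMMENTS: a literal space or # in the pattern must be escaped, or it is silently ignored.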
text = "* -aa- -b- *";
var pattern = Pattern.compile("-([a-z]+)-");
var matcher = pattern.matcher(text);
var builder = new StringBuilder();
while (matcher.find())
    matcher.appendReplacement(
        builder, "x");
System.out.println(builder);
* x x
text = "* -aa- -b- *";
var pattern = Pattern.compile("-([a-z]+)-");
var matcher = pattern.matcher(text);
var builder = new StringBuilder();
while (matcher.find())
    matcher.appendReplacement(
        builder, "x");
matcher.appendTail(builder);
System.out.println(builder);
* x x *
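The find/appendReplacement/appendTail loop is most useful when the replacement depends on the match. A sketch, assuming we want to upper-case the captured letters; `AppendDemo` and both method names are hypothetical. The second method uses `Matcher.replaceAll(Function)` from Java 9, which handles the tail for you.

```java
import java.util.Locale;
import java.util.regex.Pattern;

class AppendDemo {
    private static final Pattern DASHED = Pattern.compile("-([a-z]+)-");

    // Manual loop: each replacement is computed from the current match.
    static String upperCaseDashed(String text) {
        var matcher = DASHED.matcher(text);
        var builder = new StringBuilder();
        while (matcher.find()) {
            // Group 1 holds only letters, so it contains no $ or \ to escape.
            matcher.appendReplacement(builder, matcher.group(1).toUpperCase(Locale.ROOT));
        }
        matcher.appendTail(builder); // without this, the trailing " *" is lost
        return builder.toString();
    }

    // Same result with the functional overload.
    static String upperCaseDashedFunctional(String text) {
        return DASHED.matcher(text)
                .replaceAll(match -> match.group(1).toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(upperCaseDashed("* -aa- -b- *"));           // * AA B *
        System.out.println(upperCaseDashedFunctional("* -aa- -b- *")); // * AA B *
    }
}
```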
"* -aa- -b- *"; var pattern = Pattern.compile("-([a-z]+)-"); var matcher = pattern.matcher(text); var builder = new StringBuilder(); while(matcher.find()) matcher.appendReplacement( builder, "$"); System.out.println(builder); IllegalArgumentException: Illegal group reference: group index is missing
*";
var pattern = Pattern.compile("-([a-z]+)-");
var matcher = pattern.matcher(text);
var builder = new StringBuilder();
while (matcher.find()) {
    var replace = Matcher.quoteReplacement("$");
    matcher.appendReplacement(builder, replace);
}
System.out.println(builder);
* $ $
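The same rule applies to `String.replaceAll`: in a replacement string, `$` starts a group reference unless it is quoted. A minimal sketch with made-up names (`QuoteReplacementDemo`, `swap`, `prefixDollar`) and made-up inputs:

```java
import java.util.regex.Matcher;

class QuoteReplacementDemo {
    // $1 and $2 are group references in the replacement string.
    static String swap(String text) {
        return text.replaceAll("(\\w+)-(\\w+)", "$2-$1");
    }

    // A quoted literal dollar sign followed by a reference to the whole match ($0).
    static String prefixDollar(String text) {
        return text.replaceAll("\\d+", Matcher.quoteReplacement("$") + "$0");
    }

    public static void main(String[] args) {
        System.out.println(swap("left-right"));     // right-left
        System.out.println(prefixDollar("cost 5")); // cost $5
        try {
            "cost 5".replaceAll("\\d+", "$"); // unquoted $ with no group index
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Quoting matters most when the replacement comes from user input, where a stray `$` or backslash is easy to miss.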
fish red fish blue fish";
var regex = "\\w+ fish(?! blue)";
var pattern = Pattern.compile(regex);
var matcher = pattern.matcher(text);
while (matcher.find())
    System.out.println(matcher.group());
1 fish
2 fish
blue fish
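Lookaheads check what follows without consuming it. The sketch below compares the negative lookahead with a positive one. The input string is an assumption chosen to resemble the slide's, and `LookaheadDemo` and `findAll` are hypothetical names.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class LookaheadDemo {
    // Collects every match of regex in text.
    static List<String> findAll(String regex, String text) {
        return Pattern.compile(regex).matcher(text).results()
                .map(MatchResult::group)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        var text = "1 fish 2 fish red fish blue fish";

        // Negative lookahead: "<word> fish" NOT followed by " blue".
        System.out.println(findAll("\\w+ fish(?! blue)", text)); // [1 fish, 2 fish, blue fish]

        // Positive lookahead: a word followed by " fish"; " fish" is not part of the match.
        System.out.println(findAll("\\w+(?= fish)", text));      // [1, 2, red, blue]
    }
}
```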
I am SAM. Sam i am";
var pattern = Pattern.compile("(?i)sam");
var matcher = pattern.matcher(text);
while (matcher.find())
    System.out.println(matcher.start() + "-" + matcher.end());
5-8
15-18
20-23
(?:(?:16|[2468][048]|[3579][26])00))))$|^(?:0?[1-9]|1\\d| 2[0-8])(\\/|-|\\.)(?:(?:0?[1-9])|(?:1[0-2]))\\4(?: (?:1[6-9]|[2-9]\\d)?\\d{2})$")) { handleDate(dateString); } Too complicated I draw the line way before this :)
Incorrectly assumes Unicode Grapheme Cluster is one code point.
Fix: "cc̈d̈d".replaceAll("c̈|d̈", "X");

Pattern.compile("söme pättern", Pattern.CASE_INSENSITIVE);
By default, case insensitive is ASCII only.
Fix: Pattern.compile("söme pättern", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

Pattern p = Pattern.compile("é|ë|è");
Could be code point or cluster.
Fix: Pattern p = Pattern.compile("é|ë|è", Pattern.CANON_EQ);
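A sketch of what the two flag fixes change. Unicode escapes are used so the example does not depend on the source file encoding. `UnicodeFlagsDemo` and `matches` are hypothetical names, and the CANON_EQ case follows the direction given in the `Pattern` Javadoc (decomposed pattern, precomposed input).

```java
import java.util.regex.Pattern;

class UnicodeFlagsDemo {
    static boolean matches(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String lowerE = "\u00e9";      // precomposed lower-case e with acute
        String upperE = "\u00c9";      // precomposed upper-case E with acute
        String decomposed = "e\u0301"; // plain e followed by a combining acute accent

        // CASE_INSENSITIVE alone only folds ASCII letters.
        System.out.println(matches(lowerE, Pattern.CASE_INSENSITIVE, upperE)); // false
        System.out.println(matches(lowerE,
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE, upperE));     // true

        // Without CANON_EQ, the two encodings of the same letter do not match.
        System.out.println(matches(decomposed, 0, lowerE));                    // false
        System.out.println(matches(decomposed, Pattern.CANON_EQ, lowerE));     // true
    }
}
```

Both flags add matching overhead, which is presumably why they are off by default.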
val wordBoundary = "\\b"
val threeOrFourChars = "\\w{3,4}"
val space = " "
val regex = Regex(wordBoundary + threeOrFourChars + space)
println(regex.replaceFirst(text, "_"))
Kotlin
_had a little lamb.
little lamb."
val wordBoundary = """\b"""
val threeOrFourChars = """\w{3,4}"""
val space = " "
val regex = new Regex(wordBoundary + threeOrFourChars + space)
println(regex replaceFirstIn(text, "_"))
Scala
_had a little lamb.
"Mary had a little lamb.") (def wordBoundary "\\b") (def threeOrFourChars "\\w{3,4}") (def space " ") (def regex (str wordBoundary threeOrFourChars space)) (def pat (re-pattern regex)) (println(clojure.string/replace-first text pat "_"))) (Replacer) Clojure 78 _had a little lamb.