
Readable Regular Expressions


Jeanne Boyarsky

June 24, 2023


Transcript

  1. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky 1
    Readable Regular Expressions
    Jeanne Boyarsky


    Friday, June 23rd 2023


    KCDC


    speakerdeck.com/boyarsky



  2. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    Pause for a Commercial
    2
    Java certs: 8/11/17


  3. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky 3
    Readability
    note!


  4. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky 4
    Intro


  5. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    Why learning regex matters
    5
    Anyone see what is wrong?
    Match all version 1 TPR reports


  6. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    Why learning regex matters
    6
    Uh oh :(


  7. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    Why learning regex matters
    7
    Whew


  8. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    History
    8
    1943 Patterns in neuroscience
    1951 Stephen Kleene describing neural networks
    1960s Pattern matching in text editors, lexical parsing in compilers
    1980s Perl
    2002 Java 1.4 - regex in core Java


  9. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    Greedy Quantifiers
    9
    Symbol # j’s?
    j 1
    j? 0-1
    j* 0 or more
    j+ 1 or more
    j{5} 5
    j{5,6} 5-6
    j{5,} 5 or more


  10. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    Puzzle time
    10
    Can you come up with 10 ways of matching your assigned regex?
    • One or more x’s
    Note: for this game, you can only have two x’s in each regex


  11. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    Sample solutions
    11
    One or more
    • x+
    • x{1,}
    • xx*
    • x{1}x*
    • x{1,1}x*
    • x{1}x{0,}
    • x{1,1}x{0,}
    • xx{0,}
    • x{0,0}x{1,}
    • (x|x)+
    • etc
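A quick way to check candidate answers to the puzzle is to run them all against the same inputs. The sketch below is not from the talk: the class name OneOrMoreX and the helper allAgree are hypothetical. The ninth pattern is written as x{0,0}x{1,}, since a comma between the two parts would be matched literally.

```java
import java.util.List;
import java.util.regex.Pattern;

public class OneOrMoreX {
    // Candidate patterns; each should accept one or more x's and nothing else
    static final List<String> REGEXES = List.of(
        "x+", "x{1,}", "xx*", "x{1}x*", "x{1,1}x*",
        "x{1}x{0,}", "x{1,1}x{0,}", "xx{0,}", "x{0,0}x{1,}", "(x|x)+");

    // True only if every pattern gives the expected answer for the input
    static boolean allAgree(String input, boolean expected) {
        return REGEXES.stream()
            .allMatch(regex -> Pattern.matches(regex, input) == expected);
    }

    public static void main(String[] args) {
        System.out.println(allAgree("x", true));     // true
        System.out.println(allAgree("xxxx", true));  // true
        System.out.println(allAgree("", false));     // true
        System.out.println(allAgree("xy", false));   // true
    }
}
```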



  12. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    Common Character Classes
    12
    Regex Matches
    [123] Any of 1, 2 or 3
    [1-3] Any of 1, 2 or 3
    [^5] Any character but “5”
    [a-zA-Z] Letter
    \d Digit
    \s Whitespace
    \w Word character (letter, digit or underscore)


  13. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    Less Common Character Classes
    13
    Regex Matches Longer form
    \D Not digit [^0-9]
    \S Not whitespace [^\s]
    \W Not word char [^a-zA-Z0-9_]
    [1-3[x-z]] Union [1-3x-z]
    [[m-p]&&[l-n]] Intersection [mn]
    [m-p&&[^o]] Subtraction [mnp]
    Clarity


    Understanding
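The union, intersection and subtraction forms are easier to trust after seeing which characters each one accepts. This is a minimal sketch, assuming a hypothetical helper named accepted that tests each candidate character on its own; it is not code from the talk.

```java
import java.util.regex.Pattern;

public class CharClassOps {
    // Returns the characters of candidates that the one-character regex accepts
    static String accepted(String regex, String candidates) {
        var pattern = Pattern.compile(regex);
        var result = new StringBuilder();
        for (char c : candidates.toCharArray()) {
            if (pattern.matcher(String.valueOf(c)).matches()) {
                result.append(c);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        var candidates = "123lmnopxyz";
        System.out.println(accepted("[1-3[x-z]]", candidates));     // 123xyz (union)
        System.out.println(accepted("[[m-p]&&[l-n]]", candidates)); // mn (intersection)
        System.out.println(accepted("[m-p&&[^o]]", candidates));    // mnp (subtraction)
    }
}
```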


  14. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    Know your use case
    14
    Runtime generated classes?
    [[m-p]&&[l-n]] Rarely clearer


  15. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    Know your team
    15
    Team knows what \D means?
    \D vs [^0-9] Misread \d vs \\D?


  16. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky 16
    Conclusion:
    Don’t be clever!


  17. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    Use case
    17
    Doing by hand likely faster…
    Do 10 replaces one time


  18. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky 18
    Time for Java


  19. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    var phoneNumbers = """
    111-111-1111
    222-222-2222
    """;
    var pattern = Pattern.compile(
    "[0-9]{3}-[0-9]{3}-[0-9]{4}");
    var matcher = pattern.matcher(phoneNumbers);
    while (matcher.find())
    System.out.println(matcher.group());
    Match a Pattern
    19
    You promised readable regex!


  20. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    var phoneNumbers = """
    111-111-1111
    222-222-2222
    """;
    var threeDigits = "[0-9]{3}";
    var fourDigits = "[0-9]{4}";
    var dash = "-";
    var regex = threeDigits + dash
    + threeDigits + dash + fourDigits;
    var pattern = Pattern.compile(regex);
    var matcher = pattern.matcher(phoneNumbers);
    while (matcher.find())
    System.out.println(matcher.group());
    Refactored
    20


  21. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    var phoneNumbers = """
    111-111-1111
    222-222-2222
    """;
    var threeDigits = "\\d{3}";
    var fourDigits = "\\d{4}";
    var dash = "-";
    var regex = threeDigits + dash
    + threeDigits + dash + fourDigits;
    var pattern = Pattern.compile(regex);
    var matcher = pattern.matcher(phoneNumbers);
    while (matcher.find())
    System.out.println(matcher.group());
    Escaping
    21


  22. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    var phoneNumbers = """
    111-111-1111
    222-222-2222
    """;
    var areaCodeGroup = "(\\d{3})";
    var threeDigits = "\\d{3}";
    var fourDigits = "\\d{4}";
    var dash = "-";
    var regex = areaCodeGroup + dash
    + threeDigits + dash + fourDigits;
    var pattern = Pattern.compile(regex);
    var matcher = pattern.matcher(phoneNumbers);
    while (matcher.find())
    System.out.println(matcher.group(1));
    Groups
    22


  23. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    var numbers = "012";
    var regex = "((\\d)(\\d))\\d";
    var pattern = Pattern.compile(regex);
    var matcher = pattern.matcher(numbers);
    while (matcher.find()) {
    System.out.format("%s %s ",
    matcher.group(), matcher.group(0));
    System.out.format("%s %s ",
    matcher.group(1), matcher.group(2));
    System.out.format("%s %s",
    matcher.group(3), matcher.group(4));
    }
    What is the output?
    23
    012 012 01 0


    Index out of bounds: no group 4
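One way to avoid asking for a group that does not exist is to loop up to groupCount(), which excludes group 0. The following is a sketch under that assumption; the class name GroupCountCheck and the helper describeGroups are hypothetical and not part of the talk.

```java
import java.util.regex.Pattern;

public class GroupCountCheck {
    // Collects group 0 (the whole match) plus every numbered group that exists
    static String describeGroups(String regex, String input) {
        var matcher = Pattern.compile(regex).matcher(input);
        var result = new StringBuilder();
        if (matcher.find()) {
            // groupCount() excludes group 0, so valid indexes are 0..groupCount()
            for (int i = 0; i <= matcher.groupCount(); i++) {
                result.append(i).append('=').append(matcher.group(i)).append(' ');
            }
        }
        return result.toString().strip();
    }

    public static void main(String[] args) {
        // prints: 0=012 1=01 2=0 3=1
        System.out.println(describeGroups("((\\d)(\\d))\\d", "012"));
    }
}
```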


  24. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    Named capturing groups
    24
    Group 2 was what now?
    (?<name>regex)


  25. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    var phoneNumbers = """
    111-111-1111
    """;
    var areaCodeGroup = "(?<areaCode>\\d{3})";
    var threeDigits = "\\d{3}";
    var fourDigits = "\\d{4}";
    var dash = "-";
    var regex = areaCodeGroup + dash
    + threeDigits + dash + fourDigits;
    var pattern = Pattern.compile(regex);
    var matcher = pattern.matcher(phoneNumbers);
    while (matcher.find())
    System.out.println(
    matcher.group("areaCode"));
    Named Capturing Groups
    25


  26. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    var string = "Elevation high";
    var regex = "[a-zA-Z ]+";
    System.out.println(
    string.matches(regex));
    Exact match
    26
    That’s a lot of ceremony!


  27. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    var before = "123 Sesame Street";
    var after = before.replaceAll("\\d", "");
    System.out.println(after);
    Replace
    27
    Now THAT is easy to read


  28. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    var string = "Mile High City";
    string = string.replaceAll("^\\w+", "");
    string = string.replaceAll("\\w+$", "");
    string = string.strip();
    System.out.println(string);
    What does this print?
    28
    High


  29. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    29
    With readability this time?
    var string = "Mile High City";
    var firstWord = "^\\w+";
    var lastWord = "\\w+$";
    string = string.replaceAll(firstWord, "");
    string = string.replaceAll(lastWord, "");
    string = string.strip();
    System.out.println(string);


  30. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    var string = "Mile High City";
    var boundaryAndWord = "\\b\\w+";
    string = string.replaceAll(
    boundaryAndWord, "");
    string = string.strip();
    System.out.println(string);
    What about now?
    30
    Blank. Both start of string and spaces are boundaries


  31. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    What does this print?
    31
    var text = "\\___/";
    var regex = "\\_.*/";
    System.out.println(text.matches(regex));
    false


    Need four backslashes in the regex to print true.
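The backslash count can be checked directly. This is a sketch rather than code from the talk; the class name BackslashEscaping and the helper textMatches are hypothetical. It also shows Pattern.quote as one possible way to avoid counting backslashes by hand.

```java
import java.util.regex.Pattern;

public class BackslashEscaping {
    static final String TEXT = "\\___/";  // five characters: \ _ _ _ /

    static boolean textMatches(String regex) {
        return TEXT.matches(regex);
    }

    public static void main(String[] args) {
        // Two backslashes in source become the regex \_ (an escaped underscore)
        System.out.println(textMatches("\\_.*/"));                      // false
        // Four backslashes in source become the regex \\ (a literal backslash)
        System.out.println(textMatches("\\\\_.*/"));                    // true
        // Pattern.quote treats its argument as literal text
        System.out.println(textMatches(Pattern.quote("\\_") + ".*/"));  // true
    }
}
```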


  32. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky 32
    Beyond the Basics


  33. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    Flags
    33
    Flag Name Purpose
    (?i) CASE_INSENSITIVE Case insensitive (ASCII)
    (?m) MULTILINE ^ and $ match at line breaks
    (?s) DOTALL . matches line breaks
    (?d) UNIX_LINES Only \n counts as a line break
    (?x) COMMENTS Ignores whitespace and # to end of line
    + Unicode ones


  34. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    // Option 1: named fragments
    var fiveDigits = "\\d{5}";
    var optionalFourDigitSuffix = "(-\\d{4})?";
    var regex = fiveDigits + optionalFourDigitSuffix;
    var pattern = Pattern.compile(regex);
    // Option 2: inline comments (an alternative to option 1)
    var regex = """
    \\d{5} # five digits
    (-\\d{4})? # optional four digits
    """;
    var pattern = Pattern.compile(regex, Pattern.COMMENTS);
    Comments
    34
    Which is more readable?
    When would the other be?
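To compare the two styles side by side, both can be compiled in one class and run on the same inputs. This is a minimal sketch assuming Java 15 or later for text blocks; the class name ZipCodeComments and both helper methods are hypothetical.

```java
import java.util.regex.Pattern;

public class ZipCodeComments {
    // Style 1: named fragments concatenated into one regex
    static final String FIVE_DIGITS = "\\d{5}";
    static final String OPTIONAL_FOUR_DIGIT_SUFFIX = "(-\\d{4})?";
    static final Pattern CONCATENATED =
        Pattern.compile(FIVE_DIGITS + OPTIONAL_FOUR_DIGIT_SUFFIX);

    // Style 2: COMMENTS mode ignores whitespace and # to the end of the line
    static final Pattern COMMENTED = Pattern.compile("""
        \\d{5}      # five digits
        (-\\d{4})?  # optional four digits
        """, Pattern.COMMENTS);

    static boolean matchesConcatenated(String zip) {
        return CONCATENATED.matcher(zip).matches();
    }

    static boolean matchesCommented(String zip) {
        return COMMENTED.matcher(zip).matches();
    }

    public static void main(String[] args) {
        for (var zip : new String[] {"12345", "12345-6789", "1234"}) {
            System.out.println(zip + " " + matchesConcatenated(zip)
                + " " + matchesCommented(zip));
        }
    }
}
```

Because COMMENTS mode ignores whitespace, a literal space in the second style would need to be escaped or placed in a character class.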


  35. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    var html = """
    <html>
    <body>
    Ready!
    </body>
    </html>
    """;
    var body = html.replaceFirst("(?s)^.*<body>", "")
    .replaceFirst("(?s)</body>.*$", "")
    .strip();
    System.out.println(body);
    Embedding Flag
    35
    Ready!


  36. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    36
    So I have to say what I don’t want?
    var regex = "(?s).*<body>(.*)</body>.*";
    var body = html
    .replaceFirst(regex, "$1")
    .strip();
    System.out.println(body);
    Ready!


  37. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    37
    Huh?
    var dotAllMode = "(?s)";
    var anyChars = ".*";
    var captureAnyChars = "(.*)";
    var startBody = "<body>";
    var endBody = "</body>";
    var bodyPart = startBody
    + captureAnyChars + endBody;
    var regex = dotAllMode + anyChars
    + bodyPart + anyChars;
    var body = html.replaceFirst(regex, "$1")
    .strip();
    System.out.println(body);
    Ready!


  38. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    Where did the close * go?
    38
    var text = "* -aa- -b- *";
    var pattern =
    Pattern.compile("-([a-z]+)-");
    var matcher = pattern.matcher(text);
    var builder = new StringBuilder();
    while(matcher.find())
    matcher.appendReplacement(
    builder, "x");
    System.out.println(builder);
    * x x


  39. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    Where did the close * go?
    39
    var text = "* -aa- -b- *";
    var pattern =
    Pattern.compile("-([a-z]+)-");
    var matcher = pattern.matcher(text);
    var builder = new StringBuilder();
    while(matcher.find())
    matcher.appendReplacement(
    builder, "x");
    matcher.appendTail(builder);
    System.out.println(builder);
    * x x *


  40. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    What does this do?
    40
    var text = "* -aa- -b- *";
    var pattern =
    Pattern.compile("-([a-z]+)-");
    var matcher = pattern.matcher(text);
    var builder = new StringBuilder();
    while(matcher.find())
    matcher.appendReplacement(
    builder, "$");
    System.out.println(builder);
    IllegalArgumentException: Illegal group reference:
    group index is missing


  41. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    Fix
    41
    var text = "* -aa- -b- *";
    var pattern =
    Pattern.compile("-([a-z]+)-");
    var matcher = pattern.matcher(text);
    var builder = new StringBuilder();
    while(matcher.find()) {
    var replace = Matcher.quoteReplacement("$");
    matcher.appendReplacement(builder, replace);
    }
    System.out.println(builder);
    * $ $


  42. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    Quantifier Types
    42
    Sample Type Description
    z? Greedy Read whole string and backtrack
    z?? Reluctant Look at one character at a time
    z?+ Possessive Read whole string/never backtrack


  43. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    Comparing
    43
    var text = "Poem: row row row your boat";
    System.out.println(
    text.matches(".*(row )+your boat"));
    System.out.println(
    text.matches(".*?(row )+your boat"));
    System.out.println(
    text.matches(".*+(row )+your boat"));
    true (extra backtracking)


    true (faster)


    false
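Capturing the leading .* makes the difference between the three quantifier types visible. The sketch below is an illustration, not code from the talk: the class name QuantifierTypes and the helper prefix are hypothetical, and the only change to the patterns is the added capturing group.

```java
import java.util.regex.Pattern;

public class QuantifierTypes {
    static final String TEXT = "Poem: row row row your boat";

    // Returns what group 1 captured in brackets, or "no match"
    static String prefix(String regex) {
        var matcher = Pattern.compile(regex).matcher(TEXT);
        return matcher.matches() ? "[" + matcher.group(1) + "]" : "no match";
    }

    public static void main(String[] args) {
        // Greedy: takes everything, then gives back until the rest can match
        System.out.println(prefix("(.*)(row )+your boat"));   // [Poem: row row ]
        // Reluctant: takes as little as possible before the rest can match
        System.out.println(prefix("(.*?)(row )+your boat"));  // [Poem: ]
        // Possessive: takes everything and never gives any back
        System.out.println(prefix("(.*+)(row )+your boat"));  // no match
    }
}
```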


  44. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    Lookahead/behind
    44
    Sample Type
    (?=r) Positive lookahead
    (?!r) Negative lookahead
    (?<=r) Positive lookbehind
    (?<!r) Negative lookbehind


  45. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    Looking
    45
    var text =
    "1 fish 2 fish red fish blue fish";
    var regex = "\\w+ fish(?! blue)";
    var pattern = Pattern.compile(regex);
    var matcher = pattern.matcher(text);
    while (matcher.find())
    System.out.println(matcher.group());
    1 fish


    2 fish


    blue fish
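The example above uses a negative lookahead. For comparison, here is a sketch that runs one pattern of each lookaround type against the same text. The class name LookaroundDemo, the helper findAll and the three extra patterns are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class LookaroundDemo {
    static final String TEXT = "1 fish 2 fish red fish blue fish";

    // Collects every match of the regex in TEXT, in order
    static List<String> findAll(String regex) {
        var matcher = Pattern.compile(regex).matcher(TEXT);
        var matches = new ArrayList<String>();
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Negative lookahead: a "fish" not followed by " blue"
        System.out.println(findAll("\\w+ fish(?! blue)")); // [1 fish, 2 fish, blue fish]
        // Positive lookahead: a word followed by " fish"
        System.out.println(findAll("\\w+(?= fish)"));      // [1, 2, red, blue]
        // Positive lookbehind: a word preceded by "red "
        System.out.println(findAll("(?<=red )\\w+"));      // [fish]
        // Negative lookbehind: "fish" not preceded by a digit and a space
        System.out.println(findAll("(?<!\\d )fish"));      // [fish, fish]
    }
}
```

Lookarounds check the surrounding text without consuming it, which is why the matched strings never include the lookahead or lookbehind part.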


  46. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    Indexes
    46
    var text = "i am sam. I am SAM. Sam i am";
    var pattern = Pattern.compile("(?i)sam");
    var matcher = pattern.matcher(text);
    while (matcher.find())
    System.out.println(matcher.start()
    + "-" + matcher.end());
    5-8


    15-18


    20-23


  47. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    Debugging Tip
    47
    Small regex and build up
    Online regex checker


  48. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky 48
    Sonar 9 Highlights


  49. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    What’s wrong?
    49
    "[ab]|a"
    Redundant. a is a subset of [ab]


  50. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    What’s wrong?
    50
    changed = changed.
    replaceAll("\\.\\.\\.", ";")
    Performance. No need for a regex


    changed = changed.replace("...", ";");


  51. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    What’s wrong?
    51
    Pattern regex =
    Pattern.compile("myRegex");
    Matcher matcher =
    regex.matcher("s");
    Performance since not static pattern
    Readability tradeoff
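One common way to address this kind of finding is to compile the pattern once into a static constant and reuse it. This is a sketch of that idea, not necessarily the exact fix the tool suggests; the class name PrecompiledPattern, the constant PHONE and the phone-number regex are assumptions.

```java
import java.util.regex.Pattern;

public class PrecompiledPattern {
    // Compiled once when the class loads, then reused by every call
    private static final Pattern PHONE =
        Pattern.compile("\\d{3}-\\d{3}-\\d{4}");

    static boolean isPhoneNumber(String candidate) {
        // Only a new Matcher is created per call; the regex is not recompiled
        return PHONE.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(isPhoneNumber("111-111-1111")); // true
        System.out.println(isPhoneNumber("111-1111"));     // false
    }
}
```

The tradeoff is that the regex now lives away from the code that uses it, so a descriptive constant name matters.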


  52. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    What’s wrong?
    52
    <.+?>
    Needless backtracking. Instead can write:


    <[^>]+>


  53. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    What’s wrong?
    53
    if (dateString.matches("^(?:(?:31(\\/|-|\\.)(?:0?[13578]|
    1[02]))\\1|(?:(?:29|30)(\\/|-|\\.)(?:0?[13-9]|1[0-2])\\2))
    (?:(?:1[6-9]|[2-9]\\d)?\\d{2})$|^(?:29(\\/|-|\\.)0?2\\3(?:
    (?:(?:1[6-9]|[2-9]\\d)?(?:0[48]|[2468][048]|[13579][26])|
    (?:(?:16|[2468][048]|[3579][26])00))))$|^(?:0?[1-9]|1\\d|
    2[0-8])(\\/|-|\\.)(?:(?:0?[1-9])|(?:1[0-2]))\\4(?:
    (?:1[6-9]|[2-9]\\d)?\\d{2})$")) {
    handleDate(dateString);
    }
    Too complicated
    I draw the line way before this :)


  54. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    What’s wrong?
    54
    Pattern.compile("$[a-z]+^");
    Functionally wrong. Should be:


    Pattern.compile("^[a-z]+$");


  55. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    What’s wrong?
    55
    Pattern.compile("(?=a)b");
    If lookahead matches next character = a, it isn’t b


  56. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    What’s wrong?
    56
    Pattern.compile("[a-zA-Z]");
    Assumes only English characters


    Pattern.compile("\\p{IsAlphabetic}");
    a-z is clearer if that’s all you want


  57. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    What’s wrong?
    57
    String regex = request
    .getParameter("regex");
    String input = request
    .getParameter("input");
    return input.matches(regex);
    Denial of service opportunity. Need to validate


  58. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    What’s wrong?
    58
    Pattern.compile("(a|b)*");
    Backtracking can overflow stack on large strings. Vs


    Pattern.compile("[ab]*");


  59. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    What’s wrong?
    59
    Pattern.compile("(.|\n)*");
    Have dot itself match the line breaks. Better to use:


    Pattern.compile("(?s).*");


  60. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    What’s wrong?
    60
    Pattern.compile("(ab?)*");
    Possessive quantifiers disable backtracking


    Pattern.compile("(ab?)*+");


  61. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    What’s wrong?
    61
    str.matches("\\d*?")
    ? is redundant here and causes backtracking


    str.matches("\\d*")


  62. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    What’s wrong?
    62
    Pattern.compile("a++abc");
    Can’t match because ++ is possessive so no “a” left after


    Pattern.compile("aa++bc");


  63. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    Beyond English
    63
    Problem Reason/Fix

    Problem: "cc̈d̈d".replaceAll("[c̈d̈]", "X");
    Reason: Incorrectly assumes a Unicode grapheme cluster is one code point. Fix:
    "cc̈d̈d".replaceAll("c̈|d̈", "X");

    Problem: Pattern.compile("söme pättern", Pattern.CASE_INSENSITIVE);
    Reason: By default, case insensitive is ASCII only. Fix:
    Pattern.compile("söme pättern",
    Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

    Problem: Pattern p = Pattern.compile("é|ë|è");
    Reason: Could be code point or cluster. Fix:
    Pattern p = Pattern.compile("é|ë|è", Pattern.CANON_EQ);
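The case-insensitivity row can be verified with a short program. This is a sketch, assuming a hypothetical class UnicodeCaseDemo and helper matchesUpper; the strings are written with Unicode escapes only so the source file stays ASCII.

```java
import java.util.regex.Pattern;

public class UnicodeCaseDemo {
    // "söme pättern" in lower case and in upper case, written with escapes
    // (U+00F6, U+00E4 and their upper-case forms U+00D6, U+00C4)
    static final String LOWER = "s\u00f6me p\u00e4ttern";
    static final String UPPER = "S\u00d6ME P\u00c4TTERN";

    // Compiles the lower-case text as a regex and tests the upper-case text
    static boolean matchesUpper(int flags) {
        return Pattern.compile(LOWER, flags).matcher(UPPER).matches();
    }

    public static void main(String[] args) {
        // ASCII-only case folding: the accented letters do not match
        System.out.println(matchesUpper(Pattern.CASE_INSENSITIVE)); // false
        // Unicode-aware case folding: the whole string matches
        System.out.println(matchesUpper(
            Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE));      // true
    }
}
```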


  64. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky 64
    Other JVM Lang Sampling


  65. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    Use Cases
    65
    1. Find first match
    2. Find all matches
    3. Replace first match


  66. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    val text = "Mary had a little lamb"
    val regex = Regex("\\b\\w{3,4} ")
    print(regex.find(text)?.value)
    Kotlin
    66
    Mary


  67. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    val text = "Mary had a little lamb"
    val regex = "\\b\\w{3,4} ".toRegex()
    regex.findAll(text)
    .map { it.groupValues[0] }
    .forEach { print(it) }
    Kotlin
    67
    Mary had


  68. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    val text = "Mary had a little lamb."
    val wordBoundary = "\\b"
    val threeOrFourChars = "\\w{3,4}"
    val space = " "
    val regex = Regex(wordBoundary +
    threeOrFourChars + space)
    println(regex.replaceFirst(text, "_"))
    Kotlin
    68
    _had a little lamb.


  69. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    anyOf {
    string("hello")
    .digit()
    .word()
    .char('.')
    .char('#')
    }
    Kotlin - SuperExpressive
    69
    Justin Lee


    https://github.com/evanchooly/super-expressive


  70. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    val text = "Mary had a little lamb"
    val regex = """\b\w{3,4} """.r
    val optional = regex findFirstIn text
    println(optional.getOrElse("No Match"))
    Scala
    70
    Mary


  71. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    val text = "Mary had a little lamb."
    val regex = """\b\w{3,4} """.r
    val it = regex findAllIn text
    it foreach print
    Scala
    71
    Mary had


  72. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    import scala.util.matching.Regex
    val text = "Mary had a little lamb."
    val wordBoundary = """\b"""
    val threeOrFourChars = """\w{3,4}"""
    val space = " "
    val regex = new Regex(wordBoundary +
    threeOrFourChars + space)
    println(regex replaceFirstIn(text, "_"))
    Scala
    72
    _had a little lamb.


  73. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    def text = 'Mary had a little lamb'
    def regex = /\b\w{3,4} /
    def matcher = text =~ regex
    print matcher[0]
    Groovy
    73
    Mary


  74. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    def text = 'Mary had a little lamb'
    def regex = /\b\w{3,4} /
    def matcher = text =~ regex
    print matcher.findAll().join(' ')
    Groovy
    74
    Mary had


  75. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    def text = 'Mary had a little lamb.'
    def wordBoundary = "\\b"
    def threeOrFourChars = "\\w{3,4}"
    def space = " "
    def regex =
    /$wordBoundary$threeOrFourChars$space/
    println text.replaceFirst(regex) { it -> '_' }
    println text.replaceFirst(regex, '_')
    Groovy
    75
    _had a little lamb.


    _had a little lamb.


  76. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    (println(
    re-find #"\b\w{3,4} ",
    "Mary had a little lamb"))
    Clojure
    76
    Mary


  77. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    (println(
    re-seq #"\b\w{3,4} ",
    "Mary had a little lamb"))
    Clojure
    77
    (Mary had )


  78. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    (ns clojure.examples.example
    (:gen-class))
    (defn Replacer []
    (def text "Mary had a little lamb.")
    (def wordBoundary "\\b")
    (def threeOrFourChars "\\w{3,4}")
    (def space " ")
    (def regex (str wordBoundary
    threeOrFourChars space))
    (def pat (re-pattern regex))
    (println(clojure.string/replace-first
    text pat "_")))
    (Replacer)
    Clojure
    78
    _had a little lamb.


  79. twitter.com/jeanneboyarsky mastodon.social/@jeanneboyarsky
    For more reading
    79
