Dealing with /Regexion/

We’ve all been there. You want to parse a string with a bit o’ regex. You have to account for things like dashes and three-digit-numbers and words-that-start-with-capital-letters. So you cobble together some sketchy Stack Overflow results, toss ’em between those infamous forward slashes, and cross your fingers. Maybe 10% of the time you get what you want; the other 90% leaves you with an empty string or nil. Either way, your heart feels fragile, and you just need a hug.

This, my friends, is regexion. Like rejection, it hurts. Don’t feel bad, though; it happens to everyone. We love expressive syntax, so it’s hard not to see regex as inscrutable black magic. This talk provides context for why regex is worth the effort and dives into the advanced technique of capture groups. You will walk away with the tools and mindset to face regexion with courage and optimism.

tmikeschu

August 20, 2019
Transcript

  1. DEALING WITH /REGEXION/ Detroit Software Guild Mike Schutte August 20, 2019 1/81 — @tmikeschu
  2. 2/81

  3. > @tmikeschu [remaining bullets were emoji and did not survive extraction] 2/81
  9. FWBAT friendgineers will be able to... 3/81 — @tmikeschu

  10. 4/81 — @tmikeschu

  11. > Feel at peace with the craziness of regex 4/81 — @tmikeschu

  12. > Feel at peace with the craziness of regex > Have a few strategies for deciding when to use regex 4/81 — @tmikeschu

  13. > Feel at peace with the craziness of regex > Have a few strategies for deciding when to use regex > Be like "wow capture groups are amazing" 4/81 — @tmikeschu
  14. ROADMAP 5/81 — @tmikeschu

  15. ROADMAP > My soapbox: why regex should be messy 5/81 — @tmikeschu

  16. ROADMAP > My soapbox: why regex should be messy > Brief overview of regular expressions 5/81 — @tmikeschu

  17. ROADMAP > My soapbox: why regex should be messy > Brief overview of regular expressions > When to use regex 5/81 — @tmikeschu

  18. ROADMAP > My soapbox: why regex should be messy > Brief overview of regular expressions > When to use regex > Capture groups 5/81 — @tmikeschu
  19. Disclaimer 6/81 — @tmikeschu

  20. me using regex 7/81 — @tmikeschu

  21. HOW TO DEAL WITH /REGEXION/: 8/81 — @tmikeschu

  22. STRINGS AND LANGUAGE 9/81 — @tmikeschu

  23. STRINGS AND LANGUAGE > Typos 9/81 — @tmikeschu

  24. STRINGS AND LANGUAGE > Typos > Duplication 9/81 — @tmikeschu

  25. STRINGS AND LANGUAGE > Typos > Duplication > Patterns 9/81 — @tmikeschu

  26. STRINGS AND LANGUAGE > Typos > Duplication > Patterns > Formats 9/81 — @tmikeschu

  27. STRINGS AND LANGUAGE > Typos > Duplication > Patterns > Formats > Repetition 9/81 — @tmikeschu
  28. ACCEPT IT 10/81 — @tmikeschu

  29. ACCEPT IT > Regex is messy 10/81 — @tmikeschu

  30. ACCEPT IT > Regex is messy > because string data is messy 10/81 — @tmikeschu

  31. ACCEPT IT > Regex is messy > because string data is messy > because language is messy 10/81 — @tmikeschu
  32. REGULAR EXPRESSIONS: AN OVERVIEW 11/81 — @tmikeschu

  33. REGULAR EXPRESSIONS: AN OVERVIEW > Born in 1951 11/81 — @tmikeschu

  34. REGULAR EXPRESSIONS: AN OVERVIEW > Born in 1951 > Popularized in 1968 11/81 — @tmikeschu

  35. REGULAR EXPRESSIONS: AN OVERVIEW > Born in 1951 > Popularized in 1968 > Text editor search 11/81 — @tmikeschu

  36. REGULAR EXPRESSIONS: AN OVERVIEW > Born in 1951 > Popularized in 1968 > Text editor search > Lexical analysis 11/81 — @tmikeschu

  37. REGULAR EXPRESSIONS: AN OVERVIEW > Born in 1951 > Popularized in 1968 > Text editor search > Lexical analysis > POSIX, Perl, PCRE 11/81 — @tmikeschu

  38. REGULAR EXPRESSIONS: AN OVERVIEW > Born in 1951 > Popularized in 1968 > Text editor search > Lexical analysis > POSIX, Perl, PCRE > Finite state machine 11/81 — @tmikeschu
  39. https://www.youtube.com/watch?v=hprXxJHQVfQ 12/81 — @tmikeschu

  40. CODE THAT WANTS TO BE REGEXED 13/81 — @tmikeschu

  41. IF YOU ASK MORE THAN ONE QUESTION ABOUT A STRING... 14/81 — @tmikeschu
  42. DITCH && AND || FOR // 15/81 — @tmikeschu

  43. - someString.startsWith(":") && - someString.split("").some(char => Boolean(Number(char))) + /^:.*\d/.test(someString) - someString.includes("someWord") || someString.includes("someOtherWord"); + /someWord|someOtherWord/.test(someString) 16/81 — @tmikeschu
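
The swap on slide 16 can be sketched as runnable code. The helper names below are hypothetical; the slide only shows the bare expressions. One caveat worth noting: `Boolean(Number(char))` is falsy for "0", so the chained version and the regex version disagree when the only digit is a zero.

```javascript
// Hypothetical helper names; the slide only shows the expressions.

// Chained boolean checks: starts with ":" and contains a non-zero digit.
function checksWithBooleans(someString) {
  return (
    someString.startsWith(":") &&
    someString.split("").some((char) => Boolean(Number(char)))
  );
}

// One pattern: ":" at the start, then anything, then a digit.
function checksWithRegex(someString) {
  return /^:.*\d/.test(someString);
}

// Alternation replaces the || chain.
function hasEitherWord(someString) {
  return /someWord|someOtherWord/.test(someString);
}

console.log(checksWithBooleans(":abc1"), checksWithRegex(":abc1")); // true true
console.log(checksWithBooleans("abc1"), checksWithRegex("abc1")); // false false
// The two versions differ on "0": Number("0") is 0, which is falsy.
console.log(checksWithBooleans(":abc0"), checksWithRegex(":abc0")); // false true
console.log(hasEitherWord("has someOtherWord inside")); // true
```
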
  44. (Array of characters).context < (string).context 17/81 — @tmikeschu

  45. Regex forces you to consider the string as (more of) a whole 18/81 — @tmikeschu
  46. METHODS TO USE 19/81 — @tmikeschu

  47. 20/81 — @tmikeschu

  48. > Change format 20/81 — @tmikeschu

  49. > Change format > String.prototype.replace (=> String) 20/81 — @tmikeschu

  50. > Change format > String.prototype.replace (=> String) > Get substring(s) 20/81 — @tmikeschu

  51. > Change format > String.prototype.replace (=> String) > Get substring(s) > String.prototype.match (=> Array) 20/81 — @tmikeschu

  52. > Change format > String.prototype.replace (=> String) > Get substring(s) > String.prototype.match (=> Array) > Assert string qualities 20/81 — @tmikeschu

  53. > Change format > String.prototype.replace (=> String) > Get substring(s) > String.prototype.match (=> Array) > Assert string qualities > RegExp.prototype.test (=> Boolean) 20/81 — @tmikeschu

  54. > Change format > String.prototype.replace (=> String) > Get substring(s) > String.prototype.match (=> Array) > Assert string qualities > RegExp.prototype.test (=> Boolean) > Stateful search 20/81 — @tmikeschu

  55. > Change format > String.prototype.replace (=> String) > Get substring(s) > String.prototype.match (=> Array) > Assert string qualities > RegExp.prototype.test (=> Boolean) > Stateful search > RegExp.prototype.exec (=> Array) 20/81 — @tmikeschu
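
A minimal sketch of the four methods from slide 20, one per use case. The sample string and variable names are made up for illustration and do not come from the talk.

```javascript
// Made-up sample input for illustration.
const text = "Order 66 shipped on 2019-08-20";

// Change format: String.prototype.replace returns a new string.
const slashed = text.replace(/(\d{4})-(\d{2})-(\d{2})/, "$2/$3/$1");
console.log(slashed); // "Order 66 shipped on 08/20/2019"

// Get substring(s): String.prototype.match returns an array (or null).
const numbers = text.match(/\d+/g);
console.log(numbers); // ["66", "2019", "08", "20"]

// Assert string qualities: RegExp.prototype.test returns a boolean.
const startsWithOrder = /^Order \d+/.test(text);
console.log(startsWithOrder); // true

// Stateful search: with the g flag, RegExp.prototype.exec tracks its
// position in lastIndex and returns one match per call, then null.
const reNumber = /\d+/g;
const found = [];
let result;
while ((result = reNumber.exec(text)) !== null) {
  found.push(`${result[0]}@${result.index}`);
}
console.log(found); // ["66@6", "2019@20", "08@25", "20@28"]
```

Note that `match` and `exec` both return `null` rather than an empty array when nothing matches, so callers usually need a guard.
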
  56. You know what those parentheses in regular expressions are, right? /(\d+)/; 21/81 — @tmikeschu
  57. CAPTURE GROUPS: KEEP IT TOGETHER /()/ 22/81 — @tmikeschu
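
As a quick illustration of what the parentheses do, here is a small sketch with a made-up input string: each pair of parentheses saves the text it matched, and `String.prototype.match` exposes those pieces by position.

```javascript
// Two capture groups separated by a literal dash.
const reRange = /(\d+)-(\d+)/;
const match = "Rooms 12-15 are booked".match(reRange);

console.log(match[0]); // "12-15" (the full match)
console.log(match[1]); // "12" (first group)
console.log(match[2]); // "15" (second group)
console.log(match.index); // 6 (where the full match starts)

// Without the g flag, no match gives null rather than an empty array.
const noMatch = "no digits here".match(reRange);
console.log(noMatch); // null
```
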

  58. 23/81 — @tmikeschu

  59. > Is familiarity worth rigidity? 23/81 — @tmikeschu

  60. > Is familiarity worth rigidity? > Is difficulty worth flexibility? 23/81 — @tmikeschu
  61. Is difficulty worth flexibility? 24/81 — @tmikeschu

  62. TASK CREATE A FUNCTION THAT 25/81 — @tmikeschu

  63. TASK CREATE A FUNCTION THAT > Takes in a name in First Last format 25/81 — @tmikeschu

  64. TASK CREATE A FUNCTION THAT > Takes in a name in First Last format > And returns the name in Last, First format 25/81 — @tmikeschu

  65. const albus = "Albus Dumbledore"; function lastFirst(name) { // TODO } console.log(lastFirst(albus)); // => "Dumbledore, Albus" 26/81 — @tmikeschu

  66. APPROACH #1: SPLIT function lastFirst(name) { return name .split(" ") .reverse() .join(", "); } console.log(lastFirst(albus)); // => "Dumbledore, Albus" 27/81 — @tmikeschu

  67. APPROACH #2: REGEX function lastFirst(name) { const reFirstLast = /(\w+)\s(\w+)/; return name.replace(reFirstLast, "$2, $1"); } console.log(lastFirst(albus)); // => "Dumbledore, Albus" 28/81 — @tmikeschu

  68. someString.replace(/(cats)(dogs)/, (full, group1, group2) => { // do stuff with the groups }); https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace#Specifying_a_function_as_a_parameter 29/81 — @tmikeschu
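
The callback form of `replace` from slide 29 can be sketched as follows. The function `lastFirstShout` is a hypothetical variation on the talk's `lastFirst`, chosen only to show something a plain "$2, $1" replacement string cannot do.

```javascript
// Hypothetical example: upper-case the last name while swapping the order.
function lastFirstShout(name) {
  return name.replace(/(\w+)\s(\w+)/, (full, first, last) => {
    // full is the whole match; first and last are the captured groups.
    return `${last.toUpperCase()}, ${first}`;
  });
}

console.log(lastFirstShout("Albus Dumbledore")); // "DUMBLEDORE, Albus"
```
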
  69. ⚠ CHANGE ALERT 30/81 — @tmikeschu

  70. const albus = "Albus Percival Dumbledore"; lastFirst(albus); // => "Dumbledore, Albus Percival" 31/81 — @tmikeschu

  71. > ...middle names too. const albus = "Albus Percival Dumbledore"; lastFirst(albus); // => "Dumbledore, Albus Percival" 31/81 — @tmikeschu

  72. APPROACH #1: SPLIT function lastFirst(name) { return name .split(" ") .reverse() .join(", "); } console.log(lastFirst(albus)); // => "Dumbledore, Percival, Albus" 32/81 — @tmikeschu
  73. ! 33/81 — @tmikeschu

  74. function lastFirst(rawName) { const names = rawName.split(" "); const maxIndex = names.length - 1; const last = names[maxIndex]; const rest = names.slice(0, maxIndex); return `${last}, ${rest.join(" ")}`; } console.log(lastFirst(albus)); // => "Dumbledore, Albus Percival" 34/81 — @tmikeschu

  75. APPROACH #2: REGEX function lastFirst(name) { const reFirstLast = /(\w+)\s(\w+)/; return name.replace(reFirstLast, "$2, $1"); } console.log(lastFirst(albus)); // => "Percival, Albus Dumbledore" 35/81 — @tmikeschu

  76. ! 36/81 — @tmikeschu

  77. function lastFirst(name) { const reFirstLast = /(\w+\s*\w*)\s(\w+)/; return name.replace(reFirstLast, "$2, $1"); } console.log(lastFirst(albus)); // => "Dumbledore, Albus Percival" 37/81 — @tmikeschu
  78. - /(\w+)\s(\w+)/; + /(\w+\s*\w*)\s(\w+)/; 38/81 — @tmikeschu

  79. COMPARISON SPLIT 39/81 — @tmikeschu

  80. COMPARISON SPLIT > Calculating indices 39/81 — @tmikeschu

  81. COMPARISON SPLIT > Calculating indices > Accommodating for zero-based counting 39/81 — @tmikeschu

  82. COMPARISON SPLIT > Calculating indices > Accommodating for zero-based counting > Array.prototype methods 39/81 — @tmikeschu

  83. COMPARISON SPLIT > Calculating indices > Accommodating for zero-based counting > Array.prototype methods > String interpolation 39/81 — @tmikeschu

  84. COMPARISON REGEX 40/81 — @tmikeschu

  85. COMPARISON REGEX > Patterns 40/81 — @tmikeschu

  86. COMPARISON REGEX > Patterns > There is a first bit 40/81 — @tmikeschu

  87. COMPARISON REGEX > Patterns > There is a first bit > and a last bit 40/81 — @tmikeschu

  88. COMPARISON REGEX > Patterns > There is a first bit > and a last bit > and sometimes extra middle bits in the first bit 40/81 — @tmikeschu
  89. THAT IS MESSY 41/81 — @tmikeschu

  90. THAT IS OKAY 42/81 — @tmikeschu

  91. THAT IS AWESOME 43/81 — @tmikeschu

  92. ⚠ CHANGE ALERT 44/81 — @tmikeschu

  93. const albus = "Albus Percival Wulfric Brian Dumbledore"; lastFirst(albus); // => "Dumbledore, Albus Percival Wulfric Brian" 45/81 — @tmikeschu

  94. > ...middle names too const albus = "Albus Percival Wulfric Brian Dumbledore"; lastFirst(albus); // => "Dumbledore, Albus Percival Wulfric Brian" 45/81 — @tmikeschu

  95. > ...middle names too > ...multiple middle names const albus = "Albus Percival Wulfric Brian Dumbledore"; lastFirst(albus); // => "Dumbledore, Albus Percival Wulfric Brian" 45/81 — @tmikeschu

  96. APPROACH #1: SPLIT function lastFirst(rawName) { const names = rawName.split(" "); const maxIndex = names.length - 1; const last = names[maxIndex]; const rest = names.slice(0, maxIndex); return `${last}, ${rest.join(" ")}`; } console.log(lastFirst(albus)); // => "Dumbledore, Albus Percival Wulfric Brian" 46/81 — @tmikeschu

  97. ! 47/81 — @tmikeschu

  98. APPROACH #2: REGEX function lastFirst(name) { const reFirstLast = /(\w+\s*\w*)\s(\w+)/; return name.replace(reFirstLast, "$2, $1"); } console.log(lastFirst(albus)); // => "Wulfric, Albus Percival Brian Dumbledore" 48/81 — @tmikeschu

  99. ! 49/81 — @tmikeschu

  100. - const reFirstLast = /(\w+\s*\w*)\s(\w+)/; + const reFirstLast = /(\w+(\s\w+)*)\s(\w+)/; 50/81 — @tmikeschu

  101. function lastFirst(name) { const reFirstLast = /(\w+(\s\w+)*)\s(\w+)/; return name.replace(reFirstLast, "$3, $1"); } console.log(lastFirst(albus)); // => "Dumbledore, Albus Percival Wulfric Brian" 51/81 — @tmikeschu
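
The replacement string moved from "$2" to "$3" because the new inner parentheses became group 2. A short sketch of what each group holds, followed by one alternative that is an assumption of this note rather than something shown in the slides: a non-capturing group `(?:...)`, which repeats the middle names without taking a group number.

```javascript
const reFirstLast = /(\w+(\s\w+)*)\s(\w+)/;
const albus = "Albus Percival Wulfric Brian Dumbledore";

// Skip index 0 (the full match) and name the three groups.
const [, first, lastMiddle, last] = albus.match(reFirstLast);
console.log(first); // "Albus Percival Wulfric Brian"
console.log(lastMiddle); // " Brian" (a repeated group keeps only its last iteration)
console.log(last); // "Dumbledore"

// Assumption, not from the slides: with a non-capturing group the last
// name stays at $2, so the replacement string does not need to change.
const reNonCapturing = /(\w+(?:\s\w+)*)\s(\w+)/;
const swapped = albus.replace(reNonCapturing, "$2, $1");
console.log(swapped); // "Dumbledore, Albus Percival Wulfric Brian"
```
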
  102. ⚠ CHANGE ALERT 52/81 — @tmikeschu

  103. const albus = "Albus Percival Wulfric Brian Dumbledore, Jr."; lastFirst(albus); // => "Dumbledore, Albus Percival Wulfric Brian, Jr." 53/81 — @tmikeschu

  104. > ...middle names too const albus = "Albus Percival Wulfric Brian Dumbledore, Jr."; lastFirst(albus); // => "Dumbledore, Albus Percival Wulfric Brian, Jr." 53/81 — @tmikeschu

  105. > ...middle names too > ...multiple middle names const albus = "Albus Percival Wulfric Brian Dumbledore, Jr."; lastFirst(albus); // => "Dumbledore, Albus Percival Wulfric Brian, Jr." 53/81 — @tmikeschu

  106. > ...middle names too > ...multiple middle names > ...suffixes too const albus = "Albus Percival Wulfric Brian Dumbledore, Jr."; lastFirst(albus); // => "Dumbledore, Albus Percival Wulfric Brian, Jr." 53/81 — @tmikeschu

  107. ! 54/81 — @tmikeschu

  108. APPROACH #1: SPLIT function lastFirst(rawName) { const names = rawName.split(" "); const maxIndex = names.length - 1; const last = names[maxIndex]; const rest = names.slice(0, maxIndex); return `${last}, ${rest.join(" ")}`; } console.log(lastFirst(albus)); // => "Jr., Albus Percival Wulfric Brian Dumbledore," 55/81 — @tmikeschu

  109. ! 56/81 — @tmikeschu

  110. function lastFirst(rawName) { const [name, suffix] = rawName.split(", "); const names = name.split(" "); const maxIndex = names.length - 1; const last = names[maxIndex]; const rest = names.slice(0, maxIndex); const output = `${last}, ${rest.join(" ")}`; if (suffix) { return `${output}, ${suffix}`; } return output; } console.log(lastFirst(albus)); // => "Dumbledore, Albus Percival Wulfric Brian, Jr." 57/81 — @tmikeschu

  111. APPROACH #2: REGEX function lastFirst(name) { const reFirstLast = /(\w+(\s\w+)*)\s(\w+)/; return name.replace(reFirstLast, "$3, $1"); } console.log(lastFirst(albus)); // => "Dumbledore, Albus Percival Wulfric Brian, Jr." 58/81 — @tmikeschu
  112. ! 59/81 — @tmikeschu

  113. ! 60/81 — @tmikeschu

  114. ⚠ CHANGE ALERT 61/81 — @tmikeschu

  115. const albus = "Albus"; lastFirst(albus); // => "Albus" 62/81 — @tmikeschu

  116. > ...middle names too const albus = "Albus"; lastFirst(albus); // => "Albus" 62/81 — @tmikeschu

  117. > ...middle names too > ...multiple middle names const albus = "Albus"; lastFirst(albus); // => "Albus" 62/81 — @tmikeschu

  118. > ...middle names too > ...multiple middle names > ...suffixes too const albus = "Albus"; lastFirst(albus); // => "Albus" 62/81 — @tmikeschu

  119. > ...middle names too > ...multiple middle names > ...suffixes too > ...just first name is okay ¯\_(ツ)_/¯ const albus = "Albus"; lastFirst(albus); // => "Albus" 62/81 — @tmikeschu

  120. APPROACH #1: SPLIT function lastFirst(rawName) { const [name, suffix] = rawName.split(", "); const names = name.split(" "); const maxIndex = names.length - 1; const last = names[maxIndex]; const rest = names.slice(0, maxIndex); const output = `${last}, ${rest.join(" ")}`; if (suffix) { return `${output}, ${suffix}`; } return output; } console.log(lastFirst(albus)); // => "Albus, " 63/81 — @tmikeschu

  121. ! 64/81 — @tmikeschu

  122. function lastFirst(rawName) { const [name, suffix] = rawName.split(", "); const names = name.split(" "); const maxIndex = names.length - 1; const last = names[maxIndex]; const rest = names.slice(0, maxIndex); const output = `${last}, ${rest.join(" ")}`; if (suffix) { return `${output}, ${suffix}`; } if (output.endsWith(", ")) { return output.slice(0, output.length - 2); } return output; } 65/81 — @tmikeschu

  123. ! const output = `${last}, ${rest.join(" ")}`; if (suffix) { return `${output}, ${suffix}`; } + if (output.endsWith(", ")) { + return output.slice(0, output.length - 2); + } return output 66/81 — @tmikeschu

  124. function lastFirst(rawName) { const [name, suffix] = rawName.split(", "); const names = name.split(" "); const maxIndex = names.length - 1; const last = names[maxIndex]; const rest = names.slice(0, maxIndex).join(" "); let output = last; if (Boolean(rest)) { output += `, ${rest}`; } if (suffix) { output += `, ${suffix}`; } return output; } 67/81 — @tmikeschu

  125. ! - const output = `${last}, ${rest.join(" ")}`; - if (suffix) { - return `${output}, ${suffix}`; - } - if (output.endsWith(", ")) { - return output.slice(0, output.length - 2); - } + let output = last; + if (Boolean(rest)) { + output += `, ${rest}`; + } + if (suffix) { + output += `, ${suffix}`; + } return output; 68/81 — @tmikeschu

  126. APPROACH #2: REGEX function lastFirst(name) { const reFirstLast = /(\w+(\s\w+)*)\s(\w+)/; return name.replace(reFirstLast, "$3, $1"); } console.log(lastFirst(albus)); // => "Albus" 69/81 — @tmikeschu
  127. ! 70/81 — @tmikeschu

  128. ! 71/81 — @tmikeschu

  129. const reFirstLast = /(\w+(\s\w+)*)\s(\w+)/; 72/81 — @tmikeschu
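
To check that this one pattern really survives every change alert, here is a small sketch that runs the talk's final regex version of `lastFirst` against each input used along the way. The `cases` array and the loop are additions for illustration.

```javascript
// The final regex version of lastFirst from the talk.
function lastFirst(name) {
  const reFirstLast = /(\w+(\s\w+)*)\s(\w+)/;
  return name.replace(reFirstLast, "$3, $1");
}

// Inputs from each change alert, in the order they appeared.
const cases = [
  "Albus Dumbledore",
  "Albus Percival Dumbledore",
  "Albus Percival Wulfric Brian Dumbledore",
  "Albus Percival Wulfric Brian Dumbledore, Jr.",
  "Albus",
];

const results = cases.map(lastFirst);
results.forEach((result) => console.log(result));
// Dumbledore, Albus
// Dumbledore, Albus Percival
// Dumbledore, Albus Percival Wulfric Brian
// Dumbledore, Albus Percival Wulfric Brian, Jr.
// Albus
```

The suffix and single-name cases work without extra branches because `replace` only rewrites the matched part and leaves the rest of the string (or the whole string, when nothing matches) untouched. One limitation to keep in mind: `\w` only covers ASCII letters, digits, and underscore, so names with hyphens, apostrophes, or accented letters would need a wider character class.
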

  130. FLEXIBILITY > READABILITY 73/81 — @tmikeschu

  131. ...ABOUT "READABILITY" 74/81 — @tmikeschu

  132. ...ABOUT "READABILITY" > Is German readable? 74/81 — @tmikeschu

  133. REVIEW 75/81 — @tmikeschu

  134. REVIEW > Designed for analyzing and searching text 75/81 — @tmikeschu

  135. REVIEW > Designed for analyzing and searching text > (regex :: string data :: language) == messy 75/81 — @tmikeschu

  136. REVIEW > Designed for analyzing and searching text > (regex :: string data :: language) == messy > More than one question about your string... 75/81 — @tmikeschu

  137. REVIEW > Designed for analyzing and searching text > (regex :: string data :: language) == messy > More than one question about your string... > Capture groups are great for manipulating substrings 75/81 — @tmikeschu
  138. Go forth and parse your strings! 76/81 — @tmikeschu

  139. Embrace the /pain/! 77/81 — @tmikeschu

  140. Don't fear /regexion/! 78/81 — @tmikeschu

  141. Respond /gracefully/ to change 79/81 — @tmikeschu

  142. THANK YOU! 80/81 — @tmikeschu

  143. RESOURCES 81/81 — @tmikeschu

  144. RESOURCES > Repl-ish tool: https://regexr.com/ 81/81 — @tmikeschu

  145. RESOURCES > Repl-ish tool: https://regexr.com/ > Cheat sheet: ://www.rexegg.com/regex-quickstart.html 81/81 — @tmikeschu

  146. RESOURCES > Repl-ish tool: https://regexr.com/ > Cheat sheet: ://www.rexegg.com/regex-quickstart.html > Wiki: https://en.wikipedia.org/wiki/Regular_expression 81/81 — @tmikeschu

  147. RESOURCES > Repl-ish tool: https://regexr.com/ > Cheat sheet: ://www.rexegg.com/regex-quickstart.html > Wiki: https://en.wikipedia.org/wiki/Regular_expression > Named capture groups: https://github.com/tc39/proposal-regexp-named-groups 81/81 — @tmikeschu
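
The named capture groups mentioned in the resources can be sketched against the talk's own example. This is an illustration rather than code from the slides; it assumes a runtime that supports ES2018 named groups, and the group names `first` and `last` are chosen here.

```javascript
// Assumes ES2018 named capture groups are available in the runtime.
// (?:...) repeats the middle names without creating a numbered group.
const reFirstLast = /(?<first>\w+(?:\s\w+)*)\s(?<last>\w+)/;

function lastFirst(name) {
  // $<name> refers to a named group inside the replacement string.
  return name.replace(reFirstLast, "$<last>, $<first>");
}

console.log(lastFirst("Albus Percival Dumbledore")); // "Dumbledore, Albus Percival"

// Named groups are also exposed on the match result's groups property.
const { groups } = "Albus Dumbledore".match(reFirstLast);
console.log(groups.first); // "Albus"
console.log(groups.last); // "Dumbledore"
```

With names instead of positions, adding another pair of parentheses no longer shifts the numbers used in the replacement string.
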