Dealing with /Regexion/

We’ve all been there. You want to parse a string with a bit o’ regex. You have to account for things like dashes and three-digit-numbers and words-that-start-with-capital-letters. So you cobble together some sketchy Stack Overflow results, toss ’em between those infamous forward slashes, and cross your fingers. Maybe 10% of the time you get what you want; the other 90% leaves you with an empty string or nil. Either way, your heart feels fragile, and you just need a hug.

This, my friends, is regexion. Like rejection, it hurts. Don't feel bad though, it happens to everyone. We love expressive syntax, so it’s hard to not see regex as inscrutable black magic. This talk provides context for why regex is worth the effort and dives into advanced techniques like capture groups. You will walk away with the tools and mindset to face regexion with courage and optimism.

tmikeschu

June 08, 2019

Transcript

  1. DEALING WITH /REGEXION/ self.conference Mike Schutte June 8, 2019 1 — 82 @tmikeschu

  2. @tmikeschu

  4. https://www.epi.org/publication/the-color-of-law-a-forgotten-history-of-how-our-government-segregated-america/

  5. VICTORY CONDITIONS
    > Feel at peace with the craziness of regex
    > Have a few strategies for deciding when to use regex
    > Be like "wow capture groups are amazing"

  6. ROADMAP
    > My soapbox: why regex should be messy
    > Brief overview of regular expressions
    > When to use regex
    > Capture groups

  7. Disclaimer

  8. me using regex

  9. HOW TO DEAL WITH /REGEXION/:

  10. STRINGS AND LANGUAGE
    > Typos
    > Duplication
    > Patterns
    > Formats
    > Repetition

  11. ACCEPT IT
    > Regex is messy
    > because string data is messy
    > because language is messy

  12. REGULAR EXPRESSIONS: AN OVERVIEW
    > Born in 1951
    > Popularized in 1968
    > Text editor search
    > Lexical analysis
    > POSIX, Perl, PCRE
    > Finite state machine

  13. https://www.youtube.com/watch?v=hprXxJHQVfQ

  14. CODE THAT WANTS TO BE REGEXED

  15. IF YOU ASK MORE THAN ONE QUESTION ABOUT A STRING...

  16. DITCH && AND || FOR //

  17.
        - someString.startsWith(":") &&
        -   someString.split("").some(char => Boolean(Number(char)))
        + /^:.*\d/.test(someString)

        - someString.includes("someWord") || someString.includes("someOtherWord");
        + /someWord|someOtherWord/.test(someString)
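
The two versions on slide 17 can be run side by side. This is a minimal sketch; the helper names `chained` and `regexed` are hypothetical and not from the talk. It also surfaces one difference: `Number("0")` is `0`, which is falsy, so the chained check does not count the digit 0, while `\d` matches every digit.

```javascript
// Hypothetical helpers wrapping the "before" and "after" from slide 17.
function chained(someString) {
  return (
    someString.startsWith(":") &&
    someString.split("").some(char => Boolean(Number(char)))
  );
}

function regexed(someString) {
  return /^:.*\d/.test(someString);
}

console.log(chained(":abc1"), regexed(":abc1")); // true true
console.log(chained("abc1"), regexed("abc1"));   // false false
// Number("0") is 0 (falsy), so only the regex sees the digit here.
console.log(chained(":abc0"), regexed(":abc0")); // false true
```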
  18. (Array of characters).context < (string).context

  19. Regex forces you to consider the string as (more of) a whole

  20. METHODS TO USE

  21. > Change format
        > String.prototype.replace (=> String)
    > Get substring(s)
        > String.prototype.match (=> Array)
    > Assert string qualities
        > RegExp.prototype.test (=> Boolean)
    > Stateful search
        > RegExp.prototype.exec (=> Array)
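
A short sketch of the four methods from slide 21 in action. The sample string and variable names are made up for illustration. Note that `match` and `exec` return `null` rather than an array when nothing matches.

```javascript
// Hypothetical sample input, not from the talk.
const label = "Order 66 shipped 2019-06-08";

// Change format: String.prototype.replace returns a new string.
const swapped = label.replace(/(\d{4})-(\d{2})-(\d{2})/, "$2/$3/$1");
console.log(swapped); // Order 66 shipped 06/08/2019

// Get substrings: with the g flag, String.prototype.match returns every match.
const numbers = label.match(/\d+/g);
console.log(numbers); // [ '66', '2019', '06', '08' ]

// Assert string qualities: RegExp.prototype.test returns a boolean.
const startsWithOrder = /^Order\b/.test(label);
console.log(startsWithOrder); // true

// Stateful search: a g-flagged regex tracks lastIndex between exec calls.
const reDigits = /\d+/g;
const first = reDigits.exec(label);
const second = reDigits.exec(label);
console.log(first[0], first.index);   // 66 6
console.log(second[0], second.index); // 2019 17
```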
  22. You know what those parentheses in regular expressions are, right?
        /(\d+)/;

  23. CAPTURE GROUPS: KEEP IT TOGETHER /()/

  24. > Is familiarity worth rigidity?
    > Is difficulty worth flexibility?

  25. Is difficulty worth flexibility?

  26. TASK: CREATE A FUNCTION THAT
    > Takes in a name in First Last format
    > And returns the name in Last, First format

  27.
        const albus = "Albus Dumbledore";

        function lastFirst(name) {
          // TODO
        }

        console.log(lastFirst(albus)); // => "Dumbledore, Albus"

  28. APPROACH #1: SPLIT
        function lastFirst(name) {
          return name
            .split(" ")
            .reverse()
            .join(", ");
        }

        console.log(lastFirst(albus)); // => "Dumbledore, Albus"

  29. APPROACH #2: REGEX
        function lastFirst(name) {
          const reFirstLast = /(\w+)\s(\w+)/;
          return name.replace(reFirstLast, "$2, $1");
        }

        console.log(lastFirst(albus)); // => "Dumbledore, Albus"

  30.
        someString.replace(/(cats)(dogs)/, (full, group1, group2) => {
          // do stuff with the groups
        });

    https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace#Specifying_a_function_as_a_parameter
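
To make the callback form on slide 30 concrete, here is a small sketch applied to the name task. The function name `lastFirstUpper` and the upper-casing are assumptions for illustration only. The callback receives the full match first, then one argument per capture group, and its return value becomes the replacement.

```javascript
const reFirstLast = /(\w+)\s(\w+)/;

// Hypothetical variant of lastFirst: a callback can transform each group,
// which a plain "$2, $1" replacement string cannot do.
function lastFirstUpper(name) {
  return name.replace(reFirstLast, (full, first, last) => {
    return `${last.toUpperCase()}, ${first}`;
  });
}

console.log(lastFirstUpper("Albus Dumbledore")); // DUMBLEDORE, Albus
```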
  31. ⚠ CHANGE ALERT

  32. > ...middle names too.
        const albus = "Albus Percival Dumbledore";
        lastFirst(albus); // => "Dumbledore, Albus Percival"

  33. APPROACH #1: SPLIT
        function lastFirst(name) {
          return name
            .split(" ")
            .reverse()
            .join(", ");
        }

        console.log(lastFirst(albus)); // => "Dumbledore, Percival, Albus"

  35.
        function lastFirst(rawName) {
          const names = rawName.split(" ");
          const maxIndex = names.length - 1;
          const last = names[maxIndex];
          const rest = names.slice(0, maxIndex);
          return `${last}, ${rest.join(" ")}`;
        }

        console.log(lastFirst(albus)); // => "Dumbledore, Albus Percival"

  36. APPROACH #2: REGEX
        function lastFirst(name) {
          const reFirstLast = /(\w+)\s(\w+)/;
          return name.replace(reFirstLast, "$2, $1");
        }

        console.log(lastFirst(albus)); // => "Percival, Albus Dumbledore"

  38.
        function lastFirst(name) {
          const reFirstLast = /(\w+\s*\w*)\s(\w+)/;
          return name.replace(reFirstLast, "$2, $1");
        }

        console.log(lastFirst(albus)); // => "Dumbledore, Albus Percival"

  39.
        - /(\w+)\s(\w+)/;
        + /(\w+\s*\w*)\s(\w+)/;

  40. COMPARISON: SPLIT
    > Calculating indices
    > Accommodating for zero-based counting
    > Array.prototype methods
    > String interpolation

  41. COMPARISON: REGEX
    > Patterns
    > There is a first bit
    > and a last bit
    > and sometimes extra middle bits in the first bit

  42. THAT IS MESSY

  43. THAT IS OKAY

  44. THAT IS AWESOME

  45. ⚠ CHANGE ALERT

  46. > ...middle names too
    > ...multiple middle names
        const albus = "Albus Percival Wulfric Brian Dumbledore";
        lastFirst(albus); // => "Dumbledore, Albus Percival Wulfric Brian"

  47. APPROACH #1: SPLIT
        function lastFirst(rawName) {
          const names = rawName.split(" ");
          const maxIndex = names.length - 1;
          const last = names[maxIndex];
          const rest = names.slice(0, maxIndex);
          return `${last}, ${rest.join(" ")}`;
        }

        console.log(lastFirst(albus)); // => "Dumbledore, Albus Percival Wulfric Brian"

  49. APPROACH #2: REGEX
        function lastFirst(name) {
          const reFirstLast = /(\w+\s*\w*)\s(\w+)/;
          return name.replace(reFirstLast, "$2, $1");
        }

        console.log(lastFirst(albus)); // => "Wulfric, Albus Percival Brian Dumbledore"

  51.
        - const reFirstLast = /(\w+\s*\w*)\s(\w+)/;
        + const reFirstLast = /(\w+(\s\w+)*)\s(\w+)/;

  52.
        function lastFirst(name) {
          const reFirstLast = /(\w+(\s\w+)*)\s(\w+)/;
          return name.replace(reFirstLast, "$3, $1");
        }

        console.log(lastFirst(albus)); // => "Dumbledore, Albus Percival Wulfric Brian"
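
Why the replacement jumps from "$2" to "$3" on slide 52: groups are numbered by their opening parenthesis, so the inner repeated group `(\s\w+)` becomes group 2. The sketch below inspects the groups with `match`. It also shows a non-capturing group `(?:...)` as one possible alternative that keeps the numbering at two groups; this variant and the names `lastMiddle` and `lastFirstNonCapturing` are assumptions, not something the slides use.

```javascript
const albus = "Albus Percival Wulfric Brian Dumbledore";

// match returns [full match, group 1, group 2, group 3].
const [, rest, lastMiddle, last] = albus.match(/(\w+(\s\w+)*)\s(\w+)/);
console.log(rest);       // Albus Percival Wulfric Brian
console.log(lastMiddle); // " Brian" (a repeated group keeps only its last iteration)
console.log(last);       // Dumbledore

// Hypothetical variant: (?:...) groups without capturing, so the last name
// stays in group 2.
function lastFirstNonCapturing(name) {
  return name.replace(/(\w+(?:\s\w+)*)\s(\w+)/, "$2, $1");
}

console.log(lastFirstNonCapturing(albus)); // Dumbledore, Albus Percival Wulfric Brian
```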
  53. ⚠ CHANGE ALERT

  54. > ...middle names too
    > ...multiple middle names
    > ...suffixes too
        const albus = "Albus Percival Wulfric Brian Dumbledore, Jr.";
        lastFirst(albus); // => "Dumbledore, Albus Percival Wulfric Brian, Jr."

  56. APPROACH #1: SPLIT
        function lastFirst(rawName) {
          const names = rawName.split(" ");
          const maxIndex = names.length - 1;
          const last = names[maxIndex];
          const rest = names.slice(0, maxIndex);
          return `${last}, ${rest.join(" ")}`;
        }

        console.log(lastFirst(albus)); // => "Jr., Albus Percival Wulfric Brian Dumbledore,"

  58.
        function lastFirst(rawName) {
          const [name, suffix] = rawName.split(", ");
          const names = name.split(" ");
          const maxIndex = names.length - 1;
          const last = names[maxIndex];
          const rest = names.slice(0, maxIndex);
          const output = `${last}, ${rest.join(" ")}`;
          if (suffix) {
            return `${output}, ${suffix}`;
          }
          return output;
        }

        console.log(lastFirst(albus)); // => "Dumbledore, Albus Percival Wulfric Brian, Jr."

  59. APPROACH #2: REGEX
        function lastFirst(name) {
          const reFirstLast = /(\w+(\s\w+)*)\s(\w+)/;
          return name.replace(reFirstLast, "$3, $1");
        }

        console.log(lastFirst(albus)); // => "Dumbledore, Albus Percival Wulfric Brian, Jr."

  62. ⚠ CHANGE ALERT

  63. > ...middle names too
    > ...multiple middle names
    > ...suffixes too
    > ...just first name is okay ¯\_(ツ)_/¯
        const albus = "Albus";
        lastFirst(albus); // => "Albus"

  64. APPROACH #1: SPLIT
        function lastFirst(rawName) {
          const [name, suffix] = rawName.split(", ");
          const names = name.split(" ");
          const maxIndex = names.length - 1;
          const last = names[maxIndex];
          const rest = names.slice(0, maxIndex);
          const output = `${last}, ${rest.join(" ")}`;
          if (suffix) {
            return `${output}, ${suffix}`;
          }
          return output;
        }

        console.log(lastFirst(albus)); // => "Albus, "

  66.
        function lastFirst(rawName) {
          const [name, suffix] = rawName.split(", ");
          const names = name.split(" ");
          const maxIndex = names.length - 1;
          const last = names[maxIndex];
          const rest = names.slice(0, maxIndex);
          const output = `${last}, ${rest.join(" ")}`;
          if (suffix) {
            return `${output}, ${suffix}`;
          }
          // With no other names, output ends in ", " (comma plus space).
          if (output.endsWith(", ")) {
            return output.slice(0, output.length - 2);
          }
          return output;
        }

  67.
          const output = `${last}, ${rest.join(" ")}`;
          if (suffix) {
            return `${output}, ${suffix}`;
          }
        + if (output.endsWith(", ")) {
        +   return output.slice(0, output.length - 2);
        + }
          return output;

  68.
        function lastFirst(rawName) {
          const [name, suffix] = rawName.split(", ");
          const names = name.split(" ");
          const maxIndex = names.length - 1;
          const last = names[maxIndex];
          const rest = names.slice(0, maxIndex).join(" ");
          let output = last;
          if (Boolean(rest)) {
            output += `, ${rest}`;
          }
          if (suffix) {
            output += `, ${suffix}`;
          }
          return output;
        }

  69.
        - const output = `${last}, ${rest.join(" ")}`;
        - if (suffix) {
        -   return `${output}, ${suffix}`;
        - }
        - if (output.endsWith(", ")) {
        -   return output.slice(0, output.length - 2);
        - }
        + let output = last;
        + if (Boolean(rest)) {
        +   output += `, ${rest}`;
        + }
        + if (suffix) {
        +   output += `, ${suffix}`;
        + }
          return output;
  70. APPROACH #2: REGEX
        function lastFirst(name) {
          const reFirstLast = /(\w+(\s\w+)*)\s(\w+)/;
          return name.replace(reFirstLast, "$3, $1");
        }

        console.log(lastFirst(albus)); // => "Albus"

  73.
        const reFirstLast = /(\w+(\s\w+)*)\s(\w+)/;
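
The claim behind slide 73 is that this one pattern survives every change alert. A quick way to check it is to run the final regex against each input from the talk; the `cases` table below is just a hypothetical harness built from those inputs and the expected outputs on the slides.

```javascript
const reFirstLast = /(\w+(\s\w+)*)\s(\w+)/;

function lastFirst(name) {
  return name.replace(reFirstLast, "$3, $1");
}

// [input, expected] pairs taken from the change alerts in the talk.
const cases = [
  ["Albus Dumbledore", "Dumbledore, Albus"],
  ["Albus Percival Dumbledore", "Dumbledore, Albus Percival"],
  [
    "Albus Percival Wulfric Brian Dumbledore",
    "Dumbledore, Albus Percival Wulfric Brian",
  ],
  [
    "Albus Percival Wulfric Brian Dumbledore, Jr.",
    "Dumbledore, Albus Percival Wulfric Brian, Jr.",
  ],
  ["Albus", "Albus"],
];

for (const [input, expected] of cases) {
  console.log(lastFirst(input) === expected, lastFirst(input));
}
```

The suffix and single-name cases pass without any extra branches: the match stops before the comma because `\s` cannot match it, so ", Jr." is left in place, and a lone first name never matches, so `replace` returns it unchanged. One caveat worth keeping in mind is that `\w` only covers ASCII letters, digits, and underscores, so names with hyphens, apostrophes, or accented letters would need a wider character class.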

  74. FLEXIBILITY > READABILITY

  75. ...ABOUT "READABILITY"
    > Is German readable?

  76. REVIEW
    > Rich history specifically designed for analyzing and searching text
    > Regex is messy because string data is messy because language is messy
    > If you have more than one question about your string...
    > Capture groups are great for manipulating substrings

  77. Go forth and parse your strings!

  78. Embrace the /pain/!

  79. Don't fear /regexion/!

  80. Respond /gracefully/ to change

  81. THANK YOU!

  82. RESOURCES
    > Repl-ish tool: https://regexr.com/
    > Cheat sheet: www.rexegg.com/regex-quickstart.html
    > Wiki: https://en.wikipedia.org/wiki/Regular_expression
    > Named capture groups: https://github.com/tc39/proposal-regexp-named-groups
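
The last resource points at named capture groups, which the slides themselves do not demonstrate. As a sketch of how they could apply to the talk's final regex (the group names `rest` and `last` are my own choice, and this assumes a JavaScript engine with ES2018 support):

```javascript
// (?<name>...) names a group; (?:...) keeps the repeated middle-name group
// from being captured at all.
const reFirstLast = /(?<rest>\w+(?:\s\w+)*)\s(?<last>\w+)/;

function lastFirst(name) {
  // $<name> in the replacement string refers to a named group.
  return name.replace(reFirstLast, "$<last>, $<rest>");
}

console.log(lastFirst("Albus Percival Wulfric Brian Dumbledore, Jr."));
// Dumbledore, Albus Percival Wulfric Brian, Jr.

// Named groups are also exposed on the match result.
const { groups } = "Albus Dumbledore".match(reFirstLast);
console.log(groups.rest, groups.last); // Albus Dumbledore
```

With names in place, adding or removing a group no longer shifts the numbers in the replacement string, which is the kind of edit that forced the move from "$2" to "$3" earlier in the talk.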