Dealing with /Regexion/

We’ve all been there. You want to parse a string with a bit o’ regex. You have to account for things like dashes and three-digit-numbers and words-that-start-with-capital-letters. So you cobble together some sketchy Stack Overflow results, toss ’em between those infamous forward slashes, and cross your fingers. Maybe 10% of the time you get what you want; the other 90% leaves you with an empty string or nil. Either way, your heart feels fragile, and you just need a hug.

This, my friends, is regexion. Like rejection, it hurts. Don't feel bad though, it happens to everyone. We love expressive syntax, so it’s hard to not see regex as inscrutable black magic. This talk provides context for why regex is worth the effort and dives into the advanced technique of capture groups. You will walk away with the tools and mindset to face regexion with courage and optimism.
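
For readers who have not met capture groups yet, here is a minimal sketch of the idea in JavaScript. The date string, the pattern, and the variable names are illustrative assumptions and do not come from the talk.

```javascript
// An ISO-style date string, chosen only for illustration.
const isoDate = "2019-08-20";

// Each pair of parentheses is a capture group: year, month, day.
const reIsoDate = /(\d{4})-(\d{2})-(\d{2})/;

// String.prototype.match returns the full match followed by each group.
const [, year, month, day] = isoDate.match(reIsoDate);

// In a replacement string, $1, $2, $3 refer to the groups by position.
const usDate = isoDate.replace(reIsoDate, "$2/$3/$1");

console.log(year, month, day); // => 2019 08 20
console.log(usDate); // => "08/20/2019"
```

The same pattern answers two questions at once: does the string look like a date, and which parts are which.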

tmikeschu

August 20, 2019

Transcript

  1. @tmikeschu 2/81
  2. > Feel at peace with the craziness of regex >

    Have a few strategies for deciding when to use regex 4/81 — @tmikeschu
  3. > Feel at peace with the craziness of regex >

    Have a few strategies for deciding when to use regex > Be like "wow capture groups are amazing" 4/81 — @tmikeschu
  4. ROADMAP > My soapbox: why regex should be messy >

    Brief overview of regular expressions 5/81 — @tmikeschu
  5. ROADMAP > My soapbox: why regex should be messy >

    Brief overview of regular expressions > When to use regex 5/81 — @tmikeschu
  6. ROADMAP > My soapbox: why regex should be messy >

    Brief overview of regular expressions > When to use regex > Capture groups 5/81 — @tmikeschu
  7. STRINGS AND LANGUAGE > Typos > Duplication > Patterns >

    Formats > Repetition 9/81 — @tmikeschu
  8. ACCEPT IT > Regex is messy > because string data

    is messy 10/81 — @tmikeschu
  9. ACCEPT IT > Regex is messy > because string data

    is messy > because language is messy 10/81 — @tmikeschu
  10. REGULAR EXPRESSIONS: AN OVERVIEW > Born in 1951 > Popularized

    in 1968 > Text editor search 11/81 — @tmikeschu
  11. REGULAR EXPRESSIONS: AN OVERVIEW > Born in 1951 > Popularized

    in 1968 > Text editor search > Lexical analysis 11/81 — @tmikeschu
  12. REGULAR EXPRESSIONS: AN OVERVIEW > Born in 1951 > Popularized

    in 1968 > Text editor search > Lexical analysis > POSIX, Perl, PCRE 11/81 — @tmikeschu
  13. REGULAR EXPRESSIONS: AN OVERVIEW > Born in 1951 > Popularized

    in 1968 > Text editor search > Lexical analysis > POSIX, Perl, PCRE > Finite state machine 11/81 — @tmikeschu
  14. - someString.startsWith(":") && - someString.split("").some(char => Boolean(Number(char))) + /^:.*\d/.test(someString)

    - someString.includes("someWord") || someString.includes("someOtherWord"); + /someWord|someOtherWord/.test(someString) 16/81 — @tmikeschu
  15. > Change format > String.prototype.replace (=> String) > Get substring(s)

    > String.prototype.match (=> Array) 20/81 — @tmikeschu
  16. > Change format > String.prototype.replace (=> String) > Get substring(s)

    > String.prototype.match (=> Array) > Assert string qualities 20/81 — @tmikeschu
  17. > Change format > String.prototype.replace (=> String) > Get substring(s)

    > String.prototype.match (=> Array) > Assert string qualities > RegExp.prototype.test (=> Boolean) 20/81 — @tmikeschu
  18. > Change format > String.prototype.replace (=> String) > Get substring(s)

    > String.prototype.match (=> Array) > Assert string qualities > RegExp.prototype.test (=> Boolean) > Stateful search 20/81 — @tmikeschu
  19. > Change format > String.prototype.replace (=> String) > Get substring(s)

    > String.prototype.match (=> Array) > Assert string qualities > RegExp.prototype.test (=> Boolean) > Stateful search > RegExp.prototype.exec (=> Array) 20/81 — @tmikeschu
  20. TASK CREATE A FUNCTION THAT > Takes in a name

    in First Last format 25/81 — @tmikeschu
  21. TASK CREATE A FUNCTION THAT > Takes in a name

    in First Last format > And returns the name in Last, First format 25/81 — @tmikeschu
  22. const albus = "Albus Dumbledore"; function lastFirst(name) { // TODO

    } console.log(lastFirst(albus)); // => "Dumbledore, Albus" 26/81 — @tmikeschu
  23. APPROACH #1: SPLIT function lastFirst(name) { return name .split(" ")

    .reverse() .join(", "); } console.log(lastFirst(albus)); // => "Dumbledore, Albus" 27/81 — @tmikeschu
  24. APPROACH #2: REGEX function lastFirst(name) { const reFirstLast = /(\w+)\s(\w+)/;

    return name.replace(reFirstLast, "$2, $1"); } console.log(lastFirst(albus)); // => "Dumbledore, Albus" 28/81 — @tmikeschu
  25. _ someString.replace(/(cats)(dogs)/, (full, group1, group2) => { // do stuff

    with the groups }); _ https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace#Specifying_a_function_as_a_parameter 29/81 — @tmikeschu
  26. > ...middle names too. const albus = "Albus Percival Dumbledore";

    lastFirst(albus); // => "Dumbledore, Albus Percival" 31/81 — @tmikeschu
  27. APPROACH #1: SPLIT function lastFirst(name) { return name .split(" ")

    .reverse() .join(", "); } console.log(lastFirst(albus)); // => "Dumbledore, Percival, Albus" 32/81 — @tmikeschu
  28. function lastFirst(rawName) { const names = rawName.split(" "); const maxIndex

    = names.length - 1; const last = names[maxIndex]; const rest = names.slice(0, maxIndex); return `${last}, ${rest.join(" ")}`; } console.log(lastFirst(albus)); // => "Dumbledore, Albus Percival" 34/81 — @tmikeschu
  29. APPROACH #2: REGEX function lastFirst(name) { const reFirstLast = /(\w+)\s(\w+)/;

    return name.replace(reFirstLast, "$2, $1"); } console.log(lastFirst(albus)); // => "Percival, Albus Dumbledore" 35/81 — @tmikeschu
  30. function lastFirst(name) { const reFirstLast = /(\w+\s*\w*)\s(\w+)/; return name.replace(reFirstLast, "$2,

    $1"); } console.log(lastFirst(albus)); // => "Dumbledore, Albus Percival" 37/81 — @tmikeschu
  31. COMPARISON SPLIT > Calculating indices > Accommodating for zero-based counting

    > Array.prototype methods > String interpolation 39/81 — @tmikeschu
  32. COMPARISON REGEX > Patterns > There is a first bit

    > and a last bit 40/81 — @tmikeschu
  33. COMPARISON REGEX > Patterns > There is a first bit

    > and a last bit > and sometimes extra middle bits in the first bit 40/81 — @tmikeschu
  34. const albus = "Albus Percival Wulfric Brian Dumbledore"; lastFirst(albus); //

    => "Dumbledore, Albus Percival Wulfric Brian" 45/81 — @tmikeschu
  35. > ...middle names too const albus = "Albus Percival Wulfric

    Brian Dumbledore"; lastFirst(albus); // => "Dumbledore, Albus Percival Wulfric Brian" 45/81 — @tmikeschu
  36. > ...middle names too > ...multiple middle names const albus

    = "Albus Percival Wulfric Brian Dumbledore"; lastFirst(albus); // => "Dumbledore, Albus Percival Wulfric Brian" 45/81 — @tmikeschu
  37. APPROACH #1: SPLIT function lastFirst(rawName) { const names = rawName.split("

    "); const maxIndex = names.length - 1; const last = names[maxIndex]; const rest = names.slice(0, maxIndex); return `${last}, ${rest.join(" ")}`; } console.log(lastFirst(albus)); // => "Dumbledore, Albus Percival Wulfric Brian" 46/81 — @tmikeschu
  38. APPROACH #2: REGEX function lastFirst(name) { const reFirstLast = /(\w+\s*\w*)\s(\w+)/;

    return name.replace(reFirstLast, "$2, $1"); } console.log(lastFirst(albus)); // => "Wulfric, Albus Percival Brian Dumbledore" 48/81 — @tmikeschu
  39. function lastFirst(name) { const reFirstLast = /(\w+(\s\w+)*)\s(\w+)/; return name.replace(reFirstLast, "$3,

    $1"); } console.log(lastFirst(albus)); // => "Dumbledore, Albus Percival Wulfric Brian" 51/81 — @tmikeschu
  40. const albus = "Albus Percival Wulfric Brian Dumbledore, Jr."; lastFirst(albus);

    // => "Dumbledore, Albus Percival Wulfric Brian, Jr." 53/81 — @tmikeschu
  41. > ...middle names too const albus = "Albus Percival Wulfric

    Brian Dumbledore, Jr."; lastFirst(albus); // => "Dumbledore, Albus Percival Wulfric Brian, Jr." 53/81 — @tmikeschu
  42. > ...middle names too > ...multiple middle names const albus

    = "Albus Percival Wulfric Brian Dumbledore, Jr."; lastFirst(albus); // => "Dumbledore, Albus Percival Wulfric Brian, Jr." 53/81 — @tmikeschu
  43. > ...middle names too > ...multiple middle names > ...suffixes

    too const albus = "Albus Percival Wulfric Brian Dumbledore, Jr."; lastFirst(albus); // => "Dumbledore, Albus Percival Wulfric Brian, Jr." 53/81 — @tmikeschu
  44. APPROACH #1: SPLIT function lastFirst(rawName) { const names = rawName.split("

    "); const maxIndex = names.length - 1; const last = names[maxIndex]; const rest = names.slice(0, maxIndex); return `${last}, ${rest.join(" ")}`; } console.log(lastFirst(albus)); // => "Jr., Albus Percival Wulfric Brian Dumbledore," 55/81 — @tmikeschu
  45. function lastFirst(rawName) { const [name, suffix] = rawName.split(", "); const

    names = name.split(" "); const maxIndex = names.length - 1; const last = names[maxIndex]; const rest = names.slice(0, maxIndex); const output = `${last}, ${rest.join(" ")}`; if (suffix) { return `${output}, ${suffix}`; } return output; } console.log(lastFirst(albus)); // => "Dumbledore, Albus Percival Wulfric Brian, Jr." 57/81 — @tmikeschu
  46. APPROACH #2: REGEX function lastFirst(name) { const reFirstLast = /(\w+(\s\w+)*)\s(\w+)/;

    return name.replace(reFirstLast, "$3, $1"); } console.log(lastFirst(albus)); // => "Dumbledore, Albus Percival Wulfric Brian, Jr." 58/81 — @tmikeschu
  47. > ...middle names too > ...multiple middle names const albus

    = "Albus"; lastFirst(albus); // => "Albus" 62/81 — @tmikeschu
  48. > ...middle names too > ...multiple middle names > ...suffixes

    too const albus = "Albus"; lastFirst(albus); // => "Albus" 62/81 — @tmikeschu
  49. > ...middle names too > ...multiple middle names > ...suffixes

    too > ...just first name is okay ¯\_(ツ)_/¯ const albus = "Albus"; lastFirst(albus); // => "Albus" 62/81 — @tmikeschu
  50. APPROACH #1: SPLIT function lastFirst(rawName) { const [name, suffix] =

    rawName.split(", "); const names = name.split(" "); const maxIndex = names.length - 1; const last = names[maxIndex]; const rest = names.slice(0, maxIndex); const output = `${last}, ${rest.join(" ")}`; if (suffix) { return `${output}, ${suffix}`; } return output; } console.log(lastFirst(albus)); // => "Albus, " 63/81 — @tmikeschu
  51. function lastFirst(rawName) { const [name, suffix] = rawName.split(", "); const

    names = name.split(" "); const maxIndex = names.length - 1; const last = names[maxIndex]; const rest = names.slice(0, maxIndex); const output = `${last}, ${rest.join(" ")}`; if (suffix) { return `${output}, ${suffix}`; } if (output.endsWith(", ")) { return output.slice(0, output.length - 2); } return output; } 65/81 — @tmikeschu
  52. ! const output = `${last}, ${rest.join(" ")}`; if (suffix) {

    return `${output}, ${suffix}`; } + if (output.endsWith(", ")) { + return output.slice(0, output.length - 2); + } return output 66/81 — @tmikeschu
  53. function lastFirst(rawName) { const [name, suffix] = rawName.split(", "); const

    names = name.split(" "); const maxIndex = names.length - 1; const last = names[maxIndex]; const rest = names.slice(0, maxIndex).join(" "); let output = last; if (Boolean(rest)) { output += `, ${rest}`; } if (suffix) { output += `, ${suffix}`; } return output; } 67/81 — @tmikeschu
  54. ! - const output = `${last}, ${rest.join(" ")}`; - if

    (suffix) { - return `${output}, ${suffix}`; - } - if (output.lastIndexOf(",") === output.length) { - return output.slice(0, output.length - 1); - } + let output = last; + if (Boolean(rest)) { + output += `, ${rest}`; + } + if (suffix) { + output += `, ${suffix}`; + } return output; 68/81 — @tmikeschu
  55. APPROACH #2: REGEX function lastFirst(name) { const reFirstLast = /(\w+(\s\w+)*)\s(\w+)/;

    return name.replace(reFirstLast, "$3, $1"); } console.log(lastFirst(albus)); // => "Albus" 69/81 — @tmikeschu
  56. REVIEW > Designed for analyzing and searching text > (regex

    :: string data :: language) == messy 75/81 — @tmikeschu
  57. REVIEW > Designed for analyzing and searching text > (regex

    :: string data :: language) == messy > More than one question about your string... 75/81 — @tmikeschu
  58. REVIEW > Designed for analyzing and searching text > (regex

    :: string data :: language) == messy > More than one question about your string... > Capture groups are great for manipulating substrings 75/81 — @tmikeschu
  59. RESOURCES > Repl-ish tool: https://regexr.com/ > Cheat sheet: ://www.rexegg.com/regex-quickstart.html >

    Wiki: https://en.wikipedia.org/wiki/Regular_expression 81/81 — @tmikeschu
  60. RESOURCES > Repl-ish tool: https://regexr.com/ > Cheat sheet: ://www.rexegg.com/regex-quickstart.html >

    Wiki: https://en.wikipedia.org/wiki/Regular_expression > Named capture groups: https://github.com/tc39/proposal-regexp-named-groups 81/81 — @tmikeschu
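
The last resource points at named capture groups, which the slides themselves do not demonstrate. As a rough sketch, the talk's final `lastFirst` pattern could be rewritten with names instead of positions. The group names `rest` and `last`, the leading `^` anchor, and the non-capturing `(?:...)` are assumptions added here for illustration, and the code assumes a runtime with ES2018 named-group support.

```javascript
// Hypothetical variant of the talk's lastFirst using named capture groups.
// "rest" holds the first and any middle names; "last" holds the final name.
const reName = /^(?<rest>\w+(?:\s\w+)*)\s(?<last>\w+)/;

function lastFirst(name) {
  // $<last> and $<rest> refer to groups by name instead of by position.
  // When the pattern does not match (a single name), replace returns
  // the input unchanged.
  return name.replace(reName, "$<last>, $<rest>");
}

console.log(lastFirst("Albus")); // => "Albus"
console.log(lastFirst("Albus Dumbledore")); // => "Dumbledore, Albus"
console.log(lastFirst("Albus Percival Wulfric Brian Dumbledore, Jr."));
// => "Dumbledore, Albus Percival Wulfric Brian, Jr."
```

Naming the groups means the replacement string no longer changes from `$2` to `$3` when an inner group is added, which is what happened between the earlier and later regex versions in the slides.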