
The 1960s elegance behind Go's regexp


This presentation was given at the FOSS Asia Summit 2017.

Jalem Raj Rohit

March 19, 2017


Transcript

  1. What is this talk about? The two approaches to regex matching:
    - One is used in almost all standard regex interpreters, such as Python and Perl.
    - The other is used in some implementations, such as awk, grep, and sed. And Go, of course.
  2. What exactly are regular expressions?
    - They are a notation for describing sets of character strings.
    - If a string belongs to the set that a regex describes, the regex is said to match the string.
  3. Examples: let’s say e1 matches “s” and e2 matches “t”:
    ➔ Alternation: e1 | e2 ⇒ matches s or t
    ➔ Concatenation: e1 e2 ⇒ matches st
    ➔ e1* ⇒ matches 0 or more s
    ➔ e1+ ⇒ matches 1 or more s
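The four operators above can be tried directly against Go's `regexp` package. This is a minimal sketch: the helper name `fullMatch` and the `^(?:...)$` wrapper are illustrative choices (not from the talk) so that only whole-string matches count.

```go
package main

import (
	"fmt"
	"regexp"
)

// fullMatch reports whether the whole input matches pattern.
// The ^(?:...)$ wrapper is an assumption made for this example so that
// a partial match inside a longer string does not count.
func fullMatch(pattern, input string) bool {
	re := regexp.MustCompile("^(?:" + pattern + ")$")
	return re.MatchString(input)
}

func main() {
	cases := []struct{ pattern, input string }{
		{"s|t", "t"},  // alternation: s or t
		{"st", "st"},  // concatenation: s followed by t
		{"s*", ""},    // zero or more s: the empty string qualifies
		{"s+", ""},    // one or more s: the empty string does not
		{"s+", "sss"}, // one or more s
	}
	for _, c := range cases {
		fmt.Printf("%q vs %q -> %v\n", c.pattern, c.input, fullMatch(c.pattern, c.input))
	}
}
```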
  4. Meet finite automata
    - They are also known as state machines.
    - The one shown on this slide is a deterministic finite automaton (DFA): in any state, each input character leads to at most one next state.
  5. Also, meet the NFA
    - NFA stands for nondeterministic finite automaton.
    - In the example on this slide, the machine has multiple legitimate choices in state S2. Which one should it choose? :(
    - Also, the machine can’t peek ahead at the rest of the input.
  6. Converting regexes to NFAs
    - The basic unit of the NFA (diagram on the slide)
    - Concatenation (diagram on the slide)
    - And alternation (diagram on the slide)
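Since the slide's diagrams are not part of the transcript, here is one possible way to sketch the same building blocks in Go. It assumes the textbook construction with explicit epsilon (no-input) edges; the names `state`, `frag`, `literal`, `concat`, `alternate`, and `countStates` are hypothetical and not taken from the talk or from Go's `regexp` source.

```go
package main

import "fmt"

// state is one NFA node. If next is non-nil, the machine moves to next
// after consuming char; eps lists transitions that consume no input.
type state struct {
	char byte
	next *state
	eps  []*state
}

// frag is a partially built NFA with one entry state and one exit state.
type frag struct{ start, end *state }

// literal builds the basic unit: start --c--> end.
func literal(c byte) frag {
	end := &state{}
	return frag{start: &state{char: c, next: end}, end: end}
}

// concat wires the exit of a to the entry of b with an epsilon edge.
func concat(a, b frag) frag {
	a.end.eps = append(a.end.eps, b.start)
	return frag{start: a.start, end: b.end}
}

// alternate adds a new entry that forks into a and b, and a new exit
// that both branches join.
func alternate(a, b frag) frag {
	start := &state{eps: []*state{a.start, b.start}}
	end := &state{}
	a.end.eps = append(a.end.eps, end)
	b.end.eps = append(b.end.eps, end)
	return frag{start: start, end: end}
}

// countStates walks the graph from s and counts distinct states.
func countStates(s *state, seen map[*state]bool) int {
	if s == nil || seen[s] {
		return 0
	}
	seen[s] = true
	n := 1 + countStates(s.next, seen)
	for _, e := range s.eps {
		n += countStates(e, seen)
	}
	return n
}

func main() {
	st := concat(literal('s'), literal('t'))      // e1 e2
	sOrT := alternate(literal('s'), literal('t')) // e1 | e2
	fmt.Println("states in st: ", countStates(st.start, map[*state]bool{}))
	fmt.Println("states in s|t:", countStates(sOrT.start, map[*state]bool{}))
}
```

The fork created by `alternate` is exactly the kind of state where the machine has more than one legitimate choice.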
  7. Perl’s algorithm at work
    - It is also called the backtracking approach: try one choice, and if it fails, go back and try the next one.
    - For pathological regexes, the running time grows exponentially as the input string grows.
    - Literally, out of the window.
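To see the blow-up concretely, here is a toy backtracking matcher. It is not Perl's engine: it is a sketch that supports only literal characters and `?`, and the names `backtrack` and `matchBacktrack` are made up for this example. The pathological input is the well-known pattern of n copies of `a?` followed by n copies of `a`, matched against a string of n `a`s.

```go
package main

import (
	"fmt"
	"strings"
)

// backtrack matches s against a pattern made only of literal bytes and
// the '?' operator. It tries one choice at a time and counts every call.
func backtrack(pat, s string, steps *int) bool {
	*steps++
	if pat == "" {
		return s == ""
	}
	if len(pat) >= 2 && pat[1] == '?' {
		// Choice 1: consume the optional character.
		if s != "" && s[0] == pat[0] && backtrack(pat[2:], s[1:], steps) {
			return true
		}
		// Choice 2: back up and skip the optional character instead.
		return backtrack(pat[2:], s, steps)
	}
	return s != "" && s[0] == pat[0] && backtrack(pat[1:], s[1:], steps)
}

// matchBacktrack returns the match result and the number of calls made.
func matchBacktrack(pat, s string) (bool, int) {
	steps := 0
	ok := backtrack(pat, s, &steps)
	return ok, steps
}

func main() {
	for _, n := range []int{4, 8, 12, 16} {
		pat := strings.Repeat("a?", n) + strings.Repeat("a", n)
		ok, steps := matchBacktrack(pat, strings.Repeat("a", n))
		fmt.Printf("n=%2d matched=%v calls=%d\n", n, ok, steps)
	}
}
```

The only successful path skips every `a?`, and this matcher tries that path last, so it explores all 2^n combinations of choices before it succeeds.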
  8. Thompson’s algorithm at work
    - It guesses both options simultaneously.
    - This allows the machine to be in multiple states at the same time.
    - Time complexity is linear in the length of the input string. Yayyy!!!
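The idea of being in multiple states at once can be sketched for a tiny regex subset. The code below is an illustration in the spirit of Thompson's approach, not Go's actual `regexp` implementation: it handles only literal characters and `?`, treats "position in the pattern" as the NFA state, and uses hypothetical names (`token`, `parse`, `addState`, `matchNFA`).

```go
package main

import (
	"fmt"
	"strings"
)

// token is one pattern element: a literal byte, optionally followed by '?'.
type token struct {
	c        byte
	optional bool
}

// parse splits a pattern made only of literals and '?' into tokens.
func parse(pat string) []token {
	var toks []token
	for i := 0; i < len(pat); i++ {
		t := token{c: pat[i]}
		if i+1 < len(pat) && pat[i+1] == '?' {
			t.optional = true
			i++
		}
		toks = append(toks, t)
	}
	return toks
}

// addState marks state i and every state reachable from it by skipping
// optional tokens (the no-input transitions).
func addState(set []bool, toks []token, i int) {
	for !set[i] {
		set[i] = true
		if i == len(toks) || !toks[i].optional {
			return
		}
		i++
	}
}

// matchNFA advances a whole set of states over each input byte.
// State i means "token i is next"; state len(toks) is the accepting state.
// It also returns how many (state, byte) pairs were examined.
func matchNFA(pat, s string) (bool, int) {
	toks := parse(pat)
	cur := make([]bool, len(toks)+1)
	addState(cur, toks, 0)
	steps := 0
	for k := 0; k < len(s); k++ {
		next := make([]bool, len(toks)+1)
		for i, t := range toks {
			steps++
			if cur[i] && t.c == s[k] {
				addState(next, toks, i+1)
			}
		}
		cur = next
	}
	return cur[len(toks)], steps
}

func main() {
	for _, n := range []int{4, 16, 64} {
		pat := strings.Repeat("a?", n) + strings.Repeat("a", n)
		ok, steps := matchNFA(pat, strings.Repeat("a", n))
		fmt.Printf("n=%2d matched=%v steps=%d\n", n, ok, steps)
	}
}
```

Each input byte is examined once against every state, so the work is the number of states times the input length, with no backing up.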
  9. Special shoutout to GopherData
    - An attempt to bring together Go’s and gophers’ efforts in data science and analytics
    - GitHub: https://github.com/gopherdata
    - Twitter: https://twitter.com/GopherDataIO