What is this talk about?
The two approaches to regex matching:
- One is used in almost all standard regex engines, such as those in Python, Perl, etc.
- The other is used in some implementations, such as awk, grep, sed, etc.
And Go, of course
What exactly are Regular Expressions?
- A notation for describing sets of character strings
- If a string is in the set described by a regex, we say the regex matches the string
Examples:
Let’s say e1 matches “s” and e2 matches “t”:
➔ Alternation: e1 | e2 ⇒ s or t
➔ Concatenation: e1 e2 ⇒ st
➔ Repetition: e1* ⇒ zero or more s
➔ Repetition: e1+ ⇒ one or more s
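The four operators above can be tried out with Go's standard `regexp` package. This is a small sketch, not part of the original slides: the helper name `fullMatch` is made up for illustration, and it anchors the pattern so that only whole-string matches count.

```go
package main

import (
	"fmt"
	"regexp"
)

// fullMatch (hypothetical helper) reports whether pattern matches the
// whole input. The pattern is wrapped in ^(?:...)$ so that a match on
// only part of the input does not count.
func fullMatch(pattern, input string) bool {
	re := regexp.MustCompile("^(?:" + pattern + ")$")
	return re.MatchString(input)
}

func main() {
	fmt.Println(fullMatch("s|t", "t"))  // alternation: s or t
	fmt.Println(fullMatch("st", "st"))  // concatenation: s followed by t
	fmt.Println(fullMatch("s*", ""))    // zero or more s: empty string is fine
	fmt.Println(fullMatch("s+", ""))    // one or more s: empty string is rejected
	fmt.Println(fullMatch("s+", "sss")) // one or more s
}
```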
Also, meet the NFA
- NFA stands for Nondeterministic Finite Automaton
- Example on the left
- It has multiple legitimate choices in state S2. Which one to choose? :(
- Also, the machine can’t peek ahead
Perl’s algorithm at work
- Also called the backtracking approach
- For pathological regexes, matching time grows exponentially as the string gets longer
- Literally, out of the window
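To make the blow-up concrete, here is a minimal backtracking sketch. It is not Perl's actual implementation: it is a toy matcher that assumes a tiny syntax (literal characters and `c?` only), and the names `matchBacktrack` and `pathological` are made up for illustration. It runs a well-known pathological case, the pattern `a?` repeated n times followed by `a` repeated n times, against a string of n `a`s, and counts recursive calls.

```go
package main

import (
	"fmt"
	"strings"
)

// matchBacktrack reports whether pattern p matches all of s.
// Toy syntax (an assumption of this sketch): literal bytes and "c?" only.
// steps counts every call so the amount of work is visible.
func matchBacktrack(p, s string, steps *int) bool {
	*steps++
	if p == "" {
		return s == ""
	}
	if len(p) >= 2 && p[1] == '?' {
		// First choice: consume the optional character.
		if s != "" && s[0] == p[0] && matchBacktrack(p[2:], s[1:], steps) {
			return true
		}
		// That failed, so back up and try skipping it instead.
		return matchBacktrack(p[2:], s, steps)
	}
	return s != "" && s[0] == p[0] && matchBacktrack(p[1:], s[1:], steps)
}

// pathological builds the pattern (a?)^n a^n and the input a^n,
// runs the matcher, and returns the result and the step count.
func pathological(n int) (bool, int) {
	p := strings.Repeat("a?", n) + strings.Repeat("a", n)
	s := strings.Repeat("a", n)
	steps := 0
	ok := matchBacktrack(p, s, &steps)
	return ok, steps
}

func main() {
	for _, n := range []int{5, 10, 15, 20} {
		ok, steps := pathological(n)
		fmt.Printf("n=%2d matched=%v steps=%d\n", n, ok, steps)
	}
}
```

The match always succeeds, but only after the matcher has tried consuming each optional `a` and been forced to back up, so the step count roughly doubles every time n grows by one.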
Thompson’s algorithm at work
- Guesses both options simultaneously
- Allows the machine to be in multiple states at the same time
- Time complexity is linear in the length of the string. Yayyy !!!
Special Shoutout to GopherData
- An attempt to bring together Go’s and gophers’ efforts in Data Science and Analytics
- GitHub: https://github.com/gopherdata
- Twitter: https://twitter.com/GopherDataIO