The 1960s elegance behind Go's regexp

The 1960s elegance behind Go’s regexp 19 MarchFOSS Asia ‘17
Jalem Raj Rohit

What this talk is about? About the two approaches to
regex matching. - One used in almost all standard regex interpreters, like Python, Perl, etc - The other one used in some implementations like, awk, grep, sed, etc And Go, of course

What exactly are Regular Expressions? - It’s a style of
describing character strings - If a string successfully describes a regex, then it is called a match

Examples: Let’s say e1 matches “s” and e2 matches “t”:
➔ Alternation If e1 | e2 ⇒ s or it ➔ Concatenation e1 e2 ⇒ st. ➔ e1* 0 or more s ➔ e1+ 1 or more s

Perl vs Golang time comparison for matching a?a?a?aaa with respect
to the string length

Aaaand welcome to: The world of super awesome Computer Science
! and Super awesome algorithms !

Meet Finite Automata - It’s also known as State Machines
- ← This one is a Deterministic Finite Automata (or a DFA)

Also, meet NFA - NFA stands for Nondeterministic Finite Automata
- Example on the left - It has multiple legit choices in state S2 Which one to choose? :( - Also, the machine can’t peek ahead

Converting Regexes to NFAs - This would be the basic
unit of the NFA - Concatenation be like: - Aaaand alternation

Perl’s algorithm at work - Also, called the backtracking approach
- Time complexity grows exponentially for pathological regex matches, as the string size grows. - Literally, out of the window

Can we make this better?

Thompson’s algorithm at work - Guesses both options simultaneously -
Allows the machine to be in multiple states at the same time - Linear time complexity. Yayyy !!!

Again, comparison of the algorithms

Again ….

Special Shoutout to GopherData - An attempt to bring together
Go’s and gophers’ efforts in Data Science and Analytics - Github: https://github.com/gopherdata - Twitter: https://twitter.com/GopherDataIO

THANK YOU - Github: Dawny33 - Twitter: @data__wizard (<-- 2
_’s there) - Facebook: rajrohit.33

The 1960s elegance behind Go's regexp

The 1960s elegance behind Go's regexp

Jalem Raj Rohit

More Decks by Jalem Raj Rohit

Other Decks in Programming

Featured

Transcript