The 1960s elegance behind Go's regexp

This presentation was given at FOSS Asia Summit 2017

Jalem Raj Rohit

March 19, 2017

Transcript

  1. The 1960s elegance
    behind Go’s regexp
    19 March, FOSS Asia ’17
    Jalem Raj Rohit

  2. What is this talk
    about?
    The two approaches to regex matching:
    - One used in almost all standard regex
    interpreters, like Python, Perl, etc.
    - The other used in some implementations
    like awk, grep, sed, etc.
    And Go, of course

  3. What exactly are Regular Expressions?
    - They are a notation for describing sets of character strings
    - If a regex successfully describes a string, then it is
    called a match

  4. Examples:
    Let’s say e1 matches “s” and e2 matches “t”:
    ➔ Alternation
    e1 | e2 ⇒ s or t
    ➔ Concatenation
    e1 e2 ⇒ st.
    ➔ e1*
    0 or more s
    ➔ e1+
    1 or more s
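
The four operators above can be tried directly with Go's standard `regexp` package. This is a minimal sketch: the helper name `fullMatch` is made up for illustration, and it wraps each pattern in `^(?:...)$` so that the whole input has to match.

```go
package main

import (
	"fmt"
	"regexp"
)

// fullMatch is a hypothetical helper (not part of the talk). It reports
// whether pattern matches the entire input, by anchoring both ends.
func fullMatch(pattern, input string) bool {
	re := regexp.MustCompile(`^(?:` + pattern + `)$`)
	return re.MatchString(input)
}

func main() {
	fmt.Println(fullMatch(`s|t`, "t"))  // alternation: s or t
	fmt.Println(fullMatch(`st`, "st"))  // concatenation: s then t
	fmt.Println(fullMatch(`s*`, ""))    // star: zero or more s
	fmt.Println(fullMatch(`s+`, ""))    // plus: needs at least one s
	fmt.Println(fullMatch(`s+`, "sss")) // plus: one or more s
}
```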

  5. Perl vs Golang time comparison for matching a?a?a?aaa with
    respect to the string length

  6. Aaaand welcome to:
    The world of super awesome
    Computer Science !
    and
    Super awesome algorithms !

  7. Meet Finite Automata
    - Also known as state machines
    - ← This one is a Deterministic
    Finite Automaton (or a DFA)
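
A DFA can be sketched as a transition table where each (state, character) pair has at most one next state. The machine below is not the one pictured on the slide: it is a hypothetical two-state DFA for the regex `ab*`, and the type and function names are invented for illustration.

```go
package main

import "fmt"

// dfa is a hypothetical deterministic automaton: for every state and
// input byte there is at most one next state, so no guessing is needed.
type dfa struct {
	start  int
	accept map[int]bool
	next   map[int]map[byte]int
}

// matches walks the input one byte at a time, following the single
// available transition, and fails as soon as none exists.
func (d dfa) matches(input string) bool {
	state := d.start
	for i := 0; i < len(input); i++ {
		n, ok := d.next[state][input[i]]
		if !ok {
			return false
		}
		state = n
	}
	return d.accept[state]
}

// abStar builds an assumed example DFA for the regex ab*:
// state 0 --a--> state 1, state 1 --b--> state 1, state 1 accepts.
func abStar() dfa {
	return dfa{
		start:  0,
		accept: map[int]bool{1: true},
		next: map[int]map[byte]int{
			0: {'a': 1},
			1: {'b': 1},
		},
	}
}

func main() {
	d := abStar()
	for _, s := range []string{"a", "abbb", "", "ba"} {
		fmt.Printf("%q -> %v\n", s, d.matches(s))
	}
}
```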

  8. Also, meet NFA
    - NFA stands for Nondeterministic
    Finite Automaton
    - Example on the left
    - It has multiple legit choices in
    state S2
    Which one to choose? :(
    - Also, the machine can’t peek
    ahead

  9. Converting Regexes to
    NFAs
    - This would be the basic unit of the
    NFA
    - Concatenation be like:
    - Aaaand alternation
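
Since the diagrams are not part of this transcript, here is one way the three building blocks could look in code. It is a sketch under assumptions: the `state` and `frag` types, the use of explicit epsilon edges, and the `accepts` helper are illustrative choices, not the construction shown on the slide. A label of 0 is taken to mean "no labeled edge".

```go
package main

import "fmt"

// state is a hypothetical NFA node: one optional labeled edge plus any
// number of epsilon (empty) edges.
type state struct {
	label byte     // 0 means there is no labeled edge
	next  *state   // target of the labeled edge
	eps   []*state // targets reachable without consuming input
}

// frag is a partial NFA with a single entry and a single exit state.
type frag struct{ start, end *state }

// literal is the basic unit: start --c--> end.
func literal(c byte) frag {
	end := &state{}
	return frag{&state{label: c, next: end}, end}
}

// concat joins a's exit to b's entry with an epsilon edge.
func concat(a, b frag) frag {
	a.end.eps = append(a.end.eps, b.start)
	return frag{a.start, b.end}
}

// alternate adds a new entry that branches to both fragments and a new
// exit that both fragments lead to.
func alternate(a, b frag) frag {
	start, end := &state{eps: []*state{a.start, b.start}}, &state{}
	a.end.eps = append(a.end.eps, end)
	b.end.eps = append(b.end.eps, end)
	return frag{start, end}
}

// closure adds s and everything reachable through epsilon edges to set.
func closure(s *state, set map[*state]bool) {
	if set[s] {
		return
	}
	set[s] = true
	for _, e := range s.eps {
		closure(e, set)
	}
}

// accepts runs the fragment on input by tracking the set of live states.
func accepts(f frag, input string) bool {
	cur := map[*state]bool{}
	closure(f.start, cur)
	for i := 0; i < len(input); i++ {
		next := map[*state]bool{}
		for s := range cur {
			if s.next != nil && s.label == input[i] {
				closure(s.next, next)
			}
		}
		cur = next
	}
	return cur[f.end]
}

func main() {
	fmt.Println(accepts(concat(literal('s'), literal('t')), "st"))
	fmt.Println(accepts(alternate(literal('s'), literal('t')), "t"))
	fmt.Println(accepts(alternate(literal('s'), literal('t')), "st"))
}
```

Each fragment is used once here, because `concat` and `alternate` modify the exit state of their arguments.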

  10. Perl’s algorithm at
    work
    - Also called the backtracking
    approach
    - Time complexity grows
    exponentially for pathological
    regex matches, as the string size
    grows.
    - Literally, out of the window
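
The blow-up can be observed with a toy backtracking matcher. This is a simplified sketch and not Perl's actual engine: it is assumed to handle only literal characters, each optionally followed by `?`, and it counts recursive calls in a package-level variable named `steps`.

```go
package main

import (
	"fmt"
	"strings"
)

// steps counts the recursive calls made since it was last reset.
var steps int

// backtrack is a toy matcher for patterns made only of literal bytes,
// each optionally followed by '?'. It tries one choice at a time and
// falls back to the other choice when the first one fails.
func backtrack(pattern, input string) bool {
	steps++
	if pattern == "" {
		return input == ""
	}
	c := pattern[0]
	rest := pattern[1:]
	if len(pattern) > 1 && pattern[1] == '?' {
		rest = pattern[2:]
		// First guess: the optional character is present.
		if input != "" && input[0] == c && backtrack(rest, input[1:]) {
			return true
		}
		// Second guess: the optional character is absent.
		return backtrack(rest, input)
	}
	return input != "" && input[0] == c && backtrack(rest, input[1:])
}

// countSteps matches (a?)^n (a)^n against a^n and reports the call count.
func countSteps(n int) (matched bool, calls int) {
	steps = 0
	pattern := strings.Repeat("a?", n) + strings.Repeat("a", n)
	matched = backtrack(pattern, strings.Repeat("a", n))
	return matched, steps
}

func main() {
	for _, n := range []int{4, 8, 12, 16} {
		ok, calls := countSteps(n)
		fmt.Printf("n=%d matched=%v calls=%d\n", n, ok, calls)
	}
}
```

On this input the only successful path is the one where every `a?` is skipped, and the matcher tries it last, so it visits every combination of guesses before succeeding.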

  11. Can we make this better?

  12. Thompson’s algorithm
    at work
    - Guesses both options
    simultaneously
    - Allows the machine to be in
    multiple states at the same time
    - Linear time complexity. Yayyy !!!

  13. Again, comparison of the algorithms

  14. Again ….

  15. Special Shoutout to
    GopherData
    - An attempt to bring together Go’s and gophers’
    efforts in Data Science and Analytics
    - Github: https://github.com/gopherdata
    - Twitter: https://twitter.com/GopherDataIO

  16. THANK YOU
    - Github: Dawny33
    - Twitter: @data__wizard (<-- 2 _’s there)
    - Facebook: rajrohit.33
