Writing a language parser in 15min (or less) - Xavier Coulon - Red Hat

GoDays
January 23, 2020

Using regular expressions to process content may be enough in some cases, but as the grammar grows in complexity, they become a nightmare to maintain. This is where parsers based on Parsing Expression Grammars (PEG) come to the rescue.
In this talk, we will see how to build such a parser to handle a small subset of the Asciidoc markup language.

Transcript

  1. Writing a Language Parser
    in 15min (or less)
    Xavier Coulon - Red Hat
    twitter.com/xcoulon
    medium.com/xcoulon

  2. About me
    Working at Red Hat for 8+ years
    Working on OpenShift.io and its successor (we are hiring!)
    In my free time, coding a library to convert Asciidoc to HTML

  3. Asciidoc Markup Language
    Similar to Markdown
    Lots of features
    Very well suited for documentation

  4. Asciidoc Markup Language Example
    *Bold text*
    _Italic text_
    `Monospace text`

  5. Asciidoc Markup Language Example
    *Bold text* \*\w+[\s+\w+]*\*
    _Italic text_ _\w+[\s+\w+]*_
    `Monospace text` \x60\w+[\s+\w+]*\x60

  6. Asciidoc Markup Language Example
    Some *bold and _italic and `monospace text`_*
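
A minimal Go sketch of why the per-style regular expressions stop being enough once markup is nested. The three patterns are copied from the slide above; the variable names and the `main` function are only illustrative. Go's `regexp` package is assumed as the engine, and other engines may behave slightly differently.

```go
package main

import (
	"fmt"
	"regexp"
)

// Patterns copied from the slide. Note that [\s+\w+] is a character class:
// it matches one whitespace, '+' or word character at a time.
var (
	boldRE      = regexp.MustCompile(`\*\w+[\s+\w+]*\*`)
	italicRE    = regexp.MustCompile(`_\w+[\s+\w+]*_`)
	monospaceRE = regexp.MustCompile(`\x60\w+[\s+\w+]*\x60`)
)

func main() {
	// Flat examples: each pattern finds its own markup.
	fmt.Println(boldRE.MatchString("*Bold text*"))           // true
	fmt.Println(italicRE.MatchString("_Italic text_"))       // true
	fmt.Println(monospaceRE.MatchString("`Monospace text`")) // true

	// Nested example: the outer styles contain delimiters that the
	// character class does not allow, so only the innermost style is found.
	nested := "Some *bold and _italic and `monospace text`_*"
	fmt.Println(boldRE.MatchString(nested))    // false
	fmt.Println(italicRE.MatchString(nested))  // false
	fmt.Println(monospaceRE.FindString(nested)) // `monospace text`
}
```

The patterns could be extended to allow nested delimiters, but each new combination of styles makes them harder to read and maintain.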

  7. Photo by Aarón Blanco Tejedor on Unsplash

  8. Parsing Expression Grammar

  9. PEG to the Rescue
    Parsing Expression Grammars:
    - Describe a language, using rules to recognize strings
    - Apply the first matching rule recursively until the end of the document is reached,
    or try the next rule
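
The two points above (try rules in order, recurse, fall back to the next rule on failure) can be sketched by hand in Go. This is not the grammar from the talk and not generated code: the `Node` type, the `parser` struct and the simplified rules in the comments are assumptions made for illustration only.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is either plain text (Kind "text") or quoted text with children.
type Node struct {
	Kind     string
	Text     string
	Children []Node
}

type parser struct {
	input string
	pos   int
}

var styles = []struct {
	mark byte
	kind string
}{{'*', "bold"}, {'_', "italic"}, {'`', "monospace"}}

// element <- quoted / text  (ordered choice: text is only tried if quoted fails)
func (p *parser) element() (Node, bool) {
	if n, ok := p.quoted(); ok {
		return n, true
	}
	return p.text()
}

// quoted <- bold / italic / monospace, each being: mark element+ mark
func (p *parser) quoted() (Node, bool) {
	for _, s := range styles {
		start := p.pos
		if n, ok := p.delimited(s.mark, s.kind); ok {
			return n, true
		}
		p.pos = start // backtrack, then try the next rule
	}
	return Node{}, false
}

func (p *parser) delimited(mark byte, kind string) (Node, bool) {
	if p.pos >= len(p.input) || p.input[p.pos] != mark {
		return Node{}, false
	}
	p.pos++
	var children []Node
	for {
		n, ok := p.element() // recursion: quoted text may contain quoted text
		if !ok {
			break
		}
		children = append(children, n)
	}
	if len(children) == 0 || p.pos >= len(p.input) || p.input[p.pos] != mark {
		return Node{}, false
	}
	p.pos++
	return Node{Kind: kind, Children: children}, true
}

// text <- [^*_`]+
func (p *parser) text() (Node, bool) {
	start := p.pos
	for p.pos < len(p.input) && strings.IndexByte("*_`", p.input[p.pos]) < 0 {
		p.pos++
	}
	if p.pos == start {
		return Node{}, false
	}
	return Node{Kind: "text", Text: p.input[start:p.pos]}, true
}

// Parse applies element repeatedly and fails unless the whole input is consumed.
func Parse(input string) ([]Node, error) {
	p := &parser{input: input}
	var nodes []Node
	for {
		n, ok := p.element()
		if !ok {
			break
		}
		nodes = append(nodes, n)
	}
	if p.pos != len(p.input) {
		return nil, fmt.Errorf("unexpected %q at offset %d", p.input[p.pos], p.pos)
	}
	return nodes, nil
}

func main() {
	nodes, err := Parse("Some *bold and _italic and `monospace text`_*")
	fmt.Printf("%+v %v\n", nodes, err)
}
```

Writing this by hand for a full markup language quickly becomes tedious, which is the motivation for describing the rules in a grammar file and generating the parser instead.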

  10. Defining the Grammar
    Document DocumentBlock Paragraph ParagraphLine QuotedText BoldText BoldTextElement ItalicText MonospaceText Text

  11. Defining the Grammar (1/2)
    Document DocumentBlock BlankLine Paragraph ParagraphLine QuotedText WS Newline EOF EOL

  12. Defining the Grammar (2/2)
    BoldText BoldTextElement ItalicText MonospaceText Text

  13. Grammar in action
    Some *bold and _italic and `monospace text`_*
    Document DocumentBlock Paragraph ParagraphLine QuotedText BoldText ItalicText MonospaceText Text WS

  14. Writing a parser in Go

  15. Generating (not writing) a parser in Go

  16. github.com/mna/pigeon
    Generating (not writing) a parser in Go

  17. $ go install github.com/mna/pigeon
    $ pigeon ./pkg/parser/parser.peg > ./pkg/parser/parser.go
    Generating (not writing) a parser in Go

  18. Generating (not writing) a parser in Go
    BoldText BoldText return types.NewQuotedText(types.Bold, elements.([]interface{}))
    }
    {
    package parser
    import (...)
    }
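
The action on this slide returns the result of `types.NewQuotedText(types.Bold, elements.([]interface{}))`. The `types` package is not shown in the transcript, so the following is only a guess at what such a constructor might look like, written as a single `main` package so it can be run on its own. The struct fields, the kind constants other than `Bold`, and the error check are assumptions. What is known from pigeon itself is that an action block returns two values, `(interface{}, error)`, and that a labeled repetition reaches the action as `[]interface{}` behind an `interface{}`, which is why the slide uses a type assertion.

```go
package main

import "fmt"

// QuotedTextKind identifies the style of a quoted text (hypothetical type).
type QuotedTextKind int

const (
	Bold QuotedTextKind = iota
	Italic
	Monospace
)

// QuotedText holds the style and the nested elements (hypothetical layout).
type QuotedText struct {
	Kind     QuotedTextKind
	Elements []interface{}
}

// NewQuotedText mirrors the shape of the call on the slide: it returns a
// value and an error, so a grammar action can return its result directly.
func NewQuotedText(kind QuotedTextKind, elements []interface{}) (QuotedText, error) {
	if len(elements) == 0 {
		return QuotedText{}, fmt.Errorf("quoted text needs at least one element")
	}
	return QuotedText{Kind: kind, Elements: elements}, nil
}

func main() {
	// In a generated parser, a labeled expression is handed to the action
	// as an interface{}; the assertion recovers the underlying slice.
	var elements interface{} = []interface{}{"bold text"}
	qt, err := NewQuotedText(Bold, elements.([]interface{}))
	fmt.Printf("%+v %v\n", qt, err)
}
```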

  19. Demo

  20. github.com/xcoulon/godays2020

  21. Advanced features of mna/pigeon
    // predicate to skip a rule if a condition is not met
    // check characters without processing
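
The second comment above, checking characters without processing them, corresponds to the PEG lookahead predicates usually written `&e` and `!e`. The slide's grammar snippets are not in the transcript, so the following is a hand-written Go sketch of the idea rather than pigeon syntax. The `scanner` type and the `openBold` rule (a `*` that must not be followed by a space) are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

type scanner struct {
	input string
	pos   int
}

// literal consumes lit if the remaining input starts with it.
func (s *scanner) literal(lit string) bool {
	if strings.HasPrefix(s.input[s.pos:], lit) {
		s.pos += len(lit)
		return true
	}
	return false
}

// and reports whether match would succeed here, then restores the
// position so that no input is consumed (PEG "&e").
func (s *scanner) and(match func() bool) bool {
	start := s.pos
	ok := match()
	s.pos = start
	return ok
}

// not succeeds only if match would fail here; it never consumes input (PEG "!e").
func (s *scanner) not(match func() bool) bool {
	return !s.and(match)
}

// openBold <- '*' !' '   (hypothetical rule: a '*' not followed by a space)
func (s *scanner) openBold() bool {
	start := s.pos
	if s.literal("*") && s.not(func() bool { return s.literal(" ") }) {
		return true
	}
	s.pos = start // the rule failed as a whole, so give back the '*'
	return false
}

func main() {
	a := &scanner{input: "*bold*"}
	fmt.Println(a.openBold(), a.pos) // true 1

	b := &scanner{input: "* list item"}
	fmt.Println(b.openBold(), b.pos) // false 0
}
```

The first comment, skipping a rule when a condition is not met, is the same mechanism with arbitrary Go code in place of a character check: pigeon's code predicates return a boolean and an error, and the rule only continues when the boolean has the expected value.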

  22. Wanna contribute to a parser?
    github.com/bytesparadise/libasciidoc

  23. Photo by Portuguese Gravity on Unsplash
