
Writing a language parser in 15min (or less) - Xavier Coulon - Red Hat

GoDays
January 23, 2020

Using regular expressions to process content may be enough in some cases, but as the grammar grows in complexity, they become a nightmare to maintain. This is where parsers based on Parsing Expression Grammars (PEG) come to the rescue.
In this talk, we will see how to build such a parser to handle a small subset of the Asciidoc markup language.

Transcript

  1. Writing a Language Parser in 15min (or less)

    Xavier Coulon - Red Hat
    twitter.com/xcoulon
    medium.com/xcoulon
  2. About me

    - Working at Red Hat for 8+ years
    - Working on OpenShift.io and its successor (we are hiring!)
    - In my free time, coding on a library to convert Asciidoc to HTML
  3. Asciidoc Markup Language

    - Similar to Markdown
    - Lots of features
    - Very well suited for documentation
  4. Asciidoc Markup Language Example

    *Bold text*
    _Italic text_
    `Monospace text`

  5. Asciidoc Markup Language Example

    *Bold text*        \*\w+[\s+\w+]*\*
    _Italic text_      _\w+[\s+\w+]*_
    `Monospace text`   \x60\w+[\s+\w+]*\x60
  6. Asciidoc Markup Language Example

    Some *bold and _italic and `monospace text`_*
  7. Photo by Aarón Blanco Tejedor on Unsplash

  8. Parsing Expression Grammar

  9. PEG to the Rescue

    Parsing Expression Grammars:
    - Describe a language, using rules to recognize strings
    - Apply the first matching rule recursively until the end of the document is reached, or try the next rule if it does not match
  10. Defining the Grammar

    Document <- DocumentBlock* EOF
    DocumentBlock <- Paragraph / BlankLine
    Paragraph <- ParagraphLine+
    ParagraphLine <- (QuotedText / Text / WS)+ EOL
    QuotedText <- BoldText / ItalicText / MonospaceText
    BoldText <- '*' !WS BoldTextElement (WS+ BoldTextElement)* '*'
    BoldTextElement <- ItalicText / MonospaceText / Text
    ItalicText <- ...
    MonospaceText <- ...
    Text <- [a-zA-Z0-9]+
  11. Defining the Grammar (1/2)

    Document <- DocumentBlock* EOF
    DocumentBlock <- Paragraph / BlankLine
    BlankLine <- WS* Newline
    Paragraph <- ParagraphLine+
    ParagraphLine <- (QuotedText / Text / WS)+ EOL
    QuotedText <- BoldText / ItalicText / MonospaceText
    WS <- " "
    Newline <- "\r\n" / "\r" / "\n"
    EOF <- !.
    EOL <- Newline / EOF
  12. Defining the Grammar (2/2)

    BoldText <- '*' !WS BoldTextElement (WS+ BoldTextElement)* '*'
    BoldTextElement <- ItalicText / MonospaceText / Text
    ItalicText <- ...
    MonospaceText <- ...
    Text <- [a-zA-Z0-9]+
  13. Grammar in action

    Some *bold and _italic and `monospace text`_*

    Document <- DocumentBlock* EOF
    DocumentBlock <- Paragraph / BlankLine
    Paragraph <- ParagraphLine+
    ParagraphLine <- (QuotedText / Text / WS)+ EOL
    QuotedText <- BoldText / ItalicText / MonospaceText
    BoldText <- '*' !WS BoldTextElement (WS+ BoldTextElement)* '*'
    ItalicText <- '_' !WS ItalicTextElement (WS+ ItalicTextElement)* '_'
    MonospaceText <- '`' !WS MonospaceTextElement (WS+ MonospaceTextElement)* '`'
    Text <- [a-zA-Z0-9]+
    WS <- " "
  14. Writing a parser in Go

  15. Generating (not writing) a parser in Go

  16. Generating (not writing) a parser in Go

    github.com/mna/pigeon

  17. Generating (not writing) a parser in Go

    $ go install github.com/mna/pigeon
    $ pigeon ./pkg/parser/parser.peg > ./pkg/parser/parser.go
  18. Generating (not writing) a parser in Go

    BoldText <- '*' !WS BoldTextElement (WS+ BoldTextElement)* '*'

    BoldText <- '*' !WS elements:(BoldTextElement (WS+ BoldTextElement)*) '*' {
        return types.NewQuotedText(types.Bold, elements.([]interface{}))
    }

    {
    package parser
    import (...)
    }
  19. Demo

  20. github.com/xcoulon/godays2020

  21. Advanced features of mna/pigeon

    // predicate to skip a rule if a condition is not met
    // check characters without processing
  22. Wanna contribute to a parser? github.com/bytesparadise/libasciidoc

  23. Photo by Portuguese Gravity on Unsplash