Writing a language parser in 15min (or less) - Xavier Coulon - Red Hat

Writing a Language Parser in 15min (or less)
Xavier Coulon - Red Hat
twitter.com/xcoulon
medium.com/xcoulon

About me
- Working at Red Hat for 8+ years
- Working on OpenShift.io and its successor (we are hiring!)
- In my free time, coding on a library to convert Asciidoc to HTML

Asciidoc Markup Language
- Similar to Markdown
- Lots of features
- Very well suited for documentation

Asciidoc Markup Language Example
*Bold text*
_Italic text_
`Monospace text`

Asciidoc Markup Language Example
*Bold text*         \*\w+[\s+\w+]*\*
_Italic text_       _\w+[\s+\w+]*_
`Monospace text`    \x60\w+[\s+\w+]*\x60

Asciidoc Markup Language Example
Some *bold and _italic and `monospace text`_*
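The nested example is where one regular expression per style stops being enough. The following Go sketch compiles the three patterns from the previous slide with the standard `regexp` package and runs them against both the simple and the nested input. The variable names are illustrative and not taken from the talk.

```go
package main

import (
	"fmt"
	"regexp"
)

// One pattern per quoting style, copied from the slide.
// \x60 is the backtick, which cannot appear inside a Go raw string.
var (
	bold      = regexp.MustCompile(`\*\w+[\s+\w+]*\*`)
	italic    = regexp.MustCompile(`_\w+[\s+\w+]*_`)
	monospace = regexp.MustCompile(`\x60\w+[\s+\w+]*\x60`)
)

func main() {
	// Flat input: each pattern recognizes its own style.
	fmt.Println(bold.MatchString("*Bold text*"))
	fmt.Println(italic.MatchString("_Italic text_"))
	fmt.Println(monospace.MatchString("`Monospace text`"))

	// Nested input: the outer styles contain characters (the backticks)
	// that their patterns do not allow, so only the innermost span matches.
	nested := "Some *bold and _italic and `monospace text`_*"
	fmt.Println(bold.MatchString(nested))
	fmt.Println(italic.MatchString(nested))
	fmt.Println(monospace.FindString(nested))
}
```

Widening the character classes would let the outer patterns match, but the result would still be a flat string with no record of which style contains which.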


Parsing Expression Grammar

PEG to the Rescue
Parsing Expression Grammars:
- Describe a language, using rules to recognize strings
- Apply the first matching rule recursively until the end of the document is reached, or fall back to the next rule when a match fails
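The "first matching rule wins, otherwise try the next one" behaviour can be sketched with a few small Go functions. This is a hand-written illustration of ordered choice and backtracking, not code from the talk or from any parser generator; the names `parser`, `literal`, `sequence` and `choice` are made up for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// parser tries to match input at pos and reports the new position on success.
type parser func(input string, pos int) (int, bool)

// literal matches an exact string.
func literal(lit string) parser {
	return func(input string, pos int) (int, bool) {
		if strings.HasPrefix(input[pos:], lit) {
			return pos + len(lit), true
		}
		return pos, false
	}
}

// sequence matches all parsers one after the other, or nothing at all.
func sequence(ps ...parser) parser {
	return func(input string, pos int) (int, bool) {
		cur := pos
		for _, p := range ps {
			next, ok := p(input, cur)
			if !ok {
				return pos, false // backtrack to where the sequence started
			}
			cur = next
		}
		return cur, true
	}
}

// choice is the PEG ordered choice: alternatives are tried in order and
// the first one that matches wins; later ones are never considered.
func choice(ps ...parser) parser {
	return func(input string, pos int) (int, bool) {
		for _, p := range ps {
			if next, ok := p(input, pos); ok {
				return next, true
			}
		}
		return pos, false
	}
}

func main() {
	// A newline rule with three alternatives, longest first.
	newline := choice(literal("\r\n"), literal("\r"), literal("\n"))
	end, ok := newline("a\r\nb", 1)
	fmt.Println(end, ok)

	// With the alternatives reordered, "\r" wins and "\n" is left behind.
	reordered := choice(literal("\r"), literal("\r\n"))
	end, ok = reordered("a\r\nb", 1)
	fmt.Println(end, ok)
}
```

Because the choice is ordered, the order of alternatives is part of the grammar's meaning: putting the shorter alternative first changes what gets consumed.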

Defining the Grammar
Document <- DocumentBlock* EOF
DocumentBlock <- Paragraph / BlankLine
Paragraph <- ParagraphLine+
ParagraphLine <- (QuotedText / Text / WS)+ EOL
QuotedText <- BoldText / ItalicText / MonospaceText
BoldText <- '*' !WS BoldTextElement (WS+ BoldTextElement)* '*'
BoldTextElement <- ItalicText / MonospaceText / Text
ItalicText <- ...
MonospaceText <- ...
Text <- [a-zA-Z0-9]+

Defining the Grammar (1/2)
Document <- DocumentBlock* EOF
DocumentBlock <- Paragraph / BlankLine
BlankLine <- WS* Newline
Paragraph <- ParagraphLine+
ParagraphLine <- (QuotedText / Text / WS)+ EOL
QuotedText <- BoldText / ItalicText / MonospaceText
WS <- " "
Newline <- "\r\n" / "\r" / "\n"
EOF <- !.
EOL <- Newline / EOF

Defining the Grammar (2/2)
BoldText <- '*' !WS BoldTextElement (WS+ BoldTextElement)* '*'
BoldTextElement <- ItalicText / MonospaceText / Text
ItalicText <- ...
MonospaceText <- ...
Text <- [a-zA-Z0-9]+

Grammar in action
Some *bold and _italic and `monospace text`_*
Document <- DocumentBlock* EOF
DocumentBlock <- Paragraph / BlankLine
Paragraph <- ParagraphLine+
ParagraphLine <- (QuotedText / Text / WS)+ EOL
QuotedText <- BoldText / ItalicText / MonospaceText
BoldText <- '*' !WS BoldTextElement (WS+ BoldTextElement)* '*'
ItalicText <- '_' !WS ItalicTextElement (WS+ ItalicTextElement)* '_'
MonospaceText <- '`' !WS MonospaceTextElement (WS+ MonospaceTextElement)* '`'
Text <- [a-zA-Z0-9]+
WS <- " "
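To see how these rules walk through the nested example, here is a minimal hand-written recursive-descent sketch in Go of the quoted-text rules only. It is an illustration, not the generated parser. The slides do not define ItalicTextElement or MonospaceTextElement, so the sketch assumes they mirror BoldTextElement (any other quoted style, or Text). The `node` type and the function names are hypothetical.

```go
package main

import "fmt"

// node is a tiny parse-tree node: Kind is "bold", "italic", "monospace" or "text".
type node struct {
	Kind     string
	Value    string
	Children []node
}

var kinds = map[byte]string{'*': "bold", '_': "italic", '`': "monospace"}

func isAlnum(c byte) bool {
	return c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' || c >= '0' && c <= '9'
}

// text implements: Text <- [a-zA-Z0-9]+
func text(s string, pos int) (node, int, bool) {
	end := pos
	for end < len(s) && isAlnum(s[end]) {
		end++
	}
	if end == pos {
		return node{}, pos, false
	}
	return node{Kind: "text", Value: s[pos:end]}, end, true
}

// element is an ordered choice: a quoted text using another delimiter, else Text.
// ASSUMPTION: the italic and monospace element rules mirror BoldTextElement.
func element(s string, pos int, outer byte) (node, int, bool) {
	for _, d := range []byte{'*', '_', '`'} {
		if d == outer {
			continue
		}
		if n, next, ok := quoted(s, pos, d); ok {
			return n, next, true
		}
	}
	return text(s, pos)
}

// quoted implements: delim !WS Element (WS+ Element)* delim
func quoted(s string, pos int, delim byte) (node, int, bool) {
	if pos >= len(s) || s[pos] != delim {
		return node{}, pos, false
	}
	cur := pos + 1
	if cur < len(s) && s[cur] == ' ' { // !WS: lookahead only, consumes nothing
		return node{}, pos, false
	}
	first, next, ok := element(s, cur, delim)
	if !ok {
		return node{}, pos, false
	}
	children := []node{first}
	cur = next
	for {
		ws := cur
		for ws < len(s) && s[ws] == ' ' {
			ws++
		}
		if ws == cur { // WS+ needs at least one space
			break
		}
		el, after, ok := element(s, ws, delim)
		if !ok { // backtrack: the spaces are not consumed either
			break
		}
		children = append(children, el)
		cur = after
	}
	if cur >= len(s) || s[cur] != delim {
		return node{}, pos, false
	}
	return node{Kind: kinds[delim], Children: children}, cur + 1, true
}

func main() {
	tree, _, ok := quoted("*bold and _italic and `monospace text`_*", 0, '*')
	fmt.Println(ok, tree.Kind, len(tree.Children))
	fmt.Println(tree.Children[2].Kind, tree.Children[2].Children[2].Kind)
}
```

Each rule becomes one function, and nesting falls out of the mutual recursion between `quoted` and `element`, which is what the regular expressions could not express.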

Writing a parser in Go

Generating (rather than writing) a parser in Go

Generating (rather than writing) a parser in Go
github.com/mna/pigeon

Generating (rather than writing) a parser in Go
$ go install github.com/mna/pigeon
$ pigeon ./pkg/parser/parser.peg > ./pkg/parser/parser.go

Generating (rather than writing) a parser in Go
BoldText <- '*' !WS BoldTextElement (WS+ BoldTextElement)* '*'
BoldText <- '*' !WS elements:(BoldTextElement (WS+ BoldTextElement)*) '*' {
    return types.NewQuotedText(types.Bold, elements.([]interface{}))
}
{
    package parser
    import (...)
}
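The action receives the labeled `elements` value as an `interface{}`, which is why the slide asserts it to `[]interface{}`. In pigeon, a sequence and a repetition each yield a `[]interface{}`, so the value is nested rather than flat. The sketch below shows one way a constructor could flatten it. The `QuotedText` type, the `flatten` helper and this `NewQuotedText` signature are hypothetical stand-ins written for illustration; they are not the actual `types` package used in the talk.

```go
package main

import "fmt"

// QuotedTextKind and QuotedText are HYPOTHETICAL stand-ins for the types
// package referenced on the slide; the real definitions may differ.
type QuotedTextKind int

const (
	Bold QuotedTextKind = iota
	Italic
	Monospace
)

type QuotedText struct {
	Kind     QuotedTextKind
	Elements []interface{}
}

// flatten collects the leaves of nested []interface{} values, skipping nils.
func flatten(v interface{}, out []interface{}) []interface{} {
	switch v := v.(type) {
	case nil:
		return out
	case []interface{}:
		for _, e := range v {
			out = flatten(e, out)
		}
		return out
	default:
		return append(out, v)
	}
}

// NewQuotedText is an assumed signature: it returns a value and an error,
// which is the pair a pigeon action is expected to return.
func NewQuotedText(kind QuotedTextKind, elements []interface{}) (QuotedText, error) {
	return QuotedText{Kind: kind, Elements: flatten(elements, nil)}, nil
}

func main() {
	// Shape assumed for `elements` after matching "bold text":
	// [first, [[[ws...], next], ...]]. Strings stand in for Text results.
	elements := []interface{}{
		"bold",
		[]interface{}{
			[]interface{}{[]interface{}{[]byte(" ")}, "text"},
		},
	}
	qt, err := NewQuotedText(Bold, elements)
	fmt.Println(qt.Kind == Bold, len(qt.Elements), err)
}
```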

github.com/xcoulon/godays2020

Advanced features of mna/pigeon
// predicate to skip a rule if a condition is not met
// check characters without processing
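Both ideas, a condition that gates a rule and a check that looks at characters without consuming them, can be sketched in plain Go. This is a hand-written illustration of the concepts, not pigeon's generated code or its grammar syntax; `parser`, `literal`, `sequence`, `and`, `not` and `when` are names invented for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// parser tries to match input at pos and reports the new position on success.
type parser func(input string, pos int) (int, bool)

func literal(lit string) parser {
	return func(input string, pos int) (int, bool) {
		if strings.HasPrefix(input[pos:], lit) {
			return pos + len(lit), true
		}
		return pos, false
	}
}

func sequence(ps ...parser) parser {
	return func(input string, pos int) (int, bool) {
		cur := pos
		for _, p := range ps {
			next, ok := p(input, cur)
			if !ok {
				return pos, false
			}
			cur = next
		}
		return cur, true
	}
}

// and succeeds if p would match here, but consumes nothing (positive lookahead).
func and(p parser) parser {
	return func(input string, pos int) (int, bool) {
		_, ok := p(input, pos)
		return pos, ok
	}
}

// not succeeds if p would NOT match here, and consumes nothing (negative lookahead).
func not(p parser) parser {
	return func(input string, pos int) (int, bool) {
		_, ok := p(input, pos)
		return pos, !ok
	}
}

// when gates a rule on an arbitrary condition, consuming nothing.
func when(cond func() bool) parser {
	return func(input string, pos int) (int, bool) {
		return pos, cond()
	}
}

func main() {
	// The opening of BoldText: '*' !WS
	opening := sequence(literal("*"), not(literal(" ")))
	fmt.Println(opening("*bold", 0))
	fmt.Println(opening("* bold", 0))

	// A rule that is skipped entirely when a flag is off.
	enabled := false
	gated := sequence(when(func() bool { return enabled }), literal("*"))
	fmt.Println(gated("*bold", 0))
}
```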

Wanna contribute to a parser? github.com/bytesparadise/libasciidoc



GoDays
