Slide 1

Slide 1 text

Writing a Language Parser in 15min (or less)
Xavier Coulon - Red Hat
twitter.com/xcoulon
medium.com/xcoulon

Slide 2

Slide 2 text

About me
- Working at Red Hat for 8+ years
- Working on OpenShift.io and its successor (we are hiring!)
- In my free time, coding a library to convert Asciidoc to HTML

Slide 3

Slide 3 text

Asciidoc Markup Language
- Similar to Markdown
- Lots of features
- Very well suited for documentation

Slide 4

Slide 4 text

Asciidoc Markup Language
Example
*Bold text*
_Italic text_
`Monospace text`

Slide 5

Slide 5 text

Asciidoc Markup Language
Example
*Bold text*         \*\w+[\s+\w+]*\*
_Italic text_       _\w+[\s+\w+]*_
`Monospace text`    \x60\w+[\s+\w+]*\x60

Slide 6

Slide 6 text

Asciidoc Markup Language
Example
Some *bold and _italic and `monospace text`_*
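
The nested example is where the regular-expression approach starts to struggle. As a minimal sketch (the helper name `matchesBold` is made up for this illustration), the bold-text expression from the previous slide can be tried against both inputs with Go's standard `regexp` package:

```go
package main

import (
	"fmt"
	"regexp"
)

// boldPattern is the bold-text expression from the previous slide, unchanged.
var boldPattern = regexp.MustCompile(`\*\w+[\s+\w+]*\*`)

// matchesBold is a hypothetical helper: it reports whether the input
// contains a span that boldPattern recognizes.
func matchesBold(input string) bool {
	return boldPattern.MatchString(input)
}

func main() {
	fmt.Println(matchesBold("*Bold text*")) // true
	// The backtick is not covered by the character class, so the
	// nested example is not recognized as bold text at all.
	fmt.Println(matchesBold("Some *bold and _italic and `monospace text`_*")) // false
}
```

Widening the character class would make the second call match, but the result would still be one flat span, with no record of the italic and monospace text nested inside it.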

Slide 7

Slide 7 text

Photo by Aarón Blanco Tejedor on Unsplash

Slide 8

Slide 8 text

Parsing Expression Grammar

Slide 9

Slide 9 text

PEG to the Rescue
Parsing Expression Grammars:
- Describe a language using rules that recognize strings
- Try the alternatives of a rule in order and apply the first one that matches, recursively, until the end of the document is reached; when an alternative fails, the parser moves on to the next one
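
The "first matching rule wins" behavior can be sketched with a couple of hand-written functions. This is only an illustration of ordered choice, not how any particular PEG library is implemented; the names `parser`, `literal` and `choice` are invented for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// parser tries to match input at pos. On success it returns the position
// after the match; on failure it returns pos unchanged and false.
type parser func(input string, pos int) (int, bool)

// literal matches a fixed string.
func literal(s string) parser {
	return func(input string, pos int) (int, bool) {
		if strings.HasPrefix(input[pos:], s) {
			return pos + len(s), true
		}
		return pos, false
	}
}

// choice is the PEG ordered choice ("/"): alternatives are tried in
// order and the first one that matches wins; later ones are ignored.
func choice(alternatives ...parser) parser {
	return func(input string, pos int) (int, bool) {
		for _, p := range alternatives {
			if next, ok := p(input, pos); ok {
				return next, true
			}
		}
		return pos, false
	}
}

func main() {
	// Equivalent to a rule such as: Newline <- "\r\n" / "\r" / "\n"
	newline := choice(literal("\r\n"), literal("\r"), literal("\n"))
	fmt.Println(newline("\r\nrest", 0)) // 2 true

	// With the alternatives reordered, "\r" wins and "\n" is left over.
	swapped := choice(literal("\r"), literal("\r\n"))
	fmt.Println(swapped("\r\nrest", 0)) // 1 true
}
```

Because the choice is ordered, the longer alternative has to come first whenever one alternative is a prefix of another.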

Slide 10

Slide 10 text

Defining the Grammar
Document        <- DocumentBlock* EOF
DocumentBlock   <- Paragraph / BlankLine
Paragraph       <- ParagraphLine+
ParagraphLine   <- (QuotedText / Text / WS)+ EOL
QuotedText      <- BoldText / ItalicText / MonospaceText
BoldText        <- '*' !WS BoldTextElement (WS+ BoldTextElement)* '*'
BoldTextElement <- ItalicText / MonospaceText / Text
ItalicText      <- ...
MonospaceText   <- ...
Text            <- [a-zA-Z0-9]+

Slide 11

Slide 11 text

Defining the Grammar (1/2)
Document      <- DocumentBlock* EOF
DocumentBlock <- Paragraph / BlankLine
BlankLine     <- WS* Newline
Paragraph     <- ParagraphLine+
ParagraphLine <- (QuotedText / Text / WS)+ EOL
QuotedText    <- BoldText / ItalicText / MonospaceText
WS            <- " "
Newline       <- "\r\n" / "\r" / "\n"
EOF           <- !.
EOL           <- Newline / EOF

Slide 12

Slide 12 text

Defining the Grammar (2/2)
BoldText        <- '*' !WS BoldTextElement (WS+ BoldTextElement)* '*'
BoldTextElement <- ItalicText / MonospaceText / Text
ItalicText      <- ...
MonospaceText   <- ...
Text            <- [a-zA-Z0-9]+

Slide 13

Slide 13 text

Grammar in action
Some *bold and _italic and `monospace text`_*

Document      <- DocumentBlock* EOF
DocumentBlock <- Paragraph / BlankLine
Paragraph     <- ParagraphLine+
ParagraphLine <- (QuotedText / Text / WS)+ EOL
QuotedText    <- BoldText / ItalicText / MonospaceText
BoldText      <- '*' !WS BoldTextElement (WS+ BoldTextElement)* '*'
ItalicText    <- '_' !WS ItalicTextElement (WS+ ItalicTextElement)* '_'
MonospaceText <- '`' !WS MonospaceTextElement (WS+ MonospaceTextElement)* '`'
Text          <- [a-zA-Z0-9]+
WS            <- " "
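
To get a feel for how these rules walk through the example line, here is a rough hand-written recognizer for a simplified version of the grammar. It makes several assumptions that are not in the slides: all three quoted-text rules share one generic `quoted` function, any kind of quoted text may nest inside any other (the `ItalicTextElement` and `MonospaceTextElement` rules are not shown in the deck), whitespace is dropped, and EOL handling is left out. All names are invented for the sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// kinds maps each quote marker to the name used in the rendered output.
var kinds = map[byte]string{'*': "bold", '_': "italic", '`': "monospace"}

type parser struct {
	input string
	pos   int
}

func isAlnum(c byte) bool {
	return c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' || c >= '0' && c <= '9'
}

// text implements Text <- [a-zA-Z0-9]+
func (p *parser) text() (string, bool) {
	start := p.pos
	for p.pos < len(p.input) && isAlnum(p.input[p.pos]) {
		p.pos++
	}
	return p.input[start:p.pos], p.pos > start
}

// spaces implements WS* and returns the number of spaces consumed.
func (p *parser) spaces() int {
	start := p.pos
	for p.pos < len(p.input) && p.input[p.pos] == ' ' {
		p.pos++
	}
	return p.pos - start
}

// element implements the ordered choice QuotedText / Text.
func (p *parser) element() (string, bool) {
	if s, ok := p.quoted(); ok {
		return s, true
	}
	return p.text()
}

// quoted implements: marker !WS element (WS+ element)* marker
func (p *parser) quoted() (string, bool) {
	start := p.pos
	if p.pos >= len(p.input) || kinds[p.input[p.pos]] == "" {
		return "", false
	}
	marker := p.input[p.pos]
	p.pos++
	first, ok := p.element() // !WS holds implicitly: no element starts with a space
	if !ok {
		p.pos = start
		return "", false
	}
	children := []string{first}
	for {
		save := p.pos
		if p.spaces() == 0 {
			break
		}
		next, ok := p.element()
		if !ok {
			p.pos = save
			break
		}
		children = append(children, next)
	}
	if p.pos >= len(p.input) || p.input[p.pos] != marker {
		p.pos = start // backtrack, as a PEG parser would
		return "", false
	}
	p.pos++
	return kinds[marker] + "(" + strings.Join(children, " ") + ")", true
}

// parseLine is a single-line take on (QuotedText / Text / WS)+ that
// renders what it recognized as a nested string.
func parseLine(input string) (string, error) {
	p := &parser{input: input}
	var parts []string
	for p.pos < len(p.input) {
		if p.spaces() > 0 {
			continue
		}
		s, ok := p.element()
		if !ok {
			return "", fmt.Errorf("unexpected %q at offset %d", p.input[p.pos], p.pos)
		}
		parts = append(parts, s)
	}
	return strings.Join(parts, " "), nil
}

func main() {
	out, err := parseLine("Some *bold and _italic and `monospace text`_*")
	fmt.Println(out, err)
	// Some bold(bold and italic(italic and monospace(monospace text))) <nil>
}
```

Even for this small subset, the backtracking and position bookkeeping take up most of the code, which is a good argument for generating the parser from the grammar instead.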

Slide 14

Slide 14 text

Writing a parser in Go

Slide 15

Slide 15 text

Generating (rather than writing) a parser in Go

Slide 16

Slide 16 text

Generating (rather than writing) a parser in Go
github.com/mna/pigeon

Slide 17

Slide 17 text

Generating (rather than writing) a parser in Go
$ go install github.com/mna/pigeon
$ pigeon ./pkg/parser/parser.peg > ./pkg/parser/parser.go

Slide 18

Slide 18 text

Generating (rather than writing) a parser in Go

Grammar rule:
BoldText <- '*' !WS BoldTextElement (WS+ BoldTextElement)* '*'

Same rule with a labeled expression and a Go action:
BoldText <- '*' !WS elements:(BoldTextElement (WS+ BoldTextElement)*) '*' {
    return types.NewQuotedText(types.Bold, elements.([]interface{}))
}

Initializer block of the grammar file:
{
    package parser

    import (...)
}
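
The slide does not show the `types` package, so the following is only a guess at what a constructor like `types.NewQuotedText` could look like; the real libasciidoc code may differ. It relies on two pigeon behaviors: action blocks return `(interface{}, error)`, and a sequence or repetition yields a nested `[]interface{}` (with `[]byte` for matched literals). The `flatten` helper and the shape of the sample input in `main` are assumptions made for the illustration.

```go
package main

import "fmt"

// QuotedTextKind and QuotedText are hypothetical stand-ins for the
// types referenced in the grammar action.
type QuotedTextKind string

const (
	Bold      QuotedTextKind = "bold"
	Italic    QuotedTextKind = "italic"
	Monospace QuotedTextKind = "monospace"
)

type QuotedText struct {
	Kind     QuotedTextKind
	Elements []interface{}
}

// NewQuotedText returns a value and an error so that a grammar action
// can return its result directly.
func NewQuotedText(kind QuotedTextKind, elements []interface{}) (QuotedText, error) {
	flat := flatten(elements)
	if len(flat) == 0 {
		return QuotedText{}, fmt.Errorf("empty %s text", kind)
	}
	return QuotedText{Kind: kind, Elements: flat}, nil
}

// flatten walks the nested slices produced by sequences and repetitions
// and collects the leaf values, turning raw []byte matches into strings.
func flatten(values []interface{}) []interface{} {
	var out []interface{}
	for _, v := range values {
		switch v := v.(type) {
		case []interface{}:
			out = append(out, flatten(v)...)
		case []byte:
			out = append(out, string(v))
		case nil:
			// skip empty matches
		default:
			out = append(out, v)
		}
	}
	return out
}

func main() {
	// Assumed shape of `elements` for "*bold text*", supposing the Text
	// rule has an action that already returns a string.
	elements := []interface{}{
		"bold",
		[]interface{}{
			[]interface{}{[]interface{}{[]byte(" ")}, "text"},
		},
	}
	qt, err := NewQuotedText(Bold, elements)
	fmt.Printf("%s %q %v\n", qt.Kind, qt.Elements, err)
}
```

Keeping the flattening inside the constructor means the grammar action stays a one-liner, as on the slide.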

Slide 19

Slide 19 text

Demo

Slide 20

Slide 20 text

github.com/xcoulon/godays2020

Slide 21

Slide 21 text

Advanced features of mna/pigeon
// predicate to skip a rule if a condition is not met
// check characters without processing
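
The second comment most likely refers to the PEG lookahead predicates `&` and `!`, which test what comes next without consuming it; the grammar already uses them in `!WS` and in `EOF <- !.`. The sketch below shows the idea with hand-written functions. It does not use pigeon's API, and the names `parser`, `char`, `anyChar`, `not` and `and` are invented for the example.

```go
package main

import "fmt"

// parser tries to match input at pos and returns the position after
// the match, or pos unchanged and false when it does not match.
type parser func(input string, pos int) (int, bool)

// char matches one specific byte.
func char(c byte) parser {
	return func(input string, pos int) (int, bool) {
		if pos < len(input) && input[pos] == c {
			return pos + 1, true
		}
		return pos, false
	}
}

// anyChar matches any single byte, like "." in a PEG.
func anyChar(input string, pos int) (int, bool) {
	if pos < len(input) {
		return pos + 1, true
	}
	return pos, false
}

// not is the "!" predicate: it succeeds only when p fails, and it
// never consumes input.
func not(p parser) parser {
	return func(input string, pos int) (int, bool) {
		_, ok := p(input, pos)
		return pos, !ok
	}
}

// and is the "&" predicate: it succeeds only when p succeeds, and it
// never consumes input either.
func and(p parser) parser {
	return func(input string, pos int) (int, bool) {
		_, ok := p(input, pos)
		return pos, ok
	}
}

func main() {
	notWS := not(char(' ')) // !WS
	eof := not(anyChar)     // EOF <- !.

	fmt.Println(notWS("bold", 0))  // 0 true
	fmt.Println(notWS(" bold", 0)) // 0 false
	fmt.Println(eof("bold", 4))    // 4 true
	fmt.Println(and(char('*'))("*bold*", 0)) // 0 true
}
```

In every case the returned position equals the starting position, which is what "checking without processing" amounts to.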

Slide 22

Slide 22 text

Wanna contribute to a parser?
github.com/bytesparadise/libasciidoc

Slide 23

Slide 23 text

Photo by Portuguese Gravity on Unsplash