Demystifying compilers by writing your own

Hey, let's start the talk: “Demystifying compilers by writing your own”.

Bruno Macabeus, github.com/macabeus, macalogs.com.br, Developer at Pagar.me
I'm Bruno Macabeus, a software developer at Pagar.me. Here is my blog, and here is my GitHub, where you can find the source code of the compiler that I will show in this talk.

I'll talk about several apparently unrelated subjects, like the online game Ragnarok, programming languages, and even trees. By the end of the talk all these subjects will be connected and make sense.

COMPILERS
Let's go to the main topic: compilers!

COMPILERS?
But what is a compiler? What's the definition?

Definitions
We have many definitions.

Definitions: a compiler is software that translates code from language A to language B. A ≠ B?
A very simple definition is that a compiler is software that translates code from language A to language B. But it raises a question: must A be different from B, or does it make sense for a compiler to translate code from language A to language A?

Definitions: Closure Compiler, Babel
We have a very nice compiler called Babel. It translates JS code to JS code, either between different JS versions or within the same JS version, generating optimized code that runs faster or is lighter.

Definitions: A could be equal to B!
So a compiler that translates code from language A to language A does make sense.

Definitions: but does it need to compile between languages?
But does a compiler need a language as its input? Or does it make sense for a compiler to translate from something that isn't a language, for example, a document?

Definitions: Pagedraw
We have a very nice compiler called Pagedraw, a web tool that translates a document, such as a webpage layout made in Sketch, into web languages like JS and CSS.

Definitions: a compiler is software that transforms a data representation into another data representation that is in some way related or equivalent to the first
So we have a more generic definition: a compiler is software that transforms a data representation into another data representation that is in some way related or equivalent to the first.

OK… BUT WE ALREADY HAVE MANY, MANY COMPILERS AND LANGUAGES…
WHY DO WE NEED TO CREATE ONE MORE?
OK… But we already have many, many compilers and languages. Why do we need to create one more compiler?

Reason #1 The language that we use has design problems
We have many reasons. For example: the language that we use has design problems that slow down our work.

TypeScript is a JS superset that compiles to JS, with the difference that it implements types
TypeScript is a JS superset that compiles to JS, with the difference that it implements types. It's very useful for more complex applications.

    // JS
    const createUser = (id, level) => {
      // code...
    }

    createUser(42, 'guest')
    // Is ID really a number?
    // What are all the valid levels?
    // Is the created user returned?

For example, we have a function called “createUser” that receives two parameters: id and level. But we don't know if “id” really is a number, we don't know all the valid levels, and we don't know if the new user is returned by this function.

    // TypeScript
    enum level {
      guest = 'guest',
      normal = 'normal',
      admin = 'admin',
    }

    const createUser = (id: number, level: level) => {
      // code...
    }

    createUser(42, level.guest)
    // All doubts are answered by reading the signature!

Using TypeScript we don't have these questions, because the function signature carries more information. Reading it, we find that “id” really is a number, we discover all the valid levels, and we find that the function doesn't return the new user. Maybe it saves it to a database.

Reason #2 To study a new concept
Another reason is when we are studying a new concept that current languages don't implement.

Koka is a language that separates values from side effects, and it infers the side effects at compile time
For example, Koka is a language that people at Microsoft Research are developing and studying. It aims to separate values from side effects, and to infer the effects at compile time.

    // Swift
    func getEvenNumber() -> Int {
      return Int(arc4random()) * 2
    }

    getEvenNumber()
    // does this func perform IO? is the result deterministic?
    // maybe this function never ends?

For example, we have this code written in Swift, and this function returns an even number. But from the signature alone we don't know if this function performs IO, if its result is deterministic, or if it might never end.

    // Koka
    function getEvenNumber() : ndet int {
      return randomInt() * 2
    }

    function main() {
      getEvenNumber()
      // reading the signature we discover that the
      // result is a random number!
    }

Using Koka, we don't have any doubts. We only need to read the function's signature: the keyword “ndet” tells us that this function is nondeterministic. In this case, both functions use a random function.

Reason #3 When you don't have any issue to solve, but you still want to write some cool code
Another reason is when you don't have any issue to solve, but you still want to write some cool code.

Piet, an esoteric language where the code is written using pixels!
This includes esoteric languages, such as Piet. In this language you write code using pixels! Believe me, this code is a sum and subtraction calculator.

Reason #4 To solve a specific issue
Then, another reason is when you have a very specific issue to solve.

Logo, a language with an educational purpose
For example, Logo has a specific purpose: education. So it has very nice visual feedback, where the student writes code that the “turtle” uses to draw shapes. A specific-purpose language isn't necessarily a domain-specific language. You can do anything using Logo, even though the language design aims to be useful in education.

EventMacro, a language to write macros for a bot for the game Ragnarok
Then we also have the language that I'm working on, because it has a specific issue to solve: writing macros for the online game Ragnarok.

Ragnarok
As many of you may already know, Ragnarok is an MMORPG. But after a while it gets boring to do the same thing in the game.

OpenKore
So what's the solution? To make the computer play for you, using a piece of software called OpenKore: a bot for Ragnarok.

EventMacro
OpenKore can perform simple actions, but for more complex actions you need to write some code. The language used to write this code is called EventMacro.

EventMacro
Let's learn more about this language.

    automacro ref {
      InInventory "Rough Oridecon" > 4
      call ref-while
    }

    macro ref-while {
      log go to refiner
      do move prt_in 59 60
      $ori = &invamount(Rough Oridecon)
      log I have $ori Rough Oridecons
      while (&invamount(Rough Oridecon) > 4) {
        do talk 0
        pause 0.8
        do talk resp 0
      }
    }

We can declare two types of instruction blocks: automacro and macro. In an automacro block, we write the conditions that trigger a macro. For example, if we have more than 4 of this item in the inventory, it should call the macro ref-while. A macro is a sequence of actions that the bot should do:
- “log go to refiner” prints a message on the console.
- “do move prt_in 59 60” moves the character on a map.
- “$ori = &invamount(Rough Oridecon)” saves the amount of an item in the inventory to a variable.
- “log I have $ori Rough Oridecons” prints this amount.
- the “while” block repeats an action while we have more than 4 of those.

Just a curiosity: one important contributor to this language is Brazilian.

EventMacro: very influenced by Perl
This language has some factors that influenced its design. Since OpenKore is written in Perl, EventMacro is very influenced by Perl.

EventMacro: very influenced by Perl

    my $scalar = 'foo';
    my @array = (1, 2, 3);
    my %hash = (1 => 'foo', 2 => 'bar');

For example, in Perl and in EventMacro we have three types of variables:
- scalar, which begins with a dollar sign
- array, which begins with an at sign
- hash, which begins with a percent sign

    my $variable = 'foo';
    print "variable value: $variable"

Another influence is implicit interpolation in strings.

EventMacro: it is run by an OpenKore plugin that reads each line, and it was designed to make a regular-expression-based interpreter easier to write
EventMacro is run by an OpenKore plugin that reads each line. Since the plugin tries to match each line against a regular expression, the language grammar was designed to make a regular-expression-based interpreter easier to write.

MacroCompiler: we can build a compiler for EventMacro that translates EventMacro code into an OpenKore plugin!
- Error and warning messages at compile time
- Optimized final code
And take a look at this nice idea! We can build a compiler that translates EventMacro code into an OpenKore plugin. With a compiler, we can have error and warning messages at compile time, and the sooner you find an error, the easier it is to fix. A compiler can also generate optimized code, removing some overhead.

MAYBE BUILDING A COMPILER ISN'T THE BEST SOLUTION
And a disclaimer: building a compiler may not be the best solution for your issue.

An eDSL (embedded domain-specific language) could be a simpler solution
An embedded domain-specific language may be a simpler solution.

An eDSL (embedded domain-specific language) could be a simpler solution
It's a way to structure the public API of a library as a programming language; that is, it has primitive keywords that can be joined to build routines.

    // JS
    const myRegexp = /^age (\d+)/

    // CSS selector in JS + jQuery
    const element = $('#foo .bar');

For example, you can see CSS selectors in jQuery, and regular expressions in JS, as a small language inside a bigger language.

    // Haskell + Functional MetaPost library
    beginfig(1)
    pair A, B, C;
    A:=(0, 0); B:=(1cm, 0); C:=(0, 1cm);
    draw A--B--C--cycle;
    endfig;

Another example is the Haskell library Functional MetaPost. Looking at this code, you could think it is a language for drawing shapes, but it's only a library.

BUILDING A LANGUAGE X A COMPILER
And another disclaimer: building a new language is very different from building a compiler.

When you are designing a language, you need to think more about the communication between the programmer and the computer
When you are building a new language, you should think more about the communication between the programmer and the computer.

Building a compiler is more like any other software development challenge
Building a compiler, on the other hand, is more like any other software development challenge.

HOW TO BUILD A COMPILER?
OK, but I want to build a new compiler. How can I build it?

STEPS OF A COMPILATION
A compilation has several steps.

Source code → Syntax analysis → Parser → Semantic analysis → Optimization → Code generation
First of all, we have source code in some language. The next step is “syntax analysis” (more commonly called lexical analysis, or scanning), which transforms the source code into a stream of tokens and passes it to a “parser” that builds a structure called an “abstract syntax tree”.

Scannerless Parsing
But, hey! The “syntax analysis” and the “parser” can be implemented as a single step; it's more of an implementation detail. A compiler where these steps aren't split uses what is called “scannerless parsing”.

Next we have the “semantic analysis”, which checks if the “abstract syntax tree” is semantically valid. Then we can pass it to an “optimization” step to remove overhead. Finally, we have the “code generation”, which builds our code in the target language.

    macro sayHi {
      $someone = Macabeus
      log Hi, $someone
    }

In this example, we have some very simple EventMacro code. We have a macro called “sayHi”. We assign the constant value “Macabeus” to a scalar variable called “someone”. And, on the next line, we send to the console the message “Hi” followed by the value of “someone”.

    [
      keyword(macro), identifier(sayHi), openBraces, newLine,
      scalarIdentifier(someone), equal, text(Macabeus), newLine,
      keyword(log), text(Hi, ), scalarIdentifier(someone), newLine,
      closeBraces
    ]

In the next step we could build a stream of tokens. We find the keyword “macro”, the identifier “sayHi”, openBraces, and many other tokens.

But my compiler doesn't have this step, because I use scannerless parsing, where we build the abstract syntax tree directly from the source code.
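To make the token stream above concrete, here is a minimal Python sketch of a tokenizer that handles only the commands used in this example. The function names and the line-by-line regular expressions are assumptions made for illustration; MacroCompiler itself is scannerless and has no such step.

```python
import re

def text_tokens(text):
    """Split free text into text(...) and scalarIdentifier(...) tokens."""
    tokens = []
    for part in re.split(r"(\$\w+)", text):
        if not part:
            continue
        if part.startswith("$"):
            tokens.append(("scalarIdentifier", part[1:]))
        else:
            tokens.append(("text", part))
    return tokens

def tokenize(source):
    """Turn the example macro into a flat list of (kind, value) tokens."""
    tokens = []
    for line in source.strip().splitlines():
        line = line.strip()
        if m := re.fullmatch(r"macro (\w+) \{", line):
            tokens += [("keyword", "macro"), ("identifier", m[1]), ("openBraces", None)]
        elif m := re.fullmatch(r"\$(\w+) = (.+)", line):
            tokens += [("scalarIdentifier", m[1]), ("equal", None)]
            tokens += text_tokens(m[2])
        elif m := re.fullmatch(r"log (.+)", line):
            tokens.append(("keyword", "log"))
            tokens += text_tokens(m[1])
        elif line == "}":
            tokens.append(("closeBraces", None))
            continue  # the slide shows no newLine after the closing brace
        else:
            raise SyntaxError(f"unknown line: {line!r}")
        tokens.append(("newLine", None))
    return tokens

SOURCE = """
macro sayHi {
  $someone = Macabeus
  log Hi, $someone
}
"""

for kind, value in tokenize(SOURCE):
    print(kind if value is None else f"{kind}({value})")
```

A real scanner would work character by character and know nothing about whole lines; matching line by line just keeps the sketch short.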

AST
The “AST” is a way to represent the entire source code as a tree, where each node is a simple part of the source code. Let's see how to build this AST.

    macro sayHi {
      $someone = Macabeus
      log Hi, $someone
    }

At this point the compiler sees that we wrote a macro block. It adds a “Macro” node that has two attributes: a name, “sayHi”, and an instruction block, which is an array.

Now comes a scalar assignment command. The node has an attribute with the scalar variable name and a scalar value to assign, in this case a text value. And the text value is the literal “Macabeus”.

The next command of the macro block is saved in the next slot of the array. The compiler finds the keyword “log”, and this node has a Text attribute to log. The text value has “Hi, ” and an interpolation with the “someone” variable.

Since we succeeded in building the AST, this code is syntactically correct.
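The tree built in these steps can be sketched in Python as follows. The node names follow the narration, but the field names and the tiny line-based parser are assumptions for illustration, not MacroCompiler's actual code.

```python
import re
from collections import namedtuple

# Hypothetical node types mirroring the nodes described in the talk.
Macro = namedtuple("Macro", "name block")              # block: list of commands
ScalarAssign = namedtuple("ScalarAssign", "name value")
Log = namedtuple("Log", "text")
TextValue = namedtuple("TextValue", "parts")            # str literals and ScalarVariable
ScalarVariable = namedtuple("ScalarVariable", "name")

def parse_text(raw):
    """Split raw text into literal pieces and $variable interpolations."""
    parts = []
    # With a capturing group, re.split alternates literal, name, literal, ...
    for index, piece in enumerate(re.split(r"\$(\w+)", raw)):
        if index % 2 == 1:
            parts.append(ScalarVariable(piece))
        elif piece:
            parts.append(piece)
    return TextValue(parts)

def parse_macro(source):
    """Build the AST straight from the source text (no token stream)."""
    lines = [line.strip() for line in source.strip().splitlines()]
    header = re.fullmatch(r"macro (\w+) \{", lines[0])
    if header is None or lines[-1] != "}":
        raise SyntaxError("expected 'macro <name> { ... }'")
    macro = Macro(header[1], [])
    for line in lines[1:-1]:
        if m := re.fullmatch(r"\$(\w+) = (.+)", line):
            macro.block.append(ScalarAssign(m[1], parse_text(m[2])))
        elif m := re.fullmatch(r"log (.+)", line):
            macro.block.append(Log(parse_text(m[1])))
        else:
            raise SyntaxError(f"unknown command: {line!r}")
    return macro

SOURCE = """
macro sayHi {
  $someone = Macabeus
  log Hi, $someone
}
"""

print(parse_macro(SOURCE))
```

If parsing finishes without raising SyntaxError, the source is syntactically correct in the sense used above.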

Let's go to the next step: semantic analysis, which checks if this code is semantically correct. For example, we check whether the code tries to read a variable that was never written. If it does, the code has a semantic error.

Symbol table
To do these checks, we need to build a structure called a symbol table. A symbol table exposes the information from the AST in a more accessible way.

To build this structure, we need to visit each AST node. In this node, we are writing a new macro, so let's record it in the symbol table: we are writing a macro called “sayHi”. In this sub-tree we are writing a scalar variable called “someone”. And in this sub-tree we are reading the variable “someone”.

    macro_write   : sayHi
    variable_write: $someone
    variable_read : $someone

Then we can check that every variable that was read was also written, so this code is semantically correct!
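As a rough sketch of this check in Python: the node types below are hypothetical stand-ins for the AST nodes named in the talk, and the symbol table is simply a list of (kind, name) entries like the ones shown on the slide.

```python
from collections import namedtuple

# Hypothetical AST node types (assumed for this sketch).
Macro = namedtuple("Macro", "name block")
ScalarAssign = namedtuple("ScalarAssign", "name value")
Log = namedtuple("Log", "text")
TextValue = namedtuple("TextValue", "parts")
ScalarVariable = namedtuple("ScalarVariable", "name")

def variable_reads(text):
    """Symbol-table entries for every variable interpolated in a text value."""
    return [("variable_read", "$" + part.name)
            for part in text.parts if isinstance(part, ScalarVariable)]

def build_symbol_table(macro):
    """Visit each AST node and record what it writes and reads."""
    table = [("macro_write", macro.name)]
    for command in macro.block:
        if isinstance(command, ScalarAssign):
            table += variable_reads(command.value)
            table.append(("variable_write", "$" + command.name))
        elif isinstance(command, Log):
            table += variable_reads(command.text)
    return table

def never_written(table):
    """Variables that are read somewhere but written nowhere."""
    written = {name for kind, name in table if kind == "variable_write"}
    return [name for kind, name in table
            if kind == "variable_read" and name not in written]

say_hi = Macro("sayHi", [
    ScalarAssign("someone", TextValue(["Macabeus"])),
    Log(TextValue(["Hi, ", ScalarVariable("someone")])),
])

table = build_symbol_table(say_hi)
for kind, name in table:
    print(f"{kind}: {name}")
print("semantic errors:", never_written(table))
```

This sketch only asks whether a variable is written anywhere in the macro; it does not check that the write happens before the read.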

Let's go to the next compilation step: optimization! In this step we want to remove overhead while preserving the semantics of the code.

Optimizations: Constant folding, Dead code strip
I implemented two very simple optimizations: constant folding and dead code strip. In constant folding, we propagate constant values from variables in order to reduce variable reads. In dead code strip, we remove unnecessary code.

Let's start with constant folding. On this line we can see that we are assigning a constant value to a variable, and we are using this value on the next line. Since we know this value at compile time, we can use it directly. Please notice that the AST is simpler now:

    macro sayHi {
      $someone = Macabeus
      log Hi, Macabeus
    }

Now the dead code strip. In this line we are assigning to a scalar variable, but we never read it, so we can remove this code. Note that we removed many AST nodes:

    macro sayHi {
      log Hi, Macabeus
    }

Since we changed the AST, let's run the optimizations again. Constant folding can't find anything to change, and neither can dead code strip. Since the optimizations can't change the AST anymore, we are done. Another way to stop is “run it 10 times, then stop”, for example.
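The two passes and the “repeat until nothing changes” loop can be sketched in Python like this. The node types are hypothetical stand-ins for the AST, and the sketch assumes straight-line macros (no loops or branches). What the talk calls constant folding is often called constant propagation elsewhere.

```python
from collections import namedtuple

# Hypothetical AST node types (assumed for this sketch).
Macro = namedtuple("Macro", "name block")
ScalarAssign = namedtuple("ScalarAssign", "name value")
Log = namedtuple("Log", "text")
TextValue = namedtuple("TextValue", "parts")
ScalarVariable = namedtuple("ScalarVariable", "name")

def fold_text(text, known):
    """Replace reads of known constants and merge adjacent literals."""
    parts = []
    for part in text.parts:
        if isinstance(part, ScalarVariable) and part.name in known:
            part = known[part.name]
        if isinstance(part, str) and parts and isinstance(parts[-1], str):
            parts[-1] += part
        else:
            parts.append(part)
    return TextValue(parts)

def constant_folding(macro):
    known = {}  # variable name -> literal text known at this point
    block = []
    for command in macro.block:
        if isinstance(command, ScalarAssign):
            value = fold_text(command.value, known)
            if all(isinstance(part, str) for part in value.parts):
                known[command.name] = "".join(value.parts)
            else:
                known.pop(command.name, None)
            block.append(ScalarAssign(command.name, value))
        else:
            block.append(Log(fold_text(command.text, known)))
    return Macro(macro.name, block)

def dead_code_strip(macro):
    """Drop assignments to variables that are never read."""
    read = set()
    for command in macro.block:
        text = command.value if isinstance(command, ScalarAssign) else command.text
        read |= {p.name for p in text.parts if isinstance(p, ScalarVariable)}
    block = [c for c in macro.block
             if not (isinstance(c, ScalarAssign) and c.name not in read)]
    return Macro(macro.name, block)

def optimize(macro):
    """Run both passes until the AST stops changing."""
    while True:
        optimized = dead_code_strip(constant_folding(macro))
        if optimized == macro:
            return optimized
        macro = optimized

say_hi = Macro("sayHi", [
    ScalarAssign("someone", TextValue(["Macabeus"])),
    Log(TextValue(["Hi, ", ScalarVariable("someone")])),
])
print(optimize(say_hi))
```

Comparing the tree before and after each round is the simplest fixed-point test; a fixed iteration cap, as mentioned above, is a cheaper alternative.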

    macro sayHi {
      log Hi, Macabeus
    }

Let's go to the last compilation step: code generation! Since we want to compile this code to run as an OpenKore plugin, we need to compile it to Perl code.

Code generation: Header, Body, Footer
I have three steps in code generation:
- header code
- body code
- footer code

In the header I need to add some boilerplate in order to register this plugin on OpenKore. I also need to import some features from OpenKore into my plugin. To do that, I visit each node of the AST: if I find a log command node, I know that I need to import “message” from the “Log” module. For the other nodes there is nothing to do.

The next step is the body code generation. To do it, I also need to visit each AST node. First I translate the “macro” node; in Perl, the equivalent code is a sub statement. The equivalent code for the “log command” node is “message”, and for the TextValue node it is a literal string with a line break.

The last step of code generation is the footer, and it's very simple: I just need to add “1;” at the end of the code. It's important because in Perl a module needs to finish with a true value, for example, a positive number.

Hey! We finished the code generation step!

    package macroCompiled;
    use Log qw(message);
    Plugins::register(
      'macroCompiled',
      'Compiled version of eventMacro.txt',
      \&on_unload
    );
    sub on_unload { }

    sub macro_sayHi {
      message "Hi, Macabeus" ."\n";
    }

    1;

And those are all the steps of my compiler. We saw how to translate code written in EventMacro into equivalent Perl code that runs on OpenKore.
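The header, body, and footer steps can be sketched in Python as below. The node types are hypothetical, only the “log” command with literal text is handled, and the emitted Perl simply mirrors the text shown on the slides (with on_unload passed as a `\&` reference, which is an assumption about what the slide originally showed).

```python
import re
from collections import namedtuple

# Hypothetical AST node types (assumed for this sketch).
Macro = namedtuple("Macro", "name block")
Log = namedtuple("Log", "text")
TextValue = namedtuple("TextValue", "parts")

def perl_string(text):
    """Render a literal-only TextValue as a Perl double-quoted string."""
    literal = ""
    for part in text.parts:
        if not isinstance(part, str):
            raise NotImplementedError("only literal text is handled in this sketch")
        # Escape characters that are special inside Perl double quotes.
        literal += re.sub(r'(["\\$@])', r"\\\1", part)
    return f'"{literal}"'

def generate(macro):
    # Header: plugin boilerplate, plus imports required by the nodes found.
    header = ["package macroCompiled;"]
    if any(isinstance(command, Log) for command in macro.block):
        header.append("use Log qw(message);")
    header += [
        "Plugins::register(",
        "  'macroCompiled',",
        "  'Compiled version of eventMacro.txt',",
        "  \\&on_unload",
        ");",
        "sub on_unload { }",
    ]
    # Body: one Perl sub per macro, one statement per command.
    body = [f"sub macro_{macro.name} {{"]
    for command in macro.block:
        if isinstance(command, Log):
            body.append(f'  message {perl_string(command.text)} ."\\n";')
        else:
            raise NotImplementedError(f"no code generation for {command!r}")
    body.append("}")
    # Footer: a Perl module must end with a true value.
    footer = ["1;"]
    return "\n".join(header + body + footer) + "\n"

print(generate(Macro("sayHi", [Log(TextValue(["Hi, Macabeus"]))])))
```

Building the output as three separate lists keeps each step independent, so a new kind of node only needs a new import rule in the header and a new branch in the body.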

It's a simplification!
But, important: it's a simplification!

An AST could have metadata nodes
An AST could have metadata nodes. I'll show more about this soon.

A compiler could have many intermediary representations
A compiler could have many intermediary steps. For example, GHC, a compiler for the Haskell language, has many intermediary representations, because Haskell's model is very different from the architecture our computers run on. So there are several intermediary representations, each changing the code a little, which makes the compiler easier to build.

As well as it could build the final code directly
A compiler could also output the final code directly. That's the case for LuaC and Wren. Lua was designed so that code compiles in a single step, which makes the compiler lighter and therefore easier to embed on a device.

DEMONSTRATION
Let's start the demonstration of this compiler.

HOW IS THE CODE OF MACRO COMPILER?
Nice! But what does the code of MacroCompiler look like?

I'm writing the compiler in Elixir. I chose Elixir because of the hype, and also because I want to learn more about it. Also, this language has two features that are very useful for a compiler:

Pattern matching
Pattern matching, which is useful to identify what kind of node we are looking at.

Works very well with recursion
And Elixir works very well with recursion. This is important because we need to traverse the AST, and that can be implemented with recursive code.

OK, let's see each compilation step in more detail.

PARSER
Let's see the parser in more detail!

Grammar
To write a parser, we need a grammar that specifies how our language should be written.

Regular Grammars, eDSL, Context-free Grammars, Parsing Expression Grammar
We have many ways to write a grammar. I'll talk about four of them.

Regular Grammars: a very simple way to write a language grammar. Regexp (regular expression) is an example.
A “regular grammar” is a very simple way to write a language grammar, and regular expressions are an example of it. One place where it is useful is when we need to parse a command in a CLI. We could have a command to move to a position on a map, where the map name is optional:

    move prontera 30 42
    move 30 42

We may write this regular expression to match the command:

    /move (?:(\w+) )?(\d+) (\d+)/

It's a very simple solution to a very simple issue.
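As a quick check, the same pattern can be tried with Python's `re` module. The function name `parse_move` and the dictionary it returns are assumptions made for this sketch.

```python
import re

# The regular expression from the slide: the map name is optional.
MOVE = re.compile(r"move (?:(\w+) )?(\d+) (\d+)")

def parse_move(command):
    """Return the map name (or None) and the coordinates, or None if no match."""
    match = MOVE.fullmatch(command)
    if match is None:
        return None
    map_name, x, y = match.groups()
    return {"map": map_name, "x": int(x), "y": int(y)}

print(parse_move("move prontera 30 42"))
print(parse_move("move 30 42"))
```

When the map name is omitted, the optional group does not take part in the match and its capture comes back as `None`.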

Regular Grammars: a limitation is that we can't match arbitrarily deep nesting, so we can't specify a nested block of commands
Because a “regular grammar” is a very simple grammar, it has some limitations. For example, it can't match structures nested to an arbitrary depth, so we can't specify nested blocks of code. In GraphQL, for instance, we can have a field inside of another field:

    {
      evil_query(id: 42) {
        complex_field {
          complex_field {
            field
          }
        }
      }
    }

We can't specify this using only a regular grammar. We need something more powerful.
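To illustrate what “more powerful” buys us, here is a minimal recursive parser in Python for this kind of nested selection. It is a sketch under simplifying assumptions (arguments such as `(id: 42)` are kept as part of the field name, and unknown characters are skipped), not a GraphQL implementation.

```python
import re

def parse_selection(tokens, pos=0):
    """Parse '{ field* }' starting at tokens[pos]; return (fields, next_pos)."""
    if pos >= len(tokens) or tokens[pos] != "{":
        raise SyntaxError("expected '{'")
    pos += 1
    fields = []
    while pos < len(tokens) and tokens[pos] != "}":
        name = tokens[pos]
        if name == "{":
            raise SyntaxError("expected a field name")
        pos += 1
        children = []
        if pos < len(tokens) and tokens[pos] == "{":
            # The recursive call is what handles any nesting depth.
            children, pos = parse_selection(tokens, pos)
        fields.append((name, children))
    if pos >= len(tokens):
        raise SyntaxError("missing '}'")
    return fields, pos + 1

def parse_query(source):
    # Tokens are braces or names, optionally followed by (...) arguments.
    tokens = re.findall(r"[{}]|\w+(?:\([^)]*\))?", source)
    fields, pos = parse_selection(tokens)
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing input")
    return fields

QUERY = "{ evil_query(id: 42) { complex_field { complex_field { field } } } }"
print(parse_query(QUERY))
```

The call stack remembers how many blocks are still open, which is exactly the kind of memory a regular grammar does not have.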

eDSL: we could embed a language to specify grammars in another language
Then we could use an eDSL, which is a small language inside of a bigger language. And we could use an eDSL to write grammars!

    sequence([
      ignore(string("move")),
      ignore(spaces()),
      many(letter()),
      skip(spaces()),
      integer(),
      ignore(spaces()),
      integer()
    ])

We could write a grammar using an eDSL to match the previous command, instead of using a regular expression. Then we'll have a grammar like this one.

Another way is to use a context-free grammar (CFG). It defines a set of symbols and the valid transformations for each symbol.

An example of a CFG notation is BNF (Backus–Naur Form):

    <vowel> ::= "a" | "e" | "i" | "o" | "u"
    <digit> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
    <character> ::= <vowel> | <digit>
    <text> ::= <character> | <text> <character>

On the left side we have our symbols, and on the right side we have the transformations for each symbol.
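To make the rules concrete, here is a small Python sketch of a recognizer for the BNF grammar above. It is hand-written for this one grammar (the function names are made up for the illustration), and it unrolls the left-recursive text rule into a loop instead of implementing a general CFG parser.

```python
VOWELS = set("aeiou")
DIGITS = set("0123456789")


def is_character(symbol):
    # <character> ::= <vowel> | <digit>
    return symbol in VOWELS or symbol in DIGITS


def is_text(candidate):
    # <text> ::= <character> | <text> <character>
    # That is: one or more characters. The left recursion becomes a loop.
    return len(candidate) > 0 and all(is_character(symbol) for symbol in candidate)


print(is_text("a9"))  # True
print(is_text("ww"))  # False
```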

Then, by this grammar, we can say that "a9" is a valid text and "ww" isn't, because "w" is neither a vowel nor a digit.

We also have the parsing expression grammar (PEG)! Like a CFG, a PEG also defines a set of symbols and the valid transformations for each symbol. Here is the same grammar, but using PEG:

    vowel ← 'a' / 'e' / 'i' / 'o' / 'u'
    digit ← [0-9]
    character ← vowel / digit
    text ← character+

Okay, CFG and PEG are very similar, but there are two relevant differences between them:
- notations
- rule interpretation

First, the notations. It is easy to notice that the CFG notation is very different from the PEG notation. For example, in PEG you can use regular-expression-like syntax, such as [0-9] and character+.

The other relevant difference is the rule interpretation: in a CFG the order of the rules doesn't matter, but in a PEG the order of the rules is very relevant.

Think of a situation where the parser could use either rule A or rule B: "Well... I can use rule A or B. Which one should I use?"

If I'm using a CFG, the answer is "I don't know!": the grammar is ambiguous, and the parser fails because it doesn't know how to deal with the ambiguity.

But if I'm using a PEG, the parser uses the rule that came first. So the order of the rules matters, and I need to pay more attention to it when I'm using PEG.
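A tiny Python sketch of this PEG behaviour, assuming two made-up rules that both match the same input. The helper names literal and ordered_choice are illustrative only.

```python
def literal(expected):
    """Parser that matches a fixed string at the start of the input."""
    def parse(text):
        if text.startswith(expected):
            return expected, text[len(expected):]
        return None
    return parse


def ordered_choice(*parsers):
    """PEG-style choice: the first parser that succeeds wins."""
    def parse(text):
        for parser in parsers:
            result = parser(text)
            if result is not None:
                return result
        return None
    return parse


rule_a = literal("if")
rule_b = literal("ifdef")

a_first = ordered_choice(rule_a, rule_b)
b_first = ordered_choice(rule_b, rule_a)

print(a_first("ifdef X"))  # ('if', 'def X'): rule A came first
print(b_first("ifdef X"))  # ('ifdef', ' X'): rule B came first
```

Both parsers contain the same rules, yet they return different results for the same input, which is why the rule order needs attention in a PEG.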

Algorithm. Okay, using a grammar we can specify the syntax of a language, but we need an algorithm to run this grammar. I'll talk about two popular approaches: parser generators and parser combinators. The main difference between them is the interface.

Parser Generators: a parser generator takes a description of the grammar, written in PEG, BNF, or something else, and compiles it to generate a parser (Description → Parser Generator → Parser). Then, the compiler can use this parser to build an AST.

Parser Combinators: a parser combinator takes another approach. We write many very simple parsers (parser A, parser B, parser C) and join them to build more complex parsers (parser D, parser E).
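The idea can be sketched in a few lines of Python. This is not the Combine library nor the compiler's code: every helper below (string, take_while, sequence, optional, ignore) is a hypothetical, minimal combinator written only for this illustration. Each parser takes the input text and returns a (values, rest) pair, or None on failure.

```python
def string(expected):
    def parse(text):
        if text.startswith(expected):
            return [expected], text[len(expected):]
        return None
    return parse


def take_while(predicate):
    """One or more characters satisfying predicate, returned as one string."""
    def parse(text):
        end = 0
        while end < len(text) and predicate(text[end]):
            end += 1
        if end == 0:
            return None
        return [text[:end]], text[end:]
    return parse


def sequence(*parsers):
    """Run the parsers one after another; fail if any of them fails."""
    def parse(text):
        values = []
        for parser in parsers:
            result = parser(text)
            if result is None:
                return None
            parsed, text = result
            values.extend(parsed)
        return values, text
    return parse


def optional(parser):
    """Succeed with no values (and consume nothing) when parser fails."""
    def parse(text):
        result = parser(text)
        return result if result is not None else ([], text)
    return parse


def ignore(parser):
    """Run parser but drop the values it produced."""
    def parse(text):
        result = parser(text)
        return None if result is None else ([], result[1])
    return parse


# Simple parsers...
spaces = take_while(str.isspace)
word = take_while(str.isalpha)
integer = take_while(str.isdigit)

# ...joined into more complex ones.
map_name = optional(sequence(word, ignore(spaces)))
move = sequence(ignore(string("move")), ignore(spaces),
                map_name, integer, ignore(spaces), integer)

print(move("move prontera 30 42"))  # (['prontera', '30', '42'], '')
print(move("move 30 42"))           # (['30', '42'], '')
```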

Very nice! But how can I use this in Elixir?

We have a very nice parser combinator library called Combine (github.com/bitwalker/combine). In this library we write the grammar using an eDSL.

It takes a scannerless parsing approach: the parser reads the source code and builds the AST directly, without a separate tokenizer step.

And, as you can see, this library is a toolbox with many, many very simple parsers. And, yeah, you need to remember many of these very simple parsers to join them into a complex parser for your language.

Practical Example #1: Parsing an Event Macro command. Okay, let's see a practical example: writing a parser for an Event Macro command.

Here is the "push" command:

    &push(@ori, text)

Semantically, it is used to add a text to an array. Syntactically, it has the keyword "push", an array variable name, a comma, and then a text value. We should parse this command.

Now we have here a very complex parser, PushCommand. It is a sequence of the string "push" plus an open parenthesis, an array variable name, spaces, a comma, spaces, a text value, and finally a close parenthesis. This is a very complex parser, right?

But this very complex parser uses a simpler parser to describe the array variable name, which is the sequence of an "@" and an identifier.

And the identifier is an even simpler parser: a sequence of characters ending in a space, a new line, or a comma…

PushCommand also uses the TextValue parser. This parser is similar to Identifier, but it allows interpolating variables.

So the idea really is the same as I said. We have a very complex parser, PushCommand, which uses a simpler parser, ArrayVariable, which uses the simplest parser, Identifier. PushCommand also uses TextValue, which uses the variable parsers (ScalarVariable, ArrayVariable and HashVariable).

After parsing the command, we need to map it! After running the parser, we map the result to save it in the AST. We can map it to a struct in Elixir, because that will be easier to work with in the next steps. In this case, each struct is a node of our AST.
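A rough Python equivalent of this mapping step, assuming a simplified push syntax without variable interpolation. The dataclasses stand in for the Elixir structs, and the regular expression is only a shortcut for the illustration; the real compiler uses parser combinators here.

```python
import re
from dataclasses import dataclass


@dataclass
class ArrayVariable:
    name: str


@dataclass
class TextValue:
    text: str


@dataclass
class PushCommand:
    array: ArrayVariable
    text: TextValue


# Assumed, simplified syntax: &push(@name, text)
PUSH_RE = re.compile(r"&push\(@(\w+)\s*,\s*([^)]*)\)")


def parse_push(source):
    match = PUSH_RE.fullmatch(source)
    if match is None:
        raise SyntaxError(f"not a push command: {source!r}")
    name, text = match.groups()
    # Map the raw captures to AST nodes, one structure per node.
    return PushCommand(ArrayVariable(name), TextValue(text))


print(parse_push("&push(@ori, text)"))
```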

Practical Example #2: Parsing a code block. Okay, let's parse a very different thing: a code block!

In this language we have macros, and a macro has a code block:

    macro ref-while {
      # comments
      log message
      do c hi
      if (1)
    }

It also has an "if", and it also has a code block:

    if (1) {
      # comments
      log message
      do c hi
      if (1)
    }

Okay, they are similar, right? Then, the parsers are also similar. Both use the MacroBlock parser, and both map the code block into the struct using the same key, so that it works the same way in the next compilation steps.

And the MacroBlock is very nice. It is a set of parsers: it tries to parse using DoCommand; if that fails, it tries LogCommand; if that also fails, it tries CallCommand… If all parsers fail, then a syntax error is raised. That's it.
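The "try each parser, fail with a syntax error" behaviour can be sketched in Python as follows. The command helper, the dictionary node shape, and the error class are assumptions made for this example; only the names DoCommand, LogCommand and CallCommand come from the talk.

```python
class MacroSyntaxError(Exception):
    pass


def command(keyword, node_type):
    """Build a parser for a one-line command that starts with keyword."""
    def parse(line):
        head, _, rest = line.partition(" ")
        if head != keyword:
            return None
        return {"type": node_type, "args": rest}
    return parse


# The order matters: the parsers are tried from first to last.
COMMAND_PARSERS = [
    command("do", "DoCommand"),
    command("log", "LogCommand"),
    command("call", "CallCommand"),
]


def parse_block(source):
    nodes = []
    for line_number, raw in enumerate(source.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        for parser in COMMAND_PARSERS:
            node = parser(line)
            if node is not None:
                nodes.append(node)
                break
        else:
            # Every parser failed: report where the problem is.
            column = len(raw) - len(raw.lstrip()) + 1
            raise MacroSyntaxError(
                f"line {line_number}, column {column}: unknown command {line!r}"
            )
    return nodes


print(parse_block("# comments\nlog message\ndo c hi"))
```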

Let's go to the next compilation step…

SEMANTIC ANALYSIS …semantic analysis!

Like I said, the symbols table is important to expose the information from the AST in a more accessible way to the next compilation steps.

Then, I wrote a function called "build symbols table" that receives the AST and visits each node, calling itself recursively.

If it finds a "macro" node, it writes in the symbols table that we have a macro with this name and this code block.

If it finds a list of nodes, like a code block, it maps over each command node of this block.

If it finds a "call" node, it writes in the symbols table that a macro with this name is read.

If it finds a "push" node, it writes in the symbols table that one variable is written and some other variables are read.

This is the idea of how I build my symbols table.
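A minimal Python sketch of such a recursive walk. The node shapes (dictionaries with "type", "name", "block", "array" and "reads" keys) and the table layout are assumptions for the example, not the structures used in the real compiler.

```python
def build_symbols_table(node, table=None):
    if table is None:
        table = {
            "macros": {"write": [], "read": []},
            "variables": {"write": [], "read": []},
        }
    if isinstance(node, list):
        # A list of nodes, like a code block: visit each command node.
        for child in node:
            build_symbols_table(child, table)
    elif node["type"] == "macro":
        table["macros"]["write"].append(node["name"])
        build_symbols_table(node["block"], table)
    elif node["type"] == "call":
        table["macros"]["read"].append(node["name"])
    elif node["type"] == "push":
        table["variables"]["write"].append(node["array"])
        table["variables"]["read"].extend(node["reads"])
    return table


ast = [
    {"type": "macro", "name": "setVars", "block": [
        {"type": "push", "array": "@values", "reads": ["$foo"]},
    ]},
    {"type": "macro", "name": "example", "block": [
        {"type": "call", "name": "setVars"},
    ]},
]
print(build_symbols_table(ast))
```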

But, okay. I built a very complex structure, and I need a way to read it! So I have a module called "Symbols Table" that has some helper functions, like "list the macros written and read", "list the variables written and read", and other functions.

To read the symbols table I used Elixir's Access module a lot. It is useful for writing routines that access a complex, nested structure.

Okay, now I need to use my symbols table in some validations.

For example, let's check my macros, because I can't read a macro that was never written. I have a module to validate this rule.

It gets the symbols table. Next, from the symbols table, it lists the macros that are read:

    read macros: [ "foo", "bar" ]

It lists the macros that are written:

    write macros: [ "foo" ]

And it checks the difference:

    difference: [ "bar" ]

We can see that we are trying to read a macro that was never written: "bar". It's bad! Then, the validation raises an error.
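The rule boils down to a list difference. A short Python sketch, with a made-up exception name:

```python
class UndefinedMacroError(Exception):
    pass


def validate_macros(read_macros, written_macros):
    """Raise when a macro is read (called) but never written (defined)."""
    written = set(written_macros)
    difference = [name for name in read_macros if name not in written]
    if difference:
        raise UndefinedMacroError(f"macros read but never written: {difference}")


validate_macros(["foo"], ["foo"])  # fine, nothing is raised

try:
    validate_macros(["foo", "bar"], ["foo"])
except UndefinedMacroError as error:
    print(error)  # macros read but never written: ['bar']
```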

Okay, I spoke about the parser and the semantic analysis, and both steps raise errors. How do I handle the errors?

To show a syntax error, I need to raise an exception. Do you remember the MacroBlock parser? If all parsers fail, a syntax error is raised. A funny thing is that the function that raises the error is another parser, and it reports the line and column where the error happened.

Syntax error: then I can have a very nice error message, for example, when the programmer forgot to write a close parenthesis.

Semantic error: another error that the compiler raises is a semantic error. To show a good semantic error message, we need more information in the AST: some nodes called metadata. To build these nodes, when the compiler is parsing some code, for example "log foo", we have this parser. It uses a macro called "parser command". This macro has a sequence that calls a "getMetadata" function, which is another parser, and it returns the line and column of this code in the source code.

Then, instead of a very simple AST, we can have a more complete and useful AST!

Then I can show exactly where a semantic error happened:

    macro example {
      $never_read_var = &rand(1, 4)   # warning
      log number: $never_written_var  # fatal error
    }

The compiler has two types of semantic error: a warning and a fatal error. A warning is for something weird, but not wrong, for example, writing to a variable but never reading it. A fatal error is for something wrong, for example, trying to read a variable that was never written.
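As an illustration of how the metadata feeds these two kinds of diagnostics, here is a Python sketch. It assumes that each variable access was recorded as a (name, line, column) tuple; the tuple layout and the function name are invented for the example.

```python
def check_variables(writes, reads):
    """writes and reads are lists of (name, line, column) tuples."""
    written = {name for name, _line, _column in writes}
    read = {name for name, _line, _column in reads}
    diagnostics = []
    for name, line, column in writes:
        if name not in read:
            # Weird, but not wrong: only a warning.
            diagnostics.append(
                ("warning", line, column, f"${name} is written but never read"))
    for name, line, column in reads:
        if name not in written:
            # Wrong: the compilation must stop.
            diagnostics.append(
                ("fatal", line, column, f"${name} is read but never written"))
    return diagnostics


for diagnostic in check_variables(
    writes=[("never_read_var", 2, 3)],
    reads=[("never_written_var", 3, 15)],
):
    print(diagnostic)
```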

Let's go to the next compilation step…

OPTIMIZATION …optimization! Now we want to remove overhead from the code.

Constant folding: one of the optimizations that I made in the compiler was constant folding. This optimization propagates the values known at compilation time.

For example, this code:

    macro setVars {
      $foo = value
      $bar = &rand(1, 4)
    }
    macro example {
      $name = macabeus
      call setVars
      log foo: $foo
      log bar: $bar
      log name: $name
      $name = pagarme
      log name: $name
    }

will be compiled as if it were this other code:

    macro setVars {
      $foo = value
      $bar = &rand(1, 4)
    }
    macro example {
      $name = macabeus
      call setVars
      log foo: value
      log bar: $bar
      log name: macabeus
      $name = pagarme
      log name: pagarme
    }

You can notice that we have fewer variable reads.

Then, in the symbols table I built a context with the values of the variables at the end of each macro:

    setVars
      $foo: value
      $bar: is nondeterministic
    example
      $name: pagarme

Let's optimize the "example" macro! After the first assignment, the context says that the value of "name" is "macabeus". After "call setVars", I also have "foo" and "bar" in the context. Please notice that we don't know the value of "bar" at compilation time, because it is nondeterministic (it needs IO). So we can optimize "foo" but can't optimize "bar". Next, the log of "name" can use the known value, "macabeus". Then the code updates the "name" variable, and we can propagate the new value, "pagarme", to the last log. You can see the code of this optimization on my GitHub.
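The walk through the "example" macro can be reproduced with a short Python sketch. What the talk calls constant folding here is often also called constant propagation. The command tuples, the NONDETERMINISTIC marker and the function name are assumptions for this illustration, and the sketch assumes that macros do not call themselves recursively.

```python
import re

NONDETERMINISTIC = object()  # marker for values unknown at compilation time
VARIABLE = re.compile(r"\$(\w+)")


def propagate(commands, macros, context=None):
    """Return (optimized commands, variable context at the end)."""
    context = {} if context is None else context
    optimized = []
    for command in commands:
        kind = command[0]
        if kind == "assign":
            _, name, value = command
            # A value computed by a command such as &rand is not known yet.
            context[name] = NONDETERMINISTIC if value.startswith("&") else value
            optimized.append(command)
        elif kind == "call":
            # Merge the context found at the end of the called macro.
            _, callee_context = propagate(macros[command[1]], macros)
            context.update(callee_context)
            optimized.append(command)
        elif kind == "log":
            def substitute(match):
                known = context.get(match.group(1), NONDETERMINISTIC)
                return match.group(0) if known is NONDETERMINISTIC else known
            optimized.append(("log", VARIABLE.sub(substitute, command[1])))
    return optimized, context


macros = {
    "setVars": [("assign", "foo", "value"), ("assign", "bar", "&rand(1, 4)")],
    "example": [
        ("assign", "name", "macabeus"),
        ("call", "setVars"),
        ("log", "foo: $foo"),
        ("log", "bar: $bar"),
        ("log", "name: $name"),
        ("assign", "name", "pagarme"),
        ("log", "name: $name"),
    ],
}

optimized, _ = propagate(macros["example"], macros)
for command in optimized:
    print(command)
```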

Nice! Let's go to the last step!

CODE GENERATION Code generation! Since I need to compile from Event Macro to a plugin that runs on OpenKore, and OpenKore is written in Perl, let's compile it to Perl.

I split the code generation in three steps: code generation for the header, for the body, and for the footer.

Please notice that this step is just many array concatenations, where each part of the array is a piece of Perl code. I generate an array with depth N…

    [ "push", "@values", ",", [ "f", "o", "o" ], ";" ]

…then I use a flatten to bring it to depth 1…

    [ "push", "@values", ",", "f", "o", "o", ";" ]

…and I use a join to get a string with the final code:

    push @values, "foo";
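The flatten-then-join step, sketched in Python. The slides simplify the array contents; in this sketch the spaces and the quotes are written out as explicit array elements so that the joined string is valid Perl.

```python
def flatten(parts):
    """Yield the strings of an arbitrarily nested list, left to right."""
    for part in parts:
        if isinstance(part, list):
            yield from flatten(part)
        else:
            yield part


nested = ["push ", "@values", ", ", ['"', ["f", "o", "o"], '"'], ";"]
code = "".join(flatten(nested))
print(code)  # push @values, "foo";
```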

Let's talk about the header. On the header, I do four things:
- make the boilerplate
- find which variables are written, to declare them
- find which modules from OpenKore we need to import
- find which macros are written, to make them callable

One thing that I need to do is make the boilerplate to register this plugin on OpenKore. It's just adding some strings to the array.

Another task is to find which variables are written, to declare them in the global context of the plugin. It's important because in Event Macro variables are always global, while in Perl variables have scope and need to be declared before use. A similar task is to find which modules from OpenKore we need to import.

To do this, we just need to visit each AST node. If we find a LogCommand node, we know that we need to import the "Log message" module from OpenKore. If we find an ArrayVariable node, we know that we need to declare an array variable with this name. Then, we add the result to the array.

Also, we need to find which macros are written, to make them callable from the CLI. To do it, I use my symbols table, with the function "list written macros".

The next step is generating the body code. In some way I need to translate this Event Macro code:

    &push(@values, foo)

to this Perl code:

    push @values, "foo";

Like I said, we have some parsers that generate an AST. From this push command, we get an AST with a PushCommand node. Let's visit each AST node. At the PushCommand node, we start generating the final code as an array. We have "push", something, a comma, something, and a semicolon:

    [ "push", ..., ",", ..., ";" ]

Let's recursively call the other "generate" functions. At this sub-tree, we generate the array variable reference. In this case, the code will be "@values":

    [ "push", "@values", ",", ..., ";" ]

The code generator for the TextValue node is a little more complex, because we need to handle variable interpolation, but in this case the array will be "f", "o", "o":

    [ "push", "@values", ",", [ "f", "o", "o" ], ";" ]

Then, we just need to use the flatten and the join, and we have the equivalent code in Perl:

    push @values, "foo";

The push command is very similar in Event Macro and Perl, but other commands differ between these languages. For example, the "rand" command generates very different code in Perl:

    &rand(1, 4)
    ("1" + int(rand(1 + "4" - "1")))
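The recursive "generate" functions can be sketched in Python like this. The dictionary node shapes and the RandCommand name are assumptions for the example, and variable interpolation in TextValue is left out. The output strings follow the Perl code shown on the slides.

```python
def generate(node):
    """Return a nested list of Perl code fragments for an AST node."""
    kind = node["type"]
    if kind == "PushCommand":
        return ["push ", generate(node["array"]), ", ", generate(node["text"]), ";"]
    if kind == "ArrayVariable":
        return ["@", node["name"]]
    if kind == "TextValue":
        # One fragment per character, as on the slides (no interpolation here).
        return ['"', list(node["text"]), '"']
    if kind == "RandCommand":
        low, high = node["min"], node["max"]
        return ['("', low, '" + int(rand(1 + "', high, '" - "', low, '")))']
    raise ValueError(f"unknown node type: {kind}")


def flatten(parts):
    for part in parts:
        if isinstance(part, list):
            yield from flatten(part)
        else:
            yield part


def to_perl(node):
    return "".join(flatten(generate(node)))


push = {
    "type": "PushCommand",
    "array": {"type": "ArrayVariable", "name": "values"},
    "text": {"type": "TextValue", "text": "foo"},
}
print(to_perl(push))  # push @values, "foo";
print(to_perl({"type": "RandCommand", "min": "1", "max": "4"}))
```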

Finally, the last code generation step is the footer. A Perl module should end with a true value, to say that the module was loaded successfully. So we just add "1;" to the array.

Yeah! We finished all the compilation steps!

FINALLY… And to finish this little talk…

I really want to thank Pedro Castilho, because he helped me a lot to code the compiler and to prepare this talk.

Images source
- https://darkchiichan.deviantart.com/
- https://www.newgrounds.com/art/view/shidoisnthere/tree-pixel-art-2

Where to learn more
- DSL & eDSL: http://bit.ly/quora-edsl
- Syntax analysis: http://esprima.org/
- About languages design x compiler: http://bit.ly/quora-language-x-compiler and http://bit.ly/quora-language-x-compiler-2
- Very didactic material about how parser combinators work: http://theorangeduck.com/page/you-could-have-invented-parser-combinators
- Very good material about parsers in general: https://tomassetti.me/guide-parsing-algorithms-terminology/
- Ruby library for parser generation using PEG: Parslet

These are the sources of some images that I used in this talk, and some links to learn more about the subjects that I talked about. I highly recommend the guide about parsers in general, because the author explains this subject very well.

Pagar.me Talks about this same subject, in Portuguese:
- https://youtu.be/t77ThZNCJGY
- https://youtu.be/q9T6Y2ZjE54

I presented this same subject on Pagar.me Talks, the YouTube channel of the company that I work for. These talks are in Portuguese and cover a few more topics, like tests and recursive grammars.

http://bit.ly/asciinema-compiler I recorded an asciinema where I add a new command to the compiler, and I explain each step.

github.com/macabeus/macro-compiler Finally, you can see on GitHub the source code of the compiler that I showed here, Macro Compiler.

THANK YOU! OBRIGADO! Thank you! I hope that this little talk about languages and compilers has helped to increase your curiosity about these subjects.
