Demystifying compilers by writing your own

Slide 1

Slide 1 text

DEMYSTIFYING COMPILERS BY WRITING YOUR OWN Hey, let’s start the talk about “Demystifying compilers by writing your own”

Slide 2

Slide 2 text

Bruno Macabeus github.com/macabeus macalogs.com.br Developer at Pagar.me I’m Bruno Macabeus, a software developer at Pagar.me. Here is my blog, and here is my GitHub, where you can find the source code of the compiler that I will show in this talk.

Slide 3

Slide 3 text

I’ll talk about several apparently unrelated subjects, like the online game Ragnarok, programming languages, and even trees. By the end of the talk, all these subjects will be connected and make sense.

Slide 4

Slide 4 text

Slide 5

Slide 5 text

Slide 6

Slide 6 text

Slide 7

Slide 7 text

COMPILERS Let’s get to the main topic: compilers!

Slide 8

Slide 8 text

COMPILERS? But what is a compiler? What’s the definition?

Slide 9

Slide 9 text

Definitions There are many definitions.

Slide 10

Slide 10 text

A compiler is software that translates code from language A to language B Definitions A very simple definition is that a compiler is software that translates code from language A to language B. But it raises a question: should A be different from B, or does it make sense for a compiler to translate code from language A to language A?

Slide 11

Slide 11 text

A compiler is software that translates code from language A to language B A ≠ B ? Definitions A very simple definition is that a compiler is software that translates code from language A to language B. But it raises a question: should A be different from B, or does it make sense for a compiler to translate code from language A to language A?

Slide 12

Slide 12 text

Closure Compiler Babel Definitions We have a very nice compiler called Babel. It’s a compiler that translates JS code to JS code, either between different JS versions or within the same JS version, generating optimized code that runs faster or is lighter.

Slide 13

Slide 13 text

A compiler is software that translates code from language A to language B Definitions So a compiler that translates code from language A to A does make sense.

Slide 14

Slide 14 text

A compiler is software that translates code from language A to language B A could be equal to B! Definitions So a compiler that translates code from language A to A does make sense.

Slide 15

Slide 15 text

A compiler is software that translates code from language A to language B Definitions But… does a compiler need a language as its input? Or does it make sense to have a compiler that translates from something that isn’t a language, for example, a document?

Slide 16

Slide 16 text

A compiler is software that translates code from language A to language B But does it need to compile between languages? Definitions But… does a compiler need a language as its input? Or does it make sense to have a compiler that translates from something that isn’t a language, for example, a document?

Slide 17

Slide 17 text

Closure Compiler Pagedraw Definitions We have a very nice compiler called Pagedraw, a website that translates a document, like a webpage layout made in Sketch, to web languages such as JS and CSS.

Slide 18

Slide 18 text

A compiler is software that transforms a data representation into another data representation that is in some way related or equivalent to the first Definitions So we have a more generic definition of compilers: a compiler is software that transforms a data representation into another data representation that is in some way related or equivalent to the first.

Slide 19

Slide 19 text

OKAY… BUT WE ALREADY HAVE MANY, MANY COMPILERS AND LANGUAGES… Okay… but we already have many, many compilers and languages…

Slide 20

Slide 20 text

WHY DO WE NEED TO CREATE ONE MORE? Why do we need to create one more compiler?

Slide 21

Slide 21 text

Reason #1 The language that we use has design problems There are many reasons. For example, the languages that we use have design problems that slow down our work.

Slide 22

Slide 22 text

TypeScript is a JS superset that compiles to JS with the difference that it implements types TypeScript is a JS superset that compiles to JS, with the difference that it implements types. It’s very useful for more complex applications.

Slide 23

Slide 23 text

TypeScript is a JS superset that compiles to JS with the difference that it implements types

// JS
const createUser = (id, level) => {
  // code...
}

createUser(42, 'guest')
// Is ID really a number?
// What are all the valid levels?
// Is the created user returned?

For example, we have a function called “createUser” that receives two parameters: id and level. But we don’t know if “id” really is a number. We don’t know all the valid levels. And we don’t know if the new user is returned by this function.

Slide 24

Slide 24 text

TypeScript is a JS superset that compiles to JS with the difference that it implements types

// JS
const createUser = (id, level) => {
  // code...
}

createUser(42, 'guest')
// Is ID really a number?
// What are all the valid levels?
// Is the created user returned?

// TypeScript
enum level {
  guest = 'guest',
  normal = 'normal',
  admin = 'admin',
}

const createUser = (id: number, level: level) => {
  // code...
}

createUser(42, level.guest)
// All doubts are answered by reading the
// signature!

Using TypeScript we don’t have these questions, because the signature of the function carries more information. Reading the function signature, we find that “id” really is a number, we discover all the valid levels, and we find that the function doesn’t return the new user. Maybe it saves it to a database.

Slide 25

Slide 25 text

Reason #2 To study a new concept Another reason is when we are studying a new concept that the current languages don’t implement.

Slide 26

Slide 26 text

Koka is a language that separates values from side effects. And it infers the side effects at compilation time For example, Koka is a language that researchers at Microsoft Research are developing and studying. This language aims to separate values from side effects, and to infer the side effects at compilation time.

Slide 27

Slide 27 text

Koka is a language that separates values from side effects. And it infers the side effects at compilation time

// Swift
func getEvenNumber() -> Int {
  // some code...
}

getEvenNumber()
// Does this func perform IO? Is the result deterministic?
// Maybe this function never ends?

For example, we have this code written in Swift, and this function returns an even number. But we don’t know if this function performs IO, if its result is deterministic, or if it ever ends.

Slide 28

Slide 28 text

Koka is a language that separates values from side effects. And it infers the side effects at compilation time

// Swift
func getEvenNumber() -> Int {
  // some code...
}

getEvenNumber()
// Does this func perform IO? Is the result deterministic?
// Maybe this function never ends?

// Koka
function getEvenNumber() : ndet int {
  // some code…
}

function main() {
  getEvenNumber()
  // reading the signature we discover that the
  // result is a random number!
}

Using Koka, we don’t have any doubts. We only need to read the function’s signature. Reading the keyword “ndet”, we discover that this function is nondeterministic.

Slide 29

Slide 29 text

Koka is a language that separates values from side effects. And it infers the side effects at compilation time

// Swift
func getEvenNumber() -> Int {
  return Int(arc4random()) * 2
}

getEvenNumber()
// Does this func perform IO? Is the result deterministic?
// Maybe this function never ends?

// Koka
function getEvenNumber() : ndet int {
  return randomInt() * 2
}

function main() {
  getEvenNumber()
  // reading the signature we discover that the
  // result is a random number!
}

In this case, both functions use a random function.

Slide 30

Slide 30 text

Reason #3 When you don’t have any issue to solve, but you still want to write some cool code Another reason is when you don’t have any issue to solve, but you still want to write some cool code.

Slide 31

Slide 31 text

Piet, an esoteric language where the code is written using pixels! This includes esoteric languages, such as Piet. In this language you write code using pixels!

Slide 32

Slide 32 text

Piet, an esoteric language where the code is written using pixels! Believe me, this code is an addition and subtraction calculator.

Slide 33

Slide 33 text

Reason #4 To solve a specific issue Then, another reason is when you have a very specific issue to solve.

Slide 34

Slide 34 text

Logo, a language with educational purpose For example, Logo has a speciﬁc purpose: education.

Slide 35

Slide 35 text

Logo, a language with educational purpose So it has very nice visual feedback, where the student writes code that the “turtle” uses to draw shapes. A specific-purpose language isn’t necessarily a domain-specific language. You can do anything using Logo, even though the language design aims to be useful in education.

Slide 36

Slide 36 text

EventMacro, a language to write macros for a bot for the game Ragnarok Then we also have the language that I’m working on, because it has a specific issue to solve: writing macros for the online game Ragnarok.

Slide 37

Slide 37 text

As many of you may already know…

Slide 38

Slide 38 text

… Ragnarok is an MMORPG.

Slide 39

Slide 39 text

Ragnarok … Ragnarok is an MMORPG.

Slide 40

Slide 40 text

But after a while it gets boring to do the same thing in the game.

Slide 41

Slide 41 text

So what’s the solution? Make the computer play for you, using software called OpenKore! A bot for Ragnarok.

Slide 42

Slide 42 text

OpenKore So what’s the solution? Make the computer play for you, using software called OpenKore! A bot for Ragnarok.

Slide 43

Slide 43 text

EventMacro OpenKore can perform simple actions by itself, but for more complex actions you need to write some code. And the language to write this code is called EventMacro.

Slide 44

Slide 44 text

EventMacro

automacro ref {
  InInventory "Rough Oridecon" > 4
  call ref-while
}

macro ref-while {
  log go to refiner
  do move prt_in 59 60
  $ori = &invamount(Rough Oridecon)
  log I have $ori Rough Oridecons
  while (&invamount(Rough Oridecon) > 4) {
    do talk 0
    pause 0.8
    do talk resp 0
  }
}

Let’s learn more about this language.

Slide 45

Slide 45 text

Slide 46

Slide 46 text

Slide 47

Slide 47 text

Slide 48

Slide 48 text

Slide 49

Slide 49 text

Slide 50

Slide 50 text

Slide 51

Slide 51 text

Slide 52

Slide 52 text

EventMacro

automacro ref {
  InInventory "Rough Oridecon" > 4
  call ref-while
}

macro ref-while {
  log go to refiner
  do move prt_in 59 60
  $ori = &invamount(Rough Oridecon)
  log I have $ori Rough Oridecons
  while (&invamount(Rough Oridecon) > 4) {
    do talk 0
    pause 0.8
    do talk resp 0
  }
}

And we trigger an action when we have more than 4 of those.

Slide 53

Slide 53 text

EventMacro

automacro ref {
  InInventory "Rough Oridecon" > 4
  call ref-while
}

macro ref-while {
  log go to refiner
  do move prt_in 59 60
  $ori = &invamount(Rough Oridecon)
  log I have $ori Rough Oridecons
  while (&invamount(Rough Oridecon) > 4) {
    do talk 0
    pause 0.8
    do talk resp 0
  }
}

A fun fact: one important contributor to this language is Brazilian.

Slide 54

Slide 54 text

EventMacro Some factors influenced the design of this language.

Slide 55

Slide 55 text

EventMacro Heavily influenced by Perl Since OpenKore is written in Perl, EventMacro is heavily influenced by Perl.

Slide 56

Slide 56 text

EventMacro Heavily influenced by Perl

my $scalar = 'foo';
my @array = (1, 2, 3);
my %hash = (1 => 'foo', 2 => 'bar');

For example, in Perl and in EventMacro we have three types of variables:
- scalar, which begins with a dollar sign ($)
- array, which begins with an at sign (@)
- hash, which begins with a percent sign (%)

Slide 57

Slide 57 text

EventMacro Heavily influenced by Perl

my $variable = 'foo';
print "variable value: $variable";

Another influence is the implicit interpolation of variables in strings.

Slide 58

Slide 58 text

EventMacro Heavily influenced by Perl It is run by an OpenKore plugin that reads each line And EventMacro is run by an OpenKore plugin that reads each line…

Slide 59

Slide 59 text

EventMacro Heavily influenced by Perl It is run by an OpenKore plugin that reads each line Designed to make a regular-expression-based interpreter easy to write … and, since the plugin tries to match each line against a regular expression, the language grammar was designed to make a regular-expression-based interpreter easy to write.

Slide 60

Slide 60 text

MacroCompiler We can build a compiler for EventMacro to translate the EventMacro code to an OpenKore plugin! And take a look at this nice idea! We can build a compiler for EventMacro to translate the EventMacro code to an OpenKore plugin!

Slide 61

Slide 61 text

MacroCompiler We can build a compiler for EventMacro to translate the EventMacro code to an OpenKore plugin! Error and warning messages at compile time With a compiler, we can have error and warning messages at compile time, and the sooner you find an error, the easier it is to fix.

Slide 62

Slide 62 text

MacroCompiler We can build a compiler for EventMacro to translate the EventMacro code to an OpenKore plugin! Error and warning messages at compile time Optimized final code And a compiler can generate optimized code, removing some overhead from the code.

Slide 63

Slide 63 text

MAYBE BUILDING A COMPILER ISN’T THE BEST SOLUTION And, a disclaimer, building a compiler may not be the best solution for your issue

Slide 64

Slide 64 text

An eDSL (embedded domain-speciﬁc language) could be a simpler solution An embedded domain-speciﬁc language may be a simpler solution

Slide 65

Slide 65 text

An eDSL (embedded domain-specific language) could be a simpler solution It’s a way to structure the public API of a library as a programming language, that is, it has primitive keywords that you can combine to build routines An eDSL is a way to structure the public API of a library as a programming language, that is, it has primitive keywords that you can combine to build routines.

Slide 66

Slide 66 text

An eDSL (embedded domain-specific language) could be a simpler solution It’s a way to structure the public API of a library as a programming language, that is, it has primitive keywords that you can combine to build routines

// JS
const myRegexp = /^age (\d+)/

// CSS selector in JS + jQuery
const element = $('#foo .bar');

For example, you can see CSS selectors in jQuery, and regular expressions in JS, as a small language inside a bigger language.

Slide 67

Slide 67 text

An eDSL (embedded domain-specific language) could be a simpler solution It’s a way to structure the public API of a library as a programming language, that is, it has primitive keywords that you can combine to build routines

// Haskell + Functional MetaPost library
beginfig(1)
  pair A, B, C;
  A:=(0, 0); B:=(1cm, 0); C:=(0, 1cm);
  draw A--B--C--cycle;
endfig;

Another example is the Haskell library Functional MetaPost. Seeing this code, you might think that it is a language for drawing shapes, but it’s only a library.

Slide 68

Slide 68 text

BUILDING A LANGUAGE VS. A COMPILER And another disclaimer: building a new language is very different from building a compiler.

Slide 69

Slide 69 text

When you are designing a language you need to think more about the communication between the programmer and the computer When you are designing a new language, you should think more about the communication between the programmer and the computer.

Slide 70

Slide 70 text

When you are designing a language you need to think more about the communication between the programmer and the computer Building a compiler is more like any other software development challenge And building a compiler is more like any other software development challenge.

Slide 71

Slide 71 text

HOW TO BUILD A COMPILER? Okay, but I want to build a new compiler. How can I build it?

Slide 72

Slide 72 text

STEPS OF A COMPILATION A compiler has some steps

Slide 73

Slide 73 text

Source code -> Syntax analysis -> Parser -> Semantic analysis -> Optimization -> Code generation First of all, we have source code in some language. The next step, which the slide calls “syntax analysis” (it is usually known as lexical analysis or scanning), transforms the source code into a stream of tokens and passes it to a “parser”, which builds a structure called an “abstract syntax tree”.

Slide 74

Slide 74 text

Source code -> Syntax analysis -> Parser -> Semantic analysis -> Optimization -> Code generation Scannerless Parsing But, hey! The “syntax analysis” and the “parser” can be implemented as a single step. It’s more of an implementation detail. A compiler where these steps aren’t split uses what is called “scannerless parsing”.

Slide 75

Slide 75 text

Source code -> Syntax analysis -> Parser -> Semantic analysis -> Optimization -> Code generation Next we have the “semantic analysis”, which checks whether the “abstract syntax tree” is semantically valid. Then we can run an “optimization” step to remove overhead. Finally, we have the “code generation”, which produces the code in the target language.

Slide 76

Slide 76 text

macro sayHi {
  $someone = Macabeus
  log Hi, $someone
}

Source code -> Syntax analysis -> Parser -> Semantic analysis -> Optimization -> Code generation

In this example, we have very simple code in EventMacro. We have a macro called “sayHi”. We assign the constant value “Macabeus” to a scalar variable called “someone”. And, on the next line, we send to the console the message “Hi” followed by the value of “someone”.

Slide 77

Slide 77 text

macro sayHi {
  $someone = Macabeus
  log Hi, $someone
}

[
  keyword(macro), identifier(sayHi), openBraces, newLine,
  scalarIdentifier(someone), equal, text(Macabeus), newLine,
  keyword(log), text(Hi, ), scalarIdentifier(someone), newLine,
  closeBraces
]

Source code -> Syntax analysis -> Parser -> Semantic analysis -> Optimization -> Code generation

In the next step we could build a stream of tokens.
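To make the token stream concrete, here is a minimal Python sketch of a tokenizer for just this tiny macro. It is not how MacroCompiler works (that compiler is scannerless and written in Elixir); the function names `tokenize` and `text_tokens` and the tuple encoding of tokens are assumptions made for illustration, and only the `macro`, assignment and `log` lines from the slide are handled.

```python
import re

def text_tokens(raw):
    """Split text such as 'Hi, $someone' into text and scalarIdentifier tokens."""
    tokens = []
    for piece in re.split(r"(\$\w+)", raw):
        if piece.startswith("$"):
            tokens.append(("scalarIdentifier", piece[1:]))
        elif piece:
            tokens.append(("text", piece))
    return tokens

def tokenize(source):
    """Turn the tiny EventMacro subset from the slide into a list of tokens."""
    tokens = []
    for line in source.strip().splitlines():
        line = line.strip()
        header = re.fullmatch(r"macro (\w+) \{", line)
        assign = re.fullmatch(r"\$(\w+) = (.*)", line)
        if line == "}":
            tokens.append(("closeBraces",))
            continue  # the slide shows no newLine after the closing brace
        if header:
            tokens += [("keyword", "macro"),
                       ("identifier", header.group(1)),
                       ("openBraces",)]
        elif assign:
            tokens += [("scalarIdentifier", assign.group(1)), ("equal",)]
            tokens += text_tokens(assign.group(2))
        elif line.startswith("log "):
            tokens.append(("keyword", "log"))
            tokens += text_tokens(line[4:])
        else:
            raise SyntaxError(f"cannot tokenize: {line}")
        tokens.append(("newLine",))
    return tokens

SOURCE = """
macro sayHi {
  $someone = Macabeus
  log Hi, $someone
}
"""

for token in tokenize(SOURCE):
    print(token)
```

The printed tokens follow the same order as the list on the slide, starting with `('keyword', 'macro')` and ending with `('closeBraces',)`.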

Slide 78

Slide 78 text

Slide 79

Slide 79 text

Slide 80

Slide 80 text

Slide 81

Slide 81 text

Slide 82

Slide 82 text

macro sayHi {
  $someone = Macabeus
  log Hi, $someone
}

[
  keyword(macro), identifier(sayHi), openBraces, newLine,
  scalarIdentifier(someone), equal, text(Macabeus), newLine,
  keyword(log), text(Hi, ), scalarIdentifier(someone), newLine,
  closeBraces
]

Source code -> Syntax analysis -> Parser -> Semantic analysis -> Optimization -> Code generation

But my compiler doesn’t have this step, because I used scannerless parsing…

Slide 83

Slide 83 text

macro sayHi {
  $someone = Macabeus
  log Hi, $someone
}

Source code -> Syntax analysis -> Parser -> Semantic analysis -> Optimization -> Code generation

…where we build the abstract syntax tree directly from the source code.

Slide 84

Slide 84 text

macro sayHi {
  $someone = Macabeus
  log Hi, $someone
}

AST

Source code -> Syntax analysis -> Parser -> Semantic analysis -> Optimization -> Code generation

An “AST” is a way to represent the entire source code as a tree, where each node is a small part of the source code.

Slide 85

Slide 85 text

macro sayHi {
  $someone = Macabeus
  log Hi, $someone
}

Let’s see how to build this AST.

Slide 86

Slide 86 text

macro sayHi {
  $someone = Macabeus
  log Hi, $someone
}

At this point the compiler sees that we wrote a macro block. So it adds a “Macro” node that has two attributes: a name, “sayHi”, and an instruction block, which is an array.

Slide 87

Slide 87 text

macro sayHi {
  $someone = Macabeus
  log Hi, $someone
}

Next comes a scalar assignment command. Its node has an attribute with the scalar variable name and another with the scalar value to assign, in this case a text value.

Slide 88

Slide 88 text

macro sayHi {
  $someone = Macabeus
  log Hi, $someone
}

And the text value is the literal “Macabeus”.

Slide 89

Slide 89 text

macro sayHi {
  $someone = Macabeus
  log Hi, $someone
}

The next command of the macro block is saved in the next position of the array. The compiler sees the keyword “log”, and this node has a Text attribute with the text to log.

Slide 90

Slide 90 text

macro sayHi {
  $someone = Macabeus
  log Hi, $someone
}

The text value has a “Hi”…

Slide 91

Slide 91 text

macro sayHi {
  $someone = Macabeus
  log Hi, $someone
}

… and an interpolation with the “someone” variable.

Slide 92

Slide 92 text

macro sayHi {
  $someone = Macabeus
  log Hi, $someone
}

Since we succeeded in building the AST, this code is syntactically correct.
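The AST described in the last slides can be sketched in a few lines of Python. This is only an illustration of the idea, not the author's Elixir implementation: the class names (`Macro`, `ScalarAssign`, `Log`, `Text`, `ScalarRead`) and the helper `parse_macro` are assumptions, and the parser only understands the three kinds of lines used in the example. It works directly on the source text, without a separate token stream, in the spirit of scannerless parsing.

```python
import re
from dataclasses import dataclass

@dataclass
class ScalarRead:
    name: str

@dataclass
class Text:
    parts: list  # string literals and ScalarRead nodes (interpolation)

@dataclass
class ScalarAssign:
    name: str
    value: Text

@dataclass
class Log:
    text: Text

@dataclass
class Macro:
    name: str
    block: list  # the instruction block: an array of nodes

def parse_text(raw):
    """'Hi, $someone' -> Text(['Hi, ', ScalarRead('someone')])"""
    parts = []
    for piece in re.split(r"(\$\w+)", raw):
        if piece.startswith("$"):
            parts.append(ScalarRead(piece[1:]))
        elif piece:
            parts.append(piece)
    return Text(parts)

def parse_macro(source):
    """Build the AST straight from the source text."""
    lines = [line.strip() for line in source.strip().splitlines()]
    header = re.fullmatch(r"macro (\w+) \{", lines[0])
    if header is None or lines[-1] != "}":
        raise SyntaxError("expected 'macro <name> {' ... '}'")
    block = []
    for line in lines[1:-1]:
        assign = re.fullmatch(r"\$(\w+) = (.*)", line)
        if assign:
            block.append(ScalarAssign(assign.group(1), parse_text(assign.group(2))))
        elif line.startswith("log "):
            block.append(Log(parse_text(line[4:])))
        else:
            raise SyntaxError(f"unknown command: {line}")
    return Macro(header.group(1), block)

SOURCE = """
macro sayHi {
  $someone = Macabeus
  log Hi, $someone
}
"""

print(parse_macro(SOURCE))
```

If parsing finishes without raising `SyntaxError`, the code is syntactically correct for this tiny subset, which mirrors the point made on the slide.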

Slide 93

Slide 93 text

macro sayHi {
  $someone = Macabeus
  log Hi, $someone
}

Source code -> Syntax analysis -> Parser -> Semantic analysis -> Optimization -> Code generation

Let’s go to the next step…

Slide 94

Slide 94 text

macro sayHi {
  $someone = Macabeus
  log Hi, $someone
}

Source code -> Syntax analysis -> Parser -> Semantic analysis -> Optimization -> Code generation

… semantic analysis, to check whether this code is semantically correct. For example, we check whether the code tries to read a variable that is never written. If that happens, the code has a semantic error.

Slide 95

Slide 95 text

macro sayHi {
  $someone = Macabeus
  log Hi, $someone
}

Slide 96

Slide 96 text

macro sayHi {
  $someone = Macabeus
  log Hi, $someone
}

Symbol table

To do these checks, we need to build a structure called a symbol table. A symbol table exposes the information from the AST in a more accessible way.

Slide 97

Slide 97 text

macro sayHi {
  $someone = Macabeus
  log Hi, $someone
}

Symbol table
macro_write   : sayHi

To build this structure, we need to visit each AST node. In this node, we are writing a new macro. Let’s record it in the symbol table: we are writing a macro called “sayHi”.

Slide 98

Slide 98 text

macro sayHi {
  $someone = Macabeus
  log Hi, $someone
}

Symbol table
macro_write   : sayHi
variable_write: $someone

In this sub-tree we are writing a scalar variable called “someone”.

Slide 99

Slide 99 text

macro sayHi {
  $someone = Macabeus
  log Hi, $someone
}

Symbol table
macro_write   : sayHi
variable_write: $someone
variable_read : $someone

And in this sub-tree we are reading the variable “someone”.

Slide 100

Slide 100 text

macro sayHi {
  $someone = Macabeus
  log Hi, $someone
}

Symbol table
macro_write   : sayHi
variable_write: $someone
variable_read : $someone

Now we can check that every variable that is read is also written, so this code is semantically correct!
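The symbol-table walk can be sketched as follows. This is a hedged Python illustration, not MacroCompiler's code: the AST is encoded as plain tuples (an assumption chosen to keep the example short), and the names `build_symbol_table`, `collect_reads` and `undefined_reads` are hypothetical.

```python
# Hypothetical tuple encoding of the AST for the sayHi macro.
AST = ("macro", "sayHi", [
    ("assign", "someone", ["Macabeus"]),
    ("log", ["Hi, ", ("read", "someone")]),
])

def collect_reads(text_parts, table):
    """Record a variable_read entry for every interpolated variable."""
    for part in text_parts:
        if isinstance(part, tuple) and part[0] == "read":
            table.append(("variable_read", "$" + part[1]))

def build_symbol_table(node, table=None):
    """Visit each AST node and record macro writes, variable writes and reads."""
    if table is None:
        table = []
    kind = node[0]
    if kind == "macro":
        table.append(("macro_write", node[1]))
        for child in node[2]:
            build_symbol_table(child, table)
    elif kind == "assign":
        # The assigned value is evaluated before the variable is written.
        collect_reads(node[2], table)
        table.append(("variable_write", "$" + node[1]))
    elif kind == "log":
        collect_reads(node[1], table)
    return table

def undefined_reads(table):
    """Variables that are read but never written: a semantic error."""
    written = {name for kind, name in table if kind == "variable_write"}
    return [name for kind, name in table
            if kind == "variable_read" and name not in written]

table = build_symbol_table(AST)
for entry in table:
    print(entry)
print("semantic errors:", undefined_reads(table))
```

For the `sayHi` macro the table has the same three entries as the slide, and the list of undefined reads is empty.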

Slide 101

Slide 101 text

macro sayHi {
  $someone = Macabeus
  log Hi, $someone
}

Source code -> Syntax analysis -> Parser -> Semantic analysis -> Optimization -> Code generation

Let’s go to the next compilation step…

Slide 102

Slide 102 text

macro sayHi {
  $someone = Macabeus
  log Hi, $someone
}

Source code -> Syntax analysis -> Parser -> Semantic analysis -> Optimization -> Code generation

… optimization! In this step we want to remove overhead while preserving the semantics of the code.

Slide 103

Slide 103 text

macro sayHi {
  $someone = Macabeus
  log Hi, $someone
}

Slide 104

Slide 104 text

macro sayHi {
  $someone = Macabeus
  log Hi, $someone
}

Optimizations: Constant folding, Dead code strip

I implemented two very simple optimizations: constant folding and dead code strip. In what I call constant folding, we propagate the constant values of variables in order to reduce variable reads (strictly speaking, substituting a variable’s known value is usually called constant propagation, while constant folding means evaluating constant expressions at compile time). In dead code strip, we remove unnecessary code.

Slide 105

Slide 105 text

macro sayHi {
  $someone = Macabeus
  log Hi, $someone
}

Optimizations: Constant folding, Dead code strip

Let’s start with constant folding. On this line we can see that we are assigning a constant value to a variable…

Slide 106

Slide 106 text

macro sayHi {
  $someone = Macabeus
  log Hi, $someone
}

Optimizations: Constant folding, Dead code strip

… and we are using this value on the next line.

Slide 107

Slide 107 text

macro sayHi {
  $someone = Macabeus
  log Hi, Macabeus
}

Optimizations: Constant folding, Dead code strip

Since we know this value at compilation time, we can use it directly. Notice that the AST is simpler now.

Slide 108

Slide 108 text

macro sayHi {
  $someone = Macabeus
  log Hi, Macabeus
}

Optimizations: Constant folding, Dead code strip

Now let’s run the dead code strip.

Slide 109

Slide 109 text

macro sayHi {
  $someone = Macabeus
  log Hi, Macabeus
}

Optimizations: Constant folding, Dead code strip

On this line we are assigning to a scalar variable, but we never read it.

Slide 110

Slide 110 text

macro sayHi {
  log Hi, Macabeus
}

Optimizations: Constant folding, Dead code strip

So we can remove this code. Note that we removed many AST nodes.

Slide 111

Slide 111 text

macro sayHi {
  log Hi, Macabeus
}

Optimizations: Constant folding, Dead code strip

Since we changed the AST, let’s run the optimizations again. Constant folding doesn’t find anything to change…

Slide 112

Slide 112 text

macro sayHi {
  log Hi, Macabeus
}

Optimizations: Constant folding, Dead code strip

Dead code strip doesn’t find anything to change either…

Slide 113

Slide 113 text

macro sayHi {
  log Hi, Macabeus
}

Optimizations: Constant folding, Dead code strip

Since the optimizations can’t change the AST anymore, we are done optimizing. Another way to stop is a fixed limit, for example “run it 10 times, then stop”.
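The optimization loop above can be sketched in Python as a fixed-point iteration. This is an illustrative sketch under several assumptions, not the author's Elixir code: the block is a flat list of tuples, there are no loops or branches, variables are local to the macro, and the function names follow the talk's terminology (`constant_folding` here substitutes known constant values, which is usually called constant propagation).

```python
def merge(parts):
    """Join adjacent string literals: ['Hi, ', 'Macabeus'] -> ['Hi, Macabeus']."""
    merged = []
    for part in parts:
        if isinstance(part, str) and merged and isinstance(merged[-1], str):
            merged[-1] += part
        else:
            merged.append(part)
    return merged

def constant_folding(block):
    """Replace reads of variables whose value is a known constant."""
    known = {}  # variable name -> constant text known at this point
    result = []
    for stmt in block:
        # The text parts are the last element of both statement kinds.
        parts = merge([known.get(p[1], p) if isinstance(p, tuple) else p
                       for p in stmt[-1]])
        if stmt[0] == "assign":
            if any(isinstance(p, tuple) for p in parts):
                known.pop(stmt[1], None)  # value is no longer a constant
            else:
                known[stmt[1]] = "".join(parts)
            result.append(("assign", stmt[1], parts))
        else:
            result.append(("log", parts))
    return result

def dead_code_strip(block):
    """Drop assignments to variables that are never read."""
    read = {p[1] for stmt in block for p in stmt[-1] if isinstance(p, tuple)}
    return [stmt for stmt in block if stmt[0] != "assign" or stmt[1] in read]

def optimize(block, max_rounds=10):
    """Run both passes until the block stops changing (or max_rounds is hit)."""
    for _ in range(max_rounds):
        new_block = dead_code_strip(constant_folding(block))
        if new_block == block:
            break
        block = new_block
    return block

BLOCK = [
    ("assign", "someone", ["Macabeus"]),
    ("log", ["Hi, ", ("read", "someone")]),
]

print(optimize(BLOCK))
```

The `max_rounds` parameter plays the role of the “run it 10 times, then stop” rule; the equality check implements “stop when nothing changes”.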

Slide 114

Slide 114 text

macro sayHi {
  log Hi, Macabeus
}

Source code -> Syntax analysis -> Parser -> Semantic analysis -> Optimization -> Code generation

Let’s go to the last compilation step…

Slide 115

Slide 115 text

macro sayHi {
  log Hi, Macabeus
}

Source code -> Syntax analysis -> Parser -> Semantic analysis -> Optimization -> Code generation

… code generation! Since we want to compile this code to run as an OpenKore plugin, we need to compile it to Perl code.

Slide 116

Slide 116 text

macro sayHi {
  log Hi, Macabeus
}

Slide 117

Slide 117 text

macro sayHi {
  log Hi, Macabeus
}

Code generation: Header, Body, Footer

I have three steps in code generation:
- header code
- body code
- footer code

Slide 118

Slide 118 text

macro sayHi {
  log Hi, Macabeus
}

package macroCompiled;
Plugins::register(
  'macroCompiled',
  'Compiled version of eventMacro.txt',
  \&on_unload
);
sub on_unload { }

Code generation: Header, Body, Footer

In the header I need to add some boilerplate in order to register this plugin with OpenKore.

Slide 119

Slide 119 text

macro sayHi {
  log Hi, Macabeus
}

package macroCompiled;
Plugins::register(
  'macroCompiled',
  'Compiled version of eventMacro.txt',
  \&on_unload
);
sub on_unload { }

Code generation: Header, Body, Footer

I also need to import some features from OpenKore into my plugin. To know which ones, I need to visit each node of the AST.

Slide 120

Slide 120 text

Slide 121

Slide 121 text

Slide 122

Slide 122 text

Slide 123

Slide 123 text

Slide 124

Slide 124 text

macro sayHi {
  log Hi, Macabeus
}

package macroCompiled;
use Log qw(message);
Plugins::register(
  'macroCompiled',
  'Compiled version of eventMacro.txt',
  \&on_unload
);
sub on_unload { }
sub macro_sayHi {
}

Code generation: Header, Body, Footer

Then I need to translate the “macro” node. In Perl, the equivalent code is a sub statement.

Slide 125

Slide 125 text

Slide 126

Slide 126 text

Slide 127

Slide 127 text

Slide 128

Slide 128 text

macro sayHi {
  log Hi, Macabeus
}

package macroCompiled;
use Log qw(message);
Plugins::register(
  'macroCompiled',
  'Compiled version of eventMacro.txt',
  \&on_unload
);
sub on_unload { }
sub macro_sayHi {
  message "Hi, Macabeus" ."\n";
}

Code generation: Header, Body, Footer

Let’s go to the last step of the code generation, the “footer”.

Slide 129

Slide 129 text

macro sayHi {
  log Hi, Macabeus
}

package macroCompiled;
use Log qw(message);
Plugins::register(
  'macroCompiled',
  'Compiled version of eventMacro.txt',
  \&on_unload
);
sub on_unload { }
sub macro_sayHi {
  message "Hi, Macabeus" ."\n";
}
1;

Code generation: Header, Body, Footer

And it’s very simple. We just need to add “1;” at the end of the code. It’s important because in Perl a module needs to end with a true value, for example, a positive number.

Slide 130

Slide 130 text

Slide 131

Slide 131 text

macro sayHi {
  log Hi, Macabeus
}

package macroCompiled;
use Log qw(message);
Plugins::register(
  'macroCompiled',
  'Compiled version of eventMacro.txt',
  \&on_unload
);
sub on_unload { }
sub macro_sayHi {
  message "Hi, Macabeus" ."\n";
}
1;

Source code -> Syntax analysis -> Parser -> Semantic analysis -> Optimization -> Code generation

And those are all the steps of my compiler. We saw how to translate code written in EventMacro into equivalent Perl code that runs on OpenKore.
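The header, body and footer steps can be sketched as plain string building. The Python below is a hedged illustration only: `generate` and `perl_string` are hypothetical names, the block is assumed to contain only `("log", text)` nodes as left by the optimizer, and passing the unload handler as a code reference (`\&on_unload`) is an assumption about what OpenKore's plugin registration expects.

```python
def perl_string(text):
    """Quote text as a Perl double-quoted string, escaping special characters."""
    for char in '\\"$@':  # the backslash must be escaped first
        text = text.replace(char, "\\" + char)
    return '"' + text + '"'

def generate(macro_name, block):
    """Emit Perl plugin code for one macro made of ("log", text) nodes."""
    # Header: import only what the AST actually needs, then register the plugin.
    imports = ["use Log qw(message);"] if any(kind == "log" for kind, _ in block) else []
    header = ["package macroCompiled;"] + imports + [
        "Plugins::register(",
        "  'macroCompiled',",
        "  'Compiled version of eventMacro.txt',",
        "  \\&on_unload",
        ");",
        "sub on_unload { }",
    ]
    # Body: each macro becomes a Perl sub.
    body = [f"sub macro_{macro_name} {{"]
    for kind, text in block:
        if kind != "log":
            raise NotImplementedError(kind)
        body.append(f'  message {perl_string(text)} ."\\n";')
    body.append("}")
    # Footer: a Perl module must end with a true value.
    footer = ["1;"]
    return "\n".join(header + body + footer) + "\n"

print(generate("sayHi", [("log", "Hi, Macabeus")]))
```

Escaping `$` and `@` in `perl_string` matters because Perl would otherwise interpolate them inside the double-quoted string.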

Slide 132

Slide 132 text

It’s a simplification! But, importantly, this is a simplification!

Slide 133

Slide 133 text

It’s a simplification! An AST could have metadata nodes An AST could have metadata nodes. I’ll show more about that soon.

Slide 134

Slide 134 text

It’s a simplification! An AST could have metadata nodes A compiler could have many intermediary representations A compiler could have many intermediary steps. For example, GHC, a compiler for the Haskell language, has several intermediary representations, because Haskell’s way of thinking is very different from the architecture our computers run on. So there are several intermediary representations, each changing the code a little bit, to make the compiler easier to build.

Slide 135

Slide 135 text

It’s a simplification! An AST could have metadata nodes A compiler could have many intermediary representations It could also build the final code directly A compiler could also output the final code directly. That’s the case for LuaC and Wren. Lua was designed so that its code can be compiled in a single step, which is useful when embedding the compiler on a device, because it makes the compiler lighter.

Slide 136

Slide 136 text

DEMONSTRATION Let’s start the demonstration of this compiler

Slide 137

Slide 137 text

WHAT DOES THE CODE OF MACRO COMPILER LOOK LIKE? Nice! But what does the code of MacroCompiler look like?

Slide 138

Slide 138 text

I’m writing the compiler using Elixir. I chose Elixir because of the hype, and also because I want to learn more about it. This language also has two features that are very useful for a compiler.

Slide 139

Slide 139 text

Pattern matching Pattern matching, which is useful to identify what kind of node we are looking at.

Slide 140

Slide 140 text

Pattern matching Works very well with recursion And Elixir works very well with recursion. That’s important because we need to traverse the AST, and that can be implemented with recursive code.

Slide 141

Slide 141 text

Source code -> Parser -> Semantic analysis -> Optimization -> Code generation Okay, let’s look at each compilation step in more detail.

Slide 142

Slide 142 text

PARSER Source code -> Parser -> Semantic analysis -> Optimization -> Code generation Let’s look at the parser in more detail!

Slide 143

Slide 143 text

Grammar To write a parser, we need a grammar to specify how our language should be written

Slide 144

Slide 144 text

Context-free Grammars Parsing Expression Grammar Regular Grammars eDSL We have many ways to write a grammar. I’ll talk about four of them

Slide 145

Slide 145 text

Context-free Grammars Parsing Expression Grammar Regular Grammars eDSL It’s a very simple way to write a language grammar. A “regular grammar” is a very simple way to write a language grammar.

Slide 146

Slide 146 text

Context-free Grammars Parsing Expression Grammar Regular Grammars eDSL It’s a very simple way to write a language grammar. Regexp (regular expression) is an example

move prontera 30 42
move 30 42

Regular expressions are an example of a regular grammar. They are useful, for instance, when we need to parse a command in a CLI. We could have a command to move to a position on a map, where the map name is optional.

Slide 147

Slide 147 text

Context-free Grammars Parsing Expression Grammar Regular Grammars eDSL It’s a very simple way to write a language grammar. Regexp (regular expression) is an example

move prontera 30 42
move 30 42

/move (?:(\w+) )?(\d+) (\d+)/

We may write this regular expression to match this command. It’s a very simple solution to a very simple issue.
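As a quick check of that pattern, here is a small Python sketch using the standard `re` module. The regular expression is the one from the slide; the wrapper function `parse_move` and its return shape are assumptions made for the example.

```python
import re

# Same pattern as on the slide; the group holding the map name is optional.
MOVE_RE = re.compile(r"move (?:(\w+) )?(\d+) (\d+)")

def parse_move(command):
    """Return (map_name, x, y), or None when the command does not match."""
    match = MOVE_RE.fullmatch(command)
    if match is None:
        return None
    map_name, x, y = match.groups()
    return map_name, int(x), int(y)

print(parse_move("move prontera 30 42"))  # ('prontera', 30, 42)
print(parse_move("move 30 42"))           # (None, 30, 42)
```

When the map name is left out, the optional group simply yields `None`, so both forms of the command are handled by the same pattern.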

Slide 148

Slide 148 text

Context-free Grammars Parsing Expression Grammar Regular Grammars eDSL A limitation is that we can’t match arbitrarily deep nesting, so we can’t specify a nested block of commands. Because a “regular grammar” is a very simple grammar, it has some limitations. For example, it can’t keep track of arbitrarily deep nesting, so we can’t specify nested blocks of code.

Slide 149

Slide 149 text

Context-free Grammars Parsing Expression Grammar Regular Grammars eDSL { evil_query(id: 42) { complex_field { complex_field { field } } } } A limitation is that we can’t match arbitrarily deep nesting, so we can’t specify a nested block of commands. For example, in GraphQL we can have a field inside another field.

Slide 150

Slide 150 text

{ evil_query(id: 42) { complex_field { complex_field { field } } } } Context-free Grammars Parsing Expression Grammar Regular Grammars eDSL A limitation is that we can’t match arbitrarily deep nesting, so we can’t specify a nested block of commands. We can’t specify that with a regular grammar alone. We need something more powerful.
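To make the nesting problem concrete, here is a minimal Python sketch (an illustration, not the talk's Elixir code) of what "something more powerful" can look like: a parser that calls itself whenever a field opens a new block. It handles only braces and bare field names; arguments such as `(id: 42)` are left out, and all function names are hypothetical.

```python
import re

def tokenize(source):
    # Braces and bare names only; arguments such as (id: 42) are not handled here.
    return re.findall(r"[{}]|\w+", source)

def parse_block(tokens, pos):
    """Parse '{ field* }' starting at tokens[pos]; return (fields, next_pos)."""
    if pos >= len(tokens) or tokens[pos] != "{":
        raise SyntaxError("expected '{'")
    pos += 1
    fields = []
    while pos < len(tokens) and tokens[pos] != "}":
        name = tokens[pos]
        if name == "{":
            raise SyntaxError("expected a field name")
        pos += 1
        children = []
        if pos < len(tokens) and tokens[pos] == "{":
            # A field may contain another block: the parser calls itself.
            children, pos = parse_block(tokens, pos)
        fields.append((name, children))
    if pos >= len(tokens):
        raise SyntaxError("missing '}'")
    return fields, pos + 1

def parse(source):
    tokens = tokenize(source)
    fields, pos = parse_block(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError("unexpected input after the closing '}'")
    return fields

print(parse("{ evil_query { complex_field { complex_field { field } } } }"))
```

The recursion depth follows the nesting depth of the input, which is exactly the bookkeeping a regular grammar cannot express.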

Slide 151

Slide 151 text

We could embed a language to specify grammars in another language Context-free Grammars Parsing Expression Grammar eDSL Regular Grammars One option is an eDSL (embedded domain-specific language), which is a small language embedded in a bigger host language. And we can use an eDSL to write grammars!

Slide 152

Slide 152 text

Context-free Grammars Parsing Expression Grammar eDSL Regular Grammars sequence([ ignore(string("move")), ignore(spaces()), many(letter()), skip(spaces()), integer(), ignore(spaces()), integer() ]) We could embed a language to specify grammars in another language Instead of a regular expression, we could use an eDSL to write a grammar that matches the previous command. The grammar would look like this.
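To show how such an eDSL can work under the hood, here is a small Python sketch (not Combine itself, and not the talk's code). It assumes a parser is a plain function taking `(text, pos)` and returning `(value, new_pos)` or `None`; the combinator names mirror the slide, but their implementations here are hypothetical.

```python
import re

IGNORED = object()  # sentinel for values that should not reach the result

def regex(pattern):
    """Primitive parser: match a regular expression at the current position."""
    compiled = re.compile(pattern)
    def parser(text, pos):
        match = compiled.match(text, pos)
        return (match.group(0), match.end()) if match else None
    return parser

def ignore(parser):
    # Run the parser but drop its value from the final result.
    def wrapped(text, pos):
        result = parser(text, pos)
        return (IGNORED, result[1]) if result else None
    return wrapped

def optional(parser):
    # Succeed with None, consuming nothing, when the inner parser fails.
    def wrapped(text, pos):
        return parser(text, pos) or (None, pos)
    return wrapped

def integer(text, pos):
    result = regex(r"\d+")(text, pos)
    return (int(result[0]), result[1]) if result else None

def sequence(parsers):
    # Run the parsers one after another; fail as soon as one of them fails.
    def parser(text, pos):
        values = []
        for step in parsers:
            result = step(text, pos)
            if result is None:
                return None
            value, pos = result
            if value is not IGNORED:
                values.append(value)
        return values, pos
    return parser

spaces = regex(r"\s+")
move = sequence([
    ignore(regex("move")), ignore(spaces),
    optional(regex(r"[A-Za-z_]+")), ignore(optional(spaces)),
    integer, ignore(spaces), integer,
])

print(move("move prontera 30 42", 0))
print(move("move 30 42", 0))
```

The grammar is ordinary host-language code, so it can be split into functions, reused and tested like any other code.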

Slide 153

Slide 153 text

It defines a set of symbols and the valid transformations for each symbol. Context-free Grammars Parsing Expression Grammar Regular Grammars eDSL Another way is a context-free grammar, where we have a set of symbols and, for each symbol, the transformations it can go through.

Slide 154

Slide 154 text

It defines a set of symbols and the valid transformations for each symbol. An example of a CFG (Context-free Grammar) notation is BNF (Backus–Naur Form): Context-free Grammars Parsing Expression Grammar Regular Grammars eDSL
<vowel> ::= "a" | "e" | "i" | "o" | "u"
<digit> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
<character> ::= <vowel> | <digit>
<text> ::= <character> | <character> <text>
BNF is a common notation for writing CFGs. On the left side we have our symbols and on the right side we have the transformations for each symbol.

Slide 155

Slide 155 text

Context-free Grammars Parsing Expression Grammar Regular Grammars eDSL
<vowel> ::= "a" | "e" | "i" | "o" | "u"
<digit> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
<character> ::= <vowel> | <digit>
<text> ::= <character> | <character> <text>
a9 ww It defines a set of symbols and the valid transformations for each symbol. An example of a CFG (Context-free Grammar) notation is BNF (Backus–Naur Form): So, according to this grammar, “a9” is a valid text and “ww” isn’t.

Slide 156

Slide 156 text

Context-free Grammars Parsing Expression Grammar Regular Grammars eDSL Like a CFG, a PEG (Parsing Expression Grammar) also defines a set of symbols and the valid transformations for each symbol. We also have the parsing expression grammar! Like a CFG, it defines a set of symbols and the transformations for each symbol.

Slide 157

Slide 157 text

Context-free Grammars Parsing Expression Grammar Regular Grammars eDSL
vowel ← 'a' / 'e' / 'i' / 'o' / 'u'
digit ← [0-9]
character ← vowel / digit
text ← character+
Like a CFG, a PEG (Parsing Expression Grammar) also defines a set of symbols and the valid transformations for each symbol. Here is the same grammar written as a PEG.
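The grammar is small enough to check by hand. A minimal Python sketch (for illustration only; function names are hypothetical) that follows the PEG rules one by one:

```python
VOWELS = set("aeiou")
DIGITS = set("0123456789")

def is_character(ch):
    # character <- vowel / digit
    return ch in VOWELS or ch in DIGITS

def is_text(source):
    # text <- character+   (one or more characters and nothing else)
    return len(source) > 0 and all(is_character(ch) for ch in source)

print(is_text("a9"))
print(is_text("ww"))
```

“a9” is accepted because both characters match a rule, while “ww” is rejected because “w” is neither a vowel nor a digit.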

Slide 158

Slide 158 text

Parsing Expression Grammar Regular Grammars eDSL Context-free Grammars The main differences between CFG and PEG are: OK, CFG and PEG are very similar, but there are two relevant differences!

Slide 159

Slide 159 text

Parsing Expression Grammar Regular Grammars eDSL Context-free Grammars The main differences between CFG and PEG are:  - notations Firstly, the notations.

Slide 160

Slide 160 text

Parsing Expression Grammar Regular Grammars eDSL Context-free Grammars The main differences between CFG and PEG are: - notations
<vowel> ::= "a" | "e" | "i" | "o" | "u"
<digit> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
<character> ::= <vowel> | <digit>
<text> ::= <character> | <character> <text>
vowel ← 'a' / 'e' / 'i' / 'o' / 'u'
digit ← [0-9]
character ← vowel / digit
text ← character+
It’s easy to see that the CFG notation is very different from the PEG notation. For example, PEG borrows regular-expression-style operators, such as the character class [0-9] and the repetition operator +.

Slide 161

Slide 161 text

Parsing Expression Grammar Regular Grammars eDSL Context-free Grammars The main differences between CFG and PEG are: - notations - rule interpretation And the other relevant difference is how the rules are interpreted…

Slide 162

Slide 162 text

Parsing Expression Grammar Regular Grammars eDSL Context-free Grammars The main differences between CFG and PEG are: - notations - rule interpretation CFG: (Rule A, Rule B, Rule C) = (Rule B, Rule C, Rule A) PEG: (Rule A, Rule B, Rule C) ≠ (Rule B, Rule C, Rule A) …because in a CFG the order of the rules doesn’t matter, but in a PEG the order of the rules is very relevant.

Slide 163

Slide 163 text

Parsing Expression Grammar Regular Grammars eDSL Context-free Grammars The main differences between CFG and PEG are: - notations - rule interpretation CFG: (Rule A, Rule B, Rule C) = (Rule B, Rule C, Rule A) PEG: (Rule A, Rule B, Rule C) ≠ (Rule B, Rule C, Rule A) “Well.. I can use the rule A or B… What should I use?” Think of a situation where either rule A or rule B could be used to parse the input.

Slide 164

Slide 164 text

Parsing Expression Grammar Regular Grammars eDSL Context-free Grammars The main differences between CFG and PEG are: - notations - rule interpretation CFG: (Rule A, Rule B, Rule C) = (Rule B, Rule C, Rule A) PEG: (Rule A, Rule B, Rule C) ≠ (Rule B, Rule C, Rule A) “Well.. I can use the rule A or B… What should I use?” CFG: “I don’t know!” for both orderings. If I’m using a CFG, the parser gets stuck, because the grammar gives it no way to resolve the ambiguity.

Slide 165

Slide 165 text

Parsing Expression Grammar Regular Grammars eDSL Context-free Grammars The main differences between CFG and PEG are: - notations - rule interpretation CFG: (Rule A, Rule B, Rule C) = (Rule B, Rule C, Rule A) PEG: (Rule A, Rule B, Rule C) ≠ (Rule B, Rule C, Rule A) “Well.. I can use the rule A or B… What should I use?” CFG: “I don’t know!” for both orderings. PEG: “rule A, it came before!” for the first ordering and “rule B, it came before!” for the second. But if I’m using a PEG, the parser picks the rule that comes first, so the order of the rules matters and I need to pay more attention to it.
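The effect of rule order in a PEG can be seen with two alternatives where one is a prefix of the other. This is a minimal Python sketch of PEG-style ordered choice (illustrative only; `literal`, `choice` and `parse_all` are hypothetical helper names):

```python
def literal(expected):
    """Parser that matches one fixed string at the current position."""
    def parser(text, pos):
        if text.startswith(expected, pos):
            return expected, pos + len(expected)
        return None
    return parser

def choice(*alternatives):
    # PEG ordered choice: commit to the first alternative that matches.
    def parser(text, pos):
        for alternative in alternatives:
            result = alternative(text, pos)
            if result is not None:
                return result
        return None
    return parser

def parse_all(parser, text):
    """True only when the parser consumes the whole input."""
    result = parser(text, 0)
    return result is not None and result[1] == len(text)

short_first = choice(literal("if"), literal("ifdef"))
long_first = choice(literal("ifdef"), literal("if"))

print(parse_all(short_first, "ifdef"))
print(parse_all(long_first, "ifdef"))
```

With the short alternative first, the choice commits to “if” and never tries “ifdef”, so the same two rules in a different order accept a different set of inputs.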

Slide 166

Slide 166 text

Algorithm OK, with a grammar we can specify the syntax of a language, but we need an algorithm to run this grammar.

Slide 167

Slide 167 text

Parser Generators Parser Combinators I’ll talk about two popular approaches. The main difference between them is the interface.

Slide 168

Slide 168 text

Parser Generators Parser Combinators Description A parser generator takes a description written in PEG, BNF, or some other notation and compiles it into a parser that we use in the compiler. The compiler can then use this parser to build an AST.

Slide 169

Slide 169 text

Parser Generators Parser Combinators Description Parser Generator

Slide 170

Slide 170 text

Parser Generators Parser Combinators Description Parser Generator Parser

Slide 171

Slide 171 text

Parser Combinators Parser Generators Parser combinators take another approach. We write many very simple parsers (parser A, parser B, parser C) and combine them to build a more complex parser.

Slide 172

Slide 172 text

Parser Combinators Parser Generators Parser A Parser B Parser C

Slide 173

Slide 173 text

Parser Combinators Parser Generators Parser A Parser B Parser C Parser D

Slide 174

Slide 174 text

Parser Combinators Parser Generators Parser A Parser B Parser C Parser D Parser E

Slide 175

Slide 175 text

Nice! But how can I use it in Elixir? Very nice! But how can I use this approach in Elixir?

Slide 176

Slide 176 text

github.com/bitwalker/combine There is a very nice parser combinator library called Combine. With this library we write the grammar using an eDSL.

Slide 177

Slide 177 text

github.com/bitwalker/combine Combine, a parser combinators library Description by an eDSL

Slide 178

Slide 178 text

github.com/bitwalker/combine Approach scannerless parsing Combine, a parser combinators library Description by an eDSL It also takes a scannerless parsing approach, in which the compiler reads the source code and builds the AST directly, without a separate tokenizer step.

Slide 179

Slide 179 text

As you can see, this library is a toolbox with many, many very simple parsers. And, yeah, you need to remember many of these very simple parsers so you can combine them into a complex parser for your language.

Slide 180

Slide 180 text

Slide 181

Slide 181 text

Practical Example #1 Parsing an Event Macro command Okay, let’s see a practical example: writing a parser for an Event Macro command.

Slide 182

Slide 182 text

&push(@ori, text) Here is the “push” command. Semantically, it is used to add a text to an array. And syntactically, it has…

Slide 183

Slide 183 text

Keyword &push(@ori, text) A keyword, “push”…

Slide 184

Slide 184 text

Array variable name &push(@ori, text) An array variable name…

Slide 185

Slide 185 text

Comma &push(@ori, text) And, after a comma…

Slide 186

Slide 186 text

Text value &push(@ori, text) A text value.

Slide 187

Slide 187 text

Keyword Array variable name Text value Comma &push(@ori, text) We should parse this command.

Slide 188

Slide 188 text

Here we have a very complex parser, PushCommand. It is a sequence of the string “push” with an open parenthesis, an array variable name, spaces, a comma, spaces, a text value and, finally, a close parenthesis. This is a very complex parser, right?

Slide 189

Slide 189 text

But this very complex parser uses a simpler parser to describe the array variable name, which is the sequence of an “at” sign (@) and an identifier.

Slide 190

Slide 190 text

And the identifier is an even simpler parser: a sequence of characters that ends at a space, a new line, or a comma…

Slide 191

Slide 191 text

And PushCommand also uses the TextValue parser. This parser is similar to Identifier, but it also allows variables to be interpolated in the text.

Slide 192

Slide 192 text

So the idea really is the same as I described. We have a very complex parser, PushCommand, which uses a simpler parser, ArrayVariable, which in turn uses the simplest parser, Identifier. PushCommand also uses TextValue, which uses the variable parsers.

Slide 193

Slide 193 text

Push Command

Slide 194

Slide 194 text

Push Command Array Variable

Slide 195

Slide 195 text

Push Command Identifier Array Variable

Slide 196

Slide 196 text

Push Command TextValue Identifier Array Variable

Slide 197

Slide 197 text

Push Command TextValue Identifier Array Variable Scalar Variable Hash Variable

Slide 198

Slide 198 text

After parsing the command, we need to map it! After running the parser, we need to map the result so we can store it in the AST. We can map it to an Elixir struct, because that makes the next steps easier to work with. In this case, each struct is a node in our AST.

Slide 199

Slide 199 text

Slide 200

Slide 200 text

Slide 201

Slide 201 text

Slide 202

Slide 202 text

Slide 203

Slide 203 text

Practical Example #2 Parsing a code block Okay, let’s parse something very different: a code block!

Slide 204

Slide 204 text

ref-while # comments log message do c hi if (1) macro { } In this language we have macros…

Slide 205

Slide 205 text

macro { } ref-while # comments log message do c hi if (1) Code block …that has a code block.

Slide 206

Slide 206 text

if (1) { # comments log message do c hi if (1) } It also has an “if”…

Slide 207

Slide 207 text

# comments log message do c hi if (1) Code block if (1) { } …and it also has a code block! Okay, they are similar, right?

Slide 208

Slide 208 text

Then, the parsers are also similar.

Slide 209

Slide 209 text

Both use the MacroBlock parser.

Slide 210

Slide 210 text

And both map the code block into the struct under the same key, so that it is handled the same way in the next compilation steps.

Slide 211

Slide 211 text

And the MacroBlock is very nice.

Slide 212

Slide 212 text

It is a set of parsers. It tries to parse using DoCommand; if that fails, it tries LogCommand; if that also fails, it tries CallCommand…

Slide 213

Slide 213 text

…and if all parsers fail, a syntax error is raised. That’s it.
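The "try each parser in turn, then give up" behaviour can be sketched as follows. This is a simplified Python illustration, not the MacroBlock implementation: it works line by line, the command syntax is reduced to "keyword followed by an argument", and all names are hypothetical.

```python
import re

def command(keyword):
    """Build a parser for '<keyword> <rest of line>' that returns an AST-like dict."""
    pattern = re.compile(rf"{keyword} +(.+)")
    def parser(line):
        match = pattern.fullmatch(line)
        return {"type": keyword, "argument": match.group(1)} if match else None
    return parser

# The order matters: the first parser that succeeds wins.
COMMAND_PARSERS = [command("do"), command("log"), command("call")]

def parse_line(line, line_number):
    for parser in COMMAND_PARSERS:
        node = parser(line)
        if node is not None:
            return node
    # Every parser failed: report a syntax error with its position.
    raise SyntaxError(f"line {line_number}: unknown command {line!r}")

def parse_block(source):
    nodes = []
    for number, raw in enumerate(source.splitlines(), start=1):
        line = raw.strip()
        if line and not line.startswith("#"):
            nodes.append(parse_line(line, number))
    return nodes

print(parse_block("# comments\nlog message\ndo c hi"))
```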

Slide 214

Slide 214 text

Parser Source code Semantic analysis Optimization Code generation Let’s go to the next compilation step…

Slide 215

Slide 215 text

SEMANTIC ANALYSIS Parser Source code Semantic analysis Optimization Code generation … semantic analysis!

Slide 216

Slide 216 text

The symbol table is important to expose the information from the AST in a more accessible way to the next compilation steps Like I said, the symbol table matters because it exposes the information from the AST to the next compilation steps in a more accessible way.

Slide 217

Slide 217 text

So I wrote a function called “build symbols table” that receives the AST and visits each node.

Slide 218

Slide 218 text

I call the “symbols table” function recursively.

Slide 219

Slide 219 text

Then, if I ﬁnd the “macro" node…

Slide 220

Slide 220 text

…I record in the symbol table that a macro with this name and this code block was written.

Slide 221

Slide 221 text

If I ﬁnd a list of nodes, like a code block…

Slide 222

Slide 222 text

… I’ll map each command node from this block

Slide 223

Slide 223 text

If I ﬁnd the “call” node…

Slide 224

Slide 224 text

…I record in the symbol table that a macro with this name was read.

Slide 225

Slide 225 text

If I ﬁnd the “push” node…

Slide 226

Slide 226 text

…I record in the symbol table that I write to one variable and read some other variables. This is the idea behind how I build my symbol table.
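The walk described over the last slides can be sketched like this. It is a Python illustration rather than the talk's Elixir code, and the AST shape (dicts with a `type` key) and the layout of the table are assumptions made for the example.

```python
def build_symbols_table(node, table=None):
    """Walk the AST recursively and record what is written and what is read."""
    if table is None:
        table = {"macros": {"write": [], "read": []},
                 "variables": {"write": [], "read": []}}
    if isinstance(node, list):
        # A code block: visit each command node in order.
        for child in node:
            build_symbols_table(child, table)
    elif node["type"] == "macro":
        table["macros"]["write"].append(node["name"])
        build_symbols_table(node["block"], table)
    elif node["type"] == "call":
        table["macros"]["read"].append(node["name"])
    elif node["type"] == "push":
        table["variables"]["write"].append(node["array"])
        table["variables"]["read"].extend(node["reads"])
    return table

ast = [{
    "type": "macro",
    "name": "foo",
    "block": [
        {"type": "call", "name": "bar"},
        {"type": "push", "array": "@ori", "reads": ["$name"]},
    ],
}]
print(build_symbols_table(ast))
```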

Slide 227

Slide 227 text

But, OK, I have built a very complex structure and I need a way to read it! So I have a module called “Symbols Table” that has some helper functions.

Slide 228

Slide 228 text

Functions like “list the macros written and read”

Slide 229

Slide 229 text

List the variables written and read… and other functions.

Slide 230

Slide 230 text

To read the symbol table I made heavy use of Elixir’s Access module. It is useful for writing routines that access a complex nested structure.

Slide 231

Slide 231 text

Ok, now I need to use my symbols table on some validations.

Slide 232

Slide 232 text

For example, let’s check my macros, because I can’t read (call) a macro that was never written.

Slide 233

Slide 233 text

Then I have a module to validate this rule.

Slide 234

Slide 234 text

This gets the symbols table

Slide 235

Slide 235 text

read macros: [ "foo", "bar" ] In the next step, it lists the macros that we read, taken from the symbol table.

Slide 236

Slide 236 text

read macros: [ "foo", "bar" ] write macros: [ "foo" ] It lists the macros that we wrote.

Slide 237

Slide 237 text

read macros: [ "foo", "bar" ] write macros: [ "foo" ] difference: [ "bar" ] And it computes the difference… We can see that we are trying to read a macro that we never wrote!

Slide 238

Slide 238 text

read macros: [ "foo", "bar" ] write macros: [ "foo" ] difference: [ "bar" ] We need to raise an error because we are reading "bar" ! That’s bad! So the validation raises an error.
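The validation boils down to a list difference. A minimal Python sketch (illustrative; the function names and the `SemanticError` class are hypothetical):

```python
class SemanticError(Exception):
    """Raised when the program is syntactically valid but breaks a language rule."""

def undefined_macros(read_macros, written_macros):
    # Everything that is read but was never written, in reading order.
    written = set(written_macros)
    return [name for name in read_macros if name not in written]

def validate_macros(read_macros, written_macros):
    missing = undefined_macros(read_macros, written_macros)
    if missing:
        raise SemanticError("macro called but never written: " + ", ".join(missing))

print(undefined_macros(["foo", "bar"], ["foo"]))
```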

Slide 239

Slide 239 text

Parser Source code Semantic analysis Optimization Code generation OK, I have talked about the parser and the semantic analysis, and both steps raise errors.

Slide 240

Slide 240 text

Parser Source code Semantic analysis Optimization Code generation ERROR How do I handle the errors?

Slide 241

Slide 241 text

To show the syntax error, I need to raise an exception Do you remember the MacroBlock parser? If all parsers fail, a syntax error is raised. A funny detail is that the function that raises the error is itself another parser, and it reports the line and column where the error happened.
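How a position in the source becomes a line and a column can be sketched in a few lines. This Python helper is only an illustration of the idea and assumes the parser knows the character offset at which it failed; the function name is hypothetical.

```python
def line_and_column(source, offset):
    """Convert a 0-based character offset into a 1-based (line, column) pair."""
    before = source[:offset]
    line = before.count("\n") + 1
    # rfind returns -1 when there is no newline, which makes the column offset + 1.
    column = offset - (before.rfind("\n") + 1) + 1
    return line, column

source = "log a\n&push(@ori, x"
line, column = line_and_column(source, len(source))
print(f"syntax error at line {line}, column {column}")
```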

Slide 242

Slide 242 text

Slide 243

Slide 243 text

Slide 244

Slide 244 text

Slide 245

Slide 245 text

Syntax error Then I can have a very nice error message, like this one. For example, here the programmer forgot to write a close parenthesis.

Slide 246

Slide 246 text

Semantic error Another error that the compiler raises is a semantic error. To show a good semantic error message, we need more information in the AST, in nodes called metadata. To build these nodes, when the compiler parses some code, for example “log foo”, we have this parser. It uses a macro called “parser command”. This macro builds a sequence that calls a “getMetadata” function, which is itself another parser and returns the line and column of that code in the source.

Slide 247

Slide 247 text

log foo Semantic error

Slide 248

Slide 248 text

Slide 249

Slide 249 text

Slide 250

Slide 250 text

Slide 251

Slide 251 text

Slide 252

Slide 252 text

Slide 253

Slide 253 text

log foo Semantic error Then, instead of producing a very simple AST like this one…

Slide 254

Slide 254 text

log foo Semantic error

Slide 255

Slide 255 text

log foo Semantic error … we can have a more complete and useful AST!

Slide 256

Slide 256 text

macro example { $never_read_var = &rand(1, 4) # warning log number: $never_written_var # fatal error } Semantic error Then I can show exactly where a semantic error happened. The compiler has two types of semantic error: a warning and a fatal error. A warning is something weird but not wrong, for example, writing to a variable that is never read. A fatal error is something that is actually wrong, for example, trying to read a variable that was never written.
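The two kinds of diagnostics can be sketched as follows, in Python for illustration. The sketch assumes that the symbol table can provide, for each variable, the (line, column) metadata of where it is written or read; the function name, the input format and the message wording are hypothetical.

```python
def check_variables(writes, reads):
    """writes/reads map a variable name to the (line, column) where it appears."""
    diagnostics = []
    for name, (line, column) in writes.items():
        if name not in reads:
            # Weird but not wrong: the program still works.
            diagnostics.append(("warning", f"{line}:{column}: {name} is written but never read"))
    for name, (line, column) in reads.items():
        if name not in writes:
            # Actually wrong: compilation must stop.
            diagnostics.append(("fatal", f"{line}:{column}: {name} is read but never written"))
    return diagnostics

writes = {"$never_read_var": (2, 3)}
reads = {"$never_written_var": (3, 15)}
for severity, message in check_variables(writes, reads):
    print(severity, message)
```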

Slide 257

Slide 257 text

Parser Source code Semantic analysis Optimization Code generation Let’s go to the next compilation step…

Slide 258

Slide 258 text

OPTIMIZATION Parser Source code Semantic analysis Optimization Code generation … optimization! Now we want to remove some overhead from the code.

Slide 259

Slide 259 text

Constant folding One of the optimizations that I made on the compiler…

Slide 260

Slide 260 text

Constant folding …was constant folding. This optimization propagates the values that are known at compile time (strictly speaking, propagating known values is usually called constant propagation).

Slide 261

Slide 261 text

For example, this code will be compiled into this other code. Notice that we end up with fewer variable reads.

Slide 262

Slide 262 text

macro setVars { $foo = value $bar = &rand(1, 4) } macro example { $name = macabeus call setVars log foo: $foo log bar: $bar log name: $name $name = pagarme log name: $name }

Slide 263

Slide 263 text

macro setVars { $foo = value $bar = &rand(1, 4) } macro example { $name = macabeus call setVars log foo: $foo log bar: $bar log name: $name $name = pagarme log name: $name } ➡

Slide 264

Slide 264 text

Before:
macro setVars { $foo = value $bar = &rand(1, 4) }
macro example { $name = macabeus call setVars log foo: $foo log bar: $bar log name: $name $name = pagarme log name: $name }
➡ After:
macro setVars { $foo = value $bar = &rand(1, 4) }
macro example { $name = macabeus call setVars log foo: value log bar: $bar log name: macabeus $name = pagarme log name: pagarme }
For example, the first version of this code will be compiled into the second one. Notice that we end up with fewer variable reads.

Slide 265

Slide 265 text

Slide 266

Slide 266 text

Slide 267

Slide 267 text

macro setVars { $foo = value $bar = &rand(1, 4) } macro example { $name = macabeus call setVars log foo: $foo log bar: $bar log name: $name $name = pagarme log name: $name } Context about the variables at end of macros: setVars ($foo: value, $bar: is nondeterministic), example ($name: pagarme). Variables context at the arrow: (empty). Let’s optimize the “example” macro! At the arrow, the context tells us that the value of “name” is “macabeus”. After the call, I also have “foo” and “bar” in the context. Please notice that we don’t know the value of “bar” at compile time, because it comes from a random call that only runs at runtime. So we can optimize “foo” but we can’t optimize “bar”. Then we need to update the “name” variable, and we can propagate the new value of “name”. You can see the code of this optimization on my GitHub.

Slide 268

Slide 268 text

macro setVars { $foo = value $bar = &rand(1, 4) } macro example { $name = macabeus call setVars log foo: $foo log bar: $bar log name: $name $name = pagarme log name: $name } Context about the variables at end of macros: setVars ($foo: value, $bar: is nondeterministic), example ($name: pagarme). Variables context at the arrow: $name: macabeus.

Slide 269

Slide 269 text

macro setVars { $foo = value $bar = &rand(1, 4) } macro example { $name = macabeus call setVars log foo: $foo log bar: $bar log name: $name $name = pagarme log name: $name } Context about the variables at end of macros: setVars ($foo: value, $bar: is nondeterministic), example ($name: pagarme). Variables context at the arrow: $name: macabeus, $foo: value, $bar: is nondeterministic.

Slide 270

Slide 270 text

macro setVars { $foo = value $bar = &rand(1, 4) } macro example { $name = macabeus call setVars log foo: value log bar: $bar log name: $name $name = pagarme log name: $name } Context about the variables at end of macros: setVars ($foo: value, $bar: is nondeterministic), example ($name: pagarme). Variables context at the arrow: $name: macabeus, $foo: value, $bar: is nondeterministic.

Slide 271

Slide 271 text

macro setVars { $foo = value $bar = &rand(1, 4) } macro example { $name = macabeus call setVars log foo: value log bar: $bar log name: $name $name = pagarme log name: $name } Context about the variables at end of macros: setVars ($foo: value, $bar: is nondeterministic), example ($name: pagarme). Variables context at the arrow: $name: macabeus, $foo: value, $bar: is nondeterministic.

Slide 272

Slide 272 text

macro setVars { $foo = value $bar = &rand(1, 4) } macro example { $name = macabeus call setVars log foo: value log bar: $bar log name: macabeus $name = pagarme log name: $name } Context about the variables at end of macros: setVars ($foo: value, $bar: is nondeterministic), example ($name: pagarme). Variables context at the arrow: $name: macabeus, $foo: value, $bar: is nondeterministic.

Slide 273

Slide 273 text

macro setVars { $foo = value $bar = &rand(1, 4) } macro example { $name = macabeus call setVars log foo: value log bar: $bar log name: macabeus $name = pagarme log name: $name } Context about the variables at end of macros: setVars ($foo: value, $bar: is nondeterministic), example ($name: pagarme). Variables context at the arrow: $foo: value, $bar: is nondeterministic, $name: pagarme.

Slide 274

Slide 274 text

macro setVars { $foo = value $bar = &rand(1, 4) } macro example { $name = macabeus call setVars log foo: value log bar: $bar log name: macabeus $name = pagarme log name: pagarme } Context about the variables at end of macros: setVars ($foo: value, $bar: is nondeterministic), example ($name: pagarme). Variables context at the arrow: $foo: value, $bar: is nondeterministic, $name: pagarme.
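The walk-through above can be reproduced with a small sketch. This is a Python illustration, not the optimization from the talk's repository: it assumes macros are lists of simple `set`, `call` and `log` commands, treats any value starting with `&` as unknown at compile time, and ignores control flow and recursive macro calls. All names are hypothetical.

```python
import re

UNKNOWN = object()  # marks values only known at runtime, such as the result of &rand

def is_constant(value):
    # Assumption: anything that is not a &function(...) call is a plain constant.
    return not value.startswith("&")

def substitute(text, context):
    """Replace each $variable whose value is known at compile time."""
    def replace(match):
        value = context.get(match.group(0), UNKNOWN)
        return match.group(0) if value is UNKNOWN else value
    return re.sub(r"\$\w+", replace, text)

def propagate(macros, name, context=None):
    """Return (optimized commands, variable context at the end of the macro)."""
    context = {} if context is None else context
    optimized = []
    for command in macros[name]:
        kind = command[0]
        if kind == "set":
            _, variable, value = command
            context[variable] = value if is_constant(value) else UNKNOWN
            optimized.append(command)
        elif kind == "call":
            # Variables are global, so the callee's final context flows into ours.
            _, context = propagate(macros, command[1], context)
            optimized.append(command)
        elif kind == "log":
            optimized.append(("log", substitute(command[1], context)))
    return optimized, context

macros = {
    "setVars": [("set", "$foo", "value"), ("set", "$bar", "&rand(1, 4)")],
    "example": [
        ("set", "$name", "macabeus"), ("call", "setVars"),
        ("log", "foo: $foo"), ("log", "bar: $bar"), ("log", "name: $name"),
        ("set", "$name", "pagarme"), ("log", "name: $name"),
    ],
}
optimized, context = propagate(macros, "example")
for command in optimized:
    print(command)
```

The `$bar` read survives because its value is marked as unknown, while the other reads are replaced by the value that was current at that point in the macro.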

Slide 275

Slide 275 text

Parser Source code Semantic analysis Optimization Code generation Nice! Let’s go to the last step!

Slide 276

Slide 276 text

CODE GENERATION Parser Source code Semantic analysis Optimization Code generation Code generation! Since I need to compile Event Macro into a plugin that runs on OpenKore, and OpenKore is written in Perl, let’s compile to Perl.

Slide 277

Slide 277 text

header body footer I split the code generation into three steps.

Slide 278

Slide 278 text

header body footer Code generation for the header, for the body and for the footer.

Slide 279

Slide 279 text

header body footer

Slide 280

Slide 280 text

header body footer

Slide 281

Slide 281 text

header body footer

Slide 282

Slide 282 text

header body footer [ "push", "@values", ",", [ "f", "o", "o" ], ";" ] Please notice that this step is just a lot of array concatenation, where each element of the array is a piece of Perl code. I generate an array of depth N…

Slide 283

Slide 283 text

header body footer [ "push", "@values", ",", [ "f", "o", "o" ], ";" ] [ "push", "@values", ",", "f", "o", "o", ";" ] … then I flatten it down to depth 1…

Slide 284

Slide 284 text

header body footer push @values, "foo"; … and I use a join to get a string with the final code.
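The flatten-then-join step can be sketched in a few lines of Python (for illustration only). The slide's array is simplified; in this sketch the spaces and the quotes around the text are included as explicit elements, which is an assumption about how the final string gets its exact shape.

```python
def flatten(parts):
    """Yield the strings of an arbitrarily nested list, depth first."""
    for part in parts:
        if isinstance(part, list):
            yield from flatten(part)
        else:
            yield part

def render(parts):
    # Depth N -> depth 1 -> one string of Perl code.
    return "".join(flatten(parts))

nested = ["push", " ", "@values", ",", " ", ['"', ["f", "o", "o"], '"'], ";"]
print(render(nested))
```

Building nested arrays first keeps each generator function simple: it only returns its own pieces and never has to know how deep it sits in the final output.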

Slide 285

Slide 285 text

header body footer Let’s talk about the header.

Slide 286

Slide 286 text

header body footer make the boilerplate One thing I need to do is generate the boilerplate that registers the plugin on OpenKore. It’s just adding some strings to the array.

Slide 287

Slide 287 text

header body footer find what variables are written to declare it find what modules from OpenKore we need to import Another task is to find which variables are written, so that they can be declared in the global context of the plugin. This is important because in Event Macro variables are always global, while in Perl variables have scope and need to be declared before use. A similar task is to find which OpenKore modules we need to import.

Slide 288

Slide 288 text

header body footer find what variables are written to declare it find what modules from OpenKore we need to import To do this, we just need to visit each AST node.

Slide 289

Slide 289 text

header body footer find what variables are written to declare it find what modules from OpenKore we need to import If we find a LogCommand node, we know that we need to import the “Log message” module from OpenKore. If we find an ArrayVariable node, we know that we need to declare an array variable with this name.
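This header pass can be sketched as one more AST walk. The Python below is only an illustration: the AST shape (dicts with a `type` key) is assumed, and the emitted `use Log;` line is a placeholder for whatever import OpenKore actually requires.

```python
def collect_header(node, imports=None, declarations=None):
    """Visit every AST node and collect the imports and declarations it implies."""
    imports = set() if imports is None else imports
    declarations = set() if declarations is None else declarations
    if isinstance(node, list):
        for child in node:
            collect_header(child, imports, declarations)
    elif isinstance(node, dict):
        if node["type"] == "LogCommand":
            imports.add("Log")  # placeholder module name
        elif node["type"] == "ArrayVariable":
            declarations.add("@" + node["name"])
        # Keep walking: child nodes are stored as dict or list values.
        for value in node.values():
            if isinstance(value, (list, dict)):
                collect_header(value, imports, declarations)
    return imports, declarations

def render_header(imports, declarations):
    lines = [f"use {module};" for module in sorted(imports)]
    lines += [f"my {name};" for name in sorted(declarations)]
    return lines

ast = [
    {"type": "LogCommand", "text": "hi"},
    {"type": "PushCommand",
     "array": {"type": "ArrayVariable", "name": "values"},
     "text": "foo"},
]
print(render_header(*collect_header(ast)))
```

Collecting into sets means a module is imported once and a variable is declared once, however many times it appears in the program.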

Slide 290

Slide 290 text

Slide 291

Slide 291 text

header body footer find what variables are written to declare it  find what modules from OpenKore we need to import Then, add this result to the array.

Slide 292

Slide 292 text

header body footer ✏ find what macros are written to make them callable Also, we need to find which macros are written, to make them callable from the CLI. To do that, I use my symbol table, through the function "list written macros".
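To make the idea concrete, here is a very small Python sketch of a symbol table that can answer "which macros are written?". It is not the compiler's real symbol table: the class and the method names `write_macro`, `call_macro` and `list_written_macros` are hypothetical, as are the macro names in the example.

```python
class SymbolTable:
    """Hypothetical symbol table that records which macros are written and called."""

    def __init__(self):
        self._written = []    # macro names, in the order they were defined
        self._called = set()  # macro names that are called somewhere

    def write_macro(self, name: str) -> None:
        """Record that a macro with this name is defined in the source."""
        if name not in self._written:
            self._written.append(name)

    def call_macro(self, name: str) -> None:
        """Record that a macro with this name is called in the source."""
        self._called.add(name)

    def list_written_macros(self) -> list:
        """Return the defined macros, in definition order."""
        return list(self._written)


table = SymbolTable()
table.write_macro("goToTown")
table.call_macro("goToTown")
table.write_macro("sellItems")
print(table.list_written_macros())  # ['goToTown', 'sellItems']
```

The header generator would then loop over this list and emit one registration fragment per macro.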

Slide 293

Slide 293 text

Slide 294

Slide 294 text

header body footer make the boilerplate  find what variables are written to declare it  find what modules from OpenKore we need to import  ✏ find what macros are written to make them callable In the header, I do these four things.

Slide 295

Slide 295 text

header body footer The next step is generating the code for the body.

Slide 296

Slide 296 text

header body footer &push(@values, foo) push @values, "foo"; Somehow I need to translate this Event Macro code into this Perl code.

Slide 297

Slide 297 text

header body footer &push(@values, foo) ... Like I said, we have some parsers…

Slide 298

Slide 298 text

header body footer &push(@values, foo) ... … that generate an AST. Using this push command, we will get this AST

Slide 299

Slide 299 text

header body footer &push(@values, foo) ... Let's visit each AST node. At the PushCommand node…

Slide 300

Slide 300 text

header body footer &push(@values, foo) [ "push", ..., ",", ..., ";" ] … we start generating the final code into the array. We have "push", something, a comma, something and a semicolon. Let's recursively call the other "generate" functions.

Slide 301

Slide 301 text

header body footer &push(@values, foo) [ "push", "@values", ",", ..., ";" ] At this sub-tree, we will generate the array variable reference. In this case, the code will be “at-values”.

Slide 302

Slide 302 text

header body footer &push(@values, foo) [ "push", "@values", ",", [ "f", "o", "o" ], ";" ] The code generator for the TextValue node is a little more complex, because we need to handle variable interpolation, but in this case the array is simply "f", "o", "o".

Slide 303

Slide 303 text

header body footer push @values, "foo"; [ "push", "@values", ",", [ "f", "o", "o" ], ";" ] Then we just need to apply the flatten and the join, and we have the equivalent code in Perl.
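The recursive "generate" calls described above can be sketched in Python as follows. The node classes and the single `generate` function are hypothetical stand-ins, the TextValue case ignores variable interpolation entirely, and spaces and quotes are added as explicit fragments so that the joined result is valid Perl.

```python
from dataclasses import dataclass


@dataclass
class ArrayVariable:
    name: str


@dataclass
class TextValue:
    text: str


@dataclass
class PushCommand:
    array: ArrayVariable
    value: TextValue


def generate(node):
    """Return the Perl fragments for a node: a string or a (possibly nested) list."""
    if isinstance(node, PushCommand):
        # "push", something, comma, something, semicolon: the "somethings" recurse.
        return ["push", " ", generate(node.array), ",", " ", generate(node.value), ";"]
    if isinstance(node, ArrayVariable):
        return "@" + node.name
    if isinstance(node, TextValue):
        # Simplification: literal characters only, no variable interpolation.
        return ['"', list(node.text), '"']
    raise TypeError(f"no generator for {type(node).__name__}")


def flatten(fragments):
    """Yield every string in a nested list, depth first."""
    for fragment in fragments:
        if isinstance(fragment, list):
            yield from flatten(fragment)
        else:
            yield fragment


ast = PushCommand(ArrayVariable("values"), TextValue("foo"))
fragments = generate(ast)
print("".join(flatten(fragments)))  # push @values, "foo";
```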

Slide 304

Slide 304 text

header body footer &rand(1, 4) ("1" + int(rand(1 + "4" - "1"))) The push command is very similar in Event Macro and in Perl, but other commands differ between the two languages. For example, the "rand" command generates very different code in Perl.
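Why does the Perl side look like that? Perl's `rand(n)` returns a fractional number from 0 up to, but not including, n, so the generated expression takes the integer part and shifts it by the minimum to get a whole number between the two bounds, inclusive. A hedged Python sketch of such a generator follows; `generate_rand` and `model_rand` are hypothetical names, and `model_rand` is only a Python model of the Perl expression used to check its range.

```python
import random


def generate_rand(minimum: str, maximum: str) -> list:
    """Return the Perl fragments for an Event Macro &rand(minimum, maximum) call."""
    return ["(", f'"{minimum}"', " + ", "int(", "rand(", "1 + ",
            f'"{maximum}"', " - ", f'"{minimum}"', ")", ")", ")"]


def model_rand(minimum: int, maximum: int) -> int:
    """Python model of the generated Perl expression."""
    # random.random() * n plays the role of Perl's rand(n): a value in [0, n).
    return minimum + int(random.random() * (1 + maximum - minimum))


perl = "".join(generate_rand("1", "4"))
print(perl)  # ("1" + int(rand(1 + "4" - "1")))
print(sorted({model_rand(1, 4) for _ in range(1000)}))
```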

Slide 305

Slide 305 text

body footer header Finally, the last code generation step is the "footer". Every Perl module should end with a true value, to signal that the module was loaded successfully. So we just add "1 semicolon" to the array.
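Putting the three parts together, a minimal Python sketch might look like this. The header lines used here are placeholders (the real plugin boilerplate is not reproduced), and `flatten` and `assemble` are hypothetical helper names.

```python
def flatten(fragments):
    """Yield every string in a nested list, depth first."""
    for fragment in fragments:
        if isinstance(fragment, list):
            yield from flatten(fragment)
        else:
            yield fragment


def assemble(header: list, body: list) -> str:
    """Concatenate header, body and footer, then flatten and join them."""
    footer = ["1;", "\n"]  # a Perl module must end with a true value
    return "".join(flatten([header, body, footer]))


# Placeholder header: an assumed import and an assumed declaration.
header = ["use Log qw(message);", "\n", "my @values;", "\n"]
body = [["push", " ", "@values", ",", " ", ['"', "foo", '"'], ";"], "\n"]
program = assemble(header, body)
print(program)
```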

Slide 306

Slide 306 text

Parser Source code Semantic analysis Optimization Code generation Yeah! We finished all the compilation steps!

Slide 307

Slide 307 text

FINALLY… And to finish this little talk…

Slide 308

Slide 308 text

Thanks to Pedro Castilho because he helped me a lot to build this compiler and to make this talk I really want to thank Pedro Castilho, because he helped me a lot to code the compiler and to prepare this talk.

Slide 309

Slide 309 text

Images source
- https://darkchiichan.deviantart.com/
- https://www.newgrounds.com/art/view/shidoisnthere/tree-pixel-art-2
Where to learn more
- DSL & eDSL http://bit.ly/quora-edsl
- Syntax analysis http://esprima.org/
- About languages design x compiler
  - http://bit.ly/quora-language-x-compiler
  - http://bit.ly/quora-language-x-compiler-2
These are the sources of the images that I used in this talk, and some links to learn more about the subjects that I talked about.

Slide 310

Slide 310 text

Where to learn more
- Very didactic material about how parser combinators work http://theorangeduck.com/page/you-could-have-invented-parser-combinators
- Very good material about parsers in general https://tomassetti.me/guide-parsing-algorithms-terminology/
- Parslet, a Ruby library for building parsers using PEG
And more links. I highly recommend the second one, a guide to parsers, because the author explains the subject very well.

Slide 311

Slide 311 text

Pagar.me Talks about this same subject in Portuguese https://youtu.be/t77ThZNCJGY https://youtu.be/q9T6Y2ZjE54 I presented this same subject on the YouTube channel of the company that I work for, Pagar.me Talks. Those talks are in Portuguese and cover a few more topics, like tests and recursive grammars.

Slide 312

Slide 312 text

http://bit.ly/asciinema-compiler I recorded an asciinema session in which I add a new command to the compiler, and I explain each step.

Slide 313

Slide 313 text

github.com/macabeus/macro-compiler Finally, you can find on GitHub the source code of the compiler that I showed here, Macro Compiler.

Slide 314

Slide 314 text

THANK YOU!  OBRIGADO! Thank you! I hope that this little talk about languages and compilers has helped you to increase your curiosity about these subjects.