
Demystifying compilers by writing your own

Learning more about compilers is a great way to demystify what happens between the moment you start building your code and the moment you get its output. This demystification is a good step toward becoming a better developer and expanding our horizons: this knowledge not only helps us understand how tools like Babel, virtual machines, and other parts of our everyday routine work, but also lets us see in depth how code optimization, reverse engineering, and other obscure techniques are done.

In this talk, I'll use a compiler that I'm currently working on as a case study to show you how to build one from scratch using Elixir, explaining each phase of the compilation process and how Elixir can help us with this challenge. By the end of the talk, you'll have earned your Compilers 101 degree.

Bruno Macabeus

September 22, 2018

Transcript

  1. DEMYSTIFYING COMPILERS BY WRITING YOUR OWN

    Hey, let’s start the talk about “Demystifying compilers by writing your own”.
  2. Bruno Macabeus, github.com/macabeus, macalogs.com.br, Developer at Pagar.me

    I’m Bruno Macabeus, a software developer at Pagar.me. Here is my blog, and here is my GitHub, where you can find the source code of the compiler that I will show in this talk.
  3. I’ll talk about several apparently distinct subjects, like the online game Ragnarok, programming languages, and even trees. By the end of the talk, all these subjects will be connected and make sense.
  7. Definitions: A compiler is software that translates code from language A to language B

    A very simple definition is that a compiler is software that translates code from language A to language B.
  8. Definitions: A ≠ B?

    But this raises a question: must A be different from B, or does it make sense for a compiler to translate code from language A to language A?
  9. Definitions: Closure Compiler, Babel

    We have a very nice compiler called Babel. It translates JS code to JS code, either between different JS versions or within the same JS version while generating optimised code that runs faster or is lighter.
  10. Definitions: A compiler is software that translates code from language A to language B

    So a compiler that translates code from language A to language A does make sense.
  11. Definitions: A could be equal to B!
  12. Definitions: A compiler is software that translates code from language A to language B

    But does a compiler need a language as its input?
  13. Definitions: But does it need to compile between languages?

    Or does it make sense for a compiler to translate from something that isn’t a language, for example, a document?
  14. Definitions: Closure Compiler, Pagedraw

    We have a very nice compiler called Pagedraw, a website that translates a document, like a webpage layout made in Sketch, into web languages, like JS, CSS…
  15. Definitions: A compiler is software that transforms a data representation into another data representation that is somehow related or equivalent to the first

    So we have a more generic definition: a compiler is software that transforms a data representation into another data representation that is somehow related or equivalent to the first.
  16. OK… BUT WE ALREADY HAVE MANY, MANY COMPILERS AND LANGUAGES…

    OK… But we already have many, many compilers and languages…
  17. WHY DO WE NEED TO CREATE ONE MORE?

    Why do we need to create one more compiler?
  18. Reason #1: The language that we use has design problems

    We have many reasons. For example, the languages that we use have design problems that slow down our work.
  19. TypeScript is a JS superset that compiles to JS, with the difference that it implements types

    TypeScript is a JS superset that compiles to JS, with the difference that it implements types. It’s very useful for more complex applications.
  20. JS:

        const createUser = (id, level) => {
          // code...
        }

        createUser(42, 'guest')
        // Is ID really a number?
        // What are all the valid levels?
        // Is the created user returned?

    For example, we have a function called “createUser” that receives two parameters: id and level. But we don’t know if “id” really is a number. We don’t know all the valid levels. And we don’t know if the new user is returned by this function.
  21. TypeScript:

        enum level {
          guest = 'guest',
          normal = 'normal',
          admin = 'admin',
        }

        const createUser = (id: number, level: level) => {
          // code...
        }

        createUser(42, level.guest)
        // All doubts are answered by reading the signature!

    With TypeScript we don’t have these questions, because the function signature carries more information. Reading it, we find that “id” really is a number, we discover all the valid levels, and we find that the function doesn’t return the new user. Maybe it saves it to a database.
  22. Reason #2: To study a new concept

    Another reason is when we are studying a new concept that current languages don’t implement.
  23. Koka is a language that separates values from side effects, and it infers the side effects at compile time

    For example, Koka is a language that people at Microsoft Research are developing and studying. It aims to separate values from side effects and to infer the side effects at compile time.
  24. Swift:

        func getEvenNumber() -> Int {
          // some code...
        }

        getEvenNumber()
        // does this func perform IO? is the result deterministic?
        // maybe this function never ends?

    For example, we have this code written in Swift, and this function returns an even number. But we don’t know if this function performs IO, if its result is deterministic, or whether it ever ends.
  25. Koka:

        function getEvenNumber() : ndet int {
          // some code…
        }

        function main() {
          getEvenNumber()
          // reading the signature we discover that the
          // result is a random number!
        }

    With Koka, we don’t have any doubts. We only need to read the function’s signature: the “ndet” keyword tells us that this function is nondeterministic.
  26. Koka and Swift, with the function bodies filled in:

        // Koka
        function getEvenNumber() : ndet int {
          return randomInt() * 2
        }

        // Swift
        func getEvenNumber() -> Int {
          return Int(arc4random()) * 2
        }

    In this case, both functions use a random function.
  27. Reason #3: When you don’t have any issue to solve, but you still want to write some cool code

    Another reason is when you don’t have any issue to solve, but you still want to write some cool code.
  28. Piet, an esoteric language where the code is written using pixels!

    This includes esoteric languages, such as Piet. In this language you write code using pixels!
  29. Piet, an esoteric language where the code is written using pixels!

    Believe me, this code is a sum and subtraction calculator.
  30. Reason #4: To solve a specific issue

    Another reason is when you have a very specific issue to solve.
  31. Logo, a language with an educational purpose

    It has very nice visual feedback: the student writes code that the “turtle” uses to draw shapes. A special-purpose language isn’t necessarily a domain-specific language. You can do anything with Logo, even though the language design aims to be useful in education.
  32. EventMacro, a language for writing macros for a bot for the game Ragnarok

    Then we also have the language that I’m working on, which exists to solve a specific issue: writing macros for the online game Ragnarok.
  33. So, what’s the solution? To make the computer play for you, using software called OpenKore, a bot for Ragnarok!
  34. OpenKore
  35. EventMacro

    OpenKore can perform simple actions, but for more complex actions you need to write some code. The language for writing this code is called EventMacro.
  36. EventMacro

        automacro ref {
          InInventory "Rough Oridecon" > 4
          call ref-while
        }

        macro ref-while {
          log go to refiner
          do move prt_in 59 60
          $ori = &invamount(Rough Oridecon)
          log I have $ori Rough Oridecons
          while (&invamount(Rough Oridecon) > 4) {
            do talk 0
            pause 0.8
            do talk resp 0
          }
        }

    Let’s learn more about this language.
  37. We can declare two types of instruction blocks: automacro and macro.
  38. In an automacro block, we write the conditions for calling a macro. For example, if we have more than 4 of this item in the inventory…
  39. … the bot should call the macro ref-while. A macro is a sequence of actions that the bot should perform.
  40. Let’s print a message to the console (the first “log” line).
  41. Let’s move the character on a map (the “do move” line).
  42. Let’s save the amount of an item in the inventory to a variable (the “$ori” assignment).
  43. Let’s print this amount (the second “log” line).
  44. And let’s perform an action while we have more than 4 of those (the “while” block).
  45. Just a curiosity: one important contributor to this language is Brazilian.
  46. EventMacro: very influenced by Perl

    Since OpenKore is written in Perl, EventMacro is heavily influenced by Perl.
  47. EventMacro: very influenced by Perl

        my $scalar = 'foo';
        my @array = (1, 2, 3);
        my %hash = (1 => 'foo', 2 => 'bar');

    For example, in Perl and in EventMacro we have three types of variables:
    - scalar, which begins with “dollar”
    - array, which begins with “at”
    - hash, which begins with “percent”
  48. EventMacro: very influenced by Perl

        my $variable = 'foo';
        print "variable value: $variable"

    Another influence is implicit string interpolation.
  49. EventMacro: it is run by an OpenKore plugin that reads each line

    EventMacro is run by an OpenKore plugin that reads each line…
  50. EventMacro: developed to make it easy to write a regular-expression-based interpreter

    … and, since the plugin tries to match each line against a regular expression, the language grammar was designed to make it easy to write a regular-expression-based interpreter.
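The idea of matching each line against a regular expression can be sketched as below. This is a minimal Python illustration, not OpenKore's plugin code: the three patterns and the `interpret` function are hypothetical, and only assignment, `log`, and `pause` are recognised.

```python
import re

# Hypothetical patterns for three EventMacro-like commands; the real
# plugin's regular expressions are not shown in the talk.
PATTERNS = [
    ("assign", re.compile(r"^\$(\w+)\s*=\s*(.+)$")),
    ("log", re.compile(r"^log\s+(.*)$")),
    ("pause", re.compile(r"^pause\s+(\d+(?:\.\d+)?)$")),
]


def interpolate(text, variables):
    """Replace every $name in text with the variable's current value."""
    return re.sub(r"\$(\w+)", lambda m: variables[m.group(1)], text)


def interpret(source):
    """Run the script line by line and return the logged messages."""
    variables = {}
    output = []
    for number, raw in enumerate(source.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        for name, pattern in PATTERNS:
            match = pattern.match(line)
            if match:
                break
        else:
            # No pattern matched: the error only shows up while running.
            raise SyntaxError(f"line {number}: cannot parse {line!r}")
        if name == "assign":
            variables[match.group(1)] = match.group(2)
        elif name == "log":
            output.append(interpolate(match.group(1), variables))
        # "pause" is accepted but ignored in this sketch.
    return output
```

Note that a malformed line is only reported when execution reaches it, which is the kind of late feedback a compiler can move to compile time.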
  51. MacroCompiler: We can build a compiler for EventMacro to translate EventMacro code into an OpenKore plugin!

    And take a look at this nice idea! We can build a compiler for EventMacro to translate EventMacro code into an OpenKore plugin!
  52. MacroCompiler: Error and warning messages at compile time

    With a compiler, we can have error and warning messages at compile time, and the sooner you find an error, the easier it is to fix.
  53. MacroCompiler: Optimized final code

    And a compiler can generate optimized code, removing some overhead.
  54. MAYBE BUILDING A COMPILER ISN’T THE BEST SOLUTION

    And, a disclaimer: building a compiler may not be the best solution for your issue.
  55. An eDSL (embedded domain-specific language) could be a simpler solution

    An embedded domain-specific language may be a simpler solution.
  56. An eDSL (embedded domain-specific language) could be a simpler solution

    It’s a way to structure the public API of a library as a programming language; that is, it has primitive keywords that can be combined to build routines.
  57. An eDSL (embedded domain-specific language) could be a simpler solution

        // JS
        const myRegexp = /^age (\d+)/

        // CSS selector in JS + jQuery
        const element = $('#foo .bar');

    For example, you can see jQuery’s CSS selectors, and regular expressions in JS, as a small language inside a bigger language.
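To make "primitive keywords that can be combined to build routines" concrete, here is a small Python sketch of an embedded DSL for macros. The `Macro` class and its `log`, `pause`, and `run` methods are invented for this illustration and do not belong to any library mentioned in the talk.

```python
class Macro:
    """A tiny embedded DSL: plain Python methods act as the 'keywords'."""

    def __init__(self, name):
        self.name = name
        self.steps = []

    def log(self, text):
        self.steps.append(("log", text))
        return self  # returning self lets the calls be chained

    def pause(self, seconds):
        self.steps.append(("pause", seconds))
        return self

    def run(self):
        """Execute the steps; pauses are skipped in this sketch."""
        return [argument for kind, argument in self.steps if kind == "log"]


# The routine reads like a small language, but it is ordinary host-language code.
greet = Macro("sayHi").log("Hi, Macabeus").pause(0.8).log("Bye")
```

No parser or code generator is needed here: the host language does the parsing, which is why an eDSL can be the cheaper option.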
  58. Haskell + Functional MetaPost library

        beginfig(1)
        pair A, B, C;
        A:=(0, 0); B:=(1cm, 0); C:=(0, 1cm);
        draw A--B--C--cycle;
        endfig;

    Another example is the Haskell library Functional MetaPost. Looking at this code you might think it is a language for drawing shapes, but it’s only a library.
  59. BUILDING A LANGUAGE VS. A COMPILER

    And another disclaimer: building a new language is very different from building a compiler.
  60. When you are designing a language, you need to think more about the communication between the programmer and the computer

    When you are building a new language, you should think more about the communication between the programmer and the computer.
  61. Building a compiler is more like any other software development challenge

    And building a compiler is more like any other software development challenge.
  62. HOW TO BUILD A COMPILER?

    OK, but I want to build a new compiler. How can I build it?
  63. Source code → Syntax analysis → Parser → Semantic analysis → Optimization → Code generation

    First of all, we have source code in some language. The next step is “syntax analysis” (more commonly called lexical analysis or scanning), which transforms the source code into a stream of tokens and passes it to a “parser”, which builds a structure called an “abstract syntax tree”.
  64. Scannerless parsing

    But, hey! “Syntax analysis” and the “parser” can be implemented as a single step; it’s more of an implementation detail. A compiler design in which these steps aren’t split is called “scannerless parsing”.
  65. Source code → Syntax analysis → Parser → Semantic analysis → Optimization → Code generation

    Next we have “semantic analysis”, which checks whether the “abstract syntax tree” is semantically valid. Then we can run an “optimization” step to remove overhead. Finally, we have “code generation”, which produces the code in the target language.
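The phases above can be pictured as functions chained one after another. The sketch below is a toy, under loud assumptions: the source language has only `log <text>` lines, the target is Python `print` calls, and every function name is made up for the example. It is not how MacroCompiler is implemented.

```python
def parse(source):
    """Source text -> AST: a list of (keyword, text) nodes, one per line."""
    nodes = []
    for line in source.splitlines():
        line = line.strip()
        if line:
            keyword, _, text = line.partition(" ")
            nodes.append((keyword, text))
    return nodes


def check(ast):
    """Semantic analysis: reject commands this toy language does not know."""
    for keyword, _ in ast:
        if keyword != "log":
            raise ValueError(f"unknown command: {keyword}")
    return ast


def optimize(ast):
    """Optimization: drop logs of empty text."""
    return [node for node in ast if node[1]]


def generate(ast):
    """Code generation: emit one print call per log node."""
    return "\n".join(f'print("{text}")' for _, text in ast)


PHASES = [parse, check, optimize, generate]


def compile_source(source):
    """Feed the output of each phase into the next one."""
    data = source
    for phase in PHASES:
        data = phase(data)
    return data
```

Because each phase only depends on the output of the previous one, phases can be merged (as in scannerless parsing) or added without touching the others.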
  66. Source code:

        macro sayHi {
          $someone = Macabeus
          log Hi, $someone
        }

    In this example, we have a very simple piece of EventMacro code. We have a macro called “sayHi”. We assign the constant value “Macabeus” to a scalar variable called “someone”. And, on the next line, we send to the console the message “Hi” and the value of “someone”.
  67. Token stream:

        [
          keyword(macro),
          identifier(sayHi),
          openBraces, newLine,
          scalarIdentifier(someone),
          equal,
          text(Macabeus), newLine,
          keyword(log),
          text(Hi, ),
          scalarIdentifier(someone),
          newLine, closeBraces
        ]

    In the next step we could build a stream of tokens.
  68. We find the keyword “macro”…
  69. We find the identifier “sayHi”…
  70. openBraces…
  71. … and many other tokens.
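A small Python sketch that produces the token stream shown on the slide for the `sayHi` macro. It is an assumption-heavy illustration: it works line by line with regular expressions (a production lexer usually scans character by character), it knows only the macro header, assignment, `log`, and the closing brace, and the helper names are invented.

```python
import re

SOURCE = "macro sayHi {\n  $someone = Macabeus\n  log Hi, $someone\n}"


def text_tokens(text):
    """Split free text into text(...) and scalarIdentifier(...) tokens."""
    tokens = []
    for part in re.split(r"(\$\w+)", text):
        if part.startswith("$"):
            tokens.append(("scalarIdentifier", part[1:]))
        elif part:
            tokens.append(("text", part))
    return tokens


def line_tokens(line):
    """Tokenize one already-stripped line."""
    header = re.fullmatch(r"macro\s+([\w-]+)\s*\{", line)
    if header:
        return [("keyword", "macro"), ("identifier", header.group(1)), ("openBraces",)]
    assign = re.fullmatch(r"\$(\w+)\s*=\s*(.*)", line)
    if assign:
        return [("scalarIdentifier", assign.group(1)), ("equal",)] + text_tokens(assign.group(2))
    log = re.fullmatch(r"log\s+(.*)", line)
    if log:
        return [("keyword", "log")] + text_tokens(log.group(1))
    if line == "}":
        return [("closeBraces",)]
    raise SyntaxError(f"unexpected input: {line!r}")


def tokenize(source):
    """Return the token stream, with a newLine token between lines."""
    tokens = []
    lines = [line.strip() for line in source.strip().splitlines()]
    for index, line in enumerate(lines):
        if index > 0:
            tokens.append(("newLine",))
        tokens += line_tokens(line)
    return tokens
```

Calling `tokenize(SOURCE)` yields the thirteen tokens listed on the slide, represented here as tuples.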
  72. But my compiler doesn’t have this step, because I used scannerless parsing…
  73. … where we build the abstract syntax tree directly from the source code.
  74. AST

    The “AST” is a way to represent the entire source code as a tree, where each node is a simple part of the source code.
  75. Let’s see how to build this AST.
  76. At this point the compiler sees that we wrote a macro block. So it adds a “Macro” node with two attributes: the name “sayHi” and an instruction block, which is an array.
  77. Next comes a scalar assignment command. The node has attributes for the scalar variable name and the value to assign, in this case a text value.
  78. And the text value is the literal “Macabeus”.
  79. The next instruction in the macro block is stored in the next slot of the array. The compiler finds the keyword “log”, and this node has a Text attribute to log.
  80. The text value has a “Hi”…
  81. … and an interpolation of the “someone” variable.
  82. Since we succeeded in building the AST, this code is syntactically correct.
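The tree described in these slides can be sketched with Python dataclasses. The node names below (`Macro`, `ScalarAssign`, `Log`, `TextValue`, `ScalarVariable`) are assumptions modelled on the narration, not the classes of the real compiler, and `parse_macro` is a deliberately tiny scannerless parser that understands only this one macro shape.

```python
import re
from dataclasses import dataclass

SOURCE = "macro sayHi {\n  $someone = Macabeus\n  log Hi, $someone\n}"


@dataclass
class ScalarVariable:
    name: str


@dataclass
class TextValue:
    parts: list  # literal strings mixed with ScalarVariable nodes


@dataclass
class ScalarAssign:
    variable: ScalarVariable
    value: TextValue


@dataclass
class Log:
    text: TextValue


@dataclass
class Macro:
    name: str
    block: list  # the instruction block, in source order


def parse_text(text):
    """Build a TextValue, turning $name into an interpolation node."""
    parts = []
    for piece in re.split(r"(\$\w+)", text):
        if piece.startswith("$"):
            parts.append(ScalarVariable(piece[1:]))
        elif piece:
            parts.append(piece)
    return TextValue(parts)


def parse_macro(source):
    """Scannerless sketch: build the AST straight from the source text."""
    lines = [line.strip() for line in source.strip().splitlines()]
    header = re.fullmatch(r"macro\s+([\w-]+)\s*\{", lines[0])
    if not header or lines[-1] != "}":
        raise SyntaxError("expected: macro <name> { ... }")
    block = []
    for line in lines[1:-1]:
        assign = re.fullmatch(r"\$(\w+)\s*=\s*(.*)", line)
        log = re.fullmatch(r"log\s+(.*)", line)
        if assign:
            block.append(ScalarAssign(ScalarVariable(assign.group(1)), parse_text(assign.group(2))))
        elif log:
            block.append(Log(parse_text(log.group(1))))
        else:
            raise SyntaxError(f"unexpected instruction: {line!r}")
    return Macro(header.group(1), block)
```

If `parse_macro` returns without raising, the input was syntactically valid for this toy grammar, which mirrors the remark on slide 82.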
  83. Let’s go to the next step…
  84. … semantic analysis, which checks whether this code is semantically correct. For example, we check whether the code tries to read a variable that was never written. If so, the code has a semantic error.
  85. Symbol table

    To do these checks, we need to build a structure called a symbol table. A symbol table exposes the information from the AST in a more accessible way.
  86. Symbol table: macro_write: sayHi

    To build this structure, we need to visit each AST node. In this node, we are writing a new macro, so let’s record it in the symbol table: we are writing a macro called “sayHi”.
  87. Symbol table: macro_write: sayHi, variable_write: $someone

    In this subtree we are writing a scalar variable called “someone”.
  88. Symbol table: macro_write: sayHi, variable_write: $someone, variable_read: $someone

    And in this subtree we are reading the variable “someone”.
  89. So we can check that every variable that was read was also written; therefore, this code is semantically correct!
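A Python sketch of the symbol-table walk. The tuple-based AST shape is a simplification invented for this example, and the check follows the narration literally: it flags variables that are read but never written anywhere, without looking at the order of reads and writes.

```python
# Hypothetical AST shape: a program is a list of ("macro", name, block);
# a block holds ("assign", variable, parts) and ("log", parts) nodes, and
# parts mixes literal strings with ("var", name) reads.
PROGRAM = [
    ("macro", "sayHi", [
        ("assign", "someone", ["Macabeus"]),
        ("log", ["Hi, ", ("var", "someone")]),
    ]),
]


def reads(parts):
    """List the variable reads found in a text value."""
    return [("variable_read", part[1]) for part in parts if isinstance(part, tuple)]


def build_symbol_table(program):
    """Walk every node and record macro/variable writes and reads in order."""
    table = []
    for _, name, block in program:
        table.append(("macro_write", name))
        for node in block:
            if node[0] == "assign":
                table += reads(node[2])  # the assigned value is read first
                table.append(("variable_write", node[1]))
            elif node[0] == "log":
                table += reads(node[1])
    return table


def never_written(table):
    """Return the variables that are read but never written."""
    written = {name for kind, name in table if kind == "variable_write"}
    return [name for kind, name in table
            if kind == "variable_read" and name not in written]
```

An empty result from `never_written` corresponds to "this code is semantically correct" for this single check.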
  90. Let’s go to the next compilation step…
  91. … optimization! In this step we want to remove overhead while preserving the semantics of the code.
  92. Optimizations: constant folding, dead code strip

    I implemented two very simple optimisations: constant folding and dead code strip. In constant folding, we propagate constant values from variables in order to reduce variable reads. In dead code strip, we remove unnecessary code.
  93. Let’s start with constant folding. On this line we can see that we are assigning a constant value to a variable…
  94. … and we are using this value on the next line.
  95. Optimizations (constant folding): the AST now contains “$someone = Macabeus” and “log Hi, Macabeus”

    Since we know this value at compile time, we can use it directly. Notice that the AST is simpler now.
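A Python sketch of the pass the talk calls constant folding (substituting a variable's known constant value into its uses, often called constant propagation elsewhere). It assumes a hypothetical tuple-based block of `("assign", name, parts)` and `("log", parts)` nodes and only handles straight-line code; on any other node it conservatively forgets what it knows.

```python
def substitute(parts, constants):
    """Replace known variables by their text and merge adjacent literals."""
    merged = []
    for part in parts:
        if isinstance(part, tuple) and part[1] in constants:
            part = constants[part[1]]
        if isinstance(part, str) and merged and isinstance(merged[-1], str):
            merged[-1] += part
        else:
            merged.append(part)
    return merged


def propagate_constants(block):
    """Rewrite a block, replacing reads of variables with known constant text."""
    constants = {}
    result = []
    for node in block:
        if node[0] == "assign":
            _, name, parts = node
            parts = substitute(parts, constants)
            if all(isinstance(part, str) for part in parts):
                constants[name] = "".join(parts)
            else:
                constants.pop(name, None)  # the value is no longer known
            result.append(("assign", name, parts))
        elif node[0] == "log":
            result.append(("log", substitute(node[1], constants)))
        else:
            constants.clear()  # unknown node (e.g. a loop): assume nothing
            result.append(node)
    return result


BLOCK = [
    ("assign", "someone", ["Macabeus"]),
    ("log", ["Hi, ", ("var", "someone")]),
]
```

After the pass, the `log` node holds the single literal `"Hi, Macabeus"`, while the assignment is still present; removing it is the job of the dead code strip.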
  96. Let’s start the dead code strip.
  97. On this line we are assigning to a scalar variable, but we never read it.
  98. Optimizations (dead code strip): the AST now contains only “log Hi, Macabeus”

    So we can remove this code. Note that we removed many AST nodes.
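A Python sketch of a dead code strip for assignments, under the same hypothetical tuple-based block shape (`("assign", name, parts)` and `("log", parts)`). It walks the block backwards and tracks which variables are still read later; it assumes that assigning text has no side effects and it only understands these two node kinds.

```python
def variables_in(parts):
    """Return the names of the variables read inside a text value."""
    return {part[1] for part in parts if isinstance(part, tuple)}


def strip_dead_assignments(block):
    """Drop assignments to variables that are never read later in the block."""
    result = []
    live = set()  # variables read by instructions that come later
    for node in reversed(block):
        if node[0] == "assign":
            _, name, parts = node
            if name not in live:
                continue  # nobody reads this value afterwards: dead code
            live.discard(name)
            live |= variables_in(parts)
        elif node[0] == "log":
            live |= variables_in(node[1])
        result.append(node)
    result.reverse()
    return result
```

For the block left by the previous optimization, the assignment to `someone` is removed and only the `log` node remains.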
  99. Since we changed the AST, let’s run the optimizations again. Constant folding doesn’t find anything to change…
  100. … and neither does dead code strip.
  101. Since the optimizations can’t change the AST anymore, we are done optimizing. Another way to stop would be, for example, “run them 10 times, then stop”.
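Both stopping rules (stop when nothing changes, or stop after a fixed number of rounds) fit in one small driver. The sketch below is in Python; `optimize` and the example pass `strip_unread` are hypothetical names, and the pass is a deliberately naive one that needs several rounds to finish.

```python
def optimize(ast, passes, max_rounds=10):
    """Apply every pass repeatedly until the AST stops changing."""
    for _ in range(max_rounds):
        before = ast
        for run_pass in passes:
            ast = run_pass(ast)
        if ast == before:
            break  # fixed point: no pass changed anything in this round
    return ast


def strip_unread(block):
    """Naive pass: drop assignments to variables that no node reads."""
    read = {part[1] for node in block for part in node[-1] if isinstance(part, tuple)}
    return [node for node in block if node[0] != "assign" or node[1] in read]


# "b" is never read; once it is removed, "a" becomes unread as well.
CHAIN = [
    ("assign", "a", ["x"]),
    ("assign", "b", [("var", "a")]),
    ("log", ["Hi"]),
]
```

With `CHAIN`, the first round removes `b`, the second removes `a`, and the third detects that nothing changed, which is why re-running the optimizations after each change is worthwhile.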
  102. Let’s go to the last compilation step…
  103. … code generation! Since we want to compile this code to run as an OpenKore plugin, we need to compile it to Perl code.
  104. Code generation: Header, Body, Footer

    I have three steps in code generation:
    - header code
    - body code
    - footer code
  105. Code generation: Header

        package macroCompiled;
        Plugins::register(
          'macroCompiled',
          'Compiled version of eventMacro.txt',
          &on_unload
        );
        sub on_unload { }

    In the header I need to add some boilerplate in order to register this plugin with OpenKore.
  106. I also need to import some OpenKore features into my plugin. To do that, I need to visit each node of the AST.
  107. Code generation: Header, with a new line after the package declaration:

        use Log qw(message);

    So, if I find a log command node, I know that I need to import “message” from the “Log” module.
  108. Nothing to do in this node…
  109. Nothing to do in this node…
  110. The next step is body code generation. To do it, I also need to visit each AST node.
  111. macro sayHi { } log Hi, Macabeus package macroCompiled; use

    Log qw(message); Plugins::register( 'macroCompiled', 'Compiled version of eventMacro.txt', &on_unload ); sub on_unload { } sub macro_sayHi { } Body Header Code generation Footer Then, I need to translate the "macro" node. On Perl, the equivalent code is a sub statement.
  112. macro sayHi { } log Hi, Macabeus package macroCompiled; use

    Log qw(message); Plugins::register( 'macroCompiled', 'Compiled version of eventMacro.txt', &on_unload ); sub on_unload { } sub macro_sayHi { message 
 } Body Header Code generation Footer The equivalent code to “log command” node is “message”
  113. macro sayHi { } log Hi, Macabeus package macroCompiled; use

    Log qw(message); Plugins::register( 'macroCompiled', 'Compiled version of eventMacro.txt', &on_unload ); sub on_unload { } sub macro_sayHi { message 
 } Body Header Code generation Footer And to TextValue node…
  114. macro sayHi { } log Hi, Macabeus package macroCompiled; use

    Log qw(message); Plugins::register( 'macroCompiled', 'Compiled version of eventMacro.txt', &on_unload ); sub on_unload { } sub macro_sayHi { message "Hi, Macabeus" ."\n"; } Body Header Code generation Footer … is a literal string with break line.
  115. macro sayHi { } log Hi, Macabeus package macroCompiled; use

    Log qw(message); Plugins::register( 'macroCompiled', 'Compiled version of eventMacro.txt', &on_unload ); sub on_unload { } sub macro_sayHi { message "Hi, Macabeus" ."\n"; } Body Header Code generation Footer Let’s go to the last step on the code generation, “footer”
  116. macro sayHi { } log Hi, Macabeus package macroCompiled; use

    Log qw(message); Plugins::register( 'macroCompiled', 'Compiled version of eventMacro.txt', &on_unload ); sub on_unload { } sub macro_sayHi { message "Hi, Macabeus" ."\n"; }
 1; Body Header Code generation Footer And it’s very simple: I just need to add “1;” at the end of the code. This matters because in Perl a module has to end with an expression that evaluates to a true value, for example a non-zero number.
  117. macro sayHi { } log Hi, Macabeus package macroCompiled; use

    Log qw(message); Plugins::register( 'macroCompiled', 'Compiled version of eventMacro.txt', &on_unload ); sub on_unload { } sub macro_sayHi { message "Hi, Macabeus" ."\n"; }
 1; Body Header Code generation Footer Hey! We have finished the code generation step!
  118. package macroCompiled; use Log qw(message); Plugins::register( 'macroCompiled', 'Compiled version of

    eventMacro.txt', &on_unload ); sub on_unload { } sub macro_sayHi { message "Hi, Macabeus" ."\n"; }
 1; macro sayHi { } log Hi, Macabeus Parser Source code Semantic analysis Optimization Code generation Syntax analysis And those are all the steps of my compiler. We saw how to translate code written in EventMacro into equivalent Perl code that runs on OpenKore.
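
The header, body and footer steps above can be sketched in a few lines. This is only an illustration in Python, not the Elixir code of the real compiler: the node classes (`Macro`, `LogCommand`, `TextValue`) and the function names are assumptions made for the example, and the header text is copied from the slides as shown (a real OpenKore plugin may need a code reference such as `\&on_unload` there).

```python
from dataclasses import dataclass


# Hypothetical AST node types, named after the nodes mentioned in the talk.
@dataclass
class TextValue:
    text: str


@dataclass
class LogCommand:
    value: TextValue


@dataclass
class Macro:
    name: str
    block: list


# Header copied from the slides; it does not depend on the AST.
HEADER = "\n".join([
    "package macroCompiled;",
    "use Log qw(message);",
    "Plugins::register('macroCompiled', 'Compiled version of eventMacro.txt', &on_unload);",
    "sub on_unload { }",
])
FOOTER = "1;"  # a Perl module has to end with a true value


def generate(node):
    """Translate one AST node into Perl source text (recursively)."""
    if isinstance(node, Macro):
        body = "\n".join("  " + generate(command) for command in node.block)
        return "sub macro_" + node.name + " {\n" + body + "\n}"
    if isinstance(node, LogCommand):
        return "message " + generate(node.value) + ' ."\\n";'
    if isinstance(node, TextValue):
        return '"' + node.text + '"'
    raise TypeError(f"unknown node: {node!r}")


def compile_to_perl(ast):
    body = "\n".join(generate(node) for node in ast)
    return "\n".join([HEADER, body, FOOTER])


perl = compile_to_perl([Macro("sayHi", [LogCommand(TextValue("Hi, Macabeus"))])])
print(perl)
```

The header and footer are constant strings; only the body depends on the AST, which is why it is the only part that needs a recursive walk.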
  119. It’s a simplification! An AST could have metadata nodes

    An AST could have metadata nodes. I’ll show more about them soon.
  120. It’s a simplification! A compiler could have many intermediary representations An AST could have metadata nodes

    A compiler could also have many intermediate representations. For example, GHC, a compiler for the Haskell language, has several of them, because Haskell works with a mindset very different from the architecture our computers run on. Each intermediate representation changes the program a little bit, which makes the compiler easier to build.
  121. It’s a simplification! A compiler could have many intermediary representations, as well as it could build the final code directly An AST could have metadata nodes

    A compiler could also output the final code directly. That is the case for LuaC and Wren. Lua was designed so that its code can be compiled in a single step, which keeps the compiler light enough to embed on a device.
  122. HOW IS THE CODE OF MACRO COMPILER?

    Nice! But what does the code of the macro compiler look like?
  123. I'm writing the compiler using Elixir.

    I chose Elixir because of the hype, and also because I want to learn more about Elixir. On top of that, the language has two features that are very useful for a compiler.
  124. Pattern matching Works very well with recursions

    And Elixir works very well with recursion. That is important because we need to traverse the AST, and the traversal can be implemented with recursive code.
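
A recursive AST traversal can be sketched as follows. This is a Python illustration with dictionary-based nodes, which are an assumption made for the example; in Elixir each branch would typically be a function clause that pattern-matches on a struct.

```python
def logged_texts(node):
    """Recursively collect the text of every "log" node below `node`."""
    kind = node["type"]
    if kind == "log":
        return [node["text"]]
    if kind in ("macro", "if"):
        # Both node kinds keep their children under the same "block" key,
        # so one branch handles them both.
        texts = []
        for child in node["block"]:
            texts.extend(logged_texts(child))
        return texts
    return []  # any other node contributes nothing


ast = {"type": "macro", "name": "sayHi", "block": [
    {"type": "log", "text": "Hi"},
    {"type": "if", "condition": "1", "block": [
        {"type": "log", "text": "nested"},
        {"type": "do", "command": "c hi"},
    ]},
]}
print(logged_texts(ast))
```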
  125. Grammar To write a parser, we need a grammar to

    specify how our language should be written
  126. Context-free Grammars Parsing Expression Grammar Regular Grammars eDSL We have

    many ways to write a grammar. I’ll talk about four of them
  127. Context-free Grammars Parsing Expression Grammar Regular Grammars eDSL It’s a very simple way to write a language grammar.

    A “regular grammar” is the simplest way to write a language grammar.
  128. Context-free Grammars Parsing Expression Grammar Regular Grammars eDSL It’s a very simple way to write a language grammar. Regexp (regular expression) is an example move prontera 30 42 move 30 42

    One example of a regular grammar is a regular expression. It is useful, for instance, when we need to parse a command in a CLI. We could have a command that moves to a position on a map, where the map name is optional.
  129. Context-free Grammars Parsing Expression Grammar Regular Grammars eDSL /move (?:(\w+) )?(\d+) (\d+)/ move prontera 30 42 move 30 42 It’s a very simple way to write a language grammar. Regexp (regular expression) is an example

    We may write this regular expression to match this command. It’s a very simple solution to a very simple problem.
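
The regular expression from the slide can be tried out like this. It is a Python sketch; the function name `parse_move` and the returned dictionary are assumptions made for the example, and `fullmatch` is used so that the whole command has to match.

```python
import re

# The pattern from the slide: the map name and its trailing space are optional.
MOVE = re.compile(r"move (?:(\w+) )?(\d+) (\d+)")


def parse_move(command):
    match = MOVE.fullmatch(command)
    if match is None:
        return None
    map_name, x, y = match.groups()
    return {"map": map_name, "x": int(x), "y": int(y)}


print(parse_move("move prontera 30 42"))
print(parse_move("move 30 42"))
```

When the map name is missing, its capture group is simply `None`.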
  130. Context-free Grammars Parsing Expression Grammar Regular Grammars eDSL A limitation is that we can’t match arbitrarily deep nesting, so we can’t specify a nested block of commands.

    Because a “regular grammar” is such a simple grammar, it has some limitations. For example, it can’t keep track of arbitrarily deep nesting, so we can’t specify nested blocks of code.
  131. Context-free Grammars Parsing Expression Grammar Regular Grammars eDSL A limitation is that we can’t match arbitrarily deep nesting, so we can’t specify a nested block of commands.

    { evil_query(id: 42) { complex_field { complex_field { field } } } }

    For example, in GraphQL we can have a field inside another field.
  132. Context-free Grammars Parsing Expression Grammar Regular Grammars eDSL A limitation is that we can’t match arbitrarily deep nesting, so we can’t specify a nested block of commands.

    { evil_query(id: 42) { complex_field { complex_field { field } } } }

    We can’t specify that using only a regular grammar. We need something more powerful.
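
To see what "more powerful" means here, a recursive function can follow the nesting to any depth, which a plain regular expression cannot do. The sketch below is a Python illustration; the tokenizer and the `(name, children)` tuple shape are assumptions made for the example, not GraphQL's real grammar.

```python
import re


def tokenize(source):
    # Braces are tokens; a field name may carry arguments such as "(id: 42)".
    return re.findall(r"[{}]|\w+(?:\([^)]*\))?", source)


def parse_block(tokens, pos):
    """Parse '{' field* '}' starting at tokens[pos]; return (fields, next_pos)."""
    if pos >= len(tokens) or tokens[pos] != "{":
        raise SyntaxError("expected '{'")
    pos += 1
    fields = []
    while pos < len(tokens) and tokens[pos] != "}":
        name = tokens[pos]
        if name == "{":
            raise SyntaxError("expected a field name")
        pos += 1
        children = []
        if pos < len(tokens) and tokens[pos] == "{":
            # The recursive call is what handles nesting of any depth.
            children, pos = parse_block(tokens, pos)
        fields.append((name, children))
    if pos >= len(tokens):
        raise SyntaxError("missing '}'")
    return fields, pos + 1


query = "{ evil_query(id: 42) { complex_field { complex_field { field } } } }"
tree, _ = parse_block(tokenize(query), 0)
print(tree)
```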
  133. We could embed a language to specify grammars in another language Context-free Grammars Parsing Expression Grammar eDSL Regular Grammars

    Then we could use an eDSL (embedded domain-specific language), which is a small language inside a bigger language. And we can use an eDSL to write grammars!
  134. Context-free Grammars Parsing Expression Grammar eDSL Regular Grammars We could embed a language to specify grammars in another language

    sequence([
      ignore(string("move")),
      ignore(spaces()),
      many(letter()),
      skip(spaces()),
      integer(),
      ignore(spaces()),
      integer()
    ])

    We could write a grammar using an eDSL to match the previous command, instead of using a regular expression. Then we’ll have a grammar like this one.
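
To show what is behind such an eDSL, here is a minimal sketch in Python. The helper names (`regex`, `string`, `ignore`, `optional`, `sequence`) are hypothetical and only loosely modelled on the names on the slide; they are not the API of the Elixir library. Each parser is a function that takes the text and a position and returns `(value, new_position)`, or `None` on failure.

```python
import re

IGNORED = object()  # marker for results that should not appear in the output


def regex(pattern, convert=str):
    compiled = re.compile(pattern)

    def parser(text, pos):
        match = compiled.match(text, pos)
        return None if match is None else (convert(match.group()), match.end())
    return parser


def string(expected):
    return regex(re.escape(expected))


def ignore(inner):
    def parser(text, pos):
        result = inner(text, pos)
        return None if result is None else (IGNORED, result[1])
    return parser


def optional(inner):
    def parser(text, pos):
        result = inner(text, pos)
        return (None, pos) if result is None else result
    return parser


def sequence(parsers):
    def parser(text, pos):
        values = []
        for inner in parsers:
            result = inner(text, pos)
            if result is None:
                return None
            value, pos = result
            if value is not IGNORED:
                values.append(value)
        return values, pos
    return parser


spaces = regex(r" +")
letters = regex(r"[A-Za-z]+")
integer = regex(r"\d+", int)

# Loose on purpose: a stricter grammar would require the space after the map name.
move = sequence([
    ignore(string("move")),
    ignore(spaces),
    optional(letters),
    ignore(optional(spaces)),
    integer,
    ignore(spaces),
    integer,
])

print(move("move prontera 30 42", 0))
print(move("move 30 42", 0))
```

The grammar itself is ordinary data in the host language, so it can be built, reused and tested like any other value.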
  135. It defines a set of symbols and the valid transformations for each symbol. Context-free Grammars Parsing Expression Grammar Regular Grammars eDSL

    Another way is to use a context-free grammar, where we have a set of symbols and the valid transformations for each symbol.
  136. Context-free Grammars Parsing Expression Grammar Regular Grammars eDSL It defines a set of symbols and the valid transformations for each symbol. An example of CFG (Context-free Grammars) is BNF (Backus–Naur Form):

    <vowel> ::= "a" | "e" | "i" | "o" | "u"
    <digit> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
    <character> ::= <vowel> | <digit>
    <text> ::= <character> | <text> <character>

    A common notation for writing a CFG is BNF. On the left side we have our symbols, and on the right side we have the transformations for each symbol.
  137. Context-free Grammars Parsing Expression Grammar Regular Grammars eDSL It defines a set of symbols and the valid transformations for each symbol. An example of CFG (Context-free Grammars) is BNF (Backus–Naur Form): (the same BNF grammar as on slide 136) a9 ww

    Then, according to this grammar, “a9” is a valid text and “ww” isn’t.
  138. Context-free Grammars Parsing Expression Grammar Regular Grammars eDSL Like CFG, PEG (Parsing Expression Grammar) also defines a set of symbols and the valid transformations for each symbol.

    We also have the parsing expression grammar! Like a CFG, it defines a set of symbols and the transformations for each symbol.
  139. Context-free Grammars Parsing Expression Grammar Regular Grammars eDSL Like CFG, PEG (Parsing Expression Grammar) also defines a set of symbols and the valid transformations for each symbol.

    vowel ← 'a' / 'e' / 'i' / 'o' / 'u'
    digit ← [0-9]
    character ← vowel / digit
    text ← character+

    Here is the same grammar, written as a PEG.
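
In either notation, the grammar above accepts non-empty texts made of vowels and digits. A small recognizer that follows the rules one by one could look like this (a Python sketch; the function names are assumptions made for the example):

```python
VOWELS = set("aeiou")
DIGITS = set("0123456789")


def is_character(s):
    # character: a single vowel or a single digit
    return len(s) == 1 and (s in VOWELS or s in DIGITS)


def is_text(s):
    # text: one character, or a shorter text followed by one character
    if is_character(s):
        return True
    return len(s) > 1 and is_text(s[:-1]) and is_character(s[-1])


print(is_text("a9"))
print(is_text("ww"))
```

"a9" is accepted because "a" is a text and "9" is a character; "ww" is rejected because "w" is neither a vowel nor a digit.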
  140. Parsing Expression Grammar Regular Grammars eDSL Context-free Grammars The main differences between CFG and PEG are:

    Okay, CFG and PEG are very similar, but there are two relevant differences!
  141. Parsing Expression Grammar Regular Grammars eDSL Context-free Grammars The main

    differences between CFG and PEG are:
 - notations Firstly, the notations.
  142. Parsing Expression Grammar Regular Grammars eDSL Context-free Grammars The main differences between CFG and PEG are: - notations

    <vowel> ::= "a" | "e" | "i" | "o" | "u"
    <digit> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
    <character> ::= <vowel> | <digit>
    <text> ::= <character> | <text> <character>

    vowel ← 'a' / 'e' / 'i' / 'o' / 'u'
    digit ← [0-9]
    character ← vowel / digit
    text ← character+

    It’s easy to see that the CFG notation is quite different from the PEG notation. For example, a PEG lets you use operators borrowed from regular expressions, such as the character class [0-9] and the repetition operator +.
  143. Parsing Expression Grammar Regular Grammars eDSL Context-free Grammars The main differences between CFG and PEG are: - notations - rule interpretation

    And another relevant difference is the rule interpretation…
  144. Parsing Expression Grammar Regular Grammars eDSL Context-free Grammars The main differences between CFG and PEG are: - notations - rule interpretation. CFG: (Rule A, Rule B, Rule C) = (Rule B, Rule C, Rule A). PEG: (Rule A, Rule B, Rule C) ≠ (Rule B, Rule C, Rule A).

    …because in a CFG the order of the rules doesn’t matter, but in a PEG the order of the rules is very relevant.
  145. Parsing Expression Grammar Regular Grammars eDSL Context-free Grammars The main differences between CFG and PEG are: - notations - rule interpretation. CFG: (Rule A, Rule B, Rule C) = (Rule B, Rule C, Rule A). PEG: (Rule A, Rule B, Rule C) ≠ (Rule B, Rule C, Rule A). “Well.. I can use the rule A or B… What should I use?”

    Think of a situation where I could use either rule A or rule B to parse the input.
  146. Parsing Expression Grammar Regular Grammars eDSL Context-free Grammars The main differences between CFG and PEG are: - notations - rule interpretation. CFG: (Rule A, Rule B, Rule C) = (Rule B, Rule C, Rule A). PEG: (Rule A, Rule B, Rule C) ≠ (Rule B, Rule C, Rule A). “Well.. I can use the rule A or B… What should I use?” CFG, for both orderings: “I don’t know!”

    If I’m using a CFG, the parser fails, because it doesn’t know how to deal with the ambiguity.
  147. Parsing Expression Grammar Regular Grammars eDSL Context-free Grammars The main differences between CFG and PEG are: - notations - rule interpretation. CFG: (Rule A, Rule B, Rule C) = (Rule B, Rule C, Rule A). PEG: (Rule A, Rule B, Rule C) ≠ (Rule B, Rule C, Rule A). “Well.. I can use the rule A or B… What should I use?” CFG, for both orderings: “I don’t know!” PEG, with the order A, B, C: “rule A, it came before!” PEG, with the order B, C, A: “rule B, it came before!”

    But if I’m using a PEG, the parser uses the rule that comes first. So the order of the rules matters, and I need to pay more attention to it when I’m using a PEG.
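
The effect of rule order in a PEG can be sketched with an ordered choice. This is a Python illustration; `literal`, `choice` and the two rules are names invented for the example.

```python
def literal(expected):
    def parser(text, pos):
        if text.startswith(expected, pos):
            return expected, pos + len(expected)
        return None
    return parser


def choice(*alternatives):
    """PEG-style ordered choice: the first alternative that matches wins."""
    def parser(text, pos):
        for alternative in alternatives:
            result = alternative(text, pos)
            if result is not None:
                return result
        return None
    return parser


rule_a = literal("log")
rule_b = literal("logout")

print(choice(rule_a, rule_b)("logout", 0))  # rule A came first
print(choice(rule_b, rule_a)("logout", 0))  # rule B came first
```

Both rules can match the input, and swapping them changes the result: with rule A first only "log" is consumed, with rule B first the whole "logout" is.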
  148. Algorithm

    Okay, with a grammar we can specify what the syntax of a language looks like, but we need an algorithm to run this grammar.
  149. Parser Generators Parser Combinators I’ll talk about two popular algorithms.

    The main difference between them is the interface.
  150. Parser Generators Parser Combinators Description, Parser Generator, Parser

    A parser generator takes a description written in PEG, BNF, or some other notation, and compiles it to generate a parser. Then the compiler can use this parser to build an AST.
  153. Parser Combinators Parser Generators Parser A Parser B Parser C Parser D Parser E

    Parser combinators take another approach. We write many very simple parsers (parser A, parser B, parser C) and join them to build more complex parsers.
  157. Nice!
 But how can I use it in Elixir? ?

    Very nice! But, how can I run this algorithm using Elixir?
  158. github.com/bitwalker/combine Combine, a parser combinators library Description by an eDSL

    We have a very nice parser combinator library, called Combine. With this library we write the grammar using an eDSL.
  160. github.com/bitwalker/combine Approach scannerless parsing Combine, a parser combinators library Description by an eDSL

    It also uses a scannerless parsing approach, in which the compiler reads the source code and builds the AST directly, without a separate tokenizer step.
  161. And, as you can see, this library is a toolbox with many, many very simple parsers. And, yeah, you need to remember many of these very simple parsers so that you can join them into a complex parser for your language.
  163. Practical Example #1
 Parsing an Event Macro command

    Okay, let’s see a practical example: writing a parser for an Event Macro command.
  164. &push(@ori, text)

    Here is the "push" command. Semantically, it is used to add a text to an array. And syntactically, it has…
  165. Now we have here a very complex parser, PushCommand.

    It is a sequence of the string “push open parenthesis”, an array variable name, spaces, a comma, spaces, a text value, and finally a closing parenthesis. This is a very complex parser, right?
  166. But this very complex parser uses a simpler parser to

    describe the array variable name, which is the sequence of an “at" and an identifier.
  167. And the identifier is a simpler parser, which is a

    sequence of characters ending in space, or new line, or comma…
  168. And PushCommand also uses the TextValue parser. This parser is

    similar to the Identifier, but it also allows interpolating variables.
  169. Push Command, Array Variable, Identifier, TextValue, Scalar Variable, Hash Variable

    So the idea really is the same as I said. We have a very complex parser, PushCommand, which uses a simpler parser, ArrayVariable, which in turn uses the simplest parser, Identifier. PushCommand also uses TextValue, which uses the variable parsers.
  175. After parsing the command, we need to map it!

    After running the parser, we need to map the result so it can be saved in the AST. We can map it to a struct in Elixir, because that makes it easier to work with in the next steps. In this case, each struct is a node of our AST.
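
Putting the pieces together, the PushCommand parser and the mapping step can be sketched as below. This is a Python illustration, not the Elixir code from the talk: the helper names (`regex`, `sequence`, `mapped`) are hypothetical, dictionaries stand in for the Elixir structs, and TextValue is simplified to plain text without variable interpolation.

```python
import re


def regex(pattern):
    compiled = re.compile(pattern)

    def parser(text, pos):
        match = compiled.match(text, pos)
        return None if match is None else (match.group(), match.end())
    return parser


def sequence(*parsers):
    def parser(text, pos):
        values = []
        for inner in parsers:
            result = inner(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parser


def mapped(inner, build):
    """Run `inner`, then turn its raw result into an AST node with `build`."""
    def parser(text, pos):
        result = inner(text, pos)
        return None if result is None else (build(result[0]), result[1])
    return parser


# Simplest parser: characters up to a space, new line, comma or parenthesis.
identifier = regex(r"[^\s,()]+")

# An "@" followed by an identifier.
array_variable = mapped(
    sequence(regex("@"), identifier),
    lambda parts: {"node": "ArrayVariable", "name": parts[1]},
)

# Simplified: plain text up to the closing parenthesis.
text_value = mapped(regex(r"[^)]+"), lambda s: {"node": "TextValue", "text": s})

push_command = mapped(
    sequence(regex(r"&push\("), array_variable, regex(r"\s*,\s*"), text_value, regex(r"\)")),
    lambda parts: {"node": "PushCommand", "array": parts[1], "text": parts[3]},
)

node, _ = push_command("&push(@ori, text)", 0)
print(node)
```

Each `mapped` call is the "map it" step: the raw list produced by the sequence is turned into a node with named fields, which is easier to work with in the later compilation steps.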
  180. Practical Example #2
 Parsing a code block

    Okay, let’s parse a very different thing: a code block!
  181. ref-while # comments log message do c hi if (1)

    macro { } In this language we have macros…
  182. macro { } ref-while # comments log message do c

    hi if (1) Code block …that has a code block.
  183. if (1) { # comments log message do c hi

    if (1) } It also has an “if”…
  184. # comments log message do c hi if (1) Code

    block if (1) { } …and it also has a code block! Okay, they are similar, right?
  185. And it maps the code block into the struct using

    the same key, so that both work the same way in the next compilation steps.
  186. It is a choice between parsers: it tries to

    parse using DoCommand; if that fails, it tries LogCommand; if that also fails, it tries CallCommand…
  187. The symbols table is important to expose the information from the AST in a more accessible way to the next compilation steps

    Like I said, the symbols table is important because it exposes the information from the AST in a more accessible way to the next compilation steps.
  188. …I’ll write at the symbols table that I have a

    macro with this name and this code block
  189. …I’ll record in the symbols table that I write to some

    variable and that I read some other variables. This is the idea of how I build my symbols table.
  190. But, okay. I built a very complex structure. I need

    a way to read it! So I have a module called “Symbols Table”, which has some helper functions.
  191. To read the symbols table I used the Access

    module a lot. It is useful for writing routines that access a complex, nested structure.
  192. read macros: [ "foo", "bar" ]

    In the next step, using the symbols table, it lists the macros that we read.
  193. read macros: [ "foo", "bar" ] write macros: [ "foo" ]

    It lists the macros that we have written.
  194. read macros: [ "foo", "bar" ] write macros: [ "foo" ] difference: [ "bar" ]

    And it checks the difference… We can see that we are trying to read a macro that was never written!
  195. read macros: [ "foo", "bar" ] write macros: [ "foo" ] difference: [ "bar" ] We need to raise an error because we are reading "bar"!

    That’s bad! So the validator raises an error.
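
The check above is a set difference between the macros that are read and the macros that are written. A minimal Python sketch, assuming a symbols table shaped like the lists on the slide (the dictionary layout and function names are assumptions made for the example):

```python
def find_undefined_macros(symbols_table):
    read = set(symbols_table["macros"]["read"])
    written = set(symbols_table["macros"]["write"])
    return sorted(read - written)


def validate(symbols_table):
    undefined = find_undefined_macros(symbols_table)
    if undefined:
        raise ValueError("macros read but never written: " + ", ".join(undefined))


table = {"macros": {"read": ["foo", "bar"], "write": ["foo"]}}
print(find_undefined_macros(table))
```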
  196. Parser Source code Semantic analysis Optimization Code generation

    Okay, I spoke about the parser and the semantic analysis, and both steps raise errors.
  197. To show the syntax error, I need to raise an exception

    Do you remember the MacroBlock parser? If all of its parsers fail, a syntax error is raised. A funny thing is that the function that raises the error is itself another parser, and it reports the line and column where the error happened.
  201. Syntax error

    Then I can have a very nice error message, like this one. For example, the programmer forgot to write a closing parenthesis.
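
The idea of a last alternative that is itself a parser and only raises an error can be sketched like this. It is a Python illustration: `MacroSyntaxError`, `command`, `fail` and the error message format are assumptions made for the example, not the compiler's real names or output.

```python
class MacroSyntaxError(Exception):
    pass


def line_and_column(source, pos):
    """Convert an absolute offset into a 1-based (line, column) pair."""
    line = source.count("\n", 0, pos) + 1
    column = pos - (source.rfind("\n", 0, pos) + 1) + 1
    return line, column


def command(keyword):
    """Match `keyword` followed by its argument, up to the end of the line."""
    def parser(source, pos):
        if source.startswith(keyword + " ", pos):
            end = source.find("\n", pos)
            end = len(source) if end == -1 else end
            return (keyword, source[pos + len(keyword) + 1:end]), end
        return None
    return parser


def fail(source, pos):
    """A "parser" that never matches: it reports where parsing stopped."""
    line, column = line_and_column(source, pos)
    raise MacroSyntaxError(f"syntax error at line {line}, column {column}")


def choice(*alternatives):
    def parser(source, pos):
        for alternative in alternatives:
            result = alternative(source, pos)
            if result is not None:
                return result
        return None
    return parser


# The error-raising parser goes last, so it only runs when all others failed.
statement = choice(command("do"), command("log"), command("call"), fail)

source = "log hello\nwat 42"
print(statement(source, 0))
try:
    statement(source, 10)
except MacroSyntaxError as error:
    print(error)
```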
  202. Semantic error

    Another error that the compiler raises is a semantic error. To show a good semantic error message, we need more information in the AST: some nodes called metadata. To build these nodes, the compiler uses this parser when it parses a piece of code such as “log foo”. It uses a macro called “parser command”. This macro contains a sequence that calls a “getMetadata” function, which is another parser that returns the line and column of the code in the source code.
  209. Semantic error

    macro example {
      $never_read_var = &rand(1, 4) # warning
      log number: $never_written_var # fatal error
    }

    Then I can show exactly where a semantic error happened. The compiler has two types of semantic error: warnings and fatal errors. A warning is something weird but not wrong, for example writing to a variable but never reading it. A fatal error means something is wrong, for example trying to read a variable that was never written.
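
The two kinds of semantic error can be sketched with a list of variable reads and writes that carry their position. This is a Python illustration; the event tuples, the line and column numbers and the message format are assumptions made for the example.

```python
def analyze(variable_events):
    """Each event is (kind, name, line, column); kind is "write" or "read"."""
    written = {name for kind, name, _, _ in variable_events if kind == "write"}
    read = {name for kind, name, _, _ in variable_events if kind == "read"}
    warnings, errors = [], []
    for kind, name, line, column in variable_events:
        if kind == "write" and name not in read:
            warnings.append(f"{line}:{column}: warning: ${name} is written but never read")
        if kind == "read" and name not in written:
            errors.append(f"{line}:{column}: fatal error: ${name} is read but never written")
    return warnings, errors


# Positions are made up; in the compiler they would come from the metadata nodes.
events = [
    ("write", "never_read_var", 2, 3),
    ("read", "never_written_var", 3, 15),
]
warnings, errors = analyze(events)
print(warnings)
print(errors)
```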
  210. OPTIMIZATION Parser Source code Semantic analysis Optimization Code generation

    … optimization! Now we want to remove overhead from the code.
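  211. Constant folding

    …was constant folding. This optimization propagates the values that are known at compile time. (Strictly speaking, substituting known values is called constant propagation; it usually goes hand in hand with constant folding.)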
  212. Before:

    macro setVars {
      $foo = value
      $bar = &rand(1, 4)
    }
    macro example {
      $name = macabeus
      call setVars
      log foo: $foo
      log bar: $bar
      log name: $name
      $name = pagarme
      log name: $name
    }

    ➡ After:

    macro setVars {
      $foo = value
      $bar = &rand(1, 4)
    }
    macro example {
      $name = macabeus
      call setVars
      log foo: value
      log bar: $bar
      log name: macabeus
      $name = pagarme
      log name: pagarme
    }

    For example, the first code will be compiled into the second one. Notice that it has fewer variable reads.
  217. macro setVars { $foo = value $bar = &rand(1, 4) } macro example { $name = macabeus call setVars log foo: $foo log bar: $bar log name: $name $name = pagarme log name: $name } Context about the variables at end of macros: setVars: $foo: value, $bar: is nondeterministic; example: $name: pagarme. Variables context at the arrow: (empty)

    Then, in the symbols table, I build a context with the value of each variable at the end of each macro.
  218. macro setVars { $foo = value $bar = &rand(1, 4) } macro example { $name = macabeus call setVars log foo: $foo log bar: $bar log name: $name $name = pagarme log name: $name } Context about the variables at end of macros: setVars: $foo: value, $bar: is nondeterministic; example: $name: pagarme. Variables context at the arrow, frame by frame: after “$name = macabeus”: $name: macabeus; after “call setVars”: $name: macabeus, $foo: value, $bar: is nondeterministic; “log foo: $foo” becomes “log foo: value”; “log bar: $bar” stays as it is; “log name: $name” becomes “log name: macabeus”; after “$name = pagarme”: $foo: value, $bar: is nondeterministic, $name: pagarme.

    Let’s optimize the “example” macro! At the arrow, the context says that the value of “name” is “macabeus”. After the call to setVars, “foo” and “bar” are in the context as well. Please notice that we don’t know the value of “bar” at compile time, because it is only known at run time (the slide marks it as nondeterministic). So we can optimize “foo” but can’t optimize “bar”. Then the “name” variable is updated, and we can propagate its new value. You can see the code of this optimization on my GitHub.
  225. macro setVars {
         $foo = value
         $bar = &rand(1, 4)
       }
       macro example {
         $name = macabeus
         call setVars
         log foo: value
         log bar: $bar
         log name: macabeus
         $name = pagarme
         log name: pagarme
       }

    Context about the variables at end of macros
    - setVars: $foo: value, $bar: is nondeterministic
    - example: $name: pagarme
    Variables context at the arrow
    - $foo: value, $bar: is nondeterministic, $name: pagarme
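    The propagation walked through on these slides can be sketched roughly as follows. This is a Python illustration, not the Elixir code of the actual compiler: the tuple-based statement format and the names `NONDET`, `propagate` and `macro_contexts` are assumptions made only for this example.

```python
# Hypothetical sentinel: a variable whose value is only known at run time.
NONDET = object()


def propagate(statements, macro_contexts):
    """Replace variable reads with known constants, one statement at a time."""
    context = {}  # what we know about each variable "at the arrow"
    optimized = []
    for stmt in statements:
        kind = stmt[0]
        if kind == "assign":  # ("assign", name, value)
            _, name, value = stmt
            context[name] = value
            optimized.append(stmt)
        elif kind == "call":  # ("call", macro_name)
            # Merge in what the called macro is known to have written.
            context.update(macro_contexts[stmt[1]])
            optimized.append(stmt)
        elif kind == "log":  # ("log", label, variable_name)
            _, label, name = stmt
            value = context.get(name, NONDET)
            if value is NONDET:
                optimized.append(stmt)  # unknown at compile time: keep the read
            else:
                optimized.append(("log_const", label, value))
    return optimized


# Context at the end of setVars, as shown on the slide.
macro_contexts = {"setVars": {"foo": "value", "bar": NONDET}}

example = [
    ("assign", "name", "macabeus"),
    ("call", "setVars"),
    ("log", "foo", "foo"),
    ("log", "bar", "bar"),
    ("log", "name", "name"),
    ("assign", "name", "pagarme"),
    ("log", "name", "name"),
]

optimized_example = propagate(example, macro_contexts)
```

    In this sketch the read of “foo” and both reads of “name” become constants, while the read of “bar” is left untouched because its value is marked as nondeterministic.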
  226. CODE GENERATION Parser Source code Semantic analysis Optimization Code generation

    Code generation! Since I need to compile Event Macro into a plugin that runs on OpenKore, and OpenKore is written in Perl, let’s compile to Perl.
  227. header body footer [ "push", "@values", ",", [ "f", "o", "o" ], ";" ]

    Please notice that this step is just a lot of array concatenation, where each element of the array is a piece of Perl code. I generate a nested array of depth N…
  228. header body footer [ "push", "@values", ",", [ "f", "o", "o" ], ";" ] [ "push", "@values", ",", "f", "o", "o", ";" ]

    … then I flatten it to depth 1…
  229. header body footer push @values, "foo";

    … and I use a join to get a string with the final code.
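    The flatten-then-join step can be sketched in a few lines. This is a Python illustration rather than the compiler’s Elixir code; unlike the simplified arrays on the slides, whitespace and quotes are spelled out here as their own fragments (an assumption) so that a plain join yields valid Perl.

```python
def flatten(nested):
    """Recursively flatten a nested list of strings into a flat list."""
    flat = []
    for item in nested:
        if isinstance(item, list):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


# Nested code fragments of arbitrary depth, similar to the slide.
fragments = ["push ", "@values", ", ", ['"', ["f", "o", "o"], '"'], ";"]

flat_fragments = flatten(fragments)
perl_code = "".join(flat_fragments)
```

    Building nested lists and flattening once at the end keeps each generator function simple: it only returns its own fragments and never has to care how deep it sits in the tree.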
  230. header body footer make the boilerplate

    One thing I need to do is generate the boilerplate that registers this plugin with OpenKore. It’s just a matter of adding some strings to the array.
  231. header body footer
    - find which variables are written, to declare them
    - find which modules from OpenKore we need to import

    Another task is to find which variables are written, so that I can declare them in the plugin’s global context. This matters because in Event Macro variables are always global, while in Perl variables have scope and need to be declared before use. A similar task is to find which modules from OpenKore we need to import.
  232. header body footer
    - find which variables are written, to declare them
    - find which modules from OpenKore we need to import

    To do this, we just need to visit each AST node.
  233. header body footer
    - find which variables are written, to declare them
    - find which modules from OpenKore we need to import

    If we find a LogCommand node, we know that we need to import the module “Log message” from OpenKore. If we find an ArrayVariable node, we know that we need to declare an array variable with that name.
  235. header body footer
    - find which variables are written, to declare them
    - find which modules from OpenKore we need to import

    Then we add the result to the array.
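    A rough sketch of that AST walk, in Python rather than the compiler’s Elixir. The dictionary-based node shape, the `PushCommand` and `Macro` node names, the helper names and the exact Perl lines produced are assumptions for illustration; only the LogCommand and ArrayVariable rules come from the talk.

```python
def walk(node):
    """Yield a node and all of its descendants, depth-first."""
    yield node
    for child in node.get("children", []):
        yield from walk(child)


def collect_header_info(ast):
    """Return the modules to import and the array variables to declare."""
    imports, arrays = set(), set()
    for node in walk(ast):
        if node["type"] == "LogCommand":
            # Assumed spelling of the "Log message" import.
            imports.add("Log qw(message)")
        elif node["type"] == "ArrayVariable":
            arrays.add(node["name"])
    return sorted(imports), sorted(arrays)


# Hypothetical AST for a macro that logs a text and pushes into @values.
ast = {"type": "Macro", "children": [
    {"type": "LogCommand", "children": [{"type": "TextValue", "value": "hi"}]},
    {"type": "PushCommand", "children": [
        {"type": "ArrayVariable", "name": "values"},
        {"type": "TextValue", "value": "foo"},
    ]},
]}

imports, arrays = collect_header_info(ast)
# These lines are what would be appended to the header array.
header_lines = [f"use {m};" for m in imports] + [f"my @{a};" for a in arrays]
```

    Collecting into sets means a module or variable that shows up in many nodes is still imported or declared only once.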
  236. header body footer ✏ find which macros are written, to make them callable

    We also need to find which macros are written, to make them callable from the CLI. For that I use my symbol table, through the function “list written macros”.
  238. header body footer
    - make the boilerplate
    - find which variables are written, to declare them
    - find which modules from OpenKore we need to import
    - ✏ find which macros are written, to make them callable

    In the header, I do these four things.
  239. header body footer &push(@values, foo) push @values, "foo";

    Somehow I need to translate this Event Macro code into this Perl code…
  240. header body footer &push(@values, foo) ...

    … that generates an AST. For this push command, we get this AST…
  241. header body footer &push(@values, foo) [ "push", ..., ",", ..., ";" ]

    … and we start generating the final code as an array. We have “push”, something, a comma, something else and a semicolon. To fill in the missing parts, we recursively call the other “generate” functions.
  242. header body footer &push(@values, foo) [ "push", "@values", ",", ..., ";" ]

    For this subtree, we generate the reference to the array variable. In this case, the code is “@values”.
  243. header body footer &push(@values, foo) [ "push", "@values", ",", [ "f", "o", "o" ], ";" ]

    The code generator for the TextValue node is a little more complex, because it needs to handle variable interpolation, but in this case the array is simply “f”, “o”, “o”.
  244. header body footer push @values, "foo"; [ "push", "@values", ",", [ "f", "o", "o" ], ";" ]

    Then we just need to flatten and join, and we have the equivalent code in Perl.
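    The recursive “generate” functions described on these slides might look roughly like this. It is a Python sketch, not the compiler’s Elixir code: the node shapes and field names are hypothetical, variable interpolation in TextValue is left out, and spaces and quotes are emitted as explicit fragments so the joined result is valid Perl.

```python
def generate(node):
    """Return a (possibly nested) list of Perl code fragments for an AST node."""
    kind = node["type"]
    if kind == "PushCommand":
        # "push", something, comma, something, semicolon.
        return ["push ", generate(node["array"]), ", ", generate(node["value"]), ";"]
    if kind == "ArrayVariable":
        return ["@" + node["name"]]
    if kind == "TextValue":
        # One fragment per character, as on the slide (no interpolation here).
        return ['"', list(node["text"]), '"']
    raise ValueError(f"unknown node type: {kind}")


def flatten(nested):
    """Flatten nested fragment lists down to depth 1."""
    flat = []
    for item in nested:
        flat.extend(flatten(item) if isinstance(item, list) else [item])
    return flat


# Hypothetical AST for: &push(@values, foo)
ast = {
    "type": "PushCommand",
    "array": {"type": "ArrayVariable", "name": "values"},
    "value": {"type": "TextValue", "text": "foo"},
}

nested = generate(ast)
perl_code = "".join(flatten(nested))
```

    Each branch only knows about its own node type and delegates the rest, which is why adding a new command mostly means adding one more branch.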
  245. header body footer &rand(1, 4) ("1" + int(rand(1 + "4" - "1")))

    The push command is very similar in Event Macro and Perl, but other commands differ between the two languages. For example, the “rand” command generates very different code in Perl.
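    The reason the output looks so different is that Perl’s `rand(n)` returns a fractional number from 0 up to, but not including, n. To get an integer between a minimum and a maximum, both included, the generated code takes `int(rand(max - min + 1))` and adds the minimum. A small Python sketch of a generator producing the expression from the slide (the function name and its string arguments are assumptions, not the compiler’s actual API):

```python
def generate_rand(minimum, maximum):
    """Build Perl fragments for a random integer between minimum and maximum, inclusive."""
    return [
        '("', minimum, '" + int(rand(1 + "', maximum, '" - "', minimum, '")))',
    ]


# Fragments for: &rand(1, 4)
perl_expr = "".join(generate_rand("1", "4"))
```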
  246. header body footer

    Finally, the last code generation step is the “footer”. Every Perl module should end with a true value, to signal that the module was loaded successfully. So we just add “1;” to the array.
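    Putting the three sections together could look like the sketch below. It is written in Python for illustration only; the header and body lines are placeholders (the package name `macroCompiled` is hypothetical), and the real header would contain the boilerplate, declarations and imports described earlier.

```python
def assemble(header, body, footer):
    """Concatenate the three sections and join them into one source string."""
    return "\n".join(header + body + footer) + "\n"


# Placeholder sections, each one a flat list of Perl lines.
header = ["package macroCompiled;", "my @values;"]
body = ['push @values, "foo";']
footer = ["1;"]  # the true value that ends the module

plugin_source = assemble(header, body, footer)
```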
  247. Thanks to Pedro Castilho, because he helped me a lot to build this compiler and to make this talk

    I really want to thank Pedro Castilho, because he helped me a lot to code the compiler and to prepare this talk.
  248. Images source
    - https://darkchiichan.deviantart.com/
    - https://www.newgrounds.com/art/view/shidoisnthere/tree-pixel-art-2

    Where to learn more
    - DSL & eDSL: http://bit.ly/quora-edsl
    - Syntax analysis: http://esprima.org/
    - About languages design x compiler
      - http://bit.ly/quora-language-x-compiler
      - http://bit.ly/quora-language-x-compiler-2

    These are the sources of the images I used in this talk, and some links to learn more about the subjects I talked about.
  249. Where to learn more
    - Very didactic material about how parser combinators work: http://theorangeduck.com/page/you-could-have-invented-parser-combinators
    - Very good material about parsers in general: https://tomassetti.me/guide-parsing-algorithms-terminology/
    - Parslet, a Ruby library for building parsers using PEG

    And more links. I highly recommend the second one, a guide to parsing, because the author explains the subject very well.
  250. Pagar.me Talks about this same subject in Portuguese
    - https://youtu.be/t77ThZNCJGY
    - https://youtu.be/q9T6Y2ZjE54

    I presented this same subject on Pagar.me Talks, the YouTube channel of the company I work for. Those talks are in Portuguese and cover a few more topics, such as tests and recursive grammars.
  251. http://bit.ly/asciinema-compiler

    I recorded an asciinema session in which I add a new command to the compiler and explain each step.
  252. THANK YOU! OBRIGADO!

    Thank you! I hope this short talk about languages and compilers has made you more curious about these subjects.