Write A Language In Less Than 10 Minutes

Using a programming language is fun but writing your own is even more fun. Let's see how we could do that with the awesome hoa/compiler library.

Julien BIANCHI

March 24, 2016

Transcript

  1. COMPILER
     « Compilers allow [us] to analyze and manipulate textual data. There [are] numerous usages. Hoa\Compiler offers to manipulate several compilers based on needs. » (Hoa\Compiler hack book)
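  2. WHAT WE WANT
     TML: Ten Minutes Language

     Program:
       @neuf égal 4 plus 5
       @trois égal @neuf divisé par 3
       @mon_age égal @trois multiplié par 10

       affiche J'ai @mon_age ans

     Output:
       J'ai 30 ans

     The keywords are French: égal (equals), plus (plus), moins (minus), multiplié par (multiplied by), divisé par (divided by), affiche (print). The program prints "I am 30 years old".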
  3. « A symbol represents the smallest lexical unit of a language, it is atomic and we will call it a lexeme (also often mentioned as a token). » (Hoa\Compiler hack book)
  4. TOKENS
     %skip  T_SPACE      \s
     %token T_OP_PLUS    plus
     %token T_OP_MINUS   moins
     %token T_OP_MULTI   multiplié par
     %token T_OP_DIVIDE  divisé par
     %token T_OP_EQUAL   égal
     %token T_FN_ECHO    affiche
     %token T_NUMBER     [1-9][0-9]?
     %token T_VAR        @[a-zA-Z_][a-zA-Z0-9_]*
     %token T_TEXT       [^@\s]+
  5. REGULAR EXPR.
     PCRE: Perl Compatible Regular Expressions

     \s          Whitespace character
     plus        The word "plus"
     [1-9]       A digit from 1 to 9
     [0-9]*      Zero or more digits
     [a-zA-Z_]   A character from a to z, A to Z, or _
     [^@\s]+     One or more characters that are neither @ nor whitespace
  6. « A language is a set of words. Each word is a sequence of symbols belonging to an alphabet. » (Hoa\Compiler hack book)
  7. WORDS
     égal @neuf plus 4 5

     $ cat foo2.math | vendor/bin/hoa compiler:pp math.pp 0 -s
     #  namespace  token name  token value  offset
     ---------------------------------------------
     0  default    T_OP_EQUAL  égal              0
     1  default    T_VAR       @neuf             6
     2  default    T_OP_PLUS   plus             12
     3  default    T_NUMBER    4                17
     4  default    T_NUMBER    5                19
     5  default    EOF                          21

     LEXICAL ANALYSIS
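
The lexical analysis above can be approximated in a few lines. The sketch below is not hoa/compiler; it is a minimal Python stand-in that reuses the slide's token patterns and assumes that tokens are tried in declaration order and that the first match wins. The names `TOKEN_SPECS` and `tokenize` are hypothetical, and the offsets shown on the slide are not reproduced.

```python
import re

# (token name, pattern, skipped?) in the same order as the slide's declarations.
TOKEN_SPECS = [
    ("T_SPACE", r"\s", True),
    ("T_OP_PLUS", r"plus", False),
    ("T_OP_MINUS", r"moins", False),
    ("T_OP_MULTI", r"multiplié par", False),
    ("T_OP_DIVIDE", r"divisé par", False),
    ("T_OP_EQUAL", r"égal", False),
    ("T_FN_ECHO", r"affiche", False),
    ("T_NUMBER", r"[1-9][0-9]?", False),
    ("T_VAR", r"@[a-zA-Z_][a-zA-Z0-9_]*", False),
    ("T_TEXT", r"[^@\s]+", False),
]
COMPILED = [(name, re.compile(pattern), skip) for name, pattern, skip in TOKEN_SPECS]


def tokenize(text):
    """Return (token name, token value) pairs, ending with an EOF marker."""
    tokens = []
    pos = 0
    while pos < len(text):
        # Assumption: the first declared token that matches at `pos` wins.
        for name, regex, skip in COMPILED:
            match = regex.match(text, pos)
            if match:
                if not skip:
                    tokens.append((name, match.group()))
                pos = match.end()
                break
        else:
            raise ValueError(f"Unrecognized input at offset {pos}: {text[pos]!r}")
    tokens.append(("EOF", ""))
    return tokens


for token in tokenize("égal @neuf plus 4 5\n"):
    print(token)
```

Because T_TEXT is declared last, a word such as "plus" is recognized as an operator before the catch-all text pattern gets a chance to match it.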
  8. RULES
     #math:
         ( expr() | assign() | fn() )+

     #expr:
         ( <T_NUMBER> | <T_VAR> ) operator() ( <T_NUMBER> | <T_VAR> )

     #assign:
         <T_VAR> ::T_OP_EQUAL:: ( expr() | <T_VAR> )

     #fn:
         call() ( <T_VAR> | <T_TEXT> )+

     operator:
         ( <T_OP_PLUS> | <T_OP_MINUS> | <T_OP_MULTI> | <T_OP_DIVIDE> )

     call:
         <T_FN_ECHO>
  9. BNF / EBNF: (EXTENDED) BACKUS-NAUR FORM
     « In computer science, Extended Backus–Naur Form (EBNF) is a family of metasyntax notations, any of which can be used to express a […] grammar. EBNF is used to make a formal description of a formal language which can be a computer programming language. They are extensions of the basic Backus–Naur Form (BNF) metasyntax notation. » (Wikipedia)
  10. « A set of rules is called a grammar. And so, a grammar represents a language! » (Hoa\Compiler hack book)
  11. GRAMMAR
      %skip  T_SPACE      \s
      %token T_OP_PLUS    plus
      %token T_OP_MINUS   moins
      %token T_OP_MULTI   multiplié par
      %token T_OP_DIVIDE  divisé par
      %token T_OP_EQUAL   égal
      %token T_FN_ECHO    affiche
      %token T_NUMBER     [1-9][0-9]?
      %token T_VAR        @[a-zA-Z_][a-zA-Z0-9_]*
      %token T_TEXT       [^@\s]+

      #math:
          ( expr() | assign() | fn() )+

      #expr:
          ( <T_NUMBER> | <T_VAR> ) operator() ( <T_NUMBER> | <T_VAR> )

      #assign:
          <T_VAR> ::T_OP_EQUAL:: ( expr() | <T_VAR> )

      #fn:
          call() ( <T_VAR> | <T_TEXT> )+

      operator:
          ( <T_OP_PLUS> | <T_OP_MINUS> | <T_OP_MULTI> | <T_OP_DIVIDE> )

      call:
          <T_FN_ECHO>
  12. « [An AST (Abstract Syntax Tree)] represents our textual data after the analysis. One advantage is that it can [be] visited […], which allows us to add new constraints which can not be expressed in the grammar […]. » (Hoa\Compiler hack book)
  13. AST
      $ cat foo.math | vendor/bin/hoa compiler:pp math.pp 0 -v dump
      > #math
      > > #assign
      > > > token(T_VAR, @neuf)
      > > > #expr
      > > > > token(T_NUMBER, 4)
      > > > > token(T_OP_PLUS, plus)
      > > > > token(T_NUMBER, 5)
      > > #assign
      > > > token(T_VAR, @trois)
      > > > #expr
      > > > > token(T_VAR, @neuf)
      > > > > token(T_OP_DIVIDE, divisé par)
      > > > > token(T_NUMBER, 3)
      > > #assign
      > > > token(T_VAR, @mon_age)
      > > > #expr
      > > > > token(T_VAR, @trois)
      > > > > token(T_OP_MULTI, multiplié par)
      > > > > token(T_NUMBER, 10)
      > > #fn
      > > > token(T_FN_ECHO, affiche)
      > > > token(T_TEXT, J'ai)
      > > > token(T_VAR, @mon_age)
      > > > token(T_TEXT, ans)
      > > > token(T_TEXT, )

      SYNTACTIC ANALYSIS
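
To make the step from tokens to tree more concrete, here is a hand-written recursive-descent sketch of the same grammar in Python. It is only an illustration and does not reflect how hoa/compiler works internally. The `Parser` class, its method names, and the tuple-based tree shape are all hypothetical. As in the dump above, the `::T_OP_EQUAL::` token is consumed but left out of the tree, and the `operator` and `call` rules do not create nodes of their own.

```python
OPERATORS = {"T_OP_PLUS", "T_OP_MINUS", "T_OP_MULTI", "T_OP_DIVIDE"}
OPERANDS = {"T_NUMBER", "T_VAR"}
FN_ARGS = {"T_VAR", "T_TEXT"}


class Parser:
    """Consumes a list of (token name, token value) pairs."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self, ahead=0):
        index = self.pos + ahead
        return self.tokens[index][0] if index < len(self.tokens) else "EOF"

    def take(self, names):
        if self.peek() not in names:
            raise SyntaxError(f"Expected one of {sorted(names)}, got {self.peek()}")
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def math(self):
        # #math: ( expr() | assign() | fn() )+
        children = []
        while self.peek() != "EOF":
            children.append(self.statement())
        if not children:
            raise SyntaxError("Expected at least one statement")
        return ("#math", children)

    def statement(self):
        # Two tokens of lookahead are enough to pick the right rule.
        if self.peek() == "T_FN_ECHO":
            return self.fn()
        if self.peek() == "T_VAR" and self.peek(1) == "T_OP_EQUAL":
            return self.assign()
        return self.expr()

    def expr(self):
        # #expr: operand operator() operand
        return ("#expr", [self.take(OPERANDS), self.take(OPERATORS), self.take(OPERANDS)])

    def assign(self):
        # #assign: <T_VAR> ::T_OP_EQUAL:: ( expr() | <T_VAR> )
        target = self.take({"T_VAR"})
        self.take({"T_OP_EQUAL"})  # consumed, not kept in the tree
        value = self.expr() if self.peek(1) in OPERATORS else self.take({"T_VAR"})
        return ("#assign", [target, value])

    def fn(self):
        # #fn: call() ( <T_VAR> | <T_TEXT> )+
        children = [self.take({"T_FN_ECHO"}), self.take(FN_ARGS)]
        while self.peek() in FN_ARGS:
            children.append(self.take(FN_ARGS))
        return ("#fn", children)


def parse(tokens):
    return Parser(tokens).math()


tokens = [("T_VAR", "@neuf"), ("T_OP_EQUAL", "égal"), ("T_NUMBER", "4"),
          ("T_OP_PLUS", "plus"), ("T_NUMBER", "5"), ("EOF", "")]
print(parse(tokens))
```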
  14. VISITORS (Expr)
      public function visit(Visitor\Element $element, &$handle = null, $eldnah = null)
      {
          // Build a PHP arithmetic expression from the children of an #expr node.
          $expr = '';

          foreach ($element->getChildren() as $child) {
              switch ($child->getValueToken()) {
                  case 'T_NUMBER':
                      $expr .= $child->getValueValue();
                      break;

                  case 'T_VAR':
                      // A variable must have been assigned before it is read.
                      if (isset($this->variables[$child->getValueValue()]) === false) {
                          throw new \LogicException('Undefined variable ' . $child->getValueValue());
                      }

                      $expr .= $this->variables[$child->getValueValue()];
                      break;

                  case 'T_OP_PLUS':
                  case 'T_OP_MINUS':
                  case 'T_OP_MULTI':
                  case 'T_OP_DIVIDE':
                      // OPERATORS is a class constant that is not shown on the slide.
                      $expr .= static::OPERATORS[$child->getValueToken()];
                      break;

                  default:
                      // Any other child is a nested node: visit it recursively.
                      $expr .= $child->accept($this, $handle, $eldnah);
              }
          }

          // Evaluate the assembled expression string.
          return eval('return ' . $expr . ';');
      }
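
The visitor idea can also be sketched without `eval`. The Python example below walks a tuple-based tree shaped like the AST dump and maps each operator token to a function. It is an assumption-laden illustration, not the deck's implementation: the tree layout and the names `operand`, `visit_expr`, and `run` are hypothetical, and a bare `#expr` statement is evaluated and then discarded because the deck does not say what should happen to it.

```python
import operator

# Token name -> arithmetic function (stands in for the OPERATORS constant).
OPERATORS = {
    "T_OP_PLUS": operator.add,
    "T_OP_MINUS": operator.sub,
    "T_OP_MULTI": operator.mul,
    "T_OP_DIVIDE": operator.truediv,
}


def operand(token, variables):
    """Resolve a T_NUMBER or T_VAR token to a number."""
    name, value = token
    if name == "T_NUMBER":
        return int(value)
    if value not in variables:
        raise LookupError("Undefined variable " + value)
    return variables[value]


def visit_expr(node, variables):
    """Evaluate an ('#expr', [left, operator, right]) node."""
    _, (left, op, right) = node
    result = OPERATORS[op[0]](operand(left, variables), operand(right, variables))
    # Keep whole results as integers so that 9 / 3 is shown as 3, not 3.0.
    if isinstance(result, float) and result.is_integer():
        return int(result)
    return result


def run(ast, variables=None):
    """Execute a ('#math', [...]) tree and return the printed lines."""
    variables = {} if variables is None else variables
    output = []
    for kind, children in ast[1]:
        if kind == "#assign":
            target, value = children
            if value[0] == "#expr":
                variables[target[1]] = visit_expr(value, variables)
            else:
                variables[target[1]] = operand(value, variables)
        elif kind == "#fn":
            words = [str(operand(t, variables)) if t[0] == "T_VAR" else t[1]
                     for t in children[1:]]
            output.append(" ".join(words))
        elif kind == "#expr":
            visit_expr((kind, children), variables)  # evaluated, result discarded
    return output


program = ("#math", [
    ("#assign", [("T_VAR", "@neuf"),
                 ("#expr", [("T_NUMBER", "4"), ("T_OP_PLUS", "plus"), ("T_NUMBER", "5")])]),
    ("#assign", [("T_VAR", "@trois"),
                 ("#expr", [("T_VAR", "@neuf"), ("T_OP_DIVIDE", "divisé par"), ("T_NUMBER", "3")])]),
    ("#assign", [("T_VAR", "@mon_age"),
                 ("#expr", [("T_VAR", "@trois"), ("T_OP_MULTI", "multiplié par"), ("T_NUMBER", "10")])]),
    ("#fn", [("T_FN_ECHO", "affiche"), ("T_TEXT", "J'ai"), ("T_VAR", "@mon_age"), ("T_TEXT", "ans")]),
])
print(run(program))
```

Dispatching through a dictionary of functions avoids building and evaluating a code string, at the cost of handling each operator explicitly.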
  15. CONSTRAINTS
      <?php

      namespace jubianchi\TML\Visitor\Node\Variable;

      use jubianchi\TML\Visitor\Node;

      class Right extends Node\Variable
      {
          protected function check($name, $variable)
          {
              if (array_key_exists($name, $this->variables) === false) {
                  throw new \LogicException('Undefined variable ' . $variable->getValueValue());
              }

              return $this;
          }
      }
  16. THE BIG PICTURE
      .
      ├── bin/
      │   └── tml
      ├── composer.json
      ├── examples/
      │   └── *.tml
      └── src/
          ├── Visitor/
          │   ├── Node/
          │   │   ├── Assign.php
          │   │   ├── Expr.php
          │   │   └── Fn.php
          │   └── TML.php
          ├── Visitor.php
          └── tml.pp
  17. REPL
      $ bin/tml
      -------------------------------------
      Use &<file path> to import and execute a file
      Use ?<tml> to see the produced AST
      -------------------------------------
      > &examples/age.tml
      < J'ai 30 ans
      > ?4 plus 5
      < > #tml
      < > > #expr
      < > > > token(T_NUMBER, 4)
      < > > > token(T_OP_PLUS, plus)
      < > > > #expr
      < > > > > token(T_NUMBER, 5)

      READ - PROMPT - EVAL - LOOP
  18. WHAT WE HAVE
      • A grammar (~60 LOC)
      • A set of visitors (~250 LOC)
      • An interpreter/REPL (~100 LOC)
      • Some unit tests (~90 LOC, 15k assertions)
      • A small but working language!
  19. USE CASES
      • Create DSLs
      • Ruler (hoa/ruler, rulerz, …)
      • Grammar-based testing
      • …
      • Fun!