
Creating a language using only assembly language



Koichi Nakamura

June 11, 2015

Transcript

  1. Creating a language using only assembly language. Kernel/VM Tanken-tai #11

    Koichi Nakamura
  2. Code • https://github.com/nineties/amber

  3. Profile • Koichi Nakamura • twitter: @9_ties • developing an IoT device • http://idein.jp
  4. I was a compiler writer • wrote compilers in student lab courses • MinCaml compiler in OCaml • MinCaml compiler in Haskell https://github.com/nineties/Choco • studied optimizing compilers in graduate school • wrote compilers for special-purpose CPUs
  5. Wanted to create my own language • name: “Amber” • It was “rowl” at first. • I wanted to enjoy the creation process itself. • How could I?
  6. Let’s play with limitations 1. Use assembly language only (no high-level languages like C). 2. No libraries (libc etc.). 3. No code generators (flex/bison etc.).
  7. Strategy: Bootstrapping • Write language 1 in assembly language • Write a slightly higher-level language 2 in language 1 • Write Amber in language k • Write Amber in Amber • here now
  8. What’s the point? • For fun. • To cultivate knowledge, techniques, and know-how of compiler writing. • But it’s not a cost-effective study method... • To feel a sense of gratitude and respect for predecessors.
  9. I’ll show the outline of my development process.

  10. 1. Created “rowl0” in assembly language

  11. Made a language slightly higher-level than assembly • language name: rowl0 • compiler name: rlc
  12. From regular expressions of tokens

  13. Wrote a state transition diagram

  14. Converted to jump table

  15. And wrote the lexer
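
The steps on slides 12 to 15 (token regular expressions, state transition diagram, jump table, lexer) can be illustrated with a small table-driven lexer. This is only a sketch in Python rather than assembly: the token set, the state names, and the functions `char_class` and `tokenize` are assumptions made for illustration and are not taken from rowl0's actual lexer.

```python
# Hypothetical token set (not rowl0's): IDENT = [A-Za-z_][A-Za-z0-9_]*,
# INT = [0-9]+, SYM = one of the single characters in SYMBOLS.
START, IN_IDENT, IN_INT = 0, 1, 2             # states of the transition diagram
ACCEPT = {IN_IDENT: "IDENT", IN_INT: "INT"}   # accepting state -> token kind
SYMBOLS = "+-*/(){};=,"


def char_class(ch):
    """Map a character to a column of the transition table."""
    if "a" <= ch <= "z" or "A" <= ch <= "Z" or ch == "_":
        return "letter"
    if "0" <= ch <= "9":
        return "digit"
    return "other"


# TABLE[state][class] -> next state; a missing entry ends the current token.
TABLE = {
    START:    {"letter": IN_IDENT, "digit": IN_INT},
    IN_IDENT: {"letter": IN_IDENT, "digit": IN_IDENT},
    IN_INT:   {"digit": IN_INT},
}


def tokenize(src):
    tokens, i = [], 0
    while i < len(src):
        if src[i].isspace():
            i += 1
            continue
        if src[i] in SYMBOLS:
            tokens.append(("SYM", src[i]))
            i += 1
            continue
        state, start = START, i
        while i < len(src):
            nxt = TABLE[state].get(char_class(src[i]))
            if nxt is None:          # no transition: the token ends here
                break
            state, i = nxt, i + 1
        if state not in ACCEPT:
            raise SyntaxError(f"unexpected character {src[start]!r} at {start}")
        tokens.append((ACCEPT[state], src[start:i]))
    return tokens
```

In assembly the dictionary lookup would typically become an indexed jump through a table of code addresses, which is the "jump table" step on slide 14.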

  16. Wrote rowl0’s syntax in BNF

  17. Then wrote the parser • recursive descent method

  18. Generates code while parsing • writing memory management is difficult at this stage • generates code without building syntax trees
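
A minimal sketch of the idea on slides 16 to 18: a recursive descent parser that emits code while it parses and never builds a syntax tree. The arithmetic grammar, the stack-machine mnemonics (`push`, `add`, `sub`, `mul`, `div`), and the function name `compile_expr` are assumptions for illustration; rowl0's real grammar and output (x86 assembly) are different.

```python
import re


def compile_expr(src):
    # Simplified tokenizer: characters outside this set are silently skipped.
    tokens = re.findall(r"\d+|[-+*/()]", src)
    pos = 0
    code = []            # instructions are appended during parsing

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def expr():          # expr ::= term (("+" | "-") term)*
        term()
        while peek() in ("+", "-"):
            op = advance()
            term()
            code.append("add" if op == "+" else "sub")

    def term():          # term ::= factor (("*" | "/") factor)*
        factor()
        while peek() in ("*", "/"):
            op = advance()
            factor()
            code.append("mul" if op == "*" else "div")

    def factor():        # factor ::= INT | "(" expr ")"
        tok = advance()
        if tok == "(":
            expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
        elif tok is not None and tok.isdigit():
            code.append(f"push {tok}")
        else:
            raise SyntaxError(f"unexpected token {tok!r}")

    expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return code
```

Because each operator is emitted right after its operands have been parsed, the only state the compiler keeps is the call stack of the parser itself, which is one reason this style suits a compiler with no memory manager.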
  19. Completed the first language “rowl0”! • no symbol tables • function params must be p0, p1, p2, ... • to use local variables, allocate stack memory with “allocate(n)”, then use x0, x1, x2, ...
  20. 2. Created a LISP “rowl-core” in “rowl0”

  21. Made a LISP temporarily • language name: rowl-core • interpreter name: rlci • easy to implement • productivity improvement
  22. Wrote lexer and parser

  23. Writing became more comfortable

  24. Wrote eval
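
To give a feeling for why a LISP is "easy to implement" (slides 21 to 24), here is a very small reader and `eval` in Python. It is a sketch under assumptions: the special forms (`if`, `lambda`, `define`), the primitive set, and the names `read`, `evaluate`, and `GLOBAL_ENV` are chosen for illustration and do not reflect rowl-core's actual design.

```python
import operator


def read(src):
    """Parse one s-expression into nested Python lists, ints and strings."""
    tokens = src.replace("(", " ( ").replace(")", " ) ").split()

    def parse(i):
        tok = tokens[i]
        if tok == "(":
            lst, i = [], i + 1
            while tokens[i] != ")":
                node, i = parse(i)
                lst.append(node)
            return lst, i + 1
        try:
            return int(tok), i + 1
        except ValueError:
            return tok, i + 1        # anything else is a symbol

    node, _ = parse(0)
    return node


# Hypothetical primitives; a real interpreter would have many more.
GLOBAL_ENV = {"+": operator.add, "-": operator.sub,
              "*": operator.mul, "<": operator.lt}


def evaluate(x, env):
    if isinstance(x, str):           # symbol: look it up
        return env[x]
    if isinstance(x, int):           # literal: evaluates to itself
        return x
    head = x[0]
    if head == "if":
        _, cond, then, alt = x
        return evaluate(then if evaluate(cond, env) else alt, env)
    if head == "lambda":
        _, params, body = x
        return lambda *args: evaluate(body, {**env, **dict(zip(params, args))})
    if head == "define":
        _, name, expr = x
        env[name] = evaluate(expr, env)
        return env[name]
    fn = evaluate(head, env)         # ordinary application
    return fn(*[evaluate(arg, env) for arg in x[1:]])
```

A usage example: after evaluating `(define fact (lambda (n) (if (< n 2) 1 (* n (fact (- n 1))))))` in a copy of `GLOBAL_ENV`, evaluating `(fact 5)` in the same environment yields 120.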

  25. No memory management • mmap and munmap are the only functions (no malloc, free) 1. Does not reclaim garbage memory 2. Allocates fresh memory for new objects 3. So, it will die eventually • As long as it can compile the next-generation compilers, that is no problem.
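
The allocation policy on slide 25 (always hand out fresh memory, never reclaim) is essentially a bump allocator. The sketch below models it with Python's `mmap` module for anonymous mappings; the class name `BumpAllocator`, the chunk size, the 8-byte alignment, and the `(chunk index, offset)` address format are assumptions for illustration, not details of rlci.

```python
import mmap


class BumpAllocator:
    """Allocate by advancing an offset; nothing is ever freed."""

    def __init__(self, chunk_size=1 << 16):
        self.chunk_size = chunk_size
        self.chunks = []              # every mapped region, kept until exit
        self.offset = chunk_size      # forces a mapping on the first alloc

    def alloc(self, nbytes):
        nbytes = (nbytes + 7) & ~7    # round up to a multiple of 8 (assumed)
        if nbytes > self.chunk_size:
            raise MemoryError("object larger than one chunk")
        if self.offset + nbytes > self.chunk_size:
            # Current chunk is exhausted: map a fresh anonymous region.
            self.chunks.append(mmap.mmap(-1, self.chunk_size))
            self.offset = 0
        addr = (len(self.chunks) - 1, self.offset)
        self.offset += nbytes
        return addr
```

Memory use only grows, so a long-running program eventually exhausts the address space, which matches the "it will die eventually" remark. For a process that only has to survive one compilation, that trade-off is acceptable.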
  26. Completed a LISP “rowl-core”! • rich functions • lambda, map etc. • macros
  27. 3. Created a language for writing the VM in “rowl-core”

  28. Decided to create a VM for the next generation • Created a language just for writing the virtual machine. • Defined it as a DSL in the LISP “rowl-core” • No need to write a lexer and parser!
  29. Wrote the compiler like this

  30. Now I could use higher-order functions • productivity improved a lot
  31. 4. Created a virtual machine “rlvm” with the DSL

  32. Wrote the VM code in the DSL like this

  33. Wrote a garbage collector • Copying GC • Cheney’s algorithm
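
For readers unfamiliar with Cheney's algorithm, the sketch below shows its core: objects reachable from the roots are copied into a new space, and a `scan` index chases the copies breadth-first, so no recursion or mark stack is needed. The heap model is an assumption made for illustration (a flat Python list where each object is a size header followed by that many fields, with `Ptr` marking references); rlvm's real object layout is not shown in the transcript.

```python
class Ptr:
    """A heap reference; plain ints stored in fields are immediate values."""

    def __init__(self, addr):
        self.addr = addr


FORWARDED = "fwd"    # header tag marking an object that was already copied


def collect(from_space, roots):
    """Copy everything reachable from roots; return (to_space, new_roots)."""
    to_space = []

    def copy(ptr):
        header = from_space[ptr.addr]
        if isinstance(header, tuple) and header[0] == FORWARDED:
            return Ptr(header[1])                 # reuse the earlier copy
        size = header
        new_addr = len(to_space)                  # the "free" pointer
        to_space.extend(from_space[ptr.addr: ptr.addr + 1 + size])
        from_space[ptr.addr] = (FORWARDED, new_addr)
        return Ptr(new_addr)

    new_roots = [copy(r) for r in roots]
    scan = 0
    while scan < len(to_space):                   # scan catches up with free
        size = to_space[scan]
        for i in range(scan + 1, scan + 1 + size):
            if isinstance(to_space[i], Ptr):
                to_space[i] = copy(to_space[i])
        scan += 1 + size
    return to_space, new_roots
```

The forwarding entry left in the old space is what keeps shared and cyclic structures intact: a second reference to the same object finds the forwarding address instead of making another copy.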

  34. Wrote primitive functions

  35. An application of meta-programming • The table of instructions of the VM
  36. Generates various code from the table • reflects changes to instructions automatically • It is very easy to build this kind of mechanism with LISP • Generated from vm_instructions: eval loop of the VM, linker, disassembler, assembler, assembler used internally in Amber
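
The table-driven approach on slides 35 and 36 can be sketched as follows: one instruction table is the single source of truth, and the assembler, disassembler, and interpreter loop are all derived from it. The four instructions and the encoding (opcode followed by integer operands) are hypothetical; the talk used LISP macros over a table named vm_instructions with 186 instructions, whereas this Python sketch uses plain functions.

```python
# (mnemonic, operand count, implementation). Hypothetical instructions only.
INSTRUCTIONS = [
    ("push", 1, lambda st, n: st.append(n)),
    ("add",  0, lambda st: st.append(st.pop() + st.pop())),
    ("mul",  0, lambda st: st.append(st.pop() * st.pop())),
    ("neg",  0, lambda st: st.append(-st.pop())),
]

# Opcode numbers are derived from table order, never written by hand.
OPCODE = {name: i for i, (name, _, _) in enumerate(INSTRUCTIONS)}


def assemble(text):
    code = []
    for line in text.strip().splitlines():
        name, *args = line.split()
        _, arity, _ = INSTRUCTIONS[OPCODE[name]]
        if len(args) != arity:
            raise ValueError(f"{name} expects {arity} operand(s)")
        code.append(OPCODE[name])
        code.extend(int(a) for a in args)
    return code


def disassemble(code):
    lines, pc = [], 0
    while pc < len(code):
        name, arity, _ = INSTRUCTIONS[code[pc]]
        operands = code[pc + 1: pc + 1 + arity]
        lines.append(" ".join([name, *map(str, operands)]))
        pc += 1 + arity
    return "\n".join(lines)


def run(code):
    stack, pc = [], 0
    while pc < len(code):
        _, arity, impl = INSTRUCTIONS[code[pc]]
        impl(stack, *code[pc + 1: pc + 1 + arity])
        pc += 1 + arity
    return stack
```

Adding a row to `INSTRUCTIONS` is enough for all three tools to understand the new instruction, which is the "reflects changes automatically" property the slide describes.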
  37. Wrote instruction sets

  38. Floating-point arithmetic

  39. Multi-precision integer arithmetic

  40. Exception handling

  41. Delimited Continuation

  42. Completed the virtual machine “rlvm”! • 186 instructions • stack machine • copying GC • exception handling • shift/reset delimited continuations • floating-point arithmetic, multi-precision arithmetic
  43. 5. Created a tool chain for “rlvm”

  44. There were no programming tools for “rlvm” • Created a tool chain for the VM • a programming language “rowl1” • its compiler • assembler • disassembler • linker
  45. Wrote “rowl1”, the assembler, and the compiler • Defined as a DSL in “rowl-core”
  46. Wrote the linker and disassembler • Wrote these tools in “rowl1”, so they run on “rlvm” • The linker requires GC since it uses a lot of memory
  47. Example outputs of the disassembler

  48. Ready to program on “rlvm”! • writing programs for rlvm • disassembling byte-code • supports separate compilation • Reached the starting line
  49. 6. Wrote “Amber” in “rowl1”

  50. Started developing “Amber” • dynamic scripting language • instance-based object-oriented system • runs on rlvm
  51. Wrote an assembler • The former assembler assembles code ahead of time and runs on rlci • This assembler assembles code just in time and runs on rlvm • fills in addresses by backpatching
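
Backpatching, mentioned on slide 51, lets an assembler emit code in a single pass even when a jump targets a label that has not been seen yet: it writes a placeholder, remembers the position, and fills in the address once the label is known. The sketch below uses a made-up textual instruction format and a flat list as the code buffer; neither is Amber's actual byte-code.

```python
def assemble(lines):
    code = []        # output buffer; an address is an index into this list
    labels = {}      # label name -> address
    fixups = []      # (position in code, label name) still to be patched

    for line in lines:
        if line.endswith(":"):               # label definition
            labels[line[:-1]] = len(code)
            continue
        op, *args = line.split()
        code.append(op)
        for arg in args:
            if arg.lstrip("-").isdigit():    # integer operand
                code.append(int(arg))
            elif arg in labels:              # backward reference: known now
                code.append(labels[arg])
            else:                            # forward reference: patch later
                fixups.append((len(code), arg))
                code.append(None)

    for pos, name in fixups:                 # the backpatching step
        if name not in labels:
            raise NameError(f"undefined label {name!r}")
        code[pos] = labels[name]
    return code
```

For the input `["loop:", "push 1", "jz end", "jmp loop", "end:", "halt"]`, the operand of `jz` is first emitted as a placeholder and patched to 6 once `end:` has been seen, while `jmp loop` resolves immediately to 0.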
  52. Wrote the object system • slots, messages and parent delegation
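
An instance-based object system with slots, messages, and parent delegation (slide 52) can be modelled in a few lines. This is a generic prototype-style sketch, assuming single-parent delegation and a `send` operation that calls callable slots with the original receiver; the names `Obj`, `lookup`, and `send` are illustrative and Amber's real object model may differ.

```python
class Obj:
    """An object is a bag of slots plus an optional parent to delegate to."""

    def __init__(self, parent=None, **slots):
        self.parent = parent
        self.slots = dict(slots)

    def lookup(self, name):
        obj = self
        while obj is not None:               # walk up the parent chain
            if name in obj.slots:
                return obj.slots[name]
            obj = obj.parent
        raise AttributeError(name)

    def send(self, name, *args):
        value = self.lookup(name)
        # Methods receive the original receiver, not the object that
        # happens to own the slot; that is what makes delegation work.
        return value(self, *args) if callable(value) else value


point = Obj(x=1, y=2,
            norm1=lambda self: abs(self.send("x")) + abs(self.send("y")))
moved = Obj(parent=point, x=-5)              # overrides x, inherits the rest
```

Here `moved` has no `norm1` or `y` slot of its own, so both are found on `point`, but `x` is read from `moved` because the method runs with `moved` as the receiver.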

  53. Wrote Amber’s core features on top of the object system • dynamic pattern-matching engine • mechanism of partial function fusion
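
The transcript does not explain how the pattern-matching engine or partial function fusion work, so the following is only one plausible reading, stated as an assumption: a partial function is defined only for values that match its pattern, and fusing several partial functions produces one function that tries each in order. The pattern syntax (`"?name"` for variables, tuples for structure), the `NO_MATCH` sentinel, and the names `match`, `partial`, and `fuse` are all hypothetical.

```python
NO_MATCH = object()      # sentinel returned when a partial function declines


def match(pattern, value, bindings):
    """Structural match; "?name" patterns bind the value to name."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        bindings[pattern[1:]] = value
        return True
    if isinstance(pattern, tuple):
        return (isinstance(value, tuple) and len(pattern) == len(value)
                and all(match(p, v, bindings) for p, v in zip(pattern, value)))
    return pattern == value                  # literal pattern


def partial(pattern, body):
    """Build a function that is defined only where pattern matches."""
    def f(value):
        bindings = {}
        return body(**bindings) if match(pattern, value, bindings) else NO_MATCH
    return f


def fuse(*fs):
    """Combine partial functions; the first one that accepts the value wins."""
    def fused(value):
        for f in fs:
            result = f(value)
            if result is not NO_MATCH:
                return result
        return NO_MATCH
    return fused


simplify = fuse(
    partial(("add", 0, "?x"), lambda x: x),      # 0 + x  ->  x
    partial(("mul", 1, "?x"), lambda x: x),      # 1 * x  ->  x
    partial("?e", lambda e: e),                  # otherwise unchanged
)
```

Because the result of `fuse` is itself a partial function, new cases can be added later without editing existing ones, which would suit a language whose syntax and macros are extended at run time.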
  54. Wrote the compiler • Made the Amber compiler one of Amber’s objects • Amber’s core system (diagram): VM, object system, pattern-matching engine, compiler; labels: matching of syntax tree, resource management
  55. Wrote closure-conversion

  56. Wrote parsers • compiles parsers at run-time • each parser is an ordinary Amber object (closure) • Amber core system (diagram): VM, object system, pattern-matching engine, compiler; label: compile parsers
  57. Very simple syntax 1. Literals are expressions. 2. For a symbol h and expressions e1, ..., en (n >= 0), h{e1, ..., en} is an expression. 3. There is no other form of Amber expression.
  58. Used the Packrat parsing method • scannerless
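
The core syntax from slide 57 is small enough to parse with a scannerless packrat parser in a few dozen lines. In the sketch below, each rule is a function of the input position that works directly on characters (no separate lexer) and is memoized, so a rule is evaluated at most once per position. Restricting literals to decimal integers and symbols to letters and underscores is a simplifying assumption; the function name `parse` and the tuple representation of results are also illustrative.

```python
from functools import lru_cache


def parse(src):
    @lru_cache(maxsize=None)
    def ws(pos):                 # skip whitespace, return the new position
        while pos < len(src) and src[pos].isspace():
            pos += 1
        return pos

    @lru_cache(maxsize=None)
    def literal(pos):            # literal <- [0-9]+   (assumed literal form)
        end = pos
        while end < len(src) and src[end].isdigit():
            end += 1
        return (int(src[pos:end]), end) if end > pos else None

    @lru_cache(maxsize=None)
    def symbol(pos):             # symbol <- [A-Za-z_]+
        end = pos
        while end < len(src) and (src[end].isalpha() or src[end] == "_"):
            end += 1
        return (src[pos:end], end) if end > pos else None

    @lru_cache(maxsize=None)
    def expr(pos):               # expr <- literal / symbol "{" args? "}"
        pos = ws(pos)
        lit = literal(pos)
        if lit is not None:
            return lit
        sym = symbol(pos)
        if sym is None:
            return None
        head, pos = sym
        pos = ws(pos)
        if pos >= len(src) or src[pos] != "{":
            return None
        args, pos = [], ws(pos + 1)
        if pos < len(src) and src[pos] != "}":
            while True:          # args <- expr ("," expr)*
                item = expr(pos)
                if item is None:
                    return None
                arg, pos = item
                args.append(arg)
                pos = ws(pos)
                if pos < len(src) and src[pos] == ",":
                    pos += 1
                else:
                    break
        if pos >= len(src) or src[pos] != "}":
            return None
        return (head, tuple(args)), pos + 1

    result = expr(0)
    if result is None or ws(result[1]) != len(src):
        raise SyntaxError("parse error")
    return result[0]
```

For example, `parse("add{1, mul{2, 3}, nil{}}")` returns `("add", (1, ("mul", (2, 3)), ("nil", ())))`.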

  59. Encoding/decoding floating-point literals was difficult • wrote them myself (no strtod, sprintf) because of the “no libc” limitation • requires the multi-precision integer arithmetic which I wrote earlier • “3.14” ↔ 0x40091eb851eb851f
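
To show why decoding a literal such as "3.14" needs multi-precision integers, here is a sketch of a correctly rounded decimal-to-binary64 conversion that uses only exact integer arithmetic. Python's built-in arbitrary-precision `int` stands in for the hand-written multi-precision routines of the talk. The function name is hypothetical, and the sketch assumes a non-negative literal without an exponent part whose value is zero or in the normal binary64 range; it is not the algorithm used in Amber.

```python
def decimal_to_double_bits(text):
    """Return the IEEE 754 binary64 bit pattern of a decimal literal."""
    int_part, _, frac_part = text.partition(".")
    num = int(int_part + frac_part)      # "3.14" -> 314
    den = 10 ** len(frac_part)           # value is exactly num / den
    if num == 0:
        return 0

    # Find k such that 2**52 <= num / (den * 2**k) < 2**53.
    k = num.bit_length() - den.bit_length() - 53
    while True:
        n, d = (num << -k, den) if k < 0 else (num, den << k)
        q, r = divmod(n, d)              # exact quotient and remainder
        if q >= 1 << 53:
            k += 1
        elif q < 1 << 52:
            k -= 1
        else:
            break

    # Round to nearest, ties to even, using the exact remainder.
    if 2 * r > d or (2 * r == d and q & 1):
        q += 1
        if q == 1 << 53:                 # rounding overflowed the significand
            q, k = q >> 1, k + 1

    exponent = k + 52 + 1023             # biased exponent
    if not 0 < exponent < 2047:
        raise ValueError("outside the normal binary64 range")
    return (exponent << 52) | (q - (1 << 52))
```

The intermediate value `num << -k` can be far wider than 64 bits (for "3.14" it is already about 60 bits, and literals with many digits grow much larger), which is where the multi-precision arithmetic comes in. `hex(decimal_to_double_bits("3.14"))` gives `0x40091eb851eb851f`, the value shown on the slide.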
  60. The Amber interpreter is complete! • dynamic scripting language • runs on rlvm • instance-based object-oriented system • dynamic pattern-matching engine • partial function fusion • lexical closures • I got a modern programming language!
  61. 7. Created Amber’s standard library

  62. Amber has strong self-extensibility • Amber’s simple syntax is extended in a standard library • amber/lib/syntax/parse.ab • Builds its syntax during the boot sequence
  63. Only has very simple syntax at first • used string literals for comments because there is no syntax for comments
  64. Defines a syntax for defining syntaxes

  65. Defines Amber’s syntax with the syntax

  66. Builds macro system

  67. Gives meanings to syntaxes by macros

  68. Now Amber has a rich syntax

  69. Extends object system

  70. Now Amber has a rich object system • Inheritance, mix-ins etc.

  71. Development is now suspended • No plans for further updates • Try the following commands to invoke the Amber shell (Linux only) • See the output of the make command

    % git clone https://github.com/nineties/amber.git
    % cd amber
    % make; sudo make install
    % amber
  72. Summary • (diagram of languages and tools: rowl0, rlc, rlci, rowl-core, language for writing the VM, rlvm, rowl1, linker, disassembler, compilers, Amber interpreter; edge labels: impl., run, self-extension) • I was able to reach a relatively high-level language. I feel satisfied.