Slide 1

Slide 1 text

EgoVM

Slide 2

Slide 2 text

motivation
I learn by doing; reading books doesn’t work for me
20 years of reading about, using and exploring VMs, though not as a full-time job
time to understand, and to apologize to C

Slide 3

Slide 3 text

virtual machines are everywhere
LISP
Erlang
Smalltalk
JVM
CLR
V8

Slide 4

Slide 4 text

managed runtime
memory safety
operation safety
control safety

Slide 5

Slide 5 text

memory safety
ensures that a certain type of data in the memory always follows the restrictions of that type (like an array never has elements out of bounds)
Xiao-Feng Li
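A minimal sketch of what the array example means in C terms, assuming a hypothetical `CheckedArray` type and `checked_get` helper (neither is taken from EgoVM): the length travels with the data, and every read is validated against it before memory is touched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical array type: the length is stored next to the data. */
typedef struct {
    size_t   length;
    int32_t *items;
} CheckedArray;

/* Reads items[index] into *out. Returns false instead of reading
 * out of bounds, so the caller can raise a runtime error. */
bool checked_get(const CheckedArray *a, size_t index, int32_t *out) {
    assert(a != NULL && out != NULL);
    if (index >= a->length) {
        return false;
    }
    *out = a->items[index];
    return true;
}
```

A managed runtime would turn the `false` result into an exception or a VM-level error rather than leave it to the caller.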

Slide 6

Slide 6 text

operation safety
ensures that the operation on a certain type always follows the restrictions of that type
Xiao-Feng Li

Slide 7

Slide 7 text

control safety
ensures that the flow of code execution never reaches any point where it either gets stuck or goes wild (jumps to a malicious code segment)
Xiao-Feng Li

Slide 8

Slide 8 text

C is a dynamic language
everything is a memory address, or the address of a memory address

Slide 9

Slide 9 text

illusion
memory is unlimited
types exist
virtual machine IS operating system

Slide 10

Slide 10 text

virtual machines are a lie

Slide 11

Slide 11 text

we all accept it and, unfortunately, believe in it

Slide 12

Slide 12 text

agenda
EgoVM: what’s in it
oops, object layouts and pointer arithmetic
metaspace, methods, fields and classes
interpreter, stack frame and local variables
call sites, linking and method dispatch
garbage collection

Slide 13

Slide 13 text

EgoVM: naked truth
single threaded
interpreted only
mark and sweep garbage collected
no binary form of bytecode

Slide 14

Slide 14 text

the choice

Slide 15

Slide 15 text

specialization or universal bytecodes
option 1: have lots of specialized bytecodes, one for each kind of type
option 2: have a small set of bytecodes and carry the type as an operand

Slide 16

Slide 16 text

the choice
I chose the second option, looking at the amount of pain, sweat and misery in the JVM with Project Valhalla.

Slide 17

Slide 17 text

bytecode
PUSH
POP
GET
PUT
NEW
CALL

Slide 18

Slide 18 text

PUSH
Pushes a value on the top of the stack. It takes two forms:
PUSH #index pushes a value from the local variable table, where index is a zero-based index into that table; we omit type information, as it will be inferred from the local variable table
PUSH @type value pushes a constant, where type denotes the type and value is the value. This way we can push literals, like integers,

Slide 19

Slide 19 text

POP
Pops a value from the top of the stack and places it in the local variable table. It takes only one form, POP #index. Because we preserve type information on the stack, the type will be inferred and stored in the local variable table as well.
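The PUSH and POP forms described so far can be sketched as three small functions over a frame. Everything here is an assumption made for illustration (the `Value`, `Frame` and `TypeTag` names, the fixed sizes, the 64-bit payload); it is not EgoVM's actual interpreter. The point is that the type tag travels with the value, so POP #index needs no type operand.

```c
#include <assert.h>
#include <stdint.h>

enum { STACK_MAX = 16, LOCALS_MAX = 8 };

/* Hypothetical type tags; the real type set is not shown in the slides. */
typedef enum { T_NONE, T_UINT8, T_UINT16 } TypeTag;

/* A value always carries its type. */
typedef struct {
    TypeTag  type;
    uint64_t bits;
} Value;

typedef struct {
    Value stack[STACK_MAX];
    int   top;                  /* number of values on the stack */
    Value locals[LOCALS_MAX];   /* local variable table */
} Frame;

/* PUSH @type value: push a typed constant. */
void push_const(Frame *f, TypeTag type, uint64_t bits) {
    assert(f->top < STACK_MAX);
    f->stack[f->top++] = (Value){ type, bits };
}

/* PUSH #index: copy a local (value and type) onto the stack. */
void push_local(Frame *f, int index) {
    assert(index >= 0 && index < LOCALS_MAX && f->top < STACK_MAX);
    f->stack[f->top++] = f->locals[index];
}

/* POP #index: move the top of the stack (value and type) into a local. */
void pop_local(Frame *f, int index) {
    assert(index >= 0 && index < LOCALS_MAX && f->top > 0);
    f->locals[index] = f->stack[--f->top];
}
```

With a zero-initialized `Frame`, `push_const(&f, T_UINT16, 8)` followed by `pop_local(&f, 0)` leaves local 0 holding the value 8 tagged as `T_UINT16`.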

Slide 20

Slide 20 text

RETURN
No-argument op. It returns the value on the top of the stack to the caller. What if there is no value on the stack? Error. We are not going to have null in EgoVM. EgoVM will have a special type, None (yes, a type), with only a single value.

Slide 21

Slide 21 text

CALL
The most important opcode. It calls a method on a receiver type (this is not true all the time). It takes only one form, CALL methodname. It then pops the whole stack, takes the value from the bottom of the stack as the receiver, and tries to link a method to this call site based on the types of the remaining values. The returned value is pushed on the stack.
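The linking step can be sketched as a search over a method table keyed by name, receiver type and argument types. This is a simplified sketch under stated assumptions: the `Method` record, the `link_call` function and the linear scan are hypothetical, and the stack is reduced to an array of type tags with the receiver at index 0 (the bottom).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { MAX_ARGS = 4 };

/* Hypothetical type tags. */
typedef enum { T_NONE, T_INT, T_STRING } TypeTag;

/* Hypothetical method record: name, receiver type, argument types. */
typedef struct {
    const char *name;
    TypeTag     receiver;
    size_t      argc;
    TypeTag     args[MAX_ARGS];
} Method;

/* stack[0] is the bottom of the stack (the receiver);
 * stack[1..depth-1] are the arguments. Returns the first method whose
 * name, receiver type and argument types all match, or NULL. */
const Method *link_call(const Method *methods, size_t method_count,
                        const char *name,
                        const TypeTag *stack, size_t depth) {
    assert(depth >= 1 && depth - 1 <= MAX_ARGS);
    for (size_t i = 0; i < method_count; i++) {
        const Method *m = &methods[i];
        if (strcmp(m->name, name) != 0) continue;
        if (m->receiver != stack[0] || m->argc != depth - 1) continue;
        if (memcmp(m->args, stack + 1, m->argc * sizeof(TypeTag)) == 0) {
            return m;
        }
    }
    return NULL;
}
```

Because the argument types take part in the match, two methods named `add` on the same receiver type can coexist as long as their argument types differ.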

Slide 22

Slide 22 text

this kind of looks like INVOKEDYNAMIC in the JVM: the target method signature is derived from the stack and the call is late bound, getting us closer to a perfect machine (read some Alan Kay)

Slide 23

Slide 23 text

NEW
Allocates a new object and pushes it on the stack. Be careful: it is an uninitialized object. We then need to call a constructor method. In EgoVM land, that is any method which returns an object of the same type as the enclosing class; this way we can have named constructors.

Slide 24

Slide 24 text

Because we want to keep the instruction set minimal, the same opcode will be used for allocating new arrays, but with a fancy quirk: we need to pass the length of the array. Here is one way we can do it.

Slide 25

Slide 25 text

This allocates an 8-element array of unsigned 8-bit ints.
    PUSH @UINT8[]     // push array type on stack
    PUSH @UINT16 8    // push array length
    NEW               // create new array

Slide 26

Slide 26 text

So NEW would look more like a CALL, but one that expects a type as the receiver. This is going to be awesome!!! Which leads me to speculate: do we really need NEW?

Slide 27

Slide 27 text

things we can live without
polymorphism; in the end this is just an if in disguise, and late bound method dispatch is tricky enough by itself
if, and other control flow structures; look ma, I am Smalltalk
access modifiers, because we trust each other
namespaces, because who needs this shit

Slide 28

Slide 28 text

oops, object layouts and pointer arithmetic

Slide 29

Slide 29 text

fat pointers
remember the safety guarantees?
in C a pointer doesn’t carry enough information about type, array length and such
this is where the idea of the fat pointer was born
Cello is a library that brings higher level programming to C
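One common way to build a fat pointer in C is to place a header directly in front of the payload and hand out a pointer to the payload; the header is recovered by stepping back one header. The sketch below follows that idea. The `Header` layout and the `fat_alloc`, `fat_header` and `fat_free` names are assumptions made for illustration, not the API of Cello or EgoVM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical header stored immediately before the payload. */
typedef struct {
    uint32_t type_id;
    uint64_t length;   /* number of elements in the payload */
} Header;

/* Allocates header + zeroed payload and returns a pointer to the payload.
 * Overflow of length * elem_size is not checked in this sketch. */
void *fat_alloc(uint32_t type_id, uint64_t length, size_t elem_size) {
    Header *h = calloc(1, sizeof(Header) + (size_t)length * elem_size);
    if (h == NULL) {
        return NULL;
    }
    h->type_id = type_id;
    h->length  = length;
    return h + 1;   /* the payload starts right after the header */
}

/* Recovers the header from a payload pointer returned by fat_alloc. */
Header *fat_header(void *payload) {
    assert(payload != NULL);
    return (Header *)payload - 1;
}

void fat_free(void *payload) {
    if (payload != NULL) {
        free(fat_header(payload));
    }
}
```

The caller still sees a plain `uint8_t *` or `Object *`, so ordinary indexing works, while the runtime can ask `fat_header(p)->length` before every access.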

Slide 30

Slide 30 text

object layout
decides on the layout of fields in memory
you need to take into account memory alignment and cache line sizes
EgoVM doesn’t care about it and takes a naive approach
see: What Every Programmer Should Know About Memory
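To make the alignment concern concrete, here is a sketch of a layout pass that assigns field offsets in declaration order and aligns each field to its own size. It is an illustration of what a runtime that does care about alignment might compute, not EgoVM's code; `FieldLayout`, `align_up` and `layout_fields` are hypothetical names, and field sizes are assumed to be powers of two.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-field layout record. */
typedef struct {
    const char *name;
    size_t      size;     /* in bytes, assumed to be a power of two */
    size_t      offset;   /* filled in by layout_fields */
} FieldLayout;

/* Rounds offset up to the next multiple of align (a power of two). */
size_t align_up(size_t offset, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

/* Assigns offsets in declaration order, aligning each field to its
 * own size. Returns the number of bytes used by the fields. */
size_t layout_fields(FieldLayout *fields, size_t count) {
    size_t offset = 0;
    for (size_t i = 0; i < count; i++) {
        offset = align_up(offset, fields[i].size);
        fields[i].offset = offset;
        offset += fields[i].size;
    }
    return offset;
}
```

For fields of 1, 4 and 8 bytes this yields offsets 0, 4 and 8 and a total of 16 bytes; packing the same fields back to back with no alignment would use 13 bytes but leave the 4-byte and 8-byte fields misaligned.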

Slide 31

Slide 31 text

object

Slide 32

Slide 32 text

/* Reads the object-typed field `name` of obj. */
Object oops_Object_get_object(Object obj, char* name) {
    Class* class = oops_Object_class(obj);
    Field* field = meta_Class_get_field(class, name);
    void* oops = oops_Object_oops(obj);
    /* The field lives field->offset bytes past the start of the object data. */
    return *(Object*) (((uint8_t*) oops) + field->offset);
}

/* Stores value into the object-typed field `name` of obj. */
void oops_Object_set_object(Object obj, char* name, Object value) {
    Class* class = oops_Object_class(obj);
    Field* field = meta_Class_get_field(class, name);
    void* oops = oops_Object_oops(obj);
    *(Object*) (((uint8_t*) oops) + field->offset) = value;
}

Slide 33

Slide 33 text

array

Slide 34

Slide 34 text

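/* Stores obj in slot `index` of array. */
void oops_Array_set_object(Array array, uint64_t index, Object obj) {
    ArrayHeader* oops = oops_Array_oops(array, index);
    /* The slot holds an Object, so it is accessed as Object*, not Array*.
       Note that oops + index advances in units of sizeof(ArrayHeader). */
    *(Object*) (oops + index) = obj;
}

/* Returns the object stored in slot `index` of array. */
Object oops_Array_get_object(Array array, uint64_t index) {
    ArrayHeader* oops = oops_Array_oops(array, index);
    return *(Object*) (oops + index);
}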

Slide 35

Slide 35 text

metaspace, methods, fields and classes

Slide 36

Slide 36 text

metaspace is a repository of all metadata about our code
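As a rough picture of such a repository, the sketch below keeps classes and their fields in plain arrays and looks them up by name. All names (`MetaspaceSketch`, `ClassMeta`, `FieldMeta`, `sketch_find_class`, `sketch_find_field`) are hypothetical, and the linear search is an assumption chosen for brevity; the real EgoVM metaspace is not shown in the slides.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical metadata records. */
typedef struct {
    const char *name;
    size_t      offset;   /* byte offset of the field inside an object */
} FieldMeta;

typedef struct {
    const char      *name;
    const FieldMeta *fields;
    size_t           field_count;
} ClassMeta;

typedef struct {
    const ClassMeta *classes;
    size_t           class_count;
} MetaspaceSketch;

/* Finds a class by name, or returns NULL. */
const ClassMeta *sketch_find_class(const MetaspaceSketch *ms, const char *name) {
    assert(ms != NULL && name != NULL);
    for (size_t i = 0; i < ms->class_count; i++) {
        if (strcmp(ms->classes[i].name, name) == 0) {
            return &ms->classes[i];
        }
    }
    return NULL;
}

/* Finds a field of a class by name, or returns NULL. */
const FieldMeta *sketch_find_field(const ClassMeta *cls, const char *name) {
    assert(cls != NULL && name != NULL);
    for (size_t i = 0; i < cls->field_count; i++) {
        if (strcmp(cls->fields[i].name, name) == 0) {
            return &cls->fields[i];
        }
    }
    return NULL;
}
```

A field lookup of this kind is what produces the offset used by the object accessors shown earlier in the talk.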

Slide 37

Slide 37 text

interpreter, stack frames and local variables

Slide 38

Slide 38 text

struct StackFrame {
    StackFrame_t* parent;      /* frame of the calling method */
    GPtrArray* stack;          /* operand stack */
    GPtrArray* localvars;      /* local variable table */
    Metaspace_t* meta_space;
};

Slide 39

Slide 39 text

struct StackEntry {
    Type_t* type;              /* type information is preserved on the stack */
    union {
        Object obj;
        char* literal;
    };
};

Slide 40

Slide 40 text

if in doubt use visitor … and function pointers
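In C the function-pointer half of that advice often ends up as a table indexed by opcode, which replaces a long switch in the interpreter loop. A minimal sketch, assuming a toy machine with two opcodes; `Machine`, `Instruction`, `Handler` and `run` are hypothetical names, not EgoVM's.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { OP_PUSH, OP_POP, OP_COUNT } Opcode;

/* Toy machine: a small stack of ints. */
typedef struct {
    int stack[8];
    int top;
} Machine;

typedef struct {
    Opcode op;
    int    operand;
} Instruction;

/* Every opcode handler has the same signature. */
typedef void (*Handler)(Machine *m, int operand);

void do_push(Machine *m, int operand) {
    assert(m->top < 8);
    m->stack[m->top++] = operand;
}

void do_pop(Machine *m, int operand) {
    (void)operand;   /* POP ignores its operand in this toy machine */
    assert(m->top > 0);
    m->top--;
}

/* The dispatch table: opcode -> handler. */
static const Handler handlers[OP_COUNT] = {
    [OP_PUSH] = do_push,
    [OP_POP]  = do_pop,
};

/* Interpreter loop: look up the handler and call it. */
void run(Machine *m, const Instruction *code, size_t count) {
    for (size_t i = 0; i < count; i++) {
        assert(code[i].op < OP_COUNT);
        handlers[code[i].op](m, code[i].operand);
    }
}
```

Adding an opcode then means adding one enum value, one handler and one table entry, with no change to the loop itself.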

Slide 41

Slide 41 text

some thoughts on performance
getting rid of indirection as much as you can
bounds checking for array access
code verifier
"virtual bytecodes"
inlining
call site cache (especially polymorphic), we can use GQuark for
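The call site cache idea can be sketched in its simplest, monomorphic form: each call site remembers the last receiver type it saw and the method that was linked for it, and only falls back to the slow lookup when the type changes. A polymorphic cache would keep several such pairs per site. The names below (`CallSite`, `slow_lookup`, `call_through`, the integer type ids and the two sample methods) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*MethodFn)(int receiver_value);

typedef struct {
    int      type_id;
    MethodFn target;
} MethodEntry;

/* One cache per call site. cached_type == -1 means "empty". */
typedef struct {
    int      cached_type;
    MethodFn cached_target;
    unsigned misses;
} CallSite;

int twice(int v)  { return v * 2; }
int negate(int v) { return -v; }

/* Hypothetical global method table: type id -> implementation. */
static const MethodEntry method_table[] = {
    { 1, twice },
    { 2, negate },
};

/* Slow path: linear search over the method table. */
MethodFn slow_lookup(int type_id) {
    size_t n = sizeof method_table / sizeof method_table[0];
    for (size_t i = 0; i < n; i++) {
        if (method_table[i].type_id == type_id) {
            return method_table[i].target;
        }
    }
    return NULL;
}

/* Fast path: reuse the cached target while the receiver type is unchanged. */
int call_through(CallSite *site, int type_id, int value) {
    if (site->cached_type != type_id) {
        site->cached_target = slow_lookup(type_id);
        site->cached_type   = type_id;
        site->misses++;
    }
    assert(site->cached_target != NULL);
    return site->cached_target(value);
}
```

Two consecutive calls with the same receiver type cost one slow lookup in total; a call with a different type evicts the entry and counts as another miss.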

Slide 42

Slide 42 text

call sites, dynamic dispatch and late binding

Slide 43

Slide 43 text

OOP to me means only messaging, local retention, and protection and hiding of state-process, and extreme late-binding of all things. It can be done in Smalltalk and in LISP. There are possibly other systems in which this is possible, but I’m not aware of them.
Alan Kay

Slide 44

Slide 44 text

Dynamic dispatch is different from late binding (also known as dynamic binding). Name binding associates a name with an operation. A polymorphic operation has several implementations, all associated with the same name. These bindings can be made at compile time or (with late binding) at run time. With dynamic dispatch, one particular implementation of the operation is chosen at run time. While dynamic dispatch does not imply late binding, late binding does imply dynamic dispatch, since the implementation of a late-bound operation is not known until run time.
Wikipedia

Slide 45

Slide 45 text

in the JVM world we have a notion of early and late bound methods
final, private, static and constructor calls are early bound
all remaining (including invokevirtual, invokeinterface and invokedynamic) are late bound

Slide 46

Slide 46 text

single dispatch: an implementation is chosen based only on the receiver’s type (remember operation safety?)
multiple dispatch: invokedynamic

Slide 47

Slide 47 text

garbage collection
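EgoVM was introduced earlier as mark and sweep garbage collected. The sketch below shows the general shape of that technique on a toy heap; it is not EgoVM's collector. The `Obj` and `Heap` types are hypothetical, and each object holds at most one outgoing reference to keep the marking loop short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy heap object with a single outgoing reference. */
typedef struct Obj {
    bool        marked;
    struct Obj *ref;    /* object this one points to, or NULL */
    struct Obj *next;   /* intrusive list of all allocated objects */
} Obj;

typedef struct {
    Obj   *all;    /* head of the list of all objects */
    size_t live;   /* number of objects currently allocated */
} Heap;

Obj *heap_new(Heap *h, Obj *ref) {
    Obj *o = calloc(1, sizeof *o);
    assert(o != NULL);
    o->ref  = ref;
    o->next = h->all;
    h->all  = o;
    h->live++;
    return o;
}

/* Mark phase: follow references from a root, stopping at marked objects. */
void mark(Obj *o) {
    while (o != NULL && !o->marked) {
        o->marked = true;
        o = o->ref;
    }
}

/* Sweep phase: free unmarked objects, clear marks on survivors. */
void sweep(Heap *h) {
    Obj **link = &h->all;
    while (*link != NULL) {
        Obj *o = *link;
        if (o->marked) {
            o->marked = false;
            link = &o->next;
        } else {
            *link = o->next;
            free(o);
            h->live--;
        }
    }
}

void collect(Heap *h, Obj **roots, size_t root_count) {
    for (size_t i = 0; i < root_count; i++) {
        mark(roots[i]);
    }
    sweep(h);
}
```

In a real VM the roots would come from the operand stacks and local variable tables of all live stack frames.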

Slide 48

Slide 48 text

final notes
SCons as a build tool; it is never too late to learn something new
gcc 7.2.0, -Wpedantic -Werror --std=c11 -D_GNU_SOURCE=1
tried CLion (sucks, big time), Eclipse (sucks even more), VS Code (sucks less), NetBeans (just works)
valgrind, life saver
gdb, brain damage

Slide 49

Slide 49 text

stuff worth exploring: articles
The Structure and Performance of Efficient Interpreters
Virtual Machine Showdown: Stack Versus Registers
Adaptive Optimization for SELF
Design Issues for Foreign Function Interfaces
Uniprocessor Garbage Collection Techniques
A brief history of just-in-time
Software and Hardware Techniques for Efficient Polymorphic Calls

Slide 50

Slide 50 text

stuff worth exploring: books
Compiler Design: Virtual Machines
Advanced Design and Implementation of Virtual Machines
Virtual Machines: Versatile Platforms for Systems and Processes
Virtual Machines Engineering: A Compiler
The Garbage Collection Handbook: The Art of Automatic Memory Management
Programming Language Pragmatics