EgoVM: naked truth

First presentation about EgoVM, an educational VM.

Jaroslaw Palka

March 27, 2018

Transcript

  1. 1.
  2. 2.

    motivation: I learn by doing, reading books doesn't work for me; 20 years of reading, using and exploring VMs, but not as a full-time job; time to understand, and to apologize to C
  3. 5.

    memory safety (Xiao-Feng Li): ensures that a certain type of data in the memory always follows the restrictions of that type (like an array never having elements out of bounds)
  4. 6.

    operation safety (Xiao-Feng Li): ensures that operations on a certain type always follow the restrictions of that type
  5. 7.

    control safety (Xiao-Feng Li): ensures that the flow of code execution never reaches any point where it either gets stuck or goes wild (jumps to a malicious code segment)
  6. 12.

    agenda: EgoVM: what's in it; oops, object layouts and pointer arithmetic; metaspace, methods, fields and classes; interpreter, stack frame and local variables; call sites, linking and method dispatch; garbage collection
  7. 13.

    EgoVM, the naked truth: single threaded, interpreted only, mark-and-sweep garbage collected, no binary form of bytecode
  8. 15.

    specialization or universal bytecodes: either have a lot of specialized bytecodes, one for each type kind, or have a small set of bytecodes and carry the type as an operand
  9. 16.

    the choice: I chose the second option, looking at the amount of pain, sweat and misery in the JVM with Project Valhalla.
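To make the second option concrete, here is a minimal C sketch of an instruction record where the type travels as an operand. Opcode, TypeKind, Instruction and format_push are hypothetical names, not EgoVM's encoding (EgoVM has no binary bytecode form at all); the point is that a single OP_PUSH serves every type, where a specialized design would need one opcode per type kind, as the JVM does with iload, lload, fload, dload and aload.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical encoding: a small set of opcodes, with the value type
   carried as an operand instead of being baked into the opcode. */
typedef enum { OP_PUSH, OP_POP, OP_RETURN, OP_CALL, OP_NEW } Opcode;
typedef enum { TYPE_INT32, TYPE_UINT8, TYPE_FLOAT64 } TypeKind;

static const char *const OPCODE_NAMES[] = { "PUSH", "POP", "RETURN", "CALL", "NEW" };
static const char *const TYPE_NAMES[] = { "INT32", "UINT8", "FLOAT64" };

typedef struct {
    Opcode op;       /* what to do */
    TypeKind type;   /* which kind of value it operates on */
    int64_t operand; /* literal value for "PUSH @type value" */
} Instruction;

/* Render a constant push in the textual "PUSH @type value" form. */
static void format_push(const Instruction *insn, char *buf, size_t len) {
    assert(insn->op == OP_PUSH);
    snprintf(buf, len, "%s @%s %lld", OPCODE_NAMES[insn->op],
             TYPE_NAMES[insn->type], (long long)insn->operand);
}
```

Both PUSH @INT32 42 and PUSH @UINT8 8 share the same opcode; only the type field differs, so adding a new type kind does not grow the instruction set.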
  10. 18.

    PUSH: pushes a value on the top of the stack. It takes two forms. To push a value from the local variable table on the top of the stack, use PUSH #index, where index is a zero-based index into the local variable table; we omit type information, as it will be inferred from the local variable table. To push a constant on the top of the stack, use PUSH @type value, where type denotes the type and value is the value. This way we can push literals, like integers.
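A minimal sketch of both PUSH forms, assuming every stack slot and every local variable carries a type tag. Value, Frame, push_local and push_const are hypothetical names; the fixed-size arrays and assert-based checks stand in for whatever error handling EgoVM actually uses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical representation: every value carries its type. */
typedef enum { TYPE_INT32, TYPE_UINT8, TYPE_NONE } Type;

typedef struct {
    Type type;
    int64_t bits; /* payload; meaning depends on type */
} Value;

#define MAX_LOCALS 16
#define MAX_STACK 16

typedef struct {
    Value locals[MAX_LOCALS]; /* local variable table */
    Value stack[MAX_STACK];   /* operand stack */
    int sp;                   /* number of values on the stack */
} Frame;

/* PUSH #index: copy a local; its type comes along from the table. */
static void push_local(Frame *f, int index) {
    assert(index >= 0 && index < MAX_LOCALS);
    assert(f->sp < MAX_STACK);
    f->stack[f->sp++] = f->locals[index];
}

/* PUSH @type value: push a literal with an explicit type operand. */
static void push_const(Frame *f, Type type, int64_t value) {
    assert(f->sp < MAX_STACK);
    f->stack[f->sp++] = (Value){ .type = type, .bits = value };
}
```

Because the local slot already knows its type, PUSH #0 needs no @type operand, while a literal has nothing to infer from and must state its type.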
  11. 19.

    POP: pops a value from the top of the stack and places it in the local variable table. It takes only one form, POP #index. Because we preserve type information on the stack, the type will be inferred and stored in the local variable table as well.
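A sketch of POP under the same assumption that stack slots keep their type: the whole tagged value moves into the local slot, so the slot's type is whatever the popped value says. Value, Frame and pop_local are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tagged value; TYPE_UNSET marks an unused local slot. */
typedef enum { TYPE_UNSET, TYPE_INT32, TYPE_FLOAT64 } Type;

typedef struct {
    Type type;
    union { int32_t i32; double f64; } as;
} Value;

#define MAX_SLOTS 16

typedef struct {
    Value locals[MAX_SLOTS]; /* local variable table */
    Value stack[MAX_SLOTS];  /* operand stack */
    int sp;                  /* number of values on the stack */
} Frame;

/* POP #index: move the top of the stack into local slot `index`.
   No type operand is needed: the type tag is copied with the value. */
static void pop_local(Frame *f, int index) {
    assert(f->sp > 0);
    assert(index >= 0 && index < MAX_SLOTS);
    f->locals[index] = f->stack[--f->sp];
}
```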
  12. 20.

    RETURN: a no-argument op. It returns the value on the top of the stack to the caller. What if there is no value on the stack? Error. We are not going to have null in EgoVM. Instead, EgoVM will have a special type, None (yes, a type), with only a single value.
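A sketch of RETURN with a None type instead of null, assuming a method with nothing meaningful to return explicitly pushes the single None value first. Value, NONE, Frame and op_return are hypothetical names; the boolean result stands in for however EgoVM reports the "empty stack" error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { TYPE_NONE, TYPE_INT32 } Type;

typedef struct {
    Type type;
    int32_t i32; /* ignored when type == TYPE_NONE */
} Value;

/* The one and only value of type None. */
static const Value NONE = { .type = TYPE_NONE, .i32 = 0 };

typedef struct {
    Value stack[8];
    int sp;
} Frame;

/* RETURN: hand the top of the stack back to the caller.
   An empty stack is an error, never an implicit null. */
static bool op_return(Frame *f, Value *result) {
    assert(result != NULL);
    if (f->sp == 0)
        return false; /* error: nothing to return */
    *result = f->stack[--f->sp];
    return true;
}
```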
  13. 21.

    CALL: the most important opcode. It calls a method on a receiver type (this is not true all the time). It takes only one form, CALL methodname. It then pops the whole stack, takes the value from the bottom of the stack as the receiver, and tries to link a method to this call site based on the types of the remaining values. The returned value is pushed on the stack.
  14. 22.

    This kind of looks like INVOKEDYNAMIC in the JVM: the target method signature will be derived from the stack and it will be late bound, getting us closer to a perfect machine (read some Alan Kay).
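A rough sketch of how CALL could link a call site at run time, following the description above: the whole stack is consumed, the bottom value is the receiver, and a method is picked by name plus the runtime types of the remaining values. The static METHODS table, Type, Value, Method and link_call are all hypothetical; a real VM would consult class metadata, cache the result, invoke the target and push its return value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { T_INT, T_STRING, T_LIST } Type;

typedef struct { Type type; long payload; } Value;

typedef struct {
    Type receiver;
    const char *name;
    int argc;
    Type args[2];
    int id; /* stands in for a pointer to the method body */
} Method;

/* Hypothetical method table: three overloads of "add" on a list. */
static const Method METHODS[] = {
    { T_LIST, "add", 1, { T_INT },        1 },
    { T_LIST, "add", 1, { T_STRING },     2 },
    { T_LIST, "add", 2, { T_INT, T_INT }, 3 },
};

/* stack[0] is the receiver, stack[1..sp-1] are the arguments. */
static const Method *link_call(const char *name, const Value *stack, int sp) {
    assert(sp >= 1); /* at least a receiver */
    Type receiver = stack[0].type;
    int argc = sp - 1;
    for (size_t m = 0; m < sizeof METHODS / sizeof METHODS[0]; m++) {
        const Method *cand = &METHODS[m];
        if (cand->receiver != receiver || cand->argc != argc ||
            strcmp(cand->name, name) != 0)
            continue;
        int matches = 1;
        for (int a = 0; a < argc; a++) {
            if (cand->args[a] != stack[a + 1].type) {
                matches = 0;
                break;
            }
        }
        if (matches)
            return cand;
    }
    return NULL; /* no applicable method: a linkage error */
}
```

Here add on a list resolves to different targets depending on whether the argument is an int or a string, and nothing is decided until the stack contents are known, which is what makes the binding late.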
  15. 23.

    NEW: allocates a new object and pushes it on the stack. Be careful: it is an uninitialized object. We then need to call a constructor method, which in EgoVM land is any method that returns an object of the same type as the enclosing class; this way we can have named constructors.
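The constructor rule is simple enough to state as a predicate. This sketch assumes method metadata exposes the enclosing class name and the declared return type as strings; MethodInfo, is_constructor and the Point/fromPolar example are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical method metadata, not EgoVM's metaspace structs. */
typedef struct {
    const char *name;
    const char *enclosing_class;
    const char *return_type;
} MethodInfo;

/* A method is a constructor iff it returns its enclosing class's type,
   regardless of its name, which is what enables named constructors. */
static bool is_constructor(const MethodInfo *m) {
    assert(m != NULL);
    return strcmp(m->return_type, m->enclosing_class) == 0;
}
```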
  16. 24.

    Because we want to keep the instruction set minimal, the same opcode will be used for allocation of new arrays, but with a fancy quirk: we need to pass the length of the array. Here is one way we can do it.
  17. 25.

    This allocates an 8-element array of unsigned 8-bit ints:

```
PUSH @UINT8[]    // push array type on stack
PUSH @UINT16 8   // push array length
NEW              // create new array
```
  18. 26.

    So NEW would look more like a CALL, but it expects a type as the receiver. This is going to be awesome!!! Which leads me to a speculation: do we really need NEW?
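A sketch of how an interpreter could execute NEW for the three-instruction sequence above: the length is on top of the stack, the array type below it, and the new array replaces both. Tag, Value, Array and op_new_array are hypothetical, and zero-filling the elements via calloc is an assumption of this sketch, not something the slides specify.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef enum { T_UINT8, T_UINT16, T_ARRAY_TYPE, T_ARRAY } Tag;

typedef struct Array {
    Tag elem;        /* element type */
    uint64_t length; /* number of elements */
    uint8_t data[];  /* element storage (flexible array member) */
} Array;

typedef struct {
    Tag tag;
    union {
        Tag elem;     /* T_ARRAY_TYPE: element type, e.g. @UINT8[] */
        uint16_t u16; /* T_UINT16 literal, e.g. the length 8 */
        Array *array; /* T_ARRAY: the allocated array */
    } as;
} Value;

static size_t elem_size(Tag t) {
    return t == T_UINT16 ? sizeof(uint16_t) : sizeof(uint8_t);
}

/* NEW on an array type: pop length, pop type, push the new array. */
static void op_new_array(Value *stack, int *sp) {
    assert(*sp >= 2);
    Value len = stack[--*sp];  /* top: array length */
    Value type = stack[--*sp]; /* below it: array type, acting as receiver */
    assert(len.tag == T_UINT16 && type.tag == T_ARRAY_TYPE);
    Array *arr = calloc(1, sizeof(Array) + (size_t)len.as.u16 * elem_size(type.as.elem));
    if (arr == NULL)
        abort();
    arr->elem = type.as.elem;
    arr->length = len.as.u16;
    stack[(*sp)++] = (Value){ .tag = T_ARRAY, .as.array = arr };
}
```

Treating the array type as a receiver is what makes NEW look like a CALL here: the operation's meaning is decided by the type value found at the bottom of the consumed stack.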
  19. 27.

    things we can live without: polymorphism (at the end of the day this is just an if in disguise, and late-bound method dispatch is tricky enough in itself); other control flow structures (look ma', I am Smalltalk); access modifiers, because we trust each other; namespaces, because who needs this shit
  20. 29.

    fat pointers: remember the safety guarantees? In C a pointer doesn't carry enough information about type, array length and such; this is when the idea of the fat pointer was born. Cello is a library that brings higher-level programming to C.
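A minimal fat pointer sketch: next to the raw address it carries the element type and length, so each access can enforce memory safety (bounds) and operation safety (type). FatPtr, ElemType and fat_get_int32 are hypothetical names; this is not Cello's or EgoVM's actual representation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { ELEM_INT32, ELEM_UINT8 } ElemType;

/* Hypothetical fat pointer: address plus the metadata C pointers lack. */
typedef struct {
    void *data;
    ElemType type;
    size_t length;
} FatPtr;

/* Checked read of an int32 element: refuses out-of-bounds indices and
   refuses to reinterpret memory holding a different element type. */
static bool fat_get_int32(FatPtr p, size_t index, int32_t *out) {
    assert(out != NULL);
    if (p.type != ELEM_INT32 || index >= p.length)
        return false;
    *out = ((int32_t *)p.data)[index];
    return true;
}
```

The price is size: every pointer becomes three words instead of one, which is one reason C itself never adopted them.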
  21. 30.

    object layout: decides on the layout of fields in memory; you need to take into account memory alignment and cache line sizes; EgoVM doesn't care about it and takes a naive approach; see "What Every Programmer Should Know About Memory"
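To show why alignment matters, here is a sketch of a layout pass that places fields in declaration order, aligning each primitive field to its own size (natural alignment, which holds for 1, 2, 4 and 8 byte primitives on common platforms). FieldLayout, align_up and layout_fields are hypothetical, and this is an illustration of the trade-off, not EgoVM's actual algorithm.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    const char *name;
    size_t size;   /* 1, 2, 4 or 8 bytes */
    size_t offset; /* filled in by layout_fields */
} FieldLayout;

/* Round n up to the next multiple of alignment. */
static size_t align_up(size_t n, size_t alignment) {
    assert(alignment > 0);
    return (n + alignment - 1) / alignment * alignment;
}

/* Place fields in declaration order with natural alignment; returns the
   object size, padded to the largest field alignment. */
static size_t layout_fields(FieldLayout *fields, size_t count) {
    size_t offset = 0, max_align = 1;
    for (size_t i = 0; i < count; i++) {
        offset = align_up(offset, fields[i].size);
        fields[i].offset = offset;
        offset += fields[i].size;
        if (fields[i].size > max_align)
            max_align = fields[i].size;
    }
    return align_up(offset, max_align);
}
```

With fields of 1, 8 and 2 bytes in that order the object takes 24 bytes; declaring them as 8, 2, 1 brings it down to 16, because padding only appears before the larger fields.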
  22. 31.
  23. 32.

```c
// Read a reference field by name: find its offset in the class metadata,
// then load an Object from the object's memory at that byte offset.
Object oops_Object_get_object(Object obj, char* name) {
    Class *class = oops_Object_class(obj);
    Field* field = meta_Class_get_field(class, name);
    void* oops = oops_Object_oops(obj);
    return *(Object*) (((uint8_t*) oops) + field->offset);
}

// Write a reference field by name, using the same offset lookup.
void oops_Object_set_object(Object obj, char* name, Object value) {
    Class *class = oops_Object_class(obj);
    Field* field = meta_Class_get_field(class, name);
    void* oops = oops_Object_oops(obj);
    *(Object*) (((uint8_t*) oops) + field->offset) = value;
}
```
  24. 33.
  25. 34.

```c
void oops_Array_set_object(Array array, uint64_t index, Object obj) {
    ArrayHeader* oops = oops_Array_oops(array, index);
    *(Array*) (oops + index) = obj;
}

Object oops_Array_get_object(Array array, uint64_t index) {
    ArrayHeader* oops = oops_Array_oops(array, index);
    return *(Array*) (oops + index);
}
```
  26. 41.

    some thoughts on performance: getting rid of indirection as much as you can; bounds checking for array access; code verifier; "virtual bytecodes"; inlining; call site cache (especially polymorphic), we can use GQuark for …
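A sketch of the simplest call site cache, a monomorphic inline cache: each call site remembers the last receiver type and the method it linked to, so repeated calls with the same receiver type skip the slow lookup. CallSiteCache, slow_lookup and the two toy methods are hypothetical stand-ins for EgoVM's linking machinery.

```c
#include <assert.h>
#include <stddef.h>

typedef int TypeId;
typedef int (*MethodFn)(int);

typedef struct {
    TypeId cached_type;     /* -1 means the cache is empty */
    MethodFn cached_target; /* method linked for cached_type */
    unsigned misses;        /* how many slow lookups were taken */
} CallSiteCache;

static int double_it(int x) { return 2 * x; }
static int negate_it(int x) { return -x; }

/* Slow path: stands in for a full lookup by name and types. */
static MethodFn slow_lookup(TypeId type) {
    return type == 0 ? double_it : negate_it;
}

static int call_through_cache(CallSiteCache *site, TypeId type, int arg) {
    if (site->cached_type != type) { /* miss: relink this call site */
        site->cached_target = slow_lookup(type);
        site->cached_type = type;
        site->misses++;
    }
    assert(site->cached_target != NULL);
    return site->cached_target(arg);
}
```

A monomorphic cache thrashes when one site alternates between receiver types, relinking on every call; a polymorphic cache keeps a few (type, target) pairs per site to avoid that, which is why it deserves special attention.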
  27. 43.

    OOP to me means only messaging, local retention and protection and hiding of state-process, and extreme late-binding of all things. It can be done in Smalltalk and in LISP. There are possibly other systems in which this is possible, but I'm not aware of them. (Alan Kay)
  28. 44.

    Dynamic dispatch is different from late binding (also known as dynamic binding). Name binding associates a name with an operation. A polymorphic operation has several implementations, all associated with the same name. These bindings can be made at compile time or (with late binding) at run time. With dynamic dispatch, one particular implementation of the operation is chosen at run time. While dynamic dispatch does not imply late binding, late binding does imply dynamic dispatch, since the implementation of a late-bound operation is not known until run time. (Wikipedia)
  29. 45.

    in the JVM world we have a notion of early- and late-bound methods: final, private and static methods and constructor calls are early bound; all remaining ones (including invokevirtual, invokeinterface and invokedynamic) are late bound
  30. 46.

    single dispatch: an implementation is chosen based only on the receiver's type (remember operation safety?); multiple dispatch: invokedynamic
  31. 48.

    final notes: SCons as a build tool, it is never too late to learn something new; gcc 7.2.0 with -Wpedantic -Werror --std=c11 -D_GNU_SOURCE=1; tried CLion (sucks, big time), Eclipse (sucks even more), VS Code (sucks less), NetBeans (just works); valgrind, a life saver; gdb, brain damage
  32. 49.

    stuff worth exploring, articles: The Structure and Performance of Efficient Interpreters; Virtual Machine Showdown: Stack Versus Registers; Adaptive Optimization for SELF; Design Issues for Foreign Function Interfaces; Uniprocessor Garbage Collection Techniques; A Brief History of Just-In-Time; Software and Hardware Techniques for Efficient Polymorphic Calls
  33. 50.

    stuff worth exploring, books: Compiler Design: Virtual Machines; Advanced Design and Implementation of Virtual Machines; Virtual Machines: Versatile Platforms for Systems and Processes; Virtual Machines; Engineering a Compiler; The Garbage Collection Handbook: The Art of Automatic Memory Management; Programming Language Pragmatics