EgoVM : naked truth

First presentation about EgoVM, an educational VM.

Jaroslaw Palka

March 27, 2018

Transcript

  1. EgoVM

  2. motivation
    I learn by doing, reading books doesn’t work for me
    20 years reading, using, exploring VMs, not full time job
    time to understand and apologize to C

  3. virtual machines are
    everywhere
    LISP
    Erlang
    Smalltalk
    JVM
    CLR
    V8

  4. managed runtime
    memory safety
    operation safety
    control safety

  5. memory safety
    — Xiao-Feng Li
    ensures that a certain type of data in the memory
    always follow the restrictions of that type (like
    array never has elements out of bounds)

  6. operation safety
    — Xiao-Feng Li
    ensures that the operation on a certain type
    always follow restrictions of that type

  7. control safety
    — Xiao-Feng Li
    ensures that the flow of code execution never
    reach any point that either gets stuck or goes wild
    (jump to malicious code segment)

  8. C is a dynamic language
    everything is a memory address or address of memory address

  9. illusion
    memory is unlimited
    types exist
    virtual machine IS operating system

  10. virtual machines are a lie

  11. we all accept it,
    and unfortunately believe in it

  12. agenda
    EgoVM: what’s in it
    oops, object layouts and pointer arithmetic
    metaspace, methods, fields and classes
    interpreter, stack frame and local variables
    call sites, linking and method dispatch
    garbage collection

  13. EgoVM: naked truth
    single threaded,
    interpreted only,
    mark and sweep garbage collected,
    no binary form of bytecode

  14. the choice

  15. specialization or universal
    bytecodes
    have lots of specialized bytecodes, one per kind of type
    have a small set of bytecodes and carry type as operand

  16. the choice
    I chose the second option, looking at the amount of pain, sweat and
    misery in the JVM with Project Valhalla.

  17. bytecode
    PUSH
    POP
    GET
    PUT
    NEW
    CALL

  18. PUSH
    Pushes a value on the top of the stack. It takes two forms:
    PUSH #index pushes a value from the local variable table, where
    index is a zero-based index into that table; we omit type
    information, as it will be inferred from the local variable table,
    PUSH @type value pushes a constant, where type denotes the type
    and value is the value; this way we can push literals, like
    integers.

  19. POP
    Pops a value from the top of the stack and places it in the local
    variable table.
    It takes only one form, POP #index. Because we preserve type
    information on the stack, the type will be inferred and stored in
    the local variable table as well.
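
The PUSH and POP forms described above can be sketched in C as follows. This is only an illustration under assumptions: the `TypeTag` enum, the `Value` and `Frame` structs, the fixed array sizes and the `op_*` function names are all hypothetical and are not taken from the EgoVM sources. What it tries to show is that a value and its type always travel together, so POP needs no type operand.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type tags; EgoVM's real type representation may differ. */
typedef enum { T_NONE, T_UINT8, T_UINT16 } TypeTag;

/* A value always carries its type, on the stack and in local variables. */
typedef struct {
    TypeTag type;
    uint64_t bits;
} Value;

enum { STACK_MAX = 16, LOCALS_MAX = 8 };

typedef struct {
    Value stack[STACK_MAX];
    int sp;                      /* number of values currently on the stack */
    Value locals[LOCALS_MAX];
} Frame;

/* PUSH @type value: push a typed constant. */
void op_push_const(Frame *f, TypeTag type, uint64_t bits) {
    assert(f->sp < STACK_MAX);
    f->stack[f->sp++] = (Value){ type, bits };
}

/* PUSH #index: copy a local variable, type included, onto the stack. */
void op_push_local(Frame *f, int index) {
    assert(index >= 0 && index < LOCALS_MAX && f->sp < STACK_MAX);
    f->stack[f->sp++] = f->locals[index];
}

/* POP #index: move the top of the stack, type included, into a local. */
void op_pop_local(Frame *f, int index) {
    assert(index >= 0 && index < LOCALS_MAX && f->sp > 0);
    f->locals[index] = f->stack[--f->sp];
}
```

Running `PUSH @UINT16 8` followed by `POP #0` through these helpers leaves local variable 0 holding both the value 8 and the tag `T_UINT16`.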

  20. RETURN
    No-argument op. It returns the value on the top of the stack to the
    caller. What if there is no value on the stack? Error. We are not
    going to have null in EgoVM. EgoVM will have a special type, None
    (yes, a type), with only a single value.

  21. CALL
    The most important opcode. It calls a method on a receiver type
    (this is not true all the time). It takes only one form,
    CALL methodname. It then pops the whole stack and takes the value
    from the bottom of the stack, which becomes the receiver, and tries
    to link a method to this call site based on the types of the
    remaining values. The returned value is pushed on the stack.
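
A minimal sketch of that linking step, assuming a flat method table searched linearly. The `Method` struct, the `TypeTag` enum and `link_call` are hypothetical names for illustration; the slides do not show how EgoVM actually stores or finds methods.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type tags and method record; not EgoVM's real metaspace. */
typedef enum { T_INT, T_STRING } TypeTag;

enum { MAX_ARGS = 4 };

typedef struct {
    const char *name;
    TypeTag receiver;
    int nargs;
    TypeTag args[MAX_ARGS];
} Method;

/*
 * Link a CALL site. `types` holds the types of the whole popped stack,
 * bottom first: types[0] is the receiver, the rest are the arguments.
 * Returns the matching method, or NULL when nothing links.
 */
const Method *link_call(const Method *table, size_t count,
                        const char *name, const TypeTag *types, int depth) {
    assert(depth >= 1);  /* a receiver is always required */
    for (size_t i = 0; i < count; i++) {
        const Method *m = &table[i];
        if (strcmp(m->name, name) != 0) continue;
        if (m->receiver != types[0] || m->nargs != depth - 1) continue;
        if (memcmp(m->args, types + 1, (size_t)m->nargs * sizeof(TypeTag)) == 0)
            return m;
    }
    return NULL;
}
```

Because the argument types take part in the match, two methods with the same name and receiver can coexist as long as their argument types differ.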

  22. this kind of looks like INVOKEDYNAMIC in the JVM: the target
    method signature is derived from the stack and it is late bound,
    getting us closer to a perfect machine (read some Alan Kay)

  23. NEW
    NEW allocates a new object and pushes it on the stack. Be careful,
    it is an uninitialized object. We then need to call a constructor
    method: in EgoVM land, any method which returns an object of the
    same type as the enclosing class. This way we can have named
    constructors.

  24. Because we want to keep the instruction set minimal, the same
    opcode will be used for allocation of new arrays, but with a quirk:
    we need to pass the length of the array.
    One way we can do it:

  25. This allocates 8 element array of unsigned 8-bit ints.
    PUSH @UINT8[] // push array type on stack
    PUSH @UINT16 8 // push array length
    NEW // create new array

  26. So NEW would look more like a CALL but expects type as receiver.
    This is going to be awesome!!!
    Which leads me to speculate: do we really need NEW?

  27. things we can live without
    polymorphism: in the end this is just if in disguise, and late
    bound method dispatch is tricky enough
    if itself, and other control flow structures, look ma' I am
    Smalltalk,
    access modifiers, because we trust each other
    namespaces, because who needs this shit

  28. oops, object layouts and
    pointer arithmetic

  29. fat pointers
    remember safety guarantees?
    in C a pointer doesn’t carry enough information about type, array
    length and such
    this is when the idea of the fat pointer was born,
    Cello is a library that brings higher level programming to C.
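
To make the fat pointer idea concrete, here is a small sketch in C. It is not how Cello implements it; `FatPtr`, `fat_at` and `fat_set_u16` are hypothetical names, and the struct simply stores the element count and element size next to the raw address so that an access can be checked before it happens.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fat pointer: the raw address plus the metadata
   that a bare C pointer does not carry. */
typedef struct {
    uint8_t *data;      /* start of the element storage */
    size_t length;      /* number of elements */
    size_t elem_size;   /* size of one element in bytes */
} FatPtr;

/* Returns the address of element `index`, or NULL when out of bounds. */
void *fat_at(FatPtr p, size_t index) {
    if (index >= p.length) return NULL;
    return p.data + index * p.elem_size;
}

/* Bounds-checked store of a 16-bit element; reports failure
   instead of writing past the end of the array. */
bool fat_set_u16(FatPtr p, size_t index, uint16_t value) {
    assert(p.elem_size == sizeof(uint16_t));
    uint16_t *slot = fat_at(p, index);
    if (slot == NULL) return false;
    *slot = value;
    return true;
}
```

With a four-element array behind the pointer, a store at index 3 succeeds and a store at index 4 is rejected, which is the memory safety guarantee from the earlier slides in miniature.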

  30. object layout
    decides on fields layout in memory
    you need to take into account memory alignment and cache line
    sizes
    EgoVM doesn’t care about it and takes naive approach
    What Every Programmer Should Know About Memory
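
The difference between a naive and an alignment-aware layout can be sketched as two offset calculators. This assumes "naive" means packing fields back to back in declaration order, which is one plausible reading of the slide and not confirmed by it; `layout_naive` and `layout_aligned` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Naive layout: each field starts right where the previous one ended. */
size_t layout_naive(const size_t *sizes, size_t count, size_t *offsets) {
    size_t offset = 0;
    for (size_t i = 0; i < count; i++) {
        offsets[i] = offset;
        offset += sizes[i];
    }
    return offset;  /* total bytes used by the fields */
}

/* Aligned layout: each field starts at a multiple of its own size
   (assumes every size is a power of two, as for primitive fields). */
size_t layout_aligned(const size_t *sizes, size_t count, size_t *offsets) {
    size_t offset = 0;
    for (size_t i = 0; i < count; i++) {
        size_t align = sizes[i];
        assert(align != 0 && (align & (align - 1)) == 0);
        offset = (offset + align - 1) & ~(align - 1);  /* round up */
        offsets[i] = offset;
        offset += sizes[i];
    }
    return offset;
}
```

For fields of 1, 8 and 2 bytes, the naive version yields offsets 0, 1, 9 (11 bytes in total), while the aligned version yields 0, 8, 16 (18 bytes, before any trailing padding). The naive layout is smaller but leaves the 8-byte field misaligned.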

  31. object

  32. Object oops_Object_get_object(Object obj, char* name) {
        /* look up the field's metadata by name */
        Class* class = oops_Object_class(obj);
        Field* field = meta_Class_get_field(class, name);
        /* base address that field offsets are relative to */
        void* oops = oops_Object_oops(obj);
        /* byte-wise pointer arithmetic, then read the slot as an Object */
        return *(Object*) (((uint8_t*) oops) + field->offset);
    }
    void oops_Object_set_object(Object obj, char* name, Object value) {
        Class* class = oops_Object_class(obj);
        Field* field = meta_Class_get_field(class, name);
        void* oops = oops_Object_oops(obj);
        /* same address computation as the getter, used as a store target */
        *(Object*) (((uint8_t*) oops) + field->offset) = value;
    }

  33. array

  34. void oops_Array_set_object(Array array, uint64_t index, Object obj) {
        ArrayHeader* oops = oops_Array_oops(array, index);
        /* the slot holds an Object, so cast to Object*, not Array*;
           note that oops + index advances by index * sizeof(ArrayHeader) */
        *(Object*) (oops + index) = obj;
    }
    Object oops_Array_get_object(Array array, uint64_t index) {
        ArrayHeader* oops = oops_Array_oops(array, index);
        return *(Object*) (oops + index);
    }

  35. metaspace, methods, fields
    and classes

  36. metaspace is a repository of all metadata about our code

  37. interpreter, stack frames and
    local variables

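  38. struct StackFrame {
        StackFrame_t* parent;     /* parent frame */
        GPtrArray* stack;         /* operand stack (GLib pointer array) */
        GPtrArray* localvars;     /* local variable table */
        Metaspace_t* meta_space;  /* repository of code metadata */
    };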

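  39. struct StackEntry {
        Type_t* type;       /* type information is preserved on the stack */
        union {
            Object obj;     /* an object reference */
            char* literal;  /* a literal, kept as text */
        };
    };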

  40. if in doubt use visitor
    … and function pointers

  41. some thoughts on
    performance
    getting rid of indirection as much as you can
    bounds checking for array access
    code verifier
    "virtual bytecodes"
    inlining
    call site cache (especially polymorphic), we can use GQuark for
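
As an illustration of the call site cache idea, here is a sketch of the simplest variant, a monomorphic cache that remembers one receiver type per call site. A polymorphic cache would keep several such entries. All names here (`CallSite`, `call_site_resolve`, `TypeId`, `demo_lookup`) are hypothetical, and `TypeId` stands in for whatever interned identifier the VM uses for types.

```c
#include <assert.h>
#include <stddef.h>

typedef int TypeId;                 /* hypothetical interned type identifier */
typedef int (*MethodFn)(int);       /* hypothetical method signature */

typedef struct {
    TypeId cached_type;             /* receiver type seen at the last call */
    MethodFn cached_target;         /* method linked for it, NULL if empty */
    unsigned misses;                /* how often the slow lookup ran */
} CallSite;

/* Slow path supplied by the caller: full lookup by receiver type. */
typedef MethodFn (*LookupFn)(TypeId receiver);

MethodFn call_site_resolve(CallSite *site, TypeId receiver, LookupFn lookup) {
    if (site->cached_target != NULL && site->cached_type == receiver)
        return site->cached_target;          /* hit: skip the lookup */
    site->misses++;
    site->cached_type = receiver;
    site->cached_target = lookup(receiver);  /* miss: relink and remember */
    return site->cached_target;
}

/* Example methods and lookup used only to exercise the cache. */
int twice(int x) { return 2 * x; }
int negate(int x) { return -x; }
MethodFn demo_lookup(TypeId receiver) {
    return receiver == 1 ? twice : negate;
}
```

Two consecutive calls with the same receiver type pay for the lookup only once; a call with a different type counts as a miss and replaces the cached entry.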

  42. call sites, dynamic dispatch
    and late binding

  43. — Alan Kay
    OOP to me means only messaging, local retention,
    and protection and hiding of state-process, and
    extreme late-binding of all things. It can be done in
    Smalltalk and in LISP. There are possibly other
    systems in which this is possible, but I’m not aware
    of them.

  44. — Wikipedia
    Dynamic dispatch is different from late binding
    (also known as dynamic binding). Name binding
    associates a name with an operation. A
    polymorphic operation has several
    implementations, all associated with the same
    name. These bindings can be made at compile
    time or (with late binding) at run time. With
    dynamic dispatch, one particular implementation
    of the operation is chosen at run time. While
    dynamic dispatch does not imply late binding, late
    binding does imply dynamic dispatch, since the
    implementation of a late-bound operation is not
    known until run time.

  45. in the JVM world we have a notion of early and late bound methods
    final, private, static and constructor calls are early bound
    all remaining (including invokevirtual, invokeinterface and
    invokedynamic) are late bound

  46. single dispatch, an implementation will be chosen based only on
    the receiver's type (remember operation safety?)
    multiple dispatch, invokedynamic

  47. garbage collection
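
EgoVM is described earlier as mark and sweep garbage collected. The two phases can be sketched over a fixed-size heap as below. This is a generic textbook outline under assumptions, not EgoVM's collector: `Obj`, the `live` flag, the fixed `MAX_REFS` and the `gc_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { MAX_REFS = 2 };

/* Hypothetical heap object: a mark bit plus outgoing references. */
typedef struct Obj {
    bool marked;
    bool live;                    /* false once the sweep has reclaimed it */
    struct Obj *refs[MAX_REFS];
} Obj;

/* Mark phase: flag everything reachable from `obj`. */
void gc_mark(Obj *obj) {
    if (obj == NULL || obj->marked) return;
    obj->marked = true;
    for (int i = 0; i < MAX_REFS; i++) gc_mark(obj->refs[i]);
}

/* Sweep phase: reclaim unmarked objects and clear marks on survivors.
   Returns the number of objects reclaimed. */
size_t gc_sweep(Obj *heap, size_t count) {
    size_t freed = 0;
    for (size_t i = 0; i < count; i++) {
        if (!heap[i].live) continue;
        if (heap[i].marked) {
            heap[i].marked = false;
        } else {
            heap[i].live = false;
            freed++;
        }
    }
    return freed;
}

/* One full collection over the heap, starting from the given roots. */
size_t gc_collect(Obj *heap, size_t count, Obj **roots, size_t nroots) {
    assert(heap != NULL);
    for (size_t i = 0; i < nroots; i++) gc_mark(roots[i]);
    return gc_sweep(heap, count);
}
```

The early return on an already marked object is what lets the mark phase terminate on cyclic object graphs. In a real VM the roots would come from the stack frames and local variable tables.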

  48. final notes
    SCons as a build tool, it is never too late to learn something
    new
    gcc 7.2.0, -Wpedantic -Werror --std=c11 -D_GNU_SOURCE=1
    tried CLion (sucks, big time), Eclipse (sucks even more), VS Code
    (sucks less), NetBeans (just works)
    valgrind, life saver
    gdb, brain damage

  49. stuff worth exploring : articles
    The Structure and Performance of Efficient Interpreters
    Virtual Machine Showdown: Stack Versus Registers
    Adaptive Optimization for SELF
    Design Issues for Foreign Function Interfaces
    Uniprocessor Garbage Collection Techniques
    A brief history of just-in-time
    Software and Hardware Techniques for Efficient Polymorphic Calls

  50. stuff worth exploring : books
    Compiler Design: Virtual Machines
    Advanced Design and Implementation of Virtual Machines
    Virtual Machines: Versatile Platforms for Systems and Processes
    Virtual Machines
    Engineering a Compiler
    The Garbage Collection Handbook: The Art of Automatic Memory
    Management
    Programming Language Pragmatics
