
EgoVM : naked truth

First presentation about EgoVM, an educational VM.

Jaroslaw Palka

March 27, 2018

Transcript

  1. motivation
    I learn by doing; reading books doesn't work for me
    20 years of reading about, using and exploring VMs, though never as a full-time job
    time to understand C and apologize to it

  2. virtual machines are everywhere
    LISP
    Erlang
    Smalltalk
    JVM
    CLR
    V8

  3. managed runtime
    memory safety
    operation safety
    control safety

  4. memory safety
    — Xiao-Feng Li
    ensures that a certain type of data in the memory
    always follows the restrictions of that type (like
    an array never having elements out of bounds)

  5. operation safety
    — Xiao-Feng Li
    ensures that the operation on a certain type
    always follows the restrictions of that type

  6. control safety
    — Xiao-Feng Li
    ensures that the flow of code execution never
    reaches any point where it either gets stuck or goes wild
    (jumps to a malicious code segment)

  7. C is a dynamic language
    everything is a memory address, or the address of a memory address
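A small C illustration of that claim, with function names made up for the example: the same storage can be read through differently typed pointers, and a pointer to a pointer is just one more address. Nothing at run time records what the bytes "are"; the type exists only in how the compiler treats each pointer.

```c
#include <stdint.h>
#include <string.h>

/* Read the first byte of any object: an int's storage is just an address,
 * so C lets us look at it as raw bytes. */
uint8_t first_byte(const void* addr) {
    return *(const uint8_t*) addr;
}

/* Follow an "address of a memory address" to the int it finally points at. */
int deref_twice(int** pp) {
    return **pp;
}

/* Reinterpret the bits of a float as a 32-bit unsigned integer.
 * memcpy is the portable way to do this without breaking strict aliasing. */
uint32_t float_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

`first_byte` returns the low or the high byte of a multi-byte integer depending on the machine's endianness, which is exactly the kind of detail a managed runtime hides.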

  8. illusion
    memory is unlimited
    types exist
    the virtual machine IS the operating system

  9. virtual machines are a lie

  10. we all accept it,
    and unfortunately believe in it

  11. agenda
    EgoVM: what’s in it
    oops, object layouts and pointer arithmetic
    metaspace, methods, fields and classes
    interpreter, stack frame and local variables
    call sites, linking and method dispatch
    garbage collection

  12. EgoVM: naked truth
    single threaded,
    interpreted only,
    mark and sweep garbage collected,
    no binary form of bytecode

  13. specialization or universal bytecodes
    have lots of specialized bytecodes, one for each kind of type
    have a small set of bytecodes and carry the type as an operand

  14. the choice
    I chose the second option after looking at the amount of pain, sweat and
    misery in the JVM with Project Valhalla.

  15. bytecode
    PUSH
    POP
    GET
    PUT
    NEW
    CALL
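To make the "small set of bytecodes, type as operand" choice concrete, here is a sketch of an in-memory instruction format. Since EgoVM has no binary form of bytecode, instructions could plausibly be plain structs; the type tags, field names and helper functions below are hypothetical, not EgoVM's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* The opcodes listed on the slide, plus RETURN, which is described later. */
typedef enum { OP_PUSH, OP_POP, OP_GET, OP_PUT, OP_NEW, OP_CALL, OP_RETURN } Opcode;

/* Hypothetical type tags carried as an operand instead of being baked
 * into a family of specialized opcodes. */
typedef enum { T_NONE, T_INT64, T_UINT8, T_UINT16, T_OBJECT } TypeTag;

typedef struct {
    Opcode  op;
    TypeTag type;      /* used by PUSH @type value */
    bool    is_local;  /* PUSH #index (true) vs PUSH @type value (false) */
    int64_t operand;   /* local variable index or literal value */
} Instruction;

/* PUSH @type value: one opcode for every type of literal. */
Instruction push_const(TypeTag type, int64_t value) {
    Instruction i = { OP_PUSH, type, false, value };
    return i;
}

/* PUSH #index: no type operand, it is inferred from the local table. */
Instruction push_local(uint32_t index) {
    Instruction i = { OP_PUSH, T_NONE, true, (int64_t) index };
    return i;
}
```

Compare the JVM, which needs `iload`, `lload`, `fload`, `dload` and `aload` for what is a single `PUSH #index` here.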

  16. PUSH
    Pushes a value on top of the stack. It takes two forms:
    push a value from the local variable table on top of the stack: PUSH #index,
    where index is a zero-based index into the local variable table; we omit
    type information, as it is inferred from the local variable table,
    push a constant on top of the stack: PUSH @type value, where type denotes
    the type and value is the value. This way we can push literals, like
    integers.

  17. POP
    Pops a value from the top of the stack and places it in the local variable
    table.
    It takes only one form, POP #index. Because we preserve type
    information on the stack, the type will be inferred and stored in the local
    variable table as well.
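The PUSH and POP semantics above could look roughly like this, assuming a fixed-size operand stack and local variable table whose entries are a type tag plus an integer payload. `Entry`, `Frame` and the capacity limits are simplifications invented for this sketch; EgoVM's own `StackEntry` appears later in the deck.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { T_INT64, T_UINT8, T_UINT16 } TypeTag;

/* Every stack slot and local variable keeps its type next to its value. */
typedef struct { TypeTag type; int64_t value; } Entry;

#define MAX_STACK 64
#define MAX_LOCALS 16

typedef struct {
    Entry  stack[MAX_STACK];
    size_t sp;                  /* number of entries on the stack */
    Entry  locals[MAX_LOCALS];
    bool   defined[MAX_LOCALS]; /* has the local slot been written yet? */
} Frame;

/* PUSH @type value: push a literal together with its declared type. */
bool push_const(Frame* f, TypeTag type, int64_t value) {
    if (f->sp == MAX_STACK) return false;               /* stack overflow */
    f->stack[f->sp++] = (Entry){ type, value };
    return true;
}

/* PUSH #index: the type comes along from the local variable table. */
bool push_local(Frame* f, size_t index) {
    if (index >= MAX_LOCALS || !f->defined[index] || f->sp == MAX_STACK)
        return false;
    f->stack[f->sp++] = f->locals[index];
    return true;
}

/* POP #index: move the top entry, type included, into a local slot. */
bool pop_local(Frame* f, size_t index) {
    if (index >= MAX_LOCALS || f->sp == 0) return false; /* underflow */
    f->locals[index] = f->stack[--f->sp];
    f->defined[index] = true;
    return true;
}
```

Because POP copies the whole entry, the type recorded for a local always matches the last value stored in it, which is what lets `PUSH #index` omit the type.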

  18. RETURN
    No-argument op. It returns the value on top of the stack to the caller. What
    if there is no value on the stack? Error. We are not going to have
    null in EgoVM. EgoVM will have a special type, None (yes, a type), with
    only a single value.
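A minimal sketch of RETURN with a dedicated None type instead of null. `Value`, `NONE` and `op_return` are hypothetical names; the point is that a method with nothing meaningful to return pushes the single None value explicitly, and an empty stack is reported as an error rather than turned into an implicit null.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { T_NONE, T_INT64 } TypeTag;
typedef struct { TypeTag type; int64_t value; } Value;

/* None is a type with exactly one value; there is no null. */
static const Value NONE = { T_NONE, 0 };

typedef enum { RET_OK, RET_EMPTY_STACK } ReturnStatus;

/* RETURN takes no operand: hand the top of the callee's stack to the caller.
 * An empty stack is an error, not an implicit null. */
ReturnStatus op_return(const Value* stack, size_t sp, Value* out) {
    if (sp == 0) return RET_EMPTY_STACK;
    *out = stack[sp - 1];
    return RET_OK;
}

bool is_none(Value v) { return v.type == T_NONE; }
```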

  19. CALL
    The most important opcode. It calls a method on a receiver type (this
    is not true all the time). It takes only one form, CALL methodname. It
    then pops the whole stack and takes the value from the bottom of the stack,
    which becomes the receiver, and tries to link a method to this call site
    based on the types of the remaining values. The returned value is pushed on
    the stack.
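The linking step of CALL might be sketched as a lookup keyed on the method name, the receiver (bottom of the stack) and the types of the remaining entries. The `Method` table, the type tags and the linear search are assumptions for illustration; the deck does not show how EgoVM stores or searches methods.

```c
#include <stddef.h>
#include <string.h>

typedef enum { T_INT64, T_STRING, T_POINT } TypeTag;

#define MAX_PARAMS 4

/* Hypothetical method table entry. */
typedef struct {
    const char* name;            /* method name, e.g. "move" */
    TypeTag     receiver;        /* type the method is defined on */
    size_t      arity;           /* number of parameters, at most MAX_PARAMS */
    TypeTag     params[MAX_PARAMS];
    int         id;              /* stands in for a pointer to the method body */
} Method;

/* Link a call site: stack_types[0] is the bottom of the stack (the receiver),
 * the rest form the signature. Returns NULL when nothing matches. */
const Method* link_call(const Method* table, size_t n, const char* name,
                        const TypeTag* stack_types, size_t depth) {
    if (depth == 0) return NULL;             /* no receiver: link error */
    TypeTag receiver = stack_types[0];
    size_t arity = depth - 1;
    for (size_t i = 0; i < n; i++) {
        const Method* m = &table[i];
        if (m->receiver != receiver || m->arity != arity ||
            strcmp(m->name, name) != 0)
            continue;
        size_t k = 0;
        while (k < arity && m->params[k] == stack_types[k + 1]) k++;
        if (k == arity) return m;            /* every argument type matches */
    }
    return NULL;
}
```

Because every argument's type takes part in the match, not only the receiver's, the selection behaves like multiple dispatch.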

  20. This kind of looks like INVOKEDYNAMIC in the JVM: the target method
    signature will be derived from the stack and it will be late bound, getting
    us closer to a perfect machine (read some Alan Kay).

  21. NEW
    NEW allocates a new object and pushes it on the stack. Be careful: it is an
    uninitialized object. We then need to call a constructor method, which in
    EgoVM land is any method that returns an object of the same type as its
    enclosing class. This way we can have named constructors.
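The constructor rule fits in one predicate. A sketch, assuming hypothetical method metadata that records the enclosing class and the return type by name:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical method metadata: only what the rule needs. */
typedef struct {
    const char* name;
    const char* enclosing_class;
    const char* return_type;
} MethodInfo;

/* A constructor is any method whose return type is its own enclosing
 * class, so several named constructors can coexist in one class. */
bool is_constructor(const MethodInfo* m) {
    return strcmp(m->return_type, m->enclosing_class) == 0;
}
```

For example, in a `Point` class both a hypothetical `origin` and a `fromPolar` method returning `Point` would count as constructors, while a `length` method returning an integer would not.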

  22. Because we want to keep the instruction set minimal, the same
    opcode will be used for allocation of new arrays, but with a
    fancy quirk: we need to pass the length of the array.
    Here is one way we can do it.

  23. This allocates an 8-element array of unsigned 8-bit ints.
    PUSH @UINT8[] // push array type on stack
    PUSH @UINT16 8 // push array length
    NEW // create new array
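The array form of NEW could be sketched like this: with the stack holding the array type and, on top, the length, the interpreter pops both and allocates a zeroed block prefixed by a small header. The `ArrayObj` layout, the element type list and the zero-filling are assumptions; the slide only fixes the operand order.

```c
#include <stdint.h>
#include <stdlib.h>

typedef enum { T_UINT8, T_UINT16, T_INT64 } ElemType;

/* Hypothetical array object: header (element type, length) followed by data. */
typedef struct {
    ElemType elem;
    uint64_t length;
    uint8_t  data[];     /* flexible array member (C99) */
} ArrayObj;

static size_t elem_size(ElemType t) {
    switch (t) {
    case T_UINT8:  return 1;
    case T_UINT16: return 2;
    case T_INT64:  return 8;
    }
    return 0;
}

/* NEW for arrays: the interpreter would pop the length (top of the stack)
 * and then the array type, and call this. Returns NULL on overflow or OOM. */
ArrayObj* new_array(ElemType elem, uint64_t length) {
    size_t size = elem_size(elem);
    if (size == 0 || length > (SIZE_MAX - sizeof(ArrayObj)) / size) return NULL;
    ArrayObj* a = calloc(1, sizeof(ArrayObj) + (size_t) length * size);
    if (a == NULL) return NULL;
    a->elem = elem;
    a->length = length;
    return a;
}
```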

  24. So NEW would look more like a CALL, but expects a type as the receiver.
    This is going to be awesome!!!
    Which leads me to speculate: do we really need NEW?

  25. things we can live without
    polymorphism: in the end it is just an if in disguise, and late-bound
    method dispatch is tricky enough
    if itself, and other control flow structures (look ma, I am
    Smalltalk)
    access modifiers, because we trust each other
    namespaces, because who needs this shit
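In Smalltalk, conditionals are messages: `ifTrue:ifFalse:` is sent to the object `true` or `false`, and each one evaluates the appropriate block. A rough C imitation with function pointers, using hypothetical types; in a VM the branch would be selected by ordinary method dispatch rather than a struct field.

```c
typedef int (*Block)(void* ctx);   /* a "block": code plus its context */

typedef struct Boolean Boolean;
struct Boolean {
    /* The "method" each Boolean object answers to. */
    int (*if_true_if_false)(const Boolean* self, Block t, Block f, void* ctx);
};

static int true_branch(const Boolean* self, Block t, Block f, void* ctx) {
    (void) self; (void) f;
    return t(ctx);                 /* true always runs the first block */
}

static int false_branch(const Boolean* self, Block t, Block f, void* ctx) {
    (void) self; (void) t;
    return f(ctx);                 /* false always runs the second block */
}

const Boolean TRUE_OBJ  = { true_branch };
const Boolean FALSE_OBJ = { false_branch };

/* A comparison answers a Boolean object instead of a C truth value. */
const Boolean* less_than(int a, int b) {
    static const Boolean* const BOOLS[2] = { &FALSE_OBJ, &TRUE_OBJ };
    return BOOLS[a < b];
}

/* Choosing a branch is a message send: no if statement involved. */
int send_if(const Boolean* cond, Block t, Block f, void* ctx) {
    return cond->if_true_if_false(cond, t, f, ctx);
}

/* Example blocks and a min() written without if. */
typedef struct { int a, b; } Pair;
static int pick_a(void* ctx) { return ((Pair*) ctx)->a; }
static int pick_b(void* ctx) { return ((Pair*) ctx)->b; }

int min_of(Pair* p) {
    return send_if(less_than(p->a, p->b), pick_a, pick_b, p);
}
```

The only conditional left is inside `less_than`, which indexes a two-element table with the result of `<`: primitive comparisons still have to come from somewhere.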

  26. oops, object layouts and
    pointer arithmetic

  27. fat pointers
    remember the safety guarantees?
    in C a pointer doesn't carry enough information about type, array
    length and such
    this is where the idea of fat pointers was born
    Cello is a library that brings higher-level programming to C.
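Tying this back to memory and operation safety, a fat pointer can be sketched as a struct that carries the element type and length next to the address, so every access can be checked. This is not Cello's API; `FatPtr` and `fat_get_int64` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { T_INT64, T_UINT8 } TypeTag;

/* A fat pointer: the address plus what a bare C pointer lacks. */
typedef struct {
    void*   base;
    TypeTag type;    /* element type the memory holds */
    size_t  length;  /* number of elements that may be addressed */
} FatPtr;

/* Checked read: refuses wrong-type (operation safety) and
 * out-of-bounds (memory safety) accesses instead of performing them. */
bool fat_get_int64(FatPtr p, size_t index, int64_t* out) {
    if (p.type != T_INT64 || index >= p.length) return false;
    *out = ((int64_t*) p.base)[index];
    return true;
}
```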

  28. object layout
    decides on the layout of fields in memory
    you need to take into account field sizes, memory alignment and
    cache lines
    EgoVM doesn't care about it and takes a naive approach
    What Every Programmer Should Know About Memory
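One plausible reading of the naive approach: place fields in declaration order and align each to its own size, with no reordering to save padding or to group hot fields within a cache line. The sketch computes the kind of `offset` value that the field access code on slide 29 reads from `field->offset`; `FieldDesc` and `layout_fields` are invented for the example.

```c
#include <stddef.h>

/* Hypothetical field descriptor: size in bytes; offset filled in by layout. */
typedef struct {
    const char* name;
    size_t size;     /* 1, 2, 4 or 8: also used as the alignment */
    size_t offset;
} FieldDesc;

/* Round offset up to the next multiple of align (a power of two). */
static size_t align_up(size_t offset, size_t align) {
    return (offset + align - 1) & ~(align - 1);
}

/* Naive layout: declaration order, each field aligned to its own size.
 * Returns the instance size, padded to the largest field alignment. */
size_t layout_fields(FieldDesc* fields, size_t count) {
    size_t offset = 0, max_align = 1;
    for (size_t i = 0; i < count; i++) {
        size_t align = fields[i].size;
        offset = align_up(offset, align);
        fields[i].offset = offset;
        offset += fields[i].size;
        if (align > max_align) max_align = align;
    }
    return align_up(offset, max_align);
}
```

With a 1-byte, an 8-byte and a 2-byte field in that order, the offsets come out as 0, 8 and 16 for a 24-byte instance; declaring the same fields largest first would need only 16 bytes, which is the saving a smarter layout goes after.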

  29. Object oops_Object_get_object(Object obj, char* name) {
        Class* class = oops_Object_class(obj);
        Field* field = meta_Class_get_field(class, name);
        void* oops = oops_Object_oops(obj);
        /* step field->offset bytes into the object, then read an Object there */
        return *(Object*) (((uint8_t*) oops) + field->offset);
    }
    void oops_Object_set_object(Object obj, char* name, Object value) {
        Class* class = oops_Object_class(obj);
        Field* field = meta_Class_get_field(class, name);
        void* oops = oops_Object_oops(obj);
        /* same address computation as the getter, but store instead of load */
        *(Object*) (((uint8_t*) oops) + field->offset) = value;
    }

  30. void oops_Array_set_object(Array array, uint64_t index, Object obj) {
        ArrayHeader* oops = oops_Array_oops(array, index);
        *(Array*) (oops + index) = obj;
    }
    Object oops_Array_get_object(Array array, uint64_t index) {
        ArrayHeader* oops = oops_Array_oops(array, index);
        return *(Array*) (oops + index);
    }

  31. metaspace, methods, fields
    and classes

  32. metaspace is a repository of all metadata about our code
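A toy metaspace, assuming hypothetical `ClassMeta` and `FieldMeta` records and a fixed-capacity registry with linear search. A real one would more likely use a hash table (GLib, whose `GPtrArray` the deck uses, provides `GHashTable`) and would also hold methods.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical metadata records. */
typedef struct { const char* name; size_t offset; } FieldMeta;
typedef struct {
    const char*      name;
    const FieldMeta* fields;
    size_t           field_count;
} ClassMeta;

#define MAX_CLASSES 32

/* The metaspace: one place to register and look up every class. */
typedef struct {
    const ClassMeta* classes[MAX_CLASSES];
    size_t count;
} Metaspace;

int metaspace_register(Metaspace* ms, const ClassMeta* cls) {
    if (ms->count == MAX_CLASSES) return -1;   /* registry full */
    ms->classes[ms->count++] = cls;
    return 0;
}

const ClassMeta* metaspace_find_class(const Metaspace* ms, const char* name) {
    for (size_t i = 0; i < ms->count; i++)
        if (strcmp(ms->classes[i]->name, name) == 0) return ms->classes[i];
    return NULL;
}

/* Field lookup by name, in the spirit of meta_Class_get_field on slide 29. */
const FieldMeta* class_find_field(const ClassMeta* cls, const char* name) {
    for (size_t i = 0; i < cls->field_count; i++)
        if (strcmp(cls->fields[i].name, name) == 0) return &cls->fields[i];
    return NULL;
}
```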

  33. interpreter, stack frames and
    local variables

  34. struct StackFrame {
        StackFrame_t* parent;     /* the calling frame */
        GPtrArray* stack;         /* operand stack */
        GPtrArray* localvars;     /* local variable table */
        Metaspace_t* meta_space;
    };

  35. struct StackEntry {
        Type_t* type;
        union {
            Object obj;
            char* literal;
        };
    };

  36. if in doubt, use a visitor
    … and function pointers
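Function pointers give a compact interpreter loop: one handler per opcode, looked up in a table indexed by the opcode, much like a visitor's `visit_*` methods. The `ADD` opcode and the `VM` struct are hypothetical, added only so the loop computes something (in EgoVM, arithmetic would presumably be a CALL); stack bounds checks are omitted for brevity.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { OP_PUSH, OP_ADD, OP_HALT, OP_COUNT } Opcode;

typedef struct { Opcode op; int64_t arg; } Insn;

typedef struct {
    int64_t stack[32];
    size_t  sp;
    size_t  pc;
    int     running;
} VM;

/* One handler per opcode, like the visit methods of a visitor. */
typedef void (*Handler)(VM* vm, const Insn* insn);

static void do_push(VM* vm, const Insn* insn) {
    vm->stack[vm->sp++] = insn->arg;
}

static void do_add(VM* vm, const Insn* insn) {
    (void) insn;
    vm->sp--;                                  /* pop the right operand */
    vm->stack[vm->sp - 1] += vm->stack[vm->sp];
}

static void do_halt(VM* vm, const Insn* insn) {
    (void) insn;
    vm->running = 0;
}

/* The dispatch table replaces a big switch: adding an opcode means
 * adding one function and one table entry. */
static const Handler HANDLERS[OP_COUNT] = {
    [OP_PUSH] = do_push, [OP_ADD] = do_add, [OP_HALT] = do_halt,
};

/* Run until HALT and return the top of the stack (0 if empty). */
int64_t run(const Insn* code) {
    VM vm = { .sp = 0, .pc = 0, .running = 1 };
    while (vm.running) {
        const Insn* insn = &code[vm.pc++];
        HANDLERS[insn->op](&vm, insn);
    }
    return vm.sp > 0 ? vm.stack[vm.sp - 1] : 0;
}
```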

  37. some thoughts on
    performance
    getting rid of indirection as much as you can
    bounds checking for array access
    code verifier
    "virtual bytecodes"
    inlining
    call site cache (especially polymorphic), we can use GQuark for
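A call site cache can be as small as one remembered (receiver type, method) pair per call site: a monomorphic inline cache. The sketch below is that simplest variant, with hypothetical names and a caller-supplied lookup function standing in for full method resolution; it does not use GQuark. A polymorphic cache would keep a few such pairs before falling back to the slow path.

```c
#include <stddef.h>

typedef int TypeId;
typedef int (*MethodFn)(int arg);

/* Monomorphic inline cache kept at one call site. */
typedef struct {
    TypeId   cached_type;
    MethodFn cached_fn;     /* NULL until the first call */
    unsigned misses;        /* how often the slow path ran */
} CallSiteCache;

/* Slow path: full method resolution for a receiver type. */
typedef MethodFn (*LookupFn)(TypeId receiver_type);

int call_cached(CallSiteCache* cs, TypeId receiver_type, int arg, LookupFn lookup) {
    if (cs->cached_fn == NULL || cs->cached_type != receiver_type) {
        cs->cached_fn = lookup(receiver_type);   /* miss: resolve and remember */
        cs->cached_type = receiver_type;
        cs->misses++;
    }
    return cs->cached_fn(arg);                   /* hit: no lookup at all */
}

/* Example methods and a lookup that resolves receiver types 1 and 2. */
static int double_it(int x) { return 2 * x; }
static int negate(int x) { return -x; }
static MethodFn example_lookup(TypeId t) { return t == 1 ? double_it : negate; }
```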

  38. call sites, dynamic dispatch
    and late binding

  39. — Alan Kay
    OOP to me means only messaging, local retention,
    and protection and hiding of state-process, and
    extreme late-binding of all things. It can be done in
    Smalltalk and in LISP. There are possibly other
    systems in which this is possible, but I’m not aware
    of them.

  40. — Wikipedia
    Dynamic dispatch is different from late binding
    (also known as dynamic binding). Name binding
    associates a name with an operation. A
    polymorphic operation has several
    implementations, all associated with the same
    name. These bindings can be made at compile
    time or (with late binding) at run time. With
    dynamic dispatch, one particular implementation
    of the operation is chosen at run time. While
    dynamic dispatch does not imply late binding, late
    binding does imply dynamic dispatch, since the
    implementation of a late-bound operation is not
    known until run time.

  41. in the JVM world we have a notion of early- and late-bound methods
    final, private and static methods and constructor calls are early bound
    all remaining ones (including invokevirtual, invokeinterface and
    invokedynamic) are late bound

  42. single dispatch: an implementation is chosen based only on the
    receiver's type (remember operation safety?)
    multiple dispatch: invokedynamic
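The difference can be sketched with dispatch tables: single dispatch indexes one table by the receiver's type, multiple dispatch indexes by the types of all arguments. The shape types and functions are hypothetical; EgoVM's CALL, which links on the types of all stack values, is closer to the second.

```c
typedef enum { SHAPE_CIRCLE, SHAPE_SQUARE, SHAPE_COUNT } ShapeType;

typedef const char* (*Impl)(void);

static const char* circle(void) { return "circle"; }
static const char* square(void) { return "square"; }
static const char* circle_circle(void) { return "circle-circle"; }
static const char* circle_square(void) { return "circle-square"; }
static const char* square_circle(void) { return "square-circle"; }
static const char* square_square(void) { return "square-square"; }

/* Single dispatch: one table, indexed by the receiver's type only. */
static const Impl BY_RECEIVER[SHAPE_COUNT] = { circle, square };

/* Multiple dispatch: indexed by the types of both arguments. */
static const Impl BY_BOTH[SHAPE_COUNT][SHAPE_COUNT] = {
    [SHAPE_CIRCLE] = { circle_circle, circle_square },
    [SHAPE_SQUARE] = { square_circle, square_square },
};

const char* describe(ShapeType receiver) { return BY_RECEIVER[receiver](); }

const char* collide(ShapeType a, ShapeType b) { return BY_BOTH[a][b](); }
```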

  43. garbage collection
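Slide 12 says EgoVM is mark-and-sweep garbage collected. A textbook mark-and-sweep over a toy heap looks roughly like this; the fixed-size `Obj` slots, the two reference fields and the recursive marking are simplifications (real collectors usually use an explicit mark stack), and the roots would come from stack frames and local variables.

```c
#include <stdbool.h>
#include <stddef.h>

#define HEAP_SIZE 16
#define MAX_REFS 2

/* A toy heap of fixed-size objects; each may reference up to two others. */
typedef struct {
    bool in_use;
    bool marked;
    int  refs[MAX_REFS];   /* indices of referenced objects, -1 if empty */
} Obj;

typedef struct { Obj objs[HEAP_SIZE]; } Heap;

/* Take the first free slot; returns its index, or -1 if the heap is full. */
int alloc_obj(Heap* h) {
    for (int i = 0; i < HEAP_SIZE; i++) {
        if (!h->objs[i].in_use) {
            h->objs[i] = (Obj){ .in_use = true, .marked = false, .refs = { -1, -1 } };
            return i;
        }
    }
    return -1;
}

/* Mark phase: flag everything reachable from object i (cycles are safe,
 * because already-marked objects are skipped). */
static void mark(Heap* h, int i) {
    if (i < 0 || !h->objs[i].in_use || h->objs[i].marked) return;
    h->objs[i].marked = true;
    for (int r = 0; r < MAX_REFS; r++) mark(h, h->objs[i].refs[r]);
}

/* Mark from the roots, then sweep: free unmarked objects and clear the
 * marks on survivors. Returns the number of objects reclaimed. */
size_t collect(Heap* h, const int* roots, size_t root_count) {
    for (size_t i = 0; i < root_count; i++) mark(h, roots[i]);
    size_t freed = 0;
    for (int i = 0; i < HEAP_SIZE; i++) {
        Obj* o = &h->objs[i];
        if (!o->in_use) continue;
        if (o->marked) {
            o->marked = false;
        } else {
            o->in_use = false;
            freed++;
        }
    }
    return freed;
}
```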

  44. final notes
    SCons as a build tool; it is never too late to learn something
    new
    gcc 7.2.0, -Wpedantic -Werror --std=c11 -D_GNU_SOURCE=1
    tried CLion (sucks, big time), Eclipse (sucks even more), VS Code
    (sucks less), NetBeans (just works)
    valgrind, life saver
    gdb, brain damage

  45. stuff worth exploring : articles
    The Structure and Performance of Efficient Interpreters
    Virtual Machine Showdown: Stack Versus Registers
    Adaptive Optimization for SELF
    Design Issues for Foreign Function Interfaces
    Uniprocessor Garbage Collection Techniques
    A Brief History of Just-In-Time
    Software and Hardware Techniques for Efficient Polymorphic Calls

  46. stuff worth exploring : books
    Compiler Design: Virtual Machines
    Advanced Design and Implementation of Virtual Machines
    Virtual Machines: Versatile Platforms for Systems and Processes
    Virtual Machines
    Engineering a Compiler
    The Garbage Collection Handbook: The Art of Automatic Memory Management
    Programming Language Pragmatics
