motivation: I learn by doing; reading books doesn't work for me. 20 years of reading, using and exploring VMs, but not as a full-time job. Time to understand, and to apologize to C.
memory safety: "ensures that a certain type of data in the memory always follows the restrictions of that type (like an array never having elements out of bounds)" (Xiao-Feng Li)
control safety: "ensures that the flow of code execution never reaches any point that either gets stuck or goes wild (like a jump to a malicious code segment)" (Xiao-Feng Li)
agenda: EgoVM: what's in it; oops, object layouts and pointer arithmetic; metaspace, methods, fields and classes; interpreter, stack frame and local variables; call sites, linking and method dispatch; garbage collection
PUSH: Pushes a value on the top of the stack. It takes two forms. PUSH #index pushes a value from the local variable table, where index is a zero-based index into the local variable table; we omit type information, as it will be inferred from the local variable table. PUSH @type value pushes a constant, where type denotes the type and value is the value. This way we can push literals, such as integers.
POP: Pops the value from the top of the stack and places it in the local variable table. It takes only one form, POP #index. Because we preserve type information on the stack, the type will be inferred and stored in the local variable table as well.
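A minimal C sketch of PUSH and POP over a single frame, assuming a tagged Value struct in which every slot carries its type. The names Value, Frame, push_local, push_const_int32 and pop_local are hypothetical, not taken from EgoVM's sources. Because the tag travels with the value, neither PUSH #index nor POP #index needs a type operand.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tagged value: the type is stored next to the payload. */
typedef enum { T_NONE, T_INT32, T_UINT8 } Type;

typedef struct {
    Type type;
    union { int32_t i32; uint8_t u8; } as;
} Value;

#define STACK_MAX 64
#define LOCALS_MAX 16

typedef struct {
    Value stack[STACK_MAX];    /* operand stack */
    int sp;                    /* number of values currently on the stack */
    Value locals[LOCALS_MAX];  /* local variable table, zero-based */
} Frame;

/* PUSH #index: copy local variable `index`, tag included, onto the stack. */
void push_local(Frame *f, int index) {
    assert(index >= 0 && index < LOCALS_MAX);
    assert(f->sp < STACK_MAX);
    f->stack[f->sp++] = f->locals[index];
}

/* PUSH @INT32 value: push a typed constant. */
void push_const_int32(Frame *f, int32_t value) {
    assert(f->sp < STACK_MAX);
    f->stack[f->sp].type = T_INT32;
    f->stack[f->sp].as.i32 = value;
    f->sp++;
}

/* POP #index: move the top of the stack into local `index`; the type tag
   is copied along with the value into the local variable table. */
void pop_local(Frame *f, int index) {
    assert(index >= 0 && index < LOCALS_MAX);
    assert(f->sp > 0);
    f->locals[index] = f->stack[--f->sp];
}
```

Zero-initializing a Frame leaves every local tagged T_NONE, so untouched slots are never "untyped".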
RETURN: No-argument opcode. It returns the value on the top of the stack to the caller. What if there is no value on the stack? Error. We are not going to have null in EgoVM. Instead, EgoVM will have a special type, None (yes, a type), with only a single value.
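One way to picture RETURN and the None type, as a hypothetical C sketch: None is just another type tag with exactly one value, NONE, so "nothing to return" is still a typed value and an empty stack is the only failure case. The Value layout, NONE and op_return are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch: None is an ordinary type tag with a single value. */
typedef enum { T_NONE, T_INT32 } Type;

typedef struct {
    Type type;
    int value;  /* meaningful only for T_INT32 */
} Value;

/* The one and only value of type None. */
const Value NONE = { T_NONE, 0 };

typedef struct {
    Value stack[16];
    int sp;  /* number of values on the stack */
} Frame;

/* RETURN: hand the top of the stack back to the caller. An empty stack is
   an error; a method with nothing useful to return pushes NONE first. */
bool op_return(const Frame *f, Value *out) {
    if (f->sp == 0) {
        return false;  /* error: nothing to return, and there is no null */
    }
    *out = f->stack[f->sp - 1];
    return true;
}
```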
CALL: The most important opcode. It calls a method on a receiver type (this is not true all the time). It takes only one form, CALL methodname. It then pops the whole stack: the value from the bottom of the stack becomes the receiver, and the VM tries to link a method to this call site based on the types of the remaining values. The returned value is pushed on the stack.
This looks a bit like INVOKEDYNAMIC in the JVM: the target method signature is derived from the stack and is late bound, getting us closer to a perfect machine (read some Alan Kay).
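The linking step of CALL could look roughly like this hypothetical C sketch: the bottom of the stack is the receiver, everything above it is an argument, and a method is selected by receiver type, name and argument types. The flat METHODS table, link_call and op_call are assumptions for illustration; a real VM would cache the linked target per call site instead of searching on every call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { T_NONE, T_INT32, T_UINT8 } Type;

/* Payload kept in a single field for brevity. */
typedef struct { Type type; int32_t i32; } Value;

#define MAX_ARGS 4

typedef Value (*Impl)(const Value *recv, const Value *args);

typedef struct {
    Type receiver;          /* type the method is defined on */
    const char *name;
    int argc;
    Type params[MAX_ARGS];  /* expected argument types */
    Impl impl;
} Method;

/* Two "add" overloads on INT32, told apart only by their argument type. */
Value int_add_int(const Value *recv, const Value *args) {
    Value r = { T_INT32, recv->i32 + args[0].i32 };
    return r;
}

Value int_add_byte(const Value *recv, const Value *args) {
    Value r = { T_INT32, recv->i32 + (args[0].i32 & 0xFF) };
    return r;
}

const Method METHODS[] = {
    { T_INT32, "add", 1, { T_INT32 }, int_add_int },
    { T_INT32, "add", 1, { T_UINT8 }, int_add_byte },
};
const size_t N_METHODS = sizeof METHODS / sizeof METHODS[0];

/* Link: receiver type from the bottom of the stack, argument types from
   everything above it. Returns NULL when nothing matches (link error). */
const Method *link_call(const Method *table, size_t n, const char *name,
                        const Value *stack, int sp) {
    if (sp == 0) return NULL;           /* no receiver at all */
    Type receiver = stack[0].type;
    int argc = sp - 1;
    for (size_t i = 0; i < n; i++) {
        const Method *m = &table[i];
        if (m->receiver != receiver || m->argc != argc ||
            strcmp(m->name, name) != 0)
            continue;
        int ok = 1;
        for (int a = 0; a < argc; a++)
            if (m->params[a] != stack[1 + a].type) { ok = 0; break; }
        if (ok) return m;
    }
    return NULL;
}

/* CALL methodname: link, consume the whole stack, push the result. */
int op_call(const Method *table, size_t n, const char *name,
            Value *stack, int *sp) {
    const Method *m = link_call(table, n, name, stack, *sp);
    if (m == NULL) return 0;
    Value result = m->impl(&stack[0], &stack[1]);
    stack[0] = result;
    *sp = 1;
    return 1;
}
```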
NEW: Allocates a new object and pushes it on the stack. Be careful: it is an uninitialized object. We then need to call a constructor method, which in EgoVM land is any method that returns an object of the same type as the enclosing class. This way we can have named constructors.
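A hypothetical C sketch of NEW and the constructor rule: op_new only reserves storage (malloc leaves the fields uninitialized), and a method counts as a constructor when its return type is its own enclosing class. The Class and MethodInfo descriptors, and the Point methods origin, fromPolar and length, are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct Class Class;

typedef struct {
    const char *name;
    const Class *return_type;  /* NULL stands in for a primitive type here */
} MethodInfo;

struct Class {
    const char *name;
    size_t instance_size;
    const MethodInfo *methods;
    size_t method_count;
};

/* NEW: allocate raw storage. Fields are NOT initialized; a constructor
   has to run before the object is usable. */
void *op_new(const Class *cls) {
    return malloc(cls->instance_size);
}

/* A method is a constructor iff it returns an instance of its own class,
   which is what makes named constructors possible. */
int is_constructor(const Class *cls, const MethodInfo *m) {
    return m->return_type == cls;
}

size_t count_constructors(const Class *cls) {
    size_t n = 0;
    for (size_t i = 0; i < cls->method_count; i++)
        if (is_constructor(cls, &cls->methods[i])) n++;
    return n;
}

/* Illustrative class with two named constructors and one plain method. */
extern const Class POINT;
const MethodInfo POINT_METHODS[] = {
    { "origin", &POINT },
    { "fromPolar", &POINT },
    { "length", NULL },
};
const Class POINT = {
    "Point", 2 * sizeof(double), POINT_METHODS,
    sizeof POINT_METHODS / sizeof POINT_METHODS[0]
};
```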
Because we want to keep the instruction set minimal, the same opcode will be used for allocating new arrays, but with a fancy quirk: we need to pass the length of the array. Here is one way we can do it.
This allocates an 8-element array of unsigned 8-bit ints:
    PUSH @UINT8[]    // push array type on stack
    PUSH @UINT16 8   // push array length
    NEW              // create new array
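Assuming NEW recognises the array case by finding an integer length on top of an array type literal, the three instructions above could be executed like this. Value, Array and op_new_array are hypothetical names; zero-filling the array with calloc is our choice, not something the text prescribes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stack values: a type literal (PUSH @UINT8[]), an integer
   (PUSH @UINT16 8), or the array that NEW produces. */
typedef enum { V_ARRAY_TYPE, V_UINT16, V_ARRAY } Kind;

typedef struct {
    size_t length;
    size_t elem_size;
    unsigned char data[];  /* flexible array member (C99 and later) */
} Array;

typedef struct {
    Kind kind;
    size_t elem_size;  /* V_ARRAY_TYPE: element size, e.g. 1 for UINT8[] */
    uint16_t u16;      /* V_UINT16: the length operand */
    Array *array;      /* V_ARRAY: the allocated array */
} Value;

/* NEW, array flavour: pop the length, then the element type, allocate,
   and push the new array. Returns 0 if the stack has the wrong shape. */
int op_new_array(Value *stack, int *sp) {
    if (*sp < 2) return 0;
    Value len = stack[*sp - 1];
    Value type = stack[*sp - 2];
    if (len.kind != V_UINT16 || type.kind != V_ARRAY_TYPE) return 0;
    Array *a = calloc(1, sizeof *a + (size_t)len.u16 * type.elem_size);
    if (a == NULL) return 0;
    a->length = len.u16;
    a->elem_size = type.elem_size;
    *sp -= 2;                   /* both operands are consumed */
    stack[*sp].kind = V_ARRAY;  /* result replaces them */
    stack[*sp].array = a;
    (*sp)++;
    return 1;
}
```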
things we can live without: polymorphism (at the end this is just an "if" in disguise, and late-bound method dispatch is tricky enough in itself); "if" and other control flow structures (look ma, I am Smalltalk); access modifiers, because we trust each other; namespaces, because who needs this shit
fat pointers: Remember the safety guarantees? In C, a pointer doesn't carry enough information about type, array length and such. This is where the idea of a fat pointer comes from. Cello is a library that brings higher-level programming to C.
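A minimal fat pointer in C, as a sketch: the raw address is bundled with its element type and length, so a read can be refused instead of running off the end of the array. FatPtr and fat_get_int32 are illustrative names and not Cello's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { ELEM_UINT8, ELEM_INT32 } ElemType;

/* The pointer plus the metadata a plain C pointer lacks. */
typedef struct {
    void *data;
    size_t length;  /* number of elements */
    ElemType type;  /* element type */
} FatPtr;

/* Type- and bounds-checked read of an int32 element. */
bool fat_get_int32(FatPtr p, size_t index, int32_t *out) {
    if (p.type != ELEM_INT32 || index >= p.length) {
        return false;  /* memory safety: refuse the access */
    }
    *out = ((const int32_t *)p.data)[index];
    return true;
}
```

The cost is a wider pointer and a check on every access, which is exactly the trade-off the memory safety definition above asks for.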
object layout: decides on the layout of fields in memory. You need to take into account memory alignment and cache line sizes; EgoVM doesn't care about it and takes a naive approach. Further reading: What Every Programmer Should Know About Memory.
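The naive approach could be sketched as follows, assuming each primitive field's alignment equals its size (typical on 64-bit platforms, but not guaranteed by C): fields are placed in declaration order at the next aligned offset, and the instance size is rounded up to the largest alignment. layout_fields and align_up are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Round offset up to the next multiple of align. */
size_t align_up(size_t offset, size_t align) {
    return (offset + align - 1) / align * align;
}

/* Naive layout: declaration order, no reordering, no cache-line tricks.
   Writes each field's offset and returns the padded instance size. */
size_t layout_fields(const size_t *sizes, size_t count, size_t *offsets) {
    size_t offset = 0, max_align = 1;
    for (size_t i = 0; i < count; i++) {
        assert(sizes[i] > 0);
        size_t align = sizes[i];  /* assumption: natural alignment == size */
        offset = align_up(offset, align);
        offsets[i] = offset;
        offset += sizes[i];
        if (align > max_align) max_align = align;
    }
    return align_up(offset, max_align);
}
```

With field sizes 1, 8, 2 this yields offsets 0, 8, 16 and a 24-byte instance; declaring the same fields as 8, 2, 1 gives 16 bytes, which is the kind of reordering the naive approach skips.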
some thoughts on performance: getting rid of indirection as much as you can; bounds checking for array access; code verifier; "virtual bytecodes"; inlining; call site cache (especially polymorphic), we can use GQuark for …
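A monomorphic call site cache, sketched in C under stated assumptions: each call site remembers the last receiver type and the function it resolved to, and only goes through the slow lookup on a miss. Method names and receiver types are plain int ids here, standing in for interned names (the kind of string-to-integer mapping GQuark provides); CallSite, lookup and call are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*MethodFn)(int arg);

typedef struct {
    int method_id;       /* interned method name */
    int cached_type;     /* receiver type seen last time, -1 = empty */
    MethodFn cached_fn;  /* target linked for that type */
} CallSite;

/* Counts slow-path lookups so the effect of the cache is observable. */
int slow_lookups = 0;

int double_it(int x) { return 2 * x; }
int negate_it(int x) { return -x; }

/* Slow path: stands in for a full method lookup by type and name. */
MethodFn lookup(int receiver_type, int method_id) {
    slow_lookups++;
    (void)method_id;  /* only one method name in this toy */
    return receiver_type == 0 ? double_it : negate_it;
}

int call(CallSite *site, int receiver_type, int arg) {
    if (site->cached_type != receiver_type) {  /* miss: relink */
        site->cached_fn = lookup(receiver_type, site->method_id);
        site->cached_type = receiver_type;
    }
    return site->cached_fn(arg);               /* hit: direct call */
}
```

A polymorphic cache would keep a small array of (type, function) pairs per call site instead of a single entry, so alternating receiver types stop thrashing it.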
"OOP to me means only messaging, local retention, and protection and hiding of state-process, and extreme late-binding of all things. It can be done in Smalltalk and in LISP. There are possibly other systems in which this is possible, but I'm not aware of them." (Alan Kay)
"Dynamic dispatch is different from late binding (also known as dynamic binding). Name binding associates a name with an operation. A polymorphic operation has several implementations, all associated with the same name. These bindings can be made at compile time or (with late binding) at run time. With dynamic dispatch, one particular implementation of the operation is chosen at run time. While dynamic dispatch does not imply late binding, late binding does imply dynamic dispatch, since the implementation of a late-bound operation is not known until run time." (Wikipedia)
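To make the distinction concrete, a hypothetical C sketch: dispatch_draw uses dynamic dispatch with an early-bound name ("draw" is fixed to vtable slot 0 in advance, and only the implementation varies at run time), while send binds the name itself at run time by searching a table, so an unknown message only shows up as a failed lookup. All names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef const char *(*Op)(void);

const char *circle_draw(void) { return "circle"; }
const char *square_draw(void) { return "square"; }

/* Dynamic dispatch, early name binding: "draw" is slot 0, decided ahead
   of time; the object's vtable picks the implementation at run time. */
enum { SLOT_DRAW = 0 };

typedef struct { Op slots[1]; } VTable;
typedef struct { const VTable *vt; } Object;

const char *dispatch_draw(const Object *o) {
    return o->vt->slots[SLOT_DRAW]();
}

/* Late binding: the name is resolved only when the message is sent. */
typedef struct { const char *name; Op op; } Binding;

const char *send(const Binding *table, size_t n, const char *name) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(table[i].name, name) == 0)
            return table[i].op();
    return NULL;  /* message not understood */
}
```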
In the JVM world we have a notion of early- and late-bound methods: final, private and static methods and constructor calls are early bound; all remaining calls (including invokevirtual, invokeinterface and invokedynamic) are late bound.
final notes: SCons as a build tool (it is never too late to learn something new); gcc 7.2.0 with -Wpedantic -Werror --std=c11 -D_GNU_SOURCE=1; tried CLion (sucks, big time), Eclipse (sucks even more), VS Code (sucks less), NetBeans (just works); valgrind, life saver; gdb, brain damage
stuff worth exploring: articles: The Structure and Performance of Efficient Interpreters; Virtual Machine Showdown: Stack Versus Registers; Adaptive Optimization for SELF; Design Issues for Foreign Function Interfaces; Uniprocessor Garbage Collection Techniques; A Brief History of Just-In-Time; Software and Hardware Techniques for Efficient Polymorphic Calls
stuff worth exploring: books: Compiler Design: Virtual Machines; Advanced Design and Implementation of Virtual Machines; Virtual Machines: Versatile Platforms for Systems and Processes; Virtual Machines; Engineering a Compiler; The Garbage Collection Handbook: The Art of Automatic Memory Management; Programming Language Pragmatics