EgoVM : naked truth

March 27, 2018

First presentation about EgoVM, an educational VM.

Jaroslaw Palka

Transcript

EgoVM
motivation
  I learn by doing, reading books doesn’t work for me
  20 years reading, using, exploring VMs, not a full time job
  time to understand and apologize to C
virtual machines are everywhere
  LISP, Erlang, Smalltalk, JVM, CLR, V8
managed runtime
  memory safety
  operation safety
  control safety
memory safety (Xiao-Feng Li)
  ensures that a certain type of data in the memory always follow the restrictions of that type (like array never has elements out of bounds)
operation safety (Xiao-Feng Li)
  ensures that the operation on a certain type always follow restrictions of that type
control safety (Xiao-Feng Li)
  ensures that the flow of code execution never reach any point that either gets stuck or goes wild (jump to malicious code segment)
C is a dynamic language
  everything is a memory address or address of memory address
illusion
  memory is unlimited
  types exist
  virtual machine IS operating system
virtual machines are a lie
  we all accept it, and unfortunately believe in it
agenda
  EgoVM: what’s in it
  oops, object layouts and pointer arithmetic
  metaspace, methods, fields and classes
  interpreter, stack frame and local variables
  call sites, linking and method dispatch
  garbage collection
EgoVM: naked truth
  single threaded
  interpreted only
  mark and sweep garbage collected
  no binary form of bytecode
the choice
specialization or universal bytecodes
  have a lot of specialized bytecodes for each type kind
  have a small set of bytecodes and carry type as operand
the choice
  I chose the second option, looking at the amount of pain, sweat and misery in the JVM with Project Valhalla.
bytecode
  PUSH, POP, GET, PUT, NEW, CALL
PUSH
  Pushes a value on the top of the stack. It takes two forms:
  push a value from the local variable table on the top of the stack: PUSH #index, where index is the zero-based index into the local variable table; we omit type information, as it will be inferred from the local variable table,
  push a constant on the top of the stack: PUSH @type value, where type denotes the type and value is the value. This way we can push literals, like integers,
POP
  Pops a value from the top of the stack and places it in the local variable table. It takes only one form, POP #index. Because we preserve type information on the stack, the type will be inferred and stored in the local variable table as well.
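The PUSH and POP forms described above can be sketched in C roughly as follows. This is only an illustration under assumptions: `TypeTag`, `Slot`, `Frame` and the three functions are hypothetical names, not EgoVM's actual code, and every value is simplified to a 64-bit integer payload.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type tags; EgoVM's real type representation may differ. */
typedef enum { T_NONE, T_UINT8, T_UINT16, T_INT64 } TypeTag;

/* A slot keeps the type next to the value, so POP never needs a type operand. */
typedef struct {
    TypeTag type;
    int64_t value;
} Slot;

enum { STACK_MAX = 16, LOCALS_MAX = 8 };

typedef struct {
    Slot stack[STACK_MAX];
    size_t depth;
    Slot locals[LOCALS_MAX];
} Frame;

/* PUSH @type value: push a typed constant. */
void push_const(Frame *f, TypeTag type, int64_t value) {
    assert(f->depth < STACK_MAX);
    f->stack[f->depth++] = (Slot){ type, value };
}

/* PUSH #index: copy a local variable, type included, onto the stack. */
void push_local(Frame *f, size_t index) {
    assert(index < LOCALS_MAX && f->depth < STACK_MAX);
    f->stack[f->depth++] = f->locals[index];
}

/* POP #index: move the top of the stack, type included, into a local. */
void pop_local(Frame *f, size_t index) {
    assert(index < LOCALS_MAX && f->depth > 0);
    f->locals[index] = f->stack[--f->depth];
}
```

Because the tag is stored in every slot, the bytecode only needs the type operand when a constant first enters the machine.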
RETURN
  No-argument op. It returns the value on the top of the stack to the caller. What if there is no value on the stack? Error. We are not going to have null in EgoVM. EgoVM will have a special type, None (yes, a type), with only a single value.
CALL
  The most important opcode. It calls a method on a receiver type (this is not true all the time). It takes only one form, CALL methodname. It then pops the whole stack, takes the value from the bottom of the stack, which becomes the receiver, and tries to link a method to this call site, based on the types of the remaining values. The returned value is pushed on the stack.
this kind of looks like INVOKEDYNAMIC in JVM, target method signature will be derived from stack and it will be late bound, getting us closer to a perfect machine (read some Alan Kay)
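One way to picture linking a call site from the types on the stack is a lookup keyed by method name, receiver type and argument types. The sketch below is an assumption-laden illustration: `Method`, `link_call` and the type tags are hypothetical, and a real VM would search per-class metadata instead of one flat table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type tags used only for this illustration. */
typedef enum { T_NONE, T_INT, T_STRING, T_POINT } TypeTag;

enum { MAX_ARGS = 4 };

typedef struct {
    const char *name;
    TypeTag receiver;        /* type of the value at the bottom of the stack */
    size_t argc;             /* number of remaining values */
    TypeTag args[MAX_ARGS];  /* their types, bottom to top */
} Method;

/* stack[0] is the bottom of the operand stack, i.e. the receiver.
   Returns the matching method, or NULL when nothing links. */
const Method *link_call(const Method *methods, size_t count,
                        const char *name,
                        const TypeTag *stack, size_t depth) {
    assert(methods != NULL && name != NULL);
    if (depth == 0) return NULL;  /* no receiver on the stack */
    for (size_t i = 0; i < count; i++) {
        const Method *m = &methods[i];
        if (strcmp(m->name, name) != 0) continue;
        if (m->receiver != stack[0]) continue;
        if (m->argc != depth - 1) continue;
        if (memcmp(m->args, stack + 1, m->argc * sizeof(TypeTag)) != 0) continue;
        return m;
    }
    return NULL;
}
```

The signature is never written in the bytecode; it is derived from whatever types happen to be on the stack when CALL executes, which is what makes the binding late.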
NEW
  Allocates a new object and pushes it on the stack. Be careful, it is an uninitialized object. We then need to call a constructor method: in EgoVM land, any method which returns an object of the same type as the enclosing class. This way we can have named constructors.
Because we want to keep the instruction set minimal, the same opcode will be used for allocation of new arrays, but with a fancy quirk: we need to pass the length of the array. Here is one way we can do it.
This allocates an 8-element array of unsigned 8-bit ints.
  PUSH @UINT8[]   // push array type on stack
  PUSH @UINT16 8  // push array length
  NEW             // create new array
So NEW would look more like a CALL, but one that expects a type as receiver. This is going to be awesome!!! Which leads me to speculate: do we really need NEW?
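What NEW might do for the array case, once it has popped the length and the array type, can be sketched as a header followed by zeroed element storage. The layout and the names `ArrayHeader` fields, `new_array` and `array_data` are assumptions made for this example, not EgoVM's actual allocator.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical header placed in front of the elements. */
typedef struct {
    size_t elem_size;  /* bytes per element, taken from the array type */
    uint64_t length;   /* element count, taken from the stack */
} ArrayHeader;

/* Allocate the header plus zeroed storage for `length` elements.
   Returns NULL on a zero element size, size overflow, or allocation failure. */
ArrayHeader *new_array(size_t elem_size, uint64_t length) {
    if (elem_size == 0) return NULL;
    if (length > (SIZE_MAX - sizeof(ArrayHeader)) / elem_size) return NULL;
    ArrayHeader *h = calloc(1, sizeof(ArrayHeader) + elem_size * (size_t)length);
    if (h == NULL) return NULL;
    h->elem_size = elem_size;
    h->length = length;
    return h;
}

/* The elements start right after the header. */
uint8_t *array_data(ArrayHeader *h) {
    assert(h != NULL);
    return (uint8_t *)(h + 1);
}
```

For the UINT8[] example above, the call would be `new_array(sizeof(uint8_t), 8)`; the caller releases the array with `free`.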
things we can live without
  polymorphism, in the end this is just an if in disguise, and late bound method dispatch is tricky enough
  if itself, and other control flow structures, look ma' I am Smalltalk
  access modifiers, because we trust each other
  namespaces, because who needs this shit
oops, object layouts and pointer arithmetic
fat pointers
  remember safety guarantees?
  in C a pointer doesn’t carry enough information about type, array length and such
  this is when the idea of the fat pointer was born
  Cello is a library that brings higher level programming to C.
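A minimal sketch of the fat pointer idea: the raw address travels together with a type name and a length, so every access can be checked. `FatPtr`, `fat_get_u8` and `fat_set_u8` are hypothetical names for this illustration; this is not how Cello or EgoVM represent their pointers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A pointer that carries the metadata a plain C pointer lacks. */
typedef struct {
    const char *type_name;  /* what the memory holds */
    size_t length;          /* number of elements */
    uint8_t *data;          /* the raw pointer itself */
} FatPtr;

/* Bounds-checked read: reports failure instead of reading out of range. */
bool fat_get_u8(FatPtr p, size_t index, uint8_t *out) {
    assert(out != NULL);
    if (p.data == NULL || index >= p.length) return false;
    *out = p.data[index];
    return true;
}

/* Bounds-checked write: reports failure instead of writing out of range. */
bool fat_set_u8(FatPtr p, size_t index, uint8_t value) {
    if (p.data == NULL || index >= p.length) return false;
    p.data[index] = value;
    return true;
}
```

The check costs one comparison per access, which is the price of the memory safety guarantee mentioned earlier in the deck.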
object layout
  decides on the layout of fields in memory
  you need to take into account memory alignment and cache line sizes
  EgoVM doesn’t care about it and takes a naive approach
  see: What Every Programmer Should Know About Memory
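To make the trade-off concrete, here is a sketch that computes field offsets two ways. It assumes that the "naive approach" means packing fields back to back in declaration order, which the slides do not spell out; `FieldDesc`, `layout_naive` and `layout_aligned` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    const char *name;
    size_t size;    /* field size in bytes */
    size_t offset;  /* filled in by the layout functions */
} FieldDesc;

/* Naive layout: fields packed back to back. Returns the object size. */
size_t layout_naive(FieldDesc *fields, size_t count) {
    size_t offset = 0;
    for (size_t i = 0; i < count; i++) {
        fields[i].offset = offset;
        offset += fields[i].size;
    }
    return offset;
}

/* Aligned layout: each field starts at a multiple of its own size.
   Assumes every size is a power of two. Returns the object size. */
size_t layout_aligned(FieldDesc *fields, size_t count) {
    size_t offset = 0;
    for (size_t i = 0; i < count; i++) {
        size_t align = fields[i].size;
        assert(align != 0 && (align & (align - 1)) == 0);
        offset = (offset + align - 1) & ~(align - 1);  /* round up */
        fields[i].offset = offset;
        offset += fields[i].size;
    }
    return offset;
}
```

For fields of 1, 8 and 2 bytes, the packed layout is smaller but puts the 8-byte field at an odd address, while the aligned layout spends padding bytes to keep it on an 8-byte boundary.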
object
  /* Read the Object-typed field `name` from obj. */
  Object oops_Object_get_object(Object obj, char* name) {
      Class *class = oops_Object_class(obj);
      Field* field = meta_Class_get_field(class, name);  /* look up field metadata by name */
      void* oops = oops_Object_oops(obj);                /* base address the offset is applied to */
      return *(Object*) (((uint8_t*) oops) + field->offset);
  }

  /* Store value into the Object-typed field `name` of obj. */
  void oops_Object_set_object(Object obj, char* name, Object value) {
      Class *class = oops_Object_class(obj);
      Field* field = meta_Class_get_field(class, name);
      void* oops = oops_Object_oops(obj);
      *(Object*) (((uint8_t*) oops) + field->offset) = value;
  }
array
  /* Store obj in the slot at index. */
  void oops_Array_set_object(Array array, uint64_t index, Object obj) {
      ArrayHeader* oops = oops_Array_oops(array, index);
      *(Object*) (oops + index) = obj;  /* the slot holds an Object */
  }

  /* Load the Object stored in the slot at index. */
  Object oops_Array_get_object(Array array, uint64_t index) {
      ArrayHeader* oops = oops_Array_oops(array, index);
      return *(Object*) (oops + index);
  }
metaspace, methods, fields and classes
metaspace is a repository of all metadata about our code
interpreter, stack frames and local variables
struct StackFrame {
    StackFrame_t* parent;
    GPtrArray* stack;
    GPtrArray* localvars;
    Metaspace_t* meta_space;
};
struct StackEntry {
    Type_t* type;
    union {
        Object obj;
        char* literal;
    };
};
if in doubt use visitor … and function pointers
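The function pointer half of that advice usually means a table of handlers indexed by opcode. The sketch below uses a toy instruction set (`OP_PUSH`, `OP_ADD`, `OP_HALT`) invented for the example; it is not EgoVM's bytecode, and `Vm`, `Instr`, `Handler` and `run` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy opcodes, not EgoVM's instruction set. */
typedef enum { OP_PUSH, OP_ADD, OP_HALT, OP_COUNT } Opcode;

typedef struct { Opcode op; int64_t operand; } Instr;

typedef struct {
    int64_t stack[16];
    size_t depth;
    int running;
} Vm;

typedef void (*Handler)(Vm *vm, int64_t operand);

void op_push(Vm *vm, int64_t operand) {
    assert(vm->depth < 16);
    vm->stack[vm->depth++] = operand;
}

void op_add(Vm *vm, int64_t operand) {
    (void)operand;  /* ADD takes no operand */
    assert(vm->depth >= 2);
    int64_t b = vm->stack[--vm->depth];
    vm->stack[vm->depth - 1] += b;
}

void op_halt(Vm *vm, int64_t operand) {
    (void)operand;
    vm->running = 0;
}

/* One handler per opcode; the interpreter loop needs no switch statement. */
const Handler handlers[OP_COUNT] = {
    [OP_PUSH] = op_push,
    [OP_ADD]  = op_add,
    [OP_HALT] = op_halt,
};

/* Runs until a handler clears vm->running; code must end with OP_HALT. */
void run(Vm *vm, const Instr *code) {
    vm->running = 1;
    for (size_t pc = 0; vm->running; pc++) {
        assert(code[pc].op < OP_COUNT);
        handlers[code[pc].op](vm, code[pc].operand);
    }
}
```

Adding an opcode then means writing one function and one table entry, at the cost of an indirect call per instruction.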
some thoughts on performance
  getting rid of indirection as much as you can
  bounds checking for array access
  code verifier
  "virtual bytecodes"
  inlining
  call site cache (especially polymorphic), we can use GQuark for
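A call site cache can be sketched in its simplest, monomorphic form: the site remembers the last receiver type and the method it resolved to, and only falls back to a full lookup when the type changes. This is a simplification of the polymorphic cache the slide mentions, and all names here (`CallSite`, `call_site_resolve`, `lookup_slow`, the two sample methods) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*MethodFn)(int);

typedef struct { int type_id; MethodFn fn; } MethodEntry;

typedef struct {
    int cached_type;     /* receiver type seen last time */
    MethodFn cached_fn;  /* NULL while the cache is empty */
    unsigned misses;     /* how often the slow path ran */
} CallSite;

/* Two sample methods so the sketch can be exercised. */
int double_it(int x) { return 2 * x; }
int negate(int x) { return -x; }

/* Slow path: linear search through the method table. */
MethodFn lookup_slow(const MethodEntry *table, size_t count, int type_id) {
    for (size_t i = 0; i < count; i++)
        if (table[i].type_id == type_id) return table[i].fn;
    return NULL;
}

/* Fast path first; on a miss, look up and remember the result. */
MethodFn call_site_resolve(CallSite *site, const MethodEntry *table,
                           size_t count, int type_id) {
    assert(site != NULL);
    if (site->cached_fn != NULL && site->cached_type == type_id)
        return site->cached_fn;
    site->misses++;
    MethodFn fn = lookup_slow(table, count, type_id);
    if (fn != NULL) {
        site->cached_type = type_id;
        site->cached_fn = fn;
    }
    return fn;
}
```

A polymorphic variant would keep a small array of (type, method) pairs per site instead of a single pair.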
call sites, dynamic dispatch and late binding
Alan Kay:
  OOP to me means only messaging, local retention, and protection and hiding of state-process, and extreme late-binding of all things. It can be done in Smalltalk and in LISP. There are possibly other systems in which this is possible, but I’m not aware of them.
Wikipedia:
  Dynamic dispatch is different from late binding (also known as dynamic binding). Name binding associates a name with an operation. A polymorphic operation has several implementations, all associated with the same name. These bindings can be made at compile time or (with late binding) at run time. With dynamic dispatch, one particular implementation of the operation is chosen at run time. While dynamic dispatch does not imply late binding, late binding does imply dynamic dispatch, since the implementation of a late-bound operation is not known until run time.
in the JVM world we have a notion of early and late bound methods
  final, private, static and constructor calls are early bound
  all remaining (including invokevirtual, invokeinterface and invokedynamic) are late bound
single dispatch, an implementation will be chosen based only on the receiver's type (remember operation safety?)
multiple dispatch, invokedynamic
garbage collection
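The deck states that EgoVM is mark and sweep garbage collected. The general technique can be sketched as below; this is a textbook-style illustration, not EgoVM's collector, and `Obj`, `Heap`, `heap_alloc`, `mark`, `sweep` and `collect` are hypothetical names. Objects are limited to two outgoing references to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

enum { MAX_REFS = 2 };

typedef struct Obj {
    bool marked;
    struct Obj *refs[MAX_REFS];  /* outgoing references, NULL when unused */
    struct Obj *next;            /* links every allocated object */
} Obj;

typedef struct { Obj *all; size_t live; } Heap;

/* Allocate a zeroed object and register it with the heap. */
Obj *heap_alloc(Heap *h) {
    assert(h != NULL);
    Obj *o = calloc(1, sizeof *o);
    if (o == NULL) return NULL;
    o->next = h->all;
    h->all = o;
    h->live++;
    return o;
}

/* Mark phase: flag everything reachable from o; the flag also stops cycles. */
void mark(Obj *o) {
    if (o == NULL || o->marked) return;
    o->marked = true;
    for (size_t i = 0; i < MAX_REFS; i++) mark(o->refs[i]);
}

/* Sweep phase: free unmarked objects and clear marks on the survivors. */
void sweep(Heap *h) {
    Obj **link = &h->all;
    while (*link != NULL) {
        Obj *o = *link;
        if (o->marked) {
            o->marked = false;
            link = &o->next;
        } else {
            *link = o->next;
            free(o);
            h->live--;
        }
    }
}

/* One full collection starting from the given roots. */
void collect(Heap *h, Obj **roots, size_t root_count) {
    for (size_t i = 0; i < root_count; i++) mark(roots[i]);
    sweep(h);
}
```

In a VM the roots would be the operand stacks and local variable tables of the live stack frames. The recursive mark is fine for a sketch but can overflow the C stack on deep object graphs.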
final notes
  SCons as a build tool, it is never too late to learn something new
  gcc 7.2.0, -Wpedantic -Werror --std=c11 -D_GNU_SOURCE=1
  tried CLion (sucks, big time), Eclipse (sucks even more), VS Code (sucks less), NetBeans (just works)
  valgrind, life saver
  gdb, brain damage
stuff worth exploring: articles
  The Structure and Performance of Efficient Interpreters
  Virtual Machine Showdown: Stack Versus Registers
  Adaptive Optimization for SELF
  Design Issues for Foreign Function Interfaces
  Uniprocessor Garbage Collection Techniques
  A brief history of just-in-time
  Software and Hardware Techniques for Efficient Polymorphic Calls
stuff worth exploring: books
  Compiler Design: Virtual Machines
  Advanced Design and Implementation of Virtual Machines
  Virtual Machines: Versatile Platforms for Systems and Processes
  Virtual Machines
  Engineering a Compiler
  The Garbage Collection Handbook: The Art of Automatic Memory Management
  Programming Language Pragmatics