Slide 1

Slide 1 text

LuaJIT as a Ruby backend. Takeshi Watanabe (@take-cheeze, Fusic) 2018/06/02

Slide 2

Slide 2 text

Who am I?
- Just an OSS mruby developer
- Work in Fukuoka
- Write Ruby on Rails web applications at work

Slide 3

Slide 3 text

About the talk
- First half: mostly about LuaJIT
- Second half: how to use LuaJIT as an mruby backend

Slide 4

Slide 4 text

Motivation of this talk
- I want to study LuaJIT (and JIT compilers)
- I wanted to go to RubyKaigi as a speaker (last year I gave an LT)
- This talk was the one accepted out of the 3 CFP proposals I submitted. The others were:
  - Continuous integration for mruby and its gems (about the CI system I’ve built)
  - Bringing keyword arguments to mruby (about mruby/mruby#3629)

Slide 5

Slide 5 text

Start a New Thing
- Elm is a good place to start with purely functional programming languages
- Start reading LuaJIT!

Slide 6

Slide 6 text

Let’s read and survey LuaJIT (to use as a Ruby backend)

Slide 7

Slide 7 text

Which versions am I going to talk about?
- mruby 1.4.1~master
- LuaJIT 2.0.1~2.1 (master)
- CRuby 2.6

Slide 8

Slide 8 text

Today’s topic: LuaJIT
- See: https://luajit.org/
- An implementation of the Lua scripting language
- Created by Mike Pall
- Known as one of the fastest JIT compiler implementations for a dynamically typed language
- Its VM is faster than the original Lua implementation

Slide 9

Slide 9 text

BTW what is Lua?
- A small programming language for embedding in applications
- Born in Brazil!
- Language features are similar to JavaScript
  - Dynamically typed
  - Integers and floats aren’t treated differently
  - Object-oriented features can be built with metatables (similar to prototypes)
- Very lightweight runtime

Slide 10

Slide 10 text

Difference from Ruby
- No Array or List types
  - Uses the Table type with integer indices instead
  - Tables with integer indices are optimized
- Array-like table indices start from 1
- The String type is immutable (like Symbol)
- The method call operator is `:` ( `call:method(true)` )

Slide 11

Slide 11 text

How does LuaJIT relate to Ruby?
- Ruby and Lua are both dynamically typed languages
- Ruby is looking forward to having a JIT compiler implementation

Slide 12

Slide 12 text

How does LuaJIT relate to mruby?
- The Lua VM’s instruction set is register based (since 5.0)
- Lua’s use cases are similar to mruby’s (mruby is influenced by Lua)
- I want more speed to beat CRuby and make things better

Slide 13

Slide 13 text

Basics about JIT compiler
- With a JIT compiler, the VM behaves like a profiler
- When the VM finds code that should be JIT compiled, it:
  - Allocates executable memory
  - Compiles the non-native code to native code and writes it to the executable memory
  - Switches execution from bytecode to the native code
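
The "VM behaves like a profiler" idea can be sketched in a few lines of C. This is only an illustration under assumptions: `HOT_THRESHOLD`, `FuncState`, and `jit_compile` are hypothetical names, and the "compiler" here just hands back an ordinary C function instead of emitting machine code into executable memory.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical threshold; real VMs tune this value. */
#define HOT_THRESHOLD 3

typedef int64_t (*native_fn)(int64_t);

typedef struct {
  uint32_t call_count;  /* profiling counter kept by the VM */
  native_fn native;     /* NULL until the function is "compiled" */
} FuncState;

/* Slow path: stands in for interpreting bytecode. */
static int64_t interpret_square(int64_t x) { return x * x; }

/* Fast path: stands in for generated native code. */
static int64_t native_square(int64_t x) { return x * x; }

/* Stand-in for the JIT compiler. A real one would emit machine code
   into executable memory and return its address. */
static native_fn jit_compile(void) { return native_square; }

int64_t vm_call(FuncState *f, int64_t arg) {
  if (f->native != NULL) return f->native(arg);  /* already compiled */
  if (++f->call_count >= HOT_THRESHOLD) f->native = jit_compile();
  return interpret_square(arg);                  /* still interpreting */
}
```

Calling `vm_call` repeatedly on the same `FuncState` runs the interpreted path until the counter reaches the threshold, then switches to the function pointer.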

Slide 14

Slide 14 text

Warming up VM
- JIT compiler people talk about “warmup”
- In the initial state of the VM, no code is JIT compiled at all
- Code gets compiled to native code after it has been executed
- When code compilation finishes, the VM is warmed up
- A JIT engine’s peak performance is measured on a warmed-up VM

Slide 15

Slide 15 text

Method vs Tracing JIT compiler
- LuaJIT and some JS engines are tracing JIT compilers
- A method JIT compiler does the JIT compilation per method
- MJIT is method based. Kokubun will talk about it today!

Slide 16

Slide 16 text

Example of JIT code generation (x86)
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
typedef uint32_t (*func_t)();
int main() {
  // Allocate one page that is readable, writable and executable
  void *ptr = mmap(NULL, 4096, PROT_EXEC | PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
  if (ptr == MAP_FAILED) { perror("mmap"); return 1; }
  uint8_t *data = (uint8_t*)ptr;
  // uint32_t func() { return 0x11; }
  data[0] = 0xB8; // mov EAX,0x11
  data[1] = 0x11; // EAX register is return value in x86 ABI
  data[2] = 0x00;
  data[3] = 0x00;
  data[4] = 0x00;
  data[5] = 0xC3; // ret
  data[6] = 0x00;
  printf("0x%0x\n", ((func_t)ptr)()); // call as C function
  return 0;
}

Slide 17

Slide 17 text

About the example
- Running on wandbox: https://wandbox.org/permlink/QwGkCXhvojSGd06U
- Used an online assembler to generate the code: https://defuse.ca/online-x86-assembler.htm
- Just prints “0x11” with printf
- Maybe the only working code I’ve written for this talk

Slide 18

Slide 18 text

Can I do code generation in Ruby?
- Yes! If you can access the mmap system call
- Binaries can be generated with Array#pack
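
Generating the bytes is the portable half of the job. As a sketch, the function below builds the same `mov eax, imm32; ret` sequence as the x86 example for any constant; `emit_return_imm32` is a hypothetical helper name. In Ruby the equivalent packing would be something like `[0xB8, imm, 0xC3].pack('CVC')`, where `V` is a 32-bit little-endian integer.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes `mov eax, imm32; ret` into buf (needs 6 bytes) and returns
   the number of bytes written. Only builds the bytes; it does not
   execute them. */
size_t emit_return_imm32(uint8_t *buf, uint32_t imm) {
  size_t n = 0;
  buf[n++] = 0xB8;                        /* mov eax, imm32 */
  for (int shift = 0; shift < 32; shift += 8)
    buf[n++] = (uint8_t)(imm >> shift);   /* immediate, little-endian */
  buf[n++] = 0xC3;                        /* ret */
  return n;
}
```

Copying the result into memory obtained from mmap with PROT_EXEC and calling it works the same way as in the hand-written example.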

Slide 19

Slide 19 text

JIT compilation doesn’t guarantee speed
- A JIT compiler needs performance measurement
- JIT compilation has significant costs
  - CPU and memory to compile code to native form
  - The VM becomes a profiler, with additional cost
- The JIT compiler and the VM are optimized for it
- Before warmup completes, it may be slower than running without JIT compilation

Slide 20

Slide 20 text

C/C++ extensions may be slow
- The overhead of converting things for C/C++ causes slowdown
- Making all code JIT compiled keeps a warmed-up VM optimized
- In V8, C++ code is slower because of GC object overheads

Slide 21

Slide 21 text

JIT compiler + FFI
- A JIT compiler with FFI support may beat C extensions:
  - When FFI Function Calls Beat Native C
  - DragonFFI
- libffi is faster than just a VM, but there are overheads
- Cooperation between the FFI module and the JIT compiler reduces the overhead

Slide 22

Slide 22 text

“Isn’t LLVM good for JIT?”
- LLVM is a good compiler infrastructure and generates good code
- However, it’s designed for statically compiled languages
- JavaScriptCore tried it but now uses its own JIT engine
- HHVM tried it too, but it ended up as an experimental project

Slide 23

Slide 23 text

What kind of code is JIT compiled?
- Code that is executed frequently
- Long-running loops
- Code that is forced to compile

Slide 24

Slide 24 text

Why is LuaJIT fast?
- Well designed
- Its VM is fully implemented in assembly language
- The compiler is very small
- Very memory-efficient structures
- Pointers are limited to 32-bit length
- Can JIT compile FFI calls
- NaN boxing

Slide 25

Slide 25 text

NaN Boxing
- Some NaN bit patterns can be used to store non-number values
- There are articles from JavaScript engine developers
  - value representation in javascript implementations
  - NaN boxing (Japanese article)
- Efficient in languages without an integer type
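
A minimal NaN boxing sketch in C, to make the idea concrete. The bit layout here (`INT_PREFIX` and friends) is an assumption chosen for illustration and is not LuaJIT's actual tag layout: a 64-bit value is either an ordinary double, or a NaN bit pattern whose low 32 bits carry an int32.

```c
#include <stdint.h>
#include <string.h>

typedef uint64_t Value;

/* Hypothetical layout: a quiet NaN prefix plus one tag bit marks a
   boxed int32. Doubles produced by arithmetic must be canonical NaNs
   so that they never collide with this prefix. */
#define QNAN_BITS   UINT64_C(0x7ff8000000000000)
#define TAG_INT     UINT64_C(0x0001000000000000)
#define INT_PREFIX  (QNAN_BITS | TAG_INT)
#define PREFIX_MASK UINT64_C(0xffff000000000000)

Value box_double(double d) { Value v; memcpy(&v, &d, sizeof v); return v; }
double unbox_double(Value v) { double d; memcpy(&d, &v, sizeof d); return d; }

/* The int32 lives in the low 32 bits of the NaN payload. */
Value box_int(int32_t i) { return INT_PREFIX | (uint32_t)i; }

int32_t unbox_int(Value v) {
  uint32_t u = (uint32_t)v;   /* keep only the low 32 bits */
  int32_t i;
  memcpy(&i, &u, sizeof i);
  return i;
}

int is_int(Value v) { return (v & PREFIX_MASK) == INT_PREFIX; }
int is_double(Value v) { return !is_int(v); }
```

Every value stays 8 bytes wide, and the common case (a plain double) needs no unboxing work beyond reinterpreting the bits.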

Slide 26

Slide 26 text

Tagged Pointer
- Not every address gets pointed to
  - Pointers are usually aligned because of the allocator
  - A 64-bit address space is too large to use entirely
- The `VALUE` type is a tagged pointer in CRuby
- Good when non-floating-point number types are used more
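
A small sketch of pointer tagging, in the spirit of how CRuby marks Fixnum with the lowest bit. The helper names are hypothetical, and the real `VALUE` encoding has more cases (symbols, flonum, special constants) than this shows.

```c
#include <stdint.h>
#include <stdlib.h>

typedef uintptr_t VALUE;   /* named after CRuby's VALUE; only a sketch */

#define FIXNUM_FLAG ((VALUE)1)

/* Heap pointers are aligned, so their lowest bit is always 0.
   That free bit marks "this word is an integer, not a pointer". */
int is_fixnum(VALUE v) { return (v & FIXNUM_FLAG) != 0; }

VALUE int_to_value(intptr_t n) { return ((VALUE)n << 1) | FIXNUM_FLAG; }

/* Assumes >> on a negative signed value is an arithmetic shift,
   which holds on mainstream compilers. */
intptr_t value_to_int(VALUE v) { return (intptr_t)v >> 1; }

VALUE ptr_to_value(void *p) { return (VALUE)p; }
void *value_to_ptr(VALUE v) { return (void *)v; }
```

Integers cost one bit of range but need no allocation, which is why this fits languages where integers are used more than floats.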

Slide 27

Slide 27 text

LuaJIT code levels
- Source code
- Bytecode
- SSA-IR
- Native code

Slide 28

Slide 28 text

Source code
- Just a Lua script
- The most human-readable form
- The parser (lj_parse.c) parses this

Slide 29

Slide 29 text

Bytecode
- See the LuaJIT wiki for the detailed format
- The VM executes this
- The representation is similar to Lua’s
- Can be dumped to a file

Slide 30

Slide 30 text

SSA IR
- Static Single Assignment Intermediate Representation
- See the LuaJIT wiki for details (again)
- The form used for optimization
- Many compiler implementations use SSA form (GCC, LLVM)

Slide 31

Slide 31 text

Native Code
- The form the CPU can execute directly
- x86/ARM/MIPS/PowerPC are supported

Slide 32

Slide 32 text

Optimization in LuaJIT
- See the LuaJIT wiki page
- You can see a long list of optimizations done in LuaJIT!
- Bytecode-level optimization is well documented
- SSA-IR-level optimization isn’t documented much, so you need to read the code!
  - It is well documented in comments
  - There is a page on the Allocation Sinking Optimization

Slide 33

Slide 33 text

Bytecode Optimization
- Copied from the wiki:
  - Constant Folding
  - Optimizing Composite Conditionals
  - Elimination of Conditionals
  - Elimination of Unneeded Results
  - Jump Folding
  - Template Tables
  - Instruction and Operand Specialization
- Some are done in mruby as peephole optimizations too
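
Constant folding is the easiest item on that list to show in code. The sketch below folds a toy expression tree; the `Node` type and `fold` function are made up for illustration and are not how LuaJIT or mruby represent code internally.

```c
#include <stddef.h>

typedef enum { NODE_CONST, NODE_VAR, NODE_ADD, NODE_MUL } NodeKind;

typedef struct Node {
  NodeKind kind;
  double value;             /* used by NODE_CONST */
  struct Node *lhs, *rhs;   /* used by NODE_ADD / NODE_MUL */
} Node;

/* Folds constant sub-expressions in place and returns the node.
   (2 + 3) * x becomes 5 * x; x itself is left untouched. */
Node *fold(Node *n) {
  if (n == NULL || n->kind == NODE_CONST || n->kind == NODE_VAR) return n;
  fold(n->lhs);
  fold(n->rhs);
  if (n->lhs->kind == NODE_CONST && n->rhs->kind == NODE_CONST) {
    double l = n->lhs->value, r = n->rhs->value;
    n->value = (n->kind == NODE_ADD) ? l + r : l * r;
    n->kind = NODE_CONST;   /* the operator node becomes a literal */
    n->lhs = n->rhs = NULL;
  }
  return n;
}
```

The work moves from run time to compile time, so the VM never executes the addition at all.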

Slide 34

Slide 34 text

Where the SSA optimizer code lives
- src/lj_opt_*.c (short descriptions from the file header comments)
  - lj_opt_dce.c: Dead Code Elimination
  - lj_opt_fold.c: Fold Engine, Array Bounds Check Elimination, Common-Subexpression Elimination
  - lj_opt_loop.c: Loop Optimization
  - lj_opt_mem.c: Alias Analysis, Load/Store Forwarding, Dead Store Elimination
  - lj_opt_narrow.c: Narrowing of double to int32_t, stripping of overflow checks
  - lj_opt_sink.c: Allocation Sinking, Store Sinking
  - lj_opt_split.c: Split 64 bit IR instructions into 32 bit IR instructions (for soft-FP)
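
To give a feel for what "narrowing double to int32_t" means, here is a run-time version of the underlying check. It is only a sketch with a hypothetical function name; the real pass in lj_opt_narrow.c works on IR instructions, not on single values.

```c
#include <stdint.h>

/* Returns 1 and stores the integer if d is exactly representable as
   int32_t, otherwise returns 0 and leaves *out alone.
   Note: this treats -0.0 as 0, so a real implementation may need to
   handle the sign of zero separately. */
int narrow_to_int32(double d, int32_t *out) {
  /* The range test also rejects NaN, because comparisons with NaN
     are always false. */
  if (!(d >= -2147483648.0 && d <= 2147483647.0)) return 0;
  int32_t i = (int32_t)d;          /* truncates toward zero */
  if ((double)i != d) return 0;    /* d had a fractional part */
  *out = i;
  return 1;
}
```

When a value passes this check, integer instructions can replace floating-point ones, which matters in a language where every number is nominally a double.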

Slide 35

Slide 35 text

Assembler Backend Optimization
- Read lj_asm.c.

Slide 36

Slide 36 text

Native Code generator of JIT engine
- Many JIT engines have their own native code generator
- There are general-purpose code generators too
  - Xbyak is used in the mruby JIT by @miura1729

Slide 37

Slide 37 text

DynASM
- Named after “Dynamic Assembler”
> DynASM is a pre-processing assembler.
- Allows assembly embedded inside C code
- Written in Lua
- MoarVM uses it

Slide 38

Slide 38 text

DynASM example
http://luajit.org/dynasm_examples.html
if (ptr != NULL) {
  |  mov eax, foo+17
  |  mov edx, [eax+esi*2+0x20]
  |  add ebx, [ecx+bar(ptr, 9)]
}

Slide 39

Slide 39 text

Pointers in LuaJIT are 32-bit
- mmap is limited to the 32-bit pointer range
- GCRef is typed uint32_t
  - The MRef type is uint32_t too
  - gcref() and mref() just cast to void*
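
The memory saving from 32-bit references can be sketched as follows. Note the assumption: to stay portable on 64-bit hosts, this sketch stores an offset into a fixed arena rather than the address itself, so it differs from LuaJIT, where the 32-bit value is simply cast back to a pointer. `Ref32`, `arena_alloc`, and the other names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical arena: every object lives inside this 64 KiB block. */
static uint64_t arena_storage[1 << 13];
#define ARENA_BYTES ((unsigned char *)arena_storage)
#define ARENA_SIZE  ((uint32_t)sizeof(arena_storage))

typedef struct { uint32_t ref32; } Ref32;   /* 4 bytes instead of 8 */

static uint32_t arena_top = 8;   /* offset 0 is reserved as the null ref */

/* Bump allocator; returns the null reference when out of space. */
Ref32 arena_alloc(uint32_t size) {
  Ref32 r = {0};
  if (size == 0 || size > ARENA_SIZE) return r;
  size = (size + 7u) & ~7u;                     /* keep 8-byte alignment */
  if (size > ARENA_SIZE - arena_top) return r;
  r.ref32 = arena_top;
  arena_top += size;
  return r;
}

void *ref_to_ptr(Ref32 r) {
  return r.ref32 ? (void *)(ARENA_BYTES + r.ref32) : NULL;
}

/* p must point into the arena. */
Ref32 ptr_to_ref(const void *p) {
  Ref32 r = { (uint32_t)((const unsigned char *)p - ARENA_BYTES) };
  return r;
}
```

Objects that hold many references shrink noticeably, at the price of a hard limit on how much memory can be addressed.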

Slide 40

Slide 40 text

Building LuaJIT VM
- The VM is implemented in DynASM: src/vm_*.dasc
  - x86/ARM/MIPS/PowerPC
  - Optimization is done in DynASM
- You can’t find lj_vm_call in the source code (it’s the VM body!)
- The symbol prefix “lj_” is added to vm_call, so vm_call is the VM body
- Read the files under src/host for details

Slide 41

Slide 41 text

make amalg
- Compiles LuaJIT as a single source file
- If you look at ljamalg.c, it just includes src/*.c
- The compiler can optimize more

Slide 42

Slide 42 text

Tests of LuaJIT
- There is a test suite for LuaJIT: https://github.com/LuaJIT/LuaJIT-test-cleanup
- Though I don’t know how to use it
- mruby has a built-in test suite, so it’s easier to test

Slide 43

Slide 43 text

Other JIT implementations of Lua
- raptorjit: A LuaJIT fork
- ravi: Lua 5.3 implementation with a GCC/LLVM JIT compiler. Supports optional static typing too.
- luajit-mm: A LuaJIT fork with 2GB memory support (original LuaJIT only supports 1GB)

Slide 44

Slide 44 text

Future of LuaJIT
- Clone Mike Pall #45
- Goodbye, Lua
- There is a plan for 3.0
- Lua 5.3 support is needed
- Limitation of 32-bit
- Feeling a little gloomy

Slide 45

Slide 45 text

Let’s implement Ruby on LuaJIT!

Slide 46

Slide 46 text

What am I doing?
- Trying to use LuaJIT as a JIT compiler backend for mruby
- Studying JIT compilers by reading the LuaJIT code
- In progress!

Slide 47

Slide 47 text

Known limitation
- Numeric types won’t be the same as Ruby’s
- The situation is the same as Opal’s, since Lua treats Float and Integer the same too.

Slide 48

Slide 48 text

Forms of mruby code
- Source code
- Abstract Syntax Tree
- Bytecode

Slide 49

Slide 49 text

Abstract Syntax Tree
- Tree representation of the parsed source code
- Each node has a node type and node-type-specific sub-nodes
- Symbols, integers, and strings can be embedded in the mruby AST

Slide 50

Slide 50 text

Steps to make LuaJIT an mruby backend
- Map basic data types of mruby to LuaJIT
- Remove the VM (src/vm.c)
- Replace the code generator (mrbgems/mruby-compiler/core/codegen.c)
  - Generating Lua source code is easier
  - For optimization, bytecode is better, though it needs knowledge of LuaJIT bytecode
- If possible, re-implement things with DynASM
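
The "generate Lua source code" option could look roughly like the sketch below. Everything in it is an assumption for illustration: the `AstNode` type is a toy, not the mruby AST, and a real generator would also have to mangle Ruby method names such as `+` or `empty?` that are not valid Lua identifiers.

```c
#include <stdio.h>
#include <string.h>

typedef enum { N_INT, N_LVAR, N_CALL } NodeType;   /* hypothetical node types */

typedef struct AstNode {
  NodeType type;
  long ival;                    /* N_INT */
  const char *name;             /* N_LVAR: variable name, N_CALL: method name */
  struct AstNode *recv, *arg;   /* N_CALL: receiver and a single argument */
} AstNode;

/* Appends s to the NUL-terminated string in buf, truncating at cap bytes. */
static void emit(char *buf, size_t cap, const char *s) {
  size_t len = strlen(buf);
  if (len + 1 < cap) snprintf(buf + len, cap - len, "%s", s);
}

/* Appends the Lua source for node n to buf. buf must start out as a
   valid (possibly empty) string. */
void gen_lua(const AstNode *n, char *buf, size_t cap) {
  char num[32];
  switch (n->type) {
  case N_INT:
    snprintf(num, sizeof num, "%ld", n->ival);
    emit(buf, cap, num);
    break;
  case N_LVAR:
    emit(buf, cap, n->name);
    break;
  case N_CALL:   /* Ruby recv.meth(arg) becomes Lua recv:meth(arg) */
    gen_lua(n->recv, buf, cap);
    emit(buf, cap, ":");
    emit(buf, cap, n->name);
    emit(buf, cap, "(");
    if (n->arg) gen_lua(n->arg, buf, cap);
    emit(buf, cap, ")");
    break;
  }
}
```

For a call node with receiver `a`, method `push`, and argument `1`, the buffer ends up holding `a:push(1)`, which LuaJIT can then parse like any other Lua script.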

Slide 51

Slide 51 text

Type mapping of LuaJIT and mruby
- LJ_TNIL: nil
- LJ_TFALSE: false
- LJ_TTRUE: true
- LJ_TSTR: Symbol
- LJ_TTHREAD: Fiber
- LJ_TPROTO: struct mrb_irep (internal bytecodes)
- LJ_TFUNC: Proc
- LJ_TUDATA: Internal of MRB_TT_DATA
- LJ_TNUMX: Numeric
- LJ_TTAB: The remaining types (Object, String, Array, Hash, Class, Module…)

Slide 52

Slide 52 text

Re-implementing language features in Lua
- Method resolution needs to be re-implemented in Lua
- Some other features need to be re-implemented in Lua
- Things written in Lua will be optimized by the JIT engine
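
For reference, the core of method resolution is a walk up the superclass chain. The sketch below is written in C only to match the other examples here (the plan in this talk is to write it in Lua), and it ignores modules, singleton classes, and method caches. All type and function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

typedef int (*method_fn)(int);

typedef struct { const char *name; method_fn fn; } MethodEntry;

typedef struct ClassSketch {
  const struct ClassSketch *super;   /* NULL at the root class */
  const MethodEntry *methods;
  size_t method_count;
} ClassSketch;

/* Looks in the class itself first, then in each ancestor. */
method_fn find_method(const ClassSketch *c, const char *name) {
  for (; c != NULL; c = c->super)
    for (size_t i = 0; i < c->method_count; i++)
      if (strcmp(c->methods[i].name, name) == 0) return c->methods[i].fn;
  return NULL;   /* a Ruby VM would fall back to method_missing here */
}

/* Sample classes: derived_class overrides "twice" and inherits "inc". */
static int base_inc(int x) { return x + 1; }
static int base_twice(int x) { return 2 * x; }
static int derived_twice(int x) { return 4 * x; }

static const MethodEntry base_methods[] = {
  {"inc", base_inc}, {"twice", base_twice}
};
static const MethodEntry derived_methods[] = { {"twice", derived_twice} };

const ClassSketch base_class = {NULL, base_methods, 2};
const ClassSketch derived_class = {&base_class, derived_methods, 1};
```

Because this lookup runs on every method call, it is exactly the kind of code that benefits from being visible to the JIT engine.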

Slide 53

Slide 53 text

First try
- Reimplement mruby APIs using the LuaJIT API
- Remove files that need to be replaced

Slide 54

Slide 54 text

Hard things
- Getting things to compile is very difficult
- I’m new to LuaJIT (I have read some of the code but never used it)
- Removing too many files made me get lost in mruby and LuaJIT

Slide 55

Slide 55 text

Compilation difficulty
- The internal structures are different
- The lj_* API is more similar to mruby’s but still different

Slide 56

Slide 56 text

Don’t use Lua API
- The Lua API’s stack operations aren’t for humans
- The APIs from lj_*.h are more useful

Slide 57

Slide 57 text

I’m new to LuaJIT
- Read lj_api.c when you get lost in Lua and LuaJIT
  - It has most of LuaJIT’s implementation of the Lua API
  - It implements the public API, so it helps in learning LuaJIT internals
- Reading lj_obj.h helped a lot
  - Defines most of the data structures of the LuaJIT VM
  - It’s my best friend in LuaJIT now
  - Type conversion functions and macros

Slide 58

Slide 58 text

Giving up
- I wanted to get to the code generator replacement, so I gave up on this version
- VCS is great!
- Though getting used to the LuaJIT API wasn’t a bad experience
- Moved to the next approach!

Slide 59

Slide 59 text

Second try
- Keep as much mruby code as possible
- Make the code compilable as soon as possible
- Don’t care about runtime errors this time

Slide 60

Slide 60 text

Reached code generator replacement!
- Getting many compilation errors
- Generating Lua source code from the mruby AST is fun (transpiler!)
- My progress stopped here…
- Sin-Choku-Dame-Desu (no progress)! orz

Slide 61

Slide 61 text

About mruby AST
- List-structured data
- Read parse.y!
- For historical reasons, CRuby’s compiler is more complex
- mruby’s compiler is cleaner
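
"List-structured" can be pictured with Lisp-style cons cells. The sketch below is an assumption about the general shape, not a copy of parse.y: the `Cons` type, the helper names, and the `SK_NODE_INT` tag are all made up for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct Cons { struct Cons *car, *cdr; } Cons;

enum { SK_NODE_INT = 1 };   /* hypothetical node type tag */

Cons *cons(Cons *car, Cons *cdr) {
  Cons *c = malloc(sizeof *c);
  if (c == NULL) abort();
  c->car = car;
  c->cdr = cdr;
  return c;
}

/* Small integers (node types, literals) are stored directly in the
   pointer slot instead of pointing at another cell. */
Cons *int_to_node(intptr_t n) { return (Cons *)n; }
intptr_t node_to_int(const Cons *p) { return (intptr_t)p; }

size_t list_length(const Cons *list) {
  size_t n = 0;
  for (; list != NULL; list = list->cdr) n++;
  return n;
}

/* Frees the cells of a flat list; car slots are not followed. */
void free_list(Cons *list) {
  while (list != NULL) {
    Cons *next = list->cdr;
    free(list);
    list = next;
  }
}
```

A node for the literal 42 is then the two-element list `(SK_NODE_INT 42)`, built as `cons(int_to_node(SK_NODE_INT), cons(int_to_node(42), NULL))`.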

Slide 62

Slide 62 text

Class implementation
- Lua has metatables, which are like JavaScript’s Proxy and prototype
- Operator overloading (a feature I love)
- Method dispatch customization

Slide 63

Slide 63 text

Things I can bring back to mruby (CRuby)
- Things should be placed locally
- Memory allocation frequency should be reduced
- There are things that should be allocated once
- Methods used by language features should be optimized (meta-methods)

Slide 64

Slide 64 text

Reading LuaJIT to learn about JIT compilers was FUN!

Slide 65

Slide 65 text

Conclusion
- LuaJIT is a great implementation but has limitations
- I can read LuaJIT forever!
- Re-implementing things is hard and takes time
- Reinvent the wheel! (If you have a reason: studying, hobby, …)
- mruby needs more optimization of its data structures

Slide 66

Slide 66 text

Future work
- JIT-generating FFI glue code is a good place to start
- Read LuaJIT more! It’s still interesting

Slide 67

Slide 67 text

Thank you!