LuaJIT as a Ruby backend

take_cheeze

June 02, 2018

Transcript

  1. Who am I?

    - Just an OSS mruby developer
    - Works in Fukuoka
    - Writes Ruby on Rails web applications at work
  2. Motivation of this talk

    - I want to study LuaJIT (and JIT compilers)
    - I wanted to come to RubyKaigi as a speaker (last year I gave an LT)
    - This talk was the one accepted out of the 3 CFP proposals I submitted. The others were:
      - Continuous integration for mruby and its gems (about the CI system I’ve built)
      - Bringing keyword arguments to mruby (about mruby/mruby#3629)
  3. Start a New Thing

    - Elm is a good place to start learning purely functional programming
    - Start reading LuaJIT!
  4. Today’s topic: LuaJIT

    - See: https://luajit.org/
    - An implementation of the Lua scripting language
    - Created by Mike Pall
    - Known as one of the fastest JIT compiler implementations for a dynamically typed language
    - Its VM is faster than the original Lua implementation
  5. BTW what is Lua?

    - A small programming language for embedding into applications
    - Born in Brazil!
    - Language features are similar to JavaScript
      - Dynamically typed
      - Integer and Float aren’t treated differently
      - Object-oriented features can be built with metatables (similar to prototypes)
    - Very lightweight runtime
  6. Difference from Ruby

    - No Array or List types
      - Uses the Table type with integer indices instead
      - Tables with integer indices are optimized
    - Array-like table indices start from 1
    - The String type is immutable (like Symbol)
    - The method call operator is `:` ( `call:method(true)` )
  7. How does LuaJIT relate to Ruby?

    - Ruby and Lua are both dynamically typed languages
    - Ruby is looking forward to having a JIT compiler implementation
  8. How does LuaJIT relate to mruby?

    - The Lua VM’s instruction set is register-based (since 5.0)
    - Lua’s use cases are similar to mruby’s (mruby is influenced by Lua)
    - I want more speed, to beat CRuby and make things better
  9. Basics about JIT compiler

    - With a JIT compiler, the VM also behaves like a profiler
    - When the VM finds code that should be JIT compiled, it:
      - Allocates executable memory
      - Compiles the non-native code (bytecode) to native code and writes it to the executable memory
      - Switches execution from bytecode to the native code
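The profiler-like behavior described on slide 9 can be sketched in C as a per-block execution counter. This is only an illustration under stated assumptions, not LuaJIT's actual hot-counting code: `CodeBlock`, `enter_block` and `HOT_THRESHOLD` are hypothetical names, and the "compile" step is reduced to setting a flag where a real VM would allocate executable memory and emit native code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HOT_THRESHOLD 3  /* hypothetical value, chosen only for illustration */

typedef struct {
    uint32_t exec_count;  /* how many times the interpreter has run this block */
    bool     compiled;    /* set once native code would be in place */
} CodeBlock;

/* Called each time the interpreter is about to run the block.
 * Returns true when this call should run native code instead of bytecode. */
static bool enter_block(CodeBlock *b) {
    assert(b != NULL);
    if (b->compiled)
        return true;
    if (++b->exec_count >= HOT_THRESHOLD) {
        /* A real VM would allocate executable memory and emit native code here. */
        b->compiled = true;
    }
    return false;  /* this call still runs in the interpreter */
}
```

With this threshold the first three calls are interpreted and the fourth is the first to take the native path, which is the "warmup" effect discussed on the next slide.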
  10. Warming up VM

    - JIT compiler people talk about “warmup”
    - In the initial state of the VM, no code is JIT compiled at all
    - Code gets compiled to native code after it has been executed
    - When code compilation finishes, the VM is warmed up
    - A JIT engine’s peak performance is measured on warmed-up VMs
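  11. Method vs Tracing JIT compiler

    - LuaJIT and some JS engines are tracing JIT compilers
    - A tracing JIT compiler records and compiles frequently executed paths (traces), typically loops
    - A method JIT compiler does JIT compilation per method
    - MJIT is method-based. Kokubun will talk about it today!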
  12. Example of JIT code generation (x86)

    #include <sys/mman.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef uint32_t (*func_t)(void);

    int main(void) {
      // allocate one page that is readable, writable and executable
      void *ptr = mmap(NULL, 4096, PROT_EXEC | PROT_READ | PROT_WRITE,
                       MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
      if (ptr == MAP_FAILED) { perror("mmap"); return 1; }
      uint8_t *data = (uint8_t*)ptr;
      // uint32_t func() { return 0x11; }
      data[0] = 0xB8; // mov EAX,0x11
      data[1] = 0x11; // EAX register is return value in x86 ABI
      data[2] = 0x00;
      data[3] = 0x00;
      data[4] = 0x00;
      data[5] = 0xC3; // ret
      data[6] = 0x00;
      printf("0x%0x\n", ((func_t)ptr)()); // call as C function
      return 0;
    }
  13. About the example

    - Running on wandbox: https://wandbox.org/permlink/QwGkCXhvojSGd06U
    - Used an online assembler to generate the code: https://defuse.ca/online-x86-assembler.htm
    - Just prints “0x11” with printf
    - Maybe the only working code I’ve written for this talk
  14. Can I do code generation in Ruby?

    - Yes! If you can access the mmap system call
    - Binaries can be generated with Array#pack
  15. JIT compilation doesn’t guarantee speed

    - A JIT compiler needs performance measurement
    - JIT compilation has many costs
      - CPU and memory to compile code to native form
      - The VM becomes a profiler, with additional cost
    - The JIT compiler and VM are optimized for it
    - Until warmup completes, it may be slower than running without JIT compilation
  16. C/C++ extensions may be slow

    - The overhead of converting values for C/C++ causes slowdown
    - Having all code JIT compiled lets the warmed-up VM optimize it
    - In V8, C++ code is slower because of GC object overheads
  17. JIT compiler + FFI

    - A JIT compiler with FFI support may beat C extensions:
      - When FFI Function Calls Beat Native C
      - DragonFFI
    - libffi is faster than just a VM, but it has overheads
    - Cooperation between the FFI module and the JIT compiler reduces the overhead
  18. “Isn’t LLVM good for JIT?”

    - LLVM is a good compiler infrastructure and generates good code
    - However, it’s designed for statically compiled languages
    - JavaScriptCore tried it, but now uses its own JIT engine
    - HHVM tried it too, but it ended up as an experimental project
  19. What kind of code is JIT compiled?

    - Code that is executed frequently
    - Long-running loops
    - Code that is forced to compile
  20. Why is LuaJIT fast?

    - Well designed
    - Its VM is fully implemented in assembly language
    - The compiler is very small
    - Very memory-efficient data structures
    - Pointers are limited to 32 bits
    - Can JIT compile FFI calls
    - NaN boxing
  21. NaN Boxing

    - Some NaN bit patterns can be used to store non-number values
    - There are articles from JavaScript engine developers
      - value representation in javascript implementations
      - NaN boxing (Japanese article)
    - Efficient in languages without an integer type
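As a rough illustration of the idea on slide 21, the sketch below stores either a double or a small tagged payload in one 64-bit word. The bit layout, the tag values and all function names are assumptions made for this example and do not match LuaJIT's real encoding. Every NaN produced by arithmetic is collapsed into one canonical pattern so that it cannot collide with a boxed value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t Value;

/* Hypothetical layout (not LuaJIT's real one):
 * bits 63..51 all set -> boxed non-number, bits 50..48 -> tag, bits 47..0 -> payload */
#define BOX_MASK      UINT64_C(0xFFF8000000000000)
#define PAYLOAD_MASK  UINT64_C(0x0000FFFFFFFFFFFF)
#define CANONICAL_NAN UINT64_C(0x7FF8000000000000)

enum { TAG_NIL = 1, TAG_FALSE = 2, TAG_TRUE = 3, TAG_REF = 4 };

static bool is_double(Value v) { return (v & BOX_MASK) != BOX_MASK; }

static Value box_double(double d) {
    Value v;
    if (d != d)                 /* true only for NaN */
        return CANONICAL_NAN;   /* collapse every NaN into one pattern */
    memcpy(&v, &d, sizeof v);   /* reinterpret the bits without aliasing problems */
    return v;
}

static double unbox_double(Value v) {
    double d;
    assert(is_double(v));
    memcpy(&d, &v, sizeof d);
    return d;
}

static Value box_tagged(unsigned tag, uint64_t payload) {
    assert(tag >= 1 && tag <= 7);     /* tag 0 is left unused in this sketch */
    assert(payload <= PAYLOAD_MASK);  /* payload must fit in 48 bits */
    return BOX_MASK | ((uint64_t)tag << 48) | payload;
}

static unsigned tag_of(Value v) {
    assert(!is_double(v));
    return (unsigned)((v >> 48) & 7);
}

static uint64_t payload_of(Value v) {
    assert(!is_double(v));
    return v & PAYLOAD_MASK;
}
```

Numbers need no unboxing work beyond a bit copy, which is why the scheme suits a language where every number is a double.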
  22. Tagged Pointer

    - Not every address is actually pointed to
      - Pointers are usually aligned because of the allocator
      - The 64-bit address space is too large to be used entirely
    - The `VALUE` type in CRuby is a tagged pointer
    - Good when non-floating-point number types are used more
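The tagged pointer idea on slide 22 can be sketched as follows: because aligned pointers always have a zero low bit, that bit can mark a small integer stored directly in the word. The names here (`Value`, `from_fixnum`, `to_pointer` and so on) are hypothetical and the sketch is in the spirit of CRuby's Fixnum tagging rather than a copy of its `VALUE` implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uintptr_t Value;

/* Low bit 1 -> the word holds a small integer; low bit 0 -> it holds a pointer. */
static bool is_fixnum(Value v) { return (v & 1u) != 0; }

static Value from_fixnum(intptr_t n) {
    /* one bit is spent on the tag, so the usable range is halved */
    assert(n >= INTPTR_MIN / 2 && n <= INTPTR_MAX / 2);
    return ((uintptr_t)n << 1) | 1u;
}

static intptr_t to_fixnum(Value v) {
    assert(is_fixnum(v));
    /* assumes an arithmetic right shift for negative values,
     * which common compilers provide */
    return (intptr_t)v >> 1;
}

static Value from_pointer(void *p) {
    assert(((uintptr_t)p & 1u) == 0);  /* relies on the pointer being aligned */
    return (uintptr_t)p;
}

static void *to_pointer(Value v) {
    assert(!is_fixnum(v));
    return (void *)v;
}
```

Integers cost no allocation in this scheme, while doubles would still need a heap object, which matches the slide's point about when tagging pays off.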
  23. Bytecode

    - See the LuaJIT wiki for the detailed format
    - The VM executes this
    - The representation is similar to Lua’s
    - Can be dumped to a file
  24. SSA IR

    - Static Single Assignment Intermediate Representation
    - See the LuaJIT wiki for details (again)
    - The form used for optimization
    - Many compiler implementations use SSA form (GCC, LLVM)
  25. Optimization in LuaJIT

    - See the LuaJIT wiki page
    - You can see a long list of the optimizations done in LuaJIT!
    - Bytecode-level optimization is well documented
    - SSA-IR-level optimization isn’t documented much, so you need to read the code!
      - It is well documented in comments
      - There is a page on the Allocation Sinking Optimization
  26. Bytecode Optimization

    - Copied from the wiki:
      - Constant Folding
      - Optimizing Composite Conditionals
      - Elimination of Conditionals
      - Elimination of Unneeded Results
      - Jump Folding
      - Template Tables
      - Instruction and Operand Specialization
    - Some of these are done in mruby as peephole optimizations too
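To make the first item of that list concrete, here is a small sketch of constant folding. It works on a hypothetical expression tree (`Node`, `fold`) rather than on LuaJIT's bytecode generator, so it shows the technique only and not how LuaJIT or mruby implement it.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { NODE_CONST, NODE_VAR, NODE_ADD, NODE_MUL } NodeKind;

typedef struct Node {
    NodeKind     kind;
    double       value;  /* meaningful only when kind == NODE_CONST */
    struct Node *lhs;    /* operands, used only by NODE_ADD and NODE_MUL */
    struct Node *rhs;
} Node;

/* Folds in place: an ADD or MUL whose operands are both constants
 * is rewritten into a single constant node. */
static void fold(Node *n) {
    assert(n != NULL);
    if (n->kind != NODE_ADD && n->kind != NODE_MUL)
        return;  /* constants and variables have nothing to fold */
    fold(n->lhs);
    fold(n->rhs);
    if (n->lhs->kind == NODE_CONST && n->rhs->kind == NODE_CONST) {
        n->value = (n->kind == NODE_ADD) ? n->lhs->value + n->rhs->value
                                         : n->lhs->value * n->rhs->value;
        n->kind = NODE_CONST;
        n->lhs = NULL;
        n->rhs = NULL;
    }
}
```

For an expression such as `x + 2 * 3`, the multiplication collapses to the constant 6 at compile time, while the addition stays because `x` is only known at run time.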
  27. Where the SSA optimizer code lives

    - src/lj_opt_*.c (short descriptions from the file description comments)
      - lj_opt_dce.c: Dead Code Elimination
      - lj_opt_fold.c: Fold Engine, Array Bounds Check Elimination, Common-Subexpression Elimination
      - lj_opt_loop.c: Loop Optimization
      - lj_opt_mem.c: Alias Analysis, Load/Store Forwarding, Dead Store Elimination
      - lj_opt_narrow.c: Narrowing of double to int32_t, stripping of overflow checks
      - lj_opt_sink.c: Allocation Sinking, Store Sinking
      - lj_opt_split.c: Split 64-bit IR instructions into 32-bit IR instructions (for soft-FP)
  28. Native code generator of JIT engine

    - Many JIT engines have their own native code generator
    - There are general-purpose code generators too
      - Xbyak is used in the mruby JIT by @miura1729
  29. DynASM

    - Named after “Dynamic Assembler”
    - > DynASM is a pre-processing assembler.
    - Allows assembly to be embedded inside C code
    - Written in Lua
    - MoarVM uses it
  30. DynASM example

    http://luajit.org/dynasm_examples.html

    if (ptr != NULL) {
      |  mov eax, foo+17
      |  mov edx, [eax+esi*2+0x20]
      |  add ebx, [ecx+bar(ptr, 9)]
    }
  31. Pointers in LuaJIT are 32-bit

    - mmap is limited to the 32-bit pointer range
    - GCRef is typed as uint32_t
      - The MRef type is uint32_t too
      - gcref() and mref() just cast to void*
  32. Building LuaJIT VM

    - The VM is implemented in DynASM: src/vm_*.dasc
      - x86/ARM/MIPS/PowerPC
      - Optimization is done in DynASM
    - You can’t find lj_vm_call in the source code (it’s the VM body!)
    - The symbol prefix “lj_” is added to vm_call, so vm_call is the VM body
    - Read the files under src/host for details
  33. make amalg

    - Compiles LuaJIT as a single source file
    - If you look at ljamalg.c, it just includes src/*.c
    - The compiler can optimize more
  34. Tests of LuaJIT

    - There is a test suite for LuaJIT: https://github.com/LuaJIT/LuaJIT-test-cleanup
    - Though I don’t know how to use it
    - mruby has a built-in test suite, so it’s easier to test
  35. Other JIT implementations of Lua

    - raptorjit: A LuaJIT fork
    - ravi: Lua 5.3 implementation with GCC/LLVM JIT compiler. Supports optional static typing too.
    - luajit-mm: A LuaJIT fork with 2GB memory support (original LuaJIT only supports 1GB)
  36. Future of LuaJIT

    - Clone Mike Pall #45
    - Goodbye, Lua
    - There is a plan for 3.0
    - Lua 5.3 support is needed
    - The 32-bit limitation
    - Feels a little gloomy
  37. What am I doing?

    - Trying to use LuaJIT as a JIT compiler backend for mruby
    - Studying JIT compilers by reading the LuaJIT code
    - In progress!
  38. Known limitation

    - Numeric types won’t be the same as in Ruby
    - The situation is the same as in Opal, since Lua also treats Float and Integer the same
  39. Abstract Syntax Tree

    - Tree representation of parsed source code
    - Each node has a node type and node-type-specific sub-nodes
    - The mruby AST can embed symbols, integers, and strings
  40. Steps to make LuaJIT a mruby backend

    - Map the basic data types of mruby to LuaJIT
    - Remove the VM (src/vm.c)
    - Replace the code generator (mrbgems/mruby-compiler/core/codegen.c)
      - Generating Lua source code is easier
      - For optimization, bytecode is better, though it needs knowledge of LuaJIT bytecode
    - If possible, re-implement things with DynASM
  41. Type mapping of LuaJIT and mruby

    - LJ_TNIL: nil
    - LJ_TFALSE: false
    - LJ_TTRUE: true
    - LJ_TSTR: Symbol
    - LJ_TTHREAD: Fiber
    - LJ_TPROTO: struct mrb_irep (internal bytecodes)
    - LJ_TFUNC: Proc
    - LJ_TUDATA: Internal of MRB_TT_DATA
    - LJ_TNUMX: Numeric
    - LJ_TTAB: The remaining types (Object, String, Array, Hash, Class, Module…)
  42. Re-implementing language features in Lua

    - Method resolution needs to be re-implemented in Lua
    - Some other features need to be re-implemented in Lua
    - Things written in Lua will be optimized by the JIT engine
  43. Hard things

    - Getting things to compile is very difficult
    - I’m new to LuaJIT (I have read some code but never used it)
    - Removing too many files left me lost in mruby and LuaJIT
  44. Don’t use Lua API

    - The Lua API’s stack operations aren’t for humans
    - The APIs from lj_*.h are more useful
  45. I’m new to LuaJIT

    - Read lj_api.c when you get lost in Lua and LuaJIT
      - It has most of LuaJIT’s implementation of the Lua API
      - It is the public API, so it helps in learning LuaJIT internals
    - Reading lj_obj.h helped a lot
      - Defines most data structures of the LuaJIT VM
      - It’s my best friend in LuaJIT now
      - Type conversion functions and macros
  46. Giving up

    - I wanted to get to the code generator replacement, so I gave up on this version
    - VCS is great!
    - Still, getting used to the LuaJIT API wasn’t a bad experience
    - Moved on to the next approach!
  47. Second try

    - Keep as much mruby code as possible
    - Make the code compilable as soon as possible
    - Don’t care about runtime errors this time
  48. Reached code generator replacement!

    - Got many compilation errors
    - Generating Lua source code from the mruby AST is fun (a transpiler!)
    - My progress stopped here… Sin-Choku-Dame-Desu! (“No progress!”) orz
  49. About mruby AST

    - List-structured data
    - Read parse.y!
    - For historical reasons, CRuby’s compiler is more complex
    - mruby’s compiler is cleaner
  50. Class implementation

    - Lua has metatables, which are like JavaScript’s Proxy and prototype
    - Operator overloading (a feature I love)
    - Method dispatch customization
  51. Things I can bring back to mruby (CRuby)

    - Things should be placed locally
    - Memory allocation frequency should be reduced
    - There are things that should be allocated only once
    - Methods used by language features should be optimized (meta-methods)
  52. Conclusion

    - LuaJIT is a great implementation but has limitations
    - I can read LuaJIT forever!
    - Re-implementing things is hard and takes time
    - Reinvent the wheel! (If you have a reason: studying, hobby, …)
    - mruby needs more optimization of its data structures
  53. Future work

    - JIT-generating FFI glue code is a good place to start
    - Read LuaJIT more! It’s still interesting