LuaJIT as a Ruby backend

take_cheeze

June 02, 2018

Transcript

  1. LuaJIT as a Ruby backend.
    Takeshi Watanabe (@take-cheeze, Fusic)
    2018/06/02

  2. Who am I?
    Just an OSS mruby developer
    Works in Fukuoka
    Writes Ruby on Rails web applications at work

  3. About the talk:
    First half: Mostly about LuaJIT
    Second half: How to use LuaJIT as an mruby backend

  4. Motivation of this talk
    I want to study LuaJIT (and JIT compilers)
    I wanted to come to RubyKaigi as a speaker (last year I gave an LT)
    This was the one accepted out of the 3 CFP proposals I submitted. The others were:
    - Continuous integration for mruby and its gems (about the CI system I’ve built)
    - Bringing keyword arguments to mruby (about mruby/mruby#3629)

  5. Start a New Thing
    Elm is a good place to start with purely functional programming languages
    Start reading LuaJIT!

  6. Let’s read and survey LuaJIT
    (to use as a Ruby backend)

  7. Which versions am I going to talk about?
    mruby 1.4.1~master
    LuaJIT 2.0.1~2.1(master)
    CRuby 2.6

  8. Today’s topic: LuaJIT
    See: https://luajit.org/
    An implementation of the Lua scripting language.
    Created by Mike Pall.
    Known as one of the fastest JIT compiler implementations for a dynamically typed language.
    Its VM is faster than the original Lua implementation

  9. BTW what is Lua?
    A small programming language for embedding in applications
    Born in Brazil!
    Language features are similar to JavaScript
    - Dynamically typed
    - Integer and Float aren’t treated differently (in Lua 5.1, the version LuaJIT implements)
    - Object-oriented features can be done with metatables (similar to prototypes)
    Very lightweight runtime

  10. Difference from Ruby
    No Array or List types
    - Uses the Table type with integer indices instead
    - Tables with integer indices are optimized
    Array-like table indices start from 1
    String type is immutable(like Symbol)
    Method call operator is `:` ( `call:method(true)` )

  11. How does LuaJIT relate to Ruby ?
    Ruby and Lua are both dynamically typed languages.
    Ruby is looking forward to having a JIT compiler implementation.

  12. How does LuaJIT relate to mruby?
    Lua VM’s instruction set is register-based (since 5.0)
    Lua’s use cases are similar to mruby’s (mruby is influenced by Lua)
    I want more speed, to beat CRuby and make things better

  13. Basics of JIT compilers
    With a JIT compiler, the VM behaves like a profiler
    When the VM finds code that should be JIT compiled, it:
    - Allocates executable memory
    - Compiles the non-native code to native code and writes it to the executable memory
    - Switches execution from bytecode to native code
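
    The "VM behaves like a profiler" part can be sketched as a counter that the interpreter bumps on every loop iteration. This is only a minimal illustration: `LoopProfile`, `profile_loop` and `HOT_THRESHOLD` are hypothetical names and values, not LuaJIT or mruby APIs, and the actual compilation step is left as a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical threshold: how many executions make a loop "hot". */
#define HOT_THRESHOLD 50

typedef struct {
    uint32_t exec_count; /* profiling counter kept by the interpreter */
    bool     compiled;   /* true once native code would be available  */
} LoopProfile;

/* Called by the interpreter each time the loop is entered.
 * Returns true when execution should switch to native code. */
static bool profile_loop(LoopProfile *p) {
    assert(p != NULL);
    if (p->compiled) {
        return true;
    }
    if (++p->exec_count >= HOT_THRESHOLD) {
        /* A real JIT would allocate executable memory here and
         * emit native code into it before flipping the flag. */
        p->compiled = true;
    }
    return p->compiled;
}
```

    Until the counter reaches the threshold the interpreter keeps running bytecode; this is the extra bookkeeping cost that a JIT-enabled VM pays.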

  14. Warming up VM
    JIT compiler people talk about “warmup”
    In the initial state of the VM, code isn’t JIT compiled at all
    Code gets compiled to native code after it has been executed
    When code compilation finishes, the VM is warmed up
    A JIT engine’s peak performance is measured on a warmed-up VM

  15. Method vs Tracing JIT compiler
    LuaJIT and some JS engines are tracing JIT compilers
    A tracing JIT compiler compiles frequently executed paths (traces), typically loops
    A method JIT compiler does JIT compilation per method
    MJIT is method-based. Kokubun will talk about it today!

  16. Example of JIT code generation (x86)
    #include <stdint.h>   // uint8_t, uint32_t
    #include <stdio.h>    // printf
    #include <sys/mman.h> // mmap
    typedef uint32_t (*func_t)(void);
    int main(void) {
      // Ask the OS for one page that is readable, writable and executable
      void *ptr = mmap(NULL, 4096, PROT_EXEC | PROT_READ | PROT_WRITE,
                       MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
      if (ptr == MAP_FAILED) return 1;
      uint8_t *data = (uint8_t*)ptr; // uint32_t func() { return 0x11; }
      data[0] = 0xB8; // mov EAX,0x11
      data[1] = 0x11; // EAX register is return value in x86 ABI
      data[2] = 0x00;
      data[3] = 0x00;
      data[4] = 0x00;
      data[5] = 0xC3; // ret
      data[6] = 0x00;
      printf("0x%x\n", ((func_t)ptr)()); // call as C function
      return 0;
    }

  17. About the example
    Runs on Wandbox:
    https://wandbox.org/permlink/QwGkCXhvojSGd06U
    Used an online assembler to generate the code:
    https://defuse.ca/online-x86-assembler.htm
    Just prints “0x11” with printf
    Maybe the only code I’ve written for this talk that actually works

  18. Can I do code generation in Ruby?
    Yes!
    If you can access the mmap system call
    Binaries can be generated with Array#pack

  19. JIT compilation doesn’t guarantee speed.
    A JIT compiler needs performance measurement
    JIT compilation has many costs
    - CPU and memory to compile code to native form
    - The VM becomes a profiler, with additional cost
    The JIT compiler and the VM are optimized for it
    Before warmup completes, it may be slower than running without JIT compilation

  20. C/C++ extensions may be slow
    The overhead of converting values for C/C++ causes slowdown
    Having all code JIT compiled keeps the warmed-up VM optimized
    In V8, C++ code is slower because of GC object overheads

  21. JIT compiler + FFI
    JIT compiler with FFI support may beat C extensions:
    - When FFI Function Calls Beat Native C
    - DragonFFI
    libffi is faster than a plain VM, but there are overheads
    Cooperation between the FFI module and the JIT compiler reduces the overhead

  22. “Isn’t LLVM good for JIT?”
    LLVM is a good compiler infrastructure and generates good code.
    However, it’s designed for statically compiled languages.
    JavaScriptCore tried it, but now uses its own JIT engine.
    HHVM tried it too, but it ended up as an experimental project.

  23. What kind of code is JIT compiled?
    Code that is executed frequently
    Long-running loops
    Code that is forced to compile

  24. Why is LuaJIT fast?
    Well designed
    Its VM is fully implemented in assembly language.
    The compiler is very small
    Very memory-efficient structures
    Pointers are limited to 32-bit length
    Can JIT compile FFI calls
    NaN boxing

  25. NaN Boxing
    Some NaN bit patterns can be used to store non-number values
    There are articles from JavaScript engine developers
    - value representation in javascript implementations
    - NaN boxing (Japanese article)
    Efficient in a language without an integer type
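
    A minimal sketch of the idea, assuming 64-bit IEEE 754 doubles. The tag value `TAG_INT` and the helper names are made up for illustration; this is not LuaJIT's actual `TValue` layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Every value is 64 bits wide: either a real double, or a NaN bit
 * pattern whose upper 16 bits carry a type tag (hypothetical layout). */
typedef uint64_t Value;

#define TAG_MASK 0xFFFF000000000000ULL
#define TAG_INT  0xFFF9000000000000ULL /* quiet NaN pattern reserved for int32 */

static Value box_double(double d) {
    /* A real implementation would canonicalize NaNs first, so that an
     * arithmetic NaN can never collide with a tag pattern. */
    Value v;
    memcpy(&v, &d, sizeof v);
    return v;
}

/* Anything below the first tag pattern is an ordinary double. */
static bool is_double(Value v) { return (v & TAG_MASK) < TAG_INT; }

static double unbox_double(Value v) {
    double d;
    assert(is_double(v));
    memcpy(&d, &v, sizeof d);
    return d;
}

/* Store the 32-bit payload in the low bits of the NaN. */
static Value box_int(int32_t i) { return TAG_INT | (uint32_t)i; }

static bool is_int(Value v) { return (v & TAG_MASK) == TAG_INT; }

static int32_t unbox_int(Value v) {
    assert(is_int(v));
    return (int32_t)(uint32_t)v;
}
```

    Doubles need no unboxing work at all, which is why this representation suits a language whose only number type is a float.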

  26. Tagged Pointer
    Not every address is actually pointed to
    - Pointers are usually aligned because of the allocator
    - The 64-bit address space is too large to use entirely
    `VALUE` type is the tagged pointer in CRuby
    Good when non-floating-point number types are used more
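
    A minimal sketch of low-bit tagging, in the same spirit as CRuby's Fixnum encoding (`(n << 1) | 1`). The type and function names below are hypothetical; the sketch assumes pointers are at least 2-byte aligned and that right-shifting a negative signed integer is an arithmetic shift, which holds on mainstream compilers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One machine word holds either an aligned pointer (low bit 0)
 * or a small integer (low bit 1). */
typedef uintptr_t Value;

static Value int_to_value(intptr_t n) {
    return ((uintptr_t)n << 1) | 1u;
}

static bool value_is_int(Value v) { return (v & 1u) != 0; }

static intptr_t value_to_int(Value v) {
    assert(value_is_int(v));
    return (intptr_t)v >> 1; /* arithmetic shift restores the sign */
}

static Value ptr_to_value(void *p) {
    /* Alignment guarantees the low bit is free to use as a tag. */
    assert(((uintptr_t)p & 1u) == 0);
    return (uintptr_t)p;
}

static void *value_to_ptr(Value v) {
    assert(!value_is_int(v));
    return (void *)v;
}
```

    Integers cost one bit of range but need no allocation, while floats must live behind a pointer; that is the opposite trade-off from NaN boxing.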

  27. LuaJIT code levels
    Source code
    Bytecode
    SSA-IR
    Native code

  28. Source code
    Just a Lua script
    Most human-readable form
    The parser (lj_parse.c) parses this

  29. Bytecode
    See the LuaJIT wiki for the detailed format
    The VM executes this
    Representation is similar to Lua’s
    Can be dumped to a file

  30. SSA IR
    Static Single Assignment Intermediate Representation
    See the LuaJIT wiki for details (again)
    Form used in optimization
    Many compiler implementations use SSA form (GCC, LLVM)

  31. Native Code
    The form the CPU can execute directly
    x86/ARM/MIPS/PowerPC are supported

  32. Optimization in LuaJIT
    See the LuaJIT wiki page
    You can see a long list of optimizations done in LuaJIT!
    Bytecode-level optimization is well documented
    SSA-IR-level optimization isn’t documented much, so you need to read the code!
    - It is well documented in comments
    - There is a page on Allocation Sinking Optimization

  33. Bytecode Optimization
    Copied from the wiki:
    - Constant Folding
    - Optimizing Composite Conditionals
    - Elimination of Conditionals
    - Elimination of Unneeded Results
    - Jump Folding
    - Template Tables
    - Instruction and Operand Specialization
    Some are done in mruby as peephole optimizations too
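
    As an illustration of the first item, constant folding can be sketched on a toy expression tree. The `Node` type and `fold` function are hypothetical and much simpler than what LuaJIT or mruby do while emitting bytecode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { NODE_CONST, NODE_VAR, NODE_ADD, NODE_MUL } NodeKind;

typedef struct Node {
    NodeKind     kind;
    double       value; /* used when kind == NODE_CONST      */
    struct Node *lhs;   /* operands of NODE_ADD and NODE_MUL */
    struct Node *rhs;
} Node;

/* Folds constant subtrees in place.
 * Returns true if the node is a constant after folding. */
static bool fold(Node *n) {
    assert(n != NULL);
    if (n->kind == NODE_CONST) return true;
    if (n->kind == NODE_VAR) return false;

    assert(n->lhs != NULL && n->rhs != NULL);
    /* Fold both sides first so partially constant trees still shrink. */
    bool lhs_const = fold(n->lhs);
    bool rhs_const = fold(n->rhs);
    if (!lhs_const || !rhs_const) return false;

    n->value = (n->kind == NODE_ADD) ? n->lhs->value + n->rhs->value
                                     : n->lhs->value * n->rhs->value;
    n->kind = NODE_CONST;
    n->lhs = n->rhs = NULL;
    return true;
}
```

    With this, `(2 + 3) * 4` collapses into a single constant node at compile time, while `(2 + 3) + x` keeps its addition but has the left operand folded.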

  34. Where the SSA optimizer code lives
    src/lj_opt_*.c (short descriptions from the file header comments)
    - lj_opt_dce.c: Dead Code Elimination
    - lj_opt_fold.c: Fold Engine, Array Bounds Check Elimination, Common-Subexpression Elimination
    - lj_opt_loop.c: Loop Optimization
    - lj_opt_mem.c: Alias Analysis, Load/Store Forwarding, Dead Store Elimination
    - lj_opt_narrow.c: Narrowing of double to int32_t, stripping of overflow checks
    - lj_opt_sink.c: Allocation Sinking, Store Sinking
    - lj_opt_split.c: Split 64-bit IR instructions into 32-bit IR instructions (for soft-FP)

  35. Assembler Backend Optimization
    Read lj_asm.c.

  36. Native code generators of JIT engines
    Many JIT engines have their own native code generator
    There are general-purpose code generators too
    - Xbyak is used in mruby JIT by @miura1729

  37. DynASM
    The name comes from Dynamic Assembler
    > DynASM is a pre-processing assembler.
    Allows assembly to be embedded in C code
    Written in Lua
    MoarVM uses it

  38. DynASM example
    http://luajit.org/dynasm_examples.html
    if (ptr != NULL) {
    | mov eax, foo+17
    | mov edx, [eax+esi*2+0x20]
    | add ebx, [ecx+bar(ptr, 9)]
    }

  39. Pointers in LuaJIT are 32-bit
    mmap is limited to the 32-bit pointer range
    GCRef is typed as uint32_t
    - MRef is uint32_t too
    - gcref() and mref() just cast to void*
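
    The memory saving from 32-bit references can be sketched as follows. Note the assumption: this toy version stores an offset into a fixed arena so that it also works on a 64-bit host, whereas LuaJIT (as described above) keeps its allocations inside the 32-bit address range and casts the stored value directly. `Ref32`, `arena` and the helpers are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical heap: every object lives inside this arena. */
static unsigned char arena[1024];

/* A reference is 4 bytes instead of sizeof(void *). */
typedef struct { uint32_t off; } Ref32;

static Ref32 ref_from_ptr(void *p) {
    unsigned char *bytes = (unsigned char *)p;
    /* Only pointers into the arena can be represented. */
    assert(bytes >= arena && bytes < arena + sizeof arena);
    Ref32 r = { (uint32_t)(bytes - arena) };
    return r;
}

static void *ref_to_ptr(Ref32 r) {
    assert(r.off < sizeof arena);
    return arena + r.off;
}
```

    Halving the size of every reference shrinks tables and objects, which is part of the memory efficiency mentioned earlier, at the price of a hard limit on addressable memory.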

  40. Building the LuaJIT VM
    The VM is implemented in DynASM: src/vm_*.dasc
    - x86/ARM/MIPS/PowerPC
    - Optimization is done in DynASM
    You can’t find lj_vm_call in the source code (it’s the VM body!)
    The symbol prefix “lj_” is added to vm_call, so vm_call is the VM body
    Read the files under src/host for details

  41. make amalg
    Compiles LuaJIT as a single source file
    If you look at ljamalg.c, it just includes src/*.c
    The compiler can optimize more

  42. Tests of LuaJIT
    There is a test suite for LuaJIT:
    https://github.com/LuaJIT/LuaJIT-test-cleanup
    Though I don’t know how to use it
    mruby has a built-in test suite, so it’s easier to test

  43. Other JIT implementations of Lua
    raptorjit: A LuaJIT fork
    ravi: Lua 5.3 implementation with GCC/LLVM JIT compiler. Supports optional static typing too.
    luajit-mm: A LuaJIT fork with 2GB memory support (original LuaJIT only supports 1GB)

  44. Future of LuaJIT
    Clone Mike Pall #45
    Goodbye, Lua
    There is a plan for 3.0
    Lua 5.3 support is needed
    Limitation of 32-bit
    Feeling a little gloomy

  45. Let’s implement Ruby on LuaJIT!

  46. What am I doing?
    Trying to use LuaJIT as a JIT compiler backend for mruby
    Studying JIT compilers by reading the LuaJIT code
    In progress!

  47. Known limitation
    Numeric types won’t be the same as Ruby’s
    The situation is the same as Opal’s, since Lua treats Float and Integer the same too.

  48. Forms of mruby code
    Source Code
    Abstract Syntax Tree
    Bytecode

  49. Abstract Syntax Tree
    Tree representation of parsed source code
    Has a node type and node-type-specific sub-nodes
    Symbols, integers, and strings can be embedded in the mruby AST

  50. Steps to make LuaJIT an mruby backend
    Map basic data types of mruby to LuaJIT
    Remove the VM (src/vm.c)
    Replace the code generator (mrbgems/mruby-compiler/core/codegen.c)
    - Generating Lua source code is easier
    - Bytecode is better for optimization, though it needs knowledge of LuaJIT bytecode
    If possible, re-implement things with DynASM

  51. Type mapping of LuaJIT and mruby
    - LJ_TNIL: nil
    - LJ_TFALSE: false
    - LJ_TTRUE: true
    - LJ_TSTR: Symbol
    - LJ_TTHREAD: Fiber
    - LJ_TPROTO: struct mrb_irep (internal bytecodes)
    - LJ_TFUNC: Proc
    - LJ_TUDATA: Internal of MRB_TT_DATA
    - LJ_TNUMX: Numeric
    - LJ_TTAB: Remaining types (Object, String, Array, Hash, Class, Module…)

  52. Re-implementing language features in Lua
    Method resolution needs to be re-implemented in Lua
    Some other features need to be re-implemented in Lua
    Things written in Lua will be optimized by the JIT engine
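
    For reference, the algorithm that has to be carried over is roughly "walk the superclass chain until the method is found". The sketch below shows it in C with hypothetical `Class` and `MethodEntry` structures (not mruby's real `RClass`); in the approach described here the same loop would be written in Lua so the JIT engine can optimize it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*Method)(int);

typedef struct {
    const char *name;
    Method      fn;
} MethodEntry;

typedef struct Class {
    const struct Class *super;   /* NULL at the top of the hierarchy */
    const MethodEntry  *methods; /* this class's own method table    */
    size_t              n_methods;
} Class;

/* Looks the method up in the class itself, then in each ancestor. */
static Method find_method(const Class *c, const char *name) {
    assert(name != NULL);
    for (; c != NULL; c = c->super) {
        for (size_t i = 0; i < c->n_methods; i++) {
            if (strcmp(c->methods[i].name, name) == 0) {
                return c->methods[i].fn;
            }
        }
    }
    return NULL; /* Ruby would fall back to method_missing here */
}

/* Demo methods used only to exercise the lookup. */
static int twice(int x)    { return x * 2; }
static int negate(int x)   { return -x; }
static int identity(int x) { return x; }
```

    A subclass that redefines a method shadows the ancestor's entry simply because its table is searched first.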

  53. First try
    Reimplement mruby APIs using the LuaJIT API
    Remove files that need to be replaced

  54. Hard things
    Getting things to compile is very difficult
    I’m new to LuaJIT (I have read some code but never used it)
    Removing too many files got me lost in mruby and LuaJIT

  55. Compilation difficulty
    Internal structures are different
    The lj_* API is closer to mruby’s but still different

  56. Don’t use the Lua API
    The Lua API’s stack operations aren’t for humans
    APIs from lj_*.h are more useful

  57. I’m new to LuaJIT
    Read lj_api.c when you get lost in Lua and LuaJIT
    - It has most of LuaJIT’s implementation of the Lua API
    - It is the public API, so it helps in learning LuaJIT internals
    Reading lj_obj.h helped a lot
    - Defines most data structures of the LuaJIT VM
    - It’s my best friend in LuaJIT now
    - Type conversion functions and macros

  58. Giving up
    I wanted to get to the code generator replacement, so I gave up on this version
    VCS is great!
    Though getting used to the LuaJIT API wasn’t a bad experience
    Moved on to the next approach!

  59. Second try
    Keep as much mruby code as possible
    Make the code compilable as soon as possible
    Don’t care about runtime errors this time

  60. Reached code generator replacement!
    Got many compilation errors
    Generating Lua source code from the mruby AST is fun (transpiler!)
    My progress stopped here…
    Sin-Choku-Dame-Desu! (“No progress!”) orz
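
    The transpiler idea can be sketched with a toy AST. The `Ast` node type, `Out` buffer and `emit_lua` below are hypothetical and far simpler than mruby's list-structured AST; note also that Ruby's `x + 1` is a method call, so emitting Lua's `+` directly is a simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef enum { AST_INT, AST_LVAR, AST_ADD } AstKind;

typedef struct Ast {
    AstKind           kind;
    int               value; /* AST_INT  */
    const char       *name;  /* AST_LVAR */
    const struct Ast *lhs;   /* AST_ADD  */
    const struct Ast *rhs;
} Ast;

/* Fixed-size output buffer for the generated Lua source. */
typedef struct {
    char   text[256];
    size_t len;
} Out;

/* Appends s, truncating silently if the buffer is full. */
static void out_puts(Out *o, const char *s) {
    size_t n = strlen(s);
    size_t room = sizeof o->text - 1 - o->len;
    if (n > room) n = room;
    memcpy(o->text + o->len, s, n);
    o->len += n;
    o->text[o->len] = '\0';
}

/* Walks the tree and writes the equivalent Lua expression. */
static void emit_lua(const Ast *n, Out *o) {
    char num[32];
    assert(n != NULL && o != NULL);
    switch (n->kind) {
    case AST_INT:
        snprintf(num, sizeof num, "%d", n->value);
        out_puts(o, num);
        break;
    case AST_LVAR:
        out_puts(o, n->name);
        break;
    case AST_ADD:
        out_puts(o, "(");
        emit_lua(n->lhs, o);
        out_puts(o, " + ");
        emit_lua(n->rhs, o);
        out_puts(o, ")");
        break;
    }
}
```

    Emitting source text like this is easier to debug than emitting LuaJIT bytecode, since the output can be read and run as an ordinary Lua script.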

  61. About mruby AST
    List-structured data
    Read parse.y!
    For historical reasons, CRuby’s compiler is more complex
    mruby’s compiler is cleaner

  62. Class implementation
    Lua has metatables, which are like JavaScript’s Proxy and prototype
    Operator overloading (a feature I love)
    Method dispatch customization

  63. Things I can bring back to mruby (CRuby)
    Data should be kept local
    Memory allocation frequency should be reduced
    There are things that should be allocated only once
    Methods used by language features should be optimized (meta-methods)

  64. Reading LuaJIT to learn about JIT compilers was FUN!

  65. Conclusion
    LuaJIT is a great implementation but has limitations
    I can read LuaJIT forever!
    Re-implementing things is hard and takes time
    Reinvent the wheel! (If you have a reason: studying, hobby, …)
    mruby needs more optimization of data structures

  66. Future work
    JIT-generating FFI glue code is a good place to start
    Read LuaJIT more! It’s still interesting

  67. Thank you!
