TruffleRuby: Wrapping up compatibility for C extensions

We think it is crucial that any alternative Ruby implementation aiming to be fully compatible with MRI runs the C extensions. TruffleRuby's compatibility was recently significantly improved, with much better support that almost completely removes the need to patch C extensions.

In this talk you will hear and see how the old approach to C extensions worked and where and why it fell short, how the new approach works, and how much closer it brings TruffleRuby to its goal of being a drop-in replacement for MRI.

We have been interpreting C extensions (and JIT-compiling them together with Ruby code) for a while. However, we used to pass Ruby objects directly into the C code, which led to problems. We now have a new technique that no longer requires patches in almost all cases: the objects are wrapped for greater compatibility, and a virtual GC marking phase avoids memory leaks.

Petr Chalupa

April 20, 2019

Transcript

  2. Copyright © 2019, Oracle and/or its affiliates. All rights reserved. |
    TruffleRuby:
    Wrapping up compatibility for C extensions
    Petr Chalupa
    Principal Member of Technical Staff
    Oracle Labs
    April 20, 2019

  3.
    Safe Harbor Statement
    The following is intended to provide some insight into a line of research in Oracle Labs. It
    is intended for information purposes only, and may not be incorporated into any contract.
    It is not a commitment to deliver any material, code, or functionality, and should not be
    relied upon in making purchasing decisions. Oracle reserves the right to alter its
    development plans and practices at any time, and the development, release, and timing
    of any features or functionality described in connection with any Oracle product or
    service remains at the sole discretion of Oracle. Any views expressed in this presentation
    are my own and do not necessarily reflect the views of Oracle.

  4.
    Program Agenda
    1. Technologies
    2. Execution of the C extensions
    3. Old approach
    4. New approach
    5. Conclusion

  5.
    Technologies

  6.
    TruffleRuby
    • Alternative implementation of Ruby
    • A drop-in replacement for the CRuby implementation
    – C extensions support
    – Good startup time for development
    • Just-in-time compilation
    • Generally faster than any other implementation
    – If not, please file a bug
    [Diagram: layers TruffleRuby, Ruby]
  7.
    Truffle - Language Implementation Framework
    • Specializing (self-modifying) Abstract Syntax Tree interpreter
    • Simpler language implementation
    • Polyglot protocol
    – Languages can pass values to each other
    – No usual slow language barrier
    • Instrumentation
    – Multi-language debugger
    – Profilers
    [Diagram: layers Truffle, TruffleRuby, Ruby]

  8.
    Truffle - Language Implementation Framework

  9.
    Graal Compiler
    • Dynamic compiler written in Java
    • In combination with Truffle
    – Inlining
    – Splitting, method cloning
    – Partial Evaluation
    • All the optimizations are done in Truffle and Graal rather than the language implementations
    – Shared
    – Optimized together
    [Diagram: layers Graal, Truffle, TruffleRuby, Ruby]

  10.
    Java VM
    • Executes compiled methods
    – Provided by Graal
    • Garbage collector
    • Runs Java
    [Diagram: layers Java VM, Graal, Truffle, TruffleRuby, Ruby]

  11.
    GraalVM
    • Distribution of
    – Java VM
    – Graal
    – Truffle
    – Languages you can run
    • Java, Kotlin, Scala, ...
    • Ruby, JS, R, Python, ...
    • C, C++, Fortran, ...
    [Diagram: layers GraalVM, Java VM, Graal, Truffle, TruffleRuby, Other languages ..., Ruby]

  12.
    Sulong
    • LLVM bitcode runtime
    – Technically an interpreter with JIT
    – Any language transformable to LLVM bitcode can be executed
    – E.g. C/C++ and Fortran
    • TruffleRuby executes Ruby code
    • Sulong executes C extensions
    • Both are Truffle languages optimized together by Graal
    [Diagram: layers GraalVM, Java VM, Graal, Truffle, TruffleRuby, Sulong, C, Ruby]

  13.
    Substrate VM
    • Ahead-of-time compilation of Java applications
    • TruffleRuby, Sulong, Truffle, and Graal are written in Java
    • Executable Ruby binary is produced with fast startup
    – No slow startup limitation for day-to-day development
    [Diagram: layers GraalVM, Graal, Truffle, TruffleRuby, Sulong, C, Java VM, Substrate VM, Ruby]

  14.
    Startup time
    Implementation       Time    Memory MB
    TruffleRuby native   0.025   65
    CRuby 2.6.2          0.048   14
    Rubinius 3.107       0.150   78
    JRuby 9.2.7.0        1.357   160
    TruffleRuby JVM      1.787   456
    Of ruby -e "p puts 'Hello world'"

  15.
    Execution of the C extensions

  16.
    Execution of the C extensions
    • Sulong is a Truffle language
    – Interoperability with other languages
    – VALUEs in the C extension code can be Ruby objects
    • Managed vs. unmanaged memory
    – Managed (garbage-collected Ruby) objects cannot be put into unmanaged (native) memory
    – We do tricks to store Ruby objects into native memory (e.g., arrays or structs in C)
    • Optimized together
    – In Truffle all languages use the same Intermediate Representation
    • Polyglot protocol

  17.
    Polyglot protocol
    • An API allowing languages to talk to foreign values without conversion
    – hasMembers, readMember, writeMember, ...
    • Example from C: a_ruby_object->member
    – isPointer, asPointer, ...
    • Example from C: a_struct->member = ruby_object
    – ...
    • Part of Truffle
    – Implemented with specializing nodes
    • If C reads from a Ruby object, Ruby provides the nodes defining the Ruby read
    – JITed

  18.
    Understanding C extension evaluation
    static void gzfile_reader_rewind(struct gzfile *gz) {
        long n;
        n = gz->z.stream.total_in;
        if (!NIL_P(gz->z.input)) {
            n += RSTRING_LEN(gz->z.input);
        }
        rb_funcall(gz->io, id_seek, 2, INT2NUM(-n), INT2FIX(1));
        gzfile_reset(gz);
    }
    • NIL_P calls nil? on a Ruby object read from a nested struct.
    • Get a length as C long of a String stored in a nested struct.
    • Call a method on a Ruby object stored in a struct with arguments.

  19.
    Old approach

  20.
    Storing handles instead of managed objects
    • Managed objects
    – Are managed by the VM
    – Can be moved
    – Can be garbage collected
    • A struct in native memory cannot hold a managed object
    • Handles are stored instead – a number / virtualized pointer
    • A table of handles and managed objects is maintained
    – The managed objects cannot be released until the handle is
    – The handles and therefore the objects have to be released manually
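
To make the handle table concrete, here is a minimal sketch in plain C, assuming a fixed-size table and a toy `managed_object` type. The names `toy_handle_for_managed`, `toy_managed_from_handle` and `toy_release_handle` are hypothetical; they only illustrate the idea and are not TruffleRuby's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a managed (garbage-collected) object. */
typedef struct { const char *name; } managed_object;

#define TABLE_CAPACITY 64   /* assumed size, for illustration only */

/* Slot i holds the object for handle i + 1; handle 0 means "no handle". */
static managed_object *handle_table[TABLE_CAPACITY];

/* Hand out a number that native memory can store instead of the object.
   Returns 0 when the table is full. */
static uintptr_t toy_handle_for_managed(managed_object *obj) {
    for (size_t i = 0; i < TABLE_CAPACITY; i++) {
        if (handle_table[i] == NULL) {
            handle_table[i] = obj;   /* strong reference: keeps obj alive */
            return (uintptr_t)(i + 1);
        }
    }
    return 0;
}

/* Translate a stored handle back to the object it stands for. */
static managed_object *toy_managed_from_handle(uintptr_t handle) {
    assert(handle >= 1 && handle <= TABLE_CAPACITY);
    return handle_table[handle - 1];
}

/* Has to be called by hand; until then the table keeps the object alive. */
static void toy_release_handle(uintptr_t handle) {
    assert(handle >= 1 && handle <= TABLE_CAPACITY);
    handle_table[handle - 1] = NULL;
}
```

In this sketch a slot stays occupied until `toy_release_handle` is called, which mirrors why a forgotten release leaks the object behind the handle.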

  21.
    A method from zlib C extension
    static void
    zstream_passthrough_input(struct zstream *z)
    {
        if (!NIL_P(z->input)) {
            zstream_append_buffer2(z, z->input);
            z->input = Qnil;
        }
    }

  22.
    A method from zlib C extension
    static void
    zstream_passthrough_input(struct zstream *z)
    {
        if (!NIL_P(rb_tr_managed_from_handle(z->input))) {
            zstream_append_buffer2(z, rb_tr_managed_from_handle(z->input));
            z->input = rb_tr_handle_for_managed(Qnil);
        }
    }
    • Red links are strong references.
    • Convert the handle back to a Ruby managed object.
    • Convert the managed Ruby object to a handle.

  23.
    A method from zlib C extension
    static void gzfile_reader_rewind(struct gzfile *gz) {
        long n;
        n = gz->z.stream.total_in;
        if (!NIL_P(rb_tr_managed_from_handle(gz->z.input))) {
            n += RSTRING_LEN(rb_tr_managed_from_handle(gz->z.input));
        }
        rb_funcall(rb_tr_managed_from_handle(gz->io), id_seek, 2,
                   INT2NUM(-n), INT2FIX(1));
        gzfile_reset(gz);
    }
    • About 200 handle methods added just in zlib.c
    – Not good, too many patches to maintain

  24.
    Using managed structs to reduce handle methods
    • Trying to make more stuff managed to reduce the number of required handle methods
    • A managed struct is a Ruby object which behaves as a C struct, replacing native structs
    – A managed struct cannot be stored on the native stack and has to be initialized
    – Sometimes has to be turned into a pointer
    – Inner structs required special handling
    • Does not solve everything
    – Number of patches reduced but some still remain
    – Calls to native libraries (e.g. libz.so) still require handles

  25.
    A method from zlib C extension
    static void
    zstream_passthrough_input(struct zstream *z)
    {
        if (!NIL_P(rb_tr_managed_from_handle(z->input))) {
            zstream_append_buffer2(z, rb_tr_managed_from_handle(z->input));
            z->input = rb_tr_handle_for_managed(Qnil);
        }
    }

    struct zstream z;
    raise_zlib_error(err, z.stream.msg);

  26.
    A method from zlib C extension
    static void
    zstream_passthrough_input(struct zstream *z)
    {
        if (!NIL_P(rb_tr_managed_from_handle(z->input))) {
            zstream_append_buffer2(z, rb_tr_managed_from_handle(z->input));
            z->input = rb_tr_handle_for_managed(Qnil);
        }
    }

    struct zstream *z;
    z = rb_tr_new_managed_struct(zstream);
    raise_zlib_error(err, z->stream.msg);
    • The managed struct cannot be stored on the native stack, so the local variable has to be turned into a pointer.
    • The arrow operator has to be used instead.

  27.
    Leaking handles
    • A table handle -> managed object is maintained
    – The managed objects are not released until the handle is
    • Part of the C extension patch has to be handle management
    – The C extension has to be understood and the handles freed at the right places
    – Difficult in practice, e.g. a graph of structs representing an XML document
    • Red are strong references, blue are weak

  29.
    New approach

  30.
    Wrap all the Ruby objects before giving them to C
    • A wrapper which knows how to be converted to a pointer
    – Converted lazily when needed
    – Allows tracking all the conversions of the wrapper to a pointer
    • The pointer is stored into native memory instead of the managed wrapper
    public class ValueWrapper implements TruffleObject {
        private final Object object;
        private long handle;
        // ...
        @ExportMessage public boolean isPointer() { return true; }
        @ExportMessage public long asPointer() { /* lazy */ return handle; }
        // ...
    }
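
The lazy conversion can also be sketched in C. This is an illustration under assumptions, not the TruffleRuby code: `value_wrapper`, `wrapper_as_pointer`, `wrapper_from_pointer` and the fixed-size table are hypothetical, and a plain counter stands in for real handle allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a managed Ruby object. */
typedef struct { const char *name; } ruby_object;

/* Wrapper handed to C instead of the object itself. */
typedef struct {
    ruby_object *object;
    uintptr_t handle;        /* 0 until a native pointer is requested */
} value_wrapper;

#define MAX_HANDLES 64       /* assumed size, for illustration only */
static value_wrapper *handle_to_wrapper[MAX_HANDLES];
static size_t handles_created = 0;

/* Create the handle only on first use, so every conversion passes
   through this one function and can be tracked. */
static uintptr_t wrapper_as_pointer(value_wrapper *w) {
    if (w->handle == 0) {
        assert(handles_created < MAX_HANDLES);
        handle_to_wrapper[handles_created] = w;
        handles_created++;
        w->handle = (uintptr_t)handles_created;   /* handles start at 1 */
    }
    return w->handle;
}

/* Reverse lookup, used when C passes a stored pointer back to Ruby. */
static value_wrapper *wrapper_from_pointer(uintptr_t handle) {
    assert(handle >= 1 && handle <= handles_created);
    return handle_to_wrapper[handle - 1];
}
```

A wrapper that is only passed around and never stored into native memory keeps `handle == 0` in this sketch, so it never shows up in the table.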

  31.
    Wrap all the Ruby objects before giving them to C
    • The Ruby/C boundary has to translate back and forth
    – Changes in our implementation, not in the C extensions
    polyglot_invoke(recv, method_name, 2, arg1, arg2)

  32.
    Wrap all the Ruby objects before giving them to C
    • The Ruby/C boundary has to translate back and forth
    – Changes in our implementation, not in the C extensions
    rb_tr_wrap(polyglot_invoke(rb_tr_unwrap(recv), method_name, 2,
                               rb_tr_unwrap(arg1), rb_tr_unwrap(arg2)))

  33.
    A method from zlib.c
    static void
    zstream_passthrough_input(struct zstream *z)
    {
        if (!NIL_P(z->input)) {
            zstream_append_buffer2(z, z->input);
            z->input = Qnil;
        }
    }
    • No changes needed in the C extension code
    • No patches to maintain
    • A stored pointer is converted back to the wrapper and then to a Ruby object before nil? is called.
    • The Qnil constant already contains the wrapped nil Ruby object, which is converted to a pointer to be stored in the native struct.
    • The pointer is simply passed in.

  34.
    Memory management
    • We need one solution for everything; we cannot do something special for each C extension
    – Managing per-extension patches is not a long-term maintainable solution
    • MRI keeps objects alive with
    – Stack marking
    – Custom mark functions for C data stored in Ruby objects

  35.
    Memory management – Stack
    • MRI keeps alive all objects on the stack
    – We keep them alive by creating a list on each entry into a method implemented in C
    – Every lazily created pointer for a wrapper is added into the list
    – The list is discarded when the C method is left
    • Not all wrappers need the pointer created
    – Only when stored into native memory
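
The per-call list can be pictured with a small C sketch. It rests on assumptions: `c_call_frame`, `enter_c_method`, `preserve_on_stack` and `leave_c_method` are hypothetical names, the list has a fixed capacity, and since plain C has no garbage collector the "strong references" are only simulated by storing pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a managed Ruby object. */
typedef struct { const char *name; } ruby_object;

#define MAX_PRESERVED 32   /* assumed capacity, for illustration only */

/* One frame per call into a method implemented in C. */
typedef struct c_call_frame {
    ruby_object *preserved[MAX_PRESERVED];  /* objects kept alive for this call */
    size_t count;
    struct c_call_frame *caller;            /* frame of the enclosing C call */
} c_call_frame;

static c_call_frame *current_frame = NULL;

/* Called on entry: start an empty list for this call. */
static void enter_c_method(c_call_frame *frame) {
    frame->count = 0;
    frame->caller = current_frame;
    current_frame = frame;
}

/* Called whenever a pointer is lazily created for a wrapper. */
static void preserve_on_stack(ruby_object *obj) {
    assert(current_frame != NULL);
    assert(current_frame->count < MAX_PRESERVED);
    current_frame->preserved[current_frame->count++] = obj;
}

/* Called on exit: discard the list and return to the caller's frame. */
static void leave_c_method(void) {
    assert(current_frame != NULL);
    current_frame->count = 0;
    current_frame = current_frame->caller;
}
```

Nested calls each get their own list in this sketch, so leaving an inner C method does not drop objects the outer one still relies on.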

  44.
    Memory management – Mark functions
    • C data can be attached to a Ruby object
    – TypedData_Make_Struct
    – There is a custom mark function called during GC
    • Which makes sure the stored objects are not garbage collected
    struct zstream {
        VALUE buf;
        VALUE input;
        // ...
    };
    static void zstream_mark(void *p) {
        struct zstream *z = p;
        rb_gc_mark(z->buf);
        rb_gc_mark(z->input);
    }

  46.
    Memory management – Mark functions
    • We keep a weak list of all mark functions
    • We keep a fixed-sized buffer of wrappers which needed conversion to a pointer
    – Every lazily created pointer in a wrapper is put into this buffer
    • Whenever the buffer is full we run the mark functions
    – Updating the held references to the marked objects
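
The buffer and the virtual marking run can be sketched in C as follows. This is a simplified model under assumptions, not the actual implementation: all names (`owner`, `preserve`, `run_mark_functions`, `toy_gc_mark`) and the buffer size are hypothetical, the owner list is a plain array rather than a weak list, and references are simulated with pointers because C has no garbage collector.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a managed Ruby object. */
typedef struct { const char *name; } ruby_object;

#define BUFFER_SIZE 4   /* tiny on purpose; the real size is not given */
#define MAX_OWNERS 8
#define MAX_MARKED 8

/* A Ruby object with attached C data and a custom mark function. */
typedef struct owner {
    void *data;
    void (*mark)(void *data);
    ruby_object *marked[MAX_MARKED];   /* rebuilt on every mark run */
    size_t marked_count;
} owner;

static owner *owners[MAX_OWNERS];
static size_t owner_count = 0;
static ruby_object *preservation[BUFFER_SIZE];
static size_t preserved_count = 0;
static owner *marking_owner = NULL;    /* owner whose mark function is running */

static void register_owner(owner *o) {
    assert(owner_count < MAX_OWNERS);
    o->marked_count = 0;
    owners[owner_count++] = o;
}

/* Toy counterpart of rb_gc_mark: record obj as held by the current owner. */
static void toy_gc_mark(ruby_object *obj) {
    assert(marking_owner != NULL);
    if (obj == NULL)
        return;
    assert(marking_owner->marked_count < MAX_MARKED);
    marking_owner->marked[marking_owner->marked_count++] = obj;
}

/* Run every mark function, rebuilding each owner's list from scratch,
   then clear the buffer: marked objects are now held by their owners. */
static void run_mark_functions(void) {
    for (size_t i = 0; i < owner_count; i++) {
        marking_owner = owners[i];
        marking_owner->marked_count = 0;
        marking_owner->mark(marking_owner->data);
    }
    marking_owner = NULL;
    preserved_count = 0;
}

/* Called when a pointer is lazily created for a wrapped object. */
static void preserve(ruby_object *obj) {
    assert(preserved_count < BUFFER_SIZE);
    preservation[preserved_count++] = obj;
    if (preserved_count == BUFFER_SIZE)
        run_mark_functions();
}

/* Example C data with a single stored object, plus its mark function. */
struct a_struct { ruby_object *member; };

static void a_mark(void *p) {
    struct a_struct *a = p;
    toy_gc_mark(a->member);
}
```

Because each run rebuilds the owner's list, an object that was replaced in the struct simply stops being listed after the next run, which is how stale references get dropped in this model.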

  47.
    Memory management – Mark functions
    • Pentagons are wrapped Ruby objects
    • Red arrows are strong references
    • "A" has C data attached
    – A struct with a single VALUE member
    • The Preservation table is a fixed-sized buffer
    • The Handle table maps pointers to objects

  48.
    Memory management – Mark functions
    • Assign B into A’s struct member
    • Blue arrows are weak references
    – Can be garbage collected
    • B’s handle is a long (rectangle)
    – It is created lazily when B is being stored into native memory
    • C has no handle
    – It is translated to B by the Handle table when needed
    • B is put into the Preservation table to prevent its garbage collection

  49.
    Memory management – Mark functions
    • When the Preservation table is full the mark functions are executed
    – B is marked by A’s mark function, so it is put in A’s list of marked objects
    – After all mark functions have run we can clear the Preservation table
    static void
    a_mark(void *p) {
        struct a_struct *z = p;
        rb_gc_mark(z->member);
    }

  50.
    Memory management – Mark functions
    • Assign a different Ruby object C into A’s struct
    • C’s handle is stored in the struct
    • A’s list of marked objects keeps pointing to B until the mark functions are run again
    • C is put into the Preservation table

  51.
    Memory management – Mark functions
    • Another run of mark functions will store C instead of B into A’s list of marked objects

  52.
    Memory management – Mark functions
    • B and its handle can be garbage collected
    – Assuming B is not referenced anywhere else

  53.
    Memory management – Mark functions
    • If A is not referenced anywhere everything can be garbage collected
    – Only internal global tables remain
    • Actually thread local tables

  54.
    Conclusion

  55.
    Much better compatibility
    • Significant compatibility improvement
    • We run without patches, out of the box
    – All the standard libraries: openssl, zlib, psych, etc, syslog, ...
    – Database adapters: sqlite3, mysql2, pg, ...
    • All these need to be re-implemented for JRuby
    – Gems: puma, nio4r, byebug, websocket_driver, racc, msgpack, nokogiri, ...
    • Probably many more that we do not know about

  56.
    Comparison
    TruffleRuby
    • C extensions
    – Supported
    • Extensions
    – Not required
    – Generally pure Ruby should be fast enough
    • pure JSON on TruffleRuby is faster than Cext on CRuby
    • FFI
    – Supported (RC16)
    JRuby
    • C extensions
    – Not supported
    • Replacements required
    • Extensions
    – Java extensions sometimes required
    – For performance reasons
    – Both C and Java extensions
    • FFI
    – Supported
    CRuby
    • C extensions
    – Supported
    • Extensions
    – Required
    – For performance reasons
    • FFI
    – Supported
    – Not enough gems though

  57.
    Status
    • We are ready for experiments
    • Open-source: https://github.com/oracle/truffleruby
    • Give TruffleRuby a try and please report issues on GitHub
    – We are actively working on them
    • Installation, latest release for RubyKaigi with full FFI:
    – rvm install truffleruby
    – rbenv install truffleruby-1.0.0-rc16
    – ruby-install truffleruby

  58.
    Safe Harbor Statement
    The preceding is intended to provide some insight into a line of research in Oracle Labs. It
    is intended for information purposes only, and may not be incorporated into any
    contract. It is not a commitment to deliver any material, code, or functionality, and
    should not be relied upon in making purchasing decisions. Oracle reserves the right to
    alter its development plans and practices at any time, and the development, release, and
    timing of any features or functionality described in connection with any Oracle product or
    service remains at the sole discretion of Oracle. Any views expressed in this presentation
    are my own and do not necessarily reflect the views of Oracle.
