TruffleRuby: Wrapping up compatibility for C extensions

We think it is crucial that any alternative Ruby implementation aiming to be fully compatible with MRI runs the C extensions. TruffleRuby's compatibility was recently significantly improved, with much better support that almost completely removes the need to patch C extensions.

In this talk you will hear and see how the old approach to C extensions worked, where and why it fell short, how the new approach works, and how much closer it brings TruffleRuby to its goal of being a drop-in replacement for MRI.

We have been interpreting C extensions (and JIT-compiling them together with Ruby code) for a while. However, we had been passing Ruby objects directly into the C code, which led to problems. A new technique now removes the need for patches in almost all cases: the objects are wrapped for greater compatibility, and a virtual GC marking phase avoids memory leaks.

Petr Chalupa

April 20, 2019
Transcript

  1. Copyright © 2019, Oracle and/or its affiliates. All rights reserved.

    | TruffleRuby: Wrapping up compatibility for C extensions
    | Petr Chalupa
    | Principal Member of Technical Staff, Oracle Labs
    | April 20, 2019
  2.

    | Safe Harbor Statement The following is intended to provide some insight into a line of research in Oracle Labs. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. Oracle reserves the right to alter its development plans and practices at any time, and the development, release, and timing of any features or functionality described in connection with any Oracle product or service remains at the sole discretion of Oracle. Any views expressed in this presentation are my own and do not necessarily reflect the views of Oracle.
  3.

    | Program Agenda
    | 1. Technologies
    | 2. Execution of the C extensions
    | 3. Old approach
    | 4. New approach
    | 5. Conclusion
  4.

    | TruffleRuby
    | • Alternative implementation of Ruby
    | • A drop-in replacement for the CRuby implementation
    |   – C extensions support
    |   – Good startup time for development
    | • Just-in-time compilation
    | • Generally faster than any other implementation
    |   – If not, please file a bug
    | [Diagram labels: TruffleRuby, Ruby]
  5.

    | Truffle - Language Implementation Framework
    | • Specializing (self-modifying) Abstract Syntax Tree interpreter
    | • Simpler language implementation
    | • Polyglot protocol
    |   – Languages can pass values to each other
    |   – No usual slow language barrier
    | • Instrumentation
    |   – Multi-language debugger
    |   – Profilers
    | [Diagram labels: Truffle, TruffleRuby, Ruby]
  6.

    | Truffle - Language Implementation Framework
  7.

    | Graal Compiler
    | • Dynamic compiler written in Java
    | • In combination with Truffle
    |   – Inlining
    |   – Splitting, method cloning
    |   – Partial Evaluation
    | • All the optimizations are done in Truffle and Graal rather than the language implementations
    |   – Shared
    |   – Optimized together
    | [Diagram labels: Graal, Truffle, TruffleRuby, Ruby]
  8.

    | Java VM
    | • Executes compiled methods
    |   – Provided by Graal
    | • Garbage collector
    | • Runs Java
    | [Diagram labels: Java VM, Graal, Truffle, TruffleRuby, Ruby]
  9.

    | GraalVM
    | • Distribution of
    |   – Java VM
    |   – Graal
    |   – Truffle
    |   – Languages you can run
    |     • Java, Kotlin, Scala, ...
    |     • Ruby, JS, R, Python, ...
    |     • C, C++, Fortran, ...
    | [Diagram labels: GraalVM, Java VM, Graal, Truffle, TruffleRuby, Other languages ..., Ruby]
  10.

    | Sulong
    | • LLVM bitcode runtime
    |   – Technically an interpreter with JIT
    |   – Any language transformable to LLVM bitcode can be executed
    |   – E.g. C/C++ and Fortran
    | • TruffleRuby executes Ruby code
    | • Sulong executes C extensions
    | • Both are Truffle languages optimized together by Graal
    | [Diagram labels: GraalVM, Java VM, Graal, Truffle, TruffleRuby, Sulong, C, Ruby]
  11.

    | Substrate VM
    | • Ahead-of-time compilation of Java applications
    | • TruffleRuby, Sulong, Truffle, and Graal are written in Java
    | • Executable Ruby binary is produced with fast startup
    |   – No slow startup limitation for day-to-day development
    | [Diagram labels: GraalVM, Graal, Truffle, TruffleRuby, Sulong, C, Java VM, Substrate VM, Ruby]
  12.

    | Startup time of ruby -e "p puts 'Hello world'"
    |
    | Implementation      Time    Memory (MB)
    | TruffleRuby native  0.025    65
    | CRuby 2.6.2         0.048    14
    | Rubinius 3.107      0.150    78
    | JRuby 9.2.7.0       1.357   160
    | TruffleRuby JVM     1.787   456
  13.

    | Execution of the C extensions
    | • Sulong is a Truffle language
    |   – Interoperability with other languages
    |   – VALUEs in the C extension code can be Ruby objects
    | • Managed vs. unmanaged memory
    |   – Managed (garbage-collected Ruby) objects cannot be put into unmanaged (native) memory
    |   – We do tricks to store Ruby objects into native memory (e.g., arrays or structs in C)
    | • Optimized together
    |   – In Truffle all languages use the same Intermediate Representation
    | • Polyglot protocol
  14.

    | Polyglot protocol
    | • An API allowing languages to talk to foreign values without conversion
    |   – hasMembers, readMember, writeMember, ...
    |     • Example from C: a_ruby_object->member
    |   – isPointer, asPointer, ...
    |     • Example from C: a_struct->member = ruby_object
    |   – ...
    | • Part of Truffle
    |   – Implemented with specializing nodes
    |     • If C reads from a Ruby object, Ruby provides nodes defining the Ruby read
    |   – JITed
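
The idea of a language-neutral message set can be pictured with a small C sketch. Truffle implements these messages with specializing nodes in Java; the function-pointer table below only shows the shape of the dispatch. Every name in it (`interop_messages`, `foreign_object`, `toy_point`, `interop_read`) is hypothetical and not part of Truffle or TruffleRuby.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical message table: each language supplies its own implementation. */
struct interop_messages {
    int  (*has_members)(void *receiver);
    long (*read_member)(void *receiver, const char *name);
};

/* A value as seen from the other language: messages plus an opaque receiver. */
struct foreign_object {
    const struct interop_messages *messages;
    void *receiver;
};

/* A toy "Ruby-side" object with two named fields. */
struct toy_point { long x; long y; };

static int point_has_members(void *receiver) {
    (void)receiver;
    return 1;
}

static long point_read_member(void *receiver, const char *name) {
    struct toy_point *p = receiver;
    if (strcmp(name, "x") == 0) return p->x;
    if (strcmp(name, "y") == 0) return p->y;
    return 0; /* unknown member: the sketch has no error channel */
}

static const struct interop_messages point_messages = {
    point_has_members, point_read_member
};

/* What a member read issued from the other language reduces to:
 * no conversion of the object, only a message sent to it. */
static long interop_read(struct foreign_object obj, const char *name) {
    assert(obj.messages != NULL && obj.messages->has_members(obj.receiver));
    return obj.messages->read_member(obj.receiver, name);
}
```

With a `struct foreign_object` built from `&point_messages` and a `toy_point`, `interop_read(obj, "x")` returns the `x` field without copying or converting the object.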
  15.

    | Understanding C extension evaluation
    |
    |     static void
    |     gzfile_reader_rewind(struct gzfile *gz)
    |     {
    |         long n;
    |
    |         n = gz->z.stream.total_in;
    |         if (!NIL_P(gz->z.input)) {
    |             n += RSTRING_LEN(gz->z.input);
    |         }
    |
    |         rb_funcall(gz->io, id_seek, 2, INT2NUM(-n), INT2FIX(1));
    |         gzfile_reset(gz);
    |     }
    |
    | NIL_P calls nil? on a Ruby object read from a nested struct.
    | RSTRING_LEN gets the length, as a C long, of a String stored in a nested struct.
    | rb_funcall calls a method with arguments on a Ruby object stored in a struct.
  16.

    | Storing handles instead of managed objects
    | • Managed objects
    |   – Are managed by the VM
    |   – Can be moved
    |   – Can be garbage collected
    | • A struct in native memory cannot hold a managed object
    | • Handles are stored instead
    |   – A number / virtualized pointer
    | • A table of handles and managed objects is maintained
    |   – The managed objects cannot be released until the handle is
    |   – The handles, and therefore the objects, have to be released manually
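
The handle table described on this slide can be pictured with a minimal C sketch. It is an illustration under assumptions, not TruffleRuby's implementation: the names `handle_table`, `handle_for_object`, `object_from_handle`, and `release_handle` are hypothetical, a plain `void *` stands in for a managed object, and the table has a fixed, arbitrary capacity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_CAPACITY 64  /* arbitrary size chosen for the sketch */

/* Slot i holds the object registered under handle i + 1; NULL means free. */
static void *handle_table[TABLE_CAPACITY];

/* Register an object and return a non-zero handle that is safe to store in
 * native memory. The table entry acts as a strong reference to the object. */
static uintptr_t handle_for_object(void *object) {
    assert(object != NULL);
    for (size_t i = 0; i < TABLE_CAPACITY; i++) {
        if (handle_table[i] == NULL) {
            handle_table[i] = object;
            return (uintptr_t)(i + 1);
        }
    }
    return 0; /* table full: the sketch does not grow the table */
}

/* Translate a handle back to the object, or NULL if it is not live. */
static void *object_from_handle(uintptr_t handle) {
    if (handle == 0 || handle > TABLE_CAPACITY) return NULL;
    return handle_table[handle - 1];
}

/* Manual release: until this is called, the object stays reachable. */
static void release_handle(uintptr_t handle) {
    if (handle != 0 && handle <= TABLE_CAPACITY) handle_table[handle - 1] = NULL;
}
```

The sketch makes the cost visible: an entry that is never passed to `release_handle` keeps its object alive indefinitely, which is the leak the later slides address.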
  17.

    | A method from zlib C extension
    |
    |     static void
    |     zstream_passthrough_input(struct zstream *z)
    |     {
    |         if (!NIL_P(z->input)) {
    |             zstream_append_buffer2(z, z->input);
    |             z->input = Qnil;
    |         }
    |     }
  18.

    | A method from zlib C extension
    |
    |     static void
    |     zstream_passthrough_input(struct zstream *z)
    |     {
    |         if (!NIL_P(rb_tr_managed_from_handle(z->input))) {
    |             zstream_append_buffer2(z, rb_tr_managed_from_handle(z->input));
    |             z->input = rb_tr_handle_for_managed(Qnil);
    |         }
    |     }
    |
    | Red links are strong references.
    | rb_tr_managed_from_handle converts the handle back to a Ruby managed object.
    | rb_tr_handle_for_managed converts the managed Ruby object to a handle.
  19.

    | A method from zlib C extension
    |
    |     static void
    |     gzfile_reader_rewind(struct gzfile *gz)
    |     {
    |         long n;
    |
    |         n = gz->z.stream.total_in;
    |         if (!NIL_P(rb_tr_managed_from_handle(gz->z.input))) {
    |             n += RSTRING_LEN(rb_tr_managed_from_handle(gz->z.input));
    |         }
    |
    |         rb_funcall(rb_tr_managed_from_handle(gz->io), id_seek, 2, INT2NUM(-n), INT2FIX(1));
    |         gzfile_reset(gz);
    |     }
    |
    | • About 200 handle methods added just in zlib.c
    |   – Not good, too many patches to maintain
  20.

    | Using managed structs to reduce handle methods
    | • Trying to make more things managed to reduce the number of required handle methods
    | • A managed struct is a Ruby object which behaves as a C struct, replacing native structs
    |   – A managed struct cannot be stored on the native stack and has to be initialized
    |   – Sometimes it has to be turned into a pointer
    |   – Inner structs require special handling
    | • Does not solve everything
    |   – The number of patches is reduced, but some remain
    |   – Calls to native libraries (e.g. libz.so) still require handles
  21.

    | A method from zlib C extension
    |
    |     static void
    |     zstream_passthrough_input(struct zstream *z)
    |     {
    |         if (!NIL_P(rb_tr_managed_from_handle(z->input))) {
    |             zstream_append_buffer2(z, rb_tr_managed_from_handle(z->input));
    |             z->input = rb_tr_handle_for_managed(Qnil);
    |         }
    |     }
    |
    |     struct zstream z;
    |     raise_zlib_error(err, z.stream.msg);
  22.

    | A method from zlib C extension
    |
    |     static void
    |     zstream_passthrough_input(struct zstream *z)
    |     {
    |         if (!NIL_P(rb_tr_managed_from_handle(z->input))) {
    |             zstream_append_buffer2(z, rb_tr_managed_from_handle(z->input));
    |             z->input = rb_tr_handle_for_managed(Qnil);
    |         }
    |     }
    |
    |     struct zstream *z;
    |     z = rb_tr_new_managed_struct(zstream);
    |     raise_zlib_error(err, z->stream.msg);
    |
    | The managed struct cannot be stored on the native stack, so the local variable has to be turned into a pointer.
    | The arrow operator has to be used instead of the dot operator.
  23.

    | Leaking handles
    | • A table handle -> managed object is maintained
    |   – The managed objects are not released until the handle is
    | • Part of the C extension patch has to be handle management
    |   – The C extension has to be understood and the handles freed at the right places
    |   – Difficult in practice, e.g. a graph of structs representing an XML document
    | • Red are strong references, blue are weak
  25.

    | Wrap all the Ruby objects before giving them to C
    | • A wrapper which knows how to be converted to a pointer
    |   – Converted lazily when needed
    |   – Allows tracking all the conversions of the wrapper to a pointer
    | • The pointer is stored into native memory instead of the managed wrapper
    |
    |     public class ValueWrapper implements TruffleObject {
    |         private final Object object;
    |         private long handle;
    |         // ...
    |
    |         @ExportMessage
    |         public boolean isPointer() {
    |             return true;
    |         }
    |
    |         @ExportMessage
    |         public long asPointer() {
    |             // lazy ...
    |             return handle;
    |         }
    |         // ...
    |     }
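
The lazy conversion elided in `asPointer` can be sketched in C as follows. This is an assumption about the general shape, not the actual TruffleRuby code: `value_wrapper`, `wrap`, `as_pointer`, `next_handle`, and `conversions` are hypothetical names, and a counter stands in for a real handle allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C analogue of the ValueWrapper shown on the slide. */
struct value_wrapper {
    void *object;      /* the wrapped object */
    uintptr_t handle;  /* 0 until a pointer is first requested */
};

static uintptr_t next_handle = 1;  /* toy allocator for the sketch */
static unsigned conversions = 0;   /* counts wrapper-to-pointer conversions */

/* Wrapping is cheap: no handle is allocated yet. */
static struct value_wrapper wrap(void *object) {
    struct value_wrapper w = { object, 0 };
    return w;
}

/* Allocates the handle only on the first call, so every conversion passes
 * through one place and can be tracked there. */
static uintptr_t as_pointer(struct value_wrapper *w) {
    assert(w != NULL);
    if (w->handle == 0) {
        w->handle = next_handle++;
        conversions++;
    }
    return w->handle;
}
```

Wrappers that are only passed around and never stored into native memory never reach `as_pointer`, so they cost no handle at all.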
  26.

    | Wrap all the Ruby objects before giving them to C
    | • The Ruby/C boundary has to translate back and forth
    |   – Changes in our implementation, not in the C extensions
    |
    |     polyglot_invoke(recv, method_name, 2, arg1, arg2)
  27.

    | Wrap all the Ruby objects before giving them to C
    | • The Ruby/C boundary has to translate back and forth
    |   – Changes in our implementation, not in the C extensions
    |
    |     rb_tr_wrap(polyglot_invoke(rb_tr_unwrap(recv), method_name, 2,
    |                                rb_tr_unwrap(arg1), rb_tr_unwrap(arg2)))
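
The translation at the boundary can be illustrated with a toy C sketch in which a "managed" value is just a `long`. All names here (`wrapper`, `value`, `toy_wrap`, `toy_unwrap`, `managed_add`, `toy_funcall_add`) are hypothetical; the real boundary uses `rb_tr_wrap` and `rb_tr_unwrap` around `polyglot_invoke`, as the slide shows.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: a "managed" value is a plain long, and a pointer
 * to its wrapper is what C extension code would hold as a VALUE. */
typedef struct { long managed; } wrapper;
typedef wrapper *value;

#define POOL_SIZE 32
static wrapper pool[POOL_SIZE];
static size_t pool_used = 0;

static value toy_wrap(long managed) {
    assert(pool_used < POOL_SIZE);  /* the sketch never frees wrappers */
    pool[pool_used].managed = managed;
    return &pool[pool_used++];
}

static long toy_unwrap(value v) {
    assert(v != NULL);
    return v->managed;
}

/* Stands in for the managed side: it only ever sees unwrapped values. */
static long managed_add(long recv, long arg) {
    return recv + arg;
}

/* Stands in for an API function at the boundary: unwrap on the way in,
 * wrap on the way out, so the extension code itself stays unchanged. */
static value toy_funcall_add(value recv, value arg) {
    return toy_wrap(managed_add(toy_unwrap(recv), toy_unwrap(arg)));
}
```

Because the wrapping lives inside the API function, code that calls `toy_funcall_add` needs no changes, which mirrors the claim that only the implementation changes and not the C extensions.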
  28.

    | A method from zlib.c
    |
    |     static void
    |     zstream_passthrough_input(struct zstream *z)
    |     {
    |         if (!NIL_P(z->input)) {
    |             zstream_append_buffer2(z, z->input);
    |             z->input = Qnil;
    |         }
    |     }
    |
    | • No changes needed in the C extension code
    | • No patches to maintain
    |
    | A stored pointer is converted back to the wrapper and then to a Ruby object before nil? is called.
    | The Qnil constant already contains the wrapped nil Ruby object, which is converted to a pointer to be stored in the native struct.
    | The pointer is simply passed in.
  29.

    | Memory management
    | • We need one solution for everything; we cannot do something special for each C extension
    |   – Managing per-C-extension patches is not a long-term maintainable solution
    | • MRI keeps objects alive with
    |   – Stack marking
    |   – Custom mark functions for C data stored in Ruby objects
  30.

    | Memory management – Stack
    | • MRI keeps alive all objects on the stack
    |   – We keep them alive by creating a list on each entry into a method implemented in C
    |   – Every lazily created pointer for a wrapper is added to the list
    |   – The list is discarded when the C method is left
    | • Not all wrappers need the pointer created
    |   – Only when stored into native memory
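
The per-call list can be sketched as a chain of frames, one per call into a C-implemented method. This is a minimal illustration with hypothetical names (`preservation_frame`, `enter_c_method`, `preserve`, `leave_c_method`, `preserved_count`) and an arbitrary fixed capacity; it is not how TruffleRuby stores the list internally.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PRESERVED 16  /* arbitrary bound for the sketch */

/* One frame per call into a method implemented in C. */
struct preservation_frame {
    void *objects[MAX_PRESERVED];  /* strong references held for this call */
    size_t count;
    struct preservation_frame *caller;
};

static struct preservation_frame *current_frame = NULL;

/* On entry: start an empty list and remember the caller's list. */
static void enter_c_method(struct preservation_frame *frame) {
    frame->count = 0;
    frame->caller = current_frame;
    current_frame = frame;
}

/* Called whenever a wrapper's pointer is lazily created. */
static void preserve(void *object) {
    assert(current_frame != NULL && current_frame->count < MAX_PRESERVED);
    current_frame->objects[current_frame->count++] = object;
}

/* On exit: the whole list is dropped at once. */
static void leave_c_method(void) {
    assert(current_frame != NULL);
    current_frame = current_frame->caller;
}

/* Number of objects held by the innermost active call. */
static size_t preserved_count(void) {
    return current_frame ? current_frame->count : 0;
}
```

Nested calls push and pop frames in stack order, so an object preserved during an inner call stops being held as soon as that call returns, much as it would leave the native stack in MRI.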
  31.

    | Memory management – Mark functions
    | • C data can be attached to a Ruby object
    |   – TypedData_Make_Struct
    |   – There is a custom mark function called during GC
    |     • Which makes sure the stored objects are not garbage collected
    |
    |     struct zstream {
    |         VALUE buf;
    |         VALUE input;
    |         // ...
    |     };
    |
    |     static void
    |     zstream_mark(void *p)
    |     {
    |         struct zstream *z = p;
    |         rb_gc_mark(z->buf);
    |         rb_gc_mark(z->input);
    |     }
  33.

    | Memory management – Mark functions
    | • We keep a weak list of all mark functions
    | • We keep a fixed-size buffer of wrappers which needed conversion to a pointer
    |   – Every lazily created pointer in a wrapper is put into this buffer
    | • Whenever the buffer is full we run the mark functions
    |   – Updating the held references to the marked objects
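
The interplay of the fixed-size buffer and the mark functions can be sketched in C. This is a simplified model under assumptions: all names (`owner`, `register_owner`, `toy_gc_mark`, `run_mark_functions`, `preserve`, `a_struct`, `a_mark`) are hypothetical, the list of owners is a plain array rather than a weak list, and `toy_gc_mark` stands in for `rb_gc_mark`.

```c
#include <assert.h>
#include <stddef.h>

#define BUFFER_SIZE 4  /* fixed size of the preservation buffer (arbitrary) */
#define MAX_MARKED 8
#define MAX_OWNERS 8

/* An object with attached C data and a custom mark function. */
struct owner {
    void *c_data;
    void (*mark)(void *c_data);
    void *marked[MAX_MARKED];  /* strong references found by the last marking */
    size_t marked_count;
};

static void *preservation[BUFFER_SIZE];
static size_t preservation_count = 0;
static struct owner *owners[MAX_OWNERS];  /* a weak list in the slides */
static size_t owner_count = 0;
static struct owner *marking_owner = NULL;

static void register_owner(struct owner *o) {
    assert(owner_count < MAX_OWNERS);
    o->marked_count = 0;
    owners[owner_count++] = o;
}

/* Stand-in for rb_gc_mark: records the object in the marking owner's list. */
static void toy_gc_mark(void *object) {
    assert(marking_owner != NULL && marking_owner->marked_count < MAX_MARKED);
    marking_owner->marked[marking_owner->marked_count++] = object;
}

/* Rebuild every owner's marked list, then empty the preservation buffer. */
static void run_mark_functions(void) {
    for (size_t i = 0; i < owner_count; i++) {
        marking_owner = owners[i];
        marking_owner->marked_count = 0;
        marking_owner->mark(marking_owner->c_data);
    }
    marking_owner = NULL;
    preservation_count = 0;
}

/* Called when a wrapper's pointer is lazily created: hold the object until
 * the next marking run, and trigger that run when the buffer fills up. */
static void preserve(void *object) {
    preservation[preservation_count++] = object;
    if (preservation_count == BUFFER_SIZE) run_mark_functions();
}

/* Example C data with a single member, and its mark function. */
struct a_struct { void *member; };

static void a_mark(void *p) {
    struct a_struct *z = p;
    toy_gc_mark(z->member);
}
```

After a marking run, each owner holds exactly the objects its mark function reported, so an object that was replaced in the struct is no longer referenced from the owner's list and can be collected.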
  34.

    | Memory management – Mark functions
    | • Pentagons are wrapped Ruby objects
    | • Red arrows are strong references
    | • "A" has C data attached
    |   – A struct with a single VALUE member
    | • The preservation table is a fixed-size buffer
    | • The handle table maps pointers to objects
  35.

    | Memory management – Mark functions
    | • Assign B into A's struct member
    | • Blue arrows are weak references
    |   – Can be garbage collected
    | • B handle is a long (rectangle)
    |   – It is created lazily when B is being stored into native memory
    | • C has no handle
    |   – It is translated to B by the Handle table when needed
    | • B is put into the Preservation table to prevent its garbage collection
  36.

    | Memory management – Mark functions
    | • When the preservation table is full the mark functions are executed
    |   – B is marked by A's marking function, therefore it is put in A's list of marked objects
    |   – After all mark functions run we can clear the Preservation table
    |
    |     static void
    |     a_mark(void *p)
    |     {
    |         struct a_struct *z = p;
    |         rb_gc_mark(z->member);
    |     }
  37.

    | Memory management – Mark functions
    | • Assign a different Ruby object C into A's struct
    | • C handle is stored in the struct
    | • A's list of marked objects keeps pointing to B until the mark functions are run again
    | • C is put into the preservation table
  38.

    | Memory management – Mark functions
    | • Another run of mark functions will store C instead of B into A's list of marked objects
  39.

    | Memory management – Mark functions
    | • B and its handle can be garbage collected
    |   – Assuming B is not referenced anywhere else
  40.

    | Memory management – Mark functions
    | • If A is not referenced anywhere, everything can be garbage collected
    |   – Only internal global tables remain
    |     • Actually thread-local tables
  41.

    | Much better compatibility
    | • Significant compatibility improvement
    | • We run without patches, out of the box
    |   – All the standard libraries: openssl, zlib, psych, etc, syslog, ...
    |   – Database adapters: sqlite3, mysql2, pg, ...
    |     • All these need to be re-implemented for JRuby
    |   – Gems: puma, nio4r, byebug, websocket_driver, racc, msgpack, nokogiri, ...
    |     • Probably many more that we do not know about
  42.

    | Comparison
    |
    | TruffleRuby
    | • C extensions
    |   – Supported
    | • Extensions
    |   – Not required
    |   – Generally pure Ruby should be fast enough
    |     • pure JSON on TruffleRuby is faster than Cext on CRuby
    | • FFI
    |   – Supported (RC16)
    |
    | JRuby
    | • C extensions
    |   – Not supported
    |     • Replacements required
    | • Extensions
    |   – Java extensions sometimes required
    |   – For performance reasons
    |   – Both C and Java extensions
    | • FFI
    |   – Supported
    |
    | CRuby
    | • C extensions
    |   – Supported
    | • Extensions
    |   – Required
    |   – For performance reasons
    | • FFI
    |   – Supported
    |   – Not enough gems though
  43.

    | Status
    | • We are ready for experiments
    | • Open-source: https://github.com/oracle/truffleruby
    | • Give TruffleRuby a try and please report issues on GitHub
    |   – We are actively working on them
    | • Installation, latest release for RubyKaigi with full FFI:
    |   – rvm install truffleruby
    |   – rbenv install truffleruby-1.0.0-rc16
    |   – ruby-install truffleruby
  44.

    | Safe Harbor Statement The preceding is intended to provide some insight into a line of research in Oracle Labs. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. Oracle reserves the right to alter its development plans and practices at any time, and the development, release, and timing of any features or functionality described in connection with any Oracle product or service remains at the sole discretion of Oracle. Any views expressed in this presentation are my own and do not necessarily reflect the views of Oracle.