Ferrari Driven Development: superfast Ruby with Rubex

My talk at Ruby Kaigi 2018, Sendai.

Sameer Deshmukh

June 01, 2018

Transcript

  1. Hello Ruby Kaigi!




  2. Sameer Deshmukh

    @v0dro


  3. India | Pune


  5. Master’s Degree Student(HPC)
    Tokyo Institute of Technology


  7. Ruby
    Science
    Foundation
    www.sciruby.com
    @sciruby


  9. The first time: Daru.
    DataFrame library
    for Ruby


  10. Daru has seen
    > 185000 downloads
    ever since.
    Various plugins and
    many contributors.


  11. daru ==
    in Hindi*
    * India’s official language.


  13. The second time:
    very first introduction
    to Rubex.


  14. 120+ commits to
    Rubex ever since.
    More stable, usable
    language than before.


  15. The third time:
    Improved and
    polished Rubex.


  16. FDD:
    Ferrari Driven
    Development.


  17. Ruby is an awesome language,
    but it is slow.


  18. [Diagram: Ruby and C, linked by "speed" and "reliability".]


  19. [Diagram: Ruby and C, linked by "speed" and "reliability".]
    Nokogiri: Nokogiri::XML() is backed by libxml.
    fast_blank: String#blank? is backed by handwritten C.


  20. C extensions have
    BIG problems


  21. Difficult and irritating to write.
    Steep learning curve.
    Lots of scaffolding code.


  22. Manually bootstrap the
    extension with Ruby.


  23. Hard to read and
    understand public C APIs.


  24. Need to care about small things™*.
    * © Matz


  25. Various solutions exist (partly)

    Ruby inline.
    – Doesn’t scale.

    FFI.
    – Reductive and manual compilation.

    SWIG.
    – Evil, unreadable wrappers.

    Helix.
    – Entirely new language/paradigm.


  26. Ideal solution:
    Super-fast
    (and nice) Ruby.


  28. Rubex: A Ferrari for Ruby
    Ruby C extensions without losing
    your happiness.


  32. Improvements from last year




    Rubex is a much more robust and stable
    language.

    Lots of refactoring of internal codebase.

    Slight shift in Rubex’s goals: from pure speed
    to portability/readability of C extensions.


  33. What you think of C APIs of gems
    Your ruby library’s C ext
    Another C ext API that
    you are using


  34. What it actually feels like
    Your ruby library’s C ext
    Another C ext API that
    you are using


  35. Ruby vs. Rubex
    Ruby program:
    def add(a, b)
      return a + b
    end
    Rubex program:
    def add(int a, int b)
      return a + b
    end


  36. Rubex code: language which looks like Ruby.
    C code: ready to interface with the Ruby VM.
    CRuby runtime: code actually runs here.


  38. ["a", "b", "see", "d" ... ]


  39. {
    "a" => 0,
    "b" => 1,
    "see" => 2,
    "d" => 3
    ...
    }


  41. array.each_with_index.to_h
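
As a quick check of the one-liner above, here is a minimal plain-Ruby sketch. The sample array is the one from the earlier slide; the variable names are illustrative only.

```ruby
# Build a value => index lookup from an array.
array = ["a", "b", "see", "d"]

# each_with_index without a block returns an enumerator of
# [element, index] pairs; to_h turns each pair into a key/value entry.
index_of = array.each_with_index.to_h

p index_of["see"]
```

Each element becomes a key and its position becomes the value, so `index_of["see"]` is 2.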


  43. class Array2Hash
      def self.convert(arr a)
        long int i = a.size, j = 0
        hsh result = {}
        while j < i do
          result[a[j]] = j
          j += 1
        end
        return result
      end
    end
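
For comparison, the same loop can be written in plain Ruby without the C type annotations. This is only a sketch for checking behavior, not the Rubex output; the class name `Array2HashRuby` is made up here to avoid clashing with the compiled class.

```ruby
# Plain-Ruby port of the Rubex loop above (no C types involved).
class Array2HashRuby # hypothetical name, not part of any gem
  def self.convert(a)
    i = a.size
    j = 0
    result = {}
    # Walk the array by index and record each element's position.
    while j < i
      result[a[j]] = j
      j += 1
    end
    result
  end
end

converted = Array2HashRuby.convert(["a", "b", "see", "d"])
p converted
```

The Rubex version differs mainly in that `i` and `j` are declared as C `long int` values, so the loop counter arithmetic does not go through Ruby objects.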


  48. require 'array2hash.so'
    Array2Hash.convert array


  49. Benchmarks
    Warming up --------------------------------------
    convert 368.000 i/100ms
    each_with_index.to_h 236.000 i/100ms
    Calculating -------------------------------------
    convert 3.488k (± 9.8%) i/s - 17.296k in 5.012260s
    each_with_index.to_h 2.192k (± 8.3%) i/s - 11.092k in 5.097432s
    Comparison:
    convert: 3487.8 i/s
    each_with_index.to_h: 2192.3 i/s - 1.59x slower
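
The figures above are in the style of the benchmark-ips gem. A rough, dependency-free way to run a similar comparison is sketched below. It times two pure-Ruby variants only (the compiled Rubex extension is not assumed to be installed), and the input size and run count are arbitrary choices.

```ruby
# Time a block with the monotonic clock; returns elapsed seconds.
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Explicit while loop, mirroring the Rubex version in plain Ruby.
def loop_convert(a)
  result = {}
  j = 0
  while j < a.size
    result[a[j]] = j
    j += 1
  end
  result
end

array = (0...10_000).map { |n| "key#{n}" } # assumed input size
runs = 100                                 # assumed repetition count

loop_time = elapsed { runs.times { loop_convert(array) } }
enum_time = elapsed { runs.times { array.each_with_index.to_h } }

puts format("while loop:           %.4fs", loop_time)
puts format("each_with_index.to_h: %.4fs", enum_time)
```

Timings vary by machine and Ruby version, so no expected numbers are given here.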


  50. Interfacing with
    C data types.


  53. struct blanket {
      int warmth_factor;
      char* owner;
      float len, breadth;
    };



  54. GC marking of Ruby objects.

    Memory deallocation.

    Write an extconf.rb.

    struct rb_data_type_t.

    TypedData_Make_Struct().

    TypedData_Get_Struct().

    rb_define_instance_method().

    rb_define_class().

    rb_define_alloc_func().


  55. struct blanket
      int warmth_factor
      char* owner
      float len, breadth
    end


  56. class BlanketWrapper attach blanket
      def initialize(warmth_factor, owner, len, breadth)
        data$.blanket.warmth_factor = warmth_factor
        data$.blanket.owner = owner
        data$.blanket.len = len
        data$.blanket.breadth = breadth
      end
      def warmth_factor
        return data$.blanket.warmth_factor
      end
      # ... more code for blanket interface.
    end
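
From the Ruby side, a class like this would be used as an ordinary object. The sketch below is a plain-Ruby stand-in that only models that interface: it is not what Rubex generates, it stores the fields in a Ruby `Struct` rather than a C struct, and the names `Blanket` and `BlanketWrapperModel` and the sample values are invented for illustration.

```ruby
# Plain-Ruby stand-in for the Ruby-facing interface of the wrapper.
# The real wrapper keeps its data in a C struct; this model does not.
Blanket = Struct.new(:warmth_factor, :owner, :len, :breadth)

class BlanketWrapperModel # hypothetical name
  def initialize(warmth_factor, owner, len, breadth)
    @blanket = Blanket.new(warmth_factor, owner, len, breadth)
  end

  # Reader mirroring the warmth_factor method on the slide.
  def warmth_factor
    @blanket.warmth_factor
  end
end

wrapper = BlanketWrapperModel.new(7, "sameer", 2.0, 1.5)
puts wrapper.warmth_factor
```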


  65. Rubex struct wrapping

    ~3x reduction in LoC written.

    Friendly, elegant Ruby-like interface.

    No compromise in speed.

    No C code!


  66. Codebase
    management through
    name-spacing and
    public APIs


  67. Seamlessly define
    C and Ruby functions
    in class/module
    namespaces.


  68. class Foo
      cfunc void bar(int a, b)
        # some C and Ruby intermix
      end
      def baz(float c, e)
        bar(1, e)
      end
    end


  71. Define public APIs


  72. APIs exist in
    separate .rubexd
    files


  73. class Klass
      cfunc void foo(int a, int b)
      def bar
      def baz(int a, b, float c)
    end
    class OtherKlass
      cfunc void foo(int a, int b)
      def bar
      def baz(int a, b, float c)
    end


  74. Advantages

    Easily import C extension APIs through a
    ‘require_rubex’ compiler declaration.

    Supply only the compiled binary and API files to
    users, like most C libraries.

    Portable implementations across operating
    systems.

    Auto-generated packaging and compiling
    scripts.


  75. The infamous GIL


  78. A simple file
    reading example


  79. Read a file of 5_00_000 lines, with a value on each
    line, into memory.
    [Diagram: four parallel tasks]
    Task 1: read lines 0 – 1_25_000, compute the sum of the values.
    Task 2: read lines 1_25_000 – 2_50_000, compute the sum of the values.
    Task 3: read lines 2_50_000 – 3_75_000, compute the sum of the values.
    Task 4: read lines 3_75_000 – 5_00_000, compute the sum of the values.
    Finally: get all values and compute the average.


  80. [Diagram: the same four read-and-sum tasks, one per CPU
    (CPU 1, CPU 2, CPU 3, CPU 4).]


  81. [Diagram: the same four read-and-sum tasks, with the
    Global Interpreter Lock between them and CPU 1 to CPU 4.]


  82. Rubex no_gil block:
    A simple way to
    release the GIL


  83. # In a rubex file test.rubex
    cfunc void _some_computation
      int i, j, k = 0
      no_gil
        # ... perform some computation
      end
    end


  84. # In a calling Ruby script caller.rb
    require 'compiled_binary.so'
    def compute_without_gil
      t = []
      4.times {
        t << Thread.new {
          _some_computation
        }
      }
      # Wait for every thread to finish.
      t.each(&:join)
    end
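
The overall shape of the file-reading example (split the work four ways, sum each part in its own thread, then average) can be sketched in plain Ruby as follows. The in-memory `values` array stands in for the file contents and is an assumption, as are the variable names. Under CRuby these threads still contend for the GIL, so this shows the structure only, not the speedup a `no_gil` block is meant to give.

```ruby
# Stand-in for the file: one numeric value per "line" (assumed data).
values = (1..500_000).to_a

# Four chunks of 125,000 values each, one per thread.
chunks = values.each_slice(125_000).to_a

# One thread per chunk. Thread#value joins the thread and returns
# the result of its block, here the partial sum.
threads = chunks.map { |chunk| Thread.new { chunk.sum } }
partial_sums = threads.map(&:value)

# Combine the partial sums into the overall average.
average = partial_sums.sum.to_f / values.size
puts average
```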


  85. Actual implementation

    Made a simple implementation of the
    aforementioned example of reading and
    computing values from a file.

    Benchmarks indicate huge difference in
    performance.


  86. Warming up --------------------------------------
    without GIL in C 3.000 i/100ms
    with GIL in Ruby 1.000 i/100ms
    with GIL in C 1.000 i/100ms
    Calculating -------------------------------------
    without GIL in C 36.210 (± 2.8%) i/s - 183.000 in 5.059510s
    with GIL in Ruby 0.102 (± 0.0%) i/s - 1.000 in 9.830386s
    with GIL in C 18.591 (± 0.0%) i/s - 93.000 in 5.005381s
    Comparison:
    without GIL in C: 36.2 i/s
    with GIL in C: 18.6 i/s - 1.95x slower
    with GIL in Ruby: 0.1 i/s - 355.96x slower


  87. See my CPUs!


  88. See my code!
    https://github.com/v0dro/rubex_csv_reader


  89. Limitations of GIL release

    Can only use C data structures inside the no_gil
    block.

    Overhead associated with releasing and
    regaining GIL.

    Might break code that depends on the GIL.


  90. Exception Handling


  91. Many C functions need to be used

    rb_raise() for raising error.

    rb_rescue(), rb_rescue2(), rb_protect(),
    rb_ensure() for rescue and ensure blocks.

    rb_errinfo() for getting the last error raised.

    rb_set_errinfo(Qnil) for resetting error
    information.


  92. Workflow becomes complex

    Almost zero compliance with begin-ensure
    block workflow.

    Create C function callbacks.

    Manually catch and rescue exceptions.

    Inflexibility in sending data to callbacks.


  93. int i = accept_number()
    begin
      raise(ArgumentError) if i == 3
      raise(FooBarError) if i == 5
    rescue ArgumentError
      i += 1
    rescue FooBarError
      i += 2
    ensure
      i += 10
    end
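
The control flow of this snippet can be traced with an equivalent plain-Ruby method. This is a sketch: `FooBarError` is defined here only so the code runs, and the method name `adjust` replaces the slide's `accept_number()` call, whose definition is not shown.

```ruby
# Defined locally so the sketch is self-contained.
class FooBarError < StandardError; end

# Same begin/rescue/ensure structure as above, with the input
# passed in as an argument (hypothetical method name).
def adjust(i)
  begin
    raise ArgumentError if i == 3
    raise FooBarError if i == 5
  rescue ArgumentError
    i += 1
  rescue FooBarError
    i += 2
  ensure
    # Runs on every path, whether or not an exception was raised.
    i += 10
  end
  i
end

p adjust(3)
p adjust(5)
p adjust(1)
```

With an input of 3 the ArgumentError branch adds 1 and the ensure clause adds 10, giving 14. An input of 5 gives 17, and any other input only gets the 10 from the ensure clause.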


  94. https://github.com/sciruby/rubex


  95. Differences from Ruby

    Must specify brackets for function calls.

    No support for blocks/closures (yet).

    Must specify return keyword to return from
    functions.

    Differences from C

    No support for ‘value of’ operator *.

    No support for -> operator for struct pointers.


  96. Notable Rubex examples

    Rubex repo examples/ folder.
    – Fully functional libcsv wrapper for reading
    CSV files written entirely in Rubex.

    Array2Hash gem
    – https://github.com/v0dro/array2hash


  97. Detailed Docs and Tutorial

    REFERENCE.md.
    – Complete specification of the entire
    language.

    TUTORIAL.md.
    – Quick, easy to use explanation with
    code samples.


  98. Conclusion

    Rubex is a fast and productive way of
    writing Ruby C extensions.

    Provides users with the elegance of Ruby
    and the power of C while following the
    principle of least surprise.

    Provides abstractions in C extensions at
    no performance cost.


  99. New ideas for a
    better Ruby


  100. Rubex ideas

    Typed memory views.
    – Get a ‘memory view’ of contiguous Ruby types.
    – Will work with NMatrix and NArray gems.

    Direct interfacing with GPUs through native
    kernels.
    – Zero-abstraction interfacing with GPUs for
    accelerating computation.
    – Possible use in cumo.

    Integration with GDB.


  101. Rubyplot – advanced ruby plotting
    library

    Ruby does not have a single native plotting
    solution that even comes close to the likes of
    matplotlib/bokeh/something else.

    Rubyists don’t have a single go-to solution for
    their visualization needs that can scale.

    I think this situation is ridiculous for such a
    mature language ecosystem.


  102. Various partial solutions exist

    Matplotlib.rb – interfaces python matplotlib via
    pycall.

    Nyaplot – Bokeh like web visualization but
    abandoned by author.

    Google charts/high charts/etc. – too much
    dependence on 3rd party web tools, some of
    which are paid/non-free.

    Various GNU plot frontends.


  103. Rubyplot can change that!

    A native plotting solution written in C++ with a
    Ruby wrapper.

    Will directly interface with image-magick, GTK
    and GR to create a powerful plotting tool.

    Unlike matplotlib, it will eventually be a
    language-neutral C++ library that can leverage contributions
    from other language communities.


  104. View the progress of rubyplot

    Development started a few weeks ago.

    Follow on discourse:
    – https://discourse.ruby-data.org/

    Follow on GitHub:
    – https://github.com/sciruby/rubyplot

    Contributions/opinions are welcome!


  105. Common array library

    NMatrix and numo/narray are the two major array
    libraries.

    Important to bridge this divide and build on a
    library that is robust and well supported.

    Potential answer is plures – a language
    independent C backend to numpy.


  106. More about plures

    Plures is supported by Quansight, a company started by the
    creators of numpy (Python).

    Common C API across languages/frameworks.

    Need more discussion on Ruby frontend.


  107. Acknowledgements

    Ruby Association Grant 2016.

    Kenta Murata, Koichi Sasada and
    Naotoshi Seo for their support and
    mentorship.

    Fukuoka Ruby Award 2016.

    Ruby Science Foundation.


  108. I haz SciRuby stickers.
    ^_^


  109. THANK YOU!

