Data Analysis in Ruby with daru

Sameer Deshmukh
September 09, 2016

This talk was presented at RubyKaigi 2016 in Kyoto. It shows what daru is capable of so far and ends with a short demo.

The notebooks of this talk can be found at this link: https://github.com/v0dro/talks/tree/master/Ruby%20Kaigi%202016

Transcript

  1. Me

  2. Daru::Vector

    Heterogeneous array that can be indexed on any Ruby object.

    Diagram: a vector called Name with entries labelled Label(0), Label(1), Label(2), ..., Label(n-1).
  3. Daru::DataFrame

    2D spreadsheet-like data structure indexed by rows or columns.

    Diagram: a grid with axes labelled Col0, Col1, Col2, ..., Col(n-1) and Label(0), Label(1), Label(2), ..., Label(n-1).
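The two structures above can be pictured with a small pure-Ruby toy. This is only a sketch of the idea of label-based indexing: the `LabelledVector` and `LabelledFrame` classes are hypothetical names made up for illustration, and they are not daru's API or implementation.

```ruby
# Toy model of a labelled vector: values addressed by arbitrary Ruby objects.
# Illustrative only; this is not how daru is implemented.
class LabelledVector
  attr_reader :name, :labels

  def initialize(values, labels: nil, name: nil)
    @labels = labels || (0...values.size).to_a
    raise ArgumentError, "labels and values differ in length" unless @labels.size == values.size

    @name = name
    @values = values.dup
    # Map each label to its position so lookups do not scan the array.
    @positions = @labels.each_with_index.to_h
  end

  # Look a value up by its label (a Symbol, String, Integer, or any other object).
  def [](label)
    pos = @positions.fetch(label) { raise KeyError, "unknown label: #{label.inspect}" }
    @values[pos]
  end

  def to_a
    @values.dup
  end
end

# Toy model of a data frame: named columns that share one row index.
class LabelledFrame
  attr_reader :row_labels

  def initialize(columns, row_labels:)
    @row_labels = row_labels
    @columns = columns.to_h do |col_name, values|
      [col_name, LabelledVector.new(values, labels: row_labels, name: col_name)]
    end
  end

  # Column access returns a vector labelled by the row index.
  def [](col_name)
    @columns.fetch(col_name)
  end

  # Row access returns a vector labelled by the column names.
  def row(label)
    LabelledVector.new(@columns.values.map { |v| v[label] }, labels: @columns.keys, name: label)
  end
end

# Mixed value types and mixed label types in one vector.
temps = LabelledVector.new([21.5, "n/a", 19.0], labels: [:kyoto, "Pune", 2016], name: :temps)
frame = LabelledFrame.new({ a: [1, 2, 3], b: [4, 5, 6] }, row_labels: [:x, :y, :z])
```

With these definitions, `temps["Pune"]` looks a value up by a String label, `frame[:b][:y]` reads one cell by column and then row, and `frame.row(:z)` slices across columns.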
  4. Runs in your browser

    • Input cell: accepts Ruby code
    • Output cell: can render HTML/CSS/JS
  5. Acts as glue between other SciRuby libraries

    • statsample for statistics.
    • mixed_models for mixed models.
    • daru-td for Treasure Data.
    • nmatrix for efficient data storage.
  6. Provides an R-like formula language for specifying regressions.

    "Y ~ a+a:b+c+c:d"
    Y = β0 + a*β1 + a*b*β2 + c*β3 + c*d*β4
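The mapping from the formula string to the equation on this slide can be sketched in a few lines of plain Ruby. The helpers `parse_formula` and `predict` below are hypothetical and written only for illustration; they handle just the `+` and `:` operators that appear on the slide and are not the parser used by any SciRuby library.

```ruby
# Hypothetical helper: split an R-like formula into a response name and a list
# of terms, where each term is the list of variables that are multiplied together.
def parse_formula(formula)
  response, rhs = formula.split("~").map(&:strip)
  terms = rhs.split("+").map { |term| term.split(":").map(&:strip) }
  [response, terms]
end

# Evaluate beta0 + sum(beta_k * product of the variables in term k)
# for a single observation given as a Hash of variable name => value.
def predict(terms, betas, row)
  intercept, *slopes = betas
  raise ArgumentError, "need one coefficient per term plus an intercept" unless slopes.size == terms.size

  terms.zip(slopes).sum(intercept) do |vars, beta|
    beta * vars.map { |v| row.fetch(v) }.reduce(1, :*)
  end
end

response, terms = parse_formula("Y ~ a+a:b+c+c:d")
# terms is [["a"], ["a", "b"], ["c"], ["c", "d"]]

# Made-up values and coefficients, chosen so each term is easy to spot in the result.
row   = { "a" => 2, "b" => 3, "c" => 5, "d" => 7 }
betas = [1, 10, 100, 1000, 10000]
y = predict(terms, betas, row)
# y = 1 + 2*10 + (2*3)*100 + 5*1000 + (5*7)*10000 = 355621
```

Each `a:b` interaction becomes the product of its variables, which is why the second and fourth terms of the equation are `a*b*β2` and `c*d*β4`.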
  7. unsigned long long int calc_factorial(unsigned long long int n) {
       return (n > 1 ? n*calc_factorial(n-1) : 1);
     }

     static VALUE cfactorial(VALUE self, VALUE n) {
       return ULL2NUM(calc_factorial(NUM2ULL(n)));
     }
  8. Big Problems

    • Difficult and irritating to write.
    • Time-consuming to debug.
    • Tough to trace memory leaks.
    • Change mindset from high-level to low-level language.
    • Need to care about small things.™*

    *Matz, keynote at Red Dot Ruby Conf 2016, Singapore.
  9. # Create a C static array and return a Ruby Array
     def adder(n)
       a = StaticArray(i32, n)
       i32 i = 0
       i32 sum = 0
       a.each(n) { a[i] = i*5 }
       for 0 <= i < n do
         sum += a[i]
       end
       sum
     end
  10. Allows interfacing JRuby libraries with jBLAS for performance.

    Uses the Apache Commons Math library for storage and operations on internal Java arrays.
  11. require 'symengine'
      x = SymEngine::Symbol.new("x")
      y = SymEngine::Symbol.new("y")
      z = SymEngine::Symbol.new("z")
      f = (x - y) * (x ** y / z)
      f.expand.to_s # x**(1 + y)/z - x**y*y/z
      f == - (x**y*y/z) + (x**y*x/z) # true
  12. require 'spice_rub'
      k_pool = SpiceRub::KernelPool.instance
      k_pool.load_folder("spec/data/kernels")
      epoch = SpiceRub::Time.now
      moon = SpiceRub::Body.new(:moon)
      earth = SpiceRub::Body.new(:earth)
      earth.position_at(epoch)
      moon.distance_from(:earth, epoch) # 395791.1464913574 (Km)
  13. Acknowledgements

    • @agisga and @lokeshh for statistics with daru and statsample-glm.
    • @gau27 for spice_rub.
    • @prasunanand for NMatrix on JRuby.
    • @rajithv for symengine.rb.
    • @gnilrets, @mrkn, @zverok and all the other contributors to daru.