Data Analysis in Ruby with daru

Sameer Deshmukh
September 09, 2016

This talk was presented at RubyKaigi 2016 in Kyoto. It gives a short overview of what daru can do so far and ends with a small demo.

The notebooks of this talk can be found at this link: https://github.com/v0dro/talks/tree/master/Ruby%20Kaigi%202016

Transcript

  1. Me

  2. Daru::Vector: heterogeneous array that can be indexed on any Ruby object.
    [Diagram: Name, Label(0), Label(1), Label(2), ..., Label(n-1)]
  3. Daru::DataFrame: 2D spreadsheet-like data structure indexed by rows or columns.
    [Diagram: Col0, Col1, Col2, ..., Col(n-1) against Label(0), Label(1), Label(2), ..., Label(n-1)]
  4. Runs in your browser
    • Input cell: accepts Ruby code.
    • Output cell: can render HTML/CSS/JS.
  5. Acts as glue between other SciRuby libraries
    • statsample for Statistics.
    • mixed_models for Mixed Models.
    • daru-td for Treasure Data.
    • nmatrix for efficient data storage.
  6. Provides an R-like formula language for specifying regressions.
    "Y ~ a+a:b+c+c:d"
    Y = β0 + a*β1 + a*b*β2 + c*β3 + c*d*β4
  7. unsigned long long int calc_factorial(unsigned long long int n) {
       return (n > 1 ? n*calc_factorial(n-1) : 1);
     }

     static VALUE cfactorial(VALUE self, VALUE n) {
       return ULL2NUM(calc_factorial(NUM2ULL(n)));
     }
  8. Big Problems
    • Difficult and irritating to write.
    • Time consuming to debug.
    • Tough to trace memory leaks.
    • Change mindset from high level to low level language.
    • Need to care about small things.™*
    *Matz – Keynote at Red Dot Ruby Conf 2016, Singapore.
  9. # Create a C static array and return a Ruby Array
     def adder(n)
       a = StaticArray(i32, n)
       i32 i = 0
       i32 sum = 0
       a.each(n) { a[i] = i*5 }
       for 0 <= i < n do
         sum += a[i]
       end
       sum
     end
  10. Allows interfacing JRuby libraries with jBLAS for performance. Uses Apache Commons Math library for storage and operations on internal Java arrays.
  11. require 'symengine'
      x = SymEngine::Symbol.new("x")
      y = SymEngine::Symbol.new("y")
      z = SymEngine::Symbol.new("z")
      f = (x - y) * (x ** y / z)
      f.expand.to_s # x**(1 + y)/z - x**y*y/z
      f == - (x**y*y/z) + (x**y*x/z) # true
  12. require 'spice_rub'
      k_pool = SpiceRub::KernelPool.instance
      k_pool.load_folder("spec/data/kernels")
      epoch = SpiceRub::Time.now
      moon = SpiceRub::Body.new(:moon)
      earth = SpiceRub::Body.new(:earth)
      earth.position_at(epoch)
      moon.distance_from(:earth, epoch) # 395791.1464913574 (Km)
  13. Acknowledgements
    • @agisga and @lokeshh for statistics with daru and statsample-glm.
    • @gau27 for spice_rub.
    • @prasunanand for NMatrix on JRuby.
    • @rajithv for symengine.rb.
    • @gnilrets, @mrkn, @zverok and all the other contributors to daru.
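
Slides 2 and 3 describe data structures that are indexed by labels rather than by integer position. The idea can be sketched in plain Ruby as below. This is only an illustration of the concept and not how daru implements it: LabeledVector and LabeledFrame are hypothetical names, and the sample values are made up.

```ruby
# Plain-Ruby sketch of label-based indexing. LabeledVector and LabeledFrame
# are hypothetical classes for illustration; they are not part of daru.
class LabeledVector
  attr_reader :name, :labels, :values

  def initialize(values, labels:, name: nil)
    raise ArgumentError, "labels and values differ in length" unless labels.size == values.size

    @name = name
    @labels = labels
    @values = values
    # Map each label (any hashable Ruby object) to its position.
    @positions = labels.each_with_index.to_h
  end

  # Look up a value by label rather than by integer position.
  def [](label)
    @values.fetch(@positions.fetch(label))
  end
end

# A frame is a set of named columns that share one list of row labels.
class LabeledFrame
  def initialize(columns, labels:)
    @columns = columns.map do |name, values|
      [name, LabeledVector.new(values, labels: labels, name: name)]
    end.to_h
  end

  # Column access by name.
  def [](name)
    @columns.fetch(name)
  end

  # Row access by label, returned as a hash of column name => value.
  def row(label)
    @columns.map { |name, vector| [name, vector[label]] }.to_h
  end
end

# Values may be heterogeneous and labels may be any Ruby object.
ages = LabeledVector.new([31, 27, "unknown"], labels: [:alice, "bob", 2016], name: :age)
frame = LabeledFrame.new({ age: [31, 27], city: ["Kyoto", "Pune"] }, labels: [:alice, :bob])
```

With these definitions, `ages["bob"]` looks up a value by a string label, `frame[:city][:alice]` reads one cell by column and row label, and `frame.row(:bob)` returns a whole row.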
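
The formula on slide 6 can be read mechanically: `+` separates terms and `:` marks an interaction, which is the product of the variables involved. The sketch below handles only that small subset of the syntax in plain Ruby. parse_formula and predict are hypothetical helpers, not the API of daru or statsample, and the coefficient values are invented for the example.

```ruby
# Parse a tiny subset of the formula syntax: "+" separates terms and ":"
# marks an interaction. parse_formula and predict are hypothetical helpers
# written for illustration only.
def parse_formula(formula)
  response, rhs = formula.split("~").map(&:strip)
  terms = rhs.split("+").map { |term| term.split(":").map { |v| v.strip.to_sym } }
  [response.to_sym, terms]
end

# Evaluate Y = β0 + Σ βk * (product of the variables in term k) for one row.
def predict(terms, betas, row)
  intercept, *slopes = betas
  terms.zip(slopes).sum(intercept) do |vars, beta|
    beta * vars.map { |v| row.fetch(v) }.reduce(:*)
  end
end

response, terms = parse_formula("Y ~ a+a:b+c+c:d")
# terms is [[:a], [:a, :b], [:c], [:c, :d]]

row = { a: 2.0, b: 3.0, c: 1.0, d: 4.0 }
betas = [1.0, 0.5, 2.0, -1.0, 0.25] # β0..β4, made-up values
y = predict(terms, betas, row)
# 1.0 + 2*0.5 + 2*3*2.0 + 1*(-1.0) + 1*4*0.25 = 14.0
```

Each parsed term lines up with one coefficient, so the four terms of the slide's formula correspond to β1 through β4, with β0 as the intercept.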