Data Analysis in RUby with daru

Sameer Deshmukh
September 09, 2016

This talk was presented at RubyKaigi 2016 in Kyoto. It gives a short overview of what daru can do so far and closes with a small live demo.

The notebooks of this talk can be found at this link: https://github.com/v0dro/talks/tree/master/Ruby%20Kaigi%202016

Transcript

  1. namaste

  2. None
  3. None
  4. None
  5. None
  6. India must master Western science and yet preserve its Culture

    and Heritage. What India Dreams
  7. None
  8. City of Pune. Population: 6 million. Oxford of the East.

  9. Sameer Deshmukh github.com/v0dro @v0dro

  10. None
  11. Dr. Gopal Deshmukh Sameer Deshmukh Dr. Hemchandra Deshmukh Dr. Satish

    Deshmukh
  12. www.soundcloud.com/catkamikazee Sameer

  13. None
  14. Me

  15. Pune Ruby Users Group www.punerb.org @punerb @punerb @deccanrubyconf www.deccanrubyconf.org

  16. Ruby Science Foundation www.sciruby.com @sciruby @sciruby

  17. None
  18. Data Analysis in Ruby with daru

  19. daru (Data Analysis in RUby)

  20. daru == (Hindi) दारू sake alcohol

  21. library for analysis, cleaning, manipulation and visualization of data.

  22. • Read/write many data sources • Ephemeral statistics functions • Works well with 'wild' data • Advanced data indexing
  23. Daru::Vector Heterogeneous Array that can be indexed on any Ruby

    object. Name Label(0) Label(1) Label(2) ... Label(n-1)
  24. Daru::DataFrame 2D spreadsheet like data structure indexed by rows or

    columns. Col0 Label(0) Label(1) Label(2) ... Label(n-1) Col1 Col2 Col(n-1) ....
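The idea behind slide 23, an array whose elements are looked up by arbitrary Ruby objects instead of integer positions, can be sketched in plain Ruby. The `LabeledVector` class below is a hypothetical toy written for this note; it is not daru's implementation or API, and it only assumes that labels are usable as Hash keys.

```ruby
# Hypothetical toy class for illustration; NOT daru's Daru::Vector.
class LabeledVector
  attr_reader :name, :labels, :values

  def initialize(values, index:, name: nil)
    unless values.size == index.size
      raise ArgumentError, "index and values must have the same length"
    end

    @name = name
    @labels = index.dup
    @values = values.dup
    # Map each label (any hashable Ruby object) to its position in @values.
    @positions = {}
    index.each_with_index { |label, pos| @positions[label] = pos }
  end

  # Look up a value by its label rather than by integer position.
  def [](label)
    pos = @positions.fetch(label) { raise IndexError, "unknown label: #{label.inspect}" }
    @values[pos]
  end
end

# Values and labels may be of mixed types.
vec = LabeledVector.new([42, "forty-two", 42.0], index: [:int, "str", 3.0], name: :answers)
puts vec[:int]
puts vec["str"]
puts vec[3.0]
```

A data frame in the sense of slide 24 can then be pictured as several such vectors that share one row index, with a second index over the column names.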
  25. Data visualization with Nyaplot, GNUplotrb and Gruff.

  26. iruby notebook gem install iruby

  27. Browser based Ruby REPL for interactive computing.

  28. Runs in your browser Input cell – accepts Ruby code

    Output cell – can render HTML/CSS/JS
  29. 60% of a data analyst's time is spent on cleaning

    data.
  30. Acts as glue between other SciRuby libraries • statsample for

    Statistics. • mixed_models for Mixed Models. • daru-td for Treasure Data. • nmatrix for efficient data storage.
  31. statsample-glm gem install statsample-glm

  32. Logistic, probit, poisson, normal regression methods in Ruby.

  33. Provides an R-like formula language for specifying regressions. “Y ~

    a+a:b+c+c:d” Y = β0 + a*β1 + a*b*β2 + c*β3 + c*d*β4
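How the formula on slide 33 turns into the linear expression next to it can be sketched in a few lines of plain Ruby. This is a toy written for this note, not statsample-glm's parser or API: `parse_formula` and `linear_predictor` are hypothetical helpers, and only `+` (separate terms) and `:` (interaction, treated as a product) are handled.

```ruby
# Toy illustration only: NOT statsample-glm's parser or API.
# Supports just `+` (separate terms) and `:` (interaction, i.e. a product).

# Split "Y ~ a+a:b+c+c:d" into the response name and a list of terms,
# where each term is the list of variables that get multiplied together.
def parse_formula(formula)
  response, rhs = formula.split("~").map(&:strip)
  terms = rhs.split("+").map { |term| term.strip.split(":").map { |v| v.strip.to_sym } }
  [response.to_sym, terms]
end

# Evaluate b0 + sum over terms of (coefficient * product of the term's variables)
# for a single row of data given as a Hash.
def linear_predictor(terms, betas, row)
  unless betas.size == terms.size + 1
    raise ArgumentError, "need one coefficient per term plus an intercept"
  end

  intercept, *slopes = betas
  terms.zip(slopes).sum(intercept) do |vars, beta|
    beta * vars.map { |v| row.fetch(v) }.reduce(1, :*)
  end
end

response, terms = parse_formula("Y ~ a+a:b+c+c:d")
row = { a: 2, b: 3, c: 4, d: 5 }
# Coefficients b0..b4 are made-up integers chosen to keep the arithmetic exact.
prediction = linear_predictor(terms, [1, 2, 3, 4, 5], row)
puts "#{response} = #{prediction}"
```

With these made-up values the sum is 1 + 2*2 + 3*(2*3) + 4*4 + 5*(4*5) = 139, which matches the term-by-term layout of the equation on the slide.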
  34. Use Case: Kaggle Animal Shelter Data

  35. OMG I've had too much daru!! STOP! STOP!

  36. New Ideas for better Ruby

  37. “Any sufficiently advanced technology is indistinguishable from magic.” - Arthur C. Clarke
  38. None
  39. Writing C extensions • FFI gem. • Rice. • SWIG.

    • Writing C bindings manually.
  40. Rubyist! Write me a C extension!

  41. def factorial(n)
        n > 1 ? n * factorial(n - 1) : 1
      end
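  42. #include <ruby.h>

      /* Plain C recursion; the result wraps around once n! no longer fits the type. */
      unsigned long long int calc_factorial(unsigned long long int n) {
        return (n > 1 ? n * calc_factorial(n - 1) : 1);
      }

      /* Ruby-visible wrapper: convert the argument to C, compute, convert back. */
      static VALUE cfactorial(VALUE self, VALUE n) {
        return ULL2NUM(calc_factorial(NUM2ULL(n)));
      }
  43. void Init_factorial() {
        VALUE cFact = rb_define_class("Fact", rb_cObject);
        rb_define_method(cFact, "factorial", cfactorial, 1);
      }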
  44. a = Fact.new
      a.factorial(8000)
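One consequence of moving the factorial into C is worth spelling out: Ruby Integers grow as needed, while `unsigned long long` wraps around. The sketch below emulates that wrap-around in plain Ruby, assuming a 64-bit `unsigned long long` as on common platforms. `exact_factorial` and `wrapped_factorial` are hypothetical helper names used only here.

```ruby
# Assumption: unsigned long long is 64 bits wide, as on common platforms.
UINT64_MASK = (1 << 64) - 1

# Ruby Integers are arbitrary precision, so this is the exact value of n!.
def exact_factorial(n)
  (2..n).reduce(1, :*)
end

# Emulates unsigned 64-bit arithmetic by discarding the high bits after
# every multiplication, which is what the C multiplication does implicitly.
def wrapped_factorial(n)
  (2..n).reduce(1) { |acc, k| (acc * k) & UINT64_MASK }
end

[20, 21, 8000].each do |n|
  same = exact_factorial(n) == wrapped_factorial(n)
  puts "n=#{n}: 64-bit result #{same ? 'matches' : 'differs from'} the exact value"
end
```

20! is the largest factorial that fits in 64 bits, so the two functions agree up to n = 20 and diverge from n = 21 onwards. For n = 8000 the emulated 64-bit result is 0, because the product contains far more than 64 factors of two, so a C version of this kind is fast but does not compute the same number as the pure Ruby method for large inputs.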

  45. Big Problems • Difficult and irritating to write. • Time

    consuming to debug. • Tough to trace memory leaks. • Change mindset from high level to low level language. • Need to care about small things.™* *Matz – Keynote at Red Dot Ruby Conf 2016, Singapore.
  46. Rubex

  47. Rubex is a Crystal-inspired superset of Ruby that compiles to

    C.
  48. class Fact
        def factorial(unsigned long long int n)
          n > 1 ? n*factorial(n-1) : 1
        end
      end
  49. # Create a C static array and return a Ruby Array
      def adder(n)
        a = StaticArray(i32, n)
        i32 i = 0
        i32 sum = 0
        a.each(n) { a[i] = i*5 }
        for 0 <= i < n do
          sum += a[i]
        end
        sum
      end
  50. https://github.com/v0dro/rubex

  51. Scientific Computing on JRuby

  52. NMatrix C/C++ core CRuby interpreter Numo::NArray C core CRuby interpreter

  53. JRuby backend for the NMatrix Ruby API – Sci. Computing

    on JVM.
  54. Allows interfacing JRuby libraries with jBLAS for performance. Uses Apache

    Commons Math library for storage and operations on internal Java arrays.
  55. https://github.com/prasunanand/nmatrix/tree/jruby_port

  56. Symbolic Computation in Ruby with symengine.rb

  57. (x - y) * (x ** y / z)

  58. require 'symengine'
      x = SymEngine::Symbol.new("x")
      y = SymEngine::Symbol.new("y")
      z = SymEngine::Symbol.new("z")
      f = (x - y) * (x ** y / z)
      f.expand.to_s # x**(1 + y)/z - x**y*y/z
      f == - (x**y*y/z) + (x**y*x/z) # true
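The expansion shown on slide 58 can be spot-checked numerically without symengine.rb. The sketch below is plain Ruby and does not use or imitate the symengine API. `original_form` and `expanded_form` are hypothetical helpers, and the sample values are arbitrary, with `Rational` used so that the comparison is exact.

```ruby
# Plain-Ruby spot check; it does not use or imitate symengine.rb.

# The expression as written on the slide.
def original_form(x, y, z)
  (x - y) * (x**y / z)
end

# The expanded form reported by f.expand on the slide.
def expanded_form(x, y, z)
  x**(1 + y) / z - x**y * y / z
end

# Rational x and z keep the arithmetic exact; y stays an Integer exponent.
samples = [
  [Rational(3), 2, Rational(5)],
  [Rational(7, 2), 3, Rational(-4)],
  [Rational(-2), 5, Rational(1, 3)]
]

samples.each do |x, y, z|
  agree = original_form(x, y, z) == expanded_form(x, y, z)
  puts "x=#{x}, y=#{y}, z=#{z}: forms agree? #{agree}"
end
```

Agreement on a handful of points is a sanity check, not a proof. The symbolic comparison on the slide is what establishes the identity in general.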
  59. https://github.com/symengine/symengine.rb

  60. Ruby in Space

  61. NASA SPICE Ruby wrapper spice_rub

  62. require 'spice_rub'
      k_pool = SpiceRub::KernelPool.instance
      k_pool.load_folder("spec/data/kernels")
      epoch = SpiceRub::Time.now
      moon = SpiceRub::Body.new(:moon)
      earth = SpiceRub::Body.new(:earth)
      earth.position_at(epoch)
      moon.distance_from(:earth, epoch) # 395791.1464913574 (Km)
  63. https://github.com/gau27/spice_rub

  64. Cool SciRuby Stickers

  65. Acknowledgements • @agisga and @lokeshh for statistics with daru and

    statsample-glm. • @gau27 for spice_rub. • @prasunanand for NMatrix on JRuby. • @rajithv for symengine.rb. • @gnilrets, @mrkn, @zverok and all the other contributors to daru.
  66. Thank You Ruby Kaigi

  67. Any questions?