India must master Western science and
yet preserve its Culture and Heritage.
What India Dreams
Slide 7
Slide 7 text
No content
Slide 8
Slide 8 text
City of Pune.
Population: 6 million.
Oxford of the East.
Slide 9
Slide 9 text
Sameer Deshmukh
github.com/v0dro
@v0dro
Slide 10
Slide 10 text
No content
Slide 11
Slide 11 text
Dr. Gopal
Deshmukh
Sameer
Deshmukh
Dr. Hemchandra
Deshmukh
Dr. Satish
Deshmukh
Slide 12
Slide 12 text
www.soundcloud.com/catkamikazee
Sameer
Slide 13
Slide 13 text
No content
Slide 14
Slide 14 text
Me
Slide 15
Slide 15 text
Pune
Ruby Users
Group
www.punerb.org
@punerb
@deccanrubyconf
www.deccanrubyconf.org
Slide 16
Slide 16 text
Ruby
Science
Foundation
www.sciruby.com
@sciruby
Slide 17
Slide 17 text
No content
Slide 18
Slide 18 text
Data Analysis
in Ruby
with daru
Slide 19
Slide 19 text
daru
(Data Analysis in RUby)
Slide 20
Slide 20 text
daru
==
(Hindi)
दारू
sake
alcohol
Slide 21
Slide 21 text
library for
analysis, cleaning, manipulation and
visualization
of data.
Slide 22
Slide 22 text
Read/write many data sources
Ephemeral statistics functions
Works well with 'wild' data
Advanced Data indexing
Slide 23
Slide 23 text
Daru::Vector
Heterogeneous Array that can be indexed on
any Ruby object.
Name
Label(0)
Label(1)
Label(2)
...
Label(n-1)
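The idea of indexing on arbitrary Ruby objects can be sketched in plain Ruby. This is only an illustration of the concept, not how Daru::Vector is implemented; the LabeledVector class, its methods and the sample values are hypothetical names made up for this sketch.

```ruby
# Hypothetical illustration only: a tiny labeled vector in plain Ruby.
# This is NOT how Daru::Vector is implemented.
class LabeledVector
  attr_reader :name, :labels, :values

  def initialize(values, labels: nil, name: nil)
    labels ||= (0...values.size).to_a # default to positional labels
    raise ArgumentError, "size mismatch" unless labels.size == values.size

    @name   = name
    @labels = labels
    @values = values
    # Map each label (any hashable Ruby object) to its position.
    @position = labels.each_with_index.to_h
  end

  # Look up a value by its label.
  def [](label)
    pos = @position.fetch(label) { raise IndexError, "no label #{label.inspect}" }
    @values[pos]
  end
end

# Heterogeneous values, heterogeneous labels (made-up sample data).
vec = LabeledVector.new([42, "forty-two", 42.0],
                        labels: [:int, "string", [2016, 9]],
                        name: :answers)
```

Here vec[:int], vec["string"] and vec[[2016, 9]] each return the value stored under that label, whatever the label's class.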
Slide 24
Slide 24 text
Daru::DataFrame
2D spreadsheet-like data structure indexed by
rows or columns.
            Col0  Col1  Col2  ....  Col(n-1)
Label(0)
Label(1)
Label(2)
...
Label(n-1)
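A minimal plain-Ruby sketch of a table that can be addressed by column name or by row label. It only illustrates the concept; it is not Daru::DataFrame's implementation, and the TinyFrame class, its methods and the sample data are hypothetical.

```ruby
# Hypothetical illustration only: a minimal 2D table addressed by
# column name or by row label. Not Daru::DataFrame's implementation.
class TinyFrame
  def initialize(columns, row_labels)
    @columns    = columns # { column_name => [values] }
    @row_labels = row_labels
    @row_pos    = row_labels.each_with_index.to_h
  end

  # Fetch a whole column as { row_label => value }.
  def column(name)
    @row_labels.zip(@columns.fetch(name)).to_h
  end

  # Fetch a whole row as { column_name => value }.
  def row(label)
    pos = @row_pos.fetch(label)
    @columns.transform_values { |values| values[pos] }
  end
end

# Made-up sample data: two columns, three labeled rows.
frame = TinyFrame.new(
  { age: [3, 7, 1], kind: %w[dog cat dog] },
  %w[rex tom fido]
)
```

Storing the data column-wise keeps column access cheap, while the label-to-position map makes row access by label a single lookup.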
Slide 25
Slide 25 text
Data visualization with
Nyaplot, GnuplotRB and Gruff.
Slide 26
Slide 26 text
iruby notebook
gem install iruby
Slide 27
Slide 27 text
Browser-based Ruby REPL
for interactive computing.
Slide 28
Slide 28 text
Runs in your
browser
Input cell – accepts
Ruby code
Output cell – can
render HTML/CSS/JS
Slide 29
Slide 29 text
60% of a data analyst's time is spent on
cleaning data.
Slide 30
Slide 30 text
Acts as glue between other
SciRuby libraries
●
statsample for Statistics.
●
mixed_models for Mixed Models.
●
daru-td for Treasure Data.
●
nmatrix for efficient data storage.
Slide 31
Slide 31 text
statsample-glm
gem install statsample-glm
Slide 32
Slide 32 text
Logistic, probit, poisson, normal
regression methods in Ruby.
Slide 33
Slide 33 text
Provides an R-like formula language
for specifying regressions.
“Y ~ a+a:b+c+c:d”
Y = β0
+ a*β1
+ a*b*β2
+ c*β3
+ c*d*β4
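How a formula string maps onto the expanded sum can be sketched in plain Ruby. This is not statsample-glm's parser: it handles only "+" and ":" (interaction), and the function names, coefficient values and sample row are hypothetical choices for the illustration.

```ruby
# Hypothetical sketch: expand an R-like formula into its terms and
# evaluate the linear predictor for one observation.
# Handles only "+" and ":" (interaction); not statsample-glm's parser.
def parse_formula(formula)
  response, rhs = formula.split("~").map(&:strip)
  # Each term becomes a list of factor names; "a:b" => ["a", "b"].
  terms = rhs.split("+").map { |term| term.strip.split(":").map(&:strip) }
  [response, terms]
end

# coefficients[0] is the intercept; coefficients[i + 1] pairs with the i-th term.
def linear_predictor(terms, coefficients, row)
  total = coefficients[0]
  terms.each_with_index do |factors, i|
    product = factors.map { |f| row.fetch(f) }.reduce(:*)
    total += coefficients[i + 1] * product
  end
  total
end

response, terms = parse_formula("Y ~ a+a:b+c+c:d")
# Made-up coefficients and observation, chosen so the sum is easy to check.
coefficients = [1, 2, 3, 4, 5]
row = { "a" => 2, "b" => 3, "c" => 5, "d" => 7 }
y = linear_predictor(terms, coefficients, row)
# 1 + 2*2 + (2*3)*3 + 5*4 + (5*7)*5 = 218
```

Each interaction term such as a:b contributes the product of its factors times one coefficient, which is the expansion the slide spells out.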
Slide 34
Slide 34 text
Use Case:
Kaggle Animal Shelter Data
Slide 35
Slide 35 text
OMG I've had too
much daru!!
STOP! STOP!
Slide 36
Slide 36 text
New Ideas for better Ruby
Slide 37
Slide 37 text
“Any sufficiently advanced
technology is indistinguishable
from magic.”
Arthur C. Clarke
Slide 38
Slide 38 text
No content
Slide 39
Slide 39 text
Writing C extensions
●
FFI gem.
●
Rice.
●
SWIG.
●
Writing C bindings manually.
Slide 40
Slide 40 text
Rubyist!
Write me a C extension!
Slide 41
Slide 41 text
def factorial n
  n > 1 ? n*factorial(n-1) : 1
end
Slide 42
Slide 42 text
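/* Plain C factorial on 64-bit unsigned integers. */
unsigned long long int
calc_factorial(unsigned long long int n)
{
  return (n > 1 ? n*calc_factorial(n-1) : 1);
}

/* Ruby-facing wrapper: convert the Ruby Integer to C and back. */
static VALUE
cfactorial(VALUE self, VALUE n)
{
  return ULL2NUM(
    calc_factorial(NUM2ULL(n)));
}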
Big Problems
●
Difficult and irritating to write.
●
Time consuming to debug.
●
Tough to trace memory leaks.
●
Change mindset from high level to low
level language.
●
Need to care about small things.™*
*Matz – Keynote at Red Dot Ruby Conf 2016, Singapore.
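One example of such a "small thing" is integer width. Ruby's Integer grows as needed, while a C unsigned long long silently wraps around. The sketch below simulates the wrap-around in plain Ruby, assuming a 64-bit unsigned long long (the usual case, though C only guarantees at least 64 bits); wrapped_factorial is a hypothetical helper name.

```ruby
# Plain-Ruby factorial: Integer grows to arbitrary precision automatically.
def factorial(n)
  n > 1 ? n * factorial(n - 1) : 1
end

# Assumption: unsigned long long is exactly 64 bits wide.
ULL_MAX = 2**64 - 1

# Simulate what 64-bit unsigned arithmetic would keep:
# the true result modulo 2**64.
def wrapped_factorial(n)
  factorial(n) % 2**64
end

fits_20 = factorial(20) <= ULL_MAX # 20! still fits in 64 bits
fits_21 = factorial(21) <= ULL_MAX # 21! does not

puts "20! fits in 64 bits: #{fits_20}"
puts "21! fits in 64 bits: #{fits_21}"
puts "21! in Ruby:         #{factorial(21)}"
puts "21! wrapped to 64b:  #{wrapped_factorial(21)}"
```

The Ruby version never has to think about this limit; a C extension author does, for every numeric type that crosses the boundary.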
Slide 46
Slide 46 text
Rubex
Slide 47
Slide 47 text
Rubex is a Crystal-inspired superset of
Ruby that compiles to C.
Slide 48
Slide 48 text
class Fact
  def factorial(unsigned long long int n)
    n > 1 ? n*factorial(n-1) : 1
  end
end
Slide 49
Slide 49 text
# Create a C static array and return a Ruby Array
def adder(n)
  a = StaticArray(i32, n)
  i32 i = 0
  i32 sum = 0
  a.each(n) { a[i] = i*5 }
  for 0 <= i < n do
    sum += a[i]
  end
  sum
end
Slide 50
Slide 50 text
https://github.com/v0dro/rubex
Slide 51
Slide 51 text
Scientific Computing on
JRuby
Slide 52
Slide 52 text
NMatrix
C/C++ core
CRuby
interpreter
Numo::NArray
C core
CRuby
interpreter
Slide 53
Slide 53 text
JRuby backend for the NMatrix
Ruby API –
Sci. Computing on JVM.
Slide 54
Slide 54 text
Allows interfacing JRuby libraries
with jBLAS for performance.
Uses Apache Commons Math
library for storage and operations on
internal Java arrays.
Acknowledgements
●
@agisga and @lokeshh for statistics with daru
and statsample-glm.
●
@gau27 for spice_rub.
●
@prasunanand for NMatrix on JRuby.
●
@rajithv for symengine.rb.
●
@gnilrets, @mrkn, @zverok and all the other
contributors to daru.