Having pushed Ruby to the limits of what it can do for number crunching and data analysis, we looked around for another platform for analysis and modelling. We quickly found that, with tools like NumPy, pandas and the IPython Notebook, and newer packages like Blaze, Python looked like a good fit.
Porting a large existing codebase and its accompanying infrastructure from the Ruby ecosystem to Python simply wasn't an option, so we had to do something clever (and fun!).
This talk is about how we managed to leverage the power of Python while keeping our modelling code in Ruby (and opening up the opportunity to support other languages) by embracing Lisp’s code-is-data philosophy.
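To make the code-is-data idea concrete, here is a minimal sketch, assuming the Ruby side serialises an expression tree as nested JSON arrays and the Python side evaluates it. The operator names, the `OPS` table and the `evaluate` function are hypothetical illustrations, not the system described in the talk.

```python
import json
import operator

# Hypothetical operator table. A real system would presumably map
# these names onto NumPy or pandas operations instead.
OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "div": operator.truediv,
}


def evaluate(expr, env):
    """Recursively evaluate a nested-list expression tree."""
    if isinstance(expr, list):
        # A list is an operator node: [name, arg1, arg2, ...]
        op, *args = expr
        return OPS[op](*(evaluate(arg, env) for arg in args))
    if isinstance(expr, str):
        # A bare string is treated as a variable name.
        return env[expr]
    # Anything else is a numeric literal.
    return expr


# The kind of payload a Ruby client might send (assumed wire format).
payload = '["mul", ["add", "price", 2], "quantity"]'
tree = json.loads(payload)
result = evaluate(tree, {"price": 10, "quantity": 3})
print(result)
```

Because the expression is plain data, any language that can emit JSON could build the same tree, which is what leaves the door open for clients other than Ruby.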
Specifically, this talk covers:
* Creating expression trees (like a Lisp s-expression) in another language (like Ruby), furthering the ideas of ActiveRecord and Django’s ORM.
* Performing expression tree rewrites similar to a compiler.
* Automatically identifying sub-expressions that can run concurrently.
* Implementing pluggable storage systems (HDF5 files, Postgres/MySQL and Riak).
* Customising the IPython Notebook as a prototyping and debugging tool. We use it to visualise executed and unexecuted expression trees to find bottlenecks and errors.
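
As a rough illustration of the rewriting and concurrency points above, the sketch below folds constant sub-expressions (a simple compiler-style rewrite) and then evaluates the independent arguments of the root node in a thread pool. The tuple representation and the names `fold_constants`, `evaluate` and `parallel_evaluate` are assumptions made for this example, not the implementation presented in the talk.

```python
import operator
from concurrent.futures import ThreadPoolExecutor

# Hypothetical operator table for the example.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def fold_constants(expr):
    """Rewrite pass: replace operator nodes whose arguments are all
    numeric literals with their computed value."""
    if not isinstance(expr, tuple):
        return expr
    op, *args = expr
    args = [fold_constants(arg) for arg in args]
    if all(isinstance(arg, (int, float)) for arg in args):
        return OPS[op](*args)
    return (op, *args)


def evaluate(expr, env):
    """Sequentially evaluate a tuple-based expression tree."""
    if isinstance(expr, tuple):
        op, *args = expr
        return OPS[op](*(evaluate(arg, env) for arg in args))
    if isinstance(expr, str):
        return env[expr]  # bare strings are variable names
    return expr


def parallel_evaluate(expr, env):
    """Evaluate the root's arguments concurrently. Sibling sub-trees
    share no intermediate results, so they are independent."""
    if not isinstance(expr, tuple):
        return evaluate(expr, env)
    op, *args = expr
    with ThreadPoolExecutor() as pool:
        values = list(pool.map(lambda arg: evaluate(arg, env), args))
    return OPS[op](*values)


tree = ("add", ("mul", "x", ("add", 1, 2)), ("sub", "y", ("mul", 2, 5)))
folded = fold_constants(tree)
result = parallel_evaluate(folded, {"x": 4, "y": 12})
print(folded, result)
```

Only the top level is parallelised here, to keep the sketch short and to avoid nested tasks waiting on each other inside a bounded pool. A fuller scheduler would walk the whole tree and dispatch every sub-expression whose inputs are ready.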