Pycon ZA
October 03, 2014

PyConZA 2014: "Inspired by Lisp to get Ruby to talk Python" by Martin Pretorius

Having pushed Ruby to the limits of what it can accomplish in number crunching and data analysis, we looked around for another solution in the data analysis and modelling space. We quickly found that, with established tools like NumPy, pandas and the IPython Notebook, and newer packages like Blaze, Python looked like a good fit.

Porting a large existing codebase and its accompanying infrastructure from the Ruby ecosystem to the Python ecosystem simply wasn't an option, so we had to do something clever (and fun!).

This talk is about how we harnessed the power of Python while keeping our modelling code in Ruby (and opening the door to other languages) by embracing Lisp’s code-is-data philosophy.
Specifically, this talk covers:

* Creating expression trees (like a Lisp s-expression) in another language (like Ruby), furthering the ideas of ActiveRecord and Django’s ORM.
* Performing expression tree rewrites similar to a compiler.
* Automatically identifying sub-expressions that can run concurrently.
* Implementing pluggable storage systems (HDF5 files, Postgres/MySQL and Riak).
* Customising the IPython Notebook as a prototyping and debugging tool. We use it to visualise executed and unexecuted expression trees to find bottlenecks and errors.
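
As a rough illustration of the third point above, the sketch below treats an expression tree as nested Python tuples and evaluates the direct operands of the root node concurrently, since sibling sub-expressions do not depend on one another. The tuple encoding, the `OPS` table and both function names are assumptions made for this example; the abstract does not describe the actual implementation.

```python
import operator
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical operator table, invented for this sketch.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate_serial(expr):
    """Evaluate a nested-tuple expression depth-first in the calling thread."""
    if not isinstance(expr, tuple):
        return expr  # a leaf is plain data
    op, *args = expr
    values = [evaluate_serial(arg) for arg in args]
    return reduce(OPS[op], values)


def evaluate_concurrent(expr, max_workers=4):
    """Evaluate the root's operands in parallel, then combine the results."""
    if not isinstance(expr, tuple):
        return expr
    op, *args = expr
    # Sibling operands share no state, so they can be evaluated independently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        values = list(pool.map(evaluate_serial, args))
    return evaluate_serial((op, *values))


tree = ("+", ("*", 2, 3), ("*", 4, 5), ("-", 10, 4))
result = evaluate_concurrent(tree)
print(result)
```

Threads keep the sketch short. For CPU-bound work, separate processes or worker machines would be the more likely choice.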

Transcript

  1. (Inspired by Lisp to get Ruby to talk Python)

    [Title slide set over a backdrop of truncated code fragments in three languages: a Ruby Benchmark test case (class TestBenchmark), Common Lisp Redis client functions (format-redis-number, format-redis-string, ensure-string) and a Python _MergeOperation class that merges two DataFrame objects.]
  2. Martin Pretorius

    • I’m an electronic engineer
    • I work for Intellection Software
    • I have a few years of Python experience
    • I have a few months of Ruby experience
    • I have no Lisp experience
  3. Ruby

    [Slide shows a truncated excerpt of Ruby test code: require 'test/unit' and require 'benchmark', followed by a TestBenchmark class with the lambdas BENCH_FOR_TIMES_UPTO and BENCH_FOR_TIMES_UPTO_NO_LABEL, the helpers labels, bench, capture_output and capture_bench_output, the tests test_tms_outputs_nicely and test_tms_wont_modify_the_format_String_given, and the expected-output constant BENCHMARK_OUTPUT_WITH_TOTAL_AVG.]
  4. Lisp

    [Slide shows a truncated excerpt of Common Lisp code from a Redis client: the functions format-redis-number, format-redis-string and ensure-string, several define-condition forms for Redis errors (including errors that break the connection stream and offer a :RECONNECT restart), and a defgeneric/defmethod pair for sending commands to the server.]
  5. InsightOut

    • SaaS market research analytics platform
    • Requires real-time statistics
    • Highly dimensional data
    • It needs to be fast
  6. • Build up an expression tree, in Ruby
     • We send the tree to a cluster of Python workers
     • We rewrite the tree so that Python can understand it
     • We execute the tree
     • We return the result to Ruby
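
The five steps on this slide can be sketched end to end in a few lines. This is a minimal sketch under stated assumptions, not the speaker's implementation: the client side is written in Python rather than Ruby for brevity, JSON is assumed as the wire format, and the `Expr` class, the operator names and the `rewrite`/`execute` functions are all hypothetical.

```python
import json
import operator


class Expr:
    """Client-side node that records an operation instead of performing it."""

    def __init__(self, op, *args):
        self.op = op
        self.args = args

    def __add__(self, other):
        return Expr("add", self, other)

    def __mul__(self, other):
        return Expr("multiply", self, other)

    def to_tree(self):
        """Convert to nested lists, an s-expression-like wire format."""
        return [self.op] + [
            arg.to_tree() if isinstance(arg, Expr) else arg for arg in self.args
        ]


# Hypothetical mapping from wire-format operator names to Python callables.
PYTHON_OPS = {"add": operator.add, "multiply": operator.mul}


def rewrite(tree):
    """Worker side: swap operator names for Python callables, recursively."""
    if not isinstance(tree, list):
        return tree
    return [PYTHON_OPS[tree[0]]] + [rewrite(arg) for arg in tree[1:]]


def execute(tree):
    """Worker side: evaluate a rewritten tree bottom-up."""
    if not isinstance(tree, list):
        return tree
    func, *args = tree
    return func(*(execute(arg) for arg in args))


# 1. Build the tree on the client; nothing is computed yet.
tree = (Expr("add", 1, 2) * 4).to_tree()
# 2. Send it to a worker (here just a JSON round trip).
received = json.loads(json.dumps(tree))
# 3. and 4. Rewrite the tree, then execute it.
result = execute(rewrite(received))
# 5. Return the result to the client.
reply = json.dumps({"result": result})
print(tree, reply)
```

Because the tree is plain data, any language that can produce the same nested lists could act as the client.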
  7. Languages in which program code is represented as the language's fundamental data type are called homoiconic.
  8. • Symbolic expression
     • The first element is an operator
     • All remaining elements are data
     • s-expressions can be nested
     • Every element can be replaced with the value it evaluates to
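
The rules on this slide can be mimicked with nested Python lists. The sketch below is an illustration only: the list encoding, the `OPERATORS` table and the `evaluate` function are assumptions for this example and are not taken from the talk.

```python
import operator
from functools import reduce

# Hypothetical operator table: the first element of each list is looked up here.
OPERATORS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(sexp):
    """Evaluate a nested-list s-expression."""
    if not isinstance(sexp, list):
        return sexp  # plain data evaluates to itself
    op, *args = sexp  # first element is the operator, the rest are data
    values = [evaluate(arg) for arg in args]  # nested expressions are evaluated first
    return reduce(OPERATORS[op], values)


nested = ["+", 1, ["*", 2, 3]]
# Replacing a sub-expression with the value it evaluates to leaves the result unchanged.
substituted = ["+", 1, evaluate(["*", 2, 3])]
print(evaluate(nested), substituted)
```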
  9. • Singly-linked lists
     • Built up from ‘cons’ cells
     • nil-terminated
     • We create a list using the ‘list’ operator
     • ‘car’ returns the first element of the cons cell (head)
     • ‘cdr’ returns the rest (tail)
     • ‘last’ returns the last cons cell in the list
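
For readers more at home in Python, the list operations on this slide can be approximated as follows. This is only a sketch: the `Cons` class and the helper names are invented for the example, and `None` stands in for Lisp's nil.

```python
class Cons:
    """A cons cell: a pair holding one element (car) and the rest of the list (cdr)."""

    def __init__(self, car, cdr=None):
        self.car = car
        self.cdr = cdr  # None plays the role of nil


def make_list(*items):
    """Rough equivalent of Lisp's 'list': chain cons cells, ending in None."""
    head = None
    for item in reversed(items):
        head = Cons(item, head)
    return head


def car(cell):
    return cell.car


def cdr(cell):
    return cell.cdr


def last(cell):
    """Return the last cons cell, not the last element."""
    while cell.cdr is not None:
        cell = cell.cdr
    return cell


def to_python(cell):
    """Helper for display: collect the elements into a Python list."""
    items = []
    while cell is not None:
        items.append(cell.car)
        cell = cell.cdr
    return items


numbers = make_list(1, 1, 2, 3)
print(car(numbers), to_python(cdr(numbers)), to_python(last(numbers)))
```

Note that `last` returns a cell, so the final element is `car(last(numbers))`.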
  10. * (1 1 2 3)

    ; in: 1 1
    ;     (1 1 2 3)
    ;
    ; caught ERROR:
    ;   illegal function call
    ;
    ; compilation unit finished
    ;   caught 1 ERROR condition

    debugger invoked on a SB-INT:COMPILED-PROGRAM-ERROR:
      Execution of a form compiled with errors.
    Form:
      (1 1 2 3)
    Compile-time error:
      illegal function call
  11. '(1 1 2 3 5 8 13 26)

    (setf a '(1 1 2 3 5 8 13 26))
    a
    > (1 1 2 3 5 8 13 26)
  12. '(1 1 2 3 5 8 13 26)

    (setf a '(1 1 2 3 5 8 13 26))
    a
    > (1 1 2 3 5 8 13 26)

    (last a)
    > (26)
  13. '(1 1 2 3 5 8 13 26)

    (setf a '(1 1 2 3 5 8 13 26))
    a
    > (1 1 2 3 5 8 13 26)

    (last a)
    > (26)

    (car (last a))
    > 26
  14. '(1 1 2 3 5 8 13 26)

    (setf a '(1 1 2 3 5 8 13 26))
    a
    > (1 1 2 3 5 8 13 26)

    (last a)
    > (26)

    (car (last a))
    > 26

    (setf (car (last a)) 25)
    a
    > (1 1 2 3 5 8 13 25)
  15. '(+ 1 2)
    > (+ 1 2)

    (setf a '(+ 1 2))
    a
    > (+ 1 2)
  16. '(+ 1 2)
    > (+ 1 2)

    (setf a '(+ 1 2))
    a
    > (+ 1 2)

    (eval a)
    > 3
  17. '(+ 1 2)
    > (+ 1 2)

    (setf a '(+ 1 2))
    a
    > (+ 1 2)

    (eval a)
    > 3

    (setf (car (last a)) 3)
    a
    > (+ 1 3)
  18. '(+ 1 2)
    > (+ 1 2)

    (setf a '(+ 1 2))
    a
    > (+ 1 2)

    (eval a)
    > 3

    (setf (car (last a)) 3)
    a
    > (+ 1 3)

    (eval a)
    > 4
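
Python is not homoiconic, but its standard `ast` module allows a loose analogue of the session above: parse an expression into a tree, evaluate it, modify the tree as data, and evaluate it again. The sketch below is an illustration added for comparison and is not part of the talk; it assumes Python 3.8 or later.

```python
import ast

# Parse the source text "1 + 2" into a syntax tree (roughly the quoted '(+ 1 2)).
tree = ast.parse("1 + 2", mode="eval")
before = eval(compile(tree, filename="<ast>", mode="eval"))

# Treat the code as data: replace the right operand 2 with 3.
tree.body.right = ast.Constant(value=3)
ast.fix_missing_locations(tree)  # the new node needs line and column info to compile

after = eval(compile(tree, filename="<ast>", mode="eval"))
print(before, after)
```

The extra parse and compile steps show the gap: in Lisp the list is already the program, whereas Python needs a separate tree representation.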
  19. “One of the most important and fascinating of all computer languages is Lisp (standing for "List Processing"), which was invented by John McCarthy around the time Algol was invented.”

    –Douglas Hofstadter, Gödel, Escher, Bach
  20. tap

  21. What do we gain

    • Lock down our modelling language
    • Manipulate our code as data
    • Use other languages and packages