PyConZA 2014: "Inspired by Lisp to get Ruby to talk Python" by Martin Pretorius

Pycon ZA
October 03, 2014

Having pushed Ruby to the limits of what it can accomplish in number crunching and data analysis, we looked around for another solution in the data analysis and modelling space. We quickly found that, with packages and tools like NumPy, pandas, the IPython Notebook and newer packages like Blaze, Python looked like a good fit.

Porting a large existing codebase and its accompanying infrastructure from a Ruby to a Python ecosystem simply wasn't an option, so we had to do something clever (and fun!).

This talk will be about how we managed to leverage the power of Python while retaining our modelling code in Ruby (and opening up opportunities for other languages) by embracing Lisp’s code-is-data philosophy.
Specifically, this talk covers:

* Creating expression trees (like Lisp s-expressions) in another language (like Ruby), building on the ideas behind ActiveRecord and Django’s ORM.
* Performing expression tree rewrites similar to a compiler.
* Automatically identifying sub-expressions that can run concurrently.
* Implementing pluggable storage systems (HDF5 files, Postgres/MySQL and Riak).
* Customising the IPython Notebook as a prototyping and debugging tool. We use it to visualise executed and unexecuted expression trees to find bottlenecks and errors.
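
The concurrency bullet can be made concrete with a small sketch. It assumes, purely for illustration, that an expression tree is a nested Python tuple of the form (operator, arg1, arg2, ...) and that OPS is a hypothetical operator table; the talk does not show its real representation. Any node whose arguments are all plain values has no pending dependencies, so all such nodes can run at the same time, and the tree is evaluated bottom-up one batch of independent nodes at a time.

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Hypothetical operator table; the talk does not show its real operators.
OPS = {
    "+": lambda *xs: sum(xs),
    "*": lambda *xs: math.prod(xs),
    "-": lambda a, b: a - b,
}


def is_ready(node):
    """A node is ready when none of its arguments is still an unevaluated node."""
    return isinstance(node, tuple) and not any(isinstance(a, tuple) for a in node[1:])


def ready_nodes(tree, found):
    """Collect every ready node; these have no pending dependencies on each other."""
    if isinstance(tree, tuple):
        if is_ready(tree):
            found.append(tree)
        else:
            for arg in tree[1:]:
                ready_nodes(arg, found)
    return found


def substitute(tree, values):
    """Replace each evaluated node with its computed value."""
    if not isinstance(tree, tuple):
        return tree
    if tree in values:
        return values[tree]
    return (tree[0],) + tuple(substitute(a, values) for a in tree[1:])


def run_node(node):
    """Apply a ready node's operator to its (already evaluated) arguments."""
    op, *args = node
    return OPS[op](*args)


def evaluate_concurrently(tree, pool):
    """Evaluate bottom-up, one batch of independent nodes at a time."""
    while isinstance(tree, tuple):
        batch = ready_nodes(tree, [])
        results = pool.map(run_node, batch)  # every node in the batch runs concurrently
        tree = substitute(tree, dict(zip(batch, results)))
    return tree


expr = ("+", ("*", 3, 6), ("-", 10, 4), 1)
print(ready_nodes(expr, []))  # [('*', 3, 6), ('-', 10, 4)]
with ThreadPoolExecutor() as pool:
    print(evaluate_concurrently(expr, pool))  # 25
```

Threads keep the sketch short, but pure-Python arithmetic gains nothing under the GIL; the batching idea matters more when each node dispatches real work, for example to separate worker processes.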

Transcript

  1. (Inspired by Lisp to get Ruby to talk Python)

    [Title slide. Background: fragments of Ruby (a Benchmark unit test), Common Lisp (a Redis client) and Python (pandas' _MergeOperation class) source code]
  2. Martin Pretorius • I’m an electronic engineer • I work for Intellection Software • I have a few years of Python experience • I have a few months of Ruby experience • I have no Lisp experience
  3. A word of caution

  4. Today you’ll see some…

  5. Ruby

    [Fragments of Ruby source code: a Benchmark unit test]
  6. Lisp

    [Fragments of Common Lisp source code: a Redis client]
  7. Today’s talk • Background • What inspired us • Some awesome stuff
  8. Background

  9. InsightOut • SaaS market research analytics platform • Requires real-time statistics • High-dimensional data • It needs to be fast
  10. [image slide, no text]
  11. • Ruby is slow at number crunching • There are no good data-analysis libraries and tools for Ruby
  12. [image slide, no text]
  13. but…

  14. We can’t rewrite everything in Python

  15. Basically, we want to write Ruby and execute Python

  16. How do we do this?

  17. • We build up an expression tree in Ruby • We send the tree to a cluster of Python workers • We rewrite the tree so that Python can understand it • We execute the tree • We return the result to Ruby
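
The five steps on slide 17 can be sketched end to end in Python. This is a minimal illustration, not the talk's implementation: it assumes the Ruby side serialises the tree as nested JSON arrays, and the operator names add and mul and the REWRITES table are hypothetical. The worker parses the message, rewrites operator names into Python callables, executes the tree and serialises the result to send back.

```python
import json
import operator

# Hypothetical, language-neutral operator names that a Ruby client might emit.
REWRITES = {"add": operator.add, "mul": operator.mul}


def rewrite(tree):
    """Swap operator names for Python callables so the worker can run the tree."""
    if isinstance(tree, list):
        head, *args = tree
        return [REWRITES[head]] + [rewrite(a) for a in args]
    return tree


def execute(tree):
    """Evaluate a rewritten tree depth-first."""
    if isinstance(tree, list):
        func, *args = tree
        return func(*(execute(a) for a in args))
    return tree


def handle(message):
    """Worker entry point: parse, rewrite, execute, then serialise the result."""
    tree = json.loads(message)
    return json.dumps(execute(rewrite(tree)))


# Stand-in for what the Ruby side might send over the wire.
message = json.dumps(["add", 1, ["mul", 3, 6]])
print(handle(message))  # 19
```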
  18. [image slide, no text]
  19. beanz = CoffeeBeans.objects.filter(country="Kenya")

  20. beanz = CoffeeBeans.objects.filter(country="Kenya").exclude(roast="dark")

  21. beanz = CoffeeBeans.objects.filter(country="Kenya").exclude(roast="dark").order_by("price")

  22. beanz = CoffeeBeans.objects.filter(country="Kenya").exclude(roast="dark").order_by("price")
    beanz[0]

  23. SELECT * FROM CoffeeBeans WHERE country_of_origin = 'Kenya' AND roast != 'dark' ORDER BY price;
  24. But how does it work?

  25. beanz = CoffeeBeans.objects.filter(country="Kenya")

  26. beanz = CoffeeBeans.objects.filter(country="Kenya").exclude(roast="dark")

  27. beanz = CoffeeBeans.objects.filter(country="Kenya").exclude(roast="dark").order_by("price")
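
One way to see how a query set can be built up without touching the database is to record each call as a node in a tree and only compile that tree when the result is needed. The Query class below is a hypothetical, Django-inspired sketch, not Django's implementation: it emits unescaped SQL and treats each excluded field as a separate != condition, which is simpler than Django's exclude semantics and ignores NULL handling.

```python
class Query:
    """Hypothetical, Django-inspired lazy query. Each call wraps the previous
    tree in a new node; no SQL exists until to_sql() walks the tree."""

    def __init__(self, tree):
        self.tree = tree

    def filter(self, **conds):
        return Query(("filter", self.tree, conds))

    def exclude(self, **conds):
        return Query(("exclude", self.tree, conds))

    def order_by(self, field):
        return Query(("order_by", self.tree, field))

    def to_sql(self):
        """Compile the tree to SQL (no escaping or parameters; illustration only)."""
        steps, node = [], self.tree
        while node[0] != "table":  # unwind down to the root table
            steps.append(node)
            node = node[1]
        where, order = [], None
        for op, _, arg in reversed(steps):  # replay calls in the order they were made
            if op == "filter":
                where += [f"{k} = '{v}'" for k, v in arg.items()]
            elif op == "exclude":
                where += [f"{k} != '{v}'" for k, v in arg.items()]
            elif op == "order_by":
                order = arg  # a later order_by replaces an earlier one
        sql = f"SELECT * FROM {node[1]}"
        if where:
            sql += " WHERE " + " AND ".join(where)
        if order:
            sql += f" ORDER BY {order}"
        return sql + ";"


beanz = (
    Query(("table", "CoffeeBeans"))
    .filter(country="Kenya")
    .exclude(roast="dark")
    .order_by("price")
)
print(beanz.tree[0])  # order_by: the outermost node is the last call made
print(beanz.to_sql())
# SELECT * FROM CoffeeBeans WHERE country = 'Kenya' AND roast != 'dark' ORDER BY price;
```

Because every method returns a new Query wrapping the old tree, the chain itself is just data until to_sql() is called, which is the ActiveRecord/Django ORM idea the abstract says the talk builds on.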

  28. LISP

  29. LISt Processing

  30. Homoiconicity

  31. Languages in which program code is represented as the language's fundamental data type are called homoiconic.
  32. • s-expressions • lists

  33. s-expressions

  34. • Symbolic expression • The first element is an operator • All remaining elements are data • s-expressions can be nested • Every element can be replaced with the value it evaluates to
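
The substitution rule on this slide can be sketched with a tiny evaluator that writes s-expressions as nested Python lists. The OPERATORS table is an assumption for illustration; a real Lisp resolves operators through its own environment.

```python
import math

# Operators for this toy evaluator (an assumption for illustration).
OPERATORS = {"+": lambda *xs: sum(xs), "*": lambda *xs: math.prod(xs)}


def evaluate(sexpr):
    """Evaluate an s-expression written as nested Python lists.

    The first element names the operator and the rest are its data. Each
    nested list is replaced by its value before the operator is applied,
    which is the substitution (+ 1 (* 3 6)) -> (+ 1 18).
    """
    if not isinstance(sexpr, list):
        return sexpr
    op, *args = sexpr
    return OPERATORS[op](*[evaluate(a) for a in args])


print(evaluate(["+", 1, ["*", 3, 6]]))  # 19
print(evaluate(["+", 1, 18]))           # 19
```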
  35. 1 + 18

  36. (+ 1 18)

  37. (+ 1 (* 3 6))

  38. (+ 1 18)

  39. lists

  40. • Singly-linked lists • Built up from ‘cons’ cells • nil-terminated • We create a list using the ‘list’ operator • ‘car’ returns the first element of the cons cell (head) • ‘cdr’ returns the rest (tail) • ‘last’ returns the last cons cell in the list
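
Cons cells, car, cdr and last can be modelled in Python with a two-field record. This is an illustrative sketch (the Cons class and the helper names are hypothetical), with None standing in for nil.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Cons:
    """A cons cell: car holds a value, cdr points at the next cell (None stands in for nil)."""
    car: Any
    cdr: Optional["Cons"] = None


def make_list(*items):
    """Build a nil-terminated singly-linked list, like Lisp's list."""
    head = None
    for item in reversed(items):
        head = Cons(item, head)
    return head


def last(cell):
    """Return the last cons cell (not the last value), like Lisp's last."""
    while cell.cdr is not None:
        cell = cell.cdr
    return cell


def to_pylist(cell):
    """Collect the cars into a Python list for printing."""
    out = []
    while cell is not None:
        out.append(cell.car)
        cell = cell.cdr
    return out


a = make_list(1, 1, 2, 3)
print(a.car)               # 1
print(to_pylist(a.cdr))    # [1, 2, 3]
print(to_pylist(last(a)))  # [3]
last(a).car = 5            # mirrors (setf (car (last a)) 5)
print(to_pylist(a))        # [1, 1, 2, 5]
```

Because last returns the cell rather than the value, assigning to its car changes the original list in place, which is what setf on (car (last a)) does on the later slides.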
  41. (1 1 2 3)

  42. * (1 1 2 3)
    ; in: 1 1
    ;     (1 1 2 3)
    ;
    ; caught ERROR:
    ;   illegal function call
    ;
    ; compilation unit finished
    ;   caught 1 ERROR condition

    debugger invoked on a SB-INT:COMPILED-PROGRAM-ERROR: Execution of a form compiled with errors.
    Form:
      (1 1 2 3)
    Compile-time error:
      illegal function call
  43. (list 1 1 2 3)

  44. '(1 1 2 3)

  45. 1 1 2 3 nil

  46. (car '(1 1 2 3))

  47. 1 1 2 3 nil

  48. (cdr '(1 1 2 3))

  49. 1 1 2 3 nil

  50. (1 2 3)

  51. (last '(1 1 2 3))

  52. 1 1 2 3 nil

  53. Back to homoiconicity…

  54. '(1 1 2 3 5 8 13 26)

  55. '(1 1 2 3 5 8 13 26)

    (setf a '(1 1 2 3 5 8 13 26))
    a
    > (1 1 2 3 5 8 13 26)
  56. '(1 1 2 3 5 8 13 26)

    (setf a '(1 1 2 3 5 8 13 26))
    a
    > (1 1 2 3 5 8 13 26)

    (last a)
    > (26)
  57. '(1 1 2 3 5 8 13 26)

    (setf a '(1 1 2 3 5 8 13 26))
    a
    > (1 1 2 3 5 8 13 26)

    (last a)
    > (26)

    (car (last a))
    > 26
  58. '(1 1 2 3 5 8 13 26)

    (setf a '(1 1 2 3 5 8 13 26))
    a
    > (1 1 2 3 5 8 13 26)

    (last a)
    > (26)

    (car (last a))
    > 26

    (setf (car (last a)) 25)
    a
    > (1 1 2 3 5 8 13 25)
  59. So what?

  60. '(+ 1 2)
    > (+ 1 2)

  61. '(+ 1 2)
    > (+ 1 2)

    (setf a '(+ 1 2))
    a
    > (+ 1 2)
  62. '(+ 1 2)
    > (+ 1 2)

    (setf a '(+ 1 2))
    a
    > (+ 1 2)

    (eval a)
    > 3
  63. '(+ 1 2)
    > (+ 1 2)

    (setf a '(+ 1 2))
    a
    > (+ 1 2)

    (eval a)
    > 3

    (setf (car (last a)) 3)
    a
    > (+ 1 3)
  64. '(+ 1 2)
    > (+ 1 2)

    (setf a '(+ 1 2))
    a
    > (+ 1 2)

    (eval a)
    > 3

    (setf (car (last a)) 3)
    a
    > (+ 1 3)

    (eval a)
    > 4
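
Python lists are not code, so Python is not homoiconic, but the sequence on slides 60 to 64 can be mimicked with a toy evaluator that treats a list's first element as a function name. ENV and lisp_eval are hypothetical names used only for this sketch.

```python
import operator

# Toy environment mapping symbols to functions (an assumption for this sketch).
ENV = {"+": operator.add}


def lisp_eval(form):
    """Evaluate a list whose first element names a function in ENV."""
    if isinstance(form, list):
        fn, *args = form
        return ENV[fn](*map(lisp_eval, args))
    return form


a = ["+", 1, 2]      # like (setf a '(+ 1 2)): plain data, nothing evaluated yet
print(lisp_eval(a))  # 3
a[-1] = 3            # like (setf (car (last a)) 3): edit the program as data
print(a)             # ['+', 1, 3]
print(lisp_eval(a))  # 4
```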
  65. [image slide, no text]
  66. “One of the most important and fascinating of all computer languages is Lisp (standing for "List Processing"), which was invented by John McCarthy around the time Algol was invented.” –Douglas Hofstadter, Gödel, Escher, Bach
  67. Finally…

  68. [image slide, no text]
  69. from pandas import DataFrame
    df = DataFrame([x for x in range(100)])
    df.index.name = "index"

  70. (setattr (getattr "index" df) "name" "index_name")

  71. (nil)

  72. tap

  73. def tap(obj, *_): return obj
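
Python evaluates call arguments before the call itself, so tap can run a side effect and still return the object. The sketch below uses a SimpleNamespace as a stand-in for the pandas DataFrame (an assumption so it runs without pandas) and Python's own getattr(obj, name) argument order. Python's setattr returns None, which is presumably why slide 71 shows (nil).

```python
from types import SimpleNamespace


def tap(obj, *_):
    """Return obj unchanged; the other arguments exist only for their side effects."""
    return obj


# Stand-in for a pandas DataFrame so the sketch runs without pandas installed.
df = SimpleNamespace(index=SimpleNamespace(name=None))

# Arguments are evaluated before the call: setattr runs first (and returns None),
# then tap hands back the original object.
result = tap(df, setattr(getattr(df, "index"), "name", "index_name"))
print(result is df, df.index.name)  # True index_name
```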

  74. (tap df (setattr (getattr "index" df) "name" "index_name"))

  75. That is way too much Python though

  76. (name_index series "index")
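
A short form like name_index only needs a rewrite rule that expands it into the longer tap/setattr/getattr tree before execution. The function below is a hypothetical sketch of such a rule over tuple-based trees, using Python's getattr argument order; it is not the talk's rewriter.

```python
def expand_name_index(tree):
    """Hypothetical rewrite rule: expand ("name_index", obj, name) into
    ("tap", obj, ("setattr", ("getattr", obj, "index"), "name", name))."""
    if not isinstance(tree, tuple):
        return tree
    tree = tuple(expand_name_index(node) for node in tree)  # rewrite children first
    if tree[0] == "name_index":
        _, obj, name = tree
        return ("tap", obj, ("setattr", ("getattr", obj, "index"), "name", name))
    return tree


print(expand_name_index(("name_index", "series", "index")))
# ('tap', 'series', ('setattr', ('getattr', 'series', 'index'), 'name', 'index'))
```

The expansion copies obj into two places. That is harmless for a plain name like series, but if obj were itself an expensive sub-expression it would be evaluated twice, so a fuller rewriter would likely bind it once first.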

  77. (tap df (setattr (getattr "index" df) "name" "index_name"))

  78. Take a moment to think about this…

  79. Inspectability
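
Because the program is a data structure, it can be printed, compared or drawn before and after it runs. As a minimal text-only sketch (the Sym marker class and to_sexpr are hypothetical), a tuple-based tree can be rendered back into s-expression form for inspection:

```python
class Sym(str):
    """Marks a string as a symbol (printed bare) rather than a string literal."""


def to_sexpr(tree):
    """Render a tuple-based expression tree as s-expression text for inspection."""
    if isinstance(tree, tuple):
        return "(" + " ".join(to_sexpr(node) for node in tree) + ")"
    if isinstance(tree, Sym):
        return str(tree)
    if isinstance(tree, str):
        return '"' + tree + '"'
    return repr(tree)


tree = (Sym("tap"), Sym("df"),
        (Sym("setattr"), (Sym("getattr"), Sym("df"), "index"), "name", "index_name"))
print(to_sexpr(tree))
# (tap df (setattr (getattr df "index") "name" "index_name"))
```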

  80. [image slide, no text]
  81. [image slide, no text]
  82. [image slide, no text]
  83. What do we gain? • Lock down our modelling language • Manipulate our code as data • Use other languages and packages
  84. Thank you!

  85. Questions?