Slide 1

Slide 1 text

recipy
Effortless provenance tracking in Python
www.recipy.org
Robin Wilson
[email protected]
@sciremotesense

Slide 2

Slide 2 text

?

Slide 3

Slide 3 text

Provenance
‘Lab notebook’

Slide 4

Slide 4 text

It must:
• Be easy – no effort!
• Work with libraries without modification

Slide 5

Slide 5 text

import recipy

Slide 6

Slide 6 text

Raquel Alegre, Robin Wilson, Janneke van der Zwaan
#CollabW2015
www.software.ac.uk/cw15
WINNERS!

Slide 7

Slide 7 text

import pandas as pd
from matplotlib.pyplot import savefig
data = pd.read_csv('data.csv')
data.plot(x='year', y='temperature')
savefig('graph.png')
data.temperature = data.temperature * 100
data.to_csv('output.csv')

Slide 8

Slide 8 text

import recipy
import pandas as pd
from matplotlib.pyplot import savefig
data = pd.read_csv('data.csv')
data.plot(x='year', y='temperature')
savefig('graph.png')
data.temperature = data.temperature * 100
data.to_csv('output.csv')

Slide 9

Slide 9 text

DEMO

Slide 10

Slide 10 text

Features:
• Store input/output file hashes
• Store git information – including diff!
• Store output file diffs (if relevant)
• Store library versions
• Annotate individual runs
• Search via name, hash, notes etc.
• Wrap open via recipy.open
• Export to JSON
All of these can be turned on/off via the configuration file
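As a rough illustration of the first feature, a file hash can be computed with the standard library alone. This is only a sketch: the helper name hash_file, the default of SHA-1, and the chunk size are assumptions made for the example, not recipy's actual code.

```python
import hashlib
import os
import tempfile


def hash_file(path, algorithm="sha1", chunk_size=65536):
    """Return the hex digest of the file at `path` (hypothetical helper)."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read in chunks so large data files need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage: hash a small temporary file standing in for 'data.csv'.
fd, demo_path = tempfile.mkstemp(suffix=".csv")
os.write(fd, b"year,temperature\n2015,14.8\n")
os.close(fd)
demo_hash = hash_file(demo_path)
os.remove(demo_path)
```

Storing a digest like this next to each run makes it possible to tell later whether an input or output file has changed since the run was logged.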

Slide 11

Slide 11 text

import recipy
import pandas as pd
from matplotlib.pyplot import *
data = pd.read_csv('data.csv')
data.plot(x='year', y='temperature')
savefig('graph.png')
data.temperature = data.temperature * 100
data.to_csv('output.csv')
DB
‘Monkey Patched’
Hooks Set up

Slide 12

Slide 12 text

‘Monkey Patching’
No on_save hooks
So, change code at runtime

Slide 13

Slide 13 text

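import pandas as pd
# Keep a reference to the original, otherwise the wrapper would call itself
original_read_csv = pd.read_csv
def wrapped_read_csv(*args, **kwargs):
    print('You called read_csv!')
    # Return the result so callers still get their DataFrame
    return original_read_csv(*args, **kwargs)
pd.read_csv = wrapped_read_csv
# The same idea as a general helper: patch_function(mod, f, wrapper_function)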
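The general form patch_function(mod, f, wrapper_function) named on this slide could look roughly like the sketch below. The body is an assumption written for illustration, not recipy's implementation; the announce helper is hypothetical, and json.dumps stands in for a library function so the example runs without pandas.

```python
import functools
import json


def patch_function(mod, function_name, make_wrapper):
    """Replace mod.<function_name> with a wrapped version (illustrative only)."""
    original = getattr(mod, function_name)
    setattr(mod, function_name, make_wrapper(original))
    return original


def announce(original):
    """Hypothetical wrapper factory: report each call, then defer to the original."""
    @functools.wraps(original)  # keep the original name and docstring
    def wrapper(*args, **kwargs):
        print('You called', original.__name__)
        return original(*args, **kwargs)
    return wrapper


original_dumps = patch_function(json, 'dumps', announce)
result = json.dumps({'year': 2015})  # prints a message, still returns JSON text
```

Passing the original function into the wrapper factory avoids the trap of a wrapper that looks itself up on the module and recurses forever.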

Slide 14

Slide 14 text

NoSQL Database
Client-Server
Separate installation
Can be remote
Scalable?

Pure Python
No install needed!
JSON-based
Scalability?

Slide 15

Slide 15 text

sys.meta_path
A list of objects used to search for packages
When running import numpy:
Objects in sys.meta_path are used to find and load the module

Slide 16

Slide 16 text

sys.meta_path
1. Find module
   Search file system
2. Load module
   Load as standard Python module

Slide 17

Slide 17 text

sys.meta_path
1. Find module
   Search file system
   Only work with one module
2. Load module
   Load as standard Python module
   AND patch functions to use wrapper
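The two steps above can be sketched with the current find_spec import protocol. This is a minimal sketch under stated assumptions, not recipy's code: the names PatchingFinder, PatchingLoader and patch_colorsys are hypothetical, and the standard-library module colorsys stands in for numpy so the example runs anywhere.

```python
import importlib.abc
import sys


class PatchingLoader(importlib.abc.Loader):
    """Load the module with its real loader, then patch it."""

    def __init__(self, real_loader, patch):
        self.real_loader = real_loader
        self.patch = patch

    def create_module(self, spec):
        return self.real_loader.create_module(spec)

    def exec_module(self, module):
        self.real_loader.exec_module(module)  # load as a standard module
        self.patch(module)                    # AND patch its functions


class PatchingFinder(importlib.abc.MetaPathFinder):
    """Only handles one module; everything else is left to other finders."""

    def __init__(self, modulename, patch):
        self.modulename = modulename
        self.patch = patch

    def find_spec(self, fullname, path, target=None):
        if fullname != self.modulename:
            return None
        # Let the remaining finders search the file system as usual.
        for finder in sys.meta_path:
            if finder is self or not hasattr(finder, 'find_spec'):
                continue
            spec = finder.find_spec(fullname, path, target)
            if spec is not None and spec.loader is not None:
                spec.loader = PatchingLoader(spec.loader, self.patch)
                return spec
        return None


calls = []


def patch_colorsys(module):
    original = module.rgb_to_hls

    def wrapper(*args, **kwargs):
        calls.append(args)  # stand-in for logging to a database
        return original(*args, **kwargs)

    module.rgb_to_hls = wrapper


sys.meta_path.insert(0, PatchingFinder('colorsys', patch_colorsys))
sys.modules.pop('colorsys', None)  # make sure the import below is a fresh one
import colorsys

colorsys.rgb_to_hls(1.0, 0.0, 0.0)  # recorded in `calls`
```

Because the finder sits at the front of sys.meta_path, the patch is applied the moment the user's script imports the module, which is what lets a single import line do all the set-up.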

Slide 18

Slide 18 text

PatchImporter PatchSimple PatchPandas PatchNumpy PatchMPL

Slide 19

Slide 19 text

Crazy magic! Simplification
PatchImporter PatchSimple PatchPandas PatchNumpy PatchMPL

Slide 20

Slide 20 text

class PatchNumpy(PatchSimple):
    modulename = 'numpy'
    input_functions = ['genfromtxt', 'loadtxt', 'load', 'fromfile']
    output_functions = ['save', 'savez', 'savez_compressed', 'savetxt']
    input_wrapper = create_wrapper(log_input, 0, 'numpy')
    output_wrapper = create_wrapper(log_output, 0, 'numpy')
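One way a declarative class like this could be turned into actual patches is sketched below. Everything here is an assumption for illustration: the bodies of create_wrapper, log_input, log_output and PatchSimple are guesses at the general shape, and a throwaway module called fakeio replaces numpy so the example is self-contained.

```python
import functools
import types

log = []  # stand-in for the provenance database


def log_input(filename, source):
    log.append(('input', filename, source))


def log_output(filename, source):
    log.append(('output', filename, source))


def create_wrapper(log_function, arg_index, source):
    """Build a decorator that logs args[arg_index] before calling the original."""
    def decorator(original):
        @functools.wraps(original)
        def wrapper(*args, **kwargs):
            # Assumes the filename is passed positionally at arg_index.
            log_function(args[arg_index], source)
            return original(*args, **kwargs)
        return wrapper
    return decorator


class PatchSimple:
    """Hypothetical base class: wraps every listed function of a module."""
    input_functions = []
    output_functions = []

    @classmethod
    def patch(cls, module):
        # Accessed via the class, the wrappers are plain functions (no binding).
        for name in cls.input_functions:
            setattr(module, name, cls.input_wrapper(getattr(module, name)))
        for name in cls.output_functions:
            setattr(module, name, cls.output_wrapper(getattr(module, name)))


# A fake module standing in for numpy.
fakeio = types.ModuleType('fakeio')
fakeio.loadtxt = lambda filename: 'contents of ' + filename
fakeio.savetxt = lambda filename, data: None


class PatchFakeIO(PatchSimple):
    modulename = 'fakeio'
    input_functions = ['loadtxt']
    output_functions = ['savetxt']
    input_wrapper = create_wrapper(log_input, 0, 'fakeio')
    output_wrapper = create_wrapper(log_output, 0, 'fakeio')


PatchFakeIO.patch(fakeio)
fakeio.loadtxt('data.csv')
fakeio.savetxt('output.csv', [1, 2, 3])
```

With this shape, supporting a new library mostly means listing its reading and writing functions and saying which argument holds the filename.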

Slide 23

Slide 23 text

Automated testing & CI
SSI Open Call
1-2 person-months of effort
Testing using py.test parameterised tests
www.software.ac.uk

Slide 24

Slide 24 text

Sprint with us!
• Patch more modules
• Design a logo
• Create the website
• Make proper docs
• Improve CLI
• IPython/Jupyter support
• Conda support
• Fix bugs!
What do you want?