recipy: completely effortless provenance for Python

Robin Wilson
September 17, 2016

Transcript

  1. ?

  2. A typical analysis script:

         import pandas as pd
         from matplotlib.pyplot import savefig

         data = pd.read_csv('data.csv')
         data.plot(x='year', y='temperature')
         savefig('graph.png')

         data.temperature = data.temperature * 100
         data.to_csv('output.csv')
  3. The same script with recipy (only the first line is new):

         import recipy
         import pandas as pd
         from matplotlib.pyplot import savefig

         data = pd.read_csv('data.csv')
         data.plot(x='year', y='temperature')
         savefig('graph.png')

         data.temperature = data.temperature * 100
         data.to_csv('output.csv')
  4. Features:
     • Store input/output file hashes
     • Store git information, including the diff!
     • Store output file diffs (if relevant)
     • Store library versions
     • Annotate individual runs
     • Search via name, hash, notes, etc.
     • Wrap open via recipy.open
     • Export to JSON
     All of these can be turned on/off via the configuration file.
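Storing file hashes lets a result be matched later to the exact input files that produced it, and exporting to JSON turns each run into a plain, portable record. A minimal sketch under stated assumptions: the record layout, the field names and the choice of SHA-1 are illustrative only, not recipy's actual database schema.

```python
import hashlib
import json
import os
import platform
import tempfile
from datetime import datetime, timezone


def file_hash(path, chunk_size=65536):
    """Return the SHA-1 hex digest of a file, reading it in chunks."""
    h = hashlib.sha1()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()


def make_run_record(script, inputs, outputs, notes=''):
    """Build one run record (hypothetical layout, not recipy's schema)."""
    return {
        'script': script,
        'date': datetime.now(timezone.utc).isoformat(),
        'python_version': platform.python_version(),
        'inputs': [{'path': p, 'sha1': file_hash(p)} for p in inputs],
        'outputs': [{'path': p, 'sha1': file_hash(p)} for p in outputs],
        'notes': notes,
    }


# Usage: create two small files, record a run, and export it as JSON.
workdir = tempfile.mkdtemp()
data_path = os.path.join(workdir, 'data.csv')
out_path = os.path.join(workdir, 'output.csv')
with open(data_path, 'wb') as f:
    f.write(b'year,temperature\n2016,15\n')
with open(out_path, 'wb') as f:
    f.write(b'year,temperature\n2016,1500\n')

record = make_run_record('analysis.py', [data_path], [out_path], notes='first run')
print(json.dumps(record, indent=2))
```

Because the hash depends only on file contents, renaming or moving data.csv would not change it, while editing a single byte would.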
  5. import recipy sets up the DB and the hooks; the file read/write calls are then 'monkey patched':

         import recipy
         import pandas as pd
         from matplotlib.pyplot import savefig

         data = pd.read_csv('data.csv')
         data.plot(x='year', y='temperature')
         savefig('graph.png')

         data.temperature = data.temperature * 100
         data.to_csv('output.csv')
  6. NoSQL database: client-server; separate installation; can be remote; scalable?
     Pure Python: no install needed!; JSON-based; scalability?
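The pure-Python, JSON-based option keeps every record in an ordinary JSON file. A minimal sketch of that idea (JsonStore is a hypothetical illustration, not the API of any particular database library) also makes the "Scalability?" question concrete: each insert re-reads and rewrites the whole file.

```python
import json
import os
import tempfile


class JsonStore:
    """Hypothetical minimal document store: a list of dicts in one JSON file."""

    def __init__(self, path):
        self.path = path
        if not os.path.exists(path):
            self._write([])

    def _read(self):
        with open(self.path, encoding='utf-8') as f:
            return json.load(f)

    def _write(self, records):
        with open(self.path, 'w', encoding='utf-8') as f:
            json.dump(records, f, indent=2)

    def insert(self, record):
        # O(n) per insert: the whole file is loaded and rewritten each time.
        records = self._read()
        records.append(record)
        self._write(records)

    def search(self, **criteria):
        """Return records whose fields equal all the given keyword values."""
        return [r for r in self._read()
                if all(r.get(k) == v for k, v in criteria.items())]


# Usage: no server and no installation, just a file on disk.
path = os.path.join(tempfile.mkdtemp(), 'runs.json')
db = JsonStore(path)
db.insert({'script': 'analysis.py', 'outputs': ['graph.png']})
db.insert({'script': 'other.py', 'outputs': []})
print(db.search(script='analysis.py'))
```

A client-server database avoids the full rewrite and can live on another machine, at the cost of a separate installation.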
  7. sys.meta_path: a list of finder objects used to search for modules and packages.

     When running import numpy (and numpy is not already in sys.modules), the objects in sys.meta_path are used to find and load the module.
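The standard importlib machinery lets you put your own object on sys.meta_path. A minimal sketch (LoggingFinder is a hypothetical name) of a finder that only records which modules are requested, then returns None so the normal finders still do the real work:

```python
import importlib.abc
import sys


class LoggingFinder(importlib.abc.MetaPathFinder):
    """Records each module name the import system asks about, then defers."""

    def __init__(self):
        self.seen = []

    def find_spec(self, fullname, path, target=None):
        self.seen.append(fullname)
        return None  # None means "not mine": the next finder in sys.meta_path is tried


# Finders are only consulted for modules not already in sys.modules.
sys.modules.pop('colorsys', None)

finder = LoggingFinder()
sys.meta_path.insert(0, finder)  # put it first so it sees every lookup
try:
    import colorsys
finally:
    sys.meta_path.remove(finder)

print(finder.seen)
```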
  8. sys.meta_path:
       1. Find module: search the file system; only work with one module.
       2. Load module: load as a standard Python module AND patch its functions to use the wrapper.
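The two steps (find one module, then load it normally and patch it) can be sketched with the modern importlib protocol, find_spec plus exec_module. This is an illustration under assumptions, not recipy's implementation, which may use the older find_module/load_module hooks; PatchingFinder, PatchingLoader and the toyio module are hypothetical names.

```python
import importlib
import importlib.abc
import importlib.machinery
import os
import sys
import tempfile


class PatchingLoader(importlib.abc.Loader):
    """Wraps a real loader: run the module normally, then patch it."""

    def __init__(self, original_loader, patch):
        self.original_loader = original_loader
        self.patch = patch

    def create_module(self, spec):
        return self.original_loader.create_module(spec)

    def exec_module(self, module):
        self.original_loader.exec_module(module)  # step 2: load as a standard module...
        self.patch(module)                        # ...AND patch its functions


class PatchingFinder(importlib.abc.MetaPathFinder):
    """Claims exactly one module name; defers every other import."""

    def __init__(self, modulename, patch):
        self.modulename = modulename
        self.patch = patch

    def find_spec(self, fullname, path, target=None):
        if fullname != self.modulename:
            return None  # step 1a: only work with one module
        # Step 1b: search the file system with the standard path-based finder.
        spec = importlib.machinery.PathFinder.find_spec(fullname, path)
        if spec is None or spec.loader is None:
            return None
        spec.loader = PatchingLoader(spec.loader, self.patch)
        return spec


# Usage: a throwaway module "toyio" (hypothetical) with one input function.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, 'toyio.py'), 'w') as f:
    f.write("def load(path):\n    return 'data from ' + path\n")
sys.path.insert(0, tmpdir)
importlib.invalidate_caches()

logged = []


def patch_toyio(module):
    original = module.load

    def load(path):
        logged.append(path)  # record the input file, then behave as before
        return original(path)

    module.load = load


sys.meta_path.insert(0, PatchingFinder('toyio', patch_toyio))
import toyio  # imported after the finder is installed

print(toyio.load('data.csv'))  # data from data.csv
print(logged)                  # ['data.csv']
```

Delegating to PathFinder keeps the file-system search standard; only the loader is wrapped, so the module's own code runs unmodified before its functions are replaced.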
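  9. recipy's numpy patch:

         # PatchSimple, create_wrapper, log_input and log_output are recipy
         # internals (their imports are not shown on the slide).
         class PatchNumpy(PatchSimple):
             modulename = 'numpy'

             # numpy functions that read files: logged as inputs
             input_functions = ['genfromtxt', 'loadtxt', 'load', 'fromfile']
             # numpy functions that write files: logged as outputs
             output_functions = ['save', 'savez', 'savez_compressed', 'savetxt']

             # 0: position of the file argument in each wrapped call
             input_wrapper = create_wrapper(log_input, 0, 'numpy')
             output_wrapper = create_wrapper(log_output, 0, 'numpy')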
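The PatchNumpy class is purely declarative: it lists which functions read and which write, and a base class does the wrapping. A sketch of that pattern under stated assumptions: SimplePatcher, this create_wrapper, log_input/log_output and the in-memory LOG list are hypothetical stand-ins (recipy's real helpers are not shown on the slides), and treating the 0 as the filename's argument position is an assumption. A fake toyio module replaces numpy so the example has no dependencies.

```python
import functools
import types

LOG = []  # hypothetical in-memory stand-in for recipy's database


def log_input(filename, source):
    LOG.append(('input', filename, source))


def log_output(filename, source):
    LOG.append(('output', filename, source))


def create_wrapper(log_function, arg_loc, source):
    """Hypothetical: return a decorator that logs args[arg_loc], then calls f."""
    def decorator(f):
        @functools.wraps(f)
        def wrapped(*args, **kwargs):
            log_function(args[arg_loc], source)
            return f(*args, **kwargs)
        return wrapped
    return decorator


class SimplePatcher:
    """Hypothetical base class: subclasses only declare what to wrap."""
    modulename = None
    input_functions = ()
    output_functions = ()
    input_wrapper = None
    output_wrapper = None

    @classmethod
    def patch(cls, module):
        # Accessed through the class, the wrappers are plain functions (no self).
        for name in cls.input_functions:
            setattr(module, name, cls.input_wrapper(getattr(module, name)))
        for name in cls.output_functions:
            setattr(module, name, cls.output_wrapper(getattr(module, name)))


class PatchToyIO(SimplePatcher):
    modulename = 'toyio'
    input_functions = ['load']
    output_functions = ['save']
    input_wrapper = create_wrapper(log_input, 0, 'toyio')
    output_wrapper = create_wrapper(log_output, 0, 'toyio')


# Usage with a stand-in module (hypothetical, so the example needs no numpy).
toyio = types.ModuleType('toyio')
toyio.load = lambda path: 'data from ' + path
toyio.save = lambda path, data: None

PatchToyIO.patch(toyio)
toyio.load('data.csv')
toyio.save('output.csv', [1, 2, 3])
print(LOG)  # [('input', 'data.csv', 'toyio'), ('output', 'output.csv', 'toyio')]
```

With this design, supporting another library means writing one small subclass rather than new wrapping code.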
  12. Automated testing & CI
      • SSI Open Call: 1-2 person-months of effort (www.software.ac.uk)
      • Testing using py.test, with parameterised tests
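Parameterised tests let one test function run against many cases. A minimal py.test sketch, assuming a hypothetical logged() wrapper similar in spirit to the ones recipy installs; each tuple in the parametrize list becomes a separate test case checking that the wrapper logs the first argument and leaves the result unchanged:

```python
import functools

import pytest


def logged(log, f):
    """Hypothetical wrapper under test: record args[0], then call f."""
    @functools.wraps(f)
    def wrapped(*args, **kwargs):
        log.append(args[0])
        return f(*args, **kwargs)
    return wrapped


@pytest.mark.parametrize('func, args, expected', [
    (str.upper, ('data.csv',), 'DATA.CSV'),
    (len, ('graph.png',), 9),
    (max, ('output.csv', 'a.csv'), 'output.csv'),
])
def test_wrapper_logs_and_preserves_result(func, args, expected):
    log = []
    assert logged(log, func)(*args) == expected
    assert log == [args[0]]
```

Running py.test on this file reports three separate test results, one per tuple.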
  13. Sprint with us!
      • Patch more modules
      • Design a logo
      • Create the website
      • Make proper docs
      • Improve the CLI
      • IPython/Jupyter support
      • Conda support
      • Fix bugs!
      What do you want?