
PyConZA 2015: "A little scripting goes a long way: automating data processing in science" by Adrianna Pińska

Pycon ZA
October 01, 2015

When we think about scientific programming, we often focus on complex high-performance applications for performing simulations of chemical processes, or on data analysis tools. It is easy to overlook the gaps between tools, such as format conversions, which may be simple for a programmer to automate, but require hours of tedious work for a researcher without programming experience.

In this talk I will present a specific example of an application which fills such a gap in a medical research laboratory, where readings of chemical samples are used to measure the response of TB-causing bacteria to various drugs. The readings must be converted from the raw format produced by the instrument in the laboratory into a format suitable for uploading into an online tool for further analysis.

I will also discuss more broadly how research institutions can improve efficiency by collaborating with programmers and by encouraging researchers to acquire basic programming skills.

Transcript

  1. TB drug discovery

    A plate reader detects properties of chemical samples, e.g. how bacteria respond to different concentrations of drugs. CDD Vault is a web portal for sharing and analysis of data, e.g. dose-response curves (https://www.collaborativedrug.com/buzz/2015/09/15/cdd-spotlight-interview-with-ronnett-seldon-university-of-cape-town-south-africa/).
  2. One end: plate reader output

    The 96-well plate reader (8 rows, 12 columns) spits out readings with no metadata in an xlsx file. The cell layout corresponds to the plate wells. Metadata is stored in a separate file created by the researcher.
  3. Other end: CDD import format

    A csv file with one row per well; data and metadata combined.
  4. How to get from here to there

    If you’re a programmer? Simple. If you aren’t? Not so simple.
  5. The previous solution

    An Excel macro. It still requires lots of manual editing: creation of a "platemap" (the layout of compounds on each plate), and copying and pasting of readings and metadata into the macro, one plate at a time.
  6. Basic automation strategy

    The only thing that varies is the list of compounds to be tested. The plate layout, concentrations and control are all fixed. Everything can be generated from a list of compounds and the readings. No GUI for now; just a command-line interface.
  7. Some refinements

    Generate an intermediate platemap file. CSV: can subsequently be used instead of a compound file. HTML: for printing and use in the lab. Generate a compound registration file for CDD. Futureproof: design in a way that allows for other plate layouts, etc. Unit tests!
  8. The results

    A bit of a learning curve to use the command line. If the tool is adopted more widely, a GUI may be a good idea. The researcher says that the script saves her "almost two days a week". We want to release the code as open source; when we do, this repo should become public: https://bitbucket.org/eresearchapplicationops/plate reader data converter
  9. What is this talk really about?

    Scientific programming isn’t all HPC and/or domain-specific. Sometimes simple data processing is a crucial step. For a non-programmer this can be hours of tedious work. Researchers shouldn’t waste time doing things a computer can do better and faster.
  10. How can we fix this?

    Training: teach everyone basic scripting skills. Support: connect programmers and researchers. A little scripting goes a long way!
  11. Software Carpentry

    International organisation teaching programming skills to researchers (https://software-carpentry.org/). Workshop this weekend! (http://za.pycon.org/swc)
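
The conversion described in slides 2, 3 and 6 (an 8 by 12 grid of bare readings, combined with metadata generated from a list of compounds, flattened into one CSV row per well) can be sketched roughly as follows. This is not the code from the talk. The plate layout (one compound per row, a two-fold dilution across the columns), the top concentration, the column names and the function names `build_platemap`, `wells_to_rows` and `to_csv` are all assumptions made for illustration; the real layout and the real CDD import columns are not given in the slides. The readings are assumed to have already been loaded from the xlsx file into a list of lists.

```python
import csv
import io

ROWS = "ABCDEFGH"          # 8 plate rows
N_COLS = 12                # 12 plate columns
# Hypothetical fixed layout: one compound per row, two-fold dilution
# across the columns. The value and its units are illustrative only.
TOP_CONCENTRATION = 100.0
FIELDS = ["well", "compound", "concentration", "reading"]  # assumed names


def build_platemap(compounds):
    """Map each well name (e.g. 'A1') to (compound, concentration)."""
    if len(compounds) != len(ROWS):
        raise ValueError(f"expected {len(ROWS)} compounds, got {len(compounds)}")
    platemap = {}
    for row_label, compound in zip(ROWS, compounds):
        for col in range(1, N_COLS + 1):
            concentration = TOP_CONCENTRATION / 2 ** (col - 1)
            platemap[f"{row_label}{col}"] = (compound, concentration)
    return platemap


def wells_to_rows(readings, platemap):
    """Flatten an 8x12 grid of readings into one record per well."""
    if len(readings) != len(ROWS) or any(len(r) != N_COLS for r in readings):
        raise ValueError("readings must be an 8x12 grid")
    records = []
    for row_label, row_values in zip(ROWS, readings):
        for col, value in enumerate(row_values, start=1):
            well = f"{row_label}{col}"
            compound, concentration = platemap[well]
            records.append({
                "well": well,
                "compound": compound,
                "concentration": concentration,
                "reading": value,
            })
    return records


def to_csv(records):
    """Serialise the per-well records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Made-up compound names and readings, standing in for the two input files.
compounds = [f"CMPD-{i}" for i in range(1, 9)]
readings = [[float(r * N_COLS + c) for c in range(N_COLS)] for r in range(8)]

records = wells_to_rows(readings, build_platemap(compounds))
csv_text = to_csv(records)
print(csv_text.splitlines()[0])  # header row
```

Keeping the platemap as a separate step mirrors the refinement in slide 7, where the intermediate platemap can be written out and reused; only the compound list changes between runs.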