Zen And The aRt Of Workflow Maintenance

Links: https://rstd.io/jenny-latinr

Delivered at

* Latin-R, http://latin-r.com, Buenos Aires 04-05 Sept 2018
* 10th Conference of the IASC-ARS/68th Annual NZSA Conference, http://www.nzsa2017.com, Auckland, 10-14 Dec 2017

Jennifer (Jenny) Bryan

September 04, 2018
Transcript

  1. 1.

    Zen And The aRt Of Workflow Maintenance
    Jennifer Bryan
    RStudio, University of British Columbia
    @JennyBryan @jennybc
  2. 3.

    Is data science just a trendy term for statistics?
    7 Life-Changing Workflow Tips Every useR Should Know
  3. 9.

    [Figure: panel 2008−10−07−p03−01−D11. Motion index (0 to 100) against aligned time (0 to 30 s), with phases PRE, L1, L2, E1, E2, R1, R2 marked.]
  4. 10.

    [Figure: motion index (0 to 100) against aligned time (0 to 30 s), panels 2009−03−25−p03−01−C10, 2009−04−10−p01−01−D03, 2008−01−23−p01−03−E08 and 2008−02−14−p03−02−G09. Heatmap with rows labelled by compound and date (isoproterenol, diazepam, R(−)-apomorphine, digitoxigenin, 6-nitroquipazine maleate) and columns PRE, L1, L2, E1, E2, R1, R2, on a scale from −4 (less active than controls) to 4 (more active than controls).]

    Rapid behavior-based identification of neuroactive small molecules in the zebrafish

    David Kokel1,2*, Jennifer Bryan3,4, Christian Laggner5, Rick White3, Chung Yan J Cheung1,2, Rita Mateus1,2, David Healey1,2, Sonia Kim1,2, Andreas A Werdich1, Stephen J Haggarty2,6,7, Calum A MacRae1, Brian Shoichet5 & Randall T Peterson1,2*

    NATURE CHEMICAL BIOLOGY | VOL 6 | MARCH 2010 | www.nature.com/naturechemicalbiology
    © 2010 Nature America, Inc. All rights reserved.

    Neuroactive small molecules are indispensable tools for treating mental illnesses and dissecting nervous system function. However, it has been difficult to discover novel neuroactive drugs. Here, we describe a high-throughput, behavior-based approach to neuroactive small molecule discovery in the zebrafish. We used automated screening assays to evaluate thousands of chemical compounds and found that diverse classes of neuroactive molecules caused distinct patterns of behavior. These ‘behavioral barcodes’ can be used to rapidly identify new psychotropic chemicals and to predict their molecular targets. For example, we identified new acetylcholinesterase and monoamine oxidase inhibitors using phenotypic comparisons and computational techniques. By combining high-throughput screening technologies with behavioral phenotyping in vivo, behavior-based chemical screens can accelerate the pace of neuroactive drug discovery and provide small-molecule tools for understanding vertebrate behavior.

    Neuroactive drugs discovered in the 1950s revolutionized our understanding of the nervous system and the treatment of its disorders [1]. Most of these drugs were discovered serendipitously when they produced unexpected behavioral changes in animals or humans. Elucidation of the targets of these behavior-modifying compounds led to insights into nervous system function, and many of the drugs used for treating nervous system disorders today were derived from those same, serendipitous discoveries. Unfortunately, few new classes of neuroactive molecules have been discovered in the last 50 years, in part because pharmaceutical discovery efforts are dominated by simple, in vitro screening assays that fail to capture the complexity of the vertebrate nervous system [2].

    Current drug discovery approaches are typically target based, meaning they seek to identify compounds that modify the in vitro activity of a specific protein target. These approaches benefit from being systematic and high throughput. However, they generally lack the ability to discover drugs that modify nervous system function in new ways.

    Unlike target-based approaches, phenotype-based screens can identify compounds that produce a desired phenotype without a priori assumptions about their targets. Phenotype-based screens in cultured cells and whole organisms have identified powerful new compounds with novel activities on unexpected targets in vivo [3]. However, it has been difficult to combine chemical-screening paradigms with behavioral phenotyping, perhaps because many well studied behaviors are too variable or occur in animals that are too large for screening in multi-well format.

    A common limitation of compounds discovered by phenotype-based methods is the difficulty in determining their mechanisms of action. It has been proposed that systems-level analyses of content-rich phenotypic data could be used to identify mechanistic similarities between compounds and predict their mechanisms of action [4]. Repositories of high-throughput screening data such as PubChem and ChemBank are beginning to make such analyses possible, but difficulties remain, including the challenges of comparing phenotypes across disparate assay types, libraries and experimental conditions. Theoretically, the behavioral effects of small molecules could provide sufficient content to enable compound characterization and prediction of their mechanisms of action. However, because behaviors can be complex and difficult to quantify, systems-level comparison of behavioral phenotypes would require conversion of the behaviors into simple, quantitative measures that are more amenable to such approaches.

    Given the unmet need for novel psychotropic drugs, we sought to develop a small-molecule discovery process that combined the scale of modern high-throughput screening with the biological complexity of behavioral phenotyping in living animals. Here, we report development of a fully automated platform for analyzing the behavioral effects of small molecules on embryonic zebrafish. Using this platform, we have identified hundreds of behavior-modifying compounds. We further demonstrate that complex behavioral changes can be distilled into simple behavioral ‘barcodes’ to classify psychotropic drugs and determine their mechanisms of action.

    RESULTS
    The photomotor response

    We discovered that a high-intensity light stimulus elicits a stereotypic series of motor behaviors in embryonic zebrafish that we call the photomotor response (PMR) (Fig. 1a,b and Supplementary Movies 1–3). The PMR can be divided into four broad phases: a pre-stimulus background phase, a latency phase, an excitation phase and a refractory phase (Fig. 1c). During the pre-stimulus phase, zebrafish embryos were mostly inactive, showing low basal activity characterized by spontaneous and infrequent body flexions within their chorions. Presentation of a light stimulus elicited a robust motor excitation phase (lasting 5–7 s) characterized by vigorous

    1 Cardiovascular Research Center and Division of Cardiology, Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Charlestown, Massachusetts, USA.
    2 Broad Institute, Cambridge, Massachusetts, USA.
    3 Department of Statistics and 4 Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada.
    5 Department of Pharmaceutical Chemistry, University of California, San Francisco, California, USA.
    6 Center for Human Genetic Research, Massachusetts General Hospital, Boston, Massachusetts, USA.
    7 Stanley Center for Psychiatric Research, Cambridge, Massachusetts, USA.
    * e-mail: dkokel@cvrc.mgh.harvard.edu or peterson@cvrc.mgh.harvard.edu
  5. 11.

    [Figure labelled "dopa". Compound labels (class in parentheses): Dihydrexidine HCl (DOPA); L−Cysteine sulfinic acid (EAA); 5−Methylurapidil (5HT); MK 212 HCl (5HT); 5654183 (unk); Blank (Control); Chloroethylclonidine 2HCl (ADR); 6951255 (unk); Glycopyrrolate (MUSC−); (±)−PD 128,907 HCl (DOPA); Kyotorphin (OPI); N6−P−Sulfophenyladenosine (ADS); BICUCULLINE (+) (GABA); VERATRIDINE (CHA); Dipropyl−6,7−ADTN HBr (DOPA); 6986473 (unk); Bromocriptine mesylate (DOPA); R(−)−2,10,11−Trihydroxyap (DOPA); Dihydroergocristine mesyl (DOPA); R(−)−2,10,11−Trihydroxyap (DOPA); A−77636 HCl (DOPA); A−77636 HCl (DOPA); O−Phospho−L−serine (EAA); Dipropyl−6,7−ADTN HBr (DOPA); (+)−PD 128907 HCl (DOPA); 5−Methylurapidil (5HT); Imidazole−4−acetic acid H (GABA); Dihydrexidine HCl (DOPA); Dihydroergocristine mesyl (DOPA); Dihydroergocristine mesyl (DOPA).]
  6. 13.
  7. 14.
  8. 15.

    How STAT 545 projects went sideways: An Incomplete List
    inability to …
      scrape data off the web
      … request data from an API
      … parse JSON or XML
    utter defeat by date times
    text encoding fiascos
    ineptitude with regular expressions
    R scripts that consume infinite time and RAM
    software installation gong shows
  9. 18.

    Professional Masters degree
    10-months full-time
    24 1-credit course modules
    6-credit Capstone Project
    Collaborative effort by STAT & CS (Faculty of Science)
  10. 19.

    Descriptive Statistics and Probability
    Statistical Inference and Computation I
    Statistical Inference and Computation II
    Regression I
    Regression II
    Spatial and Temporal Models
    Experimentation and Causal Inference
    Algorithms and Data Structures
    Databases and Data Retrieval
    STAT CS
    Supervised Learning I
    Supervised Learning II
    Unsupervised Learning
    Feature and Model Selection
    Advanced Machine Learning
    STAT/CS
    14/30 credits
  11. 20.

    Programming for Data Science
    Computing Platforms for Data Science
    Data Science Workflows
    Collaborative Software Development
    Web and Cloud Computing
    Data Wrangling
    Data Visualization I
    Data Visualization II
    Privacy, Ethics, and Security
    Communication and Argumentation
    Capstone Project
    16/30 credits
    STAT CS DS
  12. 22.

    "We don't have to teach data science, it's just a fancy word for statistics"
    "Why would we teach programming? This is a statistics course"
    Tweet by David Robinson @drob
  13. 23.

    pick one:
    data science is ‘just’ statistics
    data wrangling / programming / version control / visualization / testing / web apps / ... is not statistics
  14. 24.

    pick, at most, one:
    data science is ‘just’ statistics
    data wrangling / programming / version control / visualization / testing / web apps / ... is not statistics
  15. 25.

    We can say "data science is just statistics" if and only if we broaden the definition of "statistics".
  16. 28.
  17. 31.

    http://kbroman.org/hipsteR/
    hipsteR: re-educating people who learned R before it was cool
    … my knowledge of R seems stuck in 2001. I keep finding out about “new” R functions (like replicate, which was new in 2003). This is a tutorial for people like me, or people who were taught by people like me.
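For readers who have not met `replicate`, the function named in the quote above, here is a minimal base R sketch. The simulation itself (sample means of ten normal draws) is an illustrative assumption and does not come from the slides.

```r
# replicate() evaluates an expression n times and collects the results,
# which avoids writing an explicit loop with a pre-allocated vector.
set.seed(2018)  # arbitrary seed so the illustration is repeatable

# Five sample means, each computed from 10 standard normal draws.
sample_means <- replicate(5, mean(rnorm(10)))

length(sample_means)
```

The result is a numeric vector with one element per repetition.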
  18. 32.

    [Diagram: Tidy, Import, Visualise, Transform, Model, Program]
    Packages: tibble, tidyr, purrr, magrittr, dplyr, forcats, hms, ggplot2, broom, modelr, readr, readxl, haven, xml2, lubridate, stringr, tidyposterior, yardstick, rsample, recipes
    tidyverse.org
    r4ds.had.co.nz
  19. 35.

    source is real
    “The source code is real. The objects are realizations of the source code. Source for EVERY user modified object is placed in a particular directory or directories, for later editing and retrieval.”
    -- from the ESS manual
  20. 36.
  21. 37.

    If the first line of your R script is
    rm(list = ls())
    I will come into your office and SET YOUR COMPUTER ON FIRE.
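The slide does not spell out why `rm(list = ls())` is a poor way to get a clean slate. One reason that can be checked directly is that it only deletes objects in the workspace, while other session state survives. The sketch below is an editorial illustration; the object name `scratch_data` and the option name `demo.leftover` are made up for the example.

```r
# Create some session state of three different kinds.
scratch_data <- 1:10                      # an object in the workspace
options(demo.leftover = "still here")     # hypothetical option, for illustration
library(tools)                            # attach a package that ships with R

# The "clean slate" idiom from the slide.
rm(list = ls())

# Only the workspace object is gone; the option and the attached
# package are still in effect.
object_is_gone   <- !exists("scratch_data", inherits = FALSE)
option_survives  <- identical(getOption("demo.leftover"), "still here")
package_survives <- "package:tools" %in% search()
```

Restarting R and re-running the script from the top resets all three kinds of state, and it fits the "source is real" idea from the earlier slide.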
  22. 41.
  23. 42.
  24. 44.
  25. 45.

    One folder per project
    That folder is an
    • RStudio Project (package? website? whatever)
    • Or similar implementation in your IDE of choice
    • Git repo, with associated GitHub remote
    Work on multiple projects at once w/ multiple instances of RStudio (or other IDE)
    • Each gets own child R process
    • R & file browser have sane working directory
  26. 47.

    If the first line of your R script is
    setwd("C:\Users\jenny\path\that\only\I\have")
    I* will come into your office and SET YOUR COMPUTER ON FIRE.
    * or maybe Timothée Poisot will
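A minimal sketch of the alternative, assuming the script is run with the project's top-level folder as the working directory (the "one folder per project" setup). The `data` folder and `raw.csv` file name are hypothetical.

```r
# Fragile: an absolute path that exists on exactly one machine.
# setwd("C:/Users/jenny/path/that/only/I/have")

# Portable: describe the location relative to the project's top-level folder.
# file.path() joins the pieces with "/" on every platform.
data_path <- file.path("data", "raw.csv")   # hypothetical folder and file

data_path
```

Anyone who opens the project folder can then use the same relative path, wherever that folder sits on their own disk.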
  27. 49.

    Don’t rush into a complicated folder hierarchy
    Build paths relative to project’s top-level folder
    But once you need sub-folders …
    Use the here package to build paths
    install.packages("here")
  28. 50.

    ggsave(here("figs", "cleveland-alloc.png"))
    Works on my machine, works on yours!
    Works even if working directory is in a sub-folder
    Works for RStudio projects, Git repos, R packages, …
    Works with knitr / rmarkdown
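A small sketch of what `here()` returns, assuming the here package is already installed. It reuses the file name from the slide but only builds the path; the `ggsave()` call is left out so the example does not depend on ggplot2.

```r
library(here)

# here() locates the project's top-level folder and builds a path below it,
# so the result does not depend on which sub-folder R was started in.
fig_path <- here("figs", "cleveland-alloc.png")

# The printed value is an absolute path that ends in figs/cleveland-alloc.png;
# the leading part differs from machine to machine.
fig_path
```

When no project marker such as an `.Rproj` file or a Git repo is found, `here()` falls back to the current working directory, so the call still returns a usable path.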
  29. 52.
  30. 54.
  31. 55.

    Reason to iterate #2: Refine and Extend
    Make your code more
    Readable
    Efficient
    Resilient
    General
  32. 62.

    Input                          Code           Output
    raw data                       smell-test.R   wisdom
    raw data                       wrangle.R      data.csv
    data.csv                       model.R        fits.rds, ests.csv
    data.csv, fits.rds, ests.csv   make-figs.R    figs/*
    figs/*, ests.csv               report.Rmd     report.html, report.docx, report.pdf
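One way to read the Input / Code / Output slide is as a chain in which each script consumes only files that an earlier step has already produced. The sketch below records the slide's rows as a plain R list and checks that property. The list layout and variable names are assumptions made for illustration, and no script is actually run.

```r
# Each step from the slide: the code file, what it reads, what it writes.
pipeline <- list(
  list(code = "smell-test.R", input = "raw data", output = "wisdom"),
  list(code = "wrangle.R",    input = "raw data", output = "data.csv"),
  list(code = "model.R",      input = "data.csv",
       output = c("fits.rds", "ests.csv")),
  list(code = "make-figs.R",  input = c("data.csv", "fits.rds", "ests.csv"),
       output = "figs/*"),
  list(code = "report.Rmd",   input = c("figs/*", "ests.csv"),
       output = c("report.html", "report.docx", "report.pdf"))
)

# Walk the steps in order, tracking which artifacts exist so far.
available <- "raw data"
all_inputs_met <- TRUE
for (step in pipeline) {
  unmet <- setdiff(step$input, available)
  if (length(unmet) > 0) {
    all_inputs_met <- FALSE
    message(step$code, " needs something not yet produced: ",
            paste(unmet, collapse = ", "))
  }
  available <- union(available, step$output)
}

all_inputs_met
```

Because every input is met in the order shown, the scripts could be run top to bottom from a fresh session and regenerate every output from the raw data.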
  33. 66.

    “commit”
    a file or project state that is meaningful to you for inspection, comparison, restoration
  34. 68.
  35. 69.
  36. 70.

    Excuse me, do you have a moment to talk about version control?
    https://doi.org/10.7287/peerj.preprints.3159v2
  37. 72.
  38. 73.

    Good enough practices in scientific computing
    Wilson, Bryan, Cranston, Kitzes, Nederbragt, Teal
    https://doi.org/10.1371/journal.pcbi.1005510
    http://bit.ly/good-enuff
  39. 74.

    Thanks:
    Mara Averick
    Matthew Lincoln
    Hadley Wickham
    STAT 545 TAs
    UBC MDS Fellows & Faculty
    @JennyBryan @jennybc