
Zen And The aRt Of Workflow Maintenance


Links: https://rstd.io/jenny-latinr

Delivered at

* Latin-R, http://latin-r.com, Buenos Aires 04-05 Sept 2018
* 10th Conference of the IASC-ARS/68th Annual NZSA Conference, http://www.nzsa2017.com, Auckland, 10-14 Dec 2017

Jennifer (Jenny) Bryan

September 04, 2018

Transcript

  1.  
    Zen And The aRt Of
    Workflow Maintenance
    Jennifer Bryan 

    RStudio, University of British Columbia
    @JennyBryan @jennybc


  2. rstd.io/jenny-latinr
    links to stuff in this talk!!


  3. Is data science just a trendy
    term for statistics?
    7 Life-Changing Workflow
    Tips Every useR Should Know


  4. Is data science just a trendy
    term for statistics?


  5. Is data science just a trendy
    term for statistics?
    No.


  6. Import
    Tidy
    Communicate
    Transform
    Visualise
    Model


  7. Import
    Tidy
    Communicate
    Transform
    Visualise
    Model


  8. Quandary of the applied statistician


  9. [Figure: Motion index (0 to 100) against Aligned Time (s) (0 to 30) for 2008−10−07−p03−01−D11,
    with phases PRE, L1, L2, E1, E2, R1, R2 marked.]


  10. [Figure: Motion index (0 to 100) against Aligned Time (s) (0 to 30) for 2009−03−25−p03−01−C10,
    2009−04−10−p01−01−D03, 2008−01−23−p01−03−E08 and 2008−02−14−p03−02−G09; panels labeled D to O
    with dated rows for isoproterenol, diazepam, R(−)−apomorphine, digitoxigenin and
    6−nitroquipazine maleate across phases PRE, L1, L2, E1, E2, R1, R2, on a scale from −4
    (less active than controls) to 4 (more active than controls).]
    NATURE CHEMICAL BIOLOGY | VOL 6 | MARCH 2010 | www.nature.com/naturechemicalbiology
    NATURE CHEMICAL BIOLOGY | ADVANCE ONLINE PUBLICATION | www.nature.com/naturechemicalbiology
    © 2010 Nature America, Inc. All rights reserved.

    Rapid behavior-based identification of neuroactive small molecules in the zebrafish

    David Kokel1,2*, Jennifer Bryan3,4, Christian Laggner5, Rick White3, Chung Yan J Cheung1,2,
    Rita Mateus1,2, David Healey1,2, Sonia Kim1,2, Andreas A Werdich1, Stephen J Haggarty2,6,7,
    Calum A MacRae1, Brian Shoichet5 & Randall T Peterson1,2*

    Neuroactive small molecules are indispensable tools for treating mental illnesses and dissecting
    nervous system function. However, it has been difficult to discover novel neuroactive drugs. Here,
    we describe a high-throughput, behavior-based approach to neuroactive small molecule discovery in
    the zebrafish. We used automated screening assays to evaluate thousands of chemical compounds and
    found that diverse classes of neuroactive molecules caused distinct patterns of behavior. These
    ‘behavioral barcodes’ can be used to rapidly identify new psychotropic chemicals and to predict
    their molecular targets. For example, we identified new acetylcholinesterase and monoamine oxidase
    inhibitors using phenotypic comparisons and computational techniques. By combining high-throughput
    screening technologies with behavioral phenotyping in vivo, behavior-based chemical screens can
    accelerate the pace of neuroactive drug discovery and provide small-molecule tools for
    understanding vertebrate behavior.

    Neuroactive drugs discovered in the 1950s revolutionized our understanding of the nervous system
    and the treatment of its disorders1. Most of these drugs were discovered serendipitously when they
    produced unexpected behavioral changes in animals or humans. Elucidation of the targets of these
    behavior-modifying compounds led to insights into nervous system function, and many of the drugs
    used for treating nervous system disorders today were derived from those same, serendipitous
    discoveries.

    Unfortunately, few new classes of neuroactive molecules have been discovered in the last 50 years,
    in part because pharmaceutical discovery efforts are dominated by simple, in vitro screening assays
    that fail to capture the complexity of the vertebrate nervous system2. Current drug discovery
    approaches are typically target based, meaning they seek to identify compounds that modify the
    in vitro activity of a specific protein target. These approaches benefit from being systematic and
    high throughput. However, they generally lack the ability to discover drugs that modify nervous
    system function in new ways.

    Unlike target-based approaches, phenotype-based screens can identify compounds that produce a
    desired phenotype without a priori assumptions about their targets. Phenotype-based screens in
    cultured cells and whole organisms have identified powerful new compounds with novel activities on
    unexpected targets in vivo3. However, it has been difficult to combine chemical-screening paradigms
    with behavioral phenotyping, perhaps because many well studied behaviors are too variable or occur
    in animals that are too large for screening in multi-well format.

    A common limitation of compounds discovered by phenotype-based methods is the difficulty in
    determining their mechanisms of action. It has been proposed that systems-level analyses of
    content-rich phenotypic data could be used to identify mechanistic similarities between compounds
    and predict their mechanisms of action4. Repositories of high-throughput screening data such as
    PubChem and ChemBank are beginning to make such analyses possible, but difficulties remain,
    including the challenges of comparing phenotypes across disparate assay types, libraries and
    experimental conditions. Theoretically, the behavioral effects of small molecules could provide
    sufficient content to enable compound characterization and prediction of their mechanisms of
    action. However, because behaviors can be complex and difficult to quantify, systems-level
    comparison of behavioral phenotypes would require conversion of the behaviors into simple,
    quantitative measures that are more amenable to such approaches.

    Given the unmet need for novel psychotropic drugs, we sought to develop a small-molecule discovery
    process that combined the scale of modern high-throughput screening with the biological complexity
    of behavioral phenotyping in living animals. Here, we report development of a fully automated
    platform for analyzing the behavioral effects of small molecules on embryonic zebrafish. Using this
    platform, we have identified hundreds of behavior-modifying compounds. We further demonstrate that
    complex behavioral changes can be distilled into simple behavioral ‘barcodes’ to classify
    psychotropic drugs and determine their mechanisms of action.

    RESULTS
    The photomotor response
    We discovered that a high-intensity light stimulus elicits a stereotypic series of motor behaviors
    in embryonic zebrafish that we call the photomotor response (PMR) (Fig. 1a,b and Supplementary
    Movies 1–3). The PMR can be divided into four broad phases: a pre-stimulus background phase, a
    latency phase, an excitation phase and a refractory phase (Fig. 1c). During the pre-stimulus phase,
    zebrafish embryos were mostly inactive, showing low basal activity characterized by spontaneous and
    infrequent body flexions within their chorions. Presentation of a light stimulus elicited a robust
    motor excitation phase (lasting 5–7 s) characterized by vigorous

    1Cardiovascular Research Center and Division of Cardiology, Department of Medicine, Massachusetts
    General Hospital, Harvard Medical School, Charlestown, Massachusetts, USA. 2Broad Institute,
    Cambridge, Massachusetts, USA. 3Department of Statistics and 4Michael Smith Laboratories,
    University of British Columbia, Vancouver, British Columbia, Canada. 5Department of Pharmaceutical
    Chemistry, University of California, San Francisco, California, USA. 6Center for Human Genetic
    Research, Massachusetts General Hospital, Boston, Massachusetts, USA. 7Stanley Center for
    Psychiatric Research, Cambridge, Massachusetts, USA. *e-mail: [email protected] or [email protected]


  11. dopa
    Dihydrexidine HCl (DOPA)
    L−Cysteine sulfinic acid (EAA)
    5−Methylurapidil (5HT)
    MK 212 HCl (5HT)
    5654183 (unk)
    Blank (Control)
    Chloroethylclonidine 2HCl (ADR)
    6951255 (unk)
    Glycopyrrolate (MUSC−)
    (±)−PD 128,907 HCl (DOPA)
    Kyotorphin (OPI)
    N6−P−Sulfophenyladenosine (ADS)
    BICUCULLINE (+) (GABA)
    VERATRIDINE (CHA)
    Dipropyl−6,7−ADTN HBr (DOPA)
    6986473 (unk)
    Bromocriptine mesylate (DOPA)
    R(−)−2,10,11−Trihydroxyap (DOPA)
    Dihydroergocristine mesyl (DOPA)
    R(−)−2,10,11−Trihydroxyap (DOPA)
    A−77636 HCl (DOPA)
    A−77636 HCl (DOPA)
    O−Phospho−L−serine (EAA)
    Dipropyl−6,7−ADTN HBr (DOPA)
    (+)−PD 128907 HCl (DOPA)
    5−Methylurapidil (5HT)
    Imidazole−4−acetic acid H (GABA)
    Dihydrexidine HCl (DOPA)
    Dihydroergocristine mesyl (DOPA)
    Dihydroergocristine mesyl (DOPA)




  12. http://stat545.com


  13.

  14.

  15. How STAT 545 projects went sideways:
    An Incomplete List
    inability to
    … scrape data off the web
    … request data from an API
    … parse JSON or XML
    utter defeat by date times
    text encoding fiascos
    ineptitude with regular expressions
    R scripts that consume infinite time and RAM
    software installation gong shows


  16. @JennyBryan
    @jennybc
    @STAT545
    http://stat545.com



    What if …
    I actually taught that?!


  17. Step 1:
    get better at it myself!


  18. Professional Masters degree
    10-months full-time
    24 1-credit course modules
    6-credit Capstone Project
    Collaborative effort by STAT & CS (Faculty of Science)


  19. Descriptive Statistics and Probability
    Statistical Inference and Computation I
    Statistical Inference and Computation II
    Regression I
    Regression II
    Spatial and Temporal Models
    Experimentation and Causal Inference
    Algorithms and Data Structures
    Databases and Data Retrieval
    STAT
    CS
    Supervised Learning I
    Supervised Learning II
    Unsupervised Learning
    Feature and Model Selection
    Advanced Machine Learning
    STAT/CS
    14/30 credits


  20. Programming for Data Science
    Computing Platforms for Data Science
    Data Science Workflows
    Collaborative Software Development
    Web and Cloud Computing
    Data Wrangling
    Data Visualization I
    Data Visualization II
    Privacy, Ethics, and Security
    Communication and Argumentation
    Capstone Project
    16/30 credits
    STAT
    CS
    DS


  21. via Donoho’s 50 years of Data Science


  22. "We don't have to teach data science,
    it's just a fancy word for statistics"
    "Why would we teach programming?
    This is a statistics course"
    Tweet by David Robinson @drob


  23. pick one:
    data science is ‘just’ statistics
    data wrangling is not statistics
    programming
    version control
    visualization
    testing
    web apps
    ...


  24. pick, at most, one:
    data science is ‘just’ statistics
    data wrangling is not statistics
    programming
    version control
    visualization
    testing
    web apps
    ...


  25. We can say
    "data science is just statistics"
    if and only if
    we broaden the
    definition of "statistics".


  26. https://doi.org/10.1080/10618600.2017.1389743


  27. Is data science just a trendy
    term for statistics?
    No.


  28.

  29. from Mächler’s talk


  30. R has
    changed
    since you
    graduated


  31. http://kbroman.org/hipsteR/
    hipsteR re-educating people who learned R before it was cool
    … my knowledge of R seems stuck in 2001. I keep finding out
    about “new” R functions (like replicate, which was new in
    2003).
    This is a tutorial for people like me, or people who were
    taught by people like me.


  32. Tidy
    Import Visualise
    Transform
    Model
    Program
    tibble
    tidyr
    purrr
    magrittr
    dplyr
    forcats
    hms
    ggplot2
    broom
    modelr
    readr
    readxl
    haven
    xml2
    lubridate
    stringr
    tidyverse.org r4ds.had.co.nz
    tidyposterior
    yardstick
    rsample
    recipes


  33. Use an IDE
    Integrated
    Development
    Environment


  34. Emacs + ESS
    vim + Nvim-R

    RStudio


  35. source is real
    “The source code is real. The objects are
    realizations of the source code. Source for
    EVERY user modified object is placed in a
    particular directory or directories, for later
    editing and retrieval.”
    -- from the ESS manual


  36.

  37. If the first line of your R script is
    rm(list = ls())
    I will come into your office and
    SET YOUR COMPUTER ON FIRE.


  38. Restart R with a clean slate OFTEN,
    e.g., multiple times per day
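
A minimal sketch of why rm(list = ls()) is not a clean slate, assuming a plain R session run at top level. The object name leftover_df is made up for illustration, and splines is used only because it ships with R. The call removes objects from the workspace but leaves attached packages and changed options in place; restarting R resets all three.

```r
# Simulate a session that has accumulated state.
leftover_df <- data.frame(a = 1:3)   # hypothetical object, for illustration only
library(splines)                     # any attached package; splines ships with R
options(digits = 3)                  # a changed global option (the default is 7)

rm(list = ls())                      # the "clean slate" attempt from the slide

# Record what survived the rm() call.
object_gone      <- !exists("leftover_df")           # the object is removed
package_attached <- "package:splines" %in% search()  # the package is still attached
digits_now       <- getOption("digits")              # the option is still changed

# Tidy up so this sketch does not leak state itself.
options(digits = 7)
detach("package:splines")
```

After running this, object_gone and package_attached are both TRUE and digits_now is 3, so code that follows rm(list = ls()) can still depend on state that a fresh R process would not have.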


  39. Accept help re: missing )’s or errant ,s


  40. use [Pp]rojects


  41.

  42.

  43. @JennyBryan
    @jennybc
    @STAT545
    http://stat545.com




  44.

  45. One folder per project
    That folder is an
    • RStudio Project (package? website? whatever)
    • Or similar implementation in your IDE of choice
    • Git repo, with associated GitHub remote
    Work on multiple projects at once w/ multiple
    instances of RStudio (or other IDE)
    • Each gets own child R process
    • R & file browser have sane working directory


  46. use portable
    file paths


  47. If the first line of your R script is
    setwd("C:\Users\jenny\path\that\only\I\have")
    I* will come into your office and
    SET YOUR COMPUTER ON FIRE.
    * or maybe Timothée Poisot will


  48. Blog post: Project-oriented workflow


  49. Don’t rush into a complicated folder hierarchy
    Build paths relative to project’s top-level folder
    But once you need sub-folders …
    Use the here package to build paths
    install.packages("here")


  50. ggsave(here("figs", "cleveland-alloc.png"))
    Works on my machine, works on yours!
    Works even if working directory is in a sub-folder
    Works for RStudio projects, Git repos, R packages, …
    Works with knitr / rmarkdown
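
A hedged sketch of the same path-building idea outside ggsave(), assuming the here package may or may not be installed. The fallback branch is an addition for illustration, not something from the talk: it builds the same relative path with base R's file.path(), which is only correct when the working directory is the project's top-level folder.

```r
# Path pieces, never a hard-coded absolute path like the setwd() example.
fig_rel <- file.path("figs", "cleveland-alloc.png")

if (requireNamespace("here", quietly = TRUE)) {
  # here::here() anchors the path at the project root it detects,
  # so it works even when the working directory is a sub-folder.
  fig_path <- here::here("figs", "cleveland-alloc.png")
} else {
  # Fallback (assumption for this sketch): relative to the current working directory.
  fig_path <- fig_rel
}

fig_path
```

Either way, the script names only folders inside the project, so nothing in it depends on one person's home directory.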


  51. expect to
    iterate


  52.

  53. Reason to iterate #1:
    Get it right!
    New data?
    New understanding of data?


  54.

  55. Reason to iterate #2:
    Refine and Extend
    Make your code more
    Readable
    Efficient
    Resilient
    General


  56. beware of
    monoliths


  57. break logic &
    output into
    pieces


  58. everything.R
    smell-test.R
    wrangle.R
    model.R
    make-figs.R
    report.Rmd
    >>>


  59. smell-test.R
    wrangle.R
    model.R
    make-figs.R
    report.Rmd


  60. .Rdata
    raw-data.xlsx
    data.csv
    fits.rds
    ests.csv
    >>>


  61. raw-data.xlsx
    data.csv
    fits.rds
    ests.csv
    figs/hist.png
    figs/dot.png


  62. Input                         Code          Output
      raw data                      smell-test.R  wisdom
      raw data                      wrangle.R     data.csv
      data.csv                      model.R       fits.rds, ests.csv
      data.csv, fits.rds, ests.csv  make-figs.R   figs/*
      figs/*, ests.csv              report.Rmd    report.html, report.docx, report.pdf
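
A minimal sketch of the first two hand-offs in that table, assuming everything lives in one throwaway folder. The built-in mtcars data stands in for raw-data.xlsx, and the filter and model are invented for illustration. The point is only that each piece reads its inputs from files and writes its outputs to files, so any piece can be rerun on its own.

```r
proj <- file.path(tempdir(), "demo-project")   # hypothetical project folder
dir.create(proj, showWarnings = FALSE)

# --- wrangle.R: raw data -> data.csv ---
raw <- mtcars                                  # stand-in for the raw data
dat <- raw[raw$cyl %in% c(4, 6), c("mpg", "wt", "cyl")]
write.csv(dat, file.path(proj, "data.csv"), row.names = FALSE)

# --- model.R: data.csv -> fits.rds + ests.csv ---
dat <- read.csv(file.path(proj, "data.csv"))   # start from the file, not the workspace
fit <- lm(mpg ~ wt, data = dat)
saveRDS(fit, file.path(proj, "fits.rds"))      # full model object, for R
ests <- data.frame(term = names(coef(fit)), estimate = unname(coef(fit)))
write.csv(ests, file.path(proj, "ests.csv"), row.names = FALSE)  # plain text, for humans

list.files(proj)
```

In a real project each commented section would be its own script, and make-figs.R and report.Rmd would pick up data.csv, fits.rds and ests.csv in the same way.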


  63. a humane API
    for your analysis


  64. consider version control


  65. I use Git + GitHub


  66. “commit”
    a file or project state that is meaningful to you
    for inspection, comparison, restoration


  67. “diff”
    What changed here?
    Why?
    Δ
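
A rough sketch of one commit followed by one diff, driven from R via system2(). It assumes the git command-line tool is on the PATH and skips itself otherwise. The repository folder, file contents, commit message and the demo user name and email are all made up for illustration, and the small git() helper is defined here, not taken from any package.

```r
has_git <- nzchar(Sys.which("git"))

if (has_git) {
  repo <- file.path(tempdir(), "demo-repo")    # throwaway folder, hypothetical name
  dir.create(repo, showWarnings = FALSE)
  script <- file.path(repo, "wrangle.R")

  # Helper for this sketch only: run a git command inside `repo`.
  git <- function(...) {
    system2("git", c("-C", shQuote(repo), ...), stdout = TRUE, stderr = TRUE)
  }

  git("init", "--quiet")
  writeLines("x <- 1", script)
  git("add", "wrangle.R")

  # "commit": record a project state that is meaningful to you.
  git("-c", "user.name=demo", "-c", "user.email=demo@example.com",
      "commit", "--quiet", "-m", shQuote("Add first draft of wrangle.R"))

  # Change the file, then ask "what changed here?"
  writeLines("x <- 2", script)
  diff_out <- git("diff", "--no-color")
  print(diff_out)
}
```

The diff output marks the removed line with a leading minus and the added line with a leading plus; the commit message is where the "why?" goes.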


  68.

  69.

  70. Excuse me, do you have a moment
    to talk about version control?
    https://doi.org/10.7287/peerj.preprints.3159v2


  71. happygitwithr.com


  72.

  73. Good enough practices in scientific computing
    Wilson, Bryan, Cranston, Kitzes, Nederbragt, Teal
    https://doi.org/10.1371/journal.pcbi.1005510
    http://bit.ly/good-enuff


  74. Thanks:
    Mara Averick
    Matthew Lincoln
    Hadley Wickham
    STAT 545 TAs
    UBC MDS Fellows & Faculty
    @JennyBryan
    @jennybc

