Get your Research Project/Article Organized, Shareable and Reproducible using R and friends


Reproducible research practices are central to scholarly dissemination and communication. The importance of sharing data and computational code has been highlighted extensively as a way to promote transparency and replication of scientific results. Data and code sharing practices are rare in plant pathology but are expected to increase in the near future. It is urgent that plant pathologists get training in how to organize their research in a way that promotes effective project management, reproducibility, collaboration and sharing of results. Standardized procedures for the collection and systematic organization of research outcomes (data files, code, graphs, manuscript, etc.), known as a "research compendium", have been proposed, and tools are available to facilitate its production. Participants will learn how to use existing R tools to produce a research compendium that optimizes workflow and enhances the efficiency of analysis, re-analysis and sharing. The workshop will also introduce the Open Science Framework, websites for generating Digital Object Identifiers (DOI), how and why to post preprints, and tools for making independent and automated analyses.


Emerson M. Del Ponte

August 03, 2019

Transcript

  1. Get your Research Project/Article Organized, Shareable and Reproducible using R and friends
    Emerson M. Del Ponte, Open Plant Pathology, Universidade Federal de Viçosa. 4-h workshop
  2. Why Reproducible Research (RR)? Data + Protocols + Code → Knowledge
    [Diagram: Reproducible Research Practices (documenting and sharing); efficiency (short-term effect); accessibility (short- and long-term effect); transparency; open information; reproducibility; replicability]
  3. Why change/learn new things?
    - Sponsors/journals require data (already standard for molecular data)
    - Work more efficiently and facilitate collaborations
    - Improved reproducibility (data and/or methods)
    - Technology (less cumbersome) is becoming available
    - Enhanced visibility and transparency
    - Multiple citable outcomes: data, code, manuscript, etc.
  4. When RR? Idea → Register (proposal) → Run experiments → Get data → Analyze → Communicate
  5. When RR? Idea → Register: preregistration of studies [Images: University/Institution; Laboratory computer]
  6. When RR? Idea → Register: preregistration of studies (https://osf.io/6tsnj) [Images: University/Institution; Laboratory computer]
  7. When RR? Run experiments

  8. When RR? Run experiments

  9. When RR? Get data: submit datasets, with accompanying data descriptors, to discipline-specific or generalist repositories
  10. When RR? Analyze: research compendium

  11. When RR? Communicate: abstract, preprint, OA paper, paywalled paper, Quick Files, talk, poster
  12. How RR? [Diagram of the research cycle around Science: collect, analyse, write, publish, summarise, reproduce, re-analyse (meta-analysis). Supporting practices: share data (open repository), share code, open/free tools, collaborative tools, citation manager, preprints, Open Access]
  13. How are we plant pathologists doing? Sparks et al. (unpublished) [Chart of article counts; values shown: 101 and 99]
  14. Are data made available? Sparks et al. (unpublished) [Chart categories: no; upon request; paywalled; free access]
  15. Is code made available? Sparks et al. (unpublished) [Chart categories: no; free access]
  16. Is software properly cited? Sparks et al. (unpublished) [Chart categories: no; name only; version #; full citation]
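The slide above asks whether software is properly cited. In R, the full citation and the version numbers can be pulled from the running session with base functions (`citation()`, `toBibtex()`, `R.version.string`, `packageVersion()`). A minimal sketch; the `stats` package is only a stand-in for whatever packages an analysis actually uses.

```r
# Cite R and report versions using base R only.
# "stats" is an example; replace it with the packages your analysis used.

# Full text citation of R itself
r_citation <- format(citation(), style = "text")
cat(r_citation, sep = "\n")

# BibTeX entry, e.g. to paste into the manuscript's .bib file
print(toBibtex(citation()))

# Version numbers to report in the methods section
cat(R.version.string, "\n")
stats_version <- packageVersion("stats")
cat("stats version:", format(stats_version), "\n")
```

For a contributed package, `citation("pkgname")` returns that package's preferred citation in the same format.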
  17. What software is being used? Sparks et al. (unpublished)

  18. Why is RR being adopted so slowly?
    - Lack of interest/knowledge (supplemental material rarely posted)
    - Low incentive/pressure, but that may change!
    - Perception that it takes time and effort to document data and code, and to version and maintain code
    - FOBS: fear of being scooped?
    - Not valued/taught in our graduate programs
  19. OK, I want to do it differently, but how? Tools, workflows, environments, and collaborative & sharing platforms for a research project that is organized, documented, shared, accessible and reproducible
  20. Save money on software! Use R and friends for tasks usually done in other programs:
    - Data wrangling: Excel
    - Data visualization: Excel
    - Data analysis: SAS, STATA
    - Scientific plots: SigmaPlot
    - Text editor: MS Word
  21. Start small! Then build on it... (reproducibility level 0 to 3)
    - Level 0: article
    - Level 1: article (+ preprint) + supplemental (zip): protocols, data
    - Level 2: article + preprint + repository (citable): protocols, data, code
    - Level 3: article + preprint + research compendium: raw data, clean data, analysis (reproducible)
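At level 3, a research compendium can start as nothing more than a folder convention. The sketch below uses a hypothetical helper, `create_compendium()` (not part of any package), to build a minimal skeleton in base R. The folder names are assumptions that mirror the raw data / clean data / analysis split on the slide; they are not the layout generated by tools such as rrtools.

```r
# Hypothetical helper: create a minimal research-compendium skeleton.
# Folder names are illustrative assumptions, not a standard.
create_compendium <- function(path) {
  dirs <- file.path(path, c("data/raw_data",    # original data, never edited by hand
                            "data/clean_data",  # data produced by the analysis code
                            "analysis",         # R Markdown / R scripts
                            "output/figures"))  # generated plots
  for (d in dirs) dir.create(d, recursive = TRUE, showWarnings = FALSE)
  writeLines(c("# Research compendium",
               "",
               "raw_data/ is read-only; clean_data/ is produced by the code in analysis/."),
             file.path(path, "README.md"))
  invisible(dirs)
}

rc <- file.path(tempdir(), "my-compendium")
create_compendium(rc)
list.files(rc, recursive = TRUE, include.dirs = TRUE)
```

Keeping raw and cleaned data in separate folders makes it clear which files can be regenerated from code and which must be preserved as they were collected.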
  22. A new research/submission workflow? Project → data → analysis → manuscript → preprint → journal submission system → early view → final publication → poster/talk (Research Compendium)
  23. How RR? Sources: http://inundata.org/talks/rstd19/ and https://research-compendium.science/

  24. How RR? http://inundata.org/talks/rstd19/

  25. How RR? https://www.tandfonline.com/doi/full/10.1080/00031305.2017.1375986 and https://peerj.com/preprints/3192/

  26. R package structure as inspiration: small, medium and large compendia

  27. Structures/templates for RC (not a package): minimal RC; short RC webpage; full RC website + manuscript (with BIB and CSL files)
  28. Let's work! Five workout sessions (45 min each):
    1) Introduction
    2) RStudio project + GitHub
    3) The research compendium
    4) Manuscript in RMarkdown
    5) Packages for automating tasks & RStudio Cloud
  29. Everything starts as a PROJECT!

  30. Let's practice: RStudio and its friend Git
    1. Download and install Git (GitHub Desktop)
    2. Download and install RStudio
    3. Go to GitHub and create a new repository
  31. Your turn: RStudio and its friend Git
    1. Create a new RStudio Project from Git
    2. Add your repository URL
  33. Contribute to a repository:
    1. Clone a repository (open in GitHub Desktop)
    2. https://github.com/emdelponte/paper-FHB-yield-loss
    3. Change files
    4. Create a pull request to the owner
  34. Let's explore a research compendium: short RC webpage
    1. Fork the repository (short webpage compendium)
       a. https://github.com/emdelponte/RC-example-webpage
    2. Explore the files
    3. Reproduce the analysis
    4. Knit to generate the webpage
    5. Commit and push to your GitHub account
  35. GitHub and Open Science Framework (OSF)
    1. Link GitHub and OSF
    2. Create an OSF project
  36. GitHub and Open Science Framework
    1. Link GitHub and Open Science Framework
    2. Name your project
    3. Enable the GitHub add-on
  37. GitHub and Open Science Framework
    1. Link GitHub and Open Science Framework
    2. Name your project
    3. Select the GitHub add-on
    4. Configure the add-on
  38. GitHub and Open Science Framework
    1. Finish the description

  39. Let's practice: data management
    1. Create/open an RMarkdown file
       a. Modify the output parameters
    2. Load data
       a. .csv file
       b. .xlsx file
       c. .gsheet file
    3. Add some basic commenting
    4. Do some basic wrangling
    5. Export data to .csv
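Steps 2 to 5 of the data-management practice can be sketched in base R as below. The small data set (plots, treatments, disease severity) is made up and written to a temporary file only so the example runs anywhere; in a real compendium `read.csv()` would point at a file in the project's data folder. For the other formats, `readxl::read_excel()` reads .xlsx files and `googlesheets4::read_sheet()` reads Google Sheets, but neither is used here.

```r
# Hypothetical raw data, saved to a temporary .csv so the example is self-contained
raw_file <- tempfile(fileext = ".csv")
write.csv(data.frame(plot      = c(1, 1, 2, 2),
                     treatment = c("control", "fungicide", "control", "fungicide"),
                     severity  = c(35, 12, NA, 9)),   # disease severity (%)
          raw_file, row.names = FALSE)

# 2. Load the data
dat <- read.csv(raw_file, stringsAsFactors = FALSE)

# 4. Basic wrangling: drop rows with missing severity, add a proportion column
clean <- subset(dat, !is.na(severity))
clean$severity_prop <- clean$severity / 100

# Mean severity per treatment
summary_tab <- aggregate(severity ~ treatment, data = clean, FUN = mean)
summary_tab

# 5. Export the cleaned data to .csv
out_file <- tempfile(fileext = ".csv")
write.csv(clean, out_file, row.names = FALSE)
```

Writing the cleaned table to a new file, rather than overwriting the raw one, keeps the raw data untouched and the cleaning step reproducible.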
  40. How RR for data management: get data. Data file types: binary, text files, web-based
  41. Data management: must read!

  42. Data management: organizing, naming, shaping! (Broman and Woo, 2018)
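Two of Broman and Woo's (2018) recommendations, choosing good names and writing dates as YYYY-MM-DD, are easy to apply in code. A minimal sketch: `tidy_names()` is a hypothetical base-R helper written for this example, and the column names and the day/month/year input dates are assumptions.

```r
# Hypothetical helper: lower-case names, with runs of spaces/symbols replaced by "_"
tidy_names <- function(x) {
  x <- gsub("[^A-Za-z0-9]+", "_", x)  # e.g. "Yield (kg/ha)" -> "Yield_kg_ha_"
  x <- gsub("^_|_$", "", x)           # drop leading/trailing "_"
  tolower(x)
}

# Made-up sheet with spreadsheet-style names and day/month/year dates
raw <- data.frame(`Plot ID` = 1:2,
                  `Yield (kg/ha)` = c(3200, 2950),
                  `Sampling date` = c("03/08/2019", "17/08/2019"),
                  check.names = FALSE)

names(raw) <- tidy_names(names(raw))

# Store dates as Date objects; they print and export as YYYY-MM-DD
raw$sampling_date <- as.Date(raw$sampling_date, format = "%d/%m/%Y")
raw
```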

  43. Analyze

  44. Analyze

  45. Fork or download a repository
    1. Fork the RC as webpage template
       a. https://github.com/emdelponte/RC-example-webpage
    2. Download the files from GitHub
    3. Start with a new RStudio Project + Git
    4. Reproduce the analysis
    5. Change some content/parameters
    6. Push it to your GitHub account
  46. Add local folder and create a GitHub Repository

  47. Add local folder and create a GitHub Repository

  48. Let's practice
    1. Fork the RC as website template
       a. https://github.com/emdelponte/RC-example-website
    2. Change and/or generate the website (knit)
    3. Push changes to your GitHub
    4. Create a GitHub webpage (GitHub Pages) for it
    5. Send it to your OSF project
  49. RMarkdown templates
    1. RMarkdown HTML from the basic RStudio templates
    2. rmdformats: https://github.com/juba/rmdformats
    3. Distill for RMarkdown: https://github.com/rstudio/distill
    4. RMarkdown websites: https://rmarkdown.rstudio.com/lesson-13.html
  50. Manuscript in RMarkdown? YAML header:

```yaml
---
title: "The title goes here"
author: "Author name goes here"
output:
  html_document: default
  word_document:
    reference_docx: template.docx
linestretch: 2
link-citations: yes
linkcolor: blue
csl: chicago-author-date.csl
bibliography: crossref.bib
---
```

    BIB and CSL files; CSL styles: https://www.zotero.org/styles
  51. Automating RC website creation: the workflowr package (https://jdblischak.github.io/workflowr)
    Organized
    • Provides a project template with organized subdirectories
    • Mixes code and results with R Markdown
    • Uses Git to version both source code and results
    Reproducible
    • Displays the code version used to create each result
    • Runs each analysis in an isolated R session
    • Records the session information of each analysis
    • Sets the same seed for random number generation for each analysis
    Shareable
    • Creates a website to present your research results
    • Documents how to host your website for free via GitHub Pages or GitLab Pages
    • Creates links to past versions of results
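Two of the reproducibility features listed for workflowr, a fixed seed for each analysis and recorded session information, can be illustrated in plain R. This is not workflowr's code: `run_analysis()` and `simulate_incidence()` are hypothetical, the seed value is arbitrary, and unlike workflowr the sketch runs in the current R session rather than an isolated one.

```r
# Hypothetical wrapper: fix the RNG seed before an analysis and keep the session info
run_analysis <- function(analysis, seed = 12345) {
  set.seed(seed)                        # same seed every time the analysis runs
  list(result  = analysis(),
       session = utils::sessionInfo())  # R version, platform and attached packages
}

# Hypothetical analysis with a random component: simulated disease incidence
simulate_incidence <- function() mean(rbinom(50, size = 20, prob = 0.3) / 20)

run1 <- run_analysis(simulate_incidence)
run2 <- run_analysis(simulate_incidence)

identical(run1$result, run2$result)  # same seed, so the same result
run1$session$R.version$version.string
```

Because the seed is set inside the wrapper, re-running any single analysis gives the same random draws regardless of what was run before it in the session.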
  52. workflowr package: https://jdblischak.github.io/workflowr/articles/wflow-01-getting-started.html

  53. rrtools, an R package to create RC as package! https://github.com/benmarwick/rrtools

  54. rrtools, pkg to facilitate creation of RC as a pkg!

  55. rrtools, pkg to facilitate creation of RC as a pkg! https://rstudio.cloud/project/424109
  56. Let's make it reproducible for the future?

  57. Let's make it reproducible for the future?

```r
# Install holepunch from GitHub (requires the remotes package)
remotes::install_github("karthik/holepunch")
library(holepunch)

write_compendium_description(package = "RC template",
                             description = "A template for a research compendium")
write_dockerfile(maintainer = "Your name")
generate_badge() # This generates a badge for your README.

# At this time push the code to GitHub,
# and click on the badge or use the function below to get the build
# ready ahead of time.
build_binder()
```
  58. RC as a website template: https://github.com/emdelponte/RC-template
    Live examples:
    - https://emdelponte.github.io/paper-FHB-Brazil-meta-analysis/
    - https://emdelponte.github.io/paper-fungicides-whitemold/
    - https://mladencucak.github.io/AnalysisPLBIreland/index.html
  59. www.openplantpathology.org