Get your Research Project/Article Organized, Shareable and Reproducible using R and friends

Reproducible research practices are central to scholarly dissemination and communication. The importance of sharing data and computational code has been highlighted extensively as a way to promote transparency and replication of scientific results. Data and code sharing practices are rare in plant pathology but are expected to increase in the near future. It is urgent that plant pathologists get training in how to organize their research in a way that promotes effective project management, reproducibility, collaboration and sharing of results. Standardized procedures for the collection and systematic organization of research outcomes (data files, code, graphs, manuscript, etc.), known as a "research compendium", have been proposed, and tools are available to facilitate its production. Participants will learn how to use existing R tools to produce a research compendium that optimizes workflow and makes analysis, re-analysis and sharing more efficient. Other topics include introductions to the Open Science Framework, websites for generating Digital Object Identifiers (DOIs), how and why to establish pre-print publications, and tools for making independent and automated analyses.

Emerson M. Del Ponte

August 03, 2019

Transcript

  1. Get your Research Project/Article Organized, Shareable and Reproducible using R and friends
    Emerson M. Del Ponte, Open Plant Pathology, Universidade Federal de Viçosa
    4-h Workshop
  2. Why Reproducible Research (RR)?
    [Diagram labels: Data + Protocols + Code; Knowledge; Reproducible; Documenting and Sharing; Efficiency (short-term effect); Accessibility (short- and long-term effect); Transparency; Information; Open Research Practices; Reproducibility; Replicability]
  3. Why change/learn new things?
    - Sponsors/journals require data (standard in molecular)
    - Work more efficiently and facilitate collaborations
    - Improved reproducibility (data and/or methods)
    - Technology (less cumbersome) is becoming available
    - Enhanced visibility and transparency
    - Multiple citable outcomes: data, code, manuscript, etc.
  4. How RR?
    [Diagram labels: Science; Collect; Analyse; Write; Publish; Summarise; Reproduce; Re-analyse (meta-analysis); Share data; Open Repository; Share code; open/free tools; Collaborative tools; Citation manager; Pre-prints; Open Access]
  5. Is software properly cited?
    [Chart from Sparks et al. (unpublished); categories: No; Name only; Version #; Full citation]
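One way to move from "name only" to a full citation is to let R produce the reference itself. The following is a minimal sketch using base R's `citation()`; the package used (`stats`) is only a stand-in for whichever packages an analysis actually depends on.

```r
# Full citation for R itself
r_cite <- citation()
print(r_cite)

# Citation for a specific package; "stats" ships with R and is used
# here only as a placeholder for the packages used in the analysis
pkg_cite <- citation("stats")

# BibTeX entry that can be pasted into a .bib file
bib_entry <- toBibtex(r_cite)
print(bib_entry)

# Version string to report alongside the citation
r_version <- R.version.string
```

Reporting `r_version` together with the package citations covers both the "Version #" and "Full citation" categories.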
  6. Why is RR adoption so slow?
    - Lack of interest/knowledge (supplemental rarely posted)
    - Low incentive/pressure - that may change!
    - Perception that it takes time and effort
      - Document data and code
      - Versioning code and maintaining
    - FOBS - Fear of being scooped?
    - Not valued/taught in our graduate programs
  7. OK, I want to do it differently, but how?
    [Diagram labels: Tools; Workflows; Environments; Collaborative & sharing platforms; Research Project: Organized, Documented, Shared, Accessible, Reproducible]
  8. Save money on software! Use R and friends
    - Data wrangling - Excel
    - Data visualization - Excel
    - Data analysis - SAS, STATA
    - Scientific plots - SigmaPlot
    - Text editor - MSWord
  9. Start small! Then build on it...
    Reproducibility level:
    0. Article
    1. Article (+ preprint); Supplemental (zip): Protocols, Data
    2. Article + preprint; Repository (citable): Protocols, Data, Code
    3. Article + preprint; Research compendium: Raw Data, Clean Data, Analysis (reproducible)
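The highest level on this slide, a research compendium holding raw data, clean data and a reproducible analysis, can be started with a few folders. Below is a minimal base R sketch; the folder names and the helper `create_compendium()` are hypothetical choices for illustration, not a required standard.

```r
# Hypothetical folder layout for a small research compendium;
# the directory names are illustrative only
create_compendium <- function(root) {
  dirs <- file.path(root, c("data/raw", "data/clean", "analysis", "manuscript"))
  for (d in dirs) dir.create(d, recursive = TRUE, showWarnings = FALSE)

  # A README is the entry point for anyone reusing the project
  readme <- file.path(root, "README.md")
  if (!file.exists(readme)) {
    writeLines(
      c("# Project title", "",
        "Describe the data, the code and how to reproduce the analysis."),
      readme
    )
  }
  invisible(dirs)
}

# Usage: build the skeleton in a temporary folder
demo_root <- file.path(tempdir(), "my-compendium")
created <- create_compendium(demo_root)
list.files(demo_root, recursive = TRUE, include.dirs = TRUE)
```

Keeping raw and clean data in separate folders makes it clear which files were produced by code and can therefore be regenerated.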
  10. A new research/submission workflow?
    [Workflow diagram labels: Project; Data; Analysis; Manuscript; Preprint; Journal submission system; Early view; Final publication; Poster/Talk; Research Compendium]
  11. Structures/templates for RC (not a package)
    - Minimal RC
    - Short RC webpage
    - Full RC website + manuscript
    [File icons: BIB, CSL]
  12. Let's work? 5 Workout Sessions (45 min each)
    1) Introduction
    2) RStudio project + GitHub
    3) The research compendium
    4) Manuscript in RMarkdown
    5) Packages for automating tasks & RStudio Cloud
  13. RStudio and its friend Git: let's practice
    1. Download and install Git (GitHub Desktop)
    2. Download and install RStudio
    3. Go to GitHub and create a new repository
  14. RStudio and its friend Git: your turn
    1. Create a new RStudio Project from Git
    2. Add your repository URL
  15. Let's explore a research compendium: short RC webpage
    1. Fork the repository (short webpage compendium)
      a. https://github.com/emdelponte/RC-example-webpage
    2. Explore the files
    3. Reproduce the analysis
    4. Knit to generate the webpage
    5. Commit and push to your GH
  16. GitHub and Open Science Framework
    1. Link GitHub and Open Science Framework
    2. Create an OSF Project
  17. GitHub and Open Science Framework
    1. Link GitHub and Open Science Framework
    2. Name your project
    3. Enable GitHub Add-on
  18. GitHub and Open Science Framework
    1. Link GitHub and Open Science Framework
    2. Name your project
    3. Select GitHub Add-on
    4. Configure Add-on
  19. Let's practice: Data management
    1. Create/Open an RMarkdown file
      a. Modify the output parameters
    2. Load data
      a. .csv file
      b. .xlsx file
      c. .gsheet file
    3. Add some basic commenting
    4. Do some basic wrangling
    5. Export data to .csv
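The load, wrangle and export steps of this exercise can be sketched in base R as follows. The toy data set, its column names and the file names are made up for illustration, so the example can run without any external file.

```r
# Toy data written to a temporary .csv so the example is self-contained;
# column names and values are invented for illustration
raw_file <- file.path(tempdir(), "raw_data.csv")
write.csv(
  data.frame(
    plot = 1:6,
    treatment = c("control", "control", "fungicide",
                  "fungicide", "control", "fungicide"),
    severity = c(35, 40, 12, NA, 38, 15)
  ),
  raw_file,
  row.names = FALSE
)

# Step 2a: load the .csv file
dat <- read.csv(raw_file, stringsAsFactors = FALSE)

# Step 4: basic wrangling. Drop rows with missing severity,
# then express severity (%) as a proportion
clean <- dat[!is.na(dat$severity), ]
clean$severity_prop <- clean$severity / 100

# Mean severity per treatment
summary_tab <- aggregate(severity ~ treatment, data = clean, FUN = mean)

# Step 5: export the clean data to .csv
clean_file <- file.path(tempdir(), "clean_data.csv")
write.csv(clean, clean_file, row.names = FALSE)
```

For .xlsx files, `readxl::read_excel()` is a common choice; it needs the readxl package, which is not part of base R.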
  20. Fork or download a Repository
    1. Fork the RC as webpage template
      a. https://github.com/emdelponte/RC-example-webpage
    2. Download the files from GitHub
    3. Start with a new RStudio Project + Git
    4. Reproduce the analysis
    5. Change some content/parameters
    6. Push it to your GH account
  21. Let's practice
    1. Fork the RC as website template
      a. https://github.com/emdelponte/RC-example-website
    2. Change and/or generate the website (knit)
    3. Push changes to your GitHub
    4. Create a GH webpage for it
    5. Send it to OSF project
  22. RMarkdown templates
    1. RMarkdown html from the basic RStudio templates
    2. Rmdformats: https://github.com/juba/rmdformats
    3. Distill for RMarkdown: https://github.com/rstudio/distill
    4. RMarkdown websites: https://rmarkdown.rstudio.com/lesson-13.html
  23. Manuscript in RMarkdown? YML header
    ---
    title: "The title goes here"
    author: "Author name goes here"
    output:
      html_document: default
      word_document:
        reference_docx: template.docx
    linestretch: 2
    link-citations: yes
    linkcolor: blue
    csl: chicago-author-date.csl
    bibliography: crossref.bib
    ---
    CSL styles: https://www.zotero.org/styles
    [File icons: BIB, CSL]
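With a header like the one on this slide, both output formats can be produced from the R console instead of the Knit button. This is a minimal sketch assuming the rmarkdown package is installed; the file name `manuscript.Rmd` and the wrapper `render_manuscript()` are hypothetical.

```r
# Hypothetical file name; replace with the manuscript's actual .Rmd
manuscript <- "manuscript.Rmd"

render_manuscript <- function(input) {
  if (!requireNamespace("rmarkdown", quietly = TRUE)) {
    stop("Package 'rmarkdown' is needed: install.packages('rmarkdown')")
  }
  if (!file.exists(input)) {
    stop("File not found: ", input)
  }
  # output_format = "all" renders every format listed under `output:`
  # in the YAML header (here html_document and word_document)
  rmarkdown::render(input, output_format = "all")
}

# Usage (not run here: it needs the .Rmd, template.docx,
# the .csl file and the .bib file to be present):
# render_manuscript(manuscript)
```

Rendering from code rather than by clicking makes it easy to rebuild the manuscript as part of a scripted workflow.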
  24. Automating RC website creation: workflowr package
    Organized
    • Provides a project template with organized subdirectories
    • Mixes code and results with R Markdown
    • Uses Git to version both source code and results
    Reproducible
    • Displays the code version used to create each result
    • Runs each analysis in an isolated R session
      ◦ Records the session information of each analysis
      ◦ Sets the same seed for random number generation for each analysis
    Shareable
    • Creates a website to present your research results
    • Documents how to host your website for free via GitHub Pages or GitLab Pages
    • Creates links to past versions of results
    https://jdblischak.github.io/workflowr
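A typical first session with workflowr might look like the sketch below. It is wrapped in a function so that nothing runs until it is called; the project name `myproject` and the wrapper `start_workflowr_site()` are placeholders, and the calls assume workflowr is installed and a Git user name and email are already configured.

```r
# Sketch of a first workflowr session; "myproject" is a placeholder
start_workflowr_site <- function(directory = "myproject") {
  if (!requireNamespace("workflowr", quietly = TRUE)) {
    stop("Install workflowr first: install.packages('workflowr')")
  }
  # Create the project template with its subdirectories and a Git
  # repository, and move the working directory into the new project
  workflowr::wflow_start(directory)

  # Build the website from the R Markdown files in analysis/
  workflowr::wflow_build()

  # Commit the source files, rebuild them, and commit the website
  workflowr::wflow_publish(
    c("analysis/index.Rmd", "analysis/about.Rmd", "analysis/license.Rmd"),
    message = "Publish the initial files"
  )
}

# Usage (not run here):
# start_workflowr_site("myproject")
```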
  25. rrtools, pkg to facilitate creation of RC as a pkg!
    https://rstudio.cloud/project/424109
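As a rough sketch of how rrtools is typically used, the function below creates a compendium structured as an R package. It assumes rrtools has been installed from GitHub; the name `pkgname` and the wrapper `start_rrtools_compendium()` are placeholders.

```r
# Sketch only; "pkgname" is a placeholder for the compendium name
start_rrtools_compendium <- function(path = "pkgname") {
  if (!requireNamespace("rrtools", quietly = TRUE)) {
    stop("Install rrtools first: remotes::install_github('benmarwick/rrtools')")
  }
  # Create a new research compendium with an R package structure
  rrtools::use_compendium(path)
}

# Inside the new project, further pieces can then be added, for example:
# rrtools::use_readme_rmd()  # README template for the compendium
# rrtools::use_analysis()    # analysis/ folder with a paper template
# rrtools::use_dockerfile()  # Dockerfile for the computing environment
```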
  26. Let's make it reproducible for the future?
    remotes::install_github("karthik/holepunch")
    library(holepunch)
    write_compendium_description(package = "RC template",
                                 description = "A template for a research compendium")
    write_dockerfile(maintainer = "Your name")
    generate_badge() # This generates a badge for your readme.
    # At this time push the code to GitHub
    # And click on the badge or use the function below to get the build
    # ready ahead of time.
    build_binder()
  27. RC as a website template
    https://github.com/emdelponte/RC-template
    Live examples:
    - https://emdelponte.github.io/paper-FHB-Brazil-meta-analysis/
    - https://emdelponte.github.io/paper-fungicides-whitemold/
    - https://mladencucak.github.io/AnalysisPLBIreland/index.html