Get your Research Project/Article Organized, Shareable and Reproducible using R and friends

Reproducible research practices are central to scholarly dissemination and communication. The importance of sharing data and computational code has been highlighted extensively as a way to promote transparency and replication of scientific results. Data and code sharing practices are rare in plant pathology but are expected to increase in the near future. Plant pathologists urgently need training in how to organize their research in a way that promotes effective project management, reproducibility, collaboration and sharing of results. Standardized procedures for the collection and systematic organization of research outcomes (data files, code, graphs, manuscript, etc.), known as a "research compendium", have been proposed, and tools are available to facilitate its production. Participants will learn how to use existing R tools to produce a research compendium that optimizes workflow and makes analysis, re-analysis and sharing more efficient. Other topics include introductions to the Open Science Framework, websites for generating Digital Object Identifiers (DOIs), how and why to post preprints, and tools for making independent and automated analyses.

Emerson M. Del Ponte

August 03, 2019

Transcript

  1. Get your Research Project/Article
    Organized, Shareable and Reproducible
    using R and friends
    Emerson M. Del Ponte
    Open Plant Pathology
    Universidade Federal de Viçosa
    4-h Workshop

  2. Why Reproducible Research (RR)?
    Data + Protocols + Code
    Knowledge
    Reproducible
    Documenting and Sharing
    Efficiency
    short term effect
    Accessibility
    short and long term effect
    Transparency
    Information
    Open
    Reproducibility
    Research
    Practices
    Replicability

  3. - Sponsors/journals require data (standard in molecular)
    - Work more efficiently and facilitate collaborations
    - Improved reproducibility (data and/or methods)
    - Technology (less cumbersome) is becoming available
    - Enhanced visibility and transparency
    - Multiple citable outcomes: data, code, manuscript, etc.
    Why change/learn new things?

  4. When RR?
    Idea
    Register
    (proposal)
    Run experiments
    Get data
    Analyze
    Communicate

  5. Idea
    Register
    Preregistration
    Of studies
    University/Institution
    Laboratory computer
    When RR?

  6. Idea
    Register
    Preregistration
    of Studies
    University/Institution
    Laboratory computer
    https://osf.io/6tsnj
    When RR?

  7. Run experiments
    When RR?

  8. Run experiments
    When RR?

  9. Get data
    Submit datasets
    accompanying data
    descriptors to:
    Discipline-specific repositories
    Generalist repositories
    When RR?

  10. Analyze Research Compendium
    When RR?

  11. Communicate
    Abstract
    When RR?
    Preprint
    OA paper
    Paywalled paper
    Quick Files
    Talk Poster

  12. How RR?
    Science
    Collect
    Analyse
    Publish
    Write
    Summarise
    Reproduce
    Re-analyse
    (meta-analysis)
    Share data
    Open Repository
    Share code
    open/free tools
    Collaborative tools
    Citation manager
    Pre-prints
    Open Access

  13. Sparks et al. (unpublished)
    101
    99
    Count
    How are we plant pathologists doing?

  14. Sparks et al. (unpublished)
    No
    Upon request
    Paywalled
    Free access
    Are data made available?

  15. Sparks et al. (unpublished)
    No
    Free access
    Is code made available?

  16. Sparks et al. (unpublished)
    No
    Name only
    Version #
    Full citation
    Is Software properly cited?
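
The three citation levels on this slide (name only, version number, full citation) can all be produced from inside R. A minimal sketch using base R only; the package name "stats" is just a stand-in for whatever package an analysis actually depends on.

```r
# Name and version of the R installation used for the analysis
r_version <- R.version.string

# Version number of one package; "stats" is a stand-in that ships with R
pkg_version <- as.character(packageVersion("stats"))

# Full citation for R itself, formatted as plain text for a reference list
r_citation <- citation()
cite_text <- format(r_citation, style = "text")

print(r_version)
print(pkg_version)
cat(cite_text, sep = "\n")
```

Calling `citation("pkgname")` with the name of an installed package returns the citation its authors ask for, which covers the "full citation" level for add-on packages as well.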

  17. Sparks et al. (unpublished)
    What software is being used?

  18. - Lack of interest/knowledge (supplemental rarely posted)
    - Low incentive/pressure - that may change!
    - Perception that it takes time and effort
    - Document data and code
    - Versioning code and maintaining
    - FOBS - Fear of being scooped?
    - Not valued/taught in our graduate programs
    Why has RR adoption been so slow?

  19. Tools
    Workflows
    Environments
    Collaborative
    & sharing
    platforms
    OK, I want to do it differently, but how?
    Research Project
    Organized
    Documented
    Shared
    Accessible
    Reproducible

  20. Data wrangling - Excel
    Data visualization - Excel
    Data analysis - SAS, STATA
    Scientific plots - SigmaPlot
    Text editor - MSWord
    BIB
    Save money on software! Use R and friends

  21. Start small! Then build on it...
    Article
    Article (+ preprint)
    Supplemental (zip)
    - Protocols
    - Data
    Article + preprint
    Repository (citable)
    - Protocols
    - Data
    - Code
    Article + preprint
    Research compendium
    - Raw Data
    - Clean Data
    - Analysis (reproducible)
    Reproducibility
    0
    1
    2
    3
    level

  22. A new research/submission workflow?
    Project
    Data Analysis
    Manuscript
    Preprint
    Journal Submission
    system
    Early view
    Final publication
    Poster/Talk
    Research
    Compendium

  23. http://inundata.org/talks/rstd19/
    https://research-compendium.science/
    How RR?
    Source: https://research-compendium.science/

  24. http://inundata.org/talks/rstd19/
    How RR?

  25. https://www.tandfonline.com/doi/full/10.1080/00031305.2017.1375986 https://peerj.com/preprints/3192/
    How RR?

  26. R package structure as inspiration
    Small
    Medium
    Large

  27. BIB
    Minimal RC
    Short RC
    webpage
    Full RC
    Website + manuscript
    Structures/templates for RC (not a package)
    CSL
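
As a rough illustration of what a minimal compendium that is not an R package might look like on disk, the sketch below builds a folder skeleton with base R. The folder names and the `create_compendium()` helper are hypothetical and are not taken from the author's templates; they only follow the raw data / clean data / analysis split mentioned earlier in the deck.

```r
# Hypothetical folder layout for a minimal compendium; names are illustrative
create_compendium <- function(path) {
  dirs <- c("data/raw", "data/clean", "analysis", "manuscript")
  for (d in dirs) {
    dir.create(file.path(path, d), recursive = TRUE, showWarnings = FALSE)
  }
  # A README documents what each folder holds
  writeLines(
    c("# Compendium",
      "",
      "- data/raw: data as collected, never edited by hand",
      "- data/clean: data produced by the scripts in analysis/",
      "- analysis: R Markdown files that reproduce the results",
      "- manuscript: manuscript source, .bib references and .csl style"),
    file.path(path, "README.md")
  )
  invisible(path)
}

# Build the skeleton in a temporary folder and list what was created
rc_path <- create_compendium(file.path(tempdir(), "rc-example"))
print(list.files(rc_path, recursive = TRUE, include.dirs = TRUE))
```

Keeping raw and clean data in separate folders makes it obvious which files are inputs and which can be regenerated by rerunning the analysis.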

  28. Let's work!
    5 Workout Sessions (45 min each)
    1) Introduction
    2) RStudio project + GitHub
    3) The research compendium
    4) Manuscript in RMarkdown
    5) Packages for automating tasks & RStudio Cloud

  29. Everything starts as a PROJECT!

  30. RStudio and its friend Git
    1. Download and install Git (GitHub Desktop)
    2. Download and install RStudio
    3. Go to GitHub and create a new repository
    Let's practice

  31. RStudio and its friend Git
    1. Create a new RStudio Project from Git
    2. Add your repository URL
    Your turn

  33. 1. Clone a repository (open in Desktop)
    2. https://github.com/emdelponte/paper-FHB-yield-loss
    3. Change files
    4. Create a pull request to the owner
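
The cloning step can also be scripted from R instead of GitHub Desktop. A minimal sketch, assuming the `git` command-line client is installed; `clone_repo()` and its `dry_run` argument are hypothetical helpers written for this example, not functions from any package. With `dry_run = TRUE` (the default here) nothing is downloaded and the command is only shown.

```r
# Clone a GitHub repository by calling the git command-line client.
# dry_run = TRUE only returns the command, so nothing is downloaded.
clone_repo <- function(url, dest, dry_run = TRUE) {
  if (dry_run) {
    return(paste("git clone", shQuote(url), shQuote(dest)))
  }
  status <- system2("git", c("clone", url, dest))
  if (status != 0) stop("git clone failed with status ", status)
  invisible(dest)
}

repo_url <- "https://github.com/emdelponte/paper-FHB-yield-loss"
cmd <- clone_repo(repo_url, file.path(tempdir(), "paper-FHB-yield-loss"))
print(cmd)
```

Calling `clone_repo(repo_url, dest, dry_run = FALSE)` would run the clone for real; the pull request itself is still created on GitHub.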

  34. Let's explore a research compendium
    Short RC
    webpage
    1. Fork the repository (short webpage compendium)
    a. https://github.com/emdelponte/RC-example-webpage
    2. Explore the files
    3. Reproduce the analysis
    4. Knit to generate the webpage
    5. Commit and push to your GH

  35. GitHub and Open Science Framework
    1. Link GitHub and Open Science Framework
    2. Create an OSF Project

  36. 1. Link GitHub and Open Science Framework
    2. Name your project
    3. Enable GitHub Add-on
    GitHub and Open Science Framework

  37. 1. Link GitHub and Open Science Framework
    2. Name your project
    3. Select GitHub Add-on
    4. Configure Add-on
    GitHub and Open Science Framework

  38. 1. Finish description
    GitHub and Open Science Framework

  39. Let's practice: Data management
    1. Create/Open an RMarkdown file
    a. Modify the output parameters
    2. Load data
    a. .csv file
    b. .xlsx file
    c. .gsheet file
    3. Add some basic commenting
    4. Do some basic wrangling
    5. Export data to .csv
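
The load, wrangle and export steps above can be sketched in a few lines of base R. This is only an illustration: the built-in `PlantGrowth` dataset stands in for a real trial file, and the file names and group labels are assumptions made for the example.

```r
# Stand-in raw file: PlantGrowth ships with R (30 plants, 3 groups)
raw_file <- file.path(tempdir(), "plant_growth_raw.csv")
write.csv(PlantGrowth, raw_file, row.names = FALSE)

# 2a. Load data from .csv
growth <- read.csv(raw_file, stringsAsFactors = FALSE)
# 2b/2c. .xlsx and Google Sheets need extra packages, for example
# readxl::read_excel() and googlesheets4::read_sheet()

# 4. Basic wrangling: descriptive group labels and a mean per group
labels <- c(ctrl = "control", trt1 = "treatment_1", trt2 = "treatment_2")
growth$group <- unname(labels[growth$group])
group_means <- aggregate(weight ~ group, data = growth, FUN = mean)
print(group_means)

# 5. Export the wrangled data to .csv
clean_file <- file.path(tempdir(), "plant_growth_clean.csv")
write.csv(growth, clean_file, row.names = FALSE)
```

Placed in an R Markdown chunk, the same code documents every change made between the raw and the clean file.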

  40. How RR for data management
    Get data
    Data file types
    Binary
    Text files
    web-based

  41. Data management: must read!

  42. Organizing, naming, shaping!
    Data management
    Broman and Woo (2018)
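
Two of the habits this slide points to, consistent machine-friendly names and unambiguous dates, are easy to enforce in code. A small base R sketch; `clean_names()` is a hypothetical helper written for this example and the column headers are invented.

```r
# Turn free-form column headers into consistent, machine-friendly names
clean_names <- function(x) {
  x <- tolower(trimws(x))
  x <- gsub("[^a-z0-9]+", "_", x)   # spaces and symbols become underscores
  gsub("^_+|_+$", "", x)            # no leading or trailing underscores
}

headers <- c("Plot ID", "Disease Severity (%)", " Assessment date ")
print(clean_names(headers))

# Write dates as YYYY-MM-DD (ISO 8601) so they sort and parse unambiguously
assessed <- as.Date("03/08/2019", format = "%d/%m/%Y")
print(format(assessed, "%Y-%m-%d"))
```

Units dropped from a header, such as the percent sign here, belong in a separate data dictionary rather than in the column name.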

  43. Analyze

  44. Analyze

  45. Fork or download a Repository
    1. Fork the RC as webpage template
    a. https://github.com/emdelponte/RC-example-webpage
    2. Download the files from GitHub
    3. Start with a new RStudio Project + Git
    4. Reproduce the analysis
    5. Change some content/parameters
    6. Push it to your GH account

  46. Add local folder and create a GitHub Repository

  47. Add local folder and create a GitHub Repository

  48. Let's practice
    1. Fork the RC as website template
    a. https://github.com/emdelponte/RC-example-website
    2. Change and/or generate the website (knit)
    3. Push changes to your GitHub
    4. Create a GH webpage for it
    5. Send it to OSF project

  49. RMarkdown templates
    1. RMarkdown html from the basic RStudio templates
    2. Rmdformats: https://github.com/juba/rmdformats
    3. Distill for RMarkdown: https://github.com/rstudio/distill
    4. RMarkdown websites: https://rmarkdown.rstudio.com/lesson-13.html

  50. YAML header
    Manuscript in RMarkdown?
    ---
    title: "The title goes here"
    author: "Author name goes here"
    output:
      html_document: default
      word_document:
        reference_docx: template.docx
    linestretch: 2
    link-citations: yes
    linkcolor: blue
    csl: chicago-author-date.csl
    bibliography: crossref.bib
    ---
    BIB
    CSL
    https://www.zotero.org/styles
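
To see how the header, the .bib file and the manuscript text fit together, the sketch below writes a tiny manuscript to a temporary folder and knits it. It is an illustration under assumptions: the file name `manuscript.Rmd` and the citation key `rcore` are invented, the header is a trimmed version of the one on the slide (HTML output only, no CSL or Word template), and knitting runs only if the rmarkdown package and pandoc are available.

```r
# All file names are illustrative; everything is written to a temporary folder
work_dir <- file.path(tempdir(), "manuscript")
dir.create(work_dir, showWarnings = FALSE)

# Bibliography: the citation for R itself, exported as BibTeX.
# The entry has no key by default, so one ("rcore") is added for citing.
bib <- as.character(toBibtex(citation()))
bib[1] <- sub("{,", "{rcore,", bib[1], fixed = TRUE)
writeLines(bib, file.path(work_dir, "crossref.bib"))

# Manuscript: YAML header, then text that cites the entry with [@rcore]
rmd_file <- file.path(work_dir, "manuscript.Rmd")
writeLines(c(
  "---",
  'title: "The title goes here"',
  'author: "Author name goes here"',
  "output:",
  "  html_document: default",
  "link-citations: yes",
  "bibliography: crossref.bib",
  "---",
  "",
  "All analyses were done in R [@rcore].",
  "",
  "# References"
), rmd_file)

# Knit only when rmarkdown and pandoc are installed
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  out <- tryCatch(
    rmarkdown::render(rmd_file, quiet = TRUE),
    error = function(e) message("Rendering skipped: ", conditionMessage(e))
  )
}
```

Adding a `csl:` line that points to a style file downloaded from the Zotero style repository changes the reference format without touching the text.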

  51. Automating RC website creation
    Workflowr package
    Organized
    ● Provides a project template with organized subdirectories
    ● Mixes code and results with R Markdown
    ● Uses Git to version both source code and results
    Reproducible
    ● Displays the code version used to create each result
    ● Runs each analysis in an isolated R session
    ○ Records the session information of each analysis
    ○ Sets the same seed for random number generation for each analysis
    Shareable
    ○ Creates a website to present your research results
    ○ Documents how to host your website for free via GitHub Pages or GitLab Pages
    ○ Creates links to past versions of results
    https://jdblischak.github.io/workflowr
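
Two of the reproducibility features listed above, a fixed seed and a recorded session, can be illustrated with base R alone. This is not how workflowr implements them, only a sketch of the idea; `run_analysis()` and the seed value are made up for the example.

```r
# Same seed, same "random" numbers: the basis of a repeatable analysis
run_analysis <- function(seed) {
  set.seed(seed)
  mean(rnorm(100))
}

first <- run_analysis(20190803)
second <- run_analysis(20190803)
print(identical(first, second))

# Record the R version and loaded packages alongside the results
info <- sessionInfo()
print(info$R.version$version.string)
```

In workflowr itself, `wflow_start()` creates the project, `wflow_build()` knits the analyses and `wflow_publish()` commits source and results together.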

  52. Workflowr pkg
    https://jdblischak.github.io/workflowr/articles/wflow-01-getting-started.html

  53. rrtools, an R package to create RC as package!
    https://github.com/benmarwick/rrtools

  54. rrtools, pkg to facilitate creation of RC as a pkg!

  55. rrtools, pkg to facilitate creation of RC as a pkg!
    https://rstudio.cloud/project/424109

  56. Let's make it reproducible for the future?

  57. remotes::install_github("karthik/holepunch")
    library(holepunch)
    write_compendium_description(package = "RC template",
                                 description = "A template for a research compendium")
    write_dockerfile(maintainer = "Your name")
    generate_badge() # This generates a badge for your readme.
    # At this time push the code to GitHub
    # And click on the badge or use the function below to get the build
    # ready ahead of time.
    build_binder()
    Let's make it reproducible for the future?

  58. https://github.com/emdelponte/RC-template
    RC as a website template
    Live examples:
    https://emdelponte.github.io/paper-FHB-Brazil-meta-analysis/
    https://emdelponte.github.io/paper-fungicides-whitemold/
    https://mladencucak.github.io/AnalysisPLBIreland/index.html

  59. www.openplantpathology.org
