Michael Harper
October 25, 2017

Writing reports in RStudio: an introduction to RMarkdown

RMarkdown is a powerful tool for producing documents with embedded R code, allowing you to easily reproduce analysis and greatly simplify your data-driven workflow. It can be used to generate dynamic reports, presentations, journal papers, and even your thesis!

Transcript

  1. Overview
    • The benefits of RMarkdown and reproducible research
    • The syntax for writing an RMarkdown document
    • Highlight some examples of using RMarkdown
    • Explain the "sotonthesis" R package: a template for a Southampton University thesis
    • Provide resources for further reading to master RMarkdown
  2. What is RMarkdown?
    • A combination of R and Markdown (a simple markup language)
    • Save and execute code within the report
    • Generate high-quality reports directly from the analysis
  3. What is Reproducible Research?
    Reproducible research is the idea that data analyses, and more generally, scientific claims, are published with their data and software code so that others may verify the findings and build upon them.
  4. Non-Reproducible Workflow
    [Diagram labels: Dataset1.csv, Dataset2.csv; Script.R; Excel to run analysis; Figure 1; Results Table; Save figures; Manually Add; Report; Other visual editor programs]
    Challenges:
    × Point and click
    × Often relies on manual processes
    × Datasets may be updated
    × Analysis changes
    × Figure layouts need updating
    Managing analysis can become difficult!
  5. Reproducible Workflow with RMarkdown
    [Diagram labels: Dataset1.csv, Dataset2.csv; Report & Analysis; Compile]
    Benefits:
    • Single file for code and analysis
    • Latest figures and results included within the report
  6. Why Bother?
    BENEFITS FOR YOU
    • Ease of updating work
    • Efficiency
    • Flexibility
    BENEFITS FOR OTHERS
    • Reproducibility of analysis
    • Transparency
    • Trust in work
    However, there are substantial technical and cultural limitations. See http://benmarwick.github.io/CSSS-Primer-Reproducible-Research/ for more information.
  7. RMarkdown Setup
    • RStudio is recommended. Download it directly from the website, as the university version is out of date.
    • Builds upon a number of packages, notably knitr and rmarkdown:
        # Run this within R
        install.packages("rmarkdown")
    • This should install all the required dependencies within R
    • To make PDF reports you will need to have LaTeX installed
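
As a rough way of checking the setup described on this slide, the sketch below reports which pieces are already in place. The helper name `check_setup` is made up for illustration, and looking for `pdflatex` on the PATH is only one assumed way of detecting a LaTeX installation.

```r
# Report whether the packages and a LaTeX engine can be found.
# check_setup() is a hypothetical helper, not part of rmarkdown.
check_setup <- function() {
  c(
    rmarkdown = requireNamespace("rmarkdown", quietly = TRUE),
    knitr     = requireNamespace("knitr", quietly = TRUE),
    # Sys.which() returns "" when the program is not on the PATH
    pdflatex  = unname(nzchar(Sys.which("pdflatex")))
  )
}

setup <- check_setup()
print(setup)
```

Any entry that comes back `FALSE` points at the part of the setup that still needs installing.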
  8. RMarkdown File Components
    • YAML header surrounded by ---
    • R code chunks surrounded by ```
    • Text mixed with simple text formatting
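
To see the three components together, here is a minimal sketch that writes a small .Rmd file from R. The file name `example.Rmd` and its contents are illustrative assumptions; the render call is left commented out because it needs rmarkdown and pandoc to be installed.

```r
# Lines of a minimal .Rmd file: YAML header, formatted text, one code chunk
rmd_lines <- c(
  "---",
  'title: "Untitled"',
  'author: "Anonymous"',
  "output: html_document",
  "---",
  "",
  "Some text with **bold** and *italic* formatting.",
  "",
  "```{r}",
  "plot(cars)",
  "```"
)

# Write to a temporary directory so no existing file is overwritten
rmd_path <- file.path(tempdir(), "example.Rmd")
writeLines(rmd_lines, rmd_path)

# Build the report (requires rmarkdown and pandoc):
# rmarkdown::render(rmd_path)
```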
  9. YAML header
    • Acts as the document template settings
    • “output” determines which file type will be built from your .Rmd file
    • The font, table of contents and page size can be customised within the YAML: http://rmarkdown.rstudio.com/pdf_document_format.html
        ---
        title: "Untitled"
        author: "Anonymous"
        output: pdf_document
        ---
  10. Code Chunks
    • When you render your .Rmd file, R Markdown will run each code chunk and embed the results beneath the code chunk in your final report
        ```{r}
        # Insert any R code
        plot(cars)
        ```
  11. Code Chunks (2)
    • The output of code chunks can be controlled by settings
    • Some common settings include:
        • include = FALSE prevents code and results from appearing in the finished file. R Markdown still runs the code in the chunk, and the results can be used by other chunks.
        • echo = FALSE prevents code, but not the results, from appearing in the finished file. This is a useful way to embed figures.
        • message = FALSE prevents messages that are generated by code from appearing in the finished file.
        • warning = FALSE prevents warnings that are generated by code from appearing in the finished file.
        • fig.cap = "..." adds a caption to graphical results.
        ```{r cars, fig.cap = "A scatter diagram of the distance required for a vehicle to stop"}
        plot(cars)
        ```
    • Full list of options available here: https://yihui.name/knitr/options/
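
The same settings can also be applied to every chunk at once rather than chunk by chunk. A minimal sketch, assuming the knitr package is installed; the particular values chosen are only an example.

```r
# Typically placed in a setup chunk near the top of the .Rmd file.
# Sets defaults for all later chunks; individual chunks can still override them.
knitr::opts_chunk$set(
  echo    = FALSE,  # hide code, keep results
  message = FALSE,  # hide messages
  warning = FALSE   # hide warnings
)
```

A chunk that should show its code can then set `echo = TRUE` in its own header.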
  12. Markdown
    Document elements:
    • Headers
    • Lists
    • Links
    • Images
    • Block quotes
    • LaTeX equations
    • Horizontal rules
    • Tables
    • Footnotes
    • Bibliographies and Citations
    • Slide breaks
    • Italicized text
    • Bold text
    • Superscripts
    • Subscripts
    • Strikethrough text
    Read more: http://rmarkdown.rstudio.com/authoring_pandoc_markdown.html
    Markdown is designed to be easy to write and easy to read.
  13. Bibliographies and Citations
    • References are stored within a .bib file and called within the YAML
    • Citations go inside square brackets and are separated by semicolons. Each citation must have a key, composed of ‘@’ + the citation identifier from the database, and may optionally have a prefix, a locator, and a suffix. For example:
        Blah blah [@doe99].
      More examples: http://rmarkdown.rstudio.com/authoring_bibliographies_and_citations.html
    • Reference managers (e.g. Mendeley) can easily export .bib files (goo.gl/VXj8pA)
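
A minimal sketch of how the .bib file and the YAML fit together, written out from R. The file names and the BibTeX entry are placeholders invented for illustration (only the key `doe99` comes from the slide); the `bibliography` field is what points the document at the .bib file.

```r
out_dir <- tempdir()  # temporary directory so nothing is overwritten

# Placeholder BibTeX entry; the key "doe99" is what [@doe99] refers to
bib_lines <- c(
  "@article{doe99,",
  "  author  = {Doe, Jane},",
  "  title   = {A placeholder title},",
  "  journal = {Placeholder Journal},",
  "  year    = {1999}",
  "}"
)
bib_path <- file.path(out_dir, "references.bib")
writeLines(bib_lines, bib_path)

# .Rmd file whose YAML header names the .bib file
cite_lines <- c(
  "---",
  'title: "Citation example"',
  "output: html_document",
  "bibliography: references.bib",
  "---",
  "",
  "Blah blah [@doe99].",
  "",
  "With a prefix and a locator: [see @doe99, p. 12]."
)
cite_path <- file.path(out_dir, "citations.Rmd")
writeLines(cite_lines, cite_path)

# rmarkdown::render(cite_path)  # requires rmarkdown and pandoc
```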
  14. Markdown Example
    MARKDOWN
        ## R Markdown
        This is an R Markdown document [@Reference2000]. **Bold text** is great, and *italics* are also useful. The following list shows:
        1. Item 1
        2. Item 2
        3. Item 3
    LATEX
        \subsection{R Markdown}
        This is an R Markdown document \cite{Reference2000}. \textbf{Bold text} is great, and \emph{italics} are also useful. The following list shows:
        \begin{enumerate}
        \item Item 1
        \item Item 2
        \item Item 3
        \end{enumerate}
  15. Building File
    Output format is determined by the “output” option in the YAML header:
    • html_document
    • pdf_document (LaTeX)
    • word_document
    • beamer_presentation
    • ioslides_presentation
    [Diagram labels: Runs analysis; Builds report]
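
The output format can also be chosen from the R console instead of the YAML header. A minimal sketch: `build_report` is a hypothetical wrapper written for this example, while `rmarkdown::render()` and its `output_format` argument are the real entry point.

```r
# Hypothetical wrapper that only accepts the formats listed on the slide
build_report <- function(input,
                         format = c("html_document", "pdf_document",
                                    "word_document", "beamer_presentation",
                                    "ioslides_presentation")) {
  format <- match.arg(format)  # stops with an error for any other format name
  # Runs the analysis in the chunks, then builds the report
  rmarkdown::render(input, output_format = format)
}

# Example call (needs rmarkdown, pandoc and an existing file):
# build_report("report.Rmd", "word_document")
```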
  16. 1. Download data from web (Intermediate)
    Some ideas:
    • Call data directly from a web datasource
    • Some examples:
        • Twitter data
        • Website traffic
        • Weather data
        • Latest satellite imagery
    • Graphs update every time the report is compiled
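
A minimal sketch of the idea, assuming the data is published as a CSV file. The helper `load_latest`, the example.com address and the demo numbers are all made up for illustration; a temporary file stands in for the web source so the code runs offline.

```r
# Read a CSV from a URL or a local path; read.csv() accepts both
load_latest <- function(source) {
  read.csv(source, stringsAsFactors = FALSE)
}

# Hypothetical address, shown for illustration only:
# traffic <- load_latest("https://example.com/website-traffic.csv")

# Offline stand-in for the web source, filled with invented values
demo_file <- tempfile(fileext = ".csv")
write.csv(data.frame(day = 1:3, visits = c(120, 135, 150)),
          demo_file, row.names = FALSE)

traffic <- load_latest(demo_file)

# Inside a code chunk, this plot is redrawn from the latest data on every compile
plot(traffic$day, traffic$visits, type = "b",
     xlab = "Day", ylab = "Visits")
```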
  17. 1. Download data from web (Intermediate)
    Example: Google search terms
    More ideas: http://rstudio-pubs-static.s3.amazonaws.com/155168_d306bcd159da4ff5991c961025dbcb8e.html
  18. 2. Create Diagrams (Advanced)
    • Link analysis with figures
    • Uses the DiagrammeR package: http://rich-iannone.github.io/DiagrammeR/docs.html
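
One way to link a diagram to the analysis is to build its description from values computed in R. The sketch below assembles a Graphviz DOT string in base R; the node names are invented, and the `DiagrammeR::grViz()` call is left commented out so the example runs without the package installed.

```r
# A value taken from the analysis; cars is a dataset that ships with R
n_rows <- nrow(cars)

# DOT description of a two-step workflow, with the row count spliced in
dot <- sprintf(
  "digraph workflow { data [label = \"cars: %d rows\"]; data -> report; }",
  n_rows
)
cat(dot, "\n")

# Draw the diagram (requires the DiagrammeR package):
# DiagrammeR::grViz(dot)
```

Because the label is built from `n_rows`, the diagram changes whenever the underlying data does.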
  19. 3. Customise Layouts (Advanced)
    • Pandoc uses templates to create the output report
    • These can be altered to create custom templates
    • Variables within the YAML will be substituted into the template at the $variable$ location
    • https://github.com/svmiller/svm-r-markdown-templates/blob/master/svm-rmarkdown-article-example.pdf
    • https://pandoc.org/MANUAL.html
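
The substitution idea can be illustrated with a toy function in base R. This is not how pandoc is implemented, only a simplified sketch of placeholders being replaced by values; `fill_template` and the example values are made up for illustration.

```r
# Toy illustration: replace each $name$ placeholder with its value
fill_template <- function(template, vars) {
  for (name in names(vars)) {
    placeholder <- paste0("$", name, "$")
    # fixed = TRUE treats "$" literally rather than as a regex anchor
    template <- gsub(placeholder, vars[[name]], template, fixed = TRUE)
  }
  template
}

# A one-line stand-in for a LaTeX template
template <- "\\title{$title$} \\author{$author$}"
filled <- fill_template(template, list(title = "My Report", author = "A. Student"))
cat(filled, "\n")
```

Real pandoc templates also support conditionals and loops, which are described in the pandoc manual.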
  20. Further Gallery Ideas
    • Interactive Documents & Web Apps: http://rmarkdown.rstudio.com/gallery.html
    • Also check out these user projects: https://yihui.name/knitr/demo/showcase/
  21. sotonthesis template
    • Available on GitHub: www.github.com/mikey-harper/sotonthesis
    • Template for RMarkdown
    • Builds upon the package bookdown, an extension of RMarkdown designed for long-format reports or books
    Benefits:
    • Easy to install
    • Build your thesis and progress reports within RMarkdown
    • Advanced YAML customisation
    • Build reports into PDF, Word or HTML
    • No need to edit the LaTeX template
    • Meets university thesis template guidelines (http://library.soton.ac.uk/thesis/templates)
  22. sotonthesis
    • Document can be split into multiple .Rmd files
    • Reading the bookdown book is essential: https://bookdown.org/yihui/bookdown/
    A good knowledge of RMarkdown is vital to use the template effectively and without frustration.
  23. Cheatsheets
    • RMarkdown Cheatsheet: a reference sheet. Print this out and keep it above your desk when you start learning RMarkdown.
    • RMarkdown Reference: similar to the cheatsheet, but provides more detail on customisation and document settings
  24. Books
    • Available for free online: https://bookdown.org/yihui/bookdown/
    • First three chapters here: https://github.com/yihui/knitr-book
    • Google “Reproducible Research with R and RStudio”
  25. Tips for Mastering RMarkdown
    1. Read, Read, Read!
    2. Learn the basics of RMarkdown before considering more advanced techniques and the sotonthesis template
    3. An understanding of LaTeX is useful for developing reports
  26. Summary
    • RMarkdown takes time to master, but it is worth it
    • Some advanced ideas were presented to highlight the power of RMarkdown