Slide 1

Slide 1 text

Moving to Quarto from RMarkdown and Python Jupyter Notebooks
NYR Conference 2023
Daniel Chen
@chendaniely. Repo/Slides: Daniel Chen https://github.com/chendaniely/rstatsnyc-2023-quarto

Slide 2

Slide 2 text

Hello

Slide 3

Slide 3 text

Munsee Lenape

Slide 4

Slide 4 text

Daniel Chen (@chendaniely)
Postdoctoral Research and Teaching Fellow, UBC, MDS-Vancouver
Data Science Educator, Posit, PBC
Author, Pandas for Everyone
The Carpentries

Slide 5

Slide 5 text

Literate Programming

Slide 6

Slide 6 text

Why Literate Programming?
Data Scientist (RMarkdown + Jupyter Notebooks): analysis reports + documentation, academic papers
Technical Writer: blog, website, presentation, book

Slide 7

Slide 7 text

RMarkdown

Slide 8

Slide 8 text

Code Chunks

```{r}
cmv <- read_excel("data/cmv.xlsx")
head(cmv)
```

Slide 9

Slide 9 text

RMarkdown Document

---
title: "example-analysis"
author: "Daniel Chen"
output: html_document
---

```{r setup, include=FALSE}
library(tidyverse)
library(readxl)
library(writexl)
```

## Load Data

Retrospective Cohort Study of the Effects of
Donor KIR genotype on the reactivation of cytomegalovirus (CMV)
after myeloablative allogeneic hematopoietic stem cell transplant.

```{r}
cmv <- read_excel("data/cmv.xlsx")
head(cmv)
```

Slide 10

Slide 10 text

Render .Rmd with {rmarkdown}
Demo file: example-analysis.Rmd

Render command:
Rscript -e "rmarkdown::render('example-analysis.Rmd')"

Specify output file (and location):
Rscript -e "rmarkdown::render(
  input = 'example-analysis.Rmd',
  output_file = 'output/010-example-analysis-rmd.html')"

Slide 11

Slide 11 text

Render .Rmd with quarto
Demo file: example-analysis.Rmd
quarto is a command line tool!

Render command:
quarto render example-analysis.Rmd

Specify output file:
# output folders only work with quarto projects
touch _quarto.yml

quarto render example-analysis.Rmd \
  --toc \
  --output output/020-example-analysis-rmd-qmd.html

Slide 12

Slide 12 text

Caveat: output directory for a single Quarto document
GitHub discussion: https://github.com/quarto-dev/quarto-cli/discussions/2171
Pre and post render: https://quarto.org/docs/projects/scripts.html#pre-and-post-render

Slide 13

Slide 13 text

Project templates
DSCI 310: Reproducible and trustworthy workflows for data science (Tiffany Timbers): https://ubc-dsci.github.io/dsci-310-student/
DCR 2018: Structuring Your (Data Science/Analysis) Projects
NYR 2019: Building Reproducible and Replicable Projects

Slide 14

Slide 14 text

Quarto: https://quarto.org/
Plain text source document
Literate programming
Multiple language support (even in the same document!)
Multiple output formats
Pandoc + Markdown (familiar)

Quarto Gallery: https://quarto.org/docs/gallery/
Quarto Guide: https://quarto.org/docs/guide/
Quarto Reference: https://quarto.org/docs/reference/

Slide 15

Slide 15 text

Quarto Documents

RMarkdown YAML:
---
title: "Example Analysis"
subtitle: "RMarkdown"
author: "Daniel Chen"
output: html_document
---

Quarto YAML:
---
title: "Example Analysis"
subtitle: "Quarto"
author: "Daniel Chen"
format: html
---

RMarkdown and Quarto chunk options:
```{r setup}
#| include: false
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(readxl)
library(writexl)
```
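Both RMarkdown (via knitr) and Quarto accept `#|` option comments at the top of a chunk, and Quarto treats that block as YAML. As a rough illustration of how such a chunk splits into options and code, here is a minimal Python sketch. `parse_cell_options` is a hypothetical helper, not part of Quarto or knitr, and it only understands single-line `key: value` pairs (no multi-line YAML such as `fig-cap: >`); values stay as strings.

```python
def parse_cell_options(cell_source: str) -> tuple[dict[str, str], str]:
    """Split leading '#|' option comments from the rest of a chunk.

    Simplification (assumption): only single-line `key: value` pairs are
    handled; Quarto itself parses the option block as YAML.
    """
    options: dict[str, str] = {}
    lines = cell_source.splitlines()
    i = 0
    while i < len(lines) and lines[i].startswith("#|"):
        key, sep, value = lines[i][2:].partition(":")
        if sep:  # ignore option lines without a colon
            options[key.strip()] = value.strip()
        i += 1
    code = "\n".join(lines[i:])
    return options, code


# The setup chunk from the slide, without its opening and closing fence lines.
chunk = """#| include: false
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(readxl)
library(writexl)"""

options, code = parse_cell_options(chunk)
print(options)  # {'include': 'false'}
print(code.splitlines()[0])  # knitr::opts_chunk$set(echo = TRUE)
```

Note that `false` comes back as the string `'false'`; a real YAML parser would turn it into a boolean.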

Slide 16

Slide 16 text

Render a Quarto Document
Demo file: example-analysis.qmd

Render command:
quarto render example-analysis.qmd

Specify output file:
quarto render example-analysis.qmd \
  --toc \
  --output-dir output \
  --output 030-example-analysis-rmd-qmd.html

Slide 17

Slide 17 text

Jupyter

Slide 18

Slide 18 text

Notebooks
(Two embedded YouTube videos)

Slide 19

Slide 19 text

Daniel’s List

Technical Writing
✅ Literate programming
❌ Editing JSON

Data Science
More an output format than a source document
✅ Great for posting code + output (e.g., a workshop)
❌ Not great for source control or as a collaborative document

Teaching
✅ nbgrader for course assignment creation + grading
✅ Restart Kernel > Run All

Slide 20

Slide 20 text

Jupyter Notebooks are JSON

{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "4a9a7246-de20-4aac-945a-b8f0e7db0ac6",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import plotnine as p9\n",
    "from plotnine import ggplot, aes, geom_histogram\n",
    "import statsmodels.formula.api as smf"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8f8205a7-a172-492a-bb22-e24bc1fc7ce2",
   "metadata": {}
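Because a notebook is just JSON, any JSON parser can read it. The minimal Python sketch below uses only the standard library `json` module to pull out each cell's type and source text. The trimmed notebook string and the `cell_sources` helper are illustrative assumptions modeled on the cells above; note that `source` may be stored either as a list of lines or as a single string. For real projects, the `nbformat` package (`nbformat.read(f, as_version=4)`) is the usual way to load and validate notebooks.

```python
import json

# A trimmed-down notebook modeled on the cells shown above (illustrative only).
notebook_text = """
{
  "cells": [
    {"cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [],
     "source": ["import pandas as pd\\n", "import plotnine as p9"]},
    {"cell_type": "markdown", "metadata": {}, "source": ["## Load Data"]}
  ],
  "metadata": {},
  "nbformat": 4,
  "nbformat_minor": 5
}
"""


def cell_sources(nb: dict) -> list[tuple[str, str]]:
    """Return (cell_type, source_text) pairs for every cell in the notebook."""
    pairs = []
    for cell in nb["cells"]:
        src = cell["source"]
        # 'source' is either a list of line strings or one string
        text = "".join(src) if isinstance(src, list) else src
        pairs.append((cell["cell_type"], text))
    return pairs


nb = json.loads(notebook_text)
for cell_type, text in cell_sources(nb):
    print(f"[{cell_type}] {text!r}")
```

Every edit to a cell, and every new output or execution count, changes this JSON, which is why notebook diffs in version control tend to be noisy.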

Slide 21

Slide 21 text

Need Something to View + Render

Slide 22

Slide 22 text

VS Code
JupyterLab

Slide 23

Slide 23 text

Jupyter does R!
You need IRkernel installed: https://github.com/IRkernel/IRkernel

In R:
install.packages('IRkernel')
IRkernel::installspec()

Slide 24

Slide 24 text

Render .ipynb with nbconvert
Demo files: example-analysis-python.ipynb, example-analysis-r.ipynb
(Hint: they’re the same command)

Python kernel:
jupyter nbconvert \
  --to html \
  --output output/040-example-analysis-python-jupyter.html \
  --execute example-analysis-python.ipynb

R kernel:
jupyter nbconvert \
  --to html \
  --output output/050-example-analysis-r-jupyter.html \
  --execute example-analysis-r.ipynb

Slide 25

Slide 25 text

Jupyter Notebook as a Source Document
To make version control diffs easier, you may want to clear the output from the notebook JSON file. In nbconvert 6.0+, you can use --clear-output --inplace:

jupyter nbconvert --clear-output --inplace example-analysis-python.ipynb
jupyter nbconvert --clear-output --inplace example-analysis-r.ipynb

Or use the --to notebook argument if you want to preserve a rendered notebook.
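To see what clearing output means at the JSON level, here is a minimal Python sketch, assuming the nbformat 4 layout shown earlier: it empties `outputs` and resets `execution_count` for every code cell. It is a rough stand-in for `--clear-output`, not nbconvert's implementation (nbconvert may also tidy some cell metadata), and `clear_outputs` / `clear_notebook_file` are hypothetical names.

```python
import json


def clear_outputs(nb: dict) -> dict:
    """Blank out outputs and execution counts of code cells (in place); return nb.

    Assumption: a rough stand-in for `jupyter nbconvert --clear-output`.
    """
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return nb


def clear_notebook_file(path: str) -> None:
    """Hypothetical helper: rewrite an .ipynb file in place with outputs cleared."""
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    clear_outputs(nb)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(nb, f, indent=1, ensure_ascii=False)  # compact indent keeps diffs small
        f.write("\n")


# Illustrative executed notebook with one stored stream output.
executed = {
    "cells": [
        {"cell_type": "code", "execution_count": 3, "metadata": {},
         "outputs": [{"output_type": "stream", "name": "stdout", "text": ["42\n"]}],
         "source": ["print(42)"]},
        {"cell_type": "markdown", "metadata": {}, "source": ["## Notes"]},
    ],
    "metadata": {}, "nbformat": 4, "nbformat_minor": 5,
}
cleared = clear_outputs(executed)
print(cleared["cells"][0]["outputs"], cleared["cells"][0]["execution_count"])
# prints: [] None
```

After clearing, only edits to cell sources show up in `git diff`, which is the point of treating the notebook as a source document.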

Slide 26

Slide 26 text

Render .ipynb with quarto
Takes whatever is already in the notebook (no additional execution) and renders it (to HTML by default):

quarto render example-analysis-python.ipynb
quarto render example-analysis-r.ipynb

Use --execute to execute the cells and then render:

quarto render example-analysis-python.ipynb --execute
quarto render example-analysis-r.ipynb --execute
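Since quarto render without --execute only shows what is already stored in the notebook, it can help to check whether a notebook has saved outputs at all. A small Python sketch, assuming the nbformat 4 JSON layout; `has_stored_outputs` and `needs_execute_flag` are hypothetical helpers, not part of Quarto or Jupyter.

```python
import json


def has_stored_outputs(nb: dict) -> bool:
    """True if any code cell in the notebook JSON already carries outputs."""
    return any(
        cell.get("cell_type") == "code" and bool(cell.get("outputs"))
        for cell in nb.get("cells", [])
    )


def needs_execute_flag(path: str) -> bool:
    """Hypothetical helper: suggest `--execute` when the .ipynb has no saved outputs."""
    with open(path, encoding="utf-8") as f:
        return not has_stored_outputs(json.load(f))


cleared_nb = {"cells": [{"cell_type": "code", "outputs": []},
                        {"cell_type": "markdown", "source": ["# Title"]}]}
run_nb = {"cells": [{"cell_type": "code",
                     "outputs": [{"output_type": "stream", "name": "stdout", "text": ["hi\n"]}]}]}
print(has_stored_outputs(cleared_nb), has_stored_outputs(run_nb))  # False True
```

A notebook cleared with `--clear-output` would render with empty results unless `--execute` is added.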

Slide 27

Slide 27 text

Render .ipynb with quarto

Python kernel:
quarto render example-analysis-python.ipynb \
  --to html \
  --execute \
  --toc \
  --output-dir output \
  --output 060-example-analysis-python-ipynb.html

R kernel:
quarto render example-analysis-r.ipynb \
  --to html \
  --execute \
  --toc \
  --output-dir output \
  --output 060-example-analysis-r-ipynb.html

Slide 28

Slide 28 text

Embed Jupyter output in Quarto
You can add Quarto #| metadata comments to a cell and use Jupyter output directly in a Quarto document.
Demo files: example-analysis-python-qmd_meta.ipynb, example-analysis-python-qmd_meta.qmd

Using a notebook with existing output. To create a Jupyter notebook with code output:
jupyter nbconvert \
  --to notebook \
  --execute \
  --inplace \
  example-analysis-python-qmd_meta.ipynb

Slide 29

Slide 29 text

Embed Jupyter output in Quarto

Notebook cell with #| metadata:
#| label: fig-age_hist
#| fig-cap: >
#|   A histogram of the ages in our Cytomegalovirus dataset
ggplot(cmv_tidy, aes(x="age")) + geom_histogram()

Use a Quarto shortcode in the .qmd file:
{{< embed example-analysis-python-qmd_meta.ipynb#fig-age_hist >}}

Render the example:
quarto render example-analysis-python-qmd_meta.qmd \
  --to html \
  --output-dir output \
  --output 080-example-analysis-python-qmd_meta.html

Docs: https://quarto.org/docs/authoring/notebook-embed.html
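The `#fig-age_hist` part of the shortcode points at the notebook cell whose `label` option matches. As an illustration only (this is not how Quarto resolves embeds internally), the Python sketch below splits a shortcode target into notebook path and label, then finds the matching code cell in notebook JSON. `split_embed_target` and `find_cell_by_label` are hypothetical names, and the sample cell mirrors the labeled cell above with its outputs omitted.

```python
from typing import Optional


def split_embed_target(target: str) -> tuple[str, str]:
    """Split 'notebook.ipynb#label' into ('notebook.ipynb', 'label')."""
    path, _, label = target.partition("#")
    return path, label


def find_cell_by_label(nb: dict, label: str) -> Optional[dict]:
    """Return the first code cell whose leading '#|' comments set `label: <label>`."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        src = cell.get("source", "")
        lines = ("".join(src) if isinstance(src, list) else src).splitlines()
        for line in lines:
            if not line.startswith("#|"):
                break  # option comments sit at the top of the cell
            key, _, value = line[2:].partition(":")
            if key.strip() == "label" and value.strip() == label:
                return cell
    return None


# Sample notebook JSON (as a dict) mirroring the labeled cell; outputs omitted.
nb = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["## Plots"]},
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": [
                "#| label: fig-age_hist\n",
                "#| fig-cap: >\n",
                "#|   A histogram of the ages in our Cytomegalovirus dataset\n",
                'ggplot(cmv_tidy, aes(x="age")) + geom_histogram()',
            ],
        },
    ]
}

path, label = split_embed_target("example-analysis-python-qmd_meta.ipynb#fig-age_hist")
cell = find_cell_by_label(nb, label)
print(path, label, cell is not None)
```

In the real workflow, the embedded content is the stored output of that cell, which is why the notebook is executed with nbconvert first.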

Slide 30

Slide 30 text

Converting

Slide 31

Slide 31 text

jupytext
https://jupytext.readthedocs.io/

Rmd -> qmd:
jupytext \
  --to qmd \
  --output output/090-convert-rmd_qmd.qmd \
  example-analysis.Rmd

ipynb -> qmd:
jupytext \
  --to qmd \
  --output output/100-convert-ipynb_qmd.qmd \
  example-analysis-python.ipynb

Slide 32

Slide 32 text

quarto convert

quarto convert example-analysis-python.ipynb \
  --output output/120-convert-ipynb_qmd.qmd
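Conceptually, converting a notebook to .qmd turns markdown cells into plain text and code cells into fenced chunks. The toy Python sketch below shows only that core idea. `notebook_to_qmd` is a hypothetical function, the `python` chunk language is an assumption (real converters typically read it from the notebook's kernel metadata), and jupytext and quarto convert handle many details it ignores, such as outputs, raw cells, and cell metadata.

```python
FENCE = "`" * 3  # built programmatically so the snippet nests cleanly in docs


def notebook_to_qmd(nb: dict, lang: str = "python") -> str:
    """Toy .ipynb -> .qmd conversion: markdown cells verbatim, code cells fenced.

    Assumptions: outputs, raw cells, and metadata are dropped, and every code
    cell is treated as `lang`.
    """
    chunks = []
    for cell in nb.get("cells", []):
        src = cell.get("source", "")
        text = ("".join(src) if isinstance(src, list) else src).rstrip("\n")
        if cell.get("cell_type") == "markdown":
            chunks.append(text)
        elif cell.get("cell_type") == "code":
            chunks.append(f"{FENCE}{{{lang}}}\n{text}\n{FENCE}")
    return "\n\n".join(chunks) + "\n"


nb = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["## Load Data"]},
        {"cell_type": "code", "execution_count": None, "metadata": {}, "outputs": [],
         "source": ["import pandas as pd\n", "import plotnine as p9"]},
    ]
}
qmd = notebook_to_qmd(nb)
print(qmd)
```

Going the other way (qmd back to ipynb) is also supported by both tools, which is what makes a plain-text .qmd workable as the version-controlled source.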

Slide 33

Slide 33 text

Publication

Slide 34

Slide 34 text

Publish your files

quarto publish              # Publish project (asks for provider)
quarto publish talk.qmd     # Publish document (asks for provider)

quarto publish quarto-pub   # Quarto Pub

quarto publish gh-pages     # GitHub Pages
quarto publish netlify      # Netlify

quarto publish connect      # RStudio Connect
quarto publish confluence   # Confluence

https://quartopub.com/

Slide 35

Slide 35 text

Thanks!

Slide 36

Slide 36 text

Thanks
@chendaniely
github.com/chendaniely/rstatsnyc-2023-quarto
chendaniely.quarto.pub/rstatsnyc-rmd-jupyter-quarto/