RNAseq: A five course meal

These slides are from a talk I gave as part of a two-week next-generation sequencing analysis workshop. In this talk, I provide a framework for understanding RNA-seq workflows that is based on the R for Data Science mantra of "import, tidy, transform, visualize, model, and communicate." I also provide strategies for "healthy" workflows by making analogies to a five-course meal.


Rayna M Harris

July 05, 2019


  1. RNAseq: A five course meal #DIBSI2019 Rayna M. Harris @raynamharris

  2. Rayna M. Harris, Postdoctoral scholar at UC Davis. Scientist, Educator, Community Builder, Translator

  3. Special thanks to the Data Intensive Biology Lab and the Birds, Brain, & Banter Lab

  4. I learned R and RNAseq in communities of practice: The University of Texas at Austin, University of California, Davis, Data Carpentry, Software Carpentry, The Carpentries-es, @cienciaPR @RLadiesGlobal @RLadiesBA @r4ds_es #DIBSI2018

  5. Who reads books?

  6. Read R for Data Science

  7. R 4 Data Science workflow

  8. Data science as a five course meal

  9. Appetizer: Data snacks prevent “hanger”

  10. Data snacks and source code: `library()`, `source()`, `data()`. Substantial, complex, noteworthy, valuable. Provides insights into what follows.

  11. Soup & salad: tidy, then transform (Salad, Soup)

  12. Data wrangling: Appetizer, Salad, Soup, Wrangle

  13. Data wrangling: Simultaneously fun and painful. Learn from your mistakes. Write good documentation. Write tests.

  14. The tidyverse and scientific python are your data wrangling friends

  15. The DESeq2 salad

  16. DESeq2 colData and counts

  17. `ncol(counts) == nrow(colData)`

  18. Summarized experiment soup

  19. Dessert: “You can’t have any pudding if you don’t eat yer meat. How can you have any pudding if you don’t eat yer meat?” (Pink Floyd)

  20. Nuts: “Communication breakdown, it’s always the same. I’m having a nervous breakdown, drive me insane.” (Led Zeppelin)

  21. Procrastigraphing: excessive creation of data visualizations (Communicate, Nuts, Dessert)

  22. Main course: Modeling is challenging. So many choices.

  23. If you were at a banquet, would you order: A. Lobster; B. Steak; C. Lobster and steak; D. Other; E. Lobster, steak, and other

  24. If you were an RNAseq workflow, would you use: A. R; B. Python; C. R and Python; D. Other; E. An R/Python/other mix

  25. FlavoRs of differential gene expression models: `library("DESeq2")`, `dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata, design = ~ condition)`, `dds <- DESeq(dds)`, `results(dds, contrast = c("condition", "B", "A"))`

  26. Strategies for a well-balanced RNAseq analysis. Diagram labels: R and/or Python or other; `library()`, `data()`; counts, dds, colData; `~ condition`; `ggplot()`, matplotlib; Rmarkdown, Jupyter, GitHub pages; Wrangle, Explore, Communicate

  27. BioJupies for instant RNA-seq visualizations in the cloud

  28. Quickly assess sample quality and variation

  29. Quickly view results of gene ontology enrichment analyses

  30. Quickly explore patterns of differential gene expression

  31. But how many dots are red and blue?

  32. But, what if I want to view many volcano plots?

  33. Convert .Rmd files to GitHub pages to communicate results

  34. Open source options for programming

  35. My progress: from a novice with data and tools (Harris, Kao, Alarcón, Hofmann, Fenton 2017)

  36. To a practitioner with reproducible workflows (Harris, Kao, Alarcón, Hofmann, Fenton 2019)

  37. Onward to experiments with many factors. Made for a poster at the Society for Behavioral Neuroendocrinology

  38. Identify time-like changes in specific genes

  39. Find meaningful principal components of variation

  40. Create functions to run all pairwise comparisons: `contrast = c("treatment", "varB", "varA")`

  41. The risk of unintentional p-hacking

  42. The risk of unintentional p-hacking

  43. Exploratory analyses vs. model testing (Explore)

  44. Should we be analyzing all the data all the time? 60% training set, 20% query set, 20% testing set

  45. “Soup to nuts: setting up a new RNAseq analysis” (Titus)

  46. “A five course meal is an excellent analogy!” (Rayna) Appetizer, Soup & Salad, Dessert, Nuts, Main course :)

  47. Framework for understanding RNAseq workflows. Diagram labels: R and/or Python or other; `library()`, `data()`; counts, dds, colData; `ggplot()`, matplotlib; Rmarkdown, Jupyter, GitHub pages; `~ condition`

  48. Strategies for creating healthy RNAseq workflows. Diagram labels: R and/or Python or other; `library()`, `data()`; counts, dds, colData; `ggplot()`, matplotlib; Rmarkdown, Jupyter, GitHub pages; `~ condition`; Communicate, Wrangle, Explore, Model

  49. What questions do you have? @raynamharris
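
The check on slide 17, `ncol(counts) == nrow(colData)`, guards against a count matrix and a sample table that disagree. A minimal Python sketch of the same idea follows, using only the standard library. The function name `check_samples` and the sample names are hypothetical, and comparing names and order (not just the number of samples) goes one step beyond the slide; treat that extra step as an assumption about what a careful check might include.

```python
def check_samples(count_columns, coldata_rows):
    """Return a list of problems; an empty list means the tables line up."""
    problems = []
    if len(count_columns) != len(coldata_rows):
        # The slide's check: one count column per row of sample metadata.
        problems.append(
            f"{len(count_columns)} count columns vs {len(coldata_rows)} colData rows"
        )
    elif list(count_columns) != list(coldata_rows):
        # Extra check (assumption): same samples, in the same order.
        problems.append("same number of samples, but names or order differ")
    return problems


# Hypothetical sample names, for illustration only.
count_columns = ["sampleA1", "sampleA2", "sampleB1", "sampleB2"]
coldata_rows = ["sampleA1", "sampleA2", "sampleB1", "sampleB2"]
print(check_samples(count_columns, coldata_rows))  # []
```

Running a check like this before building the model object turns a silent mislabeling of samples into an explicit message.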
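
Slide 40 suggests writing functions to run all pairwise comparisons with contrasts of the form `c("treatment", "varB", "varA")`. As a rough sketch in Python, the list of contrasts can be generated rather than typed by hand. The function name `pairwise_contrasts` and the third level `varC` are hypothetical; each tuple mirrors the (factor, numerator level, denominator level) order of the contrast on the slide.

```python
from itertools import combinations


def pairwise_contrasts(factor, levels):
    """Build one (factor, numerator, denominator) contrast per pair of levels."""
    return [(factor, b, a) for a, b in combinations(levels, 2)]


# Hypothetical factor levels, for illustration only.
for contrast in pairwise_contrasts("treatment", ["varA", "varB", "varC"]):
    print(contrast)
# ('treatment', 'varB', 'varA')
# ('treatment', 'varC', 'varA')
# ('treatment', 'varC', 'varB')
```

With n levels this yields n(n-1)/2 contrasts, so the number of tests grows quickly, which may be one route to the unintentional p-hacking raised on slides 41 and 42.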
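
Slide 44 proposes holding data back in a 60% training set, 20% query set, and 20% testing set. One way this could look in Python is sketched below, with the standard library only. The function name `split_samples`, the fixed seed, and the sample names are hypothetical, and a plain random split is itself an assumption: a real RNAseq design with few replicates would likely need the split balanced across conditions.

```python
import random


def split_samples(samples, seed=0):
    """Shuffle samples reproducibly, then split them 60/20/20."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_train = round(0.6 * len(shuffled))
    n_query = round(0.2 * len(shuffled))
    return {
        "training": shuffled[:n_train],
        "query": shuffled[n_train:n_train + n_query],
        "testing": shuffled[n_train + n_query:],
    }


# Hypothetical sample names, for illustration only.
samples = [f"sample{i}" for i in range(1, 11)]
parts = split_samples(samples)
print({name: len(members) for name, members in parts.items()})
# {'training': 6, 'query': 2, 'testing': 2}
```

Exploration would then happen on the training set, with the query and testing sets kept aside for checking whether a pattern holds up.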