RNAseq: A five course meal

These slides are from a talk I gave as part of a two-week next-generation sequencing analysis workshop (https://angus.readthedocs.io/en/2019/). In this talk, I provide a framework for understanding RNA-seq workflows based on the R for Data Science workflow of "import, tidy, transform, visualize, model, and communicate" (https://r4ds.had.co.nz/). I also provide strategies for "healthy" workflows by making analogies to a five-course meal.

Rayna M Harris

July 05, 2019

Transcript

  1. RNAseq: A five course meal. #DIBSI2019. Rayna M. Harris, @raynamharris

  2. Rayna M. Harris. Postdoctoral scholar at UC Davis. Scientist, Educator, Community Builder, Translator

  3. Special thanks to the Data Intensive Biology Lab and the Birds, Brain, & Banter Lab. http://calisilab.ucdavis.edu/ http://ivory.idyll.org/lab/

  4. I learned R and RNAseq in communities of practice: The University of Texas at Austin, University of California, Davis, Data Carpentry, Software Carpentry, The Carpentries-es, @cienciaPR @RLadiesGlobal @RLadiesBA @r4ds_es #DIBSI2018

  5. Who reads books?

  6. Read R for Data Science

  7. R 4 Data Science workflow https://r4ds.had.co.nz/

  8. Data science as a five course meal

  9. Data snacks prevent “hanger”. Appetizer

  10. Data snacks and source code: `library()`, `source()`, `data()`. Substantial, complex, noteworthy, valuable. Provides insights into what follows.

  11. Soup & salad, tidy then transform. Salad, Soup

  12. Data wrangling. Appetizer, Salad, Soup, Wrangle

  13. Data wrangling: Simultaneously fun and painful. Learn from your mistakes. Write good documentation. Write tests.

  14. The tidyverse and scientific Python are your data wrangling friends

  15. The DESeq2 salad http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html

  16. DESeq2 colData and counts https://biojupies.cloud/notebook/AHnOIzJEq

  17. `ncol(counts) == nrow(colData)` https://biojupies.cloud/notebook/AHnOIzJEq

  18. Summarized experiment soup http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#differential-expression-analysis

  19. “You can’t have any pudding if you don’t eat yer meat. How can you have any pudding if you don’t eat yer meat?” Pink Floyd. Dessert

  20. “Communication breakdown, it’s always the same I’m having a nervous breakdown, drive me insane.” Led Zeppelin. Nuts

  21. Procrastigraphing: excessive creation of data visualizations. Communicate, Nuts, Dessert

  22. Modeling is challenging. So many choices. Main course

  23. If you were at a banquet, would you order: A. Lobster; B. Steak; C. Lobster and steak; D. Other; E. Lobster, steak, and other

  24. If you were an RNAseq workflow, would you use: A. R; B. Python; C. R and Python; D. Other; E. An R/Python/other mix

  25. FlavoRs of differential gene expression models: `library("DESeq2")`; `dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata, design = ~ condition)`; `results(dds, contrast=c("condition","B","A"))` http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#differential-expression-analysis

  26. Strategies for a well-balanced RNAseq analysis: R and/or Python or other; library(), data(); counts, dds, colData; Rmarkdown, Jupyter, GitHub pages; ~ condition; Wrangle, Explore; ggplot(), matplotlib; Communicate

  27. BioJupies for instant RNA-seq visualizations in the cloud https://biojupies.cloud/notebook/AHnOIzJEq https://twitter.com/i/status/1062745599538282498

  28. Quickly assess sample quality and variation https://biojupies.cloud/notebook/AHnOIzJEq

  29. Quickly view results of gene ontology enrichment analyses https://biojupies.cloud/notebook/AHnOIzJEq

  30. Quickly explore patterns of differential gene expression https://biojupies.cloud/notebook/AHnOIzJEq

  31. But how many dots are red and blue? https://biojupies.cloud/notebook/AHnOIzJEq

  32. But, what if I want to view many volcano plots? https://biojupies.cloud/notebook/AHnOIzJEq

  33. Convert .Rmd files to GitHub pages to communicate results https://macmanes-lab.github.io/austinCORT/

  34. Open source options for programming

  35. My progress: from a novice with data and tools https://github.com/raynamharris/DissociationTest Harris, Kao, Alarcón, Hofmann, Fenton 2017 https://www.biorxiv.org/content/10.1101/153585v1

  36. To a practitioner with reproducible workflows https://github.com/raynamharris/DissociationTest Harris, Kao, Alarcón, Hofmann, Fenton 2019 https://onlinelibrary.wiley.com/doi/10.1002/hipo.23095

  37. Onward to experiments with many factors https://github.com/macmanes-lab/DoveParentsRNAseq/ Made for a poster at the Society for Behavioral Neuroendocrinology

  38. Identify time-like changes in specific genes https://github.com/macmanes-lab/DoveParentsRNAseq/

  39. Find meaningful principal components of variation https://github.com/macmanes-lab/DoveParentsRNAseq/

  40. Create functions to run all pairwise comparisons: `contrast = c("treatment", "varB", "varA")` https://github.com/macmanes-lab/DoveParentsRNAseq/

  41. The risk of unintentional p-hacking http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf

  42. The risk of unintentional p-hacking http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf

  43. Exploratory analyses vs. model testing. Explore

  44. Should we be analyzing all the data all the time? 60% training set, 20% query set, 20% testing set

  45. “Soup to nuts: setting up a new RNAseq analysis” Titus

  46. “A five course meal is an excellent analogy!” Rayna. Appetizer, Soup & Salad, Dessert, Nuts, Main course :)

  47. Framework for understanding RNAseq workflows: R and/or Python or other; library(), data(); counts, dds, colData; ggplot(), matplotlib; Rmarkdown, Jupyter, GitHub pages; ~ condition

  48. Strategies for creating healthy RNAseq workflows: R and/or Python or other; library(), data(); counts, dds, colData; ggplot(), matplotlib; Rmarkdown, Jupyter, GitHub pages; ~ condition; Communicate, Wrangle, Explore, Model

  49. What questions do you have? @raynamharris
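
Slide 17 reduces the wrangling step to one check: the count matrix needs as many columns as the sample table has rows. The sketch below is a plain-Python analogue of that check, not code from the talk. It also compares sample names and their order, since DESeq2 expects the columns of the counts and the rows of `colData` to be in the same order. The function name `check_sample_alignment` and the sample names are hypothetical.

```python
def check_sample_alignment(count_columns, coldata_rows):
    """Return a list of problems; an empty list means the two tables line up."""
    problems = []
    # Analogue of ncol(counts) == nrow(colData).
    if len(count_columns) != len(coldata_rows):
        problems.append(
            f"{len(count_columns)} count columns vs {len(coldata_rows)} metadata rows"
        )
    # Samples present in one table but not the other.
    missing = sorted(set(count_columns) - set(coldata_rows))
    extra = sorted(set(coldata_rows) - set(count_columns))
    if missing:
        problems.append(f"samples without metadata: {missing}")
    if extra:
        problems.append(f"metadata without counts: {extra}")
    # Same samples in a different order is still a mismatch.
    if not problems and list(count_columns) != list(coldata_rows):
        problems.append("same samples, different order")
    return problems


# Hypothetical sample names, for illustration only.
count_columns = ["ctrl_1", "ctrl_2", "treat_1", "treat_2"]
coldata_rows = ["ctrl_1", "ctrl_2", "treat_1", "treat_2"]
print(check_sample_alignment(count_columns, coldata_rows))
```

Running a check like this before building the dataset object catches swapped or dropped samples early, which fits the "write tests" advice on slide 13.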
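
Slide 40 suggests writing functions that run every pairwise comparison, and slides 41 and 42 warn about unintentional p-hacking. The sketch below shows one way to enumerate the contrasts and how quickly their number grows. It assumes the DESeq2 convention that a contrast is (factor, numerator level, denominator level). The function name `pairwise_contrasts` and the levels `varC` and `varD` are hypothetical additions to the slide's `varA` and `varB`.

```python
from itertools import combinations


def pairwise_contrasts(factor, levels):
    """Build one (factor, numerator, denominator) triple per pair of levels."""
    return [(factor, b, a) for a, b in combinations(levels, 2)]


# "varA" and "varB" come from the slide; "varC" and "varD" are made up here.
levels = ["varA", "varB", "varC", "varD"]
contrasts = pairwise_contrasts("treatment", levels)
for contrast in contrasts:
    print(contrast)

# n levels give n * (n - 1) / 2 comparisons: 4 levels already mean 6 tests.
print(len(contrasts))
```

Each triple could then be passed to a differential expression call such as the `results(dds, contrast = ...)` shown on slide 25. Keeping the full list in one place makes it easier to report every comparison that was run, not only the interesting ones.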
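
Slide 44 asks whether all the data should be analyzed all the time and shows a 60% / 20% / 20% split into training, query, and testing sets. A minimal sketch of such a split over sample IDs follows. The talk does not say how the split should be drawn, so the random shuffle, the fixed seed, the function name `split_samples`, and the sample IDs are all assumptions.

```python
import random


def split_samples(sample_ids, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle sample IDs reproducibly and cut them into three groups."""
    ids = list(sample_ids)
    # A seeded generator keeps the split identical across runs.
    random.Random(seed).shuffle(ids)
    n_train = round(len(ids) * fractions[0])
    n_query = round(len(ids) * fractions[1])
    training = ids[:n_train]
    query = ids[n_train:n_train + n_query]
    # Whatever remains goes to the testing set, so no sample is dropped.
    testing = ids[n_train + n_query:]
    return training, query, testing


# Hypothetical sample IDs, for illustration only.
samples = [f"sample_{i:02d}" for i in range(20)]
training, query, testing = split_samples(samples)
print(len(training), len(query), len(testing))
```

With 20 samples this yields groups of 12, 4, and 4. Exploratory work can then stay on the training set, with the other two groups held back for checking whether a pattern survives.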