Welcome to the World of Single-Cell RNA-Sequencing

Presentation at the Single Cell Nanocourse in Spring 2017 at Harvard Medical School (https://nanosandothercourses.hms.harvard.edu/node/420)

Stephanie Hicks

March 07, 2017

Transcript

  1. Welcome to the World of Single-Cell RNA-Sequencing
    Stephanie Hicks, Dana-Farber Cancer Institute / Harvard SPH, @stephaniehicks
    Spring 2017 Single-Cell Sequencing Nanocourse, March 7, 2017
    If you want to follow along, my slides & code are available here: https://github.com/stephaniehicks/singlecellnano2017
  2. Game plan:
    •  scRNA-Seq versus bulk RNA-Seq?
    •  Technologies used to sequence single cells
    •  Applications of scRNA-Seq data
    •  Biological versus technical variability
    •  Raw, noisy data → clean data? (e.g. quality control, normalization)
    •  Intro to experimental design (from the statistical perspective)
    •  How batch effects can occur in single-cell RNA-Seq data
    •  A case study using R/Bioconductor
  3. Single-cell RNA-Seq (scRNA-Seq)
    •  Tissue (e.g. tumor)
    •  Isolate and sequence individual cells
    •  Compare gene expression profiles of single cells
    Read counts (genes in rows, cells in columns):
              Cell 1   Cell 2   …
    Gene 1        18        0
    Gene 2      1010      506
    Gene 3         0       49
    Gene 4        22        0
    …
    [Figure: cells plotted along Principal Component 1 and Principal Component 2.]
  4. Common types of scRNA-Seq data
    Adapted from Kolodziejczyk et al. (2015). Molecular Cell 58
    •  Heterogeneous cell populations
    •  Purified cell populations
    [Figure captions: "Battle droids", "Super battle droids!", "Mixed bag of R-Series droids"]
  5. Common applications using scRNA-Seq data
    Adapted from Kolodziejczyk et al. (2015). Molecular Cell 58
    •  Characterization of cell type populations
    •  Identify cell type populations (e.g. dimension reduction or clustering)
    •  Differential splicing between populations
    •  Identify allele-specific expression
    •  Identify genes that drive a process across time
  6. Variability in scRNA-Seq data
    Kolodziejczyk et al. (2015). Molecular Cell 58
    •  Signal / Noise
    •  Overdispersion, batch effects, sequencing depth, GC bias, amplification bias
    •  Extrinsic noise (regulation by transcription factors) vs. intrinsic noise (stochastic bursting/firing, cell cycle)
    •  Capture efficiency (starting amount of mRNA)
    •  Variability visible in bulk RNA-Seq
    •  Additional variability in scRNA-Seq data (and variability from bulk)
  7. Going from “raw” data to “clean” data
    Taken from Davis McCarthy’s slides at Genome Informatics 2016: https://speakerdeck.com/davismcc/what-do-we-need-computationally-to-make-the-most-of-single-cell-rna-seq-data
  8. Quality Control
    Adapted from Stegle et al. (2015) Nature Reviews Genetics 16: 133-145; Lun et al. (2016) F1000
    •  Cell-level quality control
    •  Gene-level quality control
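The two levels of quality control named on this slide can be sketched roughly as follows. This is a minimal illustration in plain Python, not the R/Bioconductor code used in the talk. The toy counts, the threshold names and the threshold values are all hypothetical; in practice cutoffs are chosen by inspecting the data.

```python
# Toy read-count matrix: each gene maps to its counts across four cells (made-up values).
counts = {
    "Gene1": [18, 0, 0, 5],
    "Gene2": [1010, 506, 2, 730],
    "Gene3": [0, 49, 0, 12],
    "Gene4": [22, 0, 0, 0],
    "Gene5": [0, 0, 0, 0],
}
cells = ["Cell1", "Cell2", "Cell3", "Cell4"]

# Hypothetical thresholds, chosen only so the toy example filters something.
MIN_TOTAL_COUNTS = 100     # minimum total counts (library size) per cell
MIN_DETECTED_GENES = 2     # minimum number of genes with a nonzero count per cell
MIN_CELLS_EXPRESSING = 2   # minimum number of kept cells in which a gene is detected


def cell_metrics(counts, cells):
    """Return {cell: (total counts, number of detected genes)}."""
    metrics = {}
    for j, cell in enumerate(cells):
        column = [row[j] for row in counts.values()]
        metrics[cell] = (sum(column), sum(1 for c in column if c > 0))
    return metrics


def filter_cells(counts, cells):
    """Cell-level QC: keep cells that pass both per-cell thresholds."""
    metrics = cell_metrics(counts, cells)
    return [
        cell for cell in cells
        if metrics[cell][0] >= MIN_TOTAL_COUNTS
        and metrics[cell][1] >= MIN_DETECTED_GENES
    ]


def filter_genes(counts, cells, kept_cells):
    """Gene-level QC: keep genes detected in enough of the kept cells."""
    keep_idx = [cells.index(cell) for cell in kept_cells]
    kept = []
    for gene, row in counts.items():
        n_expressing = sum(1 for j in keep_idx if row[j] > 0)
        if n_expressing >= MIN_CELLS_EXPRESSING:
            kept.append(gene)
    return kept


kept_cells = filter_cells(counts, cells)
kept_genes = filter_genes(counts, cells, kept_cells)
print(kept_cells)  # ['Cell1', 'Cell2', 'Cell4']
print(kept_genes)  # ['Gene1', 'Gene2', 'Gene3']
```

Cells are filtered before genes here so that a low-quality cell cannot rescue a gene that is otherwise rarely detected; that ordering is a design choice of this sketch, not something the slide prescribes.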
  9. So… what about normalization and dealing with other technical variation in scRNA-Seq data?
    Much to learn, you still have ….
  10. Normalization
    •  Without spike-ins or UMIs
        – Between-sample normalization methods
            •  Global scaling factors mostly developed for bulk RNA-Seq
            •  Number of zeros (see Lun et al., 2016. Genome Biology)
    •  With spike-ins or UMIs
        – Spike-ins: theoretically a good idea, but many challenges still remain for scRNA-Seq (see Stegle et al., 2015; Tung et al., 2016); conflicting viewpoints on whether ERCCs are appropriate
        – UMIs: reduce amplification bias; not appropriate for isoform or allele-specific expression
    •  Biological (nuisance?) variability
        – differences among cells in cell-cycle stage or cell size
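As a rough illustration of what a global scaling factor does, the sketch below divides each cell's counts by a library-size factor and then log-transforms. It is a plain-Python sketch of the simplest bulk-style approach, not the method used in the talk's R analysis. The function names and the pseudocount are assumptions made for this example, and the toy counts reuse the values from the read-count table on slide 3.

```python
import math

# Toy counts per cell, reusing the values from the read-count table on slide 3.
counts = {
    "Cell1": {"Gene1": 18, "Gene2": 1010, "Gene3": 0, "Gene4": 22},
    "Cell2": {"Gene1": 0, "Gene2": 506, "Gene3": 49, "Gene4": 0},
}


def library_size_factors(counts):
    """Size factor per cell: its total count divided by the mean total across cells."""
    totals = {cell: sum(genes.values()) for cell, genes in counts.items()}
    if any(total == 0 for total in totals.values()):
        # Assumption: empty cells were already removed during quality control.
        raise ValueError("cells with zero total counts should be removed first")
    mean_total = sum(totals.values()) / len(totals)
    return {cell: total / mean_total for cell, total in totals.items()}


def normalize(counts, pseudocount=1.0):
    """Divide each cell by its size factor, then take log2(x + pseudocount)."""
    factors = library_size_factors(counts)
    return {
        cell: {
            gene: math.log2(count / factors[cell] + pseudocount)
            for gene, count in genes.items()
        }
        for cell, genes in counts.items()
    }


factors = library_size_factors(counts)
normalized = normalize(counts)
print(factors)
print(normalized["Cell1"])
```

A single factor per cell assumes that most differences in total counts are technical. The slide's caveats (factors developed for bulk data, the large number of zeros) are the reasons this simple approach may not be adequate for single-cell data.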
  11. “Hey, someone told me of this thing called batch effects…. Should I be worried about them in my scRNA-Seq data?”
  12. Batch effects in single-cell RNA-Seq data
    •  Confounded study design: when cells from one biological group or condition are cultured, captured and sequenced separately from cells in a second condition
    [Figure: "The Problem of Confounding Biological Variation and Batch Effects". Compares a completely confounded study design with a balanced study design (panels labelled "Good" and "Bad") across processing batches (Batch 1 to 3), biological groups (Group 1 to 3) and replicates (Rep 1, Rep 2). Panels show Principal Component 1 vs. Principal Component 2 and the proportion of detected genes per batch. Note on observed differences: "We cannot determine if variation is driven by biology or batch effects".]
  13. Batch effects in single-cell RNA-Seq data
    •  Confounded study design: when cells from one biological group or condition are cultured, captured and sequenced separately from cells in a second condition
    •  More balanced study design: cells from different biological groups processed in the same batch
    [Figure: "The Problem of Confounding Biological Variation and Batch Effects" (completely confounded vs. balanced study design), as on slide 12.]
  14. Batch effects in single-cell RNA-Seq data
    •  Confounded study design: when cells from one biological group or condition are cultured, captured and sequenced separately from cells in a second condition
    •  More balanced study design: cells from different biological groups processed in the same batch
    •  Proposed by Tung et al. (2017) Scientific Reports
    [Figure: "The Problem of Confounding Biological Variation and Batch Effects" (completely confounded vs. balanced study design), as on slide 12.]
  15. Batch effects in single-cell RNA-Seq data
    •  Confounded study design: when cells from one biological group or condition are cultured, captured and sequenced separately from cells in a second condition
    •  More balanced study design: cells from different biological groups processed in the same batch
    •  Implemented by Tung et al. (2017) Scientific Reports
    [Figure: "The Problem of Confounding Biological Variation and Batch Effects" (completely confounded vs. balanced study design), as on slide 12.]
  16. The Problem of Confounding Biological Variation and Batch Effects
    Hicks et al. (2015) bioRxiv
    [Figure: completely confounded vs. balanced study design, as on slide 12. Note on observed differences: "We cannot determine if variation is driven by biology or batch effects".]
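One way to check a planned design for the confounding described on these slides is to cross-tabulate biological group against processing batch before any cells are sequenced. The sketch below is a plain-Python illustration. The working definition of "completely confounded" (every group sits in exactly one batch and every batch holds exactly one group) is an assumption made for this example, and the group and batch labels are hypothetical.

```python
from collections import defaultdict


def cross_tabulate(groups, batches):
    """Count cells for every (group, batch) combination that occurs."""
    table = defaultdict(lambda: defaultdict(int))
    for group, batch in zip(groups, batches):
        table[group][batch] += 1
    return {group: dict(row) for group, row in table.items()}


def is_completely_confounded(groups, batches):
    """True when groups and batches map one-to-one (working definition, see lead-in)."""
    table = cross_tabulate(groups, batches)
    one_batch_per_group = all(len(row) == 1 for row in table.values())
    groups_per_batch = defaultdict(set)
    for group, batch in zip(groups, batches):
        groups_per_batch[batch].add(group)
    one_group_per_batch = all(len(s) == 1 for s in groups_per_batch.values())
    return one_batch_per_group and one_group_per_batch


# Hypothetical designs with six cells each.
confounded_groups = ["G1", "G1", "G2", "G2", "G3", "G3"]
confounded_batches = ["B1", "B1", "B2", "B2", "B3", "B3"]

balanced_groups = ["G1", "G2", "G3", "G1", "G2", "G3"]
balanced_batches = ["B1", "B1", "B1", "B2", "B2", "B2"]

print(cross_tabulate(confounded_groups, confounded_batches))
print(is_completely_confounded(confounded_groups, confounded_batches))  # True
print(is_completely_confounded(balanced_groups, balanced_batches))      # False
```

In the first design any difference between G1 and G2 is also a difference between B1 and B2, which is the situation the slides describe as impossible to untangle after the fact.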
  17. Batch effects can be a big problem in scRNA-Seq data (but not always).
    •  Good news: Batch effects and methods to correct for batch effects have been around for many years (lots of places to start).
    •  Bad news: Poor experimental design is a big limiting factor. Also, more complicated because of sparsity (biology and technology), capture efficiency, etc.
    •  Good news: Increase awareness about good experimental design. New methods specific for scRNA-Seq are being developed.
  18. Batch Correction for scRNA-Seq data
    •  Methods for microarrays or bulk RNA-Seq
        – linear mixed models (requires technical replicates)
        – ComBat
    •  Methods for scRNA-Seq:
        – Differential expression (just a few):
            •  SCDE, MAST, scDD, BASiCS, M3Drop
        – More generalized
            •  Scone, scater
  19. Demonstration on how to correct for batch effects in an unconfounded study design
    Data from Tung et al. (2017) Scientific Reports
    Complete analysis in R Markdown on GitHub here: https://github.com/stephaniehicks/singlecellnano2017
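To make the idea of batch correction in an unconfounded design concrete, here is a deliberately simple plain-Python sketch: for one gene, subtract each batch's mean and add back the overall mean. This is not ComBat, not a linear mixed model, and not the analysis in the R Markdown linked above. The function name, the replicate labels and the expression values are hypothetical.

```python
from statistics import mean


def center_batches(values, batches):
    """Remove additive batch shifts from one gene's (log-scale) expression values.

    Each value has its batch mean subtracted and the overall mean added back.
    Only sensible when biological groups are spread evenly across batches;
    in a confounded design this would remove the biology as well.
    """
    overall = mean(values)
    by_batch = {}
    for value, batch in zip(values, batches):
        by_batch.setdefault(batch, []).append(value)
    batch_means = {batch: mean(vals) for batch, vals in by_batch.items()}
    return [
        value - batch_means[batch] + overall
        for value, batch in zip(values, batches)
    ]


# Hypothetical gene: two biological groups (A, B), each measured in two replicates.
groups = ["A", "B", "A", "B"]
batches = ["r1", "r1", "r2", "r2"]
values = [5.0, 7.0, 9.0, 11.0]   # replicate r2 is shifted upward by 4

corrected = center_batches(values, batches)
print(corrected)  # [7.0, 9.0, 7.0, 9.0]
```

Because both groups appear in both replicates, the shift between r1 and r2 is removed while the difference of 2 between groups A and B is preserved.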
  20. Plot cells along first two Principal Components
    [Figure: PCA plot. Axes: Component 1 (3% variance), Component 2 (2% variance). Legend: replicate (r1, r2, r3); total_features (7000 to 9500).]
  21. t-SNE
    How to use t-SNE effectively: http://distill.pub/2016/misread-tsne/
    [Figure: four t-SNE plots of Dimension 1 vs. Dimension 2, with panels titled "Perplexity = 2", "Perplexity = 5", "Perplexity = 25" and "Perplexity (default)". Legend: replicate (r1, r2, r3); total_features (7000 to 9500).]
  22. PCA (post-batch correction)
    [Figure: two PCA plots. "PCA - no normalisation": Component 1 (3% variance), Component 2 (2% variance). "PCA - after batch correction and normalization": Component 1 (2% variance), Component 2 (1% variance). Legend: replicate (r1, r2, r3); total_features (7000 to 9500).]