Welcome to the World of Single-Cell RNA-Sequencing

Presentation at the Single Cell Nanocourse in Spring 2017 at Harvard Medical School (https://nanosandothercourses.hms.harvard.edu/node/420)

Stephanie Hicks

March 07, 2017

Transcript

  1. Welcome to the World of Single-Cell RNA-Sequencing

    Stephanie Hicks, Dana-Farber Cancer Institute / Harvard SPH, @stephaniehicks
    Spring 2017 Single-Cell Sequencing Nanocourse, March 7, 2017
    If you want to follow along, my slides & code are available here: https://github.com/stephaniehicks/singlecellnano2017
  2. Game plan:

    • scRNA-Seq versus bulk RNA-Seq?
    • Technologies used to sequence single cells
    • Applications of scRNA-Seq data
    • Biological versus technical variability
    • Raw, noisy data → clean data? (e.g. quality control, normalization)
    • Intro to experimental design (from the statistical perspective)
    • How batch effects can occur in single-cell RNA-Seq data
    • A case study using R/Bioconductor
  3. Single-cell RNA-Seq (scRNA-Seq)

    Tissue (e.g. tumor); isolate and sequence individual cells; compare gene expression profiles of single cells.
    Read counts (genes × cells):
              Cell 1   Cell 2   …
    Gene 1    18       0
    Gene 2    1010     506
    Gene 3    0        49
    Gene 4    22       0
    …
    [Figure: cells plotted on Principal Component 1 vs Principal Component 2.]
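
    The read-count table on this slide can be sketched as a small genes × cells structure. This is only an illustrative Python sketch (the talk's own analysis is in R/Bioconductor); the helper names `expression_profile` and `library_size` are hypothetical, and only the count values come from the slide.

```python
# Toy genes x cells count matrix using the values shown on the slide.
counts = {
    "Gene 1": {"Cell 1": 18, "Cell 2": 0},
    "Gene 2": {"Cell 1": 1010, "Cell 2": 506},
    "Gene 3": {"Cell 1": 0, "Cell 2": 49},
    "Gene 4": {"Cell 1": 22, "Cell 2": 0},
}


def expression_profile(counts, cell):
    """Return one cell's counts across all genes, in gene order."""
    return [row[cell] for row in counts.values()]


def library_size(counts, cell):
    """Total number of reads assigned to one cell."""
    return sum(expression_profile(counts, cell))


profile_1 = expression_profile(counts, "Cell 1")
profile_2 = expression_profile(counts, "Cell 2")
print(profile_1, library_size(counts, "Cell 1"))
print(profile_2, library_size(counts, "Cell 2"))
```

    Comparing cells then amounts to comparing these per-cell profiles, for example after projecting them onto principal components.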
  4. scRNA-Seq vs bulk RNA-Seq

    Korthauer et al. (2016). Genome Biology
  5. Kolodziejczyk et al. (2015). Molecular Cell 58

  6. Kolodziejczyk et al. (2015). Molecular Cell 58

  7. Common types of scRNA-Seq data

    Adapted from Kolodziejczyk et al. (2015). Molecular Cell 58
    • Heterogeneous cell populations
    • Purified cell populations
    Illustration labels: Battle droids; Super battle droids!; Mixed bag of R-Series droids
  8. Common applications using scRNA-Seq data

    Adapted from Kolodziejczyk et al. (2015). Molecular Cell 58
    • Characterization of cell type populations
    • Identify cell type populations (e.g. dim reduction or clustering)
    • Differential splicing between populations
    • Identify allele-specific expression
    • Identify genes that drive a process across time
  9. Variability in scRNA-Seq data

    Kolodziejczyk et al. (2015). Molecular Cell 58
    Signal / Noise
    • Overdispersion, batch effects, sequencing depth, GC bias, amplification bias
    • Extrinsic noise (regulation by transcription factors) vs intrinsic noise (stochastic bursting/firing, cell cycle)
    • Capture efficiency (starting amount of mRNA)
    Figure labels: Variability visible in bulk RNA-Seq; Additional variability in scRNA-Seq data (and var from bulk)
  10. Going from “raw” data to “clean” data

    Taken from Davis McCarthy’s slides at Genome Informatics 2016: https://speakerdeck.com/davismcc/what-do-we-need-computationally-to-make-the-most-of-single-cell-rna-seq-data
  11. Quality Control

    Adapted from Stegle et al. (2015) Nature Reviews Genetics 16: 133-145; Lun et al. (2016) F1000
    • Cell-level quality control
    • Gene-level quality control
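
    As a rough illustration of the two QC levels named on this slide, the sketch below filters cells by total counts and number of detected genes, and filters genes by the number of cells in which they are detected. It is a minimal Python sketch, not the method from the cited papers or the talk's R code; the function names and all thresholds are hypothetical.

```python
def cell_qc_metrics(counts):
    """counts: list of gene rows, each a list of per-cell counts.

    Returns (total counts per cell, number of detected genes per cell).
    """
    n_cells = len(counts[0])
    total = [sum(row[j] for row in counts) for j in range(n_cells)]
    detected = [sum(1 for row in counts if row[j] > 0) for j in range(n_cells)]
    return total, detected


def filter_cells(counts, min_total, min_detected):
    """Cell-level QC: indices of cells passing both (hypothetical) thresholds."""
    total, detected = cell_qc_metrics(counts)
    return [
        j for j in range(len(total))
        if total[j] >= min_total and detected[j] >= min_detected
    ]


def filter_genes(counts, min_cells):
    """Gene-level QC: indices of genes detected in at least min_cells cells."""
    return [
        i for i, row in enumerate(counts)
        if sum(1 for c in row if c > 0) >= min_cells
    ]


# Toy matrix: 4 genes (rows) x 3 cells (columns); the third cell is nearly empty.
toy = [
    [18, 0, 0],
    [1010, 506, 1],
    [0, 49, 0],
    [22, 0, 0],
]
kept_cells = filter_cells(toy, min_total=100, min_detected=2)
kept_genes = filter_genes(toy, min_cells=2)
print(kept_cells, kept_genes)
```

    In practice the thresholds would be chosen from the observed distributions of these metrics rather than fixed in advance.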
  12. So… what about normalization and dealing with other technical variation in scRNA-Seq data?

    Much to learn, you still have ….
  13. Normalization

    • Without spike-ins or UMIs
      – Between-sample normalization methods
        • Global scaling factors mostly developed for bulk RNA-Seq
        • Number of zeros (see Lun et al., 2016. Genome Biology)
    • With spike-ins or UMIs
      – Spike-ins: theoretically a good idea, but many challenges still remain for scRNA-Seq (see Stegle et al., 2015, Tung et al., 2016); conflicting viewpoints on whether ERCCs are appropriate
      – UMIs: reduce amplification bias; not appropriate for isoform or allele-specific expression
    • Biological (nuisance?) variability
      – Differences among cells in cell-cycle stage or cell size
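
    To make "global scaling factors" concrete, here is a minimal Python sketch of the simplest such factor, library size relative to the mean library size. This is an illustration only, not one of the specific bulk RNA-Seq methods the slide alludes to; the function names are hypothetical, and the sketch assumes empty cells were already removed by QC.

```python
def library_size_factors(counts):
    """Per-cell scaling factors: library size divided by the mean library size.

    counts: list of gene rows, each a list of per-cell counts.
    """
    n_cells = len(counts[0])
    lib = [sum(row[j] for row in counts) for j in range(n_cells)]
    if any(size == 0 for size in lib):
        # Assumption: cells with no counts are filtered out before normalization.
        raise ValueError("every cell needs a non-zero library size")
    mean_lib = sum(lib) / n_cells
    return [size / mean_lib for size in lib]


def normalize(counts):
    """Divide each cell's counts by that cell's scaling factor."""
    factors = library_size_factors(counts)
    return [[c / f for c, f in zip(row, factors)] for row in counts]


# Two cells with identical composition but a two-fold depth difference.
toy = [[10, 20], [30, 60], [0, 0], [60, 120]]
factors = library_size_factors(toy)
normalized = normalize(toy)
print(factors)
print(normalized)
```

    Note that zeros stay zero under any global scaling, which is one reason the many zeros in scRNA-Seq data complicate methods designed for bulk data.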
  14. “Hey, someone told me of this thing called batch effects…. Should I be worried about them in my scRNA-Seq data?”
  15. Cells cluster by tumor

    Patel et al. (2014) Science

  16. Verhaak et al. (2010). Cancer Cell

  17. Batch effects in genomics data

    Leek et al. (2010) Nat Reviews Genetics
  18. Batch effects in single-cell RNA-Seq data

    Confounded study design: when cells from one biological group or condition are cultured, captured and sequenced separately from cells in a second condition.
    [Figure: The Problem of Confounding Biological Variation and Batch Effects. Panels contrast a completely confounded study design (bad) with a balanced study design (good), showing how biological groups 1-3 and replicates 1-2 are assigned to processing batches 1-3, the proportion of detected genes per batch, and cells plotted on Principal Component 1 vs Principal Component 2. Annotation: "Observed differences: we cannot determine if variation is driven by biology or batch effects."]
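
    One way to check a design for the complete confounding described on this slide is to cross-tabulate batch against biological group. The Python sketch below is illustrative only; the function names and the toy batch/group labels are hypothetical, and "completely confounded" is taken here to mean a one-to-one pairing of batches and groups.

```python
from collections import defaultdict


def cross_tabulate(batches, groups):
    """Count cells for every (batch, group) combination."""
    table = defaultdict(lambda: defaultdict(int))
    for b, g in zip(batches, groups):
        table[b][g] += 1
    return {b: dict(gs) for b, gs in table.items()}


def is_completely_confounded(batches, groups):
    """True when each batch holds one group and each group sits in one batch."""
    table = cross_tabulate(batches, groups)
    one_group_per_batch = all(len(gs) == 1 for gs in table.values())
    batches_per_group = defaultdict(set)
    for b, gs in table.items():
        for g in gs:
            batches_per_group[g].add(b)
    one_batch_per_group = all(len(bs) == 1 for bs in batches_per_group.values())
    return one_group_per_batch and one_batch_per_group


# Confounded: every group was processed in its own batch.
confounded = is_completely_confounded(
    ["b1", "b1", "b2", "b2", "b3", "b3"],
    ["g1", "g1", "g2", "g2", "g3", "g3"],
)
# Balanced: every batch contains cells from every group.
balanced = is_completely_confounded(
    ["b1", "b1", "b1", "b2", "b2", "b2"],
    ["g1", "g2", "g3", "g1", "g2", "g3"],
)
print(confounded, balanced)
```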
  19. Batch effects in single-cell RNA-Seq data

    More balanced study design: cells from different biological groups processed in the same batch.
    [Figure repeated from slide 18: The Problem of Confounding Biological Variation and Batch Effects.]
  20. Batch effects in single-cell RNA-Seq data

    More balanced study design: cells from different biological groups processed in the same batch.
    Proposed by Tung et al. (2017) Scientific Reports
    [Figure repeated from slide 18: The Problem of Confounding Biological Variation and Batch Effects.]
  21. Batch effects in single-cell RNA-Seq data

    More balanced study design: cells from different biological groups processed in the same batch.
    Implemented by Tung et al. (2017) Scientific Reports
    [Figure repeated from slide 18: The Problem of Confounding Biological Variation and Batch Effects.]
  22. Use FASTQ header as a surrogate for batch

    http://support.illumina.com/help/SequencingAnalysisWorkflow/Content/Vault/Informatics/Sequencing_Analysis/CASAVA/swSEQ_mCA_FASTQFiles.htm
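
    A minimal Python sketch of this idea follows, assuming headers in the CASAVA 1.8 layout `@instrument:run:flowcell:lane:tile:x:y read:filtered:control:index`. The example header is made up, the function names are hypothetical, and which fields to combine into a batch label is an assumption made here for illustration, not something stated on the slide.

```python
def parse_fastq_header(header):
    """Extract run-level fields from a CASAVA 1.8-style FASTQ header line."""
    if not header.startswith("@"):
        raise ValueError("FASTQ header lines start with '@'")
    # The read identifier is everything before the first space.
    read_id = header[1:].split(" ", 1)[0]
    fields = read_id.split(":")
    if len(fields) != 7:
        raise ValueError("expected 7 colon-separated fields in the read identifier")
    instrument, run, flowcell, lane = fields[:4]
    return {"instrument": instrument, "run": run, "flowcell": flowcell, "lane": lane}


def batch_label(header):
    """Assumed surrogate batch: instrument, run number, flowcell ID and lane."""
    info = parse_fastq_header(header)
    return "_".join([info["instrument"], info["run"], info["flowcell"], info["lane"]])


# Hypothetical header, not taken from any real data set.
example_header = "@INST01:12:FLOWCELLX:3:1101:1000:2000 1:N:0:ACGT"
print(batch_label(example_header))
```

    Reading only the first header of each cell's FASTQ file is usually enough, since all reads from one library on one lane share these fields.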
  23. The Problem of Confounding Biological Variation and Batch Effects

    Hicks et al. (2015) bioRxiv
    [Figure repeated from slide 18.]
  24. Cells cluster by tumor

    Hicks et al. (2015) bioRxiv

  25. Cells cluster by batch (tumors are confounded with batch)

    Hicks et al. (2015) bioRxiv

  26. Hicks et al. (2015) bioRxiv

  27. Different batches have different detection rates

    Hicks et al. (2015) bioRxiv
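
    The detection rate of a cell can be read as the proportion of genes with a non-zero count, which can then be averaged within each batch. The Python sketch below illustrates that summary on made-up numbers; the function names and toy data are hypothetical and are not taken from Hicks et al.

```python
from collections import defaultdict


def detection_rate(cell_counts):
    """Proportion of genes with at least one count in a single cell."""
    return sum(1 for c in cell_counts if c > 0) / len(cell_counts)


def mean_detection_rate_by_batch(cells, batches):
    """Average the per-cell detection rates within each batch."""
    rates = defaultdict(list)
    for cell_counts, batch in zip(cells, batches):
        rates[batch].append(detection_rate(cell_counts))
    return {batch: sum(r) / len(r) for batch, r in rates.items()}


# Toy data: each inner list is one cell's counts over the same four genes.
cells = [
    [5, 0, 2, 0],
    [1, 0, 3, 0],
    [4, 2, 2, 1],
    [3, 1, 0, 6],
]
batches = ["b1", "b1", "b2", "b2"]
by_batch = mean_detection_rate_by_batch(cells, batches)
print(by_batch)
```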
  28. Bad news: Batch effects can be a big problem in scRNA-Seq data (but not always).

    Good news: Batch effects and methods to correct for batch effects have been around for many years (lots of places to start).
    Bad news: Poor experimental design is a big limiting factor. … also, more complicated because of sparsity (biology and technology), capture efficiency, etc.
    Good news: Increase awareness about good experimental design. New methods specific for scRNA-Seq are being developed.
  29. Batch correction for scRNA-Seq data

    • Methods for microarrays or bulk RNA-Seq
      – Linear mixed models (requires technical replicates)
      – ComBat
    • Methods for scRNA-Seq:
      – Differential expression (just a few): SCDE, MAST, scDD, BASiCS, M3Drop
      – More generalized: scone, scater
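
    As a deliberately simplified stand-in for the linear-model approaches listed above, the Python sketch below removes a per-batch mean shift from one gene's expression values. It is not ComBat and not the procedure used in the talk; the function name is hypothetical, and the sketch assumes log-scale values and a balanced design. In a confounded design the same subtraction would also remove the biological signal.

```python
def center_batches(values, batches):
    """Subtract each batch's mean from its cells, then add back the overall mean.

    values: one gene's (assumed log-scale) expression across cells.
    batches: batch label of each cell, in the same order.
    """
    overall = sum(values) / len(values)
    sums, sizes = {}, {}
    for v, b in zip(values, batches):
        sums[b] = sums.get(b, 0.0) + v
        sizes[b] = sizes.get(b, 0) + 1
    means = {b: sums[b] / sizes[b] for b in sums}
    return [v - means[b] + overall for v, b in zip(values, batches)]


# Toy gene: batch b2 is shifted upwards by 4 units relative to b1.
values = [1.0, 3.0, 5.0, 7.0]
batches = ["b1", "b1", "b2", "b2"]
corrected = center_batches(values, batches)
print(corrected)
```

    After centering, both batches share the same mean while the within-batch differences between cells are preserved.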
  30. Demonstration on how to correct for batch effects in an unconfounded study design

    Data from Tung et al. (2017) Scientific Reports
    Complete analysis in R Markdown on GitHub here: https://github.com/stephaniehicks/singlecellnano2017
  31. Plot cells along first two Principal Components

    [Figure: PCA of cells; axes: Component 1 (3% variance) vs Component 2 (2% variance); legend: replicate (r1, r2, r3), total_features (7000 to 9500).]
  32. t-SNE

    How to use t-SNE effectively: http://distill.pub/2016/misread-tsne/
    [Figure: four t-SNE plots (Dimension 1 vs Dimension 2) with Perplexity = 2, Perplexity = 5, Perplexity = 25 and Perplexity (default); legend: replicate (r1, r2, r3), total_features (7000 to 9500).]
  33. PCA (post-batch correction)

    [Figure: two PCA plots; legend: replicate (r1, r2, r3), total_features (7000 to 9500).
    "PCA − no normalisation": Component 1 (3% variance) vs Component 2 (2% variance).
    "PCA − after batch correction and normalization": Component 1 (2% variance) vs Component 2 (1% variance).]
  35. Questions?

    Support from NIH R01 grants GM083084, RR021967/GM103552, HG005220; NIH K99/R00 grant HG009007
    Rafael Irizarry