Welcome to the World of Single-Cell RNA-Sequencing
Stephanie Hicks
Dana-Farber Cancer Institute / Harvard SPH
@stephaniehicks
Spring 2017 Single-Cell Sequencing Nanocourse
March 7, 2017
If you want to follow along, my slides & code are available here: https://github.com/stephaniehicks/singlecellnano2017
Game plan:
• scRNA-Seq versus bulk RNA-Seq?
• Technologies used to sequence single cells
• Applications of scRNA-Seq data
• Biological versus technical variability
• Raw, noisy data → clean data? (e.g. quality control, normalization)
• Intro to experimental design (from the statistical perspective)
• How batch effects can occur in single-cell RNA-Seq data
• A case study using R/Bioconductor
Common types of scRNA-Seq data
Adapted from Kolodziejczyk et al. (2015). Molecular Cell 58
[Figure labels: Heterogeneous cell populations; Purified cell populations. Illustrations: Battle droids; Super battle droids!; Mixed bag of R-Series droids]
Common applications using scRNA-Seq data
Adapted from Kolodziejczyk et al. (2015). Molecular Cell 58
• Characterization of cell type populations
• Identify cell type populations (e.g. dimension reduction or clustering)
• Differential splicing between populations
• Identify allele-specific expression
• Identify genes that drive a process across time
Variability in scRNA-Seq data
Kolodziejczyk et al. (2015). Molecular Cell 58
[Figure labels: Signal; Noise; Variability visible in bulk RNA-Seq; Additional variability in scRNA-Seq data (and variability from bulk)]
• Overdispersion, batch effects, sequencing depth, GC bias, amplification bias
• Extrinsic noise (regulation by transcription factors) vs intrinsic noise (stochastic bursting/firing, cell cycle)
• Capture efficiency (starting amount of mRNA)
Going from “raw” data to “clean” data
Taken from Davis McCarthy’s slides at Genome Informatics 2016: https://speakerdeck.com/davismcc/what-do-we-need-computationally-to-make-the-most-of-single-cell-rna-seq-data
Quality Control
Adapted from Stegle et al. (2015) Nature Reviews Genetics 16: 133-145; Lun et al. (2016) F1000
• Cell-level quality control
• Gene-level quality control
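The two levels of quality control can be illustrated with a minimal sketch. It is not taken from the nanocourse materials (whose analysis is in R Markdown); the toy count matrix, the cutoff values, and the function names `cell_qc` and `gene_qc` are all hypothetical and chosen only to make the idea concrete: first drop low-quality cells, then drop genes that are rarely detected among the remaining cells.

```python
# Hypothetical toy count matrix: one row per gene, one column per cell.
cells = ["cell1", "cell2", "cell3", "cell4"]
counts = {
    "GeneA": [10, 0, 7, 0],
    "GeneB": [5, 1, 9, 0],
    "GeneC": [0, 0, 1, 0],
    "GeneD": [8, 0, 6, 1],
}

# Assumed cutoffs for illustration only; real cutoffs are data dependent.
MIN_LIBRARY_SIZE = 10      # minimum total counts per cell
MIN_GENES_DETECTED = 2     # minimum number of genes with a nonzero count per cell
MIN_CELLS_EXPRESSING = 2   # minimum number of kept cells with a nonzero count per gene


def cell_qc(counts, cells):
    """Cell-level QC: keep cells with enough total counts and detected genes."""
    kept = []
    for j, cell in enumerate(cells):
        column = [row[j] for row in counts.values()]
        library_size = sum(column)
        genes_detected = sum(1 for c in column if c > 0)
        if library_size >= MIN_LIBRARY_SIZE and genes_detected >= MIN_GENES_DETECTED:
            kept.append(cell)
    return kept


def gene_qc(counts, cells, kept_cells):
    """Gene-level QC: keep genes detected in enough of the cells that passed QC."""
    kept_idx = [cells.index(c) for c in kept_cells]
    kept = []
    for gene, row in counts.items():
        n_expressing = sum(1 for j in kept_idx if row[j] > 0)
        if n_expressing >= MIN_CELLS_EXPRESSING:
            kept.append(gene)
    return kept


kept_cells = cell_qc(counts, cells)
kept_genes = gene_qc(counts, cells, kept_cells)
print(kept_cells)
print(kept_genes)
```

Running cell-level QC before gene-level QC means that a gene's detection rate is judged only on cells that were kept.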
Normalization
• Without spike-ins or UMIs
  – Between-sample normalization methods
    • Global scaling factors, mostly developed for bulk RNA-Seq
    • Number of zeros (see Lun et al., 2016. Genome Biology)
• With spike-ins or UMIs
  – Spike-ins: theoretically a good idea, but many challenges still remain for scRNA-Seq (see Stegle et al., 2015; Tung et al., 2016); conflicting viewpoints on whether ERCCs are appropriate
  – UMIs: reduce amplification bias; not appropriate for isoform or allele-specific expression
• Biological (nuisance?) variability
  – Differences among cells in cell-cycle stage or cell size
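As a rough illustration of a global scaling factor, the sketch below applies plain library-size scaling, one of the simplest bulk-style approaches. It is an assumption chosen for illustration, not the method used in the nanocourse case study; the toy counts and the names `size_factors` and `normalize` are hypothetical.

```python
# Hypothetical toy data: raw counts for three genes in two cells.
raw_counts = {
    "cell1": [10, 30, 60],    # library size 100
    "cell2": [20, 60, 120],   # library size 200 (same composition, sequenced deeper)
}


def size_factors(counts_by_cell):
    """One global scaling factor per cell: library size divided by the mean library size.

    Assumes every cell has a nonzero library size (e.g. after cell-level QC).
    """
    library_sizes = {cell: sum(c) for cell, c in counts_by_cell.items()}
    mean_size = sum(library_sizes.values()) / len(library_sizes)
    return {cell: size / mean_size for cell, size in library_sizes.items()}


def normalize(counts_by_cell):
    """Divide each cell's counts by that cell's size factor."""
    factors = size_factors(counts_by_cell)
    return {
        cell: [c / factors[cell] for c in counts]
        for cell, counts in counts_by_cell.items()
    }


normalized = normalize(raw_counts)
print(normalized)
```

Here the two cells differ only in sequencing depth, so they end up with matching normalized values. With the many zeros typical of scRNA-Seq data, a single global factor of this kind can be unreliable, which is the concern raised in the bullets above.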
Use FASTQ header as a surrogate for batch
http://support.illumina.com/help/SequencingAnalysisWorkflow/Content/Vault/Informatics/Sequencing_Analysis/CASAVA/swSEQ_mCA_FASTQFiles.htm
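A minimal sketch of the idea, assuming CASAVA 1.8-style headers of the form `@instrument:run:flowcell:lane:tile:x:y read:filtered:control:index`. Combining instrument, flowcell, and lane into one label is an illustrative assumption, and the function names and example headers are hypothetical (the first header follows the pattern of the example in Illumina's documentation).

```python
from collections import Counter


def parse_fastq_header(header):
    """Split the first part of a CASAVA 1.8-style FASTQ header into named fields."""
    line = header.strip()
    if not line.startswith("@"):
        raise ValueError("FASTQ header lines start with '@'")
    read_id = line[1:].split(" ", 1)[0]   # drop the trailing 'read:filtered:control:index' part
    fields = read_id.split(":")
    if len(fields) != 7:
        raise ValueError("expected 7 colon-separated fields, got %d" % len(fields))
    keys = ("instrument", "run_number", "flowcell_id", "lane", "tile", "x_pos", "y_pos")
    return dict(zip(keys, fields))


def batch_label(header):
    """Assumed surrogate batch label: instrument, flowcell ID, and lane joined together."""
    f = parse_fastq_header(header)
    return "_".join([f["instrument"], f["flowcell_id"], f["lane"]])


# Hypothetical headers: two reads from one lane, one read from another flowcell.
headers = [
    "@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG",
    "@EAS139:136:FC706VJ:2:2104:15400:197401 1:N:18:ATCACG",
    "@EAS139:140:FC801AB:5:1101:1234:5678 1:N:0:TTAGGC",
]
reads_per_batch = Counter(batch_label(h) for h in headers)
print(reads_per_batch)
```

In practice one would read only the first header of each cell's FASTQ file and record its label as a covariate, then check whether that covariate is confounded with the biological groups.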
Bad news: Batch effects can be a big problem in scRNA-Seq data (but not always).
Good news: Batch effects and methods to correct for them have been around for many years (lots of places to start).
Bad news: Poor experimental design is a big limiting factor. It is also more complicated because of sparsity (biology and technology), capture efficiency, etc.
Good news: Increase awareness about good experimental design. New methods specific to scRNA-Seq are being developed.
Demonstration of how to correct for batch effects in an unconfounded study design
Data from Tung et al. (2017) Scientific Reports
Complete analysis in R Markdown on GitHub here: https://github.com/stephaniehicks/singlecellnano2017
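To show why an unconfounded design makes correction possible, here is a minimal sketch of one simple location adjustment: centering each gene within each batch. This is not the method used in the linked R Markdown analysis; the toy values, group and batch labels, and the function name `center_by_batch` are hypothetical. It assumes log-scale expression and a balanced design in which every batch contains the same mix of biological groups.

```python
def center_by_batch(values, batches):
    """Remove batch-specific shifts for one gene.

    Subtracts each batch's mean from its cells and adds back the overall mean.
    Only sensible when batch is not confounded with the biology of interest.
    """
    overall_mean = sum(values) / len(values)
    sums, sizes = {}, {}
    for v, b in zip(values, batches):
        sums[b] = sums.get(b, 0.0) + v
        sizes[b] = sizes.get(b, 0) + 1
    batch_means = {b: sums[b] / sizes[b] for b in sums}
    return [v - batch_means[b] + overall_mean for v, b in zip(values, batches)]


# Hypothetical log-expression of one gene in four cells.
# Groups A and B each appear once per batch; batch2 is shifted up by 1.0.
groups = ["A", "B", "A", "B"]
batches = ["batch1", "batch1", "batch2", "batch2"]
values = [1.0, 3.0, 2.0, 4.0]

corrected = center_by_batch(values, batches)
print(list(zip(groups, batches, corrected)))
```

After centering, group A has the same value in both batches and so does group B, while the difference between A and B is preserved. If each batch held only one group (a confounded design), the same step would remove the biological difference along with the batch shift.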