Welcome to the World of Single-Cell RNA-Sequencing

Presentation at the Single Cell Nanocourse in Spring 2017 at Harvard Medical School (https://nanosandothercourses.hms.harvard.edu/node/420)

Stephanie Hicks

March 07, 2017

Transcript

  1. Welcome to the World of
    Single-Cell RNA-Sequencing
    Stephanie Hicks
    Dana-Farber Cancer Institute / Harvard SPH
    @stephaniehicks
    Spring 2017 Single-Cell Sequencing Nanocourse
    March 7, 2017
    If you want to follow along, my slides & code are available here:
    https://github.com/stephaniehicks/singlecellnano2017

  2. Game plan:
    •  scRNA-Seq versus bulk RNA-Seq?
    •  Technologies used to sequence single cells
    •  Applications of scRNA-Seq data
    •  Biological versus technical variability
    •  Raw, noisy data → clean data? (e.g. quality control, normalization)
    •  Intro to experimental design (from the statistical perspective)
    •  How batch effects can occur in single-cell RNA-Seq data
    •  A case study using R/Bioconductor

  3. Single-cell RNA-Seq (scRNA-Seq)
    Tissue (e.g. tumor) → isolate and sequence individual cells → read counts → compare gene expression profiles of single cells
    Read counts (genes × cells):
                Cell 1   Cell 2   …
      Gene 1        18        0
      Gene 2      1010      506
      Gene 3         0       49
      Gene 4        22        0
    [Figure: cells plotted on Principal Component 1 vs. Principal Component 2]
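The read-count table above is a genes × cells matrix. As a minimal illustration (a Python sketch, not the R/Bioconductor code used in the talk), the toy values from the slide can be stored one row per gene, with each column giving one cell's expression profile. The helper names `cell_profile` and `fraction_zeros` are hypothetical.

```python
# Toy genes x cells read-count matrix, using the example values on the slide.
# Each key is a gene; each list holds that gene's counts in Cell 1 and Cell 2.
counts = {
    "Gene 1": [18, 0],
    "Gene 2": [1010, 506],
    "Gene 3": [0, 49],
    "Gene 4": [22, 0],
}
cells = ["Cell 1", "Cell 2"]


def cell_profile(counts, j):
    """Expression profile of cell j: its count for every gene (one column)."""
    return [row[j] for row in counts.values()]


def fraction_zeros(counts):
    """Share of matrix entries that are zero, a simple measure of sparsity."""
    values = [v for row in counts.values() for v in row]
    return sum(v == 0 for v in values) / len(values)


if __name__ == "__main__":
    for j, cell in enumerate(cells):
        profile = cell_profile(counts, j)
        print(cell, profile, "total reads:", sum(profile))
    print("fraction of zero counts:", fraction_zeros(counts))
```

Even in this tiny example, 3 of the 8 entries are zero; real scRNA-Seq count matrices are typically much sparser.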

  4. scRNA-Seq vs bulk RNA-Seq
    Korthauer et al. (2016). Genome Biology

  5. Kolodziejczyk et al. (2015). Molecular Cell 58

  6. Kolodziejczyk et al. (2015). Molecular Cell 58

  7. Common types of scRNA-Seq data
    Adapted from Kolodziejczyk et al. (2015). Molecular Cell 58
    Heterogeneous cell populations
    Purified cell populations
    Battle droids    Super battle droids!
    Mixed bag of R-Series droids

  8. Common applications using scRNA-Seq data
    Adapted from Kolodziejczyk et al. (2015). Molecular Cell 58
    Characterization of cell type populations
    Identify cell type populations
    (e.g. dim reduction or clustering)
    Differential splicing between populations
    Identify allele-specific expression
    Identify genes that drive a process across time

  9. Variability in scRNA-Seq data
    Kolodziejczyk et al. (2015). Molecular Cell 58
    Signal
    Noise
    overdispersion, batch effects, sequencing depth,
    GC bias, amplification bias
    Extrinsic noise (regulation by transcription factors)
    vs. intrinsic noise (stochastic bursting/firing, cell cycle)
    capture efficiency (starting amount of mRNA)
    Variability visible in bulk RNA-Seq
    Additional variability in scRNA-Seq data
    (on top of the variability seen in bulk)

  10. Going from “raw” data to “clean” data
    Taken from Davis McCarthy’s slides at Genome Informatics 2016
    https://speakerdeck.com/davismcc/what-do-we-need-computationally-to-make-the-most-of-single-cell-rna-seq-data

  11. Quality Control
    Adapted from Stegle et al. (2015) Nature Reviews Genetics 16: 133-145
    Lun et al. (2016) F1000
    Cell-level quality control
    Gene-level quality control
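The two levels of quality control above can be sketched in Python as follows. This is an illustrative sketch, not the exact procedure of Stegle et al. or Lun et al.: it assumes a cell is flagged when its library size or its number of detected genes falls more than `nmads` scaled median absolute deviations below the median on a log scale, and that a gene is kept only if it is detected in at least `min_cells` retained cells. Both thresholds and all function names are hypothetical choices.

```python
import math
import statistics


def cell_qc_metrics(counts):
    """Per-cell library size (total counts) and number of detected genes.

    `counts` is a list of rows: one row per gene, one column per cell.
    """
    n_cells = len(counts[0])
    lib_size = [sum(row[j] for row in counts) for j in range(n_cells)]
    detected = [sum(row[j] > 0 for row in counts) for j in range(n_cells)]
    return lib_size, detected


def low_outliers(values, nmads=3.0):
    """Flag values far below the median on the log10(x + 1) scale.

    The MAD is scaled by 1.4826, the same constant R's mad() uses by default.
    """
    logged = [math.log10(v + 1) for v in values]
    med = statistics.median(logged)
    mad = 1.4826 * statistics.median([abs(x - med) for x in logged])
    return [x < med - nmads * mad for x in logged]


def quality_control(counts, nmads=3.0, min_cells=2):
    """Return indices of cells and genes that pass the (hypothetical) QC rules."""
    lib_size, detected = cell_qc_metrics(counts)
    bad_lib = low_outliers(lib_size, nmads)
    bad_det = low_outliers(detected, nmads)
    # Cell-level QC: drop cells that are low outliers on either metric.
    keep_cells = [j for j in range(len(lib_size)) if not (bad_lib[j] or bad_det[j])]
    # Gene-level QC: keep genes with a non-zero count in >= min_cells kept cells.
    keep_genes = [
        i for i, row in enumerate(counts)
        if sum(row[j] > 0 for j in keep_cells) >= min_cells
    ]
    return keep_cells, keep_genes


if __name__ == "__main__":
    # Toy data: the last cell (index 5) has almost no reads,
    # and the last gene (index 3) is never detected.
    counts = [
        [500, 700, 1000, 600, 900, 10],
        [300, 500, 600, 400, 500, 0],
        [200, 300, 400, 200, 400, 0],
        [0, 0, 0, 0, 0, 0],
    ]
    print(quality_control(counts))
```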

  12. So… what about normalization and
    dealing with other technical variation
    in scRNA-Seq data?
    Much to learn, you still have….

  13. Normalization
    •  Without spike-ins or UMIs
       –  Between-sample normalization methods
          •  Global scaling factors mostly developed for bulk RNA-Seq
          •  Number of zeros (see Lun et al., 2016, Genome Biology)
    •  With spike-ins or UMIs
       –  Spike-ins: theoretically a good idea, but many challenges still
          remain for scRNA-Seq (see Stegle et al., 2015; Tung et al., 2016);
          conflicting viewpoints on whether ERCCs are appropriate
       –  UMIs: reduce amplification bias; not appropriate for isoform
          or allele-specific expression
    •  Biological (nuisance?) variability
       –  differences among cells in cell-cycle stage or cell size
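The simplest form of the global scaling normalization listed above divides each cell's counts by a size factor proportional to its library size. A minimal Python sketch, assuming plain library-size scaling: this is not the pooling approach of Lun et al. (2016), and it ignores spike-ins and UMIs. The function names and the pseudocount of 1 are illustrative choices.

```python
import math


def library_size_factors(counts):
    """Global scaling factors: each cell's total count divided by the mean
    total count, so that the factors average to 1.

    `counts` is a list of rows: one row per gene, one column per cell.
    """
    n_cells = len(counts[0])
    lib_size = [sum(row[j] for row in counts) for j in range(n_cells)]
    mean_lib = sum(lib_size) / n_cells
    return [s / mean_lib for s in lib_size]


def log_normalize(counts, size_factors, pseudocount=1.0):
    """Divide each count by its cell's size factor, then take log2(x + pseudocount)."""
    return [
        [math.log2(v / sf + pseudocount) for v, sf in zip(row, size_factors)]
        for row in counts
    ]


if __name__ == "__main__":
    # Cell 2 was sequenced twice as deeply as cell 1 but has the same composition.
    counts = [[10, 20], [30, 60]]
    sf = library_size_factors(counts)
    print(sf)
    # After scaling, both cells have (up to rounding) identical values.
    print(log_normalize(counts, sf))
```

Because a single factor is applied to every gene in a cell, this approach assumes most genes are not differentially expressed between cells; the many zeros in scRNA-Seq data are one reason scRNA-Seq-specific alternatives were developed.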

  14. “Hey, someone told me of this thing called batch effects….
    Should I be worried about them in my scRNA-Seq data?”

  15. Patel et al. (2014) Science
    Cells cluster by tumor

  16. Verhaak et al. (2010). Cancer Cell

  17. Leek et al. (2010) Nat Reviews Genetics
    Batch effects in genomics data

  18. Batch effects in single-cell RNA-Seq data
    [Figure: The Problem of Confounding Biological Variation and Batch Effects. Processing batch vs. biological group (with replicates) for a completely confounded study design ("Bad") and a balanced study design ("Good"); cells shown on Principal Component 1 vs. 2 and by proportion of detected genes, for Groups 1 to 3 and Batches 1 to 3. Observed differences in the confounded design: "We cannot determine if variation is driven by biology or batch effects."]
    Confounded study design: when cells from one biological group or
    condition are cultured, captured and sequenced separately from
    cells in a second condition

  19. Batch effects in single-cell RNA-Seq data
    [Figure: The Problem of Confounding Biological Variation and Batch Effects, completely confounded vs. balanced study design (as on slide 18)]
    When cells from one biological group or
    condition are cultured, captured and
    sequenced separately from cells in a
    second condition
    More balanced study design:
    Cells from different biological groups
    processed in the same batch

  20. Batch effects in single-cell RNA-Seq data
    [Figure: The Problem of Confounding Biological Variation and Batch Effects, completely confounded vs. balanced study design (as on slide 18)]
    When cells from one biological group or
    condition are cultured, captured and
    sequenced separately from cells in a
    second condition
    More balanced study design:
    Cells from different biological groups
    processed in the same batch
    Proposed by Tung et al. (2017) Scientific Reports

  21. Batch effects in single-cell RNA-Seq data
    [Figure: The Problem of Confounding Biological Variation and Batch Effects, completely confounded vs. balanced study design (as on slide 18)]
    When cells from one biological group or
    condition are cultured, captured and
    sequenced separately from cells in a
    second condition
    More balanced study design:
    Cells from different biological groups
    processed in the same batch
    Implemented by Tung et al. (2017) Scientific Reports
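Whether a design is completely confounded or balanced can be checked before any analysis by cross-tabulating cells by biological group and processing batch. A minimal Python sketch follows; the group and batch labels are hypothetical, and "balanced" is taken here to mean only that every batch contains cells from every group (equal cell numbers are not required).

```python
from collections import Counter, defaultdict


def design_table(groups, batches):
    """Number of cells for each (biological group, processing batch) pair."""
    return Counter(zip(groups, batches))


def groups_per_batch(groups, batches):
    """Map each batch to the set of biological groups it contains."""
    table = defaultdict(set)
    for g, b in zip(groups, batches):
        table[b].add(g)
    return table


def completely_confounded(groups, batches):
    """True if every batch holds a single group: group differences then
    cannot be separated from batch differences."""
    return all(len(gs) == 1 for gs in groups_per_batch(groups, batches).values())


def balanced(groups, batches):
    """True if every batch contains cells from every group."""
    all_groups = set(groups)
    return all(gs == all_groups for gs in groups_per_batch(groups, batches).values())


if __name__ == "__main__":
    groups = ["G1", "G1", "G2", "G2", "G3", "G3"]
    confounded_batches = ["B1", "B1", "B2", "B2", "B3", "B3"]
    mixed_batches = ["B1", "B2", "B1", "B2", "B1", "B2"]
    print(completely_confounded(groups, confounded_batches))  # True
    print(balanced(groups, mixed_batches))                     # True
```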

  22. Use FASTQ header as a surrogate for batch
    http://support.illumina.com/help/SequencingAnalysisWorkflow/Content/Vault/Informatics/Sequencing_Analysis/CASAVA/swSEQ_mCA_FASTQFiles.htm
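In the Illumina CASAVA 1.8+ convention, a FASTQ header line has the form `@<instrument>:<run number>:<flowcell ID>:<lane>:<tile>:<x-pos>:<y-pos> <read>:<is filtered>:<control number>:<index>`. A minimal Python sketch that parses those fields and builds a surrogate batch label; treating instrument, run, flowcell and lane as "the batch" is an assumption for illustration, as is the example header and the function names.

```python
def parse_illumina_header(header):
    """Split a CASAVA 1.8+ style FASTQ header line into named fields.

    Assumes the read ID has at least seven colon-separated fields; any
    extra trailing fields are ignored.
    """
    if not header.startswith("@"):
        raise ValueError("FASTQ header lines start with '@'")
    read_id, _, comment = header[1:].strip().partition(" ")
    instrument, run, flowcell, lane, tile, x, y = read_id.split(":")[:7]
    fields = {
        "instrument": instrument,
        "run": int(run),
        "flowcell": flowcell,
        "lane": int(lane),
        "tile": int(tile),
        "x": int(x),
        "y": int(y),
    }
    if comment:
        read, is_filtered, control, index = comment.split(":")[:4]
        fields.update(
            read=int(read),
            is_filtered=(is_filtered == "Y"),
            control=int(control),
            index=index,
        )
    return fields


def batch_key(header):
    """Hypothetical surrogate batch label: (instrument, run, flowcell, lane)."""
    f = parse_illumina_header(header)
    return (f["instrument"], f["run"], f["flowcell"], f["lane"])


if __name__ == "__main__":
    header = "@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG"
    print(batch_key(header))  # ('EAS139', 136, 'FC706VJ', 2)
```

Cells whose reads share a key were sequenced on the same flowcell lane; if such keys line up with the biological groups, that is a warning sign of confounding.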

  23. [Figure: The Problem of Confounding Biological Variation and Batch Effects, completely confounded vs. balanced study design (as on slide 18)]
    Hicks et al. (2015) bioRxiv

  24. Cells cluster by tumor
    Hicks et al. (2015) bioRxiv

  25. Cells cluster by batch
    (tumors are confounded with batch)
    Hicks et al. (2015) bioRxiv

  26. Hicks et al. (2015) bioRxiv

  27. Different batches have different detection rates
    Hicks et al. (2015) bioRxiv
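The detection rate compared across batches here is the proportion of genes with a non-zero count in each cell. A minimal Python sketch that computes it and summarizes it per batch; the median summary, the batch labels and the function names are illustrative choices, not necessarily those used by Hicks et al.

```python
import statistics
from collections import defaultdict


def detection_rate(counts):
    """Per-cell proportion of genes with a non-zero count.

    `counts` is a list of rows: one row per gene, one column per cell.
    """
    n_genes = len(counts)
    n_cells = len(counts[0])
    return [sum(row[j] > 0 for row in counts) / n_genes for j in range(n_cells)]


def median_rate_by_batch(rates, batches):
    """Median detection rate of the cells in each batch."""
    by_batch = defaultdict(list)
    for rate, batch in zip(rates, batches):
        by_batch[batch].append(rate)
    return {batch: statistics.median(r) for batch, r in sorted(by_batch.items())}


if __name__ == "__main__":
    counts = [
        [1, 2, 0, 0],
        [3, 0, 1, 0],
        [5, 1, 0, 2],
        [0, 4, 0, 0],
    ]
    rates = detection_rate(counts)
    print(rates)                                                   # [0.75, 0.75, 0.25, 0.25]
    print(median_rate_by_batch(rates, ["b1", "b1", "b2", "b2"]))  # {'b1': 0.75, 'b2': 0.25}
```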

  28. Batch effects can be a big problem in scRNA-Seq data
    (but not always).
    Good news:
    Batch effects and methods to correct for batch effects have
    been around for many years (lots of places to start).
    Bad news:
    Poor experimental design is a big limiting factor.
    … also, more complicated because of sparsity (biology and
    technology), capture efficiency, etc.
    Good news:
    Increase awareness about good experimental design.
    New methods specific for scRNA-Seq are being developed.

  29. Batch Correction for scRNA-Seq data
    •  Methods for microarrays or bulk RNA-Seq
       –  linear mixed models (require technical replicates)
       –  ComBat
    •  Methods for scRNA-Seq:
       –  Differential expression (just a few):
          •  SCDE, MAST, scDD, BASiCS, M3Drop
       –  More generalized
          •  scone, scater
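The simplest idea behind the batch-correction approaches listed above is to remove an additive shift per batch. The Python sketch below does only that: for each gene it subtracts the batch mean of the log-expression values and adds back the gene's overall mean. It is a deliberately simplified stand-in, not ComBat (which also adjusts batch-specific variances using empirical Bayes shrinkage) and not a mixed model. It is only sensible when every batch contains a similar mix of biological groups; in a confounded design it would remove the biology along with the batch. The function name is hypothetical.

```python
from collections import defaultdict


def center_by_batch(logexpr, batches):
    """Remove additive batch shifts, one gene (row) at a time.

    `logexpr` is a list of rows (genes) of log-scale expression values,
    one column per cell; `batches` gives each cell's batch label.
    """
    corrected = []
    for row in logexpr:
        overall_mean = sum(row) / len(row)
        totals = defaultdict(float)
        sizes = defaultdict(int)
        for value, batch in zip(row, batches):
            totals[batch] += value
            sizes[batch] += 1
        batch_mean = {b: totals[b] / sizes[b] for b in totals}
        # Shift each batch so that its mean equals the gene's overall mean.
        corrected.append(
            [value - batch_mean[batch] + overall_mean for value, batch in zip(row, batches)]
        )
    return corrected


if __name__ == "__main__":
    logexpr = [[1.0, 2.0, 3.0, 4.0]]  # cells from replicate r2 are shifted up by 2 units
    print(center_by_batch(logexpr, ["r1", "r1", "r2", "r2"]))  # [[2.0, 3.0, 2.0, 3.0]]
```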

  30. Demonstration of how to correct for batch
    effects in an unconfounded study design
    Data from Tung et al. (2017) Scientific Reports
    Complete analysis in R Markdown on GitHub here:
    https://github.com/stephaniehicks/singlecellnano2017

  31. Plot cells along first two Principal Components
    [Figure: cells on the first two principal components (Component 1: 3% variance; Component 2: 2% variance); legend: replicate (r1, r2, r3) and total_features (7000 to 9500)]
    Complete analysis in R Markdown on GitHub here:
    https://github.com/stephaniehicks/singlecellnano2017
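The axis labels on this plot ("Component 1: 3% variance") report the fraction of total variance captured by each principal component. A minimal pure-Python sketch of how the first component and its variance fraction can be computed, using power iteration on the cell-by-cell Gram matrix of gene-centred data. This is an illustrative choice, not necessarily how the plotting function used in the talk computes PCA, and the function name `first_pc` is hypothetical.

```python
import math
import random


def first_pc(matrix, n_iter=500, seed=0):
    """Cell scores on the first principal component and the fraction of
    total variance it explains.

    `matrix` is a list of rows (genes), one column per cell. Genes are
    centred across cells, then the leading eigenvector of the cells x cells
    Gram matrix is found by power iteration.
    """
    centred = []
    for row in matrix:
        mean = sum(row) / len(row)
        centred.append([v - mean for v in row])
    n = len(matrix[0])
    gram = [[sum(r[i] * r[j] for r in centred) for j in range(n)] for i in range(n)]
    total_var = sum(gram[i][i] for i in range(n))  # total sum of squares
    if total_var == 0:
        raise ValueError("all genes are constant across cells")
    rng = random.Random(seed)
    u = [rng.random() for _ in range(n)]
    for _ in range(n_iter):
        w = [sum(gram[i][j] * u[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        u = [x / norm for x in w]
    # Rayleigh quotient gives the leading eigenvalue (sum of squares along PC1).
    eigval = sum(u[i] * sum(gram[i][j] * u[j] for j in range(n)) for i in range(n))
    scores = [math.sqrt(eigval) * x for x in u]
    return scores, eigval / total_var


if __name__ == "__main__":
    # Two groups of cells separated along gene 1; gene 2 varies only slightly.
    matrix = [[0.0, 0.0, 10.0, 10.0], [0.0, 1.0, 0.0, 1.0]]
    scores, frac = first_pc(matrix)
    print([round(s, 3) for s in scores], round(100 * frac, 1), "% variance")
```

The sign of the scores is arbitrary. For the second component, one would remove the first component from the Gram matrix (deflation) and repeat the iteration.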

  32. t-SNE
    How to use t-SNE effectively
    http://distill.pub/2016/misread-tsne/
    [Figure: four t-SNE plots (Dimension 1 vs. Dimension 2) with Perplexity = 2, 5, 25 and the default perplexity; legend: replicate (r1, r2, r3) and total_features (7000 to 9500)]
    Complete analysis in R Markdown on GitHub here:
    https://github.com/stephaniehicks/singlecellnano2017
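The perplexity setting compared in these panels controls roughly how many neighbours each cell effectively considers. In the usual t-SNE formulation, each point i gets Gaussian conditional probabilities p(j|i) over its neighbours, and the bandwidth sigma_i is tuned by binary search until 2 raised to the entropy (in bits) of those probabilities equals the requested perplexity. A minimal Python sketch of that calibration step only, not a full t-SNE implementation; the function names and search bounds are illustrative.

```python
import math


def conditional_probs(sq_dists, sigma):
    """Gaussian probabilities p(j|i) that point i picks neighbour j.

    `sq_dists` are squared distances from i to its neighbours (i excluded).
    Subtracting the smallest distance avoids underflow without changing the
    normalised probabilities.
    """
    d_min = min(sq_dists)
    weights = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]


def perplexity(probs):
    """Perplexity 2**H, where H is the Shannon entropy in bits."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2.0 ** entropy


def sigma_for_perplexity(sq_dists, target, lo=1e-3, hi=1e3, n_iter=100):
    """Binary search for the bandwidth whose perplexity matches `target`.

    Perplexity grows with sigma, from about 1 (only the nearest neighbour)
    towards the number of neighbours, so `target` should lie in that range.
    """
    for _ in range(n_iter):
        mid = (lo + hi) / 2.0
        if perplexity(conditional_probs(sq_dists, mid)) > target:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    sq_dists = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    for target in (2, 5):
        sigma = sigma_for_perplexity(sq_dists, target)
        achieved = perplexity(conditional_probs(sq_dists, sigma))
        print(target, round(sigma, 3), round(achieved, 3))
```

A small perplexity (such as 2) makes each cell attend to very few neighbours, which is one reason the panels on this slide look so different from one another.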

  33. PCA (post-batch correction)
    [Figure: two PCA plots; legend: replicate (r1, r2, r3) and total_features (7000 to 9500). "PCA - no normalisation": Component 1: 3% variance, Component 2: 2% variance. "PCA - after batch correction and normalization": Component 1: 2% variance, Component 2: 1% variance]
    Complete analysis in R Markdown on GitHub here:
    https://github.com/stephaniehicks/singlecellnano2017

  34.

  35. Support from
    NIH R01 grants GM083084, RR021967/GM103552, HG005220
    NIH K99/R00 grant HG009007
    Rafael Irizarry
    Questions?