Towards progress in batch effects and biases in single-cell RNA-Seq data

Stephanie Hicks
September 16, 2016

Presented at the 2016 Single Cell Genomics Conference at the Wellcome Genome Campus in Hinxton, UK

Transcript

  1. Towards progress in
    batch effects and biases in
    single-cell RNA-Seq data
    Stephanie Hicks
    Dana-Farber Cancer Institute / Harvard SPH
    @stephaniehicks
    Single Cell Genomics 2016
    Wellcome Genome Campus

  2. Bimodal distribution of expression
    [Figure: number of cells (y-axis) vs. expression level of Gene X (x-axis),
    with two modes μ1 and μ2. Curve shown: True Expression.]

  3. Bimodal distribution of expression
    [Figure: number of cells (y-axis) vs. expression level of Gene X (x-axis),
    with two modes μ1 and μ2. Curves shown: True Expression and Observed Expression.]

  4. Patel et al. (2014) Science
    Cells cluster by tumor

  5. Verhaak et al. (2010). Cancer Cell

  6. Leek et al. (2010) Nat Reviews Genetics
    Batch effects in genomics data

  7. Batch effects in single-cell RNA-Seq data
    Confounded study design: when cells from one biological group or
    condition are cultured, captured and sequenced separately from cells
    in a second condition
    [Figure: "The Problem of Confounding Biological Variation and Batch Effects".
    Panels: completely confounded study design (bad) vs. balanced study design
    (good), laid out by processing batch, biological group (Groups 1-3) and
    replicate. Axes: Principal Component 1, Principal Component 2, proportion of
    detected genes. Legend: Batch 1, Batch 2, Batch 3. Annotation on observed
    differences: "We cannot determine if variation is driven by biology or
    batch effects."]

  8. Batch effects in single-cell RNA-Seq data
    Confounded study design: when cells from one biological group or
    condition are cultured, captured and sequenced separately from cells
    in a second condition
    More balanced study design: cells from different biological groups
    are processed in the same batch
    [Figure: "The Problem of Confounding Biological Variation and Batch Effects".
    Panels: completely confounded study design (bad) vs. balanced study design
    (good). Axes: Principal Component 1, Principal Component 2, proportion of
    detected genes. Legend: Batch 1, Batch 2, Batch 3.]
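The contrast between a completely confounded and a more balanced design can be checked directly from per-cell annotations. The following is a minimal sketch, not a method from the talk: the function names and the labels `g1`, `b1`, etc. are hypothetical, and "balanced" is taken here to mean only that every batch contains cells from every group, not that the counts are equal.

```python
from collections import defaultdict

def design_table(groups, batches):
    """Count cells per (batch, group) pair from per-cell annotations."""
    table = defaultdict(lambda: defaultdict(int))
    for group, batch in zip(groups, batches):
        table[batch][group] += 1
    return {batch: dict(counts) for batch, counts in table.items()}

def classify_design(groups, batches):
    """Label a design by how biological groups are spread over batches."""
    table = design_table(groups, batches)
    all_groups = set(groups)
    groups_per_batch = [set(counts) for counts in table.values()]
    # Every batch holds a single group: batch and biology cannot be separated.
    if len(all_groups) > 1 and all(len(g) == 1 for g in groups_per_batch):
        return "completely confounded"
    # Every batch holds every group: batch can be modeled alongside biology.
    if all(g == all_groups for g in groups_per_batch):
        return "balanced"
    return "partially confounded"

# Hypothetical annotations for six cells
confounded = classify_design(
    ["g1", "g1", "g2", "g2", "g3", "g3"],
    ["b1", "b1", "b2", "b2", "b3", "b3"],
)
balanced = classify_design(
    ["g1", "g2", "g3", "g1", "g2", "g3"],
    ["b1", "b1", "b1", "b2", "b2", "b2"],
)
print(confounded, "|", balanced)
```

In the first toy design each group is processed in its own batch, so any observed difference between groups could equally be a batch effect.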

  9. Batch effects in single-cell RNA-Seq data
    Confounded study design: when cells from one biological group or
    condition are cultured, captured and sequenced separately from cells
    in a second condition
    More balanced study design: cells from different biological groups
    are processed in the same batch
    Proposed by Tung et al. (2016) bioRxiv
    [Figure: "The Problem of Confounding Biological Variation and Batch Effects".
    Panels: completely confounded study design (bad) vs. balanced study design
    (good). Axes: Principal Component 1, Principal Component 2, proportion of
    detected genes. Legend: Batch 1, Batch 2, Batch 3.]

  10. Batch effects in single-cell RNA-Seq data
    Confounded study design: when cells from one biological group or
    condition are cultured, captured and sequenced separately from cells
    in a second condition
    More balanced study design: cells from different biological groups
    are processed in the same batch
    Implemented by Tung et al. (2016) bioRxiv
    [Figure: "The Problem of Confounding Biological Variation and Batch Effects".
    Panels: completely confounded study design (bad) vs. balanced study design
    (good). Axes: Principal Component 1, Principal Component 2, proportion of
    detected genes. Legend: Batch 1, Batch 2, Batch 3.]

  11. Use FASTQ header as a surrogate for batch
    http://support.illumina.com/help/SequencingAnalysisWorkflow/Content/Vault/Informatics/Sequencing_Analysis/CASAVA/swSEQ_mCA_FASTQFiles.htm
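As an illustration of the idea on this slide, the sketch below pulls the instrument, run number, flowcell ID and lane out of an Illumina FASTQ read header (CASAVA 1.8+ layout, `@instrument:run:flowcell:lane:tile:x:y ...`) and joins them into a batch label. The function names, the choice of these four fields, and the example header are assumptions made for illustration; the talk does not specify which fields were used.

```python
from collections import namedtuple

# Header fields that identify where and when a read was sequenced
FastqHeader = namedtuple(
    "FastqHeader", ["instrument", "run_number", "flowcell_id", "lane"]
)

def parse_fastq_header(header_line):
    """Parse the leading fields of an Illumina (CASAVA 1.8+) FASTQ header."""
    if not header_line.startswith("@"):
        raise ValueError("FASTQ header lines start with '@'")
    tokens = header_line[1:].split()
    if not tokens:
        raise ValueError("empty FASTQ header")
    # First token: instrument:run:flowcell:lane:tile:x:y
    fields = tokens[0].split(":")
    if len(fields) < 4:
        raise ValueError("unexpected FASTQ header layout")
    instrument, run_number, flowcell_id, lane = fields[:4]
    return FastqHeader(instrument, int(run_number), flowcell_id, int(lane))

def batch_label(header_line):
    """Build a surrogate batch label (hypothetical naming scheme)."""
    h = parse_fastq_header(header_line)
    return f"{h.instrument}_{h.run_number}_{h.flowcell_id}_{h.lane}"

# Illustrative header, not taken from any dataset in the talk
example = "@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG"
label = batch_label(example)
print(label)
```

Cells whose reads share the same label were sequenced on the same instrument, run, flowcell and lane, which is what makes the header usable when no batch annotation was recorded.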

  12. [Figure: "The Problem of Confounding Biological Variation and Batch Effects".
    Panels: completely confounded study design (bad) vs. balanced study design
    (good), laid out by processing batch, biological group (Groups 1-3) and
    replicate. Axes: Principal Component 1, Principal Component 2, proportion of
    detected genes. Legend: Batch 1, Batch 2, Batch 3. Annotation on observed
    differences: "We cannot determine if variation is driven by biology or
    batch effects."]
    Hicks et al. (2015) bioRxiv

  13. Cells cluster by tumor
    Hicks et al. (2015) bioRxiv

  14. Cells cluster by batch
    (tumors are confounded with batch)
    Hicks et al. (2015) bioRxiv

  15. Hicks et al. (2015) bioRxiv

  16. Different batches have different detection rates
    Hicks et al. (2015) bioRxiv
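To make the comparison of detection rates concrete, here is a minimal sketch that computes the proportion of detected genes per cell and averages it within each batch. It assumes a gene is "detected" in a cell when its count is greater than zero; the function names and the toy count matrix are hypothetical and are not data from the talk.

```python
import numpy as np

def detection_rate(counts):
    """Proportion of genes with a nonzero count in each cell.

    counts: array of shape (genes, cells).
    """
    counts = np.asarray(counts)
    return (counts > 0).mean(axis=0)

def mean_detection_rate_by_batch(counts, batches):
    """Average the per-cell detection rate within each batch."""
    rates = detection_rate(counts)
    batches = np.asarray(batches)
    return {
        str(batch): float(rates[batches == batch].mean())
        for batch in np.unique(batches)
    }

# Toy matrix: 4 genes (rows) x 4 cells (columns)
counts = np.array([
    [5, 3, 0, 0],
    [1, 2, 0, 1],
    [7, 1, 0, 0],
    [2, 0, 4, 0],
])
batches = ["batch1", "batch1", "batch2", "batch2"]

rates = detection_rate(counts)
by_batch = mean_detection_rate_by_batch(counts, batches)
print(rates.tolist())  # [1.0, 0.75, 0.25, 0.25]
print(by_batch)        # {'batch1': 0.875, 'batch2': 0.25}
```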

  17. Deng et al. (2014) Science
    Treutlein et al. (2014) Nature
    Trapnell et al. (2014) Nat Biotech
    Patel et al. (2014) Science

  18. Batch and outcomes of interest are confounded in
    published scRNA-Seq experiments

  19. Study design #1:
    Differences in the detection rates
    between batches of cells

  20. Study design #2:
    Differences in the detection rates
    between batches of cells

  21. Two studies had completely confounded study designs

  22. Correlation between PC1 and
    proportion of detected genes
    [Figure axis: Proportion of detected genes]
    Shalek et al. (2014). Nature
    Finak et al. (2015). Genome Biology
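The relationship named on this slide can be sketched on simulated data. The example below is an illustration under stated assumptions, not the analysis from the cited papers: counts are simulated as Poisson with a hypothetical per-cell capture efficiency, expression is transformed as log2(count + 1), a gene is "detected" when its count is nonzero, and PC1 is obtained from an SVD of the gene-centered matrix. Because the sign of a principal component is arbitrary, only the absolute correlation is reported.

```python
import numpy as np

def pc1_scores(log_expr):
    """Per-cell scores on the first principal component.

    log_expr: array of shape (genes, cells).
    """
    x = np.asarray(log_expr, dtype=float)
    centered = x - x.mean(axis=1, keepdims=True)  # center each gene across cells
    # SVD of the cells x genes matrix; U[:, 0] * S[0] gives the PC1 scores
    u, s, _ = np.linalg.svd(centered.T, full_matrices=False)
    return u[:, 0] * s[0]

def pc1_detection_correlation(counts):
    """Pearson correlation between PC1 scores and proportion of detected genes."""
    counts = np.asarray(counts, dtype=float)
    detection = (counts > 0).mean(axis=0)
    scores = pc1_scores(np.log2(counts + 1))
    return float(np.corrcoef(scores, detection)[0, 1])

# Simulated data (assumption): 200 genes, 40 cells, two capture efficiencies
rng = np.random.default_rng(0)
gene_means = rng.gamma(shape=2.0, scale=1.0, size=200)
efficiency = np.repeat([1.0, 0.2], 20)  # second half of the cells captures less RNA
counts = rng.poisson(np.outer(gene_means, efficiency))

r = pc1_detection_correlation(counts)
print(f"|correlation| between PC1 and detection rate: {abs(r):.2f}")
```

In this simulation the only systematic difference between cells is capture efficiency, so a strong correlation shows how a technical factor alone can dominate the first principal component.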
