Towards progress in batch effects and biases in single-cell RNA-Seq data

Stephanie Hicks
September 16, 2016

Presented at the 2016 Single Cell Genomics Conference at the Wellcome Genome Campus in Hinxton, UK


Transcript

  1. Towards progress in batch effects and biases in single-cell RNA-Seq data. Stephanie Hicks, Dana-Farber Cancer Institute / Harvard SPH, @stephaniehicks. Single Cell Genomics 2016, Wellcome Genome Campus
  2. Bimodal distribution of expression. [Figure: number of cells against expression level of Gene X, with modes μ1 and μ2; true expression.]
  3. Bimodal distribution of expression. [Figure: number of cells against expression level of Gene X, with modes μ1 and μ2; true expression and observed expression.]
  4. Patel et al. (2014) Science: cells cluster by tumor

  5. Verhaak et al. (2010). Cancer Cell

  6. Leek et al. (2010) Nat Reviews Genetics: batch effects in genomics data
  7. Batch effects in single-cell RNA-Seq data. Confounded study design: cells from one biological group or condition are cultured, captured and sequenced separately from cells in a second condition. [Figure: "The Problem of Confounding Biological Variation and Batch Effects". Panels: completely confounded study design (bad) and balanced study design (good), each with processing batches 1 to 3 and biological groups 1 to 3. Axes: proportion of detected genes by batch; Principal Component 1 vs. Principal Component 2. Annotation: we cannot determine if variation is driven by biology or batch effects.]
  8. Batch effects in single-cell RNA-Seq data (same figure). Cells from one biological group or condition are cultured, captured and sequenced separately from cells in a second condition. More balanced study design: cells from different biological groups are processed in the same batch.
  9. Batch effects in single-cell RNA-Seq data (same figure). Cells from one biological group or condition are cultured, captured and sequenced separately from cells in a second condition. More balanced study design: cells from different biological groups are processed in the same batch. Proposed by Tung et al. (2016) bioRxiv
  10. Batch effects in single-cell RNA-Seq data (same figure). Cells from one biological group or condition are cultured, captured and sequenced separately from cells in a second condition. More balanced study design: cells from different biological groups are processed in the same batch. Implemented by Tung et al. (2016) bioRxiv
  11. Use FASTQ header as a surrogate for batch. http://support.illumina.com/help/SequencingAnalysisWorkflow/Content/Vault/Informatics/Sequencing_Analysis/CASAVA/swSEQ_mCA_FASTQFiles.htm

  12. [Figure: "The Problem of Confounding Biological Variation and Batch Effects". Panels: completely confounded study design (bad) and balanced study design (good), each with processing batches 1 to 3 and biological groups 1 to 3. Axes: proportion of detected genes by batch; Principal Component 1 vs. Principal Component 2. Annotation: we cannot determine if variation is driven by biology or batch effects.] Hicks et al. (2015) bioRxiv
  13. Cells cluster by tumor. Hicks et al. (2015) bioRxiv
  14. Cells cluster by batch (tumors are confounded with batch). Hicks et al. (2015) bioRxiv
  15. Hicks et al. (2015) bioRxiv
  16. Different batches have different detection rates. Hicks et al. (2015) bioRxiv
  17. Deng et al. (2014) Science; Treutlein et al. (2014) Nature; Trapnell et al. (2014) Nat Biotech; Patel et al. (2014) Science
  18. Batch and outcomes of interest are confounded in published scRNA-Seq experiments
  19. Study design #1: Differences in the detection rates between batches of cells
  20. Study design #2: Differences in the detection rates between batches of cells
  21. Two studies had completely confounded study designs

  22. Correlation between PC1 and proportion of detected genes. [Figure axis: proportion of detected genes.] Shalek et al. (2014) Nature; Finak et al. (2015) Genome Biology
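
Slide 11 suggests using the FASTQ header as a surrogate for batch. The following is a minimal sketch of that idea in Python, assuming the headers follow the Illumina CASAVA 1.8 layout (`@instrument:run:flowcell:lane:tile:x:y`, followed by a space and the read fields). The names `FastqBatch`, `parse_batch` and `batch_label`, the choice to combine the first four fields into one label, and the example header are illustrative assumptions, not something taken from the talk.

```python
from collections import namedtuple

# Hypothetical container for the header fields used as a batch surrogate.
FastqBatch = namedtuple("FastqBatch", ["instrument", "run", "flowcell", "lane"])


def parse_batch(header):
    """Extract instrument, run, flowcell and lane from one FASTQ header line.

    Assumes the CASAVA 1.8 layout:
    @instrument:run:flowcell:lane:tile:x:y read:filtered:control:index
    """
    if not header.startswith("@"):
        raise ValueError("not a FASTQ header line: %r" % header)
    # The read identifier is everything before the first space.
    read_id = header[1:].split(" ", 1)[0]
    fields = read_id.split(":")
    if len(fields) < 7:
        raise ValueError("expected 7 colon-separated fields: %r" % header)
    return FastqBatch(fields[0], fields[1], fields[2], fields[3])


def batch_label(header):
    """Join the four fields into a single surrogate batch label."""
    return "_".join(parse_batch(header))


# Example header in the assumed layout (illustrative values only).
example = "@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG"
print(batch_label(example))
```

In practice one would read only the first header of each cell's FASTQ file and record the label alongside the cell's biological group. Older header layouts would need a different parser.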
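
Slides 7 to 10 and 21 contrast completely confounded and balanced study designs. One rough way to check a design, given a batch label and a biological group for each cell, is to look at which groups occur in each batch. The sketch below is written under simplifying assumptions that are not from the talk: a design is called "completely confounded" when every batch contains cells from a single group, "balanced" when every batch contains every group, and "partially confounded" otherwise. The function name `classify_design` and the toy labels are hypothetical.

```python
from collections import defaultdict


def classify_design(batches, groups):
    """Classify a study design from per-cell batch and group labels.

    Uses simplified, assumed definitions:
    - "completely confounded": each batch holds exactly one group
    - "balanced": each batch holds every group
    - "partially confounded": anything in between
    """
    groups_in_batch = defaultdict(set)
    for batch, group in zip(batches, groups):
        groups_in_batch[batch].add(group)
    all_groups = set(groups)
    per_batch = list(groups_in_batch.values())
    if len(all_groups) > 1 and all(len(g) == 1 for g in per_batch):
        return "completely confounded"
    if all(g == all_groups for g in per_batch):
        return "balanced"
    return "partially confounded"


# Toy labels (hypothetical): one group per batch.
confounded = classify_design(
    ["b1", "b1", "b2", "b2", "b3", "b3"],
    ["g1", "g1", "g2", "g2", "g3", "g3"],
)
# Toy labels (hypothetical): every group present in every batch.
balanced = classify_design(
    ["b1", "b1", "b1", "b2", "b2", "b2"],
    ["g1", "g2", "g3", "g1", "g2", "g3"],
)
print(confounded, "|", balanced)
```

The batch labels could come from any surrogate, such as fields of the FASTQ header. A result of "completely confounded" means that differences between groups cannot be separated from differences between batches with these data alone.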