Slide 1

Slide 1 text

Towards progress in batch effects and biases in single-cell RNA-Seq data Stephanie Hicks Dana-Farber Cancer Institute / Harvard SPH @stephaniehicks Single Cell Genomics 2016 Wellcome Genome Campus

Slide 2

Slide 2 text

Bimodal distribution of expression
[Figure: number of cells vs. expression level of Gene X; true expression with modes μ1 and μ2]

Slide 3

Slide 3 text

Bimodal distribution of expression
[Figure: number of cells vs. expression level of Gene X; true expression and observed expression, with modes μ1 and μ2]

Slide 4

Slide 4 text

Patel et al. (2014) Science Cells cluster by tumor

Slide 5

Slide 5 text

Verhaak et al. (2010). Cancer Cell

Slide 6

Slide 6 text

Leek et al. (2010) Nat Reviews Genetics Batch effects in genomics data

Slide 7

Slide 7 text

Batch effects in single-cell RNA-Seq data
Confounded study design: When cells from one biological group or condition are cultured, captured and sequenced separate from cells in a second condition
[Figure: The Problem of Confounding Biological Variation and Batch Effects. Panels: completely confounded study design (bad) vs. balanced study design (good), each with biological groups 1-3, processing batches 1-3 and replicates 1-2. Plots: Principal Component 1 vs. Principal Component 2, and proportion of detected genes per group, colored by batch. Observed differences: we cannot determine if variation is driven by biology or batch effects.]
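A minimal sketch of how one might check a sample sheet for the confounding described on this slide. The function name and the example labels are hypothetical and not from the talk; the check only asks whether every processing batch contains cells from a single biological group.

```python
from collections import defaultdict


def is_completely_confounded(groups, batches):
    """Return True if each batch holds cells from exactly one biological group.

    `groups` and `batches` are parallel sequences with one label per cell.
    With a single group overall there is nothing to confound, so return False.
    """
    groups_per_batch = defaultdict(set)
    for group, batch in zip(groups, batches):
        groups_per_batch[batch].add(group)
    if len(set(groups)) < 2:
        return False
    return all(len(found) == 1 for found in groups_per_batch.values())


# Hypothetical labels mirroring the two designs in the figure.
confounded_groups = ["g1", "g1", "g2", "g2", "g3", "g3"]
confounded_batches = ["b1", "b1", "b2", "b2", "b3", "b3"]

balanced_groups = ["g1", "g2", "g3", "g1", "g2", "g3"]
balanced_batches = ["b1", "b1", "b1", "b2", "b2", "b2"]

print(is_completely_confounded(confounded_groups, confounded_batches))
print(is_completely_confounded(balanced_groups, balanced_batches))
```

A partially confounded design (some batches mixed, others not) returns False here; a cross-tabulation of group by batch would be needed to see that case.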

Slide 8

Slide 8 text

Batch effects in single-cell RNA-Seq data
When cells from one biological group or condition are cultured, captured and sequenced separate from cells in a second condition
More balanced study design: Cells from different biological groups processed in the same batch
[Figure: The Problem of Confounding Biological Variation and Batch Effects (completely confounded vs. balanced study design)]

Slide 9

Slide 9 text

Batch effects in single-cell RNA-Seq data
When cells from one biological group or condition are cultured, captured and sequenced separate from cells in a second condition
More balanced study design: Cells from different biological groups processed in the same batch
Proposed by Tung et al. (2016) bioRxiv
[Figure: The Problem of Confounding Biological Variation and Batch Effects (completely confounded vs. balanced study design)]

Slide 10

Slide 10 text

Batch effects in single-cell RNA-Seq data
When cells from one biological group or condition are cultured, captured and sequenced separate from cells in a second condition
More balanced study design: Cells from different biological groups processed in the same batch
Implemented by Tung et al. (2016) bioRxiv
[Figure: The Problem of Confounding Biological Variation and Batch Effects (completely confounded vs. balanced study design)]

Slide 11

Slide 11 text

Use FASTQ header as a surrogate for batch
http://support.illumina.com/help/SequencingAnalysisWorkflow/Content/Vault/Informatics/Sequencing_Analysis/CASAVA/swSEQ_mCA_FASTQFiles.htm
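As an illustration of the idea, the sketch below pulls the instrument, run number, flowcell ID and lane out of a read header, assuming the CASAVA 1.8-style layout `@<instrument>:<run number>:<flowcell ID>:<lane>:<tile>:<x-pos>:<y-pos> <read>:<is filtered>:<control number>:<index sequence>`. Treating those four fields as the batch label is an assumption made here for illustration, and the function name is hypothetical. Headers rewritten by a public archive or produced by older pipelines may not follow this layout.

```python
def batch_from_fastq_header(header):
    """Return (instrument, run number, flowcell ID, lane) from a read header.

    Assumes a CASAVA 1.8-style header; raises ValueError otherwise.
    """
    if not header.startswith("@"):
        raise ValueError("FASTQ header lines start with '@'")
    # Drop the '@', then keep only the part before the first space.
    read_id = header[1:].split(" ", 1)[0]
    fields = read_id.split(":")
    if len(fields) < 4:
        raise ValueError("header does not look like a CASAVA 1.8 header")
    instrument, run_number, flowcell_id, lane = fields[:4]
    return (instrument, run_number, flowcell_id, lane)


# Example header in the CASAVA 1.8 layout.
example = "@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG"
print(batch_from_fastq_header(example))
```

Reading only the first header of each cell's FASTQ file is usually enough for this purpose, since all reads from one lane of one run share these fields.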

Slide 12

Slide 12 text

[Figure: The Problem of Confounding Biological Variation and Batch Effects (completely confounded vs. balanced study design)]
Hicks et al. (2015) bioRxiv

Slide 13

Slide 13 text

Cells cluster by tumor Hicks et al. (2015) bioRxiv

Slide 14

Slide 14 text

Cells cluster by batch (tumors are confounded with batch) Hicks et al. (2015) bioRxiv

Slide 15

Slide 15 text

Hicks et al. (2015) bioRxiv

Slide 16

Slide 16 text

Different batches have different detection rates
Hicks et al. (2015) bioRxiv
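A small sketch of how the detection rate could be compared across batches, taking "detection rate" to mean the proportion of genes with a non-zero count in a cell. That definition, the function names and the toy counts are assumptions for illustration, not taken from the cited preprint.

```python
from collections import defaultdict
from statistics import mean


def detection_rate(cell_counts):
    """Proportion of genes with a non-zero count in one cell."""
    return sum(1 for count in cell_counts if count > 0) / len(cell_counts)


def mean_detection_rate_by_batch(counts, batches):
    """Average the per-cell detection rate within each batch.

    `counts` holds one list of gene counts per cell; `batches` holds the
    matching batch label for each cell.
    """
    rates = defaultdict(list)
    for cell_counts, batch in zip(counts, batches):
        rates[batch].append(detection_rate(cell_counts))
    return {batch: mean(values) for batch, values in rates.items()}


# Toy data: four cells by four genes, two batches (hypothetical values).
toy_counts = [
    [5, 0, 3, 1],
    [2, 0, 0, 1],
    [0, 0, 4, 0],
    [0, 1, 0, 0],
]
toy_batches = ["b1", "b1", "b2", "b2"]
by_batch = mean_detection_rate_by_batch(toy_counts, toy_batches)
print(by_batch)
```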

Slide 17

Slide 17 text

Deng et al. (2014) Science Treutlein et al. (2014) Nature Trapnell et al. (2014) Nat Biotech Patel et al. (2014) Science

Slide 18

Slide 18 text

Batch and outcomes of interest are confounded in published scRNA-Seq experiments

Slide 19

Slide 19 text

Study design #1: Differences in the detection rates between batches of cells

Slide 20

Slide 20 text

Study design #2: Differences in the detection rates between batches of cells

Slide 21

Slide 21 text

Two studies had completely confounded study designs

Slide 22

Slide 22 text

Correlation between PC1 and proportion of detected genes
[Figure axis: proportion of detected genes]
Shalek et al. (2014). Nature
Finak et al. (2015). Genome Biology
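One way to compute the quantity named on this slide is sketched below with NumPy. The preprocessing (log1p transform, per-gene centering, PCA by SVD) and the toy matrix are assumptions chosen for illustration; the cited studies may have used different normalization. The sign of a principal component is arbitrary, so the sketch reports the absolute correlation.

```python
import numpy as np


def pc1_detection_correlation(counts):
    """Absolute Pearson correlation between PC1 scores and detection rate.

    `counts` is a cells-by-genes array of non-negative counts.
    """
    counts = np.asarray(counts, dtype=float)
    # Proportion of genes with a non-zero count in each cell.
    detected = (counts > 0).mean(axis=1)
    # Log-transform, then center each gene across cells before PCA.
    logged = np.log1p(counts)
    centered = logged - logged.mean(axis=0)
    # PCA via SVD: PC1 scores are the first left singular vector
    # scaled by the first singular value.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    pc1_scores = u[:, 0] * s[0]
    return abs(np.corrcoef(pc1_scores, detected)[0, 1])


# Toy matrix: six cells, eight genes; cell i expresses only its first k genes.
genes_detected = [2, 3, 4, 6, 7, 8]
toy = np.array([[10 if j < k else 0 for j in range(8)] for k in genes_detected])
r = pc1_detection_correlation(toy)
print(round(float(r), 3))
```

In this toy matrix the only source of variation between cells is how many genes are detected, so PC1 tracks the detection rate closely. In real data a high value would suggest that the leading axis of variation reflects detection rate rather than the biology of interest.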
