(mRNA molecule)
RNA-Seq: relative abundance of genes in each sample
Bulk RNA-Seq: tissue sample, many cells averaged together
Single cell RNA-Seq (scRNA-Seq): each sample is a single cell
About 20,000 human genes profiled per sample
unreliable in high dimensions¹
Hard to compute with skewed, discrete counts
Most genes uninformative
Too many zeros? (≥ 90% of data)
¹ Aggarwal et al. 2001
[Figure: two histograms for gene ENSG00000114391 in the Zheng monocytes data. Panel "Zheng monocytes reads": x-axis, ENSG00000114391 read counts; y-axis, number of droplets in bin. Panel "Zheng monocytes UMIs": x-axis, ENSG00000114391 UMI counts; y-axis, number of droplets in bin.]
Cells: $i = 1, \ldots, I$; genes: $j = 1, \ldots, J$.
$$f(y_i) = \binom{n_i}{y_{i1}, \ldots, y_{iJ}} \prod_j \pi_{ij}^{y_{ij}}$$
UMI counts for single cell: $y_i = (y_{i1}, \ldots, y_{iJ})$
Relative abundance of gene in cell: $\pi_{ij}$
Total UMIs: $n_i = \sum_j y_{ij}$
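As a rough illustration of the multinomial density above, the following sketch evaluates log f(y_i) for one cell using only the Python standard library. The function name `multinomial_logpmf` and the toy counts and abundances are made up for this example and do not come from the slides; the sketch also assumes every gene with a nonzero count has a strictly positive relative abundance.

```python
import math

def multinomial_logpmf(y, pi):
    """Log of f(y_i) for one cell.

    y  : UMI counts per gene (y_i1, ..., y_iJ)
    pi : relative abundances per gene (pi_i1, ..., pi_iJ), summing to 1
    """
    n = sum(y)  # total UMIs: n_i = sum_j y_ij
    # Log multinomial coefficient: log n_i! - sum_j log y_ij!
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(v + 1) for v in y)
    # sum_j y_ij * log pi_ij; zero counts are skipped so they contribute nothing
    log_prob = sum(v * math.log(p) for v, p in zip(y, pi) if v > 0)
    return log_coef + log_prob

# Toy cell with J = 3 genes (hypothetical numbers, for illustration only)
y_i = [2, 0, 1]
pi_i = [0.5, 0.3, 0.2]
logp = multinomial_logpmf(y_i, pi_i)
print(math.exp(logp))
```

Working on the log scale avoids overflow in the factorials when n_i is in the thousands, as is typical for UMI totals.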
[Figure: BIC by distribution, two panels. Panel "Zheng monocytes": distributions dmn, mult, poi, nb, zip, ziln; y-axis, BIC. Panel "Tung iPSCs": distributions mult, poi, dmn, nb, zip, ziln, nml; y-axis, BIC.]