late 00’s
• Measures the average expression level in a population of cells
• Useful for comparative transcriptomics (the same tissue from different species)
• Useful for quantifying expression signatures from ensembles
lower sequencing costs
• Measures the distribution of expression levels in cells
• Allows the study of cell-specific changes in the transcriptome
Single-cell RNA sequencing
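The difference between an average over a population and a per-cell distribution can be illustrated with a small sketch. The expression values below are made up for illustration, and the function names are hypothetical; the point is only that two populations with the same average can have very different distributions.

```python
from statistics import mean

# Hypothetical expression values for one gene in two populations of four cells.
uniform_population = [5, 5, 5, 5]   # every cell expresses the gene moderately
mixed_population = [0, 0, 10, 10]   # half the cells are off, half are high


def bulk_measurement(cells):
    """Bulk-style summary: a single average value for the whole population."""
    return mean(cells)


def single_cell_measurement(cells):
    """Single-cell-style summary: how many cells show each expression level."""
    counts = {}
    for value in cells:
        counts[value] = counts.get(value, 0) + 1
    return counts


# Both populations give the same average, but different distributions.
print(bulk_measurement(uniform_population), bulk_measurement(mixed_population))
print(single_cell_measurement(uniform_population))
print(single_cell_measurement(mixed_population))
```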
or 3’-end
• Can be combined with unique molecular identifiers (UMIs), which help improve the quantification
• Important implications for downstream analysis
Quantification scRNA-seq protocols
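One way UMIs can improve quantification is by letting amplification duplicates be collapsed: reads that share the same cell, gene and UMI are counted once. The sketch below assumes reads have already been aligned and annotated as (cell, gene, UMI) tuples; the read list, identifiers and function names are hypothetical, and real pipelines also handle sequencing errors in the UMI, which is ignored here.

```python
from collections import defaultdict

# Hypothetical annotated reads: (cell identifier, gene, UMI sequence).
reads = [
    ("CELL1", "GeneA", "AACG"),
    ("CELL1", "GeneA", "AACG"),  # same cell, gene and UMI: treated as a duplicate
    ("CELL1", "GeneA", "TTGC"),
    ("CELL1", "GeneB", "AACG"),
    ("CELL2", "GeneA", "GGTA"),
]


def count_reads(reads):
    """Count every read per (cell, gene), duplicates included."""
    counts = defaultdict(int)
    for cell, gene, _umi in reads:
        counts[(cell, gene)] += 1
    return dict(counts)


def count_umis(reads):
    """Count distinct UMIs per (cell, gene), so duplicates collapse to one."""
    seen = defaultdict(set)
    for cell, gene, umi in reads:
        seen[(cell, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}


print(count_reads(reads))
print(count_umis(reads))
```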
combined with fluorescence-activated cell sorting (FACS), making it possible to select cells based on surface markers
• One can take pictures of the cells, which facilitates QC (damaged cells or doublets)
• Low-throughput
Capture scRNA-seq protocols
of cells are captured, therefore not appropriate for rare cell types or very small amounts of input
• The chip is relatively expensive
Capture scRNA-seq protocols
bead
• Each bead contains enzymes and a unique barcode, which is attached to all of the reads
• Droplets can be pooled and sequenced together
• Highest throughput
• Costs are ~0.05 USD/cell
Capture scRNA-seq protocols
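Because every read carries the barcode of its bead, reads from pooled droplets can be separated again after sequencing. The sketch below is a simplification under stated assumptions: the barcode is taken to be a fixed-length prefix of each read, and the barcode length, sequences and function name are hypothetical.

```python
from collections import defaultdict

BARCODE_LENGTH = 4  # hypothetical; chosen short to keep the example readable

# Hypothetical pooled reads: barcode prefix followed by the captured sequence.
pooled_reads = [
    "ACGTGGATTACA",
    "TTAGCCGATAGG",
    "ACGTTTGACCAT",
]


def demultiplex(reads, barcode_length=BARCODE_LENGTH):
    """Group pooled reads by their barcode prefix, one group per bead."""
    by_barcode = defaultdict(list)
    for read in reads:
        barcode = read[:barcode_length]
        captured = read[barcode_length:]
        by_barcode[barcode].append(captured)
    return dict(by_barcode)


for barcode, sequences in demultiplex(pooled_reads).items():
    print(barcode, sequences)
```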
[Figure: panels b and c; methods: scmap-cluster, scmap-cell, SVM, RF; legend: Cell type A, Cell type B, Cell type C, Unknown cell type]
• This cell will be assigned to cell type A
• This cell will be assigned to cell type B
• This cell will be assigned to cell type C
• This cell will be assigned to the cell from cell type A
• This cell will be assigned to the cell from cell type C
• This cell will be unassigned
Dataset         Organism  Tissue              Cells  Protocol
…               human     Embryo development     90  Tang et al
Goolam          mouse     Embryo development    124  Smart-Seq2
Deng            mouse     Embryo development    268  Smart-Seq, Smart-Seq2
Pollen          human     Cerebral cortex       301  SMARTer
Li              human     Colorectal tumors     561  SMARTer
Usoskin         mouse     Brain                 622  STRT-Seq
Kolodziejczyk   mouse     Embryo stem cells     704  SMARTer
Xin             human     Pancreas             1492  SMARTer
Tasic           mouse     Cortex               1679  SMARTer
Baron           mouse     Pancreas             1886  inDrop
Muraro          human     Pancreas             2126  CEL-Seq2
Segerstolpe     human     Pancreas             2209  Smart-Seq2
Klein           mouse     Embryo stem cells    2717  inDrop
Zeisel          mouse     Brain                3005  STRT-Seq UMI
Baron           human     Pancreas             8569  inDrop
Shekhar         mouse     Retina              27499  Drop-Seq
Macosko         mouse     Retina              44808  Drop-Seq

We used publicly available datasets to validate and benchmark scmap. In all datasets, the cell types were identified by the authors.
data becomes sparse
• Definitions of density and distance between points become less meaningful
• Classification algorithms do not work well
https://shapeofdata.wordpress.com/2013/04/02/the-curse-of-dimensionality/
[Figure: panels N = 2, N = 3, …, N = 16, N = 17]
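The loss of meaning of distances in high dimensions can be seen in a small simulation. The sketch below draws uniformly random points in a unit hypercube and compares the nearest and farthest distance from one query point; the function name, the number of points and the dimensions tried are arbitrary choices for illustration, not taken from the slides.

```python
import math
import random


def relative_contrast(n_dims, n_points=200, seed=0):
    """Return (farthest - nearest) / nearest distance from a random query
    point to n_points uniformly random points in the unit hypercube."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(n_dims)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(n_dims)]
        distances.append(math.dist(query, point))
    nearest, farthest = min(distances), max(distances)
    return (farthest - nearest) / nearest


# A small value means the nearest and farthest points are almost equally far
# away, so "nearest neighbour" carries little information.
for n_dims in (2, 10, 100, 1000):
    print(n_dims, round(relative_contrast(n_dims), 2))
```

In this setup the contrast is expected to shrink as the number of dimensions grows, which is one reason distance-based classifiers struggle on raw high-dimensional data.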
data to facilitate fast and accurate information retrieval
• Search engines, like Google, use it
• Represents data in a compressed format
• Saves space
into subsets
• Every cell in the reference is identified with a set of sub-centroids
• Sub-centroids are defined via k-means clustering
[Figure: Product quantizer; labels: Sub-centroids, Genes, …, Cells]
Andrew Yiu
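The idea of a product quantizer can be sketched in a few lines: the genes are split into equally sized subsets, k-means is run separately on each subset, and each reference cell is then stored as one sub-centroid index per subset. This is a minimal illustration and not the scmap implementation; the function names, the toy expression matrix and the parameters are hypothetical, and the number of genes is assumed to be divisible by the number of subsets.

```python
import math
import random


def nearest(vector, centroids):
    """Index of the centroid closest to vector (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(vector, centroids[i]))


def kmeans(vectors, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns a list of k centroids."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest(v, centroids)].append(v)
        for i, group in enumerate(groups):
            if group:  # keep the previous centroid if a cluster is empty
                centroids[i] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


def split(vector, n_subsets):
    """Cut a gene vector into n_subsets consecutive chunks of equal size."""
    size = len(vector) // n_subsets
    return [vector[i * size:(i + 1) * size] for i in range(n_subsets)]


def build_index(cells, n_subsets, k):
    """Return (codebooks, codes): k sub-centroids per gene subset, and for
    every cell the tuple of sub-centroid indices that represents it."""
    chunks = [split(cell, n_subsets) for cell in cells]
    codebooks = [kmeans([c[m] for c in chunks], k) for m in range(n_subsets)]
    codes = [
        tuple(nearest(c[m], codebooks[m]) for m in range(n_subsets))
        for c in chunks
    ]
    return codebooks, codes


# Hypothetical reference: 4 cells x 4 genes, split into 2 subsets of 2 genes.
cells = [
    [0.0, 0.1, 5.0, 5.1],
    [0.1, 0.0, 0.0, 0.2],
    [4.9, 5.0, 5.1, 5.0],
    [5.0, 5.1, 0.1, 0.0],
]
codebooks, codes = build_index(cells, n_subsets=2, k=2)
print(codes)
```

Each cell is now described by two small integers instead of four expression values, which is where the compression comes from; the saving grows with the number of genes per subset.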
Pearson Spearman
• For scmap-cluster, at least two similarities have to agree and the similarity threshold is 0.7
• For scmap-cell, the similarity threshold is 0.5
Andrew Yiu
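The agreement rule for scmap-cluster can be sketched as follows. Several points are assumptions rather than facts from the slide: the third similarity is taken to be cosine similarity (the slide text is cut off before Pearson), the threshold is applied to the highest of the agreeing similarities, and the function names and toy centroids are hypothetical. The sketch also assumes no vector is constant or all zero, since the correlations would then be undefined.

```python
import math


def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.hypot(*x) * math.hypot(*y))


def pearson(x, y):
    """Pearson correlation, computed as the cosine of the centred vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return cosine([a - mx for a in x], [b - my for b in y])


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for idx in order[i:j + 1]:
            result[idx] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def assign(cell, centroids, threshold=0.7):
    """Return the cell type on which at least two similarities agree,
    provided the best agreeing similarity reaches the threshold
    (assumed rule), otherwise "unassigned"."""
    best_labels, best_scores = [], []
    for similarity in (cosine, pearson, spearman):
        scores = {name: similarity(cell, c) for name, c in centroids.items()}
        label = max(scores, key=scores.get)
        best_labels.append(label)
        best_scores.append(scores[label])
    for label in set(best_labels):
        agreeing = [s for l, s in zip(best_labels, best_scores) if l == label]
        if len(agreeing) >= 2 and max(agreeing) >= threshold:
            return label
    return "unassigned"


# Hypothetical cluster centroids over four genes.
centroids = {"A": [10, 0, 1, 5], "B": [0, 8, 6, 1]}
print(assign([9, 1, 0, 4], centroids))
print(assign([1, 0, 9, 0], centroids))
```

For scmap-cell the same kind of threshold would be applied with 0.5 instead of 0.7, which here amounts to passing `threshold=0.5`.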