scmap – projection of single-cell RNA-seq data across datasets
Vlad(imir) Kiselev
(postdoc @ Martin Hemberg team)
Head of Cellular Genetics Informatics team
Slide 2
Slide 2 text
Single-cell
RNA-seq
The introductory slides were kindly provided by Mike Stubbington
(from his Human Cell Atlas presentation)
Slide 3
Slide 3 text
The Art of Clean Up, Ursus Wehrli
Slide 4
Slide 4 text
The Art of Clean Up, Ursus Wehrli
Slide 6
Slide 6 text
Moore’s law in single-cell
RNA-seq experiments
Svensson et al., Nature Protocols, April 2018
Slide 7
Slide 7 text
Single-cell
RNA-seq atlases
Human Cell Atlas
October 2016
Mouse Cell Atlas
400,000 single cells
All major mouse organs
Han et al., Cell, February 2018
Fly Cell Atlas
All cells in a fly (~25 million)
December 2017
Slide 8
Slide 8 text
Typical analysis
Macosko et al., Nature Biotechnology, 2016
Slide 9
Slide 9 text
Can we make use of all these data in an integrative manner?
Slide 10
Slide 10 text
Yes!
A method for projecting cells from a single-cell RNA-seq dataset onto
cell types or individual cells from other experiments.
www.bioconductor.org
scmap
Slide 11
Slide 11 text
The Power of bioRxiv
Slide 12
Slide 12 text
The Power of bioRxiv
Slide 13
Slide 13 text
How does it work?
Query Reference
scmap-cluster
scmap-cell
Figure legend (methods compared): scmap-cluster, scmap-cell, SVM, RF
Cell type A
Cell type B
Cell type C
Unknown cell type
This cell will be assigned to cell type A
Slide 14
Slide 14 text
How does it work?
Query Reference
scmap-cluster
scmap-cell
Cell type A
Cell type B
Cell type C
Unknown cell type
This cell will be assigned to cell type B
Slide 15
Slide 15 text
How does it work?
Query Reference
scmap-cluster
scmap-cell
Cell type A
Cell type B
Cell type C
Unknown cell type
This cell will be assigned to cell type C
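The cluster-level projection shown on these slides can be sketched as a nearest-centroid lookup. This is only an illustration under stated assumptions: summarising each reference cell type by its per-gene median, comparing by cosine similarity, and the names `cluster_centroids` and `assign_to_cluster` are all choices made for this sketch, not the package's API.

```python
import numpy as np

def cluster_centroids(expr, labels):
    """Summarise each cell type by its per-gene median (assumed centroid).

    expr: cells x genes array, labels: one cell-type label per row.
    """
    names = sorted(set(labels))
    labels = np.asarray(labels)
    cents = np.vstack([np.median(expr[labels == n], axis=0) for n in names])
    return names, cents

def cosine_sim(cell, centroids):
    """Cosine similarity of one cell to each centroid (non-zero vectors assumed)."""
    return (centroids @ cell) / (np.linalg.norm(centroids, axis=1) * np.linalg.norm(cell))

def assign_to_cluster(query_cell, names, cents):
    """Return the most similar reference cell type and its similarity."""
    sims = cosine_sim(query_cell, cents)
    best = int(np.argmax(sims))
    return names[best], float(sims[best])

# Toy reference: 6 cells x 3 genes, two cells per cell type (made-up values).
ref = np.array([[5, 0, 0], [4, 1, 0],
                [0, 5, 0], [0, 4, 1],
                [0, 0, 5], [1, 0, 4]], dtype=float)
ref_labels = ["A", "A", "B", "B", "C", "C"]

names, cents = cluster_centroids(ref, ref_labels)
label, sim = assign_to_cluster(np.array([0.0, 3.0, 1.0]), names, cents)
print(label, round(sim, 3))
```

The query is compared against one summary vector per cell type, so the cost grows with the number of cell types rather than the number of reference cells.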
Slide 16
Slide 16 text
How does it work?
Query Reference
scmap-cluster
scmap-cell
Cell type A
Cell type B
Cell type C
Unknown cell type
This cell will be assigned to a cell from cell type A
Slide 17
Slide 17 text
How does it work?
Query Reference
scmap-cluster
scmap-cell
Cell type A
Cell type B
Cell type C
Unknown cell type
This cell will be assigned to a cell from cell type C
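Cell-level projection, where a query cell is matched to an individual reference cell, can be sketched as a nearest-neighbour search. The exhaustive search and cosine similarity below are simplifying assumptions for illustration (a production tool would likely use a faster approximate search), and `nearest_reference_cell` is a hypothetical name.

```python
import numpy as np

def nearest_reference_cell(query_cell, ref_expr, ref_labels):
    """Find the reference cell most similar to the query (exhaustive search).

    Returns the row index of the match, its cell-type label and the
    cosine similarity. Assumes no all-zero expression vectors.
    """
    norms = np.linalg.norm(ref_expr, axis=1) * np.linalg.norm(query_cell)
    sims = (ref_expr @ query_cell) / norms
    idx = int(np.argmax(sims))
    return idx, ref_labels[idx], float(sims[idx])

# Toy reference: 6 cells x 3 genes (made-up values).
ref_expr = np.array([[5, 0, 0], [4, 1, 0],
                     [0, 5, 0], [0, 4, 1],
                     [0, 0, 5], [1, 0, 4]], dtype=float)
ref_labels = ["A", "A", "B", "B", "C", "C"]

idx, label, sim = nearest_reference_cell(np.array([1.0, 0.0, 6.0]), ref_expr, ref_labels)
print(idx, label)
```

Unlike the cluster-level approach, the result identifies a specific reference cell, and the cell type follows from that cell's label.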
Slide 18
Slide 18 text
How does it work?
Query Reference
scmap-cluster
scmap-cell
Cell type A
Cell type B
Cell type C
Unknown cell type
This cell will be unassigned
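One way to read the "unassigned" outcome is as a similarity cut-off: if even the best match is too weak, no label is given. The sketch below assumes this interpretation; the function name `project_with_threshold` and the threshold value of 0.7 are illustrative assumptions, not values taken from these slides.

```python
def project_with_threshold(similarities, threshold=0.7):
    """Pick the best-matching cell type, or 'unassigned' if the match is weak.

    similarities: dict mapping reference cell type -> similarity of the
    query cell to that type. threshold: assumed cut-off for this sketch.
    """
    best_type = max(similarities, key=similarities.get)
    if similarities[best_type] < threshold:
        return "unassigned"
    return best_type

print(project_with_threshold({"A": 0.91, "B": 0.40, "C": 0.22}))
print(project_with_threshold({"A": 0.35, "B": 0.30, "C": 0.28}))
```

Leaving weak matches unlabelled lets a query contain cell types that are absent from the reference without forcing them into the nearest known type.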
Slide 19
Slide 19 text
Discovery vs validation
Query Reference
scmap-cluster
scmap-cell
Validation
Discovery
Slide 20
Slide 20 text
Datasets
Dataset        Organism  Tissue              # of cells  Experimental protocol
Yan            human     Embryo development          90  Tang et al.
Goolam         mouse     Embryo development         124  Smart-Seq2
Deng           mouse     Embryo development         268  Smart-Seq, Smart-Seq2
Pollen         human     Cerebral cortex            301  SMARTer
Li             human     Colorectal tumors          561  SMARTer
Usoskin        mouse     Brain                      622  STRT-Seq
Kolodziejczyk  mouse     Embryo stem cells          704  SMARTer
Xin            human     Pancreas                  1492  SMARTer
Tasic          mouse     Cortex                    1679  SMARTer
Baron          mouse     Pancreas                  1886  inDrop
Muraro         human     Pancreas                  2126  CEL-Seq2
Segerstolpe    human     Pancreas                  2209  Smart-Seq2
Klein          mouse     Embryo stem cells         2717  inDrop
Zeisel         mouse     Brain                     3005  STRT-Seq UMI
Baron          human     Pancreas                  8569  inDrop
Shekhar        mouse     Retina                   27499  Drop-Seq
Macosko        mouse     Retina                   44808  Drop-Seq
We used publicly available datasets to validate and benchmark scmap
In all datasets, the cell types were identified by the original authors
Feature selection
(Reference)
Curse of dimensionality
• As the number of dimensions increases, data become sparse
• Definitions of density and distance between points become less meaningful
• Classification algorithms perform poorly
https://shapeofdata.wordpress.com/2013/04/02/the-curse-of-dimensionality/
Figure panels: N = 2, N = 3, …, N = 16, N = 17
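The claim that distances become less meaningful can be checked numerically. The sketch below is an illustration under assumptions chosen for simplicity (uniform random points, Euclidean distance, a fixed seed); `relative_contrast` is a hypothetical helper that measures how much farther the farthest point is than the nearest one.

```python
import numpy as np

def relative_contrast(n_dims, n_points=500, seed=0):
    """(max distance - min distance) / min distance from a random query
    to n_points uniform random points in the unit cube of n_dims dimensions."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, n_dims))
    query = rng.random(n_dims)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()

# Contrast between nearest and farthest neighbours at increasing dimension.
contrasts = {n: relative_contrast(n) for n in (2, 16, 1000)}
for n, c in contrasts.items():
    print(f"N = {n}: relative contrast = {c:.2f}")
```

As the dimension grows, the nearest and farthest points end up at nearly the same distance from the query, which is why similarity-based methods benefit from reducing the number of features first.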