Visualising single-cell transcriptomes

Davis McCarthy
NHMRC Early Career Fellow
Stegle Group, EMBL-EBI
www.ebi.ac.uk
@davisjmcc

1. Overview of single-cell transcriptomics
2. Visualisation for dimensionality reduction
3. Visualisation for exploratory data analysis and quality control
4. Visualisation of results

Single-cell transcriptomics: a brave new world?

van Leeuwenhoek’s microscope (http://www.history-of-the-microscope.org/)
Nakamura et al, Nature, 2016
James Webb Space Telescope (https://en.wikipedia.org/wiki/James_Webb_Space_Telescope)

         Cell_01
Tx_01          0
Tx_02         10
Tx_03          0
Tx_04         25
Tx_05         13
Tx_06          0
Tx_07        100
…              …

Assay expression for 100s - 1000s of transcripts/genes from a cell

• Spatial transcriptomics
• Single-cell RNA sequencing

www.spatialtranscriptomics.com
A MERFISH measurement of an ~20 mm² sample area (~15,000 cells). Jeffrey R. Moffitt et al. PNAS 2016;113:11046-11051
Achim K et al. Nat Biotechnol. 2015;33:503–509.
Satija R et al. Nat Biotechnol. 2015;33:495–502.

Technological developments drive Moore’s Law in single-cell transcriptomics

Svensson V, Vento-Tormo R, Teichmann SA. Moore’s Law in Single Cell Transcriptomics, arXiv, 2017. Available: http://arxiv.org/abs/1704.01379

Assay expression for 1000s of transcripts/genes from a cell, for 1000s of cells

         Cell_01  Cell_02  Cell_03  Cell_04  Cell_05  Cell_06  …
Tx_01          0        0        5       19     8013     3012  …
Tx_02         10        0        7        0      134      299  …
Tx_03          0        0       19        0        0        0  …
Tx_04         25        0        0      179        0        0  …
Tx_05         13        3        0      136       15       27  …
Tx_06          0        0      150      987        0        0  …
Tx_07        100      795      248      196      139        0  …
…              …        …        …        …        …        …  …

e.g. Expression for 10,000 genes x 15,000 cells

Single-cell vs bulk transcriptomics: challenges for visualisation

• Scale: tens to hundreds of samples for bulk vs thousands to millions of single cells
• Data characteristics: limitations of the chemistries (RNA capture and conversion) in single-cell protocols mean that fewer expressed transcripts are detected and many observations are zero
• New questions: single-cell transcriptomics can address previously unanswerable questions, and we need visualisations to help answer them
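To make the "many zero observations" point concrete, here is a minimal sketch that takes the six example cells and seven transcripts shown in the expression table of this talk and counts the zeros. The use of NumPy and the variable names are assumptions for illustration; a real dataset would have thousands of rows and columns.

```python
import numpy as np

# Example counts from the slide: rows are transcripts, columns are cells.
transcripts = ["Tx_01", "Tx_02", "Tx_03", "Tx_04", "Tx_05", "Tx_06", "Tx_07"]
cells = ["Cell_01", "Cell_02", "Cell_03", "Cell_04", "Cell_05", "Cell_06"]
counts = np.array([
    [0,   0,   5,   19,  8013, 3012],
    [10,  0,   7,   0,   134,  299],
    [0,   0,   19,  0,   0,    0],
    [25,  0,   0,   179, 0,    0],
    [13,  3,   0,   136, 15,   27],
    [0,   0,   150, 987, 0,    0],
    [100, 795, 248, 196, 139,  0],
])

# Fraction of all entries that are exactly zero.
zero_fraction = np.mean(counts == 0)

# Per-cell summaries commonly used for quality control.
genes_detected = (counts > 0).sum(axis=0)  # transcripts with a non-zero count
total_counts = counts.sum(axis=0)          # library size per cell

print(f"fraction of zeros: {zero_fraction:.2f}")
for cell, n_detected, total in zip(cells, genes_detected, total_counts):
    print(f"{cell}: {n_detected} transcripts detected, {total} total counts")
```

Even in this tiny table a large share of the entries are zero, and the number of transcripts detected differs from cell to cell.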

1. Visualisation for dimensionality reduction
2. Visualisation for exploratory data analysis and quality control
3. Visualisation of results

Dimensionality reduction

How do we represent extremely high-dimensional data in a way that we can interpret?

• What do we lose?
• What can go wrong?
• The “crowding problem”

PCA: linear combinations of gene expression values to maximise variance between cells

Novembre et al, Nature, 2008: “Genes mirror geography within Europe”
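The idea of "linear combinations that maximise variance" can be sketched with a singular value decomposition of the centred expression matrix. This is a minimal illustration using NumPy on made-up data, with cells as rows and genes as columns. The function name `pca` and the toy matrix are assumptions, not taken from the talk.

```python
import numpy as np

def pca(X, n_components=2):
    """PCA via SVD. X has one row per cell and one column per gene."""
    X = np.asarray(X, dtype=float)
    centred = X - X.mean(axis=0)               # centre each gene
    U, s, Vt = np.linalg.svd(centred, full_matrices=False)
    loadings = Vt[:n_components]               # gene weights of each component
    scores = U[:, :n_components] * s[:n_components]  # cell coordinates
    explained = (s ** 2) / np.sum(s ** 2)      # fraction of variance per component
    return scores, loadings, explained[:n_components]

# Toy data (hypothetical): 20 cells x 5 genes, two groups differing in genes 0 and 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
X[:10, :2] += 6.0

scores, loadings, explained = pca(X, n_components=2)
print("variance explained by PC1, PC2:", np.round(explained, 3))
```

Each row of `loadings` holds the weights of one linear combination of genes, and each row of `scores` gives the position of one cell in the resulting plot.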

PCA on single-cell data typically captures technical effects, most often number of genes detected

• First principal component explains 55% of variance, very strongly correlated with number of genes detected per cell.
• PCA extremely useful for QC.

Produced with the R/Bioconductor package scater (McCarthy et al, Bioinformatics, 2017)
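The link between the first principal component and the number of genes detected can be reproduced on simulated data. The sketch below is not the scater analysis behind the figure. It assumes a simple hypothetical model in which every cell has the same underlying expression but a different per-cell detection rate, and then checks how strongly PC1 tracks the number of genes detected.

```python
import numpy as np

rng = np.random.default_rng(1)
n_cells, n_genes = 200, 500

# Hypothetical technical effect: each cell captures a different fraction of transcripts.
detection_rate = rng.uniform(0.2, 0.9, size=n_cells)
true_counts = rng.poisson(5.0, size=(n_cells, n_genes))
detected = rng.random((n_cells, n_genes)) < detection_rate[:, None]
counts = true_counts * detected

log_expr = np.log1p(counts)                # log-transform, as is common before PCA
genes_detected = (counts > 0).sum(axis=1)  # QC metric: genes detected per cell

# PCA via SVD of the gene-centred matrix.
centred = log_expr - log_expr.mean(axis=0)
U, s, _ = np.linalg.svd(centred, full_matrices=False)
pc1 = U[:, 0] * s[0]

corr = np.corrcoef(pc1, genes_detected)[0, 1]
print(f"|correlation| between PC1 and genes detected: {abs(corr):.2f}")
```

The sign of a principal component is arbitrary, so only the absolute value of the correlation is meaningful here.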

t-SNE: “t-distributed stochastic neighbour embedding”

• Stochastic Neighbour Embedding (SNE) starts by converting the high-dimensional Euclidean distances between datapoints into conditional probabilities that represent similarities*.
• t-SNE is often better than earlier techniques at creating a single map that reveals structure at many different scales.
• Non-linear method that often produces beautiful plots capturing real structure in single-cell datasets

* van der Maaten and Hinton, Journal of Machine Learning Research, 2008.
Resources from the developer: https://lvdmaaten.github.io/tsne/
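The first step described above, turning Euclidean distances into conditional probabilities, can be sketched as follows. The similarity of point j to point i is a Gaussian in their distance, normalised over all other points. For simplicity this sketch uses one fixed bandwidth `sigma` for every point, which is an assumption. The published method tunes a separate bandwidth per point to match a chosen perplexity, and the later steps of t-SNE (symmetrising, the Student-t low-dimensional similarities, and the optimisation) are not shown.

```python
import numpy as np

def sne_conditional_probabilities(X, sigma=1.0):
    """Return P with P[i, j] = p(j | i), using a single fixed sigma (a simplification)."""
    X = np.asarray(X, dtype=float)
    sq_norms = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances between all pairs of points.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    sq_dists = np.maximum(sq_dists, 0.0)       # guard against tiny negative values
    logits = -sq_dists / (2.0 * sigma ** 2)
    np.fill_diagonal(logits, -np.inf)          # a point is not its own neighbour
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)    # each row sums to 1

# Three toy points: the first two are close together, the third is far away.
points = np.array([[0.0, 0.0], [0.5, 0.0], [3.0, 0.0]])
P = sne_conditional_probabilities(points, sigma=1.0)
print(np.round(P, 3))
```

Each row of `P` is a probability distribution over the other points, with nearby points receiving most of the mass.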

1.3 million cells from embryonic mouse brain (10x Genomics)
Figure: Vlad Kiselev, Sanger Institute

t-SNE can be challenging to use effectively

Cons:
• cluster sizes in a t-SNE plot don’t mean anything
• distances between clusters may not mean anything
• the algorithm is not deterministic (different runs will yield different results)
• lots of hyperparameters that affect the visualisation
• prone to overinterpretation!

How to use t-SNE effectively: http://distill.pub/2016/misread-tsne/

Diffusion maps: show continuities between cell states

Haghverdi et al, Bioinformatics, 2015
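As a rough illustration of how a diffusion map recovers a continuous progression, the sketch below builds a generic diffusion map with NumPy: a Gaussian kernel between cells, normalisation into a random-walk transition matrix, and an eigendecomposition. This is the textbook construction under assumed settings (fixed kernel width `sigma`, simulated cells along one hypothetical trajectory), not necessarily the exact normalisation used by Haghverdi et al.

```python
import numpy as np

def diffusion_map(X, sigma=1.0, n_components=2):
    """Generic diffusion map; rows of X are cells. Returns (eigenvalues, coordinates)."""
    X = np.asarray(X, dtype=float)
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T), 0.0)
    K = np.exp(-sq_dists / (2.0 * sigma ** 2))  # Gaussian kernel between cells
    d = K.sum(axis=1)
    # Symmetric matrix with the same eigenvalues as the transition matrix D^-1 K.
    S = K / np.sqrt(np.outer(d, d))
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]           # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    psi = eigvecs / np.sqrt(d)[:, None]         # right eigenvectors of D^-1 K
    keep = slice(1, n_components + 1)           # drop the trivial constant component
    return eigvals[keep], psi[:, keep] * eigvals[keep]

def ranks(v):
    """Rank of each entry of v (0 = smallest)."""
    return np.argsort(np.argsort(v))

# Simulated cells along one continuous trajectory (hypothetical "pseudotime").
rng = np.random.default_rng(2)
pseudotime = np.linspace(0.0, 1.0, 60)
X = np.column_stack([pseudotime, pseudotime ** 2, 1.0 - pseudotime])
X += rng.normal(scale=0.01, size=X.shape)

eigenvalues, coords = diffusion_map(X, sigma=0.2, n_components=2)
dc1 = coords[:, 0]
rank_corr = np.corrcoef(ranks(dc1), ranks(pseudotime))[0, 1]
print(f"|rank correlation| between DC1 and pseudotime: {abs(rank_corr):.2f}")
```

In this simulation the first non-trivial diffusion component orders the cells along the simulated progression, up to an arbitrary sign.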

EDA and quality control

“If this is paradise, I wish I had a lawn mower.”
- Talking Heads, Nothing but flowers
