Visualising single-cell transcriptomes

Presented at VIZBI 2017 in Sydney, Australia in June 2017 (https://vizbi.org/2017/).

Davis McCarthy

June 14, 2017

Transcript

  1. 1. Overview of single-cell transcriptomics
     2. Visualisation for dimensionality reduction
     3. Visualisation for exploratory data analysis and quality control
     4. Visualisation of results
  2. Assay expression for 100s - 1000s of transcripts/genes from a cell

             Cell_01
    Tx_01          0
    Tx_02         10
    Tx_03          0
    Tx_04         25
    Tx_05         13
    Tx_06          0
    Tx_07        100
    …              …
  3. • Spatial transcriptomics
    • Single-cell RNA sequencing +
    www.spatialtranscriptomics.com
    A MERFISH measurement of an ~20 mm² sample area (~15,000 cells). Jeffrey R. Moffitt et al. PNAS 2016;113:11046-11051.
    Achim K et al. Nat Biotechnol. 2015;33:503–509.
    Satija R et al. Nat Biotechnol. 2015;33:495–502.
  4. Technological developments drive Moore’s Law in single-cell transcriptomics
    Svensson V, Vento-Tormo R, Teichmann SA. Moore’s Law in Single Cell Transcriptomics, arXiv, 2017. Available: http://arxiv.org/abs/1704.01379
  5. Assay expression for 1000s of transcripts/genes from a cell
    1000s of cells

             Cell_01  Cell_02  Cell_03  Cell_04  Cell_05  Cell_06  …
    Tx_01          0        0        5       19     8013     3012  …
    Tx_02         10        0        7        0      134      299  …
    Tx_03          0        0       19        0        0        0  …
    Tx_04         25        0        0      179        0        0  …
    Tx_05         13        3        0      136       15       27  …
    Tx_06          0        0      150      987        0        0  …
    Tx_07        100      795      248      196      139        0  …
    …              …        …        …        …        …        …  …

    e.g. Expression for 10,000 genes × 15,000 cells
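
As a small illustration of the matrix on this slide, the Python sketch below stores the seven-transcript, six-cell excerpt and counts, for each cell, how many transcripts are detected (non-zero), along with the fraction of all entries that are zero. The variable names are illustrative, and an excerpt this small is not representative of a real dataset.

```python
# Excerpt of the expression matrix from the slide: rows are transcripts,
# columns are cells (Cell_01 .. Cell_06).
counts = {
    "Tx_01": [0, 0, 5, 19, 8013, 3012],
    "Tx_02": [10, 0, 7, 0, 134, 299],
    "Tx_03": [0, 0, 19, 0, 0, 0],
    "Tx_04": [25, 0, 0, 179, 0, 0],
    "Tx_05": [13, 3, 0, 136, 15, 27],
    "Tx_06": [0, 0, 150, 987, 0, 0],
    "Tx_07": [100, 795, 248, 196, 139, 0],
}
cells = [f"Cell_{i:02d}" for i in range(1, 7)]

# Number of transcripts with a non-zero count in each cell.
detected = {
    cell: sum(1 for row in counts.values() if row[j] > 0)
    for j, cell in enumerate(cells)
}

# Fraction of all matrix entries that are exactly zero.
n_entries = len(counts) * len(cells)
n_zeros = sum(value == 0 for row in counts.values() for value in row)
zero_fraction = n_zeros / n_entries

print(detected)
print(f"{zero_fraction:.2f}")  # 19 of 42 entries are zero -> 0.45
```

Even in this tiny excerpt almost half of the entries are zero, and the number of detected transcripts varies from cell to cell.
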
  6. Single-cell vs bulk transcriptomics: challenges for visualisation
    • Scale: tens to hundreds of samples for bulk vs thousands to millions of single cells
    • Data characteristics: limitations of chemistries (RNA capture and conversion) in single-cell protocols yield fewer transcripts expressed and many zero observations
    • New questions: single-cell transcriptomics can address previously unanswerable questions, and we need visualisations to help answer these
  7. 1. Visualisation for dimensionality reduction
     2. Visualisation for exploratory data analysis and quality control
     3. Visualisation of results
  8. The “crowding problem”. What do we lose? What can go wrong?
    How do we represent extremely high-dimensional data in a way that we can interpret?
  9. PCA: linear combinations of gene expression values to maximise variance between cells
    Novembre et al, Nature, 2008, “Genes mirror geography within Europe”
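
The idea of PCA as variance-maximising linear combinations can be sketched in a few lines of Python. This is a minimal illustration using NumPy's singular value decomposition on a made-up cells × genes matrix. The function name `pca` and the toy values are assumptions for the example, not code from the talk.

```python
import numpy as np

def pca(X, n_components=2):
    """PCA of a cells x genes matrix via SVD of the gene-centred data."""
    Xc = X - X.mean(axis=0)                  # centre each gene (column)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components]             # each row: gene weights of one PC
    scores = Xc @ loadings.T                 # cell coordinates on the PCs
    explained = s**2 / np.sum(s**2)          # share of total variance per PC
    return scores, loadings, explained[:n_components]

# Toy data (hypothetical): 6 cells x 3 genes of log-scale expression.
X = np.array([
    [1.0, 2.0, 0.0],
    [2.0, 4.1, 0.1],
    [3.0, 6.0, 0.0],
    [4.0, 7.9, 0.2],
    [5.0, 10.0, 0.1],
    [6.0, 12.1, 0.0],
])
scores, loadings, explained = pca(X)
print(explained)
```

Each row of `loadings` holds the gene weights of one principal component, so a cell's score on that component is a weighted sum of its centred expression values. Because the first two genes in the toy data rise together, nearly all of the variance falls on the first component.
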
  10. PCA on single-cell data typically captures technical effects, most often the number of genes detected
    The first principal component explains 55% of variance and is very strongly correlated with the number of genes detected per cell.
    PCA is extremely useful for QC.
    Produced with the R/Bioconductor package scater (McCarthy et al, Bioinformatics, 2017)
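
The link between the first principal component and the number of genes detected can be reproduced on simulated data. The sketch below is a hypothetical simulation, not the dataset or the scater code behind the slide. Every cell shares the same expression profile and differs only in an assumed capture efficiency, so the variation PCA picks up is purely technical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells, n_genes = 200, 500

# Hypothetical simulation: identical biology in every cell, but a different
# capture efficiency per cell (a purely technical factor).
gene_means = rng.gamma(shape=2.0, scale=1.0, size=n_genes)
efficiency = rng.uniform(0.05, 1.0, size=n_cells)
counts = rng.poisson(efficiency[:, None] * gene_means[None, :])

genes_detected = (counts > 0).sum(axis=1)    # non-zero genes per cell
logx = np.log1p(counts)                      # log(count + 1) transform

# First principal component via SVD of the gene-centred matrix.
centred = logx - logx.mean(axis=0)
U, s, Vt = np.linalg.svd(centred, full_matrices=False)
pc1 = U[:, 0] * s[0]
var_explained = s[0] ** 2 / np.sum(s ** 2)

r = np.corrcoef(pc1, genes_detected)[0, 1]
print(f"PC1 explains {var_explained:.0%} of variance; |r| = {abs(r):.2f}")
```

The sign of `pc1` is arbitrary, so only the magnitude of the correlation is meaningful. The variance explained in this toy setting will differ from the 55% reported for the real data on the slide.
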
  11. t-SNE: “t-distributed stochastic neighbour embedding”
    Stochastic Neighbour Embedding (SNE) starts by converting the high-dimensional Euclidean distances between datapoints into conditional probabilities that represent similarities*.
    t-SNE is often better than earlier techniques at creating a single map that reveals structure at many different scales.
    A non-linear method that often produces beautiful plots capturing real structure in single-cell datasets.
    * van der Maaten and Hinton, Journal of Machine Learning Research, 2008.
    Resources from the developer: https://lvdmaaten.github.io/tsne/
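
The first step described on this slide, converting Euclidean distances into conditional probabilities, can be sketched as follows. This minimal version uses a Gaussian kernel with one fixed bandwidth `sigma` shared by all points, which is a simplifying assumption: t-SNE itself tunes a separate bandwidth for each point to match a user-chosen perplexity. The function name and the toy points are illustrative.

```python
import numpy as np

def conditional_probabilities(X, sigma=1.0):
    """Return P with P[i, j] = p(j | i) for the rows of X (SNE similarities)."""
    sq_norms = np.sum(X**2, axis=1)
    # Squared Euclidean distance between every pair of points.
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    d2 = np.maximum(d2, 0.0)                 # guard against tiny negatives
    affinity = np.exp(-d2 / (2.0 * sigma**2))
    np.fill_diagonal(affinity, 0.0)          # a point is not its own neighbour
    return affinity / affinity.sum(axis=1, keepdims=True)

# Three hypothetical points: two close together, one far away.
X = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]])
P = conditional_probabilities(X)
print(P.round(3))
```

Each row of `P` sums to one, and nearby points receive almost all of the probability mass. The matrix is not symmetric, since p(j | i) is normalised separately for each point i.
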
  12. t-SNE can be challenging to use effectively
    Cons:
    • cluster sizes in a t-SNE plot don’t mean anything
    • distances between clusters may not mean anything
    • algorithms are not deterministic (different runs will yield different results)
    • lots of hyperparameters that affect the visualisation
    • prone to overinterpretation!
    How to use t-SNE effectively: http://distill.pub/2016/misread-tsne/
  13. “If this is paradise, I wish I had a lawn mower.” - Talking Heads, Nothing but flowers