
Single-cell RNA-sequencing

Introduction to NGS course at EBI

Vladimir Kiselev

April 12, 2018

Transcript

  1. Projection of single-cell RNA-seq data across datasets

    Vladimir Kiselev, Head of Cellular Genetics Informatics, Wellcome Sanger Institute
  2. Bulk RNA sequencing

    • A major breakthrough after microarrays in the late 2000s
    • Measures the average expression level in a population of cells
    • Useful for comparative transcriptomics (the same tissue from different species)
    • Useful for quantifying expression signatures from ensembles
  3. Bulk RNA sequencing

    • Insufficient for studying heterogeneous systems, e.g. early development studies, complex tissues (brain)
    • Does not provide insights into the stochastic nature of gene expression
  4. Single-cell RNA sequencing

    • First publication in 2009, became popular in ~2014 due to lower sequencing costs
    • Measures the distribution of expression levels in cells
    • Allows the study of cell-specific changes in the transcriptome
  5. Single-cell RNA sequencing

    • Datasets range from 10^2 to 10^6 cells and increase in size every year
    • Computational analysis requires adaptation of the existing methods or development of new ones
  6. scRNA-seq protocols: quantification

    • Full-length: uniform read coverage; amplification biases
    • Tag-based: only captures the 5’- or 3’-end; can be combined with unique molecular identifiers (UMIs), which help improve the quantification
    • Important implications for downstream analysis
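Why UMIs help with quantification can be shown with a minimal sketch. It assumes each read has already been reduced to a (cell barcode, gene, UMI) tuple, and that reads sharing all three come from the same original molecule, so they are counted once. The function name and the toy reads are hypothetical, and real pipelines also correct sequencing errors in the UMIs, which is not attempted here.

```python
from collections import defaultdict

def count_umis(reads):
    """Count distinct UMIs per (cell, gene) pair.

    reads: iterable of (cell_barcode, gene, umi) tuples (hypothetical input format).
    """
    seen = defaultdict(set)
    for cell, gene, umi in reads:
        seen[(cell, gene)].add(umi)  # repeated UMIs collapse into one entry
    return {key: len(umis) for key, umis in seen.items()}

reads = [
    ("cellA", "Actb", "AACG"),
    ("cellA", "Actb", "AACG"),  # amplification duplicate of the read above
    ("cellA", "Actb", "TTGC"),
    ("cellB", "Actb", "AACG"),  # same UMI, different cell: a separate molecule
]
counts = count_umis(reads)
```

Counting raw reads would give three for Actb in cellA; counting UMIs gives two molecules, which is the quantity of interest.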
  7. scRNA-seq protocols: capture

    • Microwell-based
    • Cells are isolated and placed in microfluidic wells
    • Can be combined with fluorescence-activated cell sorting (FACS), making it possible to select cells based on surface markers
    • Cells can be imaged, which facilitates QC (detecting damaged cells or doublets)
    • Low-throughput
  8. scRNA-seq protocols: capture

    • Microfluidic-based
    • Higher throughput than microwell-based platforms
    • Only around 10% of cells are captured, therefore not appropriate for rare cell-types or very small amounts of input
    • The chip is relatively expensive
  9. scRNA-seq protocols: capture

    • Droplet-based
    • Encapsulates each cell inside a droplet together with a bead
    • Each bead contains enzymes and a unique barcode, which is attached to all of the reads from that cell
    • Droplets can be pooled and sequenced together
    • Highest throughput
    • Costs are ~0.05 USD/cell
  10. scRNA-seq protocols

    • CEL-seq (Hashimshony et al. 2012)
    • CEL-seq2 (Hashimshony et al. 2016)
    • Drop-seq (Macosko et al. 2015)
    • InDrop-seq (Klein et al. 2015)
    • MARS-seq (Jaitin et al. 2014)
    • SCRB-seq (Soumillon et al. 2014)
    • Seq-well (Gierahn et al. 2017)
    • Smart-seq (Picelli et al. 2014)
    • Smart-seq2 (Picelli et al. 2014)
    • SMARTer
    • STRT-seq (Islam et al. 2013)
  11. scRNA-seq applications

    • Tissue composition - clustering (droplet-based)
    • Rare cell-population (microwell-based)
    • Temporal composition
    • RNA isoforms (full-length)
  12. scRNA-seq computational analysis

    Figure colour key:
    • yellow - bulk RNA tools can be used
    • orange - requires a mix of bulk RNA tools and novel methods
    • cyan - only novel methods can be used
  13. Single-cell RNA-seq atlases

    • Human Cell Atlas: October 2016
    • Mouse Cell Atlas: 400,000 single cells, all major mouse organs (Han et al, Cell, February 2018)
    • Fly Cell Atlas: all cells in a fly (~25 million), December 2017
  14. scmap

    Yes! A method for projecting cells from a single-cell RNA-seq dataset onto cell-types or individual cells from other experiments. www.bioconductor.org
  15. How does it work?

    [Figure: Query and Reference; panels scmap-cluster and scmap-cell; legend: Cell type A, Cell type B, Cell type C, Unknown cell type]
    This cell will be assigned to cell type A
  16. How does it work?

    [Figure: Query and Reference; panels scmap-cluster and scmap-cell; legend: Cell type A, Cell type B, Cell type C, Unknown cell type]
    This cell will be assigned to cell type B
  17. How does it work?

    [Figure: Query and Reference; panels scmap-cluster and scmap-cell; legend: Cell type A, Cell type B, Cell type C, Unknown cell type]
    This cell will be assigned to cell type C
  18. How does it work?

    [Figure: Query and Reference; panels scmap-cluster and scmap-cell; legend: Cell type A, Cell type B, Cell type C, Unknown cell type]
    This cell will be assigned to a cell from cell type A
  19. How does it work?

    [Figure: Query and Reference; panels scmap-cluster and scmap-cell; legend: Cell type A, Cell type B, Cell type C, Unknown cell type]
    This cell will be assigned to a cell from cell type C
  20. How does it work?

    [Figure: Query and Reference; panels scmap-cluster and scmap-cell; legend: Cell type A, Cell type B, Cell type C, Unknown cell type]
    This cell will be unassigned
  21. Discovery vs validation

    [Figure: Query and Reference; panels scmap-cluster and scmap-cell; labels: Validation, Discovery]
  22. Datasets

    | Dataset       | Organism | Tissue             | # of cells | Experimental protocol |
    |---------------|----------|--------------------|------------|-----------------------|
    | Yan           | human    | Embryo development | 90         | Tang et al            |
    | Goolam        | mouse    | Embryo development | 124        | Smart-Seq2            |
    | Deng          | mouse    | Embryo development | 268        | Smart-Seq, Smart-Seq2 |
    | Pollen        | human    | Cerebral cortex    | 301        | SMARTer               |
    | Li            | human    | Colorectal tumors  | 561        | SMARTer               |
    | Usoskin       | mouse    | Brain              | 622        | STRT-Seq              |
    | Kolodziejczyk | mouse    | Embryo stem cells  | 704        | SMARTer               |
    | Xin           | human    | Pancreas           | 1492       | SMARTer               |
    | Tasic         | mouse    | Cortex             | 1679       | SMARTer               |
    | Baron         | mouse    | Pancreas           | 1886       | inDrop                |
    | Muraro        | human    | Pancreas           | 2126       | CEL-Seq2              |
    | Segerstolpe   | human    | Pancreas           | 2209       | Smart-Seq2            |
    | Klein         | mouse    | Embryo stem cells  | 2717       | inDrop                |
    | Zeisel        | mouse    | Brain              | 3005       | STRT-Seq UMI          |
    | Baron         | human    | Pancreas           | 8569       | inDrop                |
    | Shekhar       | mouse    | Retina             | 27499      | Drop-Seq              |
    | Macosko       | mouse    | Retina             | 44808      | Drop-Seq              |

    • We used publicly available datasets to validate and benchmark scmap
    • In all datasets the cell types were identified by the authors
  23. Algorithm

    1. Feature (gene, transcript) selection
    2. Index creation
    3. Projection
  24. Feature selection (Reference)

    Curse of dimensionality
    • With increased dimensions data becomes sparse
    • Definitions of density and distance between points become less meaningful
    • Classification algorithms do not work well
    https://shapeofdata.wordpress.com/2013/04/02/the-curse-of-dimensionality/
    [Figure panels: N = 2, N = 3, …, N = 16, N = 17]
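The claim that distances become less meaningful in high dimensions can be checked with a small experiment. This is only an illustrative sketch: the uniform random points, the number of points and the "relative contrast" measure (how much farther the farthest point is than the nearest one) are assumptions chosen for the demonstration, not something taken from the slides.

```python
import math
import random

def relative_contrast(n_dims, n_points=200, seed=0):
    """(farthest - nearest) / nearest distance from one query point to random points."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(n_dims)]
    dists = [
        math.dist(query, [rng.random() for _ in range(n_dims)])
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)

low = relative_contrast(2)      # few dimensions: nearest and farthest differ a lot
high = relative_contrast(1000)  # many dimensions: all points are about equally far
```

In two dimensions the nearest neighbour is many times closer than the farthest point, while in a thousand dimensions the two distances are nearly the same, which is one motivation for selecting a subset of features before building the index.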
  25. Algorithm

    1. Feature (gene, transcript) selection
    2. Index creation
    3. Projection
  26. Index creation (Reference)

    Search index:
    • Collects, parses, and stores data to facilitate fast and accurate information retrieval
    • Search engines, like Google, use it
    • Represents data in a compressed format
    • Saves space
  27. Index creation (Reference)

    scmap-cluster
    • Compute a centroid (median) of each cell type in the Reference
    [Figure: Cells × Genes matrix summarised as a Cell types × Genes matrix]
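The centroid step can be sketched in a few lines. This is a minimal illustration, assuming the reference is a plain cells × genes list of lists with one cell-type label per cell; the function name and toy values are hypothetical and this is not the scmap implementation itself.

```python
from statistics import median

def cluster_centroids(expression, labels):
    """Per-gene median expression of each cell type.

    expression: list of cells, each a list of gene values (cells x genes).
    labels: cell type of each cell, in the same order.
    """
    by_type = {}
    for cell, label in zip(expression, labels):
        by_type.setdefault(label, []).append(cell)
    # zip(*cells) iterates over genes, so each median is taken across cells
    return {
        label: [median(gene_values) for gene_values in zip(*cells)]
        for label, cells in by_type.items()
    }

expression = [
    [1.0, 0.0, 5.0],
    [3.0, 0.0, 7.0],
    [2.0, 9.0, 6.0],
    [0.0, 4.0, 0.0],
    [0.0, 6.0, 2.0],
]
labels = ["A", "A", "A", "B", "B"]
centroids = cluster_centroids(expression, labels)
```

The five-cell matrix shrinks to two rows, one per cell type. Using the median rather than the mean keeps a single outlying cell (the value 9.0 for the second gene in type A) from shifting the centroid.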
  28. Index creation (Reference)

    scmap-cell
    • The features are randomly split into subsets
    • Every cell in the reference is identified with a set of sub-centroids
    • Sub-centroids are defined via k-means clustering
    [Figure: Product quantizer; Cells × Genes matrix split into gene subsets, each encoded by sub-centroids. Andrew Yiu]
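The three bullet points above can be sketched as a toy product quantizer. Everything here is an assumption made for illustration: the function names, the tiny dataset, and in particular the k-means initialisation from evenly spaced cells, which is chosen only to keep the example deterministic. It is not the scmap code.

```python
import random

def nearest(point, centroids):
    """Index of the centroid with the smallest squared Euclidean distance."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centroids[i])),
    )

def kmeans(points, k, n_iter=10):
    """Plain Lloyd's k-means, started from evenly spaced points (an assumption)."""
    step = len(points) // k
    centroids = [list(points[i * step]) for i in range(k)]
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for i, members in enumerate(groups):
            if members:  # an empty group keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def build_index(cells, n_subsets, k, seed=0):
    """Split features randomly into subsets; encode each cell by sub-centroid ids."""
    rng = random.Random(seed)
    features = list(range(len(cells[0])))
    rng.shuffle(features)  # random split of the features
    subsets = [features[i::n_subsets] for i in range(n_subsets)]
    codebooks = []
    codes = [[] for _ in cells]
    for subset in subsets:
        chunks = [[cell[j] for j in subset] for cell in cells]
        sub_centroids = kmeans(chunks, k)
        codebooks.append(sub_centroids)
        for code, chunk in zip(codes, chunks):
            code.append(nearest(chunk, sub_centroids))
    return subsets, codebooks, codes

cells = [
    [0.0, 0.2, 0.1, 0.3],
    [0.1, 0.0, 0.3, 0.2],
    [0.3, 0.1, 0.2, 0.0],
    [5.0, 5.2, 5.1, 5.3],
    [5.1, 5.0, 5.3, 5.2],
    [5.3, 5.1, 5.2, 5.0],
]
subsets, codebooks, codes = build_index(cells, n_subsets=2, k=2)
```

Each cell is now stored as a short list of sub-centroid ids (one per feature subset) instead of its full expression vector, which is what makes the index compact.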
  29. Algorithm

    1. Feature (gene, transcript) selection
    2. Index creation
    3. Projection
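  30. Projection

    [Figure: a new sample is compared with the index: cosine similarity against the sub-centroids (scmap-cell); cosine, Pearson and Spearman similarities against the Cell types × Genes centroids (scmap-cluster). Andrew Yiu]
    • For scmap-cluster, at least two similarities have to agree and the similarity threshold is 0.7
    • For scmap-cell, the similarity threshold is 0.5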
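The scmap-cluster rule can be sketched as follows. The slide does not say which similarity is compared against the 0.7 threshold, so this sketch assumes it is the highest similarity among the measures that voted for the winning cell type. The function names, the centroid values and the "unassigned" label are hypothetical.

```python
import math
from collections import Counter

def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))

def pearson(x, y):
    # Pearson correlation is the cosine similarity of the mean-centred vectors
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return cosine([a - mx for a in x], [b - my for b in y])

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for idx in order[i:j + 1]:
            result[idx] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(x, y):
    return pearson(ranks(x), ranks(y))

def project(cell, centroids, threshold=0.7):
    """Assign a query cell to a reference cell type, or leave it unassigned."""
    votes = []
    for similarity in (cosine, pearson, spearman):
        scores = {label: similarity(cell, c) for label, c in centroids.items()}
        best = max(scores, key=scores.get)
        votes.append((best, scores[best]))
    winner, n_votes = Counter(label for label, _ in votes).most_common(1)[0]
    top = max(score for label, score in votes if label == winner)
    if n_votes >= 2 and top > threshold:
        return winner
    return "unassigned"

centroids = {"A": [2.0, 0.0, 6.0, 1.0], "B": [0.0, 5.0, 1.0, 4.0]}
similar_to_a = project([3.0, 0.5, 7.0, 1.0], centroids)
unlike_both = project([0.0, 0.0, 0.0, 5.0], centroids)
```

The first query closely follows the profile of type A and is assigned to it. The second is closest to type B under all three measures, but none of the similarities reaches 0.7, so it stays unassigned.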
  31. Results

    Since all datasets were annotated with corresponding cell-types, we were able to measure projection quality in a quantitative manner
    • Cohen’s Kappa: [0, 1]
    • % of unassigned: [0, 100]
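Both metrics are straightforward to compute from the original and projected labels. The sketch below assumes that Cohen's Kappa is computed only on the cells that received an assignment and that unassigned cells carry the label "unassigned"; both are assumptions, as are the function names and the toy labels. In general Kappa can also be negative when agreement is worse than chance, and it is undefined when the expected agreement equals 1.

```python
from collections import Counter

def cohens_kappa(true_labels, predicted_labels):
    """Agreement between two labelings, corrected for chance agreement."""
    n = len(true_labels)
    observed = sum(t == p for t, p in zip(true_labels, predicted_labels)) / n
    true_freq = Counter(true_labels)
    pred_freq = Counter(predicted_labels)
    expected = sum(true_freq[label] * pred_freq[label] for label in true_freq) / (n * n)
    return (observed - expected) / (1 - expected)

def percent_unassigned(predicted_labels, unassigned="unassigned"):
    return 100 * predicted_labels.count(unassigned) / len(predicted_labels)

true = ["A", "A", "A", "B", "B", "B", "C", "C"]
predicted = ["A", "A", "B", "B", "B", "unassigned", "C", "C"]

# Keep only the cells that were assigned before computing Kappa (an assumption)
assigned = [(t, p) for t, p in zip(true, predicted) if p != "unassigned"]
kappa = cohens_kappa([t for t, _ in assigned], [p for _, p in assigned])
unassigned_pct = percent_unassigned(predicted)
```

Here six of the seven assigned cells agree with the original annotation, giving a Kappa of 26/33 (about 0.79), and one cell in eight (12.5%) is unassigned.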