Single-cell RNA-sequencing

Introduction to NGS course at EBI

Vladimir Kiselev

April 12, 2018

Transcript

  1. Projection of single-cell RNA-seq data across datasets

    Vladimir Kiselev, Head of Cellular Genetics Informatics, Wellcome Sanger Institute
  2. Bulk RNA sequencing

    • A major breakthrough after microarrays in the late 2000s
    • Measures the average expression level in a population of cells
    • Useful for comparative transcriptomics (the same tissue from different species)
    • Useful for quantifying expression signatures from ensembles
  3. Bulk RNA sequencing

    • Insufficient for studying heterogeneous systems, e.g. early development studies, complex tissues (brain)
    • Does not provide insights into the stochastic nature of gene expression
  4. None
  5. None
  6. Single-cell RNA sequencing

    • First publication in 2009; became popular in ~2014 due to lower sequencing costs
    • Measures the distribution of expression levels across cells
    • Allows the study of cell-specific changes in the transcriptome
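
To make the contrast between an average and a distribution concrete, here is a minimal sketch. The expression values are invented for illustration: one gene that is "off" in half of the cells and "on" in the other half.

```python
from statistics import mean

# Hypothetical expression of one gene in 10 cells (illustrative values only).
cells = [0, 0, 0, 0, 0, 20, 20, 20, 20, 20]

# Bulk RNA-seq effectively reports a single population-level average.
bulk_level = mean(cells)

# scRNA-seq keeps the per-cell values, so both subpopulations stay visible.
histogram = {}
for value in cells:
    histogram[value] = histogram.get(value, 0) + 1

print("bulk average:", bulk_level)
print("per-cell distribution:", histogram)
```

The average alone cannot distinguish this bimodal population from one in which every cell expresses the gene at the same intermediate level.
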
  7. Single-cell RNA sequencing

    • Datasets range from 10^2 to 10^6 cells and increase in size every year
    • Computational analysis requires adapting existing methods or developing new ones
  8. Bulk RNA-seq vs scRNA-seq

  9. The Art of Clean Up, Ursus Wehrli

  10. The Art of Clean Up, Ursus Wehrli

  11. None
  12. scRNA-seq protocols: Quantification

    • Full-length
        Uniform read coverage
        Amplification biases
    • Tag-based
        Only captures the 5’- or 3’-end
        Can be combined with unique molecular identifiers (UMIs), which help improve the quantification
    • Important implications for downstream analysis
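
A rough sketch of why UMIs help quantification: reads that share a gene and a UMI are treated as amplification copies of one original molecule and counted once. The reads below are invented, and the sketch ignores sequencing errors in the UMI itself, which real pipelines have to handle.

```python
from collections import defaultdict

# Hypothetical aligned reads from a single cell: (gene, UMI) pairs.
reads = [
    ("GeneA", "AACG"), ("GeneA", "AACG"), ("GeneA", "AACG"),  # copies of one molecule
    ("GeneA", "TTGC"),
    ("GeneB", "AACG"),
]

def count_reads(reads):
    """Naive quantification: every read counts, including amplification copies."""
    counts = defaultdict(int)
    for gene, _umi in reads:
        counts[gene] += 1
    return dict(counts)

def count_umis(reads):
    """UMI quantification: count distinct UMIs per gene."""
    umis = defaultdict(set)
    for gene, umi in reads:
        umis[gene].add(umi)
    return {gene: len(seen) for gene, seen in umis.items()}

read_counts = count_reads(reads)
umi_counts = count_umis(reads)
print(read_counts, umi_counts)
```
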
  13. scRNA-seq protocols: Capture

    • Microwell-based
        Cells are isolated and placed in microfluidic wells
        Can be combined with fluorescence-activated cell sorting (FACS), making it possible to select cells based on surface markers
        One can take pictures of the cells, which facilitates QC (damaged cells or doublets)
        Low throughput
  14. scRNA-seq protocols: Capture

    • Microfluidic-based
        Higher throughput than microwell-based platforms
        Only around 10% of cells are captured, so it is not appropriate for rare cell types or very small amounts of input
        The chip is relatively expensive
  15. scRNA-seq protocols: Capture

    • Droplet-based
        Encapsulates each cell inside a droplet together with a bead
        Each bead contains enzymes and a unique barcode, which is attached to all of the reads from that cell
        Droplets can be pooled and sequenced together
        Highest throughput
        Costs are ~0.05 USD/cell
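
Because every read carries the barcode of its bead, pooled reads can be sorted back into cells after sequencing. A minimal sketch with invented barcodes and gene names (real pipelines also correct barcode errors and filter empty droplets):

```python
from collections import defaultdict

# Hypothetical pooled reads: (cell barcode, gene) pairs from one sequencing run.
pooled_reads = [
    ("ACGT", "GeneA"), ("TTAG", "GeneB"), ("ACGT", "GeneB"),
    ("ACGT", "GeneA"), ("TTAG", "GeneB"),
]

def demultiplex(reads):
    """Group reads by cell barcode and count reads per gene within each cell."""
    per_cell = defaultdict(lambda: defaultdict(int))
    for barcode, gene in reads:
        per_cell[barcode][gene] += 1
    return {barcode: dict(genes) for barcode, genes in per_cell.items()}

counts_by_cell = demultiplex(pooled_reads)
print(counts_by_cell)
```
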
  16. scRNA-seq protocols

    • CEL-seq (Hashimshony et al. 2012)
    • CEL-seq2 (Hashimshony et al. 2016)
    • Drop-seq (Macosko et al. 2015)
    • InDrop-seq (Klein et al. 2015)
    • MARS-seq (Jaitin et al. 2014)
    • SCRB-seq (Soumillon et al. 2014)
    • Seq-well (Gierahn et al. 2017)
    • Smart-seq (Picelli et al. 2014)
    • Smart-seq2 (Picelli et al. 2014)
    • SMARTer
    • STRT-seq (Islam et al. 2013)
  17. Interactive exercise

    What questions can we answer with scRNA-seq?
    Why scRNA-seq rather than bulk RNA-seq?
  18. scRNA-seq applications

    • Tissue composition - clustering (droplet-based)
    • Rare cell populations (microwell-based)
    • Temporal composition
    • RNA isoforms (full-length)
  19. scRNA-seq computational analysis

    yellow - bulk RNA tools can be used
    orange - requires a mix of bulk RNA tools and novel methods
    cyan - only novel methods can be used
  20. Typical analysis Trapnell et al, Nature Biotechnology, 2014

  21. Typical analysis Macosko et al, Nature Biotechnology, 2016

  22. New analysis: RNA velocity (time derivative of expression)

    La Manno et al., bioRxiv, 2017
  23. Moore’s law in single-cell RNA-seq experiments

    Svensson et al., Nature Protocols, April 2018
  24. Single-cell RNA-seq atlases

    • Human Cell Atlas: October 2016
    • Mouse Cell Atlas: 400,000 single cells, all major mouse organs (Han et al., Cell, February 2018)
    • Fly Cell Atlas: all cells in a fly (~25 million), December 2017
  25. Can we make use of all these data in an integrative manner?

  26. Yes! scmap

    A method for projecting cells from a single-cell RNA-seq dataset onto cell types or individual cells from other experiments.
    www.bioconductor.org
  27. The Power of bioRxiv

  28. The Power of bioRxiv

  29. How does it work?

    [Figure: Query and Reference; legend: Cell type A, Cell type B, Cell type C, Unknown cell type]
    This cell will be assigned to cell type A
  30. How does it work?

    [Figure: Query and Reference; legend: Cell type A, Cell type B, Cell type C, Unknown cell type]
    This cell will be assigned to cell type B
  31. How does it work?

    [Figure: Query and Reference; legend: Cell type A, Cell type B, Cell type C, Unknown cell type]
    This cell will be assigned to cell type C
  32. How does it work?

    [Figure: Query and Reference; legend: Cell type A, Cell type B, Cell type C, Unknown cell type]
    This cell will be assigned to the cell from cell type A
  33. How does it work?

    [Figure: Query and Reference; legend: Cell type A, Cell type B, Cell type C, Unknown cell type]
    This cell will be assigned to the cell from cell type C
  34. How does it work?

    [Figure: Query and Reference; legend: Cell type A, Cell type B, Cell type C, Unknown cell type]
    This cell will be unassigned
  35. Discovery vs validation

    [Figure: Query and Reference; labels: Validation, Discovery]
  36. Datasets

    Dataset        Organism  Tissue              # of cells  Experimental protocol
    Yan            human     Embryo development          90  Tang et al
    Goolam         mouse     Embryo development         124  Smart-Seq2
    Deng           mouse     Embryo development         268  Smart-Seq, Smart-Seq2
    Pollen         human     Cerebral cortex            301  SMARTer
    Li             human     Colorectal tumors          561  SMARTer
    Usoskin        mouse     Brain                      622  STRT-Seq
    Kolodziejczyk  mouse     Embryo stem cells          704  SMARTer
    Xin            human     Pancreas                  1492  SMARTer
    Tasic          mouse     Cortex                    1679  SMARTer
    Baron          mouse     Pancreas                  1886  inDrop
    Muraro         human     Pancreas                  2126  CEL-Seq2
    Segerstolpe    human     Pancreas                  2209  Smart-Seq2
    Klein          mouse     Embryo stem cells         2717  inDrop
    Zeisel         mouse     Brain                     3005  STRT-Seq UMI
    Baron          human     Pancreas                  8569  inDrop
    Shekhar        mouse     Retina                   27499  Drop-Seq
    Macosko        mouse     Retina                   44808  Drop-Seq

    We used publicly available datasets to validate and benchmark scmap.
    In all datasets the cell types were identified by the authors.
  37. Algorithm

    1. Feature (gene, transcript) selection
    2. Index creation
    3. Projection
    scmap (www.bioconductor.org)
  38. Feature selection (Reference)

    Curse of dimensionality
    • With increasing dimensions, data become sparse
    • Definitions of density and distance between points become less meaningful
    • Classification algorithms do not work well
    https://shapeofdata.wordpress.com/2013/04/02/the-curse-of-dimensionality/
    [Figure panels: N = 2, N = 3, …, N = 16, N = 17]
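
The claim that distances become less meaningful can be checked numerically. The sketch below is an illustration on uniform random points, not part of scmap: it measures how much the farthest neighbour differs from the nearest one, relative to the nearest distance, in low and high dimension. The function name and parameters are made up for this example.

```python
import math
import random

def distance_contrast(n_dims, n_points=200, seed=0):
    """(max - min) / min over distances from one random point to all the others."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(n_dims)] for _ in range(n_points)]
    query, others = points[0], points[1:]
    dists = [math.dist(query, p) for p in others]
    return (max(dists) - min(dists)) / min(dists)

contrast_low = distance_contrast(2)      # a few dimensions
contrast_high = distance_contrast(1000)  # gene-expression-like dimensionality
print(contrast_low, contrast_high)
```

In high dimension the contrast shrinks towards zero: nearest and farthest neighbours look almost equally far away, which is one motivation for selecting a smaller set of informative features first.
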
  39. Algorithm

    1. Feature (gene, transcript) selection
    2. Index creation
    3. Projection
    scmap (www.bioconductor.org)
  40. Index creation (Reference)

    Search index:
    • Collects, parses, and stores data to facilitate fast and accurate information retrieval
    • Used by search engines such as Google
    • Represents data in a compressed format
    • Saves space
  41. Index creation (Reference): scmap-cluster

    • Compute the centroid (median) of each cell type in the Reference
    [Figure: Genes × Cells matrix reduced to a Genes × Cell types matrix]
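
The centroid step can be sketched as a per-gene median over the cells of each type. This is an illustrative stand-in for the scmap implementation (which is an R/Bioconductor package); the cells, labels and values are invented.

```python
from statistics import median

# Hypothetical reference: expression vectors over the selected genes, plus labels.
expression = {
    "cell1": [5, 0, 1],
    "cell2": [7, 0, 3],
    "cell3": [6, 1, 2],
    "cell4": [0, 9, 4],
    "cell5": [1, 8, 4],
}
labels = {"cell1": "alpha", "cell2": "alpha", "cell3": "alpha",
          "cell4": "beta", "cell5": "beta"}

def cluster_centroids(expression, labels):
    """Return {cell type: per-gene median expression across its cells}."""
    by_type = {}
    for cell, vector in expression.items():
        by_type.setdefault(labels[cell], []).append(vector)
    return {cell_type: [median(column) for column in zip(*vectors)]
            for cell_type, vectors in by_type.items()}

centroids = cluster_centroids(expression, labels)
print(centroids)
```

The index then holds one vector per cell type instead of one per cell.
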
  42. Index creation (Reference): scmap-cell

    • The features are randomly split into subsets
    • Every cell in the reference is identified with a set of sub-centroids
    • Sub-centroids are defined via k-means clustering
    [Figure: product quantizer, Genes × Cells matrix encoded as sub-centroids; credit: Andrew Yiu]
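
The three bullets describe a product quantizer. Below is a small pure-Python sketch of that idea, not the scmap code: `build_index`, `n_subsets` and `k` are names and parameters chosen for this illustration, and the k-means is a plain Lloyd's iteration with random initialisation.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(vector, centroids):
    """Index of the centroid closest to `vector`."""
    return min(range(len(centroids)), key=lambda c: sq_dist(vector, centroids[c]))

def kmeans(vectors, k, rng, n_iter=10):
    """Plain Lloyd's k-means; returns (centroids, final assignment)."""
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(n_iter):
        assignment = [nearest(v, centroids) for v in vectors]
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, [nearest(v, centroids) for v in vectors]

def build_index(cells, n_subsets, k, seed=0):
    rng = random.Random(seed)
    features = list(range(len(cells[0])))
    rng.shuffle(features)                                # random split of the features
    subsets = [features[i::n_subsets] for i in range(n_subsets)]
    codebooks, codes_per_subset = [], []
    for subset in subsets:
        sub_vectors = [[cell[j] for j in subset] for cell in cells]
        centroids, assignment = kmeans(sub_vectors, k, rng)  # sub-centroids
        codebooks.append(centroids)
        codes_per_subset.append(assignment)
    # Each cell is now identified by one sub-centroid id per feature subset.
    cell_codes = [tuple(codes) for codes in zip(*codes_per_subset)]
    return subsets, codebooks, cell_codes

# Hypothetical reference: 4 cells x 4 genes.
cells = [[1.0, 0.0, 5.0, 0.2], [1.1, 0.1, 4.8, 0.0],
         [0.0, 3.0, 0.1, 2.0], [0.2, 2.9, 0.0, 2.2]]
subsets, codebooks, cell_codes = build_index(cells, n_subsets=2, k=2)
print(cell_codes)
```

Storing a few small integers per cell, plus the shared codebooks, is what makes the index much smaller than the original matrix.
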
  43. Compression = size(matrix) / size(index)

    [Figure: log10(Compression) for scmap-cluster and scmap-cell]
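
As a worked example of the formula, assume size is measured as the number of stored values and take invented dimensions: 500 selected genes, 10,000 reference cells and 20 cell types. None of these numbers come from the slides.

```python
import math

def compression(matrix_size, index_size):
    """Compression = size(matrix) / size(index)."""
    return matrix_size / index_size

# Hypothetical dimensions, for illustration only.
n_genes, n_cells, n_cell_types = 500, 10_000, 20

matrix_size = n_genes * n_cells               # full expression matrix
cluster_index_size = n_genes * n_cell_types   # one centroid per cell type

ratio = compression(matrix_size, cluster_index_size)
log_ratio = math.log10(ratio)
print(ratio, round(log_ratio, 2))
```

With a centroid index the ratio reduces to the number of cells divided by the number of cell types, so it grows with the size of the reference.
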

  44. Algorithm

    1. Feature (gene, transcript) selection
    2. Index creation
    3. Projection
    scmap (www.bioconductor.org)
  45. Projection

    • For scmap-cluster, at least two of the three similarities (cosine, Pearson, Spearman) have to agree, and the similarity threshold is 0.7
    • For scmap-cell, the similarity threshold is 0.5
    [Figure: a new sample is compared with the cell-type centroids (cosine, Pearson, Spearman) and with the sub-centroids (cosine); credit: Andrew Yiu]
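
The scmap-cluster rule can be sketched as a vote between three similarity measures. The slide does not say exactly how the 0.7 threshold is combined with the vote; this sketch assumes it is applied to the highest of the agreeing similarities. The centroids and query cells are invented, and vectors are assumed to be non-constant so the correlations are defined.

```python
import math
from collections import Counter

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def pearson(a, b):
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return cosine([x - mean_a for x in a], [y - mean_b for y in b])

def ranks(values):
    """Average ranks: tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(a, b):
    return pearson(ranks(a), ranks(b))

def project(cell, centroids, threshold=0.7):
    """Assign `cell` to a cell type, or return "unassigned"."""
    winners = []
    for similarity in (cosine, pearson, spearman):
        scores = {name: similarity(cell, c) for name, c in centroids.items()}
        best = max(scores, key=scores.get)
        winners.append((best, scores[best]))
    name, votes = Counter(w for w, _ in winners).most_common(1)[0]
    top_score = max(score for w, score in winners if w == name)
    if votes >= 2 and top_score >= threshold:
        return name
    return "unassigned"

centroids = {"A": [10, 0, 1, 5], "B": [0, 8, 6, 1]}
confident = project([9, 1, 0, 4], centroids)
ambiguous = project([1, 1, 9, 0], centroids)
print(confident, ambiguous)
```

The second query resembles type B under all three measures, but none of the similarities reaches 0.7, so it stays unassigned rather than being forced into the closest cell type.
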
  46. Results

    Since all datasets were annotated with corresponding cell types, we were able to measure projection quality in a quantitative manner:
    • Cohen’s Kappa: [0, 1]
    • % of unassigned: [0, 100]
    scmap (www.bioconductor.org)
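
Both measures are simple to compute from the original annotations and the projected labels. In this sketch the labels are invented, and "unassigned" is treated as just another label when computing kappa, which is an assumption rather than something stated on the slide.

```python
from collections import Counter

def cohens_kappa(truth, predicted):
    """Agreement between two labelings, corrected for chance agreement."""
    n = len(truth)
    observed = sum(t == p for t, p in zip(truth, predicted)) / n
    truth_freq, pred_freq = Counter(truth), Counter(predicted)
    expected = sum(truth_freq[label] * pred_freq[label] for label in truth_freq) / (n * n)
    return (observed - expected) / (1 - expected)  # undefined if expected == 1

def percent_unassigned(predicted, label="unassigned"):
    return 100 * sum(p == label for p in predicted) / len(predicted)

# Hypothetical annotations vs projection results.
truth = ["A", "A", "B", "B", "C", "C"]
predicted = ["A", "A", "B", "unassigned", "C", "B"]

kappa = cohens_kappa(truth, predicted)
unassigned = percent_unassigned(predicted)
print(round(kappa, 3), round(unassigned, 1))
```

A kappa of 1 means the projection reproduces the annotations exactly, while values near 0 mean agreement no better than chance.
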