Tools and techniques for single-cell RNA sequencing data

Luke Zappia
March 22, 2019

RNA sequencing of individual cells allows us to take a snapshot of the dynamic processes within a cell and to explore the differences between cell types. As this technology has developed over the last few years, it has been rapidly adopted by researchers in areas such as developmental biology. The development of protocols for producing these data has been accompanied by a burst in the development of computational methods for analysing them. My thesis explores the computational tools and techniques for analysing single-cell RNA-sequencing data. I will present a database that charts the release of analysis software (https://scrna-tools.org); Splatter, a software package for easily simulating single-cell datasets from multiple models (http://bioconductor.org/packages/splatter/); and clustering trees, a visualisation approach for inspecting clustering results at multiple resolutions (https://CRAN.R-project.org/package=clustree). In the final part of my thesis, I use an analysis of a kidney organoid dataset to demonstrate and compare some of the current analysis methods.

Transcript

  1. TOOLS AND TECHNIQUES FOR SINGLE-CELL RNA SEQUENCING DATA LUKE ZAPPIA

  2. Outline: 1 Introduction, 2 Tools, 3 Simulations, 4 Clustering trees, 5 Analysis, 6 Conclusion
  3. Section 1: Introduction. Image: Matthew Daniels via The Cell Image Library, http://www.cellimagelibrary.org/images/38912

  4. None
  5. RNA sequencing: reads (e.g. ACTGACTCCA, TCAGTACTGA, CGTGTCATAG, GATTGACCTA) become gene counts

    Gene  Sample 1
    A     43
    B     3
    C     17
    D     24
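  6. RNA-seq analysis workflow: raw reads are either aligned to a reference genome (alignment, then counting of aligned reads using a gene annotation) or quantified against a reference transcriptome (alignment-free quantification), giving an expression matrix. Other steps: quality control, normalisation, differential expression testing, gene set testing (using gene sets), visualisation and interpretation.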
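Normalisation is one of the steps in this workflow. A minimal sketch of one simple approach, library-size scaling to counts per million (CPM) followed by log2(CPM + 1), applied to the bulk example counts from slide 5. The function names are hypothetical, and real pipelines often use more robust normalisation methods than plain library-size scaling.

```python
import math

def cpm(counts):
    """Scale raw counts to counts per million (CPM) using the total library size."""
    total = sum(counts.values())
    return {gene: value / total * 1e6 for gene, value in counts.items()}

def log2_cpm(counts, prior=1.0):
    """Return log2(CPM + prior); the prior avoids log2(0) for unexpressed genes."""
    return {gene: math.log2(value + prior) for gene, value in cpm(counts).items()}

# Counts for the single bulk sample shown on slide 5
sample1 = {"A": 43, "B": 3, "C": 17, "D": 24}

for gene, value in log2_cpm(sample1).items():
    print(f"{gene}: {value:.2f}")
```

Adding 1 before taking the logarithm keeps genes with zero counts finite (they map to 0), which matters far more for sparse single-cell data than for this bulk sample.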
  7. Svensson et al. DOI: 10.1038/nprot.2017.149

  8. mccarrolllab.com/dropseq/ Droplet cell capture

  9. scRNA-seq: reads from each cell (e.g. ACTGACTCCA, TCAGTACTGA, CGTGTCATAG, GATTGACCTA) become gene-by-cell counts

    Gene  Cell 1  Cell 2  Cell 3  Cell 4
    A     12      10      9       0
    B     0       0       0       1
    C     9       6       0       0
    D     7       0       4       0
  10. None
  11. None
  12. Kidney development. Images from OpenStax College, CC BY 3.0, via Wikimedia Commons
  13. Kidney organoids: protocol timeline from iPSCs to organoid, with marks at days 0, 4, 7, 10, 18 and 25
    Treatments: CHIR, FGF9; form pellets (CHIR, FGF9); no growth factors (No GF)
  14. Aims
    1. Understand the computational tools used to analyse scRNA-seq data
    2. Contribute to tool development
    3. Apply tools to a kidney organoid dataset
  15. Section 2: Tools. Image: Andres J Garcia and Ankur Singh via The Cell Image Library, http://www.cellimagelibrary.org/images/44701
  16. None
  17. www.scRNA-tools.org

  18. None
  19. Number of tools; publication status

  20. Software licenses; platforms

  21. Analysis categories

  22. Users over time; by country (top 10); by continent

  23. “Exploring the single-cell RNA-seq analysis landscape with the scRNA-tools database”

    PLoS Computational Biology (2018) DOI: 10.1371/journal.pcbi.1006245
  24. Section 3: Simulations. Image: S. Schuller via The Cell Image Library, http://www.cellimagelibrary.org/images/38903

  25. Simulations
    - Provide a truth to test against
    But:
    - Often poorly documented and explained
    - Not easily reproducible or reusable
    - Don't demonstrate similarity to real data
  26. Splatter: a Bioconductor package with a consistent, easy-to-use interface to multiple simulation models

  27. Splat model
    - Negative binomial
    - Expression outliers
    - Defined library sizes
    - Mean-variance trend
    - Dropout
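A minimal sketch of how the Splat components listed above can fit together, written with NumPy. It is a simplified illustration under stated assumptions, not Splatter's implementation: the function name, parameter names and values are hypothetical, groups, batches and paths are omitted, and negative binomial counts are generated as a gamma-Poisson mixture.

```python
import numpy as np

def simulate_splat_like(n_genes=200, n_cells=50, seed=1,
                        mean_shape=0.6, mean_rate=0.3,
                        outlier_prob=0.05, outlier_loc=4.0, outlier_scale=0.5,
                        lib_loc=11.0, lib_scale=0.2,
                        bcv_common=0.1, bcv_df=60.0,
                        dropout_mid=0.0, dropout_shape=-1.0):
    """Simplified Splat-style simulation; all parameter values are illustrative."""
    rng = np.random.default_rng(seed)

    # Gene means drawn from a gamma distribution
    gene_means = rng.gamma(shape=mean_shape, scale=1.0 / mean_rate, size=n_genes)

    # Expression outliers: some genes become a large multiple of the median gene mean
    is_outlier = rng.random(n_genes) < outlier_prob
    factors = rng.lognormal(mean=outlier_loc, sigma=outlier_scale, size=n_genes)
    gene_means = np.where(is_outlier, np.median(gene_means) * factors, gene_means)

    # Defined library sizes drawn from a log-normal distribution
    lib_sizes = rng.lognormal(mean=lib_loc, sigma=lib_scale, size=n_cells)

    # Expected counts: each gene's share of expression scaled by each cell's library size
    cell_means = np.outer(gene_means / gene_means.sum(), lib_sizes)

    # Mean-variance trend: the biological coefficient of variation (BCV) falls as the
    # mean rises, times a per-gene factor sqrt(df / chi-squared draw)
    bcv = (bcv_common + 1.0 / np.sqrt(cell_means)) * np.sqrt(
        bcv_df / rng.chisquare(bcv_df, size=n_genes))[:, None]

    # Negative binomial counts generated as a gamma-Poisson mixture
    true_counts = rng.poisson(rng.gamma(shape=1.0 / bcv**2, scale=cell_means * bcv**2))

    # Dropout: logistic function of log mean; a negative shape makes low means drop out more
    drop_prob = 1.0 / (1.0 + np.exp(-dropout_shape * (np.log(cell_means) - dropout_mid)))
    keep = rng.random(cell_means.shape) >= drop_prob
    counts = true_counts * keep
    return true_counts, counts

true_counts, counts = simulate_splat_like()
print("Fraction of zeros before dropout:", (true_counts == 0).mean())
print("Fraction of zeros after dropout:", (counts == 0).mean())
```

Returning both matrices makes it easy to see how much of the sparsity comes from sampling alone and how much from the extra dropout step.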
  28. Other models
    - Simple: negative binomial
    - Lun: NB with cell factors (DOI: 10.1186/s13059-016-0947-7)
    - Lun2: sampled NB with batch effects (DOI: 10.1093/biostatistics/kxw055)
    - scDD: NB with bimodality (DOI: 10.1186/s13059-016-1077-y)
    - BASiCS: NB with spike-ins (DOI: 10.1371/journal.pcbi.1004333)
    - mfa: bifurcating pseudotime trajectory (DOI: 10.12688/wellcomeopenres.11087.1)
    - PhenoPath: pseudotime with gene types (DOI: 10.1038/s41467-018-04696-6)
    - ZINB-WaVE: sophisticated zero-inflated negative binomial (ZINB) (DOI: 10.1186/s13059-018-1406-4)
    - SparseDC: clusters across two conditions (DOI: 10.1093/nar/gkx1113)
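  29. Real data → Estimation → Parameters → Simulation → Dataset
    Steps: 1. Estimate, 2. Simulate, 3. Compare

    library(splatter)
    # 1. Estimate model parameters from a real dataset (real.data)
    params1 <- splatEstimate(real.data)
    params2 <- simpleEstimate(real.data)
    # 2. Simulate from each set of parameters ("..." stands for further arguments)
    sim1 <- splatSimulate(params1, ...)
    sim2 <- simpleSimulate(params2, ...)
    # 3. Compare the simulations with the real data
    datasets <- list(Real = real.data, Splat = sim1, Simple = sim2)
    comp <- compareSCESets(datasets)
    diff <- diffSCESets(datasets, ref = "Real")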
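For the estimation step, Splatter's own estimation functions are more involved than this; as a hedged stand-in, the sketch below fits a gamma distribution to gene means by the method of moments (shape = mean²/variance, rate = mean/variance). The function names are hypothetical, normalising every cell to the median library size is an assumption of this sketch, and it assumes every cell has at least one count.

```python
import numpy as np

def estimate_gamma_mom(values):
    """Method-of-moments gamma fit: shape = mean^2 / var, rate = mean / var."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    var = values.var(ddof=1)
    return mean * mean / var, mean / var

def estimate_mean_params(counts):
    """Fit a gamma to the gene means of a genes x cells count matrix."""
    counts = np.asarray(counts, dtype=float)
    lib_sizes = counts.sum(axis=0)  # assumes every cell has at least one count
    # Rescale every cell to the median library size before averaging across cells
    normalised = counts / lib_sizes * np.median(lib_sizes)
    gene_means = normalised.mean(axis=1)
    return estimate_gamma_mom(gene_means[gene_means > 0])  # ignore undetected genes

# Check on values drawn from a known gamma (shape 2, rate 0.5)
rng = np.random.default_rng(0)
samples = rng.gamma(shape=2.0, scale=1.0 / 0.5, size=20_000)
print(estimate_gamma_mom(samples))
```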
  30. [Figure: distribution of mean expression (mean log2(CPM + 1)) and rank difference in mean expression relative to the real data, for the real dataset and simulations from the Splat, Splat (Drop), Simple, Lun, Lun2, Lun2 (ZINB), scDD, BASiCS, mfa, PhenoPath, SparseDC and ZINB-WaVE models]
  31. [Figure: rank of each simulation model (the models compared on slide 30) by median absolute deviation (MAD) from the real data, for mean, variance, mean-variance relationship, library size, % zeros per cell, % zeros per gene and mean-zeros relationship]
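The comparison behind these rankings can be sketched as: compute a statistic (here the per-gene mean of log2(CPM + 1)) for the real dataset and each simulation, sort both so values are matched by rank, and summarise the differences with the median of their absolute values. This summary is an assumption of the sketch rather than the code behind the figures; it requires datasets with the same number of genes, and the toy "real" and simulated data are hypothetical.

```python
import numpy as np

def log_cpm_gene_means(counts):
    """Per-gene mean of log2(CPM + 1) for a genes x cells count matrix."""
    counts = np.asarray(counts, dtype=float)
    cpm = counts / counts.sum(axis=0) * 1e6
    return np.log2(cpm + 1).mean(axis=1)

def sorted_difference_mad(real_stat, sim_stat):
    """Median absolute difference between rank-matched (sorted) statistics.

    Both inputs must have the same length; sorting pairs values by rank.
    """
    diffs = np.sort(np.asarray(sim_stat)) - np.sort(np.asarray(real_stat))
    return float(np.median(np.abs(diffs)))

def rank_simulations(real_counts, simulations):
    """Order simulations from closest to furthest from the real gene means."""
    real_stat = log_cpm_gene_means(real_counts)
    scores = {name: sorted_difference_mad(real_stat, log_cpm_gene_means(sim))
              for name, sim in simulations.items()}
    return sorted(scores.items(), key=lambda item: item[1])

# Hypothetical data: "real" gene means follow a skewed gamma distribution
rng = np.random.default_rng(42)
real = rng.poisson(rng.gamma(0.6, 10.0, size=(1000, 1)), size=(1000, 50))
simulations = {
    # Same generating process as the "real" data, different random draw
    "close": rng.poisson(rng.gamma(0.6, 10.0, size=(1000, 1)), size=(1000, 50)),
    # Every gene has the same mean, so the spread of gene means is wrong
    "far": rng.poisson(6.0, size=(1000, 50)),
}
ranking = rank_simulations(real, simulations)
print(ranking)
```

Sorting first means the comparison asks whether the *distributions* of the statistic match, not whether individual genes match, which is the right question for a simulation whose genes have no counterpart in the real data.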
  32. Complex simulations: groups, batches, paths
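Of these, groups are the simplest to sketch: each group gets its own gene means by multiplying a random subset of genes by log-normal "differential expression" factors, with some factors inverted so that those genes are down-regulated. This is a hypothetical helper loosely modelled on that idea, with made-up parameter names and values; batches and paths are not shown.

```python
import numpy as np

def group_gene_means(base_means, n_groups, de_prob=0.1, down_prob=0.5,
                     de_loc=0.1, de_scale=0.4, seed=0):
    """Return an (n_groups x n_genes) array of group-specific gene means."""
    rng = np.random.default_rng(seed)
    base_means = np.asarray(base_means, dtype=float)
    n_genes = base_means.size
    groups = []
    for _ in range(n_groups):
        # Choose which genes are differentially expressed in this group
        is_de = rng.random(n_genes) < de_prob
        factors = rng.lognormal(mean=de_loc, sigma=de_scale, size=n_genes)
        # Invert some factors so that those genes are down-regulated instead
        is_down = rng.random(n_genes) < down_prob
        factors = np.where(is_down, 1.0 / factors, factors)
        groups.append(np.where(is_de, base_means * factors, base_means))
    return np.vstack(groups)

base = np.full(1000, 5.0)
group_means = group_gene_means(base, n_groups=3)
print("Fraction of genes changed per group:", (group_means != base).mean(axis=1))
```

Cells would then be assigned to a group and their counts simulated from that group's means, so the true group of every cell is known when testing a clustering method.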

  33. “Splatter: simulation of single-cell RNA sequencing data” Genome Biology (2017)

    DOI: 10.1186/s13059-017-1305-0
  34. Section 4: Clustering trees. Image: M Uhlen et al. via The Cell Image Library, http://www.cellimagelibrary.org/images/40483
  35. How many clusters? From low resolution (fewer clusters) to high resolution (more clusters)
  36. A tree of clusters?

  37. None
  38. None
  39. Weighting edges: in-proportion = (number of cells on edge) / (number of cells in higher-resolution cluster)
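The in-proportion can be computed directly from two clusterings of the same cells. A minimal Python sketch; the function name and example labels are hypothetical, and this is not the clustree implementation.

```python
from collections import Counter

def clustering_tree_edges(low_res, high_res):
    """Edges between clusters at two adjacent resolutions, with in-proportion weights.

    low_res, high_res: cluster labels for the same cells, in the same order.
    Returns {(low_cluster, high_cluster): (n_cells, in_prop)}.
    """
    if len(low_res) != len(high_res):
        raise ValueError("both clusterings must label the same cells")
    edge_counts = Counter(zip(low_res, high_res))  # cells shared by each cluster pair
    high_sizes = Counter(high_res)                 # size of each higher-res cluster
    return {
        (low, high): (n, n / high_sizes[high])
        for (low, high), n in edge_counts.items()
    }

# Hypothetical labels for eight cells clustered at two resolutions
res_low = ["1", "1", "1", "1", "2", "2", "2", "2"]
res_high = ["a", "a", "b", "b", "b", "c", "c", "c"]

for (low, high), (n, in_prop) in sorted(clustering_tree_edges(res_low, res_high).items()):
    print(f"{low} -> {high}: {n} cells, in_prop = {in_prop:.2f}")
```

Cluster b draws cells from both lower-resolution clusters, so its incoming edges have in-proportions of 2/3 and 1/3; clusters a and c each come entirely from one parent (in-proportion 1). Edges split like b's are the pattern that suggests a cluster may be unstable.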
  40. None
  41. None
  42. Gene expression

  43. [Figure panels: cell cycle, SC3 stability, number of genes]

  44. None
  45. [Figure: t-SNE plots (axes t-SNE 1 and t-SNE 2)]

  46. “Clustering trees: a visualisation for evaluating clusterings at multiple resolutions” GigaScience (2018) DOI: 10.1093/gigascience/giy083
  47. Section 5: Analysis. Image: Natalie Prigozhina via The Cell Image Library, http://www.cellimagelibrary.org/images/48101

  48. Markers: GATA3, ECAD, LTL, WT1. Labels: CD + DT + PT + Glo (collecting duct, distal tubule, proximal tubule, glomerulus)
  49. Dataset: 4 organoids, 10x Chromium, 2 batches (3 + 1), 7937 cells (6649 + 1288). Goal: identify cell types

    Analysis step     Tool
    Alignment         CellRanger
    Quantification    CellRanger
    Quality control   scater
    Integration       Seurat
    Clustering        Seurat
    Gene detection    Seurat
  50. Stroma Endothelium Cell cycle Podocyte Epithelium

  51. Glial Neural progenitor Muscle progenitor

  52. Human dataset: 16-week fetal kidney, 3178 cells, 10x Chromium
    Lindström et al. “Conserved and Divergent Features of Mesenchymal Progenitor Cell Types within the Cortical Nephrogenic Niche of the Human and Mouse Kidney” J Am Soc Nephrol (2018) DOI: 10.1681/ASN.2017080890
  53. [Figure: number of cells per cluster for fetal kidney and organoid cells; cluster labels include stroma, endothelium, cell cycle, nephron progenitor, podocyte, nephron, glial, immune, blood and neural progenitor]
  54. “Single-cell analysis reveals congruence between kidney organoids and human fetal kidney” Genome Medicine (2019) DOI: 10.1186/s13073-019-0615-0
  55. What if we did things differently?

  56. Droplet selection: ~1 million droplets
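As a point of comparison for droplet selection, a sketch of a simple fixed-threshold approach, loosely modelled on the one used by older CellRanger versions: take the expected number of cells' worth of droplets with the highest total UMI counts, find the 99th percentile of their totals, and call every droplet above 10% of that value a cell. The function name, defaults and toy droplet data are assumptions for illustration.

```python
import numpy as np

def threshold_cell_calls(umi_totals, expected_cells=3000, quantile=0.99, fraction=0.1):
    """Return a boolean mask of droplets called as cells by a fixed UMI threshold."""
    umi_totals = np.asarray(umi_totals)
    # Totals of the `expected_cells` droplets with the most UMIs
    top = np.sort(umi_totals)[::-1][:expected_cells]
    cutoff = fraction * np.quantile(top, quantile)
    return umi_totals > cutoff

# Hypothetical droplets: many near-empty ones, large cells and small (low-RNA) cells
rng = np.random.default_rng(0)
empty = rng.poisson(5, size=100_000)
large_cells = rng.poisson(5000, size=1500)
small_cells = rng.poisson(300, size=500)
totals = np.concatenate([empty, large_cells, small_cells])

calls = threshold_cell_calls(totals, expected_cells=2000)
print("Droplets called as cells:", calls.sum())
print("Small cells recovered:", calls[-500:].sum())
```

In this toy example every large cell is called but none of the small ones are, because their totals sit below 10% of the large cells' counts. Cells with little RNA falling under such a cutoff is one reason newer approaches (EmptyDrops is one example) can recover more cells.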

  57. None
  58. Quality control: manual thresholds; PCA-based
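A minimal sketch of the manual-threshold approach: compute per-cell total counts, number of detected genes and percentage of mitochondrial counts, then keep cells that pass fixed cut-offs. All thresholds, the "MT-" naming convention and the toy data are hypothetical; the PCA-based alternative is not shown.

```python
import numpy as np

def qc_filter(counts, gene_names, min_counts=2000, min_genes=500, max_mito_pct=10.0):
    """Flag cells (columns) that pass simple manual QC thresholds."""
    counts = np.asarray(counts)
    total = counts.sum(axis=0)             # total counts per cell
    n_genes = (counts > 0).sum(axis=0)     # detected genes per cell
    # Mitochondrial genes are assumed to be named with an "MT-" prefix
    is_mito = np.array([name.upper().startswith("MT-") for name in gene_names])
    mito_pct = 100 * counts[is_mito].sum(axis=0) / np.maximum(total, 1)
    keep = (total >= min_counts) & (n_genes >= min_genes) & (mito_pct <= max_mito_pct)
    return keep, {"total": total, "n_genes": n_genes, "mito_pct": mito_pct}

# Tiny hypothetical example: 4 genes x 3 cells, with thresholds scaled down to match
names = ["MT-CO1", "GATA3", "WT1", "CDH1"]
toy = [[5, 50, 1],
       [20, 10, 0],
       [30, 5, 2],
       [45, 0, 1]]
keep, metrics = qc_filter(toy, names, min_counts=50, min_genes=3, max_mito_pct=10.0)
print(keep)
```

Here the first cell passes, the second fails because most of its counts are mitochondrial, and the third fails on total counts. The weakness of fixed cut-offs is that suitable values differ between datasets and must be chosen by eye.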

  59. Gene selection Seurat

  60. Gene selection M3Drop

  61. Gene selection overlap: Seurat only, M3Drop only, both
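A simplified sketch of dispersion-based gene selection in the spirit of Seurat's variable gene method (normalise each cell, compute log dispersion, z-score it within bins of mean expression), plus the overlap between two gene selections. It is an approximation written for illustration, not Seurat's or M3Drop's code; the function names, parameters and toy data are hypothetical.

```python
import numpy as np

def dispersion_hvgs(counts, gene_names, n_bins=20, z_cutoff=1.0):
    """Select genes whose log dispersion is high relative to genes of similar mean."""
    counts = np.asarray(counts, dtype=float)
    norm = counts / counts.sum(axis=0) * 1e4  # scale each cell to 10,000 counts
    mean = norm.mean(axis=1)
    var = norm.var(axis=1, ddof=1)
    expressed = mean > 0
    disp = np.full(mean.shape, np.nan)
    disp[expressed] = np.log(var[expressed] / mean[expressed] + 1e-12)
    # Bin genes by log mean expression, then z-score dispersion within each bin
    log_mean = np.log1p(mean)
    edges = np.linspace(log_mean[expressed].min(), log_mean[expressed].max(), n_bins + 1)
    bins = np.clip(np.digitize(log_mean, edges) - 1, 0, n_bins - 1)
    z = np.full(mean.shape, np.nan)
    for b in range(n_bins):
        in_bin = expressed & (bins == b)
        if in_bin.sum() > 1:
            sd = disp[in_bin].std(ddof=1)
            if sd > 0:
                z[in_bin] = (disp[in_bin] - disp[in_bin].mean()) / sd
    return {g for g, score in zip(gene_names, z) if score > z_cutoff}  # NaN never passes

def gene_overlap(first, second):
    """Split two gene selections into genes unique to each and genes in both."""
    return {"first_only": first - second, "second_only": second - first,
            "both": first & second}

# Hypothetical data: 200 Poisson genes, with gene_0 switched off in half of the cells
rng = np.random.default_rng(7)
counts = rng.poisson(5.0, size=(200, 100))
counts[0] = np.tile([0, 10], 50)
names = [f"gene_{i}" for i in range(200)]
selected = dispersion_hvgs(counts, names, n_bins=1, z_cutoff=3.0)
other = {"gene_0", "gene_5"}  # stand-in for a selection from another method
print(gene_overlap(selected, other))
```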

  62. Comparison

  63. Marker genes

  64. Partition-based graph abstraction (PAGA): cell graph; PAGA cluster graph
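PAGA summarises a cell-level neighbour graph as a graph between clusters. The sketch below shows only the basic idea, counting k-nearest-neighbour edges that link cells in different clusters. PAGA itself compares these counts against what would be expected under random assignment, which is omitted here; function names and toy data are hypothetical.

```python
import numpy as np
from collections import Counter

def knn_edges(coords, k=5):
    """Undirected k-nearest-neighbour edges between cells (brute-force distances)."""
    coords = np.asarray(coords, dtype=float)
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # a cell is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    return {tuple(sorted((i, int(j)))) for i in range(len(coords)) for j in neighbours[i]}

def cluster_graph(edges, labels):
    """Count cell-graph edges that connect each pair of different clusters."""
    counts = Counter()
    for i, j in edges:
        a, b = labels[i], labels[j]
        if a != b:
            counts[tuple(sorted((a, b)))] += 1
    return dict(counts)

# Hypothetical 2D embedding: clusters A and B overlap, C is far away
rng = np.random.default_rng(3)
centres = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (10.0, 0.0)}
labels, points = [], []
for name, centre in centres.items():
    points.append(rng.normal(loc=centre, scale=0.5, size=(30, 2)))
    labels += [name] * 30
graph = cluster_graph(knn_edges(np.vstack(points), k=5), labels)
print(graph)
```

A and B end up connected while C is isolated, which is the kind of coarse topology a cluster graph is meant to expose.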

  65. Cell velocity: unspliced RNA vs mature mRNA
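Velocity of this kind compares unspliced and mature (spliced) mRNA. Under a simple steady-state model, unspliced counts u and spliced counts s for a gene lie near the line u = γs, and a cell's velocity is estimated as v = u - γs: positive when the gene appears to be induced, negative when it appears to be repressed. A minimal sketch that fits γ by least squares through the origin using all cells; published tools such as velocyto and scVelo fit on extreme quantiles and add normalisation and smoothing, all omitted here. Function names and counts are hypothetical.

```python
import numpy as np

def fit_gamma(unspliced, spliced):
    """Slope of a least-squares line through the origin: unspliced ~ gamma * spliced."""
    u = np.asarray(unspliced, dtype=float)
    s = np.asarray(spliced, dtype=float)
    return float(np.dot(u, s) / np.dot(s, s))

def velocity(unspliced, spliced):
    """Steady-state velocity v = u - gamma * s for one gene across cells."""
    u = np.asarray(unspliced, dtype=float)
    s = np.asarray(spliced, dtype=float)
    return u - fit_gamma(u, s) * s

# Hypothetical counts for one gene in five cells; the last cell has extra unspliced RNA
spliced = [2.0, 4.0, 6.0, 8.0, 4.0]
unspliced = [1.0, 2.0, 3.0, 4.0, 5.0]
print(velocity(unspliced, spliced))
```

The last cell has more unspliced RNA than the fitted ratio predicts, so it gets the largest positive velocity, suggesting the gene is being switched on in that cell.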

  66. Podocyte Epithelial Endothelial Stroma Neural Muscle Immune?

  67. Summary
    - New droplet selection methods can give many more cells
    - Seurat clustering is robust to gene selection
    - Possible immune-like population
    - Alternative methods can help interpretation
  68. Section 6: Conclusion. Image: Natalie Prigozhina via The Cell Image Library, http://www.cellimagelibrary.org/images/48108

  69. What did I do?
    - Build a database of scRNA-seq analysis tools and a website to interact with it
    - Develop a software package for simulating scRNA-seq data and a flexible simulation model
    - Design an algorithm for visualising clustering at multiple resolutions and a software package that implements it
    - Perform an analysis of kidney organoid data to profile the cell types present and demonstrate the effect of different tools and decisions
  70. What next?
    - Bigger datasets, more computation
    - Convergence on methods that work
    - Continued development of software
    - Reference datasets
    - Integration of data types
    - Spatial transcriptomics
  71. Acknowledgments
    - Supervisors: Alicia Oshlack, Melissa Little
    - Committee: Andrew Pask, Christine Wells, Edmund Crampin
    - Everyone that makes their tools and data available
    - MCRI Bioinformatics: Belinda Phipson, Breon Schmidt
    - MCRI KDDR: Alex Combes
    - COMBINE
    - Friends and family
    - Developers: dplyr, ggplot2, scater, scran, Seurat, workflowr, rmarkdown, knitr, tidygraph, ggraph, edgeR...
  72. clustree: install.packages("clustree"); paper: doi.org/10.1093/gigascience/giy083
    Splatter: BiocManager::install("splatter"); paper: doi.org/10.1186/s13059-017-1305-0
    scRNA-tools: www.scRNA-tools.org; paper: doi.org/10.1371/journal.pcbi.1006245
    Organoid analysis: oshlacklab.com/combes-organoid-paper; paper: doi.org/10.1186/s13073-019-0615-0
    @_lazappi_ · oshlacklab.com · github.com/lazappi