PhD Europe 2018

Luke Zappia
September 27, 2018

The past few years have seen an explosion in the development of single-cell RNA-sequencing (scRNA-seq) technology, and it has quickly become a commonly used tool for interrogating complex tissues. Access to this new type of data has led to a corresponding surge in the production of statistical and computational tools to analyse it. We are cataloguing these tools in the scRNA-tools database (www.scRNA-tools.org). Deciding which of these tools to use for a specific task is difficult, and comprehensive evaluations and comparisons are required. One way to demonstrate how well tools perform at their selected task is to test them on simulated data. To make this easier we developed Splatter, a Bioconductor R package that provides a consistent, easy-to-use interface to multiple models for simulating scRNA-seq data (https://bioconductor.org/packages/splatter). Providing independent simulation software avoids relying on simulations that are not reproducible, that match the assumptions of the tool being tested, or that do not demonstrate similarity to real datasets.
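
To make the idea of simulating count data concrete, here is a minimal base-R sketch of a plain negative binomial simulation, in the spirit of the simplest kind of model mentioned in the talk. It does not call Splatter and is not Splatter's implementation; the gamma shape and rate, the shared dispersion, and the matrix dimensions are illustrative assumptions rather than package defaults.

```r
# Minimal base-R sketch of a negative binomial count simulation.
# All parameter values are illustrative assumptions, not Splatter defaults.
set.seed(1)

n_genes <- 200
n_cells <- 50

# Gene-level mean expression drawn from a gamma distribution (assumed shape/rate)
gene_means <- rgamma(n_genes, shape = 0.6, rate = 0.3)

# A single dispersion shared by every gene (assumed value)
dispersion <- 0.1

# Draw a genes x cells count matrix. The matrix is filled column by column,
# so repeating gene_means once per cell gives each row its own gene mean.
# With mu/size, rnbinom() has variance mu + mu^2 / size.
counts <- matrix(
    rnbinom(n_genes * n_cells,
            mu = rep(gene_means, times = n_cells),
            size = 1 / dispersion),
    nrow = n_genes,
    dimnames = list(paste0("Gene", seq_len(n_genes)),
                    paste0("Cell", seq_len(n_cells)))
)

# Summaries of the kind that can be compared between simulated and real data
lib_sizes  <- colSums(counts)     # total counts per cell
gene_var   <- apply(counts, 1, var)
prop_zeros <- mean(counts == 0)   # overall proportion of zero counts
```

Summaries such as library size, per-gene variance and the proportion of zeros are the sort of quantities one would check against a real dataset before trusting a simulation.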

Even the most effective methods usually have parameters that affect how they perform. For scRNA-seq data, one of the analysis tasks that has received the most attention is defining groups of similar cells, usually through unsupervised clustering. Most clustering methods have parameters that, directly or indirectly, affect the number of clusters produced. The clustering resolution that is chosen can have a profound effect on further analysis and interpretation, but it is unclear how to make this choice. To aid analysts in deciding which clustering resolution to use we have developed clustering trees, a visualisation that shows how clusters form and change as the resolution increases. These trees can be produced using the clustree R package (http://cran.r-project.org/package=clustree) and are applicable to any clustering method. Clustering trees highlight instability that may indicate over-clustering and help in choosing which resolution to use, particularly when combined with existing domain knowledge such as the expression of marker genes. This presentation will demonstrate our methods and tools using an scRNA-seq dataset we have generated to explore the cell type composition of kidney organoids.
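
The relationship between clusterings at neighbouring resolutions can be sketched in base R. The example below is an illustration under stated assumptions, not the clustree implementation: it uses k-means on the built-in `iris` measurements as a stand-in for a real scRNA-seq clustering, and the `K` column prefix and the choice of k = 1 to 4 are arbitrary. For one pair of resolutions it tabulates how samples move between clusters and computes the proportion of each higher-resolution cluster that arrives from each lower-resolution cluster.

```r
# Sketch: cluster the same samples at several resolutions and measure
# how clusters at one resolution feed into clusters at the next.
# k-means on iris is a stand-in dataset and method (assumption).
set.seed(1)
features <- scale(iris[, 1:4])

# One column of cluster labels per resolution: K1, K2, K3, K4
ks <- 1:4
clusterings <- as.data.frame(
    sapply(ks, function(k) kmeans(features, centers = k, nstart = 10)$cluster)
)
colnames(clusterings) <- paste0("K", ks)

# Overlap between two adjacent resolutions: rows are K2 clusters,
# columns are K3 clusters, entries are numbers of shared samples
overlap <- table(from = clusterings$K2, to = clusterings$K3)

# Proportion of each K3 cluster that comes from each K2 cluster.
# Columns sum to 1; a column split fairly evenly between several rows
# suggests the new cluster is drawn from more than one parent.
in_prop <- sweep(overlap, 2, colSums(overlap), "/")
print(round(in_prop, 2))
```

A data frame shaped like `clusterings`, with one column per resolution and a shared prefix, is the kind of input that `clustree::clustree(clusterings, prefix = "K")` accepts if the package is installed.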

Transcript

  1. Tools, simulations and trees for scRNA-seq Luke Zappia @_lazappi_

  2. None
  3. Parkville precinct

  4. MCRI Bioinformatics

    Tools: bpipe, Corset, Lace, Necklace, GOseq, Splatter, clinker, JAFFA, Cpipe, Ximmer, Schism, missMethyl, STRetch, scRNA-tools, clustree
    Labels: Structural, Clinical, STRs, Single-cell, Pipelines, Gene sets, Fusions, Assembly, superTranscripts, Methylation, Visualisation
  5. MCRI Kidney Development (image: OpenStax College, CC BY 3.0, via Wikimedia Commons)
  6. Kidney organoids

    [Figure: differentiation timeline from iPSCs to organoid over days 0, 4, 7, 10, 18 and 25; step labels: CHIR, FGF9, FGF9, CHIR, Form pellets, No GF]
  7. 1 Tools, 2 Simulations, 3 Clustering trees, 4 Analysis
  8. Dataset

    4 organoids, 10x Chromium
    2 batches (3 + 1)
    7937 cells (6649 + 1288)
    Identify cell types
  9. Tools 1

  10. Svensson et al. DOI: 10.1038/nprot.2017.149

  11. None
  12. None
  13. None
  14. www.scRNA-tools.org

    "Exploring the single-cell RNA-seq analysis landscape with the scRNA-tools database" PLoS Computational Biology (2018) DOI: 10.1371/journal.pcbi.1006245
  15. None
  16. None
  17. Simulations 2

  18. Simulations

    Provide a truth to test against, BUT:
    - Often poorly documented and explained
    - Not easily reproducible or reusable
    - Don't demonstrate similarity to real data
  19. “Splatter: simulation of single-cell RNA sequencing data” Genome Biology (2017)

    DOI: 10.1186/s13059-017-1305-0
  20. Splat Negative binomial Expression outliers Defined library sizes Mean-variance trend

    Dropout
  21. Simulation models

    Simple - Negative binomial
    Lun - NB with cell factors DOI: 10.1186/s13059-016-0947-7
    Lun2 - Sampled NB with batch effects DOI: 10.1093/biostatistics/kxw055
    scDD - NB with bimodality DOI: 10.1186/s13059-016-1077-y
    BASiCS - NB with spike-ins DOI: 10.1371/journal.pcbi.1004333
    mfa - Bifurcating pseudotime trajectory DOI: 10.12688/wellcomeopenres.11087.1
    PhenoPath - Pseudotime with gene types DOI: 10.1038/s41467-018-04696-6
    ZINB-WaVE - Sophisticated ZINB DOI: 10.1186/s13059-018-1406-4
    SparseDC - Clusters across two conditions DOI: 10.1093/nar/gkx1113
  22. Using Splatter

    1. Estimate
        params1 <- splatEstimate(real.data)
        params2 <- simpleEstimate(real.data)
    2. Simulate
        sim1 <- splatSimulate(params1, ...)
        sim2 <- simpleSimulate(params2, ...)
    3. Compare
        datasets <- list(Real = real.data, Splat = sim1, Simple = sim2)
        comp <- compareSCESets(datasets)
        diff <- diffSCESets(datasets, ref = "Real")
  23. [Figure, two panels. "Distribution of mean expression": mean log2(CPM + 1) for Real, Splat, Splat (Drop), Simple, Lun, Lun2, Lun2 (ZINB), scDD, BASiCS, mfa, PhenoPath, SparseDC and ZINB-WaVE. "Difference in mean expression": rank difference in mean log2(CPM + 1) for the same simulation models.]
  24. [Figure: "Rank of MAD from real data" for Splat, Splat (Drop), Simple, Lun, Lun2, Lun2 (ZINB), scDD, BASiCS, mfa, PhenoPath, SparseDC and ZINB-WaVE across Mean, Variance, Mean-Variance, Library size, % Zeros (Cell), % Zeros (Gene) and Mean-Zeros]
  25. CountSimQC DESeq2 dispersions Feature correlations Soneson et al. DOI: 10.1093/bioinformatics/btx631

  26. https://github.com/YosefLab/SymSim https://github.com/bvieth/powsimR

  27. Complex simulations Groups Batches Paths

  28. Clustering trees 3

  29. Clustering methods > 25% of all tools

  30. How many clusters?

  31. A tree of clusters?

  32. None
  33. None
  34. "Clustering trees: a visualisation for evaluating clusterings at multiple resolutions" GigaScience (2018) DOI: 10.1093/gigascience/giy083
  35. Organoid data

  36. None
  37. NPHS1

  38. Analysis 4

  39. GATA3 ECAD LTL WT1 CD + DT + PT +

    Glo
  40. Analysis steps

    Alignment - CellRanger
    Quantification - CellRanger
    Quality control - scater
    Integration - Seurat
    Clustering - Seurat
    Gene detection - Seurat
    Ordering - Monocle
  41. Stroma Endothelium Cell cycle Podocyte Epithelium

  42. Podocyte Early podocyte Early proximal Early distal Progenitor

  43. Progenitor Early tubule Podocyte

  44. Human dataset 16 week fetal kidney 3178 cells 10x Chromium

    Lindström et al. “Conserved and Divergent Features of Mesenchymal Progenitor Cell Types within the Cortical Nephrogenic Niche of the Human and Mouse Kidney” J Am Soc Nephrol (2018) DOI:10.1681/ASN.2017080890
  45. Stroma Endothelium Cell cycle Podocyte Nephron Progenitor Immune

  46. Fetal kidney Organoid

  47. Podocyte (human only) Early podocyte Proximal Distal Progenitor Stroma

  48. Podocyte Early proximal Early distal Early podocyte Progenitor Diff. progenitor

    Human pod. Stroma Fetal kidney Organoid
  49. install.packages("clustree") Paper doi.org/10.1093/gigascience/giy083

    biocLite("splatter") Paper doi.org/10.1186/s13059-017-1305-0
    www.scRNA-tools.org Paper doi.org/10.1093/gigascience/giy083
    @_lazappi_ oshlacklab.com github.com/lazappi
  50. install.packages(“clustree”) Paper doi.org/10.1093/gigascience/giy083 la