PhD Europe 2018

Luke Zappia
September 27, 2018

The past few years have seen an explosion in the development of single-cell RNA-sequencing technology and it has quickly become a commonly used tool for interrogating complex tissues. Access to this new type of data has led to a corresponding surge in the production of statistical and computational tools to analyse it, which we are cataloguing in the scRNA-tools database (www.scRNA-tools.org). Deciding which of these tools to use for a specific task is difficult, and comprehensive evaluations and comparisons are required. One way to demonstrate how well tools perform at their selected task is by testing them on simulated data. To make this easier we developed Splatter, a Bioconductor R package that provides a consistent, easy-to-use interface for multiple models for simulating scRNA-seq data (https://bioconductor.org/packages/splatter). Providing independent simulation software avoids relying on simulations that are not reproducible, that match the assumptions of the tool being tested, or that do not demonstrate similarity to real datasets.
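
As a rough illustration of what simulating scRNA-seq counts involves, the following base R sketch draws counts from a negative binomial distribution, the family most of the models listed in the transcript build on. It is not Splatter's code: the gamma parameters, the dispersion (`size = 2`) and the log-normal cell factors are assumptions chosen only to make the example run.

```r
set.seed(1)
n_genes <- 200
n_cells <- 50

# Gene means drawn from a gamma distribution (illustrative shape and rate).
gene_means <- rgamma(n_genes, shape = 0.5, rate = 0.1)

# Simplest case: every cell shares the same gene means.
# matrix() fills column by column, so the means are repeated once per cell.
simple_counts <- matrix(
  rnbinom(n_genes * n_cells, mu = rep(gene_means, times = n_cells), size = 2),
  nrow = n_genes, ncol = n_cells
)

# Variant with cell factors: each cell scales all gene means by its own
# multiplier (assumed log-normal here).
cell_factors <- 2^rnorm(n_cells, mean = 0, sd = 0.5)
cell_means <- outer(gene_means, cell_factors)  # genes x cells matrix of means
factor_counts <- matrix(
  rnbinom(n_genes * n_cells, mu = as.vector(cell_means), size = 2),
  nrow = n_genes, ncol = n_cells
)

# Cell factors should widen the spread of library sizes across cells.
summary(colSums(simple_counts))
summary(colSums(factor_counts))
```

Both matrices have genes in rows and cells in columns, the layout used by most scRNA-seq tools.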

Even the most effective methods usually have parameters that affect how they perform. For scRNA-seq data one of the analysis tasks that has received the most attention is defining groups of similar cells, usually through unsupervised clustering. Most clustering methods have parameters which, directly or indirectly, affect the number of clusters produced. The clustering resolution that is chosen can have a profound effect on further analysis and interpretation, but it is unclear how to make this choice. To aid analysts in deciding which clustering resolution to use we have developed clustering trees, a visualisation that shows how clusters form and change as the resolution increases. These trees can be produced using the clustree R package (http://cran.r-project.org/package=clustree) and are applicable to any clustering method. Clustering trees highlight instability that may indicate over-clustering and help analysts choose which resolution to use, particularly when combined with existing domain knowledge such as the expression of marker genes. This presentation will demonstrate our methods and tools using an scRNA-seq dataset we have generated to explore the cell type composition of kidney organoids.
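
The core idea behind a clustering tree can be sketched in a few lines of base R: for two neighbouring resolutions, count how many cells move from each lower-resolution cluster to each higher-resolution cluster, and compute what fraction of the receiving cluster each edge contributes (the clustering trees paper calls this the in-proportion). The labels and the `tree_edges` helper below are hypothetical, and this is not the clustree package's implementation.

```r
# Hypothetical cluster labels for eight cells at two resolutions.
low_res  <- c("A", "A", "A", "A", "B", "B", "B", "B")
high_res <- c("1", "1", "1", "2", "2", "3", "3", "3")

# Build the edge table between two clusterings of the same cells.
tree_edges <- function(lower, higher) {
  # Cross-tabulate the labels: one candidate edge per (from, to) pair.
  counts <- table(from = lower, to = higher)
  edges <- as.data.frame(counts, stringsAsFactors = FALSE)
  edges <- edges[edges$Freq > 0, ]  # keep only edges that carry cells
  # In-proportion: cells on the edge / cells in the cluster it points to.
  to_size <- table(higher)
  edges$in_prop <- edges$Freq / as.numeric(to_size[edges$to])
  names(edges)[names(edges) == "Freq"] <- "n_cells"
  rownames(edges) <- NULL
  edges
}

edges <- tree_edges(low_res, high_res)
print(edges)
```

In this toy example cluster 2 receives half of its cells from A and half from B, so both of its incoming edges have an in-proportion of 0.5. Low in-proportion edges of this kind are the instability the abstract refers to.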

Transcript

  1. MCRI Bioinformatics

    [Figure: software overview. Labels: bpipe, Corset, Lace, Necklace, GOseq, Splatter, clinker, JAFFA, Cpipe, Ximmer, Schism, missMethyl, STRetch, Structural, Clinical, STRs, Single-cell, Pipelines, Gene sets, Fusions, Assembly, superTranscripts, Methylation, scRNA-tools, clustree, Visualisation]
  2. Kidney organoids

    [Figure: differentiation protocol from iPSCs to organoid. Time points: day 0, 4, 7, 10, 18, 25. Labels: CHIR, FGF9, FGF9, CHIR, Form pellets, No GF]
  3. Dataset

    - 4 organoids
    - 10x Chromium
    - 2 batches (3 + 1)
    - 7937 cells (6649 + 1288)
    - Identify cell types
  4. www.scRNA-tools.org

    “Exploring the single-cell RNA-seq analysis landscape with the scRNA-tools database” PLoS Computational Biology (2018) DOI: 10.1371/journal.pcbi.1006245
  5. Simulations

    Provide a truth to test against, BUT:
    - Often poorly documented and explained
    - Not easily reproducible or reusable
    - Don’t demonstrate similarity to real data
  6. Simulation models

    - Simple: Negative binomial
    - Lun: NB with cell factors (DOI: 10.1186/s13059-016-0947-7)
    - Lun2: Sampled NB with batch effects (DOI: 10.1093/biostatistics/kxw055)
    - scDD: NB with bimodality (DOI: 10.1186/s13059-016-1077-y)
    - BASiCS: NB with spike-ins (DOI: 10.1371/journal.pcbi.1004333)
    - mfa: Bifurcating pseudotime trajectory (DOI: 10.12688/wellcomeopenres.11087.1)
    - PhenoPath: Pseudotime with gene types (DOI: 10.1038/s41467-018-04696-6)
    - ZINB-WaVE: Sophisticated ZINB (DOI: 10.1186/s13059-018-1406-4)
    - SparseDC: Clusters across two conditions (DOI: 10.1093/nar/gkx1113)
  7. Using Splatter: 1. Estimate, 2. Simulate, 3. Compare

    params1 <- splatEstimate(real.data)
    params2 <- simpleEstimate(real.data)
    sim1 <- splatSimulate(params1, ...)
    sim2 <- simpleSimulate(params2, ...)
    datasets <- list(Real = real.data, Splat = sim1, Simple = sim2)
    comp <- compareSCESets(datasets)
    diff <- diffSCESets(datasets, ref = "Real")
  8. [Figure: two panels, “Distribution of mean expression” (axis: mean log2(CPM + 1)) and “Difference in mean expression” (axis: rank difference in mean log2(CPM + 1)). Datasets: Real, Splat, Splat (Drop), Simple, Lun, Lun2, Lun2 (ZINB), scDD, BASiCS, mfa, PhenoPath, SparseDC, ZINB-WaVE; the difference panel omits Real.]
  9. [Figure: rank of MAD from real data. Simulations: Splat, Splat (Drop), Simple, Lun, Lun2, Lun2 (ZINB), scDD, BASiCS, mfa, PhenoPath, SparseDC, ZINB-WaVE. Properties: Mean, Variance, Mean-Variance, Library size, % Zeros (Cell), % Zeros (Gene), Mean-Zeros.]
  10. “Clustering trees: a visualisation for evaluating clusterings at multiple resolutions” GigaScience (2018) DOI: 10.1093/gigascience/giy083
  11. Human dataset: 16 week fetal kidney, 3178 cells, 10x Chromium

    Lindström et al. “Conserved and Divergent Features of Mesenchymal Progenitor Cell Types within the Cortical Nephrogenic Niche of the Human and Mouse Kidney” J Am Soc Nephrol (2018) DOI: 10.1681/ASN.2017080890
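
The comparison summarised on slides 8 and 9 (rank of MAD from real data) can be approximated in base R. This sketch assumes that MAD means the median absolute deviation between the sorted property values of a simulation and those of the real data, and that simulations are then ranked per property. The toy matrices, the two properties and the helper names are hypothetical stand-ins, not Splatter's implementation.

```r
set.seed(42)

# Toy count matrices (genes x cells) standing in for real and simulated data.
make_counts <- function(mu, n_genes = 100, n_cells = 20) {
  matrix(rnbinom(n_genes * n_cells, mu = mu, size = 2), nrow = n_genes)
}
real <- make_counts(mu = 10)
sims <- list(Close = make_counts(mu = 11), Far = make_counts(mu = 100))

# Two of the per-cell properties named on slide 9.
properties <- list(
  lib_size   = function(counts) colSums(counts),
  zeros_cell = function(counts) colMeans(counts == 0) * 100
)

# MAD between the sorted property values of a simulation and the real data.
# Sorting pairs values by rank, so both datasets need the same number of cells.
mad_from_real <- function(sim, prop) {
  median(abs(sort(prop(sim)) - sort(prop(real))))
}

# Rows are simulations, columns are properties.
mads <- sapply(properties, function(prop) {
  sapply(sims, mad_from_real, prop = prop)
})

# Rank simulations within each property (1 = closest to the real data).
ranks <- apply(mads, 2, rank)
print(ranks)
```

Ranking within each property puts properties measured on very different scales, such as library size and percentage of zeros, on a common footing.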