Tools, simulations and trees for scRNA-seq Luke Zappia @_lazappi_

Parkville precinct

MCRI Bioinformatics
Tools: bpipe, Corset, Lace, Necklace, GOseq, Splatter, clinker, JAFFA, Cpipe, Ximmer, Schism, missMethyl, STRetch, scRNA-tools, clustree
Areas: Structural, Clinical, STRs, Single-cell, Pipelines, Gene sets, Fusions, Assembly, superTranscripts, Methylation, Visualisation

MCRI Kidney Development
Image: OpenStax College, CC BY 3.0, via Wikimedia Commons

Kidney organoids
[Figure: differentiation timeline from iPSCs to organoid. Days: 0, 4, 7, 10, 18, 25. Labels: CHIR, FGF9, FGF9, CHIR, Form pellets, No GF]

1 Tools 2 Simulations 3 Clustering trees 4 Analysis

Dataset
- 4 organoids
- 10x Chromium
- 2 batches (3 + 1)
- 7937 cells (6649 + 1288)
- Aim: identify cell types

Tools 1

Svensson et al. DOI: 10.1038/nprot.2017.149

www. .org “Exploring the single-cell RNA-seq analysis landscape with the scRNA-tools database” PLoS Computational Biology (2018) DOI: 10.1371/journal.pcbi.1006245

Simulations 2

Simulations Provide a truth to test against BUT - Often poorly documented and explained - Not easily reproducible or reusable - Don’t demonstrate similarity to real data

“Splatter: simulation of single-cell RNA sequencing data” Genome Biology (2017) DOI: 10.1186/s13059-017-1305-0

Splat
- Negative binomial
- Expression outliers
- Defined library sizes
- Mean-variance trend
- Dropout
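The components listed for Splat can be illustrated with a small sketch. This is not the Splatter implementation (Splatter is an R package); it is a simplified, standard-library Python illustration. All function names, parameter names and default values here are assumptions chosen for the example, and the mean-variance trend is reduced to a dispersion that shrinks as the mean grows.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson count: Knuth's method for small lam, normal approximation above 50."""
    if lam <= 0:
        return 0
    if lam > 50:
        return max(0, round(rng.gauss(lam, math.sqrt(lam))))
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate_counts(n_genes=50, n_cells=20, seed=1, dropout=True,
                    mean_shape=0.6, mean_rate=0.3,   # gamma-distributed gene means
                    lib_loc=6.0, lib_scale=0.2,      # log-normal library sizes
                    out_prob=0.05, out_factor=4.0,   # expression outliers
                    bcv_common=0.1,                  # baseline dispersion
                    drop_mid=0.0, drop_shape=-1.0):  # logistic dropout curve
    """Return a cells x genes list of counts (hypothetical, simplified Splat-like model)."""
    rng = random.Random(seed)
    # Gene means, with a few genes inflated to act as expression outliers.
    means = [rng.gammavariate(mean_shape, 1.0 / mean_rate) for _ in range(n_genes)]
    means = [m * out_factor if rng.random() < out_prob else m for m in means]
    total = sum(means)
    counts = []
    for _ in range(n_cells):
        lib_size = rng.lognormvariate(lib_loc, lib_scale)  # defined library size
        row = []
        for m in means:
            mu = max(lib_size * m / total, 1e-12)
            # Mean-variance trend: dispersion is larger for lowly expressed genes.
            bcv = bcv_common + 1.0 / math.sqrt(mu)
            shape = 1.0 / bcv ** 2
            # Gamma-Poisson mixture, which is a negative binomial.
            count = poisson(rng, rng.gammavariate(shape, mu / shape))
            # Dropout: zero probability falls as log mean expression rises.
            p_drop = 1.0 / (1.0 + math.exp(-drop_shape * (math.log(mu) - drop_mid)))
            dropped = rng.random() < p_drop
            row.append(0 if (dropout and dropped) else count)
        counts.append(row)
    return counts
```

Because the dropout draw is always made, calling the function with the same seed and `dropout=False` gives the matching counts before zeros are added, which makes the effect of dropout easy to inspect.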

Simulation models
Simple - Negative binomial
Lun - NB with cell factors DOI: 10.1186/s13059-016-0947-7
Lun2 - Sampled NB with batch effects DOI: 10.1093/biostatistics/kxw055
scDD - NB with bimodality DOI: 10.1186/s13059-016-1077-y
BASiCS - NB with spike-ins DOI: 10.1371/journal.pcbi.1004333
mfa - Bifurcating pseudotime trajectory DOI: 10.12688/wellcomeopenres.11087.1
PhenoPath - Pseudotime with gene types DOI: 10.1038/s41467-018-04696-6
ZINB-WaVE - Sophisticated ZINB DOI: 10.1186/s13059-018-1406-4
SparseDC - Clusters across two conditions DOI: 10.1093/nar/gkx1113

Using Splatter
1. Estimate
params1 <- splatEstimate(real)  # real: placeholder name; the real dataset argument is missing in the source
params2 <- simpleEstimate(real)
2. Simulate
sim1 <- splatSimulate(params1, ...)
sim2 <- simpleSimulate(params2, ...)
3. Compare
datasets <- list(Real = real, Splat = sim1, Simple = sim2)
comp <- compareSCESets(datasets)
diff <- diffSCESets(datasets, ref = "Real")

[Figure, two panels. "Distribution of mean expression": mean log2(CPM + 1) for Real, Splat, Splat (Drop), Simple, Lun, Lun2, Lun2 (ZINB), scDD, BASiCS, mfa, PhenoPath, SparseDC and ZINB-WaVE. "Difference in mean expression": rank difference in mean log2(CPM + 1) for each simulation]

[Figure: "Rank of MAD from real data" for Splat, Splat (Drop), Simple, Lun, Lun2, Lun2 (ZINB), scDD, BASiCS, mfa, PhenoPath, SparseDC and ZINB-WaVE. Metrics: Mean, Variance, Mean-Variance, Library size, % Zeros (Cell), % Zeros (Gene), Mean-Zeros]
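Ranking simulations by their MAD from the real data can be sketched as below. This is a Python illustration rather than Splatter code, and it assumes that MAD means the median absolute difference between rank-matched (sorted) values of a metric; the slide does not spell out the definition. The function names and the toy numbers are hypothetical.

```python
from statistics import median


def mad_from_real(real, sim):
    """Median absolute difference between sorted real and sorted simulated values."""
    if len(real) != len(sim):
        raise ValueError("real and simulated vectors must have the same length")
    return median(abs(s - r) for r, s in zip(sorted(real), sorted(sim)))


def rank_simulations(real, sims):
    """Return {simulation name: rank}, where rank 1 is closest to the real data."""
    mads = {name: mad_from_real(real, values) for name, values in sims.items()}
    ordered = sorted(mads, key=mads.get)
    return {name: i + 1 for i, name in enumerate(ordered)}


# Toy values for one metric (for example mean expression per gene); not real results.
real = [1.0, 2.0, 3.0, 4.0, 5.0]
sims = {
    "A": [1.1, 2.1, 2.9, 4.2, 5.0],
    "B": [2.0, 3.0, 4.0, 5.0, 6.0],
    "C": [0.0, 0.0, 3.0, 8.0, 9.0],
}
ranks = rank_simulations(real, sims)
```

Repeating this for each metric gives one rank per simulation per metric, which is the kind of grid the figure summarises.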

countsimQC
- DESeq2 dispersions
- Feature correlations
Soneson et al. DOI: 10.1093/bioinformatics/btx631

Complex simulations Groups Batches Paths

Clustering trees 3

Clustering methods > 25% of all tools

How many clusters?

A tree of clusters?

“Clustering trees: a visualisation for evaluating clusterings at multiple resolutions” GigaScience (2018) DOI:
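The idea behind a clustering tree can be sketched as follows: cluster the same cells at several resolutions, then draw an edge wherever cells move from a cluster at one resolution to a cluster at the next. This is a minimal Python illustration and not the clustree package (which is written in R). The function name, the edge fields and the toy labels are assumptions; the in-proportion is taken to be the share of the higher-resolution cluster's cells that arrive along the edge.

```python
from collections import Counter


def tree_edges(clusterings):
    """Build clustering-tree edges.

    clusterings: list of label lists for the same cells, ordered from low
    to high resolution. Returns one dict per edge between adjacent levels.
    """
    edges = []
    for level, (lower, upper) in enumerate(zip(clusterings, clusterings[1:])):
        pair_counts = Counter(zip(lower, upper))  # cells shared by each cluster pair
        upper_sizes = Counter(upper)              # size of each higher-resolution cluster
        for (src, dst), n in sorted(pair_counts.items()):
            edges.append({
                "from": (level, src),
                "to": (level + 1, dst),
                "count": n,
                "in_prop": n / upper_sizes[dst],
            })
    return edges


# Toy labels for six cells at three resolutions (hypothetical data).
clusterings = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 2, 1, 1, 2],
]
edges = tree_edges(clusterings)
```

In the toy data the third-level cluster 2 draws half of its cells from each second-level cluster, so both incoming edges have an in-proportion of 0.5. Low in-proportion edges like these are a sign that a cluster is unstable and the resolution may be too high.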

Organoid data

Analysis 4

Analysis steps
Alignment - CellRanger
Quantification - CellRanger
Quality control - scater
Integration - Seurat
Clustering - Seurat
Gene detection - Seurat
Ordering - Monocle

Stroma, Endothelium, Cell cycle, Podocyte, Epithelium

Podocyte, Early podocyte, Early proximal, Early distal, Progenitor

Progenitor, Early tubule, Podocyte

Human dataset
- 16 week fetal kidney
- 3178 cells
- 10x Chromium
Lindström et al. “Conserved and Divergent Features of Mesenchymal Progenitor Cell Types within the Cortical Nephrogenic Niche of the Human and Mouse Kidney” J Am Soc Nephrol (2018) DOI: 10.1681/ASN.2017080890

Stroma, Endothelium, Cell cycle, Podocyte, Nephron, Progenitor, Immune

Fetal kidney, Organoid

Podocyte (human only), Early podocyte, Proximal, Distal, Progenitor, Stroma

Podocyte, Early proximal, Early distal, Early podocyte, Progenitor, Diff. progenitor, Human pod., Stroma
Fetal kidney, Organoid

install.packages("clustree")
biocLite("splatter")
@_lazappi_

