Slide 1

Slide 1 text

Tools, simulations and trees for scRNA-seq Luke Zappia @_lazappi_

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

Parkville precinct

Slide 4

Slide 4 text

MCRI Bioinformatics
Tools: bpipe, Corset, Lace, Necklace, GOseq, Splatter, clinker, JAFFA, Cpipe, Ximmer, Schism, missMethyl, STRetch, scRNA-tools, clustree
Areas: Structural, Clinical, STRs, Single-cell, Pipelines, Gene sets, Fusions, Assembly, superTranscripts, Methylation, Visualisation

Slide 5

Slide 5 text

MCRI Kidney Development OpenStax College, CC BY 3.0 via Wikimedia Commons

Slide 6

Slide 6 text

Kidney organoids
Figure: differentiation timeline from iPSCs to organoid. Day markers: 0, 4, 7, 10, 18, 25. Stage labels: CHIR, FGF9, FGF9, CHIR, Form pellets, No GF.

Slide 7

Slide 7 text

1 Tools
2 Simulations
3 Clustering trees
4 Analysis

Slide 8

Slide 8 text

Dataset
- 4 organoids
- 10x Chromium
- 2 batches (3 + 1)
- 7937 cells (6649 + 1288)
- Identify cell types

Slide 9

Slide 9 text

1 Tools

Slide 10

Slide 10 text

Svensson et al. DOI: 10.1038/nprot.2017.149

Slide 11

Slide 11 text

No content

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

No content

Slide 14

Slide 14 text

www.scRNA-tools.org
“Exploring the single-cell RNA-seq analysis landscape with the scRNA-tools database” PLoS Computational Biology (2018) DOI: 10.1371/journal.pcbi.1006245

Slide 15

Slide 15 text

No content

Slide 16

Slide 16 text

No content

Slide 17

Slide 17 text

2 Simulations

Slide 18

Slide 18 text

Simulations
Provide a truth to test against
BUT
- Often poorly documented and explained
- Not easily reproducible or reusable
- Don’t demonstrate similarity to real data

Slide 19

Slide 19 text

“Splatter: simulation of single-cell RNA sequencing data” Genome Biology (2017) DOI: 10.1186/s13059-017-1305-0

Slide 20

Slide 20 text

Splat
- Negative binomial
- Expression outliers
- Defined library sizes
- Mean-variance trend
- Dropout
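As a rough illustration of how these ingredients can fit together, here is a base R sketch of a gamma-Poisson (negative binomial) count simulation with defined library sizes and dropout. It is a simplification written for this transcript, not Splatter's implementation: every parameter value and variable name is an assumption, a constant dispersion stands in for the mean-variance trend, and expression outliers are left out.

```r
set.seed(1)
n_genes <- 100
n_cells <- 50

# Gene-level mean expression drawn from a gamma distribution (assumed values)
gene_means <- rgamma(n_genes, shape = 0.6, rate = 0.3)

# Defined library sizes: one log-normal total count per cell (assumed values)
lib_sizes <- rlnorm(n_cells, meanlog = 8, sdlog = 0.2)

# Expected count for each gene (row) in each cell (column)
cell_means <- outer(gene_means / sum(gene_means), lib_sizes)

# Negative binomial as a gamma-Poisson mixture with a constant dispersion
bcv <- 0.2
lambda <- rgamma(length(cell_means), shape = 1 / bcv^2,
                 scale = as.vector(cell_means) * bcv^2)
counts <- matrix(rpois(length(lambda), lambda), nrow = n_genes)

# Dropout: zero out counts with a probability that falls as the mean rises
# (logistic curve with an assumed midpoint at a mean of 1)
p_drop <- plogis(-(log(as.vector(cell_means)) - 0))
keep <- rbinom(length(p_drop), size = 1, prob = 1 - p_drop)
counts_drop <- counts * matrix(keep, nrow = n_genes)
```

Comparing `mean(counts == 0)` with `mean(counts_drop == 0)` shows how much of the sparsity comes from the dropout step rather than from the count model itself.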

Slide 21

Slide 21 text

Simulation models
- Simple - Negative binomial
- Lun - NB with cell factors DOI: 10.1186/s13059-016-0947-7
- Lun2 - Sampled NB with batch effects DOI: 10.1093/biostatistics/kxw055
- scDD - NB with bimodality DOI: 10.1186/s13059-016-1077-y
- BASiCS - NB with spike-ins DOI: 10.1371/journal.pcbi.1004333
- mfa - Bifurcating pseudotime trajectory DOI: 10.12688/wellcomeopenres.11087.1
- PhenoPath - Pseudotime with gene types DOI: 10.1038/s41467-018-04696-6
- ZINB-WaVE - Sophisticated ZINB DOI: 10.1186/s13059-018-1406-4
- SparseDC - Clusters across two conditions DOI: 10.1093/nar/gkx1113

Slide 22

Slide 22 text

Using Splatter
1. Estimate
    params1 <- splatEstimate(real.data)
    params2 <- simpleEstimate(real.data)
2. Simulate
    sim1 <- splatSimulate(params1, ...)
    sim2 <- simpleSimulate(params2, ...)
3. Compare
    datasets <- list(Real = real.data, Splat = sim1, Simple = sim2)
    comp <- compareSCESets(datasets)
    diff <- diffSCESets(datasets, ref = "Real")

Slide 23

Slide 23 text

Figure: Distribution of mean expression. Mean log2(CPM + 1) for Real, Splat, Splat (Drop), Simple, Lun, Lun2, Lun2 (ZINB), scDD, BASiCS, mfa, PhenoPath, SparseDC and ZINB-WaVE.
Figure: Difference in mean expression. Rank difference in mean log2(CPM + 1) for the same simulations.

Slide 24

Slide 24 text

Figure: Rank of MAD from real data.
Simulations: Splat, Splat (Drop), Simple, Lun, Lun2, Lun2 (ZINB), scDD, BASiCS, mfa, PhenoPath, SparseDC, ZINB-WaVE.
Metrics: Mean, Variance, Mean-Variance, Library size, % Zeros (Cell), % Zeros (Gene), Mean-Zeros.
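One way to read "Rank of MAD from real data" is sketched below in base R: sort a per-gene metric in the real and simulated data, take the median absolute difference between rank-matched values, then rank the simulations by that number. This is an assumed interpretation for illustration only. The toy data, the names `SimA` and `SimB`, and the helper `rank_mad` are all hypothetical, and the slides do not spell out the exact computation.

```r
set.seed(7)
n_genes <- 200

# Toy per-gene metric (for example mean log-expression) for a "real" dataset
real <- rgamma(n_genes, shape = 2, rate = 1)

# Two hypothetical simulations: one close to the real data, one shifted
sims <- list(
  SimA = sample(real) + rnorm(n_genes, sd = 0.1),
  SimB = rgamma(n_genes, shape = 2, rate = 1) + 1
)

# Compare rank-matched values: sort both vectors, then take the
# median absolute difference between them
rank_mad <- function(sim, ref) median(abs(sort(sim) - sort(ref)))

mads <- sapply(sims, rank_mad, ref = real)
sim_ranks <- rank(mads)  # rank 1 = closest to the real data
```

Repeating this for each metric and collecting the ranks gives one row per simulation and one column per metric, which is the kind of summary the figure appears to show.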

Slide 25

Slide 25 text

CountSimQC DESeq2 dispersions Feature correlations Soneson et al. DOI: 10.1093/bioinformatics/btx631

Slide 26

Slide 26 text

https://github.com/YosefLab/SymSim https://github.com/bvieth/powsimR

Slide 27

Slide 27 text

Complex simulations
- Groups
- Batches
- Paths
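The three kinds of complex simulation can be sketched with `splatSimulate()` as follows. This assumes a recent Bioconductor release of splatter; the gene and cell numbers are arbitrary small values chosen for speed, not settings from the talk.

```r
library(splatter)

# Groups: two cell groups with differential expression between them
sim_groups <- splatSimulate(nGenes = 200, batchCells = 100,
                            group.prob = c(0.5, 0.5),
                            method = "groups", verbose = FALSE)

# Batches: two batches of 50 cells each
sim_batches <- splatSimulate(nGenes = 200, batchCells = c(50, 50),
                             verbose = FALSE)

# Paths: cells placed along a continuous trajectory
sim_paths <- splatSimulate(nGenes = 200, batchCells = 100,
                           method = "paths", verbose = FALSE)

# The labels are stored as cell-level metadata on the returned objects
table(sim_groups$Group)
table(sim_batches$Batch)
range(sim_paths$Step)
```

Because the true group, batch and step of every cell is recorded, these outputs can serve as a known answer when testing clustering, batch correction or ordering methods.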

Slide 28

Slide 28 text

3 Clustering trees

Slide 29

Slide 29 text

Clustering methods > 25% of all tools

Slide 30

Slide 30 text

How many clusters?

Slide 31

Slide 31 text

A tree of clusters?

Slide 32

Slide 32 text

No content

Slide 33

Slide 33 text

No content

Slide 34

Slide 34 text

“Clustering trees: a visualisation for evaluating clusterings at multiple resolutions” GigaScience (2018) DOI: 10.1093/gigascience/giy083
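The idea behind a clustering tree can be sketched in base R: cluster the same cells at several resolutions, then link each cluster to the clusters at the next resolution that share its cells. The example below uses made-up two-dimensional data and k-means purely for illustration; the variable names and the `in_prop` column are assumptions of this sketch, not output from the clustree package.

```r
set.seed(42)

# Toy data: three well-separated groups of 30 points (made up for illustration)
x <- rbind(
  matrix(rnorm(60, mean = 0), ncol = 2),
  matrix(rnorm(60, mean = 5), ncol = 2),
  matrix(rnorm(60, mean = 10), ncol = 2)
)

# Cluster the same points at increasing resolution, k = 1 to 4
labels <- data.frame(K1 = rep(1L, nrow(x)))
for (k in 2:4) {
  labels[[paste0("K", k)]] <- kmeans(x, centers = k, nstart = 10)$cluster
}

# Tree edges: overlap between clusters at adjacent resolutions
edges <- do.call(rbind, lapply(1:3, function(i) {
  tab <- as.data.frame(table(from = labels[[i]], to = labels[[i + 1]]),
                       stringsAsFactors = FALSE)
  tab <- tab[tab$Freq > 0, ]
  to_size <- table(labels[[i + 1]])
  data.frame(
    from_node = paste0("K", i, "C", tab$from),
    to_node = paste0("K", i + 1, "C", tab$to),
    n_cells = tab$Freq,
    # Share of the higher-resolution cluster that comes from this parent
    in_prop = tab$Freq / as.numeric(to_size[tab$to])
  )
}))
edges
```

Clusters whose cells come from several parents (low `in_prop` on each incoming edge) hint at over-clustering. With the clustree package installed, a data frame like `labels` can be drawn directly with `clustree(labels, prefix = "K")`.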

Slide 35

Slide 35 text

Organoid data

Slide 36

Slide 36 text

No content

Slide 37

Slide 37 text

NPHS1

Slide 38

Slide 38 text

4 Analysis

Slide 39

Slide 39 text

GATA3 ECAD LTL WT1 CD + DT + PT + Glo

Slide 40

Slide 40 text

Analysis steps
Step             Tool
Alignment        CellRanger
Quantification   CellRanger
Quality control  scater
Integration      Seurat
Clustering       Seurat
Gene detection   Seurat
Ordering         Monocle

Slide 41

Slide 41 text

Stroma Endothelium Cell cycle Podocyte Epithelium

Slide 42

Slide 42 text

Podocyte Early podocyte Early proximal Early distal Progenitor

Slide 43

Slide 43 text

Progenitor Early tubule Podocyte

Slide 44

Slide 44 text

Human dataset
- 16 week fetal kidney
- 3178 cells
- 10x Chromium
Lindström et al. “Conserved and Divergent Features of Mesenchymal Progenitor Cell Types within the Cortical Nephrogenic Niche of the Human and Mouse Kidney” J Am Soc Nephrol (2018) DOI: 10.1681/ASN.2017080890

Slide 45

Slide 45 text

Stroma Endothelium Cell cycle Podocyte Nephron Progenitor Immune

Slide 46

Slide 46 text

Fetal kidney Organoid

Slide 47

Slide 47 text

Podocyte (human only) Early podocyte Proximal Distal Progenitor Stroma

Slide 48

Slide 48 text

Podocyte Early proximal Early distal Early podocyte Progenitor Diff. progenitor Human pod. Stroma Fetal kidney Organoid

Slide 49

Slide 49 text

install.packages("clustree") Paper doi.org/10.1093/gigascience/giy083
biocLite("splatter") Paper doi.org/10.1186/s13059-017-1305-0
www.scRNA-tools.org Paper doi.org/10.1371/journal.pcbi.1006245
@_lazappi_ oshlacklab.com github.com/lazappi
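The `biocLite()` installer shown on this slide belongs to older Bioconductor releases. On current R installations the equivalent route is the BiocManager package, so a hedged modern version of the two install commands might look like this (it needs network access and assumes a CRAN mirror is configured):

```r
# clustree is on CRAN
install.packages("clustree")

# splatter is on Bioconductor; BiocManager replaces the older biocLite()
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install("splatter")
```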

Slide 50

Slide 50 text

install.packages("clustree") Paper doi.org/10.1093/gigascience/giy083 la