Single cells, simulation and kidneys in a dish

Single cells, simulation and kidneys in a dish
Luke Zappia, MCRI Bioinformatics
@_lazappi_

Parkville Precinct MCRI

MCRI Bioinformatics
Tools: bpipe, Corset, Lace, Necklace, GOseq, Splatter, clinker, JAFFA, Cpipe, Ximmer, Schism, missMethyl, STRetch, scRNA-tools
Areas: Structural, Clinical, STRs, Single-cell, Pipelines, Gene sets, Fusions, Assembly, superTranscripts, Methylation

Bulk RNA-seq
Reads: ACTGACTCCA TCAGTACTGA CGTGTCATAG GATTGACCTA

Gene   Sample 1
A      43
B      3
C      17
D      24

Single-cell RNA-seq
Reads (one set per cell): ACTGACTCCA TCAGTACTGA CGTGTCATAG GATTGACCTA

Gene   Cell 1   Cell 2   Cell 3   Cell 4
A      12       10       9        0
B      0        0        0        1
C      9        6        0        0
D      7        0        4        0

Moore’s Law
Svensson et al. arXiv:1704.01379 (2017)

Unique Molecular Identifiers (UMIs)
5’ AAAA 3’
(PCR){BC}[UMI]TTTT
Aligned reads -> De-duplication and counting (figure counts: 5, 4)
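
The de-duplication and counting step can be sketched as follows. This is a minimal illustration, not the method of any particular pipeline: it assumes each aligned read has already been reduced to a (cell barcode, UMI, gene) tuple and treats reads that match on all three as PCR duplicates. The function name `count_umis` and the example reads are hypothetical, and no UMI error correction is attempted.

```python
from collections import defaultdict


def count_umis(aligned_reads):
    """Count distinct UMIs per (cell barcode, gene) pair.

    aligned_reads: iterable of (cell_barcode, umi, gene) tuples, one per read.
    Reads sharing all three values are assumed to be PCR duplicates.
    """
    seen = defaultdict(set)
    for barcode, umi, gene in aligned_reads:
        seen[(barcode, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}


# Hypothetical reads: five reads for gene A in CELL1, one of them a PCR duplicate
reads = [
    ("CELL1", "AAGT", "A"),
    ("CELL1", "AAGT", "A"),  # same barcode, UMI and gene: counted once
    ("CELL1", "CCTA", "A"),
    ("CELL1", "GGAC", "A"),
    ("CELL1", "TTCG", "A"),
    ("CELL2", "AAGT", "A"),  # same UMI but a different cell: counted separately
]
counts = count_umis(reads)
print(counts[("CELL1", "A")])  # 4 molecules from 5 reads
```

The resulting dictionary holds the entries of the gene-by-cell count matrix.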

Gene   Cell 1   Cell 2   Cell 3   Cell 4
A      12       10       9        0
B      0        0        0        1
C      9        6        0        0
D      7        0        4        0

Gene   Cell 1   Cell 2   Cell 3   Cell 4
A      12       10       9        0
B      0        0        0        1
C      9        6        0        0
D      7        0        4        0

Why is a count 0?
- Bad cell?
- Low expression?
- Cell type specific?
- Cell cycle?
- Dropout?

www. .org

Simulation Biology Evaluation

Simulation

Simulations
Provide a truth to test against
BUT
- Often poorly documented and explained
- Not easily reproducible or reusable
- Don’t demonstrate similarity to real data

“Splatter: simulation of single-cell RNA sequencing data.” Genome Biology (2017)
DOI: 10.1186/s13059-017-1305-0

Splatter
- Bioconductor package
- Collection of simulation methods
- Consistent, easy-to-use interface
- Functions for comparison

Negative binomial

Splat
- Negative binomial
- Expression outliers
- Defined library sizes
- Mean-variance trend
- Dropout
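
To make two of these ingredients concrete, here is a rough sketch of negative binomial counts with dropout. It is not the Splat implementation: it uses the standard fact that a negative binomial can be drawn as a Poisson whose mean is gamma distributed, and it assumes a single constant dropout probability. Outliers, library sizes and the mean-variance trend are left out. All names and default values (`simulate_counts`, `dispersion`, `dropout_prob`) are illustrative assumptions.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson variate with Knuth's method (adequate for small means)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_counts(gene_means, n_cells, dispersion=0.5, dropout_prob=0.2, seed=1):
    """Return a genes x cells list of simulated counts.

    dispersion and dropout_prob are assumed constants, not estimated values.
    """
    rng = random.Random(seed)
    shape = 1.0 / dispersion
    counts = []
    for mean in gene_means:
        row = []
        for _ in range(n_cells):
            # Gamma-distributed cell mean followed by Poisson sampling
            # gives a negative binomial count with the requested mean.
            cell_mean = rng.gammavariate(shape, mean / shape)
            count = poisson(rng, cell_mean)
            # Dropout: zero the count with a fixed probability (assumption).
            if rng.random() < dropout_prob:
                count = 0
            row.append(count)
        counts.append(row)
    return counts


sim = simulate_counts([5.0, 1.0, 0.2], n_cells=50)
print(len(sim), len(sim[0]))
```

Fixing `seed` makes a simulation reproducible, which is one of the properties the talk asks of simulations.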

Simulations
- Simple - Negative binomial
- Lun - NB with cell factors. Lun ATL, Bach K, Marioni JC. Genome Biology (2016). DOI: 10.1186/s13059-016-0947-7.
- Lun 2 - Sampled NB with batch effects. Lun ATL, Marioni JC. Biostatistics (2017). DOI: 10.1093/biostatistics/kxw055.
- scDD - NB with bimodality. Korthauer KD, et al. Genome Biology (2016). DOI: 10.1186/s13059-016-1077-y.
- BASiCS - NB with spike-ins. Vallejos CA, Marioni JC, Richardson S. PLoS Comp. Bio. (2015). DOI: 10.1371/journal.pcbi.1004333.

Using Splatter

    library(splatter)

    # 1. Estimate parameters from a real dataset
    params1 <- splatEstimate(real.data)
    params2 <- simpleEstimate(real.data)

    # 2. Simulate from the estimated parameters
    sim1 <- splatSimulate(params1, ...)
    sim2 <- simpleSimulate(params2, ...)

    # 3. Compare the simulations with the real data
    datasets <- list(Real = real.data, Splat = sim1, Simple = sim2)
    comp <- compareSCESets(datasets)
    diff <- diffSCESets(datasets, ref = "Real")

Real data
Tung et al. iPSCs, C1 capture
- 3 HapMap individuals (A, B, C)
- 3 plates each (A1-A3, B1-B3, C1-C3)
- 200 random cells
Tung P-Y et al. Sci. Rep. (2017). DOI: 10.1038/srep39921

Means Difference in Means

Zeros per cell Difference in zeros

Mean-zeros Difference
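
The quantities named on these comparison slides (gene means, zeros per cell, and the mean-zeros relationship) can be computed from a count matrix roughly as below. This is only a sketch using the small example matrix from earlier in the deck; the function names are hypothetical and it does not reproduce how Splatter's comparison functions calculate or plot their differences.

```python
def gene_means(counts):
    """Mean count of each gene (row) across cells."""
    return [sum(row) / len(row) for row in counts]


def zeros_per_gene(counts):
    """Proportion of cells in which each gene has a zero count."""
    return [sum(1 for c in row if c == 0) / len(row) for row in counts]


def zeros_per_cell(counts):
    """Proportion of genes with a zero count in each cell (column)."""
    n_genes = len(counts)
    n_cells = len(counts[0])
    return [
        sum(1 for g in range(n_genes) if counts[g][c] == 0) / n_genes
        for c in range(n_cells)
    ]


# Genes A-D (rows) by Cells 1-4 (columns), as in the example table
example = [
    [12, 10, 9, 0],
    [0, 0, 0, 1],
    [9, 6, 0, 0],
    [7, 0, 4, 0],
]

means = gene_means(example)
cell_zeros = zeros_per_cell(example)
# Pairing each gene's mean with its proportion of zeros gives the
# points of a mean-zeros relationship.
mean_zeros = list(zip(means, zeros_per_gene(example)))
print(means)       # [7.75, 0.25, 3.75, 2.75]
print(cell_zeros)  # [0.25, 0.5, 0.5, 0.75]
```

Computing the same summaries for a real dataset and for each simulation is what allows them to be set side by side.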

Rank (1 to 8)

Rank (1 to 8), Full-length

Complex simulations
- Groups
- Batches
- Paths

Example evaluation
Parameters
- Estimated from Tung data
Simulation
- 400 cells
- 3 groups (60%, 25%, 15%)
- 10% DE (~1700 genes)
- 20 replicates
Method
- SC3
- k-means consensus clustering
- Differential expression
- Marker genes

Clustering
Gene identification

New in Splatter 1.2.0
Bioconductor 3.6
- SingleCellExperiment
- Batch effects
- Simulations: BASiCS, mfa, PhenoPath, ZINB-WaVE

Simulation summary
Simulations are a great tool
But they should be:
- Reusable
- Reproducible
- Realistic
Splatter is our solution
Genome Biology. DOI: 10.1186/s13059-017-1305-0

Biology

The kidney OpenStax College, CC BY 3.0 via Wikimedia Commons

Organoids
Timeline from iPSCs to organoid (Day 0, 4, 7, 10, 18, 25)
Stage labels: CHIR, FGF9, Form pellets, FGF9, CHIR, No GF
Takasato M et al. Nature (2015). DOI: 10.1038/nature15695

GATA3 ECAD LTL WT1
CD + DT + PT + Glo

Fluidigm experiment
- 4 organoids
- C1 capture
- Full-length
- No spike-ins

Analysis
- Alignment: STAR
- Quantification: featureCounts
- Quality control: scater
- Clustering: SC3
- Gene detection: SC3
- Interpretation: Biologists

Quality control
Cells (278 -> 155)
- Alignment
- Quantification
- Expression
Genes (23388)
- Expression
- Class

Clustering

10x experiment
- 3 organoids
- Chromium capture
- UMI
- ~7000 cells

Analysis
- Alignment: CellRanger
- Quantification: CellRanger
- Quality control: scater
- Clustering: Seurat
- Gene detection: Seurat
- Interpretation: Biologists

Three clusters
- Vasculature
- Epithelium
- “Stroma”

Many clusters

Vasculature Proximal tubule Podocytes

Mesangium Renal stroma

Nephron? Neuronal?

Nodes
- Resolution (k)
- Cluster
- Size

Edges
- Cluster from (lower resolution)
- Cluster to (higher resolution)
- Number
- Proportion

Proportions
k = 1: one cluster of 100 cells
k = 2: two clusters of 60 and 40 cells
Edges: n = 60 and n = 40
p_from = n / size_low
p_to = n / size_high
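
The node and edge definitions can be turned into a short calculation. The sketch below assumes each cell has a cluster label at a lower and a higher resolution. It counts the cells shared by every pair of clusters (n) and divides by the size of the lower-resolution cluster (p_from) or the higher-resolution cluster (p_to). The function name and cluster labels are hypothetical; this illustrates the idea, not the code behind the figures.

```python
from collections import Counter


def clustering_tree_edges(low, high):
    """Build clustering tree edges between two resolutions.

    low, high: per-cell cluster labels at the lower and higher resolution,
    given in the same cell order.
    """
    size_low = Counter(low)
    size_high = Counter(high)
    shared = Counter(zip(low, high))  # n: cells in both clusters
    edges = []
    for (frm, to), n in sorted(shared.items()):
        edges.append({
            "from": frm,
            "to": to,
            "n": n,
            "p_from": n / size_low[frm],
            "p_to": n / size_high[to],
        })
    return edges


# The slide's example: 100 cells in one cluster at k = 1, split 60/40 at k = 2
low = ["k1_c1"] * 100
high = ["k2_c1"] * 60 + ["k2_c2"] * 40

edges = clustering_tree_edges(low, high)
for edge in edges:
    print(edge)
# {'from': 'k1_c1', 'to': 'k2_c1', 'n': 60, 'p_from': 0.6, 'p_to': 1.0}
# {'from': 'k1_c1', 'to': 'k2_c2', 'n': 40, 'p_from': 0.4, 'p_to': 1.0}
```

An edge with a low p_to means the new cluster draws its cells from several parents, which can be a sign of instability at that resolution.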

Clustering tree
Resolution: 0.01, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0
Branches highlighted on successive slides:
- Vasculature
- Proximal Tubule, Podocytes
- Mesangium
- Renal stroma
- Nephron/neuronal?
- ?

Summary
- Kidney organoids are complex
- More cells help
- Cluster relationships can be useful
- Background knowledge is vital

Acknowledgements
Everyone who makes tools and data available
Supervisors: Alicia Oshlack, Melissa Little
MCRI Bioinformatics: Belinda Phipson, Breon Schmidt
MCRI KDDR: Alex Combes

@_lazappi_
oshlacklab.com
www.scRNA-tools.org (@scRNAtools)
bioconductor.org/packages/splatter
tinyurl.com/clust-tree-funcs
“Splatter: simulation of single-cell RNA sequencing data.” Genome Biology (2017). DOI: 10.1186/s13059-017-1305-0
“Exploring the single-cell RNA-seq analysis landscape with the scRNA-tools database” bioRxiv (2017). DOI: 10.1101/206573
