Slide 1

Slide 1 text

TOOLS AND TECHNIQUES FOR SINGLE-CELL RNA SEQUENCING DATA
LUKE ZAPPIA

Slide 2

Slide 2 text

1 Introduction
2 Tools
3 Simulations
4 Clustering trees
5 Analysis
6 Conclusion

Slide 3

Slide 3 text

1 Introduction
Image: Matthew Daniels via The Cell Image Library, http://www.cellimagelibrary.org/images/38912

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

RNA sequencing
Reads: ACTGACTCCA TCAGTACTGA CGTGTCATAG GATTGACCTA
Gene  Sample 1
A     43
B     3
C     17
D     24

Slide 6

Slide 6 text

Figure: RNA-seq analysis workflow.
Data: Raw reads, Aligned reads, Expression matrix
References: Reference genome, Gene annotation, Reference transcriptome, Gene sets
Steps: Alignment, Counting, Alignment-free quantification, Quality control, Normalisation, Differential expression testing, Gene set testing, Visualisation, Interpretation

Slide 7

Slide 7 text

Svensson et al. DOI: 10.1038/nprot.2017.149

Slide 8

Slide 8 text

Droplet cell capture
Source: mccarrolllab.com/dropseq/

Slide 9

Slide 9 text

scRNA-seq
Reads (one set per cell): ACTGACTCCA TCAGTACTGA CGTGTCATAG GATTGACCTA
Gene  Cell 1  Cell 2  Cell 3  Cell 4
A     12      10      9       0
B     0       0       0       1
C     9       6       0       0
D     7       0       4       0

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

No content

Slide 12

Slide 12 text

Kidney development
Images from OpenStax College, CC BY 3.0 via Wikimedia Commons

Slide 13

Slide 13 text

Kidney organoids
Timeline (day): 0, 4, 7, 10, 18, 25
Diagram labels: CHIR, FGF9, FGF9, CHIR, Form pellets, No GF
Start: iPSCs; end: organoid

Slide 14

Slide 14 text

Aims
1. Understand the computational tools used to analyse scRNA-seq data
2. Contribute to tool development
3. Apply tools to a kidney organoid dataset

Slide 15

Slide 15 text

2 Tools
Image: Andres J Garcia and Ankur Singh via The Cell Image Library, http://www.cellimagelibrary.org/images/44701

Slide 16

Slide 16 text

No content

Slide 17

Slide 17 text

www.scRNA-tools.org

Slide 18

Slide 18 text

No content

Slide 19

Slide 19 text

Number of tools Publication status

Slide 20

Slide 20 text

Software licenses Platforms

Slide 21

Slide 21 text

Analysis categories

Slide 22

Slide 22 text

Users over time By country Top 10 By continent

Slide 23

Slide 23 text

“Exploring the single-cell RNA-seq analysis landscape with the scRNA-tools database” PLoS Computational Biology (2018) DOI: 10.1371/journal.pcbi.1006245

Slide 24

Slide 24 text

3 Simulations
Image: S. Schuller via The Cell Image Library, http://www.cellimagelibrary.org/images/38903

Slide 25

Slide 25 text

Simulations
Provide a truth to test against
But
- Often poorly documented and explained
- Not easily reproducible or reusable
- Don’t demonstrate similarity to real data

Slide 26

Slide 26 text

Bioconductor package
Consistent, easy-to-use interface
Multiple simulation models

Slide 27

Slide 27 text

Splat model
- Negative binomial
- Expression outliers
- Defined library sizes
- Mean-variance trend
- Dropout
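The components named on this slide can be illustrated with a small base R sketch. This is a simplified approximation written for illustration, not the Splatter implementation: every distribution choice, parameter value and variable name below is an assumption.

```r
# Simplified sketch of a Splat-like count simulation.
# NOT the Splatter implementation: all parameter values are illustrative assumptions.
set.seed(1)

n_genes <- 200
n_cells <- 50

# Gene means drawn from a gamma distribution
gene_means <- rgamma(n_genes, shape = 0.6, rate = 0.3)

# Expression outliers: inflate the mean of a small fraction of genes
is_outlier <- rbinom(n_genes, size = 1, prob = 0.05) == 1
gene_means[is_outlier] <- gene_means[is_outlier] *
  rlnorm(sum(is_outlier), meanlog = 2, sdlog = 0.5)

# Defined library sizes: expected total counts per cell
lib_sizes <- rlnorm(n_cells, meanlog = log(5000), sdlog = 0.2)

# Cell-specific means: gene proportions scaled by each cell's library size
cell_means <- outer(gene_means / sum(gene_means), lib_sizes)

# Mean-variance trend: variability shrinks as the mean grows
bcv <- 0.1 + 1 / sqrt(cell_means)

# Negative binomial counts (mean = cell_means, dispersion = bcv^2)
counts <- matrix(
  rnbinom(length(cell_means), mu = cell_means, size = 1 / bcv^2),
  nrow = n_genes, ncol = n_cells
)

# Dropout: lowly expressed entries are more likely to be set to zero
p_dropout <- plogis(-1 * log(cell_means))
keep <- matrix(
  rbinom(length(cell_means), size = 1, prob = 1 - p_dropout),
  nrow = n_genes, ncol = n_cells
)
counts <- counts * keep
```

The result is a genes-by-cells count matrix. For real use, the Splatter package estimates its parameters from a dataset rather than relying on hand-picked values like these.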

Slide 28

Slide 28 text

Other models
Simple - Negative binomial
Lun - NB with cell factors, DOI: 10.1186/s13059-016-0947-7
Lun2 - Sampled NB with batch effects, DOI: 10.1093/biostatistics/kxw055
scDD - NB with bimodality, DOI: 10.1186/s13059-016-1077-y
BASiCS - NB with spike-ins, DOI: 10.1371/journal.pcbi.1004333
mfa - Bifurcating pseudotime trajectory, DOI: 10.12688/wellcomeopenres.11087.1
PhenoPath - Pseudotime with gene types, DOI: 10.1038/s41467-018-04696-6
ZINB-WaVE - Sophisticated ZINB, DOI: 10.1186/s13059-018-1406-4
SparseDC - Clusters across two conditions, DOI: 10.1093/nar/gkx1113

Slide 29

Slide 29 text

Real data -> Estimation -> Parameters -> Simulation -> Dataset

1. Estimate
params1 <- splatEstimate(real.data)
params2 <- simpleEstimate(real.data)

2. Simulate
sim1 <- splatSimulate(params1, ...)
sim2 <- simpleSimulate(params2, ...)

3. Compare
datasets <- list(Real = real.data, Splat = sim1, Simple = sim2)
comp <- compareSCESets(datasets)
diff <- diffSCESets(datasets, ref = "Real")

Slide 30

Slide 30 text

Figure: Distribution of mean expression. Axis: Mean log2(CPM + 1). Datasets: Real, Splat, Splat (Drop), Simple, Lun, Lun2, Lun2 (ZINB), scDD, BASiCS, mfa, PhenoPath, SparseDC, ZINB-WaVE
Figure: Difference in mean expression. Axis: Rank difference mean log2(CPM + 1). Datasets: Splat, Splat (Drop), Simple, Lun, Lun2, Lun2 (ZINB), scDD, BASiCS, mfa, PhenoPath, SparseDC, ZINB-WaVE
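The mean log2(CPM + 1) measure used on these axes can be computed by hand. The base R sketch below shows one way to do it, using the toy counts from the scRNA-seq example matrix earlier in the talk; the variable names are illustrative, and the packages used in the talk may compute this slightly differently.

```r
# Toy counts: genes in rows, cells in columns (filled column by column)
counts <- matrix(
  c(12, 0, 9, 7,   # Cell 1
    10, 0, 6, 0,   # Cell 2
     9, 0, 0, 4,   # Cell 3
     0, 1, 0, 0),  # Cell 4
  nrow = 4,
  dimnames = list(c("A", "B", "C", "D"), paste("Cell", 1:4))
)

# Library size: total counts per cell
lib_sizes <- colSums(counts)

# Counts per million: divide each column by its library size, scale to 1e6
cpm <- sweep(counts, 2, lib_sizes, "/") * 1e6

# Log-transform with a pseudocount of 1, then average across cells per gene
log_cpm <- log2(cpm + 1)
mean_log_cpm <- rowMeans(log_cpm)
```

Each column of `cpm` sums to one million, so cells with very different sequencing depths are put on a comparable scale before averaging.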

Slide 31

Slide 31 text

Figure: Rank of MAD from real data.
Properties: Mean, Variance, Mean-Variance, Library size, % Zeros (Cell), % Zeros (Gene), Mean-Zeros
Simulations: Splat, Splat (Drop), Simple, Lun, Lun2, Lun2 (ZINB), scDD, BASiCS, mfa, PhenoPath, SparseDC, ZINB-WaVE

Slide 32

Slide 32 text

Complex simulations Groups Batches Paths

Slide 33

Slide 33 text

“Splatter: simulation of single-cell RNA sequencing data” Genome Biology (2017) DOI: 10.1186/s13059-017-1305-0

Slide 34

Slide 34 text

4 Clustering trees
Image: M Uhlen et al. via The Cell Image Library, http://www.cellimagelibrary.org/images/40483

Slide 35

Slide 35 text

How many clusters? Low resolution (fewer clusters) High resolution (more clusters)

Slide 36

Slide 36 text

A tree of clusters?

Slide 37

Slide 37 text

No content

Slide 38

Slide 38 text

No content

Slide 39

Slide 39 text

Weighting edges
$\text{In proportion} = \frac{\text{Number of cells on edge}}{\text{Number of cells in higher res cluster}}$
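The in-proportion weight can be worked through on a small example. The base R sketch below uses hypothetical cluster labels for ten cells at two resolutions; it illustrates the ratio on this slide and is not the clustree package's code.

```r
# Hypothetical cluster assignments for 10 cells at two resolutions
low_res  <- c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B")
high_res <- c("1", "1", "1", "2", "2", "3", "3", "3", "3", "3")

# Edge sizes: number of cells shared by each (lower, higher) cluster pair
edge_counts <- table(lower = low_res, higher = high_res)

# In proportion: cells on the edge divided by cells in the higher res cluster
in_prop <- prop.table(edge_counts, margin = 2)

in_prop["B", "3"]  # 4 of the 5 cells in cluster 3 come from cluster B: 0.8
```

Clusters 1 and 2 draw all of their cells from cluster A, so those edges have an in-proportion of 1. Cluster 3 is split between two parents, which is the kind of instability a clustering tree is meant to make visible.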

Slide 40

Slide 40 text

No content

Slide 41

Slide 41 text

No content

Slide 42

Slide 42 text

Gene expression

Slide 43

Slide 43 text

Cell cycle SC3 stability Number of genes

Slide 44

Slide 44 text

No content

Slide 45

Slide 45 text

Figure: t-SNE plots (axes: t-SNE 1, t-SNE 2)

Slide 46

Slide 46 text

“Clustering trees: a visualisation for evaluating clusterings at multiple resolutions” GigaScience (2018) DOI: 10.1093/gigascience/giy083

Slide 47

Slide 47 text

5 Analysis
Image: Natalie Prigozhina via The Cell Image Library, http://www.cellimagelibrary.org/images/48101

Slide 48

Slide 48 text

GATA3 ECAD LTL WT1 CD + DT + PT + Glo

Slide 49

Slide 49 text

Dataset
- 4 Organoids
- 10x Chromium
- 2 Batches (3 + 1)
- 7937 cells (6649 + 1288)
- Identify cell types

Analysis steps
- Alignment: CellRanger
- Quantification: CellRanger
- Quality control: scater
- Integration: Seurat
- Clustering: Seurat
- Gene detection: Seurat

Slide 50

Slide 50 text

Stroma Endothelium Cell cycle Podocyte Epithelium

Slide 51

Slide 51 text

Glial Neural progenitor Muscle progenitor

Slide 52

Slide 52 text

Human dataset
- 16 week fetal kidney
- 3178 cells
- 10x Chromium
Lindström et al. “Conserved and Divergent Features of Mesenchymal Progenitor Cell Types within the Cortical Nephrogenic Niche of the Human and Mouse Kidney” J Am Soc Nephrol (2018) DOI: 10.1681/ASN.2017080890

Slide 53

Slide 53 text

Figure: Number of cells per cluster from the Organoid and Fetal kidney datasets.
Cluster labels: Stroma, Endothelium, Cell cycle, Nephron progenitor, Podocyte, Nephron, Glial, Immune, Blood, Neural progenitor

Slide 54

Slide 54 text

“Single-cell analysis reveals congruence between kidney organoids and human fetal kidney” Genome Medicine (2019) DOI: 10.1186/s13073-019-0615-0

Slide 55

Slide 55 text

What if we did things differently?

Slide 56

Slide 56 text

Droplet selection ~ 1 million droplets

Slide 57

Slide 57 text

No content

Slide 58

Slide 58 text

Quality control Manual thresholds PCA based

Slide 59

Slide 59 text

Gene selection Seurat

Slide 60

Slide 60 text

Gene selection M3Drop

Slide 61

Slide 61 text

Gene selection Overlap Seurat only M3Drop only Both

Slide 62

Slide 62 text

Comparison

Slide 63

Slide 63 text

Marker genes

Slide 64

Slide 64 text

Partition-based graph abstraction Cell graph PAGA cluster graph

Slide 65

Slide 65 text

Cell velocity AAAAAAA Unspliced RNA Mature mRNA

Slide 66

Slide 66 text

Podocyte Epithelial Endothelial Stroma Neural Muscle Immune?

Slide 67

Slide 67 text

Summary
- New droplet selection methods can give many more cells
- Seurat clustering is robust to gene selection
- Possible immune-like population
- Alternative methods can help interpretation

Slide 68

Slide 68 text

6 Conclusion
Image: Natalie Prigozhina via The Cell Image Library, http://www.cellimagelibrary.org/images/48108

Slide 69

Slide 69 text

What did I do?
- Build a database of scRNA-seq analysis tools and a website to interact with it
- Develop a software package for simulating scRNA-seq data and a flexible simulation model
- Design an algorithm for visualising clustering at multiple resolutions and a software package that implements it
- Perform an analysis of kidney organoid data to profile the cell types present and demonstrate the effect of different tools and decisions

Slide 70

Slide 70 text

What next?
- Bigger datasets, more computation
- Convergence on methods that work
- Continued development of software
- Reference datasets
- Integration of data types
- Spatial transcriptomics

Slide 71

Slide 71 text

Acknowledgments
Supervisors: Alicia Oshlack, Melissa Little
Committee: Andrew Pask, Christine Wells, Edmund Crampin
MCRI Bioinformatics: Belinda Phipson, Breon Schmidt
MCRI KDDR: Alex Combes
COMBINE
Friends and family
Everyone that makes their tools and data available
Developers: dplyr, ggplot2, scater, scran, Seurat, workflowr, rmarkdown, knitr, tidygraph, ggraph, edgeR...

Slide 72

Slide 72 text

clustree
install.packages("clustree")
Paper: doi.org/10.1093/gigascience/giy083

splatter
biocLite("splatter")
Paper: doi.org/10.1186/s13059-017-1305-0

scRNA-tools
www.scRNA-tools.org
Paper: doi.org/10.1371/journal.pcbi.1006245

Organoid analysis
oshlacklab.com/combes-organoid-paper
Paper: doi.org/10.1186/s13073-019-0615-0

@_lazappi_
oshlacklab.com
github.com/lazappi