Tools and techniques for single-cell RNA sequencing data

Luke Zappia
March 22, 2019

RNA sequencing of individual cells allows us to take a snapshot of the dynamic processes within a cell and explore the differences between cell types. As this technology has developed over the last few years, it has been rapidly adopted by researchers in areas such as developmental biology. The development of protocols for producing these data has been accompanied by a burst in the development of computational methods for analysing them. My thesis explores the computational tools and techniques for analysing single-cell RNA-sequencing data. I will present the scRNA-tools database, which charts the release of analysis software (https://scrna-tools.org); Splatter, a software package for easily simulating single-cell datasets from multiple models (http://bioconductor.org/packages/splatter/); and clustering trees, a visualisation approach for inspecting clustering results at multiple resolutions (https://CRAN.R-project.org/package=clustree). In the final part of my thesis, I use an analysis of a kidney organoid dataset to demonstrate and compare some of the current analysis methods.

Transcript

  1. Analysis workflow (diagram labels)

    - Inputs: raw reads, reference genome, gene annotation, reference transcriptome, gene sets
    - Steps: alignment, counting, alignment-free quantification, quality control, normalisation, differential expression testing, gene set testing, visualisation, interpretation
    - Intermediate data: aligned reads, expression matrix
  2. scRNA-seq

    Reads (repeated on the slide): ACTGACTCCA TCAGTACTGA CGTGTCATAG GATTGACCTA

    Gene  Cell 1  Cell 2  Cell 3  Cell 4
    A         12      10       9       0
    B          0       0       0       1
    C          9       6       0       0
    D          7       0       4       0
  3. Kidney organoids

    Differentiation timeline from iPSCs to organoid (days 0, 4, 7, 10, 18 and 25). Stage labels: CHIR, FGF9, FGF9, CHIR, form pellets, no GF.
  4. Aims

    1. Understand the computational tools used to analyse scRNA-seq data
    2. Contribute to tool development
    3. Apply tools to a kidney organoid dataset
  5. Tools

    Image: Andres J Garcia and Ankur Singh via The Cell Image Library, http://www.cellimagelibrary.org/images/44701
  6. “Exploring the single-cell RNA-seq analysis landscape with the scRNA-tools database”, PLoS Computational Biology (2018), DOI: 10.1371/journal.pcbi.1006245
  7. Simulations

    Provide a truth to test against, but:
    - Often poorly documented and explained
    - Not easily reproducible or reusable
    - Don’t demonstrate similarity to real data
  8. Other models

    - Simple - Negative binomial
    - Lun - NB with cell factors, DOI: 10.1186/s13059-016-0947-7
    - Lun2 - Sampled NB with batch effects, DOI: 10.1093/biostatistics/kxw055
    - scDD - NB with bimodality, DOI: 10.1186/s13059-016-1077-y
    - BASiCS - NB with spike-ins, DOI: 10.1371/journal.pcbi.1004333
    - mfa - Bifurcating pseudotime trajectory, DOI: 10.12688/wellcomeopenres.11087.1
    - PhenoPath - Pseudotime with gene types, DOI: 10.1038/s41467-018-04696-6
    - ZINB-WaVE - Sophisticated ZINB, DOI: 10.1186/s13059-018-1406-4
    - SparseDC - Clusters across two conditions, DOI: 10.1093/nar/gkx1113
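To make the "Simple - Negative binomial" entry concrete, here is a minimal base R sketch of that kind of simulation: gene means are drawn from a gamma distribution and counts from a negative binomial with one shared dispersion. This is an illustration only, not Splatter's own code; the variable names and every parameter value (`n_genes`, `n_cells`, the gamma shape and rate, `dispersion`) are assumptions chosen for the example.

```r
set.seed(1)

# Hypothetical sizes and parameter values, chosen only for illustration.
n_genes <- 100
n_cells <- 20
gene_means <- rgamma(n_genes, shape = 0.6, rate = 0.3)  # one mean per gene
dispersion <- 0.1  # NB variance = mu + dispersion * mu^2

# matrix() fills column by column, so the gene means are repeated once per cell.
counts <- matrix(
  rnbinom(n_genes * n_cells,
          mu = rep(gene_means, times = n_cells),
          size = 1 / dispersion),
  nrow = n_genes, ncol = n_cells,
  dimnames = list(paste0("Gene", seq_len(n_genes)),
                  paste0("Cell", seq_len(n_cells)))
)
```

The result is a genes-by-cells count matrix with the same shape as the toy table on the scRNA-seq slide. The other models in the list add structure on top of this basic idea, such as cell factors, batch effects or zero inflation.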
  9. Real data → (Estimation) → Parameters → (Simulation) → Dataset

    1. Estimate
        params1 <- splatEstimate(real.data)
        params2 <- simpleEstimate(real.data)
    2. Simulate
        sim1 <- splatSimulate(params1, ...)
        sim2 <- simpleSimulate(params2, ...)
    3. Compare
        datasets <- list(Real = real.data, Splat = sim1, Simple = sim2)
        comp <- compareSCESets(datasets)
        diff <- diffSCESets(datasets, ref = "Real")
  10. Figure: two panels comparing simulations with the real data

    - Distribution of mean expression (axis: mean log2(CPM + 1)) for Real, Splat, Splat (Drop), Simple, Lun, Lun2, Lun2 (ZINB), scDD, BASiCS, mfa, PhenoPath, SparseDC and ZINB-WaVE
    - Difference in mean expression (axis: rank difference in mean log2(CPM + 1)) for the same simulations
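The "mean log2(CPM + 1)" axis can be sketched in base R using the toy counts from the scRNA-seq slide. This assumes the mean is taken over log-transformed values per gene; the slide does not say whether the plotted value is computed this way or by a package function, so treat the order of operations as an assumption.

```r
# Toy counts from the scRNA-seq slide: rows are genes, columns are cells.
counts <- matrix(
  c(12, 10, 9, 0,
     0,  0, 0, 1,
     9,  6, 0, 0,
     7,  0, 4, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("A", "B", "C", "D"), paste("Cell", 1:4))
)

lib_sizes <- colSums(counts)                     # total counts per cell
cpm <- sweep(counts, 2, lib_sizes, "/") * 1e6    # counts per million, per cell
log_cpm <- log2(cpm + 1)                         # +1 keeps zero counts finite
mean_log_cpm <- rowMeans(log_cpm)                # one value per gene
```

Computing the same per-gene summary on the real and simulated matrices gives the two sets of values whose distributions and differences the figure compares.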
  11. Figure: rank of MAD from real data

    - Rows: Splat, Splat (Drop), Simple, Lun, Lun2, Lun2 (ZINB), scDD, BASiCS, mfa, PhenoPath, SparseDC, ZINB-WaVE
    - Columns: Mean, Variance, Mean-Variance, Library size, % Zeros (Cell), % Zeros (Gene), Mean-Zeros
  12. Weighting edges

    In proportion = (number of cells on edge) / (number of cells in higher-resolution cluster)
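The in-proportion weighting can be sketched in base R as follows. The two cluster label vectors are made-up toy data, and this is not the clustree implementation, only a direct reading of the ratio on the slide: cells shared by a low-resolution and a high-resolution cluster, divided by the size of the high-resolution cluster.

```r
# Hypothetical cluster labels for eight cells at two clustering resolutions.
low_res  <- c("A", "A", "A", "A", "B", "B", "B", "B")
high_res <- c("1", "1", "1", "2", "2", "3", "3", "3")

# Number of cells on each edge (low-resolution cluster -> high-resolution cluster).
edge_counts <- table(low = low_res, high = high_res)

# Divide each column by the size of its high-resolution cluster.
in_prop <- prop.table(edge_counts, margin = 2)
```

Here cluster 2 receives one cell from A and one from B, so both incoming edges have an in-proportion of 0.5, while clusters 1 and 3 each have a single incoming edge with an in-proportion of 1. Low values flag high-resolution clusters assembled from several parents.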
  13. Dataset

    - 4 organoids
    - 10x Chromium
    - 2 batches (3 + 1)
    - 7937 cells (6649 + 1288)
    - Goal: identify cell types

    Analysis steps
    Step              Tool
    Alignment         CellRanger
    Quantification    CellRanger
    Quality control   scater
    Integration       Seurat
    Clustering        Seurat
    Gene detection    Seurat
  14. Human dataset

    - 16 week fetal kidney
    - 3178 cells
    - 10x Chromium

    Lindström et al. “Conserved and Divergent Features of Mesenchymal Progenitor Cell Types within the Cortical Nephrogenic Niche of the Human and Mouse Kidney”, J Am Soc Nephrol (2018), DOI: 10.1681/ASN.2017080890
  15. Figure: number of cells per cluster in the fetal kidney and organoid data (axis: number of cells, 0 to 1250)

    Cluster labels: Stroma, Endothelium, Cell cycle, Nephron progenitor, Podocyte, Nephron, Glial, Immune, Blood, Neural progenitor
  16. “Single-cell analysis reveals congruence between kidney organoids and human fetal kidney”, Genome Medicine (2019), DOI: 10.1186/s13073-019-0615-0
  17. Summary

    - New droplet selection methods can give many more cells
    - Seurat clustering is robust to gene selection
    - Possible immune-like population
    - Alternative methods can help interpretation
  18. What did I do?

    - Build a database of scRNA-seq analysis tools and a website to interact with it
    - Develop a software package for simulating scRNA-seq data and a flexible simulation model
    - Design an algorithm for visualising clustering at multiple resolutions and a software package that implements it
    - Perform an analysis of kidney organoid data to profile the cell types present and demonstrate the effect of different tools and decisions
  19. What next?

    - Bigger datasets, more computation
    - Convergence on methods that work
    - Continued development of software
    - Reference datasets
    - Integration of data types
    - Spatial transcriptomics
  20. Acknowledgments

    - Supervisors: Alicia Oshlack, Melissa Little
    - Committee: Andrew Pask, Christine Wells, Edmund Crampin
    - Everyone that makes their tools and data available
    - MCRI Bioinformatics: Belinda Phipson, Breon Schmidt
    - MCRI KDDR: Alex Combes
    - COMBINE
    - Friends and family
    - Developers: dplyr, ggplot2, scater, scran, Seurat, workflowr, rmarkdown, knitr, tidygraph, ggraph, edgeR...