BiocAsia 2017

Luke Zappia
November 17, 2017

Single-cell RNA sequencing (scRNA-seq) has opened up a range of opportunities for investigating the transcriptome, but the dramatic increase in resolution brings an array of bioinformatics challenges. Single-cell data is relatively sparse (for both biological and technical reasons), quality control is difficult, and it is unclear whether methods designed for bulk RNA-seq are appropriate for scRNA-seq data. Researchers have risen to these challenges and there are now more than 140 scRNA-seq analysis tools available. With so many tools available, however, researchers face the difficult task of choosing which to use, so it is important to be able to assess and compare the performance, quality and limitations of each tool. One common approach is to test methods on simulated datasets where the true answers are known. To aid this process we have developed Splatter, a Bioconductor R package for reproducible simulation of scRNA-seq datasets (bioconductor.org/packages/splatter).

Splatter is a simulation framework that provides access to a variety of simulation models, allowing users to estimate parameters from real data in order to easily generate realistic synthetic scRNA-seq datasets. As part of Splatter we also introduce our own simulation model, Splat, capable of reproducing scRNA-seq datasets with multiple groups of cells, differentiation paths or batch effects. Here we will discuss how Splatter can be used to develop and compare analysis tools. We will also touch on our experience developing Splatter, some of the design choices we made and how we have integrated other Bioconductor infrastructure such as the SingleCellExperiment class.
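
The estimate-then-simulate idea described above can be illustrated with a minimal base R sketch. It assumes a plain negative binomial (NB) model with one shared dispersion, which is far simpler than the Splat model, and the helper names `estimate_nb_params` and `simulate_nb_counts` are hypothetical, not Splatter functions. The "real" data here is a toy stand-in generated on the spot.

```r
# Hypothetical helpers for illustration only; they are not part of Splatter.

# Estimate per-gene means and one shared NB size (inverse dispersion)
# from a genes x cells count matrix using the method of moments.
estimate_nb_params <- function(counts) {
  stopifnot(is.matrix(counts), is.numeric(counts), ncol(counts) >= 2,
            all(counts >= 0))
  gene_means <- rowMeans(counts)
  gene_vars <- apply(counts, 1, var)
  # For the NB, var = mean + mean^2 / size, so size = mean^2 / (var - mean).
  over <- gene_vars > gene_means
  size <- if (any(over)) {
    median(gene_means[over]^2 / (gene_vars[over] - gene_means[over]))
  } else {
    1e6  # no overdispersion detected: effectively Poisson
  }
  list(means = gene_means, size = size, n_cells = ncol(counts))
}

# Simulate a new genes x cells matrix from the estimated parameters.
simulate_nb_counts <- function(params, seed = 1) {
  set.seed(seed)
  n_genes <- length(params$means)
  # matrix() fills column by column, so the gene means repeat once per cell.
  sim <- rnbinom(n_genes * params$n_cells,
                 mu = rep(params$means, times = params$n_cells),
                 size = params$size)
  matrix(sim, nrow = n_genes, ncol = params$n_cells)
}

# Toy stand-in for a real dataset (an assumption; no real data is used here).
set.seed(42)
toy_means <- rgamma(200, shape = 2, rate = 0.5)
real_counts <- matrix(rnbinom(200 * 50, mu = rep(toy_means, times = 50), size = 2),
                      nrow = 200, ncol = 50)

params <- estimate_nb_params(real_counts)
sim_counts <- simulate_nb_counts(params)

# Compare one simple summary between the "real" and simulated matrices.
c(real = mean(real_counts == 0), simulated = mean(sim_counts == 0))
```

Comparing the fraction of zeros is only one possible check; in practice several summaries (means, variances, library sizes) would be compared side by side.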

Transcript

  1. Single-cell RNA-seq

    ACTGACTCCA TCAGTACTGA CGTGTCATAG GATTGACCTA

    Gene  Cell 1  Cell 2  Cell 3  Cell 4
    A         12      10       9       0
    B          0       0       0       1
    C          9       6       0       0
    D          7       0       4       0
  2. Gene  Cell 1  Cell 2  Cell 3  Cell 4
     A         12      10       9       0
     B          0       0       0       1
     C          9       6       0       0
     D          7       0       4       0

    Why is a count 0?
    - Bad cell?
    - Low expression?
    - Cell type specific?
    - Cell cycle?
    - Dropout?
  3. Simulations

    Provide a truth to test against, BUT:
    - Often poorly documented and explained
    - Not easily reproducible or reusable
    - Don’t demonstrate similarity to real data
  4. Checkmate

    # Manual argument checks
    fact <- function(n, method = "stirling") {
      if (length(n) != 1)
        stop("Argument 'n' must have length 1")
      if (!is.numeric(n))
        stop("Argument 'n' must be numeric")
      if (is.na(n))
        stop("Argument 'n' may not be NA")
      if (is.double(n)) {
        if (is.nan(n))
          stop("Argument 'n' may not be NaN")
        if (is.infinite(n))
          stop("Argument 'n' must be finite")
        if (abs(n - round(n, 0)) > sqrt(.Machine$double.eps))
          stop("Argument 'n' must be an integerish value")
        n <- as.integer(n)
      }
      # ...
    }

    # The same function using checkmate assertions
    fact <- function(n, method = "stirling") {
      assertCount(n)
      assertChoice(method, c("stirling", "factorial"))
      if (method == "factorial")
        factorial(n)
      else
        sqrt(2 * pi * n) * (n / exp(1))^n
    }
  5. Simulations

    - Simple: negative binomial (NB)
    - Lun: NB with cell factors. Lun ATL, Bach K, Marioni JC. Genome Biology (2016). DOI: 10.1186/s13059-016-0947-7.
    - Lun 2: sampled NB with batch effects. Lun ATL, Marioni JC. Biostatistics (2017). DOI: 10.1093/biostatistics/kxw055.
    - scDD: NB with bimodality. Korthauer KD, et al. Genome Biology (2016). DOI: 10.1186/s13059-016-1077-y.
    - BASiCS: NB with spike-ins. Vallejos CA, Marioni JC, Richardson S. PLoS Comp. Bio. (2015). DOI: 10.1371/journal.pcbi.1004333.
  6. Using Splatter

    # 1. Estimate
    params1 <- splatEstimate(real.data)
    params2 <- simpleEstimate(real.data)

    # 2. Simulate
    sim1 <- splatSimulate(params1, ...)
    sim2 <- simpleSimulate(params2, ...)

    # 3. Compare
    datasets <- list(Real = real.data, Splat = sim1, Simple = sim2)
    comp <- compareSCESets(datasets)
    diff <- diffSCESets(datasets, ref = "Real")
  7. New in Splatter 1.2.0 (Bioconductor 3.6)

    - SingleCellExperiment
    - Batch effects
    - Simulations: BASiCS, mfa, PhenoPath, ZINB-WaVE
  8. Summary

    - Many tools for scRNA-seq analysis
    - Catalogued in the scRNA-tools database
    - Can be tested using synthetic datasets
    - Splatter is our package for simulating scRNA-seq data
    - Making a package is not as hard as you think
  9. @_lazappi_, oshlacklab.com

    Supervisors: Alicia Oshlack, Melissa Little
    MCRI Bioinformatics
    Belinda Phipson, Breon Schmidt
    Everyone that makes tools and data available
    www.scRNA-tools.org, @scRNAtools

    “Splatter: simulation of single-cell RNA sequencing data.” Genome Biology (2017). DOI: 10.1186/s13059-017-1305-0
    “Exploring the single-cell RNA-seq analysis landscape with the scRNA-tools database.” bioRxiv (2017). DOI: 10.1101/206573
    bioconductor.org/packages/splatter
  10. Real data

    Tung et al. iPSCs, C1 capture
    - 3 HapMap individuals
    - 3 plates each
    - 200 random cells
    Tung P-Y et al. Sci. Rep. (2017). DOI: 10.1038/srep39921
    Individuals and plates: A (A1, A2, A3), B (B1, B2, B3), C (C1, C2, C3)
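
The small count matrix shown on the opening slides is a convenient way to see what "sparse" means in practice. The following base R sketch transcribes that matrix and computes a few simple summaries: the overall fraction of zeros, zeros per gene and per cell, and the total count per cell. The variable names are illustrative choices, and the summaries alone cannot tell a dropout from a true biological zero; they only show where the zeros sit.

```r
# Count matrix transcribed from the first slide (genes in rows, cells in columns).
counts <- matrix(
  c(12, 10, 9, 0,
     0,  0, 0, 1,
     9,  6, 0, 0,
     7,  0, 4, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(Gene = c("A", "B", "C", "D"),
                  Cell = paste("Cell", 1:4))
)

# Overall sparsity: the fraction of entries that are exactly zero.
zero_fraction <- mean(counts == 0)

# Zeros per gene (rows) and per cell (columns).
zeros_per_gene <- rowSums(counts == 0)
zeros_per_cell <- colSums(counts == 0)

# Total counts per cell; a very small total can flag a possible bad cell.
library_size <- colSums(counts)

zero_fraction
zeros_per_gene
zeros_per_cell
library_size
```

In this toy matrix half of the entries are zero, and Cell 4 has a total count of 1, which is the kind of pattern the "Bad cell?" question on the second slide points at.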