
gi2017: Simulation and analysis tools for single-cell RNA sequencing data

Luke Zappia
November 01, 2017

Single-cell RNA sequencing (scRNA-seq) is rapidly becoming a tool of choice for biologists who wish to investigate gene expression. In contrast to traditional bulk RNA-seq experiments, which measure expression averaged across millions of cells, single-cell experiments can be used to observe how genes are expressed in individual cells. Along with the dramatic increase in resolution provided by scRNA-seq comes an array of bioinformatics challenges. Single-cell data is relatively sparse (for both biological and technical reasons), quality control is difficult and it is unclear how to replicate measurements. Researchers have risen to meet these challenges and there are currently more than 125 software tools available for analysing scRNA-seq data. We have catalogued these software tools in the scRNA-tools database (www.scRNA-tools.org). Analysis of this database shows that there are now methods available for a wide range of tasks, from pre-processing unique molecular identifiers to detecting allele-specific expression. However, the biggest areas of development have been clustering cells to identify cell types and ordering cells to understand dynamic processes. We also find that the R statistical programming language is the most popular platform for scRNA-seq analysis tools, followed by Python, and that the majority of tools have been described in peer-reviewed papers or preprints and are available under open-source software licenses.

With the ever-increasing number of analysis methods available, it is important to be able to assess and compare the performance, quality and limitations of an analysis tool. This is often done, at least in part, by testing methods on simulated datasets where the true answers are known. Unfortunately, current scRNA-seq simulations are frequently poorly documented and not reproducible, and they rarely demonstrate similarity to real data or experimental designs. To address these concerns we have developed Splatter, a Bioconductor R package for reproducible simulation of scRNA-seq datasets (bioconductor.org/packages/splatter). Splatter is a simulation framework that currently includes four previously published simulation models and allows users to estimate parameters from real data in order to easily generate realistic synthetic scRNA-seq datasets. Here we discuss some of the challenges of simulating scRNA-seq data and present a comparison of the simulation methods available in Splatter. As part of Splatter we also introduce our own simulation model, Splat, capable of reproducing scRNA-seq datasets with multiple groups of cells, differentiation paths or batch effects.
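The idea of simulating counts from a parametric model can be illustrated with a minimal base R sketch. This is not the Splat model or any part of Splatter's API: the function name `simulate_counts`, its arguments and all default values are assumptions chosen for illustration. It only combines ingredients the talk names (negative binomial counts, differing library sizes, dropout) in a deliberately simplified way, and a fixed seed makes the result reproducible.

```r
# Hypothetical helper, NOT Splatter code: a simplified negative binomial
# simulation with library-size variation and random dropout.
simulate_counts <- function(n_genes = 100, n_cells = 20,
                            mean_shape = 0.6, mean_rate = 0.3,
                            dispersion = 0.1, dropout_prob = 0.2,
                            seed = 1) {
  set.seed(seed)  # fixed seed so the simulation can be reproduced

  # Gene-level mean expression drawn from a gamma distribution.
  gene_means <- rgamma(n_genes, shape = mean_shape, rate = mean_rate)

  # Cell-level size factors mimic differences in library size.
  size_factors <- rlnorm(n_cells, meanlog = 0, sdlog = 0.3)

  # Expected count for each gene (row) in each cell (column).
  mu <- outer(gene_means, size_factors)

  # Negative binomial counts with variance mu + dispersion * mu^2.
  counts <- matrix(rnbinom(n_genes * n_cells, mu = mu, size = 1 / dispersion),
                   nrow = n_genes, ncol = n_cells)

  # Dropout: each observation is independently set to zero
  # with probability dropout_prob.
  keep <- matrix(rbinom(n_genes * n_cells, size = 1, prob = 1 - dropout_prob),
                 nrow = n_genes, ncol = n_cells)
  counts <- counts * keep

  dimnames(counts) <- list(paste0("Gene", seq_len(n_genes)),
                           paste0("Cell", seq_len(n_cells)))
  counts
}

sim <- simulate_counts()
dim(sim)        # genes x cells
mean(sim == 0)  # overall proportion of zeros
```

In this sketch the parameters are fixed by hand; estimating them from a real dataset, as Splatter does, is a separate step that is not shown here.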


Transcript

  1. Simulation and analysis tools for single-cell RNA sequencing data. Luke Zappia @_lazappi_ #gi2017
  2. Single-cell RNA-seq [Figure: sequencing reads (ACTGACTCCA, TCAGTACTGA, CGTGTCATAG, GATTGACCTA); gene-by-cell table with cells A, B, C, D]
  3. Svensson et al. arXiv . ,

  4. [Figure: gene-by-cell table with cells A, B, C, D] Bad cell? Low expression? Cell type specific? Cell cycle? Dropout?
  5. None
  6. None
  7. None
  8. www. .org

  9. None
  10. None
  11. Simulations: Provide a truth to test against BUT - Often poorly documented and explained - Not easily reproducible or reusable - Don't demonstrate similarity to real data
  12. “Splatter: simulation of single-cell RNA sequencing data.” Genome Biology ( ) DOI: . /s - - -
  13. Splat: Negative binomial - Expression outliers - Defined library sizes - Mean-variance trend - Dropout
  14. Simulations

    - Simple: Negative binomial
    - Lun: NB with cell factors. Lun ATL, Bach K, Marioni JC. Genome Biology ( ). DOI: . /s - - - .
    - Lun 2: Sampled NB with batch effects. Lun ATL, Marioni JC. Biostatistics ( ). DOI: . /biostatistics/kxw .
    - scDD: NB with bimodality. Korthauer KD, et al. Genome Biology ( ). DOI: . /s - - -y.
    - BASiCS: NB with spike-ins. Vallejos CA, Marioni JC, Richardson S. PLoS Comp. Bio. ( ). DOI: . /journal.pcbi. .
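  15. Using Splatter: Estimate, Simulate, Compare

        library(splatter)
        # Estimate parameters from a real dataset
        params1 <- splatEstimate(real.data)
        params2 <- simpleEstimate(real.data)
        # Simulate datasets from the estimated parameters
        sim1 <- splatSimulate(params1, ...)
        sim2 <- simpleSimulate(params2, ...)
        # Compare the simulations with each other and with the real data
        datasets <- list(Real = real.data, Splat = sim1, Simple = sim2)
        comp <- compareSCESets(datasets)
        diff <- diffSCESets(datasets, ref = "Real")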
  16. Real data: Tung et al. iPSCs, C capture - HapMap individuals - plates each - random cells. Tung P-Y et al. Sci. Rep. ( ) DOI: . /srep
  17. Means Difference in Means

  18. Zeros per cell Difference in zeros

  19. Mean-zeros Difference

  20. Rank

  21. Rank Full-length Full-length

  22. Complex simulations Groups Batches Paths

  23. New in Splatter . . (Bioconductor 3.6): SingleCellExperiment - Batch effects - Simulations: BASiCS, mfa, PhenoPath, ZINB-WaVE
  24. Summary: Many tools for scRNA-seq analysis - Catalogued in the scRNA-tools database - Can be tested using synthetic datasets - Splatter is our package for simulating scRNA-seq data
  25. @_lazappi_; oshlacklab.com; Supervisors: Alicia Oshlack, Melissa Little; MCRI Bioinformatics; Belinda Phipson; Breon Schmidt; Everyone that makes tools and data available; www.scRNA-tools.org; @scRNAtools; “Splatter: simulation of single-cell RNA sequencing data.” Genome Biology ( ) DOI: . /s - - -; “Exploring the single-cell RNA-seq analysis landscape with the scRNA-tools database” bioRxiv ( ) DOI: . /; bioconductor.org/packages/splatter
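
The comparison slides (means, zeros per cell, mean-zeros relationship and the differences from the real data) can be illustrated with a small base R sketch. It does not reproduce `compareSCESets` or `diffSCESets`: the helper `summarise_counts` and both toy matrices are hypothetical. Because simulated genes do not correspond to particular real genes, the sketch assumes that per-gene values are compared after sorting, which is one simple choice rather than a statement of how Splatter does it.

```r
# Hypothetical helper: summary statistics for a genes x cells count matrix.
summarise_counts <- function(counts) {
  list(
    gene_means     = rowMeans(counts),       # mean expression per gene
    zeros_per_gene = rowMeans(counts == 0),  # proportion of zeros per gene
    zeros_per_cell = colMeans(counts == 0)   # proportion of zeros per cell
  )
}

# Toy matrices standing in for a real and a simulated dataset
# (3 genes in rows, 4 cells in columns, filled column by column).
real <- matrix(c(0, 2, 5,
                 0, 0, 3,
                 1, 4, 0,
                 0, 6, 2), nrow = 3)
sim <- matrix(c(1, 2, 4,
                0, 1, 3,
                0, 3, 1,
                0, 5, 2), nrow = 3)

real_stats <- summarise_counts(real)
sim_stats  <- summarise_counts(sim)

# Difference in sorted gene means (simulated minus real).
mean_diff <- sort(sim_stats$gene_means) - sort(real_stats$gene_means)

# Difference in sorted per-cell zero proportions (simulated minus real).
zero_diff <- sort(sim_stats$zeros_per_cell) - sort(real_stats$zeros_per_cell)

# Mean-zeros relationship: genes with lower means tend to have more zeros.
mean_zeros <- data.frame(mean = real_stats$gene_means,
                         zeros = real_stats$zeros_per_gene)
```

Differences close to zero indicate that a simulation matches the real data well on that statistic.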