gi2017: Simulation and analysis tools for single-cell RNA sequencing data

Luke Zappia
November 01, 2017

Single-cell RNA sequencing (scRNA-seq) is rapidly becoming a tool of choice for biologists who wish to investigate gene expression. In contrast to traditional bulk RNA-seq experiments, which measure expression averaged across millions of cells, single-cell experiments can be used to observe how genes are expressed in individual cells. Along with the dramatic increase in resolution provided by scRNA-seq comes an array of bioinformatics challenges. Single-cell data is relatively sparse (for both biological and technical reasons), quality control is difficult, and it is unclear how to replicate measurements. Researchers have responded to these challenges, and there are currently more than 125 software tools available for analysing scRNA-seq data. We have catalogued these software tools in the scRNA-tools database (www.scRNA-tools.org). Analysis of this database shows that there are now methods available for a wide range of tasks, from pre-processing unique molecular identifiers to detecting allele-specific expression. However, the biggest areas of development have been in clustering cells to identify cell types and ordering cells to understand dynamic processes. We also find that the R statistical programming language is the most popular platform for scRNA-seq analysis tools, followed by Python, and that the majority of tools have been described in peer-reviewed papers or preprints and are available under open-source software licenses.

With the ever-increasing number of available analysis methods, it is important to be able to assess and compare the performance, quality and limitations of analysis tools. This is often done, at least in part, by testing methods on simulated datasets where the true answers are known. Unfortunately, current scRNA-seq simulations are frequently poorly documented, not reproducible and do not demonstrate similarity to real data or experimental designs. To address these concerns, we have developed Splatter, a Bioconductor R package for reproducible simulation of scRNA-seq datasets. Splatter is a simulation framework that currently includes four previously published simulation models, allowing users to estimate parameters from real data in order to easily generate realistic synthetic scRNA-seq datasets. Here we discuss some of the challenges of simulating scRNA-seq data and present a comparison of the simulation methods available in Splatter (bioconductor.org/packages/splatter). As part of Splatter we also introduce our own simulation model, Splat, capable of simulating scRNA-seq datasets with multiple groups of cells, differentiation paths or batch effects.

Transcript

  1. Simulation and analysis
    tools for single-cell RNA
    sequencing data
    Luke Zappia
    @_lazappi_
    #gi2017


  2. Single-cell RNA-seq
    [Figure: sequencing reads from individual cells summarised as a gene-by-cell table (genes A, B, C, D)]

  3. Svensson et al., arXiv

  4. [Figure: gene-by-cell table (genes A, B, C, D)]
    Bad cell?
    Low expression?
    Cell type specific?
    Cell cycle?
    Dropout?

  8. www.scRNA-tools.org

  11. Simulations
    Provide a truth to test against
    BUT
    - Often poorly documented and explained
    - Not easily reproducible or reusable
    - Don’t demonstrate similarity to real data

  12. “Splatter: simulation of single-cell RNA sequencing data.”
    Genome Biology

  13. Splat
    Negative binomial
    Expression outliers
    Defined library sizes
    Mean-variance trend
    Dropout

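The Splat components listed on this slide can be illustrated with a simplified sketch in base R. This is not Splatter's implementation: the variable names (mean_shape-style values, out_prob, bcv_common, bcv_df, dropout_mid, dropout_shape) and their values are illustrative assumptions, and several Splat details are omitted. The sketch draws gene means from a gamma distribution, inflates a few outlier genes, scales the means by log-normal library sizes, adds a mean-variance trend through a biological coefficient of variation (BCV), samples Poisson counts from gamma-distributed means (a negative binomial overall) and finally applies logistic dropout.

```r
# Simplified sketch of the Splat steps (base R only). Names and values are
# illustrative assumptions, not Splatter's code or defaults.
set.seed(1)
n_genes <- 1000
n_cells <- 100

# Gene means drawn from a gamma distribution
gene_means <- rgamma(n_genes, shape = 0.6, rate = 0.3)

# Expression outliers: a few genes get a large log-normal multiplier
out_prob <- 0.05
is_outlier <- runif(n_genes) < out_prob
out_factor <- rlnorm(n_genes, meanlog = 4, sdlog = 0.5)
gene_means[is_outlier] <- median(gene_means) * out_factor[is_outlier]

# Defined library sizes: log-normal total counts per cell
lib_sizes <- rlnorm(n_cells, meanlog = 11, sdlog = 0.2)
base_means <- outer(gene_means / sum(gene_means), lib_sizes)  # genes x cells

# Mean-variance trend: BCV shrinks as the mean grows (one chi-squared draw per gene)
bcv_common <- 0.1
bcv_df <- 60
bcv <- (bcv_common + 1 / sqrt(base_means)) *
  sqrt(bcv_df / rchisq(n_genes, df = bcv_df))

# Gamma-distributed means with variance mean^2 * BCV^2, then Poisson counts;
# this gamma-Poisson mixture gives negative binomial counts
shape <- 1 / bcv^2
trended <- matrix(rgamma(length(base_means), shape = shape,
                         scale = base_means / shape), nrow = n_genes)
counts <- matrix(rpois(length(trended), lambda = trended), nrow = n_genes)

# Dropout: logistic in log mean; with a negative shape, low means drop out more
dropout_mid <- 0
dropout_shape <- -1
drop_prob <- 1 / (1 + exp(-dropout_shape * (log(trended) - dropout_mid)))
keep <- matrix(rbinom(length(drop_prob), size = 1, prob = 1 - drop_prob),
               nrow = n_genes)
counts <- counts * keep
```

Calling mean(counts == 0) on the result gives the fraction of zeros. In this sketch, raising dropout_mid increases the overall dropout rate, while dropout_shape controls how steeply the dropout probability falls as expression increases.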

  14. Simulations
    Simple - Negative binomial
    Lun - NB with cell factors
    Lun ATL, Bach K, Marioni JC. Genome Biology.
    Lun 2 - Sampled NB with batch effects
    Lun ATL, Marioni JC. Biostatistics.
    scDD - NB with bimodality
    Korthauer KD, et al. Genome Biology.
    BASiCS - NB with spike-ins
    Vallejos CA, Marioni JC, Richardson S. PLoS Computational Biology.
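To make the "Simple - Negative binomial" entry concrete, here is a minimal base-R sketch of a plain negative binomial simulation: gene means from a gamma distribution and counts from a negative binomial with one shared dispersion. The parameter names and values are illustrative assumptions; Splatter's own simpleEstimate() and simpleSimulate() are the real implementation and may differ in detail.

```r
# Hypothetical sketch of a "Simple"-style model (base R only): gamma gene
# means and negative binomial counts with a single shared dispersion.
set.seed(42)
n_genes <- 500
n_cells <- 50
mean_shape <- 0.4   # illustrative values, not estimated from data
mean_rate <- 0.3
count_disp <- 0.1   # NB dispersion phi: variance = mu + phi * mu^2

gene_means <- rgamma(n_genes, shape = mean_shape, rate = mean_rate)

# Every cell shares the same gene means, so there is no cell-level structure
mu <- matrix(gene_means, nrow = n_genes, ncol = n_cells)
counts <- matrix(rnbinom(n_genes * n_cells, mu = mu, size = 1 / count_disp),
                 nrow = n_genes,
                 dimnames = list(paste0("Gene", seq_len(n_genes)),
                                 paste0("Cell", seq_len(n_cells))))
```

R's rnbinom() with size and mu gives variance mu + mu^2 / size, so size = 1 / count_disp matches the dispersion written in the comment. Because nothing varies between cells, a model like this cannot reproduce library size differences or groups, which is what the richer models on this slide add.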

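    Using Splatter
    1. Estimate
    2. Simulate
    3. Compare
    library(splatter)
    # real.data: a SingleCellExperiment holding counts from a real experiment
    params1 <- splatEstimate(real.data)
    params2 <- simpleEstimate(real.data)
    # Extra named arguments to the simulate functions override parameters
    sim1 <- splatSimulate(params1, verbose = FALSE)
    sim2 <- simpleSimulate(params2, verbose = FALSE)
    datasets <- list(Real = real.data,
                     Splat = sim1,
                     Simple = sim2)
    # Earlier Splatter releases named these compareSCESets() and diffSCESets()
    comp <- compareSCEs(datasets)
    diff <- diffSCEs(datasets, ref = "Real")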
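The "Estimate" step fits model parameters to real data. As an illustration only, the sketch below fits a gamma distribution to normalised gene means by the method of moments. estimate_mean_params() is a hypothetical helper, not a Splatter function, and splatEstimate() estimates many more parameters and may use a different fitting method.

```r
# Method-of-moments sketch of estimating gamma parameters for gene means.
# estimate_mean_params() is a hypothetical helper, not a Splatter function.
estimate_mean_params <- function(counts) {
  # Scale every cell to the median library size before averaging per gene
  lib_sizes <- colSums(counts)
  norm <- sweep(counts, 2, lib_sizes, "/") * median(lib_sizes)
  means <- rowMeans(norm)
  means <- means[means > 0]
  m <- mean(means)
  v <- var(means)
  # Gamma(shape, rate) has mean shape / rate and variance shape / rate^2
  list(mean_shape = m^2 / v, mean_rate = m / v)
}

# Example on counts simulated from known gamma(2, 0.5) gene means
set.seed(7)
true_means <- rgamma(2000, shape = 2, rate = 0.5)
real_counts <- matrix(rpois(2000 * 200, lambda = true_means), nrow = 2000)
est <- estimate_mean_params(real_counts)
```

Solving shape / rate = m and shape / rate^2 = v gives rate = m / v and shape = m^2 / v. On this example the estimates should land close to the true shape of 2 and rate of 0.5.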

  16. Real data
    Tung et al. iPSCs, C capture
    HapMap individuals, plates each, random cells
    [Figure: plates from individuals A, B and C]
    Tung P-Y et al. Sci. Rep.

  17. Means
    Difference in means

  18. Zeros per cell
    Difference in zeros

  19. Mean-zeros
    Difference
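The comparison slides plot statistics such as mean expression and zeros per cell, and their differences from the real dataset. The sketch below shows one simple way to compute such differences in base R, assuming both datasets have the same numbers of genes and cells so that sorted values can be subtracted pairwise. The helper names are hypothetical and this is not Splatter's diffSCEs() implementation.

```r
# Hypothetical comparison helpers (base R only), not Splatter's diffSCEs().
log_cpm_means <- function(counts) {
  cpm <- sweep(counts, 2, colSums(counts), "/") * 1e6
  rowMeans(log2(cpm + 1))            # mean log2 CPM for each gene
}

zeros_per_cell <- function(counts) {
  colMeans(counts == 0) * 100        # percentage of zero counts in each cell
}

# Compare sorted values so datasets need the same size but not the same genes
sorted_difference <- function(sim_values, ref_values) {
  stopifnot(length(sim_values) == length(ref_values))
  sort(sim_values) - sort(ref_values)
}

# Example: an overdispersed "simulation" against a Poisson "reference"
set.seed(3)
ref <- matrix(rpois(300 * 40, lambda = 2), nrow = 300)
sim <- matrix(rnbinom(300 * 40, mu = 2, size = 1), nrow = 300)
diff_means <- sorted_difference(log_cpm_means(sim), log_cpm_means(ref))
diff_zeros <- sorted_difference(zeros_per_cell(sim), zeros_per_cell(ref))
```

In this example the negative binomial data has more zeros than the Poisson reference at the same mean, so diff_zeros is mostly positive. Plotting such differences against rank is one way to see where a simulation departs from real data.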

  20. Rank


  21. Rank
    Full-length
    Full-length


  22. Complex simulations
    Groups Batches Paths

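Groups of cells can be added to a simulation by giving each group its own differential expression (DE) factors on top of shared gene means. The base-R sketch below assumes that approach in simplified form: a random subset of genes in each group is multiplied by a log-normal factor, inverted for down-regulated genes. The parameter names and values are illustrative assumptions, and counts are plain Poisson here for brevity rather than the full Splat model.

```r
# Hypothetical sketch of group structure via per-group DE factors (base R only).
# Parameter names and values are illustrative assumptions.
set.seed(11)
n_genes <- 1000
n_cells <- 120
group_prob <- c(0.5, 0.3, 0.2)   # expected proportion of cells in each group
de_prob <- 0.1                   # fraction of genes that are DE in each group
de_down_prob <- 0.5              # chance a DE gene is down- rather than up-regulated

gene_means <- rgamma(n_genes, shape = 0.6, rate = 0.3)
group <- sample(seq_along(group_prob), n_cells, replace = TRUE, prob = group_prob)

de_factors <- sapply(seq_along(group_prob), function(g) {
  is_de <- runif(n_genes) < de_prob
  fac <- rlnorm(n_genes, meanlog = 0.1, sdlog = 0.4)
  down <- runif(n_genes) < de_down_prob
  fac[down] <- 1 / fac[down]
  ifelse(is_de, fac, 1)          # non-DE genes keep a factor of 1
})                               # genes x groups

# Each cell uses the gene means of its own group
cell_means <- gene_means * de_factors[, group]
counts <- matrix(rpois(length(cell_means), lambda = cell_means), nrow = n_genes)
```

table(group) shows how many cells fell into each group. Batches could be sketched the same way, with factors tied to the batch rather than the group.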

  23. New in Splatter
    Bioconductor 3.6
    - SingleCellExperiment
    - Batch effects
    - Simulations
      - BASiCS
      - mfa
      - PhenoPath
      - ZINB-WaVE

  24. Summary
    - Many tools for scRNA-seq analysis
    - Catalogued in the scRNA-tools database
    - Can be tested using synthetic datasets
    - Splatter is our package for simulating scRNA-seq data

  25. @_lazappi_
    oshlacklab.com
    Supervisors
    Alicia Oshlack
    Melissa Little
    MCRI Bioinformatics
    Belinda Phipson
    Breon Schmidt
    Everyone that makes tools and data available
    www.scRNA-tools.org
    @scRNAtools
    “Splatter: simulation of single-cell RNA sequencing data.” Genome Biology
    “Exploring the single-cell RNA-seq analysis landscape with the scRNA-tools database” bioRxiv
    bioconductor.org/packages/splatter