Single cells, simulation and kidneys in a dish

Luke Zappia
October 27, 2017

Single-cell RNA sequencing (scRNA-seq) is rapidly becoming a tool of choice for biologists wishing to investigate gene expression at greater resolution, particularly in areas such as development and differentiation. Single-cell data presents an array of bioinformatics challenges: data is sparse (for both biological and technical reasons), quality control is difficult, and it is unclear how to replicate measurements. As scRNA-seq datasets have become available, so has a plethora of analysis methods. We have catalogued software tools that implement these methods in the scRNA-tools database (www.scRNA-tools.org). Evaluation of analysis methods relies on having a truth to test against or deep biological knowledge to interpret the results. Unfortunately, current scRNA-seq simulations are frequently poorly documented, not reproducible, and do not demonstrate similarity to real data or experimental designs. In this talk I will present Splatter, a Bioconductor package for simulating scRNA-seq data that is designed to address these issues. Splatter provides a consistent, easy-to-use interface to several previously published simulations, allowing researchers to estimate parameters, produce synthetic datasets, and compare how well they replicate real data. Splatter also includes Splat, our own simulation model. Based on a gamma-Poisson hierarchical model, Splat includes additional features often seen in scRNA-seq data, such as dropout, and can be used to simulate complex experiments including multiple cell types, differentiation lineages and multiple batches. I will also briefly discuss an analysis of a complex kidney organoid dataset, showing how more cells and different levels of clustering help to reveal greater biological insight.
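
The gamma-Poisson idea mentioned in the abstract can be illustrated with a minimal base R sketch. This is not the Splat model itself, nor the Splatter API: the parameter values, the logistic dropout curve, and all variable names are assumptions chosen only for illustration.

```r
# Minimal gamma-Poisson sketch with dropout (base R only).
# All parameter values are illustrative assumptions, not Splat defaults.
set.seed(1)

n_genes <- 200
n_cells <- 50

# Gene means drawn from a gamma distribution
gene_means <- rgamma(n_genes, shape = 0.6, rate = 0.3)

# Cell-specific library size factors (log-normal), rescaled to mean 1
lib_factors <- rlnorm(n_cells, meanlog = 0, sdlog = 0.2)
lib_factors <- lib_factors / mean(lib_factors)

# Expected expression of each gene in each cell (genes x cells)
cell_means <- outer(gene_means, lib_factors)

# Gamma noise around each expected value; with shape = 1 / bcv^2 and
# scale = mean * bcv^2 the gamma distribution has the cell mean as its mean
bcv <- 0.3
noisy_means <- matrix(
  rgamma(n_genes * n_cells, shape = 1 / bcv^2, scale = cell_means * bcv^2),
  nrow = n_genes
)

# Poisson sampling of gamma-distributed means gives gamma-Poisson counts
true_counts <- matrix(
  rpois(n_genes * n_cells, lambda = noisy_means),
  nrow = n_genes
)

# Dropout: the probability of a zero falls as log expression rises
# (hypothetical logistic curve with midpoint 1 and steepness 1.5)
drop_prob <- plogis(-1.5 * (log(cell_means + 1) - 1))
keep <- matrix(rbinom(n_genes * n_cells, 1, 1 - drop_prob), nrow = n_genes)
counts <- true_counts * keep
```

Comparing `mean(true_counts == 0)` with `mean(counts == 0)` shows how much of the sparsity comes from the dropout step rather than from low expression.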

Transcript

  1. MCRI Bioinformatics

    bpipe, Corset, Lace, Necklace, GOseq, Splatter, clinker, JAFFA, Cpipe, Ximmer, Schism, missMethyl, STRetch, Structural, Clinical, STRs, Single-cell, Pipelines, Gene sets, Fusions, Assembly, superTranscripts, Methylation, scRNA-tools
  2. Single-cell RNA-seq

    [Figure: repeated sequencing reads ACTGACTCCA TCAGTACTGA CGTGTCATAG GATTGACCTA]

    Gene  Cell 1  Cell 2  Cell 3  Cell 4
    A     12      10      9       0
    B     0       0       0       1
    C     9       6       0       0
    D     7       0       4       0
  3. Gene  Cell 1  Cell 2  Cell 3  Cell 4

    A     12      10      9       0
    B     0       0       0       1
    C     9       6       0       0
    D     7       0       4       0
  4. Gene  Cell 1  Cell 2  Cell 3  Cell 4

    A     12      10      9       0
    B     0       0       0       1
    C     9       6       0       0
    D     7       0       4       0

    0: Bad cell? Low expression? Cell type specific? Cell cycle? Dropout?
  5. Simulations

    Provide a truth to test against, BUT:
    - Often poorly documented and explained
    - Not easily reproducible or reusable
    - Don’t demonstrate similarity to real data
  6. Simulations

    Simple - Negative binomial
    Lun - NB with cell factors
      Lun ATL, Bach K, Marioni JC. Genome Biology (2016). DOI: 10.1186/s13059-016-0947-7.
    Lun 2 - Sampled NB with batch effects
      Lun ATL, Marioni JC. Biostatistics (2017). DOI: 10.1093/biostatistics/kxw055.
    scDD - NB with bimodality
      Korthauer KD, et al. Genome Biology (2016). DOI: 10.1186/s13059-016-1077-y.
    BASiCS - NB with spike-ins
      Vallejos CA, Marioni JC, Richardson S. PLoS Comp. Bio. (2015). DOI: 10.1371/journal.pcbi.1004333.
  7. Using Splatter

    # 1. Estimate
    params1 <- splatEstimate(real.data)
    params2 <- simpleEstimate(real.data)

    # 2. Simulate
    sim1 <- splatSimulate(params1, ...)
    sim2 <- simpleSimulate(params2, ...)

    # 3. Compare
    datasets <- list(Real = real.data, Splat = sim1, Simple = sim2)
    comp <- compareSCESets(datasets)
    diff <- diffSCESets(datasets, ref = "Real")
  8. Real data

    Tung et al. iPSCs, C1 capture
    - 3 HapMap individuals (A, B, C)
    - 3 plates each (A1 A2 A3, B1 B2 B3, C1 C2 C3)
    - 200 random cells
    Tung P-Y et al. Sci. Rep. (2017) DOI: 10.1038/srep39921
  9. Example evaluation

    Parameters
    - Estimated from Tung data
    Simulation
    - 400 cells
    - 3 groups (60%, 25%, 15%)
    - 10% DE (~1700 genes)
    - 20 replicates
    Method
    - SC3
    - k-means consensus clustering
    - Differential expression
    - Marker genes
  10. Simulation summary

    Simulations are a great tool
    But they should be:
    - Reusable
    - Reproducible
    - Realistic
    Splatter is our solution
    Genome Biology 10.1186/s13059-017-1305-0
  11. Organoids

    [Figure: timeline from iPSCs to organoid. Days: 0, 4, 7, 10, 18, 25. Labels: CHIR, FGF9, FGF9 CHIR, Form pellets, No GF]
    Takasato M et al. Nature. (2015) DOI: 10.1038/nature15695
  12. ?

  13. Proportions

    [Figure: a cluster of 100 cells at k = 1 and clusters of 60 and 40 cells at k = 2, joined by edges with n = 60 and n = 40]
    p_from = n / size_low
    p_to = n / size_high
  14. Resolution

    [Figure: clustering at resolutions 0.01, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0; labelled clusters: Proximal Tubule, Podocytes]
  15. Acknowledgements

    Everyone that makes tools and data available
    Supervisors: Alicia Oshlack, Melissa Little
    MCRI Bioinformatics: Belinda Phipson, Breon Schmidt
    MCRI KDDR: Alex Combes
  16. @_lazappi_, oshlacklab.com, www.scRNA-tools.org, @scRNAtools

    “Splatter: simulation of single-cell RNA sequencing data.” Genome Biology (2017) DOI: 10.1186/s13059-017-1305-0
    “Exploring the single-cell RNA-seq analysis landscape with the scRNA-tools database” bioRxiv (2017) DOI: 10.1101/206573
    tinyurl.com/clust-tree-funcs
    bioconductor.org/packages/splatter
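
The edge proportions on the Proportions slide (slide 13) can be reproduced with a few lines of base R. This sketch assumes that "size low" and "size high" refer to the sizes of the lower- and higher-resolution clusters joined by an edge; the label vectors and column names are hypothetical and do not come from the talk.

```r
# Two clusterings of the same 100 cells (hypothetical labels):
# a single cluster at k = 1 that splits into clusters of 60 and 40 at k = 2
low  <- rep("1", 100)
high <- rep(c("A", "B"), times = c(60, 40))

# Count the cells shared by each (low, high) pair of clusters
edges <- as.data.frame(table(from = low, to = high), stringsAsFactors = FALSE)
names(edges)[3] <- "n"
edges <- edges[edges$n > 0, ]

# Cluster sizes at each resolution
size_low  <- table(low)
size_high <- table(high)

# Proportion of the lower-resolution cluster that follows each edge,
# and proportion of the higher-resolution cluster that arrives along it
edges$p_from <- edges$n / as.vector(size_low[edges$from])
edges$p_to   <- edges$n / as.vector(size_high[edges$to])

print(edges)
```

Under this reading, the two edges in the slide's example have p_from values of 60/100 and 40/100, while both p_to values are 1 because each k = 2 cluster draws all of its cells from the single k = 1 cluster.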