WEHI Bioinformatics Seminar

Single-cells, simulation and kidneys in a dish

Single-cell RNA sequencing (scRNA-seq) is rapidly becoming a tool of choice for biologists wishing to investigate gene expression at greater resolution, particularly in areas such as development and differentiation. Single-cell data present an array of bioinformatics challenges: data are sparse (for both biological and technical reasons), quality control is difficult, and it is unclear how to replicate measurements. As scRNA-seq datasets have become available, so has a plethora of analysis methods. Evaluating these methods relies on having a truth to test against or deep biological knowledge to interpret the results. Unfortunately, current scRNA-seq simulations are frequently poorly documented, not reproducible, and do not demonstrate similarity to real data or experimental designs. In this talk I will present Splatter, a Bioconductor package for simulating scRNA-seq data that is designed to address these issues. Splatter provides a consistent, easy-to-use interface to several previously published simulations, allowing researchers to estimate parameters, produce synthetic datasets and compare how well they replicate real data. Splatter also includes Splat, our own simulation model. Based on a gamma-Poisson hierarchical model, Splat includes additional features often seen in scRNA-seq data, such as dropout, and can be used to simulate complex experiments including multiple cell types, differentiation lineages and multiple batches. I will also discuss an analysis of a complex kidney organoid dataset, showing how more cells and different levels of clustering help to reveal greater biological insight.
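
To make the phrase "gamma-Poisson hierarchical model ... such as dropout" concrete, here is a minimal base R sketch of that general idea. It is not Splat's implementation: the gamma parameters, the dispersion, and the logistic dropout curve (`dropout_mid`, `dropout_slope`) are all assumptions chosen for illustration.

```r
# Minimal gamma-Poisson count simulation with dropout (illustrative only).
# Every parameter value here is an assumption, not a Splatter default.
set.seed(1)

n_genes <- 1000   # hypothetical number of genes
n_cells <- 50     # hypothetical number of cells

# Level 1: a mean expression level for each gene, drawn from a gamma
gene_means <- rgamma(n_genes, shape = 0.6, rate = 0.3)

# Level 2: cell-specific means, gamma-distributed around each gene mean.
# shape * scale = gene mean, so the expected value is preserved.
dispersion <- 0.1
cell_means <- matrix(
    rgamma(n_genes * n_cells,
           shape = 1 / dispersion,
           scale = rep(gene_means, n_cells) * dispersion),
    nrow = n_genes, ncol = n_cells
)

# Level 3: observed counts are Poisson draws around the cell-specific means
counts <- matrix(rpois(n_genes * n_cells, lambda = cell_means),
                 nrow = n_genes, ncol = n_cells)

# Dropout: zero out counts with a probability that decreases as the mean
# increases (logistic in log mean; midpoint and slope are assumptions)
dropout_mid <- 0
dropout_slope <- 1
drop_prob <- plogis(-dropout_slope * (log(cell_means) - dropout_mid))
dropped <- matrix(rbinom(n_genes * n_cells, size = 1, prob = drop_prob) == 1,
                  nrow = n_genes, ncol = n_cells)

observed <- counts
observed[dropped] <- 0

# Proportion of zeros before and after dropout
c(before = mean(counts == 0), after = mean(observed == 0))
```

Mixing a Poisson over gamma-distributed means gives negative binomial counts, which is why the published simulations listed in the talk are described in terms of the NB distribution.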

Luke Zappia

July 17, 2017

Transcript

  1. Single-cell RNA-seq

    ACTGACTCCA TCAGTACTGA CGTGTCATAG GATTGACCTA (sequence repeated across the slide)

    Gene  Cell 1  Cell 2  Cell 3  Cell 4
    A     12      10      9       0
    B     0       0       1       4
    C     9       6       0       0
    D     7       0       4       0
  2. Gene  Cell 1  Cell 2  Cell 3  Cell 4

    A     12      10      9       0
    B     0       0       1       4
    C     9       6       0       0
    D     7       0       4       0
  3. Gene  Cell 1  Cell 2  Cell 3  Cell 4

    A     12      10      9       0
    B     0       0       1       4
    C     9       6       0       0
    D     7       0       4       0

    0: Bad cell? Low expression? Cell type specific? Cell cycle? Dropout?
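
As a small worked example of the sparsity shown on this slide, the sketch below rebuilds the 4 x 4 count table in base R and summarises where the zeros fall. The variable names are hypothetical, and counting zeros cannot by itself tell the candidate explanations apart; it only shows which genes and cells are affected.

```r
# Count table from the slide: genes in rows, cells in columns
counts <- matrix(
    c(12, 10, 9, 0,
       0,  0, 1, 4,
       9,  6, 0, 0,
       7,  0, 4, 0),
    nrow = 4, byrow = TRUE,
    dimnames = list(Gene = c("A", "B", "C", "D"),
                    Cell = paste("Cell", 1:4))
)

zero_by_gene <- rowMeans(counts == 0)  # fraction of cells with a zero, per gene
zero_by_cell <- colMeans(counts == 0)  # fraction of genes with a zero, per cell
lib_size     <- colSums(counts)        # total counts per cell

zero_by_gene
zero_by_cell
lib_size
```

In this toy table Cell 4 has both the most zeros (3 of 4 genes) and the smallest total count (4), the pattern that would prompt the "Bad cell?" question, while gene B is zero only in the cells where the other genes are highest, which is closer to the "Cell type specific?" case.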
  4. Simulations

    Provide a truth to test against

    BUT
    - Often poorly documented and explained
    - Not easily reproducible or reusable
    - Don’t demonstrate similarity to real data
  5. Simulations

    - Simple: Negative binomial
    - Lun: NB with cell factors. Lun ATL, Bach K, Marioni JC. Genome Biology (2016). DOI: 10.1186/s13059-016-0947-7.
    - Lun 2: Sampled NB with batch effects. Lun ATL, Marioni JC. Biostatistics (2017). DOI: 10.1093/biostatistics/kxw055.
    - scDD: NB with bimodality. Korthauer KD, et al. Genome Biology (2016). DOI: 10.1186/s13059-016-1077-y.
    - BASiCS: NB with spike-ins. Vallejos CA, Marioni JC, Richardson S. PLoS Comp. Bio. (2015). DOI: 10.1371/journal.pcbi.1004333.
  6. Using Splatter

    # 1. Estimate
    params1 <- splatEstimate(real.data)
    params2 <- simpleEstimate(real.data)

    # 2. Simulate
    sim1 <- splatSimulate(params1, ...)
    sim2 <- simpleSimulate(params2, ...)

    # 3. Compare
    datasets <- list(Real = real.data, Splat = sim1, Simple = sim2)
    comp <- compareSCESets(datasets)
    diff <- diffSCESets(datasets, ref = "Real")
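
To give a feel for what the "Compare" step is getting at, the following base R sketch computes a few per-gene summaries for a reference dataset and a simulated one, then takes differences against the reference. It is an assumption-laden stand-in and does not reproduce how `compareSCESets` or `diffSCESets` work internally: both matrices are synthetic, and `summarise_genes` is a hypothetical helper.

```r
set.seed(42)
n_genes <- 200
n_cells <- 30

# Stand-ins for a real dataset and a simulation (both synthetic here).
# The "real" matrix is more overdispersed, so it contains more zeros.
real <- matrix(rnbinom(n_genes * n_cells, mu = 5, size = 0.5), nrow = n_genes)
sim  <- matrix(rnbinom(n_genes * n_cells, mu = 5, size = 2),   nrow = n_genes)

# Hypothetical helper: per-gene summaries on the log2(count + 1) scale
summarise_genes <- function(counts) {
    logged <- log2(counts + 1)
    data.frame(
        mean_log = rowMeans(logged),
        var_log  = apply(logged, 1, var),
        pct_zero = 100 * rowMeans(counts == 0)
    )
}

real_stats <- summarise_genes(real)
sim_stats  <- summarise_genes(sim)

# Simulated genes do not correspond to real genes, so compare the sorted
# distributions of each statistic rather than matching gene by gene
diffs <- data.frame(
    mean_log = sort(sim_stats$mean_log) - sort(real_stats$mean_log),
    var_log  = sort(sim_stats$var_log)  - sort(real_stats$var_log),
    pct_zero = sort(sim_stats$pct_zero) - sort(real_stats$pct_zero)
)

# Mean absolute difference from the reference, per statistic
colMeans(abs(diffs))
```

Smaller differences on each statistic indicate a simulation that sits closer to the reference data on that property.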
  7. Real data

    Tung et al. iPSCs, C1 capture
    - 3 HapMap individuals
    - 3 plates each
    - 200 random cells

    [Figure labels: A (A1, A2, A3), B (B1, B2, B3), C (C1, C2, C3)]

    Tung P-Y et al. Sci. Rep. (2017). DOI: 10.1038/srep39921
  8. Example evaluation

    Parameters
    - Estimated from Tung data

    Simulation
    - 400 cells
    - 3 groups (60%, 25%, 15%)
    - 10% DE (~1700 genes)
    - 20 replicates

    Method
    - SC3
    - k-means consensus clustering
    - Differential expression
    - Marker genes
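
The numbers in this design can be checked with a little arithmetic. The base R sketch below works out the expected group sizes and the gene count implied by "10% DE (~1700 genes)". Whether group membership was fixed or sampled in the talk's simulations is not stated, so the random assignment shown is an assumption, and the gene total is an inference from the slide rather than a reported figure.

```r
set.seed(1)

n_cells    <- 400
group_prob <- c(Group1 = 0.60, Group2 = 0.25, Group3 = 0.15)

# Expected number of cells per group: 240, 100 and 60
expected_cells <- round(n_cells * group_prob)

# One possible realisation, assuming each cell's group is sampled at random
groups <- sample(names(group_prob), n_cells, replace = TRUE, prob = group_prob)
table(groups)

# "10% DE (~1700 genes)" implies roughly 17,000 genes in total (inferred)
de_prob <- 0.10
n_de    <- 1700
n_genes <- round(n_de / de_prob)

# Truth labels for a simulated dataset: which genes are differentially expressed
is_de <- rbinom(n_genes, size = 1, prob = de_prob) == 1
sum(is_de)
```

Keeping the `groups` and `is_de` labels alongside the simulated counts is what gives a clustering or differential expression method "a truth to test against".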
  9. Simulation summary

    Simulations are a great tool

    But they should be:
    - Reusable
    - Reproducible
    - Realistic

    Splatter is our solution

    bioRxiv 10.1101/133173
  10. Organoids

    [Figure: protocol timeline from iPSCs to organoid. Day: 0, 4, 7, 10, 18, 25. Labels: CHIR, FGF9, FGF9, CHIR, Form pellets, No GF]
  11. ?

  12. Resolution

    [Figure: clustering resolutions 0.01, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0. Labels: Proximal Tubule, Podocytes]
  13. Acknowledgements

    Everyone that makes tools and data available

    Supervisors: Alicia Oshlack, Melissa Little
    MCRI Bioinformatics: Belinda Phipson, Breon Schmidt
    MCRI KDDR: Alex Combes