Single cells, simulation and kidneys in a dish

Luke Zappia
October 27, 2017

Single-cell RNA sequencing (scRNA-seq) is rapidly becoming a tool of choice for biologists wishing to investigate gene expression at greater resolution, particularly in areas such as development and differentiation. Single-cell data presents an array of bioinformatics challenges: data is sparse (for both biological and technical reasons), quality control is difficult and it is unclear how to replicate measurements. As scRNA-seq datasets have become available, so has a plethora of analysis methods. We have catalogued software tools that implement these methods in the scRNA-tools database (www.scRNA-tools.org). Evaluation of analysis methods relies on having a truth to test against or deep biological knowledge to interpret the results. Unfortunately, current scRNA-seq simulations are frequently poorly documented, not reproducible and do not demonstrate similarity to real data or experimental designs. In this talk I will present Splatter, a Bioconductor package for simulating scRNA-seq data that is designed to address these issues. Splatter provides a consistent, easy-to-use interface to several previously published simulations, allowing researchers to estimate parameters, produce synthetic datasets and compare how well they replicate real data. Splatter also includes Splat, our own simulation model. Based on a gamma-Poisson hierarchical model, Splat includes additional features often seen in scRNA-seq data, such as dropout, and can be used to simulate complex experiments including multiple cell types, differentiation lineages and multiple batches. I will also briefly discuss an analysis of a complex kidney organoid dataset, showing how more cells and different levels of clustering help to reveal greater biological insight.
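
The gamma-Poisson idea behind Splat can be illustrated with a few lines of base R. This is only a simplified sketch and not Splatter's implementation: the variable names, the parameter values (gamma shape and rate, dispersion, dropout midpoint) and the logistic dropout curve are all assumptions chosen for illustration, and features such as expression outliers and the mean-variance trend are left out.

```r
# Simplified gamma-Poisson sketch with dropout (illustrative values only).
set.seed(1)

n_genes <- 100
n_cells <- 20

# Gene means drawn from a gamma distribution (assumed shape and rate).
gene_means <- rgamma(n_genes, shape = 0.6, rate = 0.3)

# Per-cell library size factors, rescaled to average 1 (assumed log-normal).
lib_factors <- rlnorm(n_cells, meanlog = 0, sdlog = 0.2)
lib_factors <- lib_factors / mean(lib_factors)

# Expected expression for every gene (rows) in every cell (columns).
cell_means <- outer(gene_means, lib_factors)

# Gamma noise around the expected value followed by Poisson sampling,
# which together give negative binomial counts.
dispersion <- 0.1
noisy_means <- matrix(
  rgamma(n_genes * n_cells, shape = 1 / dispersion,
         scale = cell_means * dispersion),
  nrow = n_genes
)
true_counts <- matrix(rpois(n_genes * n_cells, lambda = noisy_means),
                      nrow = n_genes)

# Dropout: genes with lower expected expression are more likely to be zeroed
# (assumed logistic curve on the log mean, midpoint 1).
dropout_prob <- 1 / (1 + exp(log(cell_means + 1) - 1))
keep <- matrix(rbinom(n_genes * n_cells, size = 1, prob = 1 - dropout_prob),
               nrow = n_genes)
counts <- true_counts * keep
```

Comparing `mean(counts == 0)` with `mean(true_counts == 0)` shows how much of the sparsity in the final matrix comes from the dropout step rather than from low expression.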

Transcript

  1. Single cells, simulation and kidneys in a dish. Luke Zappia, MCRI Bioinformatics, @_lazappi_
  2. None
  3. Parkville Precinct MCRI

  4. MCRI Bioinformatics: bpipe, Corset, Lace, Necklace, GOseq, Splatter, clinker, JAFFA, Cpipe, Ximmer, Schism, missMethyl, STRetch, superTranscripts, scRNA-tools (structural, clinical, STRs, single-cell, pipelines, gene sets, fusions, assembly, methylation)
  5. Bulk RNA-seq. Reads: ACTGACTCCA TCAGTACTGA CGTGTCATAG GATTGACCTA

    Gene  Sample 1
    A     43
    B     3
    C     17
    D     24
  6. Single-cell RNA-seq. Reads (shown four times): ACTGACTCCA TCAGTACTGA CGTGTCATAG GATTGACCTA

    Gene  Cell 1  Cell 2  Cell 3  Cell 4
    A     12      10      9       0
    B     0       0       0       1
    C     9       6       0       0
    D     7       0       4       0
  7. Moore’s Law. Svensson et al. arXiv 1704.01379, 2017

  8. None
  9. Unique Molecular Identifiers (UMIs). 5’ 3’ AAAA (PCR){BC}[UMI]TTTT 5 4. Aligned reads, de-duplication and counting
  10. Count matrix

    Gene  Cell 1  Cell 2  Cell 3  Cell 4
    A     12      10      9       0
    B     0       0       0       1
    C     9       6       0       0
    D     7       0       4       0
  11. Count matrix

    Gene  Cell 1  Cell 2  Cell 3  Cell 4
    A     12      10      9       0
    B     0       0       0       1
    C     9       6       0       0
    D     7       0       4       0

    0: Bad cell? Low expression? Cell type specific? Cell cycle? Dropout?
  12. None
  13. None
  14. None
  15. www.scRNA-tools.org

  16. None
  17. None
  18. None
  19. Simulation Biology Evaluation

  20. Simulation

  21. Simulations. Provide a truth to test against, BUT:
    - Often poorly documented and explained
    - Not easily reproducible or reusable
    - Don’t demonstrate similarity to real data
  22. “Splatter: simulation of single-cell RNA sequencing data.” Genome Biology (2017). DOI: 10.1186/s13059-017-1305-0
  23. Splatter
    - Bioconductor package
    - Collection of simulation methods
    - Consistent, easy-to-use interface
    - Functions for comparison
  24. Negative binomial

  25. Splat
    - Negative binomial
    - Expression outliers
    - Defined library sizes
    - Mean-variance trend
    - Dropout
  26. Simulations
    - Simple: Negative binomial
    - Lun: NB with cell factors. Lun ATL, Bach K, Marioni JC. Genome Biology (2016). DOI: 10.1186/s13059-016-0947-7.
    - Lun 2: Sampled NB with batch effects. Lun ATL, Marioni JC. Biostatistics (2017). DOI: 10.1093/biostatistics/kxw055.
    - scDD: NB with bimodality. Korthauer KD, et al. Genome Biology (2016). DOI: 10.1186/s13059-016-1077-y.
    - BASiCS: NB with spike-ins. Vallejos CA, Marioni JC, Richardson S. PLoS Comp. Bio. (2015). DOI: 10.1371/journal.pcbi.1004333.
  27. Using Splatter

    # 1. Estimate parameters from a real dataset
    params1 <- splatEstimate(real.data)
    params2 <- simpleEstimate(real.data)

    # 2. Simulate a dataset from each set of parameters
    sim1 <- splatSimulate(params1, ...)
    sim2 <- simpleSimulate(params2, ...)

    # 3. Compare the simulations with each other and with the real data
    datasets <- list(Real = real.data, Splat = sim1, Simple = sim2)
    comp <- compareSCESets(datasets)
    diff <- diffSCESets(datasets, ref = "Real")
  28. Real data: Tung et al. iPSCs, C1 capture. 3 HapMap individuals (A, B, C), 3 plates each (A1 to A3, B1 to B3, C1 to C3), 200 random cells. Tung P-Y et al. Sci. Rep. (2017). DOI: 10.1038/srep39921
  29. Means Difference in Means

  30. Zeros per cell Difference in zeros

  31. Mean-zeros Difference

  32. Rank 1 8

  33. Rank 1 8 Full-length Full-length

  34. Complex simulations Groups Batches Paths

  35. Example evaluation
    - Parameters: Estimated from Tung data
    - Simulation: 400 cells, 3 groups (60%, 25%, 15%), 10% DE (~1700 genes), 20 replicates
    - Method: SC3 (k-means consensus clustering, differential expression, marker genes)
  36. Clustering Gene identification

  37. New in Splatter 1.2.0 (Bioconductor 3.6)
    - SingleCellExperiment
    - Batch effects
    - Simulations: BASiCS, mfa, PhenoPath, ZINB-WaVE
  38. Simulation summary. Simulations are a great tool, but they should be:
    - Reusable
    - Reproducible
    - Realistic
    Splatter is our solution. Genome Biology 10.1186/s13059-017-1305-0
  39. Biology

  40. The kidney OpenStax College, CC BY 3.0 via Wikimedia Commons

  41. Organoids. Timeline from iPSCs to organoid (days 0, 4, 7, 10, 18, 25): CHIR, FGF9, FGF9, CHIR, form pellets, no GF. Takasato M et al. Nature (2015). DOI: 10.1038/nature15695
  42. GATA3, ECAD, LTL, WT1. CD + DT + PT + Glo
  43. Fluidigm experiment 4 organoids C1 capture Full-length No spike-ins

  44. Analysis
    - Alignment: STAR
    - Quantification: featureCounts
    - Quality control: scater
    - Clustering: SC3
    - Gene detection: SC3
    - Interpretation: Biologists
  45. Quality control
    - Cells (Alignment, Quantification, Expression): 278 -> 155
    - Genes (Expression, Class): 23388
  46. Clustering

  47. 10x experiment 3 organoids Chromium capture UMI ~7000 cells

  48. Analysis
    - Alignment: CellRanger
    - Quantification: CellRanger
    - Quality control: scater
    - Clustering: Seurat
    - Gene detection: Seurat
    - Interpretation: Biologists
  49. Three clusters Vasculature Epithelium “Stroma”

  50. Many clusters

  51. Vasculature Proximal tubule Podocytes

  52. Mesangium Renal stroma

  53. Nephron? Neuronal?

  54. ?

  55. None
  56. Nodes: Resolution (k), Cluster, Size

  57. Edges: Cluster from (lower resolution), Cluster to (higher resolution), Number, Proportion
  58. Proportions. Example: a cluster of 100 cells at k = 1 splits into clusters of 60 and 40 at k = 2 (edges with n = 60 and n = 40)
    - p from = n / size low
    - p to = n / size high
  59. Clustering tree. Resolution 0.01 to 1.0
  60. Resolution 0.01 to 1.0
  61. Resolution 0.01 to 1.0: Vasculature
  62. Resolution 0.01 to 1.0: Proximal Tubule, Podocytes
  63. Resolution 0.01 to 1.0: Mesangium
  64. Resolution 0.01 to 1.0: Renal stroma
  65. Resolution 0.01 to 1.0: Nephron/neuronal?
  66. Resolution 0.01 to 1.0: ?
  67. Summary
    - Kidney organoids are complex
    - More cells help
    - Cluster relationships can be useful
    - Background knowledge is vital
  68. Acknowledgements
    - Everyone that makes tools and data available
    - Supervisors: Alicia Oshlack, Melissa Little
    - MCRI Bioinformatics: Belinda Phipson, Breon Schmidt
    - MCRI KDDR: Alex Combes
  69. @_lazappi_, oshlacklab.com, www.scRNA-tools.org, @scRNAtools
    - “Splatter: simulation of single-cell RNA sequencing data.” Genome Biology (2017). DOI: 10.1186/s13059-017-1305-0. bioconductor.org/packages/splatter
    - “Exploring the single-cell RNA-seq analysis landscape with the scRNA-tools database.” bioRxiv (2017). DOI: 10.1101/206573
    - tinyurl.com/clust-tree-funcs
  70. None
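
The edge proportions from slide 58 can be worked through in base R. The sketch below uses the slide's own numbers (100 cells in one cluster at the lower resolution, splitting 60/40 at the higher resolution); the cluster labels "A", "B" and "C" and all variable names are hypothetical, and this is not the code behind tinyurl.com/clust-tree-funcs.

```r
# Hypothetical cluster assignments for the same 100 cells at two resolutions.
low  <- rep("A", 100)                   # k = 1: a single cluster
high <- c(rep("B", 60), rep("C", 40))   # k = 2: split into 60 and 40

# One edge per (from, to) pair, with n = number of shared cells.
edges <- as.data.frame(table(from = low, to = high),
                       responseName = "n", stringsAsFactors = FALSE)
edges <- edges[edges$n > 0, ]

# Cluster sizes at each resolution.
size_low  <- table(low)
size_high <- table(high)

# p from = n / size low: share of the lower-resolution cluster on this edge.
edges$p_from <- edges$n / as.vector(size_low[edges$from])
# p to = n / size high: share of the higher-resolution cluster on this edge.
edges$p_to <- edges$n / as.vector(size_high[edges$to])

print(edges)
```

Here both edges have p to equal to 1 because each higher-resolution cluster draws all of its cells from a single parent; an edge with a low p to would indicate a cluster assembled from several parents.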