Single cells, simulation and kidneys in a dish

Luke Zappia
October 27, 2017

Single-cell RNA sequencing (scRNA-seq) is rapidly becoming a tool of choice for biologists wishing to investigate gene expression at greater resolution, particularly in areas such as development and differentiation. Single-cell data presents an array of bioinformatics challenges: data is sparse (for both biological and technical reasons), quality control is difficult, and it is unclear how to replicate measurements. As scRNA-seq datasets have become available, so has a plethora of analysis methods. We have catalogued software tools that implement these methods in the scRNA-tools database (www.scRNA-tools.org). Evaluation of analysis methods relies on having a truth to test against or deep biological knowledge to interpret the results. Unfortunately, current scRNA-seq simulations are frequently poorly documented, not reproducible, and do not demonstrate similarity to real data or experimental designs. In this talk I will present Splatter, a Bioconductor package for simulating scRNA-seq data that is designed to address these issues. Splatter provides a consistent, easy-to-use interface to several previously published simulations, allowing researchers to estimate parameters, produce synthetic datasets and compare how well they replicate real data. Splatter also includes Splat, our own simulation model. Based on a gamma-Poisson hierarchical model, Splat includes additional features often seen in scRNA-seq data, such as dropout, and can be used to simulate complex experiments including multiple cell types, differentiation lineages and multiple batches. I will also briefly discuss an analysis of a complex kidney organoid dataset, showing how more cells and different levels of clustering help to reveal greater biological insight.

Transcript

  1. Single-cells,
    simulation and
    kidneys in a dish
    Luke Zappia
    MCRI Bioinformatics
    @_lazappi_

  3. Parkville Precinct
    MCRI

  4. MCRI Bioinformatics
    Tools: bpipe, Corset, Lace, Necklace, GOseq, Splatter, clinker, JAFFA, Cpipe, Ximmer, Schism, missMethyl, STRetch, scRNA-tools
    Areas: Structural, Clinical, STRs, Single-cell, Pipelines, Gene sets, Fusions, Assembly, superTranscripts, Methylation

  5. Bulk RNA-seq
    ACTGACTCCA
    TCAGTACTGA
    CGTGTCATAG
    GATTGACCTA
    Gene  Sample 1
    A     43
    B     3
    C     17
    D     24

  6. Single-cell RNA-seq
    ACTGACTCCA
    TCAGTACTGA
    CGTGTCATAG
    GATTGACCTA
    ACTGACTCCA
    TCAGTACTGA
    CGTGTCATAG
    GATTGACCTA
    ACTGACTCCA
    TCAGTACTGA
    CGTGTCATAG
    GATTGACCTA
    ACTGACTCCA
    TCAGTACTGA
    CGTGTCATAG
    GATTGACCTA
    Gene  Cell 1  Cell 2  Cell 3  Cell 4
    A     12      10      9       0
    B     0       0       0       1
    C     9       6       0       0
    D     7       0       4       0

  7. Moore’s Law
    Svensson et al. arXiv 1704.01379, 2017

  9. Unique Molecular Identifiers
    UMIs
    5’
    3’
    AAAA
    (PCR){BC}[UMI]TTTT
    5 4
    Aligned reads De-duplication and counting
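
The de-duplication and counting step can be sketched in a few lines of Python. This is only a minimal illustration, assuming each aligned read has already been reduced to a (cell barcode, UMI, gene) tuple and that UMIs are matched exactly with no error correction. The function name `count_umis` and the example barcodes, UMIs and gene names are hypothetical.

```python
from collections import defaultdict


def count_umis(aligned_reads):
    """Count unique UMIs for each (cell barcode, gene) pair.

    aligned_reads: iterable of (cell_barcode, umi, gene) tuples, one per read.
    Reads sharing barcode, gene and UMI are treated as PCR duplicates
    of a single original molecule.
    """
    seen = defaultdict(set)
    for barcode, umi, gene in aligned_reads:
        seen[(barcode, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}


# Hypothetical example reads
reads = [
    ("BC1", "AACG", "GeneA"),
    ("BC1", "AACG", "GeneA"),  # PCR duplicate of the read above
    ("BC1", "TTGC", "GeneA"),
    ("BC1", "AACG", "GeneB"),
    ("BC2", "AACG", "GeneA"),
]
umi_counts = count_umis(reads)
```

Real pipelines usually also merge UMIs that differ by a sequencing error, which this sketch leaves out.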

  10. Gene  Cell 1  Cell 2  Cell 3  Cell 4
    A     12      10      9       0
    B     0       0       0       1
    C     9       6       0       0
    D     7       0       4       0
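
To make the sparsity of this small table concrete, the zeros can be tallied per gene and per cell. The Python sketch below uses the four genes and four cells from the slide; the variable names are illustrative only.

```python
# Count table from the slide: one row per gene, one column per cell
counts = {
    "A": [12, 10, 9, 0],
    "B": [0, 0, 0, 1],
    "C": [9, 6, 0, 0],
    "D": [7, 0, 4, 0],
}
n_cells = 4

# Number of zero counts for each gene (row-wise)
zeros_per_gene = {gene: sum(c == 0 for c in row) for gene, row in counts.items()}

# Number of zero counts for each cell (column-wise)
zeros_per_cell = [
    sum(counts[gene][cell] == 0 for gene in counts) for cell in range(n_cells)
]

# Overall fraction of entries that are zero
zero_fraction = sum(zeros_per_gene.values()) / (len(counts) * n_cells)
```

Half of the sixteen entries are zero, and Cell 4 alone has three zeros out of four genes.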

  11. Gene  Cell 1  Cell 2  Cell 3  Cell 4
    A     12      10      9       0
    B     0       0       0       1
    C     9       6       0       0
    D     7       0       4       0
    Bad cell?
    Low expression?
    Cell type specific?
    Cell cycle?
    Dropout?

  15. www.scRNA-tools.org

  19. Simulation
    Biology
    Evaluation

  20. Simulation

  21. Simulations
    Provide a truth to test against
    BUT
    - Often poorly documented and explained
    - Not easily reproducible or reusable
    - Don’t demonstrate similarity to real data

  22. “Splatter: simulation of single-cell RNA sequencing data.”
    Genome Biology (2017) DOI: 10.1186/s13059-017-1305-0

  23. Splatter
    Bioconductor package
    Collection of simulation methods
    Consistent, easy-to-use interface
    Functions for comparison

  24. Negative binomial

  25. Splat
    Negative binomial
    Expression outliers
    Defined library sizes
    Mean-variance trend
    Dropout
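
A heavily simplified Python sketch of a gamma-Poisson (negative binomial) simulation with dropout is given below. It is not the Splat model itself: library sizes, expression outliers and the mean-variance trend are omitted, and the logistic dropout on log mean expression is an assumption made for illustration. The function names, parameter names and default values are all hypothetical, and the Poisson sampler is only suitable for small means.

```python
import math
import random


def poisson(lam, rng):
    """Draw a Poisson count with Knuth's method (adequate for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_counts(n_genes, n_cells, shape=0.6, rate=0.3, dispersion=0.1,
                    dropout_mid=0.0, dropout_shape=-1.0, seed=1):
    """Simulate a genes x cells count matrix (illustrative parameters only)."""
    rng = random.Random(seed)
    # Gene means drawn from a gamma distribution (shape, scale = 1 / rate)
    gene_means = [rng.gammavariate(shape, 1.0 / rate) for _ in range(n_genes)]
    counts = []
    for mean in gene_means:
        # Assumed dropout model: probability of an extra zero falls as
        # the log mean rises (logistic curve with negative shape)
        p_drop = 1.0 / (1.0 + math.exp(-dropout_shape * (math.log(mean + 1.0) - dropout_mid)))
        row = []
        for _ in range(n_cells):
            # Gamma-Poisson: the cell-level mean varies around the gene mean,
            # then the observed count is Poisson around that cell-level mean
            scale = max(mean * dispersion, 1e-12)
            cell_mean = rng.gammavariate(1.0 / dispersion, scale)
            count = poisson(cell_mean, rng)
            if rng.random() < p_drop:
                count = 0
            row.append(count)
        counts.append(row)
    return gene_means, counts


gene_means, counts = simulate_counts(n_genes=5, n_cells=10)
```

Fixing the seed keeps the sketch reproducible, which is one of the properties the talk asks of simulations.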

  26. Simulations
    Simple - Negative binomial
    Lun - NB with cell factors
    Lun ATL, Bach K, Marioni JC. Genome Biology (2016).
    DOI: 10.1186/s13059-016-0947-7.
    Lun 2 - Sampled NB with batch effects
    Lun ATL, Marioni JC. Biostatistics (2017).
    DOI: 10.1093/biostatistics/kxw055.
    scDD - NB with bimodality
    Korthauer KD, et al. Genome Biology (2016).
    DOI: 10.1186/s13059-016-1077-y.
    BASiCS - NB with spike-ins
    Vallejos CA, Marioni JC, Richardson S. PLoS Comp. Bio. (2015).
    DOI: 10.1371/journal.pcbi.1004333.

  27. Using Splatter
    1. Estimate: params1, params2
    2. Simulate: sim1, sim2
    3. Compare: datasets (Splat = sim1, Simple = sim2), comp, diff

  28. Real data
    Tung et al. iPSCs, C1 capture
    3 HapMap individuals (A, B, C)
    3 plates each (A1 A2 A3, B1 B2 B3, C1 C2 C3)
    200 random cells
    Tung P-Y et al. Sci. Rep. (2017) DOI:10.1038/srep39921

  29. Means, Difference in Means

  30. Zeros per cell, Difference in zeros

  31. Mean-zeros, Difference
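
The idea of comparing datasets against a real reference can be sketched generically in Python. This is not Splatter's comparison code: it assumes each dataset is a genes x cells list of count rows, reduces each to two summary statistics (mean expression and the fraction of zeros per cell), and reports the difference from the reference. The function names and the simulated example matrix are hypothetical.

```python
def summarise(counts):
    """Summarise a genes x cells count matrix (list of per-gene rows)."""
    n_genes, n_cells = len(counts), len(counts[0])
    gene_means = [sum(row) / n_cells for row in counts]
    zeros_per_cell = [
        sum(counts[g][c] == 0 for g in range(n_genes)) / n_genes
        for c in range(n_cells)
    ]
    return {
        "mean_expression": sum(gene_means) / n_genes,
        "zeros_per_cell": sum(zeros_per_cell) / n_cells,
    }


def diff_from_reference(datasets, ref="Real"):
    """Difference of each dataset's summaries from the reference dataset."""
    ref_summary = summarise(datasets[ref])
    diffs = {}
    for name, counts in datasets.items():
        if name == ref:
            continue
        summary = summarise(counts)
        diffs[name] = {key: summary[key] - ref_summary[key] for key in ref_summary}
    return diffs


# Hypothetical example: a small "real" matrix and one simulated matrix
datasets = {
    "Real": [[12, 10, 9, 0], [0, 0, 0, 1], [9, 6, 0, 0], [7, 0, 4, 0]],
    "Sim": [[10, 8, 7, 1], [0, 1, 0, 0], [5, 5, 1, 0], [3, 0, 2, 0]],
}
diffs = diff_from_reference(datasets)
```

A difference close to zero on every statistic suggests the simulation resembles the reference on those properties, though it says nothing about properties that were not measured.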

  32. Rank: 1 to 8

  33. Rank: 1 to 8
    Full-length
    Full-length

  34. Complex simulations
    Groups, Batches, Paths

  35. Example evaluation
    Parameters
    - Estimated from Tung data
    Simulation
    - 400 cells
    - 3 groups (60%, 25%, 15%)
    - 10% DE (~1700 genes)
    - 20 replicates
    Method
    - SC3
    - k-means consensus clustering
    - Differential expression
    - Marker genes
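
The group structure of this simulation can be illustrated with a short Python sketch. It assumes the percentages act as per-cell probabilities rather than fixed group sizes, so realised sizes vary around the expected 240, 100 and 60 cells. The function name, group labels and seed are hypothetical. Note also that 10% DE corresponding to about 1700 genes implies roughly 17,000 genes in total.

```python
import random
from collections import Counter


def assign_groups(n_cells, group_probs, seed=1):
    """Assign each cell to a group with the given probabilities."""
    rng = random.Random(seed)
    labels = list(group_probs)
    weights = [group_probs[label] for label in labels]
    return rng.choices(labels, weights=weights, k=n_cells)


group_probs = {"Group1": 0.60, "Group2": 0.25, "Group3": 0.15}
groups = assign_groups(400, group_probs)

# Realised group sizes for this seed, and the sizes expected on average
group_sizes = Counter(groups)
expected_sizes = {label: round(400 * p) for label, p in group_probs.items()}
```

Keeping the true label of every cell is what allows a clustering method such as SC3 to be scored against the simulation.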

  36. Clustering, Gene identification

  37. New in Splatter 1.2.0
    SingleCellExperiment
    Batch effects
    Simulations
    - BASiCS
    - mfa
    - PhenoPath
    - ZINB-WaVE
    Bioconductor 3.6

  38. Simulation summary
    Simulations are a great tool
    But they should be:
    - Reusable
    - Reproducible
    - Realistic
    Splatter is our solution
    Genome Biology, DOI: 10.1186/s13059-017-1305-0

  39. Biology

  40. The kidney
    OpenStax College, CC BY 3.0 via Wikimedia Commons

  41. Organoids
    Day 0 4
    7 10 18 25
    CHIR FGF9
    FGF9
    CHIR
    Form pellets
    No GF
    iPSCs organoid
    Takasato M et al. Nature. (2015) DOI: 10.1038/nature15695

  42. GATA3
    ECAD
    LTL
    WT1
    CD +
    DT +
    PT +
    Glo

  43. Fluidigm experiment
    4 organoids
    C1 capture
    Full-length
    No spike-ins

  44. Analysis
    Alignment - STAR
    Quantification - featureCounts
    Quality control - scater
    Clustering - SC3
    Gene detection - SC3
    Interpretation - Biologists

  45. Quality control
    Cells
    - Alignment
    - Quantification
    - Expression
    278 -> 155
    Genes
    - Expression
    - Class
    23388

  46. Clustering

  47. 10x experiment
    3 organoids
    Chromium capture
    UMI
    ~7000 cells

  48. Analysis
    Alignment - CellRanger
    Quantification - CellRanger
    Quality control - scater
    Clustering - Seurat
    Gene detection - Seurat
    Interpretation - Biologists

  49. Three clusters
    Vasculature
    Epithelium
    “Stroma”

  50. Many clusters

  51. Vasculature
    Proximal tubule
    Podocytes

  52. Mesangium, Renal stroma

  53. Nephron?
    Neuronal?

  54. ?

  56. Nodes
    Resolution (k)
    Cluster
    Size

  57. Edges
    Cluster from
    (lower resolution)
    Cluster to
    (higher resolution)
    Number
    Proportion

  58. Proportions
    k = 1: one cluster of size 100
    k = 2: two clusters of sizes 60 and 40
    Edges: n = 60, n = 40
    p_from = n / size_low
    p_to = n / size_high
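
The edge counts and proportions of a clustering tree can be computed from two sets of cluster labels for the same cells. The Python sketch below assumes `low` and `high` hold the labels at the lower and higher resolution respectively. For each edge, n is the number of shared cells, p_from is n divided by the size of the lower-resolution cluster, and p_to is n divided by the size of the higher-resolution cluster. The function name and the labels "A", "B" and "C" are hypothetical.

```python
from collections import Counter


def tree_edges(low, high):
    """Edges between clusters at two resolutions, with counts and proportions.

    low, high: cluster labels per cell, in the same cell order.
    """
    size_low = Counter(low)
    size_high = Counter(high)
    shared = Counter(zip(low, high))
    return [
        {
            "from": a,
            "to": b,
            "n": n,
            "p_from": n / size_low[a],
            "p_to": n / size_high[b],
        }
        for (a, b), n in sorted(shared.items())
    ]


# Example from the slide: 100 cells in one cluster at k = 1,
# split into clusters of 60 and 40 cells at k = 2
low = ["A"] * 100
high = ["B"] * 60 + ["C"] * 40
edges = tree_edges(low, high)
```

In this example both edges have p_to equal to 1, because each higher-resolution cluster draws all of its cells from the single parent cluster.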

  59. Clustering tree
    Resolution: 0.01, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0

  60. Resolution: 0.01, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0

  61. Resolution: 0.01, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0
    Vasculature

  62. Resolution: 0.01, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0
    Proximal tubule
    Podocytes

  63. Resolution: 0.01, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0
    Mesangium

  64. Resolution: 0.01, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0
    Renal stroma

  65. Resolution: 0.01, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0
    Nephron/neuronal?

  66. Resolution: 0.01, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0
    ?

  67. Summary
    Kidney organoids are complex
    More cells help
    Cluster relationships can be useful
    Background knowledge is vital

  68. Acknowledgements
    Everyone that makes tools and data available
    Supervisors
    Alicia Oshlack
    Melissa Little
    MCRI Bioinformatics
    Belinda Phipson
    Breon Schmidt
    MCRI KDDR
    Alex Combes

  69. @_lazappi_
    oshlacklab.com
    www.scRNA-tools.org
    @scRNAtools
    “Splatter: simulation of single-cell RNA sequencing data.”
    Genome Biology (2017) DOI: 10.1186/s13059-017-1305-0
    bioconductor.org/packages/splatter
    “Exploring the single-cell RNA-seq analysis landscape with the scRNA-tools database”
    bioRxiv (2017) DOI: 10.1101/206573
    tinyurl.com/clust-tree-funcs
