WEHI Bioinformatics Seminar

Single-cells, simulation and kidneys in a dish

Single-cell RNA sequencing (scRNA-seq) is rapidly becoming a tool of choice for biologists wishing to investigate gene expression at greater resolution, particularly in areas such as development and differentiation. Single-cell data present an array of bioinformatics challenges: the data are sparse (for both biological and technical reasons), quality control is difficult, and it is unclear how to replicate measurements. As scRNA-seq datasets have become available, so has a plethora of analysis methods. Evaluating these methods relies on having a truth to test against or deep biological knowledge to interpret the results. Unfortunately, current scRNA-seq simulations are frequently poorly documented, not reproducible, and do not demonstrate similarity to real data or experimental designs. In this talk I will present Splatter, a Bioconductor package for simulating scRNA-seq data that is designed to address these issues. Splatter provides a consistent, easy-to-use interface to several previously published simulations, allowing researchers to estimate parameters, produce synthetic datasets, and compare how well they replicate real data. Splatter also includes Splat, our own simulation model. Based on a gamma-Poisson hierarchical model, Splat includes additional features often seen in scRNA-seq data, such as dropout, and can be used to simulate complex experiments including multiple cell types, differentiation lineages and multiple batches. I will also discuss an analysis of a complex kidney organoid dataset, showing how more cells and different levels of clustering help to reveal greater biological insight.
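
As a rough illustration of the gamma-Poisson idea mentioned in the abstract, the base R sketch below draws gene means from a gamma distribution, adds gamma-distributed cell-level variation, samples Poisson counts (which together give negative binomial counts), and then zeroes some counts using a logistic dropout probability. It is not the Splat model as implemented in Splatter: the function name `simulate_counts` and every parameter value are assumptions chosen for illustration.

```r
# Minimal gamma-Poisson sketch with dropout (illustrative only, not Splatter's code).
# All parameter names and values below are arbitrary assumptions.
simulate_counts <- function(n_genes = 100, n_cells = 20,
                            mean_shape = 0.6, mean_rate = 0.3,
                            dispersion = 0.1,
                            dropout_mid = 0, dropout_shape = -1) {
  # Gene-level mean expression drawn from a gamma distribution
  gene_means <- rgamma(n_genes, shape = mean_shape, rate = mean_rate)

  # Cell-level means: gamma noise around each gene mean
  # (expected value = gene mean, squared coefficient of variation = dispersion)
  cell_means <- matrix(
    rgamma(n_genes * n_cells, shape = 1 / dispersion,
           scale = rep(gene_means, n_cells) * dispersion),
    nrow = n_genes, ncol = n_cells
  )

  # Poisson sampling of a gamma-distributed mean gives negative binomial counts
  true_counts <- matrix(rpois(n_genes * n_cells, lambda = cell_means),
                        nrow = n_genes, ncol = n_cells)

  # Logistic dropout: lower expression means a higher chance of an extra zero
  log_means <- log(cell_means + 1e-8)
  drop_prob <- 1 / (1 + exp(-dropout_shape * (log_means - dropout_mid)))
  keep <- matrix(rbinom(n_genes * n_cells, size = 1, prob = 1 - drop_prob),
                 nrow = n_genes, ncol = n_cells)

  list(true_counts = true_counts, counts = true_counts * keep)
}

set.seed(1)
sim <- simulate_counts()
```

Comparing `mean(sim$counts == 0)` with `mean(sim$true_counts == 0)` shows how many additional zeros the dropout step introduces in this toy setting.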

Luke Zappia

July 17, 2017

Transcript

  1. Single-cells, simulation and kidneys in a dish
    Luke Zappia
    MCRI Bioinformatics
    @_lazappi_

  2. Bulk RNA-seq
    Reads: ACTGACTCCA, TCAGTACTGA, CGTGTCATAG, GATTGACCTA
    Gene  Sample 1
    A     43
    B     3
    C     17
    D     24

  3. Single-cell RNA-seq
    Reads (the same four reads are shown four times): ACTGACTCCA, TCAGTACTGA, CGTGTCATAG, GATTGACCTA
    Gene  Cell 1  Cell 2  Cell 3  Cell 4
    A     12      10      9       0
    B     0       0       1       4
    C     9       6       0       0
    D     7       0       4       0

  4. Moore’s Law
    Svensson et al. arXiv:1704.01379, 2017

  5. [No text transcribed]

  6. Unique Molecular Identifiers (UMIs)
    [Figure: 5’ to 3’ molecule with AAAA tail, paired with (PCR){BC}[UMI]TTTT]
    [Figure: Aligned reads (5), then De-duplication and counting (4)]
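
The de-duplication step named on this slide can be sketched in base R as follows. The reads table is made up for illustration, and the rule used here (reads sharing cell barcode, gene and UMI count as one molecule) is a simplification; real pipelines typically also account for sequencing errors in the UMI, which this sketch ignores.

```r
# Toy UMI de-duplication: reads with the same cell barcode, gene and UMI
# are treated as PCR copies of a single molecule. The data are invented.
reads <- data.frame(
  barcode = c("CELL1", "CELL1", "CELL1", "CELL1", "CELL2"),
  gene    = c("A", "A", "A", "B", "A"),
  umi     = c("ACGT", "ACGT", "TTGA", "ACGT", "ACGT"),
  stringsAsFactors = FALSE
)

# Keep one row per (barcode, gene, UMI) combination
molecules <- unique(reads)

# Count molecules per gene (rows) and cell (columns)
umi_counts <- table(gene = molecules$gene, cell = molecules$barcode)
print(umi_counts)
```

Here five aligned reads collapse to four molecules, because two of the reads from CELL1 share both gene A and the UMI ACGT.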

  7. Gene  Cell 1  Cell 2  Cell 3  Cell 4
     A     12      10      9       0
     B     0       0       1       4
     C     9       6       0       0
     D     7       0       4       0

  8. Gene  Cell 1  Cell 2  Cell 3  Cell 4
     A     12      10      9       0
     B     0       0       1       4
     C     9       6       0       0
     D     7       0       4       0
    Bad cell?
    Low expression?
    Cell type specific?
    Cell cycle?
    Dropout?

  9. Analysis
    Over 120 packages
    - www.scRNA-tools.org
    Identify cell types
    - Clustering
    - Lineage tracing

  10. [No text transcribed]

  11. Simulation
    Biology
    Evaluation

  12. Simulation

  13. Simulations
    Provide a truth to test against
    BUT
    - Often poorly documented and explained
    - Not easily reproducible or reusable
    - Don’t demonstrate similarity to real data

  14. [No text transcribed]

  15. Splatter
    Bioconductor package
    Collection of simulation methods
    Consistent, easy-to-use interface
    Functions for comparison

  16. Negative binomial

  17. Splat
    Negative binomial
    Expression outliers
    Defined library sizes
    Mean-variance trend
    Dropout

  18. Simulations
    Simple - Negative binomial
    Lun - NB with cell factors
      Lun ATL, Bach K, Marioni JC. Genome Biology (2016). DOI: 10.1186/s13059-016-0947-7.
    Lun 2 - Sampled NB with batch effects
      Lun ATL, Marioni JC. Biostatistics (2017). DOI: 10.1093/biostatistics/kxw055.
    scDD - NB with bimodality
      Korthauer KD, et al. Genome Biology (2016). DOI: 10.1186/s13059-016-1077-y.
    BASiCS - NB with spike-ins
      Vallejos CA, Marioni JC, Richardson S. PLoS Comp. Bio. (2015). DOI: 10.1371/journal.pcbi.1004333.

  19. Using Splatter
    library(splatter)
    # 1. Estimate parameters from a real dataset
    params1 <- splatEstimate(real.data)
    params2 <- simpleEstimate(real.data)
    # 2. Simulate datasets from the estimated parameters
    sim1 <- splatSimulate(params1, ...)
    sim2 <- simpleSimulate(params2, ...)
    # 3. Compare the simulations with each other and with the real data
    datasets <- list(Real = real.data,
                     Splat = sim1,
                     Simple = sim2)
    comp <- compareSCESets(datasets)
    diff <- diffSCESets(datasets, ref = "Real")

  20. Real data
    Tung et al. iPSCs, C1 capture
    3 HapMap individuals (A, B, C)
    3 plates each (A1-A3, B1-B3, C1-C3)
    200 random cells
    Tung P-Y et al. Sci. Rep. (2017). DOI: 10.1038/srep39921

  21. [Figure panels: Means; Difference in Means]

  22. [Figure panels: Zeros per cell; Difference in zeros]

  23. [Figure panels: Mean-zeros; Difference]

  24. [Figure: Rank, 1 to 8; Full-length]

  25. Complex simulations
    Groups Batches Paths

  26. Example evaluation
    Parameters
    - Estimated from Tung data
    Simulation
    - 400 cells
    - 3 groups (60%, 25%, 15%)
    - 10% DE (~1700 genes)
    - 20 replicates
    Method
    - SC3
    - k-means consensus clustering
    - Differential expression
    - Marker genes
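
A simulation of this shape can be sketched in base R as below. This is a simplified stand-in and not Splatter's implementation: the cell number and group proportions follow the slide, while the gene count, the gamma parameters, the per-group interpretation of the 10% DE probability, and the log-normal fold changes are all assumptions. It also omits the cell-level variation, outliers, library sizes and dropout that Splat includes.

```r
# Simplified group simulation (illustrative; not Splatter's implementation).
# Cell number and group proportions follow the slide; the rest is assumed.
set.seed(42)
n_cells <- 400
n_genes <- 1000                      # assumed, kept small for speed
group_prob <- c(0.60, 0.25, 0.15)
de_prob <- 0.10                      # assumed: chance a gene is DE in a given group

# Assign each cell to a group according to the group proportions
groups <- sample(seq_along(group_prob), n_cells, replace = TRUE, prob = group_prob)

# Baseline gene means, then multiplicative DE factors per group
base_means <- rgamma(n_genes, shape = 0.6, rate = 0.3)
de_factors <- matrix(1, nrow = n_genes, ncol = length(group_prob))
is_de <- matrix(runif(length(de_factors)) < de_prob, nrow = n_genes)
de_factors[is_de] <- exp(rnorm(sum(is_de), mean = 0, sd = 1))  # log-normal fold changes

# Expected expression of every gene in every cell, then Poisson counts
cell_means <- base_means * de_factors[, groups]
counts <- matrix(rpois(length(cell_means), cell_means), nrow = n_genes)
```

Because `groups` and `is_de` record the simulated truth, a clustering or differential expression method run on `counts` can be scored against them, which is the point of evaluating on simulated data.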

  27. [Figure panels: Clustering; Gene identification]

  28. Simulation summary
    Simulations are a great tool
    But they should be:
    - Reusable
    - Reproducible
    - Realistic
    Splatter is our solution (bioRxiv, DOI: 10.1101/133173)

  29. Biology

  30. The kidney
    OpenStax College, CC BY 3.0 via Wikimedia Commons

  31. Organoids
    [Figure: timeline from iPSCs to organoid. Days: 0, 4, 7, 10, 18, 25. Labels: CHIR, FGF9, FGF9, CHIR, Form pellets, No GF]

  32. [Figure labels: GATA3, ECAD, LTL, WT1; CD +, DT +, PT +, Glo]

  33. Fluidigm experiment
    4 organoids
    C1 capture
    Full-length
    No spike-ins

  34. Analysis
    Alignment - STAR
    Quantification - featureCounts
    Quality control - scater
    Clustering - SC3
    Gene detection - SC3
    Interpretation - Biologists

  35. Quality control
    Cells
    - Alignment
    - Quantification
    - Expression
    278 -> 155
    Genes
    - Expression
    - Class
    23388
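
The expression-based part of this kind of filtering can be sketched in base R as below. The talk's analysis used scater; this sketch does not use scater's API, the count matrix is synthetic, and every threshold is an arbitrary assumption rather than a value from the talk. Filters based on alignment or quantification statistics, and on gene class, are not shown.

```r
# Toy QC filter on a genes x cells count matrix (all thresholds are arbitrary).
set.seed(7)
counts <- matrix(rpois(50 * 12, lambda = 2), nrow = 50,
                 dimnames = list(paste0("gene", 1:50), paste0("cell", 1:12)))
counts[, 1:3] <- 0L                      # pretend three cells failed capture
counts[1:5, ] <- 0L                      # and five genes are never detected

lib_size <- colSums(counts)              # total counts per cell
n_detected <- colSums(counts > 0)        # number of genes detected per cell

# Keep cells with enough counts and enough detected genes
keep_cells <- lib_size >= 20 & n_detected >= 10

# Keep genes detected in at least two of the remaining cells
keep_genes <- rowSums(counts[, keep_cells, drop = FALSE] > 0) >= 2

filtered <- counts[keep_genes, keep_cells, drop = FALSE]
dim(filtered)
```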

  36. Clustering

  37. 10x experiment
    3 organoids
    Chromium capture
    UMI
    ~7000 cells

  38. Analysis
    Alignment - CellRanger
    Quantification - CellRanger
    Quality control - scater
    Clustering - Seurat
    Gene detection - Seurat
    Interpretation - Biologists

  39. Three clusters
    Vasculature
    Epithelium
    “Stroma”

  40. Many clusters

  41. Vasculature
    Proximal tubule
    Podocytes

  42. Mesangium Renal stroma

  43. Nephron?
    Neuronal?

  44. ?

  45. Cluster tree
    [Figure axis: Resolution, 0.01 and 0.1 to 1.0 in steps of 0.1]

  46. [Figure: Resolution axis as on slide 45]

  47. [Figure: Resolution axis as on slide 45; label: Vasculature]

  48. [Figure: Resolution axis as on slide 45; labels: Proximal tubule, Podocytes]

  49. [Figure: Resolution axis as on slide 45; label: Mesangium]

  50. [Figure: Resolution axis as on slide 45; label: Renal stroma]

  51. [Figure: Resolution axis as on slide 45; label: Nephron/neuronal?]

  52. [Figure: Resolution axis as on slide 45; label: ?]
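
The idea behind a cluster tree, clustering the same cells at several resolutions and following how cells move between clusterings, can be sketched in base R as below. This is an assumption-laden stand-in: k-means with an increasing number of clusters replaces the resolution parameter of the graph-based clustering used in the talk, and the two-dimensional data are synthetic.

```r
# Toy version of the idea behind a cluster tree: cluster the same cells at
# several "resolutions" and tabulate how cells move between clusterings.
# k-means with increasing k stands in for the resolution parameter here.
set.seed(3)
cells <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
               matrix(rnorm(100, mean = 5), ncol = 2),
               matrix(rnorm(100, mean = 10), ncol = 2))

ks <- 2:4
clusterings <- lapply(ks, function(k) kmeans(cells, centers = k, nstart = 10)$cluster)
names(clusterings) <- paste0("k", ks)

# Tree edges: proportion of each lower-resolution cluster's cells that end up
# in each cluster at the next resolution (each row sums to 1)
edges <- lapply(seq_len(length(ks) - 1), function(i) {
  tab <- table(from = clusterings[[i]], to = clusterings[[i + 1]])
  prop.table(tab, margin = 1)
})
names(edges) <- paste(names(clusterings)[-length(ks)],
                      names(clusterings)[-1], sep = "_to_")
edges
```

Rows with a single proportion near 1 correspond to clusters that stay intact at the next resolution, while rows split across several columns show where a cluster divides.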

  53. Summary
    Kidney organoids are complex
    More cells help
    Cluster relationships can be useful
    Background knowledge is vital

  54. Acknowledgements
    Everyone that makes tools and data available
    Supervisors
    Alicia Oshlack
    Melissa Little
    MCRI Bioinformatics
    Belinda Phipson
    Breon Schmidt
    MCRI KDDR
    Alex Combes

  55. bioconductor.org/packages/splatter
    bioRxiv: “Splatter: simulation of single-cell RNA sequencing data”
    @scRNAtools
    www.scRNA-tools.org
    @_lazappi_
    oshlacklab.com