PhD Europe 2018

Luke Zappia
September 27, 2018

The past few years have seen an explosion in the development of single-cell RNA-sequencing (scRNA-seq) technology, and it has quickly become a commonly used tool for interrogating complex tissues. Access to this new type of data has led to a corresponding surge in the production of statistical and computational tools to analyse it. We are cataloguing these tools in the scRNA-tools database (www.scRNA-tools.org). Deciding which of these tools to use for a specific task is difficult, and comprehensive evaluations and comparisons are required. One way to demonstrate how well tools perform at their selected task is by testing them on simulated data. To make this easier we developed Splatter, a Bioconductor R package that provides a consistent, easy-to-use interface to multiple models for simulating scRNA-seq data (https://bioconductor.org/packages/splatter). Independent simulation software avoids reliance on simulations that are not reproducible, that match a tool's own assumptions, or that do not demonstrate similarity to real datasets.

Even the most effective methods usually have parameters that affect how they perform. For scRNA-seq data, one of the analysis tasks that has received the most attention is defining groups of similar cells, usually through unsupervised clustering. Most clustering methods have parameters which, directly or indirectly, affect the number of clusters produced. The clustering resolution that is chosen can have a profound effect on further analysis and interpretation, but it is unclear how to make this choice. To aid analysts in deciding which clustering resolution to use we have developed clustering trees, a visualisation that shows how clusters form and change as the resolution increases. These trees can be produced using the clustree R package (http://cran.r-project.org/package=clustree) and are applicable to any clustering method. Clustering trees highlight instability that may indicate over-clustering and help analysts choose which resolution to use, particularly when combined with existing domain knowledge such as the expression of marker genes. This presentation will demonstrate our methods and tools using an scRNA-seq dataset we have generated to explore the cell type composition of kidney organoids.
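
The idea behind a clustering tree can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the iris measurements and k-means stand in for an scRNA-seq dataset and its clustering method, and `tree_edges` is a hypothetical helper, not a clustree function. Each edge links a cluster at one resolution to a cluster at the next, and its in-proportion is the share of the higher-resolution cluster's samples that came from the lower-resolution cluster.

```r
# Assumption: iris and k-means are stand-ins for real data and a real method.
set.seed(1)
features <- scale(iris[, 1:4])
ks <- 1:4

# One column of cluster labels per resolution, named K1, K2, ...
clusterings <- sapply(ks, function(k) {
  kmeans(features, centers = k, nstart = 10)$cluster
})
colnames(clusterings) <- paste0("K", ks)

# Hypothetical helper: build the edge table between adjacent resolutions.
tree_edges <- function(clusterings) {
  edges <- list()
  for (i in seq_len(ncol(clusterings) - 1)) {
    from <- clusterings[, i]
    to <- clusterings[, i + 1]
    # Count samples shared by each (lower, higher) cluster pair
    counts <- as.data.frame(table(from = from, to = to),
                            stringsAsFactors = FALSE)
    counts <- counts[counts$Freq > 0, ]
    to_sizes <- table(to)
    edges[[i]] <- data.frame(
      from_res = colnames(clusterings)[i],
      from_cluster = counts$from,
      to_res = colnames(clusterings)[i + 1],
      to_cluster = counts$to,
      count = counts$Freq,
      # Share of the higher-resolution cluster arriving along this edge
      in_prop = counts$Freq / as.vector(to_sizes[counts$to])
    )
  }
  do.call(rbind, edges)
}

edges <- tree_edges(clusterings)
print(edges)
```

Clusters that receive several edges with low in-proportions are the unstable ones the abstract refers to. If clustree is installed, `clustree::clustree(as.data.frame(clusterings), prefix = "K")` should draw the corresponding tree.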

Transcript

  1. Tools, simulations and trees for scRNA-seq
    Luke Zappia
    @_lazappi_

  2. Parkville precinct

  3. MCRI Bioinformatics
    [Figure: overview of group software and research areas]
    Software: bpipe, Corset, Lace, Necklace, GOseq, Splatter, clinker, JAFFA, Cpipe, Ximmer, Schism, missMethyl, STRetch, scRNA-tools, clustree
    Areas: Structural, Clinical, STRs, Single-cell, Pipelines, Gene sets, Fusions, Assembly superTranscripts, Methylation, Visualisation

  4. MCRI Kidney Development
    OpenStax College, CC BY 3.0 via Wikimedia Commons

  5. Kidney organoids
    [Figure: differentiation protocol timeline from iPSCs to organoid]
    Days: 0, 4, 7, 10, 18, 25
    Labels: CHIR, FGF9, FGF9, CHIR, Form pellets, No GF

  6. 1 Tools
    2 Simulations
    3 Clustering trees
    4 Analysis

  7. Dataset
    4 Organoids
    10x Chromium
    2 Batches (3 + 1)
    7937 cells (6649 + 1288)
    Identify cell types

  8. Svensson et al. DOI: 10.1038/nprot.2017.149

  9. www.scRNA-tools.org
    “Exploring the single-cell RNA-seq analysis landscape with the scRNA-tools database”
    PLoS Computational Biology (2018) DOI: 10.1371/journal.pcbi.1006245

  10. Simulations
    2

  11. Simulations
    Provide a truth to test against
    BUT
    - Often poorly documented and explained
    - Not easily reproducible or reusable
    - Don’t demonstrate similarity to real data

  12. “Splatter: simulation of single-cell RNA sequencing data”
    Genome Biology (2017) DOI: 10.1186/s13059-017-1305-0

  13. Splat
    Negative binomial
    Expression outliers
    Defined library sizes
    Mean-variance trend
    Dropout
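
The components listed on this slide can be illustrated with a rough base R sketch. It is not Splatter's implementation: every parameter value below is an arbitrary assumption chosen for illustration, and the steps are simplified. It only shows how a gamma-Poisson (negative binomial) model, expression outliers, defined library sizes, a mean-variance trend and dropout can fit together.

```r
# Simplified, Splat-like simulation in base R (illustrative values only).
set.seed(42)
n_genes <- 200
n_cells <- 50

# Gene means drawn from a gamma distribution
gene_means <- rgamma(n_genes, shape = 0.6, rate = 0.3)

# Expression outliers: a small fraction of genes get inflated means
is_outlier <- rbinom(n_genes, 1, 0.05) == 1
gene_means[is_outlier] <- median(gene_means) *
  rlnorm(sum(is_outlier), meanlog = 4, sdlog = 0.5)

# Defined library sizes: scale gene proportions by a per-cell library size
lib_sizes <- rlnorm(n_cells, meanlog = 9, sdlog = 0.3)
cell_means <- outer(gene_means / sum(gene_means), lib_sizes)

# Mean-variance trend: biological coefficient of variation falls with the mean
bcv <- 0.1 + 1 / sqrt(cell_means)

# Gamma-distributed means followed by Poisson sampling gives negative binomial counts
noisy_means <- rgamma(length(cell_means),
                      shape = 1 / bcv^2, scale = cell_means * bcv^2)
counts <- matrix(rpois(length(noisy_means), noisy_means), nrow = n_genes)

# Dropout: zero counts with a probability that decreases with the log mean
dropout_mid <- 0
drop_prob <- plogis(dropout_mid - log(cell_means))
keep <- rbinom(length(cell_means), 1, 1 - drop_prob)
counts <- counts * keep

dim(counts)
mean(counts == 0)
```

The result is a genes-by-cells count matrix. In practice the parameters would be estimated from a real dataset rather than set by hand.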

  14. Simulation models
    Simple - Negative binomial
    Lun - NB with cell factors
    DOI: 10.1186/s13059-016-0947-7
    Lun2 - Sampled NB with batch effects
    DOI: 10.1093/biostatistics/kxw055
    scDD - NB with bimodality
    DOI: 10.1186/s13059-016-1077-y
    BASiCS - NB with spike-ins
    DOI: 10.1371/journal.pcbi.1004333
    mfa - Bifurcating pseudotime trajectory
    DOI: 10.12688/wellcomeopenres.11087.1
    PhenoPath - Pseudotime with gene types
    DOI: 10.1038/s41467-018-04696-6
    ZINB-WaVE - Sophisticated ZINB
    DOI: 10.1186/s13059-018-1406-4
    SparseDC - Clusters across two conditions
    DOI: 10.1093/nar/gkx1113

  15. Using Splatter
    1. Estimate
    params1 <- splatEstimate(real.data)
    params2 <- simpleEstimate(real.data)
    2. Simulate
    sim1 <- splatSimulate(params1, ...)
    sim2 <- simpleSimulate(params2, ...)
    3. Compare
    datasets <- list(Real = real.data,
                     Splat = sim1,
                     Simple = sim2)
    comp <- compareSCESets(datasets)
    diff <- diffSCESets(datasets, ref = "Real")

  16. [Figure: two panels comparing simulated datasets with the real data]
    Panels: Distribution of mean expression (axis: Mean log2(CPM + 1)); Difference in mean expression (axis: Rank Difference Mean log2(CPM + 1))
    Datasets: Real, Splat, Splat (Drop), Simple, Lun, Lun2, Lun2 (ZINB), scDD, BASiCS, mfa, PhenoPath, SparseDC, ZINB-WaVE

  17. [Figure: Rank of MAD from real data]
    Simulations: Splat, Splat (Drop), Simple, Lun, Lun2, Lun2 (ZINB), scDD, BASiCS, mfa, PhenoPath, SparseDC, ZINB-WaVE
    Properties: Mean, Variance, Mean-Variance, Library size, % Zeros (Cell), % Zeros (Gene), Mean-Zeros

  18. CountSimQC
    DESeq2 dispersions Feature correlations
    Soneson et al. DOI: 10.1093/bioinformatics/btx631

  19. https://github.com/YosefLab/SymSim
    https://github.com/bvieth/powsimR

  20. Complex simulations
    Groups Batches Paths
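
As a hedged sketch of how these three kinds of simulation might be requested from the splatter package: the calls below follow the package's documented `splatSimulate` interface as I understand it, and the gene and cell numbers are arbitrary assumptions. The code is skipped when splatter is not installed.

```r
# Sketch only: parameter values are arbitrary, not those used in the talk.
if (requireNamespace("splatter", quietly = TRUE)) {
  library(splatter)

  params <- newSplatParams(nGenes = 500, seed = 1)

  # Groups: two discrete cell groups of roughly equal size
  sim_groups <- splatSimulate(params, batchCells = 100,
                              group.prob = c(0.5, 0.5),
                              method = "groups", verbose = FALSE)

  # Batches: two batches of 50 cells each
  sim_batches <- splatSimulate(params, batchCells = c(50, 50),
                               verbose = FALSE)

  # Paths: cells placed along a continuous trajectory
  sim_paths <- splatSimulate(params, batchCells = 100, de.prob = 0.2,
                             method = "paths", verbose = FALSE)

  print(table(sim_groups$Group))
  print(table(sim_batches$Batch))
} else {
  message("splatter is not installed; skipping the example")
}
```

Each call returns a SingleCellExperiment whose cell metadata records the simulated group, batch or path assignment, which provides the truth to test an analysis tool against.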

  21. Clustering trees
    3

  22. Clustering methods
    > 25% of all tools

  23. How many clusters?

  24. A tree of clusters?

  25. “Clustering trees: a visualisation for evaluating clusterings at multiple resolutions”
    GigaScience (2018) DOI: 10.1093/gigascience/giy083

  26. Organoid data

  27. GATA3
    ECAD
    LTL
    WT1
    CD +
    DT +
    PT +
    Glo

  28. Analysis steps
    Alignment - CellRanger
    Quantification - CellRanger
    Quality control - scater
    Integration - Seurat
    Clustering - Seurat
    Gene detection - Seurat
    Ordering - Monocle

  29. Stroma Endothelium
    Cell cycle Podocyte
    Epithelium

  30. Podocyte
    Early podocyte
    Early proximal
    Early distal
    Progenitor

  31. Progenitor
    Early tubule
    Podocyte

  32. Human dataset
    16 week fetal kidney
    3178 cells
    10x Chromium
    Lindström et al. “Conserved and Divergent Features of Mesenchymal Progenitor Cell Types within the Cortical Nephrogenic Niche of the Human and Mouse Kidney”
    J Am Soc Nephrol (2018) DOI: 10.1681/ASN.2017080890

  33. Stroma
    Endothelium
    Cell cycle Podocyte
    Nephron
    Progenitor
    Immune

  34. Fetal kidney Organoid

  35. Podocyte (human only)
    Early podocyte
    Proximal
    Distal
    Progenitor
    Stroma

  36. Podocyte
    Early proximal
    Early distal
    Early podocyte
    Progenitor
    Diff. progenitor
    Human pod.
    Stroma
    Fetal kidney Organoid

  37. install.packages("clustree")
    Paper
    doi.org/10.1093/gigascience/giy083
    biocLite("splatter")
    Paper
    doi.org/10.1186/s13059-017-1305-0
    www.scRNA-tools.org
    Paper
    doi.org/10.1371/journal.pcbi.1006245
    @_lazappi_
    oshlacklab.com
    github.com/lazappi
