PhD Europe 2018

Luke Zappia
September 27, 2018

The past few years have seen an explosion in the development of single-cell RNA-sequencing (scRNA-seq) technology, and it has quickly become a commonly used tool for interrogating complex tissues. Access to this new type of data has led to a corresponding surge in the production of statistical and computational tools to analyse it, which we are cataloguing in the scRNA-tools database (www.scRNA-tools.org). Deciding which of these tools to use for a specific task is difficult, and comprehensive evaluations and comparisons are required. One way to demonstrate how well tools perform at their selected task is by testing them on simulated data. To make this easier we developed Splatter, a Bioconductor R package that provides a consistent, easy-to-use interface to multiple models for simulating scRNA-seq data (https://bioconductor.org/packages/splatter). Providing independent simulation software avoids relying on simulations that are not reproducible, that match a tool's own assumptions or that do not demonstrate similarity to real datasets.

Even the most effective methods usually have parameters that affect how they perform. For scRNA-seq data one of the analysis tasks that has received the most attention is defining groups of similar cells, usually through unsupervised clustering. Most clustering methods have parameters that, directly or indirectly, affect the number of clusters produced. The clustering resolution that is chosen can have a profound effect on further analysis and interpretation, but it is unclear how to make this choice. To aid analysts in deciding which clustering resolution to use we have developed clustering trees, a visualisation that shows how clusters form and change as the resolution increases. These trees can be produced using the clustree R package (http://cran.r-project.org/package=clustree) and are applicable to any clustering method. Clustering trees highlight instability that may indicate overclustering and help analysts choose which resolution to use, particularly when combined with existing domain knowledge such as the expression of marker genes. This presentation will demonstrate our methods and tools using an scRNA-seq dataset we have generated to explore the cell type composition of kidney organoids.

Transcript

  1. Tools, simulations and
    trees for scRNA-seq
    Luke Zappia
    @_lazappi_

  3. Parkville precinct

  4. MCRI Bioinformatics
    Software: bpipe, Corset, Lace, Necklace, GOseq, Splatter, clinker, JAFFA, Cpipe, Ximmer, Schism, missMethyl, STRetch, scRNA-tools, clustree
    Areas: Structural, Clinical, STRs, Single-cell, Pipelines, Gene sets, Fusions, Assembly, superTranscripts, Methylation, Visualisation

  5. MCRI Kidney Development
    OpenStax College, CC BY 3.0 via Wikimedia Commons

  6. Kidney organoids
    Figure: differentiation timeline from iPSCs to organoid (days 0, 4, 7, 10, 18, 25)
    Labels: CHIR, FGF9, form pellets, no GF

  7. 1 Tools
    2 Simulations
    3 Clustering trees
    4 Analysis

  8. Dataset
    4 Organoids
    10x Chromium
    2 Batches (3 + 1)
    7937 cells (6649 + 1288)
    Identify cell types

  9. Tools
    1

  10. Svensson et al. DOI: 10.1038/nprot.2017.149

  14. www.scRNA-tools.org
    “Exploring the single-cell RNA-seq analysis landscape with the scRNA-tools database”
    PLoS Computational Biology (2018) DOI: 10.1371/journal.pcbi.1006245

  17. Simulations
    2

  18. Simulations
    Provide a truth to test against
    BUT
    - Often poorly documented and explained
    - Not easily reproducible or reusable
    - Don’t demonstrate similarity to real data

  19. “Splatter: simulation of single-cell RNA sequencing data”
    Genome Biology (2017) DOI: 10.1186/s13059-017-1305-0

  20. Splat
    Negative binomial
    Expression outliers
    Defined library sizes
    Mean-variance trend
    Dropout

  21. Simulation models
    Simple - Negative binomial
    Lun - NB with cell factors
    DOI: 10.1186/s13059-016-0947-7
    Lun2 - Sampled NB with batch effects
    DOI: 10.1093/biostatistics/kxw055
    scDD - NB with bimodality
    DOI: 10.1186/s13059-016-1077-y
    BASiCS - NB with spike-ins
    DOI: 10.1371/journal.pcbi.1004333
    mfa - Bifurcating pseudotime trajectory
    DOI: 10.12688/wellcomeopenres.11087.1
    PhenoPath - Pseudotime with gene types
    DOI: 10.1038/s41467-018-04696-6
    ZINB-WaVE - Sophisticated ZINB
    DOI: 10.1186/s13059-018-1406-4
    SparseDC - Clusters across two conditions
    DOI: 10.1093/nar/gkx1113

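  22. Using Splatter
    # 1. Estimate parameters from a real dataset
    params1 <- splatEstimate(real.data)
    params2 <- simpleEstimate(real.data)
    # 2. Simulate new datasets from the estimated parameters
    sim1 <- splatSimulate(params1, ...)
    sim2 <- simpleSimulate(params2, ...)
    # 3. Compare the simulations with each other and with the real data
    # (function names as shown on the slide; later Splatter releases
    # call them compareSCEs() and diffSCEs())
    datasets <- list(Real = real.data,
                     Splat = sim1,
                     Simple = sim2)
    comp <- compareSCESets(datasets)
    diff <- diffSCESets(datasets, ref = "Real")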

  23. Figure: Distribution of mean expression (axis: Mean log2(CPM + 1)) and Difference in mean expression (axis labels: Rank Difference, Mean log2(CPM + 1))
    Datasets: Real, Splat, Splat (Drop), Simple, Lun, Lun2, Lun2 (ZINB), scDD, BASiCS, mfa, PhenoPath, SparseDC, ZINB-WaVE
  24. Figure: Rank of MAD from real data
    Properties: Mean, Variance, Mean-Variance, Library size, % Zeros (Cell), % Zeros (Gene), Mean-Zeros
    Simulations: Splat, Splat (Drop), Simple, Lun, Lun2, Lun2 (ZINB), scDD, BASiCS, mfa, PhenoPath, SparseDC, ZINB-WaVE
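
One way such a ranking could be computed is sketched below in base R. It assumes the distance is the median absolute difference between the sorted values of a property (here, gene means) in a simulation and in the real data; the exact definition behind the slide may differ. The "real" values and the two simulations `SimA` and `SimB` are made up for illustration.

```r
# Sketch: rank simulations by how far a property is from the real data.
# The data and the exact distance definition are illustrative assumptions.
set.seed(42)
real_means <- rgamma(500, shape = 0.6, rate = 0.3)

sim_means <- list(
  SimA = real_means * exp(rnorm(500, sd = 0.1)),  # close to the real data
  SimB = rgamma(500, shape = 2, rate = 0.1)       # very different scale
)

# Median absolute difference between sorted simulated and real values
mad_from_real <- function(sim, real) {
  median(abs(sort(sim) - sort(real)))
}

mads <- vapply(sim_means, mad_from_real, numeric(1), real = real_means)

# Rank 1 is the simulation closest to the real data for this property
ranks <- rank(mads)
```

Repeating this for each property (variance, library size, proportion of zeros and so on) would give one rank per simulation per property, which is the layout summarised on the slide.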

  25. countsimQC
    Figure panels: DESeq2 dispersions, Feature correlations
    Soneson et al. DOI: 10.1093/bioinformatics/btx631

  26. https://github.com/YosefLab/SymSim
    https://github.com/bvieth/powsimR

  27. Complex simulations
    Groups Batches Paths
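
A minimal sketch of how these three kinds of simulation might be requested from Splatter is shown below. It assumes the splatter package is installed from Bioconductor; the gene and cell numbers, group probabilities and seed are arbitrary values picked so the example runs quickly.

```r
library(splatter)

# Small parameter set (arbitrary values); giving two entries to batchCells
# asks for two batches of 40 cells each
params <- newSplatParams(nGenes = 500, batchCells = c(40, 40), seed = 1)

# Groups: discrete cell populations, here two groups in a 60:40 ratio
sim_groups <- splatSimulate(params, method = "groups",
                            group.prob = c(0.6, 0.4), verbose = FALSE)

# Paths: cells placed along a continuous trajectory
sim_paths <- splatSimulate(params, method = "paths", verbose = FALSE)

# Cross-tabulate simulated groups against batches
table(sim_groups$Group, sim_groups$Batch)
```

Both calls return SingleCellExperiment objects, with the simulated group and batch of each cell stored in the cell metadata.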

  28. Clustering trees
    3

  29. Clustering methods
    > 25% of all tools

  30. How many clusters?

  31. A tree of clusters?

  34. “Clustering trees: a visualisation for evaluating clusterings at multiple resolutions”
    GigaScience (2018) DOI: 10.1093/gigascience/giy083
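
The idea behind a clustering tree can be sketched in base R: clusters at each resolution are nodes, and an edge links two clusters at neighbouring resolutions whenever they share samples. The cluster labels below are hypothetical. The in-proportion is the share of the higher-resolution cluster that arrives along each edge, as described in the paper.

```r
# Hypothetical cluster labels for ten samples at two resolutions
low_res  <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2)
high_res <- c(1, 1, 1, 2, 2, 3, 3, 3, 3, 3)

# Count the samples shared by each pair of clusters
overlap <- table(from = low_res, to = high_res)
edges <- as.data.frame(overlap, stringsAsFactors = FALSE)
names(edges)[3] <- "count"

# Keep only pairs that share at least one sample: the tree's edges
edges <- edges[edges$count > 0, ]

# In-proportion: edge count divided by the size of the cluster it points to
high_sizes <- table(high_res)
edges$in_prop <- edges$count / as.numeric(high_sizes[edges$to])

edges
```

Here high-resolution cluster 3 draws samples from both low-resolution clusters (in-proportions 0.2 and 0.8). Nodes with several incoming edges of this kind are the sort of instability a clustering tree makes visible.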

  35. Organoid data

  37. NPHS1
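
Colouring tree nodes by a marker such as NPHS1 can be done with clustree's `node_colour` and `node_colour_aggr` arguments. The sketch below uses the built-in iris data as a stand-in, because the organoid dataset is not part of this transcript: k-means clusterings play the role of the clustering resolutions and `Petal.Length` plays the role of a marker gene. It assumes the clustree package is installed.

```r
library(clustree)

set.seed(1)
features <- scale(iris[, 1:4])

# One column of cluster labels per resolution, sharing the prefix "K"
clusterings <- sapply(1:5, function(k) {
  kmeans(features, centers = k, nstart = 10)$cluster
})
colnames(clusterings) <- paste0("K", 1:5)

# Stand-in for marker gene expression (hypothetical column name)
tree_data <- data.frame(clusterings, marker = iris$Petal.Length)

# Colour each node by the mean "marker" value of its samples
tree <- clustree(tree_data, prefix = "K",
                 node_colour = "marker", node_colour_aggr = "mean")
```

Printing `tree` draws the plot. Branches where the marker is concentrated suggest clusters that correspond to a known cell type.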

  38. Analysis
    4

  39. GATA3
    ECAD
    LTL
    WT1
    CD +
    DT +
    PT +
    Glo

  40. Analysis steps
    Alignment: CellRanger
    Quantification: CellRanger
    Quality control: scater
    Integration: Seurat
    Clustering: Seurat
    Gene detection: Seurat
    Ordering: Monocle

  41. Stroma Endothelium
    Cell cycle Podocyte
    Epithelium

  42. Podocyte
    Early podocyte
    Early proximal
    Early distal
    Progenitor

  43. Progenitor
    Early tubule
    Podocyte

  44. Human dataset
    16 week fetal kidney
    3178 cells
    10x Chromium
    Lindström et al. “Conserved and Divergent Features of Mesenchymal Progenitor Cell Types within the Cortical Nephrogenic Niche of the Human and Mouse Kidney”
    J Am Soc Nephrol (2018) DOI: 10.1681/ASN.2017080890

  45. Stroma
    Endothelium
    Cell cycle Podocyte
    Nephron
    Progenitor
    Immune

  46. Fetal kidney Organoid

  47. Podocyte (human only)
    Early podocyte
    Proximal
    Distal
    Progenitor
    Stroma

  48. Podocyte
    Early proximal
    Early distal
    Early podocyte
    Progenitor
    Diff. progenitor
    Human pod.
    Stroma
    Fetal kidney Organoid

  49. install.packages("clustree")
    Paper: doi.org/10.1093/gigascience/giy083
    biocLite("splatter")
    Paper: doi.org/10.1186/s13059-017-1305-0
    www.scRNA-tools.org
    Paper: doi.org/10.1371/journal.pcbi.1006245
    @_lazappi_
    oshlacklab.com
    github.com/lazappi
