Clustering trees for visualising scRNA-seq data

Single-cell RNA-sequencing is commonly used to interrogate complex tissues in order to identify and compare the cell types present. This type of experiment is particularly prevalent in the developmental setting. A key step in this approach is assigning cells to different clusters that are assumed to be distinct cell types. Although this can be done by comparison with reference datasets, cells are more routinely grouped using unsupervised clustering, and we have catalogued more than 60 scRNA-seq clustering methods. Most clustering methods have parameters that affect the number of clusters produced, either by specifying an exact number, through a parameter that controls the clustering resolution, or indirectly through other parameters. The resolution that is chosen can have a profound effect on further analysis, but it is unclear how to make this choice. Existing clustering metrics often score only single clusters or resolutions, or require datasets to be perturbed and clustered multiple times, which can be infeasible for large datasets. Here we present clustering trees as an alternative visualisation that shows the relationship between clusters as the clustering resolution increases. These trees can highlight instability that may indicate overclustering and help choose which resolution to use, particularly when combined with existing domain knowledge such as the expression of marker genes. More generally, clustering trees are a compact, information-dense visualisation that can serve as an alternative to plotting cells in reduced dimensions such as t-SNE. Here we explain how clustering trees are produced using the clustree R package (http://cran.r-project.org/package=clustree) and illustrate how they can be used with examples of scRNA-seq data from kidney organoids.

Presented at the Oz Single Cells meeting, July 2018.

Luke Zappia

July 15, 2018

Transcript

  1. Clustering trees
    for visualising
    scRNA-seq data
    Luke Zappia
    @_lazappi_

  2. Zappia L, Phipson B, Oshlack A. 2018. DOI:10.1371/journal.pcbi.1006245

  3. Many clustering tools
    > 25% of all tools
    Data from www.scRNA-tools.org

  5. How many
    clusters?

  8. A tree of
    clusters?

  11. Weighting edges
    In proportion = (number of cells on edge) / (number of cells in high-res cluster)

  12. Some
    examples

  13. My data
    iPSCs organoid

  16. NPHS1

  17. NPHS1

  18. [Figure panels: Cell cycle; SC3 stability; Number of genes]

  20. [Two plots with axes t-SNE 1 and t-SNE 2]

  21. Summary
    Choosing the number of clusters is hard but important
    A clustering tree can help by showing:
    - Relationships between clusters
    - Which clusters are distinct
    - Where samples are changing
    Compact, information dense visualisation
    - Alternative to t-SNE plots (or similar)

  22. Acknowledgements
    Everyone that makes tools and data available
    MCRI Bioinformatics: Belinda Phipson
    MCRI KDDR: Alex Combes
    Supervisors: Alicia Oshlack, Melissa Little
    @_lazappi_
    oshlacklab.com
    lazappi.github.io/clustree
    Paper: doi.org/10.1093/gigascience/giy083
    Slides: tidyurl.com/clustree-OzSingleCells
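
The edge weighting defined on slide 11 can be illustrated with a small sketch. This is not the clustree implementation (clustree is an R package); it is a minimal Python illustration, assuming each clustering is supplied as a list of per-cell labels in the same cell order. The function name `in_proportions` and the toy labels are hypothetical.

```python
from collections import Counter


def in_proportions(low_res, high_res):
    """Map each (low-res cluster, high-res cluster) edge to its in-proportion.

    in-proportion = cells on the edge / cells in the high-res cluster
    """
    if len(low_res) != len(high_res):
        raise ValueError("both clusterings must label the same cells")

    # Number of cells shared by each pair of clusters (the edge size).
    edge_counts = Counter(zip(low_res, high_res))
    # Total number of cells in each higher-resolution cluster.
    high_sizes = Counter(high_res)

    return {
        (low, high): count / high_sizes[high]
        for (low, high), count in edge_counts.items()
    }


# Toy example: six cells clustered at two resolutions (made-up labels).
low = ["A", "A", "A", "B", "B", "B"]
high = ["x", "x", "y", "y", "y", "z"]

edges = in_proportions(low, high)
for (parent, child), weight in sorted(edges.items()):
    print(f"{parent} -> {child}: {weight:.2f}")
```

In this toy data, cluster `y` receives cells from both `A` and `B`, so its two incoming edges have in-proportions below 1, while `x` and `z` each come from a single parent. The in-proportions of all edges into any one higher-resolution cluster sum to 1.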
