Clustering trees for visualising scRNA-seq data


Single-cell RNA sequencing (scRNA-seq) is commonly used to interrogate complex tissues in order to identify and compare the cell types present. This type of experiment is particularly prevalent in the developmental setting. A key step in this approach is assigning cells to clusters that are assumed to be distinct cell types. Although this can be done by comparison with reference datasets, cells are more routinely grouped using unsupervised clustering, and we have catalogued more than 60 scRNA-seq clustering methods. Most clustering methods have parameters that affect the number of clusters produced, either by specifying an exact number, through a parameter that controls the clustering resolution, or indirectly through other parameters. The resolution chosen can have a profound effect on further analysis, but it is unclear how to make this choice. Existing clustering metrics often score only single clusters or resolutions, or require datasets to be perturbed and clustered multiple times, which can be infeasible for large datasets. Here we present clustering trees as an alternative visualisation that shows the relationships between clusters as the clustering resolution increases. These trees can highlight instability that may indicate overclustering and help choose which resolution to use, particularly when combined with existing domain knowledge such as the expression of marker genes. More generally, clustering trees are a compact, information-dense visualisation that can serve as an alternative to plotting cells in reduced dimensions, for example with t-SNE. Here we explain how clustering trees are produced using the clustree R package (http://cran.r-project.org/package=clustree) and illustrate how they can be used with examples of scRNA-seq data from kidney organoids.
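To make the construction concrete, here is a minimal sketch in plain Python of how the edges of a clustering tree could be derived from cluster labels at successive resolutions. It is not the clustree API (clustree is an R package); the function name `clustering_tree_edges`, the dictionary layout, and the toy labels are all hypothetical and chosen only for illustration. Each edge is weighted by the in-proportion from the talk: the number of cells on the edge divided by the number of cells in the higher-resolution cluster.

```python
from collections import Counter


def clustering_tree_edges(assignments):
    """Build clustering-tree edges from per-resolution cluster labels.

    `assignments` maps a resolution name to a list of cluster labels, one per
    cell, with cells in the same order at every resolution. Resolutions are
    taken in insertion order, from lowest to highest.
    """
    resolutions = list(assignments)
    edges = []
    for low, high in zip(resolutions, resolutions[1:]):
        low_labels, high_labels = assignments[low], assignments[high]
        if len(low_labels) != len(high_labels):
            raise ValueError("every resolution must label the same cells")
        # Cells shared by each (low-res cluster, high-res cluster) pair.
        overlap = Counter(zip(low_labels, high_labels))
        # Size of each cluster at the higher resolution.
        high_sizes = Counter(high_labels)
        for (src, dst), n_cells in sorted(overlap.items()):
            edges.append({
                "from": (low, src),
                "to": (high, dst),
                "count": n_cells,
                # In-proportion: cells on edge / cells in high-res cluster.
                "in_prop": n_cells / high_sizes[dst],
            })
    return edges


# Hypothetical toy data: six cells clustered at three resolutions.
toy = {
    "k1": [0, 0, 0, 0, 0, 0],
    "k2": [0, 0, 0, 1, 1, 1],
    "k3": [0, 0, 1, 2, 2, 1],
}
edges = clustering_tree_edges(toy)
for edge in edges:
    print(edge)
```

In this toy example, cluster 1 at resolution `k3` receives half of its cells from each of the two `k2` clusters, so both incoming edges have an in-proportion of 0.5. A node with several low in-proportion parents like this is the kind of instability that may indicate overclustering.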

Presented at the Oz Single Cells meeting, July 2018.


Luke Zappia

July 15, 2018

Transcript

  1. Clustering trees for visualising scRNA-seq data Luke Zappia @_lazappi_

  2. Zappia L, Phipson B, Oshlack A. 2018. DOI:10.1371/journal.pcbi.1006245

  3. Many clustering tools: > 25% of all tools (data from www.scRNA-tools.org)
  5. How many clusters?

  8. A tree of clusters?

  11. Weighting edges

    In proportion = (number of cells on edge) / (number of cells in high-res cluster)
  12. Some examples

  13. My data iPSCs organoid

  16. NPHS1

  17. NPHS1

  18. Cell cycle; SC3 stability; Number of genes

  20. [Figure: t-SNE plots; axes t-SNE 1 and t-SNE 2]

  21. Summary

    Choosing the number of clusters is hard but important
    A clustering tree can help by showing:
    - Relationships between clusters
    - Which clusters are distinct
    - Where samples are changing
    Compact, information-dense visualisation
    - Alternative to t-SNE plots (or similar)
  22. Acknowledgements

    Everyone that makes tools and data available
    MCRI Bioinformatics: Belinda Phipson
    MCRI KDDR: Alex Combes
    Supervisors: Alicia Oshlack, Melissa Little
    @_lazappi_, oshlacklab.com, lazappi.github.io/clustree
    Paper: doi.org/10.1093/gigascience/giy083
    Slides: tidyurl.com/clustree-OzSingleCells