Single-cell RNA-sequencing is commonly used to interrogate complex tissues in order to identify cell types, particularly in the developmental setting. A key analysis step is using gene expression to form clusters of cells assumed to be distinct cell types. We have catalogued more than 70 currently available scRNA-seq clustering methods. Most clustering methods have parameters which affect the number of clusters produced, either by specifying an exact number, or indirectly through other parameters. The clustering resolution that is chosen can have a profound effect on further analysis, but it is unclear how to make this choice. Existing clustering quality metrics often score only single clusters or resolutions, or require perturbation and re-clustering which can be infeasible for large datasets.
Here we present clustering trees, a visualisation that shows the relationship between clusters with increasing clustering resolution. In a clustering tree each cluster is represented as a graph node with edges representing the overlap in samples (cells) between clusters at neighbouring resolutions. Clustering trees are a compact, information-dense visualisation that can be used to highlight instability that may indicate over clustering or display a range of information including gene expression. Importantly, clustering trees display information across resolutions, in contrast to more common visualisations which only show a single clustering. Here we explain the methods developed to produce clustering trees used in the clustree R package (http://cran.r-project.org/package=clustree) and illustrate how we have used these trees for visualization of scRNA-seq data from kidney organoids.
Visualising trees to
choose clusters for
Cell cycle SC3 stability
Choosing the number of clusters is hard but important
A clustering tree can help by showing:
- Relationships between clusters
- Which clusters are distinct
- Where samples are changing
Compact, information dense visualisation
- Alternative to t-SNE plots (or similar)
Everyone that makes tools and data available