Single-cell RNA-sequencing (scRNA-seq) is commonly used to interrogate complex tissues to identify cell types, particularly in the developmental setting. A key analysis step is using gene expression to form clusters of cells that are assumed to be distinct cell types. We have catalogued more than 70 currently available scRNA-seq clustering methods. Most clustering methods have parameters that affect the number of clusters produced, either directly, by specifying an exact number, or indirectly, through other parameters. The chosen clustering resolution can have a profound effect on further analysis, but it is unclear how to make this choice. Existing clustering quality metrics often score only single clusters or single resolutions, or require perturbation and re-clustering, which can be infeasible for large datasets.
Here we present clustering trees, a visualisation that shows the relationship between clusters as clustering resolution increases. In a clustering tree each cluster is represented as a graph node, with edges representing the overlap in samples (cells) between clusters at neighbouring resolutions. Clustering trees are a compact, information-dense visualisation that can highlight instability, which may indicate over-clustering, or display additional information such as gene expression. Importantly, clustering trees display information across resolutions, in contrast to more common visualisations, which show only a single clustering. We explain the methods developed to produce clustering trees, as implemented in the clustree R package (http://cran.r-project.org/package=clustree), and illustrate how we have used these trees to visualise scRNA-seq data from kidney organoids.
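
The node-and-edge structure described above can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the clustree implementation or its API. The function name `clustering_tree_edges`, the dictionary keys, and the toy labels are all hypothetical. The sketch assumes that each clustering is given as one label per cell, in the same cell order, and that the clusterings are ordered by increasing resolution. The `in_prop` value (the share of a higher-resolution cluster's cells that arrive along an edge) is included as one plausible way to weight edges, and is an assumption here.

```python
from collections import Counter


def clustering_tree_edges(clusterings):
    """Build edges between clusters at neighbouring resolutions.

    clusterings: list of label lists, ordered by increasing resolution.
    Every inner list holds one cluster label per cell, in the same cell order.
    Returns a list of dicts, one per (lower cluster, higher cluster) overlap.
    """
    edges = []
    for level in range(len(clusterings) - 1):
        lower, higher = clusterings[level], clusterings[level + 1]
        if len(lower) != len(higher):
            raise ValueError("all clusterings must label the same cells")

        # Number of cells shared by each (lower cluster, higher cluster) pair.
        overlap = Counter(zip(lower, higher))
        # Size of each cluster at the higher resolution.
        higher_sizes = Counter(higher)

        for (src, dst), count in sorted(overlap.items()):
            edges.append({
                "from": (level, src),      # node = (resolution index, cluster label)
                "to": (level + 1, dst),
                "count": count,            # cells shared by the two clusters
                # Assumed edge weight: share of the target cluster on this edge.
                "in_prop": count / higher_sizes[dst],
            })
    return edges


# Toy data (hypothetical): six cells clustered at three resolutions.
toy_clusterings = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 2, 1, 1, 2],
]

for edge in clustering_tree_edges(toy_clusterings):
    print(edge)
```

In the toy data, cluster 2 at the highest resolution draws half of its cells from each of two different lower-resolution clusters. A node with several low-proportion incoming edges like this is the kind of instability a clustering tree is meant to make visible.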