Visualising trees to choose clusters for scRNA-seq data

Luke Zappia
November 27, 2018

Single-cell RNA-sequencing is commonly used to interrogate complex tissues in order to identify cell types, particularly in the developmental setting. A key analysis step is using gene expression to form clusters of cells that are assumed to be distinct cell types. We have catalogued more than 70 currently available scRNA-seq clustering methods. Most clustering methods have parameters that affect the number of clusters produced, either by specifying an exact number or indirectly through other parameters. The clustering resolution that is chosen can have a profound effect on further analysis, but it is unclear how to make this choice. Existing clustering quality metrics often score only single clusters or resolutions, or require perturbation and re-clustering, which can be infeasible for large datasets.

Here we present clustering trees, a visualisation that shows the relationship between clusters with increasing clustering resolution. In a clustering tree each cluster is represented as a graph node, with edges representing the overlap in samples (cells) between clusters at neighbouring resolutions. Clustering trees are a compact, information-dense visualisation that can be used to highlight instability that may indicate over-clustering, or to display a range of information, including gene expression. Importantly, clustering trees display information across resolutions, in contrast to more common visualisations, which show only a single clustering. Here we explain the methods developed to produce clustering trees used in the clustree R package (http://cran.r-project.org/package=clustree) and illustrate how we have used these trees for visualisation of scRNA-seq data from kidney organoids.
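
The node-and-edge structure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the clustree implementation: the function name `clustering_tree_edges`, the toy labels, and the `in_prop` field (taken here as the edge count divided by the size of the higher-resolution cluster) are all hypothetical choices made for the example.

```python
from collections import Counter


def clustering_tree_edges(assignments):
    """Build clustering-tree edges from per-resolution cluster labels.

    `assignments` is a list of label lists, ordered from lowest to highest
    resolution; position i in every list refers to the same cell.
    (Hypothetical helper written for illustration only.)
    """
    edges = []
    for level in range(len(assignments) - 1):
        lower, higher = assignments[level], assignments[level + 1]
        if len(lower) != len(higher):
            raise ValueError("every resolution must label the same cells")
        # Count the cells shared by each (lower cluster, higher cluster) pair.
        overlap = Counter(zip(lower, higher))
        higher_sizes = Counter(higher)
        for (src, dst), count in sorted(overlap.items()):
            edges.append({
                "from": (level, src),       # node = (resolution index, cluster)
                "to": (level + 1, dst),
                "count": count,             # cells shared by the two clusters
                # Assumed definition: share of the higher-resolution cluster
                # that arrives along this edge.
                "in_prop": count / higher_sizes[dst],
            })
    return edges


# Toy data: six cells clustered at three increasing resolutions.
toy = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 2, 2, 1],
]
edges = clustering_tree_edges(toy)
for edge in edges:
    print(edge)
```

In the toy data, cluster 1 at the highest resolution draws half of its cells from each of the two clusters at the previous resolution. A node with several incoming edges of low proportion is the kind of instability the abstract suggests may indicate over-clustering.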

  1. Visualising trees to choose clusters for scRNA-seq data Luke Zappia

  6. NPHS1

  7. Cell cycle; SC3 stability; Number of genes

  8. Summary

    Choosing the number of clusters is hard but important

    A clustering tree can help by showing:
    - Relationships between clusters
    - Which clusters are distinct
    - Where samples are changing

    Compact, information-dense visualisation
    - Alternative to t-SNE plots (or similar)
  9. Acknowledgements

    Everyone that makes tools and data available

    MCRI Bioinformatics
    Belinda Phipson
    MCRI KDDR
    Alex Combes

    Supervisors
    Alicia Oshlack
    Melissa Little

    @_lazappi_
    oshlacklab.com
    lazappi.github.io/clustree
    Paper doi.org/10.1093/gigascience/giy083
    Slides tinyurl.com/abacbs2018-clustree