Visualising trees to choose clusters for scRNA-seq data

Luke Zappia
November 27, 2018

Single-cell RNA-sequencing (scRNA-seq) is commonly used to interrogate complex tissues in order to identify cell types, particularly in the developmental setting. A key analysis step is using gene expression to form clusters of cells that are assumed to be distinct cell types. We have catalogued more than 70 currently available scRNA-seq clustering methods. Most of these methods have parameters that affect the number of clusters produced, either by specifying an exact number or indirectly through other parameters. The clustering resolution that is chosen can have a profound effect on further analysis, but it is unclear how to make this choice. Existing clustering quality metrics often score only single clusters or single resolutions, or require perturbation and re-clustering, which can be infeasible for large datasets.

Here we present clustering trees, a visualisation that shows the relationships between clusters as the clustering resolution increases. In a clustering tree each cluster is represented as a graph node, with edges representing the overlap in samples (cells) between clusters at neighbouring resolutions. Clustering trees are a compact, information-dense visualisation that can be used to highlight instability that may indicate over-clustering, or to display a range of information including gene expression. Importantly, clustering trees display information across resolutions, in contrast to more common visualisations which only show a single clustering. Here we explain the methods developed to produce clustering trees used in the clustree R package (http://cran.r-project.org/package=clustree) and illustrate how we have used these trees for visualisation of scRNA-seq data from kidney organoids.
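
To make the node-and-edge description above concrete, here is a minimal sketch in Python of how the edges between two neighbouring resolutions could be derived from per-cell cluster labels. It is not the clustree implementation: the function name `clustering_tree_edges`, the dictionary keys and the toy labels are all hypothetical, and the "in-proportion" (cells on an edge divided by the size of the cluster the edge points to) is used here as one plausible way to summarise overlap.

```python
from collections import Counter


def clustering_tree_edges(low_res, high_res):
    """Build edges between clusters at two neighbouring resolutions.

    low_res and high_res are equal-length sequences holding the cluster
    label of each cell at the lower and higher resolution respectively.
    (Hypothetical helper for illustration only.)
    """
    if len(low_res) != len(high_res):
        raise ValueError("both clusterings must label the same cells")

    # Number of cells shared by each (lower cluster, higher cluster) pair.
    edge_counts = Counter(zip(low_res, high_res))
    # Size of each cluster at the higher resolution.
    to_sizes = Counter(high_res)

    edges = []
    for (src, dst), count in sorted(edge_counts.items()):
        edges.append({
            "from": src,
            "to": dst,
            "count": count,
            # Fraction of the destination cluster that arrives via this edge.
            "in_prop": count / to_sizes[dst],
        })
    return edges


# Toy example: six cells clustered at two resolutions (made-up labels).
low = [0, 0, 0, 1, 1, 1]
high = [0, 0, 1, 1, 2, 2]
for edge in clustering_tree_edges(low, high):
    print(edge)
```

In this toy example the higher-resolution cluster 1 receives cells from both lower-resolution clusters, so neither incoming edge has an in-proportion of 1. A cluster with several low-proportion incoming edges is the kind of pattern a clustering tree makes visible as possible instability.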


  1. Visualising trees to
    choose clusters for
    scRNA-seq data
    Luke Zappia

  2. [Figure slide; recoverable labels: "Cell cycle", "SC3 stability", "Number of" (truncated)]

  3. Summary
    Choosing the number of clusters is hard but important
    A clustering tree can help by showing:
    - Relationships between clusters
    - Which clusters are distinct
    - Where samples are changing
    Compact, information-dense visualisation
    - Alternative to t-SNE plots (or similar)

  4. Acknowledgements
    Everyone that makes tools and data available
    MCRI Bioinformatics
    Belinda Phipson
    Alex Combes
    Alicia Oshlack
    Melissa Little
