
Visualising trees to choose clusters for scRNA-seq data

Luke Zappia
November 27, 2018


Single-cell RNA-sequencing is commonly used to interrogate complex tissues in order to identify cell types, particularly in the developmental setting. A key analysis step is using gene expression to form clusters of cells that are assumed to be distinct cell types. We have catalogued more than 70 currently available scRNA-seq clustering methods. Most clustering methods have parameters that affect the number of clusters produced, either by specifying an exact number or indirectly through other parameters. The clustering resolution that is chosen can have a profound effect on further analysis, but it is unclear how to make this choice. Existing clustering quality metrics often score only single clusters or resolutions, or require perturbation and re-clustering, which can be infeasible for large datasets.

Here we present clustering trees, a visualisation that shows the relationship between clusters with increasing clustering resolution. In a clustering tree each cluster is represented as a graph node, with edges representing the overlap in samples (cells) between clusters at neighbouring resolutions. Clustering trees are a compact, information-dense visualisation that can be used to highlight instability that may indicate over-clustering, or to display a range of information including gene expression. Importantly, clustering trees display information across resolutions, in contrast to more common visualisations which only show a single clustering. Here we explain the methods developed to produce clustering trees used in the clustree R package (http://cran.r-project.org/package=clustree) and illustrate how we have used these trees for visualisation of scRNA-seq data from kidney organoids.
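The node-and-edge construction described above can be sketched in a few lines. This is a minimal Python illustration, not the clustree implementation: the function name `clustering_tree_edges`, the field names, and the toy labels are all hypothetical, and the `in_prop` value (the share of the higher-resolution cluster that arrives along an edge) is assumed here as one reasonable way to weight edges.

```python
from collections import Counter


def clustering_tree_edges(assignments):
    """Build clustering-tree edges from cluster labels at several resolutions.

    assignments: list of label lists, ordered from lowest to highest
    resolution. Each inner list holds one cluster label per sample (cell),
    with samples in the same order at every resolution.
    """
    edges = []
    for level in range(len(assignments) - 1):
        lower, upper = assignments[level], assignments[level + 1]
        if len(lower) != len(upper):
            raise ValueError("every resolution must label the same samples")
        # Samples shared by each (lower cluster, upper cluster) pair.
        overlap = Counter(zip(lower, upper))
        # Size of each cluster at the higher resolution.
        upper_sizes = Counter(upper)
        for (src, dst), count in sorted(overlap.items()):
            edges.append({
                "from": (level, src),       # node = (resolution index, cluster)
                "to": (level + 1, dst),
                "count": count,             # samples moving along this edge
                "in_prop": count / upper_sizes[dst],  # assumed edge weight
            })
    return edges


# Toy data (hypothetical): eight cells clustered at two resolutions.
low_res = [0, 0, 0, 0, 1, 1, 1, 1]
high_res = [0, 0, 1, 1, 2, 2, 2, 1]

edges = clustering_tree_edges([low_res, high_res])
for edge in edges:
    print(edge)
```

In this toy case, high-resolution cluster 1 receives cells from both low-resolution clusters, so it has two incoming edges with proportions below one. Nodes with several low-proportion incoming edges are the kind of instability a clustering tree is meant to make visible.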


Transcript

  1. Visualising trees to choose clusters for scRNA-seq data
    Luke Zappia
    @_lazappi_


  2. [Figure labels: Cell cycle; SC3 stability; Number of genes]


  3. Summary
    Choosing the number of clusters is hard but important
    A clustering tree can help by showing:
    - Relationships between clusters
    - Which clusters are distinct
    - Where samples are changing
    Compact, information-dense visualisation
    - Alternative to t-SNE plots (or similar)


  4. Acknowledgements
    Everyone that makes tools and data available
    MCRI Bioinformatics
    Belinda Phipson
    MCRI KDDR
    Alex Combes
    @_lazappi_
    oshlacklab.com
    lazappi.github.io/clustree
    Paper: doi.org/10.1093/gigascience/giy083
    Slides: tinyurl.com/abacbs2018-clustree
    Supervisors: Alicia Oshlack, Melissa Little
