Clustering trees for visualising scRNA-seq data

July 15, 2018

Single-cell RNA-sequencing is commonly used to interrogate complex tissues in order to identify and compare the cell types present. This type of experiment is particularly prevalent in the developmental setting. A key step in this approach is assigning cells to different clusters that are assumed to be distinct cell types. Although this can be done by comparison with reference datasets, cells are more routinely grouped using unsupervised clustering, and we have catalogued more than 60 scRNA-seq clustering methods. Most clustering methods have parameters that affect the number of clusters produced, either by specifying an exact number, through a parameter that controls the clustering resolution, or indirectly through other parameters. The resolution that is chosen can have a profound effect on further analysis, but it is unclear how to make this choice. Existing clustering metrics often score only single clusters or resolutions, or require datasets to be perturbed and clustered multiple times, which can be infeasible for large datasets. Here we present clustering trees as an alternative visualisation that shows the relationship between clusters as the clustering resolution increases. These trees can highlight instability that may indicate overclustering and help choose which resolution to use, particularly when combined with existing domain knowledge such as the expression of marker genes. More generally, clustering trees are a compact, information-dense visualisation that can serve as an alternative to plotting cells in reduced dimensions such as t-SNE. Here we explain how clustering trees are produced using the clustree R package (http://cran.r-project.org/package=clustree) and illustrate how they can be used with examples of scRNA-seq data from kidney organoids.

Presented at the Oz Single Cells meeting, July 2018.

Luke Zappia

Transcript

Clustering trees for visualising scRNA-seq data Luke Zappia @_lazappi_
Zappia L, Phipson B, Oshlack A. 2018. DOI:10.1371/journal.pcbi.1006245
Many clustering tools: > 25% of all tools (data from www.scRNA-tools.org)
How many clusters?
A tree of clusters?
Weighting edges
In proportion = (number of cells on edge) / (number of cells in high res cluster)
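The edge weighting on this slide can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the clustree implementation; the function name `clustering_tree_edges`, the dictionary keys, and the toy labels are all hypothetical. It assumes each cell has one cluster label at a lower resolution and one at a higher resolution, counts the cells on each edge, and divides by the size of the higher resolution cluster.

```python
from collections import Counter


def clustering_tree_edges(low_res, high_res):
    """Return edges between clusters at two adjacent resolutions.

    low_res and high_res are equal-length sequences of cluster labels,
    one entry per cell. Each edge records the number of cells shared by
    a low resolution cluster and a high resolution cluster, plus the
    in-proportion: edge count / size of the high resolution cluster.
    """
    if len(low_res) != len(high_res):
        raise ValueError("label vectors must have one entry per cell")
    # Number of cells on each (low cluster, high cluster) edge
    edge_counts = Counter(zip(low_res, high_res))
    # Number of cells in each high resolution cluster
    high_sizes = Counter(high_res)
    return [
        {"from": lo, "to": hi, "count": n, "in_prop": n / high_sizes[hi]}
        for (lo, hi), n in sorted(edge_counts.items())
    ]


# Hypothetical toy labels for six cells at two resolutions
low = ["A", "A", "A", "B", "B", "B"]
high = ["a", "a", "b", "b", "c", "c"]
for edge in clustering_tree_edges(low, high):
    print(edge)
```

In this toy example, cluster "b" draws half of its cells from each of two parent clusters, so both of its incoming edges have an in-proportion of 0.5. A node with several low in-proportion edges is the kind of pattern that may point to instability.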
Some examples
My data iPSCs organoid
NPHS1
Cell cycle, SC3 stability, number of genes
[Figure: t-SNE plots, axes t-SNE 1 and t-SNE 2]
Summary
Choosing the number of clusters is hard but important
A clustering tree can help by showing:
- Relationships between clusters
- Which clusters are distinct
- Where samples are changing
Compact, information dense visualisation
- Alternative to t-SNE plots (or similar)
Acknowledgements
Everyone that makes tools and data available
MCRI Bioinformatics: Belinda Phipson
MCRI KDDR: Alex Combes
Supervisors: Alicia Oshlack, Melissa Little
@_lazappi_
oshlacklab.com
lazappi.github.io/clustree
Paper: doi.org/10.1093/gigascience/giy083
Slides: tidyurl.com/clustree-OzSingleCells