Building a clustering tree

Luke Zappia @_lazappi_

My data
Image: OpenStax College, CC BY 3.0 via Wikimedia Commons

Single-cell RNA-sequencing
- Gene activity in thousands of cells
- ~20000 rows x ~7000 columns
- Look for different cell types

How many clusters?

Can we build a tree of clusters?

Nodes
- Resolution (k)
- Cluster
- Size
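
As a rough illustration of the node attributes listed above, the sketch below derives one (resolution, cluster, size) row per cluster from a table of cluster assignments. It assumes the five-sample example table (S1 to S5, Res1 to Res3) that appears later in the talk; the names `assignments` and `node_table` are hypothetical and not taken from the author's R tutorial.

```python
from collections import Counter

# Assumed input: cluster label of samples S1..S5 at each resolution
# (values copied from the example table later in the talk).
assignments = {
    "Res1": [1, 2, 1, 1, 2],
    "Res2": [1, 1, 2, 3, 1],
    "Res3": [1, 2, 2, 3, 4],
}


def node_table(assignments):
    """Return one (resolution, cluster, size) row per cluster."""
    nodes = []
    for res, labels in assignments.items():
        # Counter maps each cluster label to the number of samples in it.
        for cluster, size in sorted(Counter(labels).items()):
            nodes.append((res, cluster, size))
    return nodes


nodes = node_table(assignments)
for row in nodes:
    print(row)
```

Each printed row corresponds to one node of the tree; the size could then be mapped to the node's drawn size.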

Edges
- Cluster from (lower resolution)
- Cluster to (higher resolution)
- Number
- Proportion

Proportions

For an edge carrying n samples:
p_from = n / size_low
p_to = n / size_high

k = 1 to k = 2: one cluster of size 100 splits into clusters of size 60 and 40
- 100 to 60: n = 60, p_from = 0.6, p_to = 1.0
- 100 to 40: n = 40, p_from = 0.4, p_to = 1.0

k = 2 to k = 3: clusters of size 60 and 40 become clusters of size 40, 30 and 30
- 60 to 40: n = 40, p_from = 0.67, p_to = 1.0
- 60 to first 30: n = 20, p_from = 0.33, p_to = 0.67
- 40 to first 30: n = 10, p_from = 0.25, p_to = 0.33
- 40 to second 30: n = 30, p_from = 0.75, p_to = 1.0
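
The two proportions can be checked with a few lines of Python. This is only a sketch: the function name `edge_proportions` is hypothetical, and the (n, size_low, size_high) triples are one reading of the k = 2 to k = 3 slide, in which the size-60 cluster sends 40 and 20 samples onward and the size-40 cluster sends 10 and 30.

```python
def edge_proportions(n, size_low, size_high):
    """Return (p_from, p_to) for an edge carrying n samples.

    p_from = n / size_low  (share of the lower-resolution cluster)
    p_to   = n / size_high (share of the higher-resolution cluster)
    """
    return n / size_low, n / size_high


# Assumed edges for k = 2 -> k = 3: (n, size_low, size_high)
example_edges = [(40, 60, 40), (20, 60, 30), (10, 40, 30), (30, 40, 30)]

for n, size_low, size_high in example_edges:
    p_from, p_to = edge_proportions(n, size_low, size_high)
    print(f"n = {n}: p_from = {p_from:.2f}, p_to = {p_to:.2f}")
```

Rounded to two decimals, the results agree with the values on the slide (0.67 and 1.0, 0.33 and 0.67, 0.25 and 0.33, 0.75 and 1.0).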

Algorithm

For each resolution r = 1, ..., R - 1
    For each unique cluster C_low in r
        For each unique cluster C_high in r + 1
            Count C_low, C_high pairs

Example cluster assignments:

      Res1  Res2  Res3
S1    1     1     1
S2    2     1     2
S3    1     2     2
S4    1     3     3
S5    2     1     4

First step on the example:
- r = 1 compares Res1 and Res2
- Unique clusters in Res1: 1, 2
- Unique clusters in Res2: 1, 2, 3
- Count of (C_low = 1, C_high = 1) pairs: 1

Edge table

ResFrom  ClusterFrom  ResTo  ClusterTo  Number
Res1     1            Res2   1          1
Res1     1            Res2   2          1
Res1     1            Res2   3          1
Res1     2            Res2   1          2
Res1     2            Res2   2          0
Res1     2            Res2   3          0
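
The counting loop and the resulting edge table can be sketched in Python as follows. This is an illustrative version under stated assumptions, not the code from the R tutorial: the names `assignments` and `edge_table` are hypothetical, the input is the S1 to S5 example table, and the resolutions are assumed to be stored in order from lowest to highest.

```python
from collections import Counter

# Assumed input: cluster label of samples S1..S5 at each resolution,
# ordered from lowest to highest resolution.
assignments = {
    "Res1": [1, 2, 1, 1, 2],
    "Res2": [1, 1, 2, 3, 1],
    "Res3": [1, 2, 2, 3, 4],
}


def edge_table(assignments):
    """Return (ResFrom, ClusterFrom, ResTo, ClusterTo, Number) rows."""
    rows = []
    resolutions = list(assignments)
    # For each resolution r = 1, ..., R - 1, pair it with r + 1.
    for res_low, res_high in zip(resolutions, resolutions[1:]):
        low, high = assignments[res_low], assignments[res_high]
        # Count how many samples share each (C_low, C_high) pair.
        pair_counts = Counter(zip(low, high))
        for c_low in sorted(set(low)):
            for c_high in sorted(set(high)):
                # A Counter returns 0 for pairs that never occur.
                number = pair_counts[(c_low, c_high)]
                rows.append((res_low, c_low, res_high, c_high, number))
    return rows


edges = edge_table(assignments)
for row in edges:
    print(row)
```

The first six rows correspond to the Res1 to Res2 edges in the table above. Rows with Number 0 represent pairs of clusters that share no samples, so they would presumably be filtered out before drawing the tree.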

Tree
[Figure: clustering tree. Resolution: 0.01, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]

Summary
Choosing the number of clusters is hard but important
A clustering tree can help by showing:
- Relationships between clusters
- Which clusters are distinct
- Where samples are changing

Acknowledgements
Everyone that makes tools and data available
Supervisors: Alicia Oshlack, Melissa Little
MCRI Bioinformatics: Belinda Phipson
MCRI KDDR: Alex Combes
@_lazappi_
oshlacklab.com
R Tutorial: lazappi.id.au/building-a-clustering-tree/
Slides: speakerdeck.com/lazappi/building-a-clustering-tree
