My data
OpenStax College, CC BY 3.0 via Wikimedia Commons
Single-cell RNA-sequencing
Gene activity in thousands of cells
~20,000 rows (genes) x ~7,000 columns (cells)
Look for different cell types
How many clusters?
Can we build a tree of clusters?
Nodes
- Resolution (k)
- Cluster
- Size
Edges
- Cluster from (lower resolution)
- Cluster to (higher resolution)
- Number
- Proportion
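The node and edge attributes listed above map naturally onto two small record types. The sketch below is one possible representation in Python, not the structure used by any particular package. The class and field names (`Node`, `Edge`, `from_node`, `to_node`) are assumptions chosen for illustration, and the choice to store the proportion relative to the higher-resolution cluster is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """One cluster at one clustering resolution."""
    resolution: int  # k, the resolution this clustering was run at
    cluster: int     # cluster label within that resolution
    size: int        # number of samples assigned to the cluster


@dataclass(frozen=True)
class Edge:
    """Samples shared by a lower-resolution and a higher-resolution cluster."""
    from_node: Node    # cluster at the lower resolution
    to_node: Node      # cluster at the higher resolution
    number: int        # samples present in both clusters
    proportion: float  # assumed here: number / size of the to_node


# Hypothetical example: a cluster of 100 samples at k = 1 sends 60 samples
# to a cluster at k = 2 that contains only those 60 samples.
root = Node(resolution=1, cluster=1, size=100)
child = Node(resolution=2, cluster=1, size=60)
edge = Edge(from_node=root, to_node=child, number=60, proportion=60 / child.size)
```

Frozen dataclasses keep nodes hashable, so they can be used as dictionary keys when a tree is assembled from many edges.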
Proportions
A single cluster of size 100 at k = 1 splits into two clusters of size 60 and 40 at k = 2, giving two edges with n = 60 and n = 40.
$p_{\text{from}} = n / \text{size}_{\text{low}}$
$p_{\text{to}} = n / \text{size}_{\text{high}}$
Edge with n = 60: $p_{\text{from}} = 0.6$, $p_{\text{to}} = 1.0$
Edge with n = 40: $p_{\text{from}} = 0.4$, $p_{\text{to}} = 1.0$
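The two proportions are simple ratios, so the k = 1 to k = 2 example can be checked directly. This is a minimal sketch in Python; the function name `edge_proportions` and its argument names are assumptions made for illustration.

```python
def edge_proportions(n, size_low, size_high):
    """Return (p_from, p_to) for an edge carrying n samples.

    p_from = n / size_low  : share of the lower-resolution cluster on this edge
    p_to   = n / size_high : share of the higher-resolution cluster on this edge
    """
    if size_low <= 0 or size_high <= 0:
        raise ValueError("cluster sizes must be positive")
    return n / size_low, n / size_high


# The cluster of 100 samples at k = 1 splits into clusters of 60 and 40 at k = 2.
first_edge = edge_proportions(n=60, size_low=100, size_high=60)
second_edge = edge_proportions(n=40, size_low=100, size_high=40)
```

Both edges have a to-proportion of 1.0 because each k = 2 cluster draws all of its samples from the single k = 1 cluster.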
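Proportions
Node sizes: 100 at k = 1; 60 and 40 at k = 2; 40, 30 and 30 at k = 3.
Edges from k = 2 to k = 3:
From size  To size  n   p_from  p_to
60         40       40  0.67    1.0
60         30       20  0.33    0.67
40         30       10  0.25    0.33
40         30       30  0.75    1.0
The edges with n = 20 and n = 10 enter the same cluster of size 30; the edge with n = 30 enters the other.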
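The k = 2 to k = 3 values can be reproduced from the node sizes and edge counts alone. The sketch below is a check in Python under stated assumptions: the cluster labels `A`, `B`, `X`, `Y`, `Z` are hypothetical names that do not appear on the slide, and the assignment of counts to edges is the one consistent with the proportions shown.

```python
# Node sizes at k = 2 and k = 3 (labels are hypothetical).
size_low = {"A": 60, "B": 40}
size_high = {"X": 40, "Y": 30, "Z": 30}

# Number of samples shared by each (lower, higher) cluster pair.
edge_counts = {("A", "X"): 40, ("A", "Y"): 20, ("B", "Y"): 10, ("B", "Z"): 30}

proportions = {}
for (low, high), n in edge_counts.items():
    p_from = n / size_low[low]   # share of the k = 2 cluster on this edge
    p_to = n / size_high[high]   # share of the k = 3 cluster on this edge
    proportions[(low, high)] = (round(p_from, 2), round(p_to, 2))

# Sanity check: the edges leaving a cluster account for all of its samples.
for low, size in size_low.items():
    assert sum(n for (l, _), n in edge_counts.items() if l == low) == size

for edge, (p_from, p_to) in proportions.items():
    print(edge, p_from, p_to)
```

The cluster labelled `Y` is the only one with two incoming edges, which is why its to-proportions (0.67 and 0.33) are both below 1.0.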
Algorithm
For each resolution r = 1, ..., R - 1
    For each unique cluster C_low in r
        For each unique cluster C_high in r + 1
            Count (C_low, C_high) pairs

Sample  Res1  Res2  Res3
S1      1     1     1
S2      2     1     2
S3      1     2     2
S4      1     3     3
S5      2     1     4
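The nested loops above can be sketched in a few lines of Python using the five-sample table from the slide. This is an illustrative implementation, not the author's code; the names `assignments` and `count_pairs` are assumptions, and pairs that share no samples are dropped, which the slide does not specify.

```python
# Cluster assignments from the table: one entry per sample,
# one label per resolution (Res1, Res2, Res3).
assignments = {
    "S1": [1, 1, 1],
    "S2": [2, 1, 2],
    "S3": [1, 2, 2],
    "S4": [1, 3, 3],
    "S5": [2, 1, 4],
}


def count_pairs(assignments):
    """Count samples shared by each cluster pair in adjacent resolutions.

    Returns a dict keyed by (r, c_low, c_high), where r is the 1-based
    index of the lower resolution.
    """
    rows = list(assignments.values())
    n_resolutions = len(rows[0])
    counts = {}
    for r in range(n_resolutions - 1):
        low = [row[r] for row in rows]
        high = [row[r + 1] for row in rows]
        for c_low in sorted(set(low)):
            for c_high in sorted(set(high)):
                n = sum(1 for a, b in zip(low, high) if a == c_low and b == c_high)
                if n > 0:  # keep only pairs that share at least one sample
                    counts[(r + 1, c_low, c_high)] = n
    return counts


pair_counts = count_pairs(assignments)
```

The explicit loops mirror the pseudocode. Applying `collections.Counter` to `zip(low, high)` would give the same counts in a single pass over the samples.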
Algorithm
First iteration, r = 1: compare Res1 and Res2
Unique clusters C_low in Res1: 1, 2
Unique clusters C_high in Res2: 1, 2, 3
Count of (C_low = 1, C_high = 1) pairs: 1
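Combining the pair counts with the from- and to-proportions defined earlier yields a full edge list for the tree. The following Python sketch works under those assumptions; `build_edges` and the dictionary keys are hypothetical names, not part of any library.

```python
from collections import Counter

# Same table as on the slides: sample -> cluster label at Res1, Res2, Res3.
assignments = {
    "S1": [1, 1, 1],
    "S2": [2, 1, 2],
    "S3": [1, 2, 2],
    "S4": [1, 3, 3],
    "S5": [2, 1, 4],
}


def build_edges(assignments):
    """Return one edge per cluster pair that shares samples in adjacent resolutions."""
    rows = list(assignments.values())
    edges = []
    for r in range(len(rows[0]) - 1):
        low = [row[r] for row in rows]
        high = [row[r + 1] for row in rows]
        size_low = Counter(low)    # node sizes at the lower resolution
        size_high = Counter(high)  # node sizes at the higher resolution
        for (c_low, c_high), n in sorted(Counter(zip(low, high)).items()):
            edges.append({
                "from": (r + 1, c_low),   # (resolution index, cluster label)
                "to": (r + 2, c_high),
                "number": n,
                "p_from": n / size_low[c_low],
                "p_to": n / size_high[c_high],
            })
    return edges


edges = build_edges(assignments)
for edge in edges:
    print(edge)
```

Here `Counter` replaces the explicit nested loops: only pairs that actually occur are visited, so no zero-count edges are produced.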
Summary
Choosing the number of clusters is hard but important
A clustering tree can help by showing:
- Relationships between clusters
- Which clusters are distinct
- Where samples are changing
Acknowledgements
Everyone who makes tools and data available
Supervisors
Alicia Oshlack
Melissa Little
MCRI Bioinformatics
Belinda Phipson
MCRI KDDR
Alex Combes
@_lazappi_
oshlacklab.com
R Tutorial: lazappi.id.au/building-a-clustering-tree/
Slides: speakerdeck.com/lazappi/building-a-clustering-tree