Slide 1

Slide 1 text

Building a clustering tree
Luke Zappia
@_lazappi_

Slide 2

Slide 2 text

My data

Single-cell RNA-sequencing
Gene activity in thousands of cells
~20000 rows x ~7000 columns
Look for different cell types

Image: OpenStax College, CC BY 3.0 via Wikimedia Commons

Slide 3

Slide 3 text

How many clusters?

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

Can we build a tree of clusters?

Slide 7

Slide 7 text

Nodes
- Resolution (k)
- Cluster
- Size
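As a minimal sketch of how the node attributes listed here (resolution, cluster, size) could be derived, the Python below counts cluster sizes from per-sample labels. The `assignments` data and the `node_table` helper are illustrative assumptions, not part of the talk or its R tutorial.

```python
from collections import Counter

# Illustrative cluster labels: one label per sample at each resolution.
assignments = {
    "Res1": [1, 2, 1, 1, 2],
    "Res2": [1, 1, 2, 3, 1],
}


def node_table(assignments):
    """Return one (resolution, cluster, size) row per cluster."""
    rows = []
    for resolution, labels in assignments.items():
        sizes = Counter(labels)  # number of samples in each cluster
        for cluster in sorted(sizes):
            rows.append((resolution, cluster, sizes[cluster]))
    return rows


nodes = node_table(assignments)
for row in nodes:
    print(row)
```

Each row corresponds to one node of the tree: a cluster at a given resolution together with the number of samples it contains.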

Slide 8

Slide 8 text

Edges
- Cluster from (lower resolution)
- Cluster to (higher resolution)

Slide 9

Slide 9 text

Edges
- Cluster from (lower resolution)
- Cluster to (higher resolution)
- Number
- Proportion

Slide 10

Slide 10 text

Proportions

k = 1: one cluster, size = 100
k = 2: two clusters, size = 60 and size = 40
Edges from k = 1 to k = 2: n = 60 and n = 40

Slide 11

Slide 11 text

Proportions

k = 1: one cluster, size 100
k = 2: two clusters, sizes 60 and 40
Edges from k = 1 to k = 2: n = 60 and n = 40

p_from = n / size_low
p_to = n / size_high

Slide 12

Slide 12 text

Proportions

k = 1: one cluster, size 100
k = 2: two clusters, sizes 60 and 40

Edge         n     p_from   p_to
100 -> 60    60    0.6      1.0
100 -> 40    40    0.4      1.0

Slide 13

Slide 13 text

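Proportions

k = 1: one cluster, size 100
k = 2: two clusters, sizes 60 and 40
k = 3: three clusters, sizes 40, 30 and 30

Edges from k = 2 to k = 3:

Edge        n     p_from   p_to
60 -> 40    40    0.67     1.0
60 -> 30    20    0.33     0.67
40 -> 30    10    0.25     0.33
40 -> 30    30    0.75     1.0

The n = 20 and n = 10 edges enter the same size-30 cluster; the n = 30 edge enters the other.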
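The two proportions on this slide can be checked with a few lines of Python. This is only a sketch: the function name `edge_proportions` is an assumption, and the edge list is read off the k = 2 to k = 3 numbers above.

```python
def edge_proportions(n, size_low, size_high):
    """Return (p_from, p_to) for an edge carrying n samples."""
    p_from = n / size_low   # share of the lower-resolution cluster on this edge
    p_to = n / size_high    # share of the higher-resolution cluster from this edge
    return p_from, p_to


# Edges between k = 2 and k = 3 as (n, size_low, size_high).
edges = [
    (40, 60, 40),
    (20, 60, 30),
    (10, 40, 30),
    (30, 40, 30),
]

for n, size_low, size_high in edges:
    p_from, p_to = edge_proportions(n, size_low, size_high)
    print(f"n={n}: p_from={p_from:.2f}, p_to={p_to:.2f}")
```

Rounded to two decimals, the printed values match the proportions shown on the slide.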

Slide 14

Slide 14 text

Algorithm

For each resolution r = 1, ..., R - 1
    For each unique cluster C_low in r
        For each unique cluster C_high in r + 1
            Count (C_low, C_high) pairs

      Res1  Res2  Res3
S1    1     1     1
S2    2     1     2
S3    1     2     2
S4    1     3     3
S5    2     1     4

Slide 15

Slide 15 text

Algorithm

For each resolution r = 1, ..., R - 1                  -> Res1, Res2
    For each unique cluster C_low in r
        For each unique cluster C_high in r + 1
            Count (C_low, C_high) pairs

      Res1  Res2  Res3
S1    1     1     1
S2    2     1     2
S3    1     2     2
S4    1     3     3
S5    2     1     4

Slide 16

Slide 16 text

Algorithm

For each resolution r = 1, ..., R - 1
    For each unique cluster C_low in r                 -> 1, 2
        For each unique cluster C_high in r + 1
            Count (C_low, C_high) pairs

      Res1  Res2  Res3
S1    1     1     1
S2    2     1     2
S3    1     2     2
S4    1     3     3
S5    2     1     4

Slide 17

Slide 17 text

Algorithm

For each resolution r = 1, ..., R - 1
    For each unique cluster C_low in r
        For each unique cluster C_high in r + 1        -> 1, 2, 3
            Count (C_low, C_high) pairs

      Res1  Res2  Res3
S1    1     1     1
S2    2     1     2
S3    1     2     2
S4    1     3     3
S5    2     1     4

Slide 18

Slide 18 text

Algorithm

For each resolution r = 1, ..., R - 1
    For each unique cluster C_low in r
        For each unique cluster C_high in r + 1
            Count (C_low, C_high) pairs                -> 1

      Res1  Res2  Res3
S1    1     1     1
S2    2     1     2
S3    1     2     2
S4    1     3     3
S5    2     1     4

Slide 19

Slide 19 text

Edge table

      Res1  Res2  Res3
S1    1     1     1
S2    2     1     2
S3    1     2     2
S4    1     3     3
S5    2     1     4

ResFrom  ClusterFrom  ResTo  ClusterTo  Number
Res1     1            Res2   1          1
Res1     1            Res2   2          1
Res1     1            Res2   3          1
Res1     2            Res2   1          2
Res1     2            Res2   2          0
Res1     2            Res2   3          0
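The nested loop from the Algorithm slides can be sketched in Python to reproduce this edge table. The names `samples`, `resolutions` and `edge_table` are assumptions made for illustration; the talk's own implementation is the R tutorial linked at the end, which this does not attempt to mirror.

```python
# Cluster label of each sample at Res1, Res2 and Res3 (from the slide).
samples = {
    "S1": (1, 1, 1),
    "S2": (2, 1, 2),
    "S3": (1, 2, 2),
    "S4": (1, 3, 3),
    "S5": (2, 1, 4),
}
resolutions = ["Res1", "Res2", "Res3"]


def edge_table(samples, resolutions):
    """Count samples shared by each cluster pair in neighbouring resolutions."""
    rows = []
    for r in range(len(resolutions) - 1):        # r = 1, ..., R - 1
        low = [labels[r] for labels in samples.values()]
        high = [labels[r + 1] for labels in samples.values()]
        for c_low in sorted(set(low)):           # unique clusters in r
            for c_high in sorted(set(high)):     # unique clusters in r + 1
                number = sum(
                    1 for lo, hi in zip(low, high)
                    if lo == c_low and hi == c_high
                )
                rows.append(
                    (resolutions[r], c_low, resolutions[r + 1], c_high, number)
                )
    return rows


edges = edge_table(samples, resolutions)
for row in edges:
    print(row)
```

The first six rows are the Res1 to Res2 edges listed above; the remaining rows cover Res2 to Res3. Rows with a count of 0 are kept here to match the slide, although they would not be drawn as edges in a tree.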

Slide 20

Slide 20 text

Tree

[Figure: clustering tree. Legend: Resolution 0.01, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]

Slide 21

Slide 21 text

[Figure: clustering tree. Legend: Resolution 0.01, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]

Slide 22

Slide 22 text

[Figure: clustering tree. Legend: Resolution 0.01, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]

Slide 23

Slide 23 text

No content

Slide 24

Slide 24 text

No content

Slide 25

Slide 25 text

No content

Slide 26

Slide 26 text

No content

Slide 27

Slide 27 text

No content

Slide 28

Slide 28 text

Summary

Choosing the number of clusters is hard but important

A clustering tree can help by showing:
- Relationships between clusters
- Which clusters are distinct
- Where samples are changing

Slide 29

Slide 29 text

Acknowledgements

Everyone who makes tools and data available

Supervisors: Alicia Oshlack, Melissa Little
MCRI Bioinformatics: Belinda Phipson
MCRI KDDR: Alex Combes

@_lazappi_
oshlacklab.com

R Tutorial: lazappi.id.au/building-a-clustering-tree/
Slides: speakerdeck.com/lazappi/building-a-clustering-tree