clustree: a package for producing clustering trees using ggraph


Clustering analysis is commonly used in many fields to group together similar samples. Many clustering algorithms exist, but all of them require some sort of user input to set parameters that affect the number of clusters produced. Deciding on the correct number of clusters for a given dataset is a difficult problem that can be tackled by looking at the relationships between samples at different resolutions. Here I will present clustree, an R package for producing clustering tree visualisations. These visualisations combine information from multiple clusterings with different resolutions, showing where new clusters come from and how samples change clusters as the number of clusters increases. Summarised information describing the samples in each cluster can be overlaid on the tree to give additional insight. I will also describe my experience developing clustree, particularly how I have made use of the ggraph package. The clustree package is available at https://github.com/lazappi/clustree and a preprint describing clustering trees can be read at https://www.biorxiv.org/content/early/2018/03/02/274035.

This talk was presented at useR! 2018 in Brisbane.


Luke Zappia

July 12, 2018

Transcript

  Slide 2

    My data
    (Image: OpenStax College, CC BY 3.0, via Wikimedia Commons)

    Single-cell RNA-sequencing
        Gene activity in thousands of cells
        ~20,000 features (genes)
        ~8,000 samples (cells)
        Look for different cell types
  Slide 5

    Sample  K1  K2  K3
    0       A   A   A
    1       A   B   C
    2       A   A   A
    3       A   A   B
    4       A   B   A
    5       A   A   B
    6       A   B   C
    7       A   A   A
    8       A   A   B
    9       A   B   C
  Slide 9

    Clusters + transitions

    ID  Resolution  Cluster  Size
    1A  1           A        10
    2A  2           A        6
    2B  2           B        4
    3A  3           A        4
    3B  3           B        3
    3C  3           C        3

    From  To  Number
    1A    2A  6
    1A    2B  4
    2A    3A  3
    2A    3B  3
    2B    3A  1
    2B    3C  3
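The node and edge tables on this slide can be derived directly from the per-sample cluster table above: each node is a cluster at one resolution, sized by its sample count, and each edge counts the samples that move from a cluster at one resolution to a cluster at the next. A minimal base-R sketch, assuming the assignments are stored one column per resolution with the `K` prefix shown on the slide (the object names `clusterings`, `nodes` and `edges` are illustrative, not clustree's internals):

```r
# Sample-to-cluster assignments from the example table:
# one row per sample, one column per resolution (prefix "K")
clusterings <- data.frame(
  Sample = 0:9,
  K1 = rep("A", 10),
  K2 = c("A", "B", "A", "A", "B", "A", "B", "A", "A", "B"),
  K3 = c("A", "C", "A", "B", "A", "B", "C", "A", "B", "C"),
  stringsAsFactors = FALSE
)
resolutions <- 1:3

# Nodes: one per cluster at each resolution, with its number of samples
nodes <- do.call(rbind, lapply(resolutions, function(res) {
  sizes <- table(clusterings[[paste0("K", res)]])
  data.frame(
    ID = paste0(res, names(sizes)),
    Resolution = res,
    Cluster = names(sizes),
    Size = as.integer(sizes),
    stringsAsFactors = FALSE
  )
}))

# Edges: how many samples move from each cluster at one resolution
# to each cluster at the next resolution
edges <- do.call(rbind, lapply(head(resolutions, -1), function(res) {
  from <- paste0(res, clusterings[[paste0("K", res)]])
  to <- paste0(res + 1, clusterings[[paste0("K", res + 1)]])
  counts <- as.data.frame(table(From = from, To = to),
                          responseName = "Number", stringsAsFactors = FALSE)
  counts[counts$Number > 0, ]  # keep only transitions that actually occur
}))
edges <- edges[order(edges$From, edges$To), ]
rownames(edges) <- NULL

print(nodes)
print(edges)
```

Because every sample appears once per resolution, the edge counts leaving each resolution always sum to the total number of samples (10 here), which is a handy sanity check on the tables.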
  Slide 12

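    Building a graph

        # igraph: edge table first, node table supplied as vertex attributes
        igraph::graph_from_data_frame(edges, vertices = nodes)

        # tidygraph: node and edge tables held in one tidy graph object
        graph <- tidygraph::tbl_graph(nodes = nodes, edges = edges)

        graph %>%
            activate(nodes) %>%
            filter(...) %>%
            mutate(...) %>%
            activate(edges) %>%
            filter(...) %>%
            mutate(...)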
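The `filter()` and `mutate()` steps above operate on the node and edge tables of the graph. As a dependency-free illustration of the same bookkeeping, here is a base-R sketch on the tables from the "Clusters + transitions" slide. The `in_prop` column, the 0.3 edge threshold and the 4-sample node threshold are hypothetical choices made only for this example. The key point it shows: when nodes are removed, every edge touching them must go too, which tidygraph takes care of automatically when nodes are filtered.

```r
# Node and edge tables from the "Clusters + transitions" slide
nodes <- data.frame(
  ID = c("1A", "2A", "2B", "3A", "3B", "3C"),
  Resolution = c(1, 2, 2, 3, 3, 3),
  Cluster = c("A", "A", "B", "A", "B", "C"),
  Size = c(10, 6, 4, 4, 3, 3),
  stringsAsFactors = FALSE
)
edges <- data.frame(
  From = c("1A", "1A", "2A", "2A", "2B", "2B"),
  To = c("2A", "2B", "3A", "3B", "3A", "3C"),
  Number = c(6, 4, 3, 3, 1, 3),
  stringsAsFactors = FALSE
)

# Edge "mutate": share of the destination cluster's samples that arrived
# along this edge (hypothetical column name)
edges$in_prop <- edges$Number / nodes$Size[match(edges$To, nodes$ID)]

# Edge "filter": drop weak transitions (hypothetical threshold)
edges <- edges[edges$in_prop >= 0.3, ]

# Node "filter": keep only clusters with at least 4 samples (hypothetical
# threshold), then drop every edge whose endpoint was removed so the
# edge table never refers to a missing node
nodes <- nodes[nodes$Size >= 4, ]
edges <- edges[edges$From %in% nodes$ID & edges$To %in% nodes$ID, ]

print(nodes)
print(edges)
```

Doing this by hand makes the appeal of the tidygraph pipeline clear: the node and edge tables stay consistent without any manual matching.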
  Slide 18

    The Iris dataset
        Iris setosa, Iris versicolor, Iris virginica
        (Images: Tiia Monto, CC BY-SA 4.0, via Wikimedia Commons; C T Johansson, CC BY 3.0, via Wikimedia Commons; Jefficus, via Wikimedia Commons)
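A clustering tree needs the same samples clustered at several resolutions. A minimal sketch of producing that input for the Iris measurements, assuming k-means at k = 1 to 5 as the clustering method (the slides do not say which algorithm the talk used; `clusterings` and `measurements` are illustrative names):

```r
# Cluster the four iris measurements with k-means at k = 1..5 and store
# each result as a column K1..K5 (one row per flower)
set.seed(1)  # k-means starting centres are random
measurements <- iris[, 1:4]
clusterings <- as.data.frame(
  sapply(1:5, function(k) kmeans(measurements, centers = k, nstart = 10)$cluster)
)
names(clusterings) <- paste0("K", 1:5)

head(clusterings)
```

The result has the same shape as the sample table from the earlier slide: one row per sample and one column per resolution sharing a prefix. With clustree installed, a data frame like this can be passed to something like `clustree(clusterings, prefix = "K")`.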
  Slide 26

    Acknowledgements
        Everyone that makes tools and data available
        MCRI Bioinformatics: Belinda Phipson
        MCRI KDDR: Alex Combes
        Supervisors: Alicia Oshlack, Melissa Little

    @_lazappi_
    lazappi.github.io/clustree
    install.packages("clustree")
    Paper: doi.org/10.1101/274035
    Slides: tinyurl.com/clustree-useR2018