Slide 1

Slide 1 text

A quick intro to networks
Claire McWhite (@clairemcwhite), 4/25/18

Slide 2

Slide 2 text

Networks are …
• First, a data structure
• Second, a data visualization

Slide 3

Slide 3 text

A minimal network

Data table:
node1  node2
A      B

Diagram: Node A and Node B joined by Edge A-B.
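As a rough sketch of the "network as a data table" idea, the one-edge network on this slide can be held in Python as a list of (node1, node2) pairs. The variable names here are illustrative and not from the talk, and the edge is treated as undirected, which is an assumption.

```python
# One row of the data table per edge: (node1, node2).
edges = [("A", "B")]

# The node set is implied by the edge table.
nodes = sorted({node for edge in edges for node in edge})

# Adjacency view (assuming an undirected edge): node -> set of neighbours.
adjacency = {node: set() for node in nodes}
for node1, node2 in edges:
    adjacency[node1].add(node2)
    adjacency[node2].add(node1)

print(nodes)      # ['A', 'B']
print(adjacency)  # {'A': {'B'}, 'B': {'A'}}
```

The edge table is the only thing that has to be stored; the node list and the adjacency view can always be rebuilt from it.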

Slide 4

Slide 4 text

Edges can have attributes

node1  node2  weight  relationship  direction
A      B      5       red           E
A      C      2       red           E
A      D      5       blue          F
B      C      1       blue          EF
C      D      5       blue          EF

[Figure: network of nodes A, B, C, D]
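One way to carry edge attributes in code is to keep one record per table row. The sketch below copies the values from the slide's table; the idea of mapping weight to line width is only an example of how an attribute might be used, not something the slide specifies.

```python
# One dict per row of the edge table; the extra columns are edge attributes.
edge_table = [
    {"node1": "A", "node2": "B", "weight": 5, "relationship": "red", "direction": "E"},
    {"node1": "A", "node2": "C", "weight": 2, "relationship": "red", "direction": "E"},
    {"node1": "A", "node2": "D", "weight": 5, "relationship": "blue", "direction": "F"},
    {"node1": "B", "node2": "C", "weight": 1, "relationship": "blue", "direction": "EF"},
    {"node1": "C", "node2": "D", "weight": 5, "relationship": "blue", "direction": "EF"},
]

# Illustrative use 1: look up a drawing width for each edge from its weight.
edge_width = {(e["node1"], e["node2"]): e["weight"] for e in edge_table}

# Illustrative use 2: select edges by a categorical attribute.
red_edges = [(e["node1"], e["node2"]) for e in edge_table if e["relationship"] == "red"]

print(edge_width[("A", "B")])  # 5
print(red_edges)               # [('A', 'B'), ('A', 'C')]
```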

Slide 5

Slide 5 text

Nodes can have attributes

node  size  color
A     1     red
B     2     red
C     3     blue
D     4     blue

[Figure: network of nodes A, B, C, D]
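Node attributes live in a second table, keyed by node name rather than by node pair. A minimal sketch using the values from the slide (the grouping step is just one illustrative use):

```python
# Node table from the slide: one row per node, with its attributes.
node_table = [
    {"node": "A", "size": 1, "color": "red"},
    {"node": "B", "size": 2, "color": "red"},
    {"node": "C", "size": 3, "color": "blue"},
    {"node": "D", "size": 4, "color": "blue"},
]

# Index the rows by node name so attributes can be looked up while drawing.
node_attrs = {row["node"]: row for row in node_table}

# Group node names by colour, e.g. to draw each colour group in one pass.
nodes_by_color = {}
for row in node_table:
    nodes_by_color.setdefault(row["color"], []).append(row["node"])

print(node_attrs["C"]["size"])  # 3
print(nodes_by_color)           # {'red': ['A', 'B'], 'blue': ['C', 'D']}
```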

Slide 6

Slide 6 text

Network layout algorithms

Network data structure (text file): not very interpretable by humans

node1  node2
A      B
A      C
A      D
B      C
C      D

Network layout algorithm: calculates node coordinates

Visualized network (nodes A, B, C, D): ideally more interpretable by humans
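To make "calculates node coordinates" concrete, here is a sketch of about the simplest layout there is: placing the nodes evenly on a circle. This is a stand-in chosen for brevity, not the algorithm used for any figure in the talk, and the function name `circular_layout` is hypothetical.

```python
import math

# Edge table from the slide.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("C", "D")]


def circular_layout(edges, radius=1.0):
    """Return {node: (x, y)} with nodes spaced evenly around a circle."""
    nodes = sorted({node for edge in edges for node in edge})
    step = 2 * math.pi / len(nodes)
    return {
        node: (radius * math.cos(i * step), radius * math.sin(i * step))
        for i, node in enumerate(nodes)
    }


coords = circular_layout(edges)
for node, (x, y) in coords.items():
    print(f"{node}: x={x:.2f}, y={y:.2f}")
```

A circular layout ignores the edges entirely when choosing positions. Other layout families use the edges to decide which nodes should end up near each other.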

Slide 7

Slide 7 text

Introduction to ggraph: Layouts – Thomas Lin Pedersen
https://www.data-imaginist.com/2017/ggraph-introduction-layouts/

Slide 8

Slide 8 text

Hierarchical layouts
Introduction to ggraph: Layouts – Thomas Lin Pedersen
https://www.data-imaginist.com/2017/ggraph-introduction-layouts/

yFiles circular layouts

Slide 9

Slide 9 text

I mostly use force-directed layouts

Slide 10

Slide 10 text

Edges act as springs in a force-directed layout

node1  node2  weight
A      B      1
B      C      10

[Figure: nodes A, B and C, with edges labeled by weights 1 and 10]
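The spring picture can be sketched with a tiny simulation of the slide's three-node example. The force model below (all node pairs repel with magnitude 1/d, each edge pulls with magnitude weight × d²) is a Fruchterman-Reingold-style assumption, not necessarily what any tool mentioned in this talk implements. The starting coordinates, step size and iteration count are arbitrary choices.

```python
import math

# Edge table from the slide: (node1, node2, weight).
edges = [("A", "B", 1.0), ("B", "C", 10.0)]

# Arbitrary fixed starting coordinates (an assumption; tools often start randomly).
pos = {"A": [0.0, 0.0], "B": [1.0, 0.0], "C": [1.0, 1.0]}


def step_layout(pos, edges, step=0.01):
    """Move every node once along the net force acting on it."""
    force = {node: [0.0, 0.0] for node in pos}
    names = list(pos)

    # Every pair of nodes repels with magnitude 1/d.
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            d = max(math.hypot(dx, dy), 1e-9)
            push = 1.0 / d
            force[a][0] += push * dx / d
            force[a][1] += push * dy / d
            force[b][0] -= push * dx / d
            force[b][1] -= push * dy / d

    # Each edge is a spring pulling its endpoints together,
    # with magnitude weight * d**2.
    for a, b, weight in edges:
        dx = pos[b][0] - pos[a][0]
        dy = pos[b][1] - pos[a][1]
        d = max(math.hypot(dx, dy), 1e-9)
        pull = weight * d * d
        force[a][0] += pull * dx / d
        force[a][1] += pull * dy / d
        force[b][0] -= pull * dx / d
        force[b][1] -= pull * dy / d

    for node in pos:
        pos[node][0] += step * force[node][0]
        pos[node][1] += step * force[node][1]


def distance(pos, a, b):
    return math.hypot(pos[a][0] - pos[b][0], pos[a][1] - pos[b][1])


for _ in range(2000):
    step_layout(pos, edges)

d_ab = distance(pos, "A", "B")
d_bc = distance(pos, "B", "C")
print(f"A-B distance: {d_ab:.2f}, B-C distance: {d_bc:.2f}")
```

Under this model the heavier B-C edge is the stiffer spring, so B and C settle closer together than A and B.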

Slide 11

Slide 11 text

CC-BY martinandjean.ch 2016

Slide 12

Slide 12 text

CC-BY martinandjean.ch 2016

Slide 13

Slide 13 text

Most network layout algorithms have a sweet spot of network size. Beyond it, you enter the hairball zone.
https://www.systemsbiology.org/news/2012/12/19/combing-the-hairball/

Slide 14

Slide 14 text

When networks get too large, they may become uninterpretable hairballs

Slide 15

Slide 15 text

Steps to break up hairball networks
Thresholding: apply a threshold for edge weights.
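Thresholding amounts to filtering the edge table before layout. A minimal sketch, reusing the weighted edges from the edge-attribute slide. The cutoff of 3 is an arbitrary value picked for illustration, and `threshold_edges` is a hypothetical helper name.

```python
# (node1, node2, weight) rows, taken from the edge-attribute table earlier in the deck.
edges = [
    ("A", "B", 5),
    ("A", "C", 2),
    ("A", "D", 5),
    ("B", "C", 1),
    ("C", "D", 5),
]


def threshold_edges(edges, min_weight):
    """Keep only edges whose weight is at least min_weight."""
    return [(a, b, w) for a, b, w in edges if w >= min_weight]


kept = threshold_edges(edges, min_weight=3)
print(kept)  # [('A', 'B', 5), ('A', 'D', 5), ('C', 'D', 5)]
```

Raising or lowering the cutoff changes which edges survive, and so can change the resulting picture considerably.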

Slide 16

Slide 16 text

Steps to break up hairball networks
Clustering: apply a clustering algorithm to the network data.
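As a deliberately simple stand-in for a clustering algorithm, the sketch below groups nodes into connected components with a breadth-first search. This is not MCL, the method named on the later slides. It only illustrates the general shape of the step: an edge table goes in, a list of node groups comes out. The edge list is hypothetical example data.

```python
from collections import deque


def connected_components(edges):
    """Group nodes so that each group is connected by some path of edges."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    seen = set()
    clusters = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Breadth-first search from an unvisited node collects one cluster.
        seen.add(start)
        queue = deque([start])
        members = []
        while queue:
            node = queue.popleft()
            members.append(node)
            for neighbour in sorted(adjacency[node]):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        clusters.append(sorted(members))
    return clusters


# Hypothetical example data: two groups with no edge between them.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("D", "E"), ("E", "F")]
clusters = connected_components(edges)
print(clusters)  # [['A', 'B', 'C'], ['D', 'E', 'F']]
```

Real clustering algorithms can also split a single connected hairball into several groups, which connected components cannot do.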

Slide 17

Slide 17 text

My data: no clustering
Network of 10 million edges and 1 million nodes, Large Graph Layout (LGL)

Slide 18

Slide 18 text

MCL clustering strength 3

Slide 19

Slide 19 text

MCL clustering strength 4

Slide 20

Slide 20 text

MCL clustering strength 5

Slide 21

Slide 21 text

MCL clustering strength 8

Slide 22

Slide 22 text

Software
• R: ggraph/tidygraph
• Python: NetworkX
• GUI: Cytoscape (lots of tools for biology)
• GUI: Gephi
• GUI: Graphviz

Slide 23

Slide 23 text

Large graph layout - LGL
• My fav
• Minimizes hairballness of larger networks
• Scalable (handles networks of over 1 million nodes)

[Figure: Protein homology graph – Edward Marcotte and Alex Adai - MOMA]

Adai 2004, LGL: creating a map of protein function with an algorithm for visualizing very large biological networks.

Slide 24

Slide 24 text

Post-talk addendum
• It would be great if someone could make an LGL network layout algorithm plugin for Cytoscape or any network visualization software.
• The program currently only runs on Linux
• https://github.com/TheOpteProject/LGL
• My user guide -> http://clairemcwhite.github.io/lgl-guide/
• Original paper describing algorithm: https://www.sciencedirect.com/science/article/pii/S0022283604004851?via%3Dihub

Slide 25

Slide 25 text

If your data looks like this at all, throw it in a network layout.

node1  node2
A      B
A      C
A      D
B      C
C      D

(But don’t overinterpret any one layout. Remember that network layouts are fickle, and minor changes to thresholds and data input choices can change everything.)
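Getting data of this shape into a program takes only a few lines. The sketch below parses a whitespace-separated, two-column text table into an edge list and counts how many edges touch each node. The function name `parse_edge_table` and the assumption of a single header row are illustrative.

```python
# Two-column text table, as on the slide (first row is the header).
text = """node1 node2
A B
A C
A D
B C
C D
"""


def parse_edge_table(text):
    """Split a whitespace-separated table into (header, list of edge tuples)."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    header, body = rows[0], rows[1:]
    return header, [tuple(row) for row in body]


header, edges = parse_edge_table(text)

# Degree: number of edges touching each node.
degree = {}
for node1, node2 in edges:
    degree[node1] = degree.get(node1, 0) + 1
    degree[node2] = degree.get(node2, 0) + 1

print(header)  # ['node1', 'node2']
print(degree)  # {'A': 3, 'B': 2, 'C': 3, 'D': 2}
```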