Slide 1

Slide 1 text

Hypergraph reconstruction from network data
Jean-Gabriel Young, Department of Computer Science, University of Vermont, Burlington, VT, USA
jg-you.github.io · @_jgyou · [email protected]
Joint work with Giovanni Petri and Tiago P. Peixoto
arXiv:2008.04948

Slide 2

Slide 2 text

A simple inequality with big consequences: the joint interaction [a,b,c] ≠ the pairwise interactions [a,b], [b,c], [a,c]

Slide 3

Slide 3 text

Higher-order networks = new dynamics & new perspectives

Slide 4

Slide 4 text

To use higher-order networks, you need higher-order data. Data about interactions: [a,b,c], [a,d], [c,d], [c,e]. [Figure: the same interactions represented as a simplicial complex, a hypergraph, a bipartite graph, and a (pairwise) graph on nodes a-e.]
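The same conversion can be written down in a few lines. A minimal sketch in plain Python (no external libraries) of how the interaction list above can be stored as a hypergraph edge list, as a bipartite node-interaction structure, and as its pairwise projection; the variable names are illustrative and not taken from the talk.

    from itertools import combinations

    # The interaction data from the slide.
    interactions = [("a", "b", "c"), ("a", "d"), ("c", "d"), ("c", "e")]

    # Hypergraph: keep the interactions as-is, as a list of hyperedges.
    hyperedges = [frozenset(h) for h in interactions]

    # Bipartite graph: one part holds the nodes, the other the interactions.
    bipartite_edges = [(node, f"I{idx}")
                       for idx, h in enumerate(interactions) for node in h]

    # Pairwise graph: project every interaction onto all of its node pairs.
    graph_edges = {frozenset(pair) for h in interactions for pair in combinations(h, 2)}

    print(sorted(tuple(sorted(e)) for e in graph_edges))
    # [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('c', 'd'), ('c', 'e')]

Note that the last representation is lossy: the pairwise graph alone no longer says whether [a,b,c] was one three-way interaction or three separate pairwise ones, which is exactly the inequality from slide 2.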

Slide 5

Slide 5 text

This talk: How to turn graph data into higher-order networks

Slide 6

Slide 6 text

Graph data only: frequently the case ⊲ Social surveys ⊲ Observational studies of animals ⊲ Plant interactions ⊲ Neuronal networks ⊲ Biochemical networks ⊲ Host populations for epidemics ⊲ Hyperlink networks ⊲ Web of trust data ⊲ Power grids

Slide 8

Slide 8 text

The problem. A: “What is the hypergraph that best explains the graph data?” (from network data to higher-order interactions). The difficulty comes from the multitude of possible answers.

Slide 9

Slide 9 text

Dealing with ill-posedness. A: What is the hypergraph that best explains the graph data ... AND minimizes some cost? (ad hoc regularization) B: What are the hypergraphs that can plausibly explain the graph data? (“Bayesian regularization”) ⊲ From first principles ⊲ Easily extensible ⊲ Automatic inference ⊲ Fits within the theory of network inference

Slide 11

Slide 11 text

The formalized problem (data generation process). B: What are the hypergraphs H that can plausibly explain the graph data G? P(H | G) ∝ P(G | H) P(H). Probabilities are defined by a generative model: ⊲ Hypergraph prior P(H): prob. of generating a particular hypergraph H ⊲ Projection component P(G | H): prob. of generating G based on H
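Spelled out, the relation on this slide is Bayes' rule applied to a hypergraph H and the observed graph G; the normalizing evidence P(G) is a sum over all hypergraphs, which is why only a proportionality is stated. This is the standard textbook form, written here for completeness:

    P(H \mid G) = \frac{P(G \mid H)\, P(H)}{P(G)},
    \qquad
    P(G) = \sum_{H'} P(G \mid H')\, P(H').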

Slide 12

Slide 12 text

Model: hypergraph prior P(H) in P(H | G) ∝ P(G | H) P(H).
Poisson Random Hypergraph Model (PRHM), R. W. Darling and J. R. Norris, Ann. Appl. Probab. (2005).
Connect every set of nodes {v_1, ..., v_k} of size k = 2, 3, ... with a Poisson number of hyperedges (i.i.d.):
P(A_{v_1...v_k} | λ_k) = (λ_k^{A_{v_1...v_k}} / A_{v_1...v_k}!) e^{−λ_k},
P(H | λ) = ∏_{k ≥ 2} ∏_{v_1 < ... < v_k} P(A_{v_1...v_k} | λ_k).
* We improve the fit with a hierarchical model, λ_k ∼ Exp(ν), with ν fixed empirically.
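As a rough sketch, the PRHM log-prior stated above can be evaluated directly, assuming hyperedge multiplicities A are stored in a dict keyed by frozensets of nodes (absent sets have multiplicity zero). This is an illustration of the formula written for this document, not code from the paper or from graph-tool; the hierarchical λ_k ∼ Exp(ν) layer mentioned on the slide is omitted for brevity.

    from math import comb, lgamma, log

    def log_prior_prhm(multiplicities, rates, n):
        """log P(H | lambda) under the Poisson random hypergraph model.

        multiplicities: dict {frozenset of nodes: A >= 1}; absent sets have A = 0.
        rates: dict {k: lambda_k} for k = 2, 3, ...
        n: number of nodes.
        """
        logp = 0.0
        # Sets carrying at least one hyperedge: A log(lambda_k) - lambda_k - log(A!).
        for nodes, a in multiplicities.items():
            lam = rates[len(nodes)]
            logp += a * log(lam) - lam - lgamma(a + 1)
        # Absent sets only contribute the e^{-lambda_k} factor;
        # there are C(n, k) sets of size k in total.
        for k, lam in rates.items():
            n_absent = comb(n, k) - sum(1 for s in multiplicities if len(s) == k)
            logp += -lam * n_absent
        return logp

    # Example: the toy hypergraph from slide 4, with arbitrary rates.
    H = {frozenset("abc"): 1, frozenset("ad"): 1, frozenset("cd"): 1, frozenset("ce"): 1}
    print(log_prior_prhm(H, rates={2: 0.1, 3: 0.01}, n=5))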

Slide 13

Slide 13 text

Model: projection component P(G | H) in P(H | G) ∝ P(G | H) P(H).
Projection operation G(H): hypergraphs to graphs. (i, j) ∈ E(G(H)) if and only if {i, j} ⊆ h for some h ∈ E(H).
[Figure: a family of hypergraphs that all project to G, and another family of which none project to G.]
P(G | H) = 1 if G = G(H), 0 otherwise.
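A small sketch of the projection operator and the resulting hard likelihood, reusing the plain-Python representation from the earlier example (hyperedges as frozensets of node labels); the function names are mine.

    from itertools import combinations

    def project(hyperedges):
        """G(H): connect every pair of nodes that co-appear in at least one hyperedge."""
        return {frozenset(p) for h in hyperedges for p in combinations(sorted(h), 2)}

    def likelihood(graph_edges, hyperedges):
        """P(G | H): 1 if the hypergraph projects exactly onto G, else 0."""
        return 1.0 if project(hyperedges) == set(graph_edges) else 0.0

    H = [frozenset("abc"), frozenset("ad"), frozenset("cd"), frozenset("ce")]
    G = project(H)
    print(likelihood(G, H))                  # 1.0
    print(likelihood(G, [frozenset("ab")]))  # 0.0

The hard 0/1 likelihood is what makes the problem ill-posed: many different hypergraphs achieve likelihood 1 for the same G, and the prior is what ranks them.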

Slide 14

Slide 14 text

Estimation in a nutshell. Given data G, how do we...
⊲ Find H* = argmax_H P(H | G)?
⊲ Evaluate averages Σ_H f(H, G) P(H | G)?
Method: factor-graph MCMC
⊲ Encode the hypergraph H as a factor graph
⊲ Run MCMC on P(H | G) by changing factors at random
[Figure: factor graph on nodes 1-5 with hyperedge factors A_12, A_23, A_24, A_34, A_45, A_123, A_234.]
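A toy Metropolis sampler in the spirit of this slide, under simplifying assumptions that are mine rather than the paper's: candidate hyperedges are restricted to the edges and triangles of G, multiplicities are 0/1, and a move is rejected outright if the result no longer projects exactly onto G (the hard likelihood from the previous slide). The actual method encodes the hypergraph as a factor graph and uses more careful moves.

    import random
    from itertools import combinations
    from math import exp

    def project(hyperedges):
        return {frozenset(p) for h in hyperedges for p in combinations(sorted(h), 2)}

    def candidate_factors(graph_edges):
        """Edges and triangles of G: the only hyperedges this toy sampler toggles."""
        adj = {}
        for e in graph_edges:
            u, v = tuple(e)
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
        triangles = set()
        for e in graph_edges:
            u, v = tuple(e)
            for w in adj[u] & adj[v]:
                triangles.add(frozenset((u, v, w)))
        return list(graph_edges) + list(triangles)

    def mcmc(graph_edges, log_posterior, n_steps=10_000, seed=0):
        rng = random.Random(seed)
        factors = candidate_factors(graph_edges)
        state = set(graph_edges)          # start from G itself: all pairwise edges
        logp = log_posterior(state)
        for _ in range(n_steps):
            h = rng.choice(factors)       # propose toggling one candidate hyperedge
            proposal = state ^ {h}
            if project(proposal) != set(graph_edges):
                continue                  # zero likelihood: reject the move
            new_logp = log_posterior(proposal)
            if rng.random() < exp(min(0.0, new_logp - logp)):
                state, logp = proposal, new_logp   # Metropolis acceptance
        return state

Because the likelihood is an indicator, the posterior restricted to valid hypergraphs is proportional to the prior, so log_posterior can be, for instance, the PRHM log-prior sketched after slide 12 with every hyperedge in the state given multiplicity 1.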

Slide 16

Slide 16 text

Planted hypergraph recovery. RQ: What happens when we feed a known hypergraph to the method? Generate H → project as G = G(H) → guess H*.

Slide 17

Slide 17 text

Planted hypergraph recovery. RQ: What happens when we feed a known hypergraph to the method? Generate H (+ noise) → project as G = G(H) → guess H*.

Slide 18

Slide 18 text

Planted hypergraph recovery. RQ: What happens when we feed a known hypergraph to the method? Generate H (+ noise) → project as G = G(H) (+ randomize) → guess H*.
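A sketch of the recovery experiment described across the three versions of this slide, under my own simplifying assumptions: the planted hypergraph is a short list of hyperedges, the noise/randomize step just adds a few random extra pairwise edges after projection, and reconstruct() is a hypothetical placeholder for the inference routine (it is not an API from the paper or from graph-tool).

    import random
    from itertools import combinations

    def project(hyperedges):
        return {frozenset(p) for h in hyperedges for p in combinations(sorted(h), 2)}

    def planted_recovery_trial(planted, nodes, n_noise_edges, reconstruct, seed=0):
        """Generate H, project to G, corrupt G with extra edges, then guess H*."""
        rng = random.Random(seed)
        graph = set(project(planted))
        # +noise / +randomize: add random pairwise edges that are not already in G.
        candidates = [frozenset(p) for p in combinations(nodes, 2)
                      if frozenset(p) not in graph]
        graph |= set(rng.sample(candidates, n_noise_edges))
        guess = reconstruct(graph)        # hypothetical inference routine
        planted_set = set(planted)
        return len(planted_set & set(guess)) / len(planted_set)  # fraction recovered

    # Trivial baseline: "reconstruct" returns the pairwise edges themselves,
    # so it recovers none of the planted higher-order interactions.
    planted = [frozenset("abc"), frozenset("cde")]
    print(planted_recovery_trial(planted, nodes="abcdefg", n_noise_edges=2,
                                 reconstruct=lambda g: g))   # 0.0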

Slide 19

Slide 19 text

Interlude: MDL. Minimum description length (MDL): an information-theoretic interpretation of maximum a posteriori (MAP) inference.
⊲ Posterior probability of a solution (the bigger the better): log P(H | G) = log P(G | H) + log P(H) (up to a constant in H).
⊲ Cost of a solution (the smaller the better): −log P(H | G) = −log P(G | H) − log P(H) (up to a constant in H).
−log P(H): cost of communicating H.
−log P(G | H): cost of communicating G with shared knowledge of H.
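A two-line illustration of the cost reading of the same quantities, assuming natural-log probabilities are available (for example from sketches like the ones above); the numbers below are made up purely for the example.

    from math import log

    log_likelihood = -42.0    # hypothetical log P(G | H): cost of sending G knowing H
    log_prior = -137.0        # hypothetical log P(H): cost of sending H
    description_length_bits = -(log_likelihood + log_prior) / log(2)  # nats -> bits
    print(f"{description_length_bits:.1f} bits")   # smaller means better compression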

Slide 20

Slide 20 text

Planted hypergraph recovery. [Figure: description length (bits) as a function of the number of additional edges, comparing the planted interactions to a randomized baseline.]

Slide 21

Slide 21 text

Empirical systems: NCAA Football data. Nodes: teams. Edges: played during the Fall season. [Figures: the reconstructed hypergraph with hyperedges grouped by size (k = 2-4, 5-6, 7-8, 9), and the number of hyperedges vs. hyperedge size, comparing the best fit to the maximal cliques; uncertain edges and uncertain triangles highlighted.]

Slide 22

Slide 22 text

Empirical systems: broader view. [Figure, panels (a)-(e): description length (bits) of the best fit vs. the maximal cliques for a collection of datasets: PGP web of trust, dictionary entries, global airport network, political blogs, Western states power grid, e-mail, scientific coauthorships, C. elegans neural network, Florida food webs (dry and wet), Add Health study, American college football, characters in Les Misérables, dolphin social network, Southern women interactions, Zachary's karate club; plus compression (bits) and average interaction size as functions of the clustering coefficient and the average degree.]

Slide 24

Slide 24 text

Open problems
⊲ Insight: we can compress a handful of systems as hypergraphs. Question: systematic study?
⊲ Insight: we can recover H from G. Question: how useful is H* when predicting dynamics on G? On the latent H?
⊲ Lesson: MCMC can be slow. Question: borrow from minimal clique cover algorithms?
⊲ Lesson: the PRHM is not a realistic process. Question: what is the impact of changing models?

Slide 25

Slide 25 text

Take-home message ⊲ Networks: often a pairwise P.O.V. on higher-order networks. ⊲ Can reconstruct these HONs, but it is an ill-posed problem. ⊲ We solved the problem with Bayesian reconstruction techniques. ⊲ Found higher-order interactions in empirical and synthetic data. ⊲ ∃ many open problems. ⊲ References: arXiv:2008.04948 ⊲ Software: graph-tool (graph-tool.skewed.de)