Construction of and efficient sampling from the simplicial configuration model


Presented at HONS 2017 (http://complexdata.businesscatalyst.com/2017/)

Paper: https://doi.org/10.1103/PhysRevE.96.032312
arXiv: https://arxiv.org/abs/1705.10298
Code: https://github.com/jg-you/scm/

Abstract
=======
Simplicial complexes are now a popular alternative to networks when it comes to describing the structure of complex systems, primarily because they encode multi-node interactions explicitly. With this new description comes the need for principled null models that allow for easy comparison with empirical data. We propose a natural candidate, the simplicial configuration model. The core of our contribution is an efficient and uniform Markov chain Monte Carlo sampler for this model. We demonstrate its usefulness in a short case study by investigating the topology of three real systems and their randomized counterparts (using their Betti numbers). For two out of three systems, the model allows us to reject the hypothesis that there is no organization beyond the local scale.


Jean-Gabriel Young

June 20, 2017

Transcript

  1. HONS 2017: J.-G. Young, G. Petri, F. Vaccarino, A. Patania (June 2017)

    Affiliations: Département de physique, de génie physique, et d’optique, Université Laval, Québec, Canada; ISI Foundation, Turin, Italy; Dipartimento di Scienze Matematiche, Politecnico di Torino, Torino, Italy
  2. There are multi-agent interactions in complex systems

    Simplicial complexes track them explicitly.
  3. Why simplicial complexes? No loss of information upon projection

  5. Why simplicial complexes? Efficient compression of the structure

  7. Disease regulation dataset (facets: genes; nodes: human diseases)

    [Goh et al., PNAS] [Figure: Diseasome, facets column 1]
  8. Problem we address

    How can we assess the significance of the properties of simplicial complexes?
  10. Outline

    Simplicial complexes and null models. Simplicial configuration model. The sampling problem and its solution. The shape of real complex systems: random or organized? Homology and Betti numbers.
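  12. The simplicial configuration model: basic definitions

    [Figure: a simplicial complex on nodes 1 to 5 with three facets.] Degree sequence d = (2, 2, 1, 2, 1), where d_i is the number of facets that contain node i. Size sequence s = (3, 3, 2), where s_j is the number of nodes in facet j.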
  15. The simplicial configuration model: the ensemble

    The Simplicial Configuration Model (SCM) is the uniform distribution $\Pr(K; d, s) = 1/|\Omega(d, s)|$ for $K \in \Omega(d, s)$, where $\Omega(d, s)$ is the set of simplicial complexes with sequences (d, s). It generalizes [Courtney and Bianconi, Phys. Rev. E].
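  20. Sampling: change of representation

    [Figure: a simplicial complex on nodes 1 to 5 and its bipartite factor graph.] Bipartite factor graph ensemble with degree sequences (d, s) ... and two additional constraints, so that the mapping is bijective: no multiedges and no included neighborhoods.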
  21. Sampling: constraints

    Input sequences: (d, s) = ([2, 2, 1, 2, 1], [3, 3, 2]). Correct mapping. [Figure: bipartite graph and simplicial complex on nodes 1 to 5.] Output sequences: (d, s) = ([2, 2, 1, 2, 1], [3, 3, 2]).
  22. Sampling: constraints

    Input sequences: (d, s) = ([2, 2, 1, 2, 1], [3, 3, 2]). Constraint not respected: no multiedges. [Figure: bipartite graph and simplicial complex on nodes 1 to 5.] Output sequences: (d, s) = ([1, 2, 1, 1, 1], [2, 2, 2]).
  23. Sampling: constraints

    Input sequences: (d, s) = ([2, 2, 1, 2, 1], [3, 3, 2]). Constraint not respected: no included neighborhoods. [Figure: bipartite graph and simplicial complex on nodes 1 to 5.] Output sequences: (d, s) = ([1, 1, 1, 1, 1], [3, 0, 2]).
  25. Consequence

    A uniform distribution over bipartite graphs that satisfy the constraints yields a uniform distribution over simplicial complexes.
  29. Sampling: possible sampling strategies

    Rejection sampling (stub matching + rejection): draw from all bipartite graphs with sequences (d, s) and reject those that violate a constraint. Problem: far too many rejections! Loose bound: $\Pr[\text{reject}] > 1 - \exp\left[-\frac{1}{2}\left(\frac{\langle d^2\rangle}{\langle d\rangle} - 1\right)\left(\frac{\langle s^2\rangle}{\langle s\rangle} - 1\right)\right]$. Markov chain Monte Carlo: the natural choice!
  30. MCMC

    [Figure: illustration of the MCMC on simplicial complexes with nodes 1 to 5.]
  35. Sampling: MCMC details

    Move: pick L ∼ P random edges in the bipartite graph and rewire them. If this creates a multiedge or an included neighborhood, reject the move. P is a free choice; we use $\Pr[L = \ell] = e^{\lambda \ell}/Z$. Similar to [Miklós, Erdős and Soukup, Electron. J. Combin.]. The MCMC is uniform over $\Omega(d, s)$: the move set yields an aperiodic chain (∗) and connects the space. [Figure (a): edit distance vs $L_{\max}$ (10^0 to 10^4) for λ = 0, −1, +1.]
  36. The shape of real complex systems: random or organized?

  38. SCM as a null model: concept

    Is the quantity f(X) close to f(K) for random simplicial complexes X ∼ SCM[d(K), s(K)]? [Figure: distribution of property f over the SCM.] If $\Pr[|f(K) - f(X)| < \epsilon] \approx 1$: yes, K is typical; the local quantities d and s explain f.
  39. SCM as a null model: concept

    Is the quantity f(X) close to f(K) for random simplicial complexes X ∼ SCM[d(K), s(K)]? [Figure: distribution of property f over the SCM.] If $\Pr[|f(K) - f(X)| < \epsilon] \ll 1$: no, K is atypical; K is organized beyond the local scale.
  40. So... are real systems organized?

  41. Disease regulation dataset (true system)

    Facets: genes; nodes: human diseases [Goh et al., PNAS]. [Figure: Diseasome, facets column 1]
  42. Disease regulation dataset (random instance)

    Facets: genes; nodes: human diseases [Goh et al., PNAS]. [Figure: Diseasome, SCM (facets column 1)]
  43. Crimes in St. Louis (true system)

    Facets: people; nodes: crimes [Rosenfeld et al.]. [Figure: Moreno crime, facets column 1]
  44. Crimes in St. Louis (random instance)

    Facets: people; nodes: crimes [Rosenfeld et al.]. [Figure: Moreno crime, SCM (facets column 1)]
  46. So... are real systems random? Visibly not.

  47. How to quantify this: homology in seconds

    [Figure: simplicial complexes on nodes 1 to 5.] Done with homology. Results are summarized by the Betti numbers β_0, β_1, ..., β_k, where β_k counts the number of k-dimensional holes.
  48. Real systems: organized or random? Diseases; crime

    [Figure: Diseasome (facets column 1) and Moreno crime (facets column 1); panels (b) and (c) show distributions of β_k together with degree and size distributions. Caption: β_0, β_1 in the SCM (symbols) vs real systems (horizontal lines).]
  49. Real systems: organized or random? Pollinators (real); pollinators (random)

    [Figure: Pollinators (facets column 0) and Pollinators, SCM (facets column 0); panel (a) shows distributions of β_k together with degree and size distributions. Caption: β_0, β_1 in the SCM (symbols) vs real systems (horizontal lines).]
  50. Software and tutorials: github.com/jg-you/scm

  53. Selected references

    J.-G. Young, G. Petri, F. Vaccarino and A. Patania, arXiv. Equilibrium random ensembles: O. Courtney and G. Bianconi, Phys. Rev. E; K. Zuev, O. Eisenberg and D. Krioukov, J. Phys. A. Sampling: B. K. Fosdick et al., arXiv.
  58. Take-home message

    SCM: random simplicial complexes with fixed (d, s). Efficient sampling with MCMC. Real systems are not always organized. Many open questions! Simpliciality, best distribution P, connectivity? Reference: arxiv.org/abs/1705.10298. Software: github.com/jg-you/scm
  59. Reference: arxiv.org/abs/1705.10298. Software: github.com/jg-you/scm. Contact: info@jgyoung.ca, jgyoung.ca, @_jgyou
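Slide 12 defines the degree sequence d, which counts the facets that contain each node, and the size sequence s, which counts the nodes in each facet. The minimal Python sketch below assumes that a complex is stored as a list of facets given as Python sets. The facet list is a hypothetical example chosen to reproduce the slide's sequences. It is not necessarily the complex drawn on the slide.

```python
from collections import Counter


def sequences(facets, nodes):
    """Return the degree sequence d and the size sequence s of a facet list.

    d[i] counts the facets that contain nodes[i]; s[j] is the number of
    nodes in facets[j].
    """
    membership = Counter(v for facet in facets for v in facet)
    d = tuple(membership[v] for v in nodes)
    s = tuple(len(facet) for facet in facets)
    return d, s


# Hypothetical facets on nodes 1..5 that have the sequences quoted on slide 12.
facets = [{1, 2, 3}, {1, 4, 5}, {2, 4}]
d, s = sequences(facets, nodes=[1, 2, 3, 4, 5])
print(d, s)  # (2, 2, 1, 2, 1) (3, 3, 2)
```

Both sequences sum to the number of node-facet incidences (8 here). This gives a quick sanity check for any input pair (d, s).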
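Slides 20 to 23 map a simplicial complex to a bipartite factor graph. They impose two constraints so that the map is a bijection: no multiedges and no included neighborhoods. The minimal sketch below makes these constraints concrete. It assumes the bipartite graph is stored as a list of (node, facet index) pairs. All helper names are hypothetical and are not the API of the scm package.

```python
from itertools import combinations


def facets_to_edges(facets):
    """Bipartite factor graph: one (node, facet index) edge per incidence."""
    return [(v, j) for j, facet in enumerate(facets) for v in sorted(facet)]


def neighborhoods(edges):
    """Map each facet index to the set of nodes attached to it."""
    nbhd = {}
    for v, j in edges:
        nbhd.setdefault(j, set()).add(v)
    return nbhd


def has_multiedge(edges):
    """True if some (node, facet) pair appears more than once."""
    return len(set(edges)) != len(edges)


def has_included_neighborhood(edges):
    """True if the node set of one facet is contained in that of another."""
    sets = neighborhoods(edges).values()
    return any(a <= b or b <= a for a, b in combinations(sets, 2))


def is_valid(edges):
    """Both constraints hold, so the edges encode a simplicial complex."""
    return not has_multiedge(edges) and not has_included_neighborhood(edges)


def edges_to_facets(edges):
    """Inverse map: read each facet's node set off the bipartite graph."""
    nbhd = neighborhoods(edges)
    return [nbhd[j] for j in sorted(nbhd)]


good = facets_to_edges([{1, 2, 3}, {1, 4, 5}, {2, 4}])
print(is_valid(good), edges_to_facets(good))
# True [{1, 2, 3}, {1, 4, 5}, {2, 4}]

# Hypothetical invalid graphs with the same sequences d and s as `good`.
multi = [(1, 0), (1, 0), (3, 0), (2, 1), (4, 1), (5, 1), (2, 2), (4, 2)]
nested = [(1, 0), (2, 0), (4, 0), (1, 1), (3, 1), (5, 1), (2, 2), (4, 2)]
print(has_multiedge(multi), has_included_neighborhood(nested))  # True True
```

Reading an invalid graph back through `edges_to_facets` loses incidences, so the output sequences shrink. This is the effect shown on slides 22 and 23.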
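Slide 29 argues that stub matching followed by rejection wastes most draws. The sketch below implements that naive sampler and evaluates the bound. It reads the slide's formula as Pr[reject] > 1 − exp[−½(⟨d²⟩/⟨d⟩ − 1)(⟨s²⟩/⟨s⟩ − 1)]. In that expression, the exponential is the usual Poisson estimate of the probability that stub matching produces no multiedge. Function names are hypothetical, and nodes and facets are 0-indexed here for simplicity.

```python
import math
import random
from itertools import combinations


def rejection_lower_bound(d, s):
    """Rough bound 1 - exp[-(1/2)(<d^2>/<d> - 1)(<s^2>/<s> - 1)]."""
    def excess(seq):
        mean = sum(seq) / len(seq)
        mean_sq = sum(x * x for x in seq) / len(seq)
        return mean_sq / mean - 1
    return 1 - math.exp(-0.5 * excess(d) * excess(s))


def is_valid(edges):
    """No multiedge and no facet whose nodes are contained in another's."""
    if len(set(edges)) != len(edges):
        return False
    nbhd = {}
    for v, j in edges:
        nbhd.setdefault(j, set()).add(v)
    return not any(a <= b or b <= a for a, b in combinations(nbhd.values(), 2))


def stub_matching(d, s, rng):
    """Pair node stubs with a random permutation of facet stubs."""
    node_stubs = [v for v, dv in enumerate(d) for _ in range(dv)]
    facet_stubs = [j for j, sj in enumerate(s) for _ in range(sj)]
    rng.shuffle(facet_stubs)
    return list(zip(node_stubs, facet_stubs))


def rejection_sample(d, s, rng, max_tries=100_000):
    """Redraw until both constraints hold; also return the number of draws."""
    for tries in range(1, max_tries + 1):
        edges = stub_matching(d, s, rng)
        if is_valid(edges):
            return edges, tries
    raise RuntimeError("no valid complex found")


d, s = (2, 2, 1, 2, 1), (3, 3, 2)
print(round(rejection_lower_bound(d, s), 3))  # 0.481
edges, tries = rejection_sample(d, s, random.Random(1))
```

The bound grows with the heterogeneity of d and s, and the expected number of draws exceeds 1/(1 − bound). For heavy-tailed sequences this becomes impractical, which is why the slide turns to MCMC.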
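Slide 35 describes the MCMC move. The minimal sketch below implements it under three assumptions. First, the bipartite graph is a list of (node, facet) edges. Second, L is drawn from {2, ..., L_max} with Pr[L = ℓ] ∝ exp(λℓ); this support is our assumption. Third, "rewire" is implemented as a uniformly random permutation of the facet endpoints of the L chosen edges. This permutation preserves d and s and makes the proposal symmetric. It illustrates the idea on the slide and is not the scm package's implementation.

```python
import math
import random
from itertools import combinations


def is_valid(edges):
    """No multiedge and no facet neighborhood included in another one."""
    if len(set(edges)) != len(edges):
        return False
    nbhd = {}
    for v, j in edges:
        nbhd.setdefault(j, set()).add(v)
    return not any(a <= b or b <= a for a, b in combinations(nbhd.values(), 2))


def draw_length(l_max, lam, rng):
    """Hypothetical P: L in {2, ..., l_max}, Pr[L = l] proportional to exp(lam * l).

    Requires l_max >= 2.
    """
    support = range(2, l_max + 1)
    return rng.choices(support, weights=[math.exp(lam * l) for l in support])[0]


def mcmc_step(edges, l_max, lam, rng):
    """One move: permute the facet endpoints of L random edges; reject if invalid."""
    size = min(draw_length(l_max, lam, rng), len(edges))
    chosen = rng.sample(range(len(edges)), size)
    new_facets = [edges[i][1] for i in chosen]
    rng.shuffle(new_facets)  # d and s are preserved by construction
    proposal = list(edges)
    for i, j in zip(chosen, new_facets):
        proposal[i] = (edges[i][0], j)  # node endpoint kept, facet endpoint rewired
    # A rejected move leaves the state unchanged (a self-loop of the chain).
    return proposal if is_valid(proposal) else edges


def to_complex(edges):
    """Unlabeled complex: the set of facets, each a frozenset of nodes."""
    nbhd = {}
    for v, j in edges:
        nbhd.setdefault(j, set()).add(v)
    return frozenset(frozenset(f) for f in nbhd.values())


rng = random.Random(42)
# Facets {1, 2, 3}, {1, 4, 5}, {2, 4} written as (node, facet index) edges.
state = [(1, 0), (2, 0), (3, 0), (1, 1), (4, 1), (5, 1), (2, 2), (4, 2)]
seen = set()
for _ in range(5000):
    state = mcmc_step(state, l_max=4, lam=0.0, rng=rng)
    seen.add(to_complex(state))
print(len(seen), "distinct complexes visited")
```

The proposal is symmetric and invalid proposals are rejected, so the uniform distribution over valid edge lists is stationary. The slide's claims about aperiodicity and connectivity of the move set are what turn this into uniform sampling over Ω(d, s).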
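Slides 38 and 39 compare f(K) with the values f(X) obtained on SCM samples X. The probability Pr[|f(K) − f(X)| < ε] can be estimated by the fraction of samples within ε of f(K). The sketch below does this; the decision threshold is a hypothetical choice that the talk does not specify, and the input numbers are toy values rather than data from the talk.

```python
def typicality(f_k, f_samples, eps):
    """Monte Carlo estimate of Pr[|f(K) - f(X)| < eps] over null samples X."""
    if not f_samples:
        raise ValueError("need at least one null sample")
    return sum(abs(f_k - f_x) < eps for f_x in f_samples) / len(f_samples)


def verdict(f_k, f_samples, eps, threshold=0.05):
    """Hypothetical decision rule: 'atypical' when the estimate is below threshold."""
    return "typical" if typicality(f_k, f_samples, eps) >= threshold else "atypical"


# Toy numbers for illustration only; they are not results from the talk.
null_values = [9, 10, 11, 30]
print(typicality(10, null_values, eps=2))  # 0.75
print(verdict(50, null_values, eps=2))     # atypical
```

In practice, `f_samples` would hold a property such as β_1 computed on many MCMC samples drawn with the d(K) and s(K) of the real system.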
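Slide 47 summarizes homology with Betti numbers. For small complexes, they follow from ranks of boundary matrices: β_k = n_k − rank ∂_k − rank ∂_{k+1}, where n_k is the number of k-dimensional faces. The minimal sketch below computes them. It assumes coefficients in GF(2); ranks over other fields can differ when the complex has torsion. It enumerates every face of every facet, which is only practical for small facets. The talk does not say which software or coefficient field was used.

```python
from itertools import combinations


def faces(facets, k):
    """All k-dimensional faces (sorted tuples of k + 1 nodes) of the complex."""
    return sorted({c for f in facets for c in combinations(sorted(f), k + 1)})


def gf2_rank(columns):
    """Rank over GF(2) of vectors encoded as integer bitmasks."""
    pivots = {}  # leading bit -> reduced vector with that leading bit
    for v in columns:
        while v:
            lead = v.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = v
                break
            v ^= pivots[lead]
    return len(pivots)


def boundary_rank(facets, k):
    """Rank of the boundary map from k-faces to (k - 1)-faces (0 for k = 0)."""
    if k == 0:
        return 0
    index = {face: i for i, face in enumerate(faces(facets, k - 1))}
    columns = []
    for face in faces(facets, k):
        mask = 0
        for drop in range(len(face)):
            mask |= 1 << index[face[:drop] + face[drop + 1:]]
        columns.append(mask)
    return gf2_rank(columns)


def betti(facets, k):
    """k-th Betti number over GF(2)."""
    n_k = len(faces(facets, k))
    return n_k - boundary_rank(facets, k) - boundary_rank(facets, k + 1)


print([betti([{1, 2, 3}, {1, 4, 5}, {2, 4}], k) for k in range(3)])  # [1, 1, 0]
```

Computing β_0 and β_1 for the real complex and for each SCM sample gives the comparison plotted on slides 48 and 49.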