Robust and fully Bayesian inference of complex networks from noisy data

Paper: https://arxiv.org/abs/2008.03334
Code: https://github.com/jg-you/noisy-networks-measurements
Tutorial: https://github.com/jg-you/noisy-networks-measurements/blob/master/tutorial/tutorial.ipynb

Most empirical studies of complex networks do not return direct, error-free measurements of network structure. Instead, they typically rely on indirect measurements that are often error-prone and unreliable. A fundamental problem in empirical network science is how to make the best possible estimates of network structure given such unreliable data. In this paper we describe a fully Bayesian method for reconstructing networks from observational data in any format, even when the data contain substantial measurement error and when the nature and magnitude of that error is unknown. The method is introduced through pedagogical case studies using real-world example networks, and specifically tailored to allow straightforward, computationally efficient implementation with a minimum of technical input. Computer code implementing the method is publicly available.

Jean-Gabriel Young

September 21, 2020

Transcript

  1. Jean-Gabriel Young

    Center for the Study of Complex Systems, University of Michigan, Ann Arbor, MI, USA
    Department of Computer Science, University of Vermont, Burlington, VT, USA
    jg-you.github.io · @_jgyou · jean-gabriel.young@uvm.edu
    Joint work with George T. Cantwell and M. E. J. Newman
  2. In the empirical sciences, measurements are treated as noisy observations of reality.

    [Figure: Probability vs. Height (cm).]
  3. In network science, measurements are treated as direct observations of reality.

    [Figure: "Dolphin companionship" matrix (Dolphin ID × Dolphin ID, colored by number of observations, 0–30) and a network drawing of dolphins 1–13.]
  4. This talk: How to convert noisy measurements to networks*

    *efficiently, from first principles, and EASILY
  5. How are network data born?

  6. Statistical approach to network measurement

    Bayes' rule, for a network $A$, parameters $\theta$ and data $D$:
    $P(A,\theta \mid D) \propto P(D \mid A,\theta)\,P(A \mid \theta)\,P(\theta)$
    Probabilities defined by a measurement model:
    ⊲ Prior $P(\theta)$: What is the likely range of parameters?
    ⊲ Network model $P(A \mid \theta)$: What class of networks are we considering?
    ⊲ Data model $P(D \mid A,\theta)$: How would a network $A$ lead to data $D$?
  7. Statistical approach to network measurement

    $P(A,\theta \mid D) \propto P(D \mid A,\theta)\,P(A \mid \theta)\,P(\theta)$
    Statistical measurement can mean any of the following:
    ⊲ Computing the distribution $P(A \mid D)$.
    ⊲ Estimating the probability of every edge, $P(A_{ij} = 1 \mid D)$.
    ⊲ Estimating the probability of triangles, $P(A_{ij} = 1 \wedge A_{jk} = 1 \wedge A_{ki} = 1 \mid D)$.
    ⊲ And more...
    These are all "just" averages of the form $\int f(A,\theta,D)\,P(A,\theta \mid D)$.
  8. How can we compute $\int f(A,\theta,D)\,P(A,\theta \mid D)$...

    ... for your data?
    ... with a model that suits your measurements?
    ... easily?
    ... and efficiently?
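  9. The method in a nutshell

    Key insight: consider a smaller (but expressive) class of models.
    $P(\theta)$ = arbitrary
    $P(A \mid \theta) = \prod_{i<j} [Q_{ij}(0)]^{1-A_{ij}}\,[Q_{ij}(1)]^{A_{ij}}$
    $P(D \mid A,\theta) = \prod_{i<j} [P_{ij}(0)]^{1-A_{ij}}\,[P_{ij}(1)]^{A_{ij}}$
    ⊲ Network model: $Q_{ij}(1)$ is the probability of an edge $(i,j)$; $Q_{ij}(0)$ is the probability of no edge $(i,j)$.
    ⊲ Data model: $P_{ij}(1)$ is the probability of the data $D_{ij}$ when $(i,j)$ is an edge; $P_{ij}(0)$ is the probability of $D_{ij}$ when $(i,j)$ is not an edge.
    (All of $Q_{ij}$ and $P_{ij}$ may depend on $\theta$.)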
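  10. The method in a nutshell

    Why is it helpful? Because we know the closed forms:
    $P(\theta \mid D) \propto P(\theta) \prod_{i<j} \left[Q_{ij}(0)\,P_{ij}(0) + Q_{ij}(1)\,P_{ij}(1)\right]$
    $P(A \mid D,\theta) = \prod_{i<j} [\sigma_{ij}(\theta)]^{A_{ij}}\,[1 - \sigma_{ij}(\theta)]^{1-A_{ij}}$, where $\sigma_{ij}(\theta) = \dfrac{Q_{ij}(1)\,P_{ij}(1)}{Q_{ij}(0)\,P_{ij}(0) + Q_{ij}(1)\,P_{ij}(1)}$
    With these we can evaluate
    $\int f(A,\theta,D)\,P(A,\theta \mid D) = \int f(A,\theta,D)\,P(A \mid D,\theta)\,P(\theta \mid D) \approx \frac{1}{N}\sum_{k=1}^{N} f(A^{(k)},\theta^{(k)},D)$
    in two easy steps:
    ⊲ Step 1. Draw $\theta^{(k)}$ from $P(\theta \mid D)$ (automatic with Stan, PyMC, etc.).
    ⊲ Step 2. Draw $A^{(k)}$ from $P(A \mid D,\theta^{(k)})$ (just coin flips).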
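  11. Example of model

    Data: number of times each pair of dolphins was seen swimming together [R. C. Connor, R. A. Smolker and A. F. Richards].
    [Figure: "Dolphin companionship" matrix (Dolphin ID × Dolphin ID, colored by number of observations, 0–30).]
    Network model: $Q_{ij}(0) = 1 - \rho$, $Q_{ij}(1) = \rho$
    Data model:
    ⊲ $D_{ij} \mid A_{ij} = 0 \sim \mathrm{Poisson}(\lambda_0)$, i.e. $P_{ij}(0) = \lambda_0^{D_{ij}} e^{-\lambda_0} / D_{ij}!$
    ⊲ $D_{ij} \mid A_{ij} = 1 \sim \mathrm{Poisson}(\lambda_1)$, i.e. $P_{ij}(1) = \lambda_1^{D_{ij}} e^{-\lambda_1} / D_{ij}!$
    Prior: $\lambda_0 < \lambda_1$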
  12. The method in action: dolphin data set, with the example model

    [Figure: Input, the dolphin companionship matrix (Dolphin ID × Dolphin ID, number of observations 0–30); Outputs, networks over dolphins 1–13.]
  13. The method in action: dolphin data set, with the example model

    [Figure: Input, the dolphin companionship matrix (Dolphin ID × Dolphin ID, number of observations 0–30); Outputs, distributions (Density) of the transitivity and of the mean eigenvector centrality, each marked with a "Thresholded" reference value.]
  14. Actual applications

    ⊲ [JGY, F. S. Valdovinos, M. E. J. Newman, bioRxiv.]
    ⊲ [K. Leyba et al., forthcoming.]
  15. Take-home message

    ⊲ Measurements are not networks.
    ⊲ Recovering networks from measurements is an inference problem.
    ⊲ We delineated models for which this problem is easy.
    ⊲ References: arXiv (method) and bioRxiv (application).
    ⊲ Software: github.com/jg-you/noisy-networks-measurements
    ⊲ Tutorial: https://bit.ly/32bnKsv
  16. Complete tutorial available on the repository! https://bit.ly/32bnKsv
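
The Bayes'-rule decomposition on slide 6, $P(A,\theta \mid D) \propto P(D \mid A,\theta)\,P(A \mid \theta)\,P(\theta)$, can be made concrete on a problem small enough to enumerate exhaustively. The sketch below is not the authors' code (that lives at github.com/jg-you/noisy-networks-measurements). Everything in it is a hypothetical toy: 3 nodes, one binary observation per pair, known detection rates `ALPHA` and `BETA`, and a three-point grid for the edge density `rho` serving as $\theta$.

```python
from itertools import product

# Toy setup, all values hypothetical (not from the talk): 3 nodes, one binary
# observation per pair, known detection rates, and a 3-point grid for rho.
PAIRS = [(0, 1), (0, 2), (1, 2)]
D = {(0, 1): 1, (0, 2): 1, (1, 2): 0}  # 1 = pair was observed together
ALPHA, BETA = 0.9, 0.1                 # P(D_ij = 1 | edge), P(D_ij = 1 | no edge)
RHO_GRID = [0.2, 0.5, 0.8]             # theta = rho, uniform prior on the grid


def prior(rho):
    # P(theta): uniform over the grid
    return 1.0 / len(RHO_GRID)


def network_model(A, rho):
    # P(A | theta): each pair is an edge independently with probability rho
    p = 1.0
    for e in PAIRS:
        p *= rho if A[e] else 1.0 - rho
    return p


def data_model(A):
    # P(D | A, theta): each observation depends only on its own pair
    p = 1.0
    for e in PAIRS:
        q = ALPHA if A[e] else BETA  # probability of observing D_ij = 1
        p *= q if D[e] else 1.0 - q
    return p


def joint_posterior():
    # P(A, theta | D) ∝ P(D | A, theta) P(A | theta) P(theta), normalized by
    # enumerating all 2^3 networks and every grid value of rho
    weights = {}
    for bits in product([0, 1], repeat=len(PAIRS)):
        A = dict(zip(PAIRS, bits))
        for rho in RHO_GRID:
            weights[(bits, rho)] = data_model(A) * network_model(A, rho) * prior(rho)
    Z = sum(weights.values())
    return {state: w / Z for state, w in weights.items()}


post = joint_posterior()
# Marginal posterior probability that pair (1, 2), never observed together, is an edge
p_edge_12 = sum(p for (bits, _), p in post.items() if bits[2] == 1)
print(f"P(A_12 = 1 | D) = {p_edge_12:.3f}")
```

Brute force scales as $2^{n(n-1)/2}$, so it only serves to check intuition; the factorized models on slides 9 and 10 are what make real networks tractable.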
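
Slide 7 notes that edge and triangle probabilities are "just" posterior averages. Given a list of networks drawn from the posterior, each such quantity is the sample mean of an indicator function $f(A)$. Below is a minimal sketch, assuming samples are stored as 0/1 adjacency matrices. The four draws are made up for illustration, and the helper names (`posterior_average`, `edge_probability`, `triangle_probability`) are hypothetical.

```python
def from_edges(n, edges):
    # Build a symmetric 0/1 adjacency matrix from an edge list
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = A[j][i] = 1
    return A


def posterior_average(samples, f):
    # Monte Carlo estimate of ∫ f(A, θ, D) P(A, θ | D) from posterior draws of A
    return sum(f(A) for A in samples) / len(samples)


def edge_probability(samples, i, j):
    # f(A) = A_ij, so the average estimates P(A_ij = 1 | D)
    return posterior_average(samples, lambda A: A[i][j])


def triangle_probability(samples, i, j, k):
    # f(A) = A_ij A_jk A_ki, so the average estimates P(A_ij = 1 ∧ A_jk = 1 ∧ A_ki = 1 | D)
    return posterior_average(samples, lambda A: A[i][j] * A[j][k] * A[k][i])


# Four hypothetical posterior draws on 3 nodes (made up for illustration)
samples = [
    from_edges(3, [(0, 1), (1, 2), (0, 2)]),
    from_edges(3, [(0, 1), (1, 2)]),
    from_edges(3, [(0, 1)]),
    from_edges(3, [(0, 1), (0, 2), (1, 2)]),
]
print(edge_probability(samples, 1, 2), triangle_probability(samples, 0, 1, 2))
```

Any other network statistic (degree, transitivity, centrality) plugs into `posterior_average` the same way, by passing a different `f`.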
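
On slide 9, both the network model and the data model factorize over pairs $i<j$. Because $A_{ij}$ is 0 or 1, $[Q_{ij}(0)]^{1-A_{ij}}[Q_{ij}(1)]^{A_{ij}}$ is simply $Q_{ij}(A_{ij})$, and the log-probabilities become sums over pairs. The sketch below assumes the user supplies $Q_{ij}(a)$ and $P_{ij}(a)$ as Python callables `Q(i, j, a)` and `P(i, j, a)`, a hypothetical interface. The instance it uses (homogeneous $\rho$, binary data with rates $\alpha$ and $\beta$) is likewise a made-up example.

```python
import math


def log_network_model(A, Q):
    # log P(A | theta) = sum_{i<j} [(1 - A_ij) log Q_ij(0) + A_ij log Q_ij(1)];
    # since A_ij is 0 or 1, each term is simply log Q_ij(A_ij)
    n = len(A)
    return sum(math.log(Q(i, j, A[i][j])) for i in range(n) for j in range(i + 1, n))


def log_data_model(A, P):
    # log P(D | A, theta) = sum_{i<j} log P_ij(A_ij), where P_ij(a) is the
    # probability of the observed D_ij given that A_ij = a
    n = len(A)
    return sum(math.log(P(i, j, A[i][j])) for i in range(n) for j in range(i + 1, n))


# Hypothetical instance: homogeneous edge probability rho, binary data D_ij
# with true-positive rate alpha and false-positive rate beta
rho, alpha, beta = 0.3, 0.9, 0.1
D = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]


def Q(i, j, a):
    return rho if a == 1 else 1 - rho


def P(i, j, a):
    q = alpha if a == 1 else beta  # probability of observing D_ij = 1
    return q if D[i][j] == 1 else 1 - q


A = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]  # candidate network: a single edge (0, 1)
log_joint = log_network_model(A, Q) + log_data_model(A, P)  # add log P(theta) if needed
print(round(log_joint, 4))
```

Swapping in a different measurement process only means writing a new `P`; the sums over pairs stay the same.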
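
Slide 10's two closed forms translate almost line for line into code. The marginal likelihood $\sum_{i<j}\log[Q_{ij}(0)P_{ij}(0) + Q_{ij}(1)P_{ij}(1)]$ is what a sampler for $\theta$ (step 1) needs. The per-pair probability $\sigma_{ij}(\theta)$ drives the coin flips of step 2. This is a sketch under assumptions: `make_model`, its binary-data model with $\theta = (\rho, \alpha, \beta)$, and the two values in `theta_draws` are hypothetical stand-ins for what Stan or PyMC would return.

```python
import math
import random


def make_model(theta, D):
    # Hypothetical model: theta = (rho, alpha, beta), binary data D_ij
    rho, alpha, beta = theta

    def Q(i, j, a):  # network model Q_ij(a)
        return rho if a == 1 else 1 - rho

    def P(i, j, a):  # data model P_ij(a): prob. of the observed D_ij given A_ij = a
        q = alpha if a == 1 else beta
        return q if D[i][j] == 1 else 1 - q

    return Q, P


def log_marginal_likelihood(n, Q, P):
    # log P(D | theta) = sum_{i<j} log[Q_ij(0) P_ij(0) + Q_ij(1) P_ij(1)];
    # adding log P(theta) gives log P(theta | D) up to a constant (step 1 target)
    return sum(math.log(Q(i, j, 0) * P(i, j, 0) + Q(i, j, 1) * P(i, j, 1))
               for i in range(n) for j in range(i + 1, n))


def edge_posterior(i, j, Q, P):
    # sigma_ij(theta) = Q_ij(1) P_ij(1) / [Q_ij(0) P_ij(0) + Q_ij(1) P_ij(1)]
    w1 = Q(i, j, 1) * P(i, j, 1)
    return w1 / (Q(i, j, 0) * P(i, j, 0) + w1)


def sample_network(n, Q, P, rng):
    # Step 2: draw A ~ P(A | D, theta) with one independent coin flip per pair
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < edge_posterior(i, j, Q, P):
                A[i][j] = A[j][i] = 1
    return A


D = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
theta_draws = [(0.3, 0.9, 0.1), (0.4, 0.85, 0.05)]  # stand-ins for step-1 output
rng = random.Random(0)
networks = []
for theta in theta_draws:
    Q, P = make_model(theta, D)
    networks.append(sample_network(3, Q, P, rng))  # one network per theta draw
```

Pairing each network with its own $\theta^{(k)}$, rather than flipping coins with averaged edge probabilities, keeps the correlations that a shared $\theta$ induces between edges.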
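
The Poisson model of slide 11 can be run end to end with nothing but the standard library. The talk delegates step 1 to Stan or PyMC. The sketch below swaps in a plain random-walk Metropolis sampler over $\theta = (\rho, \lambda_0, \lambda_1)$, which is a different (and much cruder) technique chosen only to keep the example self-contained. Assumptions: a flat prior on $0<\rho<1$, $0<\lambda_0<\lambda_1$ (the slide only states $\lambda_0<\lambda_1$), hypothetical step sizes, and a small made-up count matrix that is not the dolphin data.

```python
import math
import random


def log_poisson(k, lam):
    # log of lam^k e^(-lam) / k!
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def pair_terms(k, theta):
    # log[Q(0) P(0)] and log[Q(1) P(1)] for one pair with count k
    rho, lam0, lam1 = theta
    return (math.log(1 - rho) + log_poisson(k, lam0),
            math.log(rho) + log_poisson(k, lam1))


def log_posterior(theta, counts):
    # Unnormalized log P(theta | D). ASSUMPTION: flat prior on
    # 0 < rho < 1 and 0 < lam0 < lam1 (the slide only states lam0 < lam1)
    rho, lam0, lam1 = theta
    if not (0 < rho < 1 and 0 < lam0 < lam1):
        return -math.inf
    n, total = len(counts), 0.0
    for i in range(n):
        for j in range(i + 1, n):
            a, b = pair_terms(counts[i][j], theta)
            m = max(a, b)  # log-sum-exp of the two terms, for numerical stability
            total += m + math.log(math.exp(a - m) + math.exp(b - m))
    return total


def metropolis(counts, theta0, steps, scale, rng):
    # Random-walk Metropolis over theta: a simple stand-in for Stan or PyMC.
    # theta0 must satisfy the prior constraints.
    theta, lp = tuple(theta0), log_posterior(theta0, counts)
    draws = []
    for _ in range(steps):
        proposal = tuple(t + rng.gauss(0, s) for t, s in zip(theta, scale))
        lp_new = log_posterior(proposal, counts)
        if math.log(1.0 - rng.random()) < lp_new - lp:  # 1 - random() is never 0
            theta, lp = proposal, lp_new
        draws.append(theta)
    return draws


def edge_probabilities(counts, draws):
    # P(A_ij = 1 | D) = average over theta draws of sigma_ij(theta)
    n = len(counts)
    probs = [[0.0] * n for _ in range(n)]
    for theta in draws:
        for i in range(n):
            for j in range(i + 1, n):
                a, b = pair_terms(counts[i][j], theta)
                d = b - a  # sigma = e^b / (e^a + e^b), computed without overflow
                s = 1.0 / (1.0 + math.exp(-d)) if d >= 0 else math.exp(d) / (1.0 + math.exp(d))
                probs[i][j] += s / len(draws)
                probs[j][i] = probs[i][j]
    return probs


# Hypothetical counts for 6 animals (NOT the dolphin data from the talk)
counts = [[0] * 6 for _ in range(6)]
for (i, j), k in {(0, 1): 12, (0, 2): 15, (1, 2): 10, (3, 4): 14,
                  (3, 5): 11, (4, 5): 13, (2, 3): 1, (0, 5): 2}.items():
    counts[i][j] = counts[j][i] = k

rng = random.Random(42)
draws = metropolis(counts, (0.5, 1.0, 10.0), 5000, (0.05, 0.2, 0.5), rng)[1000:]  # drop burn-in
probs = edge_probabilities(counts, draws)
```

The constraint $\lambda_0<\lambda_1$ is enforced by rejecting proposals outside the prior's support; it also removes the label-switching symmetry between "edge" and "no edge". In Stan the same constraint would typically be expressed with an `ordered` parameter.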
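
Slide 13 compares the distribution of a network statistic over posterior samples with the single value obtained by thresholding the data. Below is a minimal sketch for the transitivity (3 × triangles / connected triples). Assumptions: the threshold `t`, the count matrix, and the matrix `sigma` of edge probabilities are all made up. Flipping independent coins with `sigma` corresponds to holding $\theta$ fixed, as in step 2 of slide 10.

```python
import random


def transitivity(A):
    # Global clustering coefficient: 3 x (#triangles) / (#connected triples),
    # counted as closed two-paths over all two-paths centered on each node
    n = len(A)
    closed = triples = 0
    for i in range(n):
        nbrs = [j for j in range(n) if A[i][j]]
        d = len(nbrs)
        triples += d * (d - 1) // 2
        closed += sum(A[nbrs[a]][nbrs[b]] for a in range(d) for b in range(a + 1, d))
    return closed / triples if triples else 0.0


def threshold(counts, t):
    # Baseline: keep an edge wherever the count reaches t (t is hypothetical)
    n = len(counts)
    return [[1 if i != j and counts[i][j] >= t else 0 for j in range(n)] for i in range(n)]


def sample_network(sigma, rng):
    # One posterior draw for fixed theta: independent coin flips per pair
    n = len(sigma)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < sigma[i][j]:
                A[i][j] = A[j][i] = 1
    return A


counts = [[0, 5, 4, 0], [5, 0, 3, 1], [4, 3, 0, 2], [0, 1, 2, 0]]  # made-up counts
sigma = [[0, .99, .95, .05], [.99, 0, .9, .3],
         [.95, .9, 0, .6], [.05, .3, .6, 0]]                      # hypothetical sigma_ij
rng = random.Random(0)
posterior_T = [transitivity(sample_network(sigma, rng)) for _ in range(1000)]
T_threshold = transitivity(threshold(counts, 2))
print(T_threshold, sum(posterior_T) / len(posterior_T))
```

The thresholded network yields one number with no error bar, while the posterior samples yield a full distribution against which that number can be judged.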