# Robust and fully Bayesian inference of complex networks from noisy data

Most empirical studies of complex networks do not return direct, error-free measurements of network structure. Instead, they typically rely on indirect measurements that are often error-prone and unreliable. A fundamental problem in empirical network science is how to make the best possible estimates of network structure given such unreliable data. In this paper we describe a fully Bayesian method for reconstructing networks from observational data in any format, even when the data contain substantial measurement error and when the nature and magnitude of that error are unknown. The method is introduced through pedagogical case studies using real-world example networks, and is specifically tailored to allow straightforward, computationally efficient implementation with a minimum of technical input. Computer code implementing the method is publicly available.

## Jean-Gabriel Young

September 21, 2020

## Transcript

1. ### N S MMXX R Jean-Gabriel Young

Center for the Study of Complex Systems, University of Michigan, Ann Arbor, MI, USA; Department of Computer Science, University of Vermont, Burlington, VT, USA. jg-you.github.io, @_jgyou, jean-gabriel.young@uvm.edu. Joint work with George T. Cantwell and M.E.J. Newman.
2. ### In the empirical sciences, measurements are treated as noisy observations of reality.

[Figure: probability distribution of a measured height. Axes: Height (cm), Probability.]
3. ### In network science, measurements are treated as direct observations of reality.

[Figure: "Dolphin companionship" matrix. Axes: Dolphin ID by Dolphin ID; scale: Number of observations. Shown next to a network with nodes labeled 1 to 13.]
4. ### This talk: How to convert noisy measurements to networks*

*efficiently, from first principles, and EASILY

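6. ### Statistical approach to network measurement

Bayes' rule, with $A$ the network, $\theta$ the parameters, and $X$ the data:

$$P(A, \theta \mid X) \propto P(X \mid A, \theta)\, P(A \mid \theta)\, P(\theta)$$

Probabilities defined by a measurement model:

⊲ Prior $P(\theta)$: What is the likely range of parameters?
⊲ Network model $P(A \mid \theta)$: What class of networks are we considering?
⊲ Data model $P(X \mid A, \theta)$: How would a network lead to data?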
7. ### Statistical approach to network measurement

$$P(A, \theta \mid X) \propto P(X \mid A, \theta)\, P(A \mid \theta)\, P(\theta)$$

Statistical measurement can mean any of the following:

⊲ Computing the distribution $P(A \mid X)$.
⊲ Estimating the probability of every edge, $P(A_{ij} = 1 \mid X)$.
⊲ Estimating the probability of triangles, $P(A_{ij} = 1 \land A_{jk} = 1 \land A_{ik} = 1 \mid X)$.
⊲ And more.

These are "just" averages of the form $\int f(A, \theta, X)\, P(A, \theta \mid X)$.
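The averages listed on this slide can be approximated from posterior samples of the network. The sketch below is only an illustration: it assumes the samples are already available as a NumPy array of shape `(S, n, n)` holding symmetric 0/1 adjacency matrices. That layout, the function names, and the toy input are hypothetical and are not taken from the talk's software.

```python
import numpy as np


def edge_probabilities(samples):
    """Estimate P(A_ij = 1 | X) as the fraction of samples containing edge (i, j)."""
    return np.asarray(samples, dtype=float).mean(axis=0)


def triangle_probability(samples, i, j, k):
    """Estimate P(A_ij = 1 and A_jk = 1 and A_ik = 1 | X) from the same samples."""
    s = np.asarray(samples)
    closed = s[:, i, j] * s[:, j, k] * s[:, i, k]  # 1 only if all three edges exist
    return float(closed.mean())


# Toy input (made up): two hand-written "posterior samples" on three nodes.
samples = np.array([
    [[0, 1, 1], [1, 0, 1], [1, 1, 0]],  # full triangle
    [[0, 1, 0], [1, 0, 0], [0, 0, 0]],  # only edge (0, 1)
])
edge_probs = edge_probabilities(samples)
tri_prob = triangle_probability(samples, 0, 1, 2)
```

Any other network statistic can be averaged the same way: evaluate it on each sampled network and take the mean.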
8. ### How can we compute $\int f(A, \theta, X)\, P(A, \theta \mid X)$ ...

... for your data? ... with a model that suits your measurements? ... easily? ... and efficiently?
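9. ### The method in a nutshell

Key insight: consider a smaller (but expressive) class of models.

$$P(\theta) = \text{arbitrary}$$

$$P(A \mid \theta) = \prod_{i<j} [\nu_{ij}(0)]^{1 - A_{ij}}\, [\nu_{ij}(1)]^{A_{ij}}$$

$$P(X \mid A, \theta) = \prod_{i<j} [\mu_{ij}(0)]^{1 - A_{ij}}\, [\mu_{ij}(1)]^{A_{ij}}$$

Network model:
⊲ $\nu_{ij}(1)$: Prob. of an edge $(i, j)$
⊲ $\nu_{ij}(0)$: Prob. of no edge $(i, j)$

Data model:
⊲ $\mu_{ij}(1)$: Prob. of $X_{ij}$ when $(i, j)$ is an edge
⊲ $\mu_{ij}(0)$: Prob. of $X_{ij}$ when $(i, j)$ is not an edge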
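10. ### The method in a nutshell

Why is it helpful? Because we know the closed forms:

$$P(X \mid \theta) = \prod_{i<j} \left[\nu_{ij}(0)\,\mu_{ij}(0) + \nu_{ij}(1)\,\mu_{ij}(1)\right]$$

$$P(A \mid X, \theta) = \prod_{i<j} [Q_{ij}(\theta)]^{A_{ij}}\, [1 - Q_{ij}(\theta)]^{1 - A_{ij}}$$

where $Q_{ij}(\theta) = \nu_{ij}(1)\,\mu_{ij}(1) / [\nu_{ij}(0)\,\mu_{ij}(0) + \nu_{ij}(1)\,\mu_{ij}(1)]$ is the posterior probability of edge $(i, j)$ given $\theta$, and $P(\theta \mid X) \propto P(X \mid \theta)\, P(\theta)$.

With these we can evaluate

$$\int f(A, \theta, X)\, P(A, \theta \mid X) = \int f(A, \theta, X)\, P(A \mid X, \theta)\, P(\theta \mid X) \approx \frac{1}{N} \sum_{n=1}^{N} f(A_n, \theta_n, X)$$

in two easy steps:

(1) Draw $\theta$ from $P(\theta \mid X)$ (automatic with Stan, PyMC, etc.)
(2) Draw $A$ from $P(A \mid X, \theta)$ (just coin flips)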
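The two-step recipe can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the talk's implementation: parameter draws are taken as given (in practice they would come from a sampler such as Stan or PyMC), and `edge_posterior` is a hypothetical user-supplied function that returns the matrix of edge probabilities P(A_ij = 1 | X, θ) for one draw of θ.

```python
import numpy as np


def sample_network(Q, rng):
    """Draw one symmetric adjacency matrix with independent edges, P(A_ij = 1) = Q_ij."""
    n = Q.shape[0]
    iu = np.triu_indices(n, k=1)                 # each pair i < j once
    A = np.zeros((n, n), dtype=int)
    A[iu] = rng.random(len(iu[0])) < Q[iu]       # one coin flip per pair
    return A + A.T


def posterior_average(f, theta_draws, edge_posterior, rng):
    """Monte Carlo estimate of the posterior mean of f(A, theta)."""
    values = []
    for theta in theta_draws:        # step 1: theta ~ P(theta | X), supplied by the caller
        Q = edge_posterior(theta)    # edge probabilities given this theta
        A = sample_network(Q, rng)   # step 2: A ~ P(A | X, theta)
        values.append(f(A, theta))
    return np.mean(values, axis=0)


# Toy check (made-up inputs): every pair is an edge with probability theta.
rng = np.random.default_rng(0)
toy_posterior = lambda theta: np.full((4, 4), theta)
count_edges = lambda A, theta: A.sum() / 2
all_edges = posterior_average(count_edges, [1.0, 1.0], toy_posterior, rng)
no_edges = posterior_average(count_edges, [0.0, 0.0], toy_posterior, rng)
```

Drawing one network per parameter draw keeps the estimator simple; drawing several networks per draw would also be valid and can reduce variance.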
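11. ### Example of model

Data: # of times dolphins seen swimming together [R. C. Connor, R. A. Smolker and A. F. Richards, ( )]

[Figure: "Dolphin companionship" matrix. Axes: Dolphin ID by Dolphin ID; scale: Number of observations.]

Network model:

$$\nu_{ij}(0) = 1 - \rho, \qquad \nu_{ij}(1) = \rho$$

Data model:

$$X_{ij} \mid A_{ij} = 0 \sim \text{Poisson}(\lambda_0), \quad \text{i.e. } \mu_{ij}(0) = (\lambda_0)^{X_{ij}}\, e^{-\lambda_0} / X_{ij}!$$

$$X_{ij} \mid A_{ij} = 1 \sim \text{Poisson}(\lambda_1), \quad \text{i.e. } \mu_{ij}(1) = (\lambda_1)^{X_{ij}}\, e^{-\lambda_1} / X_{ij}!$$

Prior: $\lambda_0 < \lambda_1$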
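For this Poisson example, the per-pair edge probability and the marginal likelihood of the parameters can be written directly. The sketch below is an illustration only: the names `rho`, `lam0`, and `lam1` (edge density and the two Poisson rates) and the toy count matrix are assumptions made for the example, and the code is not the talk's published software.

```python
from math import lgamma

import numpy as np


def poisson_log_pmf(x, lam):
    """Elementwise log of the Poisson probability lam^x e^(-lam) / x!."""
    x = np.asarray(x, dtype=float)
    log_factorial = np.vectorize(lgamma, otypes=[float])(x + 1.0)
    return x * np.log(lam) - lam - log_factorial


def edge_posterior(X, rho, lam0, lam1):
    """P(A_ij = 1 | X_ij, rho, lam0, lam1), computed in log space for stability."""
    log_w0 = np.log1p(-rho) + poisson_log_pmf(X, lam0)  # weight of "no edge"
    log_w1 = np.log(rho) + poisson_log_pmf(X, lam1)     # weight of "edge"
    return np.exp(log_w1 - np.logaddexp(log_w0, log_w1))


def log_marginal_likelihood(X, rho, lam0, lam1):
    """log P(X | rho, lam0, lam1) over pairs i < j, with the network summed out."""
    X = np.asarray(X)
    iu = np.triu_indices(X.shape[0], k=1)
    log_w0 = np.log1p(-rho) + poisson_log_pmf(X[iu], lam0)
    log_w1 = np.log(rho) + poisson_log_pmf(X[iu], lam1)
    return float(np.logaddexp(log_w0, log_w1).sum())


# Toy counts (made up): pair (0, 1) seen together 12 times, pair (1, 2) once.
X = np.array([[0, 12, 0], [12, 0, 1], [0, 1, 0]])
Q = edge_posterior(X, rho=0.2, lam0=0.5, lam1=8.0)
```

Adding the log prior to `log_marginal_likelihood` gives an unnormalized log posterior for the parameters, which is the quantity a general-purpose sampler needs. The diagonal of `Q` has no meaning here and should be ignored.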
12. ### The method in action

Dolphin data set, with the example model.

Input: [Figure: "Dolphin companionship" matrix. Axes: Dolphin ID by Dolphin ID; scale: Number of observations.]

Outputs: [Figure: network panels with nodes labeled 1 to 13.]
13. ### The method in action

Dolphin data set, with the example model.

Input: [Figure: "Dolphin companionship" matrix. Axes: Dolphin ID by Dolphin ID; scale: Number of observations.]

Outputs: [Figure: two density plots, one of Transitivity and one of Mean eigenvector centrality, each with a marker labeled "Thresholded".]
14. ### Actual applications

P - [JGY, F. S. Valdovinos, M. E. J. Newman, bioRxiv: ( ).]

M I [K. Leyba et al., forthcoming ( ).]
15. ### Take-home message

⊲ Measurements are not networks.
⊲ Obtaining networks from measurements is an inference problem.
⊲ We delineated models for which this problem is easy.
⊲ References: arXiv: . (method) and bioRxiv: (application).
⊲ Software: github.com/jg-you/noisy-networks-measurements
⊲ Tutorial: https://bit.ly/32bnKsv