Multilayer Weighted Social Network Model

Y. Murase, J. Török, H.-H. Jo, K. Kaski, J. Kertész
1. RIKEN, Advanced Institute of Computational Science
2. Department of Theoretical Physics, Budapest University of Technology and Economics
3. Department of Biomedical Engineering and Computational Science, Aalto University School of Science
4. Centre for Network Science, Central European University

Large scale analysis on ICT data
The rapid development of ICT has generated entirely new approaches in the social sciences. Data sources include mobile phone calls, social network services, scientific collaborations, etc. Mobile phone data have a special role because coverage in the adult population approaches 100%. http://tnw.to/c4a8p

The Granovetterian structure
The strength of weak ties (M. Granovetter, 1973): a hypothesis about the relation between link topology and weight, stated for the local-scale (micro-) structure of society:
1. “The strength of a tie is a (probably linear) combination of the amount of time, the emotional intensity, the intimacy (mutual confiding), and the reciprocal services which characterize the tie.”
2. “The stronger the tie between A and B, the larger the proportion of individuals S to whom both are tied.”
Consequences for the global-scale (macro-) structure: society consists of strongly connected (“wired”) communities linked by weak ties. The latter hold the society together.
M. S. Granovetter, "The Strength of Weak Ties", American Journal of Sociology 78, 1360 (1973)

Percolation analysis: intra-community links are strong while inter-community links are weak. The difference of the critical thresholds, Δfc, is an indicator of the Granovetterian structure.
[Figure: excerpt from Onnela et al., Fig. 3: the stability of the mobile communication network to link removal. The control parameter f denotes the fraction of removed links.]
[Paper excerpt, cropped on the slide: the relative topological overlap of the neighborhoods of two users v_i and v_j, representing the proportion of their common friends, is $O_{ij} = n_{ij} / ((k_i - 1) + (k_j - 1) - n_{ij})$, where $n_{ij}$ is the number of common neighbors of v_i and v_j, and $k_i$ ($k_j$) denotes the degree of node v_i (v_j).]
J. P. Onnela et al., Proc. Nat. Acad. Sci. 104, 7332 (2007)
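Granovetter's second statement can be checked on data through the neighborhood overlap $O_{ij} = n_{ij} / ((k_i - 1) + (k_j - 1) - n_{ij})$ used by Onnela et al. A minimal Python sketch follows. The function name and the adjacency representation (a dict mapping each node to the set of its neighbours) are illustrative choices, not taken from the slides, and returning 0 when the denominator vanishes is an assumption.

```python
def neighbourhood_overlap(adj, i, j):
    """Overlap O_ij of the neighbourhoods of two linked nodes i and j.

    adj maps each node to the set of its neighbours (hypothetical layout).
    """
    n_ij = len(adj[i] & adj[j])  # number of common neighbours
    # Neighbours of i or j other than i and j themselves.
    denom = (len(adj[i]) - 1) + (len(adj[j]) - 1) - n_ij
    # Assumption: define the overlap as 0 when i and j have no other neighbours.
    return n_ij / denom if denom > 0 else 0.0
```

For a triangle 0-1-2 with an extra node 3 attached to node 0, the link inside the triangle that does not touch node 0 has overlap 1, while the link to the pendant node has overlap 0.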

Overlapping Communities
Real networks often have "community" structure.
- community = "densely connected components"
- extensively studied topic in network science
S. Fortunato, Physics Reports 486, 75–174 (2010)
[Figure: Fortunato, Fig. 1: a simple graph with three communities, enclosed by the dashed circles.]

Usually communities are "overlapping".
[Figure: Ahn et al., Fig. 1: overlapping communities around a single node (university, home and work, family, buildings in same neighborhood), with link types such as kinship, collaboration, friendship, and school.]
"Link community" detection method in Y.-Y. Ahn et al., Nature 466, 761 (2010)

Our goal: Developing a Model
Our goal: to investigate the possibilities for modelling the combination of the Granovetterian structure and overlapping communities. By constructing a model that incorporates basic link-formation processes between individuals, we aim to understand the underlying mechanisms that lead to these structures.

Weighted Social Network (WSN) model
J. M. Kumpula et al., Phys. Rev. Lett. 99, 228701 (2007)
The model is an undirected weighted network of N nodes. The links in the network are updated by the following three rules.
(1) Local Attachment: Node i chooses one of its neighbours j with probability proportional to wij. Then node j chooses one of its neighbours other than i, say k, with probability proportional to wjk. If nodes i and k are not connected, a new link of weight w0 is created between them with probability pΔ. The weights of the links involved are increased by δ.
(2) Global Attachment: With probability pr, node i is connected to a randomly chosen node by a new link of weight w0.
(3) Node Deletion: With probability pd, node i loses all of its links.
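The three update rules can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' simulation code. The function names and the dict-of-dicts adjacency layout are hypothetical. The order in which the rules are applied, the restriction of global attachment to nodes not yet connected to i, and the reinforcement of wij alone when j has no other neighbour are all assumptions of this sketch.

```python
import random

def pick_weighted(weights, rng):
    """Pick a key of `weights` (node -> link weight) with probability proportional to its weight."""
    nodes = list(weights)
    return rng.choices(nodes, weights=[weights[n] for n in nodes], k=1)[0]

def reinforce(adj, u, v, delta):
    """Increase the weight of the undirected link u-v by delta."""
    adj[u][v] += delta
    adj[v][u] += delta

def wsn_update(adj, i, rng, p_delta, p_r, p_d, delta, w0):
    """Apply the three WSN rules once to node i; adj[u][v] == adj[v][u] is the link weight."""
    # (1) Local attachment: weighted walk i -> j -> k, then triadic closure.
    if adj[i]:
        j = pick_weighted(adj[i], rng)
        others = {k: w for k, w in adj[j].items() if k != i}
        reinforce(adj, i, j, delta)  # assumption: reinforced even if j has no other neighbour
        if others:
            k = pick_weighted(others, rng)
            reinforce(adj, j, k, delta)
            if k in adj[i]:
                reinforce(adj, i, k, delta)
            elif rng.random() < p_delta:
                adj[i][k] = adj[k][i] = w0  # new link closing the triangle
    # (2) Global attachment: new link of weight w0 to a random node.
    if rng.random() < p_r:
        candidates = [n for n in adj if n != i and n not in adj[i]]  # assumption: non-neighbours only
        if candidates:
            n = rng.choice(candidates)
            adj[i][n] = adj[n][i] = w0
    # (3) Node deletion: node i loses all of its links.
    if rng.random() < p_d:
        for n in list(adj[i]):
            del adj[n][i]
        adj[i].clear()
```

One sweep of the model then calls `wsn_update` for every node, starting for example from an empty network `adj = {n: {} for n in range(N)}`. The parameter values to use are not given on the slide.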

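Results of WSN model
[Figure: link percolation analysis of the WSN model: relative size of the largest connected component RLCC and susceptibility as functions of the fraction of removed links, for links removed in ascending (asc.) and descending (desc.) order of weight. Δfc marks the gap between the two thresholds.]
Reproduces the Granovetterian structure.
A node tends to belong to a single community.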

Multilayer model: a naive approach
Multiple layers are introduced to generate overlapping communities.
Each layer is constructed independently by a WSN model.
The aggregated network (Layer 1+2), which is what we measure in data, is constructed by summing up the edge weights.
[Figure: Layer 1, Layer 2, and the aggregated network Layer 1+2.]
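The aggregation step amounts to $w_{ij} = \sum_{k=1}^{L} w_{ij}^k$ over the L layers. A minimal Python sketch, assuming each layer is stored as a dict from a node pair to a weight (a hypothetical layout chosen for illustration):

```python
def aggregate_layers(layers):
    """Sum link weights over layers; each layer maps a node pair (i, j) to a weight."""
    total = {}
    for layer in layers:
        for (i, j), w in layer.items():
            key = (min(i, j), max(i, j))  # undirected link: normalise the pair order
            total[key] = total.get(key, 0.0) + w
    return total
```

A link present in only one layer keeps its weight, while a link present in several layers gets the sum of its weights.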

Result of multilayer model
[Figure: link percolation analysis for (a) L=1 (single-layer) and (b) L=2 (double-layer): RLCC and susceptibility as functions of the fraction of removed links f, for links removed in ascending (asc.) and descending (desc.) order of weight.]
Introduction of the second layer destroys the Granovetterian structure.
Combining two independent layers leads to a high level of randomization.
-> Interlayer correlation is needed.

Copy-and-Shuffle Multilayer Model
How does the correlation between layers affect the properties of the network?
Create the second layer by copying the first layer and then shuffling a fraction p of the nodes in the second layer. (Shuffling is just a relabelling of the node indices.)
p = 0 : Layer 1 and Layer 2 are identical
0 < p < 1 : Layer 1 and Layer 2 are correlated
p = 1 : Layer 1 and Layer 2 are uncorrelated
[Figure: Layer 1, Layer 2 (nodes i and j shuffled), and Layer 1+2.]
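The copy-and-shuffle construction can be sketched as a relabelling of node indices. This is a minimal Python illustration, not the authors' code. The function name and the pair-to-weight dict layout are hypothetical. Choosing exactly round(p·N) nodes and permuting them uniformly among themselves is an assumption about how "shuffle" is realised.

```python
import random

def copy_and_shuffle(layer, nodes, p, rng):
    """Copy `layer` (sorted node pair -> weight) and relabel a fraction p of the nodes.

    `nodes` is a list of all node labels.
    """
    n_shuffled = round(p * len(nodes))
    chosen = rng.sample(nodes, n_shuffled)
    permuted = chosen[:]
    rng.shuffle(permuted)  # random permutation of the chosen labels
    relabel = {n: n for n in nodes}
    relabel.update(zip(chosen, permuted))
    # The relabelling is a bijection, so the topology and the weights are preserved.
    return {tuple(sorted((relabel[i], relabel[j]))): w for (i, j), w in layer.items()}
```

With p = 0 the copy is identical to the original layer. For any p the second layer has the same number of links and the same weights, only attached to different node labels.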

Result of Copy-and-Shuffle model
Similarity between the layers assures the Granovetterian structure for p << 1.
However, in the regime where Δfc is significantly larger than zero, the network does not show an enhancement of the overlapping communities.

[Paper excerpt shown on the slide, partly cropped:]
[...] weights: $w_{ij} = \sum_{k=1}^{L} w_{ij}^k$, where $w_{ij}^k$ is the weight of the [...]
To see whether the multi-layer model reproduces a realistic social network of the kind the mobile phone call (MPC) graph is a proxy [5, 6], a link percolation analysis is carried out for the model. We removed a fraction f of the links from the generated networks in both ascending and descending orders, and measured the relative size $R_{LCC}$ of the largest connected component and the normalized susceptibility $\chi = \sum_s n_s s^2 / N$, where $n_s$ is the number of components of size s and the sum is taken over all but the largest component. At the percolation threshold the order parameter $R_{LCC}$ vanishes and $\chi$ diverges in the thermodynamic limit. For finite systems the former quantity shows a fast decay and the latter one a sharp peak at the threshold value $f_c$. The significant difference $\Delta f_c$ in the thresholds for the two sequences of link removal is characteristic of the Granovetterian structure; $\Delta f_c = f_c^d - f_c^a$, where the upper index d (a) stands for the descending (ascending) sequence of removed links.
Figure 1 shows $R_{LCC}$ and $\chi$ as functions of f for a single-layer network (L = 1) and a double-layer network (L = 2). The two plots in each figure show the results for ascending and descending orders. For L = 1 we get $\Delta f_c \approx 0.35$, while for L = 2 the figure shows that the percolation threshold for ascending order $f_c^a$ is not significantly different from that for descending order $f_c^d$ (i.e., $\Delta f_c \approx 0$). The percolation thresholds for L = 2 are approxi[...]

FIG. 1. (Color online) Link percolation analysis for L = 1 (left) and L = 2 (right). The upper figures show the relative size of the largest connected component, $R_{LCC}$, as a function of the fraction of the removed links f. The lower figures show the susceptibility $\chi$. Red solid (green dashed) lines correspond to the case when links are removed in ascending (descending) order of the link weights. The error bars show standard errors.
FIG. 2. (Color) Snapshots of the copy-and-shuffle model with different values of the shuffling parameter p and N = 300: (a) p=0, (b) p=0.01, (c) p=0.1, (d) p=1. Red (blue) links are in the first (second) layer, and green links are in both layers.
FIG. 3. (Color online) Percolation thresholds for various shuffle fraction values p for the copy-and-shuffle model. The green upper and red lower lines denote the critical points $f_c^d$ and $f_c^a$, respectively. The critical points are determined by the peak of the susceptibility. The points are calculated for 50 independent runs. The blue dashed line is calculated using Eq. (3).
FIG. 4. (Color online) Characteristic quantities for the copy-and-shuffle multi-layer WSN model. The difference $\Delta f_c$ between the percolation thresholds decreases with the shuffling probability p, while $c/c_0$, the ratio of the average number of communities a node belongs to at parameter value p and [...]
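The link percolation analysis described above (remove a fraction f of the links in ascending or descending order of weight, then measure the relative size of the largest connected component and the susceptibility, the sum of s² over all components but the largest, divided by N) can be sketched in Python with a union-find structure. This is a minimal illustration rather than the authors' code. The function names and the `(i, j, weight)` link layout are hypothetical. Recomputing the components from scratch for every f and counting isolated nodes as components of size 1 are simplifying assumptions.

```python
from collections import Counter

def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def percolation_point(n_nodes, links):
    """Return (R_LCC, chi) for the network made of the node pairs in `links`."""
    parent = list(range(n_nodes))
    for i, j in links:
        ri, rj = find(parent, i), find(parent, j)
        if ri != rj:
            parent[ri] = rj
    sizes = sorted(Counter(find(parent, x) for x in range(n_nodes)).values(), reverse=True)
    r_lcc = sizes[0] / n_nodes
    chi = sum(s * s for s in sizes[1:]) / n_nodes  # all components but the largest
    return r_lcc, chi

def link_percolation(n_nodes, weighted_links, ascending, fractions):
    """For each f, remove the first fraction f of links in weight order and measure (R_LCC, chi)."""
    ordered = sorted(weighted_links, key=lambda e: e[2], reverse=not ascending)
    results = []
    for f in fractions:
        n_removed = round(f * len(ordered))
        kept = [(i, j) for i, j, _ in ordered[n_removed:]]
        results.append((f, *percolation_point(n_nodes, kept)))
    return results

def critical_point(results):
    """Estimate f_c as the position of the susceptibility peak."""
    return max(results, key=lambda r: r[2])[0]
```

Running `link_percolation` once with `ascending=True` and once with `ascending=False` and subtracting the two `critical_point` estimates gives a finite-size estimate of Δfc. Two strongly linked triangles joined by one weak bridge illustrate the effect: removing the weakest link first splits the network immediately, while removing a strong link first leaves it connected.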

Geographic Multilayer WSN model
A model embedded into a two-dimensional geographic space.
Probability for Global Attachment ∝ r^(-α)
α = 0 : Non-geographic
α > 0 : Geographic
Geographic positions are fixed and shared by both layers. → Interlayer correlation.
[Figure: snapshots for (a) α=0, (b) α=2, (c) α=4, (d) α=6; percolation thresholds fc (ascending, descending) and k / L as functions of α for L=1 and L=2.]

[Paper excerpt shown on the slide, partly cropped:]
[...] model we presented in the previous Section. When α is larger, the nodes tend to be connected with geographically closer nodes, yielding the correlation between the networks in different layers. Since only non-connected pairs are considered, the probability for node i to make a connection with a node j which is not yet connected to i is given by
$p_{ij} = r_{ij}^{-\alpha} / \sum_{k \in S_i} r_{ik}^{-\alpha}$, (4)
where $S_i$ is the set of the nodes not connected to the node i. The other rules such as LA or ND are kept the same as in the original WSN model.
Figure 5 shows the results of the link percolation analysis for the geographic model with α = 0 and 6. Because the network for larger α has a smaller average degree, we used a larger value of pr (0.002) in order to keep the average degree comparable to the results for the non-geographic model (⟨k⟩ = 18.0 for α = 6 and ⟨k⟩ = 27.6 for α = 0). As shown in the figure, the network for α = 6 exhibits a Granovetterian structure as $f_c^a$ and $f_c^d$ are significantly [...]
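The geographic global attachment rule, $p_{ij} = r_{ij}^{-\alpha} / \sum_{k \in S_i} r_{ik}^{-\alpha}$ over the nodes not yet connected to i, can be sketched in Python as below. The function name and data layout are hypothetical. Plain Euclidean distance between distinct positions is an assumption of this sketch, since the treatment of the boundaries is not legible on the slide.

```python
import math

def attachment_probabilities(i, positions, neighbours, alpha):
    """Global-attachment probabilities p_ij for node i.

    positions maps each node to its (x, y) coordinates; neighbours is the set of
    nodes already linked to i. Assumes distinct positions (no zero distances).
    """
    candidates = [j for j in positions if j != i and j not in neighbours]
    if not candidates:
        return {}
    # Unnormalised weight r_ij^(-alpha); alpha = 0 gives uniform attachment.
    weights = [math.dist(positions[i], positions[j]) ** (-alpha) for j in candidates]
    total = sum(weights)
    return {j: w / total for j, w in zip(candidates, weights)}
```

A target can then be drawn with `random.choices(list(probs), weights=list(probs.values()))`. For α = 0 every non-connected node is equally likely, which recovers the non-geographic rule.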

Result of the geographic model
The geographic model shows both the Granovetterian structure and the enhancement of overlapping communities due to the multilayer structure.
FIG. 5. (Color online) Link percolation analysis for (a) α = 0 and (b) α = 6. The upper figures show the relative size of the largest connected component, $R_{LCC}$, as a function of the fraction of the removed links f. The lower figures show the susceptibility $\chi$. Note that the scale of the horizontal axis is different from Fig. 1. Red solid (green dashed) lines correspond to the case when links are removed in ascending (descending) order of the link weight. The results are obtained from 50 independent samples. The error bars show standard errors.
[Figure: Δfc and c/c0 as functions of α.]

Conclusions
• We aimed to model two important properties of social networks:
  • the Granovetter-type weight-topology relation (Δfc > 0)
  • the large amount of overlapping communities (c/c0 > 1)
• The naive multilayer model and the Copy-and-Shuffle model fail to reproduce these required properties.
• The geographic model satisfies both requirements.
Geographic correlations can change the picture drastically in a multilayer model.
Y. Murase et al., Phys. Rev. E 90, 052810 (2014), “Multilayer Weighted Social Network Model”
YouTube: https://www.youtube.com/watch?v=0tUFAnYjXLQ (or search “weighted social network model” on YouTube). You can also find a link to the source code repository of the simulation code there.


Yohsuke Murase
