# Random Graph Model for Structural Analysis of Online Communications

Ivan Sukharev and Maria Ivanova

International Conference on Software Testing, Machine Learning and Complex Process Analysis (TMPA-2019)
7-9 November 2019, Tbilisi

TMPA Conference website https://tmpaconf.org/

## Exactpro

November 08, 2019

## Transcript

1. ### Random Graph Model for Structural Analysis of Online Communications TMPA-2019

Maria Ivanova, Ivan Sukharev. National Research University Higher School of Economics (Moscow). November 8, 2019
2. ### Overview

    1. State of a problem
    2. Graph Definition
    3. Related works
    4. Adaptation of the Barabási–Albert Growth Model
    5. New Random Graph Model
    6. Model Fitting
3. ### State of a problem

Some problems:

    1. Processing costs
    2. Disclosure of personal information
    3. Statistical reliability
4. ### Graph Definition

Let $V_n$ be a set of vertices: $V_n = \{1, \dots, n\}$. Then the set of all possible edges $E_n$ for $V_n$ is $E_n = \{\{i, j\} \mid i, j \in V_n,\ i \neq j\}$. A graph is an ordered pair $G := (V_n, E)$, where $E \subset E_n$.
5. ### Related works: Erdős–Rényi Model

The graph generation process consists of constructing a set of edges $E$ for a given set of vertices $V_n$. Each edge $e_{ij} \in E_n$ is included in the edge set $E$ of the random graph independently with probability $p \in [0, 1]$.
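
The generation process described on this slide can be sketched in a few lines of Python. This is only an illustration, not the authors' code; the function name `erdos_renyi` and the `seed` parameter are assumptions introduced here.

```python
import itertools
import random


def erdos_renyi(n, p, seed=None):
    """Return the edge set E of a random graph on V_n = {1, ..., n}."""
    rng = random.Random(seed)
    # E_n: all unordered pairs {i, j} with i != j, stored as tuples (i, j), i < j.
    all_pairs = itertools.combinations(range(1, n + 1), 2)
    # Keep each candidate edge independently with probability p.
    return {pair for pair in all_pairs if rng.random() < p}
```

With `p = 1.0` every pair in $E_n$ is kept (the complete graph), and with `p = 0.0` the edge set is empty.
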
6. ### Related works: Scale-free network

A scale-free network is a graph whose vertex degree distribution follows a power law, at least asymptotically. That is, for large $k$, the probability of a vertex having $k$ edges is proportional to $k^{-\gamma}$:

$$P(k) \sim k^{-\gamma} \tag{1}$$
7. ### Related works: Barabási–Albert Growth Model

A new vertex $v_{n+1}$ is added. Then, with probability $p_i$, an edge is created between the new vertex and the $i$-th vertex, where $p_i$ is given by the following formula:

$$p_i = \frac{\deg(v_i)}{\sum_{j=1}^{n} \deg(v_j)} \tag{2}$$
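
As a rough illustration of equation (2), the sketch below computes the attachment probabilities and grows a tree in which each new vertex attaches to exactly one existing vertex. The single-edge-per-vertex variant, the starting graph (one edge between vertices 0 and 1), and all function names are assumptions made for this example, not details taken from the slides.

```python
import random


def attachment_probabilities(degrees):
    """Equation (2): p_i = deg(v_i) / sum_j deg(v_j)."""
    total = sum(degrees)
    return [d / total for d in degrees]


def barabasi_albert_tree(n, seed=None):
    """Grow a tree on n >= 2 vertices by preferential attachment."""
    rng = random.Random(seed)
    edges = [(0, 1)]  # assumed starting graph: a single edge
    degrees = [1, 1]
    for new in range(2, n):
        # Choose the neighbour with probability proportional to its degree.
        target = rng.choices(range(new), weights=degrees)[0]
        edges.append((target, new))
        degrees[target] += 1
        degrees.append(1)
    return edges, degrees
```

For example, `attachment_probabilities([1, 1, 2])` gives `[0.25, 0.25, 0.5]`: the vertex with twice the degree is twice as likely to receive the new edge.
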
8. ### Data

We obtained 56,003 articles with comments and constructed the comment graph for each of them. First-level comments make up 24% of all comments. $C_i$ denotes a comment of the $i$-th level.
9. ### Adaptation of the Barabási–Albert Growth Model

The probability of choosing a node is directly proportional to the number of edges attached to it. We add a parameter $k$ to the degree of the root node, thereby increasing the likelihood that a new vertex joins the root (the article) rather than a comment.
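
One possible reading of this adaptation is sketched below: the root's attachment weight is its degree plus $k$, while every comment keeps its plain degree. Labelling the article as vertex 0, requiring $k > 0$ so that the first comment has somewhere to attach, and the function name `grow_comment_tree` are all assumptions of this sketch.

```python
import random


def grow_comment_tree(n_comments, k, seed=None):
    """Return {comment: parent} for comments 1..n_comments; vertex 0 is the article."""
    rng = random.Random(seed)
    parent = {}
    degrees = [0]  # degree of the root before any comment arrives
    for new in range(1, n_comments + 1):
        # Root weight is deg(root) + k; comments are weighted by degree alone.
        weights = [degrees[0] + k] + degrees[1:]
        target = rng.choices(range(new), weights=weights)[0]
        parent[new] = target
        degrees[target] += 1
        degrees.append(1)
    return parent
```

Larger values of `k` shift probability mass toward the root, so more comments end up at the first level.
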
10. ### New Random Graph Model

Growth algorithm:

    1. With probability $p$, a new vertex joins the root of the tree, that is, the article itself. Its weight is given by the function $\varphi$, which indicates how much interest this message attracts from other users.
    2. With probability $1 - p$, a new vertex joins a randomly chosen vertex other than the root. The probability of joining each vertex is proportional to its weight. The new vertex takes a share $\lambda$ of the weight of the vertex to which it attaches.
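
The growth algorithm can be simulated roughly as follows. Several details are assumptions of this sketch rather than statements from the slides: the article is vertex 0, a new vertex receives `lam` times its parent's weight while the parent keeps the rest, a new vertex falls back to the root when no comment has positive weight, and `draw_phi` is a hypothetical callable that samples a first-level weight.

```python
import random


def grow_discussion(n_comments, p, lam, draw_phi, seed=None):
    """Return (parent, weight) dicts for comments 1..n_comments; 0 is the article."""
    rng = random.Random(seed)
    parent = {}
    weight = {}
    for new in range(1, n_comments + 1):
        # Only comments with positive weight can attract replies.
        candidates = [v for v in weight if weight[v] > 0]
        if not candidates or rng.random() < p:
            # Step 1: join the root and draw a fresh weight from phi.
            parent[new] = 0
            weight[new] = draw_phi(rng)
        else:
            # Step 2: join a comment with probability proportional to its weight.
            cand_weights = [weight[v] for v in candidates]
            target = rng.choices(candidates, weights=cand_weights)[0]
            parent[new] = target
            # The new vertex takes the share lam of the parent's weight.
            weight[new] = lam * weight[target]
            weight[target] -= weight[new]
    return parent, weight
```

A call such as `grow_discussion(100, 0.24, 0.7629, lambda rng: rng.expovariate(1.0))` uses the parameter values reported in the talk; the exponential distribution for `draw_phi` is purely illustrative.
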
11. ### Model fitting: Finding of parameter p

The value of the first parameter $p$ was estimated from the data: 24% of all vertices are neighbors of the article, which gives $p = 0.24$.
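
The estimate amounts to counting the share of comments attached directly to the article. A minimal sketch, assuming each discussion is stored as a `{comment: parent}` dictionary with 0 standing for the article (a hypothetical data format, not the authors'):

```python
def estimate_p(parent_maps):
    """Share of comments, over all discussions, whose parent is the article (0)."""
    first_level = 0
    total = 0
    for tree in parent_maps:
        for par in tree.values():
            total += 1
            if par == 0:
                first_level += 1
    return first_level / total
```

On a made-up discussion `{1: 0, 2: 1, 3: 1, 4: 2}`, only one of the four comments replies to the article, so `estimate_p([{1: 0, 2: 1, 3: 1, 4: 2}])` is `0.25`.
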
12. ### Model fitting: Parameter λ explanation

When $\lambda = 1$, the entire weight of a vertex passes to the next level whenever a new vertex joins it. This produces long chains without branching. When $\lambda = 0$, a branch will not go deeper than the second level, and the degrees of the first-level nodes grow quickly.
13. ### Model fitting: Finding of parameter λ

The vertex that first joins a first-level comment takes a share $\lambda$ of its weight. The estimated value of this parameter is $\lambda = 0.7629$. Since a subtree's weight is proportional to the number of comments it contains at the end of the discussion, the distribution of the random variable $\varphi$ can be found from the number of comments in each subtree.
14. ### Thank you for your attention!

Random Graph Model for Structural Analysis of Online Communications. Maria Ivanova, Ivan Sukharev. National Research University Higher School of Economics (Moscow). November 8, 2019