Slide 1

Slide 1 text

Random Graph Model for Structural Analysis of Online Communications TMPA-2019 Maria Ivanova Ivan Sukharev National Research University Higher School of Economics (Moscow) November 8, 2019

Slide 2

Slide 2 text

Overview 1. State of a problem 2. Graph Definition 3. Related works 4. Adaptation of the Barabási–Albert Growth Model 5. New Random Graph Model 6. Model Fitting Maria Ivanova Higher School of Economics November 8, 2019 2 / 14

Slide 3

Slide 3 text

State of a problem Some problems: 1. Processing costs 2. Disclosure of personal information 3. Statistical reliability

Slide 4

Slide 4 text

Graph Definition Let $V_n$ be a set of vertices: $V_n = \{1, \dots, n\}$. Then the set of all possible edges $E_n$ for $V_n$ is $E_n = \{\{i, j\} \mid i, j \in V_n,\ i \neq j\}$. A graph is an ordered pair $G := (V_n, E)$, where $E \subset E_n$.

Slide 5

Slide 5 text

Related works Erdős–Rényi Model The graph generation process consists of constructing a set of edges $E$ for a given set of vertices $V_n$. Each edge $e_{ij} \in E_n$ is included in the edge set $E$ of the random graph with probability $p \in [0, 1]$, independently of the other edges.
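The Erdős–Rényi construction can be sketched in a few lines of Python. This is a minimal illustration only; the function name `erdos_renyi` and the `seed` argument are assumptions made for the example and do not come from the slides.

```python
import itertools
import random


def erdos_renyi(n, p, seed=None):
    """Return the edge set E of a random graph on vertices 1..n."""
    rng = random.Random(seed)
    edges = set()
    # Enumerate every unordered pair {i, j} with i != j, i.e. the set E_n.
    for i, j in itertools.combinations(range(1, n + 1), 2):
        # Each candidate edge is kept independently with probability p.
        if rng.random() < p:
            edges.add((i, j))
    return edges
```

With `p = 0` the result is always empty and with `p = 1` it is the complete graph on `n` vertices, which is a quick sanity check on the edge-inclusion rule.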

Slide 6

Slide 6 text

Related works Scale-free network A scale-free network is a graph whose vertex degree distribution follows a power law, at least asymptotically. That is, for large values of $k$, the probability that a vertex has $k$ edges is proportional to $k^{-\gamma}$: $P(k) \sim k^{-\gamma} \quad (1)$

Slide 7

Slide 7 text

Related works Barabási–Albert Growth Model A new vertex $v_{n+1}$ is added. Then, with probability $p_i$, an edge is created between the new vertex and the $i$-th vertex, where $p_i$ is calculated by the following formula: $p_i = \frac{\deg(v_i)}{\sum_{j=1}^{n} \deg(v_j)} \quad (2)$
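A single preferential-attachment step following formula (2) might look like the sketch below. It assumes, for simplicity, that each new vertex creates exactly one edge and that growth starts from a single edge between vertices 1 and 2; neither choice is stated on the slide, and the name `barabasi_albert_tree` is hypothetical.

```python
import random


def barabasi_albert_tree(n, seed=None):
    """Grow a graph to n >= 2 vertices, one new edge per new vertex."""
    rng = random.Random(seed)
    # Assumed starting graph: a single edge between vertices 1 and 2.
    degree = {1: 1, 2: 1}
    edges = [(1, 2)]
    for new in range(3, n + 1):
        vertices = list(degree)
        # Formula (2): p_i is proportional to deg(v_i); random.choices
        # normalizes the weights by their sum.
        weights = [degree[v] for v in vertices]
        target = rng.choices(vertices, weights=weights, k=1)[0]
        edges.append((target, new))
        degree[target] += 1
        degree[new] = 1
    return edges, degree
```

Because every added edge raises the degree of its endpoint, early vertices tend to accumulate edges, which is the mechanism behind the heavy-tailed degree distribution.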

Slide 8

Slide 8 text

Data We obtained 56003 articles with comments and constructed the comment graph for each of them. First-level comments make up 24% of the vertices. $C_i$ denotes a comment at the $i$-th level.

Slide 9

Slide 9 text

Adaptation of the Barabási–Albert Growth Model The probability of choosing a node is directly proportional to the number of edges attached to it. We add a parameter $k$ to the root node's degree, which increases the likelihood that a new comment attaches to the root (the article) rather than to an existing comment.
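The adapted rule can be sketched as follows. This is an illustrative reading of the slide, not the authors' implementation: the root (the article) is labeled 0, the slide's parameter k is called `root_bonus` here, and it is assumed to be strictly positive so that the first comment can attach to a root with degree 0.

```python
import random


def adapted_ba_tree(n_comments, root_bonus, seed=None):
    """Grow a comment tree; returns a dict mapping each comment to its parent."""
    rng = random.Random(seed)
    parent = {}
    degree = {0: 0}  # vertex 0 is the article (root)
    for new in range(1, n_comments + 1):
        vertices = list(degree)
        # Attachment weight is the degree, plus the bonus k for the root only.
        weights = [degree[v] + (root_bonus if v == 0 else 0) for v in vertices]
        target = rng.choices(vertices, weights=weights, k=1)[0]
        parent[new] = target
        degree[target] += 1
        degree[new] = 1
    return parent
```

Larger values of `root_bonus` push more comments to the first level, so the parameter controls how flat the resulting tree is.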

Slide 10

Slide 10 text

New Random Graph Model Growth algorithm: 1. With probability $p$, a new vertex joins the root of the tree, that is, the article itself. Its weight is given by the random variable $\varphi$, which indicates how much interest the message attracts from other users. 2. With probability $1 - p$, a new vertex joins a randomly chosen vertex other than the root of the tree. The probability of joining each such vertex is proportional to its weight. The new vertex takes a fraction $\lambda$ of the weight of the vertex to which it attaches.
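The growth algorithm above can be sketched as below. Several details are assumptions made only for this example: the root is labeled 0, "takes a fraction λ" is read as a transfer (the parent loses what the child gains), a vertex attaches to the root whenever no non-root vertex has positive weight, and the default sampler for φ is an exponential placeholder because the slide does not give its distribution. The names `grow_comment_tree` and `sample_phi` are hypothetical.

```python
import random


def grow_comment_tree(n_comments, p, lam, sample_phi=None, seed=None):
    """Return (parent, weight) dicts for a tree grown with parameters p and lam."""
    rng = random.Random(seed)
    if sample_phi is None:
        # Placeholder distribution for phi (an assumption, not from the slides).
        sample_phi = lambda: rng.expovariate(1.0)
    parent = {}
    weight = {}
    for new in range(1, n_comments + 1):
        # Only non-root vertices with positive weight can attract replies.
        candidates = [v for v, w in weight.items() if w > 0]
        if not candidates or rng.random() < p:
            # Step 1: join the root and draw a fresh weight from phi.
            parent[new] = 0
            weight[new] = sample_phi()
        else:
            # Step 2: join a non-root vertex with probability proportional
            # to its weight, then move a fraction lam of that weight.
            w = [weight[v] for v in candidates]
            target = rng.choices(candidates, weights=w, k=1)[0]
            moved = lam * weight[target]
            weight[target] -= moved
            weight[new] = moved
            parent[new] = target
    return parent, weight
```

Under this reading the total weight of a first-level subtree stays equal to the value drawn from φ, since weight is only moved between a parent and its new child.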

Slide 11

Slide 11 text

Model fitting Finding parameter p The first parameter, $p$, was estimated from the data: 24% of all vertices are neighbors of the article, so $p = 0.24$.
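One way to compute such an estimate is sketched here. It assumes each comment tree is stored as a dict mapping a comment to its parent, with the article labeled 0, and that the fraction is pooled over all comments in all trees; this representation and the name `estimate_p` are illustrative choices, not taken from the slides.

```python
def estimate_p(trees):
    """Fraction of comments, pooled over all trees, attached directly to the root."""
    first_level = 0
    total = 0
    for parent in trees:
        total += len(parent)
        # A comment is first-level when its parent is the article (vertex 0).
        first_level += sum(1 for par in parent.values() if par == 0)
    return first_level / total if total else float("nan")
```

For example, a single tree `{1: 0, 2: 1, 3: 1, 4: 0}` has two first-level comments out of four, giving an estimate of 0.5.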

Slide 12

Slide 12 text

Model fitting Parameter λ explanation When λ = 1, the entire weight of a vertex passes to the next level whenever a new vertex attaches to it. This produces long chains without branching. When λ = 0, a branch does not extend beyond the second level, and the degrees of the first-level nodes grow quickly.

Slide 13

Slide 13 text

Model fitting Finding parameter λ The first vertex to join a first-level comment takes a fraction λ of its weight. The value of this parameter is λ = 0.7629. Since a subtree's weight is proportional to the number of comments it contains at the end of the discussion, the distribution of the random variable φ can be estimated from the number of comments in each subtree.

Slide 14

Slide 14 text

Thank you for your attention!