
# Random Graph Model for Structural Analysis of Online Communications

Ivan Sukharev and Maria Ivanova

International Conference on Software Testing, Machine Learning and Complex Process Analysis (TMPA-2019)
7-9 November 2019, Tbilisi

TMPA Conference website: https://tmpaconf.org/

November 08, 2019

## Transcript

1. Random Graph Model for Structural Analysis
of Online Communications
TMPA-2019
Maria Ivanova
Ivan Sukharev
National Research University
Higher School of Economics (Moscow)
November 8, 2019

2. Overview
1. Problem statement
2. Graph Definition
3. Related works
4. Adaptation of the Barabasi–Albert Growth Model
5. New Random Graph Model
6. Model Fitting
Maria Ivanova Higher School of Economics November 8, 2019 2 / 14

3. Problem statement
Some problems:
1. Processing costs
2. Disclosure of personal
information
3. Statistical reliability

4. Graph Definition
Let $V_n$ be a set of vertices:
$V_n = \{1, \dots, n\}$
Then the set of all possible edges $E_n$ for $V_n$ is as follows:
$E_n = \{\{i, j\} \mid i, j \in V_n,\ i \neq j\}$
A graph is an ordered pair
$G := (V_n, E)$
where $E \subset E_n$.
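
As a minimal sketch of this definition in Python, the set of all possible edges can be enumerated as unordered pairs of distinct vertices. The helper name `all_edges` and the example edge set are assumptions made for illustration and do not come from the talk.

```python
from itertools import combinations

def all_edges(n):
    """Return E_n: every unordered pair {i, j} of distinct vertices of V_n = {1, ..., n}."""
    vertices = range(1, n + 1)
    return {frozenset(pair) for pair in combinations(vertices, 2)}

# A graph on V_4 is any subset E of E_4; this particular E is the path 1-2-3-4.
E4 = all_edges(4)
E = {frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4})}
assert E <= E4      # E is a valid edge set for V_4
print(len(E4))      # 6, i.e. 4 * 3 / 2
```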

5. Related works
Erdos–Renyi Model
The graph is generated by constructing a set of edges $E$ for a
given set of vertices $V_n$. Each edge $e_{ij} \in E_n$ is included
in the edge set $E$ of the random graph with probability $p \in [0, 1]$.
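
The generation process can be sketched in Python as follows. This is an illustration only: the function name `erdos_renyi` and the `seed` parameter are assumptions, and the edges are drawn independently of each other, as in the standard formulation of the model.

```python
import random
from itertools import combinations

def erdos_renyi(n, p, seed=None):
    """Sample an edge set E: each possible edge of E_n is kept with probability p."""
    rng = random.Random(seed)
    edges = set()
    for i, j in combinations(range(1, n + 1), 2):
        if rng.random() < p:   # independent coin flip for the edge {i, j}
            edges.add((i, j))
    return edges

E = erdos_renyi(10, 0.3, seed=42)
print(len(E), "of", 10 * 9 // 2, "possible edges were kept")
```

With `p = 0` the result is the empty graph and with `p = 1` it is the complete graph on the given vertices.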

6. Related works
Scale-free network
A scale-free network is a graph whose vertex degree distribution
follows a power law, at least asymptotically.
Therefore, the probability that a vertex has $k$ edges is, for large
values of $k$, proportional to $k^{-\gamma}$:
$P(k) \sim k^{-\gamma}$ (1)

7. Related works
Barabasi–Albert Growth Model
A new vertex $v_{n+1}$ is added. Then, with probability $p_i$, there is
an edge between the new vertex and the $i$-th vertex, where $p_i$ is
calculated by the following formula:
$p_i = \dfrac{\deg(v_i)}{\sum_{j=1}^{n} \deg(v_j)}$ (2)
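
A small Python sketch of formula (2) and of one growth step is given below. The function names are hypothetical, and the sketch assumes that each new vertex attaches to exactly one existing vertex, which is the case relevant to comment trees; the slide itself does not state how many edges a new vertex creates.

```python
import random

def attachment_probabilities(degrees):
    """Formula (2): p_i = deg(v_i) / sum_j deg(v_j) over the existing vertices."""
    total = sum(degrees)
    return [d / total for d in degrees]

def add_vertex(degrees, rng):
    """Attach one new vertex to a single existing vertex chosen with probability p_i.

    Assumption: exactly one edge per new vertex, so the graph stays a tree.
    Mutates `degrees` in place and returns the index of the chosen vertex.
    """
    target = rng.choices(range(len(degrees)), weights=degrees, k=1)[0]
    degrees[target] += 1   # the chosen vertex gains an edge
    degrees.append(1)      # the new vertex starts with degree 1
    return target

print(attachment_probabilities([1, 1, 2]))  # [0.25, 0.25, 0.5]

degrees = [1, 1]           # start from two vertices joined by one edge
rng = random.Random(0)
for _ in range(5):
    add_vertex(degrees, rng)
```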

8. Data
We obtained 56003 articles and constructed the comment
graph for each of them.
24% of all comments are first-level comments.
$C_i$ denotes a comment of the $i$-th level.

9. Adaptation of the Barabasi–Albert Growth Model
The probability of choosing a node is directly proportional to
the number of edges attached to it. We add a parameter k to the
degree of the root node, which increases the likelihood that a new
comment attaches to the root (the article) rather than to an
existing comment.
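
The effect of the parameter k can be illustrated with a short Python sketch. The function name `adapted_probabilities`, the convention that the root has index 0, and the normalization by the sum of the adjusted degrees are assumptions; the slide only states that k is added to the degree of the root node.

```python
def adapted_probabilities(degrees, k, root=0):
    """Attachment probabilities when a bonus k is added to the root's degree.

    Assumption: the adjusted degrees are normalized by their sum, as in the
    plain preferential-attachment rule.
    """
    weights = list(degrees)
    weights[root] += k
    total = sum(weights)
    return [w / total for w in weights]

# Root with two first-level comments: degrees are [2, 1, 1].
plain = adapted_probabilities([2, 1, 1], k=0)
boosted = adapted_probabilities([2, 1, 1], k=2)
print(plain[0], boosted[0])  # the root's share rises from 1/2 to 2/3
```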

10. New Random Graph Model
Growth algorithm:
1. With probability p, a new vertex joins the root of the tree,
that is, the article itself. Its weight is given by the
function φ, which indicates how much interest this message
attracts from other users.
2. With probability 1 − p, a new vertex joins a randomly chosen
vertex other than the root of the tree. The probability of
joining each such vertex is proportional to its weight. The new
vertex takes a share λ of the weight of the vertex to which it
is attached.
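
The two steps above can be sketched as a small simulation in Python. Several details are assumptions made for illustration and are not stated on the slide: φ is passed in as a sampling function `phi`, "takes a share λ" is read as moving `lam` times the parent's weight to the new vertex, and a new vertex falls back to the root while no weighted comment exists yet. All names are hypothetical.

```python
import random

def grow_comment_tree(n_comments, p, lam, phi, seed=None):
    """Grow a comment tree; vertex 0 is the root (the article).

    Returns (parent, weight): parent[v] is the vertex that v replies to,
    weight[v] is the current weight of comment v (the root has no weight).
    """
    rng = random.Random(seed)
    parent = {0: None}
    weight = {}
    for v in range(1, n_comments + 1):
        candidates = list(weight)
        total = sum(weight.values())
        if rng.random() < p or total <= 0:
            # Step 1: join the root; the initial weight is drawn from phi.
            parent[v] = 0
            weight[v] = phi(rng)
        else:
            # Step 2: pick a comment with probability proportional to its weight.
            u = rng.choices(candidates,
                            weights=[weight[c] for c in candidates], k=1)[0]
            parent[v] = u
            # Assumed reading: a share lam of u's weight moves to the new vertex.
            weight[v] = lam * weight[u]
            weight[u] -= weight[v]
    return parent, weight

# The exponential distribution below is only a placeholder for phi.
parent, weight = grow_comment_tree(
    1000, p=0.24, lam=0.7629, phi=lambda rng: rng.expovariate(1.0), seed=0)
first_level = sum(1 for v, u in parent.items() if u == 0)
print(first_level / 1000)  # fraction of comments attached to the article
```

Under this reading the total weight of a first-level subtree is conserved as the subtree grows, so a subtree's chance of receiving the next comment depends only on the weight drawn from φ.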

11. Model fitting
Finding the parameter p
The value of the first parameter, p, was estimated from the data:
24% of all vertices are neighbors of the article,
which gives p = 0.24.
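
A minimal Python sketch of this estimate is shown below, assuming that each comment tree is stored as a mapping from a comment to the vertex it replies to and that the percentage is taken over comments only, excluding the article itself. The function name and the data layout are hypothetical and are not taken from the talk.

```python
def estimate_p(trees, root=0):
    """Fraction of comments, pooled over all trees, that reply directly to the article.

    Each tree is a dict {vertex: parent}; the root maps to None.
    """
    first_level = 0
    total = 0
    for parent in trees:
        for v, u in parent.items():
            if u is None:
                continue          # skip the article itself
            total += 1
            if u == root:
                first_level += 1
    return first_level / total

# Toy example: one article with four comments, one of them first-level.
toy = [{0: None, 1: 0, 2: 1, 3: 1, 4: 2}]
print(estimate_p(toy))  # 0.25
```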

12. Model fitting
Explanation of the parameter λ
When λ = 1, the entire weight
of a vertex passes to the next
level whenever a new vertex
joins it. This produces long
chains without branching.
When λ = 0, the tree does not
grow beyond the second level,
and the degrees of the first-level
nodes grow quickly.

13. Model fitting
Finding the parameter λ
The first vertex that joins a
first-level comment takes a share λ
of its weight.
The value of this parameter is
λ = 0.7629.
Since the subtree’s weight is
proportional to the number of
comments at the end of the
discussion, the distribution of
the random variable φ could
be found from the subtree