# Repurpose, Reuse, Recycle the building blocks of Machine Learning

Keynote at the Machine Learning Day @KTH, 17/5/23.

May 19, 2023

## Transcript

1. Repurpose, Reuse, Recycle the building blocks of Machine Learning
Gianmarco De Francisci Morales

Principal Researcher

[email protected]

2. Machine Learning

4. LEGO

6. Today's Plan
Vapnik-Chervonenkis (VC) dimension

From: Statistical learning theory and model selection

To: Approximate frequent subgraph mining

Automatic differentiation

From: Backpropagation for deep learning

To: Learning agent-based models

9. VC dimension

10. 5 reasons to like the VC dimension
First approximation algorithm for frequent subgraph mining

Sampling-based algorithm

Approximation guarantees on frequency

No false negatives, perfect recall

100x faster than the exact algorithm

11. Linear model in 2D
Can shatter 3 points

Cannot shatter 4 points

12. VC dimension definition (HARD!)
Concept from statistical learning theory

Informally: measure of model capacity

𝒟 is a set of elements called points

ℛ ⊆ 2^𝒟 is a family of subsets of 𝒟 called ranges

(𝒟, ℛ) is a range space

The projection of ℛ on D ⊆ 𝒟 is the set of subsets ℛ ∩ D := {h ∩ D ∣ h ∈ ℛ}

D is shattered by ℛ if its projection contains all the subsets of D: |ℛ ∩ D| = 2^|D|

The VC dimension d of (𝒟, ℛ) is the largest cardinality of a set that is shattered by ℛ

18. Example: Intervals
Let 𝒟 be the elements of ℤ

Let ℛ = {[a, b] ∩ ℤ : a ≤ b} be the set of discrete intervals in 𝒟

Shattering a set of two elements of 𝒟 is easy

Impossible to shatter a set of three elements {c, d, e} with c < d < e:
no range R ∈ ℛ s.t. R ∩ {c, d, e} = {c, e}

VC dimension of this (𝒟, ℛ) = 2
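As an aside, here is a small brute-force check of this example (mine, not from the talk; it assumes a tiny domain so the exhaustive search stays cheap):

```python
from itertools import combinations

def powerset(points):
    """All subsets of a collection of points, as frozensets."""
    pts = list(points)
    return [frozenset(c) for r in range(len(pts) + 1) for c in combinations(pts, r)]

def interval_ranges(domain):
    """Ranges: discrete intervals [a, b] intersected with the domain."""
    return [frozenset(x for x in domain if a <= x <= b)
            for a in domain for b in domain if a <= b]

def is_shattered(D, ranges):
    """D is shattered if every subset of D appears in the projection of the ranges on D."""
    projection = {R & D for R in ranges}
    return all(S in projection for S in powerset(D))

def vc_dimension(domain, ranges):
    """Largest cardinality of a subset of the domain shattered by the ranges."""
    best = 0
    for k in range(1, len(domain) + 1):
        if any(is_shattered(frozenset(D), ranges) for D in combinations(domain, k)):
            best = k
    return best

domain = range(8)
print(vc_dimension(domain, interval_ranges(domain)))  # prints 2, as claimed on the slide
```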

25. VC dimension in ML
Pr[ test error ≤ training error + √( (1/N) ( d (log(2N/d) + 1) − log(δ/4) ) ) ] = 1 − δ

27. VC dimension for data analysis
Dataset = Sample

How good an approximation can we get from a sample?

"When analyzing a random sample of size N, with probability 1 − δ, the results are within an ε factor of the true results"

32. ε-sample and VC dimension
ε-sample for (𝒟, ℛ): for ε ∈ (0,1), a subset A ⊆ 𝒟 s.t.

| |R ∩ 𝒟| / |𝒟| − |R ∩ A| / |A| | ≤ ε, for every R ∈ ℛ

Let (𝒟, ℛ) be a range space with VC dimension d

A random sample of size N = 𝒪( (1/ε²) (d + log(1/δ)) )

is an ε-sample for (𝒟, ℛ) with probability 1 − δ
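As a back-of-the-envelope illustration (not from the talk), the sample-size bound can be turned into a small calculator; the universal constant `c` is left as a parameter because its exact value depends on which version of the theorem is used:

```python
import math

def epsilon_sample_size(eps: float, delta: float, d: int, c: float = 0.5) -> int:
    """N = O((1/eps^2) * (d + log(1/delta))): size sufficient for an eps-sample
    of a range space with VC dimension d, up to the unspecified constant c."""
    return math.ceil((c / eps ** 2) * (d + math.log(1.0 / delta)))

# e.g., the intervals example (d = 2) with 1% error and 99% confidence
print(epsilon_sample_size(eps=0.01, delta=0.01, d=2))
```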

37. Example applications
Betweenness Centrality

Clustering Coefficient

Set Cover

Frequent Itemset Mining

38. Graph Pattern Mining

40. Patterns and orbits (HARD!)
Pattern: connected labeled graph

Pattern equality: isomorphism

Automorphism: isomorphism of a pattern to itself

Orbit: subset of a pattern's vertices mapped to each other by automorphisms

[Figure: two example patterns and their orbits; colors represent vertex labels. In the pattern on the left, v1 and v2 belong to the same orbit; on the right, each vertex is in its own orbit.]

45. Frequency of a pattern
[Figure: an example graph and two patterns, with frequencies 1 and 4.]

Not anti-monotone!

51. Minimum Node-based Image (MNI)
[Figure: example graph with vertices V1, ..., V5 and a pattern with two orbits.]

Image sets of the orbits: {V1} and {V2, V3, V4, V5}

MNI frequency: min(1, 4) = 1

Anti-monotone!

60. Relative MNI frequency
Z_V(q) = image set of orbit q of pattern P on V

Relative MNI frequency of pattern P in graph G = (V, E):

f_V(P) = min_{q ∈ P} |Z_V(q)| / |V|
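A minimal sketch of this definition (mine, not from the talk), assuming the image sets Z_V(q) of the pattern's orbits have already been computed:

```python
def relative_mni_frequency(orbit_images: dict, num_vertices: int) -> float:
    """f_V(P) = min over orbits q of P of |Z_V(q)| / |V|.
    orbit_images maps each orbit of the pattern to its image set (a set of graph vertices)."""
    return min(len(image) for image in orbit_images.values()) / num_vertices

# Example from the MNI slide: image sets {V1} and {V2, V3, V4, V5} in a 5-vertex graph
print(relative_mni_frequency({"q1": {"V1"}, "q2": {"V2", "V3", "V4", "V5"}}, 5))  # 0.2
```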

61. Approx. Frequent Subgraph Mining
Given threshold τ, sample S of vertices

With probability at least 1 − δ

For every pattern P with f_V(P) ≥ τ

Find (P, ε_P) s.t. | f_V(P) − f_S(P) | = | |Z_V(q)| / |V| − |Z_S(q)| / |S| | ≤ ε_P

ε-sample: | |R ∩ 𝒟| / |𝒟| − |R ∩ A| / |A| | ≤ ε

70. Empirical VC dimension for FSG
Ranges: image sets of the orbits of frequent patterns

Use the range space (V, R_i) with R_i = {Z_V(q) : q is an orbit of P with f_V(P) ≥ τ}

δ ∈ (0,1): acceptable failure probability

S: uniform sample of V of size s

d: upper bound to the VC dimension

With high probability, S is an ε-sample for (V, R_i) for ε = √( (d + log(1/δ)) / (2s) )

74. Pruning
ε-sample guarantee: | |R_i ∩ V| / |V| − |R_i ∩ S| / |S| | ≤ ε_i

Given that we can bound the error on every orbit, we can bound the error on its minimum:

f_V(P_i) − f_S(P_i) ≤ ε_i  ⟹  f_S(P_i) ≥ f_V(P_i) − ε_i ≥ τ − ε_i

Lower bound on the frequency of a frequent pattern in the sample
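A small sketch (not from the talk) of how these two quantities fit together; the √((d + log(1/δ)) / (2s)) form of ε_i is my reading of the bound on the previous slide and should be checked against the MaNIACS paper:

```python
import math

def epsilon_i(d_upper_bound: float, delta: float, sample_size: int) -> float:
    """Error guarantee for one level, from the empirical VC dimension bound."""
    return math.sqrt((d_upper_bound + math.log(1.0 / delta)) / (2 * sample_size))

def may_be_frequent(f_sample: float, tau: float, eps: float) -> bool:
    """Keep a pattern only if its sample frequency can still reach the threshold tau."""
    return f_sample >= tau - eps

eps = epsilon_i(d_upper_bound=4, delta=0.1, sample_size=2000)
print(round(eps, 3), may_be_frequent(f_sample=0.22, tau=0.25, eps=eps))  # 0.04 True
```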

79. Search space

85. MaNIACS
1) Find the image sets Z_S(q) of the orbits of unpruned patterns with i vertices

2) Use them to compute an upper bound to the VC dimension of (V, R_i)

3) Compute ε_i such that S is an ε_i-sample for (V, R_i)

4) Prune patterns that cannot be frequent, using the lower bound f_S(P_i) ≥ τ − ε_i

5) Extend unpruned patterns to get candidate patterns with i + 1 vertices
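A highly simplified Python sketch of this level-wise loop (my paraphrase, not the authors' code); `image_sets_on_sample`, `vc_upper_bound`, and `extend` are hypothetical helpers standing in for the actual MaNIACS subroutines:

```python
import math

def maniacs_levelwise(sample, tau, delta, seed_patterns,
                      image_sets_on_sample, vc_upper_bound, extend):
    """Rough sketch of the five steps above, on a fixed vertex sample S."""
    results = {}
    candidates = seed_patterns                  # candidate patterns with i = 1 vertex
    s = len(sample)
    while candidates:
        # 1) image sets Z_S(q) of the orbits of the unpruned candidate patterns
        images = {P: image_sets_on_sample(P, sample) for P in candidates}
        # 2) upper bound d to the VC dimension of (V, R_i)
        d = vc_upper_bound(images)
        # 3) eps_i such that S is an eps_i-sample for (V, R_i)
        eps_i = math.sqrt((d + math.log(1.0 / delta)) / (2 * s))
        # 4) prune: keep a pattern only if f_S(P) >= tau - eps_i
        survivors = []
        for P, orbit_images in images.items():
            f_s = min(len(Z) for Z in orbit_images.values()) / s
            if f_s >= tau - eps_i:
                results[P] = (f_s, eps_i)
                survivors.append(P)
        # 5) extend unpruned patterns into candidates with i + 1 vertices
        candidates = extend(survivors)
    return results
```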

86. Results
First sampling-based algorithm

Approximation guarantees on computed frequency

No false negatives

[Plots: running time (s) vs. minimum frequency threshold τ for the sampled (α = 1, α = 0.8) and exact algorithms; maximum absolute error (MaxAE) and its bound vs. sample size.]

89. Automatic Differentiation

90. Autodiff
Set of techniques to evaluate the partial derivatives of a function specified by a computer program

Chain rule to break up complex expressions:

∂f(g(x))/∂x = (∂f/∂g) (∂g/∂x)

Popularized in ML by backpropagation for neural networks and deep learning

Different from numerical and symbolic differentiation

91. Alternatives
Numerical: ∂f(x)/∂x_i ≈ ( f(x + h·e_i) − f(x) ) / h, for small h

Slow (need to evaluate each dimension) and errors due to rounding

Symbolic: input = computation graph, output = symbolic derivative

Example: Mathematica

Slow (search and apply rules) and large intermediate state
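A quick illustration (not from the talk) of the rounding problem with finite differences, using sin(x), whose true derivative is cos(x); the step sizes are arbitrary:

```python
import math

def forward_difference(f, x, h):
    """Finite-difference approximation of f'(x); accuracy degrades once h gets too small."""
    return (f(x + h) - f(x)) / h

x = 1.0
exact = math.cos(x)
for h in (1e-2, 1e-6, 1e-10, 1e-14):
    approx = forward_difference(math.sin, x, h)
    print(f"h={h:.0e}  abs error={abs(approx - exact):.1e}")
```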

97. Computational graph

98. Forward/Reverse mode

99. Example
Create a computation graph for the gradient computation of

σ = 1 / (1 + e^{−(w0·x0 + w1·x1 + w2)})

(nodes: ∗, +, ×(−1), e^x, +1, 1/x)

Walk the graph backwards, applying the local derivative of each node:

f(x) = 1/x  ⟹  ∂f/∂x = −1/x²

f(x) = x + 1  ⟹  ∂f/∂x = 1

f(x) = e^x  ⟹  ∂f/∂x = e^x

f(x, a) = x·a  ⟹  ∂f/∂a = x
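To make the backward walk concrete, here is a minimal reverse-mode autodiff sketch (mine, not from the talk); it implements only the operations needed for the sigmoid example above, and the input values are arbitrary:

```python
import math

class Var:
    """A scalar that records how it was computed, for reverse-mode autodiff."""
    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        self.parents = parents                  # pairs (parent_var, local_derivative)

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value, [(self, other.value), (other, self.value)])

    def exp(self):
        e = math.exp(self.value)
        return Var(e, [(self, e)])                                      # d(e^x)/dx = e^x

    def reciprocal(self):
        return Var(1.0 / self.value, [(self, -1.0 / self.value ** 2)])  # d(1/x)/dx = -1/x^2

    def backward(self, seed=1.0):
        """Chain rule: push the upstream gradient to every parent."""
        self.grad += seed
        for parent, local in self.parents:
            parent.backward(seed * local)

# sigma = 1 / (1 + exp(-(w0*x0 + w1*x1 + w2)))
w0, x0, w1, x1, w2 = Var(2.0), Var(-1.0), Var(-3.0), Var(-2.0), Var(-3.0)
z = w0 * x0 + w1 * x1 + w2
sigma = ((z * Var(-1.0)).exp() + 1.0).reciprocal()
sigma.backward()
print(round(sigma.value, 2), round(w0.grad, 2), round(w1.grad, 2))  # 0.73 -0.2 -0.39
```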

105. Libraries

106. A few highlights
Example applications:

Machine Learning and neural networks (e.g., TensorFlow, specialized for ML)

Learning protein structure (e.g., AlphaFold)

Many-body Schrödinger equation (e.g., FermiNet)

Stellarator coil design

Differentiable ray tracing

Model uncertainty & sensitivity

Optimization of fluid simulations

Many more...

108. Agent-based model
Evolution over time of a system of autonomous agents

Mechanistic and causal model of behavior

Encodes sociological assumptions

Agents interact according to predefined rules

Agents are simulated to draw conclusions

109. Example: Schelling's segregation
2 types of agents: R and B

Satisfaction S_i: number of neighbors of the same color

Homophily parameter τ

If S_i < τ → relocate
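A bare-bones sketch of this model (not from the talk; the grid topology, relocation to a random empty cell, and treating the threshold τ as a count of same-color neighbors are my assumptions):

```python
import random

def neighbors(grid, r, c):
    """The 8 surrounding cells, with wrap-around at the borders."""
    n = len(grid)
    return [grid[(r + dr) % n][(c + dc) % n]
            for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]

def schelling_step(grid, tau):
    """One sweep: every agent with fewer than tau same-color neighbors relocates."""
    n = len(grid)
    for r in range(n):
        for c in range(n):
            agent = grid[r][c]
            if agent is None:
                continue
            satisfaction = sum(1 for x in neighbors(grid, r, c) if x == agent)
            if satisfaction < tau:
                empties = [(i, j) for i in range(n) for j in range(n) if grid[i][j] is None]
                if empties:
                    i, j = random.choice(empties)
                    grid[i][j], grid[r][c] = agent, None

random.seed(0)
# 10x10 grid with R agents, B agents, and empty cells
grid = [[random.choice(["R", "B", None]) for _ in range(10)] for _ in range(10)]
for _ in range(20):
    schelling_step(grid, tau=3)
```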

ABM is "theory development tool"

Some people use it as forecasting tool

Calibration of parameters: run simulations with different parameters until
model is able to reproduce summary statistics of data

Manual, expensive, and error-prone process
40

112. Can we do better?
Yes!

Rewrite the ABM as a Probabilistic Generative Model

Write the likelihood of the parameters given the data, ℒ(Θ|X)

Maximize it via automatic differentiation:

Θ̂ = arg max_Θ ℒ(Θ|X)

117. Opinion dynamics
How people's beliefs evolve

Echo Chambers

Data from Social Media

119. Bounded Confidence Model
Opinion x_u ∈ [−1, 1]

Each time agents interact, they get closer if they are closer than ϵ+

Positive interaction

121. Repulsive behavior
Can interactions backfire?

Each time agents interact, they get further away if they were further than ϵ−

Negative interaction
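A sketch of one possible update rule combining the two mechanisms (the slides do not give the exact equations, so the step size μ and the linear update form are my assumptions):

```python
def interact(x_u, x_v, eps_plus, eps_minus, mu=0.1):
    """One pairwise interaction in a bounded-confidence model with repulsion:
    opinions move closer below eps_plus, further apart above eps_minus."""
    d = x_v - x_u
    if abs(d) < eps_plus:       # positive interaction: attraction
        x_u, x_v = x_u + mu * d, x_v - mu * d
    elif abs(d) > eps_minus:    # negative interaction: repulsion
        x_u, x_v = x_u - mu * d, x_v + mu * d
    clip = lambda x: max(-1.0, min(1.0, x))     # opinions stay in [-1, 1]
    return clip(x_u), clip(x_v)

print(interact(0.2, 0.3, eps_plus=0.4, eps_minus=0.6))    # attraction
print(interact(-0.8, 0.9, eps_plus=0.4, eps_minus=0.6))   # repulsion
```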

123. Opinion Trajectories
Parameter values encode different assumptions and determine significantly different latent trajectories

[Figure: examples of synthetic opinion trajectories over time for different parameter combinations, e.g. (ϵ+, ϵ−) = (0.6, 1.2), (0.4, 0.6), (1.2, 1.6), (0.2, 1.6).]

124. Rewrite as probabilistic model
Replace the step function with a smooth version (sigmoid)

Opinion distance: |x_u − x_v| > ϵ− ⟹ S(u, v) = −1

Likelihood: P((u, v) ∈ E ∣ S(u, v) = −1) ∝ σ(|x_u − x_v| − ϵ−)
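A sketch (mine, not the paper's code) of how this smoothed term becomes differentiable, so that an autodiff framework can optimize the latent opinions; PyTorch is used here purely as an example of such a framework:

```python
import torch

# latent opinions of two users, to be learned
x = torch.tensor([0.3, -0.5], requires_grad=True)
eps_minus = torch.tensor(0.6)

def p_negative_edge(x_u, x_v, eps_minus):
    """sigma(|x_u - x_v| - eps_minus): smooth stand-in for the step |x_u - x_v| > eps_minus."""
    return torch.sigmoid(torch.abs(x_u - x_v) - eps_minus)

# one gradient step on the negative log-likelihood of a single observed negative edge
loss = -torch.log(p_negative_edge(x[0], x[1], eps_minus))
loss.backward()
with torch.no_grad():
    x -= 0.1 * x.grad
print(x)
```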

125. Learning from data
Assume we see the presence of interactions

But signs are latent

And opinions of users are latent

Can we learn the dynamics and parameters of the system?

126. Learning problem
[Figure: plate notation of the translated model, with latent opinions x_t of the users at time t (x_0 is the initial condition), edge signs s, and observed interactions (u, v) over T time steps.]

Given observable interactions G = (V, E), find:

opinions for nodes in time, x_t : V × {0, …, T} → [−1, 1]

and the sign of each edge, s : E → {−, +}

with maximum likelihood

Use EM and gradient descent via automatic differentiation

127. Reconstructing synthetic data
[Figure: estimated vs. true x_0 and estimated vs. true x_t.]

128. Recovering parameters
[Figures: parameter recovery on the synthetic data traces generated in each scenario.]

130. Real data: Reddit
Estimate the position of users and subreddits in opinion space

Larger estimated distance of a user from a subreddit → lower score of the user on that subreddit

132. Call to Action
Machine Learning is a treasure trove of interesting building blocks

VC dimension for approximation algorithms

Automatic differentiation for agent-based models

Repurpose them for your own goals

Be curious, be bold: hack and invent!

133. G. Preti, G. De Francisci Morales, M. Riondato. "MaNIACS: Approximate Mining of Frequent Subgraph Patterns through Sampling." KDD 2021 + ACM TIST 2023.

C. Monti, G. De Francisci Morales, F. Bonchi. "Learning Opinion Dynamics From Social Traces." KDD 2020.

C. Monti, M. Pangallo, G. De Francisci Morales, F. Bonchi. "On Learning Agent-Based Models from Data." SciRep 2022 (accepted) + arXiv:2205.05052.

[email protected] · https://gdfm.me · @gdfm7