
Repurpose, Reuse, Recycle the building blocks of Machine Learning

Keynote at the Machine Learning Day @KTH, 17/5/23.


Transcript

  1. Repurpose, Reuse, Recycle the
    building blocks of Machine Learning
    Gianmarco De Francisci Morales

    Principal Researcher

    [email protected]
    1


  3. Machine Learning
    2

  5. LEGO
    3

  8. Today's Plan
    Vapnik-Chervonenkis (VC) dimension
    From: Statistical learning theory and model selection
    To: Approximate frequent subgraph mining
    Automatic differentiation
    From: Backpropagation for deep learning
    To: Learning agent-based models
    4

  9. VC dimension
    5


  10. 5 reasons to like the VC dimension
    First approximation algorithm for frequent subgraph mining


    Sampling-based algorithm


    Approximation guarantees on frequency


    No false negatives, perfect recall


    100x faster than exact algorithm
    6


  11. Linear model in 2D
    Can shatter

    3 points
    Cannot shatter

    4 points
    7


  17. VC dimension definition HARD!
    Concept from statistical learning theory
    Informally: measure of model capacity
    𝒟 a set of elements called points
    ℛ ⊆ 2^𝒟 a family of subsets of 𝒟 called ranges
    (𝒟, ℛ) is a range space
    The projection of ℛ on D ⊆ 𝒟 is the set of subsets ℛ ∩ D := {h ∩ D ∣ h ∈ ℛ}
    D is shattered by ℛ if its projection contains all the subsets of D: ℛ ∩ D = 2^D
    The VC dimension d of (𝒟, ℛ) is the largest cardinality of a set that is shattered by ℛ
    8
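    The shattering test in the definition above can be checked by brute force on small instances. A minimal Python sketch (the names is_shattered, D and ranges are illustrative, not from the talk):

    from itertools import chain, combinations

    def is_shattered(D, ranges):
        # D is shattered by the ranges if every subset of D appears as R & D
        # for some range R (the projection contains the full power set of D).
        projection = {frozenset(R & D) for R in ranges}
        subsets = chain.from_iterable(combinations(D, k) for k in range(len(D) + 1))
        return all(frozenset(s) in projection for s in subsets)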

  24. Example: Intervals
    Let the elements of 𝒟 be integers
    Let ℛ = {[a, b] ∩ ℤ : a ≤ b} be the set of discrete intervals in 𝒟
    Shattering a set of two elements of 𝒟 is easy
    Impossible to shatter a set of three elements {c, d, e} with c < d < e
    No range R ∈ ℛ s.t. R ∩ {c, d, e} = {c, e}
    VC dimension of this (𝒟, ℛ) = 2
    9
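    A quick check of the intervals example with the is_shattered sketch above (the concrete point range [-5, 5] is arbitrary, chosen only for illustration):

    # Points are integers, ranges are discrete intervals [a, b] restricted to [-5, 5].
    intervals = [set(range(a, b + 1)) for a in range(-5, 6) for b in range(a, 6)]

    print(is_shattered({1, 3}, intervals))     # True: any pair of points can be shattered
    print(is_shattered({1, 3, 5}, intervals))  # False: no interval yields {1, 5} without 3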

  26. VC dimension in ML
    Pr[ test error ≤ training error + √( (1/N) ( d (log(2N/d) + 1) − log(δ/4) ) ) ] = 1 − δ
    10
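    A small numeric sketch of the bound on the slide above, under the usual reading of the classical VC generalization bound (the inputs are made-up values for illustration):

    import math

    def vc_bound(train_error, N, d, delta):
        # With probability >= 1 - delta:
        # test_error <= train_error + sqrt((d * (log(2N/d) + 1) - log(delta/4)) / N)
        slack = math.sqrt((d * (math.log(2 * N / d) + 1) - math.log(delta / 4)) / N)
        return train_error + slack

    print(vc_bound(train_error=0.05, N=10_000, d=10, delta=0.05))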

  31. VC dimension for data analysis
    Dataset = Sample
    How good an approximation can we get from a sample?
    "When analyzing a random sample of size N, with probability 1 − δ, the results are within an ε factor of the true results"
    Trade-off among sample size, accuracy, and complexity of the task
    11

  36. ε-sample and VC dimension
    ε-sample for (𝒟, ℛ): for ε ∈ (0,1), a subset A ⊆ 𝒟 s.t.
    | |R ∩ 𝒟| / |𝒟| − |R ∩ A| / |A| | ≤ ε, for every R ∈ ℛ
    (𝒟, ℛ) a range space with VC dimension d
    A random sample of size N = 𝒪( (1/ε²) (d + log(1/δ)) )
    is an ε-sample for (𝒟, ℛ) with probability 1 − δ
    12
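    The sample-size bound above hides a universal constant inside the big-O; a minimal sketch, assuming a placeholder constant c (the value 0.5 is not from the talk):

    import math

    def epsilon_sample_size(eps, d, delta, c=0.5):
        # N = (c / eps^2) * (d + log(1/delta)) points suffice for an eps-sample
        # with probability >= 1 - delta; c stands in for the unstated universal constant.
        return math.ceil(c / eps**2 * (d + math.log(1 / delta)))

    print(epsilon_sample_size(eps=0.01, d=4, delta=0.05))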

  37. Example applications
    Betweenness Centrality
    Clustering Coefficient
    Set Cover
    Frequent Itemset Mining
    13

  39. Graph Pattern Mining
    14

  44. Patterns and orbits HARD!
    Pattern: connected labeled graph
    Pattern equality: isomorphism
    Automorphism: isomorphism to itself
    Orbit: subset of pattern vertices mapped to each other by automorphisms
    B_P(v) ≡ {u ∈ V_P : ∃ μ ∈ Aut(P) s.t. μ(u) = v}
    [Fig. 1: examples of patterns and orbits; colors represent vertex labels. In the pattern on the left, v1 and v2 belong to the same orbit O1; in the pattern on the right, each vertex is in a different orbit O1, O2, O3]
    15

  50. Frequency of a pattern
    Graph | Pattern | Frequency
    [Figure: an example graph and two example patterns, with frequencies 1 and 4]
    Not anti-monotone!
    16

  59. Minimum Node-based Image (MNI)
    [Figure: example graph with vertices V1 … V5 and a two-orbit pattern]
    Graph | Pattern | Image | Frequency
    Image {V1} → frequency 1
    Image {V2, V3, V4, V5} → frequency 4
    MNI frequency: min(1, 4) = 1
    Anti-monotone!
    17

  60. Relative MNI frequency
    Z_V(q) = image set of orbit q of pattern P on V
    Relative MNI frequency of pattern P in graph G = (V, E):
    f_V(P) = min_{q ∈ P} { |Z_V(q)| / |V| }
    18
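    A minimal sketch of the relative MNI frequency, assuming the image set of each orbit is already available as a dict (names and the toy instance are illustrative):

    def mni_frequency(image_sets, num_vertices):
        # f_V(P) = min over orbits q of |Z_V(q)| / |V|
        return min(len(z) for z in image_sets.values()) / num_vertices

    # Toy instance mirroring the MNI slide: two orbits with images {V1} and {V2..V5}.
    images = {"q1": {"V1"}, "q2": {"V2", "V3", "V4", "V5"}}
    print(mni_frequency(images, num_vertices=5))  # min(1, 4) / 5 = 0.2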

  69. Approx. Frequent Subgraph Mining
    Given threshold τ, sample S of vertices
    With probability at least 1 − δ
    For every pattern P with f_V(P) ≥ τ
    Find (P, ε_P) s.t. f_V(P) − f_S(P) = |Z_V(q)|/|V| − |Z_S(q)|/|S| ≤ ε_P
    Same form as the ε-sample guarantee: | |R ∩ 𝒟|/|𝒟| − |R ∩ A|/|A| | ≤ ε
    19

  73. Empirical VC dimension for FSG
    R_i = {Z_V(q) : q is an orbit of P with f_V(P) ≥ τ}: image sets of orbits of frequent patterns
    Use range space (V, R_i)
    δ ∈ (0,1) acceptable failure probability
    S uniform sample of V of size s
    d upper bound to the VC dimension
    With high probability S is an ε-sample for (V, R_i) for ε = √( (d + log(1/δ)) / (2s) )
    20
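    The ε on the slide above as a one-liner, assuming the square-root form of the bound (d, delta and s as defined on the slide; the example values are made up):

    import math

    def eps_from_empirical_vc(d, delta, s):
        # eps = sqrt((d + log(1/delta)) / (2 s))
        return math.sqrt((d + math.log(1 / delta)) / (2 * s))

    print(eps_from_empirical_vc(d=3, delta=0.1, s=2_000))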

  78. Pruning
    ε-sample guarantee: | |R_i ∩ V|/|V| − |R_i ∩ S|/|S| | ≤ ε_i
    Given that we can bound the error on every orbit, we can bound the error on its minimum
    f_V(P_i) − f_S(P_i) ≤ ε_i ⟹ f_S(P_i) ≥ f_V(P_i) − ε_i ≥ τ − ε_i
    Lower bound on the frequency of a frequent pattern in the sample
    21
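    The pruning rule above reduces to a one-line test on the sample frequency (a trivial sketch, names illustrative):

    def may_be_frequent(f_sample, tau, eps_i):
        # A pattern can still be frequent only if f_S(P) >= tau - eps_i.
        return f_sample >= tau - eps_i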

  84. Search space
    22

  85. MaNIACS
    1) Find image sets Z_S(q) of the orbits of unpruned patterns with i vertices
    2) Use them to compute an upper bound to the VC dimension of (V, R_i)
    3) Compute ε_i such that S is an ε_i-sample for (V, R_i)
    4) Prune patterns that cannot be frequent with lower bound f_S(P_i) ≥ τ − ε_i
    5) Extend unpruned patterns to get candidate patterns with i + 1 vertices
    23
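    A generic sketch of one level of the loop above. The subroutines for steps 1, 2 and 5 are passed in as functions because they stand in for the actual MaNIACS machinery; only the ε computation and the pruning test are spelled out:

    import math

    def maniacs_level_sketch(patterns, image_sets_fn, vc_bound_fn, extend_fn,
                             sample_size, tau, delta):
        images = {p: image_sets_fn(p) for p in patterns}                # step 1
        d = vc_bound_fn(images)                                         # step 2
        eps = math.sqrt((d + math.log(1 / delta)) / (2 * sample_size))  # step 3
        kept = [p for p in patterns                                     # step 4
                if min(len(z) for z in images[p].values()) / sample_size >= tau - eps]
        return kept, eps, extend_fn(kept)                               # step 5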

  88. Results
    First sampling-based algorithm
    Approximation guarantees on computed frequency
    No false negatives
    [Plots: running time (s) vs. minimum frequency threshold τ, for α = 1, α = 0.8, and the exact algorithm; maximum absolute error (MaxAE) and its bound vs. sample size, for ε_2 … ε_5]
    24

  89. Automatic Differentiation
    25


  90. Autodiff
    Set of techniques to evaluate the partial derivatives of a computer program
    Chain rule to break up complex expressions:
    ∂f(g(x))/∂x = (∂f/∂g) (∂g/∂x)
    Originally created for neural networks and deep learning (backpropagation)
    Different from numerical and symbolic differentiation
    26
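    The chain rule above is exactly what AD applies mechanically. A minimal forward-mode sketch with dual numbers (illustrative, not how any particular library is implemented):

    class Dual:
        # A dual number carries a value and its derivative; operator overloading
        # applies the chain rule at every elementary operation.
        def __init__(self, val, dot=0.0):
            self.val, self.dot = val, dot
        def __add__(self, other):
            other = other if isinstance(other, Dual) else Dual(other)
            return Dual(self.val + other.val, self.dot + other.dot)
        __radd__ = __add__
        def __mul__(self, other):
            other = other if isinstance(other, Dual) else Dual(other)
            return Dual(self.val * other.val,
                        self.dot * other.val + self.val * other.dot)
        __rmul__ = __mul__

    def f(x):
        return 3 * x * x + 2 * x + 1   # df/dx = 6x + 2

    y = f(Dual(2.0, 1.0))              # seed dx/dx = 1
    print(y.val, y.dot)                # 17.0 14.0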

  96. Alternatives
    Numerical: ∂f(x)/∂x_i ≈ (f(x + h e_i) − f(x)) / h for small h
    Slow (need to evaluate each dimension) and errors due to rounding
    Symbolic: Input = computation graph, Output = symbolic derivative
    Example: Mathematica
    Slow (search and apply rules) and large intermediate state
    27
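    The rounding problem of the numerical alternative is easy to see empirically (a tiny sketch; sin is just a convenient test function):

    import math

    def forward_diff(f, x, h):
        # One-sided finite difference: one extra evaluation per input dimension,
        # truncation error dominates for large h, rounding error for tiny h.
        return (f(x + h) - f(x)) / h

    for h in (1e-2, 1e-6, 1e-12):
        print(h, abs(forward_diff(math.sin, 1.0, h) - math.cos(1.0)))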

  97. Computational graph
    28


  98. Forward/Reverse mode
    29


  104. Example
    Automatic Differentiation (autodiff)
    Create the computation graph for gradient computation:
    f = 1 / (1 + e^(−(w0·x0 + w1·x1 + w2)))
    Graph nodes: ×, +, × −1, e^x, +1, 1/x
    Walk the graph backwards, applying the local derivative at each node:
    f(x) = 1/x ⟹ ∂f/∂x = −1/x²
    f(x) = x + 1 ⟹ ∂f/∂x = 1
    f(x) = e^x ⟹ ∂f/∂x = e^x
    f(x, w) = x·w ⟹ ∂f/∂w = x
    35
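    The same example written out as a manual reverse pass; the input values are arbitrary, chosen only so the numbers are easy to check:

    import math

    # Forward pass through the computation graph of f = 1/(1 + exp(-(w0*x0 + w1*x1 + w2)))
    w0, x0, w1, x1, w2 = 2.0, -1.0, -3.0, -2.0, -3.0
    a = w0 * x0
    b = w1 * x1
    c = a + b + w2
    d = -c
    e = math.exp(d)
    g = e + 1.0
    f = 1.0 / g

    # Reverse pass: apply the local derivative rules from the slide, back to front.
    df_dg = -1.0 / g**2           # d(1/x)/dx  = -1/x^2
    df_de = df_dg * 1.0           # d(x+1)/dx  = 1
    df_dd = df_de * math.exp(d)   # d(e^x)/dx  = e^x
    df_dc = df_dd * -1.0          # d(-x)/dx   = -1
    df_dw0 = df_dc * x0           # d(w*x)/dw  = x
    df_dw1 = df_dc * x1
    df_dw2 = df_dc * 1.0
    print(f, df_dw0, df_dw1, df_dw2)  # ~0.73, ~-0.20, ~-0.39, ~0.20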

  105. Libraries
    36


  107. A few highlights
    Example applications:
    Machine Learning (TensorFlow and PyTorch are AD libraries specialized for ML)
    Learning protein structure (e.g., AlphaFold)
    Many-body Schrödinger equation (e.g., FermiNet)
    Stellarator coil design
    Differentiable ray tracing
    Model uncertainty & sensitivity
    Optimization of fluid simulations
    Neural networks, optimization, ray tracing, fluid simulations, and many more...
    37

  108. Agent-based model
    Evolution over time of a system of autonomous agents
    Mechanistic and causal model of behavior
    Encodes sociological assumptions
    Agents interact according to predefined rules
    Agents are simulated to draw conclusions
    38

  110. Example: Schelling's segregation
    2 types of agents: R and B
    Satisfaction S_i: number of neighbors of the same color
    Homophily parameter τ
    If S_i < τ → relocate
    39
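    A minimal sketch of the relocation rule, assuming a dict-based grid and satisfaction normalized to a fraction of same-color neighbors (the slide counts neighbors; the normalization is only for convenience):

    import random

    def schelling_step(grid, tau):
        # grid maps (row, col) -> 'R', 'B', or None (empty cell).
        empty = [c for c, a in grid.items() if a is None]
        for cell, agent in list(grid.items()):
            if agent is None:
                continue
            neigh = [grid.get((cell[0] + dr, cell[1] + dc))
                     for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
            neigh = [a for a in neigh if a is not None]
            satisfaction = sum(a == agent for a in neigh) / max(len(neigh), 1)
            if satisfaction < tau and empty:       # unsatisfied agents relocate
                new_cell = random.choice(empty)
                grid[new_cell], grid[cell] = agent, None
                empty.remove(new_cell)
                empty.append(cell)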

  111. What about data?
    ABM is a "theory development tool"
    Some people use it as a forecasting tool
    Calibration of parameters: run simulations with different parameters until the model reproduces summary statistics of the data
    Manual, expensive, and error-prone process
    40

  116. Can we do better?
    Yes!
    Rewrite the ABM as a Probabilistic Generative Model
    Write the likelihood of parameters given data ℒ(Θ|X)
    Maximize via Automatic Differentiation: Θ̂ = arg max_Θ ℒ(Θ|X)
    41
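    A toy sketch of the "maximize via autodiff" step using PyTorch; the data (interaction distances and positive/negative outcomes) and the single-parameter likelihood are made up, only the mechanics mirror the approach above:

    import torch

    dist = torch.tensor([0.1, 0.2, 0.4, 0.9, 1.3, 1.5])    # |x_u - x_v| per interaction
    y = torch.tensor([1., 1., 1., 0., 0., 0.])              # 1 = positive interaction
    eps_plus = torch.tensor(1.0, requires_grad=True)        # parameter to learn

    def log_likelihood(eps):
        p = torch.sigmoid(10 * (eps - dist))   # smooth version of dist < eps
        return (y * torch.log(p) + (1 - y) * torch.log(1 - p)).sum()

    opt = torch.optim.Adam([eps_plus], lr=0.05)
    for _ in range(300):
        opt.zero_grad()
        loss = -log_likelihood(eps_plus)   # maximize L(Theta | X) = minimize -L
        loss.backward()                    # gradients via automatic differentiation
        opt.step()
    print(eps_plus.item())                 # settles between the positive and negative distances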

  118. Opinion dynamics
    How people's beliefs evolve
    Polarization, Radicalization, Echo Chambers
    Data from Social Media
    42

  120. Bounded Confidence Model
    Opinion x_u ∈ [−1, 1]
    Each time agents interact, they get closer if they are closer than ϵ+
    Positive interaction
    43
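    A minimal sketch of a positive interaction; mu is a convergence rate, an extra parameter not shown on the slide:

    def bc_positive_update(x_u, x_v, eps_plus, mu=0.5):
        # If the opinions are closer than eps_plus, both agents move towards each other.
        if abs(x_u - x_v) < eps_plus:
            x_u, x_v = x_u + mu * (x_v - x_u), x_v + mu * (x_u - x_v)
        return x_u, x_v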

  122. Repulsive behavior
    Can interactions backfire?
    Each time agents interact, they get further away if they were further than ϵ−
    Negative interaction
    44

  123. Opinion Trajectories
    Parameter values encode different assumptions and determine significantly different latent trajectories
    [Figure 4: examples of synthetic opinion trajectories over time for different (ϵ+, ϵ−) settings, e.g. (0.6, 1.2), (0.4, 0.6), (1.2, 1.6), (0.2, 1.6)]
    45

  124. Rewrite as probabilistic model
    Replace the step function with a smooth version (sigmoid)
    Opinion distance: |x_u − x_v| > ϵ− ⟹ S(u, v) = −1
    Likelihood: P((u, v) ∈ E ∣ S(u, v) = −1) ∝ σ(|x_u − x_v| − ϵ−)
    46
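    The smoothed rule above in code (a direct transcription, up to the proportionality constant):

    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    def p_negative_edge(x_u, x_v, eps_minus):
        # Smooth relaxation of |x_u - x_v| > eps_minus  =>  S(u, v) = -1,
        # so the expression is differentiable and can be learned by gradient descent.
        return sigmoid(abs(x_u - x_v) - eps_minus)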

  125. Learning from data
    Assume we observe the presence of interactions
    But the signs are latent
    And the opinions of users are latent
    Can we learn the dynamics and parameters of the system?
    47

  126. Learning problem
    Given observable interactions G = (V, E)
    find: opinions of nodes over time x: V × {0, …, T} → [−1, 1]
    and the sign of each edge s: E → {−, +}
    with maximum likelihood
    Use EM and gradient descent via automatic differentiation
    [Figure 2: plate notation of the model: latent opinions x_t (x_0 is the initial condition), observed interactions (u, v) with latent sign s, over T time steps]
    48

  127. Reconstructing synthetic data
    [Plots: estimated vs. true initial opinions x_0, and estimated vs. true trajectories x_t]
    49

  129. Recovering parameters
    [Figures: recovering the model parameters on the synthetic data traces generated in each (ϵ+, ϵ−) scenario of Figure 4]
    51

  131. Real data: Reddit
    Comment score = upvotes
    Estimate the position of users and subreddits in opinion space
    Larger estimated distance of a user from a subreddit → lower score of the user on that subreddit
    52

  132. Call to Action
    Machine Learning is a treasure trove
    of interesting building blocks


    VC dimension for approximation
    algorithms


    Automatic differentiation for agent-
    based models


    Repurpose it for your own goals


    Be curious, be bold: hack and invent!
    53


  133. G. Preti, G. De Francisci Morales, M. Riondato
    "MaNIACS: Approximate Mining of Frequent Subgraph Patterns through Sampling"
    KDD 2021 + ACM TIST 2023

    C. Monti, G. De Francisci Morales, F. Bonchi
    "Learning Opinion Dynamics From Social Traces"
    KDD 2020

    C. Monti, M. Pangallo, G. De Francisci Morales, F. Bonchi
    "On Learning Agent-Based Models from Data"
    SciRep 2022 (accepted) + arXiv:2205.05052
    54
    [email protected] https://gdfm.me
    @gdfm7