
Probabilistic Numerics 2018 April 11 London


Talk at the Probabilistic Numerics Workshop in London at the Alan Turing Institute

Fred J. Hickernell

April 11, 2018



Transcript

  1. Adaptive Probabilistic Numerical Methods Using Fast Transforms
    Fred J. Hickernell & Jagadees Rathinavel
    Department of Applied Mathematics
    Center for Interdisciplinary Scientific Computation
    Illinois Institute of Technology
    [email protected] mypages.iit.edu/~hickernell
    Thanks to the organizers, the GAIL team, NSF-DMS-1522687 and NSF-DMS-1638521 (SAMSI)
    SAMSI-Lloyds-Turing Workshop on Probabilistic Numerical Methods, April 11, 2018


  2. Introduction MLE Matching Kernels and Designs Examples Summary References
     When Do We Stop?
     Compute the solution to a linear problem:
     $$\mathrm{sol}(f) = \begin{cases} \int_{\mathbb{R}^d} f(\mathbf{x})\,\varrho(\mathbf{x})\,\mathrm{d}\mathbf{x} & \text{Bayesian inference, financial risk, statistical physics, \dots} \\ f(\mathbf{x}) & \text{surrogate modeling for computer experiments, \dots} \\ \vdots & \end{cases}$$
     Desired solution: an adaptive algorithm, $\mathrm{app}(\cdot,\cdot)$, of the form
     $$\mathrm{app}(f,\varepsilon) = w_{0,n} + \sum_{i=1}^{n} w_{i,n}\,f(\mathbf{x}_i),$$
     where $n$, $\{\mathbf{x}_i\}_{i=1}^{\infty}$, $w_{0,n}$, and $\mathbf{w} = (w_{i,n})_{i=1}^{n}$ are chosen to guarantee
     $$|\mathrm{sol}(f) - \mathrm{app}(f,\varepsilon)| \le \varepsilon \quad \text{with high probability, } \forall \varepsilon > 0, \text{ reasonable } f.$$
     2/12


  3. The Probabilistic Numerics Approach
     Assume $f \sim \mathcal{GP}(m, s^2 C_\theta)$, a sample from a Gaussian process. Defining
     $$c = \mathrm{sol}_{\cdot}(\mathrm{sol}_{\cdot\cdot}(C_\theta(\cdot,\cdot\cdot))), \quad \mathbf{c} = \big(\mathrm{sol}_{\cdot}(C_\theta(\cdot,\mathbf{x}_1)), \dots, \mathrm{sol}_{\cdot}(C_\theta(\cdot,\mathbf{x}_n))\big)^T, \quad \mathsf{C} = \big(C_\theta(\mathbf{x}_i,\mathbf{x}_j)\big)_{i,j=1}^{n},$$
     and choosing the weights as
     $$w_0 = m\,[\mathrm{sol}(1) - \mathbf{c}^T\mathsf{C}^{-1}\mathbf{1}], \quad \mathbf{w} = \mathsf{C}^{-1}\mathbf{c}, \quad \mathrm{app}(f,\varepsilon) = w_0 + \mathbf{w}^T\mathbf{f}, \quad \mathbf{f} = \big(f(\mathbf{x}_i)\big)_{i=1}^{n},$$
     yields an unbiased approximation:
     $$\mathrm{sol}(f) - \mathrm{app}(f,\varepsilon) \,\big|\, \mathbf{f} = \mathbf{y} \;\sim\; \mathcal{N}\big(0,\, s^2(c - \mathbf{c}^T\mathsf{C}^{-1}\mathbf{c})\big).$$
     If $n$ is chosen large enough to make
     $$2.58\,s\sqrt{c - \mathbf{c}^T\mathsf{C}^{-1}\mathbf{c}} \le \varepsilon,$$
     then we are assured that
     $$\mathbb{P}_f\big[|\mathrm{sol}(f) - \mathrm{app}(f,\varepsilon)| \le \varepsilon\big] \ge 99\%.$$
     3/12
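The weights and the probabilistic stopping check above can be sketched in a few lines of NumPy. Everything concrete below is an illustrative assumption, not from the talk: the shift-invariant kernel $1 + \theta B_2((x-t) \bmod 1)$ on $[0,1]$, a zero prior mean, fixed $s$ and $\theta$, and a toy integrand. For this kernel, $c = 1$ and $\mathbf{c} = \mathbf{1}$ because $B_2$ integrates to zero over a period.

```python
import numpy as np

def b2(x):
    """Bernoulli polynomial B_2 on [0, 1)."""
    return x**2 - x + 1.0 / 6.0

def kernel(x, t, theta=1.0):
    """Shift-invariant kernel C(x, t) = 1 + theta*B_2((x - t) mod 1) on [0, 1)."""
    return 1.0 + theta * b2((x[:, None] - t[None, :]) % 1.0)

def bayes_cubature(f, x, s=1.0, theta=1.0):
    """Posterior mean and 99% half-width for sol(f) = integral of f over [0, 1].

    For this kernel, sol(sol(C)) = 1 and sol(C(., x_i)) = 1 for every x_i,
    so the scalar c is 1 and the vector c is the ones vector.
    """
    n = len(x)
    y = f(x)
    C = kernel(x, x, theta)
    c_scalar, c_vec = 1.0, np.ones(n)
    Cinv_c = np.linalg.solve(C, c_vec)
    Cinv_y = np.linalg.solve(C, y)
    m = 0.0                                   # prior mean, taken as zero here
    w0 = m * (1.0 - c_vec @ Cinv_c)           # sol(1) = 1 for this problem
    app = w0 + c_vec @ Cinv_y                 # w^T y with w = C^{-1} c
    half_width = 2.58 * s * np.sqrt(max(c_scalar - c_vec @ Cinv_c, 0.0))
    return app, half_width

# usage: integrate f(x) = x^2 over [0, 1] (true value 1/3) at 64 midpoints
x = (np.arange(64) + 0.5) / 64.0
app, hw = bayes_cubature(lambda x: x**2, x)
```

The half-width shrinks as the design fills in, which is exactly the quantity the stopping rule compares against $\varepsilon$.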


  4. The Probabilistic Numerics Approach (continued)
     (Repeats the formulas of slide 3, adding:)
     There are issues requiring attention. 3/12


  5. Maximum Likelihood Estimation
     Minimize minus twice the log-likelihood observed with $\mathbf{f} = \mathbf{y}$:
     $$2n\log(s) + \log(\det(\mathsf{C})) + s^{-2}(\mathbf{y} - m\mathbf{1})^T\mathsf{C}^{-1}(\mathbf{y} - m\mathbf{1}),$$
     first with respect to $m$, then $s$, then $\theta$:
     $$m_{\mathrm{MLE}} = \frac{\mathbf{1}^T\mathsf{C}^{-1}\mathbf{y}}{\mathbf{1}^T\mathsf{C}^{-1}\mathbf{1}}, \qquad s^2_{\mathrm{MLE}} = \frac{1}{n}\,\mathbf{y}^T\!\left[\mathsf{C}^{-1} - \frac{\mathsf{C}^{-1}\mathbf{1}\mathbf{1}^T\mathsf{C}^{-1}}{\mathbf{1}^T\mathsf{C}^{-1}\mathbf{1}}\right]\mathbf{y},$$
     $$\theta_{\mathrm{MLE}} = \operatorname*{argmin}_\theta \left\{ n\log\!\left(\mathbf{y}^T\!\left[\mathsf{C}^{-1} - \frac{\mathsf{C}^{-1}\mathbf{1}\mathbf{1}^T\mathsf{C}^{-1}}{\mathbf{1}^T\mathsf{C}^{-1}\mathbf{1}}\right]\mathbf{y}\right) + \log(\det(\mathsf{C})) \right\}.$$
     4/12
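The closed-form MLEs above translate directly into NumPy; this is a minimal sketch, not the talk's implementation. The `gram(theta)` callback in the profile objective is a hypothetical hook for building the Gram matrix from $\theta$.

```python
import numpy as np

def mle_mean_variance(y, C):
    """Closed-form MLEs of the GP mean m and variance s^2, given observations
    y and the kernel Gram matrix C (the slide's formulas)."""
    n = len(y)
    one = np.ones(n)
    Cinv_y = np.linalg.solve(C, y)
    Cinv_1 = np.linalg.solve(C, one)
    m_mle = (one @ Cinv_y) / (one @ Cinv_1)
    # quadratic form y^T [C^{-1} - C^{-1} 1 1^T C^{-1} / (1^T C^{-1} 1)] y
    quad = y @ Cinv_y - (one @ Cinv_y) ** 2 / (one @ Cinv_1)
    s2_mle = quad / n
    return m_mle, s2_mle

def neg2_profile_loglik(theta, y, gram):
    """Profile objective for theta_MLE: n*log(quad) + log(det(C)).
    `gram(theta)` is a hypothetical callback returning the Gram matrix."""
    C = gram(theta)
    n = len(y)
    one = np.ones(n)
    Cinv_y = np.linalg.solve(C, y)
    Cinv_1 = np.linalg.solve(C, one)
    quad = y @ Cinv_y - (one @ Cinv_y) ** 2 / (one @ Cinv_1)
    _, logdet = np.linalg.slogdet(C)
    return n * np.log(quad) + logdet

# usage: with C = I the formulas reduce to the sample mean and (biased) variance
m_hat, s2_hat = mle_mean_variance(np.array([1.0, 2.0, 3.0]), np.eye(3))
val = neg2_profile_loglik(1.0, np.array([1.0, 2.0, 3.0]), lambda th: th * np.eye(3))
```

Minimizing `neg2_profile_loglik` over $\theta$ (e.g. with a scalar optimizer) gives $\theta_{\mathrm{MLE}}$; $m$ and $s^2$ then come for free from the closed forms.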


  6. Maximum Likelihood Estimation (continued)
     (Repeats the MLE formulas of slide 5, adding:)
     The stopping criterion becomes
     $$2.58\sqrt{\underbrace{\frac{c - \mathbf{c}^T\mathsf{C}^{-1}\mathbf{c}}{n}}_{\text{depends on design}}\;\underbrace{\mathbf{y}^T\!\left[\mathsf{C}^{-1} - \frac{\mathsf{C}^{-1}\mathbf{1}\mathbf{1}^T\mathsf{C}^{-1}}{\mathbf{1}^T\mathsf{C}^{-1}\mathbf{1}}\right]\mathbf{y}}_{\text{depends on data}}} \le \varepsilon.$$
     4/12


  7. Maximum Likelihood Estimation (continued)
     (Repeats the formulas and stopping criterion of slide 6, adding questions:)
     Q1: How large a family of kernels, $C_\theta$, is needed in practice to be confident in the error bound?
     Q2: Are there better ways of finding the right $\theta$, say cross-validation?
     Q3: Can we check our assumption that $f$ really comes from a Gaussian process? If not, are there alternatives?
     4/12


  8. Low Discrepancy Sampling
     Suppose that the domain is $[0,1]^d$. Low discrepancy sampling places the $\mathbf{x}_i$ more evenly than IID sampling.
     [Figure: three point sets in the unit square — IID points, Sobol' points, integration lattice points]
     Dick, J. et al. High dimensional integration — the Quasi-Monte Carlo way. Acta Numer. 22, 133–288 (2013); Hickernell, F. J. et al. SAMSI Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applied Mathematics. https://www.samsi.info/programs-and-activities/year-long-research-programs/2017-18-program-quasi-monte-carlo-high-dimensional-sampling-methods-applied-mathematics-qmc/. 5/12
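One way to see the "more even" claim quantitatively is to compare L2 star discrepancies, computable exactly by Warnock's formula. The Fibonacci lattice and the seed below are illustrative choices, not taken from the slides.

```python
import numpy as np

def l2_star_discrepancy(x):
    """Warnock's formula for the L2 star discrepancy of points x in [0, 1)^d."""
    n, d = x.shape
    t1 = (1.0 / 3.0) ** d
    t2 = np.prod((1.0 - x**2) / 2.0, axis=1).sum() * 2.0 / n
    t3 = np.prod(1.0 - np.maximum(x[:, None, :], x[None, :, :]), axis=2).sum() / n**2
    return np.sqrt(t1 - t2 + t3)

n, d = 233, 2
rng = np.random.default_rng(0)
iid = rng.random((n, d))

# Fibonacci lattice: generating vector (1, 144) for n = 233 consecutive
# Fibonacci numbers, a standard 2-d rank-1 lattice construction
g = np.array([1, 144])
lattice = (np.outer(np.arange(n), g) / n) % 1.0

disc_iid = l2_star_discrepancy(iid)
disc_lat = l2_star_discrepancy(lattice)
```

The lattice's discrepancy decays like $O(n^{-1}\log n)$ versus $O(n^{-1/2})$ for IID points, so `disc_lat` comes out markedly smaller than `disc_iid`.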


  9. Covariance Kernels that Match the Design
     Suppose that the covariance kernel, $C_\theta$, and the design, $\{\mathbf{x}_i\}_{i=1}^{n}$, have special properties:
     $$\mathsf{C} = \big(C_\theta(\mathbf{x}_i,\mathbf{x}_j)\big)_{i,j=1}^{n} = (\mathbf{C}_1, \dots, \mathbf{C}_n) = \frac{1}{n}\mathsf{V}\Lambda\mathsf{V}^H, \quad \mathsf{V}^H = n\mathsf{V}^{-1}, \quad \Lambda = \mathrm{diag}(\lambda_1,\dots,\lambda_n) = \mathrm{diag}(\boldsymbol{\lambda}),$$
     $$\mathsf{V} = (\mathbf{V}_1 \cdots \mathbf{V}_n) = (\mathbf{v}_1 \cdots \mathbf{v}_n)^T, \quad \mathbf{V}_1 = \mathbf{v}_1 = \mathbf{1}.$$
     Suppose that $\mathsf{V}^T\mathbf{z}$ is a fast transform ($O(n\log n)$ cost) applied to $\mathbf{z}$. Then it follows that
     $$\boldsymbol{\lambda} = \mathsf{V}^T\mathbf{C}_1 \text{ is fast}, \qquad \mathsf{C}^{-1}\mathbf{1} = \frac{1}{\lambda_1}\mathbf{1}.$$
     Let $\mathbf{y}$ be the observed function values. Recall $c = \mathrm{sol}_{\cdot}(\mathrm{sol}_{\cdot\cdot}(C_\theta(\cdot,\cdot\cdot)))$, and let
     $$\hat{\mathbf{y}} = \mathsf{V}^T\mathbf{y}, \quad \hat{\mathbf{c}} = \mathsf{V}^T\mathbf{c}, \quad \mathbf{c} = \big(\mathrm{sol}_{\cdot}(C_\theta(\cdot,\mathbf{x}_1)), \dots, \mathrm{sol}_{\cdot}(C_\theta(\cdot,\mathbf{x}_n))\big)^T.$$
     Then using the MLE estimates, the approximate solution and the stopping criterion become:
     $$\mathrm{app}(f,\varepsilon) = \frac{\hat{y}_1\,\mathrm{sol}(1)}{n} + \sum_{i=2}^{n} \frac{\hat{y}_i\,\hat{c}_i}{\lambda_i}, \qquad 2.58\sqrt{\left(c - \frac{1}{n}\sum_{i=1}^{n}\frac{|\hat{c}_i|^2}{\lambda_i}\right)\frac{1}{n^2}\sum_{i=2}^{n}\frac{|\hat{y}_i|^2}{\lambda_i}} \le \varepsilon.$$
     6/12
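For a one-dimensional rank-1 lattice paired with a shift-invariant kernel, the Gram matrix is circulant, so $\mathsf{V}$ is the DFT matrix and the eigenvalues come from a single FFT of the first column. The following NumPy check of the claimed factorization uses an illustrative $B_2$-based kernel, not necessarily the one from the talk:

```python
import numpy as np

def b2(x):
    """Bernoulli polynomial B_2 on [0, 1)."""
    return x**2 - x + 1.0 / 6.0

n, theta = 64, 1.0
x = np.arange(n) / n                       # 1-d lattice points i/n
C1 = 1.0 + theta * b2(x)                   # first column: C(x_i, x_0)
lam = np.fft.fft(C1).real                  # eigenvalues via one O(n log n) FFT
C = 1.0 + theta * b2((x[:, None] - x[None, :]) % 1.0)   # full Gram matrix

# V is the DFT matrix; its first column is the ones vector
V = np.exp(2j * np.pi * np.outer(np.arange(n), np.arange(n)) / n)

# check the factorization C = (1/n) V Lambda V^H
assert np.allclose(C, (V * lam) @ V.conj().T / n)
# check C^{-1} 1 = (1/lambda_1) 1, since the first column of V is 1
assert np.allclose(np.linalg.solve(C, np.ones(n)), np.ones(n) / lam[0])
```

The same structure is what lets $\hat{\mathbf{y}}$, $\hat{\mathbf{c}}$, and $\boldsymbol{\lambda}$ all be computed in $O(n \log n)$ time without ever forming or inverting $\mathsf{C}$.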


  10. Covariance Kernels that Match the Design (continued)
      (Recaps the definitions and formulas of slide 9, adding the MLE for $\theta$ under the fast transform and the integration special case:)
      $$\theta_{\mathrm{MLE}} = \operatorname*{argmin}_\theta \left\{ n\log\!\left(\sum_{i=2}^{n}\frac{|\hat{y}_i|^2}{\lambda_i}\right) + \sum_{i=1}^{n}\log(\lambda_i) \right\}.$$
      For integration with respect to a density and our special kernels, $\mathrm{sol}(1) = 1$, $c = 1$, and $\mathbf{c} = \mathbf{1}$:
      $$\mathrm{app}(f,\varepsilon) = \frac{\hat{y}_1}{n}, \qquad 2.58\sqrt{\left(1 - \frac{n}{\lambda_1}\right)\frac{1}{n^2}\sum_{i=2}^{n}\frac{|\hat{y}_i|^2}{\lambda_i}} \le \varepsilon.$$
      7/12
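The integration special case suggests an automatic cubature loop: double $n$ until the data-driven bound drops below $\varepsilon$. The sketch below is a one-dimensional toy with a fixed $\theta$ (no MLE search) and lattice $x_i = i/n$; it is not the GAIL implementation.

```python
import numpy as np

def b2(x):
    """Bernoulli polynomial B_2 on [0, 1)."""
    return x**2 - x + 1.0 / 6.0

def adaptive_bayes_lattice(f, eps, theta=1.0, n0=16, n_max=2**20):
    """Double n until 2.58*sqrt((1 - n/lam_1)*(1/n^2)*sum_{i>=2}|yhat_i|^2/lam_i) <= eps.
    1-d sketch: lattice x_i = i/n, kernel 1 + theta*B_2((x - t) mod 1),
    theta held fixed rather than fit by MLE, for brevity."""
    n = n0
    while True:
        x = np.arange(n) / n
        y = f(x)
        lam = np.fft.fft(1.0 + theta * b2(x)).real   # Gram-matrix eigenvalues
        yhat = np.fft.fft(y)                          # fast transform of the data
        err = 2.58 * np.sqrt(
            max(1.0 - n / lam[0], 0.0)                # prone to round-off (Q4); clipped
            * (np.abs(yhat[1:]) ** 2 / lam[1:]).sum() / n**2
        )
        if err <= eps or n >= n_max:
            return yhat[0].real / n, err, n           # app(f, eps) = yhat_1 / n
        n *= 2

# usage: integrate cos^2(2*pi*x) + x over [0, 1]; the true value is 1
app, err, n = adaptive_bayes_lattice(lambda x: np.cos(2 * np.pi * x) ** 2 + x, 1e-4)
```

Because both $\boldsymbol{\lambda}$ and $\hat{\mathbf{y}}$ come from FFTs, each iteration costs $O(n \log n)$, so doubling $n$ until the criterion is met stays cheap even for large $n$.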


  11. Covariance Kernels that Match the Design (continued)
      (Repeats slide 10, adding:)
      Q4: How do we avoid round-off in evaluating $1 - n/\lambda_1$?
      7/12


  12. Form of Matching Covariance Kernels
      Typically the domain of $f$ is $[0,1)^d$, and
      $$C(\mathbf{x},\mathbf{t}) = \begin{cases} \tilde{C}(\mathbf{x} - \mathbf{t} \bmod \mathbf{1}) & \text{integration lattices} \\ \tilde{C}(\mathbf{x} \oplus \mathbf{t}) & \text{Sobol' sequences, where } \oplus \text{ means digitwise addition modulo 2.} \end{cases}$$
      E.g., for integration lattices,
      $$C(\mathbf{x},\mathbf{t}) = \prod_{k=1}^{d}\big[1 - \theta_1(-1)^{\theta_2} B_{2\theta_2}(x_k - t_k \bmod 1)\big], \qquad \theta_1 > 0,\ \theta_2 \in \mathbb{N}.$$
      8/12
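The lattice kernel can be evaluated directly from Bernoulli polynomials. A sketch for $\theta_2 \in \{1, 2\}$ (i.e. $B_2$ and $B_4$), with the product taken over coordinates; the specific points below are arbitrary illustrations:

```python
import numpy as np

def bernoulli_poly(m, x):
    """Bernoulli polynomials B_2 and B_4 on [0, 1), as used in the kernel."""
    if m == 2:
        return x**2 - x + 1.0 / 6.0
    if m == 4:
        return x**4 - 2 * x**3 + x**2 - 1.0 / 30.0
    raise ValueError("only B_2 and B_4 are implemented in this sketch")

def lattice_kernel(x, t, theta1=1.0, theta2=1):
    """C(x, t) = prod_k [1 - theta1*(-1)^theta2 * B_{2*theta2}((x_k - t_k) mod 1)]."""
    z = (x - t) % 1.0
    return np.prod(1.0 - theta1 * (-1) ** theta2 * bernoulli_poly(2 * theta2, z))

x = np.array([0.3, 0.7])
t = np.array([0.6, 0.1])
k_xt = lattice_kernel(x, t)
```

Since $B_{2m}(1 - z) = B_{2m}(z)$, the kernel is symmetric in its arguments, and its value on the diagonal is $(1 + \theta_1/6)^d$ for $\theta_2 = 1$.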


  13. Form of Matching Covariance Kernels (continued)
      (Repeats slide 12, adding questions:)
      Q5: How do we periodize $f$ to take advantage of integration lattices and smoother covariance kernels? Does this even work for function approximation?
      Q6: May we get higher-order convergence with higher-order nets and smoother kernels? What do those kernels look like?
      Q7: Are low discrepancy designs also good for function approximation?
      Q8: Are there other kernel/design combinations that expedite vector-matrix operations?
      8/12


  14. Option Pricing
      $$\mu = \text{fair price} = \int_{\mathbb{R}^d} e^{-rT}\max\!\left(\frac{1}{d}\sum_{j=1}^{d} S_j - K,\; 0\right)\frac{e^{-\mathbf{z}^T\mathbf{z}/2}}{(2\pi)^{d/2}}\,\mathrm{d}\mathbf{z} \approx \$13.12,$$
      $$S_j = S_0\,e^{(r-\sigma^2/2)jT/d + \sigma x_j} = \text{stock price at time } jT/d, \quad \mathbf{x} = \mathsf{A}\mathbf{z}, \quad \mathsf{A}\mathsf{A}^T = \Sigma = \big(\min(i,j)\,T/d\big)_{i,j=1}^{d}, \quad T = 1/4,\ d = 13 \text{ here.}$$

      Abs. Error   Method                      Median   Worst 10%   Worst 10%   Time (s)
      Tolerance                                Error    Accuracy    n
      1E-2         IID diff                    2E-3     100%        6.1E7       33.000
      1E-2         Scr. Sobol' PCA             1E-3     100%        1.6E4       0.040
      1E-2         Scr. Sob. cont. var. PCA    2E-3     100%        4.1E3       0.019
      1E-2         Bayes. Latt. PCA            2E-3     99%         1.6E4       0.051
      9/12
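A plain Monte Carlo sketch of this payoff with the PCA construction of $\mathsf{A}$. Note that the slide fixes only $T = 1/4$ and $d = 13$: the values of `S0`, `K`, `r`, and `sigma` below are hypothetical stand-ins, so the estimate will not reproduce the quoted $13.12.

```python
import numpy as np

# Hypothetical parameters (the slide fixes only T = 1/4 and d = 13;
# S0, K, r, sigma are illustrative guesses, not the talk's values)
S0, K, r, sigma, T, d = 100.0, 100.0, 0.05, 0.5, 0.25, 13
n = 2**14
rng = np.random.default_rng(7)

# Covariance of Brownian motion at the d monitoring times, and its PCA factor:
# A A^T = Sigma with the largest principal components first
times = np.arange(1, d + 1) * T / d
Sigma = np.minimum.outer(times, times)
eigval, eigvec = np.linalg.eigh(Sigma)            # ascending eigenvalues
A = eigvec[:, ::-1] * np.sqrt(eigval[::-1])       # reorder to descending

z = rng.standard_normal((n, d))
x = z @ A.T                                       # Brownian paths at the d times
S = S0 * np.exp((r - sigma**2 / 2) * times + sigma * x)
payoff = np.exp(-r * T) * np.maximum(S.mean(axis=1) - K, 0.0)
price = payoff.mean()
```

Replacing the IID `z` with scrambled Sobol' or lattice points, fed through the same PCA factor, is exactly what the "Scr. Sobol' PCA" and "Bayes. Latt. PCA" rows in the table do.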


  15. Gaussian Probability
      $$\mu = \int_{[\mathbf{a},\mathbf{b}]} \frac{\exp\!\left(-\tfrac{1}{2}\mathbf{t}^T\Sigma^{-1}\mathbf{t}\right)}{\sqrt{(2\pi)^d\det(\Sigma)}}\,\mathrm{d}\mathbf{t} = \int_{[0,1]^{d-1}} f(\mathbf{x})\,\mathrm{d}\mathbf{x}.$$
      For some typical choice of $\mathbf{a}$, $\mathbf{b}$, $\Sigma$, $d = 3$; $\mu \approx 0.6763$.

      Rel. Error   Method               Median   Worst 10%   Worst 10%   Time (s)
      Tolerance                         Error    Accuracy    n
      1E-2         IID                  5E-4     100%        8.1E4       0.020
      1E-2         Scr. Sobol'          4E-5     100%        1.0E3       0.005
      1E-2         Bayes. Latt.         5E-5     100%        4.1E3       0.023
      1E-3         IID                  9E-5     100%        2.0E6       0.400
      1E-3         Scr. Sobol'          2E-5     100%        2.0E3       0.006
      1E-3         Bayes. Latt.         3E-7     100%        6.6E4       0.076
      1E-4         Scr. Sobol'          4E-7     100%        1.6E4       0.018
      1E-4         Bayes. Latt.         6E-9     100%        5.2E5       0.580
      1E-4         Bayes. Latt. Smth.   1E-7     100%        3.3E4       0.047
      10/12
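A crude IID Monte Carlo version of this probability, corresponding to the baseline rows in the table. The slide's "typical choice" of $\mathbf{a}$, $\mathbf{b}$, $\Sigma$ is not spelled out, so the values below are stand-ins for illustration only.

```python
import numpy as np

# Stand-in box and covariance for d = 3 (the talk's exact choices are not on the slide)
rng = np.random.default_rng(11)
d = 3
a = np.full(d, -1.0)
b = np.full(d, 2.0)
Sigma = np.array([[1.00, 0.50, 0.25],
                  [0.50, 1.00, 0.50],
                  [0.25, 0.50, 1.00]])
L = np.linalg.cholesky(Sigma)              # Sigma = L L^T

# IID Monte Carlo estimate of P(a <= X <= b) for X ~ N(0, Sigma)
n = 2**16
t = rng.standard_normal((n, d)) @ L.T      # rows are samples from N(0, Sigma)
mu_hat = np.mean(np.all((a <= t) & (t <= b), axis=1))
```

The quasi-Monte Carlo rows in the table instead apply a Genz-style transformation that rewrites $\mu$ as an integral over $[0,1]^{d-1}$, which is what makes the Sobol' and lattice designs applicable.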


  16. Recap of Questions
      Q1: How large a family of kernels, $C_\theta$, is needed in practice to be confident in the error bound?
      Q2: Are there better ways of finding the right $\theta$, say cross-validation?
      Q3: Can we check our assumption that $f$ really comes from a Gaussian process? If not, are there alternatives?
    Q4: How do we avoid round-off in evaluating 1 ´ n/λ1
    ?
    Q5: How do we periodize f to take advantage of integration lattices and smoother covariance kernels?
    Does this even work for function approximation?
    Q6: May we get higher order convergence with higher order nets and smoother kernels? What do
    those kernels look like?
    Q7: Are low discrepancy designs also good for function approximation?
    Q8: Are there other kernel/design combinations that expedite vector-matrix operations?
    Q9: Is this adaptive Bayesian algorithm competitive with others?
    Q10: What other problems are amenable to matched kernels and designs beyond cubature and function
    approximation?
    11/12


  17. Thank you
    Slides available at www.slideshare.net/fjhickernell/
    probabilistic-numerics-2018-April-11-london


  18. References
      Dick, J., Kuo, F. & Sloan, I. H. High dimensional integration — the Quasi-Monte Carlo way. Acta Numer. 22, 133–288 (2013).
      Hickernell, F. J., Kuo, F. Y., L'Ecuyer, P. & Owen, A. B. SAMSI Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applied Mathematics. https://www.samsi.info/programs-and-activities/year-long-research-programs/2017-18-program-quasi-monte-carlo-high-dimensional-sampling-methods-applied-mathematics-qmc/.
    12/12
