Right Ingredients for Adaptive Function Approximation

Fred J. Hickernell

March 05, 2020

Transcript

  1. The Right Ingredients for Adaptive Function Approximation Algorithms

    Fred J. Hickernell
    Department of Applied Mathematics, Center for Interdisciplinary Scientific Computation, Illinois Institute of Technology
    hickernell@iit.edu, mypages.iit.edu/~hickernell
    with Sou-Cheng Choi, Yuhan Ding, Mac Hyman, Xin Tong, and the GAIL team
    Partially supported by NSF-DMS-1522687 and NSF-DMS-1638521 (SAMSI)
    Thanks to Guohui Song for the invitation and hospitality
    Old Dominion University, March 5, 2020
  2. The Guaranteed Automatic Integration Library (GAIL) and QMCPy Teams

    Outline: Introduction; Univariate, Low Accuracy; Multivariate, Reproducing Kernel Hilbert Space; Summary; References
    Sou-Cheng Choi (Chief Data Scientist, Kamakura)
    Yuhan Ding (IIT PhD ’15, Lecturer, IIT)
    Lan Jiang (IIT PhD ’16, Compass)
    Lluís Antoni Jiménez Rugama (IIT PhD ’17, UBS)
    Jagadeeswaran Rathinavel (IIT PhD ’19, Wi-Tronix)
    Aleksei Sorokin (IIT BS + MAS ’21 exp.)
    Tong Xin (IIT MS, UIC PhD ’20 exp.)
    Kan Zhang (IIT PhD ’20 exp.)
    Yizhi Zhang (IIT PhD ’18, Jamran Int’l)
    Xuan Zhou (IIT PhD ’15, JP Morgan)
    and others
    Adaptive software libraries: GAIL and QMCPy
  5. Problem

    Given: a black-box function routine $f : \mathcal{X} \subseteq \mathbb{R}^d \to \mathbb{R}$, e.g., the output of a computer simulation, with an expensive cost per function value, $\$(f)$.
    Want: a fixed tolerance algorithm $\mathrm{ALG} : \mathcal{C} \times (0, \infty) \to L^\infty(\mathcal{X})$ such that $\|f - \mathrm{ALG}(f, \varepsilon)\|_\infty \le \varepsilon$ for all $f \in \mathcal{C}$ (the candidate set), with a cheap cost per $\mathrm{ALG}(f, \varepsilon)$ value.
    Design or node array: $X \in \mathcal{X}^n \subseteq \mathbb{R}^{n \times d}$, with function data $y = f(X) \in \mathbb{R}^n$.
    Acquisition function: $x_{n+1} = \operatorname{argmax}_{x \in \mathcal{X}} \mathrm{ACQ}(x, X, y)$.
    Data-driven error bound: $\|f - \mathrm{APP}(X, y)\|_\infty \le \mathrm{ERR}(X, y)$ for all $n \in \mathbb{N}$, $f \in \mathcal{C}$.
    Stopping criterion: $n^* = \min\{n \in \mathbb{N} : \mathrm{ERR}(X, y) \le \varepsilon\}$.
    Fixed budget approximation: $\mathrm{ALG}(f, \varepsilon) = \mathrm{APP}(X, y)$ for this $n^*$.
    Adaptive sample size, design, and fixed budget approximation.
    Assumes that what you see is almost what you get.
  7. Linear Splines

    $f : [a, b] \to \mathbb{R}$, with data sites $a =: x_0 < x_1 < \cdots < x_n := b$, $X = (x_i)_{i=0}^n$, and function data $y = f(X)$.
    Linear spline: $\mathrm{APP}(X, y) := \frac{x - x_i}{x_{i-1} - x_i} y_{i-1} + \frac{x - x_{i-1}}{x_i - x_{i-1}} y_i$ for $x_{i-1} \le x \le x_i$, $i \in 1{:}n$.
    Error bound: $\|f - \mathrm{APP}(X, y)\|_{\infty,[x_{i-1},x_i]} \le \frac{(x_i - x_{i-1})^2 \, \|f''\|_{\infty,[x_{i-1},x_i]}}{8}$, $i \in 1{:}n$, $f \in W^{2,\infty}$.
    Numerical analysis often stops here, leaving unanswered questions:
    How big should $n$ be to make $\|f - \mathrm{APP}(X, y)\|_\infty \le \varepsilon$?
    How big is $\|f''\|_{\infty,[x_{i-1},x_i]}$?
    How best to choose $X$?
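
As a small numerical check of the linear spline error bound stated on this slide, the sketch below builds APP(X, y) for f = sin on [0, π] with n = 16 equal subintervals and compares the error observed on a fine grid with (x_i − x_{i−1})² ‖f''‖_∞ / 8. The test function, the interval, the value of n, and the helper name `linear_spline` are illustrative assumptions, not taken from the talk or from GAIL.

```python
import bisect
import math


def linear_spline(xs, ys):
    """Return APP(X, y): the piecewise linear interpolant through (xs[i], ys[i])."""
    def app(x):
        # Find i with xs[i-1] <= x <= xs[i], clamped to the range 1..n.
        i = min(max(bisect.bisect_left(xs, x), 1), len(xs) - 1)
        x0, x1 = xs[i - 1], xs[i]
        return (x - x1) / (x0 - x1) * ys[i - 1] + (x - x0) / (x1 - x0) * ys[i]
    return app


# Illustrative setup (assumed): f = sin on [0, pi], n = 16 equal subintervals.
a, b, n = 0.0, math.pi, 16
xs = [a + i * (b - a) / n for i in range(n + 1)]
ys = [math.sin(x) for x in xs]
app = linear_spline(xs, ys)

# For f = sin we know |f''| <= 1, so the bound is (width)^2 / 8 on every subinterval.
bound = ((b - a) / n) ** 2 / 8

# Observed maximum error on a fine grid.
grid = [a + k * (b - a) / 2000 for k in range(2001)]
max_err = max(abs(math.sin(x) - app(x)) for x in grid)

print(f"observed max error {max_err:.6f}, bound {bound:.6f}")
```

The check only works here because ‖f''‖_∞ is known for sin; for a black-box f that quantity is exactly what is missing, which is the gap the slide points out.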
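  9. Linear Splines Error

    $f : [a, b] \to \mathbb{R}$, with data sites $a =: x_0 < x_1 < \cdots < x_n := b$, $X = (x_i)_{i=0}^n$, and function data $y = f(X)$.
    Linear spline: $\mathrm{APP}(X, y) := \frac{x - x_i}{x_{i-1} - x_i} y_{i-1} + \frac{x - x_{i-1}}{x_i - x_{i-1}} y_i$ for $x_{i-1} \le x \le x_i$, $i \in 1{:}n$.
    Error bound: $\|f - \mathrm{APP}(X, y)\|_{\infty,[x_{i-1},x_i]} \le \frac{1}{8} (x_i - x_{i-1})^2 \, \|f''\|_{\infty,[x_{i-1},x_i]}$, $i \in 1{:}n$, $f \in W^{2,\infty}$.
    Data-based absolute 2nd derivative of the interpolating polynomial: $\|f''\|_{-\infty,[x_{i-1},x_{i+1}]} \le D_i(X, y) \le \|f''\|_{\infty,[x_{i-1},x_{i+1}]}$, where $D_i(X, y) := \frac{\left| \frac{y_{i+1} - y_i}{x_{i+1} - x_i} - \frac{y_i - y_{i-1}}{x_i - x_{i-1}} \right|}{(x_{i+1} - x_{i-1})/2} = 2 \, |f[x_{i-1}, x_i, x_{i+1}]|$.
    Candidate set: $\mathcal{C} := \{ f \in W^{2,\infty} : |f''(x)| \le \max\bigl( C(h_-) \, |f''(x - h_-)|, \ C(h_+) \, |f''(x + h_+)| \bigr), \ 0 < h_\pm < \mathfrak{h}, \ a < x < b \}$.
    Inflation factor: $C(h) := \frac{C_0 \, \mathfrak{h}}{\mathfrak{h} - h}$.
    $|f''|$ does not change abruptly.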
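
The data-based quantity D_i and the inflation factor C(h) from this slide are simple to compute. The sketch below evaluates D_i on unequally spaced nodes for f = exp, where |f''| = exp is increasing, so D_i should lie between exp(x_{i−1}) and exp(x_{i+1}). The function names `second_diff` and `inflation`, the argument name `hfrak` (standing for 𝔥), and the test data are assumptions made for illustration.

```python
import math


def second_diff(xs, ys, i):
    """D_i(X, y) = 2 |f[x_{i-1}, x_i, x_{i+1}]| for 1 <= i <= n - 1."""
    left = (ys[i] - ys[i - 1]) / (xs[i] - xs[i - 1])
    right = (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
    return abs(right - left) / ((xs[i + 1] - xs[i - 1]) / 2)


def inflation(h, c0, hfrak):
    """C(h) = C_0 * hfrak / (hfrak - h), defined for 0 < h < hfrak."""
    return c0 * hfrak / (hfrak - h)


# Illustrative data (assumed): f = exp on unequally spaced nodes.
xs = [0.0, 0.1, 0.25, 0.3, 0.5]
ys = [math.exp(x) for x in xs]

# D_2 uses the nodes 0.1, 0.25, 0.3, so it should lie in [exp(0.1), exp(0.3)].
d2 = second_diff(xs, ys, 2)
print(math.exp(0.1), d2, math.exp(0.3))

# The inflation factor grows without bound as h approaches hfrak.
print(inflation(0.5, 1.0, 1.0))
```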
  10. Linear Splines Error

    $f : [a, b] \to \mathbb{R}$, with data sites $a =: x_0 < x_1 < \cdots < x_n := b$, $X = (x_i)_{i=0}^n$, and function data $y = f(X)$.
    Error bound: $\|f - \mathrm{APP}(X, y)\|_{\infty,[x_{i-1},x_i]} \le \frac{1}{8} (x_i - x_{i-1})^2 \, \|f''\|_{\infty,[x_{i-1},x_i]} \le \max_\pm \mathrm{ERR}_{i,\pm}(X, y)$, $i \in 1{:}n$, $f \in \mathcal{C}$.
    Candidate set: $\mathcal{C} := \{ f \in W^{2,\infty} : |f''(x)| \le \max\bigl( C(h_-) \, |f''(x - h_-)|, \ C(h_+) \, |f''(x + h_+)| \bigr), \ 0 < h_\pm < \mathfrak{h}, \ a < x < b \}$.
    Inflation factor: $C(h) := \frac{C_0 \, \mathfrak{h}}{\mathfrak{h} - h}$.
    $|f''|$ does not change abruptly.
    $\mathrm{ERR}_{i,-}(X, y) = \frac{1}{8} (x_i - x_{i-1})^2 \, C(x_i - x_{i-3}) \, D_{i-2}(X, y)$
    $\mathrm{ERR}_{i,+}(X, y) = \frac{1}{8} (x_i - x_{i-1})^2 \, C(x_{i+2} - x_{i-1}) \, D_{i+1}(X, y)$
    $D_i(X, y) = 2 \, |f[x_{i-1}, x_i, x_{i+1}]|$ is the data-based absolute 2nd derivative of the interpolating polynomial.
  14. Adaptive Linear Spline Algorithm

    Given $n_{\text{init}} \ge 4$ and $C_0 \ge 1$: set $\mathfrak{h} = \frac{3(b - a)}{n_{\text{init}} - 1}$, $C(h) = \frac{C_0 \, \mathfrak{h}}{\mathfrak{h} - h}$, $n = n_{\text{init}}$, $x_i = a + i(b - a)/n$.
    Step 1. Compute the data-based $\mathrm{ERR}_{i,\pm}(X, y)$ for $i = 1, \ldots, n$.
    Step 2. Construct $I$, the index set of subintervals that might be split: $I = \{ i \in 1{:}n : \mathrm{ERR}_{i \pm j, \mp}(X, y) > \varepsilon, \ j = 0, 1, 2 \}$.
    Step 3. If $I = \emptyset$, return $\mathrm{ALG}(f, \varepsilon) = \mathrm{APP}(X, y)$ as the approximation satisfying the error tolerance. Otherwise split those intervals in $I$ with largest width and go to Step 1 (acquisition function).
    Choi, S.-C. T., Ding, Y., H., F. J. & Tong, X. Local Adaption for Approximation and Minimization of Univariate Functions. J. Complexity 40, 17–33 (2017).
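
Steps 1 to 3 can be sketched in a few dozen lines. The version below is a simplified illustration, not the GAIL implementation and not necessarily the exact algorithm of the cited paper. It makes these assumptions: near the endpoints only the one-sided bound whose stencil stays inside [a, b] is used; Step 2 is read as "some ERR_{i+j,−} or ERR_{i−j,+} with j in {0, 1, 2} exceeds ε"; selected intervals are bisected; and the defaults `n_init=10`, `c0=2.0`, the guard `max_nodes`, and the helper `evaluate` are hypothetical choices.

```python
import bisect
import math


def adaptive_linear_spline(f, a, b, eps, n_init=10, c0=2.0, max_nodes=100_000):
    """Simplified sketch of Steps 1-3; returns nodes xs and data ys."""
    if n_init < 4 or c0 < 1:
        raise ValueError("need n_init >= 4 and c0 >= 1")
    hfrak = 3 * (b - a) / (n_init - 1)

    def infl(h):
        return c0 * hfrak / (hfrak - h)  # inflation factor C(h), 0 < h < hfrak

    xs = [a + i * (b - a) / n_init for i in range(n_init + 1)]
    ys = [f(x) for x in xs]
    while True:
        n = len(xs) - 1
        # D[i] = 2 |f[x_{i-1}, x_i, x_{i+1}]| for i = 1..n-1
        D = [0.0] * (n + 1)
        for i in range(1, n):
            left = (ys[i] - ys[i - 1]) / (xs[i] - xs[i - 1])
            right = (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
            D[i] = abs(right - left) / ((xs[i + 1] - xs[i - 1]) / 2)
        # Step 1: ERR_{i,-} and ERR_{i,+}; None where the stencil leaves [a, b].
        err_m = [None] * (n + 1)
        err_p = [None] * (n + 1)
        for i in range(1, n + 1):
            w2 = (xs[i] - xs[i - 1]) ** 2 / 8
            if i >= 3:
                err_m[i] = w2 * infl(xs[i] - xs[i - 3]) * D[i - 2]
            if i <= n - 2:
                err_p[i] = w2 * infl(xs[i + 2] - xs[i - 1]) * D[i + 1]

        def too_big(arr, k):
            return 1 <= k <= n and arr[k] is not None and arr[k] > eps

        # Step 2: subintervals that might be split.
        I = [i for i in range(1, n + 1)
             if any(too_big(err_m, i + j) or too_big(err_p, i - j) for j in range(3))]
        # Step 3: stop, or bisect the widest subintervals in I.
        if not I:
            return xs, ys
        wmax = max(xs[i] - xs[i - 1] for i in I)
        split = [i for i in I if xs[i] - xs[i - 1] >= wmax * (1 - 1e-12)]
        for i in reversed(split):  # insert right to left so indices stay valid
            mid = 0.5 * (xs[i - 1] + xs[i])
            xs.insert(i, mid)
            ys.insert(i, f(mid))
        if len(xs) > max_nodes:
            raise RuntimeError("node budget exceeded")


def evaluate(xs, ys, x):
    """Evaluate the linear spline APP(X, y) at x."""
    i = min(max(bisect.bisect_left(xs, x), 1), len(xs) - 1)
    t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
    return (1 - t) * ys[i - 1] + t * ys[i]


# Illustrative run (assumed test function and tolerance).
xs, ys = adaptive_linear_spline(math.sin, 0.0, math.pi, eps=1e-3)
grid = [k * math.pi / 5000 for k in range(5001)]
max_err = max(abs(math.sin(x) - evaluate(xs, ys, x)) for x in grid)
print(len(xs) - 1, max_err)
```

The stopping decision uses only the data (X, y); the true error is computed here purely to check the result on a function whose values are cheap.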
  23. Highlights of Adaptive Linear Spline Algorithm

    Defined for a cone candidate set, $\mathcal{C}$, whose definition does not depend on the algorithm.
    Guaranteed to succeed for all $f \in \mathcal{C}$.
    Candidate set $\mathcal{C}$ excludes spikes, i.e., two nearby inflection points.
    $\mathcal{C}$ formalizes "what you see is almost what you get".
    Impossible to have an algorithm for all $f \in W^{2,\infty}$, since $W^{2,\infty}$ contains arbitrarily large functions that look like 0.
    Adaptive algorithms do not help for ball candidate sets $\mathcal{C} = \{ f : \|f''\|_\infty \le R \}$.
    $\operatorname{cost}(\mathrm{ALG}, f, \varepsilon, \mathcal{C}) \asymp \sqrt{\frac{C_0 \, \|f''\|_{1/2}}{\varepsilon}} \asymp \operatorname{comp}(f, \varepsilon, \mathcal{C})$, i.e., optimal.
    Does not allow for smoothness to be inferred.
    Not multivariate.
    Choi, S.-C. T., Ding, Y., H., F. J. & Tong, X. Local Adaption for Approximation and Minimization of Univariate Functions. J. Complexity 40, 17–33 (2017).
  27. Approximation via Reproducing Kernel Hilbert Spaces (RKHSs)

    $\mathcal{F}$ is a Hilbert space with reproducing kernel $K : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$.
    $K(X, X)$ is positive definite for all $X$; $K(\cdot, x) \in \mathcal{F}$ and $f(x) = \langle K(\cdot, x), f \rangle_{\mathcal{F}}$ for all $x \in \mathcal{X}$.
    Example (Matérn): $K(t, x) = (1 + \|t - x\|_2) \exp(-\|t - x\|_2)$.
    Optimal (minimum norm) interpolant: $\mathrm{APP}(X, y) = K(\cdot, X) \, (K(X, X))^{-1} y$, $y = f(X)$.
    $\|f - \mathrm{APP}(X, y)\|_\infty^2 \le \|K(\cdot, \cdot) - K(\cdot, X) (K(X, X))^{-1} K(X, \cdot)\|_\infty \, \|f - \mathrm{APP}(X, y)\|_{\mathcal{F}}^2$, where the first factor is known.
    Candidate set: $\mathcal{C} = \{ f \in \mathcal{F} : \|f - \mathrm{APP}(X, y)\|_{\mathcal{F}} \le C(X) \, \|f\|_{\mathcal{F}} \}$.
    For $f \in \mathcal{C}$: $\|f - \mathrm{APP}(X, y)\|_\infty^2 \le \|K(\cdot, \cdot) - K(\cdot, X) (K(X, X))^{-1} K(X, \cdot)\|_\infty \, \frac{C^2(X)}{1 - C^2(X)} \, \|\mathrm{APP}(X, y)\|_{\mathcal{F}}^2 =: \mathrm{ERR}^2(X, y)$.
    Fasshauer, G. E. Meshfree Approximation Methods with MATLAB. (World Scientific Publishing Co., Singapore, 2007). Fasshauer, G. E. & McCourt, M. Kernel-based Approximation Methods using MATLAB. (World Scientific Publishing Co., Singapore, 2015).
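
To make the minimum norm interpolant concrete, here is a minimal sketch in plain Python that forms K(X, X) for the Matérn kernel above, solves for the coefficients (K(X, X))⁻¹ y with a small Cholesky routine, and returns both APP(X, y) and ‖APP(X, y)‖²_F = yᵀ (K(X, X))⁻¹ y. The 3 × 3 design on [0, 1]², the test function, and the names `solve_spd` and `rkhs_interpolant` are assumptions for illustration; a production code would use a numerical linear algebra library.

```python
import math


def matern(t, x):
    """K(t, x) = (1 + ||t - x||_2) exp(-||t - x||_2) for points given as tuples."""
    r = math.dist(t, x)
    return (1.0 + r) * math.exp(-r)


def solve_spd(A, b):
    """Solve A c = b for a symmetric positive definite A via Cholesky."""
    n = len(b)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    z = [0.0] * n
    for i in range(n):  # forward substitution: L z = b
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    c = [0.0] * n
    for i in reversed(range(n)):  # back substitution: L^T c = z
        c[i] = (z[i] - sum(L[k][i] * c[k] for k in range(i + 1, n))) / L[i][i]
    return c


def rkhs_interpolant(X, y, kernel=matern):
    """Return APP(X, y) as a callable and ||APP(X, y)||_F^2 = y^T K(X, X)^{-1} y."""
    K = [[kernel(s, t) for t in X] for s in X]
    coef = solve_spd(K, y)
    norm2 = sum(c * v for c, v in zip(coef, y))

    def app(x):
        return sum(c * kernel(x, t) for c, t in zip(coef, X))

    return app, norm2


# Illustrative data (assumed): a 3 x 3 grid on [0, 1]^2 and a smooth test function.
X = [(i / 2, j / 2) for i in range(3) for j in range(3)]
f = lambda x: math.sin(x[0]) + x[1] ** 2
y = [f(x) for x in X]
app, norm2 = rkhs_interpolant(X, y)
print(app((0.25, 0.75)), f((0.25, 0.75)), norm2)
```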
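  28. Error and Acquisition for Optimal RKHS Approximation

    $\mathcal{F}$ is a Hilbert space with reproducing kernel $K : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$, e.g., the Matérn kernel $K(t, x) = (1 + \|t - x\|_2) \exp(-\|t - x\|_2)$.
    $\mathrm{APP}(X, y) = K(\cdot, X) \, (K(X, X))^{-1} y$, $y = f(X)$.
    Candidate set: $\mathcal{C} = \{ f \in \mathcal{F} : \|f - \mathrm{APP}(X, y)\|_{\mathcal{F}} \le C(X) \, \|f\|_{\mathcal{F}} \}$.
    $\|f - \mathrm{APP}(X, y)\|_\infty^2 \le \|K(\cdot, \cdot) - K(\cdot, X) (K(X, X))^{-1} K(X, \cdot)\|_\infty \, \|f - \mathrm{APP}(X, y)\|_{\mathcal{F}}^2$, where the first factor is known.
    $\|f - \mathrm{APP}(X, y)\|_\infty^2 \le \|K(\cdot, \cdot) - K(\cdot, X) (K(X, X))^{-1} K(X, \cdot)\|_\infty \, \frac{C^2(X)}{1 - C^2(X)} \, \|\mathrm{APP}(X, y)\|_{\mathcal{F}}^2 =: \mathrm{ERR}^2(X, y)$.
    $\mathrm{ACQ}(x, X, y) := \bigl[ K(x, x) - K(x, X) (K(X, X))^{-1} K(X, x) \bigr] \, \frac{C^2(X)}{1 - C^2(X)} \, \|\mathrm{APP}(X, y)\|_{\mathcal{F}}^2$, with $\|\mathrm{APP}(X, y)\|_{\mathcal{F}}^2 = y^T (K(X, X))^{-1} y$.
    Acquisition function: $x_{n+1} = \operatorname{argmax}_{x \in \mathcal{X}} \mathrm{ACQ}(x, X, y)$.
    Stopping criterion: $\mathrm{ALG}(f, \varepsilon) = \mathrm{APP}(X, y)$ for $n^* = \min\{n \in \mathbb{N} : \mathrm{ERR}(X, y) \le \varepsilon\}$.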
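
The acquisition and stopping rules on this slide can be sketched as a greedy loop in one dimension. Several choices below are assumptions rather than part of the talk: the supremum over the domain is replaced by a maximum over a finite candidate list `cand`; C(X) is replaced by a fixed hypothetical constant `c_fac` in (0, 1), since the slides do not say how C(X) is chosen; and `max_n`, the test function, and the initial design are illustrative.

```python
import math


def kern(t, x):
    """Univariate Matérn kernel K(t, x) = (1 + |t - x|) exp(-|t - x|)."""
    r = abs(t - x)
    return (1.0 + r) * math.exp(-r)


def solve_spd(A, b):
    """Solve A c = b for a symmetric positive definite A via Cholesky."""
    n = len(b)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    c = [0.0] * n
    for i in reversed(range(n)):
        c[i] = (z[i] - sum(L[k][i] * c[k] for k in range(i + 1, n))) / L[i][i]
    return c


def adaptive_rkhs(f, cand, X, eps, c_fac=0.5, max_n=60):
    """Add the argmax of ACQ over `cand` until ERR(X, y) <= eps (or max_n nodes)."""
    X = list(X)
    y = [f(x) for x in X]
    while True:
        K = [[kern(s, t) for t in X] for s in X]
        coef = solve_spd(K, y)
        norm2 = sum(c * v for c, v in zip(coef, y))      # y^T K(X, X)^{-1} y
        scale = c_fac ** 2 / (1 - c_fac ** 2) * norm2

        def power2(x):
            # K(x, x) - K(x, X) K(X, X)^{-1} K(X, x), clipped at 0 against rounding
            kx = [kern(x, t) for t in X]
            w = solve_spd(K, kx)
            return max(kern(x, x) - sum(p * q for p, q in zip(kx, w)), 0.0)

        acq = [power2(x) * scale for x in cand]          # ACQ(x, X, y) on the grid
        err = math.sqrt(max(acq))                        # ERR(X, y), grid version
        if err <= eps or len(X) >= max_n:
            nodes = tuple(X)
            return (lambda x: sum(c * kern(x, t) for c, t in zip(coef, nodes))), X, err
        X.append(cand[acq.index(max(acq))])
        y.append(f(X[-1]))


# Illustrative run (assumed test function, candidate grid, and initial design).
cand = [k / 200 for k in range(201)]
app, X, err = adaptive_rkhs(lambda x: math.sin(3 * x), cand, [0.0, 0.5, 1.0], eps=0.05)
print(len(X), err)
```

Because the factor C²(X)/(1 − C²(X)) · yᵀ(K(X, X))⁻¹y does not depend on x, the next node is simply where K(x, x) − K(x, X)(K(X, X))⁻¹K(X, x) is largest; the data y influence when the loop stops, not where it samples, as long as the kernel is fixed.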
  30. Must Infer Kernel from y = f(X)

    $\mathcal{F}_\theta$ is a Hilbert space with reproducing kernel $K_\theta$.
    Candidate set: $\mathcal{C} = \{ f \in \cup_\theta \mathcal{F}_\theta : \|f - \mathrm{APP}(X, y)\|_{\mathcal{F}_{\theta^*}} \le C(X) \, \|f\|_{\mathcal{F}_{\theta^*}} \ \forall X, \ y = f(X) \}$, with $\theta^*(X, y)$ given below.
    Example: $K_\theta(t, x) = (1 + \|\theta (t - x)\|_2) \exp(-\|\theta (t - x)\|_2)$.
    Choose the $\theta$ (inspired by empirical Bayes) by minimizing the ellipsoid in $\mathbb{R}^n$ of function data yielding interpolants with no greater norm than that observed: $\theta^* = \operatorname{argmin}_\theta \left[ \frac{1}{n} \log\det(K_\theta) + \log\bigl( y^T K_\theta^{-1} y \bigr) \right]$.
    $\|f - \mathrm{APP}(X, y)\|_\infty^2 \le \|K(\cdot, \cdot) - K(\cdot, X) (K(X, X))^{-1} K(X, \cdot)\|_\infty \, \frac{C^2(X)}{1 - C^2(X)} \, y^T (K(X, X))^{-1} y =: \mathrm{ERR}^2(X, y)$.
    $\mathrm{ACQ}(x, X, y) := \bigl[ K(x, x) - K(x, X) (K(X, X))^{-1} K(X, x) \bigr] \, \frac{C^2(X)}{1 - C^2(X)} \, y^T (K(X, X))^{-1} y$.
    Acquisition function: $x_{n+1} = \operatorname{argmax}_{x \in \mathcal{X}} \mathrm{ACQ}(x, X, y)$.
  34. Must Infer Kernel from y = f(X)

    $\mathcal{F}_\theta$ is a Hilbert space with reproducing kernel $K_\theta$.
    Candidate set: $\mathcal{C} = \{ f \in \cup_\theta \mathcal{F}_\theta : \|f - \mathrm{APP}(X, y)\|_{\mathcal{F}_{\theta^*}} \le C(X) \, \|f\|_{\mathcal{F}_{\theta^*}} \ \forall X, \ y = f(X) \}$, with $\theta^*(X, y)$ given below.
    Example (modified Matérn): $K_\theta(t, x) = \exp(b^T (t + x)) \, (1 + \|a (t - x)\|_2) \exp(-\|a (t - x)\|_2)$, $\theta = (a, b)$.
    Choose the $\theta$ (inspired by empirical Bayes) by minimizing the ellipsoid in $\mathbb{R}^n$ of function data yielding interpolants with no greater norm than that observed: $\theta^* = \operatorname{argmin}_\theta \left[ \frac{1}{n} \log\det(K_\theta) + \log\bigl( y^T K_\theta^{-1} y \bigr) \right]$.
    $\|f - \mathrm{APP}(X, y)\|_\infty^2 \le \|K(\cdot, \cdot) - K(\cdot, X) (K(X, X))^{-1} K(X, \cdot)\|_\infty \, \frac{C^2(X)}{1 - C^2(X)} \, y^T (K(X, X))^{-1} y =: \mathrm{ERR}^2(X, y)$.
    $\mathrm{ACQ}(x, X, y) := \bigl[ K(x, x) - K(x, X) (K(X, X))^{-1} K(X, x) \bigr] \, \frac{C^2(X)}{1 - C^2(X)} \, y^T (K(X, X))^{-1} y$.
    Acquisition function: $x_{n+1} = \operatorname{argmax}_{x \in \mathcal{X}} \mathrm{ACQ}(x, X, y)$.
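
The kernel-selection objective on this slide is cheap to evaluate from one Cholesky factor K_θ = L Lᵀ, because log det K_θ = 2 Σ log L_ii and yᵀ K_θ⁻¹ y = ‖L⁻¹ y‖². The sketch below does this for a univariate Matérn kernel with a single scalar scale θ and picks θ* by a plain grid search. The scalar parametrization, the grid of θ values, the data, and the use of grid search in place of a proper optimizer are all assumptions for illustration.

```python
import math


def kern(t, x, theta):
    """Univariate scaled Matérn kernel with r = theta * |t - x| (assumed form)."""
    r = theta * abs(t - x)
    return (1.0 + r) * math.exp(-r)


def cholesky(A):
    """Lower triangular L with A = L L^T for symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def objective(theta, X, y):
    """(1/n) log det(K_theta) + log(y^T K_theta^{-1} y)."""
    n = len(X)
    L = cholesky([[kern(s, t, theta) for t in X] for s in X])
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    z = []  # forward substitution L z = y, so that y^T K^{-1} y = z^T z
    for i in range(n):
        z.append((y[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return logdet / n + math.log(sum(v * v for v in z))


# Illustrative data (assumed): 11 equally spaced nodes and a smooth test function.
X = [i / 10 for i in range(11)]
y = [math.sin(3 * x) for x in X]
thetas = [2.0 ** k for k in range(-2, 5)]  # candidate scales 0.25, 0.5, ..., 16
theta_star = min(thetas, key=lambda th: objective(th, X, y))
print(theta_star, objective(theta_star, X, y))
```

Rescaling the data changes the objective only by an additive constant (doubling y adds log 4), so θ* is unaffected by the units of f.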
  35. Cheng and Sandu Function

    $f(x) = \cos(x_1 + x_2) \exp(x_1 x_2)$
    Figure panels (plots not reproduced in this transcript): error with Matérn kernel and $\theta = 1$; error with modified Matérn kernel and optimized $\theta$; tolerance $\varepsilon = 0.05$.
    Bingham, D. & Surjanovic, S. Virtual Library of Simulation Experiments. 2013. https://www.sfu.ca/~ssurjano/. Cheng, H. & Sandu, A. Collocation least-squares polynomial chaos method. in Proceedings of the 2010 Spring Simulation Multiconference, Society for Computer Simulation International. (2010).
  38. What Are the Right Ingredients for Adaptive Function Approximation?

    A fixed budget homogeneous approximation, $\mathrm{APP} : \mathcal{X}^n \times \mathbb{R}^n \to L^\infty(\mathcal{X})$, with an error bound, e.g., linear splines, RKHS approximation.
    An unbounded, non-convex candidate set, $\mathcal{C}$, for which the error bound can be bounded in a data-driven way; what you see is almost what you get.
    Necessary conditions for $f$ to lie in $\mathcal{C}$; will not have sufficient conditions.
    A rich enough candidate set from which the right approximation can be inferred; attention to underfitting and overfitting.
    More work is needed on:
    What makes a good initial sample.
    Balancing the richness of the candidate set with overfitting.
    Numerical instability and computational effort challenges for larger numbers of data sites.
  39. Thank you. These slides are available at speakerdeck.com/fjhickernell/right-ingredients-for-adaptive-function-approximation

  40. References

    Choi, S.-C. T., Ding, Y., H., F. J. & Tong, X. Local Adaption for Approximation and Minimization of Univariate Functions. J. Complexity 40, 17–33 (2017).
    Fasshauer, G. E. Meshfree Approximation Methods with MATLAB. (World Scientific Publishing Co., Singapore, 2007).
    Fasshauer, G. E. & McCourt, M. Kernel-based Approximation Methods using MATLAB. (World Scientific Publishing Co., Singapore, 2015).
    Bingham, D. & Surjanovic, S. Virtual Library of Simulation Experiments. 2013. https://www.sfu.ca/~ssurjano/.
    Cheng, H. & Sandu, A. Collocation least-squares polynomial chaos method. in Proceedings of the 2010 Spring Simulation Multiconference, Society for Computer Simulation International. (2010).