Simon Mak²
¹Department of Applied Mathematics, Illinois Institute of Technology, [email protected]
²School of Industrial and Systems Engineering, Georgia Institute of Technology
Supported by NSF-DMS-1522687 and DMS-1638521 (SAMSI)
SAMSI-QMC Transition Workshop, May 7, 2018
Thanks to ...
- SAMSI, for sponsoring this program in Quasi-Monte Carlo Methods and High Dimensional Sampling for Applied Mathematics; especially Ilse Ipsen, Karem Jackson, Sue McDonald, Rick Scoggins, Thomas Gehrmann, Richard Smith, and David Banks
- Frances Kuo, Pierre L'Ecuyer, and Art Owen, fellow program leaders; especially Art, who keeps trying to push QMC out of its box
- Many of you with whom I have had fruitful discussions, especially WGs 2, 4, and 5
- Mac Hyman, who promoted QMC to SAMSI behind the scenes, introduced me to problems with expensive function values, and co-led WG 5
- Henryk Woźniakowski, whose work on tractability has inspired some of what I will say
- Kai-Tai Fang, who introduced me to experimental design
Approximating Functions When Function Values Are Expensive
- Interested in $f : [-1,1]^d \to \mathbb{R}$, e.g., the result of a climate model or a financial calculation; $d$ is dozens or a few hundred
- $\$(f)$ = cost to evaluate $f(\mathbf{x})$ for any $\mathbf{x} \in [-1,1]^d$ = hours or days or \$1M
- Want to construct a surrogate model, $f_{\mathrm{app}} \approx f$, with $\$(f_{\mathrm{app}}) = \$0.000001$, so that we may quickly explore (plot, integrate, optimize, search for sharp gradients of) $f$
- $f_{\mathrm{app}}$ is constructed using $n$ pieces of information about $f$
- Want $\|f - f_{\mathrm{app}}\|_\infty \le \varepsilon$ for $n = O(d^p \varepsilon^{-q})$ as $d \uparrow \infty$ or $\varepsilon \downarrow 0$ (with small $p$ and $q$)
- Assume $\$(f) \gg n^r$ for any practical $n$ and any positive $r$, so the cost of the algorithm is $O(\$(f)\,n)$
Approximation by Series Coefficients

$$f(\mathbf{x}) = \sum_{\mathbf{j}\in\mathbb{N}_0^d} \hat f(\mathbf{j})\,\phi_{\mathbf{j}}(\mathbf{x}), \qquad \hat f(\mathbf{j}) = \frac{\langle f,\phi_{\mathbf{j}}\rangle}{\langle \phi_{\mathbf{j}},\phi_{\mathbf{j}}\rangle}, \qquad \|\phi_{\mathbf{j}}\|_\infty = 1$$

Suppose that we may observe the series coefficients $\hat f(\mathbf{j})$ at a cost of \$1M each. (Eventually we want to consider the case of observing function values.)

For any vector of non-negative constants, $\boldsymbol\gamma = (\gamma_{\mathbf{j}})_{\mathbf{j}\in\mathbb{N}_0^d}$, define the norm

$$\|f\|_{q,\boldsymbol\gamma} := \left\| \left( \frac{\hat f(\mathbf{j})}{\gamma_{\mathbf{j}}} \right)_{\mathbf{j}\in\mathbb{N}_0^d} \right\|_q, \qquad 0/0 = 0, \qquad \gamma_{\mathbf{j}} = 0 \ \&\ \|f\|_{\infty,\boldsymbol\gamma} < \infty \implies \hat f(\mathbf{j}) = 0$$

Order the wavenumbers $\mathbf{j}$ such that $\gamma_{\mathbf{j}_1} \ge \gamma_{\mathbf{j}_2} \ge \cdots$. The optimal approximation (why?) to $f$ given the choice of $n$ series coefficients is

$$f_{\mathrm{app}}(\mathbf{x}) = \sum_{i=1}^n \hat f(\mathbf{j}_i)\phi_{\mathbf{j}_i}, \qquad \|f - f_{\mathrm{app}}\|_\infty = \left\| \sum_{i=n+1}^\infty \hat f(\mathbf{j}_i)\phi_{\mathbf{j}_i} \right\|_\infty \underset{\text{loose}}{\le} \|\hat f - \hat f_{\mathrm{app}}\|_1 \underset{\text{tight, optimal}}{\le} \|f\|_{\infty,\boldsymbol\gamma} \sum_{i=n+1}^\infty \gamma_{\mathbf{j}_i}$$
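The truncation and its weighted tail-sum bound can be spot-checked numerically. This is an illustrative sketch only: the weights and coefficients below are synthetic stand-ins, not a real series expansion from the talk.

```python
import numpy as np

# Illustrative sketch: keep the n coefficients with the largest weights
# gamma_{j_i} and bound the l1 coefficient error by the weighted tail sum.
rng = np.random.default_rng(0)
gamma = np.sort(rng.uniform(size=200))[::-1]      # gamma_{j_1} >= gamma_{j_2} >= ...
fhat = gamma * rng.uniform(-1.0, 1.0, size=200)   # so |fhat(j)| / gamma_j stays bounded

norm_inf_gamma = np.max(np.abs(fhat) / gamma)     # ||f||_{inf, gamma}
n = 20
l1_tail = np.abs(fhat[n:]).sum()                  # ||fhat - fhat_app||_1
bound = norm_inf_gamma * gamma[n:].sum()          # ||f||_{inf,gamma} * tail sum of gamma
assert l1_tail <= bound                           # the bound from the slide holds
```

Since each $|\hat f(\mathbf{j}_i)| \le \|f\|_{\infty,\boldsymbol\gamma}\,\gamma_{\mathbf{j}_i}$ by definition of the norm, the assertion holds term by term.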
Recap

$$f(\mathbf{x}) = \sum_{\mathbf{j}\in\mathbb{N}_0^d} \hat f(\mathbf{j})\,\phi_{\mathbf{j}}(\mathbf{x}), \qquad \hat f(\mathbf{j}) = \frac{\langle f,\phi_{\mathbf{j}}\rangle}{\langle \phi_{\mathbf{j}},\phi_{\mathbf{j}}\rangle}, \qquad \|\phi_{\mathbf{j}}\|_\infty = 1, \qquad \|f\|_{q,\boldsymbol\gamma} = \bigl\|\bigl(\hat f(\mathbf{j})/\gamma_{\mathbf{j}}\bigr)_{\mathbf{j}\in\mathbb{N}_0^d}\bigr\|_q \quad (\text{dependence of } f \text{ on } d \text{ is hidden})$$

$$\gamma_{\mathbf{j}_1} \ge \gamma_{\mathbf{j}_2} \ge \cdots, \qquad f_{\mathrm{app}}(\mathbf{x}) = \sum_{i=1}^n \hat f(\mathbf{j}_i)\phi_{\mathbf{j}_i}, \qquad \|f - f_{\mathrm{app}}\|_\infty \le \|\hat f - \hat f_{\mathrm{app}}\|_1 \le \|f\|_{\infty,\boldsymbol\gamma} \sum_{i=n+1}^\infty \gamma_{\mathbf{j}_i} \le \frac{\|f\|_{\infty,\boldsymbol\gamma}\,\|\boldsymbol\gamma\|_{1/q}}{(q-1)(n-1)^{q-1}} \overset{\text{want}}{\le} \varepsilon$$

So $n = O\bigl(\bigl(\|f\|_{\infty,\boldsymbol\gamma}\|\boldsymbol\gamma\|_{1/q}/\varepsilon\bigr)^{1/(q-1)}\bigr)$ is sufficient; to succeed with $n = O(d^p)$, we need $\|\boldsymbol\gamma\|_{1/q} = O(d^p)$.

What remains? (Assume that the function is nice enough to allow this inference.)
- How do we infer $\boldsymbol\gamma$ in practice? Tradition fixes something convenient.
- How do we infer a bound on $\|f\|_{\infty,\boldsymbol\gamma}$?
- How do we approximate using function values, not series coefficients?

(Novak & Woźniakowski 2008; Kühn, Sickel & Ullrich 2014)
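The last inequality in the recap chain, which bounds the tail sum of the sorted weights by the $1/q$-"norm", can be spot-checked numerically; the geometric weights below are an illustrative choice, not from the talk.

```python
import numpy as np

# Numeric spot-check of the tail bound:
#   sum_{i > n} gamma_{j_i} <= ||gamma||_{1/q} / ((q - 1) (n - 1)^(q - 1)),
# where ||gamma||_{1/q} = (sum_i gamma_{j_i}^(1/q))^q and the gamma_{j_i}
# are sorted in decreasing order.
q = 2.0
gamma = 0.9 ** np.arange(500)                 # gamma_{j_1} >= gamma_{j_2} >= ...
norm_1q = (gamma ** (1.0 / q)).sum() ** q     # ||gamma||_{1/q}

for n in (10, 50, 200):
    tail = gamma[n:].sum()                    # sum_{i = n+1}^{inf} gamma_{j_i} (truncated)
    bound = norm_1q / ((q - 1.0) * (n - 1) ** (q - 1.0))
    assert tail <= bound
```

The bound follows because monotonicity gives $\gamma_{\mathbf{j}_i} \le \|\boldsymbol\gamma\|_{1/q}\, i^{-q}$, and the tail of $\sum i^{-q}$ is at most $(n-1)^{1-q}/(q-1)$.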
Main New Ideas

It is assumed that $f$ is nice enough to justify the following:

- Inferring $\boldsymbol\gamma$: Assume a structure informed by experimental design principles. Infer coordinate importance from a pilot sample with wavenumbers
  $$J := \{(0, \ldots, 0, j, 0, \ldots, 0) : j = 0, \ldots, n_0\} = \{j\mathbf{e}_k : j = 0, \ldots, n_0,\ k = 1, \ldots, d\}$$
- Inferring $\|f\|_{\infty,\boldsymbol\gamma}$: Iteratively add the wavenumber with the largest $\gamma_{\mathbf{j}}$ to $J$. Inflate the norm that is observed so far and assume $\|f\|_{\infty,\boldsymbol\gamma} \le C \|(\hat f(\mathbf{j}))_{\mathbf{j}\in J}\|_{\infty,\boldsymbol\gamma}$
- Function values: Let the new wavenumber, $\mathbf{j}$, pick the next design point via a shifted van der Corput sequence. Use interpolation to estimate $(\hat f(\mathbf{j}))_{\mathbf{j}\in J}$.
Product, Order, and Smoothness Dependent (POSD) Weights

$$f(\mathbf{x}) = \sum_{\mathbf{j}\in\mathbb{N}_0^d} \hat f(\mathbf{j})\,\phi_{\mathbf{j}}(\mathbf{x}), \qquad \hat f(\mathbf{j}) = \frac{\langle f,\phi_{\mathbf{j}}\rangle}{\langle \phi_{\mathbf{j}},\phi_{\mathbf{j}}\rangle}, \qquad \|\phi_{\mathbf{j}}\|_\infty = 1, \qquad \|f\|_{q,\boldsymbol\gamma} = \bigl\|\bigl(\hat f(\mathbf{j})/\gamma_{\mathbf{j}}\bigr)_{\mathbf{j}\in\mathbb{N}_0^d}\bigr\|_q$$

Experimental design assumes
- Effect sparsity: only a small number of effects are important
- Effect hierarchy: lower-order effects are more important than higher-order effects
- Effect heredity: an interaction is active only if both parent effects are also active
- Effect smoothness: coarse horizontal scales are more important than fine horizontal scales

Consider product, order, and smoothness dependent (POSD) weights:

$$\gamma_{\mathbf{j}} = \Gamma_{\|\mathbf{j}\|_0} \prod_{\substack{\ell=1 \\ j_\ell > 0}}^{d} w_\ell s_{j_\ell}, \qquad \Gamma_0 = s_1 = 1, \qquad w_\ell = \text{coordinate importance}, \quad \Gamma_r = \text{order size}, \quad s_j = \text{smoothness degree}$$

These weights give

$$\sum_{\mathbf{j}\in\mathbb{N}_0^d} \gamma_{\mathbf{j}}^{1/q} = \sum_{\mathfrak{u}\subseteq 1{:}d} \Gamma_{|\mathfrak{u}|}^{1/q} \prod_{\ell\in\mathfrak{u}} w_\ell^{1/q} \left( \sum_{j=1}^\infty s_j^{1/q} \right)^{|\mathfrak{u}|} = O(d^p) \implies \|f - f_{\mathrm{app}}\|_\infty \le \varepsilon \text{ for } n = O(d^p) \text{ if } \|f\|_{\infty,\boldsymbol\gamma} < \infty$$

Wu, C. F. J. & Hamada, M. Experiments: Planning, Analysis, and Parameter Design Optimization (John Wiley & Sons, Inc., New York, 2000).
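The POSD weight formula is straightforward to compute for any single wavenumber. The sketch below assumes illustrative decay choices for $\Gamma$, $w$, and $s$; none of these values come from the talk.

```python
# Minimal sketch of POSD weights:
#   gamma_j = Gamma_{||j||_0} * prod over active coordinates of w_l * s_{j_l}.
def posd_weight(j, Gamma, w, s):
    """j: wavenumber (tuple of non-negative ints); Gamma[r] = order size;
    w[l] = coordinate importance; s[k] = smoothness degree (s[1] == 1)."""
    active = [l for l, jl in enumerate(j) if jl > 0]
    g = Gamma[len(active)]            # order-dependent factor Gamma_{||j||_0}
    for l in active:
        g *= w[l] * s[j[l]]           # product over active coordinates
    return g

d = 4
Gamma = [1.0, 1.0, 0.5, 0.1, 0.01]    # Gamma_0 = 1; higher orders shrink
w = [0.9 ** l for l in range(d)]      # decaying coordinate importance
s = [0.0, 1.0, 0.5, 0.25, 0.125]      # s_1 = 1, then geometric decay (s[0] unused)

print(posd_weight((0, 0, 0, 0), Gamma, w, s))   # = Gamma_0 = 1
print(posd_weight((1, 0, 2, 0), Gamma, w, s))   # two active coordinates
```

Effect hierarchy is encoded by $\Gamma_r$ decreasing in $r$, sparsity by decaying $w_\ell$, and smoothness by decaying $s_j$.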
Algorithm When Both $\boldsymbol\gamma$ and $\|f\|_{\infty,\boldsymbol\gamma}$ Are Inferred

Require:
- $\boldsymbol\Gamma$ = vector of order sizes
- $\mathbf{s}$ = vector of smoothness degrees
- $w^* = \max_k w_k$
- $n_0$ = minimum number of wavenumbers in each coordinate
- $C$ = inflation factor
- $\hat f$ = a black-box series coefficient generator for the function of interest, $f$, where $\|f\|_{\infty,\boldsymbol\gamma} \le C \|(\hat f(\mathbf{j}))_{\mathbf{j}\in J}\|_{\infty,\boldsymbol\gamma}$, $J := \{(0, \ldots, 0, j, 0, \ldots, 0) : j = 0, \ldots, n_0\}$, for all $\boldsymbol\gamma$
- $\varepsilon$ = positive absolute error tolerance

Ensure: $\|f - f_{\mathrm{app}}\|_\infty \le \varepsilon$

1: Evaluate $\hat f(\mathbf{j})$ for $\mathbf{j} \in J$
2: Define $\mathbf{w} = \min \operatorname{argmin}_{\mathbf{w} \le w^*} \|(\hat f(\mathbf{j}))_{\mathbf{j}\in J}\|_{\infty,\boldsymbol\gamma}$
3: Let $n = \min\bigl\{n' : \sum_{i=n'+1}^\infty \gamma_{\mathbf{j}_i} \le \varepsilon \big/ \bigl(C \|(\hat f(\mathbf{j}))_{\mathbf{j}\in J}\|_{\infty,\boldsymbol\gamma}\bigr)\bigr\}$
4: Compute $f_{\mathrm{app}} = \sum_{i=1}^{n} \hat f(\mathbf{j}_i)\phi_{\mathbf{j}_i}$

Computational cost is $n = O\bigl(\bigl(\varepsilon^{-1} C \|f\|_{\infty,\boldsymbol\gamma} \|\boldsymbol\gamma\|_{1/q}\bigr)^{1/(q-1)}\bigr)$
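A hedged Python sketch of the stopping rule in steps 1 through 4, using a hypothetical coefficient oracle and weights that are already sorted in decreasing order; the pilot-based inference of $\mathbf{w}$ (step 2) is omitted for brevity.

```python
import numpy as np

# Sketch of the coefficient-based algorithm: inflate the pilot norm by C,
# then keep wavenumbers until the weighted tail sum drops below epsilon.
def approximate(coef_oracle, gamma_sorted, wavenumbers, n_pilot, C, eps):
    # Step 1: evaluate coefficients on the pilot set J
    fhat_pilot = np.array([coef_oracle(j) for j in wavenumbers[:n_pilot]])
    # Inflated estimate of ||f||_{inf,gamma} from the pilot sample
    norm_est = C * np.max(np.abs(fhat_pilot) / gamma_sorted[:n_pilot])
    # Step 3: smallest n with sum_{i > n} gamma_{j_i} <= eps / norm_est
    tails = np.cumsum(gamma_sorted[::-1])[::-1]   # tails[i] = sum_{k >= i} gamma_k
    n = int(np.searchsorted(-tails, -eps / norm_est))
    # Step 4: return the retained (wavenumber, coefficient) pairs
    return [(j, coef_oracle(j)) for j in wavenumbers[:n]]

gamma = 0.5 ** np.arange(30)                      # toy decreasing weights
terms = approximate(lambda j: (-0.5) ** j, gamma, list(range(30)),
                    n_pilot=5, C=2.0, eps=1e-3)
```

With these toy inputs the oracle satisfies $|\hat f(j)|/\gamma_j = 1$, so `norm_est` is exactly $C$ and the stopping index is determined by the geometric tail.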
Algorithm Using Function Values When Both $\boldsymbol\gamma$ and $\|f\|_{\infty,\boldsymbol\gamma}$ Are Inferred

Require:
- $\boldsymbol\Gamma$ = vector of order sizes
- $\mathbf{s}$ = vector of smoothness degrees
- $w^* = \max_k w_k$
- $n_0$ = minimum number of wavenumbers in each coordinate
- $C$ = inflation factor
- $f$ = a black-box function value generator
- $\varepsilon$ = positive absolute error tolerance

Ensure: $\|f - f_{\mathrm{app}}\|_\infty \le \varepsilon$

1: Approximate $\hat f(\mathbf{j})$ for $\mathbf{j} \in J := \{(0, \ldots, 0, j, 0, \ldots, 0) : j = 1, \ldots, n_0\}$ by interpolating the function data $\{(\mathbf{x}_{\mathbf{j}}, f(\mathbf{x}_{\mathbf{j}})) : \mathbf{x}_{\mathbf{j}} = (\psi(t_{j_1}), \ldots, \psi(t_{j_d})),\ \mathbf{j} \in J\}$
2: Define $\mathbf{w} = \min \operatorname{argmin}_{\mathbf{w} \le w^*} \|(\hat f(\mathbf{j}))_{\mathbf{j}\in J}\|_{\infty,\boldsymbol\gamma}$
3: while $C \|(\hat f(\mathbf{j}))_{\mathbf{j}\in J}\|_{\infty,\boldsymbol\gamma} \sum_{\mathbf{j}\notin J} \gamma_{\mathbf{j}} > \varepsilon$ do
4: &nbsp;&nbsp; Add $\operatorname{argmax}_{\mathbf{j}\notin J} \gamma_{\mathbf{j}}$ (the wavenumber with the largest remaining weight) to $J$
5: &nbsp;&nbsp; Approximate $\hat f(\mathbf{j})$ for $\mathbf{j} \in J$ by interpolating the function data $\{(\mathbf{x}_{\mathbf{j}}, f(\mathbf{x}_{\mathbf{j}})) : \mathbf{x}_{\mathbf{j}} = (\psi(t_{j_1}), \ldots, \psi(t_{j_d})),\ \mathbf{j} \in J\}$
6: end while
7: Compute $f_{\mathrm{app}} = \sum_{\mathbf{j}\in J} \hat f(\mathbf{j})\phi_{\mathbf{j}}$
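The design points $\mathbf{x}_{\mathbf{j}} = (\psi(t_{j_1}), \ldots, \psi(t_{j_d}))$ in step 1 can be sketched as follows. The base-2 radical inverse is the standard van der Corput construction; the shift value and the cosine map $\psi$ into $[-1,1]$ are illustrative assumptions, not the choices used in the talk.

```python
import math

def van_der_corput(j, base=2):
    """Radical inverse of j in the given base, a point in [0, 1)."""
    t, denom = 0.0, 1.0
    while j > 0:
        j, rem = divmod(j, base)
        denom *= base
        t += rem / denom
    return t

def psi(t, shift=0.5):
    """Map a shifted point in [0, 1) to [-1, 1]; the cosine map and the
    shift 0.5 are assumptions for illustration."""
    return math.cos(math.pi * ((t + shift) % 1.0))

# One coordinate's worth of design points for wavenumber indices 0..7
points = [psi(van_der_corput(j)) for j in range(8)]
assert all(-1.0 <= x <= 1.0 for x in points)
```

Successive van der Corput points fill $[0,1)$ evenly, which is what lets each newly added wavenumber pick a well-spread new design point.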
Summary
- Functions must be nice to succeed with few function values
- Ideas underlying experimental design and tractability show us how to define "nice": effect sparsity, hierarchy, heredity, and smoothness; product, order, and smoothness dependent (POSD) weighted function spaces
- Infer properties of $f$ from limited data ($\boldsymbol\gamma$, $\|f\|_{\infty,\boldsymbol\gamma}$, $\hat f$)
- Must assume some structure on the weights to make progress at all
- The design is determined by the wavenumbers included in the approximation via van der Corput, which preserves a low condition number of the design matrix
- There is a gap in theory when sampling function values versus series coefficients: the sample size seems to be larger than necessary
- Can we also infer the smoothness weights?
References
- Novak, E. & Woźniakowski, H. Tractability of Multivariate Problems Volume I: Linear Information. EMS Tracts in Mathematics 6 (European Mathematical Society, Zürich, 2008).
- Kühn, T., Sickel, W. & Ullrich, T. Approximation numbers of Sobolev embeddings—Sharp constants and tractability. J. Complexity 30, 95–116 (2014).
- Wu, C. F. J. & Hamada, M. Experiments: Planning, Analysis, and Parameter Design Optimization (John Wiley & Sons, Inc., New York, 2000).
- Bingham, D. & Surjanovic, S. Virtual Library of Simulation Experiments (2013). https://www.sfu.ca/~ssurjano/.
Appendix: In What Sense Is This Optimal?

$$f(\mathbf{x}) = \sum_{\mathbf{j}\in\mathbb{N}_0^d} \hat f(\mathbf{j})\,\phi_{\mathbf{j}}(\mathbf{x}), \qquad \hat f(\mathbf{j}) = \frac{\langle f,\phi_{\mathbf{j}}\rangle}{\langle \phi_{\mathbf{j}},\phi_{\mathbf{j}}\rangle}, \qquad \|\phi_{\mathbf{j}}\|_\infty = 1, \qquad \|f\|_{q,\boldsymbol\gamma} = \bigl\|\bigl(|\hat f(\mathbf{j})|/\gamma_{\mathbf{j}}\bigr)_{\mathbf{j}\in\mathbb{N}_0^d}\bigr\|_q$$

$$\gamma_{\mathbf{j}_1} \ge \gamma_{\mathbf{j}_2} \ge \cdots, \qquad f_{\mathrm{app}}(\mathbf{x}) = \sum_{i=1}^n \hat f(\mathbf{j}_i)\phi_{\mathbf{j}_i}, \qquad \|f - f_{\mathrm{app}}\|_\infty \underset{\text{loose}}{\le} \|\hat f - \hat f_{\mathrm{app}}\|_1 \underset{\text{tight, optimal}}{\le} \|f\|_{\infty,\boldsymbol\gamma} \sum_{i=n+1}^\infty \gamma_{\mathbf{j}_i}$$

For any other approximation, $g$, based on the series coefficients $\{\hat f(\mathbf{j})\}_{\mathbf{j}\in J}$ with $|J| = n$,

$$\sup_{\substack{h :\ \|\hat h\|_{\infty,\boldsymbol\gamma} = \|f\|_{\infty,\boldsymbol\gamma} \\ \hat h(\mathbf{j}) = \hat f(\mathbf{j})\ \forall \mathbf{j}\in J}} \|\hat h - \hat g\|_1 = \bigl\|\bigl(\hat f(\mathbf{j}) - \hat g(\mathbf{j})\bigr)_{\mathbf{j}\in J}\bigr\|_1 + \sup_{h :\ \|\hat h\|_{\infty,\boldsymbol\gamma} = \|f\|_{\infty,\boldsymbol\gamma}} \bigl\|\bigl(\hat h(\mathbf{j}) - \hat g(\mathbf{j})\bigr)_{\mathbf{j}\notin J}\bigr\|_1 \ge \sup_{h :\ \|\hat h\|_{\infty,\boldsymbol\gamma} = \|f\|_{\infty,\boldsymbol\gamma}} \bigl\|\bigl(\hat h(\mathbf{j})\bigr)_{\mathbf{j}\notin J}\bigr\|_1 = \|f\|_{\infty,\boldsymbol\gamma} \sum_{\mathbf{j}\notin J} \gamma_{\mathbf{j}} \ge \|f\|_{\infty,\boldsymbol\gamma} \sum_{i=n+1}^\infty \gamma_{\mathbf{j}_i}$$
Appendix: Tail Sum of $\boldsymbol\gamma$

The term $\sum_{i=n+1}^\infty \gamma_{\mathbf{j}_i} = \sum_{i=1}^\infty \gamma_{\mathbf{j}_i} - \sum_{i=1}^n \gamma_{\mathbf{j}_i}$ appears in the error bound. For certain $\boldsymbol\gamma$ of PSD (product and smoothness dependent) form, we can compute the first sum on the right:

$$\sum_{\mathbf{j}\in\mathbb{N}_0^d} \gamma_{\mathbf{j}} = \sum_{\mathfrak{u}\subseteq 1{:}d} \prod_{\ell\in\mathfrak{u}} w_\ell \left( \sum_{j=1}^\infty s_j \right)^{|\mathfrak{u}|} = \prod_{\ell=1}^d (1 + w_\ell s_{\mathrm{sum}}), \qquad s_{\mathrm{sum}} = \sum_{j=1}^\infty s_j$$
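The closed-form product can be verified against the subset sum directly for a small $d$; the weight values and $s_{\mathrm{sum}}$ below are illustrative.

```python
import itertools
import math

# Check that sum over all subsets u of prod_{l in u} w_l * s_sum^{|u|}
# equals prod_l (1 + w_l * s_sum), for illustrative values of w and s_sum.
d = 5
w = [0.8, 0.4, 0.2, 0.1, 0.05]
s_sum = 1.75                          # stands in for sum_{j >= 1} s_j

lhs = sum(
    (s_sum ** len(u)) * math.prod(w[l] for l in u)
    for r in range(d + 1)
    for u in itertools.combinations(range(d), r)
)
rhs = 1.0
for wl in w:
    rhs *= 1.0 + wl * s_sum           # factored product form
assert abs(lhs - rhs) < 1e-9
```

The identity is just the expansion of the product: each factor contributes either $1$ (coordinate inactive) or $w_\ell s_{\mathrm{sum}}$ (coordinate active), and the subsets $\mathfrak{u}$ enumerate the choices.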