
# SAMSI-QMC WG 5-3 Research Problem

Ongoing work with Simon Mak for SAMSI-QMC. March 07, 2018

## Transcript

1. ### Work in Progress: Function Approximation When Function Values Are Expensive

Fred J. Hickernell, Department of Applied Mathematics, Illinois Institute of Technology

[email protected], mypages.iit.edu/~hickernell

Supported by NSF-DMS-1522687 and DMS-1638521 (SAMSI)

Working Group V.3, March 7, 2018
2. ### Prologue

Sections: Background, Approx. by Fourier, Known γ, Inferred γ, Approx. by Function Values, References

May 7–9 we have our SAMSI-QMC Transitions Workshop, where we should report on our progress. These slides summarize ongoing work by Simon Mak and me. Comments are welcome.
3. ### Approximating Functions When Function Values Are Expensive

- Interested in some $f : \Omega \to \mathbb{R}$, where $\Omega \subseteq \mathbb{R}^d$, e.g., the result of a climate model or a financial calculation
- $d$ is a dozen, or dozens, or a few hundred
- $\$(f)$ = cost to evaluate $f(x)$ for any $x \in \Omega$ = hours or days or \$1M
- Want to construct a surrogate model, $f_{\mathrm{app}}$, with $f_{\mathrm{app}} \approx f$, such that $\$(f_{\mathrm{app}}) = \$0.000001$, so that we may quickly explore (plot, integrate, optimize, search for sharp gradients of) $f$
- $f_{\mathrm{app}}$ is constructed using $n$ pieces of information about $f$, such as values of $f$ or Fourier coefficients of $f$
- Want $\|f - f_{\mathrm{app}}\|_\infty \le \varepsilon$ for $n = O(d\,\varepsilon^{-q})$ as $d \uparrow \infty$ or $\varepsilon \downarrow 0$
- Assume $\$(f) \gg n^r$ for any practical $n$ and any positive $r$
4. ### Functions Expressed as Series

Let $\mathcal{F}$ be a vector space of functions $f : [a, b]^d \to \mathbb{R}$ that have $L^2([a, b]^d, \varrho)$ orthogonal series expansions:

$$f(x) = \sum_{j \in \mathbb{N}_0^d} \hat{f}(j)\,\phi_j(x), \qquad \phi_j(x) = \phi_{j_1}(x_1) \cdots \phi_{j_d}(x_d),$$

$$\hat{f}(j) = \langle f, \phi_j \rangle = \int_{[a,b]^d} f(x)\,\phi_j(x)\,\varrho(x)\,dx.$$

- Legendre polynomials: $\int_{-1}^{1} \phi_j(x)\,\phi_k(x)\,dx = \delta_{j,k}$
- Chebyshev polynomials: $\phi_j(x) = p_j \cos(j \arccos(x))$, $\int_{-1}^{1} \frac{\phi_j(x)\,\phi_k(x)}{\sqrt{1 - x^2}}\,dx = \delta_{j,k}$
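5. ### Approximation by Fourier Coefficients

$$f(x) = \sum_{j \in \mathbb{N}_0^d} \hat{f}(j)\,\phi_j(x), \qquad \hat{f}(j) = \langle f, \phi_j \rangle, \qquad p_j = \|\phi_j\|_\infty$$

Suppose that we may observe the Fourier coefficients $\hat{f}(j)$ at a cost of \$1M each. (Eventually we want to consider the case of observing function values.)

For any vector of non-negative constants, $\gamma = (\gamma_j)_{j \in \mathbb{N}_0^d}$, define the family of quasi-norms on $\mathcal{F}$:

$$\|\hat{f}\|_{q,\gamma} = \left\| \left( \frac{|\hat{f}(j)|\,p_j}{\gamma_j} \right)_{j \in \mathbb{N}_0^d} \right\|_q, \qquad 0/0 = 0, \qquad \gamma_j = 0 \ \&\ \|\hat{f}\|_{\infty,\gamma} < \infty \implies \hat{f}(j) = 0.$$

Order the wavenumbers $j$ such that $\gamma_{j_1} \ge \gamma_{j_2} \ge \cdots$. The optimal approximation to $f$ given $n$ Fourier coefficients chosen optimally is

$$f_{\mathrm{app}}(x) = \sum_{i=1}^{n} \hat{f}(j_i)\,\phi_{j_i}(x),$$

$$\|f - f_{\mathrm{app}}\|_\infty = \left\| \sum_{i=n+1}^{\infty} \hat{f}(j_i)\,\phi_{j_i} \right\|_\infty \le \|\hat{f} - \hat{f}_{\mathrm{app}}\|_{1,1} \overset{\text{tight}}{\le} \|\hat{f}\|_{\infty,\gamma} \sum_{i=n+1}^{\infty} \gamma_{j_i}.$$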
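6. ### In What Sense Is This Optimal?

Recall $f_{\mathrm{app}}(x) = \sum_{i=1}^{n} \hat{f}(j_i)\,\phi_{j_i}(x)$ with $\gamma_{j_1} \ge \gamma_{j_2} \ge \cdots$ and

$$\|f - f_{\mathrm{app}}\|_\infty \le \|\hat{f} - \hat{f}_{\mathrm{app}}\|_{1,1} \overset{\text{tight}}{\le} \|\hat{f}\|_{\infty,\gamma} \sum_{i=n+1}^{\infty} \gamma_{j_i}.$$

For any other approximation, $g$, based on $n$ Fourier coefficients, $\hat{f}(j)$ with $j \in J$ and $J$ having cardinality $n$, the error splits as

$$\|\hat{f} - \hat{g}\|_{1,1} = \sum_{j \in J} |\hat{f}(j) - \hat{g}(j)|\,p_j + \sum_{j \notin J} |\hat{f}(j) - \hat{g}(j)|\,p_j.$$

The slide compares this with $\|\hat{f} + \hat{g}\|_{\infty,\gamma} \sum_{j \notin J} \gamma_j$ and with $\|\hat{f}\|_{\infty,\gamma} \sum_{i=n+1}^{\infty} \gamma_{j_i}$. Since $j_1, \ldots, j_n$ carry the $n$ largest weights, $\sum_{j \notin J} \gamma_j \ge \sum_{i=n+1}^{\infty} \gamma_{j_i}$ for every such $J$.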
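7. ### How Quickly Does Error Decay?

Recall $\|f - f_{\mathrm{app}}\|_\infty \le \|\hat{f} - \hat{f}_{\mathrm{app}}\|_{1,1} \le \|\hat{f}\|_{\infty,\gamma} \sum_{i=n+1}^{\infty} \gamma_{j_i}$, with $\gamma_{j_1} \ge \gamma_{j_2} \ge \cdots$.

A trick that is often used ($q > 0$): because $\gamma_{j_{n+1}}$ is no larger than any of the first $n$ ordered weights,

$$\gamma_{j_{n+1}} \le \left[ \frac{1}{n} \left( \gamma_{j_1}^{1/q} + \cdots + \gamma_{j_n}^{1/q} \right) \right]^q \le \frac{1}{n^q}\,\|\gamma\|_{1/q}, \qquad \|\gamma\|_{1/q} = \left[ \sum_{j \in \mathbb{N}_0^d} \gamma_j^{1/q} \right]^q.$$

Summing the tail (this step needs $q > 1$):

$$\sum_{i=n+1}^{\infty} \gamma_{j_i} \le \|\gamma\|_{1/q} \sum_{i=n}^{\infty} \frac{1}{i^q} \le \frac{\|\gamma\|_{1/q}}{(q-1)(n-1)^{q-1}}.$$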
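The tail bound above can be checked numerically. The following is a minimal sketch, assuming a hypothetical weight sequence $\gamma_{j_i} = i^{-3}$ truncated at 2000 terms and $q = 2$; the function name `tail_and_bound` is illustrative and not from the slides.

```python
def tail_and_bound(gamma_sorted, n, q):
    """Exact tail sum after the first n weights, and the bound
    ||gamma||_{1/q} / ((q - 1) * (n - 1)**(q - 1)); requires q > 1 and n >= 2."""
    tail = sum(gamma_sorted[n:])
    norm = sum(g ** (1.0 / q) for g in gamma_sorted) ** q
    bound = norm / ((q - 1.0) * (n - 1.0) ** (q - 1.0))
    return tail, bound

# Hypothetical weights gamma_{j_i} = i**-3, already non-increasing, truncated at 2000 terms.
gamma = [i ** -3.0 for i in range(1, 2001)]
q = 2.0
for n in (5, 20, 100):
    tail, bound = tail_and_bound(gamma, n, q)
    print(f"n={n:4d}  tail={tail:.3e}  bound={bound:.3e}")
```

For weights that decay this quickly the bound is far from sharp; it is the rate in $n$ and the dependence on $\|\gamma\|_{1/q}$ that matter for the tractability argument.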
8. ### Recap

$$f(x) = \sum_{j \in \mathbb{N}_0^d} \hat{f}(j)\,\phi_j(x), \qquad \hat{f}(j) = \langle f, \phi_j \rangle, \qquad p_j = \|\phi_j\|_\infty, \qquad \|\hat{f}\|_{q,\gamma} = \left\| \left( \frac{|\hat{f}(j)|\,p_j}{\gamma_j} \right)_{j \in \mathbb{N}_0^d} \right\|_q$$

The dependence of $\|\hat{f}\|_{q,\gamma}$ on $d$ is hidden.

With $\gamma_{j_1} \ge \gamma_{j_2} \ge \cdots$ and $f_{\mathrm{app}}(x) = \sum_{i=1}^{n} \hat{f}(j_i)\,\phi_{j_i}(x)$,

$$\|f - f_{\mathrm{app}}\|_\infty \le \|\hat{f} - \hat{f}_{\mathrm{app}}\|_{1,1} \le \|\hat{f}\|_{\infty,\gamma} \sum_{i=n+1}^{\infty} \gamma_{j_i} \le \frac{\|\hat{f}\|_{\infty,\gamma}\,\|\gamma\|_{1/q}}{(q-1)(n-1)^{q-1}}.$$

We want the right-hand side to be at most $\varepsilon$, for which

$$n = O\!\left( \left[ \frac{\|\hat{f}\|_{\infty,\gamma}\,\|\gamma\|_{1/q}}{\varepsilon} \right]^{1/(q-1)} \right)$$

is sufficient. To succeed with $n = O(d)$, we need $\|\gamma\|_{1/q} = O(d^{q-1})$.

See Novak & Woźniakowski (2008); Kühn et al. (2014).
9. ### Product, Order, and Smoothness Dependent Weights

$$\sum_{j \in \mathbb{N}_0^d} \gamma_j^{1/q} = O\!\left(d^{(q-1)/q}\right) \implies \|f - f_{\mathrm{app}}\|_\infty \le \varepsilon \ \text{ for } n = O(d) \ \text{ if } \|\hat{f}\|_{\infty,\gamma} < \infty$$

Experimental design assumes (Wu & Hamada, 2000):

- Effect sparsity: Only a small number of effects are important
- Effect hierarchy: Lower-order effects are more important than higher-order effects
- Effect heredity: Interaction is active only if both parent effects are also active
- Effect smoothness: Coarse horizontal scales are more important than fine horizontal scales

Consider product, order, and smoothness dependent weights:

$$\gamma_j = \Gamma_{\|j\|_0} \prod_{\substack{\ell = 1 \\ j_\ell > 0}}^{d} w_\ell\, s_{j_\ell}, \qquad \Gamma_0 = w_0 = s_0 = 1,$$

where $w_\ell$ = coordinate importance, $\Gamma_r$ = order size, and $s_j$ = smoothness degree.
10. ### Product, Order, and Smoothness Dependent Weights

With the same effect sparsity, hierarchy, heredity, and smoothness assumptions, consider product, order, and smoothness dependent weights:

$$\gamma_j = \Gamma_{\|j\|_0} \prod_{\substack{\ell = 1 \\ j_\ell > 0}}^{d} w_\ell\, s_{j_\ell}, \qquad \Gamma_0 = w_0 = s_0 = 1,$$

where $w_\ell$ = coordinate importance, $\Gamma_r$ = order size, and $s_j$ = smoothness degree. Grouping the wavenumbers by their set $u$ of active coordinates gives

$$\sum_{j \in \mathbb{N}_0^d} \gamma_j^{1/q} = \sum_{u \subseteq 1:d} \left[ \Gamma_{|u|}^{1/q} \prod_{\ell \in u} w_\ell^{1/q} \left( \sum_{j=1}^{\infty} s_j^{1/q} \right)^{|u|} \right].$$

If this sum is $O\!\left(d^{(q-1)/q}\right)$, then $\|f - f_{\mathrm{app}}\|_\infty \le \varepsilon$ for $n = O(d)$ if $\|\hat{f}\|_{\infty,\gamma} < \infty$.
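The subset form of the sum can be checked by brute force for a small case. This is a sketch under stated assumptions: the product in $\gamma_j$ is taken over active coordinates ($j_\ell > 0$) only, the smoothness weights are truncated after `m` terms, and all numeric values and function names are hypothetical.

```python
from itertools import combinations, product
from math import isclose, prod

def gamma(j, Gamma, w, s):
    """gamma_j = Gamma_{||j||_0} * product over active coordinates of w_l * s_{j_l}."""
    active = [l for l, jl in enumerate(j) if jl > 0]
    return Gamma[len(active)] * prod(w[l] * s[j[l]] for l in active)

def direct_sum(Gamma, w, s, q):
    """Sum of gamma_j**(1/q) over all wavenumbers j in {0, ..., m}^d."""
    d, m = len(w), len(s) - 1
    return sum(gamma(j, Gamma, w, s) ** (1 / q)
               for j in product(range(m + 1), repeat=d))

def subset_sum(Gamma, w, s, q):
    """The same quantity, grouped by the set u of active coordinates."""
    d = len(w)
    S = sum(sj ** (1 / q) for sj in s[1:])  # sum over j >= 1 of s_j**(1/q)
    total = 0.0
    for r in range(d + 1):
        for u in combinations(range(d), r):
            total += Gamma[r] ** (1 / q) * prod(w[l] ** (1 / q) for l in u) * S ** r
    return total

# Hypothetical small case: d = 3, smoothness weights truncated at m = 6.
Gamma = [1.0, 0.5, 0.25, 0.125]
w = [1.0, 0.5, 0.25]
s = [1.0] + [j ** -4.0 for j in range(1, 7)]
q = 2.0
print(direct_sum(Gamma, w, s, q), subset_sum(Gamma, w, s, q))
```

The two printed values should agree up to rounding, since the second function only regroups the terms of the first.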
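11. ### Special Cases of Weights

$$\sum_{j \in \mathbb{N}_0^d} \gamma_j^{1/q} = \sum_{u \subseteq 1:d} \left[ \Gamma_{|u|}^{1/q} \prod_{\ell \in u} w_\ell^{1/q} \left( \sum_{j=1}^{\infty} s_j^{1/q} \right)^{|u|} \right]$$

We want this to be $O\!\left(d^{(q-1)/q}\right)$.

- $\Gamma_r = w_\ell = 1$: $\displaystyle \sum_{j \in \mathbb{N}_0^d} \gamma_j^{1/q} = \left[ \sum_{j=0}^{\infty} s_j^{1/q} \right]^d$. **Fail**
- $w_\ell = \Gamma_1 = 1$, $\Gamma_r = 0 \ \forall r > 1$: $\displaystyle \sum_{j \in \mathbb{N}_0^d} \gamma_j^{1/q} = 1 + d \sum_{j=1}^{\infty} s_j^{1/q}$. **Near Success**
- $\Gamma_r = 1$: $\displaystyle \sum_{j \in \mathbb{N}_0^d} \gamma_j^{1/q} \le \exp\!\left( \sum_{k=1}^{\infty} w_k^{1/q} \sum_{j=1}^{\infty} s_j^{1/q} \right)$. **Success**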
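The bound in the last case ($\Gamma_r = 1$) follows from the subset form of the sum and the elementary inequality $1 + x \le e^x$. A short derivation, with the shorthand $S$ introduced here for convenience:

```latex
\begin{align*}
\sum_{j \in \mathbb{N}_0^d} \gamma_j^{1/q}
  &= \sum_{u \subseteq 1:d} \prod_{\ell \in u} \bigl( w_\ell^{1/q} S \bigr),
     \qquad S := \sum_{j=1}^{\infty} s_j^{1/q}, \\
  &= \prod_{\ell=1}^{d} \bigl( 1 + w_\ell^{1/q} S \bigr) \\
  &\le \prod_{\ell=1}^{d} \exp\bigl( w_\ell^{1/q} S \bigr)
   = \exp\Bigl( S \sum_{\ell=1}^{d} w_\ell^{1/q} \Bigr)
   \le \exp\Bigl( \sum_{k=1}^{\infty} w_k^{1/q} \sum_{j=1}^{\infty} s_j^{1/q} \Bigr).
\end{align*}
```

The right-hand side does not depend on $d$ whenever $\sum_{k} w_k^{1/q}$ and $S$ are both finite.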
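12. ### Algorithm When Both γ and $\|\hat{f}\|_{\infty,\gamma}$ Are Known

**Require:**

- $\gamma$ = vector of weights with ordering $\gamma_{j_1} \ge \gamma_{j_2} \ge \cdots$
- $\hat{f}$ = black-box Fourier coefficient generator
- $\|\hat{f}\|_{\infty,\gamma}$ = norm of the Fourier coefficients
- $\varepsilon$ = positive absolute error tolerance

**Ensure:** $\|f - f_{\mathrm{app}}\|_\infty \le \varepsilon$

- 1: Let $n = \min \left\{ n' : \sum_{i=n'+1}^{\infty} \gamma_{j_i} \le \dfrac{\varepsilon}{\|\hat{f}\|_{\infty,\gamma}} \right\}$
- 2: Compute $f_{\mathrm{app}} = \sum_{i=1}^{n} \hat{f}(j_i)\,\phi_{j_i}$

Computational cost is $n = O\!\left( \left[ \varepsilon^{-1}\,\|\hat{f}\|_{\infty,\gamma}\,\|\gamma\|_{1/q} \right]^{1/(q-1)} \right)$; $\gamma$ determines the design.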
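The two steps can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the infinite weight sequence is replaced by a finite sorted list, and `choose_n`, `build_approximant`, and the one-dimensional example (unnormalized Chebyshev basis with $p_j = 1$, $\gamma_j = \hat{f}(j) = 2^{-j}$) are all hypothetical.

```python
from math import acos, cos

def choose_n(gamma_sorted, norm_f, eps):
    """Smallest n whose weight tail sum(gamma_sorted[n:]) is at most eps / norm_f."""
    if norm_f == 0:
        return 0
    # suffix[n] = sum(gamma_sorted[n:]), accumulated from the small end for accuracy
    suffix = [0.0] * (len(gamma_sorted) + 1)
    for i in range(len(gamma_sorted) - 1, -1, -1):
        suffix[i] = suffix[i + 1] + gamma_sorted[i]
    return next(n for n, tail in enumerate(suffix) if tail <= eps / norm_f)

def build_approximant(coef, basis, wavenumbers, n):
    """Evaluate the black box coef(j) at the first n wavenumbers and return f_app."""
    kept = [(j, coef(j)) for j in wavenumbers[:n]]  # n expensive calls
    return lambda x: sum(c * basis(j, x) for j, c in kept)

# Hypothetical 1-D example: p_j = 1, gamma_j = 2**-j, f_hat(j) = 2**-j, so the norm is 1.
wavenumbers = list(range(60))
gamma = [2.0 ** -j for j in wavenumbers]
coef = lambda j: 2.0 ** -j
basis = lambda j, x: cos(j * acos(x))
n = choose_n(gamma, norm_f=1.0, eps=1e-3)
f_app = build_approximant(coef, basis, wavenumbers, n)
print(n)  # 11, since 2**-10 <= 1e-3 < 2**-9
```

The design (which coefficients to request) depends only on $\gamma$, $\varepsilon$, and the norm, so `choose_n` never calls the black box.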
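13. ### Algorithm When γ Is Known and $\|\hat{f}\|_{\infty,\gamma}$ Is Inferred

**Require:**

- $\gamma$ = vector of weights with ordering $\gamma_{j_1} \ge \gamma_{j_2} \ge \cdots$
- $n_0$ = minimum number of wavenumbers
- $C$ = inflation factor
- $\hat{f}$ = black-box Fourier coefficient generator for the function of interest, $f$, where $\|\hat{f}\|_{\infty,\gamma} \le C\,\bigl\| \bigl(\hat{f}(j_i)\bigr)_{i=1}^{n_0} \bigr\|_{\infty,\gamma}$
- $\varepsilon$ = positive absolute error tolerance

**Ensure:** $\|f - f_{\mathrm{app}}\|_\infty \le \varepsilon$

- 1: Evaluate $\hat{f}(j_1), \ldots, \hat{f}(j_{n_0})$
- 2: Let $n = \min \left\{ n' > n_0 : \sum_{i=n'+1}^{\infty} \gamma_{j_i} \le \dfrac{\varepsilon}{C\,\bigl\| \bigl(\hat{f}(j_i)\bigr)_{i=1}^{n_0} \bigr\|_{\infty,\gamma}} \right\}$
- 3: Compute $f_{\mathrm{app}} = \sum_{i=1}^{n} \hat{f}(j_i)\,\phi_{j_i}$

Computational cost is $n = O\!\left( \left[ \varepsilon^{-1}\,C\,\|\hat{f}\|_{\infty,\gamma}\,\|\gamma\|_{1/q} \right]^{1/(q-1)} \right)$; $\gamma$ determines the design.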
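A sketch of how the pilot sample replaces the known norm, assuming a finite sorted weight list stands in for the infinite sequence. The names `pilot_norm` and `choose_n_inferred` and the numeric example are hypothetical.

```python
def pilot_norm(coefs, p, gamma):
    """(infinity, gamma)-norm of the pilot coefficients: max |c| * p / gamma.
    Entries with gamma = 0 are skipped; the 0/0 = 0 convention covers them."""
    return max((abs(c) * pj / g for c, pj, g in zip(coefs, p, gamma) if g > 0),
               default=0.0)

def choose_n_inferred(gamma_sorted, pilot_coefs, p_pilot, C, eps):
    """Smallest n > n0 with C * (pilot norm) * sum(gamma_sorted[n:]) <= eps."""
    n0 = len(pilot_coefs)
    inflated = C * pilot_norm(pilot_coefs, p_pilot, gamma_sorted[:n0])
    # suffix[n] = sum(gamma_sorted[n:])
    suffix = [0.0] * (len(gamma_sorted) + 1)
    for i in range(len(gamma_sorted) - 1, -1, -1):
        suffix[i] = suffix[i + 1] + gamma_sorted[i]
    for n in range(n0 + 1, len(gamma_sorted) + 1):
        if inflated * suffix[n] <= eps:
            return n
    return len(gamma_sorted)  # the truncated weight list was too short

# Hypothetical example: gamma_{j_i} = 2**-i, pilot coefficients 0.5 * gamma, p = 1.
gamma = [2.0 ** -i for i in range(60)]
pilot = [0.5 * g for g in gamma[:4]]  # n0 = 4 evaluations of the black box
n = choose_n_inferred(gamma, pilot, [1.0] * 4, C=2.0, eps=1e-3)
print(n)  # 11: the inflated norm is 2 * 0.5 = 1 and 2**-10 <= 1e-3 < 2**-9
```

The guarantee holds only for functions satisfying the cone condition $\|\hat{f}\|_{\infty,\gamma} \le C\,\|(\hat{f}(j_i))_{i=1}^{n_0}\|_{\infty,\gamma}$; the code cannot verify that condition from the pilot data alone.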
14. ### Algorithm When Both γ and $\|\hat{f}\|_{\infty,\gamma}$ Are Inferred

The order in which the Fourier coefficients are sampled is determined by $\gamma$, but in practice the relative sizes of the Fourier coefficients are not known, and thus $\gamma$ should be inferred. As a first step we try

$$\gamma_j = \Gamma_{\|j\|_0} \prod_{\substack{\ell = 1 \\ j_\ell > 0}}^{d} w_\ell\, s_{j_\ell}, \qquad \Gamma_0 = w_0 = s_0 = 1,$$

where $w_\ell$ = coordinate importance, $\Gamma_r$ = order size, and $s_j$ = smoothness degree, with the $\Gamma_r$ and $s_j$ fixed but the $w_\ell$ inferred. We want to infer the relative importance of the different coordinates.
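15. ### Algorithm When Both γ and $\|\hat{f}\|_{\infty,\gamma}$ Are Inferred

**Require:**

- $\Gamma$ = vector of order sizes
- $s$ = vector of smoothness degrees
- $w^* = \max_k w_k$
- $n_0$ = minimum number of wavenumbers in each coordinate
- $C$ = inflation factor
- $\hat{f}$ = a black-box Fourier coefficient generator for the function of interest, $f$, where $\|\hat{f}\|_{\infty,\gamma} \le C\,\bigl\| \bigl(\hat{f}(j)\bigr)_{j \in J} \bigr\|_{\infty,\gamma}$ for all $\gamma$, with $J := \{(0, \ldots, 0, j, 0, \ldots, 0) : j = 0, \ldots, n_0\}$
- $\varepsilon$ = positive absolute error tolerance

**Ensure:** $\|f - f_{\mathrm{app}}\|_\infty \le \varepsilon$

- 1: Evaluate $\hat{f}(j)$ for $j \in J$
- 2: Define $w = \min \operatorname*{argmin}_{w \le w^*} \bigl\| \bigl(\hat{f}(j)\bigr)_{j \in J} \bigr\|_{\infty,\gamma}$
- 3: Let $n = \min \left\{ n' : \sum_{i=n'+1}^{\infty} \gamma_{j_i} \le \dfrac{\varepsilon}{C\,\bigl\| \bigl(\hat{f}(j)\bigr)_{j \in J} \bigr\|_{\infty,\gamma}} \right\}$
- 4: Compute $f_{\mathrm{app}} = \sum_{i=1}^{n} \hat{f}(j_i)\,\phi_{j_i}$

Computational cost is $n = O\!\left( \left[ \varepsilon^{-1}\,C\,\|\hat{f}\|_{\infty,\gamma}\,\|\gamma\|_{1/q} \right]^{1/(q-1)} \right)$.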
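One possible reading of the weight-inference step is sketched below. It assumes that the constraint is componentwise ($w_k \le w^*$ for every $k$), that "min argmin" means the componentwise smallest minimizer, that $p_j = 1$, and that for an axis wavenumber $j e_k$ with $j > 0$ the weight is $\gamma = \Gamma_1 w_k s_j$. Under those assumptions the minimizer has a closed form. The function name and the data are hypothetical.

```python
def infer_coordinate_weights(c0, axis_coefs, s, Gamma1, w_star):
    """c0 = f_hat(0); axis_coefs[k][j-1] = f_hat(j * e_k) for j = 1..n0;
    s[j] = smoothness weight with s[0] = 1.  Returns (w, minimal norm)."""
    # a[k]: contribution of coordinate k to the norm when w_k = 1
    a = [max(abs(c) / (Gamma1 * s[j]) for j, c in enumerate(row, start=1))
         for row in axis_coefs]
    # Smallest attainable norm over w <= w_star (every w_k = w_star attains it)
    m = max([abs(c0)] + [ak / w_star for ak in a])
    if m == 0.0:
        return [0.0] * len(a), 0.0
    # Smallest w_k that keeps coordinate k's contribution a[k] / w_k at or below m
    return [ak / m for ak in a], m

# Hypothetical pilot data for d = 3 coordinates and n0 = 2 wavenumbers per coordinate.
s = [1.0, 0.5, 0.25]
axis_coefs = [[0.5, 0.25], [0.05, 0.0125], [0.0, 0.0]]
w, m = infer_coordinate_weights(1.0, axis_coefs, s, Gamma1=1.0, w_star=1.0)
print(w, m)
```

Coordinates whose pilot coefficients are small receive small weights, so wavenumbers involving them are pushed later in the ordering used by the subsequent steps.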
16. ### A Gap Between Theory and Practice

- Theory using Fourier coefficients
- Practice using function values

[Photo; credit: Xinhua]
17. ### A Very Sparse Grid on $[-1, 1]^d$

| $j$ | 0 | 1 | 2 | 3 | 4 | $\cdots$ |
|---|---|---|---|---|---|---|
| van der Corput $t_j$ | 0 | 1/2 | 1/4 | 3/4 | 1/8 | $\cdots$ |
| $\psi(t_j) := 2\,((t_j + 1/3) \bmod 1) - 1$ | −1/3 | 2/3 | 1/6 | −5/6 | −1/12 | $\cdots$ |
| $\psi(t_j) := -\cos(\pi\,((t_j + 1/3) \bmod 1))$ | −0.5 | 0.8660 | 0.2588 | −0.9659 | −0.1305 | $\cdots$ |

To estimate $\hat{f}(j)$, $j \in J$, use the design $\{(\psi(t_{j_1}), \ldots, \psi(t_{j_d})) : j \in J\}$. E.g., for

$$J = \{(0, 0, 0, 0), (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1), (2, 0, 0, 0), (3, 0, 0, 0), (1, 1, 0, 0)\}$$

[Figure: two panels, "Even Points" and "ArcCos Points"]
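The table and the design can be reproduced with a short script. This is a sketch: the function names are illustrative, and exact rational arithmetic is used for the even points so the values can be compared with the table directly.

```python
from fractions import Fraction
from math import cos, pi

def van_der_corput(j):
    """Base-2 radical inverse: reflect the binary digits of j about the binary point."""
    t, denom = Fraction(0), 2
    while j > 0:
        j, digit = divmod(j, 2)
        t += Fraction(digit, denom)
        denom *= 2
    return t

def psi_even(t):
    """Shifted, evenly spread points: 2 * ((t + 1/3) mod 1) - 1."""
    return 2 * ((t + Fraction(1, 3)) % 1) - 1

def psi_arccos(t):
    """Shifted arccos points: -cos(pi * ((t + 1/3) mod 1))."""
    return -cos(pi * float((t + Fraction(1, 3)) % 1))

def design(J, psi):
    """One design point (psi(t_{j_1}), ..., psi(t_{j_d})) per wavenumber j in J."""
    return [tuple(psi(van_der_corput(jl)) for jl in j) for j in J]

J = [(0, 0, 0, 0), (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0),
     (0, 0, 0, 1), (2, 0, 0, 0), (3, 0, 0, 0), (1, 1, 0, 0)]
for point in design(J, psi_even):
    print([str(x) for x in point])
```

Each wavenumber contributes exactly one design point, so the number of function values equals the number of coefficients to be estimated.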
18. ### Algorithm Using Function Values When Both γ and $\|\hat{f}\|_{\infty,\gamma}$ Are Inferred

**Require:**

- $\Gamma$ = vector of order sizes
- $s$ = vector of smoothness degrees
- $w^* = \max_k w_k$
- $n_0$ = minimum number of wavenumbers in each coordinate
- $C$ = inflation factor
- $f$ = a black-box function value generator
- $\varepsilon$ = positive absolute error tolerance

**Ensure:** $\|f - f_{\mathrm{app}}\|_\infty \le \varepsilon$

- 1: Approximate $\hat{f}(j)$ for $j \in J := \{(0, \ldots, 0, j, 0, \ldots, 0) : j = 1, \ldots, n_0\}$ by interpolating the function data $\{(x, f(x)) : x = (\psi(t_{j_1}), \ldots, \psi(t_{j_d})),\ j \in J\}$
- 2: Define $w = \min \operatorname*{argmin}_{w \le w^*} \bigl\| \bigl(\hat{f}(j)\bigr)_{j \in J} \bigr\|_{\infty,\gamma}$
- 3: **while** $C\,\bigl\| \bigl(\hat{f}(j)\bigr)_{j \in J} \bigr\|_{\infty,\gamma} \sum_{j \notin J} \gamma_j > \varepsilon$ **do**
  - 4: Add $\operatorname*{argmax}_{j \notin J} \gamma_j$ (the wavenumber with the next largest weight) to $J$
  - 5: Approximate $\hat{f}(j)$ for $j \in J$ by interpolating the function data $\{(x, f(x)) : x = (\psi(t_{j_1}), \ldots, \psi(t_{j_d})),\ j \in J\}$
- 6: **end while**
- 7: Compute $f_{\mathrm{app}} = \sum_{j \in J} \hat{f}(j)\,\phi_j$
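The interpolation step, in which coefficients are recovered from function values on the sparse design, can be sketched as a small linear solve. The assumptions here are mine: an unnormalized tensor-product Chebyshev basis ($p_j = 1$), the arccos points, and a downward-closed index set $J$ so that the interpolation matrix is invertible. All function names and the test function are hypothetical.

```python
from math import acos, cos, pi

def van_der_corput(j):
    """Base-2 radical inverse of the integer j."""
    t, scale = 0.0, 0.5
    while j > 0:
        j, digit = divmod(j, 2)
        t += digit * scale
        scale *= 0.5
    return t

def psi(t):
    """Shifted arccos points: -cos(pi * ((t + 1/3) mod 1))."""
    return -cos(pi * ((t + 1.0 / 3.0) % 1.0))

def basis(k, x):
    """Unnormalized tensor-product Chebyshev polynomial prod_l cos(k_l * arccos(x_l))."""
    out = 1.0
    for kl, xl in zip(k, x):
        out *= cos(kl * acos(xl))
    return out

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def interpolate_coefficients(f, J):
    """Solve sum_k c_k * basis(k, x_j) = f(x_j) at the design points x_j, j in J."""
    points = [tuple(psi(van_der_corput(jl)) for jl in j) for j in J]
    A = [[basis(k, x) for k in J] for x in points]
    return dict(zip(J, solve(A, [f(x) for x in points])))

J = [(0, 0, 0, 0), (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0),
     (0, 0, 0, 1), (2, 0, 0, 0), (3, 0, 0, 0), (1, 1, 0, 0)]
# Hypothetical test function lying in the span of the basis functions indexed by J
f = lambda x: 1.0 + 2.0 * x[0] + 3.0 * x[0] * x[1] + 0.5 * (2.0 * x[0] ** 2 - 1.0)
coefs = interpolate_coefficients(f, J)
print({j: round(c, 6) for j, c in coefs.items()})
```

For a function outside the span of the chosen basis functions, the recovered values are only approximations of the true coefficients, which is the theory/practice gap the slides refer to.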
19. ### What Needs Attention

- Bridging the theory/practice gap
- Try some examples
- Bookkeeping on next largest $\gamma_j$
- If $|\hat{f}(j)|/\gamma_j$ is observed to be too large, may need to increase $w_k$ for some $k$
- May want to infer $\Gamma$ or $s$
20. ### Thank you

These slides are under continuous development and are available at speakerdeck.com/fjhickernell/samsi-qmc-wg-5-3-research-problem-1
21. ### References

- Novak, E. & Woźniakowski, H. Tractability of Multivariate Problems Volume I: Linear Information. EMS Tracts in Mathematics 6 (European Mathematical Society, Zürich, 2008).
- Kühn, T., Sickel, W. & Ullrich, T. Approximation numbers of Sobolev embeddings—Sharp constants and tractability. J. Complexity 30, 95–116 (2014).
- Wu, C. F. J. & Hamada, M. Experiments: Planning, Analysis, and Parameter Design Optimization. (John Wiley & Sons, Inc., New York, 2000).