# Bayesian Optimization in High Dimensions via Random Embeddings

March 31, 2021

## Transcript

1. ### Bayesian Optimization in High Dimensions via Random Embeddings

   Wang et al., IJCAI, 2013. Presenter: Kazuaki TAKEHARA (Twitter: @_zak3), 2021/03.
2. ### Abstract

   - Bayesian optimization is restricted to problems of moderate dimension.
   - To attack this problem, the REMBO (Random EMbedding Bayesian Optimization) algorithm is introduced.
   - The experiments demonstrate that REMBO can effectively solve high-dimensional problems.
3. ### Introduction

   - A function f : X -> R on a compact set X ⊆ R^D.
   - We treat f as a black box: no closed form, no derivatives, non-convex, etc.
   - We are only allowed to query its function value at any x ∈ X.
   - We address a global optimization problem: x* = argmax_{x ∈ X} f(x).
   - To ensure that a global optimum is found, we require good coverage of X.
   - As the dimensionality D increases, the number of evaluations needed to cover X increases exponentially.

5. ### Bayesian Optimization

   - Bayesian optimization has two ingredients: the prior and the acquisition function.
   - Prior: GP priors are used to construct acquisition functions.
   - Acquisition function: for example, UCB.
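These two ingredients can be sketched in a few lines of NumPy. Everything below (the zero-mean GP, the kernel length scale, κ = 2, and the toy 1-D objective) is an illustrative choice, not the paper's setup:

```python
import numpy as np

def sq_exp_kernel(A, B, length_scale=1.0):
    # Squared exponential (RBF) kernel matrix between the rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale**2)

def gp_posterior(X_obs, y_obs, X_query, noise=1e-6):
    # GP posterior mean and standard deviation under a zero-mean GP prior.
    K = sq_exp_kernel(X_obs, X_obs) + noise * np.eye(len(X_obs))
    Ks = sq_exp_kernel(X_query, X_obs)
    Kss = sq_exp_kernel(X_query, X_query)
    alpha = np.linalg.solve(K, y_obs)
    mu = Ks @ alpha
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)
    return mu, np.sqrt(np.clip(np.diag(cov), 0.0, None))

def ucb(mu, sigma, kappa=2.0):
    # Upper confidence bound: exploit the mean, explore the uncertainty.
    return mu + kappa * sigma

# Toy 1-D maximization: pick the next query point by maximizing UCB on a grid.
rng = np.random.default_rng(0)
f = lambda x: -np.sin(3 * x) - x**2 + 0.7 * x  # hypothetical objective
X_obs = rng.uniform(-1, 2, size=(4, 1))
y_obs = f(X_obs).ravel()
X_grid = np.linspace(-1, 2, 200)[:, None]
mu, sigma = gp_posterior(X_obs, y_obs, X_grid)
x_next = X_grid[np.argmax(ucb(mu, sigma))]
```

In a full loop, x_next would be evaluated, appended to the observations, and the posterior refit.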
6. ### REMBO: Definition 1 (Effective dimensionality)

   - f : R^D -> R.
   - d_e < D : the effective dimensionality.
   - T ⊂ R^D : a linear subspace with dim(T) = d_e.
   - T^⊥ ⊂ R^D : the orthogonal complement of T.
   - If f(x) = f(x_T + x_⊥) = f(x_T) for all x_T ∈ T, x_⊥ ∈ T^⊥, then we call T the effective subspace and T^⊥ the constant subspace.
7. ### REMBO: Theorem 2

   - f : R^D -> R with effective dimensionality d_e, and d ≥ d_e; A = (a_ij) ∈ R^{D×d} is a random matrix with a_ij ~ N(0, 1).
   - Then ∀x ∈ R^D, ∃y ∈ R^d s.t. f(x) = f(Ay).
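Theorem 2 can be checked numerically. In this sketch (all choices hypothetical: D = 20 and a toy f whose effective subspace is span(e_1, e_2)), a suitable y is recovered by inverting the top d_e × d_e block of A:

```python
import numpy as np

rng = np.random.default_rng(3)
D, d_e = 20, 2
d = d_e  # the theorem requires d >= d_e; take the smallest choice

# A toy f whose effective subspace T is span(e_1, e_2): it ignores
# every coordinate except the first two.
def f(x):
    return np.sin(x[0]) + x[1] ** 2

A = rng.standard_normal((D, d))  # a_ij ~ N(0, 1)
x = rng.uniform(-1, 1, size=D)   # an arbitrary point in R^D

# Choose y so that Ay agrees with x on the first two coordinates;
# the 2x2 block A[:d_e] is invertible with probability 1.
y = np.linalg.solve(A[:d_e], x[:d_e])
# Now f(A @ y) equals f(x), as Theorem 2 promises for this f.
```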
8. ### REMBO: Theorem 3

   - f(x*) = f(Ay*) with ||y*||_2 ≤ (√d_e / ε) ||x*_T||_2, with probability at least 1 - ε.
   - (Figure: the REMBO algorithm.)
   - The box constraints X = [-1, 1]^D are always available through rescaling.
   - In all experiments in this paper, y ∈ Y = [-√d, √d]^d.
   - Is this part time-consuming?
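The REMBO loop on this slide can be sketched as follows. Plain random search over Y stands in for the inner Bayesian-optimization step, and D, d, and the objective are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
D, d = 1000, 2  # extrinsic and embedding dimensions (illustrative)

# A toy f with effective dimensionality 2: it depends only on x_0 and x_1.
def f(x):
    return -((x[0] - 0.3) ** 2 + (x[1] + 0.2) ** 2)

A = rng.standard_normal((D, d))  # random embedding, entries ~ N(0, 1)

def f_embedded(y):
    # Map y up to R^D and project back into the box X = [-1, 1]^D
    # before evaluating f.
    x = np.clip(A @ y, -1.0, 1.0)
    return f(x)

# Random search over Y = [-sqrt(d), sqrt(d)]^d; REMBO would instead run
# Bayesian optimization over this low-dimensional box.
bound = np.sqrt(d)
Y = rng.uniform(-bound, bound, size=(500, d))
values = np.array([f_embedded(y) for y in Y])
y_best = Y[np.argmax(values)]
```

The point of the construction is that the search happens in d = 2 dimensions regardless of how large D is.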
9. ### Choice of Kernel

   - Squared exponential kernel on Y ⊆ R^d.
   - Kernel on X:
     - for continuous variables;
     - for categorical variables, where s(·) maps a continuous vector to a discrete vector.
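One way to read the continuous-variable kernel on X: embed y into R^D, project onto the box, and compare the projected points with a squared exponential kernel. A minimal sketch, with illustrative dimensions and length scale:

```python
import numpy as np

rng = np.random.default_rng(1)
D, d = 50, 2
A = rng.standard_normal((D, d))  # random embedding

def project_to_box(x):
    # Convex projection of x onto the box X = [-1, 1]^D is coordinatewise clipping.
    return np.clip(x, -1.0, 1.0)

def k_X(y1, y2, length_scale=1.0):
    # Squared exponential kernel evaluated after embedding and projecting,
    # so distant y's that project to the same boundary point look identical.
    x1 = project_to_box(A @ y1)
    x2 = project_to_box(A @ y2)
    r2 = np.sum((x1 - x2) ** 2)
    return np.exp(-0.5 * r2 / length_scale**2)
```

A categorical-variable version would apply the slide's discretization map s(·) after the projection before comparing.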
10. ### Experiments: Bayesian Optimization in a Billion Dimensions

    - The function whose optimum we seek is f(x_{1:D}) = g(x_i, x_j).
    - d_e = 2.
    - The indices i and j are selected once using a random permutation.
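That setup can be mimicked directly. Here D is scaled down from a billion for the sketch, and g is a hypothetical 2-D stand-in for the paper's test function:

```python
import numpy as np

rng = np.random.default_rng(2)
D = 10**6  # scaled down from a billion for this sketch

# Choose the two effective coordinates once via a random permutation,
# as in the paper's experimental setup.
i, j = rng.permutation(D)[:2]

def g(u, v):
    # Hypothetical 2-D objective standing in for the paper's test function.
    return (u - 0.5) ** 2 + (v + 0.25) ** 2

def f(x):
    # f formally lives in R^D but has effective dimensionality d_e = 2:
    # every coordinate other than i and j is ignored.
    return g(x[i], x[j])
```

Because only x_i and x_j matter, the random embedding lets REMBO search a 2-D box instead of R^D.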