
Bayesian Optimization in High Dimensions via Random Embeddings

Kazuaki Takehara

March 31, 2021

Transcript

  1. Bayesian Optimization in High Dimensions via Random Embeddings. Wang et al., IJCAI, 2013. Kazuaki TAKEHARA, Twitter: @_zak3, 2021/03
  2. Abstract
    - Bayesian optimization is typically restricted to problems of moderate dimension.
    - To attack this problem, the REMBO (Random EMbedding Bayesian Optimization) algorithm is introduced.
    - Experiments demonstrate that REMBO can effectively solve high-dimensional problems.
  3. Introduction
    - A function f : X → R on a compact set X ⊆ R^D.
    - We treat f as a black box: no closed form, no derivatives, possibly non-convex, etc.
    - We are only allowed to query its function value at any x ∈ X.
    - We address the global optimization problem x* = argmax_{x ∈ X} f(x).
    - To ensure that a global optimum is found, we need good coverage of X.
    - As the dimensionality D increases, the number of evaluations needed to cover X grows exponentially.
  4. Introduction
    - Detecting "low effective dimensionality" is the key.

  5. Bayesian Optimization
    - Bayesian optimization has two ingredients: the prior and the acquisition function.
    - Prior: GP (Gaussian process) priors are used to construct acquisition functions.
    - Acquisition function: for example, UCB (upper confidence bound).
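The two ingredients above can be sketched in a few lines of NumPy: a GP posterior under a squared-exponential kernel, and a UCB acquisition built from its mean and variance. This is a minimal illustration, not the paper's implementation; the lengthscale, noise, and beta values are arbitrary choices for the sketch.

```python
import numpy as np

def se_kernel(X1, X2, lengthscale=0.5):
    """Squared-exponential kernel matrix between two sets of points."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(X_obs, y_obs, X_query, noise=1e-6):
    """GP posterior mean and variance at X_query, given observations."""
    K = se_kernel(X_obs, X_obs) + noise * np.eye(len(X_obs))
    Ks = se_kernel(X_obs, X_query)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.diag(se_kernel(X_query, X_query)) - (v**2).sum(0)
    return mu, np.maximum(var, 0.0)

def ucb(X_obs, y_obs, X_query, beta=2.0):
    """UCB acquisition: posterior mean plus beta posterior standard deviations."""
    mu, var = gp_posterior(X_obs, y_obs, X_query)
    return mu + beta * np.sqrt(var)
```

In each BO iteration, the next query point is the candidate that maximizes `ucb` over the search space.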
  6. REMBO Definition 1: Effective dimensionality
    - f : R^D → R
    - d_e < D : the effective dimensionality
    - T ⊂ R^D : a linear subspace with dim(T) = d_e
    - T⊥ ⊂ R^D : the orthogonal complement of T
    - If f(x) = f(x_T + x⊥) = f(x_T) for all x_T ∈ T and x⊥ ∈ T⊥, then we call T the effective subspace and T⊥ the constant subspace.
  7. REMBO Theorem 2
    - Let f : R^D → R have effective dimensionality d_e, let d ≥ d_e, and let A = (a_ij) ∈ R^{D×d} be a random matrix with a_ij ~ N(0, 1).
    - Then, with probability 1, for all x ∈ R^D there exists y ∈ R^d such that f(x) = f(Ay).
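Theorem 2 can be checked numerically: when f depends only on its first d_e coordinates (so the effective subspace is spanned by the first d_e axes), a y with f(x) = f(Ay) can be found by solving a small linear system. The quadratic f below is a hypothetical example, not a function from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
D, d_e, d = 1000, 2, 2

# f varies only along the first two coordinates (the effective subspace).
def f(x):
    return -(x[0] - 0.3) ** 2 - (x[1] + 0.2) ** 2

A = rng.standard_normal((D, d))  # random embedding, a_ij ~ N(0, 1)

x = rng.standard_normal(D)       # an arbitrary point in R^D
# Solve for y so that Ay matches x on the effective coordinates;
# the top d_e x d block of A is invertible with probability 1.
y = np.linalg.solve(A[:d_e, :], x[:d_e])

assert np.isclose(f(x), f(A @ y))
```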
  8. REMBO Theorem 3
    - f(x*) = f(Ay*) with ||y*||_2 ≤ (√d_e / ε) ||x_T*||_2, with probability at least 1 − ε.
    REMBO algorithm
    - The box constraints X = [−1, 1]^D are always available through rescaling.
    - In all experiments in this paper, y ∈ Y = [−√d, √d]^d.
    - (Presenter's note: is this part time consuming?)
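A compact sketch of the REMBO loop under these conventions: draw A once, run GP-UCB over Y = [−√d, √d]^d, and map each y into X = [−1, 1]^D by embedding and then projecting onto the box (for a box, the convex projection is coordinate-wise clipping). The objective f, the lengthscale, and the random-candidate maximization of the acquisition are all assumptions of this sketch, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)
D, d, n_iters = 10_000, 2, 30

# Hypothetical objective with effective dimensionality 2:
# only coordinates 0 and 1 matter. Its global maximum value is 0.
def f(x):
    return -(x[0] - 0.5) ** 2 - (x[1] + 0.5) ** 2

A = rng.standard_normal((D, d))  # random embedding, drawn once

def to_X(y):
    """Embed y into R^D, then project onto the box X = [-1, 1]^D."""
    return np.clip(A @ y, -1.0, 1.0)

def se(Y1, Y2, ls=1.0):
    d2 = ((Y1[:, None] - Y2[None]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

bound = np.sqrt(d)                                  # Y = [-sqrt(d), sqrt(d)]^d
Y_obs = rng.uniform(-bound, bound, (2, d))          # initial design
f_obs = np.array([f(to_X(y)) for y in Y_obs])

for _ in range(n_iters):
    K = se(Y_obs, Y_obs) + 1e-6 * np.eye(len(Y_obs))
    cand = rng.uniform(-bound, bound, (256, d))     # random acquisition candidates
    Ks = se(Y_obs, cand)
    mu = Ks.T @ np.linalg.solve(K, f_obs)
    var = 1.0 - np.einsum('ij,ij->j', Ks, np.linalg.solve(K, Ks))
    acq = mu + 2.0 * np.sqrt(np.maximum(var, 0.0))  # GP-UCB
    y_next = cand[acq.argmax()]
    Y_obs = np.vstack([Y_obs, y_next])
    f_obs = np.append(f_obs, f(to_X(y_next)))
```

The GP is fit in the d-dimensional space Y, so its cost is independent of D; only `to_X` touches the D-dimensional representation.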
  9. Choice of Kernel
    - Squared exponential kernel on Y ⊆ R^d
    - Kernel on X
      - For continuous variables
      - For categorical variables: s(·) maps a continuous vector to a discrete vector.
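One way to read "kernel on X" for continuous variables: measure distances between the projected embedded points rather than between the raw y's, so two y's whose embeddings clip to the same region of the box are treated as close. The projection-then-squared-exponential construction below is a sketch of that idea under this reading; the lengthscale is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
D, d = 1000, 2
A = rng.standard_normal((D, d))

def project(y):
    """Embed y, then project onto X = [-1, 1]^D (clipping is the
    convex projection onto a box)."""
    return np.clip(A @ y, -1.0, 1.0)

def kernel_X(y1, y2, lengthscale=1.0):
    """Squared-exponential kernel evaluated between projected points in X."""
    diff = project(y1) - project(y2)
    return np.exp(-0.5 * (diff @ diff) / lengthscale**2)
```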
  10. Experiments: Bayesian Optimization in a Billion Dimensions
    - The function whose optimum we seek is f(x_{1:D}) = g(x_i, x_j), with effective dimensionality d_e = 2.
    - The indices i and j are selected once using a random permutation.
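The experimental construction can be sketched directly: pick two coordinates via a random permutation, then let the high-dimensional f read only those two. The quadratic g below is a stand-in (the paper uses a standard 2-D test function such as Branin), and D is kept smaller than a billion so the sketch runs quickly.

```python
import numpy as np

rng = np.random.default_rng(3)
D = 100_000  # the paper scales this construction up to D = 10^9

# Hypothetical 2-D function whose optimum value is 0 at (0.2, -0.4).
def g(u, v):
    return -(u - 0.2) ** 2 - (v + 0.4) ** 2

# i and j are selected once via a random permutation, then fixed.
perm = rng.permutation(D)
i, j = perm[0], perm[1]

def f(x):
    """High-dimensional objective with effective dimensionality d_e = 2."""
    return g(x[i], x[j])
```

All D − 2 remaining coordinates are constant directions, so the effective subspace assumption of REMBO holds exactly here.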
  11. Experiments
    - Compare REMBO to standard BO.