Application of Continuous-Armed Bandits toward Integrated Exploration of Objectives and Means under Uncertainty / iot70_gp_rff_mab

2025.07.28 The 70th Research Meeting on Internet and Operation Technology (IOT70)
https://www.iot.ipsj.or.jp/meeting/70-program/

monochromegane

July 28, 2025

Transcript

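  1. Yusuke Miyake / Pepabo R&D Institute, GMO Pepabo, Inc. 2025.07.28, The 70th Research Meeting on Internet and Operation Technology (IOT70)

    Application of Continuous-Armed Bandits toward Integrated Exploration of Objectives and Means under Uncertainty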
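  2. Integrated exploration of objectives and means

    [Figure: an objective set and a means set are each passed through an encoder into an integrated space; the predictive distribution and the predictive uncertainty over that space drive a selection, which returns a reward.] Candidates are selected by taking into account the prediction, its uncertainty, and the opportunity loss (continuous-armed bandit).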
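  3. Gaussian process models and continuous-armed bandits

    • A Gaussian process (GP) infers, from data, a distribution over functions in the form of a stochastic process
    • Its expressive power comes from a large number of basis functions $\phi: \mathbb{R}^D \to \mathbb{R}$
    • The kernel method means those basis functions never have to be written out explicitly
    • Example: the polynomial kernel $k(p, q) \triangleq (p^\top q + c)^m$ with $m = 2$ and $p, q \in \mathbb{R}^2$ coincides with the inner product taken after transforming the inputs with six basis functions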
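As a quick check of the six-basis-function claim, here is a minimal NumPy sketch (NumPy and the helper names `poly_kernel` and `poly_features` are illustrative, not from the slides). For $m = 2$ and $p \in \mathbb{R}^2$, expanding $(p^\top q + c)^2$ gives the explicit map $\phi(p) = (p_1^2, p_2^2, \sqrt{2}p_1p_2, \sqrt{2c}\,p_1, \sqrt{2c}\,p_2, c)$, assuming $c \ge 0$.

```python
import numpy as np

def poly_kernel(p, q, c=1.0, m=2):
    """Polynomial kernel k(p, q) = (p^T q + c)^m, evaluated implicitly."""
    return (np.dot(p, q) + c) ** m

def poly_features(p, c=1.0):
    """Explicit six-dimensional feature map for m = 2 and p in R^2 (requires c >= 0)."""
    p1, p2 = p
    return np.array([
        p1 ** 2,
        p2 ** 2,
        np.sqrt(2.0) * p1 * p2,
        np.sqrt(2.0 * c) * p1,
        np.sqrt(2.0 * c) * p2,
        c,
    ])

rng = np.random.default_rng(0)
p, q = rng.normal(size=2), rng.normal(size=2)
c = 0.5
explicit = poly_features(p, c) @ poly_features(q, c)   # inner product of the six features
implicit = poly_kernel(p, q, c)                         # kernel, no features built
print(np.isclose(explicit, implicit))  # True
```

The kernel produces the same number without ever building the features, which is what lets the kernel method work with feature maps that are very large or infinite.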
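  4. Gaussian process models and continuous-armed bandits (continued)

    • Because GPs handle nonlinearity and uncertainty, they are a natural fit for multi-armed bandit problems
    • → However, training involves the kernel matrix $K \in \mathbb{R}^{N \times N}$ with entries $k(x_i, x_j)$ and its inverse $K^{-1}$, so training time grows steeply with the number of training points $N$ (cubically, $O(N^3)$, for the inversion)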
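  5. Gaussian process models and continuous-armed bandits (continued)

    • In Thompson sampling for continuous-armed bandits, function values are drawn from the GP predictive distribution at a candidate set $\{x_*^{(1)}, \dots, x_*^{(M)}\}$, and the point that attains the maximum, $\tilde{x} = \arg\max_{x_*^{(m)}} \tilde{f}(x_*^{(m)})$, becomes the next arm to select
    • → In high-dimensional spaces, the candidate set size $M$ must be large to keep this accurate, which drives up the computational load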
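The candidate-set procedure above can be sketched as follows, assuming an RBF kernel, a fixed noise variance, and NumPy; `thompson_select` and its parameter values are hypothetical, not code from the presentation. The $M \times M$ posterior covariance over the candidates is what makes a large $M$ expensive.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0):
    # Squared-exponential (RBF) kernel matrix between the rows of A and B
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def thompson_select(X, y, candidates, noise=1e-2, lengthscale=1.0, rng=None):
    """Return the candidate that maximizes one joint GP posterior sample."""
    rng = np.random.default_rng() if rng is None else rng
    K = rbf(X, X, lengthscale) + noise * np.eye(len(X))   # N x N: cost grows with N
    Ks = rbf(X, candidates, lengthscale)                   # N x M
    Kss = rbf(candidates, candidates, lengthscale)         # M x M: cost grows with M
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))    # K^{-1} y
    V = np.linalg.solve(L, Ks)
    mean = Ks.T @ alpha                                    # posterior mean at candidates
    cov = Kss - V.T @ V                                    # posterior covariance
    cov = 0.5 * (cov + cov.T) + 1e-6 * np.eye(len(candidates))  # symmetrize, add jitter
    f_tilde = rng.multivariate_normal(mean, cov)           # one function sample at all M points
    return candidates[np.argmax(f_tilde)]

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(30, 2))
y = -np.sum(X**2, axis=1)
candidates = rng.uniform(-3, 3, size=(500, 2))
x_next = thompson_select(X, y, candidates, rng=rng)
```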
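  6. Countermeasure 1: Reducing the computational cost of the kernel matrix

    • Random Fourier features (RFF) [5] approximate a kernel function $k(x_i, x_j)$ as $\hat{k}(x_i, x_j) = z(x_i)^\top z(x_j)$, using $R'$ samples $\omega_1, \dots, \omega_{R'}$ drawn from a probability distribution $p(\omega)$
    • Here $R' = R/2$ and $z(x_i) = \frac{1}{\sqrt{R'}}\left(\cos(\omega_1^\top x_i), \sin(\omega_1^\top x_i), \dots, \cos(\omega_{R'}^\top x_i), \sin(\omega_{R'}^\top x_i)\right)^\top \in \mathbb{R}^R$
    • The distribution $p(\omega)$ is determined by the type of kernel
    • A kernel is an inner product over infinitely many basis functions, $\phi(x_i)^\top \phi(x_j) = k(x_i, x_j) \approx z(x_i)^\top z(x_j)$ with $\phi(x) = (\phi_1(x), \dots, \phi_\infty(x))^\top \in \mathbb{R}^\infty$. For a single kernel evaluation RFF costs more, but it can be viewed as decomposing the combined operation of applying the basis functions and taking the inner product → the number of parameter dimensions can be fixed at $R$
    • The kernel matrix is approximated as $K \simeq ZZ^\top$ with $K \in \mathbb{R}^{N \times N}$ and $Z \in \mathbb{R}^{N \times R}$, so the computation can work with $Z^\top Z \in \mathbb{R}^{R \times R}$ instead, which is far smaller when $N \gg R$

    [5] Miguel Lázaro-Gredilla, Joaquin Quiñonero-Candela, Carl Edward Rasmussen, and Aníbal R. Figueiras-Vidal. Sparse spectrum Gaussian process regression. Journal of Machine Learning Research, Vol. 11, pp. 1865–1881, 2010.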
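A minimal NumPy sketch of the sin/cos RFF map above for the RBF kernel. It assumes $p(\omega) = \mathcal{N}(0, \ell^{-2} I)$ with an illustrative lengthscale $\ell$; the slides do not fix the kernel at this point, and the helper names are hypothetical.

```python
import numpy as np

def rff_features(X, omegas):
    """z(x) = (1/sqrt(R')) [cos(w_1^T x), sin(w_1^T x), ..., cos(w_R'^T x), sin(w_R'^T x)]."""
    proj = X @ omegas.T                        # N x R'
    R_half = omegas.shape[0]
    Z = np.empty((X.shape[0], 2 * R_half))
    Z[:, 0::2] = np.cos(proj)
    Z[:, 1::2] = np.sin(proj)
    return Z / np.sqrt(R_half)                 # N x R, with R = 2R'

def rbf(A, B, lengthscale):
    # Exact RBF kernel matrix for comparison
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

rng = np.random.default_rng(0)
D, N, R_half, ell = 5, 50, 2000, 1.5
X = rng.normal(size=(N, D))
omegas = rng.normal(scale=1.0 / ell, size=(R_half, D))   # p(w) = N(0, ell^-2 I) for RBF
Z = rff_features(X, omegas)
K_exact = rbf(X, X, ell)
K_approx = Z @ Z.T                                        # Z Z^T approximates K
print(np.max(np.abs(K_exact - K_approx)))                 # small; shrinks like 1/sqrt(R')
```

Each entry of $ZZ^\top$ is $\frac{1}{R'}\sum_r \cos(\omega_r^\top(x_i - x_j))$, a Monte Carlo estimate of the kernel, so the diagonal is exactly 1 and the off-diagonal error decays with $R'$.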
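  7. Countermeasure 1: Reducing the computational cost of the kernel matrix (continued)

    [Figure: "GP learning" vs. "GP with RFF learning". In GP learning, each new observation $x_1, x_2, \dots, x_N$ enlarges the matrix $K' \in \mathbb{R}^{N \times N}$ that must be inverted ($K'^{-1}$). In GP with RFF learning, each observation is mapped to features $z$ and the matrix to invert is $(Z^\top Z + \Lambda)^{-1}$ with $Z^\top Z \in \mathbb{R}^{R \times R}$, whose size stays fixed.]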
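To illustrate why the RFF side of the figure only needs an $R \times R$ inverse, this sketch computes the predictive mean both ways. It assumes $K' = \sigma_w^2 ZZ^\top + \sigma_\varepsilon^2 I$ and $\Lambda = (\sigma_\varepsilon^2/\sigma_w^2) I$; the random-phase cosine features, the data, and all parameter values are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
N, D, R = 400, 3, 64
sigma2_w, sigma2_eps = 1.0, 0.1
X = rng.uniform(-3, 3, size=(N, D))
y = np.sin(X).sum(axis=1) + rng.normal(scale=np.sqrt(sigma2_eps), size=N)

# Illustrative RFF map (random-phase cosine variant, RBF kernel with lengthscale 1)
Omega = rng.normal(size=(D, R))
b = rng.uniform(0.0, 2.0 * np.pi, size=R)

def feat(A):
    return np.sqrt(2.0 / R) * np.cos(A @ Omega + b)

Z = feat(X)                                   # N x R
Z_test = feat(rng.uniform(-3, 3, size=(5, D)))

# Function space: solve with the N x N matrix K' = sigma_w^2 Z Z^T + sigma_eps^2 I
K_prime = sigma2_w * Z @ Z.T + sigma2_eps * np.eye(N)
mean_fs = sigma2_w * Z_test @ Z.T @ np.linalg.solve(K_prime, y)

# Weight space: solve with the R x R matrix Z^T Z + Lambda
Lam = (sigma2_eps / sigma2_w) * np.eye(R)
w_mean = np.linalg.solve(Z.T @ Z + Lam, Z.T @ y)
mean_ws = Z_test @ w_mean

print(np.allclose(mean_fs, mean_ws))          # True: same predictions, R x R instead of N x N
```

The two agree because of the push-through identity $(Z^\top Z + \lambda I)^{-1}Z^\top = Z^\top(ZZ^\top + \lambda I)^{-1}$.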
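  8. Countermeasure 2: Avoiding construction of a candidate set

    • A method that evaluates function samples without going through a candidate set [10]
    • As a result of Countermeasure 1, a GP function sample is approximated as $\tilde{f}(x) = z(x)^\top \tilde{w}$
    • $\tilde{w} \sim \mathcal{N}(\mu_w, \Sigma_w)$, and the parameters $\mu_w, \Sigma_w$ do not depend on any candidate set
    • The gradient $\nabla_x \tilde{f}(x)$ can be computed analytically
    • → The next point can be chosen by optimizing over the continuous space, without a candidate set (only one sample of $\tilde{w}$ is drawn per optimization)

    [10] Sattar Vakili, Henry Moss, Artem Artemev, Vincent Dutordoir, and Victor Picheny. Scalable Thompson sampling using sparse Gaussian process models. Advances in Neural Information Processing Systems, Vol. 34, pp. 5631–5643, 2021.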
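A minimal sketch of candidate-free Thompson sampling with RFF, under stated assumptions: the weight posterior is taken to be the standard Bayesian linear regression posterior $\Sigma_w = A^{-1}$, $\mu_w = \sigma_\varepsilon^{-2} A^{-1} Z^\top y$ with $A = \sigma_w^{-2} I + \sigma_\varepsilon^{-2} Z^\top Z$; the features are $z(x) = \sqrt{2/R}\cos(\Omega^\top x + b)$; and the optimizer is plain projected gradient ascent on $[-3, 3]^D$. None of these specifics are taken from [10], and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
D, R, N = 4, 200, 100
sigma2_w, sigma2_eps, lengthscale = 1.0, 0.01, 1.0
Omega = rng.normal(scale=1.0 / lengthscale, size=(D, R))   # columns ~ N(0, lengthscale^-2 I)
b = rng.uniform(0.0, 2.0 * np.pi, size=R)

def z(x):
    return np.sqrt(2.0 / R) * np.cos(Omega.T @ x + b)

def grad_f(x, w):
    # Analytic gradient: d/dx [z(x)^T w] = -sqrt(2/R) * Omega @ (sin(Omega^T x + b) * w)
    return -np.sqrt(2.0 / R) * Omega @ (np.sin(Omega.T @ x + b) * w)

# Observations and the (assumed) Bayesian linear regression posterior over weights
X = rng.uniform(-3, 3, size=(N, D))
y = -np.sum(X**2, axis=1) / D
Z = np.sqrt(2.0 / R) * np.cos(X @ Omega + b)
A = np.eye(R) / sigma2_w + Z.T @ Z / sigma2_eps
Sigma_w = np.linalg.inv(A)
Sigma_w = 0.5 * (Sigma_w + Sigma_w.T)                      # enforce exact symmetry
mu_w = Sigma_w @ Z.T @ y / sigma2_eps

# Draw w~ once, then maximize f~(x) = z(x)^T w~ directly over the box [-3, 3]^D
w_tilde = rng.multivariate_normal(mu_w, Sigma_w)
x0 = rng.uniform(-3, 3, size=D)
x = x0.copy()
for _ in range(200):
    x = np.clip(x + 0.05 * grad_f(x, w_tilde), -3.0, 3.0)
print(x, z(x) @ w_tilde)
```

In practice one would restart from several initial points, since $\tilde{f}$ is non-convex; a single start keeps the sketch short.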
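  9. Countermeasure 3: Maintaining predictive and generalization performance

    • A regularization method [11] that places a log-normal prior on the scale parameter $\sigma_k^2$ to curb the loss of generalization performance as the number of dimensions $D$ grows
    $$\log p(\sigma_k^2) = -\log\sigma_k^2 - \frac{1}{2\sigma_{k0}^2}\left(\log\sigma_k^2 - \mu_{k0} - \frac{1}{2}\log D\right)^2 + \text{const}$$
    • Suppresses excessively large values of $\sigma_k^2$ → serves to prevent overfitting of the model
    • Encourages $\log\sigma_k^2$ to approach $\mu_{k0} + \frac{1}{2}\log D$ → steers the choice of $\sigma_k^2$ toward a value appropriate for the dimensionality $D$
    [Figure: log-likelihood of the prior on $\sigma_k^2$ plotted against $\log\sigma_k^2$ for several values of $D$, marking the minimum of the negative log-likelihood for each $D$.]

    [11] Carl Hvarfner, Erik Orm Hellsten, and Luigi Nardi. Vanilla Bayesian optimization performs great in high dimensions. arXiv preprint arXiv:2402.02229, 2024.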
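The prior's behavior can be checked numerically. This sketch uses an illustrative $\mu_{k0} = 0$ (an assumption) and $\sigma_{k0}^2 = 0.005$; helper names are hypothetical. Setting the derivative to zero puts the prior's mode at $\log\sigma_k^2 = \mu_{k0} + \frac{1}{2}\log D - \sigma_{k0}^2$, so the preferred $\sigma_k^2$ grows like $\sqrt{D}$.

```python
import numpy as np

def log_prior(s2k, D, mu_k0=0.0, s2k0=0.005):
    """Log-normal log-density of sigma_k^2, up to an additive constant (mu_k0 = 0 is illustrative)."""
    u = np.log(s2k)
    return -u - (u - mu_k0 - 0.5 * np.log(D)) ** 2 / (2.0 * s2k0)

def grad_log_prior(s2k, D, mu_k0=0.0, s2k0=0.005):
    """Derivative of log p(sigma_k^2) with respect to sigma_k^2."""
    u = np.log(s2k)
    return -(1.0 + (u - mu_k0 - 0.5 * np.log(D)) / s2k0) / s2k

def prior_mode(D, mu_k0=0.0, s2k0=0.005):
    # Zero of grad_log_prior: log sigma_k^2 = mu_k0 + 0.5 log D - s2k0
    return np.exp(mu_k0 + 0.5 * np.log(D) - s2k0)

for D in (2, 8, 32):
    print(D, prior_mode(D))   # doubles each time D quadruples, i.e. grows like sqrt(D)
```

Twice `grad_log_prior` has the form $-\frac{2}{\sigma_k^2}\left(1 + \frac{1}{\sigma_{k0}^2}(\log\sigma_k^2 - \mu_{k0} - \frac{1}{2}\log D)\right)$, which is the regularization term that enters the $\sigma_k^2$ gradient when the objective carries $2\log p(\sigma_k^2)$.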
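  10. Countermeasure 4: Speeding up hyperparameter estimation

    • In Gaussian process models, how the hyperparameters are estimated strongly affects performance
    • Estimation is done by optimizing the log-likelihood, and this iterative procedure needs the likelihood and its gradient to be evaluated quickly
    • This matters especially in bandit settings, where the model must be updated sequentially
    • In the conventional approach, inverting the $N \times N$ kernel matrix and computing its determinant are the bottleneck
    • This report introduces a structure under which the likelihood and gradient computations are also RFF-based, enabling fast hyperparameter optimization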
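  11. Countermeasure 4: Deriving the log-likelihood with RFF

    • Avoids inverting, and taking the determinant of, the large kernel matrix $K \in \mathbb{R}^{N \times N}$, whose size depends on the number of observations $N$
    • → The likelihood is evaluated efficiently with small matrices whose size depends only on $R$
    • With the RFF approximation $K = \sigma_w^2 ZZ^\top + \sigma_\varepsilon^2 I$:
    $$\log\mathcal{L}(\theta \mid y) \propto -\log|K| - y^\top K^{-1}y + 2\log p(\sigma_k^2)$$
    $$= -N\log\sigma_\varepsilon^2 - \log|B| - \frac{1}{\sigma_\varepsilon^2}\left(\|y\|^2 - \frac{1}{\sigma_\varepsilon^2}\|Z^\top y\|_{A^{-1}}^2\right) - 2\log\sigma_k^2 - \frac{1}{\sigma_{k0}^2}\left(\log\sigma_k^2 - \mu_{k0} - \frac{1}{2}\log D\right)^2 + \text{const}$$
    $$A = \frac{1}{\sigma_w^2}I + \frac{1}{\sigma_\varepsilon^2}Z^\top Z, \qquad B = \sigma_w^2 A = I + \frac{\sigma_w^2}{\sigma_\varepsilon^2}Z^\top Z$$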
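A minimal NumPy check of the identity above, with the prior term omitted since it does not involve $K$. It assumes $K = \sigma_w^2 ZZ^\top + \sigma_\varepsilon^2 I$; function names and data are hypothetical. The RFF version never forms an $N \times N$ matrix; it only factors the $R \times R$ matrices $A$ and $B$.

```python
import numpy as np

def loglik_direct(Z, y, s2w, s2e):
    # -log|K| - y^T K^{-1} y with the N x N matrix K = s2w Z Z^T + s2e I
    K = s2w * Z @ Z.T + s2e * np.eye(len(y))
    _, logdet = np.linalg.slogdet(K)
    return -logdet - y @ np.linalg.solve(K, y)

def loglik_rff(Z, y, s2w, s2e):
    # Same quantity using only the R x R matrices A and B
    N, R = Z.shape
    ZtZ, Zty = Z.T @ Z, Z.T @ y
    A = np.eye(R) / s2w + ZtZ / s2e
    B = s2w * A
    _, logdetB = np.linalg.slogdet(B)
    quad_A = Zty @ np.linalg.solve(A, Zty)            # ||Z^T y||^2_{A^{-1}}
    return -N * np.log(s2e) - logdetB - (y @ y - quad_A / s2e) / s2e

rng = np.random.default_rng(5)
Z = rng.normal(size=(300, 40)) / np.sqrt(40)
y = rng.normal(size=300)
print(loglik_direct(Z, y, 1.0, 0.1), loglik_rff(Z, y, 1.0, 0.1))
```

The determinant step is Sylvester's identity $|\sigma_\varepsilon^2 I_N + \sigma_w^2 ZZ^\top| = \sigma_\varepsilon^{2N}|B|$, and the quadratic step is the Woodbury identity.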
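  12. Countermeasure 4: Deriving the gradients with RFF

    • Gradient of the log-likelihood with respect to each hyperparameter $\theta_i \in \theta = (\sigma_k^2, \sigma_w^2, \sigma_\varepsilon^2)$
    $$\frac{\partial}{\partial\theta_i}\log\mathcal{L}(\theta \mid y) \propto -\operatorname{tr}\left(K^{-1}\frac{\partial K}{\partial\theta_i}\right) + y^\top K^{-1}\frac{\partial K}{\partial\theta_i}K^{-1}y + 2\frac{\partial}{\partial\theta_i}\log p(\sigma_k^2)$$
    • → The first two terms depend on $K \in \mathbb{R}^{N \times N}$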
  13. • ֤ϋΠύʔύϥϝʔλ  ʹର͢Δର਺໬౓ͷޯ഑ θi ∈ θ = (σ2 k

    , σ2 w , σ2 ϵ )  28 ରࡦᶆɿRFFΛ׆༻ͨ͠ޯ഑ͷಋग़ʢ  ʣ σ2 ε ∂ ∂σ2 ε log ℒ(θ ∣ y) = − ( N σ2 ε − σ2 w σ2 ε Tr(B−1Z⊤Z) ) + 1 σ4 ε ( y 2 − 2σ2 w y⊤ZB−1Z⊤y + σ4 w ZB−1Z⊤y 2 ) ∂ ∂θi log ℒ(θ ∣ y) ∝ − tr ( K−1 ∂K ∂θi ) + y⊤K−1 ∂K ∂θi K−1y + ∂ ∂σ2 k log p(σ2 k ) • RFFͷߏ଄Λద༻ͨ͠ϊΠζ෼ࢄ  ͷޯ഑ σ2 ε  ʹґଘ K ∈ ℝN×N ʜ ∂K ∂σ2 ε = I • →  ʹґଘͨ͠খ͞ͳߦྻͷܭࢉͰޯ഑Λޮ཰తʹධՁ R
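The $\sigma_\varepsilon^2$ gradient can be verified against a central finite difference of the direct $N \times N$ objective. A sketch, assuming NumPy, $K = \sigma_w^2 ZZ^\top + \sigma_\varepsilon^2 I$, and $B = I + (\sigma_w^2/\sigma_\varepsilon^2)Z^\top Z$; helper names and data are hypothetical.

```python
import numpy as np

def loglik_direct(Z, y, s2w, s2e):
    # Reference: -log|K| - y^T K^{-1} y with the N x N matrix K = s2w Z Z^T + s2e I
    K = s2w * Z @ Z.T + s2e * np.eye(len(y))
    return -np.linalg.slogdet(K)[1] - y @ np.linalg.solve(K, y)

def grad_noise_rff(Z, y, s2w, s2e):
    """d/d(sigma_eps^2) using only R x R solves with B = I + (s2w / s2e) Z^T Z."""
    N, R = Z.shape
    ZtZ, Zty = Z.T @ Z, Z.T @ y
    B = np.eye(R) + (s2w / s2e) * ZtZ
    Binv_Zty = np.linalg.solve(B, Zty)
    trace_K_inv = N / s2e - (s2w / s2e**2) * np.trace(np.linalg.solve(B, ZtZ))
    ZBZty = Z @ Binv_Zty                                   # Z B^{-1} Z^T y (length N)
    norm_alpha_sq = (y @ y - 2.0 * (s2w / s2e) * (Zty @ Binv_Zty)
                     + (s2w / s2e) ** 2 * (ZBZty @ ZBZty)) / s2e**2
    return -trace_K_inv + norm_alpha_sq                    # -tr(K^{-1}) + ||K^{-1} y||^2

rng = np.random.default_rng(6)
Z = rng.normal(size=(200, 30)) / np.sqrt(30)
y = rng.normal(size=200)
s2w, s2e, h = 1.0, 0.5, 1e-5
g_rff = grad_noise_rff(Z, y, s2w, s2e)
g_fd = (loglik_direct(Z, y, s2w, s2e + h) - loglik_direct(Z, y, s2w, s2e - h)) / (2 * h)
print(g_rff, g_fd)
```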
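  14. Countermeasure 4: Deriving the gradients with RFF ($\sigma_w^2$)

    • Gradient with respect to the weight variance $\sigma_w^2$ after applying the RFF structure. The general form depends on $K \in \mathbb{R}^{N \times N}$, but since $\partial K / \partial\sigma_w^2 = ZZ^\top$ (the noise-free RFF kernel matrix), it becomes, with $B = I + \frac{\sigma_w^2}{\sigma_\varepsilon^2}Z^\top Z$,
    $$\frac{\partial}{\partial\sigma_w^2}\log\mathcal{L}(\theta \mid y) = -\operatorname{Tr}\left(\frac{1}{\sigma_\varepsilon^2}Z^\top Z - \frac{\sigma_w^2}{\sigma_\varepsilon^4}Z^\top Z B^{-1}Z^\top Z\right) + \frac{1}{\sigma_\varepsilon^4}\left(\|Z^\top y\|^2 - \frac{2\sigma_w^2}{\sigma_\varepsilon^2}(Z^\top y)^\top Z^\top Z B^{-1}Z^\top y + \frac{\sigma_w^4}{\sigma_\varepsilon^4}\|Z^\top Z B^{-1}Z^\top y\|^2\right)$$
    • → The gradient is evaluated efficiently with small matrices whose size depends only on $R$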
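The same finite-difference check for the $\sigma_w^2$ gradient, as a sketch under the same assumptions ($K = \sigma_w^2 ZZ^\top + \sigma_\varepsilon^2 I$, NumPy, hypothetical helper names). The trace uses $\operatorname{tr}(K^{-1}ZZ^\top) = \operatorname{Tr}(Z^\top K^{-1}Z)$ and the quadratic term is $\|Z^\top K^{-1}y\|^2$.

```python
import numpy as np

def loglik_direct(Z, y, s2w, s2e):
    # Reference: -log|K| - y^T K^{-1} y with the N x N matrix K = s2w Z Z^T + s2e I
    K = s2w * Z @ Z.T + s2e * np.eye(len(y))
    return -np.linalg.slogdet(K)[1] - y @ np.linalg.solve(K, y)

def grad_weight_rff(Z, y, s2w, s2e):
    """d/d(sigma_w^2) using only R x R matrices; dK/d(sigma_w^2) = Z Z^T."""
    R = Z.shape[1]
    ZtZ, Zty = Z.T @ Z, Z.T @ y
    B = np.eye(R) + (s2w / s2e) * ZtZ
    # tr(Z^T K^{-1} Z) expressed with the R x R matrix B
    trace_term = np.trace(ZtZ / s2e - (s2w / s2e**2) * ZtZ @ np.linalg.solve(B, ZtZ))
    v = ZtZ @ np.linalg.solve(B, Zty)                      # Z^T Z B^{-1} Z^T y (length R)
    quad_term = (Zty @ Zty - 2.0 * (s2w / s2e) * (Zty @ v)
                 + (s2w / s2e) ** 2 * (v @ v)) / s2e**2    # ||Z^T K^{-1} y||^2
    return -trace_term + quad_term

rng = np.random.default_rng(7)
Z = rng.normal(size=(200, 30)) / np.sqrt(30)
y = rng.normal(size=200)
s2w, s2e, h = 0.8, 0.5, 1e-5
g_rff = grad_weight_rff(Z, y, s2w, s2e)
g_fd = (loglik_direct(Z, y, s2w + h, s2e) - loglik_direct(Z, y, s2w - h, s2e)) / (2 * h)
print(g_rff, g_fd)
```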
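  15. Countermeasure 4: Deriving the gradients with RFF ($\sigma_k^2$)

    • For $\sigma_k^2$, the dependence enters through the features themselves: $k(x, x') \approx z(x)^\top z(x')$, $z(x) = \sqrt{\frac{2}{R}}\cos(\Omega^\top x + b)$, $\Omega_{:,r} \sim \mathcal{N}(0, \sigma_k^{-2}I)$, so the sampled $\Omega$ depends on $\sigma_k^2$
    • → Every change of $\sigma_k^2$ during optimization requires resampling $\Omega$, so the optimization result is unstable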
  16. Countermeasure 4: Deriving the gradients with RFF ($\sigma_k^2$, continued)

    • Reparameterization: $k(x, x') \approx z(x)^\top z(x')$, $z(x) = \sqrt{\frac{2}{R}}\cos\left(\frac{1}{\sigma_k}\tilde{\Omega}x + b\right)$, $\tilde{\Omega} \sim \mathcal{N}(0, I)$
    • → The random draw no longer depends on $\sigma_k^2$
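A minimal sketch of the reparameterized features, assuming NumPy and random phases $b_r \sim \mathcal{U}(0, 2\pi)$ (the slide does not state their distribution). Because $\tilde{\Omega}$ is drawn once, $Z$ becomes a deterministic, differentiable function of $\sigma_k$, and $ZZ^\top$ approximates an RBF kernel with lengthscale $\sigma_k$ for every value tried.

```python
import numpy as np

rng = np.random.default_rng(3)
D, R = 3, 4000
Omega_tilde = rng.normal(size=(R, D))             # drawn once, independent of sigma_k
b = rng.uniform(0.0, 2.0 * np.pi, size=R)          # assumed: random phases b ~ U(0, 2*pi)

def features(X, sigma_k):
    # Rows are z(x)^T = sqrt(2/R) cos((1/sigma_k) Omega_tilde x + b)^T for each x in X
    return np.sqrt(2.0 / R) * np.cos(X @ Omega_tilde.T / sigma_k + b)

X = rng.normal(size=(20, D))
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
errors = {}
for sigma_k in (0.5, 1.0, 2.0):
    Z = features(X, sigma_k)
    K_rbf = np.exp(-0.5 * sq / sigma_k**2)        # RBF kernel with lengthscale sigma_k
    errors[sigma_k] = np.max(np.abs(Z @ Z.T - K_rbf))
print(errors)                                      # small for every sigma_k, same Omega_tilde
```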
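  17. Countermeasure 4: Deriving the gradients with RFF ($\sigma_k^2$, continued)

    • Gradient with respect to the scale parameter $\sigma_k^2$ after applying the RFF structure, with $B = I + \frac{\sigma_w^2}{\sigma_\varepsilon^2}Z^\top Z$ and $\partial K / \partial\sigma_k^2 = \sigma_w^2\left(\frac{\partial Z}{\partial\sigma_k^2}Z^\top + Z\left(\frac{\partial Z}{\partial\sigma_k^2}\right)^\top\right)$:
    $$\frac{\partial}{\partial\sigma_k^2}\log\mathcal{L}(\theta \mid y) = -2\sigma_w^2\left(\frac{1}{\sigma_\varepsilon^2}\operatorname{Tr}\left(Z^\top\frac{\partial Z}{\partial\sigma_k^2}\right) - \frac{\sigma_w^2}{\sigma_\varepsilon^4}\operatorname{Tr}\left(Z^\top Z B^{-1}Z^\top\frac{\partial Z}{\partial\sigma_k^2}\right)\right) + 2\sigma_w^2(Z^\top\alpha)^\top\left(\frac{\partial Z}{\partial\sigma_k^2}\right)^\top\alpha - \frac{2}{\sigma_k^2}\left(1 + \frac{1}{\sigma_{k0}^2}\left(\log\sigma_k^2 - \mu_{k0} - \frac{1}{2}\log D\right)\right)$$
    $$\alpha = K^{-1}y = \frac{1}{\sigma_\varepsilon^2}y - \frac{\sigma_w^2}{\sigma_\varepsilon^4}ZB^{-1}Z^\top y, \qquad \frac{\partial Z}{\partial\sigma_k^2} = \sqrt{\frac{2}{R}}\sin\left(\frac{1}{\sigma_k}X\tilde{\Omega}^\top + b\right) \odot \frac{X\tilde{\Omega}^\top}{2\sigma_k^3}$$
    • The last term is the regularization term
    • → The gradient is evaluated efficiently with small matrices whose size depends only on $R$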
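Putting the pieces together, this sketch checks the RFF-based $\sigma_k^2$ gradient (trace term, quadratic term, and regularization term) against a central finite difference of the direct objective $-\log|K| - y^\top K^{-1}y + 2\log p(\sigma_k^2)$. Assumptions: NumPy, $b_r \sim \mathcal{U}(0, 2\pi)$, an illustrative $\mu_{k0} = 0$ and $\sigma_{k0}^2 = 0.005$, and synthetic data; all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
N, D, R = 150, 4, 60
X = rng.uniform(-3, 3, size=(N, D))
y = np.sin(X).sum(axis=1)
Omega_t = rng.normal(size=(R, D))                 # fixed Omega~ (reparameterization)
b = rng.uniform(0.0, 2.0 * np.pi, size=R)         # assumed phase distribution
mu_k0, s2k0 = 0.0, 0.005                          # illustrative prior settings

def Z_of(s2k):
    return np.sqrt(2.0 / R) * np.cos(X @ Omega_t.T / np.sqrt(s2k) + b)

def objective(s2k, s2w, s2e):
    # Direct N x N reference: -log|K| - y^T K^{-1} y + 2 log p(sigma_k^2)
    Z = Z_of(s2k)
    K = s2w * Z @ Z.T + s2e * np.eye(N)
    u = np.log(s2k) - mu_k0 - 0.5 * np.log(D)
    two_log_prior = -2.0 * np.log(s2k) - u**2 / s2k0
    return -np.linalg.slogdet(K)[1] - y @ np.linalg.solve(K, y) + two_log_prior

def grad_s2k_rff(s2k, s2w, s2e):
    Z, sk = Z_of(s2k), np.sqrt(s2k)
    P = X @ Omega_t.T                                          # X Omega~^T (N x R)
    dZ = np.sqrt(2.0 / R) * np.sin(P / sk + b) * P / (2.0 * sk**3)
    ZtZ, Zty, ZtdZ = Z.T @ Z, Z.T @ y, Z.T @ dZ
    B = np.eye(R) + (s2w / s2e) * ZtZ
    alpha = y / s2e - (s2w / s2e**2) * Z @ np.linalg.solve(B, Zty)
    trace_term = -2.0 * s2w * (np.trace(ZtdZ) / s2e
                               - (s2w / s2e**2) * np.trace(ZtZ @ np.linalg.solve(B, ZtdZ)))
    quad_term = 2.0 * s2w * (Z.T @ alpha) @ (dZ.T @ alpha)
    u = np.log(s2k) - mu_k0 - 0.5 * np.log(D)
    reg_term = -(2.0 / s2k) * (1.0 + u / s2k0)
    return trace_term + quad_term + reg_term

s2k, s2w, s2e, h = 1.5, 1.0, 0.1, 1e-5
g_rff = grad_s2k_rff(s2k, s2w, s2e)
g_fd = (objective(s2k + h, s2w, s2e) - objective(s2k - h, s2w, s2e)) / (2 * h)
print(g_rff, g_fd)
```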
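  18. Evaluation goals and problem setup

    • Simulation experiments were run to confirm the effectiveness of the proposed method
    • As a setting in which the effects of high dimensionality appear, a 32-dimensional search space was used
    • The objective function is the shifted sphere function
    • It sums the squared distances from the true optimum in each dimension
    • The search for the maximum reward in the continuous-armed bandit framework is treated as function minimization
    $$f_r(x) = \sum_{i=1}^{D}(x_i - r_i)^2, \quad \text{where } x = (x_1, x_2, \dots, x_D)^\top \in [-3.0, 3.0]^D,\ r = (r_1, r_2, \dots, r_D),\ r_i \sim \mathcal{U}(-3.0, 3.0)$$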
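A sketch of the shifted sphere objective, assuming NumPy; the seed, the helper name, and the convention reward $= -f_r(x)$ are illustrative assumptions.

```python
import numpy as np

D = 32
rng = np.random.default_rng(0)
r = rng.uniform(-3.0, 3.0, size=D)     # hidden optimum, r_i ~ U(-3, 3)

def shifted_sphere(x, r=r):
    """f_r(x) = sum_i (x_i - r_i)^2; the minimum value 0 is attained at x = r."""
    x = np.asarray(x, dtype=float)
    return np.sum((x - r) ** 2, axis=-1)

# Assumed bandit framing: reward = -f_r(x), so maximizing reward minimizes f_r
X_init = rng.uniform(-3.0, 3.0, size=(1600, D))   # random initial observations
y_init = shifted_sphere(X_init)
```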
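  19. Evaluation metrics

    • ① Cumulative regret: at each step, the gap between the best candidate selected so far and the true optimum, accumulated over the steps
    • ② Inference time: the computation time needed to select the next candidate
    • Each method performed 200 candidate selections, and its performance was evaluated on them
    • As the initial state, observations at 1,600 randomly chosen points were provided in advance
    • Ten simulations were run with different random seeds, and the reported results are their averages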
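Cumulative regret as described in ① can be sketched like this for a minimization problem with known optimum $f^* = 0$. The helper is hypothetical, and it assumes only the selected candidates (not the 1,600 initial points) count toward the best-so-far value.

```python
import numpy as np

def cumulative_regret(f_values, f_opt=0.0):
    """Sum over t of (best f among the first t selections) - f_opt, for minimization."""
    best_so_far = np.minimum.accumulate(np.asarray(f_values, dtype=float))
    return np.cumsum(best_so_far - f_opt)

values = [50.0, 40.0, 45.0, 20.0]
print(cumulative_regret(values))   # [ 50.  90. 130. 150.]
```

The third selection (45.0) does not raise the running best, so that step still contributes 40.0.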
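  20. Policies compared

    Policies used in the simulation (✓ = component included, - = not applicable):

    Policy                   RFF   Candidate-point optimization   Regularization   Fast hyperparameter estimation
    GP-RFF (Proposal)        ✓     ✓                              ✓                ✓
    GP-RFF-No-Prior          ✓     ✓                                               ✓
    GP-RFF-Naive-Gradient    ✓     ✓                              ✓
    GP                                                            ✓
    TPE                      -     -                              -                -
    Random                   -     -                              -                -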
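  21. Result 1: Regret comparison

    • The proposed method GP-RFF achieved the smallest cumulative regret (cumulative value: 4384.9)
    • → Against the true optimum of 0, the function value at the best point came as close as 18.3
    • → This confirms that the proposed method can select effective candidates even in a high-dimensional space
    • Each of the introduced component techniques functioned effectively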
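  22. Result 1: Regret comparison (continued)

    • GP-RFF-Naive-Gradient (without the speedup) showed no performance difference from GP-RFF
    • → This is expected, because the theoretical values of the likelihood and gradient are identical
    • GP-RFF-No-Prior (without regularization) performed worse
    • → This suggests that regularization improves exploration performance
    • The performance of standard GP dropped markedly
    • → This reflects the accuracy limit of relying on a candidate set in high-dimensional spaces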
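  23. Result 2: Hyperparameter trajectories

    • However, the effect of regularization depends strongly on the setting of the prior parameter $\sigma_{k0}^2$
    • $\sigma_{k0}^2 = 0.005$ → the results reported here
    • $\sigma_{k0}^2 = 1.0$ → regularization has almost no effect
    • $\sigma_{k0}^2 = 0.001$ → initial exploration is suppressed too much and performance deteriorates
    • → Because this hyperparameter strongly affects performance, it must be chosen carefully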