
Online Inverse Linear Optimization

Shinsaku Sakaue
March 06, 2025

Mar. 7, 2025, at Kyoto University.
https://www.ml.ist.i.kyoto-u.ac.jp/kiss

Transcript

  1. Online Inverse Linear Optimization. Mar. 7, 2025 @ KyotoU, KISS.

    Shinsaku Sakaue (Univ. Tokyo, RIKEN AIP). Joint work with Taira Tsuchiya (Univ. Tokyo, RIKEN AIP), Han Bao (Kyoto Univ.), and Taihei Oki (Hokkaido Univ.). Preprint: https://arxiv.org/abs/2501.14349
  2. Forward Optimization

    maximize $f_\theta(x)$ subject to $x \in X$.

    Forward optimization: given $\theta$, find an optimal solution $x$.
    • Decision variable $x \in \mathbb{R}^n$.
    • Constraint set $X \subseteq \mathbb{R}^n$ (known).
    • Model parameter $\theta \in \mathbb{R}^m$.
    Examples. Seismic tomography: $\theta$ = geological features of certain zones; seismic-wave trajectories are modeled as shortest-path problems. Customer behavior: $\theta$ = the customer's preference; purchase behavior is modeled as utility maximization.
    (Image sources: https://www.geothermalnextgeneration.com/updates/seismic-tomography-a-cat-scan-of-the-earth, https://en.wikipedia.org/wiki/Consumer_behaviour)
  3. Inverse Optimization

    maximize $f_\theta(x)$ subject to $x \in X$.

    Forward optimization: given $\theta$, find an optimal solution $x$. Inverse optimization: estimate $\theta$ from an optimal solution $x$.
    Examples. Seismic tomography: estimate geological features $\theta$ from observed waves $x$. Customer behavior: estimate the customer's preference $\theta$ from purchase behavior $x$.
  4. Linear Optimization (forward model in this talk)

    For $t = 1, \dots, T$, an agent solves: maximize $\langle c^\ast, x \rangle$ subject to $x \in X_t$.
    • $c^\ast \in \mathbb{R}^n$ is the agent's internal objective vector.
    • $c^\ast$ lies in a convex set $\Theta \subset \mathbb{R}^n$ with $\mathrm{diam}\,\Theta = 1$ (known to the learner).
    • $X_t \subseteq \mathbb{R}^n$ is the agent's $t$-th action set with $\mathrm{diam}\,X_t = 1$; it is not necessarily convex, but we assume an oracle that solves linear optimization over $X_t$.
    • Let $x_t \in \arg\max\{\langle c^\ast, x\rangle : x \in X_t\}$.
    Running example: an agent with budget 2 and $c^\ast = (0.2, 0.4, 0.1, 0.0, 0.3)$ faces an action set $X_t$ in which one item is sold out, and takes $x_t = (1, 0, 0, 0, 1)$.
  7. Inverse Linear Optimization

    For $t = 1, \dots, T$, an agent solves: maximize $\langle c^\ast, x\rangle$ subject to $x \in X_t$. A learner aims to infer $c^\ast$ from $(X_t, x_t)_{t=1}^T$.
  8. Online Inverse Linear Optimization (Bärmann et al. 2017)

    For $t = 1, \dots, T$:
    • The learner makes a prediction $\hat{c}_t \in \Theta$ of $c^\ast$.
    • The agent faces $X_t$ and takes an action $x_t \in \arg\max_{x \in X_t} \langle c^\ast, x\rangle$.
    • The learner observes $(X_t, x_t)$ and updates $\hat{c}_t$ to $\hat{c}_{t+1}$.
    Let $\hat{x}_t \in \arg\max\{\langle \hat{c}_t, x\rangle : x \in X_t\}$ and define the regret $R_T^{c^\ast}$ as
    $R_T^{c^\ast} := \sum_{t=1}^T \langle c^\ast, x_t - \hat{x}_t\rangle = \sum_{t=1}^T \big(\langle c^\ast, x_t\rangle - \langle c^\ast, \hat{x}_t\rangle\big)$,
    the gap between the optimal objective value and the objective value achieved by following the learner's prediction $\hat{c}_t$.
    • $R_T^{c^\ast}$ measures the quality of the actions suggested by the predictions.
    • $R_T^{c^\ast}$ is non-negative; $R_T^{c^\ast} = 0$ if $\hat{c}_t = c^\ast$ for all $t$; smaller is better.
  14. Convenient Upper Bound on Regret (Bärmann et al. 2017)

    For $\hat{c}_t$, $\hat{x}_t \in \arg\max\{\langle \hat{c}_t, x\rangle : x \in X_t\}$, $c^\ast$, and $x_t \in \arg\max\{\langle c^\ast, x\rangle : x \in X_t\}$, define
    $\bar{R}_T^{c^\ast} := \sum_{t=1}^T \langle \hat{c}_t - c^\ast, \hat{x}_t - x_t\rangle$.
    It satisfies
    $\bar{R}_T^{c^\ast} = \sum_{t=1}^T \langle \hat{c}_t, \hat{x}_t - x_t\rangle + \sum_{t=1}^T \langle c^\ast, x_t - \hat{x}_t\rangle = \sum_{t=1}^T \langle \hat{c}_t, \hat{x}_t - x_t\rangle + R_T^{c^\ast}$,
    where the first sum is non-negative because $\hat{x}_t$ is optimal for $\hat{c}_t$; hence $\bar{R}_T^{c^\ast} \ge R_T^{c^\ast}$.
    What is the additional term? $\langle \hat{c}_t, \hat{x}_t - x_t\rangle = \max\{\langle \hat{c}_t, x\rangle : x \in X_t\} - \langle \hat{c}_t, x_t\rangle$: the gap between the optimal value for $\hat{c}_t$ and the objective value that $x_t$ achieves for $\hat{c}_t$.
    • It is zero if $\hat{c}_t = c^\ast$; it quantifies how well $\hat{c}_t$ explains the agent's choice $x_t$.
    • It is called the suboptimality loss in inverse optimization (Mohajerin Esfahani et al. 2018).
    • Alone it is sometimes meaningless: $\hat{c}_t = 0$ trivially attains zero suboptimality loss.
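As a quick illustration, the suboptimality loss is easy to compute when $X_t$ is a finite action set enumerated as rows of a matrix (a hedged sketch; the function name is ours):

```python
import numpy as np

def suboptimality_loss(c_hat, X, x):
    """max{<c_hat, x'> : x' in X} - <c_hat, x> for a finite action set X
    (rows = candidate actions) and the agent's observed choice x."""
    x_hat = X[np.argmax(X @ c_hat)]
    return float(c_hat @ (x_hat - x))

# The zero prediction trivially attains zero loss, as the last bullet warns:
X = np.array([[1.0, 0.0], [0.0, 1.0]])
print(suboptimality_loss(np.zeros(2), X, X[1]))  # 0.0
```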
  18. Related Work: Online Learning Approach (Bärmann et al. 2017)

    Consider making the upper bound $\bar{R}_T^{c^\ast} = \sum_{t=1}^T \langle \hat{c}_t - c^\ast, \hat{x}_t - x_t\rangle$ as small as possible. Regarding $f_t: \Theta \ni c \mapsto \langle c, \hat{x}_t - x_t\rangle$ as a linear cost function,
    $\bar{R}_T^{c^\ast} = \sum_{t=1}^T \langle \hat{c}_t, \hat{x}_t - x_t\rangle - \sum_{t=1}^T \langle c^\ast, \hat{x}_t - x_t\rangle = \sum_{t=1}^T f_t(\hat{c}_t) - \sum_{t=1}^T f_t(c^\ast)$:
    the standard regret in online learning. By using online linear optimization (OLO) methods (e.g., OGD) to compute $\hat{c}_t$, we obtain
    $R_T^{c^\ast} \le \bar{R}_T^{c^\ast} = O(\sqrt{T})$,
    achieving a vanishing regret (and cumulative suboptimality loss) on average as $T \to \infty$.
    The $\sqrt{T}$ rate is optimal in general OLO. Is it also optimal in online inverse linear optimization?
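A sketch of this online-learning approach: OGD on the linear losses $f_t(c) = \langle c, \hat{x}_t - x_t\rangle$, whose gradient is $\hat{x}_t - x_t$. A Euclidean ball of diameter 1 stands in for $\Theta$, and finite action sets are assumed; both are simplifications of ours:

```python
import numpy as np

def project_ball(c, radius=0.5):
    """Euclidean projection onto a ball of diameter 1 (a stand-in for Theta)."""
    nrm = np.linalg.norm(c)
    return c if nrm <= radius else c * (radius / nrm)

def ogd_inverse_learner(observations, n, eta):
    """OGD on f_t(c) = <c, x_hat_t - x_t>; `observations` yields pairs
    (X_t, x_t) with X_t a matrix whose rows are the available actions."""
    c_hat = np.zeros(n)
    for X, x in observations:
        x_hat = X[np.argmax(X @ c_hat)]            # action suggested by c_hat
        c_hat = project_ball(c_hat - eta * (x_hat - x))
        yield c_hat
```

With $\eta = 1/(G\sqrt{T})$ this matches the $O(\sqrt{T})$ tuning analyzed in the OGD warm-up below.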
  22. Related Work: Ellipsoid-Based Method (Besbes et al. 2023)

    There is a method achieving $R_T^{c^\ast} = O(n^4 \log T)$, going beyond the limit of OLO!
    High-level idea: maintain a cone $\mathcal{C}_t$ representing where $c^\ast$ may lie. After observing $(X_t, x_t)$, the cone can be narrowed:
    $\mathcal{C}_{t+1} \leftarrow \mathcal{C}_t \cap \{c \in \Theta \mid \langle c, x_t\rangle \ge \langle c, x\rangle \text{ for all } x \in X_t\}$,
    i.e., intersect with the normal cone of $X_t$ at $x_t$: the cone of vectors that make $x_t$ optimal over $X_t$. Slightly inflate $\mathcal{C}_t$, making it an ellipsoidal cone; based on the volume argument of the ellipsoid method for LPs, this strikes a good balance between exploration and exploitation. (See Figure 4 in Besbes, Fonseca, Lobel. Contextual Inverse Optimization: Offline and Online Learning. Oper. Res. 2023.)
    But there are downsides:
    • The $n^4$ factor is prohibitive for large $n$.
    • It is not very efficient, albeit polynomial in $n$ and $T$.
  26. Our Results

    Theorem. There is a method achieving $R_T^{c^\ast} \le \bar{R}_T^{c^\ast} = O(n \log T)$.
    • Improves the $R_T^{c^\ast} = O(n^4 \log T)$ bound of Besbes et al. (2023) by a factor of $n^3$.
    • Applies even to the upper bound $\bar{R}_T^{c^\ast} \ge R_T^{c^\ast}$.
    • More efficient: based on the online Newton step (ONS) rather than the ellipsoid method.
    And more:
    • Handling of suboptimal feedback $x_t$ with MetaGrad (ONS with multiple learning rates).
    • A lower bound $R_T^{c^\ast} = \Omega(n)$, implying tightness with respect to $n$.
    • $R_T^{c^\ast} = O(1)$ for $n = 2$, based on the method of Besbes et al. (2023).
  28. Online Convex Optimization

    For $t = 1, \dots, T$:
    • The learner plays $\hat{c}_t$.
    • The environment reveals $f_t$ ($f_t$ may be reactive to $\hat{c}_t$).
    • The learner incurs $f_t(\hat{c}_t)$ and computes $\hat{c}_{t+1}$.
    Assumptions:
    • The learner's domain $\Theta$ is convex, with $\mathrm{diam}\,\Theta = 1$.
    • The learner can use information up to the end of round $t$ when computing $\hat{c}_{t+1}$.
    • Each loss function $f_t: \Theta \to \mathbb{R}$ is convex.
    For any comparator $c^\ast \in \Theta$, the learner aims to make the regret $\sum_{t=1}^T \big(f_t(\hat{c}_t) - f_t(c^\ast)\big)$ as small as possible.
  34. Exp-Concave Loss

    A function $f: \Theta \to \mathbb{R}$ is $\alpha$-exp-concave for some $\alpha > 0$ if $g: c \mapsto e^{-\alpha f(c)}$ is concave on $\Theta$. If $f$ is twice differentiable, $\alpha$-exp-concavity is equivalent to
    $\nabla^2 f(c) \succeq \alpha \nabla f(c) \nabla f(c)^\top \quad \forall c \in \Theta$.
    (Cf. $\alpha$-strong convexity requires $\nabla^2 f(c) \succeq \alpha I$.)
    Examples:
    • $f(c) = -\log(r^\top c)$ (appears in portfolio theory) satisfies $\nabla^2 f(c) = \frac{r r^\top}{(r^\top c)^2} = \nabla f(c)\nabla f(c)^\top$.
    • $f(c) = (r^\top c)^2$ with $\|r\|, \|c\| \le 1$ (used later) satisfies $\nabla^2 f(c) = 2 r r^\top = \frac{\nabla f(c)\nabla f(c)^\top}{2(r^\top c)^2} \succeq \frac{1}{2}\nabla f(c)\nabla f(c)^\top$.
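A quick numerical sanity check of the Hessian characterization for the second example (illustrative only; the random test and tolerance are our choices):

```python
import numpy as np

# Check 1/2-exp-concavity of f(c) = (r^T c)^2 via the Hessian condition:
# 2 r r^T - (1/2) grad grad^T must be PSD whenever ||r||, ||c|| <= 1.
rng = np.random.default_rng(0)
for _ in range(1000):
    r = rng.normal(size=3); r /= np.linalg.norm(r)            # ||r|| = 1
    c = rng.normal(size=3); c /= max(1.0, np.linalg.norm(c))  # ||c|| <= 1
    grad = 2 * (r @ c) * r
    M = 2 * np.outer(r, r) - 0.5 * np.outer(grad, grad)
    assert np.linalg.eigvalsh(M).min() >= -1e-9
print("f(c) = (r^T c)^2 passes the 1/2-exp-concavity check")
```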
  37. ONS Regret Bound (Hazan et al. 2007)

    For a PSD matrix $A$, write $\|x\|_A := \sqrt{x^\top A x}$.
    ONS: set $\hat{c}_1 \in \Theta$, fix $\gamma, \varepsilon > 0$, and let $A_0 = \varepsilon I$. For $t = 1, \dots, T$:
    • Output $\hat{c}_t$ and observe $f_t$.
    • $A_t \leftarrow A_{t-1} + \nabla f_t(\hat{c}_t) \nabla f_t(\hat{c}_t)^\top$.
    • $\hat{c}_{t+1} \leftarrow \arg\min\big\{\|\hat{c}_t - \gamma^{-1} A_t^{-1}\nabla f_t(\hat{c}_t) - c\|_{A_t} : c \in \Theta\big\}$.
    Assume that $f_1, \dots, f_T$ are twice differentiable and $\alpha$-exp-concave, and that $\|\nabla f_t(c)\| \le G$ for all $t$ and $c \in \Theta$. Let $\gamma = \frac{1}{2}\min\{1/G, \alpha\}$ and $\varepsilon = 1/\gamma^2$. Then $\hat{c}_1, \dots, \hat{c}_T \in \Theta$ computed by ONS satisfy
    $\sum_{t=1}^T \big(f_t(\hat{c}_t) - f_t(c^\ast)\big) = O\big(n \big(\tfrac{1}{\alpha} + G\big) \log T\big)$.
    A well-known result; we will get back to its proof later.
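A compact sketch of ONS itself. The update projects in the norm $\|\cdot\|_{A_t}$, which is a small quadratic program in general, so the projection is left as a user-supplied `project(y, A)` (an assumption of this snippet, as is `grad_fn`):

```python
import numpy as np

def online_newton_step(grad_fn, n, T, gamma, eps, project):
    """ONS: grad_fn(t, c) returns the gradient of f_t at c; project(y, A)
    projects y onto Theta in the norm ||.||_A."""
    c = np.zeros(n)                 # any initial point of Theta
    A = eps * np.eye(n)             # A_0 = eps * I
    for t in range(T):
        yield c                     # play c_t, then observe f_t
        g = grad_fn(t, c)
        A += np.outer(g, g)         # A_t = A_{t-1} + g g^T
        c = project(c - np.linalg.solve(A, g) / gamma, A)
```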
  38. Our ONS-Based Method

    Set $\hat{c}_1 \in \Theta$. For $t = 1, \dots, T$:
    • Output $\hat{c}_t$ and observe $(X_t, x_t)$.
    • Compute $\hat{x}_t \in \arg\max\{\langle \hat{c}_t, x\rangle : x \in X_t\}$.
    • Get $\hat{c}_{t+1}$ via ONS applied to $f_t^\eta$, where, for $\eta \in (0, 1)$ (specified later), $f_t^\eta: \Theta \to \mathbb{R}$ is defined by
    $f_t^\eta(c) := -\eta \langle \hat{c}_t - c, \hat{x}_t - x_t\rangle + \eta^2 \langle \hat{c}_t - c, \hat{x}_t - x_t\rangle^2$.
    Theorem. For $\hat{c}_1, \dots, \hat{c}_T \in \Theta$ computed by the above method, it holds that $R_T^{c^\ast} \le \bar{R}_T^{c^\ast} = O(n \log T)$.
    (Recall the regret $R_T^{c^\ast} = \sum_{t=1}^T \langle c^\ast, x_t - \hat{x}_t\rangle$ and its upper bound $\bar{R}_T^{c^\ast} = \sum_{t=1}^T \langle \hat{c}_t - c^\ast, \hat{x}_t - x_t\rangle$.)
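One round of the method as code. Since $\langle \hat{c}_t - c, \hat{x}_t - x_t\rangle$ vanishes at $c = \hat{c}_t$, the gradient of $f_t^\eta$ there is simply $\eta(\hat{x}_t - x_t)$; the finite-action-set and projection assumptions are the same as in the earlier sketches:

```python
import numpy as np

def inverse_ons_round(c_hat, A, X, x, eta, gamma, project):
    """Suggest x_hat, then take one ONS step on the surrogate f_t^eta.
    The gradient of f_t^eta at c_hat is eta * (x_hat - x) because
    <c_hat - c, x_hat - x> = 0 at c = c_hat."""
    x_hat = X[np.argmax(X @ c_hat)]
    g = eta * (x_hat - x)
    A = A + np.outer(g, g)
    c_next = project(c_hat - np.linalg.solve(A, g) / gamma, A)
    return c_next, A
```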
  39. Regret Analysis

    With a constant $\eta \in (0, 1)$, $f_t^\eta(c) = -\eta\langle \hat{c}_t - c, \hat{x}_t - x_t\rangle + \eta^2\langle \hat{c}_t - c, \hat{x}_t - x_t\rangle^2$ enjoys $\Omega(1)$-exp-concavity and $\|\nabla f_t^\eta(\hat{c}_t)\| = O(1)$ (by elementary calculation), so the ONS regret bound gives $\sum_{t=1}^T \big(f_t^\eta(\hat{c}_t) - f_t^\eta(c^\ast)\big) = O(n \log T)$.
    Proof. Define $V_T^{c^\ast} := \sum_{t=1}^T \langle \hat{c}_t - c^\ast, \hat{x}_t - x_t\rangle^2$, which satisfies
    $V_T^{c^\ast} \le \sum_{t=1}^T \|\hat{c}_t - c^\ast\| \, \|\hat{x}_t - x_t\| \, \langle \hat{c}_t - c^\ast, \hat{x}_t - x_t\rangle \le \bar{R}_T^{c^\ast}$,
    because $\langle \hat{c}_t - c^\ast, \hat{x}_t - x_t\rangle \ge 0$, by Cauchy–Schwarz, and $\|\hat{c}_t - c^\ast\| \le 1$ and $\|\hat{x}_t - x_t\| \le 1$. Using $f_t^\eta(\hat{c}_t) = 0$ and the ONS bound,
    $\bar{R}_T^{c^\ast} = \sum_{t=1}^T \langle \hat{c}_t - c^\ast, \hat{x}_t - x_t\rangle = -\frac{1}{\eta}\sum_{t=1}^T f_t^\eta(c^\ast) + \eta V_T^{c^\ast} \le \frac{1}{\eta}\sum_{t=1}^T \big(f_t^\eta(\hat{c}_t) - f_t^\eta(c^\ast)\big) + \eta \bar{R}_T^{c^\ast}$.
    Hence $(1 - \eta)\bar{R}_T^{c^\ast} \le \frac{1}{\eta}\sum_{t=1}^T \big(f_t^\eta(\hat{c}_t) - f_t^\eta(c^\ast)\big) = O(n \log T)$, and setting $\eta = \frac{1}{2}$ completes the proof.
  42. Warm-up: Online Gradient Descent

    OGD: set $\hat{c}_1 \in \Theta$ and fix $\eta > 0$. For $t = 1, \dots, T$:
    • Output $\hat{c}_t$ and observe $f_t$.
    • $\hat{c}_{t+1} \leftarrow \arg\min\{\|\hat{c}_t - \eta \nabla f_t(\hat{c}_t) - c\| : c \in \Theta\}$.
    From the update rule and the Pythagorean theorem,
    $\|\hat{c}_{t+1} - c^\ast\|^2 \le \|\hat{c}_t - \eta\nabla f_t(\hat{c}_t) - c^\ast\|^2 = \|\hat{c}_t - c^\ast\|^2 + \eta^2\|\nabla f_t(\hat{c}_t)\|^2 - 2\eta\langle \nabla f_t(\hat{c}_t), \hat{c}_t - c^\ast\rangle$.
    Summing over $t$ and ignoring $-\|\hat{c}_{T+1} - c^\ast\|^2 \le 0$,
    $\sum_{t=1}^T \langle \nabla f_t(\hat{c}_t), \hat{c}_t - c^\ast\rangle \le \sum_{t=1}^T \frac{\|\hat{c}_t - c^\ast\|^2 - \|\hat{c}_{t+1} - c^\ast\|^2}{2\eta} + \frac{\eta}{2}\sum_{t=1}^T \|\nabla f_t(\hat{c}_t)\|^2 = \frac{\|\hat{c}_1 - c^\ast\|^2}{2\eta} + \frac{\eta}{2}\sum_{t=1}^T \|\nabla f_t(\hat{c}_t)\|^2 \le \frac{1}{2}\Big(\frac{1}{\eta} + \eta G^2 T\Big) = G\sqrt{T}$,
    where the last two steps use $\mathrm{diam}\,\Theta = 1$ and $\eta = 1/(G\sqrt{T})$.
    By the convexity of $f_t$, $\sum_{t=1}^T \big(f_t(\hat{c}_t) - f_t(c^\ast)\big) \le \sum_{t=1}^T \langle \nabla f_t(\hat{c}_t), \hat{c}_t - c^\ast\rangle \le G\sqrt{T}$.
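A toy check of the final bound: run OGD with $\eta = 1/(G\sqrt{T})$ on random linear losses and verify $\sum_t \langle \nabla_t, \hat{c}_t - c^\ast\rangle \le G\sqrt{T}$ for a random comparator (illustrative; the ball domain and unit gradients are our choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n, T, G = 4, 10_000, 1.0
eta = 1.0 / (G * np.sqrt(T))

def proj(c, r=0.5):                      # Theta = ball of diameter 1
    m = np.linalg.norm(c)
    return c if m <= r else c * (r / m)

c = np.zeros(n)
c_star = proj(rng.normal(size=n))
total = 0.0
for _ in range(T):
    g = rng.normal(size=n)
    g *= G / np.linalg.norm(g)           # ||grad_t|| = G
    total += g @ (c - c_star)            # <grad_t, c_t - c*>
    c = proj(c - eta * g)
print(f"{total:.2f} <= {G * np.sqrt(T):.2f}")  # guaranteed by the analysis
```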
  47. Closer Look at the $\sqrt{T}$ Rate

    Let $\nabla_t = \nabla f_t(\hat{c}_t)$ for brevity. In sum, the $O(\sqrt{T})$ regret follows from
    $\sum_{t=1}^T \langle \nabla_t, \hat{c}_t - c^\ast\rangle \le \underbrace{\frac{1}{2\eta}\big(\|\hat{c}_1 - c^\ast\|^2 - \|\hat{c}_{T+1} - c^\ast\|^2\big)}_{\text{penalty (telescoping; intermediate terms cancel, the last is ignored)}} + \underbrace{\frac{\eta}{2}\sum_{t=1}^T \|\nabla_t\|^2}_{\text{stability}} \le \frac{1}{2}\Big(\frac{1}{\eta} + \eta G^2 T\Big) = G\sqrt{T}$ for $\eta = 1/(G\sqrt{T})$.
    If the penalty and the stability sum to $O(1)$ and $O(T)$, respectively, the regret scales with $\sqrt{T}$. Consider achieving a better stability via the elliptical potential lemma.
  49. Elliptical Potential Lemma

    Recall $\|x\|_A := \sqrt{x^\top A x}$ for a PSD matrix $A$. Let $A_0 = \varepsilon I$ and $A_t = A_{t-1} + \nabla_t \nabla_t^\top$ for $t = 1, \dots, T$, where $\|\nabla_t\| \le G$. Then,
    $\sum_{t=1}^T \|\nabla_t\|_{A_t^{-1}}^2 \le \log\frac{\det A_T}{\det A_0} \le n \log\Big(\frac{T G^2}{\varepsilon} + 1\Big)$.
    Proof. We have
    $\|\nabla_t\|_{A_t^{-1}}^2 = \nabla_t^\top A_t^{-1}\nabla_t = A_t^{-1} \bullet \nabla_t\nabla_t^\top = A_t^{-1} \bullet (A_t - A_{t-1}) \le \log\frac{\det A_t}{\det A_{t-1}}$,
    where the inequality is the matrix analogue of $a^{-1}(a - b) \le \log(a/b)$ for $a, b > 0$ (carefully applied to eigenvalues), a.k.a. the concavity of $\log\det$ over PSD matrices: $\nabla\log\det A \bullet (A - B) = A^{-1} \bullet (A - B) \le \log\det A - \log\det B$. Summing and telescoping gives $\sum_{t=1}^T \|\nabla_t\|_{A_t^{-1}}^2 \le \log\frac{\det A_T}{\det A_0}$. Using $\det A_0 = \varepsilon^n$ and $\det A_T \le (\varepsilon + T G^2)^n$ yields the bound.
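The lemma is easy to sanity-check numerically with random bounded gradients (an illustration, not a proof):

```python
import numpy as np

rng = np.random.default_rng(1)
n, T, G, eps = 5, 200, 1.0, 1.0
A = eps * np.eye(n)
lhs = 0.0
for _ in range(T):
    g = rng.normal(size=n)
    g *= G / max(G, np.linalg.norm(g))   # enforce ||grad_t|| <= G
    A += np.outer(g, g)                  # A_t = A_{t-1} + grad grad^T
    lhs += g @ np.linalg.solve(A, g)     # ||grad_t||^2 in the A_t^{-1} norm
rhs = n * np.log(T * G**2 / eps + 1)
print(f"{lhs:.3f} <= {rhs:.3f}")         # the lemma guarantees lhs <= rhs
```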
  51. Online Newton Step

    ONS: set $\hat{c}_1 \in \Theta$, fix $\gamma, \varepsilon > 0$, and let $A_0 = \varepsilon I$. For $t = 1, \dots, T$:
    • Output $\hat{c}_t$ and observe $f_t$.
    • $A_t \leftarrow A_{t-1} + \nabla_t\nabla_t^\top$.
    • $\hat{c}_{t+1} \leftarrow \arg\min\{\|\hat{c}_t - \gamma^{-1}A_t^{-1}\nabla_t - c\|_{A_t} : c \in \Theta\}$.
    From the update rule and the Pythagorean theorem,
    $\|\hat{c}_{t+1} - c^\ast\|_{A_t}^2 \le \|\hat{c}_t - \gamma^{-1}A_t^{-1}\nabla_t - c^\ast\|_{A_t}^2 = \|\hat{c}_t - c^\ast\|_{A_t}^2 + \gamma^{-2}\|\nabla_t\|_{A_t^{-1}}^2 - 2\gamma^{-1}\langle \nabla_t, \hat{c}_t - c^\ast\rangle$.
    Summing over $t$ and applying the elliptical potential lemma,
    $\sum_{t=1}^T \langle \nabla_t, \hat{c}_t - c^\ast\rangle \le \frac{\gamma}{2}\sum_{t=1}^T \big(\|\hat{c}_t - c^\ast\|_{A_t}^2 - \|\hat{c}_{t+1} - c^\ast\|_{A_t}^2\big) + \frac{1}{2\gamma}\sum_{t=1}^T \|\nabla_t\|_{A_t^{-1}}^2 \le \frac{\gamma}{2}\sum_{t=1}^T \big(\|\hat{c}_t - c^\ast\|_{A_t}^2 - \|\hat{c}_{t+1} - c^\ast\|_{A_t}^2\big) + \frac{n}{2\gamma}\log\Big(\frac{TG^2}{\varepsilon} + 1\Big)$.
    Take a closer look at the penalty.
  56. ONS: Penalty

    The penalty telescopes only up to correction terms: since
    $\|\hat{c}_t - c^\ast\|_{A_t}^2 - \|\hat{c}_t - c^\ast\|_{A_{t-1}}^2 = (\hat{c}_t - c^\ast)^\top (A_t - A_{t-1}) (\hat{c}_t - c^\ast) = \langle \nabla_t, \hat{c}_t - c^\ast\rangle^2$,
    ignoring $-\|\hat{c}_{T+1} - c^\ast\|_{A_T}^2 \le 0$ we get
    $\sum_{t=1}^T \big(\|\hat{c}_t - c^\ast\|_{A_t}^2 - \|\hat{c}_{t+1} - c^\ast\|_{A_t}^2\big) \le \|\hat{c}_1 - c^\ast\|_{A_1}^2 + \sum_{t=2}^T \langle \nabla_t, \hat{c}_t - c^\ast\rangle^2 \le \varepsilon + \sum_{t=1}^T \langle \nabla_t, \hat{c}_t - c^\ast\rangle^2$,
    where the last step uses $\|\hat{c}_1 - c^\ast\|_{A_1}^2 = \|\hat{c}_1 - c^\ast\|_{A_0}^2 + \langle \nabla_1, \hat{c}_1 - c^\ast\rangle^2$ and $\|\hat{c}_1 - c^\ast\|_{A_0}^2 = \varepsilon\|\hat{c}_1 - c^\ast\|^2 \le \varepsilon$.
    Therefore, setting $\varepsilon = 1/\gamma^2$,
    $\sum_{t=1}^T \langle \nabla_t, \hat{c}_t - c^\ast\rangle \le \frac{\gamma\varepsilon}{2} + \frac{\gamma}{2}\sum_{t=1}^T \langle \nabla_t, \hat{c}_t - c^\ast\rangle^2 + \frac{n}{2\gamma}\log\Big(\frac{TG^2}{\varepsilon} + 1\Big)$
    $\implies \sum_{t=1}^T \Big(\langle \nabla_t, \hat{c}_t - c^\ast\rangle - \frac{\gamma}{2}\langle \nabla_t, \hat{c}_t - c^\ast\rangle^2\Big) \le \frac{n}{2\gamma}\log\big(TG^2\gamma^2 + 1\big) + \frac{1}{2\gamma}$.
  60. ONS: Using Exp-Concavity

    If $f_t$ is $\alpha$-exp-concave, then for $\gamma \le \frac{1}{2}\min\{\alpha, 1/G\}$,
    $f_t(\hat{c}_t) - f_t(c^\ast) \le \langle \nabla_t, \hat{c}_t - c^\ast\rangle - \frac{\gamma}{2}\langle \nabla_t, \hat{c}_t - c^\ast\rangle^2$
    (cf. if $f_t$ is merely convex, $f_t(\hat{c}_t) - f_t(c^\ast) \le \langle \nabla_t, \hat{c}_t - c^\ast\rangle$). Therefore, setting $\gamma = \frac{1}{2}\min\{\alpha, 1/G\}$, we obtain the desired regret bound:
    $\sum_{t=1}^T \big(f_t(\hat{c}_t) - f_t(c^\ast)\big) \le \sum_{t=1}^T \Big(\langle \nabla_t, \hat{c}_t - c^\ast\rangle - \frac{\gamma}{2}\langle \nabla_t, \hat{c}_t - c^\ast\rangle^2\Big) \le \frac{n}{2\gamma}\log(TG^2\gamma^2 + 1) + \frac{1}{2\gamma} \le n\Big(\frac{1}{\alpha} + G\Big)\Big(\log\Big(\frac{T}{4} + 1\Big) + 1\Big)$,
    using $TG^2\gamma^2 \le \frac{T}{4}$ and $\frac{1}{2\gamma} = \frac{1}{\min\{\alpha, 1/G\}} = \max\big\{\frac{1}{\alpha}, G\big\} \le \frac{1}{\alpha} + G$.
  61. MetaGrad

    Tim van Erven, Wouter M. Koolen, and Dirk van der Hoeven (NeurIPS 2016; JMLR 2021).
  62. ONS Requires Prior Knowledge of $\alpha$

    ONS sets $\gamma = \frac{1}{2}\min\{\alpha, 1/G\}$ and $\varepsilon = 1/\gamma^2$, but we may not know $\alpha$ in advance. Even worse, we may encounter $\alpha = 0$ at some round. ONS fails in such uncertain situations... Adapt to the uncertainty with multiple ONS instances!
  63. MetaGrad: High-Level Idea

    • Loss functions $f_t$ are $\alpha$-exp-concave, but $\alpha$ is unknown, possibly $\alpha = 0$.
    • Keep experts with different learning rates $\eta > 0$, called $\eta$-experts.
    • Each $\eta$-expert runs ONS with its own exp-concave surrogate loss $f_t^\eta$.
    • MetaGrad aggregates the experts' outputs to return a single output.
    Theorem (van Erven et al. 2016, 2021). MetaGrad simultaneously enjoys
    • $\sum_{t=1}^T \big(f_t(w_t) - f_t(u)\big) \le O\big(n (G + 1/\alpha)\log T\big)$, and
    • $\sum_{t=1}^T \big(f_t(w_t) - f_t(u)\big) \le O\big(G\sqrt{T \log\log T}\big)$.
  65. Notation

    • For each $t$: the learner's loss $f_t$, the learner's output $w_t$, and $g_t \in \partial f_t(w_t)$.
    • $\mathcal{G} := \big\{\eta_i = \frac{2^{-i}}{5G} \mid i = 0, 1, 2, \dots, \frac{1}{2}\log T\big\}$ is the set of $\eta$ values; $|\mathcal{G}| = \Theta(\log T)$.
    • For each $t$ and $\eta$: the $\eta$-expert's loss $f_t^\eta$ and output $w_t^\eta$, where
    $f_t^\eta(w) := -\eta\langle w_t - w, g_t\rangle + \eta^2\langle w_t - w, g_t\rangle^2 \quad \forall w \in \Theta$.
    At round $t$:
    1. The learner plays $w_t$.
    2. It incurs $f_t(w_t)$ and observes $g_t \in \partial f_t(w_t)$.
    3. It sends $w_t$ and $g_t$ to the $\eta$-experts.
    4. Each $\eta$-expert computes $w_{t+1}^\eta$ via ONS applied to $f_t^\eta$.
    5. The learner aggregates the $w_{t+1}^\eta$ to compute $w_{t+1}$.
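One MetaGrad round as code, following the protocol above (a sketch in the spirit of van Erven et al.; all names and the state layout are ours). Each $\eta$-expert keeps its own iterate; the master reweights experts by $e^{-f_t^\eta(w_t^\eta)}$ and plays a tilted weighted mean:

```python
import numpy as np

def metagrad_round(w_t, g_t, experts, pi, ons_step):
    """experts: dict eta -> current iterate w_t^eta; pi: dict eta -> master
    weight; ons_step(eta, w_eta, w_t, g_t) performs one ONS update on the
    surrogate f_t^eta and returns w_{t+1}^eta."""
    for eta in experts:
        s = (w_t - experts[eta]) @ g_t
        pi[eta] *= np.exp(eta * s - (eta * s) ** 2)   # exp(-f_t^eta(w_t^eta))
        experts[eta] = ons_step(eta, experts[eta], w_t, g_t)
    den = sum(pi[eta] * eta for eta in experts)       # tilted aggregation
    return sum(pi[eta] * eta * w for eta, w in experts.items()) / den
```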
  70. Regret Decomposition

    Since every $f_t$ is convex, for any comparator $u \in \Theta$, we have
    $\sum_{t=1}^T \big(f_t(w_t) - f_t(u)\big) \le \sum_{t=1}^T \langle w_t - u, g_t\rangle =: \bar{R}_T^u$.
    Recall $f_t^\eta(w) = -\eta\langle w_t - w, g_t\rangle + \eta^2\langle w_t - w, g_t\rangle^2$. Decomposition of $\bar{R}_T^u$:
    $\bar{R}_T^u = \sum_{t=1}^T \langle w_t - u, g_t\rangle = -\frac{1}{\eta}\sum_{t=1}^T f_t^\eta(u) + \eta\underbrace{\sum_{t=1}^T \langle w_t - u, g_t\rangle^2}_{=:\, V_T^u} = -\frac{1}{\eta}\sum_{t=1}^T f_t^\eta(u) + \eta V_T^u$.
    Using $f_t^\eta(w_t) = 0$, for all $\eta \in \mathcal{G}$ simultaneously,
    $\bar{R}_T^u = \frac{1}{\eta}\Big(\underbrace{\sum_{t=1}^T \big(f_t^\eta(w_t) - f_t^\eta(w_t^\eta)\big)}_{\text{regret of the learner against } w_t^\eta} + \underbrace{\sum_{t=1}^T \big(f_t^\eta(w_t^\eta) - f_t^\eta(u)\big)}_{\text{regret of the } \eta\text{-expert against } u}\Big) + \eta V_T^u$.
  73. Bounding Each Component

    For all $\eta \in \mathcal{G}$ simultaneously,
    $\bar{R}_T^u = \frac{1}{\eta}\Big(\underbrace{\sum_{t=1}^T \big(f_t^\eta(w_t) - f_t^\eta(w_t^\eta)\big)}_{(1)} + \underbrace{\sum_{t=1}^T \big(f_t^\eta(w_t^\eta) - f_t^\eta(u)\big)}_{(2)}\Big) + \eta V_T^u$.
    • If the $w_t^\eta$ are aggregated by exponentially weighted averaging over the $|\mathcal{G}| = \Theta(\log T)$ experts, term (1) is $O(\log\log T)$.
    • Since $w_t^\eta$ is computed by ONS applied to $f_t^\eta$, term (2) is $O(n\log T)$. (By elementary calculation, $f_t^\eta$ is $\Omega(1)$-exp-concave and $\|\nabla f_t^\eta(w_t^\eta)\| = O(1)$ for every $\eta \in \mathcal{G} \subseteq (0, \frac{1}{5G}]$.)
    Therefore, for all $\eta \in \mathcal{G}$ simultaneously, $\bar{R}_T^u = O\Big(\frac{n\log T}{\eta} + \eta V_T^u\Big)$.
  77. Infeasible Ideal Tuning

    If $V_T^u = \sum_{t=1}^T \langle w_t - u, g_t\rangle^2$ were known a priori, then using only $\eta = \eta^\ast \simeq \sqrt{\frac{n\log T}{V_T^u}}$,
    $\bar{R}_T^u = O\Big(\frac{n\log T}{\eta} + \eta V_T^u\Big) \simeq O\Big(\sqrt{V_T^u \, n\log T}\Big)$.
    If it turns out that all $f_t$ are $\alpha$-exp-concave, then (informally)
    $\sum_{t=1}^T \big(f_t(w_t) - f_t(u)\big) \le \sum_{t=1}^T \langle w_t - u, g_t\rangle - \frac{\alpha}{2}\sum_{t=1}^T \langle w_t - u, g_t\rangle^2 = \bar{R}_T^u - \frac{\alpha}{2}V_T^u$.
    By self-bounding, regardless of the value of $V_T^u$,
    $\bar{R}_T^u - \frac{\alpha}{2}V_T^u \lesssim \sqrt{V_T^u\, n\log T} - \alpha V_T^u \lesssim \frac{n}{\alpha}\log T$,
    achieving the same bound as ONS without using $\alpha$. However, $V_T^u$ is unknown... Use the fact that $\bar{R}_T^u = O\big(\frac{n\log T}{\eta} + \eta V_T^u\big)$ holds for all $\eta \in \mathcal{G}$!
  81. Exploiting Multiple Learning Rates

    Let $\eta^\ast \simeq \sqrt{\frac{n\log T}{V_T^u}} \ge \frac{1}{5G\sqrt{T}}$ be the unknown best learning rate. Recall that $\mathcal{G} = \big\{\eta_i = \frac{2^{-i}}{5G} \mid i = 0, 1, 2, \dots, \frac{1}{2}\log T\big\} \subseteq \big(0, \frac{1}{5G}\big]$ and that $\bar{R}_T^u \lesssim \frac{n\log T}{\eta} + \eta V_T^u$ holds for all $\eta \in \mathcal{G}$.
    • If $\eta^\ast \in \big[\frac{1}{5G\sqrt{T}}, \frac{1}{5G}\big]$, there exists $\eta \in \mathcal{G}$ such that $\eta^\ast \in [\frac{\eta}{2}, \eta]$; hence $\bar{R}_T^u \lesssim \frac{n\log T}{\eta} + \eta V_T^u \le \frac{n\log T}{\eta^\ast} + 2\eta^\ast V_T^u \simeq \sqrt{V_T^u\, n\log T}$.
    • If $\eta^\ast \simeq \sqrt{\frac{n\log T}{V_T^u}} \ge \frac{1}{5G}$, we have $V_T^u \lesssim G^2 n\log T$; thus, for $\eta = \eta_0 = \frac{1}{5G}$, $\bar{R}_T^u \lesssim \frac{n\log T}{\eta_0} + \eta_0 V_T^u \lesssim nG\log T$.
    In any case, $\bar{R}_T^u = O\big(\sqrt{V_T^u\, n\log T} + nG\log T\big)$, implying $\sum_{t=1}^T \big(f_t(w_t) - f_t(u)\big) \le \bar{R}_T^u \lesssim n\big(\frac{1}{\alpha} + G\big)\log T$.
  84. Learning with Suboptimal Actions

    For $t = 1, \dots, T$:
    • The learner makes a prediction $\hat{c}_t \in \Theta$ of $c^\ast$.
    • The agent faces $X_t$ and takes a possibly suboptimal action $x_t \in X_t$.
    • The learner observes $(X_t, x_t)$ and updates $\hat{c}_t$ to $\hat{c}_{t+1}$.
    Define:
    • $\bar{R}_T^{c^\ast} := \sum_{t=1}^T \langle \hat{c}_t - c^\ast, \hat{x}_t - x_t\rangle$,
    • $V_T^{c^\ast} := \sum_{t=1}^T \langle \hat{c}_t - c^\ast, \hat{x}_t - x_t\rangle^2$,
    • $\Delta_T := \sum_{t=1}^T \big(\max\{\langle c^\ast, x\rangle : x \in X_t\} - \langle c^\ast, x_t\rangle\big)$ (cumulative suboptimality).
  85. Learning with Suboptimal Actions

    Theorem. For $\hat{c}_1, \dots, \hat{c}_T \in \Theta$ computed by MetaGrad with feedback subgradients $\hat{x}_t - x_t$, it holds that
    $R_T^{c^\ast} \le \bar{R}_T^{c^\ast} = O\big(n\log T + \sqrt{n\Delta_T\log T}\big)$.
    ✓ Sublinear in $\Delta_T$ (cf. corruption-robustness in bandits).
    ✓ Recovers $\bar{R}_T^{c^\ast} = O(n\log T)$ when $\Delta_T = 0$.
    Proof sketch. By the same argument as the regret analysis of MetaGrad,
    $\bar{R}_T^{c^\ast} = \sum_{t=1}^T \langle \hat{c}_t - c^\ast, \hat{x}_t - x_t\rangle \lesssim \sqrt{V_T^{c^\ast}\, n\log T} + n\log T$.
    Also, $V_T^{c^\ast} = \sum_{t=1}^T \langle \hat{c}_t - c^\ast, \hat{x}_t - x_t\rangle^2 \le \bar{R}_T^{c^\ast} + 2\Delta_T$ holds (cf. $V_T^{c^\ast} \le \bar{R}_T^{c^\ast}$ if every $x_t$ is optimal). The claim follows from the subadditivity of $x \mapsto \sqrt{x}$ and self-bounding.
  87. $\Omega(n)$ Lower Bound

    Focus on the regret $R_T^{c^\ast} = \sum_{t=1}^T \langle c^\ast, x_t - \hat{x}_t\rangle$.
    Theorem. For any, possibly randomized, learner, there is an instance such that $R_T^{c^\ast} = \Omega(n)$.
    Intuition: since $c^\ast \in \mathbb{R}^n$ is unknown, if the elements of $c^\ast$ are drawn at random and $X_1, \dots, X_n$ are restricted to line segments, any deterministic learner makes mistakes $\Omega(n)$ times in expectation. By Yao's minimax principle, for any randomized learner there is a worst-case instance on which $\Omega(n)$ regret is inevitable.
    Can the $\log T$ in the upper bound be removed?
  90. Revisiting the Cone-Based Approach

    Assume:
    • $\Theta = \{\theta \in \mathbb{R}^n : \|\theta\| = 1\} = \mathbb{S}^{n-1}$ and $\mathrm{diam}\,X_t \le 1$.
    • $x_t$ and $\hat{x}_t$ are optimal for $c^\ast$ and $\hat{c}_t$, respectively.
    Lemma (Besbes et al. 2023). Let $\theta(c^\ast, \hat{c}_t)$ denote the angle between $c^\ast, \hat{c}_t \in \mathbb{S}^{n-1}$. If $\theta(c^\ast, \hat{c}_t) \le \frac{\pi}{2}$, we have
    $\langle c^\ast, x_t - \hat{x}_t\rangle \le \cos\theta(c^\ast, x_t - \hat{x}_t) \le \sin\theta(c^\ast, \hat{c}_t)$.
    Proof idea: the first inequality uses $\|x_t - \hat{x}_t\| \le \mathrm{diam}\,X_t \le 1$. For the second, $\langle c^\ast, x_t - \hat{x}_t\rangle \ge 0$ and $\langle \hat{c}_t, x_t - \hat{x}_t\rangle \le 0$ must hold by optimality, so $x_t - \hat{x}_t$ lies at an angle of at least $\frac{\pi}{2} - \theta(c^\ast, \hat{c}_t)$ from $c^\ast$; therefore $\cos\theta(c^\ast, x_t - \hat{x}_t) \le \cos\big(\frac{\pi}{2} - \theta(c^\ast, \hat{c}_t)\big) = \sin\theta(c^\ast, \hat{c}_t)$.
  93. An $O(1)$-Regret Algorithm for $n = 2$

    Theorem. Algorithm 1 achieves $\mathbb{E}[R_T^{c^\ast}] \le 2\pi$, independent of $T$ (but extending it to $n > 2$ seems challenging, as discussed later).
    Here $\mathcal{N}_t := \{c \in \mathbb{S}^1 \mid \langle c, x_t - x\rangle \ge 0 \; \forall x \in X_t\}$ is the normal cone of $X_t$ at $x_t$, and $\mathcal{C}_t$ is the region such that $c^\ast \in \mathcal{C}_t$ does not contradict “$x_s \in \arg\max_{x \in X_s}\langle c^\ast, x\rangle$ for $s = 1, \dots, t-1$.”
  94. Proof

    Focus on round $t$. If $\hat{c}_t \in \mathrm{int}(\mathcal{N}_t)$, then $\hat{x}_t = x_t$ and hence $\langle c^\ast, x_t - \hat{x}_t\rangle = 0$. Taking the expectation over the draw of $\hat{c}_t \in \mathcal{C}_t$,
    $\mathbb{E}\langle c^\ast, x_t - \hat{x}_t\rangle = \Pr\big(\hat{c}_t \in \mathcal{C}_t \setminus \mathrm{int}\,\mathcal{N}_t\big)\,\mathbb{E}\big[\langle c^\ast, x_t - \hat{x}_t\rangle \mid \hat{c}_t \in \mathcal{C}_t \setminus \mathrm{int}\,\mathcal{N}_t\big] = \frac{A(\mathcal{C}_t \setminus \mathrm{int}\,\mathcal{N}_t)}{A(\mathcal{C}_t)}\,\mathbb{E}\big[\langle c^\ast, x_t - \hat{x}_t\rangle \mid \hat{c}_t \in \mathcal{C}_t \setminus \mathrm{int}\,\mathcal{N}_t\big]$,
    where $A(\cdot)$ denotes the arc length (= central angle).
    • If $A(\mathcal{C}_t) < \frac{\pi}{2}$: $\mathbb{E}\langle c^\ast, x_t - \hat{x}_t\rangle \le \frac{A(\mathcal{C}_t \setminus \mathrm{int}\,\mathcal{N}_t)}{A(\mathcal{C}_t)}\sin\theta(c^\ast, \hat{c}_t) \le \frac{A(\mathcal{C}_t \setminus \mathrm{int}\,\mathcal{N}_t)}{A(\mathcal{C}_t)}\sin A(\mathcal{C}_t) \le A(\mathcal{C}_t \setminus \mathrm{int}\,\mathcal{N}_t)$.
    • If $A(\mathcal{C}_t) \ge \frac{\pi}{2}$: $\mathbb{E}\langle c^\ast, x_t - \hat{x}_t\rangle \le \frac{A(\mathcal{C}_t \setminus \mathrm{int}\,\mathcal{N}_t)}{A(\mathcal{C}_t)} \cdot 1 \le \frac{2}{\pi}\,A(\mathcal{C}_t \setminus \mathrm{int}\,\mathcal{N}_t) \le A(\mathcal{C}_t \setminus \mathrm{int}\,\mathcal{N}_t)$.
    Hence $\mathbb{E}\langle c^\ast, x_t - \hat{x}_t\rangle \le A(\mathcal{C}_t \setminus \mathrm{int}\,\mathcal{N}_t)$ in any case. Since $\mathcal{C}_{t+1} \leftarrow \mathcal{C}_t \cap \mathcal{N}_t$, the removed arcs are disjoint, so
    $\mathbb{E}[R_T^{c^\ast}] = \sum_{t=1}^T \mathbb{E}\langle c^\ast, x_t - \hat{x}_t\rangle \le \sum_{t=1}^T A(\mathcal{C}_t \setminus \mathrm{int}\,\mathcal{N}_t) \le 2\pi$.
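A rough sketch of the $n = 2$ algorithm: represent $\mathcal{C}_t$ by a grid of candidate angles on $\mathbb{S}^1$, draw $\hat{c}_t$ uniformly from it, and keep only the angles consistent with the observed action. The finite action sets and the grid stand-in for the arc are simplifications of ours, for illustration only:

```python
import numpy as np

rng = np.random.default_rng(4)

def unit(phi):
    return np.array([np.cos(phi), np.sin(phi)])

def run(c_star, action_sets, grid_size=2000):
    cone = np.linspace(0.0, 2 * np.pi, grid_size, endpoint=False)  # C_1 = S^1
    regret = 0.0
    for X in action_sets:                       # X: rows = available actions
        phi = rng.choice(cone)                  # draw c_hat_t from C_t
        i_star = np.argmax(X @ c_star)          # agent's optimal action x_t
        x, x_hat = X[i_star], X[np.argmax(X @ unit(phi))]
        regret += c_star @ (x - x_hat)
        # C_{t+1} <- C_t ∩ N_t: keep angles for which x_t stays optimal.
        keep = np.array([np.argmax(X @ unit(p)) == i_star for p in cone])
        cone = cone[keep]
    return regret
```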
  96. Conclusion

    • $R_T^{c^\ast} \le \bar{R}_T^{c^\ast} = O(n\log T)$ by ONS.
    • $\bar{R}_T^{c^\ast} = O\big(n\log T + \sqrt{n\Delta_T\log T}\big)$ by MetaGrad for the possibly suboptimal case.
    • $R_T^{c^\ast} = \Omega(n)$.
    • $R_T^{c^\ast} = O(1)$ for $n = 2$.
    Future work:
    • Tight analysis for general $n$. Difficulty: $\sin\theta(c^\ast, \hat{c}_t) \le \sin A(\mathcal{C}_t)$ no longer holds.
    • Exploring other online-learning ideas useful for inverse optimization.