Online Inverse Linear Optimization

Mar. 7, 2025 @ KyotoU, KISS
Shinsaku Sakaue (Univ. Tokyo, RIKEN AIP)
Joint work with Taira Tsuchiya (Univ. Tokyo, RIKEN AIP), Han Bao (Kyoto Univ.), Taihei Oki (Hokkaido Univ.)
Preprint: https://arxiv.org/abs/2501.14349

Forward Optimization
$$\text{maximize } f_\theta(x) \quad \text{subject to } x \in X$$
Forward optimization: given $\theta$, find an optimal solution $x$.
• Decision variable $x \in \mathbb{R}^n$
• Constraint set $X \subseteq \mathbb{R}^n$ (known)
• Model parameter $\theta$ (a real vector)
Examples
• Seismic tomography: $\theta$ = geological features of certain zones. Seismic wave trajectories are modeled as shortest path problems. (Image source: https://www.geothermalnextgeneration.com/updates/seismic-tomography-a-cat-scan-of-the-earth)
• Customer behavior: $\theta$ = customer's preference. Purchase behavior is modeled as utility maximization. (Image source: https://en.wikipedia.org/wiki/Consumer_behaviour)

Inverse Optimization
$$\text{maximize } f_\theta(x) \quad \text{subject to } x \in X$$
Forward optimization: given $\theta$, find an optimal solution $x$.
Inverse optimization: estimate $\theta$ from an optimal solution $x$.
• Decision variable $x \in \mathbb{R}^n$
• Constraint set $X \subseteq \mathbb{R}^n$ (known)
• Model parameter $\theta$ (a real vector)
Examples
• Seismic tomography: estimate geological features $\theta$ from observed waves $x$. (Image source: https://www.geothermalnextgeneration.com/updates/seismic-tomography-a-cat-scan-of-the-earth)
• Customer behavior: estimate the customer's preference $\theta$ from purchase behavior $x$. (Image source: https://en.wikipedia.org/wiki/Consumer_behaviour)

Linear Optimization (forward model in this talk)
For $t = 1, \dots, T$, an agent solves
$$\text{maximize } \langle c^*, x \rangle \quad \text{subject to } x \in X_t.$$
• $c^* \in \mathbb{R}^n$ is the agent's internal objective vector.
• $c^*$ lies in a convex set $\Theta \subset \mathbb{R}^n$ with $\operatorname{diam} \Theta = 1$ ($\Theta$ is known to the learner).
• $X_t \subseteq \mathbb{R}^n$ is the agent's $t$th action set with $\operatorname{diam} X_t = 1$.
  – Not necessarily convex, but we assume an oracle that solves linear optimization over $X_t$.
• Let $x_t \in \arg\max \{ \langle c^*, x \rangle : x \in X_t \}$.
Example (figure): an agent with budget 2 and $c^* = (0.2, 0.4, 0.1, 0.0, 0.3)$ faces an action set $X_t$ with a sold-out item and takes $x_t = (1, 0, 0, 0, 1)$.
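The forward model in the example can be sketched as a brute-force oracle over a finite action set. This is only an illustration: the slide does not say which item is sold out, so the code assumes it is the second one (which is consistent with the shown $x_t$), and it ignores the $\operatorname{diam} X_t = 1$ normalization. The names `forward_oracle` and `sold_out` are hypothetical.

```python
from itertools import product

def forward_oracle(c, action_set):
    """Return an action in action_set maximizing the inner product <c, x>."""
    return max(action_set, key=lambda x: sum(ci * xi for ci, xi in zip(c, x)))

c_star = (0.2, 0.4, 0.1, 0.0, 0.3)  # agent's objective vector from the slide
budget = 2                          # at most two items can be chosen
sold_out = {1}                      # ASSUMPTION: the second item is the sold-out one

# X_t: all 0/1 vectors within budget that avoid the sold-out item.
X_t = [x for x in product((0, 1), repeat=len(c_star))
       if sum(x) <= budget and all(x[i] == 0 for i in sold_out)]

x_t = forward_oracle(c_star, X_t)
print(x_t)  # (1, 0, 0, 0, 1)
```

With the most valuable item unavailable, the agent falls back to the first and fifth items, which matches the action shown on the slide.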

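Inverse Linear Optimization
For $t = 1, \dots, T$, an agent solves
$$\text{maximize } \langle c^*, x \rangle \quad \text{subject to } x \in X_t.$$
A learner aims to infer $c^*$ from $\{(X_t, x_t)\}_{t=1}^T$.
Example (figure): the learner observes the action set $X_t$ (with a sold-out item, budget 2) and the agent's action $x_t = (1, 0, 0, 0, 1)$, but not $c^* = (0.2, 0.4, 0.1, 0.0, 0.3)$.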

Online Inverse Linear Optimization (Bärmann et al. 2017)
For $t = 1, \dots, T$:
1. Learner makes a prediction $\hat c_t \in \Theta$ of $c^*$.
2. Agent faces $X_t$ and takes action $x_t \in \arg\max_{x \in X_t} \langle c^*, x \rangle$.
3. Learner observes $(X_t, x_t)$ and updates $\hat c_t$ to $\hat c_{t+1}$.
Let $\hat x_t \in \arg\max \{ \langle \hat c_t, x \rangle : x \in X_t \}$ and define the regret $R_T^{c^*}$ as
$$R_T^{c^*} := \sum_{t=1}^T \langle c^*, x_t - \hat x_t \rangle = \sum_{t=1}^T \bigl( \langle c^*, x_t \rangle - \langle c^*, \hat x_t \rangle \bigr),$$
where $\langle c^*, x_t \rangle$ is the optimal objective value and $\langle c^*, \hat x_t \rangle$ is the objective value achieved by following the learner's prediction $\hat c_t$.
• $R_T^{c^*}$ measures the quality of actions suggested by predictions.
• $R_T^{c^*}$ is non-negative; $R_T^{c^*} = 0$ if $c^* = \hat c_t$ for all $t$; smaller is better.

Convenient Upper Bound on Regret (Bärmann et al. 2017)
For $\hat c_t$, $\hat x_t \in \arg\max \{ \langle \hat c_t, x \rangle : x \in X_t \}$, $c^*$, and $x_t \in \arg\max \{ \langle c^*, x \rangle : x \in X_t \}$, define
$$\tilde R_T^{c^*} := \sum_{t=1}^T \langle \hat c_t - c^*, \hat x_t - x_t \rangle.$$
Expanding the inner product,
$$\tilde R_T^{c^*} = \sum_{t=1}^T \langle \hat c_t, \hat x_t - x_t \rangle + \sum_{t=1}^T \langle c^*, x_t - \hat x_t \rangle = \sum_{t=1}^T \langle \hat c_t, \hat x_t - x_t \rangle + R_T^{c^*}.$$
The first sum is non-negative because $\hat x_t$ is optimal for $\hat c_t$, and the second sum is the regret $R_T^{c^*}$; hence $\tilde R_T^{c^*} \ge R_T^{c^*}$.
What is the additional term?
$$\langle \hat c_t, \hat x_t - x_t \rangle = \max \{ \langle \hat c_t, x \rangle : x \in X_t \} - \langle \hat c_t, x_t \rangle,$$
i.e., the optimal value for $\hat c_t$ minus the objective value achieved by $x_t$ for $\hat c_t$.
• Takes zero if $c^* = \hat c_t$; quantifies how well $\hat c_t$ explains the agent's choice $x_t$.
• Called the suboptimality loss in inverse optimization (Mohajerin Esfahani et al. 2018).
• This alone is sometimes meaningless: $\hat c_t = 0$ trivially attains zero suboptimality loss.
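The decomposition above (upper-bound term = suboptimality loss + regret term) can be checked numerically for a single round. The sketch below uses a small hypothetical action set and hypothetical vectors `c_star` and `c_hat`; none of these values come from the talk.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def argmax_linear(c, action_set):
    """Linear optimization oracle over a finite action set."""
    return max(action_set, key=lambda x: dot(c, x))

def round_quantities(c_star, c_hat, action_set):
    x_t = argmax_linear(c_star, action_set)    # agent's optimal action
    x_hat = argmax_linear(c_hat, action_set)   # action suggested by the prediction
    regret = dot(c_star, x_t) - dot(c_star, x_hat)   # <c*, x_t - x_hat_t>
    subopt = dot(c_hat, x_hat) - dot(c_hat, x_t)     # suboptimality loss of c_hat
    diff_c = [a - b for a, b in zip(c_hat, c_star)]
    diff_x = [a - b for a, b in zip(x_hat, x_t)]
    upper = dot(diff_c, diff_x)                      # <c_hat - c*, x_hat_t - x_t>
    return regret, subopt, upper

# Hypothetical data for illustration only.
X = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.3, 0.3)]
c_star = (0.1, 0.4)
c_hat = (0.4, 0.1)

regret, subopt, upper = round_quantities(c_star, c_hat, X)
print(regret, subopt, upper)
```

Here the prediction picks $(0.5, 0)$ while the agent picks $(0, 0.5)$, so both the regret term and the suboptimality loss are positive and they add up to the upper-bound term.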

Related Work: Online Learning Approach (Bärmann et al. 2017)
Consider making the upper bound $\tilde R_T^{c^*} = \sum_{t=1}^T \langle \hat c_t - c^*, \hat x_t - x_t \rangle$ as small as possible.
Regarding $f_t : \Theta \ni c \mapsto \langle c, \hat x_t - x_t \rangle$ as a linear cost function,
$$\tilde R_T^{c^*} = \sum_{t=1}^T \langle \hat c_t, \hat x_t - x_t \rangle - \sum_{t=1}^T \langle c^*, \hat x_t - x_t \rangle = \sum_{t=1}^T f_t(\hat c_t) - \sum_{t=1}^T f_t(c^*),$$
which is the standard regret in online learning.
By using online linear optimization (OLO) methods (e.g., OGD) to compute $\hat c_t$, we obtain
$$R_T^{c^*} \le \tilde R_T^{c^*} = \sum_{t=1}^T \langle \hat c_t, \hat x_t - x_t \rangle + \sum_{t=1}^T \langle c^*, x_t - \hat x_t \rangle = O(\sqrt{T}),$$
achieving a vanishing regret (and cumulative suboptimality loss) on average as $T \to \infty$.
The rate of $\sqrt{T}$ is optimal in general OLO. Is it also optimal in online inverse linear optimization?
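The online learning approach can be sketched with projected online gradient descent on the linear costs $f_t(c) = \langle c, \hat x_t - x_t \rangle$. The sketch makes several assumptions that are not in the talk: $\Theta$ is taken to be the Euclidean ball of radius 1/2 (so that $\operatorname{diam} \Theta = 1$), the action sets are random finite sets, and the step size is $1/\sqrt{T}$ (i.e., $G = 1$).

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def argmax_linear(c, action_set):
    return max(action_set, key=lambda x: dot(c, x))

def project_ball(c, radius):
    """Euclidean projection onto the ball of the given radius."""
    norm = math.sqrt(dot(c, c))
    return list(c) if norm <= radius else [radius * ci / norm for ci in c]

def run_ogd(c_star, action_sets, eta, radius=0.5):
    c_hat = [0.0] * len(c_star)
    regret = upper = 0.0
    for X_t in action_sets:
        x_t = argmax_linear(c_star, X_t)     # agent's action
        x_hat = argmax_linear(c_hat, X_t)    # learner's suggested action
        regret += dot(c_star, x_t) - dot(c_star, x_hat)
        grad = [a - b for a, b in zip(x_hat, x_t)]   # gradient of f_t
        upper += dot([a - b for a, b in zip(c_hat, c_star)], grad)
        c_hat = project_ball([ci - eta * gi for ci, gi in zip(c_hat, grad)], radius)
    return regret, upper

random.seed(0)
n, T = 3, 200
c_star = [0.3, -0.2, 0.1]   # hypothetical objective inside the ball
action_sets = [[[random.uniform(-0.25, 0.25) for _ in range(n)] for _ in range(5)]
               for _ in range(T)]
regret, upper = run_ogd(c_star, action_sets, eta=1.0 / math.sqrt(T))
print(regret, upper)
```

The learner only uses $(X_t, x_t)$ for its update; `c_star` enters the loop solely to simulate the agent and to evaluate the regret.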

Related Work: Ellipsoid-Based Method (Besbes et al. 2023)
There is a method achieving $R_T^{c^*} = O(n^4 \log T)$, going beyond the limit of OLO!
High-level idea
• Maintain a cone $\mathcal{C}_t$ of candidates for $c^*$.
• After observing $(X_t, x_t)$, $\mathcal{C}_t$ can be narrowed down:
$$\mathcal{C}_{t+1} \leftarrow \mathcal{C}_t \cap \{ c \in \Theta \mid \langle c, x_t \rangle \ge \langle c, x \rangle \text{ for all } x \in X_t \},$$
where the second set is the normal cone of $X_t$ at $x_t$: the cone of vectors that make $x_t$ optimal over $X_t$.
• Slightly inflate $\mathcal{C}_t$, making it an ellipsoidal cone. Based on the volume argument of the ellipsoid method for LPs, we can strike a good balance of exploration vs. exploitation.
(Figure: Figure 4 in Besbes, Fonseca, Lobel. Contextual Inverse Optimization: Offline and Online Learning (Oper. Res. 2023).)
But there are downsides:
• The $n^4$ factor is prohibitive for large $n$.
• Not very efficient, albeit polynomial in $n$ and $T$.

Our Results
Theorem: There is a method achieving $R_T^{c^*} \le \tilde R_T^{c^*} = O(n \log T)$.
• Improves $R_T^{c^*} = O(n^4 \log T)$ of Besbes et al. (2023) by a factor of $n^3$.
• Applies to the upper bound $\tilde R_T^{c^*} \ge R_T^{c^*}$.
• More efficient: based on the online Newton step (ONS), rather than the ellipsoid method.
And more:
• Dealing with suboptimal feedback $x_t$ with MetaGrad (ONS with multiple learning rates).
• Lower bound of $R_T^{c^*} = \Omega(n)$, implying tightness regarding $n$.
• $R_T^{c^*} = O(1)$ for $n = 2$ based on the method of Besbes et al. (2023).

$O(n \log T)$ via ONS-Based Method

Online Convex Optimization
For $t = 1, \dots, T$:
1. Learner plays $\hat c_t$.
2. Environment reveals $f_t$ ($f_t$ may be reactive to $\hat c_t$).
3. Learner incurs $f_t(\hat c_t)$ and computes $\hat c_{t+1}$.
Assumptions
• Learner's domain $\Theta$ is convex, and $\operatorname{diam} \Theta = 1$.
• Learner can use information up to the end of round $t$ when computing $\hat c_{t+1}$.
• Loss function $f_t : \Theta \to \mathbb{R}$ is convex.
For any comparator $c^* \in \Theta$, the learner aims to make the regret as small as possible:
$$\sum_{t=1}^T \bigl( f_t(\hat c_t) - f_t(c^*) \bigr).$$

Exp-Concave Loss
A function $f : \Theta \to \mathbb{R}$ is $\alpha$-exp-concave for some $\alpha > 0$ if the following $g : \Theta \to \mathbb{R}$ is concave:
$$g : c \mapsto e^{-\alpha f(c)}.$$
If $f : \Theta \to \mathbb{R}$ is twice differentiable, $\alpha$-exp-concavity is equivalent to
$$\nabla^2 f(c) \succeq \alpha \nabla f(c) \nabla f(c)^\top \quad \forall c \in \Theta.$$
Cf. $\alpha$-strong convexity requires $\nabla^2 f(c) \succeq \frac{\alpha}{2} I$.
Examples
• $f(c) = -\log(r^\top c)$ (appears in portfolio theory) satisfies
$$\nabla^2 f(c) = \frac{r r^\top}{(r^\top c)^2} = \nabla f(c) \nabla f(c)^\top.$$
• $f(c) = (r^\top c)^2$ for $\|r\|, \|c\| \le 1$ (used later) satisfies
$$\nabla^2 f(c) = 2 r r^\top = \frac{\nabla f(c) \nabla f(c)^\top}{2 (r^\top c)^2} \succeq \frac{1}{2} \nabla f(c) \nabla f(c)^\top.$$
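The second example can be sanity-checked numerically: for $f(c) = (r^\top c)^2$ with $\|r\|, \|c\| \le 1$ and $\alpha = 1/2$, the function $g(c) = e^{-\alpha f(c)}$ should satisfy the midpoint concavity inequality. The sketch below samples random points in the unit ball; the sampling scheme and the tolerance are choices made here for illustration, and a numerical check of this kind is not a proof.

```python
import math
import random

def g(c, r, alpha=0.5):
    """exp(-alpha * f(c)) for f(c) = (r^T c)^2."""
    s = sum(ri * ci for ri, ci in zip(r, c))
    return math.exp(-alpha * s * s)

def random_point_in_unit_ball(n, rng):
    """A random point with Euclidean norm below 1 (not uniformly distributed)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    scale = rng.random() / norm
    return [scale * x for x in v]

rng = random.Random(0)
n = 4
violations = 0
for _ in range(1000):
    r, a, b = (random_point_in_unit_ball(n, rng) for _ in range(3))
    mid = [(x + y) / 2 for x, y in zip(a, b)]
    # Midpoint concavity: g((a + b) / 2) >= (g(a) + g(b)) / 2.
    if g(mid, r) < 0.5 * (g(a, r) + g(b, r)) - 1e-12:
        violations += 1
print(violations)
```

The check works because $s = r^\top c$ stays in $[-1, 1]$, where $s \mapsto e^{-s^2/2}$ has a non-positive second derivative.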

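ONS Regret Bound (Hazan et al. 2007)
Set $\hat c_1 \in \Theta$. Fix $\gamma, \varepsilon > 0$ and let $A_0 = \varepsilon I$.
For $t = 1, \dots, T$:
  Output $\hat c_t$ and observe $f_t$
  $A_t \leftarrow A_{t-1} + \nabla f_t(\hat c_t) \nabla f_t(\hat c_t)^\top$
  $\hat c_{t+1} \leftarrow \arg\min \{ \| \hat c_t - \gamma^{-1} A_t^{-1} \nabla f_t(\hat c_t) - c \|_{A_t} : c \in \Theta \}$
Here, for a PSD matrix $A$, $\|x\|_A := \sqrt{x^\top A x}$.
Assume:
• $f_1, \dots, f_T$ are twice differentiable and $\alpha$-exp-concave,
• $\|\nabla f_t(\hat c_t)\| \le G$ for all $t$.
Let $\gamma = \frac{1}{2} \min\{1/G, \alpha\}$ and $\varepsilon = 1/\gamma^2$. Then, $\hat c_1, \dots, \hat c_T \in \Theta$ computed by ONS satisfy
$$\sum_{t=1}^T \bigl( f_t(\hat c_t) - f_t(c^*) \bigr) = O\Bigl( n \Bigl( \frac{1}{\alpha} + G \Bigr) \log T \Bigr).$$
A well-known result. We'll get back to this later.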

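Our ONS-based Method
Set $\hat c_1 \in \Theta$.
For $t = 1, \dots, T$:
  Output $\hat c_t$ and observe $(X_t, x_t)$
  Compute $\hat x_t \in \arg\max \{ \langle \hat c_t, x \rangle : x \in X_t \}$
  Get $\hat c_{t+1}$ via ONS applied to $f_t^\eta$
For $\eta \in (0, 1)$ (specified later), define $f_t^\eta : \Theta \to \mathbb{R}$ by
$$f_t^\eta(c) := -\eta \langle \hat c_t - c, \hat x_t - x_t \rangle + \eta^2 \langle \hat c_t - c, \hat x_t - x_t \rangle^2.$$
Theorem: For $\hat c_1, \dots, \hat c_T \in \Theta$ computed by the above method, it holds that
$$R_T^{c^*} \le \tilde R_T^{c^*} = O(n \log T).$$
Recall: regret $R_T^{c^*} = \sum_{t=1}^T \langle c^*, x_t - \hat x_t \rangle$; upper bound $\tilde R_T^{c^*} = \sum_{t=1}^T \langle \hat c_t - c^*, \hat x_t - x_t \rangle$.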
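A minimal pure-Python sketch of the method above, under assumptions that are not in the talk: $\Theta$ is the Euclidean ball of radius 1/2, the action sets are random finite sets, and `gamma = 0.25` is an illustrative value rather than the tuned constant. The $A_t$-norm projection is computed by bisection on the KKT multiplier, which is an implementation choice made here. The gradient used is $\nabla f_t^\eta(\hat c_t) = \eta (\hat x_t - x_t)$.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def argmax_linear(c, action_set):
    return max(action_set, key=lambda x: dot(c, x))

def solve(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(M, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][k] * z[k] for k in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z

def project_ball_in_A_norm(y, A, radius):
    """argmin of (c - y)^T A (c - y) over ||c|| <= radius, via bisection on mu."""
    if math.sqrt(dot(y, y)) <= radius:
        return list(y)
    n = len(y)
    Ay = [dot(row, y) for row in A]

    def candidate(mu):  # KKT point: (A + mu I) c = A y
        shifted = [[A[i][j] + (mu if i == j else 0.0) for j in range(n)]
                   for i in range(n)]
        return solve(shifted, Ay)

    lo, hi = 0.0, math.sqrt(dot(Ay, Ay)) / radius  # candidate(hi) is feasible
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        c = candidate(mid)
        if math.sqrt(dot(c, c)) > radius:
            lo = mid
        else:
            hi = mid
    return candidate(hi)

def run_ons(c_star, action_sets, eta=0.5, gamma=0.25, radius=0.5):
    n = len(c_star)
    eps = 1.0 / gamma ** 2
    A = [[eps if i == j else 0.0 for j in range(n)] for i in range(n)]  # A_0
    c_hat = [0.0] * n
    regret = upper = 0.0
    for X_t in action_sets:
        x_t = argmax_linear(c_star, X_t)      # agent's action (simulated)
        x_hat = argmax_linear(c_hat, X_t)     # learner's suggested action
        d = [a - b for a, b in zip(x_hat, x_t)]
        regret += dot(c_star, x_t) - dot(c_star, x_hat)
        upper += dot([a - b for a, b in zip(c_hat, c_star)], d)
        grad = [eta * di for di in d]         # gradient of f_t^eta at c_hat
        for i in range(n):
            for j in range(n):
                A[i][j] += grad[i] * grad[j]  # A_t = A_{t-1} + grad grad^T
        newton = solve(A, grad)               # A_t^{-1} grad
        y = [ci - ni / gamma for ci, ni in zip(c_hat, newton)]
        c_hat = project_ball_in_A_norm(y, A, radius)
    return regret, upper, c_hat

random.seed(0)
n, T = 3, 100
c_star = [0.3, -0.2, 0.1]  # hypothetical objective inside the ball
action_sets = [[[random.uniform(-0.25, 0.25) for _ in range(n)] for _ in range(5)]
               for _ in range(T)]
regret, upper, c_final = run_ons(c_star, action_sets)
print(regret, upper)
```

Each round costs one linear solve plus the projection; a more careful implementation would maintain $A_t^{-1}$ with rank-one updates instead of re-solving.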

Regret Analysis
$f_t^\eta(c) := -\eta \langle \hat c_t - c, \hat x_t - x_t \rangle + \eta^2 \langle \hat c_t - c, \hat x_t - x_t \rangle^2$ with constant $\eta \in (0, 1)$ enjoys $\Omega(1)$-exp-concavity and $\|\nabla f_t^\eta(\hat c_t)\| = O(1)$ (by elementary calculation).
ONS regret bound:
$$\sum_{t=1}^T \bigl( f_t^\eta(\hat c_t) - f_t^\eta(c^*) \bigr) = O(n \log T).$$
Proof
Define $V_T^{c^*} := \sum_{t=1}^T \langle \hat c_t - c^*, \hat x_t - x_t \rangle^2$, which satisfies
$$V_T^{c^*} \le \sum_{t=1}^T \|\hat c_t - c^*\| \, \|\hat x_t - x_t\| \, \langle \hat c_t - c^*, \hat x_t - x_t \rangle \le \tilde R_T^{c^*},$$
because $\langle \hat c_t - c^*, \hat x_t - x_t \rangle \ge 0$, by Cauchy–Schwarz, and since $\|\hat c_t - c^*\| \le 1$ and $\|\hat x_t - x_t\| \le 1$.
Using $f_t^\eta(\hat c_t) = 0$,
$$\tilde R_T^{c^*} = \sum_{t=1}^T \langle \hat c_t - c^*, \hat x_t - x_t \rangle = -\frac{1}{\eta} \sum_{t=1}^T f_t^\eta(c^*) + \eta V_T^{c^*} \le \frac{1}{\eta} \sum_{t=1}^T \bigl( f_t^\eta(\hat c_t) - f_t^\eta(c^*) \bigr) + \eta \tilde R_T^{c^*}.$$
Hence
$$(1 - \eta) \tilde R_T^{c^*} \le \frac{1}{\eta} \sum_{t=1}^T \bigl( f_t^\eta(\hat c_t) - f_t^\eta(c^*) \bigr) = O(n \log T),$$
and setting $\eta = \frac{1}{2}$ completes the proof.

Online Newton Step
Elad Hazan, Amit Agarwal, and Satyen Kale (Mach. Learn. 2007)

Warm-up: Online Gradient Descent
Set $\hat c_1 \in \Theta$. Fix $\eta > 0$.
For $t = 1, \dots, T$:
  Output $\hat c_t$ and observe $f_t$
  $\hat c_{t+1} \leftarrow \arg\min \{ \| \hat c_t - \eta \nabla f_t(\hat c_t) - c \| : c \in \Theta \}$
From the update rule and the Pythagorean theorem,
$$\|\hat c_{t+1} - c^*\|^2 \le \|\hat c_t - \eta \nabla f_t(\hat c_t) - c^*\|^2 = \|\hat c_t - c^*\|^2 + \eta^2 \|\nabla f_t(\hat c_t)\|^2 - 2\eta \langle \nabla f_t(\hat c_t), \hat c_t - c^* \rangle.$$
Summing over $t$ and ignoring $-\|\hat c_{T+1} - c^*\|^2 \le 0$,
$$\sum_{t=1}^T \langle \nabla f_t(\hat c_t), \hat c_t - c^* \rangle \le \sum_{t=1}^T \frac{\|\hat c_t - c^*\|^2 - \|\hat c_{t+1} - c^*\|^2}{2\eta} + \frac{\eta}{2} \sum_{t=1}^T \|\nabla f_t(\hat c_t)\|^2 \le \frac{\|\hat c_1 - c^*\|^2}{2\eta} + \frac{\eta}{2} \sum_{t=1}^T \|\nabla f_t(\hat c_t)\|^2 \le \frac{1}{2} \Bigl( \frac{1}{\eta} + \eta G^2 T \Bigr) = G\sqrt{T},$$
where the last inequality uses $\operatorname{diam} \Theta = 1$ and $\|\nabla f_t(\hat c_t)\| \le G$, and the final equality sets $\eta = 1/(G\sqrt{T})$.
By the convexity of $f_t$,
$$\sum_{t=1}^T \bigl( f_t(\hat c_t) - f_t(c^*) \bigr) \le \sum_{t=1}^T \langle \nabla f_t(\hat c_t), \hat c_t - c^* \rangle \le G\sqrt{T}.$$

Closer Look at the $\sqrt{T}$ Rate
Let $\nabla_t = \nabla f_t(\hat c_t)$ for brevity. In sum, the $O(\sqrt{T})$ regret follows from
$$\sum_{t=1}^T \langle \nabla_t, \hat c_t - c^* \rangle \le \underbrace{\frac{1}{2\eta} \Bigl( \|\hat c_1 - c^*\|^2 - \|\hat c_2 - c^*\|^2 + \|\hat c_2 - c^*\|^2 - \cdots - \|\hat c_t - c^*\|^2 + \|\hat c_t - c^*\|^2 - \cdots - \|\hat c_{T+1} - c^*\|^2 \Bigr)}_{\text{penalty}} + \underbrace{\frac{\eta}{2} \sum_{t=1}^T \|\nabla_t\|^2}_{\text{stability}} \le \frac{1}{2} \Bigl( \frac{1}{\eta} + \eta G^2 T \Bigr) = G\sqrt{T}$$
with $\eta = 1/(G\sqrt{T})$. In the penalty, each intermediate pair $-\|\hat c_t - c^*\|^2 + \|\hat c_t - c^*\|^2$ equals $0$, and the last term $-\|\hat c_{T+1} - c^*\|^2$ is ignored.
If the penalty and stability sum to $O(1)$ and $O(T)$, respectively, the regret scales with $\sqrt{T}$.
Consider achieving a better stability via the elliptical potential lemma.

Elliptical Potential Lemma
Recall $\|x\|_A := \sqrt{x^\top A x}$ for a PSD matrix $A$.
Let $A_0 = \varepsilon I$ and $A_t = A_{t-1} + \nabla_t \nabla_t^\top$ for $t = 1, \dots, T$, and assume $\|\nabla_t\| \le G$. Then, it holds that
$$\sum_{t=1}^T \|\nabla_t\|_{A_t^{-1}}^2 \le \log \frac{\det A_T}{\det A_0} \le n \log \Bigl( \frac{T G^2}{\varepsilon} + 1 \Bigr).$$
Proof
With $A \bullet B$ denoting the trace inner product,
$$\|\nabla_t\|_{A_t^{-1}}^2 = \nabla_t^\top A_t^{-1} \nabla_t = A_t^{-1} \bullet \nabla_t \nabla_t^\top = A_t^{-1} \bullet (A_t - A_{t-1}) \le \log \frac{\det A_t}{\det A_{t-1}}.$$
The inequality follows from $a^{-1}(a - b) \le \log \frac{a}{b}$ for $a, b > 0$, carefully applied to eigenvalues. It is also known as the concavity of log-det for PSD matrices:
$$\nabla \log \det A \bullet (A - B) = A^{-1} \bullet (A - B) \le \log \frac{\det A}{\det B} = \log \det A - \log \det B.$$
Hence $\sum_{t=1}^T \|\nabla_t\|_{A_t^{-1}}^2 \le \log \frac{\det A_T}{\det A_0}$ by telescoping. Using $\det A_0 = \varepsilon^n$ and $\det A_T \le (T G^2 + \varepsilon)^n$ yields the bound.
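The lemma can be checked numerically without any linear algebra library. The sketch below relies on two standard identities: Sherman–Morrison for the rank-one update of $A_t^{-1}$ (which gives $\nabla_t^\top A_t^{-1} \nabla_t = q/(1+q)$ with $q = \nabla_t^\top A_{t-1}^{-1} \nabla_t$) and the matrix determinant lemma ($\det A_t = \det A_{t-1} \cdot (1 + q)$). The random gradients and the parameter values are hypothetical.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def elliptical_potential(grads, eps):
    """Return (sum_t ||g_t||^2 in the A_t^{-1} norm, log(det A_T / det A_0))."""
    n = len(grads[0])
    A_inv = [[(1.0 / eps) if i == j else 0.0 for j in range(n)] for i in range(n)]
    potential = 0.0
    log_det_ratio = 0.0
    for g in grads:
        Ag = [dot(row, g) for row in A_inv]   # A_{t-1}^{-1} g_t
        q = dot(g, Ag)                        # g_t^T A_{t-1}^{-1} g_t
        potential += q / (1.0 + q)            # equals g_t^T A_t^{-1} g_t
        log_det_ratio += math.log1p(q)        # matrix determinant lemma
        for i in range(n):                    # Sherman-Morrison update of the inverse
            for j in range(n):
                A_inv[i][j] -= Ag[i] * Ag[j] / (1.0 + q)
    return potential, log_det_ratio

rng = random.Random(0)
n, T, G, eps = 3, 500, 1.0, 1.0
grads = []
for _ in range(T):
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(dot(v, v))
    grads.append([G * rng.random() * x / norm for x in v])  # ||g_t|| <= G

potential, log_det_ratio = elliptical_potential(grads, eps)
bound = n * math.log(T * G ** 2 / eps + 1.0)
print(potential, log_det_ratio, bound)
```

Per round, the first inequality of the lemma reduces to the scalar fact $q/(1+q) \le \log(1+q)$ for $q \ge 0$, which is the eigenvalue argument of the proof in its simplest form.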

Online Newton Step
Set $\hat c_1 \in \Theta$. Fix $\gamma, \varepsilon > 0$ and let $A_0 = \varepsilon I$.
For $t = 1, \dots, T$:
  Output $\hat c_t$ and observe $f_t$
  $A_t \leftarrow A_{t-1} + \nabla_t \nabla_t^\top$
  $\hat c_{t+1} \leftarrow \arg\min \{ \| \hat c_t - \gamma^{-1} A_t^{-1} \nabla_t - c \|_{A_t} : c \in \Theta \}$
From the update rule and the Pythagorean theorem,
$$\|\hat c_{t+1} - c^*\|_{A_t}^2 \le \|\hat c_t - \gamma^{-1} A_t^{-1} \nabla_t - c^*\|_{A_t}^2 = \|\hat c_t - c^*\|_{A_t}^2 + \gamma^{-2} \|\nabla_t\|_{A_t^{-1}}^2 - 2\gamma^{-1} \langle \nabla_t, \hat c_t - c^* \rangle.$$
Summing over $t$,
$$\sum_{t=1}^T \langle \nabla_t, \hat c_t - c^* \rangle \le \frac{\gamma}{2} \sum_{t=1}^T \bigl( \|\hat c_t - c^*\|_{A_t}^2 - \|\hat c_{t+1} - c^*\|_{A_t}^2 \bigr) + \frac{1}{2\gamma} \sum_{t=1}^T \|\nabla_t\|_{A_t^{-1}}^2 \le \frac{\gamma}{2} \sum_{t=1}^T \bigl( \|\hat c_t - c^*\|_{A_t}^2 - \|\hat c_{t+1} - c^*\|_{A_t}^2 \bigr) + \frac{n}{2\gamma} \log \Bigl( \frac{T G^2}{\varepsilon} + 1 \Bigr),$$
where the last step uses the elliptical potential lemma.
Take a closer look at the penalty (the first sum).

ONS: Penalty
$$\sum_{t=1}^T \bigl( \|\hat c_t - c^*\|_{A_t}^2 - \|\hat c_{t+1} - c^*\|_{A_t}^2 \bigr) = \|\hat c_1 - c^*\|_{A_1}^2 - \|\hat c_2 - c^*\|_{A_1}^2 + \|\hat c_2 - c^*\|_{A_2}^2 - \cdots - \|\hat c_t - c^*\|_{A_{t-1}}^2 + \|\hat c_t - c^*\|_{A_t}^2 - \cdots - \|\hat c_{T+1} - c^*\|_{A_T}^2.$$
Each intermediate pair satisfies
$$-\|\hat c_t - c^*\|_{A_{t-1}}^2 + \|\hat c_t - c^*\|_{A_t}^2 = (\hat c_t - c^*)^\top (A_t - A_{t-1}) (\hat c_t - c^*) = \bigl( \nabla_t^\top (\hat c_t - c^*) \bigr)^2,$$
and the last term $-\|\hat c_{T+1} - c^*\|_{A_T}^2$ is ignored. Therefore the penalty is at most
$$\|\hat c_1 - c^*\|_{A_0}^2 + \bigl( \|\hat c_1 - c^*\|_{A_1}^2 - \|\hat c_1 - c^*\|_{A_0}^2 \bigr) + \sum_{t=2}^T \bigl( \nabla_t^\top (\hat c_t - c^*) \bigr)^2 \le \varepsilon + \sum_{t=1}^T \bigl( \nabla_t^\top (\hat c_t - c^*) \bigr)^2.$$
Therefore, setting $\varepsilon = 1/\gamma^2$,
$$\sum_{t=1}^T \langle \nabla_t, \hat c_t - c^* \rangle \le \frac{\gamma \varepsilon}{2} + \frac{\gamma}{2} \sum_{t=1}^T \bigl( \nabla_t^\top (\hat c_t - c^*) \bigr)^2 + \frac{n}{2\gamma} \log \Bigl( \frac{T G^2}{\varepsilon} + 1 \Bigr)$$
$$\Longrightarrow \quad \sum_{t=1}^T \Bigl( \langle \nabla_t, \hat c_t - c^* \rangle - \frac{\gamma}{2} \bigl( \nabla_t^\top (\hat c_t - c^*) \bigr)^2 \Bigr) \le \frac{n}{2\gamma} \log \bigl( T G^2 \gamma^2 + 1 \bigr) + \frac{1}{2\gamma}.$$

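ONS: Using Exp-Concavity
If $f_t$ is $\alpha$-exp-concave, for $\gamma \le \frac{1}{2} \min\{\alpha, 1/G\}$,
$$f_t(\hat c_t) - f_t(c^*) \le \langle \nabla_t, \hat c_t - c^* \rangle - \frac{\gamma}{2} \bigl( \nabla_t^\top (\hat c_t - c^*) \bigr)^2.$$
(Cf. if $f_t$ is convex, $f_t(\hat c_t) - f_t(c^*) \le \langle \nabla_t, \hat c_t - c^* \rangle$.)
Therefore, setting $\gamma = \frac{1}{2} \min\{\alpha, 1/G\}$, we obtain the desired regret bound:
$$\sum_{t=1}^T \bigl( f_t(\hat c_t) - f_t(c^*) \bigr) \le \sum_{t=1}^T \Bigl( \langle \nabla_t, \hat c_t - c^* \rangle - \frac{\gamma}{2} \bigl( \nabla_t^\top (\hat c_t - c^*) \bigr)^2 \Bigr) \le \frac{n}{2\gamma} \log \bigl( T G^2 \gamma^2 + 1 \bigr) + \frac{1}{2\gamma} \le n \Bigl( \frac{1}{\alpha} + G \Bigr) \Bigl( \log \Bigl( \frac{T}{4} + 1 \Bigr) + 1 \Bigr).$$
The last step uses $\frac{1}{\min\{\alpha, 1/G\}} \le \frac{1}{\alpha} + G$ (harmonic mean $\le n \times \min$).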

MetaGrad
Tim van Erven, Wouter M. Koolen, and Dirk van der Hoeven (NeurIPS 2016, JMLR 2021)

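ONS Requires Prior Knowledge of $\alpha$
Set $\hat c_1 \in \Theta$. Fix $\gamma = \frac{1}{2} \min\{\alpha, 1/G\}$, $\varepsilon = 1/\gamma^2$ and let $A_0 = \varepsilon I$.
For $t = 1, \dots, T$:
  Output $\hat c_t$ and observe $f_t$
  $A_t \leftarrow A_{t-1} + \nabla_t \nabla_t^\top$
  $\hat c_{t+1} \leftarrow \arg\min \{ \| \hat c_t - \gamma^{-1} A_t^{-1} \nabla_t - c \|_{A_t} : c \in \Theta \}$
We may not know $\alpha$ in advance. Even worse, we may encounter $\alpha = 0$ at some round. ONS fails in such uncertain situations.
Adapt to the uncertainty with multiple ONS!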

MetaGrad: High-Level Idea
• Loss functions $f_t$ are $\alpha$-exp-concave, but $\alpha$ is unknown, possibly $\alpha = 0$.
• Keep experts with different learning rates $\eta > 0$, called $\eta$-experts.
• Each $\eta$-expert runs ONS with its own exp-concave surrogate loss $f_t^\eta$.
• MetaGrad aggregates the experts' outputs to return a single output.
Theorem (Van Erven et al. 2016, 2021): MetaGrad simultaneously enjoys
• $\sum_{t=1}^T \bigl( f_t(w_t) - f_t(u) \bigr) \le O\bigl( n (G + 1/\alpha) \log T \bigr)$,
• $\sum_{t=1}^T \bigl( f_t(w_t) - f_t(u) \bigr) \le O\bigl( G \sqrt{T \log \log T} \bigr)$.

Notation
• For each $t$: learner's loss $f_t$, learner's output $w_t$, and $g_t \in \partial f_t(w_t)$.
• $\mathcal{G} := \bigl\{ \eta_i = \frac{2^{-i}}{5G} : i = 0, 1, 2, \dots, \frac{1}{2} \log T \bigr\}$ is the set of $\eta$ values; $|\mathcal{G}| = \Theta(\log T)$.
• For each $t$ and $\eta$: $\eta$-expert's loss $f_t^\eta$ and $\eta$-expert's output $w_t^\eta$, where
$$f_t^\eta(w) := -\eta \langle w_t - w, g_t \rangle + \eta^2 \langle w_t - w, g_t \rangle^2 \quad \forall w \in \Theta.$$
At round $t$:
1. Learner plays $w_t$.
2. Learner incurs $f_t(w_t)$ and observes $g_t \in \partial f_t(w_t)$.
3. Learner sends $w_t$ and $g_t$ to the $\eta$-experts.
4. Each $\eta$-expert computes $w_{t+1}^\eta$ via ONS applied to $f_t^\eta$.
5. Learner aggregates the $w_{t+1}^\eta$ to compute $w_{t+1}$.

Regret Decomposition
Since every $f_t$ is convex, for any comparator $u \in \Theta$, we have
$$\sum_{t=1}^T \bigl( f_t(w_t) - f_t(u) \bigr) \le \sum_{t=1}^T \langle w_t - u, g_t \rangle =: \tilde R_T^u.$$
Decomposition of $\tilde R_T^u$
Recall $f_t^\eta(w) = -\eta \langle w_t - w, g_t \rangle + \eta^2 \langle w_t - w, g_t \rangle^2$. With $V_T^u := \sum_{t=1}^T \langle w_t - u, g_t \rangle^2$,
$$\tilde R_T^u = \sum_{t=1}^T \langle w_t - u, g_t \rangle = -\frac{1}{\eta} \sum_{t=1}^T f_t^\eta(u) + \eta \sum_{t=1}^T \langle w_t - u, g_t \rangle^2 = -\frac{1}{\eta} \sum_{t=1}^T f_t^\eta(u) + \eta V_T^u.$$
By using $f_t^\eta(w_t) = 0$, for all $\eta \in \mathcal{G}$ simultaneously,
$$\tilde R_T^u = \frac{1}{\eta} \Bigl( \sum_{t=1}^T \bigl( f_t^\eta(w_t) - f_t^\eta(w_t^\eta) \bigr) + \sum_{t=1}^T \bigl( f_t^\eta(w_t^\eta) - f_t^\eta(u) \bigr) \Bigr) + \eta V_T^u,$$
where the first sum is the regret of the learner against $w_t^\eta$ and the second sum is the regret of the $\eta$-expert against $u$.
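The per-round identity behind the decomposition, $\langle w_t - u, g_t \rangle = -\frac{1}{\eta} f_t^\eta(u) + \eta \langle w_t - u, g_t \rangle^2$, together with $f_t^\eta(w_t) = 0$, can be verified directly. The vectors and the value of `eta` below are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def surrogate(w, w_t, g_t, eta):
    """f_t^eta(w) = -eta <w_t - w, g_t> + eta^2 <w_t - w, g_t>^2."""
    inner = dot([a - b for a, b in zip(w_t, w)], g_t)
    return -eta * inner + (eta * inner) ** 2

# Hypothetical round data for illustration only.
w_t = [0.2, -0.1, 0.3]   # learner's output
u = [-0.1, 0.2, 0.1]     # comparator
g_t = [0.5, -0.4, 0.2]   # (sub)gradient at w_t
eta = 0.1

inner = dot([a - b for a, b in zip(w_t, u)], g_t)    # linearized regret term
rhs = -surrogate(u, w_t, g_t, eta) / eta + eta * inner ** 2
print(inner, rhs, surrogate(w_t, w_t, g_t, eta))
```

Summing the identity over $t$ gives the first line of the decomposition, and adding the zero terms $f_t^\eta(w_t)$ gives the split into the two regrets.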

Bounding Each Component
For all $\eta \in \mathcal{G}$ simultaneously,
$$\tilde R_T^u = \frac{1}{\eta} \Bigl( \underbrace{\sum_{t=1}^T \bigl( f_t^\eta(w_t) - f_t^\eta(w_t^\eta) \bigr)}_{\text{1. regret of learner against } w_t^\eta} + \underbrace{\sum_{t=1}^T \bigl( f_t^\eta(w_t^\eta) - f_t^\eta(u) \bigr)}_{\text{2. regret of } \eta\text{-expert against } u} \Bigr) + \eta V_T^u.$$
• If the $w_t^\eta$ are aggregated by exponentially weighted averaging, term 1 is $O(\log \log T)$.
• Since $w_t^\eta$ is computed by ONS applied to $f_t^\eta$, term 2 is $O(n \log T)$. (By elementary calculation, $f_t^\eta$ is $\Omega(1)$-exp-concave and $\|\nabla f_t^\eta(w_t^\eta)\| = O(1)$ for every $\eta \in \mathcal{G} \subseteq \bigl( 0, \frac{1}{5G} \bigr]$.)
Therefore, for all $\eta \in \mathcal{G}$ simultaneously,
$$\tilde R_T^u = O\Bigl( \frac{n \log T}{\eta} + \eta V_T^u \Bigr).$$

Infeasible Ideal Tuning
If $V_T^u = \sum_{t=1}^T \langle w_t - u, g_t \rangle^2$ is known a priori, by using only $\eta = \eta^* \simeq \sqrt{\frac{n \log T}{V_T^u}}$,
$$\tilde R_T^u = O\Bigl( \frac{n \log T}{\eta} + \eta V_T^u \Bigr) \simeq O\bigl( \sqrt{V_T^u \, n \log T} \bigr).$$
If it turns out that all $f_t$ are $\alpha$-exp-concave, (informally,)
$$\sum_{t=1}^T \bigl( f_t(w_t) - f_t(u) \bigr) \le \sum_{t=1}^T \langle w_t - u, g_t \rangle - \frac{\alpha}{2} \sum_{t=1}^T \langle w_t - u, g_t \rangle^2 = \tilde R_T^u - \frac{\alpha}{2} V_T^u.$$
By self-bounding, regardless of the $V_T^u$ value,
$$\tilde R_T^u - \frac{\alpha}{2} V_T^u \precsim \sqrt{V_T^u \, n \log T} - \alpha V_T^u \precsim \frac{n}{\alpha} \log T,$$
achieving the same bound as ONS without using $\alpha$.
However, $V_T^u$ is unknown. Use the fact that $\tilde R_T^u = O\bigl( \frac{n \log T}{\eta} + \eta V_T^u \bigr)$ holds for all $\eta \in \mathcal{G}$!

Exploiting Multiple Learning Rates
Let $\eta^* \simeq \sqrt{\frac{n \log T}{V_T^u}} \ge \frac{1}{5G\sqrt{T}}$ be the unknown best learning rate.
Recall $\mathcal{G} = \bigl\{ \eta_i = \frac{2^{-i}}{5G} : i = 0, 1, 2, \dots, \frac{1}{2} \log T \bigr\} \subseteq \bigl( 0, \frac{1}{5G} \bigr]$, and that $\tilde R_T^u \precsim \frac{n \log T}{\eta} + \eta V_T^u$ holds for all $\eta \in \mathcal{G}$.
• If $\eta^* \in \bigl[ \frac{1}{5G\sqrt{T}}, \frac{1}{5G} \bigr]$, there exists $\eta \in \mathcal{G}$ such that $\eta^* \in \bigl[ \frac{\eta}{2}, \eta \bigr]$, hence
$$\tilde R_T^u \precsim \frac{n \log T}{\eta} + \eta V_T^u \le \frac{n \log T}{\eta^*} + 2 \eta^* V_T^u \simeq \sqrt{V_T^u \, n \log T}.$$
• If $\eta^* \simeq \sqrt{\frac{n \log T}{V_T^u}} \ge \frac{1}{5G}$, we have $V_T^u \precsim G^2 n \log T$. Thus, for $\eta = \eta_0 = \frac{1}{5G}$,
$$\tilde R_T^u \precsim \frac{n \log T}{\eta_0} + \eta_0 V_T^u \precsim n G \log T.$$
In any case, $\tilde R_T^u = O\bigl( \sqrt{V_T^u \, n \log T} + n G \log T \bigr)$, implying (with the self-bounding argument for $\alpha$-exp-concave losses)
$$\sum_{t=1}^T \bigl( f_t(w_t) - f_t(u) \bigr) \precsim n \Bigl( \frac{1}{\alpha} + G \Bigr) \log T.$$
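The covering argument can be made concrete by building the grid and locating the grid value that brackets a given $\eta^*$. The sketch assumes the logarithm is base 2 with a ceiling on the upper index (as in the MetaGrad paper, to the best of my reading); `eta_grid` and `bracket` are hypothetical helper names.

```python
import math

def eta_grid(T, G):
    """eta_i = 2^{-i} / (5 G) for i = 0, ..., ceil(0.5 * log2(T)) (assumed form)."""
    k = math.ceil(0.5 * math.log2(T))
    return [2.0 ** (-i) / (5.0 * G) for i in range(k + 1)]

def bracket(eta_star, grid):
    """Return a grid value eta with eta / 2 <= eta_star <= eta, or None if not covered."""
    for eta in grid:
        if eta / 2 <= eta_star <= eta:
            return eta
    return None

T, G = 1024, 1.0
grid = eta_grid(T, G)
eta_star = 0.03                 # hypothetical "ideal" learning rate
eta = bracket(eta_star, grid)
print(grid)
print(eta)  # 0.05
```

Because the bracketing value satisfies $\eta^* \le \eta \le 2\eta^*$, replacing $\eta^*$ by $\eta$ changes $\frac{n \log T}{\eta} + \eta V_T^u$ by at most a factor of 2, which is all the argument needs.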

Online Inverse Linear Optimization with MetaGrad: Robustness to Suboptimality

Learning with Suboptimal Actions
For $t = 1, \dots, T$:
1. Learner makes a prediction $\hat c_t \in \Theta$ of $c^*$.
2. Agent faces $X_t$ and takes a possibly suboptimal action $x_t \in X_t$.
3. Learner observes $(X_t, x_t)$ and updates $\hat c_t$ to $\hat c_{t+1}$.
Define:
• $\tilde R_T^{c^*} := \sum_{t=1}^T \langle \hat c_t - c^*, \hat x_t - x_t \rangle$.
• $V_T^{c^*} := \sum_{t=1}^T \langle \hat c_t - c^*, \hat x_t - x_t \rangle^2$.
• $\Delta_T := \sum_{t=1}^T \bigl( \max \{ \langle c^*, x \rangle : x \in X_t \} - \langle c^*, x_t \rangle \bigr)$ (cumulative suboptimality).
Theorem: For $\hat c_1, \dots, \hat c_T \in \Theta$ computed by MetaGrad with feedback subgradients $\hat x_t - x_t$, it holds that
$$R_T^{c^*} \le \tilde R_T^{c^*} = O\bigl( n \log T + \sqrt{n \Delta_T \log T} \bigr).$$
• Sublinear in $\Delta_T$ (cf. corruption-robustness in bandits).
• Recovers $\tilde R_T^{c^*} = O(n \log T)$ when $\Delta_T = 0$.
Proof sketch
By the same discussion as the regret analysis of MetaGrad,
$$\tilde R_T^{c^*} = \sum_{t=1}^T \langle \hat c_t - c^*, \hat x_t - x_t \rangle \lesssim \sqrt{V_T^{c^*} \, n \log T} + n \log T.$$
Also, $V_T^{c^*} = \sum_{t=1}^T \langle \hat c_t - c^*, \hat x_t - x_t \rangle^2 \le \tilde R_T^{c^*} + 2\Delta_T$ holds (cf. $V_T^{c^*} \le \tilde R_T^{c^*}$ if every $x_t$ is optimal).
The claim follows from the sub-additivity of $x \mapsto \sqrt{x}$ and self-bounding.
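The last step of the proof sketch can be spelled out as follows. This is a sketch with an unspecified absolute constant $C$ and the shorthand $L = n \log T$, both introduced here for illustration; superscripts $c^*$ are dropped.

```latex
\begin{align*}
\tilde R_T &\le C\bigl(\sqrt{V_T L} + L\bigr), \qquad V_T \le \tilde R_T + 2\Delta_T \\
\Longrightarrow\quad \tilde R_T &\le C\sqrt{L}\,\sqrt{\tilde R_T} + C\sqrt{2\Delta_T L} + C L
  && \text{(sub-additivity: } \sqrt{a+b} \le \sqrt{a} + \sqrt{b}\text{)} \\
\Longrightarrow\quad \tilde R_T &\le C^2 L + 2C\sqrt{2\Delta_T L} + 2 C L
  && \text{(self-bounding: } x \le a\sqrt{x} + b \Rightarrow x \le a^2 + 2b\text{)} \\
&= O\bigl(n \log T + \sqrt{n\,\Delta_T \log T}\bigr).
\end{align*}
```

The self-bounding implication follows by solving the quadratic inequality in $\sqrt{x}$ and applying $2a\sqrt{a^2 + 4b} \le a^2 + (a^2 + 4b)$.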

Toward Tight Regret Analysis

$\Omega(n)$ Lower Bound
Focus on the regret $R_T^{c^*} = \sum_{t=1}^T \langle c^*, x_t - \hat x_t \rangle$.
Theorem: For any possibly randomized learner, there is an instance such that $R_T^{c^*} = \Omega(n)$.
Intuition
Since $c^* \in \mathbb{R}^n$ is unknown, if the elements of $c^*$ are drawn at random and $X_1, \dots, X_n$ are restricted to line segments, any deterministic learner makes mistakes $\Omega(n)$ times in expectation.
Thanks to Yao's minimax principle, for any randomized learner, there is a worst-case instance such that the $\Omega(n)$ regret is inevitable.
Can the $\log T$ in the upper bound be removed?

Revisiting Cone-Based Approach
Assume
• $\Theta = \{ \theta \in \mathbb{R}^n : \|\theta\| = 1 \} = \mathbb{S}^{n-1}$ and $\operatorname{diam} X_t \le 1$.
• $x_t$ and $\hat x_t$ are optimal for $c^*$ and $\hat c_t$, respectively.
Lemma (Besbes et al. 2023): Let $\theta(c^*, \hat c_t)$ be the angle between $c^*, \hat c_t \in \mathbb{S}^{n-1}$. If $\theta(c^*, \hat c_t) \le \frac{\pi}{2}$, we have
$$\langle c^*, x_t - \hat x_t \rangle \le \cos \theta(c^*, x_t - \hat x_t) \le \sin \theta(c^*, \hat c_t).$$
Proof idea: $\langle c^*, x_t - \hat x_t \rangle \ge 0$ and $\langle \hat c_t, x_t - \hat x_t \rangle \le 0$ must hold by the assumption, so $x_t - \hat x_t$ must lie in the region cut out by these two half-spaces. (Figure: $c^*$ and $\hat c_t$ at angle $\theta(c^*, \hat c_t) \le \frac{\pi}{2}$, with the region where $x_t - \hat x_t$ must lie.)
Therefore,
$$\cos \theta(c^*, x_t - \hat x_t) \le \cos \Bigl( \frac{\pi}{2} - \theta(c^*, \hat c_t) \Bigr) = \sin \theta(c^*, \hat c_t).$$

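An $O(1)$-Regret Algorithm for $n = 2$
Theorem: Algorithm 1 achieves $\mathbb{E}[R_T^{c^*}] \le 2\pi$.
The bound is independent of $T$ (but extending it to $n > 2$ seems challenging, as discussed later).
• $\mathcal{N}_t := \{ c \in \mathbb{S}^1 : \langle c, x_t - x \rangle \ge 0 \ \forall x \in X_t \}$ is the normal cone of $X_t$ at $x_t$.
• $\mathcal{C}_t$ is the region such that $c^* \in \mathcal{C}_t$ does not contradict "$x_s \in \arg\max_{x \in X_s} \langle c^*, x \rangle$ for $s = 1, \dots, t - 1$."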

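Proof
Focus on round $t$. If $\hat c_t \in \operatorname{int}(\mathcal{N}_t)$, then $\hat x_t = x_t$ and hence $\langle c^*, x_t - \hat x_t \rangle = 0$.
Taking the expectation over the draw of $\hat c_t \in \mathcal{C}_t$,
$$\mathbb{E}[\langle c^*, x_t - \hat x_t \rangle] = \Pr[\mathcal{C}_t \setminus \operatorname{int} \mathcal{N}_t] \; \mathbb{E}[\langle c^*, x_t - \hat x_t \rangle \mid \mathcal{C}_t \setminus \operatorname{int} \mathcal{N}_t] = \frac{A(\mathcal{C}_t \setminus \operatorname{int} \mathcal{N}_t)}{A(\mathcal{C}_t)} \; \mathbb{E}[\langle c^*, x_t - \hat x_t \rangle \mid \mathcal{C}_t \setminus \operatorname{int} \mathcal{N}_t],$$
where $A(\cdot)$ denotes the arc length (= central angle).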

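Proof (continued)
If $A(\mathcal{C}_t) \ge \frac{\pi}{2}$,
$$\mathbb{E}[\langle c^*, x_t - \hat x_t \rangle] = \frac{A(\mathcal{C}_t \setminus \operatorname{int} \mathcal{N}_t)}{A(\mathcal{C}_t)} \; \mathbb{E}[\langle c^*, x_t - \hat x_t \rangle \mid \mathcal{C}_t \setminus \operatorname{int} \mathcal{N}_t] \le \frac{2}{\pi} \cdot 1 \cdot A(\mathcal{C}_t \setminus \operatorname{int} \mathcal{N}_t) \le A(\mathcal{C}_t \setminus \operatorname{int} \mathcal{N}_t).$$
If $A(\mathcal{C}_t) < \frac{\pi}{2}$,
$$\mathbb{E}[\langle c^*, x_t - \hat x_t \rangle] = \frac{A(\mathcal{C}_t \setminus \operatorname{int} \mathcal{N}_t)}{A(\mathcal{C}_t)} \; \mathbb{E}[\langle c^*, x_t - \hat x_t \rangle \mid \mathcal{C}_t \setminus \operatorname{int} \mathcal{N}_t] \le \frac{A(\mathcal{C}_t \setminus \operatorname{int} \mathcal{N}_t)}{A(\mathcal{C}_t)} \sin \theta(c^*, \hat c_t) \le \frac{A(\mathcal{C}_t \setminus \operatorname{int} \mathcal{N}_t)}{A(\mathcal{C}_t)} \sin A(\mathcal{C}_t) \le A(\mathcal{C}_t \setminus \operatorname{int} \mathcal{N}_t).$$
Hence $\mathbb{E}[\langle c^*, x_t - \hat x_t \rangle] \le A(\mathcal{C}_t \setminus \operatorname{int} \mathcal{N}_t)$ in any case.
Since $\mathcal{C}_{t+1} \leftarrow \mathcal{C}_t \cap \mathcal{N}_t$,
$$\mathbb{E}[R_T^{c^*}] = \sum_{t=1}^T \mathbb{E}[\langle c^*, x_t - \hat x_t \rangle] \le \sum_{t=1}^T A(\mathcal{C}_t \setminus \operatorname{int} \mathcal{N}_t) \le 2\pi.$$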

Conclusion
• $R_T^{c^*} \le \tilde R_T^{c^*} = O(n \log T)$ by ONS.
• $\tilde R_T^{c^*} = O\bigl( n \log T + \sqrt{n \Delta_T \log T} \bigr)$ by MetaGrad for the possibly suboptimal case.
• $R_T^{c^*} = \Omega(n)$.
• $R_T^{c^*} = O(1)$ for $n = 2$.
Future work
• Tight analysis for general $n$.
  – Difficulty: $\sin \theta(c^*, \hat c_t) \le \sin A(\mathcal{C}_t)$ no longer holds.
• Exploring other online-learning ideas useful for inverse optimization.
