Slide 1

Slide 1 text

Online Inverse Linear Optimization
Mar. 7, 2025 @ KyotoU, KISS
Shinsaku Sakaue (Univ. Tokyo, RIKEN AIP)
Joint work with Taira Tsuchiya (Univ. Tokyo, RIKEN AIP), Han Bao (Kyoto Univ.), and Taihei Oki (Hokkaido Univ.)
Preprint: https://arxiv.org/abs/2501.14349

Slide 2

Slide 2 text

Forward Optimization

  maximize f_ΞΈ(x)  subject to x ∈ X.

Forward optimization: given ΞΈ, find an optimal solution x.
β€’ Decision variable x ∈ ℝ^n
β€’ Constraint set X βŠ† ℝ^n (known)
β€’ Model parameter ΞΈ ∈ ℝ^d

Examples:
β€’ Seismic tomography: ΞΈ = geological features of certain zones; seismic-wave trajectories are modeled as shortest-path problems. (https://www.geothermalnextgeneration.com/updates/seismic-tomography-a-cat-scan-of-the-earth)
β€’ Customer behavior: ΞΈ = customer's preference; purchase behavior is modeled as utility maximization. (https://en.wikipedia.org/wiki/Consumer_behaviour)

Slide 3

Slide 3 text

Inverse Optimization

  maximize f_ΞΈ(x)  subject to x ∈ X.

Forward optimization: given ΞΈ, find an optimal solution x.
Inverse optimization: estimate ΞΈ from an optimal solution x.
β€’ Decision variable x ∈ ℝ^n
β€’ Constraint set X βŠ† ℝ^n (known)
β€’ Model parameter ΞΈ ∈ ℝ^d

Examples:
β€’ Seismic tomography: estimate geological features ΞΈ from observed waves x. (https://www.geothermalnextgeneration.com/updates/seismic-tomography-a-cat-scan-of-the-earth)
β€’ Customer behavior: estimate the customer's preference ΞΈ from purchase behavior x. (https://en.wikipedia.org/wiki/Consumer_behaviour)

Slide 4

Slide 4 text

Linear Optimization (forward model in this talk)

For t = 1, …, T, an agent solves
  maximize ⟨c*, x⟩  subject to x ∈ X_t.

Running example: an agent with budget 2 and c* = (0.2, 0.4, 0.1, 0.0, 0.3).
β€’ c* ∈ ℝ^n is the agent's internal objective vector.
β€’ c* lies in a convex set Θ βŠ‚ ℝ^n with diam(Θ) = 1 (known to the learner).
β€’ X_t βŠ† ℝ^n is the agent's t-th action set with diam(X_t) = 1.
  – X_t is not necessarily convex, but we assume an oracle that solves linear optimization over X_t.
β€’ Let x_t ∈ arg max { ⟨c*, x⟩ : x ∈ X_t }.
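The forward model can be made concrete with a tiny brute-force oracle. Below is a minimal sketch (my own, not from the talk): the action set contains all 0/1 bundles of at most two out of five items, matching the budget-2 example, and `linear_opt_oracle` is a hypothetical helper that maximizes ⟨c, x⟩ by enumeration.

```python
import itertools
import numpy as np

def linear_opt_oracle(c, X):
    """Hypothetical oracle: return an x in X maximizing <c, x> (brute force)."""
    return max(X, key=lambda x: float(np.dot(c, x)))

# Toy instance from the slides: 5 items, budget 2, so X_t is every 0/1
# vector with at most two ones.
c_star = np.array([0.2, 0.4, 0.1, 0.0, 0.3])
X_t = [np.array(b) for b in itertools.product([0, 1], repeat=5) if sum(b) <= 2]
x_t = linear_opt_oracle(c_star, X_t)  # picks the two highest-value items: (0, 1, 0, 0, 1)
```

Any solver for the agent's actual action set (e.g., an LP or shortest-path solver) could replace the enumeration; only the arg-max interface matters.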

Slide 5

Slide 5 text

Linear Optimization (forward model in this talk)

Same setting as the previous slide; in the running example, the action set X_t now reflects that one item is sold out (budget 2, c* = (0.2, 0.4, 0.1, 0.0, 0.3)).

Slide 6

Slide 6 text

Linear Optimization (forward model in this talk)

Continuing the running example: with one item sold out, the agent's optimal action is x_t = (1, 0, 0, 0, 1), i.e., the two available items with the largest entries of c* (0.2 and 0.3) within the budget of 2.

Slide 7

Slide 7 text

Inverse Linear Optimization

For t = 1, …, T, an agent solves
  maximize ⟨c*, x⟩  subject to x ∈ X_t,
e.g., x_t = (1, 0, 0, 0, 1) for c* = (0.2, 0.4, 0.1, 0.0, 0.3) under budget 2 with one item sold out.

A learner aims to infer c* from {(X_t, x_t)}_{t=1}^T.

Slide 8

Slide 8 text

Online Inverse Linear Optimization (BΓ€rmann et al. 2017)

For t = 1, …, T, the following protocol runs between the agent and the learner (built up over the next slides).

Slide 9

Slide 9 text

Online Inverse Linear Optimization (BΓ€rmann et al. 2017)

For t = 1, …, T:
β€’ The learner makes a prediction Δ‰_t ∈ Θ of c*.

Slide 10

Slide 10 text

Online Inverse Linear Optimization (BΓ€rmann et al. 2017)

For t = 1, …, T:
β€’ The learner makes a prediction Δ‰_t ∈ Θ of c*.
β€’ The agent faces X_t and takes an action x_t ∈ arg max_{x ∈ X_t} ⟨c*, x⟩.

Slide 11

Slide 11 text

Online Inverse Linear Optimization (BΓ€rmann et al. 2017)

For t = 1, …, T:
β€’ The learner makes a prediction Δ‰_t ∈ Θ of c*.
β€’ The agent faces X_t and takes an action x_t ∈ arg max_{x ∈ X_t} ⟨c*, x⟩.
β€’ The learner observes (X_t, x_t) and updates Δ‰_t to Δ‰_{t+1}.

Slide 12

Slide 12 text

Online Inverse Linear Optimization (BΓ€rmann et al. 2017)

For t = 1, …, T:
β€’ The learner makes a prediction Δ‰_t ∈ Θ of c*.
β€’ The agent faces X_t and takes an action x_t ∈ arg max_{x ∈ X_t} ⟨c*, x⟩.
β€’ The learner observes (X_t, x_t) and updates Δ‰_t to Δ‰_{t+1}.

Let x̂_t ∈ arg max { βŸ¨Δ‰_t, x⟩ : x ∈ X_t } and define the regret R_T^{c*} as
  R_T^{c*} := βˆ‘_{t=1}^T ⟨c*, x_t βˆ’ x̂_t⟩ = βˆ‘_{t=1}^T (⟨c*, x_t⟩ βˆ’ ⟨c*, x̂_t⟩).

Slide 13

Slide 13 text

Online Inverse Linear Optimization (BΓ€rmann et al. 2017)

Protocol as before: for t = 1, …, T, the learner predicts Δ‰_t ∈ Θ, the agent faces X_t and plays x_t ∈ arg max_{x ∈ X_t} ⟨c*, x⟩, and the learner observes (X_t, x_t) and updates Δ‰_t to Δ‰_{t+1}.

Let x̂_t ∈ arg max { βŸ¨Δ‰_t, x⟩ : x ∈ X_t } and define the regret R_T^{c*} as
  R_T^{c*} := βˆ‘_{t=1}^T ⟨c*, x_t βˆ’ x̂_t⟩ = βˆ‘_{t=1}^T (⟨c*, x_t⟩ βˆ’ ⟨c*, x̂_t⟩),
the optimal objective value minus the objective value achieved by following the learner's prediction Δ‰_t.
β€’ R_T^{c*} measures the quality of the actions suggested by the predictions.
β€’ R_T^{c*} is non-negative; R_T^{c*} = 0 if Δ‰_t = c* for all t; smaller is better.
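The regret definition can be checked numerically. A small sketch on a toy instance of my own (with a hypothetical helper `argmax_lin` standing in for the linear-optimization oracle): predicting c* itself gives zero regret, and the regret is non-negative in general.

```python
import numpy as np

def argmax_lin(c, X):
    # Stand-in for the linear-optimization oracle over a finite action set X.
    return max(X, key=lambda x: float(np.dot(c, x)))

def regret(c_star, c_hats, action_sets):
    """R_T^{c*} = sum_t <c*, x_t - xhat_t>, where x_t (resp. xhat_t) is
    optimal for c* (resp. for the prediction chat_t)."""
    total = 0.0
    for c_hat, X in zip(c_hats, action_sets):
        x_t = argmax_lin(c_star, X)
        x_hat_t = argmax_lin(c_hat, X)
        total += float(np.dot(c_star, x_t - x_hat_t))
    return total

rng = np.random.default_rng(0)
c_star = np.array([0.5, 0.3, 0.2])
sets = [[rng.standard_normal(3) for _ in range(4)] for _ in range(10)]
r_bad = regret(c_star, [rng.standard_normal(3) for _ in range(10)], sets)
r_perfect = regret(c_star, [c_star] * 10, sets)  # predicting c* gives zero regret
```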

Slide 14

Slide 14 text

Convenient Upper Bound on Regret (BΓ€rmann et al. 2017)

For Δ‰_t, x̂_t ∈ arg max { βŸ¨Δ‰_t, x⟩ : x ∈ X_t }, c*, and x_t ∈ arg max { ⟨c*, x⟩ : x ∈ X_t }, define
  RΜ„_T^{c*} := βˆ‘_{t=1}^T βŸ¨Δ‰_t βˆ’ c*, x̂_t βˆ’ x_t⟩.

Slide 15

Slide 15 text

Convenient Upper Bound on Regret (BΓ€rmann et al. 2017)

For Δ‰_t, x̂_t ∈ arg max { βŸ¨Δ‰_t, x⟩ : x ∈ X_t }, c*, and x_t ∈ arg max { ⟨c*, x⟩ : x ∈ X_t }, define
  RΜ„_T^{c*} := βˆ‘_{t=1}^T βŸ¨Δ‰_t βˆ’ c*, x̂_t βˆ’ x_t⟩.
Then
  RΜ„_T^{c*} = βˆ‘_{t=1}^T βŸ¨Δ‰_t, x̂_t βˆ’ x_t⟩ + βˆ‘_{t=1}^T ⟨c*, x_t βˆ’ x̂_t⟩ = βˆ‘_{t=1}^T βŸ¨Δ‰_t, x̂_t βˆ’ x_t⟩ + R_T^{c*},
where the first sum is β‰₯ 0 since x̂_t is optimal for Δ‰_t and the second sum is the regret R_T^{c*}; hence RΜ„_T^{c*} β‰₯ R_T^{c*}.

Slide 16

Slide 16 text

Convenient Upper Bound on Regret (BΓ€rmann et al. 2017)

As above, RΜ„_T^{c*} = βˆ‘_{t=1}^T βŸ¨Δ‰_t, x̂_t βˆ’ x_t⟩ + R_T^{c*} β‰₯ R_T^{c*}.

What is the additional term? For each t,
  βŸ¨Δ‰_t, x̂_t βˆ’ x_t⟩ = max { βŸ¨Δ‰_t, x⟩ : x ∈ X_t } βˆ’ βŸ¨Δ‰_t, x_t⟩.

Slide 17

Slide 17 text

Convenient Upper Bound on Regret (BΓ€rmann et al. 2017)

As above, RΜ„_T^{c*} = βˆ‘_{t=1}^T βŸ¨Δ‰_t, x̂_t βˆ’ x_t⟩ + R_T^{c*} β‰₯ R_T^{c*}, and the additional term is
  βŸ¨Δ‰_t, x̂_t βˆ’ x_t⟩ = max { βŸ¨Δ‰_t, x⟩ : x ∈ X_t } βˆ’ βŸ¨Δ‰_t, x_t⟩,
the optimal value for Δ‰_t minus the objective value achieved by x_t for Δ‰_t.
β€’ It is zero if Δ‰_t = c*; it quantifies how well Δ‰_t explains the agent's choice x_t.
β€’ It is called the suboptimality loss in inverse optimization (Mohajerin Esfahani et al. 2018).
β€’ Alone it is sometimes meaningless: Δ‰_t = 0 trivially attains zero suboptimality loss.

Slide 18

Slide 18 text

Related Work: Online Learning Approach (BΓ€rmann et al. 2017)

Consider making the upper bound RΜ„_T^{c*} = βˆ‘_{t=1}^T βŸ¨Δ‰_t βˆ’ c*, x̂_t βˆ’ x_t⟩ as small as possible.

Slide 19

Slide 19 text

Related Work: Online Learning Approach (BΓ€rmann et al. 2017)

Consider making the upper bound RΜ„_T^{c*} = βˆ‘_{t=1}^T βŸ¨Δ‰_t βˆ’ c*, x̂_t βˆ’ x_t⟩ as small as possible.

Regarding f_t : Θ βˆ‹ c ↦ ⟨c, x̂_t βˆ’ x_t⟩ as a linear cost function,
  RΜ„_T^{c*} = βˆ‘_{t=1}^T βŸ¨Δ‰_t, x̂_t βˆ’ x_t⟩ βˆ’ βˆ‘_{t=1}^T ⟨c*, x̂_t βˆ’ x_t⟩ = βˆ‘_{t=1}^T f_t(Δ‰_t) βˆ’ βˆ‘_{t=1}^T f_t(c*),
which is the standard regret in online learning.

Slide 20

Slide 20 text

Related Work: Online Learning Approach (BΓ€rmann et al. 2017)

Regarding f_t : Θ βˆ‹ c ↦ ⟨c, x̂_t βˆ’ x_t⟩ as a linear cost function, RΜ„_T^{c*} = βˆ‘_{t=1}^T f_t(Δ‰_t) βˆ’ βˆ‘_{t=1}^T f_t(c*) is the standard regret in online learning.

By using online linear optimization (OLO) methods (e.g., OGD) to compute Δ‰_t, we obtain
  R_T^{c*} ≀ RΜ„_T^{c*} = βˆ‘_{t=1}^T βŸ¨Δ‰_t, x̂_t βˆ’ x_t⟩ + βˆ‘_{t=1}^T ⟨c*, x_t βˆ’ x̂_t⟩ = O(√T),
achieving vanishing regret (and cumulative suboptimality loss) on average as T β†’ ∞.

Slide 21

Slide 21 text

Related Work: Online Learning Approach (BΓ€rmann et al. 2017)

By running an online linear optimization method (e.g., OGD) on the linear costs f_t : c ↦ ⟨c, x̂_t βˆ’ x_t⟩, we obtain R_T^{c*} ≀ RΜ„_T^{c*} = O(√T).

The √T rate is optimal in general OLO. Is it also optimal in online inverse linear optimization?
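The OGD-based approach can be sketched in a few lines. Assumptions not in the slides: Θ is taken to be a Euclidean ball of radius 1/2 (so diam(Θ) = 1), the action sets are finite, and the hypothetical `ogd_inverse_lo` returns the accumulated upper bound RΜ„_T^{c*}.

```python
import numpy as np

def argmax_lin(c, X):
    return max(X, key=lambda x: float(np.dot(c, x)))

def project_ball(c, radius=0.5):
    # Euclidean projection onto a ball of radius 1/2, standing in for Theta.
    n = np.linalg.norm(c)
    return c if n <= radius else c * (radius / n)

def ogd_inverse_lo(c_star, action_sets, eta):
    """OGD on the linear losses f_t(c) = <c, xhat_t - x_t>; returns the
    accumulated upper bound bar{R}_T^{c*} = sum_t <chat_t - c*, xhat_t - x_t>."""
    c_hat = np.zeros_like(c_star)
    upper = 0.0
    for X in action_sets:
        x_t = argmax_lin(c_star, X)       # agent's action
        x_hat = argmax_lin(c_hat, X)      # action suggested by the prediction
        upper += float(np.dot(c_hat - c_star, x_hat - x_t))
        grad = x_hat - x_t                # gradient of f_t at c_hat
        c_hat = project_ball(c_hat - eta * grad)
    return upper

rng = np.random.default_rng(0)
c_star = rng.standard_normal(4)
c_star *= 0.5 / np.linalg.norm(c_star)    # keep c* inside Theta
sets = [[rng.standard_normal(4) for _ in range(5)] for _ in range(100)]
bound = ogd_inverse_lo(c_star, sets, eta=0.1)  # nonnegative upper bound on the regret
```

Each increment βŸ¨Δ‰_t βˆ’ c*, x̂_t βˆ’ x_t⟩ is non-negative, since x̂_t is optimal for Δ‰_t and x_t for c*.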

Slide 22

Slide 22 text

Related Work: Ellipsoid-Based Method (Besbes et al. 2023)

There is a method achieving R_T^{c*} = O(n⁴ log T), going beyond the limit of OLO!

Slide 23

Slide 23 text

Related Work: Ellipsoid-Based Method (Besbes et al. 2023)

There is a method achieving R_T^{c*} = O(n⁴ log T), going beyond the limit of OLO!

High-level idea: maintain a cone π’ž_t representing where c* may exist. After observing (X_t, x_t), π’ž_t can be narrowed down:
  π’ž_{t+1} ← π’ž_t ∩ { c ∈ Θ : ⟨c, x_t⟩ β‰₯ ⟨c, x⟩ for all x ∈ X_t }.
The intersected set is the normal cone of X_t at x_t: the cone of vectors that make x_t optimal over X_t.

Slide 24

Slide 24 text

Related Work: Ellipsoid-Based Method (Besbes et al. 2023)

After each narrowing step, slightly inflate π’ž_t, making it an ellipsoidal cone. Based on the volume argument of the ellipsoid method for LPs, this strikes a good balance between exploration and exploitation.
(Figure 4 in Besbes, Fonseca, Lobel. Contextual Inverse Optimization: Offline and Online Learning. Oper. Res. 2023.)

Slide 25

Slide 25 text

Related Work: Ellipsoid-Based Method (Besbes et al. 2023)

But there are downsides:
β€’ The n⁴ factor is prohibitive for large n.
β€’ The method is not very efficient, albeit polynomial in n and T.

Slide 26

Slide 26 text

Our Results

Theorem. There is a method achieving R_T^{c*} ≀ RΜ„_T^{c*} = O(n log T).

β€’ Improves R_T^{c*} = O(n⁴ log T) of Besbes et al. (2023) by a factor of nΒ³.
β€’ Applies to the upper bound RΜ„_T^{c*} β‰₯ R_T^{c*}.
β€’ More efficient: based on the online Newton step (ONS) rather than the ellipsoid method.

Slide 27

Slide 27 text

Our Results (continued)

Theorem. There is a method achieving R_T^{c*} ≀ RΜ„_T^{c*} = O(n log T).

And more:
β€’ Handles suboptimal feedback x_t via MetaGrad (ONS with multiple learning rates).
β€’ A lower bound of R_T^{c*} = Ξ©(n), implying tightness with respect to n.
β€’ R_T^{c*} = O(1) for n = 2, based on the method of Besbes et al. (2023).

Slide 28

Slide 28 text

𝑢 𝒏 π₯𝐨𝐠 𝑻 via ONS-Based Method

Slide 29

Slide 29 text

Online Convex Optimization

For t = 1, …, T, a protocol runs between the learner and the environment (built up over the next slides).

Slide 30

Slide 30 text

Online Convex Optimization

For t = 1, …, T: the learner plays Δ‰_t.

Slide 31

Slide 31 text

Online Convex Optimization

For t = 1, …, T: the learner plays Δ‰_t; the environment reveals f_t (which may be reactive to Δ‰_t).

Slide 32

Slide 32 text

Online Convex Optimization

For t = 1, …, T: the learner plays Δ‰_t; the environment reveals f_t (which may be reactive to Δ‰_t); the learner incurs f_t(Δ‰_t) and computes Δ‰_{t+1}.

Slide 33

Slide 33 text

Online Convex Optimization

For t = 1, …, T: the learner plays Δ‰_t; the environment reveals f_t (which may be reactive to Δ‰_t); the learner incurs f_t(Δ‰_t) and computes Δ‰_{t+1}.

β€’ The learner's domain Θ is convex, with diam(Θ) = 1.
β€’ The learner can use information up to the end of round t when computing Δ‰_{t+1}.
β€’ Each loss function f_t : Θ β†’ ℝ is convex.

Slide 34

Slide 34 text

Online Convex Optimization

Under the protocol and assumptions above, for any comparator c* ∈ Θ, the learner aims to make the regret
  βˆ‘_{t=1}^T (f_t(Δ‰_t) βˆ’ f_t(c*))
as small as possible.

Slide 35

Slide 35 text

Exp-Concave Loss

A function f : Θ β†’ ℝ is Ξ±-exp-concave for some Ξ± > 0 if g : Θ β†’ ℝ, g : c ↦ e^{βˆ’Ξ±f(c)}, is concave.

Slide 36

Slide 36 text

Exp-Concave Loss

A function f : Θ β†’ ℝ is Ξ±-exp-concave for some Ξ± > 0 if g : c ↦ e^{βˆ’Ξ±f(c)} is concave.

If f is twice differentiable, Ξ±-exp-concavity is equivalent to
  βˆ‡Β²f(c) ≽ Ξ± βˆ‡f(c) βˆ‡f(c)^T for all c ∈ Θ.
Cf. Ξ±-strong convexity requires βˆ‡Β²f(c) ≽ Ξ±I.

Slide 37

Slide 37 text

Exp-Concave Loss

A function f : Θ β†’ ℝ is Ξ±-exp-concave for some Ξ± > 0 if g : c ↦ e^{βˆ’Ξ±f(c)} is concave; for twice-differentiable f, this is equivalent to βˆ‡Β²f(c) ≽ Ξ± βˆ‡f(c) βˆ‡f(c)^T for all c ∈ Θ.

Examples:
β€’ f(c) = βˆ’log(r^T c) (appears in portfolio theory) satisfies βˆ‡Β²f(c) = rr^T / (r^T c)Β² = βˆ‡f(c) βˆ‡f(c)^T.
β€’ f(c) = (r^T c)Β² for βˆ₯rβˆ₯, βˆ₯cβˆ₯ ≀ 1 (used later) satisfies βˆ‡Β²f(c) = 2rr^T = βˆ‡f(c) βˆ‡f(c)^T / (2(r^T c)Β²) ≽ (1/2) βˆ‡f(c) βˆ‡f(c)^T.
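The second example can be verified numerically: for f(c) = (r^T c)Β² with βˆ₯rβˆ₯, βˆ₯cβˆ₯ ≀ 1, the matrix βˆ‡Β²f(c) βˆ’ (1/2) βˆ‡f(c)βˆ‡f(c)^T should be PSD. A quick check at a random point (my own sketch, using the closed-form gradient and Hessian):

```python
import numpy as np

rng = np.random.default_rng(1)
r = rng.standard_normal(4)
r /= np.linalg.norm(r)            # ||r|| = 1
c = rng.standard_normal(4)
c /= 2 * np.linalg.norm(c)        # ||c|| = 1/2 <= 1

# f(c) = (r^T c)^2: gradient and Hessian in closed form.
grad = 2 * float(r @ c) * r
hess = 2 * np.outer(r, r)

# 1/2-exp-concavity: hess - (1/2) grad grad^T must be PSD.
M = hess - 0.5 * np.outer(grad, grad)
min_eig = float(np.linalg.eigvalsh(M).min())
```

Here M = (2 βˆ’ 2(r^T c)Β²) rr^T, whose eigenvalues are non-negative whenever |r^T c| ≀ 1.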

Slide 38

Slide 38 text

ONS Regret Bound (Hazan et al. 2007)

Set Δ‰_1 ∈ Θ. Fix Ξ³, Ξ΅ > 0 and let A_0 = Ξ΅I.
For t = 1, …, T:
  Output Δ‰_t and observe f_t
  A_t ← A_{t-1} + βˆ‡f_t(Δ‰_t) βˆ‡f_t(Δ‰_t)^T
  Δ‰_{t+1} ← arg min { βˆ₯Δ‰_t βˆ’ Ξ³^{-1} A_t^{-1} βˆ‡f_t(Δ‰_t) βˆ’ cβˆ₯_{A_t} : c ∈ Θ }
(For a PSD matrix A, βˆ₯xβˆ₯_A := √(x^T A x).)

Assume f_1, …, f_T are twice differentiable and Ξ±-exp-concave, and βˆ₯βˆ‡f_t(c)βˆ₯ ≀ G for all t and c ∈ Θ. Let Ξ³ = (1/2) min{1/G, Ξ±} and Ξ΅ = 1/Ξ³Β². Then Δ‰_1, …, Δ‰_T computed by ONS satisfy
  βˆ‘_{t=1}^T (f_t(Δ‰_t) βˆ’ f_t(c*)) = O(n (1/Ξ± + G) log T).
A well-known result; we'll get back to the proof later.

Slide 39

Slide 39 text

Our ONS-Based Method

Set Δ‰_1 ∈ Θ.
For t = 1, …, T:
  Output Δ‰_t and observe (X_t, x_t)
  Compute x̂_t ∈ arg max { βŸ¨Δ‰_t, x⟩ : x ∈ X_t }
  Get Δ‰_{t+1} via ONS applied to f_t^Ξ·

For Ξ· ∈ (0, 1) (specified later), define f_t^Ξ· : Θ β†’ ℝ by
  f_t^Ξ·(c) := βˆ’Ξ· βŸ¨Δ‰_t βˆ’ c, x̂_t βˆ’ x_t⟩ + Ξ·Β² βŸ¨Δ‰_t βˆ’ c, x̂_t βˆ’ x_t⟩².

Theorem. For Δ‰_1, …, Δ‰_T computed by the above method, it holds that R_T^{c*} ≀ RΜ„_T^{c*} = O(n log T).
(Recall the regret R_T^{c*} = βˆ‘_{t=1}^T ⟨c*, x_t βˆ’ x̂_t⟩ and its upper bound RΜ„_T^{c*} = βˆ‘_{t=1}^T βŸ¨Δ‰_t βˆ’ c*, x̂_t βˆ’ x_t⟩.)
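A simplified sketch of the method (my own, not the paper's reference code): ONS is run on the surrogate f_t^Ξ· with Ξ· = Ξ³ = 1/2, and, to keep the code short, the A_t-norm projection onto Θ is replaced by a Euclidean projection onto a unit ball. This breaks the formal guarantee but preserves the structure of the iteration.

```python
import numpy as np

def argmax_lin(c, X):
    return max(X, key=lambda x: float(np.dot(c, x)))

def ons_inverse_lo(c_star, action_sets, eta=0.5, gamma=0.5):
    """ONS on the surrogate f_t(c) = -eta*<chat_t - c, d_t> + eta^2*<chat_t - c, d_t>^2
    with d_t = xhat_t - x_t.  Simplification: the projection onto Theta (a unit
    ball here) uses the Euclidean norm instead of the A_t-norm."""
    n = len(c_star)
    A = np.eye(n) / gamma**2              # A_0 = eps*I with eps = 1/gamma^2
    c_hat = np.zeros(n)
    upper = 0.0                           # accumulates bar{R}_T^{c*}
    for X in action_sets:
        x_t = argmax_lin(c_star, X)
        x_hat = argmax_lin(c_hat, X)
        d = x_hat - x_t
        upper += float(np.dot(c_hat - c_star, d))
        g = eta * d                       # gradient of the surrogate at c = chat_t
        A += np.outer(g, g)
        c_hat = c_hat - np.linalg.solve(A, g) / gamma
        nrm = np.linalg.norm(c_hat)
        if nrm > 1.0:                     # simplified (Euclidean) projection
            c_hat /= nrm
    return upper

rng = np.random.default_rng(0)
c_star = rng.standard_normal(4)
c_star /= np.linalg.norm(c_star)
sets = [[rng.standard_normal(4) for _ in range(5)] for _ in range(200)]
bound = ons_inverse_lo(c_star, sets)
```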

Slide 40

Slide 40 text

Regret Analysis

f_t^Ξ·(c) := βˆ’Ξ· βŸ¨Δ‰_t βˆ’ c, x̂_t βˆ’ x_t⟩ + Ξ·Β² βŸ¨Δ‰_t βˆ’ c, x̂_t βˆ’ x_t⟩² with a constant Ξ· ∈ (0, 1) enjoys Ξ©(1)-exp-concavity and βˆ₯βˆ‡f_t^Ξ·(Δ‰_t)βˆ₯ = O(1) (by elementary calculation).

ONS regret bound: βˆ‘_{t=1}^T (f_t^Ξ·(Δ‰_t) βˆ’ f_t^Ξ·(c*)) = O(n log T).

Slide 41

Slide 41 text

Regret Analysis

ONS applied to the surrogates gives βˆ‘_{t=1}^T (f_t^Ξ·(Δ‰_t) βˆ’ f_t^Ξ·(c*)) = O(n log T).

Proof. Define V_T^{c*} := βˆ‘_{t=1}^T βŸ¨Δ‰_t βˆ’ c*, x̂_t βˆ’ x_t⟩², which satisfies
  V_T^{c*} ≀ βˆ‘_{t=1}^T βˆ₯Δ‰_t βˆ’ c*βˆ₯ βˆ₯x̂_t βˆ’ x_tβˆ₯ βŸ¨Δ‰_t βˆ’ c*, x̂_t βˆ’ x_t⟩ ≀ RΜ„_T^{c*},
since βŸ¨Δ‰_t βˆ’ c*, x̂_t βˆ’ x_t⟩ β‰₯ 0, the Cauchy–Schwarz inequality applies, and βˆ₯Δ‰_t βˆ’ c*βˆ₯ ≀ 1 and βˆ₯x̂_t βˆ’ x_tβˆ₯ ≀ 1 (both diameters are 1).

Slide 42

Slide 42 text

Regret Analysis

Continuing the proof, with V_T^{c*} := βˆ‘_{t=1}^T βŸ¨Δ‰_t βˆ’ c*, x̂_t βˆ’ x_t⟩² ≀ RΜ„_T^{c*} as above:
  RΜ„_T^{c*} = βˆ‘_{t=1}^T βŸ¨Δ‰_t βˆ’ c*, x̂_t βˆ’ x_t⟩ = βˆ’(1/Ξ·) βˆ‘_{t=1}^T f_t^Ξ·(c*) + Ξ· V_T^{c*}
           ≀ (1/Ξ·) βˆ‘_{t=1}^T (f_t^Ξ·(Δ‰_t) βˆ’ f_t^Ξ·(c*)) + Ξ· RΜ„_T^{c*},
using f_t^Ξ·(Δ‰_t) = 0 and V_T^{c*} ≀ RΜ„_T^{c*}. Hence
  (1 βˆ’ Ξ·) RΜ„_T^{c*} ≀ (1/Ξ·) βˆ‘_{t=1}^T (f_t^Ξ·(Δ‰_t) βˆ’ f_t^Ξ·(c*)) = O(n log T),
and setting Ξ· = 1/2 completes the proof.

Slide 43

Slide 43 text

Online Newton Step (Elad Hazan, Amit Agarwal, and Satyen Kale; Mach. Learn. 2007)

Slide 44

Slide 44 text

Warm-up: Online Gradient Descent

Set Δ‰_1 ∈ Θ. Fix Ξ· > 0.
For t = 1, …, T:
  Output Δ‰_t and observe f_t
  Δ‰_{t+1} ← arg min { βˆ₯Δ‰_t βˆ’ Ξ· βˆ‡f_t(Δ‰_t) βˆ’ cβˆ₯ : c ∈ Θ }

Slide 45

Slide 45 text

Warm-up: Online Gradient Descent

From the update rule Δ‰_{t+1} ← arg min { βˆ₯Δ‰_t βˆ’ Ξ· βˆ‡f_t(Δ‰_t) βˆ’ cβˆ₯ : c ∈ Θ } and the Pythagorean theorem,
  βˆ₯Δ‰_{t+1} βˆ’ c*βˆ₯Β² ≀ βˆ₯Δ‰_t βˆ’ Ξ· βˆ‡f_t(Δ‰_t) βˆ’ c*βˆ₯Β²
                 = βˆ₯Δ‰_t βˆ’ c*βˆ₯Β² + Ξ·Β² βˆ₯βˆ‡f_t(Δ‰_t)βˆ₯Β² βˆ’ 2Ξ· βŸ¨βˆ‡f_t(Δ‰_t), Δ‰_t βˆ’ c*⟩.

Slide 46

Slide 46 text

Warm-up: Online Gradient Descent

Summing the per-round inequality over t and ignoring βˆ’βˆ₯Δ‰_{T+1} βˆ’ c*βˆ₯Β² ≀ 0,
  βˆ‘_{t=1}^T βŸ¨βˆ‡f_t(Δ‰_t), Δ‰_t βˆ’ c*⟩ ≀ βˆ‘_{t=1}^T (βˆ₯Δ‰_t βˆ’ c*βˆ₯Β² βˆ’ βˆ₯Δ‰_{t+1} βˆ’ c*βˆ₯Β²)/(2Ξ·) + (Ξ·/2) βˆ‘_{t=1}^T βˆ₯βˆ‡f_t(Δ‰_t)βˆ₯Β²
  = βˆ₯Δ‰_1 βˆ’ c*βˆ₯Β²/(2Ξ·) + (Ξ·/2) βˆ‘_{t=1}^T βˆ₯βˆ‡f_t(Δ‰_t)βˆ₯Β².

Slide 47

Slide 47 text

Warm-up: Online Gradient Descent

As above,
  βˆ‘_{t=1}^T βŸ¨βˆ‡f_t(Δ‰_t), Δ‰_t βˆ’ c*⟩ ≀ βˆ₯Δ‰_1 βˆ’ c*βˆ₯Β²/(2Ξ·) + (Ξ·/2) βˆ‘_{t=1}^T βˆ₯βˆ‡f_t(Δ‰_t)βˆ₯Β².
Since diam(Θ) = 1 and βˆ₯βˆ‡f_t(Δ‰_t)βˆ₯ ≀ G, choosing Ξ· = 1/(G√T) gives
  ≀ (1/2)(1/Ξ· + Ξ· GΒ² T) = G√T.

Slide 48

Slide 48 text

Warm-up: Online Gradient Descent

By the convexity of f_t,
  βˆ‘_{t=1}^T (f_t(Δ‰_t) βˆ’ f_t(c*)) ≀ βˆ‘_{t=1}^T βŸ¨βˆ‡f_t(Δ‰_t), Δ‰_t βˆ’ c*⟩ ≀ G√T.

Slide 49

Slide 49 text

Closer Look at the √T Rate

Let βˆ‡_t = βˆ‡f_t(Δ‰_t) for brevity. In sum, the O(√T) regret follows from
  βˆ‘_{t=1}^T βŸ¨βˆ‡_t, Δ‰_t βˆ’ c*⟩
  ≀ (1/2Ξ·) [βˆ₯Δ‰_1 βˆ’ c*βˆ₯Β² βˆ’ βˆ₯Δ‰_2 βˆ’ c*βˆ₯Β² + βˆ₯Δ‰_2 βˆ’ c*βˆ₯Β² βˆ’ β‹― βˆ’ βˆ₯Δ‰_t βˆ’ c*βˆ₯Β² + βˆ₯Δ‰_t βˆ’ c*βˆ₯Β² βˆ’ β‹― βˆ’ βˆ₯Δ‰_{T+1} βˆ’ c*βˆ₯Β²] + (Ξ·/2) βˆ‘_{t=1}^T βˆ₯βˆ‡_tβˆ₯Β²
  ≀ (1/2)(1/Ξ· + Ξ· GΒ² T) = G√T with Ξ· = 1/(G√T).

Slide 50

Slide 50 text

Closer Look at the √T Rate

In the bound above, the first part (the penalty) telescopes to O(1): the intermediate terms cancel and βˆ’βˆ₯Δ‰_{T+1} βˆ’ c*βˆ₯Β² is ignored. The second part (the stability) sums to O(T). If the penalty and stability sum to O(1) and O(T), respectively, the regret scales with √T.

Consider achieving better stability via the elliptical potential lemma.

Slide 51

Slide 51 text

Elliptical Potential Lemma

Recall βˆ₯xβˆ₯_A := √(x^T A x) for a PSD matrix A. Let A_0 = Ξ΅I and A_t = A_{t-1} + βˆ‡_t βˆ‡_t^T for t = 1, …, T, where βˆ₯βˆ‡_tβˆ₯ ≀ G. Then
  βˆ‘_{t=1}^T βˆ₯βˆ‡_tβˆ₯Β²_{A_t^{-1}} ≀ log(det A_T / det A_0) ≀ n log(TGΒ²/Ξ΅ + 1).

Slide 52

Slide 52 text

Elliptical Potential Lemma

Recall βˆ₯xβˆ₯_A := √(x^T A x) for a PSD matrix A, A_0 = Ξ΅I, A_t = A_{t-1} + βˆ‡_t βˆ‡_t^T, and βˆ₯βˆ‡_tβˆ₯ ≀ G.

Proof.
  βˆ₯βˆ‡_tβˆ₯Β²_{A_t^{-1}} = βˆ‡_t^T A_t^{-1} βˆ‡_t = A_t^{-1} β€’ βˆ‡_t βˆ‡_t^T = A_t^{-1} β€’ (A_t βˆ’ A_{t-1}) ≀ log(det A_t / det A_{t-1}).
(For a, b > 0, a^{-1}(a βˆ’ b) ≀ log(a/b); apply this carefully to eigenvalues. Equivalently, by the concavity of log det over PSD matrices, βˆ‡ log det(A) β€’ (A βˆ’ B) = A^{-1} β€’ (A βˆ’ B) ≀ log det A βˆ’ log det B = log(det A / det B).)
Summing over t telescopes: βˆ‘_{t=1}^T βˆ₯βˆ‡_tβˆ₯Β²_{A_t^{-1}} ≀ log(det A_T / det A_0). Using det A_0 = Ξ΅^n and det A_T ≀ (Ξ΅ + TGΒ²)^n yields the bound.
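Both inequalities of the lemma can be checked numerically on random gradients (a sketch under the stated setup, with Ξ΅ = 1):

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, eps = 3, 50, 1.0
A = eps * np.eye(n)
grads = rng.standard_normal((T, n))
G = float(np.linalg.norm(grads, axis=1).max())

lhs = 0.0
for g in grads:
    A = A + np.outer(g, g)                      # A_t = A_{t-1} + grad grad^T
    lhs += float(g @ np.linalg.solve(A, g))     # ||grad_t||^2 in the A_t^{-1} norm

log_det_ratio = float(np.linalg.slogdet(A)[1]) - n * np.log(eps)
rhs = n * np.log(T * G**2 / eps + 1)            # the lemma's final bound
```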

Slide 53

Slide 53 text

Online Newton Step

Set Δ‰_1 ∈ Θ. Fix Ξ³, Ξ΅ > 0 and let A_0 = Ξ΅I.
For t = 1, …, T:
  Output Δ‰_t and observe f_t
  A_t ← A_{t-1} + βˆ‡_t βˆ‡_t^T
  Δ‰_{t+1} ← arg min { βˆ₯Δ‰_t βˆ’ Ξ³^{-1} A_t^{-1} βˆ‡_t βˆ’ cβˆ₯_{A_t} : c ∈ Θ }

Slide 54

Slide 54 text

Online Newton Step

From the update rule and the Pythagorean theorem (in the A_t-norm),
  βˆ₯Δ‰_{t+1} βˆ’ c*βˆ₯Β²_{A_t} ≀ βˆ₯Δ‰_t βˆ’ Ξ³^{-1} A_t^{-1} βˆ‡_t βˆ’ c*βˆ₯Β²_{A_t}
                       = βˆ₯Δ‰_t βˆ’ c*βˆ₯Β²_{A_t} + Ξ³^{-2} βˆ₯βˆ‡_tβˆ₯Β²_{A_t^{-1}} βˆ’ 2Ξ³^{-1} βŸ¨βˆ‡_t, Δ‰_t βˆ’ c*⟩.

Slide 55

Slide 55 text

Online Newton Step

Summing the per-round inequality over t,
  βˆ‘_{t=1}^T βŸ¨βˆ‡_t, Δ‰_t βˆ’ c*⟩ ≀ (Ξ³/2) βˆ‘_{t=1}^T (βˆ₯Δ‰_t βˆ’ c*βˆ₯Β²_{A_t} βˆ’ βˆ₯Δ‰_{t+1} βˆ’ c*βˆ₯Β²_{A_t}) + (1/2Ξ³) βˆ‘_{t=1}^T βˆ₯βˆ‡_tβˆ₯Β²_{A_t^{-1}}.

Slide 56

Slide 56 text

Online Newton Step

By the elliptical potential lemma,
  βˆ‘_{t=1}^T βŸ¨βˆ‡_t, Δ‰_t βˆ’ c*⟩ ≀ (Ξ³/2) βˆ‘_{t=1}^T (βˆ₯Δ‰_t βˆ’ c*βˆ₯Β²_{A_t} βˆ’ βˆ₯Δ‰_{t+1} βˆ’ c*βˆ₯Β²_{A_t}) + (n/2Ξ³) log(TGΒ²/Ξ΅ + 1).

Slide 57

Slide 57 text

Online Newton Step

We have
  βˆ‘_{t=1}^T βŸ¨βˆ‡_t, Δ‰_t βˆ’ c*⟩ ≀ (Ξ³/2) βˆ‘_{t=1}^T (βˆ₯Δ‰_t βˆ’ c*βˆ₯Β²_{A_t} βˆ’ βˆ₯Δ‰_{t+1} βˆ’ c*βˆ₯Β²_{A_t}) + (n/2Ξ³) log(TGΒ²/Ξ΅ + 1).
Take a closer look at the penalty (the first sum).

Slide 58

Slide 58 text

ONS: Penalty

βˆ‘_{t=1}^T (βˆ₯Δ‰_t βˆ’ c*βˆ₯Β²_{A_t} βˆ’ βˆ₯Δ‰_{t+1} βˆ’ c*βˆ₯Β²_{A_t})
  = βˆ₯Δ‰_1 βˆ’ c*βˆ₯Β²_{A_1} βˆ’ βˆ₯Δ‰_2 βˆ’ c*βˆ₯Β²_{A_1} + βˆ₯Δ‰_2 βˆ’ c*βˆ₯Β²_{A_2} βˆ’ β‹― βˆ’ βˆ₯Δ‰_t βˆ’ c*βˆ₯Β²_{A_{t-1}} + βˆ₯Δ‰_t βˆ’ c*βˆ₯Β²_{A_t} βˆ’ β‹― βˆ’ βˆ₯Δ‰_{T+1} βˆ’ c*βˆ₯Β²_{A_T}.
Unlike in OGD, the norms change over time, so the sum does not telescope directly.

Slide 59

Slide 59 text

ONS: Penalty

In the expansion above, each adjacent pair combines as
  βˆ₯Δ‰_t βˆ’ c*βˆ₯Β²_{A_t} βˆ’ βˆ₯Δ‰_t βˆ’ c*βˆ₯Β²_{A_{t-1}} = (Δ‰_t βˆ’ c*)^T (A_t βˆ’ A_{t-1}) (Δ‰_t βˆ’ c*) = βŸ¨βˆ‡_t, Δ‰_t βˆ’ c*⟩²,
and the last term βˆ’βˆ₯Δ‰_{T+1} βˆ’ c*βˆ₯Β²_{A_T} ≀ 0 is ignored.

Slide 60

Slide 60 text

ONS: Penalty

Hence the penalty is at most
  βˆ₯Δ‰_1 βˆ’ c*βˆ₯Β²_{A_0} + (βˆ₯Δ‰_1 βˆ’ c*βˆ₯Β²_{A_1} βˆ’ βˆ₯Δ‰_1 βˆ’ c*βˆ₯Β²_{A_0}) + βˆ‘_{t=2}^T βŸ¨βˆ‡_t, Δ‰_t βˆ’ c*⟩²
  ≀ Ξ΅ + βˆ‘_{t=1}^T βŸ¨βˆ‡_t, Δ‰_t βˆ’ c*⟩²,
using βˆ₯Δ‰_1 βˆ’ c*βˆ₯Β²_{A_0} = Ξ΅ βˆ₯Δ‰_1 βˆ’ c*βˆ₯Β² ≀ Ξ΅.

Slide 61

Slide 61 text

ONS: Penalty

Therefore, setting Ξ΅ = 1/Ξ³Β²,
  βˆ‘_{t=1}^T βŸ¨βˆ‡_t, Δ‰_t βˆ’ c*⟩ ≀ Ξ³Ξ΅/2 + (Ξ³/2) βˆ‘_{t=1}^T βŸ¨βˆ‡_t, Δ‰_t βˆ’ c*⟩² + (n/2Ξ³) log(TGΒ²/Ξ΅ + 1)
  ⟹ βˆ‘_{t=1}^T (βŸ¨βˆ‡_t, Δ‰_t βˆ’ c*⟩ βˆ’ (Ξ³/2) βŸ¨βˆ‡_t, Δ‰_t βˆ’ c*⟩²) ≀ (n/2Ξ³) log(TGΒ²Ξ³Β² + 1) + 1/(2Ξ³).

Slide 62

Slide 62 text

ONS: Using Exp-Concavity

If f_t is Ξ±-exp-concave, then for Ξ³ ≀ (1/2) min{Ξ±, 1/G},
  f_t(Δ‰_t) βˆ’ f_t(c*) ≀ βŸ¨βˆ‡_t, Δ‰_t βˆ’ c*⟩ βˆ’ (Ξ³/2) βŸ¨βˆ‡_t, Δ‰_t βˆ’ c*⟩²
(cf. if f_t is merely convex, f_t(Δ‰_t) βˆ’ f_t(c*) ≀ βŸ¨βˆ‡_t, Δ‰_t βˆ’ c*⟩).

Therefore, setting Ξ³ = (1/2) min{Ξ±, 1/G}, we obtain the desired regret bound:
  βˆ‘_{t=1}^T (f_t(Δ‰_t) βˆ’ f_t(c*)) ≀ βˆ‘_{t=1}^T (βŸ¨βˆ‡_t, Δ‰_t βˆ’ c*⟩ βˆ’ (Ξ³/2) βŸ¨βˆ‡_t, Δ‰_t βˆ’ c*⟩²)
  ≀ (n/2Ξ³) log(TGΒ²Ξ³Β² + 1) + 1/(2Ξ³) ≀ n (1/Ξ± + G) (log(T/4 + 1) + 1),
using 1/(2Ξ³) = 1/min{Ξ±, 1/G} ≀ 1/Ξ± + G and GΒ²Ξ³Β² ≀ 1/4.

Slide 63

Slide 63 text

MetaGrad (Tim van Erven, Wouter M. Koolen, and Dirk van der Hoeven; NeurIPS 2016, JMLR 2021)

Slide 64

Slide 64 text

ONS Requires Prior Knowledge of Ξ±

Set Δ‰_1 ∈ Θ. Fix Ξ³ = (1/2) min{Ξ±, 1/G}, Ξ΅ = 1/Ξ³Β², and let A_0 = Ξ΅I.
For t = 1, …, T:
  Output Δ‰_t and observe f_t
  A_t ← A_{t-1} + βˆ‡_t βˆ‡_t^T
  Δ‰_{t+1} ← arg min { βˆ₯Δ‰_t βˆ’ Ξ³^{-1} A_t^{-1} βˆ‡_t βˆ’ cβˆ₯_{A_t} : c ∈ Θ }

We may not know Ξ± in advance; even worse, we may encounter Ξ± = 0 at some round. ONS fails in such uncertain situations. Idea: adapt to the uncertainty with multiple ONS instances!

Slide 65

Slide 65 text

MetaGrad: High-Level Idea

β€’ Loss functions f_t are Ξ±-exp-concave, but Ξ± is unknown, possibly Ξ± = 0.
β€’ Keep experts with different learning rates Ξ· > 0, called Ξ·-experts.
β€’ Each Ξ·-expert runs ONS with its own exp-concave surrogate loss f_t^Ξ·.
β€’ MetaGrad aggregates the experts' outputs into a single output.

Slide 66

Slide 66 text

MetaGrad: High-Level Idea

Theorem (Van Erven et al. 2016, 2021). MetaGrad simultaneously enjoys
β€’ βˆ‘_{t=1}^T (f_t(w_t) βˆ’ f_t(u)) ≀ O(n (G + 1/Ξ±) log T), and
β€’ βˆ‘_{t=1}^T (f_t(w_t) βˆ’ f_t(u)) ≀ O(G √(T log log T)).

Slide 67

Slide 67 text

Notation

β€’ For each t: the learner's loss f_t, the learner's output w_t, and g_t ∈ βˆ‚f_t(w_t).
β€’ 𝒒 := { Ξ·_i = 2^{βˆ’i}/(5G) : i = 0, 1, 2, …, ⌈(1/2) logβ‚‚ TβŒ‰ } is the set of Ξ· values; |𝒒| = Θ(log T).
β€’ For each t and Ξ·: the Ξ·-expert's loss f_t^Ξ· and the Ξ·-expert's output w_t^Ξ·, where
  f_t^Ξ·(w) := βˆ’Ξ· ⟨w_t βˆ’ w, g_t⟩ + Ξ·Β² ⟨w_t βˆ’ w, g_t⟩² for all w ∈ Θ.

Slide 68

Slide 68 text

Notation (protocol, round t)

1. The learner plays w_t.
2. The learner incurs f_t(w_t) and observes g_t ∈ βˆ‚f_t(w_t).

Slide 69

Slide 69 text

Notation (protocol, round t)

1. The learner plays w_t.
2. The learner incurs f_t(w_t) and observes g_t ∈ βˆ‚f_t(w_t).
3. The learner sends w_t and g_t to the Ξ·-experts.

Slide 70

Slide 70 text

Notation (protocol, round t)

1. The learner plays w_t.
2. The learner incurs f_t(w_t) and observes g_t ∈ βˆ‚f_t(w_t).
3. The learner sends w_t and g_t to the Ξ·-experts.
4. Each Ξ·-expert computes w_{t+1}^Ξ· via ONS applied to f_t^Ξ·.

Slide 71

Slide 71 text

Notation (protocol, round t)

1. The learner plays w_t.
2. The learner incurs f_t(w_t) and observes g_t ∈ βˆ‚f_t(w_t).
3. The learner sends w_t and g_t to the Ξ·-experts.
4. Each Ξ·-expert computes w_{t+1}^Ξ· via ONS applied to f_t^Ξ·.
5. The learner aggregates the experts' outputs w_{t+1}^Ξ· to compute w_{t+1}.
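The Ξ·-grid and the aggregation step can be sketched as follows. Assumptions: the grid constant 1/(5G) follows the MetaGrad learning rates Ξ·_i = 2^{βˆ’i}/(5DG) with diameter D = 1, and `aggregate` implements the tilted average w = βˆ‘_Ξ· Ο€(Ξ·) Ξ· w^Ξ· / βˆ‘_Ξ· Ο€(Ξ·) Ξ·; the expert outputs and weights here are dummy placeholders, not a full MetaGrad run.

```python
import numpy as np

def eta_grid(T, G=1.0):
    """Learning-rate grid eta_i = 2^{-i}/(5G), i = 0, ..., ceil(log2(T)/2)."""
    i_max = int(np.ceil(0.5 * np.log2(T)))
    return np.array([2.0 ** (-i) / (5 * G) for i in range(i_max + 1)])

def aggregate(etas, pis, expert_ws):
    """Tilted average: w = sum_eta pi(eta)*eta*w^eta / sum_eta pi(eta)*eta."""
    weights = pis * etas
    return (weights[:, None] * expert_ws).sum(axis=0) / weights.sum()

etas = eta_grid(T=1024)                       # 6 rates: 0.2, 0.1, ..., 0.00625
pis = np.ones(len(etas)) / len(etas)          # uniform weights over experts
expert_ws = np.tile(np.arange(len(etas), dtype=float)[:, None], (1, 2))  # dummy outputs
w = aggregate(etas, pis, expert_ws)           # a convex combination of expert outputs
```

In MetaGrad proper, the weights Ο€(Ξ·) would be updated multiplicatively from the experts' surrogate losses rather than kept uniform.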

Slide 72

Slide 72 text

Regret Decomposition 72
Since every f_t is convex, for any comparator u ∈ Θ, we have
Ξ£_{t=1}^T ( f_t(w_t) βˆ’ f_t(u) ) ≀ Ξ£_{t=1}^T ⟨w_t βˆ’ u, g_t⟩ =: RΜƒ_T^u.

Slide 73

Slide 73 text

Regret Decomposition 73
Since every f_t is convex, for any comparator u ∈ Θ, we have
Ξ£_{t=1}^T ( f_t(w_t) βˆ’ f_t(u) ) ≀ Ξ£_{t=1}^T ⟨w_t βˆ’ u, g_t⟩ =: RΜƒ_T^u.
Decomposition of RΜƒ_T^u
Recall f_t^η(w) = βˆ’Ξ· ⟨w_t βˆ’ w, g_t⟩ + η² ⟨w_t βˆ’ w, g_t⟩².
RΜƒ_T^u = Ξ£_{t=1}^T ⟨w_t βˆ’ u, g_t⟩ = βˆ’(1/η) Ξ£_{t=1}^T f_t^η(u) + η Ξ£_{t=1}^T ⟨w_t βˆ’ u, g_t⟩² = βˆ’(1/η) Ξ£_{t=1}^T f_t^η(u) + η V_T^u,
where V_T^u := Ξ£_{t=1}^T ⟨w_t βˆ’ u, g_t⟩².
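The equality above is a pointwise algebraic identity: writing r := ⟨w_t βˆ’ u, g_t⟩, the definition f_t^η(u) = βˆ’Ξ·r + η²r² gives βˆ’f_t^η(u)/η + ηr² = r. A minimal numerical check (the sampled ranges are arbitrary):

```python
import random

def surrogate(eta, r):
    """f_t^eta(u) expressed via r := <w_t - u, g_t>."""
    return -eta * r + eta ** 2 * r ** 2

random.seed(0)
for _ in range(1000):
    eta = random.uniform(1e-3, 0.2)
    r = random.uniform(-1.0, 1.0)
    # the identity behind the decomposition: r = -f_t^eta(u)/eta + eta * r^2
    assert abs(r - (-surrogate(eta, r) / eta + eta * r ** 2)) < 1e-9
# and f_t^eta(w_t) = 0, since r = 0 when u = w_t
assert surrogate(0.1, 0.0) == 0.0
```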

Slide 74

Slide 74 text

Regret Decomposition 74
Since every f_t is convex, for any comparator u ∈ Θ, we have
Ξ£_{t=1}^T ( f_t(w_t) βˆ’ f_t(u) ) ≀ Ξ£_{t=1}^T ⟨w_t βˆ’ u, g_t⟩ =: RΜƒ_T^u.
Decomposition of RΜƒ_T^u
Recall f_t^η(w) = βˆ’Ξ· ⟨w_t βˆ’ w, g_t⟩ + η² ⟨w_t βˆ’ w, g_t⟩².
RΜƒ_T^u = Ξ£_{t=1}^T ⟨w_t βˆ’ u, g_t⟩ = βˆ’(1/η) Ξ£_{t=1}^T f_t^η(u) + η Ξ£_{t=1}^T ⟨w_t βˆ’ u, g_t⟩² = βˆ’(1/η) Ξ£_{t=1}^T f_t^η(u) + η V_T^u,
where V_T^u := Ξ£_{t=1}^T ⟨w_t βˆ’ u, g_t⟩².
By using f_t^η(w_t) = 0, for all η ∈ 𝒒 simultaneously,
RΜƒ_T^u = (1/η) [ Ξ£_{t=1}^T ( f_t^η(w_t) βˆ’ f_t^η(w_t^η) ) + Ξ£_{t=1}^T ( f_t^η(w_t^η) βˆ’ f_t^η(u) ) ] + η V_T^u.
(The first sum is the regret of the learner against w_t^η; the second is the regret of the η-expert against u.)

Slide 75

Slide 75 text

Bounding Each Component 75
For all η ∈ 𝒒 simultaneously,
RΜƒ_T^u = (1/η) [ Ξ£_{t=1}^T ( f_t^η(w_t) βˆ’ f_t^η(w_t^η) ) + Ξ£_{t=1}^T ( f_t^η(w_t^η) βˆ’ f_t^η(u) ) ] + η V_T^u.
1. Regret of the learner against w_t^η.  2. Regret of the η-expert against u.

Slide 76

Slide 76 text

Bounding Each Component 76
For all η ∈ 𝒒 simultaneously,
RΜƒ_T^u = (1/η) [ Ξ£_{t=1}^T ( f_t^η(w_t) βˆ’ f_t^η(w_t^η) ) + Ξ£_{t=1}^T ( f_t^η(w_t^η) βˆ’ f_t^η(u) ) ] + η V_T^u.
1. Regret of the learner against w_t^η.  2. Regret of the η-expert against u.
If the w_t^η are aggregated by exponentially weighted averaging, term 1 is O(log log T).
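A minimal sketch of one aggregation step, assuming exponentially weighted averaging with η-tilted weights in the style of MetaGrad (the exact weighting scheme in the talk's algorithm may differ): each η-expert's output is weighted by η · exp(−cumulative surrogate loss).

```python
import math

def aggregate(outputs, cum_losses):
    """Exponentially weighted average of the eta-experts' outputs.
    outputs: {eta: w_eta as a list}; cum_losses: {eta: sum of f_s^eta(w_s^eta)}."""
    weights = {eta: eta * math.exp(-cum_losses[eta]) for eta in outputs}
    Z = sum(weights.values())
    dim = len(next(iter(outputs.values())))
    return [sum(weights[eta] * outputs[eta][i] for eta in outputs) / Z
            for i in range(dim)]

# with equal cumulative losses, the result is the eta-weighted mean of the outputs
w = aggregate({0.2: [1.0, 0.0], 0.1: [0.0, 1.0]}, {0.2: 0.0, 0.1: 0.0})
```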

Slide 77

Slide 77 text

Bounding Each Component 77
For all η ∈ 𝒒 simultaneously,
RΜƒ_T^u = (1/η) [ Ξ£_{t=1}^T ( f_t^η(w_t) βˆ’ f_t^η(w_t^η) ) + Ξ£_{t=1}^T ( f_t^η(w_t^η) βˆ’ f_t^η(u) ) ] + η V_T^u.
1. Regret of the learner against w_t^η.  2. Regret of the η-expert against u.
If the w_t^η are aggregated by exponentially weighted averaging, term 1 is O(log log T).
Since w_t^η is computed by ONS applied to f_t^η, term 2 is O(n log T).
(By elementary calculation, f_t^η is Ω(1)-exp-concave and β€–βˆ‡f_t^η(w_t^η)β€– = O(1) for every η ∈ 𝒒 βŠ† (0, 1/(5G)].)
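A sketch of the ONS update each η-expert runs on its surrogate loss. Two simplifications are assumed here: the feasible set is taken to be a Euclidean ball (ONS proper projects in the norm induced by A_t), and γ is a fixed illustrative constant.

```python
import numpy as np

def surrogate_grad(eta, w, w_t, g_t):
    """Gradient of f_t^eta(w) = -eta<w_t - w, g_t> + eta^2 <w_t - w, g_t>^2."""
    r = (w_t - w) @ g_t
    return eta * g_t - 2.0 * eta ** 2 * r * g_t

def ons_step(w, grad, A_inv, gamma=0.25, radius=1.0):
    """One Online Newton Step with A_t = A_{t-1} + grad grad^T."""
    Ag = A_inv @ grad
    A_inv = A_inv - np.outer(Ag, Ag) / (1.0 + grad @ Ag)  # Sherman-Morrison update
    w = w - (1.0 / gamma) * (A_inv @ grad)                # Newton-style step
    nrm = np.linalg.norm(w)
    if nrm > radius:                                      # crude Euclidean projection
        w = w * (radius / nrm)
    return w, A_inv

# a few illustrative steps on one fixed surrogate
w, A_inv = np.zeros(3), np.eye(3)
w_t, g_t = np.array([1.0, 0.0, 0.0]), np.array([0.5, 0.5, 0.0])
for _ in range(20):
    w, A_inv = ons_step(w, surrogate_grad(0.1, w, w_t, g_t), A_inv)
```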

Slide 78

Slide 78 text

Bounding Each Component 78
For all η ∈ 𝒒 simultaneously,
RΜƒ_T^u = (1/η) [ Ξ£_{t=1}^T ( f_t^η(w_t) βˆ’ f_t^η(w_t^η) ) + Ξ£_{t=1}^T ( f_t^η(w_t^η) βˆ’ f_t^η(u) ) ] + η V_T^u.
1. Regret of the learner against w_t^η.  2. Regret of the η-expert against u.
If the w_t^η are aggregated by exponentially weighted averaging, term 1 is O(log log T).
Since w_t^η is computed by ONS applied to f_t^η, term 2 is O(n log T).
(By elementary calculation, f_t^η is Ω(1)-exp-concave and β€–βˆ‡f_t^η(w_t^η)β€– = O(1) for every η ∈ 𝒒 βŠ† (0, 1/(5G)].)
Therefore, for all η ∈ 𝒒 simultaneously, RΜƒ_T^u = O( (n log T)/η + η V_T^u ).

Slide 79

Slide 79 text

Infeasible Ideal Tuning 79
If V_T^u = Ξ£_{t=1}^T ⟨w_t βˆ’ u, g_t⟩² were known a priori, by using only η = η* ≃ √( (n log T)/V_T^u ),
RΜƒ_T^u = O( (n log T)/η + η V_T^u ) ≃ O( √( V_T^u n log T ) ).
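The ideal η* is just the minimizer of the bound η ↦ (n log T)/η + η V_T^u: at η* = √((n log T)/V_T^u) the two terms balance and the bound equals 2√(V_T^u · n log T). A quick check with arbitrary illustrative values:

```python
import math

def regret_bound(eta, a, V):
    """The bound a/eta + eta*V, with a standing for n log T and V for V_T^u."""
    return a / eta + eta * V

a, V = 10.0, 400.0                      # illustrative values
eta_star = math.sqrt(a / V)
best = regret_bound(eta_star, a, V)
assert abs(best - 2.0 * math.sqrt(a * V)) < 1e-9
# eta* minimizes the bound (AM-GM): no eta on a fine grid does better
assert all(best <= regret_bound(0.001 * k, a, V) + 1e-9 for k in range(1, 500))
```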

Slide 80

Slide 80 text

Infeasible Ideal Tuning 80
If V_T^u = Ξ£_{t=1}^T ⟨w_t βˆ’ u, g_t⟩² were known a priori, by using only η = η* ≃ √( (n log T)/V_T^u ),
RΜƒ_T^u = O( (n log T)/η + η V_T^u ) ≃ O( √( V_T^u n log T ) ).
If it turns out that all f_t are Ξ±-exp-concave, then (informally)
Ξ£_{t=1}^T ( f_t(w_t) βˆ’ f_t(u) ) ≀ Ξ£_{t=1}^T ⟨w_t βˆ’ u, g_t⟩ βˆ’ (Ξ±/2) Ξ£_{t=1}^T ⟨w_t βˆ’ u, g_t⟩² = RΜƒ_T^u βˆ’ (Ξ±/2) V_T^u.

Slide 81

Slide 81 text

Infeasible Ideal Tuning 81
If V_T^u = Ξ£_{t=1}^T ⟨w_t βˆ’ u, g_t⟩² were known a priori, by using only η = η* ≃ √( (n log T)/V_T^u ),
RΜƒ_T^u = O( (n log T)/η + η V_T^u ) ≃ O( √( V_T^u n log T ) ).
If it turns out that all f_t are Ξ±-exp-concave, then (informally)
Ξ£_{t=1}^T ( f_t(w_t) βˆ’ f_t(u) ) ≀ Ξ£_{t=1}^T ⟨w_t βˆ’ u, g_t⟩ βˆ’ (Ξ±/2) Ξ£_{t=1}^T ⟨w_t βˆ’ u, g_t⟩² = RΜƒ_T^u βˆ’ (Ξ±/2) V_T^u.
By self-bounding, regardless of the value of V_T^u,
RΜƒ_T^u βˆ’ (Ξ±/2) V_T^u ≲ √( V_T^u n log T ) βˆ’ (Ξ±/2) V_T^u ≲ (n/Ξ±) log T,
achieving the same bound as ONS without using Ξ±.

Slide 82

Slide 82 text

Infeasible Ideal Tuning 82
If V_T^u = Ξ£_{t=1}^T ⟨w_t βˆ’ u, g_t⟩² were known a priori, by using only η = η* ≃ √( (n log T)/V_T^u ),
RΜƒ_T^u = O( (n log T)/η + η V_T^u ) ≃ O( √( V_T^u n log T ) ).
If it turns out that all f_t are Ξ±-exp-concave, then (informally)
Ξ£_{t=1}^T ( f_t(w_t) βˆ’ f_t(u) ) ≀ Ξ£_{t=1}^T ⟨w_t βˆ’ u, g_t⟩ βˆ’ (Ξ±/2) Ξ£_{t=1}^T ⟨w_t βˆ’ u, g_t⟩² = RΜƒ_T^u βˆ’ (Ξ±/2) V_T^u.
By self-bounding, regardless of the value of V_T^u,
RΜƒ_T^u βˆ’ (Ξ±/2) V_T^u ≲ √( V_T^u n log T ) βˆ’ (Ξ±/2) V_T^u ≲ (n/Ξ±) log T,
achieving the same bound as ONS without using Ξ±.
However, V_T^u is unknown… Use the fact that RΜƒ_T^u = O( (n log T)/η + η V_T^u ) holds for all η ∈ 𝒒!

Slide 83

Slide 83 text

Exploiting Multiple Learning Rates 83
Let η* ≃ √( (n log T)/V_T^u ) β‰₯ 1/(5G√T) be the unknown best learning rate.
Recall 𝒒 = { η_i = 2^(βˆ’i)/(5G) : i = 0, 1, 2, …, ⌈(1/2) logβ‚‚ T⌉ } βŠ† (0, 1/(5G)]
(RΜƒ_T^u ≲ (n log T)/η + η V_T^u holds for all η ∈ 𝒒).

Slide 84

Slide 84 text

Exploiting Multiple Learning Rates 84
Let η* ≃ √( (n log T)/V_T^u ) β‰₯ 1/(5G√T) be the unknown best learning rate.
Recall 𝒒 = { η_i = 2^(βˆ’i)/(5G) : i = 0, 1, 2, …, ⌈(1/2) logβ‚‚ T⌉ } βŠ† (0, 1/(5G)]
(RΜƒ_T^u ≲ (n log T)/η + η V_T^u holds for all η ∈ 𝒒).
If η* ∈ [1/(5G√T), 1/(5G)], there exists η ∈ 𝒒 s.t. η* ∈ [η/2, η], hence
RΜƒ_T^u ≲ (n log T)/η + η V_T^u ≀ (n log T)/η* + 2η* V_T^u ≃ √( V_T^u n log T ).
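The coverage claim, that the halving grid always contains an η within a factor 2 above any η* in range, can be checked directly; the values of T and G below are illustrative assumptions.

```python
import math

T, G = 4096, 1.0
grid = [2.0 ** (-i) / (5.0 * G) for i in range(math.ceil(math.log2(T) / 2) + 1)]
lo, hi = 1.0 / (5.0 * G * math.sqrt(T)), 1.0 / (5.0 * G)

covered = 0
for k in range(1000):                     # sweep eta* across [lo, hi)
    eta_star = lo + (hi - lo) * k / 1000.0
    if any(eta / 2.0 <= eta_star <= eta for eta in grid):
        covered += 1
assert covered == 1000                    # every eta* is within a factor 2 of a grid point
```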

Slide 85

Slide 85 text

Exploiting Multiple Learning Rates 85
Let η* ≃ √( (n log T)/V_T^u ) β‰₯ 1/(5G√T) be the unknown best learning rate.
Recall 𝒒 = { η_i = 2^(βˆ’i)/(5G) : i = 0, 1, 2, …, ⌈(1/2) logβ‚‚ T⌉ } βŠ† (0, 1/(5G)]
(RΜƒ_T^u ≲ (n log T)/η + η V_T^u holds for all η ∈ 𝒒).
If η* ∈ [1/(5G√T), 1/(5G)], there exists η ∈ 𝒒 s.t. η* ∈ [η/2, η], hence
RΜƒ_T^u ≲ (n log T)/η + η V_T^u ≀ (n log T)/η* + 2η* V_T^u ≃ √( V_T^u n log T ).
If η* ≃ √( (n log T)/V_T^u ) β‰₯ 1/(5G), we have V_T^u ≲ G² n log T. Thus, for η = η₀ = 1/(5G),
RΜƒ_T^u ≃ (n log T)/η₀ + η₀ V_T^u ≲ nG log T.
In any case, RΜƒ_T^u = O( √( V_T^u n log T ) + nG log T ), implying
Ξ£_{t=1}^T ( f_t(w_t) βˆ’ f_t(u) ) ≀ RΜƒ_T^u ≲ n (1/Ξ± + G) log T.

Slide 86

Slide 86 text

Online Inverse Linear Optimization with MetaGrad: Robustness to Suboptimality

Slide 87

Slide 87 text

Learning with Suboptimal Actions 87
For t = 1, …, T:
1. πŸ€– Learner makes a prediction ĉ_t ∈ Θ of c*.
2. πŸ‘©πŸ¦° Agent faces X_t and takes a possibly suboptimal action x_t ∈ X_t.
3. Learner observes (X_t, x_t) and updates ĉ_t to ĉ_{t+1}.
Define (with x̂_t optimal for ĉ_t on X_t):
β€’ RΜƒ_T^{c*} := Ξ£_{t=1}^T ⟨ĉ_t βˆ’ c*, x̂_t βˆ’ x_t⟩.
β€’ V_T^{c*} := Ξ£_{t=1}^T ⟨ĉ_t βˆ’ c*, x̂_t βˆ’ x_t⟩².
β€’ Ξ”_T := Ξ£_{t=1}^T ( max{ ⟨c*, x⟩ : x ∈ X_t } βˆ’ ⟨c*, x_t⟩ ) (cumulative suboptimality).

Slide 88

Slide 88 text

Learning with Suboptimal Actions 88
Theorem. For ĉ_1, …, ĉ_T ∈ Θ computed by MetaGrad with feedback subgradients x̂_t βˆ’ x_t, it holds that
R_T^{c*} ≀ RΜƒ_T^{c*} = O( n log T + √( n Ξ”_T log T ) ).
βœ“ Sublinear in Ξ”_T (cf. corruption robustness in bandits).
βœ“ Recovers RΜƒ_T^{c*} = O(n log T) when Ξ”_T = 0.

Slide 89

Slide 89 text

Learning with Suboptimal Actions 89
Theorem. For ĉ_1, …, ĉ_T ∈ Θ computed by MetaGrad with feedback subgradients x̂_t βˆ’ x_t, it holds that
R_T^{c*} ≀ RΜƒ_T^{c*} = O( n log T + √( n Ξ”_T log T ) ).
βœ“ Sublinear in Ξ”_T (cf. corruption robustness in bandits).
βœ“ Recovers RΜƒ_T^{c*} = O(n log T) when Ξ”_T = 0.
Proof sketch
By the same argument as the regret analysis of MetaGrad,
RΜƒ_T^{c*} = Ξ£_{t=1}^T ⟨ĉ_t βˆ’ c*, x̂_t βˆ’ x_t⟩ ≲ √( V_T^{c*} n log T ) + n log T.
Also, V_T^{c*} = Ξ£_{t=1}^T ⟨ĉ_t βˆ’ c*, x̂_t βˆ’ x_t⟩² ≀ RΜƒ_T^{c*} + 2Ξ”_T holds (cf. V_T^{c*} ≀ RΜƒ_T^{c*} if every x_t is optimal).
The claim follows from the sub-additivity of x ↦ √x and self-bounding.
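The last step combines two elementary facts that can be sanity-checked numerically; below, K stands for n log T, D for Ξ”_T, and the constant 4 in the candidate bound is an illustrative choice, not the constant from the talk.

```python
import math, random

random.seed(1)
for _ in range(1000):
    # (i) sub-additivity of the square root: sqrt(a + b) <= sqrt(a) + sqrt(b)
    a, b = random.uniform(0.0, 100.0), random.uniform(0.0, 100.0)
    assert math.sqrt(a + b) <= math.sqrt(a) + math.sqrt(b) + 1e-12

    # (ii) self-bounding: R = 4(K + sqrt(D*K)) satisfies R >= sqrt((R + 2D)K) + K,
    # so any R obeying R <= sqrt((R + 2D)K) + K is at most O(K + sqrt(D*K))
    K, D = random.uniform(0.1, 100.0), random.uniform(0.0, 100.0)
    R = 4.0 * (K + math.sqrt(D * K))
    assert math.sqrt((R + 2.0 * D) * K) + K <= R + 1e-9
```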

Slide 90

Slide 90 text

Toward Tight Regret Analysis

Slide 91

Slide 91 text

Ξ©(n) Lower Bound 91
Focus on the regret R_T^{c*} = Ξ£_{t=1}^T ⟨c*, x_t βˆ’ x̂_t⟩.
Theorem. For any (possibly randomized) learner, there is an instance such that R_T^{c*} = Ξ©(n).

Slide 92

Slide 92 text

Ξ©(n) Lower Bound 92
Focus on the regret R_T^{c*} = Ξ£_{t=1}^T ⟨c*, x_t βˆ’ x̂_t⟩.
Theorem. For any (possibly randomized) learner, there is an instance such that R_T^{c*} = Ξ©(n).
Intuition
Since c* ∈ ℝⁿ is unknown, if the elements of c* are drawn at random and X_1, …, X_n are restricted to line segments, any deterministic learner makes Ξ©(n) mistakes in expectation. By Yao's minimax principle, for any randomized learner there is a worst-case instance on which Ξ©(n) regret is inevitable.

Slide 93

Slide 93 text

Ξ©(n) Lower Bound 93
Focus on the regret R_T^{c*} = Ξ£_{t=1}^T ⟨c*, x_t βˆ’ x̂_t⟩.
Theorem. For any (possibly randomized) learner, there is an instance such that R_T^{c*} = Ξ©(n).
Intuition
Since c* ∈ ℝⁿ is unknown, if the elements of c* are drawn at random and X_1, …, X_n are restricted to line segments, any deterministic learner makes Ξ©(n) mistakes in expectation. By Yao's minimax principle, for any randomized learner there is a worst-case instance on which Ξ©(n) regret is inevitable.
Can the log T in the upper bound be removed?

Slide 94

Slide 94 text

Revisiting the Cone-Based Approach 94
Assume:
β€’ Θ = { θ ∈ ℝⁿ : ‖θ‖ = 1 } = S^{nβˆ’1} and diam(X_t) ≀ 1.
β€’ x_t and x̂_t are optimal for c* and ĉ_t, respectively.

Slide 95

Slide 95 text

Revisiting the Cone-Based Approach 95
Assume:
β€’ Θ = { θ ∈ ℝⁿ : ‖θ‖ = 1 } = S^{nβˆ’1} and diam(X_t) ≀ 1.
β€’ x_t and x̂_t are optimal for c* and ĉ_t, respectively.
Lemma (Besbes et al. 2023). Let θ(c*, ĉ_t) be the angle between c*, ĉ_t ∈ S^{nβˆ’1}. If θ(c*, ĉ_t) ≀ π/2, we have
⟨c*, x_t βˆ’ x̂_t⟩ ≀ cos θ(c*, x_t βˆ’ x̂_t) ≀ sin θ(c*, ĉ_t).

Slide 96

Slide 96 text

Revisiting the Cone-Based Approach 96
Assume:
β€’ Θ = { θ ∈ ℝⁿ : ‖θ‖ = 1 } = S^{nβˆ’1} and diam(X_t) ≀ 1.
β€’ x_t and x̂_t are optimal for c* and ĉ_t, respectively.
Lemma (Besbes et al. 2023). Let θ(c*, ĉ_t) be the angle between c*, ĉ_t ∈ S^{nβˆ’1}. If θ(c*, ĉ_t) ≀ π/2, we have
⟨c*, x_t βˆ’ x̂_t⟩ ≀ cos θ(c*, x_t βˆ’ x̂_t) ≀ sin θ(c*, ĉ_t).
Proof: ⟨c*, x_t βˆ’ x̂_t⟩ β‰₯ 0 and ⟨ĉ_t, x_t βˆ’ x̂_t⟩ ≀ 0 must hold by the optimality assumption.
(Figure: the region where x_t βˆ’ x̂_t must lie, given these two constraints and θ(c*, ĉ_t) ≀ π/2.)
Therefore, cos θ(c*, x_t βˆ’ x̂_t) ≀ cos( Ο€/2 βˆ’ θ(c*, ĉ_t) ) = sin θ(c*, ĉ_t).
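The lemma is easy to verify empirically. The sketch below samples random instances (the dimension, finite action sets, and sample counts are arbitrary choices): unit vectors c*, ĉ at angle at most π/2, an action set of diameter at most 1, and the respective optimal actions.

```python
import numpy as np

rng = np.random.default_rng(0)

def unit(v):
    return v / np.linalg.norm(v)

for _ in range(2000):
    c_star = unit(rng.normal(size=3))
    c_hat = unit(rng.normal(size=3))
    if c_star @ c_hat < 0.0:
        c_hat = -c_hat                        # enforce theta(c*, c_hat) <= pi/2
    X = rng.normal(size=(5, 3))               # finite action set
    diam = np.max(np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1))
    X = X / max(1.0, diam)                    # rescale so diam(X) <= 1
    x = X[np.argmax(X @ c_star)]              # optimal action for c*
    x_hat = X[np.argmax(X @ c_hat)]           # optimal action for c_hat
    theta = np.arccos(np.clip(c_star @ c_hat, -1.0, 1.0))
    assert c_star @ (x - x_hat) <= np.sin(theta) + 1e-9
```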

Slide 97

Slide 97 text

An O(1)-Regret Algorithm for n = 2  97
Theorem. Algorithm 1 achieves E[ R_T^{c*} ] ≀ 2Ο€.
Independent of T (but extending to n > 2 seems challenging, as discussed later).
β€’ 𝒩_t := { c ∈ S¹ : ⟨c, x_t βˆ’ x⟩ β‰₯ 0 for all x ∈ X_t } is the normal cone of X_t at x_t.
β€’ π’ž_t is the region such that c* ∈ π’ž_t does not contradict x_s ∈ arg max_{x ∈ X_s} ⟨c*, x⟩ for s = 1, …, t βˆ’ 1.
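The following is a discretized simulation sketch of this idea for n = 2. Several simplifications are assumed for illustration: the actual Algorithm 1 maintains the arc π’ž_t exactly, whereas here π’ž_t is a finite grid of candidate directions, the action sets are random segments of diameter at most 1, and a small filtering tolerance absorbs the discretization error.

```python
import math, random

random.seed(0)
theta_star = random.uniform(0.0, 2.0 * math.pi)
c_star = (math.cos(theta_star), math.sin(theta_star))   # hidden objective direction

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

M = 5000                                                # grid resolution on S^1
cands = [(math.cos(2.0 * math.pi * k / M), math.sin(2.0 * math.pi * k / M))
         for k in range(M)]                             # discretized C_t
regret = 0.0
for t in range(200):
    # X_t: a random segment's endpoints, with diam(X_t) <= 1
    a = (random.uniform(-0.35, 0.35), random.uniform(-0.35, 0.35))
    b = (random.uniform(-0.35, 0.35), random.uniform(-0.35, 0.35))
    c_hat = random.choice(cands)                        # draw c_hat_t from C_t
    x_hat = max([a, b], key=lambda x: dot(c_hat, x))    # learner's predicted action
    x = max([a, b], key=lambda x: dot(c_star, x))       # agent's optimal action
    regret += dot(c_star, x) - dot(c_star, x_hat)
    # C_{t+1} <- C_t intersected with the normal cone of X_t at x_t; the 2e-3
    # slack keeps the grid direction nearest c* from being dropped by rounding
    cands = [c for c in cands
             if dot(c, x) >= max(dot(c, a), dot(c, b)) - 2e-3]
```

Directions consistent with c* are never eliminated, so the candidate set only shrinks, mirroring the update π’ž_{t+1} ← π’ž_t ∩ 𝒩_t used in the proof.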

Slide 98

Slide 98 text

Proof 98
Focus on round t. If ĉ_t ∈ int(𝒩_t), then x̂_t = x_t and hence ⟨c*, x_t βˆ’ x̂_t⟩ = 0.
Taking the expectation over the draw of ĉ_t ∈ π’ž_t,
E[ ⟨c*, x_t βˆ’ x̂_t⟩ ] = Pr[ ĉ_t ∈ π’ž_t βˆ– int(𝒩_t) ] Β· E[ ⟨c*, x_t βˆ’ x̂_t⟩ | ĉ_t ∈ π’ž_t βˆ– int(𝒩_t) ]
= ( A(π’ž_t βˆ– int(𝒩_t)) / A(π’ž_t) ) Β· E[ ⟨c*, x_t βˆ’ x̂_t⟩ | ĉ_t ∈ π’ž_t βˆ– int(𝒩_t) ],
where A(Β·) denotes the arc length (= central angle).

Slide 99

Slide 99 text

Proof 99
If A(π’ž_t) < Ο€/2:
E[ ⟨c*, x_t βˆ’ x̂_t⟩ ] = ( A(π’ž_t βˆ– int(𝒩_t)) / A(π’ž_t) ) Β· E[ ⟨c*, x_t βˆ’ x̂_t⟩ | ĉ_t ∈ π’ž_t βˆ– int(𝒩_t) ]
≀ ( A(π’ž_t βˆ– int(𝒩_t)) / A(π’ž_t) ) Β· sin θ(c*, ĉ_t) ≀ ( A(π’ž_t βˆ– int(𝒩_t)) / A(π’ž_t) ) Β· sin A(π’ž_t) ≀ A(π’ž_t βˆ– int(𝒩_t)).
If A(π’ž_t) β‰₯ Ο€/2:
E[ ⟨c*, x_t βˆ’ x̂_t⟩ ] = ( A(π’ž_t βˆ– int(𝒩_t)) / A(π’ž_t) ) Β· E[ ⟨c*, x_t βˆ’ x̂_t⟩ | ĉ_t ∈ π’ž_t βˆ– int(𝒩_t) ]
≀ (2/Ο€) Β· 1 Β· A(π’ž_t βˆ– int(𝒩_t)) ≀ A(π’ž_t βˆ– int(𝒩_t)).
Hence E[ ⟨c*, x_t βˆ’ x̂_t⟩ ] ≀ A(π’ž_t βˆ– int(𝒩_t)) in any case. Since π’ž_{t+1} ← π’ž_t ∩ 𝒩_t,
E[ R_T^{c*} ] = Ξ£_{t=1}^T E[ ⟨c*, x_t βˆ’ x̂_t⟩ ] ≀ Ξ£_{t=1}^T A(π’ž_t βˆ– int(𝒩_t)) ≀ 2Ο€.

Slide 100

Slide 100 text

Conclusion 100
β€’ R_T^{c*} = O(n log T) by ONS.
β€’ RΜƒ_T^{c*} = O( n log T + √( n Ξ”_T log T ) ) by MetaGrad for the possibly suboptimal case.
β€’ R_T^{c*} = Ξ©(n).
β€’ R_T^{c*} = O(1) for n = 2.
Future work
β€’ Tight analysis for general n. (Difficulty: sin θ(c*, ĉ_t) ≀ sin A(π’ž_t) no longer holds.)
β€’ Exploring other online-learning ideas useful for inverse optimization.