Slide 24
Theorem: The Failure of IRM in the Non-Linear Regime [7]
Suppose we observe $E$ environments $\mathcal{E} = \{e_1, \dots, e_E\}$, where $\sigma_e^2 = 1$ for all $e \in [1, E]$. Then, for any $\epsilon > 1$, there exists a featurizer $\Phi_\epsilon$ which, combined with the ERM-optimal classifier $\hat{\beta} = [\beta_c, \beta_{e;\mathrm{ERM}}, \beta_0]^\top$, satisfies the following:
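1. The regularization term of $(\Phi_\epsilon, \hat{\beta})$ is bounded as
$$\frac{1}{E} \sum_{e \in \mathcal{E}} \left\| \nabla_{\hat{\beta}} R^e(\Phi_\epsilon, \hat{\beta}) \right\|_2^2 \in O\left( p_\epsilon^2 \left( c_\epsilon d_e + \frac{1}{E} \sum_{e \in \mathcal{E}} \|\mu_e\|_2^2 \right) \right), \tag{13}$$
for some constant $c_\epsilon$, with $p_\epsilon := \exp\{-d_e \min(\epsilon - 1, (\epsilon - 1)^2/8)\}$.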
2. $(\Phi_\epsilon, \hat{\beta})$ is equivalent to the ERM-optimal predictor on at least a $1 - q$ fraction of the test distribution, where $q := \frac{2R}{\sqrt{\pi}\,\delta} \exp\{-\delta^2\}$.
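To get a feel for how quickly the two quantities in the theorem shrink, here is a minimal numerical sketch. It evaluates $p_\epsilon$ and $q$ exactly as they are written on this slide. The function names `p_eps` and `q_frac` are illustrative, and $R$ and $\delta$ are treated as free numeric inputs, since the slide does not define them. The denominator of $q$ is read as $\sqrt{\pi}\,\delta$, which is an assumption about the original typesetting.

```python
import math


def p_eps(eps: float, d_e: int) -> float:
    """p_eps = exp(-d_e * min(eps - 1, (eps - 1)^2 / 8)), as written on the slide."""
    if eps <= 1:
        raise ValueError("the theorem is stated for eps > 1")
    return math.exp(-d_e * min(eps - 1.0, (eps - 1.0) ** 2 / 8.0))


def q_frac(R: float, delta: float) -> float:
    """q = 2R / (sqrt(pi) * delta) * exp(-delta^2).

    Assumption: the denominator is sqrt(pi) * delta; R and delta are
    plain numeric inputs because the slide leaves them undefined.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    return 2.0 * R / (math.sqrt(math.pi) * delta) * math.exp(-delta ** 2)


if __name__ == "__main__":
    # The bound in (13) scales with p_eps^2, so print both p_eps and its square
    # as the environmental feature dimension d_e grows.
    for d_e in (10, 100, 1000):
        p = p_eps(2.0, d_e)
        print(f"d_e={d_e:5d}  p_eps={p:.3e}  p_eps^2={p * p:.3e}")

    # q shrinks rapidly in delta because of the exp(-delta^2) factor.
    for delta in (1.0, 2.0, 3.0):
        print(f"delta={delta:.1f}  q={q_frac(1.0, delta):.3e}")
```

Because $p_\epsilon$ decays exponentially in $d_e$, the penalty bound in (13) becomes vanishingly small in high dimensions even though the predictor behaves like the ERM solution on most of the test distribution.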