
Importance Weighting in Machine Learning and its Applications

Masanari Kimura
November 13, 2023


1. Importance Weighting in Machine Learning and its Applications. Masanari Kimura, Graduate University for Advanced Studies (SOKENDAI), Department of Statistical Science, School of Multidisciplinary Sciences. [email protected]. November 13, 2023.

2. Outline: 1 Overview / 2 Distribution Shifts / 3 Domain Adaptation / 4 Active Learning / 5 Distributionally Robust Optimization / 6 Model Calibration / 7 Positive-Unlabelled (PU) Learning / 8 Label Noise Correction / 9 Density Ratio Estimation / 10 Importance Weighting and Deep Learning.

3. Outline (section divider: 1 Overview).

4. Overview. Importance weighting is the operation of weighting instances according to some measure of importance: e.g., for $S = \{x_i\}_{i=1}^n$,
$$f(S) := \sum_{x \in S} \phi(x) \;\Rightarrow\; f_w(S) := \sum_{x \in S} w(x)\,\phi(x).$$
It has a wide range of applications in machine learning. How $w(x)$ is defined or estimated is also important.

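As a concrete illustration, the sketch below implements the weighted statistic above in Python (the statistic `phi` and the weight `w` are illustrative choices, not from the deck):

```python
import numpy as np

def f(S, phi):
    """Unweighted statistic: sum of phi(x) over the sample S."""
    return sum(phi(x) for x in S)

def f_w(S, phi, w):
    """Importance-weighted statistic: each x contributes w(x) * phi(x)."""
    return sum(w(x) * phi(x) for x in S)

S = np.random.default_rng(0).standard_normal(1000)
phi = lambda x: x ** 2
print(f(S, phi) / len(S))                             # plain empirical mean of phi
print(f_w(S, phi, lambda x: 2.0 * (x > 0)) / len(S))  # upweight positive x
```
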
5. Outline (section divider: 2 Distribution Shifts).

6. Empirical Risk Minimization. Standard machine learning algorithms assume that training and test data follow the same probability distribution (the i.i.d. assumption). In particular, the validity of supervised learning under the i.i.d. assumption rests on the statistical properties of Empirical Risk Minimization (ERM): minimizing the empirical risk $\hat{R}$ leads to minimizing the expected risk $R$.
Example: unbiasedness of ERM. ERM is the procedure that minimizes a loss function $\ell : \mathcal{Y} \times \mathcal{Y} \to [0, \infty)$ over an empirically obtained dataset $D$, aiming to minimize the loss on unseen data:
$$\hat{h} = \arg\min_{h \in \mathcal{H}} \hat{R}(h) = \arg\min_{h \in \mathcal{H}} \frac{1}{|D|} \sum_{(x,y) \in D} \ell(h(x), y). \quad (1)$$
When the training distribution $p_{\mathrm{tr}}$ and the test distribution $p_{\mathrm{te}}$ can be assumed identical, ERM is unbiased:
$$\mathbb{E}_{p_{\mathrm{tr}}}[\hat{R}] = R. \quad (2)$$

7. Covariate Shift. The identical-distribution assumption made by ERM is often violated in real problems. The Covariate Shift Assumption states that the distribution of the covariates differs between training and test time [65].
Covariate Shift Assumption. For the training distribution $p_{\mathrm{tr}}$ and the test distribution $p_{\mathrm{te}}$:
$$p_{\mathrm{tr}}(x) \neq p_{\mathrm{te}}(x), \qquad p_{\mathrm{tr}}(y \mid x) = p_{\mathrm{te}}(y \mid x).$$
Under this assumption, the unbiasedness of ERM no longer holds.

8. Importance Weighted Empirical Risk Minimization. Weighting the loss by the covariate density ratio $p_{\mathrm{te}}(x)/p_{\mathrm{tr}}(x)$ restores the unbiasedness of ERM (IWERM [65]):
$$\begin{aligned}
\mathbb{E}_{p_{\mathrm{tr}}(x,y)}\!\left[\frac{p_{\mathrm{te}}(x)}{p_{\mathrm{tr}}(x)}\,\ell(h(x), y)\right]
&= \int_{\mathcal{X}\times\mathcal{Y}} \frac{p_{\mathrm{te}}(x)}{p_{\mathrm{tr}}(x)}\,\ell(h(x), y)\, p_{\mathrm{tr}}(x, y)\, dx\,dy \\
&= \int_{\mathcal{X}\times\mathcal{Y}} \frac{p_{\mathrm{te}}(x)}{p_{\mathrm{tr}}(x)}\,\ell(h(x), y)\, p_{\mathrm{tr}}(x)\,p_{\mathrm{tr}}(y \mid x)\, dx\,dy \\
&= \int_{\mathcal{X}\times\mathcal{Y}} p_{\mathrm{te}}(x)\,\ell(h(x), y)\, p_{\mathrm{tr}}(y \mid x)\, dx\,dy \\
&= \int_{\mathcal{X}\times\mathcal{Y}} p_{\mathrm{te}}(x)\,\ell(h(x), y)\, p_{\mathrm{te}}(y \mid x)\, dx\,dy \\
&= \int_{\mathcal{X}\times\mathcal{Y}} \ell(h(x), y)\, p_{\mathrm{te}}(x, y)\, dx\,dy
= \mathbb{E}_{p_{\mathrm{te}}(x,y)}[\ell(h(x), y)]. \quad (3)
\end{aligned}$$
(The fourth equality uses the covariate shift assumption $p_{\mathrm{tr}}(y \mid x) = p_{\mathrm{te}}(y \mid x)$.)

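A minimal sketch of IWERM under a synthetic covariate shift with known densities (the Gaussians, the linear model class, and the squared loss are assumptions of this sketch, not part of the derivation above):

```python
import numpy as np

rng = np.random.default_rng(0)

def gauss_pdf(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

# Covariate shift: p_tr(x) and p_te(x) differ, while p(y|x) is shared.
x_tr = rng.normal(0.0, 1.0, 500)
y_tr = np.sin(x_tr) + 0.1 * rng.standard_normal(500)

w = gauss_pdf(x_tr, 1.0, 0.5) / gauss_pdf(x_tr, 0.0, 1.0)  # p_te(x) / p_tr(x)

# IWERM with the squared loss = weighted least squares for h(x) = a*x + b:
A = np.stack([x_tr, np.ones_like(x_tr)], axis=1)
sw = np.sqrt(w)
coef, *_ = np.linalg.lstsq(A * sw[:, None], y_tr * sw, rcond=None)
print("IWERM fit (a, b):", coef)
```
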
9. Variants of IWERM. Several variants have been proposed to remedy the numerical instability of IWERM observed in experiments:
Adaptive Importance Weighted ERM (AIWERM [65]):
$$\hat{h} = \arg\min_{h \in \mathcal{H}} \frac{1}{|D|} \sum_{(x,y) \in D} w_A(x)\,\ell(h(x), y), \qquad w_A(x) = \left(\frac{p_{\mathrm{te}}(x)}{p_{\mathrm{tr}}(x)}\right)^{\lambda}. \quad (4)$$
Relative Importance Weighted ERM (RIWERM [86]):
$$\hat{h} = \arg\min_{h \in \mathcal{H}} \frac{1}{|D|} \sum_{(x,y) \in D} w_R(x)\,\ell(h(x), y), \qquad w_R(x) = \frac{p_{\mathrm{te}}(x)}{(1-\lambda)\,p_{\mathrm{tr}}(x) + \lambda\,p_{\mathrm{te}}(x)}. \quad (5)$$

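The two variant weights are easy to state in code; a small sketch (the density values below are arbitrary examples):

```python
import numpy as np

def w_aiwerm(p_te, p_tr, lam):
    """AIWERM weight of Eq. (4): the flattened ratio (p_te/p_tr)**lambda.
    lambda = 1 recovers the plain IWERM ratio; lambda = 0 gives uniform weights."""
    return (p_te / p_tr) ** lam

def w_riwerm(p_te, p_tr, lam):
    """RIWERM weight of Eq. (5): bounded above by 1/lambda for lambda > 0,
    which is the source of its numerical stability."""
    return p_te / ((1.0 - lam) * p_tr + lam * p_te)

p_tr, p_te = np.array([0.40, 0.05]), np.array([0.05, 0.40])
print(w_aiwerm(p_te, p_tr, 0.5))   # tempered ratio
print(w_riwerm(p_te, p_tr, 0.5))   # relative ratio, bounded by 1/0.5 = 2
```
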
10. An Information-Geometric Generalization of IWERM. The choice of the importance weight $w(x)$ in IWERM and its variants can be identified with the choice of an $\alpha$-geodesic on the statistical manifold formed by the data distributions [38]:
$$\hat{h} = \arg\min_{h \in \mathcal{H}} \sum_{(x,y) \in D} w_{(\lambda,\alpha)}(x)\,\ell(h(x), y), \qquad w_{(\lambda,\alpha)}(x) = \frac{m_f^{(\lambda,\alpha)}(p_{\mathrm{tr}}(x), p_{\mathrm{te}}(x))}{p_{\mathrm{tr}}(x)}, \quad (6)$$
where
$$m_f^{(\lambda,\alpha)}(a, b) = f_\alpha^{-1}\!\big((1-\lambda) f_\alpha(a) + \lambda f_\alpha(b)\big), \qquad f_\alpha(a) = \begin{cases} a^{\frac{1-\alpha}{2}} & (\alpha \neq 1) \\ \log a & (\alpha = 1). \end{cases} \quad (7)$$
IWERM is the case $\lambda = 1$, AIWERM the case $\alpha = 1$, and RIWERM the case $\alpha = 3$.

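A sketch of the generalized weight of Eqs. (6)-(7); the checks at the bottom illustrate how the named special cases fall out (the probability values are arbitrary):

```python
import numpy as np

def f_alpha(a, alpha):
    """The alpha-representation of Eq. (7)."""
    return np.log(a) if alpha == 1 else a ** ((1.0 - alpha) / 2.0)

def f_alpha_inv(u, alpha):
    return np.exp(u) if alpha == 1 else u ** (2.0 / (1.0 - alpha))

def w_general(p_tr, p_te, lam, alpha):
    """w_(lambda,alpha): interpolate p_tr and p_te along the alpha-geodesic,
    then divide by p_tr, as in Eq. (6)."""
    m = f_alpha_inv((1 - lam) * f_alpha(p_tr, alpha)
                    + lam * f_alpha(p_te, alpha), alpha)
    return m / p_tr

p_tr, p_te = np.array([0.4, 0.1]), np.array([0.1, 0.4])
print(w_general(p_tr, p_te, 1.0, 0.0), p_te / p_tr)           # lambda=1: IWERM
print(w_general(p_tr, p_te, 0.5, 1.0), (p_te / p_tr) ** 0.5)  # alpha=1: AIWERM
print(w_general(p_tr, p_te, 0.5, 3.0))                        # alpha=3: RIWERM-type
```
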
11. Other Topics on Importance Weighting under Covariate Shift.
- Importance Weighted Cross Validation (IWCV) [69] is a variant of cross validation for model selection under covariate shift.
- In the Distributionally Robust Optimization literature, weighting test data by the inverse density ratio $p_{\mathrm{tr}}(x)/p_{\mathrm{te}}(x)$ is widely used instead of weighting training data by $p_{\mathrm{te}}(x)/p_{\mathrm{tr}}(x)$.
- Double-Weighting Covariate Shift Adaptation [49] accounts for the trade-off between these two weighting schemes.
- Importance weighting has also been reported to be effective for conformal prediction under covariate shift [72].

12. Negative Results on Importance Weighting under Covariate Shift.
- IWERM and its various estimators all underestimate the expected risk [42].
- When the model is nonparametric and no model misspecification is assumed, importance weighting is unnecessary [26]; when model misspecification is assumed, however, it is necessary.

13. Target Shift. Target shift assumes that the distribution of the target variable differs between training and test time [92].
Target Shift Assumption. For the training distribution $p_{\mathrm{tr}}$ and the test distribution $p_{\mathrm{te}}$:
$$p_{\mathrm{tr}}(y) \neq p_{\mathrm{te}}(y), \qquad p_{\mathrm{tr}}(x \mid y) = p_{\mathrm{te}}(x \mid y).$$
As under the covariate shift assumption, importance weighting is effective. Estimating $p(y)$ with the EM algorithm is inefficient, since it entails estimating $p(x \mid y)$ [15]. When the target variable is continuous, importance weighting via density ratio estimation in a semi-supervised setting is effective [53].

14. Black Box Shift Estimation. Black Box Shift Estimation (BBSE) [45] estimates the importance weights $w$ using a black-box predictor. Using the confusion matrix $C$ of a black-box predictor $f$ and the mean output $b$ of $f$ on test data, solve
$$C w = b. \quad (8)$$

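A minimal sketch of the BBSE estimate of Eq. (8); the toy label arrays are arbitrary, and in practice $C$ comes from held-out source data while $b$ comes from unlabeled target data:

```python
import numpy as np

def bbse_weights(y_true_src, y_pred_src, y_pred_te, n_classes):
    """Solve C w = b: C is the joint confusion matrix of the black-box
    predictor f on source data, b the distribution of f's predictions on
    target data; w[j] then estimates p_te(y=j) / p_tr(y=j)."""
    C = np.zeros((n_classes, n_classes))
    for yt, yp in zip(y_true_src, y_pred_src):
        C[yp, yt] += 1.0                   # C[i, j] ~ P(f(x) = i, y = j)
    C /= len(y_true_src)
    b = np.bincount(y_pred_te, minlength=n_classes) / len(y_pred_te)
    return np.linalg.solve(C, b)

w = bbse_weights(np.array([0, 0, 1, 1]), np.array([0, 1, 1, 1]),
                 np.array([1, 1, 1, 0]), n_classes=2)
print(w)
```
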
15. Sample Selection Bias. Sample selection bias is often formulated under the same assumption as covariate shift [75, 81, 78, 6]. One widely accepted approach introduces a random variable $s$ that models the selection mechanism ($s = 1$ means the instance is selected) and constructs the test distribution as
$$p_{\mathrm{te}}(x, y) = p(x, y) = \sum_{s} p(x, y, s). \quad (9)$$
Under this assumption, the following importance weight is known to be effective [91, 76]:
$$w(x) = \frac{P(s = 1)}{P(s = 1 \mid x)}. \quad (10)$$

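A sketch of this weighting; estimating the propensity $P(s = 1 \mid x)$ with a probabilistic classifier is an assumption of the sketch, not prescribed by [91, 76]:

```python
import numpy as np

def selection_bias_weights(s, propensity):
    """Eq. (10): w(x) = P(s=1) / P(s=1|x). Instances that the selection
    mechanism rarely picks receive large weights, undoing the bias on average."""
    return s.mean() / propensity

s = np.array([1, 1, 0, 1, 0])                      # selection indicators
propensity = np.array([0.9, 0.8, 0.3, 0.7, 0.2])   # estimated P(s=1|x)
print(selection_bias_weights(s, propensity))
```
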
16. Subpopulation Shift. Subpopulation shift assumes that the distribution changes not for single instances but for subsets (subpopulations) of the data [64, 87]. As with other distribution-shift adaptations, importance weighting by subpopulation frequency is effective [13, 18, 46]. More recently, UMIX [30] introduced importance weights for subpopulation shift into the weighted averaging of instances performed in mixup data augmentation.

17. Feedback Shift. In advertising, predicting which clicks lead to purchases is important. In real settings, however, a relatively long time can elapse between a click and the purchase (feedback shift, also called delayed feedback) [16, 90, 43, 63, 71]. Its harmful effect is that instances that would eventually receive a positive label (a purchase following the click) are labeled negative because the feedback arrives late. Feedback Shift Importance Weighting (FSIW) [88] addresses this problem with importance weights based on the probability that feedback is delayed.

18. Outline (section divider: 3 Domain Adaptation).

19. Domain Adaptation. The goal of domain adaptation is to use data from a given source domain to learn a predictor that performs well on a target domain [57, 4, 79, 24, 17, 80]. There is a large body of work on domain adaptation via importance weighting:
- In an adversarial approach, the generator learns to assign instance weights that fool the discriminator [50].
- Importance weighting that accounts for the different sample sizes of the source and target domains [83].
- Generalization error analysis of importance-weighting-based domain adaptation [1].
- Importance-weighted domain adaptation has also been used in NLP tasks [36], though failures due to NLP-specific issues such as word frequency have been reported [59].
- These negative results have been attributed to existing work focusing only on sample selection bias while being unable to handle sample selection variance [82].

20. Other Problem Settings in Domain Adaptation. Domain adaptation can be subdivided according to how the data are given and under what conditions:
- Multi-source domain adaptation: multiple source domains are available.
- Partial domain adaptation: the target domain has fewer classes than the source domain.
- Open-set domain adaptation: both domains contain unknown classes.
- Universal domain adaptation: no prior knowledge of the label sets is assumed.

21. Outline (section divider: 4 Active Learning).

22. Importance Weighted Active Learning. Importance Weighted Active Learning (IWAL) [9] is one of the best-known active learning methods based on importance weighting. IWAL labels an unlabeled instance $x_t$ with probability $p_t$, chosen from its features and the history of labeled data so far, and then trains on $x_t$ with weight $1/p_t$. The probability $p_t$ is set using the set $H_t$ of models trained on the data available at time $t$:
$$p_t = \max_{f,g \in H_{t+1}} \max_{y} \sigma\big(\ell(f(x_t), y) - \ell(g(x_t), y)\big), \quad (11)$$
$$H_{t+1} = \{h \in H_t : L_t(h) \leq L_t^* + \Delta_t\}. \quad (12)$$
IWAL is shown to be consistent under this importance weighting.

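A schematic of the query-and-reweight loop; the rule for $p_t$ below is a stand-in for the hypothesis-class disagreement of Eq. (11), and the labeling oracle is synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)
oracle = lambda x: int(x > 0)           # synthetic labeling oracle

def query_prob(x):
    """Stand-in for Eq. (11): query more often near the decision boundary.
    IWAL proper uses the worst-case loss disagreement over H_{t+1}."""
    return float(np.clip(1.0 - abs(x), 0.05, 1.0))

labeled = []                            # (x_t, y_t, importance weight 1/p_t)
for x_t in rng.standard_normal(200):
    p_t = query_prob(x_t)
    if rng.random() < p_t:              # query the label with probability p_t
        labeled.append((x_t, oracle(x_t), 1.0 / p_t))

xs, ys, ws = map(np.array, zip(*labeled))
# Importance-weighted risk of the constant classifier h(x) = 1:
print("IW risk:", np.average(ys != 1, weights=ws))
```

The $1/p_t$ weights keep the weighted empirical risk an unbiased estimate of the risk on the full stream, even though labels were queried selectively.
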
23. Further Discussion of IWAL. Beygelzimer et al. obtained a practical implementation of IWAL by setting the rejection threshold appropriately [10]. Active learning also involves the notion of sample reusability [73, 74]: whether a dataset collected by active learning with one learner is useful for other learners as well. Empirical verification has reported that IWAL does not have sample reusability [77].

24. Active Learning and Model Misspecification. It has been pointed out that existing active learning methods are vulnerable to large model misspecification, with the suggestion that importance weighting can mitigate the effect [68]. The asymptotic properties of active learning for generalized linear models under model misspecification have been analyzed, leading to the proposal that effective instance selection depends on importance weighting [2].

25. Active Learning by Learning. Active Learning by Learning (ALBL) [31] combines multiple instance-selection strategies within a multi-armed bandit framework. ALBL extends IWAL by introducing a reward function called Importance Weighted Accuracy and reports good experimental results.

26. Performance Evaluation in Active Learning. One challenge in active learning is evaluating model performance during data collection, which suffers from sampling bias among other issues. Naive importance-weighted cross validation has been reported not to work well in active learning [41]; combining importance weighting with class-balanced sampling [93] reportedly gives good evaluations. There are also active testing methods based on loss-proportional sampling [40, 25].

27. Outline (section divider: 5 Distributionally Robust Optimization).

28. Distributionally Robust Optimization. Distributionally Robust Optimization (DRO) [60, 27, 20, 8, 44, 19] seeks to improve worst-case performance over an uncertainty set $U(p_0)$ around a distribution $p_0$:
$$\operatorname*{minimize}_{h \in \mathcal{H}} \; R(h; p_0) := \sup_{q \in U(p_0)} \mathbb{E}_{(x,y) \sim q(x,y)}\big[\ell(h(x), y)\big]. \quad (13)$$
There is a large body of work on constructing the uncertainty set [5, 7, 11, 22]. DRO can also be viewed as a worst-case evaluation against unknown distribution shifts.

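To make the worst-case objective concrete, here is a sketch for one common choice of uncertainty set, a KL ball, in its dual (log-sum-exp) form; the constant term involving the ball radius is omitted. Note that the adversarial distribution tilts the weights toward high-loss instances, so DRO is itself a form of importance weighting:

```python
import numpy as np

def kl_dro_objective(losses, tau):
    """Dual form (up to a constant) of sup over q with bounded KL(q || p)
    of E_q[loss]: tau * log E_p[exp(loss / tau)]. Small tau is more
    pessimistic (approaches the max loss); large tau approaches the mean."""
    return tau * np.log(np.mean(np.exp(losses / tau)))

def adversarial_weights(losses, tau):
    """The worst-case q reweights instances as q_i proportional to exp(l_i/tau)."""
    q = np.exp(losses / tau)
    return q / q.sum()

losses = np.array([0.1, 0.2, 5.0])
for tau in (10.0, 1.0, 0.1):
    print(tau, kl_dro_objective(losses, tau), adversarial_weights(losses, tau))
```
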
29. Outline (section divider: 6 Model Calibration).

30. Model Calibration. The softmax output probabilities of machine learning models are often treated as the model's confidence, but such outputs are not necessarily well-calibrated. Model calibration is the task of encouraging the model to produce outputs that reflect the actual probabilities of the events in question.

31. An Importance-Weighting Interpretation of Focal Loss. One of the most widely used tools for model calibration is the focal loss [51]. The focal loss approaches calibration by shrinking the weights of instances that are easy to classify. This procedure can be interpreted as weighting each instance by $w(x_i) = (1 - p_i)^{\gamma}$, a weight that depends on the model's predicted probability $p_i$.

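A sketch of the binary focal loss that makes the weight explicit (the probabilities below are arbitrary):

```python
import numpy as np

def focal_loss(p, y, gamma=2.0):
    """Binary focal loss: cross-entropy scaled by w = (1 - p_t)**gamma, where
    p_t is the predicted probability of the true class. Easy instances
    (p_t near 1) get weights near 0."""
    p_t = np.where(y == 1, p, 1.0 - p)
    return -((1.0 - p_t) ** gamma) * np.log(p_t)

p = np.array([0.95, 0.55])               # an easy and a hard positive
print(focal_loss(p, np.array([1, 1])))   # the easy one is heavily down-weighted
```
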
32. Outline (section divider: 7 Positive-Unlabelled (PU) Learning).

33. Positive-Unlabelled (PU) Learning. Positive-Unlabelled (PU) learning is the task of training a binary classifier from positively labeled samples and unlabeled samples alone [3, 39]. In PU learning, no negatively labeled instances are provided. This is useful, for example, in settings where the negative class is hard to define.

34. Lemma (Elkan and Noto [21]). Let $x$ be an example and let $y \in \{0, 1\}$ be a binary label. Let $s = 1$ if the example $x$ is labeled, and let $s = 0$ if $x$ is unlabeled. Then, under the assumption that labeled examples are selected completely at random from the positives, we have
$$p(y = 1 \mid x) = \frac{p(s = 1 \mid x)}{p(s = 1 \mid y = 1)}. \quad (15)$$
Proof. By assumption, $p(s = 1 \mid y = 1, x) = p(s = 1 \mid y = 1)$. Moreover, since only positive examples can be labeled ($s = 1$ implies $y = 1$),
$$p(s = 1 \mid x) = p(y = 1 \wedge s = 1 \mid x) = p(y = 1 \mid x)\,p(s = 1 \mid y = 1, x) = p(y = 1 \mid x)\,p(s = 1 \mid y = 1). \quad (16)$$
Dividing both sides by $p(s = 1 \mid y = 1)$ yields the lemma.

35. For unlabeled data,
$$p(y = 1 \mid x, s = 0) = \frac{p(s = 0 \mid x, y = 1)\,p(y = 1 \mid x)}{p(s = 0 \mid x)} = \frac{\big(1 - p(s = 1 \mid x, y = 1)\big)\,p(y = 1 \mid x)}{1 - p(s = 1 \mid x)} = \frac{(1 - c)\,p(y = 1 \mid x)}{1 - p(s = 1 \mid x)} = \frac{(1 - c)\,p(s = 1 \mid x)/c}{1 - p(s = 1 \mid x)} = \frac{1 - c}{c}\cdot\frac{p(s = 1 \mid x)}{1 - p(s = 1 \mid x)}, \quad (17)$$
where $c = p(s = 1 \mid y = 1)$. Therefore,
$$\mathbb{E}_{p(x,y,s)}[h(x, y)] = \int_{\mathcal{X}\times\mathcal{Y}\times\mathcal{S}} h(x, y)\,p(x, y, s)\,dx\,dy\,ds = \int_{\mathcal{X}} p(x)\Big(p(s = 1 \mid x)\,h(x, 1) + p(s = 0 \mid x)\big(p(y = 1 \mid x, s = 0)\,h(x, 1) + p(y = 0 \mid x, s = 0)\,h(x, 0)\big)\Big)\,dx.$$

36. The plug-in estimator of $\mathbb{E}_{p(x,y,s)}[h(x, y)]$ is then
$$\frac{1}{n_{\mathrm{tr}}}\left(\sum_{(x,\,s=1)} h(x, 1) + \sum_{(x,\,s=0)} \big(w(x)\,h(x, 1) + (1 - w(x))\,h(x, 0)\big)\right), \quad (18)$$
where
$$w(x) = p(y = 1 \mid x, s = 0) = \frac{1 - c}{c}\cdot\frac{p(s = 1 \mid x)}{1 - p(s = 1 \mid x)}. \quad (19)$$
PU learning can thus be seen as importance weighting based on the probability that an unlabeled instance would receive a label.

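A sketch of the resulting estimator; in practice `p_s1_given_x` would come from a classifier trained to predict labeledness, and the toy arrays here are arbitrary:

```python
import numpy as np

def pu_weight(p_s1_given_x, c):
    """Eq. (19): w(x) = ((1-c)/c) * P(s=1|x) / (1 - P(s=1|x)),
    with c = P(s=1 | y=1) the label frequency."""
    return (1.0 - c) / c * p_s1_given_x / (1.0 - p_s1_given_x)

def pu_plugin_risk(h_pos, h_unl_pos, h_unl_neg, w):
    """Eq. (18): labeled examples count as positives with weight 1; each
    unlabeled example counts as a positive with weight w(x) and as a
    negative with weight 1 - w(x)."""
    n_tr = len(h_pos) + len(h_unl_pos)
    return (h_pos.sum() + (w * h_unl_pos).sum()
            + ((1.0 - w) * h_unl_neg).sum()) / n_tr

c = 0.6
w = pu_weight(np.array([0.5, 0.1]), c)            # two unlabeled points
risk = pu_plugin_risk(np.array([0.2, 0.3]),       # h(x, 1) on labeled data
                      np.array([0.4, 0.9]),       # h(x, 1) on unlabeled data
                      np.array([0.7, 0.1]), w)    # h(x, 0) on unlabeled data
print(w, risk)
```
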
37. Outline (section divider: 8 Label Noise Correction).

38. Label Noise Correction. Label noise correction [55, 56, 66] is the task of detecting and correcting inaccurate labels in a labeled training dataset. The best-known approach weights the loss function using a matrix corresponding to the label-noise probabilities [58]; this amounts to importance-weighted learning that shrinks the weights of instances with a high probability of label noise. Noise Attention Learning [47] models the label-noise probability with an attention mechanism and performs weighted training with the resulting noise probabilities.

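A sketch of the backward loss correction behind [58] (the noise matrix and loss values here are hypothetical): the vector of per-class losses is reweighted by the inverse noise-transition matrix, which makes the corrected risk an unbiased estimate of the clean risk:

```python
import numpy as np

def backward_corrected_loss(loss_per_class, y_noisy, T):
    """Backward correction: with T[i, j] = P(noisy label j | clean label i),
    reweight the observed per-class losses by T^{-1} and index by the
    noisy label. Averaging over the noise distribution recovers the clean loss."""
    corrected = np.linalg.inv(T) @ loss_per_class
    return corrected[y_noisy]

T = np.array([[0.9, 0.1],
              [0.2, 0.8]])                 # hypothetical noise rates
loss_per_class = np.array([0.1, 2.0])      # loss if the label were 0 / 1
print(backward_corrected_loss(loss_per_class, y_noisy=1, T=T))
```
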
39. Outline (section divider: 9 Density Ratio Estimation).

40. Density Ratio Estimation by Moment Matching. The basic idea of moment matching is to match the weighted distribution $\hat{p}_{\mathrm{te}}(x) = \hat{r}(x)\,p_{\mathrm{tr}}(x)$ to the test distribution $p_{\mathrm{te}}(x)$. Matching the means of the two distributions is common:
$$\int x\,\hat{r}(x)\,p_{\mathrm{tr}}(x)\,dx = \int x\,p_{\mathrm{te}}(x)\,dx. \quad (20)$$
However, matching a finite number of moments is known not to yield the true density ratio, even asymptotically.

41. Kernel Mean Matching (KMM). Kernel Mean Matching [32, 28] performs moment matching in a reproducing kernel Hilbert space $\mathcal{H}$:
$$\min_{\hat{r} \in \mathcal{H}} \left\| \int K(x, \cdot)\,\hat{r}(x)\,p_{\mathrm{tr}}(x)\,dx - \int K(x, \cdot)\,p_{\mathrm{te}}(x)\,dx \right\|_{\mathcal{H}}^2. \quad (21)$$

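A simplified KMM sketch: the full method in [32, 28] solves a quadratic program with box and normalization constraints on the weights; here only the unconstrained, regularized linear system is shown, and the kernel bandwidth and data are arbitrary:

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kmm_weights(X_tr, X_te, sigma=1.0, reg=1e-2):
    """Match kernel mean embeddings: minimize (1/2) b'Kb - kappa'b, whose
    unconstrained minimizer solves (K + reg*I) b = kappa, with
    kappa_i = (n_tr / n_te) * sum_j k(x_i_tr, x_j_te)."""
    n_tr = len(X_tr)
    K = gaussian_kernel(X_tr, X_tr, sigma)
    kappa = n_tr * gaussian_kernel(X_tr, X_te, sigma).mean(axis=1)
    return np.linalg.solve(K + reg * np.eye(n_tr), kappa)

rng = np.random.default_rng(0)
X_tr = rng.normal(0.0, 1.0, size=(200, 1))
X_te = rng.normal(0.8, 1.0, size=(100, 1))
beta = kmm_weights(X_tr, X_te)
# With the test set shifted upward, weights grow with x:
print(beta[X_tr[:, 0].argmax()], beta[X_tr[:, 0].argmin()])
```
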
42. KLIEP and LSIF. KLIEP [54, 70] estimates the density ratio by minimizing the KL divergence between $p_{\mathrm{te}}(x)$ and $\hat{p}_{\mathrm{te}}(x) = \hat{r}(x)\,p_{\mathrm{tr}}(x)$:
$$\min_{\hat{r}} D_{\mathrm{KL}}\big[p_{\mathrm{te}}(x)\,\|\,\hat{p}_{\mathrm{te}}\big] = \min_{\hat{r}} \int p_{\mathrm{te}}(x)\,\log\frac{p_{\mathrm{te}}(x)}{\hat{r}(x)\,p_{\mathrm{tr}}(x)}\,dx.$$
Similarly, Least-Squares Importance Fitting (LSIF) [37] minimizes a squared loss:
$$\min_{\hat{r}} \int \big(\hat{r}(x) - r(x)\big)^2\,p_{\mathrm{tr}}(x)\,dx.$$

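A sketch of the unconstrained least-squares variant (uLSIF) of [37], which admits a closed form; the basis centers, bandwidth, and regularization below are assumptions of the sketch:

```python
import numpy as np

def ulsif(X_tr, X_te, centers, sigma=1.0, lam=1e-3):
    """Model r(x) = sum_l alpha_l k(x, c_l) and minimize the empirical
    squared error of LSIF, which gives alpha = (H + lam I)^{-1} h with
    H from training samples and h from test samples."""
    k = lambda X: np.exp(
        -((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1) / (2 * sigma ** 2))
    Phi_tr, Phi_te = k(X_tr), k(X_te)
    H = Phi_tr.T @ Phi_tr / len(X_tr)
    h = Phi_te.mean(axis=0)
    alpha = np.linalg.solve(H + lam * np.eye(len(centers)), h)
    return lambda X: k(X) @ alpha           # estimated density ratio r_hat(x)

rng = np.random.default_rng(0)
X_tr = rng.normal(0.0, 1.0, size=(500, 1))
X_te = rng.normal(0.5, 1.0, size=(500, 1))
r_hat = ulsif(X_tr, X_te, centers=X_te[:50])
print(r_hat(np.array([[0.5], [-2.0]])))     # larger ratio near the test mode
```
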
43. Telescoping Density Ratio Estimation (TRE). When the two distributions are far apart, the performance of density ratio estimation degrades severely. Telescoping Density Ratio Estimation (TRE) [62] proposes generating intermediate datasets between the two distributions and moving gradually from the source distribution $p_0$ to the target distribution $q = p_m$:
$$\frac{p_0(x)}{p_m(x)} = \frac{p_0(x)}{p_1(x)}\cdot\frac{p_1(x)}{p_2(x)}\cdots\frac{p_{m-2}(x)}{p_{m-1}(x)}\cdot\frac{p_{m-1}(x)}{p_m(x)}. \quad (22)$$
Later work [89] established several statistical properties of TRE.

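The telescoping identity of Eq. (22) is easy to verify numerically; a sketch with Gaussian bridge distributions (in TRE proper each bridge ratio is estimated with a classifier rather than computed from known densities):

```python
import numpy as np

def gauss_logpdf(x, mu, sd=1.0):
    return -0.5 * ((x - mu) / sd) ** 2 - np.log(sd * np.sqrt(2 * np.pi))

mus = np.linspace(0.0, 4.0, 5)          # bridge distributions p_0, ..., p_4
x = 1.0

# Sum of the easy neighboring log-ratios log p_k(x) - log p_{k+1}(x) ...
telescoped = sum(gauss_logpdf(x, a) - gauss_logpdf(x, b)
                 for a, b in zip(mus[:-1], mus[1:]))
# ... telescopes to the hard end-to-end log-ratio log p_0(x) - log p_m(x):
direct = gauss_logpdf(x, mus[0]) - gauss_logpdf(x, mus[-1])
print(telescoped, direct)               # identical up to floating point
```
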
44. Outline (section divider: 10 Importance Weighting and Deep Learning).

45. Importance Weighting and Deep Learning. The behavior of importance-weighted ERM with over-parametrized neural networks was long unexplained. Recent work reports that the effect of importance weighting in deep learning decays as training iterations proceed [12], and shows experimentally that the phenomenon can be mitigated by L2 regularization and batch normalization. The phenomenon is also suggested to be related to the implicit bias of gradient methods [67, 35, 34, 52]. Follow-up work has provided theoretical support for these experimental findings [84].

46. Approximating Importance Weighting with Deep Learning. There is also much work on learning the importance-weighting function $w(x)$ implicitly with neural networks:
- Learning importance weights via meta-learning [61].
- The observation that learning the weighting function with a neural network induces a bias [23], with alternating optimization of the classifier and the weighting function proposed as a remedy.
- Reports that importance weighting can improve robustness to adversarial attacks [14, 85, 33, 29], together with proposals for obtaining such weights through adversarial training.

47. Importance Tempering. Importance Tempering [48] has been proposed as an alternative to importance weighting. It aims to improve the decision boundaries of over-parametrized neural networks, and achieves this by introducing an instance-dependent temperature parameter, playing the role of an importance weight, into the softmax function.

48. References I
[1] Kamyar Azizzadenesheli, Anqi Liu, Fanny Yang, and Animashree Anandkumar. Regularized learning for domain adaptation under label shifts. arXiv preprint arXiv:1903.09734, 2019.
[2] Francis Bach. Active learning for misspecified generalized linear models. Advances in Neural Information Processing Systems, 19, 2006.
[3] Jessa Bekker and Jesse Davis. Learning from positive and unlabeled data: A survey. Machine Learning, 109:719–760, 2020.
[4] Shai Ben-David, John Blitzer, Koby Crammer, and Fernando Pereira. Analysis of representations for domain adaptation. Advances in Neural Information Processing Systems, 19, 2006.

49. References II
[5] Aharon Ben-Tal, Dick Den Hertog, Anja De Waegenaere, Bertrand Melenberg, and Gijs Rennen. Robust solutions of optimization problems affected by uncertain probabilities. Management Science, 59(2):341–357, 2013.
[6] Richard A. Berk. An introduction to sample selection bias in sociological data. American Sociological Review, pages 386–398, 1983.
[7] Dimitris Bertsimas, Vishal Gupta, and Nathan Kallus. Data-driven robust optimization. Mathematical Programming, 167:235–292, 2018.
[8] Dimitris Bertsimas, Melvyn Sim, and Meilin Zhang. Adaptive distributionally robust optimization. Management Science, 65(2):604–618, 2019.

50. References III
[9] Alina Beygelzimer, Sanjoy Dasgupta, and John Langford. Importance weighted active learning. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 49–56, 2009.
[10] Alina Beygelzimer, Daniel Hsu, Nikos Karampatziakis, John Langford, and Tong Zhang. Efficient active learning. In ICML 2011 Workshop on On-line Trading of Exploration and Exploitation, 2011.
[11] Jose Blanchet, Yang Kang, and Karthyek Murthy. Robust Wasserstein profile inference and applications to machine learning. Journal of Applied Probability, 56(3):830–857, 2019.
[12] Jonathon Byrd and Zachary Lipton. What is the effect of importance weighting in deep learning? In International Conference on Machine Learning, pages 872–881. PMLR, 2019.

51. References IV
[13] Zhangjie Cao, Kaichao You, Mingsheng Long, Jianmin Wang, and Qiang Yang. Learning to transfer examples for partial domain adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2985–2994, 2019.
[14] Anirban Chakraborty, Manaar Alam, Vishal Dey, Anupam Chattopadhyay, and Debdeep Mukhopadhyay. A survey on adversarial attacks and defences. CAAI Transactions on Intelligence Technology, 6(1):25–45, 2021.
[15] Yee Seng Chan and Hwee Tou Ng. Word sense disambiguation with distribution estimation. In IJCAI, volume 5, pages 1010–1015, 2005.

52. References V
[16] Olivier Chapelle. Modeling delayed feedback in display advertising. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1097–1105, 2014.
[17] Gabriela Csurka. Domain adaptation for visual applications: A comprehensive survey. arXiv preprint arXiv:1702.05374, 2017.
[18] Yin Cui, Menglin Jia, Tsung-Yi Lin, Yang Song, and Serge Belongie. Class-balanced loss based on effective number of samples. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9268–9277, 2019.

53. References VI
[19] Erick Delage and Yinyu Ye. Distributionally robust optimization under moment uncertainty with application to data-driven problems. Operations Research, 58(3):595–612, 2010.
[20] John Duchi and Hongseok Namkoong. Learning models with uniform performance via distributionally robust optimization. arXiv preprint arXiv:1810.08750, 2018.
[21] Charles Elkan and Keith Noto. Learning classifiers from only positive and unlabeled data. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 213–220, 2008.

54. References VII
[22] Peyman Mohajerin Esfahani and Daniel Kuhn. Data-driven distributionally robust optimization using the Wasserstein metric: Performance guarantees and tractable reformulations. arXiv preprint arXiv:1505.05116, 2015.
[23] Tongtong Fang, Nan Lu, Gang Niu, and Masashi Sugiyama. Rethinking importance weighting for deep learning under distribution shift. Advances in Neural Information Processing Systems, 33:11996–12007, 2020.
[24] Abolfazl Farahani, Sahar Voghoei, Khaled Rasheed, and Hamid R. Arabnia. A brief review of domain adaptation. Advances in Data Science and Information Engineering: Proceedings from ICDATA 2020 and IKE 2020, pages 877–894, 2021.

55. References VIII
[25] Sebastian Farquhar, Yarin Gal, and Tom Rainforth. On statistical bias in active learning: How and when to fix it. arXiv preprint arXiv:2101.11665, 2021.
[26] Davit Gogolashvili, Matteo Zecchin, Motonobu Kanagawa, Marios Kountouris, and Maurizio Filippone. When is importance weighting correction needed for covariate shift adaptation? arXiv preprint arXiv:2303.04020, 2023.
[27] Joel Goh and Melvyn Sim. Distributionally robust optimization and its tractable approximations. Operations Research, 58(4-part-1):902–917, 2010.

56. References IX
[28] Arthur Gretton, Alex Smola, Jiayuan Huang, Marcel Schmittfull, Karsten Borgwardt, and Bernhard Schölkopf. Covariate shift by kernel mean matching. Dataset Shift in Machine Learning, 3(4):5, 2009.
[29] Chuan Guo, Jacob Gardner, Yurong You, Andrew Gordon Wilson, and Kilian Weinberger. Simple black-box adversarial attacks. In International Conference on Machine Learning, pages 2484–2493. PMLR, 2019.
[30] Zongbo Han, Zhipeng Liang, Fan Yang, Liu Liu, Lanqing Li, Yatao Bian, Peilin Zhao, Bingzhe Wu, Changqing Zhang, and Jianhua Yao. UMIX: Improving importance weighting for subpopulation shift via uncertainty-aware mixup. Advances in Neural Information Processing Systems, 35:37704–37718, 2022.

57. References X
[31] Wei-Ning Hsu and Hsuan-Tien Lin. Active learning by learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 29, 2015.
[32] Jiayuan Huang, Arthur Gretton, Karsten Borgwardt, Bernhard Schölkopf, and Alex Smola. Correcting sample selection bias by unlabeled data. Advances in Neural Information Processing Systems, 19, 2006.
[33] Sandy Huang, Nicolas Papernot, Ian Goodfellow, Yan Duan, and Pieter Abbeel. Adversarial attacks on neural network policies. arXiv preprint arXiv:1702.02284, 2017.
[34] Ziwei Ji and Matus Telgarsky. Gradient descent aligns the layers of deep linear networks. arXiv preprint arXiv:1810.02032, 2018.

58. References XI
[35] Ziwei Ji and Matus Telgarsky. Risk and parameter convergence of logistic regression. arXiv preprint arXiv:1803.07300, 2018.
[36] Jing Jiang and ChengXiang Zhai. Instance weighting for domain adaptation in NLP. In ACL, 2007.
[37] Takafumi Kanamori, Shohei Hido, and Masashi Sugiyama. A least-squares approach to direct importance estimation. The Journal of Machine Learning Research, 10:1391–1445, 2009.
[38] Masanari Kimura and Hideitsu Hino. Information geometrically generalized covariate shift adaptation. Neural Computation, 34(9):1944–1977, 2022.

59. References XII
[39] Ryuichi Kiryo, Gang Niu, Marthinus C. du Plessis, and Masashi Sugiyama. Positive-unlabeled learning with non-negative risk estimator. Advances in Neural Information Processing Systems, 30, 2017.
[40] Jannik Kossen, Sebastian Farquhar, Yarin Gal, and Tom Rainforth. Active testing: Sample-efficient model evaluation. In International Conference on Machine Learning, pages 5753–5763. PMLR, 2021.
[41] Daniel Kottke, Jim Schellinger, Denis Huseljic, and Bernhard Sick. Limitations of assessing active learning performance at runtime. arXiv preprint arXiv:1901.10338, 2019.
[42] Wouter M. Kouw and Marco Loog. On regularization parameter estimation under covariate shift. In 2016 23rd International Conference on Pattern Recognition (ICPR), pages 426–431. IEEE, 2016.

60. References XIII
[43] Sofia Ira Ktena, Alykhan Tejani, Lucas Theis, Pranay Kumar Myana, Deepak Dilipkumar, Ferenc Huszár, Steven Yoo, and Wenzhe Shi. Addressing delayed feedback for continuous training with neural networks in CTR prediction. In Proceedings of the 13th ACM Conference on Recommender Systems, pages 187–195, 2019.
[44] Daniel Levy, Yair Carmon, John C. Duchi, and Aaron Sidford. Large-scale methods for distributionally robust optimization. Advances in Neural Information Processing Systems, 33:8847–8860, 2020.
[45] Zachary Lipton, Yu-Xiang Wang, and Alexander Smola. Detecting and correcting for label shift with black box predictors. In International Conference on Machine Learning, pages 3122–3130. PMLR, 2018.

61. References XIV
[46] Wei Liu and Sanjay Chawla. Class confidence weighted kNN algorithms for imbalanced data sets. In Advances in Knowledge Discovery and Data Mining: 15th Pacific-Asia Conference, PAKDD 2011, Shenzhen, China, May 24-27, 2011, Proceedings, Part II, pages 345–356. Springer, 2011.
[47] Yangdi Lu, Yang Bo, and Wenbo He. Noise attention learning: Enhancing noise robustness by gradient scaling. Advances in Neural Information Processing Systems, 35:23164–23177, 2022.
[48] Yiping Lu, Wenlong Ji, Zachary Izzo, and Lexing Ying. Importance tempering: Group robustness for overparameterized models. arXiv preprint arXiv:2209.08745, 2022.

62. References XV
[49] Jose Ignacio Segovia Martin, Santiago Mazuelas, and Anqi Liu. Double-weighting for covariate shift adaptation. In International Conference on Machine Learning, pages 30439–30457. PMLR, 2023.
[50] Nima Mashayekhi. An Adversarial Approach to Importance Weighting for Domain Adaptation. PhD thesis, 2022.
[51] Jishnu Mukhoti, Viveka Kulharia, Amartya Sanyal, Stuart Golodetz, Philip Torr, and Puneet Dokania. Calibrating deep neural networks using focal loss. Advances in Neural Information Processing Systems, 33:15288–15299, 2020.

63. References XVI
[52] Mor Shpigel Nacson, Suriya Gunasekar, Jason Lee, Nathan Srebro, and Daniel Soudry. Lexicographic and depth-sensitive margins in homogeneous and non-homogeneous deep models. In International Conference on Machine Learning, pages 4683–4692. PMLR, 2019.
[53] Tuan Duong Nguyen, Marthinus Christoffel, and Masashi Sugiyama. Continuous target shift adaptation in supervised learning. In Asian Conference on Machine Learning, pages 285–300. PMLR, 2016.
[54] XuanLong Nguyen, Martin J. Wainwright, and Michael I. Jordan. Estimating divergence functionals and the likelihood ratio by convex risk minimization. IEEE Transactions on Information Theory, 56(11):5847–5861, 2010.
[55] Bryce Nicholson, Victor S. Sheng, and Jing Zhang. Label noise correction and application in crowdsourcing. Expert Systems with Applications, 66:149–162, 2016.

64. References XVII
[56] Bryce Nicholson, Jing Zhang, Victor S. Sheng, and Zhiheng Wang. Label noise correction methods. In 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pages 1–9. IEEE, 2015.
[57] Vishal M. Patel, Raghuraman Gopalan, Ruonan Li, and Rama Chellappa. Visual domain adaptation: A survey of recent advances. IEEE Signal Processing Magazine, 32(3):53–69, 2015.
[58] Giorgio Patrini, Alessandro Rozza, Aditya Krishna Menon, Richard Nock, and Lizhen Qu. Making deep neural networks robust to label noise: A loss correction approach. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1944–1952, 2017.

65. References XVIII
[59] Barbara Plank, Anders Johannsen, and Anders Søgaard. Importance weighting and unsupervised domain adaptation of POS taggers: A negative result. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 968–973, 2014.
[60] Hamed Rahimian and Sanjay Mehrotra. Distributionally robust optimization: A review. arXiv preprint arXiv:1908.05659, 2019.
[61] Mengye Ren, Wenyuan Zeng, Bin Yang, and Raquel Urtasun. Learning to reweight examples for robust deep learning. In International Conference on Machine Learning, pages 4334–4343. PMLR, 2018.

66. References XIX
[62] Benjamin Rhodes, Kai Xu, and Michael U. Gutmann. Telescoping density-ratio estimation. Advances in Neural Information Processing Systems, 33:4905–4916, 2020.
[63] Abdollah Safari, Rachel MacKay Altman, and Thomas M. Loughin. Display advertising: Estimating conversion probability efficiently. arXiv preprint arXiv:1710.08583, 2017.
[64] Shibani Santurkar, Dimitris Tsipras, and Aleksander Madry. BREEDS: Benchmarks for subpopulation shift. arXiv preprint arXiv:2008.04859, 2020.
[65] Hidetoshi Shimodaira. Improving predictive inference under covariate shift by weighting the log-likelihood function. Journal of Statistical Planning and Inference, 90(2):227–244, 2000.

67. References XX
[66] Hwanjun Song, Minseok Kim, Dongmin Park, Yooju Shin, and Jae-Gil Lee. Learning from noisy labels with deep neural networks: A survey. IEEE Transactions on Neural Networks and Learning Systems, 2022.
[67] Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, and Nathan Srebro. The implicit bias of gradient descent on separable data. The Journal of Machine Learning Research, 19(1):2822–2878, 2018.
[68] Masashi Sugiyama. Active learning for misspecified models. Advances in Neural Information Processing Systems, 18, 2005.
[69] Masashi Sugiyama, Matthias Krauledat, and Klaus-Robert Müller. Covariate shift adaptation by importance weighted cross validation. Journal of Machine Learning Research, 8(5), 2007.

68. References XXI
[70] Masashi Sugiyama, Shinichi Nakajima, Hisashi Kashima, Paul Buenau, and Motoaki Kawanabe. Direct importance estimation with model selection and its application to covariate shift adaptation. Advances in Neural Information Processing Systems, 20, 2007.
[71] Marcelo Tallis and Pranjul Yadav. Reacting to variations in product demand: An application for conversion rate (CR) prediction in sponsored search. In 2018 IEEE International Conference on Big Data (Big Data), pages 1856–1864. IEEE, 2018.
[72] Ryan J. Tibshirani, Rina Foygel Barber, Emmanuel Candes, and Aaditya Ramdas. Conformal prediction under covariate shift. Advances in Neural Information Processing Systems, 32, 2019.

69. References XXII
[73] Katrin Tomanek. Resource-aware annotation through active learning. 2010.
[74] Katrin Tomanek and Katherina Morik. Inspecting sample reusability for active learning. In Active Learning and Experimental Design Workshop in Conjunction with AISTATS 2010, pages 169–181. JMLR Workshop and Conference Proceedings, 2011.
[75] Van-Tinh Tran. Selection bias correction in supervised learning with importance weight. PhD thesis, Université de Lyon, 2017.

70. References XXIII
[76] Van-Tinh Tran and Alex Aussem. Correcting a class of complete selection bias with external data based on importance weight estimation. In International Conference on Neural Information Processing, pages 111–118. Springer, 2015.
[77] Gijs van Tulder. Sample reusability in importance-weighted active learning. 2012.
[78] Francis Vella. Estimating models with sample selection bias: A survey. Journal of Human Resources, pages 127–169, 1998.

71. References XXIV
[79] Mei Wang and Weihong Deng. Deep visual domain adaptation: A survey. Neurocomputing, 312:135–153, 2018.
[80] Garrett Wilson and Diane J. Cook. A survey of unsupervised deep domain adaptation. ACM Transactions on Intelligent Systems and Technology (TIST), 11(5):1–46, 2020.
[81] Christopher Winship and Robert D. Mare. Models for sample selection bias. Annual Review of Sociology, 18(1):327–350, 1992.

72. References XXV
[82] Rui Xia, Zhenchun Pan, and Feng Xu. Instance weighting for domain adaptation via trading off sample selection bias and variance. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, Stockholm, Sweden, pages 13–19, 2018.
[83] Ni Xiao and Lei Zhang. Dynamic weighted learning for unsupervised domain adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15242–15251, 2021.
[84] Da Xu, Yuting Ye, and Chuanwei Ruan. Understanding the role of importance weighting for deep learning. arXiv preprint arXiv:2103.15209, 2021.

73. References XXVI
[85] Han Xu, Yao Ma, Hao-Chen Liu, Debayan Deb, Hui Liu, Ji-Liang Tang, and Anil K. Jain. Adversarial attacks and defenses in images, graphs and text: A review. International Journal of Automation and Computing, 17:151–178, 2020.
[86] Makoto Yamada, Taiji Suzuki, Takafumi Kanamori, Hirotaka Hachiya, and Masashi Sugiyama. Relative density-ratio estimation for robust distribution comparison. Neural Computation, 25(5):1324–1370, 2013.
[87] Yuzhe Yang, Haoran Zhang, Dina Katabi, and Marzyeh Ghassemi. Change is hard: A closer look at subpopulation shift. arXiv preprint arXiv:2302.12254, 2023.
[88] Shota Yasui, Gota Morishita, Komei Fujita, and Masashi Shibata. A feedback shift correction in predicting conversion rates under delayed feedback. In Proceedings of The Web Conference 2020, pages 2740–2746, 2020.

74. References XXVII
[89] Jiayang Yin. On the Improvement of Density Ratio Estimation via Probabilistic Classifier: Theoretical Study and Its Applications. PhD thesis, The University of British Columbia (Vancouver), 2023.
[90] Yuya Yoshikawa and Yusaku Imai. A nonparametric delayed feedback model for conversion rate prediction. arXiv preprint arXiv:1802.00255, 2018.
[91] Bianca Zadrozny. Learning and evaluating classifiers under sample selection bias. In Proceedings of the Twenty-First International Conference on Machine Learning, page 114, 2004.

75. References XXVIII
[92] Kun Zhang, Bernhard Schölkopf, Krikamol Muandet, and Zhikun Wang. Domain adaptation under target and conditional shift. In International Conference on Machine Learning, pages 819–827. PMLR, 2013.
[93] Eric Zhao, Anqi Liu, Animashree Anandkumar, and Yisong Yue. Active learning under label shift. In International Conference on Artificial Intelligence and Statistics, pages 3412–3420. PMLR, 2021.