Forward process Backward process どちらも(ほぼ)ミニマックス最適 [Yang & Barron, 1999; Niles-Weed & Berthet, 2022]. 経験スコアマッチング推定量: (for any 𝛿 > 0). 定理 Let 𝑌 be the r.v. generated by the backward process w.r.t. Ƹ 𝑠, then (Estimator for 𝑊1 distance requires some modification) (𝑠: 密度関数の滑らかさ) [Kazusato Oko, Shunta Akiyama, Taiji Suzuki: Diffusion Models are Minimax Optimal Distribution Estimators. ICML2023, oral] (2% of all submissions)
such as LLMs. 29 Fine-tuning data: • For each prompt 𝑐 ∼ 𝑝(𝑐), generate 𝑦1 , 𝑦2 ∼ 𝑝SFT 𝑦 𝑐 (independently). • Get preference 𝑦𝑤 ≻ 𝑦𝑙 between 𝑦1 , 𝑦2 . (human feedback) 1. 2. (Bradley-Terry model) (computation of normalization constant is not required) [Rafailov et al. 2024]
We observe pretraining task data 𝑇 times. ➢ Each task has 𝑛 data. Test task (In-context learning): ⋯ Predict • The true functions 𝐹𝑡 ∘ are different across different tasks. • 𝐹𝑡 ∘ is generated randomly for each task. Model: 𝑡 = 1, … , 𝑇: Task index
the emergence of non-Bayesian in-context learning for regression. 2023 ] If # of pretraining tasks is enough, ICL coincides with optimal ridge regression.