Slide 14
Extending and Improving Existing Alignment Methods
[Figure: SACPO (stepwise alignment). Inputs: data on helpfulness and data on safety. Each stage is a maximum-likelihood step (e.g., DPO, KTO). Policies shown: reference LM policy, reward-aligned LM policy, final LM policy.]
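A minimal sketch of the stepwise idea in the figure, assuming DPO is the maximum-likelihood step and that the helpfulness stage runs before the safety stage (the stage order is an assumption here, not something the slide states). The function name `dpo_loss`, the value of `beta`, and the toy log-probabilities are illustrative only and are not taken from the cited papers.

```python
import math


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair, given sequence log-probabilities."""
    # Implicit reward margin between the chosen and rejected responses,
    # measured relative to the reference policy.
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)), written in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Stage 1 (assumed): a helpfulness pair, scored against the reference LM policy.
stage1 = dpo_loss(-12.0, -15.0, ref_logp_chosen=-13.0, ref_logp_rejected=-14.0)

# Stage 2 (assumed): a safety pair, where the reward-aligned LM policy from
# stage 1 now plays the role of the reference. The policy has not moved yet,
# so the margin is zero.
stage2 = dpo_loss(-10.0, -11.0, ref_logp_chosen=-10.0, ref_logp_rejected=-11.0)
```

The only thing that changes between the two stages in this sketch is the preference data and which policy is treated as the reference.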
Wachi et al. "Stepwise Alignment for Constrained Language Model Policy Optimization." In NeurIPS (2024).
Huang et al. "One-Shot Safety Alignment for Large Language Models via Optimal Dualization." In NeurIPS (2024).
Yang et al. "MetaAligner: Towards Generalizable Multi-Objective Alignment of Language Models." In NeurIPS (2024).
Shi et al. "Decoding-Time Language Model Alignment with Multiple Objectives." In NeurIPS (2024).
• Solve a safety-constrained problem → Wachi+, Huang+
• Solve a multi-objective optimization problem → Kailai Yang+, Ruizhe Shi+
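The two problem classes named in the bullets can be written down roughly as follows. This is a hedged sketch in generic notation, not the exact formulation of any cited paper: the symbols $r$ (helpfulness reward), $g$ (safety function), $b$ (safety threshold), $\beta$ (KL coefficient), and $w_i$ (objective weights) are assumptions introduced here, and the weighted sum is only one common way to scalarize multiple objectives.

```latex
% Safety-constrained alignment (assumed generic form)
\max_{\pi}\;
  \mathbb{E}_{x,\, y \sim \pi(\cdot \mid x)}\bigl[ r(x, y) \bigr]
  - \beta\, \mathrm{KL}\bigl( \pi \,\|\, \pi_{\mathrm{ref}} \bigr)
\quad \text{s.t.} \quad
  \mathbb{E}_{x,\, y \sim \pi(\cdot \mid x)}\bigl[ g(x, y) \bigr] \ge b

% Multi-objective alignment via linear scalarization (assumed generic form)
\max_{\pi}\;
  \sum_{i=1}^{m} w_i\, \mathbb{E}_{x,\, y \sim \pi(\cdot \mid x)}\bigl[ r_i(x, y) \bigr]
  - \beta\, \mathrm{KL}\bigl( \pi \,\|\, \pi_{\mathrm{ref}} \bigr),
\qquad w_i \ge 0,\; \sum_{i=1}^{m} w_i = 1
```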
Image borrowed from Yang+
Image borrowed from Wachi+ (our paper)