Slide 84
Best-of-N (BoN) Sampling
Because the method is so simple, it has attracted a good deal of theoretical analysis:
• Beirami+. "Theoretical guarantees on the best-of-n alignment policy." In ICML, 2024.
• Yang+. "Asymptotics of language model alignment." In ISIT, 2024.
• Gui+. "BoNBoN alignment for large language models and the sweetness of best-of-n sampling." In NeurIPS, 2024.
• Huang+. "Is Best-of-N the Best of Them? Coverage, Scaling, and Optimality in Inference-Time Alignment." In ICML, 2025.
BoN is a method with good properties that are backed by theory.
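For concreteness, the procedure can be sketched as follows: draw N candidates from a base policy and keep the one a reward function scores highest. This is only a minimal illustration. The names `best_of_n`, `toy_sample`, and `toy_reward` are hypothetical stand-ins (a random choice from a fixed list, and response length as "reward"), not anything defined on the slide or in the cited papers.

```python
import random
from typing import Callable, List


def best_of_n(
    prompt: str,
    sample: Callable[[str], str],
    reward: Callable[[str, str], float],
    n: int,
) -> str:
    """Draw n candidates from the base policy; return the highest-reward one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates: List[str] = [sample(prompt) for _ in range(n)]
    # max() returns the first maximal element, so ties go to the earliest draw.
    return max(candidates, key=lambda y: reward(prompt, y))


# --- Toy stand-ins (assumptions for illustration only) ---
rng = random.Random(0)
VOCAB = ["ok", "good", "great answer", "excellent detailed answer"]


def toy_sample(prompt: str) -> str:
    # Hypothetical base policy: uniform choice over a fixed set of responses.
    return rng.choice(VOCAB)


def toy_reward(prompt: str, response: str) -> float:
    # Hypothetical reward model: longer responses score higher.
    return float(len(response))


print(best_of_n("example prompt", toy_sample, toy_reward, n=8))
```

In practice `sample` would wrap a language model and `reward` a learned reward model. Larger `n` raises the expected reward of the returned output at the cost of `n` generations per prompt.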
Q: In what sense is the output obtained by BoN "good"?
A: Under certain conditions, BoN is equivalent to (KL-regularized) reinforcement learning.