Vaswani et al., Attention Is All You Need, NeurIPS 2017.
Accessible explanation (video, in Japanese): 「【深層学習】Transformer - Multi-Head Attentionを理解してやろうじゃないの【ディープラーニングの世界 vol.28】」
[Overview diagram: Encoder / Decoder]
BERT: Bidirectional Encoder Representations from Transformers
The fine-tuning paradigm became widespread.
Devlin et al., BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, NAACL 2019.
Natural language processing with BERT
“Only a small adjustment” is needed:
• High performance can be reached with relatively little training data (~100,000 examples → ~1,000)
• Training cost (dataset collection, training time) drops dramatically
Devlin et al., BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, NAACL 2019.
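As a concrete illustration of this pre-train-then-fine-tune paradigm, here is a minimal sketch of adapting a pretrained BERT checkpoint to a binary classification task. It assumes the Hugging Face transformers library and PyTorch, which are not part of the original slides; the model name, toy data, and hyperparameters are illustrative only.

```python
# Minimal sketch: reuse a pretrained BERT encoder and briefly fine-tune it
# on a small labeled dataset (assumes `transformers` and PyTorch are installed).
import torch
from torch.optim import AdamW
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # pretrained encoder + fresh task head
)

# Toy labeled examples standing in for the "~1,000 examples" regime.
texts = ["great movie", "terrible plot"]
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, return_tensors="pt")

optimizer = AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):                      # a few epochs usually suffice
    outputs = model(**batch, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

Because all weights start from the pretrained checkpoint, only a short training run on a comparatively small dataset is needed, which is the cost reduction claimed above.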
Prefix-Tuning / Prompt Tuning
In contrast to the “discrete” prompts used with LLMs, these methods optimize continuous prompt vectors.
[Diagram: Prompt Tuning / Prefix-Tuning]
Li et al., Prefix-Tuning: Optimizing Continuous Prompts for Generation, ACL-IJCNLP 2021.
Lester et al., The Power of Scale for Parameter-Efficient Prompt Tuning, EMNLP 2021.
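Below is a minimal sketch of the continuous-prompt idea in the spirit of Lester et al. (2021): the pretrained backbone is frozen and only a small matrix of prompt embeddings, prepended to the input word embeddings, is trained. The library choice (Hugging Face transformers, PyTorch), the BERT backbone, and the prompt length are assumptions for illustration, not details from the slides.

```python
# Minimal sketch of Prompt Tuning: learn continuous prompt vectors while
# keeping the pretrained model frozen (illustrative assumptions throughout).
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
backbone = AutoModel.from_pretrained("bert-base-uncased")
for p in backbone.parameters():
    p.requires_grad = False          # the pretrained weights stay frozen

num_prompt_tokens = 20
hidden = backbone.config.hidden_size
# The only trainable parameters: one continuous vector per prompt position.
soft_prompt = nn.Parameter(torch.randn(num_prompt_tokens, hidden) * 0.02)

def forward_with_prompt(texts):
    enc = tokenizer(texts, padding=True, return_tensors="pt")
    # Look up word embeddings only; BERT adds position embeddings internally.
    tok_emb = backbone.embeddings.word_embeddings(enc["input_ids"])  # (B, T, H)
    prompt = soft_prompt.unsqueeze(0).expand(tok_emb.size(0), -1, -1)
    inputs_embeds = torch.cat([prompt, tok_emb], dim=1)              # prepend
    prompt_mask = torch.ones(tok_emb.size(0), num_prompt_tokens, dtype=torch.long)
    mask = torch.cat([prompt_mask, enc["attention_mask"]], dim=1)
    return backbone(inputs_embeds=inputs_embeds, attention_mask=mask)

# Only `soft_prompt` (plus any task head) would be handed to the optimizer.
out = forward_with_prompt(["an example sentence"])
print(out.last_hidden_state.shape)   # (1, num_prompt_tokens + seq_len, 768)
```

Unlike a discrete text prompt, the prompt here lives directly in embedding space and is updated by gradient descent, which is what makes the approach parameter-efficient: only `num_prompt_tokens × hidden` values are learned per task.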