
Attention is all you need

sasanoshohuta
March 26, 2021

Transcript

  1. What is it? What is impressive compared to prior work? What is the key to the technique and method? Is there room for improvement? How was its effectiveness verified? Which paper should be read next?
     ・Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
     ・Attention had mainly been applied to text only; this work improves it so that large-scale inputs and outputs such as images, audio, and video can be processed efficiently.
     ・Make generation less sequential.
     ・Evaluated on the WMT 2014 English-to-German translation task.
     ・Outperforms the previous best models (including ensembles) by more than 2.0 BLEU, establishing a new state-of-the-art BLEU score of 28.4.
     ・Training cost is a fraction of that of competing models.
     ・In the Transformer, both the encoder and the decoder build the overall architecture from stacked self-attention and point-wise, fully connected layers (see the feed-forward sketch below).
     ・"This makes it more difficult to learn dependencies between distant positions. In the Transformer this is reduced to a constant number of operations, albeit at the cost of reduced effective resolution due to averaging attention-weighted positions, an effect we counteract with Multi-Head Attention as described in section 3.2." (see the attention sketch below)
     ・The Transformer is a new, simple architecture model based entirely on attention mechanisms, using no recurrence or convolution at all.
     ・It can be parallelized, and the time required for training is greatly reduced.
     Attention is all you need. https://arxiv.org/pdf/1706.03762.pdf (2017) Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., Polosukhin, I. 2021/03/25
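
The Multi-Head Attention mentioned in the quoted passage is scaled dot-product attention run in several lower-dimensional subspaces in parallel. The following is a minimal NumPy sketch of that idea, not code from this talk or from the paper's authors; the function names, weight shapes, and batched head layout are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V (Eq. 1 in the paper)."""
    d_k = Q.shape[-1]
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)   # (batch, len_q, len_k)
    return softmax(scores, axis=-1) @ V                # (batch, len_q, d_v)

def multi_head_attention(X, W_q, W_k, W_v, W_o, num_heads=8):
    """Self-attention over X with num_heads heads.

    Assumed shapes (illustrative, not from the slides):
    X is (batch, seq, d_model); W_q, W_k, W_v, W_o are (d_model, d_model).
    Each head attends in a d_model // num_heads dimensional subspace.
    """
    batch, seq, d_model = X.shape
    d_head = d_model // num_heads

    def split_heads(T):
        # (batch, seq, d_model) -> (batch * heads, seq, d_head)
        return (T.reshape(batch, seq, num_heads, d_head)
                 .transpose(0, 2, 1, 3)
                 .reshape(batch * num_heads, seq, d_head))

    Q, K, V = split_heads(X @ W_q), split_heads(X @ W_k), split_heads(X @ W_v)
    heads = scaled_dot_product_attention(Q, K, V)      # (batch*heads, seq, d_head)
    concat = (heads.reshape(batch, num_heads, seq, d_head)
                   .transpose(0, 2, 1, 3)
                   .reshape(batch, seq, d_model))
    return concat @ W_o                                # (batch, seq, d_model)

# Toy usage: 2 sentences, 5 tokens each, d_model = 64, 8 heads.
rng = np.random.default_rng(0)
X = rng.normal(size=(2, 5, 64))
W_q, W_k, W_v, W_o = (0.1 * rng.normal(size=(64, 64)) for _ in range(4))
print(multi_head_attention(X, W_q, W_k, W_v, W_o).shape)  # (2, 5, 64)
```

Because every token attends to every other token directly, the path between any two positions is a constant number of operations, which is the property the quoted passage contrasts with recurrent and convolutional models.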
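
The bullet about stacked self-attention and point-wise, fully connected layers refers to the position-wise feed-forward network that the paper places after each attention sub-layer. A minimal sketch under the same assumptions as above, with weights passed in explicitly (the paper uses d_model = 512 and an inner dimension d_ff = 2048):

```python
def position_wise_ffn(X, W1, b1, W2, b2):
    """FFN(x) = max(0, x W1 + b1) W2 + b2, applied identically at every position.

    Assumed shapes: X (batch, seq, d_model), W1 (d_model, d_ff), b1 (d_ff,),
    W2 (d_ff, d_model), b2 (d_model,).
    """
    return np.maximum(0.0, X @ W1 + b1) @ W2 + b2
```

In the paper each sub-layer (attention or feed-forward) is additionally wrapped in a residual connection followed by layer normalization; that wiring is omitted here to keep the sketch short.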