
Attention is all you need

sasanoshohuta
March 26, 2021


Transcript

  1. What is it? What makes it better than prior work? What is the key technique? Is there room for improvement? How was it validated? Which paper should be read next?

     What is it?
     ・The Transformer is a new, simple architecture model based solely on an attention mechanism, using no recurrence or convolutions at all.
     ・It is parallelizable, and the time required for training was greatly reduced.

     Key technique
     ・In the Transformer, both the encoder and the decoder build the overall architecture from stacked self-attention and point-wise, fully connected layers.
     ・"This makes it more difficult to learn dependencies between distant positions. In the Transformer this is reduced to a constant number of operations, albeit at the cost of reduced effective resolution due to averaging attention-weighted positions, an effect we counteract with Multi-Head Attention as described in section 3.2."

     How was it validated?
     ・Experiments on the WMT 2014 English-to-German translation task.
     ・Outperformed the previous best models (including ensembles) by more than 2.0 BLEU, establishing a new state-of-the-art BLEU score of 28.4.
     ・Achieved at a fraction of the training cost of competing models.

     Room for improvement
     ・The model has mainly been applied to text only; improvements are needed to handle large-scale inputs and outputs such as images, audio, and video efficiently.
     ・Removing the sequentiality of generation.

     Paper to read next
     ・Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.

     Attention is all you need (2017)
     Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., Polosukhin, I.
     https://arxiv.org/pdf/1706.03762.pdf
     2021/03/25
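The scaled dot-product attention and multi-head attention mentioned on the slide can be sketched in a few lines of NumPy. This is a minimal illustration only: the random projection matrices stand in for the learned parameters (W_Q, W_K, W_V, W_O in the paper), and the shapes are toy-sized.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)
    return softmax(scores) @ V

def multi_head_attention(X, num_heads=2, rng=np.random.default_rng(0)):
    # Hypothetical sketch: random weights stand in for the learned
    # projections; each head attends in its own d_head-dim subspace.
    seq_len, d_model = X.shape
    d_head = d_model // num_heads
    heads = []
    for _ in range(num_heads):
        W_q, W_k, W_v = (rng.standard_normal((d_model, d_head)) for _ in range(3))
        heads.append(scaled_dot_product_attention(X @ W_q, X @ W_k, X @ W_v))
    # Concatenate the heads and project back to d_model.
    W_o = rng.standard_normal((num_heads * d_head, d_model))
    return np.concatenate(heads, axis=-1) @ W_o

X = np.ones((4, 8))          # 4 tokens, d_model = 8
out = multi_head_attention(X)  # shape (4, 8)
```

Because every token attends to every other token in one matrix multiply, all positions are processed in parallel — the source of the training-speed advantage over recurrent models noted above.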