Contextual representations can be extracted with self-attention alone, without using an RNN ("Attention Is All You Need", the Transformer). https://arxiv.org/abs/1706.03762
(Figure 1: The Transformer - model architecture.)
3.1 Encoder and Decoder Stacks
Encoder: The encoder is composed of a stack of N = 6 identical layers. Each layer has two sub-layers. The first is a multi-head self-attention mechanism, and the second is a simple, position-wise fully connected feed-forward network. We employ a residual connection [11] around each of the two sub-layers, followed by layer normalization.
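As a concrete illustration of the excerpt above, here is a minimal PyTorch sketch of one encoder layer (a self-attention sub-layer and a position-wise feed-forward sub-layer, each wrapped in a residual connection and layer normalization), stacked N = 6 times. The sizes d_model=512, n_heads=8, d_ff=2048 are the base hyperparameters from the paper; the class and variable names are invented for this sketch, and positional encoding and padding masks are omitted.

import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    """One Transformer encoder layer: multi-head self-attention followed by a
    position-wise feed-forward network, each with residual + layer norm."""
    def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads,
                                               dropout=dropout, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                 nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.drop = nn.Dropout(dropout)

    def forward(self, x):
        # Self-attention sub-layer: every position attends to every position.
        attn_out, _ = self.self_attn(x, x, x)
        x = self.norm1(x + self.drop(attn_out))
        # Position-wise feed-forward sub-layer.
        x = self.norm2(x + self.drop(self.ffn(x)))
        return x

# Stack of N = 6 identical layers, as in the excerpt above.
encoder = nn.Sequential(*[EncoderLayer() for _ in range(6)])
x = torch.randn(2, 10, 512)   # (batch, sequence length, d_model)
print(encoder(x).shape)       # torch.Size([2, 10, 512])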
State-of-the-art performance on many benchmark tasks. https://twitter.com/_Ryobot/status/...
SWAG (The Situations With Adversarial Generations dataset)
A girl is going across a set of monkey bars. She
(i) jumps up across the monkey bars.
(ii) struggles onto the bars to grab her head.
(iii) gets to the end and stands on a wooden plank.
(iv) jumps up and does a back flip.
Reference: https://itmint.com/... (datasets of deep learning: BERT)
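One way to apply BERT to a SWAG-style multiple-choice question (a sketch, not necessarily the exact setup from the paper) is to encode each (context, ending) pair, score the [CLS] representation of each pair with a linear layer, and softmax over the four candidates. The example below uses the Hugging Face transformers API; the multiple-choice head loaded here is randomly initialized, so the probabilities are not meaningful until the model has been fine-tuned on SWAG.

import torch
from transformers import BertTokenizer, BertForMultipleChoice

tok = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMultipleChoice.from_pretrained("bert-base-uncased")

context = "A girl is going across a set of monkey bars. She"
endings = [
    "jumps up across the monkey bars.",
    "struggles onto the bars to grab her head.",
    "gets to the end and stands on a wooden plank.",
    "jumps up and does a back flip.",
]

# Each candidate ending is paired with the context; BERT sees
# "[CLS] context [SEP] ending [SEP]" for every choice.
enc = tok([context] * len(endings), endings,
          return_tensors="pt", padding=True, truncation=True)
inputs = {k: v.unsqueeze(0) for k, v in enc.items()}  # (1, num_choices, seq_len)

with torch.no_grad():
    logits = model(**inputs).logits                   # (1, num_choices)
print(logits.softmax(dim=-1))                         # probability of each ending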
BERT is trained by solving a fill-in-the-blank quiz and an adjacent-sentence quiz.
[CLS]: marker that opens the adjacent-sentence quiz, [MASK]: a replaced word, [SEP]: end of a sentence, ##: subword suffix
Input A: [CLS] the man went to [MASK] store [SEP]
Input B: he bought a gallon [MASK] milk [SEP]
Label: IsNext
Input A: [CLS] the man went to [MASK] store [SEP]
Input B: penguin [MASK] are flight ##less birds [SEP]
Label: NotNext
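A simplified sketch of how one such pre-training example might be constructed (it ignores WordPiece ## splitting and segment embeddings; the 80/10/10 masking split follows the BERT paper, while the function and variable names are made up for illustration):

import random

MASK_PROB = 0.15  # fraction of tokens turned into fill-in-the-blank targets

def make_pretraining_example(sent_a, sent_b, is_next, vocab):
    """Builds one example: [CLS] A [SEP] B [SEP] with some tokens masked
    (masked-LM quiz) and an IsNext/NotNext label (adjacent-sentence quiz)."""
    tokens = ["[CLS]"] + sent_a + ["[SEP]"] + sent_b + ["[SEP]"]
    mlm_labels = [None] * len(tokens)   # original token wherever a prediction is required
    for i, tok in enumerate(tokens):
        if tok in ("[CLS]", "[SEP]"):
            continue
        if random.random() < MASK_PROB:
            mlm_labels[i] = tok
            r = random.random()
            if r < 0.8:                 # 80%: replace with [MASK]
                tokens[i] = "[MASK]"
            elif r < 0.9:               # 10%: replace with a random word
                tokens[i] = random.choice(vocab)
            # remaining 10%: keep the original token unchanged
    return tokens, mlm_labels, "IsNext" if is_next else "NotNext"

tokens, mlm_labels, nsp_label = make_pretraining_example(
    "the man went to the store".split(),
    "he bought a gallon of milk".split(),
    is_next=True,
    vocab=["penguin", "flight", "store", "milk"],
)
print(tokens, nsp_label)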
A simple task-specific output layer is added on top of BERT.
(Figure 3 of the BERT paper: (a) sentence-pair classification and (b) single-sentence classification produce a Class Label from the [CLS] output C; (c) question answering takes a Question/Paragraph pair and predicts a Start/End Span; (d) single-sentence tagging predicts a label such as B-PER or O for every token.)
Figure 3: Our task specific models are formed by incorporating BERT with one additional output layer, so a minimal number of parameters need to be learned from scratch. Among the tasks, (a) and (b) are sequence-level tasks while (c) and (d) are token-level tasks. In the figure, E represents the input embedding, Ti represents the contextual representation of token i, [CLS] is the special symbol for classification output, and [SEP] is the special symbol to separate non-consecutive token sequences.
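A minimal sketch of a Figure 3 (b)-style classification head, assuming bert is any encoder module returning token-level hidden states of shape (batch, seq_len, hidden); the call signature and class name are assumptions for this sketch, not the authors' code. The only parameters learned from scratch are in the final linear layer.

import torch.nn as nn

class SentenceClassifier(nn.Module):
    """Single-sentence classification: a linear layer maps the [CLS] vector C
    to class-label logits; everything else is pre-trained BERT."""
    def __init__(self, bert, hidden_size=768, num_labels=2):
        super().__init__()
        self.bert = bert
        self.classifier = nn.Linear(hidden_size, num_labels)  # the new output layer

    def forward(self, input_ids, attention_mask=None):
        hidden = self.bert(input_ids, attention_mask)  # contextual vectors T_i, (batch, seq, hidden)
        cls_vec = hidden[:, 0]                         # C: representation of the [CLS] token
        return self.classifier(cls_vec)                # class-label logits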
QNLI (Question Natural Language Inference): the goal is to predict whether an English sentence contains the answer to a given question.
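For the question-answering case in Figure 3 (c), where the input is a Question/Paragraph pair and the output is a Start/End Span, the extra output layer can be as small as two score vectors applied to every token position. A hedged sketch under the same assumptions as above (token-level hidden states already computed; names are illustrative):

import torch
import torch.nn as nn

class SpanHead(nn.Module):
    """Question-answering head: scores every token T_i as a candidate
    start or end of the answer span within the paragraph."""
    def __init__(self, hidden_size=768):
        super().__init__()
        self.span = nn.Linear(hidden_size, 2)   # column 0 -> start logit, column 1 -> end logit

    def forward(self, token_hidden):            # (batch, seq_len, hidden)
        logits = self.span(token_hidden)        # (batch, seq_len, 2)
        start_logits, end_logits = logits.unbind(dim=-1)
        return start_logits, end_logits         # softmax over seq_len gives span boundaries

head = SpanHead()
hidden = torch.randn(1, 128, 768)               # placeholder for BERT outputs
start_logits, end_logits = head(hidden)
print(start_logits.shape, end_logits.shape)     # torch.Size([1, 128]) each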