• Textual entailment (RTE) • Named entity recognition (NER) • Question answering (SQuAD QA) • Google [...] → [...] Pre-Training [...]

[Figure 3 diagram: four BERT fine-tuning setups. Inputs shown: Sentence 1 / Sentence 2, Single Sentence, Question / Paragraph. Outputs shown: Class Label, Start/End Span, per-token tags (B-PER, O, O, ...).]

Figure 3: Our task specific models are formed by incorporating BERT with one additional output layer, so a minimal number of parameters need to be learned from scratch. Among the tasks, (a) and (b) are sequence-level tasks while (c) and (d) are token-level tasks. In the figure, E represents the input embedding, Ti represents the contextual representation of token i, [CLS] is the special symbol for classification output, and [SEP] is the special symbol to separate non-consecutive token sequences.

QNLI Question Natural Language Inference is a version of the Stanford Question Answering Dataset (Rajpurkar et al., 2016) which has been converted to a binary classification task (Wang et al., 2018). The positive examples are (question, sentence) pairs which do contain the correct answer, and the negative examples are (question, sentence) pairs from the same paragraph which do not contain the answer.

SST-2 The Stanford Sentiment Treebank is a binary single-sentence classification task consisting of sentences extracted from movie reviews with human annotations of their sentiment (Socher et al., 2013).

CoLA The Corpus of Linguistic Acceptability is a binary single-sentence classification task, where the goal is to predict whether an English sentence is linguistically “acceptable” or not (Warstadt et al., 2018).
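
The QNLI construction described above (one question paired with every sentence of its paragraph, labeled by whether the sentence contains the answer) can be illustrated with a minimal sketch. The function name `make_qnli_pairs`, the toy sentences, and the case-insensitive substring test used to decide "contains the correct answer" are all assumptions made for illustration; this is not the procedure actually used to build the dataset.

```python
def make_qnli_pairs(question, sentences, answer):
    """Build (question, sentence, label) triples for one paragraph.

    label 1: the sentence contains the answer (positive example)
    label 0: the sentence does not contain it (negative example)
    """
    pairs = []
    for sentence in sentences:
        # Assumption: a case-insensitive substring match stands in for
        # "the sentence contains the correct answer".
        label = 1 if answer.lower() in sentence.lower() else 0
        pairs.append((question, sentence, label))
    return pairs


# Toy paragraph, already split into sentences (hypothetical data).
paragraph_sentences = [
    "BERT was introduced by researchers at Google.",
    "It is pre-trained on large text corpora.",
]
pairs = make_qnli_pairs("Who introduced BERT?", paragraph_sentences, "Google")
for question, sentence, label in pairs:
    print(label, "|", question, "|", sentence)
```

Each paragraph therefore yields one binary example per sentence, which is what turns the span-extraction task into sentence-pair classification.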

STS-B The Semantic Textual Similarity Benchmark is a collection of sentence pairs drawn from news headlines and other sources (Cer et al., 2017). They were annotated with a score from 1 to 5 denoting how similar the two sentences are in terms of semantic meaning.

MRPC Microsoft Research Paraphrase Corpus consists of sentence pairs automatically extracted from online news sources, with human annotations for whether the sentences in the pair are semantically equivalent (Dolan and Brockett, 2005).
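
The tasks above are either single-sentence (SST-2, CoLA) or sentence-pair (QNLI, STS-B, MRPC), and both kinds are fed to BERT as one token sequence built from [CLS] and [SEP]. The sketch below shows one way to pack such inputs. The helper name `pack_inputs`, the pre-tokenized toy inputs, and the trailing [SEP] after the second sentence are assumptions for illustration; real models use WordPiece tokens.

```python
def pack_inputs(tokens_a, tokens_b=None):
    """Pack one or two token lists into a single BERT-style input.

    Returns the token sequence and a parallel list of segment ids
    (0 for the first sentence, 1 for the second).
    """
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"]
    segment_ids = [0] * len(tokens)
    if tokens_b is not None:
        # Assumption: the second sentence is also closed with [SEP].
        tokens += tokens_b + ["[SEP]"]
        segment_ids += [1] * (len(tokens_b) + 1)
    return tokens, segment_ids


# Single-sentence task (e.g. SST-2 or CoLA).
single_tokens, single_segments = pack_inputs(["the", "movie", "was", "great"])

# Sentence-pair task (e.g. QNLI, STS-B, or MRPC).
pair_tokens, pair_segments = pack_inputs(["a", "b"], ["c"])

print(single_tokens, single_segments)
print(pair_tokens, pair_segments)
```

For the sequence-level tasks, the output layer reads only the representation at the [CLS] position (C in Figure 3), so the same packing serves every task listed here.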

[Figure: overview of Pre-training and Fine-tuning. Pre-training corpus: Wikipedia ([...] 1,800 [...]). Example pre-training input: [CLS] ... [MASK] ... [SEP] ... [MASK] ... [SEP], Label: IsNext. Remaining slide text is unrecoverable.]
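
The pre-training input shown above combines two objectives: some tokens are replaced by [MASK] and must be predicted, and the sentence pair carries an IsNext / NotNext label. A minimal sketch of building such an example follows. The function name `make_pretraining_example` and the toy sentences are hypothetical, and the masking is simplified: every selected token becomes [MASK], whereas BERT sometimes keeps the token or substitutes a random one.

```python
import random


def make_pretraining_example(sent_a, sent_b, is_next, mask_prob=0.15, rng=None):
    """Build one masked-LM + next-sentence example from two token lists."""
    rng = rng or random.Random(0)
    tokens = ["[CLS]"] + sent_a + ["[SEP]"] + sent_b + ["[SEP]"]
    masked = list(tokens)
    targets = {}  # position -> original token the model must predict
    for i, tok in enumerate(tokens):
        if tok in ("[CLS]", "[SEP]"):
            continue  # special symbols are never masked
        if rng.random() < mask_prob:
            targets[i] = tok
            masked[i] = "[MASK]"  # simplification: always use [MASK]
    label = "IsNext" if is_next else "NotNext"
    return tokens, masked, targets, label


# Toy sentences, already tokenized (hypothetical data).
tokens, masked, targets, label = make_pretraining_example(
    ["the", "man", "went", "to", "the", "store"],
    ["he", "bought", "a", "gallon", "of", "milk"],
    is_next=True,
)
print(masked)
print(targets)
print("Label:", label)
```

Passing a different `random.Random` seed changes which positions are masked, while the [CLS] and [SEP] positions and the label stay fixed.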