[Table: GLUE tasks with their evaluation metrics. Legible cells: the label set "contradiction, entailment, neutral" (three rows) and notes referring to the training data and genre.]
Tokenization example

text 1
token  two   kids  are   playing  in    a     swimming  pool  with  a     green  colored  crocodile  float  .
id     2048  4268  2024  2652     1999  1037  5742      4770  2007  1037  2665   6910     21843      14257  1012

text 2
position_id  0     1     2     3     4     5      6         7          8       9     10    11    12
token        two   kids  push  an    in    ##fl   ##atable  crocodile  around  in    a     pool  .
id           2048  4268  5245  2019  1999  10258  27892     21843      2105    1999  1037  4770  1012
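The split of "inflatable" into "in ##fl ##atable" in the token table above can be reproduced with a greedy longest-match-first search over the vocabulary. The following is a minimal sketch, not the library's tokenizer: the vocabulary contains only the twelve entries visible on the slide, and the helper names `wordpiece` and `encode` are hypothetical.

```python
# Toy vocabulary: only the entries the slide shows for sentence 2.
# A real BERT vocabulary has tens of thousands of entries.
VOCAB = {
    "two": 2048, "kids": 4268, "push": 5245, "an": 2019, "in": 1999,
    "##fl": 10258, "##atable": 27892, "crocodile": 21843,
    "around": 2105, "a": 1037, "pool": 4770, ".": 1012,
}


def wordpiece(word, vocab, unk="[UNK]"):
    """Split one lowercase word greedily, longest matching piece first."""
    pieces, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry "##"
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no piece fits: the whole word becomes unknown
        pieces.append(piece)
        start = end
    return pieces


def encode(text, vocab):
    """Lowercase, separate the full stop, split on spaces, then apply wordpiece."""
    tokens = []
    for word in text.lower().replace(".", " .").split():
        tokens.extend(wordpiece(word, vocab))
    return tokens, [vocab.get(token) for token in tokens]


tokens, ids = encode("Two kids push an inflatable crocodile around in a pool.", VOCAB)
print(tokens)
print(ids)
```

With this toy vocabulary the output matches the token and id rows for text 2. The real tokenizer also handles punctuation, accents and special tokens, which this sketch leaves out.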
[Figure: pre-training with masked tokens. The input "[CLS] two kids are [MASK] … [SEP]" is fed to the model; tokens shown at the selected positions include "playing", "kids", "two" and "dog". The outputs pred1, pred2, pred3 and pred4 are each scored with a cross entropy loss.]
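The per-prediction cross entropy loss in the figure above can be sketched in a few lines. This is an illustration under stated assumptions rather than the training code: the function name `masked_lm_loss`, the four-word vocabulary and the logits are made up, and the loss is averaged over the scored positions only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def masked_lm_loss(logits_per_position, target_ids, scored_positions):
    """Mean cross entropy, computed only at the positions chosen for prediction."""
    losses = []
    for pos in scored_positions:
        probs = softmax(logits_per_position[pos])
        losses.append(-math.log(probs[target_ids[pos]]))
    return sum(losses) / len(losses)


# Hypothetical example: 5 positions, vocabulary of 4 words, positions 1 and 3 scored.
logits = [
    [0.0, 0.0, 0.0, 0.0],
    [2.0, 0.5, 0.1, 0.1],
    [0.0, 0.0, 0.0, 0.0],
    [0.1, 0.2, 3.0, 0.3],
    [0.0, 0.0, 0.0, 0.0],
]
targets = [0, 0, 0, 2, 0]
print(masked_lm_loss(logits, targets, scored_positions=[1, 3]))
```

Positions that were not selected contribute nothing to the loss, which is why only a few outputs in the figure carry a loss box.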
import tensorflow as tf
from transformers import BertTokenizer, TFBertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

text1 = "[CLS] Two kids are playing in a swimming pool with a green colored crocodile float. [SEP]"
text2 = "Two kids push an inflatable crocodile around in a pool. [SEP]"
tokenized_text = tokenizer.tokenize(text1 + " " + text2)
print(tokenized_text)

indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
# Index one past the first [SEP], i.e. the end of the 1st sentence
pos_sep = tokenized_text.index("[SEP]") + 1
# Segment 0 for the 1st sentence, segment 1 for the 2nd
segments_ids = [0] * pos_sep + [1] * (len(indexed_tokens) - pos_sep)
tokens_tensor = tf.constant([indexed_tokens])
segments_tensors = tf.constant([segments_ids])

model = TFBertForSequenceClassification.from_pretrained('bert-base-uncased')
model.summary()  # prints the summary itself
outputs = model(tokens_tensor, token_type_ids=segments_tensors)
print(outputs)
TFBertEmbeddings

Encoder blocks: stacked TFBertLayer, each containing TFBertAttention (TFBertSelfAttention, TFBertSelfOutput), TFBertIntermediate and TFBertOutput.

Inputs: input_ids (n_seq,), position_ids (n_seq,), token_type_ids (n_seq,), inputs_embeds (n_seq, dim)
Lookups: Weight [word_embeddings] (vocab_size, dim) via gather; Embedding [position_embeddings]; Embedding [token_type_embeddings] (type_vocab_size, dim). Each lookup yields (n_seq, dim).
Output: sum of the three (n_seq, dim) -> LayerNormalization -> Dropout -> hidden_states (n_seq, dim)

Example input: [CLS] two kids are playing in a swimming pool … [SEP] [PAD] … [PAD], with hidden_size = 768 and max_position_embeddings = 512.
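The embedding step above (three lookups, a sum, then layer normalization) can be sketched as follows. This is a minimal illustration, not the library implementation: the table sizes are toy values instead of hidden_size = 768 and max_position_embeddings = 512, the tables are random, and the learned scale and bias of the normalization as well as Dropout are left out. The names `embed`, `layer_norm` and `table` are hypothetical.

```python
import math
import random


def layer_norm(vec, eps=1e-12):
    """Normalize one vector to zero mean and unit variance (no learned scale/bias)."""
    mean = sum(vec) / len(vec)
    var = sum((x - mean) ** 2 for x in vec) / len(vec)
    return [(x - mean) / math.sqrt(var + eps) for x in vec]


def embed(input_ids, token_type_ids, word_emb, pos_emb, type_emb):
    """(n_seq,) ids -> (n_seq, dim) hidden states: word + position + token type."""
    hidden = []
    for pos, (tok, seg) in enumerate(zip(input_ids, token_type_ids)):
        summed = [w + p + t for w, p, t in zip(word_emb[tok], pos_emb[pos], type_emb[seg])]
        hidden.append(layer_norm(summed))
    return hidden


def table(rows, dim):
    """Random stand-in for a trained embedding table."""
    return [[random.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(rows)]


random.seed(0)
VOCAB_SIZE, TYPE_VOCAB_SIZE, MAX_POS, DIM = 10, 2, 8, 4  # toy sizes (assumption)
word_emb = table(VOCAB_SIZE, DIM)
pos_emb = table(MAX_POS, DIM)
type_emb = table(TYPE_VOCAB_SIZE, DIM)

hidden = embed([1, 5, 7, 2], [0, 0, 1, 1], word_emb, pos_emb, type_emb)
print(len(hidden), len(hidden[0]))
```

Position ids are taken to be 0, 1, 2, … here, which is why `enumerate` stands in for an explicit position_ids input.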
TFBertSelfAttention

TFBertMainLayer: TFBertEmbeddings, TFBertEncoder, TFBertPooler
TFBertEncoder: stacked TFBertLayer, each containing TFBertAttention (TFBertSelfAttention, TFBertSelfOutput), TFBertIntermediate and TFBertOutput.

Inputs: hidden_states (n_seq, dim), attention_mask (n_seq,)
Dense [query (Q)], Dense [key (K)], Dense [value (V)]: (n_seq, dim) -> (n_seq, dim)
Split into heads (multi head): (n_seq, dim) -> (n_head, n_seq, dim/n_head)
Q x K scores: (n_head, n_seq, n_seq), combined with attention_mask
softmax -> Dropout: (n_head, n_seq, n_seq)
Weights x V: (n_head, n_seq, dim/n_head)
Reshape: (n_seq, dim) -> hidden_states
[Figure: splitting the hidden states into heads (multi head). The example sequence "[CLS] two kids are playing in a swimming pool … [SEP] [PAD] … [PAD]" is padded to max_position_embeddings = 512 positions, each with hidden_size = 768 features. With n_head = 12, every head receives dim/n_head = 64 features per position.]
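The head split in the figure above is plain slicing: 768 features per position divided among 12 heads gives 64 features per head. A small sketch with nested lists follows; the function names `split_heads` and `merge_heads` are hypothetical, and 3 positions are used instead of 512 to keep the example short.

```python
def split_heads(x, n_head):
    """(n_seq, dim) -> (n_head, n_seq, dim // n_head) by slicing the feature axis."""
    dim = len(x[0])
    if dim % n_head != 0:
        raise ValueError("dim must be divisible by n_head")
    head_dim = dim // n_head
    return [
        [row[h * head_dim:(h + 1) * head_dim] for row in x]
        for h in range(n_head)
    ]


def merge_heads(heads):
    """(n_head, n_seq, head_dim) -> (n_seq, n_head * head_dim), the inverse of split_heads."""
    n_seq = len(heads[0])
    return [[value for head in heads for value in head[i]] for i in range(n_seq)]


# 3 positions with hidden_size = 768 features each, split across n_head = 12 heads.
x = [[float(i * 768 + j) for j in range(768)] for i in range(3)]
heads = split_heads(x, 12)
print(len(heads), len(heads[0]), len(heads[0][0]))
```

`merge_heads` corresponds to the final Reshape back to (n_seq, dim) after attention has been computed per head.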
[Figure: Query (Q) and Key (K) matrices for the example sequence "[CLS] two kids are playing in a swimming pool … [SEP] [PAD] … [PAD]", each with max_position_embeddings = 512 rows and hidden_size = 768 columns. Every Query row is compared with every Key row.]
[Figure: the Query x Key scores for the example sequence "[CLS] two kids are playing in a swimming pool … [SEP] [PAD] … [PAD]" pass through softmax. The Query for "kids" is shown against every Key.]
[Figure: Key and Query for the example sequence "[CLS] two kids are playing in a swimming pool … [SEP] [PAD] … [PAD]", followed by the weighted result for the same sequence; max_position_embeddings = 512, hidden_size = 768.]
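The Query, Key and softmax figures above can be tied together in a short sketch of scaled dot-product attention for a single head. It rests on several assumptions: the scores are scaled by the square root of the head size, [PAD] positions are suppressed by adding a large negative number (-10000.0 here) before the softmax, and Dropout is omitted. The function name `attention` and the toy inputs are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over one row of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v, mask, masked_value=-10000.0):
    """q, k, v: (n_seq, head_dim); mask: (n_seq,) with 1 for tokens and 0 for [PAD]."""
    scale = math.sqrt(len(q[0]))
    outputs, weights = [], []
    for query in q:
        # One score per Key: scaled dot product, pushed down at padded positions.
        scores = [
            sum(a * b for a, b in zip(query, key)) / scale + (0.0 if keep else masked_value)
            for key, keep in zip(k, mask)
        ]
        row = softmax(scores)
        weights.append(row)
        # Weighted sum of the Value rows.
        outputs.append([
            sum(w * value[d] for w, value in zip(row, v))
            for d in range(len(v[0]))
        ])
    return outputs, weights


# Toy example: 3 positions, head size 2, the last position is [PAD].
q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
outputs, weights = attention(q, k, v, mask=[1, 1, 0])
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` sums to 1 and gives essentially zero weight to the padded position, so [PAD] tokens do not influence the output for real tokens.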