
Introduction to BERT (BERT入門)

kenmatsu4
November 29, 2023


Transcript

  1. [slide text not recoverable from the extraction]
  2. [slide text not recoverable from the extraction; only the years 2015 and 2016 survive]
  3. Kaggle: ranked 627th of 125,564 as of November 2019 (top 0.5%)

    SIGNATE: https://www.slideshare.net/matsukenbook/signate-108228406
  4. GLUE tasks (Name, abbreviation, Metric)

    The Corpus of Linguistic Acceptability (CoLA): Matthew's Corr
    The Stanford Sentiment Treebank (SST-2): Accuracy
    Microsoft Research Paraphrase Corpus (MRPC): F1 / Accuracy
    Semantic Textual Similarity Benchmark (STS-B): Pearson-Spearman Corr
    Quora Question Pairs (QQP): F1 / Accuracy
    MultiNLI Matched (MNLI-m): Accuracy; labels are contradiction, entailment, neutral; evaluated on the same genres as train
    MultiNLI Mismatched (MNLI-mm): Accuracy; labels are contradiction, entailment, neutral; evaluated on genres different from train
    Question NLI (QNLI): Accuracy
    Recognizing Textual Entailment (RTE): Accuracy
    Winograd NLI (WNLI): Accuracy
    Diagnostics Main (AX): Matthew's Corr; labels are contradiction, entailment, neutral
  5. [slide text not recoverable from the extraction]
  6. [slide text not recoverable from the extraction]
  7. text 1

    position_id: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
    text: two kids are playing in a swimming pool with a green colored crocodile float .
    ids: 2048 4268 2024 2652 1999 1037 5742 4770 2007 1037 2665 6910 21843 14257 1012

    text 2

    position_id: 0 1 2 3 4 5 6 7 8 9 10 11 12
    text: two kids push an in ##fl ##atable crocodile around in a pool .
    ids: 2048 4268 5245 2019 1999 10258 27892 21843 2105 1999 1037 4770 1012
  8. ['[CLS]', 'two', 'kids', 'are', 'playing', 'in', 'a', 'swimming', 'pool', 'with', 'a', 'green', 'colored', 'crocodile', 'float', '.', '[SEP]',
    'two', 'kids', 'push', 'an', 'in', '##fl', '##atable', 'crocodile', 'around', 'in', 'a', 'pool', '.', '[SEP]']

    text 1

    position_id: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
    text: two kids are playing in a swimming pool with a green colored crocodile float .
    ids: 2048 4268 2024 2652 1999 1037 5742 4770 2007 1037 2665 6910 21843 14257 1012

    text 2

    position_id: 0 1 2 3 4 5 6 7 8 9 10 11 12
    text: two kids push an in ##fl ##atable crocodile around in a pool .
    ids: 2048 4268 5245 2019 1999 10258 27892 21843 2105 1999 1037 4770 1012
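The token list above can be assembled by hand, which makes the roles of [CLS], [SEP], the segment ids, and the position ids explicit. The following is a minimal sketch in plain Python. The helper name `build_pair_input` is hypothetical (it is not part of the transformers library), and the sub-word tokens are copied from the slide rather than produced by a real WordPiece tokenizer. It also assumes that position ids run over the combined sequence, whereas the slide numbers each text separately.

```python
def build_pair_input(tokens_a, tokens_b):
    """Join two tokenized sentences into one BERT-style sentence-pair input.

    Hypothetical helper for illustration; not a transformers API.
    """
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    # Segment 0 covers [CLS], sentence A and the first [SEP]; segment 1 covers the rest.
    token_type_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    # Assumption: positions are counted over the whole combined sequence.
    position_ids = list(range(len(tokens)))
    return tokens, token_type_ids, position_ids


text1 = "two kids are playing in a swimming pool with a green colored crocodile float .".split()
text2 = "two kids push an in ##fl ##atable crocodile around in a pool .".split()

tokens, token_type_ids, position_ids = build_pair_input(text1, text2)
print(len(tokens))
print(token_type_ids)
```

The segment ids computed here play the same role as `segments_ids` in the deck's own code example, which derives them from the index of the first [SEP].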
  9. [Figure: pre-training with masked tokens]

    Original tokens: [CLS] two kids are playing kids [SEP] two kids two [SEP]
    Input tokens: [CLS] two kids are [MASK] kids [SEP] dog kids [MASK] [SEP]
    Targets shown in the figure: playing, kids, two, two
    Outputs pred1, pred2, pred3, pred4 are each compared with their target using cross entropy loss.
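The masking step in this figure can be sketched as below. The 15% selection rate and the 80/10/10 split between [MASK], a random token, and the unchanged token follow the BERT paper. The function name `mask_tokens`, the toy vocabulary, the fixed seed, and the raised selection probability in the usage lines are assumptions made for illustration only.

```python
import random

SPECIAL = {"[CLS]", "[SEP]", "[PAD]"}


def mask_tokens(tokens, vocab, rng, select_prob=0.15):
    """Return (masked_tokens, labels); labels are None where no loss is computed."""
    masked, labels = [], []
    for tok in tokens:
        if tok in SPECIAL or rng.random() >= select_prob:
            masked.append(tok)
            labels.append(None)  # not selected: excluded from the loss
            continue
        labels.append(tok)  # the model must predict the original token
        r = rng.random()
        if r < 0.8:
            masked.append("[MASK]")  # 80%: replace with [MASK]
        elif r < 0.9:
            masked.append(rng.choice(vocab))  # 10%: replace with a random token
        else:
            masked.append(tok)  # 10%: keep the token unchanged
    return masked, labels


tokens = "[CLS] two kids are playing kids [SEP] two kids two [SEP]".split()
vocab = ["two", "kids", "are", "playing", "dog"]  # toy vocabulary (assumption)
# select_prob is raised to 0.5 so that a short sentence shows some masking.
masked, labels = mask_tokens(tokens, vocab, random.Random(0), select_prob=0.5)
print(masked)
print(labels)
```

Only positions with a non-None label contribute a cross entropy term, which matches the four predictions drawn in the figure.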
  10. [Figure: pre-training with next sentence prediction]

    Input tokens: [CLS] two kids are playing kids [SEP] two kids two [SEP]
    Output pred1 is compared with the label IsNext using cross entropy loss.
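Training pairs for this task can be built from consecutive sentences of a corpus. In the sketch below, half of the pairs use the true next sentence (label IsNext) and half use a different sentence (label NotNext), as described in the BERT paper. The function `make_nsp_example` and the three-sentence corpus are hypothetical, and the negative sentence is drawn from the same toy corpus purely for brevity.

```python
import random


def make_nsp_example(sentences, index, rng):
    """Build one (tokens, label) pair for next sentence prediction.

    `sentences` is a list of token lists; `index` must not point at the last one.
    """
    first = sentences[index]
    if rng.random() < 0.5:
        second, label = sentences[index + 1], "IsNext"
    else:
        # Draw a sentence that is neither the first sentence nor its true successor.
        candidates = [i for i in range(len(sentences)) if i not in (index, index + 1)]
        second, label = sentences[rng.choice(candidates)], "NotNext"
    tokens = ["[CLS]"] + first + ["[SEP]"] + second + ["[SEP]"]
    return tokens, label


corpus = [
    ["two", "kids", "are", "playing", "kids"],
    ["two", "kids", "two"],
    ["a", "dog", "runs"],
]
tokens, label = make_nsp_example(corpus, 0, random.Random(0))
print(label, tokens)
```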
  11. Special tokens

    unk_token = "[UNK]",  # unknown token
    sep_token = "[SEP]",  # separator between sentences
    pad_token = "[PAD]",  # padding
    cls_token = "[CLS]",  # classification token at the start of the input
    mask_token = "[MASK]",  # used in pre-training
    https://github.com/huggingface/transformers/blob/master/src/transformers/tokenization_bert.py#L141-L145
  12. Code example

    import tensorflow as tf
    from transformers import BertTokenizer, TFBertForSequenceClassification

    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

    text1 = "[CLS] Two kids are playing in a swimming pool with a green colored crocodile float. [SEP]"
    text2 = "Two kids push an inflatable crocodile around in a pool. [SEP]"
    tokenized_text = tokenizer.tokenize(text1 + " " + text2)
    print(tokenized_text)

    indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
    # Index just past the first [SEP]: everything before it belongs to the 1st sentence
    pos_sep = tokenized_text.index("[SEP]") + 1
    segments_ids = [0] * pos_sep + [1] * (len(indexed_tokens) - pos_sep)

    # Add a batch dimension; constants are enough because the inputs are not trained
    tokens_tensor = tf.constant([indexed_tokens])
    segments_tensors = tf.constant([segments_ids])

    model = TFBertForSequenceClassification.from_pretrained('bert-base-uncased')
    model.summary()  # prints the summary itself and returns None

    outputs = model(tokens_tensor, token_type_ids=segments_tensors)
    print(outputs)
  13. Pre-trained vocab and weight files

    vocab: https://github.com/huggingface/transformers/blob/master/src/transformers/tokenization_bert.py#L32-L52

    PRETRAINED_VOCAB_FILES_MAP = {
        "vocab_file": {
            "bert-base-uncased": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt",
            "bert-large-uncased": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-vocab.txt",
            "bert-base-cased": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased-vocab.txt",
            "bert-large-cased": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-cased-vocab.txt",
            …
            "bert-base-finnish-cased-v1": "https://s3.amazonaws.com/models.huggingface.co/bert/TurkuNLP/bert-base-finnish-cased-v1/vocab.txt",
            "bert-base-finnish-uncased-v1": "https://s3.amazonaws.com/models.huggingface.co/bert/TurkuNLP/bert-base-finnish-uncased-v1/vocab.txt",
        }
    }

    weight: https://github.com/huggingface/transformers/blob/master/src/transformers/modeling_tf_bert.py#L32-L52

    TF_BERT_PRETRAINED_MODEL_ARCHIVE_MAP = {
        "bert-base-uncased": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-tf_model.h5",
        "bert-large-uncased": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-tf_model.h5",
        "bert-base-cased": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased-tf_model.h5",
        "bert-large-cased": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-cased-tf_model.h5",
        "bert-base-multilingual-uncased": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-uncased-tf_model.h5",
        "bert-base-multilingual-cased": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-cased-tf_model.h5",
        …
        "bert-base-finnish-cased-v1": "https://s3.amazonaws.com/models.huggingface.co/bert/TurkuNLP/bert-base-finnish-cased-v1/tf_model.h5",
        "bert-base-finnish-uncased-v1": "https://s3.amazonaws.com/models.huggingface.co/bert/TurkuNLP/bert-base-finnish-uncased-v1/tf_model.h5",
    }
  14. Model configuration

    vocab_size=30522,
    hidden_size=768,
    num_hidden_layers=12,
    num_attention_heads=12,
    intermediate_size=3072,
    hidden_act="gelu",
    hidden_dropout_prob=0.1,
    attention_probs_dropout_prob=0.1,
    max_position_embeddings=512,
    type_vocab_size=2,
    initializer_range=0.02,
    layer_norm_eps=1e-12,
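The configuration values on this slide are enough to estimate the size of the model. The sketch below counts the trainable parameters of the BERT body (embeddings, encoder layers, and pooler, without any task-specific head). The function `bert_param_count` is a hypothetical helper, and the breakdown assumes the standard BERT layout: Dense layers with bias and a LayerNorm with gain and bias after the embeddings, after attention, and after the feed-forward block.

```python
def bert_param_count(vocab_size=30522, hidden_size=768, num_hidden_layers=12,
                     intermediate_size=3072, max_position_embeddings=512,
                     type_vocab_size=2):
    """Count trainable parameters of the BERT body (no classification head)."""
    h = hidden_size
    layer_norm = 2 * h  # gain and bias
    embeddings = (vocab_size + max_position_embeddings + type_vocab_size) * h + layer_norm
    # Query, Key, Value and output Dense layers, plus the LayerNorm after attention.
    attention = 4 * (h * h + h) + layer_norm
    # Two Dense layers (hidden -> intermediate -> hidden), plus the final LayerNorm.
    feed_forward = (h * intermediate_size + intermediate_size) \
        + (intermediate_size * h + h) + layer_norm
    pooler = h * h + h
    return embeddings + num_hidden_layers * (attention + feed_forward) + pooler


print(bert_param_count())  # 109482240
```

Note that num_attention_heads does not appear in the count: splitting the hidden dimension into heads changes how the Query, Key, and Value projections are reshaped, not how many weights they hold.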
  15. Model structure

    TFBertForSequenceClassification
      TFBertMainLayer
        TFBertEmbeddings
        TFBertEncoder
          TFBertLayer (repeated)
            TFBertAttention
              TFBertSelfAttention
              TFBertSelfOutput
            TFBertIntermediate
            TFBertOutput
        TFBertPooler
  16. TFBertForSequenceClassification

    input: input_ids (n_seq,), position_ids (n_seq,), token_type_ids (n_seq,), input_embeds (n_seq, dim), attention_mask (n_seq,)
    TFBertMainLayer: TFBertEmbeddings -> (n_seq, dim) -> TFBertEncoder -> sequence_output (n_seq, dim)
    TFBertPooler: extract first seq position (dim,) -> Dense -> pooled_output (dim,)
    Classifier: pooled_output (dim,) -> Dropout -> Dense -> output (n_class)
    On top of BERT, only Dropout and Dense are added.
  17. TFBertEmbeddings

    input: input_ids (n_seq,), position_ids (n_seq,), token_type_ids (n_seq,), input_embeds (n_seq, dim)
    input_ids -> gather from Weight [word_embeddings] (vocab_size, dim) -> (n_seq, dim)
    position_ids -> Embedding [position_embeddings] -> (n_seq, dim)
    token_type_ids -> Embedding [token_type_embeddings] (type_vocab_size, dim) -> (n_seq, dim)
    The three results are added (+) -> (n_seq, dim) -> LayerNormalization -> Dropout -> hidden_states
    [Figure: the tokens "[CLS] two kids are playing in a swimming pool … [SEP] [PAD] … [PAD]" as a matrix with max_position_embeddings = 512 rows and hidden_size=768 columns]
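The embedding step on this slide (three lookups, a sum, then LayerNormalization) can be sketched in plain Python as follows. The table sizes are toy values chosen for readability, the random initial values are placeholders, the function names `layer_norm` and `embed` are hypothetical, and Dropout is omitted. The LayerNorm here uses gain 1 and bias 0, which is a simplification.

```python
import math
import random


def layer_norm(vec, eps=1e-12):
    """Normalize one vector to zero mean and unit variance (gain=1, bias=0)."""
    mean = sum(vec) / len(vec)
    var = sum((x - mean) ** 2 for x in vec) / len(vec)
    return [(x - mean) / math.sqrt(var + eps) for x in vec]


def embed(input_ids, token_type_ids, word_emb, pos_emb, type_emb):
    """Sum word, position and token-type embeddings, then apply LayerNorm."""
    hidden_states = []
    for position, (tok_id, type_id) in enumerate(zip(input_ids, token_type_ids)):
        summed = [w + p + t for w, p, t in
                  zip(word_emb[tok_id], pos_emb[position], type_emb[type_id])]
        hidden_states.append(layer_norm(summed))
    return hidden_states  # shape (n_seq, dim); dropout is omitted in this sketch


rng = random.Random(0)
dim, vocab_size, max_pos, type_vocab = 8, 20, 16, 2  # toy sizes, not BERT's


def table(rows):
    """Random lookup table with `rows` rows and `dim` columns."""
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(rows)]


word_emb, pos_emb, type_emb = table(vocab_size), table(max_pos), table(type_vocab)
hidden_states = embed([3, 7, 7, 1], [0, 0, 1, 1], word_emb, pos_emb, type_emb)
print(len(hidden_states), len(hidden_states[0]))  # 4 8
```

The token id 7 appears twice in the input, yet its two rows differ because the position and token-type embeddings differ, which is the point of adding them.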
  18. TFBertEncoder

    input: hidden_states (n_seq, dim), attention_mask (n_seq,)
    TFBertLayer x 12, applied in sequence
    output: hidden_states (n_seq, dim)
  19. TFBertLayer

    input: hidden_states (n_seq, dim), attention_mask (n_seq,)
    TFBertSelfAttention -> (n_seq, dim)
    TFBertSelfOutput: Dense -> Dropout -> + (add the layer input) -> Layer Normalization -> (n_seq, dim)
    TFBertIntermediate: Dense -> gelu
    TFBertOutput: Dense -> Dropout -> + (add the TFBertSelfOutput result) -> Layer Normalization -> (n_seq, dim)
    output: hidden_states (n_seq, dim)
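The gelu activation used in TFBertIntermediate (hidden_act="gelu" in the configuration) can be written with the standard normal CDF. The sketch below uses the exact erf-based form; whether a particular implementation uses this form or a tanh approximation is an assumption here, not something the slide states.

```python
import math


def gelu(x):
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def relu(x):
    """ReLU, shown only for comparison."""
    return max(0.0, x)


for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"x={x:+.1f}  gelu={gelu(x):+.4f}  relu={relu(x):+.4f}")
```

Unlike ReLU, gelu lets small negative inputs through with a small negative output and approaches the identity for large positive inputs.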
  20. TFBertSelfAttention

    input: hidden_states (n_seq, dim), attention_mask (n_seq,)
    hidden_states -> Dense [query (Q)], Dense [Key (K)], Dense [Value (V)] -> (n_seq, dim) each
    multi head: each of Q, K, V is reshaped to (n_head, n_seq, dim/n_head)
    Q x K -> scores (n_head, n_seq, n_seq) -> + attention_mask -> softmax -> Dropout -> (n_head, n_seq, n_seq)
    x V -> (n_head, n_seq, dim/n_head) -> Reshape -> (n_seq, dim)
    output: hidden_states (n_seq, dim)
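The data flow on this slide, restricted to one head, can be sketched in plain Python. The scores are scaled by the square root of the head size, as in the original Transformer, and padding is masked by adding a large negative number before the softmax. The value -10000.0, the function names, and the tiny input matrices are assumptions for illustration; Dropout and the Dense projections are omitted.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v, attention_mask):
    """Single-head scaled dot-product attention.

    q, k, v: lists of n_seq vectors of size d_head.
    attention_mask: n_seq entries, 1 for real tokens and 0 for [PAD].
    """
    d_head = len(q[0])
    output, weights = [], []
    for q_i in q:
        scores = [sum(a * b for a, b in zip(q_i, k_j)) / math.sqrt(d_head) for k_j in k]
        # Additive mask: a large negative score drives the weight of [PAD] to ~0.
        scores = [s + (1 - m) * -10000.0 for s, m in zip(scores, attention_mask)]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the Value vectors.
        output.append([sum(w_j * v_j[c] for w_j, v_j in zip(w, v))
                       for c in range(len(v[0]))])
    return output, weights


q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
mask = [1, 1, 0]  # the last position is padding
out, weights = attention(q, k, v, mask)
print(weights[0])
print(out[0])
```

Each row of `weights` sums to 1, and the column for the padded position receives essentially no weight, so [PAD] tokens do not influence the output.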
  21. [Figure: multi head split]

    The matrix for the tokens "[CLS] two kids are playing in a swimming pool … [SEP] [PAD] … [PAD]" has max_position_embeddings = 512 rows and hidden_size=768 columns.
    For multi head, it is split along the hidden dimension into n_head = 12 pieces of dim/n_head = 64 columns each.
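The split shown in this figure is a pure reshape: no values change, only how the 768 columns are grouped. A minimal sketch in plain Python, using nested lists in place of tensors (the function names `split_heads` and `merge_heads` are hypothetical):

```python
def split_heads(hidden_states, n_head):
    """Reshape (n_seq, dim) into (n_head, n_seq, dim // n_head)."""
    dim = len(hidden_states[0])
    assert dim % n_head == 0, "dim must be divisible by n_head"
    d_head = dim // n_head
    return [[row[h * d_head:(h + 1) * d_head] for row in hidden_states]
            for h in range(n_head)]


def merge_heads(heads):
    """Inverse of split_heads: (n_head, n_seq, d_head) back to (n_seq, dim)."""
    n_seq = len(heads[0])
    return [[x for head in heads for x in head[i]] for i in range(n_seq)]


n_seq, dim, n_head = 512, 768, 12
hidden_states = [[float(i * dim + j) for j in range(dim)] for i in range(n_seq)]
heads = split_heads(hidden_states, n_head)
print(len(heads), len(heads[0]), len(heads[0][0]))  # 12 512 64
```

Merging the heads again restores the original matrix, which corresponds to the Reshape back to (n_seq, dim) at the end of self-attention.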
  22. [Figure: Query and Key]

    Query and Key matrices over the tokens "[CLS] two kids are playing in a swimming pool … [SEP] [PAD] … [PAD]", each labeled max_position_embeddings = 512 by hidden_size=768, with their rows labeled individually. The remaining annotations are not recoverable.
  23. [Figure: softmax over Query and Key scores]

    Rows (Query) and columns (Key) both run over the tokens "[CLS] two kids are playing in a swimming pool … [SEP] [PAD] … [PAD]". For the Query "kids", the scores against each Key are passed through Softmax. The remaining annotations are not recoverable.
  24. [Figure: applying the weights to Value]

    Key, Query, and Value (V) matrices over the tokens "[CLS] two kids are playing in a swimming pool … [SEP] [PAD] … [PAD]", labeled max_position_embeddings = 512 and hidden_size=768; the weight obtained from Query and Key is applied to V. The remaining annotations are not recoverable.
  25. References

    BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. https://arxiv.org/abs/1810.04805
    Transformers: https://huggingface.co/transformers
    3rd party pre-trained models: https://huggingface.co/models