Introduction to BERT

kenmatsu4
November 29, 2023


Transcript

  3. Kaggle: ranked 627th out of 125,564 as of November 2019 (top 0.5%)

    SIGNATE: https://www.slideshare.net/matsukenbook/signate-108228406
  4. GLUE benchmark tasks

    [Table: overview of the GLUE benchmark tasks. The natural language inference tasks, such as MNLI with its matched and mismatched genre sets, are labeled contradiction, entailment or neutral.]
  7. Example sentence pair after tokenization

    text 1
    position_id  token      id
    0            two        2048
    1            kids       4268
    2            are        2024
    3            playing    2652
    4            in         1999
    5            a          1037
    6            swimming   5742
    7            pool       4770
    8            with       2007
    9            a          1037
    10           green      2665
    11           colored    6910
    12           crocodile  21843
    13           float      14257
    14           .          1012

    text 2
    position_id  token      id
    0            two        2048
    1            kids       4268
    2            push       5245
    3            an         2019
    4            in         1999
    5            ##fl       10258
    6            ##atable   27892
    7            crocodile  21843
    8            around     2105
    9            in         1999
    10           a          1037
    11           pool       4770
    12           .          1012
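The "##" pieces in text 2 come from WordPiece, which splits each word greedily into the longest entries found in the vocabulary and marks word-internal pieces with "##". The sketch below illustrates that rule under stated assumptions: `wordpiece` and `toy_vocab` are hypothetical, the vocabulary is a tiny stand-in for BERT's 30,522 entries, and the lowercasing and punctuation splitting that BertTokenizer performs first are left out.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first WordPiece split of a single word."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, shrinking it until it is in the vocab
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # word-internal pieces carry the ## prefix
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # nothing matches: the whole word becomes [UNK]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary, for illustration only
toy_vocab = {"in", "##fl", "##atable", "crocodile", "pool"}
print(wordpiece("inflatable", toy_vocab))  # ['in', '##fl', '##atable']
```

Because the longest match always wins, the result depends on which pieces the vocabulary happens to contain; with the real vocabulary this is how "inflatable" ends up as in / ##fl / ##atable.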
  8. Tokens passed to the model, with the special tokens added

    ['[CLS]', 'two', 'kids', 'are', 'playing', 'in', 'a', 'swimming', 'pool', 'with', 'a', 'green', 'colored', 'crocodile', 'float', '.', '[SEP]', 'two', 'kids', 'push', 'an', 'in', '##fl', '##atable', 'crocodile', 'around', 'in', 'a', 'pool', '.', '[SEP]']
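A sketch of how the pair above is laid out as [CLS] A [SEP] B [SEP], together with the segment (token_type) ids that slide 12 derives from the position of the first [SEP]; `build_pair` is a hypothetical helper, not a transformers function:

```python
def build_pair(tokens_a, tokens_b):
    """Lay out a sentence pair as [CLS] A [SEP] B [SEP] with segment ids."""
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    # Segment id 0 covers [CLS], sentence A and the first [SEP]; 1 covers sentence B and the last [SEP]
    token_type_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    return tokens, token_type_ids


a = "two kids are playing in a swimming pool with a green colored crocodile float .".split()
b = "two kids push an in ##fl ##atable crocodile around in a pool .".split()
tokens, token_type_ids = build_pair(a, b)
print(len(tokens))  # 31
print(token_type_ids.count(0), token_type_ids.count(1))  # 17 14
```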
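  9. Pre-training task 1: Masked LM

    Original input: [CLS] two kids are playing kids [SEP] two kids two [SEP]
    Model input:    [CLS] two kids are [MASK] kids [SEP] dog kids [MASK] [SEP]

    At each selected position the model predicts the original token (pred1 to pred4; targets kids, playing, two, two), and each prediction contributes a cross-entropy loss. In the BERT paper, 15% of the input tokens are selected; 80% of them are replaced with [MASK], 10% with a random token and 10% are left unchanged. In this example, "playing" and the last "two" became [MASK], the first "two" of sentence 2 became the random token "dog", and one "kids" was kept unchanged but is still predicted.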
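A minimal sketch of the masking rule, as a simplification: here each non-special token is selected independently with probability `mask_prob`, whereas the original BERT data script picks a fixed number of positions per sequence. `mask_tokens` and the tiny random-token vocabulary are hypothetical.

```python
import random


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """Simplified BERT masking for the Masked LM task.

    A selected token becomes [MASK] 80% of the time, a random vocab token
    10% of the time, and stays unchanged 10% of the time.
    Returns (inputs, labels); labels[i] is None where nothing is predicted.
    """
    rng = random.Random(seed)
    special = {"[CLS]", "[SEP]", "[PAD]"}
    inputs = list(tokens)
    labels = [None] * len(tokens)  # None = not predicted, so no loss at this position
    for i, tok in enumerate(tokens):
        if tok in special or rng.random() >= mask_prob:
            continue
        labels[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = "[MASK]"
        elif r < 0.9:
            inputs[i] = rng.choice(vocab)
        # otherwise: keep the original token
    return inputs, labels


tokens = "[CLS] two kids are playing kids [SEP] two kids two [SEP]".split()
inputs, labels = mask_tokens(tokens, vocab=["dog", "cat", "pool"], mask_prob=0.5, seed=1)
print(inputs)
print(labels)
```

The unchanged 10% matters: the model cannot assume that only [MASK] positions need a good prediction.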
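  10. Pre-training task 2: Next Sentence Prediction

    Input: [CLS] two kids are playing kids [SEP] two kids two [SEP]

    From the output at the [CLS] position, the model predicts whether the second sentence really follows the first (IsNext) or not (NotNext), again trained with a cross-entropy loss. In the BERT paper, the second sentence is the actual next sentence 50% of the time and a random sentence from the corpus otherwise.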
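A sketch of how such training pairs can be drawn, assuming a corpus given as a list of documents (each a list of sentences) and taking the NotNext sentence from a different document; `make_nsp_example` and the two-document corpus are hypothetical:

```python
import random


def make_nsp_example(corpus, doc_index, sent_index, rng):
    """Build one Next Sentence Prediction example as (sentence_a, sentence_b, label).

    corpus: list of documents, each a list of sentences.
    sent_index must not point at the last sentence of its document.
    """
    doc = corpus[doc_index]
    sentence_a = doc[sent_index]
    if rng.random() < 0.5:
        # 50%: the sentence that really follows A
        return sentence_a, doc[sent_index + 1], "IsNext"
    # 50%: a random sentence taken from a different document
    other_docs = [d for i, d in enumerate(corpus) if i != doc_index]
    return sentence_a, rng.choice(rng.choice(other_docs)), "NotNext"


corpus = [
    ["two kids are playing in a pool .", "two kids push a crocodile float ."],
    ["a dog runs on the beach .", "the dog catches a ball ."],
]
rng = random.Random(0)
print(make_nsp_example(corpus, 0, 0, rng))
```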
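  11. Special tokens defined in BertTokenizer (https://github.com/huggingface/transformers/blob/master/src/transformers/tokenization_bert.py#L141-L145)

        unk_token="[UNK]",    # unknown token (not in the vocabulary)
        sep_token="[SEP]",    # separator placed after each sentence
        pad_token="[PAD]",    # padding up to a fixed length
        cls_token="[CLS]",    # placed at the start; its output is used for classification
        mask_token="[MASK]",  # used for masked LM in pre-training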
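[PAD] brings every sequence in a batch to the same length, and an attention mask tells the model which positions to ignore. A minimal sketch, assuming right-padding and pad id 0 (the id of [PAD] in the bert-base-uncased vocabulary, where [CLS] is 101 and [SEP] is 102); `pad_to_length` is hypothetical:

```python
def pad_to_length(token_ids, max_len, pad_id=0):
    """Right-pad a list of ids with [PAD] and build the matching attention mask."""
    if len(token_ids) > max_len:
        raise ValueError("sequence is longer than max_len; truncate it first")
    n_pad = max_len - len(token_ids)
    input_ids = token_ids + [pad_id] * n_pad
    attention_mask = [1] * len(token_ids) + [0] * n_pad  # 1 = real token, 0 = [PAD]
    return input_ids, attention_mask


# "[CLS] two [SEP]" as bert-base-uncased ids, padded to 6 positions
print(pad_to_length([101, 2048, 102], 6))
# ([101, 2048, 102, 0, 0, 0], [1, 1, 1, 0, 0, 0])
```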
  12. Running the example pair through TFBertForSequenceClassification

        import tensorflow as tf
        from transformers import BertTokenizer, TFBertForSequenceClassification

        tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

        text1 = "[CLS] Two kids are playing in a swimming pool with a green colored crocodile float. [SEP]"
        text2 = "Two kids push an inflatable crocodile around in a pool. [SEP]"

        # WordPiece tokenization; the special tokens [CLS]/[SEP] are kept as single tokens
        tokenized_text = tokenizer.tokenize(text1 + " " + text2)
        print(tokenized_text)
        indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)

        # Segment ids: 0 up to and including the first [SEP] (1st sentence), 1 for the rest
        pos_sep = tokenized_text.index("[SEP]") + 1
        segments_ids = [0] * pos_sep + [1] * (len(indexed_tokens) - pos_sep)

        # Add a batch dimension: shape (1, n_seq)
        tokens_tensor = tf.constant([indexed_tokens])
        segments_tensors = tf.constant([segments_ids])

        model = TFBertForSequenceClassification.from_pretrained('bert-base-uncased')
        model.summary()  # summary() prints by itself and returns None
        # The classification head is newly initialized here, so the logits are untrained
        outputs = model(tokens_tensor, token_type_ids=segments_tensors)
        print(outputs)
  13. Where the pretrained vocabulary and weights come from

    Vocabulary (https://github.com/huggingface/transformers/blob/master/src/transformers/tokenization_bert.py#L32-L52):

        PRETRAINED_VOCAB_FILES_MAP = {
            "vocab_file": {
                "bert-base-uncased": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt",
                "bert-large-uncased": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-vocab.txt",
                "bert-base-cased": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased-vocab.txt",
                "bert-large-cased": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-cased-vocab.txt",
                …
                "bert-base-finnish-cased-v1": "https://s3.amazonaws.com/models.huggingface.co/bert/TurkuNLP/bert-base-finnish-cased-v1/vocab.txt",
                "bert-base-finnish-uncased-v1": "https://s3.amazonaws.com/models.huggingface.co/bert/TurkuNLP/bert-base-finnish-uncased-v1/vocab.txt",
            }
        }

    Weights (https://github.com/huggingface/transformers/blob/master/src/transformers/modeling_tf_bert.py#L32-L52):

        TF_BERT_PRETRAINED_MODEL_ARCHIVE_MAP = {
            "bert-base-uncased": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-tf_model.h5",
            "bert-large-uncased": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-tf_model.h5",
            "bert-base-cased": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased-tf_model.h5",
            "bert-large-cased": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-cased-tf_model.h5",
            "bert-base-multilingual-uncased": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-uncased-tf_model.h5",
            "bert-base-multilingual-cased": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-cased-tf_model.h5",
            …
            "bert-base-finnish-cased-v1": "https://s3.amazonaws.com/models.huggingface.co/bert/TurkuNLP/bert-base-finnish-cased-v1/tf_model.h5",
            "bert-base-finnish-uncased-v1": "https://s3.amazonaws.com/models.huggingface.co/bert/TurkuNLP/bert-base-finnish-uncased-v1/tf_model.h5",
        }
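The vocab files listed above are plain text with one token per line, and a token's id is its 0-based line number. A minimal sketch of reading that format; `load_vocab` is written here for illustration and the three-line vocabulary is hypothetical (in the real bert-base-uncased file, [UNK] sits at id 100):

```python
import io


def load_vocab(f):
    """Map each token to its 0-based line number, as in a BERT vocab.txt."""
    vocab = {}
    for index, line in enumerate(f):
        vocab[line.rstrip("\n")] = index
    return vocab


sample = io.StringIO("[PAD]\n[UNK]\n[CLS]\n")
print(load_vocab(sample))  # {'[PAD]': 0, '[UNK]': 1, '[CLS]': 2}
```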
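  14. BertConfig default parameters (bert-base)

        vocab_size=30522,                  # WordPiece vocabulary size
        hidden_size=768,                   # size of each token representation (dim)
        num_hidden_layers=12,              # number of TFBertLayer blocks
        num_attention_heads=12,            # heads per self-attention layer
        intermediate_size=3072,            # width of the feed-forward (TFBertIntermediate) layer
        hidden_act="gelu",                 # activation in the feed-forward layer
        hidden_dropout_prob=0.1,           # dropout on embeddings and sub-layer outputs
        attention_probs_dropout_prob=0.1,  # dropout on the attention weights
        max_position_embeddings=512,       # maximum sequence length
        type_vocab_size=2,                 # number of segment (token_type) ids: sentence A / B
        initializer_range=0.02,            # standard deviation of the weight initializer
        layer_norm_eps=1e-12,              # epsilon in LayerNormalization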
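These defaults fix the size of bert-base. As a rough check, the sketch below counts weights assuming the layer layout in the structure slides that follow (every Dense has a bias, each LayerNorm has a scale and a shift, the classification head is excluded); `bert_param_count` is a hypothetical helper:

```python
def bert_param_count(vocab_size=30522, hidden_size=768, num_hidden_layers=12,
                     intermediate_size=3072, max_position_embeddings=512,
                     type_vocab_size=2):
    """Count the weights of a BERT encoder: embeddings + layers + pooler."""
    h, i = hidden_size, intermediate_size

    def dense(n_in, n_out):
        return n_in * n_out + n_out  # weight matrix + bias

    layer_norm = 2 * h  # scale (gamma) + shift (beta)
    embeddings = (vocab_size + max_position_embeddings + type_vocab_size) * h + layer_norm
    per_layer = (
        3 * dense(h, h)                # query, key, value
        + dense(h, h) + layer_norm     # TFBertSelfOutput
        + dense(h, i)                  # TFBertIntermediate
        + dense(i, h) + layer_norm     # TFBertOutput
    )
    pooler = dense(h, h)               # TFBertPooler
    return embeddings + num_hidden_layers * per_layer + pooler


print(bert_param_count())  # 109482240
```

That is about 110M parameters; a 2-class head on top would add 768 × 2 + 2 = 1,538 more.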
  15. Model structure

        TFBertForSequenceClassification
          TFBertMainLayer
            TFBertEmbeddings
            TFBertEncoder
              TFBertLayer (× 12)
                TFBertAttention
                  TFBertSelfAttention
                  TFBertSelfOutput
                TFBertIntermediate
                TFBertOutput
            TFBertPooler
  16. TFBertForSequenceClassification: overall data flow

    Inputs: input_ids (n_seq,) or inputs_embeds (n_seq, dim), position_ids (n_seq,), token_type_ids (n_seq,), attention_mask (n_seq,)
    TFBertMainLayer:
      TFBertEmbeddings → (n_seq, dim)
      TFBertEncoder → sequence_output (n_seq, dim)
      TFBertPooler: extract the first position ([CLS]), (dim,) → Dense (tanh) → pooled_output (dim,)
    Classification head added on top of BERT: Dropout → Dense → output (n_class,)
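A minimal NumPy sketch of the last two steps, pooling and the classification head, at inference time (so dropout does nothing). The random weights and sizes are hypothetical stand-ins for trained parameters, and `classify` is not a transformers function:

```python
import numpy as np


def classify(sequence_output, w_pool, b_pool, w_cls, b_cls):
    """Pooler + classification head at inference time (dropout is a no-op)."""
    cls_vector = sequence_output[0]                        # first position = [CLS], shape (dim,)
    pooled_output = np.tanh(cls_vector @ w_pool + b_pool)  # TFBertPooler: Dense + tanh, (dim,)
    logits = pooled_output @ w_cls + b_cls                 # Dropout (skipped) -> Dense, (n_class,)
    return pooled_output, logits


rng = np.random.default_rng(0)
n_seq, dim, n_class = 31, 768, 2
sequence_output = rng.normal(size=(n_seq, dim))
pooled, logits = classify(
    sequence_output,
    rng.normal(scale=0.02, size=(dim, dim)), np.zeros(dim),
    rng.normal(scale=0.02, size=(dim, n_class)), np.zeros(n_class),
)
print(pooled.shape, logits.shape)  # (768,) (2,)
```

Only the [CLS] row reaches the classifier; the other n_seq - 1 rows of sequence_output are used by token-level tasks instead.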
  17. TFBertEmbeddings

    input_ids (n_seq,) → gather from word_embeddings (vocab_size, dim) → (n_seq, dim)
    position_ids (n_seq,) → position_embeddings (max_position_embeddings, dim) → (n_seq, dim)
    token_type_ids (n_seq,) → token_type_embeddings (type_vocab_size, dim) → (n_seq, dim)
    (inputs_embeds (n_seq, dim) can be passed instead of input_ids)
    Sum of the three (+) → LayerNormalization → Dropout → hidden_states (n_seq, dim)

    Example: [CLS] two kids are playing in a swimming pool … [SEP] [PAD] … [PAD], up to max_position_embeddings = 512 tokens, each mapped to a vector of hidden_size = 768
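The same embedding step as a NumPy sketch, with toy table sizes and hypothetical token ids. It assumes position ids 0 to n_seq - 1, leaves out dropout and the inputs_embeds path, and `bert_embeddings` is a hypothetical helper:

```python
import numpy as np


def bert_embeddings(input_ids, token_type_ids, word_emb, pos_emb, type_emb,
                    gamma, beta, eps=1e-12):
    """Sum word, position and segment embeddings, then apply LayerNorm (no dropout)."""
    position_ids = np.arange(len(input_ids))  # 0, 1, ..., n_seq - 1
    x = word_emb[input_ids] + pos_emb[position_ids] + type_emb[token_type_ids]  # (n_seq, dim)
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


# Toy sizes (the real tables are 30522 x 768, 512 x 768 and 2 x 768)
rng = np.random.default_rng(0)
vocab_size, max_pos, type_vocab, dim = 50, 16, 2, 8
input_ids = np.array([1, 7, 9, 2, 11, 2])  # hypothetical ids
token_type_ids = np.array([0, 0, 0, 0, 1, 1])
out = bert_embeddings(
    input_ids, token_type_ids,
    rng.normal(size=(vocab_size, dim)), rng.normal(size=(max_pos, dim)),
    rng.normal(size=(type_vocab, dim)), np.ones(dim), np.zeros(dim),
)
print(out.shape)  # (6, 8)
```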
  18. TFBertEncoder

    Inputs: hidden_states (n_seq, dim), attention_mask (n_seq,)
    12 TFBertLayer blocks applied one after another (num_hidden_layers = 12)
    Output: hidden_states (n_seq, dim)
  19. TFBertLayer

    Inputs: hidden_states (n_seq, dim), attention_mask (n_seq,)
    TFBertAttention: TFBertSelfAttention → TFBertSelfOutput (Dense → Dropout → + residual → LayerNormalization), (n_seq, dim)
    TFBertIntermediate: Dense → gelu, (n_seq, intermediate_size = 3072)
    TFBertOutput: Dense → Dropout → + residual → LayerNormalization, (n_seq, dim)
    Output: hidden_states (n_seq, dim)
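A NumPy sketch of one TFBertLayer and of TFBertEncoder stacking 12 of them. It is a simplification: the self-attention is replaced by the identity function as a placeholder, biases, LayerNorm scale/shift and dropout are omitted, GELU uses the common tanh approximation, and all weights and sizes are hypothetical:

```python
from functools import partial

import numpy as np


def layer_norm(x, eps=1e-12):
    """LayerNorm over the last axis (scale and shift omitted)."""
    mean = x.mean(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(x.var(axis=-1, keepdims=True) + eps)


def gelu(x):
    """Common tanh approximation of GELU."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))


def bert_layer(h, self_attention, w_attn_out, w_inter, w_out):
    """One encoder layer with residual connections around both sub-layers."""
    a = layer_norm(self_attention(h) @ w_attn_out + h)  # TFBertSelfOutput: Dense, + residual, LayerNorm
    inter = gelu(a @ w_inter)                           # TFBertIntermediate: (n_seq, intermediate_size)
    return layer_norm(inter @ w_out + a)                # TFBertOutput: Dense, + residual, LayerNorm


def bert_encoder(h, layers):
    """TFBertEncoder: apply the layers in order (12 in bert-base)."""
    for layer in layers:
        h = layer(h)
    return h


rng = np.random.default_rng(0)
n_seq, dim, inter_size = 5, 8, 32
layers = [
    partial(bert_layer,
            self_attention=lambda y: y,  # placeholder for TFBertSelfAttention
            w_attn_out=rng.normal(scale=0.1, size=(dim, dim)),
            w_inter=rng.normal(scale=0.1, size=(dim, inter_size)),
            w_out=rng.normal(scale=0.1, size=(inter_size, dim)))
    for _ in range(12)
]
out = bert_encoder(rng.normal(size=(n_seq, dim)), layers)
print(out.shape)  # (5, 8)
```

The shape stays (n_seq, dim) through every layer, which is what allows the same block to be stacked 12 times.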
  20. TFBertSelfAttention

    Inputs: hidden_states (n_seq, dim), attention_mask (n_seq,)
    Dense layers produce the query (Q), key (K) and value (V), each (n_seq, dim), reshaped into multiple heads: (n_head, n_seq, dim/n_head)
    Q × Kᵀ, scaled by √(dim/n_head) → attention scores (n_head, n_seq, n_seq); attention_mask is added so that [PAD] positions are ignored
    softmax → Dropout → attention weights (n_head, n_seq, n_seq)
    weights × V → (n_head, n_seq, dim/n_head) → Reshape → hidden_states (n_seq, dim)
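The self-attention steps above as a NumPy sketch for a single sequence. Assumptions: biases and dropout are omitted, the mask is applied by adding -10000 to the scores of [PAD] keys (one common convention), and the weights are random; `multi_head_self_attention` is a hypothetical helper, not the transformers API:

```python
import numpy as np


def multi_head_self_attention(h, attention_mask, wq, wk, wv, n_head):
    """Multi-head scaled dot-product self-attention; biases and dropout omitted."""
    n_seq, dim = h.shape
    d_head = dim // n_head  # 768 // 12 = 64 in bert-base

    def split_heads(x):  # (n_seq, dim) -> (n_head, n_seq, d_head)
        return x.reshape(n_seq, n_head, d_head).transpose(1, 0, 2)

    q, k, v = split_heads(h @ wq), split_heads(h @ wk), split_heads(h @ wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (n_head, n_seq, n_seq)
    # attention_mask: 1 = real token, 0 = [PAD]; padded keys get a large negative score
    scores = scores + (1.0 - attention_mask)[None, None, :] * -10000.0
    scores = scores - scores.max(axis=-1, keepdims=True)  # for numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    context = weights @ v  # (n_head, n_seq, d_head)
    return context.transpose(1, 0, 2).reshape(n_seq, dim), weights


rng = np.random.default_rng(0)
n_seq, dim, n_head = 6, 8, 2
h = rng.normal(size=(n_seq, dim))
mask = np.array([1, 1, 1, 1, 0, 0], dtype=float)  # last two positions are [PAD]
wq, wk, wv = (rng.normal(size=(dim, dim)) for _ in range(3))
out, weights = multi_head_self_attention(h, mask, wq, wk, wv, n_head=n_head)
print(out.shape, weights.shape)  # (6, 8) (2, 6, 6)
```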
  21. Multi-head attention

    The representation of [CLS] two kids are playing in a swimming pool … [SEP] [PAD] … [PAD] (up to max_position_embeddings = 512 tokens × hidden_size = 768) is split along the hidden dimension into n_head = 12 heads, each of width dim/n_head = 64, and attention is computed in each head separately.
  22. Attention scores: Query × Key

    Q and K both have one row per token ([CLS] two kids are playing in a swimming pool … [SEP] [PAD] … [PAD], up to max_position_embeddings = 512 rows, hidden_size = 768 columns before the split into heads). The dot product of every Query row with every Key row gives an n_seq × n_seq matrix of scores, with Query tokens as rows and Key tokens as columns.
  23. Softmax over the Keys

    For each Query token (for example "kids"), the softmax is taken along its row of scores, that is, over all Key tokens ([CLS] two kids are playing in a swimming pool … [SEP] [PAD] … [PAD]), so the resulting attention weights for that Query sum to 1.
  24. Weighted sum of the Values

    The attention weights (Query × Key) are multiplied by V, so each token's output is a weighted sum of the Value rows. After the heads are concatenated, every token of [CLS] two kids are playing in a swimming pool … [SEP] [PAD] … [PAD] again has one hidden_size = 768 vector.
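Per head, slides 22 to 24 can be summarized in one set of formulas. Here Q, K, V are the n_seq × d_k matrices of a single head with d_k = dim/n_head = 768/12 = 64, and M is the additive attention mask (0 where the Key token is real, a large negative value where it is [PAD]); the 1/√d_k factor is the scaling of scaled dot-product attention:

```latex
S = \frac{Q K^{\top}}{\sqrt{d_k}} + M, \qquad
A_{ij} = \frac{\exp(S_{ij})}{\sum_{l=1}^{n_{\mathrm{seq}}} \exp(S_{il})}, \qquad
\mathrm{head} = A V \in \mathbb{R}^{n_{\mathrm{seq}} \times d_k}
```

Concatenating the 12 heads side by side gives the n_seq × 768 output again.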
  25. References

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. https://arxiv.org/abs/1810.04805
    Transformers: https://huggingface.co/transformers
    Third-party pre-trained models: https://huggingface.co/models