Slide 49
Easy to Try!
Official implementation is available at:
https://github.com/tatHi/maxmatch_dropout
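from transformers import BertTokenizer
import maxMatchTokenizer  # module provided by the repository above

tknzr = BertTokenizer.from_pretrained('bert-base-cased')
mmt = maxMatchTokenizer.MaxMatchTokenizer()
mmt.loadBertTokenizer(tknzr, doNaivePreproc=True)  # reuse the BERT tokenizer's vocabulary
mmt.tokenize('hello, wordpiece!', p=0.5)  # p: dropout probability
# example output (sampled at random, so it varies between calls):
# ['hello', ',', 'w', '##ord', '##piece', '!']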
A BertTokenizer can be loaded directly!
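To show what the `p` argument does, here is a minimal, self-contained sketch of the idea behind MaxMatch-Dropout. It is not the repository's implementation and does not use the maxMatchTokenizer API. It assumes, based on our reading of the method, that WordPiece's greedy longest-match-first search runs as usual, except that each multi-character vocabulary match is rejected with probability `p`, which forces a shorter match. Single-character pieces are never dropped, so any word the vocabulary can cover still gets tokenized. The function name `maxmatch_dropout_tokenize` and the toy vocabulary are hypothetical. The sketch also ignores normalization, pre-tokenization and WordPiece's maximum word length.

```python
import random
from typing import List, Optional, Set


def maxmatch_dropout_tokenize(word: str, vocab: Set[str], p: float = 0.0,
                              rng: Optional[random.Random] = None,
                              unk: str = "[UNK]") -> List[str]:
    """Hypothetical helper (not the maxMatchTokenizer API): WordPiece-style
    greedy longest-match-first tokenization of a single word, where each
    multi-character candidate match is rejected with probability p."""
    rng = rng or random.Random()
    tokens: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            sub = word[start:end]
            if start > 0:
                sub = "##" + sub  # word-internal pieces carry the ## prefix
            if sub in vocab:
                # Assumption: single-character pieces are never dropped,
                # so dropout cannot make a coverable word untokenizable.
                if end - start == 1 or rng.random() >= p:
                    piece = sub
                    break
            end -= 1
        if piece is None:
            return [unk]  # as in WordPiece, the whole word becomes [UNK]
        tokens.append(piece)
        start = end
    return tokens


# Toy vocabulary, invented for illustration only.
VOCAB = {"word", "w", "##ord", "##o", "##r", "##d",
         "##piece", "##p", "##i", "##e", "##c"}

if __name__ == "__main__":
    print(maxmatch_dropout_tokenize("wordpiece", VOCAB, p=0.0))
    print(maxmatch_dropout_tokenize("wordpiece", VOCAB, p=0.5))
```

With `p=0.0` this reduces to ordinary deterministic greedy matching (`['word', '##piece']` for the toy vocabulary). Raising `p` yields more fragmented segmentations of the same word, and `p=1.0` falls back to single characters. The pieces always concatenate back to the input, which is what makes this usable as subword regularization during training.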
It has been half a year since I started thinking I'd like to integrate this into BertTokenizer and open a pull request.
2022/12/14 NLP Colloquium 49