Table 5: Resources of PTMs
Resource Description URL
Open-Source Implementations §
word2vec CBOW, Skip-Gram https://github.com/tmikolov/word2vec
GloVe Pre-trained word vectors https://nlp.stanford.edu/projects/glove
FastText Pre-trained word vectors https://github.com/facebookresearch/fastText
Transformers Framework: PyTorch&TF, PTMs: BERT, GPT-2, RoBERTa, XLNet, etc. https://github.com/huggingface/transformers
Fairseq Framework: PyTorch, PTMs: English LM, German LM, RoBERTa, etc. https://github.com/pytorch/fairseq
Flair Framework: PyTorch, PTMs: BERT, ELMo, GPT, RoBERTa, XLNet, etc. https://github.com/flairNLP/flair
AllenNLP [47] Framework: PyTorch, PTMs: ELMo, BERT, GPT-2, etc. https://github.com/allenai/allennlp
fastNLP Framework: PyTorch, PTMs: RoBERTa, GPT, etc. https://github.com/fastnlp/fastNLP
UniLMs Framework: PyTorch, PTMs: UniLM v1&v2, MiniLM, LayoutLM, etc. https://github.com/microsoft/unilm
Chinese-BERT [29] Framework: PyTorch&TF, PTMs: BERT, RoBERTa, etc. (for Chinese) https://github.com/ymcui/Chinese-BERT-wwm
BERT [36] Framework: TF, PTMs: BERT, BERT-wwm https://github.com/google-research/bert
RoBERTa [117] Framework: PyTorch https://github.com/pytorch/fairseq/tree/master/examples/roberta
XLNet [209] Framework: TF https://github.com/zihangdai/xlnet/
ALBERT [93] Framework: TF https://github.com/google-research/ALBERT
T5 [144] Framework: TF https://github.com/google-research/text-to-text-transfer-transformer
ERNIE(Baidu) [170, 171] Framework: PaddlePaddle https://github.com/PaddlePaddle/ERNIE
CTRL [84] Conditional Transformer Language Model for Controllable Generation. https://github.com/salesforce/ctrl
BertViz [185] Visualization Tool https://github.com/jessevig/bertviz
exBERT [65] Visualization Tool https://github.com/bhoov/exbert
TextBrewer [210] PyTorch-based toolkit for distillation of NLP models. https://github.com/airaria/TextBrewer
DeepPavlov Conversational AI Library. PTMs for the Russian, Polish, Bulgarian, Czech, and informal English. https://github.com/deepmipt/DeepPavlov
Corpora
OpenWebText Open clone of OpenAI’s unreleased WebText dataset. https://github.com/jcpeterson/openwebtext
Common Crawl A very large collection of text. http://commoncrawl.org/
WikiEn English Wikipedia dumps. https://dumps.wikimedia.org/enwiki/
Other Resources
Paper List https://github.com/thunlp/PLMpapers
Paper List https://github.com/tomohideshibata/BERT-related-papers
Paper List https://github.com/cedrickchee/awesome-bert-nlp
Bert Lang Street A collection of BERT models with reported performances on different datasets, tasks and languages. https://bertlang.unibocconi.it/
§ Most PTM papers release links to their official implementations. Here we list some popular third-party and official implementations.
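As a brief illustration of how the open-source implementations in Table 5 are typically used, the following is a minimal sketch with the Transformers library; the checkpoint name is only an example of a publicly released English BERT model, not a recommendation from this survey.

```python
# Minimal sketch: loading a pre-trained model with the Transformers
# library listed in Table 5. The checkpoint name is an assumption
# chosen for illustration.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Pre-trained models for natural language processing.",
                   return_tensors="pt")
outputs = model(**inputs)

# last_hidden_state holds one contextual vector per input token,
# which can be fed to a task-specific head for fine-tuning.
print(outputs.last_hidden_state.shape)
```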
However, since progress in recent years has dramatically eroded the headroom on the GLUE benchmark, a new benchmark called SuperGLUE [189] was presented. Compared to GLUE, SuperGLUE includes more challenging tasks and more diverse task formats (e.g., coreference resolution and question answering).
State-of-the-art PTMs are listed on the corresponding leaderboards 4) 5).
QA tasks range from single-round extractive QA and multi-round generative QA to multi-hop QA (HotpotQA) [208].
BERT creatively casts extractive QA as a span prediction task: the model predicts the start and end positions of the answer span in the passage [36]. Since then, using a PTM as the encoder for span prediction has become a competitive baseline.
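As a minimal sketch of this span-prediction formulation, the snippet below uses the Transformers library from Table 5; the fine-tuned checkpoint and the question/context are illustrative assumptions, not the exact setup of the cited works.

```python
# Minimal sketch: extractive QA as span prediction with a PTM encoder.
# The checkpoint name and inputs below are assumptions for illustration.
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

name = "bert-large-uncased-whole-word-masking-finetuned-squad"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForQuestionAnswering.from_pretrained(name)

question = "What does the model predict?"
context = ("For extractive QA, the model predicts the start and end "
           "positions of the answer span.")

inputs = tokenizer(question, context, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One logit per token for the answer start and one for the answer end;
# the argmax positions delimit the predicted answer span.
start = int(torch.argmax(outputs.start_logits))
end = int(torch.argmax(outputs.end_logits))
answer = tokenizer.decode(inputs["input_ids"][0][start:end + 1])
print(answer)
```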
For extractive QA, Zhang et al. [215] proposed a retrospective reader architecture and initialized the encoder with a PTM (e.g., ALBERT). For multi-round generative QA, Ju