• Neural variational language model (NVLM)
Ø With the shared grammar, the framework helps the language model transfer to a new corpus efficiently, with better performance in less time (see the structural sketch below).
• End-to-end learning
Ø NVLM can be trained end-to-end without syntactic annotation.
• Efficient software package
Ø NVLM's parser can parse one million sentences per hour on a single GPU.
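The framework pairs a parser with a joint generative model (detailed in the later slides). The skeleton below is only an illustrative sketch of that two-component structure; it assumes PyTorch, and the module names Parser, JointGenerativeModel, and NVLM are hypothetical placeholders, not names from the paper or its software package.

```python
import torch.nn as nn

class Parser(nn.Module):
    """Hypothetical stand-in for the parser: proposes a parse tree for a sentence."""
    def forward(self, sentence):
        raise NotImplementedError  # should return (tree, log_prob_of_tree)

class JointGenerativeModel(nn.Module):
    """Hypothetical stand-in for the joint generative model: scores a sentence
    together with a parse tree, i.e. a joint log-probability log p(x, t)."""
    def forward(self, sentence, tree):
        raise NotImplementedError  # should return a scalar log-probability

class NVLM(nn.Module):
    """Two-component language model: grammar knowledge lives in the parser,
    while the joint generative model handles word prediction."""
    def __init__(self, parser: Parser, generator: JointGenerativeModel):
        super().__init__()
        self.parser = parser        # trained on annotated data (e.g. a treebank)
        self.generator = generator  # trained jointly with, or on top of, the parser
```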
• Existing models that capture natural language latent structure, such as RNNGs and PRPNs, require syntactic annotations.
Ø Accurate syntactic annotation is very costly.
Ø Treebank data is typically small-scale and not open to the public.
• The RNN language model performs poorly when trained and tested on different datasets.
• Training from scratch on every new corpus is not good enough.
Ø Computationally expensive and not data-efficient.
Ø The target corpus may be too small.
• To bridge the gap of language modeling across different corpora, grammar is the key: all corpora are in the same language and should share the same grammar.
• Distant-supervised learning: a new corpus without parsing annotations is available.
Ø Fix the parser and train the joint generative model on the new corpus (see the sketch below).
Ø The generative model can be either trained from scratch on the new corpus or warmed up with a model pre-trained on the annotated training data.
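"Fixing the parser" can be realized by keeping the parser's parameters out of the optimizer and running it without gradients, so only the joint generative model is updated on the new corpus. The following is a minimal sketch under assumed interfaces (the parser returns a tree, the generator returns a joint log-probability); it is not the paper's released code.

```python
import torch

def distant_supervised_step(parser, generator, optimizer, batch):
    """One training step on the unannotated corpus with the parser held fixed.

    `optimizer` is assumed to be built over generator.parameters() only,
    e.g. torch.optim.Adam(generator.parameters()).
    """
    optimizer.zero_grad()
    loss = 0.0
    for sentence in batch:
        with torch.no_grad():                  # parser is fixed: no gradients flow into it
            tree, _ = parser(sentence)         # silver-standard parse of the new-corpus sentence
        log_joint = generator(sentence, tree)  # log p(sentence, tree) under the generative model
        loss = loss - log_joint                # maximize the joint log-likelihood
    loss = loss / len(batch)
    loss.backward()
    optimizer.step()
    return loss.item()
```

The generative model passed in can either be freshly initialized (trained from scratch) or initialized from the model warmed up on the annotated training data.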
• Semi-supervised learning
Ø Unlike distant-supervised learning, semi-supervised learning trains the parser and the joint generative model together, and co-updates the parser parameters φ and the generative-model parameters θ.
Ø Unfortunately, the derivative of the marginal log-likelihood log p_θ(x) = log Σ_t p_θ(x, t) is computationally intractable due to the large space of latent parse trees t (one common workaround is sketched below).
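Because summing over all parse trees is infeasible, the gradient has to be approximated. The sketch below illustrates one standard approximation, a score-function (REINFORCE-style) estimator over trees sampled from the parser; it is a simplified surrogate (no baseline, no parser-entropy term) and is not claimed to be the paper's exact estimator. The `parser.sample` interface is an assumption.

```python
import torch

def semi_supervised_step(parser, generator, opt_phi, opt_theta, batch, num_samples=1):
    """Co-update the parser (phi) and the generative model (theta) on unannotated text.

    The intractable sum over trees is replaced by trees sampled from the parser:
      theta gradient: maximize log p_theta(x, t) on sampled trees,
      phi gradient:   REINFORCE with reward log p_theta(x, t).
    """
    opt_phi.zero_grad()
    opt_theta.zero_grad()
    loss = 0.0
    for sentence in batch:
        for _ in range(num_samples):
            tree, log_q = parser.sample(sentence)   # assumed API: sampled tree and its log q_phi(t | x)
            log_joint = generator(sentence, tree)   # log p_theta(x, t)
            reward = log_joint.detach()             # treat the joint score as the reward for the parser
            # surrogate whose gradients give the two estimators described above
            loss = loss - (log_joint + reward * log_q) / num_samples
    (loss / len(batch)).backward()
    opt_phi.step()
    opt_theta.step()
```

In practice a baseline (e.g., a running average of the reward) is usually subtracted to reduce the variance of the parser's gradient.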
• Datasets: Penn Treebank (PTB) and the One Billion Word Benchmark (OBWB)
• Tasks:
Ø Supervised learning (separately training both the parser and the joint generative model on PTB)
Ø Distant-supervised learning (pre-training the parser on PTB, and fixing the parser to train the joint generative model on OBWB)
Ø Semi-supervised learning (training the parser and the generative model together on OBWB)
• Evaluation: per-word perplexity (computed as sketched below)
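Per-word perplexity is the exponential of the average negative log-likelihood per token. A minimal sketch of the computation, assuming each test sentence's total log-likelihood (in nats) and its word count are available:

```python
import math

def per_word_perplexity(sentence_log_likelihoods, word_counts):
    """perplexity = exp( - (sum of sentence log-likelihoods) / (total number of words) )"""
    total_ll = sum(sentence_log_likelihoods)   # natural-log likelihood of each test sentence
    total_words = sum(word_counts)             # number of words scored in each sentence
    return math.exp(-total_ll / total_words)

# Example with made-up numbers: two sentences, 10 and 12 words,
# with log-likelihoods of -35.2 and -41.8 nats.
print(per_word_perplexity([-35.2, -41.8], [10, 12]))  # exp(77.0 / 22) ≈ 33.1
```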
• Distant-supervised learning results
Ø scratch: training the joint generative model on OBWB from scratch
Ø warmed: training the joint generative model on OBWB with a PTB warmed-up model
[Plot: test perplexity curves on the subsampled OBWB dataset]
[Results: test perplexity on the subsampled OBWB dataset]
• Semi-supervised learning results
Ø scratch: training the joint generative model on OBWB from scratch
Ø warmed: training the joint generative model on OBWB with a PTB warmed-up model
Ø fine-tuned: training the parser and the generative model together
• Two components
Ø Parser
Ø Joint generative model
• Grammar knowledge helps the language model quickly adapt to a new corpus.
• Algorithms for jointly training the two components to fine-tune the language model on a new corpus without parsing annotations
Ø Convergence speed and perplexity on the new corpus are improved.