
2019 Paper Reading Group: Language Modeling with Shared Grammar


Ikumi Yamashita

December 11, 2019

Transcript

  1. Language Modeling with Shared Grammar (Yuyu Zhang, Le Song, ACL 2019)

    Presenter: Ikumi Yamashita (TMU B4, Komachi Lab), 2019/12/11 @ Paper Reading Group 2019
  2. Overview (contributions)
    • Grammar-sharing framework (Neural Variational Language Model: NVLM)
      Ø With the shared grammar, the framework lets a language model transfer to a new corpus efficiently, with better performance and shorter training time.
    • End-to-end learning
      Ø NVLM can be trained end-to-end without syntactic annotation.
    • Efficient software package
      Ø NVLM's parser can parse one million sentences per hour on a single GPU.
  3. Introduction
    Issues of existing language models
    • Models that focus on the latent structure of natural language, such as RNNGs and PRPNs, require syntactic annotations.
      Ø Accurate syntactic annotation is very costly.
      Ø Treebank data is typically small-scale and not open to the public.
    • RNN language models perform poorly when trained and tested on different datasets.
    • Training from scratch on every new corpus is clearly not good enough.
      Ø It is computationally expensive and not data-efficient.
      Ø The target corpus may be too small.
    To bridge the gap in language modeling across different corpora, grammar is the key, since all corpora are in the same language and should share the same grammar.
  4. Constituency parser
    • RNN encoder, RNN decoder (They used separate weights and word embeddings for the encoder and the decoder.)
    • Parse tree is linearized as a bracket representation
      Example: (S (NP NNP ) (VP VBZ (NP DT NN ) ) . )
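
A minimal sketch (not the authors' code) of the bracket linearization shown above, assuming the tree is given as nested tuples with POS tags as leaves; as in the slide's example, the linearized target contains tags and brackets but not the words.

    def linearize(tree):
        """Turn ('S', ('NP', 'NNP'), ...) into tokens like '(S', '(NP', 'NNP', ')', ...."""
        if isinstance(tree, str):          # a leaf, i.e. a POS tag such as 'NNP'
            return [tree]
        label, *children = tree
        tokens = [f"({label}"]
        for child in children:
            tokens.extend(linearize(child))
        tokens.append(")")
        return tokens

    tree = ("S", ("NP", "NNP"), ("VP", "VBZ", ("NP", "DT", "NN")), ".")
    print(" ".join(linearize(tree)))
    # (S (NP NNP ) (VP VBZ (NP DT NN ) ) . )
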
  5. Joint generative model
    • Uses an RNN model
    • The mixed tree is linearized as a bracket representation (z is the mixed parse tree of x and y)
    • Uses a sentence-word attaching algorithm
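
A hedged sketch of the word-attaching idea: given the parser's bracketed tree over POS tags and the sentence's words, interleave the words into the linearized tree to obtain the mixed tree z. The exact placement rule (each word right after its POS tag) and the example sentence are assumptions for illustration.

    def attach_words(tree_tokens, words):
        """Insert each sentence word right after its POS tag in the bracketed tree."""
        mixed, word_iter = [], iter(words)
        for tok in tree_tokens:
            mixed.append(tok)
            # anything that is neither an opening bracket "(X" nor ")" is a POS tag
            if not tok.startswith("(") and tok != ")":
                mixed.append(next(word_iter))
        return mixed

    tree_tokens = "(S (NP NNP ) (VP VBZ (NP DT NN ) ) . )".split()
    words = ["John", "loves", "the", "cat", "."]
    print(" ".join(attach_words(tree_tokens, words)))
    # (S (NP NNP John ) (VP VBZ loves (NP DT the NN cat ) ) . . )
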
  6. Learning schemes
    • Supervised:
      Ø sentences (x) and their corresponding parse trees (y) are available
    • Distant-supervised:
      Ø a pre-trained parser and a new corpus without parsing annotations
    • Semi-supervised:
      Ø no parsing annotations available
  7. Learning schemes
    • Supervised:
      Ø sentences (x) and their corresponding parse trees (y) are available
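
A minimal illustration (an assumption based on slides 7 and 11, not the authors' code) of the supervised setting: with annotated (x, y) pairs, the parser and the joint generative model are each trained separately by maximum likelihood, i.e. by minimizing their own negative log-likelihoods.

    # Given per-example log-probabilities from the two (hypothetical) models:
    def parser_loss(log_q_y_given_x):
        return -log_q_y_given_x      # NLL of the gold parse y under the parser

    def generative_loss(log_p_x_y):
        return -log_p_x_y            # NLL of the pair (x, y) under the joint generative model

    print(parser_loss(-2.3), generative_loss(-5.1))  # 2.3 5.1
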
  8. Learning schemes
    • Distant-supervised:
      Ø a pre-trained parser and a new corpus without parsing annotations are available
    Fix the parser and train the joint generative model on the new corpus. The generative model can either be trained from scratch on the new corpus or warmed up on the annotated training data.
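
A minimal PyTorch sketch of the distant-supervised recipe, under stated assumptions: the two GRUs, the tensor shapes, and the squared-output loss are placeholders standing in for the real parser, generative model, and NLL; only the freeze-the-parser, update-the-generative-model pattern is the point.

    import torch
    import torch.nn as nn

    # Hypothetical stand-ins for the two components (the real ones are the
    # seq2seq parser and the RNN over linearized mixed trees).
    parser = nn.GRU(input_size=32, hidden_size=64, batch_first=True)     # pre-trained, e.g. on PTB
    gen_model = nn.GRU(input_size=64, hidden_size=64, batch_first=True)  # to be trained on the new corpus

    # Freeze the parser so the shared grammar is reused rather than overwritten.
    for p in parser.parameters():
        p.requires_grad_(False)

    # Only the generative model's parameters are optimized.
    optimizer = torch.optim.Adam(gen_model.parameters(), lr=1e-3)

    x = torch.randn(8, 10, 32)      # dummy batch standing in for encoded sentences
    with torch.no_grad():
        parses, _ = parser(x)       # parse the unannotated corpus with the fixed parser
    out, _ = gen_model(parses)
    loss = out.pow(2).mean()        # placeholder for the generative NLL of the mixed tree
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
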
  9. Learning schemes
    • Semi-supervised:
      Ø no parsing annotations available
    Unlike distant-supervised learning, the parser and the joint generative model are trained together, co-updating the parameters of both components. Unfortunately, the derivative of the marginal likelihood is computationally intractable due to the large space of parse trees.
  10. Learning schemes
    • Semi-supervised:
      Ø no parsing annotations available
    [Equations on the slide: variational EM, the policy gradient algorithm, and a stabilize-and-standardize step]
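
A rough sketch of how the policy-gradient update on slide 10 might look, assuming the parser is treated as a policy whose reward for a sampled parse is the generative model's log-likelihood, with rewards standardized across the batch (one plausible reading of the "stabilize and standardize" step). Names and shapes are illustrative.

    import torch

    def parser_policy_gradient_loss(log_q_y_given_x, log_p_x_y):
        """log_q_y_given_x: parser log-probs of sampled parses, shape (batch,)
           log_p_x_y:       generative log-likelihoods used as rewards, shape (batch,)"""
        reward = log_p_x_y.detach()
        # standardize rewards within the batch to reduce gradient variance
        reward = (reward - reward.mean()) / (reward.std() + 1e-8)
        # REINFORCE-style objective: raise the probability of highly rewarded parses
        return -(reward * log_q_y_given_x).mean()

    # toy usage with random numbers standing in for real model outputs
    log_q = torch.randn(16, requires_grad=True)
    log_p = torch.randn(16)
    parser_policy_gradient_loss(log_q, log_p).backward()
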
  11. Experiment settings
    • Datasets: Penn Treebank (PTB), One Billion Word Benchmark (OBWB)
    • Tasks:
      Ø Supervised learning (training both the parser and the joint generative model separately on PTB)
      Ø Distant-supervised learning (pre-training the parser on PTB, then fixing the parser while training the joint generative model on OBWB)
      Ø Semi-supervised learning (training the parser and the generative model together on OBWB)
    • Evaluation: per-word perplexity
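
For reference, a small sketch of the per-word perplexity metric: the exponential of the average negative log-likelihood per word over the test set (the corpus totals below are made-up numbers).

    import math

    def per_word_perplexity(total_neg_log_likelihood, num_words):
        """total_neg_log_likelihood: sum of -log p(w_i | context) over the corpus, in nats."""
        return math.exp(total_neg_log_likelihood / num_words)

    print(per_word_perplexity(total_neg_log_likelihood=4605.17, num_words=1000))  # ~100.0
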
  12. Experiment results 1
    • Supervised learning
      [Tables on the slide: test perplexity on PTB, test parsing performance, and a comparison of parser training and testing speed]
  13. Experiment results 2
    • Distant-supervised learning
      scratch: training the joint generative model on OBWB from scratch
      warmed: training the joint generative model on OBWB starting from a PTB warmed-up model
      [Figures on the slide: test perplexity curves and test perplexity on the subsampled OBWB dataset]
  14. Experiment results 3
    • Distant-supervised learning results
      scratch: training the joint generative model on OBWB from scratch
      warmed: training the joint generative model on OBWB starting from a PTB warmed-up model
    • Semi-supervised learning results
      fine-tuned: training the parser and the generative model together
  15. Summary
    • NVLM: a framework for grammar-sharing language modeling
      Ø Parser
      Ø Joint generative model
    • Grammar knowledge helps a language model quickly adapt to a new corpus
    • Algorithms for jointly training the two components to fine-tune the language model on a new corpus without parsing annotations
      Ø Convergence speed and perplexity on the new corpus are improved