
2019 Paper Reading Group: Language Modeling with Shared Grammar


Ikumi Yamashita

December 11, 2019

Transcript

  1. Language Modeling with Shared Grammar (Yuyu Zhang, Le Song, ACL 2019)

    Presenter: Ikumi Yamashita (TMU B4, Komachi Lab), 2019/12/11 @ Paper Reading Group 2019
  2. Overview (contributions)
    • Grammar-sharing framework (Neural Variational Language Model: NVLM)
      Ø With the shared grammar, the framework lets a language model transfer to a new corpus efficiently, with better performance and shorter training time.
    • End-to-end learning
      Ø NVLM can be trained end-to-end without syntactic annotation.
    • Efficient software package
      Ø NVLM's parser can parse one million sentences per hour on a single GPU.
  3. Introduction
    Issues of existing language models
    • Models that focus on the latent structure of natural language, such as RNNGs and PRPNs, require syntactic annotations.
      Ø Accurate syntactic annotation is very costly.
      Ø Treebank data is typically small-scale and not open to the public.
    • RNN language models perform poorly when trained and tested on different datasets.
    • Training from scratch on every new corpus is clearly not good enough.
      Ø It is computationally expensive and not data-efficient.
      Ø The target corpus may be too small.
    To bridge the gap in language modeling across different corpora, grammar is the key, since all corpora are in the same language and should share the same grammar.
  4. Constituency parser
    • RNN encoder, RNN decoder (They used separate weights and word embeddings for the encoder and the decoder.)
    • Parse tree is linearized as a bracket representation
      Example: (S (NP NNP ) (VP VBZ (NP DT NN ) ) . )
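
A minimal sketch (not the authors' code) of the bracket linearization shown above, assuming the tree is given as nested tuples with POS tags as leaves; as in the slide's example, the linearized target contains tags and brackets but not the words.

    def linearize(tree):
        """Turn ('S', ('NP', 'NNP'), ...) into tokens like '(S', '(NP', 'NNP', ')', ...."""
        if isinstance(tree, str):          # a leaf, i.e. a POS tag such as 'NNP'
            return [tree]
        label, *children = tree
        tokens = [f"({label}"]
        for child in children:
            tokens.extend(linearize(child))
        tokens.append(")")
        return tokens

    tree = ("S", ("NP", "NNP"), ("VP", "VBZ", ("NP", "DT", "NN")), ".")
    print(" ".join(linearize(tree)))
    # (S (NP NNP ) (VP VBZ (NP DT NN ) ) . )
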
  5. Joint generative model
    • Uses an RNN model
    • The mixed tree is linearized as a bracket representation (z is the mixed parse tree of x and y)
    • Uses a sentence-word attaching algorithm
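
A hedged sketch of the word-attaching idea: given the parser's bracketed tree over POS tags and the sentence's words, interleave the words into the linearized tree to obtain the mixed tree z. The exact placement rule (each word right after its POS tag) and the example sentence are assumptions for illustration.

    def attach_words(tree_tokens, words):
        """Insert each sentence word right after its POS tag in the bracketed tree."""
        mixed, word_iter = [], iter(words)
        for tok in tree_tokens:
            mixed.append(tok)
            # anything that is neither an opening bracket "(X" nor ")" is a POS tag
            if not tok.startswith("(") and tok != ")":
                mixed.append(next(word_iter))
        return mixed

    tree_tokens = "(S (NP NNP ) (VP VBZ (NP DT NN ) ) . )".split()
    words = ["John", "loves", "the", "cat", "."]
    print(" ".join(attach_words(tree_tokens, words)))
    # (S (NP NNP John ) (VP VBZ loves (NP DT the NN cat ) ) . . )
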
  6. Learning schemes
    • Supervised:
      Ø sentences (x) and their corresponding parse trees (y) are available
    • Distant-supervised:
      Ø a pre-trained parser and a new corpus without parsing annotations
    • Semi-supervised:
      Ø no parsing annotations available
  7. Learning schemes
    • Supervised:
      Ø sentences (x) and their corresponding parse trees (y) are available
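
A minimal illustration (an assumption based on slides 7 and 11, not the authors' code) of the supervised setting: with annotated (x, y) pairs, the parser and the joint generative model are each trained separately by maximum likelihood, i.e. by minimizing their own negative log-likelihoods.

    # Given per-example log-probabilities from the two (hypothetical) models:
    def parser_loss(log_q_y_given_x):
        return -log_q_y_given_x      # NLL of the gold parse y under the parser

    def generative_loss(log_p_x_y):
        return -log_p_x_y            # NLL of the pair (x, y) under the joint generative model

    print(parser_loss(-2.3), generative_loss(-5.1))  # 2.3 5.1
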
  8. Learning schemes
    • Distant-supervised:
      Ø a pre-trained parser and a new corpus without parsing annotations are available
    Fix the parser and train the joint generative model on the new corpus. The generative model can either be trained from scratch on the new corpus or warmed up on the annotated training data.
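
A minimal PyTorch sketch of the distant-supervised recipe, under stated assumptions: the two GRUs, the tensor shapes, and the squared-output loss are placeholders standing in for the real parser, generative model, and NLL; only the freeze-the-parser, update-the-generative-model pattern is the point.

    import torch
    import torch.nn as nn

    # Hypothetical stand-ins for the two components (the real ones are the
    # seq2seq parser and the RNN over linearized mixed trees).
    parser = nn.GRU(input_size=32, hidden_size=64, batch_first=True)     # pre-trained, e.g. on PTB
    gen_model = nn.GRU(input_size=64, hidden_size=64, batch_first=True)  # to be trained on the new corpus

    # Freeze the parser so the shared grammar is reused rather than overwritten.
    for p in parser.parameters():
        p.requires_grad_(False)

    # Only the generative model's parameters are optimized.
    optimizer = torch.optim.Adam(gen_model.parameters(), lr=1e-3)

    x = torch.randn(8, 10, 32)      # dummy batch standing in for encoded sentences
    with torch.no_grad():
        parses, _ = parser(x)       # parse the unannotated corpus with the fixed parser
    out, _ = gen_model(parses)
    loss = out.pow(2).mean()        # placeholder for the generative NLL of the mixed tree
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
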
  9. Learning schemes
    • Semi-supervised:
      Ø no parsing annotations available
    Unlike distant-supervised learning, the parser and the joint generative model are trained together, co-updating the parameters of both components. Unfortunately, the derivative of the marginal likelihood is computationally intractable due to the large space of parse trees.
  10. Learning schemes
    • Semi-supervised:
      Ø no parsing annotations available
    [Equations on the slide: variational EM, the policy gradient algorithm, and a stabilize-and-standardize step]
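
A rough sketch of how the policy-gradient update on slide 10 might look, assuming the parser is treated as a policy whose reward for a sampled parse is the generative model's log-likelihood, with rewards standardized across the batch (one plausible reading of the "stabilize and standardize" step). Names and shapes are illustrative.

    import torch

    def parser_policy_gradient_loss(log_q_y_given_x, log_p_x_y):
        """log_q_y_given_x: parser log-probs of sampled parses, shape (batch,)
           log_p_x_y:       generative log-likelihoods used as rewards, shape (batch,)"""
        reward = log_p_x_y.detach()
        # standardize rewards within the batch to reduce gradient variance
        reward = (reward - reward.mean()) / (reward.std() + 1e-8)
        # REINFORCE-style objective: raise the probability of highly rewarded parses
        return -(reward * log_q_y_given_x).mean()

    # toy usage with random numbers standing in for real model outputs
    log_q = torch.randn(16, requires_grad=True)
    log_p = torch.randn(16)
    parser_policy_gradient_loss(log_q, log_p).backward()
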
  11. Experiment settings
    • Datasets: Penn Treebank (PTB), One Billion Word Benchmark (OBWB)
    • Tasks:
      Ø Supervised learning (training both the parser and the joint generative model separately on PTB)
      Ø Distant-supervised learning (pre-training the parser on PTB, then fixing the parser while training the joint generative model on OBWB)
      Ø Semi-supervised learning (training the parser and the generative model together on OBWB)
    • Evaluation: per-word perplexity
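
For reference, a small sketch of the per-word perplexity metric: the exponential of the average negative log-likelihood per word over the test set (the corpus totals below are made-up numbers).

    import math

    def per_word_perplexity(total_neg_log_likelihood, num_words):
        """total_neg_log_likelihood: sum of -log p(w_i | context) over the corpus, in nats."""
        return math.exp(total_neg_log_likelihood / num_words)

    print(per_word_perplexity(total_neg_log_likelihood=4605.17, num_words=1000))  # ~100.0
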
  12. Experiment results 1
    • Supervised learning
      [Tables on the slide: test perplexity on PTB, test parsing performance, and a comparison of parser training and testing speed]
  13. Experiment results 2
    • Distant-supervised learning
      scratch: training the joint generative model on OBWB from scratch
      warmed: training the joint generative model on OBWB starting from a PTB warmed-up model
      [Figures on the slide: test perplexity curves and test perplexity on the subsampled OBWB dataset]
  14. Experiment results 3
    • Distant-supervised learning results
      scratch: training the joint generative model on OBWB from scratch
      warmed: training the joint generative model on OBWB starting from a PTB warmed-up model
    • Semi-supervised learning results
      fine-tuned: training the parser and the generative model together
  15. Summary
    • NVLM: a framework for grammar-sharing language modeling
      Ø Parser
      Ø Joint generative model
    • Grammar knowledge helps a language model quickly adapt to a new corpus
    • Algorithms for jointly training the two components to fine-tune the language model on a new corpus without parsing annotations
      Ø Convergence speed and perplexity on the new corpus are improved