Recurrent neural network based language model

Tomas Mikolov, Martin Karafiat, Lukas Burget, Jan Cernocky, and Sanjeev Khudanpur. Recurrent neural network based language model. In 11th Annual Conference of the International Speech Communication Association, pp. 1045–1048, 2010.

Written by Nguyen Van Hai.

Transcript

  1. Paper Introduction, Tuesday, May 23, 2017
    Recurrent Neural Network
    based Language Model
    Nagaoka University of Technology
    Natural Language Processing Laboratory, 2nd-year Master's student
    NGUYEN VAN HAI

  2. Information
    Tomas Mikolov, Martin Karafiat, Lukas Burget,
    Jan Cernocky, and Sanjeev Khudanpur
    Recurrent neural network based language model
    In 11th Annual Conference of the International
    Speech Communication Association, pp. 1045–1048, 2010

  3. 1. Introduction
    • Statistical language modeling:
    • Predict the next word in textual data
    • Special language domain:
    • Sentences must be described by parse trees
    • Morphology of words, syntax and semantics
    • There has been some significant progress in language modeling
    • Measured by the ability of models to better predict sequential data (see the perplexity definition below)
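
(Not on the slide: the predictive ability mentioned above is conventionally measured with perplexity; the standard definition is)

```latex
\mathrm{PPL} = \exp\!\left(-\frac{1}{N}\sum_{i=1}^{N}\ln P\!\left(w_i \mid w_1,\dots,w_{i-1}\right)\right)
```

Lower perplexity means the model assigns higher probability to held-out text.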

  4. 2. Model Description
    • Simple Recurrent Neural Network
    • Optimization

  5. 2.1 Simple Recurrent Neural
    Network
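
The architecture figure on this slide is not captured in the transcript. As a rough reconstruction from the cited paper, the simple (Elman) recurrent network computes, for the 1-of-N coded current word w(t) and the previous hidden state s(t-1):

```latex
x(t)   = w(t) + s(t-1)
s_j(t) = f\Big(\sum_i x_i(t)\,u_{ji}\Big), \qquad f(z)   = \frac{1}{1+e^{-z}}
y_k(t) = g\Big(\sum_j s_j(t)\,v_{kj}\Big), \qquad g(z_m) = \frac{e^{z_m}}{\sum_k e^{z_k}}
```

Here y(t) is the predicted probability distribution over the next word.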

  6. 2.1 Simple Recurrent Neural
    Network
    • The network is trained in several epochs
    • Weights are initialized to small values
    • The network is trained by the standard backpropagation
    algorithm with stochastic gradient descent
    • Error vector: computed by the cross-entropy criterion as
    error(t) = desired(t) - y(t), where desired(t) is the 1-of-N
    coding of the target word (a minimal training sketch follows)
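
Below is a minimal sketch of one such training step, assuming 1-of-N input coding, a sigmoid hidden layer and a softmax output; the weight names (U, W, Vo), sizes and learning rate are illustrative, not taken from the slides.

```python
# Sketch of one SGD step for a simple (Elman) recurrent network LM.
# Assumed setup: 1-of-N input coding, sigmoid hidden layer, softmax output.
import numpy as np

V_SIZE, HIDDEN = 10000, 90              # vocabulary size, hidden layer size
rng = np.random.default_rng(0)

# weights initialized to small random values
U  = rng.normal(0, 0.1, (HIDDEN, V_SIZE))   # input  -> hidden
W  = rng.normal(0, 0.1, (HIDDEN, HIDDEN))   # hidden -> hidden (recurrent)
Vo = rng.normal(0, 0.1, (V_SIZE, HIDDEN))   # hidden -> output

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def train_step(w_t, w_next, s_prev, lr=0.1):
    """One stochastic-gradient update on a (current word, next word) pair."""
    # forward pass: hidden state mixes the current word and the previous state
    s_t = sigmoid(U[:, w_t] + W @ s_prev)
    y_t = softmax(Vo @ s_t)

    # error vector at the output: desired 1-of-N vector minus prediction
    desired = np.zeros(V_SIZE)
    desired[w_next] = 1.0
    err_out = desired - y_t

    # backpropagate one step through the sigmoid hidden layer (no BPTT)
    err_hid = (Vo.T @ err_out) * s_t * (1.0 - s_t)

    # stochastic gradient descent updates
    Vo += lr * np.outer(err_out, s_t)
    W  += lr * np.outer(err_hid, s_prev)
    U[:, w_t] += lr * err_hid
    return s_t

# usage over a toy sequence of word ids
s = np.zeros(HIDDEN)
for cur, nxt in zip([1, 5, 7], [5, 7, 2]):
    s = train_step(cur, nxt, s)
```

The output error err_out is the cross-entropy gradient at the softmax layer; backpropagation here goes only one step back in time, not through the full history.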

  7. 2.2 Optimization
    • Word-probabilities:
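
The probability formula itself is an image not captured in the transcript. As a reconstruction from the cited paper, words occurring less often than a threshold are merged into a single rare token, and the word probability is then assigned as

```latex
P\!\left(w_{i+1} \mid w(t), s(t-1)\right) =
\begin{cases}
\dfrac{y_{\text{rare}}(t)}{C_{\text{rare}}} & \text{if } w_{i+1} \text{ is rare,}\\[2mm]
y_{w_{i+1}}(t) & \text{otherwise,}
\end{cases}
```

where C_rare is the number of vocabulary words that occur less often than the threshold.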

  8. 3. Experiments
    • Wall Street Journal (WSJ) Experiments
    • NIST Rich Transcription Evaluation 2005 (RT05)
    Experiments

  9. 3.1 WSJ Experiments
    • Training corpus
    • 37M words from the NYT section of English Gigaword
    • Training data: 6.4M words (300K sentences)
    • Perplexity evaluated on 230K words
    • Kneser-Ney smoothed 5-gram baseline, denoted KN5
    • RNN 90/2
    • Hidden layer size is 90
    • Threshold for merging words into the rare token is 2 (see the sketch below)
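
A small illustrative sketch of the rare-word merging behind the "90/2" notation; the function and token names below are hypothetical, not from the paper.

```python
# Map every word seen fewer times than `threshold` to a single <rare> token.
from collections import Counter

def build_vocab_map(tokens, threshold=2):
    counts = Counter(tokens)
    return {w: (w if c >= threshold else "<rare>") for w, c in counts.items()}

corpus = "the cat sat on the mat while the dog sat outside".split()
vocab_map = build_vocab_map(corpus, threshold=2)
mapped = [vocab_map[w] for w in corpus]
# words occurring once ("cat", "on", "mat", ...) are replaced by "<rare>";
# "the" and "sat" occur at least twice and keep their identity
```

With threshold 2, every word that appears only once in the training text shares the single rare token, which shrinks the output layer of the network.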

  10. 3.1 WSJ Experiments

  11. 3.1 WSJ Experiments

  12. 3.1 WSJ Experiments

  13. 3.2 NIST RT05 Experiments

  14. Conclusion and future work
    • On WSJ, WER reduction:
    • Around 18% with the same training data
    • Around 12% when the backoff model is trained on 5 times
    more data than the RNN model
    • On NIST RT05, the RNN model can outperform big backoff models
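
For reference (not from the slide), word error rate is computed from the substitutions S, deletions D and insertions I of the best alignment against a reference transcript of N words:

```latex
\mathrm{WER} = \frac{S + D + I}{N}
```

The percentages above are relative reductions of this rate compared to the backoff baseline.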