Paper Review 2017-11-10: QRNN (Quasi-Recurrent Neural Networks)

James Bradbury∗, Stephen Merity∗, Caiming Xiong & Richard Socher
Salesforce Research, Palo Alto, California | arXiv:1611.01576v2 [cs.NE] 21 Nov 2016 | accepted at ICLR 2017 | Paper review: QRNN: QUASI-RECURRENT NEURAL NETWORKS

Abstract - QRNN = RNN processing like CNN - can process sequential data in parallel - up to 16 times faster than LSTM in train/test - can make visual analysis of weights easy

Outline - Introduction - review of RNN/LSTM - Model (QRNN)
- Variants - Results - sentiment classification - language modeling - character-level machine translation - Conclusion - Reference

Introduction (review of RNN) - the standard model architecture for deep learning approaches to sequence modeling tasks - sentence classification | word- and character-level language modeling | machine translation | question answering | image captioning | time series forecasting

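Introduction (review of RNN) - a network with a loop (recurrent) architecture - unrolled over time, an RNN is very deep (causing vanishing gradients) - figure labels: input = word2vec(“私”) or yesterday's stock price; output = (“の”: 0.2, “は”: 0.3, ...) or the predicted value of today's stock price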
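To make the depth argument concrete, here is a minimal sketch with a hypothetical scalar RNN. The weights `w` and `u`, the tanh nonlinearity, and the function names are assumptions chosen for illustration and do not come from the slides. It shows that each step needs the hidden state of the step before it, and that the gradient with respect to the initial state is a product of one factor per time step.

```python
import math

def rnn_forward(xs, w=0.5, u=1.0, h0=0.0):
    """Scalar RNN: h[t] = tanh(w * h[t-1] + u * x[t]). Returns every hidden state."""
    hs, h = [], h0
    for x in xs:  # the loop: step t cannot start before step t-1 has finished
        h = math.tanh(w * h + u * x)
        hs.append(h)
    return hs

def grad_wrt_initial_state(hs, w=0.5):
    """d h[T] / d h0 by the chain rule: product of w * (1 - h[t]^2) over all steps."""
    g = 1.0
    for h in hs:
        g *= w * (1.0 - h * h)
    return g

short_grad = grad_wrt_initial_state(rnn_forward([0.1] * 5))
long_grad = grad_wrt_initial_state(rnn_forward([0.1] * 50))
print(short_grad, long_grad)  # the 50-step gradient is many orders of magnitude smaller
```

With |w| < 1 every factor is below 1 in magnitude, so the product shrinks geometrically with sequence length. This is the vanishing-gradient effect in its simplest form.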

Introduction (review of RNN) - problem: not good at learning very long sequences - e.g. document classification | character-level tasks - why?: each time step depends on the previous one, so sequential data cannot be processed in parallel

Introduction (review of LSTM) - LSTM addresses vanishing gradients by using a memory cell - LSTM has 3 gates to control information flow

Introduction (review of LSTM) - forget gate to control long-term
information (in memory cell c)

Introduction (review of LSTM) - input gate to control current and short-term information (in x and h(t-1))

Introduction (review of LSTM) - update the memory cell, mixing the current candidate with the previous memory cell

Introduction (review of LSTM) - output gate to control how much of the current hidden-state information is passed to the next layer
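For reference, the gate descriptions on the LSTM review slides correspond to the standard LSTM update, written out below. The symbols W, U, b (input weights, recurrent weights, biases) and the candidate notation \tilde{c}_t are conventional choices assumed here; the slides themselves only name x, h(t-1) and c.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{forget gate} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{input gate} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{candidate} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{memory cell update} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{output gate} \\
h_t &= o_t \odot \tanh(c_t) && \text{hidden state}
\end{aligned}
```

Every gate depends on h_{t-1} through a matrix product, which is why the time steps have to be computed one after another.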

Introduction (variants of LSTM) - using a forget gate instead of an input gate
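One way to read this variant is the coupled-gate form, in which the input gate is replaced by (1 - f). That reading is an assumption, since the slide gives no formula. A minimal sketch on plain Python lists (the function name is hypothetical):

```python
def coupled_cell_update(c_prev, f, candidate):
    """Element-wise c[t] = f * c[t-1] + (1 - f) * candidate.

    The single gate f decides both how much old memory is kept
    and how much new candidate information is written.
    """
    return [fi * ci + (1.0 - fi) * zi for fi, ci, zi in zip(f, c_prev, candidate)]

# f = 1 keeps the old cell value, f = 0 overwrites it with the candidate.
print(coupled_cell_update([1.0, 2.0], [1.0, 0.0], [5.0, 7.0]))
```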

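Model (convolution component) - vocabulary: “ズン”, “ドコ”, “きよし” - ( 1, 0, 0 ) = “ズン” - this example uses one-hot vectors, but in practice a better encoding such as word2vec is used - input sequence: ズン, ズン, ズン, ドコ, きよし - predicting the value at time t must not use data from the future time t+1, so a masked convolution is used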
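The masking idea can be sketched as a causal 1-D convolution: left-pad the sequence with k-1 zeros so the filter only ever covers the current and earlier positions. This is a scalar toy version with a hypothetical function name; a real QRNN layer convolves over embedding channels and produces several gate sequences at once.

```python
def masked_conv1d(xs, weights):
    """Causal 1-D convolution over a scalar sequence.

    weights[j] multiplies x[t - (k - 1) + j], so weights[-1] is applied to x[t]
    and no weight ever touches x[t+1] or later.
    """
    k = len(weights)
    padded = [0.0] * (k - 1) + list(xs)  # left-pad with k-1 zeros
    return [
        sum(w * padded[t + j] for j, w in enumerate(weights))
        for t in range(len(xs))
    ]

out = masked_conv1d([1.0, 2.0, 3.0, 4.0], [0.5, 1.0])
changed_future = masked_conv1d([1.0, 2.0, 99.0, 4.0], [0.5, 1.0])
print(out[:2] == changed_future[:2])  # outputs before the changed position are identical
```

Because each output position depends only on a fixed window of inputs, all positions can be computed independently, which is what allows parallel execution across time steps.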

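Model - instead of using the previous time step's hidden state h[t-1], which was the bottleneck, the gates are computed from the inputs at earlier time steps x[t-1], x[t-2], ..., which makes parallel processing possible
Model (pooling component) LSTM - in addition, the hidden state h is passed along without being multiplied by a weight matrix, so information from different elements does not get mixed together, which makes visualization easier - this part is computed sequentially, as in a conventional LSTM, but it takes relatively little time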
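The pooling step can be sketched as follows. This assumes the fo-pooling form from the QRNN paper, with the candidate z, forget gate f and output gate o already produced by the convolution. The function name and the list-of-lists layout (T time steps, n channels each) are illustrative choices, not the official implementation.

```python
def fo_pooling(z, f, o, c0=None):
    """Sequential fo-pooling.

    c[t] = f[t] * c[t-1] + (1 - f[t]) * z[t]   (element-wise)
    h[t] = o[t] * c[t]                         (element-wise)
    """
    n = len(z[0])
    c = list(c0) if c0 is not None else [0.0] * n
    hs = []
    for z_t, f_t, o_t in zip(z, f, o):  # only this cheap loop is sequential
        c = [ft * ci + (1.0 - ft) * zt for ft, ci, zt in zip(f_t, c, z_t)]
        hs.append([ot * ci for ot, ci in zip(o_t, c)])
    return hs

z = [[1.0, 2.0], [3.0, 4.0]]
f = [[0.0, 0.5], [0.5, 1.0]]
o = [[1.0, 1.0], [1.0, 0.5]]
print(fo_pooling(z, f, o))
```

There is no matrix product inside the loop: channel i of the state only ever interacts with channel i of the gates. That independence is why individual hidden units are easier to interpret, and why the sequential part is inexpensive.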

Model (pooling component) other pooling types (not used in this paper?) - f-pooling - ifo-pooling
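For comparison, the pooling variants can be written side by side. The formulas below follow the definitions given in the QRNN paper as best recalled; z is the candidate and f, i, o are the forget, input and output gates, all produced by the convolution.

```latex
\begin{aligned}
\text{f-pooling:}   \quad & h_t = f_t \odot h_{t-1} + (1 - f_t) \odot z_t \\
\text{fo-pooling:}  \quad & c_t = f_t \odot c_{t-1} + (1 - f_t) \odot z_t, \qquad h_t = o_t \odot c_t \\
\text{ifo-pooling:} \quad & c_t = f_t \odot c_{t-1} + i_t \odot z_t, \qquad h_t = o_t \odot c_t
\end{aligned}
```

The variants differ only in how many gates are used; in each case the recurrence is element-wise.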

Variants - Zoneout: Dropout for LSTM - skip-connection like DenseNet
- Attention for Encoder-Decoder
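As an illustration of the Zoneout variant, the sketch below forces a random subset of forget-gate values to 1, so those channels simply keep their previous state at that time step. This corresponds to F = 1 - dropout(1 - F) with no rescaling of the surviving values. The function name, the probability `p` and the `rng` argument are hypothetical.

```python
import random

def zoneout_forget_gate(f, p, rng=random):
    """Set each forget-gate value to 1.0 with probability p.

    With f = 1 the pooling update c[t] = f * c[t-1] + (1 - f) * z[t]
    reduces to c[t] = c[t-1], i.e. the channel is "zoned out".
    """
    return [1.0 if rng.random() < p else fi for fi in f]

rng = random.Random(0)  # seeded for reproducibility
gated = zoneout_forget_gate([0.2] * 10, p=0.3, rng=rng)
print(gated)
```

Zoneout is a training-time regularizer; at test time the gates would be used unchanged.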

Experiments - Sentiment Classification (document binary-classification) - IMDb movie review - 25,000 positive/negative reviews - Language Modeling (word-level prediction) - PTB: Penn Treebank - Character-level Machine Translation - IWSLT English-German spoken language translation task

Results (sentiment classification) - best suited to small batch_size and long seq_len (up to 16x faster) - training is 3x faster

Results (sentiment classification) final layer’s hidden state

Results (language modeling)

Results (character-level machine translation) BLEU: higher is better http://unicorn.ike.tottori-u.ac.jp/2010/s072046/paper/graduation-thesis/node32.html

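Discussion - the QRNN probably lost slightly to the LSTM in accuracy because it approximates the hidden state h[t-1] with the recent inputs x[t-1], x[t-2], ... instead of using h[t-1] itself - when the hidden state is approximated from the inputs, the approximation would become exact if the filter size k over past time steps were extended to infinity (on the sentiment classification task, accuracy improved when k was increased) - so increasing the filter size should help, but the question is how much the computation speed drops as a result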
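As a rough way to frame the speed question, the sketch below counts the multiply-adds in the convolution component. It assumes three gate sequences (z, f, o), each produced by a width-k filter bank from `n_in` input channels to `n_hidden` output channels. The function and parameter names are hypothetical, and this counts arithmetic only; measured wall-clock time on a GPU need not scale the same way.

```python
def conv_multiply_adds(seq_len, k, n_in, n_hidden, n_gates=3):
    """Multiply-adds for the masked convolutions of one QRNN layer, one sequence."""
    return n_gates * seq_len * k * n_in * n_hidden

base = conv_multiply_adds(seq_len=100, k=2, n_in=256, n_hidden=256)
wider = conv_multiply_adds(seq_len=100, k=4, n_in=256, n_hidden=256)
print(base, wider, wider / base)
```

Under these assumptions the arithmetic cost grows linearly in k, while the work stays parallel across time steps.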

Conclusion - QRNN = RNN processing like CNN - can process sequential data in parallel - up to 16x faster than LSTM in train/test - can make visual analysis of weights easy

Reference
- LSTM: Overview of LSTM networks https://qiita.com/KojiOhki/items/89cd7b69a8a6239d67ca
- LSTM: Understanding LSTM, along with recent trends https://qiita.com/KojiOhki/items/89cd7b69a8a6239d67ca
- LSTM: Neural network study group http://isw3.naist.jp/~neubig/student/2015/seitaro-s/161025neuralnet_study_LSTM.pdf
- 3D diagrams of convolutions: Tinkercad https://www.tinkercad.com/
- QRNN: QRNN, a promising newcomer that surpasses LSTM https://qiita.com/icoxfog417/items/d77912e10a7c60ae680e
- QRNN: slideshare https://www.slideshare.net/DeepLearningJP2016/dlquasirecurrent-neural-networks?qid=a4ead77d-d8dd-458b-965c-5e53723d7757&v=&b=&from_search=1
- QRNN: official PyTorch implementation https://github.com/salesforce/pytorch-qrnn/blob/master/torchqrnn/qrnn.py

Deck by hrsma2i
