
English to Katakana with Sequence-to-Sequence Learning in Keras


A presentation at the TensorFlow Tokyo Meetup about building a Sequence-to-Sequence model in Keras to write Katakana.

This presentation is originally from a blog post:
https://wanasit.github.io/english-to-katakana-using-sequence-to-sequence-in-keras.html

Wanasit Tanakitrungruang

August 15, 2017


Transcript

1. About me • Search Quality @ Indeed • #1 Jobsearch Website • Information Extraction • e.g. Skills, Salary • Love working with Text/NLP • Lucene, RegEx, Text Search • Deep learning NLP
2. Indeed’s Tech Talk: “From Data to Deployment: Full Stack Data Science” • Wed, 16 Aug 2017, 19:00 - 21:00 (tomorrow) • Indeed Tokyo Tech Office, Ebisu Garden Place Tower (this building, 32F) • Register at Meetup/Doorkeeper, search “Indeed tech talk” • Free beer/pizza!!
3. Problems with TF’s tutorial • Issue #550: Translate.py - hard to detect convergence • Issue #600: Translate (Seq2Seq) Tutorial Expectations
4. DL Machine Translation • Requires large datasets • e.g. 20 GB of translation pairs • It’s unlikely you would find them in the real world • Requires long training time • Training Time >>> Writing Code • e.g. one-line change, then wait 12 hours • No fine-tuning • No pre-trained VGG, Inception, etc.
5. Why Katakana • A smaller machine translation problem • Requires a smaller dataset • Wikipedia title pairs (on my GitHub) • Requires a smaller model • 1-layer Seq2Seq without attention • a few hours on CPU
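Before any of the models below, each English/Katakana pair has to be turned into fixed-length sequences of character indices. A minimal preprocessing sketch under assumed names (the PAD/START indices, sequence lengths, and sample word lists are illustrative, not from the deck; the real lists would be read from the Wikipedia title-pair file on GitHub):

    import numpy as np

    PAD, START = 0, 1                         # assumed special indices: padding and start-of-sequence
    ENGLISH_LENGTH, KATAKANA_LENGTH = 20, 20  # assumed padded sequence lengths

    def build_dict(words):
        # Map every character to an integer index, reserving 0 and 1 for PAD / START
        chars = sorted({c for word in words for c in word})
        return {c: i + 2 for i, c in enumerate(chars)}

    def encode(word, char_dict, length):
        # Fixed-length, zero-padded sequence of character indices
        indices = [char_dict[c] for c in word[:length]]
        return indices + [PAD] * (length - len(indices))

    # Stand-in data; the real english_words / katakana_words come from the title pairs
    english_words = ['banana', 'coffee']
    katakana_words = ['バナナ', 'コーヒー']

    input_dict = build_dict(english_words)
    output_dict = build_dict(katakana_words)
    input_dict_size = len(input_dict) + 2     # + PAD and <START>
    output_dict_size = len(output_dict) + 2

    encoder_input_data = np.array([encode(w, input_dict, ENGLISH_LENGTH) for w in english_words])
    decoder_target_data = np.array([encode(w, output_dict, KATAKANA_LENGTH) for w in katakana_words])
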
6. Introduction to Sequence-to-Sequence • Recurrent Neural Network (RNN / LSTM), Encoder / Decoder • References ◦ Sequence-to-Sequence (paper) ◦ Understanding LSTM Networks ◦ The Unreasonable Effectiveness of Recurrent Neural Networks
7. RNN to summarize input: take the output after feeding in all of the input. (Diagram: the words “... not that terribly bad” are fed in one at a time, and the final output answers “Good?”)
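A minimal Keras sketch of this “summarize” pattern (sizes and variable names are illustrative, not from the deck): an LSTM reads the whole embedded sentence, only its final output is kept (return_sequences=False), and a Dense layer turns that summary into a “Good?” score.

    from keras.layers import Input, Embedding, LSTM, Dense
    from keras.models import Model

    MAX_WORDS, VOCAB_SIZE = 30, 10000                             # illustrative sizes

    review_input = Input(shape=(MAX_WORDS,))                      # integer-encoded words
    x = Embedding(VOCAB_SIZE, 64, mask_zero=True)(review_input)
    summary = LSTM(64, return_sequences=False)(x)                 # keep only the final output
    good_or_bad = Dense(1, activation='sigmoid')(summary)         # "Good?" probability

    classifier = Model(inputs=review_input, outputs=good_or_bad)
    classifier.compile(optimizer='adam', loss='binary_crossentropy')
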
8. RNN to generate output: keep sampling the next output from the RNN (language model). (Diagram: starting from <START>, the model writes “Hi , my name ...”)
9. RNN to generate output ++: feeding the previous output back in as input helps the RNN memorize what it has generated so far. (Diagram: <START>, “Hi”, “,”, “my”, ... are fed back in while generating “Hi , my name ...”)
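Slides 8-9 describe generation as a loop: sample the next token, append it to the sequence, and feed the growing sequence back in. A framework-agnostic sketch of that loop (predict_next stands in for whatever language model provides the next-token prediction):

    def generate(predict_next, start_token='<START>', end_token='<END>', max_len=20):
        # Keep sampling the next token and feed the previous outputs back in as input.
        sequence = [start_token]
        for _ in range(max_len):
            next_token = predict_next(sequence)   # e.g. "Hi", ",", "my", "name", ...
            if next_token == end_token:
                break
            sequence.append(next_token)           # previous output becomes part of the next input
        return sequence[1:]
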
10. Sequence-to-Sequence Model: use two RNNs to read the input and write the output, an Encoder and a Decoder. (Diagram: the Encoder reads English characters such as “B A N A N A”; the Decoder, starting from “<>”, writes Katakana characters such as “バ ナ ナ”.)
11. Sequence-to-Sequence Model in Keras • References ◦ Getting started with the Keras functional API ◦ Keras’s Recurrent Layers
12. Keras functional API: a 3-layer neural network in Keras

    from keras.layers import Input, Dense
    from keras.models import Model

    inputs = Input(shape=(784,))
    x = Dense(64, activation='relu')(inputs)
    x = Dense(64, activation='relu')(x)
    predictions = Dense(10, activation='softmax')(x)

    model = Model(inputs=inputs, outputs=predictions)
    model.compile(optimizer='sgd', loss='categorical_crossentropy')
    model.fit(data, labels)

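The snippet ends with model.fit(data, labels), but data and labels are not defined on the slide. A hypothetical usage example with random NumPy arrays, just to show the shapes this model expects:

    import numpy as np
    from keras.utils import to_categorical

    data = np.random.random((1000, 784))                                      # 1000 samples, 784 features each
    labels = to_categorical(np.random.randint(10, size=(1000,)), num_classes=10)  # one-hot class targets

    model.fit(data, labels, batch_size=32, epochs=5)
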
13. Sequence-to-Sequence Model (the same Encoder/Decoder diagram as before: the Encoder reads the English characters, the Decoder writes the Katakana characters)
14. Encoder in Keras

    encoder = Embedding(input_dict_size, 64,
                        input_length=ENGLISH_LENGTH,
                        mask_zero=True)(encoder_input)
    encoder = LSTM(units=64, return_sequences=False)(encoder)

(Diagram: encoder_input → Embedding → LSTM, character by character, ending in the encoder output)
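The slide does not show where encoder_input comes from; in the Keras functional API it would be an Input layer of integer-encoded English characters. A sketch, assuming the same names as above:

    from keras.layers import Input, Embedding, LSTM

    encoder_input = Input(shape=(ENGLISH_LENGTH,))   # padded sequence of English character indices
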
15. Decoder in Keras

    decoder = Embedding(output_dict_size, 64,
                        input_length=KATAKANA_LENGTH,
                        mask_zero=True)(decoder_input)
    decoder = LSTM(64, return_sequences=True)(decoder,
                                              initial_state=[encoder, encoder])
    decoder = TimeDistributed(
        Dense(output_dict_size, activation="softmax"))(decoder)

(Diagram: decoder_input → Embedding → LSTM initialized with the encoder output → a Dense softmax at every time step → decoder_output)
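Likewise, decoder_input would be an Input of integer-encoded Katakana characters, shifted one step so position t sees the character produced at t-1 (a sketch, same assumptions as above). Note that initial_state=[encoder, encoder] seeds both the decoder LSTM’s hidden state and its cell state with the encoder’s final output.

    from keras.layers import Input

    decoder_input = Input(shape=(KATAKANA_LENGTH,))  # <START> followed by the Katakana written so far
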
16. Combining them into a model

    encoder = Embedding(input_dict_size, 64,
                        input_length=ENGLISH_LENGTH, mask_zero=True)(encoder_input)
    encoder = LSTM(64, return_sequences=False)(encoder)

    decoder = Embedding(output_dict_size, 64,
                        input_length=KATAKANA_LENGTH, mask_zero=True)(decoder_input)
    decoder = LSTM(units=64, return_sequences=True)(decoder,
                                                    initial_state=[encoder, encoder])
    decoder = TimeDistributed(
        Dense(output_dict_size, activation="softmax"))(decoder)

    model = Model(inputs=[encoder_input, decoder_input], outputs=[decoder])
    model.compile(optimizer='adam', loss='binary_crossentropy')

    model.fit(x=[training_encoder_input, training_decoder_input],
              y=[training_decoder_output],
              batch_size=64, epochs=60)

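The trained model takes both the English input and the Katakana written so far, so generation is a loop: start the decoder input with only the start symbol, predict one character at a time, and copy each prediction back into the decoder input. A greedy-decoding sketch, assuming the training targets are the decoder input shifted one step left, with index 0 as padding and index 1 as the start symbol (assumptions, not shown in the deck):

    import numpy as np

    def to_katakana(english_indices):
        # english_indices: padded sequence of English character indices, length ENGLISH_LENGTH
        encoder_in = np.array([english_indices])                  # shape (1, ENGLISH_LENGTH)
        decoder_in = np.zeros((1, KATAKANA_LENGTH), dtype=int)
        decoder_in[0, 0] = 1                                      # <START>

        for i in range(1, KATAKANA_LENGTH):
            prediction = model.predict([encoder_in, decoder_in])  # (1, KATAKANA_LENGTH, output_dict_size)
            next_char = prediction[0, i - 1].argmax()             # most likely character at this step
            if next_char == 0:                                    # padding index: nothing left to write
                break
            decoder_in[0, i] = next_char                          # feed the prediction back in

        return decoder_in[0, 1:]                                  # generated Katakana character indices
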
17. Summary • Why Katakana? • Introduction to the Sequence-to-Sequence Model • Sequence-to-Sequence Model in Keras. Try training a machine to write Katakana!