Slide 1

English to Katakana with Sequence-to-Sequence in Keras
Wanasit T.

Slide 2

About me
• Search Quality @ Indeed
  • #1 Jobsearch Website
• Information Extraction
  • e.g. Skills, Salary
• Love working with Text/NLP
  • Lucene, RegEx, Text Search
  • Deep learning NLP

Slide 3

Indeed’s Tech Talk
“From Data to Deployment: Full Stack Data Science”
• Wed, 16 Aug 2017 19:00 - 21:00 (Tomorrow)
• Indeed Tokyo Tech Office, Ebisu Garden Place Tower (This building, 32F)
• Register at Meetup/Doorkeeper. Search “Indeed tech talk”
• Free beer/pizza !!

Slide 4

Outline
• Why Katakana?
• Introduction to Sequence-to-Sequence Learning
• Sequence-to-Sequence Learning in Keras

Slide 5

TensorFlow’s Seq2Seq Tutorial

Slide 6

Problems with TF’s tutorial
• Issue #550: Translate.py - hard to detect convergence
• Issue #600: Translate (Seq2Seq) Tutorial Expectations

Slide 7

DL Machine Translation
• Requires large datasets
  • e.g. 20 GB of translation pairs
  • It’s unlikely you would find them in the real world
• Requires long training time
  • Training time >>> writing code
  • e.g. a one-line change, then wait 12 hours
• No fine-tuning
  • No pre-trained VGG, Inception, etc.

Slide 8

Why Katakana?
• A smaller machine translation problem
• Requires a smaller dataset
  • Wikipedia title pairs (on my GitHub)
• Requires a smaller model
  • 1-layer Seq2Seq without attention
  • A few hours of training on CPU

Slide 9

Introduction to Sequence-to-Sequence
● Recurrent Neural Network (RNN / LSTM), Encoder / Decoder
● References
  ○ Sequence-to-Sequence (paper)
  ○ Understanding LSTM Networks
  ○ The Unreasonable Effectiveness of Recurrent Neural Networks

Slide 10

Recurrent Neural Network (RNN) (credit: colah’s blog, Understanding LSTM Networks)

Slide 11

Recurrent Neural Network (RNN) (credit: colah’s blog, Understanding LSTM Networks)

Slide 12

Recurrent Neural Network (RNN) (credit: Andrej Karpathy blog)

Slide 13

RNN to summarize input
Take the output after feeding in all of the input.
(Diagram: the words “not that terribly bad” are fed in one at a time, and only the final output is used to answer “... not that terribly bad” ~ Good?)
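
As a rough sketch of this “summarize with the last output” idea (not code from the slides; the classifier, vocab_size, and MAX_LENGTH below are illustrative assumptions), a many-to-one LSTM in Keras could look like:

from keras.layers import Input, Embedding, LSTM, Dense
from keras.models import Model

# Assumed sizes, purely for illustration
vocab_size = 10000
MAX_LENGTH = 20

review_input = Input(shape=(MAX_LENGTH,))
x = Embedding(vocab_size, 64, mask_zero=True)(review_input)
# return_sequences=False keeps only the output after the last word
x = LSTM(64, return_sequences=False)(x)
good_or_bad = Dense(1, activation='sigmoid')(x)  # e.g. "Good?" as a probability

classifier = Model(inputs=review_input, outputs=good_or_bad)
classifier.compile(optimizer='adam', loss='binary_crossentropy')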

Slide 14

RNN to generate output
Keep sampling the next output from the RNN (a language model).
(Diagram: the model emits “Hi , my name ...” one token at a time.)
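
A minimal sketch of that sampling loop (everything here is hypothetical: language_model is assumed to be a trained Keras model that returns a probability distribution over the next token at every position):

import numpy as np

def generate(language_model, start_token, end_token, max_length=20):
    # Start from a <start> token and repeatedly sample the next token
    sequence = [start_token]
    for _ in range(max_length):
        probabilities = language_model.predict(np.array([sequence]))[0, -1]
        next_token = int(np.random.choice(len(probabilities), p=probabilities))
        sequence.append(next_token)
        if next_token == end_token:
            break
    return sequence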

Slide 15

RNN to generate output ++
Feeding the previous output back in as input helps the RNN keep track of what it has already written.
(Diagram: after emitting “Hi , my”, those tokens are fed back in while generating “name ...”.)

Slide 16

Sequence-to-Sequence Model
Use two RNNs: an Encoder to read the input and a Decoder to write the output.
(Diagram: the Encoder reads the English characters “B A N A N A ...”; the Decoder, starting from a <start> symbol, writes the Katakana “バ ナ ナ ...”.)
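
To make the Keras code on the later slides concrete, here is a hedged sketch of how the training arrays (training_encoder_input, training_decoder_input, training_decoder_output) might be built from English/Katakana pairs; the character dictionaries, the <start> symbol, and the padding scheme are assumptions, not details from the slides:

import numpy as np

ENGLISH_LENGTH = 20     # assumed maximum sequence lengths
KATAKANA_LENGTH = 20
START = 1               # assume 0 is reserved for padding, 1 for the <start> symbol

def encode(word, char_to_index, length):
    # Map characters to integer codes and pad with zeros up to `length`
    codes = [char_to_index[c] for c in word[:length]]
    return codes + [0] * (length - len(codes))

def build_training_data(pairs, english_dict, katakana_dict):
    encoder_input, decoder_input, decoder_output = [], [], []
    for english, katakana in pairs:
        target = encode(katakana, katakana_dict, KATAKANA_LENGTH)
        encoder_input.append(encode(english, english_dict, ENGLISH_LENGTH))
        # Decoder input is the target shifted right, starting with <start>
        decoder_input.append([START] + target[:-1])
        # Decoder output is the target itself, one-hot encoded per character
        decoder_output.append(np.eye(len(katakana_dict))[target])
    return (np.array(encoder_input),
            np.array(decoder_input),
            np.array(decoder_output))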

Slide 17

Sequence-to-Sequence Model in Keras
● References
  ○ Getting started with the Keras functional API
  ○ Keras’s Recurrent Layers

Slide 18

Keras functional API
A 3-layer neural network in Keras:

from keras.layers import Input, Dense
from keras.models import Model

inputs = Input(shape=(784,))
x = Dense(64, activation='relu')(inputs)
x = Dense(64, activation='relu')(x)
predictions = Dense(10, activation='softmax')(x)

model = Model(inputs=inputs, outputs=predictions)
model.compile(optimizer='sgd', loss='categorical_crossentropy')
model.fit(data, labels)
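
In the snippet above (taken from the Keras documentation example), data and labels are not defined on the slide; any arrays of matching shape will do, for instance this purely illustrative dummy data:

import numpy as np

# Dummy data matching the shapes the model above expects
data = np.random.random((1000, 784))                  # 1000 samples, 784 features
labels = np.eye(10)[np.random.randint(0, 10, 1000)]   # random one-hot class targets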

Slide 19

Sequence-to-Sequence Model
(Diagram: the Encoder reads the English characters “B A N A N A ...”; the Decoder, starting from a <start> symbol, writes the Katakana “バ ナ ナ ...”.)

Slide 20

Encoder in Keras

encoder = Embedding(input_dict_size, 64,
                    input_length=ENGLISH_LENGTH,
                    mask_zero=True)(encoder_input)
# return_sequences=False: keep only the final LSTM output as the summary of the English input
encoder = LSTM(units=64, return_sequences=False)(encoder)

(Diagram: encoder_input → Embedding → a chain of LSTM cells reading B, A, N, A, N, A ...)

Slide 21

Decoder in Keras

decoder = Embedding(output_dict_size, 64,
                    input_length=KATAKANA_LENGTH,
                    mask_zero=True)(decoder_input)
# the encoder's final output initializes the decoder LSTM's hidden and cell states
decoder = LSTM(64, return_sequences=True)(decoder,
               initial_state=[encoder, encoder])
# the same softmax Dense layer is applied at every time step to predict the next character
decoder = TimeDistributed(
    Dense(output_dict_size, activation="softmax"))(decoder)

(Diagram: decoder_input → Embedding → LSTM cells → Dense softmax at each step → decoder_output: バ, ナ, ナ ..., starting from <start>.)

Slide 22

Combining them into a model

encoder = Embedding(input_dict_size, 64,
                    input_length=ENGLISH_LENGTH, mask_zero=True)(encoder_input)
encoder = LSTM(64, return_sequences=False)(encoder)

decoder = Embedding(output_dict_size, 64,
                    input_length=KATAKANA_LENGTH, mask_zero=True)(decoder_input)
decoder = LSTM(units=64, return_sequences=True)(decoder,
               initial_state=[encoder, encoder])
decoder = TimeDistributed(
    Dense(output_dict_size, activation="softmax"))(decoder)

model = Model(inputs=[encoder_input, decoder_input], outputs=[decoder])
model.compile(optimizer='adam', loss='binary_crossentropy')

model.fit(x=[training_encoder_input, training_decoder_input],
          y=[training_decoder_output],
          batch_size=64, epochs=60)
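
At prediction time (as in the demo that follows) the decoder input is not known in advance, so one common approach is to decode greedily, feeding each predicted character back into the decoder. A hedged sketch, not code from the slides; start and katakana_length default to the values assumed in the earlier data-preparation sketch:

import numpy as np

def to_katakana_codes(english_sequence, model, start=1, katakana_length=20):
    # english_sequence: list of ENGLISH_LENGTH integer character codes
    encoder_in = np.array([english_sequence])
    decoder_in = np.zeros((1, katakana_length), dtype=int)
    decoder_in[0, 0] = start
    output = []
    for i in range(katakana_length - 1):
        prediction = model.predict([encoder_in, decoder_in])  # (1, katakana_length, dict_size)
        next_char = int(prediction[0, i].argmax())            # best guess for position i
        output.append(next_char)
        decoder_in[0, i + 1] = next_char                      # feed the guess back in
    return output  # integer codes; map back to Katakana with the output dictionary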

Slide 23

Demo

Slide 24

Summary
• Why Katakana?
• Introduction to Sequence-to-Sequence Model
• Sequence-to-Sequence Model in Keras
Try training a machine to write Katakana.