Padmaja Bhagwat - Listen, Attend, and Walk: Interpreting natural language navigational instructions

Imagine you have an appointment in a large building you do not know. Your host sent instructions describing how to reach their office. Though the instructions were fairly clear, in a few places, such as at the end, you had to infer what to do. How does a _robot (agent)_ interpret an instruction in the environment to infer the correct course of action? Enabling harmonious _Human-Robot Interaction_ is of primary importance if robots are to work seamlessly alongside people.

Dealing with natural language instructions is hard for two main reasons: first, humans know how to interpret natural language through prior experience, but agents do not; second, natural language instructions are inherently ambiguous. This talk is about how deep learning models were used to solve the complex and ambiguous problem of converting a natural language instruction into its corresponding action sequence.

Following verbal route instructions requires knowledge of language, space, action and perception. In this talk I shall be presenting a neural sequence-to-sequence model for direction following, a task that is essential to realizing effective autonomous agents.

At a high level, a sequence-to-sequence model is an end-to-end model made up of two recurrent neural networks:

- **Encoder** - which takes the model’s input sequence as input and encodes it into a fixed-size context vector.
- **Decoder** - which uses the context vector from above as a seed from which to generate an output sequence.

For this reason, sequence-to-sequence models are often referred to as _encoder-decoder_ models. The alignment-based encoder-decoder model translates natural language instructions into the corresponding action sequences. This model does not assume any prior linguistic knowledge: syntactic, semantic or lexical. The model learns the meaning of every word, including object names, verbs and spatial relations, as well as the syntax and the compositional semantics of the language, on its own.
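To make the encoder-decoder pattern concrete, here is a minimal PyTorch sketch. It is illustrative only and not the talk's actual model: the class name, layer sizes and the greedy unrolling loop are assumptions; only the vocabulary size of 524 and the small discrete action set are taken from later slides.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal encoder-decoder sketch: encode an instruction into a fixed-size
    context vector, then unroll a decoder to emit one action score vector per step."""
    def __init__(self, vocab_size, action_size, hidden_size=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, hidden_size)
        self.encoder = nn.LSTM(hidden_size, hidden_size)   # reads the instruction
        self.decoder = nn.LSTM(hidden_size, hidden_size)   # generates the output sequence
        self.out = nn.Linear(hidden_size, action_size)

    def forward(self, instruction_ids, max_actions=10):
        # Encode: the final hidden state acts as the fixed-size context vector.
        embedded = self.embedding(instruction_ids).unsqueeze(1)  # (seq_len, batch=1, hidden)
        _, context = self.encoder(embedded)
        # Decode: seed the decoder with the context and unroll a few steps.
        step_input = torch.zeros(1, 1, self.embedding.embedding_dim)
        hidden = context
        action_scores = []
        for _ in range(max_actions):
            output, hidden = self.decoder(step_input, hidden)
            action_scores.append(self.out(output[0]))
            step_input = output
        return torch.stack(action_scores)

# Hypothetical usage: 524 is the vocabulary size mentioned later in the slides;
# 4 actions (forward / left / right / stop) is an assumption.
model = Seq2Seq(vocab_size=524, action_size=4)
scores = model(torch.tensor([5, 17, 42]))   # a three-word instruction as word indices
```

The talk's actual model replaces the single fixed context vector with an attention-based (alignment) mechanism, described in the transcript below.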

In this talk, the steps involved in pre-processing the data, training the model, testing the model, and finally simulating it in the virtual environment will be discussed. This talk will also cover some of the challenges and trade-offs made while designing the model.

https://us.pycon.org/2018/schedule/presentation/132/

PyCon 2018

May 11, 2018

Transcript

  1. Ambiguous headlines such as "Boy paralyzed after tumor fights back to gain black belt" and "Girl Hit By Car In Hospital" show how easily natural language can be misread. For instance...
  2. Introduction. Instruction: "take a left onto the red brick and go a ways down until you come to the section with the butterflies on the wall."
    Map of the "l" virtual environment: http://www.cs.utexas.edu/users/ml/clamp/navigation/
    Path: [(23, 23, 90), (23, 23, 0), (23, 22, 0), (23, 21, 0), (23, 20, 0), (23, 19, 0)]
    Action Sequence: [1, 0, 0, 0, 0, 3]
    Input = Instruction + Initial position + Map (world state); Output = Action sequence (a sketch of this path-to-actions mapping follows below).
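    How the pose path above corresponds to the action sequence [1, 0, 0, 0, 0, 3] can be reproduced with a small sketch. The action encoding below is an assumption inferred from this one example (the transcript never states it), and `path_to_actions` is a hypothetical helper, not the talk's code:

    ```python
    # Assumed encoding (inferred from the example, not stated in the transcript):
    # 0 = move forward, 1 = turn right, 2 = turn left, 3 = stop.
    FORWARD, RIGHT, LEFT, STOP = 0, 1, 2, 3

    def path_to_actions(path):
        """Turn a list of (x, y, heading) poses into a discrete action sequence."""
        actions = []
        for (x0, y0, h0), (x1, y1, h1) in zip(path, path[1:]):
            if (x0, y0) != (x1, y1):
                actions.append(FORWARD)        # position changed, heading unchanged
            elif (h0 - h1) % 360 == 90:
                actions.append(RIGHT)          # heading rotated 90 degrees one way
            else:
                actions.append(LEFT)           # heading rotated the other way
        actions.append(STOP)                   # sequences end with an explicit stop
        return actions

    path = [(23, 23, 90), (23, 23, 0), (23, 22, 0), (23, 21, 0), (23, 20, 0), (23, 19, 0)]
    print(path_to_actions(path))               # [1, 0, 0, 0, 0, 3]
    ```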
  3. Convert the NL instruction to a vector:
    • One-hot encoding
    • Word2Vec
    Example: "Take the pink path to the red brick intersection" (an index/one-hot sketch follows below)
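    A minimal sketch of the indexing/one-hot step for the example sentence. The toy vocabulary here is built from this one sentence only; the actual model uses a corpus-wide vocabulary (the later slides mention input_size = 524):

    ```python
    import torch

    instruction = "take the pink path to the red brick intersection"
    # Toy vocabulary built from this sentence alone (illustrative only).
    vocab = {word: idx for idx, word in enumerate(sorted(set(instruction.split())))}

    indices = [vocab[word] for word in instruction.split()]
    print(indices)          # [5, 6, 3, 2, 7, 6, 4, 0, 1]

    # One-hot encoding: one vector of vocabulary size per word, with a single 1.
    one_hot = torch.nn.functional.one_hot(torch.tensor(indices), num_classes=len(vocab))
    print(one_hot.shape)    # torch.Size([9, 8])
    ```

    In practice the indices are fed to an nn.Embedding layer (as in the encoder code later in the transcript) rather than materialized as one-hot vectors.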
  4. Encoder: implemented using a bidirectional LSTM.
    Natural language instruction: x_{1:N} = (x_1, x_2, …, x_N)
    Hidden annotations: h_{1:N} = (h_1, h_2, …, h_N)
    E.g.: "Turn left after taking right"
    https://towardsdatascience.com/understanding-bidirectional-rnn-in-pytorch-5bd25a5dd66
  5. Encoder (continued): h_j is calculated as follows (a reconstruction of the update is given below):
    • h_j summarizes the words up to and including x_j
    • the forward and backward annotations are concatenated
    Notation: T_e - affine transformation; σ - logistic sigmoid; i_e - input gate of the LSTM; f_e - forget gate of the LSTM; o_e - output gate of the LSTM
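    The slide's equation image is not part of the transcript; the following is a hedged reconstruction of the standard LSTM update that the listed symbols refer to, with the bidirectional concatenation on the last line:

    ```latex
    \begin{aligned}
    \begin{pmatrix} i^e_j \\ f^e_j \\ o^e_j \\ g^e_j \end{pmatrix}
      &= \begin{pmatrix} \sigma \\ \sigma \\ \sigma \\ \tanh \end{pmatrix}
         T_e \begin{pmatrix} x_j \\ h^e_{j-1} \end{pmatrix} \\
    c^e_j &= f^e_j \odot c^e_{j-1} + i^e_j \odot g^e_j \\
    h^e_j &= o^e_j \odot \tanh(c^e_j) \\
    h_j &= \big[\overrightarrow{h^e_j} \,;\, \overleftarrow{h^e_j}\big]
          \quad \text{(forward and backward annotations concatenated)}
    \end{aligned}
    ```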
  6. Encoder

    import torch
    import torch.nn as nn

    class Encoder(nn.Module):
        def __init__(self, input_size, hidden_size, bidirectionality=False):
            super(Encoder, self).__init__()
            # input_size = 524; hidden_size = 128
            self.hidden_size = hidden_size
            if bidirectionality:
                # each direction gets half of the hidden units
                self.hidden_size2 = hidden_size // 2
            else:
                self.hidden_size2 = hidden_size
            self.embedding = nn.Embedding(input_size, hidden_size)
            self.lstm = nn.LSTM(hidden_size, self.hidden_size2,
                                bidirectional=bidirectionality)
  7. Encoder (continued)

    # forward() method of the Encoder class from the previous slide
    def forward(self, input, hidden):
        lstm_input = self.embedding(input).view(1, 1, -1)
        output, hidden = self.lstm(lstm_input, hidden)
        return output, hidden
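    A hypothetical usage sketch for the encoder above (it assumes the classes and imports from these slides; the sizes follow the slide's comment, and the zero-initialised hidden state is an assumption):

    ```python
    encoder = Encoder(input_size=524, hidden_size=128, bidirectionality=True)

    instruction = torch.tensor([5, 6, 3, 2, 7, 6, 4, 0, 1])   # word indices (slide 3 example)
    # (num_layers * num_directions, batch, hidden_size2) for a 1-layer bidirectional LSTM
    hidden = (torch.zeros(2, 1, encoder.hidden_size2),
              torch.zeros(2, 1, encoder.hidden_size2))

    # Collect one annotation h_j per word; in the full pipeline this is padded to MAX_LENGTH.
    encoder_outputs = torch.zeros(len(instruction), encoder.hidden_size)
    for ei in range(len(instruction)):
        output, hidden = encoder(instruction[ei], hidden)
        encoder_outputs[ei] = output[0, 0]   # forward and backward halves concatenated
    ```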
  8. Multi-level Aligner: the context vector z_t is computed from the word vector (x_j), the hidden annotation (h_j) and the decoder's previous hidden state (s_{t-1}).
    E.g.: "Take the pink path to the red brick intersection"
  9. Multi-level Aligner (continued): the weight α_tj associated with each pair (x_j, h_j), where
    s_{t-1} - decoder hidden state at time t-1
    x_j - input instruction word, j ∈ {1, 2, 3, …, N}
    h_j - hidden annotation
    v, W, U, V - learned parameters
    (a reconstruction of the aligner equations follows below)
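    The slide's equations are not preserved in the transcript; the following is a hedged reconstruction consistent with the parameters listed above (one plausible assignment of W, U, V to s_{t-1}, h_j and x_j):

    ```latex
    \begin{aligned}
    \beta_{tj} &= v^{\top} \tanh\!\big(W s_{t-1} + U h_j + V x_j\big) \\
    \alpha_{tj} &= \frac{\exp(\beta_{tj})}{\sum_{k=1}^{N} \exp(\beta_{tk})} \\
    z_t &= \sum_{j=1}^{N} \alpha_{tj}\,\big[x_j \,;\, h_j\big]
    \end{aligned}
    ```

    Attending over the concatenation [x_j; h_j] rather than h_j alone is what makes the aligner "multi-level": the decoder can look back at both the raw word vectors and the encoder annotations.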
  10. Decoder: implemented using an LSTM. The context vector (z_t), the world state (y_t) and the decoder's previous hidden state (s_{t-1}) together produce the next action (a_t).
    http://www.stratio.com/blog/deep-learning-3-recurrent-neural-networks-lstm/
  11. Decoder (continued): conditional probability distribution over the next action (a reconstruction is given below), where
    E - embedding matrix
    L_0, L_s, L_z - parameters to be learned
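    Again the equation itself is missing from the transcript; a hedged reconstruction consistent with the parameters above (and with the log-softmax over q_t in the decoder code later) is:

    ```latex
    \begin{aligned}
    q_t &= L_0\big(E\,y_t + L_s\,s_t + L_z\,z_t\big) \\
    P(a_t \mid a_{1:t-1},\, y_{1:t},\, x_{1:N}) &= \operatorname{softmax}(q_t)
    \end{aligned}
    ```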
  12. Attention Decoder

    MAX_LENGTH = 46

    class AttentionDecoderRNN(nn.Module):
        def __init__(self, input_size, hidden_size, world_state_size,
                     output_size, max_length=MAX_LENGTH):
            """ Initializing layers """
            super(AttentionDecoderRNN, self).__init__()
            self.hidden_size = hidden_size    # used in forward()
            self.max_length = max_length      # used in forward()
            self.embedding = nn.Embedding(input_size, hidden_size)
            self.lstm = nn.LSTM(hidden_size, hidden_size)
            self.input_hidden_combine = nn.Linear(hidden_size * 2, hidden_size)
            self.transform_beta = nn.Linear(hidden_size, 1)
            self.decoder_input = nn.Linear(hidden_size * 3, hidden_size)
            self.linear = nn.Linear(hidden_size * 2, hidden_size)
            self.out = nn.Linear(hidden_size, output_size)
            self.dense = nn.Linear(world_state_size, hidden_size)
  13. Attention Decoder (continued)

        # requires: import torch.nn.functional as F; from torch.autograd import Variable
        def forward(self, input, world_state, hidden, encoder_outputs):
            """ embedding the input sentence """
            embed = self.embedding(input)
            embedded = Variable(torch.zeros(self.max_length, self.hidden_size))
            for idx, e in enumerate(embed):
                embedded[idx] = e

            """ calculating beta """
            scope_attr = self.input_hidden_combine(torch.cat((embedded, encoder_outputs), 1))
            beta_inprocess = scope_attr + hidden[0][0]
            beta = F.tanh(beta_inprocess)
            beta = self.transform_beta(beta)
  14. Attention Decoder (continued)

            """ calculating alpha """
            attn_weights = F.softmax(beta, dim=0)

            """ calculating context vector """
            # transpose so the batched matmul sees (1, 1, N) x (1, N, hidden)
            zt = torch.bmm(attn_weights.t().unsqueeze(0), scope_attr.unsqueeze(0))

            """ calculating decoder output """
            combined_input = torch.cat((world_state, hidden[0][0], zt[0]), 1)
            input_to_decoder = self.decoder_input(combined_input).unsqueeze(0)
            output, hidden = self.lstm(input_to_decoder, hidden)
            output_ctx_combine = self.linear(torch.cat((output[0], zt[0]), 1))
            qt = self.out(world_state + output_ctx_combine)

            """ calculating probability distribution """
            output = F.log_softmax(qt, dim=1)
            return output, hidden, attn_weights
  15. Train the model

    STOP = 3

    def train(idx_data, map_name, input_variable, target_variable,
              action_seq, encoder, decoder):
        # (slide excerpt: encoder_hidden, encoder_outputs, decoder_hidden, loss,
        #  criterion, pos_curr, input_length and action_length are initialised elsewhere)
        world_state = target_variable[0]   # initialize the world state
        decoder_input = input_variable     # initialize decoder input

        # encode the whole instruction first
        for ei in range(input_length):
            encoder_output, encoder_hidden = encoder(input_variable[ei], encoder_hidden)

        # then decode one action at a time
        for di in range(action_length):
            decoder_output, decoder_hidden, decoder_attention = decoder(
                decoder_input, world_state, decoder_hidden, encoder_outputs)
            topv, topi = decoder_output.data.topk(1)
            ni = topi[0][0]
            pos_curr = run_model.take_one_step(pos_curr, ni)
            world_state = run_model.get_feat_current_position(pos_curr, map_name)
            loss += criterion(decoder_output, action_seq[di])
            if ni == STOP:
                break
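    The transcript does not show how train() is wired up; the following setup is a hypothetical sketch (the loss choice pairs with the decoder's log_softmax output, but the optimiser, learning rate, world-state size and action count are all assumptions):

    ```python
    import torch.nn as nn
    import torch.optim as optim

    WORLD_STATE_SIZE = 78   # placeholder: depends on how the map features are encoded
    NUM_ACTIONS = 4         # assumed action set: forward / left / right / stop

    encoder = Encoder(input_size=524, hidden_size=128, bidirectionality=True)
    decoder = AttentionDecoderRNN(input_size=524, hidden_size=128,
                                  world_state_size=WORLD_STATE_SIZE,
                                  output_size=NUM_ACTIONS)

    criterion = nn.NLLLoss()                 # matches F.log_softmax in the decoder
    encoder_optimizer = optim.Adam(encoder.parameters(), lr=1e-3)
    decoder_optimizer = optim.Adam(decoder.parameters(), lr=1e-3)

    # One update: run train(...) on an (instruction, path) pair, then
    # loss.backward(); encoder_optimizer.step(); decoder_optimizer.step()
    ```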
  16. Conclusion and future work
    • Our system is able to generate action sequences corresponding to novel navigational instructions.
    • The proposed approach is limited to pre-processed textual input.
    • Future work: integrating computer vision with NLU to build a real-time application.
  17. Python libraries used
    • NumPy and SciPy for mathematical computations on tensors
    • PyTorch for building dynamic graphs
    • PyGame for visualizing the virtual environment
    • Matplotlib for visualizing the attention weights
  18. Thank you
    Team: Manisha Jhawar, Nitya C K, Padmaja V Bhagwat
    Guide: Prof. Ananthanarayana VS
    https://github.com/PadmajaVB