Slide 1

Empirical Analysis of LSTM Performance
Robin Ranjit Singh Chauhan
https://twitter.com/robinc

Slide 2

Origins

I was at a VanTech data science meetup, Harbourfront Center, Vancouver, circa 2017, looking out at rainy Stanley Park and thinking about RNNs: “... I wonder how powerful an LSTM cell is ...?”

● LSTMs = Long Short-Term Memory
○ S. Hochreiter and J. Schmidhuber, “Long Short-Term Memory”, Neural Computation, 1997
● LSTM cells are unit types in deep learning neural networks (standard cell equations below)
● LSTMs are used for sequence-related problems
● They are typically trained by gradient descent methods
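
As a quick refresher, here is the standard modern formulation of the LSTM cell (with the forget gate later added by Gers et al., not part of the original 1997 cell); this is textbook background rather than anything specific to these slides:

```latex
% sigma is the logistic sigmoid, \odot the elementwise product
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
```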

Slide 3

Experiment

● The network accepts a sequence of random ints in the range 0 to feature_count: [ 1 8 9 5 3 … 8 0 ]
● The correct answer is simply to return the nth element (say n = 3)
○ Inputs and outputs are both one-hot encoded
○ Example concept from “Long Short-Term Memory Networks With Python” by Jason Brownlee
● This work: experiments on variations of this simple task (a minimal sketch follows this list)
○ Grid-search style visualizations of accuracy during training
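
A minimal sketch of the task in Keras/TensorFlow (the libraries credited on the last slide). The helper name make_batch and the hyperparameters (feature_count = 10, seq_length = 8, 25 LSTM cells, 20 epochs) are illustrative assumptions, not the values used in the actual experiments:

```python
# Sketch of the nth-element recall task; all hyperparameters are
# illustrative placeholders, not the slides' actual settings.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.utils import to_categorical

feature_count = 10   # ints drawn from 0 .. feature_count - 1
seq_length = 8       # length of each input sequence
n = 3                # task: return the nth element (0-indexed here)

def make_batch(batch_size, seq_length=seq_length):
    """Random int sequences; inputs and targets one-hot encoded."""
    seqs = np.random.randint(0, feature_count, size=(batch_size, seq_length))
    X = to_categorical(seqs, num_classes=feature_count)        # (B, T, F)
    y = to_categorical(seqs[:, n], num_classes=feature_count)  # (B, F)
    return X, y

model = Sequential([
    LSTM(25, input_shape=(seq_length, feature_count)),
    Dense(feature_count, activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="adam",
              metrics=["accuracy"])

X, y = make_batch(5000)
model.fit(X, y, epochs=20, batch_size=32)
```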

Slide 4

LSTM: sequence length (two views of the same surface)
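
Roughly how a surface like this can be produced; a hedged reconstruction of the sweep, reusing make_batch and feature_count from the earlier sketch. The sweep axes (sequence length x training epoch) and the Mesh3d call are assumptions about the plotting setup, not the author's exact code:

```python
# Grid-search style sweep over sequence length, plotting per-epoch
# training accuracy as a Plotly Mesh3d surface.
import plotly.graph_objects as go
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

xs, ys, zs = [], [], []
for L in range(2, 21, 2):                  # sweep axis: sequence length
    model = Sequential([LSTM(25, input_shape=(L, feature_count)),
                        Dense(feature_count, activation="softmax")])
    model.compile(loss="categorical_crossentropy", optimizer="adam",
                  metrics=["accuracy"])
    X, y = make_batch(5000, seq_length=L)  # helper from the sketch above
    hist = model.fit(X, y, epochs=20, batch_size=32, verbose=0)
    for epoch, acc in enumerate(hist.history["accuracy"]):
        xs.append(L); ys.append(epoch); zs.append(acc)

# Mesh3d triangulates the scattered (seq_length, epoch, accuracy)
# points into a surface; intensity colors it by accuracy.
fig = go.Figure(go.Mesh3d(x=xs, y=ys, z=zs, intensity=zs))
fig.update_layout(scene=dict(xaxis_title="sequence length",
                             yaxis_title="training epoch",
                             zaxis_title="accuracy"))
fig.show()
```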

Slide 5

LSTM: cell count in layer, “easy mode” vs. “hard mode” (two different surfaces)

Slide 6

LSTM: feature count (two views of the same surface)

Slide 7

LSTM: data set size, where the Logdss axis = log(data set size)

Slide 8

GRU: data set size, where the Logdss axis = log(data set size)

Slide 9

Credits

● Experiment, parameter sweep code, plots: Robin Ranjit Singh Chauhan
● Initial task example: Jason Brownlee, “Long Short-Term Memory Networks With Python”
● Rendering lib: Plotly Mesh3d
● Deep learning libs: Keras, TensorFlow (including the GRU layer)

Maybe next time: Bayesian Inference … ?