
Pycon

Language Modeling with Keras

Kashyap Raval

August 19, 2017



Transcript

  1. Neural networks to the rescue… Neural network: an information processing
     paradigm inspired by biological nervous systems, such as our brain.
     Structure: a large number of highly interconnected processing elements
     (neurons) working together. Like people, they learn from experience (by
     example).
  2. The idea of ANNs? NNs learn the relationship between cause and effect,
     or organize large volumes of data into orderly and informative patterns.
     (Slide illustration: shown pictures of a frog, a lion, and a bird —
     "What is that?" "It's a frog.")
  3. Network Layers
     Input Layer – the activity of the input units represents the raw
     information that is fed into the network.
     Hidden Layer – the activity of each hidden unit is determined by the
     activities of the input units and the weights on the connections
     between the input and the hidden units.
     Output Layer – the behavior of the output units depends on the activity
     of the hidden units and the weights between the hidden and output units.
  4. Neural Machine Translation The translation problem is expressed as a
     probability P(F|E), equivalent to
     P(f_n, f_{n-1}, …, f_0 | e_m, e_{m-1}, …, e_0) – a sequence conditioned
     on another sequence. Create an RNN architecture where the output of one
     RNN (the decoder) is conditioned on another RNN (the encoder). We can
     connect them using a joint alignment and translation mechanism. This
     results in a single gestalt Machine Translation model which can
     generate candidate translations.
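     Made explicit, this is the standard chain-rule factorization the
     encoder–decoder relies on (a restatement, not spelled out on the slide):

     P(F \mid E) = P(f_n, \dots, f_0 \mid e_m, \dots, e_0)
                 = \prod_{t=0}^{n} P(f_t \mid f_{t-1}, \dots, f_0, e_m, \dots, e_0)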
  5. Neural Machine Translation: Decoder (slide diagram: decoder states
     s_0, s_1, …, s_t, …, s_M emitting tokens f_0, f_1, …, f_t, …, f_M,
     starting from <S> and ending with f_M = </S>.) f_t is produced by
     sampling the discrete probability produced by the softmax output layer.
     The decoder can be pre-trained as an RNN language model.
  6. Neural Machine Translation: Joint Alignment (slide diagram: encoder
     states h_0, …, h_j, …, h_N; decoder states s_0, …, s_t, …, s_M; attention
     weights a_{t,1:N} connecting them.) The context vector and alignment
     scores are:
     c_t = Σ_j a_{tj} · h_j
     z_j = W · tanh(V · s_{t-1} + U · h_j)
     where the weights a_{tj} are obtained by normalizing the scores z_j.
  7. Neural Machine Translation: Features
     End-to-end differentiable, trained using SGD with a cross-entropy error
     function.
     Encoder and decoder learn to represent source and target sentences in a
     compact, distributed manner.
     Does not make conditional independence assumptions to separate out a
     translation model, alignment model, re-ordering model, etc.
     Does not pre-align words by bootstrapping from simpler models. Learns
     translation and joint alignment in a semantic space, not over surface
     forms.
     Conceptually easy to decode – complexity similar to speech processing,
     not SMT.
     Fewer parameters – more memory efficient.
  8. An LSTM network has the following three aspects that differentiate it
     from a usual neuron in a recurrent neural network (see the sketch after
     this list). 1. It has control over deciding when to let the input enter
     the neuron. 2. It has control over deciding when to remember what was
     computed in the previous time step. 3. It has control over deciding when
     to let the output pass on to the next time step.
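     These three controls are the input, forget, and output gates. A minimal
     Keras sketch of an LSTM layer over sequence input (the shapes and unit
     counts here are illustrative assumptions, not from the deck):

     from keras.models import Sequential
     from keras.layers import LSTM, Dense

     model = Sequential()
     # 32 LSTM units reading sequences of 10 timesteps with 8 features each
     model.add(LSTM(32, input_shape=(10, 8)))
     model.add(Dense(1, activation='sigmoid'))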
  9. How to install • Simple installation:
     – sudo python setup.py install
     or
     – sudo pip install keras
  10. Why this name, Keras? • Keras (κέρας) means horn in Greek • It is a
     reference to a literary image from ancient Greek and Latin literature
  11. Overview of Keras A minimalist, highly modular neural networks
     library. Written in Python. Capable of running on top of TensorFlow,
     Theano, or CNTK. Developed with a focus on enabling fast
     experimentation.
  12. Overview Why use Keras? Simple to get started, simple to keep going.
     Written in Python and highly modular; easy to expand. Deep enough to
     build serious models.
  13. General Design The general idea is based on layers and their
     input/output. Prepare your input and output tensors. Create the first
     layer to handle the input tensor. Create the output layer to handle the
     targets. Build virtually any model you like in between (see the sketch
     below).
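     A minimal sketch of that flow (the layer sizes and the 100-dimensional
     input are illustrative assumptions):

     from keras.models import Sequential
     from keras.layers import Dense

     model = Sequential()
     # first layer handles the input tensor
     model.add(Dense(32, input_dim=100, activation='relu'))
     # output layer handles the targets
     model.add(Dense(10, activation='softmax'))
     model.compile(optimizer='sgd', loss='categorical_crossentropy')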
  14. Layers and Layers (like an Ogre) Keras has a number of pre-built
     layers. Notable examples include: regular dense, MLP type; recurrent
     layers, LSTM, GRU, etc. (see the stacking sketch below).
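     For example, stacking recurrent layers requires return_sequences=True on
     every recurrent layer except the last (a sketch; the shapes are assumed):

     from keras.models import Sequential
     from keras.layers import LSTM, GRU, Dense

     model = Sequential()
     # emit the full output sequence so the next recurrent layer can consume it
     model.add(LSTM(64, return_sequences=True, input_shape=(20, 50)))
     # consume the sequence, emit only the final state
     model.add(GRU(32))
     model.add(Dense(1))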
  15. Multilayer Perceptron (MLP) for multi-class softmax classification

     from keras.models import Sequential
     from keras.layers import Dense, Dropout, Activation
     from keras.optimizers import SGD

     model = Sequential()
     model.add(Dense(64, input_dim=20, init='uniform'))
     model.add(Activation('tanh'))
     model.add(Dropout(0.5))
     model.add(Dense(64, init='uniform'))
     model.add(Activation('tanh'))
     model.add(Dropout(0.5))
     model.add(Dense(10, init='uniform'))
     model.add(Activation('softmax'))
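     The slide stops before training; to run, the model still needs to be
     compiled and fit. A sketch of the usual follow-up from the Keras
     examples of this era (X_train and y_train are assumed, with y_train
     one-hot encoded over the 10 classes):

     sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
     model.compile(loss='categorical_crossentropy', optimizer=sgd)
     model.fit(X_train, y_train, nb_epoch=20, batch_size=16)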
  16. Activations More or less all your favourite activations are available:
     sigmoid, tanh, ReLU, softplus, hard sigmoid, linear. Advanced
     activations are implemented as a layer, placed after the desired neural
     layer (as sketched below). Advanced activations: LeakyReLU, PReLU, ELU,
     Parametric Softplus, Thresholded Linear, and Thresholded ReLU.
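     A sketch of the distinction (in this Keras generation the advanced
     activations live in keras.layers.advanced_activations):

     from keras.models import Sequential
     from keras.layers import Dense, Activation
     from keras.layers.advanced_activations import LeakyReLU

     model = Sequential()
     model.add(Dense(64, input_dim=20))
     model.add(Activation('relu'))    # simple activation: passed by name
     model.add(Dense(64))
     model.add(LeakyReLU(alpha=0.1))  # advanced activation: its own layer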
  17. Objectives and Optimizers Objective functions:
     Error loss: rmse, mse, mae, mape, msle.
     Hinge loss: squared hinge, hinge.
     Class loss: binary crossentropy, categorical crossentropy.
     Optimization: provides SGD, Adagrad, Adadelta, RMSprop, and Adam. All
     optimizers can be customized via parameters (see the sketch below).
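     Both the loss and a parameterized optimizer are handed to compile(); a
     sketch, reusing the model from the previous slides:

     from keras.optimizers import Adam

     # every optimizer exposes its hyperparameters as constructor arguments
     adam = Adam(lr=0.001, beta_1=0.9, beta_2=0.999)
     model.compile(loss='categorical_crossentropy', optimizer=adam)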
  18. Architecture/Weight Saving and Loading Model architectures can be
     saved and loaded. Model parameters (weights) can be saved and loaded
     (see the sketch below).
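     Architecture and weights are handled separately in this API generation;
     a sketch (the weights.h5 filename is an assumption):

     from keras.models import model_from_json

     json_string = model.to_json()          # architecture only
     model.save_weights('weights.h5')       # parameters only

     model2 = model_from_json(json_string)  # rebuild the architecture
     model2.load_weights('weights.h5')      # restore the parameters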
  19. Callbacks Allow for function calls during training. Callbacks can be
     called at different points of training (per batch or per epoch).
     Existing callbacks: early stopping, weight saving after each epoch.
     Easy to build and implement; passed to the training function, fit()
     (see the sketch below).
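     A sketch using the two built-in callbacks mentioned above (X_train and
     y_train are assumed):

     from keras.callbacks import EarlyStopping, ModelCheckpoint

     callbacks = [
         # stop when the validation loss stops improving for 2 epochs
         EarlyStopping(monitor='val_loss', patience=2),
         # save the weights after any epoch that improves validation loss
         ModelCheckpoint('weights.h5', save_best_only=True),
     ]
     model.fit(X_train, y_train, validation_split=0.2, callbacks=callbacks)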
  20. Model Type: Sequential Sequential models are a linear stack of layers.
     The model we all know and love. Treat each layer as an object that
     feeds into the next.
  21. Model Type: Graph Optimized over all outputs. The Graph model allows
     two or more independent networks to diverge or merge. Allows for
     multiple separate inputs or outputs. Different merging layers (sum or
     concatenate); see the sketch below.
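     The Graph class itself was later folded into the functional API; a
     sketch of the same idea (two inputs merged by concatenation) in Keras 2
     functional style, with the shapes assumed:

     from keras.layers import Input, Dense, concatenate
     from keras.models import Model

     a = Input(shape=(32,))
     b = Input(shape=(64,))
     merged = concatenate([a, b])                  # merge two branches
     out = Dense(1, activation='sigmoid')(merged)  # single shared output
     model = Model(inputs=[a, b], outputs=out)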
  22. In Summary
     Pros: easy to implement; lots of choice; extendible and customizable;
     GPU support; high level; active community (keras.io).
     Cons: lack of generative models; high level.