paradigm inspired by biological nervous systems, such as our brain. Structure: a large number of highly interconnected processing elements (neurons) working together. Like people, they learn from experience (by example).
units represents the raw information that is fed into the network.
Hidden Layer - The activity of each hidden unit is determined by the activities of the input units and the weights on the connections between the input and the hidden units.
Output Layer - The behavior of the output units depends on the activity of the hidden units and the weights between the hidden and output units.
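A minimal sketch of this input/hidden/output structure, written with the Keras Sequential API (assuming a tf.keras install; the layer sizes, the 20-dimensional input, and the 3-class softmax output are illustrative assumptions, not values from the slides):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Input layer: 20 raw features (assumed size).
# Hidden layer: its activity is a function of the inputs and the input-to-hidden weights.
# Output layer: depends on the hidden activities and the hidden-to-output weights.
model = Sequential([
    Dense(64, activation="tanh", input_shape=(20,)),  # hidden layer
    Dense(3, activation="softmax"),                   # output layer (3 classes, assumed)
])
model.summary()
```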
probability P(F | E), equivalent to P(f_n, f_{n-1}, …, f_0 | e_m, e_{m-1}, …, e_0): a sequence conditioned on another sequence.
Create an RNN architecture where the output of one RNN (the decoder) is conditioned on another RNN (the encoder). We can connect them using a joint alignment and translation mechanism. This results in a single gestalt Machine Translation model which can generate candidate translations.
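A minimal sketch of the encoder-decoder wiring in Keras (tf.keras functional API). The joint alignment (attention) mechanism is omitted here, and the vocabulary sizes, hidden dimension, and the choice of LSTM cells are assumptions for illustration only:

```python
from tensorflow.keras.layers import Input, LSTM, Dense, Embedding
from tensorflow.keras.models import Model

src_vocab, tgt_vocab, latent_dim = 5000, 6000, 256   # hypothetical sizes

# Encoder: reads the source sentence E and summarizes it in its final states.
enc_inputs = Input(shape=(None,))
enc_emb = Embedding(src_vocab, latent_dim)(enc_inputs)
_, state_h, state_c = LSTM(latent_dim, return_state=True)(enc_emb)

# Decoder: an RNN language model over F, conditioned on the encoder states.
dec_inputs = Input(shape=(None,))
dec_emb = Embedding(tgt_vocab, latent_dim)(dec_inputs)
dec_outputs, _, _ = LSTM(latent_dim, return_sequences=True,
                         return_state=True)(dec_emb,
                                            initial_state=[state_h, state_c])
probs = Dense(tgt_vocab, activation="softmax")(dec_outputs)  # P(f_t | f_<t, E)

model = Model([enc_inputs, dec_inputs], probs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```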
[Decoder figure: target words f_0, f_1, …, f_t, …, f_M are generated between <S> and </S>, with f_M = </S>.]
f_t is produced by sampling from the discrete probability distribution produced by the softmax output layer. Can be pre-trained as an RNN language model.
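A small sketch of sampling one target word from the softmax output; the temperature parameter and the toy 5-word distribution are illustrative assumptions, not part of the slides:

```python
import numpy as np

def sample_next_token(probs, temperature=1.0):
    # Sample one token id from the decoder's softmax distribution at step t.
    p = np.log(np.asarray(probs, dtype=np.float64) + 1e-12) / temperature
    p = np.exp(p - p.max())
    p /= p.sum()
    return int(np.random.choice(len(p), p=p))

# Toy distribution over a 5-word vocabulary; in the real decoder this vector
# comes from the softmax output layer, and decoding stops when </S> is drawn.
print(sample_next_token([0.1, 0.6, 0.1, 0.1, 0.1]))
```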
cross-entropy error function.
Encoder and Decoder learn to represent source and target sentences in a compact, distributed manner.
Does not make conditional independence assumptions to separate out a translation model, alignment model, re-ordering model, etc.
Does not pre-align words by bootstrapping from simpler models.
Learns translation and joint alignment in a semantic space, not over surface forms.
Conceptually easy to decode: complexity similar to speech processing, not SMT.
Fewer parameters: more memory efficient.
it from a usual neuron in a recurrent neural network.
1. It has control over deciding when to let the input enter the neuron.
2. It has control over deciding when to remember what was computed in the previous time step.
3. It has control over deciding when to let the output pass on to the next time step.
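A NumPy sketch of one LSTM step showing those three gates; the tiny dimensions and random weights are illustrative assumptions only:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # W, U, b stack the input/forget/candidate/output blocks, in that order.
    z = W @ x_t + U @ h_prev + b
    n = h_prev.shape[0]
    i = sigmoid(z[0:n])        # input gate: when to let the input enter
    f = sigmoid(z[n:2*n])      # forget gate: when to keep the previous cell state
    g = np.tanh(z[2*n:3*n])    # candidate cell update
    o = sigmoid(z[3*n:4*n])    # output gate: when to pass the output onward
    c_t = f * c_prev + i * g
    h_t = o * np.tanh(c_t)
    return h_t, c_t

# Tiny example with random weights (dimensions are illustrative only).
d_in, d_h = 4, 3
rng = np.random.default_rng(0)
W = rng.normal(size=(4 * d_h, d_in))
U = rng.normal(size=(4 * d_h, d_h))
b = np.zeros(4 * d_h)
h, c = lstm_step(rng.normal(size=d_in), np.zeros(d_h), np.zeros(d_h), W, U, b)
```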
their input/output.
Prepare your input and output tensors.
Create the first layer to handle the input tensor.
Create the output layer to handle the targets.
Build virtually any model you like in between.
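A minimal sketch of those steps with a Keras Sequential model (the random data, layer widths, and two training epochs are illustrative assumptions):

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# 1. Prepare input and output tensors (random data, for illustration only).
x = np.random.rand(100, 10)
y = np.random.randint(0, 2, size=(100, 1))

# 2. First layer handles the input tensor; the last layer handles the targets;
#    anything you like can go in between.
model = Sequential([
    Dense(32, activation="relu", input_shape=(10,)),
    Dense(16, activation="relu"),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(x, y, epochs=2, batch_size=16)
```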
Sigmoid, tanh, ReLU, softplus, hard sigmoid, linear.
Advanced activations are implemented as a layer (placed after the desired neural layer).
Advanced activations: LeakyReLU, PReLU, ELU, Parametric Softplus, Thresholded Linear and Thresholded ReLU.
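A short sketch of an advanced activation added as its own layer (layer sizes are assumptions):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LeakyReLU

# Standard activations are passed by name; advanced activations are added
# as a separate layer immediately after the layer they modify.
model = Sequential([
    Dense(64, input_shape=(20,)),    # no activation argument here
    LeakyReLU(),                     # advanced activation as a layer
    Dense(10, activation="softmax"),
])
```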
mape, msle.
Hinge loss: squared hinge, hinge.
Class loss: binary crossentropy, categorical crossentropy.
Optimization: provides SGD, Adagrad, Adadelta, RMSprop and Adam.
All optimizers can be customized via parameters.
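A sketch of customizing an optimizer and pairing it with a class loss (the tiny model and the hyper-parameter values are illustrative assumptions; older Keras versions spell the learning-rate argument `lr`):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD

model = Sequential([Dense(10, activation="softmax", input_shape=(20,))])

# Each optimizer exposes its hyper-parameters; values here are illustrative.
sgd = SGD(learning_rate=0.01, momentum=0.9, nesterov=True)
model.compile(optimizer=sgd,
              loss="categorical_crossentropy",   # a class loss
              metrics=["accuracy"])
```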
called at different points of training (per batch or per epoch).
Existing callbacks: early stopping, weight saving after each epoch.
Easy to build and implement; passed to the training function, fit().
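A sketch of the built-in callbacks passed to fit(); the model, data, patience, and checkpoint filename are illustrative assumptions (the expected file extension varies across Keras versions):

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

model = Sequential([Dense(1, activation="sigmoid", input_shape=(10,))])
model.compile(optimizer="adam", loss="binary_crossentropy")

x = np.random.rand(200, 10)
y = np.random.randint(0, 2, size=(200, 1))

callbacks = [
    EarlyStopping(monitor="val_loss", patience=3),        # stop when validation loss stalls
    ModelCheckpoint("model.keras", save_best_only=True),  # save the model after improving epochs
]
model.fit(x, y, epochs=50, batch_size=32,
          validation_split=0.2, callbacks=callbacks)
```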
for two or more independent networks to diverge or merge.
Allows for multiple separate inputs or outputs.
Different merging layers (sum or concatenate).
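A sketch of two branches merging before a shared output, written with the Keras functional API (the original slides likely used the older Graph model; input sizes and layer widths are assumptions):

```python
from tensorflow.keras.layers import Input, Dense, concatenate, add
from tensorflow.keras.models import Model

# Two independent input branches that merge before the output layer.
a_in = Input(shape=(16,))
b_in = Input(shape=(8,))
a = Dense(32, activation="relu")(a_in)
b = Dense(32, activation="relu")(b_in)

merged = concatenate([a, b])   # or add([a, b]) to merge by summing
out = Dense(1, activation="sigmoid")(merged)

model = Model(inputs=[a_in, b_in], outputs=out)
```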