Theano Overview
• Not specifically designed for neural networks (NN).
• Computation is written as a symbolic representation (in Python code).
• Theano compiles the symbolic computation into C / CUDA code.
• The framework provides a low-level API (e.g., shared variables, gradient computation, etc.).
• Relies heavily on numpy, a Python numerical library.
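To make the "symbolic representation" idea concrete, here is a minimal sketch (illustrative, not from the slides): variables are declared symbolically, combined into an expression, and only turned into executable code when theano.function compiles them.

  import theano
  import theano.tensor as T

  # declare symbolic variables; no data is attached yet
  a = T.scalar('a')
  b = T.scalar('b')

  # build a symbolic expression from them
  c = a * b + a

  # compile the expression; this is where Theano generates C / CUDA code
  f = theano.function([a, b], c)

  print(f(2.0, 3.0))   # prints 8.0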
Theano Pros / Cons
• Pros:
  – Python is a high-level language (compared to Caffe)
  – Symbolic representation of computation (SymR)
  – Automatic gradient computation based on SymR
  – Fast prototyping (compared with Caffe)
  – Allows fine tweaking and tuning of the model
• Cons:
  – Not particularly designed for NN
  – Hard to scale
  – Not well suited to production development (compared with Caffe)
  – Assembling a complex CNN or LSTM purely in Theano is tedious and error-prone
How to Set up a Simple NN in Theano
[Figure: a one-layer network with inputs x1, x2, x3, weights w11–w33, and outputs y1, y2]
  dot = T.dot(x, W)
  y = T.nnet.sigmoid(dot)
Programming with Theano
  # compile into CUDA code
  f = theano.function([x], [y], config)
  # start the computation
  ret = f([1, 2, 3])
[x] and [y] are the symbolic input and output of the compiled function.
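Putting the two fragments above together, a complete forward pass might look like the following sketch (the 3x2 shape and random initialization are illustrative assumptions, matching the three-input / two-output diagram):

  import numpy
  import theano
  import theano.tensor as T

  # symbolic input: a length-3 vector (x1, x2, x3)
  x = T.vector('x')

  # weights as a shared variable: 3 inputs -> 2 outputs
  W = theano.shared(numpy.random.uniform(-1.0, 1.0, (3, 2)))

  # forward pass: y = sigmoid(x . W)
  dot = T.dot(x, W)
  y = T.nnet.sigmoid(dot)

  # compile and run
  f = theano.function([x], y)
  ret = f([1.0, 2.0, 3.0])   # two activations, one per output unit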
Scaling + Data Movement
• Shared variable (data is stored in GPU memory)
  value = numpy.random.uniform(low, high, size)
  W = theano.shared(value)
  w = W.get_value()    # move data to main memory
  W.set_value(w)       # move data to GPU memory
• numpy.random.uniform creates a random array on the CPU; low and high are the bounds of the values, size is the number of values.
• W is the symbolic representation of the parameters; its data stays in GPU memory during the computation.
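A runnable version of the snippet above (the bounds and shape passed to numpy.random.uniform are illustrative):

  import numpy
  import theano

  # create the initial values on the CPU with numpy
  value = numpy.random.uniform(-1.0, 1.0, (3, 2))

  # wrap them in a shared variable; Theano keeps the data on the GPU (if one is configured)
  W = theano.shared(value)

  w = W.get_value()       # copy the data back to main memory
  W.set_value(w * 0.5)    # push the modified data back to GPU memory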
Programming with Theano
  # symbol for the ground truth
  gt = T.matrix('gt')
  # loss function
  loss = T.sqr(gt - y).sum()
  # gradient of the loss w.r.t. W
  g = T.grad(loss, W)
  f = theano.function([x], [y],
        updates=[(W, W - r * g)],    # update state; r is the learning rate
        givens={                     # substitutions
            x: theano.shared(...),   # in GPU
            W: theano.shared(...)    # in GPU
        }, config)
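The pieces above can be combined into a small end-to-end training sketch (shapes, learning rate, and data are illustrative; the givens substitution is omitted for simplicity):

  import numpy
  import theano
  import theano.tensor as T

  x  = T.matrix('x')                                    # input batch
  gt = T.matrix('gt')                                   # ground truth
  W  = theano.shared(numpy.random.uniform(-1.0, 1.0, (3, 2)))

  y    = T.nnet.sigmoid(T.dot(x, W))                    # forward pass
  loss = T.sqr(gt - y).sum()                            # squared-error loss
  g    = T.grad(loss, W)                                # gradient of the loss w.r.t. W

  r = 0.1                                               # learning rate
  train = theano.function([x, gt], loss,
                          updates=[(W, W - r * g)])     # one gradient-descent step per call

  data   = numpy.array([[1.0, 2.0, 3.0]])
  target = numpy.array([[0.0, 1.0]])
  for i in range(100):
      print(train(data, target))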
Multiple GPUs
• Theano provides some support for multiple GPUs
  import numpy
  import theano

  v01 = theano.shared(value1, target='dev0')   # on the first GPU
  v02 = theano.shared(value2, target='dev0')   # on the first GPU
  v11 = theano.shared(value3, target='dev1')   # on the second GPU
  v12 = theano.shared(value4, target='dev1')   # on the second GPU

  # compile into CUDA code
  f = theano.function([], [theano.tensor.dot(v01, v02),
                           theano.tensor.dot(v11, v12)])
  # start the computation
  ret = f()
http://deeplearning.net/software/theano/tutorial/using_multi_gpu.html
High Level Libraries
• Keras is a minimalist, highly modular neural network library in the spirit of Torch, written in Python, that uses Theano under the hood for optimized tensor manipulation on GPU and CPU.
• Pylearn2 is a library that wraps many models and training algorithms, such as stochastic gradient descent, that are commonly used in deep learning. Its functional libraries are built on top of Theano.
• Lasagne is a lightweight library to build and train neural networks in Theano. It is governed by the principles of simplicity, transparency, modularity, pragmatism, focus, and restraint.
• Blocks is a framework that helps you build neural network models on top of Theano.
http://www.teglor.com/b/deep-learning-libraries-language-cm569/
Keras
• A high-level Python library built on top of Theano (and works with TensorFlow)
• Designed for NN
• Modular and easy to extend
• Based on layers and their inputs / outputs
http://www.teglor.com/b/deep-learning-libraries-language-cm569/
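For example, the three-input / two-output network from the earlier slides becomes a single layer declaration in Keras (a sketch; the shapes and training settings are assumptions, not from the slides):

  from keras.models import Sequential
  from keras.layers import Dense

  model = Sequential()
  model.add(Dense(2, input_dim=3, activation='sigmoid'))   # 3 inputs -> 2 sigmoid outputs
  model.compile(loss='mean_squared_error', optimizer='sgd')
  # model.fit(X, Y, ...) then runs the training loop that had to be written by hand in raw Theano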
Theano vs. TensorFlow
• Common parts:
  – Python as the API language
  – symbolic mathematical representation -> computation graph
  – support for automated symbolic differentiation
• In addition, TensorFlow:
  – supports parallel computation
  – can partition the computation graph
  – provides coordination mechanisms for parallel computation
  – supports heterogeneous hardware
  – automatically allocates devices (GPU, CPU, mobile) to computation nodes
  – has a visualization module (TensorBoard)
• Implementation:
  – Theano compiles Python code into C or CUDA internal modules
  – TensorFlow is a C++ library with a thin Python interface
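For comparison, the same forward pass in (1.x-era, graph-mode) TensorFlow also builds a computation graph first and then executes it, but inside a session and with automatic device placement (a rough sketch under that API, with illustrative shapes):

  import tensorflow as tf   # TensorFlow 1.x graph-mode API

  # build the computation graph (analogous to Theano's symbolic expressions)
  x = tf.placeholder(tf.float32, shape=[None, 3])
  W = tf.Variable(tf.random_uniform([3, 2], -1.0, 1.0))
  y = tf.sigmoid(tf.matmul(x, W))

  # run the graph in a session; TensorFlow decides which device executes each node
  with tf.Session() as sess:
      sess.run(tf.global_variables_initializer())
      print(sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0]]}))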