
A quick introduction to Theano

Theano is a framework that facilitates quick prototyping of deep learning models. Presented by Liang Gong at a group meeting at UC Berkeley.

Liang Gong

April 01, 2017

Transcript

  1. A quick Introduction to Theano. Liang Gong, Electrical Engineering & Computer Science, University of California, Berkeley.
  2. Theano Overview
    • Not specifically designed for neural networks (NN).
    • Computation is written in symbolic representation (in Python code).
    • Theano compiles the Python code into CUDA code.
    • The framework provides a low-level API (e.g., shared memory, gradient descent, etc.).
    • Heavily relies on NumPy, a Python mathematical library.
  3. Theano Pros / Cons
    • Pros:
      – Python is a high-level language (compared with Caffe).
      – Symbolic representation of computation (SymR).
      – Automatic gradient derivation based on SymR.
      – Fast prototyping (compared with Caffe).
      – Allows fine tweaking and tuning of the model.
    • Cons:
      – Not specifically designed for NN.
      – Hard to scale.
      – Not suitable for production development (compared with Caffe).
      – Assembling a complex CNN or LSTM purely in Theano is tedious and error-prone.
  4. How to Set up a Simple NN in Theano
    [diagram: a single-layer network with inputs x1–x3, weights wij, and outputs y1, y2]
    W = T.matrix('W')
    x = T.matrix('x')
    dot = T.dot(x, W)
    y = T.nnet.sigmoid(dot)
  5. Programming with Theano
    [diagram as on slide 4]
    W = T.matrix('W')
    x = T.matrix('x')
    dot = T.dot(x, W)
    y = T.nnet.sigmoid(dot)
    # compile into CUDA code
    f = theano.function([x], [y], config)
    # start the computation
    ret = f([1,2,3])
  6. (same code as slide 5, highlighting the inputs and outputs of theano.function)
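A minimal runnable version of slides 4–6 might look as follows. This is a sketch under a few assumptions not in the deck: the undefined config argument is dropped, W is passed as an ordinary input (the deck later supplies it via shared variables), and x is a 1×3 batch since T.matrix expects a 2-D value.

    import numpy as np
    import theano
    import theano.tensor as T

    # symbolic graph: a batch of inputs times a weight matrix, then a sigmoid
    x = T.matrix('x')
    W = T.matrix('W')
    y = T.nnet.sigmoid(T.dot(x, W))

    # compile the graph (into C/CUDA code, depending on the backend)
    f = theano.function([x, W], y)

    # illustrative values: one 3-feature example and a 3x2 weight matrix
    x_val = np.array([[1.0, 2.0, 3.0]])
    W_val = np.random.uniform(-1.0, 1.0, (3, 2))
    print(f(x_val, W_val))   # the two outputs y1, y2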
  7. Scaling + Data Movement
    • Shared variable (data is stored in GPU memory).
  8. Scaling + Data Movement
    • Shared variable (data is stored in GPU memory).
    value = numpy.random.uniform(low, high, size)
    W = theano.shared(value)
  9.–13. (annotating the two lines from slide 8)
    • numpy.random.uniform(low, high, size) creates a random array on the CPU; low and high bound the values, and size gives the number of values.
    • theano.shared(value) is the symbolic representation of the parameters; the data stays in GPU memory during the computation.
  14. Scaling + Data Movement
    • Shared variable (data is stored in GPU memory).
    value = numpy.random.uniform(low, high, size)
    W = theano.shared(value)
    w = W.get_value()   # move data to main memory
    W.set_value(w)      # move data to GPU memory
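A concrete sketch of slides 8–14; the shape and bounds are illustrative assumptions:

    import numpy as np
    import theano

    # create a random 3x2 array on the CPU, values drawn from [-0.1, 0.1)
    value = np.random.uniform(low=-0.1, high=0.1, size=(3, 2))

    # wrap it in a shared variable; on a GPU build of Theano the data is
    # kept in GPU memory between function calls
    W = theano.shared(value, name='W')

    w = W.get_value()      # copy the data back to main (CPU) memory
    W.set_value(w * 0.5)   # push updated values back to the device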
  15.–16. (recap of slides 4–6: the same network code and compiled function)
  17. Programming with Theano
    [network code as on slide 4]
    # symbol for the ground truth
    gt = T.matrix('gt')
  18. (adds the loss)
    loss = T.sqr(gt - y).sum()   # sum-of-squares loss function
  19. (adds the gradient)
    g = T.grad(loss, W)   # gradient of the loss w.r.t. W
  20.–22. (compiles a training step; the givens argument substitutes symbolic variables with values that stay in GPU memory)
    f = theano.function([x], [y],
        updates=[(W, W - r * g)],    # update the parameters (r is the learning rate)
        givens={                     # substitutions
            x: theano.shared(...),   # in GPU memory
            W: theano.shared(...),   # in GPU memory
        })
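Putting the training slides together, a self-contained sketch might look like this. Assumptions not in the deck: W is a shared variable rather than substituted through givens, and the learning rate, shapes, and data values are illustrative.

    import numpy as np
    import theano
    import theano.tensor as T

    x = T.matrix('x')     # inputs
    gt = T.matrix('gt')   # ground truth

    # parameters as a shared variable, so updates can modify them in place
    W = theano.shared(np.random.uniform(-0.1, 0.1, (3, 2)), name='W')

    y = T.nnet.sigmoid(T.dot(x, W))
    loss = T.sqr(gt - y).sum()   # sum-of-squares loss
    g = T.grad(loss, W)          # symbolic gradient of the loss w.r.t. W

    r = 0.1                      # learning rate (illustrative)
    train = theano.function([x, gt], loss,
                            updates=[(W, W - r * g)])

    # one gradient-descent step on a toy batch
    x_val = np.array([[1.0, 2.0, 3.0]])
    gt_val = np.array([[0.0, 1.0]])
    print(train(x_val, gt_val))

Each call to train runs one forward pass, computes the gradient, and applies the update to W in place, so no parameter data needs to move back to the CPU between steps.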
  23. Saving the Model
    • Any serialization library can be used to persist the model, e.g., cPickle:
    import cPickle
    cPickle.dump(W.get_value(), file, ...)
  24. Resuming the Model
    • Load the persisted values back into the shared variable:
    W.set_value(cPickle.load(file))
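A sketch of the save/resume round trip; the filename is an illustrative assumption, and on Python 3 the module is pickle rather than cPickle:

    import cPickle   # 'import pickle' on Python 3

    # save: pull the parameter values out of GPU memory and serialize them
    with open('model.pkl', 'wb') as f:
        cPickle.dump(W.get_value(), f, cPickle.HIGHEST_PROTOCOL)

    # resume: deserialize and push the values back into the shared variable
    with open('model.pkl', 'rb') as f:
        W.set_value(cPickle.load(f))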
  25. Multiple GPUs
    • Theano provides some support for multiple GPUs.
    import numpy
    import theano
    v01 = theano.shared(value1, target='dev0')   # on the first GPU
    v02 = theano.shared(value2, target='dev0')   # on the first GPU
    v11 = theano.shared(value3, target='dev1')   # on the second GPU
    v12 = theano.shared(value4, target='dev1')   # on the second GPU
    # compile into CUDA code
    f = theano.function([], [theano.tensor.dot(v01, v02),
                             theano.tensor.dot(v11, v12)])
    # start the computation
    ret = f()
    http://deeplearning.net/software/theano/tutorial/using_multi_gpu.html
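For the target='dev0' / target='dev1' names to resolve, the linked tutorial maps them to physical GPUs through a Theano flag before the script runs; a sketch, with the device numbers as assumptions:

    THEANO_FLAGS="contexts=dev0->cuda0;dev1->cuda1" python script.py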
  26. High-Level Libraries
    • Keras is a minimalist, highly modular neural network library in the spirit of Torch, written in Python, that uses Theano under the hood for optimized tensor manipulation on GPU and CPU.
    • Pylearn2 is a library that wraps many models and training algorithms, such as stochastic gradient descent, that are commonly used in deep learning. Its functional libraries are built on top of Theano.
    • Lasagne is a lightweight library to build and train neural networks in Theano. It is governed by the principles of simplicity, transparency, modularity, pragmatism, focus, and restraint.
    • Blocks is a framework that helps you build neural network models on top of Theano.
    http://www.teglor.com/b/deep-learning-libraries-language-cm569/
  27. Keras
    • A high-level Python library built on top of Theano (and it also works with TensorFlow).
    • Designed for NN.
    • Modular and easy to extend.
    • Based on layers and their inputs / outputs.
    http://www.teglor.com/b/deep-learning-libraries-language-cm569/
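As a taste of the layer-based style, the 3-input, 2-output sigmoid layer from the earlier Theano slides might be written in Keras as follows; this is a sketch using the Sequential API of the 2017-era library, and the loss/optimizer choices are illustrative:

    from keras.models import Sequential
    from keras.layers import Dense

    # one dense layer replaces the explicit W, dot, and sigmoid of the
    # Theano version
    model = Sequential()
    model.add(Dense(2, input_dim=3, activation='sigmoid'))

    model.compile(loss='mse', optimizer='sgd')
    # model.fit(x_train, y_train, ...) would then run the training loop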
  28. Keras
    http://www.teglor.com/b/deep-learning-libraries-language-cm569/
  29. TensorFlow vs. Theano
    • Common to both:
      – Python as the API language.
      – Symbolic mathematical representation -> computation graph.
      – Support for automatic symbolic differentiation.
    • TensorFlow, in addition:
      – Supports parallel computation.
      – Can partition the computation graph.
      – Provides coordination mechanisms for parallel computation.
      – Supports heterogeneous hardware.
      – Automatically allocates devices (GPU, CPU, mobile) to computation nodes.
      – Has a visualization module (TensorBoard).
    • Implementation:
      – Theano compiles Python code into C or CUDA internal modules.
      – TensorFlow is a C++ library with a thin Python interface.
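For comparison, the same tiny sigmoid layer in 2017-era (1.x) TensorFlow; the shapes and values are illustrative assumptions:

    import numpy as np
    import tensorflow as tf

    # build the graph: placeholder input, variable weights, sigmoid output
    x = tf.placeholder(tf.float32, shape=(None, 3))
    W = tf.Variable(tf.random_uniform((3, 2), -0.1, 0.1))
    y = tf.sigmoid(tf.matmul(x, W))

    # run it: sessions play the role of theano.function
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        print(sess.run(y, feed_dict={x: np.array([[1.0, 2.0, 3.0]])}))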