Deep Learning with Python and TensorFlow

Python has many libraries for scientific computing, data analysis, and machine learning. But there are many questions to answer when starting a machine learning project: Which library do you use? How do they compare to each other? How can you use a trained model in your production application?

TensorFlow is a new open-source framework created at Google for building Deep Learning applications. TensorFlow lets you construct easy-to-understand data flow graphs in Python that form a mathematical and logical pipeline. Data flow graphs make complicated algorithms easier to visualize and allow training operations to run across multiple GPUs in parallel.
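
As a rough illustration (not from the talk itself), here is a minimal sketch of such a data flow graph in the TensorFlow Python API of that era; the device string, shapes, and values are arbitrary:

    import tensorflow as tf

    # Build a tiny data flow graph: nodes are operations, edges are tensors.
    with tf.device('/gpu:0'):            # pin these ops to a GPU if one is available
        x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
        y = tf.matmul(x, x)              # a MatMul node in the graph

    # allow_soft_placement falls back to the CPU when no GPU is present.
    with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
        print(sess.run(y))               # the graph only runs inside a Session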

In this talk I will discuss how you can use TensorFlow to create Deep Learning applications and how it compares to other Python machine learning libraries like Theano or Chainer. Finally, I will discuss how trained TensorFlow models can be deployed into a production system using TensorFlow Serving.


Ian Lewis

July 22, 2016

Transcript

  1. Deep Learning with Python & TensorFlow EuroPython 2016 #EuroPython

  2. Ian Lewis, Developer Advocate - Google Cloud Platform, Tokyo, Japan. +Ian Lewis @IanMLewis

  3. Confidential & Proprietary Google Cloud Platform 3

  4. Confidential & Proprietary Google Cloud Platform 4

  5. Deep Learning 101

  6. Diagram: pixels (Input) → Hidden layers → Output (label): ["cat"]

  7. How do you classify these data points? A Neural Network can find a way to solve the problem.

  8. Confidential & Proprietary Google Cloud Platform 8

  9. (x,y,z,?,?,?,?,...)

  10. v[x] => vector

  11. m[x][y] => matrix

  12. t[x][y][z][?][?]... => tensor

  13. Google Cloud Platform 13

  14. Confidential & Proprietary Google Cloud Platform 14

  15. Breakthroughs

  16. From: Andrew Ng

  17. The Inception model (GoogLeNet, 2015)

  18. DNN = large matrix ops. A few GPUs >> a CPU (but it still takes hours/days to train). A supercomputer >> a few GPUs (but you don't have a supercomputer). You need Distributed Training.

  19. None
  20. None
  21. What's the scalability of Google Brain? "Large Scale Distributed Systems for Training Neural Networks", NIPS 2015: Inception / ImageNet: 40x with 50 GPUs; RankBrain: 300x with 500 nodes.

  22. TensorFlow

  23. What is TensorFlow? Google's open source library for machine intelligence. tensorflow.org, launched in Nov 2015. Google's second-generation machine learning system. Used by many production ML projects.

  24. TensorFlow: a data flow computation framework. Operates over tensors (n-dimensional arrays) using a flow graph.
    • Flexible, intuitive construction
    • Automatic differentiation
    • Support for threads, queues, and asynchronous computation; distributed runtime
    • Train on CPUs, GPUs
    • Run wherever you like

  25. Core TensorFlow data structures and concepts:
    - Graph: a TensorFlow computation, represented as a dataflow graph; a collection of ops that may be executed together as a group
    - Operation: a graph node that performs computation on tensors
    - Tensor: a handle to one of the outputs of an Operation; provides a means of computing the value in a TensorFlow Session

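    (Not from the slides) A minimal sketch tying these concepts together, using the 2016-era TensorFlow Python API:

    import tensorflow as tf

    # Each call adds an Operation (a graph node) to the default Graph and
    # returns a Tensor: a handle to that operation's output. Nothing runs yet.
    a = tf.constant(3.0)
    b = tf.constant(4.0)
    total = tf.add(a, b)             # an Add Operation; `total` is a Tensor

    print(total.graph is tf.get_default_graph())   # True: the op lives in a Graph

    # A Session provides the environment for computing the Tensor's value.
    with tf.Session() as sess:
        print(sess.run(total))       # 7.0
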
  26. Core TensorFlow data structures and concepts:
    - Constants
    - Placeholders: must be fed with data on execution
    - Variables: a modifiable tensor that lives in TensorFlow's graph of interacting operations
    - Session: encapsulates the environment in which Operation objects are executed and Tensor objects are evaluated

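    (Not from the slides) A short sketch showing a constant, a placeholder, and a variable evaluated in a Session; shapes, values, and the era-specific initializer call are illustrative:

    import tensorflow as tf

    x = tf.placeholder(tf.float32, shape=[None, 3])   # must be fed at run time
    w = tf.Variable(tf.zeros([3, 1]))                  # modifiable tensor in the graph
    b = tf.constant(1.0)                               # fixed value
    y = tf.matmul(x, w) + b

    with tf.Session() as sess:
        sess.run(tf.initialize_all_variables())        # variables need explicit init
        print(sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0]]}))   # feed the placeholder
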
  27. Operations
    Category               Examples
    Element-wise math ops  Add, Sub, Mul, Div, Exp, Log, Greater, Less…
    Array ops              Concat, Slice, Split, Constant, Rank, Shape…
    Matrix ops             MatMul, MatrixInverse, MatrixDeterminant…
    Stateful ops           Variable, Assign, AssignAdd…
    NN building blocks     SoftMax, Sigmoid, ReLU, Convolution2D…
    Checkpointing ops      Save, Restore
    Queue & synch ops      Enqueue, Dequeue, MutexAcquire…
    Control flow ops       Merge, Switch, Enter, Leave…

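    (Not from the slides) For illustration, a couple of these ops in use: one matrix op and two NN building blocks; values are arbitrary:

    import tensorflow as tf

    x = tf.constant([[1.0, -2.0, 3.0]])                      # 1x3 input
    w = tf.constant([[0.1, -0.3], [0.2, 0.4], [0.3, 0.0]])   # 3x2 weights
    hidden = tf.nn.relu(tf.matmul(x, w))                     # MatMul + ReLU
    probs = tf.nn.softmax(hidden)                            # SoftMax over 2 classes

    with tf.Session() as sess:
        print(sess.run(probs))
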
  28. Google Cloud Platform 28

  29. Distributed Training with TensorFlow

  30. Model Parallelism = split model, share data

  31. Distributed Training
    Model Parallelism
    - Sub-Graph: allows fine-grained application of parallelism to slow graph components; larger, more complex graph
    - Full Graph: code is more similar to single-process models; not necessarily as performant (large models)
    Data Parallelism
    - Synchronous: prevents workers from "falling behind"; workers progress at the speed of the slowest worker
    - Asynchronous: workers advance as fast as they can; can result in runs that aren't reproducible or behavior that is difficult to debug (large models)

  32. Distributed Training with TensorFlow
    • CPU/GPU scheduling
    • Communications: Local, RPC, RDMA; 32/16/8-bit quantization
    • Cost-based optimization
    • Fault tolerance

  33. Model Parallelism: Full Graph Replication
    Similar code runs on each worker, and workers use flags to determine their role in the cluster:

    server = tf.train.Server(cluster_def, job_name=this_job_name,
                             task_index=this_task_index)
    if this_job_name == 'ps':
        server.join()
    elif this_job_name == 'worker':
        # cont'd

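    (Not from the slides) The snippet assumes a cluster_def and role flags defined elsewhere; a hypothetical sketch of what those might look like, with made-up host names and ports:

    import tensorflow as tf

    # Hypothetical cluster definition: two parameter servers and two workers.
    cluster_def = tf.train.ClusterSpec({
        'ps':     ['ps-0:8080', 'ps-1:8080'],
        'worker': ['worker-0:8080', 'worker-1:8080'],
    })

    # In practice these come from command-line flags,
    # e.g. --job_name=worker --task_index=0
    this_job_name = 'worker'
    this_task_index = 0
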
  34. Model Parallelism: Full Graph Replication
    Copies of each variable and op are deterministically assigned to parameter servers and workers:

    with tf.device(tf.train.replica_device_setter(
            worker_device="/job:worker/task:{}".format(this_task_index),
            cluster=cluster_def)):
        # Build the model
        global_step = tf.Variable(0)
        train_op = tf.train.AdagradOptimizer(0.01).minimize(
            loss, global_step=global_step)

  35. Model Parallelism: Full Graph Replication
    Workers coordinate once-per-cluster tasks using a Supervisor and train independently:

    sv = tf.train.Supervisor(
        is_chief=(this_task_index == 0),
        # training, summary and initialization ops
    )
    with sv.managed_session(server.target) as sess:
        step = 0
        while not sv.should_stop() and step < 1000000:
            # Run a training step asynchronously.
            _, step = sess.run([train_op, global_step])

  36. Model Parallelism: Sub-Graph Replication
    Can pin operations specifically to individual nodes in the cluster:

    with tf.Graph().as_default():
        losses = []
        for worker in loss_workers:
            with tf.device(worker):
                # Computationally expensive model section,
                # e.g. loss calculation
                losses.append(loss)

  37. Model Parallelism: Sub-Graph Replication
    Can use a single synchronized training step, averaging losses from multiple workers:

    with tf.device(master):
        losses_avg = tf.add_n(losses) / len(workers)
        train_op = tf.train.AdagradOptimizer(0.01).minimize(
            losses_avg, global_step=global_step)

    with tf.Session('grpc://master.address:8080') as sess:
        step = 0
        while step < num_steps:
            _, step = sess.run([train_op, global_step])

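    (Not from the slides) A self-contained, single-machine analogue of the loss-averaging pattern above; CPU devices stand in for remote workers and the toy loss is made up, so the sketch runs anywhere:

    import tensorflow as tf

    # Stand-ins for remote workers; on a real cluster these would be
    # '/job:worker/task:N' device strings.
    workers = ['/cpu:0', '/cpu:0']

    w = tf.Variable(2.0)
    losses = []
    for worker in workers:
        with tf.device(worker):
            # Toy "expensive" loss computed on each worker's device.
            losses.append(tf.square(w - 5.0))

    avg_loss = tf.add_n(losses) / len(workers)
    global_step = tf.Variable(0, trainable=False)
    train_op = tf.train.AdagradOptimizer(1.0).minimize(
        avg_loss, global_step=global_step)

    with tf.Session() as sess:
        sess.run(tf.initialize_all_variables())
        for _ in range(5):
            _, step, loss_val = sess.run([train_op, global_step, avg_loss])
            print(step, loss_val)
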
  38. Data Parallelism = split data, share model

  39. Data Parallelism: Asynchronous

    train_op = tf.train.AdagradOptimizer(1.0, use_locking=False).minimize(
        loss, global_step=gs)

  40. Data Parallelism: Synchronous

    for worker in workers:
        with tf.device(worker):
            # expensive computation, e.g. loss
            losses.append(loss)

    with tf.device(master):
        avg_loss = tf.add_n(losses) / len(workers)
        train_op = tf.train.AdagradOptimizer(1.0).minimize(avg_loss, global_step=gs)

  41. Kubernetes as a TensorFlow Cluster Manager
    Diagram: a jupyter-server (Jupyter Ingress :80) and a tensorboard-server (Tensorboard Ingress :6006) in front of a tensorflow-worker (master), parameter servers ps-0 and ps-1, and workers worker-0 through worker-14, all communicating over gRPC :8080.

  42. Why is this important?

  43. "dog"

  44. Tweak Me!

  45. Tweak Me?!?

  46. ¯\_(ツ)_/¯

  47. Google Cloud Platform 47

  48. Cloud Machine Learning (Cloud ML)
    Fully managed, distributed training and prediction for custom TensorFlow graphs. Supports regression and classification initially. Integrated with Cloud Dataflow and Cloud Datalab. Limited Preview - cloud.google.com/ml

  49. Cloud ML (Jeff Dean's keynote: YouTube video)
    Define a custom TensorFlow graph. Training locally: 8.3 hours with 1 node. Training in the cloud: 32 min with 20 nodes (15x faster). Prediction in the cloud at 300 reqs/sec.

  50. Tensor Processing Unit: an ASIC for TensorFlow designed by Google. 10x better perf/watt. 8-bit quantization.

  51. Thank You
    https://www.tensorflow.org/
    https://cloud.google.com/ml/
    http://bit.ly/tensorflow-workshop
    Ian Lewis @IanMLewis