
Distributed TensorFlow and Clouds

Gleb Ivashkevich (independent developer) @ Moscow Python Conf 2017
"Tensorflow быстро стал одним из самых популярных фреймворков для глубокого обучения. Но несмотря на свою гибкость и мощь, в нем есть немало плохо документированных, да и просто сложных элементов. Мы разберемся с некоторыми из них: работой на нескольких графических процессорах и распределенным использованием Tensorflow.
Системы с несколькими GPU - распространенная данность и мы рассмотрим несколько вариантов использования таких систем из Tensorflow. Распределенные системы более экзотичны, поэтому мы попробуем понять, когда они действительно нужны и насколько сложно с ними работать. Во всем этом нам поможет Amazon Web Services.
Без сравнения Tensorflow с конкурентами рассказ был бы неполным, поэтому мы немного покритикуем TF (и, возможно, сделаем несколько комплиментов MXNet) и разберемся, почему несмотря на некоторые недостатки Tensorflow остается лидером".
Видео: https://conf.python.ru/raspredelennyj-tensorflow-i-oblaka/

Moscow Python Meetup

October 20, 2017

Transcript

  1. WTF? What is TensorFlow?
     - library for numerical computation
     - computation = graph
     - graph = nodes (operations) + edges (tensors)
     - works on CPU, GPU (desktop, server, mobile)
     - from Google
  2. WTF? What is TensorFlow?
     - great for deep learning: from research to deployment
     - deep learning is intrinsically parallel
     - batteries included: neural net operations, tools to compute in parallel
  3. Deep learning hardware 101
     - main workhorse: GPU, extremely efficient for parallel operations
     - basic: single machine + single GPU
     - intermediate: single machine + multiple GPUs
     - advanced: multiple machines + multiple GPUs each
  4. Why and when to go parallel
     - data is large: training takes too long
     - model is large: GPU memory is limited
     - you just have multiple GPUs (lucky you)
     - data access and transfer patterns are important
  5. TensorFlow at a glance: graphs and variables
     - computations are arranged as a graph
     - tensors "flow" between nodes
     - variables = persistent tensors
     (diagram: x and y feed an add op, which produces a tensor)

     import tensorflow as tf

     x = tf.Variable(initial_values)
     y = tf.get_variable(var_name)
     z = x + y
  6. TensorFlow at a glance: placeholders and sessions
     - placeholder: to be fed on execution
     - session manages execution, the actual graph evaluation

     import tensorflow as tf

     x = tf.placeholder(tf.float32)
     y = tf.placeholder(tf.float32)
     z = x + y

     with tf.Session() as sess:
         sess.run(tf.global_variables_initializer())
         result = z.eval(feed_dict={x: x_val, y: y_val})
  7. Faces of parallelism: naive
     Run the training script multiple times on different GPUs:

     > CUDA_VISIBLE_DEVICES=0 python train_model.py
     > CUDA_VISIBLE_DEVICES=1 python train_model.py
     …

     OK for an initial hyperparameter search
  8. Faces of parallelism: data parallel
     (diagram: model replicas on device 1 and device 2 each compute loss and
     gradients on their own batch; device 0 averages the gradients and applies
     the update)
     - multiple replicas of the model graph
     - each works on a different set of batches
     - aggregate gradients, update variables
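     A minimal sketch of this pattern in TF 1.x-style code, for illustration only:
     one model "tower" per GPU, gradients averaged on the CPU, a single variable
     update. The helpers build_loss() and next_batch() and the constant NUM_GPUS
     are placeholders, not part of the talk.

     import tensorflow as tf

     NUM_GPUS = 2                                      # assumption: two GPUs
     opt = tf.train.GradientDescentOptimizer(0.01)

     tower_grads = []
     for i in range(NUM_GPUS):
         with tf.device("/gpu:%d" % i):
             with tf.variable_scope("model", reuse=(i > 0)):   # share variables across towers
                 loss = build_loss(next_batch(i))              # hypothetical helpers
                 tower_grads.append(opt.compute_gradients(loss))

     with tf.device("/cpu:0"):                         # aggregation / sync point
         averaged = []
         for grads_and_vars in zip(*tower_grads):      # one tuple per shared variable
             grads = [g for g, _ in grads_and_vars]
             var = grads_and_vars[0][1]
             averaged.append((tf.reduce_mean(tf.stack(grads), axis=0), var))
         train_op = opt.apply_gradients(averaged)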
  9. Faces of parallelism: task parallel
     (diagram: sub_graph_0 and sub_graph_3 on device 0, sub_graph_1 on device 1,
     sub_graph_2 on device 2, connected by dataflow edges)
     - split the graph between devices
     - TensorFlow will handle running it in parallel
     - good for large, complex models
  10. Faces of parallelism: device placement

     import tensorflow as tf

     with tf.device("/gpu:0"):        # lives on the first GPU
         a = tf.placeholder(tf.float32)
         b = tf.placeholder(tf.float32)
         ab = tf.matmul(a, b)

     with tf.device("/gpu:1"):        # lives on the second GPU
         c = tf.placeholder(tf.float32)
         d = tf.placeholder(tf.float32)
         cd = tf.matmul(c, d)

     with tf.device("/cpu:0"):        # lives on the CPU, also a sync point
         res = ab + cd

     sess = tf.Session()
     result = sess.run(res, feed_dict={a: a_val, …})

     log_device_placement is useful to understand what is going on
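     For reference, a minimal sketch of switching on log_device_placement via the
     session config (standard TF 1.x API; the constants are only illustrative):

     import tensorflow as tf

     with tf.device("/gpu:0"):
         a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
         b = tf.constant([[1.0, 0.0], [0.0, 1.0]])
         ab = tf.matmul(a, b)

     # log_device_placement makes TensorFlow log the device chosen for every op,
     # so manual placement can be verified.
     config = tf.ConfigProto(log_device_placement=True)
     with tf.Session(config=config) as sess:
         print(sess.run(ab))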
  11. Faces of parallelism: choice
     - large data, moderate model: data parallel (sync or async)
     - moderate data, large model: task parallel
     - large data, large model: consider distributed
     - what we do not cover: queues
  12. Faces of parallelism: distributed
     TensorFlow provides powerful (but complex) tools for distributed computing
     (diagram: a cluster with job:ps holding task:0 and task:1, and job:worker
     holding task:0 through task:3, each task bound to its own address)
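     A minimal sketch of a ClusterSpec matching the layout on the slide above:
     two parameter servers and four workers. The host names and ports are
     placeholders.

     import tensorflow as tf

     cluster = tf.train.ClusterSpec({
         "ps":     ["ps0.example.com:2222", "ps1.example.com:2222"],
         "worker": ["worker0.example.com:2222", "worker1.example.com:2222",
                    "worker2.example.com:2222", "worker3.example.com:2222"],
     })

     # Each process in the cluster starts one server for its own
     # (job_name, task_index) pair, e.g. the second worker:
     server = tf.train.Server(cluster, job_name="worker", task_index=1)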
  13. Faces of parallelism: cluster

     server:

     import tensorflow as tf

     cluster = tf.train.ClusterSpec({"ps": [], "worker": [server_addr]})
     server = tf.train.Server(cluster, job_name='worker', task_index=0)
     server.join()

     client:

     import tensorflow as tf

     with tf.Session("grpc://server_addr") as sess:
         result = sess.run(...)
  14. Faces of parallelism: cluster

     import tensorflow as tf

     # set flags etc ...

     ps = [ps_addr_0, ps_addr_1, ...]               # parameter servers
     workers = [server_addr_0, server_addr_1, ...]  # worker servers

     cluster = tf.train.ClusterSpec({"ps": ps, "worker": workers})
     server = tf.train.Server(cluster, job_name=FLAGS.job_name,
                              task_index=FLAGS.task_idx)

     if FLAGS.job_name == "ps":
         server.join()
     elif FLAGS.job_name == "worker":
         # do work
  15. Faces of parallelism: cluster

     device placement:

     ...
     elif FLAGS.job_name == "worker":
         with tf.device("/job:ps/task:0/cpu:0"):
             x = tf.placeholder(tf.float32, ...)
             # do some stuff ...

     better way:

     ...
     elif FLAGS.job_name == "worker":
         with tf.device(tf.train.replica_device_setter(cluster=cluster,
                                                       worker_device=this_worker)):
             x = tf.placeholder(tf.float32, ...)
             # do some stuff ...
  16. Faces of parallelism: wrap-up
     Useful stuff:
     - ClusterSpec, Server - define the cluster and a server
     - tf.train.replica_device_setter - spread variables across parameter servers
     - Supervisor - handle crashes, saving etc.
     Does it work? https://www.tensorflow.org/performance/benchmarks
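     A minimal sketch of the Supervisor pattern on a worker (TF 1.x; later
     releases replace it with MonitoredTrainingSession). It assumes cluster,
     server, train_op, global_step and FLAGS were built as on the previous
     slides; the log directory is a placeholder.

     import tensorflow as tf

     # The chief (here: worker task 0) handles checkpointing and recovery.
     sv = tf.train.Supervisor(is_chief=(FLAGS.task_idx == 0),
                              logdir="/tmp/train_logs",      # placeholder path
                              global_step=global_step)

     # managed_session restores from the last checkpoint after a crash.
     with sv.managed_session(server.target) as sess:
         while not sv.should_stop():
             _, step = sess.run([train_op, global_step])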
  17. Faces of parallelism: cloudy
     Small scale:
     - use TensorFlow-provided abstractions
     - do everything by hand
     - store data on disk as is
     - feed data manually
     Large scale:
     - use a cluster manager: Kubernetes, Mesos etc.
     - dockerized workers
     - distributed FS (HDFS, Amazon EFS)
     - run on Google Cloud ML, TensorPort?
     - run over Spark: TensorFlowOnSpark, Databricks
  18. TF vs MXNet (Gluon)
     Some MXNet pros:
     - almost as simple as Keras, but explicitly exposes CPU/GPU tensors and device contexts
     - Gluon looks promising
     Some TensorFlow pros:
     - TensorBoard, TensorFlow Serving, TF for Android
     - do whatever you want: it's not always easy, but you can
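     To illustrate the point about explicit device contexts, a minimal MXNet
     sketch (standard mx.nd API; the shapes are arbitrary):

     import mxnet as mx

     a = mx.nd.ones((2, 2), ctx=mx.gpu(0))   # this tensor lives on the first GPU
     b = mx.nd.ones((2, 2), ctx=mx.cpu())    # this one lives on the CPU
     c = a.copyto(mx.cpu()) + b              # transfers are explicit, not implicit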
  19. Further reading
     - TensorFlow white papers: https://www.tensorflow.org/about/bib
     - TensorFlow architecture: https://www.tensorflow.org/extend/architecture
     - Awesome TensorFlow: https://github.com/jtoy/awesome-tensorflow
     - TensorFlow Dev Summit 2017 videos: https://events.withgoogle.com/tensorflow-dev-summit/
     - "Learning TensorFlow", book by Tom Hope et al.
     - "Hands-On Machine Learning with Scikit-Learn and TensorFlow" by Aurélien Géron