
Distributed TensorFlow and the cloud

Glib Ivashkevych (independent developer) @ Moscow Python Conf 2017
"TensorFlow has quickly become one of the most popular deep learning frameworks. But for all its flexibility and power, it has quite a few poorly documented or simply tricky parts. We will work through some of them: running on multiple GPUs and using TensorFlow in a distributed setting.
Multi-GPU machines are a common reality, and we will look at several ways to use them from TensorFlow. Distributed setups are more exotic, so we will try to understand when they are really needed and how hard they are to work with. Amazon Web Services will help us along the way.
The talk would be incomplete without comparing TensorFlow to its competitors, so we will criticize TF a bit (and perhaps pay MXNet a few compliments) and figure out why, despite some shortcomings, TensorFlow remains the leader."
Video: https://conf.python.ru/raspredelennyj-tensorflow-i-oblaka/

Moscow Python Meetup, October 20, 2017
Transcript

  1. Distributed TensorFlow
    and cloud
    Glib Ivashkevych
    independent developer


  2. WTF? What is TensorFlow?
    - library for numerical computation
    - computation = graph
    - graph = nodes (operations) + edges (tensors)
    - works on CPU, GPU (desktop, server, mobile)
    - from Google


  3. WTF? What is TensorFlow?
    - great for deep learning: from research to deployment
    - deep learning is intrinsically parallel
    - batteries included: neural net operations, tools to compute in parallel


  4. Deep learning hardware 101
    main workhorse: GPU
    extremely efficient for parallel operations
    basic: single machine + single GPU
    intermediate: single machine + multiple GPUs
    advanced: multiple machines + multiple GPUs each


  5. Why and when to go parallel
    data is large - training takes too long
    model is large - GPU memory is limited
    you just have multiple GPUs (lucky you)
    data access and transfer patterns are important


  6. TensorFlow at a glance: graphs and variables
    - computations are arranged as a graph
    - tensors “flow” between nodes
    - variables = persistent tensors

    [diagram: nodes x and y feed add_op, which produces tensor_1]

    import tensorflow as tf

    x = tf.Variable(initial_values)   # initial_values: any tensor-like value
    y = tf.get_variable(var_name)     # look up (or create) a variable by name
    z = x + y                         # the add_op node, produces a new tensor


  7. TensorFlow at a glance: placeholders and sessions
    placeholder: to be fed on execution

    import tensorflow as tf

    x = tf.placeholder(tf.float32)
    y = tf.placeholder(tf.float32)
    z = x + y

    with tf.Session() as sess:        # session manages execution
        sess.run(tf.global_variables_initializer())
        result = z.eval(feed_dict={x: x_val, y: y_val})   # actual graph evaluation
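
    A minimal runnable version of the two snippets above (the values and the extra
    variable w are illustrative, added here just to show both concepts together):

    import tensorflow as tf

    w = tf.Variable(2.0, name="w")            # persistent tensor, kept between run() calls
    x = tf.placeholder(tf.float32, name="x")  # fed at execution time
    y = w * x                                 # graph node, nothing is computed yet

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())   # variables must be initialized
        print(sess.run(y, feed_dict={x: 3.0}))         # -> 6.0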


  8. Faces of parallelism: naive
    Run training script multiple times on different GPUs:
    > CUDA_VISIBLE_DEVICES=0 python train_model.py
    > CUDA_VISIBLE_DEVICES=1 python train_model.py

    OK for an initial hyperparameter search


  9. Faces of parallelism: data parallel
    - multiple replicas of the model graph
    - each works on a different set of batches
    - aggregate gradients, update variables

    [diagram: device 1 and device 2 each run a graph replica (graph -> loss -> grad) on
    batch 1 / batch 2; device 0 averages the gradients and applies the update]
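
    A rough sketch of this pattern in TF 1.x; model_fn, the optimizer and the per-device
    batches are stand-ins for your own code, not something from the slides:

    import tensorflow as tf

    def data_parallel_train_op(model_fn, batches, learning_rate=0.01):
        # model_fn(batch) is assumed to build one replica via tf.get_variable
        # and return a scalar loss
        opt = tf.train.GradientDescentOptimizer(learning_rate)
        tower_grads = []
        for i, batch in enumerate(batches):
            with tf.device("/gpu:%d" % i):
                loss = model_fn(batch)                    # replica of the model graph
                tower_grads.append(opt.compute_gradients(loss))
            tf.get_variable_scope().reuse_variables()     # share variables between replicas

        # aggregate: average each variable's gradient over the replicas, apply once
        averaged = []
        for grads_and_vars in zip(*tower_grads):
            grads = tf.stack([g for g, _ in grads_and_vars])
            averaged.append((tf.reduce_mean(grads, axis=0), grads_and_vars[0][1]))
        return opt.apply_gradients(averaged)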


  10. Faces of parallelism: task parallel
    - split the graph between devices
    - TensorFlow will handle running it in parallel
    - good for large, complex models

    [diagram: dataflow through sub_graph_0 (device 0), sub_graph_1 (device 1),
    sub_graph_2 (device 2) and sub_graph_3 (device 0)]


  11. Faces of parallelism: device placement

    import tensorflow as tf

    with tf.device("/gpu:0"):         # lives on the first GPU
        a = tf.placeholder(tf.float32)
        b = tf.placeholder(tf.float32)
        ab = tf.matmul(a, b)

    with tf.device("/gpu:1"):         # lives on the second GPU
        c = tf.placeholder(tf.float32)
        d = tf.placeholder(tf.float32)
        cd = tf.matmul(c, d)

    with tf.device("/cpu:0"):         # lives on the CPU, also the sync point
        res = ab + cd

    sess = tf.Session()
    result = sess.run(res, feed_dict={a: a_val, …})

    log_device_placement is useful to understand what is going on
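
    To actually see that placement log, the session can be created with
    log_device_placement enabled:

    import tensorflow as tf

    config = tf.ConfigProto(log_device_placement=True)  # log the device of every op
    sess = tf.Session(config=config)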


  12. Faces of parallelism: choice
    large data, moderate model: data (sync or async)
    moderate data, large model: task
    large data, large model: consider distributed
    what we do not cover: queues


  13. Faces of parallelism: distributed
    TensorFlow provides powerful (but complex) tools for distributed computing

    [diagram: the cluster is split into jobs; job:ps has task:0 (address_0) and task:1
    (address_1), job:worker has task:0 ... task:3 (address_2 ... address_5)]
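
    The cluster in the diagram (2 parameter servers, 4 workers) would be described with a
    ClusterSpec roughly like this; the host:port strings are placeholders:

    import tensorflow as tf

    cluster = tf.train.ClusterSpec({
        "ps":     ["address_0:2222", "address_1:2222"],
        "worker": ["address_2:2222", "address_3:2222",
                   "address_4:2222", "address_5:2222"],
    })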


  14. Faces of parallelism: cluster

    server:

    import tensorflow as tf

    cluster = tf.train.ClusterSpec({"ps": [], "worker": [server_addr]})
    server = tf.train.Server(cluster, job_name='worker', task_index=0)
    server.join()

    client:

    import tensorflow as tf

    with tf.Session("grpc://server_addr") as sess:
        result = sess.run(...)
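
    The same split can be tried inside a single process: tf.train.Server.create_local_server()
    starts an in-process, single-task cluster that a Session can talk to over gRPC:

    import tensorflow as tf

    server = tf.train.Server.create_local_server()  # in-process single-worker "cluster"
    x = tf.constant(2) + tf.constant(3)

    with tf.Session(server.target) as sess:         # server.target is its grpc:// address
        print(sess.run(x))                          # -> 5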


  15. Faces of parallelism: cluster

    import tensorflow as tf

    # set flags etc
    ...

    ps = [ps_addr_0, ps_addr_1, ...]              # parameter servers
    workers = [server_addr_0, server_addr_1, ...] # worker servers

    cluster = tf.train.ClusterSpec({"ps": ps, "worker": workers})
    server = tf.train.Server(cluster, job_name=FLAGS.job_name, task_index=FLAGS.task_idx)

    if FLAGS.job_name == "ps":
        server.join()
    elif FLAGS.job_name == "worker":
        # do work
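
    The FLAGS used above are not defined on the slide; with tf.app.flags they could look
    like this (the names match the snippet, the defaults are arbitrary):

    import tensorflow as tf

    tf.app.flags.DEFINE_string("job_name", "worker", "either 'ps' or 'worker'")
    tf.app.flags.DEFINE_integer("task_idx", 0, "index of the task within its job")
    FLAGS = tf.app.flags.FLAGS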


  16. Faces of parallelism: cluster

    device placement:

    ...
    elif FLAGS.job_name == "worker":
        with tf.device("/job:ps/task:0/cpu:0"):
            x = tf.placeholder(tf.float32, ...)
            # do some stuff
    ...

    better way:

    ...
    elif FLAGS.job_name == "worker":
        with tf.device(tf.train.replica_device_setter(cluster=cluster, worker_device=this_worker)):
            x = tf.placeholder(tf.float32, ...)
            # do some stuff
    ...
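
    What replica_device_setter buys you, shown on a toy graph (the cluster and the shapes
    are made up): variables land on the ps tasks in round-robin order, everything else
    stays on the worker.

    import tensorflow as tf

    cluster = tf.train.ClusterSpec({"ps": ["ps0:2222", "ps1:2222"],
                                    "worker": ["worker0:2222"]})

    with tf.device(tf.train.replica_device_setter(cluster=cluster,
                                                  worker_device="/job:worker/task:0")):
        x = tf.placeholder(tf.float32, [None, 784])
        w = tf.Variable(tf.zeros([784, 10]))  # placed on /job:ps/task:0
        b = tf.Variable(tf.zeros([10]))       # placed on /job:ps/task:1 (round-robin)
        logits = tf.matmul(x, w) + b          # non-variable ops stay on the worker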


  17. Faces of parallelism: wrap-up
    Useful stuff:
    - ClusterSpec, Server - define cluster and server
    - tf.train.replica_device_setter - spread variables across parameter servers
    - Supervisor - handle crashes, saving etc.
    Does it work?
    https://www.tensorflow.org/performance/benchmarks
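
    A minimal sketch of the Supervisor pattern mentioned above (FLAGS, server, train_op
    and the logdir come from your own setup):

    import tensorflow as tf

    # exactly one task (usually worker 0) acts as chief: it initializes variables,
    # writes checkpoints and summaries; the other workers wait for it
    sv = tf.train.Supervisor(is_chief=(FLAGS.task_idx == 0), logdir="/tmp/train_logs")

    with sv.managed_session(server.target) as sess:  # restores or initializes the model
        while not sv.should_stop():
            sess.run(train_op)                       # your training step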


  18. Faces of parallelism: cloudy
    Small scale:
    - use TensorFlow provided abstractions
    - do everything by hand
    - store data on disk as is
    - feed data manually
    Large scale:
    - use a cluster manager: Kubernetes, Mesos etc.
    - dockerized workers
    - distributed FS (HDFS, Amazon EFS)
    - run on Google Cloud ML, TensorPort?
    - run over Spark: TensorFlowOnSpark, Databricks


  19. Multi-GPU/distributed TF
    Isn’t it overly complex?


  20. Multi-GPU/distributed TF
    It is.
    TensorFlow is a low-level framework that pretends to be high-level.


  21. TF vs MXNet (gluon)
    Some MXNet pros:
    - Almost as simple as Keras, but explicitly exposes CPU/GPU tensors and device contexts
    - gluon looks promising
    Some TensorFlow pros:
    - TensorBoard, TensorFlow Serving, TF for Android
    - do whatever you want: it’s not always easy, but you can


  22. Further reading
    TensorFlow white papers
    https://www.tensorflow.org/about/bib
    TensorFlow architecture
    https://www.tensorflow.org/extend/architecture
    Awesome TensorFlow
    https://github.com/jtoy/awesome-tensorflow
    TensorFlow Dev Summit 2017 videos
    https://events.withgoogle.com/tensorflow-dev-summit/
    Learning TensorFlow, book by Tom Hope et al
    Hands-On Machine Learning with Scikit-Learn and TensorFlow by Aurélien Géron


  23. questions?
