Machine Intelligence at Google Scale: Vision/Speech API, TensorFlow and Cloud Machine Learning

The biggest challenge in Deep Learning is scalability. With a single GPU server, you have to wait hours or days for training results, which doesn't scale for a production service, so you eventually need distributed training on the cloud. Google has spent years building infrastructure for training large-scale neural networks on the cloud, and has now started to share that technology with external developers. In this session, we will introduce new pre-trained ML services, such as the Cloud Vision API and Speech API, that work without any training. We will also look at how TensorFlow and Cloud Machine Learning can accelerate custom model training by 10x to 40x with Google's distributed training infrastructure.

Kazunori Sato

April 04, 2016

Transcript

  1. Machine Intelligence at Google Scale:
    Vision/Speech API, TensorFlow and Cloud ML

  2. +Kazunori Sato
    @kazunori_279
    Kaz Sato
    Staff Developer Advocate
    Tech Lead for Data & Analytics
    Cloud Platform, Google Inc.

  3. What we’ll cover
    Deep learning and distributed training
    Large scale neural network on Google Cloud
    Cloud Vision API and Speech API
    TensorFlow and Cloud Machine Learning

  4. Deep Learning and
    Distributed Training

  6. From: Andrew Ng

  7. DNN = large matrix ops
    a few GPUs >> a CPU
    (but it still takes days to train)
    a supercomputer >> a few GPUs
    (but you don't have a supercomputer)
    You need Distributed Training on the cloud
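
The "DNN = large matrix ops" point can be seen in a few lines of NumPy: the forward pass of a fully-connected layer is dominated by a single matrix multiply, so training speed is bounded by matrix-op throughput. Sizes below are illustrative (a MNIST-style 784-to-10 layer), not from the talk.

```python
import numpy as np

# A fully-connected layer's forward pass is just a matrix multiply plus bias.
rng = np.random.default_rng(0)
x = rng.standard_normal((128, 784))   # a batch of 128 flattened 28x28 inputs
W = rng.standard_normal((784, 10))    # layer weights
b = np.zeros(10)                      # layer biases

# The dominant cost: a (128 x 784) by (784 x 10) matmul per layer, per step.
logits = x @ W + b
print(logits.shape)                   # (128, 10)
```

GPUs (and TPU-like accelerators) win precisely because they execute these dense matmuls orders of magnitude faster than a CPU.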

  8. Google Brain.
    Large scale neural network on Google Cloud

  10. Google Cloud is
    The Datacenter as a Computer

  11. Jupiter network
    10 GbE x 100 K = 1 Pbps
    Consolidates servers with
    microsec latency

  12. Borg
    No VMs, pure containers
    10K - 20K nodes per Cell
    DC-scale job scheduling
    CPUs, mem, disks and IO

  13. Google Cloud + Neural Network = Google Brain

  14. The Inception model (GoogLeNet, 2015)

  15. What's the scalability of Google Brain?
    "Large Scale Distributed Systems for Training Neural Networks", NIPS 2015
    ○ Inception / ImageNet: 40x with 50 GPUs
    ○ RankBrain: 300x with 500 nodes

  16. Large-scale neural network
    for everyone

  20. Cloud Vision API
    Pre-trained models. No ML skill required
    REST API: receives images and returns a JSON
    $2.50 or $5 / 1,000 units (free to try)
    Public Beta - cloud.google.com/vision
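
As a sketch of what that REST surface looks like, the hypothetical helper below builds the JSON body for a POST to the public `images:annotate` endpoint. Field names follow the v1 REST documentation; `make_annotate_request` is our own name, not part of the API.

```python
import base64

# Public REST endpoint for the Vision API (v1).
VISION_URL = "https://vision.googleapis.com/v1/images:annotate"

def make_annotate_request(image_bytes, features=("LABEL_DETECTION",), max_results=5):
    """Build an images:annotate request body (hypothetical helper)."""
    return {
        "requests": [{
            # Images are sent inline, base64-encoded.
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            # One entry per detection type: LABEL_DETECTION, TEXT_DETECTION,
            # FACE_DETECTION, LANDMARK_DETECTION, ...
            "features": [{"type": f, "maxResults": max_results} for f in features],
        }]
    }

body = make_annotate_request(b"<raw image bytes>", ("LABEL_DETECTION", "TEXT_DETECTION"))
```

The body would then be POSTed to `VISION_URL` with an API key, and the response comes back as JSON with one annotation list per requested feature.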

  22. Demo

  23. Cloud Speech API
    Pre-trained models. No ML skill required
    REST API: receives audio and returns text
    Supports 80+ languages
    Streaming or non-streaming
    Limited Preview - cloud.google.com/speech
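
A non-streaming recognize call has a similarly small request shape. The hypothetical helper below builds one; field names follow the later v1 REST surface (the Limited Preview era API, `v1beta1 syncrecognize`, used a slightly different shape).

```python
import base64

def make_recognize_request(audio_bytes, language_code="en-US", sample_rate_hz=16000):
    """Build a Speech API recognize request body (hypothetical helper)."""
    return {
        "config": {
            "encoding": "LINEAR16",             # raw 16-bit PCM audio
            "sampleRateHertz": sample_rate_hz,  # must match the recording
            "languageCode": language_code,      # one of the 80+ supported languages
        },
        # Short audio is sent inline, base64-encoded; longer audio goes
        # through Cloud Storage instead.
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }

body = make_recognize_request(b"<raw LINEAR16 audio bytes>")
```

The streaming mode mentioned on the slide uses the same config but sends audio chunks over a gRPC stream and returns interim transcripts as you speak.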

  24. Demo Video

  25. TensorFlow

  26. The Machine Learning Spectrum
    From academic / research to industry / applications:
    TensorFlow, Cloud Machine Learning, Machine Learning APIs

  27. What is TensorFlow?
    Google's open source library for machine intelligence
    tensorflow.org, launched in Nov 2015
    The second generation
    Used by many production ML projects

  28. What is TensorFlow?
    Tensor: N-dimensional array
    Flow: data flow computation framework (like MapReduce)
    For Machine Learning and Deep Learning
    Or any HPC (High Performance Computing) applications

  29. # define the network
    import tensorflow as tf
    x = tf.placeholder(tf.float32, [None, 784])
    W = tf.Variable(tf.zeros([784, 10]))
    b = tf.Variable(tf.zeros([10]))
    y = tf.nn.softmax(tf.matmul(x, W) + b)
    # define a training step
    y_ = tf.placeholder(tf.float32, [None, 10])
    xent = -tf.reduce_sum(y_ * tf.log(y))
    step = tf.train.GradientDescentOptimizer(0.01).minimize(xent)

  30. # initialize session
    init = tf.initialize_all_variables()
    sess = tf.Session()
    sess.run(init)
    # training
    # (assumes mnist = input_data.read_data_sets("MNIST_data/", one_hot=True))
    for i in range(1000):
        batch_xs, batch_ys = mnist.train.next_batch(100)
        sess.run(step, feed_dict={x: batch_xs, y_: batch_ys})

  32. Portable
    ● Training on:
    ○ Data Center
    ○ CPUs, GPUs, etc.
    ● Running on:
    ○ Mobile phones
    ○ IoT devices

  33. TensorBoard: visualization tool

  34. Cloud Machine Learning

  35. Cloud Machine Learning (Cloud ML)
    Fully managed, distributed training and prediction for custom TensorFlow graphs
    Supports Regression and Classification initially
    Integrated with Cloud Dataflow and Cloud Datalab
    Limited Preview - cloud.google.com/ml

  37. Distributed Training with TensorFlow

  38. Distributed Training with TensorFlow
    ● CPU/GPU scheduling
    ● Communications
    ○ Local, RPC, RDMA
    ○ 32/16/8 bit quantization
    ● Cost-based optimization
    ● Fault tolerance
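
One of the communication tricks listed here, low-bit quantization, can be sketched in a few lines: ship a minimum and a scale plus one byte per value instead of a 32-bit float, cutting gradient traffic roughly 4x. This is a generic illustration of linear 8-bit quantization, not TensorFlow's actual implementation.

```python
import numpy as np

def quantize_8bit(t):
    """Linearly quantize a float tensor to uint8 plus (offset, scale)."""
    lo, hi = float(t.min()), float(t.max())
    scale = (hi - lo) / 255.0 or 1.0   # avoid divide-by-zero for constant tensors
    q = np.round((t - lo) / scale).astype(np.uint8)
    return q, lo, scale

def dequantize_8bit(q, lo, scale):
    """Reconstruct an approximate float tensor from the quantized form."""
    return q.astype(np.float32) * scale + lo

# A mock gradient tensor: 1000 values, 1 byte each on the wire instead of 4.
g = np.random.default_rng(1).standard_normal(1000).astype(np.float32)
q, lo, scale = quantize_8bit(g)
g_hat = dequantize_8bit(q, lo, scale)  # reconstruction error is at most one step
```

The reconstruction error is bounded by the quantization step, which is usually tolerable for noisy SGD gradients; 16-bit works the same way with a finer grid.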

  39. Data Parallelism
    = split data, share model
    (but an ordinary network is 1,000x slower
    than a GPU and doesn't scale)
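
The split-data, share-model scheme can be sketched with NumPy: shard one mini-batch across several workers, average their gradients, and the update matches what a single big worker would compute. The model and helper names below are a hypothetical toy (linear least squares, equal-sized shards), only for illustration.

```python
import numpy as np

# Each worker holds a full copy of the model (w) and computes gradients on
# its own shard of the data; a parameter server averages the gradients and
# applies the update to the shared model.
rng = np.random.default_rng(42)
X = rng.standard_normal((64, 8))   # one global mini-batch of 64 examples
y = rng.standard_normal(64)
w = np.zeros(8)                    # shared model parameters

def grad(Xs, ys, w):
    # Gradient of the mean squared error 0.5 * mean((Xs @ w - ys)^2).
    return Xs.T @ (Xs @ w - ys) / len(ys)

# Split the batch across 4 workers. Shards are the same size, so the plain
# average of shard gradients equals the full-batch gradient.
shards = np.split(np.arange(64), 4)
g_workers = [grad(X[i], y[i], w) for i in shards]
g_avg = np.mean(g_workers, axis=0)

assert np.allclose(g_avg, grad(X, y, w))  # same as one big worker
w -= 0.1 * g_avg                          # parameter server applies the update
```

The catch the slide points out: in a real cluster each averaging round crosses the network, so unless the interconnect is fast (or gradients are compressed), communication, not compute, becomes the bottleneck.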

  40. Cloud ML demo video

  41. Cloud ML demo
    Jeff Dean's keynote: YouTube video
    Define a custom TensorFlow graph
    Training at local: 8.3 hours w/ 1 node
    Training at cloud: 32 min w/ 20 nodes (15x faster)
    Prediction at cloud at 300 reqs / sec

  42. Summary

  43. Summary
    Ready to use Machine Learning models:
    Cloud Vision API, Cloud Speech API (NEW), Cloud Translate API
    Use your own data to train models:
    Cloud Machine Learning (NEW): Develop - Model - Test,
    with Google BigQuery, Cloud Storage and Cloud Datalab
    (launch stages shown per product on the slide: Alpha / Beta / GA)
    Stay Tuned…

  44. Links & Resources
    "Large Scale Distributed Systems for Training Neural Networks", Jeff Dean and Oriol Vinyals
    Cloud Vision API: cloud.google.com/vision
    Cloud Speech API: cloud.google.com/speech
    TensorFlow: tensorflow.org
    Cloud Machine Learning: cloud.google.com/ml
    Cloud Machine Learning: demo video

  45. Thank you!
