
Machine Intelligence at Google Scale: Vision/Speech API, TensorFlow and Cloud Machine Learning

The biggest challenge in deep learning is scalability. With a single GPU server, you have to wait hours or days for training results. That doesn't scale to production services, so you eventually need distributed training on the cloud. Google has been building infrastructure for training large-scale neural networks on the cloud for years, and has now started to share that technology with external developers. In this session, we introduce new pre-trained ML services, such as the Cloud Vision API and Speech API, that work without any training. We also look at how TensorFlow and Cloud Machine Learning accelerate custom model training by 10x-40x using Google's distributed training infrastructure.


Kazunori Sato

April 04, 2016

Transcript

  1. Machine Intelligence at Google Scale:
    Vision/Speech API, TensorFlow and Cloud ML

  2. +Kazunori Sato
    @kazunori_279
    Kaz Sato
    Staff Developer Advocate
    Tech Lead for Data & Analytics
    Cloud Platform, Google Inc.

  3. What we’ll cover
    Deep learning and distributed training
Large-scale neural networks on Google Cloud
    Cloud Vision API and Speech API
    TensorFlow and Cloud Machine Learning

  4. Deep Learning and
    Distributed Training

  5. From: Andrew Ng

  6. DNN = large matrix ops
    a few GPUs >> a CPU
    (but it still takes days to train)
    a supercomputer >> a few GPUs
    (but you don't have a supercomputer)
    You need Distributed Training on the cloud
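    (Not on the slide, but a minimal NumPy sketch of the point above: a dense layer's forward pass is essentially one large matrix multiplication. All sizes here are arbitrary.)

    import numpy as np

    batch, n_in, n_out = 128, 4096, 4096
    x = np.random.randn(batch, n_in).astype(np.float32)   # input activations
    W = np.random.randn(n_in, n_out).astype(np.float32)   # layer weights
    b = np.zeros(n_out, dtype=np.float32)                 # layer bias

    # one dense layer: a big matmul, plus bias and ReLU
    y = np.maximum(np.dot(x, W) + b, 0.0)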

  7. Google Brain:
    Large-scale neural networks on Google Cloud

  8. Google Cloud is
    The Datacenter as a Computer

  9. Jupiter network
    10 GbE x 100K = 1 Pbps
    Consolidates servers with
    microsec latency

  10. Borg
    No VMs, pure containers
    10K - 20K nodes per Cell
    DC-scale job scheduling
    CPUs, mem, disks and IO

  11. Google Cloud +
    Neural Network =
    Google Brain

  12. The Inception model (GoogLeNet, 2015)

  13. What's the scalability of Google Brain?
    "Large Scale Distributed Systems for Training Neural
    Networks", NIPS 2015
    ○ Inception / ImageNet: 40x with 50 GPUs
    ○ RankBrain: 300x with 500 nodes

  14. Large-scale neural network
    for everyone

  15. Cloud Vision API
    Pre-trained models. No ML skill required
    REST API: receives images and returns JSON
    $2.5 or $5 / 1,000 units (free to try)
    Public Beta - cloud.google.com/vision
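    (Not on the slide: a minimal sketch of calling the Vision API's v1 images:annotate endpoint from Python. The API key and image file name are placeholders.)

    import base64, json, urllib.request

    API_KEY = "YOUR_API_KEY"  # placeholder
    url = "https://vision.googleapis.com/v1/images:annotate?key=" + API_KEY

    with open("image.jpg", "rb") as f:  # placeholder image file
        content = base64.b64encode(f.read()).decode("utf-8")

    body = json.dumps({"requests": [{
        "image": {"content": content},
        "features": [{"type": "LABEL_DETECTION", "maxResults": 5}],
    }]}).encode("utf-8")

    req = urllib.request.Request(url, data=body,
                                 headers={"Content-Type": "application/json"})
    print(json.loads(urllib.request.urlopen(req).read()))  # label annotations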

  16. Cloud Speech API
    Pre-trained models. No ML skill required
    REST API: receives audio and returns text
    Supports 80+ languages
    Streaming or non-streaming
    Limited Preview - cloud.google.com/speech
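    (Not on the slide: a minimal sketch of a non-streaming recognition request, assuming the v1beta1 speech:syncrecognize endpoint of the Limited Preview era. The API key and audio file are placeholders.)

    import base64, json, urllib.request

    API_KEY = "YOUR_API_KEY"  # placeholder
    url = ("https://speech.googleapis.com/v1beta1/"
           "speech:syncrecognize?key=" + API_KEY)

    with open("audio.raw", "rb") as f:  # placeholder: 16 kHz, 16-bit linear PCM
        content = base64.b64encode(f.read()).decode("utf-8")

    body = json.dumps({
        "config": {"encoding": "LINEAR16", "sampleRate": 16000,
                   "languageCode": "en-US"},
        "audio": {"content": content},
    }).encode("utf-8")

    req = urllib.request.Request(url, data=body,
                                 headers={"Content-Type": "application/json"})
    print(json.loads(urllib.request.urlopen(req).read()))  # transcript results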

  17. Demo Video

  18. The Machine Learning Spectrum
    (a spectrum from academic / research to industry / applications:
    TensorFlow → Cloud Machine Learning → Machine Learning APIs)

  19. What is TensorFlow?
    Google's open source library for machine intelligence
    tensorflow.org, launched in Nov 2015
    The second-generation system (successor to DistBelief)
    Used by many production ML projects

  20. What is TensorFlow?
    Tensor: N-dimensional array
    Flow: data flow computation framework (like MapReduce)
    For Machine Learning and Deep Learning
    Or any other HPC (High Performance Computing) application
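    (Not on the slide, but a minimal sketch of the "tensor + flow" idea: ops build a dataflow graph over N-dimensional arrays, and a session executes it.)

    import tensorflow as tf

    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])  # a 2-D tensor (matrix)
    b = tf.constant([[1.0], [1.0]])            # another tensor
    c = tf.matmul(a, b)                        # a node in the dataflow graph

    with tf.Session() as sess:
        print(sess.run(c))                     # runs the graph: [[3.], [7.]]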

  21. # define the network
    import tensorflow as tf
    x = tf.placeholder(tf.float32, [None, 784])
    W = tf.Variable(tf.zeros([784, 10]))
    b = tf.Variable(tf.zeros([10]))
    y = tf.nn.softmax(tf.matmul(x, W) + b)
    # define a training step
    y_ = tf.placeholder(tf.float32, [None, 10])
    xent = -tf.reduce_sum(y_ * tf.log(y))
    step = tf.train.GradientDescentOptimizer(0.01).minimize(xent)

  22. # load the MNIST data (import needed for the loop below)
    from tensorflow.examples.tutorials.mnist import input_data
    mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
    # initialize session
    init = tf.initialize_all_variables()
    sess = tf.Session()
    sess.run(init)
    # training
    for i in range(1000):
        batch_xs, batch_ys = mnist.train.next_batch(100)
        sess.run(step, feed_dict={x: batch_xs, y_: batch_ys})
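    (The slides stop at the training loop; the standard MNIST-tutorial follow-up, not shown in the deck, evaluates the trained model on the test set.)

    # evaluate accuracy on the MNIST test set
    correct = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
    print(sess.run(accuracy, feed_dict={x: mnist.test.images,
                                        y_: mnist.test.labels}))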

  23. Portable
    ● Training on:
    ○ Data Center
    ○ CPUs, GPUs, etc.
    ● Running on:
    ○ Mobile phones
    ○ IoT devices

  24. TensorBoard: visualization tool

  25. Cloud Machine Learning

  26. Cloud Machine Learning (Cloud ML)
    Fully managed, distributed training and prediction
    for custom TensorFlow graphs
    Supports regression and classification initially
    Integrated with Cloud Dataflow and Cloud Datalab
    Limited Preview - cloud.google.com/ml

  27. Distributed Training with TensorFlow

  28. Distributed Training with TensorFlow
    ● CPU/GPU scheduling
    ● Communications
    ○ Local, RPC, RDMA
    ○ 32/16/8 bit quantization
    ● Cost-based optimization
    ● Fault tolerance
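    (Not on the slide: a minimal sketch of how a TensorFlow 0.8+ cluster is declared. The host:port addresses are placeholders; each process runs one tf.train.Server for its own job and task.)

    import tensorflow as tf

    cluster = tf.train.ClusterSpec({
        "ps": ["ps0.example.com:2222"],            # holds the shared variables
        "worker": ["worker0.example.com:2222",
                   "worker1.example.com:2222"],    # run the training ops
    })

    # each process starts a server for its own job name and task index
    server = tf.train.Server(cluster, job_name="worker", task_index=0)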

  29. Data Parallelism
    = split the data, share the model
    (but an ordinary network is
    1,000x slower than a GPU and
    doesn't scale)
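    (Not on the slide: a minimal sketch of data parallelism with the cluster above: the variables, i.e. the shared model, are pinned to the ps job, while each worker builds the same graph over its own shard of the data.)

    with tf.device("/job:ps/task:0"):              # shared model parameters
        W = tf.Variable(tf.zeros([784, 10]))
        b = tf.Variable(tf.zeros([10]))

    with tf.device("/job:worker/task:0"):          # this replica's computation
        x = tf.placeholder(tf.float32, [None, 784])
        y = tf.nn.softmax(tf.matmul(x, W) + b)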

  30. Cloud ML demo video

  31. Cloud ML demo
    Jeff Dean's keynote: YouTube video
    Define a custom TensorFlow graph
    Training locally: 8.3 hours w/ 1 node
    Training on the cloud: 32 min w/ 20 nodes (15x faster)
    Prediction on the cloud at 300 reqs / sec
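    (Checking the claim: 8.3 hours ≈ 498 minutes, and 498 / 32 ≈ 15.6, hence "15x faster".)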

  32. [Diagram] The Cloud ML product family
    Ready-to-use machine learning models: Cloud Vision API,
    Cloud Speech API, Cloud Translate API, and more ("Stay Tuned….")
    Use your own data to train models: Cloud Machine Learning
    (Develop - Model - Test, NEW), Google BigQuery, Cloud Storage,
    Cloud Datalab
    (products carry launch-stage badges: Alpha / Beta / GA)

  33. Links & Resources
    Large Scale Distributed Systems for Training Neural Networks, Jeff Dean and
    Oriol Vinyals, NIPS 2015
    Cloud Vision API: cloud.google.com/vision
    Cloud Speech API: cloud.google.com/speech
    TensorFlow: tensorflow.org
    Cloud Machine Learning: cloud.google.com/ml
    Cloud Machine Learning: demo video
