
Analyzing Images with Google's Cloud Vision API

sararob
February 06, 2016

Ever wondered about the technology behind Google Photos? Or wanted to build an app that performs complex image analysis, like detecting objects, faces, emotions, and landmarks? The new Google Cloud Vision API (currently in alpha) exposes the machine learning models that power Google Photos and Google Image Search. Developers can now access these features with just a simple REST API call. We’ll learn how to make a request to the Vision API, and then we’ll see it classify images, extract text, and even identify landmarks like Harry Potter World. We’ll end the talk by live coding an iOS app that implements image detection with the Vision API.

Transcript

  1. What we’ll cover
     01 Image detection in the wild
     02 What is the Cloud Vision API?
     03 Let’s make an API request!
     04 Live demo
  2. @SRobTweets | TensorFlow
     • Open source ML library for researchers and developers to build and train their own deep learning models
     • Python and C++ APIs
     • Used in many areas across Google
     tensorflow.org | github.com/tensorflow
  3. @SRobTweets | Types of Detection
     • Label
     • Landmark
     • Logo
     • Face
     • Text
     • Safe search
  4. @SRobTweets | Types of Detection
     Face Detection
       ◦ Find multiple faces
       ◦ Location of eyes, nose, mouth
       ◦ Detect emotions: joy, anger, surprise, sorrow
     Entity Detection
       ◦ Find common objects and landmarks, and their location in the image
       ◦ Detect explicit content
  5. Making a request

     {
       "requests": [
         {
           "image": { "content": "base64ImageString" },
           "features": [
             { "type": "LABEL_DETECTION", "maxResults": 10 },
             { "type": "FACE_DETECTION", "maxResults": 10 }
             // More feature detection types...
           ]
         }
       ]
     }
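A request with this shape is POSTed to the Vision API's `images:annotate` endpoint. As a minimal sketch (assuming the v1 endpoint and authentication via an API key query parameter, as used during the alpha), it might look like this in Python with only the standard library:

```python
import base64
import json
from urllib import request

VISION_URL = "https://vision.googleapis.com/v1/images:annotate"

def build_vision_request(image_path):
    """Read an image file and wrap it in the Vision API request body:
    a base64-encoded image plus the list of detection features we want."""
    with open(image_path, "rb") as f:
        content = base64.b64encode(f.read()).decode("utf-8")
    return {
        "requests": [{
            "image": {"content": content},
            "features": [
                {"type": "LABEL_DETECTION", "maxResults": 10},
                {"type": "FACE_DETECTION", "maxResults": 10},
            ],
        }]
    }

def annotate(image_path, api_key):
    """POST the request body and return the parsed JSON response."""
    body = json.dumps(build_vision_request(image_path)).encode("utf-8")
    req = request.Request(
        VISION_URL + "?key=" + api_key,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

The `build_vision_request` helper and its function names are illustrative, not part of the API; only the request-body shape comes from the slide.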
  6. Label Detection

     {
       "labelAnnotations": [
         { "mid": "/m/01wydv", "score": 0.92442685, "description": "beignet" },
         { "mid": "/m/0270h",  "score": 0.90845567, "description": "dessert" },
         { "mid": "/m/033nb2", "score": 0.74553984, "description": "profiterole" },
         { "mid": "/m/01dk8s", "score": 0.71415579, "description": "powdered sugar" }
       ]
     }
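Each label comes with a confidence score, so a client typically filters and ranks them. A small sketch (the helper name and threshold are illustrative, not from the API):

```python
def top_labels(response, min_score=0.7):
    """Return (description, score) pairs at or above a confidence
    threshold, sorted from most to least confident."""
    labels = response.get("labelAnnotations", [])
    return sorted(
        ((l["description"], l["score"]) for l in labels if l["score"] >= min_score),
        key=lambda pair: pair[1],
        reverse=True,
    )
```

Run against the beignet response above, `top_labels(resp, 0.9)` would keep only "beignet" and "dessert".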
  7. Landmark Detection

     {
       "landmarkAnnotations": [
         {
           "boundingPoly": { "vertices": [ { "x": 52, "y": 25 }, ... ] },
           "mid": "/m/0b__kbm",
           "score": 0.4231607,
           "description": "The Wizarding World of Harry Potter",
           "locations": [
             { "latLng": { "latitude": 28.473, "longitude": -81.471261 } }
           ]
         }
       ]
     }

     Photo attributions: Eiffel Tower (Creative Commons via Sathish J), Lens (Creative Commons via Mark Hunter)
  8. Logo Detection

     "logoAnnotations": [
       {
         "boundingPoly": { "vertices": [ { "x": 130, "y": 157 }, ... ] },
         "mid": "/m/018c_r",
         "score": 0.811352,
         "description": "Google"
       }
     ]
  9. Face Detection

     "faceAnnotations": [
       {
         "detectionConfidence": 0.9996134,
         "rollAngle": 8.5484314,
         "panAngle": 18.178885,
         "tiltAngle": -12.244568,
         "joyLikelihood": "VERY_LIKELY",
         "sorrowLikelihood": "VERY_UNLIKELY",
         "angerLikelihood": "VERY_UNLIKELY",
         "surpriseLikelihood": "VERY_UNLIKELY",
         "headwearLikelihood": "VERY_UNLIKELY",
         "underExposedLikelihood": "VERY_UNLIKELY",
         "blurredLikelihood": "VERY_UNLIKELY",
         "landmarks": [
           { "type": "LEFT_EYE", "position": { "x": 268.25815, "y": 491.55255, "z": -0.0022390306 } },
           { "type": "RIGHT_EYE", "position": { "x": 418.42868, "y": 508.22632, "z": 49.302765 } },
           { "type": "MIDPOINT_BETWEEN_EYES", "position": { "x": 359.86551, "y": 500.2868, "z": -7.9241152 } },
           { "type": "NOSE_TIP", "position": { "x": 358.51404, "y": 611.80286, "z": -31.350466 } },
           ...
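Note that emotions come back as likelihood strings (VERY_UNLIKELY through VERY_LIKELY) rather than raw probabilities, so comparing them means ranking the enum values. A sketch of how a client might do that (the helper names and the idea of treating the strings as an ordered scale are mine, not the API's):

```python
# Likelihood values as an ordered scale, weakest to strongest.
LIKELIHOOD_ORDER = [
    "UNKNOWN", "VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY",
]

def is_at_least(likelihood, threshold):
    """True if `likelihood` is as strong as or stronger than `threshold`."""
    return LIKELIHOOD_ORDER.index(likelihood) >= LIKELIHOOD_ORDER.index(threshold)

def joyful_faces(face_annotations, threshold="LIKELY"):
    """Keep only the face annotations whose joyLikelihood meets the threshold."""
    return [f for f in face_annotations if is_at_least(f["joyLikelihood"], threshold)]
```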
  10. Text Detection

      {
        "locale": "en",
        "boundingPoly": {
          "vertices": [
            { "x": 99, "y": 220 },
            { "x": 551, "y": 220 },
            { "x": 551, "y": 345 },
            { "x": 99, "y": 345 }
          ]
        },
        "description": "WINCHESTER\nCOTTAGE S\nCOPPERFIELD STREET\nLONDON BOROUGH OF SOUTHWARK\n"
      }
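The OCR'd text arrives as one newline-joined string, so splitting it back into lines is a one-liner (the helper name is illustrative):

```python
def extracted_lines(text_annotation):
    """Split an OCR'd text annotation into individual non-empty lines;
    the API joins detected lines with "\n"."""
    return [line for line in text_annotation["description"].split("\n") if line]
```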
  11. @SRobTweets | We’ll build an iOS app in Swift that:
      • Runs label detection on an image
      • Stores label data in Firebase
  12. @SRobTweets | Resources
      • Sign up for the alpha: cloud.google.com/vision
      • Read the announcement blog post: bit.ly/vision-api-blog
      • Get started with TensorFlow: tensorflow.org
      • Jeff Dean’s presentation on neural networks: bit.ly/google-brain-ml
      Photo Attributions
      • Eiffel Tower: Creative Commons via Sathish J
      • Lens: Creative Commons via Mark Hunter
      • Harry Potter World: Creative Commons via daihung
      • London building: Creative Commons via DncnH
      • iPhone camera: Creative Commons via David Goehring
  13. Google Cloud Platform (Confidential & Proprietary) | ABOUT GCP NEXT 2016
      WHAT: NEXT 2016 is a user conference and learning opportunity for developers, IT professionals, and technologists who want to understand what’s next for cloud technology. The two-day conference will feature:
      - product updates and demonstrations
      - perspectives from industry leaders
      - hands-on code labs
      - 30+ breakout sessions
      - technical training
      - opportunities to connect with Google engineers and the GCP community
      WHERE/WHEN: Pier 48, San Francisco, CA, March 23-24, 2016
      HOW MUCH: $499/attendee ($349 early-bird rate ends 5 Feb)
  14. Top 5 reasons why developers need to go... Don’t miss out: reserve your seat at goo.gl/lNPpwr