Mobile Vision API - Speaker Deck

Slide 1

Slide 1 text

Mobile Vision API

Slide 2

Slide 2 text

Tato Kutalia tatocaster tatocaster.me github.com/tatocaster twitter.com/@tatokutalia

Slide 3

Slide 3 text

What is Vision API?

Slide 4

Slide 4 text

What is Vision API? Find objects in photos and video, using real-time on-device vision technology. • detect faces • scan barcodes • recognize text

Slide 5

Slide 5 text

Face Detection • detection only, not recognition • The Face API finds human faces in photos, videos, or live streams. It also finds and tracks the positions of facial landmarks such as the eyes, nose, and mouth • classification

Slide 6

Slide 6 text

Face Orientation

Slide 7

Slide 7 text

Landmarks Landmark detection is not done by default, since it takes additional time to run. You can optionally specify that landmark detection should be done.

Slide 8

Slide 8 text

Classification Classification determines whether a certain facial characteristic is present. The Android Face API currently supports two classifications: eyes open and smiling. The iOS Face API currently supports only the smiling classification. Note: the “eyes open” and “smiling” classifications work only for frontal faces, that is, faces with a small Euler Y angle (at most about +/- 18 degrees).
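The frontal-face limit and the classification result can be combined into a single check. The following is a minimal plain-Java sketch, not code from the talk: the class name `SmileCheck`, the threshold parameter, and the "negative means not computed" convention are assumptions made for illustration. On Android the inputs would presumably come from `Face.getEulerY()` and `Face.getIsSmilingProbability()`.

```java
// Plain-Java sketch with no Android dependency. All names are hypothetical.
final class SmileCheck {
    // Approximate frontal-face limit quoted on the slide, in degrees of Euler Y.
    static final float MAX_FRONTAL_EULER_Y = 18.0f;

    /** True when the face is roughly frontal, i.e. |eulerY| <= 18 degrees. */
    static boolean isFrontal(float eulerY) {
        return Math.abs(eulerY) <= MAX_FRONTAL_EULER_Y;
    }

    /**
     * True only if the face is frontal, a probability was actually computed
     * (assumed here: a negative value means "not computed"), and the
     * probability reaches the caller-chosen threshold.
     */
    static boolean isSmiling(float eulerY, float smilingProbability, float threshold) {
        if (!isFrontal(eulerY)) {
            return false; // classification is unreliable for turned faces
        }
        if (smilingProbability < 0f) {
            return false; // classification was not computed for this face
        }
        return smilingProbability >= threshold;
    }
}
```

The threshold is left to the caller because the slides do not state one; a value such as 0.7 is a common starting point but is only a guess here.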

Slide 9

Slide 9 text

Face tracking extends face detection to video sequences. Any face appearing in a video for any length of time can be tracked. That is, faces that are detected in consecutive video frames can be identified as being the same person. Note: This is not face recognition.
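Tracking means that the same face keeps the same identifier from frame to frame. A minimal plain-Java sketch of what an app might do with those identifiers is shown here; the class `TrackCounter` and its methods are hypothetical, and the integer IDs stand in for whatever the detector reports per face (on Android, presumably `Face.getId()` with tracking enabled).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: counts in how many frames each tracked face appeared.
final class TrackCounter {
    private final Map<Integer, Integer> counts = new HashMap<>();

    /** Records one video frame, given the tracking IDs of the faces detected in it. */
    void onFrame(List<Integer> faceIds) {
        for (int id : faceIds) {
            counts.merge(id, 1, Integer::sum); // same ID in a later frame = same tracked face
        }
    }

    /** Number of frames in which the face with this ID has been seen so far. */
    int framesSeen(int id) {
        return counts.getOrDefault(id, 0);
    }
}
```

The counter only knows that an ID persisted across frames. It says nothing about who the person is, which matches the slide's point that tracking is not recognition.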

Slide 10

Slide 10 text

1. Create

Slide 11

Slide 11 text

2. Detect

Slide 12

Slide 12 text

3. Release
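The code shown on these three slides is not part of the transcript. As a rough sketch of the create, detect, release pattern, the following plain Java uses a hypothetical `StubDetector` in place of the real Android class. In the Android library the steps presumably correspond to `new FaceDetector.Builder(context).build()`, `detector.detect(frame)` with a `Frame` built from a bitmap, and `detector.release()`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a vision detector; NOT the real Android class.
final class StubDetector {
    private boolean released = false;

    /** Placeholder "detection": returns one fake result for the given frame name. */
    List<String> detect(String frame) {
        if (released) {
            throw new IllegalStateException("detector already released");
        }
        List<String> results = new ArrayList<>();
        results.add("face@" + frame);
        return results;
    }

    /** Frees the detector's resources; it must not be used afterwards. */
    void release() {
        released = true;
    }

    boolean isReleased() {
        return released;
    }
}

final class Lifecycle {
    /** Runs detection once and always releases the detector afterwards. */
    static List<String> run(StubDetector detector, String frame) {
        try {
            return detector.detect(frame); // 2. Detect
        } finally {
            detector.release();            // 3. Release, even if detect() throws
        }
    }
}
```

Putting the release call in `finally` is a design choice of this sketch: it keeps native resources from leaking when detection fails partway.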

Slide 13

Slide 13 text

Result

Slide 14

Slide 14 text

Operational Status The first time that an app using the Face API is installed on a device, Google Play services (GMS) will download a native library to the device in order to do face detection. A detector’s isOperational method can be used to check whether the required native library is currently available.
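One way to use this check is to skip detection until the library has arrived, rather than treating an empty result as "no faces". The sketch below is plain Java with hypothetical names (`OperationalGuard`, `detectIfOperational`); the two lambdas stand in for a real detector's `isOperational()` and `detect(...)` calls.

```java
import java.util.Optional;
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

// Hypothetical helper; the suppliers stand in for calls on a real detector.
final class OperationalGuard {
    /**
     * Runs detection only when the detector reports itself operational.
     * An empty Optional means "not ready yet", which is different from
     * "ran and found nothing". The detect supplier must not return null.
     */
    static <T> Optional<T> detectIfOperational(BooleanSupplier isOperational,
                                               Supplier<T> detect) {
        if (!isOperational.getAsBoolean()) {
            return Optional.empty(); // native library not downloaded yet
        }
        return Optional.of(detect.get());
    }
}
```

A caller could show a "still downloading" message on an empty result and retry later.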

Slide 15

Slide 15 text

Some weird things • detect nose base • scale bitmap • calculate Y for external bitmap • action

Slide 16

Slide 16 text

No content

Slide 17

Slide 17 text

Barcode • 1D barcodes: EAN-13, EAN-8, UPC-A, UPC-E, Code-39, Code-93, Code-128, ITF, Codabar • 2D barcodes: QR Code, Data Matrix, PDF-417, AZTEC Android 4.2.2
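A detector is usually restricted to the formats an app cares about, which the Android API expresses by OR-ing format constants together (for example via `BarcodeDetector.Builder.setBarcodeFormats`). The plain-Java sketch below shows only the bit-flag idea; the class `BarcodeFormats` and the flag values are local to this example and are not the library's real constants.

```java
// Bit-flag sketch. The values below are illustrative, NOT the real library constants.
final class BarcodeFormats {
    static final int QR_CODE = 1;
    static final int EAN_13 = 1 << 1;
    static final int DATA_MATRIX = 1 << 2;

    /** True if the given single format is included in the combined mask. */
    static boolean accepts(int mask, int format) {
        return (mask & format) != 0;
    }
}
```

With `int mask = BarcodeFormats.QR_CODE | BarcodeFormats.EAN_13;`, `accepts(mask, BarcodeFormats.QR_CODE)` is true and `accepts(mask, BarcodeFormats.DATA_MATRIX)` is false.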

Slide 18

Slide 18 text

Text Recognition • The Text API can recognize text in any Latin-based language • Real-time, on device

Slide 19

Slide 19 text

Text Recognition The Text Recognizer segments text into blocks, lines, and words. A Block is a contiguous set of text lines, such as a paragraph or column; a Line is a contiguous set of words on the same vertical axis; and a Word is a contiguous set of alphanumeric characters on the same vertical axis.
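The block, line, word hierarchy can be pictured as three nested lists. The sketch below is plain Java that builds that shape from an ordinary string; it does no OCR, and the class `TextStructure` and the blank-line rule for block boundaries are assumptions made for illustration. In the Android API the corresponding types are presumably `TextBlock`, `Line`, and `Element`, each exposing its children through `getComponents()`.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration of the blocks > lines > words nesting. Performs no recognition.
final class TextStructure {
    /** Splits text into blocks (separated by blank lines), then lines, then words. */
    static List<List<List<String>>> segment(String text) {
        List<List<List<String>>> blocks = new ArrayList<>();
        for (String block : text.split("\\n\\s*\\n")) {
            List<List<String>> lines = new ArrayList<>();
            for (String line : block.split("\\n")) {
                String trimmed = line.trim();
                if (trimmed.isEmpty()) {
                    continue; // ignore empty lines inside a block
                }
                List<String> words = new ArrayList<>();
                for (String word : trimmed.split("\\s+")) {
                    words.add(word);
                }
                lines.add(words);
            }
            if (!lines.isEmpty()) {
                blocks.add(lines);
            }
        }
        return blocks;
    }
}
```

For the input `"Hello world\nsecond line\n\nNew block"` this yields two blocks: the first with two lines of two words each, the second with one line of two words.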

Slide 20

Slide 20 text

No content

Slide 21

Slide 21 text

more • mobile OCR codelab: https://codelabs.developers.google.com/codelabs/mobile-vision-ocr/ • GitHub samples: https://github.com/googlesamples/android-vision

Slide 22

Slide 22 text

more • https://github.com/tatocaster/VisionAPIAndRxJava • https://speakerdeck.com/tatocaster/mobile-vision-api

Slide 23

Slide 23 text

No content