Tato Kutalia
tatocaster
tatocaster.me
github.com/tatocaster
twitter.com/@tatokutalia
Slide 3
What is Vision API?
Slide 4
What is Vision API?
Find objects in photos and video, using real-time
on-device vision technology.
• detect faces
• scan barcodes
• recognize text
Slide 5
Face Detection
• not recognition, just detection
• The Face API finds human faces in photos, videos, or live streams. It also finds and
tracks positions of facial landmarks such as the eyes, nose, and mouth.
• classification
Slide 6
Face Orientation
Slide 7
Landmarks
Landmark detection is not done by default, since it takes additional time to run. You
can optionally specify that landmark detection should be done.
Slide 8
Classification
Classification determines whether a certain facial characteristic is present.
The Android Face API currently supports two classifications: eyes open
and smiling.
The iOS Face API currently supports only the smiling classification.
Note: “eyes open” and “smiling” classification only work for frontal faces, that is, faces with a
small Euler Y angle (at most about +/- 18 degrees).
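As a rough illustration of the note above, the following Java sketch shows one way an app might decide whether to trust a classification result. It is not part of the Face API: the class ClassificationGate, its methods, the 0.5 threshold, and the negative "not computed" sentinel are assumptions made for this example. On Android, the inputs would typically be read from a detected Face, for example via getIsSmilingProbability() and getEulerY().

```java
/**
 * Minimal sketch: gate a classification result on the face being frontal.
 * All names are hypothetical; the floats stand in for values an app would
 * read from a detected face.
 */
final class ClassificationGate {
    /** Frontal-face limit taken from the note above (about +/- 18 degrees). */
    static final float MAX_FRONTAL_ANGLE_DEG = 18.0f;

    /** Assumed decision threshold; tune it for the application. */
    static final float THRESHOLD = 0.5f;

    /** A face counts as frontal when its Y rotation is within the limit. */
    static boolean isFrontal(float eulerYDeg) {
        return Math.abs(eulerYDeg) <= MAX_FRONTAL_ANGLE_DEG;
    }

    /**
     * Returns true only when a probability was computed (assumed to be
     * signalled by a non-negative value), the face is frontal, and the
     * probability reaches the threshold.
     */
    static boolean isPresent(float probability, float eulerYDeg) {
        if (probability < 0.0f) {
            return false; // classification was not computed
        }
        if (!isFrontal(eulerYDeg)) {
            return false; // result is unreliable for rotated faces
        }
        return probability >= THRESHOLD;
    }

    public static void main(String[] args) {
        System.out.println(isPresent(0.92f, 5.0f));  // true: frontal and confident
        System.out.println(isPresent(0.92f, 30.0f)); // false: face turned too far
        System.out.println(isPresent(-1.0f, 0.0f));  // false: nothing computed
    }
}
```

Checking the angle first keeps the app from acting on a "smiling" value that the detector could not estimate reliably.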
Slide 9
Face tracking
Face tracking extends face detection to video sequences. Any face appearing in a
video for any length of time can be tracked. That is, faces that are
detected in consecutive video frames can be identified as being the
same person.
Note: This is not face recognition. Tracking only makes inferences based on the
position and motion of the faces in the video sequence.
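To make the idea of "same face across consecutive frames" concrete, here is a toy Java sketch that assigns stable IDs by greedy nearest-center matching. This is a stand-in technique chosen for illustration, not how the Face API tracks faces internally; the class NaiveFaceTracker, its maxDistance parameter, and the use of face centers as input are all assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy tracker: reuses an ID when a face center stays close between frames. */
final class NaiveFaceTracker {
    private final float maxDistance;                       // hypothetical matching radius, in pixels
    private final Map<Integer, float[]> lastCenters = new HashMap<>();
    private int nextId = 0;

    NaiveFaceTracker(float maxDistance) {
        this.maxDistance = maxDistance;
    }

    /**
     * Takes the face centers {x, y} of one frame and returns one ID per center.
     * A center inherits the ID of the closest unclaimed face from the previous
     * frame if it lies within maxDistance; otherwise it gets a fresh ID.
     */
    List<Integer> update(List<float[]> centers) {
        Map<Integer, float[]> current = new HashMap<>();
        List<Integer> ids = new ArrayList<>();
        for (float[] c : centers) {
            int bestId = -1;
            double bestDist = maxDistance;
            for (Map.Entry<Integer, float[]> e : lastCenters.entrySet()) {
                if (current.containsKey(e.getKey())) {
                    continue; // this ID was already claimed in the current frame
                }
                float[] prev = e.getValue();
                double d = Math.hypot(c[0] - prev[0], c[1] - prev[1]);
                if (d <= bestDist) {
                    bestDist = d;
                    bestId = e.getKey();
                }
            }
            if (bestId < 0) {
                bestId = nextId++; // no close match: treat as a new face
            }
            current.put(bestId, c);
            ids.add(bestId);
        }
        // Faces that were not matched in this frame are forgotten.
        lastCenters.clear();
        lastCenters.putAll(current);
        return ids;
    }
}
```

Note that the sketch only compares positions, which mirrors the point above: nothing here knows who a person is, only that a face stayed in roughly the same place.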
Slide 10
1. Create
Slide 11
2. Detect
Slide 12
3. Release
Slide 13
Result
Slide 14
Operational Status
The first time that an app using the Face API is installed on a device,
GMS will download a native library to the device in order to do face
detection.
A detector’s isOperational method can be used to check whether the required
native library is currently available.
Slide 15
Some weird things
• detect nose base
• scale bitmap
• calculate Y for external bitmap
• action
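The slide does not show how these steps were implemented, so the following Java sketch is only a guess at the arithmetic involved: scale the nose-base position by the same factor used to scale the bitmap, then compute where an external (overlay) bitmap would be drawn so that it is centered on that point. The class OverlayPlacement, its method names, and the centering rule are assumptions made for this example.

```java
/** Hypothetical helper for placing an overlay bitmap on a scaled photo. */
final class OverlayPlacement {

    /** Uniform scale that fits an image of imgW x imgH inside viewW x viewH. */
    static float fitScale(int imgW, int imgH, int viewW, int viewH) {
        return Math.min((float) viewW / imgW, (float) viewH / imgH);
    }

    /**
     * Maps the nose-base position from original image coordinates to the
     * scaled image and returns {left, top} for an overlay of the given size,
     * centered on that point.
     */
    static float[] overlayTopLeft(float noseX, float noseY, float scale,
                                  int overlayW, int overlayH) {
        float scaledX = noseX * scale;
        float scaledY = noseY * scale;
        float left = scaledX - overlayW / 2.0f;
        float top = scaledY - overlayH / 2.0f;
        return new float[] {left, top};
    }

    public static void main(String[] args) {
        // A 1000x500 photo shown in a 500x500 view is scaled by 0.5.
        float scale = fitScale(1000, 500, 500, 500);
        float[] pos = overlayTopLeft(400f, 300f, scale, 100, 40);
        System.out.println(pos[0] + ", " + pos[1]);
    }
}
```

The key point is that landmark coordinates refer to the bitmap that was passed to the detector, so they must be rescaled whenever the displayed bitmap has a different size.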
Text Recognition
• The Text API can recognize text in any Latin-based language
• Real-time, on-device
Slide 19
Text Recognition
The Text Recognizer segments text into blocks, lines, and words:
• a Block is a contiguous set of text lines, such as a paragraph or column
• a Line is a contiguous set of words on the same vertical axis
• a Word is a contiguous set of alphanumeric characters on the same vertical axis
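The block, line, and word hierarchy can be pictured with a small Java sketch that applies the same nesting to a plain string. This is only a model of the structure, not OCR and not the Text API itself: the class TextHierarchy and the rule "blank line ends a block" are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the block -> line -> word nesting, applied to plain text. */
final class TextHierarchy {

    /**
     * Splits text into blocks (separated by blank lines), each block into
     * lines, and each line into whitespace-separated words.
     */
    static List<List<List<String>>> segment(String text) {
        List<List<List<String>>> blocks = new ArrayList<>();
        for (String block : text.trim().split("\\n\\s*\\n")) {
            List<List<String>> lines = new ArrayList<>();
            for (String line : block.split("\\n")) {
                String trimmed = line.trim();
                if (trimmed.isEmpty()) {
                    continue; // skip empty lines inside a block
                }
                List<String> words = new ArrayList<>();
                for (String word : trimmed.split("\\s+")) {
                    words.add(word);
                }
                lines.add(words);
            }
            if (!lines.isEmpty()) {
                blocks.add(lines);
            }
        }
        return blocks;
    }

    public static void main(String[] args) {
        String sample = "Mobile Vision\nText API\n\nSecond block";
        for (List<List<String>> block : segment(sample)) {
            System.out.println("Block with " + block.size() + " line(s): " + block);
        }
    }
}
```

An app consuming real recognition results would walk the same three levels, typically from each block down to its lines and then to the words in each line.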
Slide 21
more
• Mobile OCR codelab:
https://codelabs.developers.google.com/codelabs/mobile-vision-ocr/
• GitHub samples
https://github.com/googlesamples/android-vision
Slide 22
more
• https://github.com/tatocaster/VisionAPIAndRxJava
• https://speakerdeck.com/tatocaster/mobile-vision-api