ML Kit in Action - Speaker Deck

Slide 1

Slide 1 text

ML Kit in Action (Android) Mobile Things S02E03 When machine learning meets augmented reality Qian JIN | @bonbonking Image Credit: https://becominghuman.ai/part-1-migrate-deep-learning-training-onto-mobile-devices-c28029ffeb30

Slide 2

Slide 2 text

ML Kit in Action • The building blocks of ML Kit • Vision APIs: text recognition, face detection, barcode scanning, image labeling, landmark recognition • Custom Models • Custom TensorFlow build • General feedback

Slide 3

Slide 3 text

The building blocks of ML Kit

Slide 4

Slide 4 text

Mobile Vision API + Google Cloud Vision API = ML Kit Vision APIs • TensorFlow Lite + Neural Network API = ML Kit Custom Models / TF Lite Build

Slide 8

Slide 8 text

Vision APIs

Slide 9

Slide 9 text

Vision: You talking to me?

Slide 10

Slide 10 text

FirebaseVisionImage • fromBitmap • fromByteArray • fromByteBuffer • fromFilePath • fromMediaImage

Slide 12

Slide 12 text

Text Recognition • Pricing: free on-device; in the cloud, free for the first 1000 uses of this feature per month • Ideal use cases: real-time processing on-device; high-accuracy text recognition and document scanning in the cloud • Language support: Latin characters on-device; a broad range of languages and special characters in the cloud

Slide 13

Slide 13 text

FirebaseVisionTextDetector • INPUT: FirebaseVisionImage • OUTPUT: FirebaseVisionText

Slide 14

Slide 14 text

MobileThings: ML Kit in action
for (FirebaseVisionText.Block block : firebaseVisionText.getBlocks()) {
    Rect boundingBox = block.getBoundingBox();
    Point[] cornerPoints = block.getCornerPoints();
    String text = block.getText();
    for (FirebaseVisionText.Line line : block.getLines()) {
        // ...
        for (FirebaseVisionText.Element element : line.getElements()) {
            // ...
        }
    }
}

Slide 17

Slide 17 text

Face Detection: Key Capabilities • Recognise and locate facial features • Recognise facial expressions • Track faces across video frames • Process video frames in real time

Slide 18

Slide 18 text

Face tracking • Landmark • Classification • Face Orientation

Slide 19

Slide 19 text

Face Orientation • Euler X • Euler Y • Euler Z

Slide 20

Slide 20 text

Landmarks • A landmark is a point of interest within a face. The left eye, right eye, and nose base are all examples of landmarks

Slide 21

Slide 21 text

Classification • Two classifications are supported: eyes open (left and right eye) and smiling • Inspiration: Android Things photo booth

Slide 24

Slide 24 text

Face Detection Options
FirebaseVisionFaceDetectorOptions options =
    new FirebaseVisionFaceDetectorOptions.Builder()
        .setModeType(FirebaseVisionFaceDetectorOptions.ACCURATE_MODE)
        .setLandmarkType(FirebaseVisionFaceDetectorOptions.ALL_LANDMARKS)
        .setClassificationType(FirebaseVisionFaceDetectorOptions.ALL_CLASSIFICATIONS)
        .setMinFaceSize(0.15f)
        .setTrackingEnabled(true)
        .build();

Slide 25

Slide 25 text

FirebaseVisionFaceDetector • INPUT: FirebaseVisionImage • OUTPUT: List<FirebaseVisionFace>

Slide 26

Slide 26 text

➡ boundingBox: Rect ➡ trackingId: Int ➡ headEulerAngleY: Float ➡ headEulerAngleZ: Float ➡ smilingProbability: Float ➡ leftEyeOpenProbability: Float ➡ rightEyeOpenProbability: Float
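In the spirit of the photo booth idea, the probabilities listed above could drive a simple "take the picture now" decision. The following is only a sketch: `FaceAttributes` is a hypothetical stand-in for the detector's face object, the 0.8 threshold is an arbitrary choice, and it assumes that a probability the detector could not compute is reported as a negative value.

```java
import java.util.List;

// Hypothetical stand-in for the face attributes listed on the slide; not the ML Kit class.
final class FaceAttributes {
    final float smilingProbability;
    final float leftEyeOpenProbability;
    final float rightEyeOpenProbability;

    FaceAttributes(float smiling, float leftEyeOpen, float rightEyeOpen) {
        this.smilingProbability = smiling;
        this.leftEyeOpenProbability = leftEyeOpen;
        this.rightEyeOpenProbability = rightEyeOpen;
    }
}

final class PhotoBoothTrigger {
    // Arbitrary threshold chosen for illustration.
    static final float THRESHOLD = 0.8f;

    /** Returns true only if at least one face is present and every face smiles with both eyes open. */
    static boolean shouldTakePicture(List<FaceAttributes> faces) {
        if (faces.isEmpty()) {
            return false;
        }
        for (FaceAttributes face : faces) {
            // Negative (uncomputed) probabilities fail the threshold as well.
            if (face.smilingProbability < THRESHOLD
                    || face.leftEyeOpenProbability < THRESHOLD
                    || face.rightEyeOpenProbability < THRESHOLD) {
                return false;
            }
        }
        return true;
    }
}
```

In a real app the list would be filled from the detector's results for each camera frame.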

Slide 27

Slide 27 text

Feedback • Real-time application: pay attention to the image size
/fr.xebia.mlkitinactions E/pittpatt: online_face_detector.cc:236] inconsistent image dimensions
detector.cc:220] inconsistent image dimensions
/fr.xebia.mlkitinactions E/NativeFaceDetectorImpl: Native face detection failed
java.lang.RuntimeException: Error detecting faces.
at com.google.android.gms.vision.face.NativeFaceDetectorImpl.detectFacesJni(Native Method)
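The slide does not say what triggers "inconsistent image dimensions". One plausible cause (an assumption on my part) is a camera frame whose byte length does not match the width and height declared alongside it. A minimal sketch of a sanity check for NV21 frames, which use one luma byte per pixel plus one pair of chroma bytes per 2x2 block; the class and method names are hypothetical:

```java
// Hypothetical helper: checks an NV21 frame against its declared dimensions
// before it is handed to a detector.
final class FrameSizeCheck {
    /** Number of bytes an NV21 frame of the given size occupies. */
    static int expectedNv21Length(int width, int height) {
        int chromaWidth = (width + 1) / 2;   // chroma is subsampled by 2 horizontally
        int chromaHeight = (height + 1) / 2; // and by 2 vertically
        return width * height + 2 * chromaWidth * chromaHeight;
    }

    /** Returns true if the buffer length matches the declared width and height. */
    static boolean isConsistent(byte[] frame, int width, int height) {
        return width > 0 && height > 0 && frame.length == expectedNv21Length(width, height);
    }
}
```

Skipping frames that fail such a check, for example after a preview size change, is cheaper than letting native detection throw.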

Slide 31

Slide 31 text

Image Labeling • Pricing: free on-device; in the cloud, free for the first 1000 uses of this feature per month • Label coverage: on-device, 400+ labels that cover the most commonly found concepts in photos; in the cloud, 10,000+ labels in many categories (try the Cloud Vision API demo to see what labels can be found for an image you provide) • Knowledge Graph entity ID support

Slide 32

Slide 32 text

FirebaseVisionLabelDetector • INPUT: FirebaseVisionImage • OUTPUT: List<FirebaseVisionLabel>

Slide 33

Slide 33 text

➡ label: String ➡ confidence: Float ➡ entityId: String
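With a confidence attached to each label, a common pattern is to keep only the labels above some cut-off. The sketch below assumes a hypothetical `Label` holder that mirrors the three fields above; the cut-off value is up to the application.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder mirroring the fields on the slide; not the ML Kit class.
final class Label {
    final String label;
    final float confidence;
    final String entityId;

    Label(String label, float confidence, String entityId) {
        this.label = label;
        this.confidence = confidence;
        this.entityId = entityId;
    }
}

final class LabelFilter {
    /** Keeps the labels whose confidence is at least minConfidence, preserving order. */
    static List<Label> atLeast(List<Label> labels, float minConfidence) {
        List<Label> kept = new ArrayList<>();
        for (Label candidate : labels) {
            if (candidate.confidence >= minConfidence) {
                kept.add(candidate);
            }
        }
        return kept;
    }
}
```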

Slide 36

Slide 36 text

Landmark Recognition • Still in preview, using Cloud Vision API instead • Recognizes well-known landmarks • Get Google Knowledge Graph entity IDs • Low-volume use free (first 1000 images)

Slide 37

Slide 37 text

Custom Model

Slide 38

Slide 38 text

Some night towards the end of 2016

Slide 41

Slide 41 text

Android SDK (Java): Camera Preview, Image (Bitmap), Classifier Implementation, Overlay Display • Android NDK (C++): TensorFlow JNI wrapper, Trained Model • Data flow (steps 1 to 4): Camera Preview ➡ Image (Bitmap) ➡ input_tensor ➡ Trained Model ➡ top_results ➡ Classifications + Confidence ➡ Overlay Display • Ref: https://jalammar.github.io/Supercharging-android-apps-using-tensorflow/

Slide 42

Slide 42 text

Magritte Ceci n’est pas une pomme. (This is not an apple.)

Slide 44

Slide 44 text

Android Makers Paris, April 2017

Slide 46

Slide 46 text

Model Size • All weights are stored as they are (32-bit floats) => ~80MB

Slide 47

Slide 47 text

Weights Quantization • ~80MB -> ~20MB • 6.372638493746383 => 6.4 • Source: https://www.tensorflow.org/performance/quantization
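The rounding above can be pictured as 8-bit linear quantization: each weight is replaced by the nearest of 256 evenly spaced values between a minimum and a maximum, so it fits in one byte instead of a 4-byte float, roughly a fourfold saving. The sketch below illustrates that idea only; the class and method names are hypothetical and this is not TensorFlow's actual implementation.

```java
// Hypothetical illustration of 8-bit linear weight quantization.
final class WeightQuantizer {
    /** Maps each weight in [min, max] to an integer code in 0..255, stored in one byte. */
    static byte[] quantize(float[] weights, float min, float max) {
        if (max <= min) {
            throw new IllegalArgumentException("max must be greater than min");
        }
        float step = (max - min) / 255f;
        byte[] codes = new byte[weights.length];
        for (int i = 0; i < weights.length; i++) {
            // Clamp to the range, then round to the nearest quantization step.
            float clamped = Math.max(min, Math.min(max, weights[i]));
            codes[i] = (byte) Math.round((clamped - min) / step);
        }
        return codes;
    }

    /** Recovers approximate weights; the error is at most about half a step. */
    static float[] dequantize(byte[] codes, float min, float max) {
        float step = (max - min) / 255f;
        float[] weights = new float[codes.length];
        for (int i = 0; i < codes.length; i++) {
            weights[i] = min + (codes[i] & 0xFF) * step; // & 0xFF reads the byte as unsigned
        }
        return weights;
    }
}
```

The price is precision: a restored weight differs slightly from the original, which is usually tolerable for inference.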

Slide 48

Slide 48 text

Model Inception V3 Optimized & Quantized

Slide 49

Slide 49 text

Google I/O, May 2017

Slide 51

Slide 51 text

Google AI Blog, June 2017

Slide 52

Slide 52 text

MobileNet • Mobile-first computer vision models for TensorFlow • Image credit: https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet_v1.md

Slide 54

Slide 54 text

~80MB => ~20MB => ~1-5MB Source: https://research.googleblog.com/2017/06/mobilenets-open-source-models-for.html

Slide 55

Slide 55 text

DevFest Nantes, October 2017

Slide 57

Slide 57 text

Model MobileNets_0.25_224

Slide 58

Slide 58 text

Google I/O, May 2018

Slide 59

Slide 59 text

Custom Model: Key capabilities • TensorFlow Lite model hosting • On-device ML inference • Automatic model fallback • Automatic model updates

Slide 60

Slide 60 text

Train your TF model (model.pb) ➡ Convert model to TF Lite (model.lite) with TOCO (TensorFlow Lite Optimizing Converter) ➡ Host your TF Lite model on Firebase ➡ Use the TF Lite model for inference

Slide 61

Slide 61 text

How to train your dragon model?

Slide 62

Slide 62 text

Train your model
Source: https://codelabs.developers.google.com/codelabs/tensorflow-for-poets/
python tensorflow/examples/image_retraining/retrain.py \
  --bottleneck_dir=tf_files/bottlenecks \
  --how_many_training_steps=500 \
  --model_dir=tf_files/models/ \
  --summaries_dir=tf_files/training_summaries/ \
  --output_graph=tf_files/retrained_graph.pb \
  --output_labels=tf_files/retrained_labels.txt \
  --architecture=mobilenet_0.50_224 \
  --image_dir=tf_files/fruit_photos

Slide 63

Slide 63 text

TF Lite conversion for retrained quantized models is currently unavailable. The Firebase quickstart ML Kit sample only targets quantized models.

Slide 64

Slide 64 text

Convert to tflite format
Source: https://codelabs.developers.google.com/codelabs/tensorflow-for-poets/
bazel run --config=opt \
  //tensorflow/contrib/lite/toco:toco -- \
  --input_file=/tmp/magritte_retrained_graph.pb \
  --output_file=/tmp/magritte_graph.tflite \
  --inference_type=FLOAT \
  --input_shape=1,224,224,3 \
  --input_array=input \
  --output_array=final_result \
  --mean_value=128 \
  --std_value=128 \
  --default_ranges_min=0 \
  --default_ranges_max=6

Slide 66

Slide 66 text

Do you need custom bob models?

Slide 67

Slide 67 text

Use custom models if • Specific needs CANNOT be met by general purpose APIs • Need high matching precision • You are an experienced ML developer (or you know Yoann Benoit) Let me train your model!

Slide 68

Slide 68 text

FirebaseModelInterpreter • INPUT: FirebaseModelInputs • OUTPUT: FirebaseModelOutputs

Slide 69

Slide 69 text

// input & output options for non-quantized model
val inputDims = intArrayOf(DIM_BATCH_SIZE, DIM_IMG_SIZE_X, DIM_IMG_SIZE_Y, DIM_PIXEL_SIZE)
val outputDims = intArrayOf(1, labelList.size)
inputOutputOptions = FirebaseModelInputOutputOptions.Builder()
    .setInputFormat(0, FirebaseModelDataType.FLOAT32, inputDims)
    .setOutputFormat(0, FirebaseModelDataType.FLOAT32, outputDims)
    .build()

Slide 71

Slide 71 text

INPUT: ByteBuffer ➡ FirebaseModelInputs
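For a float (non-quantized) model, the input buffer is typically filled with normalized RGB values, one 4-byte float per channel. The sketch below shows one way to pack ARGB pixels, such as those returned by Android's `Bitmap.getPixels`, into such a buffer. The class name is hypothetical, and the mean and standard deviation of 128 are assumptions that mirror the `--mean_value` and `--std_value` flags of the conversion command.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: packs ARGB pixels into a float32 RGB buffer.
final class InputBufferSketch {
    // Assumed normalization constants, mirroring the conversion flags.
    static final float MEAN = 128f;
    static final float STD = 128f;

    /** Writes R, G and B of every pixel as normalized floats; alpha is ignored. */
    static ByteBuffer toFloatBuffer(int[] argbPixels) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(argbPixels.length * 3 * 4);
        buffer.order(ByteOrder.nativeOrder());
        for (int pixel : argbPixels) {
            buffer.putFloat((((pixel >> 16) & 0xFF) - MEAN) / STD); // red
            buffer.putFloat((((pixel >> 8) & 0xFF) - MEAN) / STD);  // green
            buffer.putFloat(((pixel & 0xFF) - MEAN) / STD);         // blue
        }
        buffer.rewind(); // reset the position so the consumer reads from the start
        return buffer;
    }
}
```

For a 224x224 input this yields a buffer of 224 * 224 * 3 * 4 bytes, matching an input shape of 1,224,224,3 with float values.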

Slide 72

Slide 72 text

OUTPUT: FirebaseModelOutputs
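For a classifier, the output is usually one probability per label, and the app then picks the most likely ones. A minimal sketch of that last step, assuming the probabilities have already been read into a `float[]` that is aligned with the label list (the class name and the output format are illustrative choices):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: picks the k most probable labels from a classifier output.
final class TopLabels {
    /** Returns up to k entries such as "banana (0.70)", most probable first. */
    static List<String> topK(float[] probabilities, List<String> labels, int k) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < probabilities.length; i++) {
            indices.add(i);
        }
        // Sort label indices by descending probability.
        indices.sort((a, b) -> Float.compare(probabilities[b], probabilities[a]));

        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(k, indices.size()); i++) {
            int index = indices.get(i);
            result.add(String.format(Locale.US, "%s (%.2f)", labels.get(index), probabilities[index]));
        }
        return result;
    }
}
```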

Slide 73

Slide 73 text

Performance Benchmarks

Slide 74

Slide 74 text

Model MobileNets_1.0_224

Slide 75

Slide 75 text

Model MobileNets_1.0_224

Slide 77

Slide 77 text

• No callback or other feedback for model downloading • Model downloading seems to be blocking => do not use on main thread • Lack of documentation at this point (e.g. how to stop the interpreter?) • Slight performance loss compared to TensorFlow Lite • A/B test your machine learning model!
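Since the download appears to block, one way to keep it off the main thread is to hand the blocking call to a background executor. The sketch below uses only `java.util.concurrent`; the class name is hypothetical, and the `Callable` stands in for whatever call actually triggers the model download.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: runs a blocking model load on a single background thread.
final class BackgroundModelLoader {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    /** Submits the blocking work and returns immediately with a Future for its result. */
    <T> Future<T> load(Callable<T> blockingLoad) {
        return executor.submit(blockingLoad);
    }

    /** Stops accepting new work; a load that is already running is allowed to finish. */
    void shutdown() {
        executor.shutdown();
    }
}
```

The caller can poll `Future.isDone()` or wait on `Future.get()` from another background thread, which also works around the missing download callback.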

Slide 78

Slide 78 text

HowTo: Face Recognition Model • Trained with Keras + FaceNet • Converted to TensorFlow • Then converted to TensorFlow Lite • Then we got stuck…

Slide 79

Slide 79 text

Custom TensorFlow Lite build

Slide 80

Slide 80 text

Custom TF Lite build • ML Kit uses a pre-built TensorFlow Lite library • Build your own AAR with Bazel • For example, to add custom ops

Slide 81

Slide 81 text

Takeaway

Slide 82

Slide 82 text

ML Kit: State of the art • Lack of high-quality demos (e.g. firebase mlkit quickstart, bugs, deprecated camera API, deformed camera preview) • Lack of high-level guidelines / best practices • Performance issues on old devices

Slide 83

Slide 83 text

The best is yet to come • Face contours: 100 data points • Smart Reply: conversation model • Online model compression

Slide 84

Slide 84 text

References

Slide 85

Slide 85 text

References • Talk Magritte for DroidCon London • Medium article: Android meets Machine Learning • Github Repo for demo • Joe Birch: Exploring Firebase MLKit on Android: Introducing MLKit (Part one) • Joe Birch: Exploring Firebase MLKit on Android: Face Detection (Part Two) • Thanks, Sandra ;)

Slide 86

Slide 86 text

Questions?