Android ML: Custom Model Serving with Android 11

GDG Pangyo · Android ML: Custom Model Serving with Android 11
Jeongah Shin (@jeongahri), Machine Learning of Things Lab Researcher, Modulabs; Freelance R&D, Naver ClovaAI

MoT (Machine Learning of Things): Doyoung Gwak, DongSeok Yang, Taekmin Kim, Jaewook Kang, Yonggeun Lee, Jeongah Shin, Youngchae Ji. Deep learning inside the smartphone: Model Optimization, Hardware, Mobile.
Jeongah Shin (@jeongahri), dedicated Android researcher. Tensorflow Lite, ML Kit, PyTorch Mobile, Android NN API, Federated Learning. Measuring metrics and testing are part of my daily routine. I work mostly on Vision (Pose Estimation, Object Detection, Face Detection, Text Detection …). I enjoy designing lightweight models myself, training them, turning them into fun app services, and meeting the users! 2018.05 ~

Turtle · Sharing experience! I'm here to talk about what I went through and learned while "producing" seven mobile machine learning apps.
CPM-based Pose Estimation · Mobile-BERT-based SQuAD · Finger Pose Estimation + ML Kit Text Recognition · ML Kit Face Detection

CONTENTS: On-device ML Production with Android
Part 1 Modeling: 1. Custom Model Building, 2. AutoML Vision Edge, 3. Tensorflow Lite Model Maker (Transfer Learning), 4. Tensorflow Hub
Part 2 Conversion: 1. Tensorflow Lite Converter (with tflite-support), 2. Post-training Quantization, 3. Performance Benchmark
Part 3 Inference: 1. Inference (Java/Kotlin), 2. Tensorflow Lite Support with Android Studio, 3. Hardware Acceleration (CPU, GPU, NNAPI), 4. Enhance your Model with Android 11

1. Custom Model Building (Part 1: Modeling)
Diagram labels: Modeling / Training, Conversion (TFLiteConverter, TocoConverter, MACE converter), .tflite, Inference, Script Module

1. Custom Model Building (Part 1: Modeling) · Performance
TF Lite is fast with no noticeable accuracy loss.
Portability: Android, iOS, and more specialized IoT devices.
Low latency: optimized float- and fixed-point CPU kernels, op-fusing, and more.
Acceleration: integration with GPU and internal/external accelerators.
Small model size: controlled dependencies, quantization, and op registration.
Tooling: conversion, compression, benchmarking, power consumption.
In short, it is a machine learning framework that enables on-device deep learning inference on portable mobile devices with low latency and a small model size.

1. Custom Model Building (Part 1: Modeling)
Tensorflow Lite: supports custom models only. Many pretrained Google models are available, but you basically have to bundle the .tflite file into your project yourself. Supports on-device inference only. Hardware acceleration is available through GPU / Hexagon DSP serving / NNAPI (GPU, NPU, DSP).
ML Kit: supports both production-ready models (made by Google) and custom models. Supports both on-device and cloud inference, and both go through Firebase. Hardware acceleration is possible, but serving is CPU-only.

1. Custom Model Building (Part 1: Modeling)

1. Custom Model Building (Part 1: Modeling)
Diagram labels: Modeling / Training, Conversion (TFLiteConverter, TocoConverter), .tflite
This pipeline works, but I don't recommend it! Researcher and developer roles are usually separate, and many researchers prefer PyTorch. But unless you already have a PyTorch model built, I recommend building an end-to-end pipeline in Tensorflow from the start.

1. Custom Model Building (Part 1: Modeling)
Compact Network Design (designing a lightweight architecture):
Model Architecture: ResNet, DenseNet, SqueezeNet
Optimizing Convolution Filter: MobileNet, ShuffleNet
Neural Architecture Search: NetAdapt, MNasNet
Model Compression & Optimization (making a model lighter): Weight Pruning, Quantization, Binarization, Weight Sharing, Knowledge Distillation, Transfer Learning · During-Training
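To illustrate why MobileNet-style "Optimizing Convolution Filter" designs are compact, the sketch below counts the weights of a standard convolution and of a depthwise separable one (a k×k depthwise convolution followed by a 1×1 pointwise convolution). The function names are illustrative, and bias terms are ignored.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a k x k standard convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a k x k depthwise conv plus a 1x1 pointwise conv (bias ignored)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1x1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    k, c_in, c_out = 3, 32, 64
    std = standard_conv_params(k, c_in, c_out)
    sep = depthwise_separable_params(k, c_in, c_out)
    # The ratio sep / std equals 1 / c_out + 1 / k**2.
    print(f"standard: {std}, separable: {sep}, ratio: {sep / std:.3f}")
    # prints: standard: 18432, separable: 2336, ratio: 0.127
```

For a 3×3 kernel, the separable version needs roughly 8 to 9 times fewer weights once c_out is large, because the ratio approaches 1/9.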

1. Custom Model Building (Part 1: Modeling)
Compact Network Design: which architecture should we choose?
Model Compression & Optimization: how do we optimize (within a fixed architecture)? e.g. Network Pruning, DNN Quantization
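The simplest form of Network Pruning is unstructured magnitude pruning, which zeroes the weights with the smallest absolute values. This is a minimal sketch on a flat list of weights. Real pruning is usually applied gradually during training and followed by fine-tuning, and this sketch omits both steps.

```python
from typing import List


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices sorted by |w|, smallest first; the first n_prune are removed.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


if __name__ == "__main__":
    w = [0.1, -0.5, 0.05, 0.9, -0.02]
    print(magnitude_prune(w, 0.4))  # [0.1, -0.5, 0.0, 0.9, 0.0]
```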

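2. AutoML Vision Edge (Part 1: Modeling) https://firebase.google.com/docs/ml/automl-image-labeling
Train models based on your data. Built-in model hosting.
Give it training data and it automatically performs single- or multi-label classification (classification models only). It learns from the information you provide (ground-truth values, labeling results, etc.), searches for a new model architecture, and trains the model for you. You can export the trained result in the format you need.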

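2. AutoML Vision Edge (Part 1: Modeling) https://firebase.google.com/docs/ml/automl-image-labeling
For Lightweight Backbone Experiment
Compared with other vendors' AutoML offerings, its performance is excellent, especially on multi-class classification tasks. However, it still shows many limitations when you want to use lightweight-architecture models in production. If your goal is to experiment with a variety of model backbones, I highly recommend it.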

3. Tensorflow Lite Model Maker (Part 1: Modeling)
Lets you run transfer learning experiments with a very simple implementation. Training takes less time than training from scratch, and a small dataset is enough. Teacher-Student Learning.
However, it currently works only in the Image Classification, Text Classification, and Question-Answer model domains.
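To make "Teacher-Student Learning" concrete, the sketch below implements the temperature-softened KL-divergence loss commonly used for knowledge distillation. The student is trained to match the teacher's softened output distribution. This shows the general technique only and is not a description of Model Maker's internals. The default temperature of 4.0 is an arbitrary choice for illustration.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    scaled = [z / temperature for z in logits]
    m = max(scaled)                          # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits: List[float], student_logits: List[float],
                      temperature: float = 4.0) -> float:
    """KL(teacher || student) on temperature-softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)   # soft targets from the teacher
    q = softmax(student_logits, temperature)   # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    # Scaling by T^2 keeps gradient magnitudes comparable across temperatures.
    return kl * temperature ** 2


if __name__ == "__main__":
    print(distillation_loss([2.0, 0.5, -1.0], [1.5, 0.7, -0.5]))
```

In practice, this term is usually combined with the ordinary cross-entropy loss on the ground-truth labels.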

4. Tensorflow Hub (Part 1: Modeling)
A collection of pre-trained models built on Tensorflow.

1. Tensorflow Lite Converter (Part 2: Conversion)
Tensorflow 1.x implementation - Pose Estimation model:

    import argparse
    import os

    # Hide GPUs and silence TensorFlow INFO/WARNING logs; conversion runs on the CPU.
    os.environ['CUDA_VISIBLE_DEVICES'] = ''
    os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

    parser = argparse.ArgumentParser(description="Tools for converting a frozen_pb into tflite or coreml.")
    parser.add_argument("--frozen_pb", type=str, default="model-23500.pb", help="Path to the frozen graph (.pb).")
    parser.add_argument("--input_node_name", type=str, default="image", help="Name of the input node.")
    parser.add_argument("--output_node_name", type=str, default="hourglass_out_3", help="Name of the output node.")
    parser.add_argument("--output_path", type=str, default="./result", help="Directory for storing tflite & coreml files.")
    parser.add_argument("--type", type=str, default="tflite", help="tflite or coreml")
    args = parser.parse_args()

    # Name the output after the .pb file, e.g. "model-23500.pb" -> "model-23500".
    # (rsplit("/", 1)[0] would return the directory instead of the file name.)
    output_filename = os.path.basename(args.frozen_pb)
    output_filename = output_filename.split(".")[0]

    # Only the tflite branch is shown here; coreml conversion is omitted.
    if "tflite" in args.type:
        import tensorflow as tf
        output_filename += ".tflite"
        # TF 1.13+ API. TF 1.12 exposed this as tf.contrib.lite.TFLiteConverter,
        # and TF 2.x exposes it as tf.compat.v1.lite.TFLiteConverter.
        converter = tf.lite.TFLiteConverter.from_frozen_graph(
            args.frozen_pb, [args.input_node_name], [args.output_node_name]
        )
        tflite_model = converter.convert()
        os.makedirs(args.output_path, exist_ok=True)
        with open(os.path.join(args.output_path, output_filename), "wb") as f:
            f.write(tflite_model)
        print("Generate tflite success.")

1. Tensorflow Lite Converter (Part 2: Conversion)
Tensorflow 2.x implementation with tflite-support
Please promise me! Write metadata so that collaboration goes smoothly.

2. Post-training Quantization (Part 2: Conversion)
✓ Post-training Quantization: Fully Quantized (weight/activation)
✓ Post-training Quantization: Weight ONLY
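As a rough picture of the "Weight ONLY" option, the sketch below stores the weights as symmetric per-tensor int8 values (w ≈ scale · q) and turns them back into floats before use. This is a simplified model. TensorFlow Lite's actual dynamic-range kernels may instead quantize activations on the fly, which the sketch does not cover. The "Fully Quantized" option also quantizes activations, which requires calibration data.

```python
from typing import List, Tuple


def quantize_weights_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantization: w ~= scale * q, with q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Weight-only mode (simplified): int8 weights are expanded back to floats for float kernels."""
    return [scale * v for v in q]


if __name__ == "__main__":
    w = [0.5, -1.0, 0.25, 0.0]
    q, scale = quantize_weights_int8(w)
    print(q, [round(x, 3) for x in dequantize(q, scale)])
    # Storage: 4 bytes per float32 weight vs. 1 byte per int8 weight (about 4x smaller).
    print(len(w) * 4, "bytes ->", len(q), "bytes")
```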

3. Performance Benchmark (Part 2: Conversion)
For a specific target device
For general use
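A general-purpose latency harness can look like the sketch below. It runs a few warm-up calls, then times repeated calls and reports summary statistics. The function name and defaults are hypothetical, and it measures only the machine it runs on. To benchmark a specific target device, TensorFlow Lite provides a native `benchmark_model` tool that runs on the device itself.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(run_inference: Callable[[], object], warmup: int = 5, runs: int = 50) -> Dict[str, float]:
    """Time a zero-argument inference callable and summarize latency in milliseconds."""
    for _ in range(warmup):                # warm-up: caches, lazy init, delegate setup
        run_inference()
    latencies_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        run_inference()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    latencies_ms.sort()
    return {
        "mean_ms": statistics.mean(latencies_ms),
        "median_ms": statistics.median(latencies_ms),
        "p90_ms": latencies_ms[int(0.9 * (len(latencies_ms) - 1))],  # simple lower-rank p90
        "min_ms": latencies_ms[0],
        "max_ms": latencies_ms[-1],
    }


if __name__ == "__main__":
    # Stand-in workload; in practice, wrap a real interpreter.invoke() call.
    result = benchmark(lambda: sum(i * i for i in range(10_000)))
    print({k: round(v, 3) for k, v in result.items()})
```

Reporting the median and tail latency alongside the mean helps separate steady-state speed from occasional stalls.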

1. Inference (python, Java/Kotlin, C++) (Part 3: Inference)
Which responsibilities should be delegated to the preprocessing and postprocessing classes?
Diagram labels: .tflite, input, output. Example: an image multi-class classification model.
Preprocessing: ImageClassifer · convertBitmapToByteBuffer(), loadModel(), loadAsset() …
Postprocessing: ImageClassiferFloatException · runInference(), addPixelValue(), getProbability(), getLabelProbArray() …
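Here is a minimal Python sketch of such a split for an image classifier. Preprocessing unpacks ARGB_8888 pixels (the Android Bitmap layout) into a normalized float buffer. Postprocessing pairs the output probabilities with labels and keeps the top k. All function names are hypothetical and only echo the roles of the slide's methods. The normalization mean/std of 127.5 is an assumption, since the right values depend on how the model was trained.

```python
from typing import List, Sequence, Tuple


def normalize_pixel(pixel: int, mean: float = 127.5, std: float = 127.5) -> List[float]:
    """Unpack an ARGB_8888 int into normalized R, G, B floats (alpha is dropped)."""
    r = (pixel >> 16) & 0xFF
    g = (pixel >> 8) & 0xFF
    b = pixel & 0xFF
    return [(r - mean) / std, (g - mean) / std, (b - mean) / std]


def preprocess(pixels: Sequence[int]) -> List[float]:
    """Preprocessing: flatten an image into the float input buffer (HWC order)."""
    buffer: List[float] = []
    for p in pixels:
        buffer.extend(normalize_pixel(p))
    return buffer


def top_k_labels(probs: Sequence[float], labels: Sequence[str], k: int = 3) -> List[Tuple[str, float]]:
    """Postprocessing: pair each class probability with its label and keep the k best."""
    ranked = sorted(zip(labels, probs), key=lambda lp: lp[1], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    print(preprocess([0xFFFF0000, 0xFF000000]))  # [1.0, -1.0, -1.0, -1.0, -1.0, -1.0]
    print(top_k_labels([0.1, 0.7, 0.2], ["cat", "dog", "bird"], k=2))  # [('dog', 0.7), ('bird', 0.2)]
```

Keeping these steps out of the class that owns the interpreter lets you test them without a model file.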

2. Tensorflow Lite Support with Android Studio (Part 3: Inference)

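2. Tensorflow Lite Support with Android Studio (Part 3: Inference)
Strongly recommended for researchers (with little Android platform development experience) who use it for experiments, but not recommended for production. It lets you build a model inference interface in just a few lines of code.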

3. Hardware Acceleration (CPU, GPU, NNAPI) (Part 3: Inference)
From a general perspective:
GPU (Graphics Processing Unit): models that involve heavy computation but still have to run fast, e.g. models that handle 3D.
CPU (Central Processing Unit): the best option for simple ML models.
NPU (Neural Processing Unit), DSP (Digital Signal Processor): complex models with a small processing load, or models that must run extremely fast. These are still hard to ship commercially because of device compatibility issues.


Q. If I use GPU or NNAPI for hardware acceleration, does inference always get faster?
No!

3. Hardware Acceleration (CPU, GPU, NNAPI) (Part 3: Inference)
✓ Keep these in mind when accelerating with GPU, NNAPI, or DSP!
✓ Only a limited set of Tensorflow operators can be accelerated on a given piece of hardware.
✓ The parts of the graph that use operators the delegate does not support can fall back to the CPU while the rest runs on the delegate. However, this can actually perform worse than using the CPU alone.
✓ Mobile-BERT: multi-threading on the CPU was more effective than using GPU delegation.
https://www.tensorflow.org/lite/performance/gpu_advanced
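The toy cost model below shows why partial delegation can lose to CPU-only execution. Each op runs on the delegate if it is supported and falls back to the CPU otherwise, and every switch between the two devices costs a tensor transfer. The op names and millisecond costs are invented, chosen only to show how fallback boundaries can erase the delegate's per-op speedup. Real delegates partition the graph into subgraphs, and their overheads differ.

```python
from typing import Dict, Sequence, Set


def estimate_latency_ms(
    ops: Sequence[str],
    delegate_supported: Set[str],
    cpu_ms: Dict[str, float],
    delegate_ms: Dict[str, float],
    transfer_ms: float,
) -> float:
    """Toy model: supported ops run on the delegate, others fall back to the CPU.
    Each switch between CPU and delegate adds one tensor transfer."""
    total = 0.0
    device = "cpu"                       # input tensors start in CPU memory
    for op in ops:
        target = "delegate" if op in delegate_supported else "cpu"
        if target != device:
            total += transfer_ms         # copy tensors across the partition boundary
            device = target
        total += delegate_ms[op] if target == "delegate" else cpu_ms[op]
    if device != "cpu":
        total += transfer_ms             # outputs are read back on the CPU
    return total


if __name__ == "__main__":
    graph = ["conv", "custom_op", "conv", "custom_op"]  # hypothetical op sequence
    cpu = {"conv": 4.0, "custom_op": 1.0}
    gpu = {"conv": 1.0}                                  # delegate only supports "conv"
    print("CPU only:", estimate_latency_ms(graph, set(), cpu, gpu, transfer_ms=3.0))      # 10.0
    print("Delegated:", estimate_latency_ms(graph, {"conv"}, cpu, gpu, transfer_ms=3.0))  # 16.0
```

Even though "conv" is four times faster on the delegate here, the alternating unsupported op forces four boundary crossings, so the delegated run is slower. Measuring both configurations on the real device is the reliable way to decide.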

4. Enhance your Model with Android 11 (Part 3: Inference)
Android Neural Networks API 1.3 (ships with Android 11, API level 30; NNAPI itself has been available since API 27, Android 8.1 Oreo)
Quality of Service API: lets you assign a priority to model execution and set a timeout for it.
Memory Domain API: reduces memory copies and transformations across consecutive model executions.
Expanded Quantization Support: adds signed integer asymmetric quantization. Earlier NNAPI versions supported quantization that was per-tensor, asymmetric, and uint8.
8-bit quantization formula:
A (m × n), activations (in most cases): signed int8 in the range [-128, 127], with a zero-point (asymmetric).
B (n × p), quantized weights: weight initialization techniques vary, but the zero-point can be forced to 0 (symmetric).
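The sketch below applies that scheme to a small matrix product C = A × B. It follows the convention real ≈ scale · (q − zero_point). A is quantized as asymmetric int8 with a zero-point, and B is quantized as symmetric int8 with zero-point 0. Products are accumulated in integers and rescaled once at the end. This is a simplified illustration: it uses per-tensor scales only, with no per-channel scales, no bias, and no requantization of C back to int8.

```python
from typing import List, Tuple

Matrix = List[List[float]]
QMatrix = List[List[int]]


def quantize_activations(a: Matrix) -> Tuple[QMatrix, float, int]:
    """Asymmetric int8: real ~= scale * (q - zero_point), with q in [-128, 127]."""
    flat = [v for row in a for v in row]
    lo, hi = min(min(flat), 0.0), max(max(flat), 0.0)   # range must contain 0
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    zero_point = max(-128, min(127, round(-128 - lo / scale)))
    q = [[max(-128, min(127, round(v / scale) + zero_point)) for v in row] for row in a]
    return q, scale, zero_point


def quantize_weights(b: Matrix) -> Tuple[QMatrix, float]:
    """Symmetric int8: zero_point fixed to 0, with q in [-127, 127]."""
    max_abs = max(abs(v) for row in b for v in row)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [[max(-127, min(127, round(v / scale))) for v in row] for row in b]
    return q, scale


def int8_matmul(qa: QMatrix, sa: float, za: int, qb: QMatrix, sb: float) -> Matrix:
    """C[i][k] ~= sa * sb * sum_j (qa[i][j] - za) * qb[j][k]; since zb = 0, no (qb - zb) term."""
    m, n, p = len(qa), len(qb), len(qb[0])
    out = []
    for i in range(m):
        row = []
        for k in range(p):
            acc = sum((qa[i][j] - za) * qb[j][k] for j in range(n))  # integer accumulator
            row.append(sa * sb * acc)                                 # single rescale to float
        out.append(row)
    return out


if __name__ == "__main__":
    A = [[0.1, 0.5], [0.9, 0.3]]    # m x n activations (non-negative here, e.g. after ReLU)
    B = [[0.2, -0.4], [0.6, 0.1]]   # n x p weights
    qa, sa, za = quantize_activations(A)
    qb, sb = quantize_weights(B)
    print(int8_matmul(qa, sa, za, qb, sb))  # close to the float result [[0.32, 0.01], [0.36, -0.33]]
```

Fixing B's zero-point at 0 is what makes the inner loop cheap. Only A's zero-point has to be subtracted, so no cross terms involving a weight offset appear.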

Thank you! Jeongah Shin (@jeongahri) jeongah.arie@gmail.com https://github.com/motlabs/awesome-ml-demos-with-android
