Slide 1

Slide 1 text

TensorFlow 2.0 for Edge TPU Programming @davilagrau

Slide 2

Slide 2 text

A few words about me... Andres-Leonardo Martinez-Ortiz a.k.a. almo is a member of the Google Engineering team, leading Google Developer Relations worldwide. Based in Zurich, he drives the success of Google's developer products and the Open Web by creating a thriving ecosystem of developers. Nurturing developer experts and partners in large companies, startups, universities and enterprises, almo fosters open standards and Google technologies. almo is also a member of IEEE, ACM, the Linux Foundation and the Computer Society. @davilagrau almo.dev almo

Slide 3

Slide 3 text

Introduction

Slide 4

Slide 4 text

TensorFlow What is TensorFlow? ● An end-to-end open source machine learning platform ● For research and production ● Distributed training and serving predictions ● Apache 2.0 license Current Stable Version 2.x

Slide 5

Slide 5 text

TensorFlow Hello World

import tensorflow as tf

mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10)
])

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer='adam', loss=loss_fn, metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test, verbose=2)

Slide 6

Slide 6 text

Why TensorFlow Easy model building Build and train ML models easily using intuitive high-level APIs like Keras with eager execution, which makes for immediate model iteration and easy debugging. Robust ML production anywhere Easily train and deploy models in the cloud, on-prem, in the browser, or on-device no matter what language you use. Powerful for research A simple and flexible architecture to take new ideas from concept to code, to state-of-the-art models, and to publication faster.

Slide 7

Slide 7 text

TensorFlow.js Library for ML in JavaScript Run existing models Use off-the-shelf JavaScript models or convert Python TensorFlow models to run in the browser or under Node.js. Retrain existing models Retrain pre-existing ML models using your own data. Develop ML with JavaScript Build and train models directly in JavaScript using flexible and intuitive APIs.

Slide 8

Slide 8 text

TensorFlow Lite ML models on mobile and IoT devices Pick a model Pick a new model or retrain an existing one. Optimize Quantize by converting 32-bit floats to more efficient 8-bit integers or run on GPU. Convert Convert a TensorFlow model into a compressed flat buffer with the TensorFlow Lite Converter. Deploy Take the compressed .tflite file and load it into a mobile or embedded device.
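The convert and optimize steps above map onto the TensorFlow Lite Converter API. A minimal sketch, assuming a trained Keras model (the toy model and output file name below are illustrative):

import tensorflow as tf

# Toy stand-in for a trained model; in practice you would train or load your own.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(10)
])

# Convert to a compressed flat buffer and apply default post-training quantization.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# Deploy: load the resulting .tflite file on a mobile or embedded device.
with open('model.tflite', 'wb') as f:
    f.write(tflite_model)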

Slide 9

Slide 9 text

Coral Architecture - Edge TPU AI at the edge End-to-end AI infrastructure High performance in a small physical and power footprint. Co-design of AI hardware, software and algorithms A broad range of applications An open, end-to-end infrastructure for deploying AI solutions

Slide 10

Slide 10 text

TensorFlow Extended (TFX) Deploying production ML pipelines TensorFlow Data Validation TensorFlow Data Validation (TFDV) helps developers understand, validate, and monitor their ML data at scale. TensorFlow Serving Machine Learning serving systems, supporting model versioning and multiple models, experimenting via A/B testing, while ensuring high throughput with low latency. TensorFlow Transform Preprocessing data into a suitable format, converting between formats, tokenizing and stemming text and forming vocabularies, etc. TensorFlow Model Analysis TensorFlow Model Analysis (TFMA) enables developers to compute and visualize evaluation metrics for their models.

Slide 11

Slide 11 text

Become an expert in machine learning Coding skills: Building ML models involves much more than just knowing ML concepts—it requires coding in order to do the data management, parameter tuning, and parsing of results needed to test and optimize your model. Math and stats: ML is a math-heavy discipline, so if you plan to modify ML models or build new ones from scratch, familiarity with the underlying math concepts is crucial to the process. ML theory: Knowing the basics of ML theory will give you a foundation to build on, and help you troubleshoot when something goes wrong. Build your own projects: Getting hands-on experience with ML is the best way to put your knowledge to the test, so don’t be afraid to dive in early with a simple Colab or tutorial to get some practice. More: https://www.tensorflow.org/resources/learn-ml

Slide 12

Slide 12 text

TensorFlow 2.0

Slide 13

Slide 13 text

TensorFlow: ecosystem support for all developers. Who uses what: ● Newbies, rookies and other early-entry specimens: Sequential API + built-in layers ● Padawans, who are able to build their own lightsabers (standard use cases): Functional API + built-in layers ● Disciplined and experienced Jedi Knights: Functional API + custom layers, metrics and losses ● Jedi Masters, among the most accomplished and recognized polymaths in the Star Wars galaxy: Subclassing, everything from scratch (see the sketch below)
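For the "everything from scratch" tier, a minimal sketch of what model subclassing looks like (the two-layer classifier below is hypothetical, not from the slides):

import tensorflow as tf

class MyModel(tf.keras.Model):
    """Hypothetical subclassed model: a two-layer MLP classifier."""
    def __init__(self):
        super().__init__()
        self.dense1 = tf.keras.layers.Dense(128, activation='relu')
        self.dense2 = tf.keras.layers.Dense(10)

    def call(self, inputs, training=False):
        # Arbitrary Python logic can live here; the forward pass is imperative code.
        x = self.dense1(inputs)
        return self.dense2(x)

model = MyModel()
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])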

Slide 14

Slide 14 text

Eager execution TensorFlow's eager execution is an imperative programming environment that evaluates operations immediately, without building graphs: ● An intuitive interface—Structure your code naturally and use Python data structures. Quickly iterate on small models and small data. ● Easier debugging—Call ops directly to inspect running models and test changes. Use standard Python debugging tools for immediate error reporting. ● Natural control flow—Use Python control flow instead of graph control flow, simplifying the specification of dynamic models.
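A minimal illustration of eager execution, assuming TensorFlow 2.x defaults (the values are arbitrary):

import tensorflow as tf

# Operations run immediately and return concrete values; no session or graph is needed.
x = tf.constant([[2.0, 3.0]])
w = tf.constant([[1.0], [4.0]])
print(tf.matmul(x, w))        # tf.Tensor([[14.]], shape=(1, 1), dtype=float32)
print(tf.executing_eagerly()) # True by default in TensorFlow 2.x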

Slide 15

Slide 15 text

The Functional API at a glance ● An API to configure the connectivity of DAGs of layers (see the sketch below) ● Targeted at users more than developers ● Declarative configuration level: no logic ○ All logic is contained inside of layers ● All “debugging” is done statically at construction time; any model you can instantiate will run: ○ You don’t write any Python, so you don’t write bugs ○ “Debugging” == topology debugging (can be done visually) ● Models are static data structures ○ Inspectable: you can retrieve intermediate activations and use them in a new model ○ Plottable: you can directly generate the graph via “plot_model” ○ Safely serializable
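A small sketch in that spirit: a hypothetical model with a residual-style skip connection, declared as a DAG of layers and then plotted (plot_model requires pydot and graphviz):

import tensorflow as tf

# Declarative configuration of a DAG of layers; no custom logic.
inputs = tf.keras.Input(shape=(32,))
x = tf.keras.layers.Dense(32, activation='relu')(inputs)
x = tf.keras.layers.Add()([x, inputs])          # non-sequential data flow (skip connection)
outputs = tf.keras.layers.Dense(10)(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs)

# The model is a static, inspectable data structure; topology debugging can be visual.
model.summary()
tf.keras.utils.plot_model(model, to_file='model.png', show_shapes=True)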

Slide 16

Slide 16 text

TensorFlow Core v2.1.0 (Stable)

Slide 17

Slide 17 text

TensorFlow Keras https://www.tensorflow.org/guide/keras ● TensorFlow's implementation of the Keras API specification ● Support for TensorFlow-specific functionality ○ Eager execution ○ Data pipelines ○ Estimators ● Keras functional API ● Build complex model topologies (see the sketch below) ○ Multi-input models, ○ Multi-output models, ○ Models with shared layers (the same layer called several times), ○ Models with non-sequential data flows (e.g. residual connections). ● Training callbacks

import tensorflow as tf
from tensorflow import keras
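A sketch of one such topology, assuming the hypothetical case of two text inputs sharing a single embedding layer; training callbacks would be passed to fit():

import tensorflow as tf
from tensorflow import keras

# Two inputs share the same Embedding layer instance (it is called twice).
text_a = keras.Input(shape=(None,), dtype='int32', name='text_a')
text_b = keras.Input(shape=(None,), dtype='int32', name='text_b')

shared_embedding = keras.layers.Embedding(input_dim=10000, output_dim=64)
encoded_a = keras.layers.GlobalAveragePooling1D()(shared_embedding(text_a))
encoded_b = keras.layers.GlobalAveragePooling1D()(shared_embedding(text_b))

output = keras.layers.Dense(1, activation='sigmoid')(
    keras.layers.concatenate([encoded_a, encoded_b]))
model = keras.Model(inputs=[text_a, text_b], outputs=output)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Callbacks go to fit(), e.g.:
# model.fit(..., callbacks=[keras.callbacks.EarlyStopping(patience=2),
#                           keras.callbacks.TensorBoard(log_dir='./logs')])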

Slide 18

Slide 18 text

TensorFlow Datasets https://www.tensorflow.org/datasets ● Easy to use ● High-performance input pipelines ● Compatible with both TensorFlow Eager mode and Graph mode ● Datasets are dictionaries mapping feature names to tensors ● Caching and prefetching ● Integrated with Google Cloud Platform

import tensorflow_datasets as tfds
ds = tfds.load('mnist', split='train', shuffle_files=True)

Source code and catalog: https://github.com/tensorflow/datasets https://www.tensorflow.org/datasets/catalog/overview
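A sketch of a typical input pipeline built on top of the load() call above; the batch size and shuffle buffer are arbitrary:

import tensorflow as tf
import tensorflow_datasets as tfds

# Each element is a dictionary mapping feature names to tensors;
# cache/shuffle/batch/prefetch keep the accelerator fed.
ds = tfds.load('mnist', split='train', shuffle_files=True)
ds = (ds.map(lambda ex: (tf.cast(ex['image'], tf.float32) / 255.0, ex['label']),
             num_parallel_calls=tf.data.experimental.AUTOTUNE)
        .cache()
        .shuffle(10000)
        .batch(32)
        .prefetch(tf.data.experimental.AUTOTUNE))

for images, labels in ds.take(1):
    print(images.shape, labels.shape)  # (32, 28, 28, 1) (32,)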

Slide 19

Slide 19 text

TensorFlow Hub https://tfhub.dev Discover our hub: find out what you can do in TensorFlow Hub and how our platform works. Meet our community: get to know other users, find new collaborators, or post questions and get answers. Intro to Machine Learning: if you’re new to machine learning, our introductory resources explain all the ins and outs.

!pip install "tensorflow_hub>=0.6.0"

import tensorflow_hub as hub
embed = hub.KerasLayer("https://tfhub.dev/google/nnlm-en-dim128/2")
embeddings = embed(["A long sentence.", "single-word", "http://example.com"])
print(embeddings.shape)  # (3, 128)
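A sketch of reusing that embedding as the first layer of a hypothetical binary text classifier (the hyperparameters are illustrative):

import tensorflow as tf
import tensorflow_hub as hub

# The pre-trained embedding becomes a trainable Keras layer over string inputs.
model = tf.keras.Sequential([
    hub.KerasLayer("https://tfhub.dev/google/nnlm-en-dim128/2",
                   input_shape=[], dtype=tf.string, trainable=True),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(1)
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=['accuracy'])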

Slide 20

Slide 20 text

Model Garden for TensorFlow https://github.com/tensorflow/models/tree/master/official ● State-of-the-art language understanding models: More members in Transformer family ● Classification models: EfficientNet, MnasNet and variants. ● Trainable on: ○ Distributed training on multiple GPUs ○ Distributed training on multiple GPU hosts ○ Distributed training on Cloud TPUs !pip install tf-models-nightly !export PYTHONPATH=$PYTHONPATH:/path/to/models import os os.environ['PYTHONPATH'] += ":/path/to/models"

Slide 21

Slide 21 text

Distributed training with TensorFlow https://www.tensorflow.org/guide/distributed_training tf.distribute.Strategy is a TensorFlow API to distribute training across multiple GPUs, multiple machines or TPUs. Using this API, you can distribute your existing models and training code with minimal code changes, eagerly or in a graph. The API can also be used to distribute evaluation and prediction on different platforms. Distribution strategies are integrated into Keras. Support matrix (see the MirroredStrategy sketch below):

Strategy               | Keras API    | Custom training loop | Estimator API
Mirrored               | Supported    | Supported            | Limited
TPU                    | Supported    | Supported            | No support
Multi Worker Mirrored  | Supported    | Supported            | Limited
Central Storage        | Experimental | Experimental         | Limited
Parameter Server       | Post TF 2.4  | Experimental         | Limited
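A minimal MirroredStrategy sketch using the Keras API path from the table; only model construction and compilation move inside the strategy scope (the model itself is a placeholder):

import tensorflow as tf

# Synchronous training replicated across all local GPUs (or the CPU if none).
strategy = tf.distribute.MirroredStrategy()
print('Number of replicas:', strategy.num_replicas_in_sync)

with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation='relu', input_shape=(784,)),
        tf.keras.layers.Dense(10)
    ])
    model.compile(optimizer='adam',
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=['accuracy'])

# model.fit(train_dataset, epochs=5)  # fit/evaluate/predict stay unchanged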

Slide 22

Slide 22 text

Migration https://www.tensorflow.org/guide/migrate https://www.tensorflow.org/guide/upgrade#recommended_upgrade_process ● It is still possible to run 1.x code, unmodified (except for contrib), in TensorFlow 2.0:

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

● Or make the code 2.0-native following the migration guide ● Automatically upgrade code to TensorFlow 2, following the recommended upgrade process:

$ tf_upgrade_v2 \
  --intree my_project/ \
  --outtree my_project_v2/ \
  --reportfile report.txt

Slide 23

Slide 23 text

TensorFlow Trusted Partner Pilot Program Use Cases ● Learn how TensorFlow solves real, everyday machine learning problems ● An entire ecosystem to help you solve challenging, real-world problems with machine learning ● Connect with a TensorFlow Trusted Partner https://www.tensorflow.org/about/case-studies

Slide 24

Slide 24 text

Coral Edge TPU

Slide 25

Slide 25 text

Coral Edge TPU intro Inference accelerator: ● Optimized for vision applications and convolutional neural networks ● Runs concurrent state-of-the-art models on high-resolution video in real time (MobileNet V2 at 400 FPS) ● Full support for quantized TensorFlow Lite models An individual Edge TPU can perform 4 trillion (fixed-point) operations per second (4 TOPS) using only 2 watts of power; in other words, you get 2 TOPS per watt.

Slide 26

Slide 26 text

Coral Portfolio (1) Dev Board A single-board computer with a removable system-on-module (SOM) featuring the Edge TPU. Available Now Price $129.99 USB Accelerator A USB accessory featuring the Edge TPU that brings ML inferencing to existing systems. Available Now Price $59.99 PCI-E Accelerator Integrate the Edge TPU into legacy and new systems using a Mini PCIe interface. Available Now Price $24.99 M.2 Accelerator A+E key Integrate the Edge TPU into legacy and new systems using an M.2 A+E key interface. Available Now Price $24.99

Slide 27

Slide 27 text

Coral Portfolio (2) Dev Board Mini A single-board computer with a removable system-on-module (SOM) featuring the Edge TPU. Available Coming soon Price $99.99 M.2 Accelerator B+M key Integrate the Edge TPU into legacy and new systems using an M.2 B+M key interface. Available Now Price $24.99 Accelerator module A solderable multi-chip module including the Edge TPU Available Coming soon Price $19.99 System on Module (SoM) A fully-integrated system for accelerated ML applications in a 40mm x 48mm pluggable module. Available Now Price $114.99

Slide 28

Slide 28 text

Dev Board features
● Edge TPU System-on-Module (SoM)
○ NXP i.MX 8M SoC (quad-core Arm Cortex-A53, plus Cortex-M4F)
○ Google Edge TPU ML accelerator coprocessor
○ Cryptographic coprocessor
○ Wi-Fi 2x2 MIMO (802.11b/g/n/ac 2.4/5 GHz)
○ Bluetooth 4.2
○ 8 GB eMMC
○ 1 GB LPDDR4
● USB connections
○ USB Type-C power port (5 V DC)
○ USB 3.0 Type-C OTG port
○ USB 3.0 Type-A host port
○ USB 2.0 Micro-B serial console port
● Audio connections
○ 3.5 mm audio jack (CTIA compliant)
○ Digital PDM microphone (x2)
○ 2.54 mm 4-pin terminal for stereo speakers
● Video connections
○ HDMI 2.0a (full size)
○ 39-pin FFC connector for MIPI DSI display (4-lane)
○ 24-pin FFC connector for MIPI CSI-2 camera (4-lane)
● MicroSD card slot
● Gigabit Ethernet port
● 40-pin GPIO expansion header
● Supports Mendel Linux (derivative of Debian)

Slide 29

Slide 29 text

Coral SoM block diagram ● CPU: quad symmetric Cortex-A53 processors supporting the 64-bit Armv8-A architecture, plus an Arm Cortex-M4 core ● GPU: 4 shaders, 267 million triangles/sec, 1.6 Gigapixel/sec, 32 GFLOPs 32-bit or 64 GFLOPs 16-bit ● Video: 4Kp60 HEVC/H.265 main, 4Kp60 VP9 and 4Kp30 AVC/H.264; 1080p60 MPEG-2, MPEG-4p2, VC-1, H.263, etc. ● Memory: 1 GB LPDDR4 SDRAM, 1600 MHz maximum DDR clock; 8 GB NAND eMMC flash memory, 8-bit MMC mode ● The Edge TPU interfaces with the i.MX 8M SoC via PCIe and I2C/GPIO ● Microchip ATECC608A cryptographic coprocessor with asymmetric (public/private) key cryptographic signatures

Slide 30

Slide 30 text

Software toolchain Mendel OS: a fork of Debian that powers the Coral boards, plus a C++ & Python SDK with APIs to low-level connections. Edge TPU Compiler: converts TF graphs to run on targeted chipsets. Companion software: abstracts away traditional board management/coding in a high-level program. Stack (from the block diagram): input model (TFLite), Compiler, C++ & Python SDK, user apps, TFLite C++ API, Mendel OS (Debian Linux).

Slide 31

Slide 31 text

Mendel Development Tool (MDT) Similar to the Android-standard ADB tool "Porcelain" wrapper based around industry standard protocols such as SSH, mDNS, and HTTP Handles device discovery, shell, and key management Cross-platform (Mac, Windows, Linux) Open source, Apache licensed Available as a Debian package via Google-hosted APT repositories Also available via the Python standard pip installation tool $ mdt devices $ mdt shell $ mdt push $ mdt pull $ mdt install

Slide 32

Slide 32 text

Edge TPU performance Embedded CPU: Quad-core Cortex-A53 @ 1.5GHz; Dev Board: Quad-core Cortex-A53 @ 1.5GHz + Edge TPU Source: https://coral.ai/technology

Slide 33

Slide 33 text

Edge TPU Compiler ● Compiles a TensorFlow Lite model (.tflite file) into a file that's compatible with the Edge TPU. ● Runs on any modern Debian-based Linux system; it does not run on the Coral device itself or on macOS.

Slide 34

Slide 34 text

Edge TPU Compiler To run a model on the Coral Edge TPU you need two components: a model quantized for UINT8 (restricted to operations that support UINT8), and the compiled version of that quantized model (see the sketch below).

edgetpu_compiler [options] model...

Source: https://coral.ai/docs/edgetpu/models-intro/
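A sketch of producing such a UINT8-quantized model with post-training full-integer quantization; the toy model and random calibration data below are placeholders for a real trained model and real sample inputs. The resulting .tflite file is then passed to edgetpu_compiler:

import numpy as np
import tensorflow as tf

# Placeholder for a trained model; in practice load or train your own.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(8, 3, activation='relu', input_shape=(224, 224, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10)
])

# A representative dataset lets the converter calibrate activation ranges.
def representative_dataset():
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]  # integer-only ops
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

with open('model_quant.tflite', 'wb') as f:
    f.write(converter.convert())

# Then, on a Debian-based host: edgetpu_compiler model_quant.tflite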

Slide 35

Slide 35 text

DEMO Inception V2 with/without compilation Inception V2 model with quantization and compiled (optimized for the Edge TPU) versus Inception V2 model with quantization but not compiled

mendel@fun-calf:~/inception_v2$ ./command.sh
Using Inception V2 model with quantization and compiled (optimized for TPU; downloaded from https://coral.ai/models)
Detects 1000 types of objects; dataset: ImageNet; input size: 224x224
---------------------------
macaw
Score: 0.9921875
Inference time: 36.04 ms (27.75 fps)
*****************************
Using Inception V2 model with quantization but not compiled
---------------------------
macaw
Score: 0.9921875
Inference time: 612.56 ms (1.63 fps)

Slide 36

Slide 36 text

Coral Edge TPU pre-trained models TF Lite models already pre-compiled to run on the Edge TPU: image classification, object detection, semantic segmentation, on-device retraining. Source: https://coral.ai/models/

Slide 37

Slide 37 text

Co-compiling multiple models Co-compilation to run multiple models on the same Edge TPU: caches their parameter data together, eliminating the need to clear the cache each time you run a different model. Be careful if using co-compilation in combination with multiple Edge TPUs.

Slide 38

Slide 38 text

Python API ● ClassificationEngine: Performs image classification. Create an instance by specifying a model, and then pass an image (such as a JPEG) to ClassifyWithImage() and it returns a list of labels and scores. ● DetectionEngine: Performs object detection. Create an instance by specifying a model, and then pass an image (such as a JPEG) to DetectWithImage() and it returns a list of DetectionCandidate objects, each of which contains a label, a score, and the coordinates of the object. ● ImprintingEngine: This implements a transfer-learning technique called imprinting that does not require backward propagation, allowing you to perform model retraining that's accelerated on the Edge TPU
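A sketch of the ClassificationEngine flow described above, assuming the legacy edgetpu Python library; the model, labels and image paths are illustrative, and newer library versions expose the same call as classify_with_image():

from edgetpu.classification.engine import ClassificationEngine
from PIL import Image

# Create an engine from a compiled model and load a simple "id label" file.
engine = ClassificationEngine('mobilenet_v2_edgetpu.tflite')
labels = {}
with open('imagenet_labels.txt') as f:
    for line in f:
        idx, name = line.strip().split(maxsplit=1)
        labels[int(idx)] = name

# Pass an image and get back (label_id, score) pairs.
img = Image.open('parrot.jpg')
for label_id, score in engine.ClassifyWithImage(img, top_k=3):
    print(labels.get(label_id, label_id), score)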

Slide 39

Slide 39 text

DEMO Detection Engine (Python API)

# Initialize engine.
engine = DetectionEngine(args.model)
labels = ReadLabelFile(args.label) if args.label else None

# Open image.
img = Image.open(args.input)
draw = ImageDraw.Draw(img)

# Run inference.
ans = engine.DetectWithImage(img, threshold=0.05, keep_aspect_ratio=True,
                             relative_coord=False, top_k=10)
for obj in ans:
    box = obj.bounding_box.flatten().tolist()
    # Draw a rectangle.
    draw.rectangle(box, outline='red')

Slide 40

Slide 40 text

References Coral Edge TPU
[1] https://coral.ai/docs
[2] Source code for the Edge TPU: https://github.com/google-coral/edgetpu
[3] Blog: https://blog.tensorflow.org/2019/03/build-ai-that-works-offline-with-coral.html
[4] Codelab: https://codelabs.developers.google.com/codelabs/edgetpu-classifier/index.html
[5] Jouppi, N. P., Young, C., Patil, N., Patterson, D., Agrawal, G., Bajwa, R., … Yoon, D. H. (2017). In-Datacenter Performance Analysis of a Tensor Processing Unit. Retrieved from https://arxiv.org/abs/1704.04760

Slide 41

Slide 41 text

TensorFlow 2.0 for Edge TPU Programming @davilagrau