
Tensorflow 2.0 for Edge TPU Programming

almo
March 11, 2021

An introduction to TensorFlow 2.0 and to using it with Edge TPUs to build applications that run at the edge.

TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries and community resources that lets researchers push the state of the art in ML and lets developers easily build and deploy ML-powered applications.

Coral helps you bring on-device AI application ideas from prototype to production. It offers a platform of hardware components, software tools, and pre-compiled models for building devices with local AI.


Transcript

  1. A few words about me...

    Andres-Leonardo Martinez-Ortiz, a.k.a. almo, is a member of the Google Engineering team, leading Google Developer Relations worldwide. Based in Zurich, he drives the success of Google's developer products and the Open Web by creating a thriving ecosystem of developers. Nurturing developer experts and partners in large companies, startups, universities and enterprises, almo fosters open standards and Google technologies. almo is also a member of IEEE, ACM, Linux Foundation and Computer Society.

    @davilagrau, almo.dev
  2. TensorFlow What is TensorFlow? • An end-to-end open source machine

    learning platform • For research and production • Distributed training and serving predictions • Apache 2.0 license Current Stable Version 2.x
  3. TensorFlow Hello World

    import tensorflow as tf

    # Load MNIST and scale pixel values to [0, 1].
    mnist = tf.keras.datasets.mnist
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    # The last layer outputs raw logits, one per digit class.
    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10)
    ])

    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    model.compile(optimizer='adam', loss=loss_fn, metrics=['accuracy'])

    model.fit(x_train, y_train, epochs=5)
    model.evaluate(x_test, y_test, verbose=2)
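The final Dense(10) layer in the Hello World model emits raw logits, which is why the loss is created with from_logits=True. As a rough illustration of what that loss computes for a single example, here is a pure-Python sketch assuming the standard softmax cross-entropy definition. The function names and the example logits are illustrative assumptions, not TensorFlow APIs.

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_categorical_crossentropy(label, logits):
    """Negative log-probability assigned to the integer class `label`."""
    return -math.log(softmax(logits)[label])

# Hypothetical logits for a 3-class problem (illustrative values only).
example_logits = [2.0, 1.0, 0.1]
probs = softmax(example_logits)
loss = sparse_categorical_crossentropy(0, example_logits)
```

A model that outputs identical logits for all 10 digit classes would score a loss of log(10) on every example, which is a handy sanity check for the first training step.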
  4. Why TensorFlow Easy model building Build and train ML models

    easily using intuitive high-level APIs like Keras with eager execution, which makes for immediate model iteration and easy debugging. Robust ML production anywhere Easily train and deploy models in the cloud, on-prem, in the browser, or on-device no matter what language you use. Powerful for research A simple and flexible architecture to take new ideas from concept to code, to state-of-the-art models, and to publication faster.
  5. TensorFlow.js Library for ML in JavaScript Run existing models Use

    off-the-shelf JavaScript models or convert Python TensorFlow models to run in the browser or under Node.js. Retrain existing models Retrain pre-existing ML models using your own data. Develop ML with JavaScript Build and train models directly in JavaScript using flexible and intuitive APIs.
  6. TensorFlow Lite ML models on mobile and IoT devices Pick

    a model Pick a new model or retrain an existing one. Optimize Quantize by converting 32-bit floats to more efficient 8-bit integers or run on GPU. Convert Convert a TensorFlow model into a compressed flat buffer with the TensorFlow Lite Converter. Deploy Take the compressed .tflite file and load it into a mobile or embedded device.
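To make the "Optimize" step more concrete, the sketch below shows an affine mapping from 32-bit floats to unsigned 8-bit integers and back. It is a simplified illustration of the general idea only; the scheme, function names and sample weights are assumptions, and the TensorFlow Lite Converter's actual behaviour may differ.

```python
def quantization_params(min_val, max_val, num_bits=8):
    """Return (scale, zero_point) mapping [min_val, max_val] onto unsigned ints."""
    qmax = 2 ** num_bits - 1
    # Widen the range so it contains 0.0, which then maps to an exact integer.
    min_val, max_val = min(min_val, 0.0), max(max_val, 0.0)
    scale = (max_val - min_val) / qmax
    if scale == 0.0:
        scale = 1.0  # degenerate case: every value is zero
    zero_point = int(round(-min_val / scale))
    return scale, zero_point

def quantize(values, scale, zero_point, num_bits=8):
    """Map floats to integers, clamping to the representable range."""
    qmax = 2 ** num_bits - 1
    return [min(qmax, max(0, int(round(v / scale)) + zero_point)) for v in values]

def dequantize(q_values, scale, zero_point):
    """Recover approximate float values from the quantized integers."""
    return [scale * (q - zero_point) for q in q_values]

# Hypothetical weights (illustrative values only).
weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
scale, zero_point = quantization_params(min(weights), max(weights))
q_weights = quantize(weights, scale, zero_point)
restored = dequantize(q_weights, scale, zero_point)
```

Each restored value differs from the original by at most about one quantization step (`scale`), which is the precision traded for a four-times-smaller representation.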
  7. Coral Architecture - Edge TPU AI at the edge End-to-end

    AI infrastructure High performance in a small physical and power footprint. Co-design of AI hardware, software and algorithms A broad range of applications An open, end-to-end infrastructure for deploying AI solutions
  8. TensorFlow Extended (TFX) Deploying production ML pipelines TensorFlow Data Validation

    TensorFlow Data Validation (TFDV) helps developers understand, validate, and monitor their ML data at scale. TensorFlow Serving Machine Learning serving systems, supporting model versioning and multiple models, experimenting via A/B testing, while ensuring high throughput with low latency. TensorFlow Transform Preprocessing data into a suitable format, converting between formats, tokenizing and stemming text and forming vocabularies, etc. TensorFlow Model Analysis TensorFlow Model Analysis (TFMA) enables developers to compute and visualize evaluation metrics for their models.
  9. Become an expert in machine learning Coding skills: Building ML

    models involves much more than just knowing ML concepts—it requires coding in order to do the data management, parameter tuning, and parsing results needed to test and optimize your model. Math and stats: ML is a math heavy discipline, so if you plan to modify ML models or build new ones from scratch, familiarity with the underlying math concepts is crucial to the process. ML theory: Knowing the basics of ML theory will give you a foundation to build on, and help you troubleshoot when something goes wrong. Build your own projects: Getting hands on experience with ML is the best way to put your knowledge to the test, so don’t be afraid to dive in early with a simple colab or tutorial to get some practice. More: https://www.tensorflow.org/resources/learn-ml
  10. TensorFlow: support for the whole developer ecosystem

    • What: Sequential API + built-in layers. Who: newbies, rookies and other early-entry specimens.
    • What: Functional API + built-in layers. Who: Padawans, who are able to build their own lightsabers (standard use cases).
    • What: Functional API + custom layers, metrics and losses. Who: disciplined and experienced Jedi Knights.
    • What: Subclassing, everything from scratch. Who: Jedi Masters, among the most accomplished and recognized polymaths in the Star Wars galaxy.
  11. Eager execution TensorFlow's eager execution is an imperative programming environment

    that evaluates operations immediately, without building graphs: • An intuitive interface—Structure your code naturally and use Python data structures. Quickly iterate on small models and small data. • Easier debugging—Call ops directly to inspect running models and test changes. Use standard Python debugging tools for immediate error reporting. • Natural control flow—Use Python control flow instead of graph control flow, simplifying the specification of dynamic models.
  12. The Functional API at a glance

    • An API to configure the connectivity of DAGs of layers
    • Targeted at users more than developers
    • Declarative configuration level: no logic
      ◦ All logic is contained inside of layers
    • All "debugging" is done statically at construction time; any model you can instantiate will run:
      ◦ You don't write any Python, so you don't write bugs
      ◦ "Debugging" == topology debugging (can be done visually)
    • Models are static data structures
      ◦ Inspectable: you can retrieve intermediate activations and use them in a new model
      ◦ Plottable: you can directly generate the graphs via "plot_model"
      ◦ Safely serializable
  13. TensorFlow Keras

    https://www.tensorflow.org/guide/keras

    • TensorFlow's implementation of the Keras API specification
    • Support for TensorFlow-specific functionality
      ◦ Eager execution
      ◦ Data Pipelines
      ◦ Estimator
    • Keras functional API
    • Build complex model topologies
      ◦ Multi-input models
      ◦ Multi-output models
      ◦ Models with shared layers (the same layer called several times)
      ◦ Models with non-sequential data flows (e.g. residual connections)
    • Training Callbacks

    import tensorflow as tf
    from tensorflow import keras
  14. TensorFlow Datasets

    https://www.tensorflow.org/datasets

    • Easy-to-use
    • High-performance input pipelines
    • Compatible with both TensorFlow Eager mode and Graph mode
    • Dictionaries mapping feature
    • Caching and prefetch
    • Integrated with Google Cloud Platform

    import tensorflow_datasets as tfds
    ds = tfds.load('mnist', split='train', shuffle_files=True)

    Source: https://github.com/tensorflow/datasets
    Catalogs: https://www.tensorflow.org/datasets/catalog/overview
  15. TensorFlow Hub

    https://tfhub.dev

    • Discover our hub: find out what you can do in TensorFlow Hub and how our platform works.
    • Meet our community: get to know other users, find new collaborators, or post questions and get answers.
    • Intro to Machine Learning: if you're new to machine learning, our introductory resources explain all the ins and outs.

    !pip install "tensorflow_hub>=0.6.0"

    import tensorflow_hub as hub
    embed = hub.KerasLayer("https://tfhub.dev/google/nnlm-en-dim128/2")
    embeddings = embed(["A long sentence.", "single-word", "http://example.com"])
    print(embeddings.shape)  # (3, 128)
  16. Model Garden for TensorFlow

    https://github.com/tensorflow/models/tree/master/official

    • State-of-the-art language understanding models: more members in the Transformer family
    • Classification models: EfficientNet, MnasNet and variants
    • Trainable on:
      ◦ Distributed training on multiple GPUs
      ◦ Distributed training on multiple GPU hosts
      ◦ Distributed training on Cloud TPUs

    !pip install tf-models-nightly
    !export PYTHONPATH=$PYTHONPATH:/path/to/models

    import os
    os.environ['PYTHONPATH'] += ":/path/to/models"
  17. Distributed training with TensorFlow

    https://www.tensorflow.org/guide/distributed_training

    Distribution Strategy (tf.distribute.Strategy) is a TensorFlow API to distribute training across multiple GPUs, multiple machines or TPUs. Using this API, you can distribute your existing models and training code with minimal code changes, eagerly or in a graph. The API can also be used for distributing evaluation and prediction on different platforms.

    Distribution Strategy integration by training API:

    Strategy                 Keras API       Custom training loop   Estimator API
    Mirrored                 Supported       Supported              Limited
    TPU                      Supported       Supported              No support
    Multi Worker Mirrored    Supported       Supported              Limited
    Central Storage          Experimental    Experimental           Limited
    Parameter Server         Post TF 2.4     Experimental           Limited
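As a conceptual illustration of the synchronous, mirrored style of distribution, the sketch below shards one global batch across replicas, lets each replica compute a gradient on its shard, and averages the results before a single shared update. It is plain Python on a toy one-parameter model, not the tf.distribute API; the function names, learning rate and data are assumptions chosen for illustration.

```python
def gradient(w, batch):
    """d/dw of the mean squared error for the model y = w * x on one batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def mirrored_step(w, global_batch, num_replicas, lr=0.01):
    """One synchronous step: shard the batch, then average per-replica gradients."""
    shard = len(global_batch) // num_replicas
    shards = [global_batch[i * shard:(i + 1) * shard] for i in range(num_replicas)]
    grads = [gradient(w, s) for s in shards]  # each replica works on its own shard
    avg_grad = sum(grads) / num_replicas      # stand-in for the all-reduce step
    return w - lr * avg_grad                  # every replica applies the same update

# Toy data following y = 3x (illustrative values only).
data = [(float(x), 3.0 * x) for x in range(1, 9)]
w = 0.0
for _ in range(200):
    w = mirrored_step(w, data, num_replicas=4)
```

With equally sized shards, the averaged gradient equals the gradient of the whole batch, so the replicas stay in lockstep and the result matches single-device training on the same batch.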
  18. Migration

    https://www.tensorflow.org/guide/migrate
    https://www.tensorflow.org/guide/upgrade#recommended_upgrade_process

    • It is still possible to run 1.X code, unmodified (except for contrib), in TensorFlow 2.0:

    import tensorflow.compat.v1 as tf
    tf.disable_v2_behavior()

    • Or make the code 2.0-native following the migration guide
    • Automatically upgrade code to TensorFlow 2:

    $ tf_upgrade_v2 \
        --intree my_project/ \
        --outtree my_project_v2/ \
        --reportfile report.txt

    • Follow the recommended upgrade process
  19. TensorFlow Trusted Partner Pilot Program Use Cases • Learn how

    TensorFlow solves real, everyday machine learning problems • An entire ecosystem to help you solve challenging, real-world problems with machine learning • Connect with a TensorFlow Trusted Partner https://www.tensorflow.org/about/case-studies
  20. Coral Edge TPU intro

    Inference accelerator:
    • Optimized for vision applications and convolutional neural networks
    • Runs concurrent state-of-the-art models on high-resolution video in real time (MobileNet V2 at 400 FPS)
    • Full support for quantized TensorFlow Lite models

    An individual Edge TPU can perform 4 trillion (fixed-point) operations per second (4 TOPS) using only 2 watts of power; in other words, you get 2 TOPS per watt.
  21. Coral Portfolio (1)

    • Dev Board: a single-board computer with a removable system-on-module (SoM) featuring the Edge TPU. Availability: now. Price: $129.99
    • USB Accelerator: a USB accessory featuring the Edge TPU that brings ML inferencing to existing systems. Availability: now. Price: $59.99
    • Mini PCIe Accelerator: integrate the Edge TPU into legacy and new systems using a Mini PCIe interface. Availability: now. Price: $24.99
    • M.2 Accelerator A+E key: integrate the Edge TPU into legacy and new systems using an M.2 A+E key interface. Availability: now. Price: $24.99
  22. Coral Portfolio (2)

    • Dev Board Mini: a single-board computer with a removable system-on-module (SoM) featuring the Edge TPU. Availability: coming soon. Price: $99.99
    • M.2 Accelerator B+M key: integrate the Edge TPU into legacy and new systems using an M.2 B+M key interface. Availability: now. Price: $24.99
    • Accelerator module: a solderable multi-chip module including the Edge TPU. Availability: coming soon. Price: $19.99
    • System on Module (SoM): a fully-integrated system for accelerated ML applications in a 40mm x 48mm pluggable module. Availability: now. Price: $114.99
  23. Features Dev board • Edge TPU System-on-Module (SoM) ◦

    NXP i.MX 8M SoC (Quad-core Arm Cortex-A53, plus Cortex-M4F) ◦ Google Edge TPU ML accelerator coprocessor ◦ Cryptographic coprocessor ◦ Wi-Fi 2x2 MIMO (802.11b/g/n/ac 2.4/5 GHz) ◦ Bluetooth 4.2 ◦ 8 GB eMMC ◦ 1 GB LPDDR4 • USB connections ◦ USB Type-C power port (5 V DC) ◦ USB 3.0 Type-C OTG port ◦ USB 3.0 Type-A host port ◦ USB 2.0 Micro-B serial console port • Audio connections ◦ 3.5 mm audio jack (CTIA compliant) ◦ Digital PDM microphone (x2) ◦ 2.54 mm 4-pin terminal for stereo speakers • Video connections ◦ HDMI 2.0a (full size) ◦ 39-pin FFC connector for MIPI DSI display (4-lane) ◦ 24-pin FFC connector for MIPI CSI-2 camera (4-lane) • MicroSD card slot • Gigabit Ethernet port • 40-pin GPIO expansion header • Supports Mendel Linux (derivative of Debian)
  24. Coral SoM block diagram • CPU: Quad symmetric Cortex-A53

    processors, supports 64-bit Armv8-A architecture. Plus Arm Cortex-M4 core • GPU: 4 shaders, 267 million triangles/sec, 1.6 Gigapixel/sec, 32 GFLOPs 32-bit or 64 GFLOPs 16-bit. • Video: 4Kp60 HEVC/H.265 main, 4Kp60 VP9 and 4Kp30 AVC/H.264. 1080p60 MPEG-2, MPEG-4p2, VC-1, H.263, etc. • Memory: 1GB LPDDR4 SDRAM, 1600MHz maximum DDR clock. 8GB NAND eMMC flash memory, 8-bits MMC mode • Edge TPU interfaces with SoM via PCIe and I2C/GPIO to interface the iMX8MQ SOC • Microchip ATECC608A cryptographic coprocessor, with asymmetric (public/private) key cryptographic signature
  25. Software toolchain Mendel OS A fork of the Debian OS

    to power our Intelligence Boards, and a C++ & Python SDK APIs to low level connections Edge TPU Compiler Converts TF graphs to run on targeted chipsets Companion Software Abstracts away traditional board management/coding in a high-level program Input Model (TFLite) Compiler C++ & Python SDK User Apps TFLite C++ API Mendel OS (Debian Linux)
  26. Mendel Development Tool (MDT) Similar to the Android-standard ADB tool

    "Porcelain" wrapper based around industry standard protocols such as SSH, mDNS, and HTTP Handles device discovery, shell, and key management Cross-platform (Mac, Windows, Linux) Open source, Apache licensed Available as a Debian package via Google-hosted APT repositories Also available via the Python standard pip installation tool $ mdt devices $ mdt shell $ mdt push $ mdt pull $ mdt install
  27. Edge TPU performance Embedded CPU: Quad-core Cortex-A53 @ 1.5GHz; Dev

    Board: Quad-core Cortex-A53 @ 1.5GHz + Edge TPU Source: https://coral.ai/technology
  28. Edge TPU Compiler

    • Compiles a TensorFlow Lite model (.tflite file) into a file that's compatible with the Edge TPU.
    • Runs on any modern Debian-based Linux system; it does not run on the Coral device itself or on macOS.
  29. Edge TPU Compiler

    To run a model on the Coral Edge TPU, two components are needed:
    • A model quantized for UINT8 (restricted to operations that support UINT8)
    • The compiled version of the quantized model

    edgetpu_compiler [options] model...

    Source: https://coral.ai/docs/edgetpu/models-intro/
  30. DEMO Inception V2 with/without compilation

    Inception V2 model with quantization and compiled (optimized for TPU) versus Inception V2 model with quantization but not compiled.

    mendel@fun-calf:~/inception_v2$ ./command.sh
    Using Inception V2 model with quantization and compiled (optimized for TPU; downloaded from https://coral.ai/models)
    Detects 1000 type of objects; dataset ImageNet; Input size: 224x224
    ---------------------------
    macaw
    Score : 0.9921875
    Inference time: 36.04 ms (27.75 fps)
    *****************************
    Using Inception V2 model with quantization but not compiled
    ---------------------------
    macaw
    Score : 0.9921875
    Inference time: 612.56 ms (1.63 fps)
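The frame rates in the demo output follow directly from the inference times, and the same numbers give the speedup from compiling for the Edge TPU, which works out to roughly 17 times. A small sketch of that arithmetic, using only the two timings printed above (the helper name is illustrative):

```python
def frames_per_second(inference_ms):
    """Throughput if inferences run back to back, one at a time."""
    return 1000.0 / inference_ms

compiled_ms = 36.04     # quantized and compiled for the Edge TPU
uncompiled_ms = 612.56  # quantized only

compiled_fps = frames_per_second(compiled_ms)
uncompiled_fps = frames_per_second(uncompiled_ms)
speedup = uncompiled_ms / compiled_ms

print(f"{compiled_fps:.2f} fps vs {uncompiled_fps:.2f} fps, {speedup:.1f}x faster")
```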
  31. Coral Edge TPU pre-trained models TF Lite models already pre-compiled

    to run on the Edge TPU: image classification, object detection, semantic segmentation, on-device retraining Source: https://coral.ai/models/
  32. Co-compiling multiple models Co-compilation to run multiple models on the

    same Edge TPU: caches their parameter data together, eliminating the need to clear the cache each time you run a different model. Be careful if using co-compilation in combination with multiple Edge TPUs.
  33. Python API • ClassificationEngine: Performs image classification. Create an instance

    by specifying a model, and then pass an image (such as a JPEG) to ClassifyWithImage() and it returns a list of labels and scores. • DetectionEngine: Performs object detection. Create an instance by specifying a model, and then pass an image (such as a JPEG) to DetectWithImage() and it returns a list of DetectionCandidate objects, each of which contains a label, a score, and the coordinates of the object. • ImprintingEngine: This implements a transfer-learning technique called imprinting that does not require backward propagation, allowing you to perform model retraining that's accelerated on the Edge TPU
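To show how detection results of this shape are typically consumed, here is a hedged sketch of the post-processing an application might do: keep candidates above a score threshold, take the top few, and scale relative coordinates to pixels. The Candidate class is a hypothetical stand-in with the fields the slide describes (label, score, coordinates); it is not the Edge TPU library's DetectionCandidate, and all values are illustrative.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Candidate:
    """Hypothetical stand-in for one detection: label id, score and box."""
    label_id: int
    score: float
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2), relative 0..1

def select(candidates: List[Candidate], threshold: float, top_k: int) -> List[Candidate]:
    """Drop low-score candidates, then keep the top_k highest scores."""
    kept = [c for c in candidates if c.score >= threshold]
    kept.sort(key=lambda c: c.score, reverse=True)
    return kept[:top_k]

def to_pixels(box, width, height):
    """Scale a relative box to integer pixel coordinates."""
    x1, y1, x2, y2 = box
    return (int(round(x1 * width)), int(round(y1 * height)),
            int(round(x2 * width)), int(round(y2 * height)))

# Illustrative detections for a 640x480 image.
raw = [
    Candidate(0, 0.91, (0.1, 0.2, 0.5, 0.8)),
    Candidate(1, 0.03, (0.0, 0.0, 1.0, 1.0)),
    Candidate(2, 0.40, (0.25, 0.25, 0.75, 0.75)),
    Candidate(0, 0.12, (0.6, 0.6, 0.9, 0.9)),
]
best = select(raw, threshold=0.05, top_k=2)
pixel_boxes = [to_pixels(c.box, 640, 480) for c in best]
```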
  34. DEMO Detection Engine (Python API)

    from edgetpu.detection.engine import DetectionEngine
    from PIL import Image, ImageDraw

    # `args` and ReadLabelFile() are defined elsewhere in the demo script.

    # Initialize engine.
    engine = DetectionEngine(args.model)
    labels = ReadLabelFile(args.label) if args.label else None

    # Open image.
    img = Image.open(args.input)
    draw = ImageDraw.Draw(img)

    # Run inference.
    ans = engine.DetectWithImage(img, threshold=0.05, keep_aspect_ratio=True,
                                 relative_coord=False, top_k=10)
    for obj in ans:
        box = obj.bounding_box.flatten().tolist()
        # Draw a rectangle.
        draw.rectangle(box, outline='red')
  35. References Coral Edge TPU

    [1] https://coral.ai/docs
    [2] Source code for the Edge TPU: https://github.com/google-coral/edgetpu
    [3] Blog: https://blog.tensorflow.org/2019/03/build-ai-that-works-offline-with-coral.html
    [4] Codelab: https://codelabs.developers.google.com/codelabs/edgetpu-classifier/index.html
    [5] Jouppi, N. P., Young, C., Patil, N., Patterson, D., Agrawal, G., Bajwa, R., … Yoon, D. H. (2017). In-Datacenter Performance Analysis of a Tensor Processing Unit. Retrieved from https://arxiv.org/abs/1704.04760