GDG Lahore
September 28, 2023

What's New in TensorFlow and Keras (By: Mamoona Riaz) - Keras Community Day 2023

Talk by Mamoona Riaz (https://www.linkedin.com/in/mamoona-riaz-a961b41b2/) at Keras Community Day 2023 by GDG Lahore.

Transcript

  1. Section 01: Building for a changing landscape.

    • KerasCV & KerasNLP: Create state-of-the-art models, in just a few lines of code.
    • DTensor: Unlock the power of data and model parallelism together, so you can scale up with confidence.
    • JAX2TF: Cross-framework compatibility that’s simple and easy.
    • TF Quantization API (preview): Flexible, fine-grained control over model size like never before for ML development that’s cheaper and faster.
  2. Building for a changing landscape.

    • 340M: BERT-Large parameters (2018)
    • 540B: PaLM-540 parameters (2022)
    • 355M petaFLOPs: training compute for Google LaMDA
  3. Building for a changing landscape.

    • 340M: BERT-Large parameters (2018)
    • 540B: PaLM-540 parameters (2022)
    • 355M petaFLOPs: training compute for Google LaMDA
    • 5B+: devices running ML in 100k+ apps
  4. Building for a changing landscape.

    • 340M: BERT-Large parameters (2018)
    • 540B: PaLM-540 parameters (2022)
    • 355M petaFLOPs: training compute for Google LaMDA
    • 5B+: devices running ML in 100k+ apps
    • 16M+: TensorFlow monthly downloads
  5. Section 02: Applied ML with KerasCV & KerasNLP

    The cutting edge of machine learning, right at your fingertips.
  6. Section 02: What can you do with KerasCV and KerasNLP?

    KerasCV and KerasNLP: libraries for state-of-the-art computer vision and natural language processing. From idea to implementation in just a few lines of code!

    • Image Classification
    • Object Detection
    • Data Augmentation
    • Text Classification
    • Text Generation
    • Image Generation
  7. Section 02: Why KerasCV and KerasNLP?

    • SOTA models, written in minutes: BERT, GPT-2, Stable Diffusion, ResNet, RetinaNet, etc.
    • Integrated with the TF ecosystem: TFLite, DTensor, XLA, TPUs, and beyond.
    • Easy to get started: readable and modular design with great documentation.
  8. Section 02: Here’s a quick look!

    Text to image:

        from keras_cv.models import StableDiffusion

        model = StableDiffusion(
            img_width=512,
            img_height=512,
        )
        images = model.text_to_image(
            "photograph of an astronaut "
            "riding a horse",
            batch_size=3,
        )

    Classify text:

        from keras_nlp.models import BertClassifier

        model = BertClassifier.from_preset(
            "bert_base_en_uncased",
            num_classes=2,
        )
        model.compile(...)
        model.fit(movie_review_dataset)
        model.predict([
            "What an amazing movie!",
        ])

    …and much more! Want to learn more? Take a deep dive in our full talk on KerasCV/NLP!
  9. DTensor

    • Data parallelism: Traditionally, ML developers have scaled up models through data parallelism, which splits up your data and feeds it to horizontally scaled model instances. This requires that the model fit within a single hardware device.
    • Across devices: Once a model no longer fits on one device, developers need to be able to scale it across multiple hardware devices.
  10. Section 03: DTensor

    Models are getting bigger and bigger. And as model size grows, so does the complexity of training and serving. That’s where DTensor can help!

    • Flexible: One toolkit, for both data and model parallelism.
    • Efficient: Safely split work across multiple machines.
    • Device agnostic: An API that abstracts across TPU/GPU/CPU.
  11. Section 03: Data parallelism, with DTensor.

    Split up your data and feed it to horizontally scaled model instances.
    [Diagram: one batch and four devices, Device:0 to Device:3.]
  12. Section 03: Data parallelism, with DTensor.

    Split up your data and feed it to horizontally scaled model instances.
    [Diagram: the batch is split into data shards 0 to 3 across Device:0 to Device:3.]
  13. Section 03: Data parallelism, with DTensor.

    Split up your data and feed it to horizontally scaled model instances.
    [Diagram: Device:0 to Device:3 hold data shards 0 to 3 and model replicas 0 to 3.]
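
The data-parallel picture in slides 11 to 13 can be sketched in plain Python. This is only an illustration under stated assumptions, not DTensor code: the toy model `y = w * x`, the helper names (`shard_batch`, `local_gradient`, `data_parallel_step`), and the learning rate are all hypothetical. Each "device" holds a full copy of the weight, computes a gradient on its own data shard, and the gradients are then averaged.

```python
def shard_batch(batch, num_devices):
    """Split a batch into equal, contiguous shards, one per device."""
    if len(batch) % num_devices != 0:
        raise ValueError("batch size must be divisible by num_devices")
    shard_size = len(batch) // num_devices
    return [batch[i * shard_size:(i + 1) * shard_size] for i in range(num_devices)]


def local_gradient(w, shard):
    """Gradient of the mean squared error of y = w * x over one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def data_parallel_step(w, batch, num_devices, lr):
    """One SGD step where every replica sees the full weight but only its shard."""
    shards = shard_batch(batch, num_devices)
    grads = [local_gradient(w, shard) for shard in shards]  # one gradient per replica
    mean_grad = sum(grads) / num_devices  # stands in for an all-reduce (mean)
    return w - lr * mean_grad


# Toy data generated from y = 3x (hypothetical example values).
batch = [(float(x), 3.0 * x) for x in range(1, 9)]
w_parallel = data_parallel_step(0.0, batch, num_devices=4, lr=0.01)
w_single = 0.0 - 0.01 * local_gradient(0.0, batch)
```

Because the shards are equally sized, averaging the per-shard gradients gives the same update as computing the gradient on the whole batch, so `w_parallel` and `w_single` agree up to floating-point rounding.
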
  14. Section 03: Model parallelism, with DTensor.

    Split up your model across devices when it no longer fits on one.
    [Diagram: one model and four devices, Device:0 to Device:3.]
  15. Section 03: Model parallelism, with DTensor.

    Split up your model across devices when it no longer fits on one.
    [Diagram: the model is split into model shards 0 to 3 across Device:0 to Device:3.]
  16. Section 03: Model parallelism, with DTensor.

    Split up your model across devices when it no longer fits on one.
    [Diagram: Device:0 to Device:3 hold model shards 0 to 3 and data replicas 0 to 3.]
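
Model parallelism, as drawn in slides 14 to 16, can be sketched the same way. The following is a minimal pure-Python illustration, not DTensor code: the column-wise split and the helper names (`matvec`, `shard_columns`, `model_parallel_matvec`) are assumptions chosen for clarity. Every "device" receives the full input (a data replica) but only a slice of the weight matrix (a model shard), and the partial outputs are concatenated.

```python
def matvec(x, w):
    """Compute x @ w for a vector x and a matrix w stored as a list of rows."""
    cols = len(w[0])
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(cols)]


def shard_columns(w, num_devices):
    """Split the columns of w into equal, contiguous shards, one per device."""
    cols = len(w[0])
    if cols % num_devices != 0:
        raise ValueError("column count must be divisible by num_devices")
    width = cols // num_devices
    return [[row[d * width:(d + 1) * width] for row in w] for d in range(num_devices)]


def model_parallel_matvec(x, w, num_devices):
    """Each device multiplies the full input by its own weight shard."""
    shards = shard_columns(w, num_devices)
    partials = [matvec(x, shard) for shard in shards]  # x is replicated to every device
    return [value for part in partials for value in part]  # stands in for an all-gather


x = [1.0, 2.0]
w = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]
y_sharded = model_parallel_matvec(x, w, num_devices=4)
y_reference = matvec(x, w)
```

No single device ever holds the whole matrix `w`, which is the point of sharding the model; the concatenated result still matches the unsharded product.
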
  17. Section 03: All together now! Data and model parallelism, with DTensor.

    [Diagram: data and model alongside four devices, Device:0 to Device:3.]
  18. Section 03: All together now! Data and model parallelism, with DTensor.

    [Diagram: the four devices hold the model pieces labeled R0 S0, R1 S0, R1 S1, and R0 S1.]
  19. Section 03: All together now! Data and model parallelism, with DTensor.

    [Diagram: the four devices hold the model pieces labeled R0 S0, R1 S0, R1 S1, and R0 S1, and the data pieces labeled R0 S0, R0 S1, R1 S1, and R1 S0.]
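
Combining both kinds of parallelism amounts to arranging the devices in a grid with one named axis per kind. The sketch below is plain Python and only mimics the idea: the row-major ordering of device ids and the helper names (`build_mesh`, `shards_for_device`) are assumptions, not a description of how DTensor actually places devices.

```python
from itertools import product


def build_mesh(mesh_dims):
    """Assign consecutive device ids to coordinates of a named mesh (row-major)."""
    names = [name for name, _ in mesh_dims]
    sizes = [size for _, size in mesh_dims]
    coords = product(*(range(size) for size in sizes))
    return {device_id: dict(zip(names, coord)) for device_id, coord in enumerate(coords)}


def shards_for_device(coord):
    """Data is split along "batch"; model weights are split along "model"."""
    return {"data_shard": coord["batch"], "model_shard": coord["model"]}


# Four devices arranged as a 2 x 2 mesh, as in the four-device diagrams.
mesh = build_mesh([("batch", 2), ("model", 2)])
assignment = {device: shards_for_device(coord) for device, coord in mesh.items()}

# Eight devices arranged as a 2 x 4 mesh.
big_mesh = build_mesh([("batch", 2), ("model", 4)])
```

Devices that share a `"batch"` coordinate see the same data shard but different model shards, and devices that share a `"model"` coordinate hold the same model shard but see different data. The 2 x 4 shape matches the `mesh_dims` used in the DTensor training slide, which therefore implies eight devices.
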
  20. OPT training setup via KerasNLP, before DTensor:

        # OPT training setup via KerasNLP, before DTensor
        import keras_nlp

        opt_lm = keras_nlp.models.OPTCausalLM.from_preset("opt_6.7b_en")
        opt_lm.compile(...)
        opt_lm.fit(wiki_text_dataset)
  21. DTensor-enabled training:

        # DTensor-enabled training!
        import keras_nlp
        from tensorflow.experimental import dtensor

        # A 2 x 4 mesh: 2-way data parallelism, 4-way model parallelism.
        mesh_dims = [("batch", 2), ("model", 4)]
        mesh = dtensor.create_distributed_mesh(mesh_dims, device_type="GPU")
        dtensor.initialize_accelerator_system("GPU")

        layout_map = keras_nlp.models.OPTCausalLM.create_layout_map(mesh)
        with layout_map.scope():
            opt_lm = keras_nlp.models.OPTCausalLM.from_preset("opt_6.7b_en")
        opt_lm.compile(...)
        opt_lm.fit(wiki_text_dataset)
  22. Section 03: Built for today, ready for tomorrow

    DTensor + tf.distribute: Unified Parallelism

    • Performance: Already in line with NVIDIA’s Megatron for transformer training, Mesh TensorFlow, and JAX. Even further increases coming soon!
    • What’s next? Complete integration with Keras and tf.distribute. One strategy for TPU/GPU/CPU. Automatic determination of layouts. Pipelining support.
    • Learn more! https://www.tensorflow.org/guide/dtensor_overview
    • For more details on Keras integrations, check out the guides at keras.io.
  23. Section 04: From Research to Production with JAX2TF

    Bring JAX models into the TensorFlow ecosystem with one line of code.
  24. Section 04: What’s JAX?

    An open-source framework for high-performance ML research.

    • 15+ modular, specialized libraries built on JAX’s core
    • >200% PyPI download growth (3 months)

    Bringing JAX’s development power into production has been hard, until now.
  25. Section 04: What’s JAX2TF?

    A simple, lightweight API to give JAX models access to the full strength of the TensorFlow ecosystem.

    From Research to Production:
    • Model Fusion
    • Serving (server or on-device)
    • Fine-tuning
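  26. Save seamlessly in the form of a TensorFlow SavedModel:

        import tensorflow as tf
        from jax.experimental import jax2tf

        # Inside the user-defined TFModel wrapper (excerpt):
        self.loss_fn = jax2tf.convert(model.loss, ...)
        self.predict_fn = jax2tf.convert(model.predict, ...)

        # Save seamlessly in the form of a TensorFlow SavedModel
        model = JAXModel()
        state = model.init(...)
        tf_model = TFModel(state, model)
        tf.saved_model.save(tf_model, "./")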
  27. Section 04: Flexibility and fidelity.

    Whether you train just in JAX, or use JAX2TF to fine-tune and finish training in TF, it’s just as accurate, and converges just as fast.
  28. Section 05: Nimble Machine Learning: Quantization in TensorFlow

    Making training and deploying models faster, easier, and cheaper.
  29. Section 05: Introducing the TF Quantization API

    Adjust model size, easily. Smaller models are faster to run and require fewer resources. Reduces memory, latency, compute and battery costs.
  30. Section 05: Introducing the TF Quantization API

    Adjust model size, easily. Smaller models are faster to run and require fewer resources. Reduces memory, latency, compute and battery costs.

    • Flexible: For mobile, server, microcontrollers, and more.
  31. Section 05: Introducing the TF Quantization API

    Adjust model size, easily. Smaller models are faster to run and require fewer resources. Reduces memory, latency, compute and battery costs.

    • Flexible: For mobile, server, microcontrollers, and more.
    • Easy: Tools that just work, without model rewrites.
  32. Section 05: Introducing the TF Quantization API

    Adjust model size, easily. Smaller models are faster to run and require fewer resources. Reduces memory, latency, compute and battery costs.

    Fundamentally better than before!

    • Flexible: For mobile, server, microcontrollers, and more.
    • Easy: Tools that just work, without model rewrites.
    • Efficient: Reduced memory, latency, compute and battery costs.
  33. Creating a quantization-aware model:

        import tensorflow as tf

        # This is all it takes to create a quantization-aware model!
        tf.quantization.apply_quantization_on_model(model, config_map, …)

        # From here, you can train and save just as always.
        model.fit()
        model.save()

        # You can also export to TFLite, without any changes!
        converter = tf.lite.TFLiteConverter.from_keras_model(model)
        converter.optimizations = [tf.lite.Optimize.DEFAULT]
        tflite_model = converter.convert()
  34. Quantization

    • Post-Training Quantization (PTQ): Convert to a quantized model after training. This is as simple as it gets and most readily accessible, but there can be a small quality drop.
    • Quantization-Aware Training (QAT): Simulate quantization during just the forward pass, providing for maximal flexibility with a minimal quality tradeoff.
    • Quantized Training: Quantize all computations while training. This is still nascent, and needs a lot more testing, but is a powerful tool we want to make sure TensorFlow users have access to.
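
To make the first two approaches on this slide concrete, here is a minimal pure-Python sketch of int8 affine quantization. It does not use the TF Quantization API; the helper names (`quant_params`, `quantize`, `dequantize`, `fake_quantize`) and the asymmetric min/max scheme are assumptions made for illustration. Quantizing trained values once corresponds roughly to PTQ, while the quantize-then-dequantize round trip in `fake_quantize` is the kind of simulation a QAT forward pass applies.

```python
def quant_params(values, num_bits=8):
    """Derive scale and zero point for asymmetric affine quantization."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    # The representable range must include 0.0 so that zero maps exactly.
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 for all-zero input
    zero_point = round(qmin - lo / scale)
    return scale, zero_point, qmin, qmax


def quantize(values, scale, zero_point, qmin, qmax):
    """Map floats to integers in [qmin, qmax]."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize(quantized, scale, zero_point):
    """Map integers back to approximate floats."""
    return [(q - zero_point) * scale for q in quantized]


def fake_quantize(values, num_bits=8):
    """QAT-style forward pass: quantize, then immediately dequantize."""
    scale, zero_point, qmin, qmax = quant_params(values, num_bits)
    return dequantize(quantize(values, scale, zero_point, qmin, qmax), scale, zero_point)


# Hypothetical trained weights.
weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
scale, zero_point, qmin, qmax = quant_params(weights)
q_weights = quantize(weights, scale, zero_point, qmin, qmax)  # PTQ-style conversion
restored = dequantize(q_weights, scale, zero_point)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

The round-trip error stays within one quantization step (`scale`), which is the "small quality drop" PTQ accepts. In QAT the model trains against `fake_quantize` outputs, so it can adapt to that error before conversion.
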
  35. Section 05: Performance and quality

    16.56X: P7 EdgeTPU quantized serving throughput, versus baseline floating point MobileNetV2.

    Model: MobileNetV2
    Device: Pixel 7
    Serving throughput vs. float32 CPU baseline:
    • CPU with XNNPack (1 thread): 2.24x
    • Edge-TPU: 16.56x
  36. Section 05: Performance and quality.

    16.56X: P7 EdgeTPU quantized serving throughput, versus baseline floating point MobileNetV2.

    Model: MobileNetV2
    Device: Pixel 7
    Serving throughput vs. float32 CPU baseline:
    • CPU with XNNPack (1 thread): 2.24x
    • Edge-TPU: 16.56x

    All without noticeable detriment to accuracy.
    • float32: 73%
    • int8: Still 73%!
  37. Campus Ambassador Program 2023-2024

    Program details at a glance:
    • Title: Campus Ambassador
    • Duration: 1 Academic Year
    • Eligibility Criteria: BSCS/BSSE/BSIT students in 3rd - 6th Semester only
    • Monthly Stipend

    Registration deadline: 1st October, 2023