NDC 2024: WebAssembly++ - Build AI-driven applications for browser, edge, and server(less)

WebAssembly (Wasm) is changing how AI models work on different platforms. Christian is exploring how Wasm lets us run AI in web browsers, utilizing tools like WebNN for neural network APIs, WebGPU for accelerated calculations, and ONNX Runtime for cross-platform model inferencing. Discover transformative libraries such as transformers.js for executing pre-trained models for Natural Language Processing, Computer Vision, Audio, or even Multimodal scenarios. You'll see how Wasm is used not just in browsers but also in edge and server computing. Technologies like WASI-NN bring neural network capabilities to the WebAssembly System Interface (WASI), and WasmEdge offers a high-performance runtime for running AI code on edge devices. Last but not least, Serverless AI approaches reshape how we deploy AI models with minimal overhead and maximum efficiency. Join Christian to unlock the potential of AI models - like Whisper, Mistral, Llama2, or Llava - in your applications with the power and flexibility of the WebAssembly ecosystem. There is a world beyond OpenAI & GPT.

Christian Weyer

June 14, 2024

Transcript

  1. WebAssembly++ - Build AI-driven applications for browser, edge, and server(less)

    Christian Weyer, Co-Founder & CTO @ Thinktecture AG
    § Technology catalyst
    § AI-powered solutions
    § Pragmatic end-to-end architectures
    § Microsoft Regional Director
    § Microsoft MVP for Developer Technologies & Azure ASPInsider, AzureInsider
    § Google GDE for Web Technologies
    @christianweyer
    https://www.thinktecture.com
  2. Our journey today

    § AI-all-the-things?
    § WebAssembly
    § AI in browser
    § WASI
    § AI on edge / server(less)
    § Recap
  3. A.I.

    https://archive.org/details/Artificial_Intelligence_Projects_for_the_Commodore_64_1985_TAB_Books
  4. AI space

    § Data Science
    § Artificial Intelligence
    § Machine Learning: unsupervised, supervised, reinforcement learning
    § Deep Learning: ANN, CNN, RNN etc.
    § NLP (Natural Language Processing)
    § Generative AI: GAN, VAE, Transformers etc.
    § Image / Video Generation: GAN, VAE
    § Large Language Models: Transformers
  5. AI scenarios & use cases - Ubiquitousness

    Run AI models and algorithms anywhere (offline, privacy)
    Write code once, run it anywhere
    Browser
    § Natural Language Processing: text classification, named entity recognition, question answering, summarization, translation, and text generation
    § Computer Vision: image classification, object detection, and segmentation
    § Audio: automatic speech recognition and audio classification
    § Multimodal: zero-shot image classification
    Beyond browser
    § Edge computing: deploy ML / AI models on edge devices with efficient resource utilization
    § IoT applications: integrate ML capabilities in IoT devices - local data processing, real-time decision-making
    § Server(less): implement ML inference in serverless environments for scalable and efficient processing
  6. What exactly is WebAssembly?

    WebAssembly (Wasm) is a secure, fast, portable stack-based execution environment (aka virtual machine) for running applications by executing low-level sandboxed bytecode
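
To make "stack-based bytecode" concrete, here is a minimal sketch that is not taken from the talk's demos. It hand-assembles a tiny Wasm binary exporting an `add` function and runs it through the standard WebAssembly JavaScript API. The names `addModuleBytes`, `addModule`, `addInstance`, and `add` are illustrative choices, not anything from the slides.

```typescript
// Hand-assembled Wasm binary for a single exported function:
//   (func (export "add") (param i32 i32) (result i32) local.get 0 local.get 1 i32.add)
const addModuleBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // magic: "\0asm"
  0x01, 0x00, 0x00, 0x00, // binary format version 1
  // type section: one signature, (i32, i32) -> i32
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f,
  // function section: one function using signature 0
  0x03, 0x02, 0x01, 0x00,
  // export section: export function 0 under the name "add"
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00,
  // code section: one body of 7 bytes with no extra locals
  0x0a, 0x09, 0x01, 0x07, 0x00,
  0x20, 0x00, // local.get 0: push the first argument onto the stack
  0x20, 0x01, // local.get 1: push the second argument onto the stack
  0x6a,       // i32.add: pop two values, push their sum
  0x0b,       // end
]);

// Synchronous compilation is fine for a module this small.
const addModule = new WebAssembly.Module(addModuleBytes);
const addInstance = new WebAssembly.Instance(addModule, {});
const add = addInstance.exports.add as (a: number, b: number) => number;

console.log(add(2, 3)); // 5
```

The three instructions in the function body are the whole program: operands are pushed, the operator pops them and pushes the result, and whatever is left on the stack is the return value. For larger modules the asynchronous `WebAssembly.instantiate` is usually the better fit.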
  7. Why do we need WebAssembly?

    Everything, everywhere, by everyone – securely
    § Origins: Web
    § Current developments: Web, Server, Cloud, Extensibility
    § Future: Web, Server, Cloud, Extensibility, Microcontroller, Bare metal, Edge
  8. DEMO: Hello Wasm in browser

    WAT, JS
  9. https://blog.scottlogic.com/2023/10/18/the-state-of-webassembly-2023.html
  10. DEMO: Photoshop in browser

    Migrating code to the web with Wasm
  11. WebGPU: What is good for graphics, is good for AI

    § API that exposes capabilities of GPU hardware for the web
    § Designed from the ground up to efficiently map to native GPU APIs
    § Not related to WebGL and does not explicitly target OpenGL ES
    § Enables web developers to use underlying system's GPU to carry out high-performance computations and draw complex graphics that can be rendered in the browser
    § … or use high-performance computations for AI use cases
  12. DEMO: Segment anything - in browser

    Wasm vs. WebGPU
  13. WebNN: Neural networks in browser

    § Web Neural Network API
    § Web-friendly hardware-agnostic abstraction layer for ML & DL
    § Use ML capabilities of OS & underlying hardware platforms without being tied to platform-specific capabilities
    § Addresses requirements of key ML JavaScript frameworks
    § Allows web developers familiar with ML domain to write custom code without the help of libraries
  14. DEMO: Facial landmark detection - in browser

    Wasm vs. WebNN
  15. ONNX: Open format & runtime for inferencing

    § Open Neural Network Exchange (ONNX)
      § Open-source ecosystem for algorithms, tools, formats
    § ONNX Runtime
      § Cross-platform machine-learning model accelerator
      § Flexible interface to integrate hardware-specific libraries
      § Can be used with models from PyTorch, TensorFlow/Keras, TFLite, scikit-learn, and other frameworks
    § ONNX Runtime Web
      § Builds on Wasm
      § Optionally uses WebGPU, WebNN
      § Inference in browser
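
The "builds on Wasm, optionally uses WebGPU, WebNN" idea can be sketched as a feature-detection step. This is an illustrative sketch under assumptions, not the talk's code: `NavigatorLike` and `preferredProviders` are hypothetical names, and the only browser facts relied on are that WebGPU is exposed as `navigator.gpu` and WebNN as `navigator.ml` where supported.

```typescript
// Backend names as used for ONNX Runtime Web execution providers.
type ExecutionProvider = "webgpu" | "webnn" | "wasm";

// Hypothetical structural view of `navigator`, so the function can be
// exercised outside a browser as well.
interface NavigatorLike {
  gpu?: unknown; // present when the browser exposes WebGPU
  ml?: unknown;  // present when the browser exposes WebNN
}

// Returns backends in order of preference, always ending with the Wasm fallback.
function preferredProviders(nav: NavigatorLike | undefined): ExecutionProvider[] {
  const providers: ExecutionProvider[] = [];
  if (nav?.gpu) providers.push("webgpu");
  if (nav?.ml) providers.push("webnn");
  providers.push("wasm"); // CPU execution via WebAssembly
  return providers;
}

// Outside a browser nothing is detected, so only the fallback remains.
console.log(preferredProviders(undefined).join(", ")); // wasm
```

In a browser, the resulting list could be passed as the `executionProviders` session option when creating an ONNX Runtime Web `InferenceSession`. Whether a given backend actually works still depends on the library version, the bundle that was imported, and the operators the model uses.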
  16. AI in browser: Putting it all together

    § Web App: ONNX models, TensorFlow models, other models; JavaScript ML frameworks (ONNX Web, TensorFlow.js, …)
    § Browser: WebGPU, WebNN, WebAssembly
    § OS: ML Compute (macOS/iOS), DirectML (Windows), NN API (Android), OpenVINO (Linux)
    § Hardware: NPU, GPU, CPU
  17. Transformers.js: State of the art ML applications in the browser

    § Hiding the nitty gritty details of Wasm, WebGPU, WebNN, ONNX
    § “Don’t worry – be happy!”
    § Run HuggingFace Transformers directly in browser
    § Uses ONNX Runtime Web to run models
    § All the scenarios we want
      § Natural language processing
      § Computer vision
      § Audio
      § Multimodal
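
A rough sketch of what the sentiment analysis scenario could look like with Transformers.js follows. It is not the talk's demo code. The package name in `TRANSFORMERS_PACKAGE` is an assumption (newer releases are published under a different name), `Classification`, `topLabel`, and `classifySentiment` are hypothetical helpers, and the first call downloads a default model, so it needs network access.

```typescript
// Shape of one entry in a text-classification result.
type Classification = { label: string; score: number };

// Assumption: the package name used by Transformers.js v2; adjust for your version.
const TRANSFORMERS_PACKAGE: string = "@xenova/transformers";

// Pure helper: pick the highest-scoring entry from a result list.
function topLabel(results: Classification[]): Classification {
  if (results.length === 0) throw new Error("no classification results");
  return results.reduce((best, r) => (r.score > best.score ? r : best));
}

// Loads the library lazily, builds a sentiment-analysis pipeline, and classifies one text.
async function classifySentiment(text: string): Promise<Classification> {
  const { pipeline } = await import(TRANSFORMERS_PACKAGE);
  const classifier = await pipeline("sentiment-analysis");
  const results: Classification[] = await classifier(text);
  return topLabel(results);
}
```

A call such as `classifySentiment("I love WebAssembly!")` would resolve to a label and a confidence score. Creating the pipeline once and reusing it across calls avoids reloading the model each time.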
  18. DEMO: Sentiment analysis – in browser

    Transformers.js, ONNX Runtime Web
  19. DEMO: Running Whisper - in browser

    Transformers.js, ONNX Runtime Web, WebGPU
  20. DEMO: Running Phi-3 mini SLM - in browser

    Transformers.js, ONNX Runtime Web, WebGPU
  21. WebAssembly outside the browser – ubiquitous compute

    § Size
      § Compared to other distributables (containers | VMs), Wasm is super small
    § Speed
      § Wasm modules are bootstrapped in “no time” and are fast at runtime
    § Resource utilization
      § With same compute power, more applications can be executed
    § Security
      § Wasm modules are isolated, and every module can have dedicated permissions
  22. WASI (WebAssembly System Interface)

    § Enabling Wasm in a Cloud-native world
    § WASI provides standardized system calls to sandboxed Wasm code
    § Capability-based isolation
    § Platform-agnostic: run Wasm applications on any WASI-compliant platform (device, OS, CPU arch)
    § WASI offers containerization on the next level
    https://twitter.com/solomonstre/status/1111004913222324225
  23. WASI runtimes

    § Host infrastructure for executing Wasm applications leveraging APIs specified and provided by WASI
    § Handle system calls made by Wasm modules and delegate them to the underlying platform, e.g.:
      § File access
      § Network access
      § Environment variables
      § Special hardware
    § WASI runtime examples
      § wasmtime
      § wasmer
      § WasmEdge
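
The delegation model can be illustrated with plain Wasm imports. This sketch is an analogy, not actual WASI: the hand-assembled module below imports a hypothetical `host.log` function, and the embedding code plays the role of the runtime that decides which capabilities to hand over. All names (`importingModuleBytes`, `received`, `granted`, `refused`) are illustrative.

```typescript
// Hand-assembled Wasm binary, equivalent to this text format (WAT):
//   (module
//     (import "host" "log" (func (param i32)))
//     (func (export "run") i32.const 42 call 0))
const importingModuleBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
  // type section: type 0 = (i32) -> (), type 1 = () -> ()
  0x01, 0x08, 0x02, 0x60, 0x01, 0x7f, 0x00, 0x60, 0x00, 0x00,
  // import section: function "host"."log" with type 0 (becomes function 0)
  0x02, 0x0c, 0x01, 0x04, 0x68, 0x6f, 0x73, 0x74, 0x03, 0x6c, 0x6f, 0x67, 0x00, 0x00,
  // function section: one local function with type 1 (becomes function 1)
  0x03, 0x02, 0x01, 0x01,
  // export section: export function 1 as "run"
  0x07, 0x07, 0x01, 0x03, 0x72, 0x75, 0x6e, 0x00, 0x01,
  // code section: i32.const 42, call 0, end
  0x0a, 0x08, 0x01, 0x06, 0x00, 0x41, 0x2a, 0x10, 0x00, 0x0b,
]);

const importingModule = new WebAssembly.Module(importingModuleBytes);

// Case 1: the host grants the capability by supplying the imported function.
const received: number[] = [];
const granted = new WebAssembly.Instance(importingModule, {
  host: { log: (value: number) => { received.push(value); } },
});
(granted.exports.run as () => void)();
// `received` now holds the single value the module passed to the host: 42.

// Case 2: the host withholds the capability, so instantiation fails
// before any module code can run.
let refused = false;
try {
  new WebAssembly.Instance(importingModule, {});
} catch {
  refused = true;
}
console.log(refused); // true
```

The module has no way to reach the outside world except through what the host passes in at instantiation. WASI runtimes apply the same principle, with a standardized set of imports for files, network, environment variables, and so on.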
  24. DEMO: Hello WASI

    Rust and wasmtime
  25. wasi-nn: Taking ML to WASI

    § Extension of WASI for machine learning and neural networks
    § Enables loading & execution of pre-trained neural network models within Wasm environments
    § Supports common neural network formats & runtimes (e.g., ONNX)
    § Much faster than pure Wasm through hardware acceleration
  26. LlamaEdge: WASI-based AI inference

    § LLM inference engine with wasi-nn
    § Portable & lightweight: Utilizes WasmEdge as lightweight, high-performance WASI runtime
    § Local & edge execution: Runs LLMs locally or on edge devices for improved performance and privacy
    § OpenAI API compatibility: Provides OpenAI-compatible API server for open-source LLMs
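
Because the server speaks the OpenAI chat completions protocol, a client can be sketched with nothing but `fetch`. This is an illustrative sketch under assumptions, not the talk's demo code: `buildChatRequest` and `chat` are hypothetical helpers, and the base URL and model name used when calling them depend on how the local server was started.

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

interface ChatRequest {
  model: string;
  messages: ChatMessage[];
}

// Builds a request body in the OpenAI chat completions format.
function buildChatRequest(model: string, userPrompt: string): ChatRequest {
  return {
    model,
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: userPrompt },
    ],
  };
}

// Sends the request to an OpenAI-compatible server and returns the first reply.
// `baseUrl` is whatever address the local server listens on (an assumption of the caller).
async function chat(baseUrl: string, request: ChatRequest): Promise<string> {
  const response = await fetch(`${baseUrl}/v1/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(request),
  });
  if (!response.ok) throw new Error(`Request failed with HTTP ${response.status}`);
  const data = (await response.json()) as { choices: { message: ChatMessage }[] };
  return data.choices[0].message.content;
}
```

A call could look like `chat("http://localhost:8080", buildChatRequest("mistral-7b", "What is WASI?"))`, where both the port and the model name are placeholders. The practical benefit of the compatible API is that existing OpenAI client code can be pointed at a local model by changing only the base URL.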
  27. DEMO: Running Mistral-7B LLM/SLM on local laptop with OpenAI-compatible HTTP API

    LlamaEdge (Wasm, WASI), GGUF model
  28. DEMO: Running Phi-3 mini SLM on a Raspberry Pi 5 with OpenAI-compatible HTTP API

    Llamafile, llama.cpp, native optimizations, GGUF model
  29. Serverless AI with Wasm & WASI

    § No infrastructure management
    § Speed of Wasm allows scale-down to 0
      § No impact on request performance
    § Notable Serverless AI Wasm platforms today
      § Fermyon
        § Fermyon Serverless AI
      § Cloudflare
        § Workers AI
  30. Recap: Wasm & WASI are great for AI - but… 🤪

    § WebAssembly & WASI
      § Provide a robust framework for ubiquitous AI deployment
      § Ensure security, speed, and efficient resource usage
    § Wasm limitations: browser
      § Performance constraints for complex models
      § Limited access to hardware-specific optimizations
    § Wasm / WASI limitations: edge
      § Resource constraints
      § Abstractions harm performance in fast-changing Gen AI space
    § Future potential
      § Continued innovation in AI model deployment
      § Integration with complementary technologies for enhanced performance