NDC 2024: WebAssembly++ - Build AI-driven applications for browser, edge, and server(less)

by Christian Weyer

Slide 1

Slide 1 text

WebAssembly++ Build AI-driven applications for browser, edge, and server(less) Christian Weyer Co-Founder & CTO

Slide 2

Slide 2 text

Christian Weyer, Co-Founder & CTO @ Thinktecture AG
§ Technology catalyst
§ AI-powered solutions
§ Pragmatic end-to-end architectures
§ Microsoft Regional Director
§ Microsoft MVP for Developer Technologies & Azure
  ASPInsider, AzureInsider
§ Google GDE for Web Technologies
[email protected]
@christianweyer
https://www.thinktecture.com

Slide 3

Slide 3 text

Our journey today
§ AI-all-the-things?
§ WebAssembly
§ AI in browser
§ WASI
§ AI on edge / server(less)
§ Recap

Slide 4

Slide 4 text

AI-all-the-things?

Slide 5

Slide 5 text

A.I.
https://archive.org/details/Artificial_Intelligence_Projects_for_the_Commodore_64_1985_TAB_Books

Slide 6

Slide 6 text

AI space
§ Data Science
§ Artificial Intelligence
§ Machine Learning: unsupervised, supervised, reinforcement learning
§ Deep Learning: ANN, CNN, RNN etc.
§ NLP (Natural Language Processing)
§ Generative AI: GAN, VAE, Transformers etc.
§ Image / Video Generation: GAN, VAE
§ Large Language Models: Transformers

Slide 7

Slide 7 text

AI scenarios & use cases - Ubiquitousness
Run AI models and algorithms anywhere (offline, privacy)
Write code once, run it anywhere
Browser
§ Natural Language Processing: text classification, named entity recognition, question answering, summarization, translation, and text generation
§ Computer Vision: image classification, object detection, and segmentation
§ Audio: automatic speech recognition and audio classification
§ Multimodal: zero-shot image classification
Beyond browser
§ Edge computing: deploy ML / AI models on edge devices with efficient resource utilization
§ IoT applications: integrate ML capabilities in IoT devices - local data processing, real-time decision-making
§ Server(less): implement ML inference in serverless environments for scalable and efficient processing

Slide 8

Slide 8 text

WebAssembly

Slide 9

Slide 9 text

What exactly is WebAssembly?
WebAssembly (Wasm) is a
§ secure
§ fast
§ portable
stack-based execution environment (aka virtual machine) for running applications by executing low-level sandboxed bytecode
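
To make "stack-based" concrete, here is a minimal sketch of a stack machine in TypeScript. It is a toy interpreter, not the WebAssembly specification or anything shown in the talk: the instruction names mirror Wasm's `i32.const`, `i32.add`, and `i32.mul`, while the `Instr` type and the `run` function are hypothetical names chosen for illustration.

```typescript
// Toy stack machine: illustrates the execution model only, not real Wasm bytecode.
type Instr =
  | { op: "i32.const"; value: number }
  | { op: "i32.add" }
  | { op: "i32.mul" };

function run(program: Instr[]): number {
  const stack: number[] = [];
  // Pop one operand, failing loudly on stack underflow.
  const pop = (): number => {
    const v = stack.pop();
    if (v === undefined) throw new Error("stack underflow");
    return v;
  };
  for (const instr of program) {
    switch (instr.op) {
      case "i32.const":
        stack.push(instr.value | 0); // truncate to a 32-bit integer
        break;
      case "i32.add": {
        const b = pop();
        const a = pop();
        stack.push((a + b) | 0);
        break;
      }
      case "i32.mul": {
        const b = pop();
        const a = pop();
        stack.push(Math.imul(a, b)); // 32-bit integer multiplication
        break;
      }
    }
  }
  if (stack.length !== 1) throw new Error("program must leave exactly one value");
  return pop();
}

// (2 + 3) * 4: operands are pushed first, operators consume them from the stack.
const result = run([
  { op: "i32.const", value: 2 },
  { op: "i32.const", value: 3 },
  { op: "i32.add" },
  { op: "i32.const", value: 4 },
  { op: "i32.mul" },
]);
console.log(result); // 20
```

Instructions carry no register names; every operator takes its inputs from the top of the stack, which is one reason the bytecode stays compact and easy to validate.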

Slide 10

Slide 10 text

Why do we need WebAssembly?
Everything, everywhere, by everyone – securely
§ Origins: Web
§ Current developments: Web, Server, Cloud, Extensibility
§ Future: Web, Server, Cloud, Extensibility, Microcontroller, Bare metal, Edge

Slide 11

Slide 11 text

DEMO
Hello Wasm in browser
WAT, JS
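
The demo code itself is not part of this transcript. As a rough idea of what a "Hello Wasm" looks like, the sketch below hand-encodes a tiny module that exports an `add` function and instantiates it through the standard `WebAssembly` JavaScript API. The module, its WAT source in the comment, and the variable names are assumptions made for illustration, not the talk's demo.

```typescript
// Equivalent WAT (WebAssembly text format):
// (module
//   (func (export "add") (param i32 i32) (result i32)
//     local.get 0
//     local.get 1
//     i32.add))
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // magic number "\0asm"
  0x01, 0x00, 0x00, 0x00, // binary format version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type section: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // function section: one function using type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export section: "add" -> function 0
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // code section: the body
]);

// Synchronous compilation is fine for a module this small.
const wasmModule = new WebAssembly.Module(wasmBytes);
const instance = new WebAssembly.Instance(wasmModule);

// Exports are untyped at the JS boundary, so the signature is asserted here.
const add = instance.exports.add as (a: number, b: number) => number;
console.log(add(2, 3)); // 5
```

In a real page the bytes would usually come from a compiled `.wasm` file loaded with `WebAssembly.instantiateStreaming(fetch(...))` rather than from an inline array.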

Slide 12

Slide 12 text

https://blog.scottlogic.com/2023/10/18/the-state-of-webassembly-2023.html

Slide 13

Slide 13 text

DEMO
Photoshop in browser
Migrating code to the web with Wasm

Slide 14

Slide 14 text

AI in browser

Slide 15

Slide 15 text

WebGPU: What is good for graphics, is good for AI
§ API that exposes capabilities of GPU hardware for the web
§ Designed from the ground up to efficiently map to native GPU APIs
§ Not related to WebGL and does not explicitly target OpenGL ES
§ Enables web developers to use underlying system's GPU to carry out high-performance computations and draw complex graphics that can be rendered in the browser
§ … or use high-performance computations for AI use cases

Slide 16

Slide 16 text

DEMO
Segment anything - in browser
Wasm vs. WebGPU

Slide 17

Slide 17 text

WebNN: Neural networks in browser
§ Web Neural Network API
§ Web-friendly hardware-agnostic abstraction layer for ML & DL
§ Use ML capabilities of OS & underlying hardware platforms without being tied to platform-specific capabilities
§ Addresses requirements of key ML JavaScript frameworks
§ Allows web developers familiar with ML domain to write custom code without the help of libraries

Slide 18

Slide 18 text

DEMO
Facial landmark detection - in browser
Wasm vs. WebNN

Slide 19

Slide 19 text

ONNX: Open format & runtime for inferencing
§ Open Neural Network Exchange (ONNX)
  § Open-source ecosystem for algorithms, tools, formats
§ ONNX Runtime
  § Cross-platform machine-learning model accelerator
  § Flexible interface to integrate hardware-specific libraries
  § Can be used with models from PyTorch, TensorFlow/Keras, TFLite, scikit-learn, and other frameworks
§ ONNX Runtime Web
  § Builds on Wasm
  § Optionally uses WebGPU, WebNN
  § Inference in browser

Slide 20

Slide 20 text

AI in browser: Putting it all together
§ Web App: ONNX models, TensorFlow models, other models; JavaScript ML frameworks (ONNX Web, TensorFlow.js, …)
§ Browser: WebGPU, WebNN, WebAssembly
§ OS: ML Compute (macOS/iOS), DirectML (Windows), NN API (Android), OpenVINO (Linux)
§ Hardware: NPU, GPU, CPU
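
The browser layer in this stack implies a choice at runtime: which of WebNN, WebGPU, or plain Wasm is actually available. The sketch below shows one way a web app might pick a backend by feature detection. The `pickBackend` function, the `NavigatorLike` type, and the preference order are assumptions for illustration; only the `navigator.gpu` (WebGPU) and `navigator.ml` (WebNN) entry points are real browser APIs.

```typescript
type Backend = "webnn" | "webgpu" | "wasm";

// Structural stand-in for the parts of the browser's Navigator used here.
interface NavigatorLike {
  gpu?: unknown; // WebGPU entry point (navigator.gpu)
  ml?: unknown; // WebNN entry point (navigator.ml)
}

// Returns the first backend from `preferred` that the environment exposes.
// The default order is an assumption, not a recommendation from the talk.
function pickBackend(
  nav: NavigatorLike | undefined,
  preferred: Backend[] = ["webnn", "webgpu", "wasm"],
): Backend {
  for (const backend of preferred) {
    if (backend === "webnn" && nav?.ml !== undefined) return "webnn";
    if (backend === "webgpu" && nav?.gpu !== undefined) return "webgpu";
    if (backend === "wasm") return "wasm";
  }
  return "wasm"; // Wasm on the CPU is the baseline that is always there
}

console.log(pickBackend({ gpu: {} })); // "webgpu"
console.log(pickBackend({ gpu: {}, ml: {} })); // "webnn"
console.log(pickBackend({})); // "wasm"
```

In a browser the global `navigator` would be passed in. Note that the presence of `navigator.gpu` is only a first check: `navigator.gpu.requestAdapter()` can still resolve to `null`, so a production app would confirm an adapter before committing to WebGPU.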

Slide 21

Slide 21 text

Transformers.js: State of the art ML applications in the browser
§ Hiding the nitty-gritty details of Wasm, WebGPU, WebNN, ONNX
  § “Don’t worry – be happy!”
§ Run Hugging Face Transformers directly in browser
§ Uses ONNX Runtime Web to run models
§ All the scenarios we want
  § Natural language processing
  § Computer vision
  § Audio
  § Multimodal

Slide 22

Slide 22 text

DEMO
Sentiment analysis – in browser
Transformers.js, ONNX Runtime Web

Slide 23

Slide 23 text

DEMO
Running Whisper - in browser
Transformers.js, ONNX Runtime Web, WebGPU

Slide 24

Slide 24 text

DEMO
Running Phi-3 mini SLM - in browser
Transformers.js, ONNX Runtime Web, WebGPU

Slide 25

Slide 25 text

WASI

Slide 26

Slide 26 text

WebAssembly outside the browser – ubiquitous compute
§ Size
  § Compared to other distributables (containers | VMs), Wasm is super small
§ Speed
  § Wasm modules are bootstrapped in “no time” and are fast at runtime
§ Resource utilization
  § With same compute power, more applications can be executed
§ Security
  § Wasm modules are isolated, and every module can have dedicated permissions

Slide 27

Slide 27 text

WASI (WebAssembly System Interface)
§ Enabling Wasm in a Cloud-native world
§ WASI provides standardized system calls to sandboxed Wasm code
§ Capability-based isolation
§ Platform-agnostic: run Wasm applications on any WASI-compliant platform (device, OS, CPU arch)
§ WASI offers containerization on the next level
https://twitter.com/solomonstre/status/1111004913222324225

Slide 28

Slide 28 text

WASI runtimes
§ Host infrastructure for executing Wasm applications leveraging APIs specified and provided by WASI
§ Handle system calls made by Wasm modules and delegate them to the underlying platform, e.g.:
  § File access
  § Network access
  § Environment variables
  § Special hardware
§ WASI runtime examples
  § wasmtime
  § wasmer
  § WasmEdge
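
The idea of a runtime that mediates system calls under capability-based isolation can be sketched in a few lines. The code below is a conceptual model only: it is not the WASI API and not how wasmtime, wasmer, or WasmEdge are implemented. `SandboxHost`, `Capabilities`, `checkPath`, and `getEnv` are hypothetical names, and the paths and variables are made up.

```typescript
// Conceptual sketch only: NOT the WASI API or any runtime's implementation.
interface Capabilities {
  preopenedDirs: string[]; // absolute directories the guest may access, no trailing slash
  allowedEnv: Set<string>; // environment variables the guest may read
}

// Resolve "." and ".." segments so "/data/../etc" cannot escape a granted directory.
function normalize(path: string): string {
  const out: string[] = [];
  for (const seg of path.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") out.pop();
    else out.push(seg);
  }
  return "/" + out.join("/");
}

class SandboxHost {
  private readonly caps: Capabilities;
  private readonly hostEnv: Record<string, string>;

  constructor(caps: Capabilities, hostEnv: Record<string, string>) {
    this.caps = caps;
    this.hostEnv = hostEnv;
  }

  // Stand-in for a file-open call: succeeds only inside a granted directory.
  checkPath(path: string): string {
    const resolved = normalize(path);
    const granted = this.caps.preopenedDirs.some(
      (dir) => resolved === dir || resolved.startsWith(dir + "/"),
    );
    if (!granted) throw new Error(`access denied: ${resolved}`);
    return resolved;
  }

  // Stand-in for an environment lookup: anything not granted looks unset.
  getEnv(name: string): string | undefined {
    return this.caps.allowedEnv.has(name) ? this.hostEnv[name] : undefined;
  }
}

const host = new SandboxHost(
  { preopenedDirs: ["/data"], allowedEnv: new Set(["MODEL_NAME"]) },
  { MODEL_NAME: "demo-model", SECRET_TOKEN: "do-not-leak" },
);
console.log(host.checkPath("/data/models/demo.gguf")); // "/data/models/demo.gguf"
console.log(host.getEnv("SECRET_TOKEN")); // undefined
```

The point of the model is the default: the guest starts with nothing and sees only what the host explicitly hands over, which is the opposite of a normal process that inherits the user's full file system and environment.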

Slide 29

Slide 29 text

DEMO
Hello WASI
Rust and wasmtime

Slide 30

Slide 30 text

AI on edge

Slide 31

Slide 31 text

wasi-nn: Taking ML to WASI
§ Extension of WASI for machine learning and neural networks
§ Enables loading & execution of pre-trained neural network models within Wasm environments
§ Supports common neural network formats & runtimes (e.g., ONNX)
§ Much faster than pure Wasm through hardware acceleration

Slide 32

Slide 32 text

LlamaEdge: WASI-based AI inference
§ LLM inference engine with wasi-nn
§ Portable & lightweight: Utilizes WasmEdge as lightweight, high-performance WASI runtime
§ Local & edge execution: Runs LLMs locally or on edge devices for improved performance and privacy
§ OpenAI API compatibility: Provides OpenAI-compatible API server for open-source LLMs
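
OpenAI API compatibility means a client written against OpenAI's chat completions endpoint can be pointed at a local server instead. The sketch below builds such a request and posts it to a configurable base URL. The `/v1/chat/completions` path and the `model` / `messages` body shape follow the OpenAI API; the helper names, the model name, and the `http://localhost:8080` address in the usage comment are assumptions, so check the address and model your own server reports.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  model: string;
  messages: ChatMessage[];
}

// Build the JSON body for a chat completion request.
function buildChatRequest(model: string, userPrompt: string): ChatRequest {
  return {
    model,
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: userPrompt },
    ],
  };
}

// POST the request to an OpenAI-compatible server and return the first reply.
async function chat(baseUrl: string, request: ChatRequest): Promise<string> {
  const response = await fetch(`${baseUrl}/v1/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(request),
  });
  if (!response.ok) throw new Error(`HTTP ${response.status}`);
  const data = (await response.json()) as { choices: { message: ChatMessage }[] };
  return data.choices[0]?.message.content ?? "";
}

// Hypothetical model name; nothing is sent over the network here.
const request = buildChatRequest("local-model", "What is WebAssembly?");
console.log(JSON.stringify(request, null, 2));

// With a server running locally (address is an assumption):
// chat("http://localhost:8080", request).then(console.log);
```

Because only the base URL changes, the same client code can switch between a hosted API and a model running on a laptop or edge device.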

Slide 33

Slide 33 text

DEMO
Running Mistral-7B LLM/SLM on local laptop with OpenAI-compatible HTTP API
LlamaEdge (Wasm, WASI), GGUF model

Slide 34

Slide 34 text

But… !

Slide 35

Slide 35 text

DEMO
Running Phi-3 mini SLM on a Raspberry Pi 5 with OpenAI-compatible HTTP API
Llamafile, llama.cpp, native optimizations, GGUF model

Slide 36

Slide 36 text

AI on server(less)

Slide 37

Slide 37 text

Serverless AI with Wasm & WASI
§ No infrastructure management
§ Speed of Wasm allows scale-down to 0
§ No impact on request performance
§ Notable Serverless AI Wasm platforms today
  § Fermyon
    § Fermyon Serverless AI
  § Cloudflare
    § Workers AI

Slide 38

Slide 38 text

Summary

Slide 39

Slide 39 text

Recap: Wasm & WASI are great for AI - but… 🤪
§ WebAssembly & WASI
  § Provide a robust framework for ubiquitous AI deployment
  § Ensure security, speed, and efficient resource usage
§ Wasm limitations: browser
  § Performance constraints for complex models
  § Limited access to hardware-specific optimizations
§ Wasm / WASI limitations: edge
  § Resource constraints
  § Abstractions harm performance in fast-changing Gen AI space
§ Future potential
  § Continued innovation in AI model deployment
  § Integration with complementary technologies for enhanced performance

Slide 40

Slide 40 text

Thank you!
Christian Weyer
[email protected]
https://thinktecture.com/christian-weyer
Demos: https://github.com/thinktecture-labs/ai-applications-webassembly-plusplus