NDC 2024: WebAssembly++ - Build AI-driven applications for browser, edge, and server(less)

Christian Weyer
Co-Founder & CTO @ Thinktecture AG

§ Technology catalyst
§ AI-powered solutions
§ Pragmatic end-to-end architectures
§ Microsoft Regional Director
§ Microsoft MVP for Developer Technologies & Azure
§ ASPInsider, AzureInsider
§ Google GDE for Web Technologies

[email protected]
@christianweyer
https://www.thinktecture.com

Our journey today

§ AI-all-the-things?
§ WebAssembly
§ AI in browser
§ WASI
§ AI on edge / server(less)
§ Recap

AI-all-the-things?

A.I.

https://archive.org/details/Artificial_Intelligence_Projects_for_the_Commodore_64_1985_TAB_Books

AI space

§ Data Science
§ Artificial Intelligence
§ Machine Learning: unsupervised, supervised, reinforcement learning
§ Deep Learning: ANN, CNN, RNN etc.
§ NLP (Natural Language Processing)
§ Generative AI: GAN, VAE, Transformers etc.
§ Image / Video Generation: GAN, VAE
§ Large Language Models: Transformers

Scenarios & use cases - Ubiquitousness

Run AI models and algorithms anywhere (offline, privacy)
Write code once, run it anywhere

Browser
§ Natural Language Processing: text classification, named entity recognition, question answering, summarization, translation, and text generation
§ Computer Vision: image classification, object detection, and segmentation
§ Audio: automatic speech recognition and audio classification
§ Multimodal: zero-shot image classification

Beyond browser
§ Edge computing: deploy ML / AI models on edge devices with efficient resource utilization
§ IoT applications: integrate ML capabilities in IoT devices - local data processing, real-time decision-making
§ Server(less): implement ML inference in serverless environments for scalable and efficient processing

WebAssembly

What exactly is WebAssembly?

WebAssembly (Wasm) is a
§ secure
§ fast
§ portable
stack-based execution environment (aka virtual machine) for running applications by executing low-level sandboxed bytecode.
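
To make "low-level sandboxed bytecode" concrete, here is a minimal sketch (not taken from the talk's demos): a hand-assembled Wasm module that exports a single function, instantiated through the standard WebAssembly JavaScript API. The module bytes and the export name `add` are illustrative assumptions.

```typescript
// Hand-assembled Wasm module (illustrative, not from the talk's demo repo).
// Equivalent WAT:
//   (module
//     (func (export "add") (param i32 i32) (result i32)
//       local.get 0
//       local.get 1
//       i32.add))
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // magic "\0asm"
  0x01, 0x00, 0x00, 0x00, // binary format version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type section: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // function section: one function of type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export section: "add" -> func 0
  0x0a, 0x09, 0x01, 0x07, 0x00, // code section: one body, no extra locals
  0x20, 0x00, // local.get 0 (push first parameter onto the stack)
  0x20, 0x01, // local.get 1 (push second parameter)
  0x6a, // i32.add (pop two values, push their sum)
  0x0b, // end
]);

// Synchronous compilation is acceptable for a module this small; larger
// modules would normally go through WebAssembly.instantiate instead.
const wasmModule = new WebAssembly.Module(wasmBytes);
const wasmInstance = new WebAssembly.Instance(wasmModule);
const add = wasmInstance.exports.add as (a: number, b: number) => number;

console.log(add(2, 3));
```

The three instructions in the function body show the stack-based model: two pushes followed by an operation that consumes both operands. The module has no imports, so it cannot touch anything outside its own sandbox.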

Why do we need WebAssembly?

Everything, everywhere, by everyone - securely

§ Origins: Web
§ Current developments: Web, Cloud, Server, Extensibility
§ Future: Web, Cloud, Server, Extensibility, Edge, Microcontroller, Bare metal

DEMO: Hello Wasm in browser
WAT, JS

https://blog.scottlogic.com/2023/10/18/the-state-of-webassembly-2023.html

DEMO: Photoshop in browser
Migrating code to the web with Wasm

AI in browser

WebGPU: What is good for graphics, is good for AI

§ API that exposes capabilities of GPU hardware for the web
§ Designed from the ground up to efficiently map to native GPU APIs
§ Not related to WebGL and does not explicitly target OpenGL ES
§ Enables web developers to use the underlying system's GPU to carry out high-performance computations and draw complex graphics that can be rendered in the browser
§ … or use high-performance computations for AI use cases

DEMO: Segment anything - in browser
Wasm vs. WebGPU

WebNN: Neural networks in browser

§ Web Neural Network API
§ Web-friendly hardware-agnostic abstraction layer for ML & DL
§ Use ML capabilities of OS & underlying hardware platforms without being tied to platform-specific capabilities
§ Addresses requirements of key ML JavaScript frameworks
§ Allows web developers familiar with the ML domain to write custom code without the help of libraries

DEMO: Facial landmark detection - in browser
Wasm vs. WebNN

ONNX: Open format & runtime for inferencing

§ Open Neural Network Exchange (ONNX)
  § Open-source ecosystem for algorithms, tools, formats
§ ONNX Runtime
  § Cross-platform machine-learning model accelerator
  § Flexible interface to integrate hardware-specific libraries
  § Can be used with models from PyTorch, TensorFlow/Keras, TFLite, scikit-learn, and other frameworks
§ ONNX Runtime Web
  § Builds on Wasm
  § Optionally uses WebGPU, WebNN
  § Inference in browser

AI in browser: Putting it all together

§ Web App: ONNX models, TensorFlow models, other models; JavaScript ML frameworks (ONNX Web, TensorFlow.js, …)
§ Browser: WebGPU, WebNN, WebAssembly
§ OS: ML Compute (macOS/iOS), DirectML (Windows), NN API (Android), OpenVINO (Linux)
§ Hardware: NPU, GPU, CPU
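
The browser layer of this stack can be probed at runtime. The following is a minimal sketch of that feature detection, assuming the usual entry points (`navigator.gpu` for WebGPU, `navigator.ml` for WebNN). The function name `pickBackends`, the `NavigatorLike` type, and the preference order are assumptions of this sketch, not something the slides prescribe.

```typescript
// The parts of `navigator` this sketch inspects.
// `gpu` is the WebGPU entry point, `ml` is the WebNN entry point.
interface NavigatorLike {
  gpu?: unknown;
  ml?: unknown;
}

type Backend = "webgpu" | "webnn" | "wasm";

// Returns the available backends in order of preference. The ordering
// (WebGPU, then WebNN, then Wasm) is an assumption of this sketch;
// Wasm is always kept last as the CPU fallback.
function pickBackends(nav: NavigatorLike): Backend[] {
  const backends: Backend[] = [];
  if (nav.gpu !== undefined) backends.push("webgpu");
  if (nav.ml !== undefined) backends.push("webnn");
  backends.push("wasm");
  return backends;
}

// In a browser, the real `navigator` object would be passed in.
// Plain objects stand in for it here so the sketch also runs elsewhere.
console.log(pickBackends({ gpu: {} }));
console.log(pickBackends({}));
```

A list like this could be handed to a framework as its backend preference; ONNX Runtime Web, for instance, accepts an `executionProviders` option when a session is created.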

Transformers.js: State of the art ML applications in the browser

§ Hiding the nitty gritty details of Wasm, WebGPU, WebNN, ONNX
  § “Don’t worry – be happy!”
§ Run HuggingFace Transformers directly in browser
§ Uses ONNX Runtime Web to run models
§ All the scenarios we want
  § Natural language processing
  § Computer vision
  § Audio
  § Multimodal

DEMO: Sentiment analysis - in browser
Transformers.js, ONNX Runtime Web

DEMO: Running Whisper - in browser
Transformers.js, ONNX Runtime Web, WebGPU

DEMO: Running Phi-3 mini SLM - in browser
Transformers.js, ONNX Runtime Web, WebGPU

WASI

WebAssembly outside the browser - ubiquitous compute

§ Size
  § Compared to other distributables (containers | VMs), Wasm is super small
§ Speed
  § Wasm modules are bootstrapped in “no time” and are fast at runtime
§ Resource utilization
  § With the same compute power, more applications can be executed
§ Security
  § Wasm modules are isolated, and every module can have dedicated permissions

WASI (WebAssembly System Interface)

§ Enabling Wasm in a Cloud-native world
§ WASI provides standardized system calls to sandboxed Wasm code
§ Capability-based isolation
§ Platform-agnostic: run Wasm applications on any WASI-compliant platform (device, OS, CPU arch)
§ WASI offers containerization on the next level

https://twitter.com/solomonstre/status/1111004913222324225

WASI runtimes

§ Host infrastructure for executing Wasm applications leveraging APIs specified and provided by WASI
§ Handle system calls made by Wasm modules and delegate them to the underlying platform, e.g.:
  § File access
  § Network access
  § Environment variables
  § Special hardware
§ WASI runtime examples
  § wasmtime
  § wasmer
  § WasmEdge

DEMO: Hello WASI
Rust and wasmtime

AI on edge

wasi-nn: Taking ML to WASI

§ Extension of WASI for machine learning and neural networks
§ Enables loading & execution of pre-trained neural network models within Wasm environments
§ Supports common neural network formats & runtimes (e.g., ONNX)
§ Much faster than pure Wasm through hardware acceleration

LlamaEdge: WASI-based AI inference

§ LLM inference engine with wasi-nn
§ Portable & lightweight: Utilizes WasmEdge as a lightweight, high-performance WASI runtime
§ Local & edge execution: Runs LLMs locally or on edge devices for improved performance and privacy
§ OpenAI API compatibility: Provides an OpenAI-compatible API server for open-source LLMs
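
OpenAI API compatibility means that a client written against the OpenAI chat completions wire format can be pointed at a locally running server instead. The sketch below illustrates that idea under stated assumptions: the helper names `buildChatRequest` and `chat`, the base URL `http://localhost:8080/`, and the model name `my-local-model` are all hypothetical and need to be adjusted to the actual setup.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Builds a chat completion request in the OpenAI wire format.
// Trailing slashes on the base URL are stripped before the path is appended.
function buildChatRequest(baseUrl: string, model: string, messages: ChatMessage[]): ChatRequest {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/v1/chat/completions`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, messages }),
    },
  };
}

// Sends a single user prompt and returns the text of the first choice.
// Not invoked here, because it needs a running server.
async function chat(baseUrl: string, model: string, prompt: string): Promise<string> {
  const { url, init } = buildChatRequest(baseUrl, model, [{ role: "user", content: prompt }]);
  const response = await fetch(url, init);
  if (!response.ok) throw new Error(`HTTP ${response.status}`);
  const data = (await response.json()) as { choices: { message: { content: string } }[] };
  return data.choices[0].message.content;
}

// Hypothetical local endpoint and model name (assumptions of this sketch).
const example = buildChatRequest("http://localhost:8080/", "my-local-model", [
  { role: "user", content: "What is WebAssembly?" },
]);
console.log(example.url);
```

Because only the base URL changes, the same client code can be switched between a hosted service and a local or edge deployment.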

DEMO: Running Mistral-7B LLM/SLM on a local laptop with an OpenAI-compatible HTTP API
LlamaEdge (Wasm, WASI), GGUF model

But… !

DEMO: Running Phi-3 mini SLM on a Raspberry Pi 5 with an OpenAI-compatible HTTP API
Llamafile, llama.cpp, native optimizations, GGUF model

AI on server(less)

Serverless AI with Wasm & WASI

§ No infrastructure management
§ Speed of Wasm allows scale-down to 0
  § No impact on request performance
§ Notable Serverless AI Wasm platforms today
  § Fermyon: Fermyon Serverless AI
  § Cloudflare: Workers AI

Summary

Recap: Wasm & WASI are great for AI - but… 🤪

§ WebAssembly & WASI
  § Provide a robust framework for ubiquitous AI deployment
  § Ensure security, speed, and efficient resource usage
§ Wasm limitations: browser
  § Performance constraints for complex models
  § Limited access to hardware-specific optimizations
§ Wasm / WASI limitations: edge
  § Resource constraints
  § Abstractions harm performance in fast-changing Gen AI space
§ Future potential
  § Continued innovation in AI model deployment
  § Integration with complementary technologies for enhanced performance

Thank you!

Christian Weyer
[email protected]
https://thinktecture.com/christian-weyer

Demos: https://github.com/thinktecture-labs/ai-applications-webassembly-plusplus
