Slide 1

Slide 1 text

WebNN: Die AI-Revolution im Browser? (WebNN: The AI Revolution in the Browser?)
Christian Liebel, @christianliebel, Consultant

Slide 2

Slide 2 text

Hello, it’s me.
Christian Liebel
X: @christianliebel
Email: christian.liebel@thinktecture.com
Angular & PWA
Slides: thinktecture.com/christian-liebel

Slide 3

Slide 3 text

Expectations
What to expect
– Focus on web app development
– Focus on Generative AI
– Up-to-date insights: the ML/AI field is evolving fast
– Live demos on real hardware
What not to expect
– Deep dive into AI specifics
– Stable libraries or specifications

Slide 4

Slide 4 text

Generative AI everywhere

Slide 5

Slide 5 text

Single-Page Applications
Run locally on the user’s system
[Diagram: web server (HTML, JS, CSS, assets), server logic, Web APIs, push service, and DBs on the server side; SPA with client logic and HTML/CSS views in the web browser; connections via HTTPS and WebSockets]

Slide 6

Slide 6 text

Progressive Web Apps
Make SPAs offline-capable
[Diagram: Website (HTML/JS), Service Worker, Cache, Internet; the Service Worker handles fetch]

Slide 7

Slide 7 text

Generative AI
Overview
– Speech: OpenAI Whisper, tortoise-tts, …
– Images: Midjourney, DALL·E, Stable Diffusion, …
– Audio/Music: Musico, Soundraw, …
– Text: OpenAI GPT, LLaMa, Vicuna, …

Slide 8

Slide 8 text

Generative AI
Overview
– Speech: OpenAI Whisper, tortoise-tts, …
– Images: Midjourney, DALL·E, Stable Diffusion, …
– Audio/Music: Musico, Soundraw, …
– Text: OpenAI GPT, LLaMa, Vicuna, …

Slide 9

Slide 9 text

Generative AI Cloud Providers
Examples

Slide 10

Slide 10 text

Generative AI Cloud Providers
Drawbacks
– Require an active internet connection
– Affected by network latency and server availability
– Data is transferred to the cloud service
– Require a subscription
→ Can we run models locally?

Slide 11

Slide 11 text

Large Language Models
– Large: Trained on lots of data
– Language: Process and generate text
– Models: Programs/neural networks
Examples:
– GPT (ChatGPT, Bing Chat, …)
– Gemini (Google)
– LLaMa (Meta AI)

Slide 12

Slide 12 text

Large Language Models
– Token: A meaningful unit of text (e.g., a word, a part of a word, a character).
– Context window: The maximum number of tokens the model can process.
– Parameters/weights: Internal variables learned during training, used to make predictions.

Slide 13

Slide 13 text

Large Language Models
Prompts serve as the universal interface
– Unstructured text conveying specific semantics
Paradigm shift in software architecture
– Human language becomes a first-class citizen
Caveats
– Non-determinism and hallucination, prompt injections

Slide 14

Slide 14 text

Large Language Models
https://webllm.mlc.ai/

Slide 15

Slide 15 text

Large Language Models
Size Comparison
Model:Parameters   Size
mistral:7b         4.1 GB
vicuna:7b          3.8 GB
llama2:7b          3.8 GB
llama2:13b         7.4 GB
llama2:70b         39.0 GB
zephyr:7b          4.1 GB
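
The file sizes above can be approximated with simple arithmetic: parameter count times bits per weight. The following TypeScript sketch assumes roughly 4.5 bits per weight (a 4-bit quantization with some overhead). The slide does not state the quantization, so this figure, the function names, and the 50 Mbit/s connection speed are illustrative assumptions, not values from the talk.

```typescript
// Rough size estimate for a quantized model file.
// ASSUMPTION: bitsPerWeight (default 4.5) is illustrative; the slide does not state it.
function estimateModelSizeGB(parameterCount: number, bitsPerWeight: number = 4.5): number {
  const bytes = (parameterCount * bitsPerWeight) / 8;
  return bytes / 1e9; // decimal gigabytes
}

// Rough download time in seconds for a given connection speed in Mbit/s.
function estimateDownloadSeconds(sizeGB: number, mbitPerSecond: number): number {
  const bits = sizeGB * 1e9 * 8;
  return bits / (mbitPerSecond * 1e6);
}

// A 7-billion-parameter model at 4.5 bits per weight: about 3.9 GB.
const sevenBillionSizeGB = estimateModelSizeGB(7e9);

// Downloading a 4.1 GB file at 50 Mbit/s: about 656 seconds (roughly 11 minutes).
const downloadSeconds = estimateDownloadSeconds(4.1, 50);
```

Under this assumption the estimate lands close to the 7b entries in the table, which is why model size quickly becomes a bandwidth and storage concern for web applications.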

Slide 16

Slide 16 text

Cache API
Storing model files locally
[Diagram: Website (HTML/JS), Cache with model files, Internet, Hugging Face]
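
A minimal sketch of the idea in TypeScript: look for the model file in the cache first, and download and store it only on a miss. The function name `loadModelFile`, the cache name `model-files`, and the small `...Like` interfaces are hypothetical and exist only to keep the sketch self-contained. In a browser, the global `caches` and `fetch` objects would be passed in. This is not how WebLLM implements its cache.

```typescript
// Minimal structural types so the sketch does not depend on DOM typings.
interface ResponseLike {
  ok: boolean;
  status: number;
  clone(): ResponseLike;
}
interface CacheLike {
  match(url: string): Promise<ResponseLike | undefined>;
  put(url: string, response: ResponseLike): Promise<void>;
}
interface CacheStorageLike {
  open(cacheName: string): Promise<CacheLike>;
}
type FetchLike = (url: string) => Promise<ResponseLike>;

// Cache-first lookup: return the cached file if present, otherwise download and store it.
async function loadModelFile(
  url: string,
  storage: CacheStorageLike,
  fetchFn: FetchLike,
  cacheName: string = "model-files" // hypothetical cache name
): Promise<ResponseLike> {
  const cache = await storage.open(cacheName);
  const cached = await cache.match(url);
  if (cached !== undefined) {
    return cached; // no network access needed
  }
  const response = await fetchFn(url);
  if (!response.ok) {
    throw new Error(`Download failed with status ${response.status}`);
  }
  // Store a clone, because a response body can only be consumed once.
  await cache.put(url, response.clone());
  return response;
}
```

Because caches are scoped to an origin, every site that uses the same model has to download and store its own copy.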

Slide 17

Slide 17 text

Cache API
Parameter cache

Slide 18

Slide 18 text

WebAssembly (Wasm)
– Bytecode for the web
– Compile target for arbitrary languages
– Can be faster than JavaScript
– WebLLM needs the model and a Wasm library to accelerate model computations

Slide 19

Slide 19 text

WebGPU
– Grants low-level access to the Graphics Processing Unit (GPU)
– Near-native performance for machine learning applications
– Supported by Chromium-based browsers on Windows and macOS from version 113

Slide 20

Slide 20 text

Outlook: WebNN
– Grants web applications access to the Neural Processing Unit (NPU) of the system via platform-specific machine learning services (e.g., ML Compute on macOS/iOS, DirectML on Windows, …)
– Even better performance when compared to WebGPU
– Currently being specified by the WebML Working Group at W3C
– Implementation in progress for Chromium-based browsers
https://webmachinelearning.github.io/webnn-intro/
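
Since WebNN and WebGPU are not available everywhere, an application would typically detect what the browser exposes and fall back accordingly. The TypeScript sketch below checks for the `navigator.ml` (WebNN) and `navigator.gpu` (WebGPU) entry points and otherwise falls back to Wasm. The `pickBackend` function, the `NavigatorLike` type, and the preference order are assumptions for illustration.

```typescript
type Backend = "webnn" | "webgpu" | "wasm";

// The parts of `navigator` this sketch looks at.
interface NavigatorLike {
  ml?: unknown;  // WebNN entry point (navigator.ml), where implemented
  gpu?: unknown; // WebGPU entry point (navigator.gpu), where implemented
}

// ASSUMPTION: prefer WebNN over WebGPU over Wasm; a real app may choose differently.
function pickBackend(nav: NavigatorLike): Backend {
  if (nav.ml !== undefined) {
    return "webnn";
  }
  if (nav.gpu !== undefined) {
    return "webgpu";
  }
  return "wasm";
}
```

In a browser, the global `navigator` object would be passed in. Finding the entry point is only a first check: creating a WebNN context or requesting a GPU adapter can still fail on a given device, so a fallback path remains necessary.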

Slide 21

Slide 21 text

WebNN: near-native inference performance
Source: Intel. Browser: Chrome Canary 118.0.5943.0, DUT: Dell/Linux/i7-1260P, single p-core, Workloads: MediaPipe solution models (FP32, batch=1)

Slide 22

Slide 22 text

WebLLM
On NPM

Slide 23

Slide 23 text

Large Language Models
Live Demo
Add a “copilot” to a todo application using the @mlc-ai/web-llm package.
For the sake of simplicity, all TODOs are added to the prompt.
Remember: LLMs have a context window. If you need to chat with a larger set of text (including documents), please refer to Retrieval Augmented Generation (RAG).
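
The "all TODOs are added to the prompt" approach could look roughly like the TypeScript sketch below. The `Todo` shape, the prompt wording, and the function names are hypothetical and not taken from the demo. The rule of about four characters per token is a crude heuristic rather than a real tokenizer. The call into @mlc-ai/web-llm is left out on purpose, because the slides do not show its API.

```typescript
interface Todo {
  title: string;
  done: boolean;
}

// ASSUMPTION: ~4 characters per token is a rough heuristic, not a real tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Builds one prompt containing every todo, and refuses if it likely exceeds the context window.
function buildTodoPrompt(todos: Todo[], question: string, maxTokens: number): string {
  const todoLines = todos.map((todo) => `- [${todo.done ? "x" : " "}] ${todo.title}`);
  const prompt = [
    "You are a helpful assistant for a todo application.",
    "These are the user's todos:",
  ]
    .concat(todoLines, [`Question: ${question}`])
    .join("\n");

  if (estimateTokens(prompt) > maxTokens) {
    throw new Error("Prompt likely exceeds the context window; consider RAG instead.");
  }
  return prompt;
}
```

The resulting string would then be handed to the locally running model as the user message. The size check makes the context-window limit explicit instead of silently truncating the todo list.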

Slide 24

Slide 24 text

Stable Diffusion
– Text-to-image model
– Generates 512x512px images from a prompt
– Runs on “commodity” hardware (with 8 GB VRAM)
– Open-source

Slide 25

Slide 25 text

Web Stable Diffusion
– Specialized version of the Stable Diffusion model for the web
– 2 GB in size
– Subject to usage conditions: https://huggingface.co/runwayml/stable-diffusion-v1-5#uses
– No npm package this time

Slide 26

Slide 26 text

Web Stable Diffusion
https://websd.mlc.ai/

Slide 27

Slide 27 text

Web Stable Diffusion
Live Demo
Retrofitting AI image generation into an existing drawing application (https://paint.js.org)

Slide 28

Slide 28 text

Local AI Models
Advantages
– Data does not leave the browser
– High availability (offline support)
– Low latency
– Stability (not affected by external API changes)
– Low cost

Slide 29

Slide 29 text

Local AI Models
Disadvantages
– High system requirements (RAM, GPU)
– High bandwidth requirements (large model size)
– Inference is relatively slow
– WebGPU is only supported by Chromium-based browsers
– WebNN is not available yet
– Loading the model takes time
– Models cannot be shared across origins
– Higher-quality models such as GPT are closed-source

Slide 30

Slide 30 text

Local AI Models
Mitigations
Download model in the background if the user is not on a metered connection
Helpful APIs:
– Network Information API to estimate the network quality/determine data saver (negative standards position by Apple and Mozilla)
– Storage Manager API to estimate the available free disk space
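
The two APIs could be combined into a simple decision, sketched here in TypeScript. The function takes values that a page would read from `navigator.connection` (Network Information API, unavailable in some browsers) and from `navigator.storage.estimate()` (Storage Manager API). The function name, the interfaces, and the policy itself (skip on data saver or anything slower than "4g", require enough free quota) are assumptions for illustration.

```typescript
// Subset of navigator.connection (Network Information API).
interface ConnectionLike {
  saveData?: boolean;
  effectiveType?: string; // "slow-2g" | "2g" | "3g" | "4g"
}

// Subset of the result of navigator.storage.estimate() (Storage Manager API).
interface StorageEstimateLike {
  quota?: number; // bytes available to the origin in total
  usage?: number; // bytes already used by the origin
}

// ASSUMPTION: this policy is one possible choice, not a recommendation from the talk.
function shouldPrefetchModel(
  connection: ConnectionLike | undefined,
  estimate: StorageEstimateLike,
  modelBytes: number
): boolean {
  if (connection !== undefined) {
    // Respect the user's data saver preference.
    if (connection.saveData === true) {
      return false;
    }
    // Skip connections that the browser rates as slower than "4g".
    if (connection.effectiveType !== undefined && connection.effectiveType !== "4g") {
      return false;
    }
  }
  // If the Network Information API is missing, only the storage check applies.
  const freeBytes = (estimate.quota ?? 0) - (estimate.usage ?? 0);
  return freeBytes >= modelBytes;
}
```

Both inputs are estimates provided by the browser, so the result is a hint for scheduling a background download rather than a guarantee that it will succeed.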

Slide 31

Slide 31 text

Local AI Models
Mitigations
Hybrid modes:
– Allow the user to switch between cloud/local execution (availability, system requirements)
– Deploy OSS model on internal/enterprise infrastructure (privacy)

Slide 32

Slide 32 text

Local AI Models
Alternatives: Ollama
– Local runner for AI models
– Offers a local server a website can connect to → allows sharing models across origins
– Supported on macOS and Linux (Windows coming soon)
https://webml-demo.vercel.app/
https://ollama.ai/
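
Connecting a website to the local server comes down to an HTTP request. The TypeScript sketch below only builds that request, assuming Ollama's default port 11434 and its `/api/generate` endpoint with a non-streaming JSON body. The helper name `buildOllamaGenerateRequest` and the `OllamaRequest` type are hypothetical.

```typescript
interface OllamaRequest {
  url: string;
  init: {
    method: string;
    headers: Record<string, string>;
    body: string;
  };
}

// ASSUMPTION: the Ollama server listens on its default address, http://localhost:11434.
function buildOllamaGenerateRequest(
  model: string,
  prompt: string,
  baseUrl: string = "http://localhost:11434"
): OllamaRequest {
  return {
    url: `${baseUrl}/api/generate`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      // stream: false asks for a single JSON answer instead of a stream of chunks.
      body: JSON.stringify({ model, prompt, stream: false }),
    },
  };
}

// Example: a request for the llama2 model from the size comparison.
const exampleRequest = buildOllamaGenerateRequest("llama2", "Summarize my todos.");
```

A page would pass `exampleRequest.url` and `exampleRequest.init` to `fetch` and read the generated text from the JSON response. The local server must be configured to accept requests from the page's origin (CORS).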

Slide 33

Slide 33 text

Alternatives
Hugging Face Transformers
Pre-trained, specialized, significantly smaller models beyond GenAI
Examples:
– Text generation
– Image classification
– Translation
– Speech recognition
– Image-to-text

Slide 34

Slide 34 text

Alternatives
Transformers.js
– JavaScript library to run Hugging Face transformers in the browser
– Supports most of the models
https://xenova.github.io/transformers.js/

Slide 35

Slide 35 text

Summary
– Cloud-based models (especially OpenAI/GPT) remain the most potent models and are easier to integrate (for now)
– Due to their size and high system requirements, local generative AI models are currently of interest mainly for special scenarios (e.g., high privacy demands, offline availability)
– Small, specialized models are an interesting alternative (if available)
– Open-source generative AI models are advancing rapidly and becoming more compact and efficient
– Computers are getting more powerful

Slide 36

Slide 36 text

Thank you for your kind attention!
Christian Liebel
@christianliebel
christian.liebel@thinktecture.com