Slide 1

AI in the Browser: Smarter Web Apps with WebGPU and WebNN
Christian Liebel
@christianliebel
Consultant

Slide 2

Hello, it's me.
Christian Liebel
X: @christianliebel
Email: christian.liebel@thinktecture.com
Angular & PWA
Slides: thinktecture.com/christian-liebel

Slide 3

Expectations
What to expect
– Focus on web app development
– Focus on Generative AI
– Up-to-date insights: the ML/AI field is evolving fast
– Live demos on real hardware
What not to expect
– Deep dive into AI specifics
– Stable libraries or specifications

Slide 4

Generative AI everywhere

Slide 5

Single-Page Applications
Run locally on the user's system
[Diagram: a web server delivers HTML, JS, CSS, and assets to the SPA in the web browser. The SPA's client logic renders several HTML/CSS views and communicates with server-side logic (web APIs, databases) and a push service via HTTPS and WebSockets.]

Slide 6

Progressive Web Apps
Make SPAs offline-capable
[Diagram: a service worker sits between the website (HTML/JS) and the internet; it intercepts fetch requests and can answer them from its cache.]
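A service worker commonly implements the interception in the diagram with a cache-first strategy. The following is a minimal sketch that isolates this logic in a function; the `RequestCache` interface, the `cacheFirst` name, and the cache name `app-v1` are illustrative assumptions, not part of the original slides.

```typescript
// Minimal subset of the Cache API used here; a real `Cache` returned by
// `caches.open()` fits this shape.
interface RequestCache {
  match(request: Request): Promise<Response | undefined>;
  put(request: Request, response: Response): Promise<void>;
}

// Cache-first: answer from the cache when possible, otherwise go to the
// network and store a copy for the next (possibly offline) visit.
async function cacheFirst(
  request: Request,
  cache: RequestCache,
  fetchFn: (request: Request) => Promise<Response> = fetch,
): Promise<Response> {
  const cached = await cache.match(request);
  if (cached) {
    return cached;
  }
  const response = await fetchFn(request);
  if (response.ok) {
    // A response body can only be read once, so the cache gets a clone.
    await cache.put(request, response.clone());
  }
  return response;
}

// Wiring it up inside a service worker (hypothetical cache name "app-v1"):
// self.addEventListener("fetch", (event) => {
//   event.respondWith(
//     caches.open("app-v1").then((cache) => cacheFirst(event.request, cache)),
//   );
// });
```

Passing the cache and the fetch function in as parameters keeps the strategy independent of the service worker globals, so it can be exercised outside the browser.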

Slide 7

Generative AI
Overview
– Text: OpenAI GPT, LLaMA, Vicuna, …
– Images: Midjourney, DALL·E, Stable Diffusion, …
– Audio/Music: Musico, Soundraw, …
– Speech: OpenAI Whisper, tortoise-tts, …

Slide 9

Generative AI Cloud Providers
Examples

Slide 10

Generative AI Cloud Providers
Drawbacks
– Require an active internet connection
– Affected by network latency and server availability
– Data is transferred to the cloud service
– Require a subscription
→ Can we run models locally?

Slide 11

Large Language Models
– Large: trained on lots of data
– Language: process and generate text
– Models: programs/neural networks
Examples:
– GPT (ChatGPT, Bing Chat, …)
– LaMDA (Google Bard)
– LLaMA (Meta AI)

Slide 12

Large Language Models
Token: a meaningful unit of text (e.g., a word, a part of a word, or a character).
Context window: the maximum number of tokens the model can process.
Parameters: internal variables learned during training, used to make predictions.
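Because the context window is measured in tokens, a chat app has to drop older messages once a conversation grows too long. This is a minimal sketch of that idea; the four-characters-per-token estimate is a crude assumption (real counts depend on the model's tokenizer), and `fitToContextWindow` is a hypothetical helper that assumes the first message is the system prompt.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Crude heuristic (assumption): roughly 4 characters per token for English text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keeps the system prompt plus as many of the most recent messages as fit
// into the context window, leaving `reservedForAnswer` tokens for the reply.
function fitToContextWindow(
  messages: ChatMessage[],
  contextWindow: number,
  reservedForAnswer: number,
): ChatMessage[] {
  if (messages.length === 0) return [];
  const [system, ...rest] = messages;
  let budget = contextWindow - reservedForAnswer - estimateTokens(system.content);
  const kept: ChatMessage[] = [];
  // Walk backwards so that the newest messages are kept first.
  for (let i = rest.length - 1; i >= 0; i--) {
    const cost = estimateTokens(rest[i].content);
    if (cost > budget) break;
    budget -= cost;
    kept.unshift(rest[i]);
  }
  return [system, ...kept];
}
```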

Slide 13

Large Language Models
Prompts serve as the universal interface
– Unstructured text conveying specific semantics
Paradigm shift in software architecture
– Human language becomes a first-class citizen
Caveats
– Non-determinism and hallucination, prompt injections

Slide 14

Large Language Models
https://webllm.mlc.ai/

Slide 15

Large Language Models
Size Comparison
Model:Parameters   Size
mistral:7b         4.1 GB
vicuna:7b          3.8 GB
llama2:7b          3.8 GB
llama2:13b         7.4 GB
llama2:70b         39.0 GB
zephyr:7b          4.1 GB
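The sizes in the table roughly follow from parameter count times bits per weight. A back-of-the-envelope helper, assuming about 4.5 bits per weight (4-bit weights plus one 16-bit scale per block of 32 weights, as in common 4-bit quantization formats); the exact quantization behind the listed sizes is an assumption here.

```typescript
// Approximate size of a model's weights in decimal gigabytes, ignoring
// metadata, tokenizer files, and tensors kept at higher precision.
function estimateModelSizeGB(parameters: number, bitsPerWeight: number): number {
  const bytes = (parameters * bitsPerWeight) / 8;
  return bytes / 1e9;
}

console.log(estimateModelSizeGB(7e9, 4.5)); // 3.9375
console.log(estimateModelSizeGB(70e9, 4.5)); // 39.375
console.log(estimateModelSizeGB(7e9, 16)); // 14 (unquantized 16-bit weights)
```

With Llama 2 7B's roughly 6.7 billion parameters, the same formula gives about 3.8 GB, close to the table. The 16-bit line shows why quantization matters for downloads in the browser.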

Slide 16

Cache API
Storing model files locally
[Diagram: the website (HTML/JS) downloads model files from Hugging Face over the internet and stores them in a cache.]
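Because model files are large, it helps to download only what is still missing, so an interrupted download can resume. A minimal sketch under stated assumptions: `ensureModelCached` and the `ModelCache` interface are hypothetical, and the cache name and file URLs in the comment are placeholders.

```typescript
// Minimal subset of the Cache API used here (a real `Cache` fits this shape).
interface ModelCache {
  match(url: string): Promise<Response | undefined>;
  put(url: string, response: Response): Promise<void>;
}

// Downloads only the model files that are not cached yet and reports
// progress. Returns the number of files fetched from the network.
async function ensureModelCached(
  cache: ModelCache,
  fileUrls: string[],
  fetchFn: (url: string) => Promise<Response> = fetch,
  onProgress?: (done: number, total: number) => void,
): Promise<number> {
  let downloaded = 0;
  for (let i = 0; i < fileUrls.length; i++) {
    const url = fileUrls[i];
    if (!(await cache.match(url))) {
      const response = await fetchFn(url);
      if (!response.ok) {
        throw new Error(`Failed to download ${url}: HTTP ${response.status}`);
      }
      await cache.put(url, response);
      downloaded++;
    }
    onProgress?.(i + 1, fileUrls.length);
  }
  return downloaded;
}

// In the browser (placeholder cache name and URL list):
// const cache = await caches.open("model-cache");
// await ensureModelCached(cache, modelFileUrls, fetch, (done, total) => console.log(done, total));
```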

Slide 17

Cache API
Parameter cache

Slide 18

WebAssembly (Wasm)
– Bytecode for the web
– Compile target for arbitrary languages
– Can be faster than JavaScript
– WebLLM needs the model and a Wasm library to accelerate model computations
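To illustrate "bytecode for the web", the following sketch loads a tiny, hand-assembled Wasm module that exports an `add` function. The bytes stand in for what a compiler (e.g., for C++ or Rust) would emit; real libraries such as the one WebLLM relies on are far larger and are typically compiled asynchronously.

```typescript
// Hand-assembled WebAssembly binary exporting `add(a: i32, b: i32): i32`.
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic "\0asm", version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type section: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // function section: function 0 uses type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export section: "add" = function 0
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // code: local.get 0, local.get 1, i32.add
]);

// Synchronous compilation is fine for a tiny module; browsers may restrict it
// for larger modules on the main thread, where WebAssembly.instantiate() or
// WebAssembly.instantiateStreaming() is the usual choice.
const wasmModule = new WebAssembly.Module(wasmBytes);
const wasmInstance = new WebAssembly.Instance(wasmModule);
const add = wasmInstance.exports.add as (a: number, b: number) => number;

console.log(add(2, 3)); // 5
```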

Slide 19

WebGPU
– Grants low-level access to the Graphics Processing Unit (GPU)
– Near-native performance for machine learning applications
– Supported by Chromium-based browsers on Windows and macOS since version 113
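Before loading a model, an app can check whether WebGPU is actually usable. A minimal sketch: `navigator.gpu.requestAdapter()` is the real WebGPU entry point, while `isWebGpuUsable` and the structural interfaces are hypothetical, chosen so the code compiles without the `@webgpu/types` package.

```typescript
// Minimal structural types for the part of WebGPU used here.
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}
interface NavigatorWithGpu {
  gpu?: GpuLike;
}

// Returns true only if WebGPU is exposed *and* an adapter is available.
// `navigator.gpu` may exist while requestAdapter() still resolves to null,
// for example when no suitable GPU is available.
async function isWebGpuUsable(nav: NavigatorWithGpu | undefined): Promise<boolean> {
  if (!nav?.gpu) {
    return false;
  }
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter !== null;
  } catch {
    return false;
  }
}

// In the browser: const ok = await isWebGpuUsable(navigator);
```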

Slide 20

Outlook: WebNN
– Grants web applications access to the system's Neural Processing Unit (NPU) via platform-specific machine learning services (e.g., ML Compute on macOS/iOS, DirectML on Windows, …)
– Even better performance than WebGPU
– Currently being specified by the Web Machine Learning (WebML) Working Group at the W3C
– Implementation in progress for Chromium-based browsers
https://webmachinelearning.github.io/webnn-intro/
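Since WebNN is not available everywhere yet, an app could pick an execution backend by feature detection. This sketch assumes WebNN is exposed as `navigator.ml`, as in the draft specification; `pickBackend` is a hypothetical helper, and the mere presence of an API does not guarantee that an NPU is used.

```typescript
type Backend = "webnn" | "webgpu" | "wasm";

// Prefers WebNN, then WebGPU, and falls back to Wasm on the CPU.
// Only checks whether the APIs exist; it does not create a context.
function pickBackend(nav: object | undefined): Backend {
  if (nav && "ml" in nav) {
    return "webnn";
  }
  if (nav && "gpu" in nav) {
    return "webgpu";
  }
  return "wasm";
}

// In the browser: const backend = pickBackend(navigator);
```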

Slide 21

WebNN: near-native inference performance
[Chart. Source: Intel. Browser: Chrome Canary 118.0.5943.0; DUT: Dell/Linux/i7-1260P, single P-core; workloads: MediaPipe solution models (FP32, batch=1)]

Slide 22

WebLLM
On npm: @mlc-ai/web-llm

Slide 23

Large Language Models
Live Demo
Add a “copilot” to a todo application using the @mlc-ai/web-llm package.
For the sake of simplicity, all TODOs are added to the prompt. Remember: LLMs have a context window. If you need to chat with a larger set of text (including documents), please refer to Retrieval Augmented Generation (RAG).
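The demo's approach of putting all TODOs into the prompt can be sketched as follows. The message format and `engine.chat.completions.create()` follow the OpenAI-style API that recent `@mlc-ai/web-llm` versions expose through `CreateMLCEngine`; the version used in the talk may differ, and `buildTodoMessages`/`askCopilot` are hypothetical names. Passing the engine in keeps the prompt logic testable without a GPU.

```typescript
// OpenAI-style chat types mirroring the shape recent WebLLM versions use
// (assumption; check the API of the version you install).
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}
interface ChatEngine {
  chat: {
    completions: {
      create(request: { messages: ChatMessage[] }): Promise<{
        choices: { message: { content: string | null } }[];
      }>;
    };
  };
}

// Puts every TODO into the system prompt. Fine for small lists; larger data
// sets would exceed the context window and call for RAG instead.
function buildTodoMessages(todos: string[], question: string): ChatMessage[] {
  const list = todos.map((todo, i) => `${i + 1}. ${todo}`).join("\n");
  return [
    { role: "system", content: `You are a helpful assistant for a todo app. The user's todos are:\n${list}` },
    { role: "user", content: question },
  ];
}

async function askCopilot(engine: ChatEngine, todos: string[], question: string): Promise<string> {
  const reply = await engine.chat.completions.create({
    messages: buildTodoMessages(todos, question),
  });
  return reply.choices[0]?.message.content ?? "";
}

// In the browser (model ID is a placeholder; pick one from WebLLM's prebuilt list):
// import { CreateMLCEngine } from "@mlc-ai/web-llm";
// const engine = await CreateMLCEngine("<model-id>");
// console.log(await askCopilot(engine, ["Buy milk", "Book flights"], "What should I do first?"));
```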

Slide 24

Stable Diffusion
– Text-to-image model
– Generates 512×512 px images from a prompt
– Runs on “commodity” hardware (with 8 GB VRAM)
– Open-source

Slide 25

Web Stable Diffusion
– Specialized version of the Stable Diffusion model for the web
– 2 GB in size
– Subject to usage conditions: https://huggingface.co/runwayml/stable-diffusion-v1-5#uses
– No npm package this time

Slide 26

Web Stable Diffusion
https://websd.mlc.ai/

Slide 27

Web Stable Diffusion
Live Demo
Retrofitting AI image generation into an existing drawing application (https://paint.js.org)
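In a canvas-based drawing app, the generated pixels eventually have to end up in an `ImageData` object. The slides do not show Web Stable Diffusion's output path, so this sketch assumes a hypothetical planar float tensor (R, G, B planes) with values in [-1, 1], the range of the Stable Diffusion VAE decoder output, and converts it to interleaved RGBA bytes.

```typescript
// Converts a planar float image (3 planes of width*height values in [-1, 1])
// into the interleaved RGBA bytes that ImageData expects.
function toRgba(planar: Float32Array, width: number, height: number): Uint8ClampedArray {
  const pixels = width * height;
  if (planar.length !== 3 * pixels) {
    throw new Error(`Expected ${3 * pixels} values, got ${planar.length}`);
  }
  const rgba = new Uint8ClampedArray(4 * pixels);
  for (let p = 0; p < pixels; p++) {
    for (let c = 0; c < 3; c++) {
      // Map [-1, 1] to [0, 255]; Uint8ClampedArray clamps and rounds.
      rgba[4 * p + c] = (planar[c * pixels + p] + 1) * 127.5;
    }
    rgba[4 * p + 3] = 255; // fully opaque
  }
  return rgba;
}

// In the drawing app (hypothetical 2D context `ctx` and tensor `output`):
// ctx.putImageData(new ImageData(toRgba(output, 512, 512), 512, 512), 0, 0);
```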

Slide 28

Local AI Models
Advantages
– Data does not leave the browser
– High availability (offline support)
– Low latency
– Low cost

Slide 29

Local AI Models
Disadvantages
– High system requirements (RAM, GPU)
– High bandwidth requirements (large model size)
– WebGPU is only supported by Chromium-based browsers
– WebNN is not available yet
– Loading the model takes time
– Models cannot be shared across origins
– Potent models such as GPT are closed-source

Slide 30

Local AI Models
Mitigations
– Download the model in the background if the user is not on a metered connection
Helpful APIs:
– Network Information API to estimate the network quality and detect whether data saver is enabled (negative standards positions from Apple and Mozilla)
– Storage Manager API to estimate the available storage space
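Both APIs can be combined into a single check before a background download. A sketch under stated assumptions: `shouldPrefetchModel` is a hypothetical helper; browsers do not report "metered" directly, so the Save-Data preference and the effective connection type serve as proxies; and quota minus usage only approximates how much more the origin may store.

```typescript
// Structural types for the parts of the Network Information API and the
// Storage Manager API used below. `navigator.connection` is missing in some
// browsers, so everything is optional.
interface ConnectionLike {
  saveData?: boolean;
  effectiveType?: string; // "slow-2g" | "2g" | "3g" | "4g"
}
interface NavigatorLike {
  connection?: ConnectionLike;
  storage?: { estimate(): Promise<{ quota?: number; usage?: number }> };
}

async function shouldPrefetchModel(nav: NavigatorLike, modelBytes: number): Promise<boolean> {
  const connection = nav.connection;
  if (connection?.saveData) return false; // user asked to save data
  if (connection?.effectiveType && connection.effectiveType !== "4g") return false; // slow link
  if (!nav.storage) return false; // cannot check the available space
  const { quota = 0, usage = 0 } = await nav.storage.estimate();
  return quota - usage > modelBytes;
}

// In the browser: if (await shouldPrefetchModel(navigator as NavigatorLike, sizeInBytes)) { /* start download */ }
```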

Slide 31

Local AI Models
Alternatives: Ollama
– Local runner for AI models
– Offers a local server a website can connect to → allows sharing models across origins
– Supported on macOS and Linux (Windows coming soon)
https://webml-demo.vercel.app/
https://ollama.ai/
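A website talks to Ollama through its local HTTP API. A minimal sketch using the `/api/generate` endpoint on Ollama's default port 11434; `generateWithOllama` is a hypothetical name, and requests from a web page additionally require Ollama to allow the page's origin (configurable via the `OLLAMA_ORIGINS` environment variable).

```typescript
interface OllamaGenerateResponse {
  response: string;
}

async function generateWithOllama(
  model: string,
  prompt: string,
  fetchFn: typeof fetch = fetch,
  baseUrl = "http://localhost:11434",
): Promise<string> {
  const res = await fetchFn(`${baseUrl}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    // stream: false returns one JSON object instead of a stream of chunks.
    body: JSON.stringify({ model, prompt, stream: false }),
  });
  if (!res.ok) {
    throw new Error(`Ollama request failed: HTTP ${res.status}`);
  }
  const data = (await res.json()) as OllamaGenerateResponse;
  return data.response;
}

// Example (requires `ollama pull llama2` beforehand):
// console.log(await generateWithOllama("llama2", "Why is the sky blue?"));
```

Since every origin talks to the same local server, the model is downloaded once and shared, which is exactly the limitation of in-browser caches that Ollama works around.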

Slide 32

Alternatives
Hugging Face Transformers
– Pre-trained, specialized, significantly smaller models beyond GenAI
Examples:
– Text generation
– Image classification
– Translation
– Speech recognition
– Image-to-text

Slide 33

Alternatives
Transformers.js
– JavaScript library to run Hugging Face transformers in the browser
– Supports most of the models
https://xenova.github.io/transformers.js/

Slide 34

Summary
– Cloud-based models (especially OpenAI/GPT) remain the most potent models and are easier to integrate (for now)
– Due to their size and high system requirements, local generative AI models are currently mainly interesting for very specific scenarios (e.g., high privacy demands, offline availability)
– Small, specialized models are an interesting alternative (if available)
– Open-source generative AI models are advancing rapidly and becoming more compact and efficient
– Computers are getting more powerful

Slide 35

Thank you for your kind attention!
Christian Liebel
@christianliebel
christian.liebel@thinktecture.com