Webinar: KI im Browser: Smartere Web-Apps mit WebGPU und WebNN

KI im Browser: Smartere Web-Apps mit WebGPU und WebNN
(AI in the Browser: Smarter Web Apps with WebGPU and WebNN)
Christian Liebel, @christianliebel, Consultant

Hello, it’s me.
Christian Liebel
X: @christianliebel
Email: christian.liebel@thinktecture.com
Angular & PWA
Slides: thinktecture.com/christian-liebel

Expectations
What to expect:
– Focus on web app development
– Focus on Generative AI
– Up-to-date insights: the ML/AI field is evolving fast
– Live demos on real hardware
What not to expect:
– Deep dive into AI specifics
– Stable libraries or specifications

Generative AI everywhere

Single-Page Applications
Run locally on the user’s system
[Diagram: web browser running the SPA (client logic, HTML/CSS views); web server delivering HTML, JS, CSS, and assets; web APIs with server logic and databases; push service; connections over HTTPS and WebSockets]

Progressive Web Apps
Make SPAs offline-capable
[Diagram: website (HTML/JS), service worker, cache, internet; requests go through fetch]

Generative AI: Overview
– Speech: OpenAI Whisper, tortoise-tts, …
– Images: Midjourney, DALL·E, Stable Diffusion, …
– Audio/Music: Musico, Soundraw, …
– Text: OpenAI GPT, LLaMa, Vicuna, …

Generative AI Cloud Providers
Examples

Generative AI Cloud Providers
Drawbacks:
– Require an active internet connection
– Affected by network latency and server availability
– Data is transferred to the cloud service
– Require a subscription
→ Can we run models locally?

Large Language Models
– Large: trained on lots of data
– Language: process and generate text
– Models: programs/neural networks
Examples:
– GPT (ChatGPT, Bing Chat, …)
– LaMDA (Google Bard)
– LLaMa (Meta AI)

Large Language Models
– Token: a meaningful unit of text (e.g., a word, a part of a word, a character).
– Context window: the maximum number of tokens the model can process.
– Parameters: internal variables learned during training, used to make predictions.
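
To make the context window concrete, here is a minimal TypeScript sketch that checks whether a prompt is likely to fit. The ratio of four characters per token is only a rough rule of thumb assumed for illustration; real tokenizers differ by model and language. The function names are hypothetical.

```typescript
// Rough heuristic (assumption): about 4 characters per token for English text.
const CHARS_PER_TOKEN = 4;

// Estimates the token count of a text; a real tokenizer would give exact numbers.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Returns true if the prompt plus the tokens reserved for the answer
// stay within the model's context window (all values measured in tokens).
function fitsContextWindow(
  prompt: string,
  contextWindow: number,
  reservedForAnswer: number = 0,
): boolean {
  return estimateTokens(prompt) + reservedForAnswer <= contextWindow;
}
```

A check like this could run before sending a prompt, so that an oversized input is shortened instead of being silently truncated by the model.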

Large Language Models
Prompts serve as the universal interface
– Unstructured text conveying specific semantics
Paradigm shift in software architecture
– Human language becomes a first-class citizen
Caveats
– Non-determinism and hallucination, prompt injections

Large Language Models
https://webllm.mlc.ai/

Large Language Models
Size Comparison
Model:Parameters   Size
mistral:7b         4.1 GB
vicuna:7b          3.8 GB
llama2:7b          3.8 GB
llama2:13b         7.4 GB
llama2:70b         39.0 GB
zephyr:7b          4.1 GB
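
The sizes listed above are roughly what one would expect if each weight is stored in about 4 bits. The slides do not state the quantization, so the bit widths below are assumptions, and real model files carry additional overhead. A back-of-the-envelope sketch:

```typescript
// Estimates the size of a model file in GB (10^9 bytes) from its parameter count.
// bitsPerWeight is an assumption; the slides do not state the quantization used.
function estimateModelSizeGB(parameters: number, bitsPerWeight: number): number {
  const bytes = (parameters * bitsPerWeight) / 8;
  return bytes / 1e9;
}

// 7 billion parameters at 4 bits each: 3.5 GB, close to the 3.8 GB listed for llama2:7b.
const sevenBillionAt4Bit = estimateModelSizeGB(7e9, 4);

// The same parameter count at 16 bits per weight would need 14 GB.
const sevenBillionAt16Bit = estimateModelSizeGB(7e9, 16);
```

This is why compact quantized variants matter for in-browser use: the parameter count alone does not determine the download size.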

Cache API
Storing model files locally
[Diagram: website (HTML/JS), cache with model files, internet, Hugging Face]
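
The idea of keeping model files in the Cache API can be sketched as a cache-first lookup. This is not WebLLM's actual implementation; the function name and the minimal interfaces are hypothetical and exist only so that the sketch does not depend on DOM typings. The browser's `Cache` and `Response` objects match these shapes structurally.

```typescript
// Minimal shapes of the objects used below (assumption: mirrors the parts of
// the browser's Response and Cache that this sketch needs).
interface ResponseLike<R> {
  ok: boolean;
  clone(): R;
}
interface CacheLike<R> {
  match(url: string): Promise<R | undefined>;
  put(url: string, response: R): Promise<void>;
}

// Returns the cached response for a model file; downloads and caches it on a miss.
async function getModelFile<R extends ResponseLike<R>>(
  url: string,
  cache: CacheLike<R>,
  fetchFn: (url: string) => Promise<R>,
): Promise<R> {
  const cached = await cache.match(url);
  if (cached !== undefined) {
    return cached; // cache hit: no network access needed
  }
  const response = await fetchFn(url);
  if (!response.ok) {
    throw new Error(`Download failed: ${url}`);
  }
  // A response body can only be read once, so a clone goes into the cache.
  await cache.put(url, response.clone());
  return response;
}
```

In a browser, this could be called as `getModelFile(url, await caches.open("model-files"), (u) => fetch(u))`, where the cache name and the URL are placeholders.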

Cache API
Parameter cache

WebAssembly (Wasm)
– Bytecode for the web
– Compile target for arbitrary languages
– Can be faster than JavaScript
– WebLLM needs the model and a Wasm library to accelerate model computations

WebGPU
– Grants low-level access to the Graphics Processing Unit (GPU)
– Near-native performance for machine learning applications
– Supported by Chromium-based browsers on Windows and macOS from version 113
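
Because WebGPU is not available everywhere, an app would typically check for it before loading a model. A minimal sketch, assuming the standard `navigator.gpu.requestAdapter()` entry point; the `NavigatorLike` interface and the function name are hypothetical and only describe the part of `navigator` that is used:

```typescript
// Minimal shape of the parts of `navigator` used below, so the sketch
// does not depend on WebGPU type definitions.
interface NavigatorLike {
  gpu?: { requestAdapter(): Promise<unknown> };
}

// Resolves to true if WebGPU is exposed and an adapter can be obtained.
async function isWebGPUAvailable(nav: NavigatorLike): Promise<boolean> {
  if (!nav.gpu) {
    return false; // API not exposed, e.g. in an unsupported browser
  }
  try {
    const adapter = await nav.gpu.requestAdapter();
    // requestAdapter() resolves to null when no suitable GPU is found.
    return adapter !== null && adapter !== undefined;
  } catch {
    return false;
  }
}
```

In a browser, the function could be called with the global `navigator` object; depending on the installed type definitions, a cast may be needed.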

Outlook: WebNN
– Grants web applications access to the Neural Processing Unit (NPU) of the system via platform-specific machine learning services (e.g., ML Compute on macOS/iOS, DirectML on Windows, …)
– Even better performance when compared to WebGPU
– Currently in specification by the WebML Working Group at W3C
– Implementation in progress for Chromium-based browsers
https://webmachinelearning.github.io/webnn-intro/

WebNN: near-native inference performance
[Benchmark chart] Source: Intel. Browser: Chrome Canary 118.0.5943.0, DUT: Dell/Linux/i7-1260P, single p-core, Workloads: MediaPipe solution models (FP32, batch=1)

WebLLM
On NPM

Large Language Models
Live Demo
Add a “copilot” to a todo application using the @mlc-ai/web-llm package.
For the sake of simplicity, all TODOs are added to the prompt. Remember: LLMs have a context window. If you need to chat with a larger set of text (including documents), please refer to Retrieval Augmented Generation (RAG).
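
The demo code itself is not part of the slides. As a rough illustration of "all TODOs are added to the prompt", the following sketch builds chat messages from a todo list. The `Todo` shape, the wording of the system prompt, and the role/content message format are assumptions; the result would then be handed to the chat API of the library in use.

```typescript
// Hypothetical shape of a todo item in the app.
interface Todo {
  title: string;
  done: boolean;
}

// Chat message in the common role/content format (assumption).
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Builds chat messages that place every todo into the prompt.
function buildCopilotMessages(todos: Todo[], question: string): ChatMessage[] {
  const list = todos
    .map((todo) => `- [${todo.done ? "x" : " "}] ${todo.title}`)
    .join("\n");
  return [
    {
      role: "system",
      content: `You are a helpful assistant for a todo app. The user's todos:\n${list}`,
    },
    { role: "user", content: question },
  ];
}
```

Since the whole list travels with every request, the prompt grows with the number of todos, which is exactly where the context window becomes the limit.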

Stable Diffusion
– Text-to-image model
– Generates 512x512px images from a prompt
– Runs on “commodity” hardware (with 8 GB VRAM)
– Open-source

Web Stable Diffusion
– Specialized version of the Stable Diffusion model for the web
– 2 GB in size
– Subject to usage conditions: https://huggingface.co/runwayml/stable-diffusion-v1-5#uses
– No npm package this time

Web Stable Diffusion
https://websd.mlc.ai/

Web Stable Diffusion
Live Demo
Retrofitting AI image generation into an existing drawing application (https://paint.js.org)

Local AI Models
Advantages:
– Data does not leave the browser
– High availability (offline support)
– Low latency
– Low cost

Local AI Models
Disadvantages:
– High system requirements (RAM, GPU)
– High bandwidth requirements (large model size)
– WebGPU is only supported by Chromium-based browsers
– WebNN is not available yet
– Loading the model takes time
– Models cannot be shared across origins
– Potent models such as GPT are closed-source

Local AI Models
Mitigations
Download the model in the background if the user is not on a metered connection
Helpful APIs:
– Network Information API to estimate the network quality/determine data saver (negative standards position by Apple and Mozilla)
– Storage Manager API to estimate the available free disk space
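
The two APIs can be combined into a simple download policy. In the sketch below, the inputs correspond to `navigator.connection` (Network Information API, where available) and the result of `navigator.storage.estimate()` (Storage Manager API). The interfaces only list the fields used here, and the policy itself (skip on data saver or anything slower than "4g", require enough free quota) is an assumption for illustration.

```typescript
// Subset of the Network Information API; undefined in browsers without support.
interface ConnectionInfo {
  saveData?: boolean;
  effectiveType?: string; // e.g. "slow-2g", "2g", "3g", "4g"
}

// Subset of the result of navigator.storage.estimate(), in bytes.
interface StorageEstimateLike {
  quota?: number;
  usage?: number;
}

// Decides whether a model of the given size should be downloaded in the background.
function shouldDownloadModel(
  modelBytes: number,
  connection: ConnectionInfo | undefined,
  storage: StorageEstimateLike,
): boolean {
  // Respect the user's data saver preference.
  if (connection?.saveData) return false;
  // Skip connections that report less than "4g" quality (assumed policy).
  if (connection?.effectiveType !== undefined && connection.effectiveType !== "4g") {
    return false;
  }
  // Require enough free quota for the model files.
  const free = (storage.quota ?? 0) - (storage.usage ?? 0);
  return free >= modelBytes;
}
```

When the Network Information API is missing, this sketch falls back to the storage check alone; a stricter app might ask the user for consent instead.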

Local AI Models
Alternatives: Ollama
– Local runner for AI models
– Offers a local server a website can connect to → allows sharing models across origins
– Supported on macOS and Linux (Windows coming soon)
https://webml-demo.vercel.app/
https://ollama.ai/
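
Connecting a website to the local Ollama server could look roughly like the sketch below. It assumes Ollama's default port 11434 and its `/api/generate` endpoint with a `response` field in the JSON reply; these details should be checked against the current Ollama documentation. The function names and the `FetchLike` type are hypothetical; the browser's `fetch` matches that type structurally.

```typescript
// Minimal shape of a fetch function, so the sketch does not depend on DOM typings.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Builds the request for the local Ollama server (default port assumed).
function buildOllamaRequest(model: string, prompt: string) {
  return {
    url: "http://localhost:11434/api/generate",
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      // stream: false asks for one JSON object instead of a stream of chunks.
      body: JSON.stringify({ model, prompt, stream: false }),
    },
  };
}

// Sends the prompt to the local server and returns the generated text.
async function generateWithOllama(
  model: string,
  prompt: string,
  fetchFn: FetchLike,
): Promise<string> {
  const { url, init } = buildOllamaRequest(model, prompt);
  const response = await fetchFn(url, init);
  if (!response.ok) {
    throw new Error(`Ollama request failed with status ${response.status}`);
  }
  const data = (await response.json()) as { response?: unknown } | null;
  if (data === null || typeof data.response !== "string") {
    throw new Error("Unexpected response format");
  }
  return data.response;
}
```

In a browser, `fetchFn` would simply be `fetch`; the model name (for example `llama2` from the size comparison) has to be available in the local Ollama installation.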

Alternatives
Hugging Face Transformers
Pre-trained, specialized, significantly smaller models beyond GenAI
Examples:
– Text generation
– Image classification
– Translation
– Speech recognition
– Image-to-text

Alternatives
Transformers.js
– JavaScript library to run Hugging Face transformers in the browser
– Supports most of the models
https://xenova.github.io/transformers.js/

Summary
– Cloud-based models (especially OpenAI/GPT) remain the most potent models and are easier to integrate (for now)
– Due to their size and high system requirements, local generative AI models are currently of interest mainly for special scenarios (e.g., high privacy demands, offline availability)
– Small, specialized models are an interesting alternative (if available)
– Open-source generative AI models are advancing rapidly and are becoming more compact and efficient
– Computers are getting more powerful

Thank you for your kind attention!
Christian Liebel
@christianliebel
christian.liebel@thinktecture.com
