The AI Revolution in the Browser? Making Single-Page Apps Smarter

Christian Liebel @christianliebel Consultant

Hello, it’s me.
Christian Liebel
X: @christianliebel
Email: christian.liebel@thinktecture.com
Angular & PWA
Slides: thinktecture.com/christian-liebel

Generative AI everywhere

Single-Page Applications
Run locally on the user’s system
[Diagram: the web browser runs the SPA (client logic, views in HTML/CSS) and talks to the web server (server logic, Web APIs, push service, databases) over HTTPS and WebSockets; the web server delivers the HTML, JS, CSS, and assets.]

Progressive Web Apps
Make SPAs offline-capable
[Diagram: a service worker sits between the website (HTML/JS) and the internet and answers fetch requests from a cache.]

Generative AI
Overview
– Text: OpenAI GPT, LLaMa, Vicuna, …
– Images: Midjourney, DALL·E, Stable Diffusion, …
– Audio/Music: Musico, Soundraw, …
– Speech: OpenAI Whisper, tortoise-tts, …

Generative AI Cloud Providers
Examples

Generative AI Cloud Providers
Drawbacks
– Require an active internet connection
– Affected by network latency and server availability
– Data is transferred to the cloud service
– Require a subscription
→ Can we run models locally?

Large Language Models
Large: Trained on lots of data
Language: Process and generate text
Models: Programs/neural networks
Examples:
– GPT (ChatGPT, Bing Chat, …)
– Gemini, Gemma (Google)
– LLaMa (Meta AI)

Large Language Models
Token: A meaningful unit of text (e.g., a word, a part of a word, a character).
Context window: The maximum number of tokens the model can process.
Parameters/weights: Internal variables learned during training, used to make predictions.
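To make the context window tangible, here is a minimal TypeScript sketch that estimates whether a prompt fits. It assumes the common rule of thumb of roughly four characters per token for English text. Real tokenizers are model-specific, so `estimateTokens` and `fitsContextWindow` are hypothetical helpers, not part of any library.

```typescript
// Rough heuristic (assumption): about 4 characters per token for English text.
// Use the model's own tokenizer when exact counts matter.
const CHARS_PER_TOKEN = 4;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// True if the prompt plus the tokens reserved for the model's answer
// are estimated to fit into the context window.
function fitsContextWindow(
  prompt: string,
  contextWindow: number,
  reservedForAnswer: number,
): boolean {
  return estimateTokens(prompt) + reservedForAnswer <= contextWindow;
}

// Example: a model with an 8K context window (8,192 tokens assumed).
const samplePrompt = "These are the todos: * Wash clothes * Pet the dog";
console.log(estimateTokens(samplePrompt));
console.log(fitsContextWindow(samplePrompt, 8192, 512));
```

Reserving part of the window for the answer matters because input and output tokens share the same context window.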

Large Language Models
Prompts serve as the universal interface: unstructured text conveying specific semantics
Paradigm shift in software architecture: natural language becomes a first-class citizen
Caveats: non-determinism and hallucination, prompt injections

Large Language Models
Size Comparison
Model:Parameters   Size
mistral:7b         4.1 GB
vicuna:7b          3.8 GB
llama2:7b          3.8 GB
llama2:13b         7.4 GB
llama2:70b         39.0 GB
zephyr:7b          4.1 GB
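The sizes in the table roughly follow from parameter count × bits per weight. The sketch below compares 16-bit and 4-bit storage. It assumes that all weights use the same bit width, which the table does not state. It also ignores metadata and any layers kept at higher precision, so the 4-bit estimates come out slightly below the listed sizes.

```typescript
// Approximate size of a model's weights in gigabytes (10^9 bytes),
// assuming every parameter is stored with the same number of bits.
function estimateModelSizeGB(parameters: number, bitsPerWeight: number): number {
  const bytes = (parameters * bitsPerWeight) / 8;
  return bytes / 1e9;
}

const modelSizes: Array<[string, number]> = [
  ["7b", 7e9],
  ["13b", 13e9],
  ["70b", 70e9],
];

for (const [name, parameters] of modelSizes) {
  const fp16 = estimateModelSizeGB(parameters, 16).toFixed(1);
  const q4 = estimateModelSizeGB(parameters, 4).toFixed(1);
  console.log(`${name}: ~${fp16} GB at 16 bits, ~${q4} GB at 4 bits`);
}
```

The comparison shows why quantization matters for the browser. A 7B model at 16 bits would be around four times larger than the 4-bit variant users have to download.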

WebLLM
DEMO: https://webllm.mlc.ai/

WebLLM
On NPM

WebLLM
DEMO

Choosing a model
Benchmarks
Selection of available models for WebLLM:
– LLaMa-2 7B Chat
– LLaMa-2 13B Chat
– Mistral 7B Instruct
– Gemma 2B IT
https://medium.com/@kagglepro.llc/gemma-vs-llama-vs-mistral-a-comparative-analysis-with-a-coding-twist-8eb4d849e4d5

Cache API
Storing model files locally
[Diagram: the website (HTML/JS) downloads the model files from Hugging Face via the internet and stores them in a cache.]
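WebLLM handles caching internally. To illustrate the underlying cache-first pattern, here is a generic sketch, not WebLLM's code. It looks up a model file in a cache and only downloads it on a miss. `CacheLike` and `ResponseLike` are hypothetical minimal subsets of the browser's `Cache` and `Response` types, so the function can also run against an in-memory stand-in.

```typescript
// Hypothetical minimal subsets of the browser's Response and Cache types.
interface ResponseLike {
  readonly ok: boolean;
  clone(): ResponseLike;
}

interface CacheLike {
  match(url: string): Promise<ResponseLike | undefined>;
  put(url: string, response: ResponseLike): Promise<void>;
}

// Cache-first loading: serve a model file from the cache if present;
// otherwise download it once and keep a copy for later (and offline) use.
async function loadModelFile(
  url: string,
  cache: CacheLike,
  download: (url: string) => Promise<ResponseLike>,
): Promise<ResponseLike> {
  const cached = await cache.match(url);
  if (cached !== undefined) {
    return cached;
  }
  const response = await download(url);
  if (response.ok) {
    // A response body can be read only once, so the cache gets a clone.
    await cache.put(url, response.clone());
  }
  return response;
}
```

In the browser, this could be called as `loadModelFile(url, await caches.open("model-cache"), (u) => fetch(u))`. The cache name `model-cache` is a hypothetical choice.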

Cache API
Parameter cache

WebAssembly (Wasm)
– Bytecode for the web
– Compile target for arbitrary languages
– Can be faster than JavaScript
– WebLLM needs the model and a Wasm library to accelerate model computations
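WebLLM ships its own Wasm library, so apps do not write this themselves. As a small illustration of "bytecode for the web", this sketch feature-detects Wasm by validating the smallest possible module: the 8-byte header consisting of the `\0asm` magic number and binary format version 1.

```typescript
// Smallest valid WebAssembly module: magic number "\0asm" + version 1.
const EMPTY_WASM_MODULE = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);

// True if the runtime exposes WebAssembly and accepts a valid module.
function supportsWasm(): boolean {
  if (typeof WebAssembly !== "object") {
    return false;
  }
  try {
    return WebAssembly.validate(EMPTY_WASM_MODULE);
  } catch {
    return false;
  }
}

console.log(supportsWasm() ? "Wasm available" : "Wasm not available");
```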

WebGPU
– Grants low-level access to the Graphics Processing Unit (GPU)
– Near-native performance for machine learning applications
– Supported by Chromium-based browsers on Windows and macOS from version 113
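Because WebGPU is not available everywhere, apps should check for it before loading a model. The sketch below treats WebGPU as usable only if `navigator.gpu` exists and `requestAdapter()` returns an adapter; per the spec, the method resolves to `null` when no suitable GPU is found. `GpuLike` and `NavigatorWithGpu` are hypothetical minimal types; full typings are available in the `@webgpu/types` package.

```typescript
// Minimal structural types for the part of navigator.gpu used here.
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}

interface NavigatorWithGpu {
  gpu?: GpuLike;
}

// WebGPU counts as available only if navigator.gpu exists AND
// requestAdapter() resolves to an adapter instead of null.
async function hasWebGPU(
  nav: NavigatorWithGpu | undefined = (globalThis as unknown as { navigator?: NavigatorWithGpu }).navigator,
): Promise<boolean> {
  if (nav?.gpu === undefined) {
    return false;
  }
  const adapter = await nav.gpu.requestAdapter();
  return adapter !== null;
}

hasWebGPU().then((ok) => console.log(ok ? "WebGPU available" : "WebGPU not available"));
```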

Outlook: WebNN
– Grants web applications access to the system’s Neural Processing Unit (NPU) via platform-specific machine learning services (e.g., ML Compute on macOS/iOS, DirectML on Windows, …)
– Even better performance than WebGPU
– Currently being specified by the W3C Web Machine Learning (WebML) Working Group
– Implementation in progress for Chromium-based browsers
https://webmachinelearning.github.io/webnn-intro/
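Where WebNN ships, it is exposed to scripts as `navigator.ml`. The sketch below combines capability checks into a backend preference: WebNN before WebGPU, as the slides suggest, with Wasm as an assumed CPU fallback. This is a generic pattern for apps that can run a model on several backends. It is not how WebLLM selects its backend, and `Capabilities`, `hasWebNN`, and `pickBackend` are hypothetical names.

```typescript
type Backend = "webnn" | "webgpu" | "wasm";

// Capabilities detected at runtime (hypothetical shape for this sketch).
interface Capabilities {
  webnn: boolean; // navigator.ml is present
  webgpu: boolean; // navigator.gpu is present and returned an adapter
  wasm: boolean; // WebAssembly is available
}

// WebNN is exposed as navigator.ml in browsers that implement it.
function hasWebNN(): boolean {
  const nav = (globalThis as unknown as { navigator?: object }).navigator;
  return nav !== undefined && "ml" in nav;
}

// Prefer WebNN, then WebGPU, then Wasm; undefined if nothing is usable.
function pickBackend(caps: Capabilities): Backend | undefined {
  if (caps.webnn) return "webnn";
  if (caps.webgpu) return "webgpu";
  if (caps.wasm) return "wasm";
  return undefined;
}

const backend = pickBackend({
  webnn: hasWebNN(),
  webgpu: false, // placeholder: use the result of an async WebGPU adapter check
  wasm: typeof WebAssembly === "object",
});
console.log(`Selected backend: ${backend ?? "none"}`);
```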

WebNN: near-native inference performance
[Chart. Source: Intel. Browser: Chrome Canary 118.0.5943.0; DUT: Dell/Linux/i7-1260P, single P-core; workloads: MediaPipe solution models (FP32, batch=1)]

Chat with data
Concept and limitations
The todo data has to be converted into natural language.
For the sake of simplicity, we will add all todos to the prompt.
Remember: LLMs have a context window (Mistral-7B: 8K). If you need to chat with larger sets of text, refer to Retrieval Augmented Generation (RAG).
These are the todos:
* Wash clothes
* Pet the dog
* Take out the trash
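Converting the todo data into the natural-language text above can be sketched as follows. The `Todo` shape with `title` and `done` is a hypothetical data model, not one taken from the demo app.

```typescript
// Hypothetical shape of the app's todo items.
interface Todo {
  title: string;
  done: boolean;
}

// Turns structured todo data into plain natural-language text
// that can be placed into the prompt.
function todosToPrompt(todos: readonly Todo[]): string {
  const lines = todos.map((todo) => `* ${todo.title}${todo.done ? " (done)" : ""}`);
  return ["These are the todos:", ...lines].join("\n");
}

const todos: Todo[] = [
  { title: "Wash clothes", done: false },
  { title: "Pet the dog", done: false },
  { title: "Take out the trash", done: false },
];

// Prints:
// These are the todos:
// * Wash clothes
// * Pet the dog
// * Take out the trash
console.log(todosToPrompt(todos));
```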

Chat with data
System prompt
Metaprompt that defines…
– character
– capabilities/limitations
– output format
– behavior
– grounding data
Hallucinations and prompt injections cannot be eliminated.
You are a helpful assistant. Answer user questions on todos. Generate a valid JSON object. Avoid negative content. These are the user’s todos: …
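The parts of the system prompt listed above can be assembled programmatically. In this sketch, `buildSystemPrompt` is a hypothetical helper. The instruction wording comes from the slide's example, and the mapping of each sentence to character, capabilities, output format, and behavior is an interpretation.

```typescript
// Hypothetical helper that assembles a system prompt from the parts listed
// above; the instruction wording is taken from the slide's example.
function buildSystemPrompt(todoTitles: readonly string[]): string {
  const instructions = [
    "You are a helpful assistant.", // character
    "Answer user questions on todos.", // capabilities/limitations
    "Generate a valid JSON object.", // output format
    "Avoid negative content.", // behavior
  ];
  // Grounding data: the facts the model should base its answers on.
  const grounding = ["These are the user's todos:", ...todoTitles.map((title) => `* ${title}`)];
  return [...instructions, ...grounding].join("\n");
}

console.log(buildSystemPrompt(["Wash clothes", "Pet the dog", "Take out the trash"]));
```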

Chat with data
Flow
System message
• The user has these todos: 1. … 2. … 3. …
User message
• How many todos do I have?
Assistant message
• You have three todos.
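This flow maps onto the role-based message list used by OpenAI-style chat APIs, which recent WebLLM versions also follow. The sketch only builds the message array and does not send it to a model. The `ChatMessage` type is defined locally rather than imported from a library.

```typescript
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

// Builds the conversation from the flow above: the system message carries
// the grounding data, followed by the user's question.
function buildConversation(todoTitles: readonly string[], question: string): ChatMessage[] {
  const numbered = todoTitles.map((title, i) => `${i + 1}. ${title}`).join("\n");
  return [
    { role: "system", content: `The user has these todos:\n${numbered}` },
    { role: "user", content: question },
  ];
}

const messages = buildConversation(
  ["Wash clothes", "Pet the dog", "Take out the trash"],
  "How many todos do I have?",
);

// The model's reply is appended as an assistant message, so follow-up
// questions are sent together with the earlier turns.
messages.push({ role: "assistant", content: "You have three todos." });
console.log(messages);
```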

Prompt Engineering
Techniques
– Providing examples (single shot, few shot, …)
– Priming outputs
– Specifying output structure
– Repeating instructions
– Chain of thought
– …
Success also depends on the model.
https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/advanced-prompt-engineering
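The first technique, providing examples, is often implemented by placing example exchanges before the real question. Each example becomes a user/assistant message pair. In the sketch below, the example questions and JSON answers are hypothetical data, and `buildFewShotMessages` is an illustrative helper.

```typescript
interface PromptMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// One demonstration of the desired behavior (hypothetical example data).
interface Shot {
  input: string;
  output: string;
}

// Few-shot prompting: every example becomes a user/assistant pair that
// precedes the real question, so the model can imitate the output structure.
function buildFewShotMessages(
  systemPrompt: string,
  shots: readonly Shot[],
  question: string,
): PromptMessage[] {
  const examples = shots.flatMap((shot): PromptMessage[] => [
    { role: "user", content: shot.input },
    { role: "assistant", content: shot.output },
  ]);
  return [
    { role: "system", content: systemPrompt },
    ...examples,
    { role: "user", content: question },
  ];
}

const fewShot = buildFewShotMessages(
  "Answer questions on todos. Reply with a JSON object only.",
  [
    { input: "How many todos do I have?", output: '{"count": 3}' },
    { input: "Which todo is about the dog?", output: '{"todo": "Pet the dog"}' },
  ],
  "How many todos mention laundry?",
);
console.log(fewShot);
```

The examples also specify the output structure implicitly. This often works better than describing the JSON format in words alone, though results vary by model.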

Prompt Engineering
Alternatives (in order of increasing effort)
– Prompt Engineering
– Retrieval Augmented Generation
– Fine-tuning
– Custom model

Performance
Comparison (tokens/sec)
WebLLM (Mistral-7b, M1)      22.98
WebLLM (Mistral-7b, M3)      33.96
OpenAI (GPT-4)               19.08
Azure OpenAI (GPT-4)         38.75
Groq (Mixtral-8x7b)         564.63
WebLLM/Groq: Own tests (23.03.2024), OpenAI/Azure OpenAI: https://mcplusa.com/comparing-performance-of-openai-gpt-4-and-microsoft-azure-gpt-4/ (31.08.2023)
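Throughput is easier to judge when converted into waiting time. This sketch assumes that the measured values correspond to the setups in the order listed. It uses a 500-token answer as an arbitrary example length and ignores network latency and time to first token.

```typescript
// Measured throughput from the comparison (tokens per second).
const throughput: Record<string, number> = {
  "WebLLM (Mistral-7b, M1)": 22.98,
  "WebLLM (Mistral-7b, M3)": 33.96,
  "OpenAI (GPT-4)": 19.08,
  "Azure OpenAI (GPT-4)": 38.75,
  "Groq (Mixtral-8x7b)": 564.63,
};

// Seconds needed to stream a response of the given length at a constant rate.
function secondsToGenerate(tokens: number, tokensPerSecond: number): number {
  return tokens / tokensPerSecond;
}

// Example: a 500-token answer (assumed length).
for (const [setup, tokensPerSecond] of Object.entries(throughput)) {
  console.log(`${setup}: ${secondsToGenerate(500, tokensPerSecond).toFixed(1)} s`);
}
```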

Stable Diffusion
– Text-to-image model
– Generates 512x512px images from a prompt
– Runs on “commodity” hardware (with 8 GB VRAM)
– Open-source
Prompt: A guinea pig eating a watermelon

Web Stable Diffusion
– Specialized version of the Stable Diffusion model for the web
– 2 GB in size
– Subject to usage conditions: https://huggingface.co/runwayml/stable-diffusion-v1-5#uses
– No npm package this time
– Currently incompatible with Angular & esbuild due to Wasm imports

Web Stable Diffusion
DEMO: https://websd.mlc.ai/

Web Stable Diffusion
Live Demo
Retrofitting AI image generation into an existing drawing application (https://paint.js.org)

Local AI Models
Advantages
– Data does not leave the browser
– High availability (offline support)
– Low latency
– Stability (independent of external API changes)
– Low cost

Local AI Models
Disadvantages
– Lower quality than closed-source models
– High system requirements (RAM, GPU)
– Large model size, high initial bandwidth requirements, models cannot be shared across origins
– Model initialization and inference are relatively slow
– WebGPU is currently only supported by Chromium-based browsers on macOS and Windows; WebNN is not available yet

Alternatives
Transformers.js
– JavaScript library to run Hugging Face transformers in the browser
– Supports most of the models
https://xenova.github.io/transformers.js/

Summary
– Cloud-based models (especially OpenAI/GPT) remain the most potent models and are easier to integrate (for now)
– Due to their size and high system requirements, local generative AI models are currently mainly of interest for very specific scenarios (e.g., high privacy demands, offline availability)
– Small, specialized models are an interesting alternative (if available)
– Open-source generative AI models are advancing rapidly and becoming more compact and efficient
– Computers are getting more powerful

Thank you for your kind attention!
Christian Liebel
@christianliebel
christian.liebel@thinktecture.com
