Slide 1

Slide 1 text

The AI Revolution in the Browser? Making Single-Page Apps Smarter Christian Liebel @christianliebel Consultant

Slide 2

Slide 2 text

Hello, it’s me.
Christian Liebel
X: @christianliebel
Email: christian.liebel@thinktecture.com
Angular & PWA
Slides: thinktecture.com/christian-liebel

Slide 3

Slide 3 text

Generative AI everywhere

Slide 4

Slide 4 text

Single-Page Applications
Run locally on the user’s system
[Diagram: web browser running the SPA (client logic, several HTML/CSS views); web server delivering HTML, JS, CSS, and assets; web APIs with server logic and databases; push service. Connections: HTTPS, WebSockets.]

Slide 5

Slide 5 text

Progressive Web Apps
Make SPAs offline-capable
[Diagram: Website (HTML/JS), Service Worker, Cache, Internet; fetch]

Slide 6

Slide 6 text

Generative AI
Overview
Text: OpenAI GPT, LLaMa, Vicuna, …
Images: Midjourney, DALL·E, Stable Diffusion, …
Audio/Music: Musico, Soundraw, …
Speech: OpenAI Whisper, tortoise-tts, …

Slide 8

Slide 8 text

Generative AI Cloud Providers
Examples

Slide 9

Slide 9 text

Generative AI Cloud Providers
Drawbacks
– Require an active internet connection
– Affected by network latency and server availability
– Data is transferred to the cloud service
– Require a subscription
→ Can we run models locally?

Slide 10

Slide 10 text

Large Language Models
Large: Trained on lots of data
Language: Process and generate text
Models: Programs/neural networks
Examples:
– GPT (ChatGPT, Bing Chat, …)
– Gemini, Gemma (Google)
– LLaMa (Meta AI)

Slide 11

Slide 11 text

Large Language Models
Token: A meaningful unit of text (e.g., a word, a part of a word, a character).
Context Window: The maximum number of tokens the model can process.
Parameters/weights: Internal variables learned during training, used to make predictions.

Slide 12

Slide 12 text

Large Language Models
Prompts serve as the universal interface
  Unstructured text conveying specific semantics
Paradigm shift in software architecture
  Natural language becomes a first-class citizen
Caveats
  Non-determinism and hallucination, prompt injections

Slide 13

Slide 13 text

Large Language Models
Size Comparison
Model:Parameters   Size
mistral:7b         4.1 GB
vicuna:7b          3.8 GB
llama2:7b          3.8 GB
llama2:13b         7.4 GB
llama2:70b         39.0 GB
zephyr:7b          4.1 GB
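As a rough back-of-the-envelope check of the sizes above, file size is approximately the parameter count times the bits stored per weight, divided by 8. The sketch below is not from the talk: the function name is made up, and the default of about 4.5 bits per weight is an assumption (a plausible figure for 4-bit quantized files including quantization metadata; the slide does not say which quantization was used).

```typescript
// Rough estimate of a quantized model's file size in gigabytes (10^9 bytes).
// ASSUMPTION: ~4.5 bits per weight approximates 4-bit quantization plus
// per-block scaling data; the slide does not state the quantization used.
function estimateModelSizeGB(parameters: number, bitsPerWeight = 4.5): number {
  const bytes = (parameters * bitsPerWeight) / 8;
  return bytes / 1e9;
}

// Usage: estimateModelSizeGB(7e9) gives about 3.9, and
// estimateModelSizeGB(70e9) gives about 39.4.
```

Under that assumption the estimates land close to the listed 7b and 70b sizes, which suggests the table shows quantized rather than full-precision weights.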

Slide 14

Slide 14 text

WebLLM
Demo: https://webllm.mlc.ai/

Slide 15

Slide 15 text

WebLLM
On npm

Slide 16

Slide 16 text

WebLLM
Demo

Slide 17

Slide 17 text

Choosing a model
Benchmarks
Selection of available models for WebLLM:
– LLaMa-2 7B Chat
– LLaMa-2 13B Chat
– Mistral 7B Instruct
– Gemma 2B IT
https://medium.com/@kagglepro.llc/gemma-vs-llama-vs-mistral-a-comparative-analysis-with-a-coding-twist-8eb4d849e4d5

Slide 18

Slide 18 text

Cache API
Storing model files locally
[Diagram: Website (HTML/JS), Cache with model files, Internet, Hugging Face]
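The idea of keeping model files in the Cache API can be sketched as a cache-first lookup. This is an illustration only, not WebLLM's actual code: the function name and the two small interfaces are made up so that the sketch is self-contained. In a browser, the object returned by `await caches.open(name)` and the global `fetch` function fit these shapes.

```typescript
// Minimal local shapes (hypothetical names) of the pieces involved.
interface CloneableResponse {
  ok: boolean;
  clone(): CloneableResponse;
}

interface CacheLike<R> {
  match(url: string): Promise<R | undefined>;
  put(url: string, response: R): Promise<void>;
}

// Cache-first lookup: serve the model file from the cache when present,
// otherwise download it once and store a copy for later visits.
async function loadModelFile<R extends CloneableResponse>(
  url: string,
  cache: CacheLike<R>,
  download: (url: string) => Promise<R>,
): Promise<R> {
  const cached = await cache.match(url);
  if (cached !== undefined) {
    return cached;
  }
  const response = await download(url);
  if (!response.ok) {
    throw new Error(`Download failed: ${url}`);
  }
  // Store a clone, because a response body can only be consumed once.
  await cache.put(url, response.clone() as R);
  return response;
}
```

Passing the cache and the download function in as parameters keeps the logic testable outside the browser.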

Slide 19

Slide 19 text

Cache API
Parameter cache

Slide 20

Slide 20 text

WebAssembly (Wasm)
– Bytecode for the web
– Compile target for arbitrary languages
– Can be faster than JavaScript
– WebLLM needs the model and a Wasm library to accelerate model computations

Slide 21

Slide 21 text

WebGPU
– Grants low-level access to the Graphics Processing Unit (GPU)
– Near-native performance for machine learning applications
– Supported by Chromium-based browsers on Windows and macOS from version 113
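Because WebGPU is not available everywhere, an app would typically check for it before loading a model. A minimal sketch of such a check follows; the function and interface names are made up, and the interface is a local stand-in for the relevant part of `navigator`. `navigator.gpu.requestAdapter()` is the standard WebGPU entry point and resolves to `null` when no suitable adapter is found.

```typescript
// Minimal local shape of the part of `navigator` used here, so the sketch
// does not depend on WebGPU type definitions being installed.
interface GpuCapableNavigator {
  gpu?: { requestAdapter(): Promise<object | null> };
}

// Resolves to true only if WebGPU is exposed and an adapter is handed out.
async function supportsWebGpu(nav: GpuCapableNavigator): Promise<boolean> {
  if (!nav.gpu) {
    return false; // API not exposed, e.g. in an unsupported browser
  }
  const adapter = await nav.gpu.requestAdapter();
  return adapter !== null; // null means no suitable GPU was found
}
```

In a browser, the global `navigator` object would be passed in; when the check fails, the app can fall back to a cloud model or hide the feature.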

Slide 22

Slide 22 text

Outlook: WebNN
– Grants web applications access to the Neural Processing Unit (NPU) of the system via platform-specific machine learning services (e.g., ML Compute on macOS/iOS, DirectML on Windows, …)
– Even better performance compared to WebGPU
– Currently being specified by the WebML Working Group at the W3C
– Implementation in progress for Chromium-based browsers
https://webmachinelearning.github.io/webnn-intro/

Slide 23

Slide 23 text

WebNN: near-native inference performance
Source: Intel. Browser: Chrome Canary 118.0.5943.0, DUT: Dell/Linux/i7-1260P, single p-core, Workloads: MediaPipe solution models (FP32, batch=1)

Slide 24

Slide 24 text

Chat with data
Concept and limitations
The todo data has to be converted into natural language.
For the sake of simplicity, we will add all todos to the prompt.
Remember: LLMs have a context window (Mistral-7B: 8K).
If you need to chat with larger sets of text, refer to Retrieval Augmented Generation (RAG).
Example prompt:
These are the todos:
* Wash clothes
* Pet the dog
* Take out the trash
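The conversion described above could look roughly like the following sketch. The `Todo` shape and all function names are hypothetical, and the token count is only a crude rule of thumb (about 4 characters per token for English text, an assumption); a real tokenizer would be needed for an exact count. The 8K context window is taken to mean 8192 tokens.

```typescript
// Hypothetical shape of a todo item in the app.
interface Todo {
  title: string;
  done: boolean;
}

// Turn the structured todo data into plain text the model can read.
function todosToPrompt(todos: Todo[]): string {
  const lines = todos.map((todo) => `* ${todo.title}${todo.done ? " (done)" : ""}`);
  return ["These are the todos:", ...lines].join("\n");
}

// ASSUMPTION: roughly 4 characters per token; only a real tokenizer is exact.
function roughTokenCount(text: string): number {
  return Math.ceil(text.length / 4);
}

// Check the prompt against the context window (8K assumed to be 8192 tokens).
function fitsContextWindow(text: string, contextWindow = 8192): boolean {
  return roughTokenCount(text) <= contextWindow;
}
```

If the check fails, the todo list is too large to put into the prompt as a whole, which is the situation where RAG comes in.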

Slide 25

Slide 25 text

Chat with data
System prompt
Metaprompt that defines…
– character
– capabilities/limitations
– output format
– behavior
– grounding data
Hallucinations and prompt injections cannot be eliminated.
Example:
You are a helpful assistant.
Answer user questions on todos.
Generate a valid JSON object.
Avoid negative content.
These are the user’s todos: …
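One way to keep the parts of a system prompt separate in code is to assemble it from named pieces. The sketch below is an illustration only; the interface and function names are hypothetical and simply mirror the list above.

```typescript
// Hypothetical structure mirroring the parts of a system prompt listed above.
interface SystemPromptParts {
  character: string;
  capabilities: string;
  outputFormat: string;
  behavior: string;
  groundingData: string[];
}

// Join the parts into one system prompt, one instruction per line,
// followed by the grounding data as a bullet list.
function buildSystemPrompt(parts: SystemPromptParts): string {
  return [
    parts.character,
    parts.capabilities,
    parts.outputFormat,
    parts.behavior,
    "These are the user's todos:",
    ...parts.groundingData.map((item) => `* ${item}`),
  ].join("\n");
}
```

Keeping the grounding data as a separate field makes it easy to rebuild the prompt whenever the todo list changes.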

Slide 26

Slide 26 text

Chat with data
Flow
System message
• The user has these todos: 1. … 2. … 3. …
User message
• How many todos do I have?
Assistant message
• You have three todos.
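The flow above can be sketched as a list of role-tagged messages. The `complete` parameter is a hypothetical stand-in for whatever actually runs the model (for example, a wrapper around a WebLLM engine); it is not a real library API, and the other names are made up as well.

```typescript
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

// Hypothetical stand-in for the model call: takes the history, returns a reply.
type Complete = (messages: ChatMessage[]) => Promise<string>;

// Build the system and user messages, ask the model, and return the
// full history including the assistant's reply.
async function askAboutTodos(
  todos: string[],
  question: string,
  complete: Complete,
): Promise<ChatMessage[]> {
  const numbered = todos.map((todo, i) => `${i + 1}. ${todo}`).join("\n");
  const messages: ChatMessage[] = [
    { role: "system", content: `The user has these todos:\n${numbered}` },
    { role: "user", content: question },
  ];
  const answer = await complete(messages);
  // Appending the reply keeps the history available for follow-up questions.
  return [...messages, { role: "assistant", content: answer }];
}
```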

Slide 27

Slide 27 text

Prompt Engineering
Techniques
– Providing examples (single shot, few shot, …)
– Priming outputs
– Specifying output structure
– Repeating instructions
– Chain of thought
– …
Success also depends on the model.
https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/advanced-prompt-engineering
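Two of the techniques above, providing examples and priming the output, can be combined in a small prompt builder. This is a sketch under assumptions: the names and the "Input:/Output:" layout are made up for illustration, and whether this layout works well depends on the model.

```typescript
// One worked example ("shot") shown to the model.
interface Shot {
  input: string;
  output: string;
}

// Few-shot prompt: instruction, worked examples, then the new input.
// Ending on "Output:" primes the model to continue with just the answer.
function buildFewShotPrompt(instruction: string, shots: Shot[], input: string): string {
  const examples = shots.map((shot) => `Input: ${shot.input}\nOutput: ${shot.output}`);
  return [instruction, ...examples, `Input: ${input}\nOutput:`].join("\n\n");
}
```

With an empty `shots` array the same function produces a zero-shot prompt that is still primed with "Output:".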

Slide 28

Slide 28 text

Prompt Engineering
Alternatives (arranged along an “Effort” axis on the slide)
– Prompt Engineering
– Retrieval Augmented Generation
– Fine-tuning
– Custom model

Slide 29

Slide 29 text

Performance
Comparison (tokens/sec)
WebLLM (Mistral-7b, M1)    22.98
WebLLM (Mistral-7b, M3)    33.96
OpenAI (GPT-4)             19.08
Azure OpenAI (GPT-4)       38.75
Groq (Mixtral-8x7b)        564.63
Sources: WebLLM/Groq: own tests (23 March 2024); OpenAI/Azure OpenAI: https://mcplusa.com/comparing-performance-of-openai-gpt-4-and-microsoft-azure-gpt-4/ (31 August 2023)
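To get a feel for what these throughput figures mean for a user, they can be converted into waiting time. The sketch below uses a 500-token answer, which is an arbitrary example length and not a figure from the talk; the function name is made up.

```typescript
// Seconds needed to generate a response of the given length
// at a given throughput in tokens per second.
function generationSeconds(tokens: number, tokensPerSecond: number): number {
  return tokens / tokensPerSecond;
}

// Usage: at 22.98 tokens/sec a 500-token answer takes about 21.8 seconds;
// at 564.63 tokens/sec it takes about 0.9 seconds.
```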

Slide 30

Slide 30 text

Stable Diffusion
– Text-to-image model
– Generates 512x512px images from a prompt
– Runs on “commodity” hardware (with 8 GB VRAM)
– Open-source
Prompt: A guinea pig eating a watermelon

Slide 31

Slide 31 text

Web Stable Diffusion
– Specialized version of the Stable Diffusion model for the web
– 2 GB in size
– Subject to usage conditions: https://huggingface.co/runwayml/stable-diffusion-v1-5#uses
– No npm package this time
– Currently incompatible with Angular & esbuild due to Wasm imports

Slide 32

Slide 32 text

Web Stable Diffusion
Demo: https://websd.mlc.ai/

Slide 33

Slide 33 text

Web Stable Diffusion
Live Demo
Retrofitting AI image generation into an existing drawing application (https://paint.js.org)

Slide 34

Slide 34 text

Local AI Models
Advantages
– Data does not leave the browser
– High availability (offline support)
– Low latency
– Stability (unaffected by external API changes)
– Low cost

Slide 35

Slide 35 text

Local AI Models
Disadvantages
– Lower quality than closed-source models
– High system requirements (RAM, GPU)
– Large model size, high initial bandwidth requirements, models cannot be shared across origins
– Model initialization and inference are relatively slow
– WebGPU is currently only supported by Chromium-based browsers on macOS and Windows; WebNN is not available yet

Slide 36

Slide 36 text

Alternatives
Transformers.js
– JavaScript library to run Hugging Face transformers in the browser
– Supports most of the models
https://xenova.github.io/transformers.js/

Slide 37

Slide 37 text

Summary
– Cloud-based models (especially OpenAI/GPT) remain the most potent models and are easier to integrate (for now)
– Due to their size and high system requirements, local generative AI models are currently of interest mainly for very specific scenarios (e.g., high privacy demands, offline availability)
– Small, specialized models are an interesting alternative (if available)
– Open-source generative AI models are advancing rapidly and are becoming more compact and efficient
– Computers are getting more powerful

Slide 38

Slide 38 text

No content

Slide 39

Slide 39 text

Thank you for your kind attention!
Christian Liebel
@christianliebel
christian.liebel@thinktecture.com