Generative-AI-Power im Web: Progressive Web Apps smarter machen

by Christian Liebel

Slide 1

Slide 1 text

Generative-AI-Power im Web
Progressive Web Apps smarter machen
Christian Liebel
@christianliebel
Consultant

Slide 2

Slide 2 text

Hello, it’s me.
Christian Liebel
X: @christianliebel
Email: christian.liebel@thinktecture.com
Angular & PWA
Slides: thinktecture.com/christian-liebel

Slide 3

Slide 3 text

Generative AI everywhere

Slide 4

Slide 4 text

Single-Page Applications
Run locally on the user’s system
[Diagram: a web browser runs the SPA (client logic plus several HTML/CSS views); a web server delivers HTML, JS, CSS, and assets; web APIs provide server logic and database access; a push service is also shown. Connections are labeled HTTPS and WebSockets.]

Slide 5

Slide 5 text

Progressive Web Apps
Make SPAs offline-capable
[Diagram: the website (HTML/JS) issues fetch requests through a service worker, which sits between the website, the cache, and the internet.]
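A minimal sketch of the cache-first strategy a service worker might apply to the fetch flow on this slide. It is written against a small hypothetical `ResponseStore` interface rather than the browser's Cache API so that it also runs outside a browser; `ResponseStore`, `MemoryStore`, `cacheFirst`, and `fetchFromNetwork` are illustrative names, not part of the talk.

```typescript
// Hypothetical stand-in for a cache; in a real service worker this role
// is played by the Cache API (caches.open / cache.match / cache.put).
interface ResponseStore {
  match(url: string): Promise<string | undefined>;
  put(url: string, body: string): Promise<void>;
}

// In-memory implementation, for illustration and testing only.
class MemoryStore implements ResponseStore {
  private readonly entries = new Map<string, string>();

  async match(url: string): Promise<string | undefined> {
    return this.entries.get(url);
  }

  async put(url: string, body: string): Promise<void> {
    this.entries.set(url, body);
  }
}

// Cache-first: answer from the cache when possible, otherwise go to the
// network and remember the response for the next (possibly offline) visit.
async function cacheFirst(
  url: string,
  store: ResponseStore,
  fetchFromNetwork: (url: string) => Promise<string>,
): Promise<string> {
  const cached = await store.match(url);
  if (cached !== undefined) {
    return cached;
  }
  const fresh = await fetchFromNetwork(url);
  await store.put(url, fresh);
  return fresh;
}
```

In an actual service worker, the same logic would live inside a `fetch` event listener and be passed to `event.respondWith(...)`.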

Slide 6

Slide 6 text

Generative AI
Overview
– Text: OpenAI GPT, LLaMa, Vicuna, …
– Images: Midjourney, DALL·E, Stable Diffusion, …
– Audio/Music: Musico, Soundraw, …
– Speech: OpenAI Whisper, tortoise-tts, …

Slide 7

Slide 7 text

Slide 8

Slide 8 text

Generative AI Cloud Providers
Examples

Slide 9

Slide 9 text

Generative AI Cloud Providers
Drawbacks
– Require an active internet connection
– Affected by network latency and server availability
– Data is transferred to the cloud service
– Require a subscription
→ Can we run models locally?

Slide 10

Slide 10 text

Large Language Models
Large: Trained on lots of data
Language: Process and generate text
Models: Programs/neural networks
Examples:
– GPT (ChatGPT, Bing Chat, …)
– Gemini, Gemma (Google)
– LLaMa (Meta AI)

Slide 11

Slide 11 text

Large Language Models
Token: A meaningful unit of text (e.g., a word, a part of a word, a character).
Context window: The maximum number of tokens the model can process at once.
Parameters/weights: Internal variables learned during training, used to make predictions.

Slide 12

Slide 12 text

Large Language Models
Prompts serve as the universal interface: unstructured text conveying specific semantics
Paradigm shift in software architecture: natural language becomes a first-class citizen
Caveats: non-determinism and hallucination, prompt injections

Slide 13

Slide 13 text

Large Language Models
Use Cases
Content consumption
– summarization
– translation
– answering questions about some content
– categorization
– characterizing
Content creation
– writing assistance
– proofreading
– grammar correction
– rephrasing
https://developer.chrome.com/docs/ai/built-in

Slide 14

Slide 14 text

Large Language Models
Size Comparison
Model:Parameters   Size
phi3:3b            2.2 GB
mistral:7b         4.1 GB
llama3:8b          4.7 GB
gemma2:9b          5.4 GB
gemma2:27b         16 GB
llama3:70b         40 GB
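The listed sizes are roughly what one would expect from 4-bit quantized weights. As a rule of thumb, download size ≈ parameters × bits per weight / 8. The sketch below applies it to three rows of the table; the value of 4.5 bits per weight is an assumption (4-bit values plus some scaling metadata), not a figure from the slides, and the parameter counts are the nominal ones from the model tags.

```typescript
// Rough download size in GB for a model with the given number of
// parameters (in billions) stored at `bitsPerWeight` bits each.
function estimateSizeGb(paramsBillions: number, bitsPerWeight: number): number {
  const bytes = paramsBillions * 1e9 * (bitsPerWeight / 8);
  return bytes / 1e9;
}

// Nominal parameter counts taken from the model tags in the table.
const models: Array<[string, number]> = [
  ["mistral:7b", 7],
  ["llama3:8b", 8],
  ["llama3:70b", 70],
];

for (const [name, params] of models) {
  // 4.5 bits per weight is an assumed average for 4-bit quantization
  // including scaling metadata.
  console.log(`${name}: ~${estimateSizeGb(params, 4.5).toFixed(1)} GB`);
}
```

Under that assumption the estimates land close to the sizes in the table; at 16 bits per weight the same models would be several times larger.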

Slide 15

Slide 15 text

WebLLM
Demo: https://webllm.mlc.ai/

Slide 16

Slide 16 text

WebLLM
On npm
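A hedged sketch of how the npm package might be used. The `@mlc-ai/web-llm` package exposes `CreateMLCEngine` and an OpenAI-style `engine.chat.completions.create` call; to keep the block runnable without a browser, the function below accepts any object that mirrors that shape. The `ChatEngine` type, the `ask` helper, and the model ID in the comment are assumptions for illustration.

```typescript
// Local mirror of the OpenAI-style chat interface, declared here so the
// sketch runs without the package installed.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatEngine {
  chat: {
    completions: {
      create(request: { messages: ChatMessage[] }): Promise<{
        choices: Array<{ message: { content: string | null } }>;
      }>;
    };
  };
}

// Sends one question and returns the text of the first reply.
async function ask(engine: ChatEngine, question: string): Promise<string> {
  const reply = await engine.chat.completions.create({
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: question },
    ],
  });
  return reply.choices[0]?.message.content ?? "";
}

// In a browser with WebGPU, the engine would come from the npm package
// (the model ID is an assumption; pick one from the package's model list,
// and a cast may be needed to satisfy the local ChatEngine type):
//
//   import { CreateMLCEngine } from "@mlc-ai/web-llm";
//   const engine = await CreateMLCEngine("Llama-3-8B-Instruct-q4f32_1-MLC", {
//     initProgressCallback: (report) => console.log(report.text),
//   });
//   console.log(await ask(engine, "What is a Progressive Web App?"));
```

The first call downloads the model files, so the progress callback is worth wiring to the UI.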

Slide 17

Slide 17 text

WebLLM
Demo

Slide 18

Slide 18 text

Choosing a model
Benchmarks
Selection of available models for WebLLM:
– LLaMa-3 8B Instruct
– LLaMa-3 70B Instruct
– Mistral 7B Instruct
– Gemma 2B IT
https://www.theverge.com/2024/4/18/24134103/llama-3-benchmark-testing-ai-gemma-gemini-mistral

Slide 19

Slide 19 text

Cache API
Storing model files locally
[Diagram: the website (HTML/JS) downloads model files from the internet (Hugging Face) and keeps them in a cache with model files.]

Slide 20

Slide 20 text

Cache API
Parameter cache

Slide 21

Slide 21 text

WebAssembly (Wasm)
Bytecode for the web
Compile target for arbitrary languages
Can be faster than JavaScript
WebLLM needs the model and a Wasm library to accelerate model computations

Slide 22

Slide 22 text

WebGPU
Grants low-level access to the Graphics Processing Unit (GPU)
Near-native performance for machine learning applications
Supported by Chromium-based browsers on Windows and macOS from version 113
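Because WebGPU is not available everywhere, an application would typically check for it before attempting to load a model. A minimal feature-detection sketch follows; `navigator.gpu` and `requestAdapter()` are part of the WebGPU API, while the `GpuCapableNavigator` type and the `supportsWebGpu` helper are illustrative names introduced here.

```typescript
// Minimal structural view of the parts of `navigator` this check touches.
interface GpuCapableNavigator {
  gpu?: { requestAdapter(): Promise<object | null> };
}

// Returns true only if the WebGPU entry point exists and the browser can
// actually hand out an adapter (the promise resolves to null when no
// suitable GPU is available).
async function supportsWebGpu(nav: GpuCapableNavigator): Promise<boolean> {
  if (!nav.gpu) {
    return false;
  }
  try {
    return (await nav.gpu.requestAdapter()) !== null;
  } catch {
    return false;
  }
}

// Browser usage (a cast may be needed if the TypeScript DOM typings in use
// do not include `gpu`):
//   const ok = await supportsWebGpu(navigator);
```

If the check fails, the application can fall back to a cloud-based model or hide the local AI feature.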

Slide 23

Slide 23 text

Outlook: WebNN
Grants web applications access to the Neural Processing Unit (NPU) of the system via platform-specific machine learning services (e.g., ML Compute on macOS/iOS, DirectML on Windows, …)
Even better performance compared to WebGPU
Currently in specification by the WebML Working Group at W3C
Implementation in progress for Chromium-based browsers
https://webmachinelearning.github.io/webnn-intro/

Slide 24

Slide 24 text

WebNN: near-native inference performance
[Chart not reproduced.]
Source: Intel. Browser: Chrome Canary 118.0.5943.0, DUT: Dell/Linux/i7-1260P, single p-core, Workloads: MediaPipe solution models (FP32, batch=1)

Slide 25

Slide 25 text

WebLLM
Caveats
– Due to the Same-Origin Policy, models can’t be shared across origins (i.e., https://example.org cannot access https://test.example.org).
– Downloading LLMs multiple times leads to very high storage consumption.
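The origin comparison behind the first caveat can be illustrated with the standard `URL` class: two URLs share an origin only if scheme, host, and port all match. The `sameOrigin` helper is an illustrative name.

```typescript
// Two URLs share an origin only if scheme, host, and port all match.
function sameOrigin(a: string, b: string): boolean {
  return new URL(a).origin === new URL(b).origin;
}

console.log(sameOrigin("https://example.org/app", "https://example.org/models")); // true
console.log(sameOrigin("https://example.org", "https://test.example.org")); // false
```

Since browser storage such as the Cache API is partitioned by origin, a model cached by one of these sites is not visible to the other, which is why the same model may end up on disk several times.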

Slide 26

Slide 26 text

Prompt API
[Diagram: the website (HTML/JS) runs in the browser on the operating system, next to the internet; the models shown are Gemini Nano and Apple Intelligence.]

Slide 27

Slide 27 text

Prompt API
Part of Chrome’s Built-In AI initiative
– Exploratory API for local experiments and use case determination
– Downloads Gemini Nano into Google Chrome
– Model can be shared across origins
– Uses native APIs directly
– Fine-tuning API might follow in the future
https://developer.chrome.com/docs/ai/built-in

Slide 28

Slide 28 text

Prompt API
First Glance

Slide 29

Slide 29 text

Prompt API
Demo: Smart Form Filler
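The demo code is not part of the slide text, so the following is only a sketch of how a smart form filler could be structured, not the author's implementation. Because the Prompt API is exploratory and its surface has changed over time, the model is abstracted as a hypothetical `PromptFn` that takes a prompt string and resolves with the model's text reply; `buildPrompt` and `fillForm` are likewise illustrative names.

```typescript
// Any function that sends a prompt to a language model and resolves with
// its text reply (hypothetical abstraction, not the Prompt API itself).
type PromptFn = (prompt: string) => Promise<string>;

// Asks the model for a JSON object keyed by the known field names.
function buildPrompt(fieldNames: string[], userText: string): string {
  return [
    "Extract values for the following form fields from the text below.",
    `Fields: ${fieldNames.join(", ")}`,
    "Reply with a single JSON object that maps field names to strings.",
    "Omit fields that the text does not mention.",
    "",
    `Text: ${userText}`,
  ].join("\n");
}

async function fillForm(
  fieldNames: string[],
  userText: string,
  prompt: PromptFn,
): Promise<Record<string, string>> {
  const reply = await prompt(buildPrompt(fieldNames, userText));
  const result: Record<string, string> = {};
  let parsed: unknown;
  try {
    parsed = JSON.parse(reply);
  } catch {
    return result; // Model output is not guaranteed to be valid JSON.
  }
  if (typeof parsed !== "object" || parsed === null) {
    return result;
  }
  // Keep only known fields with string values; ignore anything invented.
  for (const name of fieldNames) {
    const value = (parsed as Record<string, unknown>)[name];
    if (typeof value === "string") {
      result[name] = value;
    }
  }
  return result;
}
```

Validating the reply against the known field list is a defensive measure against the non-determinism and hallucination caveats of LLMs: fields the model invents are dropped, and malformed output yields an empty result instead of an exception.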

Slide 30

Slide 30 text

Prompt Engineering
Alternatives
[Diagram, axis label: Effort. Options shown: Prompt Engineering, Retrieval Augmented Generation, Fine-tuning, Custom model.]

Slide 31

Slide 31 text

Performance
Comparison
System                     Tokens/sec
WebLLM (Mistral-7b, M1)    22.98
WebLLM (Mistral-7b, M3)    33.96
OpenAI (GPT-4)             19.08
Azure OpenAI (GPT-4)       38.75
Groq (Mixtral-8x7b)        564.63
Sources: WebLLM/Groq: own tests (23.03.2024); OpenAI/Azure OpenAI: https://mcplusa.com/comparing-performance-of-openai-gpt-4-and-microsoft-azure-gpt-4/ (31.08.2023)

Slide 32

Slide 32 text

Stable Diffusion
Text-to-image model
Generates 512x512px images from a prompt
Runs on “commodity” hardware (with 8 GB VRAM)
Open-source
[Example image. Prompt: A guinea pig eating a watermelon]

Slide 33

Slide 33 text

Web Stable Diffusion
Specialized version of the Stable Diffusion model for the web
2 GB in size
Subject to usage conditions: https://huggingface.co/runwayml/stable-diffusion-v1-5#uses
No npm package this time
Currently incompatible with Angular & esbuild due to Wasm imports

Slide 34

Slide 34 text

Web Stable Diffusion
Demo: https://websd.mlc.ai/

Slide 35

Slide 35 text

Local AI Models
Advantages
– Data does not leave the browser
– High availability (offline support)
– Low latency
– Stability (not affected by external API changes)
– Low cost

Slide 36

Slide 36 text

Local AI Models
Disadvantages
– Lower quality than closed-source models
– High system requirements (RAM, GPU)
– Large model size, high initial bandwidth requirements, models cannot be shared across origins
– Model initialization and inference are relatively slow
– WebGPU and WebNN are currently only supported by Chromium-based browsers on macOS and Windows (WebNN only behind a flag)
– Prompt API is only an exploratory API

Slide 37

Slide 37 text

Alternatives
Transformers.js
JavaScript library to run Hugging Face transformers in the browser
Supports most of the models
https://xenova.github.io/transformers.js/

Slide 38

Slide 38 text

Summary
– Cloud-based models (especially OpenAI/GPT) remain the most capable models and are easier to integrate (for now)
– Due to their size and high system requirements, local generative AI models are currently interesting mainly for specialized scenarios (e.g., high privacy demands, offline availability)
– Small, specialized models are an interesting alternative (if available)
– Open-source GenAI models are becoming more compact and efficient
– Vendors are beginning to ship AI models with their devices
– Devices are becoming more powerful for AI tasks

Slide 39

Slide 39 text

Thank you for your kind attention!
Christian Liebel
@christianliebel
christian.liebel@thinktecture.com