Generative-AI-Power im Web: Progressive Web Apps smarter machen

by Christian Liebel

Slide 1

Slide 1 text

Generative-AI-Power im Web Progressive Web Apps smarter machen Christian Liebel @christianliebel Consultant

Slide 2

Slide 2 text

Hello, it’s me.
Christian Liebel
X: @christianliebel
Bluesky: @christianliebel.com
Email: christian.liebel@thinktecture.com
Angular, PWA & Generative AI
Slides: thinktecture.com/christian-liebel

Slide 3

Slide 3 text

DEMO

Slide 4

Slide 4 text

Generative AI everywhere
Source: https://www.apple.com/chde/apple-intelligence/

Slide 5

Slide 5 text

Single-Page Applications
Run locally on the user’s system
[Diagram: the web browser loads the SPA (HTML, JS, CSS, assets) from a web server and runs its client logic and HTML/CSS views locally; the SPA reaches web APIs (server logic, databases) and a push service over HTTPS and WebSockets.]

Slide 6

Slide 6 text

Progressive Web Apps
Make SPAs offline-capable
[Diagram: the website (HTML/JS) sends each fetch through the service worker, which answers from the cache or from the Internet.]

Slide 7

Slide 7 text

Generative AI
Overview
– Text: OpenAI GPT, Mistral, …
– Speech: OpenAI Whisper, tortoise-tts, …
– Images: DALL·E, Stable Diffusion, …
– Audio/Music: Musico, Soundraw, …

Slide 8

Slide 8 text

Slide 9

Slide 9 text

Generative AI Cloud Providers
Examples

Slide 10

Slide 10 text

Generative AI Cloud Providers
Drawbacks
– Require a (stable) internet connection
– Subject to network latency and server availability
– Data is transferred to the cloud service
– Require a subscription

Slide 11

Slide 11 text

Can we run GenAI models locally?

Slide 12

Slide 12 text

Large Language Models
– Large: Trained on lots of data
– Language: Process and generate text
– Models: Programs/neural networks
Examples:
– GPT (ChatGPT, Bing Chat, …)
– Gemini, Gemma (Google)
– LLaMA (Meta AI)

Slide 13

Slide 13 text

Large Language Models
– Token: A meaningful unit of text (e.g., a word, a part of a word, a character).
– Context window: The maximum number of tokens the model can process.
– Parameters/weights: Internal variables learned during training, used to make predictions.

Slide 14

Slide 14 text

Large Language Models
Prompts serve as the universal interface
– Unstructured text conveying specific semantics
Paradigm shift in software architecture
– Natural language becomes a first-class citizen
Caveats
– Non-determinism and hallucination, prompt injections

Slide 15

Slide 15 text

Large Language Models
Size Comparison
Model:Parameters    Size
phi3:3b             2.2 GB
mistral:7b          4.1 GB
llama3:8b           4.7 GB
gemma2:9b           5.4 GB
gemma2:27b          16 GB
llama3:70b          40 GB
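The sizes in this comparison can be roughly reproduced from the parameter count. The sketch below assumes about 4.5 bits per weight (typical for 4-bit quantization plus scaling data); the slide does not state the quantization, so that figure and the function name are assumptions for illustration only.

```typescript
// Rough download-size estimate for a quantized model.
// Assumption (not from the slides): about 4.5 bits per weight.
function estimateModelSizeGB(parameterCount: number, bitsPerWeight = 4.5): number {
  const bytes = (parameterCount * bitsPerWeight) / 8; // bits -> bytes
  return bytes / 1e9;                                 // bytes -> GB (decimal)
}

// A 7-billion-parameter model lands near 4 GB, a 70-billion one near 40 GB,
// which is the same order of magnitude as the sizes listed above.
const sevenB = estimateModelSizeGB(7e9);
const seventyB = estimateModelSizeGB(70e9);
```

The estimate ignores tokenizer files and per-model differences, so it only indicates the order of magnitude a user has to download.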

Slide 16

Slide 16 text

WebLLM
https://webllm.mlc.ai/
DEMO

Slide 17

Slide 17 text

WebLLM
On NPM
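A minimal sketch of how the npm package might be used. WebLLM exposes an OpenAI-style chat interface; the `ChatEngineLike` type below is a simplified, hypothetical stand-in for it so the helper stays independent of the package version, and `askLocalModel` is an illustrative name, not part of WebLLM.

```typescript
// Simplified, hypothetical shape of the OpenAI-style chat interface.
// In a browser app the engine would come from the npm package, for example:
//   import { CreateMLCEngine } from "@mlc-ai/web-llm";
//   const engine = await CreateMLCEngine(modelId); // modelId: a prebuilt model ID from the WebLLM docs
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatEngineLike {
  chat: {
    completions: {
      create(request: { messages: ChatMessage[] }): Promise<{
        choices: { message: { content: string | null } }[];
      }>;
    };
  };
}

// Sends one question to the local model and returns the text of the first choice.
async function askLocalModel(engine: ChatEngineLike, question: string): Promise<string> {
  const reply = await engine.chat.completions.create({
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: question },
    ],
  });
  return reply.choices[0]?.message.content ?? "";
}
```

Creating the engine triggers the model download and initialization, so it is usually done once and reused for all prompts.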

Slide 18

Slide 18 text

Cache API
Storing model files locally
[Diagram: the website (HTML/JS) downloads model files from Hugging Face over the Internet and keeps them in a cache.]
Note: Due to the Same-Origin Policy, models cannot be shared across origins.
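The caching idea can be sketched as a cache-first lookup: serve a model file from the cache if it is there, otherwise download it once and store a copy. The interfaces below are minimal stand-ins for the browser's `Cache`, `Response`, and `fetch` so the sketch does not depend on DOM typings; `fetchModelFile` is a hypothetical helper, not WebLLM's actual implementation.

```typescript
// Minimal structural stand-ins for the browser's Response, Cache, and fetch.
interface ResponseLike {
  ok: boolean;
  status: number;
  clone(): ResponseLike;
}

interface CacheLike {
  match(url: string): Promise<ResponseLike | undefined>;
  put(url: string, response: ResponseLike): Promise<void>;
}

type FetchLike = (url: string) => Promise<ResponseLike>;

// Cache-first: return the cached file if present, otherwise download and store it.
async function fetchModelFile(url: string, cache: CacheLike, fetchFn: FetchLike): Promise<ResponseLike> {
  const cached = await cache.match(url);
  if (cached) return cached;

  const response = await fetchFn(url);
  if (!response.ok) {
    throw new Error(`Download failed with status ${response.status}`);
  }
  // Store a clone, because a response body can only be consumed once.
  await cache.put(url, response.clone());
  return response;
}
```

In a browser, the cache would come from `await caches.open("model-files")` (the cache name is an arbitrary choice) and `fetchFn` would be the global `fetch`. Because caches are scoped to the origin, each origin downloads its own copy.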

Slide 19

Slide 19 text

Cache API
Parameter cache

Slide 20

Slide 20 text

WebAssembly (Wasm)
– Bytecode for the web
– Compile target for arbitrary languages
– Can be faster than JavaScript
– WebLLM uses a model-specific Wasm library to accelerate model computations

Slide 21

Slide 21 text

WebGPU
– Grants low-level access to the Graphics Processing Unit (GPU)
– Near native performance for machine learning applications
– Supported by Chromium-based browsers on Windows and macOS from version 113
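Since WebGPU is not available everywhere, an app would typically check for it before loading a model. A minimal sketch, assuming a navigator-like object is passed in; `hasWebGpu` and the `GpuLike` type are illustrative names, while `gpu.requestAdapter()` is the WebGPU entry point that resolves to an adapter or `null`.

```typescript
// Minimal stand-in for navigator.gpu, so the sketch does not need WebGPU typings.
interface GpuLike {
  requestAdapter(): Promise<object | null>;
}

// Resolves to true only if the API exists and an adapter is actually available.
async function hasWebGpu(nav: { gpu?: GpuLike }): Promise<boolean> {
  if (!nav.gpu) return false; // API not exposed by this browser
  const adapter = await nav.gpu.requestAdapter();
  return adapter !== null;    // null: API present, but no usable GPU
}
```

In a browser this would be called with the global `navigator`; if it resolves to `false`, the app could fall back to a cloud model or hide the local AI feature.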

Slide 22

Slide 22 text

WebNN
– Grants web apps access to the device’s CPU, GPU and Neural Processing Unit (NPU)
– In specification by the WebML Working Group at W3C
– Implementation in progress in Chromium (behind a flag)
– Even better performance compared to WebGPU
Source: https://webmachinelearning.github.io/webnn-intro/
DEMO

Slide 23

Slide 23 text

WebNN: near-native inference performance
Source: Intel. Browser: Chrome Canary 118.0.5943.0, DUT: Dell/Linux/i7-1260P, single p-core, Workloads: MediaPipe solution models (FP32, batch=1)

Slide 24

Slide 24 text

Performance
Comparison (tokens/sec)
WebLLM (Mistral-7b, M1)     22.98
WebLLM (Mistral-7b, M3)     33.96
OpenAI (GPT-4)              19.08
Azure OpenAI (GPT-4)        38.75
Groq (Mixtral-8x7b)        564.63
WebLLM/Groq: Own tests (23.03.2024), OpenAI/Azure OpenAI: https://mcplusa.com/comparing-performance-of-openai-gpt-4-and-microsoft-azure-gpt-4/ (31.08.2023)

Slide 25

Slide 25 text

Stable Diffusion
– Open-source text-to-image model
– Generates 512x512px images from a prompt
– WebSD: special version of Stable Diffusion for the web (2 GB in size)
– No npm package this time
Prompt: A guinea pig eating a watermelon

Slide 26

Slide 26 text

Web Stable Diffusion
https://websd.mlc.ai/
DEMO

Slide 27

Slide 27 text

Local AI Models
Pros & Cons
+ Data does not leave the browser (privacy)
+ High availability (offline support)
+ Low latency
+ Stability (no external API changes)
+ Low cost
– Lower quality
– High system (RAM, GPU) and bandwidth requirements
– Large model size, models cannot always be shared
– Model initialization and inference are relatively slow
– APIs are experimental

Slide 28

Slide 28 text

Local AI Models
Mitigations
Download model in the background if the user is not on a metered connection
Helpful APIs:
– Network Information API to estimate the network quality/determine data saver (negative standards position by Apple and Mozilla)
– Storage Manager API to estimate the available free disk space
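The two APIs can be combined into a single pre-download check. The sketch below takes a navigator-like object so it also works where one of the APIs is missing; `canDownloadModel` and the `*Like` types are hypothetical names, while `connection.saveData` and `storage.estimate()` are the properties those APIs expose.

```typescript
// Minimal stand-ins for navigator.connection and navigator.storage.
interface ConnectionLike {
  saveData?: boolean;
}

interface StorageLike {
  estimate(): Promise<{ quota?: number; usage?: number }>;
}

interface NavigatorLike {
  connection?: ConnectionLike; // Network Information API (not available in all browsers)
  storage?: StorageLike;       // Storage Manager API
}

// Decides whether a background download of modelBytes looks acceptable.
async function canDownloadModel(nav: NavigatorLike, modelBytes: number): Promise<boolean> {
  // Respect the user's data saver preference where the browser exposes it.
  if (nav.connection?.saveData) return false;

  // Without a storage estimate, stay conservative and do not download.
  if (!nav.storage) return false;

  const { quota = 0, usage = 0 } = await nav.storage.estimate();
  return quota - usage >= modelBytes;
}
```

The quota reported by `estimate()` is what the browser grants the origin, which is only an approximation of the free disk space, so a failed download still has to be handled.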

Slide 29

Slide 29 text

Local AI Models
Mitigations
Hybrid modes:
– Allow the user to switch between cloud/local execution (availability, system requirements)
– Deploy OSS model on internal/enterprise infrastructure (privacy)

Slide 30

Slide 30 text

Local AI Models
Alternatives: Prompt API
[Diagram: website (HTML/JS), browser, operating system, and Internet, with Gemini Nano and Apple Intelligence as on-device models.]

Slide 31

Slide 31 text

Local AI Models
Alternatives: Prompt API
– Exploratory API for local experiments and use case determination
– Downloads Gemini Nano into Google Chrome
– Model is shared across origins
– Uses native APIs directly
– Related APIs: Translation API, Writing Assistance APIs
https://developer.chrome.com/docs/ai/built-in
DEMO

Slide 32

Slide 32 text

Local AI Models
Alternatives: Ollama
– Local runner for AI models
– Offers a local server a website can connect to → allows sharing models across origins
– Supported on macOS and Linux (Windows in Preview)
https://webml-demo.vercel.app/
https://ollama.ai/
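A minimal sketch of how a website might talk to the local Ollama server, assuming the default address `http://localhost:11434` and the `/api/generate` endpoint with streaming turned off. `generateWithOllama` and the `FetchJson` type are hypothetical names; the fetch function is passed in so the helper can be exercised without a running server.

```typescript
// Minimal stand-in for the parts of fetch used here.
type FetchJson = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Assumed response shape for a non-streaming generate call.
interface OllamaGenerateResponse {
  response: string;
}

// Sends a single prompt to the local Ollama server and returns the generated text.
async function generateWithOllama(
  fetchFn: FetchJson,
  model: string,
  prompt: string,
  baseUrl = "http://localhost:11434", // assumed default address of the local server
): Promise<string> {
  const res = await fetchFn(`${baseUrl}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, prompt, stream: false }),
  });
  if (!res.ok) {
    throw new Error(`Ollama request failed with status ${res.status}`);
  }
  const data = (await res.json()) as OllamaGenerateResponse;
  return data.response;
}
```

In a browser, `fetchFn` would be the global `fetch`, and the model name would be one the user has already pulled, such as `mistral:7b`. The local server also has to accept requests from the website's origin, which may require extra configuration on the user's machine.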

Slide 33

Slide 33 text

Local AI Models
Alternatives: Hugging Face Transformers
Pre-trained, specialized, significantly smaller models beyond GenAI
Examples:
– Text generation
– Image classification
– Translation
– Speech recognition
– Image-to-text

Slide 34

Slide 34 text

Local AI Models
Alternatives: Transformers.js
– Pre-trained, specialized, significantly smaller models beyond GenAI
– JavaScript library to run Hugging Face transformers in the browser
– Supports most of the models
https://xenova.github.io/transformers.js/

Slide 35

Slide 35 text

Summary
– Cloud-based models remain the most powerful models
– Due to their size and high system requirements, local generative AI models are currently rather interesting for very special scenarios (e.g., high privacy demands, offline availability)
– Small language models are becoming more powerful
– Vendors start shipping AI models with their devices
– Devices are becoming more powerful for running AI tasks
– Experiment with the AI APIs and make your web apps smarter!

Slide 36

Slide 36 text

Thank you for your kind attention!
Christian Liebel
@christianliebel
christian.liebel@thinktecture.com