Generative-AI-Power im Web: Progressive Web Apps smarter machen (Generative AI Power on the Web: Making Progressive Web Apps Smarter)

More and more developers intend to integrate generative AI features into their applications. So far, this path has almost always led to the cloud, but it does not have to. There are currently several promising approaches to running AI models directly on the user's machine: Hugging Face, for example, offers Transformers.js, which makes it possible to use machine learning models directly in the browser. The W3C's Web Neural Network API (WebNN), which is still in the specification phase, will in the future give such models access to the device's Neural Processing Unit (NPU), so that Large Language Models (LLMs) or Stable Diffusion models, for example, can run efficiently in the browser.

The advantages of these approaches are obvious: locally executed AI models are also available offline, user data does not leave the device, and, thanks to open-source models, all of this is even free of charge. Of course, the model first has to be transferred to the user's device, which also has to be sufficiently powerful. In this session, Christian Liebel, Thinktecture's representative at the W3C, will present these different approaches so that you can make your Progressive Web App smarter, too. We will discuss use cases and show the advantages and disadvantages of each solution. Join us!

Christian Liebel

November 09, 2024

Transcript

  1. Hello, it’s me.

    Generative-AI-Power im Web: Progressive Web Apps smarter machen
    Christian Liebel
    X: @christianliebel
    Bluesky: @christianliebel.com
    Email: christian.liebel@thinktecture.com
    Angular, PWA & Generative AI
    Slides: thinktecture.com/christian-liebel
  2. Generative AI everywhere

    Source: https://www.apple.com/chde/apple-intelligence/
  3. Run locally on the user’s system

    Single-Page Applications
    [Diagram: web browser running the SPA (client logic, views in HTML/CSS); web server delivering HTML, JS, CSS, assets; server logic with web APIs, push service, DBs; connections via HTTPS and WebSockets]
  4. Make SPAs offline-capable

    Progressive Web Apps
    [Diagram: website (HTML/JS), Service Worker, cache, fetch, Internet]
  5. Overview

    Generative AI
    – Text: OpenAI GPT, Mistral, …
    – Speech: OpenAI Whisper, tortoise-tts, …
    – Images: DALL·E, Stable Diffusion, …
    – Audio/Music: Musico, Soundraw, …
  7. Drawbacks

    Generative AI Cloud Providers
    – Require a (stable) internet connection
    – Subject to network latency and server availability
    – Data is transferred to the cloud service
    – Require a subscription
  8. Large Language Models

    – Large: Trained on lots of data
    – Language: Process and generate text
    – Models: Programs/neural networks
    Examples:
    – GPT (ChatGPT, Bing Chat, …)
    – Gemini, Gemma (Google)
    – LLaMa (Meta AI)
  9. Large Language Models

    – Token: A meaningful unit of text (e.g., a word, a part of a word, a character).
    – Context Window: The maximum number of tokens the model can process.
    – Parameters/weights: Internal variables learned during training, used to make predictions.
  10. Large Language Models

    Prompts serve as the universal interface
    – Unstructured text conveying specific semantics
    Paradigm shift in software architecture
    – Natural language becomes a first-class citizen
    Caveats
    – Non-determinism and hallucination, prompt injections
  11. Size Comparison (Large Language Models)

    Model:Parameters    Size
    phi3:3b             2.2 GB
    mistral:7b          4.1 GB
    llama3:8b           4.7 GB
    gemma2:9b           5.4 GB
    gemma2:27b          16 GB
    llama3:70b          40 GB
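
The relationship between parameter count and download size can be approximated with simple arithmetic. The sketch below assumes the file size is dominated by the weights. The slide does not state how the listed models are quantized, so the bit widths used here are illustrative assumptions and not the settings behind the table. The function name `estimateModelSizeGB` is hypothetical.

```typescript
// Rough size of a model's weight file in gigabytes (1 GB = 1e9 bytes here).
// Assumption: the file is dominated by the weights; metadata, tokenizer
// files, and per-block quantization overhead are ignored.
function estimateModelSizeGB(parameterCount: number, bitsPerWeight: number): number {
  const bytes = (parameterCount * bitsPerWeight) / 8;
  return bytes / 1e9;
}

// Hypothetical inputs: a 7-billion-parameter model at two precisions.
const sizeAt16Bit = estimateModelSizeGB(7e9, 16); // 14 GB
const sizeAt4Bit = estimateModelSizeGB(7e9, 4); // 3.5 GB
```

Under these assumptions, the sizes in the table are roughly what one would expect for weights stored at around 4 to 5 bits each, which is why quantization matters so much for models that have to be downloaded into a browser.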
  12. Storing model files locally

    Cache API
    [Diagram: website (HTML/JS), cache with model files, Internet, Hugging Face]
    Note: Due to the Same-Origin Policy, models cannot be shared across origins.
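
A minimal sketch of the cache-then-network pattern for model files follows. It is not taken from the talk: `getModelFile`, `CacheLike`, `ResponseLike`, and `Fetcher` are hypothetical names, and the cache and fetch function are passed in as parameters so the logic does not depend on browser globals.

```typescript
// Minimal shapes of the browser objects this sketch relies on.
interface ResponseLike {
  ok: boolean;
  status: number;
  clone(): ResponseLike;
}

interface CacheLike {
  match(url: string): Promise<ResponseLike | undefined>;
  put(url: string, response: ResponseLike): Promise<void>;
}

type Fetcher = (url: string) => Promise<ResponseLike>;

// Returns the model file from the cache if present; otherwise downloads it,
// stores a copy in the cache, and returns the fresh response.
async function getModelFile(
  url: string,
  cache: CacheLike,
  fetcher: Fetcher,
): Promise<ResponseLike> {
  const cached = await cache.match(url);
  if (cached !== undefined) {
    return cached;
  }
  const response = await fetcher(url);
  if (!response.ok) {
    throw new Error(`Download failed with status ${response.status}: ${url}`);
  }
  // A response body can only be read once, so the cache gets a clone.
  await cache.put(url, response.clone());
  return response;
}
```

In a browser, the cache would typically come from `caches.open("models-v1")` (the cache name is an assumption) and the fetcher would be the global `fetch`. Because that cache belongs to one origin, a second site needing the same model has to download it again, which is the limitation the note on the slide describes.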
  13. WebAssembly (Wasm)

    – Bytecode for the web
    – Compile target for arbitrary languages
    – Can be faster than JavaScript
    – WebLLM uses a model-specific Wasm library to accelerate model computations
  14. WebGPU

    – Grants low-level access to the Graphics Processing Unit (GPU)
    – Near-native performance for machine learning applications
    – Supported by Chromium-based browsers on Windows and macOS from version 113
  15. WebNN

    – Grants web apps access to the device’s CPU, GPU and Neural Processing Unit (NPU)
    – In specification by the WebML Working Group at W3C
    – Implementation in progress in Chromium (behind a flag)
    – Even better performance compared to WebGPU
    Source: https://webmachinelearning.github.io/webnn-intro/
    DEMO
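
Since WebGPU and WebNN are not available everywhere, an app would usually check for them before choosing how to run a model. The sketch below does this with plain feature detection on the `navigator.gpu` (WebGPU) and `navigator.ml` (WebNN) entry points. The preference order follows the performance claims on the slides. `NavigatorLike`, `Backend`, and `listBackends` are hypothetical names, and treating Wasm as an always-available fallback is an assumption.

```typescript
// Minimal view of `navigator` for this sketch; both properties are optional
// because many browsers do not expose them.
interface NavigatorLike {
  gpu?: unknown; // WebGPU entry point (navigator.gpu)
  ml?: unknown; // WebNN entry point (navigator.ml)
}

type Backend = "webnn" | "webgpu" | "wasm";

// Lists candidate backends, most preferred first.
// Assumption: Wasm is treated as the baseline that is always present.
function listBackends(nav: NavigatorLike): Backend[] {
  const backends: Backend[] = [];
  if (nav.ml !== undefined) {
    backends.push("webnn");
  }
  if (nav.gpu !== undefined) {
    backends.push("webgpu");
  }
  backends.push("wasm");
  return backends;
}
```

The presence of an entry point only shows that the API is exposed. Whether a usable GPU adapter or WebNN context can actually be created still has to be checked asynchronously at runtime.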
  16. WebNN: near-native inference performance

    Source: Intel. Browser: Chrome Canary 118.0.5943.0, DUT: Dell/Linux/i7-1260P, single p-core, Workloads: MediaPipe solution models (FP32, batch=1)
  17. Comparison (Performance)

    Setup                        Tokens/sec
    WebLLM (Mistral-7b, M1)      22.98
    WebLLM (Mistral-7b, M3)      33.96
    OpenAI (GPT-4)               19.08
    Azure OpenAI (GPT-4)         38.75
    Groq (Mixtral-8x7b)          564.63

    WebLLM/Groq: Own tests (23.03.2024), OpenAI/Azure OpenAI: https://mcplusa.com/comparing-performance-of-openai-gpt-4-and-microsoft-azure-gpt-4/ (31.08.2023)
  18. Stable Diffusion

    – Open-source text-to-image model
    – Generates 512x512px images from a prompt
    – WebSD: special version of Stable Diffusion for the web (2 GB in size)
    – No npm package this time
    Prompt: A guinea pig eating a watermelon
  19. Pros & Cons (Local AI Models)

    + Data does not leave the browser (privacy)
    + High availability (offline support)
    + Low latency
    + Stability (no external API changes)
    + Low cost
    – Lower quality
    – High system (RAM, GPU) and bandwidth requirements
    – Large model size, models cannot always be shared
    – Model initialization and inference are relatively slow
    – APIs are experimental
  20. Mitigations (Local AI Models)

    Download model in the background if the user is not on a metered connection
    Helpful APIs:
    – Network Information API to estimate the network quality/determine data saver (negative standards position by Apple and Mozilla)
    – Storage Manager API to estimate the available free disk space
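
The two APIs named above could feed a simple download decision, sketched below under stated assumptions. In a browser, the connection object would come from `navigator.connection` (where supported) and the estimate from `await navigator.storage.estimate()`. The function name `canDownloadInBackground`, the rule that skips "slow-2g" and "2g" connections, and the choice to proceed when no connection information is available are illustrative policy assumptions.

```typescript
// Subset of the Network Information API's connection object.
interface ConnectionLike {
  saveData?: boolean;
  effectiveType?: string;
}

// Subset of the result of the Storage Manager API's estimate() call.
interface StorageEstimateLike {
  quota?: number;
  usage?: number;
}

// Decides whether a model of `modelBytes` should be fetched in the background.
function canDownloadInBackground(
  modelBytes: number,
  connection: ConnectionLike | undefined,
  estimate: StorageEstimateLike,
): boolean {
  // Respect an explicit data-saver preference.
  if (connection?.saveData === true) {
    return false;
  }
  // Assumption: very slow connections are not worth a multi-gigabyte download.
  if (connection?.effectiveType === "slow-2g" || connection?.effectiveType === "2g") {
    return false;
  }
  // Without connection info (API unsupported), only the storage check applies.
  const freeBytes = (estimate.quota ?? 0) - (estimate.usage ?? 0);
  return freeBytes >= modelBytes;
}
```

Because the Network Information API is missing in some browsers, the `connection` parameter is optional here, and the storage estimate acts as the only gate in that case.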
  21. Mitigations (Local AI Models)

    Hybrid modes:
    – Allow the user to switch between cloud/local execution (availability, system requirements)
    – Deploy OSS model on internal/enterprise infrastructure (privacy)
  22. Alternatives: Prompt API (Local AI Models)

    [Diagram: website (HTML/JS), browser, operating system, Internet; Apple Intelligence, Gemini Nano]
  23. Alternatives: Prompt API (Local AI Models)

    – Exploratory API for local experiments and use case determination
    – Downloads Gemini Nano into Google Chrome
    – Model is shared across origins
    – Uses native APIs directly
    – Related APIs: Translation API, Writing Assistance APIs
    https://developer.chrome.com/docs/ai/built-in
    DEMO
  24. Alternatives: Ollama (Local AI Models)

    – Local runner for AI models
    – Offers a local server a website can connect to → allows sharing models across origins
    – Supported on macOS and Linux (Windows in Preview)
    https://webml-demo.vercel.app/
    https://ollama.ai/
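
A website talks to the local Ollama server over HTTP. The sketch below builds a non-streaming request for Ollama's `/api/generate` endpoint on its default port 11434 and reads the `response` field of the JSON reply. The helper names, the `FetchLike` type, and the injected fetch function are hypothetical. The model name passed in must match a model already pulled on the machine.

```typescript
interface GenerateRequestInit {
  method: string;
  headers: Record<string, string>;
  body: string;
}

// Builds the URL and request options for a non-streaming generate call.
function buildGenerateRequest(
  model: string,
  prompt: string,
  baseUrl: string = "http://localhost:11434",
): { url: string; init: GenerateRequestInit } {
  return {
    url: `${baseUrl}/api/generate`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      // stream: false asks for one JSON object instead of a chunked stream.
      body: JSON.stringify({ model, prompt, stream: false }),
    },
  };
}

// Minimal shape of fetch, injected so the sketch has no global dependencies.
type FetchLike = (
  url: string,
  init: GenerateRequestInit,
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Sends the prompt and returns the generated text.
async function generateWithOllama(
  fetcher: FetchLike,
  model: string,
  prompt: string,
): Promise<string> {
  const { url, init } = buildGenerateRequest(model, prompt);
  const res = await fetcher(url, init);
  if (!res.ok) {
    throw new Error(`Ollama returned HTTP ${res.status}`);
  }
  const data = (await res.json()) as { response: string };
  return data.response;
}
```

In a browser, `fetcher` would be the global `fetch`. Requests from a website are cross-origin, so the Ollama server has to allow the site's origin (for example via its `OLLAMA_ORIGINS` setting) before such a call succeeds.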
  25. Alternatives: Hugging Face Transformers (Local AI Models)

    Pre-trained, specialized, significantly smaller models beyond GenAI
    Examples:
    – Text generation
    – Image classification
    – Translation
    – Speech recognition
    – Image-to-text
  26. Alternatives: Transformers.js (Local AI Models)

    – Pre-trained, specialized, significantly smaller models beyond GenAI
    – JavaScript library to run Hugging Face transformers in the browser
    – Supports most of the models
    https://xenova.github.io/transformers.js/
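
As an illustration of such a specialized task, the sketch below runs a sentiment classification through a pipeline. Transformers.js exposes a `pipeline(task)` function that resolves to a callable model. To keep the example self-contained, that function is injected as `createPipeline` rather than imported. The types and the helpers `pickTopLabel` and `classifySentiment` are hypothetical, and the result shape (label plus score) is assumed from the library's text-classification output.

```typescript
// One text-classification result: a label plus a confidence score.
interface Classification {
  label: string;
  score: number;
}

// Injected stand-ins for the library (assumption: in a real app,
// `createPipeline` would be the `pipeline` function of Transformers.js).
type Classifier = (text: string) => Promise<Classification[]>;
type PipelineFactory = (task: string) => Promise<Classifier>;

// Returns the highest-scoring entry, or undefined for an empty result list.
function pickTopLabel(results: Classification[]): Classification | undefined {
  return results.reduce<Classification | undefined>(
    (best, current) => (best === undefined || current.score > best.score ? current : best),
    undefined,
  );
}

// Loads a sentiment-analysis pipeline and classifies one piece of text.
async function classifySentiment(
  createPipeline: PipelineFactory,
  text: string,
): Promise<Classification | undefined> {
  const classifier = await createPipeline("sentiment-analysis");
  return pickTopLabel(await classifier(text));
}
```

The first call to a pipeline downloads the model files, so creating the classifier once and reusing it across calls would avoid repeated initialization.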
  27. Summary

    – Cloud-based models remain the most powerful models
    – Due to their size and high system requirements, local generative AI models are currently mainly of interest for very special scenarios (e.g., high privacy demands, offline availability)
    – Small language models are becoming more powerful
    – Vendors are starting to ship AI models with their devices
    – Devices are becoming more powerful for running AI tasks
    – Experiment with the AI APIs and make your web apps smarter!