The AI Revolution in the Browser? Making Single-Page Apps Smarter

More and more developers intend to integrate generative AI features into their applications. Until now, this path has practically always led to the cloud, but it doesn't have to be that way. There are currently several promising approaches to running AI models directly on the user's computer: Hugging Face, for example, makes it possible to use machine learning models directly in the browser with Transformers.js. The W3C's Web Neural Network API (WebNN), which is still in the specification phase, will grant such models access to the device's Neural Processing Unit (NPU). This will allow Large Language Models (LLMs) or Stable Diffusion models to run efficiently in the browser. The advantages of these approaches are obvious: locally executed AI models are also available offline, user data does not leave the device, and all this is even free of charge thanks to open-source models. But of course, the model must first be transferred to the user's device, which must also be sufficiently powerful. In this talk, Christian Liebel, Thinktecture's representative at W3C, will present approaches for making your single-page app smarter. We will discuss use cases and show the advantages and disadvantages of each solution.

Christian Liebel

November 14, 2024

Transcript

  1. The AI Revolution in the Browser? Making Single-Page Apps Smarter

    Christian Liebel @christianliebel Consultant
  2. Hello, it’s me.

    Christian Liebel
    X: @christianliebel
    Bluesky: @christianliebel.com
    Email: christian.liebel@thinktecture.com
    Angular, PWA & Generative AI
    Slides: thinktecture.com/christian-liebel
    Intelligent Forms: Spice up your Angular Forms with AI & LLMs
  3. Generative AI everywhere

    Source: https://www.apple.com/chde/apple-intelligence/
  4. Overview

    Generative AI
    – Text: OpenAI GPT, Mistral, …
    – Speech: OpenAI Whisper, tortoise-tts, …
    – Images: DALL·E, Stable Diffusion, …
    – Audio/Music: Musico, Soundraw, …
  6. Drawbacks

    Generative AI Cloud Providers
    – Require a (stable) internet connection
    – Subject to network latency and server availability
    – Data is transferred to the cloud service
    – Require a subscription
  7. Can we run GenAI models locally?
  8. Size Comparison

    Large Language Models
    Model:Parameters    Size
    phi3:3b             2.2 GB
    mistral:7b          4.1 GB
    llama3:8b           4.7 GB
    gemma2:9b           5.4 GB
    gemma2:27b          16 GB
    llama3:70b          40 GB
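To put these sizes in perspective, here is a rough sketch that estimates how long the first download of each model might take. The sizes are taken from the slide; the 100 Mbit/s link speed, the decimal-gigabyte interpretation, and the helper name `estimateDownloadSeconds` are assumptions made for illustration only.

```typescript
// Model sizes in GB, as listed on the size comparison slide.
const modelSizesGb: Record<string, number> = {
  "phi3:3b": 2.2,
  "mistral:7b": 4.1,
  "llama3:8b": 4.7,
  "gemma2:9b": 5.4,
  "gemma2:27b": 16,
  "llama3:70b": 40,
};

// Hypothetical helper: seconds needed to transfer `sizeGb` gigabytes
// (assumed decimal, 10^9 bytes) over a link with `mbitPerSecond` throughput.
function estimateDownloadSeconds(sizeGb: number, mbitPerSecond: number): number {
  const bits = sizeGb * 1e9 * 8;
  return bits / (mbitPerSecond * 1e6);
}

// Assumed link speed of 100 Mbit/s; real throughput will vary.
for (const [model, sizeGb] of Object.entries(modelSizesGb)) {
  const minutes = estimateDownloadSeconds(sizeGb, 100) / 60;
  console.log(`${model}: about ${minutes.toFixed(1)} min at 100 Mbit/s`);
}
```

Under these assumptions, even the smaller models take several minutes to arrive, which is one reason caching the files after the first download matters.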
  9. Impact on Software Architecture

    Large Language Models
    – Prompts serve as the universal interface for users and developers
    – Paradigm shift: Natural language becomes a first-class citizen
    – Caveats: Non-determinism, hallucinations, prompt injection
  10. Storing model files locally

    Cache API
    Diagram labels: Internet, Website (HTML/JS), Cache with model files, Hugging Face
    Note: Due to the Same-Origin Policy, models cannot be shared across origins.
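The caching idea on this slide can be sketched as a cache-first lookup. This is a minimal illustration, not the code used by any particular library: `loadModelFile`, `CacheLike`, and the cache name in the usage comment are hypothetical. The cache and the fetch function are passed in so the logic also runs outside a browser; in a browser, the object returned by `caches.open()` and the global `fetch` fit these shapes.

```typescript
// Anything that can be duplicated before being stored (a Response qualifies).
interface Cloneable<R> {
  clone(): R;
}

// The subset of the browser Cache interface this sketch relies on.
interface CacheLike<R> {
  match(key: string): Promise<R | undefined>;
  put(key: string, value: R): Promise<void>;
}

// Hypothetical helper: return the cached file if present; otherwise download
// it, store a copy in the cache, and return the downloaded response.
async function loadModelFile<R extends Cloneable<R>>(
  url: string,
  cache: CacheLike<R>,
  fetchFn: (url: string) => Promise<R>,
): Promise<R> {
  const cached = await cache.match(url);
  if (cached !== undefined) {
    return cached; // cache hit: no network traffic
  }
  const response = await fetchFn(url);
  // Store a clone, because a response body can only be consumed once.
  await cache.put(url, response.clone());
  return response;
}

// Browser usage (assumed cache name and URL):
//   const cache = await caches.open("model-files");
//   const res = await loadModelFile("https://example.com/model.bin", cache, fetch);
```

The sketch does not check for failed downloads; a real implementation would likely verify the response status before caching it. Because the cache is bound to the page's origin, a second website would have to download the same model again, as the note on the slide points out.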
  11. WebAssembly (Wasm)

    – Bytecode for the web
    – Compile target for arbitrary languages
    – Can be faster than JavaScript
    – WebLLM uses a model-specific Wasm library to accelerate model computations
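As a small illustration of "bytecode for the web", the sketch below instantiates a hand-assembled Wasm module that exports a single `add` function and calls it from TypeScript. It is unrelated to the Wasm library WebLLM ships; real modules are produced by a compiler and usually loaded with `WebAssembly.instantiateStreaming`.

```typescript
// Minimal Wasm binary exporting add(a: i32, b: i32): i32.
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // magic number "\0asm"
  0x01, 0x00, 0x00, 0x00, // binary format version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // function section: one function of type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export "add" = function 0
  0x0a, 0x09, 0x01, 0x07, 0x00, // code section: one body, no locals
  0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // local.get 0, local.get 1, i32.add, end
]);

// Synchronous compilation is fine for a module this small.
const wasmModule = new WebAssembly.Module(wasmBytes);
const wasmInstance = new WebAssembly.Instance(wasmModule);

// Exports are untyped at the boundary, so the signature is asserted here.
const add = wasmInstance.exports.add as (a: number, b: number) => number;

console.log(add(2, 3)); // prints 5
```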
  12. WebGPU

    – Grants low-level access to the Graphics Processing Unit (GPU)
    – Near native performance for machine learning applications
    – Supported by Chromium-based browsers on Windows and macOS from version 113
  13. WebNN

    – Grants web apps access to the device’s CPU, GPU and Neural Processing Unit (NPU)
    – In specification by the WebML Working Group at W3C
    – Implementation in progress in Chromium (behind a flag)
    – Even better performance compared to WebGPU
    Source: https://webmachinelearning.github.io/webnn-intro/
    DEMO
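Since WebNN and WebGPU are not available everywhere, an app would typically feature-detect them before choosing how to run a model. The following is a hedged sketch: `pickBackend`, `Backend`, and `NavigatorLike` are hypothetical names, and the preference order (WebNN, then WebGPU, then Wasm) is an assumption based on the performance claims on these slides. The properties checked, `navigator.ml` and `navigator.gpu`, are the entry points defined by the WebNN and WebGPU specifications.

```typescript
type Backend = "webnn" | "webgpu" | "wasm";

// Only the two properties this sketch inspects; a browser's `navigator`
// object can be passed after a cast.
interface NavigatorLike {
  ml?: unknown; // WebNN entry point
  gpu?: unknown; // WebGPU entry point
}

// Hypothetical helper: pick the most capable backend that appears to exist.
function pickBackend(nav: NavigatorLike): Backend {
  if (nav.ml !== undefined) {
    return "webnn";
  }
  if (nav.gpu !== undefined) {
    return "webgpu";
  }
  return "wasm"; // assumed fallback: plain WebAssembly on the CPU
}

// Browser usage: pickBackend(navigator as NavigatorLike)
console.log(pickBackend({ gpu: {} })); // prints "webgpu"
```

The presence of a property only shows that the API is exposed. Requesting a GPU adapter or creating a WebNN context can still fail on a given device, so a real app should handle that case and fall back accordingly.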
  14. WebNN: Near-native inference performance

    Source: Intel. Browser: Chrome Canary 118.0.5943.0, DUT: Dell/Linux/i7-1260P, single p-core, Workloads: MediaPipe solution models (FP32, batch=1)
  15. Prompt API

    Diagram labels: Operating System, Website (HTML/JS), Browser, Internet, Apple Intelligence, Gemini Nano
  16. Prompt API

    – Part of Chrome’s Built-In AI initiative
    – Exploratory API for local experiments and use case determination
    – Downloads Gemini Nano into Google Chrome
    – Model is shared across origins
    – Uses native APIs directly
    – Related APIs: Translation API, Writing Assistance APIs
    https://developer.chrome.com/docs/ai/built-in
  17. Comparison

    Performance (tokens/sec)
    – WebLLM (Llama3-8b, M4): 45
    – Azure OpenAI (gpt-4o-mini): 33
    – Groq (Llama3-8b): 1200
    Sources: WebLLM/Groq: Own tests (14.11.2024), OpenAI/Azure OpenAI: https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/provisioned-throughput (18.07.2024)
  18. Stable Diffusion

    – Open-source text-to-image model
    – Generates 512x512px images from a prompt
    – WebSD: special version of Stable Diffusion for the web (2 GB in size)
    – No npm package this time
    Prompt: A guinea pig eating a watermelon
  19. Pros & Cons

    Local AI Models
    + Data does not leave the browser (privacy)
    + High availability (offline support)
    + Low latency
    + Stability (no external API changes)
    + Low cost
    – Lower quality
    – High system (RAM, GPU) and bandwidth requirements
    – Large model size, models cannot always be shared
    – Model initialization and inference are relatively slow
    – APIs are experimental
  20. Transformers.js

    Alternatives
    – Pre-trained, specialized, significantly smaller models beyond GenAI
    – JavaScript library to run Hugging Face transformers in the browser
    – Supports most of the models
    https://xenova.github.io/transformers.js/
  21. Summary

    – Cloud-based models remain the most powerful models
    – Due to their size and high system requirements, local generative AI models are currently rather interesting for very special scenarios (e.g., high privacy demands, offline availability)
    – Small, specialized models are an interesting alternative (if available)
    – Large language models are becoming more compact and efficient
    – Vendors start shipping AI models with their devices
    – Devices are becoming more powerful for running AI tasks
    – Experiment with the AI APIs and make your Angular App smarter!