
The AI Revolution in the Browser? Making Single-Page Apps Smarter

More and more developers want to integrate generative AI features into their applications. Until now, this path has almost always led to the cloud, but it doesn't have to be that way. There are several promising approaches to running AI models directly on the user's computer: Hugging Face, for example, makes it possible to use machine learning models directly in the browser with Transformers.js. The W3C's Web Neural Network API (WebNN), which is still in the specification phase, will grant such models access to the device's Neural Processing Unit (NPU). This will allow Large Language Models (LLMs) or Stable Diffusion models to run efficiently in the browser. The advantages of these approaches are clear: locally executed AI models are also available offline, user data does not leave the device, and, thanks to open-source models, all of this is free of charge. However, the model must first be transferred to the user's device, which must also be sufficiently powerful. In this talk, Christian Liebel, Thinktecture's representative at W3C, presents approaches to making your single-page app smarter. We will discuss use cases and show the advantages and disadvantages of each solution.

Christian Liebel

October 08, 2024

Transcript

  1. The AI Revolution in the Browser? Making Single-Page Apps Smarter

    Christian Liebel @christianliebel Consultant
  2. Hello, it’s me.

    Christian Liebel
    X: @christianliebel
    Email: christian.liebel@thinktecture.com
    Angular & PWA
    Slides: thinktecture.com/christian-liebel
  3. Generative AI: Overview

    – Text: OpenAI GPT, LLaMa, Vicuna, …
    – Images: Midjourney, DALL·E, Stable Diffusion, …
    – Audio/Music: Musico, Soundraw, …
    – Speech: OpenAI Whisper, tortoise-tts, …
  5. Generative AI Cloud Providers: Drawbacks

    – Require an active internet connection
    – Affected by network latency and server availability
    – Data is transferred to the cloud service
    – Require a subscription
    → Can we run models locally?
  6. Large Language Models: Size Comparison

    Model:Parameters    Size
    phi3:3b             2.2 GB
    mistral:7b          4.1 GB
    llama3:8b           4.7 GB
    gemma2:9b           5.4 GB
    gemma2:27b          16 GB
    llama3:70b          40 GB
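
To put these sizes into perspective, the following sketch estimates how long the initial download of each model would take. The 100 Mbit/s bandwidth, the decimal unit conversion (1 GB = 8000 megabits), and the function name `estimateDownloadSeconds` are assumptions made for illustration; they are not part of the talk.

```typescript
// Model sizes in GB, taken from the size comparison above.
const modelSizesGb: Record<string, number> = {
  "phi3:3b": 2.2,
  "mistral:7b": 4.1,
  "llama3:8b": 4.7,
  "gemma2:9b": 5.4,
  "gemma2:27b": 16,
  "llama3:70b": 40,
};

// Estimates the download time in seconds for a file of `sizeGb` gigabytes
// (decimal: 1 GB = 8000 megabits) over a link of `bandwidthMbps` Mbit/s.
function estimateDownloadSeconds(sizeGb: number, bandwidthMbps: number): number {
  if (bandwidthMbps <= 0) throw new RangeError("bandwidth must be positive");
  return (sizeGb * 8000) / bandwidthMbps;
}

// Assumed bandwidth for illustration only.
for (const [model, sizeGb] of Object.entries(modelSizesGb)) {
  const minutes = estimateDownloadSeconds(sizeGb, 100) / 60;
  console.log(`${model}: about ${minutes.toFixed(1)} min at 100 Mbit/s`);
}
```

Real download times also depend on server throughput and protocol overhead, so these numbers are only a rough lower bound.
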
  7. Cache API: Storing model files locally

    [Diagram: Website (HTML/JS), Internet, Hugging Face, Cache with model files]
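
A cache-first lookup for model files could be sketched as follows. This is a minimal illustration, not the code used in the talk: the interfaces `ResponseLike` and `CacheLike` and the function `getModelFile` are hypothetical names, modeled on the `match`/`put` methods of the browser's Cache API so that the logic can also run outside a browser.

```typescript
// Minimal structural types so the sketch does not depend on browser globals.
interface ResponseLike<T> {
  ok: boolean;
  clone(): T;
}
interface CacheLike<T> {
  match(url: string): Promise<T | undefined>;
  put(url: string, response: T): Promise<void>;
}

// Returns the model file from the cache if present; otherwise downloads it
// once, stores a copy in the cache, and returns the fresh response.
async function getModelFile<T extends ResponseLike<T>>(
  url: string,
  cache: CacheLike<T>,
  fetchFn: (url: string) => Promise<T>,
): Promise<T> {
  const cached = await cache.match(url);
  if (cached !== undefined) return cached;

  const response = await fetchFn(url);
  if (!response.ok) throw new Error(`Download failed: ${url}`);
  // Store a clone, because a response body can only be consumed once.
  await cache.put(url, response.clone());
  return response;
}
```

In a browser, the cache argument would typically come from `await caches.open("models")` (the cache name is an assumption) and `fetchFn` would be the global `fetch`.
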
  8. WebAssembly (Wasm)

    – Bytecode for the web
    – Compile target for arbitrary languages
    – Can be faster than JavaScript
    – WebLLM needs the model and a Wasm library to accelerate model computations
  9. WebGPU

    – Grants low-level access to the Graphics Processing Unit (GPU)
    – Near native performance for machine learning applications
    – Supported by Chromium-based browsers on Windows and macOS from version 113
  10. Outlook: WebNN

    – Grants web applications access to the Neural Processing Unit (NPU) of the system via platform-specific machine learning services (e.g., ML Compute on macOS/iOS, DirectML on Windows, …)
    – Even better performance compared to WebGPU
    – Currently in specification by the WebML Working Group at W3C
    – Implementation in progress for Chromium-based browsers
    https://webmachinelearning.github.io/webnn-intro/
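
Given the three execution options described so far (Wasm, WebGPU, WebNN), an app might pick a backend by feature detection. The sketch below assumes that the entry points are exposed as `navigator.gpu` (WebGPU) and `navigator.ml` (WebNN); `NavigatorLike` and `pickBackend` are hypothetical names, and the preference order simply follows the performance claims on the slides.

```typescript
type Backend = "webnn" | "webgpu" | "wasm";

// Structural stand-in for the browser's `navigator` object; only the two
// entry points used for feature detection are modeled.
interface NavigatorLike {
  ml?: unknown; // WebNN entry point (navigator.ml)
  gpu?: unknown; // WebGPU entry point (navigator.gpu)
}

// Prefers WebNN, then WebGPU, and falls back to plain WebAssembly.
function pickBackend(nav: NavigatorLike): Backend {
  if (nav.ml !== undefined) return "webnn";
  if (nav.gpu !== undefined) return "webgpu";
  return "wasm";
}

console.log(pickBackend({})); // "wasm"
console.log(pickBackend({ gpu: {} })); // "webgpu"
console.log(pickBackend({ ml: {}, gpu: {} })); // "webnn"
```

The presence of a property does not guarantee usable hardware. A real app would, for example, also await `navigator.gpu.requestAdapter()` and fall back if it resolves to `null`.
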
  11. WebNN: near-native inference performance

    Source: Intel. Browser: Chrome Canary 118.0.5943.0, DUT: Dell/Linux/i7-1260P, single p-core, Workloads: MediaPipe solution models (FP32, batch=1)
  12. WebLLM: Caveats

    – Due to the Same-Origin Policy, models can’t be shared across origins (e.g., https://example.org cannot access https://test.example.org).
    – Downloading LLMs multiple times leads to very high storage consumption.
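
The origin comparison behind this caveat can be illustrated with the standard `URL` class. The helper name `isSameOrigin` is an assumption; the sketch only shows why a subdomain counts as a different origin and therefore needs its own copy of a cached model.

```typescript
// Two URLs share an origin only if scheme, host, and port all match.
function isSameOrigin(a: string, b: string): boolean {
  return new URL(a).origin === new URL(b).origin;
}

console.log(isSameOrigin("https://example.org", "https://test.example.org")); // false
console.log(isSameOrigin("https://example.org/app", "https://example.org/models")); // true
```
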
  13. Prompt API

    [Diagram: Website (HTML/JS), Browser, Internet, Operating System, Apple Intelligence, Gemini Nano]
  14. Prompt API

    Part of Chrome’s Built-In AI initiative
    – Exploratory API for local experiments and use case determination
    – Downloads Gemini Nano into Google Chrome
    – Model can be shared across origins
    – Uses native APIs directly
    – Fine-tuning API might follow in the future
    https://developer.chrome.com/docs/ai/built-in
  15. Prompt API: Smart Form Filler (demo)
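
The transcript does not include the demo's code. As a rough idea of how a form filler on top of a prompt-style API might work, here is a hypothetical sketch. Because the Prompt API is exploratory and its surface may change, the code only depends on a minimal `PromptSession` interface, which is an assumption, as are the function name `fillForm` and the prompt wording.

```typescript
// Hypothetical stand-in for a Prompt API session: anything that turns a
// prompt string into a completion string.
interface PromptSession {
  prompt(input: string): Promise<string>;
}

// Asks the model to extract values for the given form fields from free text
// and returns only the fields that came back as strings.
async function fillForm(
  session: PromptSession,
  fieldNames: string[],
  text: string,
): Promise<Record<string, string>> {
  const request =
    `Extract the following fields from the text and answer with a JSON object only. ` +
    `Fields: ${fieldNames.join(", ")}.\nText: ${text}`;
  const answer = await session.prompt(request);

  // Models often wrap JSON in extra prose, so take the outermost braces.
  const start = answer.indexOf("{");
  const end = answer.lastIndexOf("}");
  if (start === -1 || end <= start) return {};

  let parsed: unknown;
  try {
    parsed = JSON.parse(answer.slice(start, end + 1));
  } catch {
    return {}; // Not valid JSON: leave the form untouched.
  }
  if (typeof parsed !== "object" || parsed === null) return {};

  const result: Record<string, string> = {};
  for (const name of fieldNames) {
    const value = (parsed as Record<string, unknown>)[name];
    if (typeof value === "string") result[name] = value;
  }
  return result;
}
```

Validating the model output before writing it into form controls matters here, since a small local model may return malformed or incomplete JSON.
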
  16. Built-in AI: Additional APIs

    – Prompt API
        – Assistant
    – Translator API
        – Translator
        – Language Detector
    – Writing Assistance APIs
        – Summarizer
        – Writer
        – Rewriter
  17. Performance: Comparison

    Model (hardware/provider)    Tokens/sec
    WebLLM (Mistral-7b, M1)      22.98
    WebLLM (Mistral-7b, M3)      33.96
    OpenAI (GPT-4)               19.08
    Azure OpenAI (GPT-4)         38.75
    Groq (Mixtral-8x7b)          564.63

    WebLLM/Groq: Own tests (23.03.2024), OpenAI/Azure OpenAI: https://mcplusa.com/comparing-performance-of-openai-gpt-4-and-microsoft-azure-gpt-4/ (31.08.2023)
  18. Stable Diffusion

    – Text-to-image model
    – Generates 512x512px images from a prompt
    – Runs on “commodity” hardware (with 8 GB VRAM)
    – Open-source
    Prompt: A guinea pig eating a watermelon
  19. Web Stable Diffusion

    – Specialized version of the Stable Diffusion model for the web
    – 2 GB in size
    – Subject to usage conditions: https://huggingface.co/runwayml/stable-diffusion-v1-5#uses
    – No npm package this time
    – Currently incompatible with Angular & esbuild due to Wasm imports
  20. Local AI Models: Advantages

    – Data does not leave the browser
    – High availability (offline support)
    – Low latency
    – Stability (external API changes)
    – Low cost
  21. Local AI Models: Disadvantages

    – Lower quality than closed-source models
    – High system requirements (RAM, GPU)
    – Large model size, high initial bandwidth requirements, models cannot be shared across origins
    – Model initialization and inference are relatively slow
    – WebGPU and WebNN are currently only supported by Chromium-based browsers on macOS and Windows (WebNN only behind a flag)
    – Prompt API is only an exploratory API
  22. Summary

    – Cloud-based models (especially OpenAI/GPT) remain the most potent models and are easier to integrate (for now)
    – Due to their size and high system requirements, local generative AI models are currently rather interesting for very special scenarios (e.g., high privacy demands, offline availability)
    – Small, specialized models are an interesting alternative (if available)
    – Open-source GenAI models are becoming more compact and efficient
    – Vendors are beginning to ship AI models with their devices
    – Devices are becoming more powerful for AI tasks