The AI Revolution in the Browser? Making Single-Page Apps Smarter

More and more developers intend to integrate Generative AI features into their applications. Until now, this path has practically always led to the cloud, but it doesn't have to. There are currently several promising approaches to running AI models directly on the user's computer: Hugging Face, for example, makes it possible to use machine learning models directly in the browser with Transformers.js. The W3C's Web Neural Network API (WebNN), which is still in the specification phase, will grant such models access to the device's Neural Processing Unit (NPU). This will allow Large Language Models (LLMs) or Stable Diffusion models to run efficiently in the browser. The advantages of these approaches are obvious: locally executed AI models are also available offline, user data does not leave the device, and all of this is free of charge thanks to open-source models. Of course, the model must first be transferred to the user's device, which must also be sufficiently powerful. In this talk, Christian Liebel, Thinktecture's representative at W3C, presents approaches to making your single-page app smarter. We will discuss use cases and show the advantages and disadvantages of each solution.

Christian Liebel

April 11, 2024

Transcript

  1. The AI Revolution in the Browser? Making Single-Page Apps Smarter

    Christian Liebel @christianliebel Consultant
  2. Hello, it’s me.

    Christian Liebel
    X: @christianliebel
    Email: christian.liebel@thinktecture.com
    Angular & PWA
    Slides: thinktecture.com/christian-liebel
  3. Single-Page Applications: Run locally on the user’s system

    [Diagram labels: web server (HTML, JS, CSS, assets); web browser with SPA (client logic, three HTML/CSS views); server logic, Web APIs, push service, DBs; connections via HTTPS and WebSockets]
  4. Progressive Web Apps: Make SPAs offline-capable

    [Diagram labels: Website (HTML/JS), Service Worker, Cache, Internet, fetch]
  5. Generative AI: Overview

    – Text: OpenAI GPT, LLaMa, Vicuna, …
    – Images: Midjourney, DALL·E, Stable Diffusion, …
    – Audio/Music: Musico, Soundraw, …
    – Speech: OpenAI Whisper, tortoise-tts, …
  7. Generative AI Cloud Providers: Drawbacks

    – Require an active internet connection
    – Affected by network latency and server availability
    – Data is transferred to the cloud service
    – Require a subscription
    → Can we run models locally?
  8. Large Language Models

    Large: Trained on lots of data
    Language: Process and generate text
    Models: Programs/neural networks
    Examples:
    – GPT (ChatGPT, Bing Chat, …)
    – Gemini, Gemma (Google)
    – LLaMa (Meta AI)
  9. Large Language Models

    Token: A meaningful unit of text (e.g., a word, a part of a word, a character).
    Context window: The maximum number of tokens the model can process.
    Parameters/weights: Internal variables learned during training, used to make predictions.
  10. Large Language Models

    Prompts serve as the universal interface
    – Unstructured text conveying specific semantics
    Paradigm shift in software architecture
    – Natural language becomes a first-class citizen
    Caveats
    – Non-determinism and hallucination, prompt injections
  11. Large Language Models: Size Comparison

    Model:Parameters   Size
    mistral:7b         4.1 GB
    vicuna:7b          3.8 GB
    llama2:7b          3.8 GB
    llama2:13b         7.4 GB
    llama2:70b         39.0 GB
    zephyr:7b          4.1 GB
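The arithmetic behind these download sizes can be sketched as parameters × bits per weight ÷ 8. The slide does not state how the models were quantized, so the 4-bit value used below is an assumption, and `estimateModelSizeGB` is a hypothetical helper. Real model files are typically somewhat larger than this estimate.

```typescript
// Rough size of a model's weights in gigabytes (1 GB = 1e9 bytes).
// `bitsPerWeight` is an assumption about the quantization, not a value
// taken from the slide.
function estimateModelSizeGB(parameterCount: number, bitsPerWeight: number): number {
  const bytes = (parameterCount * bitsPerWeight) / 8;
  return bytes / 1e9;
}

// A 7B model at an assumed 4 bits per weight needs about 3.5 GB,
// at 16 bits per weight about 14 GB.
const sevenBillionAt4Bit = estimateModelSizeGB(7e9, 4);
const sevenBillionAt16Bit = estimateModelSizeGB(7e9, 16);
```

Under this assumption, the 3.8 to 4.1 GB listed for the 7B models is in the range of roughly 4-bit weights, far below what unquantized 16-bit weights would require.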
  12. Choosing a model: Benchmarks

    Selection of available models for WebLLM:
    – LLaMa-2 7B Chat
    – LLaMa-2 13B Chat
    – Mistral 7B Instruct
    – Gemma 2B IT

    https://medium.com/@kagglepro.llc/gemma-vs-llama-vs-mistral-a-comparative-analysis-with-a-coding-twist-8eb4d849e4d5
  13. Cache API: Storing model files locally

    [Diagram labels: Website (HTML/JS), Cache with model files, Internet, Hugging Face]
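The caching idea on this slide can be sketched as "look in the cache first, download only on a miss". This is a minimal sketch, not the code WebLLM uses: `loadModelFile`, `ModelCache`, and `ResponseLike` are hypothetical names, and the small interfaces exist only so the example type-checks outside a browser. In a browser, the `Cache` object returned by `caches.open()` and the global `fetch` fit these shapes.

```typescript
// Minimal structural types (hypothetical) mirroring the parts of the
// browser's Response and Cache objects that this sketch needs.
interface ResponseLike<R> {
  ok: boolean;
  clone(): R;
}

interface ModelCache<R> {
  match(url: string): Promise<R | undefined>;
  put(url: string, response: R): Promise<void>;
}

// Returns the cached response if present; otherwise downloads the file,
// stores a copy in the cache, and returns the fresh response.
async function loadModelFile<R extends ResponseLike<R>>(
  url: string,
  cache: ModelCache<R>,
  fetchFn: (url: string) => Promise<R>,
): Promise<R> {
  const cached = await cache.match(url);
  if (cached !== undefined) {
    return cached;
  }
  const response = await fetchFn(url);
  if (!response.ok) {
    throw new Error(`Failed to download ${url}`);
  }
  // A response body can only be read once, so a clone goes into the cache.
  await cache.put(url, response.clone());
  return response;
}
```

In a browser, a call could look like `loadModelFile(url, await caches.open("model-files"), (u) => fetch(u))`, where the cache name `"model-files"` is an arbitrary choice.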
  14. WebAssembly (Wasm)

    – Bytecode for the web
    – Compile target for arbitrary languages
    – Can be faster than JavaScript
    – WebLLM needs the model and a Wasm library to accelerate model computations
  15. WebGPU

    – Grants low-level access to the Graphics Processing Unit (GPU)
    – Near-native performance for machine learning applications
    – Supported by Chromium-based browsers on Windows and macOS from version 113
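Because WebGPU is only available in some browsers, an app would typically check for it before loading a model. The sketch below relies on `navigator.gpu.requestAdapter()`, which resolves to `null` when no suitable adapter exists. `isWebGpuAvailable`, `NavigatorLike`, and `GpuLike` are hypothetical names introduced so the example runs outside a browser.

```typescript
// Hypothetical minimal shapes of the parts of `navigator` used here.
interface GpuLike {
  requestAdapter(): Promise<object | null>;
}

interface NavigatorLike {
  gpu?: GpuLike;
}

// Resolves to true only if the WebGPU entry point exists and an adapter
// can actually be obtained.
async function isWebGpuAvailable(nav: NavigatorLike): Promise<boolean> {
  if (nav.gpu === undefined) {
    return false;
  }
  const adapter = await nav.gpu.requestAdapter();
  return adapter !== null;
}
```

In a browser, this could be called as `isWebGpuAvailable(navigator as NavigatorLike)`; the cast is needed when the TypeScript DOM typings in use do not declare `navigator.gpu`.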
  16. Outlook: WebNN

    – Grants web applications access to the Neural Processing Unit (NPU) of the system via platform-specific machine learning services (e.g., ML Compute on macOS/iOS, DirectML on Windows, …)
    – Even better performance compared to WebGPU
    – Currently in specification by the WebML Working Group at W3C
    – Implementation in progress for Chromium-based browsers

    https://webmachinelearning.github.io/webnn-intro/
  17. WebNN: near-native inference performance

    Source: Intel. Browser: Chrome Canary 118.0.5943.0, DUT: Dell/Linux/i7-1260P, single p-core, Workloads: MediaPipe solution models (FP32, batch=1)
  18. Chat with data: Concept and limitations

    – The todo data has to be converted into natural language.
    – For the sake of simplicity, we will add all todos to the prompt.
    – Remember: LLMs have a context window (Mistral-7B: 8K).
    – If you need to chat with larger sets of text, refer to Retrieval Augmented Generation (RAG).

    These are the todos:
    * Wash clothes
    * Pet the dog
    * Take out the trash
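Converting the todo data into natural language could look roughly like this. The function names are hypothetical, the "about 4 characters per token" rule is a crude assumption rather than a real tokenizer, and reading "8K" as 8192 tokens is also an assumption.

```typescript
// Turns todo titles into the bullet-list text shown on the slide.
function todosToPrompt(todos: string[]): string {
  const items = todos.map((todo) => `* ${todo}`);
  return ["These are the todos:", ...items].join("\n");
}

// Very rough heuristic (assumption): about 4 characters per token.
// A real tokenizer will give different counts.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Checks whether the estimated prompt size stays within the context window.
function fitsContextWindow(prompt: string, contextWindowTokens: number): boolean {
  return estimateTokens(prompt) <= contextWindowTokens;
}

const todoPrompt = todosToPrompt(["Wash clothes", "Pet the dog", "Take out the trash"]);
// Assumes "8K" means 8192 tokens.
const promptFits = fitsContextWindow(todoPrompt, 8192);
```

Once a check like this fails for real data, adding all todos to the prompt no longer works and a retrieval step (RAG) is the option the slide points to.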
  19. Chat with data: System prompt

    Metaprompt that defines…
    – character
    – capabilities/limitations
    – output format
    – behavior
    – grounding data

    Hallucinations and prompt injections cannot be eliminated.

    You are a helpful assistant. Answer user questions on todos. Generate a valid JSON object. Avoid negative content. These are the user’s todos: …
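Since the system prompt only asks for a valid JSON object and the model may still answer in plain text, the app should treat the output as untrusted. A minimal sketch of such a defensive parse follows; `parseModelJson` is a hypothetical helper.

```typescript
// Returns the parsed object, or null if the model's output is not a
// JSON object (for example plain prose, an array, or truncated JSON).
function parseModelJson(raw: string): Record<string, unknown> | null {
  try {
    const value: unknown = JSON.parse(raw);
    if (typeof value === "object" && value !== null && !Array.isArray(value)) {
      return value as Record<string, unknown>;
    }
    return null;
  } catch {
    return null;
  }
}
```

A `null` result can then trigger a retry or a fallback message instead of letting malformed output reach the rest of the app.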
  20. Chat with data: Flow

    System message
    • The user has these todos: 1. … 2. … 3. …
    User message
    • How many todos do I have?
    Assistant message
    • You have three todos.
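The flow above can be sketched as a list of role-tagged messages. The `role`/`content` shape follows the widely used OpenAI-style chat format, which is an assumption here, since the slide does not show the actual API call. `buildConversation` is a hypothetical helper.

```typescript
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

// Builds the system message with the numbered todos, followed by the
// user's question. The assistant message is what the model produces.
function buildConversation(todos: string[], question: string): ChatMessage[] {
  const numbered = todos.map((todo, index) => `${index + 1}. ${todo}`).join("\n");
  return [
    { role: "system", content: `The user has these todos:\n${numbered}` },
    { role: "user", content: question },
  ];
}
```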
  21. Prompt Engineering: Techniques

    – Providing examples (single shot, few shot, …)
    – Priming outputs
    – Specify output structure
    – Repeating instructions
    – Chain of thought
    – …

    Success also depends on the model.

    https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/advanced-prompt-engineering
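As an illustration of the first technique, few-shot prompting can be sketched by placing example question/answer pairs in front of the real question. The message shape (OpenAI-style `role`/`content`) and all names below are assumptions for illustration.

```typescript
interface PromptMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface FewShotExample {
  question: string;
  answer: string;
}

// Inserts each example as a user/assistant pair between the system
// prompt and the actual question, so the model can imitate the pattern.
function withFewShotExamples(
  systemPrompt: string,
  examples: FewShotExample[],
  question: string,
): PromptMessage[] {
  const messages: PromptMessage[] = [{ role: "system", content: systemPrompt }];
  for (const example of examples) {
    messages.push({ role: "user", content: example.question });
    messages.push({ role: "assistant", content: example.answer });
  }
  messages.push({ role: "user", content: question });
  return messages;
}
```

With one example this corresponds to single-shot prompting, with several to few-shot prompting.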
  22. Prompt Engineering: Alternatives

    [Diagram: Prompt Engineering, Retrieval Augmented Generation, Fine-tuning, Custom model, arranged along an "Effort" axis]
  23. Performance: Comparison

    Tokens/sec:
    – WebLLM (Mistral-7b, M1): 22.98
    – WebLLM (Mistral-7b, M3): 33.96
    – OpenAI (GPT-4): 19.08
    – Azure OpenAI (GPT-4): 38.75
    – Groq (Mixtral-8x7b): 564.63

    WebLLM/Groq: Own tests (23.03.2024), OpenAI/Azure OpenAI: https://mcplusa.com/comparing-performance-of-openai-gpt-4-and-microsoft-azure-gpt-4/ (31.08.2023)
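To make the throughput figures more tangible, they can be converted into the time needed for an answer of a given length. The sketch assumes the five values on the slide belong to the five labels in the order listed, and the 500-token answer length is an arbitrary example value.

```typescript
// Tokens per second, assuming the values map to the labels in order.
const tokensPerSecond: Record<string, number> = {
  "WebLLM (Mistral-7b, M1)": 22.98,
  "WebLLM (Mistral-7b, M3)": 33.96,
  "OpenAI (GPT-4)": 19.08,
  "Azure OpenAI (GPT-4)": 38.75,
  "Groq (Mixtral-8x7b)": 564.63,
};

// Seconds needed to generate `tokens` tokens at the given rate.
function secondsToGenerate(tokens: number, rate: number): number {
  return tokens / rate;
}

// A 500-token answer: roughly 21.8 s with WebLLM on the M1,
// and under one second at the Groq rate.
const webLlmM1Seconds = secondsToGenerate(500, tokensPerSecond["WebLLM (Mistral-7b, M1)"]);
const groqSeconds = secondsToGenerate(500, tokensPerSecond["Groq (Mixtral-8x7b)"]);
```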
  24. Stable Diffusion

    – Text-to-image model
    – Generates 512x512px images from a prompt
    – Runs on “commodity” hardware (with 8 GB VRAM)
    – Open-source

    Prompt: A guinea pig eating a watermelon
  25. Web Stable Diffusion

    – Specialized version of the Stable Diffusion model for the web
    – 2 GB in size
    – Subject to usage conditions: https://huggingface.co/runwayml/stable-diffusion-v1-5#uses
    – No npm package this time
    – Currently incompatible with Angular & esbuild due to Wasm imports
  26. Web Stable Diffusion: Live Demo

    Retrofitting AI image generation into an existing drawing application (https://paint.js.org)
  27. Local AI Models: Advantages

    – Data does not leave the browser
    – High availability (offline support)
    – Low latency
    – Stability (external API changes)
    – Low cost
  28. Local AI Models: Disadvantages

    – Lower quality than closed-source models
    – High system requirements (RAM, GPU)
    – Large model size, high initial bandwidth requirements, models cannot be shared across origins
    – Model initialization and inference are relatively slow
    – WebGPU is currently only supported by Chromium-based browsers on macOS and Windows; WebNN is not available yet
  29. Alternatives: Transformers.js

    – JavaScript library to run Hugging Face transformers in the browser
    – Supports most of the models

    https://xenova.github.io/transformers.js/
  30. Summary

    – Cloud-based models (especially OpenAI/GPT) remain the most potent models and are easier to integrate (for now)
    – Due to their size and high system requirements, local generative AI models are currently interesting mainly for very specific scenarios (e.g., high privacy demands, offline availability)
    – Small, specialized models are an interesting alternative (if available)
    – Open-source generative AI models are advancing rapidly and becoming more compact and efficient
    – Computers are getting more powerful