WebNN: Die AI-Revolution im Browser?

Many developers currently want to bring AI features into their applications. So far, that path has always led to the cloud. Does it have to be that way? No, says the W3C with its new Web Neural Network API (WebNN). Many current computers have a Neural Processing Unit (NPU) that makes it possible to run AI models such as Large Language Models (LLMs) or Stable Diffusion models efficiently on the user's own hardware. WebNN, in turn, is intended to give developers efficient access to the NPU. This lets developers take full advantage of AI technologies and build forward-looking applications directly for the browser: offline-capable, fully local, and possibly even completely free of charge.
In this conference talk, Christian Liebel presents how the WebNN API works, discusses possible use cases, and points out the limitations of the approach. He represents Thinktecture at the W3C, which is responsible for the WebNN API specification.

Christian Liebel

February 14, 2024

Transcript

  1. Hello, it’s me.

    WebNN: Die AI-Revolution im Browser?
    Christian Liebel
    X: @christianliebel
    Email: christian.liebel@thinktecture.com
    Angular & PWA
    Slides: thinktecture.com/christian-liebel
  2. Expectations

    What to expect:
    – Focus on web app development
    – Focus on Generative AI
    – Up-to-date insights: the ML/AI field is evolving fast
    – Live demos on real hardware
    What not to expect:
    – Deep dive into AI specifics
    – Stable libraries or specifications
  3. Single-Page Applications: Run locally on the user’s system

    [Diagram labels: web browser with SPA (client logic, three HTML/CSS views); web server (HTML, JS, CSS, assets); server logic with Web API and DBs; push service with Web API; connections via HTTPS and WebSockets.]
  4. Progressive Web Apps: Make SPAs offline-capable

    [Diagram labels: website (HTML/JS), Service Worker, cache, fetch, internet.]
  5. Generative AI: Overview

    Speech: OpenAI Whisper, tortoise-tts, …
    Images: Midjourney, DALL·E, Stable Diffusion, …
    Audio/Music: Musico, Soundraw, …
    Text: OpenAI GPT, LLaMa, Vicuna, …
  7. Generative AI Cloud Providers: Drawbacks

    – Require an active internet connection
    – Affected by network latency and server availability
    – Data is transferred to the cloud service
    – Require a subscription
    → Can we run models locally?
  8. Large Language Models

    Large: Trained on lots of data
    Language: Process and generate text
    Models: Programs/neural networks
    Examples:
    – GPT (ChatGPT, Bing Chat, …)
    – Gemini (Google)
    – LLaMa (Meta AI)
  9. Large Language Models

    Token: A meaningful unit of text (e.g., a word, a part of a word, a character).
    Context Window: The maximum number of tokens the model can process.
    Parameters/weights: Internal variables learned during training, used to make predictions.
  10. Large Language Models

    Prompts serve as the universal interface: Unstructured text conveying specific semantics
    Paradigm shift in software architecture: Human language becomes a first-class citizen
    Caveats: Non-determinism and hallucination, prompt injections
  11. Large Language Models: Size Comparison

    Model:Parameters   Size
    mistral:7b         4.1 GB
    vicuna:7b          3.8 GB
    llama2:7b          3.8 GB
    llama2:13b         7.4 GB
    llama2:70b         39.0 GB
    zephyr:7b          4.1 GB
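
The sizes in the table can be sanity-checked with simple arithmetic: file size is roughly the parameter count times the bits stored per weight. The sketch below assumes about 4.5 bits per weight. That figure is an assumption about how such model files are commonly quantized and is not stated on the slide; `estimateModelSizeGB` is a hypothetical helper.

```typescript
// Rough on-disk size of a model: parameters x bits per weight, converted to GB.
// ASSUMPTION: the default of 4.5 bits per weight is a guess at a typical
// 4-bit quantization including some overhead; the slide does not state it.
function estimateModelSizeGB(parameters: number, bitsPerWeight: number = 4.5): number {
  const bytes = (parameters * bitsPerWeight) / 8;
  return bytes / 1e9; // decimal gigabytes
}

// Nominal parameter counts read off the model tags on the slide.
const nominalParameters: Array<[string, number]> = [
  ["llama2:7b", 7e9],
  ["llama2:13b", 13e9],
  ["llama2:70b", 70e9],
];

for (const [model, parameters] of nominalParameters) {
  console.log(`${model}: ~${estimateModelSizeGB(parameters).toFixed(1)} GB`);
}
```

Under this assumption the estimates land within a few percent of the listed sizes, which is why even a "small" 7b model means a multi-gigabyte download.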
  12. Cache API: Storing model files locally

    [Diagram labels: website (HTML/JS), cache with model files, internet, Hugging Face.]
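
A minimal sketch of the cache-first loading this slide describes, assuming the model is a single file fetched by URL. The cache name `model-files` and the helper `loadModelFile` are hypothetical; the `open`, `match`, and `put` calls mirror the browser's Cache API. The small interfaces exist only so that the sketch type-checks outside a browser.

```typescript
// Structural subsets of Response, Cache, and CacheStorage used by this sketch.
interface ResponseLike {
  ok: boolean;
  status: number;
  clone(): ResponseLike;
  arrayBuffer(): Promise<ArrayBuffer>;
}
interface CacheLike {
  match(url: string): Promise<ResponseLike | undefined>;
  put(url: string, response: ResponseLike): Promise<void>;
}
interface CacheStorageLike {
  open(name: string): Promise<CacheLike>;
}

// Return the model bytes from the cache, downloading them only on a miss.
async function loadModelFile(
  url: string,
  cacheStorage: CacheStorageLike,
  fetchFn: (url: string) => Promise<ResponseLike>,
): Promise<ArrayBuffer> {
  const cache = await cacheStorage.open("model-files"); // hypothetical cache name
  const cached = await cache.match(url);
  if (cached) {
    return cached.arrayBuffer(); // cache hit: no network traffic
  }
  const response = await fetchFn(url);
  if (!response.ok) {
    throw new Error(`Download failed with status ${response.status}`);
  }
  // A response body can be read only once, so store a clone.
  await cache.put(url, response.clone());
  return response.arrayBuffer();
}
```

In a browser, the call would look like `loadModelFile(modelUrl, caches, fetch)`, where `modelUrl` points at the model file.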
  13. WebAssembly (Wasm)

    – Bytecode for the web
    – Compile target for arbitrary languages
    – Can be faster than JavaScript
    – WebLLM needs the model and a Wasm library to accelerate model computations
  14. WebGPU

    – Grants low-level access to the Graphics Processing Unit (GPU)
    – Near native performance for machine learning applications
    – Supported by Chromium-based browsers on Windows and macOS from version 113
  15. Outlook: WebNN

    – Grants web applications access to the Neural Processing Unit (NPU) of the system via platform-specific machine learning services (e.g., ML Compute on macOS/iOS, DirectML on Windows, …)
    – Even better performance when compared to WebGPU
    – Currently in specification by the WebML Working Group at W3C
    – Implementation in progress for Chromium-based browsers
    https://webmachinelearning.github.io/webnn-intro/
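
Because WebNN is still being specified and WebGPU is not available everywhere, an application would likely need to detect what the browser offers. The following is a minimal sketch that only checks for the `navigator.ml` (WebNN) and `navigator.gpu` (WebGPU) entry points. The `pickBackend` helper, the `Backend` type, and the preference order are assumptions made for illustration, loosely following the performance ranking given on the slides.

```typescript
// Only the properties this sketch inspects; a real Navigator has many more.
interface NavigatorLike {
  ml?: unknown;  // WebNN entry point, where implemented
  gpu?: unknown; // WebGPU entry point, where implemented
}

type Backend = "webnn" | "webgpu" | "wasm";

// Pick the most capable backend that appears to be present.
// Presence of the property is only a hint: creating a WebNN context or
// requesting a GPU adapter can still fail at runtime.
function pickBackend(nav: NavigatorLike): Backend {
  if (nav.ml !== undefined) return "webnn";
  if (nav.gpu !== undefined) return "webgpu";
  return "wasm"; // assumed fallback: plain WebAssembly on the CPU
}
```

In a browser this would be called as `pickBackend(navigator as NavigatorLike)`; a production check would go on to actually request a context or adapter before committing to a backend.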
  16. WebNN: near-native inference performance

    Source: Intel. Browser: Chrome Canary 118.0.5943.0, DUT: Dell/Linux/i7-1260P, single p-core, Workloads: MediaPipe solution models (FP32, batch=1)
  17. Large Language Models: Live Demo

    Add a “copilot” to a todo application using the @mlc-ai/web-llm package. For the sake of simplicity, all TODOs are added to the prompt. Remember: LLMs have a context window. If you need to chat with a larger set of text (including documents), please refer to Retrieval Augmented Generation (RAG).
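
The idea of putting all TODOs into the prompt while respecting the context window can be sketched as follows. This is not the demo's code: the `Todo` shape, the prompt wording, and the four-characters-per-token estimate are assumptions made for illustration, and the call into @mlc-ai/web-llm is deliberately left out.

```typescript
interface Todo {
  title: string;
  done: boolean;
}

// ASSUMPTION: about 4 characters per token is a crude rule of thumb,
// not a real tokenizer.
const APPROX_CHARS_PER_TOKEN = 4;

// Build a prompt that lists the todos, stopping before the estimated
// token budget of the context window is exceeded.
function buildTodoPrompt(todos: Todo[], question: string, maxTokens: number): string {
  const header = "You are a helpful assistant for a todo application.\nThe user's todos:\n";
  const footer = `\nQuestion: ${question}\nAnswer:`;
  const budget = maxTokens * APPROX_CHARS_PER_TOKEN - header.length - footer.length;

  const lines: string[] = [];
  let used = 0;
  for (const todo of todos) {
    const line = `- [${todo.done ? "x" : " "}] ${todo.title}\n`;
    if (used + line.length > budget) break; // remaining todos do not fit
    lines.push(line);
    used += line.length;
  }
  return header + lines.join("") + footer;
}
```

Silently dropping todos that do not fit is the simplest policy; for larger data sets, the retrieval-based approach mentioned on the slide selects the relevant entries instead.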
  18. Stable Diffusion

    – Text-to-image model
    – Generates 512x512px images from a prompt
    – Runs on “commodity” hardware (with 8 GB VRAM)
    – Open-source
  19. Web Stable Diffusion

    – Specialized version of the Stable Diffusion model for the web
    – 2 GB in size
    – Subject to usage conditions: https://huggingface.co/runwayml/stable-diffusion-v1-5#uses
    – No npm package this time
  20. Web Stable Diffusion: Live Demo

    Retrofitting AI image generation into an existing drawing application (https://paint.js.org)
  21. Local AI Models: Advantages

    – Data does not leave the browser
    – High availability (offline support)
    – Low latency
    – Stability (external API changes)
    – Low cost
  22. Local AI Models: Disadvantages

    – High system requirements (RAM, GPU)
    – High bandwidth requirements (large model size)
    – Inference relatively slow
    – WebGPU is only supported by Chromium-based browsers
    – WebNN is not available yet
    – Loading the model takes time
    – Models cannot be shared across origins
    – Higher-quality models such as GPT are closed-source
  23. Local AI Models: Mitigations

    Download model in the background if the user is not on a metered connection
    Helpful APIs:
    – Network Information API to estimate the network quality/determine data saver (negative standards position by Apple and Mozilla)
    – Storage Manager API to estimate the available free disk space
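
The two APIs named above could be combined into a single download decision, roughly as sketched below. The helper `shouldDownloadModel` and its policy are assumptions; the field names follow `navigator.connection` (`saveData`, `effectiveType`) and the result of `navigator.storage.estimate()` (`quota`, `usage`). Since the Network Information API is not available in all browsers, the connection argument may be `undefined`.

```typescript
// Subset of the Network Information API object (navigator.connection).
interface ConnectionLike {
  saveData?: boolean;
  effectiveType?: string; // "slow-2g" | "2g" | "3g" | "4g"
}

// Subset of the object returned by navigator.storage.estimate().
interface StorageEstimateLike {
  quota?: number; // bytes the origin may use
  usage?: number; // bytes the origin already uses
}

// Decide whether to start a background download of a model.
// ASSUMED policy: skip when data saver is on, when the connection looks
// slow, or when the remaining quota is smaller than the model.
function shouldDownloadModel(
  connection: ConnectionLike | undefined,
  estimate: StorageEstimateLike,
  modelBytes: number,
): boolean {
  if (connection?.saveData) return false;
  if (connection?.effectiveType === "slow-2g" || connection?.effectiveType === "2g") {
    return false;
  }
  const free = (estimate.quota ?? 0) - (estimate.usage ?? 0);
  return free >= modelBytes;
}
```

When the connection object is missing, this sketch goes ahead with the download; a more cautious application might ask the user first.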
  24. Local AI Models: Mitigations

    Hybrid modes:
    – Allow the user to switch between cloud/local execution (availability, system requirements)
    – Deploy OSS model on internal/enterprise infrastructure (privacy)
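
One way to picture the cloud/local switch is a small selection function that honours the user's choice when it can be satisfied and falls back otherwise. Everything in this sketch (the `ExecutionMode` type, the `ModeInputs` fields, and the fallback order) is a hypothetical illustration, not something the slides prescribe.

```typescript
type ExecutionMode = "local" | "cloud";

interface ModeInputs {
  userChoice?: ExecutionMode; // explicit preference, if the user set one
  online: boolean;            // is the cloud service reachable?
  localModelReady: boolean;   // is a local model downloaded and loadable?
}

// Return the mode to use, or null when neither option is available.
function chooseExecutionMode(inputs: ModeInputs): ExecutionMode | null {
  const canRun: Record<ExecutionMode, boolean> = {
    local: inputs.localModelReady,
    cloud: inputs.online,
  };
  // Honour the user's choice when it can be satisfied.
  if (inputs.userChoice && canRun[inputs.userChoice]) return inputs.userChoice;
  // ASSUMED fallback order: prefer local (privacy, offline), then cloud.
  if (canRun.local) return "local";
  if (canRun.cloud) return "cloud";
  return null;
}
```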
  25. Local AI Models, Alternatives: Ollama

    – Local runner for AI models
    – Offers a local server a website can connect to → allows sharing models across origins
    – Supported on macOS and Linux (Windows coming soon)
    https://webml-demo.vercel.app/
    https://ollama.ai/
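
A website could talk to such a local server with a plain HTTP request. The sketch below builds a request for Ollama's `/api/generate` endpoint, assuming the server runs on its default address `http://localhost:11434` and is configured to accept requests from the page's origin. The helper `buildOllamaRequest` is hypothetical.

```typescript
interface OllamaRequest {
  url: string;
  init: {
    method: string;
    headers: Record<string, string>;
    body: string;
  };
}

// ASSUMPTION: Ollama is reachable at its default local address.
const OLLAMA_BASE_URL = "http://localhost:11434";

// Build the URL and fetch options for a one-shot text generation request.
function buildOllamaRequest(model: string, prompt: string): OllamaRequest {
  return {
    url: `${OLLAMA_BASE_URL}/api/generate`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      // stream: false asks for one JSON object instead of a stream of chunks.
      body: JSON.stringify({ model, prompt, stream: false }),
    },
  };
}
```

In the browser, `const { url, init } = buildOllamaRequest("llama2:7b", "Hello");` followed by `fetch(url, init)` would send the request; the generated text is in the `response` field of the returned JSON.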
  26. Alternatives: Hugging Face Transformers

    Pre-trained, specialized, significantly smaller models beyond GenAI
    Examples:
    – Text generation
    – Image classification
    – Translation
    – Speech recognition
    – Image-to-text
  27. Alternatives: Transformers.js

    – JavaScript library to run Hugging Face transformers in the browser
    – Supports most of the models
    https://xenova.github.io/transformers.js/
  28. Summary

    – Cloud-based models (especially OpenAI/GPT) remain the most potent models and are easier to integrate (for now)
    – Due to their size and high system requirements, local generative AI models are currently rather interesting for very special scenarios (e.g., high privacy demands, offline availability)
    – Small, specialized models are an interesting alternative (if available)
    – Open-source generative AI models rapidly advance and are becoming more compact and efficient
    – Computers are getting more powerful