Smarter Web Apps: Offline AI Capabilities in Your SPA

Artificial intelligence (AI) has become a significant computing and software development trend. In this session, participants will learn how to integrate offline AI capabilities into their Single-Page Apps (SPA) using large language and stable diffusion models.

This is made possible by WebGPU, a bleeding-edge web technology that not only unlocks high-performance graphics rendering for the web but also allows executing machine learning models directly within the browser, even when the user is offline. This results in improved performance and reduced resource consumption and costs. However, as with any emerging technology, there are potential challenges, such as higher bandwidth consumption, limited browser support, and reduced accuracy.

Join Christian Liebel, a consultant at Thinktecture, in this session to understand which types of apps are best suited for this approach and where potential limitations may arise. By thoroughly understanding the benefits and drawbacks of integrating AI into SPAs, you will be able to decide if local AI models fit your needs.

Christian Liebel

October 26, 2023

Transcript

  1. Hello, it’s me.

    Smarter Web Apps: Offline AI Capabilities in Your SPA
    Christian Liebel
    X: @christianliebel
    Email: christian.liebel@thinktecture.com
    Angular & PWA
    Slides: thinktecture.com/christian-liebel
  2. Expectations

    What to expect:
    – Focus on web app development
    – Focus on Generative AI
    – Up-to-date insights: the ML/AI field is evolving fast
    – Live demos on real hardware
    What not to expect:
    – Deep dive into AI specifics
    – Stable libraries or specifications
  3. Single-Page Applications

    Run locally on the user’s system
    [Diagram labels: web server (HTML, JS, CSS, assets); web APIs (server logic, DBs); push service; web browser with the SPA (client logic, several HTML/CSS views); connections via HTTPS and WebSockets]
  4. Progressive Web Apps

    Make SPAs offline-capable
    [Diagram labels: Website (HTML/JS), fetch, Service Worker, Cache, Internet]
  5. Generative AI: Overview

    Speech: OpenAI Whisper, tortoise-tts, …
    Images: Midjourney, DALL·E, Stable Diffusion, …
    Audio/Music: Musico, Soundraw, …
    Text: OpenAI GPT, LLaMa, Vicuna, …
  6. Generative AI: Overview

    Speech: OpenAI Whisper, tortoise-tts, …
    Images: Midjourney, DALL·E, Stable Diffusion, …
    Audio/Music: Musico, Soundraw, …
    Text: OpenAI GPT, LLaMa, Vicuna, …
  7. Generative AI Cloud Providers

    Drawbacks:
    – Require an active internet connection
    – Affected by network latency and server availability
    – Data is transferred to the cloud service
    – Require a subscription
    → Can we run models locally?
  8. Large Language Models

    Large: Trained on lots of data
    Language: Process and generate text
    Models: Programs/neural networks
    Examples:
    – GPT (ChatGPT, Bing Chat, …)
    – LaMDA (Google Bard)
    – LLaMa (Meta AI)
  9. Large Language Models

    Token: A meaningful unit of text (e.g., a word, a part of a word, a character).
    Context Window: The maximum number of tokens the model can process.
    Parameters: Internal variables learned during training, used to make predictions.
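To make the relationship between tokens and the context window concrete, here is a minimal sketch. It assumes a rough rule of thumb of about four characters per token for English text, which is not a real tokenizer; the function names and the `reservedForAnswer` parameter are hypothetical and not taken from the slides.

```typescript
// Rough heuristic (an assumption, not a real tokenizer):
// about 4 characters per token for English text.
const CHARS_PER_TOKEN = 4;

// Estimates how many tokens a text would occupy.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Checks whether a prompt plus the tokens reserved for the model's
// answer still fits into the model's context window.
function fitsContextWindow(
  prompt: string,
  contextWindow: number,
  reservedForAnswer: number,
): boolean {
  return estimateTokens(prompt) + reservedForAnswer <= contextWindow;
}
```

An application could use such a check to decide when to shorten the prompt; exact counts require the tokenizer that belongs to the model in use.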
  10. Large Language Models

    Prompts serve as the universal interface
    Unstructured text conveying specific semantics
    Paradigm shift in software architecture
    Human language becomes a first-class citizen
    Caveats: Non-determinism and hallucination, prompt injections
  11. Large Language Models: Size Comparison

    Model:Parameters   Size
    mistral:7b         4.1 GB
    vicuna:7b          3.8 GB
    llama2:7b          3.8 GB
    llama2:13b         7.4 GB
    llama2:70b         39.0 GB
    zephyr:7b          4.1 GB
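The sizes in this comparison can be sanity-checked with simple arithmetic: file size is roughly the parameter count times the bits stored per weight, divided by 8. The sketch below is an illustration only. The slide does not state how the weights are stored, so the 4.5 bits per weight used in the comments is an assumption (it corresponds to a common 4-bit quantization scheme with per-block scaling), and the nominal parameter counts (7, 13, 70 billion) are taken from the model tags rather than exact figures.

```typescript
// Estimates the size of a model file in GB (10^9 bytes).
// parameters: number of weights, e.g. 7e9 for a "7b" model
// bitsPerWeight: storage per weight; 16 for half precision,
//   around 4.5 for 4-bit quantization with scaling data (assumption)
function estimateModelSizeGB(parameters: number, bitsPerWeight: number): number {
  const bytes = (parameters * bitsPerWeight) / 8;
  return bytes / 1e9;
}

// Nominal examples at an assumed 4.5 bits per weight:
//   7e9  parameters: 7e9  * 4.5 / 8 = 3.9375 GB  (slide: 3.8 to 4.1 GB)
//   13e9 parameters: 13e9 * 4.5 / 8 = 7.3125 GB  (slide: 7.4 GB)
//   70e9 parameters: 70e9 * 4.5 / 8 = 39.375 GB  (slide: 39.0 GB)
// The same 7e9 parameters at 16 bits per weight would need 14 GB.
```

The estimates land close to the listed sizes, which suggests the listed files hold quantized rather than full-precision weights.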
  12. Cache API

    Storing model files locally
    [Diagram labels: Website (HTML/JS), Cache with model files, Internet, Hugging Face]
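One way to store model files locally is a cache-first lookup: return the cached copy if present, otherwise download and store it. The following is a minimal sketch and not necessarily how WebLLM does it internally. `loadModelFile` and the `*Like` types are hypothetical; they mirror the shape of the browser's `Cache` and `Response` objects so the logic can be exercised outside a browser.

```typescript
// Minimal structural types so the sketch type-checks without DOM typings.
// In a browser, pass the result of `await caches.open(name)` and the global `fetch`.
interface ResponseLike {
  ok: boolean;
  clone(): ResponseLike;
}
interface CacheLike {
  match(url: string): Promise<ResponseLike | undefined>;
  put(url: string, response: ResponseLike): Promise<void>;
}
type FetchLike = (url: string) => Promise<ResponseLike>;

// Cache-first loading: only hits the network when the file is not cached yet.
async function loadModelFile(
  url: string,
  cache: CacheLike,
  fetchFn: FetchLike,
): Promise<ResponseLike> {
  const cached = await cache.match(url);
  if (cached) return cached; // already stored locally, works offline

  const response = await fetchFn(url);
  if (!response.ok) throw new Error(`Download failed: ${url}`);

  // Store a clone, because a response body can only be consumed once.
  await cache.put(url, response.clone());
  return response;
}
```

After the first successful download, later calls for the same URL are served from the cache without touching the network.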
  13. WebAssembly (Wasm)

    Bytecode for the web
    Compile target for arbitrary languages
    Can be faster than JavaScript
    WebLLM needs the model and a Wasm library to accelerate model computations
  14. WebGPU

    Grants low-level access to the Graphics Processing Unit (GPU)
    Near-native performance for machine learning applications
    Supported by Chromium-based browsers on Windows and macOS from version 113
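Because WebGPU support is limited, an app would typically check for it before offering local AI features. The sketch below relies on the `navigator.gpu.requestAdapter()` call defined by WebGPU; the `hasWebGpu` helper and the `*Like` types are hypothetical and exist only so the check can run without WebGPU typings.

```typescript
// Minimal shapes of the parts of `navigator` used here (hypothetical types).
interface GpuLike {
  requestAdapter(): Promise<object | null>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

// Resolves to true only if the API exists and an adapter is available.
async function hasWebGpu(nav: NavigatorLike): Promise<boolean> {
  if (!nav.gpu) return false; // API not exposed by this browser
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter !== null; // null means no suitable GPU adapter was found
  } catch {
    return false; // treat unexpected failures as "not available"
  }
}
```

In a browser, the global `navigator` would be passed in, and the app could fall back to a cloud model or hide the feature when the result is `false`.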
  15. Outlook: WebNN

    Grants web applications access to the Neural Processing Unit (NPU) of the system via platform-specific machine learning services (e.g., ML Compute on macOS/iOS, DirectML on Windows, …)
    Even better performance when compared to WebGPU
    Currently in specification by the WebML Working Group at W3C
    Implementation in progress for Chromium-based browsers
    https://webmachinelearning.github.io/webnn-intro/
  16. Large Language Models: Live Demo

    Add a “copilot” to a todo application using the @mlc-ai/web-llm package.
    For the sake of simplicity, all TODOs are added to the prompt.
    Remember: LLMs have a context window. If you need to chat with a larger set of text (including documents), please refer to Retrieval Augmented Generation (RAG).
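The idea of adding all TODOs to the prompt can be sketched as a small prompt builder. This is not the code from the live demo, which the transcript does not show. The `Todo` and `ChatMessage` types, the function name, and the system prompt wording are assumptions; the role/content message shape is the one commonly accepted by chat-style LLM interfaces.

```typescript
// Hypothetical data shapes for illustration.
interface Todo {
  title: string;
  done: boolean;
}
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Puts every todo into the system message, followed by the user's question.
function buildCopilotMessages(todos: Todo[], question: string): ChatMessage[] {
  const list = todos
    .map((t) => `- [${t.done ? "x" : " "}] ${t.title}`)
    .join("\n");
  return [
    {
      role: "system",
      content: `You are a helpful assistant for a todo app. The user's todos are:\n${list}`,
    },
    { role: "user", content: question },
  ];
}
```

Since the whole list travels with every request, this only works while the list fits into the model's context window, which is why larger data sets call for RAG.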
  17. Stable Diffusion

    Text-to-image model
    Generates 512x512px images from a prompt
    Runs on “commodity” hardware (with 8 GB VRAM)
    Open-source
  18. Web Stable Diffusion

    Specialized version of the Stable Diffusion model for the web
    2 GB in size
    Subject to usage conditions: https://huggingface.co/runwayml/stable-diffusion-v1-5#uses
    No npm package this time
  19. Web Stable Diffusion: Live Demo

    Retrofitting AI image generation into an existing drawing application (https://paint.js.org)
  20. Local AI Models: Advantages

    – Data does not leave the browser
    – High availability (offline support)
    – Low latency
    – Low cost
  21. Local AI Models: Disadvantages

    – High system requirements (RAM, GPU)
    – High bandwidth requirements (large model size)
    – WebGPU is only supported by Chromium-based browsers
    – WebNN is not available yet
    – Loading the model takes time
    – Models cannot be shared across origins
    – Potent models such as GPT are closed-source
  22. Local AI Models: Mitigations

    Download model in the background if the user is not on a metered connection
    Helpful APIs:
    – Network Information API to estimate the network quality/determine data saver (negative standards position by Apple and Mozilla)
    – Storage Manager API to estimate the available free disk space
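The two APIs named above could be combined into one pre-download check, sketched below. It uses the `saveData` flag of the Network Information API and `estimate()` of the Storage Manager API; the `shouldPrefetchModel` helper and the `*Like` types are hypothetical. Note that `quota - usage` describes the space available to the origin, which only approximates free disk space.

```typescript
// Minimal shapes of the browser objects used here (hypothetical types).
interface ConnectionLike {
  saveData?: boolean;
}
interface StorageEstimateLike {
  quota?: number;
  usage?: number;
}
interface EnvLike {
  connection?: ConnectionLike;
  storage?: { estimate(): Promise<StorageEstimateLike> };
}

// Decides whether a model of the given size should be downloaded in the background.
async function shouldPrefetchModel(env: EnvLike, modelBytes: number): Promise<boolean> {
  // Network Information API: respect the data saver preference when it is exposed.
  // Browsers without this API skip the check and fall through.
  if (env.connection?.saveData) return false;

  // Storage Manager API: without an estimate, stay conservative.
  if (!env.storage) return false;
  const { quota = 0, usage = 0 } = await env.storage.estimate();
  return quota - usage >= modelBytes;
}
```

In a browser, `navigator` would be passed as `env`; a type cast may be needed because `navigator.connection` is not part of every TypeScript DOM typing.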
  23. Local AI Models: Alternatives: Ollama

    Local runner for AI models
    Offers a local server a website can connect to → allows sharing models across origins
    Supported on macOS and Linux (Windows coming soon)
    https://webml-demo.vercel.app/
    https://ollama.ai/
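Connecting a website to the local Ollama server could look like the sketch below. It assumes Ollama's default address `http://localhost:11434` and its `/api/generate` endpoint with a non-streaming request; the `generateWithOllama` helper and the `*Like` types are hypothetical. The fetch function is passed in so the logic can be exercised without a running server.

```typescript
// Minimal shapes of `fetch` and its response (hypothetical types).
interface HttpResponseLike {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}
type HttpFetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<HttpResponseLike>;

// Sends one prompt to a locally running Ollama server and returns the answer text.
async function generateWithOllama(
  fetchFn: HttpFetchLike,
  model: string,
  prompt: string,
  baseUrl: string = "http://localhost:11434", // assumed default address
): Promise<string> {
  const res = await fetchFn(`${baseUrl}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    // stream: false asks for a single JSON object instead of a stream of chunks
    body: JSON.stringify({ model, prompt, stream: false }),
  });
  if (!res.ok) throw new Error(`Ollama request failed with status ${res.status}`);
  const data = (await res.json()) as { response?: string };
  return data.response ?? "";
}
```

In a browser, the global `fetch` would be passed as `fetchFn`, for example with the model name `llama2:7b` from the size comparison. The local server may also need to be configured to accept requests from the website's origin.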
  24. Alternatives: Hugging Face Transformers

    Pre-trained, specialized, significantly smaller models beyond GenAI
    Examples:
    – Text generation
    – Image classification
    – Translation
    – Speech recognition
    – Image-to-text
  25. Alternatives: Transformers.js

    JavaScript library to run Hugging Face transformers in the browser
    Supports most of the models
    https://xenova.github.io/transformers.js/
  26. Summary

    – Cloud-based models (especially OpenAI/GPT) remain the most potent models and are easier to integrate (for now)
    – Due to their size and high system requirements, local generative AI models are currently of interest mainly for very special scenarios (e.g., high privacy demands, offline availability)
    – Small, specialized models are an interesting alternative (if available)
    – Open-source generative AI models are advancing rapidly and becoming more compact and efficient
    – Computers are getting more powerful