Smarter Web Apps: Offline AI Capabilities in Your SPA

Christian Liebel (@christianliebel), Consultant

Hello, it’s me.
Christian Liebel
X: @christianliebel
Email: christian.liebel@thinktecture.com
Angular & PWA
Slides: thinktecture.com/christian-liebel

Expectations
What to expect:
– Focus on web app development
– Focus on Generative AI
– Up-to-date insights: the ML/AI field is evolving fast
– Live demos on real hardware
What not to expect:
– Deep dive into AI specifics
– Stable libraries or specifications

Generative AI everywhere

Single-Page Applications
Run locally on the user’s system
[Diagram: the web browser loads the SPA (HTML, JS, CSS, assets) from a web server over HTTPS. The SPA contains the client logic and several HTML/CSS views, and it talks to the server logic (Web APIs, push service, databases) over HTTPS and WebSockets.]

Progressive Web Apps
Make SPAs offline-capable
[Diagram: a service worker sits between the website (HTML/JS) and the internet. It handles fetch requests and can answer them from a cache.]
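The service worker caching idea on this slide can be sketched as a cache-first lookup. This is a minimal sketch, not code from the talk: `cacheFirst`, `ResponseLike`, `CacheLike`, and `FetchLike` are hypothetical names, and the interfaces only mirror the parts of the browser's Cache API (`match`, `put`) and `fetch` that the sketch needs, so that it also runs outside a service worker.

```typescript
// Minimal structural types covering only what this sketch uses.
interface ResponseLike {
  ok: boolean;
  clone(): ResponseLike;
}
interface CacheLike {
  match(url: string): Promise<ResponseLike | undefined>;
  put(url: string, response: ResponseLike): Promise<void>;
}
type FetchLike = (url: string) => Promise<ResponseLike>;

// Cache-first: answer from the cache if possible, otherwise go to the
// network and keep a copy of successful responses for later (offline) use.
async function cacheFirst(
  url: string,
  cache: CacheLike,
  fetchFn: FetchLike,
): Promise<ResponseLike> {
  const cached = await cache.match(url);
  if (cached !== undefined) {
    return cached;
  }
  const response = await fetchFn(url);
  if (response.ok) {
    // A response body can only be consumed once, so a clone is stored.
    await cache.put(url, response.clone());
  }
  return response;
}
```

In a real service worker, a function like this would typically be called from the `fetch` event handler, with a cache obtained from `caches.open(...)` and the global `fetch`.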

Generative AI: Overview
– Text: OpenAI GPT, LLaMa, Vicuna, …
– Images: Midjourney, DALL·E, Stable Diffusion, …
– Audio/Music: Musico, Soundraw, …
– Speech: OpenAI Whisper, tortoise-tts, …

Generative AI Cloud Providers
Examples

Generative AI Cloud Providers
Drawbacks
– Require an active internet connection
– Affected by network latency and server availability
– Data is transferred to the cloud service
– Require a subscription
→ Can we run models locally?

Large Language Models
Large: Trained on lots of data
Language: Process and generate text
Models: Programs/neural networks
Examples:
– GPT (ChatGPT, Bing Chat, …)
– LaMDA (Google Bard)
– LLaMa (Meta AI)

Large Language Models
Token: A meaningful unit of text (e.g., a word, a part of a word, a character).
Context Window: The maximum number of tokens the model can process.
Parameters: Internal variables learned during training, used to make predictions.

Large Language Models
Prompts serve as the universal interface
– Unstructured text conveying specific semantics
Paradigm shift in software architecture
– Human language becomes a first-class citizen
Caveats
– Non-determinism and hallucination, prompt injections

Large Language Models
https://webllm.mlc.ai/

Large Language Models
Size Comparison
Model:Parameters   Size
mistral:7b         4.1 GB
vicuna:7b          3.8 GB
llama2:7b          3.8 GB
llama2:13b         7.4 GB
llama2:70b         39.0 GB
zephyr:7b          4.1 GB
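The sizes in the comparison above are roughly what one gets from multiplying the parameter count by the number of bits stored per parameter. The slides do not state how the files are compressed, so the figure of about 4.5 bits per weight used below is an assumption chosen for illustration, and `estimateModelSizeGB` is a hypothetical helper.

```typescript
// Rough size estimate: parameters * bits per weight, converted to gigabytes.
// Ignores file metadata and any layers stored at a different precision.
function estimateModelSizeGB(parameters: number, bitsPerWeight: number): number {
  const bytes = (parameters * bitsPerWeight) / 8;
  return bytes / 1e9;
}

// Assumed value, not taken from the slides.
const ASSUMED_BITS_PER_WEIGHT = 4.5;

console.log(estimateModelSizeGB(7e9, ASSUMED_BITS_PER_WEIGHT).toFixed(1));  // "3.9"
console.log(estimateModelSizeGB(13e9, ASSUMED_BITS_PER_WEIGHT).toFixed(1)); // "7.3"
console.log(estimateModelSizeGB(70e9, ASSUMED_BITS_PER_WEIGHT).toFixed(1)); // "39.4"
```

Under the same formula, a 7-billion-parameter model stored at 16 bits per weight would come to about 14 GB, which shows why compact weight formats matter for in-browser use.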

Cache API
Storing model files locally
[Diagram: the website (HTML/JS) downloads the model files from Hugging Face over the internet and stores them in a cache.]
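Storing model files locally could look roughly like the following: check which files are already in the cache and download only the missing ones. This is a sketch under stated assumptions, not how WebLLM does it internally. `ensureModelCached`, `ModelCache`, `StoredResponse`, and `Download` are hypothetical names, and the interfaces mirror only the `match` and `put` methods of the browser's Cache API.

```typescript
// Minimal structural types so the sketch is runnable without a browser.
interface StoredResponse {
  ok: boolean;
}
interface ModelCache {
  match(url: string): Promise<StoredResponse | undefined>;
  put(url: string, response: StoredResponse): Promise<void>;
}
type Download = (url: string) => Promise<StoredResponse>;

// Ensures every model file is in the cache; returns the URLs that had to
// be downloaded. Files that are already cached are skipped.
async function ensureModelCached(
  fileUrls: string[],
  cache: ModelCache,
  download: Download,
  onProgress: (done: number, total: number) => void = () => {},
): Promise<string[]> {
  const downloaded: string[] = [];
  let done = 0;
  for (const url of fileUrls) {
    if ((await cache.match(url)) === undefined) {
      const response = await download(url);
      if (!response.ok) {
        throw new Error(`Download failed: ${url}`);
      }
      await cache.put(url, response);
      downloaded.push(url);
    }
    done += 1;
    onProgress(done, fileUrls.length);
  }
  return downloaded;
}
```

On a second visit, the returned list would be empty and the model could be loaded without network access.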

Cache API
Parameter cache

WebAssembly (Wasm)
– Bytecode for the web
– Compile target for arbitrary languages
– Can be faster than JavaScript
– WebLLM needs the model and a Wasm library to accelerate model computations

WebGPU
– Grants low-level access to the Graphics Processing Unit (GPU)
– Near-native performance for machine learning applications
– Supported by Chromium-based browsers on Windows and macOS from version 113
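Because WebGPU is not available everywhere, an app would usually check for it before downloading a model. The sketch below relies on `navigator.gpu` and `requestAdapter()`, which are part of the WebGPU API. `hasWebGpu` and the two interfaces are hypothetical names introduced so the check can run without a browser.

```typescript
// Only the parts of the WebGPU entry point that this check needs.
interface GpuLike {
  requestAdapter(): Promise<object | null>;
}
interface NavigatorWithGpu {
  gpu?: GpuLike;
}

// Resolves to true only if the API is exposed AND an adapter is available.
async function hasWebGpu(nav: NavigatorWithGpu): Promise<boolean> {
  if (nav.gpu === undefined) {
    return false; // the browser does not expose WebGPU at all
  }
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter !== null; // null means no suitable GPU adapter was found
  } catch {
    return false;
  }
}
```

In a browser, the argument would be the global `navigator` object. If the check fails, the app could fall back to a cloud model or hide the AI feature.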

Outlook: WebNN
– Grants web applications access to the Neural Processing Unit (NPU) of the system via platform-specific machine learning services (e.g., ML Compute on macOS/iOS, DirectML on Windows, …)
– Even better performance when compared to WebGPU
– Currently in specification by the WebML Working Group at W3C
– Implementation in progress for Chromium-based browsers
https://webmachinelearning.github.io/webnn-intro/

WebLLM
On NPM

Large Language Models
Live Demo
Add a “copilot” to a todo application using the @mlc-ai/web-llm package.
For the sake of simplicity, all TODOs are added to the prompt. Remember: LLMs have a context window. If you need to chat with a larger set of text (including documents), please refer to Retrieval Augmented Generation (RAG).
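Putting the todos into the prompt while keeping the context window in mind might look like the sketch below. It is not the demo's actual code: `Todo`, `estimateTokens`, and `buildTodoPrompt` are hypothetical, and the rule of thumb of about four characters per token is an assumption, not a real tokenizer. The call into @mlc-ai/web-llm is deliberately left out.

```typescript
interface Todo {
  title: string;
  done: boolean;
}

// Crude heuristic (assumption): roughly four characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Builds a prompt containing as many todos as fit into the token budget.
// Todos that no longer fit are left out. The header and the question are
// always included, even if they alone exceed the budget.
function buildTodoPrompt(todos: Todo[], question: string, maxTokens: number): string {
  const header = "You are a helpful assistant for a todo app. The user's todos are:";
  const footer = `Question: ${question}`;
  const lines: string[] = [];
  for (const todo of todos) {
    const line = `- [${todo.done ? "x" : " "}] ${todo.title}`;
    const candidate = [header, ...lines, line, footer].join("\n");
    if (estimateTokens(candidate) > maxTokens) {
      break;
    }
    lines.push(line);
  }
  return [header, ...lines, footer].join("\n");
}
```

The resulting string would then be handed to the locally running model. Once the todo list no longer fits the budget, selecting only the relevant entries (the idea behind RAG) becomes the better approach.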

Stable Diffusion
– Text-to-image model
– Generates 512x512px images from a prompt
– Runs on “commodity” hardware (with 8 GB VRAM)
– Open-source

Web Stable Diffusion
– Specialized version of the Stable Diffusion model for the web
– 2 GB in size
– Subject to usage conditions: https://huggingface.co/runwayml/stable-diffusion-v1-5#uses
– No npm package this time

Web Stable Diffusion
https://websd.mlc.ai/

Web Stable Diffusion
Live Demo
Retrofitting AI image generation into an existing drawing application (https://paint.js.org)

Local AI Models
Advantages
– Data does not leave the browser
– High availability (offline support)
– Low latency
– Low cost

Local AI Models
Disadvantages
– High system requirements (RAM, GPU)
– High bandwidth requirements (large model size)
– WebGPU is only supported by Chromium-based browsers
– WebNN is not available yet
– Loading the model takes time
– Models cannot be shared across origins
– Potent models such as GPT are closed-source

Local AI Models
Mitigations
Download the model in the background if the user is not on a metered connection
Helpful APIs:
– Network Information API to estimate the network quality and detect data saver mode (negative standards position by Apple and Mozilla)
– Storage Manager API to estimate the available free disk space
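The two APIs named on this slide can be combined into a single check before a background download starts. The sketch uses `saveData` from the Network Information API and `estimate()` from the Storage Manager API, both of which exist in browsers that support these APIs. `canDownloadModel` and the interfaces are hypothetical, and treating missing information as "go ahead" is just one possible policy.

```typescript
// Only the fields this sketch reads from the two browser APIs.
interface ConnectionLike {
  saveData?: boolean;
}
interface StorageEstimateLike {
  quota?: number;
  usage?: number;
}
interface EnvLike {
  connection?: ConnectionLike; // Network Information API, may be missing
  storage?: { estimate(): Promise<StorageEstimateLike> }; // Storage Manager API
}

// Decides whether a model of the given size should be downloaded now.
async function canDownloadModel(env: EnvLike, modelBytes: number): Promise<boolean> {
  // Respect the user's data saver preference where the browser reports it.
  if (env.connection?.saveData === true) {
    return false;
  }
  // Without a storage estimate there is nothing to compare against.
  if (env.storage === undefined) {
    return true;
  }
  const { quota = 0, usage = 0 } = await env.storage.estimate();
  return quota - usage >= modelBytes;
}
```

In a browser, the argument would be built from `navigator.connection` (where available) and `navigator.storage`. Note that the storage estimate reflects the quota granted to the origin, not necessarily the exact free disk space.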

Local AI Models
Alternatives: Ollama
– Local runner for AI models
– Offers a local server a website can connect to → allows sharing models across origins
– Supported on macOS and Linux (Windows coming soon)
https://webml-demo.vercel.app/
https://ollama.ai/
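Connecting a website to the local Ollama server could look like the following sketch. It targets Ollama's `/api/generate` endpoint on the default port 11434 with `stream: false`, which returns a JSON object containing a `response` field. `generateWithOllama`, `HttpPost`, and `HttpResponseLike` are hypothetical names, and the HTTP function is passed in so that the sketch can run without a server.

```typescript
// Only the parts of fetch and Response that this sketch uses.
interface HttpResponseLike {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}
type HttpPost = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<HttpResponseLike>;

// Sends a single, non-streaming generation request to a local Ollama server.
async function generateWithOllama(
  prompt: string,
  model: string,
  post: HttpPost,
  baseUrl: string = "http://localhost:11434",
): Promise<string> {
  const res = await post(`${baseUrl}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, prompt, stream: false }),
  });
  if (!res.ok) {
    throw new Error(`Ollama request failed with status ${res.status}`);
  }
  const data = (await res.json()) as { response?: string };
  return data.response ?? "";
}
```

In a browser, the global `fetch` would be passed as `post`. The local server may need to be configured to accept requests from the website's origin.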

Alternatives
Hugging Face Transformers
Pre-trained, specialized, significantly smaller models beyond GenAI
Examples:
– Text generation
– Image classification
– Translation
– Speech recognition
– Image-to-text

Alternatives
Transformers.js
– JavaScript library to run Hugging Face transformers in the browser
– Supports most of the models
https://xenova.github.io/transformers.js/

Summary
– Cloud-based models (especially OpenAI/GPT) remain the most potent models and are easier to integrate (for now)
– Due to their size and high system requirements, local generative AI models are currently interesting mainly for very special scenarios (e.g., high privacy demands, offline availability)
– Small, specialized models are an interesting alternative (if available)
– Open-source generative AI models are advancing rapidly and are becoming more compact and efficient
– Computers are getting more powerful

Thank you for your kind attention! Christian Liebel @christianliebel
