Smartere SPAs bauen mit offlinefähigen KI-Funktionen (Building Smarter SPAs with Offline-Capable AI Features)

Christian Liebel, @christianliebel, Consultant

Hello, it’s me.
Christian Liebel
X: @christianliebel
Email: christian.liebel@thinktecture.com
Angular & PWA
Slides: thinktecture.com/christian-liebel

Expectations
What to expect:
– Focus on web app development
– Focus on Generative AI
– Up-to-date insights: the ML/AI field is evolving fast
– Live demos on real hardware
What not to expect:
– Deep dive into AI specifics
– Stable libraries or specifications

Generative AI everywhere

Single-Page Applications
Run locally on the user’s system.
[Diagram: web browser with the SPA (client logic and several HTML/CSS views); web server delivering HTML, JS, CSS, and assets; web APIs with server logic and databases; push service; connections via HTTPS and WebSockets.]

Progressive Web Apps
Make SPAs offline-capable.
[Diagram: website (HTML/JS), service worker, cache, and internet; the service worker handles fetch.]
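The service worker's role in the diagram can be illustrated with a cache-first lookup. The following is only a sketch: the names cacheFirst, CacheLike, and Fetcher are made up for illustration, and the cache and network are passed in as parameters so the logic does not depend on browser typings. In a real service worker they would be a Cache object and the global fetch function.

```typescript
// Minimal structural types (assumptions for this sketch). In a real service
// worker these roles are played by the built-in Cache and fetch().
interface CacheLike<Req, Res> {
  match(request: Req): Promise<Res | undefined>;
  put(request: Req, response: Res): Promise<void>;
}

type Fetcher<Req, Res> = (request: Req) => Promise<Res>;

// Cache-first: answer from the cache if possible, otherwise go to the
// network and store a copy for the next (possibly offline) request.
async function cacheFirst<Req, Res>(
  request: Req,
  cache: CacheLike<Req, Res>,
  fetchFromNetwork: Fetcher<Req, Res>,
  // A browser Response can only be read once, so a real caller would pass
  // (r) => r.clone() here. The default simply reuses the value.
  clone: (response: Res) => Res = (r) => r,
): Promise<Res> {
  const cached = await cache.match(request);
  if (cached !== undefined) {
    return cached;
  }
  const response = await fetchFromNetwork(request);
  await cache.put(request, clone(response));
  return response;
}
```

A production handler would typically also check that the network response was successful before caching it and decide how to handle a failed network request when nothing is cached.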

Generative AI: Overview
– Text: OpenAI GPT, LLaMa, Vicuna, …
– Images: Midjourney, DALL·E, Stable Diffusion, …
– Audio/Music: Musico, Soundraw, …
– Speech: OpenAI Whisper, tortoise-tts, …

Generative AI Cloud Providers
Examples

Generative AI Cloud Providers
Drawbacks:
– Require an active internet connection
– Affected by network latency and server availability
– Data is transferred to the cloud service
– Require a subscription
→ Can we run models locally?

Large Language Models
– Large: Trained on lots of data
– Language: Process and generate text
– Models: Programs/neural networks
Examples:
– GPT (ChatGPT, Bing Chat, …)
– LaMDA (Google Bard)
– LLaMa (Meta AI)

Large Language Models
– Token: A meaningful unit of text (e.g., a word, a part of a word, a character).
– Context Window: The maximum number of tokens the model can process at once.
– Parameters: Internal variables learned during training, used to make predictions.

Large Language Models
Prompts serve as the universal interface:
– Unstructured text conveying specific semantics
Paradigm shift in software architecture:
– Human language becomes a first-class citizen
Caveats:
– Non-determinism and hallucination, prompt injections

Large Language Models
https://webllm.mlc.ai/

Large Language Models
Size Comparison

Model:Parameters   Size
mistral:7b         4.1 GB
vicuna:7b          3.8 GB
llama2:7b          3.8 GB
llama2:13b         7.4 GB
llama2:70b         39.0 GB
zephyr:7b          4.1 GB
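As a worked example of what these sizes imply, the following sketch computes the average number of bits stored per parameter. It assumes that the tag after the colon is the nominal parameter count in billions and that 1 GB means 10^9 bytes. Neither assumption is stated on the slide, and the function names are made up for illustration.

```typescript
// Sizes as listed on the slide. "billions" is read off the model tag
// (assumption: "7b" means 7 billion parameters).
const models = [
  { name: "mistral:7b", billions: 7, gigabytes: 4.1 },
  { name: "vicuna:7b", billions: 7, gigabytes: 3.8 },
  { name: "llama2:7b", billions: 7, gigabytes: 3.8 },
  { name: "llama2:13b", billions: 13, gigabytes: 7.4 },
  { name: "llama2:70b", billions: 70, gigabytes: 39.0 },
  { name: "zephyr:7b", billions: 7, gigabytes: 4.1 },
];

// Average bits per parameter implied by a download size (1 GB = 10^9 bytes).
function bitsPerParameter(gigabytes: number, billions: number): number {
  return (gigabytes * 1e9 * 8) / (billions * 1e9);
}

// Download size in GB if every parameter were stored with the given bit width.
function sizeInGigabytes(billions: number, bitsPerParam: number): number {
  return (billions * 1e9 * bitsPerParam) / 8 / 1e9;
}

// One entry per model, e.g. for building a comparison table.
const implied = models.map((m) => ({
  name: m.name,
  bits: bitsPerParameter(m.gigabytes, m.billions),
}));
```

Under these assumptions every listed model comes out between roughly 4.3 and 4.7 bits per parameter, which would be consistent with weights quantized to about 4 bits. The same 7-billion-parameter model stored with 16 bits per parameter would be about 14 GB.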

Cache API
Storing model files locally
[Diagram: website (HTML/JS), cache with model files, internet, Hugging Face.]

Cache API
Parameter cache
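Before going offline, an app may want to know whether all model files are already in the cache. The sketch below shows one way to check this. The names findMissingFiles and CacheReader and the file names used in any call are hypothetical. The only real API relied on is that the Cache API's match() resolves to undefined when there is no entry.

```typescript
// Read-only view of a cache (assumption for this sketch). A browser Cache
// obtained via caches.open(...) offers a compatible match() method.
interface CacheReader {
  match(url: string): Promise<unknown>;
}

// Returns the URLs that are not in the cache yet, in their original order.
async function findMissingFiles(
  cache: CacheReader,
  urls: string[],
): Promise<string[]> {
  const missing: string[] = [];
  for (const url of urls) {
    const entry = await cache.match(url);
    if (entry === undefined) {
      missing.push(url);
    }
  }
  return missing;
}
```

An empty result means the model can be loaded without a network connection. Otherwise the returned list tells the app what still needs to be downloaded.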

WebAssembly (Wasm)
– Bytecode for the web
– Compile target for arbitrary languages
– Can be faster than JavaScript
– WebLLM needs the model and a Wasm library to accelerate model computations

WebGPU
– Grants low-level access to the Graphics Processing Unit (GPU)
– Near native performance for machine learning applications
– Supported by Chromium-based browsers on Windows and macOS from version 113
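Because WebGPU is not available everywhere, an app would usually check for it before loading a model. The following sketch does this through navigator.gpu.requestAdapter(), which resolves to null when no suitable adapter exists. The types NavigatorLike and GpuLike are assumptions that keep the example independent of WebGPU typings.

```typescript
// Minimal shapes of the objects involved (assumptions for this sketch).
interface GpuLike {
  requestAdapter(): Promise<object | null>;
}

interface NavigatorLike {
  gpu?: GpuLike;
}

// True only if the browser exposes WebGPU and can provide an adapter.
async function isWebGpuAvailable(nav: NavigatorLike): Promise<boolean> {
  if (!nav.gpu) {
    return false; // API not exposed at all
  }
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter !== null;
  } catch {
    return false; // treat unexpected failures as "not available"
  }
}
```

In a browser, the global navigator object would be passed in. If the check fails, the app can hide the AI feature or fall back to a cloud model.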

Outlook: WebNN
– Grants web applications access to the Neural Processing Unit (NPU) of the system via platform-specific machine learning services (e.g., ML Compute on macOS/iOS, DirectML on Windows, …)
– Even better performance compared to WebGPU
– Currently being specified by the WebML Working Group at W3C
– Implementation in progress for Chromium-based browsers
https://webmachinelearning.github.io/webnn-intro/

WebLLM
On npm

Large Language Models
Live Demo
Add a “copilot” to a todo application using the @mlc-ai/web-llm package.
For the sake of simplicity, all TODOs are added to the prompt.
Remember: LLMs have a context window. If you need to chat with a larger set of text (including documents), please refer to Retrieval Augmented Generation (RAG).
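Putting all TODOs into the prompt while respecting the context window could look roughly like the sketch below. It is not the demo's actual code: the Todo shape, the prompt wording, and the rule of thumb of about four characters per token are all assumptions, and a real tokenizer would give different counts. The resulting string would then be handed to whatever generation call the LLM library offers.

```typescript
// Hypothetical shape of a todo item.
interface Todo {
  title: string;
  done: boolean;
}

// Very rough heuristic (assumption): about 4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Builds a prompt that lists as many todos as fit into the token budget.
function buildPrompt(todos: Todo[], question: string, maxTokens: number): string {
  const header =
    "You are a helpful assistant for a todo app. These are the user's todos:";
  const footer = "Question: " + question;
  const lines: string[] = [];
  let used = estimateTokens(header) + estimateTokens(footer);

  for (const todo of todos) {
    const line = "- [" + (todo.done ? "x" : " ") + "] " + todo.title;
    const cost = estimateTokens(line);
    if (used + cost > maxTokens) {
      break; // the remaining todos no longer fit into the context window
    }
    lines.push(line);
    used += cost;
  }

  return [header, ...lines, footer].join("\n");
}
```

Silently dropping todos is the simplest policy. For larger data sets, retrieving only the relevant items (as RAG does) is the more robust approach.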

Stable Diffusion
– Text-to-image model
– Generates 512x512px images from a prompt
– Runs on “commodity” hardware (with 8 GB VRAM)
– Open-source

Web Stable Diffusion
– Specialized version of the Stable Diffusion model for the web
– 2 GB in size
– Subject to usage conditions: https://huggingface.co/runwayml/stable-diffusion-v1-5#uses
– No npm package this time

Web Stable Diffusion
https://websd.mlc.ai/

Web Stable Diffusion
Live Demo
Retrofitting AI image generation into an existing drawing application (https://paint.js.org)

Local AI Models
Advantages:
– Data does not leave the browser
– High availability (offline support)
– Low latency
– Low cost

Local AI Models
Disadvantages:
– High system requirements (RAM, GPU)
– High bandwidth requirements (large model size)
– WebGPU is only supported by Chromium-based browsers
– WebNN is not available yet
– Loading the model takes time
– Models cannot be shared across origins
– Potent models such as GPT are closed-source

Local AI Models
Mitigations:
Download model in the background if the user is not on a metered connection.
Helpful APIs:
– Network Information API to estimate the network quality/determine data saver (negative standards position by Apple and Mozilla)
– Storage Manager API to estimate the available free disk space
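The two APIs can be combined into a simple download decision, sketched below. The function name shouldDownloadModel, the policy itself, and the reduced types are assumptions. In a browser, the values would come from navigator.connection (where available) and from navigator.storage.estimate(). Since the Network Information API offers saveData and effectiveType rather than a direct "metered" flag, the sketch uses those as a proxy.

```typescript
// Reduced shapes of the browser objects (assumptions for this sketch).
interface ConnectionInfo {
  saveData?: boolean;     // user has requested reduced data usage
  effectiveType?: string; // "slow-2g", "2g", "3g" or "4g"
}

interface StorageInfo {
  quota?: number; // bytes the origin may use in total
  usage?: number; // bytes the origin already uses
}

// Decides whether a background download of the model seems acceptable.
function shouldDownloadModel(
  connection: ConnectionInfo | undefined,
  storage: StorageInfo,
  modelBytes: number,
): boolean {
  // Respect the data saver preference if the browser exposes it.
  if (connection?.saveData) {
    return false;
  }
  // Skip slow connections. An unknown connection is treated as acceptable
  // because the API is missing in several browsers.
  if (connection?.effectiveType !== undefined && connection.effectiveType !== "4g") {
    return false;
  }
  // Only download if the estimated free space can hold the model.
  const free = (storage.quota ?? 0) - (storage.usage ?? 0);
  return free >= modelBytes;
}
```

Both inputs are estimates, so the app should still handle a download or a cache write that fails.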

Local AI Models
Alternatives: Ollama
– Local runner for AI models
– Offers a local server a website can connect to → allows sharing models across origins
– Supported on macOS and Linux (Windows coming soon)
https://webml-demo.vercel.app/
https://ollama.ai/
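Connecting a website to the local Ollama server could look like the following sketch. It assumes Ollama's default address http://localhost:11434 and its /api/generate endpoint with streaming turned off. The function name generateWithOllama and the reduced HTTP types are made up for illustration, and the fetch function is passed in so the logic can be exercised without a running server.

```typescript
// Reduced HTTP types (assumptions). The browser's fetch() is compatible.
interface HttpResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}

type HttpFetch = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<HttpResponse>;

// Sends a single prompt to a local Ollama server and returns the answer text.
async function generateWithOllama(
  fetchFn: HttpFetch,
  model: string,
  prompt: string,
  baseUrl: string = "http://localhost:11434", // Ollama's default address
): Promise<string> {
  const res = await fetchFn(baseUrl + "/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    // stream: false asks for one JSON object instead of a stream of chunks.
    body: JSON.stringify({ model, prompt, stream: false }),
  });
  if (!res.ok) {
    throw new Error("Ollama request failed with status " + res.status);
  }
  const data = (await res.json()) as { response?: string };
  return data.response ?? "";
}
```

The model must already be available in Ollama under the given name, and the server has to permit requests from the website's origin. How that is configured depends on the Ollama setup.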

Alternatives
Hugging Face Transformers
Pre-trained, specialized, significantly smaller models beyond GenAI
Examples:
– Text generation
– Image classification
– Translation
– Speech recognition
– Image-to-text

Alternatives
Transformers.js
– JavaScript library to run Hugging Face transformers in the browser
– Supports most of the models
https://xenova.github.io/transformers.js/

Summary
– Cloud-based models (especially OpenAI/GPT) remain the most potent models and are easier to integrate (for now)
– Due to their size and high system requirements, local generative AI models are currently of interest mainly for special scenarios (e.g., high privacy demands, offline availability)
– Small, specialized models are an interesting alternative (if available)
– Open-source generative AI models are advancing rapidly and are becoming more compact and efficient
– Computers are getting more powerful

Thank you for your kind attention!
Christian Liebel
@christianliebel
christian.liebel@thinktecture.com
