Slide 1

Slide 1 text

Smartere SPAs bauen mit offlinefähigen KI-Funktionen (Building Smarter SPAs with Offline-Capable AI Features)
Christian Liebel, @christianliebel, Consultant

Slide 2

Slide 2 text

Hello, it’s me.
Christian Liebel
X: @christianliebel
Email: christian.liebel@thinktecture.com
Angular & PWA
Slides: thinktecture.com/christian-liebel

Slide 3

Slide 3 text

Expectations
What to expect:
– Focus on web app development
– Focus on Generative AI
– Up-to-date insights: the ML/AI field is evolving fast
– Live demos on real hardware
What not to expect:
– Deep dive into AI specifics
– Stable libraries or specifications

Slide 4

Slide 4 text

Generative AI everywhere

Slide 5

Slide 5 text

Single-Page Applications
Run locally on the user’s system
[Diagram: web browser running the SPA (client logic; views in HTML/CSS); web server delivering HTML, JS, CSS, and assets; server logic with web APIs, push service, and databases; connections via HTTPS and WebSockets]

Slide 6

Slide 6 text

Progressive Web Apps
Make SPAs offline-capable
[Diagram: website (HTML/JS), service worker, cache, and Internet, connected by fetch]
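A minimal sketch of how a service worker could make an SPA offline-capable with a cache-first strategy: answer from the cache when possible and only then go to the network. The names `CacheLike` and `cacheFirst` are hypothetical and not from the talk; the small interface exists only so the sketch type-checks outside a service worker.

```typescript
// Minimal shape of the cache lookup used below (assumption: declared locally
// for illustration; in a service worker, the global `caches` object fits it).
interface CacheLike<Req, Res> {
  match(request: Req): Promise<Res | undefined>;
}

// Cache-first strategy: return the cached response if there is one,
// otherwise fall back to the network.
async function cacheFirst<Req, Res>(
  request: Req,
  cache: CacheLike<Req, Res>,
  network: (request: Req) => Promise<Res>,
): Promise<Res> {
  const cached = await cache.match(request);
  return cached ?? network(request);
}

// Inside a real service worker, the wiring could look like this:
// self.addEventListener('fetch', (event) => {
//   event.respondWith(cacheFirst(event.request, caches, fetch));
// });
```

Passing the cache and the network function as parameters keeps the strategy testable without a browser.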

Slide 7

Slide 7 text

Generative AI
Overview
Text: OpenAI GPT, LLaMA, Vicuna, …
Images: Midjourney, DALL·E, Stable Diffusion, …
Audio/Music: Musico, Soundraw, …
Speech: OpenAI Whisper, tortoise-tts, …

Slide 9

Slide 9 text

Generative AI Cloud Providers
Examples

Slide 10

Slide 10 text

Generative AI Cloud Providers
Drawbacks
– Require an active internet connection
– Affected by network latency and server availability
– Data is transferred to the cloud service
– Require a subscription
→ Can we run models locally?

Slide 11

Slide 11 text

Large Language Models
Large: Trained on lots of data
Language: Process and generate text
Models: Programs/neural networks
Examples:
– GPT (ChatGPT, Bing Chat, …)
– LaMDA (Google Bard)
– LLaMA (Meta AI)

Slide 12

Slide 12 text

Large Language Models
Token: A meaningful unit of text (e.g., a word, a part of a word, a character).
Context Window: The maximum number of tokens the model can process at once.
Parameters: Internal variables learned during training, used to make predictions.

Slide 13

Slide 13 text

Large Language Models
Prompts serve as the universal interface
– Unstructured text conveying specific semantics
Paradigm shift in software architecture
– Human language becomes a first-class citizen
Caveats
– Non-determinism and hallucination, prompt injections

Slide 14

Slide 14 text

Large Language Models
https://webllm.mlc.ai/

Slide 15

Slide 15 text

Large Language Models
Size Comparison
Model:Parameters   Size
mistral:7b         4.1 GB
vicuna:7b          3.8 GB
llama2:7b          3.8 GB
llama2:13b         7.4 GB
llama2:70b         39.0 GB
zephyr:7b          4.1 GB
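The sizes in the comparison above grow roughly with the parameter count. A back-of-the-envelope sketch, assuming every parameter is stored with about 4 bits (a quantization level the slides do not state) and ignoring file metadata; `estimateModelSizeGB` is a hypothetical helper:

```typescript
// Rough download size of a model in gigabytes (10^9 bytes).
// Assumption: each parameter takes `bitsPerWeight` bits and file metadata is
// ignored, so real files can be somewhat larger than this estimate.
function estimateModelSizeGB(parameters: number, bitsPerWeight: number): number {
  const bytes = (parameters * bitsPerWeight) / 8;
  return bytes / 1e9;
}

const sevenB = estimateModelSizeGB(7e9, 4);     // 3.5
const thirteenB = estimateModelSizeGB(13e9, 4); // 6.5
const seventyB = estimateModelSizeGB(70e9, 4);  // 35
```

Under this assumption the estimates land slightly below the listed sizes (3.8 GB, 7.4 GB, 39.0 GB), which is at least consistent with a little more than 4 bits per parameter plus overhead.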

Slide 16

Slide 16 text

Cache API
Storing model files locally
[Diagram: website (HTML/JS), cache with model files, Internet, Hugging Face]
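A minimal sketch of the idea of keeping model files in the Cache API so they are downloaded only once. `ModelCache`, `loadModelFile`, and the cache name `model-files` are hypothetical; this is not how WebLLM implements its cache internally, only an illustration of the pattern.

```typescript
// Minimal shape of the Cache API methods used below (assumption: declared
// locally so the sketch compiles without DOM typings; a browser `Cache` fits).
interface ModelCache<Res> {
  match(url: string): Promise<Res | undefined>;
  put(url: string, response: Res): Promise<void>;
}

// Returns the model file from the cache, downloading and storing it on a miss.
async function loadModelFile<Res extends { ok: boolean; clone(): Res }>(
  url: string,
  cache: ModelCache<Res>,
  download: (url: string) => Promise<Res>,
): Promise<Res> {
  const cached = await cache.match(url);
  if (cached) return cached; // already stored locally, works offline

  const response = await download(url);
  if (!response.ok) throw new Error(`Download failed: ${url}`);

  // A response body can be consumed only once, so store a clone.
  await cache.put(url, response.clone());
  return response;
}

// Browser usage (the cache name is an arbitrary example):
// const cache = await caches.open('model-files');
// const weights = await loadModelFile(modelUrl, cache, (u) => fetch(u));
```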

Slide 17

Slide 17 text

Cache API
Parameter cache

Slide 18

Slide 18 text

WebAssembly (Wasm)
– Bytecode for the web
– Compile target for arbitrary languages
– Can be faster than JavaScript
– WebLLM needs the model and a Wasm library to accelerate model computations

Slide 19

Slide 19 text

WebGPU
– Grants low-level access to the Graphics Processing Unit (GPU)
– Near-native performance for machine learning applications
– Supported by Chromium-based browsers on Windows and macOS from version 113
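Since WebGPU is not available everywhere, an app would probably check for it before downloading a multi-gigabyte model. A minimal feature-detection sketch; `GpuLike` and `supportsWebGPU` are hypothetical names, and the local interface is an assumption that avoids depending on WebGPU type packages.

```typescript
// Minimal shape of `navigator.gpu` (assumption: declared locally so the
// sketch compiles without WebGPU typings).
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}

// Resolves to true only if the API is exposed and an adapter is available.
async function supportsWebGPU(
  nav: { gpu?: GpuLike } = (globalThis as any).navigator ?? {},
): Promise<boolean> {
  const gpu = nav.gpu;
  if (!gpu) return false; // the browser does not expose WebGPU

  // requestAdapter() resolves to null when no suitable GPU is found.
  const adapter = await gpu.requestAdapter();
  return adapter != null;
}
```

In the browser, calling `supportsWebGPU()` without arguments uses the global `navigator`.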

Slide 20

Slide 20 text

Outlook: WebNN
– Grants web applications access to the Neural Processing Unit (NPU) of the system via platform-specific machine learning services (e.g., ML Compute on macOS/iOS, DirectML on Windows, …)
– Even better performance when compared to WebGPU
– Currently being specified by the WebML Working Group at W3C
– Implementation in progress for Chromium-based browsers
https://webmachinelearning.github.io/webnn-intro/

Slide 21

Slide 21 text

WebLLM
On npm

Slide 22

Slide 22 text

Large Language Models
Live Demo
Add a “copilot” to a todo application using the @mlc-ai/web-llm package.
For the sake of simplicity, all TODOs are added to the prompt.
Remember: LLMs have a context window. If you need to chat with a larger set of text (including documents), please refer to Retrieval-Augmented Generation (RAG).
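A minimal sketch of what "all TODOs are added to the prompt" could look like. The `Todo` shape, the `buildPrompt` helper, and the prompt wording are assumptions for illustration, not the code from the live demo; the call into @mlc-ai/web-llm is left out on purpose.

```typescript
// Hypothetical todo shape; the demo app's actual model may differ.
interface Todo {
  title: string;
  done: boolean;
}

// Puts every todo into the prompt, followed by the user's question.
// Note: this only works while the list fits into the model's context window.
function buildPrompt(todos: Todo[], question: string): string {
  const list = todos
    .map((todo) => `* ${todo.title} (${todo.done ? 'done' : 'not done'})`)
    .join('\n');
  return `The user has the following todos:\n${list}\n\nAnswer this question about them: ${question}`;
}

const prompt = buildPrompt(
  [
    { title: 'Buy milk', done: true },
    { title: 'Write slides', done: false },
  ],
  'What do I still have to do?',
);
```

The resulting string would then be sent to the locally running model as the user message.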

Slide 23

Slide 23 text

Stable Diffusion
– Text-to-image model
– Generates 512x512px images from a prompt
– Runs on “commodity” hardware (with 8 GB VRAM)
– Open-source

Slide 24

Slide 24 text

Web Stable Diffusion
– Specialized version of the Stable Diffusion model for the web
– 2 GB in size
– Subject to usage conditions: https://huggingface.co/runwayml/stable-diffusion-v1-5#uses
– No npm package this time

Slide 25

Slide 25 text

Web Stable Diffusion
https://websd.mlc.ai/

Slide 26

Slide 26 text

Web Stable Diffusion
Live Demo
Retrofitting AI image generation into an existing drawing application (https://paint.js.org)

Slide 27

Slide 27 text

Local AI Models
Advantages
– Data does not leave the browser
– High availability (offline support)
– Low latency
– Low cost

Slide 28

Slide 28 text

Local AI Models
Disadvantages
– High system requirements (RAM, GPU)
– High bandwidth requirements (large model size)
– WebGPU is only supported by Chromium-based browsers
– WebNN is not available yet
– Loading the model takes time
– Models cannot be shared across origins
– Potent models such as GPT are closed-source

Slide 29

Slide 29 text

Local AI Models
Mitigations
Download the model in the background if the user is not on a metered connection
Helpful APIs:
– Network Information API to estimate the network quality/detect data saver mode (negative standards position by Apple and Mozilla)
– Storage Manager API to estimate the available free disk space
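A minimal sketch of how the two APIs could be combined before starting a background download. `canDownloadInBackground` and the two local interfaces are hypothetical; treating a missing Storage Manager as "do not download" is a conservative assumption, not a recommendation from the talk.

```typescript
// Minimal shapes of `navigator.connection` and `navigator.storage`
// (assumption: declared locally; the Network Information API is not
// available in all browsers, so the connection may be undefined).
interface ConnectionLike {
  saveData?: boolean;
}
interface StorageLike {
  estimate(): Promise<{ quota?: number; usage?: number }>;
}

async function canDownloadInBackground(
  modelBytes: number,
  connection: ConnectionLike | undefined,
  storage: StorageLike | undefined,
): Promise<boolean> {
  // Respect the data saver preference where the browser reports it.
  if (connection?.saveData) return false;

  // Without a storage estimate, stay on the safe side.
  if (!storage) return false;

  // Download only if the estimated free space can hold the model.
  const { quota = 0, usage = 0 } = await storage.estimate();
  return quota - usage >= modelBytes;
}

// Browser usage (4.1e9 bytes is the mistral:7b size from the comparison):
// canDownloadInBackground(4.1e9, (navigator as any).connection, navigator.storage);
```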

Slide 30

Slide 30 text

Local AI Models
Alternatives: Ollama
– Local runner for AI models
– Offers a local server a website can connect to → allows sharing models across origins
– Supported on macOS and Linux (Windows coming soon)
https://webml-demo.vercel.app/
https://ollama.ai/
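A minimal sketch of how a website could talk to a locally running Ollama server. It assumes Ollama's REST endpoint `/api/generate` on the default port 11434 and a model named `llama2` that has already been pulled; `askOllama` and `FetchLike` are hypothetical names. Depending on the setup, the server may also need to be configured to accept requests from the website's origin.

```typescript
// Minimal shape of `fetch` as used below (assumption: declared locally so
// the sketch compiles without DOM typings and can be tested with a fake).
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<any> }>;

async function askOllama(
  prompt: string,
  model: string = 'llama2',
  fetchFn: FetchLike = (globalThis as any).fetch,
  baseUrl: string = 'http://localhost:11434',
): Promise<string> {
  const res = await fetchFn(`${baseUrl}/api/generate`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    // stream: false asks for one JSON object instead of a token stream.
    body: JSON.stringify({ model, prompt, stream: false }),
  });
  if (!res.ok) throw new Error(`Ollama request failed with status ${res.status}`);

  const data = await res.json();
  return data.response; // the generated text
}
```

Because the model lives in the Ollama server rather than in origin-scoped browser storage, several websites can use the same download.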

Slide 31

Slide 31 text

Alternatives
Hugging Face Transformers
Pre-trained, specialized, significantly smaller models beyond GenAI
Examples:
– Text generation
– Image classification
– Translation
– Speech recognition
– Image-to-text

Slide 32

Slide 32 text

Alternatives
Transformers.js
– JavaScript library to run Hugging Face transformers in the browser
– Supports most of the models
https://xenova.github.io/transformers.js/

Slide 33

Slide 33 text

Summary
– Cloud-based models (especially OpenAI/GPT) remain the most potent models and are easier to integrate (for now)
– Due to their size and high system requirements, local generative AI models are currently of interest mainly for very special scenarios (e.g., high privacy demands, offline availability)
– Small, specialized models are an interesting alternative (if available)
– Open-source generative AI models are advancing rapidly and becoming more compact and efficient
– Computers are getting more powerful

Slide 34

Slide 34 text

Thank you for your kind attention!
Christian Liebel
@christianliebel
christian.liebel@thinktecture.com