Making Angular Apps Smarter with Generative AI: Local and Offline-Capable (German title: Angular-Apps smarter machen mit Generative AI: lokal und offlinefähig)

by Christian Liebel

Slide 1

Slide 1 text

Christian Liebel
@christianliebel
Consultant

Slide 2

Slide 2 text

Hello, it’s me.
Christian Liebel
X: @christianliebel
Email: christian.liebel@thinktecture.com
Angular & PWA
Slides: thinktecture.com/christian-liebel

Slide 3

Slide 3 text

Timetable
09:00–10:30 Block 1
10:30–11:00 Coffee break
11:00–12:30 Block 2

Slide 4

Slide 4 text

Expectations
What to expect
– Focus on web app development
– Focus on Generative AI
– Up-to-date insights: the ML/AI field is evolving fast
– Live demos on real hardware
– Hands-on labs
What not to expect
– Deep dive into AI specifics, RAG, model fine-tuning or training
– Stable libraries or specifications
– WebSD (Web Stable Diffusion) in Angular
– 1:1 support
First-time workshop! Huge downloads! High requirements! Things may break!

Slide 5

Slide 5 text

Demo Use Case (Workshop Edition)
DEMO

Slide 6

Slide 6 text

Setup (1/2) LAB #0
Setup complete? Node.js, Google Chrome, an editor, Git, macOS or Windows, 20 GB of free disk space, 6 GB of VRAM

Slide 7

Slide 7 text

Setup (2/2) LAB #0
git clone https://github.com/thinktecture/angular-days-2024-spring-genai.git
cd angular-days-2024-spring-genai
npm i
npm start -- --open

Slide 8

Slide 8 text

Generative AI everywhere

Slide 9

Slide 9 text

Single-Page Applications
Run locally on the user’s system
[Diagram: a web server delivers HTML, JS, CSS and assets to the web browser, where the SPA’s client logic renders several views (HTML/CSS). The SPA talks via HTTPS and WebSockets to the server side: server logic, web APIs, a push service and databases.]

Slide 10

Slide 10 text

Progressive Web Apps
Make SPAs offline-capable
[Diagram: a service worker sits between the website (HTML/JS) and the internet and answers fetch requests from a cache.]
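Conceptually, a service worker with a cache-first strategy answers fetch requests from the cache and only goes to the network on a miss. The sketch below shows that logic in isolation; `Cloneable`, `CacheLike`, `cacheFirst` and the injected `fetchFn` are hypothetical names, chosen so the function can run without a browser (a real `Cache` from `caches.open()` together with the global `fetch` fits these shapes).

```typescript
// Minimal structural types, so the strategy can run outside a service worker.
// A real Cache (from caches.open()) and Response satisfy these shapes.
interface Cloneable<R> {
  clone(): R;
}

interface CacheLike<R> {
  match(url: string): Promise<R | undefined>;
  put(url: string, response: R): Promise<void>;
}

// Cache-first: serve from the cache if possible; otherwise fetch from the
// network and store a copy so the next request also works offline.
async function cacheFirst<R extends Cloneable<R>>(
  url: string,
  cache: CacheLike<R>,
  fetchFn: (url: string) => Promise<R>,
): Promise<R> {
  const cached = await cache.match(url);
  if (cached !== undefined) {
    return cached;
  }
  const response = await fetchFn(url);
  // A response body can only be consumed once, so the cache gets a clone.
  await cache.put(url, response.clone());
  return response;
}
```

In a real service worker, the `fetch` event handler would pass `cacheFirst(event.request.url, cache, fetch)` to `event.respondWith()`. In Angular apps, `@angular/service-worker` (configured via ngsw-config.json) usually takes over this job, so a hand-written version is mainly useful for understanding what happens underneath.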

Slide 11

Slide 11 text

Generative AI: Overview
Text: OpenAI GPT, LLaMa, Vicuna, …
Images: Midjourney, DALL·E, Stable Diffusion, …
Audio/Music: Musico, Soundraw, …
Speech: OpenAI Whisper, tortoise-tts, …

Slide 12

Slide 12 text

Slide 13

Slide 13 text

Generative AI Cloud Providers
Examples

Slide 14

Slide 14 text

Generative AI Cloud Providers
Drawbacks
– Require an active internet connection
– Affected by network latency and server availability
– Data is transferred to the cloud service
– Require a subscription
→ Can we run models locally?

Slide 15

Slide 15 text

Large Language Models
Large: trained on lots of data
Language: process and generate text
Models: programs/neural networks
Examples:
– GPT (ChatGPT, Bing Chat, …)
– Gemini, Gemma (Google)
– LLaMa (Meta AI)

Slide 16

Slide 16 text

Large Language Models
Token: a meaningful unit of text (e.g., a word, part of a word, or a character).
Context window: the maximum number of tokens (prompt plus generated reply) the model can process at once.
Parameters/weights: internal variables learned during training, used to make predictions.
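To make the relationship between tokens and the context window concrete, the sketch below checks whether a prompt still leaves room for a reply. It relies on a rough assumption of about four characters per token for English text; real counts come from the model’s tokenizer, and `estimateTokens` and `fitsContextWindow` are hypothetical helpers, not part of any library used in this workshop.

```typescript
// Rule of thumb (assumption): roughly 4 characters per token for English text.
// A model's actual tokenizer can produce noticeably different counts.
const CHARS_PER_TOKEN = 4;

// Hypothetical helper: rough token count for a piece of text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Hypothetical helper: the context window has to hold the prompt AND the reply,
// so check whether `replyBudget` tokens are still left after the prompt.
function fitsContextWindow(text: string, contextWindow: number, replyBudget: number): boolean {
  return estimateTokens(text) + replyBudget <= contextWindow;
}

const userQuestion = 'How many todos do I have?';
console.log(estimateTokens(userQuestion), fitsContextWindow(userQuestion, 8192, 512));
```

The 8192-token window and the 512-token reply budget are example values only.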

Slide 17

Slide 17 text

Large Language Models
Prompts serve as the universal interface: unstructured text conveying specific semantics
Paradigm shift in software architecture: natural language becomes a first-class citizen
Caveats: non-determinism, hallucinations, prompt injections

Slide 18

Slide 18 text

Large Language Models: Size Comparison
Model:Parameters   Size
mistral:7b          4.1 GB
vicuna:7b           3.8 GB
llama2:7b           3.8 GB
llama2:13b          7.4 GB
llama2:70b         39.0 GB
zephyr:7b           4.1 GB
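The sizes above roughly follow from parameter count × bits per weight ÷ 8. As a back-of-the-envelope check, assuming about 4.5 bits per weight (4-bit quantization plus per-block scaling factors, which the slide does not state) and ignoring file overhead, `estimateModelSizeGB` (a hypothetical helper) gives:

```typescript
// Back-of-the-envelope estimate: parameters × bits per weight ÷ 8 bits per byte.
// Assumption: ~4.5 bits per weight for 4-bit quantization incl. scaling factors.
// Real model files add metadata and may keep some tensors at higher precision.
function estimateModelSizeGB(parameters: number, bitsPerWeight: number): number {
  const bytes = (parameters * bitsPerWeight) / 8;
  return bytes / 1e9; // decimal gigabytes
}

for (const billions of [7, 13, 70]) {
  const size = estimateModelSizeGB(billions * 1e9, 4.5);
  console.log(`${billions}B parameters: ~${size.toFixed(1)} GB`);
}
```

This prints roughly 3.9, 7.3 and 39.4 GB, in the same range as the table. With unquantized 16-bit weights, a 7B model would need about 14 GB instead, one reason quantized variants are used in the browser.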

Slide 19

Slide 19 text

WebLLM
DEMO: https://webllm.mlc.ai/

Slide 20

Slide 20 text

WebLLM
On npm

Slide 21

Slide 21 text

LAB #1
npm i @mlc-ai/web-llm

Slide 22

Slide 22 text

Choosing a model
Benchmarks
Selection of available models for WebLLM:
– LLaMa-2 7B Chat
– LLaMa-2 13B Chat
– Mistral 7B Instruct
– Gemma 2B IT
https://medium.com/@kagglepro.llc/gemma-vs-llama-vs-mistral-a-comparative-analysis-with-a-coding-twist-8eb4d849e4d5

Slide 23

Slide 23 text

Downloading a model LAB #2
In app.component.ts, add the following lines to the class:
protected readonly chatModule = new ChatModule();
protected readonly progress = signal(0);
protected readonly ready = signal(false);
You may need to add these imports at the very top of the file:
import { ChatModule } from '@mlc-ai/web-llm';
import { Component, OnInit, signal } from '@angular/core';

Slide 24

Slide 24 text

Downloading a model LAB #2
In app.component.ts, add the following lines to ngOnInit() (declare it as async ngOnInit(), since the code uses await):
this.chatModule.setInitProgressCallback(({ progress }) => this.progress.set(progress));
await this.chatModule.reload('Mistral-7B-Instruct-v0.2-q4f16_1', undefined, { model_list });
this.ready.set(true);

Slide 25

Slide 25 text

Downloading a model LAB #2
In app.component.html, add the following lines:

Ask

Launch the app via npm run start. The progress bar should begin to move.

Slide 26

Slide 26 text

Cache API
Storing model files locally
[Diagram: the website (HTML/JS) downloads the model files from Hugging Face over the internet and stores them in a cache.]
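Because the downloaded model files end up in the Cache API, an app can check whether a model is already available offline before offering local inference. A minimal sketch; the cache name and file URLs are placeholders (WebLLM manages its own cache names, which this sketch does not assume), and `CacheStorageLike` and `allFilesCached` are hypothetical, shaped so that the browser’s `caches` object fits.

```typescript
// Minimal structural subset of the Cache Storage API (caches.open / cache.match).
interface CacheStorageLike {
  open(cacheName: string): Promise<{ match(url: string): Promise<unknown> }>;
}

// Returns true only if every listed file is already present in the named cache.
async function allFilesCached(
  storage: CacheStorageLike,
  cacheName: string,
  urls: readonly string[],
): Promise<boolean> {
  const cache = await storage.open(cacheName);
  // cache.match() resolves to undefined for entries that are not cached.
  const hits = await Promise.all(urls.map((url) => cache.match(url)));
  return hits.every((hit) => hit !== undefined);
}
```

Keep in mind that `caches.open()` creates an empty cache if none exists under that name; calling `caches.has()` first avoids that side effect.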

Slide 27

Slide 27 text

Cache API
Parameter cache

Slide 28

Slide 28 text

WebAssembly (Wasm)
– Bytecode for the web
– Compile target for arbitrary languages
– Can be faster than JavaScript
– WebLLM needs the model and a Wasm library to accelerate model computations

Slide 29

Slide 29 text

WebGPU
– Grants low-level access to the Graphics Processing Unit (GPU)
– Near-native performance for machine learning applications
– Supported by Chromium-based browsers on Windows and macOS from version 113
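Since WebGPU support varies, an app can feature-detect it before starting a multi-gigabyte download. A minimal sketch using `navigator.gpu.requestAdapter()` from the WebGPU API, which resolves to `null` when no suitable adapter is available; the navigator-like parameter and the name `hasWebGPU` are hypothetical, so the check can also run outside the browser.

```typescript
// Minimal structural type for the part of the WebGPU API used here.
interface GpuLike {
  requestAdapter(): Promise<unknown | null>;
}

// `gpu` is undefined in browsers that do not expose WebGPU.
async function hasWebGPU(nav: { gpu?: GpuLike }): Promise<boolean> {
  if (!nav.gpu) {
    return false; // API not exposed at all
  }
  try {
    // requestAdapter() resolves to null if no suitable GPU adapter is available.
    const adapter = await nav.gpu.requestAdapter();
    return adapter !== null;
  } catch {
    return false;
  }
}
```

In the browser, pass `navigator`; with the `@webgpu/types` typings installed, TypeScript also knows about `navigator.gpu`.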

Slide 30

Slide 30 text

Outlook: WebNN
– Grants web applications access to the system’s Neural Processing Unit (NPU) via platform-specific machine learning services (e.g., ML Compute on macOS/iOS, DirectML on Windows, …)
– Even better performance than WebGPU
– Currently being specified by the Web Machine Learning (WebML) Working Group at the W3C
– Implementation in progress for Chromium-based browsers
https://webmachinelearning.github.io/webnn-intro/

Slide 31

Slide 31 text

WebNN: near-native inference performance
[Benchmark chart]
Source: Intel. Browser: Chrome Canary 118.0.5943.0; DUT: Dell/Linux/i7-1260P, single P-core; workloads: MediaPipe solution models (FP32, batch=1)

Slide 32

Slide 32 text

Model inference LAB #3
In app.component.ts, add the following line at the top of the class:
protected readonly reply = signal('');
In the runPrompt() method, add the following code:
await this.chatModule.resetChat();
this.reply.set('…');
await this.chatModule.generate(userPrompt, (_, reply) => this.reply.set(reply));

Slide 33

Slide 33 text

Model inference LAB #3
In app.component.html, add the following line:

{{ reply() }}

You should now be able to send prompts to the model and see the responses in the template.

Slide 34

Slide 34 text

LAB #4
npm run build

Slide 35

Slide 35 text

Build issues LAB #4
1. Add the following lines to package.json:
"browser": { "perf_hooks": false, "url": false }
2. In angular.json, increase the bundle size budget projects.genai-demo.architect.build.configurations.production.budgets[0].maximumError to at least 5mb.
3. Then, run npm run build again. This time, the build should succeed.

Slide 36

Slide 36 text

Todo management LAB #5
In app.component.ts, add the following signal at the top of the class (the explicit element type lets the update() call below type-check):
protected readonly todos = signal<{ done: boolean; text: string }[]>([]);
Add the following line to the addTodo() method:
this.todos.update(todos => [...todos, { done: false, text }]);
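The callback passed to `todos.update()` is a pure function: it returns a new array instead of mutating the existing one, so the signal receives a new reference and notifies its consumers. Pulled out of the component (the `Todo` interface and the `addTodo` name are hypothetical), it can be tested without Angular:

```typescript
// Hypothetical type mirroring the todo objects used in the lab.
interface Todo {
  done: boolean;
  text: string;
}

// Same logic as the lab's update callback: copy the array and append a new, open todo.
// The input is never mutated, so the signal receives a new array reference.
function addTodo(todos: readonly Todo[], text: string): Todo[] {
  return [...todos, { done: false, text }];
}

const before: Todo[] = [{ done: true, text: 'Wash clothes' }];
const after = addTodo(before, 'Pet the dog');
console.log(before.length, after.length);
```

Inside the component, the lab’s line could then read `this.todos.update(todos => addTodo(todos, text));`.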

Slide 37

Slide 37 text

Todo management LAB #5
In app.component.html, add the following lines to add todos from the UI:

Add

{{ todo.text }}

Slide 38

Slide 38 text

Todo management (extended) LAB #6
In app.component.ts, add the following lines to toggleTodo():
this.todos.update(todos => todos.map((todo, todoIndex) =>
todoIndex === index ? { ...todo, done: !todo.done } : todo));
In app.component.html, add the following content to the checkbox node:

You should now be able to toggle the checkboxes.
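Toggling follows the same pattern: `map()` produces a new array in which only the todo at the given index is replaced by a copy with the flipped `done` flag, while all other entries are reused as-is. As a standalone sketch (the `Todo` and `toggleTodo` names are hypothetical):

```typescript
// Hypothetical type mirroring the todo objects used in the lab.
interface Todo {
  done: boolean;
  text: string;
}

// Same logic as the lab's update callback: replace only the entry at `index`
// by a copy with the flipped `done` flag; all other entries are reused.
function toggleTodo(todos: readonly Todo[], index: number): Todo[] {
  return todos.map((todo, todoIndex) =>
    todoIndex === index ? { ...todo, done: !todo.done } : todo,
  );
}

const todoList: Todo[] = [
  { done: false, text: 'Pet the dog' },
  { done: true, text: 'Take out the trash' },
];
const toggled = toggleTodo(todoList, 0);
```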

Slide 39

Slide 39 text

Chat with data
Concept and limitations
– The todo data has to be converted into natural language.
– For the sake of simplicity, we will add all todos to the prompt.
– Remember: LLMs have a context window (Mistral-7B: 8K).
– If you need to chat with larger sets of text, refer to Retrieval Augmented Generation (RAG).
Example:
These are the todos:
* Wash clothes
* Pet the dog
* Take out the trash

Slide 40

Slide 40 text

Chat with data
System prompt: a metaprompt that defines…
– character
– capabilities/limitations
– output format
– behavior
– grounding data
Hallucinations and prompt injections cannot be eliminated.
Example: You are a helpful assistant. Answer user questions on todos. Generate a valid JSON object. Avoid negative content. These are the user’s todos: …
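The example prompt dedicates roughly one sentence to each aspect of the metaprompt. A sketch of assembling a system prompt from named parts; `MetapromptParts` and `buildSystemPrompt` are hypothetical, not part of WebLLM, and such a structure only guides the model: it does not prevent hallucinations or prompt injections.

```typescript
// Hypothetical structure: one field per metaprompt aspect listed on the slide.
interface MetapromptParts {
  character: string;
  capabilities: string;
  outputFormat: string;
  behavior: string;
  groundingData: string;
}

// Joins the non-empty parts line by line, in a fixed order.
function buildSystemPrompt(parts: MetapromptParts): string {
  return [parts.character, parts.capabilities, parts.outputFormat, parts.behavior, parts.groundingData]
    .filter((part) => part.trim().length > 0) // skip empty sections
    .join('\n');
}

const systemPrompt = buildSystemPrompt({
  character: 'You are a helpful assistant.',
  capabilities: 'Answer user questions on todos.',
  outputFormat: 'Generate a valid JSON object.',
  behavior: 'Avoid negative content.',
  groundingData: "These are the user's todos:\n* Wash clothes\n* Pet the dog",
});
```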

Slide 41

Slide 41 text

Chat with data
Flow
System message: The user has these todos: 1. … 2. … 3. …
User message: How many todos do I have?
Assistant message: You have three todos.
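The flow can be represented as an ordered list of role-tagged messages, the same `{ role, content }` shape that LAB #7 passes to `generate()`. A sketch with hypothetical `ChatMessage` and `appendMessage` names, using the example todos from the previous slides; the assistant message is hard-coded here, whereas in the app it would be the model’s reply:

```typescript
// Role names follow the system/user/assistant flow shown on the slide.
type Role = 'system' | 'user' | 'assistant';

interface ChatMessage {
  role: Role;
  content: string;
}

// Hypothetical helper: returns a new conversation with one message appended.
function appendMessage(history: readonly ChatMessage[], role: Role, content: string): ChatMessage[] {
  return [...history, { role, content }];
}

let conversation: ChatMessage[] = [
  {
    role: 'system',
    content: 'The user has these todos:\n1. Wash clothes\n2. Pet the dog\n3. Take out the trash',
  },
];
conversation = appendMessage(conversation, 'user', 'How many todos do I have?');
// Hard-coded here; in the app, this would be the model's generated reply.
conversation = appendMessage(conversation, 'assistant', 'You have three todos.');
```

Keeping the assistant’s answer in the list would let follow-up questions build on it; the labs instead call `resetChat()` and start from scratch for every prompt.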

Slide 42

Slide 42 text

Chat with data LAB #7
Using a system & user prompt
Remove the last line in runPrompt() and replace it with the following:
const systemPrompt = `Here's the user's todo list:
${this.todos().map(todo => `* ${todo.text} (${todo.done ? 'done' : 'not done'})`).join('\n')}`;
await this.chatModule.generate([
{ role: 'system', content: systemPrompt },
{ role: 'user', content: userPrompt },
], (_, reply) => this.reply.set(reply));

Slide 43

Slide 43 text

Prompt Engineering
Techniques
– Providing examples (single-shot, few-shot, …)
– Priming outputs
– Specifying the output structure
– Repeating instructions
– Chain of thought
– …
Success also depends on the model.
https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/advanced-prompt-engineering

Slide 44

Slide 44 text

Prompt Engineering LAB #8
In runPrompt(), replace the systemPrompt constant from LAB #7 with the following:
const systemPrompt = `You are a helpful assistant.
The user will ask questions about their todo list.
Briefly answer the questions.
Don't try to make up an answer if you don't know it.
Here's the user's todo list:
${this.todos().map(todo => `* ${todo.text} (this todo is ${todo.done ? 'done' : 'not done'})`).join('\n')}
${this.todos().length === 0 ? 'The list is empty, there are no todos.' : ''}`;

Slide 45

Slide 45 text

Prompt Engineering
Alternatives, by increasing effort: Prompt Engineering → Retrieval Augmented Generation → Fine-tuning → Custom model

Slide 46

Slide 46 text

Stable Diffusion
– Text-to-image model
– Generates 512×512 px images from a prompt
– Runs on “commodity” hardware (with 8 GB VRAM)
– Open-source
Prompt: A guinea pig eating a watermelon

Slide 47

Slide 47 text

Web Stable Diffusion
– Specialized version of the Stable Diffusion model for the web
– 2 GB in size
– Subject to usage conditions: https://huggingface.co/runwayml/stable-diffusion-v1-5#uses
– No npm package this time
– Currently incompatible with Angular & esbuild due to Wasm imports

Slide 48

Slide 48 text

Web Stable Diffusion
DEMO: https://websd.mlc.ai/

Slide 49

Slide 49 text

Web Stable Diffusion
DEMO
Live demo: retrofitting AI image generation into an existing drawing application (https://paint.js.org)

Slide 50

Slide 50 text

Local AI Models
Advantages
– Data does not leave the browser
– High availability (offline support)
– Low latency
– Stability (independence from external API changes)
– Low cost

Slide 51

Slide 51 text

Local AI Models
Disadvantages
– Lower quality than closed-source models
– High system requirements (RAM, GPU)
– Large model size, high initial bandwidth requirements, models cannot be shared across origins
– Model initialization and inference are relatively slow
– WebGPU is currently only supported by Chromium-based browsers on macOS and Windows; WebNN is not available yet

Slide 52

Slide 52 text

Local AI Models
Mitigations
– Download the model in the background if the user is not on a metered connection
Helpful APIs:
– Network Information API to estimate the network quality and detect data saver mode (negative standards position by Apple and Mozilla)
– Storage Manager API to estimate the available free disk space
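The mitigation can be expressed as a small policy function. A sketch under stated assumptions: the Network Information API’s data saver flag (`saveData`) and `effectiveType` serve as a proxy for a metered or slow connection, and the storage values mirror the result of `navigator.storage.estimate()`. The thresholds and the name `shouldDownloadInBackground` are hypothetical.

```typescript
// Subset of the Network Information API (navigator.connection).
interface ConnectionInfo {
  saveData?: boolean; // true if the user enabled a data saver mode
  effectiveType?: string; // 'slow-2g', '2g', '3g' or '4g'
}

// Subset of the StorageEstimate returned by navigator.storage.estimate() (bytes).
interface StorageInfo {
  quota?: number;
  usage?: number;
}

// Hypothetical policy: start a background download only if data saver is off,
// the connection is not estimated as slow, and the origin's quota leaves room.
// `connection` is undefined where the API is not available.
function shouldDownloadInBackground(
  connection: ConnectionInfo | undefined,
  storage: StorageInfo,
  modelBytes: number,
): boolean {
  if (connection?.saveData) {
    return false;
  }
  const effectiveType = connection?.effectiveType;
  if (effectiveType !== undefined && effectiveType !== '4g') {
    return false;
  }
  const remaining = (storage.quota ?? 0) - (storage.usage ?? 0);
  return remaining >= modelBytes;
}
```

In the browser, the inputs could come from `navigator.connection` (not available in Safari and Firefox, hence the optional parameter) and `await navigator.storage.estimate()`. Note that quota minus usage estimates what the origin may still store, which is not the same as the free disk space.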

Slide 53

Slide 53 text

Local AI Models
Mitigations
Hybrid modes:
– Allow the user to switch between cloud/local execution (availability, system requirements)
– Deploy an OSS model on internal/enterprise infrastructure (privacy)
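A hybrid mode needs a rule for picking the execution path. One possible sketch (the inputs, names and fallback rules are assumptions, not taken from the slides): a preference for local execution is respected strictly, since it may have been chosen for privacy, while a preference for the cloud falls back to a cached local model when the device is offline.

```typescript
type ExecutionMode = 'local' | 'cloud';

// Hypothetical inputs; in an app they might come from a settings toggle,
// a WebGPU check, a look into the Cache API, and navigator.onLine.
interface HybridInputs {
  preferred: ExecutionMode;
  webGpuAvailable: boolean;
  modelCached: boolean;
  online: boolean;
}

// Returns the mode to use, or null if the request cannot be served right now.
function chooseExecutionMode(inputs: HybridInputs): ExecutionMode | null {
  const localPossible = inputs.webGpuAvailable && inputs.modelCached;
  const cloudPossible = inputs.online;

  if (inputs.preferred === 'local') {
    // Never fall back to the cloud silently: the user may have chosen local for privacy.
    return localPossible ? 'local' : null;
  }
  if (cloudPossible) {
    return 'cloud';
  }
  // Cloud preferred but offline: falling back to local keeps the feature available.
  return localPossible ? 'local' : null;
}
```

Whether a silent fallback is acceptable in either direction is a product decision; a privacy-sensitive app would rather ask the user first.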

Slide 54

Slide 54 text

Local AI Models
Alternatives: Ollama
– Local runner for AI models
– Offers a local server a website can connect to → allows sharing models across origins
– Supported on macOS and Linux (Windows in preview)
https://webml-demo.vercel.app/
https://ollama.ai/

Slide 55

Slide 55 text

Alternatives
Hugging Face Transformers
– Pre-trained, specialized, significantly smaller models beyond GenAI
Examples:
– Text generation
– Image classification
– Translation
– Speech recognition
– Image-to-text

Slide 56

Slide 56 text

Alternatives
Transformers.js
– JavaScript library to run Hugging Face transformers in the browser
– Supports most of the models
https://xenova.github.io/transformers.js/

Slide 57

Slide 57 text

Summary
– Cloud-based models (especially OpenAI/GPT) remain the most potent models and are easier to integrate (for now)
– Due to their size and high system requirements, local generative AI models are currently interesting mainly for very specific scenarios (e.g., high privacy demands, offline availability)
– Small, specialized models are an interesting alternative (if available)
– Open-source generative AI models are advancing rapidly and becoming more compact and efficient
– Computers are getting more powerful

Slide 58

Slide 58 text

Thank you for your kind attention!
Christian Liebel
@christianliebel
christian.liebel@thinktecture.com