Angular-Apps smarter machen mit Generative AI: lokal und offlinefähig (Making Angular Apps Smarter with Generative AI: Local and Offline-Capable)

Christian Liebel
@christianliebel
Consultant

Hello, it’s me.
Christian Liebel
X: @christianliebel
Email: christian.liebel@thinktecture.com
Angular & PWA
Slides: thinktecture.com/christian-liebel

Timetable
09:00–10:30 Block 1
10:30–11:00 Coffee break
11:00–12:30 Block 2

Expectations
What to expect:
– Focus on web app development
– Focus on Generative AI
– Up-to-date insights: the ML/AI field is evolving fast
– Live demos on real hardware
– Hands-on labs
What not to expect:
– Deep dive into AI specifics, RAG, model fine-tuning or training
– Stable libraries or specifications
– WebSD in Angular
– 1:1 support

DEMO

Demo Use Case (Workshop Edition)
DEMO

LAB #0: Setup (1/2)
Setup complete? (Node.js, Google Chrome, an editor, Git, macOS/Windows, 20 GB of free disk space, 6 GB of VRAM)

LAB #0: Setup (2/2)
git clone https://github.com/thinktecture/angular-days-2024-fall-genai.git
cd angular-days-2024-fall-genai
npm i
npm start -- --open

Generative AI everywhere
Source: https://www.apple.com/chde/apple-intelligence/

Single-Page Applications
Run locally on the user’s system
Figure: SPA architecture. A webserver delivers HTML, JS, CSS and assets. The SPA’s client logic and views (HTML/CSS) run in the web browser and communicate with the server logic, Web APIs, a push service and databases via HTTPS and WebSockets.

Progressive Web Apps
Make SPAs offline-capable
Figure: a service worker sits between the website (HTML/JS) and the internet. It intercepts fetch requests and can answer them from its cache.
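The cache-first idea behind this diagram fits in a small function. This is a minimal sketch, not the workshop app’s actual service worker. The `SimpleCache` interface and the `cacheFirst()` name are hypothetical, chosen so the logic works with any cache and fetch implementation. The browser’s `Cache` objects (returned by `caches.open()`) provide compatible `match()` and `put()` methods.

```typescript
// Hypothetical minimal cache abstraction; the browser's Cache API offers
// match() and put() methods with compatible semantics.
interface SimpleCache<Req, Res> {
  match(request: Req): Promise<Res | undefined>;
  put(request: Req, response: Res): Promise<void>;
}

// Cache-first strategy: answer from the cache if possible; otherwise fetch
// from the network and store a copy so that the next request works offline.
async function cacheFirst<Req, Res>(
  request: Req,
  cache: SimpleCache<Req, Res>,
  fetchFromNetwork: (request: Req) => Promise<Res>,
  // Real Response bodies can only be read once, so pass (r) => r.clone() there.
  copyForCache: (response: Res) => Res = (response) => response,
): Promise<Res> {
  const cached = await cache.match(request);
  if (cached !== undefined) {
    return cached; // served without a network round trip
  }
  const response = await fetchFromNetwork(request);
  await cache.put(request, copyForCache(response));
  return response;
}
```

In a service worker, you could wire it up like this, where `'app-cache'` is a placeholder name:

`event.respondWith(caches.open('app-cache').then((cache) => cacheFirst(event.request, cache, fetch, (r) => r.clone())))`

Cache-first suits static assets and large, immutable files. Frequently changing data usually calls for a network-first strategy instead.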

Generative AI: Overview
– Text: OpenAI GPT, Mistral, …
– Speech: OpenAI Whisper, tortoise-tts, …
– Images: DALL·E, Stable Diffusion, …
– Audio/Music: Musico, Soundraw, …

Generative AI Cloud Providers: Examples

Generative AI Cloud Providers: Drawbacks
– Require a (stable) internet connection
– Subject to network latency and server availability
– Data is transferred to the cloud service
– Require a subscription

Can we run GenAI models locally?

Large Language Models
– Large: trained on lots of data
– Language: process and generate text
– Models: programs/neural networks
Examples:
– GPT (ChatGPT, Bing Chat, …)
– Gemini, Gemma (Google)
– Llama (Meta AI)

Large Language Models
Token: a meaningful unit of text (e.g., a word, a part of a word, a character).
Context window: the maximum number of tokens the model can process.
Parameters/weights: internal variables learned during training, used to make predictions.

Large Language Models
Prompts serve as the universal interface: unstructured text conveying specific semantics
Paradigm shift in software architecture: natural language becomes a first-class citizen
Caveats: non-determinism and hallucination, prompt injections

Large Language Models: Size Comparison
Model:Parameters   Size
phi3:3b            2.2 GB
mistral:7b         4.1 GB
llama3:8b          4.7 GB
gemma2:9b          5.4 GB
gemma2:27b         16 GB
llama3:70b         40 GB
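As a rule of thumb (not stated on the slide), a model’s download size is roughly its parameter count times the number of bits stored per weight, divided by 8.

- The sizes in the table work out to roughly 4.5 to 5 bits per weight, which is typical for 4-bit quantization formats that also store a scale per block of weights.
- Unquantized 16-bit weights would need about 2 bytes per parameter.

The hypothetical helper below only illustrates the arithmetic. It ignores extra files such as tokenizer data.

```typescript
// Approximate model size in gigabytes (10^9 bytes), given the number of
// parameters and the average number of bits stored per weight.
function estimateModelSizeGB(parameters: number, bitsPerWeight: number): number {
  const bytes = (parameters * bitsPerWeight) / 8;
  return bytes / 1e9;
}

// 7 billion parameters, unquantized 16-bit weights:
console.log(estimateModelSizeGB(7e9, 16)); // 14
// 7 billion parameters, ~4.5 bits per weight (4-bit quantization plus scales):
console.log(estimateModelSizeGB(7e9, 4.5).toFixed(2)); // 3.94
```

The quantized estimate is in the same range as the 4.1 GB listed for mistral:7b. The exact figure depends on the precise parameter count and the quantization format.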

WebLLM
https://webllm.mlc.ai/
DEMO

WebLLM
On NPM

LAB #1
npm i @mlc-ai/web-llm

LAB #2: Downloading a model (1/3)
In app.component.ts, make sure the following imports are present:
import { signal } from '@angular/core';
import { CreateMLCEngine, MLCEngine } from '@mlc-ai/web-llm';
Then add the following lines to the class:
protected readonly progress = signal(0);
protected readonly ready = signal(false);
protected engine?: MLCEngine;

LAB #2: Downloading a model (2/3)
In app.component.ts, add the following lines to ngOnInit(). The code uses await, so the method must be declared as async ngOnInit().
const model = 'Llama-3.2-3B-Instruct-q4f32_1-MLC';
this.engine = await CreateMLCEngine(model, {
  // Reports the download/initialization progress (0 to 1) to the progress signal
  initProgressCallback: ({ progress }) => this.progress.set(progress),
});
this.ready.set(true);

LAB #2: Downloading a model (3/3)
In app.component.html, add the following lines:
<div><progress [value]="progress()"></progress></div>
<input type="text" #prompt>
<button (click)="runPrompt(prompt.value)" [disabled]="!ready()">Ask</button>
Launch the app via npm start. The progress bar should begin to move.

Cache API: Storing model files locally
Figure: the website (HTML/JS) downloads the model files from Hugging Face via the internet and keeps them in a cache.
Note: Due to the Same-Origin Policy, models cannot be shared across origins.

Cache API: Parameter cache

WebAssembly (Wasm)
– Bytecode for the web
– Compile target for arbitrary languages
– Can be faster than JavaScript
– WebLLM uses a model-specific Wasm library to accelerate model computations
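To make “bytecode for the web” tangible, the following sketch (independent of WebLLM) instantiates a tiny hand-assembled Wasm module that exports an `add` function. In the WebAssembly text format, it corresponds to `(func (export "add") (param i32 i32) (result i32) local.get 0 local.get 1 i32.add)`.

The sketch uses the synchronous `WebAssembly.Module` and `WebAssembly.Instance` constructors, which are acceptable for a module this small. Browsers limit synchronous compilation of larger modules on the main thread, so real apps typically load Wasm with `WebAssembly.instantiateStreaming()`.

```typescript
// Hand-assembled WebAssembly binary exporting add(a: i32, b: i32): i32.
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // magic number "\0asm"
  0x01, 0x00, 0x00, 0x00, // binary format version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type section: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // function section: one function of type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export section: "add" -> function 0
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // code: local.get 0, local.get 1, i32.add
]);

// Compile and instantiate synchronously (acceptable for tiny modules only).
const wasmModule = new WebAssembly.Module(wasmBytes);
const wasmInstance = new WebAssembly.Instance(wasmModule);
const add = wasmInstance.exports.add as (a: number, b: number) => number;

console.log(add(2, 3)); // 5
```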

WebGPU
– Grants low-level access to the Graphics Processing Unit (GPU)
– Near-native performance for machine learning applications
– Supported by Chromium-based browsers on Windows and macOS from version 113
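Because WebGPU is not available everywhere, an app should feature-detect it before offering local inference. This is a minimal sketch.

- It uses a structural stand-in for `navigator` (the hypothetical `GpuNavigator` type), so it runs without WebGPU type definitions.
- In the browser, you would pass the real `navigator`.
- `navigator.gpu.requestAdapter()` resolves to `null` when no suitable adapter is available.

```typescript
// Structural subset of the browser's Navigator, enough for feature detection.
interface GpuNavigator {
  gpu?: { requestAdapter(): Promise<unknown> };
}

// Returns true if WebGPU is exposed and an adapter can actually be obtained.
async function isWebGpuUsable(nav: GpuNavigator): Promise<boolean> {
  if (!nav.gpu) {
    return false; // API not exposed (unsupported browser or insecure context)
  }
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter !== null; // null: no suitable GPU/driver found
  } catch {
    return false;
  }
}
```

Checking for an adapter, not just for `navigator.gpu`, matters. A browser may expose the API while the device’s GPU or driver is not supported.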

WebNN
– Grants web apps access to the device’s CPU, GPU and Neural Processing Unit (NPU)
– In specification by the WebML Working Group at W3C
– Implementation in progress in Chromium (behind a flag)
– Even better performance compared to WebGPU
Source: https://webmachinelearning.github.io/webnn-intro/
DEMO

WebNN: near-native inference performance
Source: Intel. Browser: Chrome Canary 118.0.5943.0; DUT: Dell/Linux/i7-1260P, single P-core; workloads: MediaPipe solution models (FP32, batch=1)

LAB #3: Model inference (1/3)
In app.component.ts, add the following line at the top of the class:
protected readonly reply = signal('');

LAB #3: Model inference (2/3)
In the runPrompt() method, add the following code. The method must be declared as async runPrompt(userPrompt: string), and ChatCompletionMessageParam must be imported from @mlc-ai/web-llm.
// Start a fresh conversation for every prompt
await this.engine!.resetChat();
// Placeholder while the model is generating
this.reply.set('…');
const messages: ChatCompletionMessageParam[] = [
  { role: "user", content: userPrompt }
];
const reply = await this.engine!.chat.completions.create({ messages });
this.reply.set(reply.choices[0].message.content ?? '');

LAB #3: Model inference (3/3)
In app.component.html, add the following line:
<pre>{{ reply() }}</pre>
You should now be able to send prompts to the model and see the responses in the template.
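LAB #3 waits for the complete answer before displaying it. WebLLM’s OpenAI-style API can also stream the answer when `stream: true` is passed to `engine.chat.completions.create()`. In that case, each chunk carries the new text in `choices[0].delta.content`, as in OpenAI’s API.

The sketch below assumes that chunk shape, modeled by the hypothetical `ChatChunk` type. It accumulates the deltas and reports the text received so far after each chunk.

```typescript
// Minimal structural type for an OpenAI-style streaming chunk (assumption).
interface ChatChunk {
  choices: { delta: { content?: string | null } }[];
}

// Concatenates streamed deltas and reports the text received so far.
async function collectStream(
  chunks: AsyncIterable<ChatChunk>,
  onUpdate: (textSoFar: string) => void,
): Promise<string> {
  let text = '';
  for await (const chunk of chunks) {
    // Some chunks (e.g. a final usage chunk) may carry no choices or no content.
    const delta = chunk.choices[0]?.delta.content ?? '';
    if (delta) {
      text += delta;
      onUpdate(text);
    }
  }
  return text;
}
```

In the component, you could use it like this:

`const chunks = await this.engine!.chat.completions.create({ messages, stream: true }); await collectStream(chunks, (text) => this.reply.set(text));`

The reply then appears token by token instead of all at once.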


LAB #4
npm run build

LAB #4: Build issues
1. In angular.json, increase the bundle size budget for the Angular project (property architect.build.configurations.production.budgets[0].maximumError) to at least 5 MB.
2. Then, run npm run build again. This time, the build should succeed.
3. If you stopped the development server, don’t forget to bring it back up (npm start).

LAB #5: Todo management (1/2)
In app.component.ts, add the following signal at the top of the class:
protected readonly todos = signal<Todo[]>([]);
Add the following line to the addTodo() method:
this.todos.update(todos => [...todos, { done: false, text }]);

LAB #5: Todo management (2/2)
In app.component.html, add the following lines to add todos from the UI:
<input type="text" #input>
<button (click)="addTodo(input.value)">Add</button>
<ul>
  @for (todo of todos(); track $index) {
    <li>{{ todo.text }}</li>
  }
</ul>

LAB #6: Todo management (extended)
In app.component.ts, add the following lines to toggleTodo():
this.todos.update(todos => todos.map((todo, todoIndex) =>
  todoIndex === index ? { ...todo, done: !todo.done } : todo));
In app.component.html, add the following content to the <li> node:
<input type="checkbox" [checked]="todo.done" (change)="toggleTodo($index)">
You should now be able to toggle the checkboxes.
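The slides do not show the `Todo` type. The sketch below assumes it has only the two fields the labs use, `done` and `text`.

It extracts the update logic from LAB #5 and LAB #6 into pure functions, so the logic can be tested outside Angular. `withTodoAdded` and `withTodoToggled` are hypothetical names. The component could then call `this.todos.update(todos => withTodoAdded(todos, text))`.

```typescript
// Assumed shape of a todo item, matching the fields used in the labs.
interface Todo {
  done: boolean;
  text: string;
}

// Returns a new array with an additional open todo. The input is not mutated,
// so a signal holding the array sees a new reference and notifies consumers.
function withTodoAdded(todos: readonly Todo[], text: string): Todo[] {
  return [...todos, { done: false, text }];
}

// Returns a new array in which the todo at `index` has its done flag flipped.
function withTodoToggled(todos: readonly Todo[], index: number): Todo[] {
  return todos.map((todo, todoIndex) =>
    todoIndex === index ? { ...todo, done: !todo.done } : todo,
  );
}
```

Returning new arrays and objects, rather than mutating them, matters here. Angular signals compare values by reference by default, so an in-place mutation would not trigger an update.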

Chat with data: Concept and limitations
– The todo data has to be converted into natural language.
– For the sake of simplicity, we will add all todos to the prompt.
– Remember: LLMs have a context window (Mistral-7B: 8K).
– If you need to chat with larger sets of text, refer to Retrieval Augmented Generation (RAG).
Example:
These are the todos:
* Wash clothes
* Pet the dog
* Take out the trash
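Because all todos go into the prompt, the list must fit into the context window together with the system prompt, the question and the answer. The sketch below shows a rough check.

- It relies on the common rule of thumb of about four characters per token for English text. This is only an approximation; the real count depends on the model’s tokenizer.
- The values 8192 (for the 8K window) and 1024 tokens reserved for the answer are illustrative assumptions.

```typescript
// Rough heuristic: about 4 characters per token for English text.
// This is an approximation; the actual count depends on the model's tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Checks whether a prompt likely fits into the context window while leaving
// `reservedForAnswer` tokens for the model's reply.
function fitsContextWindow(
  promptText: string,
  contextWindowTokens: number,
  reservedForAnswer: number,
): boolean {
  return estimateTokens(promptText) + reservedForAnswer <= contextWindowTokens;
}

const todoList = ['Wash clothes', 'Pet the dog', 'Take out the trash']
  .map((todo) => `* ${todo}`)
  .join('\n');

console.log(estimateTokens(todoList)); // 13
console.log(fitsContextWindow(todoList, 8192, 1024)); // true
```

Once such a check fails for realistic data sizes, adding everything to the prompt no longer works. That is the point where RAG, which retrieves only the relevant todos, becomes worthwhile.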

Chat with data: System prompt
Metaprompt that defines…
– character
– capabilities/limitations
– output format
– behavior
– grounding data
Hallucinations and prompt injections cannot be eliminated.
Example:
You are a helpful assistant. Answer user questions on todos. Generate a valid JSON object. Avoid negative content. These are the user’s todos: …

Chat with data: Flow
System message: The user has these todos: 1. … 2. … 3. …
User message: How many todos do I have?
Assistant message: You have three todos.

LAB #7: Chat with data, using a system & user prompt
Adjust the implementation in runPrompt() to include the system prompt:
const systemPrompt = `Here's the user's todo list:
${this.todos().map(todo => `* ${todo.text} (${todo.done ? 'done' : 'not done'})`).join('\n')}`;
const messages: ChatCompletionMessageParam[] = [
  { role: "system", content: systemPrompt },
  { role: "user", content: userPrompt }
];

Prompt Engineering: Techniques
– Providing examples (one-shot, few-shot, …)
– Priming outputs
– Specifying the output structure
– Repeating instructions
– Chain of thought
– …
Success also depends on the model.
https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/advanced-prompt-engineering
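This sketch illustrates the first technique, providing examples. A few-shot conversation inserts example user/assistant pairs between the system prompt and the actual question.

- The local `ChatMessage` type mirrors the role/content shape of `ChatCompletionMessageParam`.
- `buildFewShotMessages` is a hypothetical helper name.
- The example pair is taken from the flow slide.

```typescript
// Mirrors the role/content shape of OpenAI-style chat messages.
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };

// Builds a few-shot prompt: system prompt, example Q&A pairs, then the real question.
function buildFewShotMessages(
  systemPrompt: string,
  examples: ReadonlyArray<{ question: string; answer: string }>,
  userPrompt: string,
): ChatMessage[] {
  return [
    { role: 'system', content: systemPrompt },
    ...examples.flatMap((example): ChatMessage[] => [
      { role: 'user', content: example.question },
      { role: 'assistant', content: example.answer },
    ]),
    { role: 'user', content: userPrompt },
  ];
}

const messages = buildFewShotMessages(
  'You are a helpful assistant. Answer questions about the user’s todos.',
  [{ question: 'How many todos do I have?', answer: 'You have three todos.' }],
  'Which todos are not done yet?',
);
console.log(messages.map((message) => message.role).join(', '));
// system, user, assistant, user
```

The example answers also prime the tone and length of the model’s replies. Short example answers tend to encourage short real answers.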

LAB #8: Prompt Engineering
const systemPrompt = `You are a helpful assistant.
The user will ask questions about their todo list.
Briefly answer the questions.
Don't try to make up an answer if you don't know it.
Here's the user's todo list:
${this.todos().map(todo => `* ${todo.text} (this todo is ${todo.done ? 'done' : 'not done'})`).join('\n')}
${this.todos().length === 0 ? 'The list is empty, there are no todos.' : ''}`;

Prompt Engineering: Alternatives
Prompt Engineering → Retrieval Augmented Generation → Fine-tuning → Custom model (increasing effort)

LAB #9: Performance
Add the following line to the runPrompt() method:
console.log(reply.usage);
Ask a new question and check your console for performance statistics.
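The `usage` object follows the OpenAI format and contains token counts such as `completion_tokens`. To relate them to the tokens/sec figures in the comparison chart, you can compute throughput from the completion token count and the measured wall-clock time.

- `tokensPerSecond` is a hypothetical helper.
- Timing with `performance.now()` around the `create()` call is an assumption of this sketch.
- That measurement includes prompt processing, so it slightly underestimates pure generation speed.

```typescript
// Output tokens per second, given the number of generated (completion)
// tokens and the elapsed wall-clock time in milliseconds.
function tokensPerSecond(completionTokens: number, elapsedMs: number): number {
  if (elapsedMs <= 0) {
    throw new RangeError('elapsedMs must be positive');
  }
  return completionTokens / (elapsedMs / 1000);
}

// Example: 115 tokens generated in 5 seconds.
console.log(tokensPerSecond(115, 5000)); // 23
```

In runPrompt(), you could measure it like this:

`const start = performance.now(); const reply = await this.engine!.chat.completions.create({ messages }); console.log(tokensPerSecond(reply.usage?.completion_tokens ?? 0, performance.now() - start));`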

Performance: Comparison (tokens/sec)
WebLLM (Mistral-7b, M1): 22.98
WebLLM (Mistral-7b, M3): 33.96
OpenAI (GPT-4): 19.08
Azure OpenAI (GPT-4): 38.75
Groq (Mixtral-8x7b): 564.63
Sources: WebLLM/Groq: own tests (March 23, 2024); OpenAI/Azure OpenAI: https://mcplusa.com/comparing-performance-of-openai-gpt-4-and-microsoft-azure-gpt-4/ (August 31, 2023)

Stable Diffusion
– Open-source text-to-image model
– Generates 512×512 px images from a prompt
– WebSD: special version of Stable Diffusion for the web (2 GB in size)
– No npm package this time
Prompt (example image): A guinea pig eating a watermelon

Web Stable Diffusion
https://websd.mlc.ai/
DEMO

Web Stable Diffusion: Live Demo
Retrofitting AI image generation into an existing drawing application (https://paint.js.org)
DEMO

Local AI Models: Pros & Cons
+ Data does not leave the browser (privacy)
+ High availability (offline support)
+ Low latency
+ Stability (no external API changes)
+ Low cost
– Lower quality
– High system (RAM, GPU) and bandwidth requirements
– Large model size, models cannot always be shared
– Model initialization and inference are relatively slow
– APIs are experimental

Local AI Models: Mitigations
Download the model in the background if the user is not on a metered connection.
Helpful APIs:
– Network Information API to estimate the network quality/determine data saver (negative standards position by Apple and Mozilla)
– Storage Manager API to estimate the available free disk space
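This is a minimal sketch of this mitigation. It uses structural stand-ins for `navigator.connection` (Network Information API, not available in all browsers) and `navigator.storage.estimate()` (Storage Manager API).

- `shouldDownloadInBackground` is a hypothetical name.
- The 20% safety margin is an illustrative assumption.
- When network information is missing, the sketch does not assume an unmetered connection. It leaves the download to an explicit user action instead.

```typescript
// Structural subsets of the Network Information and Storage Manager APIs.
interface ConnectionInfo {
  saveData?: boolean;
  type?: string; // e.g. 'wifi', 'ethernet', 'cellular' (not exposed on all platforms)
}
interface StorageInfo {
  estimate(): Promise<{ quota?: number; usage?: number }>;
}

// Decides whether a model of `modelBytes` may be downloaded in the background.
async function shouldDownloadInBackground(
  modelBytes: number,
  connection: ConnectionInfo | undefined,
  storage: StorageInfo | undefined,
): Promise<boolean> {
  // Without network information, do not assume an unmetered connection.
  if (!connection || connection.saveData || connection.type === 'cellular') {
    return false;
  }
  if (!storage) {
    return false;
  }
  const { quota = 0, usage = 0 } = await storage.estimate();
  // The estimate is not exact, so keep a 20% safety margin.
  return quota - usage >= modelBytes * 1.2;
}
```

In the browser, you would pass `navigator.connection` (where available) and `navigator.storage`, together with the model’s size, e.g. roughly 2.2e9 bytes for a model of the size of phi3:3b.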

Local AI Models: Mitigations
Hybrid modes:
– Allow the user to switch between cloud/local execution (availability, system requirements)
– Deploy OSS model on internal/enterprise infrastructure (privacy)
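One way to model a hybrid mode is as interchangeable backends behind a common interface. The sketch below tries backends in order of preference and uses the first available one. For example, it prefers the local model and falls back to the cloud when the model is not downloaded yet or WebGPU is missing.

`ChatBackend`, `isAvailable()`, `complete()` and `pickBackend()` are hypothetical names. They are not part of WebLLM or any cloud SDK.

```typescript
// Hypothetical common interface for local (e.g. WebLLM) and cloud backends.
interface ChatBackend {
  readonly name: string;
  isAvailable(): Promise<boolean>;
  complete(prompt: string): Promise<string>;
}

// Returns the first available backend in the given order of preference.
async function pickBackend(preferred: readonly ChatBackend[]): Promise<ChatBackend> {
  for (const backend of preferred) {
    if (await backend.isAvailable()) {
      return backend;
    }
  }
  throw new Error('No chat backend available');
}
```

A user setting can then decide the order, e.g. `pickBackend(preferCloud ? [cloud, local] : [local, cloud])`. Keeping the rest of the app unaware of which backend answered makes switching modes cheap.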

Local AI Models
Alternatives: Prompt API
Figure: instead of downloading a model from the internet, the website (HTML/JS) uses built-in models provided by the browser or the operating system (Gemini Nano, Apple Intelligence).

Local AI Models
Alternatives: Prompt API
– Exploratory API for local experiments and use case determination
– Downloads Gemini Nano into Google Chrome
– Model is shared across origins
– Uses native APIs directly
– Related APIs: Translation API, Writing Assistance APIs
https://developer.chrome.com/docs/ai/built-in

Local AI Models
Alternatives: Ollama
– Local runner for AI models
– Offers a local server a website can connect to → allows sharing models across origins
– Supported on macOS and Linux (Windows in preview)
https://webml-demo.vercel.app/
https://ollama.ai/
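Connecting to Ollama’s local server could look like the following sketch. It assumes Ollama’s default address `http://localhost:11434` and its `/api/generate` endpoint. With `stream: false`, the endpoint returns the generated text in the `response` field.

- The model name is only an example and must be pulled beforehand (e.g. `ollama pull llama3`).
- Depending on the setup, Ollama may need to be configured to accept requests from the website’s origin (`OLLAMA_ORIGINS`).
- The `fetch` function is injected, so the logic can be tested without a running server.

```typescript
// Minimal fetch-like signature so that a stub can be injected for testing.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Sends a single prompt to a local Ollama server and returns the generated text.
async function generateWithOllama(
  fetchFn: FetchLike,
  model: string,
  prompt: string,
  baseUrl = 'http://localhost:11434',
): Promise<string> {
  const res = await fetchFn(`${baseUrl}/api/generate`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    // stream: false returns one JSON object instead of newline-delimited chunks
    body: JSON.stringify({ model, prompt, stream: false }),
  });
  if (!res.ok) {
    throw new Error(`Ollama request failed with status ${res.status}`);
  }
  const data = (await res.json()) as { response?: string };
  return data.response ?? '';
}
```

In the browser, you would pass the global `fetch`:

`await generateWithOllama(fetch, 'llama3', 'Summarize my todos')`

Because the model lives in Ollama rather than in the origin’s Cache Storage, several websites can share a single download.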

Local AI Models
Alternatives: Hugging Face Transformers
Pre-trained, specialized, significantly smaller models beyond GenAI
Examples:
– Text generation
– Image classification
– Translation
– Speech recognition
– Image-to-text

Local AI Models
Alternatives: Transformers.js
– Pre-trained, specialized, significantly smaller models beyond GenAI
– JavaScript library to run Hugging Face transformers in the browser
– Supports most of the models
https://xenova.github.io/transformers.js/

Summary
– Cloud-based models remain the most powerful models
– Due to their size and high system requirements, local generative AI models are currently mainly interesting for very specific scenarios (e.g., high privacy demands, offline availability)
– Small, specialized models are an interesting alternative (if available)
– Large language models are becoming more compact and efficient
– Vendors are starting to ship AI models with their devices
– Devices are becoming more powerful for running AI tasks
– Experiment with the AI APIs and make your Angular app smarter!

Thank you for your kind attention!
Christian Liebel
@christianliebel
christian.liebel@thinktecture.com
