Christian Liebel
@christianliebel
Consultant
Smartere Web-Apps mit Angular, WebLLM und Prompt API
Lokal und offlinefähig
Slide 2
Hello, it’s me.
Christian Liebel
X: @christianliebel
Email: christian.liebel@thinktecture.com
Angular & PWA
Slides: thinktecture.com/christian-liebel
Slide 3
09:00–10:30 Block 1
10:30–11:00 Coffee Break
11:00–12:30 Block 2
Timetable
Slide 4
What to expect
Focus on web app development
Focus on Generative AI
Up-to-date insights: the ML/AI field is evolving fast
Live demos on real hardware
Hands-on labs
What not to expect
Deep dive into AI specifics, RAG, model fine-tuning, or training
Stable libraries or specifications
WebSD in Angular
1:1 support
Expectations
Huge downloads! High requirements! Things may break!
Slide 5
DEMO
Slide 6
(Workshop Edition)
Demo Use Case
DEMO
Slide 7
Setup complete?
(Node.js, Google Chrome Canary, editor, Git, macOS/Windows, 20 GB free disk space, 6 GB VRAM)
Setup (1/2)
LAB #0
Slide 8
git clone https://github.com/thinktecture/angular-days-2025-spring-genai.git
cd angular-days-2025-spring-genai
npm i
npm start -- --open
Setup (2/2)
LAB #0
Slide 9
Generative AI everywhere
Source: https://www.apple.com/chde/apple-intelligence/
Slide 10
Run locally on the user’s system
Single-Page Applications
[Diagram: The web server delivers the SPA (HTML, JS, CSS, assets) to the web browser, where the client logic renders the views (HTML/CSS). The SPA talks to the server logic (web APIs, push service, databases) via HTTPS and WebSockets.]
Slide 11
Make SPAs offline-capable
Progressive Web Apps
[Diagram: A service worker sits between the website (HTML/JS) and the internet, intercepting fetch requests and serving responses from its cache.]
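To make the fetch/cache interplay concrete, here is a minimal sketch of a cache-first service worker (the file name sw.js and the precached asset list are assumptions; Angular projects typically use @angular/service-worker instead):

// sw.js – minimal cache-first service worker (sketch; asset list is an assumption)
self.addEventListener('install', (event) => {
  event.waitUntil(
    caches.open('app-shell').then((cache) =>
      cache.addAll(['/', '/index.html', '/main.js', '/styles.css'])
    )
  );
});

self.addEventListener('fetch', (event) => {
  // Serve from the cache first; fall back to the network.
  event.respondWith(
    caches.match(event.request).then((cached) => cached ?? fetch(event.request))
  );
});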
Slide 12
Overview
Generative AI
Text: OpenAI GPT, Mistral, …
Audio/Music: Musico, Soundraw, …
Images: DALL·E, Firefly, …
Video: Sora, Runway, …
Speech: Whisper, tortoise-tts, …
Slide 13
Overview: Generative AI (repeated)
Slide 14
Media
DEMO
Slide 15
Overview: Generative AI (repeated)
Slide 16
Examples
Generative AI Cloud Providers
Slide 17
Drawbacks
Generative AI Cloud Providers
Require a (stable) internet connection
Subject to network latency and server availability
Data is transferred to the cloud service
Require a subscription
Slide 18
Can we run GenAI models locally?
Slide 19
Large: Trained on lots of data
Language: Process and generate text
Models: Programs/neural networks
Examples:
– GPT (ChatGPT, Bing Chat, …)
– Gemini, Gemma (Google)
– Llama (Meta AI)
Large Language Models
Slide 20
Token
A meaningful unit of text (e.g., a word, part of a word, or a character).
Context Window
The maximum number of tokens the model can process at once.
Parameters/Weights
Internal variables learned during training, used to make predictions.
Large Language Models
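To make the context window tangible, here is a hedged sketch that estimates token counts with the common rule of thumb of roughly four characters per token (a heuristic only; real tokenizers differ per model):

// Rough token estimate: ~4 characters per token is a common heuristic,
// not the output of the model's actual tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

const contextWindow = 8192; // e.g., Mistral-7B
const prompt = 'These are the todos: * Wash clothes * Pet the dog';
console.log(estimateTokens(prompt), 'of', contextWindow, 'tokens'); // fits easily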
Slide 21
Prompts serve as the universal interface
Unstructured text conveying specific semantics
Paradigm shift in software architecture
Natural language becomes a first-class citizen
Caveats
Non-determinism and hallucinations, prompt injections
Large Language Models
Slide 22
Size Comparison
Model        Size
phi3:3b      2.2 GB
mistral:7b   4.1 GB
llama3:8b    4.7 GB
gemma2:9b    5.4 GB
gemma2:27b   16 GB
llama3:70b   40 GB
Large Language Models
Slide 23
https://webllm.mlc.ai/
WebLLM
DEMO
Slide 24
On npm
WebLLM
Slide 25
npm i @mlc-ai/web-llm
LAB #1
Slide 26
(1/4)
In app.component.ts, add the following lines (imports: signal from @angular/core, MLCEngine from @mlc-ai/web-llm):

protected readonly progress = signal(0);
protected readonly ready = signal(false);
protected engine?: MLCEngine;
Downloading a model LAB #2
Slide 27
(2/4)
In app.component.ts (ngOnInit()), add the following lines (CreateMLCEngine comes from @mlc-ai/web-llm; ngOnInit() must be declared async):

const model = 'Llama-3.2-3B-Instruct-q4f32_1-MLC';
this.engine = await CreateMLCEngine(model, {
  initProgressCallback: ({ progress }) =>
    this.progress.set(progress)
});
this.ready.set(true);
Downloading a model LAB #2
Slide 28
(3/4)
In app.component.html, change the following lines:

@if (!ready()) {
  <!-- show the download progress here, e.g., a progress bar bound to progress() (placeholder; see the lab template for the actual markup) -->
}
Downloading a model LAB #2
Slide 29
(4/4)
Launch the app via npm start. The progress bar should begin to move.
Downloading a model LAB #2
Slide 30
Storing model files locally
Cache API
[Diagram: The website (HTML/JS) downloads model files from Hugging Face over the internet and stores them in a cache with model files via the Cache API.]
Note: Due to the Same-Origin Policy, models cannot be shared across origins.
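A quick way to see what ended up in the cache is to enumerate it via the Cache API, e.g., in the DevTools console (a sketch; the cache names WebLLM chooses are an implementation detail and may change between versions):

// List all caches on this origin and count their entries.
const cacheNames = await caches.keys();
for (const name of cacheNames) {
  const cache = await caches.open(name);
  const entries = await cache.keys();
  console.log(name, `${entries.length} cached requests`);
}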
Slide 31
Parameter cache
Cache API
Slide 32
WebAssembly (Wasm)
– Bytecode for the web
– Compile target for arbitrary languages
– Can be faster than JavaScript
– WebLLM uses a model-specific Wasm library to accelerate model computations
Slide 33
WebGPU
– Grants low-level access to the Graphics Processing Unit (GPU)
– Near-native performance for machine learning applications
– Supported by Chromium-based browsers on Windows and macOS from version 113
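Before initializing an engine, it can be worth feature-detecting WebGPU; a minimal sketch:

// Feature-detect WebGPU and request an adapter before loading a model.
if (!('gpu' in navigator)) {
  throw new Error('WebGPU is not supported in this browser.');
}
const adapter = await navigator.gpu.requestAdapter();
if (!adapter) {
  throw new Error('No suitable GPU adapter available.');
}
console.log('maxBufferSize:', adapter.limits.maxBufferSize); // rough capability indicator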
Slide 34
webgpureport.org
WebGPU
Slide 35
WebNN
– Grants web apps access to the device’s CPU, GPU, and Neural Processing Unit (NPU)
– In specification by the WebML Working Group at the W3C
– Implementation in progress in Chromium (behind a flag)
– Even better performance compared to WebGPU
Source: https://webmachinelearning.github.io/webnn-intro/
DEMO
Slide 36
WebNN: near-native inference performance
Source: Intel. Browser: Chrome Canary 118.0.5943.0, DUT: Dell/Linux/i7-1260P, single p-core, Workloads: MediaPipe solution models (FP32, batch=1)
Slide 37
WebNN
Source: https://github.com/webmachinelearning/webnn/issues/375#issuecomment-2720701672
Slide 38
(1/4)
In app.component.ts, add the following line at the top of the class:

protected readonly reply = signal('');
Model inference LAB #3
Slide 39
(2/4)
In the runPrompt() method, add the following code (inferWebLLM() and inferPromptApi() are async generators, so their chunks can be consumed with for await):

this.reply.set('…');
const chunks = languageModel === 'webllm'
  ? this.inferWebLLM(userPrompt)
  : this.inferPromptApi(userPrompt);
for await (const chunk of chunks) {
  this.reply.set(chunk);
}
Model inference LAB #3
Slide 40
(3/4)
In the inferWebLLM() method, add the following code:

await this.engine!.resetChat();
const messages: ChatCompletionMessageParam[] = [
  { role: 'user', content: userPrompt }
];
const chunks = await this.engine!.chat.completions.create({ messages, stream: true });
let reply = '';
for await (const chunk of chunks) {
  reply += chunk.choices[0]?.delta.content ?? '';
  yield reply; // yield the accumulated reply so the UI can render partial output
}
Model inference LAB #3
Slide 41
(4/4)
In app.component.html, change the following line:

{{ reply() }}

You should now be able to send prompts to the model and see the responses in the template.
Model inference LAB #3
Slide 42
Slide 43
npm run build
LAB #4
Slide 44
1. In angular.json, increase the bundle size budget for the Angular project (property architect.build.configurations.production.budgets[0].maximumError) to at least 6 MB.
2. Then, run npm run build again. This time, the build should succeed.
3. If you stopped the development server, don’t forget to bring it back up again (npm start).
Build issues LAB #4
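The relevant budgets entry in angular.json might then look like this (excerpt; the surrounding configuration is abbreviated, and the warning threshold shown is Angular’s default):

"configurations": {
  "production": {
    "budgets": [
      {
        "type": "initial",
        "maximumWarning": "500kB",
        "maximumError": "6MB"
      }
    ]
  }
}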
Slide 45
(1/2)
In app.component.ts, add the following signal at the top:

protected readonly todos = signal<{ done: boolean; text: string }[]>([]);

Add the following lines to the addTodo() method:

const text = prompt() ?? '';
this.todos.update(todos => [...todos, { done: false, text }]);
Todo management LAB #5
Slide 46
(2/2)
In app.component.html, add the following lines to render the todos in the UI:

@for (todo of todos(); track $index) {
  {{ todo.text }}
}
Todo management LAB #5
Slide 47
@for (todo of todos(); track $index) {
  {{ todo.text }}
}

⚠ Boo! This pattern is not recommended. Instead, you should set the changed values on the signal. But this doesn’t play well with Angular Material…
Todo management (extended) LAB #6
Slide 48
Concept and limitations
The todo data has to be converted into natural language. For the sake of simplicity, we will add all todos to the prompt.
Remember: LLMs have a context window (Mistral-7B: 8K tokens).
If you need to chat with larger sets of text, refer to Retrieval-Augmented Generation (RAG).
These are the todos:
* Wash clothes
* Pet the dog
* Take out the trash
Chat with data
Slide 49
System prompt
Metaprompt that defines…
– character
– capabilities/limitations
– output format
– behavior
– grounding data
Hallucinations and prompt injections cannot be eliminated.
Example:
You are a helpful assistant.
Answer user questions on todos.
Generate a valid JSON object.
Avoid negative content.
These are the user’s todos: …
Chat with data
Slide 50
Flow
System message: “The user has these todos: 1. … 2. … 3. …”
→ User message: “How many todos do I have?”
→ Assistant message: “You have three todos.”
Chat with data
Slide 51
Using a system & user prompt
Adjust the code in inferWebLLM() to include the system prompt:

const systemPrompt = `Here's the user's todo list:
${this.todos().map(todo => `* ${todo.text} (${todo.done ? 'done' : 'not done'})`).join('\n')}`;
const messages: ChatCompletionMessageParam[] = [
  { role: 'system', content: systemPrompt },
  { role: 'user', content: userPrompt }
];
Chat with data LAB #7
Slide 52
Techniques
– Providing examples (one-shot, few-shot, …)
– Priming outputs
– Specifying an output structure
– Repeating instructions
– Chain of thought
– …
Success also depends on the model.
Prompt Engineering
https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/advanced-prompt-engineering
Slide 53
const systemPrompt = `You are a helpful assistant.
The user will ask questions about their todo list.
Briefly answer the questions.
Don't try to make up an answer if you don't know it.
Here's the user's todo list:
${this.todos().map(todo => `* ${todo.text} (this todo is ${todo.done ? 'done' : 'not done'})`).join('\n')}
${this.todos().length === 0 ? 'The list is empty, there are no todos.' : ''}`;
Prompt Engineering LAB #8
Slide 54
Alternatives, ordered by increasing effort:
Prompt Engineering → Retrieval-Augmented Generation → Fine-tuning → Custom model
Prompt Engineering
Slide 55
Comparison (tokens/sec):
WebLLM (Llama3-8b, M4): 45
Azure OpenAI (gpt-4o-mini): 33
Groq (Llama3-8b): 1200
Performance
WebLLM/Groq: Own tests (14.11.2024), OpenAI/Azure OpenAI: https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/provisioned-throughput (18.07.2024)
Slide 56
– Open-source text-to-image model
– Generates 512×512 px images from a prompt
– WebSD: special version of Stable Diffusion for the web (2 GB in size)
– No npm package this time
Stable Diffusion
Prompt: A guinea pig eating a watermelon
Slide 57
https://websd.mlc.ai/
Web Stable Diffusion
DEMO
Slide 58
Pros & Cons
+ Data does not leave the browser (privacy)
+ High availability (offline support)
+ Low latency
+ Stability (no external API changes)
+ Low cost
– Lower quality
– High system (RAM, GPU) and bandwidth requirements
– Large model size; models cannot always be shared
– Model initialization and inference are relatively slow
– APIs are experimental
Local AI Models
Slide 59
Mitigations
Download the model in the background if the user is not on a metered connection.
Helpful APIs:
– Network Information API to estimate the network quality/determine data saver (negative standards position by Apple and Mozilla)
– Storage Manager API to estimate the available free disk space
Local AI Models
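A combined pre-download check could look like this sketch (the 5 GB threshold is an assumption; navigator.connection is accessed defensively because the Network Information API is not available everywhere):

// Only start the large model download under favorable conditions.
const connection = (navigator as any).connection; // Network Information API (where supported)
const dataSaverOn = connection?.saveData === true;

const { usage, quota } = await navigator.storage.estimate(); // Storage Manager API
const enoughSpace = (quota ?? 0) - (usage ?? 0) > 5_000_000_000; // ~5 GB, an assumed threshold

if (!dataSaverOn && enoughSpace) {
  // start downloading the model in the background
}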
Slide 60
Mitigations
Hybrid modes:
– Allow the user to switch between cloud/local execution (availability, system requirements)
– Deploy an OSS model on internal/enterprise infrastructure (privacy)
Local AI Models
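One way to structure such a hybrid mode, as a sketch (the capability check and the cloud fallback are assumptions, not part of the lab code):

// Pick the execution mode per device: local WebLLM if the hardware qualifies,
// otherwise a cloud- or enterprise-hosted model.
async function pickExecutionMode(): Promise<'local' | 'cloud'> {
  const adapter = 'gpu' in navigator
    ? await (navigator as any).gpu.requestAdapter()
    : null;
  return adapter ? 'local' : 'cloud';
}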
Slide 61
Alternatives: Prompt API
Local AI Models
[Diagram: The website (HTML/JS) talks through the browser to a model provided by the browser or operating system, e.g., Gemini Nano (Chrome) or Apple Intelligence, instead of fetching one from the internet.]
Slide 62
Alternatives: Prompt API
– Exploratory API for local experiments and use-case determination
– Downloads Gemini Nano into Google Chrome
– Model is shared across origins
– Uses native APIs directly
– Related APIs: Translation API, Writing Assistance APIs
Local AI Models
https://developer.chrome.com/docs/ai/built-in
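Since the API is exploratory, a defensive availability check is advisable; a sketch matching the window.ai.languageModel shape used in Lab #9 (the exact surface changes between Chrome versions):

// Check whether the built-in language model is available before creating a session.
const ai = (window as any).ai;
if (ai?.languageModel) {
  const { available } = await ai.languageModel.capabilities();
  console.log(available); // e.g., 'readily', 'after-download', or 'no'
}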
Slide 63
https://www.google.com/chrome/canary/
about://flags
Enables optimization guide on device → Enabled BypassPerfRequirement
Prompt API for Gemini Nano → Enabled
about://on-device-internals
Local AI Models
Slide 64
Add the following lines to the inferPromptApi() method:

const systemPrompt = `The user will ask questions about their todo list.
Here's the user's todo list:
${this.todos().map(todo => `* ${todo.text} (${todo.done ? 'done' : 'not done'})`).join('\n')}`;
const session = await window.ai.languageModel.create({ systemPrompt });
const chunks = session.promptStreaming(userPrompt);
let reply = '';
for await (const chunk of chunks) {
  reply += chunk;
  yield reply;
}
Local AI Models LAB #9
Slide 65
Alternatives: Ollama
– Local runner for AI models
– Offers a local server a website can connect to → allows sharing models across origins
– Supported on macOS and Linux (Windows in preview)
https://webml-demo.vercel.app/
https://ollama.ai/
Local AI Models
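Talking to a local Ollama server from a web app is a plain HTTP call; a sketch (assumes a pulled llama3 model and that Ollama’s CORS configuration, e.g., OLLAMA_ORIGINS, permits the requesting origin):

// Ask the local Ollama server (default port 11434) for a completion.
const response = await fetch('http://localhost:11434/api/generate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ model: 'llama3', prompt: 'Why is the sky blue?', stream: false }),
});
const { response: answer } = await response.json();
console.log(answer);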
Slide 66
Alternatives: Hugging Face Transformers
Pre-trained, specialized, significantly smaller models beyond GenAI
Examples:
– Text generation
– Image classification
– Translation
– Speech recognition
– Image-to-text
Local AI Models
Slide 67
Alternatives: Transformers.js
– Pre-trained, specialized, significantly smaller models beyond GenAI
– JavaScript library to run Hugging Face transformers in the browser
– Supports most of the models
https://xenova.github.io/transformers.js/
Local AI Models
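A minimal Transformers.js usage sketch (the model behind the pipeline is downloaded on first use and cached; package name as published by the project):

import { pipeline } from '@xenova/transformers';

// Create a sentiment-analysis pipeline with the library's default model.
const classify = await pipeline('sentiment-analysis');
const result = await classify('I love running models locally!');
console.log(result); // e.g., [{ label: 'POSITIVE', score: 0.99 }]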
Slide 68
– Cloud-based models remain the most powerful models
– Due to their size and high system requirements, local generative AI models are currently mainly interesting for specific scenarios (e.g., high privacy demands, offline availability)
– Small, specialized models are an interesting alternative (if available)
– Large language models are becoming more compact and efficient
– Vendors are starting to ship AI models with their devices
– Devices are becoming more powerful for running AI tasks
– Experiment with the AI APIs and make your Angular app smarter!
Summary
Slide 69
Thank you for your kind attention!
Christian Liebel
@christianliebel
[email protected]