PREL13: Learn how to observe, manage, and scale agentic AI apps using Azure

This hands-on workshop will provide participants with the skills to effectively manage, govern, and scale agentic AI applications using Azure and Azure AI Foundry. The session will cover observability capabilities, model management policies, agent functionalities, and governance strategies. Participants will engage in practical exercises to apply these concepts in real-world scenarios.

Level: 300-400
Duration: 4 hours

Visit the Repo:
https://github.com/microsoft/ignite25-PREL13-observe-manage-and-scale-agentic-ai-apps-with-microsoft-foundry


Nitya Narasimhan, PhD

November 21, 2025

Transcript

  1. Observe, Manage & Scale Agentic AI Apps On Azure AI Foundry · Nitya Narasimhan, Senior AI Advocate, Microsoft · Bethany Jepchumba, AI Advocate, Microsoft · PREL13
  2. Welcome – Meet The Team! Instructors: Nitya Narasimhan, PhD (Senior AI Advocate, Microsoft) and Bethany Jepchumba (AI Advocate, Microsoft). Our amazing proctors: Paul Shealy, Hanchi Wang, Nagkumar Arkalgud, Dave Voutila.
  3. Quick Exercise – Let’s Review Pre-Requisites. Raise your hand. Lower it if you don’t have a GitHub account (create one now; it takes just a minute or two, and proctors can help). Lower it if you have NOT used VS Code before. Lower it if you have NOT coded before in Python, Java, etc. (that’s okay! sit at the front of the room for the instructor-led demo). Hand still raised? This lab was designed for you. Click “Launch” now and follow the steps.
  4. Lab Overview: 5 Things To Know Before We Begin. BROWSER-BASED LAB · Do NOT log into the Skillable VM; CLICK LAUNCH. 5-MIN INSTRUCTOR REVIEW · We’ll explain the big picture before diving in. YOU NEED A PERSONAL GITHUB ACCOUNT · Create one now if you need it. DO NOT USE YOUR AZURE SUBSCRIPTION · Use the credentials in the lab. INFRA IS PRE-PROVISIONED · Just start the session now.
  5. Lab Outline: What Will You Do (In 4 hours)
    SETUP · Provision resources for your AI Agent app · Azure Developer CLI
    LAB 01 · Create & explore an AI Agent · Azure AI Foundry Agent Service
    LAB 02 · Create synthetic datasets for testing · AI Evaluation SDK Simulator
    LAB 03 · Select a model by evaluating options · AI Evaluation SDK Evaluate Flow
    LAB 04 · Customize the model to improve its tone · Supervised Fine-Tuning
    LAB 05 · Build a custom grader to evaluate the tone · Azure OpenAI Grader
    LAB 06 · Compress the model for less cost, latency · Distillation with Fine-Tuning
    LAB 07 · Run AI-Assisted Quality & Safety Evals · AI Evaluation SDK Evaluate API
    LAB 08 · Run AI-Assisted Agent Evals · AI Evaluation SDK Built-in Evals
    LAB 09 · Activate Tracing & View Performance · OpenTelemetry & Tracing
    LAB 10 · Deploy FT model for testing & monitor it · AI Foundry Portal, Azure Monitor
    WRAPUP · End lab, fill in feedback, next steps · Ignite Sessions & Fast-Follows
  6. Lab Objectives: What Will You Learn
    Scenario · Learn to build Cora (an AI customer service chatbot) for Zava (an enterprise retailer)
    Planning · Jumpstart design with AZD templates – customize, provision & deploy with 1 tool!
    Development · Create the base agent – generate a testing dataset – select a good base model
    Optimization · Fine-tune for tone – build a grader to evaluate it – distill the model to reduce cost
    Observability · Run AI-Assisted Evaluation – explore built-in evaluators – try built-in tracing
    Operationalization · Deploy the FT variant for testing – explore logs – monitor with App Insights
  7. Stage 0: Validate Infrastructure (stage-marker slide; repeats the lab roadmap from slide 5)
  8. The Zava AI Engineer’s journey – a question-answering task using natural language. User input: “I’m painting my living room wall. What should I buy?” Start with prompt engineering, add your data with RAG, then optimize the model with fine-tuning (model adaptation) to deliver a fast, accurate response with less cost. Along the way the flow calls out tone and style, example responses, intent mapping, query extraction, inventory retrieval, personalization, and context engineering. Desired output: “Good choice! I recommend our Eggshell Paint. Would you like to know more about color choices?”
  9. Why does Zava need model customization?
    Bruno Zhao, Zava Customer · “Help me find the right product, for the right price, and I’ll buy it!” · Make it helpful: customize Cora’s tone & response format to be polite, helpful & conversational.
    Robin Counts, Retail Store Manager · “Help me build loyalty & move inventory to grow my store sales.” · Make it precise: improve Cora’s use of retrieved knowledge, reducing customer frustration & driving conversions.
    Kian Lambert, App Dev Manager · “Help me operate Cora to be more effective in cost, response quality.” · Make it cheaper: distill Cora’s knowledge into smaller, cheaper models, reducing cost with comparable accuracy.
  10. Stage 1: Start Development (repeats the lab roadmap from slide 5)
  11. PART 01 – Introduction to AI Agents: Plan & Setup, AI Agents, Model Context, Break / Survey Feedback
  12. Notebook: 00-validate-setup.ipynb (Plan & Setup). Zava Scenario: A quick check to make sure our Azure environment is ready, confirming that all connections and configurations are set up correctly. Learning Objectives: • Verify that your Azure services are properly connected. • Confirm you are ready to begin building the agent.
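
As a rough illustration of what this kind of setup check does, here is a minimal sketch in Python (the environment-variable names are illustrative, not the lab's exact ones):

    # Sketch: confirm required settings exist and that Azure credentials resolve.
    import os
    from azure.identity import DefaultAzureCredential

    REQUIRED = ["AZURE_AI_PROJECT_ENDPOINT", "AZURE_OPENAI_ENDPOINT"]  # illustrative names

    missing = [name for name in REQUIRED if not os.environ.get(name)]
    if missing:
        raise SystemExit(f"Missing environment variables: {missing} (re-run provisioning or check your .env)")

    # DefaultAzureCredential tries environment variables, managed identity, Azure CLI login, etc.
    token = DefaultAzureCredential().get_token("https://cognitiveservices.azure.com/.default")
    print("Credential OK; token expires at", token.expires_on)
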
  13. Notebook: 11-build-cora-retail-agent.ipynb (AI Agents). Zava Scenario: We will create “Cora,” an AI assistant for Zava that helps customers by answering their questions about different products. Learning Objectives: • Build a simple AI Agent. • Ground the agent in product information.
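
For concreteness, a minimal sketch of creating and calling an agent like Cora with the azure-ai-projects / azure-ai-agents SDKs; the endpoint variable, model deployment name, and instructions are placeholders, and exact method names vary between preview versions:

    import os
    from azure.identity import DefaultAzureCredential
    from azure.ai.projects import AIProjectClient

    project = AIProjectClient(
        endpoint=os.environ["AZURE_AI_PROJECT_ENDPOINT"],      # placeholder
        credential=DefaultAzureCredential(),
    )
    agents = project.agents

    agent = agents.create_agent(
        model="gpt-4o-mini",                                   # any deployed chat model
        name="cora-retail-agent",
        instructions="You are Cora, Zava's retail assistant. Be polite, concise, and helpful.",
    )

    thread = agents.threads.create()
    agents.messages.create(thread_id=thread.id, role="user",
                           content="Which paint should I buy for a living room wall?")
    run = agents.runs.create_and_process(thread_id=thread.id, agent_id=agent.id)

    for message in agents.messages.list(thread_id=thread.id):
        print(message.role, message.content)
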
  14. Notebooks: 21-simulate-dataset.ipynb | 22-evaluate-models.ipynb (Model Context). Zava Scenario: We will use AI to generate synthetic query-response pairs from our product catalog. Next, we compare different AI models to find the best one for Cora based on quality, safety, and speed. Learning Objectives: • Generate synthetic test data. • Run evaluations across multiple models using built-in quality and safety metrics. • Compare and select the optimal model.
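
The synthetic-data step uses the azure-ai-evaluation Simulator, which generates queries from source text and collects your app's answers through a callback. A minimal sketch, assuming the documented callback contract (the catalog excerpt and the stubbed answer are invented):

    import asyncio, os
    from azure.ai.evaluation.simulator import Simulator

    model_config = {                                           # generator model, placeholder values
        "azure_endpoint": os.environ["AZURE_OPENAI_ENDPOINT"],
        "api_key": os.environ["AZURE_OPENAI_API_KEY"],
        "azure_deployment": "gpt-4o-mini",
    }

    async def cora_callback(messages, stream=False, session_state=None, context=None):
        # In the lab this would call the agent; here we just echo a stub answer.
        latest = messages["messages"][-1]["content"]
        messages["messages"].append({"role": "assistant",
                                     "content": f"(Cora's answer to: {latest})"})
        return {"messages": messages["messages"], "stream": stream,
                "session_state": session_state, "context": context}

    simulator = Simulator(model_config=model_config)
    outputs = asyncio.run(simulator(
        target=cora_callback,
        text="Zava sells interior paints, primers, brushes, and rollers ...",  # catalog excerpt
        num_queries=4,                     # number of synthetic query/response pairs
        max_conversation_turns=1,
    ))
    print(len(outputs), "simulated conversations")
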
  15. Stage 2: Start Model Customization (repeats the lab roadmap from slide 5)
  16. Where does fine-tuning fit in? Fine-tune the LLM to: • Reduce the length of your prompt. • Show (not tell) the model how to behave. • Improve the accuracy when you look up information. • Improve the model’s handling of retrieved data. The slide’s flow: user input “Will my sleeping bag work for my trip to Patagonia next month?” goes through basic prompt engineering and retrieval/RAG to the LLM (covering tone and style, weather lookup, example responses, personalization, intent mapping, and more), producing the output “Yes, your Elite Eco sleeping bag is rated to 21.6F, which is below the average low temperature in Patagonia in September.” LLMs are language calculators.
  17. Distillation & Data Generation. Distillation refers to the process of using a large, general-purpose teacher model to train a smaller student model to perform well at a specific task (definition from the Microsoft Product Terms). Distillation is of particular interest for several reasons: it can reduce costs and latency, improve performance, and let you operate in resource-constrained environments. Distillation typically has three steps:
    • Data Generation (Stored Completions) · log GPT-4o model responses; view, query and filter the data; export filtered data to fine-tuning or evaluation.
    • Training (Azure OpenAI Fine-tuning) · select hyperparameters; fine-tune a GPT-4o-mini model.
    • Evaluation (Azure OpenAI Evaluation) · define testing criteria; evaluate your fine-tuned model; export data with pass status to fine-tune.
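
A minimal sketch of the first two steps (data generation via stored completions, then training), assuming the Azure OpenAI client from the openai package; the API version, file ID, and deployment names are placeholders:

    import os
    from openai import AzureOpenAI

    client = AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-12-01-preview",                      # placeholder preview version
    )

    # 1. Data generation: store teacher (GPT-4o) responses so they can be filtered and exported.
    client.chat.completions.create(
        model="gpt-4o",
        store=True,                                            # persist as a stored completion
        metadata={"scenario": "cora-tone"},                    # used to filter/export later
        messages=[{"role": "user",
                   "content": "Which paint finish works best for a living room wall?"}],
    )

    # 2. Training: fine-tune the smaller student model on the exported JSONL file.
    job = client.fine_tuning.jobs.create(
        model="gpt-4o-mini",
        training_file="file-...",                              # ID of the exported dataset (placeholder)
        suffix="cora-distilled",
    )
    print(job.id, job.status)
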
  18. Parameter Selection. • Batch size is how many training examples you use in a single pass during training – a trade-off between speed & accuracy. • Learning rate multiplier multiplies the original learning rate used by the base model: values > 1 increase the learning rate, values < 1 decrease it. • Epochs determine the number of passes through the training data: too few – underfit; too many – overfit. • Seed sets the random seed for your run – so you can get reproducible results! Start with the defaults (except for seed) and iterate from there.
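
The same job-creation call accepts these hyperparameters explicitly; a minimal sketch (the file ID and values are placeholders, and the defaults are usually the right starting point):

    import os
    from openai import AzureOpenAI

    client = AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-12-01-preview",
    )

    job = client.fine_tuning.jobs.create(
        model="gpt-4o-mini",
        training_file="file-...",                  # placeholder file ID
        seed=42,                                   # fixed seed for reproducible runs
        hyperparameters={
            "n_epochs": 3,                         # too few underfits, too many overfits
            "batch_size": 8,                       # speed vs. accuracy trade-off
            "learning_rate_multiplier": 1.0,       # >1 learns faster, <1 more conservatively
        },
    )
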
  19. Notebook: 32-custom-grader.ipynb (Custom Graders). Zava Scenario: To ensure Cora consistently delivers quality responses with the right tone and style, we need a custom evaluator that can measure performance based on Zava's specific business criteria. Learning Objectives: • Create a custom evaluator with business-specific grading criteria. • Establish baseline “gold standard” responses for consistent evaluation.
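
The lab builds its grader with Azure OpenAI graders; as a simpler illustration of the same idea, here is a minimal code-based custom evaluator in the callable shape the azure-ai-evaluation SDK accepts (the tone criteria are invented for the sketch):

    class ZavaToneEvaluator:
        """Toy custom evaluator: does a response follow Zava's tone guidelines?"""

        POLITE_MARKERS = ("please", "thank", "happy to help", "great choice")  # illustrative criteria

        def __call__(self, *, response: str, **kwargs):
            text = response.lower()
            hits = sum(marker in text for marker in self.POLITE_MARKERS)
            return {
                "zava_tone_score": hits / len(self.POLITE_MARKERS),   # 0.0 to 1.0
                "zava_tone_pass": hits >= 1,
            }

    # Instances can be passed in the `evaluators` dict of azure.ai.evaluation.evaluate(...).
    print(ZavaToneEvaluator()(response="Great choice! Happy to help you pick a finish."))
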
  20. Notebook: 31-basic-finetuning.ipynb (Supervised FT). Zava Scenario: To make Cora sound consistently polite, factual, and helpful without lengthy prompts, we'll fine-tune a model to embed Zava's tone and style directly into its responses. Learning Objectives: • Prepare and validate training datasets in JSONL format. • Submit and monitor a fine-tuning job.
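
For reference, a minimal sketch of writing a chat-format JSONL training file of the kind a supervised fine-tuning job expects (the example conversation is invented):

    import json

    records = [
        {"messages": [
            {"role": "system", "content": "You are Cora, Zava's polite, helpful retail assistant."},
            {"role": "user", "content": "Which paint should I buy for a living room wall?"},
            {"role": "assistant", "content": "Great choice of project! I recommend our Eggshell Paint. "
                                             "Would you like to hear about color options?"},
        ]},
        # ... more examples; fine-tuning generally needs dozens to hundreds of rows ...
    ]

    with open("zava_tone_train.jsonl", "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")      # one JSON object per line
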
  21. Notebook: 33-distill-finetuning.ipynb (Distillation). Zava Scenario: To make Cora faster and cheaper while maintaining quality, we'll transfer knowledge from a large, capable model to a small, efficient model using distillation and the custom grader we built. Learning Objectives: • Identify the best “teacher” model using evaluation results. • Generate training data from the teacher's high-quality responses. • Fine-tune a smaller “student” model through knowledge distillation. • Validate that the student model maintains quality while improving cost and speed.
  22. Stage 3: Explore Evaluations & Tracing Capabilities (repeats the lab roadmap from slide 5)
  23. Evaluators in Azure AI Foundry (several marked NEW):
    Quality · Document Retrieval, Groundedness, Relevance, Coherence, Fluency, Similarity, NLP Metrics (e.g., F1 Score), AOAI Graders
    Risk & Safety · Indirect Attack Jailbreaks, Direct Attack Jailbreaks, Hate and Unfairness, Sexual, Violence, Self-Harm, Protected Material, Ungrounded Attributes, Code Vulnerability
    Agents · Intent Resolution, Tool Call Accuracy, Task Adherence, Response Completeness
    + Custom evaluators
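
A minimal sketch of invoking one quality evaluator and one safety evaluator from the azure-ai-evaluation SDK; endpoint, deployment, and project values are placeholders, and the exact azure_ai_project argument shape (endpoint string vs. scope dict) depends on the SDK version:

    import os
    from azure.identity import DefaultAzureCredential
    from azure.ai.evaluation import RelevanceEvaluator, ViolenceEvaluator

    model_config = {                                           # judge model for quality evaluators
        "azure_endpoint": os.environ["AZURE_OPENAI_ENDPOINT"],
        "api_key": os.environ["AZURE_OPENAI_API_KEY"],
        "azure_deployment": "gpt-4o-mini",
    }

    relevance = RelevanceEvaluator(model_config=model_config)
    print(relevance(query="What paint finish hides wall imperfections?",
                    response="A matte or eggshell finish hides imperfections better than gloss."))

    # Safety evaluators are backed by the Azure AI Foundry project rather than a judge model.
    violence = ViolenceEvaluator(azure_ai_project=os.environ["AZURE_AI_PROJECT_ENDPOINT"],
                                 credential=DefaultAzureCredential())
    print(violence(query="What paint finish hides wall imperfections?",
                   response="A matte or eggshell finish hides imperfections better than gloss."))
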
  24. Evaluations for agents. Example flow: user query “Weather now.” → user proxy agent (user wants to know the local weather at the current time) → tool agent (call location and time API, call weather API) → response agent (“The temperature is 30 degrees.”). Three agent evaluators (all in preview):
    Intent resolution · correct intent classification, clarification for ambiguity, scope adherence
    Tool calling evaluation · single-step call accuracy, parameter extraction accuracy, multi-step trajectory efficiency
    Task adherence · final response satisfaction, response completeness
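
A minimal sketch of the corresponding (preview) agent evaluators in the azure-ai-evaluation SDK, applied to the single turn above; configuration values are placeholders:

    import os
    from azure.ai.evaluation import (IntentResolutionEvaluator, TaskAdherenceEvaluator,
                                     ToolCallAccuracyEvaluator)

    model_config = {
        "azure_endpoint": os.environ["AZURE_OPENAI_ENDPOINT"],
        "api_key": os.environ["AZURE_OPENAI_API_KEY"],
        "azure_deployment": "gpt-4o-mini",
    }

    query = "Weather now."
    response = "The temperature is 30 degrees."

    print(IntentResolutionEvaluator(model_config=model_config)(query=query, response=response))
    print(TaskAdherenceEvaluator(model_config=model_config)(query=query, response=response))

    # ToolCallAccuracyEvaluator additionally needs the tool definitions and the calls the agent
    # actually made (e.g., the location/time and weather API calls), so it is omitted here.
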
  25. Notebook: 41-first-evaluation-run.ipynb (AI-Assisted Evals). Zava Scenario: Before deploying Cora, we run a first evaluation to ensure her responses are relevant and safe, using a small test dataset and built-in quality and safety evaluators. Learning Objectives: • Configure relevance and safety (violence) evaluators. • Publish results to Azure AI Foundry and save locally. • Interpret relevance and safety metrics for improvement.
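
A minimal sketch of that batch run with azure.ai.evaluation.evaluate, publishing to the Foundry project and saving a local copy; file names and the project endpoint are placeholders:

    import os
    from azure.identity import DefaultAzureCredential
    from azure.ai.evaluation import evaluate, RelevanceEvaluator, ViolenceEvaluator

    model_config = {
        "azure_endpoint": os.environ["AZURE_OPENAI_ENDPOINT"],
        "api_key": os.environ["AZURE_OPENAI_API_KEY"],
        "azure_deployment": "gpt-4o-mini",
    }
    project = os.environ["AZURE_AI_PROJECT_ENDPOINT"]          # placeholder

    result = evaluate(
        data="cora_test_set.jsonl",                            # one {"query": ..., "response": ...} per line
        evaluators={
            "relevance": RelevanceEvaluator(model_config=model_config),
            "violence": ViolenceEvaluator(azure_ai_project=project,
                                          credential=DefaultAzureCredential()),
        },
        azure_ai_project=project,                              # publish the run to Azure AI Foundry
        output_path="./eval_results.json",                     # keep a local copy too
    )
    print(result["metrics"])
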
  26. Notebook: 44-evaluate-agents.ipynb (Built-in Evals). Zava Scenario: Evaluate Cora's multi-step agent behavior (intent, tool use, task adherence, and safety) using specialized Azure AI agent evaluators. Learning Objectives: • Use intent, tool call accuracy, and task adherence evaluators. • Combine quality and safety checks in agent evaluations. • Identify failure patterns to guide agent refinement.
  27. Notebook: 51-trace-cora-retail-agent.ipynb (Tracing & Telemetry). Zava Scenario: Instrument Cora with OpenTelemetry to trace agent reasoning, tool calls, and conversations, exporting telemetry to Application Insights for observability. Learning Objectives: • Create manual spans for session context. • Capture multi-turn chat, tool, and model spans. • Query and analyze traces for performance and reliability.
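
A minimal sketch of the tracing setup: route OpenTelemetry to Application Insights and wrap one conversation turn in manual spans (the connection-string variable and the span/attribute names are placeholders):

    import os
    from azure.monitor.opentelemetry import configure_azure_monitor
    from opentelemetry import trace

    # Send OpenTelemetry traces to the lab's Application Insights resource.
    configure_azure_monitor(connection_string=os.environ["APPLICATIONINSIGHTS_CONNECTION_STRING"])

    tracer = trace.get_tracer(__name__)

    with tracer.start_as_current_span("cora.session") as session_span:
        session_span.set_attribute("session.id", "demo-001")            # illustrative attribute
        with tracer.start_as_current_span("cora.chat_turn") as turn_span:
            turn_span.set_attribute("user.query", "Which paint should I buy?")
            # ... call the agent here; its tool and model calls show up as child spans ...
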
  28. Stage 4: Explore Insights, Deploy Model for Testing (repeats the lab roadmap from slide 5)
  29. Notebook: 60-deployment.ipynb (Deployments). Zava Scenario: Deploy Cora's fine-tuned model to Azure AI Foundry, verify it provisions successfully, and test production responses that follow Zava's tone guidelines. Learning Objectives: • Configure and submit an Azure AI Foundry deployment. • Monitor provisioning status to completion. • Send test prompts to validate behavior and tone.
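
Once the deployment is live, a minimal sketch of sending it a test prompt with the Azure OpenAI client; the deployment name and API version are placeholders:

    import os
    from openai import AzureOpenAI

    client = AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-10-21",
    )

    reply = client.chat.completions.create(
        model="cora-gpt-4o-mini-ft",                   # your fine-tuned deployment's name (placeholder)
        messages=[{"role": "user",
                   "content": "My ceiling paint looks patchy. What should I do?"}],
    )
    print(reply.choices[0].message.content)            # spot-check tone against Zava's guidelines
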
  30. Checkpoints: or, it’s ok if you accidentally overfit. A “checkpoint” is a deployable model created at the end of a training epoch. Using the UI or API you can select, review, and deploy checkpoints just like any other fine-tuned model. The last three checkpoints are automatically saved.
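
A minimal sketch of listing a job's saved checkpoints through the API so an earlier epoch can be deployed if the final one overfits; the job ID is a placeholder:

    import os
    from openai import AzureOpenAI

    client = AzureOpenAI(azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
                         api_key=os.environ["AZURE_OPENAI_API_KEY"],
                         api_version="2024-10-21")

    # Each checkpoint is itself a deployable fine-tuned model.
    for cp in client.fine_tuning.jobs.checkpoints.list("ftjob-..."):   # placeholder job ID
        print(cp.step_number, cp.fine_tuned_model_checkpoint, cp.metrics)
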
  31. Global Training: Fine-tune closer to your data and at lower cost (Generally Available). • 26 supported regions added. • Lower per-token training rates compared to Standard regional training. • Flexible data handling aligned with Azure’s privacy and residency policies. Supported models: GPT-4o, GPT-4o-mini, GPT-4.1, GPT-4.1-mini, GPT-4.1-nano.
  32. Developer Tier: Ship new models confidently (Generally Available). • Free 24-hour hosting for fine-tuned models. • Pay-per-token pricing with no upfront commitment. • Multi-model evaluation to compare outputs side-by-side. • Supports GPT-4.1 and GPT-4.1-mini from any training region. • Ideal for pre-production testing and model selection. • No SLAs or data residency guarantees – fast, flexible, and lightweight.
  33. Notebook: 60-monitoring-observability.ipynb (Observability). Zava Scenario: Monitor deployed agent performance and reliability using Azure AI Foundry portal metrics, Application Insights dashboards, and OpenTelemetry traces for end-to-end observability. Learning Objectives: • Key portal metrics. • App Insights. • Tracing with OpenTelemetry. • Frameworks supported: LangChain, LangGraph, Agent Framework, OpenAI Agents SDK.
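
Beyond the portal dashboards, the same telemetry can be queried programmatically; a minimal sketch using the azure-monitor-query SDK against the workspace backing Application Insights (the workspace ID and the KQL table/column names are placeholders to adjust to your schema):

    import os
    from datetime import timedelta
    from azure.identity import DefaultAzureCredential
    from azure.monitor.query import LogsQueryClient

    logs = LogsQueryClient(DefaultAzureCredential())

    response = logs.query_workspace(
        workspace_id=os.environ["LOG_ANALYTICS_WORKSPACE_ID"],     # placeholder
        query="AppDependencies | where TimeGenerated > ago(1h) | summarize count() by Name",
        timespan=timedelta(hours=1),
    )
    for table in response.tables:
        for row in table.rows:
            print(row)
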
  34. Alerts, Diagnostics & Customized Dashboards (Azure AI Foundry + Azure Monitor). Visualize, debug & alert: broad visibility, drill-in views, smart alerts, automated actions · Azure Dashboards, Workbooks, Grafana.
  35. Continuous Evaluation & Monitoring (Azure AI Foundry + Azure Monitor). Debug with tracing (agent execution flow, evaluation metrics, I/O) and monitor key metrics (performance, quality, safety, resource usage). Available for agents only.
  36. Wrap-up and Next Steps (PREL13) · Nitya Narasimhan, PhD, Senior AI Advocate @Microsoft · Bethany Jepchumba, AI Advocate, Microsoft
  37. Let’s Recap: What You Learned (E2E Development) – repeats the lab objectives from slide 6 (Scenario, Planning, Development, Optimization, Observability, Operationalization)
  38. Let’s Recap: What You Did (sandbox to keep learning) – repeats the lab roadmap from slide 5