PREL13: Learn how to observe, manage, and scale agentic AI apps using Azure

This hands-on workshop will provide participants with the skills to effectively manage, govern, and scale agentic AI applications using Azure and Azure AI Foundry. The session will cover observability capabilities, model management policies, agent functionalities, and governance strategies. Participants will engage in practical exercises to apply these concepts in real-world scenarios.

Level: 300-400
Duration: 4 hours

Visit the Repo:
https://github.com/microsoft/ignite25-PREL13-observe-manage-and-scale-agentic-ai-apps-with-microsoft-foundry


Nitya Narasimhan, PhD

November 21, 2025

Transcript

  1. Observe, Manage & Scale Agentic AI Apps On Azure AI Foundry · Nitya Narasimhan, Senior AI Advocate, Microsoft · Bethany Jepchumba, AI Advocate, Microsoft · PREL13
  2. Welcome – Meet The Team! Instructors: Nitya Narasimhan, PhD (Senior AI Advocate, Microsoft) and Bethany Jepchumba (AI Advocate, Microsoft). Our amazing proctors: Paul Shealy, Hanchi Wang, Nagkumar Arkalgud, Dave Voutila.
  3. Quick Exercise – Let’s Review Pre-Requisites. Raise your hand. Lower it if you don’t have a GitHub account (create one now; it takes just a minute or two, and proctors can help). Lower it if you have NOT used VS Code before. Lower it if you have NOT coded before in Python, Java, etc. (that’s okay! sit at the front of the room for the instructor-led demo). Hand still raised? This lab was designed for you. Click “Launch” now and follow the steps.
  4. Lab Overview: 5 Things To Know Before We Begin. BROWSER-BASED LAB · Do NOT log into the Skillable VM; CLICK LAUNCH. 5-MIN INSTRUCTOR REVIEW · We’ll explain the big picture before diving in. YOU NEED A PERSONAL GITHUB ACCOUNT · Create one now if you need it. DO NOT USE YOUR AZURE SUBSCRIPTION · Use the credentials in the lab. INFRA IS PRE-PROVISIONED · Just start the session now.
  5. Lab Outline: What Will You Do (In 4 hours)
    SETUP · Provision resources for your AI Agent app · Azure Developer CLI
    LAB 01 · Create & explore an AI Agent · Azure AI Foundry Agent Service
    LAB 02 · Create synthetic datasets for testing · AI Evaluation SDK Simulator
    LAB 03 · Select a model by evaluating options · AI Evaluation SDK Evaluate Flow
    LAB 04 · Customize the model to improve its tone · Supervised Fine-Tuning
    LAB 05 · Build a custom grader to evaluate the tone · Azure OpenAI Grader
    LAB 06 · Compress the model for less cost, latency · Distillation with Fine-Tuning
    LAB 07 · Run AI-Assisted Quality & Safety Evals · AI Evaluation SDK Evaluate API
    LAB 08 · Run AI-Assisted Agent Evals · AI Evaluation SDK Built-in Evals
    LAB 09 · Activate Tracing & View Performance · OpenTelemetry & Tracing
    LAB 10 · Deploy FT model for testing & monitor it · AI Foundry Portal, Azure Monitor
    WRAPUP · End lab, fill in feedback, next steps · Ignite Sessions & Fast-Follows
  6. Lab Objectives: What Will You Learn
    Scenario · Learn to build Cora (an AI customer service chatbot) for Zava (an enterprise retailer)
    Planning · Jumpstart design with AZD templates – customize, provision & deploy with 1 tool!
    Development · Create the base agent – generate a testing dataset – select a good base model
    Optimization · Fine-tune for tone – build a grader to evaluate it – distill the model to reduce cost
    Observability · Run AI-Assisted Evaluation – explore built-in evaluators – try built-in tracing
    Operationalization · Deploy the FT variant for testing – explore logs – monitor with App Insights
  7. Stage 0: Validate Infrastructure (stage-marker slide; repeats the lab roadmap from slide 5)
  8. The Zava AI Engineer’s journey – a question-answering task using natural language. User input: “I’m painting my living room wall. What should I buy?” Start with prompt engineering, add your data with RAG, then optimize the model with fine-tuning (model adaptation) to deliver a fast, accurate response with less cost. Along the way the flow calls out tone and style, example responses, intent mapping, query extraction, inventory retrieval, personalization, and context engineering. Desired output: “Good choice! I recommend our Eggshell Paint. Would you like to know more about color choices?”
  9. Why does Zava need model customization?
    Bruno Zhao, Zava Customer · “Help me find the right product, for the right price, and I’ll buy it!” · Make it helpful: customize Cora’s tone & response format to be polite, helpful & conversational.
    Robin Counts, Retail Store Manager · “Help me build loyalty & move inventory to grow my store sales.” · Make it precise: improve Cora’s use of retrieved knowledge, reducing customer frustration & driving conversions.
    Kian Lambert, App Dev Manager · “Help me operate Cora to be more effective in cost, response quality.” · Make it cheaper: distill Cora’s knowledge into smaller, cheaper models, reducing cost with comparable accuracy.
  10. Stage 1: Start Development (repeats the lab roadmap from slide 5)
  11. PART 01 – Introduction to AI Agents: Plan & Setup, AI Agents, Model Context, Break / Survey Feedback
  12. Notebook: 00-validate-setup.ipynb (Plan & Setup). Zava Scenario: A quick check to make sure our Azure environment is ready, confirming that all connections and configurations are set up correctly. Learning Objectives: • Verify that your Azure services are properly connected. • Confirm you are ready to begin building the agent.
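
As a rough illustration of what this kind of setup check does, here is a minimal sketch in Python (the environment-variable names are illustrative, not the lab's exact ones):

    # Sketch: confirm required settings exist and that Azure credentials resolve.
    import os
    from azure.identity import DefaultAzureCredential

    REQUIRED = ["AZURE_AI_PROJECT_ENDPOINT", "AZURE_OPENAI_ENDPOINT"]  # illustrative names

    missing = [name for name in REQUIRED if not os.environ.get(name)]
    if missing:
        raise SystemExit(f"Missing environment variables: {missing} (re-run provisioning or check your .env)")

    # DefaultAzureCredential tries environment variables, managed identity, Azure CLI login, etc.
    token = DefaultAzureCredential().get_token("https://cognitiveservices.azure.com/.default")
    print("Credential OK; token expires at", token.expires_on)
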
  13. Notebook: 11-build-cora-retail-agent.ipynb (AI Agents). Zava Scenario: We will create “Cora,” an AI assistant for Zava that helps customers by answering their questions about different products. Learning Objectives: • Build a simple AI Agent. • Ground the agent in product information.
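
For concreteness, a minimal sketch of creating and calling an agent like Cora with the azure-ai-projects / azure-ai-agents SDKs; the endpoint variable, model deployment name, and instructions are placeholders, and exact method names vary between preview versions:

    import os
    from azure.identity import DefaultAzureCredential
    from azure.ai.projects import AIProjectClient

    project = AIProjectClient(
        endpoint=os.environ["AZURE_AI_PROJECT_ENDPOINT"],      # placeholder
        credential=DefaultAzureCredential(),
    )
    agents = project.agents

    agent = agents.create_agent(
        model="gpt-4o-mini",                                   # any deployed chat model
        name="cora-retail-agent",
        instructions="You are Cora, Zava's retail assistant. Be polite, concise, and helpful.",
    )

    thread = agents.threads.create()
    agents.messages.create(thread_id=thread.id, role="user",
                           content="Which paint should I buy for a living room wall?")
    run = agents.runs.create_and_process(thread_id=thread.id, agent_id=agent.id)

    for message in agents.messages.list(thread_id=thread.id):
        print(message.role, message.content)
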
  14. Notebooks: 21-simulate-dataset.ipynb | 22-evaluate-models.ipynb (Model Context). Zava Scenario: We will use AI to generate synthetic query-response pairs from our product catalog. Next, we compare different AI models to find the best one for Cora based on quality, safety, and speed. Learning Objectives: • Generate synthetic test data. • Run evaluations across multiple models using built-in quality and safety metrics. • Compare and select the optimal model.
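
The synthetic-data step uses the azure-ai-evaluation Simulator, which generates queries from source text and collects your app's answers through a callback. A minimal sketch, assuming the documented callback contract (the catalog excerpt and the stubbed answer are invented):

    import asyncio, os
    from azure.ai.evaluation.simulator import Simulator

    model_config = {                                           # generator model, placeholder values
        "azure_endpoint": os.environ["AZURE_OPENAI_ENDPOINT"],
        "api_key": os.environ["AZURE_OPENAI_API_KEY"],
        "azure_deployment": "gpt-4o-mini",
    }

    async def cora_callback(messages, stream=False, session_state=None, context=None):
        # In the lab this would call the agent; here we just echo a stub answer.
        latest = messages["messages"][-1]["content"]
        messages["messages"].append({"role": "assistant",
                                     "content": f"(Cora's answer to: {latest})"})
        return {"messages": messages["messages"], "stream": stream,
                "session_state": session_state, "context": context}

    simulator = Simulator(model_config=model_config)
    outputs = asyncio.run(simulator(
        target=cora_callback,
        text="Zava sells interior paints, primers, brushes, and rollers ...",  # catalog excerpt
        num_queries=4,                     # number of synthetic query/response pairs
        max_conversation_turns=1,
    ))
    print(len(outputs), "simulated conversations")
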
  15. Stage 2: Start Model Customization (repeats the lab roadmap from slide 5)
  16. Where does fine-tuning fit in? Fine-tune the LLM to: • Reduce the length of your prompt. • Show (not tell) the model how to behave. • Improve the accuracy when you look up information. • Improve the model’s handling of retrieved data. The slide’s flow: user input “Will my sleeping bag work for my trip to Patagonia next month?” goes through basic prompt engineering and retrieval/RAG to the LLM (covering tone and style, weather lookup, example responses, personalization, intent mapping, and more), producing the output “Yes, your Elite Eco sleeping bag is rated to 21.6F, which is below the average low temperature in Patagonia in September.” LLMs are language calculators.
  17. Distillation & Data Generation. Distillation refers to the process of using a large, general-purpose teacher model to train a smaller student model to perform well at a specific task (definition from the Microsoft Product Terms). Distillation is of particular interest for several reasons: it can reduce costs and latency, improve performance, and let you operate in resource-constrained environments. Distillation typically has three steps:
    • Data Generation (Stored Completions) · log GPT-4o model responses; view, query and filter the data; export filtered data to fine-tuning or evaluation.
    • Training (Azure OpenAI Fine-tuning) · select hyperparameters; fine-tune a GPT-4o-mini model.
    • Evaluation (Azure OpenAI Evaluation) · define testing criteria; evaluate your fine-tuned model; export data with pass status to fine-tune.
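
A minimal sketch of the first two steps (data generation via stored completions, then training), assuming the Azure OpenAI client from the openai package; the API version, file ID, and deployment names are placeholders:

    import os
    from openai import AzureOpenAI

    client = AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-12-01-preview",                      # placeholder preview version
    )

    # 1. Data generation: store teacher (GPT-4o) responses so they can be filtered and exported.
    client.chat.completions.create(
        model="gpt-4o",
        store=True,                                            # persist as a stored completion
        metadata={"scenario": "cora-tone"},                    # used to filter/export later
        messages=[{"role": "user",
                   "content": "Which paint finish works best for a living room wall?"}],
    )

    # 2. Training: fine-tune the smaller student model on the exported JSONL file.
    job = client.fine_tuning.jobs.create(
        model="gpt-4o-mini",
        training_file="file-...",                              # ID of the exported dataset (placeholder)
        suffix="cora-distilled",
    )
    print(job.id, job.status)
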
  18. Parameter Selection. • Batch size is how many training examples you use in a single pass during training – a trade-off between speed & accuracy. • Learning rate multiplier multiplies the original learning rate used by the base model: values > 1 increase the learning rate, values < 1 decrease it. • Epochs determine the number of passes through the training data: too few – underfit; too many – overfit. • Seed sets the random seed for your run – so you can get reproducible results! Start with the defaults (except for seed) and iterate from there.
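
The same job-creation call accepts these hyperparameters explicitly; a minimal sketch (the file ID and values are placeholders, and the defaults are usually the right starting point):

    import os
    from openai import AzureOpenAI

    client = AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-12-01-preview",
    )

    job = client.fine_tuning.jobs.create(
        model="gpt-4o-mini",
        training_file="file-...",                  # placeholder file ID
        seed=42,                                   # fixed seed for reproducible runs
        hyperparameters={
            "n_epochs": 3,                         # too few underfits, too many overfits
            "batch_size": 8,                       # speed vs. accuracy trade-off
            "learning_rate_multiplier": 1.0,       # >1 learns faster, <1 more conservatively
        },
    )
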
  19. Notebook: 32-custom-grader.ipynb (Custom Graders). Zava Scenario: To ensure Cora consistently delivers quality responses with the right tone and style, we need a custom evaluator that can measure performance based on Zava's specific business criteria. Learning Objectives: • Create a custom evaluator with business-specific grading criteria. • Establish baseline “gold standard” responses for consistent evaluation.
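
The lab builds its grader with Azure OpenAI graders; as a simpler illustration of the same idea, here is a minimal code-based custom evaluator in the callable shape the azure-ai-evaluation SDK accepts (the tone criteria are invented for the sketch):

    class ZavaToneEvaluator:
        """Toy custom evaluator: does a response follow Zava's tone guidelines?"""

        POLITE_MARKERS = ("please", "thank", "happy to help", "great choice")  # illustrative criteria

        def __call__(self, *, response: str, **kwargs):
            text = response.lower()
            hits = sum(marker in text for marker in self.POLITE_MARKERS)
            return {
                "zava_tone_score": hits / len(self.POLITE_MARKERS),   # 0.0 to 1.0
                "zava_tone_pass": hits >= 1,
            }

    # Instances can be passed in the `evaluators` dict of azure.ai.evaluation.evaluate(...).
    print(ZavaToneEvaluator()(response="Great choice! Happy to help you pick a finish."))
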
  20. Notebook: 31-basic-finetuning.ipynb (Supervised FT). Zava Scenario: To make Cora sound consistently polite, factual, and helpful without lengthy prompts, we'll fine-tune a model to embed Zava's tone and style directly into its responses. Learning Objectives: • Prepare and validate training datasets in JSONL format. • Submit and monitor a fine-tuning job.
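
For reference, a minimal sketch of writing a chat-format JSONL training file of the kind a supervised fine-tuning job expects (the example conversation is invented):

    import json

    records = [
        {"messages": [
            {"role": "system", "content": "You are Cora, Zava's polite, helpful retail assistant."},
            {"role": "user", "content": "Which paint should I buy for a living room wall?"},
            {"role": "assistant", "content": "Great choice of project! I recommend our Eggshell Paint. "
                                             "Would you like to hear about color options?"},
        ]},
        # ... more examples; fine-tuning generally needs dozens to hundreds of rows ...
    ]

    with open("zava_tone_train.jsonl", "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")      # one JSON object per line
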
  21. Notebook: 33-distill-finetuning.ipynb (Distillation). Zava Scenario: To make Cora faster and cheaper while maintaining quality, we'll transfer knowledge from a large, capable model to a small, efficient model using distillation and the custom grader we built. Learning Objectives: • Identify the best “teacher” model using evaluation results. • Generate training data from the teacher's high-quality responses. • Fine-tune a smaller “student” model through knowledge distillation. • Validate that the student model maintains quality while improving cost and speed.
  22. Stage 3: Explore Evaluations & Tracing Capabilities (repeats the lab roadmap from slide 5)
  23. Evaluators in Azure AI Foundry (several marked NEW):
    Quality · Document Retrieval, Groundedness, Relevance, Coherence, Fluency, Similarity, NLP Metrics (e.g., F1 Score), AOAI Graders
    Risk & Safety · Indirect Attack Jailbreaks, Direct Attack Jailbreaks, Hate and Unfairness, Sexual, Violence, Self-Harm, Protected Material, Ungrounded Attributes, Code Vulnerability
    Agents · Intent Resolution, Tool Call Accuracy, Task Adherence, Response Completeness
    + Custom evaluators
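
A minimal sketch of invoking one quality evaluator and one safety evaluator from the azure-ai-evaluation SDK; endpoint, deployment, and project values are placeholders, and the exact azure_ai_project argument shape (endpoint string vs. scope dict) depends on the SDK version:

    import os
    from azure.identity import DefaultAzureCredential
    from azure.ai.evaluation import RelevanceEvaluator, ViolenceEvaluator

    model_config = {                                           # judge model for quality evaluators
        "azure_endpoint": os.environ["AZURE_OPENAI_ENDPOINT"],
        "api_key": os.environ["AZURE_OPENAI_API_KEY"],
        "azure_deployment": "gpt-4o-mini",
    }

    relevance = RelevanceEvaluator(model_config=model_config)
    print(relevance(query="What paint finish hides wall imperfections?",
                    response="A matte or eggshell finish hides imperfections better than gloss."))

    # Safety evaluators are backed by the Azure AI Foundry project rather than a judge model.
    violence = ViolenceEvaluator(azure_ai_project=os.environ["AZURE_AI_PROJECT_ENDPOINT"],
                                 credential=DefaultAzureCredential())
    print(violence(query="What paint finish hides wall imperfections?",
                   response="A matte or eggshell finish hides imperfections better than gloss."))
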
  24. Evaluations for agents. Example flow: user query “Weather now.” → user proxy agent (user wants to know the local weather at the current time) → tool agent (call location and time API, call weather API) → response agent (“The temperature is 30 degrees.”). Three agent evaluators (all in preview):
    Intent resolution · correct intent classification, clarification for ambiguity, scope adherence
    Tool calling evaluation · single-step call accuracy, parameter extraction accuracy, multi-step trajectory efficiency
    Task adherence · final response satisfaction, response completeness
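
A minimal sketch of the corresponding (preview) agent evaluators in the azure-ai-evaluation SDK, applied to the single turn above; configuration values are placeholders:

    import os
    from azure.ai.evaluation import (IntentResolutionEvaluator, TaskAdherenceEvaluator,
                                     ToolCallAccuracyEvaluator)

    model_config = {
        "azure_endpoint": os.environ["AZURE_OPENAI_ENDPOINT"],
        "api_key": os.environ["AZURE_OPENAI_API_KEY"],
        "azure_deployment": "gpt-4o-mini",
    }

    query = "Weather now."
    response = "The temperature is 30 degrees."

    print(IntentResolutionEvaluator(model_config=model_config)(query=query, response=response))
    print(TaskAdherenceEvaluator(model_config=model_config)(query=query, response=response))

    # ToolCallAccuracyEvaluator additionally needs the tool definitions and the calls the agent
    # actually made (e.g., the location/time and weather API calls), so it is omitted here.
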
  25. Notebook: 41-first-evaluation-run.ipynb (AI-Assisted Evals). Zava Scenario: Before deploying Cora, we run a first evaluation to ensure her responses are relevant and safe, using a small test dataset and built-in quality and safety evaluators. Learning Objectives: • Configure relevance and safety (violence) evaluators. • Publish results to Azure AI Foundry and save locally. • Interpret relevance and safety metrics for improvement.
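
A minimal sketch of that batch run with azure.ai.evaluation.evaluate, publishing to the Foundry project and saving a local copy; file names and the project endpoint are placeholders:

    import os
    from azure.identity import DefaultAzureCredential
    from azure.ai.evaluation import evaluate, RelevanceEvaluator, ViolenceEvaluator

    model_config = {
        "azure_endpoint": os.environ["AZURE_OPENAI_ENDPOINT"],
        "api_key": os.environ["AZURE_OPENAI_API_KEY"],
        "azure_deployment": "gpt-4o-mini",
    }
    project = os.environ["AZURE_AI_PROJECT_ENDPOINT"]          # placeholder

    result = evaluate(
        data="cora_test_set.jsonl",                            # one {"query": ..., "response": ...} per line
        evaluators={
            "relevance": RelevanceEvaluator(model_config=model_config),
            "violence": ViolenceEvaluator(azure_ai_project=project,
                                          credential=DefaultAzureCredential()),
        },
        azure_ai_project=project,                              # publish the run to Azure AI Foundry
        output_path="./eval_results.json",                     # keep a local copy too
    )
    print(result["metrics"])
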
  26. Notebook: 44-evaluate-agents.ipynb (Built-in Evals). Zava Scenario: Evaluate Cora's multi-step agent behavior (intent, tool use, task adherence, and safety) using specialized Azure AI agent evaluators. Learning Objectives: • Use intent, tool call accuracy, and task adherence evaluators. • Combine quality and safety checks in agent evaluations. • Identify failure patterns to guide agent refinement.
  27. Notebook: 51-trace-cora-retail-agent.ipynb (Tracing & Telemetry). Zava Scenario: Instrument Cora with OpenTelemetry to trace agent reasoning, tool calls, and conversations, exporting telemetry to Application Insights for observability. Learning Objectives: • Create manual spans for session context. • Capture multi-turn chat, tool, and model spans. • Query and analyze traces for performance and reliability.
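
A minimal sketch of the tracing setup: route OpenTelemetry to Application Insights and wrap one conversation turn in manual spans (the connection-string variable and the span/attribute names are placeholders):

    import os
    from azure.monitor.opentelemetry import configure_azure_monitor
    from opentelemetry import trace

    # Send OpenTelemetry traces to the lab's Application Insights resource.
    configure_azure_monitor(connection_string=os.environ["APPLICATIONINSIGHTS_CONNECTION_STRING"])

    tracer = trace.get_tracer(__name__)

    with tracer.start_as_current_span("cora.session") as session_span:
        session_span.set_attribute("session.id", "demo-001")            # illustrative attribute
        with tracer.start_as_current_span("cora.chat_turn") as turn_span:
            turn_span.set_attribute("user.query", "Which paint should I buy?")
            # ... call the agent here; its tool and model calls show up as child spans ...
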
  28. Stage 4: Explore Insights, Deploy Model for Testing (repeats the lab roadmap from slide 5)
  29. Notebook: 60-deployment.ipynb (Deployments). Zava Scenario: Deploy Cora's fine-tuned model to Azure AI Foundry, verify it provisions successfully, and test production responses that follow Zava's tone guidelines. Learning Objectives: • Configure and submit an Azure AI Foundry deployment. • Monitor provisioning status to completion. • Send test prompts to validate behavior and tone.
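
Once the deployment is live, a minimal sketch of sending it a test prompt with the Azure OpenAI client; the deployment name and API version are placeholders:

    import os
    from openai import AzureOpenAI

    client = AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-10-21",
    )

    reply = client.chat.completions.create(
        model="cora-gpt-4o-mini-ft",                   # your fine-tuned deployment's name (placeholder)
        messages=[{"role": "user",
                   "content": "My ceiling paint looks patchy. What should I do?"}],
    )
    print(reply.choices[0].message.content)            # spot-check tone against Zava's guidelines
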
  30. Checkpoints: or, it’s ok if you accidentally overfit. A “checkpoint” is a deployable model created at the end of a training epoch. Using the UI or API you can select, review, and deploy checkpoints just like any other fine-tuned model. The last three checkpoints are automatically saved.
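
A minimal sketch of listing a job's saved checkpoints through the API so an earlier epoch can be deployed if the final one overfits; the job ID is a placeholder:

    import os
    from openai import AzureOpenAI

    client = AzureOpenAI(azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
                         api_key=os.environ["AZURE_OPENAI_API_KEY"],
                         api_version="2024-10-21")

    # Each checkpoint is itself a deployable fine-tuned model.
    for cp in client.fine_tuning.jobs.checkpoints.list("ftjob-..."):   # placeholder job ID
        print(cp.step_number, cp.fine_tuned_model_checkpoint, cp.metrics)
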
  31. Global Training: Fine-tune closer to your data and at lower cost (Generally Available). • 26 supported regions added. • Lower per-token training rates compared to Standard regional training. • Flexible data handling aligned with Azure’s privacy and residency policies. Supported models: GPT-4o, GPT-4o-mini, GPT-4.1, GPT-4.1-mini, GPT-4.1-nano.
  32. Developer Tier: Ship new models confidently (Generally Available). • Free 24-hour hosting for fine-tuned models. • Pay-per-token pricing with no upfront commitment. • Multi-model evaluation to compare outputs side-by-side. • Supports GPT-4.1 and GPT-4.1-mini from any training region. • Ideal for pre-production testing and model selection. • No SLAs or data residency guarantees – fast, flexible, and lightweight.
  33. Notebook: 60-monitoring-observability.ipynb (Observability). Zava Scenario: Monitor deployed agent performance and reliability using Azure AI Foundry portal metrics, Application Insights dashboards, and OpenTelemetry traces for end-to-end observability. Learning Objectives: • Key portal metrics. • App Insights. • Tracing with OpenTelemetry. • Frameworks supported: LangChain, LangGraph, Agent Framework, OpenAI Agents SDK.
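
Beyond the portal dashboards, the same telemetry can be queried programmatically; a minimal sketch using the azure-monitor-query SDK against the workspace backing Application Insights (the workspace ID and the KQL table/column names are placeholders to adjust to your schema):

    import os
    from datetime import timedelta
    from azure.identity import DefaultAzureCredential
    from azure.monitor.query import LogsQueryClient

    logs = LogsQueryClient(DefaultAzureCredential())

    response = logs.query_workspace(
        workspace_id=os.environ["LOG_ANALYTICS_WORKSPACE_ID"],     # placeholder
        query="AppDependencies | where TimeGenerated > ago(1h) | summarize count() by Name",
        timespan=timedelta(hours=1),
    )
    for table in response.tables:
        for row in table.rows:
            print(row)
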
  34. Alerts, Diagnostics & Customized Dashboards (Azure AI Foundry + Azure Monitor). Visualize, debug & alert: broad visibility, drill-in views, smart alerts, automated actions · Azure Dashboards, Workbooks, Grafana.
  35. Continuous Evaluation & Monitoring (Azure AI Foundry + Azure Monitor). Debug with tracing (agent execution flow, evaluation metrics, I/O) and monitor key metrics (performance, quality, safety, resource usage). Available for agents only.
  36. Wrap-up and Next Steps (PREL13) · Nitya Narasimhan, PhD, Senior AI Advocate @Microsoft · Bethany Jepchumba, AI Advocate, Microsoft
  37. Let’s Recap: What You Learned (E2E Development) – repeats the lab objectives from slide 6 (Scenario, Planning, Development, Optimization, Observability, Operationalization)
  38. Let’s Recap: What You Did (sandbox to keep learning) – repeats the lab roadmap from slide 5