Observe, optimize & protect your hosted agents in Microsoft Foundry

Nitya Narasimhan, PhD
Senior AI Advocate, Microsoft

Setup: Lab Dev Environment
- Launch Skillable VM: verify hosted agent deployment
- Launch GitHub Codespaces: configure local dev environment
- Launch GitHub Copilot: activate MCP servers and verify skills

Agents are non-deterministic, creating new reliability and consistency challenges for developers and operators.

Reliable AI agent development needs observability
- Trace: gain execution flow visibility
- Evaluate: assess quality and safety
- Monitor: detect issues in real time
- Optimize: improve agent performance
Understand and optimize every agent's health, cost, and behavior in real time.

Microsoft Foundry: the AI app and agent factory
Agent Service · Models · IQ · Tools · Machine Learning · Control Plane · Cloud · Edge
Governed agent lifecycle

Foundry Observability: end-to-end visibility, quality & control for production AI
Trace
- Tracing for any agent framework
- End-to-end agent traces (prompt → model → tool)
- OpenTelemetry standard
- Azure Monitor & Application Insights integration
Evaluate
- Evaluation for any agent framework
- Built-in evaluators: quality, RAG, safety, agent, and code-based metrics
- Rubric evaluators
- Multi-turn evaluation
- User simulation
- AI Red Teaming Agent (PyRIT-based)
- CI/CD integration
Monitor
- Real-time observability dashboard
- Token, latency, cost & error metrics
- Quality & safety scores in production
- Scheduled evals and red teaming for drift detection
- Azure Monitor alerts
Optimize
- Single-shot optimization
- Agent optimizer
- ROI for agents in Foundry
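
To make the Monitor metrics (token, latency, cost & error) concrete, here is a minimal sketch that rolls them up from per-request records. It is plain Python, not the Foundry or Azure Monitor API; the `RequestRecord` fields, the `summarize` helper, and the per-1K-token prices are hypothetical placeholders.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class RequestRecord:
    # Hypothetical per-request telemetry; real values would come from your traces.
    input_tokens: int
    output_tokens: int
    latency_ms: float
    error: bool


def summarize(records, price_in_per_1k, price_out_per_1k):
    """Roll up token, latency, cost and error metrics for a batch of requests."""
    if not records:
        raise ValueError("no records to summarize")
    total_in = sum(r.input_tokens for r in records)
    total_out = sum(r.output_tokens for r in records)
    latencies = sorted(r.latency_ms for r in records)
    # n=20 yields 19 cut points; index 18 is the 95th percentile.
    # "inclusive" keeps the estimate within the observed min..max.
    if len(latencies) > 1:
        p95 = quantiles(latencies, n=20, method="inclusive")[18]
    else:
        p95 = latencies[0]
    cost = total_in / 1000 * price_in_per_1k + total_out / 1000 * price_out_per_1k
    return {
        "requests": len(records),
        "total_tokens": total_in + total_out,
        "p95_latency_ms": p95,
        "cost": round(cost, 6),
        "error_rate": sum(r.error for r in records) / len(records),
    }


records = [
    RequestRecord(input_tokens=1000, output_tokens=200, latency_ms=850.0, error=False),
    RequestRecord(input_tokens=500, output_tokens=100, latency_ms=1200.0, error=True),
]
summary = summarize(records, price_in_per_1k=0.002, price_out_per_1k=0.008)
print(summary)
```

A dashboard would compute these over a sliding time window; the batch form above just shows which raw fields each metric needs.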

The Agent DevOps Lifecycle: evaluate, trace, monitor, and continuously improve
Get Started · Plan → Code → Test → Monitor → Analyze → Optimize

The Scenario: A Multi-Agent Travel Concierge
- Building a reliable multi-agent AI solution requires end-to-end observability.
- Maintaining the desired quality, cost, and latency in production as the environment changes requires continuous optimization.

The Challenge: Knowledge Gaps & Dev Experience
- Datasets: "I just deployed the working prototype to production. I have no data. How do I evaluate it?"
- Evaluations: "I have cost, quality & latency targets. How do I define and assess metrics for compliance?"
- Analysis: "Real-world use may uncover unexpected issues as the environment and requirements shift. How can I tell?"
- Optimization: "Analysis shows drift in eval results. How can I ensure continuous optimization?"
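
For the Evaluations question, one way to make "cost, quality & latency targets" checkable is to write them down as data and compare an eval summary against them. This is a minimal sketch; the `Target` type, metric names, and thresholds are hypothetical and are not a Foundry schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    metric: str
    threshold: float
    higher_is_better: bool  # True for quality scores, False for cost/latency


def check_compliance(summary, targets):
    """Return {metric: passed}; a metric missing from the summary counts as a failure."""
    results = {}
    for t in targets:
        value = summary.get(t.metric)
        if value is None:
            results[t.metric] = False
        elif t.higher_is_better:
            results[t.metric] = value >= t.threshold
        else:
            results[t.metric] = value <= t.threshold
    return results


# Hypothetical targets and eval summary for illustration only.
targets = [
    Target("task_completion", 0.85, higher_is_better=True),
    Target("p95_latency_ms", 2000.0, higher_is_better=False),
    Target("cost_per_request", 0.01, higher_is_better=False),
]
summary = {"task_completion": 0.80, "p95_latency_ms": 1450.0, "cost_per_request": 0.006}
print(check_compliance(summary, targets))
# {'task_completion': False, 'p95_latency_ms': True, 'cost_per_request': True}
```

Treating a missing metric as a failure is a deliberate choice: a target you cannot measure should not silently pass.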

The Workshop: Optimize Your Microsoft Foundry Agents Code-First with GitHub Copilot
- Modern AI agents often fail in ways traditional monitoring can't catch.
- Observe & optimize agents with the Microsoft Foundry Observability platform.
- Agent optimization loop in 60 minutes:
  - Lab 0: Setup · GitHub Codespaces & GitHub Copilot
  - Lab 1: Explore · Built-in Observability for Hosted Agents
  - Lab 2: Observe · Code-First Observability using Skills
  - Lab 3: Optimize · Hill-Climbing Iteratively using Skills

Developer Surfaces: azd for lifecycle, skill for the loop
We will see some of these in action during the lab.
- Primary loop orchestrator: Copilot (CLI) skill. Natural-language, multi-step workflow. Works in Copilot CLI, Claude Code, Cursor, and Codex via the Foundry Plugin. Covers hosted, non-hosted, and third-party (3P) agents.
- Visual companion: VS Code + Foundry Toolkit. Eval result panels, trace viewer, agent inspector. Renders what the skills produce; it is not a separate workflow.
- Monitoring + drill-down: Foundry Portal. Production health dashboards, continuous eval alerts, version comparison, trace drill-down. The outer-loop read surface.
- CI/CD scripting (hosted only): azd CLI. Deterministic imperative commands: azd ai agent deploy / optimize / invoke. Right for scripting and automation, but not the developer-facing UX.
Skill is the primary surface · azd is plumbing · Portal is where monitoring lives · VS Code + FTK is the visual layer.

Setup: GitHub Copilot
- Launch Skillable VM: verify hosted agent deployment
- Launch GitHub Codespaces: configure local dev environment
- Launch GitHub Copilot: activate MCP servers and verify skills

Get Started with Out-of-the-Box Observability
Get Started · Plan → Code → Test → Monitor → Analyze → Optimize

Tracing & Evals for Any Agent Framework (Announcement)
- Open Ecosystem Support: OTel-based tracing and evals extend to LangChain, LangGraph, the OpenAI SDK, and Microsoft Agent Framework, not just Foundry-hosted agents.
- Unified Trace Visibility: every agent step (tool calls, LLM invocations, handoffs) is captured in one trace view, regardless of which framework was used to build the agent.
- Eval Signal, Everywhere: run structured evals against traces from any framework and get consistent quality signals across your entire agent fleet.
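
OTel-based tracing records each agent step as a span that points to its parent span, which is what lets one trace view stitch together LLM calls, tool calls, and handoffs no matter which framework emitted them. The sketch below imitates only that parent/child bookkeeping with the standard library; it is not the OpenTelemetry SDK, and the `span` helper, span names, and attributes are hypothetical (the names are loosely styled after OpenTelemetry's GenAI conventions).

```python
import contextvars
import time
import uuid
from contextlib import contextmanager

# Finished spans; a real setup would export these to a tracing backend.
SPANS = []
_current = contextvars.ContextVar("current_span", default=None)


@contextmanager
def span(name, **attributes):
    """Record a span whose parent is whichever span is active when it starts."""
    parent = _current.get()
    record = {
        "id": uuid.uuid4().hex[:8],
        "parent": parent["id"] if parent else None,
        "name": name,
        "attributes": attributes,
        "start": time.perf_counter(),
    }
    token = _current.set(record)  # make this span the active parent
    try:
        yield record
    finally:
        record["duration_s"] = time.perf_counter() - record["start"]
        _current.reset(token)  # restore the previous active span
        SPANS.append(record)


# One user turn for a hypothetical travel agent: prompt -> model -> tool.
with span("invoke_agent travel_concierge", framework="any"):
    with span("chat", model="example-model", input_tokens=120):
        pass  # the LLM call would happen here
    with span("execute_tool search_flights", destination="LIS"):
        pass  # the tool call would happen here

for s in SPANS:
    print(s["name"], "parent:", s["parent"])
```

Because the parent link travels with the active context rather than with a particular framework, any code that opens spans this way lands in the same tree.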

Lab 1: Explore Hosted Agent
- Understand hosted agent structure: explore in Playground
- Understand tracing capabilities: explore in Playground
- Understand evaluation capabilities: explore in Playground

Fast-Forward to Production: The Full Agent DevOps Loop
Plan → Code → Test → Monitor → Analyze → Optimize

Code-First Observability for Foundry Agents
- Skill-based guided experience: observability is enabled automatically when you create a hosted Foundry agent. Run your first eval, then move straight into optimization.
- VS Code and GitHub Copilot Chat/CLI: analyze traces, run evals, compare results, and optimize directly in your IDE.

Lab: Activate Observe Skill
- Understand Foundry skills usage: kick off the "observe" skill
- Explore auto-generated datasets: for agent evaluation
- Explore the batch evaluation run: baseline metrics for the agent
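
Conceptually, a batch evaluation run applies a set of evaluators to every row of a dataset and averages the scores into a baseline. The sketch below shows that shape in plain Python; the dataset rows, the `run_agent` stub, and both toy evaluators are hypothetical stand-ins, not what the observe skill actually generates or runs.

```python
from statistics import mean

# Hypothetical evaluation rows; the observe skill would generate its own dataset.
DATASET = [
    {"query": "Find a hotel in Lisbon", "expected_keyword": "hotel"},
    {"query": "Book a flight to Tokyo", "expected_keyword": "flight"},
    {"query": "Suggest a day trip from Rome", "expected_keyword": "trip"},
]


def run_agent(query):
    # Stub standing in for invoking the deployed agent.
    if "hotel" in query:
        return "Here is a hotel option."
    return "Sorry, I can't help with that."


def keyword_evaluator(row, response):
    """Toy evaluator: 1.0 if the expected keyword appears in the response."""
    return 1.0 if row["expected_keyword"] in response.lower() else 0.0


def length_evaluator(row, response):
    """Toy evaluator: 1.0 if the response is under 200 characters."""
    return 1.0 if len(response) < 200 else 0.0


def batch_evaluate(dataset, evaluators):
    """Run every evaluator on every row and return the mean score per evaluator."""
    scores = {name: [] for name in evaluators}
    for row in dataset:
        response = run_agent(row["query"])
        for name, evaluate in evaluators.items():
            scores[name].append(evaluate(row, response))
    return {name: mean(values) for name, values in scores.items()}


baseline = batch_evaluate(DATASET, {"keyword": keyword_evaluator, "length": length_evaluator})
print(baseline)
```

Saving this baseline is what makes later comparisons meaningful: each optimized version is scored on the same rows with the same evaluators.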

Hill-Climb with Confidence with Foundry Optimizer
Plan → Code → Test → Monitor → Analyze → Optimize

Lab: Optimize Hosted Agent
- Explore failure analysis: the skill assesses metrics and finds gaps
- Explore the prompt optimizer: the skill fixes gaps and redeploys a new version
- Learn hill climbing: the skill compares v1 and v2 and suggests next steps
- Explore custom evaluators: the skill determines the need and builds an evaluator
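
Hill climbing, in its simplest form, keeps a candidate version only if it improves the target metric without regressing any other metric beyond a tolerance. Below is a minimal sketch of that accept/reject step; the metric names, scores, tolerance, and helper functions are hypothetical, and the Foundry optimizer may apply different criteria.

```python
def compare_versions(baseline, candidate, target_metric, tolerance=0.02):
    """Accept the candidate if the target metric improves and no other metric
    drops by more than `tolerance`. All scores are assumed higher-is-better."""
    improved = candidate[target_metric] > baseline[target_metric]
    regressions = {
        m: round(baseline[m] - candidate[m], 4)
        for m in baseline
        if m != target_metric and baseline[m] - candidate[m] > tolerance
    }
    return improved and not regressions, regressions


def hill_climb(versions, target_metric, tolerance=0.02):
    """Walk candidate versions in order, keeping the best accepted one."""
    best_name, best_scores = versions[0]
    for name, scores in versions[1:]:
        accept, _ = compare_versions(best_scores, scores, target_metric, tolerance)
        if accept:
            best_name, best_scores = name, scores
    return best_name


# Hypothetical eval results for three prompt versions.
v1 = {"task_adherence": 0.72, "groundedness": 0.90, "fluency": 0.95}
v2 = {"task_adherence": 0.81, "groundedness": 0.89, "fluency": 0.95}
v3 = {"task_adherence": 0.86, "groundedness": 0.80, "fluency": 0.96}
print(compare_versions(v1, v2, "task_adherence"))  # (True, {})
print(hill_climb([("v1", v1), ("v2", v2), ("v3", v3)], "task_adherence"))  # v2
```

Here v3 scores best on the target metric but is rejected because groundedness falls by 0.09; the tolerance is what turns "better on one number" into "better without breaking something else".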

Foundry Observability: end-to-end visibility, quality & control for production AI
Evaluate, monitor, trace, govern, and optimize:
- Build reliable agents
- Debug & optimize in production
- Gain fleet-wide visibility & control

New Evaluation Capabilities (Public Preview)
- Multi-Turn Evaluation: evaluate agent performance across full conversational flows, capturing context carryover, reasoning consistency, and end-to-end task success.
- User Simulation: automatically generate realistic multi-turn conversations and scenarios to evaluate how agents perform.
- Traces to Datasets: convert production traces into relevant, structured evaluation datasets to improve offline test coverage.
- Evals with Intelligent Trace Sampling: sample the most relevant traces for continuous online evaluation.
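
Traces to Datasets and Intelligent Trace Sampling both come down to choosing which production traces are worth evaluating. The heuristic below (always keep failed or slow traces, plus a deterministic hash-based fraction of the rest) is an assumption for illustration, not Foundry's sampling logic; the trace field names and helper functions are hypothetical.

```python
import hashlib


def keep_fraction(trace_id, rate):
    """Deterministic sampling: hash the trace id into [0, 1) and compare to `rate`."""
    digest = hashlib.sha256(trace_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < rate


def sample_traces(traces, slow_ms=3000.0, base_rate=0.1):
    """Always keep errors and slow traces; keep a fixed fraction of the rest."""
    return [
        t for t in traces
        if t["error"] or t["latency_ms"] >= slow_ms or keep_fraction(t["trace_id"], base_rate)
    ]


def to_dataset(traces):
    """Turn sampled traces into query/response rows for offline evaluation."""
    return [
        {"query": t["input"], "response": t["output"], "source_trace": t["trace_id"]}
        for t in traces
    ]


# Hypothetical production traces.
traces = [
    {"trace_id": "a1", "input": "Hotels in Paris?", "output": "Three options...", "latency_ms": 900.0, "error": False},
    {"trace_id": "b2", "input": "Cancel booking", "output": "", "latency_ms": 400.0, "error": True},
    {"trace_id": "c3", "input": "Trains to Lyon", "output": "Two routes...", "latency_ms": 4200.0, "error": False},
]
dataset = to_dataset(sample_traces(traces))
print([row["source_trace"] for row in dataset])
```

Hashing the trace id, rather than calling a random generator, means the same trace is always either in or out of the sample, which keeps re-runs reproducible.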

Foundry Evaluators
- Quality: Document Retrieval, Groundedness, Relevance, Response Completeness, Coherence, Fluency, Similarity, Task Completion, Customer Satisfaction, NLP Metrics (e.g., F1 Score), Quality Grader, Azure OpenAI Graders
- Risk & Safety: Indirect Attack, Jailbreaks, Hate and Unfairness, Sexual, Violence, Self-Harm, Protected Material, Ungrounded Attributes, Code Vulnerability, Prohibited Actions, Sensitive Data Leakage
- Agents: Intent Resolution, Tool Call Accuracy, Tool Selection, Tool Input Accuracy, Tool Output Utilization, Tool Call Success, Task Adherence
- Plus: Custom Evaluators and Rubric Evaluators
Four evaluators in the chart carry a Multi-Turn label.
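
Custom evaluators let you add code-based checks alongside the built-in agent evaluators. As a hedged illustration, here is a hypothetical recall-style tool-call check: the fraction of expected tool calls that the agent made with the same name and arguments. It is not the built-in Tool Call Accuracy evaluator (whose scoring method is not described here), and the call format is an assumption.

```python
def tool_call_accuracy(expected_calls, actual_calls):
    """Hypothetical code-based evaluator: fraction of expected tool calls matched by
    an actual call with the same tool name and identical arguments (order-insensitive).
    Extra, unexpected calls are not penalized."""
    if not expected_calls:
        return 1.0 if not actual_calls else 0.0
    remaining = list(actual_calls)
    matched = 0
    for exp in expected_calls:
        for i, act in enumerate(remaining):
            if act["name"] == exp["name"] and act["arguments"] == exp["arguments"]:
                matched += 1
                del remaining[i]  # each actual call can satisfy only one expected call
                break
    return matched / len(expected_calls)


expected = [
    {"name": "search_flights", "arguments": {"to": "LIS", "date": "2025-07-01"}},
    {"name": "search_hotels", "arguments": {"city": "Lisbon"}},
]
actual = [
    {"name": "search_hotels", "arguments": {"city": "Lisbon"}},
    {"name": "search_flights", "arguments": {"to": "LIS", "date": "2025-07-02"}},
]
print(tool_call_accuracy(expected, actual))  # 0.5
```

The flight search is counted as a miss because its date argument differs; a real evaluator might award partial credit for the correct tool with wrong arguments, which is the kind of judgment a rubric evaluator could encode.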

Keep Learning: What You Can Expect To See in Labs Post-Build
- This workshop is also available for self-paced exploration at home with your own Azure and GitHub Copilot subscriptions.
- Look for updates to the repo with new labs in July:
  - Eval Rubrics · Create custom evaluators that are adaptive
  - Trace Replays · Use a visual UI to build intuition & debug
  - Agent Optimizer · Continuous optimization in production
  - Assert · Build spec-driven evaluation harnesses for agents

Thank You

© Copyright Microsoft Corporation. All rights reserved.