

[AITour 26] Automate model selection with Azure AI Foundry Model Router

As 85% of enterprises embrace multi-model AI strategies, developers face growing complexity in selecting, deploying, and managing the right models for each task. Discover how the model router in Azure AI Foundry intelligently automates model selection based on performance, cost, and latency. This intelligent orchestration layer reduces operational overhead, accelerates deployment, and unlocks scalable AI operations across your organization. Say goodbye to model fatigue and let the router do the strategic lifting.

Location: Toronto
Date: Oct 1, 2025
Session: https://aitour.microsoft.com/flow/microsoft/toronto26/sessioncatalog/page/sessioncatalog/session/1755310350425001jaaD

Visit the Repo:
https://github.com/microsoft/aitour26-LTG153-automate-model-selection-and-ai-app-design-with-azure-ai-foundry

Join the Discord:
https://aka.ms/model-mondays/discord


Nitya Narasimhan, PhD

October 08, 2025


Transcript

  1. Automate model selection with Azure AI Foundry Model Router
     Nitya Narasimhan, PhD, Senior AI Advocate, Microsoft
  2. LLMs are transformational
     Transform natural language into:
     • Conversational Interfaces
     • Structured Data Extraction
     • Agent Orchestration
     • Task Automation
     • Rich Media Generation
     …and much more!
  3. LLMs are transformational
     So, let's just pick one for our new Zava enterprise scenario! Should be easy to do.
  4. LLMs are transformational
     So, let's just pick one for our new Zava enterprise scenario! Should be easy to do. Not quite…
  5. First steps
     Typical Approach: Organizations frequently default to a single LLM, often prioritizing either cost or quality.
     Problem: Your workflows and scenarios are made of numerous steps and tasks that sit on a quality-cost spectrum.
     Consequence: Significant costs that add up fast, because large, high-volume systems miss out on quality-cost triage.
  6. Additional challenges
     Demand for cost efficiencies, plus asks for:
     • Better governance
     • Both speed & quality
     • Faster model ops and GTM motions
     • Eliminating model overload
  7. The Solution
     Route each prompt to the most suitable model based on factors like complexity, cost, and performance. Smart routing optimizes cost while maintaining quality: dynamically select the best model and reduce expenses without degrading user experience.
     Your prompt <input> → prompt routing (router endpoint) → best response <output>: higher performance, reduced overall costs.
     For example:
     • Simple query: "How is the weather in Paris in May or June?"
     • Complex query: "Plan a 5-day itinerary from Paris to Venice for a vegan family of 10…"
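The quality-cost triage idea above can be sketched in a few lines. This is an illustrative toy only: the real model router uses trained intelligence over the prompt, not keyword heuristics, and the model names and complexity markers here are placeholder assumptions, not the router's actual mapping.

```python
# Toy sketch of complexity-based prompt routing. Purely illustrative:
# the Azure model router's routing logic and model mapping differ.

def pick_model(prompt: str) -> str:
    """Route short, simple prompts to a lightweight model and
    long or multi-constraint prompts to a premium model."""
    complex_markers = ("plan", "itinerary", "rules", "step-by-step")
    words = prompt.lower().split()
    if len(words) > 30 or any(marker in words for marker in complex_markers):
        return "premium-model"      # hypothetical stand-in, e.g. a gpt-5-class model
    return "lightweight-model"      # hypothetical stand-in, e.g. a nano/mini model

print(pick_model("How is the weather in Paris in May or June?"))
# lightweight-model
print(pick_model("Plan a 5-day itinerary from Paris to Venice for a vegan family of 10"))
# premium-model
```

The point of the sketch is the triage, not the heuristic: the router absorbs this decision so each query lands on a model matched to its complexity instead of a single default.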
  8. Model Router (Preview)
     Intelligent prompt routing across popular AI models from the GPT-4.1, o4-mini, and GPT-5 model families.
     • Optimize on quality, cost, and latency: use lightweight models for simpler tasks and premium models for complex queries.
     • Pick the best model based on task complexity and required quality.
     • Enterprise-grade reliability of Azure: built-in security, observability, and integrations.
  9. How it works
     Step 1: Users invoke the router endpoint with their prompt. Prompts of varying complexity are sent to a central router, for example:
     a. "How is the weather in Paris in May?" <Simple query>
     b. "Plan a 5-day itinerary from Paris to Venice for a vegan family of 10" <Complex query>
     c. "Develop a game with the following ten rules to guide the play:…" <Reasoning query>
     Step 2: The router assesses the input parameters (like prompt and tool use), performs intelligent analysis to map to the best-fit model, and dynamically routes to the optimal model (GPT-5-mini, GPT-5-nano, GPT-5 reasoning).
     Step 3: The endpoint returns the underlying model's response: a transparent, cost-effective optimal response <output> from the chosen model.
  10. How to use
     • Deploy: Model router is packaged as a single Azure AI Foundry model that you deploy.
     • Invoke: Use the chat completions API in the same way you'd use other OpenAI chat models.
     • Response: In the JSON response, the "model" field reveals which underlying model responded.
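The Response step above can be illustrated with a minimal sketch of inspecting which underlying model answered. The payload below is a hypothetical, truncated chat-completions response, not real router output; in practice it comes back from your chat completions call against the router deployment.

```python
import json

# Hypothetical, truncated chat-completions payload for illustration;
# a real response is returned by your model-router deployment.
sample_response = json.loads("""
{
  "model": "gpt-5-nano",
  "choices": [{"message": {"role": "assistant", "content": "Mild, around 18C."}}]
}
""")

def routed_model(response: dict) -> str:
    # The "model" field names the underlying model that actually answered,
    # not the router deployment you invoked.
    return response["model"]

print(routed_model(sample_response))  # gpt-5-nano
```

Logging this field per request is a simple way to see the router's quality-cost triage in action across your traffic.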
  11. Model Router Datasheet
     • Regions: East US2, Sweden Central
     • Deployment types: Global Standard
     • API(s): Chat Completions
     • Supported models: gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, o4-mini, gpt-5, gpt-5-mini, gpt-5-nano, gpt-5-chat
     • Pricing: To be effective starting October; delayed from the published September 1st date.
     • Context window: 200K tokens (the smallest underlying model from the collection).
     • Max output tokens: Depends on the underlying model.
     • Parameters: Parameters like `temperature` and `top_p` will be ignored if not supported by the underlying models.
     • Reasoning parameters: Model Router selects a `reasoning_effort` input value based on the complexity of the prompt.
     • Monitor performance: 1. Monitoring > Metrics page; 2. Filter by your model router deployment; 3. Split the metrics by underlying models.
     • Monitor cost: 1. Resource Management > Cost analysis page; 2. Filter by Resource; 3. Filter by Deployment name (tag).
  12. The Model Router Advantage
     Minimize AI costs while maintaining quality by matching queries to the most suitable models.
     For you: cost efficiencies, reduced maintenance complexity, streamlined ModelOps.
     Better for everyone: better outcomes, better customer experience & innovation.
     Visit: aka.ms/ModelRouter
  13. Feedback
     Your feedback is valuable. Please submit your thoughts about today's experiences at aka.ms/MicrosoftAITour/Survey, or scan the QR code to respond.