

[AITour 26] Automate model selection with Azure AI Foundry Model Router

As 85% of enterprises embrace multi-model AI strategies, developers face growing complexity in selecting, deploying, and managing the right models for each task. Discover how the model router in Azure AI Foundry intelligently automates model selection based on performance, cost, and latency. This intelligent orchestration layer reduces operational overhead, accelerates deployment, and unlocks scalable AI operations across your organization. Say goodbye to model fatigue and let the router do the strategic lifting.

Location: Toronto
Date: Oct 1, 2025
Session: https://aitour.microsoft.com/flow/microsoft/toronto26/sessioncatalog/page/sessioncatalog/session/1755310350425001jaaD

Visit the Repo:
https://github.com/microsoft/aitour26-LTG153-automate-model-selection-and-ai-app-design-with-azure-ai-foundry

Join the Discord:
https://aka.ms/model-mondays/discord


Nitya Narasimhan, PhD

October 08, 2025


Transcript

  1. Automate model selection with Azure AI Foundry Model Router
     Nitya Narasimhan, PhD, Senior AI Advocate, Microsoft
  2. LLMs are transformational
     Transform natural language into:
     • Conversational Interfaces
     • Structured Data Extraction
     • Agent Orchestration
     • Task Automation
     • Rich Media Generation
     …and much more!
  3. LLMs are transformational
     So, let's just pick one for our new Zava enterprise scenario! Should be easy to do.
  4. LLMs are transformational
     So, let's just pick one for our new Zava enterprise scenario! Should be easy to do. Not quite…
  5. First steps
     Typical Approach: Organizations frequently default to a single LLM, often prioritizing either cost or quality.
     Problem: Your workflows and scenarios are made of numerous steps and tasks that sit on a quality-cost spectrum.
     Consequence: Significant costs that add up fast, because large, high-volume systems miss out on quality-cost triage.
  6. Additional challenges
     Demand for cost efficiencies, plus asks for:
     • Better governance
     • Both speed & quality
     • Faster model ops and GTM motions
     • Eliminating model overload
  7. The Solution
     Route each prompt to the most suitable model based on factors like complexity, cost, and performance. Smart routing optimizes cost while maintaining quality: dynamically select the best model and reduce expenses without degrading user experience.
     Your prompt <input> → prompt routing (router endpoint) → best response <output>: higher performance, reduced overall costs.
     For example:
     • Simple query: "How is the weather in Paris in May or June?"
     • Complex query: "Plan a 5-day itinerary from Paris to Venice for a vegan family of 10…"
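The quality-cost triage idea above can be sketched in a few lines. This is an illustrative toy only: the real model router uses trained intelligence over the prompt, not keyword heuristics, and the model names and complexity markers here are placeholder assumptions, not the router's actual mapping.

```python
# Toy sketch of complexity-based prompt routing. Purely illustrative:
# the Azure model router's routing logic and model mapping differ.

def pick_model(prompt: str) -> str:
    """Route short, simple prompts to a lightweight model and
    long or multi-constraint prompts to a premium model."""
    complex_markers = ("plan", "itinerary", "rules", "step-by-step")
    words = prompt.lower().split()
    if len(words) > 30 or any(marker in words for marker in complex_markers):
        return "premium-model"      # hypothetical stand-in, e.g. a gpt-5-class model
    return "lightweight-model"      # hypothetical stand-in, e.g. a nano/mini model

print(pick_model("How is the weather in Paris in May or June?"))
# lightweight-model
print(pick_model("Plan a 5-day itinerary from Paris to Venice for a vegan family of 10"))
# premium-model
```

The point of the sketch is the triage, not the heuristic: the router absorbs this decision so each query lands on a model matched to its complexity instead of a single default.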
  8. Model Router (Preview)
     Intelligent prompt routing across popular AI models from the GPT-4.1, o4-mini, and GPT-5 model families.
     • Optimize on quality, cost, and latency: use lightweight models for simpler tasks and premium models for complex queries.
     • Pick the best model based on task complexity and required quality.
     • Enterprise-grade reliability of Azure: built-in security, observability, and integrations.
  9. How it works
     Step 1: Users invoke the router endpoint with their prompt. Prompts of varying complexity are sent to a central router, for example:
     a. "How is the weather in Paris in May?" <Simple query>
     b. "Plan a 5-day itinerary from Paris to Venice for a vegan family of 10" <Complex query>
     c. "Develop a game with the following ten rules to guide the play:…" <Reasoning query>
     Step 2: The router assesses the input parameters (like prompt and tool use), performs intelligent analysis to map to the best-fit model, and dynamically routes to the optimal model (GPT-5-mini, GPT-5-nano, GPT-5 reasoning).
     Step 3: The endpoint returns the underlying model's response: a transparent, cost-effective optimal response <output> from the chosen model.
  10. How to use
     • Deploy: Model router is packaged as a single Azure AI Foundry model that you deploy.
     • Invoke: Use the chat completions API in the same way you'd use other OpenAI chat models.
     • Response: In the JSON response, the "model" field reveals which underlying model responded.
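The Response step above can be illustrated with a minimal sketch of inspecting which underlying model answered. The payload below is a hypothetical, truncated chat-completions response, not real router output; in practice it comes back from your chat completions call against the router deployment.

```python
import json

# Hypothetical, truncated chat-completions payload for illustration;
# a real response is returned by your model-router deployment.
sample_response = json.loads("""
{
  "model": "gpt-5-nano",
  "choices": [{"message": {"role": "assistant", "content": "Mild, around 18C."}}]
}
""")

def routed_model(response: dict) -> str:
    # The "model" field names the underlying model that actually answered,
    # not the router deployment you invoked.
    return response["model"]

print(routed_model(sample_response))  # gpt-5-nano
```

Logging this field per request is a simple way to see the router's quality-cost triage in action across your traffic.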
  11. Model Router Datasheet
     • Regions: East US2, Sweden Central
     • Deployment types: Global Standard
     • API(s): Chat Completions
     • Supported models: gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, o4-mini, gpt-5, gpt-5-mini, gpt-5-nano, gpt-5-chat
     • Pricing: To be effective starting October; delayed from the published September 1st date.
     • Context window: 200K tokens (the smallest underlying model from the collection).
     • Max output tokens: Depends on the underlying model.
     • Parameters: Parameters like `temperature` and `top_p` will be ignored if not supported by the underlying models.
     • Reasoning parameters: Model Router selects a `reasoning_effort` input value based on the complexity of the prompt.
     • Monitor performance: 1. Monitoring > Metrics page; 2. Filter by your model router deployment; 3. Split the metrics by underlying models.
     • Monitor cost: 1. Resource Management > Cost analysis page; 2. Filter by Resource; 3. Filter by Deployment name (tag).
  12. The Model Router Advantage
     Minimize AI costs while maintaining quality by matching queries to the most suitable models.
     For you: cost efficiencies, reduced maintenance complexity, streamlined ModelOps.
     Better for everyone: better outcomes, better customer experience & innovation.
     Visit: aka.ms/ModelRouter
  13. Feedback
     Your feedback is valuable. Please submit your thoughts about today's experiences at aka.ms/MicrosoftAITour/Survey, or scan the QR code to respond.