Building Kinder and Cost-Effective Agents through fine tuning

by Bethany Jepchumba

Embed

Start on current slide

Slide 1

Slide 1 text

Part of the AgentCon World Tour by the Global AI Community AI Agents Developer Conference #AgentConNairobi

Slide 2

Slide 2 text

Made Possible by

Slide 3

Slide 3 text

BY BETHANY JEPCHUMBA Building Kinder and Cost-Effective Agents through fine tuning #AgentConNairobi

Slide 4

Slide 4 text

What we will cover today Unlock business value with fine-tuning Model customization is critical when you need to improve the quality, accuracy, and operation costs for your agentic AI apps Fine-tuning in Microsoft Foundry When is fine-tuning the right choice for me? Why is Microsoft Foundry the right platform to use? Fine-tuning for Agentic Applications Why do models struggle with multi-turn tool calling? How does Fine-Tuning improve agent performance? Summary Model Customization – unlocks business value for enterprise Microsoft Foundry - streamlines fine-tuning for AI developers Demo: Basic Fine-Tuning Understand supervised fine-tuning Demo: Custom Grader Write evaluators to test target criteria Demo: Agentic Fine-Tuning SFT + Distillation to improve tool calling Demo: Synthetic Data Get high-quality datasets with less effort

Slide 5

Slide 5 text

Unlock business value with model optimization

Slide 6

Slide 6 text

Why does Zava need model customization? Bruno Zhao Zava Customer Help me find the right product for DIY project. Who can I talk to? Make it helpful Customize Cora’s tone & response format to be polite, helpful & conversational Robin Counts Retail Store Manager Help me build loyalty. Can we ensure its responses are valid? Make it accurate Improve Cora’s use of tool selection, parameter propagation, policy adherence Kian Lambert App Dev Manager Help me save on operating costs. Can I teach a cheaper model this task? Make it cost-effective Distill Cora’s task-specific behaviors into smaller models without losing accuracy

Slide 7

Slide 7 text

The AI Engineer’s customization journey Fine-tune the LLM to: • Reduce the length of your prompt • Show not tell the model how to behave • Improve the accuracy when you look up information • Improve the model’s handling of retrieved data Prompt I’m painting my living room wall. What paint should I buy Tone and style Query Extraction Example responses Personalization Intent mapping Inventory retrieval User input Context engineering Desired Output Model adaptation Basic prompt engineering Retrieval/RAG Response I recommend our Eggshell Paint Would you like to know more about color choices

Slide 8

Slide 8 text

What does fine-tuning mean? LLM Fine-tuned LLM Fine-tuning refers to customizing a pre-trained LLM with additional training on a specific task or new dataset for enhanced performance, new skills, or improved accuracy

Slide 9

Slide 9 text

Why should Zava consider fine-tuning? Domain-specific optimization Task-specific optimization Reduced token consumption Efficient resource utilization Smaller models, faster response Shorter prompts, improve response Improve quality Reduce cost Reduce latency Example: Zava has a domain-specific focus (retail) and task-specific focus (question-answering). Let’s think about how fine-tuning can help optimization

Slide 10

Slide 10 text

Customization options in Microsoft Foundry

Slide 11

Slide 11 text

What are my fine-tuning options in Microsoft Foundry? Supervised fine-tuning Module learns from examples Ex: Content generation task Reinforcement fine-tuning Use grader to reinforce CoT Ex: Reasoning tasks Model distillation Transfer learning to cheaper model Optimize for COST Vision fine-tuning Preference fine-tuning Hybrid fine-tuning Improve image understanding Ex: ClassificationTask Provide good & bad examples Ex: Tone adaptation Improve model use of RAG context Optimize for PRECISION

Slide 12

Slide 12 text

Demo: Supervised fine-tuning in Zava Bruno Zhao Zava Customer Make it helpful Customize Cora’s tone & response format to be polite, helpful & conversational

Slide 13

Slide 13 text

How can I fine-tune my model to customize the tone? Decide vision and scope Choose base model Choose FT technique Pick enterprise-ready model options Dataset Fine-tuning Evaluation Deploy and monitor Regularly benchmark and iterate!

Slide 14

Slide 14 text

Demo: Fine-tuning for Agentic Applications Make it accurate Improve Cora’s use of tool selection, parameter propagation, policy adherence Make it cost-effective Distill Cora’s task-specific behaviors into smaller models without losing accuracy

Slide 15

Slide 15 text

New for Fine-tuning AI.Azure.com No Data? No problem! Synthesize training data from documents and code with synthetic data generation Public Preview Train for Less! 50% discount with Dev Training tier – and higher quota Public Preview Open Source Models Ministral, OSS-20B, Llama 3.3 70B, and Qwen3 32B – in the same UX Public Preview Agentic RFT Fine-tune for tool use in GPT-5 chain of thought reasoning Private Preview We’re making every aspect of fine-tuning better

Slide 16

Slide 16 text

Synthetic data generation Pre-defined recipes like Q&A and tool calling (for agents) Upload PDFs, docs, or code Our multi-agent framework creates high quality training data Public preview

Slide 17

Slide 17 text

Example: Distillation for Tool Calling Accuracy in Microsoft Foundry Objective Teach a best-in-class model to call the right tools to solve complex business problems Model: GPT 4.1-mini Technique: Supervised fine-tuning Data: Synthetic data generation Train: Foundry UI & Python SDK Evaluate: Foundry evals Deploy: DevTier Regularly benchmark and iterate!

Slide 18

Slide 18 text

Slide 19

Slide 19 text

Slide 20

Slide 20 text

Developer Training Tier Training can be expensive— especially for RFT models! DevTier training offers a 50% discount Jobs execute on pre-emptible capacity—think of it like spot VMs for training Public preview

Slide 21

Slide 21 text

Slide 22

Slide 22 text

Slide 23

Slide 23 text

Trainer 4. Trainer updates model weights to produce the best CoT Let me guess x = apple I need to subtract 5 x = -5 I need to subtract 1 x = 1 2. Model generates multiple samples 0 1 0.5 3. Grader assign samples a score between 0-1. E.g., 0.5 if output is a number, 0.5 if output is correct Grader What is x? x+5=0 1. Prompt sent to model Model Reinforcement Fine-Tuning (RFT) teaches the model to produce outputs that score highly on a learned reward metric. RFT can elicit more structured, goal-directed reasoning behaviors that are not directly obtainable through imitation alone Reinforcement Fine Tuning: Improve Model Reasoning

Slide 24

Slide 24 text

In Preview: Agentic RFT with GPT-5 Objective Teach a best-in-class model to call the right tools to solve complex business problems Model: GPT-5 Technique: Reinforcement fine-tuning Data: 10 manually curated examples Train: Foundry UI & Python SDK Evaluate: Foundry evals Deploy: DevTier Regularly benchmark and iterate! Sign up at aka.ms/agentic-rft-preview

Slide 25

Slide 25 text

“ Real world results from fine-tuning We consolidated three steps into one, response times that were previously five or six seconds came down to one and a half to two seconds on average. This approach made the system more efficient and the 50% reduction in latency made conversations with Discovery AI feel seamless. —Stuart Emslie, Head of Actuarial and Data Science

Slide 26

Slide 26 text

Summary & recap

Slide 27

Slide 27 text

Recap: Model customization unlocks business goals I’m painting my living room wall User input System prompt Few shot examples Add my data Grounded responses Prompt engineering + RAG Shorter prompts Supervised fine-tuning Smaller, cheaper models Distillation Better tool calling Good choice! I recommend our Eggshell Paint. Would you like to know more about color choices? LLM output SFT

Slide 28

Slide 28 text

Recap: Microsft Foundry makes fine-tuning seamless Model choice The best models from the best providers Choose serverless or managed compute Reliability 99.9% availability for Azure OpenAI models Latency guarantees with PTU-M Foundry platform Everything you need in one place: models, training, evaluation, deployments, and metrics Scalability Start with low cost DevTier to experiment Scale up with PTU-M for production workloads

Slide 29

Slide 29 text

aka.ms/MicrosoftAITour/BRK443 Download today’s presentation …or scan the QR code. Scan QR code to download

Slide 30

Slide 30 text

aka.ms/BestModelGenAISolution https://aka.ms/ft-demos Next steps to advance your AI expertise

Slide 31

Slide 31 text

Join the AI Conversation on Discord gaic.io/discord

Slide 32

Slide 32 text

Session Feedback

Slide 33

Slide 33 text

Join Microsoft Foundry Developer Community Microsoft Foundry Discord Let's Build the Future of AI Together