ABCS25: Azure OpenAI Unplugged: Real-World Lessons from My Latest GenAI Project by Dieter Gobeyn

Description
Thinking of using Azure OpenAI in production? In this session, I’ll share hard-earned lessons from my latest GenAI project, where I combined Azure OpenAI GPT-4 with data integration workflows. As both a solution architect and hands-on developer, I’ll walk you through what worked, what flopped, and what I wish I’d known earlier. We’ll dive into managing costs, securing data, reducing hallucinations, fine-tuning prompts, and keeping token usage in check. Whether you’re exploring AI or scaling it, this session is packed with tips to help you avoid pitfalls and build smarter, more efficient AI solutions.

Dieter Gobeyn
Enterprise Integration Architect & Azure Cloud Solutions Architect

Microsoft MVP (verified)

Dieter is a seasoned IT professional and Azure cloud expert, certified since 2012, with hands-on experience. Throughout his consulting career, he has specialized in crafting end-to-end enterprise architectures that prioritize clarity, simplicity, and scalability.

A certified TOGAF practitioner, Scrum Master, and Azure Architect, Dieter excels at designing clear and effective solutions to complex technical challenges. He has spent recent years working with international organizations, navigating the complexities of cross-border collaboration and fostering teamwork across multi-country setups.

As a passionate advocate for technology, Dieter shares his insights as a public speaker, drawing on his expertise in cloud solutions, enterprise integration, and the lessons learned from managing global teams in distributed environments.
[email protected]

linkedin.com/in/dietergobeyn/
azuretechinsider.com (blog)
@DieterGobeyn
azurewatcher.com (AzureWatcher)
integration.team (company)
bsky.app/profile/d... (BlueSky)

sessionize.com/dieter-gobeyn (public speaker profile)

More Decks by Azure Zurich User Group

Transcript

  1. Agenda
     • Introduction
     • My GenAI Journey
     • Quality Measurements
     • Model Evaluation
     • Cost Reduction Strategies
     • Azure OpenAI Limits & Security
     • JSON Integration
     • Q&A Session
  2. Dieter Gobeyn
     • Azure MVP – hands-on Azure Cloud Solutions Architect
     • Azure certified since 2012
     • Passionate about Cloud & Integration
     • Public speaker and blogger
     • Belgian-born, now based in London
     • Scuba diver – hiker – world traveller
  3. (image-only slide)

  4. A Community Project Built on Azure OpenAI
     1. Data Filtering: ignore typos, small rewordings, and URL tweaks.
     2. Classify Data: identify meaningful updates and provide clear summaries.
     3. Estimate Impact & Urgency: assess how changes affect users and implementations.
     4. Regroup or Reclassify by Feature: organize updates for easier navigation.
     5. Summarize Text: provide concise, structured content.
     6. Generate HTML: newsletter-ready HTML links.
     Keeping it cost-efficient without breaking the bank!
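The six stages above can be sketched as a small pipeline. This is a minimal Python sketch with the LLM-dependent stages stubbed by plain heuristics; all function names, thresholds, and heuristics here are illustrative assumptions, not from the actual project:

```python
# Compact sketch of the six stages; LLM-dependent parts are stubbed.

def filter_noise(updates, min_chars=20):
    """Stage 1: drop trivial edits (typos, URL tweaks) before spending tokens."""
    return [u for u in updates if len(u["diff"]) >= min_chars]

def classify(update):
    """Stage 2: tag the update; the real project calls the model here."""
    update["category"] = "breaking" if "retired" in update["diff"].lower() else "feature"
    return update

def estimate_impact(update):
    """Stage 3: toy urgency score derived from the classification."""
    update["urgency"] = "high" if update["category"] == "breaking" else "low"
    return update

def to_html(updates):
    """Stages 4-6 collapsed: group, summarise, and emit newsletter-ready links."""
    items = "".join(
        f'<li><a href="{u["url"]}">{u["title"]}</a> ({u["urgency"]})</li>'
        for u in updates
    )
    return f"<ul>{items}</ul>"

def run_pipeline(raw):
    kept = [estimate_impact(classify(u)) for u in filter_noise(raw)]
    return to_html(kept)
```

The design point is that each stage is small and independently testable, and only the stages that truly need an LLM would ever call one.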
  5. GenAI Output Quality vs. Development Effort
     (Chart: quality output and time spent, Weeks 1–9, on a 0–10 scale – a subjective learning curve.)
  6. Prompting Techniques
     • Master a New Skill – just like learning how to Google effectively, mastering prompting is essential.
     • Invest Time Wisely – spend a few hours upfront to get the basics right.
     • Know the Rules – read the OpenAI model guidelines to understand best practices and limitations.
     • Great references:
       • Prompt engineering techniques: Microsoft Learn
       • Advanced prompt techniques: https://www.promptingguide.ai/
  7. Quality Considerations
     GENAI IS NOT 100% ACCURATE – IS THAT GOOD ENOUGH FOR YOUR USE CASE?
     • AI is non-deterministic – some level of unpredictability is inherent.
     • Expect hallucinations and fabricated facts. These can be mitigated but not eliminated entirely.
     • My observation: reducing inaccuracies is an ongoing process, not a one-time fix.
     (My) Rule of Thumb
     • If you can write an algorithm instead of using an LLM, do it.
     • 100% consistent and cost-effective results.
     • Fewer total instructions → less confusion → higher-quality outputs.
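The "write an algorithm instead" rule applies directly to the data-filtering stage: detecting typos and small rewordings needs no LLM at all. A minimal sketch using Python's standard-library `difflib` (the threshold value is an illustrative assumption):

```python
import difflib

def is_trivial_change(old: str, new: str, threshold: float = 0.9) -> bool:
    # Deterministic, free, and 100% repeatable: no LLM call needed to decide
    # whether two versions of a text differ only by a typo or small reword.
    return difflib.SequenceMatcher(None, old, new).ratio() >= threshold
```

Unlike a model call, this costs nothing per invocation and always returns the same answer for the same input.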
  8. Practical Prompting Strategies
     Prompt Engineering: Writing Effective Prompts
     • Be Clear & Specific → avoid vague wording; make your request explicit.
     • Provide Context → more context ~ fewer hallucinations.
     • Refine Iteratively → adjust prompts to improve accuracy.
     Prompt Chaining: Structuring Multi-Step Tasks
     • Step-by-Step Breakdown → handle complex queries in (multiple) smaller steps.
     • Verification Points → insert checkpoints for self-validation.
     • Balance Accuracy vs. Cost → more steps improve results but increase cost.
     If everything else fails: Evaluator-Optimizer (~"are you sure?").
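The prompt-chaining pattern above (breakdown, then a verification checkpoint) can be sketched as plain functions; `ask` stands in for any single chat-completion call, and the prompts themselves are illustrative, not from the talk:

```python
def chained_summary(text, ask):
    # Prompt chaining sketch: `ask` is any callable that sends one prompt to
    # a model and returns the reply (stubbed in tests).
    facts = ask(f"List the key facts in: {text}")
    summary = ask(f"Summarise these facts in one sentence: {facts}")
    # Verification checkpoint (the "are you sure?" evaluator pattern):
    verdict = ask(f"Does this summary follow from the facts? Answer yes or no.\n"
                  f"Facts: {facts}\nSummary: {summary}")
    return summary if "yes" in verdict.lower() else facts
```

Each extra step is an extra model call, which is exactly the accuracy-versus-cost trade-off the slide describes.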
  9. Model Evaluation
     Assessing Model Performance & Quality
     • Trade off cost vs. quality (e.g., GPT-4 is ~15–20x more expensive than GPT-4o mini).
     • Standardize expectations with structured input/output testing – think unit tests in .NET, but for GenAI.
     • Azure frequently retires models – ensure adaptability and a migration plan.
     Evaluation Approaches
     • Custom unit tests for AI behaviour validation
     • Azure OpenAI Evaluation
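The "unit tests for GenAI" idea amounts to asserting a fixed contract on model replies. A minimal sketch (the schema and key names here are hypothetical, not from the project):

```python
import json

REQUIRED_KEYS = {"title", "summary", "urgency"}

def validate_reply(raw: str) -> dict:
    # Unit-test-style gate for model output: parse, check required keys,
    # and check value ranges before the reply is trusted downstream.
    data = json.loads(raw)  # raises on malformed output
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if data["urgency"] not in {"low", "medium", "high"}:
        raise ValueError("urgency out of range")
    return data
```

Running a gate like this against a fixed prompt set also gives you a regression suite for when Azure retires a model and you must migrate deployments.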
  10. Cost Reduction Strategies
     Optimizing Cost vs. Quality
     • Higher-cost models offer better accuracy and more up-to-date knowledge, but do you always need that?
     • Lower-cost models can be sufficient for many tasks, reducing expenses.
     Smart Cost Management
     • Mix & match models based on functionality.
     • Use simpler models for routine tasks and premium models only where needed.
     • Regularly evaluate model performance vs. cost to ensure efficiency.
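Mix-and-match in practice is just a routing decision made before the API call. A deliberately naive sketch; the deployment names and "hard task" markers are placeholder assumptions:

```python
def pick_deployment(task: str) -> str:
    # Hypothetical cost-aware router: routine work goes to the cheap model,
    # premium only when the task looks hard. Markers/names are illustrative.
    hard_markers = ("legal", "multi-step", "ambiguous")
    return "gpt-4o" if any(m in task.lower() for m in hard_markers) else "gpt-4o-mini"
```

In a real system the routing signal might be task type, input length, or a confidence score rather than keywords, but the shape is the same: the model choice is a per-request parameter, not a global constant.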
  11. Token Usage
     • Azure charges per token used.
     • On average, 1 token ≈ 4 English characters.
     • Fewer tokens = lower cost & faster responses.
     Optimisation Tips
     • Use clear, concise prompts to minimise token consumption.
     • Optimise outputs to avoid unnecessary verbosity.
     • Manage rate limits (see later).
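The ~4-characters-per-token rule of thumb gives a quick back-of-envelope cost estimator. This is an approximation only; for exact counts use a real tokenizer (such as tiktoken), and note that per-1K-token prices vary by model and region, so they are passed in as parameters:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic from the slide: 1 token ~= 4 English characters.
    return max(1, round(len(text) / 4))

def estimate_cost(prompt: str, completion: str,
                  in_per_1k: float, out_per_1k: float) -> float:
    # Input and output tokens are billed at different rates.
    return (estimate_tokens(prompt) / 1000 * in_per_1k
            + estimate_tokens(completion) / 1000 * out_per_1k)
```

For example, a 400-character prompt estimates to ~100 tokens; multiplying token counts by your current rates gives a per-call cost you can log and budget against.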
  12. (Advanced) Model Settings: Temperature
     • Controls the "creativity" or randomness of LLM output.
     • Higher values (e.g., 0.7) → more varied and imaginative responses.
     • Lower values (e.g., 0.2) → more deterministic, focused, and consistent.
     • (What temperature actually controls is the scaling of the token scores (logits) before sampling.)
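The "scaling of the scores" point can be made concrete: temperature divides the logits before the softmax, so low temperature sharpens the token distribution and high temperature flattens it. A self-contained illustration of that math:

```python
import math

def softmax_with_temperature(logits, temperature):
    # Temperature divides the logits before softmax: T < 1 sharpens the
    # distribution (more deterministic), T > 1 flattens it (more varied).
    scaled = [l / temperature for l in logits]
    m = max(scaled)                         # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

With logits [2.0, 1.0, 0.0], the top token gets ~99% of the probability at T=0.2 but only ~51% at T=2.0, which is exactly the deterministic-versus-varied behaviour the slide describes.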
  13. (Advanced) Model Settings: Top_p (Nucleus Sampling)
     • Restricts choices to the most probable tokens.
     • Top_p = 0.1 → only considers the smallest set of tokens whose cumulative probability reaches 10%.
     • Balances creativity with coherent, context-driven responses.
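Nucleus sampling keeps the smallest set of tokens whose cumulative probability reaches top_p, then renormalises and samples only from that set. A minimal illustration of the selection step:

```python
def nucleus_filter(probs, top_p):
    # Keep the smallest set of tokens whose cumulative probability >= top_p;
    # `probs` maps token -> probability and should sum to ~1.
    kept, cum = {}, 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        kept[tok] = p
        cum += p
        if cum >= top_p:
            break
    total = sum(kept.values())
    return {t: p / total for t, p in kept.items()}  # renormalise
```

Low top_p therefore trims the long tail of unlikely tokens, which is why it trades creativity for coherence much like low temperature does.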
  14. Rate Limiting
     • Azure enforces strict rate limits – it's not generous.
     • Two key constraints: requests per minute (RPM) & tokens per minute (TPM).
     • Regional quota documentation can be outdated – actual availability and quota limits may differ.
     • Quota increases require approval, but enterprise subscriptions start with higher limits.
     • Verify Azure OpenAI Service quotas & limits (in the portal) before scaling workloads.
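When a request exceeds RPM or TPM, the service replies with HTTP 429, so clients need retry logic. A minimal client-side sketch with exponential backoff; the shape of `request` (returning a status/body pair) is an assumption for illustration, and injecting `sleep` keeps it testable:

```python
import time

def call_with_backoff(request, max_retries=5, base_delay=1.0, sleep=time.sleep):
    # Client-side handling for 429 Too Many Requests: retry with
    # exponential backoff, capped at 30 seconds per wait.
    for attempt in range(max_retries):
        status, body = request()
        if status != 429:
            return body
        sleep(min(base_delay * 2 ** attempt, 30))
    raise RuntimeError("rate limit: retries exhausted")
```

In production you would prefer the `Retry-After` header when the service provides one rather than a fixed backoff schedule.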
  15. Batch Jobs
     • 50% discount on Global Standard pricing.
     • Asynchronous processing for grouped requests, with a dedicated quota.
     • 24-hour target turnaround (best effort, not guaranteed).
     Ideal for: large-scale inference workloads, cost-efficient processing, and non-urgent requests.
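Batch jobs are submitted as a JSONL file with one request object per line. A sketch of building such a file; the exact line shape shown (custom_id, method, url, body) follows my reading of the current Azure OpenAI Batch docs and should be verified against your API version before use:

```python
import json

def build_batch_jsonl(prompts, deployment):
    # One JSON object per line, in the shape the Azure OpenAI Batch API
    # expects (verify field names against your API version).
    lines = []
    for i, prompt in enumerate(prompts):
        lines.append(json.dumps({
            "custom_id": f"task-{i}",
            "method": "POST",
            "url": "/chat/completions",
            "body": {"model": deployment,
                     "messages": [{"role": "user", "content": prompt}]},
        }))
    return "\n".join(lines)
```

The `custom_id` is what lets you match asynchronous results back to the original requests once the batch completes.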
  16. Azure API Management as a Gateway for OpenAI
     • Single-Region APIM: routes traffic across regions.
     • Active-Active Load Balancing: uses VNet peering/private endpoints.
     • Fault Handling: removes throttled/unhealthy instances, retries requests.
     • Health Checks: marks the gateway unhealthy if no instances are available.
  17. Smart Load Balancing for OpenAI Endpoints and Azure API Management
     https://github.com/Azure-Samples/openai-apim-lb
  18. Azure OpenAI Token Limit Policy (APIM)
     Purpose: ensures fair & efficient API usage by managing token consumption per key.
     Key Features
     • Enforcement: blocks requests exceeding the token limit (429 Too Many Requests).
     • Real-Time Monitoring: uses OpenAI metrics for accurate tracking.
     • Prompt Token Precalculation: prevents unnecessary API calls.
     • Configurable Limits: set per-minute token caps per key.
     • Custom Headers: provides retry and remaining-token info.
     <azure-openai-token-limit counter-key="@(context.Request.IpAddress)"
         tokens-per-minute="5000"
         estimate-prompt-tokens="false"
         remaining-tokens-variable-name="remainingTokens" />
     https://learn.microsoft.com/en-us/azure/api-management/azure-openai-token-limit-policy
  19. Token Monitoring
     • Custom Workbook: high-level summary of Azure OpenAI resources across subscriptions, detailing network access patterns and regional distributions.
     • Monitoring Capabilities: in-depth metrics on HTTP requests, token usage, Provisioned Throughput Unit (PTU) utilization, and fine-tuning activities by model name, deployment, and region.
     • Insights Integration: through diagnostic settings.
     Demo workbook & demo standard
     https://techcommunity.microsoft.com/blog/fasttrackforazureblog/azure-openai-insights-monitoring-ai-with-confidence/4026850
  20. Introduction to JSON Mode in Azure OpenAI
     • Challenge: AI models don't guarantee perfectly structured JSON output → application crash.
     • Most AI models support JSON Mode → implement JSON Mode to enforce structured outputs.
     • The Azure OpenAI NuGet package lacks JSON Mode support → but Semantic Kernel does!
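Whichever SDK enables JSON Mode for you, it pays to parse defensively rather than crash on a malformed reply. A minimal sketch of that pattern; `retry` stands in for one corrective re-prompt of the model (stubbed here, not a real SDK call):

```python
import json

def parse_model_json(reply: str, retry):
    # Defensive parsing: even with JSON Mode enabled, validate before
    # trusting the output. `retry` re-asks the model once if parsing fails.
    try:
        return json.loads(reply)
    except json.JSONDecodeError:
        return json.loads(retry("The previous reply was not valid JSON. "
                                "Return ONLY valid JSON."))
```

If the retry also fails, the second `json.loads` raises, which is usually the right behaviour: fail loudly at the boundary instead of passing garbage downstream.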
  21. Semantic Kernel – What Is It?
     • Multi-platform & open-source.
     • Connectivity to AI services.
     • Custom "functions" to empower AI.
     • Integrated memory support.
     • Orchestrates AI to use available features.
     • Enterprise integration.
     • Consider using Semantic Kernel from the start.