apidays Paris 2024 - Azure API Management as a GenAI Gateway, Iheb Khemissi and Andrei Kamenev, Microsoft

Azure API Management as GenAI Gateway
Andrei Kamenev, Senior Product Manager, Microsoft
Iheb Khemissi, Cloud Solution Architect, Microsoft

GenAI development runs on APIs
Applications: Conversational Agents, Personalized Content, Content Generation, Chat on your Data, Voice Assistants, Your own Copilot
APIs: Azure AI Services, OpenAI, Mistral, LLaMa, Azure AI Search, Hugging Face, Cohere, and more!
But these APIs must be: Managed, Secured, Governed

Unmanaged APIs: A Hidden Risk in the Era of AI

Azure API Management enables AI APIs
Applications: Conversational Agents, Personalized Content, Content Generation, Chat on your Data, Voice Assistants, Your own Copilot
Benefits: Cost efficiency, High reliability, Robust security, Developer enablement, Enhanced governance
Native Azure integration: Defender for APIs, Policy, Monitor, … and more
APIs: Azure AI Services, OpenAI, Mistral, LLaMa, Azure AI Search, Hugging Face, Cohere, and more!

GenAI gateway capabilities in API Management
Capabilities: Token-based limiting, Token-based quotas, Load balancing, Semantic caching, Managed identity
Diagram: Intelligent Apps -> Azure API Management -> Azure OpenAI Endpoints

Azure OpenAI Token Metric policy
Collect token usage data
Facilitate accurate cross-charging based on token consumption
Diagram: API requests pass through Azure API Management to Azure OpenAI; token metrics are sent to Application Insights

policy.xml
<azure-openai-emit-token-metric namespace="AzureOpenAI">
    <dimension name="User ID" />
    <dimension name="Subscription ID" />
</azure-openai-emit-token-metric>

Load Balancer and Circuit Breaker
Distribute requests across PTU and Pay-as-you-go instances
Define load balancing pools to include multiple Azure OpenAI endpoints
Configure circuit breaker rules for successful failover
Diagram: Requests -> Azure API Management (load balancing) -> Backend #1 (Azure OpenAI, primary region/instance, takes priority) or, when Backend #1 is unavailable, Backend #2 (Azure OpenAI, another region/instance)
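A minimal sketch of how an API policy might route traffic to such a pool. It assumes a load-balanced backend pool has already been created in API Management under the hypothetical ID `openai-backend-pool`. The pool members, their priorities, and the circuit breaker rules are configured on the backend resources themselves, not in this policy.

```xml
<policies>
    <inbound>
        <base />
        <!-- "openai-backend-pool" is a hypothetical backend ID used for illustration.
             It is assumed to be a pool that groups the PTU and Pay-as-you-go
             Azure OpenAI backends. -->
        <set-backend-service backend-id="openai-backend-pool" />
    </inbound>
    <backend>
        <base />
    </backend>
    <outbound>
        <base />
    </outbound>
    <on-error>
        <base />
    </on-error>
</policies>
```

Keeping the failover rules on the backends means the API policy does not change when endpoints are added to or removed from the pool.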

Authentication and Authorization
Authenticate API consumers using subscription keys
Validate claims in JWT to manage access to OpenAI endpoints
Configure managed identity authentication
Diagram: Requests -> Azure API Management (Validate JWT policy) -> Azure OpenAI (Managed Identity)
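The two policy-based steps on this slide could be combined roughly as follows. This is a sketch only: the tenant ID, audience, and role value are placeholders to replace, and it assumes that callers present a Microsoft Entra ID token and that the API Management instance's managed identity has been granted access to the Azure OpenAI resource. Subscription key enforcement is an API or product setting, so it does not appear in the policy.

```xml
<inbound>
    <base />
    <!-- Reject requests whose JWT is missing, invalid, or lacks the required role.
         {tenant-id}, {api-audience}, and {allowed-role} are placeholders. -->
    <validate-jwt header-name="Authorization" failed-validation-httpcode="401" failed-validation-error-message="Unauthorized">
        <openid-config url="https://login.microsoftonline.com/{tenant-id}/v2.0/.well-known/openid-configuration" />
        <audiences>
            <audience>{api-audience}</audience>
        </audiences>
        <required-claims>
            <claim name="roles" match="any">
                <value>{allowed-role}</value>
            </claim>
        </required-claims>
    </validate-jwt>
    <!-- Call Azure OpenAI with the gateway's managed identity instead of an API key -->
    <authentication-managed-identity resource="https://cognitiveservices.azure.com" />
</inbound>
```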

Azure OpenAI Token Limit policy
Configure tokens per minute (TPM) limits based on counter keys
Define policy behavior for throttling
Diagram: Requests -> Azure API Management (Token limit policy) -> Azure OpenAI

<azure-openai-token-limit counter-key="@(context.Subscription.Id)"
    token-quota="20000"
    token-quota-period="Daily"
    tokens-per-minute="1000"
    estimate-prompt-tokens="false" />

Azure OpenAI Semantic Caching policy
Configure semantic caching for all API consumers
Define similarity score threshold for caching
Diagram: API requests -> Azure API Management -> Azure OpenAI; cache lookup uses Azure Cache for Redis and an Azure OpenAI embeddings model

policy.xml
<azure-openai-semantic-cache-lookup score-threshold="0.05" embeddings-backend-id="azure-openai-backend">
    <vary-by>@(context.Subscription.Id)</vary-by>
</azure-openai-semantic-cache-lookup>
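The slide shows only the lookup half. A lookup normally needs a matching store step in the outbound section so that responses are written to the cache. A minimal sketch, reusing the slide's lookup settings and assuming an illustrative cache duration of 60 seconds:

```xml
<policies>
    <inbound>
        <base />
        <!-- Lookup settings as shown on the slide; "azure-openai-backend" is the
             backend ID the slide uses for the embeddings deployment. -->
        <azure-openai-semantic-cache-lookup score-threshold="0.05" embeddings-backend-id="azure-openai-backend">
            <vary-by>@(context.Subscription.Id)</vary-by>
        </azure-openai-semantic-cache-lookup>
    </inbound>
    <outbound>
        <base />
        <!-- Store the completion for later lookups; the duration is in seconds
             and 60 is an assumed value, not one from the talk. -->
        <azure-openai-semantic-cache-store duration="60" />
    </outbound>
</policies>
```

Partitioning with `vary-by` on the subscription ID keeps one consumer's cached completions from being served to another.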

Azure AI Model Inference API Support
GenAI Gateway capabilities: Token-based limiting, Token usage metrics, Load Balancing, Authentication, Semantic Caching
Diagram: Intelligent Apps -> Azure API Management (GenAI Gateway) -> Azure AI Model Inference API

Support for GPT-4o in API Management
Support for text and image-based input across all GenAI Gateway capabilities
Support for prompt tokens estimation

Demo: Runtime governance for AI APIs

Learn more about API Management and GenAI
Documentation
GenAI gateway labs
GenAI AI Hub accelerator
GenAI gateway accelerator
