apidays
December 22, 2024

apidays Paris 2024 - Azure API Management as a GenAI Gateway, Iheb Khemissi and Andrei Kamenev, Microsoft

Azure API Management as a GenAI Gateway
Iheb Khemissi, Cloud Solutions Architect at Microsoft
Andrei Kamenev, Product Manager, Azure API Management, Cloud Architect at Microsoft

apidays Paris 2024 - The Future API Stack for Mass Innovation
December 3 - 5, 2024

------

Check out our conferences at https://www.apidays.global/

Do you want to sponsor or talk at one of our conferences?
https://apidays.typeform.com/to/ILJeAaV8

Learn more on APIscene, the global media made by the community for the community:
https://www.apiscene.io

Explore the API ecosystem with the API Landscape:
https://apilandscape.apiscene.io/

Transcript

  1. Azure API Management as GenAI Gateway

    Andrei Kamenev, Senior Product Manager, Microsoft
    Iheb Khemissi, Cloud Solution Architect, Microsoft
  2. GenAI development runs on APIs

    Conversational Agents, Personalized Content, Content Generation, Chat on your Data, Voice Assistants, Your own Copilot
    But these APIs must be Managed, Secured, Governed
    Azure AI Services, OpenAI, Mistral, LLaMa, Azure AI Search, Hugging Face, Cohere and more!
  3. Azure API Management enables AI APIs

    Conversational Agents, Personalized Content, Content Generation, Chat on your Data, Voice Assistants, Your own Copilot
    Cost efficiency, High reliability, Robust security, Developer enablement, Enhanced governance
    Native Azure integration: Defender for APIs, Policy, Monitor … and more
    Azure AI Services, OpenAI, Mistral, LLaMa, Azure AI Search, Hugging Face, Cohere and more!
  4. GenAI gateway capabilities in API Management

    Token-based limiting, Token-based quotas, Load balancing, Semantic caching, Managed identity
    Diagram: Intelligent Apps, Azure API Management, Azure OpenAI Endpoints
  5. Azure OpenAI Token Metric policy

    Collect token usage data
    Facilitate accurate cross-charging based on token consumption
    Diagram: Requests, Azure API Management, Azure OpenAI API, Application Insights Metrics

    policy.xml:
    <azure-openai-emit-token-metric namespace="AzureOpenAI">
        <dimension name="User ID" />
        <dimension name="Subscription ID" />
    </azure-openai-emit-token-metric>
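The cross-charging idea on this slide can be illustrated with a small Python sketch. It assumes token usage has already been exported as records carrying the same dimensions the policy emits; the record layout and the helper names `aggregate_tokens` and `charge_shares` are hypothetical, not part of API Management or Application Insights.

```python
from collections import defaultdict

def aggregate_tokens(records, dimension):
    """Sum prompt, completion, and total tokens per value of one dimension."""
    totals = defaultdict(lambda: {"prompt": 0, "completion": 0, "total": 0})
    for record in records:
        bucket = totals[record[dimension]]
        bucket["prompt"] += record["prompt_tokens"]
        bucket["completion"] += record["completion_tokens"]
        bucket["total"] += record["prompt_tokens"] + record["completion_tokens"]
    return dict(totals)

def charge_shares(totals):
    """Split the bill proportionally to the total tokens each consumer used."""
    grand_total = sum(t["total"] for t in totals.values())
    if grand_total == 0:
        return {key: 0.0 for key in totals}
    return {key: t["total"] / grand_total for key, t in totals.items()}

# Hypothetical usage records (illustrative values only).
records = [
    {"subscription_id": "team-a", "user_id": "u1", "prompt_tokens": 300, "completion_tokens": 100},
    {"subscription_id": "team-b", "user_id": "u2", "prompt_tokens": 500, "completion_tokens": 100},
    {"subscription_id": "team-a", "user_id": "u3", "prompt_tokens": 150, "completion_tokens": 50},
]

by_subscription = aggregate_tokens(records, "subscription_id")
shares = charge_shares(by_subscription)
```

Grouping by `"user_id"` instead of `"subscription_id"` gives a per-user breakdown from the same records.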
  6. Load Balancer and Circuit Breaker

    Distribute requests across PTU and Pay-as-you-go instances
    Define load balancing pools to include multiple Azure OpenAI endpoints
    Configure circuit breaker rules for successful failover
    Diagram: Requests, Azure API Management, Load Balancing, Priority, Backend #1, Backend #2, Primary region/instance, Another region/instance, Azure OpenAI, Azure OpenAI Unavailable
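A minimal Python sketch of priority-based failover with a circuit breaker, as described on this slide. It is a simplified model, not how API Management implements backend pools; the `Backend` class, the `send` function, and the threshold values are assumptions made for illustration.

```python
import time

class Backend:
    """One endpoint in a pool, with a simple failure-count circuit breaker."""

    def __init__(self, name, priority, failure_threshold=3, trip_seconds=30.0):
        self.name = name
        self.priority = priority              # lower number is tried first
        self.failure_threshold = failure_threshold
        self.trip_seconds = trip_seconds
        self.failures = 0
        self.open_until = 0.0                 # backend is skipped until this time

    def available(self, now):
        return now >= self.open_until

    def record_success(self):
        self.failures = 0

    def record_failure(self, now):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            # Trip the breaker: stop sending traffic for trip_seconds.
            self.open_until = now + self.trip_seconds
            self.failures = 0

def send(backends, call, now=None):
    """Try available backends in priority order, failing over on errors."""
    now = time.monotonic() if now is None else now
    for backend in sorted(backends, key=lambda b: b.priority):
        if not backend.available(now):
            continue
        try:
            result = call(backend)
        except ConnectionError:
            backend.record_failure(now)
            continue
        backend.record_success()
        return backend.name, result
    raise RuntimeError("no backend available")

# Simulated outage of the preferred (PTU) instance.
ptu = Backend("ptu", priority=1, failure_threshold=2, trip_seconds=30.0)
paygo = Backend("paygo", priority=2)
attempts = []

def call(backend):
    attempts.append(backend.name)
    if backend.name == "ptu":
        raise ConnectionError("simulated outage")
    return "ok"

served = [send([ptu, paygo], call, now=t)[0] for t in (0.0, 1.0, 2.0)]
```

In this run the PTU backend is tried twice, trips its breaker, and is skipped on the third request while the pay-as-you-go backend keeps serving traffic.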
  7. Authentication and Authorization

    Configure managed identity authentication
    Validate claims in JWT to manage access to OpenAI endpoints
    Authenticate API consumers using subscription keys
    Diagram: Requests, Azure API Management, Validate JWT policy, Managed Identity, Azure OpenAI
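The claim validation step can be sketched in Python as follows. The sketch assumes the token signature, issuer, and expiry were already verified and only models the claim check; the function names and the example claim values are hypothetical.

```python
def claim_matches(claims, name, allowed_values, match="all"):
    """Check one claim; the token may carry a single value or a list of values."""
    present = claims.get(name)
    if present is None:
        return False
    if isinstance(present, (list, tuple, set)):
        present_values = set(present)
    else:
        present_values = {present}
    hits = [value in present_values for value in allowed_values]
    return all(hits) if match == "all" else any(hits)

def authorize(claims, required):
    """required maps claim name -> (allowed values, 'all' or 'any')."""
    return all(
        claim_matches(claims, name, values, match)
        for name, (values, match) in required.items()
    )

# Claims from an already-verified token (illustrative values only).
claims = {
    "aud": "api://genai-gateway",
    "roles": ["OpenAI.Chat", "OpenAI.Embeddings"],
}

any_role = {
    "aud": (["api://genai-gateway"], "all"),
    "roles": (["OpenAI.Chat", "OpenAI.Admin"], "any"),
}
all_roles = {
    "aud": (["api://genai-gateway"], "all"),
    "roles": (["OpenAI.Chat", "OpenAI.Admin"], "all"),
}

allowed_any = authorize(claims, any_role)
allowed_all = authorize(claims, all_roles)
```

The caller passes the `any` rule because it holds one of the listed roles, and fails the `all` rule because it lacks the second role.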
  8. Azure OpenAI Token Limit policy

    Configure tokens per minute (TPM) limits based on counter keys
    Define policy behavior for throttling
    Diagram: Requests, Azure API Management, Token limit policy, Azure OpenAI

    <azure-openai-token-limit
        counter-key="@(context.Subscription.Id)"
        token-quota="20000"
        token-quota-period="Daily"
        tokens-per-minute="1000"
        estimate-prompt-tokens="false" />
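To make the tokens-per-minute idea concrete, here is a minimal Python sketch of a fixed-window token counter keyed by a counter key such as the subscription ID. It is a simplification for illustration and does not claim to reproduce the algorithm API Management uses; the `TokenLimiter` class and its return shape are assumptions.

```python
class TokenLimiter:
    """Fixed-window tokens-per-minute limiter, one window per counter key."""

    WINDOW_SECONDS = 60

    def __init__(self, tokens_per_minute):
        self.tokens_per_minute = tokens_per_minute
        self.windows = {}  # counter key -> (window start time, tokens used)

    def consume(self, key, tokens, now):
        """Return (allowed, retry_after_seconds) for a request costing `tokens`."""
        start, used = self.windows.get(key, (now, 0))
        if now - start >= self.WINDOW_SECONDS:
            start, used = now, 0  # the previous window expired; start a new one
        if used + tokens > self.tokens_per_minute:
            self.windows[key] = (start, used)
            return False, self.WINDOW_SECONDS - (now - start)
        self.windows[key] = (start, used + tokens)
        return True, 0

limiter = TokenLimiter(tokens_per_minute=1000)
first = limiter.consume("sub-1", 600, now=0)
second = limiter.consume("sub-1", 500, now=10)       # would exceed 1000 in this window
other_key = limiter.consume("sub-2", 500, now=10)    # separate counter key
after_reset = limiter.consume("sub-1", 500, now=60)  # new window for sub-1
```

A throttled caller gets back the number of seconds until its window resets, which a gateway would typically surface as a retry hint alongside an HTTP 429 response.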
  9. Azure OpenAI Semantic Caching policy

    Configure semantic caching for all API consumers
    Define similarity score threshold for caching
    Diagram: Requests, Azure API Management, Cache-lookup, Azure Cache for Redis, Azure OpenAI Embeddings model, Azure OpenAI API

    policy.xml:
    <azure-openai-semantic-cache-lookup score-threshold="0.05" embeddings-backend-id="azure-openai-backend">
        <vary-by>@(context.Subscription.Id)</vary-by>
    </azure-openai-semantic-cache-lookup>
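A semantic cache lookup can be sketched in Python as a nearest-neighbour search over prompt embeddings. The sketch assumes the score threshold is a maximum cosine distance, so lower values demand closer matches; that reading, the `SemanticCache` class, and the toy two-dimensional vectors are assumptions. In the setup on the slide, embeddings come from an Azure OpenAI embeddings model and entries live in Azure Cache for Redis.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

class SemanticCache:
    def __init__(self, score_threshold):
        self.score_threshold = score_threshold
        self.entries = {}  # partition key -> list of (embedding, response)

    def store(self, partition, embedding, response):
        self.entries.setdefault(partition, []).append((embedding, response))

    def lookup(self, partition, embedding):
        """Return the closest cached response within the threshold, else None."""
        best_response = None
        best_distance = self.score_threshold
        for cached_embedding, response in self.entries.get(partition, []):
            distance = cosine_distance(embedding, cached_embedding)
            if distance <= best_distance:
                best_response, best_distance = response, distance
        return best_response

cache = SemanticCache(score_threshold=0.05)
cache.store("sub-1", [1.0, 0.0], "cached answer")

hit = cache.lookup("sub-1", [0.99, 0.05])    # nearly the same direction
miss = cache.lookup("sub-1", [0.0, 1.0])     # unrelated prompt
isolated = cache.lookup("sub-2", [1.0, 0.0]) # other partition, like vary-by
```

Partitioning by subscription mirrors the `vary-by` element: a cached response is never served to a different subscription, even for an identical prompt.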
  10. Azure AI Model Inference API Support

    GenAI Gateway: Token-based limiting, Token usage metrics, Load Balancing, Authentication, Semantic Caching
    Diagram: Intelligent Apps, Azure API Management, Azure AI Model Inference API
  11. Support for GPT-4o in API Management

    Support for text and image-based input across all GenAI Gateway capabilities
    Support for prompt tokens estimation
  12. Learn more about API Management and GenAI

    Documentation
    GenAI gateway labs
    GenAI AI Hub accelerator
    GenAI gateway accelerator