Skip to content

APIM ❀️ AI - This repo contains experiments on Azure API Management's AI capabilities, integrating with Azure OpenAI, AI Foundry, and much more πŸš€ . New workshop experience at https://aka.ms/ai-gateway/workshop

License

Notifications You must be signed in to change notification settings

Azure-Samples/AI-Gateway

Repository files navigation

πŸ§ͺ AI Gateway labs

Open Source Love

What's new ✨

βž• AI Gateway workshop provides a comprehensive learning experience using the Azure Portal

workshop

βž• Refactor most of the labs to use the new LLM built-in logging that supports streaming completions.
βž• Realtime API (Audio and Text) with Azure OpenAI πŸ”₯ experiments with the AOAI Realtime
βž• Realtime API (Audio and Text) with Azure OpenAI + MCP tools πŸ”₯ experiments with the AOAI Realtime + MCP
βž• Model Context Protocol (MCP) βš™οΈ experiments with the client authorization flow
βž• the FinOps Framework lab to manage AI budgets effectively πŸ’°
βž• Agentic ✨ experiments with Model Context Protocol (MCP).
βž• Agentic ✨ experiments with OpenAI Agents SDK.
βž• Agentic ✨ experiments with AI Agent Service from Azure AI Foundry.

Contents

  1. 🧠 AI Gateway
  2. πŸ§ͺ Labs with AI Agents
  3. πŸ§ͺ Labs with the Inference API
  4. πŸ§ͺ Labs based on Azure OpenAI
  5. πŸš€ Getting started
  6. πŸ”¨ Supporting tools
  7. πŸ›οΈ Well-Architected Framework
  8. πŸ₯‡ Other Resources

The rapid pace of AI advances demands experimentation-driven approaches for organizations to remain at the forefront of the industry. With AI steadily becoming a game-changer for an array of sectors, maintaining a fast-paced innovation trajectory is crucial for businesses aiming to leverage its full potential.

AI services are predominantly accessed via APIs, underscoring the essential need for a robust and efficient API management strategy. This strategy is instrumental for maintaining control and governance over the consumption of AI models, data and tools.

With the expanding horizons of AI services and their seamless integration with APIs, there is a considerable demand for a comprehensive AI Gateway pattern, which broadens the core principles of API management. Aiming to accelerate the experimentation of advanced use cases and pave the road for further innovation in this rapidly evolving field. The well-architected principles of the AI Gateway provides a framework for the confident deployment of Intelligent Apps into production.

🧠 AI Gateway

AI-Gateway flow

This repo explores the AI Gateway pattern through a series of experimental labs. The AI Gateway capabilities of Azure API Management plays a crucial role within these labs, handling AI services APIs, with security, reliability, performance, overall operational efficiency and cost controls. The primary focus is on Azure AI Foundry models, which sets the standard reference for Large Language Models (LLM). However, the same principles and design patterns could potentially be applied to any third party model.

Acknowledging the rising dominance of Python, particularly in the realm of AI, along with the powerful experimental capabilities of Jupyter notebooks, the following labs are structured around Jupyter notebooks, with step-by-step instructions with Python scripts, Bicep files and Azure API Management policies:

πŸ§ͺ Labs with AI Agents

Playground to experiment the Model Context Protocol with the client authorization flow. In this flow, Azure API Management act both as an OAuth client connecting to the Microsoft Entra ID authorization server and as an OAuth authorization server for the MCP client (MCP inspector in this lab).

flow

🦾 Bicep βž• βš™οΈ Policy βž• 🧾 Notebook

Playground to experiment the Model Context Protocol with Azure API Management to enable plug & play of tools to LLMs. Leverages the credential manager for managing OAuth 2.0 tokens to backend tools and client token validation to ensure end-to-end authentication and authorization.

flow

🦾 Bicep βž• βš™οΈ Policy βž• 🧾 Notebook

Playground to try the OpenAI Agents with Azure OpenAI models and API based tools controlled by Azure API Management.

flow

🦾 Bicep βž• βš™οΈ Policy βž• 🧾 Notebook

Use this playground to explore the Azure AI Agent Service, leveraging Azure API Management to control multiple services, including Azure OpenAI models, Logic Apps Workflows, and OpenAPI-based APIs.

flow

🦾 Bicep βž• βš™οΈ Policy βž• 🧾 Notebook

Playground to try the OpenAI function calling feature with an Azure Functions API that is also managed by Azure API Management.

flow

🦾 Bicep βž• βš™οΈ Policy βž• 🧾 Notebook

πŸ§ͺ Labs with the Inference API

Playground to try the Deepseek R1 model via the AI Model Inference from Azure AI Foundry. This lab uses the Azure AI Model Inference API and two APIM LLM policies: llm-token-limit and llm-emit-token-metric.

flow

🦾 Bicep βž• βš™οΈ Policy βž• 🧾 Notebook

Playground to try the self-hosted Phi-3 Small Language Model (SLM) through the Azure API Management self-hosted gateway with OpenAI API compatibility.

flow

🦾 Bicep βž• βš™οΈ Policy βž• 🧾 Notebook

πŸ§ͺ Labs based on Azure OpenAI

This playground leverages the FinOps Framework and Azure API Management to control AI costs. It uses the token limit policy for each product and integrates Azure Monitor alerts with Logic Apps to automatically disable APIM subscriptions that exceed cost quotas.

flow

🦾 Bicep βž• βš™οΈ Policy βž• 🧾 Notebook

Playground to try the built-in load balancing backend pool functionality of Azure API Management to either a list of Azure OpenAI endpoints or mock servers.

flow

🦾 Bicep βž• βš™οΈ Policy βž• 🧾 Notebook

Playground to try the token rate limiting policy to one or more Azure OpenAI endpoints. When the token usage is exceeded, the caller receives a 429.

flow

🦾 Bicep βž• βš™οΈ Policy βž• 🧾 Notebook

Playground to try the emit token metric policy. The policy sends metrics to Application Insights about consumption of large language model tokens through Azure OpenAI Service APIs.

flow

🦾 Bicep βž• βš™οΈ Policy βž• 🧾 Notebook

Playground to try the semantic caching policy. Uses vector proximity of the prompt to previous requests and a specified similarity score threshold.

flow

🦾 Bicep βž• βš™οΈ Policy βž• 🧾 Notebook

Playground to try the OAuth 2.0 authorization feature using identity provider to enable more fine-grained access to OpenAPI APIs by particular users or client.

flow

🦾 Bicep βž• βš™οΈ Policy βž• 🧾 Notebook

Playground to create a combination of several policies in an iterative approach. We start with load balancing, then progressively add token emitting, rate limiting, and, eventually, semantic caching. Each of these sets of policies is derived from other labs in this repo.

flow

🦾 Bicep βž• βš™οΈ Policy βž• 🧾 Notebook

Playground to try routing to a backend based on Azure OpenAI model and version.

flow

🦾 Bicep βž• βš™οΈ Policy βž• 🧾 Notebook

Playground to try the Retrieval Augmented Generation (RAG) pattern with Azure AI Search, Azure OpenAI embeddings and Azure OpenAI completions.

flow

🦾 Bicep βž• βš™οΈ Policy βž• 🧾 Notebook

Playground to try the buil-in LLM logging capabilities of Azure API Management. Logs requests into Azure Monitor to track details and token usage.

flow

🦾 Bicep βž• βš™οΈ Policy βž• 🧾 Notebook

Playground to test storing message details into Cosmos DB through the LLM Logging to event hub.

flow

🦾 Bicep βž• βš™οΈ Policy βž• 🧾 Notebook

Playground to try the content safety policy. The policy enforces content safety checks on any LLM prompts by transmitting them to the Azure AI Content Safety service before sending to the backend LLM API.

flow

🦾 Bicep βž• βš™οΈ Policy βž• 🧾 Notebook

Backlog of Labs

This is a list of potential future labs to be developed.

  • Third party models: Google Gemini, AWS Bedrock
  • Semantic Kernel with Agents
  • Logic Apps RAG
  • PII handling

πŸš€ Getting Started

Prerequisites

Quickstart

  1. Clone this repo and configure your local machine with the prerequisites. Or just create a GitHub Codespace and run it on the browser or in VS Code.
  2. Navigate through the available labs and select one that best suits your needs. For starters we recommend the token rate limiting.
  3. Open the notebook and run the provided steps.
  4. Tailor the experiment according to your requirements. If you wish to contribute to our collective work, we would appreciate your submission of a pull request.

Note

πŸͺ² Please feel free to open a new issue if you find something that should be fixed or enhanced.

πŸ”¨ Supporting Tools

  • Tracing - Invoke OpenAI API with trace enabled and returns the tracing information.
  • Streaming - Invoke OpenAI API with stream enabled and returns response in chunks.
  • AI-Gateway Mock server is designed to mimic the behavior and responses of the OpenAI API, thereby creating an efficient simulation environment suitable for testing and development purposes on the integration with Azure API Management and other use cases. The app.py can be customized to tailor the Mock server to specific use cases.

πŸ›οΈ Well-Architected Framework

The Azure Well-Architected Framework is a design framework that can improve the quality of a workload. The following table maps labs with the Well-Architected Framework pillars to set you up for success through architectural experimentation.

Lab Security Reliability Performance Operations Costs
Access controlling ⭐
Backend pool load balancing ⭐ ⭐ ⭐
Semantic caching ⭐ ⭐
Token rate limiting ⭐ ⭐
Built-in LLM logging ⭐
FinOps framework ⭐ ⭐

πŸ₯‡ Other resources

We believe that there may be valuable content that we are currently unaware of. We would greatly appreciate any suggestions or recommendations to enhance this list.

Disclaimer

Important

This software is provided for demonstration purposes only. It is not intended to be relied upon for any purpose. The creators of this software make no representations or warranties of any kind, express or implied, about the completeness, accuracy, reliability, suitability or availability with respect to the software or the information, products, services, or related graphics contained in the software for any purpose. Any reliance you place on such information is therefore strictly at your own risk.

About

APIM ❀️ AI - This repo contains experiments on Azure API Management's AI capabilities, integrating with Azure OpenAI, AI Foundry, and much more πŸš€ . New workshop experience at https://aka.ms/ai-gateway/workshop

Topics

Resources

License

Code of conduct

Stars

Watchers

Forks