APIM ❤️ OpenAI - 🧪 Labs for the GenAI Gateway capabilities of Azure API Management

What's new ✨

➕ the Zero-to-Production lab with an iterative policy exploration to fine-tune the optimal production configuration.
➕ the Terraform flavor of backend pool load balancing lab.
➕ the AI Foundry SDK lab.
➕ the Content filtering and Prompt shielding labs.
➕ the Model routing lab with OpenAI model based routing.
➕ the Prompt flow lab to try the Azure AI Studio Prompt Flow with Azure API Management.
➕ priority and weight parameters to the Backend pool load balancing lab.
➕ the Streaming tool to test OpenAI streaming with Azure API Management.
➕ the Tracing tool to debug and troubleshoot OpenAI APIs using Azure API Management tracing capability.
➕ image processing to the GPT-4o inferencing lab.
➕ the Function calling lab with a sample API on Azure Functions.

The rapid pace of AI advances demands experimentation-driven approaches for organizations to remain at the forefront of the industry. With AI steadily becoming a game-changer for an array of sectors, maintaining a fast-paced innovation trajectory is crucial for businesses aiming to leverage its full potential.

AI services are predominantly accessed via APIs, underscoring the essential need for a robust and efficient API management strategy. This strategy is instrumental for maintaining control and governance over the consumption of AI services.

With the expanding horizons of AI services and their seamless integration with APIs, there is a considerable demand for a comprehensive AI Gateway pattern, which broadens the core principles of API management. Aiming to accelerate the experimentation of advanced use cases and pave the road for further innovation in this rapidly evolving field. The well-architected principles of the AI Gateway provides a framework for the confident deployment of Intelligent Apps into production.

🧠 GenAI Gateway

This repo explores the AI Gateway pattern through a series of experimental labs. The GenAI Gateway capabilities of Azure API Management plays a crucial role within these labs, handling AI services APIs, with security, reliability, performance, overall operational efficiency and cost controls. The primary focus is on Azure OpenAI, which sets the standard reference for Large Language Models (LLM). However, the same principles and design patterns could potentially be applied to any LLM.

🧪 Labs

Acknowledging the rising dominance of Python, particularly in the realm of AI, along with the powerful experimental capabilities of Jupyter notebooks, the following labs are structured around Jupyter notebooks, with step-by-step instructions with Python scripts, Bicep files and Azure API Management policies:

These labs are currently recommended after which to model your workloads.

🧪 Backend pool load balancing - Available with Bicep and Terraform

Playground to try the built-in load balancing backend pool functionality of Azure API Management to either a list of Azure OpenAI endpoints or mock servers.

Name		Name	Last commit message	Last commit date
Latest commit History 257 Commits
.devcontainer		.devcontainer
.github		.github
.vscode		.vscode
images		images
labs		labs
modules		modules
shared		shared
tools		tools
.editorconfig		.editorconfig
.gitignore		.gitignore
.markdownlint.yaml		.markdownlint.yaml
AI-GATEWAY.md		AI-GATEWAY.md
AI-GATEWAY.pptx		AI-GATEWAY.pptx
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE.md		LICENSE.md
README.md		README.md
pyrightconfig.json		pyrightconfig.json
requirements.txt		requirements.txt

Lab	Security	Reliability	Performance	Operations	Costs
Request forwarding	⭐
Backend circuit breaking	⭐	⭐
Backend pool load balancing	⭐	⭐	⭐
Advanced load balancing	⭐	⭐	⭐
Response streaming	⭐		⭐
Vector searching	⭐	⭐	⭐
Built-in logging	⭐	⭐	⭐	⭐	⭐
SLM self-hosting	⭐		⭐

License

Azure-Samples/AI-Gateway

Folders and files

Latest commit

History

Repository files navigation

APIM ❤️ OpenAI - 🧪 Labs for the GenAI Gateway capabilities of Azure API Management

What's new ✨

Contents

🧠 GenAI Gateway

🧪 Labs

🧪 Backend pool load balancing - Available with Bicep and Terraform

🧪 SLM self-hosting (phy-3)

🧪 Load balancing with policy expressions (consider to use backend pool load balancing instead)

Backlog of Labs

🚀 Getting Started

Prerequisites

Quickstart

🔨 Tools

🏛️ Well-Architected Framework

🎒 Show and tell

🥇 Other resources

🌐 WW GBB initiative

Disclaimer

About

Topics

Resources

License

Code of conduct

Stars

Watchers

Forks

Languages