mirror of
https://github.com/microsoft/agent-framework.git
synced 2026-06-16 21:04:09 +08:00
5e056b672e
* Python: Provider-leading client design & OpenAI package extraction Major refactoring of the Python Agent Framework client architecture: - Extract OpenAI clients into new `agent-framework-openai` package - Core package no longer depends on openai, azure-identity, azure-ai-projects - Rename clients for discoverability: OpenAIResponsesClient → OpenAIChatClient, OpenAIChatClient → OpenAIChatCompletionClient - Unify `model_id`/`deployment_name`/`model_deployment_name` → `model` param - New FoundryChatClient for Azure AI Foundry Responses API - New FoundryAgent/FoundryAgentClient for connecting to pre-configured Foundry agents - Remove OpenAIBase/OpenAIConfigMixin from non-deprecated client MRO - Deprecate AzureOpenAI* clients, AzureAIClient, OpenAIAssistantsClient - Reorganize samples: azure_openai+azure_ai+azure_ai_agent → azure/ - ADR-0020: Provider-Leading Client Design Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: missing Agent imports in samples, .model_id → .model in foundry_local sample Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: CI failures — mypy errors, coverage targets, sample imports - azure-ai mypy: add type ignores for TypedDict total=, model arg, forward ref - Coverage: replace core.azure/openai targets with openai package target - project_provider: add type annotation for opts dict Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: populate openai .pyi stub, fix broken README links, coverage targets Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fixes * updated observabilitty * reset azure init.pyi * fix errors * updated adr number * fix foundry local * fixed not renamed docstrings and comments, and added deprecated markers to old classes * fix tests and pyprojects * fix test vars * updated function tests * update durable * updated test setup for functions * Fix Foundry auth in workflow samples Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Stabilize Python integration workflows Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Update hosting samples for Foundry Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Trigger full CI rerun Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Trigger CI rerun again Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * trigger rerun * trigger rerun * fix for litellm * undo durabletask changes * Move Foundry APIs into foundry namespace Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix Foundry pyproject formatting Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Split provider samples by Foundry surface Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Restore hosting sample requirements Also fix the Foundry Local sample link after the provider sample move. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * updated tests * udpated foundry integration tests * removed dist from azurefunctions tests * Use separate Foundry clients for concurrent agents Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix client setup in azfunc and durable * disabled two tests * updated setup for some function and durable tests * improved azure openai setup with new clients * ignore deprecated * fixes * skip 11 * remove openai assistants int tests --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
256 lines
8.9 KiB
Python
256 lines
8.9 KiB
Python
# /// script
|
||
# requires-python = ">=3.10"
|
||
# dependencies = [
|
||
# "agent-framework",
|
||
# "tenacity",
|
||
# ]
|
||
# ///
|
||
# Run with any PEP 723 compatible runner, e.g.:
|
||
# uv run samples/02-agents/auto_retry.py
|
||
|
||
# Copyright (c) Microsoft. All rights reserved.
|
||
|
||
import asyncio
|
||
import logging
|
||
from collections.abc import Awaitable, Callable
|
||
from typing import Any, TypeVar, cast
|
||
|
||
from agent_framework import Agent, ChatContext, ChatMiddleware, SupportsChatGetResponse, chat_middleware
|
||
from agent_framework.foundry import FoundryChatClient
|
||
from azure.identity import AzureCliCredential
|
||
from dotenv import load_dotenv
|
||
from openai import RateLimitError
|
||
from tenacity import (
|
||
AsyncRetrying,
|
||
before_sleep_log,
|
||
retry,
|
||
retry_if_exception_type,
|
||
stop_after_attempt,
|
||
wait_exponential,
|
||
)
|
||
|
||
# Load environment variables from .env file
|
||
load_dotenv()
|
||
|
||
"""
|
||
Auto-Retry Rate Limiting Sample
|
||
|
||
Every model inference API enforces rate limits, so production agents need retry logic
|
||
to handle 429 responses gracefully. This sample shows two ways to add automatic retry
|
||
using the `tenacity` library, keeping your application code free of boilerplate.
|
||
|
||
Approach 1 – Class decorator
|
||
Apply a class decorator to any client type implementing
|
||
SupportsChatGetResponse. The decorator patches get_response() with retry
|
||
behavior. Non-streaming responses are retried; streaming is returned as-is
|
||
(streaming retry requires more delicate handling).
|
||
|
||
Approach 2 – Chat middleware
|
||
Register middleware on the agent that catches RateLimitError raised inside
|
||
call_next() and retries the entire request pipeline. Two styles are shown:
|
||
a) Class-based middleware (ChatMiddleware subclass)
|
||
b) Function-based middleware (@chat_middleware decorator)
|
||
|
||
Both approaches use the same tenacity primitives:
|
||
- stop_after_attempt – cap the total number of tries
|
||
- wait_exponential – exponential back-off between retries
|
||
- retry_if_exception_type(RateLimitError) – only retry on 429 errors
|
||
- before_sleep_log – log each retry attempt at WARNING level
|
||
"""
|
||
|
||
logger = logging.getLogger(__name__)
|
||
|
||
RETRY_ATTEMPTS = 3
|
||
|
||
# =============================================================================
|
||
# Approach 1: Class decorator
|
||
# =============================================================================
|
||
|
||
|
||
ChatClientT = TypeVar("ChatClientT", bound=SupportsChatGetResponse[Any])
|
||
|
||
|
||
def with_rate_limit_retry(*, retry_attempts: int = RETRY_ATTEMPTS) -> Callable[[type[ChatClientT]], type[ChatClientT]]:
|
||
"""Class decorator that adds non-streaming retry behavior to get_response()."""
|
||
|
||
def decorator(client_cls: type[ChatClientT]) -> type[ChatClientT]:
|
||
original_get_response = client_cls.get_response
|
||
|
||
def get_response_with_retry(self, *args, **kwargs): # type: ignore[no-untyped-def]
|
||
stream = kwargs.get("stream", False)
|
||
|
||
if stream:
|
||
# Streaming retry is more complex; fall back to the original behaviour.
|
||
return original_get_response(self, *args, **kwargs)
|
||
|
||
async def _with_retry():
|
||
async for attempt in AsyncRetrying(
|
||
stop=stop_after_attempt(retry_attempts),
|
||
wait=wait_exponential(multiplier=1, min=4, max=10),
|
||
retry=retry_if_exception_type(RateLimitError),
|
||
reraise=True,
|
||
before_sleep=before_sleep_log(logger, logging.WARNING),
|
||
):
|
||
with attempt:
|
||
return await original_get_response(self, *args, **kwargs)
|
||
return None
|
||
|
||
return _with_retry()
|
||
|
||
client_cls.get_response = cast(Any, get_response_with_retry)
|
||
return client_cls
|
||
|
||
return decorator
|
||
|
||
|
||
@with_rate_limit_retry()
|
||
class RetryingFoundryChatClient(FoundryChatClient):
|
||
"""Azure OpenAI Chat client with class-decorator-based retry behavior."""
|
||
|
||
|
||
# =============================================================================
|
||
# Approach 2a: Class-based chat middleware
|
||
# =============================================================================
|
||
|
||
|
||
class RateLimitRetryMiddleware(ChatMiddleware):
|
||
"""Chat middleware that retries a single model-call pipeline on rate limit errors.
|
||
|
||
Register this middleware on an agent (or at the run level) to automatically
|
||
retry any chat-model call that raises RateLimitError. In tool-loop scenarios,
|
||
the middleware applies independently to each inner model call.
|
||
"""
|
||
|
||
def __init__(self, *, max_attempts: int = RETRY_ATTEMPTS) -> None:
|
||
"""Initialize with the maximum number of retry attempts."""
|
||
self.max_attempts = max_attempts
|
||
|
||
async def process(
|
||
self,
|
||
context: ChatContext,
|
||
call_next: Callable[[], Awaitable[None]],
|
||
) -> None:
|
||
"""Retry call_next() on rate limit errors with exponential back-off."""
|
||
async for attempt in AsyncRetrying(
|
||
stop=stop_after_attempt(self.max_attempts),
|
||
wait=wait_exponential(multiplier=1, min=4, max=10),
|
||
retry=retry_if_exception_type(RateLimitError),
|
||
reraise=True,
|
||
before_sleep=before_sleep_log(logger, logging.WARNING),
|
||
):
|
||
with attempt:
|
||
await call_next()
|
||
|
||
|
||
# =============================================================================
|
||
# Approach 2b: Function-based chat middleware
|
||
# =============================================================================
|
||
|
||
|
||
@chat_middleware
|
||
async def rate_limit_retry_middleware(
|
||
context: ChatContext,
|
||
call_next: Callable[[], Awaitable[None]],
|
||
) -> None:
|
||
"""Function-based chat middleware that retries on rate limit errors.
|
||
|
||
Wrap call_next() with a tenacity @retry decorator so any RateLimitError
|
||
raised during a single model call triggers an automatic retry with exponential
|
||
back-off. In tool-loop scenarios, the middleware applies independently to
|
||
each inner model call.
|
||
"""
|
||
|
||
@retry(
|
||
stop=stop_after_attempt(RETRY_ATTEMPTS),
|
||
wait=wait_exponential(multiplier=1, min=4, max=10),
|
||
retry=retry_if_exception_type(RateLimitError),
|
||
reraise=True,
|
||
before_sleep=before_sleep_log(logger, logging.WARNING),
|
||
)
|
||
async def _call_next_with_retry() -> None:
|
||
await call_next()
|
||
|
||
await _call_next_with_retry()
|
||
|
||
|
||
# =============================================================================
|
||
# Demo
|
||
# =============================================================================
|
||
|
||
|
||
async def class_decorator_example() -> None:
|
||
"""Demonstrate Approach 1: class decorator on a chat client type."""
|
||
print("\n" + "=" * 60)
|
||
print("Approach 1: Class decorator (applied to client type)")
|
||
print("=" * 60)
|
||
|
||
# For authentication, run `az login` command in terminal or replace
|
||
# AzureCliCredential with your preferred authentication option.
|
||
agent = Agent(
|
||
client=RetryingFoundryChatClient(credential=AzureCliCredential()),
|
||
instructions="You are a helpful assistant.",
|
||
)
|
||
|
||
query = "Say hello!"
|
||
print(f"User: {query}")
|
||
result = await agent.run(query)
|
||
print(f"Agent: {result.text}")
|
||
|
||
|
||
async def class_based_middleware_example() -> None:
|
||
"""Demonstrate Approach 2a: class-based chat middleware."""
|
||
print("\n" + "=" * 60)
|
||
print("Approach 2a: Class-based chat middleware")
|
||
print("=" * 60)
|
||
|
||
# For authentication, run `az login` command in terminal or replace
|
||
# AzureCliCredential with your preferred authentication option.
|
||
agent = Agent(
|
||
client=FoundryChatClient(credential=AzureCliCredential()),
|
||
instructions="You are a helpful assistant.",
|
||
middleware=[RateLimitRetryMiddleware(max_attempts=3)],
|
||
)
|
||
|
||
query = "Say hello!"
|
||
print(f"User: {query}")
|
||
result = await agent.run(query)
|
||
print(f"Agent: {result.text}")
|
||
|
||
|
||
async def function_based_middleware_example() -> None:
|
||
"""Demonstrate Approach 2b: function-based chat middleware."""
|
||
print("\n" + "=" * 60)
|
||
print("Approach 2b: Function-based chat middleware")
|
||
print("=" * 60)
|
||
|
||
# For authentication, run `az login` command in terminal or replace
|
||
# AzureCliCredential with your preferred authentication option.
|
||
agent = Agent(
|
||
client=FoundryChatClient(credential=AzureCliCredential()),
|
||
instructions="You are a helpful assistant.",
|
||
middleware=[rate_limit_retry_middleware],
|
||
)
|
||
|
||
query = "Say hello!"
|
||
print(f"User: {query}")
|
||
result = await agent.run(query)
|
||
print(f"Agent: {result.text}")
|
||
|
||
|
||
async def main() -> None:
|
||
"""Run all auto-retry examples."""
|
||
print("=== Auto-Retry Rate Limiting Sample ===")
|
||
print(
|
||
"Demonstrates two approaches for automatic retry on rate limit (429) errors.\n"
|
||
"Set AZURE_OPENAI_ENDPOINT and FOUNDRY_MODEL (and optionally\n"
|
||
"AZURE_OPENAI_API_KEY) before running, or populate a .env file."
|
||
)
|
||
|
||
await class_decorator_example()
|
||
await class_based_middleware_example()
|
||
await function_based_middleware_example()
|
||
|
||
|
||
if __name__ == "__main__":
|
||
asyncio.run(main())
|