Python: [BREAKING] update context provider APIs, middleware, and per-service-call history persistence (#4992)

* Rename provider base APIs Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Allow provider-added chat and function middleware Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Simulate service-stored history per model call Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix typing regressions in CI Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix response ID suppression review feedback Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Rename per-service-call history persistence APIs Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Address context persistence review feedback Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Stabilize markdown sample docs Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Persist service continuation state per call Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-16 21:04:09 +08:00 · 2026-04-01 18:13:11 +02:00
parent 38de991481
commit b065a4ce51
37 changed files with 1836 additions and 396 deletions
@@ -9,6 +9,7 @@ This folder contains examples for direct chat client usage patterns.
 | [`built_in_chat_clients.py`](built_in_chat_clients.py) | Consolidated sample for built-in chat clients. Uses `get_client()` to create the selected client and pass it to `main()`. |
 | [`chat_response_cancellation.py`](chat_response_cancellation.py) | Demonstrates how to cancel chat responses during streaming, showing proper cancellation handling and cleanup. |
 | [`custom_chat_client.py`](custom_chat_client.py) | Demonstrates how to create custom chat clients by extending the `BaseChatClient` class. Shows a `EchoingChatClient` implementation and how to integrate it with `Agent` using the `as_agent()` method. |
+| [`require_per_service_call_history_persistence.py`](require_per_service_call_history_persistence.py) | Compares two otherwise identical `FoundryChatClient` agents with `store=False`; the only difference is whether `require_per_service_call_history_persistence` is enabled, and only the run without it stores the synthesized tool result when middleware terminates the loop early. |

 ## Selecting a built-in client

@@ -35,6 +36,15 @@ Example:
 uv run samples/02-agents/chat_client/built_in_chat_clients.py
 ```

+The `require_per_service_call_history_persistence.py` sample uses `FoundryChatClient`, so set the usual Foundry settings first and sign in with the Azure CLI:
+
+```bash
+export FOUNDRY_PROJECT_ENDPOINT="https://<your-project>.services.ai.azure.com/api/projects/<project-name>"
+export FOUNDRY_MODEL="<your-model-deployment-name>"
+az login
+uv run samples/02-agents/chat_client/require_per_service_call_history_persistence.py
+```
+
 ## Environment Variables

 Depending on the selected client, set the appropriate environment variables:
@@ -0,0 +1,194 @@
+# Copyright (c) Microsoft. All rights reserved.
+
+from __future__ import annotations
+
+import asyncio
+from collections.abc import Awaitable, Callable
+from typing import Annotated
+
+from agent_framework import (
+    Agent,
+    FunctionInvocationContext,
+    FunctionMiddleware,
+    InMemoryHistoryProvider,
+    Message,
+    MiddlewareTermination,
+)
+from agent_framework.foundry import FoundryChatClient
+from azure.identity import AzureCliCredential
+from dotenv import load_dotenv
+from pydantic import Field
+
+"""
+Compare Foundry agents with and without per-service-call chat history persistence.
+
+This sample runs two otherwise identical Foundry agents with ``store=False`` so
+history stays local for both runs.
+
+The sample adds a function middleware that raises ``MiddlewareTermination``
+immediately after the tool runs, so the request stops before a second model
+call.
+
+That early termination is the important difference:
+
+- Without per-service-call chat history persistence, the synthesized tool result is
+  still written to local history.
+- With ``require_per_service_call_history_persistence=True``, that synthesized tool result is
+  not written to local history.
+
+The per-service-call persistence case matches service-side storage behavior. When a terminated
+request never sends the tool result back to the service, that result also never
+becomes part of the service-managed history.
+"""
+
+# Load environment variables from .env file
+load_dotenv()
+
+
+def lookup_weather(
+    location: Annotated[str, Field(description="The location to get the weather for.")],
+) -> str:
+    """Return a deterministic weather result for the requested location."""
+    return f"The weather in {location} is sunny."
+
+
+class TerminateAfterToolMiddleware(FunctionMiddleware):
+    """Stop the tool loop after the first tool finishes."""
+
+    async def process(
+        self,
+        context: FunctionInvocationContext,
+        call_next: Callable[[], Awaitable[None]],
+    ) -> None:
+        """Run the tool, then terminate the loop with that tool result."""
+        await call_next()
+        raise MiddlewareTermination(result=context.result)
+
+
+def _describe_message(message: Message) -> str:
+    """Render one stored message in a compact, readable format."""
+    parts: list[str] = []
+    for content in message.contents:
+        if content.type == "text" and content.text:
+            parts.append(content.text)
+        elif content.type == "function_call":
+            parts.append(f"function_call -> {content.name}({content.arguments})")
+        elif content.type == "function_result":
+            parts.append(f"function_result -> {content.result}")
+        else:
+            parts.append(content.type)
+
+    return f"{message.role}: {' | '.join(parts)}"
+
+
+def _includes_tool_result(messages: list[Message]) -> bool:
+    """Return whether any stored message contains a tool result."""
+    return any(content.type == "function_result" for message in messages for content in message.contents)
+
+
+async def main() -> None:
+    """Run both comparison scenarios."""
+    print("=== require_per_service_call_history_persistence when middleware terminates the tool loop ===\n")
+
+    # 1. Create one Foundry chat client that both agents will share.
+    client = FoundryChatClient(credential=AzureCliCredential())
+    query = "What is the weather in Seattle, and should I bring sunglasses?"
+
+    # 2. Create and run the agent without per-service-call persistence.
+    agent_without_persistence = Agent(
+        client=client,
+        instructions=(
+            "You are a weather assistant. Call lookup_weather exactly once before answering "
+            "any weather question, then summarize the tool result in one short paragraph."
+        ),
+        tools=[lookup_weather],
+        context_providers=[InMemoryHistoryProvider()],
+        middleware=[TerminateAfterToolMiddleware()],
+        default_options={"tool_choice": "required", "store": False},
+    )
+    session_without_persistence = agent_without_persistence.create_session()
+    await agent_without_persistence.run(
+        query,
+        session=session_without_persistence,
+    )
+    stored_messages_without_persistence = session_without_persistence.state[InMemoryHistoryProvider.DEFAULT_SOURCE_ID][
+        "messages"
+    ]
+
+    print("=== Without per-service-call persistence ===")
+    print("Loop terminated immediately after the tool finished.")
+    print(f"Stored synthesized tool result: {_includes_tool_result(stored_messages_without_persistence)}")
+    print("Stored history:")
+    for index, message in enumerate(stored_messages_without_persistence, start=1):
+        print(f"  {index}. {_describe_message(message)}")
+    print()
+
+    # 3. Create and run the agent with per-service-call persistence enabled.
+    agent_with_persistence = Agent(
+        client=client,
+        instructions=(
+            "You are a weather assistant. Call lookup_weather exactly once before answering "
+            "any weather question, then summarize the tool result in one short paragraph."
+        ),
+        tools=[lookup_weather],
+        context_providers=[InMemoryHistoryProvider()],
+        middleware=[TerminateAfterToolMiddleware()],
+        require_per_service_call_history_persistence=True,
+        default_options={"tool_choice": "required", "store": False},
+    )
+    session_with_persistence = agent_with_persistence.create_session()
+    await agent_with_persistence.run(
+        query,
+        session=session_with_persistence,
+    )
+    stored_messages_with_persistence = session_with_persistence.state[InMemoryHistoryProvider.DEFAULT_SOURCE_ID][
+        "messages"
+    ]
+
+    print("=== With per-service-call persistence ===")
+    print("Loop terminated immediately after the tool finished.")
+    print(f"Stored synthesized tool result: {_includes_tool_result(stored_messages_with_persistence)}")
+    print("Stored history:")
+    for index, message in enumerate(stored_messages_with_persistence, start=1):
+        print(f"  {index}. {_describe_message(message)}")
+    print()
+
+    # 4. Summarize the effect of the flag.
+    print(
+        "Both runs used FoundryChatClient with store=False and terminated right after the tool. "
+        "Without per-service-call persistence, local history still stored the synthesized tool result. "
+        "With per-service-call persistence, local history stopped at the assistant function-call message instead, "
+        "which matches service-side storage because the terminated tool result is never sent back to the service."
+    )
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
+
+
+"""
+Sample output:
+=== require_per_service_call_history_persistence when middleware terminates the tool loop ===
+
+=== Without per-service-call persistence ===
+Loop terminated immediately after the tool finished.
+Stored synthesized tool result: True
+Stored history:
+  1. user: What is the weather in Seattle, and should I bring sunglasses?
+  2. assistant: function_call -> lookup_weather({"location":"Seattle"})
+  3. tool: function_result -> The weather in Seattle is sunny.
+
+=== With per-service-call persistence ===
+Loop terminated immediately after the tool finished.
+Stored synthesized tool result: False
+Stored history:
+  1. user: What is the weather in Seattle, and should I bring sunglasses?
+  2. assistant: function_call -> lookup_weather({"location":"Seattle"})
+
+Both runs used FoundryChatClient with store=False and terminated right after
+the tool. Without per-service-call persistence, local history still stored the
+synthesized tool result. With per-service-call persistence, local history
+stopped at the assistant function-call message instead, which matches
+service-side storage because the terminated tool result is never sent back to
+the service.
+"""