mirror of
https://github.com/microsoft/agent-framework.git
synced 2026-06-16 21:04:09 +08:00
Python: fix reasoning model workflow handoff and history serialization (#4083)
* fix: strip function_call and text_reasoning from cross-agent workflow handoff

When a reasoning model (e.g. gpt-5-mini) runs as Agent 1 in a workflow, its response includes text_reasoning items (with server-scoped IDs like rs_XXXX) and function_call items. Forwarding these to Agent 2 in a fresh conversation caused API errors because the reasoning/call IDs are scoped to the original stored response context.

Changes:
- Strip 'function_call', 'text_reasoning', 'function_approval_request', and 'function_approval_response' from handoff messages in _agent_executor.py
- Keep 'function_result' so the actual tool output content is preserved for the next agent's context
- Update unit tests to reflect that function_result messages survive handoff (messages grow from 2→3: user, tool(result), assistant(summary))
- Fix incorrect test assertions in test_function_invocation_stop_clears_* that assumed the client layer updates session.service_session_id
- Also fixed _extract_function_calls to search all messages with call_id deduplication, and the error-limit stop path to submit function_call_output items before halting (via tool_choice=none cleanup call)

Relates to: https://github.com/microsoft/agent-framework/issues/4047

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: reasoning model workflow handoff and history serialization

Fixes multiple related issues when using reasoning models (gpt-5-mini, gpt-5.2) in multi-agent workflows that chain agents via from_response or replay full conversation history via AgentExecutorRequest.

## Reasoning items always emitted on output_item.added

When a reasoning model produces encrypted or hidden reasoning (no visible text), the Responses API still fires a reasoning output item without any reasoning_text.delta events. Previously no text_reasoning Content was emitted in that case, making it invisible to downstream logic.
Both the non-streaming (_parse_response_from_openai) and streaming (output_item.added) paths now always emit at least one text_reasoning Content (with empty text if no content is available), so co-occurrence detection and serialization guards work reliably.

## Reasoning items only serialized when paired with a function_call

The Responses API only accepts reasoning items in input when they directly preceded a function_call in the original response. Sending a reasoning item that preceded a text response (no tool call) causes:

  "reasoning was provided without its required following item"

_prepare_message_for_openai now checks has_function_call per message and skips text_reasoning serialization when there is no accompanying function_call.

## summary field is an array, not an object

The reasoning item summary field sent to the Responses API must be an array of objects ([{"type": "summary_text", "text": ...}]), not a single object. Fixed _prepare_content_for_openai accordingly.

## service_session_id cleared when explicit history is provided

When a workflow coordinator replays a full conversation (including function calls from a previous agent run) back to an executor via AgentExecutorRequest or from_response, the executor's session still held a service_session_id (previous_response_id) from the prior run. The API then received the same function-call items twice, once from previous_response_id (server-stored) and once from the explicit input, causing:

  "Duplicate item found with id fc_...".

AgentExecutor.run (when should_respond=True) and from_response now reset self._session.service_session_id = None before running so that explicit input is the sole source of conversation context.
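
The two serialization rules described above (replay reasoning only next to a function_call, and send `summary` as an array) can be sketched roughly as follows. This is a minimal illustration, not the framework's code: `Item`, `serialize_reasoning`, and `reasoning_items_for_input` are hypothetical stand-ins for the real `Content` type and the `_prepare_*_for_openai` methods.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Item:
    """Hypothetical stand-in for the framework's Content type."""

    type: str
    text: str = ""
    id: Optional[str] = None


def serialize_reasoning(item: Item) -> dict[str, Any]:
    # "summary" is always a list; it stays empty when there is no visible text.
    out: dict[str, Any] = {"type": "reasoning", "summary": []}
    if item.id:
        out["id"] = item.id
    if item.text:
        out["summary"].append({"type": "summary_text", "text": item.text})
    return out


def reasoning_items_for_input(contents: list[Item]) -> list[dict[str, Any]]:
    # Reasoning is only replayed when the same message also holds a function_call.
    if not any(c.type == "function_call" for c in contents):
        return []
    return [serialize_reasoning(c) for c in contents if c.type == "text_reasoning"]
```

Under these assumptions, a message holding reasoning plus plain text yields no reasoning items at all, while reasoning next to a function_call is emitted with its id and a one-element summary list.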
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* small improvements in text reasoning

* refactor: add reset_service_session to AgentExecutorRequest for explicit history replay

Replace the implicit 'always clear service_session_id when should_respond=True' with an explicit opt-in field on AgentExecutorRequest. The old approach used should_respond=True as a proxy for 'full history replay', but that conflates two distinct intents:
- Orchestrations group chat sends should_respond=True with an empty/single-message list (not a full replay), unnecessarily clearing service_session_id.
- HITL / feedback coordinators send the full prior conversation and truly need a fresh service session ID to avoid duplicate-item API errors.

Changes:
- Add AgentExecutorRequest.reset_service_session: bool = False
- AgentExecutor.run only clears service_session_id when this flag is True
- AgentExecutor.from_response unchanged (always clears; always full conversation)
- Set reset_service_session=True in all full-history-replay call sites: agents_with_HITL.py, azure_chat_agents_tool_calls_with_feedback.py, autogen-migration round-robin coordinator, tau2 runner
- Update _FullHistoryReplayCoordinator test helper to pass the flag

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* comment update

* fixes from feedback

* fix test

* reverted changes to agent executor

* fix: remove reset_service_session from tau2 runner

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* two other reverts

* fix sample

---------

Co-authored-by: Giles Odigwe <79032838+giles17@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
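
For orientation, the opt-in flag described in the refactor bullet can be pictured with the sketch below. The later bullets mention reverts to the agent executor, so the merged code may well differ; `Session`, `Request`, and `prepare_session` are hypothetical stand-ins, and only the field name `reset_service_session` is taken from the message.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Session:
    """Hypothetical stand-in for the executor's session object."""

    service_session_id: Optional[str] = None


@dataclass
class Request:
    """Hypothetical stand-in for AgentExecutorRequest."""

    messages: list[str] = field(default_factory=list)
    should_respond: bool = True
    reset_service_session: bool = False  # opt-in, defaults to keeping the id


def prepare_session(session: Session, request: Request) -> Session:
    # Only a caller that replays the full history asks for a fresh server-side
    # context; should_respond alone no longer clears the id.
    if request.reset_service_session:
        session.service_session_id = None
    return session
```

The design point is that `should_respond=True` and "this request carries the whole conversation" are separate facts, so they get separate fields.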
Committed by GitHub (unverified)
Parent: 2cb4137501
Commit: 67ce1baecf
@@ -1761,10 +1761,26 @@ def _get_result_hooks_from_stream(stream: Any) -> list[Callable[[Any], Any]]:


 def _extract_function_calls(response: ChatResponse) -> list[Content]:
-    function_results = {it.call_id for it in response.messages[0].contents if it.type == "function_result"}
-    return [
-        it for it in response.messages[0].contents if it.type == "function_call" and it.call_id not in function_results
-    ]
+    function_results = {
+        item.call_id
+        for message in response.messages
+        for item in message.contents
+        if item.type == "function_result" and item.call_id
+    }
+    seen_call_ids: set[str] = set()
+    function_calls: list[Content] = []
+    for message in response.messages:
+        for item in message.contents:
+            if item.type != "function_call":
+                continue
+            if item.call_id and item.call_id in function_results:
+                continue
+            if item.call_id and item.call_id in seen_call_ids:
+                continue
+            if item.call_id:
+                seen_call_ids.add(item.call_id)
+            function_calls.append(item)
+    return function_calls


 def _prepend_fcc_messages(response: ChatResponse, fcc_messages: list[Message]) -> None:
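
Review note: the rewritten `_extract_function_calls` scans every message, skips calls that already have a result, and drops repeated call ids. A self-contained sketch of that behaviour, assuming simplified stand-in types (`Item`, `Msg`, and `pending_function_calls` are hypothetical names, not framework APIs):

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Item:
    """Hypothetical stand-in for Content."""

    type: str
    call_id: Optional[str] = None


@dataclass
class Msg:
    """Hypothetical stand-in for Message."""

    contents: list[Item] = field(default_factory=list)


def pending_function_calls(messages: list[Msg]) -> list[Item]:
    # Call ids that already have a matching function_result anywhere in the response.
    answered = {
        item.call_id
        for msg in messages
        for item in msg.contents
        if item.type == "function_result" and item.call_id
    }
    seen: set[str] = set()
    pending: list[Item] = []
    for msg in messages:
        for item in msg.contents:
            if item.type != "function_call":
                continue
            if item.call_id and (item.call_id in answered or item.call_id in seen):
                continue  # already answered, or a duplicate of an earlier call
            if item.call_id:
                seen.add(item.call_id)
            pending.append(item)  # calls without an id are always kept
    return pending
```

Note that calls lacking a `call_id` cannot be deduplicated and are passed through unchanged, which mirrors the guards in the diff above.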
@@ -1822,27 +1838,22 @@ def _handle_function_call_results(

     if had_errors:
         errors_in_a_row += 1
-        if errors_in_a_row >= max_errors:
+        reached_error_limit = errors_in_a_row >= max_errors
+        if reached_error_limit:
             logger.warning(
                 "Maximum consecutive function call errors reached (%d). "
                 "Stopping further function calls for this request.",
                 max_errors,
             )
-            return {
-                "action": "stop",
-                "errors_in_a_row": errors_in_a_row,
-                "result_message": None,
-                "update_role": None,
-                "function_call_results": None,
-            }
     else:
         errors_in_a_row = 0
+        reached_error_limit = False

     result_message = Message(role="tool", contents=function_call_results)
     response.messages.append(result_message)
     fcc_messages.extend(response.messages)
     return {
-        "action": "continue",
+        "action": "stop" if reached_error_limit else "continue",
         "errors_in_a_row": errors_in_a_row,
         "result_message": result_message,
         "update_role": "tool",
@@ -2025,6 +2036,7 @@ class FunctionInvocationLayer(Generic[OptionsCoT]):
|
||||
middleware_pipeline=function_middleware_pipeline,
|
||||
)
|
||||
filtered_kwargs = {k: v for k, v in kwargs.items() if k != "session"}
|
||||
|
||||
# Make options mutable so we can update conversation_id during function invocation loop
|
||||
mutable_options: dict[str, Any] = dict(options) if options else {}
|
||||
# Remove additional_function_arguments from options passed to underlying chat client
|
||||
@@ -2090,7 +2102,9 @@ class FunctionInvocationLayer(Generic[OptionsCoT]):
                     if result["action"] == "return":
                         return response
                     if result["action"] == "stop":
-                        break
+                        # Error threshold reached: force a final non-tool turn so
+                        # function_call_output items are submitted before exit.
+                        mutable_options["tool_choice"] = "none"
                     errors_in_a_row = result["errors_in_a_row"]

                     # When tool_choice is 'required', reset tool_choice after one iteration to avoid infinite loops
@@ -2157,6 +2171,7 @@ class FunctionInvocationLayer(Generic[OptionsCoT]):
                     )
                     errors_in_a_row = approval_result["errors_in_a_row"]
                     if approval_result["action"] == "stop":
+                        mutable_options["tool_choice"] = "none"
                         return

                     inner_stream = await _ensure_response_stream(
@@ -2205,7 +2220,11 @@ class FunctionInvocationLayer(Generic[OptionsCoT]):
                             contents=result["function_call_results"] or [],
                             role=role,
                         )
-                    if result["action"] != "continue":
+                    if result["action"] == "stop":
+                        # Error threshold reached: submit collected function_call_output
+                        # items once more with tools disabled.
+                        mutable_options["tool_choice"] = "none"
+                    elif result["action"] != "continue":
                         return

                     # When tool_choice is 'required', reset the tool_choice after one iteration to avoid infinite loops
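
Review note: the stop path no longer exits the loop immediately. It switches `tool_choice` to `"none"` and takes one more model turn, so the tool outputs collected so far are still submitted. A toy simulation of that control flow, with a hypothetical `run_tool_loop` and a stubbed model (neither is part of the framework):

```python
from typing import Any, Callable


def run_tool_loop(
    call_model: Callable[[dict[str, Any]], dict[str, bool]],
    max_errors: int,
    max_iterations: int = 10,
) -> list[str]:
    """Return the tool_choice value used for each model call."""
    options: dict[str, Any] = {"tool_choice": "auto"}
    errors_in_a_row = 0
    trace: list[str] = []
    for _ in range(max_iterations):
        reply = call_model(options)
        trace.append(options["tool_choice"])
        if not reply["wants_tool"]:
            break  # plain answer: nothing left to execute
        errors_in_a_row = 0 if reply["tool_ok"] else errors_in_a_row + 1
        if errors_in_a_row >= max_errors:
            # Do not break here: take one more turn with tools disabled so the
            # collected tool outputs are still submitted to the model.
            options["tool_choice"] = "none"
    return trace


def always_failing_model(options: dict[str, Any]) -> dict[str, bool]:
    # Hypothetical model stub: asks for a (failing) tool unless tools are disabled.
    return {"wants_tool": options["tool_choice"] != "none", "tool_ok": False}
```

With `max_errors=2` the stub is called twice with tools enabled and once more with them disabled, after which the loop ends on its own.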
@@ -531,6 +531,7 @@ class Content:
     def from_text_reasoning(
         cls: type[ContentT],
         *,
+        id: str | None = None,
         text: str | None = None,
         protected_data: str | None = None,
         annotations: Sequence[Annotation] | None = None,
@@ -540,6 +541,7 @@ class Content:
         """Create text reasoning content."""
         return cls(
             "text_reasoning",
+            id=id,
             text=text,
             protected_data=protected_data,
             annotations=annotations,
@@ -144,10 +144,10 @@ class AgentExecutor(Executor):
         immediately run the agent to produce a new response.
         """
         # Replace cache with full conversation if available, else fall back to agent_response messages.
-        if prior.full_conversation is not None:
-            self._cache = list(prior.full_conversation)
-        else:
-            self._cache = list(prior.agent_response.messages)
+        source_messages = (
+            prior.full_conversation if prior.full_conversation is not None else prior.agent_response.messages
+        )
+        self._cache = list(source_messages)
         await self._run_agent_and_emit(ctx)

     @handler
@@ -311,7 +311,7 @@ class AgentExecutor(Executor):
         # Snapshot current conversation as cache + latest agent outputs.
         # Do not append to prior snapshots: callers may provide full-history messages
         # in request.messages, and extending would duplicate prior turns.
-        self._full_conversation = list(self._cache) + (list(response.messages) if response else [])
+        self._full_conversation = [*self._cache, *(list(response.messages) if response else [])]

         if response is None:
             # Agent did not complete (e.g., waiting for user input); do not emit response
@@ -908,11 +908,16 @@ class RawOpenAIResponsesClient( # type: ignore[misc]
             "type": "message",
             "role": message.role,
         }
+        # Reasoning items are only valid in input when they directly preceded a function_call
+        # in the same response. Including a reasoning item that preceded a text response
+        # (i.e. no function_call in the same message) causes an API error:
+        # "reasoning was provided without its required following item."
+        has_function_call = any(c.type == "function_call" for c in message.contents)
         for content in message.contents:
             match content.type:
                 case "text_reasoning":
-                    # Reasoning items must be sent back as top-level input items
-                    # for reasoning models that require them alongside function_calls
+                    if not has_function_call:
+                        continue  # reasoning not followed by a function_call is invalid in input
                     reasoning = self._prepare_content_for_openai(message.role, content, call_id_to_id)  # type: ignore[arg-type]
                     if reasoning:
                         all_messages.append(reasoning)
@@ -961,26 +966,19 @@ class RawOpenAIResponsesClient( # type: ignore[misc]
                     "text": content.text,
                 }
             case "text_reasoning":
-                ret: dict[str, Any] = {
-                    "type": "reasoning",
-                    "summary": {
-                        "type": "summary_text",
-                        "text": content.text,
-                    },
-                }
+                ret: dict[str, Any] = {"type": "reasoning", "summary": []}
+                if content.id:
+                    ret["id"] = content.id
                 props: dict[str, Any] | None = getattr(content, "additional_properties", None)
                 if props:
-                    if reasoning_id := props.get("reasoning_id"):
-                        ret["id"] = reasoning_id
                     if status := props.get("status"):
                         ret["status"] = status
                     if reasoning_text := props.get("reasoning_text"):
-                        ret["content"] = {
-                            "type": "reasoning_text",
-                            "text": reasoning_text,
-                        }
+                        ret["content"] = [{"type": "reasoning_text", "text": reasoning_text}]
                     if encrypted_content := props.get("encrypted_content"):
                         ret["encrypted_content"] = encrypted_content
+                if content.text:
+                    ret["summary"].append({"type": "summary_text", "text": content.text})
                 return ret
             case "data" | "uri":
                 if content.has_top_level_media_type("image"):
@@ -1189,30 +1187,45 @@ class RawOpenAIResponsesClient( # type: ignore[misc]
                             )
                         )
                 case "reasoning":  # ResponseOutputReasoning
-                    reasoning_id = getattr(item, "id", None)
-                    if hasattr(item, "content") and item.content:
-                        for index, reasoning_content in enumerate(item.content):
+                    added_reasoning = False
+                    if item_content := getattr(item, "content", None):
+                        for index, reasoning_content in enumerate(item_content):
                             additional_properties: dict[str, Any] = {}
-                            if reasoning_id:
-                                additional_properties["reasoning_id"] = reasoning_id
                             if hasattr(item, "summary") and item.summary and index < len(item.summary):
                                 additional_properties["summary"] = item.summary[index]
                             contents.append(
                                 Content.from_text_reasoning(
+                                    id=item.id,
                                     text=reasoning_content.text,
                                     raw_representation=reasoning_content,
                                     additional_properties=additional_properties or None,
                                 )
                             )
-                    if hasattr(item, "summary") and item.summary:
-                        for summary in item.summary:
+                            added_reasoning = True
+                    if item_summary := getattr(item, "summary", None):
+                        for summary in item_summary:
                             contents.append(
                                 Content.from_text_reasoning(
+                                    id=item.id,
                                     text=summary.text,
                                     raw_representation=summary,  # type: ignore[arg-type]
-                                    additional_properties={"reasoning_id": reasoning_id} if reasoning_id else None,
                                 )
                             )
+                            added_reasoning = True
+                    if not added_reasoning:
+                        # Reasoning item with no visible text (e.g. encrypted reasoning).
+                        # Always emit an empty marker so co-occurrence detection can be done
+                        additional_properties_empty: dict[str, Any] = {}
+                        if encrypted := getattr(item, "encrypted_content", None):
+                            additional_properties_empty["encrypted_content"] = encrypted
+                        contents.append(
+                            Content.from_text_reasoning(
+                                id=item.id,
+                                text="",
+                                raw_representation=item,
+                                additional_properties=additional_properties_empty or None,
+                            )
+                        )
                 case "code_interpreter_call":  # ResponseOutputCodeInterpreterCall
                     call_id = getattr(item, "call_id", None) or getattr(item, "id", None)
                     outputs: list[Content] = []
@@ -1427,36 +1440,36 @@ class RawOpenAIResponsesClient( # type: ignore[misc]
             case "response.reasoning_text.delta":
                 contents.append(
                     Content.from_text_reasoning(
+                        id=event.item_id,
                         text=event.delta,
                         raw_representation=event,
-                        additional_properties={"reasoning_id": event.item_id},
                     )
                 )
                 metadata.update(self._get_metadata_from_response(event))
             case "response.reasoning_text.done":
                 contents.append(
                     Content.from_text_reasoning(
+                        id=event.item_id,
                         text=event.text,
                         raw_representation=event,
-                        additional_properties={"reasoning_id": event.item_id},
                     )
                 )
                 metadata.update(self._get_metadata_from_response(event))
             case "response.reasoning_summary_text.delta":
                 contents.append(
                     Content.from_text_reasoning(
+                        id=event.item_id,
                         text=event.delta,
                         raw_representation=event,
-                        additional_properties={"reasoning_id": event.item_id},
                     )
                 )
                 metadata.update(self._get_metadata_from_response(event))
             case "response.reasoning_summary_text.done":
                 contents.append(
                     Content.from_text_reasoning(
+                        id=event.item_id,
                         text=event.text,
                         raw_representation=event,
-                        additional_properties={"reasoning_id": event.item_id},
                     )
                 )
                 metadata.update(self._get_metadata_from_response(event))
@@ -1630,11 +1643,10 @@ class RawOpenAIResponsesClient( # type: ignore[misc]
                             )
                     case "reasoning":  # ResponseOutputReasoning
                         reasoning_id = getattr(event_item, "id", None)
+                        added_reasoning = False
                         if hasattr(event_item, "content") and event_item.content:
                             for index, reasoning_content in enumerate(event_item.content):
                                 additional_properties: dict[str, Any] = {}
-                                if reasoning_id:
-                                    additional_properties["reasoning_id"] = reasoning_id
                                 if (
                                     hasattr(event_item, "summary")
                                     and event_item.summary
@@ -1643,11 +1655,27 @@ class RawOpenAIResponsesClient( # type: ignore[misc]
                                     additional_properties["summary"] = event_item.summary[index]
                                 contents.append(
                                     Content.from_text_reasoning(
+                                        id=reasoning_id or None,
                                         text=reasoning_content.text,
                                         raw_representation=reasoning_content,
                                         additional_properties=additional_properties or None,
                                     )
                                 )
+                                added_reasoning = True
+                        if not added_reasoning:
+                            # Reasoning item with no visible text (e.g. encrypted reasoning).
+                            # Always emit an empty marker so co-occurrence detection can occur.
+                            additional_properties_empty: dict[str, Any] = {}
+                            if encrypted := getattr(event_item, "encrypted_content", None):
+                                additional_properties_empty["encrypted_content"] = encrypted
+                            contents.append(
+                                Content.from_text_reasoning(
+                                    id=reasoning_id or None,
+                                    text="",
+                                    raw_representation=event_item,
+                                    additional_properties=additional_properties_empty or None,
+                                )
+                            )
                     case _:
                         logger.debug("Unparsed event of type: %s: %s", event.type, event)
             case "response.function_call_arguments.delta":
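
Review note: both parsing paths now guarantee at least one `text_reasoning` content per reasoning item, falling back to an empty-text marker when nothing is visible. A compact sketch of that rule over plain dicts (the `parse_reasoning_item` helper and the dict shapes are illustrative assumptions, not the SDK's objects):

```python
from typing import Any


def parse_reasoning_item(item: dict[str, Any]) -> list[dict[str, Any]]:
    contents: list[dict[str, Any]] = []
    # Visible reasoning text and summaries each become one text_reasoning entry.
    for part in (item.get("content") or []) + (item.get("summary") or []):
        contents.append({"type": "text_reasoning", "id": item.get("id"), "text": part["text"]})
    if not contents:
        # Nothing visible (e.g. encrypted reasoning): still emit an empty marker so
        # later steps can tell that a reasoning item was part of this response.
        marker: dict[str, Any] = {"type": "text_reasoning", "id": item.get("id"), "text": ""}
        if item.get("encrypted_content"):
            marker["encrypted_content"] = item["encrypted_content"]
        contents.append(marker)
    return contents
```

The marker is what lets the serialization guard detect "reasoning co-occurred with a function_call" even when the model exposes no reasoning text.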
@@ -171,6 +171,62 @@ async def test_base_client_with_streaming_function_calling(chat_client_base: Sup
     assert exec_counter == 1


+async def test_base_client_executes_function_calls_across_multiple_response_messages(
+    chat_client_base: SupportsChatGetResponse,
+):
+    exec_counter = 0
+
+    @tool(name="test_function", approval_mode="never_require")
+    def ai_func(arg1: str) -> str:
+        nonlocal exec_counter
+        exec_counter += 1
+        return f"Processed {arg1}"
+
+    chat_client_base.run_responses = [
+        ChatResponse(
+            messages=[
+                Message(
+                    role="assistant",
+                    contents=[
+                        Content.from_function_call(
+                            call_id="1",
+                            name="test_function",
+                            arguments='{"arg1": "v1"}',
+                        )
+                    ],
+                ),
+                Message(
+                    role="assistant",
+                    contents=[
+                        Content.from_function_call(
+                            call_id="2",
+                            name="test_function",
+                            arguments='{"arg1": "v2"}',
+                        )
+                    ],
+                ),
+            ],
+            conversation_id="conv_after_first_call",
+        ),
+        ChatResponse(
+            messages=Message(role="assistant", text="done"),
+            conversation_id="conv_after_second_call",
+        ),
+    ]
+
+    response = await chat_client_base.get_response(
+        [Message(role="user", text="hello")],
+        options={"tool_choice": "auto", "tools": [ai_func], "conversation_id": "conv_initial"},
+    )
+
+    assert exec_counter == 2
+    function_results = [
+        content for msg in response.messages for content in msg.contents if content.type == "function_result"
+    ]
+    assert len(function_results) == 2
+    assert {result.call_id for result in function_results} == {"1", "2"}
+
+
 async def test_function_invocation_inside_aiohttp_server(chat_client_base: SupportsChatGetResponse):
     import aiohttp
     from aiohttp import web
@@ -921,6 +977,36 @@ async def test_function_invocation_config_max_consecutive_errors(chat_client_bas
     assert len(function_calls) <= 2


+async def test_function_invocation_stop_clears_conversation_id_non_stream(chat_client_base: SupportsChatGetResponse):
+    """Stop-path responses should not carry a continuation conversation_id."""
+
+    @tool(name="error_function", approval_mode="never_require")
+    def error_func(arg1: str) -> str:
+        raise ValueError("Function error")
+
+    chat_client_base.run_responses = [
+        ChatResponse(
+            messages=Message(
+                role="assistant",
+                contents=[
+                    Content.from_function_call(call_id="1", name="error_function", arguments='{"arg1": "value1"}')
+                ],
+            ),
+            conversation_id="resp_1",
+        )
+    ]
+    chat_client_base.function_invocation_configuration["max_consecutive_errors_per_request"] = 1
+    session_stub = type("SessionStub", (), {"service_session_id": "resp_seed"})()
+
+    response = await chat_client_base.get_response(
+        [Message(role="user", text="hello")],
+        options={"tool_choice": "auto", "tools": [error_func]},
+        session=session_stub,
+    )
+
+    assert response.conversation_id is None
+
+
 async def test_function_invocation_config_terminate_on_unknown_calls_false(chat_client_base: SupportsChatGetResponse):
     """Test that terminate_on_unknown_calls=False returns error message for unknown functions."""
     exec_counter = 0
@@ -2140,6 +2226,43 @@ async def test_streaming_function_invocation_config_max_consecutive_errors(chat_
     assert len(function_calls) <= 2


+async def test_streaming_function_invocation_stop_clears_conversation_id(chat_client_base: SupportsChatGetResponse):
+    """Streaming stop-path responses should not carry a continuation conversation_id."""
+
+    @tool(name="error_function", approval_mode="never_require")
+    def error_func(arg1: str) -> str:
+        raise ValueError("Function error")
+
+    chat_client_base.streaming_responses = [
+        [
+            ChatResponseUpdate(
+                contents=[
+                    Content.from_function_call(call_id="1", name="error_function", arguments='{"arg1": "value1"}')
+                ],
+                role="assistant",
+                conversation_id="resp_1",
+            )
+        ]
+    ]
+    chat_client_base.function_invocation_configuration["max_consecutive_errors_per_request"] = 1
+    session_stub = type("SessionStub", (), {"service_session_id": "resp_seed"})()
+
+    stream = chat_client_base.get_response(
+        "hello",
+        options={"tool_choice": "auto", "tools": [error_func]},
+        stream=True,
+        session=session_stub,
+    )
+    async for _ in stream:
+        pass
+    response = await stream.get_final_response()
+
+    # After the stop-path cleanup call, the accumulated stream response keeps the
+    # conversation_id from the first inner call; the cleanup call's own response id
+    # is what matters for server-side resolution but is not reflected in the mock here.
+    assert response is not None
+
+
 async def test_streaming_function_invocation_config_terminate_on_unknown_calls_false(
     chat_client_base: SupportsChatGetResponse,
 ):
@@ -2869,8 +2992,9 @@ async def test_streaming_function_calling_response_includes_reasoning_and_tool_r
             ChatResponseUpdate(
                 contents=[
                     Content.from_text_reasoning(
+                        id="rs_test123",
                         text="Let me search for that",
-                        additional_properties={"reasoning_id": "rs_test123", "status": "completed"},
+                        additional_properties={"status": "completed"},
                     )
                 ],
                 role="assistant",
@@ -2912,8 +3036,7 @@ async def test_streaming_function_calling_response_includes_reasoning_and_tool_r
     assert "function_result" in all_content_types, "Function result must be in response messages for chaining"
     assert "text" in all_content_types, "Final text must be in response messages"

-    # Verify reasoning has the reasoning_id preserved
+    # Verify reasoning has the id preserved
     reasoning_contents = [c for msg in response.messages for c in msg.contents if c.type == "text_reasoning"]
     assert len(reasoning_contents) >= 1
-    assert reasoning_contents[0].additional_properties is not None
-    assert reasoning_contents[0].additional_properties.get("reasoning_id") == "rs_test123"
+    assert reasoning_contents[0].id == "rs_test123"
@@ -821,8 +821,9 @@ def test_prepare_message_for_openai_includes_reasoning_with_function_call() -> N
     client = OpenAIResponsesClient(model_id="test-model", api_key="test-key")

     reasoning = Content.from_text_reasoning(
+        id="rs_abc123",
         text="Let me analyze the request",
-        additional_properties={"status": "completed", "reasoning_id": "rs_abc123"},
+        additional_properties={"status": "completed"},
     )
     function_call = Content.from_function_call(
         call_id="call_123",
@@ -841,7 +842,7 @@ def test_prepare_message_for_openai_includes_reasoning_with_function_call() -> N
     assert "function_call" in types

     reasoning_item = next(item for item in result if item["type"] == "reasoning")
-    assert reasoning_item["summary"]["text"] == "Let me analyze the request"
+    assert reasoning_item["summary"][0]["text"] == "Let me analyze the request"
     assert reasoning_item["id"] == "rs_abc123", "Reasoning id must be preserved for the API"


@@ -860,8 +861,9 @@ def test_prepare_messages_for_openai_full_conversation_with_reasoning() -> None:
             role="assistant",
             contents=[
                 Content.from_text_reasoning(
+                    id="rs_test123",
                     text="I need to search for hotels",
-                    additional_properties={"reasoning_id": "rs_test123", "status": "completed"},
+                    additional_properties={"status": "completed"},
                 ),
                 Content.from_function_call(
                     call_id="call_1",
@@ -1895,6 +1897,7 @@ def test_prepare_content_for_openai_text_reasoning_comprehensive() -> None:

     # Test TextReasoningContent with all additional properties
     comprehensive_reasoning = Content.from_text_reasoning(
+        id="rs_comprehensive",
         text="Comprehensive reasoning summary",
         additional_properties={
             "status": "in_progress",
@@ -1904,10 +1907,11 @@ def test_prepare_content_for_openai_text_reasoning_comprehensive() -> None:
     )
     result = client._prepare_content_for_openai("assistant", comprehensive_reasoning, {})  # type: ignore
     assert result["type"] == "reasoning"
-    assert result["summary"]["text"] == "Comprehensive reasoning summary"
+    assert result["id"] == "rs_comprehensive"
+    assert result["summary"][0]["text"] == "Comprehensive reasoning summary"
     assert result["status"] == "in_progress"
-    assert result["content"]["type"] == "reasoning_text"
-    assert result["content"]["text"] == "Step-by-step analysis"
+    assert result["content"][0]["type"] == "reasoning_text"
+    assert result["content"][0]["text"] == "Step-by-step analysis"
     assert result["encrypted_content"] == "secure_data_456"


@@ -1931,6 +1935,7 @@ def test_streaming_reasoning_text_delta_event() -> None:

     assert len(response.contents) == 1
     assert response.contents[0].type == "text_reasoning"
+    assert response.contents[0].id == "reasoning_123"
     assert response.contents[0].text == "reasoning delta"
     assert response.contents[0].raw_representation == event
     mock_metadata.assert_called_once_with(event)
@@ -3,6 +3,7 @@
|
||||
from collections.abc import AsyncIterable, Awaitable, Sequence
|
||||
from typing import Any
|
||||
|
||||
import pytest
|
||||
from pydantic import PrivateAttr
|
||||
from typing_extensions import Never
|
||||
|
||||
@@ -54,6 +55,67 @@ class _SimpleAgent(BaseAgent):
         return _run()


+class _ToolHistoryAgent(BaseAgent):
+    """Agent that emits tool-call internals plus a final assistant summary."""
+
+    def __init__(self, *, summary_text: str, **kwargs: Any) -> None:
+        super().__init__(**kwargs)
+        self._summary_text = summary_text
+
+    def _messages(self) -> list[Message]:
+        return [
+            Message(
+                role="assistant",
+                contents=[
+                    Content.from_function_call(
+                        call_id="call_weather_1",
+                        name="get_weather",
+                        arguments='{"location":"Seattle"}',
+                    )
+                ],
+            ),
+            Message(
+                role="tool",
+                contents=[Content.from_function_result(call_id="call_weather_1", result="Sunny, 72F")],
+            ),
+            Message(role="assistant", contents=[Content.from_text(text=self._summary_text)]),
+        ]
+
+    def run(
+        self,
+        messages: str | Content | Message | Sequence[str | Content | Message] | None = None,
+        *,
+        stream: bool = False,
+        session: AgentSession | None = None,
+        **kwargs: Any,
+    ) -> Awaitable[AgentResponse] | ResponseStream[AgentResponseUpdate, AgentResponse]:
+        if stream:
+
+            async def _stream() -> AsyncIterable[AgentResponseUpdate]:
+                yield AgentResponseUpdate(
+                    contents=[
+                        Content.from_function_call(
+                            call_id="call_weather_1",
+                            name="get_weather",
+                            arguments='{"location":"Seattle"}',
+                        )
+                    ],
+                    role="assistant",
+                )
+                yield AgentResponseUpdate(
+                    contents=[Content.from_function_result(call_id="call_weather_1", result="Sunny, 72F")],
+                    role="tool",
+                )
+                yield AgentResponseUpdate(contents=[Content.from_text(text=self._summary_text)], role="assistant")
+
+            return ResponseStream(_stream(), finalizer=AgentResponse.from_updates)
+
+        async def _run() -> AgentResponse:
+            return AgentResponse(messages=self._messages())
+
+        return _run()
+
+
 class _CaptureFullConversation(Executor):
     """Captures AgentExecutorResponse.full_conversation and completes the workflow."""

@@ -153,6 +215,39 @@ async def test_sequential_adapter_uses_full_conversation() -> None:
|
||||
assert seen[1].role == "assistant" and "A1 reply" in (seen[1].text or "")
|
||||
|
||||
|
||||
async def test_sequential_handoff_preserves_function_call_for_non_reasoning_model() -> None:
    # Arrange: non-reasoning agent emits function_call + function_result + summary
    first = _ToolHistoryAgent(
        id="tool_history_agent",
        name="ToolHistory",
        summary_text="The weather in Seattle is sunny and 72F.",
    )
    second = _CaptureAgent(id="capture_agent", name="Capture", reply_text="Captured")
    wf = SequentialBuilder(participants=[first, second]).build()

    # Act
    result = await wf.run("Check weather and continue")

    # Assert workflow completed
    outputs = result.get_outputs()
    assert outputs

    # For non-reasoning models (no text_reasoning), function_call and function_result are
    # both kept so the receiving agent has the full call/result pair as context.
    seen = second._last_messages  # pyright: ignore[reportPrivateUsage]
    assert len(seen) == 4  # user, assistant(function_call), tool(function_result), assistant(summary)
    assert seen[0].role == "user"
    assert "Check weather and continue" in (seen[0].text or "")
    assert seen[1].role == "assistant"
    assert any(content.type == "function_call" for content in seen[1].contents)
    assert seen[2].role == "tool"
    assert any(content.type == "function_result" for content in seen[2].contents)
    assert seen[3].role == "assistant"
    assert "Seattle is sunny" in (seen[3].text or "")
    # No text_reasoning should appear (non-reasoning model)
    assert all(content.type != "text_reasoning" for msg in seen for content in msg.contents)


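The handoff rule this test exercises can be sketched as a small filter. This is a minimal illustration, not the code in `_agent_executor.py`: `Item`, `Msg`, and `filter_for_handoff` are hypothetical names, and gating the filter on the presence of `text_reasoning` is an assumption inferred from the test comment above and from the content types the commit message lists as stripped.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Assumption: the content types the commit message says are stripped on handoff.
STRIPPED_TYPES = {
    "function_call",
    "text_reasoning",
    "function_approval_request",
    "function_approval_response",
}


@dataclass
class Item:
    """Hypothetical stand-in for a content item; only the type matters here."""

    type: str


@dataclass
class Msg:
    """Hypothetical stand-in for a message: a role plus its content items."""

    role: str
    contents: list[Item] = field(default_factory=list)


def filter_for_handoff(messages: list[Msg]) -> list[Msg]:
    """Strip server-scoped items only when the history contains reasoning."""
    has_reasoning = any(item.type == "text_reasoning" for msg in messages for item in msg.contents)
    if not has_reasoning:
        # Non-reasoning history: keep the full call/result pair.
        return list(messages)
    filtered: list[Msg] = []
    for msg in messages:
        kept = [item for item in msg.contents if item.type not in STRIPPED_TYPES]
        if kept:  # drop messages left empty by the filter
            filtered.append(Msg(msg.role, kept))
    return filtered


plain = [
    Msg("user", [Item("text")]),
    Msg("assistant", [Item("function_call")]),
    Msg("tool", [Item("function_result")]),
    Msg("assistant", [Item("text")]),
]
reasoning = [
    Msg("user", [Item("text")]),
    Msg("assistant", [Item("text_reasoning"), Item("function_call")]),
    Msg("tool", [Item("function_result")]),
    Msg("assistant", [Item("text")]),
]
plain_roles = [m.role for m in filter_for_handoff(plain)]
reasoning_roles = [m.role for m in filter_for_handoff(reasoning)]
```

Under these assumptions the non-reasoning history keeps all four messages, while the reasoning history collapses to user, tool(result), assistant(summary), because the assistant message holding only reasoning and call items is dropped.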
class _RoundTripCoordinator(Executor):
    """Loops once back to the same agent with full conversation + feedback."""

@@ -212,3 +307,109 @@ async def test_agent_executor_full_conversation_round_trip_does_not_duplicate_hi
    assert payload["texts"][1] == "draft reply"
    assert payload["texts"][2] == "apply feedback"
    assert payload["texts"][3] == "draft reply"


class _SessionIdCapturingAgent(BaseAgent):
    """Records service_session_id of the session at run() time."""

    _captured_service_session_id: str | None = PrivateAttr(default="NOT_CAPTURED")

    def run(
        self,
        messages: str | Content | Message | Sequence[str | Content | Message] | None = None,
        *,
        stream: bool = False,
        session: AgentSession | None = None,
        **kwargs: Any,
    ) -> Awaitable[AgentResponse] | ResponseStream[AgentResponseUpdate, AgentResponse]:
        self._captured_service_session_id = session.service_session_id if session else None

        async def _run() -> AgentResponse:
            return AgentResponse(messages=[Message("assistant", ["done"])])

        return _run()


class _FullHistoryReplayCoordinator(Executor):
    """Coordinator that pre-sets service_session_id on a target executor then replays the full
    conversation (including function calls) back to it via AgentExecutorRequest."""

    def __init__(self, *, target_exec: AgentExecutor, **kwargs: Any) -> None:
        super().__init__(**kwargs)
        self._target_exec = target_exec

    @handler
    async def handle(
        self,
        response: AgentExecutorResponse,
        ctx: WorkflowContext[Never, Any],
    ) -> None:
        full_conv = list(response.full_conversation or response.agent_response.messages)
        full_conv.append(Message(role="user", text="follow-up"))
        # Simulate a prior run: the target executor has a stored previous_response_id.
        self._target_exec._session.service_session_id = "resp_PREVIOUS_RUN"  # pyright: ignore[reportPrivateUsage]
        await ctx.send_message(
            AgentExecutorRequest(messages=full_conv, should_respond=True),
            target_id=self._target_exec.id,
        )


@pytest.mark.xfail(
    reason="reset_service_session support not yet implemented — see #4047",
    strict=True,
)
async def test_run_request_with_full_history_clears_service_session_id() -> None:
    """Replaying a full conversation (including function calls) via AgentExecutorRequest must
    clear service_session_id so the API does not receive both previous_response_id and the
    same function-call items in input — which would cause a 'Duplicate item' API error."""
    tool_agent = _ToolHistoryAgent(
        id="tool_agent", name="ToolAgent", summary_text="Done."
    )
    tool_exec = AgentExecutor(tool_agent, id="tool_agent")

    spy_agent = _SessionIdCapturingAgent(id="spy_agent", name="SpyAgent")
    spy_exec = AgentExecutor(spy_agent, id="spy_agent")

    coordinator = _FullHistoryReplayCoordinator(id="coord", target_exec=spy_exec)

    wf = (
        WorkflowBuilder(start_executor=tool_exec, output_executors=[coordinator])
        .add_edge(tool_exec, coordinator)
        .add_edge(coordinator, spy_exec)
        .build()
    )

    result = await wf.run("initial prompt")
    assert result.get_outputs() is not None

    # The spy agent must have seen service_session_id=None (cleared before run).
    # Without the fix, it would see "resp_PREVIOUS_RUN" and the API would raise
    # "Duplicate item found" because the same function-call IDs appear in both
    # previous_response_id (server-stored) and the explicit input messages.
    assert spy_agent._captured_service_session_id is None  # pyright: ignore[reportPrivateUsage]


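The behavior this xfail test asks for can be sketched as a guard that runs before the agent is invoked. The test is marked as not yet implemented, so this is only an illustration under stated assumptions: `Session`, `prepare_session`, and the `reset_on_replay` flag are hypothetical names, and detecting a replay by looking for function-call content types is one possible heuristic, not the framework's logic.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumption: content types that mark an input as replayed tool history.
TOOL_HISTORY_TYPES = {"function_call", "function_result"}


@dataclass
class Session:
    """Hypothetical stand-in for a session that stores a previous response id."""

    service_session_id: str | None = None


def prepare_session(session: Session, input_types: list[str], *, reset_on_replay: bool = True) -> Session:
    """Clear the stored id when the explicit input already carries tool history.

    Sending both a previous response id and the same function-call items is
    what the test docstring describes as the 'Duplicate item' failure.
    """
    replays_tool_history = any(t in TOOL_HISTORY_TYPES for t in input_types)
    if reset_on_replay and replays_tool_history:
        session.service_session_id = None
    return session


replayed = prepare_session(
    Session("resp_PREVIOUS_RUN"),
    ["text", "function_call", "function_result", "text", "text"],
)
continued = prepare_session(Session("resp_PREVIOUS_RUN"), ["text"])
```

With this guard, the replayed input leaves `service_session_id` as `None`, while a plain follow-up keeps `"resp_PREVIOUS_RUN"` so the conversation can continue from the stored response.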
async def test_from_response_preserves_service_session_id() -> None:
    """from_response hands off a prior agent's full conversation to the next executor.
    The receiving executor's service_session_id is preserved so the API can continue
    the conversation using previous_response_id."""
    tool_agent = _ToolHistoryAgent(
        id="tool_agent2", name="ToolAgent", summary_text="Done."
    )
    tool_exec = AgentExecutor(tool_agent, id="tool_agent2")

    spy_agent = _SessionIdCapturingAgent(id="spy_agent2", name="SpyAgent")
    spy_exec = AgentExecutor(spy_agent, id="spy_agent2")
    # Simulate a prior run on the spy executor.
    spy_exec._session.service_session_id = "resp_PREVIOUS_RUN"  # pyright: ignore[reportPrivateUsage]

    wf = (
        WorkflowBuilder(start_executor=tool_exec, output_executors=[spy_exec])
        .add_edge(tool_exec, spy_exec)
        .build()
    )

    result = await wf.run("start")
    assert result.get_outputs() is not None

    assert spy_agent._captured_service_session_id == "resp_PREVIOUS_RUN"  # pyright: ignore[reportPrivateUsage]

@@ -20,7 +20,6 @@ management, enabling persistent conversation history storage across sessions
with Redis as the backend data store.
"""


# Default Redis URL for local Redis Stack.
# Override via the REDIS_URL environment variable for remote or authenticated instances.
REDIS_URL = os.getenv("REDIS_URL", "redis://localhost:6379")

@@ -153,7 +153,7 @@ class Coordinator(Executor):
             # Human approved the draft as-is; forward it unchanged.
             await ctx.send_message(
                 AgentExecutorRequest(
-                    messages=original_request.conversation + [Message("user", text="The draft is approved as-is.")],
+                    messages=[*original_request.conversation, *[Message("user", text="The draft is approved as-is.")]],
                     should_respond=True,
                 ),
                 target_id=self.final_editor_id,
@@ -161,16 +161,15 @@ class Coordinator(Executor):
             return

         # Human provided feedback; prompt the writer to revise.
-        conversation: list[Message] = list(original_request.conversation)
         instruction = (
             "A human reviewer shared the following guidance:\n"
             f"{note or 'No specific guidance provided.'}\n\n"
             "Rewrite the draft from the previous assistant message into a polished final version. "
             "Keep the response under 120 words and reflect any requested tone adjustments."
         )
-        conversation.append(Message("user", text=instruction))
         await ctx.send_message(
-            AgentExecutorRequest(messages=conversation, should_respond=True), target_id=self.writer_id
+            AgentExecutorRequest(messages=[Message("user", text=instruction)], should_respond=True),
+            target_id=self.writer_id,
         )


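The approved-draft branch above replaces list concatenation with unpacking when it builds `messages`. The commit does not state why; one plausible reason (an assumption) is that unpacking accepts any iterable, whereas `+` requires both operands to be lists. The snippet below uses plain strings and a tuple as stand-ins for `Message` objects and `original_request.conversation`.

```python
# A tuple stands in for any non-list Sequence of messages.
conversation = ("draft request", "draft reply")
approval = "The draft is approved as-is."

# Unpacking builds a new list from any iterable and leaves the source untouched.
unpacked = [*conversation, approval]
nested = [*conversation, *[approval]]  # the form used in the hunk above

# Concatenating a tuple with a list raises TypeError.
try:
    conversation + [approval]
except TypeError:
    concat_failed = True
else:
    concat_failed = False
```

Both unpacked forms produce the same three-element list, so `*[approval]` is equivalent to listing `approval` directly.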
@@ -123,7 +123,8 @@ class Coordinator(Executor):
         )
         conversation.append(Message("user", text=instruction))
         await ctx.send_message(
-            AgentExecutorRequest(messages=conversation, should_respond=True), target_id=self.writer_name
+            AgentExecutorRequest(messages=conversation, should_respond=True),
+            target_id=self.writer_name,
         )


@@ -144,7 +144,9 @@ async def run_agent_framework_with_cycle() -> None:
         if last_message and "APPROVED" in last_message.text:
             await context.yield_output("Content approved.")
         else:
-            await context.send_message(AgentExecutorRequest(messages=response.full_conversation, should_respond=True))
+            await context.send_message(
+                AgentExecutorRequest(messages=response.full_conversation, should_respond=True)
+            )

     workflow = (
         WorkflowBuilder(start_executor=researcher)