Add maximal E2E ChatClientAgent + ApprovalRequiredAIFunction repro for #5350

Agent-Logs-Url: https://github.com/microsoft/agent-framework/sessions/baf2578f-bdc5-479b-8ebc-1b198f26c151

Co-authored-by: lokitoth <6936551+lokitoth@users.noreply.github.com>
This commit is contained in:
copilot-swe-agent[bot]
2026-05-19 13:53:09 +00:00
committed by GitHub
Unverified
parent e28682a853
commit 8d503c03aa
2 changed files with 257 additions and 12 deletions
@@ -4,9 +4,12 @@
**Status of repro:** A focused repro test class
`dotnet/tests/Microsoft.Agents.AI.Workflows.UnitTests/ToolApprovalRequestCheckpointReproTests.cs`
was added covering six progressively more end-to-end variants of the path the issue
describes. **All six tests pass** on `main`, i.e. *the bug as described does not
reproduce* at any of the layers exercised here:
was added covering seven progressively more end-to-end variants of the path the issue
describes — including a **maximal** end-to-end test that uses a real
`ChatClientAgent` driven by `FunctionInvokingChatClient` with an
`ApprovalRequiredAIFunction`. **All seven tests pass** on `main`, consistently across
5 back-to-back runs, i.e. *the bug as described does not reproduce* at any of the
layers exercised here:
| # | Test | What it exercises | Result |
|---|-------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------|
@@ -16,6 +19,7 @@ reproduce* at any of the layers exercised here:
| 4 | `Repro_5350_DirectJsonMarshallerRoundtrip_IsDeterministic` | 25× repetition of #1 to rule out flakiness / JIT order | Pass |
| 5 | `Repro_5350_CaptureWireFormat_ForInspection` | Captures the on-the-wire shape so we can compare against the OP's SQL row | Pass |
| 6 | `Repro_5350_EndToEnd_JsonCheckpointResume_PreservesFunctionCallContentAsync` | Full `CheckpointManager.CreateJson(InMemoryJsonStore) → RunStreamingAsync → SuperStep checkpoint → ResumeStreamingAsync` cycle with a `RequestPort<TARC, TARR>` | Pass |
| 7 | `Repro_5350_EndToEnd_ChatClientAgent_WithApprovalRequiredTool_JsonCheckpointResume_PreservesFunctionCallContentAndInvokesToolAsync` | **Maximal**: `ChatClientAgent` over a `MockChatClient` with an `ApprovalRequiredAIFunction`, single-agent `WorkflowBuilder` (no orchestration), `CheckpointManager.CreateJson(InMemoryJsonStore)`. Asserts both that the resumed `RequestInfoEvent.Request` still carries a `FunctionCallContent` AND that approving the request actually invokes the underlying `AIFunction` and lets the workflow continue. | Pass |
This is **consistent with [@lokitoth's second comment](https://github.com/microsoft/agent-framework/issues/5350#issuecomment-4379664401)**:
@@ -71,16 +75,19 @@ the OP's reported scenario, varying one dimension at a time.
### Track A — Reproduce in a configuration closer to the OP's pattern "B"
The OP's repro path differs from the new tests in three concrete ways. Each is a
candidate root cause; the test matrix below isolates them. Each row is "add a
test that reproduces the OP's symptom (`postResume.ToolCall is not FunctionCallContent`)".
The OP's repro path differs from the new tests in three concrete ways. Test #7 (the
maximal repro added to this PR) closes the biggest gap — it uses a real
`ChatClientAgent` + `FunctionInvokingChatClient` + `ApprovalRequiredAIFunction` and
still passes — but the remaining differences are still worth isolating. Each row below
is "add a test that reproduces the OP's symptom (`postResume.ToolCall is not
FunctionCallContent`)".
| Step | Variable that changes vs. the passing tests in this repo | Why it matters | Pass criteria |
|------|--------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------|
| A1 | Use `ChatClientAgent` + `ApprovalRequiredAIFunction` bound into a `WorkflowBuilder` (the GroupChatToolApproval "pattern B" path), with a fake `IChatClient` that emits a single `FunctionCallContent`. | This is the only major piece of the OP's setup that the current tests do not exercise. The TARC in this path is *generated* by `FunctionInvokingChatClient` and flows through the `AIAgentHostExecutor`. | Test fails (TARC.ToolCall comes back as base type) → root cause is in the agent-host/FICC bridge, not the marshaller. |
| A2 | Same as A1 but with the request payload also surfaced as part of a `ChatMessage.Contents`-style transport (whatever the agent host actually serializes through). Inspect the on-the-wire JSON for the inner `toolCall` and look for an absent or differently-named `$type`. | If the TARC is round-tripped as `AIContent`/`ChatMessage` instead of as itself, the polymorphism is two-deep (`AIContent` → TARC, TARC → ToolCall). It is plausible that one branch resolves but the other doesn't. | Find missing `$type` in serialized payload. |
| A3 | Same as the passing test #6 in this file, but with the SQL Server-style store that round-trips the `JsonElement` through `string` and back (e.g. `element.GetRawText()``JsonDocument.Parse`). | The OP uses Dapper + SQL Server. If the column is `nvarchar` and the round-trip preserves ordering, this should be identity-preserving — but if the OP uses `jsonb`-like storage that reorders metadata properties, the `$type` discriminator can be moved out of "first" position, which then requires `AllowOutOfOrderMetadataProperties = true`. Note `JsonMarshaller` only propagates that one flag from the user's `customOptions`. | Test fails when ordering is permuted before deserialization. Confirms the storage layer is reordering metadata. |
| A4 | Same as #6 but with a non-default `JsonSerializerOptions` passed as `customOptions` to `CheckpointManager.CreateJson`, where the user-supplied options do **not** include the polymorphism resolver and `JsonMarshaller`'s `LookupTypeInfo` falls back to them for some type. | `JsonMarshaller.LookupTypeInfo` only goes to the external options when the internal chain doesn't know about the type. For most cases this won't trigger, but it's worth confirming that supplying a custom `JsonSerializerOptions` does not silently displace the internal chain. | Either the external options are never consulted for `AIContent` (good), or there is a sneak path where they are (regression). |
| A1 | ~~Use `ChatClientAgent` + `ApprovalRequiredAIFunction` bound into a `WorkflowBuilder`~~**covered by test #7 in this PR; passes.** | Was the largest gap to the OP's repro. Closed. | N/A — already passing. |
| A2 | Multi-agent variant: same as #7 but with the agent inside `GroupChatBuilder` (the OP's actual orchestration), to rule out a group-chat-specific re-wrap or replay path that drops the TARC.ToolCall type. | If group chat re-encodes TARC as part of `ChatMessage.Contents` (`AIContent` polymorphism is two-deep through `ToolApprovalRequestContent`), one branch may resolve and the other not. | Test fails → root cause is in the group-chat message round-trip, not the marshaller. |
| A3 | Same as #7 but with the JSON `JsonElement` round-tripped through `string` + `JsonDocument.Parse` between commit and retrieve (i.e. emulating the SQL `nvarchar` hop in the OP's Dapper store). | The OP uses Dapper + SQL Server. If the column / driver round-trip preserves ordering, this should be identity-preserving — but if it reorders metadata properties, the `$type` discriminator can be moved out of first position, which then requires `AllowOutOfOrderMetadataProperties = true`. Note `JsonMarshaller` only propagates that one flag from the user's `customOptions`. | Test fails when ordering is permuted before deserialization. Confirms the storage layer is reordering metadata. |
| A4 | Same as #7 but with a non-default `JsonSerializerOptions` passed as `customOptions` to `CheckpointManager.CreateJson`, where the user-supplied options do **not** include the polymorphism resolver and `JsonMarshaller`'s `LookupTypeInfo` falls back to them for some type. | `JsonMarshaller.LookupTypeInfo` only goes to the external options when the internal chain doesn't know about the type. For most cases this won't trigger, but it's worth confirming that supplying a custom `JsonSerializerOptions` does not silently displace the internal chain. | Either the external options are never consulted for `AIContent` (good), or there is a sneak path where they are (regression). |
### Track B — Validate the wire format the OP actually persists
@@ -130,8 +137,14 @@ The combination of:
- the resolver chain in `WorkflowsJsonUtilities.CreateDefaultOptions()` already
putting `AgentAbstractionsJsonUtilities.DefaultOptions.TypeInfoResolver` first
(which itself puts `AIJsonUtilities.DefaultOptions.TypeInfoResolver` first),
- the wire format captured above showing `"$type": "functionCall"` is present, and
- the full `Run → checkpoint → Resume` test in this PR passing,
- the wire format captured above showing `"$type": "functionCall"` is present,
- the full `Run → checkpoint → Resume` test in this PR passing for a plain
`RequestPort<TARC, ToolApprovalResponseContent>` workflow, **and**
- the maximal `ChatClientAgent` + `FunctionInvokingChatClient` +
`ApprovalRequiredAIFunction` test in this PR also passing — including the
assertion that the wrapped `AIFunction` is actually invoked exactly once after
approval and that the workflow then receives the resulting
`FunctionResultContent` and produces a final assistant message,
is sufficient to **disprove** the OP's stated hypothesis. Once Track A or
Track B identifies the actual cause, the issue should be updated with a brief
@@ -2,6 +2,9 @@
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Linq;
using System.Runtime.CompilerServices;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;
@@ -253,4 +256,233 @@ public class ToolApprovalRequestCheckpointReproTests
serialized.Should().NotBeNullOrEmpty();
serialized.Should().Contain(CallId, "the call id should be present in the serialized form");
}
/// <summary>
/// Maximal end-to-end repro for issue #5350 using the same shape as the OP's reported
/// scenario (pattern "B" in the GroupChatToolApproval sample), but with a single
/// <see cref="ChatClientAgent"/> bound directly into a <see cref="WorkflowBuilder"/>
/// (no orchestration), and with a real <see cref="ApprovalRequiredAIFunction"/>-wrapped
/// tool that the agent actually attempts to call. The test:
/// <list type="number">
/// <item>builds a <see cref="ChatClientAgent"/> over a <see cref="MockChatClient"/> that
/// returns a <see cref="FunctionCallContent"/> on the first turn and a final assistant
/// text on the second turn (so <see cref="FunctionInvokingChatClient"/> converts the FCC
/// to a <see cref="ToolApprovalRequestContent"/> and surfaces it as a workflow
/// <see cref="RequestInfoEvent"/>),</item>
/// <item>persists checkpoints via the OP's exact path —
/// <c>CheckpointManager.CreateJson(InMemoryJsonStore)</c> +
/// <c>InProcessExecutionEnvironment.WithCheckpointing(...).RunStreamingAsync(...)</c> —
/// so every checkpoint is round-tripped through the same <see cref="JsonMarshaller"/> +
/// <see cref="PortableValueConverter"/> pipeline the OP's SQL-backed store uses,</item>
/// <item>validates that the first-run <see cref="RequestInfoEvent.Request"/> carries a
/// <see cref="ToolApprovalRequestContent"/> whose <c>ToolCall</c> is a
/// <see cref="FunctionCallContent"/>,</item>
/// <item>disposes the run and resumes from the last <see cref="SuperStepCompletedEvent"/>
/// checkpoint via <see cref="InProcessExecutionEnvironment.ResumeStreamingAsync"/>,</item>
/// <item>validates the re-emitted <see cref="RequestInfoEvent.Request"/> still carries a
/// <see cref="ToolApprovalRequestContent"/> whose <c>ToolCall</c> is a
/// <see cref="FunctionCallContent"/> — this is the assertion the OP claims fails,</item>
/// <item>sends an approval response back into the resumed run and asserts the wrapped
/// function is actually invoked (counter increments) and the workflow completes with a
/// final assistant message.</item>
/// </list>
/// </summary>
[Fact]
public async Task Repro_5350_EndToEnd_ChatClientAgent_WithApprovalRequiredTool_JsonCheckpointResume_PreservesFunctionCallContentAndInvokesToolAsync()
{
// Arrange — counting tool wrapped for approval
int invocationCount = 0;
const string ToolName = "GetWeather";
const string ToolResultText = "Sunny, 22°C";
AIFunction underlyingTool = AIFunctionFactory.Create(
([Description("City to look up")] string city) =>
{
Interlocked.Increment(ref invocationCount);
return ToolResultText;
},
name: ToolName,
description: "Gets the weather for the given city");
ApprovalRequiredAIFunction approvalTool = new(underlyingTool);
// Arrange — mock chat client that turn-1 emits an FCC for the approval-required tool,
// turn-2 (after FunctionInvokingChatClient processes the approval + invokes the tool +
// appends a FunctionResultContent) emits a final assistant text.
const string ToolCallId = "call-1";
const string FinalAssistantText = "The weather in Amsterdam is sunny and 22°C.";
int chatCallIndex = 0;
List<List<ChatMessage>> capturedInputs = new();
MockChatClient mockChatClient = new((messages, options) =>
{
// Capture a snapshot of the inputs the agent passed in for this service call so the
// test can later assert the FunctionResultContent flowed back to the model.
capturedInputs.Add(new List<ChatMessage>(messages));
int index = Interlocked.Increment(ref chatCallIndex) - 1;
return index switch
{
0 => new ChatResponse(new ChatMessage(ChatRole.Assistant,
[new FunctionCallContent(
callId: ToolCallId,
name: ToolName,
arguments: new Dictionary<string, object?> { ["city"] = "Amsterdam" })])),
_ => new ChatResponse(new ChatMessage(ChatRole.Assistant, FinalAssistantText)),
};
});
ChatClientAgent agent = new(
mockChatClient,
instructions: "You are a weather agent.",
name: "WeatherAgent",
tools: [approvalTool]);
// Arrange — single-agent workflow. The AIAgent is auto-promoted to an ExecutorBinding
// via the implicit operator on ExecutorBinding.
Workflow workflow = new WorkflowBuilder(agent).Build();
// Arrange — JSON checkpoint manager backed by an in-memory JSON store. This mirrors
// the OP's "JsonCheckpointStore-backed CheckpointManager.CreateJson(...)" path —
// every checkpoint is round-tripped through the same JsonMarshaller +
// PortableValueConverter that the OP's SQL-backed store uses, just without the
// disk/SQL hop.
CheckpointManager checkpointManager = CheckpointManager.CreateJson(new InMemoryJsonStore());
InProcessExecutionEnvironment env = InProcessExecution.OffThread;
List<ChatMessage> inputMessages = [new(ChatRole.User, "What's the weather in Amsterdam?")];
// Act 1 — run until we see the approval request, then capture the latest checkpoint.
ExternalRequest? firstRunRequest = null;
CheckpointInfo? checkpoint = null;
await using (StreamingRun firstRun = await env.WithCheckpointing(checkpointManager)
.RunStreamingAsync(workflow, inputMessages))
{
// Trigger an actual turn — without a TurnToken the AIAgentHostExecutor will not
// invoke the agent. This matches the GroupChatToolApproval sample and the
// StreamAsyncWithTurnTokenShouldExecuteWorkflow pattern in InProcessExecutionTests.
(await firstRun.TrySendMessageAsync(new TurnToken(emitEvents: false)))
.Should().BeTrue("the workflow should accept a TurnToken");
using CancellationTokenSource cts = new(TimeSpan.FromSeconds(30));
await foreach (WorkflowEvent evt in firstRun.WatchStreamAsync(blockOnPendingRequest: false, cts.Token))
{
if (evt is RequestInfoEvent requestInfo)
{
firstRunRequest ??= requestInfo.Request;
}
if (evt is SuperStepCompletedEvent step && step.CompletionInfo?.Checkpoint is { } cp)
{
checkpoint = cp;
}
}
}
firstRunRequest.Should().NotBeNull(
"the ChatClientAgent + FICC pipeline should have surfaced the approval request as a workflow RequestInfoEvent");
checkpoint.Should().NotBeNull(
"a checkpoint should have been produced while the approval request was pending");
chatCallIndex.Should().Be(1, "the mock chat client should have been called exactly once before the approval was requested");
invocationCount.Should().Be(0, "the underlying tool must NOT have been invoked before the approval was granted");
ToolApprovalRequestContent? preCheckpoint = firstRunRequest!.Data.As<ToolApprovalRequestContent>();
preCheckpoint.Should().NotBeNull("the pending external request should carry a ToolApprovalRequestContent payload");
preCheckpoint!.ToolCall.Should().BeOfType<FunctionCallContent>(
"the pre-checkpoint pending request payload must already be a FunctionCallContent");
// Act 2 — resume from the checkpoint with a brand-new env / handle so that any
// in-process AIAgentHostExecutor instance state is gone and everything has to be
// rehydrated from the on-disk JSON.
ExternalRequest? resumedRequest = null;
List<WorkflowEvent> postResumeEvents = [];
await using (StreamingRun resumed = await env.WithCheckpointing(checkpointManager)
.ResumeStreamingAsync(workflow, checkpoint!))
{
using CancellationTokenSource cts = new(TimeSpan.FromSeconds(30));
// First pass: see the re-emitted RequestInfoEvent, but don't block on it.
await foreach (WorkflowEvent evt in resumed.WatchStreamAsync(blockOnPendingRequest: false, cts.Token))
{
if (evt is RequestInfoEvent requestInfo)
{
resumedRequest ??= requestInfo.Request;
}
}
resumedRequest.Should().NotBeNull(
"the resumed workflow should re-emit the pending approval RequestInfoEvent");
// The core issue #5350 assertion.
ToolApprovalRequestContent? postResume = resumedRequest!.Data.As<ToolApprovalRequestContent>();
postResume.Should().NotBeNull(
"ExternalRequest.Data.As<ToolApprovalRequestContent>() should materialize the payload after a JSON-file checkpoint resume");
postResume!.ToolCall.Should().NotBeNull("the resumed TARC must carry its ToolCall");
postResume.ToolCall.Should().BeOfType<FunctionCallContent>(
"after CheckpointManager.CreateJson(InMemoryJsonStore) round-trip via " +
"ResumeStreamingAsync, ToolApprovalRequestContent.ToolCall must still be a " +
"FunctionCallContent so that FunctionInvokingChatClient's pattern match " +
"(`tarc.ToolCall is FunctionCallContent`) continues to fire (issue #5350).");
FunctionCallContent resumedFcc = (FunctionCallContent)postResume.ToolCall;
resumedFcc.Name.Should().Be(ToolName);
resumedFcc.CallId.Should().EndWith(ToolCallId,
"the workflow rewrites the CallId with an executor-scoped prefix, but should preserve the original tail");
// Act 3 — send the approval response back into the resumed run and watch the
// remaining stream. This drives FunctionInvokingChatClient through its
// post-approval branch, where it must invoke the underlying AIFunction, append a
// FunctionResultContent, and call the model a second time.
ToolApprovalResponseContent approvalResponse = postResume.CreateResponse(approved: true);
await resumed.SendResponseAsync(resumedRequest.CreateResponse(approvalResponse));
using CancellationTokenSource cts2 = new(TimeSpan.FromSeconds(30));
await foreach (WorkflowEvent evt in resumed.WatchStreamAsync(blockOnPendingRequest: false, cts2.Token))
{
postResumeEvents.Add(evt);
}
}
// Assert 3 — the tool actually got called as part of the approval round-trip, and
// the workflow continued without raising errors.
invocationCount.Should().Be(1,
"approving the request should cause FunctionInvokingChatClient to invoke the wrapped AIFunction exactly once");
chatCallIndex.Should().Be(2,
"after the tool was invoked, FunctionInvokingChatClient should have made a second chat-client call to produce the final assistant message");
capturedInputs.Should().HaveCount(2);
capturedInputs[1].Should().Contain(
m => m.Contents.OfType<FunctionResultContent>().Any(),
"the second chat-client call must include the FunctionResultContent produced by the approved tool invocation");
postResumeEvents.OfType<WorkflowErrorEvent>().Should().BeEmpty(
"no workflow errors should be raised when responding to the resumed approval request");
postResumeEvents.OfType<ExecutorFailedEvent>().Should().BeEmpty(
"no executor failures should be raised when responding to the resumed approval request");
}
/// <summary>
/// Minimal <see cref="IChatClient"/> stub for repro tests; delegates each call to a caller-supplied factory.
/// </summary>
private sealed class MockChatClient(Func<IEnumerable<ChatMessage>, ChatOptions?, ChatResponse> responseFactory) : IChatClient
{
public Task<ChatResponse> GetResponseAsync(IEnumerable<ChatMessage> messages, ChatOptions? options = null, CancellationToken cancellationToken = default)
=> Task.FromResult(responseFactory(messages, options));
public async IAsyncEnumerable<ChatResponseUpdate> GetStreamingResponseAsync(
IEnumerable<ChatMessage> messages,
ChatOptions? options = null,
[EnumeratorCancellation] CancellationToken cancellationToken = default)
{
ChatResponse response = await this.GetResponseAsync(messages, options, cancellationToken).ConfigureAwait(false);
foreach (ChatResponseUpdate update in response.ToChatResponseUpdates())
{
yield return update;
}
}
public object? GetService(Type serviceType, object? serviceKey = null) => null;
public void Dispose() { }
}
}