mirror of
https://github.com/microsoft/agent-framework.git
synced 2026-06-16 21:04:09 +08:00
c06af9a1b3
* Add dotnet integration test report to CI
- Add --report-junit flag to dotnet integration test step to generate
JUnit XML alongside TRX, with explicit --results-directory to
centralize output in IntegrationTestResults/
- Upload JUnit XML artifacts from each matrix leg (net10.0/ubuntu,
net472/windows) as dotnet-test-results-{framework}-{os}
- Add dotnet-integration-test-report job that downloads artifacts,
runs the existing aggregate.py script, posts markdown to Job Summary,
and saves trend history via actions/cache
- Refactor aggregate.py to discover JUnit XML files recursively,
supporting both pytest (pytest.xml) and xunit (*.junit.xml) layouts
- Handle provider name derivation for dotnet artifact naming convention
- Fix nodeid collision when same test runs under multiple frameworks
by qualifying keys with provider when collisions are detected
- Improve module extraction for dotnet C# classnames (recognizes
IntegrationTests/UnitTests namespace segments)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* chore: trigger dotnet CI for report validation
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix: use .junit extension (not .junit.xml) for xunit v3 output
xUnit v3 generates files with .junit extension, not .junit.xml.
Update upload glob and aggregate.py discovery to match.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix: use deterministic provider-qualified keys for dotnet tests
Always prefix dotnet test keys with provider (e.g. net10.0 (ubuntu)::TestName)
to ensure stable, comparable counts across runs regardless of file parse order.
Also show Executed (passed+failed) instead of Total in summary table.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix: match Python report summary format (Total, passed/total, etc.)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* feat: split dotnet report into per-framework tables
Dotnet tests run on multiple frameworks (net10.0, net472). Instead of
one combined table with unstable totals, show separate sections per
framework — each with its own summary row and per-test table. Python
reports retain the original single-table format.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Re-enable 7 flaky dotnet integration tests with increased timeouts
Increase timeouts to reduce timing-related flakiness in LLM-backed
integration tests (issue #4971):
- ExternalClientTests: 60s -> 120s default timeout
- SamplesValidationBase: 60s -> 120s default timeout
- ConsoleAppSamplesValidation: 90s -> 150s for long-running tests
- AzureFunctions SamplesValidation: 2min -> 3min orchestration timeout,
60s -> 90s per-step WaitForConditionAsync timeouts
Remove all Skip=Flaky annotations and unused SkipFlakyTimingTest constants.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Re-skip LLM non-determinism flaky tests, keep timeout fixes
Re-skip SingleAgentOrchestrationHITLSampleValidationAsync and
LongRunningToolsSampleValidationAsync - these fail due to LLM producing
extra review notifications, not timeouts. Updated skip reasons to
accurately describe the root cause. Reverted unnecessary timeout change
on the skipped LongRunningTools test.
The remaining 5 re-enabled tests with timeout increases are stable.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Enable Anthropic integration tests in CI
Replace hardcoded skip with conditional skip pattern (matching
CopilotStudio approach): tests gracefully skip when ANTHROPIC_API_KEY
is missing, and run when present.
Changes:
- AnthropicChatCompletionFixture: try/catch in InitializeAsync with
Assert.Skip on missing config (replaces hardcoded SkipReason)
- AnthropicSkillsIntegrationTests: same pattern per test method
- dotnet-build-and-test.yml: wire up ANTHROPIC_API_KEY,
ANTHROPIC_CHAT_MODEL_NAME, and ANTHROPIC_REASONING_MODEL_NAME
env vars to the integration test step
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Fix missing System using in AnthropicSkillsIntegrationTests
Add 'using System;' for InvalidOperationException in try/catch blocks.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Skip flaky SingleAgentOrchestrationChainingSampleValidationAsync
LLM non-determinism causes Assert.NotNull failures on orchestration
results. Skip until test logic is hardened against non-deterministic
LLM responses.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Re-enable HITL and LongRunningTools tests with timeout and flexibility fixes
- Remove Skip attribute from SingleAgentOrchestrationHITLSampleValidationAsync
- Remove Skip attribute from LongRunningToolsSampleValidationAsync
- Increase timeout from 120s/90s to 180s to accommodate 2+ LLM round-trips
- Replace rigid 2-cycle assertion with flexible approval logic that handles
extra review cycles from LLM non-determinism
Fixes the two failure modes identified in #4971:
1. Timeout: 120s/90s was insufficient for multiple LLM calls under CI load
2. Extra notifications: Assert.Fail on 3rd+ review cycle was too rigid
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Increase AzureFunctions LongRunningTools test timeouts from 90s to 180s
The LongRunningToolsSampleValidationAsync test in the AzureFunctions integration
tests was failing in CI with TimeoutException at the 'Content published
notification is logged' step. The 90-second timeouts are too tight for CI
environments where LLM calls and orchestration overhead can be slow.
Increased all three WaitForConditionAsync timeouts from 90s to 180s:
- Waiting for human feedback notification
- Waiting for publish notification (the step that was failing)
- Waiting for orchestration completion
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Merge main and fix dotnet report path after flaky_report rename
Merge upstream/main which renamed scripts/flaky_report/ to
scripts/integration_test_report/ (from Python PR #5454). Update the
dotnet-build-and-test workflow to reference the new path.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Add RetryFact to DurableTask and AzureFunctions integration tests
These tests interact with LLMs via stdin/stdout (DurableTask) or HTTP
(AzureFunctions) and are inherently non-deterministic. Unlike the Python
side which uses pytest-retry, the dotnet tests had no retry mechanism
and a single transient failure would fail the entire CI run.
Changes:
- Switch [Fact] to [RetryFact(2, 5000)] on all LLM-dependent tests
across ConsoleAppSamplesValidation, ExternalClientTests,
WorkflowConsoleAppSamplesValidation, and AzureFunctions SamplesValidation
- Add re-prompt mechanism to LongRunningToolsSampleValidationAsync:
if the LLM doesn't invoke the tool within 60s, re-send the prompt
(up to 2 retries) instead of burning the full timeout
- Reduce LongRunningTools timeout from 240s to 180s (re-prompt makes
the extra buffer unnecessary)
- Leave simple/deterministic tests as [Fact] (SingleAgent, unit tests)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Add persist-credentials: false to Integration Test Report checkout step
Matches the convention used by other checkout steps in this workflow
to avoid leaving GITHUB_TOKEN credentials in the local git config.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* small fixes
* disable anthropic failing tests
---------
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
237 lines
10 KiB
C#
237 lines
10 KiB
C#
// Copyright (c) Microsoft. All rights reserved.
|
|
|
|
using System.ComponentModel;
|
|
using System.Diagnostics;
|
|
using System.Reflection;
|
|
using Microsoft.Agents.AI.DurableTask.IntegrationTests.Logging;
|
|
using Microsoft.DurableTask;
|
|
using Microsoft.DurableTask.Client;
|
|
using Microsoft.Extensions.AI;
|
|
using Microsoft.Extensions.Configuration;
|
|
using OpenAI.Chat;
|
|
|
|
namespace Microsoft.Agents.AI.DurableTask.IntegrationTests;
|
|
|
|
/// <summary>
|
|
/// Tests for scenarios where an external client interacts with Durable Task Agents.
|
|
/// </summary>
|
|
[Collection("Sequential")]
|
|
[Trait("Category", "Integration")]
|
|
public sealed class ExternalClientTests(ITestOutputHelper outputHelper) : IDisposable
|
|
{
|
|
private static readonly TimeSpan s_defaultTimeout = Debugger.IsAttached
|
|
? TimeSpan.FromMinutes(5)
|
|
: TimeSpan.FromSeconds(120);
|
|
|
|
private static readonly IConfiguration s_configuration =
|
|
new ConfigurationBuilder()
|
|
.AddUserSecrets(Assembly.GetExecutingAssembly())
|
|
.AddEnvironmentVariables()
|
|
.Build();
|
|
|
|
private readonly ITestOutputHelper _outputHelper = outputHelper;
|
|
private readonly CancellationTokenSource _cts = new(delay: s_defaultTimeout);
|
|
|
|
private CancellationToken TestTimeoutToken => this._cts.Token;
|
|
|
|
public void Dispose() => this._cts.Dispose();
|
|
|
|
[RetryFact(2, 5000)]
|
|
public async Task SimplePromptAsync()
|
|
{
|
|
// Setup
|
|
AIAgent simpleAgent = TestHelper.GetAzureOpenAIChatClient(s_configuration).AsAIAgent(
|
|
instructions: "You are a helpful assistant that always responds with a friendly greeting.",
|
|
name: "TestAgent");
|
|
|
|
using TestHelper testHelper = TestHelper.Start([simpleAgent], this._outputHelper);
|
|
|
|
// A proxy agent is needed to call the hosted test agent
|
|
AIAgent simpleAgentProxy = simpleAgent.AsDurableAgentProxy(testHelper.Services);
|
|
|
|
// Act: send a prompt to the agent and wait for a response
|
|
AgentSession session = await simpleAgentProxy.CreateSessionAsync(this.TestTimeoutToken);
|
|
await simpleAgentProxy.RunAsync(
|
|
message: "Hello!",
|
|
session,
|
|
cancellationToken: this.TestTimeoutToken);
|
|
|
|
AgentResponse response = await simpleAgentProxy.RunAsync(
|
|
message: "Repeat what you just said but say it like a pirate",
|
|
session,
|
|
cancellationToken: this.TestTimeoutToken);
|
|
|
|
// Assert: verify the agent responded appropriately
|
|
// We can't predict the exact response, but we can check that there is one response
|
|
Assert.NotNull(response);
|
|
Assert.NotEmpty(response.Text);
|
|
|
|
// Assert: verify the expected log entries were created in the expected category
|
|
IReadOnlyCollection<LogEntry> logs = testHelper.GetLogs();
|
|
Assert.NotEmpty(logs);
|
|
List<LogEntry> agentLogs = [.. logs.Where(log => log.Category.Contains(simpleAgent.Name!)).ToList()];
|
|
Assert.NotEmpty(agentLogs);
|
|
Assert.Contains(agentLogs, log => log.EventId.Name == "LogAgentRequest" && log.Message.Contains("Hello!"));
|
|
Assert.Contains(agentLogs, log => log.EventId.Name == "LogAgentResponse");
|
|
}
|
|
|
|
[RetryFact(2, 5000)]
|
|
public async Task CallFunctionToolsAsync()
|
|
{
|
|
int weatherToolInvocationCount = 0;
|
|
int packingListToolInvocationCount = 0;
|
|
|
|
string GetWeather(string location)
|
|
{
|
|
weatherToolInvocationCount++;
|
|
return $"The weather in {location} is sunny with a high of 75°F and a low of 55°F.";
|
|
}
|
|
|
|
string SuggestPackingList(string weather, bool isSunny)
|
|
{
|
|
packingListToolInvocationCount++;
|
|
return isSunny ? "Pack sunglasses and sunscreen." : "Pack a raincoat and umbrella.";
|
|
}
|
|
|
|
AIAgent tripPlanningAgent = TestHelper.GetAzureOpenAIChatClient(s_configuration).AsAIAgent(
|
|
instructions: "You are a trip planning assistant. Use the weather tool and packing list tool as needed.",
|
|
name: "TripPlanningAgent",
|
|
description: "An agent to help plan your day trips",
|
|
tools: [AIFunctionFactory.Create(GetWeather), AIFunctionFactory.Create(SuggestPackingList)]
|
|
);
|
|
|
|
using TestHelper testHelper = TestHelper.Start([tripPlanningAgent], this._outputHelper);
|
|
AIAgent tripPlanningAgentProxy = tripPlanningAgent.AsDurableAgentProxy(testHelper.Services);
|
|
|
|
// Act: send a prompt to the agent
|
|
AgentResponse response = await tripPlanningAgentProxy.RunAsync(
|
|
message: "Help me figure out what to pack for my Seattle trip next Sunday",
|
|
cancellationToken: this.TestTimeoutToken);
|
|
|
|
// Assert: verify the agent responded appropriately
|
|
// We can't predict the exact response, but we can check that there is one response
|
|
Assert.NotNull(response);
|
|
Assert.NotEmpty(response.Text);
|
|
|
|
// Assert: verify the expected log entries were created in the expected category
|
|
IReadOnlyCollection<LogEntry> logs = testHelper.GetLogs();
|
|
Assert.NotEmpty(logs);
|
|
|
|
List<LogEntry> agentLogs = [.. logs.Where(log => log.Category.Contains(tripPlanningAgent.Name!)).ToList()];
|
|
Assert.NotEmpty(agentLogs);
|
|
Assert.Contains(agentLogs, log => log.EventId.Name == "LogAgentRequest" && log.Message.Contains("Seattle trip"));
|
|
Assert.Contains(agentLogs, log => log.EventId.Name == "LogAgentResponse");
|
|
|
|
// Assert: verify the tools were called
|
|
Assert.Equal(1, weatherToolInvocationCount);
|
|
Assert.Equal(1, packingListToolInvocationCount);
|
|
}
|
|
|
|
[RetryFact(2, 5000)]
|
|
public async Task CallLongRunningFunctionToolsAsync()
|
|
{
|
|
[Description("Starts a greeting workflow and returns the workflow instance ID")]
|
|
string StartWorkflowTool(string name)
|
|
{
|
|
return DurableAgentContext.Current.ScheduleNewOrchestration(nameof(RunWorkflowAsync), input: name);
|
|
}
|
|
|
|
[Description("Gets the current status of a previously started workflow. A null response means the workflow has not started yet.")]
|
|
static async Task<OrchestrationMetadata?> GetWorkflowStatusToolAsync(string instanceId)
|
|
{
|
|
OrchestrationMetadata? status = await DurableAgentContext.Current.GetOrchestrationStatusAsync(
|
|
instanceId,
|
|
includeDetails: true);
|
|
if (status == null)
|
|
{
|
|
// If the status is not found, wait a bit before returning null to give the workflow time to start
|
|
await Task.Delay(TimeSpan.FromSeconds(1));
|
|
}
|
|
|
|
return status;
|
|
}
|
|
|
|
async Task<string> RunWorkflowAsync(TaskOrchestrationContext context, string name)
|
|
{
|
|
// 1. Get agent and create a session
|
|
DurableAIAgent agent = context.GetAgent("SimpleAgent");
|
|
AgentSession session = await agent.CreateSessionAsync(this.TestTimeoutToken);
|
|
|
|
// 2. Call an agent and tell it my name
|
|
await agent.RunAsync($"My name is {name}.", session);
|
|
|
|
// 3. Call the agent again with the same session (ask it to tell me my name)
|
|
AgentResponse response = await agent.RunAsync("What is my name?", session);
|
|
|
|
return response.Text;
|
|
}
|
|
|
|
using TestHelper testHelper = TestHelper.Start(
|
|
this._outputHelper,
|
|
configureAgents: agents =>
|
|
{
|
|
// This is the agent that will be used to start the workflow
|
|
agents.AddAIAgentFactory(
|
|
"WorkflowAgent",
|
|
sp => TestHelper.GetAzureOpenAIChatClient(s_configuration).AsAIAgent(
|
|
name: "WorkflowAgent",
|
|
instructions: "You can start greeting workflows and check their status.",
|
|
services: sp,
|
|
tools: [
|
|
AIFunctionFactory.Create(StartWorkflowTool),
|
|
AIFunctionFactory.Create(GetWorkflowStatusToolAsync)
|
|
]));
|
|
|
|
// This is the agent that will be called by the workflow
|
|
agents.AddAIAgent(TestHelper.GetAzureOpenAIChatClient(s_configuration).AsAIAgent(
|
|
name: "SimpleAgent",
|
|
instructions: "You are a simple assistant."
|
|
));
|
|
},
|
|
durableTaskRegistry: registry => registry.AddOrchestratorFunc<string, string>(nameof(RunWorkflowAsync), RunWorkflowAsync));
|
|
|
|
AIAgent workflowManagerAgentProxy = testHelper.Services.GetDurableAgentProxy("WorkflowAgent");
|
|
|
|
// Act: send a prompt to the agent
|
|
AgentSession session = await workflowManagerAgentProxy.CreateSessionAsync(this.TestTimeoutToken);
|
|
await workflowManagerAgentProxy.RunAsync(
|
|
message: "Start a greeting workflow for \"John Doe\".",
|
|
session,
|
|
cancellationToken: this.TestTimeoutToken);
|
|
|
|
// Act: prompt it again to wait for the workflow to complete
|
|
AgentResponse response = await workflowManagerAgentProxy.RunAsync(
|
|
message: "Wait for the workflow to complete and tell me the result.",
|
|
session,
|
|
cancellationToken: this.TestTimeoutToken);
|
|
|
|
// Assert: verify the agent responded appropriately
|
|
// We can't predict the exact response, but we can check that there is one response
|
|
Assert.NotNull(response);
|
|
Assert.NotEmpty(response.Text);
|
|
Assert.Contains("John Doe", response.Text);
|
|
}
|
|
|
|
[Fact]
|
|
public void AsDurableAgentProxy_ThrowsWhenAgentNotRegistered()
|
|
{
|
|
// Setup: Register one agent but try to use a different one
|
|
AIAgent registeredAgent = TestHelper.GetAzureOpenAIChatClient(s_configuration).AsAIAgent(
|
|
instructions: "You are a helpful assistant.",
|
|
name: "RegisteredAgent");
|
|
|
|
using TestHelper testHelper = TestHelper.Start([registeredAgent], this._outputHelper);
|
|
|
|
// Create an agent with a different name that isn't registered
|
|
AIAgent unregisteredAgent = TestHelper.GetAzureOpenAIChatClient(s_configuration).AsAIAgent(
|
|
instructions: "You are a helpful assistant.",
|
|
name: "UnregisteredAgent");
|
|
|
|
// Act & Assert: Should throw AgentNotRegisteredException
|
|
AgentNotRegisteredException exception = Assert.Throws<AgentNotRegisteredException>(
|
|
() => unregisteredAgent.AsDurableAgentProxy(testHelper.Services));
|
|
|
|
Assert.Equal("UnregisteredAgent", exception.AgentName);
|
|
}
|
|
}
|