agent-framework

Workflow improvement (#6025)

Evan Mattson · 2026-05-22 15:56:32 +09:00

c82c0133fc

ci: pin third-party GitHub Actions to commit SHAs (#5972)

Replaces every floating tag in our workflow and composite action files
with an immutable 40-character commit SHA, keeping the original `# vX`
comment so Dependabot can still propose version bumps. 186 occurrences
across 25 workflows and 2 composite actions.
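
A minimal sketch of how such pinning could be verified, assuming workflow files are scanned as plain text. The function name `unpinned_actions` and the regular expressions are illustrative and not part of this commit:

```python
import re

# Matches `uses: owner/repo[/path]@<ref>` with an optional trailing `# vX` comment.
USES_RE = re.compile(
    r"^\s*(?:-\s*)?uses:\s*(?P<action>[^@\s]+)@(?P<ref>\S+)(?:\s+#\s*(?P<comment>.*))?$"
)
SHA_RE = re.compile(r"^[0-9a-f]{40}$")


def unpinned_actions(workflow_text: str) -> list[str]:
    """Return `action@ref` strings whose ref is not a 40-character commit SHA."""
    findings = []
    for line in workflow_text.splitlines():
        match = USES_RE.match(line)
        if match is None:
            # Local actions such as `./.github/actions/python-setup` carry no `@ref`.
            continue
        if not SHA_RE.match(match["ref"]):
            findings.append(f"{match['action']}@{match['ref']}")
    return findings
```

A floating tag such as `actions/checkout@v4` would be reported, while a line pinned to a full SHA with a trailing `# v4` comment would pass.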

Also widens the github-actions Dependabot entry to use the plural
`directories` key with `/.github/actions/*` so composite actions under
`.github/actions/<name>/action.yml` are kept up to date. Previously
Dependabot only scanned `.github/workflows` and the repo-root
`action.yml`, leaving our `python-setup` and `sample-validation-setup`
composite actions unmaintained.

Roger Barreto · 2026-05-20 22:10:32 +00:00

01a3c5be8a

Python: Bump Python package versions for a release (#5964)

* Bump Python package versions to 1.5.0 for a release

* Promote orchestrations to 1.0.0rc1

* ci(python-setup): merge dynamic exclude into existing workspace exclude

The python-setup action injected exclude = [...] verbatim into
[tool.uv.workspace], producing a duplicate 'exclude' key when the
section already had a static exclude. Scope the rewrite to the
[tool.uv.workspace] section and append the package to the existing
array when present; idempotent if the package is already excluded.
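
The scoped, idempotent rewrite described above can be sketched roughly as follows. This is not the action's actual script: `add_workspace_exclude` is a hypothetical helper, and the regex-based parsing assumes a single-level `exclude` array whose entries contain no `]` characters.

```python
import re

# The [tool.uv.workspace] section: header line up to the next table header or EOF.
SECTION_RE = re.compile(r"(?ms)^\[tool\.uv\.workspace\]\n(?P<body>.*?)(?=^\[|\Z)")
EXCLUDE_RE = re.compile(r"(?ms)^exclude\s*=\s*\[(?P<items>.*?)\]")


def add_workspace_exclude(pyproject_text: str, package: str) -> str:
    """Add `package` to `exclude` inside [tool.uv.workspace] without duplicating the key."""
    section = SECTION_RE.search(pyproject_text)
    if section is None:
        return pyproject_text  # no workspace section to scope the rewrite to
    body = section["body"]
    entry = f'"{package}"'
    existing = EXCLUDE_RE.search(body)
    if existing is None:
        # No static exclude yet: create the key once.
        new_body = f"exclude = [{entry}]\n" + body
    elif entry in existing["items"]:
        return pyproject_text  # already excluded, so the rewrite is a no-op
    else:
        # Append to the existing array instead of injecting a second `exclude` key.
        items = existing["items"].rstrip().rstrip(",")
        sep = ", " if items.strip() else ""
        new_body = (
            body[: existing.start("items")]
            + f"{items}{sep}{entry}"
            + body[existing.end("items"):]
        )
    return (
        pyproject_text[: section.start("body")]
        + new_body
        + pyproject_text[section.end("body"):]
    )
```

Running the function twice with the same package returns the same text, which is the idempotence property the commit calls out.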

* Address Copilot review feedback: raise inter-package floors to 1.5.0

- foundry, foundry-local: agent-framework-openai >=1.4.0 -> >=1.5.0
- azure-contentunderstanding: agent-framework-foundry >=1.4.0 -> >=1.5.0
- azurefunctions: pin agent-framework-durabletask to >=1.0.0b260519,<2

Keeps lockstep cohort consistent and avoids mixed 1.4.x / 1.5.0 installs.

* Re-include azurefunctions and durabletask in the uv workspace

The pinned durabletask>=1.4.0 floor is enough to make resolution succeed;
the workspace exclude was an over-correction and broke CI samples and pyright
type-checking (re-exports in agent_framework/azure/__init__.pyi plus
samples/04-hosting/{azure_functions,durabletask}/ could not resolve their
imports). Dropping them from agent-framework-core[all] still stands so the
metapackage does not pull them.

* Restore azurefunctions and durabletask in agent-framework-core[all]

The durabletask floor pin keeps users on the safe 1.4.0, so they are once
again included in the metapackage. Update CHANGELOG to reflect the pin
rather than an [all] removal.

* Raise uvicorn ceiling in ag-ui and devui to allow 0.42+

The root override-dependencies pins uvicorn[standard]>=0.34.0 (no upper bound)
and the workspace lock resolves to 0.47.0. The package ceiling <0.42.0
meant the workspace was no longer testing the declared supported range.
Bump to <1 so the lock fits within the declared bounds.

Also picked up by validate-dependency-bounds: refresh stale orchestrations
RC pin in devui dev deps.

Evan Mattson · 2026-05-20 09:20:53 +09:00

4b0522d62d

ci(python-setup): drop -U upgrade flag from uv sync (#5961)

The shared composite action ran `uv sync --all-packages --all-extras
--dev -U` on every job, which upgrades every dependency to the latest
compatible version instead of using the pinned versions in `uv.lock`.

That is currently producing a hard resolver failure on every CI job:

    No solution found when resolving dependencies for split
    (markers: python_full_version >= '3.11' and sys_platform == 'darwin')
    Because there are no versions of durabletask and
    agent-framework-durabletask depends on durabletask>=1.3.0,<2,
    we can conclude that agent-framework-durabletask's requirements
    are unsatisfiable.

Dropping `-U` makes the install use the workspace lockfile, which is
what is reproducible locally and what we publish releases against.
Upgrades should be opt-in (via a scheduled job or a separate workflow)
rather than implicit on every CI run.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Eduard van Valkenburg · 2026-05-19 19:33:11 +00:00

8636c70ddf

Triage improvements (#5880)

Evan Mattson · 2026-05-15 10:49:46 +09:00

97eaef029e

Replace merge-gatekeeper Docker action with github-script polling (#5533)

The upsidr/merge-gatekeeper@v1 action is a Dockerfile-based action that
builds a golang image on every run. On merge_group events the run step
is conditioned out via `if: github.event_name == 'pull_request'`, so the
build happens but produces nothing.

Replace with an actions/github-script@v8 polling loop that mirrors the
action's behavior exactly: merges combined-statuses and check-runs for
the PR head SHA, with combined-status winning on name collisions, and
the same conclusion mapping (skipped → dropped, success/neutral →
success, anything else terminal → error). Same job name, triggers,
permissions, timeout (3600s), interval (30s), and ignored list, so
existing required-check rules stay valid.

PR runs now poll the API in seconds instead of waiting on a per-run
docker image build, and merge_group runs become near-instant no-ops.
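
The merge and mapping rules above can be sketched in Python as a pure function over already-fetched API payloads. The actual implementation is a github-script (JavaScript) loop. The function names here are hypothetical, and the handling of commit-status states (`success` and `pending` kept, everything else treated as an error) is an assumption not spelled out in the commit message.

```python
def merge_checks(statuses: list[dict], check_runs: list[dict], ignored: set[str]) -> dict[str, str]:
    """Merge check-runs and combined statuses into one name -> state mapping."""
    merged: dict[str, str] = {}
    for run in check_runs:
        name = run["name"]
        if name in ignored:
            continue
        if run["status"] != "completed":
            merged[name] = "pending"  # not terminal yet, keep polling
        elif run["conclusion"] == "skipped":
            continue  # skipped runs are dropped entirely
        elif run["conclusion"] in ("success", "neutral"):
            merged[name] = "success"
        else:
            merged[name] = "error"  # any other terminal conclusion
    # Statuses are applied second so they win on name collisions.
    for status in statuses:
        name = status["context"]
        if name in ignored:
            continue
        state = status["state"]
        merged[name] = state if state in ("success", "pending") else "error"
    return merged


def overall_state(merged: dict[str, str]) -> str:
    """Collapse per-check states into a single gate decision."""
    states = set(merged.values())
    if "error" in states:
        return "error"
    if "pending" in states:
        return "pending"
    return "success"
```

A polling loop would call these every 30 seconds until `overall_state` stops returning `pending` or the 3600-second timeout elapses.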

Evan Mattson · 2026-05-13 05:45:51 +00:00

9a301b8d4b

.NET: CI hardening — split Functions tests, re-enable skipped integration tests (#5717)

* Split DurableTask/AzureFunctions integration tests into dedicated CI job

- Add -TestProjectNameExclude parameter to New-FilteredSolution.ps1
- Add 'functions' and 'core' path filters to paths-filter job
- Exclude DurableTask/AzureFunctions from main dotnet-test job
- Remove emulator setup from dotnet-test (no longer needed)
- Add new dotnet-test-functions job (ubuntu/net10.0 only, path-conditional)
- Update merge gate and report job to include dotnet-test-functions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address PR feedback: add Workflows.Generators to core filter, drop dotnetChanges gate from functions job

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Re-enable Anthropic integration tests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Upgrade Anthropic SDK 12.13.0 -> 12.20.0 to fix M.E.AI incompatibility

Fixes MissingMethodException on WebSearchToolResultContent.get_Results()
caused by Anthropic 12.13.0 being compiled against an older
Microsoft.Extensions.AI.Abstractions version.

Suppress RT0003 in AI.Abstractions.csproj as the transitive reference
from the upgraded Anthropic SDK conflicts with the explicit one.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix Anthropic unit test mocks for SDK 12.20.0 interface changes

Add missing interface members: IAnthropicClient.WebhookKey,
IBetaService.MemoryStores, IBetaService.Webhooks, IBetaService.UserProfiles

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Re-enable CheckSystem declarative integration tests

The CheckSystem.yaml tests were temporarily skipped in PR #4270 during
the Azure.AI.Projects 2.0.0-beta.1 SDK update. Since then, the system
variable plumbing (SystemScope, SetLastMessageAsync, conversation
initialization) has been significantly updated and stabilized. The
other tests in these same files pass reliably using the same
infrastructure.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix CheckSystem test case to expect 1 response

The CheckSystem workflow sends a 'PASSED!' SendActivity when all system
variables are populated, producing 1 AgentResponseEvent. The test case
had min_response_count: 0 with no max, so the assertion defaulted max
to 0 and failed with 'Response count greater than expected: 0 (Actual: 1)'.
Updated to expect exactly 1 response, matching the SendActivity pattern.
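
A small sketch of the bounds check that produces this failure, assuming an omitted maximum falls back to the minimum (the commit only states that the max defaulted to 0 here). `check_response_count` and the lower-bound message are hypothetical. Only the upper-bound message format is taken from the failure quoted above.

```python
from typing import Optional


def check_response_count(actual: int, min_count: int, max_count: Optional[int] = None) -> None:
    """Raise AssertionError when `actual` falls outside [min_count, max_count]."""
    # Assumption: with no explicit max, the upper bound equals the minimum,
    # so `min_response_count: 0` alone means "expect exactly 0 responses".
    upper = min_count if max_count is None else max_count
    if actual < min_count:
        raise AssertionError(
            f"Response count less than expected: {min_count} (Actual: {actual})"
        )
    if actual > upper:
        raise AssertionError(
            f"Response count greater than expected: {upper} (Actual: {actual})"
        )
```

Under that assumption, setting the minimum to 1 makes a single `PASSED!` response satisfy both bounds.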

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Re-enable Foundry OpenAPI server-side tool integration test

Remove Skip="For manual testing only" from
AsAIAgent_WithOpenAPITool_NativeSDKCreation_InvokesServerSideToolAsync.
The test already uses RetryFact(3 retries, 5s delay) to handle
transient failures from the external restcountries.com API.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Include workflow file in functions/core path filters

A PR editing only dotnet-build-and-test.yml would skip
dotnet-test-functions because the workflow path was missing
from both the functions and core path filter lists.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Rename filter parameters for consistency

TestProjectNameFilter  -> TestProjectNameIncludeFilter
TestProjectNameExclude -> TestProjectNameExcludeFilter

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Remove unnecessary RT0003 warning suppression

The RT0003 suppression was added during the Anthropic SDK 12.20.0
upgrade but the warning no longer fires. Removing it to keep the
NoWarn list minimal.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Remove duplicate WebhookKey properties from merge

Both our branch and main added WebhookKey to the Anthropic test
mock classes, resulting in CS0102 duplicate definition errors.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Giles Odigwe · 2026-05-12 17:56:31 +00:00

cfd3dfe40b

propagate token (#5768)

Evan Mattson · 2026-05-12 15:27:13 +09:00

939d4d0153

Trigger issue triage on bug-labeled issues (#5763)

* Trigger issue triage on bug-labeled issues instead of manual dispatch

* Address PR feedback: scope concurrency cancellation to bug-label events

Evan Mattson · 2026-05-12 13:07:17 +09:00

fe09f13adb

.NET: Hosted Agents - RAG Sample with Azure AI Search (#5693) (#5701)

* .NET: Hosted Agents - RAG Sample with Azure AI Search (#5693)

Adds a Hosted-AzureSearchRag sample plus a live Foundry.Hosting integration
test scenario backed by a real Azure AI Search index.

Sample (Hosted-AzureSearchRag): keyword-only Azure AI Search via
SearchClient adapter into TextSearchProvider, scope-aware
DevTemporaryTokenCredential consuming AZURE_BEARER_TOKEN_FOUNDRY +
AZURE_BEARER_TOKEN_SEARCH for local Docker, Dockerfile + contributor
Dockerfile mirroring Hosted-TextRag.

Integration test: AzureSearchRagHostedAgentFixture extends the PR #5598
HostedAgentFixture with the new azure-search-rag scenario branch in the
shared test container; AzureSearchRagHostedAgentTests asserts the model
returns canary tokens (TR-CANARY-7821, SHIP-CANARY-4493) that exist only
in the seeded documents - real proof the agent grounded its answer in
retrieved content rather than training data.

* Address PR 5701 Copilot review feedback

- Sample README: drop stale 'bootstraps the index on first run' line; index is pre-provisioned out of band

- Sample + TestContainer search adapters: propagate CancellationToken to await foreach via .WithCancellation()

Roger Barreto · 2026-05-11 13:59:42 +00:00

18d7a46a54

.NET: Foundry.Hosting IT - eliminate MSBuild parallel-output races (#5725)

* .NET: Foundry.Hosted IT - fix MSBuild parallel-output races

Two surgical changes inside the dotnet-foundry-hosted-it job:

1. Replace dotnet build <slnx> -f net10.0 with dotnet build <test.csproj>. The test csproj pins TargetFrameworks=net10.0 and its ProjectReference closure gives MSBuild a single-rooted graph, eliminating the duplicate inner-builds that race on bin/obj. Drops the two New-FilteredSolution.ps1 steps.

2. In it-build-image.ps1, drop the -UsePrebuiltProjectReferences switch and always pass --no-dependencies to dotnet publish. Publish now resolves TestContainer's framework refs by reading prebuilt DLLs and never re-touches them. Replaces the partial-mitigation in PR #5689 with a structural fix.

Local validation confirmed the published Foundry.dll has the same mtime and bytes as the prebuild output.

* .NET: dotnet test - use --project flag for Microsoft Testing Platform

Roger Barreto · 2026-05-11 09:39:13 +00:00

d2ce0e9087

.NET: Python: Add dotnet integration test report to CI (#5515)

* Add dotnet integration test report to CI

- Add --report-junit flag to dotnet integration test step to generate
  JUnit XML alongside TRX, with explicit --results-directory to
  centralize output in IntegrationTestResults/
- Upload JUnit XML artifacts from each matrix leg (net10.0/ubuntu,
  net472/windows) as dotnet-test-results-{framework}-{os}
- Add dotnet-integration-test-report job that downloads artifacts,
  runs the existing aggregate.py script, posts markdown to Job Summary,
  and saves trend history via actions/cache
- Refactor aggregate.py to discover JUnit XML files recursively,
  supporting both pytest (pytest.xml) and xunit (*.junit.xml) layouts
- Handle provider name derivation for dotnet artifact naming convention
- Fix nodeid collision when same test runs under multiple frameworks
  by qualifying keys with provider when collisions are detected
- Improve module extraction for dotnet C# classnames (recognizes
  IntegrationTests/UnitTests namespace segments)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore: trigger dotnet CI for report validation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: use .junit extension (not .junit.xml) for xunit v3 output

xUnit v3 generates files with .junit extension, not .junit.xml.
Update upload glob and aggregate.py discovery to match.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: use deterministic provider-qualified keys for dotnet tests

Always prefix dotnet test keys with provider (e.g. net10.0 (ubuntu)::TestName)
to ensure stable, comparable counts across runs regardless of file parse order.
Also show Executed (passed+failed) instead of Total in summary table.
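
A rough sketch of the provider-qualified keying, assuming the provider is derived from the `dotnet-test-results-{framework}-{os}` artifact name described earlier. The helper names and the input shape are illustrative and not taken from aggregate.py:

```python
ARTIFACT_PREFIX = "dotnet-test-results-"


def provider_from_artifact(artifact_name: str) -> str:
    """Turn `dotnet-test-results-net10.0-ubuntu` into `net10.0 (ubuntu)`."""
    if not artifact_name.startswith(ARTIFACT_PREFIX):
        raise ValueError(f"unexpected artifact name: {artifact_name}")
    # The OS is the last dash-separated segment; the rest is the framework.
    framework, _, os_name = artifact_name[len(ARTIFACT_PREFIX):].rpartition("-")
    return f"{framework} ({os_name})"


def qualify_results(results_by_artifact: dict[str, dict[str, str]]) -> dict[str, str]:
    """Flatten {artifact: {test: outcome}} into {provider::test: outcome}."""
    merged: dict[str, str] = {}
    # Sorting the artifacts keeps the result independent of file parse order.
    for artifact in sorted(results_by_artifact):
        provider = provider_from_artifact(artifact)
        for test_name, outcome in results_by_artifact[artifact].items():
            merged[f"{provider}::{test_name}"] = outcome
    return merged
```

Because every key always carries its provider, the same test running under net10.0 and net472 yields two stable entries rather than one entry whose value depends on which file was parsed last.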

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: match Python report summary format (Total, passed/total, etc.)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat: split dotnet report into per-framework tables

Dotnet tests run on multiple frameworks (net10.0, net472). Instead of
one combined table with unstable totals, show separate sections per
framework — each with its own summary row and per-test table. Python
reports retain the original single-table format.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Re-enable 7 flaky dotnet integration tests with increased timeouts

Increase timeouts to reduce timing-related flakiness in LLM-backed
integration tests (issue #4971):

- ExternalClientTests: 60s -> 120s default timeout
- SamplesValidationBase: 60s -> 120s default timeout
- ConsoleAppSamplesValidation: 90s -> 150s for long-running tests
- AzureFunctions SamplesValidation: 2min -> 3min orchestration timeout,
  60s -> 90s per-step WaitForConditionAsync timeouts

Remove all Skip=Flaky annotations and unused SkipFlakyTimingTest constants.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Re-skip LLM non-determinism flaky tests, keep timeout fixes

Re-skip SingleAgentOrchestrationHITLSampleValidationAsync and
LongRunningToolsSampleValidationAsync - these fail due to LLM producing
extra review notifications, not timeouts. Updated skip reasons to
accurately describe the root cause. Reverted unnecessary timeout change
on the skipped LongRunningTools test.

The remaining 5 re-enabled tests with timeout increases are stable.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Enable Anthropic integration tests in CI

Replace hardcoded skip with conditional skip pattern (matching
CopilotStudio approach): tests gracefully skip when ANTHROPIC_API_KEY
is missing, and run when present.

Changes:
- AnthropicChatCompletionFixture: try/catch in InitializeAsync with
  Assert.Skip on missing config (replaces hardcoded SkipReason)
- AnthropicSkillsIntegrationTests: same pattern per test method
- dotnet-build-and-test.yml: wire up ANTHROPIC_API_KEY,
  ANTHROPIC_CHAT_MODEL_NAME, and ANTHROPIC_REASONING_MODEL_NAME
  env vars to the integration test step

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix missing System using in AnthropicSkillsIntegrationTests

Add 'using System;' for InvalidOperationException in try/catch blocks.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Skip flaky SingleAgentOrchestrationChainingSampleValidationAsync

LLM non-determinism causes Assert.NotNull failures on orchestration
results. Skip until test logic is hardened against non-deterministic
LLM responses.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Re-enable HITL and LongRunningTools tests with timeout and flexibility fixes

- Remove Skip attribute from SingleAgentOrchestrationHITLSampleValidationAsync
- Remove Skip attribute from LongRunningToolsSampleValidationAsync
- Increase timeout from 120s/90s to 180s to accommodate 2+ LLM round-trips
- Replace rigid 2-cycle assertion with flexible approval logic that handles
  extra review cycles from LLM non-determinism

Fixes the two failure modes identified in #4971:
1. Timeout: 120s/90s was insufficient for multiple LLM calls under CI load
2. Extra notifications: Assert.Fail on 3rd+ review cycle was too rigid

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Increase AzureFunctions LongRunningTools test timeouts from 90s to 180s

The LongRunningToolsSampleValidationAsync test in the AzureFunctions integration
tests was failing in CI with TimeoutException at the 'Content published
notification is logged' step. The 90-second timeouts are too tight for CI
environments where LLM calls and orchestration overhead can be slow.

Increased all three WaitForConditionAsync timeouts from 90s to 180s:
- Waiting for human feedback notification
- Waiting for publish notification (the step that was failing)
- Waiting for orchestration completion

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Merge main and fix dotnet report path after flaky_report rename

Merge upstream/main which renamed scripts/flaky_report/ to
scripts/integration_test_report/ (from Python PR #5454). Update the
dotnet-build-and-test workflow to reference the new path.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add RetryFact to DurableTask and AzureFunctions integration tests

These tests interact with LLMs via stdin/stdout (DurableTask) or HTTP
(AzureFunctions) and are inherently non-deterministic. Unlike the Python
side which uses pytest-retry, the dotnet tests had no retry mechanism
and a single transient failure would fail the entire CI run.

Changes:
- Switch [Fact] to [RetryFact(2, 5000)] on all LLM-dependent tests
  across ConsoleAppSamplesValidation, ExternalClientTests,
  WorkflowConsoleAppSamplesValidation, and AzureFunctions SamplesValidation
- Add re-prompt mechanism to LongRunningToolsSampleValidationAsync:
  if the LLM doesn't invoke the tool within 60s, re-send the prompt
  (up to 2 retries) instead of burning the full timeout
- Reduce LongRunningTools timeout from 240s to 180s (re-prompt makes
  the extra buffer unnecessary)
- Leave simple/deterministic tests as [Fact] (SingleAgent, unit tests)
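
The re-prompt mechanism can be sketched as follows. This is a Python illustration, not the C# test code: `wait_with_reprompt`, its callbacks, and the injectable clock are hypothetical. Only the 60s re-prompt window, the 2 retries, and the 180s overall timeout come from the list above.

```python
import time
from typing import Callable


def wait_with_reprompt(
    send_prompt: Callable[[], None],
    tool_invoked: Callable[[], bool],
    *,
    reprompt_after: float = 60.0,
    max_reprompts: int = 2,
    total_timeout: float = 180.0,
    poll_interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Send a prompt, then re-send it if the tool is not invoked in time."""
    start = clock()
    send_prompt()
    last_sent = start
    reprompts = 0
    while clock() - start < total_timeout:
        if tool_invoked():
            return True
        # Re-send instead of burning the whole timeout on one ignored prompt.
        if reprompts < max_reprompts and clock() - last_sent >= reprompt_after:
            send_prompt()
            last_sent = clock()
            reprompts += 1
        sleep(poll_interval)
    return tool_invoked()
```

Passing `clock` and `sleep` as parameters lets the behavior be exercised with a fake clock rather than real waiting.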

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add persist-credentials: false to Integration Test Report checkout step

Matches the convention used by other checkout steps in this workflow
to avoid leaving GITHUB_TOKEN credentials in the local git config.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* small fixes

* disable anthropic failing tests

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Giles Odigwe · 2026-05-07 20:39:32 +00:00

c06af9a1b3

.NET: Foundry.Hosting IT: avoid MSB3026 in publish; fix telemetry UT flake (#5689)

CI publish step: gate the BuildProjectReferences=false fast-path on an explicit -UsePrebuiltProjectReferences switch (passed by the workflow) instead of marker detection. Adds a preflight error when stale obj/Release/net10.0 outputs would cause CS0579, with actionable recovery instructions.

Telemetry UT flake: AgentFrameworkResponseHandlerTelemetryTests was using a plain List<Activity> for OTel's InMemoryExporter. The exporter writes from background Activity completion callbacks while parallel tests on the same global ActivitySource feed every listener, racing against the assertion's enumeration and throwing 'Collection was modified'. Replaced with a small thread-safe ConcurrentActivityList that locks add/enumerate and returns a snapshot for assertions.
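
The lock-and-snapshot pattern behind ConcurrentActivityList can be illustrated with a Python analogue. The class below is hypothetical and only mirrors the described behavior: adds and enumeration share one lock, and assertions work on a copy. Python lists do not throw on concurrent modification the way a .NET `List<T>` enumerator does, so this shows the shape of the fix rather than reproducing the original failure.

```python
import threading


class ConcurrentList:
    """A list wrapper whose writes and reads are serialized by one lock."""

    def __init__(self) -> None:
        self._items: list = []
        self._lock = threading.Lock()

    def add(self, item) -> None:
        # Background exporters call this while tests may be reading.
        with self._lock:
            self._items.append(item)

    def snapshot(self) -> list:
        # Copy under the lock so callers enumerate a stable list.
        with self._lock:
            return list(self._items)

    def __iter__(self):
        # Iterating always walks a snapshot, never the live list.
        return iter(self.snapshot())
```

Assertions then run against `snapshot()`, which later background writes cannot change.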

Roger Barreto · 2026-05-07 18:54:46 +00:00

a478d1b53c

.NET: Add Foundry.Hosting.IntegrationTests (#5598)

* Foundry.Hosting.IntegrationTests: scaffold project, fixtures, and 24 tests

Add a new integration test project for Foundry hosted agents alongside the existing Foundry.IntegrationTests project. The project provisions a real Foundry hosted agent per scenario via AgentAdministrationClient.CreateAgentVersionAsync, points it at a single test container image (built and pushed out of band by scripts/it-build-image.ps1 in a follow up commit), and exercises the agent through AIProjectClient.AsAIAgent.

Six scenario fixtures are introduced, each pointing at the same image but selecting behavior via the IT_SCENARIO environment variable on the HostedAgentDefinition:
- HappyPathHostedAgentFixture (round trip, multi turn, stored=false flag)
- ToolCallingHostedAgentFixture (server side AIFunctions)
- ToolCallingApprovalHostedAgentFixture (approval flow)
- ToolboxHostedAgentFixture (Foundry toolbox)
- McpToolboxHostedAgentFixture (MCP backed toolbox)
- CustomStorageHostedAgentFixture (custom storage provider)

24 tests across 6 test classes are scaffolded. All are tagged Skip pending the test container build and the end to end smoke iteration in follow up commits. Once the container is in place the Skip annotations can be removed scenario by scenario.

Adds an IT_HOSTED_AGENT_IMAGE constant to the shared TestSettings so every IT project agrees on the env var name the build script emits.

* Foundry.Hosting.IntegrationTests: add TestContainer, build script, slnx, README

Adds the rest of the integration test infrastructure on top of the previous scaffolding commit:

* Foundry.Hosting.IntegrationTests.TestContainer csproj and Program.cs implementing the multi scenario container (one image, IT_SCENARIO env var dispatches between happy-path, tool-calling, tool-calling-approval, toolbox, mcp-toolbox, and custom-storage). The toolbox, mcp-toolbox, and custom-storage branches are placeholders pending API surface stabilization.
* Dockerfile and dockerignore in the test container project, using the contributor pattern matching the investigation work (host side dotnet publish, container only does COPY out/).
* scripts/it-build-image.ps1 with mandatory Registry parameter (no hardcoded ACR), content hashed tags so unchanged source results in a no op push, and emits IT_HOSTED_AGENT_IMAGE for shells and CI to consume.
* slnx entry for both new projects.
* README in the IT project covering env vars, image build, scenario table, and current placeholder status.

Steps still pending: end to end smoke (step 5) and CI workflow integration (step 6) require a live Foundry deployment and ACR push, so they land in follow up commits.

* Foundry.Hosting.IntegrationTests: address PR 5598 review feedback

Fix issues raised by Copilot review:

* it-build-image.ps1: hash file contents, not the path list, so any source edit produces a fresh tag. Normalize Registry input by stripping scheme and trailing slash before deriving the ACR short name. Validate the short name is non empty.
* HostedAgentFixture: route GetAgentAsync through _adminClient (which has the FoundryFeaturesPolicy attached) instead of through _projectClient.AgentAdministrationClient (which does not).
* HostedAgentFixture FoundryFeaturesPolicy: replace Headers.Add with Remove plus Add so retries cannot accumulate duplicate headers.
* HappyPath, ToolCalling, ToolCallingApproval, CustomStorage tests: create the AgentSession before turn 1 and reuse it for both turns. The previous pattern created the session after turn 1 so turn 2 had no link to turn 1, defeating the multi turn assertion.
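
The Registry normalization in the first bullet might look like this in outline. The script itself is PowerShell. `acr_short_name` is a hypothetical name, and taking the first DNS label as the short name is an assumption about how the script derives it.

```python
def acr_short_name(registry: str) -> str:
    """Derive the registry short name from inputs like `https://name.azurecr.io/`."""
    host = registry.strip()
    # Strip an optional scheme such as https://.
    if "://" in host:
        host = host.split("://", 1)[1]
    # Strip trailing slashes before splitting on dots.
    host = host.rstrip("/")
    short = host.split(".", 1)[0]
    if not short:
        raise ValueError(f"could not derive a registry short name from {registry!r}")
    return short
```

With this normalization, `myregistry.azurecr.io`, `https://myregistry.azurecr.io`, and `https://myregistry.azurecr.io/` all resolve to the same short name, and an empty result fails early instead of reaching the push step.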

* .NET: Foundry.Hosting.IntegrationTests: constrain to net10.0 + dotnet format autofix

- Set <TargetFrameworks>net10.0</TargetFrameworks>: the project references both
  Microsoft.Agents.AI.Foundry.Hosting (net8/9/10 only) and AgentConformance.IntegrationTests
  (net10.0;net472 — inherits the tests-default TFM list). The intersection is net10.0;
  the previous $(TargetFrameworksCore) triple caused NU1702 + System.Text.Json version
  conflicts on the net8.0/net9.0 builds because AgentConformance had no matching asset.
- Apply `dotnet format` autofix on the test files (IDE0005, IDE0009, IDE0032, IMPORTS).

* .NET: Foundry.Hosting.IntegrationTests.TestContainer/Program.cs: add UTF-8 BOM

CI's check-format requires charset=utf-8-bom per .editorconfig.

* Foundry.Hosting IntegrationTests: wire end-to-end CI flow against hosted agents

Make the integration tests usable end-to-end against a live Foundry deployment, including
a per-run rebuild of the test container so framework code changes are exercised.

Fixture (HostedAgentFixture.cs)

* Switch from per-run unique agent names to stable scenario-keyed names (it-happy-path,
  it-tool-calling, ...). The agent's managed identity carries the Azure AI User role on
  the project scope, which is required for inbound inference; deleting the agent recycles
  the MI and breaks that role assignment, so we keep the agent across runs and only churn
  versions.
* Add IT_RUN_ID env var to defeat Foundry's content-addressed version dedup; otherwise a
  rerun just receives the existing version and Dispose deletes it.
* PATCH the per-agent endpoint with AgentEndpointConfig (Responses protocol, version
  selector at 100% to the new version). Without this, /agents/{name}/endpoint/protocols/
  openai/responses returns HTTP 400.
* Build a per-agent ProjectOpenAIClient (not the cached projectClient.ProjectOpenAIClient,
  which is bound to the project-level URL); set AgentName in options so the URL routes
  through the agent endpoint, and add the Foundry-Features header to the inference
  pipeline.
* Use Versions (which serializes to container_protocol_versions) instead of the
  deprecated ProtocolVersions; the server now rejects the legacy field.
* On Dispose, delete only the version this fixture created. Never delete the agent.

Tests

* Tag every HostedAgentTests class with [Trait("Category", "FoundryHostedAgents")] so the
  CI workflow can route them to a different Foundry project than the rest of the
  integration suite.

CI workflow (.github/workflows/dotnet-build-and-test.yml)

* Add a foundryHosting paths-filter covering Microsoft.Agents.AI.Foundry.Hosting and its
  in-repo dependency chain (Foundry, Agents.AI, Agents.AI.Abstractions), the test
  container, the test fixture, Directory.Packages.props, the build script, and this
  workflow file. Skip the costly hosted-agent steps when none of those changed.
* Add "Build and push Foundry Hosted Agents test container" step that invokes
  scripts/it-build-image.ps1 against vars.IT_HOSTED_AGENT_REGISTRY and pipes the resulting
  IT_HOSTED_AGENT_IMAGE=<tag> into GITHUB_ENV.
* Add "Run Foundry Hosted Agents Integration Tests" step that filters in only the new
  trait, with AZURE_AI_PROJECT_ENDPOINT/AZURE_AI_MODEL_DEPLOYMENT_NAME pointed at
  IT_HOSTED_AGENT_PROJECT_ENDPOINT/IT_HOSTED_AGENT_MODEL_DEPLOYMENT_NAME (Tao project,
  East US 2; the SK IT project's region does not yet support hosted agents preview).
* Exclude the new trait from the existing "Run Integration Tests" step.
* TEMP: drop the != 'pull_request' guard on the new steps and on Azure CLI Login when the
  paths-filter triggers, so PR #5598 can validate the wiring before promoting to merge
  queue only. Restore the original guard after one green PR run.

Build script (scripts/it-build-image.ps1)

* Hash now spans TestContainer source AND its referenced framework projects so any
  framework code change forces a fresh tag and a real docker push; the previous
  TestContainer-only hash silently reused stale images on framework edits.
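
A minimal sketch of a content-hashed tag spanning several source roots, assuming SHA-256 over relative paths plus file bytes. The real script is PowerShell and its exact hashing scheme is not shown in this log, so `content_hash_tag` and its parameters are illustrative. Build-output directories such as bin/obj are not filtered here.

```python
import hashlib
from pathlib import Path


def content_hash_tag(roots: list[Path], length: int = 12) -> str:
    """Hash file contents under every root so any source edit changes the tag."""
    digest = hashlib.sha256()
    for root in roots:
        files = sorted(
            (p for p in root.rglob("*") if p.is_file()),
            key=lambda p: p.relative_to(root).as_posix(),
        )
        for path in files:
            # Relative paths keep the tag stable across checkout locations.
            digest.update(path.relative_to(root).as_posix().encode("utf-8"))
            digest.update(b"\0")
            # Hash the bytes, not just the path list, so edits produce a new tag.
            digest.update(path.read_bytes())
            digest.update(b"\0")
    return digest.hexdigest()[:length]
```

Passing both the test container directory and the referenced framework project directories as `roots` gives the behavior described above: unchanged sources reuse the tag, and a framework edit forces a new one.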

Bootstrap script (dotnet/tests/Foundry.Hosting.IntegrationTests/scripts/it-bootstrap-agents.ps1)

* New idempotent script that creates the six stable scenario agents and grants Azure AI
  User on the project scope to each agent's MI. Run once per Foundry project. Includes
  AAD-graph propagation retries because newly created MIs take time to appear there.

README (dotnet/tests/Foundry.Hosting.IntegrationTests/README.md)

* Document the bootstrap prerequisite, the regional caveat (East US 2 is the only region
  we have validated; East US returned "Unsupported region" at the time of writing), the
  per-run image rebuild, and the CI wiring including the SP RBAC requirements.

SDK pin (TEMP)

* Bump Microsoft.Agents.AI.Foundry.Hosting's Azure.AI.Projects VersionOverride to
  2.1.0-alpha.20260505.1 from the azure-sdk public daily feed (added to nuget.config).
  This release is the first that builds the per-agent inference URL as
  /agents/{name}/endpoint/protocols/openai (the 2.1.0-beta.1 release builds
  .../openai/openai/v1, which the server rejects). Revert both the feed and the override
  once the URL fix lands in a stable Azure.AI.Projects release.

* Foundry.Hosting IntegrationTests: revert alpha SDK pin; move endpoint PATCH to bootstrap

The alpha SDK pin (Azure.AI.Projects 2.1.0-alpha.20260505.1 from the azure-sdk public
daily feed) was needed only for the URL routing fix and the strongly-typed
AgentEndpointConfig/PatchAgentOptions wrapper. We do not need either right now: the
fixture stays compatible with the public 2.1.0-beta.1 by moving the one-time endpoint
PATCH to the bootstrap script (it sets version_selector to FixedRatio @latest, so each
new fixture run becomes the served version automatically without a per-run PATCH from
the test code). The hosted-agent invocation path will start working end-to-end once the
URL routing fix lands in a stable Azure.AI.Projects release; until then the tests stay
[Fact(Skip = ...)] as documented.

* Revert dotnet/nuget.config: drop the azure-sdk-for-net public feed.
* Revert Microsoft.Agents.AI.Foundry.Hosting.csproj VersionOverride to 2.1.0-beta.1.
* Revert Microsoft.Agents.AI.Foundry.UnitTests and Microsoft.Agents.AI.Foundry.Hosting.UnitTests
  Azure.AI.Projects pin (they had been bumped to align Azure.Core 1.54 transitive).
* Drop the AgentEndpointConfig PATCH block from HostedAgentFixture.cs (the type is
  alpha-only). Replace with a comment pointing at the bootstrap script.
* Bootstrap script (it-bootstrap-agents.ps1) now also PATCHes each agent's endpoint
  with version_selector=@latest if not already set. Idempotent.

* Foundry.Hosting IntegrationTests: drop accidentally committed filtered.slnx

* Foundry.Hosting IntegrationTests: revert TEMP PR override on Azure CLI Login + IT steps

The previous attempt to validate the new hosted-agent IT wiring on PR #5598 failed
because the PR is from a fork (rogerbarreto/agent-framework-public). GitHub never passes
environment secrets to fork PRs regardless of event-name guards on individual steps,
so 'azure/login@v2' fails with 'client-id and tenant-id are not supplied'. Restore the
original github.event_name != 'pull_request' guard. The new steps will execute on
push to main and on merge_group runs.

* Foundry.Hosting IntegrationTests: invoke build-and-push script with absolute path

The pwsh shell on the GitHub Actions runner couldn't resolve ./scripts/it-build-image.ps1
when the step had no working-directory set; the step inherits the runner's PWD which is
not always the repo root after preceding steps. Use github.workspace explicitly to remove
the ambiguity.

* Foundry.Hosting IntegrationTests: move it-build-image.ps1 inside the IT project tree

The previous location at scripts/it-build-image.ps1 lived outside the sparse-checkout
paths the workflow uses (.github, dotnet, python, declarative-agents), so the runner
never had the file when the new step tried to invoke it. Move the script next to its
sibling it-bootstrap-agents.ps1 inside the IT project tree, and anchor its relative
paths to the repo root so callers can invoke it from any PWD.

* Move scripts/it-build-image.ps1 -> dotnet/tests/Foundry.Hosting.IntegrationTests/scripts/it-build-image.ps1
* Add Push-Location to the resolved repo root inside the script (Pop-Location in finally)
  so the existing relative paths (TestContainerProject, hashed src dirs) keep working
  no matter where the script is invoked from.
* Update the workflow path filter and the step's invocation path to the new location.

* Foundry.Hosting IntegrationTests: enable 5 HappyPath tests on the live Foundry endpoint

The fixture already constructs ProjectOpenAIClient via the per-agent path that beta.1
supports (new ProjectOpenAIClient(uri, cred, opts { AgentName })), so no SDK pin bump
is required to run the smoke tests end-to-end. Un-skip the 5 tests that pass against
the live test container.

Tests un-skipped (verified passing locally against tao-foundry-prj):

* RunAsync_ReturnsNonEmptyTextAsync
* RunStreamingAsync_YieldsAtLeastOneUpdateAsync
* MultiTurn_WithPreviousResponseId_PreservesContextAsync
* StoredFalse_Baseline_DoesNotPersistResponseAsync
* Instructions_FromContainerDefinition_AreObeyedAsync

The remaining tests stay skipped, each with a more specific reason (4 of 9 in HappyPath
plus all ToolCalling*, McpToolbox, Toolbox, CustomStorage), because the test container
does not yet emit usable response_id / conversation_id chains and the placeholder
scenarios are not implemented in the test container's Program.cs. These are test
container limitations, not infra bugs, and the tests can be un-skipped as the container
surfaces stabilize.

* Foundry.Hosting IntegrationTests: extract hosted IT into parallel job, add Workflows dep

Address Wesley's review feedback on PR #5598:

1. Pull Foundry hosted-agent IT into its own dotnet-foundry-hosted-it job that runs in parallel to dotnet-build and dotnet-test. Same path-filter gate keeps it skipped on unrelated edits. Builds only the filtered solution containing Foundry.Hosting.IntegrationTests and src deps. dotnet-build-and-test-check now waits on it too.

2. Add Microsoft.Agents.AI.Workflows to the foundryHosting paths-filter and to hashedDirs in it-build-image.ps1 since Foundry.Hosting transitively depends on it.

TFM constraint on the IT csproj stays at net10.0 because AgentConformance.IntegrationTests targets net10/net472 and is consumed by ~12 other IT projects on net472.

---------

Co-authored-by: Roger Barreto <rbarreto@microsoft.com>

Roger Barreto · 2026-05-06 16:08:15 +00:00

51ad460d5f

Python: Reduce flaky integration tests and improve CI signal quality (#5454 )

* Enable Ollama integration tests in CI and rename report to Integration Test Report

- Install Ollama, cache models (qwen2.5:0.5b + nomic-embed-text), and start
  server in the Misc integration job for both workflow files
- Set OLLAMA_MODEL and OLLAMA_EMBEDDING_MODEL env vars so the 5 Ollama tests
  are no longer skipped
- Rename Flaky Test Report to Integration Test Report throughout (job names,
  artifact names, cache keys, file names, script titles/docstrings)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Bump Ollama model to qwen2.5:1.5b for better instruction following

The 0.5b model was too small to reliably follow simple prompts like
'Say Hello World', causing test assertion failures. The 1.5b model
follows instructions more reliably while still being small enough
for fast CI pulls (~1GB).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Re-enable reliable streaming integration tests

Remove the hard skip on test_03_reliable_streaming tests that was
temporarily disabled for instability investigation. CI infrastructure
(Azurite, DTS emulator, Redis, func CLI) is already in place.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Re-enable skipped Functions/DurableTask tests and bump timeout to 480s

- Remove hard skips from 4 tests in test_11_workflow_parallel.py
- Remove hard skip from test_conditional_branching in test_06_dt_multi_agent_orchestration_conditionals.py
- Increase pytest --timeout from 360 to 480 for Functions+DurableTask CI job
- Updated in both python-merge-tests.yml and python-integration-tests.yml

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Re-skip failing Functions/DurableTask tests with specific root causes

- test_11_workflow_parallel (4 tests): xdist worker crashes during execution
- test_conditional_branching: orchestration fails with RuntimeError, not a timeout
- Keep 480s timeout bump for remaining Functions tests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix auth routing in samples 06/11: api_key -> credential for Azure OpenAI

Both samples passed a bearer token provider via api_key= which caused the
client to route to api.openai.com instead of Azure OpenAI, resulting in
401 Unauthorized. Changed to credential= which correctly triggers Azure
routing and picks up AZURE_OPENAI_ENDPOINT from the environment.

- samples/azure_functions/11_workflow_parallel/function_app.py: 1 fix
- samples/durabletask/06_multi_agent_orchestration_conditionals/worker.py: 2 fixes
- Re-enable 4 parallel workflow tests and 1 conditional branching test

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Re-skip parallel workflow tests: xdist worker distribution issue

The 4 parallel workflow tests crash because xdist worksteal distributes
them across separate workers, each spawning its own func process against
shared emulators. Auth fix (api_key->credential) was valid and stays.
test_conditional_branching now passes with the auth fix.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix E501 line-too-long in azurefunctions parallel test skip reasons

Wrap skip reason strings to stay within 120 char line limit.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add retry logic and port-conflict fix for Ollama CI setup

- Kill any auto-started Ollama before launching serve (fixes port
  conflict: 'address already in use')
- Retry ollama pull up to 3 times with 15s backoff (fixes 429 rate
  limit failures)
- Applied to both python-merge-tests.yml and python-integration-tests.yml
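
The retry policy above (up to 3 attempts, 15s between them) can be sketched in Python.
This is only an illustration: the real change lives in the workflow steps, and the
names `retry_with_backoff`, `action`, and the injectable `sleep` parameter are
hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    action: Callable[[], T],
    attempts: int = 3,
    delay_seconds: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `action`, retrying on any exception with a fixed delay between tries."""
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as error:
            last_error = error
            # Only wait if another attempt is still to come.
            if attempt < attempts:
                sleep(delay_seconds)
    # All attempts failed: surface the failure instead of continuing silently.
    raise RuntimeError(f"action failed after {attempts} attempts") from last_error
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting.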

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix flaky integration tests and re-enable skipped tests

- Foundry agent: add allow_preview=True to custom client test
- Foundry hosting: raise max_output_tokens 50->200, add temperature,
  relax assertion in test_temperature_and_max_tokens
- Foundry embedding: update skip reason with root cause (endpoint mismatch)
- OpenAI file search: fix vector store indexing race condition by polling
  file_counts before querying; fix get_streaming_response -> get_response(stream=True)
- Azure OpenAI file search: remove skip (transient 500 resolved)
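
The file_counts polling mentioned above might look roughly like the sketch below. It
assumes a caller-supplied `get_file_counts` callable returning a mapping with
`in_progress` and `completed` keys; the function name, timeout, and poll interval are
illustrative assumptions, not the values used in the tests.

```python
import time
from typing import Callable, Mapping


def wait_for_indexing(
    get_file_counts: Callable[[], Mapping[str, int]],
    timeout_seconds: float = 60.0,
    poll_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Mapping[str, int]:
    """Poll until no files are in progress and at least one has completed."""
    deadline = clock() + timeout_seconds
    while True:
        counts = get_file_counts()
        if counts.get("in_progress", 0) == 0 and counts.get("completed", 0) > 0:
            return counts
        if clock() >= deadline:
            raise TimeoutError("vector store indexing did not finish in time")
        sleep(poll_seconds)
```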

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Remove temperature from foundry hosting test (unsupported by CI model)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Stabilize Ollama tool call integration tests with no-arg function

Use a no-argument greet() function instead of hello_world(arg1) for
integration tests. The 1.5B model in CI is unreliable at generating
correct tool call arguments, causing 'Argument parsing failed' errors.
A no-arg function eliminates this flakiness entirely.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Increase reliable streaming test timeouts from 30s to 60s

The LLM call through Azure OpenAI + Redis streaming pipeline can exceed
30s in CI due to cold starts or throttling. Raise to 60s to reduce
flaky timeouts while still bounded by pytest's 120s per-test limit.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Re-enable workflow parallel tests with xdist_group marker

The tests were skipped because xdist distributes module tests across
workers, each spawning their own func process (port conflicts). Adding
xdist_group forces all tests in this module onto a single worker so
the module-scoped function_app_for_test fixture works correctly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Revert "Re-enable workflow parallel tests with xdist_group marker"

This reverts commit 455c28da62.

* Rename flaky_report to integration_test_report and add try/finally cleanup

- Rename scripts/flaky_report/ to scripts/integration_test_report/ to
  reflect expanded scope beyond flaky-test detection
- Update workflow references in both CI files
- Wrap file search integration tests in try/finally to ensure vector
  store cleanup runs even on test failure or timeout

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix Ollama pull failure propagation and Azure OpenAI vector store readiness

- Ollama CI: fail the step immediately if model pull fails after 3
  retries instead of silently proceeding to tests
- Azure OpenAI file search: add the same vector-store readiness polling
  that was applied to the non-Azure OpenAI tests, preventing eventual
  consistency race conditions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* remove load_dotenv from test file

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Giles Odigwe · 2026-05-01 00:41:39 +00:00

540193ccef

Python: Update hosting agent samples + fixes (#5485 )

* Update foundry hosting samples

* Add file data type support

* Fix file content and add more tests

* Fix README

* Address comments

* Fix int tests

* remove temp

Tao Chen · 2026-04-28 04:24:05 +00:00

88347f6494

Propagate integration-test model credentials to issue-triage repro (#5443 )

Scopes the triage job to the integration GitHub Environment, adds
the azure/login OIDC step, and exposes the same OpenAI / Azure
OpenAI / Foundry / Anthropic env vars the integration test
workflow uses. This lets the triage agent write repro code that
constructs model clients from the environment without any secrets
entering the agent prompt or generated-code literals.

Azure OpenAI and Foundry continue to authenticate via AAD
(DefaultAzureCredential), so there is no API key to leak for
those providers.

Evan Mattson · 2026-04-23 21:01:24 +09:00

fbbc2ebe86

Automated issue triage workflow (#5419 )

* Automated issue triage workflow

* Bump dependencies

* Fix issue-triage workflow: security, reliability, and testability

Address six review comments on the issue-triage workflow:

1. Change trigger from issues:opened to issues:labeled so the
   secret-backed triage flow is only triggered by a maintainer-
   controlled signal.

2. Include inputs.issue_number in the concurrency group so
   workflow_dispatch runs for the same issue are properly
   de-duplicated.

3. Improve team membership error handling to fail closed: verify
   the team exists before checking membership, and only treat a
   404 as 'not a member' (all other errors fail the job).

4. Use optional chaining (issue.user?.login) for the API-fetched
   issue to handle deleted GitHub accounts without crashing.

5. Extract the inline github-script into a testable module at
   .github/scripts/check_team_membership.js with 10 tests in
   .github/tests/test_check_team_membership.js covering all
   code paths (payload/API author resolution, deleted accounts,
   team lookup failure, 404 vs non-404 membership errors).

6. Make the spam gate actually stop the job by exiting non-zero
   instead of just logging, so future steps cannot accidentally
   run for spam issues.
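
The fail-closed membership check from item 3 can be illustrated with a small Python
sketch. The actual module is JavaScript (check_team_membership.js); `ApiError`,
`get_team`, and `get_membership` here are hypothetical stand-ins for the API client,
not its real interface.

```python
from typing import Any, Callable


class ApiError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status: int) -> None:
        super().__init__(f"API request failed with status {status}")
        self.status = status


def is_team_member(
    get_team: Callable[[str], Any],
    get_membership: Callable[[str, str], Any],
    team: str,
    user: str,
) -> bool:
    """Return True or False for membership; raise on anything ambiguous."""
    # Verify the team exists first. Any error here propagates (fail closed).
    get_team(team)
    try:
        get_membership(team, user)
    except ApiError as error:
        # Only a 404 means "not a member"; every other error fails the job.
        if error.status == 404:
            return False
        raise
    return True
```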

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Make issue-triage workflow manually triggered only for initial testing

Remove the 'issues' event trigger, keeping only 'workflow_dispatch' so the
workflow can be tested manually before enabling automatic triggers.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Evan Mattson · 2026-04-23 20:22:04 +09:00

c9e6033048

Don't fail if review issue occurs (#5434 )

Evan Mattson · 2026-04-23 13:24:21 +09:00

5d4873888f

Pin to specific release (#5430 )

Evan Mattson · 2026-04-23 08:23:56 +09:00

e2f161c8a0

Python: Flaky test report (#5342 )

* Add flaky test trend reporting to CI workflows

Parse JUnit XML (pytest.xml) from each integration test job and
aggregate results into a markdown trend report showing per-test
pass/fail/skip status across the last 5 runs.

Changes:
- Add python/scripts/flaky_report/ package (JUnit XML parser + trend
  report generator following the sample_validation pattern)
- Add upload-artifact steps to all 6 integration test jobs in both
  python-merge-tests.yml and python-integration-tests.yml
- Add python-flaky-test-report aggregation job with history caching
- Add --junitxml=pytest.xml to integration-tests.yml jobs (already
  present in merge-tests.yml)
- Fix Cosmos job --junitxml path (use absolute path since uv run
  --directory changes cwd)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix flaky report: handle missing test results gracefully

- Guard against missing reports directory in load_current_run()
- Only run report job when at least one integration test job completed
  (skip when all jobs are skipped, e.g. on pull_request events)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address PR review: fix provider names and if-expression precedence

- Use explicit provider name mapping in _derive_provider() so OpenAI
  renders correctly instead of 'Openai'
- Fix operator precedence in workflow if-expressions by wrapping
  success/failure checks in parentheses

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add File column and xfail detection to flaky test report

- Add File column showing module name (e.g., test_openai_chat_client)
  to disambiguate tests with the same function name across files
- Detect pytest xfail tests in JUnit XML (type=pytest.xfail) and
  show them with a distinct warning emoji instead of skip emoji
- Update legend to include xfail explanation
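
A minimal sketch of the xfail detection described above, assuming the JUnit XML marks
expected failures as a skipped element with type=pytest.xfail. The function name and
the returned status strings are illustrative, not the report package's actual API.

```python
import xml.etree.ElementTree as ET


def classify_testcase(testcase: ET.Element) -> str:
    """Map a JUnit <testcase> element to passed, failed, skipped, or xfail."""
    # Assumption for this sketch: errors are counted together with failures.
    if testcase.find("failure") is not None or testcase.find("error") is not None:
        return "failed"
    skipped = testcase.find("skipped")
    if skipped is not None:
        # pytest records expected failures as a skip with a distinct type.
        if skipped.get("type") == "pytest.xfail":
            return "xfail"
        return "skipped"
    return "passed"
```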

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add Foundry embedding env vars to merge-tests workflow

Sync the Foundry integration job in python-merge-tests.yml with
python-integration-tests.yml by adding FOUNDRY_MODELS_ENDPOINT,
FOUNDRY_MODELS_API_KEY, FOUNDRY_EMBEDDING_MODEL, and
FOUNDRY_IMAGE_EMBEDDING_MODEL. Once the repo variables/secrets
are configured, the embedding integration test will run in CI.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix File column showing class name instead of module name

When a test is inside a class, pytest writes the classname as e.g.
'pkg.test_file.TestClass'. The previous rsplit logic extracted
'TestClass' instead of 'test_file'. Now detect uppercase-starting
segments as class names and use the preceding segment instead.
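
The heuristic described above can be sketched as follows. The helper name
`module_from_classname` is hypothetical; the rule it applies (treat trailing
uppercase-starting segments as class names) is the one stated in this message.

```python
def module_from_classname(classname: str) -> str:
    """Return the test module segment of a dotted JUnit classname."""
    segments = classname.split(".")
    # Drop trailing segments that look like class names (start with uppercase).
    while len(segments) > 1 and segments[-1][:1].isupper():
        segments.pop()
    return segments[-1]
```

With this rule, both 'pkg.test_file.TestClass' and 'pkg.test_file' resolve to
'test_file'.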

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address PR review: UTC timestamps, XML error handling, summary fix, docstring

- Use datetime.now(timezone.utc) for accurate UTC timestamps
- Catch ET.ParseError per-file so corrupt XML doesn't crash the report
- Remove separate 'error' key from summary (errors folded into 'failed')
- Fix _short_name docstring to show actual dotted classname::name format
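
The first two points can be sketched together: a timezone-aware UTC timestamp and
per-file handling of ET.ParseError. The function name and the shape of the returned
dictionary are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def parse_reports(documents: dict[str, str]) -> dict:
    """Parse JUnit XML strings keyed by file name, skipping corrupt ones."""
    parsed: dict[str, ET.Element] = {}
    unreadable: list[str] = []
    for name, text in documents.items():
        try:
            parsed[name] = ET.fromstring(text)
        except ET.ParseError:
            # One corrupt file should not abort the whole report.
            unreadable.append(name)
    return {
        # Timezone-aware timestamp, so the value is unambiguously UTC.
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "parsed": parsed,
        "unreadable": unreadable,
    }
```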

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Giles Odigwe · 2026-04-22 20:16:50 +00:00

3f23e1dfbf

Add pr review GH workflow (#5418 )

* Add workflow PR review

* Allow reviews on draft PRs

* Update .github/workflows/devflow-pr-review.yml

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update .github/workflows/devflow-pr-review.yml

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Bump actions/checkout to v6 and uv to 0.11.x

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Evan Mattson · 2026-04-22 13:52:42 +09:00

9e915b36b6

Python: Add Hyperlight CodeAct package and docs (#5185 )

* initial work on code_mode

* updated samples

* updates to codeact

* updated codeact

* Draft CodeAct ADR and sample updates

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* initial implementation and adr and feature

* Python: Limit Hyperlight wasm backend to Python <3.14

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: Fix CI for Hyperlight CodeAct PR

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: Run Hyperlight integration when available

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: Address Hyperlight review feedback

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: Simplify Hyperlight file mount inputs

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: Accept Path host paths in Hyperlight mounts

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: Fix Hyperlight mount typing for CI

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* temp run integration test

* Python: Strengthen Hyperlight real sandbox tests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* added additional tests

* Python: Simplify Hyperlight CodeAct API

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* set tests as non-integration

* Retry Hyperlight allowed-domain registration

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Gate Hyperlight integration tests by runtime support

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix Hyperlight skip test on Python 3.14

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Delay Hyperlight runtime probe until test execution

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Relax Hyperlight Windows integration stdout assertion

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Scan Hyperlight output directory for artifacts

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Retry Hyperlight output artifact collection

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Harden Hyperlight integration output assertions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Retry Hyperlight read-back check in integration test

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Simplify Hyperlight integration write assertion

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Avoid pathlib in Hyperlight integration sandbox

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Use socket network check in Hyperlight sandbox

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Replace blocked Azure AI Search blog link

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Clarify Hyperlight guest stdlib limits

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Use _socket in Hyperlight integration sandbox

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Handle Hyperlight mounted file paths

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Broaden Hyperlight sandbox path fallbacks

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Search Hyperlight guest mounts recursively

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Split Hyperlight mount coverage

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Split Hyperlight live network tests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix Hyperlight file-write test on Windows

Enable the sandbox filesystem by providing a workspace_root so
/output is mounted. Remove os.path.exists assertion (unsupported
in WASM guest) and fix Content data assertion to use .uri.
Skip the network integration test on Windows where the WASM
sandbox lacks the encodings.idna codec.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address PR review: ADR intro, manual wiring sample, doc clarifications

- Add CodeAct introduction section to ADR for unfamiliar readers
- Clarify 'less runtime efficient' con with specific overhead description
- Add note in Python impl doc clarifying ADR vs impl doc split
- Explain why before_run hooks must be per-run (CRUD, concurrency, approval)
- Rename code_interpreter variable to codeact in E2E sample
- Add manual static wiring sample (codeact_manual_wiring.py)
- Add 'when to use which pattern' guidance to samples README

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address PR #5185 review comments and add .NET CodeAct design doc

- Fix async callback: _make_sandbox_callback returns sync wrapper with
  thread + asyncio.run() bridge (was broken with real Wasm FFI)
- Fix stale output: clear output_dir before each sandbox.run() call
- Fix blocking event loop: _run_code now async with asyncio.to_thread()
- Revert _agents.py options['tools'] injection (unnecessary; provider
  uses context.extend_tools())
- Revert SessionContext.options docstring back to read-only
- Add real-sandbox test fixtures (shared/restored/fresh)
- Add 8 new real-sandbox tests for callback round-trip, stale output,
  event loop non-blocking, basic execution, stdout/stderr, errors,
  snapshot/restore, and tool registration
- Add comprehensive .NET HyperlightCodeActProvider design document
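
The "sync wrapper with thread + asyncio.run() bridge" from the first point can be
sketched generically. This is not the code of _make_sandbox_callback; the name
`make_sync_callback` and its details are assumptions that only illustrate the
pattern.

```python
import asyncio
import threading
from typing import Any, Callable, Coroutine


def make_sync_callback(
    async_fn: Callable[..., Coroutine[Any, Any, Any]],
) -> Callable[..., Any]:
    """Wrap an async callable so synchronous (e.g. FFI) code can call it."""

    def wrapper(*args: Any, **kwargs: Any) -> Any:
        outcome: dict[str, Any] = {}

        def runner() -> None:
            try:
                # A fresh event loop in its own thread, so this also works when
                # the calling thread already has a running loop.
                outcome["value"] = asyncio.run(async_fn(*args, **kwargs))
            except BaseException as error:
                outcome["error"] = error

        thread = threading.Thread(target=runner)
        thread.start()
        thread.join()
        if "error" in outcome:
            raise outcome["error"]
        return outcome["value"]

    return wrapper
```

The wrapper blocks its caller until the coroutine finishes, which is why the calling
code is better run off the main event loop (for example via asyncio.to_thread(), as
the third point notes).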

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Update hyperlight README with code snippets and remove Public API section

Replace bare export list with Quick Start code examples covering the
context provider, standalone tool, manual static wiring, and file
mounts / network access patterns.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Eduard van Valkenburg · 2026-04-17 00:49:44 +00:00

b03cb324d5

.NET: Foundry Evals integration for .NET (#4914 )

* Foundry Evals integration for .NET

- Core evaluation framework: EvalItem, LocalEvaluator, FunctionEvaluator, EvalChecks
- IAgentEvaluator interface with MeaiEvaluatorAdapter bridge
- AgentEvaluationExtensions for agent.EvaluateAsync() overloads
- FoundryEvals wrapping MEAI quality/safety evaluators
- ConversationSplitters (LastTurn, Full) and IConversationSplitter
- EvalItem.PerTurnItems() for multi-turn decomposition
- HasImageContent for multimodal content detection
- WorkflowEvaluationExtensions for per-agent workflow evaluation
- 7 eval samples mirroring Python parity:
  02-agents/Evaluation: SimpleEval, ExpectedOutputs, Multimodal
  03-workflows/Evaluation: WorkflowEval
  05-end-to-end/Evaluation: FoundryQuality, MixedProviders, ConversationSplits
- Comprehensive unit tests (1958 passing)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Rewrite FoundryEvals to use real Foundry Evals API

Replace MEAI evaluator shim with actual OpenAI EvaluationClient protocol
methods. FoundryEvals now creates eval definitions, submits runs, polls
for completion, and fetches per-item results server-side.

- New constructor: FoundryEvals(AIProjectClient, model, evaluators)
- Add FoundryEvalConverter for MEAI ChatMessage -> Foundry JSON format
- Add EvalId, RunId, ReportUrl to AgentEvaluationResults
- All 20 built-in evaluator constants now work (agent, tool, quality, safety)
- Remove Microsoft.Extensions.AI.Evaluation.Quality/Safety dependencies
- Update all samples for new constructor (no more ChatConfiguration)
- Replace BuildEvaluators tests with ResolveEvaluator tests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add response output to CustomEvals and ExpectedOutputs samples

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address review: pagination, validation, error handling, tests

FoundryEvals fixes:
- Add pagination for output items (has_more/after cursor)
- Add guard clauses for pollIntervalSeconds/timeoutSeconds <= 0
- Fix double TryGetProperty for passed field parsing
- Throw on all-tool-evaluators with no tool definitions
- Fix XML doc (default 300s, not 180s)
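
The has_more/after pagination from the first fix follows a common cursor pattern,
sketched here in Python rather than C#. `fetch_page`, the `data` key, and using the
last item's `id` as the cursor are assumptions for illustration, not the confirmed
wire format.

```python
from typing import Callable, Iterator, Optional


def iter_all_items(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every item across pages, following the `after` cursor."""
    after: Optional[str] = None
    while True:
        page = fetch_page(after)
        items = page.get("data", [])
        yield from items
        # Stop when the server reports no more pages (or returns nothing).
        if not page.get("has_more") or not items:
            return
        after = items[-1]["id"]
```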

New tests (30 added, 1989 total):
- EvalChecks: NonEmpty, ContainsExpected (pass/fail/skip/case),
  HasImageContent, ToolCallsPresent
- FoundryEvalConverter: ConvertMessage (text, image, function call,
  function results fan-out, empty fallback, mixed content),
  ConvertEvalItem, BuildTestingCriteria (quality/agent/tool/groundedness
  data mappings), BuildItemSchema

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix review: null-refs, Data.ToString() bug, ContainsExpected, add tests

- Fix NullReferenceException in sample Response display (pattern matching)
- Fix WorkflowEvaluationExtensions Data?.ToString() producing type names
  instead of message text (pattern-match ChatMessage/AgentResponse/list)
- Change EvalChecks.ContainsExpected to return Passed=false when no
  ExpectedOutput (was silently passing, masking misconfiguration)
- Add EvalItem constructor tests with LastTurn/Full/null splitters
- Add FoundryEvalConverter.ConvertMessage DataContent (base64 image) test
- Add ExtractAgentData tests with ChatMessage, list, and AgentResponse data

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix review: conversation fidelity, eval caching, fallback tests

- WorkflowEvaluationExtensions: preserve full response messages (tool calls,
  intermediate) instead of synthetic 2-message conversation. Cast completed
  Data to AgentResponse and use Messages when available, fallback to text.
- FoundryEvals: cache evalId per schema shape (hasContext, hasTools) so
  subsequent EvaluateAsync calls create runs under the same eval definition.
- MeaiEvaluatorAdapter: code already correctly passes queryMessages (not full
  conversation) to IEvaluator — no change needed, verified by inspection.
- Add tests: AgentResponse full messages preservation, unknown object
  ToString() fallback for ExtractAgentData.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Rename AzureAI→Foundry: move eval files, update references

- Move FoundryEvals.cs and FoundryEvalConverter.cs from
  Microsoft.Agents.AI.AzureAI to Microsoft.Agents.AI.Foundry
- Update namespace from AzureAI to Foundry in both files
- Add explicit usings required by Foundry project (no implicit usings)
- Move FoundryEvalConverter tests to Foundry.UnitTests project
  (avoids ReplacingRedactor type conflict from dual project refs)
- Update all sample csproj references and using statements
- Remove Foundry project reference from AI UnitTests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* PR review round 4: wire up tool extraction, remove eval cache, fix null safety

- BuildEvalItem: extract tools from agent via GetService<ChatOptions>() into EvalItem.Tools (Python parity)
- FoundryEvals: remove eval ID cache - each call creates fresh definition (matches Python behavior)
- FoundryEvals: replace null-forgiving operators with descriptive InvalidOperationException
- MixedProviders sample: remove unnecessary explicit PackageReferences (transitively provided)
- FoundryEvalConverter: document that tool results take precedence over text content
- Add LocalEvaluator zero-checks test documenting 0 metrics = failed behavior

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python-dotnet parity: 9 feature gaps filled

New checks:
- ToolCallArgsMatch() — verify tool call names + argument subset match
- ToolCalledCheck(ToolCalledMode.Any, ...) — match any of the specified tools
- ToolCalledMode enum (All/Any)
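
As a rough Python illustration of "argument subset match": a call matches when the
tool name is equal and every expected argument appears in the actual call with the
same value, while extra actual arguments are ignored. The exact semantics of
ToolCallArgsMatch() may differ; this function and its parameters are hypothetical.

```python
from typing import Any, Mapping


def tool_call_matches(
    expected_name: str,
    expected_args: Mapping[str, Any],
    actual_name: str,
    actual_args: Mapping[str, Any],
) -> bool:
    """True if names match and expected_args is a subset of actual_args."""
    if expected_name != actual_name:
        return False
    return all(
        key in actual_args and actual_args[key] == value
        for key, value in expected_args.items()
    )
```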

FoundryEvals enhancements:
- Default evaluators now [Relevance, Coherence, TaskAdherence] (was Relevance, Coherence)
- Auto-add ToolCallAccuracy when items have tool definitions
- EvaluateTracesAsync — evaluate by response_ids, trace_ids, or agent_id
- EvaluateFoundryTargetAsync — evaluate deployed Foundry targets

Result type enrichment:
- AgentEvaluationResults: added Status, Error, PerEvaluator, DetailedItems
- New EvalItemResult/EvalScoreResult/PerEvaluatorResult types
- FoundryEvals populates all new fields from API responses

Workflow fix:
- Skip internal executors (_*, input-conversation, end-conversation, end)

Tests: 8 new tests covering ToolCallArgsMatch, ToolCalledMode.Any, internal executor filtering

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add MeaiEvaluatorAdapter and PerTurnItems edge case tests

- 3 tests for MeaiEvaluatorAdapter: query message forwarding, synthetic
  response fallback, multiple items aggregation
- 3 tests for EvalItem.PerTurnItems: empty conversation, no user messages,
  system+assistant only
- StubEvaluator and StubChatClient test helpers

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Blocking link check for outdated package in DevUI.

* Replace Dictionary<string, object> payloads with typed wire models

Introduce internal FoundryEvalWireModels.cs with compile-time-safe types
for the OpenAI Evals API wire format. The OpenAI .NET SDK (2.9.1) only
provides protocol-level methods with BinaryContent/ClientResult — no
typed request models. These internal models replace scattered dictionary
literals with [JsonPropertyName]-annotated classes, giving:

- Compile-time safety (typos become build errors)
- Single point of change when the API evolves
- IntelliSense discoverability
- Cleaner serialization via JsonPolymorphic for content items

Models: WireContentItem hierarchy (text, image, tool_call, tool_result),
WireMessage, WireEvalItemPayload, WireTestingCriterion, WireItemSchema,
WireCreateEvalRequest, WireCreateRunRequest, and data source variants.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Skip metric when Foundry returns neither score nor passed

When an evaluator returns no score and no passed value, the previous
code created BooleanMetric(name, false), which falsely failed items
via ItemPassed. Now we skip the MEAI metric entirely for indeterminate
results — the raw data remains available in DetailedItems for diagnostics.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address PR #4914 review comments: fix tool evaluator bug and add tests

- Fix duplicate ToolCallAccuracy: resolve evaluator names before checking
  against ToolEvaluators set (Comment 2)
- Make FilterToolEvaluators internal for testability; add tests for the
  ArgumentException edge case when all evaluators are tool-type (Comment 3)
- Add CancellationToken test for LocalEvaluator (Comment 4)
- Add EvaluateAsync integration test on Run with sequential workflow and
  per-agent SubResults verification (Comment 5)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address Peter's review comments on PR #4914

- Add trailing newline to Evaluation_FoundryQuality.csproj (Comment 6)
- Make evaluator name lookups case-insensitive: switch BuiltinEvaluators,
  ToolEvaluators, AgentEvaluators, and ResolveEvaluator's StartsWith check
  from Ordinal to OrdinalIgnoreCase (Comment 7)
- Add Trace.TraceWarning when Foundry returns fewer results than submitted
  items, indicating expected vs actual count before padding (Comment 8)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add Microsoft.Extensions.AI.Evaluation packages to Directory.Packages.props

These were removed in #5269 as unused, but are needed by the Foundry
and core evaluation integration added in this PR.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: alliscode <bentho@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Ben Thomas · 2026-04-16 19:40:07 +00:00

aee1acbf8b

Python: bump misc-integration retry delay to 30s (#5293 )

The misc-integration job (Anthropic, Ollama, MCP) frequently fails on merge to main when the upstream MCP server (e.g. learn.microsoft.com/api/mcp) returns a transient rate-limit error. The previous 5s retry delay is too short to ride out the upstream backoff window, so all retries fail and the merge queue is blocked. Bumping to 30s gives the upstream a chance to recover before pytest-retry re-runs the test.

Evan Mattson · 2026-04-16 10:03:00 +09:00

f112150cfb

Add missing path to verify-samples run checkout (#5194 )

westey · 2026-04-13 11:00:31 +00:00

39b560f83c

Python: Stop emitting duplicate reasoning content from OpenAI response.reasoning_text.done and response.reasoning_summary_text.done events (#5162 )

* Fix reasoning text done events duplicating streamed delta content (#5157)

The OpenAI Responses API sends both reasoning_text.delta (incremental
chunks) and reasoning_text.done (full accumulated text) events. The
chat client was emitting Content for both, causing ag-ui to append the
full done text onto already-accumulated delta text, producing
duplicated reasoning output.

Stop emitting Content for reasoning_text.done and
reasoning_summary_text.done events, matching how output_text.done is
already handled (not emitted). The deltas contain all the content;
the done event is redundant.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(openai): emit reasoning done content as fallback when no deltas observed (#5157)

Address PR review feedback:
- Track item_ids that received reasoning deltas via seen_reasoning_delta_item_ids set
- Emit content from done events only when no deltas were received for the
  item_id, preventing silent content loss on stream resumption
- Add comment documenting code_interpreter done event asymmetry
- Replace redundant ag-ui test with deduplication-focused test
- Add integration test for delta+done sequence in OpenAI chat client tests
- Add fallback path tests for done events without preceding deltas
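
A minimal sketch of the delta-tracking fallback described above. The event dictionaries and the `reasoning_contents` helper are assumptions made for illustration; only the `seen_reasoning_delta_item_ids` name and the event type strings come from the commit text.

```python
from typing import Iterable, Iterator


def reasoning_contents(events: Iterable[dict]) -> Iterator[str]:
    """Yield reasoning text once per item, preferring streamed deltas."""
    seen_reasoning_delta_item_ids: set[str] = set()
    for event in events:
        kind = event["type"]
        item_id = event["item_id"]
        if kind == "response.reasoning_text.delta":
            # Remember that this item already streamed its content.
            seen_reasoning_delta_item_ids.add(item_id)
            yield event["delta"]
        elif kind == "response.reasoning_text.done":
            # The done event repeats the accumulated text, so emit it
            # only when no delta was observed for this item.
            if item_id not in seen_reasoning_delta_item_ids:
                yield event["text"]


events = [
    {"type": "response.reasoning_text.delta", "item_id": "a", "delta": "He"},
    {"type": "response.reasoning_text.delta", "item_id": "a", "delta": "llo"},
    {"type": "response.reasoning_text.done", "item_id": "a", "text": "Hello"},
    {"type": "response.reasoning_text.done", "item_id": "b", "text": "Bye"},
]
result = list(reasoning_contents(events))
```

Item `a` contributes only its deltas, while item `b`, which had no deltas, falls back to the text of its done event.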

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address review feedback for #5157: Python: [Bug]: "type": "response.reasoning_text.delta" and "response.reasoning_text.done" both get exposed as "text_reasoning"

* Fix AG-UI reasoning streaming to use proper Start/End pattern (#5157)

_emit_text_reasoning now follows the same streaming pattern as _emit_text:
- Emits ReasoningStartEvent/ReasoningMessageStartEvent only on the first
  delta for a given message_id
- Emits only ReasoningMessageContentEvent for subsequent deltas
- Defers ReasoningMessageEndEvent/ReasoningEndEvent until
  _close_reasoning_block is called (on content type switch or end-of-run)

This produces the correct protocol pattern:
  ReasoningStartEvent
    ReasoningMessageStartEvent
    ReasoningMessageContentEvent(delta1)
    ReasoningMessageContentEvent(delta2)
    ReasoningMessageEndEvent
  ReasoningEndEvent

Instead of wrapping every delta in a full Start→End sequence.

Backward compatibility is preserved: calling _emit_text_reasoning without
a flow argument still produces the full sequence per call.
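
The deferred Start/End pattern can be sketched as a small state holder. The `ReasoningStream` class and the tuple-based events below are hypothetical; they stand in for the real event classes and flow object, which the commit message does not show.

```python
class ReasoningStream:
    """Tracks the open reasoning block so Start/End are emitted once."""

    def __init__(self) -> None:
        self.open_message_id = None

    def emit_delta(self, message_id: str, delta: str) -> list:
        events = []
        if self.open_message_id != message_id:
            # First delta for this message: close any previous block,
            # then open a new one.
            events.extend(self.close())
            events.append(("ReasoningStartEvent", message_id))
            events.append(("ReasoningMessageStartEvent", message_id))
            self.open_message_id = message_id
        # Every delta, first or later, carries exactly one content event.
        events.append(("ReasoningMessageContentEvent", delta))
        return events

    def close(self) -> list:
        """Emit the deferred End events, if a block is open."""
        if self.open_message_id is None:
            return []
        message_id = self.open_message_id
        self.open_message_id = None
        return [
            ("ReasoningMessageEndEvent", message_id),
            ("ReasoningEndEvent", message_id),
        ]


stream = ReasoningStream()
emitted = stream.emit_delta("m1", "delta1") + stream.emit_delta("m1", "delta2") + stream.close()
names = [name for name, _ in emitted]
```

Here `close()` plays the role the message assigns to `_close_reasoning_block`: it runs on a content type switch or at end-of-run.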

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix import ordering lint error in AG-UI test file (#5157)

Move inline import of TextMessageContentEvent to the top-level import
block and ensure alphabetical ordering to satisfy ruff I001 rule.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix mypy error: rename loop variable to avoid type conflict with WorkflowEvent

The 'event' variable was already typed as WorkflowEvent[Any] from the
async for loop at line 590. Reusing it in the _close_reasoning_block
loop (which returns list[BaseEvent]) caused an incompatible assignment
error. Renamed to 'reasoning_evt' to avoid the conflict.

Fixes #5162

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address review feedback for #5157: review comment fixes

* narrow test result reporting to explicit pytest JUnit XML

* Fix test args

* Fix pytest-results-action in merge workflow and remove committed test artifacts

Apply the same JUnit XML fix from python-tests.yml to python-merge-tests.yml:
add --junitxml=pytest.xml to all test commands and narrow the results action
path from ./python/**.xml to ./python/pytest.xml. Also remove accidentally
committed pytest.xml and python-coverage.xml and add them to .gitignore.

---------

Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Evan Mattson · 2026-04-09 22:44:59 +00:00

5e8fe0be1f

VerifySamples: Filter projects to net10 only (#5184 )

westey · 2026-04-09 16:43:54 +00:00

8348584ac2

.NET: Improve resilience of verify-samples by building separately and improving evaluation instructions (#5151 )

* Improve resilience of verify-samples by building separately and improving evaluation instructions

* Address PR comments

* Address PR comment

westey · 2026-04-09 11:25:00 +00:00

6d6cb840ae

.NET: Add github actions workflow for verify-samples (#5034 )

* Add github actions workflow for verify-samples

* Make workflow run as part of PR (for now)

* Update workflow to remove pr trigger

* Address PR comments

westey · 2026-04-03 09:58:24 +00:00

e4defadc79

Python: [BREAKING] Python: move Azure AI embeddings to Foundry (#5056 )

* renamed AzureAIInferenceEmbeddings and lazy load azure-cosmos and env var rename

* updated coverage

* fix readme

Eduard van Valkenburg · 2026-04-02 11:26:35 +00:00

95fd5ec658

Python: Move workflow-samples and agent-samples under declarative-agents directory (#5011 )

* Move workflow-samples and agent-samples under declarative-agents and update all references

Agent-Logs-Url: https://github.com/microsoft/agent-framework/sessions/f70f7d19-9256-4eec-b7db-28007d74440c

Co-authored-by: sphenry <6749825+sphenry@users.noreply.github.com>

* Fix relative paths in README files inside moved directories

Agent-Logs-Url: https://github.com/microsoft/agent-framework/sessions/f70f7d19-9256-4eec-b7db-28007d74440c

Co-authored-by: sphenry <6749825+sphenry@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: sphenry <6749825+sphenry@users.noreply.github.com>
Co-authored-by: Shawn Henry <shahen@microsoft.com>

Copilot · 2026-04-02 09:34:33 +00:00

fd253c0b0e

Python: Fix SK migration samples (#5047 )

* Fix SK migration samples

* Fix env vars for SK

* Hard code model for shell tool samples

Tao Chen · 2026-04-02 08:40:34 +00:00

3d87cec304

Python: [BREAKING] Standardize model selection on model (#4999 )

* Refactor Anthropic model option and provider clients

Rename the Anthropic client model option from model_id to model, add provider-specific Anthropic wrappers for Foundry, Bedrock, and Vertex, and expose them through the Anthropic, Foundry, Amazon, and Google namespaces. Update core option handling, docs, samples, and tests accordingly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix Anthropic skills sample typing

Cast the Anthropic beta client to Any in the skills sample so the pre-commit sample pyright check no longer fails on beta skills and files endpoints that are not exposed by the current SDK stubs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* undo sample mypy

* Retry CI after transient external failures

Retrigger PR validation after an unrelated Copilot review workflow SAML failure and a transient external tau2 git fetch failure in the Windows Python test setup.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address review feedback on model option merging

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address Anthropic compatibility review feedback

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* moved all to `model`

* fixes for azure ai search

* Python: standardize remaining sample env var names

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: fix foundry-local pyright compatibility

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* updated env vars in cicd

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Eduard van Valkenburg · 2026-04-01 19:00:18 +00:00

6acab3d1d6

Python: Enforce Foundry package unit test coverage (#5036 )

* Enforce Foundry package unit test coverage

* Sort ENFORCED_TARGETS alphabetically in python-check-coverage.py

Agent-Logs-Url: https://github.com/microsoft/agent-framework/sessions/ed0b81ed-c267-4ee0-9655-56c4b3066fad

Co-authored-by: TaoChenOSU <12570346+TaoChenOSU@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: TaoChenOSU <12570346+TaoChenOSU@users.noreply.github.com>

Tao Chen · 2026-04-01 17:37:27 +00:00

95550dd0dc

Python: [BREAKING] Remove deprecated Python OpenAI/Azure AI surfaces (#4990 )

* [BREAKING] Remove deprecated Python OpenAI/Azure AI surfaces

Also clean up follow-on docs, environment guidance, package metadata, and lab test stability.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix deleted semantic-kernel sample links

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address PR review feedback

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* improve foundry language

* Fix A2A Foundry sample regression

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Eduard van Valkenburg · 2026-03-31 20:36:21 +00:00

3a49b1d6dd

Python: Fix samples (#4980 )

* First samples 1st batch

* Fix sample paths

* Fix workflow samples

* Fix workflow dependency

* Correct env vars

* Increase idle timeout

* Fix workflows HIL sample

* Fix more workflow samples

Tao Chen · 2026-03-31 15:20:35 +00:00

016daf3b98

Python: [BREAKING] Remove deprecated kwargs compatibility paths (#4858 )

* [BREAKING] Remove deprecated kwargs compatibility paths

Remove the deprecated kwargs compatibility shims across core agents, clients, tools, middleware, and telemetry.

Keep workflow kwargs behavior intact in this branch and follow up separately in #4850.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix PR CI fallout for kwargs removal

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address PR review feedback

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* updates

* Fix Azure AI CI fallout

Remove the stale _get_current_conversation_id override from the Azure AI client after the OpenAI base helper was deleted.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fixed new classes

* Fix Assistants deprecated import gating

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix integration replay regressions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Switch multi-agent hosting samples to Azure chat completions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Simplify Azure multi-agent sample config

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Eduard van Valkenburg · 2026-03-27 21:00:12 +00:00

b1b528e4a8

[BREAKING] Python: fix OpenAI Azure routing and provider samples (#4925 )

* Python: fix OpenAI Azure routing and provider samples

Prefer OpenAI when OPENAI_API_KEY is present unless Azure is explicitly requested. Clarify constructor docs, keep deprecated Azure wrappers compatible with stricter settings validation, and refresh the provider samples and tests to use the current client patterns.
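
One way to read the routing rule above, as a sketch only. The `choose_backend` function, its `azure_endpoint` argument, and the `AZURE_OPENAI_ENDPOINT` variable are assumptions; the commit message names only `OPENAI_API_KEY` and does not say how Azure is explicitly requested.

```python
from typing import Mapping, Optional


def choose_backend(env: Mapping[str, str], azure_endpoint: Optional[str] = None) -> str:
    """Return "azure" or "openai" for a hypothetical generic client."""
    if azure_endpoint:
        # An explicit Azure argument always wins.
        return "azure"
    if env.get("OPENAI_API_KEY"):
        # An OpenAI key in the environment takes precedence
        # over any Azure settings that are also present.
        return "openai"
    if env.get("AZURE_OPENAI_ENDPOINT"):
        return "azure"
    raise ValueError("no OpenAI or Azure OpenAI configuration found")
```

Passing the environment as a mapping keeps the function easy to test; a caller could hand it `os.environ`.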

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix bandit

* Python: align OpenAI embedding Azure routing

Extend the shared OpenAI-vs-Azure routing and credential behavior to the embedding client, add Azure embedding regression coverage, and refresh the embedding samples to use the generic client path.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: fix embedding client pyright check

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: thin OpenAI embedding wrapper

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: document embedding overload routing

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: fix callable OpenAI key routing

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: fix Azure credential routing tests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: address OpenAI review feedback

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: narrow Azure routing markers

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: refine OpenAI model fallback order

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: narrow Azure deployment docs

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: remove embedding routing wording

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: run embedding Azure integration tests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* changed variable name

* Python: expand OpenAI package README

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* clarified readme

* Python: fix Azure OpenAI integration setup

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: correct Azure integration env mapping

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* updated code to fix int tests

* test updates

* test fix

* fix test setup

* updates to tests and setup

* remove openai assistants int tests

* improvements in int tests

* fix env var

* fix env vars

* fix azure responses test

* trigger actions

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Eduard van Valkenburg · 2026-03-27 13:33:39 +00:00

cc0cfaaac8

Python: [BREAKING] Python: Provider-leading client design & OpenAI package extraction (#4818 )

* Python: Provider-leading client design & OpenAI package extraction

Major refactoring of the Python Agent Framework client architecture:

- Extract OpenAI clients into new `agent-framework-openai` package
- Core package no longer depends on openai, azure-identity, azure-ai-projects
- Rename clients for discoverability: OpenAIResponsesClient → OpenAIChatClient,
  OpenAIChatClient → OpenAIChatCompletionClient
- Unify `model_id`/`deployment_name`/`model_deployment_name` → `model` param
- New FoundryChatClient for Azure AI Foundry Responses API
- New FoundryAgent/FoundryAgentClient for connecting to pre-configured Foundry agents
- Remove OpenAIBase/OpenAIConfigMixin from non-deprecated client MRO
- Deprecate AzureOpenAI* clients, AzureAIClient, OpenAIAssistantsClient
- Reorganize samples: azure_openai+azure_ai+azure_ai_agent → azure/
- ADR-0020: Provider-Leading Client Design

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: missing Agent imports in samples, .model_id → .model in foundry_local sample

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: CI failures — mypy errors, coverage targets, sample imports

- azure-ai mypy: add type ignores for TypedDict total=, model arg, forward ref
- Coverage: replace core.azure/openai targets with openai package target
- project_provider: add type annotation for opts dict

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: populate openai .pyi stub, fix broken README links, coverage targets

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fixes

* updated observability

* reset azure init.pyi

* fix errors

* updated adr number

* fix foundry local

* fixed docstrings and comments that had not been renamed, and added deprecated markers to old classes

* fix tests and pyprojects

* fix test vars

* updated function tests

* update durable

* updated test setup for functions

* Fix Foundry auth in workflow samples

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Stabilize Python integration workflows

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Update hosting samples for Foundry

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Trigger full CI rerun

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Trigger CI rerun again

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* trigger rerun

* trigger rerun

* fix for litellm

* undo durabletask changes

* Move Foundry APIs into foundry namespace

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix Foundry pyproject formatting

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Split provider samples by Foundry surface

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Restore hosting sample requirements

Also fix the Foundry Local sample link after the provider sample move.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* updated tests

* updated foundry integration tests

* removed dist from azurefunctions tests

* Use separate Foundry clients for concurrent agents

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix client setup in azfunc and durable

* disabled two tests

* updated setup for some function and durable tests

* improved azure openai setup with new clients

* ignore deprecated

* fixes

* skip 11

* remove openai assistants int tests

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Eduard van Valkenburg · 2026-03-25 09:56:29 +00:00

5e056b672e

Python: Update sample validation scripts (#4870 )

* Update sample validation scripts

* Adjust prompt

* Update autogen-migration samples

* Add fix suggestion

* Split jobs

* Add .env

* Create trend report

* Add timestamp

* Add more env vars

* Comments

* force node24

* force node24

* force node22

Tao Chen · 2026-03-25 01:21:32 +00:00

4b533608b6

Bump actions/download-artifact from 7 to 8 (#4372 )

Bumps [actions/download-artifact](https://github.com/actions/download-artifact) from 7 to 8.
- [Release notes](https://github.com/actions/download-artifact/releases)
- [Commits](https://github.com/actions/download-artifact/compare/v7...v8)

---
updated-dependencies:
- dependency-name: actions/download-artifact
  dependency-version: '8'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

dependabot[bot] · 2026-03-23 21:55:19 +00:00

01aaf2baea

Update script to ping only on waiting-for-author label (#4812 )

* Update script to ping only on the waiting-for-author label

* Update .github/scripts/stale_issue_pr_ping.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update .github/scripts/stale_issue_pr_ping.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Fix docstring

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Evan Mattson · 2026-03-20 19:39:22 +09:00

8edcb282f4

Add automated stale issue and PR follow-up ping workflow (#4776 )

* Add script to ping on stale issues/PRs

* Add script to ping on stale issues/PRs

* Fix stale issue/PR ping script review comments

- Rename TEAM_NAME env var to TEAM_SLUG for clarity
- Add actionable error messages for 403/404 team lookup failures
- Add contents:read permission for actions/checkout
- Use github.event.inputs context with fallback for scheduled runs
- Pin PyGithub to 2.6.0 for reproducible builds
- Fetch comments once in should_ping() to reduce API calls
- Make ping() retry loop idempotent (track comment/label state)
- Validate DAYS_THRESHOLD with helpful error for non-numeric input
- Fix timezone bug: use astimezone() instead of replace(tzinfo=)
- Add comprehensive unit tests (29 tests)
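
The timezone item in the list above refers to a common pitfall. The commit message does not show the affected code, so the following is a generic illustration of the difference between the two `datetime` calls, with made-up timestamps.

```python
from datetime import datetime, timedelta, timezone

# An aware timestamp at 23:30 in UTC-05:00, which is 04:30 UTC the next day.
local = datetime(2026, 3, 19, 23, 30, tzinfo=timezone(timedelta(hours=-5)))

# replace() keeps the wall-clock digits and swaps the label,
# so the result names a different instant.
relabelled = local.replace(tzinfo=timezone.utc)

# astimezone() keeps the instant and recomputes the wall-clock digits.
converted = local.astimezone(timezone.utc)

print(relabelled.isoformat())  # 2026-03-19T23:30:00+00:00
print(converted.isoformat())   # 2026-03-20T04:30:00+00:00
```

In an age calculation such as a days-since-last-activity check, the relabelled value would be off by the UTC offset, which can shift an item across the threshold.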

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Evan Mattson · 2026-03-20 00:41:31 +00:00

1272ec5adf

Python: Simplify Python Poe tasks and unify package selectors (#4722 )

* updated automation tasks and commands, with alias for the time being

* Restore aggregate test exclusions

Preserve the legacy all-tests scope for test --all by excluding lab and devui from the default aggregate sweep, while still allowing explicit package selection. Also ignore hidden/generated test directories such as .mypy_cache during aggregate discovery.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* updated versions in pre-commit

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Eduard van Valkenburg · 2026-03-18 18:39:11 +00:00

f48c4512d3

Bump actions/upload-artifact from 4 to 7 (#4373 )

Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4 to 7.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](https://github.com/actions/upload-artifact/compare/v4...v7)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-version: '7'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

dependabot[bot] · 2026-03-17 16:05:55 +00:00

008fe23585

Bump MishaKav/pytest-coverage-comment from 1.2.0 to 1.6.0 (#4543 )

Bumps [MishaKav/pytest-coverage-comment](https://github.com/mishakav/pytest-coverage-comment) from 1.2.0 to 1.6.0.
- [Release notes](https://github.com/mishakav/pytest-coverage-comment/releases)
- [Changelog](https://github.com/MishaKav/pytest-coverage-comment/blob/main/CHANGELOG.md)
- [Commits](https://github.com/mishakav/pytest-coverage-comment/compare/v1.2.0...v1.6.0)

---
updated-dependencies:
- dependency-name: MishaKav/pytest-coverage-comment
  dependency-version: 1.6.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

dependabot[bot] · 2026-03-17 16:04:37 +00:00

6af0511e2b

Bump danielpalme/ReportGenerator-GitHub-Action from 5.5.1 to 5.5.3 (#4542 )

Bumps [danielpalme/ReportGenerator-GitHub-Action](https://github.com/danielpalme/reportgenerator-github-action) from 5.5.1 to 5.5.3.
- [Release notes](https://github.com/danielpalme/reportgenerator-github-action/releases)
- [Commits](https://github.com/danielpalme/reportgenerator-github-action/compare/5.5.1...5.5.3)

---
updated-dependencies:
- dependency-name: danielpalme/ReportGenerator-GitHub-Action
  dependency-version: 5.5.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

dependabot[bot] · 2026-03-17 16:04:20 +00:00

6dbb0a5bb4

Bump actions/setup-dotnet from 5.1.0 to 5.2.0 (#4541 )

Bumps [actions/setup-dotnet](https://github.com/actions/setup-dotnet) from 5.1.0 to 5.2.0.
- [Release notes](https://github.com/actions/setup-dotnet/releases)
- [Commits](https://github.com/actions/setup-dotnet/compare/v5.1.0...v5.2.0)

---
updated-dependencies:
- dependency-name: actions/setup-dotnet
  dependency-version: 5.2.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

dependabot[bot] · 2026-03-17 16:04:07 +00:00

21af304c7d

Python: chore(python): improve dependency range automation (#4343 )

* chore(python): improve dependency range automation

- tighten dependency bounds and coding standards guidance
- add dependency range validation workflow, reporting, and issue automation
- update related tests and dependency pins for compatibility

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* updated text and pyarrow

* new lock

* fixed workflow

* updated deps

* fix tiktoken

* chore(python): refine dependency validation workflows

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs(python): add high-level dependency validation comments

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* WIP

* added additional comments and excludes

* added dev dependency handling and workflow and updates to package ranges

* added readme and simplified commands

* fix markers

* chore(python): address dependency review feedback

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Tighten dependency bounds, remove stale overrides, restore Python 3.10 support

- Apply dependency bound policy across all packages: stable >=1.0 deps use
  >=floor,<next_major; pre-1.0/prerelease deps use validated hard-bounded ranges
- Remove stale root tool.uv.override-dependencies (uvicorn, websockets, grpcio)
- Lower github_copilot requires-python to >=3.10 with github-copilot-sdk gated
  behind python_version >= 3.11 marker; import raises ImportError on 3.10
- Skip github_copilot pyright/mypy/test tasks on Python <3.11
- Use version-conditional pyrightconfig for samples on Python 3.10
- Add compatibility fix in core responses client for older openai typed dicts
- Normalize uv.lock prerelease mode and refresh dev dependencies
- Update CODING_STANDARD.md, DEV_SETUP.md, and package management skill docs
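
The `>=floor,<next_major` rule for stable dependencies from the list above can be sketched as a small helper. The `stable_range` function is hypothetical and is not part of the repository's tooling; it only shows how the upper bound follows from the floor.

```python
def stable_range(floor: str) -> str:
    """Build a '>=floor,<next_major' specifier for a stable (>=1.0) dependency."""
    major = int(floor.split(".")[0])
    if major < 1:
        # Pre-1.0 dependencies follow the separate hard-bounded policy.
        raise ValueError(f"{floor} is pre-1.0; use a validated hard-bounded range")
    return f">={floor},<{major + 1}"


print(stable_range("1.5.0"))  # >=1.5.0,<2
```

Pre-1.0 and prerelease dependencies are rejected here because the policy gives them validated hard-bounded ranges instead, which cannot be derived from the floor alone.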

Closes #902

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* small tweaks

* add note in workflow

* fix workflows and several versions

* fix duplicate

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Eduard van Valkenburg · 2026-03-13 12:32:37 +00:00

50fdcbaf57

214 Commits