Commit Graph

214 Commits

  • ci: pin third-party GitHub Actions to commit SHAs (#5972)
    Replaces every floating tag in our workflow and composite action files
    with an immutable 40-character commit SHA, keeping the original `# vX`
    comment so Dependabot can still propose version bumps. 186 occurrences
    across 25 workflows and 2 composite actions.
    
    Also widens the github-actions Dependabot entry to use the plural
    `directories` key with `/.github/actions/*` so composite actions under
    `.github/actions/<name>/action.yml` are kept up to date. Previously
    Dependabot only scanned `.github/workflows` and the repo-root
    `action.yml`, leaving our `python-setup` and `sample-validation-setup`
    composite actions unmaintained.
  • Python: Bump Python package versions for a release (#5964)
    * Bump Python package versions to 1.5.0 for a release
    
    * Promote orchestrations to 1.0.0rc1
    
    * ci(python-setup): merge dynamic exclude into existing workspace exclude
    
    The python-setup action injected exclude = [...] verbatim into
    [tool.uv.workspace], producing a duplicate 'exclude' key when the
    section already had a static exclude. Scope the rewrite to the
    [tool.uv.workspace] section and append the package to the existing
    array when present; idempotent if the package is already excluded.
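
    A minimal sketch of the merge behavior described above, written in Python for
    illustration only. The function name `add_workspace_exclude` is hypothetical
    and this is not the code in the python-setup action; it also assumes the
    existing `exclude` array sits on a single line.

```python
import re

SECTION = "[tool.uv.workspace]"

def add_workspace_exclude(pyproject_text: str, package: str) -> str:
    """Append `package` to the exclude array of [tool.uv.workspace] only."""
    lines = pyproject_text.splitlines()
    try:
        start = next(i for i, line in enumerate(lines) if line.strip() == SECTION)
    except StopIteration:
        return pyproject_text  # no workspace section: leave the file untouched

    # The section ends at the next table header or at end of file.
    end = next(
        (i for i in range(start + 1, len(lines)) if lines[i].lstrip().startswith("[")),
        len(lines),
    )

    for i in range(start + 1, end):
        match = re.match(r"^(\s*exclude\s*=\s*)\[(.*)\]\s*$", lines[i])
        if match:
            entries = re.findall(r'"([^"]*)"', match.group(2))
            if package in entries:
                return pyproject_text  # idempotent: already excluded
            entries.append(package)
            lines[i] = match.group(1) + "[" + ", ".join(f'"{e}"' for e in entries) + "]"
            break
    else:
        # No static exclude in this section: add one right after the header.
        lines.insert(start + 1, f'exclude = ["{package}"]')

    trailing_newline = "\n" if pyproject_text.endswith("\n") else ""
    return "\n".join(lines) + trailing_newline
```

    Scoping the search to the lines between the section header and the next
    table header is what keeps an `exclude` key in an unrelated table from being
    rewritten or duplicated.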
    
    * Address Copilot review feedback: raise inter-package floors to 1.5.0
    
    - foundry, foundry-local: agent-framework-openai >=1.4.0 -> >=1.5.0
    - azure-contentunderstanding: agent-framework-foundry >=1.4.0 -> >=1.5.0
    - azurefunctions: pin agent-framework-durabletask to >=1.0.0b260519,<2
    
    Keeps lockstep cohort consistent and avoids mixed 1.4.x / 1.5.0 installs.
    
    * Re-include azurefunctions and durabletask in the uv workspace
    
    The pinned durabletask>=1.4.0 floor is enough to make resolution succeed;
    the workspace exclude was an over-correction and broke CI samples and pyright
    type-checking (re-exports in agent_framework/azure/__init__.pyi plus
    samples/04-hosting/{azure_functions,durabletask}/ could not resolve their
    imports). Dropping them from agent-framework-core[all] still stands so the
    metapackage does not pull them.
    
    * Restore azurefunctions and durabletask in agent-framework-core[all]
    
    The durabletask floor pin keeps users on the safe 1.4.0, so they are once
    again included in the metapackage. Update CHANGELOG to reflect the pin
    rather than an [all] removal.
    
    * Raise uvicorn ceiling in ag-ui and devui to allow 0.42+
    
    The root override-dependencies pins uvicorn[standard]>=0.34.0 (no upper)
    and the workspace lock resolves to 0.47.0. The package ceiling <0.42.0
    meant the workspace was no longer testing the declared supported range.
    Bump to <1 so the lock fits within the declared bounds.
    
    Also picked up by validate-dependency-bounds: refresh stale orchestrations
    RC pin in devui dev deps.
  • ci(python-setup): drop -U upgrade flag from uv sync (#5961)
    The shared composite action ran `uv sync --all-packages --all-extras
    --dev -U` on every job, which upgrades every dependency to the latest
    compatible version instead of using the pinned versions in `uv.lock`.
    
    That is currently producing a hard resolver failure on every CI job:
    
        No solution found when resolving dependencies for split
        (markers: python_full_version >= '3.11' and sys_platform == 'darwin')
        Because there are no versions of durabletask and
        agent-framework-durabletask depends on durabletask>=1.3.0,<2,
        we can conclude that agent-framework-durabletask's requirements
        are unsatisfiable.
    
    Dropping `-U` makes the install use the workspace lockfile, which is
    what is reproducible locally and what we publish releases against.
    Upgrades should be opt-in (via a scheduled job or a separate workflow)
    rather than implicit on every CI run.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • Replace merge-gatekeeper Docker action with github-script polling (#5533)
    The upsidr/merge-gatekeeper@v1 action is a Dockerfile-based action that
    builds a golang image on every run. On merge_group events the run step
    is conditioned out via `if: github.event_name == 'pull_request'`, so the
    build happens but produces nothing.
    
    Replace with an actions/github-script@v8 polling loop that mirrors the
    action's behavior exactly: merges combined-statuses and check-runs for
    the PR head SHA, with combined-status winning on name collisions, and
    the same conclusion mapping (skipped → dropped, success/neutral →
    success, anything else terminal → error). Same job name, triggers,
    permissions, timeout (3600s), interval (30s), and ignored list, so
    existing required-check rules stay valid.
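
    The merge and mapping rules above can be sketched as pure functions. This is
    a Python illustration, not the github-script code from the PR. The input
    shapes follow the GitHub REST API fields (`context`/`state` for statuses,
    `name`/`status`/`conclusion` for check runs); the handling of commit status
    states in `normalize_status` is an assumption, since the message only spells
    out the check-run conclusion mapping.

```python
def normalize_check_run(run):
    """Map a check run to 'success', 'pending', 'error', or None (dropped)."""
    if run["status"] != "completed":
        return "pending"
    conclusion = run["conclusion"]
    if conclusion == "skipped":
        return None  # skipped runs are dropped from the result set
    if conclusion in ("success", "neutral"):
        return "success"
    return "error"  # any other terminal conclusion

def normalize_status(status):
    """Assumed mapping for commit statuses: failure/error both become 'error'."""
    state = status["state"]
    return state if state in ("success", "pending") else "error"

def merge_checks(statuses, check_runs, ignored=()):
    """Merge both sources by name; combined-status entries win on collisions."""
    merged = {}
    for run in check_runs:
        result = normalize_check_run(run)
        if result is not None:
            merged[run["name"]] = result
    for status in statuses:
        merged[status["context"]] = normalize_status(status)  # overrides check run
    for name in ignored:
        merged.pop(name, None)
    return merged

def verdict(merged):
    """Overall gate result for one polling iteration."""
    values = set(merged.values())
    if "error" in values:
        return "error"
    if "pending" in values:
        return "pending"
    return "success"
```

    A polling loop would call `merge_checks` every interval and stop once
    `verdict` is no longer `"pending"` or the timeout elapses.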
    
    PR runs now poll the API in seconds instead of waiting on a per-run
    docker image build, and merge_group runs become near-instant no-ops.
  • .NET: CI hardening — split Functions tests, re-enable skipped integration tests (#5717)
    * Split DurableTask/AzureFunctions integration tests into dedicated CI job
    
    - Add -TestProjectNameExclude parameter to New-FilteredSolution.ps1
    - Add 'functions' and 'core' path filters to paths-filter job
    - Exclude DurableTask/AzureFunctions from main dotnet-test job
    - Remove emulator setup from dotnet-test (no longer needed)
    - Add new dotnet-test-functions job (ubuntu/net10.0 only, path-conditional)
    - Update merge gate and report job to include dotnet-test-functions
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Address PR feedback: add Workflows.Generators to core filter, drop dotnetChanges gate from functions job
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Re-enable Anthropic integration tests
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Upgrade Anthropic SDK 12.13.0 -> 12.20.0 to fix M.E.AI incompatibility
    
    Fixes MissingMethodException on WebSearchToolResultContent.get_Results()
    caused by Anthropic 12.13.0 being compiled against an older
    Microsoft.Extensions.AI.Abstractions version.
    
    Suppress RT0003 in AI.Abstractions.csproj as the transitive reference
    from the upgraded Anthropic SDK conflicts with the explicit one.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Fix Anthropic unit test mocks for SDK 12.20.0 interface changes
    
    Add missing interface members: IAnthropicClient.WebhookKey,
    IBetaService.MemoryStores, IBetaService.Webhooks, IBetaService.UserProfiles
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Re-enable CheckSystem declarative integration tests
    
    The CheckSystem.yaml tests were temporarily skipped in PR #4270 during
    the Azure.AI.Projects 2.0.0-beta.1 SDK update. Since then, the system
    variable plumbing (SystemScope, SetLastMessageAsync, conversation
    initialization) has been significantly updated and stabilized. The
    other tests in these same files pass reliably using the same
    infrastructure.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Fix CheckSystem test case to expect 1 response
    
    The CheckSystem workflow sends a 'PASSED!' SendActivity when all system
    variables are populated, producing 1 AgentResponseEvent. The test case
    had min_response_count: 0 with no max, so the assertion defaulted max
    to 0 and failed with 'Response count greater than expected: 0 (Actual: 1)'.
    Updated to expect exactly 1 response, matching the SendActivity pattern.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Re-enable Foundry OpenAPI server-side tool integration test
    
    Remove Skip="For manual testing only" from
    AsAIAgent_WithOpenAPITool_NativeSDKCreation_InvokesServerSideToolAsync.
    The test already uses RetryFact(3 retries, 5s delay) to handle
    transient failures from the external restcountries.com API.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Include workflow file in functions/core path filters
    
    A PR editing only dotnet-build-and-test.yml would skip
    dotnet-test-functions because the workflow path was missing
    from both the functions and core path filter lists.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Rename filter parameters for consistency
    
    TestProjectNameFilter  -> TestProjectNameIncludeFilter
    TestProjectNameExclude -> TestProjectNameExcludeFilter
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Remove unnecessary RT0003 warning suppression
    
    The RT0003 suppression was added during the Anthropic SDK 12.20.0
    upgrade but the warning no longer fires. Removing it to keep the
    NoWarn list minimal.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Remove duplicate WebhookKey properties from merge
    
    Both our branch and main added WebhookKey to the Anthropic test
    mock classes, resulting in CS0102 duplicate definition errors.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • Trigger issue triage on bug-labeled issues (#5763)
    * Trigger issue triage on bug-labeled issues instead of manual dispatch
    
    * Address PR feedback: scope concurrency cancellation to bug-label events
  • .NET: Hosted Agents - RAG Sample with Azure AI Search (#5693) (#5701)
    * .NET: Hosted Agents - RAG Sample with Azure AI Search (#5693)
    
    Adds a Hosted-AzureSearchRag sample plus a live Foundry.Hosting integration
    test scenario backed by a real Azure AI Search index.
    
    Sample (Hosted-AzureSearchRag): keyword-only Azure AI Search via
    SearchClient adapter into TextSearchProvider, scope-aware
    DevTemporaryTokenCredential consuming AZURE_BEARER_TOKEN_FOUNDRY +
    AZURE_BEARER_TOKEN_SEARCH for local Docker, Dockerfile + contributor
    Dockerfile mirroring Hosted-TextRag.
    
    Integration test: AzureSearchRagHostedAgentFixture extends the PR #5598
    HostedAgentFixture with the new azure-search-rag scenario branch in the
    shared test container; AzureSearchRagHostedAgentTests asserts the model
    returns canary tokens (TR-CANARY-7821, SHIP-CANARY-4493) that exist only
    in the seeded documents - real proof the agent grounded its answer in
    retrieved content rather than training data.
    
    * Address PR 5701 Copilot review feedback
    
    - Sample README: drop stale 'bootstraps the index on first run' line; index is pre-provisioned out of band
    
    - Sample + TestContainer search adapters: propagate CancellationToken to await foreach via .WithCancellation()
  • .NET: Foundry.Hosting IT - eliminate MSBuild parallel-output races (#5725)
    * .NET: Foundry.Hosted IT - fix MSBuild parallel-output races
    
    Two surgical changes inside the dotnet-foundry-hosted-it job:
    
    1. Replace dotnet build <slnx> -f net10.0 with dotnet build <test.csproj>. The test csproj pins TargetFrameworks=net10.0 and its ProjectReference closure gives MSBuild a single-rooted graph, eliminating the duplicate inner-builds that race on bin/obj. Drops the two New-FilteredSolution.ps1 steps.
    
    2. In it-build-image.ps1, drop the -UsePrebuiltProjectReferences switch and always pass --no-dependencies to dotnet publish. Publish now resolves TestContainer's framework refs by reading prebuilt DLLs and never re-touches them. Replaces the partial-mitigation in PR #5689 with a structural fix.
    
    Local validation confirmed the published Foundry.dll has the same mtime and bytes as the prebuild output.
    
    * .NET: dotnet test - use --project flag for Microsoft Testing Platform
  • .NET: Python: Add dotnet integration test report to CI (#5515)
    * Add dotnet integration test report to CI
    
    - Add --report-junit flag to dotnet integration test step to generate
      JUnit XML alongside TRX, with explicit --results-directory to
      centralize output in IntegrationTestResults/
    - Upload JUnit XML artifacts from each matrix leg (net10.0/ubuntu,
      net472/windows) as dotnet-test-results-{framework}-{os}
    - Add dotnet-integration-test-report job that downloads artifacts,
      runs the existing aggregate.py script, posts markdown to Job Summary,
      and saves trend history via actions/cache
    - Refactor aggregate.py to discover JUnit XML files recursively,
      supporting both pytest (pytest.xml) and xunit (*.junit.xml) layouts
    - Handle provider name derivation for dotnet artifact naming convention
    - Fix nodeid collision when same test runs under multiple frameworks
      by qualifying keys with provider when collisions are detected
    - Improve module extraction for dotnet C# classnames (recognizes
      IntegrationTests/UnitTests namespace segments)
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * chore: trigger dotnet CI for report validation
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * fix: use .junit extension (not .junit.xml) for xunit v3 output
    
    xUnit v3 generates files with .junit extension, not .junit.xml.
    Update upload glob and aggregate.py discovery to match.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * fix: use deterministic provider-qualified keys for dotnet tests
    
    Always prefix dotnet test keys with provider (e.g. net10.0 (ubuntu)::TestName)
    to ensure stable, comparable counts across runs regardless of file parse order.
    Also show Executed (passed+failed) instead of Total in summary table.
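
    A hedged sketch of the discovery and key-qualification steps described in
    this PR, not the actual aggregate.py. The function names are hypothetical,
    and two details are assumptions: the provider label is derived by splitting
    the artifact directory name at the first hyphen after the
    `dotnet-test-results-` prefix, and the test name in the key is taken to be
    `classname.name`.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

ARTIFACT_PREFIX = "dotnet-test-results-"  # artifact naming from the commit message

def discover_reports(root: Path) -> list[Path]:
    """Find pytest.xml and *.junit files at any depth under root."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and (p.name == "pytest.xml" or p.suffix == ".junit")
    )

def provider_from_artifact(dirname: str) -> str:
    """'dotnet-test-results-net10.0-ubuntu' becomes 'net10.0 (ubuntu)'."""
    rest = dirname.removeprefix(ARTIFACT_PREFIX)
    framework, _, os_name = rest.partition("-")
    return f"{framework} ({os_name})" if os_name else framework

def parse_cases(provider: str, xml_text: str) -> dict[str, str]:
    """Return {provider-qualified key: outcome} for every <testcase> element."""
    results = {}
    for case in ET.fromstring(xml_text).iter("testcase"):
        if case.find("failure") is not None or case.find("error") is not None:
            outcome = "failed"
        elif case.find("skipped") is not None:
            outcome = "skipped"
        else:
            outcome = "passed"
        key = f"{provider}::{case.get('classname')}.{case.get('name')}"
        results[key] = outcome
    return results

def collect_results(root: Path) -> dict[str, str]:
    """Aggregate all reports; assumes each one lives under an artifact directory."""
    merged = {}
    for report in discover_reports(root):
        provider = provider_from_artifact(report.relative_to(root).parts[0])
        merged.update(parse_cases(provider, report.read_text(encoding="utf-8-sig")))
    return merged
```

    Because every key always carries the provider prefix, the same test run
    under net10.0 and net472 yields two distinct entries no matter which file
    is parsed first.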
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * fix: match Python report summary format (Total, passed/total, etc.)
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * feat: split dotnet report into per-framework tables
    
    Dotnet tests run on multiple frameworks (net10.0, net472). Instead of
    one combined table with unstable totals, show separate sections per
    framework — each with its own summary row and per-test table. Python
    reports retain the original single-table format.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Re-enable 7 flaky dotnet integration tests with increased timeouts
    
    Increase timeouts to reduce timing-related flakiness in LLM-backed
    integration tests (issue #4971):
    
    - ExternalClientTests: 60s -> 120s default timeout
    - SamplesValidationBase: 60s -> 120s default timeout
    - ConsoleAppSamplesValidation: 90s -> 150s for long-running tests
    - AzureFunctions SamplesValidation: 2min -> 3min orchestration timeout,
      60s -> 90s per-step WaitForConditionAsync timeouts
    
    Remove all Skip=Flaky annotations and unused SkipFlakyTimingTest constants.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Re-skip LLM non-determinism flaky tests, keep timeout fixes
    
    Re-skip SingleAgentOrchestrationHITLSampleValidationAsync and
    LongRunningToolsSampleValidationAsync - these fail due to LLM producing
    extra review notifications, not timeouts. Updated skip reasons to
    accurately describe the root cause. Reverted unnecessary timeout change
    on the skipped LongRunningTools test.
    
    The remaining 5 re-enabled tests with timeout increases are stable.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Enable Anthropic integration tests in CI
    
    Replace hardcoded skip with conditional skip pattern (matching
    CopilotStudio approach): tests gracefully skip when ANTHROPIC_API_KEY
    is missing, and run when present.
    
    Changes:
    - AnthropicChatCompletionFixture: try/catch in InitializeAsync with
      Assert.Skip on missing config (replaces hardcoded SkipReason)
    - AnthropicSkillsIntegrationTests: same pattern per test method
    - dotnet-build-and-test.yml: wire up ANTHROPIC_API_KEY,
      ANTHROPIC_CHAT_MODEL_NAME, and ANTHROPIC_REASONING_MODEL_NAME
      env vars to the integration test step
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Fix missing System using in AnthropicSkillsIntegrationTests
    
    Add 'using System;' for InvalidOperationException in try/catch blocks.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Skip flaky SingleAgentOrchestrationChainingSampleValidationAsync
    
    LLM non-determinism causes Assert.NotNull failures on orchestration
    results. Skip until test logic is hardened against non-deterministic
    LLM responses.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Re-enable HITL and LongRunningTools tests with timeout and flexibility fixes
    
    - Remove Skip attribute from SingleAgentOrchestrationHITLSampleValidationAsync
    - Remove Skip attribute from LongRunningToolsSampleValidationAsync
    - Increase timeout from 120s/90s to 180s to accommodate 2+ LLM round-trips
    - Replace rigid 2-cycle assertion with flexible approval logic that handles
      extra review cycles from LLM non-determinism
    
    Fixes the two failure modes identified in #4971:
    1. Timeout: 120s/90s was insufficient for multiple LLM calls under CI load
    2. Extra notifications: Assert.Fail on 3rd+ review cycle was too rigid
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Increase AzureFunctions LongRunningTools test timeouts from 90s to 180s
    
    The LongRunningToolsSampleValidationAsync test in the AzureFunctions integration
    tests was failing in CI with TimeoutException at the 'Content published
    notification is logged' step. The 90-second timeouts are too tight for CI
    environments where LLM calls and orchestration overhead can be slow.
    
    Increased all three WaitForConditionAsync timeouts from 90s to 180s:
    - Waiting for human feedback notification
    - Waiting for publish notification (the step that was failing)
    - Waiting for orchestration completion
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Merge main and fix dotnet report path after flaky_report rename
    
    Merge upstream/main which renamed scripts/flaky_report/ to
    scripts/integration_test_report/ (from Python PR #5454). Update the
    dotnet-build-and-test workflow to reference the new path.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Add RetryFact to DurableTask and AzureFunctions integration tests
    
    These tests interact with LLMs via stdin/stdout (DurableTask) or HTTP
    (AzureFunctions) and are inherently non-deterministic. Unlike the Python
    side which uses pytest-retry, the dotnet tests had no retry mechanism
    and a single transient failure would fail the entire CI run.
    
    Changes:
    - Switch [Fact] to [RetryFact(2, 5000)] on all LLM-dependent tests
      across ConsoleAppSamplesValidation, ExternalClientTests,
      WorkflowConsoleAppSamplesValidation, and AzureFunctions SamplesValidation
    - Add re-prompt mechanism to LongRunningToolsSampleValidationAsync:
      if the LLM doesn't invoke the tool within 60s, re-send the prompt
      (up to 2 retries) instead of burning the full timeout
    - Reduce LongRunningTools timeout from 240s to 180s (re-prompt makes
      the extra buffer unnecessary)
    - Leave simple/deterministic tests as [Fact] (SingleAgent, unit tests)
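
    The re-prompt idea above can be sketched as follows. This is a Python
    illustration rather than the C# test code; `send_prompt` and `tool_invoked`
    are hypothetical callbacks, and the clock and sleep functions are injectable
    only so the loop can be exercised without real waiting.

```python
import time

def wait_with_reprompt(
    send_prompt,
    tool_invoked,
    reprompt_after=60.0,
    max_reprompts=2,
    overall_timeout=180.0,
    poll_interval=1.0,
    clock=time.monotonic,
    sleep=time.sleep,
):
    """Poll until tool_invoked() is true, re-sending the prompt when it stalls.

    Returns the number of re-prompts that were needed.
    """
    start = clock()
    send_prompt()
    last_prompt = start
    reprompts = 0
    while clock() - start < overall_timeout:
        if tool_invoked():
            return reprompts
        # Re-send instead of burning the whole timeout on an ignored prompt.
        if reprompts < max_reprompts and clock() - last_prompt >= reprompt_after:
            send_prompt()
            reprompts += 1
            last_prompt = clock()
        sleep(poll_interval)
    raise TimeoutError(f"tool not invoked within {overall_timeout}s")
```

    With the defaults shown, the prompt is sent at most three times (the initial
    send plus two retries) before the overall timeout fails the test.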
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Add persist-credentials: false to Integration Test Report checkout step
    
    Matches the convention used by other checkout steps in this workflow
    to avoid leaving GITHUB_TOKEN credentials in the local git config.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * small fixes
    
    * disable anthropic failing tests
    
    ---------
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • .NET: Foundry.Hosting IT: avoid MSB3026 in publish; fix telemetry UT flake (#5689)
    CI publish step: gate the BuildProjectReferences=false fast-path on an explicit -UsePrebuiltProjectReferences switch (passed by the workflow) instead of marker detection. Adds a preflight error when stale obj/Release/net10.0 outputs would cause CS0579, with actionable recovery instructions.
    
    Telemetry UT flake: AgentFrameworkResponseHandlerTelemetryTests was using a plain List<Activity> for OTel's InMemoryExporter. The exporter writes from background Activity completion callbacks while parallel tests on the same global ActivitySource feed every listener, racing against the assertion's enumeration and throwing 'Collection was modified'. Replaced with a small thread-safe ConcurrentActivityList that locks add/enumerate and returns a snapshot for assertions.
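
    The lock-and-snapshot pattern behind ConcurrentActivityList can be sketched
    in Python. This is an analogue for illustration, not the C# class from the
    PR, and the class name `ConcurrentList` is hypothetical. Note that a plain
    Python list does not raise on concurrent modification the way a .NET
    List<T> enumerator does, so the sketch shows the pattern, not the failure.

```python
import threading

class ConcurrentList:
    """Append-only collection that is safe to read while writers are active."""

    def __init__(self):
        self._lock = threading.Lock()
        self._items = []

    def add(self, item):
        # Writers (for example, exporter callbacks) take the lock to append.
        with self._lock:
            self._items.append(item)

    def snapshot(self):
        # Readers get an independent copy, so later adds cannot affect it.
        with self._lock:
            return list(self._items)

    def __iter__(self):
        # Iteration always walks a snapshot, never the live list.
        return iter(self.snapshot())

    def __len__(self):
        with self._lock:
            return len(self._items)
```

    Assertions then run against the snapshot, which stays stable even if
    background callbacks keep adding items during the check.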
  • .NET: Add Foundry.Hosting.IntegrationTests (#5598)
    * Foundry.Hosting.IntegrationTests: scaffold project, fixtures, and 24 tests
    
    Add a new integration test project for Foundry hosted agents alongside the existing Foundry.IntegrationTests project. The project provisions a real Foundry hosted agent per scenario via AgentAdministrationClient.CreateAgentVersionAsync, points it at a single test container image (built and pushed out of band by scripts/it-build-image.ps1 in a follow up commit), and exercises the agent through AIProjectClient.AsAIAgent.
    
    Six scenario fixtures are introduced, each pointing at the same image but selecting behavior via the IT_SCENARIO environment variable on the HostedAgentDefinition:
    - HappyPathHostedAgentFixture (round trip, multi turn, stored=false flag)
    - ToolCallingHostedAgentFixture (server side AIFunctions)
    - ToolCallingApprovalHostedAgentFixture (approval flow)
    - ToolboxHostedAgentFixture (Foundry toolbox)
    - McpToolboxHostedAgentFixture (MCP backed toolbox)
    - CustomStorageHostedAgentFixture (custom storage provider)
    
    24 tests across 6 test classes are scaffolded. All are tagged Skip pending the test container build and the end to end smoke iteration in follow up commits. Once the container is in place the Skip annotations can be removed scenario by scenario.
    
    Adds an IT_HOSTED_AGENT_IMAGE constant to the shared TestSettings so every IT project agrees on the env var name the build script emits.
    
    * Foundry.Hosting.IntegrationTests: add TestContainer, build script, slnx, README
    
    Adds the rest of the integration test infrastructure on top of the previous scaffolding commit:
    
    * Foundry.Hosting.IntegrationTests.TestContainer csproj and Program.cs implementing the multi scenario container (one image, IT_SCENARIO env var dispatches between happy-path, tool-calling, tool-calling-approval, toolbox, mcp-toolbox, and custom-storage). The toolbox, mcp-toolbox, and custom-storage branches are placeholders pending API surface stabilization.
    * Dockerfile and dockerignore in the test container project, using the contributor pattern matching the investigation work (host side dotnet publish, container only does COPY out/).
    * scripts/it-build-image.ps1 with mandatory Registry parameter (no hardcoded ACR), content hashed tags so unchanged source results in a no op push, and emits IT_HOSTED_AGENT_IMAGE for shells and CI to consume.
    * slnx entry for both new projects.
    * README in the IT project covering env vars, image build, scenario table, and current placeholder status.
    
    Steps still pending: end to end smoke (step 5) and CI workflow integration (step 6) require a live Foundry deployment and ACR push, so they land in follow up commits.
    
    * Foundry.Hosting.IntegrationTests: address PR 5598 review feedback
    
    Fix issues raised by Copilot review:
    
    * it-build-image.ps1: hash file contents, not the path list, so any source edit produces a fresh tag. Normalize Registry input by stripping scheme and trailing slash before deriving the ACR short name. Validate the short name is non empty.
    * HostedAgentFixture: route GetAgentAsync through _adminClient (which has the FoundryFeaturesPolicy attached) instead of through _projectClient.AgentAdministrationClient (which does not).
    * HostedAgentFixture FoundryFeaturesPolicy: replace Headers.Add with Remove plus Add so retries cannot accumulate duplicate headers.
    * HappyPath, ToolCalling, ToolCallingApproval, CustomStorage tests: create the AgentSession before turn 1 and reuse it for both turns. The previous pattern created the session after turn 1 so turn 2 had no link to turn 1, defeating the multi turn assertion.
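
    The Registry normalization in the first bullet above can be sketched like
    this, in Python rather than the script's PowerShell. The function name is
    hypothetical, and taking the first DNS label of the login server as the ACR
    short name is an assumption based on the `<name>.azurecr.io` convention.

```python
def acr_short_name(registry: str) -> str:
    """Derive the registry short name from user input such as a URL or host."""
    host = registry.strip()
    if "://" in host:
        host = host.split("://", 1)[1]  # strip the scheme, e.g. https://
    host = host.rstrip("/")             # strip any trailing slash
    short = host.split(".", 1)[0]       # assumed: first DNS label is the name
    if not short:
        raise ValueError(f"could not derive a registry short name from {registry!r}")
    return short
```

    For example, `acr_short_name("https://myregistry.azurecr.io/")` and
    `acr_short_name("myregistry.azurecr.io")` both yield `"myregistry"`, while
    an empty or scheme-only input is rejected instead of producing an empty name.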
    
    * .NET: Foundry.Hosting.IntegrationTests: constrain to net10.0 + dotnet format autofix
    
    - Set <TargetFrameworks>net10.0</TargetFrameworks>: the project references both
      Microsoft.Agents.AI.Foundry.Hosting (net8/9/10 only) and AgentConformance.IntegrationTests
      (net10.0;net472 — inherits the tests-default TFM list). The intersection is net10.0;
      the previous $(TargetFrameworksCore) triple caused NU1702 + System.Text.Json version
      conflicts on the net8.0/net9.0 builds because AgentConformance had no matching asset.
    - Apply `dotnet format` autofix on the test files (IDE0005, IDE0009, IDE0032, IMPORTS).
    
    * .NET: Foundry.Hosting.IntegrationTests.TestContainer/Program.cs: add UTF-8 BOM
    
    CI's check-format requires charset=utf-8-bom per .editorconfig.
    
    * Foundry.Hosting IntegrationTests: wire end-to-end CI flow against hosted agents
    
    Make the integration tests usable end-to-end against a live Foundry deployment, including
    a per-run rebuild of the test container so framework code changes are exercised.
    
    Fixture (HostedAgentFixture.cs)
    
    * Switch from per-run unique agent names to stable scenario-keyed names (it-happy-path,
      it-tool-calling, ...). The agent's managed identity carries the Azure AI User role on
      the project scope, which is required for inbound inference; deleting the agent recycles
      the MI and breaks that role assignment, so we keep the agent across runs and only churn
      versions.
    * Add IT_RUN_ID env var to defeat Foundry's content-addressed version dedup; otherwise a
      rerun just receives the existing version and Dispose deletes it.
    * PATCH the per-agent endpoint with AgentEndpointConfig (Responses protocol, version
      selector at 100% to the new version). Without this, /agents/{name}/endpoint/protocols/
      openai/responses returns HTTP 400.
    * Build a per-agent ProjectOpenAIClient (not the cached projectClient.ProjectOpenAIClient,
      which is bound to the project-level URL); set AgentName in options so the URL routes
      through the agent endpoint, and add the Foundry-Features header to the inference
      pipeline.
    * Use Versions (which serializes to container_protocol_versions) instead of the
      deprecated ProtocolVersions; the server now rejects the legacy field.
    * On Dispose, delete only the version this fixture created. Never delete the agent.
    
    Tests
    
    * Tag every HostedAgentTests class with [Trait("Category", "FoundryHostedAgents")] so the
      CI workflow can route them to a different Foundry project than the rest of the
      integration suite.
    
    CI workflow (.github/workflows/dotnet-build-and-test.yml)
    
    * Add a foundryHosting paths-filter covering Microsoft.Agents.AI.Foundry.Hosting and its
      in-repo dependency chain (Foundry, Agents.AI, Agents.AI.Abstractions), the test
      container, the test fixture, Directory.Packages.props, the build script, and this
      workflow file. Skip the costly hosted-agent steps when none of those changed.
    * Add "Build and push Foundry Hosted Agents test container" step that invokes
      scripts/it-build-image.ps1 against vars.IT_HOSTED_AGENT_REGISTRY and pipes the resulting
      IT_HOSTED_AGENT_IMAGE=<tag> into GITHUB_ENV.
    * Add "Run Foundry Hosted Agents Integration Tests" step that filters in only the new
      trait, with AZURE_AI_PROJECT_ENDPOINT/AZURE_AI_MODEL_DEPLOYMENT_NAME pointed at
      IT_HOSTED_AGENT_PROJECT_ENDPOINT/IT_HOSTED_AGENT_MODEL_DEPLOYMENT_NAME (Tao project,
      East US 2; the SK IT project's region does not yet support hosted agents preview).
    * Exclude the new trait from the existing "Run Integration Tests" step.
    * TEMP: drop the != 'pull_request' guard on the new steps and on Azure CLI Login when the
      paths-filter triggers, so PR #5598 can validate the wiring before promoting to merge
      queue only. Restore the original guard after one green PR run.
    
    Build script (scripts/it-build-image.ps1)
    
    * Hash now spans TestContainer source AND its referenced framework projects so any
      framework code change forces a fresh tag and a real docker push; the previous
      TestContainer-only hash silently reused stale images on framework edits.
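
    A minimal sketch of content-hashed tagging over several source roots, in
    Python for illustration. It is not the logic of it-build-image.ps1; the
    function names, the SHA-256 choice, and the 12-character tag length are all
    assumptions.

```python
import hashlib
from pathlib import Path
from typing import Iterable, Mapping

def tag_from_contents(files: Mapping[str, bytes], length: int = 12) -> str:
    """Hash relative paths and file contents in a stable order."""
    digest = hashlib.sha256()
    for rel_path in sorted(files):
        digest.update(rel_path.encode("utf-8"))
        digest.update(b"\0")           # separator between path and content
        digest.update(files[rel_path])  # the bytes, not just the path list
        digest.update(b"\0")
    return digest.hexdigest()[:length]

def read_tree(roots: Iterable[str]) -> dict[str, bytes]:
    """Collect every file under each root (container source plus framework projects)."""
    files = {}
    for root in roots:
        for path in Path(root).rglob("*"):
            if path.is_file():
                files[path.as_posix()] = path.read_bytes()
    return files
```

    Passing both the test container directory and the referenced framework
    project directories to `read_tree` means an edit in either place changes the
    tag, whereas unchanged sources reproduce the same tag and the push becomes a
    no-op.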
    
    Bootstrap script (dotnet/tests/Foundry.Hosting.IntegrationTests/scripts/it-bootstrap-agents.ps1)
    
    * New idempotent script that creates the six stable scenario agents and grants Azure AI
      User on the project scope to each agent's MI. Run once per Foundry project. Includes
      AAD-graph propagation retries because newly created MIs take time to appear there.
    
    README (dotnet/tests/Foundry.Hosting.IntegrationTests/README.md)
    
    * Document the bootstrap prerequisite, the regional caveat (East US 2 is the only region
      we have validated; East US returned "Unsupported region" at the time of writing), the
      per-run image rebuild, and the CI wiring including the SP RBAC requirements.
    
    SDK pin (TEMP)
    
    * Bump Microsoft.Agents.AI.Foundry.Hosting's Azure.AI.Projects VersionOverride to
      2.1.0-alpha.20260505.1 from the azure-sdk public daily feed (added to nuget.config).
      This release is the first that builds the per-agent inference URL as
      /agents/{name}/endpoint/protocols/openai (the 2.1.0-beta.1 release builds
      .../openai/openai/v1, which the server rejects). Revert both the feed and the override
      once the URL fix lands in a stable Azure.AI.Projects release.
    
    * Foundry.Hosting IntegrationTests: revert alpha SDK pin; move endpoint PATCH to bootstrap
    
    The alpha SDK pin (Azure.AI.Projects 2.1.0-alpha.20260505.1 from the azure-sdk public
    daily feed) was needed only for the URL routing fix and the strongly-typed
    AgentEndpointConfig/PatchAgentOptions wrapper. We do not need either right now: the
    fixture stays compatible with the public 2.1.0-beta.1 by moving the one-time endpoint
    PATCH to the bootstrap script (it sets version_selector to FixedRatio @latest, so each
    new fixture run becomes the served version automatically without a per-run PATCH from
    the test code). The hosted-agent invocation path will start working end-to-end once the
    URL routing fix lands in a stable Azure.AI.Projects release; until then the tests stay
    [Fact(Skip = ...)] as documented.
    
    * Revert dotnet/nuget.config: drop the azure-sdk-for-net public feed.
    * Revert Microsoft.Agents.AI.Foundry.Hosting.csproj VersionOverride to 2.1.0-beta.1.
    * Revert Microsoft.Agents.AI.Foundry.UnitTests and Microsoft.Agents.AI.Foundry.Hosting.UnitTests
      Azure.AI.Projects pin (they had been bumped to align Azure.Core 1.54 transitive).
    * Drop the AgentEndpointConfig PATCH block from HostedAgentFixture.cs (the type is
      alpha-only). Replace with a comment pointing at the bootstrap script.
    * Bootstrap script (it-bootstrap-agents.ps1) now also PATCHes each agent's endpoint
      with version_selector=@latest if not already set. Idempotent.
    
    * Foundry.Hosting IntegrationTests: drop accidentally committed filtered.slnx
    
    * Foundry.Hosting IntegrationTests: revert TEMP PR override on Azure CLI Login + IT steps
    
    The previous attempt to validate the new hosted-agent IT wiring on PR #5598 failed
    because the PR is from a fork (rogerbarreto/agent-framework-public). GitHub never passes
    environment secrets to fork PRs regardless of event-name guards on individual steps,
    so 'azure/login@v2' fails with 'client-id and tenant-id are not supplied'. Restore the
    original github.event_name != 'pull_request' guard. The new steps will execute on
    push to main and on merge_group runs.
    
    * Foundry.Hosting IntegrationTests: invoke build-and-push script with absolute path
    
    The pwsh shell on the GitHub Actions runner couldn't resolve ./scripts/it-build-image.ps1
    when the step had no working-directory set; the step inherits the runner's PWD which is
    not always the repo root after preceding steps. Use github.workspace explicitly to remove
    the ambiguity.
    
    * Foundry.Hosting IntegrationTests: move it-build-image.ps1 inside the IT project tree
    
    The previous location at scripts/it-build-image.ps1 lived outside the sparse-checkout
    paths the workflow uses (.github, dotnet, python, declarative-agents), so the runner
    never had the file when the new step tried to invoke it. Move the script next to its
    sibling it-bootstrap-agents.ps1 inside the IT project tree, and anchor its relative
    paths to the repo root so callers can invoke it from any PWD.
    
    * Move scripts/it-build-image.ps1 -> dotnet/tests/Foundry.Hosting.IntegrationTests/scripts/it-build-image.ps1
    * Add Push-Location to the resolved repo root inside the script (Pop-Location in finally)
      so the existing relative paths (TestContainerProject, hashed src dirs) keep working
      no matter where the script is invoked from.
    * Update the workflow path filter and the step's invocation path to the new location.
    
    * Foundry.Hosting IntegrationTests: enable 5 HappyPath tests on the live Foundry endpoint
    
    The fixture already constructs ProjectOpenAIClient via the per-agent path that beta.1
    supports (new ProjectOpenAIClient(uri, cred, opts { AgentName })), so no SDK pin bump
    is required to run the smoke tests end-to-end. Un-skip the 5 tests that pass against
    the live test container.
    
    Tests un-skipped (verified passing locally against tao-foundry-prj):
    
    * RunAsync_ReturnsNonEmptyTextAsync
    * RunStreamingAsync_YieldsAtLeastOneUpdateAsync
    * MultiTurn_WithPreviousResponseId_PreservesContextAsync
    * StoredFalse_Baseline_DoesNotPersistResponseAsync
    * Instructions_FromContainerDefinition_AreObeyedAsync
    
    The remaining tests stay skipped, each with a more specific reason (4 of 9 in
    HappyPath plus all ToolCalling*, McpToolbox, Toolbox, CustomStorage), because the
    test container does not yet emit usable response_id / conversation_id chains and the
    placeholder scenarios are not implemented in the test container's Program.cs. These
    are test container limitations, not infra bugs, and the tests can be un-skipped as the
    container surfaces stabilize.
    
    * Foundry.Hosting IntegrationTests: extract hosted IT into parallel job, add Workflows dep
    
    Address Wesley's review feedback on PR #5598:
    
    1. Pull Foundry hosted-agent IT into its own dotnet-foundry-hosted-it job that runs in parallel to dotnet-build and dotnet-test. Same path-filter gate keeps it skipped on unrelated edits. Builds only the filtered solution containing Foundry.Hosting.IntegrationTests and src deps. dotnet-build-and-test-check now waits on it too.
    
    2. Add Microsoft.Agents.AI.Workflows to the foundryHosting paths-filter and to hashedDirs in it-build-image.ps1 since Foundry.Hosting transitively depends on it.
    
    TFM constraint on the IT csproj stays at net10.0 because AgentConformance.IntegrationTests targets net10/net472 and is consumed by ~12 other IT projects on net472.
    
    ---------
    
    Co-authored-by: Roger Barreto <rbarreto@microsoft.com>
  • Python: Reduce flaky integration tests and improve CI signal quality (#5454)
    * Enable Ollama integration tests in CI and rename report to Integration Test Report
    
    - Install Ollama, cache models (qwen2.5:0.5b + nomic-embed-text), and start
      server in the Misc integration job for both workflow files
    - Set OLLAMA_MODEL and OLLAMA_EMBEDDING_MODEL env vars so the 5 Ollama tests
      are no longer skipped
    - Rename Flaky Test Report to Integration Test Report throughout (job names,
      artifact names, cache keys, file names, script titles/docstrings)
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Bump Ollama model to qwen2.5:1.5b for better instruction following
    
    The 0.5b model was too small to reliably follow simple prompts like
    'Say Hello World', causing test assertion failures. The 1.5b model
    follows instructions more reliably while still being small enough
    for fast CI pulls (~1GB).
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Re-enable reliable streaming integration tests
    
    Remove the hard skip on the test_03_reliable_streaming tests, which were
    temporarily disabled for instability investigation. CI infrastructure
    (Azurite, DTS emulator, Redis, func CLI) is already in place.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Re-enable skipped Functions/DurableTask tests and bump timeout to 480s
    
    - Remove hard skips from 4 tests in test_11_workflow_parallel.py
    - Remove hard skip from test_conditional_branching in test_06_dt_multi_agent_orchestration_conditionals.py
    - Increase pytest --timeout from 360 to 480 for Functions+DurableTask CI job
    - Updated in both python-merge-tests.yml and python-integration-tests.yml
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Re-skip failing Functions/DurableTask tests with specific root causes
    
    - test_11_workflow_parallel (4 tests): xdist worker crashes during execution
    - test_conditional_branching: orchestration fails with RuntimeError, not a timeout
    - Keep 480s timeout bump for remaining Functions tests
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Fix auth routing in samples 06/11: api_key -> credential for Azure OpenAI
    
    Both samples passed a bearer token provider via api_key= which caused the
    client to route to api.openai.com instead of Azure OpenAI, resulting in
    401 Unauthorized. Changed to credential= which correctly triggers Azure
    routing and picks up AZURE_OPENAI_ENDPOINT from the environment.
    
    - samples/azure_functions/11_workflow_parallel/function_app.py: 1 fix
    - samples/durabletask/06_multi_agent_orchestration_conditionals/worker.py: 2 fixes
    - Re-enable 4 parallel workflow tests and 1 conditional branching test
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Re-skip parallel workflow tests: xdist worker distribution issue
    
    The 4 parallel workflow tests crash because xdist worksteal distributes
    them across separate workers, each spawning its own func process against
    shared emulators. Auth fix (api_key->credential) was valid and stays.
    test_conditional_branching now passes with the auth fix.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Fix E501 line-too-long in azurefunctions parallel test skip reasons
    
    Wrap skip reason strings to stay within the 120-character line limit.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Add retry logic and port-conflict fix for Ollama CI setup
    
    - Kill any auto-started Ollama before launching serve (fixes port
      conflict: 'address already in use')
    - Retry ollama pull up to 3 times with 15s backoff (fixes 429 rate
      limit failures)
    - Applied to both python-merge-tests.yml and python-integration-tests.yml
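
    The retry behaviour described above (up to 3 attempts, 15s between them) can be
    sketched as follows. This is a minimal illustration, not the workflow's actual shell
    step: the helper name `retry` and its parameters are hypothetical, and the sleep
    function is injectable only so the sketch can be exercised without waiting.

```python
import time
from typing import Callable


def retry(
    action: Callable[[], bool],
    attempts: int = 3,
    delay_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run `action` until it returns True, at most `attempts` times.

    Waits `delay_s` seconds between attempts (a fixed delay, not exponential),
    and does not wait after the final attempt.
    """
    for attempt in range(1, attempts + 1):
        if action():
            return True
        if attempt < attempts:
            sleep(delay_s)
    # Every attempt failed; the caller decides whether to abort.
    return False
```

    A caller would wrap the pull command in `action` (for example, a subprocess call
    that returns True on exit code 0) and fail the step when `retry` returns False.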
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Fix flaky integration tests and re-enable skipped tests
    
    - Foundry agent: add allow_preview=True to custom client test
    - Foundry hosting: raise max_output_tokens 50->200, add temperature,
      relax assertion in test_temperature_and_max_tokens
    - Foundry embedding: update skip reason with root cause (endpoint mismatch)
    - OpenAI file search: fix vector store indexing race condition by polling
      file_counts before querying; fix get_streaming_response -> get_response(stream=True)
    - Azure OpenAI file search: remove skip (transient 500 resolved)
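
    The vector store readiness polling mentioned above might look roughly like this.
    It is a sketch under assumptions: `get_counts` is a hypothetical callable standing
    in for whatever returns the store's file_counts, and the `in_progress` / `completed`
    keys are assumed to mirror that payload. The test code in the repository may differ.

```python
import time
from typing import Callable


def wait_until_indexed(
    get_counts: Callable[[], dict],
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until no files are in progress and at least one has completed.

    Returns True when the store looks ready, False if `timeout_s` elapses first.
    """
    deadline = clock() + timeout_s
    while True:
        counts = get_counts()
        if counts.get("in_progress", 0) == 0 and counts.get("completed", 0) > 0:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

    Querying only after this returns True avoids racing the indexing step.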
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Remove temperature from foundry hosting test (unsupported by CI model)
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Stabilize Ollama tool call integration tests with no-arg function
    
    Use a no-argument greet() function instead of hello_world(arg1) for
    integration tests. The 1.5B model in CI is unreliable at generating
    correct tool call arguments, causing 'Argument parsing failed' errors.
    A no-arg function eliminates this flakiness entirely.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Increase reliable streaming test timeouts from 30s to 60s
    
    The LLM call through Azure OpenAI + Redis streaming pipeline can exceed
    30s in CI due to cold starts or throttling. Raise to 60s to reduce
    flaky timeouts while still bounded by pytest's 120s per-test limit.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Re-enable workflow parallel tests with xdist_group marker
    
    The tests were skipped because xdist distributes module tests across
    workers, each spawning their own func process (port conflicts). Adding
    xdist_group forces all tests in this module onto a single worker so
    the module-scoped function_app_for_test fixture works correctly.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Revert "Re-enable workflow parallel tests with xdist_group marker"
    
    This reverts commit 455c28da62.
    
    * Rename flaky_report to integration_test_report and add try/finally cleanup
    
    - Rename scripts/flaky_report/ to scripts/integration_test_report/ to
      reflect expanded scope beyond flaky-test detection
    - Update workflow references in both CI files
    - Wrap file search integration tests in try/finally to ensure vector
      store cleanup runs even on test failure or timeout
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Fix Ollama pull failure propagation and Azure OpenAI vector store readiness
    
    - Ollama CI: fail the step immediately if model pull fails after 3
      retries instead of silently proceeding to tests
    - Azure OpenAI file search: add the same vector-store readiness polling
      that was applied to the non-Azure OpenAI tests, preventing eventual
      consistency race conditions
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * remove load_dotenv from test file
    
    ---------
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • Python: Update hosting agent samples + fixes (#5485)
    * Update foundry hosting samples
    
    * Add file data type support
    
    * Fix file content and add more tests
    
    * Fix README
    
    * Address comments
    
    * Fix int tests
    
    * remove temp
  • Propagate integration-test model credentials to issue-triage repro (#5443)
    Scopes the triage job to the integration GitHub Environment, adds
    the azure/login OIDC step, and exposes the same OpenAI / Azure
    OpenAI / Foundry / Anthropic env vars the integration test
    workflow uses. This lets the triage agent write repro code that
    constructs model clients from the environment without any secrets
    entering the agent prompt or generated-code literals.
    
    Azure OpenAI and Foundry continue to authenticate via AAD
    (DefaultAzureCredential), so there is no API key to leak for
    those providers.
  • Automated issue triage workflow (#5419)
    * Automated issue triage workflow
    
    * Bump dependencies
    
    * Fix issue-triage workflow: security, reliability, and testability
    
    Address six review comments on the issue-triage workflow:
    
    1. Change trigger from issues:opened to issues:labeled so the
       secret-backed triage flow is only triggered by a maintainer-
       controlled signal.
    
    2. Include inputs.issue_number in the concurrency group so
       workflow_dispatch runs for the same issue are properly
       de-duplicated.
    
    3. Improve team membership error handling to fail closed: verify
       the team exists before checking membership, and only treat a
       404 as 'not a member' (all other errors fail the job).
    
    4. Use optional chaining (issue.user?.login) for the API-fetched
       issue to handle deleted GitHub accounts without crashing.
    
    5. Extract the inline github-script into a testable module at
       .github/scripts/check_team_membership.js with 10 tests in
       .github/tests/test_check_team_membership.js covering all
       code paths (payload/API author resolution, deleted accounts,
       team lookup failure, 404 vs non-404 membership errors).
    
    6. Make the spam gate actually stop the job by exiting non-zero
       instead of just logging, so future steps cannot accidentally
       run for spam issues.
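
    The fail-closed rule in item 3 can be illustrated with a short sketch. The real
    module is JavaScript (.github/scripts/check_team_membership.js); the Python below
    only mirrors the decision logic, and `ApiError`, `lookup_team`, and
    `lookup_membership` are hypothetical stand-ins for the API calls.

```python
class ApiError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status


def is_team_member(lookup_team, lookup_membership, login: str) -> bool:
    """Return True/False for membership; raise on anything ambiguous."""
    # Verify the team exists first. Any error here propagates, so a missing
    # or unreadable team is never mistaken for "user is not a member".
    lookup_team()
    try:
        lookup_membership(login)
    except ApiError as err:
        if err.status == 404:
            return False  # the only status treated as "not a member"
        raise  # every other error fails the job
    return True
```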
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Make issue-triage workflow manually triggered only for initial testing
    
    Remove the 'issues' event trigger, keeping only 'workflow_dispatch' so the
    workflow can be tested manually before enabling automatic triggers.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: Copilot <copilot@github.com>
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • Python: Flaky test report (#5342)
    * Add flaky test trend reporting to CI workflows
    
    Parse JUnit XML (pytest.xml) from each integration test job and
    aggregate results into a markdown trend report showing per-test
    pass/fail/skip status across the last 5 runs.
    
    Changes:
    - Add python/scripts/flaky_report/ package (JUnit XML parser + trend
      report generator following the sample_validation pattern)
    - Add upload-artifact steps to all 6 integration test jobs in both
      python-merge-tests.yml and python-integration-tests.yml
    - Add python-flaky-test-report aggregation job with history caching
    - Add --junitxml=pytest.xml to integration-tests.yml jobs (already
      present in merge-tests.yml)
    - Fix Cosmos job --junitxml path (use absolute path since uv run
      --directory changes cwd)
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Fix flaky report: handle missing test results gracefully
    
    - Guard against missing reports directory in load_current_run()
    - Only run report job when at least one integration test job completed
      (skip when all jobs are skipped, e.g. on pull_request events)
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Address PR review: fix provider names and if-expression precedence
    
    - Use explicit provider name mapping in _derive_provider() so OpenAI
      renders correctly instead of 'Openai'
    - Fix operator precedence in workflow if-expressions by wrapping
      success/failure checks in parentheses
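
    To show why an explicit mapping is needed for the provider name, here is a small
    sketch. The function name `display_provider` and the table entries are
    illustrative assumptions, not the contents of _derive_provider().

```python
# Names whose correct casing cannot be derived by simple capitalization.
# The entries here are examples only.
_PROVIDER_NAMES = {
    "openai": "OpenAI",
    "azure_openai": "Azure OpenAI",
}


def display_provider(key: str) -> str:
    """Look up a display name, falling back to title-casing the key."""
    fallback = key.replace("_", " ").title()
    return _PROVIDER_NAMES.get(key.lower(), fallback)


# Plain capitalization is what produces the wrong rendering:
naive = "openai".capitalize()  # "Openai"
```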
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Add File column and xfail detection to flaky test report
    
    - Add File column showing module name (e.g., test_openai_chat_client)
      to disambiguate tests with the same function name across files
    - Detect pytest xfail tests in JUnit XML (type=pytest.xfail) and
      show them with a distinct warning emoji instead of skip emoji
    - Update legend to include xfail explanation
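
    A minimal sketch of the xfail detection, assuming the JUnit XML shape pytest
    writes (a <skipped> child whose type attribute is pytest.xfail). The sample
    document and the `classify` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hand-written sample, not real CI output.
SAMPLE = """<testsuite>
  <testcase classname="pkg.test_a" name="test_ok"/>
  <testcase classname="pkg.test_a" name="test_skip">
    <skipped type="pytest.skip" message="no key"/>
  </testcase>
  <testcase classname="pkg.test_a" name="test_known_bug">
    <skipped type="pytest.xfail" message="tracked bug"/>
  </testcase>
  <testcase classname="pkg.test_a" name="test_bad">
    <failure message="boom"/>
  </testcase>
</testsuite>"""


def classify(testcase: ET.Element) -> str:
    """Map one <testcase> element to passed / failed / skipped / xfail."""
    skipped = testcase.find("skipped")
    if skipped is not None:
        # xfail is reported as a skip with a distinguishing type attribute.
        return "xfail" if skipped.get("type") == "pytest.xfail" else "skipped"
    if testcase.find("failure") is not None or testcase.find("error") is not None:
        return "failed"
    return "passed"


statuses = {tc.get("name"): classify(tc) for tc in ET.fromstring(SAMPLE).iter("testcase")}
```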
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Add Foundry embedding env vars to merge-tests workflow
    
    Sync the Foundry integration job in python-merge-tests.yml with
    python-integration-tests.yml by adding FOUNDRY_MODELS_ENDPOINT,
    FOUNDRY_MODELS_API_KEY, FOUNDRY_EMBEDDING_MODEL, and
    FOUNDRY_IMAGE_EMBEDDING_MODEL. Once the repo variables/secrets
    are configured, the embedding integration test will run in CI.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Fix File column showing class name instead of module name
    
    When a test is inside a class, pytest writes the classname as e.g.
    'pkg.test_file.TestClass'. The previous rsplit logic extracted
    'TestClass' instead of 'test_file'. Now detect uppercase-starting
    segments as class names and use the preceding segment instead.
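
    The classname handling can be sketched as below. The helper name `module_name` is
    hypothetical, and the loop (which also tolerates nested classes) is one possible
    reading of the rule rather than the script's exact code.

```python
def module_name(classname: str) -> str:
    """Extract the test module from a dotted JUnit classname."""
    parts = classname.split(".")
    # Trailing segments that start with an uppercase letter are treated as
    # class names and dropped, leaving the module segment at the end.
    while len(parts) > 1 and parts[-1][:1].isupper():
        parts.pop()
    return parts[-1]
```

    With this rule, 'pkg.test_file.TestClass' and 'pkg.test_file' both resolve to
    'test_file'.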
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Address PR review: UTC timestamps, XML error handling, summary fix, docstring
    
    - Use datetime.now(timezone.utc) for accurate UTC timestamps
    - Catch ET.ParseError per-file so corrupt XML doesn't crash the report
    - Remove separate 'error' key from summary (errors folded into 'failed')
    - Fix _short_name docstring to show actual dotted classname::name format
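
    The first two points can be sketched together. `parse_reports` is a hypothetical
    helper that takes XML text keyed by file name, so the example needs no files on
    disk. The real script reads pytest.xml artifacts.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def parse_reports(documents: dict[str, str]):
    """Parse each XML document; collect unparseable ones instead of raising."""
    parsed, skipped = {}, []
    for name, text in documents.items():
        try:
            parsed[name] = ET.fromstring(text)
        except ET.ParseError:
            # One corrupt file should not take down the whole report.
            skipped.append(name)
    return parsed, skipped


# Timezone-aware UTC timestamp for the report header.
generated_at = datetime.now(timezone.utc)
```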
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • Add pr review GH workflow (#5418)
    * Add workflow PR review
    
    * Allow reviews on draft PRs
    
    * Update .github/workflows/devflow-pr-review.yml
    
    Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
    
    * Update .github/workflows/devflow-pr-review.yml
    
    Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
    
    * Bump actions/checkout to v6 and uv to 0.11.x
    
    ---------
    
    Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
  • Python: Add Hyperlight CodeAct package and docs (#5185)
    * initial work on code_mode
    
    * updated samples
    
    * updates to codeact
    
    * updated codeact
    
    * Draft CodeAct ADR and sample updates
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * initial implementation and adr and feature
    
    * Python: Limit Hyperlight wasm backend to Python <3.14
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Python: Fix CI for Hyperlight CodeAct PR
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Python: Run Hyperlight integration when available
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Python: Address Hyperlight review feedback
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Python: Simplify Hyperlight file mount inputs
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Python: Accept Path host paths in Hyperlight mounts
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Python: Fix Hyperlight mount typing for CI
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * temp run integration test
    
    * Python: Strengthen Hyperlight real sandbox tests
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * added additional tests
    
    * Python: Simplify Hyperlight CodeAct API
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * set tests as non-integration
    
    * Retry Hyperlight allowed-domain registration
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Gate Hyperlight integration tests by runtime support
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Fix Hyperlight skip test on Python 3.14
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Delay Hyperlight runtime probe until test execution
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Relax Hyperlight Windows integration stdout assertion
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Scan Hyperlight output directory for artifacts
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Retry Hyperlight output artifact collection
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Harden Hyperlight integration output assertions
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Retry Hyperlight read-back check in integration test
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Simplify Hyperlight integration write assertion
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Avoid pathlib in Hyperlight integration sandbox
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Use socket network check in Hyperlight sandbox
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Replace blocked Azure AI Search blog link
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Clarify Hyperlight guest stdlib limits
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Use _socket in Hyperlight integration sandbox
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Handle Hyperlight mounted file paths
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Broaden Hyperlight sandbox path fallbacks
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Search Hyperlight guest mounts recursively
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Split Hyperlight mount coverage
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Split Hyperlight live network tests
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Fix Hyperlight file-write test on Windows
    
    Enable the sandbox filesystem by providing a workspace_root so
    /output is mounted. Remove os.path.exists assertion (unsupported
    in WASM guest) and fix Content data assertion to use .uri.
    Skip the network integration test on Windows where the WASM
    sandbox lacks the encodings.idna codec.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Address PR review: ADR intro, manual wiring sample, doc clarifications
    
    - Add CodeAct introduction section to ADR for unfamiliar readers
    - Clarify 'less runtime efficient' con with specific overhead description
    - Add note in Python impl doc clarifying ADR vs impl doc split
    - Explain why before_run hooks must be per-run (CRUD, concurrency, approval)
    - Rename code_interpreter variable to codeact in E2E sample
    - Add manual static wiring sample (codeact_manual_wiring.py)
    - Add 'when to use which pattern' guidance to samples README
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Address PR #5185 review comments and add .NET CodeAct design doc
    
    - Fix async callback: _make_sandbox_callback returns sync wrapper with
      thread + asyncio.run() bridge (was broken with real Wasm FFI)
    - Fix stale output: clear output_dir before each sandbox.run() call
    - Fix blocking event loop: _run_code now async with asyncio.to_thread()
    - Revert _agents.py options['tools'] injection (unnecessary; provider
      uses context.extend_tools())
    - Revert SessionContext.options docstring back to read-only
    - Add real-sandbox test fixtures (shared/restored/fresh)
    - Add 8 new real-sandbox tests for callback round-trip, stale output,
      event loop non-blocking, basic execution, stdout/stderr, errors,
      snapshot/restore, and tool registration
    - Add comprehensive .NET HyperlightCodeActProvider design document
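
    The thread + asyncio.run() bridge and the asyncio.to_thread() change can be
    sketched in isolation. This is an illustration of the general pattern under
    assumed names (`make_sync_callback`, `add`), not the package's
    _make_sandbox_callback or _run_code.

```python
import asyncio
import threading
from typing import Any, Callable, Coroutine


def make_sync_callback(
    async_fn: Callable[..., Coroutine[Any, Any, Any]],
) -> Callable[..., Any]:
    """Wrap a coroutine function so synchronous code can call it."""

    def wrapper(*args: Any, **kwargs: Any) -> Any:
        outcome: dict[str, Any] = {}

        def runner() -> None:
            # A fresh thread has no running event loop, so asyncio.run is safe.
            try:
                outcome["value"] = asyncio.run(async_fn(*args, **kwargs))
            except Exception as exc:
                outcome["error"] = exc

        thread = threading.Thread(target=runner)
        thread.start()
        thread.join()
        if "error" in outcome:
            raise outcome["error"]
        return outcome["value"]

    return wrapper


async def add(a: int, b: int) -> int:
    await asyncio.sleep(0)
    return a + b


async def main() -> int:
    sync_add = make_sync_callback(add)
    # Run the blocking wrapper off the event loop thread so the loop stays free.
    return await asyncio.to_thread(sync_add, 2, 3)


result = asyncio.run(main())
```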
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Update hyperlight README with code snippets and remove Public API section
    
    Replace bare export list with Quick Start code examples covering the
    context provider, standalone tool, manual static wiring, and file
    mounts / network access patterns.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • .NET: Foundry Evals integration for .NET (#4914)
    * Foundry Evals integration for .NET
    
    - Core evaluation framework: EvalItem, LocalEvaluator, FunctionEvaluator, EvalChecks
    - IAgentEvaluator interface with MeaiEvaluatorAdapter bridge
    - AgentEvaluationExtensions for agent.EvaluateAsync() overloads
    - FoundryEvals wrapping MEAI quality/safety evaluators
    - ConversationSplitters (LastTurn, Full) and IConversationSplitter
    - EvalItem.PerTurnItems() for multi-turn decomposition
    - HasImageContent for multimodal content detection
    - WorkflowEvaluationExtensions for per-agent workflow evaluation
    - 7 eval samples mirroring Python parity:
      02-agents/Evaluation: SimpleEval, ExpectedOutputs, Multimodal
      03-workflows/Evaluation: WorkflowEval
      05-end-to-end/Evaluation: FoundryQuality, MixedProviders, ConversationSplits
    - Comprehensive unit tests (1958 passing)
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Rewrite FoundryEvals to use real Foundry Evals API
    
    Replace MEAI evaluator shim with actual OpenAI EvaluationClient protocol
    methods. FoundryEvals now creates eval definitions, submits runs, polls
    for completion, and fetches per-item results server-side.
    
    - New constructor: FoundryEvals(AIProjectClient, model, evaluators)
    - Add FoundryEvalConverter for MEAI ChatMessage -> Foundry JSON format
    - Add EvalId, RunId, ReportUrl to AgentEvaluationResults
    - All 20 built-in evaluator constants now work (agent, tool, quality, safety)
    - Remove Microsoft.Extensions.AI.Evaluation.Quality/Safety dependencies
    - Update all samples for new constructor (no more ChatConfiguration)
    - Replace BuildEvaluators tests with ResolveEvaluator tests
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Add response output to CustomEvals and ExpectedOutputs samples
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Address review: pagination, validation, error handling, tests
    
    FoundryEvals fixes:
    - Add pagination for output items (has_more/after cursor)
    - Add guard clauses for pollIntervalSeconds/timeoutSeconds <= 0
    - Fix double TryGetProperty for passed field parsing
    - Throw on all-tool-evaluators with no tool definitions
    - Fix XML doc (default 300s, not 180s)
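
    Cursor pagination of the has_more/after kind can be sketched as follows. The
    actual fix is in C#; this Python version only shows the loop. `fetch_page` is a
    hypothetical callable, and using the last item's `id` as the next cursor is an
    assumption about the response shape.

```python
from typing import Callable, Iterator, Optional


def iter_all_items(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield items from every page, following the `after` cursor."""
    after: Optional[str] = None
    while True:
        page = fetch_page(after)
        items = page.get("data", [])
        yield from items
        # Stop when the server reports no more pages, or on an empty page
        # (which would otherwise leave the cursor unchanged and loop forever).
        if not page.get("has_more") or not items:
            return
        after = items[-1]["id"]
```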
    
    New tests (30 added, 1989 total):
    - EvalChecks: NonEmpty, ContainsExpected (pass/fail/skip/case),
      HasImageContent, ToolCallsPresent
    - FoundryEvalConverter: ConvertMessage (text, image, function call,
      function results fan-out, empty fallback, mixed content),
      ConvertEvalItem, BuildTestingCriteria (quality/agent/tool/groundedness
      data mappings), BuildItemSchema
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Fix review: null-refs, Data.ToString() bug, ContainsExpected, add tests
    
    - Fix NullReferenceException in sample Response display (pattern matching)
    - Fix WorkflowEvaluationExtensions Data?.ToString() producing type names
      instead of message text (pattern-match ChatMessage/AgentResponse/list)
    - Change EvalChecks.ContainsExpected to return Passed=false when no
      ExpectedOutput (was silently passing, masking misconfiguration)
    - Add EvalItem constructor tests with LastTurn/Full/null splitters
    - Add FoundryEvalConverter.ConvertMessage DataContent (base64 image) test
    - Add ExtractAgentData tests with ChatMessage, list, and AgentResponse data
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Fix review: conversation fidelity, eval caching, fallback tests
    
    - WorkflowEvaluationExtensions: preserve full response messages (tool calls,
      intermediate) instead of synthetic 2-message conversation. Cast completed
      Data to AgentResponse and use Messages when available, fallback to text.
    - FoundryEvals: cache evalId per schema shape (hasContext, hasTools) so
      subsequent EvaluateAsync calls create runs under the same eval definition.
    - MeaiEvaluatorAdapter: code already correctly passes queryMessages (not full
      conversation) to IEvaluator — no change needed, verified by inspection.
    - Add tests: AgentResponse full messages preservation, unknown object
      ToString() fallback for ExtractAgentData.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Rename AzureAI→Foundry: move eval files, update references
    
    - Move FoundryEvals.cs and FoundryEvalConverter.cs from
      Microsoft.Agents.AI.AzureAI to Microsoft.Agents.AI.Foundry
    - Update namespace from AzureAI to Foundry in both files
    - Add explicit usings required by Foundry project (no implicit usings)
    - Move FoundryEvalConverter tests to Foundry.UnitTests project
      (avoids ReplacingRedactor type conflict from dual project refs)
    - Update all sample csproj references and using statements
    - Remove Foundry project reference from AI UnitTests
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * PR review round 4: wire up tool extraction, remove eval cache, fix null safety
    
    - BuildEvalItem: extract tools from agent via GetService<ChatOptions>() into EvalItem.Tools (Python parity)
    - FoundryEvals: remove eval ID cache - each call creates fresh definition (matches Python behavior)
    - FoundryEvals: replace null-forgiving operators with descriptive InvalidOperationException
    - MixedProviders sample: remove unnecessary explicit PackageReferences (transitively provided)
    - FoundryEvalConverter: document that tool results take precedence over text content
    - Add LocalEvaluator zero-checks test documenting 0 metrics = failed behavior
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Python-dotnet parity: 9 feature gaps filled
    
    New checks:
    - ToolCallArgsMatch() — verify tool call names + argument subset match
    - ToolCalledCheck(ToolCalledMode.Any, ...) — match any of the specified tools
    - ToolCalledMode enum (All/Any)
    
    FoundryEvals enhancements:
    - Default evaluators now [Relevance, Coherence, TaskAdherence] (was Relevance, Coherence)
    - Auto-add ToolCallAccuracy when items have tool definitions
    - EvaluateTracesAsync — evaluate by response_ids, trace_ids, or agent_id
    - EvaluateFoundryTargetAsync — evaluate deployed Foundry targets
    
    Result type enrichment:
    - AgentEvaluationResults: added Status, Error, PerEvaluator, DetailedItems
    - New EvalItemResult/EvalScoreResult/PerEvaluatorResult types
    - FoundryEvals populates all new fields from API responses
    
    Workflow fix:
    - Skip internal executors (_*, input-conversation, end-conversation, end)
    
    Tests: 8 new tests covering ToolCallArgsMatch, ToolCalledMode.Any, internal executor filtering
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Add MeaiEvaluatorAdapter and PerTurnItems edge case tests
    
    - 3 tests for MeaiEvaluatorAdapter: query message forwarding, synthetic
      response fallback, multiple items aggregation
    - 3 tests for EvalItem.PerTurnItems: empty conversation, no user messages,
      system+assistant only
    - StubEvaluator and StubChatClient test helpers
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Blocking link check for outdated package in DevUI.
    
    * Replace Dictionary<string, object> payloads with typed wire models
    
    Introduce internal FoundryEvalWireModels.cs with compile-time-safe types
    for the OpenAI Evals API wire format. The OpenAI .NET SDK (2.9.1) only
    provides protocol-level methods with BinaryContent/ClientResult — no
    typed request models. These internal models replace scattered dictionary
    literals with [JsonPropertyName]-annotated classes, giving:
    
    - Compile-time safety (typos become build errors)
    - Single point of change when the API evolves
    - IntelliSense discoverability
    - Cleaner serialization via JsonPolymorphic for content items
    
    Models: WireContentItem hierarchy (text, image, tool_call, tool_result),
    WireMessage, WireEvalItemPayload, WireTestingCriterion, WireItemSchema,
    WireCreateEvalRequest, WireCreateRunRequest, and data source variants.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Skip metric when Foundry returns neither score nor passed
    
    When an evaluator returns no score and no passed value, the previous
    code created BooleanMetric(name, false), which falsely failed items
    via ItemPassed. Now we skip the MEAI metric entirely for indeterminate
    results — the raw data remains available in DetailedItems for diagnostics.
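
    The rule described above can be sketched in Python as follows. This is an assumption-laden illustration: `to_metric` and the dict shape are hypothetical, and the actual change lives in the .NET FoundryEvals code.

```python
from typing import Optional


def to_metric(name: str, score: Optional[float], passed: Optional[bool]) -> Optional[dict]:
    """Build a metric record, or return None when the result is indeterminate."""
    if score is None and passed is None:
        # Neither value was returned: skip the metric instead of recording
        # a misleading "failed" result.
        return None
    return {"name": name, "score": score, "passed": passed}


def collect_metrics(raw_results: list) -> list:
    """Keep only the determinate results; the raw data is left untouched for diagnostics."""
    metrics = []
    for result in raw_results:
        metric = to_metric(result["name"], result.get("score"), result.get("passed"))
        if metric is not None:
            metrics.append(metric)
    return metrics
```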
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Address PR #4914 review comments: fix tool evaluator bug and add tests
    
    - Fix duplicate ToolCallAccuracy: resolve evaluator names before checking
      against ToolEvaluators set (Comment 2)
    - Make FilterToolEvaluators internal for testability; add tests for the
      ArgumentException edge case when all evaluators are tool-type (Comment 3)
    - Add CancellationToken test for LocalEvaluator (Comment 4)
    - Add EvaluateAsync integration test on Run with sequential workflow and
      per-agent SubResults verification (Comment 5)
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Address Peter's review comments on PR #4914
    
    - Add trailing newline to Evaluation_FoundryQuality.csproj (Comment 6)
    - Make evaluator name lookups case-insensitive: switch BuiltinEvaluators,
      ToolEvaluators, AgentEvaluators, and ResolveEvaluator's StartsWith check
      from Ordinal to OrdinalIgnoreCase (Comment 7)
    - Add Trace.TraceWarning when Foundry returns fewer results than submitted
      items, indicating expected vs actual count before padding (Comment 8)
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Add Microsoft.Extensions.AI.Evaluation packages to Directory.Packages.props
    
    These were removed in #5269 as unused, but are needed by the Foundry
    and core evaluation integration added in this PR.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: alliscode <bentho@microsoft.com>
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • Python: bump misc-integration retry delay to 30s (#5293)
    The misc-integration job (Anthropic, Ollama, MCP) frequently fails on merge to main when the upstream MCP server (e.g. learn.microsoft.com/api/mcp) returns a transient rate-limit error. The previous 5s retry delay is too short to ride out the upstream backoff window, so all retries fail and the merge queue is blocked. Bumping to 30s gives the upstream a chance to recover before pytest-retry re-runs the test.
  • Python: Stop emitting duplicate reasoning content from OpenAI response.reasoning_text.done and response.reasoning_summary_text.done events (#5162)
    * Fix reasoning text done events duplicating streamed delta content (#5157)
    
    The OpenAI Responses API sends both reasoning_text.delta (incremental
    chunks) and reasoning_text.done (full accumulated text) events. The
    chat client was emitting Content for both, causing ag-ui to append the
    full done text onto already-accumulated delta text, producing
    duplicated reasoning output.
    
    Stop emitting Content for reasoning_text.done and
    reasoning_summary_text.done events, matching how output_text.done is
    already handled (not emitted). The deltas contain all the content;
    the done event is redundant.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * fix(openai): emit reasoning done content as fallback when no deltas observed (#5157)
    
    Address PR review feedback:
    - Track item_ids that received reasoning deltas via seen_reasoning_delta_item_ids set
    - Emit content from done events only when no deltas were received for the
      item_id, preventing silent content loss on stream resumption
    - Add comment documenting code_interpreter done event asymmetry
    - Replace redundant ag-ui test with deduplication-focused test
    - Add integration test for delta+done sequence in OpenAI chat client tests
    - Add fallback path tests for done events without preceding deltas
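
    A minimal sketch of the fallback rule described in the list above. The `(kind, item_id, text)` tuple is a simplified, hypothetical event shape used only for illustration; the real client works with OpenAI Responses stream events.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical simplified event: (kind, item_id, text), where kind is "delta" or "done".
Event = Tuple[str, str, str]


def reasoning_text(events: Iterable[Event]) -> Iterator[str]:
    """Yield reasoning text once per item: deltas when present, done text otherwise."""
    seen_reasoning_delta_item_ids = set()
    for kind, item_id, text in events:
        if kind == "delta":
            seen_reasoning_delta_item_ids.add(item_id)
            yield text
        elif kind == "done" and item_id not in seen_reasoning_delta_item_ids:
            # No deltas were observed for this item (for example after a
            # stream resumption), so the done text is the only copy.
            yield text
```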
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Address review feedback for #5157: Python: [Bug]: "type": "response.reasoning_text.delta" and "response.reasoning_text.done" both get exposed as "text_reasoning"
    
    * Fix AG-UI reasoning streaming to use proper Start/End pattern (#5157)
    
    _emit_text_reasoning now follows the same streaming pattern as _emit_text:
    - Emits ReasoningStartEvent/ReasoningMessageStartEvent only on the first
      delta for a given message_id
    - Emits only ReasoningMessageContentEvent for subsequent deltas
    - Defers ReasoningMessageEndEvent/ReasoningEndEvent until
      _close_reasoning_block is called (on content type switch or end-of-run)
    
    This produces the correct protocol pattern:
      ReasoningStartEvent
        ReasoningMessageStartEvent
        ReasoningMessageContentEvent(delta1)
        ReasoningMessageContentEvent(delta2)
        ReasoningMessageEndEvent
      ReasoningEndEvent
    
    Instead of wrapping every delta in a full Start→End sequence.
    
    Backward compatibility is preserved: calling _emit_text_reasoning without
    a flow argument still produces the full sequence per call.
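
    The streaming pattern above can be sketched as a small state holder. This is an illustration only: `ReasoningFlow` is a hypothetical class, events are represented as plain strings, and closing an open block when the message_id changes is an assumption of this sketch.

```python
from typing import List, Optional


class ReasoningFlow:
    """Hypothetical state holder that tracks the currently open reasoning message."""

    def __init__(self) -> None:
        self.open_message_id: Optional[str] = None

    def emit_delta(self, message_id: str, delta: str) -> List[str]:
        """Emit start events only on the first delta for a message_id."""
        events: List[str] = []
        if self.open_message_id != message_id:
            # Assumption: a different message_id closes any block still open.
            events.extend(self.close())
            events += ["ReasoningStartEvent", "ReasoningMessageStartEvent"]
            self.open_message_id = message_id
        events.append(f"ReasoningMessageContentEvent({delta})")
        return events

    def close(self) -> List[str]:
        """Emit the deferred end events, if a block is open."""
        if self.open_message_id is None:
            return []
        self.open_message_id = None
        return ["ReasoningMessageEndEvent", "ReasoningEndEvent"]
```

    Calling `emit_delta` twice for the same message and then `close` yields one Start/End pair around both content events.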
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Fix import ordering lint error in AG-UI test file (#5157)
    
    Move inline import of TextMessageContentEvent to the top-level import
    block and ensure alphabetical ordering to satisfy ruff I001 rule.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Fix mypy error: rename loop variable to avoid type conflict with WorkflowEvent
    
    The 'event' variable was already typed as WorkflowEvent[Any] from the
    async for loop at line 590. Reusing it in the _close_reasoning_block
    loop (which returns list[BaseEvent]) caused an incompatible assignment
    error. Renamed to 'reasoning_evt' to avoid the conflict.
    
    Fixes #5162
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Address review feedback for #5157: review comment fixes
    
    * narrow test result reporting to explicit pytest JUnit XML
    
    * Fix test args
    
    * Fix pytest-results-action in merge workflow and remove committed test artifacts
    
    Apply the same JUnit XML fix from python-tests.yml to python-merge-tests.yml:
    add --junitxml=pytest.xml to all test commands and narrow the results action
    path from ./python/**.xml to ./python/pytest.xml. Also remove accidentally
    committed pytest.xml and python-coverage.xml and add them to .gitignore.
    
    ---------
    
    Co-authored-by: Copilot <copilot@github.com>
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • .NET: Improve resilience of verify-samples by building separately and improving evaluation instructions (#5151)
    * Improve resilience of verify-samples by building separately and improving evaluation instructions
    
    * Address PR comments
    
    * Address PR comment
  • .NET: Add github actions workflow for verify-samples (#5034)
    * Add github actions workflow for verify-samples
    
    * Make workflow run as part of PR (for now)
    
    * Update workflow to remove pr trigger
    
    * Address PR comments
  • Python: [BREAKING] Python: move Azure AI embeddings to Foundry (#5056)
    * renamed AzureAIInferenceEmbeddings and lazy load azure-cosmos and env var rename
    
    * updated coverage
    
    * fix readme
  • Python: Move workflow-samples and agent-samples under declarative-agents directory (#5011)
    * Move workflow-samples and agent-samples under declarative-agents and update all references
    
    Agent-Logs-Url: https://github.com/microsoft/agent-framework/sessions/f70f7d19-9256-4eec-b7db-28007d74440c
    
    Co-authored-by: sphenry <6749825+sphenry@users.noreply.github.com>
    
    * Fix relative paths in README files inside moved directories
    
    Agent-Logs-Url: https://github.com/microsoft/agent-framework/sessions/f70f7d19-9256-4eec-b7db-28007d74440c
    
    Co-authored-by: sphenry <6749825+sphenry@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
    Co-authored-by: sphenry <6749825+sphenry@users.noreply.github.com>
    Co-authored-by: Shawn Henry <shahen@microsoft.com>
  • Python: Fix SK migration samples (#5047)
    * Fix SK migration samples
    
    * Fix env vars for SK
    
    * Hard code model for shell tool samples
  • Python: [BREAKING] Standardize model selection on model (#4999)
    * Refactor Anthropic model option and provider clients
    
    Rename the Anthropic client model option from model_id to model, add provider-specific Anthropic wrappers for Foundry, Bedrock, and Vertex, and expose them through the Anthropic, Foundry, Amazon, and Google namespaces. Update core option handling, docs, samples, and tests accordingly.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Fix Anthropic skills sample typing
    
    Cast the Anthropic beta client to Any in the skills sample so the pre-commit sample pyright check no longer fails on beta skills and files endpoints that are not exposed by the current SDK stubs.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * undo sample mypy
    
    * Retry CI after transient external failures
    
    Retrigger PR validation after an unrelated Copilot review workflow SAML failure and a transient external tau2 git fetch failure in the Windows Python test setup.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Address review feedback on model option merging
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Address Anthropic compatibility review feedback
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * moved all to `model`
    
    * fixes for azure ai search
    
    * Python: standardize remaining sample env var names
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Python: fix foundry-local pyright compatibility
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * updated env vars in cicd
    
    ---------
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • Python: Enforce Foundry package unit test coverage (#5036)
    * Enforce Foundry package unit test coverage
    
    * Sort ENFORCED_TARGETS alphabetically in python-check-coverage.py
    
    Agent-Logs-Url: https://github.com/microsoft/agent-framework/sessions/ed0b81ed-c267-4ee0-9655-56c4b3066fad
    
    Co-authored-by: TaoChenOSU <12570346+TaoChenOSU@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
    Co-authored-by: TaoChenOSU <12570346+TaoChenOSU@users.noreply.github.com>
  • Python: [BREAKING] Remove deprecated Python OpenAI/Azure AI surfaces (#4990)
    * [BREAKING] Remove deprecated Python OpenAI/Azure AI surfaces
    
    Also clean up follow-on docs, environment guidance, package metadata, and lab test stability.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Fix deleted semantic-kernel sample links
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Address PR review feedback
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * improve foundry language
    
    * Fix A2A Foundry sample regression
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • Python: Fix samples (#4980)
    * First samples 1st batch
    
    * Fix sample paths
    
    * Fix workflow samples
    
    * Fix workflow dependency
    
    * Correct env vars
    
    * Increase idle timeout
    
    * Fix workflows HIL sample
    
    * Fix more workflow samples
  • Python: [BREAKING] Remove deprecated kwargs compatibility paths (#4858)
    * [BREAKING] Remove deprecated kwargs compatibility paths
    
    Remove the deprecated kwargs compatibility shims across core agents, clients, tools, middleware, and telemetry.
    
    Keep workflow kwargs behavior intact in this branch and follow up separately in #4850.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Fix PR CI fallout for kwargs removal
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Address PR review feedback
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * updates
    
    * Fix Azure AI CI fallout
    
    Remove the stale _get_current_conversation_id override from the Azure AI client after the OpenAI base helper was deleted.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * fixed new classes
    
    * Fix Assistants deprecated import gating
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Fix integration replay regressions
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Switch multi-agent hosting samples to Azure chat completions
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Simplify Azure multi-agent sample config
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • [BREAKING] Python: fix OpenAI Azure routing and provider samples (#4925)
    * Python: fix OpenAI Azure routing and provider samples
    
    Prefer OpenAI when OPENAI_API_KEY is present unless Azure is explicitly requested. Clarify constructor docs, keep deprecated Azure wrappers compatible with stricter settings validation, and refresh the provider samples and tests to use the current client patterns.
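
    A hedged sketch of the routing preference described above. `pick_provider` is a hypothetical helper, and treating an explicit `azure_endpoint` argument as the "Azure explicitly requested" signal, as well as the `AZURE_OPENAI_ENDPOINT` fallback, are assumptions of this sketch rather than the client's documented rules.

```python
from typing import Mapping, Optional


def pick_provider(env: Mapping[str, str], azure_endpoint: Optional[str] = None) -> str:
    """Return "azure" or "openai" for a given environment and explicit arguments."""
    if azure_endpoint:
        # Assumption: an explicit Azure argument always wins.
        return "azure"
    if env.get("OPENAI_API_KEY"):
        # OpenAI is preferred whenever its key is present.
        return "openai"
    if env.get("AZURE_OPENAI_ENDPOINT"):
        # Assumption: Azure settings in the environment are only a fallback.
        return "azure"
    raise ValueError("no OpenAI or Azure OpenAI configuration found")
```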
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * fix bandit
    
    * Python: align OpenAI embedding Azure routing
    
    Extend the shared OpenAI-vs-Azure routing and credential behavior to the embedding client, add Azure embedding regression coverage, and refresh the embedding samples to use the generic client path.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Python: fix embedding client pyright check
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Python: thin OpenAI embedding wrapper
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Python: document embedding overload routing
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Python: fix callable OpenAI key routing
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Python: fix Azure credential routing tests
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Python: address OpenAI review feedback
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Python: narrow Azure routing markers
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Python: refine OpenAI model fallback order
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Python: narrow Azure deployment docs
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Python: remove embedding routing wording
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Python: run embedding Azure integration tests
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * changed variable name
    
    * Python: expand OpenAI package README
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * clarified readme
    
    * Python: fix Azure OpenAI integration setup
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Python: correct Azure integration env mapping
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * updated code to fix int tests
    
    * test updates
    
    * test fix
    
    * fix test setup
    
    * updates to tests and setup
    
    * remove openai assistants int tests
    
    * improvements in int tests
    
    * fix env var
    
    * fix env vars
    
    * fix azure responses test
    
    * trigger actions
    
    ---------
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • Python: [BREAKING] Python: Provider-leading client design & OpenAI package extraction (#4818)
    * Python: Provider-leading client design & OpenAI package extraction
    
    Major refactoring of the Python Agent Framework client architecture:
    
    - Extract OpenAI clients into new `agent-framework-openai` package
    - Core package no longer depends on openai, azure-identity, azure-ai-projects
    - Rename clients for discoverability: OpenAIResponsesClient → OpenAIChatClient,
      OpenAIChatClient → OpenAIChatCompletionClient
    - Unify `model_id`/`deployment_name`/`model_deployment_name` → `model` param
    - New FoundryChatClient for Azure AI Foundry Responses API
    - New FoundryAgent/FoundryAgentClient for connecting to pre-configured Foundry agents
    - Remove OpenAIBase/OpenAIConfigMixin from non-deprecated client MRO
    - Deprecate AzureOpenAI* clients, AzureAIClient, OpenAIAssistantsClient
    - Reorganize samples: azure_openai+azure_ai+azure_ai_agent → azure/
    - ADR-0020: Provider-Leading Client Design
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * fix: missing Agent imports in samples, .model_id → .model in foundry_local sample
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * fix: CI failures — mypy errors, coverage targets, sample imports
    
    - azure-ai mypy: add type ignores for TypedDict total=, model arg, forward ref
    - Coverage: replace core.azure/openai targets with openai package target
    - project_provider: add type annotation for opts dict
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * fix: populate openai .pyi stub, fix broken README links, coverage targets
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * fixes
    
    * updated observability
    
    * reset azure init.pyi
    
    * fix errors
    
    * updated adr number
    
    * fix foundry local
    
    * fixed not renamed docstrings and comments, and added deprecated markers to old classes
    
    * fix tests and pyprojects
    
    * fix test vars
    
    * updated function tests
    
    * update durable
    
    * updated test setup for functions
    
    * Fix Foundry auth in workflow samples
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Stabilize Python integration workflows
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Update hosting samples for Foundry
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Trigger full CI rerun
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Trigger CI rerun again
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * trigger rerun
    
    * trigger rerun
    
    * fix for litellm
    
    * undo durabletask changes
    
    * Move Foundry APIs into foundry namespace
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Fix Foundry pyproject formatting
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Split provider samples by Foundry surface
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Restore hosting sample requirements
    
    Also fix the Foundry Local sample link after the provider sample move.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * updated tests
    
    * updated foundry integration tests
    
    * removed dist from azurefunctions tests
    
    * Use separate Foundry clients for concurrent agents
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * fix client setup in azfunc and durable
    
    * disabled two tests
    
    * updated setup for some function and durable tests
    
    * improved azure openai setup with new clients
    
    * ignore deprecated
    
    * fixes
    
    * skip 11
    
    * remove openai assistants int tests
    
    ---------
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • Python: Update sample validation scripts (#4870)
    * Update sample validation scripts
    
    * Adjust prompt
    
    * Update autogen-migration samples
    
    * Add fix suggestion
    
    * Split jobs
    
    * Add .env
    
    * Create trend report
    
    * Add timestamp
    
    * Add more env vars
    
    * Comments
    
    * force node24
    
    * force node24
    
    * force node22
  • Bump actions/download-artifact from 7 to 8 (#4372)
    Bumps [actions/download-artifact](https://github.com/actions/download-artifact) from 7 to 8.
    - [Release notes](https://github.com/actions/download-artifact/releases)
    - [Commits](https://github.com/actions/download-artifact/compare/v7...v8)
    
    ---
    updated-dependencies:
    - dependency-name: actions/download-artifact
      dependency-version: '8'
      dependency-type: direct:production
      update-type: version-update:semver-major
    ...
    
    Signed-off-by: dependabot[bot] <support@github.com>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
  • Update script to ping only on waiting-for-author label (#4812)
    * update script to ping only on the waiting-for-author label
    
    * Update .github/scripts/stale_issue_pr_ping.py
    
    Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
    
    * Update .github/scripts/stale_issue_pr_ping.py
    
    Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
    
    * Fix docstring
    
    ---------
    
    Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
  • Add automated stale issue and PR follow-up ping workflow (#4776)
    * Add script to ping on stale issues/PRs
    
    * Add script to ping on stale issues/PRs
    
    * Fix stale issue/PR ping script review comments
    
    - Rename TEAM_NAME env var to TEAM_SLUG for clarity
    - Add actionable error messages for 403/404 team lookup failures
    - Add contents:read permission for actions/checkout
    - Use github.event.inputs context with fallback for scheduled runs
    - Pin PyGithub to 2.6.0 for reproducible builds
    - Fetch comments once in should_ping() to reduce API calls
    - Make ping() retry loop idempotent (track comment/label state)
    - Validate DAYS_THRESHOLD with helpful error for non-numeric input
    - Fix timezone bug: use astimezone() instead of replace(tzinfo=)
    - Add comprehensive unit tests (29 tests)
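
    For context on the timezone item in the list above, this short example shows the general difference between the two calls on an aware datetime. The timestamp is made up, and the exact code path fixed in the script is not shown here.

```python
from datetime import datetime, timedelta, timezone

# A made-up timestamp at 12:00 in a UTC+02:00 zone.
created = datetime(2024, 1, 1, 12, 0, tzinfo=timezone(timedelta(hours=2)))

# replace() only swaps the label, so the instant shifts by the offset.
relabelled = created.replace(tzinfo=timezone.utc)

# astimezone() converts the same instant into the target zone.
converted = created.astimezone(timezone.utc)

print(relabelled.isoformat())  # 2024-01-01T12:00:00+00:00
print(converted.isoformat())   # 2024-01-01T10:00:00+00:00
```

    Age calculations based on `relabelled` would be off by the original UTC offset, which is why a days-threshold comparison needs the converted value.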
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: Copilot <copilot@github.com>
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • Python: Simplify Python Poe tasks and unify package selectors (#4722)
    * updated automation tasks and commands, with alias for the time being
    
    * Restore aggregate test exclusions
    
    Preserve the legacy all-tests scope for test --all by excluding lab and devui from the default aggregate sweep, while still allowing explicit package selection. Also ignore hidden/generated test directories such as .mypy_cache during aggregate discovery.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * updated versions in pre-commit
    
    ---------
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • Bump actions/upload-artifact from 4 to 7 (#4373)
    Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4 to 7.
    - [Release notes](https://github.com/actions/upload-artifact/releases)
    - [Commits](https://github.com/actions/upload-artifact/compare/v4...v7)
    
    ---
    updated-dependencies:
    - dependency-name: actions/upload-artifact
      dependency-version: '7'
      dependency-type: direct:production
      update-type: version-update:semver-major
    ...
    
    Signed-off-by: dependabot[bot] <support@github.com>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
  • Bump MishaKav/pytest-coverage-comment from 1.2.0 to 1.6.0 (#4543)
    Bumps [MishaKav/pytest-coverage-comment](https://github.com/mishakav/pytest-coverage-comment) from 1.2.0 to 1.6.0.
    - [Release notes](https://github.com/mishakav/pytest-coverage-comment/releases)
    - [Changelog](https://github.com/MishaKav/pytest-coverage-comment/blob/main/CHANGELOG.md)
    - [Commits](https://github.com/mishakav/pytest-coverage-comment/compare/v1.2.0...v1.6.0)
    
    ---
    updated-dependencies:
    - dependency-name: MishaKav/pytest-coverage-comment
      dependency-version: 1.6.0
      dependency-type: direct:production
      update-type: version-update:semver-minor
    ...
    
    Signed-off-by: dependabot[bot] <support@github.com>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
  • Bump danielpalme/ReportGenerator-GitHub-Action from 5.5.1 to 5.5.3 (#4542)
    Bumps [danielpalme/ReportGenerator-GitHub-Action](https://github.com/danielpalme/reportgenerator-github-action) from 5.5.1 to 5.5.3.
    - [Release notes](https://github.com/danielpalme/reportgenerator-github-action/releases)
    - [Commits](https://github.com/danielpalme/reportgenerator-github-action/compare/5.5.1...5.5.3)
    
    ---
    updated-dependencies:
    - dependency-name: danielpalme/ReportGenerator-GitHub-Action
      dependency-version: 5.5.3
      dependency-type: direct:production
      update-type: version-update:semver-patch
    ...
    
    Signed-off-by: dependabot[bot] <support@github.com>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
  • Bump actions/setup-dotnet from 5.1.0 to 5.2.0 (#4541)
    Bumps [actions/setup-dotnet](https://github.com/actions/setup-dotnet) from 5.1.0 to 5.2.0.
    - [Release notes](https://github.com/actions/setup-dotnet/releases)
    - [Commits](https://github.com/actions/setup-dotnet/compare/v5.1.0...v5.2.0)
    
    ---
    updated-dependencies:
    - dependency-name: actions/setup-dotnet
      dependency-version: 5.2.0
      dependency-type: direct:production
      update-type: version-update:semver-minor
    ...
    
    Signed-off-by: dependabot[bot] <support@github.com>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
  • Python: chore(python): improve dependency range automation (#4343)
    * chore(python): improve dependency range automation
    
    - tighten dependency bounds and coding standards guidance
    - add dependency range validation workflow, reporting, and issue automation
    - update related tests and dependency pins for compatibility
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * updated text and pyarrow
    
    * new lock
    
    * fixed workflow
    
    * updated deps
    
    * fix tiktoken
    
    * chore(python): refine dependency validation workflows
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * docs(python): add high-level dependency validation comments
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * WIP
    
    * added additional comments and excludes
    
    * added dev dependency handling and workflow and updates to package ranges
    
    * added readme and simplified commands
    
    * fix markers
    
    * chore(python): address dependency review feedback
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Tighten dependency bounds, remove stale overrides, restore Python 3.10 support
    
    - Apply dependency bound policy across all packages: stable >=1.0 deps use
      >=floor,<next_major; pre-1.0/prerelease deps use validated hard-bounded ranges
    - Remove stale root tool.uv.override-dependencies (uvicorn, websockets, grpcio)
    - Lower github_copilot requires-python to >=3.10 with github-copilot-sdk gated
      behind python_version >= 3.11 marker; import raises ImportError on 3.10
    - Skip github_copilot pyright/mypy/test tasks on Python <3.11
    - Use version-conditional pyrightconfig for samples on Python 3.10
    - Add compatibility fix in core responses client for older openai typed dicts
    - Normalize uv.lock prerelease mode and refresh dev dependencies
    - Update CODING_STANDARD.md, DEV_SETUP.md, and package management skill docs
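
    The stable-dependency half of the bound policy above (`>=floor,<next_major`) can be sketched as a small helper. `stable_range` is a hypothetical function for illustration; it is not the repository's validation script, and it deliberately refuses pre-1.0 floors because the policy handles those with separately validated ranges.

```python
def stable_range(floor: str) -> str:
    """Return a '>=floor,<next_major' specifier for a stable (>=1.0) dependency."""
    major = int(floor.split(".")[0])
    if major < 1:
        # Pre-1.0 dependencies use hand-validated hard bounds instead.
        raise ValueError("this sketch only covers stable >=1.0 floors")
    return f">={floor},<{major + 1}"


print(stable_range("1.5.0"))  # >=1.5.0,<2
```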
    
    Closes #902
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * small tweaks
    
    * add note in workflow
    
    * fix workflows and several versions
    
    * fix duplicate
    
    ---------
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>