Commit Graph

2229 Commits

  • Fix pyupgrade and AGENTS.md reconnect description
    - pyupgrade: drop forward-reference string annotations in _mcp.py (Python 3.10+ resolves them natively now that MCPTaskOptions is defined before use).
    
    - AGENTS.md: align reconnect description with current behavior. Phase 1 (initial tools/call) does NOT retry on connection loss; raises 'connection lost; task state unknown' instead, so a server that accepted the request but lost the response cannot start the operation twice. Phase 2 (tasks/get / tasks/result) still reconnects once against the same task_id.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • Python: Add MCP-based skills discovery (McpSkillsSource) (#6169)
    * Add MCP-based skills discovery (McpSkill, McpSkillsSource, McpSkillResource)
    
    Implement Agent Skills discovery over MCP following the SEP-2640 convention:
    - McpSkillsSource: reads skill://index.json to discover skills served by an MCP server
    - McpSkill: lazily fetches SKILL.md content via resources/read on demand
    - McpSkillResource: wraps MCP resource results (text and binary)
    - Path traversal protection in get_resource for defense in depth
    - Samples for Foundry Toolbox and standalone MCP skills server
    - Comprehensive unit tests (514 lines)
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Address PR review comments: rename to MCP* convention, fix error handling and samples
    
    - Rename McpSkill/McpSkillResource/McpSkillsSource to MCPSkill/MCPSkillResource/MCPSkillsSource
    - Add data-URI prefix stripping for blob resource decoding
    - Let non-McpError exceptions propagate from get_resource()
    - Fix contradictory test comment
    - Use interactive input() in mcp_based_skill sample
    - Remove misleading sample output block
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Restore debug logging for McpError in get_resource()
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Use AzureCliCredential in Foundry toolbox skills sample for consistency
    
    Replace DefaultAzureCredential with AzureCliCredential to match the
    credential convention used in all other samples.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Use MCPStreamableHTTPTool in MCP skills sample
    
    Replace raw mcp library imports (ClientSession, streamable_http_client)
    with the framework's MCPStreamableHTTPTool to keep MCP server connections
    consistent regardless of whether skills are enabled.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Branch on McpError.error.code so only not-found errors return empty
    
    Previously _try_read_index() and get_resource() swallowed every McpError
    as 'no skills available', making auth failures, server crashes, and
    connection drops indistinguishable from a server that simply has no
    skills.
    
    Now only two codes are treated as not-found:
    - -32002 (MCP-spec Resource not found)
    - -32601 (METHOD_NOT_FOUND — server lacks resources/read)
    
    All other McpError codes and non-McpError exceptions propagate with a
    warning log, surfacing real failures visibly.
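
The branching described above can be sketched as follows. This is an illustration only: `FakeMcpError`, `FakeErrorData`, and `read_index_or_empty` are hypothetical stand-ins, not the framework's or the mcp library's classes. Only the two error codes come from the commit message.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)

# The two codes the commit message treats as "not found".
RESOURCE_NOT_FOUND = -32002  # MCP-spec Resource not found
METHOD_NOT_FOUND = -32601    # server lacks resources/read
NOT_FOUND_CODES = frozenset({RESOURCE_NOT_FOUND, METHOD_NOT_FOUND})


@dataclass
class FakeErrorData:
    code: int
    message: str


class FakeMcpError(Exception):
    """Hypothetical stand-in for an MCP error that exposes `.error.code`."""

    def __init__(self, code: int, message: str) -> None:
        super().__init__(message)
        self.error = FakeErrorData(code, message)


def read_index_or_empty(read):
    """Call `read()`; return [] only for not-found codes, re-raise the rest."""
    try:
        return read()
    except FakeMcpError as exc:
        if exc.error.code in NOT_FOUND_CODES:
            return []  # the server simply has no skills
        logger.warning("MCP error %s while reading skill index", exc.error.code)
        raise
    except Exception as exc:
        # Timeouts, connection drops, etc. are logged and surfaced, not hidden.
        logger.warning("Unexpected failure while reading skill index: %r", exc)
        raise
```

With this shape, an auth failure or a dropped connection reaches the caller as an exception, while a server without skills still yields an empty result.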
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Add tests for non-McpError and non-not-found error propagation in MCP skills
    
    Cover the re-raise branch in MCPSkill.get_resource for plain
    ConnectionError/TimeoutError, the generic McpError (code 0) propagation
    on get_resource, and TimeoutError propagation in _try_read_index.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Revert "Use MCPStreamableHTTPTool in MCP skills sample"
    
    This reverts commit f31ed0ded9.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Introduce MCP_SKILLS experimental feature for MCP skill classes
    
    Add a separate MCP_SKILLS feature ID to ExperimentalFeature enum and
    use it for MCPSkillResource, MCPSkill, and MCPSkillsSource, since their
    promotion timeline is partly outside of our control.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • .NET: Bug fixes for AGUI hosting and workflows (#6311)
    * Add mcp tool execution fix
    
    * Apply IsolationKeyScopedAgentSessionStore to MapAGUI by default if not yet set and improve comments in samples
    
    * Address PR comments
    
    * Fix formatting
  • .NET: Add ILoggerFactory and IServiceProvider to HarnessAgent constructor (#6273)
    * Add ILoggerFactory and IServiceProvider to HarnessAgent constructor
    
    Add optional ILoggerFactory and IServiceProvider parameters to the
    HarnessAgent constructor and AsHarnessAgent extension method, passing
    them to all downstream components that accept them:
    
    - FunctionInvokingChatClient (via UseFunctionInvocation)
    - CompactionProvider
    - AgentSkillsProvider
    - ChatClientAgent (via BuildAIAgent)
    - AIAgentBuilder.Build()
    
    Closes #6103
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Improve tests to verify ILoggerFactory and IServiceProvider propagation
    
    - Add test verifying ILoggerFactory.CreateLogger() is called by
      downstream components (CompactionProvider, AgentSkillsProvider)
    - Add test verifying IServiceProvider is queried during pipeline build
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • Python: progressive tool exposure via FunctionInvocationContext (#6233)
    * Python: progressive tool exposure via FunctionInvocationContext
    
    Add first-class progressive tool exposure to the Python core function-calling
    loop. Tools can now add or remove real FunctionTool schemas at runtime via the
    injected FunctionInvocationContext, taking effect on the next iteration of the
    loop.
    
    - FunctionInvocationContext gains a live `tools` list plus experimental
      `add_tools()` / `remove_tools()` helpers (feature: PROGRESSIVE_TOOLS).
    - The function-calling loop establishes a run-local, normalized tools list and
      threads it into the context at both invocation paths so mutations propagate.
    - Add a sample (dynamic_tool_exposure.py) and a tools samples README, including
      a note that CodeAct providers (Monty/Hyperlight) use their own provider-level
      tool management instead.
    
    Supersedes #3877.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Validate non-negative input in dynamic_tool_exposure sample tools
    
    Address review feedback: factorial and fibonacci now return an error
    message for negative n instead of producing incorrect results.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Make add_tools atomic and surface swallowed function errors
    
    Address review feedback on progressive tool exposure:
    
    - add_tools now validates the full batch against a throwaway copy before
      committing, so a duplicate-name clash partway through a sequence leaves
      the live tool list unchanged (all-or-nothing).
    - _auto_invoke_function now logs a warning (with traceback) when a tool
      raises, so contract errors such as a duplicate-name ValueError from
      add_tools are debuggable without enabling include_detailed_errors.
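
A minimal sketch of the all-or-nothing behavior described in the first bullet, assuming tools are identified by name. `Tool` and `add_tools_atomically` are hypothetical names for illustration and are not the framework's `FunctionTool` or `add_tools()`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    """Hypothetical minimal tool: only the name matters for the clash check."""

    name: str


def add_tools_atomically(live_tools: list, new_tools: list) -> None:
    """Append `new_tools` to `live_tools`, or raise and change nothing."""
    staged = list(live_tools)  # throwaway copy; the live list is untouched so far
    seen = {tool.name for tool in staged}
    for tool in new_tools:
        if tool.name in seen:
            # Raising here leaves `live_tools` exactly as it was.
            raise ValueError(f"duplicate tool name: {tool.name!r}")
        seen.add(tool.name)
        staged.append(tool)
    # Commit in place so other holders of the same list object see the update.
    live_tools[:] = staged
```

Validating against the copy first means a clash on the last item of a batch cannot leave the earlier items half-registered.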
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Avoid retaining tracebacks when logging swallowed function errors
    
    Logging with exc_info=exc fed the exception traceback to the logging
    machinery, whose frame references created reference cycles collected
    lazily by the cyclic GC. On Windows that could drop a hyperlight
    WasmSandbox on a non-owning thread ("unsendable, dropped on another
    thread"), crashing the xdist worker. Log a pre-formatted message with
    the exception repr instead, so no traceback object is retained.
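
The logging change can be illustrated with a short sketch. The logger name and the `log_swallowed_error` helper are assumptions made for this example, and the message format is not taken from the framework.

```python
import logging

logger = logging.getLogger("tool_errors")


def log_swallowed_error(tool_name: str, exc: BaseException) -> str:
    """Log a pre-formatted message and return it for inspection."""
    # No exc_info= argument: the log record receives only a string,
    # so it holds no reference to the traceback or its frames.
    message = f"Function {tool_name!r} raised: {exc!r}"
    logger.warning(message)
    return message
```

The trade-off is that the log line loses the stack trace and keeps only the exception type and message.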
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * added missing decorator
    
    ---------
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • Python: Promote agent-framework-declarative package to RC (#6256)
    * Promote agent-framework-declarative package to RC
    
    * Update missed package status file.
  • Python: Fix FoundryAgent stripping model from PromptAgent requests (#5526)
    * Fix FoundryAgent stripping model from PromptAgent requests
    
    Move run_options.pop('model', None) inside the _uses_foundry_agent_session()
    conditional so that model is only stripped for hosted agent sessions (where
    the server manages the model) and preserved for PromptAgent requests that
    require it in the Responses API call.
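
A sketch of the gating described above, under the assumption that options are a plain dict. `prepare_run_options` and its `uses_hosted_session` flag are hypothetical names; the real check is `_uses_foundry_agent_session()`.

```python
def prepare_run_options(run_options: dict, uses_hosted_session: bool) -> dict:
    """Return a copy of the options, dropping `model` only for hosted sessions."""
    options = dict(run_options)  # do not mutate the caller's dict
    if uses_hosted_session:
        # The server manages the model for hosted agent sessions.
        options.pop("model", None)
    return options
```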
    
    Fixes #5525
    
    * test: add coverage for resp_* continuation preserving model
    
    Adds test_raw_foundry_agent_chat_client_prepare_options_preserves_model_for_resp_continuation
    to explicitly verify that HostedAgent v1 / v2-no-session paths (where conversation_id
    starts with resp_) preserve model and previous_response_id without triggering the
    hosted-session gate.
    
    ---------
    
    Co-authored-by: Benke Qu <bequ@microsoft.com>
    Co-authored-by: Evan Mattson <35585003+moonbox3@users.noreply.github.com>
  • .NET: Promote Workflows.Declarative packages to stable versions (#6254)
    * Promote Workflows.Declarative packages to stable versions
    
    * Address PR feedback: enable package validation on GA declarative packages
    
    Both Workflows.Declarative and Workflows.Declarative.Mcp set IsReleased=true
    but were disabling package validation, bypassing the repo's GA convention
    (see dotnet/nuget/nuget-package.props which auto-enables validation when
    IsReleased=true).
    
    Re-enable validation by removing the local EnablePackageValidation=false
    overrides and pointing PackageValidationBaselineVersion at 1.8.0-rc1 (the
    latest published version of each package). This catches accidental breaking
    changes between RC and the first GA. Future GAs should bump the baseline to
    the previous GA version.
    
    Verified locally: dotnet build -c Release on both projects runs
    RunPackageValidation -> APICompat successfully without finding any
    breaking changes.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Update statement for the baseline validation.
    
    ---------
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • Python: Fix OTLP HTTP base-endpoint losing /v1/{signal} auto-append (#5913)
    * Python: Fix OTLP HTTP base-endpoint losing /v1/{signal} auto-append
    
    Per the OTel spec, OTEL_EXPORTER_OTLP_ENDPOINT is a *base* URL for HTTP —
    the SDK auto-appends /v1/traces, /v1/metrics, /v1/logs when it reads the
    env var directly. Signal-specific endpoint env vars are *full* URLs used
    verbatim.
    
    _get_exporters_from_env read the base endpoint and forwarded it as the
    constructor ``endpoint=`` argument, which the SDK always treats as a full
    signal URL. As a result, with OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
    and HTTP protocol, the exporter sent to http://localhost:4318 instead of
    http://localhost:4318/v1/traces (and likewise for metrics/logs).
    
    Replicate the spec's auto-append here when falling back to the base
    endpoint under HTTP. gRPC behavior is unchanged.
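
The resolution rule above can be sketched as a small helper. `resolve_http_endpoint` is a hypothetical name, and stripping a trailing slash before appending is an assumption of this sketch, not something the commit message states.

```python
from typing import Optional


def resolve_http_endpoint(
    base: Optional[str], signal_specific: Optional[str], signal: str
) -> Optional[str]:
    """Return the full URL for `signal` ("traces", "metrics" or "logs")."""
    if signal_specific:
        # Signal-specific endpoints are full URLs and are used verbatim.
        return signal_specific
    if base:
        # The base endpoint gets /v1/{signal} appended (no double slash).
        return f"{base.rstrip('/')}/v1/{signal}"
    return None
```

For example, a base of `http://localhost:4318` resolves to `http://localhost:4318/v1/traces` for the traces signal.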
    
    * Python: Fix mypy type errors in OTLP endpoint assignment
    
    Pre-declare traces_endpoint, metrics_endpoint, logs_endpoint as
    str | None before the if/else block. Mypy inferred str from the
    if-branch f-string assignments and then rejected the str | None
    expressions in the else-branch as incompatible.
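
The typing fix follows a general pattern: declare the widest type before branching so that both branches assign into it. The function below is a hypothetical reduction of that pattern, written with `Optional[str]`, which is equivalent to `str | None`.

```python
from typing import Optional


def pick_endpoints(base: Optional[str], use_http: bool) -> tuple:
    """Return (traces_endpoint, metrics_endpoint) for the given base URL."""
    # Pre-declare with the wide type; without these lines mypy would infer
    # plain `str` from the f-string assignments in the first branch.
    traces_endpoint: Optional[str] = None
    metrics_endpoint: Optional[str] = None
    if use_http and base is not None:
        traces_endpoint = f"{base}/v1/traces"
        metrics_endpoint = f"{base}/v1/metrics"
    else:
        traces_endpoint = base  # may be None, which the declared type allows
        metrics_endpoint = base
    return traces_endpoint, metrics_endpoint
```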
  • .NET: Add Hosted-ToolboxMcpSkills sample (#6175)
    * .NET: Add Hosted-ToolboxMcpSkills sample
    
    Adds a hosted Foundry Responses sample that discovers MCP-based skills from a Foundry Toolbox and makes them available to the agent via AgentSkillsProvider.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Align README and Program.cs default model to gpt-5
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Clarify MCP skills provider log to avoid implying eager discovery
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Drop redundant skills provider configured log
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Add Foundry Toolbox Skills tag to manifest
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Simplify BearerTokenHandler by deriving from HttpClientHandler
    
    Removes the need for an explicit InnerHandler. Enables CheckCertificateRevocationList to satisfy CA5399.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • ci: harden Python test coverage workflow (#5982)
    Improve input handling and token management in the Python test coverage
    workflows.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • Python: Persist hosted MCP call/results as canonical mcp_call output (#6070)
    * Persist hosted MCP call/results as canonical mcp_call output
    
    - Preserve hosted MCP call/result pairs as canonical mcp_call output items
    
    - Coalesce MCP call + result in non-streaming conversion path
    
    - Keep call-id alignment for MCP tool call tracking and output mapping
    
    - Update tests and package metadata
    
    * Fix missing Mapping import in hosted responses adapter
    
    * Fix pyright unknown type in MCP output stringification
    
    * Fix typing for MCP output sequence iteration
    
    * Improve MCP output robustness and avoid eager flattening
    
    * Bump foundry_hosting to b7 and update responses dependency to b7
    
    * Restore foundry_hosting package version to 1.0.0a260521
    
    * Refactor hosted MCP output parsing
  • Python: feat(bedrock): implement native structured output support via Converse API (#6052)
    * feat(bedrock): add structured output support via Converse API (Fixes #5966)
    
    * fix(bedrock): improve unsupported model exception handling and schema parsing
    
    * refactor(bedrock): use generic traversal for strict schema enforcement
    
    * address Copilot review comments on structured output
    
    * refine bedrock structured output: guard additionalProperties, TypeError check, docs + test
    
    * fix(bedrock): widen response_format to Mapping and add missing test coverage
  • Python: feat(evals): Foundry Adaptive Evals integration (rubric-generation) (#6101)
    * Python: feat(evals): RubricScore type + EvalScoreResult.dimensions
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Python: feat(foundry-evals): RubricDimension + GeneratedEvaluatorRef + accept in evaluators=
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Python: feat(evals): parse rubric_scores from output items + assertion helpers
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Python: feat(evals): BaseAgent.as_eval_source / Workflow.as_eval_source
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Python: feat(foundry-evals): EvalGenerationSource + generate_rubric helper
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Python: feat(foundry-evals): YAML config loader + sample
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Python: fix(evals): address PR review feedback
    
    Addresses 4 Copilot review comments on PR #6101:
    
    1. assert_dimension_score_at_least: drop the (not evaluator or found_any) guard so require_applicable=True correctly raises when the named evaluator produces no entries for the dimension. Adds TestRubricAssertions covering the regression.
    
    2. GeneratedEvaluatorRef docstring: reword to describe actual behaviour (pinning recommended, not required) so it matches the dataclass default and FoundryEvals warning path.
    
    3. _poll_generation_job: switch from asyncio.get_event_loop() to get_running_loop() and bound the per-iteration sleep by remaining time, matching _poll_eval_run.
    
    4. generate_rubric: type category as Literal['quality','safety'] and validate at the entry point with a ValueError; drop the silent 'invalid -> quality' rewrite in _generation_job_to_ref. Adds a regression test.
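
Item 3 describes a deadline-bounded polling loop. A minimal sketch of that technique follows; `poll_until` and its parameters are hypothetical and do not reproduce `_poll_generation_job` or `_poll_eval_run`.

```python
import asyncio


async def poll_until(check, timeout: float, interval: float = 1.0):
    """Await `check()` until it returns a non-None value or `timeout` elapses."""
    # get_running_loop() raises RuntimeError if no loop is running,
    # so it can only be used from inside a coroutine or callback.
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        result = await check()
        if result is not None:
            return result
        remaining = deadline - loop.time()
        if remaining <= 0:
            raise TimeoutError(f"no result after {timeout} seconds")
        # Bound the sleep by the remaining time so we never overshoot the deadline.
        await asyncio.sleep(min(interval, remaining))
```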
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Python: feat(foundry-evals): hosted-agent-aware rubric generation
    
    * Auto-detect hosted Foundry agents in agent_as_eval_source: when the
      agent's chat_client exposes a string agent_name (the convention used
      by RawFoundryAgentChatClient for PromptAgents/HostedAgents), emit a
      type='agent' EvalGenerationSource so the service fetches instructions
      and tools from the agent registry instead of relying on the local
      wrapper (which holds neither for hosted agents).
    * Add hosted_agent_version kwarg and a new agent_version field on
      EvalGenerationSource so PromptAgent runs can pin to a specific hosted
      version for reproducible rubric generation.
    * Add force_prompt_source escape hatch to bypass auto-detection and
      always emit a rendered prompt dossier - useful when the local wrapper
      carries overrides the service-side agent doesn't see.
    * Fix _to_sdk_source for dataset sources: SDK ctor takes name=/version=,
      not dataset_name=/dataset_version=. The mismatch would raise TypeError
      against the real azure-ai-projects 2.3.0a* SDK; only unmocked
      integration paths were affected.
    
    Tests cover: auto-detection happy path, versionless hosted agent,
    explicit hosted_agent_version forwarding, force_prompt_source override,
    non-string chat_client attrs (MagicMock test doubles) not mis-detected,
    agent_version forwarded through _to_sdk_source, and the corrected
    dataset SDK kwarg names.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * fix(foundry-evals): accept canonical dimension_scores key per docs
    
    The published Foundry rubric-evaluator output (Microsoft Learn 'Rubric evaluators' reference) places per-dimension breakdowns under properties.dimension_scores, not properties.rubric_scores. The parser now tries dimension_scores first and falls back to rubric_scores for preview-build compatibility, and tolerates non-list payloads (e.g. MagicMock auto-attrs) by trying the next candidate when parsing yields zero entries.
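
The key fallback can be sketched as below. `extract_dimension_entries` is a hypothetical helper, and treating each entry as a dict is an assumption of this sketch. Only the two key names and their order come from the commit message.

```python
from typing import Any, Mapping

# Canonical key first, legacy preview-build key second.
CANDIDATE_KEYS = ("dimension_scores", "rubric_scores")


def extract_dimension_entries(properties: Mapping[str, Any]) -> list:
    """Return per-dimension entries from the first key that yields any."""
    for key in CANDIDATE_KEYS:
        payload = properties.get(key)
        if not isinstance(payload, list):
            continue  # missing or non-list payload: try the next candidate
        entries = [item for item in payload if isinstance(item, dict)]
        if entries:
            return entries
    return []
```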
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * feat(foundry-evals): add manual create_rubric_evaluator
    
    Adds FoundryEvals.create_rubric_evaluator as the agent-framework surface over project_client.beta.evaluators.create_version. This is the manual counterpart to generate_rubric: callers supply RubricDimension instances (authored locally, ported from another framework, or hand-tuned) and we POST a RubricBasedEvaluatorDefinition. The service auto-attaches the non-editable residual dimension (general_quality for quality, general_policy_compliance for safety).
    
    Per the Microsoft Learn 'Rubric evaluators' reference, the auto-generation path (create_generation_job) is primarily a portal/UI feature; external SDK clients with rich local agent context are better served by manual create_version. This keeps generate_rubric for users who want to round-trip through a Foundry-registered agent.
    
    Validation up front: weight must be in [1,10], ids unique, descriptions non-empty, pass_threshold in [0,1]. The returned GeneratedEvaluatorRef is identical in shape to one obtained from generate_rubric, so downstream evaluators= lists work unchanged.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * samples(foundry-evals): manual rubric sample + namespace re-exports
    
    Adds evaluate_with_manual_rubric_sample.py demonstrating the end-to-end dev scenario for FoundryEvals.create_rubric_evaluator: hand-author a list of RubricDimension, register via create_rubric_evaluator, then use the pinned GeneratedEvaluatorRef alongside built-in evaluators in an agent regression run.
    
    Also re-exports RubricDimension, GeneratedEvaluatorRef, build_sources, and load_evals_config from agent_framework.foundry (both the lazy runtime shim and the type stub) so the rubric samples can import everything from a single namespace; the auto-generate sample was previously broken because the shim was missing build_sources / load_evals_config.
    
    Updates the foundry-evals README with a chooser entry for the two rubric paths.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * feat(foundry-evals): remove rubric creation flows; keep consumption only
    
    Reframes agent-framework as a pure consumer of Foundry rubric evaluators: scoring against rubrics that already exist (authored in the Foundry portal or via the dedicated SDK / REST surface) instead of creating them from the SDK.
    
    Removed creation surface area:
    
    - FoundryEvals.generate_rubric (auto-generate path) and create_rubric_evaluator (manual path), plus all _GenerationSdkTypes / _ManualRubricSdkTypes / _to_sdk_dimensions / _coalesce_generation_sources / _to_sdk_source / _poll_generation_job / _generation_job_to_ref / _evaluator_version_to_ref / _get_beta_evaluators / _import_*_sdk_types helpers.
    
    - EvalGenerationSource (the input source discriminator), RubricDimension (the input dimension type), agent_as_eval_source / workflow_as_eval_source / _detect_hosted_foundry_agent helpers, and the YAML-config loader (_evals_config.py with RubricGenerationSpec / RubricSourceSpec / parse_evals_config / load_evals_config / build_sources).
    
    - BaseAgent.as_eval_source / Workflow.as_eval_source plus the _render_agent_dossier / _render_workflow_dossier helpers in core. These existed only to feed the now-removed generation pipeline.
    
    - Samples evaluate_with_generated_rubric_sample.py, evaluate_with_manual_rubric_sample.py, and evaluators.yaml. Replaced with a short README section showing how to reference an existing rubric evaluator via GeneratedEvaluatorRef.
    
    Kept (consumption surface):
    
    - GeneratedEvaluatorRef, slimmed to (name, version, display_name). Still accepted alongside built-in evaluator strings in FoundryEvals(evaluators=[...]). Versionless refs still warn.
    
    - RubricScore on EvalScoreResult.dimensions plus EvalResults.assert_dimension_score_at_least for per-dimension CI gates.
    
    - _parse_dimension_entries / _extract_rubric_scores output parsing (both canonical dimension_scores and the legacy rubric_scores key).
    
    Tests: 160/160 foundry unit tests and 71/71 core local-eval tests pass; pyright is clean across changed files. The pre-existing tests/core/test_telemetry.py::test_detect_hosted_fallback_import_error failure is unrelated and reproduces on the prior commit.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * samples(foundry-evals): add evaluate_with_rubric_sample
    
    Adds a runnable end-to-end sample showing how to consume a pre-existing rubric evaluator created in Foundry: reference it with GeneratedEvaluatorRef(name, version), mix it with built-in evaluators in FoundryEvals, and gate CI with assert_dimension_score_at_least on a specific dimension.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * fix(foundry-evals): satisfy mypy on _fetch_output_items
    
    mypy infers OutputItemListResponse.sample as dict[str, object] | None while pyright correctly infers the typed Sample model. Cast to Any so both type checkers accept the attribute access pattern, rename the local to avoid shadowing the inner-loop sample binding, and drop the now-stale pyright suppressions.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * docs(foundry-evals): drop unpublished rubric-evaluators learn.microsoft.com link
    
    The Adaptive Evals authoring docs are not yet published on Microsoft Learn, so the link 404s. Keep the descriptive text without the broken hyperlink; we can re-add it once the docs ship.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * test(foundry-evals): hoist repeated local imports to module top
    
    Per code review feedback (eavanvalkenburg): the test file repeated 'from agent_framework_foundry._foundry_evals import ...' inside 22 test bodies and 'from agent_framework_foundry import GeneratedEvaluatorRef' inside 8 more. Move all of them to the existing top-level imports; the symbols are the same across tests and the local imports were redundant.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: Ben Thomas <25218250+alliscode@users.noreply.github.com>
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • Python: Fix core observability unsafe serialization of function-call arguments containing dataclass/framework objects (#6026)
    * fix: safely serialize function-call arguments in core observability
    
    Apply make_json_safe() to content.arguments in _to_otel_part() before
    building the otel message dict, so that dataclass/framework payloads
    (e.g. workflow request_info events) do not cause a TypeError when
    _capture_messages() calls json.dumps().
    
    Lift make_json_safe() into agent_framework._serialization (no new
    external deps — dataclasses/datetime only) so the core observability
    path can use it without a dependency on the ag-ui adapter.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * fix(core): safely serialize workflow request_info payloads in observability (#5733)
    
    - Add make_json_safe() helper to recursively convert non-serializable objects
    - Use make_json_safe() in _to_otel_part() for function_call arguments
    - Fix CustomPayload test class to use @dataclass (resolves B903 lint error)
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * fix(serialization): guard callability and normalize dict keys in make_json_safe (#5733)
    
    - Use callable(getattr(obj, method, None)) instead of hasattr() so that
      non-callable attributes named model_dump/to_dict/dict do not raise
      TypeError at runtime.
    - Wrap each call in try/except TypeError to handle callables with
      mandatory arguments gracefully.
    - Convert dict keys to str() so that non-string keys (e.g. datetime,
      int) cannot cause json.dumps to raise TypeError.
    - Add regression tests for both scenarios.
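
The guards listed above can be illustrated with a simplified converter. `to_json_safe` is a hypothetical sketch written for this example, not the framework's `make_json_safe()`; the set of handled types and the final `str()` fallback are assumptions.

```python
import dataclasses
from datetime import date, datetime
from typing import Any


def to_json_safe(obj: Any) -> Any:
    """Recursively convert `obj` into something json.dumps() accepts."""
    if obj is None or isinstance(obj, (str, int, float, bool)):
        return obj
    if isinstance(obj, (datetime, date)):
        return obj.isoformat()
    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
        return to_json_safe(dataclasses.asdict(obj))
    if isinstance(obj, dict):
        # Stringify keys so datetime or other non-string keys cannot break json.dumps.
        return {str(key): to_json_safe(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple, set)):
        return [to_json_safe(value) for value in obj]
    for method in ("model_dump", "to_dict", "dict"):
        fn = getattr(obj, method, None)
        if callable(fn):  # an attribute that merely has this name is skipped
            try:
                return to_json_safe(fn())
            except TypeError:
                continue  # e.g. the callable requires mandatory arguments
    return str(obj)  # last resort: a readable string
```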
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Address observability serialization review feedback
    
    ---------
    
    Co-authored-by: Copilot <copilot@github.com>
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • .NET: Update hosted agents (#6243)
    * Updating to latest Foundry hosting packages.
    
    * Re-applying .gitignore.
    
    * Adding empty line at end of .gitignore
    
    ---------
    
    Co-authored-by: Ben Thomas <25218250+alliscode@users.noreply.github.com>
  • .NET - Fix missing id on function_call_output in Foundry Hosting (#6246)
    * Fix missing id on function_call_output in Foundry Hosting
    
    The Foundry storage layer was rejecting responses with
    "ID cannot be null or empty (Parameter 'id')" because
    function_call_output items emitted by OutputConverter had no id on
    the wire.
    
    OutputItemFunctionToolCallOutput's public ctor only sets CallId and
    Output; Id is read-only and only the SDK's internal ctor populates
    it. OutputItemBuilder<T>.ApplyAutoStamps fills ResponseId and
    AgentReference but not Id, so the itemId passed to
    AddOutputItem<T>(itemId) was used only for event sequencing and the
    serialized item went out with id=null.
    
    Switch to stream.OutputItemFunctionCallOutput(callId, output), the
    SDK convenience method that uses the internal ctor and stamps the
    id. Add a regression test asserting the added/done events carry a
    non-empty matching Id.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * ci: free disk space and relocate NuGet cache on ubuntu runners
    
    The ubuntu-latest dotnet-build/test jobs were hitting No space left on device because the runner image only ships ~14 GB free on /. The full multi-TFM build plus the dotnet pack + console-app install-check exhausts that easily.
    
    Add a reusable composite action .github/actions/free-runner-disk-space that runs on Linux runners only and:
    
    * removes pre-installed toolchains we never use here (Android SDK, GHC/Haskell, CodeQL, PyPy, Ruby, Go, boost, vcpkg, etc.), prunes docker images, and disables swap (reclaims ~25-30 GB on /)
    
    * relocates the NuGet package cache to /mnt/nuget via NUGET_PACKAGES env, since /mnt has ~75 GB free on hosted runners
    
    Wire the action into the four ubuntu-touching jobs in dotnet-build-and-test.yml (dotnet-build, dotnet-test, dotnet-foundry-hosted-it, dotnet-test-functions). The action self-guards with runner.os == 'Linux' so the matrix legs that run on windows are unaffected.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: alliscode <25218250+alliscode@users.noreply.github.com>
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • Python: refresh dev dependencies and validate runtime bounds (#6238)
    Updates third-party dev dependencies across the Python workspace and
    validates that all runtime dependency bounds still hold at both ends.
    
    Dev dependency bumps (root, lab, declarative, durabletask):
    - uv 0.11.6 -> 0.11.17, ruff 0.15.8 -> 0.15.15,
      pytest-asyncio 1.3.0 -> 1.4.0, mcp 1.27.0 -> 1.27.2,
      azure-monitor-opentelemetry 1.8.7 -> 1.8.8,
      poethepoet 0.42.1 -> 0.46.0, prek 0.3.9 -> 0.4.3,
      types-python-dateutil and types-PyYaml stub bumps.
    - Transitive Dependabot items swept via lock: idna 3.11 -> 3.17,
      pip 26.0.1 -> 26.1.2.
    
    Deliberately excluded:
    - opentelemetry-sdk stays 1.40.0: azure-monitor-opentelemetry (incl.
      1.8.8) hard-pins opentelemetry-sdk==1.40.
    - mypy stays 1.20.0 and pyright stays 1.1.408: the 2.1.0 / 1.1.409
      bumps introduce new diagnostics that fail type checking and need
      dedicated PRs.
    - rich kept as a range: agentlightning (lab[lightning]) forces
      rich==13.9.4.
    
    Code/formatting changes driven by the ruff upgrade:
    - devui lifespan now uses try/finally so shutdown cleanup always runs
      (ruff RUF075).
    - Removed unused TYPE_CHECKING imports in core and foundry flagged by
      ruff 0.15.15.
    - Reapplied ruff 0.15.15 formatting to the files it changed.
    
    Validation: validate-dependency-bounds-test "*" passes (31/31 lower +
    31/31 upper); typing 62/62; lint 31/31; devui tests pass.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • Python: Add background agent support to harness agent (#6155)
    * Add background agent support to harness agent
    
    * Address PR comments
  • Python: coalesce code interpreter history chunks (#5801)
    * fix: coalesce code interpreter history chunks
    
    * fix: narrow content item list types
    
    * fix: remove redundant content list casts
  • Fix integration test worker crashes in Azure Functions on Py3.13 (#4260)
    * Initial plan
    
    * Fix integration test worker crashes on Python 3.13
    
    Three changes to prevent pytest-xdist workers from crashing during
    Azure Functions integration tests:
    
    1. Add `start_new_session=True` to subprocess on Linux so signals
       (e.g. from test-timeout) cannot propagate between the func host
       and the xdist worker process.
    
    2. Add an overall 100-second budget to the fixture setup loop so
       the retry logic never exceeds the 120-second test timeout. When
       pytest-timeout's thread method fires during fixture setup and the
       thread doesn't respond, it calls os._exit() which kills the
       xdist worker – this is the root cause of the "Not properly
       terminated" crashes.
    
    3. Remove the `UV_PYTHON: "3.10"` workaround from both workflow
       files so integration tests actually run on Python 3.13.
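    
    Change 1 can be sketched roughly as follows. The helper names
    (`session_kwargs`, `launch`) are hypothetical and the real fixture may
    differ; only `subprocess.Popen(..., start_new_session=True)` is the
    standard-library call the message refers to.

```python
import subprocess
import sys


def session_kwargs(platform=sys.platform):
    """Extra Popen kwargs: put the child in its own session on Linux only."""
    if platform.startswith("linux"):
        # The child becomes a session leader, so signals aimed at the test
        # runner's process group are not delivered to it (and vice versa).
        return {"start_new_session": True}
    return {}


def launch(cmd, platform=sys.platform):
    """Start `cmd` (for example the func host) with session isolation where supported."""
    return subprocess.Popen(cmd, **session_kwargs(platform))
```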
    
    Co-authored-by: larohra <41490930+larohra@users.noreply.github.com>
    
    * Validate integration tests on Python 3.13
    
    Co-authored-by: larohra <41490930+larohra@users.noreply.github.com>
    
    * Revert unintentional uv.lock dependency bumps
    
    Co-authored-by: larohra <41490930+larohra@users.noreply.github.com>
    
    * Use time.monotonic() instead of time.time() for fixture budget timing
    
    Addresses review feedback: monotonic clock is immune to NTP/clock
    adjustments that could skew the budget enforcement.
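    
    A hedged sketch of a budgeted readiness loop driven by a monotonic
    clock. `wait_until_ready`, its parameters, and the injectable
    `clock`/`sleep` hooks are assumptions for illustration, not the
    fixture's actual code.

```python
import time
from typing import Callable


def wait_until_ready(
    probe: Callable[[], bool],
    budget_s: float = 100.0,
    interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `probe` until it succeeds or the overall budget is spent."""
    # time.monotonic() never goes backwards, so wall-clock adjustments
    # cannot lengthen or shorten the budget.
    deadline = clock() + budget_s
    while True:
        if probe():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(interval_s, remaining))
```

    Keeping the budget (100 s) below the test timeout (120 s) lets the
    fixture fail on its own terms before pytest-timeout has to intervene.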
    
    Co-authored-by: larohra <41490930+larohra@users.noreply.github.com>
    
    * Fix func worker segfault on Python 3.13 by redirecting worker to Python 3.12
    
    The Azure Functions Python worker crashes with SIGSEGV (exit code 139)
    on Python 3.13 due to protobuf C extension (google._upb) compatibility
    issues.  When the test runner uses Python >=3.13, the conftest now
    automatically finds a compatible Python 3.10-3.12 and sets
    languageWorkers__python__defaultExecutablePath so the func host uses
    it for the worker process.
    
    The CI setup action also ensures Python 3.12 is available on the
    runner, falling back to uv python install if the system doesn't have
    it.
    
    Co-authored-by: larohra <41490930+larohra@users.noreply.github.com>
    
    * Address code review: add path validation, clarify version range and config key format
    
    Co-authored-by: larohra <41490930+larohra@users.noreply.github.com>
    
    * Run func worker natively on Python 3.13 by disabling dependency isolation
    
    Replace the Python 3.12 redirect workaround with the proper fix:
    set PYTHON_ISOLATE_WORKER_DEPENDENCIES=0 on Python >=3.13.
    
    The segfault (exit code 139) is caused by the Azure Functions worker's
    module isolation mechanism conflicting with protobuf's C extensions
    (google._upb) on Python 3.13.  Disabling isolation lets the worker
    load dependencies from the app's own environment, which avoids the
    crash while keeping everything running on Python 3.13.
    
    See: https://github.com/Azure/azure-functions-python-worker/issues/1797
    
    Co-authored-by: larohra <41490930+larohra@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
    Co-authored-by: larohra <41490930+larohra@users.noreply.github.com>
    Co-authored-by: Laveesh Rohra <larohra@microsoft.com>
  • Add community PR limit workflow (#6229)
    * Add community PR limit workflow
    
    * Address PR limit workflow review feedback
  • Python: Reorganize A2A samples and use package A2AExecutor (#6165)
    * Reorganize A2A samples: client demos in 02-agents, use package A2AExecutor
    
    - Move client samples (agent_with_a2a, a2a_agent_as_function_tools) to samples/02-agents/a2a/
    - Add new concept samples: polling, stream reconnection, protocol selection
    - Replace sample agent_executor.py with package-level A2AExecutor (stream=True)
    - Update 04-hosting/a2a to focus on server-side, point to 02-agents for clients
    - Add README.md for the new 02-agents/a2a/ sample collection
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Fix streaming artifact coalescing and address PR review feedback
    
    A2AExecutor fix:
    - Generate a stable artifact_id per stream in _run_stream so all streaming
      chunks share the same ID, enabling proper append=True coalescing per the
      A2A spec (TaskArtifactUpdateEvent with same artifactId).
    - Previously, item.message_id was None for OpenAI/Foundry streaming updates,
      causing the SDK to generate a new random UUID per token (100+ separate
      artifacts instead of 1 appended artifact).
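    
    The effect of a stable per-stream ID can be illustrated with a small
    sketch. `ArtifactChunk`, `stream_chunks`, and `coalesce` are
    hypothetical names, not A2A SDK types, and treating the first chunk as
    `append=False` is an assumption made for this example.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator


@dataclass
class ArtifactChunk:
    artifact_id: str
    text: str
    append: bool


def stream_chunks(tokens: Iterable[str]) -> Iterator[ArtifactChunk]:
    """Emit one chunk per token, all sharing a single artifact ID."""
    artifact_id = str(uuid.uuid4())  # generated once per stream, not per token
    for index, token in enumerate(tokens):
        yield ArtifactChunk(artifact_id, token, append=index > 0)


def coalesce(chunks: Iterable[ArtifactChunk]) -> Dict[str, str]:
    """Merge chunks by artifact ID, the way a consumer of appended updates would."""
    merged: Dict[str, str] = {}
    for chunk in chunks:
        if chunk.append:
            merged[chunk.artifact_id] = merged.get(chunk.artifact_id, "") + chunk.text
        else:
            merged[chunk.artifact_id] = chunk.text
    return merged
```

    If each chunk instead carried a fresh random ID, `coalesce` would
    return one entry per token, which is the symptom described above.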
    
    Sample improvements:
    - Replace join workaround with response.text now that coalescing works
    - Add background=True to stream reconnection resume call (required for
      continuation token emission on in-progress tasks)
    - Fix type ignore specificity in polling sample
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • .NET: Preserve and propagate CreatedAt through workflows (#3930)
    * Preserve per-message CreatedAt attribute if it's available
    
    * Add unit test
    
    ---------
    
    Co-authored-by: Sam Chang <changsam@microsoft.com>
    Co-authored-by: samchang-msft <samchang.msft@gmail.com>
  • .NET: Forward Magentic participant replies to manager (#6156)
    MagenticOrchestrator.TakeTurnAsync dropped the `messages` parameter
    on subsequent turns, so participant replies never reached the manager's
    ChatHistory. The manager kept re-dispatching the same speaker every
    round until MaxRounds.
    
    Append the incoming messages to taskContext.ChatHistory before running
    the coordination round (matches Python's _handle_response).
    
    Adds RecordingReplayAgent + regression test that asserts the worker's
    reply reaches round-2's progress-ledger call.
    
    Co-authored-by: Jacob Alber <jaalber@microsoft.com>
  • Bump Azure.AI.AgentServer.* packages and align Azure.Core/System.ClientModel (#6178)
    * Bump Azure.AI.AgentServer.* package versions
    
    * Align Azure.Core/System.ClientModel to AgentServer transitive deps
    
    Bump Azure.Core 1.55->1.56 and System.ClientModel 1.11->1.12 to match Azure.AI.AgentServer.* requirements, and add explicit references in transitive-pinning-off Foundry consumers to avoid CS1705/MSB3277 version conflicts.
  • .NET: Fix InvokeMcpTool approval path for declarative workflows (#6177)
    * Fix InvokeMcpTool approval path for declarative workflows
    
    * Added more tests for coverage.
  • .NET: Quarantine flaky DevUI test (#6159)
    * Bump Microsoft.Extensions.AI packages to 10.6.0
    
    * Align transitive package versions for Microsoft.Extensions.AI 10.6.0
    
    * Initial plan
    
    * Temporarily skip flaky DevUI keyed/default workflow test
    
    * Revert Microsoft.Extensions.AI package bumps, keep only flaky test quarantine
    
    ---------
    
    Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
    Co-authored-by: Roger Barreto <19890735+rogerbarreto@users.noreply.github.com>
  • Python: [A2A] Set message_id on AgentResponseUpdate for message-bearing paths (#6163)
    Map A2A protocol message_id to AgentResponseUpdate.message_id in two paths
    where it was previously omitted, aligning with .NET behavior:
    
    1. Standalone A2AMessage: set message_id=msg.message_id (matches .NET
       ConvertToAgentResponseUpdate(Message) which sets both ResponseId and
       MessageId to message.MessageId)
    
    2. TaskStatusUpdateEvent (terminal/input_required): set
       message_id=message.message_id (matches .NET which sets
       MessageId=statusUpdateEvent.Status.Message?.MessageId)
    
    Fixes #5949
    
    Co-authored-by: Copilot <copilot@github.com>
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • Python: consolidate MCP reliability fixes (#6145)
    * Python: consolidate MCP reliability fixes
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Fix MCP cleanup and metadata typing
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Satisfy MCP metadata mypy typing
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Fix Pyright metadata mapping type
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • Python: Add Mistral AI embedding client package (#5480)
    * Python: Add Mistral AI embedding client package
    
    Signed-off-by: Daria Korenieva <daric2612@gmail.com>
    
    * Address review feedback: fix dimensions check, sort embeddings by index, align docs
    
    Signed-off-by: Daria Korenieva <daric2612@gmail.com>
    
    * Address review feedback: downgrade to alpha, remove integration tests
    
    - Change version to 1.0.0a260505 (alpha)
    - Update classifier to Development Status :: 3 - Alpha
    - Update PACKAGE_STATUS.md to alpha
    - Remove Mistral from integration test workflows (no API keys yet)
    
    Signed-off-by: Daria Korenieva <daric2612@gmail.com>
    
    * Add samples directory for alpha package compliance
    
    Per python-package-management skill: alpha packages must include samples inside the package directory.
    
    Signed-off-by: Daria Korenieva <daric2612@gmail.com>
    
    * Fix ruff formatting in sample file
    
    Signed-off-by: Daria Korenieva <daric2612@gmail.com>
    
    ---------
    
    Signed-off-by: Daria Korenieva <daric2612@gmail.com>
  • .NET: Workflow Outputs Overhaul: Support Tagging, Filtering Agent Outputs (#6045)
    * test: reshuffle .NET Workflow tests in preparation for Outputs overhaul
    
    Phase 1 of the .NET Workflows outputs overhaul (see
    working/implementation-plan.md). Pure moves/renames in
    dotnet/tests/Microsoft.Agents.AI.Workflows.UnitTests; no production code
    changes, no new test cases. The split keeps each orchestration mode in
    its own source file so the upcoming tag-aware and orchestration-default
    test additions land on clean diffs.
    
    Renames:
    * WorkflowBuilderSmokeTests.cs -> WorkflowBuilderTests.cs (with class
      rename to match). The scope is no longer "smoke"-only once subsequent
      phases add tag-aware builder tests.
    * InputWaiterAndOutputFilterTests.cs -> InputWaiterTests.cs +
      OutputFilterTests.cs. The file already declared the two test classes
      separately; this split simply gives each its own file so the
      output-filter cases have a dedicated home for tag-aware additions.
    
    Split of AgentWorkflowBuilderTests.cs:
    * AgentWorkflowBuilderTests.cs is now the outer
      `public static partial class AgentWorkflowBuilderTests` holding the
      shared test helpers (DoubleEchoAgent + session + WithBarrier variant,
      WorkflowRunResult, RunWorkflow* methods) bumped from `private` to
      `internal` so the new top-level GroupChatWorkflowBuilderTests in the
      same assembly can reach them.
    * AgentWorkflowBuilder.SequentialTests.cs (nested SequentialTests):
      BuildSequential_InvalidArguments_Throws,
      BuildSequential_AgentsRunInOrderAsync.
    * AgentWorkflowBuilder.ConcurrentTests.cs (nested ConcurrentTests):
      BuildConcurrent_InvalidArguments_Throws,
      BuildConcurrent_AgentsRunInParallelAsync.
    
    Sequential and Concurrent are kept as nested classes because they're
    modes of the same `AgentWorkflowBuilder` static factory and do not
    produce dedicated builder types.
    
    New file:
    * GroupChatWorkflowBuilderTests.cs (top-level): the existing
      BuildGroupChat_* and GroupChatManager_* cases moved out of the old
      AgentWorkflowBuilderTests file. They exercise the
      `GroupChatWorkflowBuilder` type (returned by
      `AgentWorkflowBuilder.CreateGroupChatBuilderWith`), so a dedicated
      top-level test class - matching the convention reserved by the plan
      for HandoffWorkflowBuilderTests / MagenticWorkflowBuilderTests - is
      the right home. Cross-class helper references qualify with
      `AgentWorkflowBuilderTests.DoubleEchoAgent` and
      `AgentWorkflowBuilderTests.RunWorkflowAsync`.
    
    The outer partial class is `static` (and nested classes carry the
    instance test methods) because the outer holds only static helpers;
    this satisfies CA1052 without suppressions and is invisible to xUnit
    discovery, which finds tests on the nested classes as
    `AgentWorkflowBuilderTests.SequentialTests.*` etc.
    
    Validation: `dotnet build` clean on both target frameworks; all 547
    tests in Microsoft.Agents.AI.Workflows.UnitTests pass on net10.0.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * feat: introduce OutputTag, Futures, and tag-aware WorkflowBuilder API
    
    Phase 2 of the .NET Workflows outputs overhaul. Additive code change
    only - no observable runtime behavior change. The runner still uses the
    legacy bypass for AgentResponse / AgentResponseUpdate payloads, and the
    new `Futures.EnableAgentResponseOutputTaggingAndFiltering` flag defaults
    to false. Phase 3 will wire the flag into the runner; this commit only
    introduces the types and the builder API.
    
    New public surface:
    * `OutputTag` (readonly struct): wraps a string Value with ordinal
      equality (IEquatable, GetHashCode, == / !=) so it can participate as a
      HashSet element. Internal ctor closes the set. One public singleton:
      `OutputTag.Intermediate`. Terminal / regular outputs carry no tag
      (empty Tags set). JSON-serialized as a bare string via
      [JsonConverter(typeof(OutputTagJsonConverter))], with the converter
      rehydrating to the well-known singleton on read.
    * `Futures` (static class): hosts opt-in pre-GA behavior switches.
      First flag is `EnableAgentResponseOutputTaggingAndFiltering`; XML doc
      captures the v2.0.0 obsoletion / v3.0.0 removal lifecycle.
    * `WorkflowOutputEvent.Tags`: `HashSet<OutputTag>` exposed directly
      (concrete collection, matches the JSON-serialization convention used
      for `WorkflowInfo.OutputExecutorIds`). Never null; empty for legacy /
      terminal events. New ctors take a single `OutputTag` or
      `IEnumerable<OutputTag>?`; the existing (data, executorId) ctor
      remains and produces an untagged event. `HasTag(OutputTag)` helper.
      `AgentResponseEvent` and `AgentResponseUpdateEvent` gain matching
      tag-accepting ctors forwarding to the base.
    * `WorkflowOutputEventExtensions.IsIntermediate(this WorkflowOutputEvent)`:
      extension method returning `evt.HasTag(OutputTag.Intermediate)`. The
      preferred way to ask "is this an intermediate output?" without
      reaching into the Tags set.
    * `WorkflowBuilder.WithOutputFrom(IEnumerable<ExecutorBinding>, OutputTag)`
      and `WorkflowBuilder.WithOutputFrom(ExecutorBinding, OutputTag)`:
      forward-looking tagged overloads. The IEnumerable form is the primary
      tagged surface; the single-executor form is a convenience for the
      common one-executor case. Currently usable for the
      `OutputTag.Intermediate` singleton; will become the primary surface
      once the `OutputTag` constructor is opened to user-defined tags in
      a future release. Callers in this release should prefer the
      intent-specific `WithIntermediateOutputFrom` extension for the
      intermediate case. Tags accumulate across repeated calls; same tag
      repeated dedupes via the HashSet.
    * `WorkflowBuilderExtensions.WithIntermediateOutputFrom(this WorkflowBuilder, IEnumerable<ExecutorBinding>)`:
      helper that forwards to `WithOutputFrom(executors, OutputTag.Intermediate)`.
      Takes an IEnumerable (matching the tagged WithOutputFrom shape) -
      callers pass collection literals: `builder.WithIntermediateOutputFrom([a, b])`.
      XML doc remarks call out the Futures-flag interaction and the
      AIAgent-payload forwarding contract.
    
    Internal shape changes:
    * `WorkflowBuilder._outputExecutors`: HashSet<string> -> Dictionary<
      string, HashSet<OutputTag>>. The value set is empty for executors
      designated only via the untagged WithOutputFrom; contains Intermediate
      (and possibly future tags) otherwise.
    * `Workflow.OutputExecutors`: HashSet<string> -> Dictionary<string,
      HashSet<OutputTag>>.
    * `OutputFilter.CanOutput`: `Contains(id)` -> `ContainsKey(id)`.
    * `WorkflowInfo.OutputExecutorIds`: HashSet<string> -> Dictionary<
      string, HashSet<OutputTag>>, with a custom JsonConverter that reads
      both the new map shape (`{id: ["intermediate", ...]}`) and the legacy
      array shape (`[id1, id2]`, where each id is treated as an untagged
      output). Always writes the map shape. IsMatch updated to compare
      per-id tag sets.
    
    Tests landing in this commit (per the test-with-feature principle):
    * `OutputTagTests.cs` (6 tests): KnownValues, EqualityIsOrdinalOnValue,
      DefaultStructValueIsDistinct (default(OutputTag) does not collide
      with the Intermediate singleton in a HashSet),
      GetHashCodeMatchesEquals, JsonConverter_RoundtripsValueAsString,
      ConstructorIsInternal (reflection-based assertion that the (string)
      ctor is `internal`).
    * `WorkflowBuilderTests.cs` adds 7 new tests pinning the builder
      API contract: RegistersWithEmptyTagSet, AddsIntermediateTag,
      MultipleExecutorsAllUntagged, ThenIntermediate_AccumulatesTags,
      RepeatedDedupes, OnlyRegistersWithoutPriorWithOutputFrom,
      TracksExecutorBinding.
    * `BackwardsCompatibility/JsonCheckpointSerializationTests.cs`
      (new folder + file, 5 tests): event-level ctor contract tests
      (single-tag, no-tag, multi-tag — the last with a custom tag);
      IsIntermediate() asserted; load-bearing JSON BC tests for
      `WorkflowInfo.OutputExecutorIds` -
      `WorkflowOutputExecutorsReadsLegacyArrayShape` (legacy ids map to
      empty tag sets) and `WorkflowOutputExecutorsWritesMapShape`.
    
    The plan's three JSON round-trip tests for `WorkflowOutputEvent.Tags`
    were dropped: `WorkflowEvent` is not currently a serialized checkpoint
    shape (see the comment in WorkflowsJsonUtilities.cs about events not
    being persisted), so there is no real back-compat surface to pin
    through JSON. They are substituted with in-process ctor/property
    round-trip tests that exercise the `Tags` / `HasTag` / `IsIntermediate`
    contract.
    
    Validation: full `Microsoft.Agents.AI.Workflows.UnitTests` suite runs
    green on net10.0 (565 passing, 0 failing). Core library builds clean
    on net472, netstandard2.0, net8.0, net9.0, and net10.0. Test project
    builds clean on net472 + net10.0.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * feat: route AgentResponse(Update) through the output filter under a Futures flag
    
    `InProcessRunnerContext.YieldOutputAsync` historically special-cased AgentResponse and
    AgentResponseUpdate payloads: it built the typed event subclass and emitted it directly,
    bypassing the output filter. Rewrites the method so that:
    
    - When `Futures.EnableAgentResponseOutputTaggingAndFiltering` is `false` (the current
      default), AgentResponse(Update) keep the legacy bypass — emitted as
      AgentResponseEvent / AgentResponseUpdateEvent with no tags. Existing callers see no
      behavior change.
    - When the flag is `true`, AIAgent payloads flow through the output filter just like
      every other payload type: undesignated sources are dropped, and the emitted event
      carries the source's tag set (empty for terminal `WithOutputFrom`, `{Intermediate}`
      for `WithIntermediateOutputFrom`, the set union when both designations apply).
    
    Non-AIAgent (POCO) outputs also now carry the source's tag set on the emitted
    WorkflowOutputEvent unconditionally — additive, since no existing assertion inspected
    Tags. Subclass events (`AgentResponseEvent` / `AgentResponseUpdateEvent`) continue to
    be emitted under both modes so `switch (evt) { case AgentResponseEvent: ... }`
    consumer code keeps matching.
    
    Adds `OutputFilter.TryGetTags` as the tag-aware lookup used by the runner.
    `OutputFilter.CanOutput` is kept (still used by the existing sync tests in
    `OutputFilterTests.cs`).
    
    Tests
    -----
    - `Futures/Futures.AgentResponseOutputFilteringAndTaggingTests.cs` (new): the F1–F13
      matrix from the plan, covering every combination of `(flag on/off) × (designation)
      × (payload shape)`. Uses a `FuturesScope` IDisposable + a `FuturesSerial` xUnit
      collection (DisableParallelization = true) to keep the process-global flag from
      leaking across parallel tests.
    - `OutputFilterTests.cs`: four new `Test_OutputFilter_…` cases for the `TryGetTags`
      surface (empty-tag-set for terminal designation, `{Intermediate}` for intermediate
      designation, union for accumulated designation, `false` for unregistered).
    
    582/582 unit tests pass on net10.0 (565 baseline + 17 new).
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * feat: tag-aware defaults and designation API on orchestration builders
    
    Aligns the .NET orchestration builders with Python's output / intermediate-output
    distinction. Each builder either applies a Python-aligned default designation set or
    replays the user's explicit `WithOutputFrom` / `WithIntermediateOutputFrom` calls,
    never both.
    
    Static `AgentWorkflowBuilder.BuildSequential` / `BuildConcurrent` apply defaults
    unconditionally (no user-facing fluent surface to take control through):
    
    - Sequential: terminal `end` + every agent designated intermediate.
    - Concurrent: terminal `end` + every agent and per-agent accumulator designated
      intermediate.
    
    The three fluent instance builders memoize agent-typed designation calls in a
    `Dictionary<AIAgent, HashSet<OutputTag>>` (empty set = terminal-only, non-empty =
    intermediate tag(s)) so repeated calls dedupe naturally. They replay the entries
    at `Build()` time, suppressing defaults when any call has been made:
    
    - `HandoffWorkflowBuilder` / `HandoffWorkflowBuilderCore<TBuilder>` (also picked up
      by the obsolete `HandoffsWorkflowBuilder` via inheritance).
      Default: terminal `HandoffEnd` + every handoff agent intermediate.
      (Bug fix: legacy code relied on `WithOutputFrom(end)` to bind `HandoffEnd`. The
      new explicit-designation path bypasses that, so `Build()` now calls
      `BindExecutor(end)` unconditionally to keep validation happy.)
    - `GroupChatWorkflowBuilder` — default: terminal host + every participant intermediate.
    - `MagenticWorkflowBuilder` — default: terminal orchestrator + every team member
      intermediate.
    
    Designating a non-participant agent throws `InvalidOperationException`.
    
    The bare `WorkflowBuilder` default is unchanged — only the orchestration-style
    builders gain implicit defaults, matching the plan's non-goal.
    
    Tests
    -----
    - `AgentWorkflowBuilder.SequentialTests` / `.ConcurrentTests`: one default-spec
      assertion each.
    - `GroupChatWorkflowBuilderTests`: defaults-match-spec, explicit-replaces-defaults,
      non-participant throws.
    - `HandoffWorkflowBuilderTests` (new file): same three.
    - `MagenticWorkflowBuilderTests` (new file): same three.
    
    593/593 unit tests pass on net10.0 (582 baseline + 11 new).
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * feat: WorkflowHostAgent forwards AgentResponseEvent unconditionally under Futures-on
    
    Aligns the .NET Workflow-as-Agent surface with Python `as_agent`. Under
    `Futures.EnableAgentResponseOutputTaggingAndFiltering = true`,
    `WorkflowSession.InvokeStageAsync` now forwards `AgentResponseEvent`
    unconditionally — joining `AgentResponseUpdateEvent` in ignoring the host's
    `includeWorkflowOutputsInResponse` switch. That switch keeps governing the
    generic `WorkflowOutputEvent` path for non-AIAgent payloads, where it is
    further short-circuited by an `IsIntermediate()` check (tagged intermediate
    outputs always surface).
    
    Under Futures-off the legacy asymmetry is preserved: `AgentResponseUpdateEvent`
    always forwarded, `AgentResponseEvent` gated by `includeWorkflowOutputsInResponse`.
    
    Back-compat: with `Futures.EnableAgentResponseOutputTaggingAndFiltering` left at
    its default `false`, observable behavior is identical to before.
    
    `Futures` documentation gains a remark explaining the `Workflow.AsAIAgent()`
    interaction in both flag states.
    
    Runner fix
    ----------
    `InProcessRunnerContext.YieldOutputAsync` now skips `Executor.CanOutput` for
    AgentResponse-shaped payloads under both Futures branches. `AIAgentHostExecutor`
    doesn't declare AgentResponse(Update) in its `Yields` set, so the historical
    legacy bypass had silently skipped the check; Phase 3's Futures-on path was
    running it and would reject AIAgent payloads. AIAgent-shaped payloads are now
    always a valid output shape, matching the legacy bypass semantics.
    
    Phase 4 follow-on
    -----------------
    Switched the three orchestration-builder designation-replay loops to iterate
    `Dictionary.Keys` with a value lookup instead of constructing/destructuring
    `KeyValuePair<,>`. Cleaner shape and avoids the netstandard2.0 / net472
    `KeyValuePair<,>.Deconstruct` unavailability that surfaced when this branch
    multi-TFM-built.
    
    Tests
    -----
    `WorkflowHostSmokeTests.IntermediateForwarding` (new nested class, 6 tests):
    - intermediate AgentResponse forwarded past the include-outputs gate (Futures on)
    - terminal AgentResponse forwarded unconditionally (Futures on)
    - terminal AgentResponse gated by include flag (Futures off, legacy)
    - undesignated AIAgent executor emits no AgentResponseEvent under Futures-on
    - legacy bypass still emits AgentResponseEvent under Futures-off
    - intermediate tag is observable via `update.RawRepresentation`
    
    The class joins the `FuturesSerial` xUnit collection so the process-global flag
    is serialized against other Futures-toggling tests.
    
    599/599 unit tests pass on net10.0 (593 baseline + 6 new).
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * feat: SequentialWorkflowBuilder and ConcurrentWorkflowBuilder, OrchestrationBuilderBase
    
    Promotes the Sequential and Concurrent orchestration shapes to first-class fluent
    builder classes, matching Handoff / GroupChat / Magentic. Users can call
    `WithOutputFrom(agents)` / `WithIntermediateOutputFrom(agents)` to control which
    agents are designated output / intermediate sources; when no designation call is
    made, the Python-aligned defaults apply (terminal aggregator output + every agent
    intermediate; Concurrent also tags per-agent accumulators).
    
    `AgentWorkflowBuilder.BuildSequential(...)` and `BuildConcurrent(...)` are kept
    and now delegate to the new builders; observable behavior unchanged. Five static
    factories now mirror each other:
    
    - `AgentWorkflowBuilder.CreateSequentialBuilderWith(params IEnumerable<AIAgent>)`
    - `AgentWorkflowBuilder.CreateConcurrentBuilderWith(params IEnumerable<AIAgent>)`
    - `AgentWorkflowBuilder.CreateHandoffBuilderWith(AIAgent)`        (already existed)
    - `AgentWorkflowBuilder.CreateGroupChatBuilderWith(Func<...>)`    (already existed)
    - `AgentWorkflowBuilder.CreateMagenticBuilderWith(AIAgent)`       (new)
    
    OrchestrationBuilderBase
    ------------------------
    New abstract `OrchestrationBuilderBase<TBuilder>` unifies the shared fluent
    surface across all five orchestration builders: `WithName`, `WithDescription`,
    `WithOutputFrom`, `WithIntermediateOutputFrom`, and the
    `ApplyOutputDesignations(builder, agentMap, kind, applyDefaults)` helper that
    either replays the user's designations or invokes the orchestration-specific
    defaults.
    
    Removes ~150 LOC of duplicated designation-management code from the four
    non-Handoff builders, plus the equivalent from `HandoffWorkflowBuilderCore`.
    
    Tests
    -----
    - New `SequentialWorkflowBuilderTests.cs` / `ConcurrentWorkflowBuilderTests.cs`
      (replace the old `AgentWorkflowBuilder.{Sequential,Concurrent}Tests.cs`
      nested-class files). Method names normalized to
      `Test_<BuilderType>_<Scenario>[Async]`.
    - Shared helpers (`DoubleEchoAgent`, `DoubleEchoAgentWithBarrier`,
      `WorkflowRunResult`, `RunWorkflow*`) moved from the old
      `AgentWorkflowBuilderTests` partial class into a new
      `OrchestrationTestHelpers` static class in `OrchestrationTestHelpers.cs`.
      Downstream test files (Group Chat, Handoff, Sequential, Concurrent) updated
      to qualify with `OrchestrationTestHelpers.*`.
    - A new `AgentWorkflowBuilderTests.cs` covers the static surface directly:
      `BuildSequential` / `BuildConcurrent` invariants and aggregator wiring, plus
      null-rejection + round-trip checks for every `Create*BuilderWith` factory.
    - New AsAgent intermediate-suppression tests on a nested `AsAgentForwarding`
      class for each of Sequential and Concurrent: build with only the terminal
      agent designated via `WithOutputFrom`, run via `AsAIAgent(...)`, assert via
      `AgentResponseUpdate.AuthorName` that intermediate agents do not surface.
      Both join the `FuturesSerial` collection.
    - New `Test_<Builder>_WithDescriptionPropagatesToWorkflow` smoke tests on
      Sequential and Concurrent (newly available via the base class).
    
    625/625 unit tests pass on net10.0.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * chore: dotnet format
    
    * fixup: encoding
    
    * fixup: charset
    
    * fixup: Updates for PR feedback
    
    * fixup: format
    
    * fixup: merge issue
    
    * Fix intermediate filtering on .AsAgent()
    
    * fix filter logic
    
    * fix: Revert logic change and add comments
    
    ---------
    
    Co-authored-by: Jacob Alber <jalber@lokitoth.com>
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • Python: Adding AgentFileStore and FileAccessProvider to support file access operations. (#6099)
    * Adding AgentFileStore and FileAccessProvider to support file-based operations for agents.
    
    * Address PR review feedback on FileAccessProvider
    
    - Probe symlinks on the unresolved candidate path so in-root symlinks
      cannot silently pass and out-of-root symlinks surface the correct
      error message.
    - Validate matching_lines elements in FileSearchResult.from_dict and
      raise a clean ValueError for non-mapping entries.
    - Cap search regex pattern length (256 chars) via a new
      _compile_search_regex helper to mitigate ReDoS, and surface the cap
      in the file_access_search_files tool description.
    - Skip non-UTF-8 files during filesystem search instead of aborting
      the entire directory walk.
    - Replace the module-scope trailing string in the data-processing
      sample with comments to avoid Ruff B018.
    - Remove the checked-in working/region_totals.md sample artifact so
      the save flow works from a clean checkout.
    - Expand the Windows stdout reconfiguration comment in task_runner.py
      for clarity.
    - Add tests for invalid/oversize regex, non-UTF-8 file search, and
      in-root symlink rejection.
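
The pattern-length cap described above can be sketched roughly as follows. This is an illustration only: `compile_search_regex` and `MAX_PATTERN_LENGTH` are hypothetical names standing in for the `_compile_search_regex` helper, whose real signature is not shown in this log. Only the 256-character cap and the ValueError convention come from the commit message.

```python
import re

# Cap taken from the commit message; the constant name is an assumption.
MAX_PATTERN_LENGTH = 256


def compile_search_regex(pattern: str) -> re.Pattern[str]:
    """Hypothetical stand-in for the _compile_search_regex helper."""
    # Reject oversized patterns before handing them to the regex engine.
    if len(pattern) > MAX_PATTERN_LENGTH:
        raise ValueError(
            f"Search pattern exceeds {MAX_PATTERN_LENGTH} characters."
        )
    try:
        return re.compile(pattern)
    except re.error as exc:
        # Surface invalid syntax as the same recoverable error type.
        raise ValueError(f"Invalid search pattern: {exc}") from exc
```

A length cap alone does not rule out catastrophic backtracking in short patterns, so it is only one layer of mitigation.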
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Fix mypy redundant-cast in FileSearchResult.from_dict
    
    Use cast(list[object], ...) instead of cast(list[Any], ...) so the
    cast represents a real type change (lists are invariant) and is no
    longer flagged by mypy as redundant, while still satisfying pyright's
    reportUnknownVariableType. Matches the existing pattern in _memory.py.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Tighten path normalization and directory resolution in FileAccess
    
    - _normalize_relative_path now strips surrounding whitespace up front
      so leading/trailing spaces never leak into file segments, and
      rejects trailing path separators for file paths so 'foo/' is no
      longer silently coerced to 'foo'.
    - FileSystemAgentFileStore._resolve_safe_directory_path normalizes
      with is_directory=True and maps an empty normalized result to the
      root. This matches InMemoryAgentFileStore so whitespace-only
      directory inputs resolve to the root instead of raising.
    - Added tests for whitespace stripping, trailing-separator rejection,
      and whitespace-only directory listing on the filesystem store.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Harden FileAccess search and atomic save in store API
    
    - Add wall-clock timeout (10s) around regex scans so a pathological pattern (e.g. `(a+)+`) below the length cap cannot stall the event loop.
    - Offload the InMemoryAgentFileStore regex scan to a worker thread, matching the filesystem store.
    - Fail closed when `Path.is_symlink` raises during the safe-path probe so a permission error cannot silently bypass the symlink/reparse-point rejection.
    - Add `overwrite: bool = True` to `AgentFileStore.write_file`; the in-memory store performs the check under the existing lock and the filesystem store uses `open(mode='x')` so concurrent callers cannot race past `overwrite=False`.
    - `file_access_save_file` now relies on the atomic store call instead of a separate `file_exists` round-trip.
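
A minimal sketch of the `open(mode='x')` technique mentioned above, assuming a free function rather than the real `AgentFileStore.write_file` method (whose full signature and error type are not shown here). Exclusive-create mode makes the existence check and the file creation a single operation, so no separate `file_exists` call is needed.

```python
def write_file(path: str, content: str, overwrite: bool = True) -> None:
    """Hypothetical free-function version of an overwrite-aware write."""
    # "x" fails with FileExistsError if the file already exists, so two
    # concurrent callers cannot both succeed when overwrite is False.
    mode = "w" if overwrite else "x"
    with open(path, mode, encoding="utf-8") as handle:
        handle.write(content)
```

`FileExistsError` is a subclass of `OSError`, so a caller that already handles `OSError` would see the conflict without extra code.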
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Fix Python 3.10 timeout handling and add directory arg to list/search tools
    
    - Catch asyncio.TimeoutError in _run_search_with_timeout. In Python 3.10
      asyncio.wait_for raises asyncio.exceptions.TimeoutError, which is
      distinct from the builtin TimeoutError (the two were unified in 3.11).
      Catching the asyncio alias works on every supported version.
    - Add an optional directory parameter to file_access_list_files and
      file_access_search_files so agents can enumerate / scope searches to
      nested folders, not just the store root.
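
The timeout fix can be illustrated with a small sketch. `run_with_timeout` is a hypothetical helper, not the `_run_search_with_timeout` function itself, and returning `None` on timeout is an assumption made for brevity.

```python
import asyncio
from typing import Any, Awaitable, Optional


async def run_with_timeout(work: Awaitable[Any], timeout: float) -> Optional[Any]:
    """Hypothetical helper: await `work`, giving up after `timeout` seconds."""
    try:
        return await asyncio.wait_for(work, timeout=timeout)
    except asyncio.TimeoutError:
        # On 3.11+ this name is an alias of the builtin TimeoutError;
        # on 3.10 it is a separate class, so catching it covers both.
        return None
```

Catching only the builtin `TimeoutError` would let the exception escape on Python 3.10, which is the failure the commit describes.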
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Address FileAccess review feedback: case, errors, signal, TOCTOU
    
    - InMemoryAgentFileStore now stores (display_name, content) so list_files
      and search_files return the original-case names callers wrote, matching
      the behaviour of FileSystemAgentFileStore on case-preserving filesystems
      and removing the silent in-memory vs. on-disk contract divergence.
    - FileSystemAgentFileStore.read_file raises ValueError instead of letting
      UnicodeDecodeError bubble for binary / non-UTF-8 input, restoring
      symmetry with search_files (which still skips) and giving the tool
      layer a recoverable type to translate.
    - Tool wrappers now catch ValueError and OSError around every operation
      and surface them as readable strings, so a '..' path-traversal attempt
      and a 'file already exists' failure are reported to the model the same
      way, where the former previously escaped as an unhandled exception.
    - _search_files_sync logs per skipped non-UTF-8 file at WARNING and an
      aggregate INFO summary so operators can distinguish 'no matches' from
      'half the corpus was unreadable'.
    - FileSystemAgentFileStore softens its docstrings to acknowledge the
      inherent probe-then-open TOCTOU window. On POSIX both read and write
      now pass O_NOFOLLOW so the kernel refuses if the leaf segment becomes
      a symlink between the probe and the open. Windows has no equivalent
      flag; the limitation is documented.
    - Tests cover: case preservation on list/search, ValueError on non-UTF-8
      read at the store and tool layer, tool-layer string responses for
      path-traversal and oversized-regex inputs, search-skip log output,
      symlink rejection on delete/search/list, and symlinked intermediate
      directory rejection.
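
The O_NOFOLLOW behaviour from the TOCTOU bullet above might look like the sketch below. `read_text_no_follow` is a hypothetical name, and the real store's open path is not shown in this log. The `getattr` fallback reflects the documented Windows limitation: where the flag does not exist, the open proceeds without it.

```python
import os


def read_text_no_follow(path: str) -> str:
    """Hypothetical reader that refuses a symlinked leaf segment on POSIX."""
    # O_NOFOLLOW is POSIX-only; fall back to 0 where it is unavailable.
    flags = os.O_RDONLY | getattr(os, "O_NOFOLLOW", 0)
    # If the final path component is a symlink, os.open raises OSError.
    fd = os.open(path, flags)
    with os.fdopen(fd, "r", encoding="utf-8") as handle:
        return handle.read()
```

The flag only guards the final path component, which is consistent with the separate test for symlinked intermediate directories listed above.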
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Address FileAccess nit comments: docstrings, enumerate, opt-in delete approval
    
    - Expand FileSearchMatch/FileSearchResult.to_dict docstrings to explain why
      the override is needed (__slots__ defeats the mixin's __dict__ iteration)
      and why exclude/exclude_none are accepted-but-ignored (mixin signature
      compatibility for callers like to_json).
    - Use enumerate(lines, start=1) in _search_file_content so the +1 below is
      no longer needed; rename loop variable to line_number for clarity.
    - Add opt-in require_delete_approval: bool = False on FileAccessProvider.
      When True, file_access_delete_file is registered with approval_mode
      'always_require' so the host must approve every delete. Default False
      preserves current behaviour and matches the .NET reference, but
      deployments that want a safer-by-default posture can enable it.
    - Add tests covering both delete approval modes.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * FileAccess: require delete approval by default
    
    Flip the default for FileAccessProvider(require_delete_approval=...) from
    False to True so destructive deletes are gated by host approval out of the
    box. Callers that want the previous autonomous behaviour (which matches the
    .NET reference) can pass require_delete_approval=False.
    
    Tests updated accordingly.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Fixing linkinspector by installing Chrome for puppeteer first.
    
    ---------
    
    Co-authored-by: Ben Thomas <25218250+alliscode@users.noreply.github.com>
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • Python A2A: Expose supported_protocol_bindings as configurable parameter (#6098)
    * Expose supported_protocol_bindings as configurable parameter on A2AAgent
    
    Add supported_protocol_bindings parameter to A2AAgent.__init__() allowing
    users to configure which A2A protocol bindings (JSONRPC, GRPC, HTTP+JSON)
    the client prefers when connecting to remote agents.
    
    - Defaults to ["JSONRPC"] matching current behavior
    - Passes through to ClientConfig for transport negotiation
    - Replaces 4 hardcoded references with the configurable value
    
    Closes #6057
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Fix empty list falsy trap and add fallback path test coverage
    
    - Use 'is not None' check instead of 'or' to preserve explicit empty list
    - Add test verifying empty list is not silently replaced with defaults
    - Add test verifying fallback path uses custom bindings
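
The falsy trap fixed here can be shown with a short sketch. `resolve_bindings` and `DEFAULT_BINDINGS` are hypothetical names; only the `["JSONRPC"]` default and the `is not None` check come from the commit messages.

```python
from typing import List, Optional, Sequence

# Default taken from the commit message; the constant name is an assumption.
DEFAULT_BINDINGS = ["JSONRPC"]


def resolve_bindings(bindings: Optional[Sequence[str]] = None) -> List[str]:
    """Hypothetical helper showing the 'is not None' check."""
    # `bindings or DEFAULT_BINDINGS` would replace an explicit empty list,
    # because [] is falsy. Comparing against None preserves it.
    if bindings is not None:
        return list(bindings)
    return list(DEFAULT_BINDINGS)
```

With this check, `resolve_bindings([])` returns an empty list while `resolve_bindings()` returns the default.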
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Document known protocol binding values in docstring
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Use Literal union for protocol binding type hint
    
    Provides IDE autocomplete for known values while keeping the type
    open for custom bindings (Literal values are plain str at runtime).
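
One way such a type hint could be written is sketched below. The alias names `KnownBinding` and `ProtocolBinding` and the `describe` helper are assumptions for illustration; the actual annotation on `A2AAgent.__init__()` is not shown in this log. The three binding values are the ones named in the commit above.

```python
from typing import Literal, Union, get_args

# Known values give editors something to autocomplete.
KnownBinding = Literal["JSONRPC", "GRPC", "HTTP+JSON"]
# The union with str keeps the type open for custom bindings.
ProtocolBinding = Union[KnownBinding, str]


def describe(binding: ProtocolBinding) -> str:
    """Hypothetical helper: report whether a binding is a known value."""
    known = binding in get_args(KnownBinding)
    return f"{binding} ({'known' if known else 'custom'})"
```

Type checkers treat the union as equivalent to `str`, so the Literal members serve as documentation and editor hints and do not restrict callers.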
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • .NET: feat: Update GroupChatManager semantics to match other Orchestration patterns (#6140)
    * Refactor group chat workflow to prevent message echoing and enhance checkpointing
    
    - Updated GroupChatWorkflowBuilder to disable forwarding incoming messages to prevent duplicates.
    - Enhanced RoundRobinGroupChatManager with checkpointing support to preserve state across executions.
    - Modified GroupChatHost to maintain a history of messages and track the current speaker for message broadcasting.
    - Implemented broadcasting logic to ensure participants receive messages from others while excluding their own responses.
    - Added comprehensive unit tests for group chat orchestration, including scenarios for tool approval and function calls.
    - Introduced a new ApprovalHarness for testing tool invocation and approval workflows.
    
    * fixup: format
    
    * Add JSON serialization support for GroupChatManagerState and RoundRobinGroupChatManagerState
    
    ---------
    
    Co-authored-by: Jacob Alber <jalber@lokitoth.com>
  • .NET: [Breaking] Refactor AgentFileSkillsSource for depth-based discovery and predicate filters (#6109)
    * Refactor AgentFileSkillsSource to use filter predicates and add AgentFileSkillFilterContext
    
    - Replace hardcoded script/resource directory lists with configurable ScriptFilter and ResourceFilter predicates
    - Add AgentFileSkillFilterContext class to provide contextual file information to filter predicates
    - Replace MaxSearchDepth constant with configurable SearchDepth option
    - Update AgentFileSkillsSourceOptions with new filter and search depth properties
    - Update tests to reflect the new filtering approach
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Log '(none)' instead of empty string for missing file extensions in debug output
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • .NET: feat: Bring Handoff Orchestration to parity with Python (#6138)
    * feat: implement autonomous mode and termination conditions in handoff workflow
    
    * fixup: format
    
    * feat: enhance autonomous mode with per-agent configurations and add unit tests
    
    * fixup: remove empty file
    
    ---------
    
    Co-authored-by: Jacob Alber <jalber@lokitoth.com>