agent-framework

Address PR feedback

Peter Ibekwe · 2026-06-04 11:03:11 -07:00

012c135efd

Fix bandit nosec marker for CI pipeline

Peter Ibekwe · 2026-06-03 17:45:33 -07:00

6e016b1cfb

Fix pyupgrade and AGENTS.md reconnect description

- pyupgrade: drop forward-reference string annotations in _mcp.py (Python 3.10+ resolves them natively now that MCPTaskOptions is defined before use).

- AGENTS.md: align reconnect description with current behavior. Phase 1 (initial tools/call) does NOT retry on connection loss; raises 'connection lost; task state unknown' instead, so a server that accepted the request but lost the response cannot start the operation twice. Phase 2 (tasks/get / tasks/result) still reconnects once against the same task_id.
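The two-phase reconnect rule described above can be sketched as follows. This is an illustrative outline only: `ConnectionLost`, `start_task`, and `fetch_task_result` are hypothetical names, not the framework's API, and the callables stand in for the real MCP requests.

```python
class ConnectionLost(Exception):
    """Hypothetical stand-in for a transport-level disconnect."""


def start_task(call_tool):
    """Phase 1: never retry, because the server may already have accepted the call."""
    try:
        return call_tool()
    except ConnectionLost as exc:
        raise RuntimeError("connection lost; task state unknown") from exc


def fetch_task_result(get_result, reconnect, task_id):
    """Phase 2: reconnect once and ask again for the same task_id."""
    try:
        return get_result(task_id)
    except ConnectionLost:
        reconnect()
        # A second failure propagates to the caller; there is no further retry.
        return get_result(task_id)
```

Retrying phase 2 is safe because it only reads the state of an existing task, whereas retrying phase 1 could start the operation twice.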

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Peter Ibekwe · 2026-06-03 17:34:21 -07:00

e1af7ea937

MCP long-running task support in Python

Peter Ibekwe · 2026-06-03 17:17:21 -07:00

5cd005f665

Python: Add MCP-based skills discovery (McpSkillsSource) (#6169)

* Add MCP-based skills discovery (McpSkill, McpSkillsSource, McpSkillResource)

Implement Agent Skills discovery over MCP following the SEP-2640 convention:
- McpSkillsSource: reads skill://index.json to discover skills served by an MCP server
- McpSkill: lazily fetches SKILL.md content via resources/read on demand
- McpSkillResource: wraps MCP resource results (text and binary)
- Path traversal protection in get_resource for defense in depth
- Samples for Foundry Toolbox and standalone MCP skills server
- Comprehensive unit tests (514 lines)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address PR review comments: rename to MCP* convention, fix error handling and samples

- Rename McpSkill/McpSkillResource/McpSkillsSource to MCPSkill/MCPSkillResource/MCPSkillsSource
- Add data-URI prefix stripping for blob resource decoding
- Let non-McpError exceptions propagate from get_resource()
- Fix contradictory test comment
- Use interactive input() in mcp_based_skill sample
- Remove misleading sample output block

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Restore debug logging for McpError in get_resource()

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Use AzureCliCredential in Foundry toolbox skills sample for consistency

Replace DefaultAzureCredential with AzureCliCredential to match the
credential convention used in all other samples.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Use MCPStreamableHTTPTool in MCP skills sample

Replace raw mcp library imports (ClientSession, streamable_http_client)
with the framework's MCPStreamableHTTPTool to keep MCP server connections
consistent regardless of whether skills are enabled.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Branch on McpError.error.code so only not-found errors return empty

Previously _try_read_index() and get_resource() swallowed every McpError
as 'no skills available', making auth failures, server crashes, and
connection drops indistinguishable from a server that simply has no
skills.

Now only two codes are treated as not-found:
- -32002 (MCP-spec Resource not found)
- -32601 (METHOD_NOT_FOUND — server lacks resources/read)

All other McpError codes and non-McpError exceptions propagate with a
warning log, surfacing real failures visibly.
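A minimal sketch of the branching described above, assuming an error object that exposes a numeric `code`. `FakeMcpError` and `read_or_none` are hypothetical stand-ins used to keep the example self-contained; only the two error codes come from the commit message.

```python
import logging

logger = logging.getLogger(__name__)

RESOURCE_NOT_FOUND = -32002  # MCP-spec "Resource not found"
METHOD_NOT_FOUND = -32601    # server lacks resources/read
NOT_FOUND_CODES = frozenset({RESOURCE_NOT_FOUND, METHOD_NOT_FOUND})


class FakeMcpError(Exception):
    """Hypothetical stand-in for the MCP client's error type."""

    def __init__(self, code: int, message: str) -> None:
        super().__init__(message)
        self.code = code


def read_or_none(read, uri):
    """Return the resource, or None only when the server reports not-found."""
    try:
        return read(uri)
    except FakeMcpError as exc:
        if exc.code in NOT_FOUND_CODES:
            logger.debug("resource %s not found (code %d)", uri, exc.code)
            return None
        # Auth failures, crashes, and dropped connections stay visible.
        logger.warning("reading %s failed with code %d", uri, exc.code)
        raise
```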

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add tests for non-McpError and non-not-found error propagation in MCP skills

Cover the re-raise branch in MCPSkill.get_resource for plain
ConnectionError/TimeoutError, the generic McpError (code 0) propagation
on get_resource, and TimeoutError propagation in _try_read_index.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Revert "Use MCPStreamableHTTPTool in MCP skills sample"

This reverts commit f31ed0ded9.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Introduce MCP_SKILLS experimental feature for MCP skill classes

Add a separate MCP_SKILLS feature ID to ExperimentalFeature enum and
use it for MCPSkillResource, MCPSkill, and MCPSkillsSource, since their
promotion timeline is partly outside of our control.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

semenshi-m · 2026-06-03 18:09:50 +00:00

c6951c21f6

Python: progressive tool exposure via FunctionInvocationContext (#6233)

* Python: progressive tool exposure via FunctionInvocationContext

Add first-class progressive tool exposure to the Python core function-calling
loop. Tools can now add or remove real FunctionTool schemas at runtime via the
injected FunctionInvocationContext, taking effect on the next iteration of the
loop.

- FunctionInvocationContext gains a live `tools` list plus experimental
  `add_tools()` / `remove_tools()` helpers (feature: PROGRESSIVE_TOOLS).
- The function-calling loop establishes a run-local, normalized tools list and
  threads it into the context at both invocation paths so mutations propagate.
- Add a sample (dynamic_tool_exposure.py) and a tools samples README, including
  a note that CodeAct providers (Monty/Hyperlight) use their own provider-level
  tool management instead.

Supersedes #3877.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Validate non-negative input in dynamic_tool_exposure sample tools

Address review feedback: factorial and fibonacci now return an error
message for negative n instead of producing incorrect results.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Make add_tools atomic and surface swallowed function errors

Address review feedback on progressive tool exposure:

- add_tools now validates the full batch against a throwaway copy before
  committing, so a duplicate-name clash partway through a sequence leaves
  the live tool list unchanged (all-or-nothing).
- _auto_invoke_function now logs a warning (with traceback) when a tool
  raises, so contract errors such as a duplicate-name ValueError from
  add_tools are debuggable without enabling include_detailed_errors.
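The all-or-nothing behavior can be illustrated with a small sketch. It assumes tools are plain dicts keyed by `name`; `add_tools_atomically` is a hypothetical helper, not the actual `add_tools()` implementation.

```python
def add_tools_atomically(live_tools: list, new_tools: list) -> None:
    """Append new_tools to live_tools, or leave live_tools untouched on a clash."""
    staged = list(live_tools)  # throwaway copy used only for validation
    seen = {tool["name"] for tool in staged}
    for tool in new_tools:
        if tool["name"] in seen:
            raise ValueError(f"duplicate tool name: {tool['name']!r}")
        seen.add(tool["name"])
        staged.append(tool)
    # Commit in place so every holder of the live list sees the change.
    live_tools[:] = staged
```

Validating against the copy first means a clash on the second or third tool in a batch cannot leave the first one half-registered.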

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Avoid retaining tracebacks when logging swallowed function errors

Logging with exc_info=exc fed the exception traceback to the logging
machinery, whose frame references created reference cycles collected
lazily by the cyclic GC. On Windows that could drop a hyperlight
WasmSandbox on a non-owning thread ("unsendable, dropped on another
thread"), crashing the xdist worker. Log a pre-formatted message with
the exception repr instead, so no traceback object is retained.
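As a rough illustration of the logging change, the sketch below formats the message up front and passes no `exc_info`, so the log record holds only a string. `log_tool_failure` and the logger name are hypothetical.

```python
import logging

logger = logging.getLogger("tool_errors")


def log_tool_failure(tool_name: str, exc: BaseException) -> str:
    """Log a swallowed tool error without attaching the traceback object."""
    # repr(exc) captures the type and message as plain text.
    message = f"Function {tool_name!r} raised {exc!r}"
    logger.warning(message)  # deliberately no exc_info=...
    return message
```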

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* added missing decorator

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Eduard van Valkenburg · 2026-06-03 09:01:07 +00:00

49a6e433a3

Python: Promote agent-framework-declarative package to RC (#6256)

* Promote agent-framework-declarative package to RC

* Update missed package status file.

Peter Ibekwe · 2026-06-02 19:30:05 +00:00

6086a74302

Python: Fix FoundryAgent stripping model from PromptAgent requests (#5526)

* Fix FoundryAgent stripping model from PromptAgent requests

Move run_options.pop('model', None) inside the _uses_foundry_agent_session()
conditional so that model is only stripped for hosted agent sessions (where
the server manages the model) and preserved for PromptAgent requests that
require it in the Responses API call.
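In outline, the fix amounts to moving the `pop` under the session check. The sketch below uses a hypothetical `prepare_run_options` function and a boolean flag in place of `_uses_foundry_agent_session()`.

```python
def prepare_run_options(run_options: dict, uses_hosted_session: bool) -> dict:
    """Strip 'model' only when the server-side session manages the model."""
    prepared = dict(run_options)  # avoid mutating the caller's options
    if uses_hosted_session:
        prepared.pop("model", None)
    return prepared
```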

Fixes #5525

* test: add coverage for resp_* continuation preserving model

Adds test_raw_foundry_agent_chat_client_prepare_options_preserves_model_for_resp_continuation
to explicitly verify that HostedAgent v1 / v2-no-session paths (where conversation_id
starts with resp_) preserve model and previous_response_id without triggering the
hosted-session gate.

---------

Co-authored-by: Benke Qu <bequ@microsoft.com>
Co-authored-by: Evan Mattson <35585003+moonbox3@users.noreply.github.com>

Benke Qu · 2026-06-02 18:30:04 +00:00

fa8cfb7567

Python: Fix OTLP HTTP base-endpoint losing /v1/{signal} auto-append (#5913)

* Python: Fix OTLP HTTP base-endpoint losing /v1/{signal} auto-append

Per the OTel spec, OTEL_EXPORTER_OTLP_ENDPOINT is a *base* URL for HTTP —
the SDK auto-appends /v1/traces, /v1/metrics, /v1/logs when it reads the
env var directly. Signal-specific endpoint env vars are *full* URLs used
verbatim.

_get_exporters_from_env read the base endpoint and forwarded it as the
constructor ``endpoint=`` argument, which the SDK always treats as a full
signal URL. As a result, with OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
and HTTP protocol, the exporter sent to http://localhost:4318 instead of
http://localhost:4318/v1/traces (and likewise for metrics/logs).

Replicate the spec's auto-append here when falling back to the base
endpoint under HTTP. gRPC behavior is unchanged.
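A minimal sketch of the auto-append rule for the HTTP protocol, assuming the endpoint values have already been read from the environment. `resolve_http_endpoint` is a hypothetical helper, not the code in `_get_exporters_from_env`.

```python
from typing import Optional


def resolve_http_endpoint(
    base: Optional[str], signal_specific: Optional[str], signal: str
) -> Optional[str]:
    """Pick the URL to pass as the exporter's endpoint= argument."""
    if signal_specific:
        # Signal-specific endpoints are full URLs and are used verbatim.
        return signal_specific
    if base:
        # The base endpoint needs the per-signal path appended.
        return f"{base.rstrip('/')}/v1/{signal}"
    return None
```

With `base="http://localhost:4318"` and `signal="traces"`, this yields `http://localhost:4318/v1/traces`.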

* Python: Fix mypy type errors in OTLP endpoint assignment

Pre-declare traces_endpoint, metrics_endpoint, logs_endpoint as
str | None before the if/else block. Mypy inferred str from the
if-branch f-string assignments and then rejected the str | None
expressions in the else-branch as incompatible.

Dineshsuriya D · 2026-06-02 09:59:50 +00:00

a5f355e04a

Python: Persist hosted MCP call/results as canonical mcp_call output (#6070)

* Persist hosted MCP call/results as canonical mcp_call output

- Preserve hosted MCP call/result pairs as canonical mcp_call output items

- Coalesce MCP call + result in non-streaming conversion path

- Keep call-id alignment for MCP tool call tracking and output mapping

- Update tests and package metadata

* Fix missing Mapping import in hosted responses adapter

* Fix pyright unknown type in MCP output stringification

* Fix typing for MCP output sequence iteration

* Improve MCP output robustness and avoid eager flattening

* Bump foundry_hosting to b7 and update responses dependency to b7

* Restore foundry_hosting package version to 1.0.0a260521

* Refactor hosted MCP output parsing

Hameed Kunkanoor · 2026-06-02 07:30:36 +00:00

043208241a

fix: skip orphan anthropic thinking signatures (#5784)

Yufeng He · 2026-06-02 00:48:42 +00:00

05ebb966cf

Python: feat(bedrock): implement native structured output support via Converse API (#6052)

* feat(bedrock): add structured output support via Converse API (Fixes #5966)

* fix(bedrock): improve unsupported model exception handling and schema parsing

* refactor(bedrock): use generic traversal for strict schema enforcement

* address Copilot review comments on structured output

* refine bedrock structured output: guard additionalProperties, TypeError check, docs + test

* fix(bedrock): widen response_format to Mapping and add missing test coverage

Thota Sai Karthik · 2026-06-01 23:30:19 +00:00

5d98beddf5

Python: feat(evals): Foundry Adaptive Evals integration (rubric-generation) (#6101)

* Python: feat(evals): RubricScore type + EvalScoreResult.dimensions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: feat(foundry-evals): RubricDimension + GeneratedEvaluatorRef + accept in evaluators=

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: feat(evals): parse rubric_scores from output items + assertion helpers

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: feat(evals): BaseAgent.as_eval_source / Workflow.as_eval_source

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: feat(foundry-evals): EvalGenerationSource + generate_rubric helper

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: feat(foundry-evals): YAML config loader + sample

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: fix(evals): address PR review feedback

Addresses 4 Copilot review comments on PR #6101:

1. assert_dimension_score_at_least: drop the (not evaluator or found_any) guard so require_applicable=True correctly raises when the named evaluator produces no entries for the dimension. Adds TestRubricAssertions covering the regression.

2. GeneratedEvaluatorRef docstring: reword to describe actual behaviour (pinning recommended, not required) so it matches the dataclass default and FoundryEvals warning path.

3. _poll_generation_job: switch from asyncio.get_event_loop() to get_running_loop() and bound the per-iteration sleep by remaining time, matching _poll_eval_run.

4. generate_rubric: type category as Literal['quality','safety'] and validate at the entry point with a ValueError; drop the silent 'invalid -> quality' rewrite in _generation_job_to_ref. Adds a regression test.
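The polling change in item 3 can be sketched as below. `poll_until_done` and its `check` coroutine are hypothetical; the sketch only illustrates using the running loop's clock and capping each sleep by the time remaining.

```python
import asyncio


async def poll_until_done(check, timeout: float, interval: float = 0.05):
    """Poll check() until it returns a non-None value or timeout elapses."""
    loop = asyncio.get_running_loop()  # raises if called outside a running loop
    deadline = loop.time() + timeout
    while True:
        result = await check()
        if result is not None:
            return result
        remaining = deadline - loop.time()
        if remaining <= 0:
            raise TimeoutError("polling timed out")
        # Never sleep past the deadline.
        await asyncio.sleep(min(interval, remaining))
```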

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: feat(foundry-evals): hosted-agent-aware rubric generation

* Auto-detect hosted Foundry agents in agent_as_eval_source: when the
  agent's chat_client exposes a string agent_name (the convention used
  by RawFoundryAgentChatClient for PromptAgents/HostedAgents), emit a
  type='agent' EvalGenerationSource so the service fetches instructions
  and tools from the agent registry instead of relying on the local
  wrapper (which holds neither for hosted agents).
* Add hosted_agent_version kwarg and a new agent_version field on
  EvalGenerationSource so PromptAgent runs can pin to a specific hosted
  version for reproducible rubric generation.
* Add force_prompt_source escape hatch to bypass auto-detection and
  always emit a rendered prompt dossier - useful when the local wrapper
  carries overrides the service-side agent doesn't see.
* Fix _to_sdk_source for dataset sources: SDK ctor takes name=/version=,
  not dataset_name=/dataset_version=. The mismatch would raise TypeError
  against the real azure-ai-projects 2.3.0a* SDK; only unmocked
  integration paths were affected.

Tests cover: auto-detection happy path, versionless hosted agent,
explicit hosted_agent_version forwarding, force_prompt_source override,
non-string chat_client attrs (MagicMock test doubles) not mis-detected,
agent_version forwarded through _to_sdk_source, and the corrected
dataset SDK kwarg names.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(foundry-evals): accept canonical dimension_scores key per docs

The published Foundry rubric-evaluator output (Microsoft Learn 'Rubric evaluators' reference) places per-dimension breakdowns under properties.dimension_scores, not properties.rubric_scores. The parser now tries dimension_scores first and falls back to rubric_scores for preview-build compatibility, and tolerates non-list payloads (e.g. MagicMock auto-attrs) by trying the next candidate when parsing yields zero entries.
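A rough sketch of the key fallback, assuming `properties` is a plain dict and each dimension entry is a dict. `extract_dimension_entries` is a hypothetical name and not the parser in the package.

```python
def extract_dimension_entries(properties: dict) -> list:
    """Prefer dimension_scores; fall back to the legacy rubric_scores key."""
    for key in ("dimension_scores", "rubric_scores"):
        payload = properties.get(key)
        if not isinstance(payload, list):
            continue  # tolerate missing keys and non-list payloads
        entries = [item for item in payload if isinstance(item, dict)]
        if entries:
            return entries
    return []
```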

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat(foundry-evals): add manual create_rubric_evaluator

Adds FoundryEvals.create_rubric_evaluator as the agent-framework surface over project_client.beta.evaluators.create_version. This is the manual counterpart to generate_rubric: callers supply RubricDimension instances (authored locally, ported from another framework, or hand-tuned) and we POST a RubricBasedEvaluatorDefinition. The service auto-attaches the non-editable residual dimension (general_quality for quality, general_policy_compliance for safety).

Per the Microsoft Learn 'Rubric evaluators' reference, the auto-generation path (create_generation_job) is primarily a portal/UI feature; external SDK clients with rich local agent context are better served by manual create_version. This keeps generate_rubric for users who want to round-trip through a Foundry-registered agent.

Validation up front: weight must be in [1,10], ids unique, descriptions non-empty, pass_threshold in [0,1]. The returned GeneratedEvaluatorRef is identical in shape to one obtained from generate_rubric, so downstream evaluators= lists work unchanged.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* samples(foundry-evals): manual rubric sample + namespace re-exports

Adds evaluate_with_manual_rubric_sample.py demonstrating the end-to-end dev scenario for FoundryEvals.create_rubric_evaluator: hand-author a list of RubricDimension, register via create_rubric_evaluator, then use the pinned GeneratedEvaluatorRef alongside built-in evaluators in an agent regression run.

Also re-exports RubricDimension, GeneratedEvaluatorRef, build_sources, and load_evals_config from agent_framework.foundry (both the lazy runtime shim and the type stub) so the rubric samples can import everything from a single namespace; the auto-generate sample was previously broken because the shim was missing build_sources / load_evals_config.

Updates the foundry-evals README with a chooser entry for the two rubric paths.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat(foundry-evals): remove rubric creation flows; keep consumption only

Reframes agent-framework as a pure consumer of Foundry rubric evaluators: scoring against rubrics that already exist (authored in the Foundry portal or via the dedicated SDK / REST surface) instead of creating them from the SDK.

Removed creation surface area:

- FoundryEvals.generate_rubric (auto-generate path) and create_rubric_evaluator (manual path), plus all _GenerationSdkTypes / _ManualRubricSdkTypes / _to_sdk_dimensions / _coalesce_generation_sources / _to_sdk_source / _poll_generation_job / _generation_job_to_ref / _evaluator_version_to_ref / _get_beta_evaluators / _import_*_sdk_types helpers.

- EvalGenerationSource (the input source discriminator), RubricDimension (the input dimension type), agent_as_eval_source / workflow_as_eval_source / _detect_hosted_foundry_agent helpers, and the YAML-config loader (_evals_config.py with RubricGenerationSpec / RubricSourceSpec / parse_evals_config / load_evals_config / build_sources).

- BaseAgent.as_eval_source / Workflow.as_eval_source plus the _render_agent_dossier / _render_workflow_dossier helpers in core. These existed only to feed the now-removed generation pipeline.

- Samples evaluate_with_generated_rubric_sample.py, evaluate_with_manual_rubric_sample.py, and evaluators.yaml. Replaced with a short README section showing how to reference an existing rubric evaluator via GeneratedEvaluatorRef.

Kept (consumption surface):

- GeneratedEvaluatorRef, slimmed to (name, version, display_name). Still accepted alongside built-in evaluator strings in FoundryEvals(evaluators=[...]). Versionless refs still warn.

- RubricScore on EvalScoreResult.dimensions plus EvalResults.assert_dimension_score_at_least for per-dimension CI gates.

- _parse_dimension_entries / _extract_rubric_scores output parsing (both canonical dimension_scores and the legacy rubric_scores key).

Tests: 160/160 foundry unit tests and 71/71 core local-eval tests pass; pyright is clean across changed files. The pre-existing tests/core/test_telemetry.py::test_detect_hosted_fallback_import_error failure is unrelated and reproduces on the prior commit.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* samples(foundry-evals): add evaluate_with_rubric_sample

Adds a runnable end-to-end sample showing how to consume a pre-existing rubric evaluator created in Foundry: reference it with GeneratedEvaluatorRef(name, version), mix it with built-in evaluators in FoundryEvals, and gate CI with assert_dimension_score_at_least on a specific dimension.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(foundry-evals): satisfy mypy on _fetch_output_items

mypy infers OutputItemListResponse.sample as dict[str, object] | None while pyright correctly infers the typed Sample model. Cast to Any so both type checkers accept the attribute access pattern, rename the local to avoid shadowing the inner-loop sample binding, and drop the now-stale pyright suppressions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs(foundry-evals): drop unpublished rubric-evaluators learn.microsoft.com link

The Adaptive Evals authoring docs are not yet published on Microsoft Learn, so the link 404s. Keep the descriptive text without the broken hyperlink; we can re-add it once the docs ship.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* test(foundry-evals): hoist repeated local imports to module top

Per code review feedback (eavanvalkenburg): the test file repeated 'from agent_framework_foundry._foundry_evals import ...' inside 22 test bodies and 'from agent_framework_foundry import GeneratedEvaluatorRef' inside 8 more. Move all of them to the existing top-level imports; the symbols are the same across tests and the local imports were redundant.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Ben Thomas <25218250+alliscode@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Ben Thomas · 2026-06-01 23:01:56 +00:00

e0d0ad16a0

Python: Fix core observability unsafe serialization of function-call arguments containing dataclass/framework objects (#6026)

* fix: safely serialize function-call arguments in core observability

Apply make_json_safe() to content.arguments in _to_otel_part() before
building the otel message dict, so that dataclass/framework payloads
(e.g. workflow request_info events) do not cause a TypeError when
_capture_messages() calls json.dumps().

Lift make_json_safe() into agent_framework._serialization (no new
external deps — dataclasses/datetime only) so the core observability
path can use it without a dependency on the ag-ui adapter.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(core): safely serialize workflow request_info payloads in observability (#5733)

- Add make_json_safe() helper to recursively convert non-serializable objects
- Use make_json_safe() in _to_otel_part() for function_call arguments
- Fix CustomPayload test class to use @dataclass (resolves B903 lint error)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(serialization): guard callability and normalize dict keys in make_json_safe (#5733)

- Use callable(getattr(obj, method, None)) instead of hasattr() so that
  non-callable attributes named model_dump/to_dict/dict do not raise
  TypeError at runtime.
- Wrap each call in try/except TypeError to handle callables with
  mandatory arguments gracefully.
- Convert dict keys to str() so that non-string keys (e.g. datetime,
  int) cannot cause json.dumps to raise TypeError.
- Add regression tests for both scenarios.
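The guards listed above can be illustrated with a simplified converter. This is a sketch under assumptions, not the `make_json_safe()` shipped in `agent_framework._serialization`; the name `make_json_safe_sketch` and the final `str()` fallback are choices made for this example.

```python
import dataclasses
import json
from datetime import date, datetime


def make_json_safe_sketch(obj):
    """Recursively convert obj into something json.dumps accepts."""
    if obj is None or isinstance(obj, (str, int, float, bool)):
        return obj
    if isinstance(obj, (datetime, date)):
        return obj.isoformat()
    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
        return make_json_safe_sketch(dataclasses.asdict(obj))
    if isinstance(obj, dict):
        # Stringify keys so datetime or int keys cannot break json.dumps.
        return {str(k): make_json_safe_sketch(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple, set)):
        return [make_json_safe_sketch(v) for v in obj]
    for method in ("model_dump", "to_dict", "dict"):
        candidate = getattr(obj, method, None)
        if callable(candidate):  # skip non-callable attributes with these names
            try:
                return make_json_safe_sketch(candidate())
            except TypeError:
                continue  # e.g. the method has mandatory arguments
    return str(obj)
```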

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address observability serialization review feedback

---------

Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Evan Mattson · 2026-06-01 21:41:52 +00:00

f36096ce1a

Python: refresh dev dependencies and validate runtime bounds (#6238)

Updates third-party dev dependencies across the Python workspace and
validates that all runtime dependency bounds still hold at both ends.

Dev dependency bumps (root, lab, declarative, durabletask):
- uv 0.11.6 -> 0.11.17, ruff 0.15.8 -> 0.15.15,
  pytest-asyncio 1.3.0 -> 1.4.0, mcp 1.27.0 -> 1.27.2,
  azure-monitor-opentelemetry 1.8.7 -> 1.8.8,
  poethepoet 0.42.1 -> 0.46.0, prek 0.3.9 -> 0.4.3,
  types-python-dateutil and types-PyYaml stub bumps.
- Transitive Dependabot items swept via lock: idna 3.11 -> 3.17,
  pip 26.0.1 -> 26.1.2.

Deliberately excluded:
- opentelemetry-sdk stays 1.40.0: azure-monitor-opentelemetry (incl.
  1.8.8) hard-pins opentelemetry-sdk==1.40.
- mypy stays 1.20.0 and pyright stays 1.1.408: the 2.1.0 / 1.1.409
  bumps introduce new diagnostics that fail type checking and need
  dedicated PRs.
- rich kept as a range: agentlightning (lab[lightning]) forces
  rich==13.9.4.

Code/formatting changes driven by the ruff upgrade:
- devui lifespan now uses try/finally so shutdown cleanup always runs
  (ruff RUF075).
- Removed unused TYPE_CHECKING imports in core and foundry flagged by
  ruff 0.15.15.
- Reapplied ruff 0.15.15 formatting to the files it changed.

Validation: validate-dependency-bounds-test "*" passes (31/31 lower +
31/31 upper); typing 62/62; lint 31/31; devui tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Eduard van Valkenburg · 2026-06-01 17:53:56 +00:00

8091d052d8

Python: Add background agent support to harness agent (#6155)

* Add background agent support to harness agent

* Address PR comments

westey · 2026-06-01 17:20:39 +00:00

52a8045bb6

Python: coalesce code interpreter history chunks (#5801)

* fix: coalesce code interpreter history chunks

* fix: narrow content item list types

* fix: remove redundant content list casts

Yufeng He · 2026-06-01 13:26:20 +00:00

78d175a1e2

Fix integration test worker crashes in Azure Functions on Py3.13 (#4260)

* Initial plan

* Fix integration test worker crashes on Python 3.13

Three changes to prevent pytest-xdist workers from crashing during
Azure Functions integration tests:

1. Add `start_new_session=True` to subprocess on Linux so signals
   (e.g. from test-timeout) cannot propagate between the func host
   and the xdist worker process.

2. Add an overall 100-second budget to the fixture setup loop so
   the retry logic never exceeds the 120-second test timeout. When
   pytest-timeout's thread method fires during fixture setup and the
   thread doesn't respond, it calls os._exit() which kills the
   xdist worker – this is the root cause of the "Not properly
   terminated" crashes.

3. Remove the `UV_PYTHON: "3.10"` workaround from both workflow
   files so integration tests actually run on Python 3.13.
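The session isolation in item 1 might look roughly like this. `launch_isolated` is a hypothetical helper; the real fixture launches the func host with its own arguments.

```python
import subprocess
import sys


def launch_isolated(command: list) -> subprocess.Popen:
    """Start command in its own session on Linux so group signals stay contained."""
    isolate = sys.platform.startswith("linux")
    # start_new_session=True makes the child call setsid() before exec.
    return subprocess.Popen(command, start_new_session=isolate)
```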

Co-authored-by: larohra <41490930+larohra@users.noreply.github.com>

* Validate integration tests on Python 3.13

Co-authored-by: larohra <41490930+larohra@users.noreply.github.com>

* Revert unintentional uv.lock dependency bumps

Co-authored-by: larohra <41490930+larohra@users.noreply.github.com>

* Use time.monotonic() instead of time.time() for fixture budget timing

Addresses review feedback: monotonic clock is immune to NTP/clock
adjustments that could skew the budget enforcement.
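A small sketch of budget enforcement on the monotonic clock. `retry_within_budget` and its parameters are hypothetical; the fixture's actual retry loop is not shown in this log.

```python
import time


def retry_within_budget(attempt, budget_seconds: float, pause: float = 0.01) -> bool:
    """Call attempt() until it returns True or the time budget runs out."""
    deadline = time.monotonic() + budget_seconds  # unaffected by wall-clock changes
    while True:
        if attempt():
            return True
        if time.monotonic() + pause >= deadline:
            return False  # another pause would overrun the budget
        time.sleep(pause)
```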

Co-authored-by: larohra <41490930+larohra@users.noreply.github.com>

* Fix func worker segfault on Python 3.13 by redirecting worker to Python 3.12

The Azure Functions Python worker crashes with SIGSEGV (exit code 139)
on Python 3.13 due to protobuf C extension (google._upb) compatibility
issues.  When the test runner uses Python >=3.13, the conftest now
automatically finds a compatible Python 3.10-3.12 and sets
languageWorkers__python__defaultExecutablePath so the func host uses
it for the worker process.

The CI setup action also ensures Python 3.12 is available on the
runner, falling back to uv python install if the system doesn't have
it.

Co-authored-by: larohra <41490930+larohra@users.noreply.github.com>

* Address code review: add path validation, clarify version range and config key format

Co-authored-by: larohra <41490930+larohra@users.noreply.github.com>

* Run func worker natively on Python 3.13 by disabling dependency isolation

Replace the Python 3.12 redirect workaround with the proper fix:
set PYTHON_ISOLATE_WORKER_DEPENDENCIES=0 on Python >=3.13.

The segfault (exit code 139) is caused by the Azure Functions worker's
module isolation mechanism conflicting with protobuf's C extensions
(google._upb) on Python 3.13.  Disabling isolation lets the worker
load dependencies from the app's own environment, which avoids the
crash while keeping everything running on Python 3.13.

See: https://github.com/Azure/azure-functions-python-worker/issues/1797

Co-authored-by: larohra <41490930+larohra@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: larohra <41490930+larohra@users.noreply.github.com>
Co-authored-by: Laveesh Rohra <larohra@microsoft.com>

Copilot · 2026-06-01 09:18:26 +00:00

b59a854fcd

Python: Reorganize A2A samples and use package A2AExecutor (#6165)

* Reorganize A2A samples: client demos in 02-agents, use package A2AExecutor

- Move client samples (agent_with_a2a, a2a_agent_as_function_tools) to samples/02-agents/a2a/
- Add new concept samples: polling, stream reconnection, protocol selection
- Replace sample agent_executor.py with package-level A2AExecutor (stream=True)
- Update 04-hosting/a2a to focus on server-side, point to 02-agents for clients
- Add README.md for the new 02-agents/a2a/ sample collection

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix streaming artifact coalescing and address PR review feedback

A2AExecutor fix:
- Generate a stable artifact_id per stream in _run_stream so all streaming
  chunks share the same ID, enabling proper append=True coalescing per the
  A2A spec (TaskArtifactUpdateEvent with same artifactId).
- Previously, item.message_id was None for OpenAI/Foundry streaming updates,
  causing the SDK to generate a new random UUID per token (100+ separate
  artifacts instead of 1 appended artifact).
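The stable-ID idea can be sketched as follows. `ArtifactChunk` and `chunk_stream` are hypothetical types for illustration, and setting `append=False` on the first chunk is an assumption of this sketch rather than something stated in the commit.

```python
import uuid
from dataclasses import dataclass


@dataclass
class ArtifactChunk:
    """Hypothetical stand-in for a streamed artifact update event."""

    artifact_id: str
    text: str
    append: bool


def chunk_stream(tokens):
    """Yield one chunk per token, all sharing a single artifact ID."""
    artifact_id = str(uuid.uuid4())  # generated once per stream, not per token
    for index, token in enumerate(tokens):
        yield ArtifactChunk(artifact_id=artifact_id, text=token, append=index > 0)
```

Because every chunk carries the same ID, a consumer can concatenate the text into one artifact instead of collecting one artifact per token.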

Sample improvements:
- Replace join workaround with response.text now that coalescing works
- Add background=True to stream reconnection resume call (required for
  continuation token emission on in-progress tasks)
- Fix type ignore specificity in polling sample

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Giles Odigwe · 2026-06-01 07:09:11 +00:00

5affc9c333

Python: [A2A] Set message_id on AgentResponseUpdate for message-bearing paths (#6163)

Map A2A protocol message_id to AgentResponseUpdate.message_id in two paths
where it was previously omitted, aligning with .NET behavior:

1. Standalone A2AMessage: set message_id=msg.message_id (matches .NET
   ConvertToAgentResponseUpdate(Message) which sets both ResponseId and
   MessageId to message.MessageId)

2. TaskStatusUpdateEvent (terminal/input_required): set
   message_id=message.message_id (matches .NET which sets
   MessageId=statusUpdateEvent.Status.Message?.MessageId)

Fixes #5949

Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Giles Odigwe · 2026-05-29 08:11:13 +00:00

dd9a4b6321

Python: consolidate MCP reliability fixes (#6145)

* Python: consolidate MCP reliability fixes

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix MCP cleanup and metadata typing

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Satisfy MCP metadata mypy typing

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix Pyright metadata mapping type

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Eduard van Valkenburg · 2026-05-29 07:21:14 +00:00

e8ff541ebf

Python: Add Mistral AI embedding client package (#5480)

* Python: Add Mistral AI embedding client package

Signed-off-by: Daria Korenieva <daric2612@gmail.com>

* Address review feedback: fix dimensions check, sort embeddings by index, align docs

Signed-off-by: Daria Korenieva <daric2612@gmail.com>

* Address review feedback: downgrade to alpha, remove integration tests

- Change version to 1.0.0a260505 (alpha)
- Update classifier to Development Status :: 3 - Alpha
- Update PACKAGE_STATUS.md to alpha
- Remove Mistral from integration test workflows (no API keys yet)

Signed-off-by: Daria Korenieva <daric2612@gmail.com>

* Add samples directory for alpha package compliance

Per python-package-management skill: alpha packages must include samples inside the package directory.

Signed-off-by: Daria Korenieva <daric2612@gmail.com>

* Fix ruff formatting in sample file

Signed-off-by: Daria Korenieva <daric2612@gmail.com>

---------

Signed-off-by: Daria Korenieva <daric2612@gmail.com>

Daria Korenieva · 2026-05-29 07:20:56 +00:00

d2d5384f28

Python: Adding AgentFileStore and FileAccessProvider to support file access operations. (#6099 )

* Adding AgentFileStore and FileAccessProvider to support file-based operations for agents.

* Address PR review feedback on FileAccessProvider

- Probe symlinks on the unresolved candidate path so in-root symlinks
cannot silently pass and out-of-root symlinks surface the correct
error message.
- Validate matching_lines elements in FileSearchResult.from_dict and
raise a clean ValueError for non-mapping entries.
- Cap search regex pattern length (256 chars) via a new
_compile_search_regex helper to mitigate ReDoS, and surface the cap
in the file_access_search_files tool description.
- Skip non-UTF-8 files during filesystem search instead of aborting
the entire directory walk.
- Replace the module-scope trailing string in the data-processing
sample with comments to avoid Ruff B018.
- Remove the checked-in working/region_totals.md sample artifact so
the save flow works from a clean checkout.
- Expand the Windows stdout reconfiguration comment in task_runner.py
for clarity.
- Add tests for invalid/oversize regex, non-UTF-8 file search, and
in-root symlink rejection.
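
A minimal sketch of the pattern-length cap described above. The 256-character limit comes from this commit; the function and constant names are hypothetical stand-ins, not the actual `_compile_search_regex` helper.

```python
import re

# Cap taken from the commit text; the constant name is illustrative.
MAX_PATTERN_LENGTH = 256


def compile_search_regex(pattern: str) -> "re.Pattern[str]":
    """Compile a caller-supplied pattern, rejecting oversize or invalid input."""
    if len(pattern) > MAX_PATTERN_LENGTH:
        raise ValueError(f"Regex pattern exceeds {MAX_PATTERN_LENGTH} characters.")
    try:
        return re.compile(pattern)
    except re.error as exc:
        # Surface a single recoverable error type to the caller.
        raise ValueError(f"Invalid regex pattern: {exc}") from exc
```

A length cap only narrows the ReDoS surface; a short pattern can still backtrack badly, which is why a later commit in this PR adds a wall-clock timeout.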

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix mypy redundant-cast in FileSearchResult.from_dict

Use cast(list[object], ...) instead of cast(list[Any], ...) so the
cast represents a real type change (lists are invariant) and is no
longer flagged by mypy as redundant, while still satisfying pyright's
reportUnknownVariableType. Matches the existing pattern in _memory.py.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Tighten path normalization and directory resolution in FileAccess

- _normalize_relative_path now strips surrounding whitespace up front
so leading/trailing spaces never leak into file segments, and
rejects trailing path separators for file paths so 'foo/' is no
longer silently coerced to 'foo'.
- FileSystemAgentFileStore._resolve_safe_directory_path normalizes
with is_directory=True and maps an empty normalized result to the
root. This matches InMemoryAgentFileStore so whitespace-only
directory inputs resolve to the root instead of raising.
- Added tests for whitespace stripping, trailing-separator rejection,
and whitespace-only directory listing on the filesystem store.
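
The normalization rules above can be sketched roughly as follows. This is an illustration under assumptions, not the real `_normalize_relative_path`: it covers only the whitespace and trailing-separator behaviour named in this commit, and the function name is hypothetical.

```python
def normalize_relative_path(path: str, is_directory: bool = False) -> str:
    """Strip surrounding whitespace; reject trailing separators on file paths."""
    stripped = path.strip()
    if is_directory:
        # Directories may end with a separator; an empty result means the root.
        return stripped.rstrip("/\\")
    if stripped.endswith(("/", "\\")):
        raise ValueError(f"File path must not end with a path separator: {path!r}")
    return stripped
```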

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Harden FileAccess search and atomic save in store API

- Add wall-clock timeout (10s) around regex scans so a pathological pattern (e.g. `(a+)+`) below the length cap cannot stall the event loop.
- Offload the InMemoryAgentFileStore regex scan to a worker thread, matching the filesystem store.
- Fail closed when `Path.is_symlink` raises during the safe-path probe so a permission error cannot silently bypass the symlink/reparse-point rejection.
- Add `overwrite: bool = True` to `AgentFileStore.write_file`; the in-memory store performs the check under the existing lock and the filesystem store uses `open(mode='x')` so concurrent callers cannot race past `overwrite=False`.
- `file_access_save_file` now relies on the atomic store call instead of a separate `file_exists` round-trip.
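
A minimal sketch of the exclusive-create idea behind `overwrite=False`, assuming a plain filesystem path. The free function below is hypothetical; only the `overwrite` flag and `open(mode='x')` come from the commit text.

```python
from pathlib import Path


def write_file(path: Path, content: str, overwrite: bool = True) -> None:
    """Write text, letting the OS enforce 'must not exist' when overwrite is False."""
    # Mode "x" fails with FileExistsError if the file already exists, so there is
    # no separate exists-check for a concurrent writer to slip past.
    mode = "w" if overwrite else "x"
    try:
        with open(path, mode, encoding="utf-8") as handle:
            handle.write(content)
    except FileExistsError as exc:
        raise ValueError(f"File already exists: {path.name}") from exc
```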

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix Python 3.10 timeout handling and add directory arg to list/search tools

- Catch asyncio.TimeoutError in _run_search_with_timeout. In Python 3.10
asyncio.wait_for raises asyncio.exceptions.TimeoutError, which is
distinct from the builtin TimeoutError (the two were unified in 3.11).
Catching the asyncio alias works on every supported version.
- Add an optional directory parameter to file_access_list_files and
file_access_search_files so agents can enumerate / scope searches to
nested folders, not just the store root.
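
A sketch of the timeout handling, assuming the scan is a blocking callable run in a worker thread. The helper name and the `ValueError` translation are illustrative, not the actual `_run_search_with_timeout`.

```python
import asyncio
import time
from typing import Callable, TypeVar

T = TypeVar("T")


async def run_with_timeout(func: Callable[[], T], timeout_seconds: float = 10.0) -> T:
    """Run a blocking callable in a worker thread, bounded by a wall-clock timeout."""
    try:
        return await asyncio.wait_for(asyncio.to_thread(func), timeout=timeout_seconds)
    except asyncio.TimeoutError as exc:
        # On 3.11+ this name is the builtin TimeoutError; on 3.10 it is a separate
        # class, so catching the asyncio name covers both.
        raise ValueError(f"Search timed out after {timeout_seconds} seconds.") from exc
```

Note that the timeout only stops the caller from waiting; the worker thread itself keeps running until the callable returns.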

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address FileAccess review feedback: case, errors, signal, TOCTOU

- InMemoryAgentFileStore now stores (display_name, content) so list_files
and search_files return the original-case names callers wrote, matching
the behaviour of FileSystemAgentFileStore on case-preserving filesystems
and removing the silent in-memory vs. on-disk contract divergence.
- FileSystemAgentFileStore.read_file raises ValueError instead of letting
UnicodeDecodeError bubble for binary / non-UTF-8 input, restoring
symmetry with search_files (which still skips) and giving the tool
layer a recoverable type to translate.
- Tool wrappers now catch ValueError and OSError around every operation
and surface them as readable strings, so 'you used ..' and 'the file
already exists' are both reported to the model the same way instead of
the former crashing out as an unhandled exception.
- _search_files_sync logs per skipped non-UTF-8 file at WARNING and an
aggregate INFO summary so operators can distinguish 'no matches' from
'half the corpus was unreadable'.
- FileSystemAgentFileStore softens its docstrings to acknowledge the
inherent probe-then-open TOCTOU window. On POSIX both read and write
now pass O_NOFOLLOW so the kernel refuses if the leaf segment becomes
a symlink between the probe and the open. Windows has no equivalent
flag; the limitation is documented.
- Tests cover: case preservation on list/search, ValueError on non-UTF-8
read at the store and tool layer, tool-layer string responses for
path-traversal and oversized-regex inputs, search-skip log output,
symlink rejection on delete/search/list, and symlinked intermediate
directory rejection.
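
A minimal sketch of the `O_NOFOLLOW` read path, assuming a POSIX host. The helper name is hypothetical; on platforms without the flag it falls back to a plain open, matching the documented Windows limitation.

```python
import os


def read_text_no_follow(path: str) -> str:
    """Read a UTF-8 file, refusing to follow a symlink at the leaf segment."""
    # O_NOFOLLOW is absent on Windows; getattr keeps the sketch importable there.
    flags = os.O_RDONLY | getattr(os, "O_NOFOLLOW", 0)
    fd = os.open(path, flags)  # raises OSError if the leaf is a symlink (POSIX)
    with os.fdopen(fd, "r", encoding="utf-8") as handle:
        return handle.read()
```

The flag guards only the final path component; symlinked intermediate directories still need the separate probe described above.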

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address FileAccess nit comments: docstrings, enumerate, opt-in delete approval

- Expand FileSearchMatch/FileSearchResult.to_dict docstrings to explain why
the override is needed (__slots__ defeats the mixin's __dict__ iteration)
and why exclude/exclude_none are accepted-but-ignored (mixin signature
compatibility for callers like to_json).
- Use enumerate(lines, start=1) in _search_file_content so the +1 below is
no longer needed; rename loop variable to line_number for clarity.
- Add opt-in require_delete_approval: bool = False on FileAccessProvider.
When True, file_access_delete_file is registered with approval_mode
'always_require' so the host must approve every delete. Default False
preserves current behaviour and matches the .NET reference, but
deployments that want a safer-by-default posture can enable it.
- Add tests covering both delete approval modes.
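
For illustration, the `enumerate(lines, start=1)` change amounts to something like the following. The function below is a simplified stand-in that does substring matching, not the real `_search_file_content`.

```python
from typing import List, Tuple


def search_lines(lines: List[str], needle: str) -> List[Tuple[int, str]]:
    """Return (line_number, line) pairs, with line numbers starting at 1."""
    matches = []
    # start=1 yields human-facing line numbers directly, so no "+ 1" is needed.
    for line_number, line in enumerate(lines, start=1):
        if needle in line:
            matches.append((line_number, line))
    return matches
```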

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* FileAccess: require delete approval by default

Flip the default for FileAccessProvider(require_delete_approval=...) from
False to True so destructive deletes are gated by host approval out of the
box. Callers that want the previous autonomous behaviour (which matches the
.NET reference) can pass require_delete_approval=False.

Tests updated accordingly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fixing linkspector by installing Chrome for puppeteer first.

---------

Co-authored-by: Ben Thomas <25218250+alliscode@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Ben Thomas · 2026-05-28 20:09:50 +00:00

b000a2cf51

Backfill chat span request model if it's unknown and response model is available (#6160 )

Tao Chen · 2026-05-28 20:03:46 +00:00

0578f4c910

Python A2A: Expose supported_protocol_bindings as configurable parameter (#6098 )

* Expose supported_protocol_bindings as configurable parameter on A2AAgent

Add supported_protocol_bindings parameter to A2AAgent.__init__() allowing
users to configure which A2A protocol bindings (JSONRPC, GRPC, HTTP+JSON)
the client prefers when connecting to remote agents.

- Defaults to ["JSONRPC"] matching current behavior
- Passes through to ClientConfig for transport negotiation
- Replaces 4 hardcoded references with the configurable value

Closes #6057

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix empty list falsy trap and add fallback path test coverage

- Use 'is not None' check instead of 'or' to preserve explicit empty list
- Add test verifying empty list is not silently replaced with defaults
- Add test verifying fallback path uses custom bindings
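
The falsy trap can be sketched as below. The default of `["JSONRPC"]` comes from this PR; the helper function and constant are hypothetical and only illustrate the `is not None` check.

```python
from typing import List, Optional, Sequence

DEFAULT_BINDINGS = ("JSONRPC",)


def resolve_bindings(supported_protocol_bindings: Optional[Sequence[str]] = None) -> List[str]:
    """Fall back to the defaults only when the caller passed nothing at all."""
    # `bindings or DEFAULT_BINDINGS` would replace an explicit [] with the
    # defaults, because an empty list is falsy.
    if supported_protocol_bindings is not None:
        return list(supported_protocol_bindings)
    return list(DEFAULT_BINDINGS)
```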

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Document known protocol binding values in docstring

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Use Literal union for protocol binding type hint

Provides IDE autocomplete for known values while keeping the type
open for custom bindings (Literal is str at runtime).
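
A rough sketch of such a type hint, assuming the three binding names listed earlier in this PR. The alias names and the helper are illustrative only.

```python
from typing import Literal, Union, get_args

KnownBinding = Literal["JSONRPC", "GRPC", "HTTP+JSON"]
# The union with str keeps the parameter open to custom binding names.
ProtocolBinding = Union[KnownBinding, str]


def describe_binding(binding: ProtocolBinding) -> str:
    """Label a binding as known or custom; at runtime it is just a str."""
    kind = "known" if binding in get_args(KnownBinding) else "custom"
    return f"{binding} ({kind})"
```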

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Giles Odigwe · 2026-05-28 19:05:13 +00:00

e9a606344a

Python: [Breaking] Refactor Skill API to async resource and script lookup (#6135 )

Port of .NET commit 08541ee5a9.

Replace property-based Skill.content/resources/scripts with async
by-name lookup methods:
- content property -> async get_content() -> str
- resources property -> async get_resource(name) -> SkillResource | None
- scripts property -> async get_script(name) -> SkillScript | None

SkillsProvider now always includes all three tools (load_skill,
read_skill_resource, run_skill_script) and both instruction blocks
regardless of whether any skills have resources or scripts.

ClassSkill retains resources/scripts properties as overridable hooks
for subclass reflection-based discovery.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

semenshi-m · 2026-05-28 15:54:20 +00:00

f7c5b8d108

Bump Python package versions for 1.7.0 release (#6142 )

Bumps the released 1.6.0 packages agent-framework, agent-framework-core, agent-framework-foundry, and agent-framework-openai to 1.7.0, with root continuing to exactly pin agent-framework-core[all]. Bumps the changed prerelease packages agent-framework-a2a, agent-framework-chatkit, agent-framework-declarative, agent-framework-devui, and agent-framework-foundry-hosting to the 260528 date stamp, raises core floors on the packages included in this release, raises Foundry's OpenAI floor alongside OpenAI, and raises ChatKit's openai-chatkit floor to the minimum version required by the current typed API usage. No beta cohort bump was applied; the absent mistal/mistral package was intentionally not bumped because no such package exists in this branch.

Evan Mattson · 2026-05-28 19:45:31 +09:00

a84ad42f6d

Python: [Breaking] Remove Python-only declarative actions and rename alias kinds to C# canonical names (#6126 )

* Remove Python-only declarative actions and rename alias kinds to C# canonical names

* Address PR comments.

* Address PR comments.

* Reduce verbose and duplicate output from sample workflow.

Peter Ibekwe · 2026-05-28 10:16:22 +00:00

ded17b178c

Python: fix: pass Foundry agent default headers (#6040 )

* fix: pass Foundry agent default headers

* test: loosen Foundry default header assertions

Yufeng He · 2026-05-28 10:08:14 +00:00

55dc3ce734

Python: Allow hosted checkpoints to restore MessageRole (#6049 )

* Python: Allow hosted checkpoints to restore MessageRole

Allow Responses hosting checkpoint storage to deserialize the Azure Responses MessageRole enum that hosted workflows can persist inside Agent Framework Message objects.

Add regression coverage for both direct load() and the hosted get_latest() restore path, including the plain-storage failure mode where list_checkpoints logs the blocked type and get_latest() returns None.

Ruff also normalizes a duplicate contextlib import in the touched hosting module.

* Address MessageRole checkpoint review comments

* Cover hosted MessageRole checkpoint restore path

Baidar · 2026-05-28 09:13:30 +00:00

9d8e5ca4f5

Python: Align C# and Python TodoProvider tool names (#6107 )

* Align C# and Python TodoProvider tool names

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* Address PR review: remove __slots__ and add typed schemas for tool params

- Remove __slots__ from TodoItem, TodoInput, and TodoCompleteInput classes
  (not needed for low-instance-count objects and hinders dev scenarios)
- Add _TodoAddItemSchema and _TodoCompleteItemSchema TypedDicts to provide
  proper JSON schema for todos_add and todos_complete tool parameters
- Use typing_extensions for Python 3.10 compatibility

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

westey · 2026-05-28 08:40:13 +00:00

af787569b3

Python: read headers defensively to support stream wrappers without .headers (#6028 ) (#6029 )

`OpenAIChatClient._inner_get_response()` reads `.headers` on the raw streaming
response returned by `client.responses.with_raw_response.create(stream=True)`
(and its three sibling call sites - retrieve-streaming, non-streaming create
and background retrieve) to surface the `x-ms-served-model` Azure header,
introduced in #5910.

When `azure-ai-projects>=2.1.0` experimental GenAI tracing is enabled
(`AZURE_EXPERIMENTAL_ENABLE_GENAI_TRACING=true`), the instrumentor wraps the
raw streaming response in an inline `AsyncStreamWrapper` that exposes
`.response` but not `.headers`. Reading `raw_create_response.headers` then
raises `AttributeError: 'AsyncStreamWrapper' object has no attribute 'headers'`,
which `FoundryChatClient` rethrows as a `ChatClientException` and breaks every
streaming call (workflows and free chat).

Fix: read the header dict via `getattr(raw_response, "headers", None)` at all
four call sites. `_extract_served_model()` already short-circuits on `None`,
so the served-model surfacing degrades gracefully (model stays the deployment
alias) instead of crashing when the response is wrapped by an instrumentor
that does not proxy `.headers`.
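
A minimal sketch of the defensive read, using stand-in classes. `WrappedStream`, `RawResponse`, and both helper functions are hypothetical; only the `getattr(..., "headers", None)` pattern and the `x-ms-served-model` header name come from the description above.

```python
from typing import Any, Mapping, Optional


class RawResponse:
    """Stand-in for a raw response that exposes .headers."""

    def __init__(self, headers: Mapping[str, str]) -> None:
        self.headers = headers


class WrappedStream:
    """Stand-in for an instrumentor wrapper that exposes .response but not .headers."""

    def __init__(self, response: RawResponse) -> None:
        self.response = response


def extract_served_model(headers: Optional[Mapping[str, str]]) -> Optional[str]:
    # Short-circuit on None so a missing header dict is not an error.
    if headers is None:
        return None
    return headers.get("x-ms-served-model")


def served_model_or_alias(raw_response: Any, deployment_alias: str) -> str:
    headers = getattr(raw_response, "headers", None)  # no AttributeError on wrappers
    return extract_served_model(headers) or deployment_alias
```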

Regression test added:
`test_streaming_response_without_headers_attribute_does_not_crash`
simulates a stream wrapper that raises `AttributeError` on `.headers` and
asserts the stream still completes with the deployment alias as `update.model`.

Fixes #6028

Co-authored-by: Emilien Mottet <emilien.mottet@michelin.com>

Emilien Mottet · 2026-05-28 08:37:38 +00:00

3db2004e49

feat(a2a): add A2AAgentSession with reference_task_ids and input-required support (#5980 )

* feat(a2a): link follow-up messages via reference_task_ids

Track the task_id from A2A responses (task, status_update, artifact_update,
and message payloads) on session.state and include it as reference_task_ids
on subsequent outgoing messages. This enables remote agents to correlate
follow-up messages as task refinements per the A2A spec.

Resolves #5938

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat(a2a): add A2AAgentSession for typed protocol state tracking

Introduce A2AAgentSession (subclass of AgentSession) with context_id,
task_id, and task_state properties. This follows the DurableAgentSession
pattern and mirrors the .NET A2AAgentSession design.

- Track task_id, context_id, and task_state from all response payload types
- Validate context_id consistency (raise on mismatch)
- Auto-assign server-generated context_id when not set
- Only A2AAgentSession gets reference tracking (no state dict fallback)
- Plain AgentSession continues to work without reference tracking
- Add serialization support (to_dict/from_dict)
- Export via agent_framework.a2a and agent_framework_a2a

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* style: remove unnecessary string annotation (pyupgrade)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: use AgentSession.from_dict for state deserialization

Avoids importing private _deserialize_state, matching the
DurableAgentSession pattern.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: track context_id from message payloads in A2AAgentSession

Previously, context_id was only captured from task, status_update, and
artifact_update payloads. Message-only responses (which carry context_id
but may lack task_id) were silently lost. This fix:

- Captures msg.context_id in the message handler
- Persists session state when either last_task_id or last_context_id is
  present (not only when task_id is truthy)
- Only updates task_id/task_state when a task_id was actually returned
- Adds a test for message-only context_id tracking

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* addressed comments

* Gate status content to INPUT_REQUIRED/terminal states (match .NET)

Match .NET's GetUserInputRequests pattern: only emit TaskStatusUpdateEvent
message content when state is INPUT_REQUIRED or terminal. Intermediate
status text (WORKING, SUBMITTED) is no longer surfaced to callers.

When state is INPUT_REQUIRED, set additional_properties['input_required']
= True so callers can distinguish input requests from final responses.

Closes #5937

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address review: remove message task_id tracking, defensive fallbacks, and input_required flag

- Do not track task_id from Message payloads (simple interactions
  without task tracking)
- Remove 'or last_task_id' fallback from status_update and
  artifact_update handlers (spec guarantees task_id is always set)
- Remove additional_properties['input_required'] flag (content gating
  to INPUT_REQUIRED/terminal states is the signal itself)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Giles Odigwe · 2026-05-28 08:36:49 +00:00

efdabd56dc

Fix deprecated asyncio.iscoroutinefunction usage in test_cleanup_hooks.py (#4563 )

Fixes #4522

Replace deprecated `asyncio.iscoroutinefunction()` with `inspect.iscoroutinefunction()`
to resolve Python 3.13+ deprecation warning.

Changes:
- Added `import inspect` to imports
- Replaced `asyncio.iscoroutinefunction(hook)` with `inspect.iscoroutinefunction(hook)` on line 126
- This makes the code consistent with other test methods in the same file (lines 201, 236)

The rest of the file already uses `inspect.iscoroutinefunction()` correctly, making
this change consistent with the existing codebase pattern.
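
For reference, the replacement check behaves like this on plain functions. The hook functions are made up for the example and are not taken from test_cleanup_hooks.py.

```python
import inspect
from typing import Any, Callable


async def async_hook() -> None:
    """Example coroutine function."""


def sync_hook() -> None:
    """Example regular function."""


def hook_kind(hook: Callable[..., Any]) -> str:
    # inspect.iscoroutinefunction is the non-deprecated way to make this check.
    return "async" if inspect.iscoroutinefunction(hook) else "sync"
```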

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Tao Chen <taochen@microsoft.com>

Shalabh Gupta · 2026-05-28 02:29:31 +00:00

371a869e44

Add hosting samples overview README (#5407 )

Co-authored-by: whenpoem <187613766+whenpoem@users.noreply.github.com>

whenpoem · 2026-05-27 21:08:17 +00:00

e532ced950

Python: fix: keep citation get_url metadata (#6037 )

* fix: keep citation get_url metadata

* fix: satisfy citation metadata mypy check

Yufeng He · 2026-05-27 20:09:02 +00:00

4c4e1d9b87

Python: Add a HarnessAgent with available features and sample (#6041 )

* Add a HarnessAgent with available features and sample

* Fix formatting

* Address PR comments and fix mypy error

* Add web search support to HarnessAgent

* Fix build warning

* Apply suggestions from code review

Co-authored-by: Eduard van Valkenburg <eavanvalkenburg@users.noreply.github.com>

* Address PR comments

* Address PR comments

* Address further PR comments.

* Fix markdown broken link

---------

Co-authored-by: Eduard van Valkenburg <eavanvalkenburg@users.noreply.github.com>

westey · 2026-05-27 14:54:00 +01:00

ef86fb51d5

Python: feat(foundry): add to_prompt_agent / deploy_as_prompt_agent (experimental) (#5959 )

* feat(foundry): add experimental to_prompt_agent converter

Adds `to_prompt_agent(agent)`, an experimental converter
(`ExperimentalFeature.TO_PROMPT_AGENT`) that turns an Agent Framework
`Agent` into a Foundry `PromptAgentDefinition` ready to publish via
`AIProjectClient.agents.create_version(...)`.

Behaviour:

* `agent.client` must be a `FoundryChatClient` (or subclass); otherwise
`TypeError` is raised. The model deployment name is lifted from the
bound client so the same Agent definition used for local runs can be
published as a hosted prompt agent without restating the model.
* Foundry SDK tool instances (from `FoundryChatClient.get_*_tool()`) are
passed through unchanged. AF `FunctionTool`s (and `@tool`-decorated
callables) are emitted as Foundry `FunctionTool` declarations.
* Local AF MCP tools cannot be expressed in a `PromptAgentDefinition`;
the converter raises `ValueError` and points at
`FoundryChatClient.get_mcp_tool()` for hosted MCP servers.
* The converter walks both `agent.default_options["tools"]` and
`agent.mcp_tools` because `normalize_tools()` splits local MCP off
into its own list.

Re-exported through the `agent_framework.foundry` lazy-loading namespace
(updates both `__init__.py` and the `__init__.pyi` type stub).

Adds a portable-agent sample showing the same `Agent` driven through
both `agent.run(...)` and `to_prompt_agent(agent)`, and a README section
covering the new converter.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore(samples): remove snippet tags from portable agent sample

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore(samples): inline FoundryChatClient and enable prompt-agent publish

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore(samples): drop async credential context manager

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs(foundry): trim README to_prompt_agent example to publish-only flow

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs(foundry): note FoundryAgent runs @tool callables for deployed prompt agents

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(foundry): address review comments on to_prompt_agent converter

* Construct `PromptAgentDefinition` `Tool` from a dict via `**tool_item`
unpacking rather than the positional Mapping constructor; this is cleaner and
matches the typical Pydantic / Azure SDK pattern.
* Drop the redundant `isinstance(mcp_tool, MCPTool)` guard in
`_convert_tools`; the parameter is already typed `Iterable[MCPTool]` so
the second `raise` was unreachable. The remaining single `raise`
fires for every entry as intended.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(foundry): match Agent.__init__ model resolution in to_prompt_agent

* Read the model from `agent.default_options.get("model")` first,
falling back to `agent.client.model`. This mirrors the order
`Agent.__init__` uses (`_agents.py:740`) when assembling
default_options, so the model the agent runs with is the same model
the converter publishes, e.g. when the caller passes
`default_options={"model": "..."}` to override the bound client.
* Updated the missing-model error message to point at both the client
and the default_options paths.
* Added tests:
* tool-only agent with no `instructions` produces a definition
where `instructions` is `None` and is omitted from the dict
payload (`Agent.__init__` strips None values from default_options
before storing them).
* `default_options['model']` wins over the bound client's model.
* Fallback to client.model when default_options has no model.
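
The resolution order can be sketched as follows, assuming only that the agent has a `default_options` mapping and a `client` with a `model` attribute. The helper name and error text are hypothetical, and the real converter may treat empty values differently.

```python
from types import SimpleNamespace
from typing import Any


def resolve_model(agent: Any) -> str:
    """Prefer default_options['model'], then fall back to the bound client's model."""
    model = agent.default_options.get("model") or getattr(agent.client, "model", None)
    if not model:
        raise ValueError("No model found on agent.default_options or agent.client.")
    return model


# Usage with stand-in objects (SimpleNamespace replaces the real Agent and client).
agent = SimpleNamespace(
    default_options={"model": "override-model"},
    client=SimpleNamespace(model="client-model"),
)
```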

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat(foundry): add deploy_as_prompt_agent helper + samples

Adds `deploy_as_prompt_agent(agent)`, a convenience wrapper around
`to_prompt_agent` that reuses the bound FoundryChatClient's project
client to call `project_client.agents.create_version(...)`. Defaults
`agent_name` / `description` from `agent.name` / `agent.description`
so the Agent stays the single source of truth.

* Exposed from `agent_framework_foundry` and the lazy-loading
`agent_framework.foundry` namespace (including the .pyi stub).
* Marked experimental with the existing
`ExperimentalFeature.TO_PROMPT_AGENT` tag.
* Tests cover the happy path, name/description defaulting, explicit
override, no-name error, metadata + description forwarding, extra
kwargs passthrough, and the experimental metadata.

Samples:
* Renamed the existing sample to `creating_prompt_agents.py`, drops
'portable' wording, presents `deploy_as_prompt_agent` first as the
recommended path and `to_prompt_agent` + `AIProjectClient` as the
two-step alternative, and adds a cleanup step that deletes the
published agent so re-runs stay idempotent.
* New `using_prompt_agents.py` shows the end-to-end loop: deploy the
agent, connect to it with `FoundryAgent` passing the same local
`@tool` callable, run a query against the deployed prompt agent,
then clean up.

README updated to introduce `deploy_as_prompt_agent` as the
recommended path and link to both runnable samples.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(foundry): restore missing-model ValueError in to_prompt_agent

The check was accidentally dropped while reworking docstrings in the
previous commit. Test `test_to_prompt_agent_rejects_missing_model`
exercises this path and was failing on CI as a result.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* refactor(foundry): rename deploy_as_prompt_agent -> create_prompt_agent

Renames the helper across the foundry package, core lazy-loader stubs,
tests, README and samples. The new name better matches the action
performed (a prompt-agent definition is created in Foundry) and is
consistent with the surrounding `create_*` API surface.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* refactor(foundry): drop create_prompt_agent, enrich to_prompt_agent params

Remove the create_prompt_agent helper and consolidate on to_prompt_agent.
Expose every PromptAgentDefinition parameter that has either an Agent
Framework equivalent (sourced from default_options) or no equivalent
(accepted as a keyword argument).

* default_options-sourced (with kwarg overrides):
temperature, top_p, string tool_choice
* kwarg-only Foundry knobs:
reasoning, text, structured_inputs, rai_config, ToolChoiceParam tool_choice

Precedence is always: explicit keyword > default_options entry > unset.
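
As an illustration of that precedence, a hypothetical helper might look like this. It is not the converter's actual code, and treating `None` as "not passed" is an assumption of the sketch.

```python
from typing import Any, Mapping, Optional


def resolve_option(
    name: str,
    default_options: Mapping[str, Any],
    explicit: Optional[Any] = None,
) -> Optional[Any]:
    """Explicit keyword wins, then the default_options entry, else None (unset)."""
    # Compare against None rather than truthiness so 0.0 still counts as explicit.
    if explicit is not None:
        return explicit
    return default_options.get(name)
```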

Tests cover every path (defaults, default_options, kwargs, kwarg override).
Samples and README rewritten around the enriched to_prompt_agent.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* refactor(foundry): single source of truth for prompt-agent options

Stop duplicating the generation-parameter surface between FoundryChatOptions
and to_prompt_agent. Translate every field with an Agent Framework equivalent
(temperature, top_p, tool_choice, reasoning, response_format/text/verbosity)
from agent.default_options via a new RawFoundryChatClient helper
_prepare_prompt_agent_options. Only Foundry-specific fields with no AF
equivalent — structured_inputs and rai_config — remain as keyword arguments
on to_prompt_agent.

- tool_choice is dropped when there are no tools (mirrors _prepare_options
semantics and avoids polluting tool-less prompt agents with Agent.__init__'s
'auto' default).
- response_format Pydantic models route through
openai.lib._parsing._responses.type_to_text_format_param; dict shapes go
through the existing _prepare_response_and_text_format helper.
- default_options is not mutated; text dict is defensively copied.

Tests, README, and creating_prompt_agents.py sample updated to reflect the
new single-source model.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs(foundry): consolidate prompt-agent sample

Drop creating_prompt_agents.py (the publish-only variant) and rename
using_prompt_agents.py to foundry_prompt_agents.py so the single sample
covers the full convert -> publish -> connect -> run loop. Update the
README link list accordingly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs(foundry): run local Agent + deployed agent in same sample

Add an agent.run() call against the local Agent before publishing, then run
the deployed prompt agent on the same query. Expand the docstring with a
compare-and-contrast covering runtime/latency, configurability, and
persistence/sharing differences between the two execution paths.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* test(foundry): cover conflicting response_format + text.format in to_prompt_agent

Exercises the ValueError path when a Pydantic response_format would overwrite
an explicit text.format mapping with a different shape. Lifts _chat_client.py
coverage from 89% to 90%.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* refactor(foundry): move _prepare_prompt_agent_options into _to_prompt_agent

Lift the translation helper off RawFoundryChatClient and into the
_to_prompt_agent module as a module-private function that takes the client
as its first argument. The chat client no longer needs to carry a method
whose only consumer is the prompt-agent converter, while still serving as
the source of the request-path helper (_prepare_response_and_text_format)
that the converter reuses for dict-shaped response_format values.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs(python): codify GA terminology + post-run docs review

Add two pieces of guidance to python/AGENTS.md:

* Terminology - reserve 'GA' for hosted services; use 'released' or 'stable'
for Agent Framework code/features to match the feature-lifecycle stages.
* Maintaining Documentation - review AGENTS.md and skills at the end of every
run and update any guidance the conversation made stale; before adding a
new principle, ask the user to confirm it should be captured.

Also pulls in a docstring fix in foundry_prompt_agents.py that swaps the
stray 'GA' for 'released', applying the new terminology rule.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* address PR review: strict=True default, Tool._deserialize dispatch, sample cleanup safety

- FunctionTool published as strict=True so the server-side schema validation
matches what the local FoundryAgent(tools=[same_callable]) dispatcher
enforces. AF FunctionTool has no 'strict' attribute, so the safer default
is used uniformly instead of silently downgrading to a permissive contract.
- _validate_mapping_tool now dispatches through ProjectsTool._deserialize so
dict-shaped tools rehydrate to the concrete subclass (FunctionTool,
WebSearchTool, ...) via the 'type' discriminator instead of returning a
generic Tool. Added a test that asserts isinstance(WebSearchTool) and a
new test for the function-typed dict path.
- foundry_prompt_agents.py sample now wraps credential + project client in
async with and the create_version / run flow in try/finally so a failure
on connect or run still deletes the published prompt agent rather than
leaving an orphaned, billable resource in the user's Foundry project.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(ci): correct linkspector ignorePattern typo (./pulls -> ./pull)

GitHub PR URLs use the singular segment /pull/N (compare to /issues/N
for issues). The existing './pulls' ignore pattern never matched
anything as a result, so legitimately stale PR links (e.g. PRs deleted
from forks) surface as linkspector failures on unrelated PRs.

This is the same convention the './issues' rule above already follows.
Fixes the markdown-link-check failure on a dangling link in
dotnet/src/Microsoft.Agents.AI.DurableTask/CHANGELOG.md.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Eduard van Valkenburg · 2026-05-27 13:31:21 +00:00

d5c07f2623

Python: Add a BackgroundAgentsProvider for python (#6069 )

* Add a BackgroundAgentsProvider for python

* Address PR comments and fix linting warnings

* Address PR comment

westey · 2026-05-27 09:12:01 +00:00

ae989b92e7

Python: Fix DevUI streaming memory growth regression (#6038 )

* Fix DevUI streaming memory growth regression

Bounds retained streaming/debug state in DevUI and strengthens browser regression coverage for long streamed responses.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address DevUI memory review feedback

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix DevUI bundle trailing whitespace

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Eduard van Valkenburg · 2026-05-27 07:48:29 +00:00

3242d8a4c4

Python: fix(openai): guard against null delta in streaming chunks from non-co… (#5734 )

* fix(openai): guard against null delta in streaming chunks from non-compliant providers (#5732)

* chore: resolve nit and align with project style

---------

Co-authored-by: Sergey Borisov <sergey.borisov@dataimpact.io>
Co-authored-by: Giles Odigwe <79032838+giles17@users.noreply.github.com>

S3rj · 2026-05-27 07:42:46 +00:00

e1e6e3d35e

Python: Add Python parity sample for invoking Foundry Toolbox tools from declarative workflows (#5933 )

* Add Python parity sample for invoking Foundry Toolbox tools from declarative workflows

* Python: address PR review on declarative toolbox sample

Two security fixes for PR #5933:

1. Add safe_mode flag to WorkflowFactory (default True) mirroring
   AgentFactory. Gates =Env.* exposure inside DeclarativeWorkflowState
   PowerFx symbols via _safe_mode_context, so workflow YAML loaded from
   untrusted sources no longer leaks the host's full os.environ snapshot
   into PowerFx evaluation. The flag is also forwarded to the
   internally-constructed AgentFactory so inline agent definitions
   follow the same policy.

2. Pin the invoke_foundry_toolbox_mcp sample's _client_provider to the
   resolved toolbox endpoint. The bearer-authenticated httpx client is
   now only returned when MCPToolInvocation.server_url matches the
   toolbox URL case-insensitively; any other URL gets None (the default
   unauthenticated path), preventing the Foundry AAD bearer token from
   being attached to a mis-configured or injected server URL. Mirrors
   the .NET sample's httpClientProvider guard.

The sample is updated to opt in to safe_mode=False because its YAML
intentionally uses =Env.FOUNDRY_TOOLBOX_* to keep configuration in env
vars under the developer's control.
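
The endpoint pinning in item 2 can be sketched as follows. This is a minimal illustration, not the sample's code: `select_client` and its parameters are hypothetical names, and the "client" is any object standing in for the bearer-authenticated httpx client.

```python
from typing import Optional


def select_client(server_url: str, toolbox_url: str, authenticated_client: object) -> Optional[object]:
    """Return the authenticated client only for the pinned toolbox endpoint.

    Any other URL gets None, so the caller falls back to its default
    unauthenticated path and the bearer token is never attached.
    """
    # Case-insensitive comparison, as described in the commit message.
    if server_url.casefold() == toolbox_url.casefold():
        return authenticated_client
    return None
```

A stricter variant might also normalise trailing slashes or default ports before comparing; that is outside what the commit message describes and is left out here.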

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix pyright issues.

* Addressed PR comments.

* Fix CI pipelines.

* Resolve PR comments

* Revamped sample to address PR comments.

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Peter Ibekwe · 2026-05-26 15:36:33 +00:00

200488cb08

Python: Align ModeProvider tool names and instructions (#6071 )

* Align ModeProvider tool names and instructions

* Address PR comments

westey · 2026-05-26 14:37:34 +00:00

bd4fc64b4d

Fix Foreach body exit wiring in declarative workflows (#6050 )

Peter Ibekwe · 2026-05-26 06:37:35 +00:00

b2e77067e9

fix: update sequential workflow sample output handling (#5976 )

Yufeng He · 2026-05-22 15:31:18 +00:00

6bc0dc5911

Python: fix Foundry handoff argument serialization (#5861 )

Yufeng He · 2026-05-22 15:30:55 +00:00

cf91819625

Python: fix(core): point @experimental warnings at user code, not stdlib internals (#5996 )

* fix(core): point @experimental warnings at user code, not stdlib internals

Previously the wrappers installed by @experimental called warnings.warn
with a fixed stacklevel=3. ABCMeta inserts an extra abc.__new__ frame
when an experimental ABC is subclassed, so the warning landed inside
abc.py (or <frozen abc>:106 on modern CPython) instead of the user's
class Sub(...) line.

Resolve the user frame by walking inspect.currentframe(), skipping
frames whose module name is abc/functools/typing/contextlib (or
submodules), then emit via warnings.warn_explicit so the recorded
filename/lineno point at user code. Falls back to warnings.warn with
stacklevel=2 if no user frame is found. Module-name matching is used
because frozen stdlib modules report '<frozen abc>' as their filename.

Also install a one-line warnings.formatwarning specifically for
FeatureStageWarning so 'file:line: ExperimentalWarning: [ID] Name ...'
prints without the secondary source-snippet line. Other categories
delegate to the stdlib default formatter unchanged.

Added a regression test that subclasses an @experimental ABC inside
warnings.catch_warnings and asserts the recorded filename equals the
test file.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(core): address review feedback on @experimental warning fix

- Make _install_feature_stage_formatter idempotent: tag the installed
  formatter with a marker attribute and short-circuit re-installation,
  so re-imports/reloads don't wrap the formatter on top of itself.
  Also expose the previous formatter via __wrapped__ for restoration.
- Avoid leaking frame references in _resolve_user_frame: capture data
  into plain locals inside try and del frame/candidate in finally,
  per CPython's guidance on inspect.currentframe usage.
- Drop redundant _WARNED_FEATURES.clear() in the new ABC subclass test
  (the autouse fixture already handles it).
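
The idempotent installation in the first bullet could look roughly like the sketch below. The marker attribute name, the `FeatureStageWarning` base class, and the exact one-line format are assumptions for illustration; only the marker-plus-`__wrapped__` idea comes from the commit message.

```python
import warnings

_MARKER = "_feature_stage_formatter"  # hypothetical marker attribute name


class FeatureStageWarning(Warning):
    """Stand-in for the framework's warning category (base class assumed)."""


def install_feature_stage_formatter() -> None:
    """Install a one-line formatter for FeatureStageWarning, at most once."""
    previous = warnings.formatwarning
    if getattr(previous, _MARKER, False):
        return  # already installed; do not wrap the formatter on top of itself

    def formatter(message, category, filename, lineno, line=None):
        if issubclass(category, FeatureStageWarning):
            # Single line, no secondary source-snippet line.
            return f"{filename}:{lineno}: {category.__name__}: {message}\n"
        # Other categories go through the previous formatter unchanged.
        return previous(message, category, filename, lineno, line)

    setattr(formatter, _MARKER, True)
    formatter.__wrapped__ = previous  # lets callers restore the prior formatter
    warnings.formatwarning = formatter
```

Restoring the earlier behaviour is then a matter of assigning `warnings.formatwarning.__wrapped__` back to `warnings.formatwarning`.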

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* changed query for foundry web search test

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Eduard van Valkenburg · 2026-05-22 12:07:10 +00:00

578416a379

Python: bump package versions for 1.6.0 release (#6017 )

* Python: bump package versions for 1.6.0 release

- Released cohort (agent-framework, core, openai, foundry): 1.5.0 -> 1.6.0
- Beta packages (21 packages): 1.0.0b260519 -> 1.0.0b260521
- Alpha packages (azure-contentunderstanding, foundry-hosting, gemini, monty): 1.0.0a260518/19 -> 1.0.0a260521
- ag-ui stays at 1.0.0rc2, orchestrations at 1.0.0rc1 (dependency bounds updated)
- Inter-package dependency lower bounds updated (>=1.5.0,<2 -> >=1.6.0,<2)
- Update CHANGELOG compare links
- uv.lock refreshed

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address review: bump RC packages, add shell tool to changelog

- ag-ui: 1.0.0rc2 -> 1.0.0rc3
- orchestrations: 1.0.0rc1 -> 1.0.0rc2
- Add shell tool (#5664) to CHANGELOG
- uv.lock refreshed

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Giles Odigwe · 2026-05-22 01:59:20 +00:00

950673ba47

Python: Shell tool with support for local and Docker (#5664 )

* feat(tools): add cross-OS LocalShellTool in new agent-framework-tools package

Introduces a safe, cross-OS local shell tool as the first citizen of a new
agent-framework-tools workspace package. Supports persistent (default) and
stateless modes across pwsh/powershell.exe/bash/sh, with policy denylist,
allowlist, approval gating, process-tree kill on timeout, output truncation,
and audit hooks. Integrates with existing provider get_shell_tool(func=...)
factories via FunctionTool kind='shell'.

See docs/decisions/0026-builtin-tools-local-shell.md for the full design.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat(tools): security hardening for LocalShellTool

Codifies what LocalShellTool does and does not defend against, and
delegates the security-relevant lifecycle primitive to a battle-tested
library instead of hand-rolled per-OS code.

Changes:

- Adopt psutil for cross-OS process-tree termination (executor + session).
  Replaces hand-rolled taskkill/killpg with one canonical implementation.
- Resolve taskkill.exe to absolute %SystemRoot%\System32 path so PATH
  poisoning cannot redirect us to an attacker-supplied binary.
- Reframe ShellPolicy docstring + ADR + README: denylist is a guardrail,
  not a security boundary.
- Require acknowledge_unsafe=True to set approval_mode='never_require',
  making the unsafe path explicitly opt-in with a self-documenting name.
- Add tests/test_security.py codifying named CVE-style cases. Defenses
  we DO claim are asserted; non-defenses (denylist bypasses via
  backslash insertion, variable expansion, interpreter escape, base64,
  alternative tools, PowerShell-native verbs) are documented as
  expected-to-pass tests so residual risk stays visible.
- Add Threat Model + Confidence Strategy sections to ADR 0026.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat(tools): add DockerShellTool sandboxed shell tier

Adds a container-backed shell executor as the recommended pattern for untrusted-input shell workflows. The container provides the security boundary (--network none, non-root user, --read-only, --cap-drop ALL, no-new-privileges, memory/pids limits, tmpfs /tmp), so approval gating is optional unlike LocalShellTool.

Also introduces a ShellExecutor Protocol so callers can plug in custom backends (Firecracker, SSH, WASI) without forking the framework.

Removes the planned HyperlightShellExecutor follow-up from ADR 0026: Hyperlight is a WASM code sandbox with no kernel/userland/shell binary, so a Hyperlight-backed shell is not viable. Docker is the realistic sandbox tier for shell.

Tests: 11 unit tests for argv builders + lifecycle (no Docker daemon required); 3 integration tests gated on is_docker_available().

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(tools): backport shell-tool fixes from .NET parity review

Applies the applicable subset of bug fixes accumulated during the
.NET shell-tool PR review (microsoft/agent-framework#5604) to the
Python shell tool.

A1 - Quote workdir safely in _maybe_reanchor

  Previously _tool.py used double-quote interpolation when emitting
  the cd/Set-Location prefix, which expanded $VAR, $(), and backticks
  in the workdir path. A workdir containing shell metacharacters could
  trigger arbitrary command execution before the user command ran.

  Replaced with single-quote escaping helpers _quote_posix and
  _quote_powershell that emit literal-string forms safe for both
  hosts.
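
For illustration, single-quote escaping of this kind is commonly written as below. This is a sketch of the general technique, not the contents of the `_quote_posix` / `_quote_powershell` helpers; the function names here are deliberately different.

```python
import shlex


def quote_posix(value: str) -> str:
    """Wrap value in single quotes for POSIX shells.

    Inside single quotes nothing is expanded, so an embedded single quote
    is written by closing the quote, emitting an escaped quote, and reopening.
    """
    return "'" + value.replace("'", "'\\''") + "'"


def quote_powershell(value: str) -> str:
    """Wrap value in single quotes for PowerShell, doubling embedded quotes."""
    return "'" + value.replace("'", "''") + "'"


# A workdir full of metacharacters survives a POSIX round trip unchanged.
workdir = "/tmp/$(id) `whoami` it's"
round_trip = shlex.split("cd " + quote_posix(workdir))
```

One caveat worth checking in a real implementation: PowerShell may also treat typographic quote characters as string delimiters, which this sketch does not handle.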

A5/A6 - Consolidate truncation to a single byte-aware helper

  Extracted a shared truncate_head_tail / truncate_text_head_tail
  helper in _truncate.py. The new implementation distributes odd
  caps so head receives floor(cap/2) and tail receives ceil(cap/2)
  bytes, matching the .NET round-9 fix and ensuring no input bytes
  are silently dropped on the boundary.

  _session.py previously truncated by Python str length while the
  caller passed _max_output_bytes - the unit mismatch is now gone:
  raw byte buffers go through truncate_head_tail and decoded text
  goes through truncate_text_head_tail.
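
The head/tail split for odd caps can be sketched as follows. The signature and return shape are assumptions, no truncation marker is inserted, and the `cap <= 0` check reflects the later round-2 feedback recorded further down in this commit.

```python
def truncate_head_tail(data: bytes, cap: int) -> tuple[bytes, bool]:
    """Keep at most cap bytes: floor(cap/2) from the head, the rest from the tail.

    Returns the (possibly shortened) bytes and a flag saying whether
    truncation happened.
    """
    if cap <= 0:
        raise ValueError("cap must be positive")
    if len(data) <= cap:
        return data, False
    head_len = cap // 2          # floor(cap / 2)
    tail_len = cap - head_len    # ceil(cap / 2), so head + tail == cap exactly
    return data[:head_len] + data[len(data) - tail_len:], True
```

Computing the tail as `cap - head_len` rather than a second integer division is what keeps the odd byte from being dropped.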

Unit tests added for the truncate and quote helpers.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs(tools): tone down narrative and overconfident comments in shell tool

The shell tool's docstrings and comments contained two patterns that
the .NET review pushed back on:

- Narrative framing about implementation history ("hard-won",
  "we sidestep", "design inspiration: ...", competitor framework
  name-drops in module docstrings).
- Overstated security guarantees ("battle-tested",
  "reasonable for untrusted input", "recommended executor for any
  agent that runs commands from untrusted input",
  "destructive commands are blocked", "safe local shell tool",
  "blocks shell injection").

Rewrites the affected docstrings and comments to describe what the
code does in neutral terms. Behaviour is unchanged.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat(tools): add ShellEnvironmentProvider for the Python shell tool

Ports the .NET ShellEnvironmentProvider as a Python ContextProvider
so agents using LocalShellTool or DockerShellTool can be primed with
an accurate description of the shell they're talking to (family,
version, OS, working directory, and which CLIs are available).

The provider runs probes through any ShellExecutor, caches the
resulting snapshot, and on every before_run extends the session
instructions with a markdown block describing the shell idiom to
use. A failed first probe leaves the cache empty so the next call
retries (no permanent poisoning).
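
The caching behaviour described above (concurrent first callers share one probe, and a failed probe leaves the cache empty) can be approximated with a lock-guarded cache. `ProbeCache` and its `probe` callable are hypothetical names for this sketch, not the provider's API.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional


class ProbeCache:
    """Cache the result of an async probe; failures are not cached."""

    def __init__(self, probe: Callable[[], Awaitable[Any]]) -> None:
        self._probe = probe
        self._lock = asyncio.Lock()
        self._snapshot: Optional[Any] = None

    async def get(self) -> Any:
        if self._snapshot is not None:
            return self._snapshot
        async with self._lock:
            # Re-check inside the lock: another caller may have filled the cache.
            if self._snapshot is None:
                # If the probe raises, _snapshot stays None and the next call retries.
                self._snapshot = await self._probe()
            return self._snapshot
```

Creating the cache inside a running event loop avoids loop-binding surprises with `asyncio.Lock` on older Python versions.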

Probe failures from a narrow set of expected error types
(ShellCommandError, ShellExecutionError, ShellTimeoutError, and
asyncio.TimeoutError from the per-probe timeout) are recorded as
None fields in the snapshot. Other exceptions propagate. Tool
names are validated against ^[A-Za-z0-9._-]+$ before being
interpolated into a probe command.
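
A minimal sketch of that tool-name check, using the pattern quoted above. The function name and error message are hypothetical. `fullmatch` is used because in Python `$` also matches just before a trailing newline, which `re.match` alone would let through.

```python
import re

# Pattern quoted in the commit message.
_TOOL_NAME_RE = re.compile(r"^[A-Za-z0-9._-]+$")


def validate_tool_name(name: str) -> str:
    """Return name unchanged if it is safe to interpolate, else raise ValueError."""
    if _TOOL_NAME_RE.fullmatch(name) is None:
        raise ValueError(f"invalid tool name: {name!r}")
    return name
```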

Includes 12 unit tests covering happy path, stderr fallback,
timeout handling, expected/unexpected exception paths, malicious
tool name rejection, case-insensitive deduplication, retry after
failure, concurrent first-callers sharing one probe, and the
default and custom formatter paths.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs(tools): document ShellEnvironmentProvider and finish comment cleanup

Add a README section introducing ShellEnvironmentProvider, soften two remaining overconfident security-boundary comments in _executor_base.py and the DockerShellTool class docstring, and add a sample (shell_with_environment_provider.py) that demonstrates the provider in stateless and persistent modes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* refactor(tools): move shell samples to python/samples/02-agents/tools

The repository convention is to host samples under python/samples/ rather than inside the package directory. Move the two net-new shell samples (allow-list and environment-provider) to python/samples/02-agents/tools/ and drop the in-package samples/ directory; the existing top-level providers/openai/client_with_local_shell.py already covers the basic LocalShellTool walkthrough.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* test(tools): cover confine_workdir default and ShellResult.format_for_model

Two new tests in test_local_shell_tool.py exercise the default confine_workdir=True behaviour on POSIX and PowerShell, asserting that 'cd' inside one persistent-mode call does not leak into the next. A new test_shell_result.py module provides direct unit coverage for every conditional branch of ShellResult.format_for_model (stdout, truncated, stderr, timed_out, exit_code) so regressions in the LLM-facing format are caught immediately.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(tools): address PR #5664 review feedback

- _tool.py: detect PowerShell via is_powershell() helper instead of basename string match

- _environment.py: use public ContextProvider import (no private _ prefix)

- _session.py: trim _stdout_buf/_stderr_buf after copying to avoid unbounded retention across calls

- _docker.py: short-circuit start()/close() in stateless mode; add configurable shell kwarg (default bash, e.g. 'sh' for alpine)

- tests: parenthesized multi-line assert; alpine integration tests now pass shell='sh'

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(tools): satisfy CI quality gates

- pyupgrade: drop quoted self-class refs in __aenter__/method annotations

- ruff format: reflow long lines per workspace style

- pyright: assert psutil non-None in optional-import branch; lowercase mutable module globals; annotate _approval_mode as Literal so tool() Literal-typed kwarg is accepted; add ... body to ShellExecutor.run protocol; remove unused deprecated _kill_tree wrapper

- tests: skip docker integration tests on win32 (Windows containers don't support --read-only / alpine images)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Remove DEFAULT_DENYLIST; document single-session ownership; fix bandit findings

Mirrors the .NET PR #5604 cleanup:

- Remove DEFAULT_DENYLIST from ShellPolicy. ShellPolicy() now ships with an empty deny-list; operators opt into site-specific patterns explicitly. No major agent framework uses regex matching as a primary security control; AutoGen v2 removed theirs. Approval gating + sandbox tier remain the real boundaries.

- Rewrite module / class docstrings to frame ShellPolicy as a UX pre-filter, not a security control.

- Add Single-session ownership paragraphs to ShellExecutor, ShellSession, LocalShellTool, and DockerShellTool: a persistent-mode tool is owned by exactly one conversation / agent session; do not share across users or concurrent conversations.

- Tests now supply explicit deny patterns instead of relying on a default.

- Address Pre-commit Hooks (bandit) CI failures: convert internal-invariant asserts to explicit RuntimeError, annotate intentional subprocess/shell usage with # nosec, document container-internal /tmp paths.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address PR #5664 round-2 review feedback

Deny-list documentation drift:

- README and the OpenAI/local-shell sample no longer claim a built-in deny-list of destructive commands. ShellPolicy is described as an optional, operator-supplied UX pre-filter; the real boundaries remain approval gating and the sandbox tier.

Behavioural fixes called out in review:

- ShellPolicy.evaluate() now denies empty / whitespace-only commands explicitly instead of returning allow with no rationale.

- truncate_head_tail() raises ValueError for cap <= 0 instead of silently returning the full input with truncated=False, which previously could defeat output-capping in callers that mis-configured the budget.

- LocalShellTool.as_function() / DockerShellTool.as_function() return the ShellCommandError text directly so the model sees a single, non-redundant 'Command rejected by policy: …' message instead of the prior duplicated 'Command blocked by policy: Command rejected …' wrapping.

- ShellSession POSIX sentinel trailer now snapshots and restores the prior errexit (set -e) state around the trailer, so a user 'set -e' in the persistent shell is no longer permanently disabled by the next run().

Tests:

- New test_shell_parse_rc.py covers the full _parse_rc() edge-case surface (zero, positive, negative, CRLF, no newline, missing prefix, empty input, non-digits, trailing garbage, partial digits).

- test_policy.py asserts the new empty-command deny.

- test_shell_truncate_and_quote.py asserts ValueError for cap=0 and cap<0.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address PR review feedback for shell tool

- _resolve.py: reject empty/whitespace shell override string
- _tool.py / _docker.py: mode-aware default tool description (persistent vs stateless)
- _tool.py: fix misleading workdir docstring (re-anchor, not blocking)
- _types.py: emit stream-agnostic [output truncated] marker
- _policy.py: declare _denies/_allows as dataclass fields
- _environment.py: use $(pwd) instead of $PWD in POSIX probe

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address PR review feedback: shell override flag + probe timeout safety

- _resolve.py: in stateless mode, ensure shell overrides end with -c/-Command so commands aren't misinterpreted as script-file paths.
- ShellExecutor.run / LocalShellTool.run / DockerShellTool.run now accept an optional timeout kwarg; ShellEnvironmentProvider drops the outer asyncio.wait_for and lets the executor enforce the probe timeout internally, so cancellation no longer risks leaving a hung subprocess or corrupted session.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address review feedback: docker isolation + lifecycle robustness

- pyproject.toml: bump agent-framework-core minimum from 1.2.0 to 1.2.2 to align with the rest of the workspace.
- _docker.py: validate extra_run_args at construction time and reject flags that would dismantle the isolation defaults (--privileged, --cap-add, --security-opt, --network/--net, -v/--volume/--mount, --device, --pid, --ipc, --userns, --user, --read-only, --tmpfs, --add-host, --gpus, --cgroupns, --device-cgroup-rule); the restriction is also documented as a warning in the docstring.
- _docker._stop_container: retry docker rm -f once and log a warning/error when it does not succeed, so operators can audit leaked containers instead of getting a silent success.
- _docker._run_stateless timeout path: fall back to docker rm -f when docker kill fails or times out (--rm only reaps on clean exit), and log instead of silently swallowing communicate() errors.
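
The construction-time check on extra_run_args might be sketched like this. The flag set is copied from the bullet above; the function name, the handling of `--flag=value` and attached `-v<value>` forms, and the error message are assumptions rather than the actual `_docker.py` logic.

```python
from typing import Iterable, List

# Flags listed in the commit message as dismantling the isolation defaults.
_FORBIDDEN_FLAGS = frozenset({
    "--privileged", "--cap-add", "--security-opt", "--network", "--net",
    "-v", "--volume", "--mount", "--device", "--pid", "--ipc", "--userns",
    "--user", "--read-only", "--tmpfs", "--add-host", "--gpus",
    "--cgroupns", "--device-cgroup-rule",
})


def validate_extra_run_args(args: Iterable[str]) -> List[str]:
    """Return args as a list, or raise ValueError on an isolation-weakening flag."""
    checked = list(args)
    for arg in checked:
        # Treat '--flag=value' the same as '--flag'.
        flag = arg.split("=", 1)[0]
        # Also catch the attached short form, e.g. '-v/host:/container'.
        attached_volume = arg.startswith("-v") and not arg.startswith("--")
        if flag in _FORBIDDEN_FLAGS or attached_volume:
            raise ValueError(f"extra_run_args may not contain {arg!r}")
    return checked
```

Rejecting at construction time means a misconfigured tool fails before any container is started, rather than on the first command.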

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: alliscode <bentho@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: alliscode <25218250+alliscode@users.noreply.github.com>

Ben Thomas · 2026-05-22 00:29:59 +00:00

8e54f0b0e7

Python: Prevent duplicate system instructions in Python telemetry (#5981 )

* Initial plan

* Fix duplicated system instructions in Python telemetry

* Clarify telemetry message filtering

* test: cover separate and in-history system messages

* Clarify observability message logging split

* Simplify observability logging serialization

* Harden observability regression test

* Reuse observability span message serialization

* Clarify observability logging loops

* Polish observability message serialization

* Tighten observability zip checks

* Refactor observability message capture loop

* Fix telemetry logging for separate system instructions

* Refine observability OTEL message typing

* Restore prepended-instruction logging path in _capture_messages

* Revert logging change in _capture_messages; keep chat-history-only logging

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>

Copilot · 2026-05-21 19:59:06 +00:00

c8b8198af1

1062 Commits