Commit Graph

880 Commits

  • Python: run sync tools off the event loop (#5773)
    * fix: run sync tools off event loop
    
    * chore: silence harness tool marker type check
  • Python: Add MCP-based skills discovery (McpSkillsSource) (#6169)
    * Add MCP-based skills discovery (McpSkill, McpSkillsSource, McpSkillResource)
    
    Implement Agent Skills discovery over MCP following the SEP-2640 convention:
    - McpSkillsSource: reads skill://index.json to discover skills served by an MCP server
    - McpSkill: lazily fetches SKILL.md content via resources/read on demand
    - McpSkillResource: wraps MCP resource results (text and binary)
    - Path traversal protection in get_resource for defense in depth
    - Samples for Foundry Toolbox and standalone MCP skills server
    - Comprehensive unit tests (514 lines)
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Address PR review comments: rename to MCP* convention, fix error handling and samples
    
    - Rename McpSkill/McpSkillResource/McpSkillsSource to MCPSkill/MCPSkillResource/MCPSkillsSource
    - Add data-URI prefix stripping for blob resource decoding
    - Let non-McpError exceptions propagate from get_resource()
    - Fix contradictory test comment
    - Use interactive input() in mcp_based_skill sample
    - Remove misleading sample output block
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Restore debug logging for McpError in get_resource()
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Use AzureCliCredential in Foundry toolbox skills sample for consistency
    
    Replace DefaultAzureCredential with AzureCliCredential to match the
    credential convention used in all other samples.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Use MCPStreamableHTTPTool in MCP skills sample
    
    Replace raw mcp library imports (ClientSession, streamable_http_client)
    with the framework's MCPStreamableHTTPTool to keep MCP server connections
    consistent regardless of whether skills are enabled.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Branch on McpError.error.code so only not-found errors return empty
    
    Previously _try_read_index() and get_resource() swallowed every McpError
    as 'no skills available', making auth failures, server crashes, and
    connection drops indistinguishable from a server that simply has no
    skills.
    
    Now only two codes are treated as not-found:
    - -32002 (MCP-spec Resource not found)
    - -32601 (METHOD_NOT_FOUND — server lacks resources/read)
    
    All other McpError codes and non-McpError exceptions propagate with a
    warning log, surfacing real failures visibly.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Add tests for non-McpError and non-not-found error propagation in MCP skills
    
    Cover the re-raise branch in MCPSkill.get_resource for plain
    ConnectionError/TimeoutError, the generic McpError (code 0) propagation
    on get_resource, and TimeoutError propagation in _try_read_index.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Revert "Use MCPStreamableHTTPTool in MCP skills sample"
    
    This reverts commit f31ed0ded9.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Introduce MCP_SKILLS experimental feature for MCP skill classes
    
    Add a separate MCP_SKILLS feature ID to ExperimentalFeature enum and
    use it for MCPSkillResource, MCPSkill, and MCPSkillsSource, since their
    promotion timeline is partly outside of our control.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • Python: progressive tool exposure via FunctionInvocationContext (#6233)
    * Python: progressive tool exposure via FunctionInvocationContext
    
    Add first-class progressive tool exposure to the Python core function-calling
    loop. Tools can now add or remove real FunctionTool schemas at runtime via the
    injected FunctionInvocationContext, taking effect on the next iteration of the
    loop.
    
    - FunctionInvocationContext gains a live `tools` list plus experimental
      `add_tools()` / `remove_tools()` helpers (feature: PROGRESSIVE_TOOLS).
    - The function-calling loop establishes a run-local, normalized tools list and
      threads it into the context at both invocation paths so mutations propagate.
    - Add a sample (dynamic_tool_exposure.py) and a tools samples README, including
      a note that CodeAct providers (Monty/Hyperlight) use their own provider-level
      tool management instead.
    
    Supersedes #3877.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Validate non-negative input in dynamic_tool_exposure sample tools
    
    Address review feedback: factorial and fibonacci now return an error
    message for negative n instead of producing incorrect results.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Make add_tools atomic and surface swallowed function errors
    
    Address review feedback on progressive tool exposure:
    
    - add_tools now validates the full batch against a throwaway copy before
      committing, so a duplicate-name clash partway through a sequence leaves
      the live tool list unchanged (all-or-nothing).
    - _auto_invoke_function now logs a warning (with traceback) when a tool
      raises, so contract errors such as a duplicate-name ValueError from
      add_tools are debuggable without enabling include_detailed_errors.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Avoid retaining tracebacks when logging swallowed function errors
    
    Logging with exc_info=exc fed the exception traceback to the logging
    machinery, whose frame references created reference cycles collected
    lazily by the cyclic GC. On Windows that could drop a hyperlight
    WasmSandbox on a non-owning thread ("unsendable, dropped on another
    thread"), crashing the xdist worker. Log a pre-formatted message with
    the exception repr instead, so no traceback object is retained.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * added missing decorator
    
    ---------
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • Python: Promote agent-framework-declarative package to RC (#6256)
    * Promote agent-framework-declarative package to RC
    
    * Update missed package status file.
  • Python: Fix FoundryAgent stripping model from PromptAgent requests (#5526)
    * Fix FoundryAgent stripping model from PromptAgent requests
    
    Move run_options.pop('model', None) inside the _uses_foundry_agent_session()
    conditional so that model is only stripped for hosted agent sessions (where
    the server manages the model) and preserved for PromptAgent requests that
    require it in the Responses API call.
    
    Fixes #5525
    
    * test: add coverage for resp_* continuation preserving model
    
    Adds test_raw_foundry_agent_chat_client_prepare_options_preserves_model_for_resp_continuation
    to explicitly verify that HostedAgent v1 / v2-no-session paths (where conversation_id
    starts with resp_) preserve model and previous_response_id without triggering the
    hosted-session gate.
    
    ---------
    
    Co-authored-by: Benke Qu <bequ@microsoft.com>
    Co-authored-by: Evan Mattson <35585003+moonbox3@users.noreply.github.com>
  • Python: Fix OTLP HTTP base-endpoint losing /v1/{signal} auto-append (#5913)
    * Python: Fix OTLP HTTP base-endpoint losing /v1/{signal} auto-append
    
    Per the OTel spec, OTEL_EXPORTER_OTLP_ENDPOINT is a *base* URL for HTTP —
    the SDK auto-appends /v1/traces, /v1/metrics, /v1/logs when it reads the
    env var directly. Signal-specific endpoint env vars are *full* URLs used
    verbatim.
    
    _get_exporters_from_env read the base endpoint and forwarded it as the
    constructor ``endpoint=`` argument, which the SDK always treats as a full
    signal URL. As a result, with OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
    and HTTP protocol, the exporter sent to http://localhost:4318 instead of
    http://localhost:4318/v1/traces (and likewise for metrics/logs).
    
    Replicate the spec's auto-append here when falling back to the base
    endpoint under HTTP. gRPC behavior is unchanged.
    
    * Python: Fix mypy type errors in OTLP endpoint assignment
    
    Pre-declare traces_endpoint, metrics_endpoint, logs_endpoint as
    str | None before the if/else block. Mypy inferred str from the
    if-branch f-string assignments and then rejected the str | None
    expressions in the else-branch as incompatible.
  • Python: Persist hosted MCP call/results as canonical mcp_call output (#6070)
    * Persist hosted MCP call/results as canonical mcp_call output
    
    - Preserve hosted MCP call/result pairs as canonical mcp_call output items
    
    - Coalesce MCP call + result in non-streaming conversion path
    
    - Keep call-id alignment for MCP tool call tracking and output mapping
    
    - Update tests and package metadata
    
    * Fix missing Mapping import in hosted responses adapter
    
    * Fix pyright unknown type in MCP output stringification
    
    * Fix typing for MCP output sequence iteration
    
    * Improve MCP output robustness and avoid eager flattening
    
    * Bump foundry_hosting to b7 and update responses dependency to b7
    
    * Restore foundry_hosting package version to 1.0.0a260521
    
    * Refactor hosted MCP output parsing
  • Python: feat(bedrock): implement native structured output support via Converse API (#6052)
    * feat(bedrock): add structured output support via Converse API (Fixes #5966)
    
    * fix(bedrock): improve unsupported model exception handling and schema parsing
    
    * refactor(bedrock): use generic traversal for strict schema enforcement
    
    * address Copilot review comments on structured output
    
    * refine bedrock structured output: guard additionalProperties, TypeError check, docs + test
    
    * fix(bedrock): widen response_format to Mapping and add missing test coverage
  • Python: feat(evals): Foundry Adaptive Evals integration (rubric-generation) (#6101)
    * Python: feat(evals): RubricScore type + EvalScoreResult.dimensions
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Python: feat(foundry-evals): RubricDimension + GeneratedEvaluatorRef + accept in evaluators=
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Python: feat(evals): parse rubric_scores from output items + assertion helpers
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Python: feat(evals): BaseAgent.as_eval_source / Workflow.as_eval_source
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Python: feat(foundry-evals): EvalGenerationSource + generate_rubric helper
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Python: feat(foundry-evals): YAML config loader + sample
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Python: fix(evals): address PR review feedback
    
    Addresses 4 Copilot review comments on PR #6101:
    
    1. assert_dimension_score_at_least: drop the (not evaluator or found_any) guard so require_applicable=True correctly raises when the named evaluator produces no entries for the dimension. Adds TestRubricAssertions covering the regression.
    
    2. GeneratedEvaluatorRef docstring: reword to describe actual behaviour (pinning recommended, not required) so it matches the dataclass default and FoundryEvals warning path.
    
    3. _poll_generation_job: switch from asyncio.get_event_loop() to get_running_loop() and bound the per-iteration sleep by remaining time, matching _poll_eval_run.
    
    4. generate_rubric: type category as Literal['quality','safety'] and validate at the entry point with a ValueError; drop the silent 'invalid -> quality' rewrite in _generation_job_to_ref. Adds a regression test.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Python: feat(foundry-evals): hosted-agent-aware rubric generation
    
    * Auto-detect hosted Foundry agents in agent_as_eval_source: when the
      agent's chat_client exposes a string agent_name (the convention used
      by RawFoundryAgentChatClient for PromptAgents/HostedAgents), emit a
      type='agent' EvalGenerationSource so the service fetches instructions
      and tools from the agent registry instead of relying on the local
      wrapper (which holds neither for hosted agents).
    * Add hosted_agent_version kwarg and a new agent_version field on
      EvalGenerationSource so PromptAgent runs can pin to a specific hosted
      version for reproducible rubric generation.
    * Add force_prompt_source escape hatch to bypass auto-detection and
      always emit a rendered prompt dossier - useful when the local wrapper
      carries overrides the service-side agent doesnt see.
    * Fix _to_sdk_source for dataset sources: SDK ctor takes name=/version=,
      not dataset_name=/dataset_version=. The mismatch would raise TypeError
      against the real azure-ai-projects 2.3.0a* SDK; only unmocked
      integration paths were affected.
    
    Tests cover: auto-detection happy path, versionless hosted agent,
    explicit hosted_agent_version forwarding, force_prompt_source override,
    non-string chat_client attrs (MagicMock test doubles) not mis-detected,
    agent_version forwarded through _to_sdk_source, and the corrected
    dataset SDK kwarg names.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * fix(foundry-evals): accept canonical dimension_scores key per docs
    
    The published Foundry rubric-evaluator output (Microsoft Learn 'Rubric evaluators' reference) places per-dimension breakdowns under properties.dimension_scores, not properties.rubric_scores. The parser now tries dimension_scores first and falls back to rubric_scores for preview-build compatibility, and tolerates non-list payloads (e.g. MagicMock auto-attrs) by trying the next candidate when parsing yields zero entries.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * feat(foundry-evals): add manual create_rubric_evaluator
    
    Adds FoundryEvals.create_rubric_evaluator as the agent-framework surface over project_client.beta.evaluators.create_version. This is the manual counterpart to generate_rubric: callers supply RubricDimension instances (authored locally, ported from another framework, or hand-tuned) and we POST a RubricBasedEvaluatorDefinition. The service auto-attaches the non-editable residual dimension (general_quality for quality, general_policy_compliance for safety).
    
    Per the Microsoft Learn 'Rubric evaluators' reference, the auto-generation path (create_generation_job) is primarily a portal/UI feature; external SDK clients with rich local agent context are better served by manual create_version. This keeps generate_rubric for users who want to round-trip through a Foundry-registered agent.
    
    Validation up front: weight must be in [1,10], ids unique, descriptions non-empty, pass_threshold in [0,1]. The returned GeneratedEvaluatorRef is identical in shape to one obtained from generate_rubric, so downstream evaluators= lists work unchanged.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * samples(foundry-evals): manual rubric sample + namespace re-exports
    
    Adds evaluate_with_manual_rubric_sample.py demonstrating the end-to-end dev scenario for FoundryEvals.create_rubric_evaluator: hand-author a list of RubricDimension, register via create_rubric_evaluator, then use the pinned GeneratedEvaluatorRef alongside built-in evaluators in an agent regression run.
    
    Also re-exports RubricDimension, GeneratedEvaluatorRef, build_sources, and load_evals_config from agent_framework.foundry (both the lazy runtime shim and the type stub) so the rubric samples can import everything from a single namespace; the auto-generate sample was previously broken because the shim was missing build_sources / load_evals_config.
    
    Updates the foundry-evals README with a chooser entry for the two rubric paths.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * feat(foundry-evals): remove rubric creation flows; keep consumption only
    
    Reframes agent-framework as a pure consumer of Foundry rubric evaluators: scoring against rubrics that already exist (authored in the Foundry portal or via the dedicated SDK / REST surface) instead of creating them from the SDK.
    
    Removed creation surface area:
    
    - FoundryEvals.generate_rubric (auto-generate path) and create_rubric_evaluator (manual path), plus all _GenerationSdkTypes / _ManualRubricSdkTypes / _to_sdk_dimensions / _coalesce_generation_sources / _to_sdk_source / _poll_generation_job / _generation_job_to_ref / _evaluator_version_to_ref / _get_beta_evaluators / _import_*_sdk_types helpers.
    
    - EvalGenerationSource (the input source discriminator), RubricDimension (the input dimension type), agent_as_eval_source / workflow_as_eval_source / _detect_hosted_foundry_agent helpers, and the YAML-config loader (_evals_config.py with RubricGenerationSpec / RubricSourceSpec / parse_evals_config / load_evals_config / build_sources).
    
    - BaseAgent.as_eval_source / Workflow.as_eval_source plus the _render_agent_dossier / _render_workflow_dossier helpers in core. These existed only to feed the now-removed generation pipeline.
    
    - Samples evaluate_with_generated_rubric_sample.py, evaluate_with_manual_rubric_sample.py, and evaluators.yaml. Replaced with a short README section showing how to reference an existing rubric evaluator via GeneratedEvaluatorRef.
    
    Kept (consumption surface):
    
    - GeneratedEvaluatorRef, slimmed to (name, version, display_name). Still accepted alongside built-in evaluator strings in FoundryEvals(evaluators=[...]). Versionless refs still warn.
    
    - RubricScore on EvalScoreResult.dimensions plus EvalResults.assert_dimension_score_at_least for per-dimension CI gates.
    
    - _parse_dimension_entries / _extract_rubric_scores output parsing (both canonical dimension_scores and the legacy rubric_scores key).
    
    Tests: 160/160 foundry unit tests and 71/71 core local-eval tests pass; pyright is clean across changed files. The pre-existing tests/core/test_telemetry.py::test_detect_hosted_fallback_import_error failure is unrelated and reproduces on the prior commit.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * samples(foundry-evals): add evaluate_with_rubric_sample
    
    Adds a runnable end-to-end sample showing how to consume a pre-existing rubric evaluator created in Foundry: reference it with GeneratedEvaluatorRef(name, version), mix it with built-in evaluators in FoundryEvals, and gate CI with assert_dimension_score_at_least on a specific dimension.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * fix(foundry-evals): satisfy mypy on _fetch_output_items
    
    mypy infers OutputItemListResponse.sample as dict[str, object] | None while pyright correctly infers the typed Sample model. Cast to Any so both type checkers accept the attribute access pattern, rename the local to avoid shadowing the inner-loop sample binding, and drop the now-stale pyright suppressions.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * docs(foundry-evals): drop unpublished rubric-evaluators learn.microsoft.com link
    
    The Adaptive Evals authoring docs are not yet published on Microsoft Learn, so the link 404s. Keep the descriptive text without the broken hyperlink; we can re-add it once the docs ship.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * test(foundry-evals): hoist repeated local imports to module top
    
    Per code review feedback (eavanvalkenburg): the test file repeated 'from agent_framework_foundry._foundry_evals import ...' inside 22 test bodies and 'from agent_framework_foundry import GeneratedEvaluatorRef' inside 8 more. Move all of them to the existing top-level imports; the symbols are the same across tests and the local imports were redundant.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: Ben Thomas <25218250+alliscode@users.noreply.github.com>
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • Python: Fix core observability unsafe serialization of function-call arguments containing dataclass/framework objects (#6026)
    * fix: safely serialize function-call arguments in core observability
    
    Apply make_json_safe() to content.arguments in _to_otel_part() before
    building the otel message dict, so that dataclass/framework payloads
    (e.g. workflow request_info events) do not cause a TypeError when
    _capture_messages() calls json.dumps().
    
    Lift make_json_safe() into agent_framework._serialization (no new
    external deps — dataclasses/datetime only) so the core observability
    path can use it without a dependency on the ag-ui adapter.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * fix(core): safely serialize workflow request_info payloads in observability (#5733)
    
    - Add make_json_safe() helper to recursively convert non-serializable objects
    - Use make_json_safe() in _to_otel_part() for function_call arguments
    - Fix CustomPayload test class to use @dataclass (resolves B903 lint error)
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * fix(serialization): guard callability and normalize dict keys in make_json_safe (#5733)
    
    - Use callable(getattr(obj, method, None)) instead of hasattr() so that
      non-callable attributes named model_dump/to_dict/dict do not raise
      TypeError at runtime.
    - Wrap each call in try/except TypeError to handle callables with
      mandatory arguments gracefully.
    - Convert dict keys to str() so that non-string keys (e.g. datetime,
      int) cannot cause json.dumps to raise TypeError.
    - Add regression tests for both scenarios.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Address observability serialization review feedback
    
    ---------
    
    Co-authored-by: Copilot <copilot@github.com>
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • Python: refresh dev dependencies and validate runtime bounds (#6238)
    Updates third-party dev dependencies across the Python workspace and
    validates that all runtime dependency bounds still hold at both ends.
    
    Dev dependency bumps (root, lab, declarative, durabletask):
    - uv 0.11.6 -> 0.11.17, ruff 0.15.8 -> 0.15.15,
      pytest-asyncio 1.3.0 -> 1.4.0, mcp 1.27.0 -> 1.27.2,
      azure-monitor-opentelemetry 1.8.7 -> 1.8.8,
      poethepoet 0.42.1 -> 0.46.0, prek 0.3.9 -> 0.4.3,
      types-python-dateutil and types-PyYaml stub bumps.
    - Transitive Dependabot items swept via lock: idna 3.11 -> 3.17,
      pip 26.0.1 -> 26.1.2.
    
    Deliberately excluded:
    - opentelemetry-sdk stays 1.40.0: azure-monitor-opentelemetry (incl.
      1.8.8) hard-pins opentelemetry-sdk==1.40.
    - mypy stays 1.20.0 and pyright stays 1.1.408: the 2.1.0 / 1.1.409
      bumps introduce new diagnostics that fail type checking and need
      dedicated PRs.
    - rich kept as a range: agentlightning (lab[lightning]) forces
      rich==13.9.4.
    
    Code/formatting changes driven by the ruff upgrade:
    - devui lifespan now uses try/finally so shutdown cleanup always runs
      (ruff RUF075).
    - Removed unused TYPE_CHECKING imports in core and foundry flagged by
      ruff 0.15.15.
    - Reapplied ruff 0.15.15 formatting to the files it changed.
    
    Validation: validate-dependency-bounds-test "*" passes (31/31 lower +
    31/31 upper); typing 62/62; lint 31/31; devui tests pass.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • Python: Add background agent support to harness agent (#6155)
    * Add background agent support to harness agent
    
    * Address PR comments
  • Python: coalesce code interpreter history chunks (#5801)
    * fix: coalesce code interpreter history chunks
    
    * fix: narrow content item list types
    
    * fix: remove redundant content list casts
  • Fix integration test worker crashes in Azure Functions on Py3.13 (#4260)
    * Initial plan
    
    * Fix integration test worker crashes on Python 3.13
    
    Three changes to prevent pytest-xdist workers from crashing during
    Azure Functions integration tests:
    
    1. Add `start_new_session=True` to subprocess on Linux so signals
       (e.g. from test-timeout) cannot propagate between the func host
       and the xdist worker process.
    
    2. Add an overall 100-second budget to the fixture setup loop so
       the retry logic never exceeds the 120-second test timeout. When
       pytest-timeout's thread method fires during fixture setup and the
       thread doesn't respond, it calls os._exit() which kills the
       xdist worker – this is the root cause of the "Not properly
       terminated" crashes.
    
    3. Remove the `UV_PYTHON: "3.10"` workaround from both workflow
       files so integration tests actually run on Python 3.13.
    
    Co-authored-by: larohra <41490930+larohra@users.noreply.github.com>
    
    * Validate integration tests on Python 3.13
    
    Co-authored-by: larohra <41490930+larohra@users.noreply.github.com>
    
    * Revert unintentional uv.lock dependency bumps
    
    Co-authored-by: larohra <41490930+larohra@users.noreply.github.com>
    
    * Use time.monotonic() instead of time.time() for fixture budget timing
    
    Addresses review feedback: monotonic clock is immune to NTP/clock
    adjustments that could skew the budget enforcement.
    
    Co-authored-by: larohra <41490930+larohra@users.noreply.github.com>
    
    * Fix func worker segfault on Python 3.13 by redirecting worker to Python 3.12
    
    The Azure Functions Python worker crashes with SIGSEGV (exit code 139)
    on Python 3.13 due to protobuf C extension (google._upb) compatibility
    issues.  When the test runner uses Python >=3.13, the conftest now
    automatically finds a compatible Python 3.10-3.12 and sets
    languageWorkers__python__defaultExecutablePath so the func host uses
    it for the worker process.
    
    The CI setup action also ensures Python 3.12 is available on the
    runner, falling back to uv python install if the system doesn't have
    it.
    
    Co-authored-by: larohra <41490930+larohra@users.noreply.github.com>
    
    * Address code review: add path validation, clarify version range and config key format
    
    Co-authored-by: larohra <41490930+larohra@users.noreply.github.com>
    
    * Run func worker natively on Python 3.13 by disabling dependency isolation
    
    Replace the Python 3.12 redirect workaround with the proper fix:
    set PYTHON_ISOLATE_WORKER_DEPENDENCIES=0 on Python >=3.13.
    
    The segfault (exit code 139) is caused by the Azure Functions worker's
    module isolation mechanism conflicting with protobuf's C extensions
    (google._upb) on Python 3.13.  Disabling isolation lets the worker
    load dependencies from the app's own environment, which avoids the
    crash while keeping everything running on Python 3.13.
    
    See: https://github.com/Azure/azure-functions-python-worker/issues/1797
    
    Co-authored-by: larohra <41490930+larohra@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
    Co-authored-by: larohra <41490930+larohra@users.noreply.github.com>
    Co-authored-by: Laveesh Rohra <larohra@microsoft.com>
  • Python: Reorganize A2A samples and use package A2AExecutor (#6165)
    * Reorganize A2A samples: client demos in 02-agents, use package A2AExecutor
    
    - Move client samples (agent_with_a2a, a2a_agent_as_function_tools) to samples/02-agents/a2a/
    - Add new concept samples: polling, stream reconnection, protocol selection
    - Replace sample agent_executor.py with package-level A2AExecutor (stream=True)
    - Update 04-hosting/a2a to focus on server-side, point to 02-agents for clients
    - Add README.md for the new 02-agents/a2a/ sample collection
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Fix streaming artifact coalescing and address PR review feedback
    
    A2AExecutor fix:
    - Generate a stable artifact_id per stream in _run_stream so all streaming
      chunks share the same ID, enabling proper append=True coalescing per the
      A2A spec (TaskArtifactUpdateEvent with same artifactId).
    - Previously, item.message_id was None for OpenAI/Foundry streaming updates,
      causing the SDK to generate a new random UUID per token (100+ separate
      artifacts instead of 1 appended artifact).
    
    Sample improvements:
    - Replace join workaround with response.text now that coalescing works
    - Add background=True to stream reconnection resume call (required for
      continuation token emission on in-progress tasks)
    - Fix type ignore specificity in polling sample
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • Python: [A2A] Set message_id on AgentResponseUpdate for message-bearing paths (#6163)
    Map A2A protocol message_id to AgentResponseUpdate.message_id in two paths
    where it was previously omitted, aligning with .NET behavior:
    
    1. Standalone A2AMessage: set message_id=msg.message_id (matches .NET
       ConvertToAgentResponseUpdate(Message) which sets both ResponseId and
       MessageId to message.MessageId)
    
    2. TaskStatusUpdateEvent (terminal/input_required): set
       message_id=message.message_id (matches .NET which sets
       MessageId=statusUpdateEvent.Status.Message?.MessageId)
    
    Fixes #5949
    
    Co-authored-by: Copilot <copilot@github.com>
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • Python: consolidate MCP reliability fixes (#6145)
    * Python: consolidate MCP reliability fixes
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Fix MCP cleanup and metadata typing
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Satisfy MCP metadata mypy typing
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Fix Pyright metadata mapping type
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • Python: Add Mistral AI embedding client package (#5480)
    * Python: Add Mistral AI embedding client package
    
    Signed-off-by: Daria Korenieva <daric2612@gmail.com>
    
    * Address review feedback: fix dimensions check, sort embeddings by index, align docs
    
    Signed-off-by: Daria Korenieva <daric2612@gmail.com>
    
    * Address review feedback: downgrade to alpha, remove integration tests - Change version to 1.0.0a260505 (alpha) - Update classifier to Development Status :: 3 - Alpha - Update PACKAGE_STATUS.md to alpha - Remove Mistral from integration test workflows (no API keys yet)
    
    Signed-off-by: Daria Korenieva <daric2612@gmail.com>
    
    * Add samples directory for alpha package compliance Per python-package-management skill: alpha packages must include samples inside the package directory.
    
    Signed-off-by: Daria Korenieva <daric2612@gmail.com>
    
    * Fix ruff formatting in sample file
    
    Signed-off-by: Daria Korenieva <daric2612@gmail.com>
    
    ---------
    
    Signed-off-by: Daria Korenieva <daric2612@gmail.com>
  • Python: Adding AgentFileStore and FileAccessProvider to support file access operations. (#6099)
    * Adding AgentFileStore and FileAccessProvider to support file ased operations for agents.
    
    * Address PR review feedback on FileAccessProvider
    
    - Probe symlinks on the unresolved candidate path so in-root symlinks
      cannot silently pass and out-of-root symlinks surface the correct
      error message.
    - Validate matching_lines elements in FileSearchResult.from_dict and
      raise a clean ValueError for non-mapping entries.
    - Cap search regex pattern length (256 chars) via a new
      _compile_search_regex helper to mitigate ReDoS, and surface the cap
      in the file_access_search_files tool description.
    - Skip non-UTF-8 files during filesystem search instead of aborting
      the entire directory walk.
    - Replace the module-scope trailing string in the data-processing
      sample with comments to avoid Ruff B018.
    - Remove the checked-in working/region_totals.md sample artifact so
      the save flow works from a clean checkout.
    - Expand the Windows stdout reconfiguration comment in task_runner.py
      for clarity.
    - Add tests for invalid/oversize regex, non-UTF-8 file search, and
      in-root symlink rejection.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Fix mypy redundant-cast in FileSearchResult.from_dict
    
    Use cast(list[object], ...) instead of cast(list[Any], ...) so the
    cast represents a real type change (lists are invariant) and is no
    longer flagged by mypy as redundant, while still satisfying pyright's
    reportUnknownVariableType. Matches the existing pattern in _memory.py.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Tighten path normalization and directory resolution in FileAccess
    
    - _normalize_relative_path now strips surrounding whitespace up front
      so leading/trailing spaces never leak into file segments, and
      rejects trailing path separators for file paths so 'foo/' is no
      longer silently coerced to 'foo'.
    - FileSystemAgentFileStore._resolve_safe_directory_path normalizes
      with is_directory=True and maps an empty normalized result to the
      root. This matches InMemoryAgentFileStore so whitespace-only
      directory inputs resolve to the root instead of raising.
    - Added tests for whitespace stripping, trailing-separator rejection,
      and whitespace-only directory listing on the filesystem store.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Harden FileAccess search and atomic save in store API
    
    - Add wall-clock timeout (10s) around regex scans so a pathological pattern (e.g. `(a+)+`) below the length cap cannot stall the event loop.
    - Offload the InMemoryAgentFileStore regex scan to a worker thread, matching the filesystem store.
    - Fail closed when `Path.is_symlink` raises during the safe-path probe so a permission error cannot silently bypass the symlink/reparse-point rejection.
    - Add `overwrite: bool = True` to `AgentFileStore.write_file`; the in-memory store performs the check under the existing lock and the filesystem store uses `open(mode='x')` so concurrent callers cannot race past `overwrite=False`.
    - `file_access_save_file` now relies on the atomic store call instead of a separate `file_exists` round-trip.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Fix Python 3.10 timeout handling and add directory arg to list/search tools
    
    - Catch asyncio.TimeoutError in _run_search_with_timeout. In Python 3.10
      asyncio.wait_for raises asyncio.exceptions.TimeoutError, which is
      distinct from the builtin TimeoutError (the two were unified in 3.11).
      Catching the asyncio alias works on every supported version.
    - Add an optional directory parameter to file_access_list_files and
      file_access_search_files so agents can enumerate / scope searches to
      nested folders, not just the store root.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Address FileAccess review feedback: case, errors, signal, TOCTOU
    
    - InMemoryAgentFileStore now stores (display_name, content) so list_files
      and search_files return the original-case names callers wrote, matching
      the behaviour of FileSystemAgentFileStore on case-preserving filesystems
      and removing the silent in-memory vs. on-disk contract divergence.
    - FileSystemAgentFileStore.read_file raises ValueError instead of letting
      UnicodeDecodeError bubble for binary / non-UTF-8 input, restoring
      symmetry with search_files (which still skips) and giving the tool
      layer a recoverable type to translate.
    - Tool wrappers now catch ValueError and OSError around every operation
      and surface them as readable strings, so 'you used ..' and 'the file
      already exists' are both reported to the model the same way instead of
      the former crashing out as an unhandled exception.
    - _search_files_sync logs per skipped non-UTF-8 file at WARNING and an
      aggregate INFO summary so operators can distinguish 'no matches' from
      'half the corpus was unreadable'.
    - FileSystemAgentFileStore softens its docstrings to acknowledge the
      inherent probe-then-open TOCTOU window. On POSIX both read and write
      now pass O_NOFOLLOW so the kernel refuses if the leaf segment becomes
      a symlink between the probe and the open. Windows has no equivalent
      flag; the limitation is documented.
    - Tests cover: case preservation on list/search, ValueError on non-UTF-8
      read at the store and tool layer, tool-layer string responses for
      path-traversal and oversized-regex inputs, search-skip log output,
      symlink rejection on delete/search/list, and symlinked intermediate
      directory rejection.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Address FileAccess nit comments: docstrings, enumerate, opt-in delete approval
    
    - Expand FileSearchMatch/FileSearchResult.to_dict docstrings to explain why
      the override is needed (__slots__ defeats the mixin's __dict__ iteration)
      and why exclude/exclude_none are accepted-but-ignored (mixin signature
      compatibility for callers like to_json).
    - Use enumerate(lines, start=1) in _search_file_content so the +1 below is
      no longer needed; rename loop variable to line_number for clarity.
    - Add opt-in require_delete_approval: bool = False on FileAccessProvider.
      When True, file_access_delete_file is registered with approval_mode
      'always_require' so the host must approve every delete. Default False
      preserves current behaviour and matches the .NET reference, but
      deployments that want a safer-by-default posture can enable it.
    - Add tests covering both delete approval modes.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * FileAccess: require delete approval by default
    
    Flip the default for FileAccessProvider(require_delete_approval=...) from
    False to True so destructive deletes are gated by host approval out of the
    box. Callers that want the previous autonomous behaviour (which matches the
    .NET reference) can pass require_delete_approval=False.
    
    Tests updated accordingly.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Fixing linkinspector by installing Chrome for puppeteer first.
    
    ---------
    
    Co-authored-by: Ben Thomas <25218250+alliscode@users.noreply.github.com>
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • Python A2A: Expose supported_protocol_bindings as configurable parameter (#6098)
    * Expose supported_protocol_bindings as configurable parameter on A2AAgent
    
    Add supported_protocol_bindings parameter to A2AAgent.__init__() allowing
    users to configure which A2A protocol bindings (JSONRPC, GRPC, HTTP+JSON)
    the client prefers when connecting to remote agents.
    
    - Defaults to ["JSONRPC"] matching current behavior
    - Passes through to ClientConfig for transport negotiation
    - Replaces 4 hardcoded references with the configurable value
    
    Closes #6057
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Fix empty list falsy trap and add fallback path test coverage
    
    - Use 'is not None' check instead of 'or' to preserve explicit empty list
    - Add test verifying empty list is not silently replaced with defaults
    - Add test verifying fallback path uses custom bindings
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Document known protocol binding values in docstring
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Use Literal union for protocol binding type hint
    
    Provides IDE autocomplete for known values while keeping the type
    open for custom bindings (Literal is str at runtime).
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • Python: [Breaking] Refactor Skill API to async resource and script lookup (#6135)
    Port of .NET commit 08541ee5a9.
    
    Replace property-based Skill.content/resources/scripts with async
    by-name lookup methods:
    - content property -> async get_content() -> str
    - resources property -> async get_resource(name) -> SkillResource | None
    - scripts property -> async get_script(name) -> SkillScript | None
    
    SkillsProvider now always includes all three tools (load_skill,
    read_skill_resource, run_skill_script) and both instruction blocks
    regardless of whether any skills have resources or scripts.
    
    ClassSkill retains resources/scripts properties as overridable hooks
    for subclass reflection-based discovery.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • Bump Python package versions for 1.7.0 release (#6142)
    Bumps the released 1.6.0 packages agent-framework, agent-framework-core, agent-framework-foundry, and agent-framework-openai to 1.7.0, with root continuing to exactly pin agent-framework-core[all]. Bumps the changed prerelease packages agent-framework-a2a, agent-framework-chatkit, agent-framework-declarative, agent-framework-devui, and agent-framework-foundry-hosting to the 260528 date stamp, raises core floors on the packages included in this release, raises Foundry's OpenAI floor alongside OpenAI, and raises ChatKit's openai-chatkit floor to the minimum version required by the current typed API usage. No beta cohort bump was applied; the absent mistal/mistral package was intentionally not bumped because no such package exists in this branch.
  • Python: [Breaking] Remove Python-only declarative actions and rename alias kinds to C# canonical names (#6126)
    * Remove Python-only declarative actions and rename alias kinds to C# canonical names
    
    * Address PR comments.
    
    * Address PR comments.
    
    * Reduce verbose and duplicate output from sample workflow.
  • Python: fix: pass Foundry agent default headers (#6040)
    * fix: pass Foundry agent default headers
    
    * test: loosen Foundry default header assertions
  • Python: Allow hosted checkpoints to restore MessageRole (#6049)
    * Python: Allow hosted checkpoints to restore MessageRole
    
    Allow Responses hosting checkpoint storage to deserialize the Azure Responses MessageRole enum that hosted workflows can persist inside Agent Framework Message objects.
    
    Add regression coverage for both direct load() and the hosted get_latest() restore path, including the plain-storage failure mode where list_checkpoints logs the blocked type and get_latest() returns None.
    
    Ruff also normalizes a duplicate contextlib import in the touched hosting module.
    
    * Address MessageRole checkpoint review comments
    
    * Cover hosted MessageRole checkpoint restore path
  • Python: Align c# and python TodoProvider tool names (#6107)
    * Align c# and python TodoProvider tool names
    
    * Potential fix for pull request finding
    
    Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
    
    * Address PR review: remove __slots__ and add typed schemas for tool params
    
    - Remove __slots__ from TodoItem, TodoInput, and TodoCompleteInput classes
      (not needed for low-instance-count objects and hinders dev scenarios)
    - Add _TodoAddItemSchema and _TodoCompleteItemSchema TypedDicts to provide
      proper JSON schema for todos_add and todos_complete tool parameters
    - Use typing_extensions for Python 3.10 compatibility
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • Python: read headers defensively to support stream wrappers without .headers (#6028) (#6029)
    `OpenAIChatClient._inner_get_response()` reads `.headers` on the raw streaming
    response returned by `client.responses.with_raw_response.create(stream=True)`
    (and its three sibling call sites - retrieve-streaming, non-streaming create
    and background retrieve) to surface the `x-ms-served-model` Azure header,
    introduced in #5910.
    
    When `azure-ai-projects>=2.1.0` experimental GenAI tracing is enabled
    (`AZURE_EXPERIMENTAL_ENABLE_GENAI_TRACING=true`), the instrumentor wraps the
    raw streaming response in an inline `AsyncStreamWrapper` that exposes
    `.response` but not `.headers`. Reading `raw_create_response.headers` then
    raises `AttributeError: 'AsyncStreamWrapper' object has no attribute 'headers'`,
    which `FoundryChatClient` rethrows as a `ChatClientException` and breaks every
    streaming call (workflows and free chat).
    
    Fix: read the header dict via `getattr(raw_response, "headers", None)` at all
    four call sites. `_extract_served_model()` already short-circuits on `None`,
    so the served-model surfacing degrades gracefully (model stays the deployment
    alias) instead of crashing when the response is wrapped by an instrumentor
    that does not proxy `.headers`.
    
    Regression test added:
    `test_streaming_response_without_headers_attribute_does_not_crash`
    simulates a stream wrapper that raises `AttributeError` on `.headers` and
    asserts the stream still completes with the deployment alias as `update.model`.
    
    Fixes #6028
    
    Co-authored-by: Emilien Mottet <emilien.mottet@michelin.com>
  • feat(a2a): add A2AAgentSession with reference_task_ids and input-required support (#5980)
    * feat(a2a): link follow-up messages via reference_task_ids
    
    Track the task_id from A2A responses (task, status_update, artifact_update,
    and message payloads) on session.state and include it as reference_task_ids
    on subsequent outgoing messages. This enables remote agents to correlate
    follow-up messages as task refinements per the A2A spec.
    
    Resolves #5938
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * feat(a2a): add A2AAgentSession for typed protocol state tracking
    
    Introduce A2AAgentSession (subclass of AgentSession) with context_id,
    task_id, and task_state properties. This follows the DurableAgentSession
    pattern and mirrors the .NET A2AAgentSession design.
    
    - Track task_id, context_id, and task_state from all response payload types
    - Validate context_id consistency (raise on mismatch)
    - Auto-assign server-generated context_id when not set
    - Only A2AAgentSession gets reference tracking (no state dict fallback)
    - Plain AgentSession continues to work without reference tracking
    - Add serialization support (to_dict/from_dict)
    - Export via agent_framework.a2a and agent_framework_a2a
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * style: remove unnecessary string annotation (pyupgrade)
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * fix: use AgentSession.from_dict for state deserialization
    
    Avoids importing private _deserialize_state, matching the
    DurableAgentSession pattern.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * fix: track context_id from message payloads in A2AAgentSession
    
    Previously, context_id was only captured from task, status_update, and
    artifact_update payloads. Message-only responses (which carry context_id
    but may lack task_id) were silently lost. This fix:
    
    - Captures msg.context_id in the message handler
    - Persists session state when either last_task_id or last_context_id is
      present (not only when task_id is truthy)
    - Only updates task_id/task_state when a task_id was actually returned
    - Adds a test for message-only context_id tracking
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * addressed comments
    
    * Gate status content to INPUT_REQUIRED/terminal states (match .NET)
    
    Match .NET's GetUserInputRequests pattern: only emit TaskStatusUpdateEvent
    message content when state is INPUT_REQUIRED or terminal. Intermediate
    status text (WORKING, SUBMITTED) is no longer surfaced to callers.
    
    When state is INPUT_REQUIRED, set additional_properties['input_required']
    = True so callers can distinguish input requests from final responses.
    
    Closes #5937
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Address review: remove message task_id tracking, defensive fallbacks, and input_required flag
    
    - Do not track task_id from Message payloads (simple interactions
      without task tracking)
    - Remove 'or last_task_id' fallback from status_update and
      artifact_update handlers (spec guarantees task_id is always set)
    - Remove additional_properties['input_required'] flag (content gating
      to INPUT_REQUIRED/terminal states is the signal itself)
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • Fix deprecated asyncio.iscoroutinefunction usage in test_cleanup_hooks.py (#4563)
    Fixes #4522
    
    Replace deprecated `asyncio.iscoroutinefunction()` with `inspect.iscoroutinefunction()`
    to resolve Python 3.13+ deprecation warning.
    
    Changes:
    - Added `import inspect` to imports
    - Replaced `asyncio.iscoroutinefunction(hook)` with `inspect.iscoroutinefunction(hook)` on line 126
    - This makes the code consistent with other test methods in the same file (lines 201, 236)
    
    The rest of the file already uses `inspect.iscoroutinefunction()` correctly, making
    this change consistent with the existing codebase pattern.
    
    🤖 Generated with [Claude Code](https://claude.com/claude-code)
    
    Co-authored-by: Claude <noreply@anthropic.com>
    Co-authored-by: Tao Chen <taochen@microsoft.com>
  • Python: fix: keep citation get_url metadata (#6037)
    * fix: keep citation get_url metadata
    
    * fix: satisfy citation metadata mypy check
  • Python: Add a HarnessAgent with available features and sample (#6041)
    * Add a HarnessAgent with available features and sample
    
    * Fix formatting
    
    * Address PR comments and fix mypy error
    
    * Add web search support to HarnessAgent
    
    * Fix build warning
    
    * Apply suggestions from code review
    
    Co-authored-by: Eduard van Valkenburg <eavanvalkenburg@users.noreply.github.com>
    
    * Address PR comments
    
    * Address PR comments
    
    * Address further PR comments.
    
    * Fix markdown broken link
    
    ---------
    
    Co-authored-by: Eduard van Valkenburg <eavanvalkenburg@users.noreply.github.com>
  • Python: feat(foundry): add to_prompt_agent / deploy_as_prompt_agent (experimental) (#5959)
    * feat(foundry): add experimental to_prompt_agent converter
    
    Adds `to_prompt_agent(agent)`, an experimental converter
    (`ExperimentalFeature.TO_PROMPT_AGENT`) that turns an Agent Framework
    `Agent` into a Foundry `PromptAgentDefinition` ready to publish via
    `AIProjectClient.agents.create_version(...)`.
    
    Behaviour:
    
    * `agent.client` must be a `FoundryChatClient` (or subclass); otherwise
      `TypeError` is raised. The model deployment name is lifted from the
      bound client so the same Agent definition used for local runs can be
      published as a hosted prompt agent without restating the model.
    * Foundry SDK tool instances (from `FoundryChatClient.get_*_tool()`) are
      passed through unchanged. AF `FunctionTool`s (and `@tool`-decorated
      callables) are emitted as Foundry `FunctionTool` declarations.
    * Local AF MCP tools cannot be expressed in a `PromptAgentDefinition`;
      the converter raises `ValueError` and points at
      `FoundryChatClient.get_mcp_tool()` for hosted MCP servers.
    * The converter walks both `agent.default_options["tools"]` and
      `agent.mcp_tools` because `normalize_tools()` splits local MCP off
      into its own list.
    
    Re-exported through the `agent_framework.foundry` lazy-loading namespace
    (updates both `__init__.py` and the `__init__.pyi` type stub).
    
    Adds a portable-agent sample showing the same `Agent` driven through
    both `agent.run(...)` and `to_prompt_agent(agent)`, and a README section
    covering the new converter.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * chore(samples): remove snippet tags from portable agent sample
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * chore(samples): inline FoundryChatClient and enable prompt-agent publish
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * chore(samples): drop async credential context manager
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * docs(foundry): trim README to_prompt_agent example to publish-only flow
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * docs(foundry): note FoundryAgent runs @tool callables for deployed prompt agents
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * fix(foundry): address review comments on to_prompt_agent converter
    
    * Construct `PromptAgentDefinition` `Tool` from a dict via `**tool_item`
      unpacking rather than the positional Mapping constructor \u2014 cleaner and
      matches the typical Pydantic / Azure SDK pattern.
    * Drop the redundant `isinstance(mcp_tool, MCPTool)` guard in
      `_convert_tools`; the parameter is already typed `Iterable[MCPTool]` so
      the second `raise` was unreachable. The remaining single `raise`
      fires for every entry as intended.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * fix(foundry): match Agent.__init__ model resolution in to_prompt_agent
    
    * Read the model from `agent.default_options.get("model")` first,
      falling back to `agent.client.model`. This mirrors the order
      `Agent.__init__` uses (`_agents.py:740`) when assembling
      default_options, so the model the agent runs with is the same model
      the converter publishes \u2014 e.g. when the caller passes
      `default_options={"model": "..."}` to override the bound client.
    * Updated the missing-model error message to point at both the client
      and the default_options paths.
    * Added tests:
      * tool-only agent with no `instructions` produces a definition
        where `instructions` is `None` and is omitted from the dict
        payload (`Agent.__init__` strips None values from default_options
        before storing them).
      * `default_options['model']` wins over the bound client's model.
      * Fallback to client.model when default_options has no model.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * feat(foundry): add deploy_as_prompt_agent helper + samples
    
    Adds `deploy_as_prompt_agent(agent)`, a convenience wrapper around
    `to_prompt_agent` that reuses the bound FoundryChatClient's project
    client to call `project_client.agents.create_version(...)`. Defaults
    `agent_name` / `description` from `agent.name` / `agent.description`
    so the Agent stays the single source of truth.
    
    * Exposed from `agent_framework_foundry` and the lazy-loading
      `agent_framework.foundry` namespace (including the .pyi stub).
    * Marked experimental with the existing
      `ExperimentalFeature.TO_PROMPT_AGENT` tag.
    * Tests cover the happy path, name/description defaulting, explicit
      override, no-name error, metadata + description forwarding, extra
      kwargs passthrough, and the experimental metadata.
    
    Samples:
    * Renamed the existing sample to `creating_prompt_agents.py`, drops
      'portable' wording, presents `deploy_as_prompt_agent` first as the
      recommended path and `to_prompt_agent` + `AIProjectClient` as the
      two-step alternative, and adds a cleanup step that deletes the
      published agent so re-runs stay idempotent.
    * New `using_prompt_agents.py` shows the end-to-end loop: deploy the
      agent, connect to it with `FoundryAgent` passing the same local
      `@tool` callable, run a query against the deployed prompt agent,
      then clean up.
    
    README updated to introduce `deploy_as_prompt_agent` as the
    recommended path and link to both runnable samples.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * fix(foundry): restore missing-model ValueError in to_prompt_agent
    
    The check was accidentally dropped while reworking docstrings in the
    previous commit. Test `test_to_prompt_agent_rejects_missing_model`
    exercises this path and was failing on CI as a result.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * refactor(foundry): rename deploy_as_prompt_agent -> create_prompt_agent
    
    Renames the helper across the foundry package, core lazy-loader stubs,
    tests, README and samples. The new name better matches the action
    performed (a prompt-agent definition is created in Foundry) and is
    consistent with the surrounding ''create_*'' API surface.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * refactor(foundry): drop create_prompt_agent, enrich to_prompt_agent params
    
    Remove the create_prompt_agent helper and consolidate on to_prompt_agent.
    Expose every PromptAgentDefinition parameter that has either an Agent
    Framework equivalent (sourced from default_options) or no equivalent
    (accepted as a keyword argument).
    
    * default_options-sourced (with kwarg overrides):
      temperature, top_p, string tool_choice
    * kwarg-only Foundry knobs:
      reasoning, text, structured_inputs, rai_config, ToolChoiceParam tool_choice
    
    Precedence is always: explicit keyword > default_options entry > unset.
    
    Tests cover every path (defaults, default_options, kwargs, kwarg override).
    Samples and README rewritten around the enriched to_prompt_agent.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * refactor(foundry): single source of truth for prompt-agent options
    
    Stop duplicating the generation-parameter surface between FoundryChatOptions
    and to_prompt_agent. Translate every field with an Agent Framework equivalent
    (temperature, top_p, tool_choice, reasoning, response_format/text/verbosity)
    from agent.default_options via a new RawFoundryChatClient helper
    _prepare_prompt_agent_options. Only Foundry-specific fields with no AF
    equivalent — structured_inputs and rai_config — remain as keyword arguments
    on to_prompt_agent.
    
    - tool_choice is dropped when there are no tools (mirrors _prepare_options
      semantics and avoids polluting tool-less prompt agents with Agent.__init__'s
      'auto' default).
    - response_format Pydantic models route through
      openai.lib._parsing._responses.type_to_text_format_param; dict shapes go
      through the existing _prepare_response_and_text_format helper.
    - default_options is not mutated; text dict is defensively copied.
    
    Tests, README, and creating_prompt_agents.py sample updated to reflect the
    new single-source model.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * docs(foundry): consolidate prompt-agent sample
    
    Drop creating_prompt_agents.py (the publish-only variant) and rename
    using_prompt_agents.py to foundry_prompt_agents.py so the single sample
    covers the full convert -> publish -> connect -> run loop. Update the
    README link list accordingly.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * docs(foundry): run local Agent + deployed agent in same sample
    
    Add an agent.run() call against the local Agent before publishing, then run
    the deployed prompt agent on the same query. Expand the docstring with a
    compare-and-contrast covering runtime/latency, configurability, and
    persistence/sharing differences between the two execution paths.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * test(foundry): cover conflicting response_format + text.format in to_prompt_agent
    
    Exercises the ValueError path when a Pydantic response_format would overwrite
    an explicit text.format mapping with a different shape. Lifts _chat_client.py
    coverage from 89% to 90%.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * refactor(foundry): move _prepare_prompt_agent_options into _to_prompt_agent
    
    Lift the translation helper off RawFoundryChatClient and into the
    _to_prompt_agent module as a module-private function that takes the client
    as its first argument. The chat client no longer needs to carry a method
    whose only consumer is the prompt-agent converter, while still serving as
    the source of the request-path helper (_prepare_response_and_text_format)
    that the converter reuses for dict-shaped response_format values.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * docs(python): codify GA terminology + post-run docs review
    
    Add two pieces of guidance to python/AGENTS.md:
    
    * Terminology - reserve 'GA' for hosted services; use 'released' or 'stable'
      for Agent Framework code/features to match the feature-lifecycle stages.
    * Maintaining Documentation - review AGENTS.md and skills at the end of every
      run and update any guidance the conversation made stale; before adding a
      new principle, ask the user to confirm it should be captured.
    
    Also pulls in a docstring fix in foundry_prompt_agents.py that swaps the
    stray 'GA' for 'released', applying the new terminology rule.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * address PR review: strict=True default, Tool._deserialize dispatch, sample cleanup safety
    
    - FunctionTool published as strict=True so the server-side schema validation
      matches what the local FoundryAgent(tools=[same_callable]) dispatcher
      enforces. AF FunctionTool has no 'strict' attribute, so the safer default
      is used uniformly instead of silently downgrading to a permissive contract.
    - _validate_mapping_tool now dispatches through ProjectsTool._deserialize so
      dict-shaped tools rehydrate to the concrete subclass (FunctionTool,
      WebSearchTool, ...) via the 'type' discriminator instead of returning a
      generic Tool. Added a test that asserts isinstance(WebSearchTool) and a
      new test for the function-typed dict path.
    - foundry_prompt_agents.py sample now wraps credential + project client in
      async with and the create_version / run flow in try/finally so a failure
      on connect or run still deletes the published prompt agent rather than
      leaving an orphaned, billable resource in the user's Foundry project.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * fix(ci): correct linkspector ignorePattern typo (./pulls -> ./pull)
    
    GitHub PR URLs use the singular segment /pull/N (compare to /issues/N
    for issues). The existing './pulls' ignore pattern never matched
    anything as a result, so legitimately stale PR links (e.g. PRs deleted
    from forks) surface as linkspector failures on unrelated PRs.
    
    This is the same convention the './issues' rule above already follows.
    Fixes the markdown-link-check failure on a dangling link in
    dotnet/src/Microsoft.Agents.AI.DurableTask/CHANGELOG.md.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • Python: Add a BackgroundAgentsProvider for python (#6069)
    * Add a BackgroundAgentsProvider for python
    
    * Address PR comments and fix linting warnings
    
    * Address PR comment
  • Python: Fix DevUI streaming memory growth regression (#6038)
    * Fix DevUI streaming memory growth regression
    
    Bounds retained streaming/debug state in DevUI and strengthens browser regression coverage for long streamed responses.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Address DevUI memory review feedback
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Fix DevUI bundle trailing whitespace
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • Python: fix(openai): guard against null delta in streaming chunks from non-co… (#5734)
    * fix(openai): guard against null delta in streaming chunks from non-compliant providers (#5732)
    
    * chore: resolve nit and align with project style
    
    ---------
    
    Co-authored-by: Sergey Borisov <sergey.borisov@dataimpact.io>
    Co-authored-by: Giles Odigwe <79032838+giles17@users.noreply.github.com>
  • Python: Add Python parity sample for invoking Foundry Toolbox tools from declarative workflows (#5933)
    * Add Python parity sample for invoking Foundry Toolbox tools from declarative workflows
    
    * Python: address PR review on declarative toolbox sample
    
    Two security fixes for PR #5933:
    
    1. Add safe_mode flag to WorkflowFactory (default True) mirroring
       AgentFactory. Gates =Env.* exposure inside DeclarativeWorkflowState
       PowerFx symbols via _safe_mode_context, so workflow YAML loaded from
       untrusted sources no longer leaks the host's full os.environ snapshot
       into PowerFx evaluation. The flag is also forwarded to the
       internally-constructed AgentFactory so inline agent definitions
       follow the same policy.
    
    2. Pin the invoke_foundry_toolbox_mcp sample's _client_provider to the
       resolved toolbox endpoint. The bearer-authenticated httpx client is
       now only returned when MCPToolInvocation.server_url matches the
       toolbox URL case-insensitively; any other URL gets None (the default
       unauthenticated path), preventing the Foundry AAD bearer token from
       being attached to a mis-configured or injected server URL. Mirrors
       the .NET sample's httpClientProvider guard.
    
    The sample is updated to opt in to safe_mode=False because its YAML
    intentionally uses =Env.FOUNDRY_TOOLBOX_* to keep configuration in env
    vars under the developer's control.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Fix pyright issues.
    
    * Addressed PR comments.
    
    * Fix CI pipelines.
    
    * Resolve PR comments
    
    * Revamped sample to address PR comments.
    
    ---------
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • Python: Align ModeProvider tool names and instructions (#6071)
    * Align ModeProvider tool names and instructions
    
    * Address PR comments
  • Python: fix(core): point @experimental warnings at user code, not stdlib internals (#5996)
    * fix(core): point @experimental warnings at user code, not stdlib internals
    
    Previously the wrappers installed by @experimental called warnings.warn
    with a fixed stacklevel=3. ABCMeta inserts an extra abc.__new__ frame
    when an experimental ABC is subclassed, so the warning landed inside
    abc.py (or <frozen abc>:106 on modern CPython) instead of the user's
    class Sub(...) line.
    
    Resolve the user frame by walking inspect.currentframe(), skipping
    frames whose module name is abc/functools/typing/contextlib (or
    submodules), then emit via warnings.warn_explicit so the recorded
    filename/lineno point at user code. Falls back to warnings.warn with
    stacklevel=2 if no user frame is found. Module-name matching is used
    because frozen stdlib modules report '<frozen abc>' as their filename.
    
    Also install a one-line warnings.formatwarning specifically for
    FeatureStageWarning so 'file:line: ExperimentalWarning: [ID] Name ...'
    prints without the secondary source-snippet line. Other categories
    delegate to the stdlib default formatter unchanged.
    
    Added a regression test that subclasses an @experimental ABC inside
    warnings.catch_warnings and asserts the recorded filename equals the
    test file.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * fix(core): address review feedback on @experimental warning fix
    
    - Make _install_feature_stage_formatter idempotent: tag the installed
      formatter with a marker attribute and short-circuit re-installation,
      so re-imports/reloads don't wrap the formatter on top of itself.
      Also expose the previous formatter via __wrapped__ for restoration.
    - Avoid leaking frame references in _resolve_user_frame: capture data
      into plain locals inside try and del frame/candidate in finally,
      per CPython's guidance on inspect.currentframe usage.
    - Drop redundant _WARNED_FEATURES.clear() in the new ABC subclass test
      (the autouse fixture already handles it).
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * changed query for foundry web search test
    
    ---------
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • Python: bump package versions for 1.6.0 release (#6017)
    * Python: bump package versions for 1.6.0 release
    
    - Released cohort (agent-framework, core, openai, foundry): 1.5.0 -> 1.6.0
    - Beta packages (21 packages): 1.0.0b260519 -> 1.0.0b260521
    - Alpha packages (azure-contentunderstanding, foundry-hosting, gemini, monty): 1.0.0a260518/19 -> 1.0.0a260521
    - ag-ui stays at 1.0.0rc2, orchestrations at 1.0.0rc1 (dependency bounds updated)
    - Inter-package dependency lower bounds updated (>=1.5.0,<2 -> >=1.6.0,<2)
    - Update CHANGELOG compare links
    - uv.lock refreshed
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Address review: bump RC packages, add shell tool to changelog
    
    - ag-ui: 1.0.0rc2 -> 1.0.0rc3
    - orchestrations: 1.0.0rc1 -> 1.0.0rc2
    - Add shell tool (#5664) to CHANGELOG
    - uv.lock refreshed
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • Python: Shell tool with support for local and Docker (#5664)
    * feat(tools): add cross-OS LocalShellTool in new agent-framework-tools package
    
    Introduces a safe, cross-OS local shell tool as the first citizen of a new
    
    agent-framework-tools workspace package. Supports persistent (default) and
    
    stateless modes across pwsh/powershell.exe/bash/sh, with policy denylist,
    
    allowlist, approval gating, process-tree kill on timeout, output truncation,
    
    and audit hooks. Integrates with existing provider get_shell_tool(func=...)
    
    factories via FunctionTool kind='shell'.
    
    See docs/decisions/0026-builtin-tools-local-shell.md for the full design.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * feat(tools): security hardening for LocalShellTool
    
    Codifies what LocalShellTool does and does not defend against, and
    
    delegates the security-relevant lifecycle primitive to a battle-tested
    
    library instead of hand-rolled per-OS code.
    
    Changes:
    
    - Adopt psutil for cross-OS process-tree termination (executor + session).
    
      Replaces hand-rolled taskkill/killpg with one canonical implementation.
    
    - Resolve taskkill.exe to absolute %SystemRoot%\System32 path so PATH
    
      poisoning cannot redirect us to an attacker-supplied binary.
    
    - Reframe ShellPolicy docstring + ADR + README: denylist is a guardrail,
    
      not a security boundary.
    
    - Require acknowledge_unsafe=True to set approval_mode='never_require',
    
      making the unsafe path explicitly opt-in with a self-documenting name.
    
    - Add tests/test_security.py codifying named CVE-style cases. Defenses
    
      we DO claim are asserted; non-defenses (denylist bypasses via
    
      backslash insertion, variable expansion, interpreter escape, base64,
    
      alternative tools, PowerShell-native verbs) are documented as
    
      expected-to-pass tests so residual risk stays visible.
    
    - Add Threat Model + Confidence Strategy sections to ADR 0026.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * feat(tools): add DockerShellTool sandboxed shell tier
    
    Adds a container-backed shell executor as the recommended pattern for untrusted-input shell workflows. The container provides the security boundary (--network none, non-root user, --read-only, --cap-drop ALL, no-new-privileges, memory/pids limits, tmpfs /tmp), so approval gating is optional unlike LocalShellTool.
    
    Also introduces a ShellExecutor Protocol so callers can plug in custom backends (Firecracker, SSH, WASI) without forking the framework.
    
    Removes the planned HyperlightShellExecutor follow-up from ADR 0026: Hyperlight is a WASM code sandbox with no kernel/userland/shell binary, so a Hyperlight-backed shell is not viable. Docker is the realistic sandbox tier for shell.
    
    Tests: 11 unit tests for argv builders + lifecycle (no Docker daemon required); 3 integration tests gated on is_docker_available().
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * fix(tools): backport shell-tool fixes from .NET parity review
    
    Applies the applicable subset of bug fixes accumulated during the
    .NET shell-tool PR review (microsoft/agent-framework#5604) to the
    Python shell tool.
    
    A1 - Quote workdir safely in _maybe_reanchor
    
      Previously _tool.py used double-quote interpolation when emitting
      the cd/Set-Location prefix, which expanded $VAR, $(), and backticks
      in the workdir path. A workdir containing shell metacharacters could
      trigger arbitrary command execution before the user command ran.
    
      Replaced with single-quote escaping helpers _quote_posix and
      _quote_powershell that emit literal-string forms safe for both
      hosts.
    
    A5/A6 - Consolidate truncation to a single byte-aware helper
    
      Extracted a shared truncate_head_tail / truncate_text_head_tail
      helper in _truncate.py. The new implementation distributes odd
      caps so head receives floor(cap/2) and tail receives ceil(cap/2)
      bytes, matching the .NET round-9 fix and ensuring no input bytes
      are silently dropped on the boundary.
    
      _session.py previously truncated by Python str length while the
      caller passed _max_output_bytes - the unit mismatch is now gone:
      raw byte buffers go through truncate_head_tail and decoded text
      goes through truncate_text_head_tail.
    
    Unit tests added for the truncate and quote helpers.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * docs(tools): tone down narrative and overconfident comments in shell tool
    
    The shell tool's docstrings and comments contained two patterns that
    the .NET review pushed back on:
    
    - Narrative framing about implementation history ("hard-won",
      "we sidestep", "design inspiration: ...", competitor framework
      name-drops in module docstrings).
    - Overstated security guarantees ("battle-tested",
      "reasonable for untrusted input", "recommended executor for any
      agent that runs commands from untrusted input",
      "destructive commands are blocked", "safe local shell tool",
      "blocks shell injection").
    
    Rewrites the affected docstrings and comments to describe what the
    code does in neutral terms. Behaviour is unchanged.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * feat(tools): add ShellEnvironmentProvider for the Python shell tool
    
    Ports the .NET ShellEnvironmentProvider as a Python ContextProvider
    so agents using LocalShellTool or DockerShellTool can be primed with
    an accurate description of the shell they're talking to (family,
    version, OS, working directory, and which CLIs are available).
    
    The provider runs probes through any ShellExecutor, caches the
    resulting snapshot, and on every before_run extends the session
    instructions with a markdown block describing the shell idiom to
    use. A failed first probe leaves the cache empty so the next call
    retries (no permanent poisoning).
    
    Probe failures from a narrow set of expected error types
    (ShellCommandError, ShellExecutionError, ShellTimeoutError, and
    asyncio.TimeoutError from the per-probe timeout) are recorded as
    None fields in the snapshot. Other exceptions propagate. Tool
    names are validated against ^[A-Za-z0-9._-]+$ before being
    interpolated into a probe command.
    
    Includes 12 unit tests covering happy path, stderr fallback,
    timeout handling, expected/unexpected exception paths, malicious
    tool name rejection, case-insensitive deduplication, retry after
    failure, concurrent first-callers sharing one probe, and the
    default and custom formatter paths.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * docs(tools): document ShellEnvironmentProvider and finish comment cleanup
    
    Add a README section introducing ShellEnvironmentProvider, soften two remaining overconfident security-boundary comments in _executor_base.py and the DockerShellTool class docstring, and add a sample (shell_with_environment_provider.py) that demonstrates the provider in stateless and persistent modes.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * refactor(tools): move shell samples to python/samples/02-agents/tools
    
    The repository convention is to host samples under python/samples/ rather than inside the package directory. Move the two net-new shell samples (allow-list and environment-provider) to python/samples/02-agents/tools/ and drop the in-package samples/ directory; the existing top-level providers/openai/client_with_local_shell.py already covers the basic LocalShellTool walkthrough.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * test(tools): cover confine_workdir default and ShellResult.format_for_model
    
    Two new tests in test_local_shell_tool.py exercise the default confine_workdir=True behaviour on POSIX and PowerShell, asserting that 'cd' inside one persistent-mode call does not leak into the next. A new test_shell_result.py module provides direct unit coverage for every conditional branch of ShellResult.format_for_model (stdout, truncated, stderr, timed_out, exit_code) so regressions in the LLM-facing format are caught immediately.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * fix(tools): address PR #5664 review feedback
    
    - _tool.py: detect PowerShell via is_powershell() helper instead of basename string match
    
    - _environment.py: use public ContextProvider import (no private _ prefix)
    
    - _session.py: trim _stdout_buf/_stderr_buf after copying to avoid unbounded retention across calls
    
    - _docker.py: short-circuit start()/close() in stateless mode; add configurable shell kwarg (default bash, e.g. 'sh' for alpine)
    
    - tests: parenthesized multi-line assert; alpine integration tests now pass shell='sh'
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * fix(tools): satisfy CI quality gates
    
    - pyupgrade: drop quoted self-class refs in __aenter__/method annotations
    
    - ruff format: reflow long lines per workspace style
    
    - pyright: assert psutil non-None in optional-import branch; lowercase mutable module globals; annotate _approval_mode as Literal so tool() Literal-typed kwarg is accepted; add ... body to ShellExecutor.run protocol; remove unused deprecated _kill_tree wrapper
    
    - tests: skip docker integration tests on win32 (Windows containers don't support --read-only / alpine images)
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Remove DEFAULT_DENYLIST; document single-session ownership; fix bandit findings
    
    Mirrors the .NET PR #5604 cleanup:
    
    - Remove DEFAULT_DENYLIST from ShellPolicy. ShellPolicy() now ships with an empty deny-list; operators opt into site-specific patterns explicitly. No major agent framework uses regex matching as a primary security control; AutoGen v2 removed theirs. Approval gating + sandbox tier remain the real boundaries.
    
    - Rewrite module / class docstrings to frame ShellPolicy as a UX pre-filter, not a security control.
    
    - Add Single-session ownership paragraphs to ShellExecutor, ShellSession, LocalShellTool, and DockerShellTool: a persistent-mode tool is owned by exactly one conversation / agent session; do not share across users or concurrent conversations.
    
    - Tests now supply explicit deny patterns instead of relying on a default.
    
    - Address Pre-commit Hooks (bandit) CI failures: convert internal-invariant asserts to explicit RuntimeError, annotate intentional subprocess/shell usage with # nosec, document container-internal /tmp paths.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Address PR #5664 round-2 review feedback
    
    Deny-list documentation drift:
    
    - README and the OpenAI/local-shell sample no longer claim a built-in deny-list of destructive commands. ShellPolicy is described as an optional, operator-supplied UX pre-filter; the real boundaries remain approval gating and the sandbox tier.
    
    Behavioural fixes called out in review:
    
    - ShellPolicy.evaluate() now denies empty / whitespace-only commands explicitly instead of returning allow with no rationale.
    
    - truncate_head_tail() raises ValueError for cap <= 0 instead of silently returning the full input with truncated=False, which previously could defeat output-capping in callers that mis-configured the budget.
    
    - LocalShellTool.as_function() / DockerShellTool.as_function() return the ShellCommandError text directly so the model sees a single, non-redundant 'Command rejected by policy: …' message instead of the prior duplicated 'Command blocked by policy: Command rejected …' wrapping.
    
    - ShellSession POSIX sentinel trailer now snapshots and restores the prior errexit (set -e) state around the trailer, so a user 'set -e' in the persistent shell is no longer permanently disabled by the next run().
    
    Tests:
    
    - New test_shell_parse_rc.py covers the full _parse_rc() edge-case surface (zero, positive, negative, CRLF, no newline, missing prefix, empty input, non-digits, trailing garbage, partial digits).
    
    - test_policy.py asserts the new empty-command deny.
    
    - test_shell_truncate_and_quote.py asserts ValueError for cap=0 and cap<0.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Address PR review feedback for shell tool
    
    - _resolve.py: reject empty/whitespace shell override string
    - _tool.py / _docker.py: mode-aware default tool description (persistent vs stateless)
    - _tool.py: fix misleading workdir docstring (re-anchor, not blocking)
    - _types.py: emit stream-agnostic [output truncated] marker
    - _policy.py: declare _denies/_allows as dataclass fields
    - _environment.py: use $(pwd) instead of $PWD in POSIX probe
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Address PR review feedback: shell override flag + probe timeout safety
    
    - _resolve.py: in stateless mode, ensure shell overrides end with -c/-Command so commands aren't misinterpreted as script-file paths.
    - ShellExecutor.run / LocalShellTool.run / DockerShellTool.run now accept an optional 	imeout kwarg; ShellEnvironmentProvider drops the outer asyncio.wait_for and lets the executor enforce the probe timeout internally, so cancellation no longer risks leaving a hung subprocess or corrupted session.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * Address review feedback: docker isolation + lifecycle robustness
    
    - pyproject.toml: bump agent-framework-core minimum from 1.2.0 to 1.2.2 to align with the rest of the workspace.
    - _docker.py: validate extra_run_args at construction time and reject flags that would dismantle the isolation defaults (--privileged, --cap-add, --security-opt, --network/--net, -v/--volume/--mount, --device, --pid, --ipc, --userns, --user, --read-only, --tmpfs, --add-host, --gpus, --cgroupns, --device-cgroup-rule); also documented the warning on the docstring.
    - _docker._stop_container: retry docker rm -f once and log a warning/error when it does not succeed, so operators can audit leaked containers instead of getting a silent success.
    - _docker._run_stateless timeout path: fall back to docker rm -f when docker kill fails or times out (--rm only reaps on clean exit), and log instead of silently swallowing communicate() errors.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: alliscode <bentho@microsoft.com>
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    Co-authored-by: alliscode <25218250+alliscode@users.noreply.github.com>
  • Python: Prevent duplicate system instructions in Python telemetry (#5981)
    * Initial plan
    
    * Fix duplicated system instructions in Python telemetry
    
    * Clarify telemetry message filtering
    
    * test: cover separate and in-history system messages
    
    * Clarify observability message logging split
    
    * Simplify observability logging serialization
    
    * Harden observability regression test
    
    * Reuse observability span message serialization
    
    * Clarify observability logging loops
    
    * Polish observability message serialization
    
    * Tighten observability zip checks
    
    * Refactor observability message capture loop
    
    * Fix telemetry logging for separate system instructions
    
    * Refine observability OTEL message typing
    
    * Restore prepended-instruction logging path in _capture_messages
    
    * Revert logging change in _capture_messages; keep chat-history-only logging
    
    ---------
    
    Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
  • Python: feat(a2a): use non-streaming transport and return_immediately for background ops (#5963)
    * feat(a2a): use non-streaming transport and return_immediately for background ops
    
    When stream=False, use a client configured with streaming=False so the
    SDK sends a single HTTP POST to message/send instead of opening an SSE
    connection via message/stream. This matches the A2A protocol's design:
    non-streaming calls use direct request/response, streaming calls use
    Server-Sent Events.
    
    Also sets return_immediately=background on SendMessageConfiguration so
    the server respects the caller's intent for background operations.
    
    Changes:
    - Create separate streaming and non-streaming internal clients (sharing
      the same httpx connection pool) to match protocol transport semantics
    - Select non-streaming client for run(stream=False) calls
    - Add SendMessageConfiguration with return_immediately=background
    - Fallback to streaming client when non-streaming unavailable (e.g. user
      provides their own client via constructor)
    - Add tests for client selection and return_immediately behavior
    
    Resolves microsoft/agent-framework#5936
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * fix: address PR review feedback
    
    - Initialize last_request in MockA2AClient.__init__ for explicit state
    - Use 'is not None' instead of truthiness for _non_streaming_client check
    - Assert return_immediately propagates through non-streaming client path
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * fix: only set configuration when background=True
    
    Only attach SendMessageConfiguration to the request when background=True,
    keeping requests minimal and preserving server-side defaults for normal
    (foreground) operations. This follows the framework pattern of only
    setting optional fields when they have meaningful values.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * fix: only set return_immediately for non-streaming background ops
    
    Per the A2A spec, return_immediately only applies to message/send
    (non-streaming). It has no effect on streaming operations. Only set
    the configuration field when both background=True and stream=False.
    
    Adds test verifying streaming+background does not set return_immediately.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • Python: feat(foundry): add experimental hosted tool factories on FoundryChatClient (#5958)
    * feat(foundry): add experimental hosted tool factories on FoundryChatClient
    
    Adds eight new `@experimental` static factory methods on `FoundryChatClient`
    covering Foundry-hosted tools that previously had no helper:
    
    - get_azure_ai_search_tool
    - get_sharepoint_tool
    - get_fabric_tool
    - get_memory_search_tool
    - get_computer_use_tool
    - get_browser_automation_tool
    - get_bing_custom_search_tool
    - get_a2a_tool
    
    All factories are marked with the new `ExperimentalFeature.FOUNDRY_TOOLS` tag
    and resolve the underlying `azure-ai-projects` preview classes lazily through
    a `_require_sdk_class` helper so older SDK versions still import cleanly and
    fail with a clear `ImportError` only on use.
    
    Tests cover each factory's return type and field wiring, the experimental
    metadata, and the missing-SDK-class fallback.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * test(foundry): address review comments on tool-factory tests
    
    * Skip preview-tool tests gracefully (`_skip_if_sdk_class_missing`) when
      the installed `azure-ai-projects` does not expose the required preview
      class, matching the lazy-import guard in production code so the test
      suite stays green on older SDK installs.
    * Add `filterwarnings("ignore::FutureWarning")` to each new tool-factory
      test (and the parametrized metadata test) so they remain stable under
      strict warning configurations \u2014 the global dedup in
      `_feature_stage._WARNED_FEATURES` makes `pytest.warns` brittle across
      ordered runs.
    * Use `monkeypatch.setattr(..., None, raising=False)` instead of
      `delattr` in the missing-SDK-class test so it works for modules that
      implement PEP 562 `__getattr__`.
    * Split the long `get_bing_custom_search_tool` return into two lines for
      readability.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * fix(foundry): harden tool-factory kwargs against silent override
    
    * Reorder the dict-literal kwargs assembly in get_azure_ai_search_tool,
      get_memory_search_tool, and get_bing_custom_search_tool so explicit
      parameters always take precedence over **kwargs (matching the safe
      pattern already used in get_a2a_tool). This prevents a caller
      passing `project_connection_id`, `index_name`, `memory_store_name`,
      `scope`, or `instance_name` through `**kwargs` from silently
      overriding the explicit security-sensitive arguments.
    * Update the README experimental note to reflect once-per-feature-id
      dedup semantics of `_feature_stage._WARNED_FEATURES` rather than
      claiming a per-factory "first use" warning.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * feat(foundry): split FOUNDRY_TOOLS / FOUNDRY_PREVIEW_TOOLS, add bing-grounding
    
    - Add ExperimentalFeature.FOUNDRY_PREVIEW_TOOLS to distinguish wrappers around
      preview Foundry SDK tool classes (Sharepoint/Fabric/Memory/ComputerUse/
      BrowserAutomation/BingCustomSearch/A2A) from FOUNDRY_TOOLS, which is for
      GA-SDK wrappers that are simply new in agent-framework-foundry
      (AzureAISearch, BingGrounding).
    - Add get_bing_grounding_tool factory and a 'Choosing a web grounding tool'
      comparison block on get_web_search_tool / get_bing_grounding_tool /
      get_bing_custom_search_tool docstrings.
    - Drop the _require_sdk_class lazy resolver: every guarded class is available
      at azure-ai-projects>=2.1.0 (the package floor), so import them eagerly.
      Concrete return types replace 'Any'.
    - README: split the experimental factories into two tables, one per feature
      flag, with a note explaining the distinction.
    - Tests: split into FOUNDRY_TOOLS / FOUNDRY_PREVIEW_TOOLS factory cases;
      drop the obsolete missing-SDK-class ImportError test.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • Python: Show more authentication methods in Foundry Toolbox MCP (#5719)
    * Show more authentication methods in Foundry Toolbox MCP
    
    * Remove hardcoded toolbox version num
    
    * Add Foundry MCP OAuth consent handling
    
    * Use message instead of the dedicated item type
    
    * Go back to using OAuthConsentRequestOutputItem
    
    * WIP: sample testing
    
    * Update error code
    
    * Address review on Foundry Toolbox MCP samples
    
    Reviewed feedback addressed:
    
    - Drop the branch-pinned `git+https://...@feature/...` entries from
      `04_foundry_toolbox/requirements.txt`; restore the simple comment + `mcp`
      runtime dep. The git pins were only useful while iterating on the PR and
      shouldn't ship. (eavanvalkenburg)
    
    - Fix the `/toolsets/` typo in both `04_foundry_toolbox/README.md` and
      `06_files/README.md`. Verified empirically against the
      research_toolbox in the test workspace: the toolbox MCP gateway lives at
      `/toolboxes/{name}/mcp?api-version=v1` and requires the
      `Foundry-Features: Toolboxes=V1Preview` header. `/toolsets/{name}/mcp`
      returns 403 with `preview_feature_required: Toolsets=V1Preview` (a
      different opt-in feature).
    
    - Wrap `httpx.AsyncClient(...)` in `async with ... as http_client:` in both
      samples so the connection pool is cleaned up. (Copilot reviewer)
    
    - Make the `TOOLBOX_NAME` env var consistent in both samples. Previously the
      tool name silently fell back to `"toolbox"` when `TOOLBOX_NAME` was unset,
      but `resolve_toolbox_endpoint()` still required `TOOLBOX_NAME` and would
      raise `KeyError`. The samples now resolve the endpoint once and derive the
      tool name from the resolved URL when `TOOLBOX_NAME` isn't set, so the
      local tool name always matches the upstream toolbox identity regardless
      of which env var the user set. (Copilot reviewer)
    
    - Rename `_responses.is_consent_error` to `consent_url_from_error`: the
      helper returns `str | None` (the consent URL), not a bool, so the new
      name matches behavior. Update the test class accordingly. (eavanvalkenburg)
    
    - Tighten `_handle_inner_agent`'s lazy-entry catch from `Exception` to
      `AgentFrameworkException`, the type the MCP layer actually wraps consent
      errors in via `MCPStreamableHTTPTool.__aenter__` →
      `ToolExecutionException(inner_exception=mcp_error)`. Network failures,
      cancellations, and other non-framework exceptions now propagate normally
      instead of being briefly caught and re-raised. The test helper
      `_make_consent_error` is updated to use `ToolExecutionException` so it
      matches the real-world wrapping. (eavanvalkenburg)
    
    - Clarify the `github_pat` description in `agent.manifest.yaml` to note
      it's only needed when the PAT-based connection (`github-mcp-pat-conn`)
      is chosen; users selecting the OAuth2 connection (`github-mcp-oauth-conn`)
      can leave it empty. (Copilot reviewer)
    
    Validation: ran both samples end-to-end against a real Foundry toolbox
    (`research_toolbox`) -- the samples connect successfully and the agent
    lists the toolbox's MCP tools (`api_specs___fetch_azure_rest_api_docs`,
    etc.). `uv run poe test -P foundry_hosting` passes (119 tests), pyright +
    mypy clean.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * docs: fix broken Foundry samples link in 04_foundry_toolbox README
    
    The previous URL pointed to an old location of the toolbox supported-scenarios
    doc; the doc moved to /samples/python/hosted-agents/SUPPORTED_TOOLBOX_SCENARIOS.md
    and the old /samples/python/toolbox/azd path now 404s.
    
    Caught by the markdown-link-check CI step.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: Eduard van Valkenburg <eavanvalkenburg@users.noreply.github.com>
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
  • [BREAKING] Python: Enable instrumentation by default (#5865)
    * Enable instrumentation by default
    
    * Update samples
    
    * Optimization when span is not recording
    
    * Address Copilot comments
    
    * Revert uv.lock
    
    * Add warning
    
    * Formatting
    
    * Fix mypy
    
    * Add disable_instrumentation() with sticky user-intent semantics
    
    Add a public disable_instrumentation() entry point so users can explicitly opt
    out of Agent Framework telemetry, with a sticky-disable flag that makes the
    user's intent "leading" — no framework code path (foundry's
    configure_azure_monitor, configure_otel_providers, enable_instrumentation,
    enable_sensitive_telemetry, or direct OBSERVABILITY_SETTINGS.enable_*
    writes) can re-enable instrumentation until the user explicitly clears the
    disable with enable_instrumentation(force=True) /
    enable_sensitive_telemetry(force=True).
    
    Also addresses the two remaining unresolved review threads on the PR:
    1. test_observability_settings_defaults_instrumentation_true pins the new
       "ENABLE_INSTRUMENTATION defaults to True when env unset" behavior.
    2. test_enable_instrumentation_reads_env_sensitive_data restores coverage
       for the post-import load_dotenv() fallback path.
    
    Implementation:
    - ObservabilitySettings.enable_instrumentation / enable_sensitive_data become
      properties backed by _enable_*. While _user_disabled is True, the getters
      return False and the setters drop True writes (defense in depth so third-
      party writes can't subvert the disable).
    - Public is_user_disabled read-only property lets integrations (e.g. foundry's
      configure_azure_monitor) cheaply check the disable state without poking at
      privates.
    - enable_instrumentation() and enable_sensitive_telemetry() short-circuit with
      an info log when disabled; gain a force=True kwarg that clears the disable.
    - configure_otel_providers() still creates providers / exporters / views so a
      later force-enable can use them, but logs an info message when called while
      disabled.
    - Foundry's FoundryChatClient.configure_azure_monitor and
      FoundryAgent.configure_azure_monitor early-return when the user has
      disabled, so Azure Monitor's global providers aren't installed unnecessarily.
    
    Tests: 11 new tests covering default-on, env re-read at call time, sticky
    behavior against each re-enable surface (enable_instrumentation,
    enable_sensitive_telemetry, configure_otel_providers, direct attribute
    writes), force=True override, re-arming the disable, and the __all__ export.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * docs: document disable_instrumentation() and force=True paths
    
    Add a "Disabling instrumentation" section to the observability sample README
    that walks through:
    
    - The distinction between the ENABLE_INSTRUMENTATION env var (initial,
      non-sticky) and disable_instrumentation() (process-wide, sticky).
    - Why the sticky semantics matter: framework integrations like
      FoundryChatClient.configure_azure_monitor() can call
      enable_instrumentation() as part of their setup, and the user's opt-out
      needs to win.
    - All five surfaces guarded by the sticky disable (property reads, public
      enable functions, configure_otel_providers, direct attribute writes,
      is_user_disabled-aware integrations).
    - The force=True escape hatch on both enable_instrumentation() and
      enable_sensitive_telemetry().
    - How third-party integrations should consult OBSERVABILITY_SETTINGS.is_user_disabled.
    - The limits of the disable (does not tear down existing providers /
      in-flight spans / third-party instrumentation, does not persist across
      processes).
    
    Cross-links the new section from the ENABLE_INSTRUMENTATION row in the env
    vars table.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * docs: soften disable_instrumentation() overclaim about telemetry guarantees
    
    Replace 'no telemetry will be emitted no matter what' (which is too strong,
    since callers can still pass force=True or mutate private attributes) with
    language framing the disable as a user-intent contract that library and
    framework code is expected to honor: the framework actively short-circuits
    the public enable paths, force=True and private-attribute writes are
    acknowledged as out-of-contract escape hatches that integrations should
    not use on the user's behalf.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * docs: correct observability Dependencies section
    
    - opentelemetry-sdk is no longer a hard dependency; it is lazily imported by
      create_resource(), create_metric_views(), and configure_otel_providers()
      with a clear ImportError when missing. Day-to-day instrumentation works
      with opentelemetry-api alone provided some other component configures the
      global OpenTelemetry providers (Azure Monitor, an APM agent, application
      bootstrap, etc.).
    - opentelemetry-semantic-conventions-ai is no longer used anywhere in the
      source; remove it from the listed dependencies.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    * docs: replace stale observability migration guide with current PR's only relevant migration
    
    The old guide documented the move away from setup_observability(otlp_endpoint=...)
    which was an earlier-release API change unrelated to this PR and stale enough that
    it's more confusing than helpful at this point. Replace it with a short note on the
    single migration this PR introduces: callers of
    enable_instrumentation(enable_sensitive_data=True) should switch to
    enable_sensitive_telemetry(). Cross-link to the Disabling instrumentation section
    for the rare 'force on without enabling sensitive data' use case where
    enable_instrumentation() still applies.
    
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: Eduard van Valkenburg <eavanvalkenburg@users.noreply.github.com>
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>