mirror of
https://github.com/microsoft/agent-framework.git
synced 2026-06-16 21:04:09 +08:00
ea052ab511311ccf026098a21e20aa9d79ffda13
346 Commits
-
Tao Chen ·
2026-06-15 09:50:34 -07:00 -
Tao Chen ·
2026-06-15 09:40:21 -07:00 -
Python: Fix Python OTel usage detail attributes (#6493)
* fix python otel usage detail attributes Map cached/read/reasoning usage detail fields to standard OTel GenAI attributes while preserving provider-specific legacy keys. Add focused coverage for direct response spans, aggregated agent spans, and provider usage parsing. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * address usage detail review feedback Omit missing OpenAI Responses usage detail counts while preserving zero-valued counts. Record zero-valued token usage in OTel histograms and add regression coverage. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Eduard van Valkenburg ·
2026-06-15 07:10:14 +00:00 -
Python: [BREAKING] Align FileAccess tools with .NET — directory discovery and recursive search (#6476)
* Align FileAccess tools with .Net; add directory discovery and recursive search * Fix choices field description: spacing, line length, grammar Addresses PR review: separate concatenated string literals with proper spacing/newlines, wrap lines under the 120-char Ruff limit, and fix "doesn't" -> "don't". Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Address PR comments --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
westey ·
2026-06-15 06:55:21 +00:00 -
Python: Add AgentLoopMiddleware for re-running agents in a loop (#6174)
* Python: Add AgentLoopMiddleware for re-running agents in a loop Add `AgentLoopMiddleware`, an `AgentMiddleware` that re-runs the wrapped agent in a loop. A single configurable class covers three common patterns, each with a convenience classmethod factory: - Ralph loop (`.ralph(...)`): no exit criteria, with feedback tracking (`record_feedback`/`progress`), progress injection (`inject_progress`), optional fresh context per iteration (`fresh_context`), and an early-stop completion signal (`is_complete`). - Predicate (`.with_predicate(...)`): loop while a `should_continue` callable returns True (e.g. paired with `todos_remaining`/`background_tasks_running`). - Judge (`.with_judge(...)`): a second chat client decides whether the original request was answered, using a `JudgeVerdict` structured-output response. The loop also auto-resolves pending function-approval / user-input requests via an `on_approval_request` callable (bounded by `max_approval_rounds`), and the next iteration's input is controlled by `next_message`. Supports both streaming and non-streaming runs. Exports `AgentLoopMiddleware`, `JudgeVerdict`, `todos_remaining`, and `background_tasks_running`. Adds tests, a sample, and docs. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Python: Refine AgentLoopMiddleware API and sample - with_judge: add criteria list with {{criteria}} templating into judge instructions plus an agent-side instruction; add fresh_context, additional judge feedback relay; default judge max_iterations. - should_continue is now required and positional; supports (bool, str|None) feedback tuples surfaced to next_message/record_feedback via feedback kwarg. - Judge forwards full multi-modal request and response messages. - Default max_iterations=10 (explicit None = unbounded); removed is_complete and Ralph terminology; ShouldContinueResult is a real TypeAlias. - Sample: stream all loops, print iteration counts via injected user-block boundaries (robust to function calling), <role>: content formatting, per-method expected output, and a looping todo sample. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Python: Fix CI checks for AgentLoopMiddleware - Resolve pyright errors in _loop.py: drop the always-true final_result None check (the while loop always assigns it) and cast finish_reason to the AgentResponse constructor's expected type. - Apply pyupgrade --py310-plus: import TypeAlias from typing. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Python: Resolve mypy/pyright disagreement on finish_reason pyright infers AgentResponse.finish_reason as including str and rejects the direct assignment, while mypy considers a cast redundant. Drop the cast and suppress only pyright with a targeted reportArgumentType ignore, satisfying both type checkers. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Python: Add todo+judge AgentLoopMiddleware sample Add a second AgentLoopMiddleware sample that composes two criteria in one should_continue predicate: a TodoProvider check (evaluated first) and a report-style judge chat client (evaluated once todos are complete) that grades the assembled report against shared requirements. Register it in the middleware samples README. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Python: Compose todo+judge loops as two middleware Rework the todo+judge sample to compose two AgentLoopMiddleware on the agent itself (middleware=[judge_loop, todo_loop]) instead of a single hand-written predicate. The inner todos_remaining loop drafts the report todo-by-todo and the outer with_judge loop re-runs it until an editor chat client judges the report publication-ready, reusing the built-in helpers. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Reset session for fresh_context loops via snapshot/restore AgentLoopMiddleware.fresh_context previously only reset context.messages, so with an attached session each iteration still reloaded the local transcript or re-threaded the service-side conversation id and the model saw the accumulated history. Snapshot the session once before the loop (via to_dict) and restore it (from_dict + field copy) between iterations, so every pass starts from the pre-loop baseline. The final iteration's pass is persisted (no restore after the terminating iteration), so a subsequent agent.run continues from there. Removed the obsolete warning, updated docstrings and core AGENTS.md, and added tests: a snapshot/restore round-trip, a session-reset streaming x fresh_context x inject_progress x store matrix across multiple runs and loop iterations, and response_format parsing across the loop. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Updated samples and docstrings --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Eduard van Valkenburg ·
2026-06-12 14:35:54 +00:00 -
Tao Chen ·
2026-06-11 15:28:27 -07:00 -
Tao Chen ·
2026-06-11 15:27:36 -07:00 -
Python: Integrate shell tool into harness agent (#6451)
* Integrate shell tool into AgentHarness * Validate shell_executor exposes as_function() with a clear TypeError Addresses PR review feedback: a public factory should fail fast with an actionable error rather than a cryptic AttributeError when an incompatible shell_executor is supplied. Validation happens upfront, regardless of whether the client supports shell tools. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Type shell harness params via TYPE_CHECKING import Addresses PR review feedback: type shell_executor and shell_environment_provider_options instead of Any, using a TYPE_CHECKING import from agent_framework_tools.shell. The import never executes at runtime, so there is no circular dependency, and the lazy runtime import of ShellEnvironmentProvider is retained. Since ShellExecutor is a protocol without as_function(), the validated getattr result is invoked directly. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
westey ·
2026-06-11 20:51:59 +00:00 -
Tao Chen ·
2026-06-11 11:43:41 -07:00 -
Tao Chen ·
2026-06-11 11:14:36 -07:00 -
Python: Add tool approval middleware (#6414)
* Add Python tool approval middleware Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix tool approval restored state handling Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Gate hidden approvals on explicit approval responses Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Handle string inputs in approval replay scan Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Cover argument-scoped approval rules Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Refine tool approval state and budgets Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix tool approval PR CI failures Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Revert DevUI Aspire README link change Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Eduard van Valkenburg ·
2026-06-11 17:35:44 +00:00 -
Python: [Generated by SRE Agent] Fix MCP allowed_tools empty list handling (#6296)
* Fix MCP allowed_tools empty list handling When allowed_tools is set to an empty list [], the falsy check 'if not self.allowed_tools' incorrectly treats it as unconfigured (same as None), causing all tools to be exposed. Change to an explicit 'is None' check so that an empty list correctly results in no tools being allowed. Co-authored-by: Azure SRE Agent <noreply@microsoft.com> * Clarify allowed_tools docstring: None vs [] semantics Per Eduard's review on PR #6296: explicitly document that None exposes all tools and [] exposes none, across all four MCPTool / MCPStdioTool / MCPStreamableHTTPTool / MCPWebsocketTool docstrings. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * allowed_tools docstring: recommend load_tools=False for full disable Per Eduard's follow-up on PR #6296: `load_tools=False` is the cleaner idiom when you don't want to expose any tools. Reframe `allowed_tools=[]` in the docstring as a runtime guard / inspection-only path and cross-reference `load_tools`. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Azure SRE Agent <noreply@microsoft.com> Co-authored-by: Giles Odigwe <79032838+giles17@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
chetantoshniwal ·
2026-06-11 06:46:46 +00:00 -
Tao Chen ·
2026-06-10 22:20:10 -07:00 -
Tao Chen ·
2026-06-10 16:36:15 -07:00 -
Python: HarnessAgent: Disable compaction when max tokens not provided (#6410)
* HarnessAgent: Disable compaction when max tokens not provided * Fix regression. * Address PR comments * Require max_output_tokens to be positive Reject max_output_tokens=0 (must be positive), mirroring max_context_window_tokens. Addresses PR review feedback. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
westey ·
2026-06-10 13:57:23 +00:00 -
Python: Parse MCP CallToolResult.structuredContent field to prevent tool results returning None (#6421)
* Parse structuredContent from MCP CallToolResult (#3313) The _parse_tool_result_from_mcp method only iterated over the content field from CallToolResult, ignoring the structuredContent field entirely. MCP servers that return JSON data via structuredContent (e.g., Power BI MCP) appeared to return None. Add handling for structuredContent: when present, serialize it as JSON text and append it to the result list. This preserves the data for the LLM while maintaining backward compatibility with existing behavior. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Python: Parse MCP CallToolResult.structuredContent field to prevent tool results returning None Fixes #3313 * Address review feedback: add default=str to json.dumps and remove .checkpoints/ - Add default=str to json.dumps for structuredContent serialization so non-JSON-serializable values (e.g. bytes) degrade gracefully instead of raising TypeError - Remove all .checkpoints/ runtime artifacts from the repository - Add **/.checkpoints/ to .gitignore to prevent future accidental commits - Add test for non-serializable structuredContent values Fixes #3313 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Address review feedback for #3313: Python: MCP CallToolResult.structuredContent field is not parsed, causing tool results to return None --------- Co-authored-by: Copilot <copilot@github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Giles Odigwe ·
2026-06-10 12:51:09 +00:00 -
Python: [BREAKING] Add sampling guardrails to MCP tools (#6413)
* Add sampling guardrails to MCP tools Add approval, token, and request-count controls to the MCP sampling callback used when an MCPTool is configured with a chat client. - Add `sampling_approval_callback`, `sampling_max_tokens`, and `sampling_max_requests` parameters to `MCPTool` and its `MCPStdioTool`, `MCPStreamableHTTPTool`, and `MCPWebsocketTool` subclasses, positioned directly after `client`. - Gate each server-initiated `sampling/createMessage` request behind the approval callback, which denies by default when no callback is provided. - Clamp the requested `maxTokens` to `sampling_max_tokens` and enforce a per-session request count via `sampling_max_requests`. - Log incoming sampling requests at WARNING level (counts only). - Export `SamplingApprovalCallback` from the public API. - Add tests, a sample, and documentation updates. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Make sampling denial message context-aware Distinguish the deny-by-default case (no approval callback configured) from an explicit denial by a configured `sampling_approval_callback`, so the returned ErrorData message is accurate for callback-driven denials and exceptions. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Eduard van Valkenburg ·
2026-06-10 10:17:36 +00:00 -
Tao Chen ·
2026-06-09 11:02:17 -07:00 -
Tao Chen ·
2026-06-09 09:38:45 -07:00 -
Tao Chen ·
2026-06-09 09:27:21 -07:00 -
Python: Filter MCP tool kwargs to declared params via allowlist (#6399)
* Filter MCP tool kwargs to declared params via allowlist Previously MCPTool combined framework runtime kwargs (from FunctionInvocationContext.kwargs) with the LLM-supplied arguments and stripped only a hardcoded denylist of known framework keys before forwarding to the MCP server. Any new framework-injected kwarg leaked to the server unless the denylist was updated. Switch to an allowlist built from each tool's declared parameters (inputSchema.properties). Only declared params are forwarded; everything else is stripped. Add an `additional_tool_argument_names` constructor argument so users can opt extra names back in, globally (Sequence[str]) and/or per remote tool name (Mapping with reserved "*" global key). The existing denylist is kept as a safety net for framework-named params a server declares in its schema; explicitly opted-in extras always win. The reserved _meta handling is unchanged. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Address MCP allowlist review comments and fix reload arg loss - Fix pyright reportUnknownArgumentType in _load_tools (cast schema properties). - Register declared param names before the existing-tool skip guard so that tool-list reloads preserve the allowlist for already-loaded tools (previously unchanged tools silently dropped all declared args after a background reload). - Handle bare-string values in an additional_tool_argument_names mapping instead of iterating their characters. - Clarify the framework denylist comment: explicit extras override the denylist. - Make the extras-override-denylist test unambiguous (opt in a denylisted name). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Eduard van Valkenburg ·
2026-06-09 07:37:11 +00:00 -
Python: Fix per-service-call history persistence with server-storing clients (#6310)
* Fix per-service-call history persistence with server-storing clients When an Agent set require_per_service_call_history_persistence=True together with a HistoryProvider, and the chat client stored history server-side by default (e.g. OpenAIChatClient, STORES_BY_DEFAULT=True), the external history provider was silently never persisted. Unify persistence on the per-service-call middleware: when the flag is set and a HistoryProvider exists, the middleware is always installed and owns persistence. service_stores_history now only selects middleware behavior: - service does not store: load providers and drive the function loop with a local sentinel conversation id, or - service stores: skip loading (the service owns history) and persist each service call while the real conversation id flows through. Also rationalize chat-options handling in _prepare_run_context: - _merge_options now skips None overrides and strips remaining None values, so an unset `store` is never forwarded and the service decides its own default. - Resolve `store` and `conversation_id` once from a single combined view (effective_options) instead of probing both default and runtime dicts; the auto-injection and per-service-call resolution now agree on conversation_id. Fixes #5798 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Correct as_agent() docstring: persistence is per service call, not once per run Address PR review: when the client stores history server-side, the per-service-call middleware still persists after each model call; only provider loading is skipped. The previous "persist once per run()" wording contradicted the implementation. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Address PR review: docs, missing-conversation-id warning, and tests - Clarify that require_per_service_call_history_persistence is a no-op when no HistoryProvider is present (docstrings in _agents.py and _clients.py). - Warn on every service call when the client stores history server-side but returns no conversation_id, so the (uncommon) loss of cross-turn resumability cannot fail silently. - Add tests: storing client + existing conversation_id does not raise and the id propagates; two runs on the same session keep persisting with a stable service_session_id and no provider loading; storing-without-conversation-id warns per call. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Eduard van Valkenburg ·
2026-06-09 05:47:57 +00:00 -
Tao Chen ·
2026-06-08 16:30:59 -07:00 -
Tao Chen ·
2026-06-08 13:42:42 -07:00 -
Tao Chen ·
2026-06-05 16:29:19 -07:00 -
Python: feat(python): Add MCP client OTel spans per GenAI semantic conventions (#6349)
* feat(python): Add MCP client OTel spans per GenAI semantic conventions Implement MCP client spans per the OTel GenAI Semantic Conventions for MCP (https://opentelemetry.io/docs/specs/semconv/gen-ai/mcp/#client). Operations instrumented: - initialize: CLIENT span capturing MCP session setup - tools/list: CLIENT span for tool listing (per-page) - prompts/list: CLIENT span for prompt listing (per-page) - tools/call: CLIENT span (nested under execute_tool when called via FunctionTool) - prompts/get: CLIENT span Span attributes follow the MCP semantic conventions: - Required: mcp.method.name - Conditional: error.type, gen_ai.tool.name, gen_ai.prompt.name - Recommended: gen_ai.operation.name, mcp.protocol.version, mcp.session.id, network.transport, server.address, server.port Transport-specific attributes per subclass: - MCPStdioTool: network.transport=pipe - MCPStreamableHTTPTool: network.transport=tcp, network.protocol.name=http - MCPWebsocketTool: network.transport=tcp, network.protocol.name=websocket All span creation gated behind OBSERVABILITY_SETTINGS.ENABLED. Closes #3624 Closes #4697 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * refactor: simplify MCP spans — remove enrichment logic and protocol version caching - Always create nested CLIENT spans for tools/call instead of enriching the parent execute_tool span - Remove _ACTIVE_TOOL_EXECUTION_SPAN contextvar (no longer needed) - Remove enrich_span_with_mcp_attributes() helper - Remove _otel_error_type preservation in FunctionTool.invoke() - Remove _mcp_protocol_version instance variable; protocol version is only set on the initialize span where it is available Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Refine copilot solution * fix: enable automatic exception recording on MCP spans Remove record_exception=False and set_status_on_exception=False from create_mcp_client_span. Let OTel handle exception recording and status setting automatically. The manual set_mcp_span_error calls for tools/call still correctly set error.type (which OTel's automatic handling doesn't touch), so tool_error is preserved. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Reduce number of lines * Add comment to sample * test: address PR review comments on MCP observability tests - Fix initialize test to call mocked session.initialize() and read protocolVersion from the result instead of hardcoding it - Add tools/call McpError error-path test - Add prompts/get McpError error-path test Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix export error --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Tao Chen ·
2026-06-05 19:23:01 +00:00 -
Python: Refactor workflow as agent pending request handling (#6259)
* WIP: Refactor Workflow as agent pending request handling * WIP: debugging empty message bug * Working: Workflow as agent with function approval * Address Copilot comments * Fix mypy * Address comments and fix pipeline * Request info non function approval now becomes function call * Revert uv.lock * Fix mypy * Bump min version of azure-ai-project * Remove RequestInfoFunctionArgs * fix tests * Fix failing tests * Fix sample
Tao Chen ·
2026-06-05 17:23:19 +00:00 -
Python: MCP long-running task support in Python (#6319)
* MCP long-running task support in Python * Fix pyupgrade and AGENTS.md reconnect description - pyupgrade: drop forward-reference string annotations in _mcp.py (Python 3.10+ resolves them natively now that MCPTaskOptions is defined before use). - AGENTS.md: align reconnect description with current behavior. Phase 1 (initial tools/call) does NOT retry on connection loss; raises 'connection lost; task state unknown' instead, so a server that accepted the request but lost the response cannot start the operation twice. Phase 2 (tasks/get / tasks/result) still reconnects once against the same task_id. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix bandit nosec marker for CI pipeline * Address PR feedbacks * Clarifiied comments and addressed more PR feedbacks. --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Peter Ibekwe ·
2026-06-05 00:04:55 +00:00 -
Python: Fix compaction message-id collisions and tool-loop summary persistence (#6299)
* Fix compaction message-id collisions and tool-loop summary persistence Fixes two bugs in the compaction strategies: - #5237: incremental group annotation assigned message ids by position within the re-annotated slice, so moving the re-annotation start back to a previous group start restarted ids at 0 and produced collisions (e.g. a user message reusing an assistant message's id), merging groups and causing tool-result compaction to wrongly exclude messages. group_messages/_ensure_message_ids now take an id_offset and guard against existing-id collisions; annotate_message_groups threads the slice start index through as the offset. - #4991: the function-invocation loop copied the message list each iteration, so summaries inserted by compaction landed in a throwaway copy and were lost across tool-loop iterations (only the persistent excluded flags survived). _prepare_messages_for_model_call now compacts the list in place when messages is a list, so inserted summaries persist. Adds regression tests (incremental id uniqueness, existing-id collision avoidance, idempotency, and tool-loop summary persistence including streaming and conversation-id modes). Also adds a summarization.py sample demonstrating SummarizationStrategy directly with a real client, and reworks advanced.py with tool-call groups and a real summarizer. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Guard incremental message-id assignment against prefix-id collisions Addresses PR review on #5237: _ensure_message_ids only guarded against collisions within the re-annotated slice. A preexisting (e.g. user-supplied) id in the preserved prefix could still be reassigned in the suffix when the id was numerically out of position, merging groups across the re-annotation boundary again. group_messages/_ensure_message_ids now accept reserved_ids, and annotate_message_groups passes the preserved prefix's ids so auto-assigned suffix ids never collide across the full list. Adds a regression test reproducing the out-of-position prefix-id collision. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Eduard van Valkenburg ·
2026-06-04 08:37:59 +00:00 -
Python: run sync tools off the event loop (#5773)
* fix: run sync tools off event loop * chore: silence harness tool marker type check
Yufeng He ·
2026-06-04 04:42:08 +00:00 -
Python: Add MCP-based skills discovery (McpSkillsSource) (#6169)
* Add MCP-based skills discovery (McpSkill, McpSkillsSource, McpSkillResource) Implement Agent Skills discovery over MCP following the SEP-2640 convention: - McpSkillsSource: reads skill://index.json to discover skills served by an MCP server - McpSkill: lazily fetches SKILL.md content via resources/read on demand - McpSkillResource: wraps MCP resource results (text and binary) - Path traversal protection in get_resource for defense in depth - Samples for Foundry Toolbox and standalone MCP skills server - Comprehensive unit tests (514 lines) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Address PR review comments: rename to MCP* convention, fix error handling and samples - Rename McpSkill/McpSkillResource/McpSkillsSource to MCPSkill/MCPSkillResource/MCPSkillsSource - Add data-URI prefix stripping for blob resource decoding - Let non-McpError exceptions propagate from get_resource() - Fix contradictory test comment - Use interactive input() in mcp_based_skill sample - Remove misleading sample output block Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Restore debug logging for McpError in get_resource() Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Use AzureCliCredential in Foundry toolbox skills sample for consistency Replace DefaultAzureCredential with AzureCliCredential to match the credential convention used in all other samples. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Use MCPStreamableHTTPTool in MCP skills sample Replace raw mcp library imports (ClientSession, streamable_http_client) with the framework's MCPStreamableHTTPTool to keep MCP server connections consistent regardless of whether skills are enabled. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Branch on McpError.error.code so only not-found errors return empty Previously _try_read_index() and get_resource() swallowed every McpError as 'no skills available', making auth failures, server crashes, and connection drops indistinguishable from a server that simply has no skills. Now only two codes are treated as not-found: - -32002 (MCP-spec Resource not found) - -32601 (METHOD_NOT_FOUND — server lacks resources/read) All other McpError codes and non-McpError exceptions propagate with a warning log, surfacing real failures visibly. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add tests for non-McpError and non-not-found error propagation in MCP skills Cover the re-raise branch in MCPSkill.get_resource for plain ConnectionError/TimeoutError, the generic McpError (code 0) propagation on get_resource, and TimeoutError propagation in _try_read_index. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Revert "Use MCPStreamableHTTPTool in MCP skills sample" This reverts commit
f31ed0ded9. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Introduce MCP_SKILLS experimental feature for MCP skill classes Add a separate MCP_SKILLS feature ID to ExperimentalFeature enum and use it for MCPSkillResource, MCPSkill, and MCPSkillsSource, since their promotion timeline is partly outside of our control. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>semenshi-m ·
2026-06-03 18:09:50 +00:00 -
Python: progressive tool exposure via FunctionInvocationContext (#6233)
* Python: progressive tool exposure via FunctionInvocationContext Add first-class progressive tool exposure to the Python core function-calling loop. Tools can now add or remove real FunctionTool schemas at runtime via the injected FunctionInvocationContext, taking effect on the next iteration of the loop. - FunctionInvocationContext gains a live `tools` list plus experimental `add_tools()` / `remove_tools()` helpers (feature: PROGRESSIVE_TOOLS). - The function-calling loop establishes a run-local, normalized tools list and threads it into the context at both invocation paths so mutations propagate. - Add a sample (dynamic_tool_exposure.py) and a tools samples README, including a note that CodeAct providers (Monty/Hyperlight) use their own provider-level tool management instead. Supersedes #3877. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Validate non-negative input in dynamic_tool_exposure sample tools Address review feedback: factorial and fibonacci now return an error message for negative n instead of producing incorrect results. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Make add_tools atomic and surface swallowed function errors Address review feedback on progressive tool exposure: - add_tools now validates the full batch against a throwaway copy before committing, so a duplicate-name clash partway through a sequence leaves the live tool list unchanged (all-or-nothing). - _auto_invoke_function now logs a warning (with traceback) when a tool raises, so contract errors such as a duplicate-name ValueError from add_tools are debuggable without enabling include_detailed_errors. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Avoid retaining tracebacks when logging swallowed function errors Logging with exc_info=exc fed the exception traceback to the logging machinery, whose frame references created reference cycles collected lazily by the cyclic GC. On Windows that could drop a hyperlight WasmSandbox on a non-owning thread ("unsendable, dropped on another thread"), crashing the xdist worker. Log a pre-formatted message with the exception repr instead, so no traceback object is retained. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * added missing decorator --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Eduard van Valkenburg ·
2026-06-03 09:01:07 +00:00 -
Python: Fix OTLP HTTP base-endpoint losing /v1/{signal} auto-append (#5913)
* Python: Fix OTLP HTTP base-endpoint losing /v1/{signal} auto-append Per the OTel spec, OTEL_EXPORTER_OTLP_ENDPOINT is a *base* URL for HTTP — the SDK auto-appends /v1/traces, /v1/metrics, /v1/logs when it reads the env var directly. Signal-specific endpoint env vars are *full* URLs used verbatim. _get_exporters_from_env read the base endpoint and forwarded it as the constructor ``endpoint=`` argument, which the SDK always treats as a full signal URL. As a result, with OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 and HTTP protocol, the exporter sent to http://localhost:4318 instead of http://localhost:4318/v1/traces (and likewise for metrics/logs). Replicate the spec's auto-append here when falling back to the base endpoint under HTTP. gRPC behavior is unchanged. * Python: Fix mypy type errors in OTLP endpoint assignment Pre-declare traces_endpoint, metrics_endpoint, logs_endpoint as str | None before the if/else block. Mypy inferred str from the if-branch f-string assignments and then rejected the str | None expressions in the else-branch as incompatible.Dineshsuriya D ·
2026-06-02 09:59:50 +00:00 -
Python: feat(evals): Foundry Adaptive Evals integration (rubric-generation) (#6101)
* Python: feat(evals): RubricScore type + EvalScoreResult.dimensions Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Python: feat(foundry-evals): RubricDimension + GeneratedEvaluatorRef + accept in evaluators= Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Python: feat(evals): parse rubric_scores from output items + assertion helpers Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Python: feat(evals): BaseAgent.as_eval_source / Workflow.as_eval_source Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Python: feat(foundry-evals): EvalGenerationSource + generate_rubric helper Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Python: feat(foundry-evals): YAML config loader + sample Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Python: fix(evals): address PR review feedback Addresses 4 Copilot review comments on PR #6101: 1. assert_dimension_score_at_least: drop the (not evaluator or found_any) guard so require_applicable=True correctly raises when the named evaluator produces no entries for the dimension. Adds TestRubricAssertions covering the regression. 2. GeneratedEvaluatorRef docstring: reword to describe actual behaviour (pinning recommended, not required) so it matches the dataclass default and FoundryEvals warning path. 3. _poll_generation_job: switch from asyncio.get_event_loop() to get_running_loop() and bound the per-iteration sleep by remaining time, matching _poll_eval_run. 4. generate_rubric: type category as Literal['quality','safety'] and validate at the entry point with a ValueError; drop the silent 'invalid -> quality' rewrite in _generation_job_to_ref. Adds a regression test. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Python: feat(foundry-evals): hosted-agent-aware rubric generation * Auto-detect hosted Foundry agents in agent_as_eval_source: when the agent's chat_client exposes a string agent_name (the convention used by RawFoundryAgentChatClient for PromptAgents/HostedAgents), emit a type='agent' EvalGenerationSource so the service fetches instructions and tools from the agent registry instead of relying on the local wrapper (which holds neither for hosted agents). * Add hosted_agent_version kwarg and a new agent_version field on EvalGenerationSource so PromptAgent runs can pin to a specific hosted version for reproducible rubric generation. * Add force_prompt_source escape hatch to bypass auto-detection and always emit a rendered prompt dossier - useful when the local wrapper carries overrides the service-side agent doesnt see. * Fix _to_sdk_source for dataset sources: SDK ctor takes name=/version=, not dataset_name=/dataset_version=. The mismatch would raise TypeError against the real azure-ai-projects 2.3.0a* SDK; only unmocked integration paths were affected. Tests cover: auto-detection happy path, versionless hosted agent, explicit hosted_agent_version forwarding, force_prompt_source override, non-string chat_client attrs (MagicMock test doubles) not mis-detected, agent_version forwarded through _to_sdk_source, and the corrected dataset SDK kwarg names. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix(foundry-evals): accept canonical dimension_scores key per docs The published Foundry rubric-evaluator output (Microsoft Learn 'Rubric evaluators' reference) places per-dimension breakdowns under properties.dimension_scores, not properties.rubric_scores. The parser now tries dimension_scores first and falls back to rubric_scores for preview-build compatibility, and tolerates non-list payloads (e.g. MagicMock auto-attrs) by trying the next candidate when parsing yields zero entries. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * feat(foundry-evals): add manual create_rubric_evaluator Adds FoundryEvals.create_rubric_evaluator as the agent-framework surface over project_client.beta.evaluators.create_version. This is the manual counterpart to generate_rubric: callers supply RubricDimension instances (authored locally, ported from another framework, or hand-tuned) and we POST a RubricBasedEvaluatorDefinition. The service auto-attaches the non-editable residual dimension (general_quality for quality, general_policy_compliance for safety). Per the Microsoft Learn 'Rubric evaluators' reference, the auto-generation path (create_generation_job) is primarily a portal/UI feature; external SDK clients with rich local agent context are better served by manual create_version. This keeps generate_rubric for users who want to round-trip through a Foundry-registered agent. Validation up front: weight must be in [1,10], ids unique, descriptions non-empty, pass_threshold in [0,1]. The returned GeneratedEvaluatorRef is identical in shape to one obtained from generate_rubric, so downstream evaluators= lists work unchanged. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * samples(foundry-evals): manual rubric sample + namespace re-exports Adds evaluate_with_manual_rubric_sample.py demonstrating the end-to-end dev scenario for FoundryEvals.create_rubric_evaluator: hand-author a list of RubricDimension, register via create_rubric_evaluator, then use the pinned GeneratedEvaluatorRef alongside built-in evaluators in an agent regression run. Also re-exports RubricDimension, GeneratedEvaluatorRef, build_sources, and load_evals_config from agent_framework.foundry (both the lazy runtime shim and the type stub) so the rubric samples can import everything from a single namespace; the auto-generate sample was previously broken because the shim was missing build_sources / load_evals_config. Updates the foundry-evals README with a chooser entry for the two rubric paths. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * feat(foundry-evals): remove rubric creation flows; keep consumption only Reframes agent-framework as a pure consumer of Foundry rubric evaluators: scoring against rubrics that already exist (authored in the Foundry portal or via the dedicated SDK / REST surface) instead of creating them from the SDK. Removed creation surface area: - FoundryEvals.generate_rubric (auto-generate path) and create_rubric_evaluator (manual path), plus all _GenerationSdkTypes / _ManualRubricSdkTypes / _to_sdk_dimensions / _coalesce_generation_sources / _to_sdk_source / _poll_generation_job / _generation_job_to_ref / _evaluator_version_to_ref / _get_beta_evaluators / _import_*_sdk_types helpers. - EvalGenerationSource (the input source discriminator), RubricDimension (the input dimension type), agent_as_eval_source / workflow_as_eval_source / _detect_hosted_foundry_agent helpers, and the YAML-config loader (_evals_config.py with RubricGenerationSpec / RubricSourceSpec / parse_evals_config / load_evals_config / build_sources). - BaseAgent.as_eval_source / Workflow.as_eval_source plus the _render_agent_dossier / _render_workflow_dossier helpers in core. These existed only to feed the now-removed generation pipeline. - Samples evaluate_with_generated_rubric_sample.py, evaluate_with_manual_rubric_sample.py, and evaluators.yaml. Replaced with a short README section showing how to reference an existing rubric evaluator via GeneratedEvaluatorRef. Kept (consumption surface): - GeneratedEvaluatorRef, slimmed to (name, version, display_name). Still accepted alongside built-in evaluator strings in FoundryEvals(evaluators=[...]). Versionless refs still warn. - RubricScore on EvalScoreResult.dimensions plus EvalResults.assert_dimension_score_at_least for per-dimension CI gates. - _parse_dimension_entries / _extract_rubric_scores output parsing (both canonical dimension_scores and the legacy rubric_scores key). Tests: 160/160 foundry unit tests and 71/71 core local-eval tests pass; pyright is clean across changed files. The pre-existing tests/core/test_telemetry.py::test_detect_hosted_fallback_import_error failure is unrelated and reproduces on the prior commit. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * samples(foundry-evals): add evaluate_with_rubric_sample Adds a runnable end-to-end sample showing how to consume a pre-existing rubric evaluator created in Foundry: reference it with GeneratedEvaluatorRef(name, version), mix it with built-in evaluators in FoundryEvals, and gate CI with assert_dimension_score_at_least on a specific dimension. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix(foundry-evals): satisfy mypy on _fetch_output_items mypy infers OutputItemListResponse.sample as dict[str, object] | None while pyright correctly infers the typed Sample model. Cast to Any so both type checkers accept the attribute access pattern, rename the local to avoid shadowing the inner-loop sample binding, and drop the now-stale pyright suppressions. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * docs(foundry-evals): drop unpublished rubric-evaluators learn.microsoft.com link The Adaptive Evals authoring docs are not yet published on Microsoft Learn, so the link 404s. Keep the descriptive text without the broken hyperlink; we can re-add it once the docs ship. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * test(foundry-evals): hoist repeated local imports to module top Per code review feedback (eavanvalkenburg): the test file repeated 'from agent_framework_foundry._foundry_evals import ...' inside 22 test bodies and 'from agent_framework_foundry import GeneratedEvaluatorRef' inside 8 more. Move all of them to the existing top-level imports; the symbols are the same across tests and the local imports were redundant. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Ben Thomas <25218250+alliscode@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Ben Thomas ·
2026-06-01 23:01:56 +00:00 -
Python: Fix core observability unsafe serialization of function-call arguments containing dataclass/framework objects (#6026)
* fix: safely serialize function-call arguments in core observability Apply make_json_safe() to content.arguments in _to_otel_part() before building the otel message dict, so that dataclass/framework payloads (e.g. workflow request_info events) do not cause a TypeError when _capture_messages() calls json.dumps(). Lift make_json_safe() into agent_framework._serialization (no new external deps — dataclasses/datetime only) so the core observability path can use it without a dependency on the ag-ui adapter. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix(core): safely serialize workflow request_info payloads in observability (#5733) - Add make_json_safe() helper to recursively convert non-serializable objects - Use make_json_safe() in _to_otel_part() for function_call arguments - Fix CustomPayload test class to use @dataclass (resolves B903 lint error) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix(serialization): guard callability and normalize dict keys in make_json_safe (#5733) - Use callable(getattr(obj, method, None)) instead of hasattr() so that non-callable attributes named model_dump/to_dict/dict do not raise TypeError at runtime. - Wrap each call in try/except TypeError to handle callables with mandatory arguments gracefully. - Convert dict keys to str() so that non-string keys (e.g. datetime, int) cannot cause json.dumps to raise TypeError. - Add regression tests for both scenarios. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Address observability serialization review feedback --------- Co-authored-by: Copilot <copilot@github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Evan Mattson ·
2026-06-01 21:41:52 +00:00 -
Python: refresh dev dependencies and validate runtime bounds (#6238)
Updates third-party dev dependencies across the Python workspace and validates that all runtime dependency bounds still hold at both ends. Dev dependency bumps (root, lab, declarative, durabletask): - uv 0.11.6 -> 0.11.17, ruff 0.15.8 -> 0.15.15, pytest-asyncio 1.3.0 -> 1.4.0, mcp 1.27.0 -> 1.27.2, azure-monitor-opentelemetry 1.8.7 -> 1.8.8, poethepoet 0.42.1 -> 0.46.0, prek 0.3.9 -> 0.4.3, types-python-dateutil and types-PyYaml stub bumps. - Transitive Dependabot items swept via lock: idna 3.11 -> 3.17, pip 26.0.1 -> 26.1.2. Deliberately excluded: - opentelemetry-sdk stays 1.40.0: azure-monitor-opentelemetry (incl. 1.8.8) hard-pins opentelemetry-sdk==1.40. - mypy stays 1.20.0 and pyright stays 1.1.408: the 2.1.0 / 1.1.409 bumps introduce new diagnostics that fail type checking and need dedicated PRs. - rich kept as a range: agentlightning (lab[lightning]) forces rich==13.9.4. Code/formatting changes driven by the ruff upgrade: - devui lifespan now uses try/finally so shutdown cleanup always runs (ruff RUF075). - Removed unused TYPE_CHECKING imports in core and foundry flagged by ruff 0.15.15. - Reapplied ruff 0.15.15 formatting to the files it changed. Validation: validate-dependency-bounds-test "*" passes (31/31 lower + 31/31 upper); typing 62/62; lint 31/31; devui tests pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Eduard van Valkenburg ·
2026-06-01 17:53:56 +00:00 -
Python: Add background agent support to harness agent (#6155)
* Add background agent support to harness agent * Address PR comments
westey ·
2026-06-01 17:20:39 +00:00 -
Python: coalesce code interpreter history chunks (#5801)
* fix: coalesce code interpreter history chunks * fix: narrow content item list types * fix: remove redundant content list casts
Yufeng He ·
2026-06-01 13:26:20 +00:00 -
Python: consolidate MCP reliability fixes (#6145)
* Python: consolidate MCP reliability fixes Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix MCP cleanup and metadata typing Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Satisfy MCP metadata mypy typing Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix Pyright metadata mapping type Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Eduard van Valkenburg ·
2026-05-29 07:21:14 +00:00 -
Python: Adding AgentFileStore and FileAccessProvider to support file access operations. (#6099)
* Adding AgentFileStore and FileAccessProvider to support file ased operations for agents. * Address PR review feedback on FileAccessProvider - Probe symlinks on the unresolved candidate path so in-root symlinks cannot silently pass and out-of-root symlinks surface the correct error message. - Validate matching_lines elements in FileSearchResult.from_dict and raise a clean ValueError for non-mapping entries. - Cap search regex pattern length (256 chars) via a new _compile_search_regex helper to mitigate ReDoS, and surface the cap in the file_access_search_files tool description. - Skip non-UTF-8 files during filesystem search instead of aborting the entire directory walk. - Replace the module-scope trailing string in the data-processing sample with comments to avoid Ruff B018. - Remove the checked-in working/region_totals.md sample artifact so the save flow works from a clean checkout. - Expand the Windows stdout reconfiguration comment in task_runner.py for clarity. - Add tests for invalid/oversize regex, non-UTF-8 file search, and in-root symlink rejection. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix mypy redundant-cast in FileSearchResult.from_dict Use cast(list[object], ...) instead of cast(list[Any], ...) so the cast represents a real type change (lists are invariant) and is no longer flagged by mypy as redundant, while still satisfying pyright's reportUnknownVariableType. Matches the existing pattern in _memory.py. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Tighten path normalization and directory resolution in FileAccess - _normalize_relative_path now strips surrounding whitespace up front so leading/trailing spaces never leak into file segments, and rejects trailing path separators for file paths so 'foo/' is no longer silently coerced to 'foo'. - FileSystemAgentFileStore._resolve_safe_directory_path normalizes with is_directory=True and maps an empty normalized result to the root. This matches InMemoryAgentFileStore so whitespace-only directory inputs resolve to the root instead of raising. - Added tests for whitespace stripping, trailing-separator rejection, and whitespace-only directory listing on the filesystem store. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Harden FileAccess search and atomic save in store API - Add wall-clock timeout (10s) around regex scans so a pathological pattern (e.g. `(a+)+`) below the length cap cannot stall the event loop. - Offload the InMemoryAgentFileStore regex scan to a worker thread, matching the filesystem store. - Fail closed when `Path.is_symlink` raises during the safe-path probe so a permission error cannot silently bypass the symlink/reparse-point rejection. - Add `overwrite: bool = True` to `AgentFileStore.write_file`; the in-memory store performs the check under the existing lock and the filesystem store uses `open(mode='x')` so concurrent callers cannot race past `overwrite=False`. - `file_access_save_file` now relies on the atomic store call instead of a separate `file_exists` round-trip. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix Python 3.10 timeout handling and add directory arg to list/search tools - Catch asyncio.TimeoutError in _run_search_with_timeout. In Python 3.10 asyncio.wait_for raises asyncio.exceptions.TimeoutError, which is distinct from the builtin TimeoutError (the two were unified in 3.11). Catching the asyncio alias works on every supported version. - Add an optional directory parameter to file_access_list_files and file_access_search_files so agents can enumerate / scope searches to nested folders, not just the store root. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Address FileAccess review feedback: case, errors, signal, TOCTOU - InMemoryAgentFileStore now stores (display_name, content) so list_files and search_files return the original-case names callers wrote, matching the behaviour of FileSystemAgentFileStore on case-preserving filesystems and removing the silent in-memory vs. on-disk contract divergence. - FileSystemAgentFileStore.read_file raises ValueError instead of letting UnicodeDecodeError bubble for binary / non-UTF-8 input, restoring symmetry with search_files (which still skips) and giving the tool layer a recoverable type to translate. - Tool wrappers now catch ValueError and OSError around every operation and surface them as readable strings, so 'you used ..' and 'the file already exists' are both reported to the model the same way instead of the former crashing out as an unhandled exception. - _search_files_sync logs per skipped non-UTF-8 file at WARNING and an aggregate INFO summary so operators can distinguish 'no matches' from 'half the corpus was unreadable'. - FileSystemAgentFileStore softens its docstrings to acknowledge the inherent probe-then-open TOCTOU window. On POSIX both read and write now pass O_NOFOLLOW so the kernel refuses if the leaf segment becomes a symlink between the probe and the open. Windows has no equivalent flag; the limitation is documented. - Tests cover: case preservation on list/search, ValueError on non-UTF-8 read at the store and tool layer, tool-layer string responses for path-traversal and oversized-regex inputs, search-skip log output, symlink rejection on delete/search/list, and symlinked intermediate directory rejection. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Address FileAccess nit comments: docstrings, enumerate, opt-in delete approval - Expand FileSearchMatch/FileSearchResult.to_dict docstrings to explain why the override is needed (__slots__ defeats the mixin's __dict__ iteration) and why exclude/exclude_none are accepted-but-ignored (mixin signature compatibility for callers like to_json). - Use enumerate(lines, start=1) in _search_file_content so the +1 below is no longer needed; rename loop variable to line_number for clarity. - Add opt-in require_delete_approval: bool = False on FileAccessProvider. When True, file_access_delete_file is registered with approval_mode 'always_require' so the host must approve every delete. Default False preserves current behaviour and matches the .NET reference, but deployments that want a safer-by-default posture can enable it. - Add tests covering both delete approval modes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * FileAccess: require delete approval by default Flip the default for FileAccessProvider(require_delete_approval=...) from False to True so destructive deletes are gated by host approval out of the box. Callers that want the previous autonomous behaviour (which matches the .NET reference) can pass require_delete_approval=False. Tests updated accordingly. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fixing linkinspector by installing Chrome for puppeteer first. --------- Co-authored-by: Ben Thomas <25218250+alliscode@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Ben Thomas ·
2026-05-28 20:09:50 +00:00 -
Tao Chen ·
2026-05-28 20:03:46 +00:00 -
Python: [Breaking] Refactor Skill API to async resource and script lookup (#6135)
Port of .NET commit
08541ee5a9. Replace property-based Skill.content/resources/scripts with async by-name lookup methods: - content property -> async get_content() -> str - resources property -> async get_resource(name) -> SkillResource | None - scripts property -> async get_script(name) -> SkillScript | None SkillsProvider now always includes all three tools (load_skill, read_skill_resource, run_skill_script) and both instruction blocks regardless of whether any skills have resources or scripts. ClassSkill retains resources/scripts properties as overridable hooks for subclass reflection-based discovery. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>semenshi-m ·
2026-05-28 15:54:20 +00:00 -
Python: Align c# and python TodoProvider tool names (#6107)
* Align c# and python TodoProvider tool names * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * Address PR review: remove __slots__ and add typed schemas for tool params - Remove __slots__ from TodoItem, TodoInput, and TodoCompleteInput classes (not needed for low-instance-count objects and hinders dev scenarios) - Add _TodoAddItemSchema and _TodoCompleteItemSchema TypedDicts to provide proper JSON schema for todos_add and todos_complete tool parameters - Use typing_extensions for Python 3.10 compatibility Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
westey ·
2026-05-28 08:40:13 +00:00 -
Python: Add a HarnessAgent with available features and sample (#6041)
* Add a HarnessAgent with available features and sample * Fix formatting * Address PR comments and fix mypy error * Add web search support to HarnessAgent * Fix build warning * Apply suggestions from code review Co-authored-by: Eduard van Valkenburg <eavanvalkenburg@users.noreply.github.com> * Address PR comments * Address PR comments * Address further PR comments. * Fix markdown broken link --------- Co-authored-by: Eduard van Valkenburg <eavanvalkenburg@users.noreply.github.com>
westey ·
2026-05-27 14:54:00 +01:00 -
Python: Add a BackgroundAgentsProvider for python (#6069)
* Add a BackgroundAgentsProvider for python * Address PR comments and fix linting warnings * Address PR comment
westey ·
2026-05-27 09:12:01 +00:00 -
Python: Align ModeProvider tool names and instructions (#6071)
* Align ModeProvider tool names and instructions * Address PR comments
westey ·
2026-05-26 14:37:34 +00:00 -
Python: fix(core): point @experimental warnings at user code, not stdlib internals (#5996)
* fix(core): point @experimental warnings at user code, not stdlib internals Previously the wrappers installed by @experimental called warnings.warn with a fixed stacklevel=3. ABCMeta inserts an extra abc.__new__ frame when an experimental ABC is subclassed, so the warning landed inside abc.py (or <frozen abc>:106 on modern CPython) instead of the user's class Sub(...) line. Resolve the user frame by walking inspect.currentframe(), skipping frames whose module name is abc/functools/typing/contextlib (or submodules), then emit via warnings.warn_explicit so the recorded filename/lineno point at user code. Falls back to warnings.warn with stacklevel=2 if no user frame is found. Module-name matching is used because frozen stdlib modules report '<frozen abc>' as their filename. Also install a one-line warnings.formatwarning specifically for FeatureStageWarning so 'file:line: ExperimentalWarning: [ID] Name ...' prints without the secondary source-snippet line. Other categories delegate to the stdlib default formatter unchanged. Added a regression test that subclasses an @experimental ABC inside warnings.catch_warnings and asserts the recorded filename equals the test file. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix(core): address review feedback on @experimental warning fix - Make _install_feature_stage_formatter idempotent: tag the installed formatter with a marker attribute and short-circuit re-installation, so re-imports/reloads don't wrap the formatter on top of itself. Also expose the previous formatter via __wrapped__ for restoration. - Avoid leaking frame references in _resolve_user_frame: capture data into plain locals inside try and del frame/candidate in finally, per CPython's guidance on inspect.currentframe usage. - Drop redundant _WARNED_FEATURES.clear() in the new ABC subclass test (the autouse fixture already handles it). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * changed query for foundry web search test --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Eduard van Valkenburg ·
2026-05-22 12:07:10 +00:00 -
Python: Prevent duplicate system instructions in Python telemetry (#5981)
* Initial plan * Fix duplicated system instructions in Python telemetry * Clarify telemetry message filtering * test: cover separate and in-history system messages * Clarify observability message logging split * Simplify observability logging serialization * Harden observability regression test * Reuse observability span message serialization * Clarify observability logging loops * Polish observability message serialization * Tighten observability zip checks * Refactor observability message capture loop * Fix telemetry logging for separate system instructions * Refine observability OTEL message typing * Restore prepended-instruction logging path in _capture_messages * Revert logging change in _capture_messages; keep chat-history-only logging --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Copilot ·
2026-05-21 19:59:06 +00:00 -
[BREAKING] Python: Enable instrumentation by default (#5865)
* Enable instrumentation by default * Update samples * Optimization when span is not recording * Address Copilot comments * Revert uv.lock * Add warning * Formatting * Fix mypy * Add disable_instrumentation() with sticky user-intent semantics Add a public disable_instrumentation() entry point so users can explicitly opt out of Agent Framework telemetry, with a sticky-disable flag that makes the user's intent "leading" — no framework code path (foundry's configure_azure_monitor, configure_otel_providers, enable_instrumentation, enable_sensitive_telemetry, or direct OBSERVABILITY_SETTINGS.enable_* writes) can re-enable instrumentation until the user explicitly clears the disable with enable_instrumentation(force=True) / enable_sensitive_telemetry(force=True). Also addresses the two remaining unresolved review threads on the PR: 1. test_observability_settings_defaults_instrumentation_true pins the new "ENABLE_INSTRUMENTATION defaults to True when env unset" behavior. 2. test_enable_instrumentation_reads_env_sensitive_data restores coverage for the post-import load_dotenv() fallback path. Implementation: - ObservabilitySettings.enable_instrumentation / enable_sensitive_data become properties backed by _enable_*. While _user_disabled is True, the getters return False and the setters drop True writes (defense in depth so third- party writes can't subvert the disable). - Public is_user_disabled read-only property lets integrations (e.g. foundry's configure_azure_monitor) cheaply check the disable state without poking at privates. - enable_instrumentation() and enable_sensitive_telemetry() short-circuit with an info log when disabled; gain a force=True kwarg that clears the disable. - configure_otel_providers() still creates providers / exporters / views so a later force-enable can use them, but logs an info message when called while disabled. - Foundry's FoundryChatClient.configure_azure_monitor and FoundryAgent.configure_azure_monitor early-return when the user has disabled, so Azure Monitor's global providers aren't installed unnecessarily. Tests: 11 new tests covering default-on, env re-read at call time, sticky behavior against each re-enable surface (enable_instrumentation, enable_sensitive_telemetry, configure_otel_providers, direct attribute writes), force=True override, re-arming the disable, and the __all__ export. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * docs: document disable_instrumentation() and force=True paths Add a "Disabling instrumentation" section to the observability sample README that walks through: - The distinction between the ENABLE_INSTRUMENTATION env var (initial, non-sticky) and disable_instrumentation() (process-wide, sticky). - Why the sticky semantics matter: framework integrations like FoundryChatClient.configure_azure_monitor() can call enable_instrumentation() as part of their setup, and the user's opt-out needs to win. - All five surfaces guarded by the sticky disable (property reads, public enable functions, configure_otel_providers, direct attribute writes, is_user_disabled-aware integrations). - The force=True escape hatch on both enable_instrumentation() and enable_sensitive_telemetry(). - How third-party integrations should consult OBSERVABILITY_SETTINGS.is_user_disabled. - The limits of the disable (does not tear down existing providers / in-flight spans / third-party instrumentation, does not persist across processes). Cross-links the new section from the ENABLE_INSTRUMENTATION row in the env vars table. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * docs: soften disable_instrumentation() overclaim about telemetry guarantees Replace 'no telemetry will be emitted no matter what' (which is too strong, since callers can still pass force=True or mutate private attributes) with language framing the disable as a user-intent contract that library and framework code is expected to honor: the framework actively short-circuits the public enable paths, force=True and private-attribute writes are acknowledged as out-of-contract escape hatches that integrations should not use on the user's behalf. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * docs: correct observability Dependencies section - opentelemetry-sdk is no longer a hard dependency; it is lazily imported by create_resource(), create_metric_views(), and configure_otel_providers() with a clear ImportError when missing. Day-to-day instrumentation works with opentelemetry-api alone provided some other component configures the global OpenTelemetry providers (Azure Monitor, an APM agent, application bootstrap, etc.). - opentelemetry-semantic-conventions-ai is no longer used anywhere in the source; remove it from the listed dependencies. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * docs: replace stale observability migration guide with current PR's only relevant migration The old guide documented the move away from setup_observability(otlp_endpoint=...) which was an earlier-release API change unrelated to this PR and stale enough that it's more confusing than helpful at this point. Replace it with a short note on the single migration this PR introduces: callers of enable_instrumentation(enable_sensitive_data=True) should switch to enable_sensitive_telemetry(). Cross-link to the Disabling instrumentation section for the rare 'force on without enabling sensitive data' use case where enable_instrumentation() still applies. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Eduard van Valkenburg <eavanvalkenburg@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Tao Chen ·
2026-05-20 11:52:08 +00:00 -
Python: Skip MCP prompt loading when unsupported (#5370)
* Python: Skip MCP prompt loading when unsupported * Fix MCP pagination pyright checks * Simplify MCP support flag checks
Baidar ·
2026-05-20 11:50:26 +00:00