agent-framework

Merge branch 'main' into local-branch-python-add-reset-to-workflow

Tao Chen · 2026-06-15 09:50:34 -07:00

ea052ab511

Addres race condition when stream is dropped midway

Tao Chen · 2026-06-15 09:40:21 -07:00

d08cec5379

Python: Fix Python OTel usage detail attributes (#6493 )

* fix python otel usage detail attributes

Map cached/read/reasoning usage detail fields to standard OTel GenAI attributes while preserving provider-specific legacy keys.

Add focused coverage for direct response spans, aggregated agent spans, and provider usage parsing.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* address usage detail review feedback

Omit missing OpenAI Responses usage detail counts while preserving zero-valued counts.

Record zero-valued token usage in OTel histograms and add regression coverage.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Eduard van Valkenburg · 2026-06-15 07:10:14 +00:00

d7e8d2206d

Python: [BREAKING] Align FileAccess tools with .NET — directory discovery and recursive search (#6476 )

* Align FileAccess tools with .Net; add directory discovery and recursive search

* Fix choices field description: spacing, line length, grammar

Addresses PR review: separate concatenated string literals with proper
spacing/newlines, wrap lines under the 120-char Ruff limit, and fix
"doesn't" -> "don't".

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address PR comments

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

westey · 2026-06-15 06:55:21 +00:00

d7027fc1f9

Python: [Breaking] Additional bug fix for declarative workflows (#6489 )

* Fix declarative object parsing bug

* Remove unnecessary comment

* Address PR comments

* Address PR comments.

* Fix CI failures.

* declarative action approval bugfix

* Address PR comments

* Inlined single use variables.

Peter Ibekwe · 2026-06-12 16:58:35 +00:00

ed4ff188fc

Python: Add AgentLoopMiddleware for re-running agents in a loop (#6174 )

* Python: Add AgentLoopMiddleware for re-running agents in a loop

Add `AgentLoopMiddleware`, an `AgentMiddleware` that re-runs the wrapped
agent in a loop. A single configurable class covers three common patterns,
each with a convenience classmethod factory:

- Ralph loop (`.ralph(...)`): no exit criteria, with feedback tracking
  (`record_feedback`/`progress`), progress injection (`inject_progress`),
  optional fresh context per iteration (`fresh_context`), and an early-stop
  completion signal (`is_complete`).
- Predicate (`.with_predicate(...)`): loop while a `should_continue` callable
  returns True (e.g. paired with `todos_remaining`/`background_tasks_running`).
- Judge (`.with_judge(...)`): a second chat client decides whether the original
  request was answered, using a `JudgeVerdict` structured-output response.

The loop also auto-resolves pending function-approval / user-input requests via
an `on_approval_request` callable (bounded by `max_approval_rounds`), and the
next iteration's input is controlled by `next_message`. Supports both streaming
and non-streaming runs.

Exports `AgentLoopMiddleware`, `JudgeVerdict`, `todos_remaining`, and
`background_tasks_running`. Adds tests, a sample, and docs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: Refine AgentLoopMiddleware API and sample

- with_judge: add criteria list with {{criteria}} templating into judge
  instructions plus an agent-side instruction; add fresh_context, additional
  judge feedback relay; default judge max_iterations.
- should_continue is now required and positional; supports (bool, str|None)
  feedback tuples surfaced to next_message/record_feedback via feedback kwarg.
- Judge forwards full multi-modal request and response messages.
- Default max_iterations=10 (explicit None = unbounded); removed is_complete and
  Ralph terminology; ShouldContinueResult is a real TypeAlias.
- Sample: stream all loops, print iteration counts via injected user-block
  boundaries (robust to function calling), <role>: content formatting, per-method
  expected output, and a looping todo sample.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: Fix CI checks for AgentLoopMiddleware

- Resolve pyright errors in _loop.py: drop the always-true final_result None
  check (the while loop always assigns it) and cast finish_reason to the
  AgentResponse constructor's expected type.
- Apply pyupgrade --py310-plus: import TypeAlias from typing.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: Resolve mypy/pyright disagreement on finish_reason

pyright infers AgentResponse.finish_reason as including str and rejects the
direct assignment, while mypy considers a cast redundant. Drop the cast and
suppress only pyright with a targeted reportArgumentType ignore, satisfying
both type checkers.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: Add todo+judge AgentLoopMiddleware sample

Add a second AgentLoopMiddleware sample that composes two criteria in one
should_continue predicate: a TodoProvider check (evaluated first) and a
report-style judge chat client (evaluated once todos are complete) that grades
the assembled report against shared requirements. Register it in the middleware
samples README.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: Compose todo+judge loops as two middleware

Rework the todo+judge sample to compose two AgentLoopMiddleware on the agent
itself (middleware=[judge_loop, todo_loop]) instead of a single hand-written
predicate. The inner todos_remaining loop drafts the report todo-by-todo and the
outer with_judge loop re-runs it until an editor chat client judges the report
publication-ready, reusing the built-in helpers.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Reset session for fresh_context loops via snapshot/restore

AgentLoopMiddleware.fresh_context previously only reset context.messages,
so with an attached session each iteration still reloaded the local
transcript or re-threaded the service-side conversation id and the model
saw the accumulated history. Snapshot the session once before the loop
(via to_dict) and restore it (from_dict + field copy) between iterations,
so every pass starts from the pre-loop baseline. The final iteration's
pass is persisted (no restore after the terminating iteration), so a
subsequent agent.run continues from there.

Removed the obsolete warning, updated docstrings and core AGENTS.md, and
added tests: a snapshot/restore round-trip, a session-reset
streaming x fresh_context x inject_progress x store matrix across multiple
runs and loop iterations, and response_format parsing across the loop.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Updated samples and docstrings

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Eduard van Valkenburg · 2026-06-12 14:35:54 +00:00

1acd242550

Python: Add opt-in AG-UI thread snapshot persistence and hydration (#6471 )

* feat(ag-ui): add thread snapshot store primitives

Key decisions:\n- Introduce an AGUIThreadSnapshot model limited to replayable messages, optional Shared State, and optional interrupt state.\n- Define AGUIThreadSnapshotStore as an async protocol keyed by explicit Snapshot Scope and AG-UI Thread id.\n- Add InMemoryAGUIThreadSnapshotStore as memory-only, latest-only, bounded local/demo/test storage; no file-backed store is introduced.\n- Require snapshot_scope_resolver whenever an endpoint is configured with a snapshot store, including pre-wrapped runners, so thread ids are not authorization boundaries.\n\nFiles changed:\n- packages/ag-ui/agent_framework_ag_ui/_snapshots.py\n- packages/ag-ui/agent_framework_ag_ui/__init__.py\n- packages/ag-ui/agent_framework_ag_ui/_agent.py\n- packages/ag-ui/agent_framework_ag_ui/_workflow.py\n- packages/ag-ui/agent_framework_ag_ui/_endpoint.py\n- packages/core/agent_framework/ag_ui/__init__.py\n- packages/core/agent_framework/ag_ui/__init__.pyi\n- packages/ag-ui/tests/ag_ui/test_snapshots.py\n- packages/ag-ui/tests/ag_ui/test_endpoint.py\n- packages/ag-ui/tests/ag_ui/test_public_exports.py\n- packages/ag-ui/AGENTS.md\n\nVerification:\n- uv run pytest packages/ag-ui/tests/ag_ui/test_snapshots.py packages/ag-ui/tests/ag_ui/test_public_exports.py packages/ag-ui/tests/ag_ui/test_endpoint.py::test_endpoint_requires_snapshot_scope_resolver_when_store_configured packages/ag-ui/tests/ag_ui/test_endpoint.py::test_endpoint_accepts_snapshot_store_with_scope_resolver -q\n- uv run pytest packages/ag-ui/tests/ag_ui/test_endpoint.py::test_endpoint_requires_snapshot_scope_resolver_when_store_configured packages/ag-ui/tests/ag_ui/test_endpoint.py::test_endpoint_requires_snapshot_scope_resolver_when_wrapped_runner_has_store packages/ag-ui/tests/ag_ui/test_endpoint.py::test_endpoint_accepts_snapshot_store_with_scope_resolver -q\n- uv run poe syntax -P ag-ui -C\n- uv run poe pyright -P ag-ui\n- uv run poe syntax -P core -C\n- uv run poe pyright -P core\n- uv run poe typing -P ag-ui\n- uv run poe typing -P core\n- uv run poe test -P ag-ui\n- uv run poe check -P ag-ui\n- git diff --check\n- git diff --cached --check\n\nBlockers / next iteration:\n- No blockers. Next slice can use the store contract to capture and hydrate agent snapshots.\n- uv repeatedly refreshed azure-ai-projects in uv.lock during local runs; reverted the generated lockfile churn because this change does not alter dependencies.\n- The poe-check commit hook was skipped after manual verification because it reformatted unrelated core MCP files outside this task.

* feat(ag-ui): hydrate agent threads from snapshots

Key decisions:
- Resolve Snapshot Scope per endpoint request and pass it to the AG-UI runner only when snapshot storage is active.
- Treat empty messages with no resume payload as an agent Hydrate Request when a scoped snapshot store is configured, replaying stored Shared State and message snapshots without invoking the wrapped agent.
- Save the latest replayable agent message snapshot and Shared State at normal completion under Snapshot Scope plus AG-UI Thread id; no durable or file-backed store is introduced.

Files changed:
- packages/ag-ui/agent_framework_ag_ui/_agent_run.py
- packages/ag-ui/agent_framework_ag_ui/_endpoint.py
- packages/ag-ui/agent_framework_ag_ui/_snapshots.py
- packages/ag-ui/tests/ag_ui/test_endpoint.py

Verification:
- uv run pytest packages/ag-ui/tests/ag_ui/test_endpoint.py::test_agent_endpoint_hydrates_stored_thread_snapshot_without_invoking_agent -q
- uv run pytest packages/ag-ui/tests/ag_ui/test_endpoint.py::test_agent_endpoint_hydrates_stored_thread_snapshot_without_invoking_agent packages/ag-ui/tests/ag_ui/test_endpoint.py::test_agent_endpoint_hydrates_snapshots_by_scope_and_thread -q
- uv run pytest packages/ag-ui/tests/ag_ui/test_endpoint.py::test_endpoint_empty_messages packages/ag-ui/tests/ag_ui/test_endpoint.py::test_agent_endpoint_hydrates_stored_thread_snapshot_without_invoking_agent packages/ag-ui/tests/ag_ui/test_endpoint.py::test_agent_endpoint_hydrates_snapshots_by_scope_and_thread -q
- uv run poe syntax -P ag-ui -C
- uv run poe pyright -P ag-ui
- uv run poe typing -P ag-ui
- uv run poe test -P ag-ui
- uv run poe check -P ag-ui
- git diff --check
- git diff --cached --check

Blockers / next iteration:
- No blockers. Next slice can reconstruct normal new-user agent turns from stored snapshots.
- uv repeatedly refreshed azure-ai-projects in uv.lock during local runs; reverted the generated lockfile churn because this change does not alter dependencies.
- The poe-check commit hook was skipped after manual verification because it refreshed unrelated uv.lock dependency resolution.

* feat(ag-ui): reconstruct agent turns from snapshots

Key decisions:
- Load scoped thread snapshots for non-hydrate agent requests only when snapshot storage is active and no resume payload is present.
- Rebuild prior AG-UI history from stored snapshot messages, preserving the incoming new user suffix and treating stored snapshot content as authoritative over conflicting prior client history.
- Merge stored Shared State with request state overrides before schema defaults and existing state-context injection.

Files changed:
- packages/ag-ui/agent_framework_ag_ui/_agent_run.py
- packages/ag-ui/tests/ag_ui/test_endpoint.py

Verification:
- uv run pytest packages/ag-ui/tests/ag_ui/test_endpoint.py::test_agent_endpoint_prepends_stored_snapshot_for_new_user_turn -q
- uv run pytest packages/ag-ui/tests/ag_ui/test_endpoint.py::test_agent_endpoint_deduplicates_full_history_and_merges_fresh_state -q
- uv run pytest packages/ag-ui/tests/ag_ui/test_endpoint.py::test_endpoint_empty_messages packages/ag-ui/tests/ag_ui/test_endpoint.py::test_agent_endpoint_hydrates_stored_thread_snapshot_without_invoking_agent packages/ag-ui/tests/ag_ui/test_endpoint.py::test_agent_endpoint_hydrates_snapshots_by_scope_and_thread packages/ag-ui/tests/ag_ui/test_endpoint.py::test_agent_endpoint_prepends_stored_snapshot_for_new_user_turn packages/ag-ui/tests/ag_ui/test_endpoint.py::test_agent_endpoint_deduplicates_full_history_and_merges_fresh_state -q
- uv run pytest packages/ag-ui/tests/ag_ui/test_endpoint.py -q
- uv run poe syntax -P ag-ui -C
- uv run poe pyright -P ag-ui
- uv run poe test -P ag-ui
- uv run poe check -P ag-ui
- uv run poe typing -P ag-ui
- git diff --check
- git diff --cached --check

Blockers / next iteration:
- No blockers. Next slice can enable workflow AG-UI Thread Snapshot persistence and hydration.
- uv repeatedly refreshed azure-ai-projects in uv.lock during local runs; reverted the generated lockfile churn because this change does not alter dependencies.
- The poe-check commit hook was skipped after manual verification because it refreshes unrelated uv.lock dependency resolution.

* feat(ag-ui): hydrate workflow threads from snapshots

Key decisions:
- Handle workflow Hydrate Requests before resolving or invoking the wrapped workflow when snapshot storage and Snapshot Scope are active.
- Capture only replayable workflow protocol data: workflow-emitted state snapshots, workflow-emitted message snapshots, and synthesized messages from text/tool output.
- Keep workflow snapshot capture inactive without configured persistence, and skip saving snapshots when the workflow stream emits RUN_ERROR.

Files changed:
- packages/ag-ui/agent_framework_ag_ui/_workflow.py
- packages/ag-ui/tests/ag_ui/test_endpoint.py

Verification:
- uv run pytest packages/ag-ui/tests/ag_ui/test_endpoint.py::test_workflow_endpoint_hydrates_emitted_snapshots_without_invoking_workflow packages/ag-ui/tests/ag_ui/test_endpoint.py::test_workflow_endpoint_hydrates_synthesized_text_and_tool_snapshot -q
- uv run pytest packages/ag-ui/tests/ag_ui/test_endpoint.py -q
- uv run pytest packages/ag-ui/tests/ag_ui/golden/test_scenario_workflow.py -q
- uv run poe syntax -P ag-ui -C
- uv run poe pyright -P ag-ui
- uv run poe test -P ag-ui
- uv run poe typing -P ag-ui
- uv run poe check -P ag-ui
- git diff --check
- git diff --cached --check

Blockers / next iteration:
- No blockers. Next slice can preserve interruption state and protect snapshots on errors across agent and workflow endpoints.
- uv repeatedly refreshed azure-ai-projects in uv.lock during local runs; reverted the generated lockfile churn because this change does not alter dependencies.
- The poe-check commit hook was skipped after manual verification because it refreshes unrelated uv.lock dependency resolution.

* feat(ag-ui): preserve interrupted thread snapshots

Key decisions:
- Capture workflow RUN_FINISHED interrupt metadata in replayable AG-UI Thread Snapshots so Hydrate Requests can restore pending workflow actions without invoking or resuming the workflow.
- Keep failed agent and workflow runs from replacing the last good snapshot; RUN_ERROR streams leave the previous snapshot available for hydration.
- Verify interruption hydration through endpoint-level AG-UI streams for both agent and workflow wrappers, including Shared State replay and no wrapped runner invocation.

Files changed:
- packages/ag-ui/agent_framework_ag_ui/_workflow.py
- packages/ag-ui/tests/ag_ui/test_endpoint.py

Verification:
- uv run pytest packages/ag-ui/tests/ag_ui/test_endpoint.py::test_workflow_endpoint_hydrates_interrupted_thread_without_invoking_workflow -q
- uv run pytest packages/ag-ui/tests/ag_ui/test_endpoint.py::test_agent_endpoint_hydrates_interrupted_thread_without_invoking_agent packages/ag-ui/tests/ag_ui/test_endpoint.py::test_agent_endpoint_run_error_does_not_overwrite_previous_snapshot packages/ag-ui/tests/ag_ui/test_endpoint.py::test_workflow_endpoint_hydrates_interrupted_thread_without_invoking_workflow packages/ag-ui/tests/ag_ui/test_endpoint.py::test_workflow_endpoint_run_error_does_not_overwrite_previous_snapshot -q
- uv run pytest packages/ag-ui/tests/ag_ui/test_endpoint.py -q
- uv run pytest packages/ag-ui/tests/ag_ui/golden/test_scenario_workflow.py -q
- uv run poe syntax -P ag-ui -C
- uv run poe pyright -P ag-ui
- uv run poe test -P ag-ui
- uv run poe typing -P ag-ui
- uv run poe check -P ag-ui
- git diff --check
- git diff --cached --check

Blockers / next iteration:
- No blockers. Next slice can document AG-UI Thread Snapshot security and usage.
- uv repeatedly refreshed azure-ai-projects in uv.lock during local runs; reverted the generated lockfile churn because this change does not alter dependencies.
- The poe-check commit hook was skipped after manual verification because it refreshes unrelated uv.lock dependency resolution.

* docs(ag-ui): document thread snapshot security

Key decisions:
- Document AG-UI Thread Snapshot persistence as opt-in and disabled unless a snapshot_store is configured.
- Place Snapshot Scope guidance next to endpoint authentication guidance, making clear that AG-UI Thread ids identify threads but do not authorize snapshot access.
- Describe built-in storage as in-memory only, process-local, latest-only, and not durable production storage; durable stores remain app-owned implementations of AGUIThreadSnapshotStore.
- Call out snapshot confidentiality impact and that no file-backed AG-UI snapshot store is provided.

Files changed:
- packages/ag-ui/README.md

Verification:
- uv run python scripts/check_md_code_blocks.py packages/ag-ui/README.md --no-glob
- git diff --check
- git diff --cached --check
- commit hook without SKIP ran changed-package lint/format and AG-UI README markdown-code-lint successfully before stopping because uv.lock was modified
- uv run poe markdown-code-lint (failed due existing unrelated packages/mistral/README.md missing agent_framework_mistral import resolution; changed AG-UI README blocks passed)

Blockers / next iteration:
- No blockers. Local issue/PRD planning artifacts remain uncommitted.
- uv refreshed azure-ai-projects in uv.lock during markdown lint and the commit hook; reverted the generated lockfile churn because this documentation change does not alter dependencies.
- The poe-check commit hook was skipped after manual verification because it refreshes unrelated uv.lock dependency resolution.

* fix(ag-ui): harden thread snapshot persistence edge cases

- Persist the completed confirm_changes turn with interrupt=None so hydration
  no longer replays a stale pending interrupt after the user responds; resume
  requests prepend stored history so the persisted thread is not truncated.
- Defer endpoint default_state application to the runners when snapshot
  persistence is active, filling only keys missing from both the stored
  snapshot state and the request state so defaults never reset persisted
  Shared State.
- Always fold the turn's output into the persisted messages snapshot even when
  the outbound MESSAGES_SNAPSHOT event is suppressed for predictive tools
  without confirmation.
- Load the stored snapshot on workflow follow-up turns, reconstruct full
  thread history into the run input, and seed the snapshot builder with merged
  state so saving a new turn no longer replaces prior history.
- Move snapshot message reconstruction helpers to _run_common for reuse by the
  workflow runner; load stored agent snapshots on resume turns for state merge.
- Add endpoint regression tests for all four scenarios.

* fix(ag-ui): protect snapshot history on resume and harden suffix trust

- Prepend stored thread history when persisting snapshots for resume runs on
  both the agent and workflow paths, so a resumed interrupt no longer
  overwrites the stored thread with just the resume turn's output.
- Filter the incoming message suffix during thread reconstruction: only user
  turns and tool results answering backend-issued tool calls (stored tool
  calls or pending interrupts) may extend authoritative history. Client-forged
  assistant and tool messages are dropped and logged instead of being
  persisted and replayed.
- Close the workflow snapshot builder's tool-call group when a tool result or
  text message lands, so synthesized transcripts keep tool results adjacent to
  their tool_calls message and stay valid as provider replay history.
- Export DEFAULT_MAX_THREAD_SNAPSHOTS from agent_framework_ag_ui and expose
  SnapshotScopeResolver through the core ag_ui facade and stub.
- Add regression tests for agent and workflow resume history preservation,
  forged suffix rejection, builder tool-call grouping, and the export surface.

* fix(ag-ui): tolerate snapshot save failures and scope workflow cache

- Wrap snapshot_store.save() on both the agent and workflow paths so a
  transient store failure (timeout, connection refused) is logged instead of
  propagating. Previously a failing save converted an already-streamed
  successful run into RUN_ERROR, and on the workflow path emitted RUN_ERROR
  after RUN_FINISHED, violating the single-terminal-event invariant. The
  previous snapshot stays available for hydration.
- Key the workflow_factory instance cache by (snapshot_scope, thread_id). The
  Snapshot Scope is the authorization boundary, so the same thread id under
  different scopes no longer shares an in-memory workflow instance.
  clear_thread_workflow accepts an optional snapshot_scope and clears all
  scopes for the thread when omitted.
- Add tests: save-failure tolerance for agent and workflow endpoints,
  scope-isolated workflow cache, async snapshot_scope_resolver support, and
  in-memory store key validation errors.

* fix(ci): ignore all dotnet.microsoft.com links in linkspector

The existing ignore pattern only matched https://dotnet.microsoft.com/download,
but Microsoft sites insert a locale segment between host and path
(e.g. /en-us/download/dotnet/10.0), so localized links slip past the pattern
and get checked. dotnet.microsoft.com bot-blocks CI link checkers with
intermittent 403s across the whole site, which fails markdown-link-check on
unrelated pull requests since linkspector scans the entire repository.

Ignore the domain wholesale, matching how platform.openai.com is already
handled for the same reason. A 403 from bot blocking is indistinguishable
from a removed page, so the checker cannot produce a meaningful signal for
this domain either way.

* ag-ui: simplify raw_messages assignment and drop OrderedDict

- Replace list(cast(...)) with a typed annotation for raw_messages
  (_agent_run.py:866) per review suggestion
- Replace OrderedDict with a plain dict in InMemoryAGUIThreadSnapshotStore
  (_snapshots.py:136); regular dicts are insertion-order-safe since
  Python 3.7, so OrderedDict is unnecessary. Update _evict_oldest to use
  next(iter(...)) for FIFO removal instead of popitem(last=False).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address review feedback for #2458: review comment fixes

---------

Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Evan Mattson · 2026-06-12 08:29:38 +00:00

76b2b1bf39

Address comments

Tao Chen · 2026-06-11 16:54:42 -07:00

422f7e7382

Python: Bug fix for declarative workflows (#6468 )

* Fix declarative object parsing bug

* Remove unnecessary comment

* Address PR comments

* Address PR comments.

* Fix CI failures.

Peter Ibekwe · 2026-06-11 22:34:15 +00:00

e7937947d9

Merge branch 'main' into local-branch-python-add-reset-to-workflow

Tao Chen · 2026-06-11 15:28:27 -07:00

96af0cd15a

Address comments

Tao Chen · 2026-06-11 15:27:36 -07:00

b0d0224ed4

Python: Integrate shell tool into harness agent (#6451 )

* Integrate shell tool into AgentHarness

* Validate shell_executor exposes as_function() with a clear TypeError

Addresses PR review feedback: a public factory should fail fast with an
actionable error rather than a cryptic AttributeError when an incompatible
shell_executor is supplied. Validation happens upfront, regardless of whether
the client supports shell tools.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Type shell harness params via TYPE_CHECKING import

Addresses PR review feedback: type shell_executor and
shell_environment_provider_options instead of Any, using a TYPE_CHECKING
import from agent_framework_tools.shell. The import never executes at
runtime, so there is no circular dependency, and the lazy runtime import of
ShellEnvironmentProvider is retained. Since ShellExecutor is a protocol
without as_function(), the validated getattr result is invoked directly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

westey · 2026-06-11 20:51:59 +00:00

3d5421edc1

Add tests

Tao Chen · 2026-06-11 11:43:41 -07:00

ed27241543

Remove reset

Tao Chen · 2026-06-11 11:14:36 -07:00

9da83347c8

Python: Add tool approval middleware (#6414 )

* Add Python tool approval middleware

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix tool approval restored state handling

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Gate hidden approvals on explicit approval responses

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Handle string inputs in approval replay scan

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Cover argument-scoped approval rules

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Refine tool approval state and budgets

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix tool approval PR CI failures

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Revert DevUI Aspire README link change

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Eduard van Valkenburg · 2026-06-11 17:35:44 +00:00

df29af611c

Python: [Generated by SRE Agent] Fix MCP allowed_tools empty list handling (#6296 )

* Fix MCP allowed_tools empty list handling

When allowed_tools is set to an empty list [], the falsy check
'if not self.allowed_tools' incorrectly treats it as unconfigured
(same as None), causing all tools to be exposed. Change to an
explicit 'is None' check so that an empty list correctly results
in no tools being allowed.

Co-authored-by: Azure SRE Agent <noreply@microsoft.com>

* Clarify allowed_tools docstring: None vs [] semantics

Per Eduard's review on PR #6296: explicitly document that None exposes all tools and [] exposes none, across all four MCPTool / MCPStdioTool / MCPStreamableHTTPTool / MCPWebsocketTool docstrings.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* allowed_tools docstring: recommend load_tools=False for full disable

Per Eduard's follow-up on PR #6296: `load_tools=False` is the cleaner idiom when you don't want to expose any tools. Reframe `allowed_tools=[]` in the docstring as a runtime guard / inspection-only path and cross-reference `load_tools`.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Azure SRE Agent <noreply@microsoft.com>
Co-authored-by: Giles Odigwe <79032838+giles17@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

chetantoshniwal · 2026-06-11 06:46:46 +00:00

4149f24791

Add create checkpoint to workflow interface

Tao Chen · 2026-06-10 22:20:10 -07:00

6534a739d0

Fix checkpoint ancestry bug

Tao Chen · 2026-06-10 16:36:15 -07:00

0e3831192a

Python: HarnessAgent: Disable compaction when max tokens not provided (#6410 )

* HarnessAgent: Disable compaction when max tokens not provided

* Fix regression.

* Address PR comments

* Require max_output_tokens to be positive

Reject max_output_tokens=0 (must be positive), mirroring
max_context_window_tokens. Addresses PR review feedback.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

westey · 2026-06-10 13:57:23 +00:00

8dde9ef627

Python: Parse MCP CallToolResult.structuredContent field to prevent tool results returning None (#6421 )

* Parse structuredContent from MCP CallToolResult (#3313)

The _parse_tool_result_from_mcp method only iterated over the content
field from CallToolResult, ignoring the structuredContent field entirely.
MCP servers that return JSON data via structuredContent (e.g., Power BI
MCP) appeared to return None.

Add handling for structuredContent: when present, serialize it as JSON
text and append it to the result list. This preserves the data for the
LLM while maintaining backward compatibility with existing behavior.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: Parse MCP CallToolResult.structuredContent field to prevent tool results returning None

Fixes #3313

* Address review feedback: add default=str to json.dumps and remove .checkpoints/

- Add default=str to json.dumps for structuredContent serialization so
  non-JSON-serializable values (e.g. bytes) degrade gracefully instead
  of raising TypeError
- Remove all .checkpoints/ runtime artifacts from the repository
- Add **/.checkpoints/ to .gitignore to prevent future accidental commits
- Add test for non-serializable structuredContent values

Fixes #3313

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address review feedback for #3313: Python: MCP CallToolResult.structuredContent field is not parsed, causing tool results to return None

---------

Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Giles Odigwe · 2026-06-10 12:51:09 +00:00

93cbf6b3f0

Python: [BREAKING] Add sampling guardrails to MCP tools (#6413 )

* Add sampling guardrails to MCP tools

Add approval, token, and request-count controls to the MCP sampling
callback used when an MCPTool is configured with a chat client.

- Add `sampling_approval_callback`, `sampling_max_tokens`, and
  `sampling_max_requests` parameters to `MCPTool` and its
  `MCPStdioTool`, `MCPStreamableHTTPTool`, and `MCPWebsocketTool`
  subclasses, positioned directly after `client`.
- Gate each server-initiated `sampling/createMessage` request behind the
  approval callback, which denies by default when no callback is provided.
- Clamp the requested `maxTokens` to `sampling_max_tokens` and enforce a
  per-session request count via `sampling_max_requests`.
- Log incoming sampling requests at WARNING level (counts only).
- Export `SamplingApprovalCallback` from the public API.
- Add tests, a sample, and documentation updates.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Make sampling denial message context-aware

Distinguish the deny-by-default case (no approval callback configured)
from an explicit denial by a configured `sampling_approval_callback`, so
the returned ErrorData message is accurate for callback-driven denials
and exceptions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Eduard van Valkenburg · 2026-06-10 10:17:36 +00:00

9a56bc9f16

Python: bump package versions for 1.8.1 release (#6420 )

* Python: bump package versions for 1.8.1 release

* Python: bump agent-framework-foundry-hosting for 1.8.1 release

* Python: bump ag-ui and azurefunctions for 1.8.1 release

* Remove incorrect agent-framework-foundry changelog entry for #6259

* Add [1.8.1] changelog compare link and update [Unreleased] base

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>

Copilot · 2026-06-09 21:27:42 +00:00

3daed114ee

Address comments

Tao Chen · 2026-06-09 11:02:17 -07:00

47012f1dcf

Purview: Parallelize PSPC cold-cache scope refresh (#5832 )

* Parallelize Purview PSPC cold cache path

* Cache Purview payment-required state for scope refresh

* Cache Purview payment-required state for scope refresh

* Align Purview policy action dedupe and 402 caching

 Deduplicate combined policy actions by action and restriction action so restriction-only actions are preserved
without duplicating identical entries. Cache tenant-level payment-required state from background scope refresh so
subsequent calls short-circuit consistently.

* .NET: Implement best-effort caching for background job scope retrieval and add unit tests for cache write failures

* Purview - feat: Enhance ScopedContentProcessor to queue ContentActivityJob when no applicable scopes are found and update related tests

* docs: Update purview package README and AGENTS documentation to reflect caching optimizations and policy enforcement scenarios

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Taisir Hassan · 2026-06-09 18:01:21 +00:00

383d551b86

Python: [Generated by SRE Agent] docs: clarify checkpoint storage security model and deserialization trust boundaries (#6295 )

* docs: clarify checkpoint storage security model and deserialization trust boundaries

Add Security Model documentation sections to the checkpoint encoding and
Azure Functions serialization modules explaining:
- Checkpoint storage is a trusted data source requiring access controls
- The RestrictedUnpickler allowlist is defense-in-depth, not a security boundary
- Developer responsibilities for securing storage backends
- Guidance on using allowed_types and strip_pickle_markers

Co-authored-by: Azure SRE Agent <noreply@microsoft.com>

* Apply suggestions from code review

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Azure SRE Agent <noreply@microsoft.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

chetantoshniwal · 2026-06-09 16:53:48 +00:00

632f67b92e

Merge branch 'main' into local-branch-python-add-reset-to-workflow

Tao Chen · 2026-06-09 09:38:45 -07:00

4e623d561f

Remove lifecycle flag

Tao Chen · 2026-06-09 09:27:21 -07:00

6eb85477fc

Python: fix: use getattr for non-OpenAI provider response compatibility (#6270 )

* fix: use getattr for non-OpenAI provider response compatibility

Fixes #6234
Fixes #6235

Use getattr with None fallback for system_fingerprint and output
attributes to prevent AttributeError when non-OpenAI providers
return response objects without these fields.

* fix: use typed variable for response output to satisfy pyright

Fixes #6235

Use getattr with None fallback for the output attribute, and assign
to a typed list variable before the match statement to help pyright
narrow the response item types correctly.

* fix: rename response_outputs to avoid name collision with case-block variable

Fixes #6235

Rename outputs to response_outputs on line 1974 to avoid mypy error
about conflicting variable names in the match statement's case blocks.
Also use list[Any] for explicit generic type annotation.

* fix: use cast(list[Any]) for response output to satisfy pyright

Fixes #6235

The getattr() call returns Unknown type which pyright cannot narrow
in the match statement. Use an explicit cast to list[Any].

* fix: use hasattr guard instead of getattr for response.output

Fixes #6235

Using hasattr(response, 'output') and then accessing response.output
directly gives pyright enough type information to verify the match
statement exhaustiveness. This avoids the cast(list[Any]) approach
which pyright still flagged as partially unknown.

* fix: use ternary operator for response_outputs assignment

Replace if-else block with ternary expression to satisfy ruff SIM108 lint rule.
This fixes the Package Checks (3.11) CI failure.

* fix: use ternary with cast for ruff SIM108 and pyright type safety

Replace if-else block with ternary expression using cast(list[Any], ...)
to satisfy:
- ruff SIM108 (use ternary instead of if-else)
- ruff E501 (line length < 120)
- pyright type narrowing (cast preserves type info lost in ternary)

All local checks pass: ruff check, ruff format, pyright, 298 tests.

* fix: replace hasattr+cast with try/except to preserve pyright types

---------

Co-authored-by: Tao Chen <taochen@microsoft.com>

Willow Lopez · 2026-06-09 15:17:39 +00:00

29cec0d27b

Python: Filter MCP tool kwargs to declared params via allowlist (#6399 )

* Filter MCP tool kwargs to declared params via allowlist

Previously MCPTool combined framework runtime kwargs (from
FunctionInvocationContext.kwargs) with the LLM-supplied arguments and
stripped only a hardcoded denylist of known framework keys before
forwarding to the MCP server. Any new framework-injected kwarg leaked to
the server unless the denylist was updated.

Switch to an allowlist built from each tool's declared parameters
(inputSchema.properties). Only declared params are forwarded; everything
else is stripped. Add an `additional_tool_argument_names` constructor
argument so users can opt extra names back in, globally (Sequence[str])
and/or per remote tool name (Mapping with reserved "*" global key). The
existing denylist is kept as a safety net for framework-named params a
server declares in its schema; explicitly opted-in extras always win. The
reserved _meta handling is unchanged.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address MCP allowlist review comments and fix reload arg loss

- Fix pyright reportUnknownArgumentType in _load_tools (cast schema properties).
- Register declared param names before the existing-tool skip guard so that
  tool-list reloads preserve the allowlist for already-loaded tools (previously
  unchanged tools silently dropped all declared args after a background reload).
- Handle bare-string values in an additional_tool_argument_names mapping instead
  of iterating their characters.
- Clarify the framework denylist comment: explicit extras override the denylist.
- Make the extras-override-denylist test unambiguous (opt in a denylisted name).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Eduard van Valkenburg · 2026-06-09 07:37:11 +00:00

cfb033e5d4

Python: feat(claude): bump claude-agent-sdk to 0.2.87 (#6248 )

* feat(claude): bump claude-agent-sdk to 0.2.87

Upgrade claude-agent-sdk dependency from >=0.1.36,<0.1.49 to >=0.2.87,<0.3.

Changes:
- Bump version pin in pyproject.toml
- Add 'xhigh' effort level to ClaudeAgentOptions (Opus 4.7 specific)
- Expose new upstream SDK options: skills, session_id, task_budget,
  include_hook_events, strict_mcp_config, continue_conversation,
  fork_session
- Add TaskBudget type import
- Update uv.lock

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore: lower claude-agent-sdk floor to >=0.1.36

Keep the lower bound at 0.1.36 since the 0.1→0.2 transition was additive
and our code works on older versions as long as new options aren't used.
This avoids forcing unnecessary upgrades on existing users.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: replace TaskBudget import with inline type for SDK compat

TaskBudget was added in claude-agent-sdk 0.2.93 but does not exist in
0.2.87. Use dict[str, int] inline type instead so type checking passes
against 0.2.87. Lock file pinned to 0.2.87.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Giles Odigwe · 2026-06-09 06:01:55 +00:00

e89e745bc0

Python: Fix per-service-call history persistence with server-storing clients (#6310 )

* Fix per-service-call history persistence with server-storing clients

When an Agent set require_per_service_call_history_persistence=True together
with a HistoryProvider, and the chat client stored history server-side by
default (e.g. OpenAIChatClient, STORES_BY_DEFAULT=True), the external history
provider was silently never persisted.

Unify persistence on the per-service-call middleware: when the flag is set and
a HistoryProvider exists, the middleware is always installed and owns
persistence. service_stores_history now only selects middleware behavior:
- service does not store: load providers and drive the function loop with a
  local sentinel conversation id, or
- service stores: skip loading (the service owns history) and persist each
  service call while the real conversation id flows through.

Also rationalize chat-options handling in _prepare_run_context:
- _merge_options now skips None overrides and strips remaining None values, so
  an unset `store` is never forwarded and the service decides its own default.
- Resolve `store` and `conversation_id` once from a single combined view
  (effective_options) instead of probing both default and runtime dicts; the
  auto-injection and per-service-call resolution now agree on conversation_id.

Fixes #5798

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Correct as_agent() docstring: persistence is per service call, not once per run

Address PR review: when the client stores history server-side, the
per-service-call middleware still persists after each model call; only
provider loading is skipped. The previous "persist once per run()" wording
contradicted the implementation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address PR review: docs, missing-conversation-id warning, and tests

- Clarify that require_per_service_call_history_persistence is a no-op when no
  HistoryProvider is present (docstrings in _agents.py and _clients.py).
- Warn on every service call when the client stores history server-side but
  returns no conversation_id, so the (uncommon) loss of cross-turn resumability
  cannot fail silently.
- Add tests: storing client + existing conversation_id does not raise and the id
  propagates; two runs on the same session keep persisting with a stable
  service_session_id and no provider loading; storing-without-conversation-id
  warns per call.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Eduard van Valkenburg · 2026-06-09 05:47:57 +00:00

7e0767a0a0

Fix tests and address comments

Tao Chen · 2026-06-08 16:30:59 -07:00

568afdd293

Add reset to hosted workflow

Tao Chen · 2026-06-08 14:57:01 -07:00

5910a6f869

Add reset to workflow

Tao Chen · 2026-06-08 13:42:42 -07:00

65522bdbee

Match AG-UI approval responses to requested arguments (#6376 )

Evan Mattson · 2026-06-08 16:33:16 +00:00

9bc7b27813

Python: fix(mem0): isolate entity retrieval and correct app_id payload (#6242 )

* fix(mem0): parallel memory retrieval logic and strict type compliance

* fix(mem0): align parallel retrieval types for pyright and mypy

* fix(mem0): handle asyncio.CancelledError in search response and update test description

* fix(mem0): improve error handling for asyncio.CancelledError and update test names for clarity

* fix(mem0): improve retrieval response handling

Vedant Sonani · 2026-06-08 13:50:23 +00:00

6169df04cb

Move runner state management out of Workflow

Tao Chen · 2026-06-05 16:29:19 -07:00

c5e6a7797f

Python: feat(python): Add MCP client OTel spans per GenAI semantic conventions (#6349 )

* feat(python): Add MCP client OTel spans per GenAI semantic conventions

Implement MCP client spans per the OTel GenAI Semantic Conventions for MCP
(https://opentelemetry.io/docs/specs/semconv/gen-ai/mcp/#client).

Operations instrumented:
- initialize: CLIENT span capturing MCP session setup
- tools/list: CLIENT span for tool listing (per-page)
- prompts/list: CLIENT span for prompt listing (per-page)
- tools/call: CLIENT span (nested under execute_tool when called via FunctionTool)
- prompts/get: CLIENT span

Span attributes follow the MCP semantic conventions:
- Required: mcp.method.name
- Conditional: error.type, gen_ai.tool.name, gen_ai.prompt.name
- Recommended: gen_ai.operation.name, mcp.protocol.version, mcp.session.id,
  network.transport, server.address, server.port

Transport-specific attributes per subclass:
- MCPStdioTool: network.transport=pipe
- MCPStreamableHTTPTool: network.transport=tcp, network.protocol.name=http
- MCPWebsocketTool: network.transport=tcp, network.protocol.name=websocket

All span creation gated behind OBSERVABILITY_SETTINGS.ENABLED.

Closes #3624
Closes #4697

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* refactor: simplify MCP spans — remove enrichment logic and protocol version caching

- Always create nested CLIENT spans for tools/call instead of enriching
  the parent execute_tool span
- Remove _ACTIVE_TOOL_EXECUTION_SPAN contextvar (no longer needed)
- Remove enrich_span_with_mcp_attributes() helper
- Remove _otel_error_type preservation in FunctionTool.invoke()
- Remove _mcp_protocol_version instance variable; protocol version is
  only set on the initialize span where it is available

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Refine copilot solution

* fix: enable automatic exception recording on MCP spans

Remove record_exception=False and set_status_on_exception=False from
create_mcp_client_span. Let OTel handle exception recording and status
setting automatically. The manual set_mcp_span_error calls for tools/call
still correctly set error.type (which OTel's automatic handling doesn't
touch), so tool_error is preserved.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Reduce number of lines

* Add comment to sample

* test: address PR review comments on MCP observability tests

- Fix initialize test to call mocked session.initialize() and read
  protocolVersion from the result instead of hardcoding it
- Add tools/call McpError error-path test
- Add prompts/get McpError error-path test

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix export error

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Tao Chen · 2026-06-05 19:23:01 +00:00

dcc218dbac

Python: Refactor workflow as agent pending request handling (#6259 )

* WIP: Refactor Workflow as agent pending request handling

* WIP: debugging empty message bug

* Working: Workflow as agent with function approval

* Address Copilot comments

* Fix mypy

* Address comments and fix pipeline

* Request info non function approval now becomes function call

* Revert uv.lock

* Fix mypy

* Bump min version of azure-ai-project

* Remove RequestInfoFunctionArgs

* fix tests

* Fix failing tests

* Fix sample

Tao Chen · 2026-06-05 17:23:19 +00:00

9cafd7e58b

Python (fix:gemini): make Gemini honor declarative outputSchema, not just JSON mode (#5893 )

* fix(gemini): preserve schema response_format

* fix(gemini): satisfy pyright strict in response schema extraction

Cast Any-narrowed mappings to Mapping[str, Any] in the structured-output
schema helpers so pyright strict no longer reports partially-unknown
member, argument, and variable types. Pass response_format["format"]
straight into the recursive extractor, which already guards non-mapping
inputs. No behavior change.

* fix(gemini): use Sequence[object] cast to satisfy both mypy and pyright

The Sequence[Any] cast pyright strict needs to know the loop element type
is reported as a redundant-cast by mypy, which already narrows the
isinstance branch to Sequence[Any]. Cast to Sequence[object] instead:
pyright gets a fully known element type and mypy no longer sees an
identical-type cast. No behavior change.

---------

Co-authored-by: Evan Mattson <evan.mattson@microsoft.com>

cooleryu · 2026-06-05 15:17:51 +00:00

d5335fbeae

Python: MCP long-running task support in Python (#6319 )

* MCP long-running task support in Python

* Fix pyupgrade and AGENTS.md reconnect description

- pyupgrade: drop forward-reference string annotations in _mcp.py (Python 3.10+ resolves them natively now that MCPTaskOptions is defined before use).

- AGENTS.md: align reconnect description with current behavior. Phase 1 (initial tools/call) does NOT retry on connection loss; raises 'connection lost; task state unknown' instead, so a server that accepted the request but lost the response cannot start the operation twice. Phase 2 (tasks/get / tasks/result) still reconnects once against the same task_id.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix bandit nosec marker for CI pipeline

* Address PR feedbacks

* Clarifiied comments and addressed more PR feedbacks.

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Peter Ibekwe · 2026-06-05 00:04:55 +00:00

bf4ad48cf2

Python: bump package versions for 1.8.0 release (#6351 )

- Released cohort (core, openai, foundry, root): 1.7.0 -> 1.8.0
- agent-framework-github-copilot: promote to RC (1.0.0rc1)
- agent-framework-orchestrations: rc2 -> rc3 (bug fix)
- Beta/alpha packages with changes: a2a, anthropic, azurefunctions, bedrock,
  foundry-hosting, mistral bumped to new date stamp (260604)
- Inter-package dependency bounds updated for changed packages
- CHANGELOG.md and PACKAGE_STATUS.md updated

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Giles Odigwe · 2026-06-04 23:03:24 +00:00

01fc518b29

Python: Fix toolbox consent flow in hosted agent (#6249 )

* Fix toolbox consent flow in hosted agent

* Resolve conflict

* Make unused tool as comment

* Fix tests

Tao Chen · 2026-06-04 20:28:59 +00:00

dbc312a78a

Python: Add timeout parameter to FoundryAgent to fix ConnectTimeout on multi-turn conversations (#6263 )

* Python: fix ConnectTimeout on multi-turn FoundryAgent conversations (#6241)

Expose a `timeout` parameter on `RawFoundryAgentChatClient`,
`_FoundryAgentChatClient`, `RawFoundryAgent`, `FoundryAgent`, and
`RawOpenAIChatClient` so callers can override the HTTP timeout used by
the underlying AsyncOpenAI client.

Root cause: `RawFoundryAgentChatClient.__init__` called
`project_client.get_openai_client()` without configuring any timeout,
inheriting the OpenAI SDK default of `httpx.Timeout(connect=5.0)`.
When connections are recycled between turns under load, the 5 s connect
timeout fires and surfaces as `openai.APITimeoutError`.

Fix:
- `load_openai_service_settings` (`_shared.py`): accept `timeout` and
  include it in `client_args` for all three `AsyncOpenAI`/
  `AsyncAzureOpenAI` construction paths.
- `RawOpenAIChatClient.__init__` (`_chat_client.py`): accept `timeout`
  and forward to `load_openai_service_settings`.
- `RawFoundryAgentChatClient.__init__` (`_agent.py`): accept `timeout`
  and set `openai_client.timeout = timeout` on the client returned by
  `get_openai_client()` before passing it to the base class.
- `_FoundryAgentChatClient`, `RawFoundryAgent`, `FoundryAgent`: accept
  and propagate `timeout` through the construction chain.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add timeout parameter to FoundryAgent and RawOpenAIChatClient

Expose a timeout parameter on RawFoundryAgentChatClient,
_FoundryAgentChatClient, RawFoundryAgent, FoundryAgent, and
RawOpenAIChatClient. When provided, the value is applied to the
underlying AsyncOpenAI client so that connect timeouts under load
or after connection recycling can be tuned by callers.

Previously, get_openai_client() was called without any timeout
override, so the SDK default of httpx.Timeout(connect=5.0) was
inherited and could fire on multi-turn conversations where the
underlying connection is recycled between turns.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: Add `timeout` parameter to `FoundryAgent` to fix `ConnectTimeout` on multi-turn conversations

Fixes #6241

* fix(foundry): use with_options to avoid mutating shared OpenAI client timeout (#6241)

Replace direct assignment  with
 in
RawFoundryAgentChatClient.__init__.

The Azure AI Projects SDK caches and returns a shared AsyncOpenAI client
per AIProjectClient. Mutating its .timeout attribute leaked the override
to all other code paths sharing that client (other agents, user code).
with_options() returns a new client instance with the override applied,
leaving the original shared client untouched.

Update tests to assert with_options is called with the correct timeout
and that the original shared client's timeout attribute is not mutated.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* test(foundry): assert with_options return value flows to instance.client (#6241)

The four timeout propagation tests verified that with_options was called
but did not confirm that the returned (timeout-configured) client was
actually stored on the instance. A silent discard of the return value
would have left the tests green while the timeout had no effect.

Each test now captures the constructed instance and asserts:
  assert <instance>.client is openai_client_mock.with_options.return_value

Affected tests:
- test_raw_foundry_agent_chat_client_init_applies_timeout_to_openai_client
- test_raw_foundry_agent_chat_client_init_applies_timeout_with_preview_enabled
- test_foundry_agent_chat_client_init_propagates_timeout
- test_foundry_agent_init_propagates_timeout_to_openai_client

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Evan Mattson · 2026-06-04 18:25:18 +00:00

6b94315161

fix: drop hosted MCP calls when reasoning is stripped (#6210 )

Yufeng He · 2026-06-04 18:11:24 +00:00

bc0e65d716

Python: Fix spurious Magentic custom manager warning (#6261 )

* Fix magentic manager warning

* Use typing_extensions.Sentinel for _MISSING sentinel value

Replace the bare object() sentinel with typing_extensions.Sentinel per
PEP 661 (now final). Sentinel provides a proper name and repr
('<_MISSING>') and is the idiomatic approach going forward.

Refs #4306

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: correct Sentinel type annotation for max_stall_count param (#6261)

Use int | Sentinel for max_stall_count parameter type annotation instead
of int with cast(Any, _MISSING) to properly express that the parameter
can hold either an int or the _MISSING sentinel value. This fixes the
pyright reportUnnecessaryComparison errors caused by the types int and
Sentinel having no overlap.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Rename _MISSING sentinel to UNSET in orchestrations

The sentinel is user-visible as a default in public init signatures, so
use UNSET (no leading underscore) instead of the private _MISSING name.
Drop the now-unnecessary reportPrivateUsage ignores on the UNSET imports.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Evan Mattson · 2026-06-04 08:59:04 +00:00

4268080c20

Python: [BREAKING] Upgrade github-copilot-sdk to v1.0.0 (stable) (#6292 )

* Python: Upgrade github-copilot-sdk to v1.0.0 (stable)

Upgrade agent-framework-github-copilot from github-copilot-sdk 1.0.0b2 to the
stable 1.0.0 release, adapting to all breaking API changes.

Source changes (_agent.py):
- SubprocessConfig removed: use RuntimeConnection.for_stdio(path=...) +
  CopilotClient kwargs (connection, log_level, base_directory)
- Import paths: copilot.generated.session_events -> copilot.session_events
- Settings: copilot_home -> base_directory (env GITHUB_COPILOT_BASE_DIRECTORY)
- Default deny handler: PermissionDecisionUserNotAvailable() (from
  copilot.generated.rpc)

Test changes:
- Updated imports and client-construction assertions (kwargs-based)
- Permission handler tests use concrete decision types
  (PermissionDecisionApproveOnce, PermissionDecisionDeniedInteractivelyByUser)

Sample changes:
- Permission handlers use PermissionHandler.approve_all or sync
  approve_and_log pattern (v1.0.0 protocol v3 dispatch is incompatible
  with blocking input() in permission handlers)
- Function approval sample uses asyncio.to_thread for interactive prompts
- Simplified imports across all samples

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address PR review: scope permission handlers, widen type, add test

- Shell sample: only approve kind='shell', deny others
- URL sample: only approve kind='url', deny others
- Use getattr() for kind-specific attributes to satisfy pyright
- Widen PermissionHandlerType to accept async handlers (matches SDK)
- Add test for _deny_all_permissions return value

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix validation script and strengthen test assertion

- Update scripts/sample_validation/create_dynamic_workflow_executor.py to
  use copilot.session_events imports and PermissionHandler.approve_all
- Assert isinstance(result, PermissionDecisionUserNotAvailable) instead of
  stringly-typed kind check

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add integration tests for GitHubCopilotAgent

Add 6 integration tests mirroring .NET coverage:
- Basic non-streaming response
- Streaming response
- Function tool invocation
- Session context (multi-turn)
- Session resume by ID
- Shell command execution

Tests require COPILOT_GITHUB_TOKEN env var (skipped otherwise).
Each test cleans up its Copilot session via delete_session.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Giles Odigwe · 2026-06-04 08:42:35 +00:00

fe08574a7c

Python: Fix compaction message-id collisions and tool-loop summary persistence (#6299 )

* Fix compaction message-id collisions and tool-loop summary persistence

Fixes two bugs in the compaction strategies:

- #5237: incremental group annotation assigned message ids by position
  within the re-annotated slice, so moving the re-annotation start back to
  a previous group start restarted ids at 0 and produced collisions
  (e.g. a user message reusing an assistant message's id), merging groups
  and causing tool-result compaction to wrongly exclude messages.
  group_messages/_ensure_message_ids now take an id_offset and guard
  against existing-id collisions; annotate_message_groups threads the
  slice start index through as the offset.

- #4991: the function-invocation loop copied the message list each
  iteration, so summaries inserted by compaction landed in a throwaway
  copy and were lost across tool-loop iterations (only the persistent
  excluded flags survived). _prepare_messages_for_model_call now compacts
  the list in place when messages is a list, so inserted summaries persist.

Adds regression tests (incremental id uniqueness, existing-id collision
avoidance, idempotency, and tool-loop summary persistence including
streaming and conversation-id modes).

Also adds a summarization.py sample demonstrating SummarizationStrategy
directly with a real client, and reworks advanced.py with tool-call
groups and a real summarizer.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Guard incremental message-id assignment against prefix-id collisions

Addresses PR review on #5237: _ensure_message_ids only guarded against
collisions within the re-annotated slice. A preexisting (e.g. user-supplied)
id in the preserved prefix could still be reassigned in the suffix when the
id was numerically out of position, merging groups across the re-annotation
boundary again.

group_messages/_ensure_message_ids now accept reserved_ids, and
annotate_message_groups passes the preserved prefix's ids so auto-assigned
suffix ids never collide across the full list. Adds a regression test
reproducing the out-of-position prefix-id collision.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Eduard van Valkenburg · 2026-06-04 08:37:59 +00:00

f970a699d8

Python: run sync tools off the event loop (#5773 )

* fix: run sync tools off event loop

* chore: silence harness tool marker type check

Yufeng He · 2026-06-04 04:42:08 +00:00

f29bae8fbc

Python: Add MCP-based skills discovery (McpSkillsSource) (#6169 )

* Add MCP-based skills discovery (McpSkill, McpSkillsSource, McpSkillResource)

Implement Agent Skills discovery over MCP following the SEP-2640 convention:
- McpSkillsSource: reads skill://index.json to discover skills served by an MCP server
- McpSkill: lazily fetches SKILL.md content via resources/read on demand
- McpSkillResource: wraps MCP resource results (text and binary)
- Path traversal protection in get_resource for defense in depth
- Samples for Foundry Toolbox and standalone MCP skills server
- Comprehensive unit tests (514 lines)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address PR review comments: rename to MCP* convention, fix error handling and samples

- Rename McpSkill/McpSkillResource/McpSkillsSource to MCPSkill/MCPSkillResource/MCPSkillsSource
- Add data-URI prefix stripping for blob resource decoding
- Let non-McpError exceptions propagate from get_resource()
- Fix contradictory test comment
- Use interactive input() in mcp_based_skill sample
- Remove misleading sample output block

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Restore debug logging for McpError in get_resource()

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Use AzureCliCredential in Foundry toolbox skills sample for consistency

Replace DefaultAzureCredential with AzureCliCredential to match the
credential convention used in all other samples.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Use MCPStreamableHTTPTool in MCP skills sample

Replace raw mcp library imports (ClientSession, streamable_http_client)
with the framework's MCPStreamableHTTPTool to keep MCP server connections
consistent regardless of whether skills are enabled.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Branch on McpError.error.code so only not-found errors return empty

Previously _try_read_index() and get_resource() swallowed every McpError
as 'no skills available', making auth failures, server crashes, and
connection drops indistinguishable from a server that simply has no
skills.

Now only two codes are treated as not-found:
- -32002 (MCP-spec Resource not found)
- -32601 (METHOD_NOT_FOUND — server lacks resources/read)

All other McpError codes and non-McpError exceptions propagate with a
warning log, surfacing real failures visibly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add tests for non-McpError and non-not-found error propagation in MCP skills

Cover the re-raise branch in MCPSkill.get_resource for plain
ConnectionError/TimeoutError, the generic McpError (code 0) propagation
on get_resource, and TimeoutError propagation in _try_read_index.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Revert "Use MCPStreamableHTTPTool in MCP skills sample"

This reverts commit f31ed0ded9.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Introduce MCP_SKILLS experimental feature for MCP skill classes

Add a separate MCP_SKILLS feature ID to ExperimentalFeature enum and
use it for MCPSkillResource, MCPSkill, and MCPSkillsSource, since their
promotion timeline is partly outside of our control.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

semenshi-m · 2026-06-03 18:09:50 +00:00

c6951c21f6

928 Commits