Evan Mattson 76b2b1bf39 Python: Add opt-in AG-UI thread snapshot persistence and hydration (#6471)
* feat(ag-ui): add thread snapshot store primitives

Key decisions:
- Introduce an AGUIThreadSnapshot model limited to replayable messages, optional Shared State, and optional interrupt state.
- Define AGUIThreadSnapshotStore as an async protocol keyed by explicit Snapshot Scope and AG-UI Thread id.
- Add InMemoryAGUIThreadSnapshotStore as memory-only, latest-only, bounded local/demo/test storage; no file-backed store is introduced.
- Require snapshot_scope_resolver whenever an endpoint is configured with a snapshot store, including pre-wrapped runners, so thread ids are not authorization boundaries.

Files changed:
- packages/ag-ui/agent_framework_ag_ui/_snapshots.py
- packages/ag-ui/agent_framework_ag_ui/__init__.py
- packages/ag-ui/agent_framework_ag_ui/_agent.py
- packages/ag-ui/agent_framework_ag_ui/_workflow.py
- packages/ag-ui/agent_framework_ag_ui/_endpoint.py
- packages/core/agent_framework/ag_ui/__init__.py
- packages/core/agent_framework/ag_ui/__init__.pyi
- packages/ag-ui/tests/ag_ui/test_snapshots.py
- packages/ag-ui/tests/ag_ui/test_endpoint.py
- packages/ag-ui/tests/ag_ui/test_public_exports.py
- packages/ag-ui/AGENTS.md

Verification:
- uv run pytest packages/ag-ui/tests/ag_ui/test_snapshots.py packages/ag-ui/tests/ag_ui/test_public_exports.py packages/ag-ui/tests/ag_ui/test_endpoint.py::test_endpoint_requires_snapshot_scope_resolver_when_store_configured packages/ag-ui/tests/ag_ui/test_endpoint.py::test_endpoint_accepts_snapshot_store_with_scope_resolver -q
- uv run pytest packages/ag-ui/tests/ag_ui/test_endpoint.py::test_endpoint_requires_snapshot_scope_resolver_when_store_configured packages/ag-ui/tests/ag_ui/test_endpoint.py::test_endpoint_requires_snapshot_scope_resolver_when_wrapped_runner_has_store packages/ag-ui/tests/ag_ui/test_endpoint.py::test_endpoint_accepts_snapshot_store_with_scope_resolver -q
- uv run poe syntax -P ag-ui -C
- uv run poe pyright -P ag-ui
- uv run poe syntax -P core -C
- uv run poe pyright -P core
- uv run poe typing -P ag-ui
- uv run poe typing -P core
- uv run poe test -P ag-ui
- uv run poe check -P ag-ui
- git diff --check
- git diff --cached --check

Blockers / next iteration:
- No blockers. Next slice can use the store contract to capture and hydrate agent snapshots.
- uv repeatedly refreshed azure-ai-projects in uv.lock during local runs; reverted the generated lockfile churn because this change does not alter dependencies.
- The poe-check commit hook was skipped after manual verification because it reformatted unrelated core MCP files outside this task.
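
For illustration only, the store contract described in this slice might look roughly like the sketch below. The class names here (ThreadSnapshot, ThreadSnapshotStore, InMemoryThreadSnapshotStore), the load/save signatures, and the max_threads parameter are hypothetical stand-ins, not the actual agent_framework_ag_ui API; the only properties taken from the notes above are the (Snapshot Scope, thread id) key, latest-only storage, and a bounded size.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field
from typing import Any, Protocol


@dataclass
class ThreadSnapshot:
    """Hypothetical stand-in for AGUIThreadSnapshot: replayable data only."""

    messages: list[dict[str, Any]] = field(default_factory=list)
    state: dict[str, Any] | None = None
    interrupt: dict[str, Any] | None = None


class ThreadSnapshotStore(Protocol):
    """Hypothetical async store contract keyed by scope plus thread id."""

    async def load(self, scope: str, thread_id: str) -> ThreadSnapshot | None: ...

    async def save(self, scope: str, thread_id: str, snapshot: ThreadSnapshot) -> None: ...


class InMemoryThreadSnapshotStore:
    """Process-local, latest-only, bounded store for demos and tests."""

    def __init__(self, max_threads: int = 100) -> None:
        if max_threads < 1:
            raise ValueError("max_threads must be at least 1")
        self._max_threads = max_threads
        # Plain dicts preserve insertion order, so the first key is the oldest.
        self._snapshots: dict[tuple[str, str], ThreadSnapshot] = {}

    async def load(self, scope: str, thread_id: str) -> ThreadSnapshot | None:
        return self._snapshots.get((scope, thread_id))

    async def save(self, scope: str, thread_id: str, snapshot: ThreadSnapshot) -> None:
        # Latest-only: a new save replaces whatever was stored for this key.
        self._snapshots[(scope, thread_id)] = snapshot
        # Bounded: evict the oldest-inserted entries once over the limit.
        while len(self._snapshots) > self._max_threads:
            del self._snapshots[next(iter(self._snapshots))]
```

Because the scope is part of the key, the same thread id under two scopes resolves to two independent entries, which is the reason a thread id alone is not treated as an authorization boundary.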

* feat(ag-ui): hydrate agent threads from snapshots

Key decisions:
- Resolve Snapshot Scope per endpoint request and pass it to the AG-UI runner only when snapshot storage is active.
- Treat empty messages with no resume payload as an agent Hydrate Request when a scoped snapshot store is configured, replaying stored Shared State and message snapshots without invoking the wrapped agent.
- Save the latest replayable agent message snapshot and Shared State at normal completion under Snapshot Scope plus AG-UI Thread id; no durable or file-backed store is introduced.
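
As a rough sketch of the Hydrate Request rule above: a request counts as a hydrate only when snapshot storage is active, the message list is empty, and no resume payload is present. The payload keys ("messages", "resume") and the simplified event dictionaries are assumptions made for illustration; the real runner works with typed AG-UI events.

```python
from __future__ import annotations

from typing import Any


def is_hydrate_request(payload: dict[str, Any], *, snapshots_active: bool) -> bool:
    """Return True when the request should replay a stored snapshot."""
    if not snapshots_active:
        return False
    has_messages = bool(payload.get("messages"))
    has_resume = payload.get("resume") is not None
    return not has_messages and not has_resume


def hydrate_events(
    stored_state: dict[str, Any] | None,
    stored_messages: list[dict[str, Any]],
) -> list[dict[str, Any]]:
    """Build simplified replay events without invoking the wrapped agent."""
    events: list[dict[str, Any]] = []
    if stored_state is not None:
        events.append({"type": "STATE_SNAPSHOT", "snapshot": stored_state})
    events.append({"type": "MESSAGES_SNAPSHOT", "messages": stored_messages})
    return events
```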

Files changed:
- packages/ag-ui/agent_framework_ag_ui/_agent_run.py
- packages/ag-ui/agent_framework_ag_ui/_endpoint.py
- packages/ag-ui/agent_framework_ag_ui/_snapshots.py
- packages/ag-ui/tests/ag_ui/test_endpoint.py

Verification:
- uv run pytest packages/ag-ui/tests/ag_ui/test_endpoint.py::test_agent_endpoint_hydrates_stored_thread_snapshot_without_invoking_agent -q
- uv run pytest packages/ag-ui/tests/ag_ui/test_endpoint.py::test_agent_endpoint_hydrates_stored_thread_snapshot_without_invoking_agent packages/ag-ui/tests/ag_ui/test_endpoint.py::test_agent_endpoint_hydrates_snapshots_by_scope_and_thread -q
- uv run pytest packages/ag-ui/tests/ag_ui/test_endpoint.py::test_endpoint_empty_messages packages/ag-ui/tests/ag_ui/test_endpoint.py::test_agent_endpoint_hydrates_stored_thread_snapshot_without_invoking_agent packages/ag-ui/tests/ag_ui/test_endpoint.py::test_agent_endpoint_hydrates_snapshots_by_scope_and_thread -q
- uv run poe syntax -P ag-ui -C
- uv run poe pyright -P ag-ui
- uv run poe typing -P ag-ui
- uv run poe test -P ag-ui
- uv run poe check -P ag-ui
- git diff --check
- git diff --cached --check

Blockers / next iteration:
- No blockers. Next slice can reconstruct normal new-user agent turns from stored snapshots.
- uv repeatedly refreshed azure-ai-projects in uv.lock during local runs; reverted the generated lockfile churn because this change does not alter dependencies.
- The poe-check commit hook was skipped after manual verification because it refreshed unrelated uv.lock dependency resolution.

* feat(ag-ui): reconstruct agent turns from snapshots

Key decisions:
- Load scoped thread snapshots for non-hydrate agent requests only when snapshot storage is active and no resume payload is present.
- Rebuild prior AG-UI history from stored snapshot messages, preserving the incoming new user suffix and treating stored snapshot content as authoritative over conflicting prior client history.
- Merge stored Shared State with request state overrides before schema defaults and existing state-context injection.

Files changed:
- packages/ag-ui/agent_framework_ag_ui/_agent_run.py
- packages/ag-ui/tests/ag_ui/test_endpoint.py

Verification:
- uv run pytest packages/ag-ui/tests/ag_ui/test_endpoint.py::test_agent_endpoint_prepends_stored_snapshot_for_new_user_turn -q
- uv run pytest packages/ag-ui/tests/ag_ui/test_endpoint.py::test_agent_endpoint_deduplicates_full_history_and_merges_fresh_state -q
- uv run pytest packages/ag-ui/tests/ag_ui/test_endpoint.py::test_endpoint_empty_messages packages/ag-ui/tests/ag_ui/test_endpoint.py::test_agent_endpoint_hydrates_stored_thread_snapshot_without_invoking_agent packages/ag-ui/tests/ag_ui/test_endpoint.py::test_agent_endpoint_hydrates_snapshots_by_scope_and_thread packages/ag-ui/tests/ag_ui/test_endpoint.py::test_agent_endpoint_prepends_stored_snapshot_for_new_user_turn packages/ag-ui/tests/ag_ui/test_endpoint.py::test_agent_endpoint_deduplicates_full_history_and_merges_fresh_state -q
- uv run pytest packages/ag-ui/tests/ag_ui/test_endpoint.py -q
- uv run poe syntax -P ag-ui -C
- uv run poe pyright -P ag-ui
- uv run poe test -P ag-ui
- uv run poe check -P ag-ui
- uv run poe typing -P ag-ui
- git diff --check
- git diff --cached --check

Blockers / next iteration:
- No blockers. Next slice can enable workflow AG-UI Thread Snapshot persistence and hydration.
- uv repeatedly refreshed azure-ai-projects in uv.lock during local runs; reverted the generated lockfile churn because this change does not alter dependencies.
- The poe-check commit hook was skipped after manual verification because it refreshes unrelated uv.lock dependency resolution.

* feat(ag-ui): hydrate workflow threads from snapshots

Key decisions:
- Handle workflow Hydrate Requests before resolving or invoking the wrapped workflow when snapshot storage and Snapshot Scope are active.
- Capture only replayable workflow protocol data: workflow-emitted state snapshots, workflow-emitted message snapshots, and synthesized messages from text/tool output.
- Keep workflow snapshot capture inactive without configured persistence, and skip saving snapshots when the workflow stream emits RUN_ERROR.

Files changed:
- packages/ag-ui/agent_framework_ag_ui/_workflow.py
- packages/ag-ui/tests/ag_ui/test_endpoint.py

Verification:
- uv run pytest packages/ag-ui/tests/ag_ui/test_endpoint.py::test_workflow_endpoint_hydrates_emitted_snapshots_without_invoking_workflow packages/ag-ui/tests/ag_ui/test_endpoint.py::test_workflow_endpoint_hydrates_synthesized_text_and_tool_snapshot -q
- uv run pytest packages/ag-ui/tests/ag_ui/test_endpoint.py -q
- uv run pytest packages/ag-ui/tests/ag_ui/golden/test_scenario_workflow.py -q
- uv run poe syntax -P ag-ui -C
- uv run poe pyright -P ag-ui
- uv run poe test -P ag-ui
- uv run poe typing -P ag-ui
- uv run poe check -P ag-ui
- git diff --check
- git diff --cached --check

Blockers / next iteration:
- No blockers. Next slice can preserve interruption state and protect snapshots on errors across agent and workflow endpoints.
- uv repeatedly refreshed azure-ai-projects in uv.lock during local runs; reverted the generated lockfile churn because this change does not alter dependencies.
- The poe-check commit hook was skipped after manual verification because it refreshes unrelated uv.lock dependency resolution.

* feat(ag-ui): preserve interrupted thread snapshots

Key decisions:
- Capture workflow RUN_FINISHED interrupt metadata in replayable AG-UI Thread Snapshots so Hydrate Requests can restore pending workflow actions without invoking or resuming the workflow.
- Keep failed agent and workflow runs from replacing the last good snapshot; RUN_ERROR streams leave the previous snapshot available for hydration.
- Verify interruption hydration through endpoint-level AG-UI streams for both agent and workflow wrappers, including Shared State replay and no wrapped runner invocation.

Files changed:
- packages/ag-ui/agent_framework_ag_ui/_workflow.py
- packages/ag-ui/tests/ag_ui/test_endpoint.py

Verification:
- uv run pytest packages/ag-ui/tests/ag_ui/test_endpoint.py::test_workflow_endpoint_hydrates_interrupted_thread_without_invoking_workflow -q
- uv run pytest packages/ag-ui/tests/ag_ui/test_endpoint.py::test_agent_endpoint_hydrates_interrupted_thread_without_invoking_agent packages/ag-ui/tests/ag_ui/test_endpoint.py::test_agent_endpoint_run_error_does_not_overwrite_previous_snapshot packages/ag-ui/tests/ag_ui/test_endpoint.py::test_workflow_endpoint_hydrates_interrupted_thread_without_invoking_workflow packages/ag-ui/tests/ag_ui/test_endpoint.py::test_workflow_endpoint_run_error_does_not_overwrite_previous_snapshot -q
- uv run pytest packages/ag-ui/tests/ag_ui/test_endpoint.py -q
- uv run pytest packages/ag-ui/tests/ag_ui/golden/test_scenario_workflow.py -q
- uv run poe syntax -P ag-ui -C
- uv run poe pyright -P ag-ui
- uv run poe test -P ag-ui
- uv run poe typing -P ag-ui
- uv run poe check -P ag-ui
- git diff --check
- git diff --cached --check

Blockers / next iteration:
- No blockers. Next slice can document AG-UI Thread Snapshot security and usage.
- uv repeatedly refreshed azure-ai-projects in uv.lock during local runs; reverted the generated lockfile churn because this change does not alter dependencies.
- The poe-check commit hook was skipped after manual verification because it refreshes unrelated uv.lock dependency resolution.

* docs(ag-ui): document thread snapshot security

Key decisions:
- Document AG-UI Thread Snapshot persistence as opt-in and disabled unless a snapshot_store is configured.
- Place Snapshot Scope guidance next to endpoint authentication guidance, making clear that AG-UI Thread ids identify threads but do not authorize snapshot access.
- Describe built-in storage as in-memory only, process-local, latest-only, and not durable production storage; durable stores remain app-owned implementations of AGUIThreadSnapshotStore.
- Call out snapshot confidentiality impact and that no file-backed AG-UI snapshot store is provided.

Files changed:
- packages/ag-ui/README.md

Verification:
- uv run python scripts/check_md_code_blocks.py packages/ag-ui/README.md --no-glob
- git diff --check
- git diff --cached --check
- commit hook without SKIP ran changed-package lint/format and AG-UI README markdown-code-lint successfully before stopping because uv.lock was modified
- uv run poe markdown-code-lint (failed due to an existing, unrelated import-resolution error for agent_framework_mistral in packages/mistral/README.md; the changed AG-UI README blocks passed)

Blockers / next iteration:
- No blockers. Local issue/PRD planning artifacts remain uncommitted.
- uv refreshed azure-ai-projects in uv.lock during markdown lint and the commit hook; reverted the generated lockfile churn because this documentation change does not alter dependencies.
- The poe-check commit hook was skipped after manual verification because it refreshes unrelated uv.lock dependency resolution.

* fix(ag-ui): harden thread snapshot persistence edge cases

- Persist the completed confirm_changes turn with interrupt=None so hydration
  no longer replays a stale pending interrupt after the user responds; resume
  requests prepend stored history so the persisted thread is not truncated.
- Defer endpoint default_state application to the runners when snapshot
  persistence is active, filling only keys missing from both the stored
  snapshot state and the request state so defaults never reset persisted
  Shared State.
- Always fold the turn's output into the persisted messages snapshot even when
  the outbound MESSAGES_SNAPSHOT event is suppressed for predictive tools
  without confirmation.
- Load the stored snapshot on workflow follow-up turns, reconstruct full
  thread history into the run input, and seed the snapshot builder with merged
  state so saving a new turn no longer replaces prior history.
- Move snapshot message reconstruction helpers to _run_common for reuse by the
  workflow runner; load stored agent snapshots on resume turns for state merge.
- Add endpoint regression tests for all four scenarios.
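
The state precedence described in the default_state bullet above can be sketched as a three-layer merge: stored snapshot state first, request state overriding it, and defaults filling only keys missing from both. The function name merge_shared_state and the shallow (top-level keys only) merge are assumptions for illustration; the actual runners may merge differently.

```python
from __future__ import annotations

from typing import Any


def merge_shared_state(
    stored: dict[str, Any] | None,
    request: dict[str, Any] | None,
    defaults: dict[str, Any] | None,
) -> dict[str, Any]:
    """Shallow merge: request overrides stored; defaults never overwrite."""
    merged: dict[str, Any] = dict(stored or {})
    merged.update(request or {})
    for key, value in (defaults or {}).items():
        # Only fill keys that neither the snapshot nor the request provided.
        merged.setdefault(key, value)
    return merged
```

With stored state `{"cart": ["apple"], "theme": "dark"}`, request state `{"theme": "light"}`, and defaults `{"cart": [], "locale": "en"}`, the persisted cart survives, the request wins on theme, and only locale comes from the defaults.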

* fix(ag-ui): protect snapshot history on resume and harden suffix trust

- Prepend stored thread history when persisting snapshots for resume runs on
  both the agent and workflow paths, so a resumed interrupt no longer
  overwrites the stored thread with just the resume turn's output.
- Filter the incoming message suffix during thread reconstruction: only user
  turns and tool results answering backend-issued tool calls (stored tool
  calls or pending interrupts) may extend authoritative history. Client-forged
  assistant and tool messages are dropped and logged instead of being
  persisted and replayed.
- Close the workflow snapshot builder's tool-call group when a tool result or
  text message lands, so synthesized transcripts keep tool results adjacent to
  their tool_calls message and stay valid as provider replay history.
- Export DEFAULT_MAX_THREAD_SNAPSHOTS from agent_framework_ag_ui and expose
  SnapshotScopeResolver through the core ag_ui facade and stub.
- Add regression tests for agent and workflow resume history preservation,
  forged suffix rejection, builder tool-call grouping, and the export surface.
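
A minimal sketch of the suffix trust rule above, assuming messages are plain dictionaries with "role" and "tool_call_id" keys (both assumptions; the real code operates on AG-UI message objects). The helper name filter_incoming_suffix is hypothetical.

```python
from __future__ import annotations

from typing import Any


def filter_incoming_suffix(
    suffix: list[dict[str, Any]],
    backend_tool_call_ids: set[str],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Split client-supplied suffix messages into (kept, dropped).

    Kept: user turns, and tool results that answer a tool call the backend
    itself issued (stored tool calls or pending interrupts).
    Dropped: everything else, such as client-forged assistant messages.
    """
    kept: list[dict[str, Any]] = []
    dropped: list[dict[str, Any]] = []
    for message in suffix:
        role = message.get("role")
        if role == "user":
            kept.append(message)
        elif role == "tool" and message.get("tool_call_id") in backend_tool_call_ids:
            kept.append(message)
        else:
            dropped.append(message)
    return kept, dropped
```

Returning the dropped messages separately lets the caller log them, matching the "dropped and logged" behavior described above.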

* fix(ag-ui): tolerate snapshot save failures and scope workflow cache

- Wrap snapshot_store.save() on both the agent and workflow paths so a
  transient store failure (timeout, connection refused) is logged instead of
  propagating. Previously a failing save converted an already-streamed
  successful run into RUN_ERROR, and on the workflow path emitted RUN_ERROR
  after RUN_FINISHED, violating the single-terminal-event invariant. The
  previous snapshot stays available for hydration.
- Key the workflow_factory instance cache by (snapshot_scope, thread_id). The
  Snapshot Scope is the authorization boundary, so the same thread id under
  different scopes no longer shares an in-memory workflow instance.
  clear_thread_workflow accepts an optional snapshot_scope and clears all
  scopes for the thread when omitted.
- Add tests: save-failure tolerance for agent and workflow endpoints,
  scope-isolated workflow cache, async snapshot_scope_resolver support, and
  in-memory store key validation errors.
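
Both changes above can be sketched briefly. The helper save_best_effort and the class ScopedWorkflowCache are hypothetical names, and the store is assumed to expose an async save(scope, thread_id, snapshot) method; the real clear_thread_workflow signature may differ.

```python
from __future__ import annotations

import asyncio
import logging
from typing import Any, Callable

logger = logging.getLogger("ag_ui_snapshot_sketch")


async def save_best_effort(store: Any, scope: str, thread_id: str, snapshot: Any) -> bool:
    """Try to save; log and swallow failures so the finished run stays finished."""
    try:
        await store.save(scope, thread_id, snapshot)
    except Exception:
        logger.warning("Snapshot save failed for thread %s", thread_id, exc_info=True)
        return False
    return True


class ScopedWorkflowCache:
    """Cache workflow instances per (snapshot_scope, thread_id)."""

    def __init__(self, factory: Callable[[str], Any]) -> None:
        self._factory = factory
        self._instances: dict[tuple[str, str], Any] = {}

    def get(self, scope: str, thread_id: str) -> Any:
        key = (scope, thread_id)
        if key not in self._instances:
            self._instances[key] = self._factory(thread_id)
        return self._instances[key]

    def clear_thread(self, thread_id: str, scope: str | None = None) -> None:
        # With no scope given, clear the thread under every scope.
        doomed = [
            key
            for key in self._instances
            if key[1] == thread_id and (scope is None or key[0] == scope)
        ]
        for key in doomed:
            del self._instances[key]
```

Swallowing the exception after the stream has already completed keeps a single terminal event per run; the cost is that a failed save is visible only in logs, and the previous snapshot remains what hydration returns.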

* fix(ci): ignore all dotnet.microsoft.com links in linkspector

The existing ignore pattern only matched https://dotnet.microsoft.com/download,
but Microsoft sites insert a locale segment between host and path
(e.g. /en-us/download/dotnet/10.0), so localized links slip past the pattern
and get checked. dotnet.microsoft.com bot-blocks CI link checkers with
intermittent 403s across the whole site, which fails markdown-link-check on
unrelated pull requests since linkspector scans the entire repository.

Ignore the domain wholesale, matching how platform.openai.com is already
handled for the same reason. A 403 from bot blocking is indistinguishable
from a removed page, so the checker cannot produce a meaningful signal for
this domain either way.
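
The mismatch is easy to reproduce with a regular-expression search. The sketch below uses Python's re module purely for illustration (linkspector itself is a JavaScript tool), assumes patterns are applied as unanchored searches, and assumes the old pattern had the form shown; the localized URL is the example path from the message above.

```python
import re

# Assumed form of the previous ignore pattern (download path only).
OLD_PATTERN = r"https:\/\/dotnet.microsoft.com/download"
# Domain-wide pattern, as in the updated config.
NEW_PATTERN = r"https:\/\/dotnet.microsoft.com"


def is_ignored(url: str, pattern: str) -> bool:
    """Return True when the pattern matches anywhere in the URL."""
    return re.search(pattern, url) is not None


plain_url = "https://dotnet.microsoft.com/download"
localized_url = "https://dotnet.microsoft.com/en-us/download/dotnet/10.0"

for url in (plain_url, localized_url):
    print(url, is_ignored(url, OLD_PATTERN), is_ignored(url, NEW_PATTERN))
```

The locale segment sits between the host and /download, so the old pattern no longer matches as a contiguous substring, while the domain-only pattern matches both URLs.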

* ag-ui: simplify raw_messages assignment and drop OrderedDict

- Replace list(cast(...)) with a typed annotation for raw_messages
  (_agent_run.py:866) per review suggestion
- Replace OrderedDict with a plain dict in InMemoryAGUIThreadSnapshotStore
  (_snapshots.py:136); regular dicts are insertion-order-safe since
  Python 3.7, so OrderedDict is unnecessary. Update _evict_oldest to use
  next(iter(...)) for FIFO removal instead of popitem(last=False).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address review feedback for #2458: review comment fixes

---------

Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-12 08:29:38 +00:00


dirs:
- .
excludedFiles:
- ./python/CHANGELOG.md
ignorePatterns:
- pattern: "/github/"
- pattern: "./actions"
- pattern: "./blob"
- pattern: "./issues"
- pattern: "./discussions"
- pattern: "./pull"
- pattern: "https:\/\/platform.openai.com"
- pattern: "http:\/\/localhost"
- pattern: "http:\/\/127.0.0.1"
- pattern: "https:\/\/localhost"
- pattern: "https:\/\/127.0.0.1"
- pattern: "0001-spec.md"
- pattern: "0001-madr-architecture-decisions.md"
- pattern: "https://api.powerplatform.com/.default"
- pattern: "https://your-resource.openai.azure.com/"
- pattern: "http://host.docker.internal"
- pattern: "https://openai.github.io/openai-agents-js/openai/agents/classes/"
# dotnet.microsoft.com bot-blocks CI link checkers with intermittent 403s on any
# path (including localized variants like /en-us/download/...), so ignore the
# whole domain rather than just /download.
- pattern: "https:\/\/dotnet.microsoft.com"
- pattern: "https://github.com/Rel1cx/eslint-react"
# excludedDirs:
# Folders which include links to localhost, since it's not ignored with regular expressions
baseUrl: https://github.com/microsoft/agent-framework/
aliveStatusCodes:
- 200
- 206
- 429
- 500
- 503
useGitIgnore: true