Python: Feature/hosted dwf (#5531)

* Fix declarative Workflow.as_agent() by accepting list[Message] in start executor

The declarative start executor (JoinExecutor) only advertised dict and str in its input_types, so WorkflowAgent.__init__ rejected it with 'Workflow's start executor cannot handle list[Message]'. Add list[Message] to the JoinExecutor handler annotation and add a matching branch in DeclarativeActionExecutor._ensure_state_initialized that extracts the last user-message text and falls through to the string-input initialization path, so =System.LastMessageText works end-to-end via as_agent().

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Populate Conversation.messages from list[Message] trigger

When Workflow.as_agent() is invoked with a list[Message], the start executor now populates Conversation.messages / Conversation.history / System.conversations.{id}.messages with prior turns only (excluding the latest user message), and surfaces the latest user message via Inputs.input and System.LastMessage*. This matches InvokeAzureAgent's contract that the messages binding holds prior turns and the executor itself appends the new user input before invoking, avoiding double-append of the trailing user turn while preserving full history (incl. assistant/system/tool roles and multi-modal content) for downstream actions.

* Coerce Enum values when serializing PowerFx symbols

MessageRole and other str-subclass Enums passed isinstance(v, str) and were forwarded to pythonnet unchanged. pythonnet then raised 'MessageRole value cannot be converted to System.String' for every PowerFx primitive when ConditionGroup/Expr eval walked the symbol table containing Conversation.messages. Reduce Enum members to their underlying value before the primitive check so eval sees plain strings/ints.

* Foundry hosting: pass full conversation history to workflow agents

_handle_inner_workflow only forwarded the latest user turn to WorkflowAgent.run, even though _handle_inner_agent already prepends history fetched from Foundry storage to the messages it sends a regular agent. Declarative workflows reset Conversation.messages on every run (state.initialize), so checkpoint replay alone does not give them prior turns - the host has to pass them in, the same way it does for non-workflow agents. Mirror that contract: fetch context.get_history() and pass [*history, *input_messages] to the workflow agent.

* feat(workflows): support combined message + checkpoint_id for multi-turn continuation

Allow Workflow.run(message=..., checkpoint_id=...) so callers can restore prior workflow state from a checkpoint AND deliver a new message to the start executor in a single call. The existing reset_context logic already preserves shared state when checkpoint_id is set, so this gives us 'fresh start executor invocation with prior state intact' - exactly what hosted multi-turn declarative workflows need.

- _workflow.py: drop the message+checkpoint_id mutual exclusion and update _execute_with_message_or_checkpoint to do both (restore then execute) when both are provided.
- _agent.py: in _run_core's checkpoint branch, also forward input_messages so WorkflowAgent.run(messages, checkpoint_id=...) works end-to-end. Falls back to the legacy 'restore only' behavior when messages are absent.
- _declarative_base.py: detect continuation in _ensure_state_initialized by checking whether DECLARATIVE_STATE_KEY already exists in shared state; if so, refresh inputs/LastMessage* and append non-user trigger messages instead of calling state.initialize() (which would wipe Conversation/Local/System).
- foundry_hosting/_responses.py: collapse the host's two-call pattern (restore-only, then fresh run) into a single combined call now that the underlying APIs support it.
- tests: drop the assertion that combined message+checkpoint_id raises.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Pivot: preserve workflow state across run() calls

Replace the prior 'combined message + checkpoint_id in one run()' approach with a cleaner default: Workflow.run no longer wipes shared state or runner-context messages between calls. Iteration counting and per-run kwargs still reset on a fresh-message run; checkpoint and responses runs are continuations that preserve everything.

This lets a WorkflowAgent be invoked repeatedly on the same instance and maintain multi-turn context (e.g. accumulated Conversation.messages) without asking developers to opt in. Hosted-agent multi-turn pattern becomes two explicit calls: restore-from-checkpoint (drive to idle), then run-with-message.

Key changes:

- _workflow.py: drop _state.clear() and reset_for_new_run() from run(). Reset iteration count and run kwargs on fresh-message runs only. Restore 'Cannot provide both message and checkpoint_id' validation. Add async guard: fresh-message run with un-drained pending executor messages from a prior run is invalid.
- _runner.py: clear _state before import_state in restore_from_checkpoint so restore is authoritative (import_state merges, not replaces).
- _agent.py: revert checkpoint branch to restore-only (no message forward).
- _responses.py (foundry_hosting): two-call host pattern - restore checkpoint silently, then run with new user input.
- tests: state-preservation is the new default; rebuild Workflow for clean slate.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix CI lint and mypy issues from prior pivot commit

- _workflow.py: collapse nested if (SIM102), drop redundant assignment (RET504)
- _declarative_base.py: remove unused last_user_msg = tail assignment whose Message | None type clashed with the prior Message-typed branch

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address PR review: fix Inputs.input update and checkpoint storage path

- _declarative_base.py: continuation branch was writing 'Inputs.input' via state.set, which routes to the Custom namespace and never updates the PowerFx-visible Workflow.Inputs.input. Update state_data['Inputs'] in place via get_state_data / set_state_data so =Workflow.Inputs.input and =inputs.input see the new turn's user text on continuation.
- _declarative_base.py: refresh docstring to clarify that on a list[Message] trigger, Conversation.messages excludes the current user message at the start of the turn (agent executors append it before invoking the inner agent).
- _responses.py: when previous_response_id is supplied (no conversation_id), the prior checkpoint lives under <storage>/<previous_response_id> but new checkpoints must land under <storage>/<current_response_id> for the next turn to find them. Hold onto restore_storage from the get_latest lookup and pass it to the restore-only run; pass write_storage (current id) to the message-delivery run and to checkpoint cleanup.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix pyright errors in _declarative_base.py for CI

- Replace state._state.get(...) protected access with new public is_initialized() method on DeclarativeWorkflowState (also clearer intent for the continuation detection use case).
- Add narrow pyright ignores for the Any-typed trigger paths that pyright cannot fully narrow (the list[Message] isinstance loop and the fallback-DefaultTransform branch).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address Copilot review batch: tests + Workflow.reset escape hatch

  * Add Workflow.reset() public method as recovery escape hatch when an in-flight run aborted (e.g. WorkflowConvergenceException) and the workflow is not checkpointed. Update the in-flight messages guard's error message to point callers at it.
  * Add test_workflow_run_inflight_messages_guard exercising both the guard (sync + streaming) and the reset() recovery path.
  * Add test_workflow_reset_rejects_concurrent_runs to lock down the in-progress guard on reset.
  * Add test_as_agent_continuation_preserves_prior_state covering the is_continuation branch in _ensure_state_initialized: stamps a marker between calls and asserts it survives, while Inputs.input and System.LastMessageText refresh to the new turn.
  * Add test_powerfx_safe.py regression tests for the Enum branch in _make_powerfx_safe (str-subclass, int-subclass, plain Enum, and Enums nested in dict/list).
  * Drop redundant @pytest.mark.asyncio on test_as_agent_round_trip_with_last_message_text (asyncio_mode='auto').

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Skip restore-only pre-pass when checkpoint has pending request_info

Address Copilot review on _responses.py: the restore-only checkpoint replay populates self._agent.pending_requests for any request_info events captured in the checkpoint. The follow-up run(input_messages) call would then route through WorkflowAgent._process_pending_requests, which expects function-response content and rejects plain text input as 'unexpected content while awaiting request info responses'. Workflows resumed from a checkpoint that was idle-with-pending-requests would therefore fail every subsequent plain-text user turn.

Inspect the loaded checkpoint and skip the pre-pass when its pending_request_info_events dict is non-empty. Workflows that don't use request_info (the current sample set) are unaffected; workflows that do will fall through to a fresh-message run rather than silently corrupting the routing state.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Loosen azure-ai-agentserver-* pins to major version

The exact-version pins on azure-ai-agentserver-{core,responses,invocations} forced foundry-hosting consumers to upgrade in lockstep with every beta bump from upstream. Switch to '>=current,<next-major' so we pick up patch and feature updates within the same major series without a coordinated release.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Drop Workflow.reset(); checkpointing is the recovery path

The in-flight-messages guard prevented silent misbehavior, but the companion Workflow.reset() escape hatch only cleared _messages while leaving iteration count, executor-local state, and shared State mutations in an indeterminate condition after a mid-run failure. That gave a false sense of recovery.

Recovery from a mid-run failure is supported only via checkpoint restoration. Keep the guard and reframe its error message accordingly; remove reset() and its tests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address Tao's review on PR 5531

- Rename Workflow._run_workflow_with_tracing parameter is_fresh_message_run -> is_continuation (default False, inverted). Fresh-message turns reset per-run accounting; continuations (checkpoint restores, responses replays) preserve it.
- Simplify the in-flight-messages guard: _validate_run_params already enforces that 'message' is mutually exclusive with 'checkpoint_id' and 'responses', so the additional checks were dead code.
- foundry_hosting _responses: move the restore-only pre-pass above emit_created/emit_in_progress; restore is preparation, not run progress. Drop the skip-restore gate (state preservation requires unconditional restore) and instead clear agent.pending_requests after the restore-only call. Collapse over-conditioned check.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Don't clear pending_requests after restore-only pre-pass

Pending requests in the restored checkpoint represent genuinely outstanding HITL requests. The next user input may carry function responses (Responses API `function_call_output` items become FunctionResultContent / FunctionApprovalResponseContent), which `WorkflowAgent._process_pending_requests` correctly extracts and matches against the populated `pending_requests`. Clearing them after restore would silently drop that state and force the next turn to be treated as a fresh input even when the caller is responding to the outstanding requests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: alliscode <bentho@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Evan Mattson <35585003+moonbox3@users.noreply.github.com>
2026-06-16 21:04:09 +08:00 · 2026-04-28 17:51:49 -07:00
parent 866a325b48
commit 8b71f9459a
11 changed files with 512 additions and 85 deletions
@@ -32,10 +32,12 @@ import uuid
 from collections.abc import Mapping
 from dataclasses import dataclass
 from decimal import Decimal as _Decimal
+from enum import Enum
 from typing import Any, Literal, cast

 from agent_framework import (
    Executor,
+    Message,
    WorkflowContext,
 )
 from agent_framework._workflows._state import State
@@ -120,7 +122,20 @@ def _make_powerfx_safe(value: Any) -> Any:
    Returns:
        A PowerFx-safe representation of the value
    """
-    if value is None or isinstance(value, _POWERFX_SAFE_TYPES):
+    if value is None:
+        return value
+
+    # Enum coercion must run BEFORE the primitive type check: many MAF
+    # enums (e.g. MessageRole) are ``str``-subclass enums, so they pass
+    # ``isinstance(v, str)``, but pythonnet refuses to convert them to
+    # ``System.String`` and raises ``'MessageRole' value cannot be
+    # converted to System.<X>`` for every PowerFx primitive type. Reduce
+    # each member to its underlying ``.value`` (recursively) so PowerFx
+    # sees a plain ``str``/``int``.
+    if isinstance(value, Enum):
+        return _make_powerfx_safe(value.value)
+
+    if isinstance(value, _POWERFX_SAFE_TYPES):
        return value

    if isinstance(value, dict):
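Review note: the ordering argument in the comment above can be checked in isolation. The following is a minimal sketch, not the library's code: `SAFE_TYPES`, `Role`, `Priority`, `Color`, and `make_safe` are hypothetical stand-ins for `_POWERFX_SAFE_TYPES`, `MessageRole`, and `_make_powerfx_safe`, and the dict/list handling is an assumption, since that part of the real function is not shown in this hunk.

```python
from enum import Enum
from typing import Any

# Hypothetical stand-in for _POWERFX_SAFE_TYPES (the real tuple is not shown here).
SAFE_TYPES = (str, int, float, bool)


class Role(str, Enum):  # str-subclass enum, like MessageRole is described to be
    USER = "user"


class Priority(int, Enum):  # int-subclass enum
    LOW = 1


class Color(Enum):  # plain enum
    RED = "red"


def make_safe(value: Any) -> Any:
    if value is None:
        return value
    # The Enum check comes first: Role.USER is also an instance of str, so
    # the primitive check below would otherwise return the member unchanged.
    if isinstance(value, Enum):
        return make_safe(value.value)
    if isinstance(value, SAFE_TYPES):
        return value
    # Assumed container handling, for illustration only.
    if isinstance(value, dict):
        return {str(k): make_safe(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [make_safe(v) for v in value]
    return str(value)


nested = make_safe({"role": Role.USER, "tags": [Color.RED, Priority.LOW]})
```

With the checks in the opposite order, `make_safe(Role.USER)` would return the enum member itself, which is the value pythonnet is reported to reject.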
@@ -197,6 +212,16 @@ class DeclarativeWorkflowState:
            result = self._state.get(DECLARATIVE_STATE_KEY)
        return cast(DeclarativeStateData, result)

+    def is_initialized(self) -> bool:
+        """Return True when declarative state has been initialized.
+
+        Useful for distinguishing a fresh start from a continuation: when
+        Workflow state preserves data across run() calls (multi-turn
+        scenarios), the start executor needs to avoid calling initialize()
+        and clobbering the prior turn's Conversation/Local/System data.
+        """
+        return self._state.get(DECLARATIVE_STATE_KEY) is not None
+
    def set_state_data(self, data: DeclarativeStateData) -> None:
        """Set the full state data dict in state."""
        self._state.set(DECLARATIVE_STATE_KEY, data)
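Review note: the fresh-start versus continuation distinction that `is_initialized()` enables can be sketched with a toy state store. Everything here is hypothetical (`ToyState`, `start_turn`, and the `STATE_KEY` value are not the library's names or layout); it only illustrates why a start executor should skip `initialize()` when state from a prior turn already exists.

```python
from typing import Any

STATE_KEY = "_declarative_state"  # hypothetical key; the real constant's value is not shown


class ToyState:
    """Dict-backed stand-in for the declarative workflow state."""

    def __init__(self) -> None:
        self._store: dict[str, Any] = {}

    def is_initialized(self) -> bool:
        return self._store.get(STATE_KEY) is not None

    def initialize(self, inputs: dict[str, Any]) -> None:
        # Full reset: replaces everything, including data from prior turns.
        self._store[STATE_KEY] = {"Inputs": dict(inputs), "Local": {}}

    def data(self) -> dict[str, Any]:
        return self._store[STATE_KEY]


def start_turn(state: ToyState, user_text: str) -> None:
    if state.is_initialized():
        # Continuation: refresh only the new turn's input.
        state.data()["Inputs"]["input"] = user_text
    else:
        # First turn: full initialization.
        state.initialize({"input": user_text})


state = ToyState()
start_turn(state, "first")
state.data()["Local"]["marker"] = "kept"  # something written during turn one
start_turn(state, "second")
```

After the second call the marker written during the first turn is still present while `Inputs.input` holds the new text, which mirrors what the commit message says `test_as_agent_continuation_preserves_prior_state` asserts.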
@@ -873,6 +898,20 @@ class DeclarativeActionExecutor(Executor):
        Follows .NET's DefaultTransform pattern - accepts any input type:
        - dict/Mapping: Used directly as workflow.inputs
        - str: Converted to {"input": value}
+        - list[Message]: Treated as the agent-facing message contract
+          (e.g. from WorkflowAgent / as_agent()). The prior conversation
+          history is stored in ``Conversation.messages``/
+          ``Conversation.history`` and mirrored to
+          ``System.conversations.{id}.messages`` so workflows that
+          reference ``=Conversation.messages`` (e.g. InvokeAzureAgent) see
+          assistant turns and other earlier messages, including non-text
+          content. At the start of a turn this history excludes the current
+          user message; that message's text is instead used as the string
+          input (``Inputs.input``) and surfaced via ``System.LastMessage*``
+          for backward compatibility with simple text-only workflows. Agent
+          executors are responsible for appending the current user message
+          to ``Conversation.messages`` immediately before invoking the
+          inner agent.
        - DeclarativeMessage: Internal message, no initialization needed
        - Any other type: Converted via str() to {"input": str(value)}

@@ -888,6 +927,104 @@ class DeclarativeActionExecutor(Executor):
        if isinstance(trigger, dict):
            # Structured inputs - use directly
            state.initialize(trigger)  # type: ignore
+        elif isinstance(trigger, list) and all(isinstance(m, Message) for m in trigger):  # pyright: ignore[reportUnknownVariableType]
+            # list[Message] (e.g. from WorkflowAgent / as_agent()).
+            messages_list = cast(list[Message], trigger)
+
+            # Detect continuation: declarative state already exists in the
+            # workflow's shared state, either left over in memory from a
+            # prior turn on the same Workflow instance or restored from a
+            # checkpoint just before this run.
+            #
+            # In that case we MUST NOT call state.initialize(): it would
+            # wipe Conversation.*, Local.*, System.* etc., destroying the
+            # cross-turn context we are trying to preserve.
+            #
+            # Instead, treat the trigger as the new turn's input only:
+            # update Inputs.input and refresh System.LastMessage*. The new
+            # user message itself is NOT appended to Conversation.messages
+            # here (agent executors append it before invoking the inner
+            # agent); any other messages in the trigger are appended so
+            # later actions see them.
+            is_continuation = state.is_initialized()
+
+            # Locate the trailing user message in the trigger.
+            last_user_index = -1
+            for idx in range(len(messages_list) - 1, -1, -1):
+                if str(messages_list[idx].role).lower() == "user":
+                    last_user_index = idx
+                    break
+
+            if last_user_index >= 0:
+                last_user_msg = messages_list[last_user_index]
+                last_user_text = last_user_msg.text or ""
+                last_user_id = getattr(last_user_msg, "message_id", "") or ""
+                history_messages = (
+                    messages_list[:last_user_index] + messages_list[last_user_index + 1:]
+                )
+            else:
+                history_messages = list(messages_list)
+                tail = messages_list[-1] if messages_list else None
+                last_user_text = (tail.text or "") if tail is not None else ""
+                last_user_id = (
+                    (getattr(tail, "message_id", "") or "") if tail is not None else ""
+                )
+
+            if is_continuation:
+                # Continuation turn: keep prior Conversation.messages intact.
+                # Refresh inputs and surface the new user message via the
+                # System.LastMessage* fields. We deliberately do NOT append
+                # the new user message to Conversation.messages here: agent
+                # executors append the live user input themselves before
+                # invoking the inner agent (matching the first-turn
+                # contract where Conversation.messages holds prior turns
+                # only).
+                #
+                # Note: ``state.set("Inputs.input", ...)`` would route to
+                # the Custom namespace (Inputs is not a recognized top-level
+                # writable namespace - see DeclarativeWorkflowState.set).
+                # PowerFx expressions like ``=Workflow.Inputs.input`` /
+                # ``=inputs.input`` read state_data["Inputs"] directly, so
+                # we update that dict in place via get_state_data /
+                # set_state_data.
+                state_data = state.get_state_data()
+                inputs_dict = state_data.get("Inputs")
+                if not isinstance(inputs_dict, dict):
+                    inputs_dict = {}
+                    state_data["Inputs"] = inputs_dict
+                inputs_dict["input"] = last_user_text
+                state.set_state_data(state_data)
+                # Every other message in the trigger (e.g. tool results
+                # that precede the new user message) is still appended so
+                # later actions see it.
+                for msg in history_messages:
+                    state.append("Conversation.messages", msg)
+                    state.append("Conversation.history", msg)
+                conversation_id = state.get("System.ConversationId")
+                if conversation_id:
+                    conv_path = f"System.conversations.{conversation_id}.messages"
+                    for msg in history_messages:
+                        state.append(conv_path, msg)
+                state.set("System.LastMessage", {"Text": last_user_text, "Id": last_user_id})
+                state.set("System.LastMessageText", last_user_text)
+                state.set("System.LastMessageId", last_user_id)
+            else:
+                # First turn: full initialization.
+                state.initialize({"input": last_user_text})
+
+                for msg in history_messages:
+                    state.append("Conversation.messages", msg)
+                    state.append("Conversation.history", msg)
+
+                conversation_id = state.get("System.ConversationId")
+                if conversation_id:
+                    conv_path = f"System.conversations.{conversation_id}.messages"
+                    for msg in history_messages:
+                        state.append(conv_path, msg)
+
+                state.set("System.LastMessage", {"Text": last_user_text, "Id": last_user_id})
+                state.set("System.LastMessageText", last_user_text)
+                state.set("System.LastMessageId", last_user_id)
        elif isinstance(trigger, str):
            # String input - wrap in dict and populate System.LastMessage.Text
            # so YAML expressions like =System.LastMessage.Text see the user input
@@ -895,10 +1032,11 @@ class DeclarativeActionExecutor(Executor):
            state.set("System.LastMessage", {"Text": trigger, "Id": ""})
            state.set("System.LastMessageText", trigger)
        elif not isinstance(
-            trigger, (ActionTrigger, ActionComplete, ConditionResult, LoopIterationResult, LoopControl)
+            trigger,
+            (ActionTrigger, ActionComplete, ConditionResult, LoopIterationResult, LoopControl),  # pyright: ignore[reportUnknownArgumentType]
        ):
            # Any other type - convert to string like .NET's DefaultTransform
-            input_str = str(trigger)
+            input_str = str(cast(Any, trigger))
            state.initialize({"input": input_str})
            state.set("System.LastMessage", {"Text": input_str, "Id": ""})
            state.set("System.LastMessageText", input_str)
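Review note: the list[Message] branch above splits the trigger into "everything except the latest user message" and "the latest user message's text and id". A minimal standalone sketch of that split follows, assuming a hypothetical `Msg` dataclass in place of `agent_framework.Message` and a hypothetical helper name `split_trigger`; the real branch also writes the results into workflow state, which is omitted here.

```python
from dataclasses import dataclass


@dataclass
class Msg:
    """Hypothetical stand-in for agent_framework.Message."""

    role: str
    text: str = ""
    message_id: str = ""


def split_trigger(messages: list[Msg]) -> tuple[list[Msg], str, str]:
    """Return (history, last_text, last_id) for a message-list trigger."""
    # Scan backwards for the most recent user message.
    last_user_index = next(
        (i for i in range(len(messages) - 1, -1, -1) if messages[i].role.lower() == "user"),
        -1,
    )
    if last_user_index >= 0:
        current = messages[last_user_index]
        # History is every message except the current user message.
        history = messages[:last_user_index] + messages[last_user_index + 1:]
        return history, current.text or "", current.message_id or ""
    # No user message: keep everything as history, fall back to the tail.
    if not messages:
        return [], "", ""
    tail = messages[-1]
    return list(messages), tail.text or "", tail.message_id or ""


turn = [
    Msg("user", "hi", "m1"),
    Msg("assistant", "hello", "m2"),
    Msg("user", "what is next?", "m3"),
]
history, last_text, last_id = split_trigger(turn)
```

Keeping the current user message out of `history` matches the documented contract that `Conversation.messages` holds prior turns only at the start of a turn.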
@@ -17,6 +17,7 @@ The key insight is that control flow becomes GRAPH STRUCTURE, not executor logic
 from typing import Any, cast

 from agent_framework import (
+    Message,
    WorkflowContext,
    handler,
 )
@@ -492,7 +493,13 @@ class JoinExecutor(DeclarativeActionExecutor):
    @handler
    async def handle_action(
        self,
-        trigger: dict[str, Any] | str | ActionTrigger | ActionComplete | ConditionResult | LoopIterationResult,
+        trigger: dict[str, Any]
+        | str
+        | list[Message]
+        | ActionTrigger
+        | ActionComplete
+        | ConditionResult
+        | LoopIterationResult,
        ctx: WorkflowContext[ActionComplete],
    ) -> None:
        """Simply pass through to continue the workflow."""