Python: Fix incorrect workflow timings in DevUI by adding created_at to executor events (#5615)

* fix(devui): add created_at to custom output item events for correct workflow timings (#5545) CustomResponseOutputItemAddedEvent and CustomResponseOutputItemDoneEvent lacked a created_at field, causing the frontend to synthesize timestamps using integer-second precision with a forced +1s minimum gap between events. This made instant workflows appear to take 3+ seconds in the DevUI timeline. Fix: - Add optional created_at: float | None field to both custom event models - Populate created_at=float(time.time()) in the mapper for executor_invoked, executor_completed, and executor_failed events Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix(devui): use event created_at for accurate workflow timeline timings workflow-view.tsx synthesized _uiTimestamp using Math.max(baseTimestamp, lastTimestamp + 1) with integer-second precision, forcing a minimum 1-second gap between every sequential event. This made instant workflows appear to take several seconds in the DevUI timeline. The fix prefers event.created_at (a float Unix timestamp populated by the backend mapper for all executor events) and only falls back to the synthetic timestamp when created_at is absent. This matches the pattern already used in devuiStore.ts:addDebugEvent. Added a regression test in test_mapper.py verifying that the mapper attaches created_at to all executor lifecycle events (invoked, completed, failed). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix(devui): address review feedback for issue #5545 - Read data.timestamp (ISO string) and response.created_at in addition to top-level created_at when deriving _uiTimestamp, so response.workflow_event.completed events get a real server timestamp instead of a synthesized one - Change uniqueTimestamp tiebreaker: when a real server timestamp is available use Math.max(eventTimestamp, lastTimestamp) rather than lastTimestamp + 1, eliminating artificial 1-second gaps while still preserving monotonic ordering - Apply the same fix in the HIL streaming path (second setOpenAIEvents call in workflow-view.tsx) - Add assert event.created_at > 0 to regression test to guard against zero or negative timestamps - Add test_custom_output_item_event_models_have_created_at_field model- level test so removing the field produces a clear named failure rather than a downstream ValidationError Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix(#5545): guard NaN timestamps, fix fallback ID uniqueness, add regression tests - workflow-view.tsx (×2): Wrap data.timestamp ISO→number conversion in a Number.isFinite() guard. Python's datetime.now().isoformat() emits microseconds without a trailing 'Z' (e.g. '2024-01-15T12:34:56.123456'), which some JS engines cannot parse, returning NaN. NaN !== undefined is true so the eventTimestamp !== undefined guard did not catch it, poisoning _uiTimestamp and resetting the monotonic ordering seed (NaN || 0 → 0). - execution-timeline.tsx: Replace uiTimestamp in the fallback syntheticItemId with the per-executor runNumber counter. Two runs of the same executor within the same second previously received identical _uiTimestamp values and therefore identical syntheticItemIds, causing their output buckets, state, and run entries to collide (execution-timeline.tsx:360–408). - Add missing test_workflow_timings_bug.py source file (only a stale .pyc existed). Three regression tests: · test_custom_event_models_lack_created_at_field – model field guard · test_workflow_executor_events_lack_created_at – mapper populates created_at · test_rapid_workflow_events_have_no_top_level_timestamps – confirms data.timestamp format that requires the frontend NaN guard Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Address review feedback for #5545: Python: [Bug]: Workflow timings in DevUI are incorrect * devui: move timing regression tests into test_mapper.py, remove dedicated bug file - Delete test_workflow_timings_bug.py; tests belong in existing module files - The two tests already present in test_mapper.py (test_executor_events_carry_created_at_timestamp and test_custom_output_item_event_models_have_created_at_field) cover the same ground as the first two tests in the deleted file - Add test_executor_completed_maps_to_output_item_done_event to test_mapper.py, replacing the third test from the deleted file with a generic, issue-agnostic name and docstring Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Address review feedback for #5545: review comment fixes --------- Co-authored-by: Copilot <copilot@github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-16 21:04:09 +08:00 · 2026-05-05 14:59:08 +09:00
parent 7476049d7e
commit 27f926609f
5 changed files with 147 additions and 9 deletions
@@ -1056,6 +1056,7 @@ class MessageMapper:
                        output_index=context["output_index"],
                        sequence_number=self._next_sequence(context),
                        item=executor_item,
+                        created_at=float(time.time()),
                    )
                ]

@@ -1088,6 +1089,7 @@ class MessageMapper:
                        output_index=context.get("output_index", 0),
                        sequence_number=self._next_sequence(context),
                        item=executor_item,
+                        created_at=float(time.time()),
                    )
                ]

@@ -1121,6 +1123,7 @@ class MessageMapper:
                        output_index=context.get("output_index", 0),
                        sequence_number=self._next_sequence(context),
                        item=executor_item,
+                        created_at=float(time.time()),
                    )
                ]

@@ -64,6 +64,7 @@ class CustomResponseOutputItemAddedEvent(BaseModel):
    output_index: int
    sequence_number: int
    item: dict[str, Any] | ExecutorActionItem | Any  # Flexible item type
+    created_at: float | None = None  # Unix timestamp; used by frontend for accurate workflow timings


 class CustomResponseOutputItemDoneEvent(BaseModel):
@@ -77,6 +78,7 @@ class CustomResponseOutputItemDoneEvent(BaseModel):
    output_index: int
    sequence_number: int
    item: dict[str, Any] | ExecutorActionItem | Any  # Flexible item type
+    created_at: float | None = None  # Unix timestamp; used by frontend for accurate workflow timings


 class ResponseWorkflowEventComplete(BaseModel):
@@ -356,8 +356,10 @@ export function ExecutionTimeline({
        const runNumber = (runCount.get(executorId) || 0) + 1;
        runCount.set(executorId, runNumber);

-        // Create synthetic item ID for fallback format (no real item.id from backend)
-        const syntheticItemId = `fallback_${executorId}_${uiTimestamp}`;
+        // Create synthetic item ID using the run counter for guaranteed uniqueness.
+        // Using uiTimestamp here caused collisions when the same executor ran
+        // twice within the same second (both fallback entries would share an ID).
+        const syntheticItemId = `fallback_${executorId}_run${runNumber}`;

        runs.push({
          executorId,
@@ -576,17 +576,37 @@ export function WorkflowView({
            openAIEvent.type === "response.workflow_event.complete" // Fallback variant
          ) {
            setOpenAIEvents((prev) => {
-              // Generate unique timestamp for each event
+              // Derive a server-side timestamp from the event, in priority order:
+              //   1. top-level created_at  (custom output-item events)
+              //   2. response.created_at   (response.created / lifecycle events)
+              //   3. data.timestamp        (response.workflow_event.completed ISO string)
+              // Fall back to a synthesized timestamp only when none is present.
+              const anyEvent = openAIEvent as Record<string, unknown>;
+              const eventTimestamp: number | undefined =
+                typeof anyEvent["created_at"] === "number" && anyEvent["created_at"]
+                  ? (anyEvent["created_at"] as number)
+                  : typeof (anyEvent["response"] as Record<string, unknown> | undefined)?.["created_at"] === "number"
+                  ? ((anyEvent["response"] as Record<string, number>)["created_at"] as number)
+                  : (() => {
+                      const ts = (anyEvent["data"] as Record<string, unknown> | undefined)?.["timestamp"];
+                      if (typeof ts !== "string") return undefined;
+                      const ms = new Date(ts).getTime();
+                      // Guard against NaN: Python isoformat() emits microseconds without Z,
+                      // which some JS engines cannot parse. Number.isFinite rejects NaN.
+                      return Number.isFinite(ms) ? ms / 1000 : undefined;
+                    })();
              const baseTimestamp = Math.floor(Date.now() / 1000);
              const lastTimestamp =
                prev.length > 0
                  ? (prev[prev.length - 1] as { _uiTimestamp?: number })
                      ._uiTimestamp || 0
                  : 0;
-              const uniqueTimestamp = Math.max(
-                baseTimestamp,
-                lastTimestamp + 1
-              );
+              // When we have a real server timestamp clamp to lastTimestamp (no +1s gap).
+              // When synthesizing, keep the +1 s gap so ordering is always monotonic.
+              const uniqueTimestamp =
+                eventTimestamp !== undefined
+                  ? Math.max(eventTimestamp, lastTimestamp)
+                  : Math.max(baseTimestamp, lastTimestamp + 1);

              return [
                ...prev,
@@ -992,14 +1012,37 @@ export function WorkflowView({
          openAIEvent.type === "response.workflow_event.completed"
        ) {
          setOpenAIEvents((prev) => {
-            // Generate unique timestamp for each event
+            // Derive a server-side timestamp from the event, in priority order:
+            //   1. top-level created_at  (custom output-item events)
+            //   2. response.created_at   (response.created / lifecycle events)
+            //   3. data.timestamp        (response.workflow_event.completed ISO string)
+            // Fall back to a synthesized timestamp only when none is present.
+            const anyEvent = openAIEvent as Record<string, unknown>;
+            const eventTimestamp: number | undefined =
+              typeof anyEvent["created_at"] === "number" && anyEvent["created_at"]
+                ? (anyEvent["created_at"] as number)
+                : typeof (anyEvent["response"] as Record<string, unknown> | undefined)?.["created_at"] === "number"
+                ? ((anyEvent["response"] as Record<string, number>)["created_at"] as number)
+                : (() => {
+                    const ts = (anyEvent["data"] as Record<string, unknown> | undefined)?.["timestamp"];
+                    if (typeof ts !== "string") return undefined;
+                    const ms = new Date(ts).getTime();
+                    // Guard against NaN: Python isoformat() emits microseconds without Z,
+                    // which some JS engines cannot parse. Number.isFinite rejects NaN.
+                    return Number.isFinite(ms) ? ms / 1000 : undefined;
+                  })();
            const baseTimestamp = Math.floor(Date.now() / 1000);
            const lastTimestamp =
              prev.length > 0
                ? (prev[prev.length - 1] as { _uiTimestamp?: number })
                    ._uiTimestamp || 0
                : 0;
-            const uniqueTimestamp = Math.max(baseTimestamp, lastTimestamp + 1);
+            // When we have a real server timestamp clamp to lastTimestamp (no +1s gap).
+            // When synthesizing, keep the +1 s gap so ordering is always monotonic.
+            const uniqueTimestamp =
+              eventTimestamp !== undefined
+                ? Math.max(eventTimestamp, lastTimestamp)
+                : Math.max(baseTimestamp, lastTimestamp + 1);

            return [
              ...prev,
@@ -391,6 +391,94 @@ async def test_executor_failed_event(mapper: MessageMapper, test_request: AgentF
    assert "Executor failed" in str(item["error"])


+async def test_executor_events_carry_created_at_timestamp(
+    mapper: MessageMapper, test_request: AgentFrameworkRequest
+) -> None:
+    """REGRESSION TEST: Executor mapped events must include a created_at timestamp.
+
+    Without created_at, the frontend synthesizes timestamps using
+    Math.max(baseTimestamp, lastTimestamp + 1) with second precision, forcing
+    a minimum 1-second gap between sequential events regardless of their actual
+    elapsed time.  This makes instant workflows appear to take multiple seconds
+    in the DevUI timeline.
+    """
+    invoke_event = create_executor_invoked_event(executor_id="exec_ts")
+    complete_event = create_executor_completed_event(executor_id="exec_ts")
+    fail_event = create_executor_failed_event(executor_id="exec_ts_fail")
+
+    invoked_results = await mapper.convert_event(invoke_event, test_request)
+    completed_results = await mapper.convert_event(complete_event, test_request)
+
+    # Set up a separate context for the failed path
+    mapper2 = MessageMapper()
+    await mapper2.convert_event(create_executor_invoked_event(executor_id="exec_ts_fail"), test_request)
+    failed_results = await mapper2.convert_event(fail_event, test_request)
+
+    for label, results in [
+        ("executor_invoked", invoked_results),
+        ("executor_completed", completed_results),
+        ("executor_failed", failed_results),
+    ]:
+        assert results, f"mapper.convert_event should return events for {label}"
+        for event in results:
+            assert getattr(event, "created_at", None) is not None, (
+                f"{label} mapped event {type(event).__name__} is missing 'created_at'. "
+                "The frontend relies on this field for accurate workflow timeline timings."
+            )
+            assert event.created_at > 0, (
+                f"{label} mapped event {type(event).__name__} has a non-positive "
+                f"created_at value ({event.created_at!r}); expected a valid Unix timestamp."
+            )
+
+
+def test_custom_output_item_event_models_have_created_at_field() -> None:
+    """MODEL TEST: CustomResponseOutputItemAddedEvent and Done must declare created_at.
+
+    This guards against accidentally removing the field from the model definition.
+    A missing field causes a downstream ValidationError instead of a clear test failure.
+    """
+    from agent_framework_devui.models._openai_custom import (
+        CustomResponseOutputItemAddedEvent,
+        CustomResponseOutputItemDoneEvent,
+    )
+
+    assert "created_at" in CustomResponseOutputItemAddedEvent.model_fields, (
+        "CustomResponseOutputItemAddedEvent is missing 'created_at' in model_fields. "
+        "The frontend uses this field for accurate workflow timeline timings."
+    )
+    assert "created_at" in CustomResponseOutputItemDoneEvent.model_fields, (
+        "CustomResponseOutputItemDoneEvent is missing 'created_at' in model_fields. "
+        "The frontend uses this field for accurate workflow timeline timings."
+    )
+
+
+async def test_executor_completed_maps_to_output_item_done_event(
+    mapper: MessageMapper, test_request: AgentFrameworkRequest
+) -> None:
+    """Test executor_completed events are mapped to CustomResponseOutputItemDoneEvent.
+
+    Ensures executor_completed does not fall through to the legacy
+    ResponseWorkflowEventComplete path, which lacks a top-level created_at field.
+    """
+    from agent_framework_devui.models._openai_custom import ResponseWorkflowEventComplete
+
+    invoke_event = create_executor_invoked_event(executor_id="exec_output_item")
+    await mapper.convert_event(invoke_event, test_request)
+
+    complete_event = create_executor_completed_event(executor_id="exec_output_item")
+    results = await mapper.convert_event(complete_event, test_request)
+
+    assert results, "mapper.convert_event should return events for executor_completed"
+
+    workflow_events = [r for r in results if isinstance(r, ResponseWorkflowEventComplete)]
+    assert not workflow_events, (
+        "executor_completed should map to CustomResponseOutputItemDoneEvent, not ResponseWorkflowEventComplete."
+    )
+
+    output_item_done = [r for r in results if r.type == "response.output_item.done"]
+    assert output_item_done, f"Expected at least one response.output_item.done event; got: {[r.type for r in results]}"
+
+
 # =============================================================================
 # Workflow Lifecycle Event Tests
 # =============================================================================