Python: Fix incorrect workflow timings in DevUI by adding created_at to executor events (#5615)

* fix(devui): add created_at to custom output item events for correct workflow timings (#5545)

CustomResponseOutputItemAddedEvent and CustomResponseOutputItemDoneEvent lacked a
created_at field, causing the frontend to synthesize timestamps using integer-second
precision with a forced +1s minimum gap between events. This made instant workflows
appear to take 3+ seconds in the DevUI timeline.

Fix:
- Add optional created_at: float | None field to both custom event models
- Populate created_at=float(time.time()) in the mapper for executor_invoked,
  executor_completed, and executor_failed events

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(devui): use event created_at for accurate workflow timeline timings

workflow-view.tsx synthesized _uiTimestamp using Math.max(baseTimestamp,
lastTimestamp + 1) with integer-second precision, forcing a minimum 1-second
gap between every sequential event. This made instant workflows appear to take
several seconds in the DevUI timeline.

The fix prefers event.created_at (a float Unix timestamp populated by the
backend mapper for all executor events) and only falls back to the synthetic
timestamp when created_at is absent. This matches the pattern already used in
devuiStore.ts:addDebugEvent.

Added a regression test in test_mapper.py verifying that the mapper attaches
created_at to all executor lifecycle events (invoked, completed, failed).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(devui): address review feedback for issue #5545

- Read data.timestamp (ISO string) and response.created_at in addition
  to top-level created_at when deriving _uiTimestamp, so
  response.workflow_event.completed events get a real server timestamp
  instead of a synthesized one
- Change uniqueTimestamp tiebreaker: when a real server timestamp is
  available use Math.max(eventTimestamp, lastTimestamp) rather than
  lastTimestamp + 1, eliminating artificial 1-second gaps while still
  preserving monotonic ordering
- Apply the same fix in the HIL streaming path (second setOpenAIEvents
  call in workflow-view.tsx)
- Add assert event.created_at > 0 to regression test to guard against
  zero or negative timestamps
- Add test_custom_output_item_event_models_have_created_at_field model-
  level test so removing the field produces a clear named failure rather
  than a downstream ValidationError

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(#5545): guard NaN timestamps, fix fallback ID uniqueness, add regression tests

- workflow-view.tsx (×2): Wrap data.timestamp ISO→number conversion in a
  Number.isFinite() guard.  Python's datetime.now().isoformat() emits
  microseconds without a trailing 'Z' (e.g. '2024-01-15T12:34:56.123456'),
  which some JS engines cannot parse, returning NaN.  NaN !== undefined is
  true so the eventTimestamp !== undefined guard did not catch it, poisoning
  _uiTimestamp and resetting the monotonic ordering seed (NaN || 0 → 0).

- execution-timeline.tsx: Replace uiTimestamp in the fallback syntheticItemId
  with the per-executor runNumber counter.  Two runs of the same executor
  within the same second previously received identical _uiTimestamp values
  and therefore identical syntheticItemIds, causing their output buckets,
  state, and run entries to collide (execution-timeline.tsx:360–408).

- Add missing test_workflow_timings_bug.py source file (only a stale .pyc
  existed).  Three regression tests:
    · test_custom_event_models_lack_created_at_field – model field guard
    · test_workflow_executor_events_lack_created_at – mapper populates created_at
    · test_rapid_workflow_events_have_no_top_level_timestamps – confirms
      data.timestamp format that requires the frontend NaN guard

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address review feedback for #5545: Python: [Bug]: Workflow timings in DevUI are incorrect

* devui: move timing regression tests into test_mapper.py, remove dedicated bug file

- Delete test_workflow_timings_bug.py; tests belong in existing module files
- The two tests already present in test_mapper.py (test_executor_events_carry_created_at_timestamp
  and test_custom_output_item_event_models_have_created_at_field) cover the same ground as the
  first two tests in the deleted file
- Add test_executor_completed_maps_to_output_item_done_event to test_mapper.py, replacing the
  third test from the deleted file with a generic, issue-agnostic name and docstring

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address review feedback for #5545: review comment fixes

---------

Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This commit is contained in:
Evan Mattson
2026-05-05 14:59:08 +09:00
committed by GitHub
Unverified
parent 7476049d7e
commit 27f926609f
5 changed files with 147 additions and 9 deletions
@@ -1056,6 +1056,7 @@ class MessageMapper:
output_index=context["output_index"],
sequence_number=self._next_sequence(context),
item=executor_item,
created_at=float(time.time()),
)
]
@@ -1088,6 +1089,7 @@ class MessageMapper:
output_index=context.get("output_index", 0),
sequence_number=self._next_sequence(context),
item=executor_item,
created_at=float(time.time()),
)
]
@@ -1121,6 +1123,7 @@ class MessageMapper:
output_index=context.get("output_index", 0),
sequence_number=self._next_sequence(context),
item=executor_item,
created_at=float(time.time()),
)
]
@@ -64,6 +64,7 @@ class CustomResponseOutputItemAddedEvent(BaseModel):
output_index: int
sequence_number: int
item: dict[str, Any] | ExecutorActionItem | Any # Flexible item type
created_at: float | None = None # Unix timestamp; used by frontend for accurate workflow timings
class CustomResponseOutputItemDoneEvent(BaseModel):
@@ -77,6 +78,7 @@ class CustomResponseOutputItemDoneEvent(BaseModel):
output_index: int
sequence_number: int
item: dict[str, Any] | ExecutorActionItem | Any # Flexible item type
created_at: float | None = None # Unix timestamp; used by frontend for accurate workflow timings
class ResponseWorkflowEventComplete(BaseModel):
@@ -356,8 +356,10 @@ export function ExecutionTimeline({
const runNumber = (runCount.get(executorId) || 0) + 1;
runCount.set(executorId, runNumber);
// Create synthetic item ID for fallback format (no real item.id from backend)
const syntheticItemId = `fallback_${executorId}_${uiTimestamp}`;
// Create synthetic item ID using the run counter for guaranteed uniqueness.
// Using uiTimestamp here caused collisions when the same executor ran
// twice within the same second (both fallback entries would share an ID).
const syntheticItemId = `fallback_${executorId}_run${runNumber}`;
runs.push({
executorId,
@@ -576,17 +576,37 @@ export function WorkflowView({
openAIEvent.type === "response.workflow_event.complete" // Fallback variant
) {
setOpenAIEvents((prev) => {
// Generate unique timestamp for each event
// Derive a server-side timestamp from the event, in priority order:
// 1. top-level created_at (custom output-item events)
// 2. response.created_at (response.created / lifecycle events)
// 3. data.timestamp (response.workflow_event.completed ISO string)
// Fall back to a synthesized timestamp only when none is present.
const anyEvent = openAIEvent as Record<string, unknown>;
const eventTimestamp: number | undefined =
typeof anyEvent["created_at"] === "number" && anyEvent["created_at"]
? (anyEvent["created_at"] as number)
: typeof (anyEvent["response"] as Record<string, unknown> | undefined)?.["created_at"] === "number"
? ((anyEvent["response"] as Record<string, number>)["created_at"] as number)
: (() => {
const ts = (anyEvent["data"] as Record<string, unknown> | undefined)?.["timestamp"];
if (typeof ts !== "string") return undefined;
const ms = new Date(ts).getTime();
// Guard against NaN: Python isoformat() emits microseconds without Z,
// which some JS engines cannot parse. Number.isFinite rejects NaN.
return Number.isFinite(ms) ? ms / 1000 : undefined;
})();
const baseTimestamp = Math.floor(Date.now() / 1000);
const lastTimestamp =
prev.length > 0
? (prev[prev.length - 1] as { _uiTimestamp?: number })
._uiTimestamp || 0
: 0;
const uniqueTimestamp = Math.max(
baseTimestamp,
lastTimestamp + 1
);
// When we have a real server timestamp clamp to lastTimestamp (no +1s gap).
// When synthesizing, keep the +1 s gap so ordering is always monotonic.
const uniqueTimestamp =
eventTimestamp !== undefined
? Math.max(eventTimestamp, lastTimestamp)
: Math.max(baseTimestamp, lastTimestamp + 1);
return [
...prev,
@@ -992,14 +1012,37 @@ export function WorkflowView({
openAIEvent.type === "response.workflow_event.completed"
) {
setOpenAIEvents((prev) => {
// Generate unique timestamp for each event
// Derive a server-side timestamp from the event, in priority order:
// 1. top-level created_at (custom output-item events)
// 2. response.created_at (response.created / lifecycle events)
// 3. data.timestamp (response.workflow_event.completed ISO string)
// Fall back to a synthesized timestamp only when none is present.
const anyEvent = openAIEvent as Record<string, unknown>;
const eventTimestamp: number | undefined =
typeof anyEvent["created_at"] === "number" && anyEvent["created_at"]
? (anyEvent["created_at"] as number)
: typeof (anyEvent["response"] as Record<string, unknown> | undefined)?.["created_at"] === "number"
? ((anyEvent["response"] as Record<string, number>)["created_at"] as number)
: (() => {
const ts = (anyEvent["data"] as Record<string, unknown> | undefined)?.["timestamp"];
if (typeof ts !== "string") return undefined;
const ms = new Date(ts).getTime();
// Guard against NaN: Python isoformat() emits microseconds without Z,
// which some JS engines cannot parse. Number.isFinite rejects NaN.
return Number.isFinite(ms) ? ms / 1000 : undefined;
})();
const baseTimestamp = Math.floor(Date.now() / 1000);
const lastTimestamp =
prev.length > 0
? (prev[prev.length - 1] as { _uiTimestamp?: number })
._uiTimestamp || 0
: 0;
const uniqueTimestamp = Math.max(baseTimestamp, lastTimestamp + 1);
// When we have a real server timestamp clamp to lastTimestamp (no +1s gap).
// When synthesizing, keep the +1 s gap so ordering is always monotonic.
const uniqueTimestamp =
eventTimestamp !== undefined
? Math.max(eventTimestamp, lastTimestamp)
: Math.max(baseTimestamp, lastTimestamp + 1);
return [
...prev,
@@ -391,6 +391,94 @@ async def test_executor_failed_event(mapper: MessageMapper, test_request: AgentF
assert "Executor failed" in str(item["error"])
async def test_executor_events_carry_created_at_timestamp(
mapper: MessageMapper, test_request: AgentFrameworkRequest
) -> None:
"""REGRESSION TEST: Executor mapped events must include a created_at timestamp.
Without created_at, the frontend synthesizes timestamps using
Math.max(baseTimestamp, lastTimestamp + 1) with second precision, forcing
a minimum 1-second gap between sequential events regardless of their actual
elapsed time. This makes instant workflows appear to take multiple seconds
in the DevUI timeline.
"""
invoke_event = create_executor_invoked_event(executor_id="exec_ts")
complete_event = create_executor_completed_event(executor_id="exec_ts")
fail_event = create_executor_failed_event(executor_id="exec_ts_fail")
invoked_results = await mapper.convert_event(invoke_event, test_request)
completed_results = await mapper.convert_event(complete_event, test_request)
# Set up a separate context for the failed path
mapper2 = MessageMapper()
await mapper2.convert_event(create_executor_invoked_event(executor_id="exec_ts_fail"), test_request)
failed_results = await mapper2.convert_event(fail_event, test_request)
for label, results in [
("executor_invoked", invoked_results),
("executor_completed", completed_results),
("executor_failed", failed_results),
]:
assert results, f"mapper.convert_event should return events for {label}"
for event in results:
assert getattr(event, "created_at", None) is not None, (
f"{label} mapped event {type(event).__name__} is missing 'created_at'. "
"The frontend relies on this field for accurate workflow timeline timings."
)
assert event.created_at > 0, (
f"{label} mapped event {type(event).__name__} has a non-positive "
f"created_at value ({event.created_at!r}); expected a valid Unix timestamp."
)
def test_custom_output_item_event_models_have_created_at_field() -> None:
"""MODEL TEST: CustomResponseOutputItemAddedEvent and Done must declare created_at.
This guards against accidentally removing the field from the model definition.
A missing field causes a downstream ValidationError instead of a clear test failure.
"""
from agent_framework_devui.models._openai_custom import (
CustomResponseOutputItemAddedEvent,
CustomResponseOutputItemDoneEvent,
)
assert "created_at" in CustomResponseOutputItemAddedEvent.model_fields, (
"CustomResponseOutputItemAddedEvent is missing 'created_at' in model_fields. "
"The frontend uses this field for accurate workflow timeline timings."
)
assert "created_at" in CustomResponseOutputItemDoneEvent.model_fields, (
"CustomResponseOutputItemDoneEvent is missing 'created_at' in model_fields. "
"The frontend uses this field for accurate workflow timeline timings."
)
async def test_executor_completed_maps_to_output_item_done_event(
mapper: MessageMapper, test_request: AgentFrameworkRequest
) -> None:
"""Test executor_completed events are mapped to CustomResponseOutputItemDoneEvent.
Ensures executor_completed does not fall through to the legacy
ResponseWorkflowEventComplete path, which lacks a top-level created_at field.
"""
from agent_framework_devui.models._openai_custom import ResponseWorkflowEventComplete
invoke_event = create_executor_invoked_event(executor_id="exec_output_item")
await mapper.convert_event(invoke_event, test_request)
complete_event = create_executor_completed_event(executor_id="exec_output_item")
results = await mapper.convert_event(complete_event, test_request)
assert results, "mapper.convert_event should return events for executor_completed"
workflow_events = [r for r in results if isinstance(r, ResponseWorkflowEventComplete)]
assert not workflow_events, (
"executor_completed should map to CustomResponseOutputItemDoneEvent, not ResponseWorkflowEventComplete."
)
output_item_done = [r for r in results if r.type == "response.output_item.done"]
assert output_item_done, f"Expected at least one response.output_item.done event; got: {[r.type for r in results]}"
# =============================================================================
# Workflow Lifecycle Event Tests
# =============================================================================