codex

[codex] Use input items for Responses Lite tools (#27946 )

When using Responses Lite, we should all use `additional_tools` and a
developer item instead of the top level tools array & instructions
field. This keeps things 1-to-1.

Forced namespacing for _all_ tools will land in a following PR after
some coordination & fixes in Responses API (around collisions & return
items).

The goal is to eventually expand the scope of this to _all_ requests
from codex, but that will require larger coordination across providers &
slower rollout.

rka-oai · 2026-06-22 23:56:16 -07:00

33cc928d33

core: rename metadata -> internal_chat_message_metadata_passthrough (#28968 )

## Description
This PR cuts Codex over from generic `ResponseItem.metadata` (introduced
here: https://github.com/openai/codex/pull/28355) to
`ResponseItem.internal_chat_message_metadata_passthrough`, which is the
blessed path and has strongly-typed keys.

For now we have to drop this MAv2 usage of `metadata`:
https://github.com/openai/codex/pull/28561 until we figure out where
that should live.

Owen Lin · 2026-06-22 11:11:25 -07:00

5b95745eae

Propagate safety buffering events to app-server clients (#29371 )

Responses API safety buffering metadata currently stops at the transport
boundary, so app-server clients cannot render the in-progress safety
review state.

This change:
- decodes and deduplicates `safety_buffering` metadata from Responses
API SSE and WebSocket events without suppressing the original response
event
- emits a typed core event containing the requested model plus backend
use cases and reasons
- forwards that event as `turn/safetyBuffering/updated` through
app-server v2 and updates generated protocol schemas
- keeps the side-channel event out of persisted rollouts and turn timing

This supports the Codex Apps buffering UX and depends on the Responses
API backend work in https://github.com/openai/openai/pull/1044569 and
https://github.com/openai/openai/pull/1044571.

Validation:
- focused `codex-core` safety-buffering integration test passes
- `cargo check -p codex-core -p codex-app-server -p
codex-app-server-protocol`
- `just fix -p codex-api -p codex-protocol -p codex-core -p
codex-app-server-protocol -p codex-app-server -p codex-rollout -p
codex-rollout-trace -p codex-otel`
- `just fmt`
- broad package test run: 4,430/4,492 passed; 62 unrelated
local-environment/concurrency failures involved unavailable test
binaries, MCP subprocess setup, and app-server timeouts

Francis Chalissery · 2026-06-22 03:39:14 +00:00

566f7bf631

[codex] Assign response item IDs when recording history (#28814 )

## Why

Client-created response items enter history without IDs, so their
identity is lost across rollout persistence and resume. IDs should be
assigned once at the history-recording boundary, while IDs returned by
the server must remain unchanged.

The Responses API validates item IDs using type-specific prefixes.
Locally generated IDs therefore use the matching prefix plus a
hyphenated UUIDv7, keeping them valid while distinguishable from
server-generated IDs. Because this changes persisted history and
provider request shapes, the behavior is opt-in behind the
under-development `item_ids` feature. Compaction triggers remain request
controls whose API shape does not accept an ID.

## What changed

- Register the disabled-by-default `item_ids` feature and expose it in
`config.schema.json`.
- Make supported optional `ResponseItem` IDs serializable and expose
them in the generated app-server schemas.
- When `item_ids` is enabled, assign an ID during conversation-history
preparation if an item has no ID.
- Generate type-prefixed, hyphenated UUIDv7 IDs using the Responses API
item conventions.
- Preserve existing server IDs without rewriting them.
- Persist assigned IDs in rollouts and include them in subsequent
Responses requests.
- Remove the unsupported ID field from `CompactionTrigger` and document
why it has no ID.
- Add integration coverage for enabled ID persistence, preservation of
server IDs, and omission of generated IDs while the feature is disabled.

`prepare_conversation_items_for_history` is the single response-item ID
allocation boundary.

## Test plan

- `just test -p codex-features`
- `just test -p codex-core
response_item_ids_persist_across_resume_and_preserve_server_ids`
- `just test -p codex-core
non_openai_responses_requests_omit_item_turn_metadata`
- `just test -p codex-core
resize_all_images_prepares_failures_before_history_insertion`
- `just test -p codex-protocol`
- `just test -p codex-app-server-protocol`
- `just test -p codex-api azure_default_store_attaches_ids_and_headers`

pakrym-oai · 2026-06-18 17:30:55 -07:00

f00f93d8c0

unified-exec: retain PathUri in command events (#28780 )

## Why

App-server must report command events containing foreign-platform paths
without changing existing client or rollout path-string formats.

## What changed

- retain `PathUri` through exec command begin/end events
- convert cwd values to `LegacyAppPathString` at the app-server
compatibility boundary
- drop command actions with foreign paths and log them
- serialize rollout-trace cwd values using their inferred native path
representation
- restore Wine coverage for retained Windows cwd values and successful
completion

Adam Perry @ OpenAI · 2026-06-18 05:00:04 +00:00

3931bc2bde

[codex] Add optional IDs to response items (#28812 )

## Why

`ResponseItem` variants do not have a consistent internal ID shape: some
variants carry required IDs, some carry optional IDs, and some cannot
represent an ID at all. The existing fields also use inconsistent serde,
TypeScript, and JSON-schema annotations. A single enum-level access path
is needed before history recording can assign and retain IDs.

This PR establishes that internal model only. It intentionally does not
generate or serialize IDs; allocation and wire persistence are isolated
in the stacked follow-up.

## What changed

- Give every concrete `ResponseItem` variant an `Option<String>` ID
field.
- Apply the same internal-only annotations to every ID field:
`#[serde(default, skip_serializing)]`, `#[ts(skip)]`, and
`#[schemars(skip)]`.
- Add `ResponseItem::id()` and `ResponseItem::set_id()` as the shared
accessors.
- Preserve IDs when history items are rewritten for truncation.
- Adapt consumers that previously assumed reasoning and image-generation
IDs were required.
- Regenerate app-server schemas so the hidden fields are represented
consistently.

The serde catch-all `ResponseItem::Other` remains ID-less because it
must remain a unit variant.

## Test plan

- `cargo check --tests -p codex-core -p codex-api -p codex-rollout-trace
-p codex-image-generation-extension`
- `just test -p codex-protocol`
- `just test -p codex-app-server-protocol`
- `just test -p codex-api -p codex-rollout-trace -p
codex-image-generation-extension`
- `just test -p codex-core event_mapping`

pakrym-oai · 2026-06-17 18:27:43 -07:00

dbd2857f4b

feat: render typed envelopes for multi-agent v2 messages (#28368 )

## Why

Multi-agent v2 messages need a consistent, model-visible envelope that
identifies what kind of interaction occurred, who sent it, and which
agent it targets. Previously, encrypted deliveries exposed only
`encrypted_content`, while child completion used the legacy
`<subagent_notification>` shape. That meant the client could not
consistently present `NEW_TASK`, `MESSAGE`, and `FINAL_ANSWER` using the
same format.

This change adds the routing envelope as plaintext while keeping task
and message payloads encrypted. No new Responses API field is required:
an encrypted delivery is represented as an `input_text` header
immediately followed by its existing `encrypted_content` item.

Every envelope now follows this shape:

```text
Message Type: <NEW_TASK | MESSAGE | FINAL_ANSWER>
Task name: <recipient agent path>
Sender: <author agent path>
Payload:
<message payload>
```

## Message types

### `NEW_TASK`

`NEW_TASK` is used when the recipient should begin a new turn, including
an initial `spawn_agent` task and a later `followup_task`.

For a root agent spawning `/root/worker`, the request contains a
plaintext envelope followed by the encrypted task:

```json
{
  "type": "agent_message",
  "author": "/root",
  "recipient": "/root/worker",
  "content": [
    {
      "type": "input_text",
      "text": "Message Type: NEW_TASK\nTask name: /root/worker\nSender: /root\nPayload:\n"
    },
    {
      "type": "encrypted_content",
      "encrypted_content": "<encrypted task payload>"
    }
  ]
}
```

Conceptually, the model receives:

```text
Message Type: NEW_TASK
Task name: /root/worker
Sender: /root
Payload:
Review the authentication changes and report any regressions.
```

### `MESSAGE`

`MESSAGE` is used for a queued `send_message` delivery. It communicates
with an existing agent without starting a new turn.

For `/root/worker` reporting progress to the root agent, the request
contains:

```json
{
  "type": "agent_message",
  "author": "/root/worker",
  "recipient": "/root",
  "content": [
    {
      "type": "input_text",
      "text": "Message Type: MESSAGE\nTask name: /root\nSender: /root/worker\nPayload:\n"
    },
    {
      "type": "encrypted_content",
      "encrypted_content": "<encrypted message payload>"
    }
  ]
}
```

Conceptually, the model receives:

```text
Message Type: MESSAGE
Task name: /root
Sender: /root/worker
Payload:
The protocol tests pass; I am checking the resume path now.
```

### `FINAL_ANSWER`

`FINAL_ANSWER` is emitted when a child agent reaches a terminal state
and reports its result to its parent. Completion payloads are already
available locally, so the complete envelope is represented as plaintext
rather than as a plaintext header plus encrypted content.

For `/root/worker` completing work for the root agent, the request
contains:

```json
{
  "type": "agent_message",
  "author": "/root/worker",
  "recipient": "/root",
  "content": [
    {
      "type": "input_text",
      "text": "Message Type: FINAL_ANSWER\nTask name: /root\nSender: /root/worker\nPayload:\nNo regressions found."
    }
  ]
}
```

The model-visible form is:

```text
Message Type: FINAL_ANSWER
Task name: /root
Sender: /root/worker
Payload:
No regressions found.
```

Errored, shut down, and missing agents also use `FINAL_ANSWER`, with a
terminal-status description in the payload.

## What changed

- Render `NEW_TASK` or `MESSAGE` in
`InterAgentCommunication::to_model_input_item`, based on whether the
encrypted delivery starts a turn.
- Replace the multi-agent v2 `<subagent_notification>` completion
payload with a model-visible `FINAL_ANSWER` envelope.
- Document `Task name`, `Sender`, and `Payload` consistently in the
multi-agent developer instructions.
- Prevent local-only history projections from treating an encrypted
message's plaintext header as the complete assistant message.
- Preserve rollout-trace interaction edges when an agent message
contains both plaintext and encrypted content.

Legacy multi-agent behavior remains unchanged.

## Verification

- `just test -p codex-protocol`
- `just test -p codex-rollout-trace`
- `just test -p codex-web-search-extension`
- `just test -p codex-core
encrypted_multi_agent_v2_spawn_sends_agent_message_to_child`
- `just test -p codex-core
plaintext_multi_agent_v2_completion_sends_agent_message`
- `just test -p codex-core
multi_agent_v2_followup_task_completion_notifies_parent_on_every_turn`
- `just test -p codex-core
multi_agent_v2_completion_queues_message_for_direct_parent`

jif · 2026-06-16 11:46:59 +02:00

5b22a8e5b1

feat(core): add metadata field to ResponseItem (#28355 )

## Description

This PR adds an optional `metadata` field to `ResponseItem` for
Responses API calls. Only mechanical plumbing, no actual values
populated and sent yet. Turns out just adding a new field to
`ResponseItem` has quite a large blast radius already.

This change is backwards compatible because `metadata` is optional and
omitted when absent, so existing response items and rollout history
without it still deserialize and requests that do not set it keep the
same wire shape. For provider compatibility, we strip out `metadata`
before non-OpenAI Responses requests so Azure and AWS Bedrock never see
this field.

My followup PR here will actually make use of it to start storing and
passing along `turn_id`: https://github.com/openai/codex/pull/28360

## What changed

- Added `ResponseItemMetadata` with optional `turn_id`, plus optional
`metadata` on Responses API item variants and inter-agent communication.
- Preserved item metadata through response-item rewrites such as
truncation, missing tool-output synthesis, compaction history
rebuilding, visible-history conversion, rollout/resume, and generated
app-server schemas/types.
- Strip item metadata from non-OpenAI Responses requests while
preserving it for OpenAI-shaped requests.
- Updated the mechanical fixture/test construction churn required by the
new optional field.

Owen Lin · 2026-06-15 15:05:28 -07:00

040dafa32d

Support plaintext agent messages (#27830 )

## Why

Multi-agent v2 `send_message` deliveries already reach the receiving
model as typed `agent_message` items with encrypted content.
Child-completion notifications are generated by Codex itself, so their
content is plaintext and previously fell back to a serialized JSON
envelope inside an assistant message.

With plaintext `input_text` supported for `agent_message`, both delivery
paths can use the same model-visible type while preserving explicit
author and recipient metadata.

## What changed

- add plaintext `input_text` support to `AgentMessageInputContent` and
regenerate the affected app-server schemas
- preserve `InterAgentCommunication` as structured mailbox input instead
of converting it to assistant text
- record delivered communications as typed `agent_message` history items
- persist a dedicated rollout item so local delivery metadata such as
`trigger_turn` remains available without leaking into the Responses
request
- reconstruct typed agent messages on resume and preserve fork-turn
truncation behavior
- remove request-time assistant-content parsing
- preserve plaintext and encrypted inter-agent deliveries in stage-one
memory inputs
- normalize and link plaintext and encrypted agent messages in rollout
traces without treating inbound messages as child results
- cover the real MultiAgent V2 child-completion path end to end with
deterministic mailbox synchronization

## Verification

- `just test -p codex-core
plaintext_multi_agent_v2_completion_sends_agent_message`
- `just test -p codex-core input_queue_drains_mailbox_in_delivery_order
record_initial_history_reconstructs_typed_inter_agent_message
fork_turn_positions_use_inter_agent_delivery_metadata`
- `just test -p codex-memories-write
serializes_inter_agent_communications_for_memory`
- `just test -p codex-rollout-trace
agent_messages_preserve_routing_and_content
sub_agent_started_activity_creates_spawn_edge`
- `just test -p codex-rollout-trace
agent_result_edge_falls_back_to_child_thread_without_result_message`
- `just test -p codex-protocol -p codex-rollout -p
codex-app-server-protocol`

jif · 2026-06-12 13:50:04 -07:00

8f2d6416ce

multi-agent: add path-based v2 activity tracking (#27007 )

## Why

Multi-agent v2 identifies agents by canonical paths, but its tool
handlers still emitted the larger legacy collaboration begin/end events
built around nickname and role metadata. App-server, rollout-trace,
analytics, and TUI consumers therefore lacked one compact path-based
completion signal that behaved consistently across live events and
replay.

The TUI also needs a bounded `/agent` status surface for v2 agents. It
should use recent local activity for previews, refresh liveness without
loading full histories, and keep the legacy picker available when no
path-backed v2 agent is known.

## What changed

- Replace the v2 `spawn_agent`, `send_message`, `followup_task`, and
`interrupt_agent` legacy lifecycle emissions with a success-only
`SubAgentActivity` event. The event records the tool call ID, occurrence
time, affected thread, canonical agent path, and `started`,
`interacted`, or `interrupted` kind.
- Expose the activity as a completion-only app-server v2
`subAgentActivity` thread item in live notifications and reconstructed
history, regenerate the protocol schemas, and count it in sub-agent tool
analytics.
- Track canonical paths from live activity and loaded-thread metadata in
the TUI, and render the activity in live and replayed transcripts.
- Make `/agent` list running path-backed agents with summaries from
bounded local event buffers. Each summary is capped at 240 graphemes,
the scan is capped at six recent items, only the last three wrapped
lines are shown, and command output is omitted. Liveness falls back to
metadata-only `thread/read` when local turn state is unavailable.
- Persist the activity as a terminal rollout-trace runtime payload and
reduce it to the corresponding spawn, send, follow-up, or close
interaction edge. `interrupt_agent` is classified as a close-edge
operation.
- Preserve the legacy picker when no path-backed v2 agent is known.

## Compatibility

App-server v2 clients that consumed `collabAgentToolCall` begin/end
pairs for these tools must handle the new completion-only
`subAgentActivity` item. Legacy v1 collaboration behavior is unchanged.

## Screenshot

<img width="684" height="288" alt="Screenshot 2026-06-08 at 15 40 47"
src="https://github.com/user-attachments/assets/194b3cd0-619d-45fb-b587-cf3e2b1b8a1d"
/>

## Testing

- `just test -p codex-app-server-protocol`
- `just test -p codex-rollout-trace`
- Added focused coverage for activity analytics, terminal trace
serialization, spawn-edge reduction, `interrupt_agent` classification,
TUI status rendering without aggregated command output, and clearing
stale running state after a completed turn.

jif · 2026-06-09 12:14:48 +02:00

fae2709320

[codex] Forward turn moderation metadata through app-server (#25710 )

## Why
First-party backends can supply turn-scoped moderation metadata that
app-server clients need for client-side presentation. Exposing this as
an experimental typed notification lets opted-in clients consume it
without interpreting raw Responses API events.

## What changed
- forward `response.metadata.openai_chatgpt_moderation_metadata` from
Responses API SSE and WebSocket streams as turn-scoped moderation
metadata
- emit the experimental app-server v2 `turn/moderationMetadata`
notification with `{ threadId, turnId, metadata }`
- add app-server integration coverage for the typed moderation metadata
notification

## Testing
- `just test -p codex-core
build_ws_client_metadata_includes_window_lineage_and_turn_metadata`
- `just test -p codex-core` (fails locally: 46 failures and 1 timeout,
primarily missing `test_stdio_server` and shell snapshot timeouts)
- `just test -p codex-app-server-protocol`
- `just test -p codex-app-server
turn_moderation_metadata_emits_typed_notification_v2`
- `just test -p codex-app-server` (fails locally: 792 passed, 10 failed,
and 5 timed out; failures are in existing environment-sensitive tests,
primarily because nested macOS `sandbox-exec` is not permitted)
- `just write-app-server-schema --experimental --schema-root
/tmp/codex-app-server-schema-experimental`

carlc-oai · 2026-06-05 02:41:06 -07:00

55aa071b17

Encrypt multi-agent v2 message payloads (#26210 )

## Why

Multi-agent v2 currently routes agent instructions through normal tool
arguments and inter-agent context. That means the parent model can emit
plaintext task text, Codex can persist it in history/rollouts, and the
recipient can receive it as ordinary assistant-message JSON.

This changes the v2 path so agent instructions stay encrypted between
model calls: Responses encrypts the `message` argument returned by the
model, Codex forwards only that ciphertext, and Responses decrypts it
internally for the recipient model.

## What changed

- Mark the v2 `message` parameter as encrypted for `spawn_agent`,
`send_message`, and `followup_task`.
- Treat multi-agent v2 tool `message` values as ciphertext
unconditionally.
- Store v2 inter-agent task text in
`InterAgentCommunication.encrypted_content` with empty plaintext
`content`.
- Convert encrypted inter-agent communications into the Responses
`agent_message` input item before sending the child request.
- Preserve `agent_message` items across history, rollout, compaction,
telemetry, and app-server schema paths.
- Leave multi-agent v1 unchanged.

## Message shape

The model still calls the v2 tools with a `message` argument, but that
value is now ciphertext:

```json
{
  "name": "spawn_agent",
  "arguments": {
    "task_name": "worker",
    "message": "<ciphertext>"
  }
}
```

Codex stores the task as encrypted inter-agent communication:

```json
{
  "author": "/root",
  "recipient": "/root/worker",
  "content": "",
  "encrypted_content": "<ciphertext>",
  "trigger_turn": true
}
```

When Codex builds the recipient request, it forwards the ciphertext
using the new Responses input item:

```json
{
  "type": "agent_message",
  "author": "/root",
  "recipient": "/root/worker",
  "content": [
    {
      "type": "encrypted_content",
      "encrypted_content": "<ciphertext>"
    }
  ]
}
```

Responses decrypts that item internally for the recipient model.

## Context impact

- Parent context no longer carries plaintext v2 agent task instructions
from these tool arguments.
- Codex rollout/history stores ciphertext for v2 agent instructions.
- Recipient requests receive an `agent_message` item instead of
assistant commentary JSON for encrypted task delivery.
- Plaintext completion/status notifications are still plaintext because
they are Codex-generated status messages, not encrypted model tool
arguments.

## Validation

- `just test -p codex-tools`
- `just test -p codex-protocol`
- `just test -p codex-rollout`
- `just test -p codex-rollout-trace`
- `just test -p codex-otel`
- `just write-app-server-schema`

jif · 2026-06-05 10:25:57 +02:00

5f4d06ef18

[codex] Rename multi-agent v2 assign_task to followup_task (#25636 )

## Summary

Renames the MultiAgentV2 turn-triggering tool from `assign_task` to
`followup_task` so the exposed tool name better describes sending an
additional task to an existing agent.

This updates the tool spec, handler/module names, registry wiring,
default multi-agent v2 usage hints, and tests. Rollout trace
classification keeps accepting legacy `assign_task` events so older
traces still reduce correctly, while docs show the new tool name.

## Test plan

- `just test -p codex-core followup_task`
- `just test -p codex-core -E
'test(multi_agent_feature_selects_one_agent_tool_family) |
test(multi_agent_v2_can_use_configured_tool_namespace) |
test(code_mode_only_can_expose_namespaced_multi_agent_v2_as_normal_tools)'`
- `just test -p codex-rollout-trace`
- `just fix -p codex-core`
- `just fix -p codex-rollout-trace`

Notes: `just fmt` ran `cargo fmt` but failed in the Python ruff phase
because the local environment could not resolve `hatchling>=1.27.0` from
the configured internal registry. A full `just test -p codex-core` also
hit unrelated environment-sensitive integration failures involving
missing spawned test binaries/sandbox behavior; the changed multi-agent
spec/handler tests passed in the filtered runs above.

jif-oai · 2026-06-01 19:57:11 +02:00

6ddb747e76

Rename multi-agent v2 assignment tool (#25267 )

## Summary
- rename the multi-agent v2 follow-up task tool surface to assign_task
- update core tests and spec-plan expectations
- keep rollout-trace classification backward-compatible with legacy
followup_task

## Tests
- just fmt
- just test -p codex-core
multi_agents_spec::tests::assign_task_tool_requires_message_and_has_no_output_schema
- just test -p codex-rollout-trace
- just fix -p codex-core
- just fix -p codex-rollout-trace

Note: a broad just test -p codex-core run was attempted locally, but
this sandbox produced unrelated environment failures around
sandbox-exec, missing test_stdio_server, and realtime timeouts.

jif-oai · 2026-05-30 14:13:05 +02:00

8acaec73b6

Trace logical websocket request after untraced warmup (#23581 )

## Why

`prewarm_websocket` intentionally stays out of rollout inference
tracing, but the next traced websocket request can still reuse the
warmup `response_id` and send an empty `input` delta. If tracing records
that wire payload verbatim, replay sees an incremental request whose
parent was never traced and cannot reconstruct the conversation.

This fixes that at the producer boundary instead of relaxing
`rollout-trace` replay semantics around unresolved
`previous_response_id` values.

## What

- track whether the last websocket response came from an untraced warmup
and clear that state when the websocket session is reset or reconnected
- when a traced websocket request reuses that warmup parent, keep
sending the compressed websocket request on the wire but record the
logical `ResponsesApiRequest` in the rollout trace
- add a regression test that proves replay reconstructs the logical user
message even though the websocket follow-up carries
`previous_response_id = warm-1` with empty `input`
- update `InferenceTraceAttempt::record_started` docs to reflect that
callers may record a logical request rather than the exact transport
payload

## Testing

- `cargo test -p codex-core --test all
responses_websocket_request_prewarm_traces_logical_request`

jif-oai · 2026-05-21 11:13:23 +02:00

20fedafff8

[5 of 7] Replace OverrideTurnContext with ThreadSettings (#22508 )

**Stack position:** [5 of 7]

## Summary

This PR adds `Op::ThreadSettings`, a queued settings-only update
mechanism for changing stored thread settings without starting a new
turn. It also removes the legacy `Op::OverrideTurnContext` in the same
layer, so reviewers can see the replacement and deletion together.

## Changes

- Add `Op::ThreadSettings` for settings-only queued updates.
- Emit `ThreadSettingsApplied` with the effective thread settings
snapshot after core applies an update.
- Route settings-only updates through the same submission queue as user
input.
- Migrate remaining `OverrideTurnContext` tests and callers to the
queued `Op::ThreadSettings` path.
- Delete `Op::OverrideTurnContext` from the core protocol and submission
loop.

This stack addresses #20656 and #22090.

## Stack

1. [1 of 7] [Add thread settings to
UserInput](https://github.com/openai/codex/pull/23080)
2. [2 of 7] [Remove
UserInputWithTurnContext](https://github.com/openai/codex/pull/23081)
3. [3 of 7] [Remove
UserTurn](https://github.com/openai/codex/pull/23075)
4. [4 of 7] [Placeholder for OverrideTurnContext
cleanup](https://github.com/openai/codex/pull/23087)
5. [5 of 7] [Replace OverrideTurnContext with
ThreadSettings](https://github.com/openai/codex/pull/22508) (this PR)
6. [6 of 7] [Add app-server thread settings
API](https://github.com/openai/codex/pull/22509)
7. [7 of 7] [Sync TUI thread
settings](https://github.com/openai/codex/pull/22510)

Eric Traut · 2026-05-18 21:03:51 -07:00

a668379abf

[rollout-trace] Add a trace ID to MCP calls. (#22326 )

This allows us to connect individual tool calls to the logs of the
invocations.

cassirer-openai · 2026-05-13 16:03:33 +00:00

702e6a3c64

[rollout-trace] Add x-codex-inference-call-id header to inference calls. (#22311 )

This allows us to attach call logs to inference requests in traces.

cassirer-openai · 2026-05-12 05:55:11 -07:00

cb55b769d1

Simplify MCP tool handler plumbing (#21595 )

## Why
The MCP tool path had accumulated a few core-owned special cases: a
dedicated payload variant, resolver plumbing, a legacy `AfterToolUse`
translation path, and a side channel for parallel-call metadata. That
made `ToolRegistry` and the spec builder know more about MCP than they
needed to.

This change moves MCP-specific execution details back onto `ToolInfo`
and `McpHandler` so `codex-core` can treat MCP calls like normal
function calls while still preserving MCP-specific dispatch and
telemetry behavior where it belongs.

## What changed
- removed `resolve_mcp_tool_info`, `ToolPayload::Mcp`, `ToolKind`, and
the remaining registry-side MCP resolver path
- stored MCP routing metadata directly on `McpHandler` and `ToolInfo`,
including `supports_parallel_tool_calls`
- deleted the legacy `AfterToolUse` consumer in `core`, which removes
the need for handler-specific `after_tool_use_payload` implementations
- switched tool-result telemetry to handler-provided tags and kept
MCP-specific dispatch payload construction inside the handler
- simplified tool spec planning/building by passing `ToolInfo` directly
and dropping the direct/deferred MCP wrapper structs and the
parallel-server side table

## Testing
- `cargo check -p codex-core -p codex-mcp -p codex-otel`
- `cargo test -p codex-core
mcp_parallel_support_uses_exact_payload_server`
- `cargo test -p codex-core
direct_mcp_tools_register_namespaced_handlers`
- `cargo test -p codex-core
search_tool_description_lists_each_mcp_source_once`
- `cargo test -p codex-mcp
list_all_tools_uses_startup_snapshot_while_client_is_pending`
- `just fix -p codex-core -p codex-mcp -p codex-otel`

pakrym-oai · 2026-05-12 00:11:31 +00:00

ed5944ba1d

Reapply "Move skills watcher to app-server" (#21652 )

## Why

PR #21460 reverted the earlier move of skills change watching from
`codex-core` into app-server. This reapplies that boundary change so
app-server owns client-facing `skills/changed` notifications and core no
longer carries the watcher.

## What

- Restore the app-server `SkillsWatcher` and register it from thread
listener setup.
- Remove the core-owned skills watcher and its core live-reload
integration surface.
- Restore app-server coverage for `skills/changed` notifications after a
watched skill file changes.

## Validation

- `cargo test -p codex-app-server --test all
suite::v2::skills_list::skills_changed_notification_is_emitted_after_skill_change
-- --exact --nocapture`
- `cargo test -p codex-core --lib --no-run`

pakrym-oai · 2026-05-08 17:41:15 -07:00

408e6218ab

Revert "Move skills watcher to app-server" (#21460 )

Reverts openai/codex#21287

pakrym-oai · 2026-05-07 02:24:20 +00:00

103dc2b6ae

Move skills watcher to app-server (#21287 )

## Why

Skills update notifications are app-server API behavior, but the watcher
lived in `codex-core` and surfaced through
`EventMsg::SkillsUpdateAvailable`. Moving the watcher out keeps core
focused on thread execution and lets app-server own both cache
invalidation and the `skills/changed` notification.

## What changed

- Added an app-server-owned skills watcher that watches local skill
roots, clears the shared skills cache, and emits `skills/changed`
directly.
- Registers skill watches from the common app-server thread listener
attach path, including direct starts, resumes, and app-server-observed
child or forked threads.
- Stores the `WatchRegistration` on `ThreadState`, so listener
replacement, thread teardown, idle unload, and app-server shutdown
deregister by dropping the RAII guard.
- Removed `EventMsg::SkillsUpdateAvailable`, the core watcher, and the
old core live-reload test.
- Extended the app-server skills change test to verify a cached skills
list is refreshed after a filesystem change without forcing reload.

## Validation

- `cargo check -p codex-core -p codex-app-server -p codex-mcp-server -p
codex-rollout -p codex-rollout-trace`
- `cargo test -p codex-app-server
skills_changed_notification_is_emitted_after_skill_change`

pakrym-oai · 2026-05-06 15:38:11 -07:00

d5eea229cc

Remove core MCP list tools op (#21281 )

## Why

The core `Op::ListMcpTools` request path is no longer needed. Keeping it
around left a dead request/response surface alongside the app-server MCP
inventory APIs that own current server status listing.

## What Changed

- Removed `Op::ListMcpTools`, `EventMsg::McpListToolsResponse`, and the
core handler that built the MCP snapshot response.
- Removed the now-unused `codex-mcp` snapshot wrapper/export and passive
event handling arms in rollout and MCP-server consumers.
- Updated tests that used the old op as a synchronization hook to wait
on existing startup/skills events, and deleted the plugin test that only
exercised the removed listing op.

## Validation

- `cargo test -p codex-protocol`
- `cargo test -p codex-mcp`
- `cargo test -p codex-rollout -p codex-rollout-trace -p
codex-mcp-server`
- `cargo test -p codex-core --test all
pending_input::queued_inter_agent_mail`
- `cargo test -p codex-core --test all
rmcp_client::stdio_mcp_tool_call_includes_sandbox_state_meta`
- `cargo test -p codex-core --test all
rmcp_client::stdio_image_responses`
- `just fix -p codex-core -p codex-protocol -p codex-mcp -p
codex-rollout -p codex-rollout-trace -p codex-mcp-server`

pakrym-oai · 2026-05-06 11:20:34 -07:00

712305be47

Move message history out of core (#21278 )

## Why

Message history was implemented inside `codex-core` and surfaced through
core protocol ops and `SessionConfiguredEvent` fields even though the
current consumer is TUI-local prompt recall. That made core own UI
history persistence and exposed `history_log_id` / `history_entry_count`
through surfaces that app-server and other clients do not need.

This change moves message history persistence out of core and keeps the
recall plumbing local to the TUI.

## What changed

- Added a new `codex-message-history` crate for appending, looking up,
trimming, and reading metadata from `history.jsonl`.
- Removed core protocol history ops/events: `AddToHistory`,
`GetHistoryEntryRequest`, and `GetHistoryEntryResponse`.
- Removed `history_log_id` and `history_entry_count` from
`SessionConfiguredEvent` and updated exec/MCP/test fixtures accordingly.
- Updated the TUI to dispatch local app events for message-history
append/lookup and keep its persistent-history metadata in TUI session
state.

## Validation

- `cargo test -p codex-message-history -p codex-protocol`
- `cargo test -p codex-exec event_processor_with_json_output`
- `cargo test -p codex-mcp-server outgoing_message`
- `cargo test -p codex-tui`
- `just fix -p codex-message-history -p codex-protocol -p codex-core -p
codex-tui -p codex-exec -p codex-mcp-server`

pakrym-oai · 2026-05-06 08:35:42 -07:00

2004173cd7

[codex] Remove legacy ListSkills op (#21282 )

## Why

`skills/list` is already exposed through app-server v2 and covered by
the app-server test suite. Keeping the separate core `Op::ListSkills`
path leaves a duplicate legacy protocol surface that no longer needs to
be maintained.

## What Changed

- Removed `Op::ListSkills` and `EventMsg::ListSkillsResponse` from the
core protocol.
- Deleted the corresponding core session handler and stale core
integration tests.
- Removed rollout/MCP ignore branches and protocol v1 docs references
for the deleted event/op.
- Left app-server `skills/list` and its existing coverage intact.

## Validation

- `cargo test -p codex-protocol`
- `cargo test -p codex-core --test all suite::skills`
- `cargo check -p codex-mcp-server -p codex-rollout -p
codex-rollout-trace`
- `just fix -p codex-core`

pakrym-oai · 2026-05-05 18:58:18 -07:00

136e442e95

[codex] Move thread naming to app server (#21260 )

## Why

Thread names are app-server metadata now, backed by the thread store and
sqlite state database. Keeping a core `SetThreadName` op plus a rollout
`thread_name_updated` event made rename persistence live in the wrong
layer and required historical replay support for an event that new
app-server flows should not write.

## What changed

- Removed `Op::SetThreadName` and `EventMsg::ThreadNameUpdated` from the
core protocol and deleted the core handler path that appended rename
events to rollouts.
- Updated app-server `thread/name/set` so both loaded and unloaded
threads write through thread-store metadata and app-server emits
`thread/name/updated` notifications.
- Updated local thread-store name metadata updates to write sqlite title
metadata and the legacy thread-name index without appending rollout
events.
- Removed state extraction and rollout handling for the deleted
thread-name event.

## Validation

- `cargo test -p codex-app-server thread_name_updated_broadcasts`
- `cargo test -p codex-app-server
thread_name_set_is_reflected_in_read_list_and_resume`
- `cargo test -p codex-thread-store
update_thread_metadata_sets_name_on_active_rollout_and_indexes_name`
- `cargo test -p codex-state`
- `cargo check -p codex-mcp-server -p codex-rollout-trace`
- `just fix -p codex-app-server -p codex-thread-store -p codex-state -p
codex-mcp-server -p codex-rollout-trace`

## Docs

No external documentation update is expected for this internal ownership
change.

pakrym-oai · 2026-05-05 17:16:06 -07:00

2c1a361a2e

feat: add remote compaction v2 Responses client path (#20773 )

## Why

This adds the `remote_compaction_v2` client path so remote compaction
can run through the normal Responses stream and install a
`context_compaction` item that trigger a compaction.

The goal is to migrate some of the compaction logic on the client side

We keeps the v2 transport behind a feature flag while letting follow-up
requests reuse the compacted context instead of falling back to the
legacy compaction item shape.

## What changed

- add `ResponseItem::ContextCompaction` and refresh the generated
app-server / schema / TypeScript fixtures that expose response items on
the wire
- add `core/src/compact_remote_v2.rs` to send compaction through the
standard streamed Responses client, require exactly one
`context_compaction` output item, and install that item into compacted
history
- route manual compact and auto-compaction through the v2 path when
`remote_compaction_v2` is enabled, while keeping the existing remote
compaction path as the fallback
- preserve the new item type across history retention, follow-up request
construction, telemetry, rollout persistence, and rollout-trace
normalization
- add targeted coverage for the feature flag, `context_compaction`
serialization, rollout-trace normalization, and remote-compaction
follow-up behavior

## Verification

- added protocol tests for `context_compaction`
serialization/deserialization in `protocol/src/models.rs`
- added rollout-trace coverage for `context_compaction` normalization in
`rollout-trace/src/reducer/conversation_tests.rs`
- added remote compaction integration coverage for v2 follow-up reuse
and mixed compaction output streams in
`core/tests/suite/compact_remote.rs`

---------

Co-authored-by: Codex <noreply@openai.com>

jif-oai · 2026-05-04 14:15:01 +02:00

d927f61208

[codex] Emit image view as core item (#20512 )

## Why

Image-view results should be represented as a core-produced turn item
instead of being reconstructed by app-server. At the same time, existing
rollout/history paths still understand the legacy `ViewImageToolCall`
event, so this keeps that event as compatibility output generated from
the new item lifecycle.

## What changed

- Added `TurnItem::ImageView` to `codex-protocol`.
- Emitted image-view item start/completion directly from the core
`view_image` handler.
- Kept `ViewImageToolCall` as a legacy event and generate it from
completed `TurnItem::ImageView` items.
- Kept `thread_history.rs` on the legacy `ViewImageToolCall` replay
path, with `ImageView` item lifecycle events ignored there.
- Updated app-server protocol conversion, rollout persistence, and
affected exhaustive event matches for the new item plus legacy fan-out
shape.

## Verification

- `cargo test -p codex-protocol -p codex-app-server-protocol -p
codex-rollout -p codex-rollout-trace -p codex-mcp-server -p
codex-app-server --lib`
- `cargo test -p codex-core --test all
view_image_tool_attaches_local_image`
- `just fix -p codex-protocol -p codex-core -p codex-app-server-protocol
-p codex-app-server -p codex-rollout -p codex-rollout-trace -p
codex-mcp-server`
- `git diff --check`

pakrym-oai · 2026-05-01 11:28:30 -07:00

aed74e5ee4

[codex] Remove unused event messages (#20511 )

## Why

Several legacy `EventMsg` variants were still emitted or mapped even
though clients either ignored them or had moved to item/lifecycle
events. `Op::Undo` had also degraded to an unavailable shim, so this
removes that dead task path instead of preserving a command that cannot
do useful work.

`McpStartupComplete`, `WebSearchBegin`, and `ImageGenerationBegin` are
intentionally kept because useful consumers still depend on them: MCP
startup completion drives readiness behavior, and the begin events let
app-server/core consumers surface in-progress web-search and
image-generation items before the final payload arrives.

## What Changed

- Removed weak legacy event variants and payloads from `codex-protocol`,
including legacy agent deltas, background events, and undo lifecycle
events.
- Kept/restored `EventMsg::McpStartupComplete`,
`EventMsg::WebSearchBegin`, and `EventMsg::ImageGenerationBegin` with
serializer and emission coverage.
- Updated core, rollout, MCP server, app-server thread history,
review/delegate filtering, and tests to rely on the useful replacement
events that remain.
- Removed `Op::Undo`, `UndoTask`, the undo test module, and stale TUI
slash-command comments.
- Stopped agent job/background progress and compaction retry notices
from emitting `BackgroundEvent` payloads.

## Verification

- `cargo check -p codex-protocol -p codex-app-server-protocol -p
codex-core -p codex-rollout -p codex-rollout-trace -p codex-mcp-server`
- `cargo test -p codex-protocol -p codex-app-server-protocol -p
codex-rollout -p codex-rollout-trace -p codex-mcp-server`
- `cargo test -p codex-core --test all suite::items`
- `just fix -p codex-protocol -p codex-app-server-protocol -p codex-core
-p codex-rollout -p codex-rollout-trace -p codex-mcp-server`
- Earlier coverage on this PR also included `codex-mcp`, `codex-tui`,
core library tests, MCP/plugin/delegate/review/agent job tests, and MCP
startup TUI tests.

pakrym-oai · 2026-04-30 20:03:26 -07:00

f50c02d7bc

[rollout-tracer] Match analysis messages on encrypted id. (#20123 )

In some setups the summary or raw content can be dropped between
requests. This triggers a check in the reducer which expects that the
messages should remain identical between requests.

This PR relaxes the checks to only focus on the encrypted ID instead. It
also changes the reducer to keep the most rich version of the message
observed during the rollout (this ensures that we don't accidentally
lose the CoT nor summary when available).

cassirer-openai · 2026-04-29 17:22:24 +00:00

df966996a7

[rollout-trace] Include x-request-id in rollout trace. (#20066 )

## Why

Rollout traces need an identifier that can be used to correlate a Codex
inference with upstream Responses API, proxy, and engine logs. The
reduced trace model already exposed `upstream_request_id`, but it was
being populated from the Responses API `response.id`. That value is
useful for `previous_response_id` chaining, but it is not the transport
request id that upstream systems key on.

This PR separates those concepts so trace consumers can reliably answer
both questions:

- which Responses API response did this inference produce?
- which upstream request handled it?

## Structure

The change keeps the upstream request id at the same lifecycle level as
the provider stream:

- `codex-api` captures the `x-request-id` HTTP response header when the
SSE stream is created and exposes it on `ResponseStream`. Fixture and
websocket streams set the field to `None` because they do not have that
HTTP response header.
- `codex-core` carries that stream-level id into `InferenceTraceAttempt`
when recording terminal stream outcomes. Completed, failed, cancelled,
dropped-stream, and pre-response error paths all record the id when it
is available.
- `rollout-trace` now records both identifiers in raw terminal inference
events and response payloads: `response_id` for the Responses API
`response.id`, and `upstream_request_id` for `x-request-id`.
- The reducer stores both fields on `InferenceCall`. It also uses
`response_id` for `previous_response_id` conversation linking, which
removes the old accidental dependency on the misnamed
`upstream_request_id` field.
- Terminal inference reduction now consumes the full terminal payload
(`InferenceCompleted`, `InferenceFailed`, or `InferenceCancelled`) in
one place. That keeps status, partial payloads, response ids, and
upstream request ids consistent across success, failure, cancellation,
and late stream-mapper events.

## Why This Shape

`x-request-id` is a property of the HTTP/provider response envelope, not
an SSE event. Capturing it once in `codex-api` and plumbing it through
terminal trace recording avoids trying to infer the value from stream
contents, and it preserves the id even when the stream fails or is
cancelled after only partial output.

Keeping `response_id` separate from `upstream_request_id` also makes the
reduced trace model less surprising: `response_id` remains the
conversation-continuation id, while `upstream_request_id` is the
operational correlation id for upstream debugging.

## Validation

The PR updates trace and reducer coverage for:

- reading `x-request-id` from SSE response headers;
- storing the true upstream request id on completed inference calls;
- preserving upstream request ids for cancelled and late-cancelled
inference streams;
- keeping `previous_response_id` reconstruction tied to `response_id`
rather than transport request ids.

cassirer-openai · 2026-04-28 21:11:17 +00:00

89698ad1c3

[codex] Trace cancelled inference streams (#19839 )

Records cancelled inference streams when Codex stops consuming a
provider response before `response.completed`, preserving complete
output items observed before cancellation.

Also closes still-running inference calls when the owning turn ends, so
reduced rollout traces do not leave stale `Running` inference nodes.

Covered by focused reducer coverage and a core stream-drop test for
partial output preservation.

cassirer-openai · 2026-04-27 21:58:29 +00:00

30c5c768de

Add goal app-server API (2 / 5) (#18074 )

Adds the app-server v2 goal API on top of the persisted goal state from
PR 1.

## Why

Clients need a stable app-server surface for reading and controlling
materialized thread goals before the model tools and TUI can use them.
Goal changes also need to be observable by app-server clients, including
clients that resume an existing thread.

## What changed

- Added v2 `thread/goal/get`, `thread/goal/set`, and `thread/goal/clear`
RPCs for materialized threads.
- Added `thread/goal/updated` and `thread/goal/cleared` notifications so
clients can keep local goal state in sync.
- Added resume/snapshot wiring so reconnecting clients see the current
goal state for a thread.
- Added app-server handlers that reconcile persisted rollout state
before direct goal mutations.
- Updated the app-server README plus generated JSON and TypeScript
schema fixtures for the new API surface.

## Verification

- Added app-server v2 coverage for goal get/set/clear behavior,
notification emission, resume snapshots, and non-local thread-store
interactions.

Eric Traut · 2026-04-24 20:53:41 -07:00

6c874f9b34

permissions: make profiles represent enforcement (#19231 )

## Why

`PermissionProfile` is becoming the canonical permissions abstraction,
but the old shape only carried optional filesystem and network fields.
It could describe allowed access, but not who is responsible for
enforcing it. That made `DangerFullAccess` and `ExternalSandbox` lossy
when profiles were exported, cached, or round-tripped through app-server
APIs.

The important model change is that active permissions are now a disjoint
union over the enforcement mode. Conceptually:

```rust
pub enum PermissionProfile {
    Managed {
        file_system: FileSystemSandboxPolicy,
        network: NetworkSandboxPolicy,
    },
    Disabled,
    External {
        network: NetworkSandboxPolicy,
    },
}
```

This distinction matters because `Disabled` means Codex should apply no
outer sandbox at all, while `External` means filesystem isolation is
owned by an outside caller. Those are not equivalent to a broad managed
sandbox. For example, macOS cannot nest Seatbelt inside Seatbelt, so an
inner sandbox may require the outer Codex layer to use no sandbox rather
than a permissive one.

## How Existing Modeling Maps

Legacy `SandboxPolicy` remains a boundary projection, but it now maps
into the higher-fidelity profile model:

- `ReadOnly` and `WorkspaceWrite` map to `PermissionProfile::Managed`
with restricted filesystem entries plus the corresponding network
policy.
- `DangerFullAccess` maps to `PermissionProfile::Disabled`, preserving
the “no outer sandbox” intent instead of treating it as a lax managed
sandbox.
- `ExternalSandbox { network_access }` maps to
`PermissionProfile::External { network }`, preserving external
filesystem enforcement while still carrying the active network policy.
- Split runtime policies that legacy `SandboxPolicy` cannot faithfully
express, such as managed unrestricted filesystem plus restricted
network, stay `Managed` instead of being collapsed into
`ExternalSandbox`.
- Per-command/session/turn grants remain partial overlays via
`AdditionalPermissionProfile`; full `PermissionProfile` is reserved for
complete active runtime permissions.

## What Changed

- Change active `PermissionProfile` into a tagged union: `managed`,
`disabled`, and `external`.
- Keep partial permission grants separate with
`AdditionalPermissionProfile` for command/session/turn overlays.
- Represent managed filesystem permissions as either `restricted`
entries or `unrestricted`; `glob_scan_max_depth` is non-zero when
present.
- Preserve old rollout compatibility by accepting the pre-tagged `{
network, file_system }` profile shape during deserialization.
- Preserve fidelity for important edge cases: `DangerFullAccess`
round-trips as `disabled`, `ExternalSandbox` round-trips as `external`,
and managed unrestricted filesystem + restricted network stays managed
instead of being mistaken for external enforcement.
- Preserve configured deny-read entries and bounded glob scan depth when
full profiles are projected back into runtime policies, including
unrestricted replacements that now become `:root = write` plus deny
entries.
- Regenerate the experimental app-server v2 JSON/TypeScript schema and
update the `command/exec` README example for the tagged
`permissionProfile` shape.

## Compatibility

Legacy `SandboxPolicy` remains available at config/API boundaries as the
compatibility projection. Existing rollout lines with the old
`PermissionProfile` shape continue to load. The app-server
`permissionProfile` field is experimental, so its v2 wire shape is
intentionally updated to match the higher-fidelity model.

## Verification

- `just write-app-server-schema`
- `cargo check --tests`
- `cargo test -p codex-protocol permission_profile`
- `cargo test -p codex-protocol
preserving_deny_entries_keeps_unrestricted_policy_enforceable`
- `cargo test -p codex-app-server-protocol
permission_profile_file_system_permissions`
- `cargo test -p codex-app-server-protocol serialize_client_response`
- `cargo test -p codex-core
session_configured_reports_permission_profile_for_external_sandbox`
- `just fix`
- `just fix -p codex-protocol`
- `just fix -p codex-app-server-protocol`
- `just fix -p codex-core`
- `just fix -p codex-app-server`

Michael Bolin · 2026-04-23 23:02:18 -07:00

4816b89204

[rollout_trace] Add debug trace reduction command (#18880 )

## Summary

Adds the debug CLI entry point for reducing recorded rollout traces.
This gives developers a direct way to inspect whether the emitted trace
stream reduces into the expected conversation/runtime model.

## Stack

This is PR 5/5 in the rollout trace stack.

- [#18876](https://github.com/openai/codex/pull/18876): Add rollout
trace crate
- [#18877](https://github.com/openai/codex/pull/18877): Record core
session rollout traces
- [#18878](https://github.com/openai/codex/pull/18878): Trace tool and
code-mode boundaries
- [#18879](https://github.com/openai/codex/pull/18879): Trace sessions
and multi-agent edges
- [#18880](https://github.com/openai/codex/pull/18880): Add debug trace
reduction command

## Review Notes

This PR is intentionally last: it depends on the trace crate, core
recorder, runtime/tool events, and session/agent edge data all existing.
The command should remain a debug/developer tool and avoid adding new
runtime behavior.

The useful review question is whether the CLI exposes the reducer in the
smallest practical way for local inspection without turning the debug
command into a supported user-facing workflow.

cassirer-openai · 2026-04-24 01:56:48 +00:00

e3c8720a99

[rollout_trace] Trace tool and code-mode boundaries (#18878 )

## Summary

Extends rollout tracing across tool dispatch and code-mode runtime
boundaries. This records canonical tool-call lifecycle events and links
code-mode execution/wait operations back to the model-visible calls that
caused them.

## Stack

This is PR 3/5 in the rollout trace stack.

- [#18876](https://github.com/openai/codex/pull/18876): Add rollout
trace crate
- [#18877](https://github.com/openai/codex/pull/18877): Record core
session rollout traces
- [#18878](https://github.com/openai/codex/pull/18878): Trace tool and
code-mode boundaries
- [#18879](https://github.com/openai/codex/pull/18879): Trace sessions
and multi-agent edges
- [#18880](https://github.com/openai/codex/pull/18880): Add debug trace
reduction command

## Review Notes

This PR is about attribution. Reviewers should focus on whether direct
tool calls, code-mode-originated tool calls, waits, outputs, and
cancellation boundaries are recorded with enough source information for
deterministic reduction without coupling the reducer to live runtime
internals.

The stack remains valid after this layer: tool and code-mode traces
reduce through the existing crate model, while the broader session and
multi-agent relationships are added in the next PR.

cassirer-openai · 2026-04-23 12:22:11 -07:00

6d09b6752d

[rollout_trace] Record core session rollout traces (#18877 )

## Summary

Wires rollout trace recording into `codex-core` session and turn
execution. This records the core model request/response, compaction, and
session lifecycle boundaries needed for replay without yet tracing every
nested runtime/tool boundary.

## Stack

This is PR 2/5 in the rollout trace stack.

- [#18876](https://github.com/openai/codex/pull/18876): Add rollout
trace crate
- [#18877](https://github.com/openai/codex/pull/18877): Record core
session rollout traces
- [#18878](https://github.com/openai/codex/pull/18878): Trace tool and
code-mode boundaries
- [#18879](https://github.com/openai/codex/pull/18879): Trace sessions
and multi-agent edges
- [#18880](https://github.com/openai/codex/pull/18880): Add debug trace
reduction command

## Review Notes

This layer is the first live integration point. The important review
question is whether trace recording is isolated from normal session
behavior: trace failures should not become user-visible execution
failures, and recording should preserve the existing turn/session
lifecycle semantics.

The PR depends on the reducer/data model from the first stack entry and
only introduces the core recorder surface that later PRs use for richer
runtime and relationship events.

cassirer-openai · 2026-04-22 17:00:48 +00:00

f67383bcba

[rollout_trace] Add rollout trace crate (#18876 )

## Summary

Adds the standalone `codex-rollout-trace` crate, which defines the raw
trace event format, replay/reduction model, writer, and reducer logic
for reconstructing model-visible conversation/runtime state from
recorded rollout data.

The crate-level design is documented in
[`codex-rs/rollout-trace/README.md`](https://github.com/openai/codex/blob/codex/rollout-trace-crate/codex-rs/rollout-trace/README.md).

## Stack

This is PR 1/5 in the rollout trace stack.

- [#18876](https://github.com/openai/codex/pull/18876): Add rollout
trace crate
- [#18877](https://github.com/openai/codex/pull/18877): Record core
session rollout traces
- [#18878](https://github.com/openai/codex/pull/18878): Trace tool and
code-mode boundaries
- [#18879](https://github.com/openai/codex/pull/18879): Trace sessions
and multi-agent edges
- [#18880](https://github.com/openai/codex/pull/18880): Add debug trace
reduction command

## Review Notes

This PR intentionally does not wire tracing into live Codex execution.
It establishes the data model and reducer contract first, with
crate-local tests covering conversation reconstruction, compaction
boundaries, tool/session edges, and code-cell lifecycle reduction. Later
PRs emit into this model.

The README is the best entry point for reviewing the intended trace
format and reduction semantics before diving into the reducer modules.

cassirer-openai · 2026-04-21 21:54:05 +00:00

27d9673273

38 Commits