38 Commits

  • [codex] Use input items for Responses Lite tools (#27946)
    When using Responses Lite, we should all use `additional_tools` and a
    developer item instead of the top level tools array & instructions
    field. This keeps things 1-to-1.
    
    Forced namespacing for _all_ tools will land in a following PR after
    some coordination & fixes in Responses API (around collisions & return
    items).
    
    The goal is to eventually expand the scope of this to _all_ requests
    from codex, but that will require larger coordination across providers &
    slower rollout.
  • core: rename metadata -> internal_chat_message_metadata_passthrough (#28968)
    ## Description
    This PR cuts Codex over from generic `ResponseItem.metadata` (introduced
    here: https://github.com/openai/codex/pull/28355) to
    `ResponseItem.internal_chat_message_metadata_passthrough`, which is the
    blessed path and has strongly-typed keys.
    
    For now we have to drop this MAv2 usage of `metadata`:
    https://github.com/openai/codex/pull/28561 until we figure out where
    that should live.
  • Propagate safety buffering events to app-server clients (#29371)
    Responses API safety buffering metadata currently stops at the transport
    boundary, so app-server clients cannot render the in-progress safety
    review state.
    
    This change:
    - decodes and deduplicates `safety_buffering` metadata from Responses
    API SSE and WebSocket events without suppressing the original response
    event
    - emits a typed core event containing the requested model plus backend
    use cases and reasons
    - forwards that event as `turn/safetyBuffering/updated` through
    app-server v2 and updates generated protocol schemas
    - keeps the side-channel event out of persisted rollouts and turn timing
    
    This supports the Codex Apps buffering UX and depends on the Responses
    API backend work in https://github.com/openai/openai/pull/1044569 and
    https://github.com/openai/openai/pull/1044571.
    
    Validation:
    - focused `codex-core` safety-buffering integration test passes
    - `cargo check -p codex-core -p codex-app-server -p
    codex-app-server-protocol`
    - `just fix -p codex-api -p codex-protocol -p codex-core -p
    codex-app-server-protocol -p codex-app-server -p codex-rollout -p
    codex-rollout-trace -p codex-otel`
    - `just fmt`
    - broad package test run: 4,430/4,492 passed; 62 unrelated
    local-environment/concurrency failures involved unavailable test
    binaries, MCP subprocess setup, and app-server timeouts
  • [codex] Assign response item IDs when recording history (#28814)
    ## Why
    
    Client-created response items enter history without IDs, so their
    identity is lost across rollout persistence and resume. IDs should be
    assigned once at the history-recording boundary, while IDs returned by
    the server must remain unchanged.
    
    The Responses API validates item IDs using type-specific prefixes.
    Locally generated IDs therefore use the matching prefix plus a
    hyphenated UUIDv7, keeping them valid while distinguishable from
    server-generated IDs. Because this changes persisted history and
    provider request shapes, the behavior is opt-in behind the
    under-development `item_ids` feature. Compaction triggers remain request
    controls whose API shape does not accept an ID.
    
    ## What changed
    
    - Register the disabled-by-default `item_ids` feature and expose it in
    `config.schema.json`.
    - Make supported optional `ResponseItem` IDs serializable and expose
    them in the generated app-server schemas.
    - When `item_ids` is enabled, assign an ID during conversation-history
    preparation if an item has no ID.
    - Generate type-prefixed, hyphenated UUIDv7 IDs using the Responses API
    item conventions.
    - Preserve existing server IDs without rewriting them.
    - Persist assigned IDs in rollouts and include them in subsequent
    Responses requests.
    - Remove the unsupported ID field from `CompactionTrigger` and document
    why it has no ID.
    - Add integration coverage for enabled ID persistence, preservation of
    server IDs, and omission of generated IDs while the feature is disabled.
    
    `prepare_conversation_items_for_history` is the single response-item ID
    allocation boundary.
    
    ## Test plan
    
    - `just test -p codex-features`
    - `just test -p codex-core
    response_item_ids_persist_across_resume_and_preserve_server_ids`
    - `just test -p codex-core
    non_openai_responses_requests_omit_item_turn_metadata`
    - `just test -p codex-core
    resize_all_images_prepares_failures_before_history_insertion`
    - `just test -p codex-protocol`
    - `just test -p codex-app-server-protocol`
    - `just test -p codex-api azure_default_store_attaches_ids_and_headers`
  • unified-exec: retain PathUri in command events (#28780)
    ## Why
    
    App-server must report command events containing foreign-platform paths
    without changing existing client or rollout path-string formats.
    
    ## What changed
    
    - retain `PathUri` through exec command begin/end events
    - convert cwd values to `LegacyAppPathString` at the app-server
    compatibility boundary
    - drop command actions with foreign paths and log them
    - serialize rollout-trace cwd values using their inferred native path
    representation
    - restore Wine coverage for retained Windows cwd values and successful
    completion
  • [codex] Add optional IDs to response items (#28812)
    ## Why
    
    `ResponseItem` variants do not have a consistent internal ID shape: some
    variants carry required IDs, some carry optional IDs, and some cannot
    represent an ID at all. The existing fields also use inconsistent serde,
    TypeScript, and JSON-schema annotations. A single enum-level access path
    is needed before history recording can assign and retain IDs.
    
    This PR establishes that internal model only. It intentionally does not
    generate or serialize IDs; allocation and wire persistence are isolated
    in the stacked follow-up.
    
    ## What changed
    
    - Give every concrete `ResponseItem` variant an `Option<String>` ID
    field.
    - Apply the same internal-only annotations to every ID field:
    `#[serde(default, skip_serializing)]`, `#[ts(skip)]`, and
    `#[schemars(skip)]`.
    - Add `ResponseItem::id()` and `ResponseItem::set_id()` as the shared
    accessors.
    - Preserve IDs when history items are rewritten for truncation.
    - Adapt consumers that previously assumed reasoning and image-generation
    IDs were required.
    - Regenerate app-server schemas so the hidden fields are represented
    consistently.
    
    The serde catch-all `ResponseItem::Other` remains ID-less because it
    must remain a unit variant.
    
    ## Test plan
    
    - `cargo check --tests -p codex-core -p codex-api -p codex-rollout-trace
    -p codex-image-generation-extension`
    - `just test -p codex-protocol`
    - `just test -p codex-app-server-protocol`
    - `just test -p codex-api -p codex-rollout-trace -p
    codex-image-generation-extension`
    - `just test -p codex-core event_mapping`
  • feat: render typed envelopes for multi-agent v2 messages (#28368)
    ## Why
    
    Multi-agent v2 messages need a consistent, model-visible envelope that
    identifies what kind of interaction occurred, who sent it, and which
    agent it targets. Previously, encrypted deliveries exposed only
    `encrypted_content`, while child completion used the legacy
    `<subagent_notification>` shape. That meant the client could not
    consistently present `NEW_TASK`, `MESSAGE`, and `FINAL_ANSWER` using the
    same format.
    
    This change adds the routing envelope as plaintext while keeping task
    and message payloads encrypted. No new Responses API field is required:
    an encrypted delivery is represented as an `input_text` header
    immediately followed by its existing `encrypted_content` item.
    
    Every envelope now follows this shape:
    
    ```text
    Message Type: <NEW_TASK | MESSAGE | FINAL_ANSWER>
    Task name: <recipient agent path>
    Sender: <author agent path>
    Payload:
    <message payload>
    ```
    
    ## Message types
    
    ### `NEW_TASK`
    
    `NEW_TASK` is used when the recipient should begin a new turn, including
    an initial `spawn_agent` task and a later `followup_task`.
    
    For a root agent spawning `/root/worker`, the request contains a
    plaintext envelope followed by the encrypted task:
    
    ```json
    {
      "type": "agent_message",
      "author": "/root",
      "recipient": "/root/worker",
      "content": [
        {
          "type": "input_text",
          "text": "Message Type: NEW_TASK\nTask name: /root/worker\nSender: /root\nPayload:\n"
        },
        {
          "type": "encrypted_content",
          "encrypted_content": "<encrypted task payload>"
        }
      ]
    }
    ```
    
    Conceptually, the model receives:
    
    ```text
    Message Type: NEW_TASK
    Task name: /root/worker
    Sender: /root
    Payload:
    Review the authentication changes and report any regressions.
    ```
    
    ### `MESSAGE`
    
    `MESSAGE` is used for a queued `send_message` delivery. It communicates
    with an existing agent without starting a new turn.
    
    For `/root/worker` reporting progress to the root agent, the request
    contains:
    
    ```json
    {
      "type": "agent_message",
      "author": "/root/worker",
      "recipient": "/root",
      "content": [
        {
          "type": "input_text",
          "text": "Message Type: MESSAGE\nTask name: /root\nSender: /root/worker\nPayload:\n"
        },
        {
          "type": "encrypted_content",
          "encrypted_content": "<encrypted message payload>"
        }
      ]
    }
    ```
    
    Conceptually, the model receives:
    
    ```text
    Message Type: MESSAGE
    Task name: /root
    Sender: /root/worker
    Payload:
    The protocol tests pass; I am checking the resume path now.
    ```
    
    ### `FINAL_ANSWER`
    
    `FINAL_ANSWER` is emitted when a child agent reaches a terminal state
    and reports its result to its parent. Completion payloads are already
    available locally, so the complete envelope is represented as plaintext
    rather than as a plaintext header plus encrypted content.
    
    For `/root/worker` completing work for the root agent, the request
    contains:
    
    ```json
    {
      "type": "agent_message",
      "author": "/root/worker",
      "recipient": "/root",
      "content": [
        {
          "type": "input_text",
          "text": "Message Type: FINAL_ANSWER\nTask name: /root\nSender: /root/worker\nPayload:\nNo regressions found."
        }
      ]
    }
    ```
    
    The model-visible form is:
    
    ```text
    Message Type: FINAL_ANSWER
    Task name: /root
    Sender: /root/worker
    Payload:
    No regressions found.
    ```
    
    Errored, shut down, and missing agents also use `FINAL_ANSWER`, with a
    terminal-status description in the payload.
    
    ## What changed
    
    - Render `NEW_TASK` or `MESSAGE` in
    `InterAgentCommunication::to_model_input_item`, based on whether the
    encrypted delivery starts a turn.
    - Replace the multi-agent v2 `<subagent_notification>` completion
    payload with a model-visible `FINAL_ANSWER` envelope.
    - Document `Task name`, `Sender`, and `Payload` consistently in the
    multi-agent developer instructions.
    - Prevent local-only history projections from treating an encrypted
    message's plaintext header as the complete assistant message.
    - Preserve rollout-trace interaction edges when an agent message
    contains both plaintext and encrypted content.
    
    Legacy multi-agent behavior remains unchanged.
    
    ## Verification
    
    - `just test -p codex-protocol`
    - `just test -p codex-rollout-trace`
    - `just test -p codex-web-search-extension`
    - `just test -p codex-core
    encrypted_multi_agent_v2_spawn_sends_agent_message_to_child`
    - `just test -p codex-core
    plaintext_multi_agent_v2_completion_sends_agent_message`
    - `just test -p codex-core
    multi_agent_v2_followup_task_completion_notifies_parent_on_every_turn`
    - `just test -p codex-core
    multi_agent_v2_completion_queues_message_for_direct_parent`
  • feat(core): add metadata field to ResponseItem (#28355)
    ## Description
    
    This PR adds an optional `metadata` field to `ResponseItem` for
    Responses API calls. Only mechanical plumbing, no actual values
    populated and sent yet. Turns out just adding a new field to
    `ResponseItem` has quite a large blast radius already.
    
    This change is backwards compatible because `metadata` is optional and
    omitted when absent, so existing response items and rollout history
    without it still deserialize and requests that do not set it keep the
    same wire shape. For provider compatibility, we strip out `metadata`
    before non-OpenAI Responses requests so Azure and AWS Bedrock never see
    this field.
    
    My followup PR here will actually make use of it to start storing and
    passing along `turn_id`: https://github.com/openai/codex/pull/28360
    
    ## What changed
    
    - Added `ResponseItemMetadata` with optional `turn_id`, plus optional
    `metadata` on Responses API item variants and inter-agent communication.
    - Preserved item metadata through response-item rewrites such as
    truncation, missing tool-output synthesis, compaction history
    rebuilding, visible-history conversion, rollout/resume, and generated
    app-server schemas/types.
    - Strip item metadata from non-OpenAI Responses requests while
    preserving it for OpenAI-shaped requests.
    - Updated the mechanical fixture/test construction churn required by the
    new optional field.
  • Support plaintext agent messages (#27830)
    ## Why
    
    Multi-agent v2 `send_message` deliveries already reach the receiving
    model as typed `agent_message` items with encrypted content.
    Child-completion notifications are generated by Codex itself, so their
    content is plaintext and previously fell back to a serialized JSON
    envelope inside an assistant message.
    
    With plaintext `input_text` supported for `agent_message`, both delivery
    paths can use the same model-visible type while preserving explicit
    author and recipient metadata.
    
    ## What changed
    
    - add plaintext `input_text` support to `AgentMessageInputContent` and
    regenerate the affected app-server schemas
    - preserve `InterAgentCommunication` as structured mailbox input instead
    of converting it to assistant text
    - record delivered communications as typed `agent_message` history items
    - persist a dedicated rollout item so local delivery metadata such as
    `trigger_turn` remains available without leaking into the Responses
    request
    - reconstruct typed agent messages on resume and preserve fork-turn
    truncation behavior
    - remove request-time assistant-content parsing
    - preserve plaintext and encrypted inter-agent deliveries in stage-one
    memory inputs
    - normalize and link plaintext and encrypted agent messages in rollout
    traces without treating inbound messages as child results
    - cover the real MultiAgent V2 child-completion path end to end with
    deterministic mailbox synchronization
    
    ## Verification
    
    - `just test -p codex-core
    plaintext_multi_agent_v2_completion_sends_agent_message`
    - `just test -p codex-core input_queue_drains_mailbox_in_delivery_order
    record_initial_history_reconstructs_typed_inter_agent_message
    fork_turn_positions_use_inter_agent_delivery_metadata`
    - `just test -p codex-memories-write
    serializes_inter_agent_communications_for_memory`
    - `just test -p codex-rollout-trace
    agent_messages_preserve_routing_and_content
    sub_agent_started_activity_creates_spawn_edge`
    - `just test -p codex-rollout-trace
    agent_result_edge_falls_back_to_child_thread_without_result_message`
    - `just test -p codex-protocol -p codex-rollout -p
    codex-app-server-protocol`
  • multi-agent: add path-based v2 activity tracking (#27007)
    ## Why
    
    Multi-agent v2 identifies agents by canonical paths, but its tool
    handlers still emitted the larger legacy collaboration begin/end events
    built around nickname and role metadata. App-server, rollout-trace,
    analytics, and TUI consumers therefore lacked one compact path-based
    completion signal that behaved consistently across live events and
    replay.
    
    The TUI also needs a bounded `/agent` status surface for v2 agents. It
    should use recent local activity for previews, refresh liveness without
    loading full histories, and keep the legacy picker available when no
    path-backed v2 agent is known.
    
    ## What changed
    
    - Replace the v2 `spawn_agent`, `send_message`, `followup_task`, and
    `interrupt_agent` legacy lifecycle emissions with a success-only
    `SubAgentActivity` event. The event records the tool call ID, occurrence
    time, affected thread, canonical agent path, and `started`,
    `interacted`, or `interrupted` kind.
    - Expose the activity as a completion-only app-server v2
    `subAgentActivity` thread item in live notifications and reconstructed
    history, regenerate the protocol schemas, and count it in sub-agent tool
    analytics.
    - Track canonical paths from live activity and loaded-thread metadata in
    the TUI, and render the activity in live and replayed transcripts.
    - Make `/agent` list running path-backed agents with summaries from
    bounded local event buffers. Each summary is capped at 240 graphemes,
    the scan is capped at six recent items, only the last three wrapped
    lines are shown, and command output is omitted. Liveness falls back to
    metadata-only `thread/read` when local turn state is unavailable.
    - Persist the activity as a terminal rollout-trace runtime payload and
    reduce it to the corresponding spawn, send, follow-up, or close
    interaction edge. `interrupt_agent` is classified as a close-edge
    operation.
    - Preserve the legacy picker when no path-backed v2 agent is known.
    
    ## Compatibility
    
    App-server v2 clients that consumed `collabAgentToolCall` begin/end
    pairs for these tools must handle the new completion-only
    `subAgentActivity` item. Legacy v1 collaboration behavior is unchanged.
    
    ## Screenshot
    
    <img width="684" height="288" alt="Screenshot 2026-06-08 at 15 40 47"
    src="https://github.com/user-attachments/assets/194b3cd0-619d-45fb-b587-cf3e2b1b8a1d"
    />
    
    ## Testing
    
    - `just test -p codex-app-server-protocol`
    - `just test -p codex-rollout-trace`
    - Added focused coverage for activity analytics, terminal trace
    serialization, spawn-edge reduction, `interrupt_agent` classification,
    TUI status rendering without aggregated command output, and clearing
    stale running state after a completed turn.
  • [codex] Forward turn moderation metadata through app-server (#25710)
    ## Why
    First-party backends can supply turn-scoped moderation metadata that
    app-server clients need for client-side presentation. Exposing this as
    an experimental typed notification lets opted-in clients consume it
    without interpreting raw Responses API events.
    
    ## What changed
    - forward `response.metadata.openai_chatgpt_moderation_metadata` from
    Responses API SSE and WebSocket streams as turn-scoped moderation
    metadata
    - emit the experimental app-server v2 `turn/moderationMetadata`
    notification with `{ threadId, turnId, metadata }`
    - add app-server integration coverage for the typed moderation metadata
    notification
    
    ## Testing
    - `just test -p codex-core
    build_ws_client_metadata_includes_window_lineage_and_turn_metadata`
    - `just test -p codex-core` (fails locally: 46 failures and 1 timeout,
    primarily missing `test_stdio_server` and shell snapshot timeouts)
    - `just test -p codex-app-server-protocol`
    - `just test -p codex-app-server
    turn_moderation_metadata_emits_typed_notification_v2`
    - `just test -p codex-app-server` (fails locally: 792 passed, 10 failed,
    and 5 timed out; failures are in existing environment-sensitive tests,
    primarily because nested macOS `sandbox-exec` is not permitted)
    - `just write-app-server-schema --experimental --schema-root
    /tmp/codex-app-server-schema-experimental`
  • Encrypt multi-agent v2 message payloads (#26210)
    ## Why
    
    Multi-agent v2 currently routes agent instructions through normal tool
    arguments and inter-agent context. That means the parent model can emit
    plaintext task text, Codex can persist it in history/rollouts, and the
    recipient can receive it as ordinary assistant-message JSON.
    
    This changes the v2 path so agent instructions stay encrypted between
    model calls: Responses encrypts the `message` argument returned by the
    model, Codex forwards only that ciphertext, and Responses decrypts it
    internally for the recipient model.
    
    ## What changed
    
    - Mark the v2 `message` parameter as encrypted for `spawn_agent`,
    `send_message`, and `followup_task`.
    - Treat multi-agent v2 tool `message` values as ciphertext
    unconditionally.
    - Store v2 inter-agent task text in
    `InterAgentCommunication.encrypted_content` with empty plaintext
    `content`.
    - Convert encrypted inter-agent communications into the Responses
    `agent_message` input item before sending the child request.
    - Preserve `agent_message` items across history, rollout, compaction,
    telemetry, and app-server schema paths.
    - Leave multi-agent v1 unchanged.
    
    ## Message shape
    
    The model still calls the v2 tools with a `message` argument, but that
    value is now ciphertext:
    
    ```json
    {
      "name": "spawn_agent",
      "arguments": {
        "task_name": "worker",
        "message": "<ciphertext>"
      }
    }
    ```
    
    Codex stores the task as encrypted inter-agent communication:
    
    ```json
    {
      "author": "/root",
      "recipient": "/root/worker",
      "content": "",
      "encrypted_content": "<ciphertext>",
      "trigger_turn": true
    }
    ```
    
    When Codex builds the recipient request, it forwards the ciphertext
    using the new Responses input item:
    
    ```json
    {
      "type": "agent_message",
      "author": "/root",
      "recipient": "/root/worker",
      "content": [
        {
          "type": "encrypted_content",
          "encrypted_content": "<ciphertext>"
        }
      ]
    }
    ```
    
    Responses decrypts that item internally for the recipient model.
    
    ## Context impact
    
    - Parent context no longer carries plaintext v2 agent task instructions
    from these tool arguments.
    - Codex rollout/history stores ciphertext for v2 agent instructions.
    - Recipient requests receive an `agent_message` item instead of
    assistant commentary JSON for encrypted task delivery.
    - Plaintext completion/status notifications are still plaintext because
    they are Codex-generated status messages, not encrypted model tool
    arguments.
    
    ## Validation
    
    - `just test -p codex-tools`
    - `just test -p codex-protocol`
    - `just test -p codex-rollout`
    - `just test -p codex-rollout-trace`
    - `just test -p codex-otel`
    - `just write-app-server-schema`
  • [codex] Rename multi-agent v2 assign_task to followup_task (#25636)
    ## Summary
    
    Renames the MultiAgentV2 turn-triggering tool from `assign_task` to
    `followup_task` so the exposed tool name better describes sending an
    additional task to an existing agent.
    
    This updates the tool spec, handler/module names, registry wiring,
    default multi-agent v2 usage hints, and tests. Rollout trace
    classification keeps accepting legacy `assign_task` events so older
    traces still reduce correctly, while docs show the new tool name.
    
    ## Test plan
    
    - `just test -p codex-core followup_task`
    - `just test -p codex-core -E
    'test(multi_agent_feature_selects_one_agent_tool_family) |
    test(multi_agent_v2_can_use_configured_tool_namespace) |
    test(code_mode_only_can_expose_namespaced_multi_agent_v2_as_normal_tools)'`
    - `just test -p codex-rollout-trace`
    - `just fix -p codex-core`
    - `just fix -p codex-rollout-trace`
    
    Notes: `just fmt` ran `cargo fmt` but failed in the Python ruff phase
    because the local environment could not resolve `hatchling>=1.27.0` from
    the configured internal registry. A full `just test -p codex-core` also
    hit unrelated environment-sensitive integration failures involving
    missing spawned test binaries/sandbox behavior; the changed multi-agent
    spec/handler tests passed in the filtered runs above.
  • Rename multi-agent v2 assignment tool (#25267)
    ## Summary
    - rename the multi-agent v2 follow-up task tool surface to assign_task
    - update core tests and spec-plan expectations
    - keep rollout-trace classification backward-compatible with legacy
    followup_task
    
    ## Tests
    - just fmt
    - just test -p codex-core
    multi_agents_spec::tests::assign_task_tool_requires_message_and_has_no_output_schema
    - just test -p codex-rollout-trace
    - just fix -p codex-core
    - just fix -p codex-rollout-trace
    
    Note: a broad just test -p codex-core run was attempted locally, but
    this sandbox produced unrelated environment failures around
    sandbox-exec, missing test_stdio_server, and realtime timeouts.
  • Trace logical websocket request after untraced warmup (#23581)
    ## Why
    
    `prewarm_websocket` intentionally stays out of rollout inference
    tracing, but the next traced websocket request can still reuse the
    warmup `response_id` and send an empty `input` delta. If tracing records
    that wire payload verbatim, replay sees an incremental request whose
    parent was never traced and cannot reconstruct the conversation.
    
    This fixes that at the producer boundary instead of relaxing
    `rollout-trace` replay semantics around unresolved
    `previous_response_id` values.
    
    ## What
    
    - track whether the last websocket response came from an untraced warmup
    and clear that state when the websocket session is reset or reconnected
    - when a traced websocket request reuses that warmup parent, keep
    sending the compressed websocket request on the wire but record the
    logical `ResponsesApiRequest` in the rollout trace
    - add a regression test that proves replay reconstructs the logical user
    message even though the websocket follow-up carries
    `previous_response_id = warm-1` with empty `input`
    - update `InferenceTraceAttempt::record_started` docs to reflect that
    callers may record a logical request rather than the exact transport
    payload
    
    ## Testing
    
    - `cargo test -p codex-core --test all
    responses_websocket_request_prewarm_traces_logical_request`
  • [5 of 7] Replace OverrideTurnContext with ThreadSettings (#22508)
    **Stack position:** [5 of 7]
    
    ## Summary
    
    This PR adds `Op::ThreadSettings`, a queued settings-only update
    mechanism for changing stored thread settings without starting a new
    turn. It also removes the legacy `Op::OverrideTurnContext` in the same
    layer, so reviewers can see the replacement and deletion together.
    
    ## Changes
    
    - Add `Op::ThreadSettings` for settings-only queued updates.
    - Emit `ThreadSettingsApplied` with the effective thread settings
    snapshot after core applies an update.
    - Route settings-only updates through the same submission queue as user
    input.
    - Migrate remaining `OverrideTurnContext` tests and callers to the
    queued `Op::ThreadSettings` path.
    - Delete `Op::OverrideTurnContext` from the core protocol and submission
    loop.
    
    This stack addresses #20656 and #22090.
    
    ## Stack
    
    1. [1 of 7] [Add thread settings to
    UserInput](https://github.com/openai/codex/pull/23080)
    2. [2 of 7] [Remove
    UserInputWithTurnContext](https://github.com/openai/codex/pull/23081)
    3. [3 of 7] [Remove
    UserTurn](https://github.com/openai/codex/pull/23075)
    4. [4 of 7] [Placeholder for OverrideTurnContext
    cleanup](https://github.com/openai/codex/pull/23087)
    5. [5 of 7] [Replace OverrideTurnContext with
    ThreadSettings](https://github.com/openai/codex/pull/22508) (this PR)
    6. [6 of 7] [Add app-server thread settings
    API](https://github.com/openai/codex/pull/22509)
    7. [7 of 7] [Sync TUI thread
    settings](https://github.com/openai/codex/pull/22510)
  • [rollout-trace] Add a trace ID to MCP calls. (#22326)
    This allows us to connect individual tool calls to the logs of the
    invocations.
  • [rollout-trace] Add x-codex-inference-call-id header to inference calls. (#22311)
    This allows us to attach call logs to inference requests in traces.
  • Simplify MCP tool handler plumbing (#21595)
    ## Why
    The MCP tool path had accumulated a few core-owned special cases: a
    dedicated payload variant, resolver plumbing, a legacy `AfterToolUse`
    translation path, and a side channel for parallel-call metadata. That
    made `ToolRegistry` and the spec builder know more about MCP than they
    needed to.
    
    This change moves MCP-specific execution details back onto `ToolInfo`
    and `McpHandler` so `codex-core` can treat MCP calls like normal
    function calls while still preserving MCP-specific dispatch and
    telemetry behavior where it belongs.
    
    ## What changed
    - removed `resolve_mcp_tool_info`, `ToolPayload::Mcp`, `ToolKind`, and
    the remaining registry-side MCP resolver path
    - stored MCP routing metadata directly on `McpHandler` and `ToolInfo`,
    including `supports_parallel_tool_calls`
    - deleted the legacy `AfterToolUse` consumer in `core`, which removes
    the need for handler-specific `after_tool_use_payload` implementations
    - switched tool-result telemetry to handler-provided tags and kept
    MCP-specific dispatch payload construction inside the handler
    - simplified tool spec planning/building by passing `ToolInfo` directly
    and dropping the direct/deferred MCP wrapper structs and the
    parallel-server side table
    
    ## Testing
    - `cargo check -p codex-core -p codex-mcp -p codex-otel`
    - `cargo test -p codex-core
    mcp_parallel_support_uses_exact_payload_server`
    - `cargo test -p codex-core
    direct_mcp_tools_register_namespaced_handlers`
    - `cargo test -p codex-core
    search_tool_description_lists_each_mcp_source_once`
    - `cargo test -p codex-mcp
    list_all_tools_uses_startup_snapshot_while_client_is_pending`
    - `just fix -p codex-core -p codex-mcp -p codex-otel`
  • Reapply "Move skills watcher to app-server" (#21652)
    ## Why
    
    PR #21460 reverted the earlier move of skills change watching from
    `codex-core` into app-server. This reapplies that boundary change so
    app-server owns client-facing `skills/changed` notifications and core no
    longer carries the watcher.
    
    ## What
    
    - Restore the app-server `SkillsWatcher` and register it from thread
    listener setup.
    - Remove the core-owned skills watcher and its core live-reload
    integration surface.
    - Restore app-server coverage for `skills/changed` notifications after a
    watched skill file changes.
    
    ## Validation
    
    - `cargo test -p codex-app-server --test all
    suite::v2::skills_list::skills_changed_notification_is_emitted_after_skill_change
    -- --exact --nocapture`
    - `cargo test -p codex-core --lib --no-run`
  • Move skills watcher to app-server (#21287)
    ## Why
    
    Skills update notifications are app-server API behavior, but the watcher
    lived in `codex-core` and surfaced through
    `EventMsg::SkillsUpdateAvailable`. Moving the watcher out keeps core
    focused on thread execution and lets app-server own both cache
    invalidation and the `skills/changed` notification.
    
    ## What changed
    
    - Added an app-server-owned skills watcher that watches local skill
    roots, clears the shared skills cache, and emits `skills/changed`
    directly.
    - Registers skill watches from the common app-server thread listener
    attach path, including direct starts, resumes, and app-server-observed
    child or forked threads.
    - Stores the `WatchRegistration` on `ThreadState`, so listener
    replacement, thread teardown, idle unload, and app-server shutdown
    deregister by dropping the RAII guard.
    - Removed `EventMsg::SkillsUpdateAvailable`, the core watcher, and the
    old core live-reload test.
    - Extended the app-server skills change test to verify a cached skills
    list is refreshed after a filesystem change without forcing reload.
    
    ## Validation
    
    - `cargo check -p codex-core -p codex-app-server -p codex-mcp-server -p
    codex-rollout -p codex-rollout-trace`
    - `cargo test -p codex-app-server
    skills_changed_notification_is_emitted_after_skill_change`
  • Remove core MCP list tools op (#21281)
    ## Why
    
    The core `Op::ListMcpTools` request path is no longer needed. Keeping it
    around left a dead request/response surface alongside the app-server MCP
    inventory APIs that own current server status listing.
    
    ## What Changed
    
    - Removed `Op::ListMcpTools`, `EventMsg::McpListToolsResponse`, and the
    core handler that built the MCP snapshot response.
    - Removed the now-unused `codex-mcp` snapshot wrapper/export and passive
    event handling arms in rollout and MCP-server consumers.
    - Updated tests that used the old op as a synchronization hook to wait
    on existing startup/skills events, and deleted the plugin test that only
    exercised the removed listing op.
    
    ## Validation
    
    - `cargo test -p codex-protocol`
    - `cargo test -p codex-mcp`
    - `cargo test -p codex-rollout -p codex-rollout-trace -p
    codex-mcp-server`
    - `cargo test -p codex-core --test all
    pending_input::queued_inter_agent_mail`
    - `cargo test -p codex-core --test all
    rmcp_client::stdio_mcp_tool_call_includes_sandbox_state_meta`
    - `cargo test -p codex-core --test all
    rmcp_client::stdio_image_responses`
    - `just fix -p codex-core -p codex-protocol -p codex-mcp -p
    codex-rollout -p codex-rollout-trace -p codex-mcp-server`
  • Move message history out of core (#21278)
    ## Why
    
    Message history was implemented inside `codex-core` and surfaced through
    core protocol ops and `SessionConfiguredEvent` fields even though the
    current consumer is TUI-local prompt recall. That made core own UI
    history persistence and exposed `history_log_id` / `history_entry_count`
    through surfaces that app-server and other clients do not need.
    
    This change moves message history persistence out of core and keeps the
    recall plumbing local to the TUI.
    
    ## What changed
    
    - Added a new `codex-message-history` crate for appending, looking up,
    trimming, and reading metadata from `history.jsonl`.
    - Removed core protocol history ops/events: `AddToHistory`,
    `GetHistoryEntryRequest`, and `GetHistoryEntryResponse`.
    - Removed `history_log_id` and `history_entry_count` from
    `SessionConfiguredEvent` and updated exec/MCP/test fixtures accordingly.
    - Updated the TUI to dispatch local app events for message-history
    append/lookup and keep its persistent-history metadata in TUI session
    state.
    
    ## Validation
    
    - `cargo test -p codex-message-history -p codex-protocol`
    - `cargo test -p codex-exec event_processor_with_json_output`
    - `cargo test -p codex-mcp-server outgoing_message`
    - `cargo test -p codex-tui`
    - `just fix -p codex-message-history -p codex-protocol -p codex-core -p
    codex-tui -p codex-exec -p codex-mcp-server`
  • [codex] Remove legacy ListSkills op (#21282)
    ## Why
    
    `skills/list` is already exposed through app-server v2 and covered by
    the app-server test suite. Keeping the separate core `Op::ListSkills`
    path leaves a duplicate legacy protocol surface that no longer needs to
    be maintained.
    
    ## What Changed
    
    - Removed `Op::ListSkills` and `EventMsg::ListSkillsResponse` from the
    core protocol.
    - Deleted the corresponding core session handler and stale core
    integration tests.
    - Removed rollout/MCP ignore branches and protocol v1 docs references
    for the deleted event/op.
    - Left app-server `skills/list` and its existing coverage intact.
    
    ## Validation
    
    - `cargo test -p codex-protocol`
    - `cargo test -p codex-core --test all suite::skills`
    - `cargo check -p codex-mcp-server -p codex-rollout -p
    codex-rollout-trace`
    - `just fix -p codex-core`
  • [codex] Move thread naming to app server (#21260)
    ## Why
    
    Thread names are app-server metadata now, backed by the thread store and
    sqlite state database. Keeping a core `SetThreadName` op plus a rollout
    `thread_name_updated` event made rename persistence live in the wrong
    layer and required historical replay support for an event that new
    app-server flows should not write.
    
    ## What changed
    
    - Removed `Op::SetThreadName` and `EventMsg::ThreadNameUpdated` from the
    core protocol and deleted the core handler path that appended rename
    events to rollouts.
    - Updated app-server `thread/name/set` so both loaded and unloaded
    threads write through thread-store metadata and app-server emits
    `thread/name/updated` notifications.
    - Updated local thread-store name metadata updates to write sqlite title
    metadata and the legacy thread-name index without appending rollout
    events.
    - Removed state extraction and rollout handling for the deleted
    thread-name event.
    
    ## Validation
    
    - `cargo test -p codex-app-server thread_name_updated_broadcasts`
    - `cargo test -p codex-app-server
    thread_name_set_is_reflected_in_read_list_and_resume`
    - `cargo test -p codex-thread-store
    update_thread_metadata_sets_name_on_active_rollout_and_indexes_name`
    - `cargo test -p codex-state`
    - `cargo check -p codex-mcp-server -p codex-rollout-trace`
    - `just fix -p codex-app-server -p codex-thread-store -p codex-state -p
    codex-mcp-server -p codex-rollout-trace`
    
    ## Docs
    
    No external documentation update is expected for this internal ownership
    change.
  • feat: add remote compaction v2 Responses client path (#20773)
    ## Why
    
    This adds the `remote_compaction_v2` client path so remote compaction
    can run through the normal Responses stream and install a
    `context_compaction` item that trigger a compaction.
    
    The goal is to migrate some of the compaction logic on the client side
    
    We keeps the v2 transport behind a feature flag while letting follow-up
    requests reuse the compacted context instead of falling back to the
    legacy compaction item shape.
    
    ## What changed
    
    - add `ResponseItem::ContextCompaction` and refresh the generated
    app-server / schema / TypeScript fixtures that expose response items on
    the wire
    - add `core/src/compact_remote_v2.rs` to send compaction through the
    standard streamed Responses client, require exactly one
    `context_compaction` output item, and install that item into compacted
    history
    - route manual compact and auto-compaction through the v2 path when
    `remote_compaction_v2` is enabled, while keeping the existing remote
    compaction path as the fallback
    - preserve the new item type across history retention, follow-up request
    construction, telemetry, rollout persistence, and rollout-trace
    normalization
    - add targeted coverage for the feature flag, `context_compaction`
    serialization, rollout-trace normalization, and remote-compaction
    follow-up behavior
    
    ## Verification
    
    - added protocol tests for `context_compaction`
    serialization/deserialization in `protocol/src/models.rs`
    - added rollout-trace coverage for `context_compaction` normalization in
    `rollout-trace/src/reducer/conversation_tests.rs`
    - added remote compaction integration coverage for v2 follow-up reuse
    and mixed compaction output streams in
    `core/tests/suite/compact_remote.rs`
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • [codex] Emit image view as core item (#20512)
    ## Why
    
    Image-view results should be represented as a core-produced turn item
    instead of being reconstructed by app-server. At the same time, existing
    rollout/history paths still understand the legacy `ViewImageToolCall`
    event, so this keeps that event as compatibility output generated from
    the new item lifecycle.
    
    ## What changed
    
    - Added `TurnItem::ImageView` to `codex-protocol`.
    - Emitted image-view item start/completion directly from the core
    `view_image` handler.
    - Kept `ViewImageToolCall` as a legacy event and generate it from
    completed `TurnItem::ImageView` items.
    - Kept `thread_history.rs` on the legacy `ViewImageToolCall` replay
    path, with `ImageView` item lifecycle events ignored there.
    - Updated app-server protocol conversion, rollout persistence, and
    affected exhaustive event matches for the new item plus legacy fan-out
    shape.
    
    ## Verification
    
    - `cargo test -p codex-protocol -p codex-app-server-protocol -p
    codex-rollout -p codex-rollout-trace -p codex-mcp-server -p
    codex-app-server --lib`
    - `cargo test -p codex-core --test all
    view_image_tool_attaches_local_image`
    - `just fix -p codex-protocol -p codex-core -p codex-app-server-protocol
    -p codex-app-server -p codex-rollout -p codex-rollout-trace -p
    codex-mcp-server`
    - `git diff --check`
  • [codex] Remove unused event messages (#20511)
    ## Why
    
    Several legacy `EventMsg` variants were still emitted or mapped even
    though clients either ignored them or had moved to item/lifecycle
    events. `Op::Undo` had also degraded to an unavailable shim, so this
    removes that dead task path instead of preserving a command that cannot
    do useful work.
    
    `McpStartupComplete`, `WebSearchBegin`, and `ImageGenerationBegin` are
    intentionally kept because useful consumers still depend on them: MCP
    startup completion drives readiness behavior, and the begin events let
    app-server/core consumers surface in-progress web-search and
    image-generation items before the final payload arrives.
    
    ## What Changed
    
    - Removed weak legacy event variants and payloads from `codex-protocol`,
    including legacy agent deltas, background events, and undo lifecycle
    events.
    - Kept/restored `EventMsg::McpStartupComplete`,
    `EventMsg::WebSearchBegin`, and `EventMsg::ImageGenerationBegin` with
    serializer and emission coverage.
    - Updated core, rollout, MCP server, app-server thread history,
    review/delegate filtering, and tests to rely on the useful replacement
    events that remain.
    - Removed `Op::Undo`, `UndoTask`, the undo test module, and stale TUI
    slash-command comments.
    - Stopped agent job/background progress and compaction retry notices
    from emitting `BackgroundEvent` payloads.
    
    ## Verification
    
    - `cargo check -p codex-protocol -p codex-app-server-protocol -p
    codex-core -p codex-rollout -p codex-rollout-trace -p codex-mcp-server`
    - `cargo test -p codex-protocol -p codex-app-server-protocol -p
    codex-rollout -p codex-rollout-trace -p codex-mcp-server`
    - `cargo test -p codex-core --test all suite::items`
    - `just fix -p codex-protocol -p codex-app-server-protocol -p codex-core
    -p codex-rollout -p codex-rollout-trace -p codex-mcp-server`
    - Earlier coverage on this PR also included `codex-mcp`, `codex-tui`,
    core library tests, MCP/plugin/delegate/review/agent job tests, and MCP
    startup TUI tests.
  • [rollout-tracer] Match analysis messages on encrypted id. (#20123)
    In some setups the summary or raw content can be dropped between
    requests. This triggers a check in the reducer which expects that the
    messages should remain identical between requests.
    
    This PR relaxes the checks to only focus on the encrypted ID instead. It
    also changes the reducer to keep the most rich version of the message
    observed during the rollout (this ensures that we don't accidentally
    lose the CoT nor summary when available).
  • [rollout-trace] Include x-request-id in rollout trace. (#20066)
    ## Why
    
    Rollout traces need an identifier that can be used to correlate a Codex
    inference with upstream Responses API, proxy, and engine logs. The
    reduced trace model already exposed `upstream_request_id`, but it was
    being populated from the Responses API `response.id`. That value is
    useful for `previous_response_id` chaining, but it is not the transport
    request id that upstream systems key on.
    
    This PR separates those concepts so trace consumers can reliably answer
    both questions:
    
    - which Responses API response did this inference produce?
    - which upstream request handled it?
    
    ## Structure
    
    The change keeps the upstream request id at the same lifecycle level as
    the provider stream:
    
    - `codex-api` captures the `x-request-id` HTTP response header when the
    SSE stream is created and exposes it on `ResponseStream`. Fixture and
    websocket streams set the field to `None` because they do not have that
    HTTP response header.
    - `codex-core` carries that stream-level id into `InferenceTraceAttempt`
    when recording terminal stream outcomes. Completed, failed, cancelled,
    dropped-stream, and pre-response error paths all record the id when it
    is available.
    - `rollout-trace` now records both identifiers in raw terminal inference
    events and response payloads: `response_id` for the Responses API
    `response.id`, and `upstream_request_id` for `x-request-id`.
    - The reducer stores both fields on `InferenceCall`. It also uses
    `response_id` for `previous_response_id` conversation linking, which
    removes the old accidental dependency on the misnamed
    `upstream_request_id` field.
    - Terminal inference reduction now consumes the full terminal payload
    (`InferenceCompleted`, `InferenceFailed`, or `InferenceCancelled`) in
    one place. That keeps status, partial payloads, response ids, and
    upstream request ids consistent across success, failure, cancellation,
    and late stream-mapper events.
    
    ## Why This Shape
    
    `x-request-id` is a property of the HTTP/provider response envelope, not
    an SSE event. Capturing it once in `codex-api` and plumbing it through
    terminal trace recording avoids trying to infer the value from stream
    contents, and it preserves the id even when the stream fails or is
    cancelled after only partial output.
    
    Keeping `response_id` separate from `upstream_request_id` also makes the
    reduced trace model less surprising: `response_id` remains the
    conversation-continuation id, while `upstream_request_id` is the
    operational correlation id for upstream debugging.
    
    ## Validation
    
    The PR updates trace and reducer coverage for:
    
    - reading `x-request-id` from SSE response headers;
    - storing the true upstream request id on completed inference calls;
    - preserving upstream request ids for cancelled and late-cancelled
    inference streams;
    - keeping `previous_response_id` reconstruction tied to `response_id`
    rather than transport request ids.
  • [codex] Trace cancelled inference streams (#19839)
    Records cancelled inference streams when Codex stops consuming a
    provider response before `response.completed`, preserving complete
    output items observed before cancellation.
    
    Also closes still-running inference calls when the owning turn ends, so
    reduced rollout traces do not leave stale `Running` inference nodes.
    
    Covered by focused reducer coverage and a core stream-drop test for
    partial output preservation.
  • Add goal app-server API (2 / 5) (#18074)
    Adds the app-server v2 goal API on top of the persisted goal state from
    PR 1.
    
    ## Why
    
    Clients need a stable app-server surface for reading and controlling
    materialized thread goals before the model tools and TUI can use them.
    Goal changes also need to be observable by app-server clients, including
    clients that resume an existing thread.
    
    ## What changed
    
    - Added v2 `thread/goal/get`, `thread/goal/set`, and `thread/goal/clear`
    RPCs for materialized threads.
    - Added `thread/goal/updated` and `thread/goal/cleared` notifications so
    clients can keep local goal state in sync.
    - Added resume/snapshot wiring so reconnecting clients see the current
    goal state for a thread.
    - Added app-server handlers that reconcile persisted rollout state
    before direct goal mutations.
    - Updated the app-server README plus generated JSON and TypeScript
    schema fixtures for the new API surface.
    
    ## Verification
    
    - Added app-server v2 coverage for goal get/set/clear behavior,
    notification emission, resume snapshots, and non-local thread-store
    interactions.
  • permissions: make profiles represent enforcement (#19231)
    ## Why
    
    `PermissionProfile` is becoming the canonical permissions abstraction,
    but the old shape only carried optional filesystem and network fields.
    It could describe allowed access, but not who is responsible for
    enforcing it. That made `DangerFullAccess` and `ExternalSandbox` lossy
    when profiles were exported, cached, or round-tripped through app-server
    APIs.
    
    The important model change is that active permissions are now a disjoint
    union over the enforcement mode. Conceptually:
    
    ```rust
    pub enum PermissionProfile {
        Managed {
            file_system: FileSystemSandboxPolicy,
            network: NetworkSandboxPolicy,
        },
        Disabled,
        External {
            network: NetworkSandboxPolicy,
        },
    }
    ```
    
    This distinction matters because `Disabled` means Codex should apply no
    outer sandbox at all, while `External` means filesystem isolation is
    owned by an outside caller. Those are not equivalent to a broad managed
    sandbox. For example, macOS cannot nest Seatbelt inside Seatbelt, so an
    inner sandbox may require the outer Codex layer to use no sandbox rather
    than a permissive one.
    
    ## How Existing Modeling Maps
    
    Legacy `SandboxPolicy` remains a boundary projection, but it now maps
    into the higher-fidelity profile model:
    
    - `ReadOnly` and `WorkspaceWrite` map to `PermissionProfile::Managed`
    with restricted filesystem entries plus the corresponding network
    policy.
    - `DangerFullAccess` maps to `PermissionProfile::Disabled`, preserving
    the “no outer sandbox” intent instead of treating it as a lax managed
    sandbox.
    - `ExternalSandbox { network_access }` maps to
    `PermissionProfile::External { network }`, preserving external
    filesystem enforcement while still carrying the active network policy.
    - Split runtime policies that legacy `SandboxPolicy` cannot faithfully
    express, such as managed unrestricted filesystem plus restricted
    network, stay `Managed` instead of being collapsed into
    `ExternalSandbox`.
    - Per-command/session/turn grants remain partial overlays via
    `AdditionalPermissionProfile`; full `PermissionProfile` is reserved for
    complete active runtime permissions.
    
    ## What Changed
    
    - Change active `PermissionProfile` into a tagged union: `managed`,
    `disabled`, and `external`.
    - Keep partial permission grants separate with
    `AdditionalPermissionProfile` for command/session/turn overlays.
    - Represent managed filesystem permissions as either `restricted`
    entries or `unrestricted`; `glob_scan_max_depth` is non-zero when
    present.
    - Preserve old rollout compatibility by accepting the pre-tagged `{
    network, file_system }` profile shape during deserialization.
    - Preserve fidelity for important edge cases: `DangerFullAccess`
    round-trips as `disabled`, `ExternalSandbox` round-trips as `external`,
    and managed unrestricted filesystem + restricted network stays managed
    instead of being mistaken for external enforcement.
    - Preserve configured deny-read entries and bounded glob scan depth when
    full profiles are projected back into runtime policies, including
    unrestricted replacements that now become `:root = write` plus deny
    entries.
    - Regenerate the experimental app-server v2 JSON/TypeScript schema and
    update the `command/exec` README example for the tagged
    `permissionProfile` shape.
    
    ## Compatibility
    
    Legacy `SandboxPolicy` remains available at config/API boundaries as the
    compatibility projection. Existing rollout lines with the old
    `PermissionProfile` shape continue to load. The app-server
    `permissionProfile` field is experimental, so its v2 wire shape is
    intentionally updated to match the higher-fidelity model.
    
    ## Verification
    
    - `just write-app-server-schema`
    - `cargo check --tests`
    - `cargo test -p codex-protocol permission_profile`
    - `cargo test -p codex-protocol
    preserving_deny_entries_keeps_unrestricted_policy_enforceable`
    - `cargo test -p codex-app-server-protocol
    permission_profile_file_system_permissions`
    - `cargo test -p codex-app-server-protocol serialize_client_response`
    - `cargo test -p codex-core
    session_configured_reports_permission_profile_for_external_sandbox`
    - `just fix`
    - `just fix -p codex-protocol`
    - `just fix -p codex-app-server-protocol`
    - `just fix -p codex-core`
    - `just fix -p codex-app-server`
  • [rollout_trace] Add debug trace reduction command (#18880)
    ## Summary
    
    Adds the debug CLI entry point for reducing recorded rollout traces.
    This gives developers a direct way to inspect whether the emitted trace
    stream reduces into the expected conversation/runtime model.
    
    ## Stack
    
    This is PR 5/5 in the rollout trace stack.
    
    - [#18876](https://github.com/openai/codex/pull/18876): Add rollout
    trace crate
    - [#18877](https://github.com/openai/codex/pull/18877): Record core
    session rollout traces
    - [#18878](https://github.com/openai/codex/pull/18878): Trace tool and
    code-mode boundaries
    - [#18879](https://github.com/openai/codex/pull/18879): Trace sessions
    and multi-agent edges
    - [#18880](https://github.com/openai/codex/pull/18880): Add debug trace
    reduction command
    
    ## Review Notes
    
    This PR is intentionally last: it depends on the trace crate, core
    recorder, runtime/tool events, and session/agent edge data all existing.
    The command should remain a debug/developer tool and avoid adding new
    runtime behavior.
    
    The useful review question is whether the CLI exposes the reducer in the
    smallest practical way for local inspection without turning the debug
    command into a supported user-facing workflow.
  • [rollout_trace] Trace tool and code-mode boundaries (#18878)
    ## Summary
    
    Extends rollout tracing across tool dispatch and code-mode runtime
    boundaries. This records canonical tool-call lifecycle events and links
    code-mode execution/wait operations back to the model-visible calls that
    caused them.
    
    ## Stack
    
    This is PR 3/5 in the rollout trace stack.
    
    - [#18876](https://github.com/openai/codex/pull/18876): Add rollout
    trace crate
    - [#18877](https://github.com/openai/codex/pull/18877): Record core
    session rollout traces
    - [#18878](https://github.com/openai/codex/pull/18878): Trace tool and
    code-mode boundaries
    - [#18879](https://github.com/openai/codex/pull/18879): Trace sessions
    and multi-agent edges
    - [#18880](https://github.com/openai/codex/pull/18880): Add debug trace
    reduction command
    
    ## Review Notes
    
    This PR is about attribution. Reviewers should focus on whether direct
    tool calls, code-mode-originated tool calls, waits, outputs, and
    cancellation boundaries are recorded with enough source information for
    deterministic reduction without coupling the reducer to live runtime
    internals.
    
    The stack remains valid after this layer: tool and code-mode traces
    reduce through the existing crate model, while the broader session and
    multi-agent relationships are added in the next PR.
  • [rollout_trace] Record core session rollout traces (#18877)
    ## Summary
    
    Wires rollout trace recording into `codex-core` session and turn
    execution. This records the core model request/response, compaction, and
    session lifecycle boundaries needed for replay without yet tracing every
    nested runtime/tool boundary.
    
    ## Stack
    
    This is PR 2/5 in the rollout trace stack.
    
    - [#18876](https://github.com/openai/codex/pull/18876): Add rollout
    trace crate
    - [#18877](https://github.com/openai/codex/pull/18877): Record core
    session rollout traces
    - [#18878](https://github.com/openai/codex/pull/18878): Trace tool and
    code-mode boundaries
    - [#18879](https://github.com/openai/codex/pull/18879): Trace sessions
    and multi-agent edges
    - [#18880](https://github.com/openai/codex/pull/18880): Add debug trace
    reduction command
    
    ## Review Notes
    
    This layer is the first live integration point. The important review
    question is whether trace recording is isolated from normal session
    behavior: trace failures should not become user-visible execution
    failures, and recording should preserve the existing turn/session
    lifecycle semantics.
    
    The PR depends on the reducer/data model from the first stack entry and
    only introduces the core recorder surface that later PRs use for richer
    runtime and relationship events.
  • [rollout_trace] Add rollout trace crate (#18876)
    ## Summary
    
    Adds the standalone `codex-rollout-trace` crate, which defines the raw
    trace event format, replay/reduction model, writer, and reducer logic
    for reconstructing model-visible conversation/runtime state from
    recorded rollout data.
    
    The crate-level design is documented in
    [`codex-rs/rollout-trace/README.md`](https://github.com/openai/codex/blob/codex/rollout-trace-crate/codex-rs/rollout-trace/README.md).
    
    ## Stack
    
    This is PR 1/5 in the rollout trace stack.
    
    - [#18876](https://github.com/openai/codex/pull/18876): Add rollout
    trace crate
    - [#18877](https://github.com/openai/codex/pull/18877): Record core
    session rollout traces
    - [#18878](https://github.com/openai/codex/pull/18878): Trace tool and
    code-mode boundaries
    - [#18879](https://github.com/openai/codex/pull/18879): Trace sessions
    and multi-agent edges
    - [#18880](https://github.com/openai/codex/pull/18880): Add debug trace
    reduction command
    
    ## Review Notes
    
    This PR intentionally does not wire tracing into live Codex execution.
    It establishes the data model and reducer contract first, with
    crate-local tests covering conversation reconstruction, compaction
    boundaries, tool/session edges, and code-cell lifecycle reduction. Later
    PRs emit into this model.
    
    The README is the best entry point for reviewing the intended trace
    format and reduction semantics before diving into the reducer modules.