122 Commits

  • feat(protocol): define missing rollout turn items (#30282)
    ## Description
    
    This PR adds canonical core `TurnItem` shapes for command execution,
    dynamic tool calls, collab agent tool calls, and sub-agent activity, to
    be stored in the rollout file soon.
    
    It also teaches app-server protocol / `ThreadHistoryBuilder` how to
    render those items, and adds the small legacy fanout helpers needed for
    existing event-based consumers. No core producer or rollout persistence
    behavior changes here; that will be done in a follow-up.
    
    ## Making ThreadHistoryBuilder stateless
    
    This is the first PR in a stack to make `ThreadHistoryBuilder` stateless
    enough that we can materialize app-server `ThreadItem`s from only a
    given slice of `RolloutItem` history, without ever needing to replay the
    whole thread from the beginning.
    
    The persisted legacy `RolloutItem::EventMsg` records are mostly shaped
    like live UI events, not like materialized `ThreadItem`s. They work if
    we replay the full rollout in order, but they often do not contain
    enough stable identity or complete item state to project an arbitrary
    suffix on its own.
    
    A few examples:
    
    - `UserMessageEvent` and `AgentMessageEvent` have content, but
    historically do not carry the persisted app-server item ID that should
    become the SQLite primary key.
    - `AgentReasoningEvent` and `AgentReasoningRawContentEvent` are
    fragments. `ThreadHistoryBuilder` currently merges them into the last
    reasoning item, which means a slice starting in the middle of reasoning
    cannot know whether to append to an earlier item or create a new one.
    - `WebSearchEndEvent`, `McpToolCallEndEvent`, collab end events, and
    similar legacy events can often render a final-looking item, but they
    usually rely on prior replay state to know which turn owns the item.
    - Begin/end legacy events are partial views of one logical item. The
    builder correlates them by `call_id` and mutates prior state to
    synthesize the final `ThreadItem`.
    
    That is the problem this direction fixes. A persisted canonical
    lifecycle record looks much closer to the read model we actually want
    later:
    
    ```rust
    ItemCompletedEvent {
        turn_id,
        item: TurnItem { id, ...full snapshot... },
        completed_at_ms,
    }
    ```
    
    Once rollout has explicit `turn_id`, stable `item.id`, and a canonical
    completed item snapshot, the future SQLite projector can reduce only the
    new rollout suffix and upsert the affected `thread_items` rows. It no
    longer needs to synthesize `item-N`, infer item ownership from the
    active turn, or replay earlier events just to reconstruct the current
    item snapshot.
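
    A minimal sketch of that suffix-only reduction, assuming simplified
    stand-ins for the real types: `ThreadItemRow`, `project_suffix`, and the
    `summary` field are hypothetical names, and an in-memory map stands in
    for the SQLite `thread_items` table.

    ```rust
    use std::collections::BTreeMap;

    /// Simplified stand-in for a canonical item snapshot (`summary` is hypothetical).
    #[derive(Clone, Debug, PartialEq)]
    pub struct TurnItem {
        pub id: String,
        pub summary: String,
    }

    /// Simplified stand-in for the persisted lifecycle record.
    #[derive(Clone, Debug)]
    pub struct ItemCompletedEvent {
        pub turn_id: String,
        pub item: TurnItem,
        pub completed_at_ms: u64,
    }

    /// One row of a hypothetical `thread_items` read model.
    #[derive(Clone, Debug, PartialEq)]
    pub struct ThreadItemRow {
        pub turn_id: String,
        pub item: TurnItem,
        pub completed_at_ms: u64,
    }

    /// Upsert every completed item in `suffix`, keyed by its stable item ID.
    /// No earlier events are consulted: each record already names its owning
    /// turn and carries the full snapshot.
    pub fn project_suffix(
        rows: &mut BTreeMap<String, ThreadItemRow>,
        suffix: &[ItemCompletedEvent],
    ) {
        for event in suffix {
            rows.insert(
                event.item.id.clone(),
                ThreadItemRow {
                    turn_id: event.turn_id.clone(),
                    item: event.item.clone(),
                    completed_at_ms: event.completed_at_ms,
                },
            );
        }
    }
    ```

    Because the key is `item.id`, projecting the same suffix twice leaves the
    rows unchanged, which is the property an incremental projector would rely
    on.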
    
    ## What changed
    
    - Added core `TurnItem` variants and item structs for command execution,
    dynamic tool calls, collab agent tool calls, and sub-agent activity.
    - Added conversions from those canonical items back into the legacy
    event shapes where current consumers still need them.
    - Added app-server v2 `ThreadItem` conversion for the new core item
    variants.
    - Taught `ThreadHistoryBuilder` and rollout persistence metrics to
    recognize the new item variants.
    
    ## Follow-up
    
    The next PR https://github.com/openai/codex/pull/30283 switches the live
    core producers for these item families onto canonical `ItemStarted` /
    `ItemCompleted` events.
  • feat(app-server): add history_mode to thread (#29927)
    ## Description
    
    This PR adds a new `historyMode = "legacy" | "paginated"` to `Thread`.
    This will be stored in `SessionMeta` in the JSONL rollout file and as a
    new column in the SQLite thread_metadata table, and exposed on
    `thread/start` and on the `Thread` object in app-server.
    
    ## What changed
    
    - Added canonical `ThreadHistoryMode` with `legacy` and `paginated`,
    defaulting old and new SessionMeta to `legacy`.
    - Carried `history_mode` through core session config, ThreadStore stored
    metadata, local/in-memory stores, rollout metadata extraction, and the
    existing SQLite `threads` table.
    - Added experimental `historyMode` to app-server v2 `Thread` and
    `thread/start`.
    - Made paginated stored threads metadata-discoverable but unsupported
    for legacy full-history reads, `load_history`, live resume, and create
    paths.
    - Regenerated app-server schema fixtures and added
    protocol/state/thread-store/app-server coverage for persistence and
    fail-closed behavior.
    
    ## Compatibility floor
    Because users may be running various versions of Codex binaries on the
    same machine (TUI, Codex App, etc.), we will need to establish a
    compatibility floor for upcoming paginated threads, which will change
    how thread storage reads and writes work.
    
    The overall plan here:
    ```
    Release N:
    - Add historyMode to SessionMeta / Thread / SQLite metadata.
    - Teach binaries to understand paginated threads.
    - If a binary sees `historyMode="paginated"` but does not support the paginated contract, it refuses to resume/mutate the thread.
    - Default remains `"legacy"`.
    
    Release N+1:
    - First-party clients start opting into paginated threads where appropriate.
    - Internal dogfood / staged rollout.
    - Measure old-client usage and paginated-thread unsupported errors.
    
    Release N+2:
    - Only after Release N+ is overwhelmingly deployed, make paginated the default.
    - Accept that a small tail of N-1-or-older binaries may not understand paginated threads.
    ```
    
    The important behavior change is fail-closed handling for a binary that
    encounters a persisted `paginated` thread before it knows how to fully
    support paginated history. In app-server, if a thread is `paginated`, we
    will:
    
    - allow metadata-only discovery paths like `thread/list` and
    `thread/read(includeTurns=false)`, so clients can still see the thread
    and inspect its `historyMode`
    - reject legacy full-history/live-thread paths like
    `thread/read(includeTurns=true)` and `thread/resume` with an unsupported
    JSON-RPC error
    - avoid silently treating an unknown or future `historyMode` as `legacy`
    
    Under the hood, the ThreadStore layer also rejects legacy operations
    that would need to load or replay the full thread history for a
    paginated thread. That gives us the behavior we want for Release N:
    future paginated threads are visible, but this binary fails closed
    instead of trying to operate on them as if they were legacy threads.
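
    The fail-closed rule can be sketched as below. This is an illustration
    under stated assumptions, not the app-server code: `Operation`,
    `parse_history_mode`, and `check_operation` are hypothetical names, and
    errors are plain strings rather than JSON-RPC errors.

    ```rust
    /// History modes named in this PR.
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    pub enum HistoryMode {
        Legacy,
        Paginated,
    }

    /// Hypothetical labels for the app-server paths described above.
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    pub enum Operation {
        List,
        ReadMetadataOnly,
        ReadWithTurns,
        Resume,
    }

    /// Parse a persisted mode. A missing field means an older rollout, so it
    /// is `Legacy`; an unrecognized value is an error, never a silent `Legacy`.
    pub fn parse_history_mode(raw: Option<&str>) -> Result<HistoryMode, String> {
        match raw {
            None | Some("legacy") => Ok(HistoryMode::Legacy),
            Some("paginated") => Ok(HistoryMode::Paginated),
            Some(other) => Err(format!("unknown historyMode: {other}")),
        }
    }

    /// Gate for a binary that only implements the legacy history contract.
    pub fn check_operation(raw_mode: Option<&str>, op: Operation) -> Result<(), String> {
        // Metadata-only discovery never loads or replays history, so it is
        // allowed for every mode (an assumption for unknown modes).
        if matches!(op, Operation::List | Operation::ReadMetadataOnly) {
            return Ok(());
        }
        match parse_history_mode(raw_mode)? {
            HistoryMode::Legacy => Ok(()),
            HistoryMode::Paginated => {
                Err("unsupported: thread uses paginated history".to_string())
            }
        }
    }
    ```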
  • Persist selected capability roots and resolve availability per model step (#29856)
    ## Why
    
    `selectedCapabilityRoots` is durable thread intent: “use this capability
    root from environment `worker`.”
    
    The important product assumption is:
    
    > One environment ID always names the same logical executor and stable
    contents.
    
    `worker` does not silently change from executor A to an unrelated
    executor B. The process-local connection handle for `worker` can still
    be replaced while Codex is running, though, for example when
    `environment/add` registers a fresh handle for the same logical
    environment.
    
    The thread should persist only the stable selection. Each model step
    should pair that selection with the exact ready handle captured for that
    step.
    
    ## The boundary
    
    ```text
    persisted thread intent
      plugin@1 -> environment "worker"
                    |
                    | capture the current step
                    v
    model-step view
      unavailable, or
      plugin@1 + worker's exact captured ready handle
    ```
    
    The environment ID is the stable identity and cache key. The
    `Arc<Environment>` is only a process-local handle retained so consumers
    of one model step use the same captured environment. It is never
    persisted and it does not imply different environment contents.
    
    ## What changes
    
    ### Persist the stable selection
    
    Selected roots are written into `SessionMeta` and restored with the
    thread. Forked subagents inherit the same selections, including
    bounded-history forks.
    
    Only stable data is persisted: root ID, environment ID, and root path.
    
    ### Capture readiness together with the exact handle
    
    The environment snapshot records:
    
    ```rust
    environment_id -> Some(Arc<Environment>) // ready in this step
    environment_id -> None                   // still starting in this step
    ```
    
    This prevents readiness and execution from coming from different
    registry snapshots.
    
    For example:
    
    ```text
    step snapshot: worker -> handle A, ready
    environment/add: worker -> fresh handle B for the same logical environment
    current step: plugin@1 still uses captured handle A
    ```
    
    Without carrying handle A in the snapshot, the resolver could combine “A
    was ready” with handle B and treat B as ready before it had finished
    starting.
    
    This does not change cache invalidation. Stable capability metadata
    remains identified by environment ID and capability root. Replacing a
    process-local handle under the same stable environment ID does not
    invalidate or rediscover that metadata.
    
    ### Resolve availability per model step
    
    - A ready captured environment produces resolved roots using its
    captured handle.
    - A starting, missing, or failed environment is omitted from that step.
    - A selected lazy environment that is outside the turn's captured
    environment set is asked to start, and a later step can observe it as
    ready.
    - No capability files are scanned here.
    
    Transient transport disconnects remain the remote client's reconnect
    concern. This PR models initial attachment/readiness; it does not add
    live socket-connectivity state.
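
    A minimal sketch of per-step resolution, assuming simplified types:
    `SelectedRoot`, `ResolvedRoot`, `StepSnapshot`, and `resolve_for_step` are
    hypothetical names, and `Environment` is reduced to a label. Requesting
    startup for lazy environments is left out.

    ```rust
    use std::collections::HashMap;
    use std::sync::Arc;

    /// Stand-in for a process-local environment handle.
    #[derive(Debug)]
    pub struct Environment {
        pub handle_name: String,
    }

    /// The durable selection: only stable data.
    #[derive(Clone, Debug)]
    pub struct SelectedRoot {
        pub root_id: String,
        pub environment_id: String,
        pub root_path: String,
    }

    /// Per-step capture: `Some(handle)` means ready, `None` means still starting.
    pub type StepSnapshot = HashMap<String, Option<Arc<Environment>>>;

    /// A root that is usable in this step, pinned to the captured handle.
    #[derive(Debug)]
    pub struct ResolvedRoot {
        pub root_id: String,
        pub root_path: String,
        pub environment: Arc<Environment>,
    }

    /// Resolve selections against one snapshot. Environments that are missing
    /// from the snapshot or not yet ready are omitted from the step.
    pub fn resolve_for_step(
        selected: &[SelectedRoot],
        snapshot: &StepSnapshot,
    ) -> Vec<ResolvedRoot> {
        selected
            .iter()
            .filter_map(|root| {
                let handle = snapshot.get(&root.environment_id)?.as_ref()?;
                Some(ResolvedRoot {
                    root_id: root.root_id.clone(),
                    root_path: root.root_path.clone(),
                    environment: Arc::clone(handle),
                })
            })
            .collect()
    }
    ```

    Readiness and the handle come from the same map entry, so a handle
    registered after the snapshot was taken cannot be paired with an earlier
    readiness observation.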
    
    ## Example
    
    ```text
    thread selection: plugin@1 -> environment "worker"
    
    step 1: worker is starting -> plugin@1 unavailable
    step 2: worker is ready    -> plugin@1 resolves through worker's captured handle
    step 3: fresh local handle -> current step remains pinned; a later step captures its own view
    ```
    
    Temporary unavailability does not discard the durable selection. Later
    PRs can retain stable metadata caches while projecting only currently
    available capabilities into model-visible World State.
    
    ## Compatibility
    
    The app-server request shape does not change. Older rollouts without
    `selected_capability_roots` deserialize to an empty list.
    
    ## Stack
    
    1. **This PR:** persist stable selected roots and resolve them through
    an exact model-step handle.
    2. #29960: cache stable skill metadata and project available skills into
    World State.
    3. #29946: cache stable plugin declarations and manage the separate live
    MCP runtime.
  • [2/3] core: persist world state in rollouts (#29835)
    ## Why
    
    `WorldState` currently remembers its model-visible diff baseline only in
    memory. That leaves no durable source for restoring the exact baseline
    after resume, fork, rollback, or compaction.
    
    This is the second PR in the WorldState persistence stack, built on
    #29833 and following #29249. It records durable state transitions; the
    next PR will replay them during rollout reconstruction.
    
    ## What
    
    - Add a `world_state` rollout item containing either a full snapshot or
    an RFC 7386 JSON Merge Patch.
    - Persist a full snapshot after initial context and after compaction
    establishes a new context window.
    - Persist non-empty patches when later sampling steps or turns advance
    the WorldState baseline.
    - Write model-visible history before its matching WorldState record, so
    an interrupted write can only cause a safe repeated update on replay.
    - Preserve WorldState records for full-history forks while excluding
    them from thread previews, metadata, and app-server history
    materialization.
    
    Older binaries read rollout lines independently, so they skip the
    unknown `world_state` records while retaining the rest of the thread.
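
    For reference, the RFC 7386 merge algorithm that a patch record relies on
    can be sketched as follows. The `Json` enum is a hypothetical minimal
    value type used only to keep the example dependency-free; the actual
    WorldState representation is not shown in this PR description.

    ```rust
    use std::collections::BTreeMap;

    /// Minimal JSON value type for illustration.
    #[derive(Clone, Debug, PartialEq)]
    pub enum Json {
        Null,
        Bool(bool),
        Number(f64),
        String(String),
        Array(Vec<Json>),
        Object(BTreeMap<String, Json>),
    }

    /// Apply an RFC 7386 JSON Merge Patch to `target` and return the result.
    pub fn merge_patch(target: Json, patch: &Json) -> Json {
        // A non-object patch replaces the target wholesale.
        let Json::Object(patch_fields) = patch else {
            return patch.clone();
        };
        // A non-object target is treated as an empty object.
        let mut fields = match target {
            Json::Object(fields) => fields,
            _ => BTreeMap::new(),
        };
        for (name, value) in patch_fields {
            if matches!(value, Json::Null) {
                // `null` in a patch removes the member.
                fields.remove(name);
            } else {
                // Otherwise merge recursively into the existing member, if any.
                let current = fields.remove(name).unwrap_or(Json::Null);
                fields.insert(name.clone(), merge_patch(current, value));
            }
        }
        Json::Object(fields)
    }
    ```

    Applying the same patch twice yields the same result, which is consistent
    with the "safe repeated update on replay" ordering described above.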
    
    ## Testing
    
    - `just test -p codex-core
    snapshot_merge_patch_changes_and_removes_nested_values`
    - `just test -p codex-core
    world_state_baseline_deduplicates_until_history_is_replaced`
    - `just test -p codex-core
    deferred_executor_compaction_preserves_then_updates_environment_once`
    - `just test -p codex-protocol`
    - `just test -p codex-rollout`
    - `just test -p codex-state`
    - `just test -p codex-thread-store`
    - `just test -p codex-app-server-protocol`
  • feat(app-server): list descendant threads by ancestor (#29591)
    ## Why
    
    `thread/list` can filter direct children with `parentThreadId`, but
    clients cannot request an entire spawned subtree. Discovering every
    descendant requires repeated client-side requests and gives up the
    database's existing filtering and pagination path.
    
    ## What changed
    
    Experimental clients can use `ancestorThreadId` to return strict
    descendants at any depth while `parentThreadId` retains its direct-child
    meaning. The filters are mutually exclusive, the ancestor is excluded,
    and every result preserves its immediate `parentThreadId` so callers can
    reconstruct the tree.
    
    ## How it works
    
    - **Explicit relationship:** Internal list parameters distinguish direct
    children from transitive descendants without changing the meaning of
    `parentThreadId`.
    - **Existing graph:** Persisted parent-child spawn edges remain the
    source of truth, so descendant lookup needs no schema migration or
    ancestry cache.
    - **Indexed traversal:** A recursive SQLite query starts from the
    parent-edge index, walks each generation, and applies thread filters,
    sorting, and cursor pagination in the same database request.
    - **Reconstructable results:** The response stays flat and normally
    ordered while carrying each descendant's immediate parent.
    
    ## Verification
    
    Ran 550 tests across the protocol, state, rollout, and thread-store
    crates, then reran the four focused state, store, and app-server
    descendant-listing tests after the final diff reduction. Scoped Clippy
    and formatting checks passed. Stable and experimental schema generation
    was checked; the stable fixtures remain unchanged while the experimental
    schema includes the new field.
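
    The descendant semantics can be illustrated with an in-memory analogue of
    the traversal. This is not the recursive SQLite query from the PR:
    `list_descendants` is a hypothetical helper, edges are given as
    `(child_id, parent_id)` pairs, and filtering, sorting, and cursor
    pagination are omitted. It assumes spawn edges form a tree with no cycles.

    ```rust
    use std::collections::{HashMap, VecDeque};

    /// Return strict descendants of `ancestor` at any depth as
    /// `(thread_id, immediate_parent_id)` pairs, one generation at a time.
    pub fn list_descendants(
        edges: &[(String, String)],
        ancestor: &str,
    ) -> Vec<(String, String)> {
        // Index children by parent, mirroring a lookup on the parent-edge index.
        let mut children: HashMap<&str, Vec<&str>> = HashMap::new();
        for (child, parent) in edges {
            children.entry(parent.as_str()).or_default().push(child.as_str());
        }

        let mut result = Vec::new();
        let mut queue = VecDeque::from([ancestor]);
        while let Some(parent) = queue.pop_front() {
            for &child in children.get(parent).into_iter().flatten() {
                // The ancestor itself is never emitted; each row keeps its
                // immediate parent so callers can rebuild the tree.
                result.push((child.to_string(), parent.to_string()));
                queue.push_back(child);
            }
        }
        result
    }
    ```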
  • Persist agent messages as response items (#29829)
    ## Why
    
    Inter-agent messages are recorded in live history as
    `ResponseItem::AgentMessage`, but rollouts stored
    `InterAgentCommunication` and rebuilt the response item during resume.
    This made the rollout differ from the actual Responses history.
    
    ## What changed
    
    - store the prepared `agent_message` response item directly
    - keep `trigger_turn` in a small local metadata record for fork
    truncation
    - keep reading older `inter_agent_communication` rollout items
  • Support thread-level originator overrides (#29477)
    ## Why
    
    Work (TPP) threads can be launched from the Desktop app, but if they all
    keep the Desktop app's default originator then downstream attribution
    cannot distinguish local Work launches from cloud-backed Work launches.
    `thread/start.serviceName` already carries that launch signal, while
    `SessionMeta.originator` is the durable thread-level value that survives
    resume and fork.
    
    This change converts the Desktop Work service names into an effective
    originator at thread creation time, persists that originator with the
    thread, and keeps using it for later model requests and memory writes.
    
    ## What changed
    
    - Map `CODEX_WORK_LOCAL` and `CODEX_WORK_CLOUD` service names to
    per-thread originators, while preserving
    `CODEX_INTERNAL_ORIGINATOR_OVERRIDE` as the highest-precedence override.
    - Persist the effective originator in `SessionMeta.originator`, read it
    back on resume/fork, and inherit the parent originator for subagent
    spawns when there is no persisted session metadata.
    - Handle truncated `SpawnAgentForkMode::LastNTurns` forks by falling
    back to the live parent originator when the forked history no longer
    includes `SessionMeta`.
    - Thread the per-thread originator through Responses headers,
    websocket/compaction request paths, thread-store creation, rollout
    metadata, and memory stage-one telemetry.
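
    The precedence order can be sketched as below. The function name and the
    mapped originator strings (`codex_work_local`, `codex_work_cloud`) are
    hypothetical placeholders; the PR description does not state the actual
    values.

    ```rust
    /// Pick the originator to persist at thread creation time.
    ///
    /// `override_value` models `CODEX_INTERNAL_ORIGINATOR_OVERRIDE`, which
    /// always wins. Otherwise a recognized Work service name is remapped, and
    /// anything else keeps the client's default originator.
    pub fn effective_originator(
        override_value: Option<&str>,
        service_name: Option<&str>,
        default_originator: &str,
    ) -> String {
        if let Some(value) = override_value {
            return value.to_string();
        }
        match service_name {
            // Placeholder originator values, not the real ones.
            Some("CODEX_WORK_LOCAL") => "codex_work_local".to_string(),
            Some("CODEX_WORK_CLOUD") => "codex_work_cloud".to_string(),
            _ => default_originator.to_string(),
        }
    }
    ```

    On resume or fork, the persisted `SessionMeta.originator` would be read
    back instead of recomputing this value.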
    
    ## Verification
    
    - `just test -p codex-core
    agent::control::tests::spawn_thread_subagent_inherits_parent_originator_without_fork
    agent::control::tests::spawn_thread_subagent_fork_last_n_turns_inherits_parent_originator_without_session_meta
    thread_manager::tests::originator_override_precedes_service_name_remapping`
    - `just test -p codex-core
    agent::control::tests::resume_thread_subagent_restores_stored_metadata_and_effective_multi_agent_mode`
    - `just test -p codex-memories-write`
    - `just fix -p codex-core -p codex-memories-write`
    - `git diff --check`
  • core: persist initial context window metadata (#29519)
    ## Why
    
    PR #29494 made context-window IDs visible to the model by wrapping the
    token-budget window payload in `<context_window>`, but rollout JSONL
    consumers still could not see the initial window identity by tailing the
    session file. Compacted rollout items carry window IDs only after
    compaction has happened, so a session with no compaction had no durable
    JSONL record for window 0.
    
    This change gives tailing consumers a stable initial-window record at
    session creation time.
    
    ## What Changed
    
    - Added `session_meta.context_window.window_id` for the initial
    context-window identity.
    - `CreateThreadParams` now requires `initial_window_id: String`, so
    thread-store callers cannot accidentally create new threads without
    window-0 metadata.
    - Live thread creation derives the persisted initial window ID from the
    same `AutoCompactWindowIds` used to initialize `SessionState`, keeping
    runtime state and JSONL metadata aligned.
    - Rollout reconstruction uses `session_meta.context_window.window_id` as
    the initial-window fallback and derives `window_number = 0`,
    `first_window_id = window_id`, and `previous_window_id = None`
    internally.
    - Fork reconstruction intentionally uses the same rollout reconstruction
    path; consumers that need to distinguish copied initial-window metadata
    can use the rollout `thread_id`.
    - Legacy compactions without `window_number` still use compaction-count
    fallback accounting instead of being reset to window 0 by the
    initial-window fallback.
    - Compacted rollout metadata still takes precedence once compaction
    records exist, preserving the richer chain fields there.
    
    ## JSONL Shape
    
    Real rollout JSONL is one object per line. This example is expanded for
    readability, but shows the new initial `session_meta.context_window`
    record followed by the existing compacted rollout item shape that also
    carries window IDs:
    
    ```jsonl
    {
      "timestamp": "2026-06-22T12:00:00.000Z",
      "type": "session_meta",
      "payload": {
        "session_id": "<THREAD_ID>",
        "id": "<THREAD_ID>",
        "timestamp": "2026-06-22T12:00:00.000Z",
        "cwd": "/repo",
        "originator": "codex",
        "cli_version": "0.0.0",
        "source": "cli",
        "model_provider": "<MODEL_PROVIDER>",
        "context_window": {
          "window_id": "<INITIAL_WINDOW_ID>"
        }
      }
    }
    ...
    {
      "timestamp": "2026-06-22T12:34:56.000Z",
      "type": "compacted",
      "payload": {
        "message": "<COMPACTION_SUMMARY>",
        "replacement_history": [
          "..."
        ],
        "window_number": 1,
        "first_window_id": "<INITIAL_WINDOW_ID>",
        "previous_window_id": "<INITIAL_WINDOW_ID>",
        "window_id": "<NEXT_WINDOW_ID>"
      }
    }
    ```
    
    The nested `context_window` object is intentional: it gives rollout
    consumers a stable namespace for context-window metadata while only
    writing the non-derivable initial `window_id`. For the initial window,
    `window_number`, `first_window_id`, and `previous_window_id` are derived
    internally instead of being written to the rollout.
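
    A minimal sketch of that derivation, assuming a simplified
    `WindowLineage` struct and a hypothetical `current_window` helper. Legacy
    compactions without `window_number` are not modeled here.

    ```rust
    /// Context-window lineage as reconstructed from a rollout.
    #[derive(Clone, Debug, PartialEq)]
    pub struct WindowLineage {
        pub window_number: u64,
        pub window_id: String,
        pub first_window_id: String,
        pub previous_window_id: Option<String>,
    }

    /// Resolve the current window from `session_meta.context_window.window_id`
    /// and the compacted records seen so far, in rollout order.
    pub fn current_window(
        initial_window_id: &str,
        compactions: &[WindowLineage],
    ) -> WindowLineage {
        match compactions.last() {
            // Compacted metadata takes precedence once it exists.
            Some(latest) => latest.clone(),
            // No compaction yet: derive the window-0 fields from the single
            // persisted ID.
            None => WindowLineage {
                window_number: 0,
                window_id: initial_window_id.to_string(),
                first_window_id: initial_window_id.to_string(),
                previous_window_id: None,
            },
        }
    }
    ```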
    
    ## Verification
    
    - `just test -p codex-protocol`
    - `just test -p codex-rollout
    recorder_materializes_on_flush_with_pending_items`
    - `just test -p codex-core reconstruct_history`
    - `just test -p codex-core
    record_initial_history_reconstructs_forked_transcript`
    - `just test -p codex-thread-store`
    - `just test -p codex-state`
    - `just test -p codex-app-server
    thread_read_returns_summary_without_turns`
    - `just test -p codex-rollout persistence_metrics`
  • Handle additional tools in rollout persistence metrics (#29669)
    ## Why
    
    The rollout persistence metrics added on current `main` exhaustively
    match `ResponseItem`, but omit `ResponseItem::AdditionalTools`. That
    prevents `codex-rollout` and downstream targets from compiling across
    Cargo and Bazel builds.
    
    ## What
    
    Map `ResponseItem::AdditionalTools` to the `response.additional_tools`
    metric label, consistent with the existing exact-variant labels.
    
    ## Validation
    
    - `just test -p codex-rollout` (76 passed)
    - `just fix -p codex-rollout`
  • [codex] Handle additional tools in rollout persistence metrics (#29672)
    ## Summary
    
    Handle `ResponseItem::AdditionalTools` in rollout persistence metrics.
    
    The persistence metrics match was added after the `AdditionalTools`
    variant and omitted it, causing release builds to fail with a
    non-exhaustive pattern error. This assigns the item the
    `response.additional_tools` metrics label.
    
    Release failure:
    https://github.com/openai/codex/actions/runs/28043786727/job/83016608475
    
    ## Validation
    
    - `just fmt`
    - `just test -p codex-rollout` (76 passed)
  • [codex] Instrument rollout persistence bytes (#29498)
    - Add 1%-sampled rollout persistence metrics that report per-item and
    per-thread JSON byte totals before and after filtering when metrics
    export is enabled.
    - Tag each item with its exact response or event variant, including
    nested turn-item kinds for conditionally persisted completion events, so
    aggregate cloud-storage impact can be estimated by policy choice.
  • Share resumed rollout history (#28426)
    ## Summary
    
    Resuming a persisted thread currently deep-clones its complete rollout
    history several times. `InitialHistory` is retained for the app-server
    response, copied into thread persistence, and copied again by read-only
    accessors. These copies scale with the complete rollout rather than the
    bounded model context and add measurable latency for large sessions.
    
    This change stores resumed rollout history in `Arc<Vec<RolloutItem>>`.
    Rollout loading wraps the parsed vector once, while app-server response
    construction, session initialization, and thread persistence share it
    through inexpensive `Arc` clones. Read-only history access now returns a
    borrowed slice, and fork paths use `Arc::unwrap_or_clone` where they
    genuinely need mutable ownership. Rollout reconstruction also consumes
    its temporary context instead of cloning the reconstructed model
    history.
    
    The serialized representation remains unchanged. In an artificial 123 MB
    rollout benchmark, sharing resumed history reduced cold resume latency
    by roughly 9–10%. The affected crates compile with their test targets,
    all 80 thread-store tests pass, and the Bazel dependency lock remains
    valid.
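
    The ownership pattern can be sketched as follows, assuming a placeholder
    `RolloutItem` and a hypothetical `ResumedHistory` wrapper.
    `Arc::unwrap_or_clone` requires Rust 1.76 or newer.

    ```rust
    use std::sync::Arc;

    /// Placeholder for a parsed rollout line.
    #[derive(Clone, Debug, PartialEq)]
    pub struct RolloutItem(pub String);

    /// Resumed history that is wrapped once and then shared.
    pub struct ResumedHistory {
        items: Arc<Vec<RolloutItem>>,
    }

    impl ResumedHistory {
        /// Wrap the parsed vector once at load time.
        pub fn new(parsed: Vec<RolloutItem>) -> Self {
            Self { items: Arc::new(parsed) }
        }

        /// Hand out another owner; this bumps a reference count and copies no items.
        pub fn share(&self) -> Arc<Vec<RolloutItem>> {
            Arc::clone(&self.items)
        }

        /// Read-only access as a borrowed slice.
        pub fn items(&self) -> &[RolloutItem] {
            &self.items
        }

        /// Take mutable ownership, as a fork would. The vector is moved out
        /// when this is the last owner and cloned only when it is still shared.
        pub fn into_owned(self) -> Vec<RolloutItem> {
            Arc::unwrap_or_clone(self.items)
        }
    }
    ```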
  • [codex] Use input items for Responses Lite tools (#27946)
    When using Responses Lite, we should use `additional_tools` and a
    developer item instead of the top-level tools array and instructions
    field. This keeps things 1-to-1.
    
    Forced namespacing for _all_ tools will land in a follow-up PR after
    some coordination and fixes in the Responses API (around collisions and
    return items).
    
    The goal is to eventually expand the scope of this to _all_ requests
    from Codex, but that will require broader coordination across providers
    and a slower rollout.
  • core: rename metadata -> internal_chat_message_metadata_passthrough (#28968)
    ## Description
    This PR cuts Codex over from generic `ResponseItem.metadata` (introduced
    here: https://github.com/openai/codex/pull/28355) to
    `ResponseItem.internal_chat_message_metadata_passthrough`, which is the
    blessed path and has strongly-typed keys.
    
    For now we have to drop this MAv2 usage of `metadata`:
    https://github.com/openai/codex/pull/28561 until we figure out where
    that should live.
  • Persist session IDs across thread resume (#29327)
    ## Summary
    
    A cold-resumed subagent kept its durable thread ID but could receive a
    new session ID, splitting one agent tree across multiple sessions after
    a restart.
    
    Persist the root session ID in every rollout `SessionMeta`, carry it
    through thread creation, and restore it before initializing the resumed
    `Session` and `AgentControl`.
    
    ## Behavior
    
    For a nested agent tree:
    
    ```text
    root session R
      parent thread P
        child thread C
    ```
    
    The child rollout stores:
    
    ```text
    session_id:       R
    parent_thread_id: P
    id:               C
    ```
    
    After a cold resume, the child still belongs to root session `R` while
    its immediate parent remains `P`. The integration coverage uses distinct
    values for all three IDs so it catches restoring the session from
    `parent_thread_id`.
    
    ## Legacy rollouts
    
    Previous rollouts have `id` but no `session_id`. `SessionMetaLine`
    deserialization treats a missing `session_id` as `id`, keeping those
    files readable, listable, and resumable. When a legacy subagent is
    resumed through its root, that synthesized child ID no longer overrides
    the inherited root-scoped `AgentControl`. New rollouts always persist
    the explicit root session ID.
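
    The legacy fallback can be sketched as below, assuming a simplified
    `SessionMetaLine` with only the three IDs and a hypothetical
    `effective_session_id` accessor; real deserialization is omitted.

    ```rust
    /// Simplified view of the IDs stored in a rollout's session metadata.
    pub struct SessionMetaLine {
        pub id: String,
        /// Absent in rollouts written before this change.
        pub session_id: Option<String>,
        pub parent_thread_id: Option<String>,
    }

    impl SessionMetaLine {
        /// A missing `session_id` is treated as the thread's own `id`.
        /// `parent_thread_id` is never used as the session.
        pub fn effective_session_id(&self) -> &str {
            self.session_id.as_deref().unwrap_or(&self.id)
        }
    }
    ```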
  • Propagate safety buffering events to app-server clients (#29371)
    Responses API safety buffering metadata currently stops at the transport
    boundary, so app-server clients cannot render the in-progress safety
    review state.
    
    This change:
    - decodes and deduplicates `safety_buffering` metadata from Responses
    API SSE and WebSocket events without suppressing the original response
    event
    - emits a typed core event containing the requested model plus backend
    use cases and reasons
    - forwards that event as `turn/safetyBuffering/updated` through
    app-server v2 and updates generated protocol schemas
    - keeps the side-channel event out of persisted rollouts and turn timing
    
    This supports the Codex Apps buffering UX and depends on the Responses
    API backend work in https://github.com/openai/openai/pull/1044569 and
    https://github.com/openai/openai/pull/1044571.
    
    Validation:
    - focused `codex-core` safety-buffering integration test passes
    - `cargo check -p codex-core -p codex-app-server -p
    codex-app-server-protocol`
    - `just fix -p codex-api -p codex-protocol -p codex-core -p
    codex-app-server-protocol -p codex-app-server -p codex-rollout -p
    codex-rollout-trace -p codex-otel`
    - `just fmt`
    - broad package test run: 4,430/4,492 passed; 62 unrelated
    local-environment/concurrency failures involved unavailable test
    binaries, MCP subprocess setup, and app-server timeouts
  • core: add context window lineage IDs (#29256)
    ## Why
    
    The rendered `<token_budget>` fragment identifies the thread and current
    context window, but it does not expose enough lineage to identify the
    first window in the thread or the immediately preceding window. Those
    IDs also need to remain stable across compaction, resume, and rollback.
    
    ## What changed
    
    - Track first, previous, and current UUIDv7 context-window IDs in
    auto-compaction state.
    - Render `thread_id`, `first_window_id`, `previous_window_id`, and the
    current window ID in the full `<token_budget>` fragment.
    - Persist the first and previous window IDs in compacted rollout
    checkpoints and restore them during rollout reconstruction.
    - Preserve compatibility with older compacted records that do not
    contain the new optional fields.
    - Update focused state, rendering, reconstruction, rollback, and
    serialization coverage.
    
    ## Validation
    
    - `just test -p codex-core token_budget`
    - `just test -p codex-protocol compacted_item::tests`
    - `just test -p codex-core tracks_prefill_and_window_boundaries`
    - `just test -p codex-core
    reconstruct_history_uses_replacement_history_verbatim`
    - `just test -p codex-core
    thread_rollback_restores_cleared_reference_context_item_after_compaction`
  • Add per-turn multi-agent mode (#28685)
    ## Why
    
    Multi-agent v2 currently carries an explicit-request-only delegation
    rule in its static usage hint. That provides a safe default, but it
    prevents clients from selecting proactive delegation per turn without
    changing static guidance or rewriting prior model context.
    
    This change makes delegation mode a session selection that can be
    updated through `turn/start`, while deriving the effective model-visible
    mode separately for each turn. Eligible multi-agent v2 turns remain
    explicit-request-only unless proactive mode is both selected and
    enabled.
    
    ## What changed
    
    - Add the experimental `turn/start.multiAgentMode` parameter with
    `explicitRequestOnly` and `proactive` values. Omission retains the
    loaded session's current optional selection.
    - Add the default-off `features.multi_agent_mode` feature gate. Eligible
    multi-agent v2 turns use the selected mode when enabled; an unset
    selection or disabled gate resolves to `explicitRequestOnly`.
    - Treat mode prompting as inapplicable for multi-agent v1 and other
    unsupported session configurations, producing no multi-agent mode
    developer message rather than rejecting the turn.
    - Move the explicit-request-only rule out of the static v2 usage hint
    and into a bounded, tagged developer context fragment.
    - Emit the effective mode in initial context and only when that
    effective mode changes on later turns.
    - Persist the effective mode in `TurnContextItem` as the durable
    baseline for resume and context-update comparisons.
    
    Historical rollout items are not rewritten. Later mode developer
    messages establish the current rule incrementally.
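
    The resolution rules above can be sketched as follows. `effective_mode`
    and `mode_to_emit` are hypothetical helper names, and eligibility is
    reduced to a single boolean.

    ```rust
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    pub enum MultiAgentMode {
        ExplicitRequestOnly,
        Proactive,
    }

    /// Derive the model-visible mode for one turn.
    /// Returns `None` when mode prompting does not apply (for example v1).
    pub fn effective_mode(
        eligible_v2: bool,
        gate_enabled: bool,
        selected: Option<MultiAgentMode>,
    ) -> Option<MultiAgentMode> {
        if !eligible_v2 {
            return None;
        }
        if gate_enabled {
            // An unset selection resolves to the safe default.
            Some(selected.unwrap_or(MultiAgentMode::ExplicitRequestOnly))
        } else {
            Some(MultiAgentMode::ExplicitRequestOnly)
        }
    }

    /// Decide whether a mode developer message is needed this turn.
    /// `previous` is the last persisted effective mode, or `None` for initial
    /// context; a message is produced only when the effective mode changed.
    pub fn mode_to_emit(
        previous: Option<MultiAgentMode>,
        current: Option<MultiAgentMode>,
    ) -> Option<MultiAgentMode> {
        if current != previous {
            current
        } else {
            None
        }
    }
    ```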
    
    ## Not covered
    
    - Initial selection through `thread/start` and selected-mode reporting
    from thread lifecycle/settings APIs; those are isolated in the stacked
    #28792.
    - A TUI control or slash command for selecting the mode.
    - Persisting a preferred mode to `config.toml`; selection remains
    session/turn scoped.
    - Changes to multi-agent concurrency limits, tool availability, or model
    catalog capability declarations.
    - Rewriting historical rollout prompt items. Cold resume restores the
    latest persisted effective mode when available while leaving historical
    developer messages intact.
    
    ## Verification
    
    - `CARGO_INCREMENTAL=0 just test -p codex-core multi_agent_mode`
    - Focused app-server coverage verifies that `turn/start.multiAgentMode`
    produces proactive developer instructions for an eligible v2 turn.
    
    ## Stack
    
    Followed by #28792, which adds `thread/start` initialization and
    lifecycle/settings observability.
  • core: add UUIDv7 context window IDs (#28953)
    ## Why
    
    The token-budget context currently identifies a context window by its
    thread-local sequence number. A UUIDv7 gives the model a stable opaque
    identity that remains fixed for a window and rotates when compaction or
    `new_context` starts the next one.
    
    ## What changed
    
    - Preserve the existing monotonic value as `window_number` and add a
    UUIDv7 `window_id` to `CompactedItem`.
    - Generate and rotate the UUID with auto-compaction window state,
    persist it alongside the number, and reconstruct it on resume and
    rollback.
    - Accept legacy compacted rollout records where the numeric `window_id`
    represented the window number.
    - Use the UUID only in token-budget context; existing request headers
    and metadata continue using `thread_id:window_number`.
    
    ## Testing
    
    - `just test -p codex-protocol compacted_item::tests`
    - `just test -p codex-core token_budget`
  • [codex] Restore thread recency with compatible migration history (#28671)
    ## Summary
    
    - Revert #28655, restoring the thread `recencyAt` behavior introduced by
    #27910.
    - Move `threads_recency_at` to migration 0039 so it no longer collides
    with `external_agent_config_imports` at version 0038.
    - Repair databases that already applied the recency migration as version
    38 by moving the matching migration-history row to version 39 before
    SQLx validation. The current version-38 migration can then apply
    normally.
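
    The repair step can be pictured with an in-memory stand-in for the migration-history table. This is a sketch under stated assumptions: `MigrationRow`, `repair_recency_row`, and matching on a description string are hypothetical, and the real repair runs against the database before SQLx validation.

    ```rust
    /// Simplified stand-in for one migration-history row (hypothetical shape).
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub struct MigrationRow {
        pub version: i64,
        pub description: String,
    }

    /// Hypothetical helper: if the recency migration was recorded as version 38,
    /// move that row to version 39. Returns true when a row was moved.
    pub fn repair_recency_row(rows: &mut [MigrationRow], recency_description: &str) -> bool {
        // Nothing to repair if version 39 is already recorded.
        if rows.iter().any(|row| row.version == 39) {
            return false;
        }
        match rows
            .iter_mut()
            .find(|row| row.version == 38 && row.description == recency_description)
        {
            Some(row) => {
                row.version = 39;
                true
            }
            // Version 38 is absent or belongs to a different migration.
            None => false,
        }
    }
    ```

    After the row moves to version 39, version 38 is free again, which is why the current version-38 migration can then apply normally.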
    
    ## Validation
    
    - `just test -p codex-state
    migrations::tests::repairs_recency_migration_that_was_applied_as_version_38`
    - `just test -p codex-state -p codex-rollout -p codex-thread-store -p
    codex-app-server-protocol -p codex-tui`: 3,439 passed; six TUI tests
    could not open the machine's existing read-only incident database at
    `~/.codex/sqlite/state_5.sqlite`.
    - `just fix -p codex-state`
    - `just fmt`
    - Verified that state migration versions are unique.
  • Revert thread recencyAt for sidebar ordering (#28655)
    ## Why
    
    Revert #27910 to remove the newly introduced thread `recencyAt`
    persistence and API behavior from `main`.
    
    ## What changed
    
    This reverts commit `fac3158c2a783095768076489815f361fa9b0db4`,
    including the state migration, thread-store propagation, app-server API
    surface, generated schemas, and related tests.
    
    ## Validation
    
    Not run before opening; relying on CI for the initial fast signal.
  • [codex] core: restore absolute turn context cwd (#28629)
    ## Why
    
    #28152 jumped the gun on moving the rollout format to store URIs. It
    would likely break compatibility with features that do not go through
    the same types as the core logic.
    
    ## What
    
    Make `TurnContextItem.cwd` an `AbsolutePathBuf` again and remove the test
    added for `PathUri` serialization in rollouts. Also drop the error paths
    that are no longer needed.
  • Add thread recencyAt for sidebar ordering (#27910)
    ## Summary
    
    Add a server-owned `recencyAt` timestamp and `recency_at` thread-list
    sort key for product recency ordering while preserving the existing
    meaning of `updatedAt` as the latest persisted thread mutation.
    
    This is the server-side alternative to #27697. Rather than narrowing
    `updatedAt`, clients can sort the sidebar by `recency_at` and continue
    treating `updatedAt` as mutation time.
    
    Paired Codex Apps PR:
    [openai/openai#1024599](https://github.com/openai/openai/pull/1024599)
    
    ## Contract
    
    - `recencyAt` initializes when a thread is created.
    - A turn start advances `recencyAt` monotonically.
    - Commentary, agent output, tool results, token/accounting updates, turn
    completion, archive, unarchive, resume, and generic metadata writes do
    not advance it.
    - `updatedAt` retains its existing behavior and continues to advance for
    persisted thread mutations.
    - Current servers populate `recencyAt`; the response field is optional
    in generated TypeScript so clients connected to older servers can fall
    back to `updatedAt`.
    - Filesystem-only fallback uses existing updated/mtime ordering when
    SQLite is unavailable.
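
    A minimal sketch of the contract, assuming integer timestamps. `ThreadTimes`, `touch_recency_on_turn_start`, and `sidebar_sort_key` are hypothetical names used only for illustration.

    ```rust
    /// Hypothetical client-side view of the two timestamps.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub struct ThreadTimes {
        pub updated_at: i64,
        /// Absent when the client is connected to an older server.
        pub recency_at: Option<i64>,
    }

    /// Monotonic touch applied on turn start: recency never moves backwards,
    /// even if the clock reports an earlier time.
    pub fn touch_recency_on_turn_start(current_recency: i64, now: i64) -> i64 {
        current_recency.max(now)
    }

    /// Sort key a client might use for the sidebar, falling back to
    /// `updated_at` when the server did not populate `recency_at`.
    pub fn sidebar_sort_key(times: &ThreadTimes) -> i64 {
        times.recency_at.unwrap_or(times.updated_at)
    }
    ```

    Events that do not advance recency (agent output, archive, resume, and so on) would simply never call the touch helper in this model.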
    
    ## Persistence and compatibility
    
    Migration 0038 adds second- and millisecond-precision recency columns,
    backfills them from the existing updated timestamp, creates list
    indexes, and includes an insert trigger so older binaries writing to a
    migrated database seed recency without causing later mutations to
    advance it.
    
    Generic metadata upserts preserve existing recency values. Turn-start
    updates use a dedicated monotonic touch, and process-local allocation
    keeps millisecond cursor values unique. State DB list, search, read,
    filtered-list repair, rollout fallback propagation, and app-server
    conversions all carry the new field.
    
    ## API
    
    `Thread` responses include:
    
    ```ts
    recencyAt?: number
    ```
    
    `thread/list` and `thread/search` accept:
    
    ```json
    { "sortKey": "recency_at" }
    ```
    
    Generated TypeScript and JSON schemas are included.
    
    ## Validation
    
    - `just test -p codex-state` — 146 passed
    - `just test -p codex-rollout` — 69 passed
    - `just test -p codex-thread-store` — 81 passed
    - `just test -p codex-app-server-protocol` — 231 passed
    - Focused app-server list ordering, response mapping, archive/unarchive,
    and resume lifecycle tests passed
    - Scoped `just fix` for state, rollout, thread-store,
    app-server-protocol, and app-server
    - `just fmt`
    - `git diff --check`
    - Independent correctness, simplicity, elegance, security, and
    test-quality reviews; actionable ordering, lifecycle, query-projection,
    and timestamp-uniqueness findings were addressed
  • core: render remote environment cwd natively (#28152)
    ## Why
    
    Model-visible `<environment_context>` should match the environment of
    the executor, not of the app server.
    
    Stacked on #28146.
    
    ## What
    
    - Keep selected environment cwd values as `PathUri` while building
    environment context.
    - Render cwd text using the path convention represented by the URI, with
    the canonical URI as a fallback.
    - Preserve compatibility with legacy `TurnContextItem.cwd` values when
    reconstructing and diffing context.
    - Extend the Wine-backed remote Windows test to assert that the model
    sees `powershell` and `C:\windows`.
  • [codex] Compress cold active rollouts (#28338)
    ## Why
    
    The local rollout compression worker currently scans only
    `archived_sessions`, so cold unarchived thread history remains expanded
    indefinitely.
    
    ## What changed
    
    - Scan `sessions` after `archived_sessions` within the existing worker
    runtime budget.
    - Update rollout compression coverage to require both cold active and
    archived rollouts to be compressed while fresh active rollouts remain
    plain.
    
    The worker remains behind the disabled-by-default
    `local_thread_store_compression` feature, and the existing seven-day
    cold-file threshold is unchanged.
    
    ## Validation
    
    - `just test -p codex-rollout` (69 passed)
    - `just fmt`
    - `git diff --check`
  • [codex] Add interruptible sleep tool (#28429)
    ## Why
    
    Models sometimes need to pause briefly while waiting for external work,
    but using a shell command for that delay ties the wait to a process and
    does not naturally resume when new turn input arrives.
    
    ## What changed
    
    - add a built-in `sleep` tool behind the under-development `sleep_tool`
    feature
    - accept a bounded `duration_ms` argument, matching the millisecond
    convention used by unified exec
    - end the sleep early when either steered user input or mailbox input
    arrives
    - include elapsed wall-clock time in completed and interrupted outputs
    - emit a dedicated core `SleepItem` through `item/started` and
    `item/completed`
    - expose the sleep item as app-server v2 `ThreadItem::Sleep` and retain
    it in reconstructed thread history
    - regenerate the configuration schema for the new feature flag
    - regenerate app-server JSON and TypeScript schema fixtures
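
    The interruption behavior can be illustrated with a synchronous sketch built on a standard-library channel. The real tool lives in an async runtime, so this is an analogy rather than the implementation. `interruptible_sleep`, `SleepOutcome`, and `max_ms` are hypothetical, and a single channel stands in for both steered user input and mailbox input.

    ```rust
    use std::sync::mpsc::{Receiver, RecvTimeoutError};
    use std::time::{Duration, Instant};

    /// Hypothetical result type carrying elapsed wall-clock time.
    #[derive(Debug)]
    pub enum SleepOutcome {
        Completed { elapsed: Duration },
        Interrupted { elapsed: Duration },
    }

    /// Sleep for at most `duration_ms` (clamped to `max_ms`), ending early
    /// when any message arrives on `input`.
    pub fn interruptible_sleep(duration_ms: u64, max_ms: u64, input: &Receiver<()>) -> SleepOutcome {
        let bounded = Duration::from_millis(duration_ms.min(max_ms));
        let started = Instant::now();
        match input.recv_timeout(bounded) {
            // New input arrived before the deadline.
            Ok(()) => SleepOutcome::Interrupted { elapsed: started.elapsed() },
            // Deadline reached with no input.
            Err(RecvTimeoutError::Timeout) => SleepOutcome::Completed { elapsed: started.elapsed() },
            // No sender remains, so nothing can interrupt: finish the wait.
            Err(RecvTimeoutError::Disconnected) => {
                std::thread::sleep(bounded.saturating_sub(started.elapsed()));
                SleepOutcome::Completed { elapsed: started.elapsed() }
            }
        }
    }
    ```

    Unlike a shell `sleep`, the wait here is not tied to a child process, and both outcomes report how long the call actually blocked.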
    
    ## Test plan
    
    - `just test -p codex-core sleep_tool_follows_feature_gate`
    - `just test -p codex-core any_new_input_interrupts_sleep`
    - `just test -p codex-app-server-protocol`
    - `just test -p codex-app-server
    sleep_emits_started_and_completed_items`
  • feat(core): add metadata field to ResponseItem (#28355)
    ## Description
    
    This PR adds an optional `metadata` field to `ResponseItem` for
    Responses API calls. It is mechanical plumbing only: no values are
    populated or sent yet. Even so, adding a new field to `ResponseItem`
    has a large blast radius.
    
    This change is backwards compatible because `metadata` is optional and
    omitted when absent, so existing response items and rollout history
    without it still deserialize and requests that do not set it keep the
    same wire shape. For provider compatibility, we strip out `metadata`
    before non-OpenAI Responses requests so Azure and AWS Bedrock never see
    this field.
    
    My followup PR here will actually make use of it to start storing and
    passing along `turn_id`: https://github.com/openai/codex/pull/28360
    
    ## What changed
    
    - Added `ResponseItemMetadata` with optional `turn_id`, plus optional
    `metadata` on Responses API item variants and inter-agent communication.
    - Preserved item metadata through response-item rewrites such as
    truncation, missing tool-output synthesis, compaction history
    rebuilding, visible-history conversion, rollout/resume, and generated
    app-server schemas/types.
    - Stripped item metadata from non-OpenAI Responses requests while
    preserving it for OpenAI-shaped requests.
    - Updated the mechanical fixture/test construction churn required by the
    new optional field.
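
    The provider-compatibility rule can be sketched as follows. `Item`, `ItemMetadata`, and `prepare_items` are simplified, hypothetical stand-ins for the real `ResponseItem` types, and the boolean provider check is an assumption.

    ```rust
    /// Simplified stand-in for `ResponseItemMetadata`.
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub struct ItemMetadata {
        pub turn_id: Option<String>,
    }

    /// Simplified stand-in for a response item with optional metadata.
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub struct Item {
        pub text: String,
        pub metadata: Option<ItemMetadata>,
    }

    /// Hypothetical helper: keep metadata for OpenAI-shaped requests and
    /// strip it for every other provider before the request is built.
    pub fn prepare_items(items: Vec<Item>, provider_is_openai: bool) -> Vec<Item> {
        if provider_is_openai {
            return items;
        }
        items
            .into_iter()
            .map(|mut item| {
                item.metadata = None;
                item
            })
            .collect()
    }
    ```

    Because the field is optional, an item with `metadata: None` behaves exactly like an item recorded before this change.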
  • feat(app-server): filter threads by parent (#26662)
    ## Why
    
    Clients that display or coordinate spawned subagents need an
    authoritative snapshot of a thread's immediate spawned children when
    they connect to app-server or recover after missing live events.
    `thread/list` cannot query by parent, so clients must otherwise scan
    unrelated threads or reconstruct relationships from rollout history and
    transient events.
    
    The direct spawn relationship already exists in persisted
    `thread_spawn_edges` state. Review and Guardian threads do not
    participate in that lifecycle and are intentionally outside this
    filter's scope.
    
    ## What changed
    
    This adds an experimental `parentThreadId` filter to `thread/list`.
    Parent-filtered requests return direct spawned children from persisted
    state while preserving the existing response shape, explicit filters,
    sorting, and timestamp-only cursor behavior. The lookup does not read
    rollout transcripts or recursively return descendants.
    
    Supersedes #25112 with the narrower `thread/list` filter approach.
    
    ## How it works
    
    1. An experimental client passes a valid thread ID as `parentThreadId`.
    2. App-server routes the list through the existing thread-store and
    state-database boundaries.
    3. SQLite selects threads whose IDs have a direct persisted spawn edge
    from that parent.
    4. Omitted provider and source filters include all values; explicit
    filters keep ordinary `thread/list` semantics.
    5. Grandchildren, Review threads, and Guardian threads are excluded.
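
    The direct-children rule in steps 3 and 5 can be sketched over an in-memory list of edges. `SpawnEdge` and `direct_children` are hypothetical. The real lookup is a SQLite query over the persisted spawn-edge state.

    ```rust
    /// Hypothetical in-memory form of one persisted spawn edge.
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub struct SpawnEdge {
        pub parent: String,
        pub child: String,
    }

    /// Return only threads with a direct edge from `parent`.
    /// Descendants further down the tree are not followed.
    pub fn direct_children<'a>(edges: &'a [SpawnEdge], parent: &str) -> Vec<&'a str> {
        edges
            .iter()
            .filter(|edge| edge.parent == parent)
            .map(|edge| edge.child.as_str())
            .collect()
    }
    ```

    Review and Guardian threads never appear in the result in this model because they have no spawn edge to begin with.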
    
    ## Verification
    
    State (144 tests), rollout (69 tests), and focused app-server
    thread-list (31 tests) suites passed. Scoped Clippy checks and
    repository formatting also passed. Coverage includes direct spawned
    children, omitted grandchildren, pagination, malformed IDs, mixed source
    kinds, explicit filters, and operation without rollout files.
  • Support plaintext agent messages (#27830)
    ## Why
    
    Multi-agent v2 `send_message` deliveries already reach the receiving
    model as typed `agent_message` items with encrypted content.
    Child-completion notifications are generated by Codex itself, so their
    content is plaintext and previously fell back to a serialized JSON
    envelope inside an assistant message.
    
    With plaintext `input_text` supported for `agent_message`, both delivery
    paths can use the same model-visible type while preserving explicit
    author and recipient metadata.
    
    ## What changed
    
    - add plaintext `input_text` support to `AgentMessageInputContent` and
    regenerate the affected app-server schemas
    - preserve `InterAgentCommunication` as structured mailbox input instead
    of converting it to assistant text
    - record delivered communications as typed `agent_message` history items
    - persist a dedicated rollout item so local delivery metadata such as
    `trigger_turn` remains available without leaking into the Responses
    request
    - reconstruct typed agent messages on resume and preserve fork-turn
    truncation behavior
    - remove request-time assistant-content parsing
    - preserve plaintext and encrypted inter-agent deliveries in stage-one
    memory inputs
    - normalize and link plaintext and encrypted agent messages in rollout
    traces without treating inbound messages as child results
    - cover the real MultiAgent V2 child-completion path end to end with
    deterministic mailbox synchronization
    
    ## Verification
    
    - `just test -p codex-core
    plaintext_multi_agent_v2_completion_sends_agent_message`
    - `just test -p codex-core input_queue_drains_mailbox_in_delivery_order
    record_initial_history_reconstructs_typed_inter_agent_message
    fork_turn_positions_use_inter_agent_delivery_metadata`
    - `just test -p codex-memories-write
    serializes_inter_agent_communications_for_memory`
    - `just test -p codex-rollout-trace
    agent_messages_preserve_routing_and_content
    sub_agent_started_activity_creates_spawn_edge`
    - `just test -p codex-rollout-trace
    agent_result_edge_falls_back_to_child_thread_without_result_message`
    - `just test -p codex-protocol -p codex-rollout -p
    codex-app-server-protocol`
  • [codex] Remove async_trait from first-party code (#27475)
    ## Why
    
    First-party async traits should expose their `Send` contracts explicitly
    without requiring `async_trait`. This completes the migration pattern
    established in #27303 and #27304.
    
    ## What changed
    
    - Replaced the remaining first-party `async_trait` traits with native
    return-position `impl Future + Send` where statically dispatched and
    explicit boxed `Send` futures where object safety is required.
    - Kept implementations behavior-preserving, outlining existing async
    bodies into inherent methods where that keeps the diff reviewable.
    - Removed all direct first-party `async-trait` dependencies and the
    workspace dependency declaration.
    - Added a cargo-deny policy that permits `async-trait` only through the
    remaining transitive wrapper crates.
    - Updated `rand` from 0.8.5 to 0.8.6 to resolve RUSTSEC-2026-0097 and
    keep the full cargo-deny check passing.
    
    ## Validation
    
    - `just test -p codex-exec-server`: 216 passed, 2 skipped.
    - `just test -p codex-model-provider`: 39 passed.
    - `just test -p codex-core` and `just test`: changed tests passed;
    remaining failures are environment-sensitive suites unrelated to this
    migration.
    - `cargo deny check`
    - `just fix`
    - `just fmt`
    - `cargo shear`
    - `just bazel-lock-check`
  • [codex] Compact when comp_hash changes (#27520)
    ## Summary
    - snapshot `comp_hash` into `TurnContext` when the turn is created and
    use that snapshot as the downstream source of truth
    - persist the turn hash in rollout context and recover it into
    previous-turn settings during resume and fork replay
    - compact existing history with the previous model only when both
    adjacent turns provide hashes and the values differ
    - record `comp_hash_changed` as the compaction reason
    - cover ordinary transitions, resume, and missing-hash compatibility
    with end-to-end tests
    
    ## Why
    History produced under one compaction-compatible model configuration may
    not be safe to carry directly into another. Compacting at the turn
    boundary converts that history before context updates and the new user
    message are added. Persisting the turn snapshot in `TurnContextItem`
    makes the same protection work after resuming a rollout.
    
    A missing hash is not treated as evidence of incompatibility. `None →
    Some`, `Some → None`, and `None → None` do not trigger compaction; only
    `Some(previous) → Some(current)` with unequal values does.
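
    The four cases reduce to a single pattern match. The function name and the use of `&str` for the hash are assumptions made for illustration.

    ```rust
    /// Hypothetical helper: compact only when both adjacent turns carry a
    /// hash and the two values differ. A missing hash on either side is not
    /// treated as evidence of incompatibility.
    pub fn should_compact_for_comp_hash(previous: Option<&str>, current: Option<&str>) -> bool {
        matches!((previous, current), (Some(prev), Some(cur)) if prev != cur)
    }
    ```

    Treating `None` as "unknown" rather than "different" keeps rollouts recorded before this change from triggering a compaction on their first resumed turn.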
    
    ## Stack
    - depends on #27532
    - #27532 is based directly on `main`
    
    ## Testing
    - `just test -p codex-core pre_sampling_compact_` — 6 passed
    - `just test -p codex-core
    turn_context_item_uses_turn_context_comp_hash_snapshot` — passed
    - `just fix -p codex-core -p codex-protocol -p codex-analytics -p
    codex-models-manager`
  • fix: Auto-recover from corrupted sqlite databases (#26859)
    Further investigation of the SQLite incidents showed that the problems
    are due to corruption from the older version of SQLite that we recently
    upgraded from, and that the data in the root database is truly
    corrupted, so recovery of all data is not possible. Because the data is
    reconstructable from the rollouts on disk, we should back up the
    database automatically and let Codex rebuild the rollout info from those
    rollouts.
    
    With this change, app-server backs up the corrupt database and rebuilds
    it automatically, with logs reflecting that behavior. The CLI now shows
    a message reporting that this happened, along with the paths of the
    backed-up corrupt database and the new database. Rebuild context is also
    added so that the desktop app can read it and inform the user.
  • Add app-server thread/delete API (#25018)
    ## Why
    
    Clients can archive and unarchive threads today, but there is no
    app-server API for permanently removing a thread. Deletion also needs to
    cover the full session tree: deleting a main thread should remove
    spawned subagent threads and the related local metadata instead of
    leaving orphaned rollout files, goals, or subagent state behind.
    
    ## What
    
    - Adds the v2 `thread/delete` request and `thread/deleted` notification,
    with the response shape kept consistent with `thread/archive`.
    - Implements local hard delete for active and archived rollout files.
    - Deletes the requested thread's state DB row as the commit point, then
    best-effort cleans associated state including spawned descendants,
    goals, spawn edges, logs, dynamic tools, and agent job assignments.
    - Updates app-server API docs and generated protocol schema/TypeScript
    fixtures.
  • Fix compressed rollout search path matching (#27407)
    ## Why
    
    `thread/search` found content inside compressed rollouts but could drop
    the result when joining it with SQLite-backed thread metadata. Search
    returned the physical `.jsonl.zst` path while SQLite retained the
    logical `.jsonl` path, so exact path matching failed.
    
    ## What changed
    
    - Key rollout search matches by their canonical logical `.jsonl` path,
    independent of the on-disk representation.
    - Canonicalize thread-list paths before joining them with content-search
    matches.
    - Update compressed-rollout coverage to assert the logical-path
    contract.
    
    ## Validation
    
    - Ran `just fmt`.
    - Ran `git diff --check`.
    - Tests and Clippy were intentionally left to CI.
  • Reduce archive rollout lookup CPU (#27276)
    ## Why
    
    Archiving a thread can spike app-server CPU when the state DB does not
    have a usable rollout path. The archive path falls back to locating the
    rollout by thread id; because rollout filenames already contain the
    UUID, the cheap fallback should find the file directly before invoking
    broader file search.
    
    ## What Changed
    
    - In `codex-rs/rollout/src/list.rs`, try the exact rollout filename
    lookup before `codex-file-search`.
    - Keep fuzzy search as the final legacy fallback when no filename match
    is found.
    - Preserve the legacy fallback when the filename scan hits a traversal
    error, so an inaccessible stale subtree does not block lookup elsewhere.
    
    ## Verification
    
    - `just test -p codex-rollout`
    - `just test -p codex-thread-store`
    - `just test -p codex-app-server thread_archive`
  • [codex] Store compact window id in rollout (#27264)
    ## Why
    
    Compaction window identity is part of session history, not model-client
    transport state. Persisting it with the compacted rollout item lets
    resumed threads continue from the reconstructed window without keeping
    mutable window state on `ModelClient`.
    
    ## What changed
    
    - Added `window_id` to `CompactedItem` and stamped it when
    `replace_compacted_history` installs compacted history.
    - Moved auto-compact window id ownership into `AutoCompactWindow` /
    `SessionState`; `ModelClient` now receives the request window id from
    callers instead of storing it.
    - Returned `window_id` from rollout reconstruction for resume.
    Reconstruction uses the newest surviving compacted item's stored
    `window_id` when present, and falls back to the legacy compacted-item
    count when it is absent.
    - Kept fork startup at the fresh default window id and updated direct
    model-client tests to pass explicit test window ids.
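
    The reconstruction fallback can be sketched as below. `CompactedRecord`, the `u64` id type, and the oldest-first slice order are assumptions for illustration. Only the "stored id first, legacy count otherwise" rule comes from the description.

    ```rust
    /// Simplified stand-in for a compacted rollout item.
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub struct CompactedRecord {
        /// Absent on records written before this change.
        pub window_id: Option<u64>,
    }

    /// Hypothetical helper: `surviving` holds the compacted items that remain
    /// after rollback handling, oldest first.
    pub fn reconstruct_window_id(surviving: &[CompactedRecord]) -> u64 {
        surviving
            .last()
            // Prefer the id stored on the newest surviving item.
            .and_then(|record| record.window_id)
            // Legacy fallback: the number of surviving compacted items.
            .unwrap_or(surviving.len() as u64)
    }
    ```

    With no compacted items at all, the legacy count is zero in this sketch. The actual fresh default window id is not stated in the description.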
    
    ## Validation
    
    - `cargo check -p codex-core --tests`
  • multi-agent: add path-based v2 activity tracking (#27007)
    ## Why
    
    Multi-agent v2 identifies agents by canonical paths, but its tool
    handlers still emitted the larger legacy collaboration begin/end events
    built around nickname and role metadata. App-server, rollout-trace,
    analytics, and TUI consumers therefore lacked one compact path-based
    completion signal that behaved consistently across live events and
    replay.
    
    The TUI also needs a bounded `/agent` status surface for v2 agents. It
    should use recent local activity for previews, refresh liveness without
    loading full histories, and keep the legacy picker available when no
    path-backed v2 agent is known.
    
    ## What changed
    
    - Replace the v2 `spawn_agent`, `send_message`, `followup_task`, and
    `interrupt_agent` legacy lifecycle emissions with a success-only
    `SubAgentActivity` event. The event records the tool call ID, occurrence
    time, affected thread, canonical agent path, and `started`,
    `interacted`, or `interrupted` kind.
    - Expose the activity as a completion-only app-server v2
    `subAgentActivity` thread item in live notifications and reconstructed
    history, regenerate the protocol schemas, and count it in sub-agent tool
    analytics.
    - Track canonical paths from live activity and loaded-thread metadata in
    the TUI, and render the activity in live and replayed transcripts.
    - Make `/agent` list running path-backed agents with summaries from
    bounded local event buffers. Each summary is capped at 240 graphemes,
    the scan is capped at six recent items, only the last three wrapped
    lines are shown, and command output is omitted. Liveness falls back to
    metadata-only `thread/read` when local turn state is unavailable.
    - Persist the activity as a terminal rollout-trace runtime payload and
    reduce it to the corresponding spawn, send, follow-up, or close
    interaction edge. `interrupt_agent` is classified as a close-edge
    operation.
    - Preserve the legacy picker when no path-backed v2 agent is known.
    
    ## Compatibility
    
    App-server v2 clients that consumed `collabAgentToolCall` begin/end
    pairs for these tools must handle the new completion-only
    `subAgentActivity` item. Legacy v1 collaboration behavior is unchanged.
    
    ## Screenshot
    
    <img width="684" height="288" alt="Screenshot 2026-06-08 at 15 40 47"
    src="https://github.com/user-attachments/assets/194b3cd0-619d-45fb-b587-cf3e2b1b8a1d"
    />
    
    ## Testing
    
    - `just test -p codex-app-server-protocol`
    - `just test -p codex-rollout-trace`
    - Added focused coverage for activity analytics, terminal trace
    serialization, spawn-edge reduction, `interrupt_agent` classification,
    TUI status rendering without aggregated command output, and clearing
    stale running state after a completed turn.
  • [codex] Forward turn moderation metadata through app-server (#25710)
    ## Why
    First-party backends can supply turn-scoped moderation metadata that
    app-server clients need for client-side presentation. Exposing this as
    an experimental typed notification lets opted-in clients consume it
    without interpreting raw Responses API events.
    
    ## What changed
    - forward `response.metadata.openai_chatgpt_moderation_metadata` from
    Responses API SSE and WebSocket streams as turn-scoped moderation
    metadata
    - emit the experimental app-server v2 `turn/moderationMetadata`
    notification with `{ threadId, turnId, metadata }`
    - add app-server integration coverage for the typed moderation metadata
    notification
    
    ## Testing
    - `just test -p codex-core
    build_ws_client_metadata_includes_window_lineage_and_turn_metadata`
    - `just test -p codex-core` (fails locally: 46 failures and 1 timeout,
    primarily missing `test_stdio_server` and shell snapshot timeouts)
    - `just test -p codex-app-server-protocol`
    - `just test -p codex-app-server
    turn_moderation_metadata_emits_typed_notification_v2`
    - `just test -p codex-app-server` (fails locally: 792 passed, 10 failed,
    and 5 timed out; failures are in existing environment-sensitive tests,
    primarily because nested macOS `sandbox-exec` is not permitted)
    - `just write-app-server-schema --experimental --schema-root
    /tmp/codex-app-server-schema-experimental`
  • Encrypt multi-agent v2 message payloads (#26210)
    ## Why
    
    Multi-agent v2 currently routes agent instructions through normal tool
    arguments and inter-agent context. That means the parent model can emit
    plaintext task text, Codex can persist it in history/rollouts, and the
    recipient can receive it as ordinary assistant-message JSON.
    
    This changes the v2 path so agent instructions stay encrypted between
    model calls: Responses encrypts the `message` argument returned by the
    model, Codex forwards only that ciphertext, and Responses decrypts it
    internally for the recipient model.
    
    ## What changed
    
    - Mark the v2 `message` parameter as encrypted for `spawn_agent`,
    `send_message`, and `followup_task`.
    - Treat multi-agent v2 tool `message` values as ciphertext
    unconditionally.
    - Store v2 inter-agent task text in
    `InterAgentCommunication.encrypted_content` with empty plaintext
    `content`.
    - Convert encrypted inter-agent communications into the Responses
    `agent_message` input item before sending the child request.
    - Preserve `agent_message` items across history, rollout, compaction,
    telemetry, and app-server schema paths.
    - Leave multi-agent v1 unchanged.
    
    ## Message shape
    
    The model still calls the v2 tools with a `message` argument, but that
    value is now ciphertext:
    
    ```json
    {
      "name": "spawn_agent",
      "arguments": {
        "task_name": "worker",
        "message": "<ciphertext>"
      }
    }
    ```
    
    Codex stores the task as encrypted inter-agent communication:
    
    ```json
    {
      "author": "/root",
      "recipient": "/root/worker",
      "content": "",
      "encrypted_content": "<ciphertext>",
      "trigger_turn": true
    }
    ```
    
    When Codex builds the recipient request, it forwards the ciphertext
    using the new Responses input item:
    
    ```json
    {
      "type": "agent_message",
      "author": "/root",
      "recipient": "/root/worker",
      "content": [
        {
          "type": "encrypted_content",
          "encrypted_content": "<ciphertext>"
        }
      ]
    }
    ```
    
    Responses decrypts that item internally for the recipient model.
    
    ## Context impact
    
    - Parent context no longer carries plaintext v2 agent task instructions
    from these tool arguments.
    - Codex rollout/history stores ciphertext for v2 agent instructions.
    - Recipient requests receive an `agent_message` item instead of
    assistant commentary JSON for encrypted task delivery.
    - Completion/status notifications remain plaintext because they are
    Codex-generated status messages, not encrypted model tool
    arguments.
    
    ## Validation
    
    - `just test -p codex-tools`
    - `just test -p codex-protocol`
    - `just test -p codex-rollout`
    - `just test -p codex-rollout-trace`
    - `just test -p codex-otel`
    - `just write-app-server-schema`
  • Persist multi-agent runtime metadata (#25721)
    Stack split from #25708. Original PR intentionally left open. This
    second PR persists multi-agent runtime metadata through thread creation,
    rollout recording, and thread storage.
  • Add multi-agent runtime metadata types (#25720)
    Stack split from #25708. Original PR intentionally left open. This first
    PR adds the multi-agent runtime metadata types and catalog plumbing used
    by the rest of the stack.
  • feat: reuse compressed rollout search snippets (#25814)
    ## Summary
    - teach rollout search to return precomputed snippets for compressed
    rollouts
    - reuse those snippets in local thread search instead of reopening
    matching compressed files
    - keep the no-`rg` fallback single-pass and add regression coverage for
    the compressed path
    
    ## Why
    `thread/search` currently decodes matching compressed rollouts twice:
    once to discover the matching path and again to extract the snippet
    shown in results. That defeats a meaningful part of the compressed-read
    optimization work.
    
    ## Impact
    Compressed rollout hits now pay one decode pass on the search path while
    plain `.jsonl` hits keep the existing ripgrep-driven flow.
    
    ## Validation
    - `just test -p codex-rollout`
    - `just test -p codex-thread-store`
    - `just fix -p codex-rollout`
    - `just fix -p codex-thread-store`
    - `just fmt`
  • app-server: remove experimental persist_extended_history bool flag (#25712)
    ## Summary
    
    Remove the dead experimental `persistExtendedHistory` app-server flag
    and collapse rollout persistence to the single policy app-server already
    used.
    
    ## What Changed
    
    - Removed `persistExtendedHistory` from v2 thread start/resume/fork
    params and deleted its deprecation notice path.
    - Removed the persistence-mode enums and plumbing through core, rollout,
    and thread-store.
    - Made rollout filtering mode-free, keeping the existing limited
    persisted-history behavior.
    
    ## Test Plan
    
    - `just write-app-server-schema`
    - `cargo nextest run --no-fail-fast -p codex-app-server-protocol
    schema_fixtures`
    - `cargo nextest run --no-fail-fast -p codex-app-server
    thread_shell_command_history_responses_exclude_persisted_command_executions`
    - `cargo nextest run --no-fail-fast -p codex-rollout -p
    codex-thread-store`
    - final `rg` for removed flag/type names
  • Reject directory rollout paths for pathless side chats (#25661)
    ## Why
    
    Fixes openai/codex#20944.
    
    Desktop side chats are intentionally ephemeral and pathless. They can
    still accept live turns while loaded, but after a reload there is no
    persisted rollout to resume. In the reported failure mode, Desktop could
    send `$CODEX_HOME` as the resume/fork path for one of these pathless
    side chats.
    
    `thread/resume` and `thread/fork` prefer an explicit `path` over
    `threadId`, and rollout path lookup only checked that a candidate
    existed. That let `$CODEX_HOME` pass as a rollout path, so the later
    rollout reader tried to open a directory and surfaced the low-level `Is
    a directory` error.
    
    ## What Changed
    
    - Reject explicit rollout paths that resolve to a directory or other
    non-file before attempting to read rollout history.
    - Make `codex_rollout::existing_rollout_path` return only plain or
    compressed rollout candidates that are actual files.
    - Add an app-server regression test that creates an ephemeral fork, runs
    a turn while the side thread is loaded, simulates reload, then verifies
    both `thread/resume` and `thread/fork` reject `$CODEX_HOME` with `path
    is a directory` instead of the OS-level directory-read error.
    - Rebase over the `TestAppServer` rename and update the remaining stale
    test harness call sites to use `TestAppServer` with `app_server` local
    variables.
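
The validation described in the first bullet can be sketched as follows. This is an illustration under assumptions, not the code in `read_thread.rs`: `RolloutPathError` and `validate_explicit_rollout_path` are hypothetical names, and only the `path is a directory` message is taken from the description above.

```rust
use std::fmt;
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical error type for an explicit rollout path that cannot be read.
#[derive(Debug, PartialEq)]
pub enum RolloutPathError {
    /// Metadata lookup failed (for example, the path does not exist).
    Unreadable(io::ErrorKind),
    /// The path resolves to a directory such as `$CODEX_HOME`.
    IsDirectory,
    /// The path resolves to something that is neither a file nor a directory.
    NotAFile,
}

impl fmt::Display for RolloutPathError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            RolloutPathError::Unreadable(kind) => write!(f, "path is unreadable: {:?}", kind),
            RolloutPathError::IsDirectory => write!(f, "path is a directory"),
            RolloutPathError::NotAFile => write!(f, "path is not a regular file"),
        }
    }
}

/// Checks the path before any rollout reader tries to open it, so callers
/// get a domain-level error instead of the OS-level directory-read error.
pub fn validate_explicit_rollout_path(path: &Path) -> Result<(), RolloutPathError> {
    // `fs::metadata` follows symlinks, so a link to a directory is rejected too.
    let metadata = fs::metadata(path).map_err(|err| RolloutPathError::Unreadable(err.kind()))?;
    if metadata.is_dir() {
        Err(RolloutPathError::IsDirectory)
    } else if !metadata.is_file() {
        Err(RolloutPathError::NotAFile)
    } else {
        Ok(())
    }
}
```

Checking existence alone is what allowed a directory through; requiring `is_file()` closes that gap for both plain and compressed candidates.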
    
    Relevant code:
    
    - `thread-store/src/local/read_thread.rs` validates explicit rollout
    paths before rollout reading:
    https://github.com/openai/codex/blob/25b47c8f425d351aaba4baa955a8092064a1707b/codex-rs/thread-store/src/local/read_thread.rs#L146-L165
    - `rollout/src/compression.rs` now requires file metadata for plain and
    compressed rollout candidates:
    https://github.com/openai/codex/blob/25b47c8f425d351aaba4baa955a8092064a1707b/codex-rs/rollout/src/compression.rs#L940-L950
    - The repro test covers the pathless ephemeral side-chat reload case:
    https://github.com/openai/codex/blob/25b47c8f425d351aaba4baa955a8092064a1707b/codex-rs/app-server/tests/suite/v2/thread_fork.rs#L774-L886
    
    ## Verification
    
    - `just test -p codex-app-server
    pathless_ephemeral_thread_rejects_codex_home_path_after_reload`
  • Add rollout compression histograms (#25680)
    ## Summary
    
    Stacked on #25679. Add histogram telemetry for rollout compression
    runtime, per-file compression time, byte sizes, and compression ratio.
    
    ## Changes
    
    - Emit `codex.rollout_compression.run.duration_ms` tagged by final run
    status.
    - Emit `codex.rollout_compression.file.duration_ms` tagged by file
    outcome.
    - Emit source and compressed byte histograms for compression
    candidates/results.
    - Emit `codex.rollout_compression.file.compression_ratio` for successful
    compressions, recorded as integer basis points.
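
For readers unfamiliar with basis points, here is a small sketch of how a ratio can be recorded as an integer. The function name is hypothetical, and the direction of the ratio (compressed size divided by source size) is an assumption; the PR description does not state which way the real metric is computed.

```rust
/// Returns the compression ratio in basis points (1 bp = 0.01%).
/// Assumption: ratio = compressed / source, so 2_500 means the compressed
/// file is 25% of the original size. Returns `None` for an empty source,
/// where the ratio is undefined.
pub fn compression_ratio_basis_points(source_bytes: u64, compressed_bytes: u64) -> Option<u64> {
    if source_bytes == 0 {
        return None;
    }
    // Widen to u128 so the multiplication cannot overflow for large files.
    let scaled = (compressed_bytes as u128) * 10_000 / (source_bytes as u128);
    // Clamp back into u64 for histogram backends that take integer samples.
    Some(scaled.min(u64::MAX as u128) as u64)
}
```

Under that assumption, a 1000-byte rollout compressed to 250 bytes records 2500, and an incompressible file records 10000. Integer basis points keep two decimal places of precision without requiring a floating-point histogram.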
    
    ## Validation
    
    - `just fmt`
    - `just test -p codex-rollout`
    - `just fix -p codex-rollout`
  • Add rollout compression counters (#25679)
    ## Summary
    
    Add counter telemetry for the local rollout compression worker so we can
    see when it runs, why it skips, and how individual file/materialization
    paths resolve.
    
    ## Changes
    
    - Emit `codex.rollout_compression.run` with statuses for start,
    completion, failure, duplicate-run skip, and missing runtime skip.
    - Emit `codex.rollout_compression.file` outcomes for scanned,
    compressed, skipped, and failed compression candidates.
    - Emit `codex.rollout_compression.temp_cleanup` and
    `codex.rollout_compression.materialize` counters for cleanup and
    decompression paths.
    
    ## Validation
    
    - `just fmt`
    - `just test -p codex-rollout`
    - `just fix -p codex-rollout`
  • Throttle repeated rollout compression runs (#25659)
    ## Why
    
    [#25089](https://github.com/openai/codex/pull/25089) introduced the
    background worker that compresses cold archived rollouts, and
    [#25654](https://github.com/openai/codex/pull/25654) made that pass
    faster once it starts. But the worker still deleted
    `rollout-compression.lock` on successful exit, so the existing six-hour
    staleness window only helped with overlapping or crashed workers. Each
    new local thread-store initialization could immediately rescan archived
    rollouts even if a full pass had just finished.
    
    This change keeps the existing marker around long enough to throttle
    redundant reruns. The worker is still best-effort, but it no longer does
    repeated startup scans when nothing new is eligible for compression.
    
    ## What Changed
    
    - Replace the drop-scoped `CompressionLock` with a
    `CompressionRunMarker` that claims the existing
    `.tmp/rollout-compression.lock` path and leaves it in place after
    success.
    - Reuse the existing six-hour staleness window to block both overlapping
    starts and immediate reruns, while still letting a stale marker be
    reclaimed.
    - Update the worker docs and debug logging to describe the new "already
    running or recently ran" behavior.
    - Extend the rollout compression tests to assert that a successful run
    leaves the marker behind and that a fresh marker suppresses a new run.
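
The marker behavior can be sketched roughly as below. This is a simplified illustration, not the real `CompressionRunMarker`: `Claim`, `try_claim_marker`, and `create_marker` are hypothetical names, staleness is judged by file mtime as an assumption, and only the six-hour window comes from the description above.

```rust
use std::fs::{self, OpenOptions};
use std::io;
use std::path::Path;
use std::time::{Duration, SystemTime};

/// Six-hour staleness window, as described in the PR.
pub const STALE_AFTER: Duration = Duration::from_secs(6 * 60 * 60);

#[derive(Debug, PartialEq)]
pub enum Claim {
    /// This caller owns the run and should start compressing.
    Claimed,
    /// A fresh marker exists: a run is in progress or finished recently.
    RecentlyRanOrRunning,
}

/// Atomically creates the marker; returns `false` if it already exists.
fn create_marker(marker: &Path) -> io::Result<bool> {
    match OpenOptions::new().write(true).create_new(true).open(marker) {
        Ok(_) => Ok(true),
        Err(err) if err.kind() == io::ErrorKind::AlreadyExists => Ok(false),
        Err(err) => Err(err),
    }
}

/// Claims the marker if it is absent or stale. The marker is never removed
/// on success, so a fresh one also throttles the next startup.
pub fn try_claim_marker(marker: &Path, now: SystemTime, stale_after: Duration) -> io::Result<Claim> {
    if create_marker(marker)? {
        return Ok(Claim::Claimed);
    }
    let modified = fs::metadata(marker)?.modified()?;
    // A marker dated in the future is treated as brand new.
    let age = now.duration_since(modified).unwrap_or(Duration::ZERO);
    if age < stale_after {
        return Ok(Claim::RecentlyRanOrRunning);
    }
    // Stale marker: reclaim it. Best-effort only; if another process wins
    // the re-create race, report that a run is already under way.
    fs::remove_file(marker)?;
    if create_marker(marker)? {
        Ok(Claim::Claimed)
    } else {
        Ok(Claim::RecentlyRanOrRunning)
    }
}
```

The difference from a drop-scoped lock is that nothing deletes the file when the run finishes, which is what turns an overlap guard into a rerun throttle.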
    
    ## Validation
    
    - `just test -p codex-rollout`
  • Parallelize cold rollout compression (#25654)
    ## Why
    
    [#25089](https://github.com/openai/codex/pull/25089) added the
    background worker for compressing cold archived rollouts, but the worker
    still processed files effectively one at a time: each compression job
    was sent to `spawn_blocking` and then awaited before the next file
    started. On machines with a backlog of archived rollouts, that makes
    catch-up slower than it needs to be even though the actual compression
    work already runs off the async runtime.
    
    ## What Changed
    
    - Queue rollout compression work in a `JoinSet` while directory
    traversal continues.
    - Cap the worker at two in-flight compression jobs so it can overlap
    compression without turning the background task into unbounded blocking
    work.
    - Drain pending jobs before returning, including the
    `read_dir.next_entry()` error path, so every launched job still
    contributes to the final `compressed`, `skipped`, and `failed` stats.
    - Treat task join failures the same way as compression failures in the
    worker's warning and failure accounting.
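
The bounded-concurrency and drain behavior can be illustrated with a standard-library sketch. Note the substitution: this uses `std::thread` and a queue of join handles instead of tokio's `JoinSet` and `spawn_blocking`, so it waits on the oldest job rather than whichever finishes first. `Stats`, `Outcome`, and `run_bounded` are hypothetical names; the cap of two and the treatment of join failures follow the description above.

```rust
use std::collections::VecDeque;
use std::thread::{self, JoinHandle};

/// Final accounting for one worker pass.
#[derive(Debug, Default, PartialEq)]
pub struct Stats {
    pub compressed: usize,
    pub skipped: usize,
    pub failed: usize,
}

/// Successful outcomes of a single compression job.
pub enum Outcome {
    Compressed,
    Skipped,
}

/// Cap on in-flight jobs, matching the limit described in the PR.
const MAX_IN_FLIGHT: usize = 2;

/// Waits for one job and folds its result into the stats. A panicked job
/// (join failure) is counted the same way as a compression failure.
fn record(stats: &mut Stats, handle: JoinHandle<Result<Outcome, String>>) {
    match handle.join() {
        Ok(Ok(Outcome::Compressed)) => stats.compressed += 1,
        Ok(Ok(Outcome::Skipped)) => stats.skipped += 1,
        Ok(Err(_)) | Err(_) => stats.failed += 1,
    }
}

/// Launches jobs as they are discovered, never exceeding `MAX_IN_FLIGHT`,
/// and drains everything still pending before returning.
pub fn run_bounded<I, F>(jobs: I) -> Stats
where
    I: IntoIterator<Item = F>,
    F: FnOnce() -> Result<Outcome, String> + Send + 'static,
{
    let mut stats = Stats::default();
    let mut in_flight: VecDeque<JoinHandle<Result<Outcome, String>>> = VecDeque::new();
    for job in jobs {
        if in_flight.len() == MAX_IN_FLIGHT {
            if let Some(oldest) = in_flight.pop_front() {
                record(&mut stats, oldest);
            }
        }
        in_flight.push_back(thread::spawn(job));
    }
    // Drain: every launched job contributes to the final stats.
    for handle in in_flight {
        record(&mut stats, handle);
    }
    stats
}
```

In the real worker the drain also has to run on the directory-iteration error path; here that corresponds to always reaching the final loop, however the job iterator ends.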
  • Compress cold local rollouts (#25089)
    ## Rollout compression stack
    
    This stack splits #24941 into reviewable steps for local rollout
    compression. The design is intentionally staged:
    
    1. Teach readers, listing, search, and lookup to understand compressed
    rollouts.
    2. Make append and resume paths materialize compressed rollouts back to
    plain JSONL before writing.
    3. Add a disabled-by-default worker that can compress cold archived
    rollouts behind `local_thread_store_compression`.
    
    The key invariant is that writers append to plain `.jsonl`. A
    `.jsonl.zst` file is a cold/read representation; if a write is needed,
    the compressed file is materialized back to plain JSONL first. Readers
    prefer plain `.jsonl` when both forms exist and can fall back to the
    compressed sibling during transitions.
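
The reader-side half of that invariant can be sketched as a small resolution function. This is an illustration only: `RolloutFile`, `compressed_sibling`, and `resolve_for_read` are hypothetical names, and the sibling is assumed to be the plain path with `.zst` appended, which matches the `.jsonl` / `.jsonl.zst` naming above.

```rust
use std::path::{Path, PathBuf};

/// Which on-disk representation a reader should open.
#[derive(Debug, PartialEq)]
pub enum RolloutFile {
    Plain(PathBuf),
    Compressed(PathBuf),
}

/// Builds the compressed sibling path: `x.jsonl` becomes `x.jsonl.zst`.
pub fn compressed_sibling(plain: &Path) -> PathBuf {
    let mut name = plain.as_os_str().to_os_string();
    name.push(".zst");
    PathBuf::from(name)
}

/// Prefers the plain file when both forms exist, and falls back to the
/// compressed sibling otherwise. Only regular files are accepted.
pub fn resolve_for_read(plain: &Path) -> Option<RolloutFile> {
    if plain.is_file() {
        return Some(RolloutFile::Plain(plain.to_path_buf()));
    }
    let compressed = compressed_sibling(plain);
    if compressed.is_file() {
        Some(RolloutFile::Compressed(compressed))
    } else {
        None
    }
}
```

Preferring the plain file matters during transitions: if a writer has just materialized the rollout back to JSONL, the plain copy is the one that may hold newer appended records.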
    
    The worker is deliberately the last PR and remains behind an
    under-development feature flag. It currently scans only
    `archived_sessions`, not active `sessions`, because active sessions have
    the highest resume/append race risk. That means this stack does not yet
    compress most unarchived local history.
    
    ## Known race / follow-up
    
    The remaining unresolved design question is writer/compressor
    coordination. Even for archived rollouts, a resume or metadata update
    can append while the worker is replacing the plain file with
    `.jsonl.zst`; the current double-stat checks narrow but do not fully
    eliminate the window where a writer has opened the plain file before
    unlink. Do not treat the worker PR as production-ready until we either:
    
    - prevent append/resume paths from racing archived compression, or
    - introduce a shared representation/append lock or equivalent
    coordination.
    
    The first two PRs are useful independently: they make compressed
    rollouts readable and make append paths safely recover back to plain
    JSONL. The third PR isolates the worker behavior so that coordination
    issue is reviewable separately.
    
    ## Validation
    
    Focused local validation for the stack includes:
    
    - `just test -p codex-rollout`
    - `just test -p codex-thread-store` where thread-store paths were
    touched
    - `just test -p codex-features` for the feature flag slice
    - `just bazel-lock-check` after dependency graph changes
    - scoped `just fix -p ...` passes for changed crates
    
    CI is still the source of truth for the full platform matrix.
    
    ## This PR in the stack
    
    This is PR 3/3, based on #25088. It adds the under-development feature
    flag and starts the best-effort background worker when enabled. The
    worker currently compresses only cold archived rollouts, skips active
    sessions, verifies compressed output, preserves mtime and permissions,
    keeps a store-level lock heartbeat, and cleans stale temp files.
    
    Stack order:
    
    1. #25087: read compressed local rollouts.
    2. #25088: materialize compressed rollouts before append.
    3. This PR: add the disabled local compression worker.