42 Commits

  • [codex] wire process-owned code mode host into core (#30142)
    ## Summary
    
    - add the `code_mode_host` feature flag and select
    `ProcessOwnedCodeModeSessionProvider` in `CodeModeService` when enabled
    - initialize code-mode sessions lazily so a missing host reports a tool
    error without failing thread startup
    - resolve `codex-code-mode-host` beside the running Codex binary by
    default while preserving `CODEX_CODE_MODE_HOST_PATH` as an override
    - add unit and end-to-end coverage for host resolution and graceful
    missing-host behavior
    
    ## Why
    
    This wires the process-owned session client from #30112 into the core
    service behind an opt-in rollout gate. Packaged Codex installations can
    place the helper in the same `bin` directory as the main executable
    without relying on `PATH`, while development and custom installations
    can continue to override the helper path.
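
    A minimal sketch of the lookup order described above, assuming a
    hypothetical `resolve_host_path` helper (the PR text does not show the
    real function or its signature). An explicit override wins; otherwise
    the helper is expected next to the running executable.

    ```rust
    use std::path::{Path, PathBuf};

    /// Helper binary name, taken from the PR description.
    const HOST_BINARY_NAME: &str = "codex-code-mode-host";

    /// Hypothetical resolver, not the actual Codex function.
    /// `override_path` stands in for the value of `CODEX_CODE_MODE_HOST_PATH`.
    fn resolve_host_path(override_path: Option<&str>, current_exe: &Path) -> Option<PathBuf> {
        // A non-empty override always takes precedence.
        if let Some(path) = override_path.filter(|p| !p.is_empty()) {
            return Some(PathBuf::from(path));
        }
        // Otherwise look in the directory that contains the running binary.
        let dir = current_exe.parent()?;
        Some(dir.join(HOST_BINARY_NAME))
    }
    ```

    In a real caller the two inputs would presumably come from
    `std::env::var("CODEX_CODE_MODE_HOST_PATH")` and
    `std::env::current_exe()`. The sketch only computes a path; checking
    that the file exists can stay lazy, which matches the missing-host
    behavior in the summary.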
    
    ## Stack
    
    - Depends on #30112
    - Base branch: `cconger/process-owned-session-runtime-4-client`
    
    ## Validation
    
    Build `codex` and `codex-code-mode-host`, then run:
    `CODEX_CODE_MODE_HOST_PATH="$PWD/target/debug/codex-code-mode-host"
    ./target/debug/codex --enable code_mode_host`
  • [codex] add process-owned code-mode session client (#30112)
    ## Summary
    
    - add `ProcessOwnedCodeModeSessionProvider` and logical session
    generation/rebinding state
    - add the supervised child-process connection, reader/writer tasks, and
    driver state machine
    - make dropped execute/wait/open callers cancellation-safe with explicit
    ownership handoff and durable cleanup
    - validate cell/delegate lifecycle state and reject invalid protocol
    transitions
    - add end-to-end stdio coverage for delegates, cancellation, frame
    limits, child loss, stale generations, replacement, and long-lived
    sessions
    
    ## Why
    
    This final stage exposes the process-owned client only after the wire
    protocol, host-safe runtime, and standalone host are independently in
    place. Transport failure is fail-stop: the client closes local state,
    cancels callbacks, reaps the child, and lazily rebuilds a fresh host
    generation rather than transactionally recovering the old connection.
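
    The fail-stop policy can be pictured with a small generation counter.
    This is a sketch under assumptions: `HostSlot`, `SlotError`, and their
    methods are hypothetical names, and the real client also closes local
    state, cancels callbacks, and reaps the child process.

    ```rust
    /// Hypothetical client-side record of the current host generation.
    #[derive(Debug, Default)]
    struct HostSlot {
        generation: u64,
        connected: bool,
    }

    #[derive(Debug, PartialEq)]
    enum SlotError {
        StaleGeneration { current: u64, got: u64 },
    }

    impl HostSlot {
        /// Lazily start a host if none is connected and return its generation.
        fn ensure_connected(&mut self) -> u64 {
            if !self.connected {
                // Every rebuild is a fresh generation; old state is never reused.
                self.generation += 1;
                self.connected = true;
            }
            self.generation
        }

        /// Fail-stop: forget the connection instead of trying to recover it.
        fn on_transport_failure(&mut self) {
            self.connected = false;
        }

        /// Reject work addressed to a generation that is no longer current.
        fn check(&self, generation: u64) -> Result<(), SlotError> {
            if generation == self.generation {
                Ok(())
            } else {
                Err(SlotError::StaleGeneration { current: self.generation, got: generation })
            }
        }
    }
    ```

    After a failure the next `ensure_connected` call yields a new
    generation, so handles that still carry the old number are rejected
    rather than silently rebound.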
    
    ## Stack
    
    This is **4 of 4** in the process-owned code-mode session stack.
    
    - Depends on #30111
    - Full stack: #30108 → #30110 → #30111 → this PR
    
    ## Validation
    
    - `just test -p codex-code-mode -p codex-code-mode-host` — 86 passed
    - `just fix -p codex-code-mode`
    - `just fix -p codex-code-mode-host`
    - `just bazel-lock-update`
    - `just bazel-lock-check`
    - `bazel test //codex-rs/code-mode:code-mode-unit-tests
    //codex-rs/code-mode-host:code-mode-host-unit-tests
    //codex-rs/code-mode-host:code-mode-host-stdio-test
    //codex-rs/code-mode-protocol:code-mode-protocol-unit-tests` — 4/4
    passed
    - `just fmt`
  • [codex] add code-mode host failure supervision hooks (#30110)
    ## Why
    
    A process host should be discarded and rebuilt after critical actor or
    V8 failure, while the existing in-process production path must keep its
    current cell-error semantics. This change establishes that failure
    boundary without adding the host process or remote client.
    
    ## What changed
    
    - add optional task-failure supervision to the transport-neutral
    code-mode session runtime
    - report Tokio cell-actor failures and V8 runtime-thread panics to a
    host-provided fail-stop handler
    - preserve the existing handler-less in-process behavior
    - make host-owned cell ID allocation fail before numeric wraparound
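
    The last bullet can be illustrated with checked arithmetic. This is a
    sketch only: `CellIdAllocator` and `CellIdsExhausted` are hypothetical
    names, and the `u64` width is an assumption about the real ID type.

    ```rust
    /// Hypothetical error returned when no further IDs can be issued.
    #[derive(Debug, PartialEq)]
    struct CellIdsExhausted;

    /// Hypothetical allocator; the ID width is assumed to be `u64`.
    struct CellIdAllocator {
        next: u64,
    }

    impl CellIdAllocator {
        fn new(start: u64) -> Self {
            Self { next: start }
        }

        /// Hand out the next ID, failing instead of wrapping around to zero.
        fn allocate(&mut self) -> Result<u64, CellIdsExhausted> {
            let id = self.next;
            // `checked_add` returns `None` on overflow, so the counter is
            // left unchanged and every later call fails the same way.
            self.next = self.next.checked_add(1).ok_or(CellIdsExhausted)?;
            Ok(id)
        }
    }
    ```

    With this shape the allocator stops one value short of the maximum,
    which keeps an ID from ever being reused after wraparound.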
    
    ## Follow-up
    
    The V8 panic signal surfaced here should also be consumed by the
    `InProcessCodeModeSession` manager in a future change so it can fail the
    affected cell. This PR intentionally leaves the handler-less in-process
    behavior unchanged while putting the required panic tracking in place.
    
    ## Stack
    
    This is **2 of 4** in the process-owned code-mode session stack.
    
    - #30108 is merged into `main`
    - The next PR targets this branch
    
    ## Validation
    
    - `just test -p codex-code-mode` — 53 passed
    - `just argument-comment-lint -p codex-code-mode`
    - `just fix -p codex-code-mode`
  • code-mode: Remove Session::is_alive() (#29732)
    Remove this unused API. It is insidious because it implies that the
    caller can determine whether a session is alive, and that a preflight
    check should decide routing. Let's drop it and handle errors from a
    failed session correctly in the future.
  • code-mode: Rename codex_code_mode::CodeModeService (#29716)
    Mechanical rename of `CodeModeService` to `InProcessCodeModeSession`.
    
    This type already implements `CodeModeSession` as its primary interface
    to Core. The old name was vestigial and confusing when embedded inside
    `core::tools::code_mode::CodeModeService`.
  • code-mode: preserve initial yield at completion (#29289)
    ## Summary
    
    - Retain the first pre-observation `yield_control()` boundary when a
    cell completes before observation.
    - Deliver the preserved yield before the buffered completion.
    - Keep later unattached yields as no-ops.
    
    ## Why
    
    Create followed by the initial wait must preserve the former execute
    response boundary even when the script runs to completion first.
    
    ## Impact
    
    The first wait observes the same initial yield boundary as before create
    and observe were decoupled.
    
    ## Validation
    
    - Focused initial-yield signature regression passed.
    - Stack-tip validation: `just test -p codex-code-mode -p
    codex-code-mode-protocol` (70 passed).
    - Parent branch:
    `cconger/code-mode-runtime-compact-03e2-observation-delivery`.
  • code-mode: preserve dropped observation output (#29288)
    ## Summary
    
    - Restore yielded output when an observation receiver disappears before
    delivery.
    - Preserve pending-frontier output and tool IDs across failed delivery.
    - Add dropped-observer coverage for yield and pending observations.
    
    ## Why
    
    Canceling a wait must not consume output or a pending frontier that the
    caller never received.
    
    ## Impact
    
    A later observation can recover undelivered incremental output without
    duplication.
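
    One way to get this behavior is to take the value back from a failed
    send. The sketch below uses the standard library channel as a stand-in
    (the PR does not say which channel type the runtime uses), and
    `PendingOutput` is a hypothetical name.

    ```rust
    use std::sync::mpsc::Sender;

    /// Hypothetical per-cell buffer of output not yet seen by any observer.
    #[derive(Default)]
    struct PendingOutput {
        items: Vec<String>,
    }

    impl PendingOutput {
        /// Try to hand all buffered output to an observer.
        /// Returns `true` if the observer received it.
        fn deliver(&mut self, observer: &Sender<Vec<String>>) -> bool {
            let batch = std::mem::take(&mut self.items);
            match observer.send(batch) {
                Ok(()) => true,
                Err(err) => {
                    // The receiver is gone. `SendError` carries the unsent
                    // value, so it can be restored for a later observer.
                    self.items = err.0;
                    false
                }
            }
        }
    }
    ```

    Because the batch is either delivered or restored in full, a later
    observation sees each item exactly once.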
    
    ## Validation
    
    - Stack-tip validation: `just test -p codex-code-mode -p
    codex-code-mode-protocol` (70 passed).
    - Parent branch:
    `cconger/code-mode-runtime-compact-03e-shutdown-hierarchy`.
  • code-mode: make session shutdown authoritative (#29287)
    ## Summary
    
    - Give each session and cell a hierarchical cancellation token.
    - Track cell tasks so shutdown waits for admitted actors without polling
    the registry.
    - Make shutdown authoritative across concurrent admission and
    non-cooperative callbacks.
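
    The hierarchy in the first bullet can be sketched with the standard
    library alone. `CancelToken` is a hypothetical stand-in; the PR does
    not name the token type it uses, and a real async runtime would also
    need a way to wake waiting tasks.

    ```rust
    use std::sync::atomic::{AtomicBool, Ordering};
    use std::sync::Arc;

    /// Minimal stand-in for a hierarchical cancellation token.
    #[derive(Clone)]
    struct CancelToken {
        own: Arc<AtomicBool>,
        parent: Option<Box<CancelToken>>,
    }

    impl CancelToken {
        /// Session-level token with no parent.
        fn root() -> Self {
            Self { own: Arc::new(AtomicBool::new(false)), parent: None }
        }

        /// Cell-level token that also observes its parent's cancellation.
        fn child(&self) -> Self {
            Self {
                own: Arc::new(AtomicBool::new(false)),
                // The clone shares the parent's flag through the `Arc`.
                parent: Some(Box::new(self.clone())),
            }
        }

        fn cancel(&self) {
            self.own.store(true, Ordering::SeqCst);
        }

        /// Cancelled if this token or any ancestor was cancelled.
        fn is_cancelled(&self) -> bool {
            self.own.load(Ordering::SeqCst)
                || self.parent.as_ref().map_or(false, |p| p.is_cancelled())
        }
    }
    ```

    Cancelling one cell leaves its siblings running, while cancelling the
    session token is visible to every cell created from it.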
    
    ## Why
    
    A best-effort registry scan can miss cells admitted concurrently or
    blocked behind the registry lock.
    
    ## Impact
    
    Session shutdown reliably stops every admitted cell and rejects new work
    once shutdown begins.
    
    ## Validation
    
    - Stack-tip validation: `just test -p codex-code-mode -p
    codex-code-mode-protocol` (70 passed).
    - Parent branch: `cconger/code-mode-runtime-compact-03c-terminal-state`.
  • code-mode: linearize cell terminal state (#29286)
    ## Summary
    
    - Introduce a single cell terminal-state machine for completion and
    termination.
    - Make stored-value commits atomic with the winning terminal outcome.
    - Buffer terminal results for later observation and cover
    termination-before-commit behavior.
    
    ## Why
    
    Completion, termination, observation, and stored-value updates must
    agree on one linearized outcome under cancellation races.
    
    ## Impact
    
    Terminal delivery becomes deterministic and terminated cells cannot
    commit state after termination wins.
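
    A sketch of the "first outcome wins" rule, with the stored-value commit
    made part of the same step. All names here (`Terminal`, `Cell`,
    `finish`) are hypothetical, and string values stand in for whatever
    the runtime actually stores.

    ```rust
    use std::collections::HashMap;

    #[derive(Debug, Clone, PartialEq)]
    enum Terminal {
        Completed,
        Terminated,
    }

    /// Hypothetical cell record holding at most one terminal outcome.
    #[derive(Default)]
    struct Cell {
        terminal: Option<Terminal>,
    }

    impl Cell {
        /// Record `outcome` if no outcome has won yet and return the winner.
        /// Writes are committed only when completion is the winning outcome.
        fn finish(
            &mut self,
            outcome: Terminal,
            writes: HashMap<String, String>,
            stored: &mut HashMap<String, String>,
        ) -> Terminal {
            if self.terminal.is_none() {
                if outcome == Terminal::Completed {
                    stored.extend(writes);
                }
                self.terminal = Some(outcome);
            }
            self.terminal.clone().expect("terminal state was just set")
        }
    }
    ```

    If termination is recorded first, a later completion attempt returns
    `Terminated` and its writes are discarded.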
    
    ## Validation
    
    - Focused terminal-state regression passed.
    - Stack-tip validation: `just test -p codex-code-mode -p
    codex-code-mode-protocol` (70 passed).
    - Parent branch:
    `cconger/code-mode-runtime-compact-03b-session-runtime`.
  • code-mode: move session ownership into runtime (#29285)
    ## Summary
    
    - Move code-mode cell ownership and shared stored values from
    `CodeModeService` into `SessionRuntime`.
    - Keep the protocol-facing execute/wait behavior behind the existing
    service adapter.
    - Add runtime-level ownership and isolation coverage.
    
    ## Why
    
    This establishes a transport-neutral session boundary before later
    lifecycle and create/observe changes.
    
    ## Impact
    
    No intended model-facing behavior change. This is an ownership and
    layering refactor.
    
    ## Validation
    
    - Stack-tip validation: `just test -p codex-code-mode -p
    codex-code-mode-protocol` (70 passed).
    - Parent branch: `cconger/code-mode-runtime-compact-03a-runtime-types`.
  • code-mode: define transport-neutral runtime types (#29170)
    ## Summary
    
    - introduce a private `session_runtime` boundary for cell creation
    requests, observation modes, lifecycle events, output items, and tool
    metadata
    - update the cell actor and in-process service to use those
    transport-neutral types
    - keep cell ID allocation on the owning session side
    
    ## Motivation
    
    Cell lifecycle vocabulary currently lives inside the cell actor
    implementation. That makes the service adapter and future session
    runtime depend on actor-specific types, increasing the size and
    complexity of the runtime ownership change.
    
    This is the first reviewable slice of the session-runtime stack. It
    separates the transport-neutral data model without moving lifecycle
    ownership or changing behavior.
    
    Later slices will move session state behind this boundary, harden
    terminal and shutdown behavior, and split cell creation from
    observation.
    
    ## Behavior
    
    There are no public API or user-visible behavior changes in this PR.
    
    In particular:
    
    - `CodeModeSession::execute` and `wait` are unchanged
    - cell IDs remain allocated by the owning session
    - cell admission, observation, termination, and shutdown behavior are
    unchanged
  • code-mode: move cell state into library actor (#28599)
    A code-mode cell is a single JavaScript execution that can produce
    output, call tools, wait for asynchronous work, resume, or be
    terminated. This PR extracts the existing per-cell run loop into a
    dedicated actor that owns the cell’s lifecycle state. It is primarily an
    ownership change rather than a new lifecycle contract: existing behavior
    now has one clear implementation boundary.
    
    ### Architecture
    The session service remains responsible for session-wide concerns:
    allocating cell IDs, storing shared values, creating cells, and routing
    requests to them.
    
    Once a cell is created, its execution state belongs to its actor.
    Callers interact with the actor through a handle. The actor receives two
    kinds of input: runtime events and control requests.
    
    A single event loop serializes these inputs and applies the lifecycle
    rules. It tracks the current observer—the caller waiting for an
    update—along with accumulated output, outstanding callbacks, runtime
    state, yield deadlines, and termination progress. Observation,
    termination, completion, and cleanup therefore have one consistent
    owner.
    
    When the runtime has no immediately runnable work and is waiting only on
    timers or tool results, the actor can return accumulated output and
    information about outstanding tool calls while keeping the cell
    available to resume. On completion or termination, it performs the
    appropriate callback cleanup before publishing the final result and
    removing the cell from the session.
    
    A small host interface connects the actor to session-owned facilities
    such as tool dispatch, notifications, stored values, and final cell
    removal, keeping those responsibilities outside the actor itself.
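
    The single-writer event loop described above might look roughly like
    this. It is a heavily simplified sketch: `Input` and `run_cell_actor`
    are hypothetical, a blocking standard-library channel replaces the
    async runtime, and callbacks, yield deadlines, and the host interface
    are left out.

    ```rust
    use std::sync::mpsc::{Receiver, Sender};

    /// Hypothetical inputs: runtime events and control requests share one inbox.
    enum Input {
        Output(String),               // runtime event: the script produced output
        Completed,                    // runtime event: the script finished
        Observe(Sender<Vec<String>>), // control request: a caller wants an update
        Terminate,                    // control request: stop the cell
    }

    /// Serialize all inputs through one loop so lifecycle state has one owner.
    fn run_cell_actor(inbox: Receiver<Input>) -> &'static str {
        let mut buffered: Vec<String> = Vec::new();
        for input in inbox {
            match input {
                Input::Output(text) => buffered.push(text),
                Input::Observe(reply) => {
                    let batch = std::mem::take(&mut buffered);
                    // Keep the output if the observer has already gone away.
                    if let Err(err) = reply.send(batch) {
                        buffered = err.0;
                    }
                }
                Input::Completed => return "completed",
                Input::Terminate => return "terminated",
            }
        }
        "inbox closed"
    }
    ```

    Because only the loop touches `buffered`, observation and termination
    cannot race on it; the order of messages in the inbox decides the
    outcome.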
    
    ### Why
    Previously, cell lifecycle state and coordination lived alongside
    session management. The actor boundary makes each cell a self-contained
    state machine with a single writer, while the service becomes a registry
    and adapter around it.
    
    This makes lifecycle behavior easier to reason about and test in
    isolation. It also establishes a clean boundary for later changing where
    cells run or how they communicate without recreating their lifecycle
    rules.
  • code-mode: extend test coverage to lock in cell lifecycle (#28468)
    This PR establishes the intended behavior as an executable contract
    before a refactor of the cell runtime begins. It also fixes cases where
    a second observer or termination request could replace an existing
    response channel and leave the original caller unresolved.
    
    ### Behavior codified
    - A cell can yield output and subsequently resume to completion.
    - A caller can run a cell until it has no immediately runnable work,
    receive its accumulated output and outstanding tool-call IDs, and then
    resume the same cell when the awaited work is available.
    - Each cell admits one active observer:
      - a second observer receives an explicit busy error
      - the existing observer remains registered and is not displaced
    - A natural result (conclusion of the JS module) that has already
    reached the cell controller wins over a later termination request.
    - Otherwise, termination preempts execution and resolves both:
      - the active observer, if present
      - the caller requesting termination
    - Repeated termination requests are rejected while termination is
    already in progress.
    - Terminal responses are sent only after outstanding callback work has
    been handled:
      - natural completion drains notifications and cancels outstanding tool
      calls
      - termination cancels and drains both notification and tool callbacks
    - Cell removal and the `cell_closed` notification happen after callback
    cleanup.
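
    The one-observer rule from the list above can be sketched as a slot
    that refuses a second occupant. `ObserverSlot`, `ObserveError`, and the
    numeric observer IDs are hypothetical; the real cell holds a response
    channel rather than an ID.

    ```rust
    #[derive(Debug, PartialEq)]
    enum ObserveError {
        Busy,
    }

    /// Hypothetical holder for the single active observer of a cell.
    #[derive(Default)]
    struct ObserverSlot {
        current: Option<u32>,
    }

    impl ObserverSlot {
        /// Register an observer, or report `Busy` without displacing the
        /// one that is already waiting.
        fn attach(&mut self, observer_id: u32) -> Result<(), ObserveError> {
            if self.current.is_some() {
                return Err(ObserveError::Busy);
            }
            self.current = Some(observer_id);
            Ok(())
        }

        /// Resolve the active observer, freeing the slot for the next one.
        fn resolve(&mut self) -> Option<u32> {
            self.current.take()
        }
    }
    ```

    Returning an error instead of overwriting the slot is what keeps the
    original caller from being left unresolved.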
  • [code-mode] Reject remote image URLs from output helpers (#27732)
    ## Summary
    
    - reject HTTP(S) image URLs from the shared code-mode output-image
    normalization path
    - return a concise model-visible tool error so the model can recover on
    its next turn
    - apply the targeted rejection to both `image()` and `generatedImage()`
    - leave other non-empty image URL values to existing downstream handling
    
    The returned error is:
    
    > Tool call failed: remote image URLs are not supported in tool outputs.
    Pass a base64 data URI instead
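
    A sketch of the targeted check, assuming a hypothetical
    `check_image_url` function. The case-insensitive scheme match is an
    assumption of this example, and the `Tool call failed:` prefix is
    assumed to be added elsewhere in the harness.

    ```rust
    /// Message text taken from the PR description (without the prefix).
    const REMOTE_IMAGE_ERROR: &str =
        "remote image URLs are not supported in tool outputs. Pass a base64 data URI instead";

    /// Hypothetical check: reject HTTP(S) URLs and pass everything else
    /// through unchanged for downstream handling.
    fn check_image_url(url: &str) -> Result<&str, &'static str> {
        let lower = url.to_ascii_lowercase();
        if lower.starts_with("http://") || lower.starts_with("https://") {
            return Err(REMOTE_IMAGE_ERROR);
        }
        Ok(url)
    }
    ```

    Both `image()` and `generatedImage()` would call the same check, which
    is why the PR places it in the shared normalization path.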
    
    ## Why
    
    Responses Lite cannot lower a remote image URL emitted from a structured
    tool output. Rejecting HTTP(S) values in the Codex harness preserves the
    tool-call metadata and gives the model a recoverable next turn instead
    of invalidating the sample.
    
    ## Test coverage
    
    The regression is covered primarily by a `test_codex()` agent
    integration test that simulates the Responses API exchange and asserts
    the failed model-visible exec output. A supplemental runtime test covers
    both `http://` and `https://` inputs across both image output helpers.
    
    ## Test plan
    
    - `cd codex-rs && just test -p codex-code-mode`
    - `cd codex-rs && just test -p codex-code-mode-protocol`
    - `cd codex-rs && just test -p codex-core
    code_mode_image_helper_rejects_remote_url`
    - `cd codex-rs && just fmt`
    - `git diff --check origin/main...HEAD`
    
    Related context: https://github.com/openai/openai/pull/1022346
  • code-mode standalone: extract protocol and add host crate (#27724)
    This is phase 1 of a 4-phase stack:
    1. **Add protocol and host crates for new IPC code mode implementation**
    2. Create the new standalone binary
    3. Create a new IPC `CodeModeSessionProvider` to use new binary
    4. Remove v8 from core and only use IPC provider
    
    
    ## Add protocol and host crates for new IPC code mode implementation
    Establish a clean process boundary without changing the existing
    in-process behavior.
    
    - Add the codex-code-mode-protocol crate for shared session, runtime,
    response, and tool-definition types.
    - Move protocol-facing code out of the V8-backed implementation.
    - Add a buildable codex-code-mode-host crate as the foundation for the
    standalone process.
    - Keep the existing in-process runtime as the active implementation.
  • Add saved image path hint to standalone image generation (#25947)
    ## Why
    
    Standalone image generation returns image bytes to the model, but the
    model also needs the host artifact path to reference the generated file
    in follow-up work.
    
    ## What changed
    
    - Append the default saved-image path hint alongside the generated image
    tool output.
    - Reuse the existing core image-generation hint text.
    - Pass the thread ID and Codex home directory needed to compute the
    artifact path.
    - Add app-server and extension coverage for the model-visible hint.
    
    ## Validation
    
    - `just fmt`
    - `just bazel-lock-check`
    - `just test -p codex-app-server
    standalone_image_generation_returns_saved_path_hint_to_model`
  • [codex] Generalize deferred nested tool guidance (#25689)
    ## Summary
    - describe omitted code-mode tools as deferred nested tools instead of
    MCP/app tools
    - update the prompt-description assertion to match
    
    ## Why
    Deferred dynamic tools are also callable through `tools` and
    discoverable in `ALL_TOOLS`, so the previous MCP/app-specific wording
    was too narrow.
    
    ## Validation
    - `just fmt`
    - `just test -p codex-code-mode`
    - `git diff --check`
  • code-mode: introduce durable session interface (#24180)
    ## Summary
    
    Introduce a `CodeModeSession` interface for executing and managing
    code-mode cells.
    
    This moves cell lifecycle, callback delegation, termination, and
    shutdown behind a session abstraction. The existing in-process
    implementation remains in use, and an external-process implementation
    can later be added behind the same interface.
    
    A Codex session owns one `CodeModeSession`, which in turn owns its
    running cells and stored code-mode state. Each cell is represented to
    the caller as a `StartedCell`, exposing its cell ID and initial
    response.
    
    It also introduces a `CodeModeSessionDelegate` callback interface. A
    session uses the delegate to invoke nested host tools and emit
    notifications while a cell is running, allowing the runtime to
    communicate with its owning Codex session without depending directly on
    core turn handling.
    
    <img width="2121" height="1001" alt="image"
    src="https://github.com/user-attachments/assets/c349a819-2a59-485c-bda4-2caf68ac4c31"
    />
  • [codex] Improve built-in tool schema docs (#24794)
    ## Summary
    - Clarify default, omission, and bounded behavior across built-in tool
    schemas, including unified exec, classic shell, Code Mode exec/wait,
    multi-agent, agent job, MCP resource, image, goal, plan, tool_search,
    and test-sync fields.
    - Convert update_plan status to an enum and add short field descriptions
    where the schema previously relied on surrounding context.
    - Remove the dedicated permission-approval schema test and keep only
    updates to existing expected-spec tests.
    
    ## Validation
    - Ran `just fmt`.
    - Ran `git diff --check`.
    - Did not run clippy or tests, per request.
    
    Regressions were evaluated
    [here](https://openai.slack.com/archives/C09GDSP1J9X/p1779905065496949),
    and we found no regressions.
  • Uprev Rust toolchain pins to 1.95.0 (#24684)
    ## Summary
    - Bump the workspace Rust toolchain from `1.93.0` to `1.95.0` across
    Cargo, Bazel, CI, release workflows, devcontainers, and the Codex
    environment config.
    - Refresh `MODULE.bazel.lock` so the Bazel Rust toolchain artifacts
    match the new version.
    - Leave purpose-specific toolchains unchanged, including the
    `argument-comment-lint` nightly and the upstream `rusty_v8` `1.91.0`
    build pin.
    - Includes fixes for new lints from `just fix` and a few codex-authored
    fixes for lints without a suggestion.
  • Restore legacy image detail values (#24644)
    ## Why
    
    Older persisted rollouts can contain `input_image.detail` values of
    `auto` or `low` from before `ImageDetail` was narrowed to
    `high`/`original`. Current deserialization rejects those values, which
    can make resume skip later compacted checkpoints and reconstruct an
    oversized raw suffix before the next compaction attempt.
    
    Confirmed Sentry reports fixed by this compatibility path:
    
    - [CODEX-1H3F](https://openai.sentry.io/issues/7500642496/)
    - [CODEX-1H6N](https://openai.sentry.io/issues/7501025347/)
    - [CODEX-1JDP](https://openai.sentry.io/issues/7504549065/)
    - [CODEX-1HW6](https://openai.sentry.io/issues/7503407986/)
    
    ## Background
    
    [openai/codex#20693](https://github.com/openai/codex/pull/20693) added
    image-detail plumbing for app-server `UserInput` so input images could
    explicitly request `detail: original`. The Slack discussion behind that
    PR was about ScreenSpot / bridge evals where user input images were
    resized, while tool output images already had MCP/code-mode ways to
    request image detail.
    
    In review, the intended new API surface was narrowed to `high` and
    `original`: default to `high`, allow `original` when callers need
    unchanged image handling, and avoid encouraging new `auto` or `low`
    usage. That policy still makes sense for newly emitted values.
    
    The missing compatibility piece is persisted history. Older rollouts can
    already contain `auto` and `low`, and resume reconstructs typed history
    by deserializing those rollout records. Rejecting old values at that
    boundary causes valid compacted checkpoints to be skipped. This PR
    restores `auto` and `low` as real variants so old records deserialize
    and round-trip without being rewritten as `high`, while product paths
    can continue to default to `high` and avoid emitting `auto` for new
    behavior.
    
    ## What changed
    
    - Restored `ImageDetail::Auto` and `ImageDetail::Low` as first-class
    protocol values.
    - Preserved `auto`/`low` through rollout deserialization, MCP image
    metadata, code-mode image output, and schema/type generation.
    - Kept local image byte handling conservative: only `original` switches
    to original-resolution loading; `auto`/`low`/`high` continue through the
    resize-to-fit path while retaining their detail value.
    - Added regression coverage for enum round-tripping and code-mode `low`
    detail handling.
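
    A sketch of the restored enum and the conservative loading rule. The
    variant names follow the PR; the `parse`, `as_str`, and
    `loads_original_resolution` helpers are hypothetical, and the real
    type presumably derives its (de)serialization instead.

    ```rust
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum ImageDetail {
        Auto,
        Low,
        High,
        Original,
    }

    impl ImageDetail {
        /// Accept legacy values so that old rollout records still load.
        fn parse(value: &str) -> Option<Self> {
            match value {
                "auto" => Some(Self::Auto),
                "low" => Some(Self::Low),
                "high" => Some(Self::High),
                "original" => Some(Self::Original),
                _ => None,
            }
        }

        /// Emit the same string that was read, so records round-trip.
        fn as_str(self) -> &'static str {
            match self {
                Self::Auto => "auto",
                Self::Low => "low",
                Self::High => "high",
                Self::Original => "original",
            }
        }

        /// Only `original` skips the resize-to-fit path.
        fn loads_original_resolution(self) -> bool {
            matches!(self, Self::Original)
        }
    }
    ```

    Keeping `auto` and `low` as real variants, rather than mapping them to
    `high` on read, is what lets old records round-trip unchanged.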
    
    ## Testing
    
    - `just write-app-server-schema`
    - `just test -p codex-protocol`
    - `just test -p codex-tools`
    - `just test -p codex-code-mode`
    - `just test -p codex-app-server-protocol`
    - `just test -p codex-core
    suite::rmcp_client::stdio_image_responses_preserve_original_detail_metadata`
    - `just test -p codex-core
    suite::code_mode::code_mode_can_use_mcp_image_result_with_image_helper`
    - Loaded broken rollouts on local fixed builds, and started/completed
    new turns.
    
    I also attempted `just test -p codex-core`; the local broad run did not
    finish green: 2559 tests run, 2467 passed, 55 flaky, 91 failed, 1 timed
    out. The failures were broad timeout/deadline failures across unrelated
    areas; targeted changed-path core tests above passed.
  • code-mode: merge stored values by key (#24159)
    ## Summary
    
    Change code-mode stored value updates to merge writes by key instead of
    replacing the session's complete stored-value map after each cell
    completes.
    
    Previously, each cell received a snapshot of stored values and returned
    the complete resulting map. When multiple cells ran concurrently, a
    later completion could overwrite values written by another cell because
    it committed an older snapshot.
    
    This change moves stored-value ownership into `CodeModeService`:
    
    - Each runtime starts from the service's current stored values.
    - Runtime completion reports only keys written by that cell.
    - The service merges those writes into the current stored-value map on
    successful completion.
    - Core no longer replaces its stored-value state from a cell result.
    
    As a result, concurrently executing cells can update different stored
    keys without clobbering one another.
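
    The difference between replacing a snapshot and merging by key fits in
    a few lines. This is a sketch with a hypothetical `merge_writes`
    function; string values stand in for the actual stored-value type.

    ```rust
    use std::collections::HashMap;

    /// Merge only the keys a cell wrote into the session's current map.
    /// Keys written by other cells in the meantime are left untouched.
    fn merge_writes(current: &mut HashMap<String, String>, writes: HashMap<String, String>) {
        current.extend(writes);
    }
    ```

    If two concurrent cells write `a` and `b` respectively, both keys
    survive regardless of which cell completes last. Under the earlier
    snapshot-replacement scheme the later completion would have dropped
    the other cell's key.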
    
    Ownership moves into `CodeModeService` so that a subsequent PR can tie
    the stored values' lifetime to a new lifetime object on that side.
  • Preserve image detail in app-server inputs (#20693)
    ## Summary
    
    - Add optional image detail to user image inputs across core, app-server
    v2, thread history/event mapping, and the generated app-server
    schemas/types.
    - Preserve requested detail when serializing Responses image inputs:
    omitted detail stays on the existing `high` default, while explicit
    `original` keeps local images on the original-resolution path.
    - Support `high`/`original` consistently for tool image outputs,
    including MCP `codex/imageDetail`, code-mode image helpers, and
    `view_image`.
  • clean up instructions (#22543)
    Remove behavioral steering from the code-mode tool docs.
  • code-mode: Add pending-aware code mode execution (#22280)
    Introduce execute_to_pending and wait_to_pending APIs that freeze
    pending-mode runtimes until an explicit resume, while preserving the
    existing continuously-running execute path. Add runtime and service
    coverage for pending, resume, completion, and freeze behavior.
  • code-mode: carry nested tool kind through runtime (#22377)
    ## Why
    
    Code mode used nested spec lookup at execution time only to rediscover
    whether a nested tool should be invoked as a function tool or a freeform
    tool.
    
    That information is already present in the enabled tool metadata that
    code mode builds to expose `tools.*` and `ALL_TOOLS`, so re-looking it
    up from the router was redundant and kept execution coupled to a
    separate spec lookup path.
    
    ## What Changed
    
    - thread `CodeModeToolKind` through the code-mode runtime `ToolCall`
    event and `CodeModeNestedToolCall`
    - emit the nested tool kind directly from the V8 callback using the
    already-enabled tool metadata
    - build nested tool payloads from the propagated kind instead of calling
    `find_spec`
    - remove the now-unused `find_spec` plumbing from the router and
    parallel runtime helpers
    - add unit coverage for function vs freeform payload shaping and update
    affected router tests
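
    A sketch of building the payload from the propagated kind.
    `CodeModeToolKind` is named in the PR, but its variants here are
    inferred from the "function vs freeform" wording, and `ToolPayload`
    and `build_payload` are hypothetical.

    ```rust
    /// Variants assumed from the PR's "function tool or freeform tool" wording.
    #[derive(Debug, Clone, Copy, PartialEq)]
    enum CodeModeToolKind {
        Function,
        Freeform,
    }

    /// Hypothetical payload shapes for the two kinds of nested tool.
    #[derive(Debug, PartialEq)]
    enum ToolPayload {
        Function { arguments: String },
        Freeform { input: String },
    }

    /// Shape the payload from the kind carried on the tool-call event,
    /// with no separate spec lookup at execution time.
    fn build_payload(kind: CodeModeToolKind, raw: String) -> ToolPayload {
        match kind {
            CodeModeToolKind::Function => ToolPayload::Function { arguments: raw },
            CodeModeToolKind::Freeform => ToolPayload::Freeform { input: raw },
        }
    }
    ```

    Since the kind travels with the call, the router no longer needs to be
    consulted to decide how the call is encoded.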
    
    ## Testing
    
    - `cargo test -p codex-code-mode`
    - `cargo test -p codex-core code_mode::tests`
    - `cargo test -p codex-core
    extension_tool_bundles_are_model_visible_and_dispatchable`
    - `cargo test -p codex-core
    model_visible_specs_filter_deferred_dynamic_tools`
  • Enable V8 sandboxing for source-built builds (#21146)
    ## Summary
    
    This is the first PR in the V8 in-process sandboxing rollout.
    
    It adds the build-system and Rust feature plumbing needed to support
    sandboxed V8 builds, then enables sandboxing by default for the
    source-built Bazel V8 path that we control directly. It deliberately
    keeps the published `rusty_v8` artifact workflows on their current
    non-sandboxed contract so this PR can land and ship independently before
    we change any released artifacts.
    
    ## Rollout plan
    
    - [x] **PR 1: land sandbox plumbing and default source-built Bazel V8 to
    sandboxed mode**
    
    - [ ] **PR 2: publish sandbox-enabled release artifacts and add
    compatibility validation**
      - Produce sandboxed artifact pairs for every released Cargo target that
      does not already use the source-built Bazel path.
      - Add CI coverage that consumes those sandboxed artifacts and verifies:
        - `codex-v8-poc` reports sandbox enabled
        - `codex-code-mode` builds/tests against the sandboxed path
    
    - [ ] **PR 3: switch release consumers to sandboxed artifacts by
    default**
      - Update released artifact selectors/checksums.
      - Enable the Rust `v8_enable_sandbox` feature in the default release
      path.
      - Make the sandboxed artifact family the normal path for published
      builds.
    
    - [ ] **PR 4: remove rollout-only compatibility paths**
      - Remove the temporary non-sandbox release compatibility config once the
      new default has shipped and baked.
      - Keep the invariant tests permanently.
  • Prune unused code-mode globals (#20542)
    Hide Atomics, SharedArrayBuffer, and WebAssembly from the code-mode
    runtime since the harness does not expose worker support or need those
    APIs.
  • [rollout_trace] Trace tool and code-mode boundaries (#18878)
    ## Summary
    
    Extends rollout tracing across tool dispatch and code-mode runtime
    boundaries. This records canonical tool-call lifecycle events and links
    code-mode execution/wait operations back to the model-visible calls that
    caused them.
    
    ## Stack
    
    This is PR 3/5 in the rollout trace stack.
    
    - [#18876](https://github.com/openai/codex/pull/18876): Add rollout
    trace crate
    - [#18877](https://github.com/openai/codex/pull/18877): Record core
    session rollout traces
    - [#18878](https://github.com/openai/codex/pull/18878): Trace tool and
    code-mode boundaries
    - [#18879](https://github.com/openai/codex/pull/18879): Trace sessions
    and multi-agent edges
    - [#18880](https://github.com/openai/codex/pull/18880): Add debug trace
    reduction command
    
    ## Review Notes
    
    This PR is about attribution. Reviewers should focus on whether direct
    tool calls, code-mode-originated tool calls, waits, outputs, and
    cancellation boundaries are recorded with enough source information for
    deterministic reduction without coupling the reducer to live runtime
    internals.
    
    The stack remains valid after this layer: tool and code-mode traces
    reduce through the existing crate model, while the broader session and
    multi-agent relationships are added in the next PR.
  • Update image outputs to default to high detail (#18386)
    Do not assume the default `detail`.
  • refactor: use cloneable async channels for shared receivers (#18398)
    This is the first mechanical cleanup in a stack whose higher-level goal
    is to enable Clippy coverage for async guards held across `.await`
    points.
    
    The follow-up commits enable Clippy's
    [`await_holding_lock`](https://rust-lang.github.io/rust-clippy/master/index.html#await_holding_lock)
    lint and the configurable
    [`await_holding_invalid_type`](https://rust-lang.github.io/rust-clippy/master/index.html#await_holding_invalid_type)
    lint for Tokio guard types. This PR handles the cases where the
    underlying issue is not protected shared mutable state, but a
    `tokio::sync::mpsc::UnboundedReceiver` wrapped in `Arc<Mutex<_>>` so
    cloned owners can call `recv().await`.
    
    Using a mutex for that shape forces the receiver lock guard to live
    across `.await`. Switching these paths to `async-channel` gives us
    cloneable `Receiver`s, so each owner can hold a receiver handle directly
    and await messages without an async mutex guard.
    
    ## What changed
    
    - In `codex-rs/code-mode`, replace the turn-message
    `mpsc::UnboundedSender`/`UnboundedReceiver` plus `Arc<Mutex<Receiver>>`
    with `async_channel::Sender`/`Receiver`.
    - In `codex-rs/codex-api`, replace the realtime websocket event receiver
    with an `async_channel::Receiver`, allowing `RealtimeWebsocketEvents`
    clones to receive without locking.
    - Add `async-channel` as a dependency for `codex-code-mode` and
    `codex-api`, and update `Cargo.lock`.
    
    ## Verification
    
    - The split stack was verified at the final lint-enabling head with
    `just clippy`.
  • [code mode] defer mcp tools from exec description (#17287)
    ## Summary
    - hide deferred MCP/app nested tool descriptions from the `exec` prompt
    in code mode
    - add short guidance that omitted nested tools are still available
    through `ALL_TOOLS`
    - cover the code_mode_only path with an integration test that discovers
    and calls a deferred app tool
    
    ## Motivation
    `code_mode_only` exposes only top-level `exec`/`wait`, but the `exec`
    description could still include a large nested-tool reference. This
    keeps deferred nested tools callable while avoiding that prompt bloat.
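    
    A minimal sketch of the rendering rule described above, not the actual implementation. The `NestedTool` struct, its field names, the guidance wording, and the example tool names used with it are assumptions made for illustration; only the idea of omitting deferred entries and pointing at `ALL_TOOLS` comes from this PR.
    
    ```rust
    /// Illustrative tool record; the field names are assumptions, not the real spec type.
    struct NestedTool {
        name: &'static str,
        description: &'static str,
        deferred: bool,
    }
    
    /// Render the nested-tool section of the `exec` description, skipping deferred tools.
    fn render_nested_tools(tools: &[NestedTool]) -> String {
        let mut out = String::new();
        for tool in tools.iter().filter(|tool| !tool.deferred) {
            out.push_str(&format!("- {}: {}\n", tool.name, tool.description));
        }
        // Mention the escape hatch only when something was actually left out.
        if tools.iter().any(|tool| tool.deferred) {
            out.push_str("Some nested tools are omitted here but remain available through ALL_TOOLS.\n");
        }
        out
    }
    ```
    
    With one direct tool and one deferred tool, the rendered text lists only the direct tool and ends with the guidance line; with no deferred tools, the guidance line is absent.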
    
    ## Tests
    - `just fmt`
    - `just fix -p codex-code-mode`
    - `just fix -p codex-tools`
    - `cargo test -p codex-code-mode
    exec_description_mentions_deferred_nested_tools_when_available`
    - `cargo test -p codex-tools
    create_code_mode_tool_matches_expected_spec`
    - `cargo test -p codex-core
    code_mode_only_guides_all_tools_search_and_calls_deferred_app_tools`
  • Support original-detail metadata on MCP image outputs (#17714)
    ## Summary
    - honor `_meta["codex/imageDetail"] == "original"` on MCP image content
    and map it to `detail: "original"` where supported
    - strip that detail back out when the active model does not support
    original-detail image inputs
    - update code-mode `image(...)` to accept individual MCP image blocks
    - teach `js_repl` / `codex.emitImage(...)` to preserve the same hint
    from raw MCP image outputs
    - document the new `_meta` contract and add generic RMCP-backed coverage
    across protocol, core, code-mode, and js_repl paths
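    
    The honor-then-strip rule in the first two bullets can be sketched as a single decision function. This is an illustration under assumptions: `resolve_image_detail` is a hypothetical name, `meta_hint` stands in for the value of `_meta["codex/imageDetail"]`, and ignoring every hint other than `"original"` is a simplification of this sketch.
    
    ```rust
    /// Hypothetical helper: decide which `detail` value to forward for one MCP image block.
    /// `meta_hint` stands in for the value of `_meta["codex/imageDetail"]`, if present.
    fn resolve_image_detail(
        meta_hint: Option<&str>,
        model_supports_original: bool,
    ) -> Option<&'static str> {
        match meta_hint {
            // Honor the hint only when the active model accepts original-detail inputs.
            Some("original") if model_supports_original => Some("original"),
            // Otherwise strip the hint so the caller falls back to its usual default.
            _ => None,
        }
    }
    ```
    
    Returning `None` rather than a substitute value keeps the fallback decision with the caller, which is where the default `detail` is chosen.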
  • register all mcp tools with namespace (#17404)
    stacked on #17402.
    
    MCP tools returned by `tool_search` (deferred tools) get registered in
    our `ToolRegistry` with a different format than directly available
    tools. this leads to two different ways of accessing MCP tools from our
    tool catalog, only one of which works for each. fix this by registering
    all MCP tools with the namespace format, since this info is already
    available.
    
    also, direct MCP tools are registered to responsesapi without a
    namespace, while deferred MCP tools have a namespace. this means we can
    receive MCP `FunctionCall`s in both formats from namespaces. fix this by
    always registering MCP tools with namespace, regardless of deferral
    status.
    
    make code mode track `ToolName` provenance of tools so it can map the
    literal JS function name string to the correct `ToolName` for
    invocation, rather than supporting both in core.
    
    this lets us unify to a single canonical `ToolName` representation for
    each MCP tool and force everywhere to use that one, without supporting
    fallbacks.
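    
    a rough sketch of the provenance idea, assuming a simplified `ToolName` with an optional namespace; the real type, the `ToolProvenance` name, and the namespace string used in the usage note are all hypothetical. the point is that the JS function name is resolved by lookup, never by parsing it or trying a second format.
    
    ```rust
    use std::collections::HashMap;
    
    /// Illustrative stand-in for the canonical tool identity; the real `ToolName` may differ.
    #[derive(Debug, Clone, PartialEq, Eq)]
    struct ToolName {
        namespace: Option<String>,
        name: String,
    }
    
    /// Maps the literal JS function name exposed to scripts back to its canonical `ToolName`.
    #[derive(Default)]
    struct ToolProvenance {
        by_js_name: HashMap<String, ToolName>,
    }
    
    impl ToolProvenance {
        /// Record which canonical tool a JS-visible function name came from.
        fn register(&mut self, js_name: &str, tool: ToolName) {
            self.by_js_name.insert(js_name.to_string(), tool);
        }
    
        /// Resolve by lookup only; no parsing of the JS name and no fallback format.
        fn resolve(&self, js_name: &str) -> Option<&ToolName> {
            self.by_js_name.get(js_name)
        }
    }
    ```
    
    registering `mcp__rmcp__echo` against a `ToolName` with namespace `mcp__rmcp__` and name `echo` makes `resolve("mcp__rmcp__echo")` return that entry, while the bare `echo` resolves to nothing.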
  • [codex] Initialize ICU data for code mode V8 (#17709)
    Link ICU data into code mode; otherwise, locale-dependent methods cause a
    panic and a crash.
  • Add output_schema to code mode render (#17210)
    This updates code-mode tool rendering so MCP tools can surface
    structured output types from their `outputSchema`.
    
    What changed:
    - Detect MCP tool-call result wrappers from the output schema shape
    instead of relying on tool-name parsing or provenance flags.
    - Render shared TypeScript aliases once for MCP tool results
    (`CallToolResult`, `ContentBlock`, etc.) so multiple MCP tool
    declarations stay compact.
    - Type `structuredContent` from the tool definition's `outputSchema`
    instead of rendering it as `unknown`.
    - Update the shared MCP aliases to match the MCP draft `CallToolResult`
    schema more closely.
    
    Example:
    - Before: `declare const tools: { mcp__rmcp__echo(args: { env_var?:
    string; message: string; }): Promise<{ _meta?: unknown; content:
    Array<unknown>; isError?: boolean; structuredContent?: unknown; }>; };`
    - After: `declare const tools: { mcp__rmcp__echo(args: { env_var?:
    string; message: string; }): Promise<CallToolResult<{ echo: string; env:
    string | null; }>>; };`
  • Add setTimeout support to code mode (#16153)
    The implementation is less than ideal: it starts a thread per timer. A
    better approach might be to switch to tokio and use its timer
    implementation.
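    
    For reference, the thread-per-timer shape looks roughly like the sketch below. It uses only the standard library and is not the code in this PR; `set_timeout` and `run_timers` are hypothetical names, and a plain channel stands in for whatever delivers the callback to the runtime.
    
    ```rust
    use std::sync::mpsc::{self, Sender};
    use std::thread::{self, JoinHandle};
    use std::time::Duration;
    
    /// Hypothetical helper: one OS thread per timer, which sleeps and then reports its id.
    fn set_timeout(timer_id: u64, delay: Duration, fired: Sender<u64>) -> JoinHandle<()> {
        thread::spawn(move || {
            thread::sleep(delay);
            // The receiver may already be gone (for example, the script finished); ignore that.
            let _ = fired.send(timer_id);
        })
    }
    
    /// Schedule every `(id, delay)` pair and collect ids in the order the timers fire.
    fn run_timers(timers: &[(u64, Duration)]) -> Vec<u64> {
        let (tx, rx) = mpsc::channel();
        let handles: Vec<_> = timers
            .iter()
            .map(|&(id, delay)| set_timeout(id, delay, tx.clone()))
            .collect();
        drop(tx); // Close our sender so `rx` ends once every timer thread has finished.
        let order: Vec<u64> = rx.iter().collect();
        for handle in handles {
            handle.join().expect("timer thread panicked");
        }
        order
    }
    ```
    
    The cost is visible in the signature: every pending timer holds a whole thread for the length of its delay, which is the overhead a shared timer wheel would avoid.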
  • chore: clean up argument-comment lint and roll out all-target CI on macOS (#16054)
    ## Why
    
    `argument-comment-lint` was green in CI even though the repo still had
    many uncommented literal arguments. The main gap was target coverage:
    the repo wrapper did not force Cargo to inspect test-only call sites, so
    examples like the `latest_session_lookup_params(true, ...)` tests in
    `codex-rs/tui_app_server/src/lib.rs` never entered the blocking CI path.
    
    This change cleans up the existing backlog, makes the default repo lint
    path cover all Cargo targets, and starts rolling that stricter CI
    enforcement out on the platform where it is currently validated.
    
    ## What changed
    
    - mechanically fixed existing `argument-comment-lint` violations across
    the `codex-rs` workspace, including tests, examples, and benches
    - updated `tools/argument-comment-lint/run-prebuilt-linter.sh` and
    `tools/argument-comment-lint/run.sh` so non-`--fix` runs default to
    `--all-targets` unless the caller explicitly narrows the target set
    - fixed both wrappers so forwarded cargo arguments after `--` are
    preserved with a single separator
    - documented the new default behavior in
    `tools/argument-comment-lint/README.md`
    - updated `rust-ci` so the macOS lint lane keeps the plain wrapper
    invocation and therefore enforces `--all-targets`, while Linux and
    Windows temporarily pass `-- --lib --bins`
    
    That temporary CI split keeps the stricter all-targets check where it is
    already cleaned up, while leaving room to finish the remaining Linux-
    and Windows-specific target-gated cleanup before enabling
    `--all-targets` on those runners. The Linux and Windows failures on the
    intermediate revision were caused by the wrapper forwarding bug, not by
    additional lint findings in those lanes.
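    
    The wrappers themselves are shell scripts, but the argument handling described above can be sketched in Rust to make the two rules concrete: default to `--all-targets` on non-`--fix` runs unless the caller narrows the target set, and emit a single `--` before forwarded cargo arguments. Everything here is an assumption for illustration, including the `build_lint_args` name, the list of flags treated as narrowing, and where `--all-targets` is placed.
    
    ```rust
    /// Cargo flags that count as "explicitly narrowing the target set" in this sketch (an assumption).
    const TARGET_FLAGS: [&str; 6] = [
        "--all-targets", "--lib", "--bins", "--tests", "--examples", "--benches",
    ];
    
    /// Build the linter invocation from the caller's arguments.
    fn build_lint_args(caller_args: &[&str]) -> Vec<String> {
        // Split at the first `--`: wrapper options before it, forwarded cargo arguments after it.
        let split = caller_args.iter().position(|arg| *arg == "--");
        let (wrapper_args, cargo_args) = match split {
            Some(index) => (&caller_args[..index], &caller_args[index + 1..]),
            None => (caller_args, &caller_args[caller_args.len()..]),
        };
    
        let is_fix_run = wrapper_args.contains(&"--fix");
        let narrows_targets = cargo_args.iter().any(|arg| TARGET_FLAGS.contains(arg));
    
        let mut forwarded: Vec<String> = Vec::new();
        if !is_fix_run && !narrows_targets {
            forwarded.push("--all-targets".to_string());
        }
        forwarded.extend(cargo_args.iter().map(|arg| arg.to_string()));
    
        let mut out: Vec<String> = wrapper_args.iter().map(|arg| arg.to_string()).collect();
        if !forwarded.is_empty() {
            // Emit one separator here; the forwarded arguments follow unchanged.
            out.push("--".to_string());
            out.extend(forwarded);
        }
        out
    }
    ```
    
    Under these assumptions, a plain invocation yields `-- --all-targets`, while `-- --lib --bins` passes through unchanged, matching the macOS and Linux/Windows lanes described above.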
    
    ## Validation
    
    - `bash -n tools/argument-comment-lint/run.sh`
    - `bash -n tools/argument-comment-lint/run-prebuilt-linter.sh`
    - shell-level wrapper forwarding check for `-- --lib --bins`
    - shell-level wrapper forwarding check for `-- --tests`
    - `just argument-comment-lint`
    - `cargo test` in `tools/argument-comment-lint`
    - `cargo test -p codex-terminal-detection`
    
    ## Follow-up
    
    - Clean up remaining Linux-only target-gated callsites, then switch the
    Linux lint lane back to the plain wrapper invocation.
    - Clean up remaining Windows-only target-gated callsites, then switch
    the Windows lint lane back to the plain wrapper invocation.
  • Code mode on v8 (#15276)
    Moves Code Mode to a new crate with no dependencies on codex. This
    crate encodes the code mode semantics that we want for lifetime,
    mounting, and tool calling.
    
    The model-facing surface is mostly unchanged. `exec` still runs raw
    JavaScript, `wait` still resumes or terminates a `cell_id`, nested tools
    are still available through `tools.*`, and helpers like `text`, `image`,
    `store`, `load`, `notify`, `yield_control`, and `exit` still exist.
    
    The major change is underneath that surface:
    
    - Old code mode was an external Node runtime.
    - New code mode is an in-process V8 runtime embedded directly in Rust.
    - Old code mode managed cells inside a long-lived Node runner process.
    - New code mode manages cells in Rust, with one V8 runtime thread per
    active `exec`.
    - Old code mode used JSON protocol messages over child stdin/stdout plus
    Node worker-thread messages.
    - New code mode uses Rust channels and direct V8 callbacks/events.
    
    This PR also fixes the two migration regressions that fell out of that
    substrate change:
    
    - `wait { terminate: true }` now waits for the V8 runtime to actually
    stop before reporting termination.
    - synchronous top-level `exit()` now succeeds again instead of surfacing
    as a script error.
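    
    The first fix amounts to joining the runtime thread before reporting termination. The sketch below shows that ordering with a plain worker thread and a stop flag; it is not the V8 integration, and `Cell`, `spawn_cell`, and `terminate` are hypothetical names used only here.
    
    ```rust
    use std::sync::atomic::{AtomicBool, Ordering};
    use std::sync::Arc;
    use std::thread::{self, JoinHandle};
    use std::time::Duration;
    
    /// Illustrative stand-in for one running cell: a worker thread plus a stop flag.
    struct Cell {
        stop: Arc<AtomicBool>,
        running: Arc<AtomicBool>,
        worker: JoinHandle<()>,
    }
    
    fn spawn_cell() -> Cell {
        let stop = Arc::new(AtomicBool::new(false));
        let running = Arc::new(AtomicBool::new(true));
        let (stop_flag, running_flag) = (Arc::clone(&stop), Arc::clone(&running));
        let worker = thread::spawn(move || {
            // Stand-in for script execution: poll until termination is requested.
            while !stop_flag.load(Ordering::Acquire) {
                thread::sleep(Duration::from_millis(1));
            }
            running_flag.store(false, Ordering::Release);
        });
        Cell { stop, running, worker }
    }
    
    /// Request termination, then block until the worker has exited before reporting it.
    /// Returns whether the worker was still marked running at report time (expected: false).
    fn terminate(cell: Cell) -> bool {
        cell.stop.store(true, Ordering::Release);
        cell.worker.join().expect("runtime thread panicked");
        cell.running.load(Ordering::Acquire)
    }
    ```
    
    Because `terminate` only returns after `join`, a caller can never observe a "terminated" result while the worker is still executing.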
    
    ---
    
    - `core/src/tools/code_mode/*` is now mostly an adapter layer for the
    public `exec` / `wait` tools.
    - `code-mode/src/service.rs` owns cell sessions and async control flow
    in Rust.
    - `code-mode/src/runtime/*.rs` owns the embedded V8 isolate and
    JavaScript execution.
    - each `exec` spawns a dedicated runtime thread plus a Rust
    session-control task.
    - helper globals are installed directly into the V8 context instead of
    being injected through a source prelude.
    - helper modules like `tools.js` and `@openai/code_mode` are synthesized
    through V8 module resolution callbacks in Rust.
    
    ---
    
    Also added a benchmark that measures the initialization and execution
    overhead of a code mode environment:
    ```
    $ cargo bench -p codex-code-mode --bench exec_overhead -- --samples 30 --warm-iterations 25 --tool-counts 0,32,128
    Finished `bench` profile [optimized] target(s) in 0.18s
         Running benches/exec_overhead.rs (target/release/deps/exec_overhead-008c440d800545ae)
    exec_overhead: samples=30, warm_iterations=25, tool_counts=[0, 32, 128]
    scenario       tools samples    warmups      iters      mean/exec       p95/exec       rssΔ p50       rssΔ max
    cold_exec          0      30          0          1         1.13ms         1.20ms        8.05MiB        8.06MiB
    warm_exec          0      30          1         25       473.43us       512.49us      912.00KiB        1.33MiB
    cold_exec         32      30          0          1         1.03ms         1.15ms        8.08MiB        8.11MiB
    warm_exec         32      30          1         25       509.73us       545.76us      960.00KiB        1.30MiB
    cold_exec        128      30          0          1         1.14ms         1.19ms        8.30MiB        8.34MiB
    warm_exec        128      30          1         25       575.08us       591.03us      736.00KiB      864.00KiB
    memory uses a fresh-process max RSS delta for each scenario
    ```
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>