9 Commits

  • [codex] add process-owned code-mode session client (#30112)
    ## Summary
    
    - add `ProcessOwnedCodeModeSessionProvider` and logical session
    generation/rebinding state
    - add the supervised child-process connection, reader/writer tasks, and
    driver state machine
    - make dropped execute/wait/open callers cancellation-safe with explicit
    ownership handoff and durable cleanup
    - validate cell/delegate lifecycle state and reject invalid protocol
    transitions
    - add end-to-end stdio coverage for delegates, cancellation, frame
    limits, child loss, stale generations, replacement, and long-lived
    sessions
    
    ## Why
    
    This final stage exposes the process-owned client only after the wire
    protocol, host-safe runtime, and standalone host are independently in
    place. Transport failure is fail-stop: the client closes local state,
    cancels callbacks, reaps the child, and lazily rebuilds a fresh host
    generation rather than transactionally recovering the old connection.
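
    As a rough illustration of the fail-stop policy described above, the sketch below drops the connection on failure and only builds a new host generation when the next request arrives. All names here (`HostSupervisor`, `HostConnection`, `RequestError`) are hypothetical and are not taken from the crate; child-process reaping and callback cancellation are omitted.

    ```rust
    // Hypothetical names throughout; this is not the crate's actual API.

    /// Stand-in for a live connection to one host child process.
    #[derive(Debug)]
    struct HostConnection {
        generation: u64,
    }

    /// Owns at most one live host connection and rebuilds it lazily.
    #[derive(Debug, Default)]
    struct HostSupervisor {
        next_generation: u64,
        current: Option<HostConnection>,
    }

    #[derive(Debug, PartialEq)]
    enum RequestError {
        /// The caller's handle refers to a generation that was torn down.
        StaleGeneration { requested: u64, current: u64 },
    }

    impl HostSupervisor {
        /// Returns the live generation, creating a fresh one if none exists.
        fn ensure_host(&mut self) -> u64 {
            if let Some(conn) = &self.current {
                return conn.generation;
            }
            self.next_generation += 1;
            self.current = Some(HostConnection {
                generation: self.next_generation,
            });
            self.next_generation
        }

        /// Fail-stop: discard the connection; nothing is rebuilt until the
        /// next request calls `ensure_host`.
        fn on_transport_failure(&mut self) {
            self.current = None;
        }

        /// Rejects handles minted against an older generation.
        fn check_generation(&mut self, requested: u64) -> Result<(), RequestError> {
            let current = self.ensure_host();
            if requested == current {
                Ok(())
            } else {
                Err(RequestError::StaleGeneration { requested, current })
            }
        }
    }
    ```

    In this sketch no state from the failed connection is carried over, so a caller holding an old generation number gets an explicit error instead of silently talking to the replacement host.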
    
    ## Stack
    
    This is **4 of 4** in the process-owned code-mode session stack.
    
    - Depends on #30111
    - Full stack: #30108 → #30110 → #30111 → this PR
    
    ## Validation
    
    - `just test -p codex-code-mode -p codex-code-mode-host` — 86 passed
    - `just fix -p codex-code-mode`
    - `just fix -p codex-code-mode-host`
    - `just bazel-lock-update`
    - `just bazel-lock-check`
    - `bazel test //codex-rs/code-mode:code-mode-unit-tests
    //codex-rs/code-mode-host:code-mode-host-unit-tests
    //codex-rs/code-mode-host:code-mode-host-stdio-test
    //codex-rs/code-mode-protocol:code-mode-protocol-unit-tests` — 4/4
    passed
    - `just fmt`
  • [codex] add code-mode host failure supervision hooks (#30110)
    ## Why
    
    A process host should be discarded and rebuilt after critical actor or
    V8 failure, while the existing in-process production path must keep its
    current cell-error semantics. This change establishes that failure
    boundary without adding the host process or remote client.
    
    ## What changed
    
    - add optional task-failure supervision to the transport-neutral
    code-mode session runtime
    - report Tokio cell-actor failures and V8 runtime-thread panics to a
    host-provided fail-stop handler
    - preserve the existing handler-less in-process behavior
    - make host-owned cell ID allocation fail before numeric wraparound
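
    One way to make ID allocation fail before numeric wraparound is `checked_add`. The sketch below assumes a `u64` counter; the type name `CellIdAllocator` and the error type are hypothetical and may not match the crate.

    ```rust
    /// Hypothetical allocator; the name and the `u64` width are assumptions.
    #[derive(Debug, Default)]
    struct CellIdAllocator {
        next: u64,
    }

    #[derive(Debug, PartialEq)]
    struct CellIdsExhausted;

    impl CellIdAllocator {
        /// Returns the next unused ID, or an error once the counter cannot advance.
        fn allocate(&mut self) -> Result<u64, CellIdsExhausted> {
            let id = self.next;
            // `checked_add` returns `None` on overflow instead of wrapping to 0,
            // so an ID can never be reused after the counter saturates.
            self.next = id.checked_add(1).ok_or(CellIdsExhausted)?;
            Ok(id)
        }
    }
    ```

    Once the counter saturates, every later call keeps returning the error, which fits the fail-stop approach of discarding and rebuilding the host.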
    
    ## Follow-up
    
    The V8 panic signal surfaced here should also be consumed by the
    `InProcessCodeModeSession` manager in a future change so it can fail the
    affected cell. This PR intentionally leaves the handler-less in-process
    behavior unchanged while putting the required panic tracking in place.
    
    ## Stack
    
    This is **2 of 4** in the process-owned code-mode session stack.
    
    - #30108 is merged into `main`
    - The next PR targets this branch
    
    ## Validation
    
    - `just test -p codex-code-mode` — 53 passed
    - `just argument-comment-lint -p codex-code-mode`
    - `just fix -p codex-code-mode`
  • code-mode standalone: extract protocol and add host crate (#27724)
    This is phase 1 of a 4-phase stack:
    1. **Add protocol and host crates for new IPC code mode implementation**
    2. Create the new standalone binary
    3. Create a new IPC `CodeModeSessionProvider` to use new binary
    4. Remove v8 from core and only use IPC provider
    
    
    ## Add protocol and host crates for new IPC code mode implementation
    Establish a clean process boundary without changing the existing
    in-process behavior.
    
    - Add the codex-code-mode-protocol crate for shared session, runtime,
    response, and tool-definition types.
    - Move protocol-facing code out of the V8-backed implementation.
    - Add a buildable codex-code-mode-host crate as the foundation for the
    standalone process.
    - Keep the existing in-process runtime as the active implementation.
  • code-mode: introduce durable session interface (#24180)
    ## Summary
    
    Introduce a `CodeModeSession` interface for executing and managing
    code-mode cells.
    
    This moves cell lifecycle, callback delegation, termination, and
    shutdown behind a session abstraction. The existing in-process
    implementation remains in use, and an external-process implementation
    can later be added behind the same interface.
    
    A Codex session owns one `CodeModeSession`, which in turn owns its
    running cells and stored code-mode state. Each cell is represented to
    the caller as a `StartedCell`, exposing its cell ID and initial
    response.
    
    It also introduces a `CodeModeSessionDelegate` callback interface. A
    session uses the delegate to invoke nested host tools and emit
    notifications while a cell is running, allowing the runtime to
    communicate with its owning Codex session without depending directly on
    core turn handling.
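
    The ownership and callback relationships above might be sketched as follows. Only the names `CodeModeSession`, `CodeModeSessionDelegate`, and `StartedCell` come from this PR; every method, field, and type width is an assumption, the toy `EchoSession` and `RecordingDelegate` are hypothetical, and the sketch is synchronous for brevity where the real interface may be async.

    ```rust
    use std::cell::RefCell;

    /// Callback surface a session uses to reach its owning Codex session.
    /// Method names and signatures are illustrative assumptions.
    trait CodeModeSessionDelegate {
        fn call_tool(&self, name: &str, input: &str) -> Result<String, String>;
        fn notify(&self, message: &str);
    }

    /// What a caller gets back after starting a cell (field types assumed).
    #[derive(Debug, PartialEq)]
    struct StartedCell {
        cell_id: u64,
        initial_response: String,
    }

    /// Session abstraction; an in-process or external-process backend
    /// could implement it without the caller changing.
    trait CodeModeSession {
        fn execute(&mut self, source: &str, delegate: &dyn CodeModeSessionDelegate) -> StartedCell;
    }

    /// Toy backend that "runs" a cell by forwarding its source to one nested tool.
    #[derive(Default)]
    struct EchoSession {
        next_cell_id: u64,
    }

    impl CodeModeSession for EchoSession {
        fn execute(&mut self, source: &str, delegate: &dyn CodeModeSessionDelegate) -> StartedCell {
            let cell_id = self.next_cell_id;
            self.next_cell_id += 1;
            delegate.notify("cell started");
            let initial_response = delegate
                .call_tool("echo", source)
                .unwrap_or_else(|err| format!("tool error: {err}"));
            StartedCell { cell_id, initial_response }
        }
    }

    /// Toy delegate that upper-cases tool input and records notifications.
    #[derive(Default)]
    struct RecordingDelegate {
        notifications: RefCell<Vec<String>>,
    }

    impl CodeModeSessionDelegate for RecordingDelegate {
        fn call_tool(&self, _name: &str, input: &str) -> Result<String, String> {
            Ok(input.to_uppercase())
        }

        fn notify(&self, message: &str) {
            self.notifications.borrow_mut().push(message.to_string());
        }
    }
    ```

    The session only sees the delegate trait, so in this sketch the runtime has no direct dependency on whatever handles turns on the other side.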
    
    <img width="2121" height="1001" alt="image"
    src="https://github.com/user-attachments/assets/c349a819-2a59-485c-bda4-2caf68ac4c31"
    />
  • Enable V8 sandboxing for source-built builds (#21146)
    ## Summary
    
    This is the first PR in the V8 in-process sandboxing rollout.
    
    It adds the build-system and Rust feature plumbing needed to support
    sandboxed V8 builds, then enables sandboxing by default for the
    source-built Bazel V8 path that we control directly. It deliberately
    keeps the published `rusty_v8` artifact workflows on their current
    non-sandboxed contract so this PR can land and ship independently before
    we change any released artifacts.
    
    ## Rollout plan
    
    - [x] **PR 1: land sandbox plumbing and default source-built Bazel V8 to
    sandboxed mode**
    
    - [ ] **PR 2: publish sandbox-enabled release artifacts and add
    compatibility validation**
      - Produce sandboxed artifact pairs for every released Cargo target that
        does not already use the source-built Bazel path.
      - Add CI coverage that consumes those sandboxed artifacts and verifies:
        - `codex-v8-poc` reports sandbox enabled
        - `codex-code-mode` builds/tests against the sandboxed path
    
    - [ ] **PR 3: switch release consumers to sandboxed artifacts by
    default**
      - Update released artifact selectors/checksums.
      - Enable the Rust `v8_enable_sandbox` feature in the default release
        path.
      - Make the sandboxed artifact family the normal path for published
        builds.
    
    - [ ] **PR 4: remove rollout-only compatibility paths**
      - Remove the temporary non-sandbox release compatibility config once the
        new default has shipped and baked.
      - Keep the invariant tests permanently.
  • refactor: use cloneable async channels for shared receivers (#18398)
    This is the first mechanical cleanup in a stack whose higher-level goal
    is to enable Clippy coverage for async guards held across `.await`
    points.
    
    The follow-up commits enable Clippy's
    [`await_holding_lock`](https://rust-lang.github.io/rust-clippy/master/index.html#await_holding_lock)
    lint and the configurable
    [`await_holding_invalid_type`](https://rust-lang.github.io/rust-clippy/master/index.html#await_holding_invalid_type)
    lint for Tokio guard types. This PR handles the cases where the
    underlying issue is not protected shared mutable state, but a
    `tokio::sync::mpsc::UnboundedReceiver` wrapped in `Arc<Mutex<_>>` so
    cloned owners can call `recv().await`.
    
    Using a mutex for that shape forces the receiver lock guard to live
    across `.await`. Switching these paths to `async-channel` gives us
    cloneable `Receiver`s, so each owner can hold a receiver handle directly
    and await messages without an async mutex guard.
    
    ## What changed
    
    - In `codex-rs/code-mode`, replace the turn-message
    `mpsc::UnboundedSender`/`UnboundedReceiver` plus `Arc<Mutex<Receiver>>`
    with `async_channel::Sender`/`Receiver`.
    - In `codex-rs/codex-api`, replace the realtime websocket event receiver
    with an `async_channel::Receiver`, allowing `RealtimeWebsocketEvents`
    clones to receive without locking.
    - Add `async-channel` as a dependency for `codex-code-mode` and
    `codex-api`, and update `Cargo.lock`.
    
    ## Verification
    
    - The split stack was verified at the final lint-enabling head with
    `just clippy`.
  • register all mcp tools with namespace (#17404)
    stacked on #17402.
    
    MCP tools returned by `tool_search` (deferred tools) get registered in
    our `ToolRegistry` with a different format than directly available
    tools. this leads to two different ways of accessing MCP tools from our
    tool catalog, only one of which works for each. fix this by registering
    all MCP tools with the namespace format, since this info is already
    available.
    
    also, direct MCP tools are registered to responsesapi without a
    namespace, while deferred MCP tools have a namespace. this means we can
    receive MCP `FunctionCall`s in both formats from namespaces. fix this by
    always registering MCP tools with namespace, regardless of deferral
    status.
    
    make code mode track `ToolName` provenance of tools so it can map the
    literal JS function name string to the correct `ToolName` for
    invocation, rather than supporting both in core.
    
    this lets us unify to a single canonical `ToolName` representation for
    each MCP tool and force everywhere to use that one, without supporting
    fallbacks.
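
    a minimal sketch of the provenance idea, assuming a simple exact-match map from the JS function name to its canonical tool name. the `ToolName` fields, the `ToolProvenance` type, and the `mcp__docs__search` naming are hypothetical and are not taken from the codebase.

    ```rust
    use std::collections::HashMap;

    /// Hypothetical stand-in for the canonical tool identifier.
    #[derive(Debug, Clone, PartialEq, Eq)]
    struct ToolName {
        namespace: Option<String>,
        name: String,
    }

    /// Maps the literal JS function name exposed to scripts back to its `ToolName`.
    #[derive(Debug, Default)]
    struct ToolProvenance {
        by_js_name: HashMap<String, ToolName>,
    }

    impl ToolProvenance {
        /// Records which canonical tool a JS function name was generated from.
        fn register(&mut self, js_name: &str, tool: ToolName) {
            self.by_js_name.insert(js_name.to_string(), tool);
        }

        /// Exact lookup only: the JS name is never parsed or guessed at.
        fn resolve(&self, js_name: &str) -> Option<&ToolName> {
            self.by_js_name.get(js_name)
        }
    }
    ```

    because the lookup is exact, an unknown or differently formatted name resolves to nothing instead of falling back to a second format.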
  • [codex] Initialize ICU data for code mode V8 (#17709)
    Link ICU data into code mode, otherwise locale-dependent methods cause a
    panic and a crash.
  • Code mode on v8 (#15276)
    Moves Code Mode to a new crate with no dependencies on codex. This
    crate encodes the code mode semantics that we want for lifetime,
    mounting, and tool calling.
    
    The model-facing surface is mostly unchanged. `exec` still runs raw
    JavaScript, `wait` still resumes or terminates a `cell_id`, nested tools
    are still available through `tools.*`, and helpers like `text`, `image`,
    `store`, `load`, `notify`, `yield_control`, and `exit` still exist.
    
    The major change is underneath that surface:
    
    - Old code mode was an external Node runtime.
    - New code mode is an in-process V8 runtime embedded directly in Rust.
    - Old code mode managed cells inside a long-lived Node runner process.
    - New code mode manages cells in Rust, with one V8 runtime thread per
    active `exec`.
    - Old code mode used JSON protocol messages over child stdin/stdout plus
    Node worker-thread messages.
    - New code mode uses Rust channels and direct V8 callbacks/events.
    
    This PR also fixes the two migration regressions that fell out of that
    substrate change:
    
    - `wait { terminate: true }` now waits for the V8 runtime to actually
    stop before reporting termination.
    - synchronous top-level `exit()` now succeeds again instead of surfacing
    as a script error.
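
    The first fix comes down to joining the runtime thread before reporting termination. A minimal sketch, assuming a plain OS thread and a stop flag stand in for the V8 runtime thread and its termination request (`RunningCell`, `spawn_cell`, and `terminate` are hypothetical names):

    ```rust
    use std::sync::atomic::{AtomicBool, Ordering};
    use std::sync::Arc;
    use std::thread::{self, JoinHandle};
    use std::time::Duration;

    /// Hypothetical handle to one running cell; a plain thread stands in
    /// for the dedicated V8 runtime thread.
    struct RunningCell {
        stop: Arc<AtomicBool>,
        runtime: JoinHandle<u64>,
    }

    /// Starts a worker that loops until the stop flag is set.
    fn spawn_cell() -> RunningCell {
        let stop = Arc::new(AtomicBool::new(false));
        let flag = Arc::clone(&stop);
        let runtime = thread::spawn(move || {
            let mut ticks = 0u64;
            while !flag.load(Ordering::Acquire) {
                ticks += 1;
                thread::sleep(Duration::from_millis(1));
            }
            ticks
        });
        RunningCell { stop, runtime }
    }

    /// Requests termination, then blocks until the runtime thread has exited.
    /// Only after `join` returns is it safe to report the cell as terminated.
    fn terminate(cell: RunningCell) -> Result<u64, String> {
        cell.stop.store(true, Ordering::Release);
        cell.runtime
            .join()
            .map_err(|_| "runtime thread panicked".to_string())
    }
    ```

    Reporting termination as soon as the flag is set would leave a window in which the runtime is still executing, which is the kind of regression the fix above closes.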
    
    ---
    
    - `core/src/tools/code_mode/*` is now mostly an adapter layer for the
    public `exec` / `wait` tools.
    - `code-mode/src/service.rs` owns cell sessions and async control flow
    in Rust.
    - `code-mode/src/runtime/*.rs` owns the embedded V8 isolate and
    JavaScript execution.
    - each `exec` spawns a dedicated runtime thread plus a Rust
    session-control task.
    - helper globals are installed directly into the V8 context instead of
    being injected through a source prelude.
    - helper modules like `tools.js` and `@openai/code_mode` are synthesized
    through V8 module resolution callbacks in Rust.
    
    ---
    
    Also added a benchmark for showing the speed of init and use of a code
    mode env:
    ```
    $ cargo bench -p codex-code-mode --bench exec_overhead -- --samples 30 --warm-iterations 25 --tool-counts 0,32,128
    Finished `bench` profile [optimized] target(s) in 0.18s
         Running benches/exec_overhead.rs (target/release/deps/exec_overhead-008c440d800545ae)
    exec_overhead: samples=30, warm_iterations=25, tool_counts=[0, 32, 128]
    scenario       tools samples    warmups      iters      mean/exec       p95/exec       rssΔ p50       rssΔ max
    cold_exec          0      30          0          1         1.13ms         1.20ms        8.05MiB        8.06MiB
    warm_exec          0      30          1         25       473.43us       512.49us      912.00KiB        1.33MiB
    cold_exec         32      30          0          1         1.03ms         1.15ms        8.08MiB        8.11MiB
    warm_exec         32      30          1         25       509.73us       545.76us      960.00KiB        1.30MiB
    cold_exec        128      30          0          1         1.14ms         1.19ms        8.30MiB        8.34MiB
    warm_exec        128      30          1         25       575.08us       591.03us      736.00KiB      864.00KiB
    memory uses a fresh-process max RSS delta for each scenario
    ```
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>