Commit Graph

9 Commits

  • code-mode: move cell state into library actor (#28599)
    A code-mode cell is a single JavaScript execution that can produce
    output, call tools, wait for asynchronous work, resume, or be
    terminated. This PR extracts the existing per-cell run loop into a
    dedicated actor that owns the cell’s lifecycle state. It is primarily an
    ownership change rather than a new lifecycle contract: existing behavior
    now has one clear implementation boundary.
    
    ### Architecture
    The session service remains responsible for session-wide concerns:
    allocating cell IDs, storing shared values, creating cells, and routing
    requests to them.
    
    Once a cell is created, its execution state belongs to its actor.
    Callers interact with the actor through a handle. The actor receives two
    kinds of input: runtime events and control requests.
    
    A single event loop serializes these inputs and applies the lifecycle
    rules. It tracks the current observer—the caller waiting for an
    update—along with accumulated output, outstanding callbacks, runtime
    state, yield deadlines, and termination progress. Observation,
    termination, completion, and cleanup therefore have one consistent
    owner.
    
    When the runtime has no immediately runnable work and is waiting only on
    timers or tool results, the actor can return accumulated output and
    information about outstanding tool calls while keeping the cell
    available to resume. On completion or termination, it performs the
    appropriate callback cleanup before publishing the final result and
    removing the cell from the session.
    
    A small host interface connects the actor to session-owned facilities
    such as tool dispatch, notifications, stored values, and final cell
    removal, keeping those responsibilities outside the actor itself.
    
    ### Why
    Previously, cell lifecycle state and coordination lived alongside
    session management. The actor boundary makes each cell a self-contained
    state machine with a single writer, while the service becomes a registry
    and adapter around it.
    
    This makes lifecycle behavior easier to reason about and test in
    isolation. It also establishes a clean boundary for later changing where
    cells run or how they communicate without recreating their lifecycle
    rules.
  • code-mode standalone: extract protocol and add host crate (#27724)
    This is phase 1 of a 4 phase stack:
    1. **Add protocol and host crates for new IPC code mode implementation**
    2. Create the new standalone binary
    3. Create a new IPC `CodeModeSessionProvider` to use new binary
    4. Remove v8 from core and only use IPC provider
    
    
    ## Add protocol and host crates for new IPC code mode implementation
    Establish a clean process boundary without changing the existing
    in-process behavior.
    
    - Add the codex-code-mode-protocol crate for shared session, runtime,
    response, and tool-definition types.
    - Move protocol-facing code out of the V8-backed implementation.
    - Add a buildable codex-code-mode-host crate as the foundation for the
    standalone process.
    - Keep the existing in-process runtime as the active implementation.
  • code-mode: introduce durable session interface (#24180)
    ## Summary
    
    Introduce a `CodeModeSession` interface for executing and managing
    code-mode cells.
    
    This moves cell lifecycle, callback delegation, termination, and
    shutdown behind a session abstraction, while continuing to use the
    existing in-process implementation, and the ability to implement an
    external process one behind this interface.
    
    A Codex session owns one `CodeModeSession`, which in turn owns its
    running cells and stored code-mode state. Each cell is represented to
    the caller as a `StartedCell`, exposing its cell ID and initial
    response.
    
    It also introduces a `CodeModeSessionDelegate` callback interface. A
    session uses the delegate to invoke nested host tools and emit
    notifications while a cell is running, allowing the runtime to
    communicate with its owning Codex session without depending directly on
    core turn handling.
    
    <img width="2121" height="1001" alt="image"
    src="https://github.com/user-attachments/assets/c349a819-2a59-485c-bda4-2caf68ac4c31"
    />
  • code-mode: Add pending-aware code mode execution (#22280)
    Introduce execute_to_pending and wait_to_pending APIs that freeze
    pending-mode runtimes until an explicit resume, while preserving the
    existing continuously-running execute path. Add runtime and service
    coverage for pending, resume, completion, and freeze behavior.
  • [rollout_trace] Trace tool and code-mode boundaries (#18878)
    ## Summary
    
    Extends rollout tracing across tool dispatch and code-mode runtime
    boundaries. This records canonical tool-call lifecycle events and links
    code-mode execution/wait operations back to the model-visible calls that
    caused them.
    
    ## Stack
    
    This is PR 3/5 in the rollout trace stack.
    
    - [#18876](https://github.com/openai/codex/pull/18876): Add rollout
    trace crate
    - [#18877](https://github.com/openai/codex/pull/18877): Record core
    session rollout traces
    - [#18878](https://github.com/openai/codex/pull/18878): Trace tool and
    code-mode boundaries
    - [#18879](https://github.com/openai/codex/pull/18879): Trace sessions
    and multi-agent edges
    - [#18880](https://github.com/openai/codex/pull/18880): Add debug trace
    reduction command
    
    ## Review Notes
    
    This PR is about attribution. Reviewers should focus on whether direct
    tool calls, code-mode-originated tool calls, waits, outputs, and
    cancellation boundaries are recorded with enough source information for
    deterministic reduction without coupling the reducer to live runtime
    internals.
    
    The stack remains valid after this layer: tool and code-mode traces
    reduce through the existing crate model, while the broader session and
    multi-agent relationships are added in the next PR.
  • Update image outputs to default to high detail (#18386)
    Do not assume the default `detail`.
  • Add output_schema to code mode render (#17210)
    This updates code-mode tool rendering so MCP tools can surface
    structured output types from their `outputSchema`.
    
    What changed:
    - Detect MCP tool-call result wrappers from the output schema shape
    instead of relying on tool-name parsing or provenance flags.
    - Render shared TypeScript aliases once for MCP tool results
    (`CallToolResult`, `ContentBlock`, etc.) so multiple MCP tool
    declarations stay compact.
    - Type `structuredContent` from the tool definition's `outputSchema`
    instead of rendering it as `unknown`.
    - Update the shared MCP aliases to match the MCP draft `CallToolResult`
    schema more closely.
    
    Example:
    - Before: `declare const tools: { mcp__rmcp__echo(args: { env_var?:
    string; message: string; }): Promise<{ _meta?: unknown; content:
    Array<unknown>; isError?: boolean; structuredContent?: unknown; }>; };`
    - After: `declare const tools: { mcp__rmcp__echo(args: { env_var?:
    string; message: string; }): Promise<CallToolResult<{ echo: string; env:
    string | null; }>>; };`
  • Code mode on v8 (#15276)
    Moves Code Mode to a new crate with no dependencies on codex. This
    create encodes the code mode semantics that we want for lifetime,
    mounting, tool calling.
    
    The model-facing surface is mostly unchanged. `exec` still runs raw
    JavaScript, `wait` still resumes or terminates a `cell_id`, nested tools
    are still available through `tools.*`, and helpers like `text`, `image`,
    `store`, `load`, `notify`, `yield_control`, and `exit` still exist.
    
    The major change is underneath that surface:
    
    - Old code mode was an external Node runtime.
    - New code mode is an in-process V8 runtime embedded directly in Rust.
    - Old code mode managed cells inside a long-lived Node runner process.
    - New code mode manages cells in Rust, with one V8 runtime thread per
    active `exec`.
    - Old code mode used JSON protocol messages over child stdin/stdout plus
    Node worker-thread messages.
    - New code mode uses Rust channels and direct V8 callbacks/events.
    
    This PR also fixes the two migration regressions that fell out of that
    substrate change:
    
    - `wait { terminate: true }` now waits for the V8 runtime to actually
    stop before reporting termination.
    - synchronous top-level `exit()` now succeeds again instead of surfacing
    as a script error.
    
    ---
    
    - `core/src/tools/code_mode/*` is now mostly an adapter layer for the
    public `exec` / `wait` tools.
    - `code-mode/src/service.rs` owns cell sessions and async control flow
    in Rust.
    - `code-mode/src/runtime/*.rs` owns the embedded V8 isolate and
    JavaScript execution.
    - each `exec` spawns a dedicated runtime thread plus a Rust
    session-control task.
    - helper globals are installed directly into the V8 context instead of
    being injected through a source prelude.
    - helper modules like `tools.js` and `@openai/code_mode` are synthesized
    through V8 module resolution callbacks in Rust.
    
    ---
    
    Also added a benchmark for showing the speed of init and use of a code
    mode env:
    ```
    $ cargo bench -p codex-code-mode --bench exec_overhead -- --samples 30 --warm-iterations 25 --tool-counts 0,32,128
    Finished [`bench` profile [optimized]](https://doc.rust-lang.org/cargo/reference/profiles.html#default-profiles) target(s) in 0.18s
         Running benches/exec_overhead.rs (target/release/deps/exec_overhead-008c440d800545ae)
    exec_overhead: samples=30, warm_iterations=25, tool_counts=[0, 32, 128]
    scenario       tools samples    warmups      iters      mean/exec       p95/exec       rssΔ p50       rssΔ max
    cold_exec          0      30          0          1         1.13ms         1.20ms        8.05MiB        8.06MiB
    warm_exec          0      30          1         25       473.43us       512.49us      912.00KiB        1.33MiB
    cold_exec         32      30          0          1         1.03ms         1.15ms        8.08MiB        8.11MiB
    warm_exec         32      30          1         25       509.73us       545.76us      960.00KiB        1.30MiB
    cold_exec        128      30          0          1         1.14ms         1.19ms        8.30MiB        8.34MiB
    warm_exec        128      30          1         25       575.08us       591.03us      736.00KiB      864.00KiB
    memory uses a fresh-process max RSS delta for each scenario
    ```
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>