Commit Graph

33 Commits

  • Code mode on v8 (#15276)
    Moves Code Mode to a new crate with no dependencies on codex. This
    crate encodes the code mode semantics that we want for lifetime,
    mounting, and tool calling.
    
    The model-facing surface is mostly unchanged. `exec` still runs raw
    JavaScript, `wait` still resumes or terminates a `cell_id`, nested tools
    are still available through `tools.*`, and helpers like `text`, `image`,
    `store`, `load`, `notify`, `yield_control`, and `exit` still exist.
    
    The major change is underneath that surface:
    
    - Old code mode was an external Node runtime.
    - New code mode is an in-process V8 runtime embedded directly in Rust.
    - Old code mode managed cells inside a long-lived Node runner process.
    - New code mode manages cells in Rust, with one V8 runtime thread per
    active `exec`.
    - Old code mode used JSON protocol messages over child stdin/stdout plus
    Node worker-thread messages.
    - New code mode uses Rust channels and direct V8 callbacks/events.
    
    This PR also fixes the two migration regressions that fell out of that
    substrate change:
    
    - `wait { terminate: true }` now waits for the V8 runtime to actually
    stop before reporting termination.
    - synchronous top-level `exit()` now succeeds again instead of surfacing
    as a script error.
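
    A minimal sketch of the thread-per-`exec` shape and the "wait for the
    runtime to actually stop" behavior described above, using only the
    standard library. `Control`, `Event`, `Cell`, and `spawn_cell` are
    hypothetical names, and a plain loop stands in for the V8 isolate; the
    real crate's types are not shown in this log.

    ```rust
    use std::sync::mpsc::{self, Receiver, Sender};
    use std::thread::{self, JoinHandle};

    /// Hypothetical control messages sent to the runtime thread.
    enum Control {
        Run(String),
        Terminate,
    }

    /// Hypothetical events sent back from the runtime thread.
    #[derive(Debug, PartialEq)]
    enum Event {
        Output(String),
        Stopped,
    }

    struct Cell {
        control: Sender<Control>,
        events: Receiver<Event>,
        runtime: Option<JoinHandle<()>>,
    }

    /// Spawn one dedicated thread for one cell.
    fn spawn_cell() -> Cell {
        let (control_tx, control_rx) = mpsc::channel::<Control>();
        let (event_tx, event_rx) = mpsc::channel::<Event>();
        let runtime = thread::spawn(move || {
            // Stand-in for the isolate loop: echo each source string.
            for msg in control_rx {
                match msg {
                    Control::Run(src) => {
                        let _ = event_tx.send(Event::Output(format!("ran: {}", src)));
                    }
                    Control::Terminate => break,
                }
            }
            let _ = event_tx.send(Event::Stopped);
        });
        Cell { control: control_tx, events: event_rx, runtime: Some(runtime) }
    }

    impl Cell {
        /// Request termination and return only after the thread has exited.
        /// Returns false if the cell was already terminated.
        fn terminate(&mut self) -> bool {
            let _ = self.control.send(Control::Terminate);
            match self.runtime.take() {
                Some(handle) => handle.join().is_ok(),
                None => false,
            }
        }
    }
    ```

    Joining the thread before reporting success is what keeps a caller
    from observing "terminated" while the runtime is still running.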
    
    ---
    
    - `core/src/tools/code_mode/*` is now mostly an adapter layer for the
    public `exec` / `wait` tools.
    - `code-mode/src/service.rs` owns cell sessions and async control flow
    in Rust.
    - `code-mode/src/runtime/*.rs` owns the embedded V8 isolate and
    JavaScript execution.
    - each `exec` spawns a dedicated runtime thread plus a Rust
    session-control task.
    - helper globals are installed directly into the V8 context instead of
    being injected through a source prelude.
    - helper modules like `tools.js` and `@openai/code_mode` are synthesized
    through V8 module resolution callbacks in Rust.
    
    ---
    
    Also adds a benchmark that measures the initialization and per-exec
    overhead of a code mode environment:
    ```
    $ cargo bench -p codex-code-mode --bench exec_overhead -- --samples 30 --warm-iterations 25 --tool-counts 0,32,128
    Finished `bench` profile [optimized] target(s) in 0.18s
         Running benches/exec_overhead.rs (target/release/deps/exec_overhead-008c440d800545ae)
    exec_overhead: samples=30, warm_iterations=25, tool_counts=[0, 32, 128]
    scenario       tools samples    warmups      iters      mean/exec       p95/exec       rssΔ p50       rssΔ max
    cold_exec          0      30          0          1         1.13ms         1.20ms        8.05MiB        8.06MiB
    warm_exec          0      30          1         25       473.43us       512.49us      912.00KiB        1.33MiB
    cold_exec         32      30          0          1         1.03ms         1.15ms        8.08MiB        8.11MiB
    warm_exec         32      30          1         25       509.73us       545.76us      960.00KiB        1.30MiB
    cold_exec        128      30          0          1         1.14ms         1.19ms        8.30MiB        8.34MiB
    warm_exec        128      30          1         25       575.08us       591.03us      736.00KiB      864.00KiB
    memory uses a fresh-process max RSS delta for each scenario
    ```
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Add remote env CI matrix and integration test (#14869)
    Setting `CODEX_TEST_REMOTE_ENV` makes `test_codex` start the executor
    "remotely" (inside a Docker container), turning any integration test
    into a remote test.
  • Split features into codex-features crate (#15253)
    - Split the feature system into a new `codex-features` crate.
    - Cut `codex-core` and workspace consumers over to the new config and
    warning APIs.
    
    Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
    Co-authored-by: Codex <noreply@openai.com>
  • Return image URL from view_image tool (#15072)
    Clean up image semantics in code mode.
    
    `view_image` now returns `{image_url:string, details?: string}`.
    
    `image()` now accepts either a string or `{image_url:string,
    details?: string}`.
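
    One way to picture the two accepted `image()` argument shapes is as an
    enum that normalizes to a single content item. This is only a sketch:
    `ImageArg`, `ImageContent`, and `normalize` are hypothetical names,
    not the types used in the codebase.

    ```rust
    /// Hypothetical model of the two shapes `image()` accepts.
    #[derive(Debug, PartialEq)]
    enum ImageArg {
        Url(String),
        Object { image_url: String, details: Option<String> },
    }

    /// Hypothetical normalized form, matching the `view_image` result shape.
    #[derive(Debug, PartialEq)]
    struct ImageContent {
        image_url: String,
        details: Option<String>,
    }

    /// Collapse both argument shapes into one content item.
    fn normalize(arg: ImageArg) -> ImageContent {
        match arg {
            ImageArg::Url(image_url) => ImageContent { image_url, details: None },
            ImageArg::Object { image_url, details } => ImageContent { image_url, details },
        }
    }
    ```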
  • Propagate tool errors to code mode (#15075)
    Clean up the error flow to push the FunctionCallError all the way up to
    the dispatcher and allow code mode to surface it as an exception.
  • Add notify to code-mode (#14842)
    Allows the model to send an out-of-band notification.
    
    The notification is injected as another tool call output for the same
    call_id.
  • Rename exec_wait tool to wait (#14983)
    Summary
    - document that code mode only exposes `exec` and the renamed `wait`
    tool
    - update code mode tool spec and descriptions to match the new tool name
    - rename tests and helper references from `exec_wait` to `wait`
    
    Testing
    - Not run (not requested)
  • Add exit helper to code mode scripts (#14851)
    - **Summary**
      - expose `exit` through the code mode bridge and module so scripts
        can stop mid-flight
      - surface the helper in the description documentation
      - add a regression test ensuring `exit()` terminates execution cleanly
    - **Testing**
      - Not run (not requested)
  • dynamic tool calls: add param exposeToContext to optionally hide tool (#14501)
    This extends dynamic_tool_calls to allow us to hide a tool from the
    model context but still use it as part of the general tool calling
    runtime (for example, from js_repl/code_mode).
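
    A rough sketch of the split this implies: one list for what the model
    sees, another for what the runtime can call. `DynamicTool`,
    `model_visible`, and `callable` are hypothetical names, and the
    snake_case field stands in for the `exposeToContext` parameter.

    ```rust
    /// Hypothetical dynamic tool entry.
    struct DynamicTool {
        name: String,
        expose_to_context: bool,
    }

    /// Tools that appear in the model context.
    fn model_visible(tools: &[DynamicTool]) -> Vec<&str> {
        tools
            .iter()
            .filter(|t| t.expose_to_context)
            .map(|t| t.name.as_str())
            .collect()
    }

    /// Tools the tool-calling runtime can still dispatch, hidden or not.
    fn callable(tools: &[DynamicTool]) -> Vec<&str> {
        tools.iter().map(|t| t.name.as_str()).collect()
    }
    ```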
  • Add code_mode_only feature (#14617)
    Summary
    - add the code_mode_only feature flag/config schema and wire its
    dependency on code_mode
    - update code mode tool descriptions to list nested tools with detailed
    headers
    - restrict available tools for prompt and exec descriptions when
    code_mode_only is enabled and test the behavior
    
    Testing
    - Not run (not requested)
  • code mode: single line tool declarations (#14526)
    ## Summary
    - render code mode tool declarations as single-line TypeScript snippets
    - make the JSON schema renderer emit inline object shapes for these
    declarations
    - update code mode/spec expectations to match the new inline rendering
    
    ## Testing
    - `just fmt`
    - `cargo test -p codex-core render_json_schema_to_typescript`
    - `cargo test -p codex-core code_mode_augments_`
    - `cargo test -p codex-core --test all exports_all_tools_metadata --
    --nocapture`
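
    To illustrate what inline rendering could look like, here is a small
    sketch that turns a simplified schema tree into a single-line
    TypeScript shape. The `Schema` enum, `render`, and `declare` are
    hypothetical, and the exact declaration text the real renderer emits
    is not shown in this log.

    ```rust
    /// Hypothetical, heavily simplified stand-in for a JSON schema.
    enum Schema {
        String,
        Number,
        Boolean,
        Array(Box<Schema>),
        Object(Vec<Field>),
    }

    struct Field {
        name: &'static str,
        required: bool,
        schema: Schema,
    }

    /// Render a schema as an inline TypeScript type with no newlines.
    fn render(schema: &Schema) -> String {
        match schema {
            Schema::String => "string".to_string(),
            Schema::Number => "number".to_string(),
            Schema::Boolean => "boolean".to_string(),
            Schema::Array(item) => format!("Array<{}>", render(item)),
            Schema::Object(fields) => {
                if fields.is_empty() {
                    return "{}".to_string();
                }
                let parts: Vec<String> = fields
                    .iter()
                    .map(|f| {
                        let optional = if f.required { "" } else { "?" };
                        format!("{}{}: {}", f.name, optional, render(&f.schema))
                    })
                    .collect();
                format!("{{ {} }}", parts.join("; "))
            }
        }
    }

    /// Assumed single-line declaration form for one nested tool.
    fn declare(tool: &str, input: &Schema, output: &Schema) -> String {
        format!(
            "declare const tools: {{ {}(args: {}): Promise<{}>; }};",
            tool,
            render(input),
            render(output)
        )
    }
    ```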
  • code_mode: Move exec params from runtime declarations to @pragma (#14511)
    This change moves code_mode exec session settings out of the runtime API
    and into an optional first-line pragma. Instead of calling runtime
    helpers like `set_yield_time()` or
    `set_max_output_tokens_per_exec_call()`, the model can write
    `// @exec: {"yield_time_ms": ..., "max_output_tokens": ...}` at the top
    of the freeform exec source.
    
    Rust now parses that pragma before building the source, validates it,
    and passes the values directly in the exec start message to the
    code-mode broker, which applies them at session start without any
    worker-runtime mutation path. The `@openai/code_mode` module no longer
    exposes those setter functions, the docs and grammar were updated to
    describe the pragma form, and the existing code_mode tests were
    converted to use pragma-based configuration.
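
    A minimal sketch of first-line pragma handling, assuming the two
    fields named above. `ExecSettings` and `split_pragma` are hypothetical
    names. To stay dependency-free, the sketch only understands a flat
    object of unsigned integers, and rejecting unknown fields is an
    assumption; a real implementation would use a proper JSON parser.

    ```rust
    #[derive(Debug, Default, PartialEq)]
    struct ExecSettings {
        yield_time_ms: Option<u64>,
        max_output_tokens: Option<u64>,
    }

    const PRAGMA_PREFIX: &str = "// @exec:";

    /// Split an optional first-line pragma from the rest of the source.
    /// Returns the parsed settings and the source without the pragma line.
    fn split_pragma(source: &str) -> Result<(ExecSettings, &str), String> {
        let (first, rest) = match source.find('\n') {
            Some(idx) => (&source[..idx], &source[idx + 1..]),
            None => (source, ""),
        };
        let json = match first.trim().strip_prefix(PRAGMA_PREFIX) {
            Some(json) => json.trim(),
            // No pragma: default settings, source untouched.
            None => return Ok((ExecSettings::default(), source)),
        };
        let inner = json
            .strip_prefix('{')
            .and_then(|s| s.strip_suffix('}'))
            .ok_or_else(|| "pragma must be a JSON object".to_string())?;

        let mut settings = ExecSettings::default();
        for pair in inner.split(',').map(str::trim).filter(|p| !p.is_empty()) {
            let mut kv = pair.splitn(2, ':');
            let key = kv.next().unwrap_or("").trim().trim_matches('"');
            let raw = kv.next().ok_or_else(|| format!("missing value for {}", key))?;
            let value = raw
                .trim()
                .parse::<u64>()
                .map_err(|e| format!("{}: {}", key, e))?;
            match key {
                "yield_time_ms" => settings.yield_time_ms = Some(value),
                "max_output_tokens" => settings.max_output_tokens = Some(value),
                other => return Err(format!("unknown pragma field: {}", other)),
            }
        }
        Ok((settings, rest))
    }
    ```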
  • Expose code-mode tools through globals (#14517)
    Summary
    - make all code-mode tools accessible as globals so callers only need
    `tools.<name>`
    - rename text/image helpers and key globals (store, load, ALL_TOOLS,
    etc.) to reflect the new shared namespace
    - update the JS bridge, runners, descriptions, router, and tests to
    follow the new API
    
    Testing
    - Not run (not requested)
  • Rename exec session IDs to cell IDs (#14510)
    - Update the code-mode executor, wait handler, and protocol plumbing to
    use cell IDs instead of session IDs for node communication
    - Switch tool metadata, wait description, and suite tests to refer to
    cell IDs so user-visible messages match the new terminology
    
    **Testing**
    - Not run (not requested)
  • Fix MCP tool calling (#14491)
    Properly escape mcp tool names and make tools only available via
    imports.
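
    The escaping rule itself is not spelled out here. As an illustration
    only, one plausible scheme maps every character that is not valid in a
    JavaScript identifier to `_`; `to_js_identifier` is a hypothetical
    helper and may differ from what the PR actually does.

    ```rust
    /// Map an arbitrary tool name to an ASCII-only JavaScript identifier.
    /// Assumed scheme: invalid characters become `_`, and a leading digit
    /// gets a `_` prefix.
    fn to_js_identifier(name: &str) -> String {
        let mut out = String::with_capacity(name.len() + 1);
        for (i, ch) in name.chars().enumerate() {
            if i == 0 && ch.is_ascii_digit() {
                out.push('_');
                out.push(ch);
            } else if ch == '_' || ch == '$' || ch.is_ascii_alphanumeric() {
                out.push(ch);
            } else {
                out.push('_');
            }
        }
        if out.is_empty() {
            out.push('_');
        }
        out
    }
    ```

    Note that a lossy mapping like this can make two distinct tool names
    collide, so a real scheme would also need a disambiguation step.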
  • Skip nested tool call parallel test on Windows (#14505)
    **Summary**
    - disable the `code_mode_nested_tool_calls_can_run_in_parallel` test on
    Windows where `exec_command` is unavailable
    
    **Testing**
    - Not run (not requested)
  • Add parallel tool call test (#14494)
    Summary
    - pin tests to `test-gpt-5.1-codex` so code-mode suites exercise that
    model explicitly
    - add a regression test that ensures nested tool calls can execute in
    parallel and assert on timing
    - refresh `codex-rs/Cargo.lock` for the updated dependency tree (add
    `codex-utils-pty`, drop `codex-otel`)
    
    Testing
    - Not run (not requested)
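
    The timing assertion idea can be sketched with plain threads: if N
    calls that each take T finish in well under N×T, they overlapped.
    `run_parallel` is a hypothetical stand-in in which sleeping threads
    play the role of nested tool calls; it is not the test added here.

    ```rust
    use std::thread;
    use std::time::{Duration, Instant};

    /// Run `calls` simulated tool calls concurrently and return the
    /// wall-clock time until all of them have finished.
    fn run_parallel(calls: usize, each: Duration) -> Duration {
        let start = Instant::now();
        let handles: Vec<_> = (0..calls)
            .map(|_| thread::spawn(move || thread::sleep(each)))
            .collect();
        for handle in handles {
            handle.join().expect("simulated tool call panicked");
        }
        start.elapsed()
    }
    ```

    A generous upper bound keeps such a test from flaking on loaded CI
    machines while still failing if the calls run serially.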
  • Add default code-mode yield timeout (#14484)
    Summary
    - expose the default yield timeout through code mode runtime so the
    handler, wait tool, and protocol share the same 10s value that matches
    unified exec
    - document the timeout change in the tool descriptions and propagate the
    value all the way into the runner metadata
    - adjust Cargo.lock to keep the dependency tree in sync with the added
    code mode tool dependency
    
    Testing
    - Not run (not requested)
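
    A sketch of how a shared default could be applied when waiting on a
    cell, assuming results arrive over a standard-library channel. Only
    the 10s value comes from the summary above; `DEFAULT_YIELD_TIME_MS`,
    `WaitOutcome`, and `wait_for_cell` are hypothetical names.

    ```rust
    use std::sync::mpsc::{Receiver, RecvTimeoutError};
    use std::time::Duration;

    /// Shared default: 10 seconds.
    const DEFAULT_YIELD_TIME_MS: u64 = 10_000;

    #[derive(Debug, PartialEq)]
    enum WaitOutcome {
        Finished(String),
        /// Timed out; the cell keeps running and can be waited on again.
        Yielded,
        /// The sending side is gone, so no result will ever arrive.
        Gone,
    }

    /// Wait for a result, falling back to the shared default timeout.
    fn wait_for_cell(results: &Receiver<String>, yield_time_ms: Option<u64>) -> WaitOutcome {
        let timeout = Duration::from_millis(yield_time_ms.unwrap_or(DEFAULT_YIELD_TIME_MS));
        match results.recv_timeout(timeout) {
            Ok(output) => WaitOutcome::Finished(output),
            Err(RecvTimeoutError::Timeout) => WaitOutcome::Yielded,
            Err(RecvTimeoutError::Disconnected) => WaitOutcome::Gone,
        }
    }
    ```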
  • Cleanup code_mode tool descriptions (#14480)
    Move the tool descriptions to separate files and clarify their wording.
  • Dispatch tools when code mode is not awaited directly (#14437)
    ## Summary
    - start a code mode worker once per turn and let it pump nested tool
    calls through a dedicated queue
    - simplify code mode request/response dispatch around request ids and
    generic runner-unavailable errors
    - clean up the code mode process API and runner protocol plumbing
    
    ## Testing
    - not run yet
  • Support waiting for code_mode sessions (#14295)
    ## Summary
    - persist the code mode runner process in the session-scoped code mode
    store
    - switch the runner protocol from `init` to `start` with explicit
    session ids
    - handle runner-side session processing without the init waiter queue
    
    ## Validation
    - just fmt
    - cargo check -p codex-core
    - node --check codex-rs/core/src/tools/code_mode_runner.cjs
  • Add ALL_TOOLS export to code mode (#14294)
    Lets code mode scripts search the available tools.
  • Rename code mode tool to exec (#14254)
    Summary
    - update the code-mode handler, runner, instructions, and error text to
    refer to the `exec` tool name everywhere that used to say `code_mode`
    - ensure generated documentation strings and tool specs describe `exec`
    and rely on the shared `PUBLIC_TOOL_NAME`
    - refresh the suite tests so they invoke `exec` instead of the old name
    
    Testing
    - Not run (not requested)
  • Add store/load support for code mode (#14259)
    Adds support for transferring state across code mode invocations.
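
    Conceptually, this amounts to a key-value store that outlives a
    single invocation. A minimal sketch, assuming values are kept as
    serialized JSON strings; `CellStore` is a hypothetical name and the
    real storage format is not described in this log.

    ```rust
    use std::collections::HashMap;

    /// Hypothetical session-scoped store shared by successive invocations.
    #[derive(Default)]
    struct CellStore {
        values: HashMap<String, String>,
    }

    impl CellStore {
        /// Save a serialized value under `key`, replacing any previous one.
        fn store(&mut self, key: &str, json: &str) {
            self.values.insert(key.to_string(), json.to_string());
        }

        /// Read back a value saved by an earlier invocation, if any.
        fn load(&self, key: &str) -> Option<&str> {
            self.values.get(key).map(String::as_str)
        }
    }
    ```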
  • Add code_mode output helpers for text and images (#14244)
    Summary
    - document how code-mode can import `output_text`/`output_image` and
    ensure `add_content` stays compatible
    - add a synthetic `@openai/code_mode` module that appends content items
    and validates inputs
    - cover the new behavior with integration tests for structured text and
    image outputs
    
    Testing
    - Not run (not requested)
  • Add model-controlled truncation for code mode results (#14258)
    Summary
    - document that `@openai/code_mode` exposes
    `set_max_output_tokens_per_exec_call` and that `code_mode` truncates the
    final Rust-side output when the budget is exceeded
    - enforce the configured budget in the Rust tool runner, reusing
    truncation helpers so text-only outputs follow the unified-exec wrapper
    and mixed outputs still fit within the limit
    - ensure the new behavior is covered by a code-mode integration test and
    string spec update
    
    Testing
    - Not run (not requested)
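
    A rough sketch of budget-based truncation for text-only output. The
    four-bytes-per-token estimate and the marker text are assumptions made
    for illustration, and `truncate_to_budget` is a hypothetical helper;
    the actual truncation helpers are not shown in this log.

    ```rust
    /// Assumed heuristic for converting a token budget into bytes.
    const BYTES_PER_TOKEN: usize = 4;

    /// Cut `text` down to roughly `max_output_tokens`, never splitting a
    /// UTF-8 character, and append a marker saying how much was dropped.
    fn truncate_to_budget(text: &str, max_output_tokens: usize) -> String {
        let max_bytes = max_output_tokens.saturating_mul(BYTES_PER_TOKEN);
        if text.len() <= max_bytes {
            return text.to_string();
        }
        // Step back to the nearest character boundary at or below the cap.
        let mut end = max_bytes;
        while !text.is_char_boundary(end) {
            end -= 1;
        }
        format!("{}\n[truncated {} bytes]", &text[..end], text.len() - end)
    }
    ```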
  • Add output schema to MCP tools and expose MCP tool results in code mode (#14236)
    Summary
    - drop `McpToolOutput` in favor of `CallToolResult`, moving its helpers
    to keep MCP tooling focused on the final result shape
    - wire the new schema definitions through code mode, context, handlers,
    and spec modules so MCP tools serialize the exact output shape expected
    by the model
    - extend code mode tests to cover multiple MCP call scenarios and ensure
    the serialized data matches the new schema
    - refresh JS runner helpers and protocol models alongside the schema
    changes
    
    Testing
    - Not run (not requested)
  • Expose strongly-typed result for exec_command (#14183)
    Summary
    - document output types for the various tool handlers and registry so
    the API exposes richer descriptions
    - update unified execution helpers and client tests to align with the
    new output metadata
    - clean up unused helpers across tool dispatch paths
    
    Testing
    - Not run (not requested)
  • Export tools module into code mode runner (#14167)
    **Summary**
    - allow `code_mode` to pass enabled tools metadata to the runner and
    expose them via `tools.js`
    - import tools inside JavaScript rather than relying only on globals or
    proxies for nested tool calls
    - update specs, docs, and tests to exercise the new bridge and explain
    the tooling changes
    
    **Testing**
    - Not run (not requested)
  • Add code_mode experimental feature (#13418)
    A much narrower and more isolated (no Node features) version of js_repl.