codex

Preserve image detail in app-server inputs (#20693)

## Summary

- Add optional image detail to user image inputs across core, app-server
v2, thread history/event mapping, and the generated app-server
schemas/types.
- Preserve requested detail when serializing Responses image inputs:
omitted detail stays on the existing `high` default, while explicit
`original` keeps local images on the original-resolution path.
- Support `high`/`original` consistently for tool image outputs,
including MCP `codex/imageDetail`, code-mode image helpers, and
`view_image`.
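
A minimal sketch of the defaulting rule described above, assuming a hypothetical `ImageDetail` enum and `resolve_detail` helper (neither name is taken from the codex sources): an omitted detail resolves to `high`, and an explicit request is passed through unchanged.

```rust
/// Hypothetical detail levels. The variants mirror the `high` / `original`
/// values named in the summary, not the real codex types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ImageDetail {
    High,
    Original,
}

impl ImageDetail {
    /// String form as it might appear in a serialized image input.
    pub fn as_str(self) -> &'static str {
        match self {
            ImageDetail::High => "high",
            ImageDetail::Original => "original",
        }
    }
}

/// Omitted detail falls back to `high`; an explicit request is preserved.
pub fn resolve_detail(requested: Option<ImageDetail>) -> ImageDetail {
    requested.unwrap_or(ImageDetail::High)
}
```

Keeping the fallback in one helper means callers that never set a detail keep the existing behavior, while callers that ask for `original` are not silently downgraded.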

Curtis 'Fjord' Hawthorne · 2026-05-15 15:04:04 -07:00

8543e39885

clean up instructions (#22543)

rm behavioral steering in tool docs for code mode.

sayan-oai · 2026-05-13 14:28:57 -07:00

3de4d7f238

code-mode: Add pending-aware code mode execution (#22280)

Introduce `execute_to_pending` and `wait_to_pending` APIs that freeze
pending-mode runtimes until an explicit resume, while preserving the
existing continuously-running execute path. Add runtime and service
coverage for pending, resume, completion, and freeze behavior.
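
One possible reading of the freeze/resume behavior, as a toy model only. `ToyRuntime`, `Step`, `Status`, `run_to_pending`, and `run_continuously` are hypothetical names for illustration and do not reflect the real API signatures.

```rust
use std::collections::VecDeque;

/// A unit of toy work: ordinary work, or a point where the script is pending.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Step {
    Work,
    Pending,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Frozen,
    Completed,
}

/// Hypothetical stand-in for a code-mode runtime.
pub struct ToyRuntime {
    steps: VecDeque<Step>,
    frozen: bool,
    pub work_done: usize,
}

impl ToyRuntime {
    pub fn new(steps: &[Step]) -> Self {
        Self { steps: steps.iter().copied().collect(), frozen: false, work_done: 0 }
    }

    /// Continuous path: pending points never freeze the runtime.
    pub fn run_continuously(&mut self) -> Status {
        while let Some(step) = self.steps.pop_front() {
            if step == Step::Work {
                self.work_done += 1;
            }
        }
        Status::Completed
    }

    /// Pending-aware path: stop at the first pending point and stay frozen
    /// until `resume` is called explicitly.
    pub fn run_to_pending(&mut self) -> Status {
        if self.frozen {
            return Status::Frozen;
        }
        while let Some(step) = self.steps.pop_front() {
            match step {
                Step::Work => self.work_done += 1,
                Step::Pending => {
                    self.frozen = true;
                    return Status::Frozen;
                }
            }
        }
        Status::Completed
    }

    pub fn resume(&mut self) {
        self.frozen = false;
    }
}
```

In this model a frozen runtime makes no progress however often it is polled, which is the property the freeze coverage would need to check.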

Channing Conger · 2026-05-12 17:16:57 -07:00

589b820d6e

code-mode: carry nested tool kind through runtime (#22377)

## Why

Code mode only used nested spec lookup at execution time to rediscover
whether a nested tool should be invoked as a function tool or a freeform
tool.

That information is already present in the enabled tool metadata that
code mode builds to expose `tools.*` and `ALL_TOOLS`, so re-looking it
up from the router was redundant and kept execution coupled to a
separate spec lookup path.

## What Changed

- thread `CodeModeToolKind` through the code-mode runtime `ToolCall`
event and `CodeModeNestedToolCall`
- emit the nested tool kind directly from the V8 callback using the
already-enabled tool metadata
- build nested tool payloads from the propagated kind instead of calling
`find_spec`
- remove the now-unused `find_spec` plumbing from the router and
parallel runtime helpers
- add unit coverage for function vs freeform payload shaping and update
affected router tests
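
A rough sketch of payload shaping driven by the propagated kind. The `CodeModeToolKind` name comes from this PR, but its variants, the `NestedToolPayload` type, and `build_payload` are assumptions made for illustration.

```rust
/// Kind carried with each nested tool call (variant names are assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CodeModeToolKind {
    Function,
    Freeform,
}

/// Hypothetical payload shapes: function tools take JSON arguments,
/// freeform tools take raw text input.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum NestedToolPayload {
    Function { arguments: String },
    Freeform { input: String },
}

/// Build the payload from the kind that travelled with the call, with no
/// second lookup against a router or spec table.
pub fn build_payload(kind: CodeModeToolKind, raw: &str) -> NestedToolPayload {
    match kind {
        CodeModeToolKind::Function => NestedToolPayload::Function {
            arguments: raw.to_string(),
        },
        CodeModeToolKind::Freeform => NestedToolPayload::Freeform {
            input: raw.to_string(),
        },
    }
}
```

Because the kind arrives with the call itself, execution no longer depends on a separate spec lookup path.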

## Testing

- `cargo test -p codex-code-mode`
- `cargo test -p codex-core code_mode::tests`
- `cargo test -p codex-core
extension_tool_bundles_are_model_visible_and_dispatchable`
- `cargo test -p codex-core
model_visible_specs_filter_deferred_dynamic_tools`

pakrym-oai · 2026-05-12 23:34:37 +00:00

960d42ddae

Enable V8 sandboxing for source-built builds (#21146)

## Summary

This is the first PR in the V8 in-process sandboxing rollout.

It adds the build-system and Rust feature plumbing needed to support
sandboxed V8 builds, then enables sandboxing by default for the
source-built Bazel V8 path that we control directly. It deliberately
keeps the published `rusty_v8` artifact workflows on their current
non-sandboxed contract so this PR can land and ship independently before
we change any released artifacts.

## Rollout plan

- [x] **PR 1: land sandbox plumbing and default source-built Bazel V8 to
  sandboxed mode**

- [ ] **PR 2: publish sandbox-enabled release artifacts and add
  compatibility validation**
  - Produce sandboxed artifact pairs for every released Cargo target that
    does not already use the source-built Bazel path.
  - Add CI coverage that consumes those sandboxed artifacts and verifies:
    - `codex-v8-poc` reports sandbox enabled
    - `codex-code-mode` builds/tests against the sandboxed path

- [ ] **PR 3: switch release consumers to sandboxed artifacts by
  default**
  - Update released artifact selectors/checksums.
  - Enable the Rust `v8_enable_sandbox` feature in the default release
    path.
  - Make the sandboxed artifact family the normal path for published
    builds.

- [ ] **PR 4: remove rollout-only compatibility paths**
  - Remove the temporary non-sandbox release compatibility config once the
    new default has shipped and baked.
  - Keep the invariant tests permanently.

Channing Conger · 2026-05-05 14:36:37 -07:00

36460387ec

Prune unused code-mode globals (#20542)

Hide Atomics, SharedArrayBuffer, and WebAssembly from the code-mode
runtime since the harness does not expose worker support or need those
APIs.

Channing Conger · 2026-05-01 15:11:22 -07:00

a5fbcf1ab4

[rollout_trace] Trace tool and code-mode boundaries (#18878)

## Summary

Extends rollout tracing across tool dispatch and code-mode runtime
boundaries. This records canonical tool-call lifecycle events and links
code-mode execution/wait operations back to the model-visible calls that
caused them.

## Stack

This is PR 3/5 in the rollout trace stack.

- [#18876](https://github.com/openai/codex/pull/18876): Add rollout
trace crate
- [#18877](https://github.com/openai/codex/pull/18877): Record core
session rollout traces
- [#18878](https://github.com/openai/codex/pull/18878): Trace tool and
code-mode boundaries
- [#18879](https://github.com/openai/codex/pull/18879): Trace sessions
and multi-agent edges
- [#18880](https://github.com/openai/codex/pull/18880): Add debug trace
reduction command

## Review Notes

This PR is about attribution. Reviewers should focus on whether direct
tool calls, code-mode-originated tool calls, waits, outputs, and
cancellation boundaries are recorded with enough source information for
deterministic reduction without coupling the reducer to live runtime
internals.

The stack remains valid after this layer: tool and code-mode traces
reduce through the existing crate model, while the broader session and
multi-agent relationships are added in the next PR.

cassirer-openai · 2026-04-23 12:22:11 -07:00

6d09b6752d

Update image outputs to default to high detail (#18386)

Do not assume the default `detail`.

pakrym-oai · 2026-04-18 11:01:12 -07:00

53b1570367

refactor: use cloneable async channels for shared receivers (#18398)

This is the first mechanical cleanup in a stack whose higher-level goal
is to enable Clippy coverage for async guards held across `.await`
points.

The follow-up commits enable Clippy's
[`await_holding_lock`](https://rust-lang.github.io/rust-clippy/master/index.html#await_holding_lock)
lint and the configurable
[`await_holding_invalid_type`](https://rust-lang.github.io/rust-clippy/master/index.html#await_holding_invalid_type)
lint for Tokio guard types. This PR handles the cases where the
underlying issue is not protected shared mutable state, but a
`tokio::sync::mpsc::UnboundedReceiver` wrapped in `Arc<Mutex<_>>` so
cloned owners can call `recv().await`.

Using a mutex for that shape forces the receiver lock guard to live
across `.await`. Switching these paths to `async-channel` gives us
cloneable `Receiver`s, so each owner can hold a receiver handle directly
and await messages without an async mutex guard.

## What changed

- In `codex-rs/code-mode`, replace the turn-message
`mpsc::UnboundedSender`/`UnboundedReceiver` plus `Arc<Mutex<Receiver>>`
with `async_channel::Sender`/`Receiver`.
- In `codex-rs/codex-api`, replace the realtime websocket event receiver
with an `async_channel::Receiver`, allowing `RealtimeWebsocketEvents`
clones to receive without locking.
- Add `async-channel` as a dependency for `codex-code-mode` and
`codex-api`, and update `Cargo.lock`.

## Verification

- The split stack was verified at the final lint-enabling head with
`just clippy`.

Michael Bolin · 2026-04-17 15:20:30 -07:00

c9c4caafd8

[code mode] defer mcp tools from exec description (#17287)

## Summary
- hide deferred MCP/app nested tool descriptions from the `exec` prompt
in code mode
- add short guidance that omitted nested tools are still available
through `ALL_TOOLS`
- cover the `code_mode_only` path with an integration test that discovers
and calls a deferred app tool

## Motivation
`code_mode_only` exposes only top-level `exec`/`wait`, but the `exec`
description could still include a large nested-tool reference. This
keeps deferred nested tools callable while avoiding that prompt bloat.

## Tests
- `just fmt`
- `just fix -p codex-code-mode`
- `just fix -p codex-tools`
- `cargo test -p codex-code-mode
exec_description_mentions_deferred_nested_tools_when_available`
- `cargo test -p codex-tools
create_code_mode_tool_matches_expected_spec`
- `cargo test -p codex-core
code_mode_only_guides_all_tools_search_and_calls_deferred_app_tools`

sayan-oai · 2026-04-17 00:01:14 +08:00

9c6d038622

Support original-detail metadata on MCP image outputs (#17714)

## Summary
- honor `_meta["codex/imageDetail"] == "original"` on MCP image content
and map it to `detail: "original"` where supported
- strip that detail back out when the active model does not support
original-detail image inputs
- update code-mode `image(...)` to accept individual MCP image blocks
- teach `js_repl` / `codex.emitImage(...)` to preserve the same hint
from raw MCP image outputs
- document the new `_meta` contract and add generic RMCP-backed coverage
across protocol, core, code-mode, and js_repl paths
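
A small sketch of the `_meta` contract in the first two bullets, under simplifying assumptions: `_meta` is modeled as a string-to-string map rather than JSON, and `detail_from_meta` and `IMAGE_DETAIL_META_KEY` are hypothetical names.

```rust
use std::collections::HashMap;

/// Metadata key named in the summary above.
pub const IMAGE_DETAIL_META_KEY: &str = "codex/imageDetail";

/// Return the detail to attach to an image, or `None` to leave it unset.
/// Only the `original` hint from this change is modeled. The hint is
/// dropped when the active model cannot accept original-detail inputs.
pub fn detail_from_meta(
    meta: &HashMap<String, String>,
    model_supports_original: bool,
) -> Option<&'static str> {
    match meta.get(IMAGE_DETAIL_META_KEY).map(String::as_str) {
        Some("original") if model_supports_original => Some("original"),
        _ => None,
    }
}
```

Returning `None` rather than a fallback string keeps the stripping step explicit: the caller decides what an unset detail means.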

Curtis 'Fjord' Hawthorne · 2026-04-15 14:43:33 -07:00

9e2fc31854

register all mcp tools with namespace (#17404)

stacked on #17402.

MCP tools returned by `tool_search` (deferred tools) get registered in
our `ToolRegistry` with a different format than directly available
tools. this leads to two different ways of accessing MCP tools from our
tool catalog, only one of which works for each. fix this by registering
all MCP tools with the namespace format, since this info is already
available.

also, direct MCP tools are registered to the Responses API without a
namespace, while deferred MCP tools have a namespace. this means we can
receive MCP `FunctionCall`s in both formats from namespaces. fix this by
always registering MCP tools with namespace, regardless of deferral
status.

make code mode track `ToolName` provenance of tools so it can map the
literal JS function name string to the correct `ToolName` for
invocation, rather than supporting both in core.

this lets us unify to a single canonical `ToolName` representation for
each MCP tool and force everywhere to use that one, without supporting
fallbacks.
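
a sketch of the provenance idea, assuming a hypothetical `ToolTable` and a simplified `ToolName` with `namespace` and `name` fields (the real type may look different). the JS function name is only a lookup key; the canonical name is stored at registration time instead of being parsed back out of the string.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the canonical tool name (fields are assumed).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct ToolName {
    pub namespace: Option<String>,
    pub name: String,
}

/// Hypothetical table mapping the literal JS function name to its provenance.
#[derive(Debug, Default)]
pub struct ToolTable {
    by_js_name: HashMap<String, ToolName>,
}

impl ToolTable {
    pub fn new() -> Self {
        Self::default()
    }

    /// Record which canonical tool a JS-visible function name refers to.
    pub fn register(&mut self, js_name: &str, tool: ToolName) {
        self.by_js_name.insert(js_name.to_string(), tool);
    }

    /// Resolve by exact JS name only; there is no fallback format.
    pub fn resolve(&self, js_name: &str) -> Option<&ToolName> {
        self.by_js_name.get(js_name)
    }
}
```

with a single lookup path, a call that uses a non-canonical name fails to resolve instead of being silently accepted in a second format.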

sayan-oai · 2026-04-15 21:02:59 +08:00

0df7e9a820

[codex] Initialize ICU data for code mode V8 (#17709)

Link ICU data into code mode; otherwise, locale-dependent methods cause a
panic and a crash.

pakrym-oai · 2026-04-13 22:01:58 -07:00

ad37389c18

Add output_schema to code mode render (#17210)

This updates code-mode tool rendering so MCP tools can surface
structured output types from their `outputSchema`.

What changed:
- Detect MCP tool-call result wrappers from the output schema shape
instead of relying on tool-name parsing or provenance flags.
- Render shared TypeScript aliases once for MCP tool results
(`CallToolResult`, `ContentBlock`, etc.) so multiple MCP tool
declarations stay compact.
- Type `structuredContent` from the tool definition's `outputSchema`
instead of rendering it as `unknown`.
- Update the shared MCP aliases to match the MCP draft `CallToolResult`
schema more closely.

Example:
- Before: `declare const tools: { mcp__rmcp__echo(args: { env_var?:
string; message: string; }): Promise<{ _meta?: unknown; content:
Array<unknown>; isError?: boolean; structuredContent?: unknown; }>; };`
- After: `declare const tools: { mcp__rmcp__echo(args: { env_var?:
string; message: string; }): Promise<CallToolResult<{ echo: string; env:
string | null; }>>; };`

Vivian Fang · 2026-04-10 11:41:44 +00:00

7bbe3b6011

Render namespace description for tools (#16879)

Vivian Fang · 2026-04-08 02:39:40 -07:00

d47b755aa2

Render function attribute descriptions (#16880)

Vivian Fang · 2026-04-08 02:10:45 -07:00

9091999c83

Add setTimeout support to code mode (#16153)

The implementation is less than ideal: it starts a thread per timer. A
better approach might be to switch to tokio and use its timer
implementation.
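
For illustration, a thread-per-timer scheme can be sketched with the standard library alone. `set_timeout` and `run_timers` below are hypothetical helpers, not the code from this change: each timer gets its own sleeping thread that reports its id over a channel when the delay elapses.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Spawn one thread for this timer; it sleeps, then reports its id.
pub fn set_timeout(tx: &Sender<u32>, timer_id: u32, delay: Duration) -> JoinHandle<()> {
    let tx = tx.clone();
    thread::spawn(move || {
        thread::sleep(delay);
        // The receiver may already be gone; ignoring the error lets the
        // timer thread exit quietly in that case.
        let _ = tx.send(timer_id);
    })
}

/// Schedule `(id, delay_ms)` timers and return the ids in firing order.
pub fn run_timers(timers: &[(u32, u64)]) -> Vec<u32> {
    let (tx, rx) = mpsc::channel();
    let handles: Vec<JoinHandle<()>> = timers
        .iter()
        .map(|&(id, ms)| set_timeout(&tx, id, Duration::from_millis(ms)))
        .collect();
    // Drop the original sender so the receive loop ends once every timer
    // thread has finished and dropped its clone.
    drop(tx);
    let fired: Vec<u32> = rx.iter().collect();
    for handle in handles {
        let _ = handle.join();
    }
    fired
}
```

The cost is one OS thread per outstanding timer, which is the overhead a shared timer wheel such as tokio's would avoid.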

pakrym-oai · 2026-04-06 17:46:28 -07:00

0de7662dab

Update code mode exec() instructions (#16279)

Andrey Mishchenko · 2026-03-30 12:31:13 -10:00

390b644b21

chore: clean up argument-comment lint and roll out all-target CI on macOS (#16054)

## Why

`argument-comment-lint` was green in CI even though the repo still had
many uncommented literal arguments. The main gap was target coverage:
the repo wrapper did not force Cargo to inspect test-only call sites, so
examples like the `latest_session_lookup_params(true, ...)` tests in
`codex-rs/tui_app_server/src/lib.rs` never entered the blocking CI path.

This change cleans up the existing backlog, makes the default repo lint
path cover all Cargo targets, and starts rolling that stricter CI
enforcement out on the platform where it is currently validated.

## What changed

- mechanically fixed existing `argument-comment-lint` violations across
the `codex-rs` workspace, including tests, examples, and benches
- updated `tools/argument-comment-lint/run-prebuilt-linter.sh` and
`tools/argument-comment-lint/run.sh` so non-`--fix` runs default to
`--all-targets` unless the caller explicitly narrows the target set
- fixed both wrappers so forwarded cargo arguments after `--` are
preserved with a single separator
- documented the new default behavior in
`tools/argument-comment-lint/README.md`
- updated `rust-ci` so the macOS lint lane keeps the plain wrapper
invocation and therefore enforces `--all-targets`, while Linux and
Windows temporarily pass `-- --lib --bins`

That temporary CI split keeps the stricter all-targets check where it is
already cleaned up, while leaving room to finish the remaining Linux-
and Windows-specific target-gated cleanup before enabling
`--all-targets` on those runners. The Linux and Windows failures on the
intermediate revision were caused by the wrapper forwarding bug, not by
additional lint findings in those lanes.

## Validation

- `bash -n tools/argument-comment-lint/run.sh`
- `bash -n tools/argument-comment-lint/run-prebuilt-linter.sh`
- shell-level wrapper forwarding check for `-- --lib --bins`
- shell-level wrapper forwarding check for `-- --tests`
- `just argument-comment-lint`
- `cargo test` in `tools/argument-comment-lint`
- `cargo test -p codex-terminal-detection`

## Follow-up

- Clean up remaining Linux-only target-gated callsites, then switch the
Linux lint lane back to the plain wrapper invocation.
- Clean up remaining Windows-only target-gated callsites, then switch
the Windows lint lane back to the plain wrapper invocation.

Michael Bolin · 2026-03-27 19:00:44 -07:00

61dfe0b86c

Code mode on v8 (#15276)

Moves Code Mode to a new crate with no dependencies on codex. This
crate encodes the code mode semantics that we want for lifetime,
mounting, and tool calling.

The model-facing surface is mostly unchanged. `exec` still runs raw
JavaScript, `wait` still resumes or terminates a `cell_id`, nested tools
are still available through `tools.*`, and helpers like `text`, `image`,
`store`, `load`, `notify`, `yield_control`, and `exit` still exist.

The major change is underneath that surface:

- Old code mode was an external Node runtime.
- New code mode is an in-process V8 runtime embedded directly in Rust.
- Old code mode managed cells inside a long-lived Node runner process.
- New code mode manages cells in Rust, with one V8 runtime thread per
active `exec`.
- Old code mode used JSON protocol messages over child stdin/stdout plus
Node worker-thread messages.
- New code mode uses Rust channels and direct V8 callbacks/events.

This PR also fixes the two migration regressions that fell out of that
substrate change:

- `wait { terminate: true }` now waits for the V8 runtime to actually
stop before reporting termination.
- synchronous top-level `exit()` now succeeds again instead of surfacing
as a script error.
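
The first fix can be pictured with a toy cell that owns a worker thread. This is a sketch under assumptions: `ToyCell` is hypothetical, and a plain `AtomicBool` flag stands in for whatever mechanism actually stops the V8 runtime. Termination is reported only after the thread has been joined.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical cell owning one runtime thread.
pub struct ToyCell {
    stop: Arc<AtomicBool>,
    stopped: Arc<AtomicBool>,
    handle: Option<JoinHandle<()>>,
}

impl ToyCell {
    pub fn spawn() -> Self {
        let stop = Arc::new(AtomicBool::new(false));
        let stopped = Arc::new(AtomicBool::new(false));
        let (stop_t, stopped_t) = (Arc::clone(&stop), Arc::clone(&stopped));
        let handle = thread::spawn(move || {
            // Stand-in for script execution: spin until asked to stop.
            while !stop_t.load(Ordering::SeqCst) {
                thread::sleep(Duration::from_millis(1));
            }
            stopped_t.store(true, Ordering::SeqCst);
        });
        Self { stop, stopped, handle: Some(handle) }
    }

    /// Request termination, then block until the thread has really exited.
    /// Returns whether the runtime observed the stop request.
    pub fn terminate(&mut self) -> bool {
        self.stop.store(true, Ordering::SeqCst);
        if let Some(handle) = self.handle.take() {
            let _ = handle.join();
        }
        self.stopped.load(Ordering::SeqCst)
    }
}
```

Joining before reporting closes the window in which a caller could see "terminated" while the runtime thread is still running.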

---

- `core/src/tools/code_mode/*` is now mostly an adapter layer for the
public `exec` / `wait` tools.
- `code-mode/src/service.rs` owns cell sessions and async control flow
in Rust.
- `code-mode/src/runtime/*.rs` owns the embedded V8 isolate and
JavaScript execution.
- each `exec` spawns a dedicated runtime thread plus a Rust
session-control task.
- helper globals are installed directly into the V8 context instead of
being injected through a source prelude.
- helper modules like `tools.js` and `@openai/code_mode` are synthesized
through V8 module resolution callbacks in Rust.

---

Also added a benchmark that measures the initialization and execution
overhead of a code mode environment:
```
$ cargo bench -p codex-code-mode --bench exec_overhead -- --samples 30 --warm-iterations 25 --tool-counts 0,32,128
    Finished `bench` profile [optimized] target(s) in 0.18s
     Running benches/exec_overhead.rs (target/release/deps/exec_overhead-008c440d800545ae)
exec_overhead: samples=30, warm_iterations=25, tool_counts=[0, 32, 128]
scenario   tools  samples  warmups  iters  mean/exec  p95/exec  rssΔ p50   rssΔ max
cold_exec  0      30       0        1      1.13ms     1.20ms    8.05MiB    8.06MiB
warm_exec  0      30       1        25     473.43us   512.49us  912.00KiB  1.33MiB
cold_exec  32     30       0        1      1.03ms     1.15ms    8.08MiB    8.11MiB
warm_exec  32     30       1        25     509.73us   545.76us  960.00KiB  1.30MiB
cold_exec  128    30       0        1      1.14ms     1.19ms    8.30MiB    8.34MiB
warm_exec  128    30       1        25     575.08us   591.03us  736.00KiB  864.00KiB
memory uses a fresh-process max RSS delta for each scenario
```

---------

Co-authored-by: Codex <noreply@openai.com>

Channing Conger · 2026-03-20 23:36:58 -07:00

e4eedd6170

20 Commits