codex

code-mode: make session shutdown authoritative (#29287 )

## Summary

- Give each session and cell a hierarchical cancellation token.
- Track cell tasks so shutdown waits for admitted actors without polling
the registry.
- Make shutdown authoritative across concurrent admission and
non-cooperative callbacks.

## Why

A best-effort registry scan can miss cells admitted concurrently or
blocked behind the registry lock.

## Impact

Session shutdown reliably stops every admitted cell and rejects new work
once shutdown begins.

## Validation

- Stack-tip validation: `just test -p codex-code-mode -p
codex-code-mode-protocol` (70 passed).
- Parent branch: `cconger/code-mode-runtime-compact-03c-terminal-state`.

Channing Conger · 2026-06-21 13:15:38 -07:00

9c79d87d06

code-mode: move cell state into library actor (#28599 )

A code-mode cell is a single JavaScript execution that can produce
output, call tools, wait for asynchronous work, resume, or be
terminated. This PR extracts the existing per-cell run loop into a
dedicated actor that owns the cell’s lifecycle state. It is primarily an
ownership change rather than a new lifecycle contract: existing behavior
now has one clear implementation boundary.

### Architecture
The session service remains responsible for session-wide concerns:
allocating cell IDs, storing shared values, creating cells, and routing
requests to them.

Once a cell is created, its execution state belongs to its actor.
Callers interact with the actor through a handle. The actor receives two
kinds of input: runtime events and control requests.

A single event loop serializes these inputs and applies the lifecycle
rules. It tracks the current observer—the caller waiting for an
update—along with accumulated output, outstanding callbacks, runtime
state, yield deadlines, and termination progress. Observation,
termination, completion, and cleanup therefore have one consistent
owner.

When the runtime has no immediately runnable work and is waiting only on
timers or tool results, the actor can return accumulated output and
information about outstanding tool calls while keeping the cell
available to resume. On completion or termination, it performs the
appropriate callback cleanup before publishing the final result and
removing the cell from the session.

A small host interface connects the actor to session-owned facilities
such as tool dispatch, notifications, stored values, and final cell
removal, keeping those responsibilities outside the actor itself.

### Why
Previously, cell lifecycle state and coordination lived alongside
session management. The actor boundary makes each cell a self-contained
state machine with a single writer, while the service becomes a registry
and adapter around it.

This makes lifecycle behavior easier to reason about and test in
isolation. It also establishes a clean boundary for later changing where
cells run or how they communicate without recreating their lifecycle
rules.

Channing Conger · 2026-06-16 19:28:55 -07:00

e2f074e16c

[code-mode] Reject remote image URLs from output helpers (#27732 )

## Summary

- reject HTTP(S) image URLs from the shared code-mode output-image
normalization path
- return a concise model-visible tool error so the model can recover on
its next turn
- apply the targeted rejection to both `image()` and `generatedImage()`
- leave other non-empty image URL values to existing downstream handling

The returned error is:

> Tool call failed: remote image URLs are not supported in tool outputs.
Pass a base64 data URI instead

## Why

Responses Lite cannot lower a remote image URL emitted from a structured
tool output. Rejecting HTTP(S) values in the Codex harness preserves the
tool-call metadata and gives the model a recoverable next turn instead
of invalidating the sample.

## Test coverage

The regression is covered primarily by a `test_codex()` agent
integration test that simulates the Responses API exchange and asserts
the failed model-visible exec output. A supplemental runtime test covers
both `http://` and `https://` inputs across both image output helpers.

## Test plan

- `cd codex-rs && just test -p codex-code-mode`
- `cd codex-rs && just test -p codex-code-mode-protocol`
- `cd codex-rs && just test -p codex-core
code_mode_image_helper_rejects_remote_url`
- `cd codex-rs && just fmt`
- `git diff --check origin/main...HEAD`

Related context: https://github.com/openai/openai/pull/1022346

rka-oai · 2026-06-12 02:49:17 -07:00

c09df9e353

code-mode standalone: extract protocol and add host crate (#27724 )

This is phase 1 of a 4 phase stack:
1. **Add protocol and host crates for new IPC code mode implementation**
2. Create the new standalone binary
3. Create a new IPC `CodeModeSessionProvider` to use new binary
4. Remove v8 from core and only use IPC provider


## Add protocol and host crates for new IPC code mode implementation
Establish a clean process boundary without changing the existing
in-process behavior.

- Add the codex-code-mode-protocol crate for shared session, runtime,
response, and tool-definition types.
- Move protocol-facing code out of the V8-backed implementation.
- Add a buildable codex-code-mode-host crate as the foundation for the
standalone process.
- Keep the existing in-process runtime as the active implementation.

Channing Conger · 2026-06-11 22:37:26 -07:00

aa46f2debf

Add saved image path hint to standalone image generation (#25947 )

## Why

Standalone image generation returns image bytes to the model, but the
model also needs the host artifact path to reference the generated file
in follow-up work.

## What changed

- Append the default saved-image path hint alongside the generated image
tool output.
- Reuse the existing core image-generation hint text.
- Pass the thread ID and Codex home directory needed to compute the
artifact path.
- Add app-server and extension coverage for the model-visible hint.

## Validation

- `just fmt`
- `just bazel-lock-check`
- `just test -p codex-app-server
standalone_image_generation_returns_saved_path_hint_to_model`

Won Park · 2026-06-04 09:39:20 -07:00

12e8764a9c

code-mode: introduce durable session interface (#24180 )

## Summary

Introduce a `CodeModeSession` interface for executing and managing
code-mode cells.

This moves cell lifecycle, callback delegation, termination, and
shutdown behind a session abstraction, while continuing to use the
existing in-process implementation, and the ability to implement an
external process one behind this interface.

A Codex session owns one `CodeModeSession`, which in turn owns its
running cells and stored code-mode state. Each cell is represented to
the caller as a `StartedCell`, exposing its cell ID and initial
response.

It also introduces a `CodeModeSessionDelegate` callback interface. A
session uses the delegate to invoke nested host tools and emit
notifications while a cell is running, allowing the runtime to
communicate with its owning Codex session without depending directly on
core turn handling.

<img width="2121" height="1001" alt="image"
src="https://github.com/user-attachments/assets/c349a819-2a59-485c-bda4-2caf68ac4c31"
/>

Channing Conger · 2026-05-29 11:42:52 -07:00

c9dc0f6338

Restore legacy image detail values (#24644 )

## Why

Older persisted rollouts can contain `input_image.detail` values of
`auto` or `low` from before `ImageDetail` was narrowed to
`high`/`original`. Current deserialization rejects those values, which
can make resume skip later compacted checkpoints and reconstruct an
oversized raw suffix before the next compaction attempt.

Confirmed Sentry reports fixed by this compatibility path:

- [CODEX-1H3F](https://openai.sentry.io/issues/7500642496/)
- [CODEX-1H6N](https://openai.sentry.io/issues/7501025347/)
- [CODEX-1JDP](https://openai.sentry.io/issues/7504549065/)
- [CODEX-1HW6](https://openai.sentry.io/issues/7503407986/)

## Background

[openai/codex#20693](https://github.com/openai/codex/pull/20693) added
image-detail plumbing for app-server `UserInput` so input images could
explicitly request `detail: original`. The Slack discussion behind that
PR was about ScreenSpot / bridge evals where user input images were
resized, while tool output images already had MCP/code-mode ways to
request image detail.

In review, the intended new API surface was narrowed to `high` and
`original`: default to `high`, allow `original` when callers need
unchanged image handling, and avoid encouraging new `auto` or `low`
usage. That policy still makes sense for newly emitted values.

The missing compatibility piece is persisted history. Older rollouts can
already contain `auto` and `low`, and resume reconstructs typed history
by deserializing those rollout records. Rejecting old values at that
boundary causes valid compacted checkpoints to be skipped. This PR
restores `auto` and `low` as real variants so old records deserialize
and round-trip without being rewritten as `high`, while product paths
can continue to default to `high` and avoid emitting `auto` for new
behavior.

## What changed

- Restored `ImageDetail::Auto` and `ImageDetail::Low` as first-class
protocol values.
- Preserved `auto`/`low` through rollout deserialization, MCP image
metadata, code-mode image output, and schema/type generation.
- Kept local image byte handling conservative: only `original` switches
to original-resolution loading; `auto`/`low`/`high` continue through the
resize-to-fit path while retaining their detail value.
- Added regression coverage for enum round-tripping and code-mode `low`
detail handling.

## Testing

- `just write-app-server-schema`
- `just test -p codex-protocol`
- `just test -p codex-tools`
- `just test -p codex-code-mode`
- `just test -p codex-app-server-protocol`
- `just test -p codex-core
suite::rmcp_client::stdio_image_responses_preserve_original_detail_metadata`
- `just test -p codex-core
suite::code_mode::code_mode_can_use_mcp_image_result_with_image_helper`
- Loaded broken rollouts on local fixed builds, and started/completed
new turns.

I also attempted `just test -p codex-core`; the local broad run did not
finish green: 2559 tests run, 2467 passed, 55 flaky, 91 failed, 1 timed
out. The failures were broad timeout/deadline failures across unrelated
areas; targeted changed-path core tests above passed.

rhan-oai · 2026-05-26 16:24:33 -07:00

dc4e54d061

code-mode: merge stored values by key (#24159 )

## Summary

Change code-mode stored value updates to merge writes by key instead of
replacing the session's complete stored-value map after each cell
completes.

Previously, each cell received a snapshot of stored values and returned
the complete resulting map. When multiple cells ran concurrently, a
later completion could overwrite values written by another cell because
it committed an older snapshot.

This change moves stored-value ownership into `CodeModeService`:

- Each runtime starts from the service's current stored values.
- Runtime completion reports only keys written by that cell.
- The service merges those writes into the current stored-value map on
successful completion.
- Core no longer replaces its stored-value state from a cell result.

As a result, concurrently executing cells can update different stored
keys without clobbering one another.

The move into CodeModeService is motivated by a desire to have this
lifetime tied to a new lifetime object on that side in a subsequent PR.

Channing Conger · 2026-05-22 19:09:02 -07:00

f94157a4b2

Preserve image detail in app-server inputs (#20693 )

## Summary

- Add optional image detail to user image inputs across core, app-server
v2, thread history/event mapping, and the generated app-server
schemas/types.
- Preserve requested detail when serializing Responses image inputs:
omitted detail stays on the existing `high` default, while explicit
`original` keeps local images on the original-resolution path.
- Support `high`/`original` consistently for tool image outputs,
including MCP `codex/imageDetail`, code-mode image helpers, and
`view_image`.

Curtis 'Fjord' Hawthorne · 2026-05-15 15:04:04 -07:00

8543e39885

code-mode: Add pending-aware code mode execution (#22280 )

Introduce execute_to_pending and wait_to_pending APIs that freeze
pending-mode runtimes until an explicit resume, while preserving the
existing continuously-running execute path. Add runtime and service
coverage for pending, resume, completion, and freeze behavior.

Channing Conger · 2026-05-12 17:16:57 -07:00

589b820d6e

code-mode: carry nested tool kind through runtime (#22377 )

## Why

Code mode only used nested spec lookup at execution time to rediscover
whether a nested tool should be invoked as a function tool or a freeform
tool.

That information is already present in the enabled tool metadata that
code mode builds to expose `tools.*` and `ALL_TOOLS`, so re-looking it
up from the router was redundant and kept execution coupled to a
separate spec lookup path.

## What Changed

- thread `CodeModeToolKind` through the code-mode runtime `ToolCall`
event and `CodeModeNestedToolCall`
- emit the nested tool kind directly from the V8 callback using the
already-enabled tool metadata
- build nested tool payloads from the propagated kind instead of calling
`find_spec`
- remove the now-unused `find_spec` plumbing from the router and
parallel runtime helpers
- add unit coverage for function vs freeform payload shaping and update
affected router tests

## Testing

- `cargo test -p codex-code-mode`
- `cargo test -p codex-core code_mode::tests`
- `cargo test -p codex-core
extension_tool_bundles_are_model_visible_and_dispatchable`
- `cargo test -p codex-core
model_visible_specs_filter_deferred_dynamic_tools`

pakrym-oai · 2026-05-12 23:34:37 +00:00

960d42ddae

Prune unused code-mode globals (#20542 )

Hide Atomics, SharedArrayBuffer, and WebAssembly from the code-mode
runtime since the harness does not expose worker support or need those
APIs.

Channing Conger · 2026-05-01 15:11:22 -07:00

a5fbcf1ab4

[rollout_trace] Trace tool and code-mode boundaries (#18878 )

## Summary

Extends rollout tracing across tool dispatch and code-mode runtime
boundaries. This records canonical tool-call lifecycle events and links
code-mode execution/wait operations back to the model-visible calls that
caused them.

## Stack

This is PR 3/5 in the rollout trace stack.

- [#18876](https://github.com/openai/codex/pull/18876): Add rollout
trace crate
- [#18877](https://github.com/openai/codex/pull/18877): Record core
session rollout traces
- [#18878](https://github.com/openai/codex/pull/18878): Trace tool and
code-mode boundaries
- [#18879](https://github.com/openai/codex/pull/18879): Trace sessions
and multi-agent edges
- [#18880](https://github.com/openai/codex/pull/18880): Add debug trace
reduction command

## Review Notes

This PR is about attribution. Reviewers should focus on whether direct
tool calls, code-mode-originated tool calls, waits, outputs, and
cancellation boundaries are recorded with enough source information for
deterministic reduction without coupling the reducer to live runtime
internals.

The stack remains valid after this layer: tool and code-mode traces
reduce through the existing crate model, while the broader session and
multi-agent relationships are added in the next PR.

cassirer-openai · 2026-04-23 12:22:11 -07:00

6d09b6752d

Update image outputs to default to high detail (#18386 )

Do not assume the default `detail`.

pakrym-oai · 2026-04-18 11:01:12 -07:00

53b1570367

Support original-detail metadata on MCP image outputs (#17714 )

## Summary
- honor `_meta["codex/imageDetail"] == "original"` on MCP image content
and map it to `detail: "original"` where supported
- strip that detail back out when the active model does not support
original-detail image inputs
- update code-mode `image(...)` to accept individual MCP image blocks
- teach `js_repl` / `codex.emitImage(...)` to preserve the same hint
from raw MCP image outputs
- document the new `_meta` contract and add generic RMCP-backed coverage
across protocol, core, code-mode, and js_repl paths

Curtis 'Fjord' Hawthorne · 2026-04-15 14:43:33 -07:00

9e2fc31854

register all mcp tools with namespace (#17404 )

stacked on #17402.

MCP tools returned by `tool_search` (deferred tools) get registered in
our `ToolRegistry` with a different format than directly available
tools. this leads to two different ways of accessing MCP tools from our
tool catalog, only one of which works for each. fix this by registering
all MCP tools with the namespace format, since this info is already
available.

also, direct MCP tools are registered to responsesapi without a
namespace, while deferred MCP tools have a namespace. this means we can
receive MCP `FunctionCall`s in both formats from namespaces. fix this by
always registering MCP tools with namespace, regardless of deferral
status.

make code mode track `ToolName` provenance of tools so it can map the
literal JS function name string to the correct `ToolName` for
invocation, rather than supporting both in core.

this lets us unify to a single canonical `ToolName` representation for
each MCP tool and force everywhere to use that one, without supporting
fallbacks.

sayan-oai · 2026-04-15 21:02:59 +08:00

0df7e9a820

[codex] Initialize ICU data for code mode V8 (#17709 )

Link ICU data into code mode, otherwise locale-dependent methods cause a
panic and a crash.

pakrym-oai · 2026-04-13 22:01:58 -07:00

ad37389c18

Add setTimeout support to code mode (#16153 )

The implementation is less than ideal - it starts a thread per timer. A
better approach might be to switch to tokio and use their timer
imlementation.

pakrym-oai · 2026-04-06 17:46:28 -07:00

0de7662dab

Code mode on v8 (#15276 )

Moves Code Mode to a new crate with no dependencies on codex. This
create encodes the code mode semantics that we want for lifetime,
mounting, tool calling.

The model-facing surface is mostly unchanged. `exec` still runs raw
JavaScript, `wait` still resumes or terminates a `cell_id`, nested tools
are still available through `tools.*`, and helpers like `text`, `image`,
`store`, `load`, `notify`, `yield_control`, and `exit` still exist.

The major change is underneath that surface:

- Old code mode was an external Node runtime.
- New code mode is an in-process V8 runtime embedded directly in Rust.
- Old code mode managed cells inside a long-lived Node runner process.
- New code mode manages cells in Rust, with one V8 runtime thread per
active `exec`.
- Old code mode used JSON protocol messages over child stdin/stdout plus
Node worker-thread messages.
- New code mode uses Rust channels and direct V8 callbacks/events.

This PR also fixes the two migration regressions that fell out of that
substrate change:

- `wait { terminate: true }` now waits for the V8 runtime to actually
stop before reporting termination.
- synchronous top-level `exit()` now succeeds again instead of surfacing
as a script error.

---

- `core/src/tools/code_mode/*` is now mostly an adapter layer for the
public `exec` / `wait` tools.
- `code-mode/src/service.rs` owns cell sessions and async control flow
in Rust.
- `code-mode/src/runtime/*.rs` owns the embedded V8 isolate and
JavaScript execution.
- each `exec` spawns a dedicated runtime thread plus a Rust
session-control task.
- helper globals are installed directly into the V8 context instead of
being injected through a source prelude.
- helper modules like `tools.js` and `@openai/code_mode` are synthesized
through V8 module resolution callbacks in Rust.

---

Also added a benchmark for showing the speed of init and use of a code
mode env:
```
$ cargo bench -p codex-code-mode --bench exec_overhead -- --samples 30 --warm-iterations 25 --tool-counts 0,32,128
Finished [`bench` profile [optimized]](https://doc.rust-lang.org/cargo/reference/profiles.html#default-profiles) target(s) in 0.18s
Running benches/exec_overhead.rs (target/release/deps/exec_overhead-008c440d800545ae)
exec_overhead: samples=30, warm_iterations=25, tool_counts=[0, 32, 128]
scenario tools samples warmups iters mean/exec p95/exec rssΔ p50 rssΔ max
cold_exec 0 30 0 1 1.13ms 1.20ms 8.05MiB 8.06MiB
warm_exec 0 30 1 25 473.43us 512.49us 912.00KiB 1.33MiB
cold_exec 32 30 0 1 1.03ms 1.15ms 8.08MiB 8.11MiB
warm_exec 32 30 1 25 509.73us 545.76us 960.00KiB 1.30MiB
cold_exec 128 30 0 1 1.14ms 1.19ms 8.30MiB 8.34MiB
warm_exec 128 30 1 25 575.08us 591.03us 736.00KiB 864.00KiB
memory uses a fresh-process max RSS delta for each scenario
```

---------

Co-authored-by: Codex <noreply@openai.com>

Channing Conger · 2026-03-20 23:36:58 -07:00

e4eedd6170

19 Commits