codex

[codex] disable Nagle on Rendezvous WebSockets (#30269)

## Summary

Disable Nagle unconditionally for both exec-server Rendezvous WebSocket
connections.

- pass `disable_nagle=true` at the executor and harness connection call
sites
- keep the existing signed URL, protocol, and connection flow unchanged
- add no feature flag, rollout schema, path variant, or
experiment-specific telemetry

The companion internal PR enables `TCP_NODELAY` on accepted Rendezvous
sockets: https://github.com/openai/openai/pull/1082463
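
For readers unfamiliar with the socket option, the following is a minimal sketch of what disabling Nagle means at the TCP level, using only the Rust standard library. The helper names `connect_without_nagle` and `accept_without_nagle` are hypothetical and are not the exec-server or Rendezvous call sites; the real change passes `disable_nagle=true` through the existing WebSocket connection code.

```rust
use std::io;
use std::net::{SocketAddr, TcpListener, TcpStream};

/// Hypothetical client-side helper: connect, then set TCP_NODELAY so small
/// frames are sent immediately instead of being coalesced by Nagle.
pub fn connect_without_nagle(addr: SocketAddr) -> io::Result<TcpStream> {
    let stream = TcpStream::connect(addr)?;
    stream.set_nodelay(true)?;
    Ok(stream)
}

/// Hypothetical accept-side helper: set TCP_NODELAY on each accepted socket.
pub fn accept_without_nagle(listener: &TcpListener) -> io::Result<TcpStream> {
    let (stream, _peer) = listener.accept()?;
    stream.set_nodelay(true)?;
    Ok(stream)
}
```

Because the option is set per socket and per side, each side can adopt it independently, which is consistent with the independent deployment described under rollout.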

## Why

Rendezvous carries small, latency-sensitive relay and JSON-RPC frames.
Three staging runs of 30 steady-state `process/read` calls per
configuration measured p50 improving from 139.1 ms to 81.5 ms and p95
from 162.0 ms to 95.8 ms with Nagle disabled.

The expected packet overhead is small at the current connection scale.
We will use existing latency, error, packet, and CPU monitoring and
revert normally if production regresses.

## Rollout and rollback

The client and accepted-socket changes can deploy independently. New
connections receive the setting as each side deploys. Rollback is a
normal code revert; there is no persisted assignment or gate state to
unwind.

## Validation

- `just test -p codex-exec-server --lib`: 164 passed
- `just fix -p codex-exec-server`: passed
- `just fmt`: passed
- independent final review found no actionable issue

richardopenai · 2026-06-29 19:14:47 -05:00

cfead68e5d

[app-server] expose environment info RPC (#30291)

## Why

App-server clients that configure named execution environments need to
discover an environment's shell and working directory before selecting
it for a thread or turn. Because the environment can run on a different
operating system than app-server, its working directory is represented
as a canonical `file:` URI rather than a host-local path string. The
probe also needs a bounded response time: an exec-server that completes
initialization but never answers `environment/info` must not hold the
environment serialization queue indefinitely.

## What changed

- Add an experimental `environment/info` app-server RPC for named
environments.
- Route the probe through the managed environment connection and return
target-native shell metadata plus the default working directory as a
`PathUri`.
- Return connection and protocol failures as JSON-RPC errors.
- Bound the exec-server probe response to 30 seconds and remove
timed-out calls from the pending-request table so later environment
mutations can proceed.
- Cover successful responses, omitted working directories, unknown
environments, connection failures, and pending-call cleanup.
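
The timeout and cleanup behavior can be sketched as follows. This is an illustration under stated assumptions, not the exec-server client: `PendingCalls`, its methods, and the use of `std::sync::mpsc` channels are all hypothetical stand-ins for the real pending-request table.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Receiver, RecvTimeoutError, Sender};
use std::sync::{Arc, Mutex};
use std::time::Duration;

/// Hypothetical pending-request table keyed by JSON-RPC request id.
#[derive(Clone, Default)]
pub struct PendingCalls {
    inner: Arc<Mutex<HashMap<u64, Sender<String>>>>,
}

impl PendingCalls {
    /// Registers a call and returns the receiver its response arrives on.
    pub fn register(&self, id: u64) -> Receiver<String> {
        let (tx, rx) = mpsc::channel();
        self.inner.lock().unwrap().insert(id, tx);
        rx
    }

    /// Delivers a response; returns false if the call is no longer pending.
    pub fn complete(&self, id: u64, response: String) -> bool {
        match self.inner.lock().unwrap().remove(&id) {
            Some(tx) => tx.send(response).is_ok(),
            None => false,
        }
    }

    /// Number of calls still waiting for a response.
    pub fn len(&self) -> usize {
        self.inner.lock().unwrap().len()
    }

    /// Waits up to `timeout`. On timeout the entry is removed so the table
    /// does not keep a call that will never be answered.
    pub fn wait(&self, id: u64, rx: Receiver<String>, timeout: Duration) -> Result<String, String> {
        match rx.recv_timeout(timeout) {
            Ok(response) => Ok(response),
            Err(RecvTimeoutError::Timeout) => {
                self.inner.lock().unwrap().remove(&id);
                Err(format!("timed out waiting for response after {timeout:?}"))
            }
            Err(RecvTimeoutError::Disconnected) => Err("call was dropped".to_string()),
        }
    }
}
```

In this sketch a caller would pass `Duration::from_secs(30)` to match the bound described above; a response that arrives after the timeout finds no pending entry and is ignored.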

## Protocol examples

Request:

```json
{
  "id": 42,
  "method": "environment/info",
  "params": {
    "environmentId": "remote-a"
  }
}
```

Successful response:

```json
{
  "id": 42,
  "result": {
    "shell": {
      "name": "zsh",
      "path": "/bin/zsh"
    },
    "cwd": "file:///workspace"
  }
}
```

If the exec-server initializes but does not answer the probe within 30
seconds:

```json
{
  "id": 42,
  "error": {
    "code": -32603,
    "message": "failed to get info for environment `remote-a`: exec-server protocol error: timed out waiting for exec-server `environment/info` response after 30s"
  }
}
```

## Testing

- App-server integration coverage for successful info (including omitted
`cwd`), unknown environments, and connection failures.
- Exec-server RPC coverage verifying a timed-out call is removed from
the pending-request table.

---------

Co-authored-by: Michael Bolin <mbolin@openai.com>

Max Johnson · 2026-06-27 19:34:10 +00:00

e2398d0b16

[codex] consume pushed exec-server process events (#30273)

## Summary

- complete unified-exec processes from the ordered event stream instead
of issuing a final zero-wait `process/read`
- add optional executor sandbox-denial state to `process/exited`
- keep `process/read` as a retained-output and compatibility fallback
for receiver lag, sequence gaps, and legacy servers
- recover sandbox-denial state across transport reconnection
- cover the real `TestCodex` remote-exec path without adding a public
test-only event constructor

## Why

A successful one-shot tool call currently receives its output and
terminal notifications, then pays another wide-area `process/read` round
trip before returning. Staging traces showed that remote response wait
accounted for more than 99.8% of RPC time; local serialization,
queueing, and deserialization were below 0.6 ms.

## Measured impact

A direct staging A/B used the same build and route and changed only
completion mode. Each arm ran three times with 30 one-shot
`/usr/bin/true` calls per run. The table reports the median of the three
per-run percentiles.

| Metric | Final `process/read` | Pushed events | Change |
| --- | ---: | ---: | ---: |
| End-to-end completion p50 | 159.5 ms | 118.7 ms | -40.8 ms (-25.6%) |
| End-to-end completion p95 | 182.4 ms | 131.7 ms | -50.6 ms (-27.8%) |
| Completion-wait p50 | 80.1 ms | 41.5 ms | -38.5 ms (-48.1%) |
| Final `process/read` RPC p50 | 79.9 ms | eliminated | -79.9 ms |

TCP_NODELAY was enabled in both A/B arms, so its effect cancels out. The
successful, complete, in-order event path issued zero final
`process/read` calls.

## Compatibility and recovery

- new servers send `sandboxDenied` on `process/exited`
- legacy servers omit it, which triggers one compatibility
`process/read`
- broadcast lag or a sequence gap triggers a retained-output read
- recovery remains bounded by the server's existing 1 MiB
retained-output window
- complete, in-order event streams issue no completion read
- sandbox denial is attached to the exit event before consumers can
observe process completion
- server-first and client-first rollouts remain wire-compatible;
server-first realizes the latency win immediately
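
The rules above amount to a small decision function. The sketch below is hypothetical: `ObservedStream`, `Completion`, and `decide_completion` are illustrative names, and it assumes event sequence numbers increase by exactly one, which the source does not state.

```rust
/// What the client observed from the pushed event stream for one process.
/// All names are hypothetical.
pub struct ObservedStream {
    /// Sequence numbers of received events, in arrival order.
    pub sequence_numbers: Vec<u64>,
    /// Whether the broadcast receiver reported lag (dropped events).
    pub receiver_lagged: bool,
    /// `sandboxDenied` from `process/exited`; `None` when a legacy server omits it.
    pub sandbox_denied: Option<bool>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Completion {
    /// Complete, in-order stream: no completion read is issued.
    FromEvents { sandbox_denied: bool },
    /// Lag or a sequence gap: recover from the retained-output window.
    RetainedOutputRead,
    /// Legacy server omitted denial metadata: one compatibility read.
    CompatibilityRead,
}

pub fn decide_completion(stream: &ObservedStream) -> Completion {
    // Assumption: consecutive events differ by exactly one sequence number.
    let has_gap = stream
        .sequence_numbers
        .windows(2)
        .any(|pair| pair[1] != pair[0] + 1);
    if stream.receiver_lagged || has_gap {
        return Completion::RetainedOutputRead;
    }
    match stream.sandbox_denied {
        Some(sandbox_denied) => Completion::FromEvents { sandbox_denied },
        None => Completion::CompatibilityRead,
    }
}
```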

## Integration coverage

The `TestCodex` suite exercises four distinct remote-exec contracts:

- complete pushed output/exit/close with zero reads
- direct pushed sandbox denial with zero reads
- legacy missing denial metadata with exactly one compatibility read
- count-bounded replay eviction recovered from retained output without
duplication

## Validation

- `just test -p codex-core
exec_command_consumes_pushed_remote_process_events`: 4 passed
- `just test -p codex-core unified_exec::process_tests::`: 4 passed
- `just test -p codex-exec-server`: 294 passed, 2 skipped
- `just test -p codex-exec-server-protocol`: 5 passed
- `just test -p codex-rmcp-client`: 89 passed, 2 skipped
- focused Bazel `//codex-rs/core:core-all-test`: passed across 16 shards
- scoped `just fix` passed for core and exec-server
- `just fmt` passed

The complete workspace suite was not rerun; focused Cargo and Bazel
coverage passed for the changed behavior.

richardopenai · 2026-06-26 18:05:52 -07:00

d4ec08b8f0

Persist Cloudflare affinity cookies for MCP HTTP (#29516)

[Codex Thread
019ef1f9-36e2-7e91-9337-504f097b9dc1](https://codex-thread-link.openai.chatgpt-team.site/thread/019ef1f9-36e2-7e91-9337-504f097b9dc1)

## Why

Hosted plugin-service Streamable HTTP MCP traffic uses
`https://chatgpt.com/backend-api/ps/mcp` and depends on Cloudflare's
`__cflb` cookie for load-balancer affinity. The local and exec-server
`http/request` path built a fresh reqwest client for each request
without installing Codex's existing shared ChatGPT Cloudflare cookie
store, so affinity could be lost between calls.

This is an affinity-hardening change motivated by an incident
investigation. It does not establish the broader connector-cache
incident RCA or claim to fix that incident in full.

## What changed

- Install the existing process-local, strictly allowlisted ChatGPT
Cloudflare cookie store on the reqwest client used by
`ReqwestHttpClient`.
- Fresh clients now share allowed Cloudflare infrastructure cookies
within the process that originates the local or exec-server network
request.
- Keep the existing HTTPS ChatGPT-host and Cloudflare-cookie-name
restrictions. This does not introduce a general cookie jar or send
ChatGPT Cloudflare cookies to unrelated hosts.
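
A minimal sketch of the allowlisting idea follows. It is not the existing Codex cookie store: `AffinityCookieStore` and its methods are hypothetical, the allowlists contain only the host and cookie name mentioned in this description, and real cookie handling (domains, paths, expiry) is omitted.

```rust
use std::collections::BTreeMap;

/// Hypothetical allowlists; the description names only `chatgpt.com` and `__cflb`.
const ALLOWED_HOSTS: &[&str] = &["chatgpt.com"];
const ALLOWED_COOKIE_NAMES: &[&str] = &["__cflb"];

fn host_allowed(scheme: &str, host: &str) -> bool {
    scheme.eq_ignore_ascii_case("https")
        && ALLOWED_HOSTS.iter().any(|allowed| host.eq_ignore_ascii_case(allowed))
}

/// Process-local store that keeps only allowlisted infrastructure cookies.
#[derive(Default)]
pub struct AffinityCookieStore {
    cookies: BTreeMap<String, String>,
}

impl AffinityCookieStore {
    /// Stores a cookie from a response; returns false if it was discarded.
    pub fn store(&mut self, scheme: &str, host: &str, name: &str, value: &str) -> bool {
        let name_allowed = ALLOWED_COOKIE_NAMES.iter().any(|allowed| *allowed == name);
        if !host_allowed(scheme, host) || !name_allowed {
            return false;
        }
        // A later value for the same name replaces the stored one.
        self.cookies.insert(name.to_string(), value.to_string());
        true
    }

    /// Builds the `Cookie` header value for a request, or `None` when the
    /// target is not an allowlisted HTTPS host or nothing is stored.
    pub fn header_for(&self, scheme: &str, host: &str) -> Option<String> {
        if !host_allowed(scheme, host) || self.cookies.is_empty() {
            return None;
        }
        let pairs: Vec<String> = self
            .cookies
            .iter()
            .map(|(name, value)| format!("{name}={value}"))
            .collect();
        Some(pairs.join("; "))
    }
}
```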

## Test coverage

- `codex-client` unit coverage verifies that the existing strict store
accepts and returns `__cflb` for HTTPS ChatGPT URLs.
- The exec-server HTTPS integration test sends four independent
`http/request` calls through a local TLS-intercepting proxy and verifies
that:
  - `Set-Cookie: __cflb=west` is sent on the next plugin-service request;
  - a later `Set-Cookie: __cflb=central` replaces the stored value;
  - non-Cloudflare session cookies are discarded;
  - no stored ChatGPT Cloudflare cookie is sent to a non-ChatGPT host.
- `just test -p codex-client` — 38 passed.
- `just test -p codex-exec-server --test chatgpt_cloudflare_affinity` —
1 passed.
- `just bazel-lock-check` — passed.

## Non-goals

- No persistence of ChatGPT auth, account, session, residency, or
arbitrary cookies.
- No cookie persistence for third-party MCP servers.
- No special composition of caller-provided `Cookie` headers.
- No plugin-service, connector-cache, Habitat/habicache, routing,
redirect, or API-contract changes.
- No broader incident RCA conclusions.

stevenlee-oai · 2026-06-26 02:23:24 -04:00

b5866eebd6

Test selected capabilities across availability and resume (#30157)

## Why

This stack crosses World State, executor skills, selected plugin
metadata, MCP processes, connectors, dynamic environments, and resume.
This PR adds two end-to-end scenarios that validate those pieces
together.

Both tests enable `deferred_executor`, so they exercise the real
delayed-environment path.

## Scenario 1: availability across turns and resume

```text
1. Start a thread with one selected plugin root bound to E1.
2. E1 is unavailable.
   - executor skill is absent
   - selected MCP is absent
   - connector has no selected-plugin attribution
3. Start E1 and register the same stable environment ID.
4. Start a new turn.
   - the executor skill appears through World State
   - its body beats a colliding host skill
   - the selected MCP tool is advertised and executes inside E1
   - the connector is attributed to the selected plugin
5. Start another turn without changing E1.
   - the MCP PID stays the same, proving runtime reuse
6. Restart app-server and resume the thread.
   - durable selected-root intent is restored
   - skills, MCP, and connector attribution are restored
   - a new MCP PID proves ephemeral process state was rebuilt
```

## Scenario 2: availability changes inside one turn

```text
1. Start a turn while E1 is unavailable.
2. The first model sample sees no executor skill, MCP, or selected connector.
3. The turn pauses on request_user_input.
4. Start E1 and register it while that same turn is still active.
5. Continue the turn.
6. The very next model sample sees:
   - the executor skill catalog
   - the selected MCP tool
   - selected-plugin connector attribution
7. The model calls the MCP, and its output proves execution happened inside E1.
```

This second scenario specifically protects the aeon-style behavior:
capability state is captured again for every sampling step, not only at
the next user turn.

## Scope

These are integration tests only. They do not add a combinatorial matrix
for unsupported plugin-file mutation, environment generations, transport
disconnects, or delayed `required = true` executor MCPs.

jif · 2026-06-26 03:11:55 +01:00

25f50de6ed

[codex] Propagate traces through exec-server HTTP (#30117)

Fixes distributed trace continuity across exec-server JSON-RPC HTTP
egress by adding an executor client span and injecting its W3C context
through a reusable `codex-otel` helper.

This preserves the caller trace across core/tool → executor →
provider/MCP instead of dropping parentage at raw reqwest.

Note that this does not cover the WebSocket path, which is needed for
complete trace continuity; this change covers only the basic HTTP path.

Tom · 2026-06-25 23:22:22 +00:00

8ce931ab76

[codex] Observe remote exec-server lifecycle (#27470)

## Summary

- Record bounded duration and outcome metrics for remote environment
registration and Noise rendezvous connection attempts.
- Count reconnects by bounded reason: disconnect, connection failure, or
rejected registration.
- Trace registration at the owning client boundary without exporting raw
environment or registration identifiers.
- Replace the stale pre-Noise WebSocket observability design with the
current remote transport model.

## Stack

Review and land this stack in order:

1. #27466 — trace exec-server JSON-RPC requests
2. #27467 — record bounded connection, request, and process lifecycle
metrics
3. #27470 — observe remote registration and Noise rendezvous lifecycle
**(this PR)**

## Validation

- `just test -p codex-exec-server --lib` (149 passed)
- `just test -p codex-cli --test exec_server` (4 passed)
- `just argument-comment-lint`
- `just bazel-lock-check`
- `just fix -p codex-exec-server -p codex-cli`
- `just fmt`

richardopenai · 2026-06-25 13:42:40 -07:00

3b22498f69

[codex] Retry temporarily offline exec-server recovery (#30098)

## Summary

- retry ERS `409 environment_offline` responses inside the existing
exec-server recovery loop
- keep all other registry conflicts terminal
- add focused coverage for both cases

## Root cause

When an exec server disconnects and reconnects, the client already
starts recovery and calls ERS `/connect`. During the transient executor
presence gap, ERS can return `409 environment_offline`. The retry
classifier treated every 409 as terminal, so the first response aborted
the existing 25-second recovery window before the executor came back
online. That then caused active processes to be marked lost.

This change classifies only the structured `environment_offline`
conflict as retryable. Recovery continues with the existing bounded
deadline, exponential backoff, and jitter.
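
The classification rule can be sketched as below. The function names are hypothetical, the status and error code are passed in as plain values rather than parsed from a real ERS response, and the backoff helper shows only capped exponential growth; jitter and the 25-second deadline are left out.

```rust
use std::time::Duration;

/// Hypothetical classifier for registry conflicts: only the structured
/// `environment_offline` 409 is retryable, every other conflict is terminal.
pub fn is_retryable_conflict(status: u16, error_code: Option<&str>) -> bool {
    status == 409 && error_code == Some("environment_offline")
}

/// Capped exponential backoff: `base * 2^attempt`, never above `max`.
/// Jitter is intentionally omitted from this sketch.
pub fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    base.saturating_mul(factor).min(max)
}
```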

## Validation

- `just test -p codex-exec-server client::recovery::tests` — 4 passed
- `just fix -p codex-exec-server` — passed
- `just fmt` — passed
- Full `just test -p codex-exec-server` reached unrelated macOS
filesystem-sandbox integration failures because nested
`/usr/bin/sandbox-exec` is denied in this environment (`sandbox_apply:
Operation not permitted`).

richardopenai · 2026-06-25 19:25:04 +00:00

964b138c3d

[codex] Record exec-server lifecycle metrics (#27467)

## Summary

- Record bounded connection, request, and process lifecycle metrics.
- Report active gauges from callbacks on every collection, including
delta exports.
- Serialize active-count updates so concurrent starts and finishes
cannot publish stale values.
- Serialize process exit, explicit termination, and shutdown through the
process registry so exactly one completion result wins.
- Keep the implementation small with single-owner RAII guards and one
real OTLP/HTTP integration test using the existing `wiremock`
dependency.

## Root cause

Process exit and session shutdown previously used cloned completion
state. That avoided duplicate emission, but it duplicated lifecycle
ownership and made the ordering harder to reason about. The process
registry mutex already defines the lifecycle ordering, so the final
implementation stores the metric guard and termination flag directly on
the process entry. Whichever path claims the entry first owns the
completion result.
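
The "first claimant wins" ownership can be illustrated with a small sketch. The types below (`ActiveGuard`, `ProcessEntry`, `ProcessRegistry`, `CompletionPath`) are hypothetical and much simpler than the real registry; the point is only that removing the entry under the registry mutex decides the winner and drops the RAII guard exactly once.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

/// RAII guard: counts one active process until it is dropped.
pub struct ActiveGuard(Arc<AtomicUsize>);

impl ActiveGuard {
    pub fn new(active: Arc<AtomicUsize>) -> Self {
        active.fetch_add(1, Ordering::SeqCst);
        Self(active)
    }
}

impl Drop for ActiveGuard {
    fn drop(&mut self) {
        self.0.fetch_sub(1, Ordering::SeqCst);
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CompletionPath {
    Exited,
    Terminated,
    Shutdown,
}

/// The guard and termination flag live directly on the entry.
pub struct ProcessEntry {
    _guard: ActiveGuard,
    pub termination_requested: bool,
}

#[derive(Default)]
pub struct ProcessRegistry {
    entries: Mutex<HashMap<u32, ProcessEntry>>,
    active: Arc<AtomicUsize>,
}

impl ProcessRegistry {
    pub fn start(&self, id: u32) {
        let entry = ProcessEntry {
            _guard: ActiveGuard::new(Arc::clone(&self.active)),
            termination_requested: false,
        };
        self.entries.lock().unwrap().insert(id, entry);
    }

    pub fn active(&self) -> usize {
        self.active.load(Ordering::SeqCst)
    }

    /// Whichever path removes the entry first owns the completion result;
    /// later callers find nothing and get `None`.
    pub fn claim(&self, id: u32, path: CompletionPath) -> Option<CompletionPath> {
        let entry = self.entries.lock().unwrap().remove(&id)?;
        drop(entry); // dropping the entry drops its guard, decrementing the count
        Some(path)
    }
}
```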

Production metric export uses delta temporality. Event-only synchronous
gauge recordings disappear after the next collection when no count
changes, so active counts now use observable callbacks that report
current state on every collection.

The cleanup also removes the constant `result="accepted"` connection
tag, redundant route and response assertions, a custom HTTP collector,
and fallback initialization machinery that did not add behavior.

## Stack

Review and land this stack in order:

1. #27466 — trace exec-server JSON-RPC requests
2. #27467 — record bounded connection, request, and process lifecycle
metrics **(this PR)**
3. #27470 — observe remote registration and Noise rendezvous lifecycle

## Validation

- `just test -p codex-exec-server --lib` (158 passed)
- `just test -p codex-cli --test exec_server` (3 passed)
- `just test -p codex-otel
observable_gauge_is_collected_on_every_delta_snapshot` (1 passed)
- `CARGO_BUILD_JOBS=1 just fix -p codex-otel -p codex-exec-server`
- `just fmt`
- `git diff --check`

richardopenai · 2026-06-25 11:02:11 -07:00

2dec46e30a

Persist selected capability roots and resolve availability per model step (#29856)

## Why

`selectedCapabilityRoots` is durable thread intent: “use this capability
root from environment `worker`.”

The important product assumption is:

> One environment ID always names the same logical executor and stable
contents.

`worker` does not silently change from executor A to an unrelated
executor B. The process-local connection handle for `worker` can still
be replaced while Codex is running, though, for example when
`environment/add` registers a fresh handle for the same logical
environment.

The thread should persist only the stable selection. Each model step
should pair that selection with the exact ready handle captured for that
step.

## The boundary

```text
persisted thread intent
plugin@1 -> environment "worker"
|
| capture the current step
v
model-step view
unavailable, or
plugin@1 + worker's exact captured ready handle
```

The environment ID is the stable identity and cache key. The
`Arc<Environment>` is only a process-local handle retained so consumers
of one model step use the same captured environment. It is never
persisted and it does not imply different environment contents.

## What changes

### Persist the stable selection

Selected roots are written into `SessionMeta` and restored with the
thread. Forked subagents inherit the same selections, including
bounded-history forks.

Only stable data is persisted: root ID, environment ID, and root path.

### Capture readiness together with the exact handle

The environment snapshot records:

```rust
environment_id -> Some(Arc<Environment>) // ready in this step
environment_id -> None // still starting in this step
```

This prevents readiness and execution from coming from different
registry snapshots.

For example:

```text
step snapshot: worker -> handle A, ready
environment/add: worker -> fresh handle B for the same logical environment
current step: plugin@1 still uses captured handle A
```

Without carrying handle A in the snapshot, the resolver could combine “A
was ready” with handle B and treat B as ready before it had finished
starting.

This does not change cache invalidation. Stable capability metadata
remains identified by environment ID and capability root. Replacing a
process-local handle under the same stable environment ID does not
invalidate or rediscover that metadata.
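
A compact sketch of the capture-then-resolve idea follows. The `Environment` struct, the `Registry` and `StepSnapshot` aliases, and the `capture` and `resolve` functions are hypothetical simplifications; in particular, readiness is modeled as a plain boolean field, which is an assumption of this example.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical process-local environment handle.
pub struct Environment {
    pub label: &'static str,
    pub ready: bool,
}

pub type Registry = HashMap<String, Arc<Environment>>;
/// `Some(handle)` means ready in this step; `None` means still starting.
pub type StepSnapshot = HashMap<String, Option<Arc<Environment>>>;

/// Captures readiness and the exact handle together, once per model step.
pub fn capture(registry: &Registry) -> StepSnapshot {
    registry
        .iter()
        .map(|(id, env)| (id.clone(), env.ready.then(|| Arc::clone(env))))
        .collect()
}

/// Resolves a selected environment against the captured view only.
pub fn resolve<'a>(snapshot: &'a StepSnapshot, environment_id: &str) -> Option<&'a Arc<Environment>> {
    snapshot.get(environment_id)?.as_ref()
}
```

Because the snapshot owns an `Arc` to handle A, a later registry update that installs handle B under the same ID cannot change what the current step resolves.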

### Resolve availability per model step

- A ready captured environment produces resolved roots using its
captured handle.
- A starting, missing, or failed environment is omitted from that step.
- A selected lazy environment that is outside the turn's captured
environment set is asked to start, and a later step can observe it as
ready.
- No capability files are scanned here.

Transient transport disconnects remain the remote client's reconnect
concern. This PR models initial attachment/readiness; it does not add
live socket-connectivity state.

## Example

```text
thread selection: plugin@1 -> environment "worker"

step 1: worker is starting -> plugin@1 unavailable
step 2: worker is ready -> plugin@1 resolves through worker's captured handle
step 3: fresh local handle -> current step remains pinned; a later step captures its own view
```

Temporary unavailability does not discard the durable selection. Later
PRs can retain stable metadata caches while projecting only currently
available capabilities into model-visible World State.

## Compatibility

The app-server request shape does not change. Older rollouts without
`selected_capability_roots` deserialize to an empty list.

## Stack

1. **This PR:** persist stable selected roots and resolve them through
an exact model-step handle.
2. #29960: cache stable skill metadata and project available skills into
World State.
3. #29946: cache stable plugin declarations and manage the separate live
MCP runtime.

jif · 2026-06-25 17:49:43 +00:00

8f02973d25

Support OAuth for HTTP MCP servers from selected executor plugins (#28529)

## Why

#28522 routes selected-plugin HTTP MCP traffic through the owning
executor, but OAuth bootstrap and refresh still used host-local clients.
Executor-only servers therefore cannot complete discovery or login
through the same network boundary as the MCP connection.

## What changed

- adapt `codex_exec_server::HttpClient` to RMCP 1.8's `OAuthHttpClient`
contract
- let RMCP own discovery, dynamic registration, PKCE, token exchange,
and refresh
- route auth status, persisted-token startup, and app-server login
through the server runtime while preserving the existing local discovery
path
- add optional `threadId` to `mcpServer/oauth/login` and echo it in the
completion notification
- implement RMCP's redirect policy and 1 MiB OAuth response limit over
executor HTTP
- cover selected-thread OAuth discovery and login through an
executor-only route

Depends on #28522.

jif · 2026-06-25 10:31:17 +01:00

b215961a56

Follow directory symlinks in filesystem walks (#29844)

Stack 3 of 3. Stacked on #29842.

## What changes

Adds an opt-in `followDirectorySymlinks` setting to `fs/walk`.

When enabled, the walk follows directory symlinks but continues to
ignore symlinked files. Canonical directory identities prevent symlink
cycles, while normal paths keep their existing spelling.

Environment skill discovery enables the setting so symlinked skill
directories continue to work with the new single-RPC scan.

jif · 2026-06-24 20:52:36 +01:00

96d8e34712

[codex] Trace exec-server JSON-RPC requests (#27466)

## Why

Exec-server JSON-RPC calls can cross local and remote transports, but
trace context stopped at the RPC boundary. That made client and server
work difficult to correlate when diagnosing latency or failures.

## What changed

- Propagate the current W3C trace context on outbound JSON-RPC requests.
- Parent inbound request spans from received trace context.
- Record the received JSON-RPC method on server spans and keep each span
open through response enqueue.
- Add only the OTEL dependencies required by the exec-server crate.
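
For reference, W3C trace context travels as a `traceparent` value of the form `version-traceid-parentid-flags`. The sketch below formats and parses that value with the standard library only; it does not use the OTEL crates, and how the value is attached to a JSON-RPC request is not shown because the source does not specify it. The function names are hypothetical.

```rust
/// Formats a W3C `traceparent` value (version 00).
pub fn format_traceparent(trace_id: u128, span_id: u64, sampled: bool) -> String {
    let flags = u8::from(sampled);
    format!("00-{trace_id:032x}-{span_id:016x}-{flags:02x}")
}

fn is_lower_hex(text: &str, len: usize) -> bool {
    text.len() == len && text.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}

/// Parses a version-00 `traceparent` value into (trace id, parent span id, sampled).
/// Returns `None` for malformed input or all-zero ids.
pub fn parse_traceparent(header: &str) -> Option<(u128, u64, bool)> {
    let mut parts = header.split('-');
    let version = parts.next()?;
    let trace_id = parts.next()?;
    let span_id = parts.next()?;
    let flags = parts.next()?;
    if version != "00" || parts.next().is_some() {
        return None;
    }
    if !is_lower_hex(trace_id, 32) || !is_lower_hex(span_id, 16) || !is_lower_hex(flags, 2) {
        return None;
    }
    let trace_id = u128::from_str_radix(trace_id, 16).ok()?;
    let span_id = u64::from_str_radix(span_id, 16).ok()?;
    let flags = u8::from_str_radix(flags, 16).ok()?;
    if trace_id == 0 || span_id == 0 {
        return None;
    }
    Some((trace_id, span_id, flags & 0x01 == 0x01))
}
```

A server that receives such a value would use the parsed trace ID and span ID as the parent of its inbound request span.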

## Stack

Review and land this stack in order:

1. #27466 — trace exec-server JSON-RPC requests **(this PR)**
2. #27467 — record bounded connection, request, and process lifecycle
metrics
3. #27470 — observe remote registration and Noise rendezvous lifecycle

## Validation

- `just test -p codex-exec-server --lib` (153 passed)
- `just bazel-lock-check`
- `just fix -p codex-exec-server`

richardopenai · 2026-06-24 12:50:18 -07:00

74dcce594d

Add a bounded filesystem walk RPC (#29841)

Stack 1 of 3. Follow-ups: #29842 and #29844.

## What changes

Adds a general bounded `fs/walk` operation to the exec server.

The operation returns file and directory entries plus recoverable
per-path errors. It skips symlinks, preserves the existing filesystem
sandbox routing, and enforces depth, directory, entry, and response-size
limits.

This PR only defines and wires the filesystem operation. It does not
change any callers yet.
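
To make the bounding concrete, here is a minimal sketch of a walk that skips symlinks, records recoverable per-path errors, and stops at depth and entry limits. It is not the `fs/walk` implementation: `WalkLimits`, `WalkResult`, and `bounded_walk` are hypothetical, the directory and response-size limits are omitted, and sandbox routing is not modeled.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Hypothetical limits; the real operation also bounds directories and response size.
pub struct WalkLimits {
    /// How many directory levels below the root may be descended into.
    pub max_depth: usize,
    pub max_entries: usize,
}

#[derive(Default)]
pub struct WalkResult {
    pub entries: Vec<PathBuf>,
    /// Recoverable per-path errors; the walk continues past them.
    pub errors: Vec<(PathBuf, String)>,
    /// True when the entry limit stopped the walk early.
    pub truncated: bool,
}

pub fn bounded_walk(root: &Path, limits: &WalkLimits) -> WalkResult {
    let mut result = WalkResult::default();
    let mut stack = vec![(root.to_path_buf(), 0usize)];
    while let Some((dir, depth)) = stack.pop() {
        let read_dir = match fs::read_dir(&dir) {
            Ok(read_dir) => read_dir,
            Err(err) => {
                result.errors.push((dir, err.to_string()));
                continue;
            }
        };
        for entry in read_dir {
            let entry = match entry {
                Ok(entry) => entry,
                Err(err) => {
                    result.errors.push((dir.clone(), err.to_string()));
                    continue;
                }
            };
            // `file_type` does not follow symlinks, so links can be skipped.
            let file_type = match entry.file_type() {
                Ok(file_type) => file_type,
                Err(err) => {
                    result.errors.push((entry.path(), err.to_string()));
                    continue;
                }
            };
            if file_type.is_symlink() {
                continue;
            }
            if result.entries.len() >= limits.max_entries {
                result.truncated = true;
                return result;
            }
            let path = entry.path();
            if file_type.is_dir() && depth < limits.max_depth {
                stack.push((path.clone(), depth + 1));
            }
            result.entries.push(path);
        }
    }
    result
}
```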

jif · 2026-06-24 16:05:43 +01:00

c14623d04c

test: add app-server auto environment helper (#29746)

## Why

This starts moving app-server tests toward running against remote and
foreign-OS executors by default. That requires a point of indirection
similar to core integration tests' `build_with_auto_env`, while still
letting tests control environment registration when they need to.

## What

This adds:

- `TestAppServer::new_with_auto_env()` for constructing an app server
with a default environment defined by the test runner (e.g. bazel)
- `TestAppServer::auto_env_params()` for tests to easily acquire turn
env params tailored to the automatic environment
- `TestAppServer::send_thread_start_request_with_auto_env()` to make it
easy for tests to start a thread using the automatic environment

The above methods all fail if the test calling them has set up an
environment where the automatic environment configuration conflicts with
test-created state.

## Validation

Adds a couple of basic smoke tests to the app-server test suite.
Follow-ups will migrate more tests to use it.

Adam Perry @ OpenAI · 2026-06-24 01:06:29 +00:00

283bc4cf01

protocol: separate app and exec RPC ownership (#29714)

## Why

The app-server and exec-server expose separate JSON-RPC APIs, but
exec-server currently sources its serialized protocol and envelope types
through app-server-oriented code. Giving each API an explicit owner
makes the crate boundary legible without introducing shared generic
envelopes.

## What changed

- Added `codex-exec-server-protocol` to own exec DTOs, process IDs, and
JSON-RPC envelopes.
- Updated exec-server clients, transports, handlers, and tests to use
the new crate.
- Exposed app-server's existing JSON-RPC types through a public `rpc`
module while retaining root re-exports.
- Preserved existing wire shapes, including exec `PathUri` behavior.

## Stack

This is PR 1 of 6. Next: [PR
#29721](https://github.com/openai/codex/pull/29721), which moves auth
mode below the app wire boundary.

## Validation

- Exec-server protocol and server coverage passed in the focused
protocol test runs.
- App-server protocol schema fixtures passed.

Adam Perry @ OpenAI · 2026-06-23 22:37:31 +00:00

829f5b6b59

path-uri: remove legacy path deserialization (#29158)

## Why

I'd originally added `PathUri` legacy path deserialization thinking we'd
want it for having `PathUri` in public app-server APIs. Since then we've
added `LegacyAppPathString` to handle the messy conversions that we need
for backcompat. It's confusing for `PathUri` to support deserializing
legacy paths when we don't yet want to actually expose app-server
callers or rollout storage to the new URI format.

Stacked on top of #29472 to avoid breaking compatibility in case values
of those types were persisted somewhere.

## What changed

- Parse deserialized `PathUri` values exclusively as valid `file:` URIs.
- Replace legacy acceptance coverage with rejection coverage for
top-level filesystem paths and sandbox working directories.
- Serialize CWDs in hand-built exec-server process requests as `PathUri`
values.

Adam Perry @ OpenAI · 2026-06-23 21:47:00 +00:00

c26f961b85

[codex] Report the exec-server working directory (#29666)

## Summary

- add the exec-server working directory to `environment/info` as an
optional `PathUri`
- populate it from the executor process's current directory
- preserve compatibility with older responses that omit `cwd`

## Why

Remote clients currently have no executor-native default working
directory. This forces callers such as app-server-backend to assume
`/workspace`, which fails for laptop environments. Reporting the cwd
alongside the detected shell lets clients use the path convention and
location of the actual executor.

## Impact

This is backward-compatible: the new response field is optional, and
clients can continue handling responses from older exec servers. A
follow-up app-server-backend change will consume the value for cwd-less
`command/exec` requests.

## Validation

- `just test -p codex-exec-server` (275 passed, 2 skipped)

Rasmus Rygaard · 2026-06-23 13:39:13 -07:00

66f0220c56

[codex] Preserve proxy state for filesystem sandbox helpers (#29671)

## Why

Filesystem helpers intentionally run with a minimal environment that
excludes proxy variables. After filesystem operations started using the
Windows sandbox wrapper, the wrapper derived an empty proxy
configuration from that helper environment and compared it with the
persistent sandbox setup marker. When the marker contained proxy ports,
every filesystem operation appeared to require a firewall update, which
could launch elevated setup, show a UAC or loader dialog, and fail
operations such as `apply_patch` with error 1223.

Filesystem helpers do not use network access, so they should preserve
the proxy/firewall state established by normal sandboxed process
launches.

## What changed

- Add an explicit Windows sandbox proxy-settings mode for reconciling or
preserving persistent proxy state.
- Use preserve mode for filesystem helpers while normal process launches
continue to reconcile proxy settings from their environment.
- Carry the selected proxy state consistently through setup validation,
elevated setup, and non-elevated ACL refreshes.
- Cover wrapper argument propagation and marker-derived proxy
preservation.

## Validation

- `cargo build -p codex-cli --bin codex`
- `just test -p codex-windows-sandbox
preserving_proxy_settings_uses_the_existing_marker`
- `just test -p codex-windows-sandbox windows_wrapper_args_round_trip`
- `just test -p codex-windows-sandbox
setup_request_prefers_explicit_proxy_settings`
- `just test -p codex-sandboxing transform_for_direct_spawn_windows`
- `just test -p codex-exec-server fs_sandbox::tests`
- Ran the same sandboxed `fs/writeFile` reproduction against published
`0.142.0-alpha.6` and the new CLI. The published CLI launched elevated
setup and failed with `ShellExecuteExW ... 1223`; the new CLI completed
without elevation.

Related to #28359.

iceweasel-oai · 2026-06-23 12:29:46 -07:00

18fe1d9fe3

Prepare managed network sandbox context (#29456)

## Why

Managed network configures commands to use local HTTP and SOCKS proxies.
For commands delegated to the exec server, the proxy environment and the
sandbox policy were prepared separately. On macOS, that meant a command
could receive `HTTPS_PROXY=http://127.0.0.1:43123` while Seatbelt still
denied access to port `43123`.

## What changed

`NetworkProxy` now prepares the command environment and sandbox context
together from the same runtime snapshot:

```text
Prepared managed network
├── command environment: HTTPS_PROXY=http://127.0.0.1:43123
└── sandbox context: allow outbound to 127.0.0.1:43123
```

That context travels with remote exec requests. The exec server
preserves the managed proxy and CA environment, and macOS Seatbelt
allows only the prepared loopback proxy ports without enabling broad
network access or local binding.

The protocol field is optional and the existing enforcement flag remains
in place, preserving compatibility with callers that do not send the new
context.
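
The core idea, deriving both outputs from one snapshot so they cannot disagree, can be sketched as follows. `ProxySnapshot`, `PreparedNetwork`, and `prepare` are hypothetical names, and the exact set of environment variables (including `ALL_PROXY` with a `socks5h` scheme) is an assumption of this example rather than the `NetworkProxy` behavior.

```rust
/// Hypothetical runtime snapshot of the managed proxies' loopback ports.
pub struct ProxySnapshot {
    pub http_port: u16,
    pub socks_port: u16,
}

/// Both halves are built from the same snapshot, so every port named in
/// the environment is also allowed by the sandbox context.
pub struct PreparedNetwork {
    pub env: Vec<(String, String)>,
    pub allowed_loopback_ports: Vec<u16>,
}

pub fn prepare(snapshot: &ProxySnapshot) -> PreparedNetwork {
    let http = format!("http://127.0.0.1:{}", snapshot.http_port);
    // Assumption: the SOCKS proxy is exposed through ALL_PROXY.
    let socks = format!("socks5h://127.0.0.1:{}", snapshot.socks_port);
    PreparedNetwork {
        env: vec![
            ("HTTP_PROXY".to_string(), http.clone()),
            ("HTTPS_PROXY".to_string(), http),
            ("ALL_PROXY".to_string(), socks),
        ],
        allowed_loopback_ports: vec![snapshot.http_port, snapshot.socks_port],
    }
}
```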

jif · 2026-06-23 20:07:09 +01:00

e476fc16ce

path-uri: clarify host-native path conversion (#29501)

## Why

Downstream refactors are producing confusing code because this
conversion has a very generic name. Encoding the specific conversion
approach in the method name makes call sites clearer.

## What

Rename `PathUri::from_path` to `PathUri::from_host_native_path` and
update its Rust call sites.

Adam Perry @ OpenAI · 2026-06-23 00:02:33 +00:00

11fab432be

Report remote sandbox denials semantically (#29424)

## Why

#29113 moved remote sandbox setup and enforcement to the exec server.
That gives the executor ownership of the platform-specific work: a Linux
executor chooses and runs a Linux sandbox even when the Codex
orchestrator is running on macOS or Windows.

It also means the orchestrator no longer knows which concrete sandbox
the executor selected. When that sandbox blocks a remote command, the
orchestrator currently sees only a failed process and can treat the
denial as an ordinary command failure. The existing sandbox approval and
retry path is then skipped.

This PR lets the executor report one portable fact:

> This command probably failed because the executor sandbox blocked it.

The executor keeps its concrete sandbox type private. The protocol sends
only the semantic result.

## Example

Suppose a local macOS Codex session asks a Linux devbox to write outside
the allowed workspace.

Before this PR:

```text
Linux sandbox blocks the write
    -> remote process exits with "Permission denied"
    -> local orchestrator sees an ordinary command failure
    -> the normal sandbox approval and retry path can be skipped
```

With this PR:

```text
Linux sandbox blocks the write
    -> executor reports sandboxDenied: true
    -> unified exec returns UnifiedExecError::SandboxDenied
    -> the existing approval prompt is shown
    -> an approved retry runs through the existing unsandboxed retry path
```

## What changes

### The executor remembers its selected sandbox

The prepared remote process now retains the executor-selected
`SandboxType`. This value never crosses the executor boundary.

Commands started without a sandbox retain `SandboxType::None` and are
never reported as sandbox denials.

### The executor uses the existing denial heuristic

The existing local denial heuristic moves from `codex-core` into the
shared `codex-sandboxing` crate.

When a sandboxed remote process exits, the executor:

1. waits the same short output grace period used by local unified exec;
2. reads the output currently available in the existing retained output
buffer;
3. runs the existing heuristic using the exit code and common denial
messages;
4. stores the yes/no result before publishing the process exit.

This deliberately matches the old local unified-exec behavior. It does
not add a new streaming classifier, another output buffer, or stronger
output-retention guarantees.
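
As a rough illustration of step 3, the check below combines the exit code with a few common denial messages. It is a minimal sketch, not the heuristic in `codex-sandboxing`: the marker list, the `SandboxKind` enum, and the function name are assumptions made for this example.

```rust
/// Hypothetical denial phrases; the real heuristic's list is not shown in this PR text.
const DENIAL_MARKERS: [&str; 3] = [
    "permission denied",
    "operation not permitted",
    "read-only file system",
];

/// Stand-in for the executor-selected sandbox; only "none" vs. "some sandbox" matters here.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum SandboxKind {
    None,
    Platform,
}

/// Returns true when a failed, sandboxed command's output looks like a sandbox denial.
pub fn is_likely_sandbox_denied(sandbox: SandboxKind, exit_code: i32, output: &str) -> bool {
    // Unsandboxed commands and successful exits are never reported as denials.
    if sandbox == SandboxKind::None || exit_code == 0 {
        return false;
    }
    let lowered = output.to_lowercase();
    DENIAL_MARKERS.iter().any(|marker| lowered.contains(marker))
}
```

Because the result is a plain boolean, it can be stored before the process exit is published without exposing which sandbox produced it.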

### The protocol reports a portable boolean

`process/read` gains `sandboxDenied`:

```json
{
  "exited": true,
  "exitCode": 1,
  "closed": false,
  "sandboxDenied": true
}
```

The field defaults to `false` when an older executor omits it. The
response does not expose the executor sandbox implementation or
executor-native paths.

### Unified exec uses the existing error path

The exec-server client carries `sandboxDenied` into the unified process
state. If it is true, unified exec returns the existing `SandboxDenied`
error instead of trying to classify remote output using an
orchestrator-side sandbox type.
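
A minimal sketch of that client-side decision follows. `ReadResponse`, `ExecOutcome`, and `classify` are hypothetical names for this example, and the `-1` fallback for a missing exit code is an assumption; the only behavior taken from the description is that a missing flag is treated as `false` and a true flag selects the sandbox-denied path.

```rust
/// Hypothetical, trimmed-down view of a `process/read` response.
#[derive(Debug, Default)]
pub struct ReadResponse {
    pub exited: bool,
    pub exit_code: Option<i32>,
    /// `None` models an older executor that omitted `sandboxDenied`.
    pub sandbox_denied: Option<bool>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ExecOutcome {
    Running,
    Exited(i32),
    SandboxDenied,
}

/// Maps the semantic flag onto an outcome without inspecting remote output.
pub fn classify(response: &ReadResponse) -> ExecOutcome {
    if !response.exited {
        return ExecOutcome::Running;
    }
    // A missing flag defaults to false, matching older executors.
    if response.sandbox_denied.unwrap_or(false) {
        return ExecOutcome::SandboxDenied;
    }
    // Assumed fallback: -1 when the executor reports no exit code.
    ExecOutcome::Exited(response.exit_code.unwrap_or(-1))
}
```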

Remote process exit remains visible as soon as the process exits. This
PR does not wait for stdout or stderr to close and does not change the
existing process lifecycle.

## Scope

This PR is intentionally limited to matching the existing local
unified-exec behavior for the initial command execution path.

It does not add:

- incremental denial tracking across the full output stream;
- new denial handling for commands completed later through
`write_stdin`;
- new guarantees for preserving the semantic flag during the narrow
reconnect-recovery race.

Those can be considered separately if the same behavior is added for
local execution.

## Test coverage

One remote end-to-end integration test covers the complete intended
flow:

```text
remote read-only sandbox
    -> denied write
    -> executor reports the denial
    -> Codex requests approval
    -> user approves
    -> retry succeeds on the remote executor
```

Existing lifecycle coverage continues to verify that remote process exit
is reported before late output streams close.

jif · 2026-06-22 19:33:28 +02:00

9f06cf1a09

Apply sandbox intent inside remote exec servers (#29113 )

## Why

PR #29108 lets the orchestrator send sandbox intent with `process/start`
without wrapping the command for its own operating system.

This PR completes that boundary by making the executor interpret and
enforce the intent using its own filesystem paths and sandbox
implementation.

For example, a macOS TUI targeting a Linux devbox sends `/bin/bash -lc
pwd`. The Linux executor turns that into its own `codex-linux-sandbox
... /bin/bash -lc pwd` launch.

## What changes

- Keep `process/start` unchanged when no sandbox intent is present.
- Convert sandbox `PathUri` values into native paths on the executor.
- Bind symbolic `:workspace_roots` permissions to the executor's native
sandbox cwd.
- Select the sandbox implementation on the executor and wrap the
original command immediately before spawning it.
- Reject sandbox-required execution before spawning when the executor
cannot enforce the intent.
- Pass exec-server runtime paths into process creation so Linux can
locate `codex-linux-sandbox`.

The boundary is therefore:

```text
orchestrator                         executor
original argv + sandbox intent  ->  select and enforce local sandbox
```

This PR intentionally treats a denied remote command as an ordinary
command failure. Draft follow-up #29424 carries a semantic
`sandboxDenied` result back to unified exec for the existing approval
and retry flow.

## Platform scope

Linux and macOS use their existing direct-spawn sandbox transforms.

Windows sandboxed remote process launch is intentionally unsupported in
this PR. The current Windows direct-spawn wrapper does not correctly
preserve arbitrary argv or TTY behavior, and it does not pass the full
child environment out of band. The executor rejects the request instead
of running it incorrectly or unsandboxed.

## Known follow-ups

- The transported permission profile can still contain
orchestrator-materialized helper or explicit paths. A `TODO(jif)` marks
where the executor boundary should receive pre-host-materialization
permission intent.
- The sandbox wrapper currently replaces a requested custom inner
`arg0`. A `TODO(jif)` marks where this must be preserved or rejected
explicitly.
- Draft PR #29424 contains the deferred sandbox-denial classification
and approval/retry behavior.

## Rollout assumption

This executor-sandbox stack is unreleased and its client and executor
are expected to move together. This PR does not add mixed-version
negotiation with older exec servers.

jif · 2026-06-22 12:45:37 +02:00

9c3b10e5d4

Test pipelined scalar exec-server requests (#29325 )

## Summary

This adds focused coverage for the simpler same-connection scalar
request path.

The exec-server connection already supports multiple in-flight JSON-RPC
scalar requests on one connection. This test locks in that behavior by
sending two normal requests before reading either response, without
adding a batch frame or any new API surface.

## What changed

- Added a processor-level test that initializes an exec-server
connection.
- Sends two scalar `environment/info` requests back-to-back on the same
connection.
- Verifies both responses come back on the same connection by request
id.

Checked locally with:

- `just test -p codex-exec-server
connection_accepts_pipelined_scalar_requests`

jif · 2026-06-21 13:40:51 +02:00

1088b30fda

Carry sandbox intent to remote exec servers (#29108 )

## What changed

PR #29099 stopped sending the orchestrator's concrete sandbox wrapper to
a remote exec-server. Remote commands now arrive as plain native argv.

This PR adds the next piece: Codex also sends portable sandbox intent
next to that plain argv.

For a remote unified-exec command, the request can now include:

- the canonical permission profile before local workspace-root
materialization
- the sandbox cwd and workspace roots as `PathUri` values
- Windows sandbox settings
- the legacy Landlock setting
- whether managed networking must be enforced

The important part is that symbolic entries such as `:workspace_roots`
stay symbolic while crossing the boundary. The executor can then bind
them to its own workspace-root paths instead of receiving
orchestrator-local absolute paths.

The data travels through `ExecRequest` into `ExecParams`. Older
exec-servers can still deserialize requests because the new fields have
defaults.

## Why

The orchestrator should not decide how another machine implements
sandboxing.

For example:

- a local macOS Codex would normally build a Seatbelt command
- a remote Linux executor needs a Linux sandbox command instead

The orchestrator now sends the plain command plus the policy it intended
to enforce. A later PR can let the exec-server choose and build the
correct sandbox for its own operating system.

## Important detail

This keeps the portable intent separate from the local `SandboxType`.

`SandboxType::None` is ambiguous:

- it can mean the command was explicitly approved to run without a
sandbox
- it can also mean the orchestrator host has no concrete sandbox
implementation available

Those cases are different for remote execution. This PR adds
`sandbox_requested` so an executor can still receive sandbox intent when
the orchestrator cannot build a local wrapper. Explicit unsandboxed
retries still send no sandbox context.

## Behavior today

This PR only transports the intent. The exec-server accepts the new
fields but does not apply them yet.

Remote commands therefore remain unsandboxed after this PR, just as they
are after PR #29099.

## Follow-up

The next PR will make exec-server read this portable intent, bind
symbolic workspace permissions to executor-native roots, choose the
sandbox for its own operating system, build the wrapper locally, and
then spawn the command.

jif · 2026-06-21 12:33:21 +02:00

bd2968a4db

[3/3] app-server: configure environment connection timeout (#29025 )

## Why

Remote environments registered through `environment/add` currently use
the fixed 10-second WebSocket connection timeout. Slow-starting
executors need a caller-selected connection window, but this should not
add retry policy or couple exec-server behavior to Core’s
`deferred_executor` feature.

Make the timeout an optional part of the existing experimental request.
Existing clients continue using the current default, while callers that
know an executor may take longer can request a larger window explicitly.

Depends on #28683.

## What changed

- Add optional `connectTimeoutMs` to `EnvironmentAddParams` and document
it in the app-server README.
- Pass the optional timeout through `EnvironmentRequestProcessor` into
one `EnvironmentManager::upsert_environment()` path; the manager applies
the existing default when it is omitted.
- Preserve the existing single-attempt lifecycle. The configured value
controls WebSocket connection and handshake time for both initial
connection and later reconnects; initialization retains its separate
timeout.
- Add an app-server integration test that sends the real JSON-RPC
request and verifies a stalled handshake observes the requested timeout.
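
The defaulting rule can be sketched as below. The function and constant names are hypothetical; only the optional millisecond value and the 10-second default come from this description.

```rust
use std::time::Duration;

/// The existing default described above.
pub const DEFAULT_CONNECT_TIMEOUT: Duration = Duration::from_secs(10);

/// Resolves an optional `connectTimeoutMs` value; omitted or null falls back to the default.
pub fn effective_connect_timeout(connect_timeout_ms: Option<u64>) -> Duration {
    connect_timeout_ms
        .map(Duration::from_millis)
        .unwrap_or(DEFAULT_CONNECT_TIMEOUT)
}
```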

## Test plan

- `just test -p codex-app-server-protocol`
- `just test -p codex-exec-server`
- `just test -p codex-app-server
environment_add_applies_connect_timeout`

## Rollout

This is additive and does not enable `deferred_executor`. Callers should
send a non-default timeout only after a compatible app-server is
deployed; omitted or `null` values retain the existing 10-second
default.

sayan-oai · 2026-06-19 05:27:45 +00:00

f886e33e5a

[1/3] core: add remote environment connection lifecycle (#28674 )

## Why

Remote environments can be registered before their exec-server is first
used. Starting the connection at registration time uses that startup
window, while sharing one startup result prevents background work and
capability calls from opening competing connections.

Keep initial startup simple: each environment makes one connection
attempt using its configured transport timeout. A failed initial attempt
is final for that environment, while an environment that disconnects
after connecting can still recover on a later operation.

## What changed

- Start URL and Noise environments in the background when they are added
to `EnvironmentManager`. Provider snapshots are fully validated before
connection work begins.
- Share one initial connection attempt and its saved result across
metadata, process, filesystem, and HTTP callers.
- Keep configured stdio environments lazy until first use so
registration does not launch a process.
- Tie background startup work to the environment lifetime so replacing
or dropping an environment cancels unfinished work.
- After an established client disconnects, share one fresh connection
attempt across concurrent callers. A failed attempt fails the current
operation without permanently preventing a later attempt.
- Store the shared lazy client directly on `Environment` and expose
small methods for starting, observing, and awaiting startup.
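
The "one attempt, one saved result" idea can be illustrated with a synchronous stand-in. This is only a sketch: the real code is presumably async, and the `Environment` shape, the `String` client placeholder, and the attempt counter here are assumptions for the example.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical environment holding one shared startup result.
pub struct Environment {
    startup: OnceLock<Result<String, String>>,
    /// Counts how many times the connect closure actually ran.
    pub attempts: AtomicUsize,
}

impl Environment {
    pub fn new() -> Self {
        Self {
            startup: OnceLock::new(),
            attempts: AtomicUsize::new(0),
        }
    }

    /// Runs `connect` at most once; every later caller observes the saved result.
    pub fn startup_result(
        &self,
        connect: impl FnOnce() -> Result<String, String>,
    ) -> &Result<String, String> {
        self.startup.get_or_init(|| {
            self.attempts.fetch_add(1, Ordering::SeqCst);
            connect()
        })
    }
}
```

In this sketch a failed first attempt stays final, which mirrors the initial-startup rule; recovery after an established client disconnects would need a separate, resettable slot.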

## Test plan

- `just test -p codex-exec-server`
- `just test -p codex-app-server
turn_start_resolves_sticky_thread_local_environment_and_turn_overrides`

sayan-oai · 2026-06-18 21:50:15 -07:00

41988e6a24

core: load AGENTS.md from foreign environments (#28958 )

## Why

Make it possible to load AGENTS.md from remote exec-servers whose OS
differs from app-server's.

## What

- keep `AGENTS.md` discovery and provenance as `PathUri`, with
root-aware parent and ancestor traversal
- expose lifecycle instruction sources as legacy app-server path strings
in events while retaining `PathUri` internally
- preserve and test mixed POSIX and Windows paths in model context and
TUI status output
- cover remote Windows loading end to end by seeding the Wine prefix
through host filesystem APIs
- fix a bug in `PathUri`'s `parent()` implementation that erased
Windows drive letters

Adam Perry @ OpenAI · 2026-06-18 15:06:23 -07:00

dce673905a

[codex] Initialize exec-server OpenTelemetry at startup (#25019 )

## Summary

- Initialize stderr tracing and the configured OpenTelemetry provider
for local and remote `codex exec-server` startup.
- Instrument the local and remote server entrypoints with a root runtime
span.
- Keep raw Noise environment, registration, and stream identifiers out
of exported spans while preserving them in local debug events.
- Keep telemetry setup in a focused CLI module instead of growing the
top-level command entrypoint.

## Stack

- Previous: none (`#27058` has merged)
- Next: #27466

## Validation

- `just test -p codex-exec-server --lib` (139 passed)
- `just test -p codex-cli --test exec_server` (3 passed)
- `just bazel-lock-check`
- `just fix -p codex-exec-server -p codex-cli`
- `just fmt`

---------

Co-authored-by: Richard Lee <richardlee@openai.com>

starr-openai · 2026-06-18 11:03:42 -07:00

4c7228e423

Recover exec process stdin writes (#28895 )

## Summary

Remote stdio MCP servers send tool calls by writing JSON-RPC bytes
through `process/write`.

When the exec-server websocket drops at the wrong time, the remote
process can survive session recovery, but the stdin write can still fail
back to RMCP as a transport send error. RMCP then closes the stdio MCP
transport, so tools like `node_repl` are lost even though the
process/session recovery path is working.

This changes `process/write` to be safe to retry across exec-server
recovery:

- adds a required `writeId` to `process/write`
- retries remote `Session::write` with the same `writeId` after
reconnect
- remembers accepted write ids per process so duplicate retries return
`Accepted` without writing the same bytes to child stdin again
- covers both the client retry path and server-side write id dedupe with
tests

In simple terms:

```text
before:
write to MCP stdin -> websocket closes -> write errors -> RMCP closes node_repl

after:
write to MCP stdin -> websocket closes -> reconnect -> retry same writeId
server either writes once or recognizes it already did
```
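
The server-side half of this can be sketched as a per-process set of accepted ids. The names `ProcessStdin` and `WriteStatus` are hypothetical, and a `Vec<u8>` stands in for the child's stdin pipe.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq, Eq)]
pub enum WriteStatus {
    Accepted,
}

/// Hypothetical per-process stdin state on the server side.
#[derive(Default)]
pub struct ProcessStdin {
    accepted_write_ids: HashSet<String>,
    /// Stand-in for the child's stdin pipe.
    pub written: Vec<u8>,
}

impl ProcessStdin {
    /// Writes `bytes` once per `write_id`; a retried id is acknowledged without writing again.
    pub fn write(&mut self, write_id: &str, bytes: &[u8]) -> WriteStatus {
        // `insert` returns false when the id was already recorded.
        if self.accepted_write_ids.insert(write_id.to_string()) {
            self.written.extend_from_slice(bytes);
        }
        WriteStatus::Accepted
    }
}
```

A client that retries with the same `writeId` after reconnecting then gets the same `Accepted` answer whether or not the first attempt reached the server.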

jif · 2026-06-18 19:04:26 +02:00

83e6a786a2

Add network environment ID plumbing (#28766 )

## Why

Prepare network approval scoping to distinguish execution environments
without changing behavior yet.

## What changed

- Add optional environment IDs to network policy requests.
- Add optional network environment IDs to exec and sandbox request
structs.
- Thread default `None` values through existing construction points.
- Fix stale constructor call sites that caused the CI compile failures.

## Not included

- Per-environment proxy listeners.
- Network approval cache or prompt behavior changes.
- Ambiguous request attribution handling.

Those behavior changes moved to stacked follow-up #28899.

## Validation

- just fmt
- CI will run tests and clippy

jif · 2026-06-18 14:09:38 +02:00

0369b24d54

Refresh signed exec-server URLs on reconnect (#28374 )

## Summary

- add a provider API that supplies a fresh signed WebSocket URL for each
remote exec-server connection
- refresh the signed URL after disconnects and retry once when a
handshake returns `401 Unauthorized`
- allow `EnvironmentManager` consumers to register remote environments
backed by the URL provider

## Tests

- `just test -p codex-exec-server -E
'test(remote_websocket_client_refreshes_url_after_unauthorized_handshake)
| test(remote_websocket_client_refreshes_url_after_disconnect)'` — 2
passed
- `cargo check -p codex-core-api` — passed
- `just fix -p codex-exec-server` — passed
- `just fix -p codex-core-api` — no test targets; no-op
- `just fmt` — passed
- `just test -p codex-exec-server` — 187 passed; 32 unrelated macOS
sandbox tests could not invoke nested `sandbox-exec` (`Operation not
permitted`)

Anton Panasenko · 2026-06-17 20:58:48 -07:00

ac3fe64100

feat(exec-server): add Noise rendezvous environment (#28774 )

## Why

Codex can run a remote exec server through the Noise relay, but the
normal environment-manager path could not establish an
environment-registry-backed harness connection. Signed rendezvous URLs
and harness authorizations are short-lived, so reconnects must fetch a
fresh bundle instead of retaining stale connection credentials. A
stalled registry request must also fail within the regular remote
connection deadline, without exposing these credentials in debug logs.

Issue: N/A (internal environment-service integration).

## What Changed

- Add environment-manager configuration for a registry-backed Noise
rendezvous environment.
- Request a fresh bundle from
`/cloud/environment/{environment_id}/connect` for every physical harness
connection, using the existing 10-second remote connection timeout.
- Share the Environment Registry register, connect, and validate wire
payloads through `codex-exec-server` and `codex-core-api`.
- Redact the signed rendezvous URL and harness authorization from the
public connect response's `Debug` output.
- Add focused coverage for registry bundle retrieval, stalled requests,
and credential redaction.
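
One common way to get this kind of redaction is a hand-written `Debug` implementation. The sketch below assumes a hypothetical `ConnectBundle` struct with illustrative field names; it is not the actual connect response type.

```rust
use std::fmt;

/// Hypothetical connect response; field names are illustrative only.
pub struct ConnectBundle {
    pub environment_id: String,
    pub rendezvous_url: String,
    pub harness_authorization: String,
}

impl fmt::Debug for ConnectBundle {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Print the non-sensitive id, but never the signed URL or the authorization.
        f.debug_struct("ConnectBundle")
            .field("environment_id", &self.environment_id)
            .field("rendezvous_url", &"<redacted>")
            .field("harness_authorization", &"<redacted>")
            .finish()
    }
}
```

Skipping `#[derive(Debug)]` on such a type means a stray `{:?}` in a log line cannot leak the credentials.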

Anton Panasenko · 2026-06-17 17:20:53 -07:00

c274a83f8b

exec-server: expose environment registry payloads (#28651 )

## Why

Services that proxy the exec-server environment registry endpoints need
to deserialize and forward the same Noise registration and harness-key
validation payloads. Those wire models currently live as private,
serialize-only structs in `exec-server`, which forces consumers to
duplicate the contract.

## What changed

- Add owned serde models for registration and harness-key validation
requests and responses.
- Use those models in the existing exec-server registry client.
- Re-export the models from `codex-exec-server` and `codex-core-api`.
- Keep the harness authorization request free of a derived `Debug`
implementation so it is not accidentally logged.

## Testing

- Focused exec-server registration and harness-key validation tests: 2
passed.
- `cargo check -p codex-core-api`

The full `codex-exec-server` suite compiled and ran 254 tests: 222
passed, while 32 existing filesystem sandbox tests could not run under
the nested macOS sandbox (`sandbox_apply: Operation not permitted`).

Co-authored-by: Codex <noreply@openai.com>

viyatb-oai · 2026-06-17 13:27:25 -07:00

a0586ad12d

unified-exec: preserve PathUri through exec-server (#28681 )

## Why

It should be possible for app-server to handle "foreign" OS paths in
unified_exec working directories, so that, for example, a Linux
app-server can run processes on a Windows exec-server.

## What

Convert the core unified_exec cwd values to use `PathUri`.

Adds fallible path conversion in several places to try to minimize the
scope of this change. The only time this change suppresses errors from
converting `PathUri` to an `AbsolutePathBuf` is when the turn is
configured with no sandboxing at all to allow us to make progress
testing without sandboxing.

Future changes to apply_patch and sandboxing will clean up these error
paths.

A tool's cwd is resolved from joining a model-provided workdir to the
environment's cwd. When using `AbsolutePathBuf::join()`, an
absolute-path workdir would overwrite the environment's cwd and we would
resolve permissions/sandboxing against the model-provided path. This
change extends `PathUri::join()` to also treat an absolute rhs as an
override of the base/lhs.
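
The join rule can be sketched for POSIX-style paths as follows. This is a simplified string-based stand-in, not `PathUri::join()` itself: `join_posix` is a hypothetical helper and it ignores URI encoding, Windows paths, and `..` normalization.

```rust
/// Joins a POSIX-style `workdir` onto `base_cwd`; an absolute `workdir` replaces the base.
pub fn join_posix(base_cwd: &str, workdir: &str) -> String {
    // Absolute right-hand side: override the base entirely.
    if workdir.starts_with('/') {
        return workdir.to_string();
    }
    // Empty right-hand side: keep the base unchanged.
    if workdir.is_empty() {
        return base_cwd.to_string();
    }
    // Relative right-hand side: append with exactly one separator.
    format!("{}/{}", base_cwd.trim_end_matches('/'), workdir)
}
```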

This also removes some coverage from the remove_env_windows tests until
a follow-up converts foreign paths in command exec events correctly.

## Breaking Changes

When using `AbsolutePathBuf::join()` for workdir resolution, we ended up
resolving tilde-prefixed paths against the app-server's `$HOME`, e.g.
`~/foo/bar` becomes `/home/anp/foo/bar`. It's difficult to do this with
`PathUri` joining, so after offline discussion this PR no longer
implements it.

A quick check of some power users' rollouts suggests that models don't
actually generate home-prefixed absolute working directories for their
spawns, so this shouldn't have any real blast radius.

Adam Perry @ OpenAI · 2026-06-17 19:36:16 +00:00

5867b529ae

Run fs helper through Windows sandbox wrapper (#28359 )

## Why

This is the final PR in the Windows fs-helper sandbox stack and contains
the actual bug fix.

The exec-server filesystem helper is a direct-spawn path: it asks
`SandboxManager` for a `SandboxExecRequest`, then launches the returned
argv itself. That works on macOS and Linux because the transformed argv
is already a self-contained sandbox wrapper. On Windows, the transformed
request carried `WindowsRestrictedToken` metadata, but the direct-spawn
fs-helper runner still launched the helper argv directly.

That means Windows filesystem built-ins backed by the fs-helper could
run with the parent Codex process permissions instead of the configured
Windows sandbox. This PR makes the direct-spawn transform produce a
self-contained Windows wrapper argv before fs-helper launches it.

## What Changed

- Added `SandboxManager::transform_for_direct_spawn()` for callers that
launch the returned argv themselves.
- Wrapped Windows restricted-token direct-spawn requests with `codex.exe
--run-as-windows-sandbox` and then marked the outer request as
unsandboxed, matching the macOS/Linux wrapper argv shape.
- Updated `exec-server/src/fs_sandbox.rs` to use the direct-spawn
transform for fs-helper launches.
- Materialized the inner `codex.exe --codex-run-as-fs-helper` executable
into `.sandbox-bin` so the sandboxed user can run it.
- Carried runtime workspace roots through `FileSystemSandboxContext` as
`PathUri` values so `:workspace_roots` policies resolve correctly
without sending native client paths over exec-server JSON.
- Preserved wrapper setup identity environment needed by Windows sandbox
setup without changing the serialized inner helper environment.

## Verification

- `just bazel-lock-update`
- `just bazel-lock-check`
- `just test -p codex-sandboxing transform_for_direct_spawn_windows`
- `just test -p codex-exec-server fs_sandbox::tests`
- `just fix -p codex-windows-sandbox -p codex-sandboxing -p
codex-exec-server -p codex-core -p codex-file-system`

Local note: `just fmt` completed Rust formatting, but this workstation
still fails the non-Rust formatter phases because uv cannot open its
cache and the local buildifier/dotslash path is missing.

iceweasel-oai · 2026-06-17 10:00:42 -07:00

ef75171f18

Back off registry retries during exec recovery (#28546 )

## Why

PR #28512 retries a failed session recovery every 100 ms. Every Noise
recovery attempt first asks the environment registry for a fresh
connection bundle, even when the eventual failure comes from the
WebSocket or initialize handshake. During an outage, that could make
each disconnected client call the registry about 250 times during the
25-second recovery window.

## What changes

All retryable Noise recovery failures now use a separate backoff
schedule:

```text
base:    500 ms -> 1 s -> 2 s -> 4 s -> 5 s maximum
actual:  500-750 ms, 1-1.5 s, 2-3 s, 4-6 s, 5-7.5 s
```

The extra 0-50% is deterministic per-session jitter so disconnected
clients do not retry together. Direct WebSocket recovery keeps the
existing 100 ms retry because it does not re-enter the registry.
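
The schedule above can be sketched as a capped doubling plus a per-session jitter fraction. The hashing scheme used to derive the jitter is an assumption for this example, as are the function names; only the base values and the 0-50% range come from the description.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::time::Duration;

const BASE_MS: u64 = 500;
const MAX_BASE_MS: u64 = 5_000;

/// Base delay for a zero-indexed retry attempt: 500 ms doubling up to a 5 s cap.
pub fn base_delay_ms(attempt: u32) -> u64 {
    // Clamp the shift so large attempt counts cannot overflow.
    let doubled = BASE_MS.saturating_mul(1u64 << attempt.min(16));
    doubled.min(MAX_BASE_MS)
}

/// Jitter in permille (0..=500, i.e. 0-50%), derived from the session id.
/// Assumed scheme: hash the id so the same session always gets the same fraction.
fn jitter_permille(session_id: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    session_id.hash(&mut hasher);
    hasher.finish() % 501
}

/// Base delay plus the session's deterministic jitter.
pub fn retry_delay(session_id: &str, attempt: u32) -> Duration {
    let base = base_delay_ms(attempt);
    Duration::from_millis(base + base * jitter_permille(session_id) / 1000)
}
```

With at most five distinct base delays, a 25-second window yields a handful of registry calls per client instead of roughly 250.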

jif · 2026-06-17 11:52:23 +02:00

a5229e0686

Resume exec-server sessions after disconnect (#28512 )

Supersedes #28288 (closed).

## Why

A short WebSocket interruption currently ends every client-side process
handle, even though exec-server keeps the server session and its
processes alive for a short time.

This is especially visible for executor-backed stdio MCP servers: a
temporary connection loss becomes a permanent `Transport closed` error.
The server already has the information needed to resume the session, but
the client opens a fresh session instead of using it.

This change reconnects below the process and MCP layers. Existing
process handles stay valid, missed output is recovered, and the same
server-side processes continue running.

## State machine

One logical `ExecServerClient` stays alive while its underlying RPC
connection changes generations.

```text
                         transport closes
       +------------------------------------------------+
       |                                                v
+-------------+                                  +-------------+
|  Connected  |                                  | Recovering  |
+-------------+                                  +-------------+
       ^                                                |
       | session resumed, processes caught up           | retryable error
       +------------------------------------------------+ loops until deadline
                                                        |
                                                        | deadline or permanent error
                                                        v
                                                  +-------------+
                                                  |   Failed    |
                                                  +-------------+
```

### `Connected`

- New RPC calls use the current connection.
- Process notifications are published in sequence order.
- A disconnect only starts recovery if it came from the current
connection generation. Late events from older generations cannot replace
the active connection.

### `Recovering`

- New calls wait instead of choosing a half-connected RPC client.
- Existing process handles, wake subscriptions, and event subscriptions
stay open.
- Streaming HTTP response bodies fail immediately because their byte
streams cannot be resumed safely.
- Recovery first waits for process starts that were already in flight. A
start whose result became ambiguous is cleaned up after reconnection
instead of being silently adopted.
- The client reconnects with the learned `session_id`. The server may
briefly report that the old connection is still attached, so that error
is retried until the detach finishes.
- The notification consumer starts before the resume handshake
completes. This prevents a busy process from filling the notification
queue and blocking the initialize response.
- Before installing the new connection, the client catches up every
recoverable process with `process/read`.

### `Failed`

- Recovery stops after 25 seconds or after a permanent error.
- Waiting calls are released with one stable disconnect error.
- Existing process sessions receive a terminal failure instead of
waiting forever.
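
The transitions between the three states can be sketched as a pure function. The enum and event names are hypothetical, and the sketch omits queued calls, the deadline timer, and the process catch-up; it only shows the generation check and the allowed edges.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConnectionState {
    Connected { generation: u64 },
    Recovering { generation: u64 },
    Failed,
}

#[derive(Debug, Clone, Copy)]
pub enum Event {
    TransportClosed { generation: u64 },
    SessionResumed,
    RetryableError,
    DeadlineOrPermanentError,
}

/// Applies one event to the state; unknown combinations leave the state unchanged.
pub fn step(state: ConnectionState, event: Event) -> ConnectionState {
    use ConnectionState::*;
    match (state, event) {
        // Only a close from the active generation starts recovery.
        (Connected { generation }, Event::TransportClosed { generation: closed })
            if closed == generation =>
        {
            Recovering { generation }
        }
        // A resumed session installs the next connection generation.
        (Recovering { generation }, Event::SessionResumed) => Connected {
            generation: generation + 1,
        },
        (Recovering { .. }, Event::DeadlineOrPermanentError) => Failed,
        // Retryable errors, stale closes, and events after failure change nothing.
        (state, _) => state,
    }
}
```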

## Recovering process events

Output, exit, and close events share one sequence. During normal
operation, the client buffers early events until every lower sequence
has been published.

After reconnection, the client reads each process starting after its
last published sequence:

1. Retained output chunks are inserted by sequence number.
2. Exit and close state are reconstructed in their sequence positions.
3. Events already received as live notifications are ignored as
duplicates.
4. Newly contiguous events are published in order.
5. If the server no longer retains enough output to fill a sequence gap,
only that process is terminated and failed. The recovered connection
remains usable for other processes.
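
Steps 1 through 4 amount to a reorder buffer keyed by sequence number. The sketch below uses a hypothetical `EventReorderBuffer`; it does not model the termination in step 5, only how a remaining gap could be detected.

```rust
use std::collections::BTreeMap;

/// Buffers events that arrive out of order and releases them by contiguous sequence number.
pub struct EventReorderBuffer<T> {
    next_seq: u64,
    pending: BTreeMap<u64, T>,
}

impl<T> EventReorderBuffer<T> {
    /// `first_seq` is the first sequence number that has not been published yet.
    pub fn new(first_seq: u64) -> Self {
        Self {
            next_seq: first_seq,
            pending: BTreeMap::new(),
        }
    }

    /// Accepts a live or recovered event; returns the events that became publishable, in order.
    pub fn accept(&mut self, seq: u64, event: T) -> Vec<T> {
        // Already published or already buffered: ignore as a duplicate.
        if seq < self.next_seq || self.pending.contains_key(&seq) {
            return Vec::new();
        }
        self.pending.insert(seq, event);
        let mut ready = Vec::new();
        // Drain every event that is now contiguous with the published prefix.
        while let Some(event) = self.pending.remove(&self.next_seq) {
            ready.push(event);
            self.next_seq += 1;
        }
        ready
    }

    /// True when buffered events exist but the next expected sequence is still missing.
    pub fn has_gap(&self) -> bool {
        !self.pending.is_empty()
    }
}
```

Feeding both live notifications and `process/read` results through the same `accept` call is what makes duplicates harmless in this sketch.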

The server reports its full next event sequence for unbounded reads,
including exit and close events. Closed processes remain readable for
the same 30-second window used to retain detached sessions.

## Other details

- Detached server sessions are retained for 30 seconds, leaving margin
around the client's 25-second recovery deadline.
- Session attach and detach update the active notification sender under
the same attachment lock, so an old connection cannot clear a newly
attached sender.
- A dedicated error code distinguishes the temporary "session is still
attached" race from permanent initialization errors.
- Process starts are identity-checked on both client and server. Cleanup
from an older start cannot remove a newer process that reused the same
ID.
- Mutating requests that were already in flight when the transport
closed are not replayed, because the client cannot know whether the
server applied them. Requests started after recovery is known wait for
the replacement connection.
- We assume the client and server versions stay in sync: both sides
are either before or after this PR.

## User impact

Long-running commands and stdio MCP servers can survive a temporary
exec-server WebSocket interruption without changing process IDs or
losing output produced during the outage.

jif · 2026-06-17 10:20:39 +02:00

cf17e1bc20

[codex] exec-server: stream files in chunks (#28354 )

## Why

`fs/readFile` buffers the entire file in one response, which makes large
remote reads expensive and prevents callers from applying backpressure.
We need an opt-in streaming path with bounded block sizes while
preserving the existing single-call API for small and sandboxed reads.

## What changed

- Add `ExecServerClient::stream`, returning a named `FileReadStream`
that implements `futures::Stream` and yields immutable 1 MiB byte
blocks.
- Add internal `fs/open`, `fs/readBlock`, and `fs/close` RPCs.
`fs/readBlock` accepts an explicit offset and length.
- Keep unsandboxed files open between block reads, cap open handles per
connection, and clean them up on EOF, error, stream drop, explicit
close, or connection shutdown.
- Reject platform-sandboxed streaming opens instead of turning the
one-shot sandbox helper into a persistent server. Existing `fs/readFile`
behavior is unchanged.
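
The offset-and-length read can be sketched over any seekable source. `read_block` and `read_all_blocks` are hypothetical helpers for this example, not the `fs/readBlock` handler; an empty block is used here to signal EOF.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Block size used by the streaming path described above.
pub const BLOCK_SIZE: usize = 1024 * 1024; // 1 MiB

/// Reads up to `len` bytes starting at `offset`; an empty result signals EOF.
pub fn read_block<R: Read + Seek>(source: &mut R, offset: u64, len: usize) -> io::Result<Vec<u8>> {
    source.seek(SeekFrom::Start(offset))?;
    let mut block = Vec::with_capacity(len.min(BLOCK_SIZE));
    // `take` bounds the read so one call never returns more than `len` bytes.
    source.by_ref().take(len as u64).read_to_end(&mut block)?;
    Ok(block)
}

/// Reads the whole source as consecutive blocks of at most `block_size` bytes.
pub fn read_all_blocks<R: Read + Seek>(source: &mut R, block_size: usize) -> io::Result<Vec<Vec<u8>>> {
    let mut blocks = Vec::new();
    let mut offset = 0u64;
    loop {
        let block = read_block(source, offset, block_size)?;
        if block.is_empty() {
            return Ok(blocks);
        }
        offset += block.len() as u64;
        blocks.push(block);
    }
}
```

Because each call carries its own offset, the caller controls pacing and can read non-sequentially, which is what allows backpressure.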

## Testing

- `just test -p codex-exec-server`
- Integration coverage for 1 MiB chunking, exact block-boundary EOF,
sandbox rejection, and continued reads from the opened file after path
replacement.
- Handle-manager coverage for non-sequential offsets, variable block
lengths, the 128-handle limit, and capacity release after close.

pakrym-oai · 2026-06-16 09:50:55 -07:00

a4711b88dd

path-uri: clarify invalid host path errors (#28473 )

## Why

Ensure a consistent string format when exposing path conversion errors
to the model.

## What

- Render `PathUriParseError::InvalidFileUriPath` as `'$PATH' is invalid
on '$OS'`.
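
A `Display` implementation producing that format might look like the sketch below. `InvalidHostPath` is a hypothetical standalone type, not the `PathUriParseError` variant, and how the OS name is spelled in practice is an assumption.

```rust
use std::fmt;

/// Hypothetical error carrying the offending path and the host OS name.
#[derive(Debug)]
pub struct InvalidHostPath {
    pub path: String,
    pub os: String,
}

impl fmt::Display for InvalidHostPath {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Matches the `'$PATH' is invalid on '$OS'` shape described above.
        write!(f, "'{}' is invalid on '{}'", self.path, self.os)
    }
}

impl std::error::Error for InvalidHostPath {}
```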

Adam Perry @ OpenAI · 2026-06-16 09:03:44 -07:00

7162030b37

[codex] Use expect in integration tests (#28441 )

The workspace denies `clippy::expect_used` in production. Although
`clippy.toml` allows `expect` in tests, Bazel Clippy compiles
integration-test helper code in a way that does not receive that
exemption, which encouraged verbose `unwrap_or_else(... panic!(...))`
and equivalent `match`/`let else` forms.

This allows `clippy::expect_used` once at each integration-test crate
root (including aggregated suites and test-support libraries), then
replaces manual panic-based Result and Option unwraps with
`expect`/`expect_err`. Standalone `tests/*.rs` files remain their own
crate roots. Intentional assertion and unexpected-variant panics remain
unchanged, and the production `expect_used = "deny"` lint remains in
place.
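
The before and after shapes look roughly like this. The helper names are hypothetical, and the sketch uses a function-level `#[allow]` so it compiles anywhere; the change itself places the allow once at each integration-test crate root.

```rust
use std::num::ParseIntError;

/// Hypothetical helper standing in for any fallible call in a test.
fn parse_port(raw: &str) -> Result<u16, ParseIntError> {
    raw.trim().parse::<u16>()
}

/// Before: a manual panic that works around the denied lint.
fn port_verbose(raw: &str) -> u16 {
    parse_port(raw).unwrap_or_else(|err| panic!("port should parse: {err}"))
}

/// After: `expect` carries the same message with less ceremony.
#[allow(clippy::expect_used)]
fn port_concise(raw: &str) -> u16 {
    parse_port(raw).expect("port should parse")
}

/// After: `expect_err` replaces a `match` that panics on `Ok`.
#[allow(clippy::expect_used)]
fn port_error(raw: &str) -> ParseIntError {
    parse_port(raw).expect_err("port should be rejected")
}
```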

The cleanup is mechanical and net-negative in line count.

pakrym-oai · 2026-06-15 21:53:47 -07:00

e752f7b4ae

exec-server: default remote transport to Noise (#26245 )

## Why

The transport in
[openai/codex#26242](https://github.com/openai/codex/pull/26242) needs
to be used by every remote orchestrator-to-executor connection before
JSON-RPC traffic starts.

## Changes

- Generates one executor Noise identity when remote exec-server starts
and registers its public key.
- Creates a harness identity for each physical remote environment
connection.
- Fetches a fresh registry bundle before connecting and validates the
authenticated harness key before completing the executor handshake.
- Multiplexes encrypted logical streams over the existing executor
WebSocket.
- Adds bounded stream, handshake-failure, and reassembly state.
- Adds safe lifecycle diagnostics without logging keys, authorizations,
plaintext, or ciphertext.
- Covers reconnects, replay rejection, validation failure, framing
limits, and encrypted JSON-RPC tool traffic.

## Stack

1. [openai/codex#26242](https://github.com/openai/codex/pull/26242):
Noise channel and relay transport
2. **[openai/codex#26245](https://github.com/openai/codex/pull/26245)**:
remote registration and runtime activation

## Verification

- `just test -p codex-exec-server`
- `just fix -p codex-exec-server`
- `just bazel-lock-check`
- `cargo shear`

---------

Co-authored-by: Codex <noreply@openai.com>

viyatb-oai · 2026-06-15 17:39:00 -07:00

6e50b22e55

Run core integration tests against a Wine-backed Windows executor (#28401 )

## Why

We want to exercise a Linux app-server against a Windows exec-server
without having to repeat every test case. This approach has slight
precedent in the remote Docker test setup.

## What

Run the shared `codex-core` integration suite against Windows
exec-server behavior from Linux. This makes cross-OS path and shell
regressions visible while keeping unsupported cases owned by individual
tests.

- Add `local`, `docker`, and `wine-exec` test environment selection with
legacy Docker compatibility.
- Extend `codex_rust_crate` to generate a sharded Wine-exec variant
using a cross-built Windows server and pinned Bazel Wine/PowerShell
runtimes.
- Teach remote-aware helpers about Windows paths and track temporary
incompatibilities with source-local `skip_if_wine_exec!` calls and
follow-up reasons.

Adam Perry @ OpenAI · 2026-06-16 00:38:41 +00:00

1fe89de576

Use PathUri in filesystem permission paths for exec-server (#28165 )

## Why

Progress towards letting app-server and exec-server run on different
platforms, specifically for sandbox configuration.

## What

- Make the filesystem path containment hierarchy generic, defaulting to
`AbsolutePathBuf` for now.
- Have clients specify `AbsolutePathBuf` or `PathUri` directly where
needed.
- Use `PathUri` throughout exec-server filesystem protocol and trait
boundaries.
- Implement `From` for conversion to path URIs and `TryFrom` for
fallible conversion to absolute paths through the generic type
hierarchy.
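
The conversion pairing can be sketched as below. The two structs are simplified, string-backed stand-ins that borrow the names `AbsolutePathBuf` and `PathUri`; the POSIX-only check, the lack of percent-encoding, and the `String` error type are assumptions made to keep the example small, not the crate's actual behavior.

```rust
use std::convert::TryFrom;

/// Simplified stand-in: an absolute POSIX-style path held as a string.
#[derive(Debug, Clone, PartialEq)]
struct AbsolutePathBuf(String);

/// Simplified stand-in: a `file:` URI held as a string.
#[derive(Debug, Clone, PartialEq)]
struct PathUri(String);

/// Path to URI always succeeds, so it is modeled with `From`.
impl From<AbsolutePathBuf> for PathUri {
    fn from(path: AbsolutePathBuf) -> Self {
        // Assumption: no percent-encoding, POSIX paths only.
        PathUri(format!("file://{}", path.0))
    }
}

/// URI to path can fail, so it is modeled with `TryFrom`.
impl TryFrom<PathUri> for AbsolutePathBuf {
    type Error = String;

    fn try_from(uri: PathUri) -> Result<Self, Self::Error> {
        match uri.0.strip_prefix("file://") {
            Some(path) if path.starts_with('/') => Ok(AbsolutePathBuf(path.to_string())),
            Some(path) => Err(format!("'{path}' is not an absolute path")),
            None => Err(format!("'{}' is not a file URI", uri.0)),
        }
    }
}
```

Splitting the two directions this way lets callers convert to a URI without error handling while forcing them to handle the failure case when going back to a host path.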

Adam Perry @ OpenAI · 2026-06-15 23:55:23 +00:00

46f17930b6

exec-server: add Noise relay transport (#26242 )

## Why

Rendezvous forwards traffic between the orchestrator and exec-server.
The endpoints need to authenticate each other and encrypt that traffic
without trusting Rendezvous with plaintext or endpoint keys.

## Changes

- Adds a hybrid Noise IK channel through Clatter using X25519,
ML-KEM-768, AES-256-GCM, and SHA-256.
- Binds each handshake to `environment_id`, `executor_registration_id`,
and `stream_id`.
- Pins the registry-provided executor key and carries the harness
authorization inside the encrypted handshake.
- Orders relay frames before consuming Noise nonces and fragments large
JSON-RPC messages into bounded records.
- Bounds handshake payloads, frames, streams, and message reassembly.
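
The fragmentation and bounded reassembly steps might look roughly like the sketch below. It leaves out encryption, nonces, and frame ordering entirely; `Record`, `fragment`, `Reassembler`, and the limit parameters are hypothetical names chosen for illustration.

```rust
/// Hypothetical record: one bounded slice of a larger message.
#[derive(Debug, Clone, PartialEq)]
struct Record {
    /// True on the final record of a message.
    last: bool,
    payload: Vec<u8>,
}

/// Split a message into records of at most `max_payload` bytes each.
fn fragment(message: &[u8], max_payload: usize) -> Vec<Record> {
    assert!(max_payload > 0, "record payload limit must be non-zero");
    if message.is_empty() {
        return vec![Record { last: true, payload: Vec::new() }];
    }
    let total = message.chunks(max_payload).count();
    message
        .chunks(max_payload)
        .enumerate()
        .map(|(index, chunk)| Record { last: index + 1 == total, payload: chunk.to_vec() })
        .collect()
}

/// Rebuilds messages while refusing to buffer more than `max_message` bytes.
struct Reassembler {
    max_message: usize,
    buffer: Vec<u8>,
}

impl Reassembler {
    fn new(max_message: usize) -> Self {
        Self { max_message, buffer: Vec::new() }
    }

    /// Returns `Ok(Some(message))` when the final record arrives.
    fn push(&mut self, record: Record) -> Result<Option<Vec<u8>>, &'static str> {
        if self.buffer.len() + record.payload.len() > self.max_message {
            // Drop partial state so an oversized message cannot pin memory.
            self.buffer.clear();
            return Err("message exceeds reassembly limit");
        }
        self.buffer.extend_from_slice(&record.payload);
        if record.last {
            Ok(Some(std::mem::take(&mut self.buffer)))
        } else {
            Ok(None)
        }
    }
}
```

Checking the limit before appending keeps the buffer from ever growing past the bound, even when a peer sends a stream of non-final records.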

Runtime activation is in
[openai/codex#26245](https://github.com/openai/codex/pull/26245).

## Stack

1. **[openai/codex#26242](https://github.com/openai/codex/pull/26242)**:
Noise channel and relay transport
2. [openai/codex#26245](https://github.com/openai/codex/pull/26245):
remote registration and runtime activation

## Verification

- `just test -p codex-exec-server`
- Oversized initiator payload regression coverage
- `just fix -p codex-exec-server`
- `just bazel-lock-check`
- `cargo shear`

---------

Co-authored-by: Codex <noreply@openai.com>

viyatb-oai · 2026-06-15 16:39:41 -07:00

428cd44154

chore: restore exec-server relay keepalives (#28286 )

## Why

The WebSocket pump refactor removed the relay keepalive timers that had
been added to keep idle Rendezvous connections alive. An idle relay could
therefore be closed by the Rendezvous service or a load balancer,
disconnecting executor-backed MCP processes.

## What

- restore periodic WebSocket ping frames on both rendezvous relay
endpoints
- keep missed-tick behavior bounded with `MissedTickBehavior::Skip`
- cover the harness and remote-environment pumps with focused
traffic-after-keepalive tests

jif · 2026-06-15 17:24:36 +02:00

bbcfed8ac2

[codex] exec-server honors remote environment cwd and shell (#28122 )

## Why

The next slice needed to make progress on the `remote_env_windows` test
is support for passing a Windows cwd for the remote environment and using
that environment's native shell. This lets the test run a real Windows
process instead of only recording an early path or shell mismatch.

## What

- change `TurnEnvironmentSelection.cwd` from `AbsolutePathBuf` to
`PathUri`
- convert local cwd values to URIs when constructing selections
- preserve a remote primary cwd instead of replacing it with the local
legacy fallback
- prefer the selected environment's discovered shell for unified exec,
falling back to the session shell when unavailable
- convert back to a host-native absolute path at current native-only
consumer boundaries
- reject or deny unsupported foreign cwd values at the existing
request-permissions boundary, with TODOs for its future migration
- extend the hermetic Wine test to execute Windows PowerShell in
`C:\windows` and verify successful process completion
- record the current app-server rejection against the same Wine-backed
remote Windows fixture when its cwd is supplied as a native Windows path

Adam Perry @ OpenAI · 2026-06-14 06:07:46 +00:00

efbd00f21f

build: run buildifier from just fmt (#28125 )

## Intent

Keep Bazel and Starlark files consistently formatted without requiring
contributors to install or version buildifier themselves.

## Implementation

- Add a SHA-256-pinned, cross-platform DotSlash manifest for buildifier
v8.5.1.
- Run buildifier from the shared `just fmt` and `just fmt-check` driver,
with Windows-safe explicit DotSlash invocation.
- Provision DotSlash in formatting CI and contributor devcontainers, and
document the source-build prerequisite.
- Apply the initial mechanical buildifier formatting baseline.

Adam Perry @ OpenAI · 2026-06-13 21:43:39 -07:00

740c4f269d

[codex] Carry exec-server cwd as PathUri (#28032 )

## Why

This is the second-to-last place in the exec-server protocol that needs
to migrate to URIs to support cross-OS operation.

## What

- Change `ExecParams.cwd` to `PathUri`.
- Keep the cwd URI-shaped through core and rmcp producers, converting it
to `AbsolutePathBuf` only in `LocalProcess::start_process`.
- Reject non-native cwd URIs before launch and update the affected
protocol documentation and call sites.
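
A launch-time check of this kind could be sketched as follows. Everything here is hypothetical: `PathStyle`, `uri_path_style`, and `native_cwd` are illustrative names, the drive-letter test is a simplification, and the real conversion in `LocalProcess::start_process` may differ.

```rust
/// Hypothetical tag for the path convention a URI or host uses.
#[derive(Debug, Clone, Copy, PartialEq)]
enum PathStyle {
    Posix,
    Windows,
}

/// Classify a `file:///` URI by whether its path starts with a drive letter.
fn uri_path_style(uri: &str) -> Option<(PathStyle, &str)> {
    let rest = uri.strip_prefix("file:///")?;
    let bytes = rest.as_bytes();
    let has_drive = bytes.len() >= 2 && bytes[0].is_ascii_alphabetic() && bytes[1] == b':';
    let style = if has_drive { PathStyle::Windows } else { PathStyle::Posix };
    Some((style, rest))
}

/// Convert a cwd URI to a host-native path, rejecting foreign styles
/// before any process is launched.
fn native_cwd(uri: &str, host: PathStyle) -> Result<String, String> {
    let (style, rest) =
        uri_path_style(uri).ok_or_else(|| format!("'{uri}' is not a file URI"))?;
    if style != host {
        return Err(format!("cwd '{uri}' is not native to {host:?}"));
    }
    Ok(match host {
        PathStyle::Posix => format!("/{rest}"),
        PathStyle::Windows => rest.replace('/', "\\"),
    })
}
```

Keeping the value URI-shaped until this single boundary means only one place needs to know the host's path convention.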

Adam Perry @ OpenAI · 2026-06-13 20:56:42 +00:00

0fed4497f5

[codex] Add hermetic Wine exec-server test (#27937 )

## Why

We want to make it possible for an app-server orchestrator on one OS to
control an exec-server on another host running a different OS. In
practice this kinda already works if you get lucky and the two hosts
have the same path format, but we mangle quite a lot of operations if
either end is Windows.

This test starts exercising that interaction, although right now the
initial bootstrap fails. Future changes will expand the test's
assertions to match improved support.

## What

Stacked on #27964. This adds a small Windows exec-server fixture and a
Linux protocol smoke test using the reusable Wine harness, covering
Windows environment discovery, non-TTY `cmd.exe` execution, output, exit
status, and working directory.

Once the full codex binary cross-builds under Bazel, we could
consider moving to the real binary instead of the stripped-down
exec-server-only binary used here.

Adam Perry @ OpenAI · 2026-06-12 20:20:23 -07:00

9d938a46d9
