codex

[codex] narrow unused skills intro export (#29991)

## Summary

- stop publicly re-exporting the internally used
`SKILLS_INTRO_WITH_ALIASES` constant
- keep the constant and all skills rendering behavior unchanged
- preserve every integration helper, API, fixture, assertion, and module
used by tests

## Scope guardrails

This revision keeps all remote/network-facing functionality and every
line introduced by `jif <jif@openai.com>`.

Following the test-preservation audit, it also restores the in-process
RMCP test transport, the original `codex-mcp` fixture,
`PluginLoadOutcome::effective_skill_roots` and its assertions, the
`EffectiveSkillRoots` API family, the test-only apps renderer, and the
TUI dead-code annotation. Those files now match the PR base exactly.

No test imports or directly references the remaining public skills
export being narrowed.

## Validation

- repository-wide test-reference audit: no test-used code remains
deleted or narrowed
- deleted-line `git blame` audit: zero Jif-authored deletions
- `cargo test -p codex-core-plugins -p codex-mcp -p codex-rmcp-client
--lib`: 467 passed
- `cargo test -p codex-core --lib apps::render`: 2 passed
- `cargo test -p codex-core-skills --lib render::tests`: 19 passed
- `cargo check -p codex-core-skills --all-targets`: passed
- `just fix -p codex-core-skills`: passed
- `just fmt`: passed
- `git diff --check`: passed

The full local `codex-core-skills` suite passed 106/108 tests; two
loader tests detected an ambient repository skills root outside the
package and failed their isolation assertions. The scoped renderer suite
and all-target compile pass, and CI runs in an isolated environment.

Final code delta: 1 insertion, 2 deletions across 2 files.

Ahmed Ibrahim · 2026-06-26 05:52:04 -07:00

914c8eeb4e

Test selected capabilities across unavailable resume (#30215)

## Why

The selected-capability integration test already covers initial
attachment and cold resume, but it resumes while the selected executor
is still reachable.

That leaves an important World State transition untested: a thread
remembers its selected capability root, resumes while that environment
is unavailable, and later sees the same stable environment return.

## What this tests

This extends the existing end-to-end scenario:

```text
selected executor available
        ↓
app-server stops and the executor goes away
        ↓
thread resumes with the executor unavailable
        ↓
skills, selected MCP tools, and connector attribution are absent
        ↓
the same environment ID is attached again
        ↓
skills, MCP tools, and connector attribution return
```

The test also checks that the unavailable snapshot explicitly tells the
model that no selected-environment skills are currently available. After
reattachment, it invokes the selected skill again and verifies that a
new executor-owned MCP process starts.

## Scope

This is test-only. It keeps the existing assumption that an environment
ID refers to stable capability contents. It does not add package-file
invalidation or live transport reconnect behavior.

jif · 2026-06-26 11:02:27 +01:00

3c03bb4f18

Reuse MCP runtimes when selected availability changes nothing (#30148)

## Why

MCP runtime reuse was keyed by every ready selected-capability
environment, even when an environment contributed no MCP servers or
connectors.

For example:

1. a global stdio MCP is running;
2. a selected remote environment contains only a skill;
3. that environment becomes ready;
4. the MCP and connector projection stays exactly the same;
5. Codex nevertheless rebuilds the MCP manager and restarts the global
stdio process.

That restart can interrupt active calls and discard process-local state
even though nothing about MCP changed.

## What changes

When selected-environment availability changes, Codex now resolves the
candidate MCP and connector projection before deciding whether to
replace the runtime:

- if the winning MCP servers or their ownership change, rebuild as
before;
- if the selected connector snapshot changes, rebuild as before;
- if an enabled MCP is explicitly bound to an environment whose
availability changed, rebuild as before;
- otherwise, keep the exact live manager and processes, and update only
the availability input remembered by the snapshot.

```text
ready selected environments:  [] -> [skills-env]
resolved MCP servers:          {global_probe} -> {global_probe}
resolved connectors:           {} -> {}
result:                         reuse manager; keep the same process
```

The comparison uses the resolved winning servers and their sources, so
plugin/config ownership remains part of the runtime identity.
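
The decision rules above can be sketched as a pure comparison. This is a minimal illustration, not the code from the PR: `Projection`, `Decision`, `decide`, and the use of plain strings for servers, sources, and environment IDs are all hypothetical names chosen for the example.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical summary of the inputs an MCP runtime is built from.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Projection {
    /// Winning server name -> source (plugin or config) that owns it.
    pub servers: BTreeMap<String, String>,
    /// Selected connector snapshot.
    pub connectors: BTreeSet<String>,
    /// Enabled server name -> environment it is explicitly bound to.
    pub bound_env: BTreeMap<String, String>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    Reuse,
    Rebuild,
}

/// Decide whether an availability change requires a new runtime.
pub fn decide(
    current: &Projection,
    candidate: &Projection,
    changed_envs: &BTreeSet<String>,
) -> Decision {
    // Servers or their ownership changed, or the connector snapshot changed.
    if current.servers != candidate.servers || current.connectors != candidate.connectors {
        return Decision::Rebuild;
    }
    // An enabled server is bound to an environment whose availability changed.
    let bound_env_changed = candidate
        .bound_env
        .values()
        .any(|env| changed_envs.contains(env));
    if bound_env_changed {
        Decision::Rebuild
    } else {
        Decision::Reuse
    }
}
```

With only `global_probe` resolved on both sides and `skills-env` as the changed environment, this sketch returns `Decision::Reuse`, which mirrors the example in the text block above.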

## Existing stack coverage

The integration PR directly below this one already covers both rebuild
boundaries: a selected MCP becomes callable and a selected connector
tool becomes model-visible when their environment becomes available. It
also verifies that an unchanged selected MCP runtime keeps its process.

This PR does not add another remote-attachment integration scenario for
the no-change optimization. `environment/add` returns before readiness,
and app-server does not currently expose a deterministic readiness
signal for an environment that contributes only skills. Keeping a
fixed-delay test would add flake risk; adding a new readiness API would
be outside this fix.

## Scope and assumptions

- This does not change skill discovery, World State rendering, or plugin
metadata caching.
- This does not add file watching or hot reload behavior.
- This does not change disconnect/reconnect handling.
- Selected environment IDs and their capability contents retain the
stack's existing stability assumption.
- Delayed `required = true` executor MCP behavior remains out of scope.

jif · 2026-06-26 09:27:41 +01:00

6d2168f06a

[codex] fix CreateThreadParams test initializer (#30198)

## Summary

- initialize `selected_capability_roots` in the new
`attach_in_memory_thread_store` test helper
- restore `codex-core` test compilation on `main`

## Root cause

[#30144](https://github.com/openai/codex/pull/30144) added the helper
from commit `0c3d0742`, whose parent was `c38b2e9b`. That branch was
based before [#29856](https://github.com/openai/codex/pull/29856) added
`selected_capability_roots` as a required field on `CreateThreadParams`.

The PR's Rust and Bazel workflows both passed against the stale branch
head `0c3d0742`. When #30144 was squashed onto newer `main`, its
initializer was integrated alongside the required field from #29856,
producing `E0063` in `core/src/session/tests.rs`. Because those
workflows tested the branch head rather than the integrated merge
result, they did not see the version-skew failure before merge.

## Impact

Any job that compiles the `codex-core` library tests fails, which turned
the main-branch `rust-ci-full` and `Bazel` workflows red across
platforms and blocks unrelated focused core tests. This change only
completes the test initializer; it does not alter production behavior or
workflow configuration.

## Validation

- `just fmt`
- `just test -p codex-core
turn_complete_flushes_terminal_event_after_delivery` (1 passed, 2909
skipped)
- `git diff --check`

Adam Perry @ OpenAI · 2026-06-26 08:47:27 +01:00

451c0a437f

[codex] wire process-owned code mode host into core (#30142)

## Summary

- add the `code_mode_host` feature flag and select
`ProcessOwnedCodeModeSessionProvider` in `CodeModeService` when enabled
- initialize code-mode sessions lazily so a missing host reports a tool
error without failing thread startup
- resolve `codex-code-mode-host` beside the running Codex binary by
default while preserving `CODEX_CODE_MODE_HOST_PATH` as an override
- add unit and end-to-end coverage for host resolution and graceful
missing-host behavior

## Why

This wires the process-owned session client from #30112 into the core
service behind an opt-in rollout gate. Packaged Codex installations can
place the helper in the same `bin` directory as the main executable
without relying on `PATH`, while development and custom installations
can continue to override the helper path.
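
A minimal sketch of that lookup order, assuming the override wins whenever it is set and non-empty. The function name `resolve_host_path` and the empty-string handling are assumptions for illustration; platform details such as a Windows `.exe` suffix are ignored.

```rust
use std::path::{Path, PathBuf};

/// Helper binary name taken from the PR description.
pub const HOST_BINARY: &str = "codex-code-mode-host";

/// Resolve the helper path: explicit override first, otherwise the
/// directory that contains the running executable.
pub fn resolve_host_path(override_path: Option<&str>, current_exe: &Path) -> Option<PathBuf> {
    // Assumption: an empty override is treated as unset.
    if let Some(path) = override_path.filter(|p| !p.is_empty()) {
        return Some(PathBuf::from(path));
    }
    // Default: a sibling of the running binary, independent of PATH.
    current_exe.parent().map(|dir| dir.join(HOST_BINARY))
}
```

A caller would typically pass `std::env::var("CODEX_CODE_MODE_HOST_PATH").ok().as_deref()` and the result of `std::env::current_exe()`.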

## Stack

- Depends on #30112
- Base branch: `cconger/process-owned-session-runtime-4-client`

## Validation

- Build `codex` and `codex-code-mode-host`.
- Run `CODEX_CODE_MODE_HOST_PATH="$PWD/target/debug/codex-code-mode-host" ./target/debug/codex --enable code_mode_host`.

Channing Conger · 2026-06-26 00:23:33 -07:00

7d8906b478

[codex] add process-owned code-mode session client (#30112)

## Summary

- add `ProcessOwnedCodeModeSessionProvider` and logical session
generation/rebinding state
- add the supervised child-process connection, reader/writer tasks, and
driver state machine
- make dropped execute/wait/open callers cancellation-safe with explicit
ownership handoff and durable cleanup
- validate cell/delegate lifecycle state and reject invalid protocol
transitions
- add end-to-end stdio coverage for delegates, cancellation, frame
limits, child loss, stale generations, replacement, and long-lived
sessions

## Why

This final stage exposes the process-owned client only after the wire
protocol, host-safe runtime, and standalone host are independently in
place. Transport failure is fail-stop: the client closes local state,
cancels callbacks, reaps the child, and lazily rebuilds a fresh host
generation rather than transactionally recovering the old connection.

## Stack

This is **4 of 4** in the process-owned code-mode session stack.

- Depends on #30111
- Full stack: #30108 → #30110 → #30111 → this PR

## Validation

- `just test -p codex-code-mode -p codex-code-mode-host` — 86 passed
- `just fix -p codex-code-mode`
- `just fix -p codex-code-mode-host`
- `just bazel-lock-update`
- `just bazel-lock-check`
- `bazel test //codex-rs/code-mode:code-mode-unit-tests
//codex-rs/code-mode-host:code-mode-host-unit-tests
//codex-rs/code-mode-host:code-mode-host-stdio-test
//codex-rs/code-mode-protocol:code-mode-protocol-unit-tests` — 4/4
passed
- `just fmt`

Channing Conger · 2026-06-25 23:46:17 -07:00

ab16046c88

Persist Cloudflare affinity cookies for MCP HTTP (#29516)

[Codex Thread
019ef1f9-36e2-7e91-9337-504f097b9dc1](https://codex-thread-link.openai.chatgpt-team.site/thread/019ef1f9-36e2-7e91-9337-504f097b9dc1)

## Why

Hosted plugin-service Streamable HTTP MCP traffic uses
`https://chatgpt.com/backend-api/ps/mcp` and depends on Cloudflare's
`__cflb` cookie for load-balancer affinity. The local and exec-server
`http/request` path built a fresh reqwest client for each request
without installing Codex's existing shared ChatGPT Cloudflare cookie
store, so affinity could be lost between calls.

This is an affinity-hardening change motivated by an incident
investigation. It does not establish the broader connector-cache
incident RCA or claim to fix that incident in full.

## What changed

- Install the existing process-local, strictly allowlisted ChatGPT
Cloudflare cookie store on the reqwest client used by
`ReqwestHttpClient`.
- Fresh clients now share allowed Cloudflare infrastructure cookies
within the process that originates the local or exec-server network
request.
- Keep the existing HTTPS ChatGPT-host and Cloudflare-cookie-name
restrictions. This does not introduce a general cookie jar or send
ChatGPT Cloudflare cookies to unrelated hosts.
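
The restrictions can be pictured as a single predicate applied on both store and send. This is a hedged sketch only: the function name, the exact-host match on `chatgpt.com`, and the one-entry cookie allowlist are assumptions drawn from the names mentioned in this description, not the real store's rules.

```rust
/// Assumption: the only allowed host is the one named in this description.
const ALLOWED_HOST: &str = "chatgpt.com";
/// Assumption: the allowlist holds only the cookie named in this description.
const ALLOWED_COOKIES: &[&str] = &["__cflb"];

/// True when a cookie may be stored from, or sent to, the given URL parts.
pub fn cookie_allowed(scheme: &str, host: &str, cookie_name: &str) -> bool {
    scheme.eq_ignore_ascii_case("https")
        && host.eq_ignore_ascii_case(ALLOWED_HOST)
        // Cookie names are compared exactly; anything else is dropped.
        && ALLOWED_COOKIES.iter().any(|name| *name == cookie_name)
}
```

Because the same check guards both directions, a session cookie is never stored and a stored `__cflb` value is never attached to a request for another host.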

## Test coverage

- `codex-client` unit coverage verifies that the existing strict store
accepts and returns `__cflb` for HTTPS ChatGPT URLs.
- The exec-server HTTPS integration test sends four independent
`http/request` calls through a local TLS-intercepting proxy and verifies
that:
  - the `__cflb=west` value from a `Set-Cookie` response is sent on the next plugin-service request;
  - a later `Set-Cookie: __cflb=central` replaces the stored value;
  - non-Cloudflare session cookies are discarded;
  - no stored ChatGPT Cloudflare cookie is sent to a non-ChatGPT host.
- `just test -p codex-client` — 38 passed.
- `just test -p codex-exec-server --test chatgpt_cloudflare_affinity` —
1 passed.
- `just bazel-lock-check` — passed.

## Non-goals

- No persistence of ChatGPT auth, account, session, residency, or
arbitrary cookies.
- No cookie persistence for third-party MCP servers.
- No special composition of caller-provided `Cookie` headers.
- No plugin-service, connector-cache, Habitat/habicache, routing,
redirect, or API-contract changes.
- No broader incident RCA conclusions.

stevenlee-oai · 2026-06-26 02:23:24 -04:00

b5866eebd6

Retry failed Codex Apps MCP startup (#29920)

## Problem

The built-in Codex Apps MCP client shares a future for the full startup
operation: connect, complete `initialize`, fetch the initial tools, and
return a usable client. Sharing deduplicates startup work, but it also
memoizes terminal errors.

After a transient connection, handshake, or initial `tools/list`
failure, later tool builds observe the same failed future. The thread
cannot reconnect after the backend recovers and continues serving its
startup-time cached tool snapshot, which may be empty or stale.

## Fix

When Apps MCP startup ends in an error, Codex starts bounded recovery
without putting startup latency on tool-router construction:

1. The current tool build immediately continues with the cached startup
snapshot.
2. After the initial failure is reported, Codex starts one fresh full
startup attempt in the background.
3. Concurrent tool builds share that in-flight attempt and also continue
with cached tools.
4. On success, the recovered client becomes active, refreshes the Apps
tools cache, emits a `Ready` startup status, and is reused by later
operations.
5. On failure, the cache remains unchanged and later tool builds may
start another background attempt after exponential cooldown: 1s, 2s, 4s,
8s, 16s, then 30s maximum.

Each recreated startup performs a fresh MCP `initialize` and uncached
`tools/list`. The MCP client retains its existing bounded retries for
retryable `initialize` and `tools/list` failures.

This avoids adding the Apps startup timeout to every request during a
sustained outage.
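
The cooldown schedule in step 5 can be written as a small function. This is an illustrative sketch; `cooldown_after` and its failure-count parameter are hypothetical, and only the 1s to 16s doubling and the 30s cap come from the description.

```rust
use std::time::Duration;

/// Upper bound on the wait between background startup attempts.
const MAX_COOLDOWN: Duration = Duration::from_secs(30);

/// Cooldown before the next attempt, given the number of consecutive
/// failed attempts so far (1 -> 1s, 2 -> 2s, ..., 5 -> 16s, then 30s).
pub fn cooldown_after(consecutive_failures: u32) -> Duration {
    // Clamp the exponent so the shift cannot overflow; 2^5 = 32s is
    // already above the cap. Zero failures is treated like one.
    let exponent = consecutive_failures.saturating_sub(1).min(5);
    Duration::from_secs(1u64 << exponent).min(MAX_COOLDOWN)
}
```

Clamping the exponent before shifting keeps the function total for any input, so a long outage simply stays at the 30s ceiling.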

## Scope

This is limited to the built-in Codex Apps MCP client:

- no reconnects for user-configured MCP servers;
- no cache deletion; and
- no proactive refresh for a healthy client with stale tools.

## Tests

Coverage verifies:

- tool builds return cached tools without waiting for a blocked
reconnect;
- concurrent tool builds start only one background reconnect;
- failed reconnects preserve cached tools and respect exponential
cooldown;
- a recovered client is retained and reused; and
- a long-lived thread exposes recovered app tools on a later follow-up.

Validation:

- `just test -p codex-mcp` — 95 passed
- `just test -p codex-core
later_follow_up_uses_background_recovered_apps_after_mid_thread_startup_failures
--no-capture` — passed
- `just fix -p codex-mcp`
- `just fmt`

kbazzi · 2026-06-25 21:31:12 -07:00

92d2e1df70

[codex] fix terminal rollout event durability (#30144)

Currently, session code does not flush the thread store after appending
the `TurnComplete` / `TurnAborted` events.

This isn't a problem in practice for local storage because `append_items`
itself effectively blocks, but any thread store that buffers in
`append_items` and commits only on flush effectively never persists these
events.

The fix adds explicit rollout flushes at the terminal emitters after
normal completion and interruption.
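
To make the failure mode concrete, here is a toy buffering store. It is a sketch under stated assumptions: `BufferedStore`, its fields, and `complete_turn` are hypothetical stand-ins, and only the `append_items`/flush split comes from the description.

```rust
/// Toy store that buffers appended items until `flush` is called.
#[derive(Default)]
pub struct BufferedStore {
    pending: Vec<String>,
    committed: Vec<String>,
}

impl BufferedStore {
    /// Buffer items without committing them.
    pub fn append_items(&mut self, items: &[&str]) {
        self.pending.extend(items.iter().map(|item| item.to_string()));
    }

    /// Commit everything buffered so far.
    pub fn flush(&mut self) {
        self.committed.append(&mut self.pending);
    }

    /// Items that would survive a restart.
    pub fn committed(&self) -> &[String] {
        &self.committed
    }
}

/// Terminal emitter with the explicit flush the fix describes.
pub fn complete_turn(store: &mut BufferedStore) {
    store.append_items(&["TurnComplete"]);
    store.flush();
}
```

Without the `flush` call in `complete_turn`, the terminal event would stay in `pending` indefinitely, which is the behavior the fix addresses for buffering stores.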

Added test cases that assert the number of flushes when completing or
aborting turns. These are admittedly a little brittle and I'm open to
better ideas on how to add automated testing.

Tom · 2026-06-25 21:01:11 -07:00

f5f812389e

Test selected capabilities across availability and resume (#30157)

## Why

This stack crosses World State, executor skills, selected plugin
metadata, MCP processes, connectors, dynamic environments, and resume.
This PR adds two end-to-end scenarios that validate those pieces
together.

Both tests enable `deferred_executor`, so they exercise the real
delayed-environment path.

## Scenario 1: availability across turns and resume

```text
1. Start a thread with one selected plugin root bound to E1.
2. E1 is unavailable.
   - executor skill is absent
   - selected MCP is absent
   - connector has no selected-plugin attribution
3. Start E1 and register the same stable environment ID.
4. Start a new turn.
   - the executor skill appears through World State
   - its body beats a colliding host skill
   - the selected MCP tool is advertised and executes inside E1
   - the connector is attributed to the selected plugin
5. Start another turn without changing E1.
   - the MCP PID stays the same, proving runtime reuse
6. Restart app-server and resume the thread.
   - durable selected-root intent is restored
   - skills, MCP, and connector attribution are restored
   - a new MCP PID proves ephemeral process state was rebuilt
```

## Scenario 2: availability changes inside one turn

```text
1. Start a turn while E1 is unavailable.
2. The first model sample sees no executor skill, MCP, or selected connector.
3. The turn pauses on request_user_input.
4. Start E1 and register it while that same turn is still active.
5. Continue the turn.
6. The very next model sample sees:
   - the executor skill catalog
   - the selected MCP tool
   - selected-plugin connector attribution
7. The model calls the MCP, and its output proves execution happened inside E1.
```

This second scenario specifically protects the aeon-style behavior:
capability state is captured again for every sampling step, not only at
the next user turn.

## Scope

These are integration tests only. They do not add a combinatorial matrix
for unsupported plugin-file mutation, environment generations, transport
disconnects, or delayed `required = true` executor MCPs.

jif · 2026-06-26 03:11:55 +01:00

25f50de6ed

[codex] allow CCA image generation and web search extensions (#29909)

## Summary

- allow the standalone image-generation and web-search extensions for
the actor-authorized provider shape used by CCA
- preserve builtin `image_generation` and `web_search` for older models
and existing flows
- keep ordinary non-OpenAI providers excluded from both extensions
- remove only the image extension local managed-AuthManager requirement
that CCA cannot satisfy
- share actor-authorization detection through `ModelProviderInfo`
- keep Core tests focused on routing behavior and cover header-shape
edge cases in `model-provider-info`
- add a Responses Lite regression that verifies both
`image_gen.imagegen` and `web.run`

## Why

CCA uses a provider named `local` with `requires_openai_auth: false` and
a non-empty `x-openai-actor-authorization` header. Core accepts that
provider shape, but both extension provider-name gates rejected it;
image generation additionally required a Codex-managed login.
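
The provider shape described above could be detected roughly as follows. This is a hedged sketch: the free function, the plain `HashMap` of headers, and the case-insensitive header match are assumptions for illustration; the actual detection lives on `ModelProviderInfo` and may differ.

```rust
use std::collections::HashMap;

/// Header name taken from the PR description.
pub const ACTOR_AUTH_HEADER: &str = "x-openai-actor-authorization";

/// True for a provider that does not require OpenAI auth but carries a
/// non-empty actor-authorization header.
pub fn is_actor_authorized(requires_openai_auth: bool, headers: &HashMap<String, String>) -> bool {
    !requires_openai_auth
        && headers.iter().any(|(name, value)| {
            // Assumption: header names match case-insensitively and a
            // whitespace-only value counts as empty.
            name.eq_ignore_ascii_case(ACTOR_AUTH_HEADER) && !value.trim().is_empty()
        })
}
```

Keeping the check in one place is what lets both extension gates agree on the same definition instead of each comparing provider names.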

The standalone paths must coexist with existing builtin tools. New
Responses Lite models can receive `image_gen.imagegen` and `web.run`,
while older models continue using builtin tools.

## Impact

This enables both standalone extensions for CCA once installed
downstream, without removing or changing builtin-tool compatibility for
older models.

## Validation

- `just test -p codex-core
responses_lite_exposes_standalone_tools_for_actor_authorized_provider`
- `just test -p codex-core
responses_lite_uses_standalone_web_search_and_image_generation`
- `just test -p codex-core
hosted_tools_follow_provider_auth_model_and_config_gates`
- `just test -p codex-image-generation-extension`
- `just test -p codex-web-search-extension`
- `just test -p codex-model-provider-info`
- `just fmt`
- `git diff --check`

Won Park · 2026-06-25 18:34:35 -07:00

0d4351c1b8

Expose MCP app identity in app context (#29934)

## Why

MCP tool-call events need to expose trusted app identity and action
metadata directly so v2 clients do not have to infer it from tool names
or resource URIs.

## What changed

- Add optional `appName`, `templateId`, and `actionName` fields to MCP
tool-call `appContext`.
- Populate `appName` and `templateId` from trusted Codex Apps metadata,
and derive `actionName` from the trusted app resource metadata.
- Preserve all three fields through core events, legacy protocol events,
persisted thread history, resume redaction, and app-server v2 responses.
- Document the public `appContext` fields in
`codex-rs/app-server/README.md`.
- Regenerate app-server JSON and TypeScript schemas and add coverage for
serialization, persistence, redaction, and metadata propagation.

## Validation

- `just test -p codex-app-server-protocol mcp_tool_call`
- `just test -p codex-core
mcp_tool_call_item_metadata_only_trusts_codex_apps_identity
mcp_tool_call_item_includes_app_identity`
- `just write-app-server-schema`

---------

Co-authored-by: Martin Au-Yeung <280153141+martinauyeung-oai@users.noreply.github.com>

Martin Au-Yeung · 2026-06-25 18:31:10 -07:00

ec300bc7bd

Keep MCP elicitation routable across runtime refreshes (#30127)

## Why

An MCP tool call can still be waiting for an elicitation response when
an environment update replaces the thread's MCP runtime.

Before this change:

```text
runtime A starts a tool call and asks the user
environment becomes ready, so runtime B is published
client answers the prompt through runtime B
runtime B cannot find runtime A's pending responder
```

The response is lost and the original tool call stays blocked.

## What changed

All MCP runtimes for one thread now share a small elicitation router:

```text
runtime A ---\
               shared router: response token -> exact pending responder
runtime B ---/
```

When Codex surfaces an MCP elicitation, it assigns a unique opaque
response token. The router records which pending request owns that
token. A replacement runtime reuses the same router, so the latest
runtime can deliver a response to a request started by the previous
runtime.

The Codex-owned token also prevents two runtime connections that reuse
the same MCP server request ID from receiving each other's responses.

This does not retain or search old MCP managers. Only the pending
responder map is shared.
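
A minimal sketch of such a shared router, using only the standard library. The type and method names are hypothetical, and the sequential `u64` token is a simplification: the description calls for a unique opaque token, which a real implementation would generate differently.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::{Arc, Mutex};

#[derive(Default)]
struct RouterState {
    next_token: u64,
    pending: HashMap<u64, Sender<String>>,
}

/// Cloning the router shares the same pending-responder map, so every
/// runtime of one thread can hold its own handle.
#[derive(Clone, Default)]
pub struct ElicitationRouter {
    inner: Arc<Mutex<RouterState>>,
}

impl ElicitationRouter {
    /// Register a pending request and return its token and receiver.
    pub fn register(&self) -> (u64, Receiver<String>) {
        let (tx, rx) = channel();
        let mut state = self.inner.lock().unwrap();
        let token = state.next_token;
        state.next_token += 1;
        state.pending.insert(token, tx);
        (token, rx)
    }

    /// Deliver a response to the exact request that owns `token`.
    /// Returns false when the token is unknown or already answered.
    pub fn respond(&self, token: u64, answer: &str) -> bool {
        let responder = self.inner.lock().unwrap().pending.remove(&token);
        match responder {
            Some(tx) => tx.send(answer.to_string()).is_ok(),
            None => false,
        }
    }
}
```

Because tokens are allocated by the router rather than by the MCP server, two runtimes that reuse the same server request ID still get distinct tokens and cannot receive each other's responses.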

## Covered scenario

The integration test exercises the complete failure mode:

1. A thread starts while its selected environment is still unavailable.
2. A configured MCP server starts a tool call and asks the client for
input.
3. The environment becomes ready, causing Codex to publish a replacement
MCP runtime.
4. The client answers the original prompt after the replacement.
5. The original tool call receives that answer and completes.

A focused routing test also creates two runtimes with the same server
request ID and verifies that each response reaches the exact request
that emitted its token.

## Scope

This PR changes only elicitation response routing across MCP runtime
replacement. It does not change when runtimes are rebuilt, which
environments contribute MCP configuration, or how environment
availability is detected.

jif · 2026-06-26 01:28:14 +00:00

fb8598df3f

Reinject missing World State fragments on resume (#30152)

## Why

World State restores its structured snapshot on resume so unchanged
sections do not have to be rendered again. That is safe only when the
model-visible fragment represented by the snapshot is still present in
retained history.

For selected executor skills, the failing selected-capability scenario
exposed this state:

```text
persisted World State: selected skill catalog is known
retained model history: selected skill catalog message is missing
next diff: unchanged, so emit nothing
```

The model resumes without being told about the selected skill catalog.

## What changed

World State contributions may now optionally describe the concrete
model-visible fragment that must remain in retained history.

When a persisted snapshot is present:

```text
matching retained fragment exists -> trust snapshot, emit nothing
matching retained fragment missing -> treat section as absent, render current state once
```

The skills extension uses this for non-empty selected-environment
catalogs by matching its exact rendered catalog body. Empty or hidden
catalogs do not require a fragment.
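
The resume rule can be sketched as a small decision function. The names `ResumeAction` and `on_resume` are hypothetical, and treating retained history as a list of message bodies compared by exact equality is an assumption made for the example.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum ResumeAction {
    /// Snapshot is trusted; nothing is emitted.
    TrustSnapshot,
    /// Section is treated as absent; current state is rendered once.
    RenderOnce,
}

/// Decide what to do for one persisted section on resume.
/// `required_fragment` is `None` for sections that need no fragment,
/// such as empty or hidden catalogs.
pub fn on_resume(required_fragment: Option<&str>, retained_history: &[&str]) -> ResumeAction {
    match required_fragment {
        None => ResumeAction::TrustSnapshot,
        // Exact-body match against retained model-visible messages.
        Some(fragment) if retained_history.iter().any(|msg| *msg == fragment) => {
            ResumeAction::TrustSnapshot
        }
        Some(_) => ResumeAction::RenderOnce,
    }
}
```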

## Scope

This does not clear or rebuild the whole World State baseline. It does
not change skill discovery, cache invalidation, environment
availability, or MCP runtime behavior. It only keeps a persisted section
snapshot and its retained model context consistent across resume/history
reconstruction.

## Coverage

A focused World State regression test verifies both sides:

- a missing retained fragment is rendered again
- a matching retained fragment avoids duplicate injection

jif · 2026-06-26 02:18:00 +01:00

723b23efd0

[codex] Attribute app-server analytics by thread originator (#29935)

## Why

Desktop Work threads and regular Codex threads can share the same
app-server connection. App-server analytics currently copy
`product_client_id` from connection metadata for every thread-scoped
event, so Work thread activity is attributed to the Desktop connection
instead of the thread's resolved originator. This prevents analytics
from distinguishing the two products on a shared connection.

## What changed

- Publish the resolved originator after a thread is materialized,
covering new, resumed, forked, and subagent threads.
- Store that originator in the analytics reducer's existing per-thread
state.
- Override only `app_server_client.product_client_id` for thread, turn,
tool, review, goal, guardian, and compaction events while preserving the
connection's client name, version, and transport metadata.
- Fall back to the connection-wide product client ID when a thread has
no originator override.
- Preserve persisted originators in thread initialization analytics for
resume and fork flows.
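
The override-with-fallback rule can be sketched as below. `ClientMetadata`, its fields, and `for_thread_event` are hypothetical simplifications of the connection metadata; the example values are placeholders, not real product IDs.

```rust
/// Simplified stand-in for connection-level client metadata.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ClientMetadata {
    pub product_client_id: String,
    pub client_name: String,
    pub client_version: String,
}

/// Metadata to attach to a thread-scoped event: only the product client
/// ID is overridden, and only when the thread has a resolved originator.
pub fn for_thread_event(
    connection: &ClientMetadata,
    thread_originator: Option<&str>,
) -> ClientMetadata {
    let mut metadata = connection.clone();
    if let Some(originator) = thread_originator {
        metadata.product_client_id = originator.to_string();
    }
    metadata
}
```

Cloning the connection metadata first is what keeps the client name and version untouched while the product ID varies per thread.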

## Validation

- `just test -p codex-analytics
thread_originator_overrides_shared_connection_across_thread_events
subagent_events_keep_thread_originator_with_explicit_turn_connection`
- `just test -p codex-app-server
turn_start_tracks_thread_originator_in_analytics
thread_start_tracks_thread_initialized_analytics
thread_fork_tracks_thread_initialized_analytics
thread_resume_tracks_thread_initialized_analytics`
- `just test -p codex-core thread_manager`

alexsong-oai · 2026-06-25 18:15:48 -07:00

841f30598c

[codex] implement standalone code-mode process host (#30111)

## Summary

- implement the standalone `codex-code-mode-host` stdio service
- route sessions, cells, delegate requests, responses, and cancellation
through a bounded host peer
- supervise request, writer, cell-forwarding, actor, and V8 failure
boundaries
- bound request/session tombstones and fail-stop the connection on
invalid protocol state
- add host-only duplex protocol tests and local Cargo/Bazel run recipes

## Why

This stage makes the host process independently runnable and reviewable
before exposing any remote client in Codex. Transport or runtime failure
closes the connection and relies on process replacement rather than
transactional recovery.

## Stack

This is **3 of 4** in the process-owned code-mode session stack.

- Depends on #30110
- The final client PR targets this branch

## Validation

- `just test -p codex-code-mode-host` — 7 host-only tests passed
- `just fix -p codex-code-mode-host`
- `just bazel-lock-update`
- `just bazel-lock-check`
- `just fmt`

Channing Conger · 2026-06-25 18:00:39 -07:00

da78d5fdc5

Reuse walk inventory for environment skill metadata (#30145)

## Why

Environment skill discovery already asks the executor to run one
`fs/walk`. That response contains every regular file path found under
the selected root, including any `agents/openai.yaml` files.

Today Core keeps the discovered `SKILL.md` paths but discards the rest
of that file inventory. It then sends one `fs/getMetadata` request per
skill just to ask whether `agents/openai.yaml` exists. A root with 66
skills and no metadata therefore pays for 66 unnecessary network round
trips.

## What changes

- Keep the `fs/walk` file and directory inventory for the duration of
the scan.
- Associate each discovered `SKILL.md` with metadata that is known
present, known absent, or still requires a fallback probe.
- Read a known `agents/openai.yaml` directly instead of statting it
first.
- Skip the metadata request entirely when a complete walk shows that the
skill has no `agents` directory.
- Read a known `SKILL.md` and `agents/openai.yaml` concurrently.
- Keep parsing and validation in `core-skills`.

The inventory is scan-local. This does not add another cache or change
cache lifetime.

## Network impact

For a complete scan of 66 valid skills with no `agents/openai.yaml`, and
one root `.codex-plugin/plugin.json`:

| Operation | Current | After this PR |
| --- | ---: | ---: |
| `fs/walk` | 1 | 1 |
| Read `SKILL.md` | 66 | 66 |
| Stat `agents/openai.yaml` | 66 | 0 |
| Read `agents/openai.yaml` | 0 | 0 |
| Stat plugin manifest | 1 | 1 |
| Read plugin manifest | 1 | 1 |
| **Total executor RPCs** | **135** | **69** |

This removes exactly 66 request/response exchanges from the common cold
scan. Warm scans remain at zero discovery RPCs because the thread-level
executor catalog cache is unchanged.

When metadata exists, each file still requires one read. This PR removes
only the preceding existence check; it does not batch file contents into
a new RPC.

## Correctness fallbacks

Absence is trusted only when the walk is complete and the metadata
directory was not present. Core keeps the existing `getMetadata`
fallback when:

- the walk was truncated;
- the walk reported an error; or
- an `agents` directory was observed but `openai.yaml` was not, which
preserves support for file symlinks and traversal boundaries.
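
These fallback rules amount to a three-way classification per skill. The sketch below is illustrative only: `WalkInventory`, `MetadataState`, `classify`, and the string-path representation are hypothetical, and folding "truncated" and "reported an error" into one `complete` flag is a simplification.

```rust
use std::collections::BTreeSet;

#[derive(Debug, PartialEq, Eq)]
pub enum MetadataState {
    /// `agents/openai.yaml` was seen in the walk; read it directly.
    Present,
    /// A complete walk showed no `agents` directory; skip the request.
    Absent,
    /// Not enough information; fall back to a metadata probe.
    NeedsProbe,
}

/// Scan-local view of one `fs/walk` response.
pub struct WalkInventory {
    pub files: BTreeSet<String>,
    pub dirs: BTreeSet<String>,
    /// False when the walk was truncated or reported an error.
    pub complete: bool,
}

/// Classify the metadata for the skill rooted at `skill_dir`.
pub fn classify(inventory: &WalkInventory, skill_dir: &str) -> MetadataState {
    let agents_dir = format!("{}/agents", skill_dir);
    let metadata_file = format!("{}/openai.yaml", agents_dir);
    if inventory.files.contains(&metadata_file) {
        return MetadataState::Present;
    }
    // Absence is trusted only for a complete walk with no `agents` directory.
    if !inventory.complete || inventory.dirs.contains(&agents_dir) {
        return MetadataState::NeedsProbe;
    }
    MetadataState::Absent
}
```

In the 66-skill example from the table, every skill classifies as `Absent`, which is where the 66 saved stat requests come from.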

## Deliberate scope

This PR changes only the environment skill loader and its existing
filesystem-call regression coverage. It does not:

- change `fs/walk` or any exec-server protocol;
- add `readFiles` or a skills-list endpoint;
- change thread caching;
- change local skill discovery;
- change exec-server request concurrency; or
- optimize plugin-manifest lookup.

The plugin-manifest stat is intentionally left in place, which is why
this PR reaches 69 calls rather than the broader 68-call estimate. That
lookup has separate alternate-path, ancestor, and symlink semantics and
should not be mixed into this change.

jif · 2026-06-26 01:47:00 +01:00

8ebf71ec25

Project selected plugin runtime by environment availability (#30093)

## Why

Selected plugin metadata is stable, but MCP processes are live runtime
state. They need different lifetimes:

- the MCP extension caches manifest, MCP, and connector declarations for
each stable selected root;
- each model step projects that cached metadata through the roots that
resolved as ready for that exact step;
- the MCP manager is rebuilt only when that availability projection
changes.

This matches executor skills: both features consume the same resolved
step roots instead of inferring readiness from the turn's selected
environments.

## Behavior

```text
E1 not ready for this step
  -> no E1 MCP servers or connectors
  -> cached plugin metadata stays in ext/mcp

E1 becomes ready
  -> reuse cached metadata
  -> publish one MCP runtime containing E1 capabilities

same ready roots on the next step
  -> reuse the exact runtime; no rediscovery and no MCP restart

resume
  -> create new extension thread state and a new MCP runtime
```

All model-facing consumers use the same step snapshot:

```text
resolved selected roots
        |
        v
extension MCP/connector projection
        |
        v
{ MCP config, connector snapshot, MCP manager }
        |
        +-> advertise model tools
        +-> build app/connector tools
        +-> execute MCP calls
```

## Cache contract

The existing MCP extension owns a cache keyed by the full
`SelectedCapabilityRoot`:

```rust
let state = thread_store.get_or_init(SelectedExecutorPluginMcpState::default);
```

The cache lives with extension thread state. Environment availability
filters projection but does not invalidate metadata. Resume creates new
thread state. There is no file watcher or executor generation because
contents behind a stable environment/root are assumed stable.
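
The cache contract above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the extension's actual code: `RootKey`, `PluginMcpState`, `project`, and the counters are hypothetical names, and "discovery" is reduced to synthesizing one server name per root.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical stand-in for `SelectedCapabilityRoot`: (environment ID, root path).
type RootKey = (String, String);

/// Illustrative per-thread state: cached metadata plus the last published projection.
#[derive(Default)]
struct PluginMcpState {
    /// Metadata discovered once per stable root (here, just server names).
    metadata: HashMap<RootKey, Vec<String>>,
    /// Ready-root projection that the current runtime was built from.
    published: Option<BTreeSet<RootKey>>,
    /// Counters that make discovery and rebuild behavior observable.
    discoveries: u32,
    rebuilds: u32,
}

impl PluginMcpState {
    /// Projects cached metadata through the roots that are ready for this step.
    /// The runtime is "rebuilt" only when the ready-root projection changes.
    fn project(&mut self, selected: &[RootKey], ready: &BTreeSet<RootKey>) -> Vec<String> {
        let projection: BTreeSet<RootKey> =
            selected.iter().filter(|root| ready.contains(*root)).cloned().collect();
        for root in &projection {
            // Availability filters the projection but never evicts metadata.
            if !self.metadata.contains_key(root) {
                self.discoveries += 1;
                self.metadata
                    .insert(root.clone(), vec![format!("{}:{}", root.0, root.1)]);
            }
        }
        if self.published.as_ref() != Some(&projection) {
            self.rebuilds += 1;
            self.published = Some(projection.clone());
        }
        projection.iter().flat_map(|root| self.metadata[root].clone()).collect()
    }
}
```

Dropping the `PluginMcpState` value models resume: a fresh state starts with an empty cache and no published runtime.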

## What changes

- Keeps executor plugin discovery and cached metadata in `ext/mcp`.
- Caches MCP and connector declarations together per selected root.
- Uses the step's already-resolved capability roots, including lazy
environments that are not turn environments.
- Reuses the current MCP runtime when the ready-root projection is
unchanged.
- Uses the same step MCP manager and connector snapshot for
model-visible tools and execution.
- Resolves direct thread-scoped MCP requests from the current
selected-root projection.

## Deliberately out of scope

- `app/list` remains based on the latest global host-plugin state; this
PR does not make its response or notifications thread-specific.
- `required = true` startup semantics do not apply to delayed executor
MCP activation.
- No filesystem/content invalidation.
- No transport-disconnect watcher.
- No executor generations or environment replacement semantics.
- No client sharing across complete manager replacements.

## Stack

1. Extension-owned World State sections.
2. Project executor skills through World State.
3. Pin one MCP runtime to each model step.
4. **This PR:** project selected MCP and connector state from
extension-owned metadata.
5. Integration coverage for selected capability availability and resume.

## Verification

- `selected_plugin_servers_use_managed_requirements_for_the_selected_root_id`
- The stacked integration PR covers unavailable-to-ready activation,
unchanged-runtime reuse, skills, MCP tools, connector attribution, and
cold resume.

jif · 2026-06-26 01:36:44 +01:00

3095ea9c3d

ci: narrow Windows test skips (#30134 )

## Why

The Windows cross-build skip used the broad `powershell` substring,
which hid unrelated Windows tests. Narrowing it exposed the same ConPTY
Ctrl-C timeout that is breaking `main`; that test is not reliable in
either cross-built or native Windows Bazel CI yet.

## What changed

- scope the cross-build PowerShell carve-out to the dedicated
parser-process test module
- exclude the exact ConPTY Ctrl-C test from Bazel CI while leaving local
Windows runs enabled
- repeat the exact exclusion in the cross-build config because it
replaces the base skip list

## Manual validation

- `just test-github-scripts`
- queried the PTY test target under both `ci-windows` and
`ci-windows-cross`
- verified the matcher excludes parser-process and ConPTY tests without
excluding unrelated PowerShell tests
- [Windows shard
4/4](https://github.com/openai/codex/actions/runs/28204844286/job/83553063859)
reproduced the `main` ConPTY timeout before the exact CI-only exclusion
was applied

Adam Perry @ OpenAI · 2026-06-25 17:01:43 -07:00

5044062704

Pin MCP runtimes to model steps (#30101 )

## Why

An MCP refresh can replace the session's current manager while a model
step is still running. The step must execute calls through the same
manager whose tools it advertised.

## Boundary

```text
current session MCP runtime
          |
          | capture once for this model step
          v
StepContext.mcp
  - exact MCP config
  - exact connection manager
  - exact runtime environment context
```

```rust
pub struct McpRuntimeSnapshot {
    config: Arc<McpConfig>,
    manager: Arc<McpConnectionManager>,
    runtime_context: McpRuntimeContext,
}
```

## Example

```text
step A captures runtime A and advertises A's tools
refresh publishes runtime B
step A tool call -> runtime A
next step        -> runtime B
```

Capturing the snapshot is only an `Arc` clone. It does not restart MCPs
or make an RPC.
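
The pinning behavior can be sketched with only the standard library. `Runtime`, `Session`, `capture`, and `publish` are hypothetical names for illustration; the real snapshot carries the config, connection manager, and runtime context shown above.

```rust
use std::sync::{Arc, RwLock};

/// Illustrative runtime; stands in for the real MCP runtime snapshot.
struct Runtime {
    name: &'static str,
}

/// Session-level holder whose current runtime can be replaced atomically.
struct Session {
    current: RwLock<Arc<Runtime>>,
}

impl Session {
    /// Capturing a snapshot only clones the `Arc`; nothing is restarted.
    fn capture(&self) -> Arc<Runtime> {
        Arc::clone(&self.current.read().unwrap())
    }

    /// Publishing swaps the pointer; existing snapshots keep the old runtime alive.
    fn publish(&self, next: Runtime) {
        *self.current.write().unwrap() = Arc::new(next);
    }
}
```

A step that captured runtime A keeps using it after a refresh publishes runtime B, and runtime A is dropped once the last holder releases its `Arc`.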

## What changes

- Captures one MCP runtime in `StepContext`.
- Uses it for tool planning, tool calls, resources, approvals, connector
attribution, and elicitation.
- Publishes replacement runtimes atomically.
- Lets an old runtime live only while an in-flight step or request still
holds its `Arc`.

Most of this diff is mechanical routing from the session-global manager
to `step_context.mcp`; it does not introduce selected-plugin discovery
yet.

## What does not change

- No plugin or extension migration.
- No new MCP cache policy.
- No environment file watching.
- No client sharing between separate managers.

## Stack

1. Extension-owned World State sections.
2. Project executor skills through World State.
3. **This PR:** pin one MCP runtime to each model step.
4. Project selected MCP/app/connector metadata by environment
availability.
5. One end-to-end integration scenario.

jif · 2026-06-26 00:53:07 +01:00

ee9e0f6387

[codex] Propagate traces through exec-server HTTP (#30117 )

Fixes distributed trace continuity across exec-server JSON-RPC HTTP
egress by adding an executor client span and injecting its W3C context
through a reusable `codex-otel` helper.

This preserves the caller trace across core/tool → executor →
provider/MCP instead of dropping parentage at raw reqwest.

This change does not cover the WebSocket path, which is still needed for
full end-to-end trace continuity; it covers only the basic HTTP path.
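
For reference, injecting W3C context over HTTP amounts to sending a `traceparent` header. The sketch below formats one by hand from the W3C Trace Context layout (version `00`, 16-byte trace ID, 8-byte parent span ID, 1-byte flags). `build_traceparent` is a hypothetical function, not the `codex-otel` helper, which this log does not show.

```rust
/// Formats a W3C `traceparent` header value, or returns `None` for the
/// all-zero trace or span IDs that the specification treats as invalid.
fn build_traceparent(trace_id: u128, span_id: u64, sampled: bool) -> Option<String> {
    if trace_id == 0 || span_id == 0 {
        return None;
    }
    // Bit 0 of the flags byte is the "sampled" flag.
    let flags: u8 = if sampled { 0x01 } else { 0x00 };
    Some(format!("00-{:032x}-{:016x}-{:02x}", trace_id, span_id, flags))
}
```

The executor client span's ID would be the `span_id` here, so the downstream provider or MCP server records it as the parent.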

Tom · 2026-06-25 23:22:22 +00:00

8ce931ab76

Project executor skills through World State (#30088 )

## Why

A selected executor environment can be unavailable in one model step and
ready in the next. The model should see its skills only while that
environment is ready, without rescanning stable files on every sample.

The product assumption is simple:

- an environment ID names one stable logical environment;
- the selected root contents do not change during the thread.

## Behavior

```text
E1 unavailable -> do not show E1 skills
E1 ready       -> discover once, cache, show through World State
E1 unavailable -> hide skills, keep cache
E1 ready again -> reuse cache, show skills again
resume         -> create a new thread cache and discover again
```

The cache key is the full `SelectedCapabilityRoot`. Availability does
not invalidate it; dropping the extension's thread state does.

The step supplies the ready selected roots directly. They do not have to
be turn environments:

```text
turn environment: laptop
selected root:    worker:/plugins/lint-fix

worker ready -> lint-fix skills are visible
```

## What changes

- Keeps executor skill catalogs in the existing skills extension.
- Passes the roots resolved as ready for the step into World State
contributors.
- Loads each ready selected root at most once per thread.
- Contributes the executor catalog as the `skills` World State section.
- Uses the exact step catalog for explicit skill selection and body
reads.
- Leaves host and orchestrator skill behavior where it already lives.

Taking a step snapshot itself does not add an RPC. Executor filesystem
calls happen only on the first discovery of a stable root for that
thread.

## What does not change

- No filesystem watcher or content-based invalidation.
- No retry/generation framework.
- No skill runtime migration into core.
- No general rewrite of the skills extension.

## Stack

1. Extension-owned World State sections.
2. **This PR:** project cached executor skills through World State.
3. Pin one MCP runtime to each model step.
4. Project selected MCP/app/connector metadata by environment
availability.
5. One end-to-end integration scenario.

jif · 2026-06-26 00:13:43 +01:00

5eebeb8169

[codex] add code-mode host failure supervision hooks (#30110 )

## Why

A process host should be discarded and rebuilt after critical actor or
V8 failure, while the existing in-process production path must keep its
current cell-error semantics. This change establishes that failure
boundary without adding the host process or remote client.

## What changed

- add optional task-failure supervision to the transport-neutral
code-mode session runtime
- report Tokio cell-actor failures and V8 runtime-thread panics to a
host-provided fail-stop handler
- preserve the existing handler-less in-process behavior
- make host-owned cell ID allocation fail before numeric wraparound
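
Failing before wraparound can be illustrated with `checked_add`. This is a sketch only: `CellIdAllocator`, `CellIdsExhausted`, and the `u64` ID width are assumptions, since the real cell ID type is not shown here.

```rust
/// Hypothetical allocator for host-owned cell IDs.
struct CellIdAllocator {
    next: u64,
}

/// Error returned once no further IDs can be issued safely.
#[derive(Debug, PartialEq)]
struct CellIdsExhausted;

impl CellIdAllocator {
    /// Returns the next ID, or an error once incrementing would wrap around.
    fn allocate(&mut self) -> Result<u64, CellIdsExhausted> {
        let id = self.next;
        // `checked_add` yields `None` instead of silently wrapping to zero.
        self.next = id.checked_add(1).ok_or(CellIdsExhausted)?;
        Ok(id)
    }
}
```

Once the allocator is exhausted it stays exhausted, so a previously issued ID can never be handed out twice.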

## Follow-up

The V8 panic signal surfaced here should also be consumed by the
`InProcessCodeModeSession` manager in a future change so it can fail the
affected cell. This PR intentionally leaves the handler-less in-process
behavior unchanged while putting the required panic tracking in place.

## Stack

This is **2 of 4** in the process-owned code-mode session stack.

- #30108 is merged into `main`
- The next PR targets this branch

## Validation

- `just test -p codex-code-mode` — 53 passed
- `just argument-comment-lint -p codex-code-mode`
- `just fix -p codex-code-mode`

Channing Conger · 2026-06-25 15:33:58 -07:00

6c21297bba

Recognize Work web and mobile thread originators (#29988 )

## Summary

- recognize `codex_work_web` and `codex_work_mobile` as supported
`thread/start.serviceName` values
- use the recognized value as the thread-scoped originator, with the
same persistence and request propagation added for `codex_work_desktop`
- cover precedence over persisted and inherited originators

This is the Codex consumer for the service names introduced by
[openai/openai#1073178](https://github.com/openai/openai/pull/1073178).

## Rollout / Compatibility

The producer is ChatGPT's app-server integration in
openai/openai#1073178. This PR is the Codex app-server consumer that
converts those service names into the outgoing per-thread `originator`.

Until this change is deployed, the new service names are ignored and
Codex continues using its fallback originator. Deploy this mapper and
the matching codex-backend compatibility change in
[openai/openai#1073594](https://github.com/openai/openai/pull/1073594)
while the existing Flora egress overwrite remains in place. Remove that
overwrite in
[openai/openai#1073197](https://github.com/openai/openai/pull/1073197)
only after both consumers are deployed.

## Validation

- `just test -p codex-core
effective_originator_prefers_thread_scoped_sources_before_env_originator`
- `just fix -p codex-core`
- `just fmt`

chiam-oai · 2026-06-25 15:30:26 -07:00

e2746fd7e9

[codex] Surface MCP reauthentication-required startup failures (#29877 )

## Summary

- distinguish expired, non-refreshable stored MCP OAuth credentials from
first-time missing credentials
- carry a typed `failureReason: "reauthenticationRequired"` on the
existing `mcpServer/startupStatus/updated` notification only when user
action is required
- keep the public MCP auth-status API unchanged and regenerate the
app-server protocol schemas and documentation

## Why

An MCP server with an expired access token and no usable refresh token
currently fails startup without giving clients a reliable, typed
recovery signal.

The existing startup-status notification is the natural place to carry
this state. Its nullable `failureReason` keeps the recovery reason
attached to the failed startup transition without adding a one-off
notification. Internally, Codex distinguishes first-time login from
reauthentication and emits the reason only when the startup error itself
requires authentication.

## User impact

App clients can prompt an existing user to reconnect an MCP server when
automatic recovery is impossible by handling a failed
`mcpServer/startupStatus/updated` notification whose `failureReason` is
`reauthenticationRequired`. Starting, ready, cancelled, unrelated
failures, and first-time setup carry no reauthentication reason.
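
A client-side check for this signal might look like the following sketch. The enum and function names are hypothetical mirrors of the notification fields; only the `failureReason` value `reauthenticationRequired` comes from this PR.

```rust
/// Illustrative mirror of the startup states carried by the notification.
#[derive(Clone, Copy, PartialEq)]
enum StartupStatus {
    Starting,
    Ready,
    Failed,
    Cancelled,
}

/// Illustrative mirror of the nullable `failureReason` field.
#[derive(Clone, Copy, PartialEq)]
enum FailureReason {
    ReauthenticationRequired,
}

/// Prompts for reconnect only on a failed startup that carries the typed reason.
fn should_prompt_reconnect(status: StartupStatus, reason: Option<FailureReason>) -> bool {
    status == StartupStatus::Failed && reason == Some(FailureReason::ReauthenticationRequired)
}
```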

## Companion app PR

- openai/openai#1069582

## Validation

- `just test -p codex-app-server-protocol` — 248 passed; schema fixture
tests passed
- `cargo check -p codex-app-server -p codex-tui`
- `just test -p codex-rmcp-client -p codex-mcp` — 184 passed, 2 skipped
- `just test -p codex-protocol -p codex-app-server-protocol -p
codex-mcp` — 579 passed
- `just write-app-server-schema`
- `just fmt`

felixxia-oai · 2026-06-25 21:50:36 +00:00

a6d20ed297

fix(app-server): suppress TUI rollback warning (#30124 )

## Why

The TUI uses `thread/rollback` internally for user-facing flows such as
prompt cancellation/backtracking. After `thread/rollback` was marked
deprecated, those internal calls started surfacing `deprecationNotice`
messages in the TUI, even though the user did not explicitly call the
deprecated app-server API.

The endpoint should remain deprecated for external app-server clients,
but the built-in `codex-tui` client should not show this
implementation-detail warning during normal interaction.

## What changed

- Pass the initialized app-server client name into the `thread/rollback`
request processor.
- Suppress the `thread/rollback` deprecation notice only for
`codex-tui`.
- Preserve the existing `deprecationNotice` behavior for non-TUI
clients.
- Add regression coverage for the `codex-tui` suppression path.
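
The suppression rule reduces to a check on the initialized client name. The function below is a hypothetical sketch, not the request processor's real signature; the notice text is the one quoted in the test steps of this PR.

```rust
/// Returns the deprecation notice for `thread/rollback`, or `None` when the
/// caller is the built-in TUI client.
fn rollback_deprecation_notice(client_name: Option<&str>) -> Option<&'static str> {
    match client_name {
        Some("codex-tui") => None,
        // Unknown or external clients keep the existing notice.
        _ => Some("thread/rollback is deprecated and will be removed soon"),
    }
}
```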

## How to Test

1. Start Codex TUI from this branch.
2. Type text into the composer and press `Esc` to cancel/backtrack.
3. Confirm the TUI restores/cancels the prompt without showing
`thread/rollback is deprecated and will be removed soon`.
4. Also verify an external app-server client that calls
`thread/rollback` still receives `deprecationNotice`.

Targeted tests:

- `just test -p codex-app-server thread_rollback`
- `just argument-comment-lint`

Felipe Coury · 2026-06-25 18:44:35 -03:00

b80fbb70cd

Let extensions contribute World State sections (#30100 )

## Why

#29856 already owns the durable thread intent and exact environment
binding. This PR adds only the small missing extension boundary: an
extension can contribute one named World State section, while core still
owns persistence, diffing, and model-visible fragment types.

This lets skills stay in the skills extension instead of moving their
runtime into core.

## Shape

```text
extension-owned state
        |
        | contribute section id + JSON snapshot + renderer
        v
core World State
        |
        | compare with the previous snapshot
        v
no message, or one incremental model-visible update
```

The extension API is deliberately small:

```rust
fn contribute_world_state(...) -> Vec<WorldStateSectionContribution>
```

Core adapts the rendered result to `ContextualUserFragment`, records the
snapshot, and keeps the existing compaction/resume behavior.

## What changes

- Adds extension-owned World State section contributions.
- Calls those contributors from the existing per-step World State
builder.
- Restores durable selected capability roots into extension thread state
on resume.
- Keeps the actual model-context fragment and rollout machinery in core.

## What does not change

- No skill or MCP implementation moves out of its extension.
- No new file watcher, generation, or RPC.
- No generic migration of existing World State sections.
- No change to the stable environment-ID assumption from #29856.

## Example

```text
step 1 snapshot: skills = []
step 2 snapshot: skills = [executor-demo:deploy]

core asks the skills extension to render only that change.
```
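
The snapshot comparison can be sketched as follows. This is an illustration under assumptions: `WorldState` and `apply` are hypothetical, snapshots are plain string lists rather than JSON, and "rendering" is reduced to returning the newly added entries.

```rust
use std::collections::BTreeMap;

/// Keeps the last snapshot per section and reports only sections that changed.
#[derive(Default)]
struct WorldState {
    previous: BTreeMap<String, Vec<String>>,
}

impl WorldState {
    /// Records `snapshot` for `section`. Returns `None` when nothing changed,
    /// otherwise the entries that were not in the previous snapshot.
    fn apply(&mut self, section: &str, snapshot: Vec<String>) -> Option<Vec<String>> {
        let old = self.previous.get(section).cloned().unwrap_or_default();
        if old == snapshot {
            return None;
        }
        let added: Vec<String> = snapshot
            .iter()
            .filter(|entry| !old.contains(*entry))
            .cloned()
            .collect();
        self.previous.insert(section.to_string(), snapshot);
        Some(added)
    }
}
```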

## Stack

1. **This PR:** let extensions contribute World State sections.
2. Project executor skills through the skills extension.
3. Pin one MCP runtime to each model step.
4. Project selected MCP/app/connector metadata by environment
availability.
5. One end-to-end integration scenario.

jif · 2026-06-25 22:23:51 +01:00

c9e6d9783d

[codex] Add managed MCP server matchers (#29648 )

## Summary

This PR extends the existing managed `mcp_servers` identity requirement
so that one name-qualified rule can use either:

- the released exact command or URL identity;
- an exact stdio executable with an exact-length, ordered argument
matcher list; or
- a direct MCP URL matcher.

Matcher-based rules stay under the released `identity` key and use the
same `McpServerRequirement` abstraction and `mcp_servers.<server_name>`
namespace.

## Behavior

Policy activation and name qualification are unchanged:

- If `mcp_servers` is absent, ordinary configured MCP servers remain
unrestricted.
- If `mcp_servers` is present, a server needs a matching same-name
requirement.
- `mcp_servers = {}` continues to deny every configured MCP server.
- Existing exact identity requirements keep their released semantics.

Plugin-bundled MCP servers use the same requirement shapes under
`plugins.<plugin_name>.mcp_servers.<server_name>`. Top-level non-empty
rules continue to govern only ordinary configured servers; plugin rules
remain explicitly plugin-scoped. The existing globally empty
`mcp_servers = {}` plugin kill switch is preserved.

Requirements layers continue to use the existing regular TOML merge
behavior. Atomic replacement of named MCP requirements is intentionally
out of scope here and is tracked independently in #30118.

## Requirement contract

The released exact identity contract remains valid:

```toml
[mcp_servers.docs.identity]
command = "codex-mcp"

[mcp_servers.remote.identity]
url = "https://example.com/mcp"
```

Command identities continue to check only `command`; they do not inspect
arguments, `cwd`, `env`, or `env_vars`.

A command matcher uses an exact executable plus an exact-length, ordered
argument list. Each argument position supports `exact`, `prefix`, or
full-value `regex` matching:

```toml
[mcp_servers.internal_mcp_proxy.identity]
command = { executable = "company-cli", args = [
  { match = "exact", value = "mcp" },
  { match = "exact", value = "proxy" },
  { match = "exact", value = "--server" },
  { match = "regex", expression = '^https://[A-Za-z0-9-]+\.mcp\.internal\.example\.com(?::443)?(?:/.*)?$' },
] }
```

Direct streamable HTTP MCP definitions can use the same value matcher
types through `identity.url`:

```toml
[mcp_servers.internal_http.identity]
url = { match = "regex", expression = '^https://[A-Za-z0-9-]+\.mcp\.internal\.example\.com(?:/.*)?$' }
```

Plugin-bundled MCP matchers use the same contract inside the
plugin-qualified allowlist:

```toml
[plugins."sample@test".mcp_servers.internal_mcp_proxy.identity]
command = { executable = "company-cli", args = [
  { match = "exact", value = "mcp" },
  { match = "exact", value = "proxy" },
] }
```

Regexes are validated while managed requirements are loaded, and regex
matching must cover the complete value. Command matchers constrain only
the executable and arguments.
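
The exact-length, ordered matching rule can be sketched as below. `ArgMatcher` and `command_matches` are hypothetical names, and the `regex` matcher kind is deliberately left out so the example needs only the standard library.

```rust
/// Illustrative matcher kinds for one argument position (regex omitted).
enum ArgMatcher {
    Exact(&'static str),
    Prefix(&'static str),
}

impl ArgMatcher {
    /// Tests a single argument value against this matcher.
    fn matches(&self, value: &str) -> bool {
        match self {
            ArgMatcher::Exact(expected) => value == *expected,
            ArgMatcher::Prefix(prefix) => value.starts_with(*prefix),
        }
    }
}

/// A command matches only if the executable is identical, the argument count
/// equals the matcher count, and every position matches in order.
fn command_matches(
    executable: &str,
    matchers: &[ArgMatcher],
    command: &str,
    args: &[&str],
) -> bool {
    command == executable
        && args.len() == matchers.len()
        && matchers.iter().zip(args).all(|(matcher, arg)| matcher.matches(arg))
}
```

Because the length check is exact, appending an extra argument to an otherwise allowed command causes the rule to reject it.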

## Why

Enterprise administrators need to allow MCP servers by executable and
positional-argument shape, including fixed arguments plus constrained
values such as internal MCP URLs passed to a proxy.

## Validation

- `just fmt`
- `git diff --check`
- `just test -p codex-config` (198 passed)
- `just test -p codex-core mcp_servers_by_matchers --lib` (2 passed)

felixxia-oai · 2026-06-25 22:15:50 +01:00

db541f4553

release: consume standalone zsh artifacts (#30116 )

## Why

Once #30114 publishes zsh independently, regular Rust releases should
reuse that protected, versioned artifact set instead of rebuilding
identical zsh binaries for every Codex version. Keeping the zsh release
tag explicit in the workflow also makes future artifact upgrades
deliberate and easy to review.

This PR assumes the first standalone artifact release will be published
as `codex-zsh-v0.1.0` before this change lands.

## What changed

- Added `CODEX_ZSH_RELEASE_TAG` near the top of
`.github/workflows/rust-release.yml`, initially pinned to
`codex-zsh-v0.1.0`.
- Download the standalone release’s generated `codex-zsh` DotSlash
manifest before assembling Linux and macOS Codex packages.
- Added a `--zsh-manifest` package-builder override so release packaging
fetches the matching target archive and verifies the size and SHA-256
digest recorded in that manifest.
- Removed the reusable zsh build job from regular Rust releases.
- Stopped copying zsh archives into each Rust release and stopped
regenerating a zsh DotSlash manifest there.

Windows packaging remains unchanged because the patched zsh resource is
only shipped for supported Unix targets.

## Testing

- Added package-helper coverage that supplies a standalone manifest
override and verifies the extracted zsh bytes.
- Ran the `scripts/codex_package` unit test suite.
- Validated `.github/scripts/build-codex-package-archive.sh` with `bash
-n`.

Michael Bolin · 2026-06-25 14:05:49 -07:00

e23e7cbe46

release: publish standalone zsh artifacts (#30114 )

## Why

The patched zsh artifacts rarely change, but
`.github/workflows/rust-release-zsh.yml` currently runs as part of every
Rust release. Rebuilding the same four binaries for each Codex version
wastes release capacity and ties an independently versioned runtime
dependency to the main release cadence.

This establishes the producer side of a build-once flow. The existing
Rust release workflow remains unchanged until the first standalone
artifact release has been published and the checked-in DotSlash
manifests can be updated with its URLs and checksums.

## What changed

- Run the zsh release workflow for protected `codex-zsh-vX.Y.Z` tags
instead of as a reusable workflow.
- Validate the semantic release tag before starting the platform builds.
- Publish the four zsh archives to a GitHub prerelease so the release
never becomes the repository latest release.
- Publish the generated `codex-zsh` DotSlash manifest alongside the
archives.
- Document how to publish the next artifact version after changing the
pinned zsh commit or patch.
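
Tag validation of the `codex-zsh-vX.Y.Z` shape could be expressed as in this sketch. The workflow's real validation is not shown in this log; the strict rules here (three numeric components, no leading zeros, no suffix) are assumptions, and both function names are hypothetical.

```rust
/// Parses one numeric version component, rejecting empty parts, signs,
/// and leading zeros such as `01`.
fn parse_component(part: &str) -> Option<u64> {
    let digits_only = !part.is_empty() && part.bytes().all(|b| b.is_ascii_digit());
    let leading_zero = part.len() > 1 && part.starts_with('0');
    if digits_only && !leading_zero {
        part.parse().ok()
    } else {
        None
    }
}

/// Parses `codex-zsh-vX.Y.Z` into `(X, Y, Z)`, rejecting anything else.
fn parse_zsh_release_tag(tag: &str) -> Option<(u64, u64, u64)> {
    let version = tag.strip_prefix("codex-zsh-v")?;
    let mut parts = version.split('.');
    let major = parse_component(parts.next()?)?;
    let minor = parse_component(parts.next()?)?;
    let patch = parse_component(parts.next()?)?;
    // A fourth component means the tag is not a plain X.Y.Z version.
    if parts.next().is_some() {
        return None;
    }
    Some((major, minor, patch))
}
```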

## Tag protection

An active repository tag ruleset named `codex-zsh-v*.*.*` targets
`refs/tags/codex-zsh-v*.*.*`. It restricts tag creation, updates,
deletion, and non-fast-forward changes; requires linear history; and
limits bypass to the configured repository role.

This was verified with:

```shell
gh api repos/openai/codex/rulesets/18140982
```

The response reported `"enforcement":"active"`, the expected tag
condition, and the `creation`, `update`, `deletion`, `non_fast_forward`,
and `required_linear_history` rules.

## Rollout

After this lands, publish the first `codex-zsh-vX.Y.Z` release. A
follow-up can then update the checked-in DotSlash manifests and remove
the zsh rebuild from `.github/workflows/rust-release.yml`.



---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/30114).
* #30116
* __->__ #30114

Michael Bolin · 2026-06-25 13:56:08 -07:00

891f1f4c85

feat(core, mcp): cache codex_apps tools in memory (#29003 )

## Description

This makes Codex Apps tool reads use a shared in-memory snapshot instead
of rereading the disk cache every time `list_all_tools()` runs. Disk
still seeds the cache on startup and gets updated after successful
fetches, but it is no longer the live read path.

The core change is that `McpManager` now owns a process-scoped
`CodexAppsToolsCache`. Codex threads in the same app-server process now
share this Codex Apps in-memory tools snapshot. The snapshot is keyed by
the Codex home plus the Codex Apps identity: the active Codex auth
user/workspace and the effective Codex Apps MCP source config.

Existing code can already hard-refresh the cache, and this PR respects
that path.

## Local benchmark

I ran a local steady-state microbenchmark of the exact repeated Codex
Apps cached-tools read this PR removes, using the same real local cache
payload in both trees: `3,678,138` bytes and `381` tools. The cache file
was already warm in the OS page cache, so this measures same-process
reread/deserialization work rather than cold-disk latency or full turn
latency. Each run is 25 iterations (mimicking a turn that makes 25
inference calls).

| Version | Run 1 | Run 2 | Avg |
|---|---:|---:|---:|
| `origin/main` disk read + JSON deserialize + `filter_tools` | `50.755 ms` | `52.894 ms` | `51.825 ms` |
| This branch in-memory `current_tools` + `filter_tools` | `0.740 ms` | `0.778 ms` | `0.759 ms` |

That removes about `51 ms` from each repeated Codex Apps cached-tools
read on this machine, roughly `68x` faster for that subpath. It is
useful evidence for the hot path this PR changes, but not a claim that
every production turn gets `51 ms` faster; end-to-end impact also
depends on the rest of `list_all_tools()` and tool-payload construction.
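
The headline figures follow directly from the table. The sketch below reproduces the arithmetic; `mean_ms` and `compare` are hypothetical helper names.

```rust
/// Arithmetic mean of benchmark samples in milliseconds.
fn mean_ms(samples: &[f64]) -> f64 {
    samples.iter().sum::<f64>() / samples.len() as f64
}

/// Returns (milliseconds saved, speedup factor) between two sets of runs.
fn compare(baseline: &[f64], candidate: &[f64]) -> (f64, f64) {
    let before = mean_ms(baseline);
    let after = mean_ms(candidate);
    (before - after, before / after)
}
```

Calling `compare(&[50.755, 52.894], &[0.740, 0.778])` gives a saving of about 51.07 ms and a ratio of about 68.3, matching the rounded `51 ms` and `68x` quoted above.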

These numbers come from my M2 Max MacBook. With a slower disk the old
path would be much worse, and we did see it significantly inflate turn
runtime on a slow disk.

Owen Lin · 2026-06-25 20:54:48 +00:00

703793c22e

[codex] poll external clock during sleep (#30113 )

## Summary

- make the external app-server time provider establish sleep deadlines
using `currentTime/read`
- poll the external clock once per second and complete `clock.sleep`
when the deadline is reached
- keep the system-clock timer and existing steer/agent-message
interruption behavior unchanged

## Why

This lets training control `clock.sleep` through its existing external
simulated clock without adding separate sleep/wake protocol methods.
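
The polling approach can be sketched as a deadline loop over an injected clock. Everything here is an assumption for illustration: `ExternalClock`, `now_ms` (standing in for a `currentTime/read` call), and `sleep_until_external_deadline` are hypothetical, and steer or agent-message interruption is not modeled.

```rust
/// Hypothetical clock abstraction; `now_ms` stands in for `currentTime/read`.
trait ExternalClock {
    fn now_ms(&mut self) -> u64;
}

/// Establishes a deadline from the external clock, then polls until the clock
/// reaches it. `wait_between_polls` is injected so tests need not sleep for
/// real; production code would wait about one second there.
/// Returns the number of polls performed.
fn sleep_until_external_deadline<C, W>(
    clock: &mut C,
    duration_ms: u64,
    mut wait_between_polls: W,
) -> u32
where
    C: ExternalClock,
    W: FnMut(),
{
    let deadline = clock.now_ms().saturating_add(duration_ms);
    let mut polls = 0;
    loop {
        polls += 1;
        if clock.now_ms() >= deadline {
            return polls;
        }
        wait_between_polls();
    }
}
```

Because the deadline is computed from the external clock rather than the system clock, a simulated clock that jumps forward completes the sleep on the next poll.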

## Testing

- `just fmt`
- `just test -p codex-app-server
external_sleep_polls_current_time_and_emits_items`

rka-oai · 2026-06-25 13:46:42 -07:00

62c7f506d9

[codex] Observe remote exec-server lifecycle (#27470 )

## Summary

- Record bounded duration and outcome metrics for remote environment
registration and Noise rendezvous connection attempts.
- Count reconnects by bounded reason: disconnect, connection failure, or
rejected registration.
- Trace registration at the owning client boundary without exporting raw
environment or registration identifiers.
- Replace the stale pre-Noise WebSocket observability design with the
current remote transport model.

## Stack

Review and land this stack in order:

1. #27466 — trace exec-server JSON-RPC requests
2. #27467 — record bounded connection, request, and process lifecycle
metrics
3. #27470 — observe remote registration and Noise rendezvous lifecycle
**(this PR)**

## Validation

- `just test -p codex-exec-server --lib` (149 passed)
- `just test -p codex-cli --test exec_server` (4 passed)
- `just argument-comment-lint`
- `just bazel-lock-check`
- `just fix -p codex-exec-server -p codex-cli`
- `just fmt`

richardopenai · 2026-06-25 13:42:40 -07:00

3b22498f69

[codex] extend code-mode host IPC transport (#30108 )

## Summary

- add an `EncodedFrame` type so IPC payloads are serialized and
size-checked before entering bounded queues
- add the V1 `operation/cancel` client-to-host message
- pin the new wire shape with protocol tests

## Why

The process-owned code-mode host needs bounded, pre-encoded outbound
messages and a best-effort cancellation signal. Keeping these wire
primitives in a protocol-only change lets their compatibility contract
be reviewed independently from either endpoint.
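
The idea of encoding and size-checking before a bounded queue can be sketched with a standard-library channel. This is not the crate's actual `EncodedFrame`: the byte limit, the error type, and `enqueue` are hypothetical, and "serialization" is reduced to copying UTF-8 bytes.

```rust
use std::sync::mpsc::{SyncSender, TrySendError};

/// Hypothetical frame size limit for this sketch.
const MAX_FRAME_BYTES: usize = 1024;

/// A payload that has already been serialized and size-checked.
#[derive(Debug, PartialEq)]
struct EncodedFrame {
    bytes: Vec<u8>,
}

#[derive(Debug, PartialEq)]
enum FrameError {
    TooLarge { len: usize },
    QueueFull,
    Closed,
}

impl EncodedFrame {
    /// Encodes first, so oversized payloads are rejected before queueing.
    fn encode(payload: &str) -> Result<Self, FrameError> {
        let bytes = payload.as_bytes().to_vec();
        if bytes.len() > MAX_FRAME_BYTES {
            return Err(FrameError::TooLarge { len: bytes.len() });
        }
        Ok(EncodedFrame { bytes })
    }
}

/// Enqueues without blocking; a full bounded queue is reported to the caller.
fn enqueue(queue: &SyncSender<EncodedFrame>, frame: EncodedFrame) -> Result<(), FrameError> {
    queue.try_send(frame).map_err(|err| match err {
        TrySendError::Full(_) => FrameError::QueueFull,
        TrySendError::Disconnected(_) => FrameError::Closed,
    })
}
```

Since only `EncodedFrame` values can enter the queue, the queue's memory use is bounded by its capacity times the frame limit.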

## Stack

This is **1 of 4** in the process-owned code-mode session stack. The
next PR targets this branch.

## Validation

- `just test -p codex-code-mode-protocol` — 22 passed
- `just fix -p codex-code-mode-protocol`
- `just fmt`

Channing Conger · 2026-06-25 13:26:47 -07:00

3b78f58fb2

[codex] impl delivery_mode: current time reminders on response boundaries (#30033 )

## Summary
- track user-like input and tool-output boundaries in current-time
reminder state
- gate reminder injection when delivery_mode is
after_user_or_tool_output
- preserve interval debounce and forced reminders after context-window
changes

## Why
Training can request reminders only after user or tool-output items
while keeping the existing canonical pre-inference history-injection
path.
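
The gating rule might be modeled as in this sketch. The type and function names are hypothetical, and one point is an explicit assumption: the boundary gate is applied to forced reminders as well, which this description does not spell out.

```rust
/// Mirrors the two `delivery_mode` values.
#[derive(Clone, Copy, PartialEq)]
enum DeliveryMode {
    AnyInference,
    AfterUserOrToolOutput,
}

/// Hypothetical classification of the most recent history item.
#[derive(Clone, Copy, PartialEq)]
enum LastItem {
    UserInput,
    ToolOutput,
    AssistantMessage,
}

/// Decides whether to inject a current-time reminder before inference.
/// Assumption: a reminder is due when the interval elapsed or it was forced,
/// and the delivery mode then gates it on the preceding item.
fn should_inject_reminder(
    mode: DeliveryMode,
    last_item: LastItem,
    interval_elapsed: bool,
    forced: bool,
) -> bool {
    let at_allowed_boundary = match mode {
        DeliveryMode::AnyInference => true,
        DeliveryMode::AfterUserOrToolOutput => last_item != LastItem::AssistantMessage,
    };
    at_allowed_boundary && (interval_elapsed || forced)
}
```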

## Validation
- `just test -p codex-core current_time_reminders_can_follow_only_user_or_tool_outputs`
- `just test -p codex-core current_time_reminders_follow_time_interval_and_persist_in_history`
- `just test -p codex-core current_time_reminder_is_refreshed_after_compaction`
- `just fix -p codex-core`

rka-oai · 2026-06-25 19:28:50 +00:00

adccb464d0

[codex] Retry temporarily offline exec-server recovery (#30098 )

## Summary

- retry ERS `409 environment_offline` responses inside the existing
exec-server recovery loop
- keep all other registry conflicts terminal
- add focused coverage for both cases

## Root cause

When an exec server disconnects and reconnects, the client already
starts recovery and calls ERS `/connect`. During the transient executor
presence gap, ERS can return `409 environment_offline`. The retry
classifier treated every 409 as terminal, so the first response aborted
the existing 25-second recovery window before the executor came back
online. That then caused active processes to be marked lost.

This change classifies only the structured `environment_offline`
conflict as retryable. Recovery continues with the existing bounded
deadline, exponential backoff, and jitter.
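
The classification change can be sketched as follows. `classify_conflict` and `backoff_delay` are hypothetical; the base and cap durations are caller-supplied assumptions, and jitter is omitted because the standard library has no random source.

```rust
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum Retry {
    Retryable,
    Terminal,
}

/// Classifies a registry `409` response by its structured error code.
/// Only `environment_offline` is retried; other conflicts stay terminal.
fn classify_conflict(error_code: Option<&str>) -> Retry {
    match error_code {
        Some("environment_offline") => Retry::Retryable,
        _ => Retry::Terminal,
    }
}

/// Exponential backoff without jitter: `base * 2^attempt`, capped at `cap`.
fn backoff_delay(attempt: u32, base: Duration, cap: Duration) -> Duration {
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).map_or(cap, |delay| delay.min(cap))
}
```

The recovery loop would still stop at its existing overall deadline; the classifier only decides whether a given response ends the loop early.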

## Validation

- `just test -p codex-exec-server client::recovery::tests` — 4 passed
- `just fix -p codex-exec-server` — passed
- `just fmt` — passed
- Full `just test -p codex-exec-server` reached unrelated macOS
filesystem-sandbox integration failures because nested
`/usr/bin/sandbox-exec` is denied in this environment (`sandbox_apply:
Operation not permitted`).

richardopenai · 2026-06-25 19:25:04 +00:00

964b138c3d

[codex] add current time reminder delivery mode config (#30031 )

```toml
delivery_mode = "any_inference" # default
delivery_mode = "after_user_or_tool_output" # new mode
```

## Validation
- `just test -p codex-core load_config_resolves_current_time_reminder`
- `just test -p codex-core lock_contains_prompts_and_materializes_features`

rka-oai · 2026-06-25 19:06:43 +00:00

e8d4a1a411

core: expose permission profile to shell tools (#29941 )

## tl;dr

Inject a `CODEX_PERMISSION_PROFILE` environment variable with the name
of the current permission profile when invoking a shell tool.

## Why

Shell tool owners may need to launch nested commands under the same
named permission profile, including through `codex sandbox -P PROFILE
--include-managed-config`. Until now, child processes could observe
sandbox and network metadata but could not identify the active named
permission profile.

The `--include-managed-config` flag is essential when a helper
reconstructs the sandbox from a profile name: it ensures the nested
sandbox also loads managed enterprise requirements. Without it, using
the inherited profile could unintentionally create a sandbox that does
not enforce the organization's managed restrictions.

The new environment value is intentionally informational and **must not
be treated as trusted input**. Any process in the ancestry can overwrite
an environment variable, so a consumer that passes this value to `codex
sandbox -P` must first validate it against the profiles that helper is
authorized to use.

## Example Use Case

Suppose an organization provides a trusted `remote-bash` wrapper that
lets Codex run a command on an approved build host. The local shell
command uses the named `:workspace` permission profile:

```toml
default_permissions = ":workspace"
```

The command exposed to the model is a small zsh wrapper. It deliberately
delegates with `exec`, preserving the original arguments and process
environment:

```zsh
#!/usr/bin/env zsh
exec /opt/codex-tools/remote_bash.py "$@"
```

The model invokes the public wrapper, not its Python implementation:

```sh
/opt/codex-tools/remote-bash \
  --host builder.example.com \
  -- printf '%s' 'hello world'
```

Only the inner implementation is authorized to escape the local sandbox:

```starlark
prefix_rule(
    pattern=["/opt/codex-tools/remote_bash.py"],
    decision="allow",
)
```

With zsh-fork, execution begins with `remote-bash` inside the
`:workspace` sandbox. When the wrapper calls `exec`, the exact prefix
rule matches `remote_bash.py`, so that inner script is restarted
unsandboxed. The escalated process inherits:

```text
CODEX_PERMISSION_PROFILE=:workspace
```

Inheritance does not make the value trustworthy. `remote_bash.py`
independently allowlists both the remote host and the permission profile
before using either value. In particular, a forged value such as
`:danger-full-access` is rejected before it can reach `codex sandbox
-P`:

```python
import argparse
import os
import shlex
import sys

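# The helper only ever targets these hosts and profiles.
ALLOWED_HOSTS = {"builder.example.com"}
ALLOWED_PROFILES = {":workspace"}

parser = argparse.ArgumentParser()
parser.add_argument("--host", required=True)
# Helper options precede `--`; the remote command follows it verbatim.
if "--" not in sys.argv:
    parser.error("expected `--` before the remote command")
separator = sys.argv.index("--")
args = parser.parse_args(sys.argv[1:separator])
command = sys.argv[separator + 1:]

if args.host not in ALLOWED_HOSTS:
    parser.error("host is not allowlisted")
if not command:
    parser.error("the remote command must not be empty")

# The inherited value is untrusted: validate it before it reaches
# `codex sandbox -P`.
profile = os.environ.get("CODEX_PERMISSION_PROFILE")
if not profile:
    raise SystemExit("CODEX_PERMISSION_PROFILE must not be empty")
if profile not in ALLOWED_PROFILES:
    raise SystemExit("CODEX_PERMISSION_PROFILE is not allowlisted")

# Quote one layer at a time: first the payload for `bash -lc`, then the
# sandbox command that ssh hands to the remote shell.
remote_command = shlex.join(command)
sandbox_command = shlex.join([
    "codex", "sandbox", "-P", profile,
    "--include-managed-config", "--",
    "bash", "-lc", remote_command,
])
# Print the SSH command instead of running it.
print(shlex.join(["ssh", args.host, sandbox_command]))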
```

This builds each command layer as an argument vector and uses
`shlex.join()` at the boundary, rather than interpolating untrusted
shell text. After validation and parsing, the nested command has this
structure:

```text
ssh argv:
  ["ssh", "builder.example.com", SANDBOX_COMMAND]

SANDBOX_COMMAND argv:
  ["codex", "sandbox", "-P", ":workspace",
   "--include-managed-config", "--",
   "bash", "-lc", "printf %s 'hello world'"]

bash -lc payload argv:
  ["printf", "%s", "hello world"]
```

A production implementation could execute that SSH command. The
integration fixture prints it and parses the result back into arguments,
verifying the complete flow:

```text
model invokes outer wrapper
  -> zsh-fork starts wrapper under :workspace
  -> wrapper execs allowlisted Python script
  -> prefix rule restarts Python script unsandboxed
  -> Python script inherits CODEX_PERMISSION_PROFILE=:workspace
  -> Python script verifies :workspace is allowlisted
  -> remote command runs codex sandbox -P :workspace
     with --include-managed-config
  -> nested sandbox honors managed enterprise requirements
```

This gives the trusted helper access to resources outside the local
sandbox—such as SSH credentials—while ensuring that it can select only
an explicitly authorized profile and that work on the remote host
remains subject to the organization's managed requirements.

## What changed

- Inject `CODEX_PERMISSION_PROFILE` after shell environment policy
evaluation so the active profile wins over inherited or configured stale
values.
- Apply the variable to both `shell_command` and unified `exec_command`,
including local, zsh-fork, and remote exec-server paths.
- Remove stale values when the session has no active named profile.
- Preserve the current profile value when loading a shell snapshot so a
parent snapshot cannot restore an older profile.
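
The ordering described in the list above can be sketched in a few lines. This is an illustrative Python model, not the Rust implementation: `apply_permission_profile` and its `policy_env` argument are hypothetical names, and only the variable name `CODEX_PERMISSION_PROFILE` comes from this change.

```python
PROFILE_VAR = "CODEX_PERMISSION_PROFILE"


def apply_permission_profile(policy_env, active_profile):
    """Return a child environment with the profile variable applied last.

    `policy_env` stands in for the result of shell environment policy
    evaluation (hypothetical input). `active_profile` is the active named
    profile, or None when the session has none.
    """
    env = dict(policy_env)  # copy so the caller's mapping is not mutated
    if active_profile is None:
        # No active named profile: drop any inherited or configured value.
        env.pop(PROFILE_VAR, None)
    else:
        # Written after policy evaluation, so the active profile wins.
        env[PROFILE_VAR] = active_profile
    return env
```

Because the write happens last, a stale value that survives policy evaluation is either overwritten or removed, never passed through.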

## Testing

- Added classic-shell integration coverage proving an exact prefix rule
can run a `require_escalated` script outside the `:workspace` sandbox
while preserving `CODEX_PERMISSION_PROFILE=:workspace`.
- Added zsh-fork integration coverage in which the model invokes an
outer zsh wrapper, an inner allowlisted `remote_bash.py` runs
unsandboxed, and its printed SSH command reconstructs the inherited
`:workspace` sandbox with `--include-managed-config` while preserving
every argument after `--`.
- The example helper treats `CODEX_PERMISSION_PROFILE` as untrusted and
validates it against `ALLOWED_PROFILES` before constructing the nested
command.
- Asserted that the reconstructed sandbox command includes
`--include-managed-config` so nested use of the inherited profile cannot
bypass managed enterprise requirements.
- Added coverage for overriding and removing stale profile values.
- Verified `shell_command` receives the selected active profile.
- Added shell snapshot coverage using `printenv
CODEX_PERMISSION_PROFILE`.

Michael Bolin · 2026-06-25 19:00:23 +00:00

c65cfeab14

[codex] allow current time reminder interval to be set to 0 (#30029 )

A zero interval lets callers request a reminder at every
otherwise-eligible inference boundary.

## Validation
- just test -p codex-core load_config_resolves_current_time_reminder

rka-oai · 2026-06-25 18:30:53 +00:00

cc78903379

cli: rename sandbox permission profile flag (#30095 )

## Why

`codex sandbox` accepts a single named permissions profile, so the
existing plural `--permissions-profile` spelling is misleading. The
canonical flag and its help text should use the singular form without
breaking scripts that already use the old spelling.

## What changed

- Make `--permission-profile` the canonical flag for all sandbox
backends.
- Keep `--permissions-profile` as a hidden backwards-compatible alias.
- Cover the canonical spelling, legacy alias, and help visibility with
regression tests.
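
The canonical-flag-plus-hidden-alias pattern can be illustrated with Python's `argparse`. This is only an analogy: the real CLI is not written in Python, and the `sandbox-demo` program name and `build_parser` helper are made up for this sketch.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="sandbox-demo")
    # Canonical singular spelling, listed in --help.
    parser.add_argument(
        "-P", "--permission-profile",
        dest="permission_profile", metavar="NAME",
        help="Named permissions profile to apply",
    )
    # Legacy plural spelling: still accepted, but hidden from --help.
    parser.add_argument(
        "--permissions-profile",
        dest="permission_profile", metavar="NAME",
        help=argparse.SUPPRESS,
    )
    return parser


parser = build_parser()
canonical = parser.parse_args(["--permission-profile", ":workspace"])
legacy = parser.parse_args(["--permissions-profile", ":workspace"])
```

Both spellings write to the same destination, so existing scripts keep working while the help output advertises only the singular form.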

## Testing

Ran `just c sandbox --help` and verified I saw:

```shell
  -P, --permission-profile <NAME>
          Named permissions profile to apply from the active configuration stack
```

Michael Bolin · 2026-06-25 11:25:19 -07:00

31b99f65cf

feat: add provider-aware model fallback to thread start (#29942 )

## Why

Helper threads such as task title generation can request a model ID that
is valid for the default OpenAI provider but unavailable from the active
provider. With Amazon Bedrock, `gpt-5.4-mini` is rejected while the
provider static catalog exposes Bedrock model IDs such as
`openai.gpt-5.5` and `openai.gpt-5.4`. This causes repeated background
404s and can surface a misleading turn error even when the main turn
succeeds.

Clients need an explicit way to ask app-server to resolve an unavailable
helper model to the active provider default. That fallback must remain
limited to providers with an authoritative static catalog so custom or
dynamically discovered model IDs are not rewritten based on an
incomplete catalog.

Fixes #28741.

## What changed

- Add the experimental `allowProviderModelFallback` option to
`thread/start`, defaulting to `false` to preserve existing behavior.
- Thread the option through thread creation and model selection.
- When enabled for a static model manager, preserve requested models
present in the catalog and replace unavailable models with the provider
default.
- Continue preserving explicit model IDs for dynamic model managers
without fetching a catalog solely to validate them.
- Document the new `thread/start` behavior in the app-server API
overview.
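
A minimal sketch of the selection rule described above, written in Python for brevity. The function name, parameter names, and the idea of passing the catalog as a set are assumptions for illustration; only the model IDs are taken from this description.

```python
def resolve_model(requested, catalog, provider_default,
                  catalog_is_static, allow_fallback=False):
    """Pick the model ID to use for a new thread.

    Fallback applies only when the caller opted in and the provider's
    catalog is static (authoritative). Otherwise the requested ID is
    preserved without consulting the catalog.
    """
    if not allow_fallback or not catalog_is_static:
        return requested
    # Static catalog: keep known models, replace unknown ones.
    return requested if requested in catalog else provider_default


bedrock_catalog = {"openai.gpt-5.5", "openai.gpt-5.4"}
```

With fallback enabled, `resolve_model("gpt-5.4-mini", bedrock_catalog, "openai.gpt-5.5", True, True)` yields the provider default, while the same call with `allow_fallback=False` returns the requested ID unchanged.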

## Test
Temporary test-client harness:
```
ThreadStartParams {
    model: Some("gpt-5.4-mini".to_string()),
    allow_provider_model_fallback: true,
    ..Default::default()
}
```
Command:
```
CODEX_HOME=/tmp/codex-bedrock-thread-start-home \
CODEX_E2E_BEDROCK_THREAD_START_ONLY=1 \
./target/debug/codex-app-server-test-client \
  --codex-bin ./target/debug/codex \
  -c 'model_provider="amazon-bedrock"' \
  send-message-v2 --experimental-api ignored
```
Relevant output:
```
> "method": "thread/start",
> "params": {
>   "model": "gpt-5.4-mini",
>   "modelProvider": null,
>   "allowProviderModelFallback": true,
>   ...
> }

< "result": {
<   "model": "openai.gpt-5.5",
<   "modelProvider": "amazon-bedrock",
<   ...
< }
```

Celia Chen · 2026-06-25 18:24:34 +00:00

6d9dbacf1a

[codex] Record exec-server lifecycle metrics (#27467 )

## Summary

- Record bounded connection, request, and process lifecycle metrics.
- Report active gauges from callbacks on every collection, including
delta exports.
- Serialize active-count updates so concurrent starts and finishes
cannot publish stale values.
- Serialize process exit, explicit termination, and shutdown through the
process registry so exactly one completion result wins.
- Keep the implementation small with single-owner RAII guards and one
real OTLP/HTTP integration test using the existing `wiremock`
dependency.

## Root cause

Process exit and session shutdown previously used cloned completion
state. That avoided duplicate emission, but it duplicated lifecycle
ownership and made the ordering harder to reason about. The process
registry mutex already defines the lifecycle ordering, so the final
implementation stores the metric guard and termination flag directly on
the process entry. Whichever path claims the entry first owns the
completion result.
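
The "first claimant wins" rule can be sketched as follows. This is a simplified Python model under stated assumptions: `ProcessRegistry`, `register`, and `complete` are hypothetical names, and the entry here holds only a placeholder where the real entry stores the metric guard and termination flag.

```python
import threading


class ProcessRegistry:
    """Toy registry in which one mutex orders all lifecycle transitions."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}

    def register(self, process_id):
        with self._lock:
            # Placeholder for the per-process state kept on the entry.
            self._entries[process_id] = {"terminated": False}

    def complete(self, process_id, result):
        """Claim the entry; return `result` only for the winning caller."""
        with self._lock:
            entry = self._entries.pop(process_id, None)
        if entry is None:
            return None  # another path already completed this process
        return result
```

Exit, explicit termination, and shutdown would all go through `complete`, so removing the entry under the lock is what guarantees a single completion result.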

Production metric export uses delta temporality. Event-only synchronous
gauge recordings disappear after the next collection when no count
changes, so active counts now use observable callbacks that report
current state on every collection.
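
To illustrate why a callback-based gauge survives delta export, here is a small Python model. It does not use any OpenTelemetry API; `ActiveGauge` and `collect` are hypothetical stand-ins for an observable gauge and an exporter collection pass.

```python
import threading


class ActiveGauge:
    """Tracks an active count and reports it whenever it is observed."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = 0

    def start(self):
        with self._lock:  # serialize updates so no stale value is published
            self._active += 1

    def finish(self):
        with self._lock:
            self._active -= 1

    def observe(self):
        with self._lock:
            return self._active


def collect(gauge):
    # Called on every collection, whether or not the count changed.
    return {"active": gauge.observe()}
```

Two consecutive `collect` calls with no intervening `start` or `finish` report the same value, whereas an event-only recording would have nothing to report on the second pass.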

The cleanup also removes the constant `result="accepted"` connection
tag, redundant route and response assertions, a custom HTTP collector,
and fallback initialization machinery that did not add behavior.

## Stack

Review and land this stack in order:

1. #27466 — trace exec-server JSON-RPC requests
2. #27467 — record bounded connection, request, and process lifecycle
metrics **(this PR)**
3. #27470 — observe remote registration and Noise rendezvous lifecycle

## Validation

- `just test -p codex-exec-server --lib` (158 passed)
- `just test -p codex-cli --test exec_server` (3 passed)
- `just test -p codex-otel
observable_gauge_is_collected_on_every_delta_snapshot` (1 passed)
- `CARGO_BUILD_JOBS=1 just fix -p codex-otel -p codex-exec-server`
- `just fmt`
- `git diff --check`

richardopenai · 2026-06-25 11:02:11 -07:00

2dec46e30a

Persist selected capability roots and resolve availability per model step (#29856 )

## Why

`selectedCapabilityRoots` is durable thread intent: “use this capability
root from environment `worker`.”

The important product assumption is:

> One environment ID always names the same logical executor and stable
contents.

`worker` does not silently change from executor A to an unrelated
executor B. The process-local connection handle for `worker` can still
be replaced while Codex is running, though, for example when
`environment/add` registers a fresh handle for the same logical
environment.

The thread should persist only the stable selection. Each model step
should pair that selection with the exact ready handle captured for that
step.

## The boundary

```text
persisted thread intent
  plugin@1 -> environment "worker"
        |
        | capture the current step
        v
model-step view
  unavailable, or
  plugin@1 + worker's exact captured ready handle
```

The environment ID is the stable identity and cache key. The
`Arc<Environment>` is only a process-local handle retained so consumers
of one model step use the same captured environment. It is never
persisted and it does not imply different environment contents.

## What changes

### Persist the stable selection

Selected roots are written into `SessionMeta` and restored with the
thread. Forked subagents inherit the same selections, including
bounded-history forks.

Only stable data is persisted: root ID, environment ID, and root path.

### Capture readiness together with the exact handle

The environment snapshot records:

```rust
environment_id -> Some(Arc<Environment>) // ready in this step
environment_id -> None // still starting in this step
```

This prevents readiness and execution from coming from different
registry snapshots.

For example:

```text
step snapshot: worker -> handle A, ready
environment/add: worker -> fresh handle B for the same logical environment
current step: plugin@1 still uses captured handle A
```

Without carrying handle A in the snapshot, the resolver could combine “A
was ready” with handle B and treat B as ready before it had finished
starting.

This does not change cache invalidation. Stable capability metadata
remains identified by environment ID and capability root. Replacing a
process-local handle under the same stable environment ID does not
invalidate or rediscover that metadata.

### Resolve availability per model step

- A ready captured environment produces resolved roots using its
captured handle.
- A starting, missing, or failed environment is omitted from that step.
- A selected lazy environment that is outside the turn's captured
environment set is asked to start, and a later step can observe it as
ready.
- No capability files are scanned here.

Transient transport disconnects remain the remote client's reconnect
concern. This PR models initial attachment/readiness; it does not add
live socket-connectivity state.
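
The pairing of a persisted selection with a step-scoped snapshot can be sketched in Python. All names here (`resolve_roots`, the tuple layout, the use of plain objects as handles) are illustrative assumptions, and starting a lazy environment is not modeled.

```python
def resolve_roots(selected_roots, snapshot):
    """Resolve persisted selections against one model step's snapshot.

    selected_roots: list of (root_id, environment_id, root_path) tuples.
    snapshot: dict mapping environment_id to the handle captured for this
    step, or to None while that environment is still starting.
    """
    resolved = []
    for root_id, environment_id, root_path in selected_roots:
        handle = snapshot.get(environment_id)  # missing and starting -> None
        if handle is None:
            continue  # omitted from this step; the selection itself persists
        resolved.append((root_id, root_path, handle))
    return resolved


handle_a, handle_b = object(), object()
registry = {"worker": handle_a}
step_snapshot = dict(registry)   # capture readiness and handle together
registry["worker"] = handle_b    # a fresh handle is registered mid-step
selection = [("plugin@1", "worker", "/plugins/one")]
```

Resolving `selection` against `step_snapshot` still returns `handle_a`, which is the pinning behavior described above. A later step would take its own snapshot and see `handle_b`.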

## Example

```text
thread selection: plugin@1 -> environment "worker"

step 1: worker is starting -> plugin@1 unavailable
step 2: worker is ready -> plugin@1 resolves through worker's captured handle
step 3: fresh local handle -> current step remains pinned; a later step captures its own view
```

Temporary unavailability does not discard the durable selection. Later
PRs can retain stable metadata caches while projecting only currently
available capabilities into model-visible World State.

## Compatibility

The app-server request shape does not change. Older rollouts without
`selected_capability_roots` deserialize to an empty list.

## Stack

1. **This PR:** persist stable selected roots and resolve them through
an exact model-step handle.
2. #29960: cache stable skill metadata and project available skills into
World State.
3. #29946: cache stable plugin declarations and manage the separate live
MCP runtime.

jif · 2026-06-25 17:49:43 +00:00

8f02973d25

chore(app-server): mark thread/rollback as deprecated (#29928 )

We will drop support for `thread/rollback` in the near future because of
the complexity it introduces.

Owen Lin · 2026-06-25 17:15:46 +00:00

268328001f

Test executor-routed MCP OAuth token exchange (#29656 )

## Why

#28529 proves OAuth discovery uses the selected executor, but its
end-to-end test stops before the callback and token exchange.

## What changed

- add an executor-only mock token endpoint
- complete the OAuth callback using the authorization URL's `state` and
`redirect_uri`
- assert the PKCE token exchange reaches the executor-only endpoint
- assert the completion notification reports the selected thread and
succeeds
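
For readers unfamiliar with the pieces the test exercises, the following Python sketch shows the standard PKCE S256 challenge derivation (RFC 7636) and how a callback URL can be built from an authorization URL's `state` and `redirect_uri`. The helper names and example URLs are hypothetical and are not the test's actual code.

```python
import base64
import hashlib
from urllib.parse import parse_qs, urlencode, urlsplit


def challenge_for(verifier):
    """Derive the S256 code challenge for a PKCE code verifier."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    # base64url without padding, as RFC 7636 requires.
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def callback_url(authorization_url, code):
    """Build the redirect the authorization server would send back."""
    query = parse_qs(urlsplit(authorization_url).query)
    redirect_uri = query["redirect_uri"][0]
    state = query["state"][0]
    return f"{redirect_uri}?{urlencode({'code': code, 'state': state})}"
```

The token endpoint later recomputes the challenge from the verifier sent in the token exchange, which is why the exchange must reach the same endpoint that issued the authorization.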

Depends on #28529.

jif · 2026-06-25 09:45:20 +00:00

c38b2e9ba6

Support OAuth for HTTP MCP servers from selected executor plugins (#28529 )

## Why

#28522 routes selected-plugin HTTP MCP traffic through the owning
executor, but OAuth bootstrap and refresh still used host-local clients.
Executor-only servers therefore cannot complete discovery or login
through the same network boundary as the MCP connection.

## What changed

- adapt `codex_exec_server::HttpClient` to RMCP 1.8's `OAuthHttpClient`
contract
- let RMCP own discovery, dynamic registration, PKCE, token exchange,
and refresh
- route auth status, persisted-token startup, and app-server login
through the server runtime while preserving the existing local discovery
path
- add optional `threadId` to `mcpServer/oauth/login` and echo it in the
completion notification
- implement RMCP's redirect policy and 1 MiB OAuth response limit over
executor HTTP
- cover selected-thread OAuth discovery and login through an
executor-only route

Depends on #28522.

jif · 2026-06-25 10:31:17 +01:00

b215961a56

Support HTTP MCP servers from selected executor plugins (#28522 )

## Why

Selected executor plugins can declare both stdio and Streamable HTTP MCP
servers, but only stdio registrations were retained. That silently drops
part of the plugin's tool surface and prevents HTTP traffic from using
the owning executor's network.

## What changed

- retain selected-plugin Streamable HTTP MCP declarations alongside
stdio declarations
- route their HTTP clients through the owning executor environment
- preserve local auth-header environment references while rejecting them
for executor-hosted declarations
- cover thread isolation, refresh, and an executor-only HTTP route end
to end

jif · 2026-06-25 10:10:36 +01:00

6368937939

Parallelize environment skill loading (#29990 )

## Why

Avoid a request waterfall when loading many skills at once by overlapping
their latency in concurrent tasks.

## What changed

Poll the per-skill parse futures concurrently with an order-preserving
stream capped at 64 in-flight loads. Results retain discovery order, and
the existing filtering, warnings, and final catalog sorting are
unchanged.
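
The same idea, bounded concurrency with results kept in discovery order, can be sketched with Python's `asyncio`. This is an analogy rather than the actual implementation: `load_all` and `parse_skill` are hypothetical names, and only the cap of 64 comes from this change.

```python
import asyncio

MAX_IN_FLIGHT = 64  # cap taken from the description above


async def load_all(paths, parse_skill, limit=MAX_IN_FLIGHT):
    """Parse every path concurrently, at most `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(path):
        async with semaphore:
            return await parse_skill(path)

    # gather() returns results in argument order, not completion order,
    # so discovery order is preserved.
    return await asyncio.gather(*(bounded(path) for path in paths))
```

Because ordering is preserved at this stage, any later filtering and sorting sees the same sequence it would have seen from a sequential loop.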

Adam Perry @ OpenAI · 2026-06-25 10:02:07 +01:00

5579792b3b

core: reconcile legacy WorldState sections (#29997 )

## Why

Older rollouts can retain model-visible context for a WorldState section
without having a persisted snapshot for that section. Treating the
missing snapshot as definitely absent can duplicate old context or fail
to tell the model that it was replaced or removed.

This provides a generic migration path for sections moving into
WorldState, beginning with AGENTS.md.

Builds on #29810.

## What changed

- distinguish section state that is absent, known from a persisted
snapshot, or unknown because matching legacy context remains in history
- let WorldState sections identify their own legacy fragments while
`ContextManager` owns history reconciliation and baseline persistence
- make AGENTS.md emit one conservative replacement or removal update for
legacy history, then deduplicate from the newly persisted baseline
- preserve existing environment rendering when persisted section data is
missing or malformed
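
The three-way distinction in the first bullet can be written down as a small classifier. This Python sketch uses hypothetical names (`SectionState`, `classify_section`) and reduces the inputs to a snapshot value and a boolean; handling of malformed persisted data is not modeled.

```python
from enum import Enum


class SectionState(Enum):
    ABSENT = "absent"    # no snapshot and no legacy context in history
    KNOWN = "known"      # a persisted snapshot exists
    UNKNOWN = "unknown"  # no snapshot, but legacy context remains


def classify_section(snapshot, has_legacy_fragment):
    """Classify a WorldState section for history reconciliation."""
    if snapshot is not None:
        return SectionState.KNOWN
    if has_legacy_fragment:
        return SectionState.UNKNOWN
    return SectionState.ABSENT
```

Only the `UNKNOWN` case needs the conservative replacement or removal update, since the model may still be looking at context that no snapshot accounts for.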

## Testing

- `just test -p codex-core world_state`
- `just test -p codex-core
cold_resume_invalidates_deleted_legacy_agents_md_once -- --exact`

sayan-oai · 2026-06-25 07:03:52 +00:00

ab80d4d484

core: make AGENTS.md react to environment changes (#29810 )

## Why

With deferred executors, a turn can begin before a remote environment
attaches. AGENTS.md discovery previously ran only during session setup,
so instructions from a later environment never reached the model or the
session instruction sources.

WorldState persistence has now landed, so this uses the durable
model-visible baseline directly instead of carrying a temporary
resume/fork compatibility path.

## What

- Add an `AgentsMdManager` in `SessionServices` to own host
instructions, loaded state, and refresh caching.
- When `DeferredExecutor` is enabled, refresh AGENTS.md when attached
environment selections change and freeze the result in the corresponding
`StepContext`.
- Represent AGENTS.md as a persisted WorldState section for every
session, with bounded initial, replacement, and removal updates.
- Remove duplicate AGENTS.md state and rendering from
`SessionConfiguration` and `TurnContext`.
- Build initial context, per-request updates, and compaction context
from the same step-scoped value.
- On resume and fork, compare current instructions with the restored
WorldState baseline and inject a replacement exactly once when they
differ.
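
The "exactly once" behavior on resume and fork follows from comparing against a baseline that is advanced after each update. A minimal Python sketch, with `agents_md_update` and the tuple-shaped updates as illustrative assumptions:

```python
def agents_md_update(baseline, current):
    """Return the update to emit, or None when nothing changed.

    `baseline` is the instruction text restored from the persisted
    baseline (None if there was none); `current` is what discovery finds
    now (None if no AGENTS.md applies).
    """
    if baseline == current:
        return None
    if baseline is None:
        return ("initial", current)
    if current is None:
        return ("removal", None)
    return ("replacement", current)


baseline = "old rules"
update = agents_md_update(baseline, "new rules")
baseline = "new rules"  # persist the new baseline after emitting the update
repeat = agents_md_update(baseline, "new rules")
```

The second comparison returns `None`, so a later request built from the same step-scoped value does not reinject the replacement.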

Builds on #29833, #29835, and #29837.

## Tests

- Covers a remote environment becoming ready mid-turn, with AGENTS.md
appearing on the next request exactly once and updating canonical
instruction sources.
- Covers full, unchanged, replaced, and removed AGENTS.md WorldState
rendering.
- Covers changed instructions across cold resume and fork without
duplicate reinjection.
- Covers remote-v2 compaction retaining creation-time instructions in
the live session and cold resume appending one replacement when the
source changed.
- Ran focused `codex-core` AGENTS.md, WorldState, and context-update
test suites.

sayan-oai · 2026-06-24 22:57:42 -07:00

f2f80ef442
