codex

[codex] disable Nagle on Rendezvous WebSockets (#30269 )

## Summary

Disable Nagle unconditionally for both exec-server Rendezvous WebSocket
connections.

- pass `disable_nagle=true` at the executor and harness connection call
sites
- keep the existing signed URL, protocol, and connection flow unchanged
- add no feature flag, rollout schema, path variant, or
experiment-specific telemetry

The companion internal PR enables `TCP_NODELAY` on accepted Rendezvous
sockets: https://github.com/openai/openai/pull/1082463
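
As a rough illustration of what the two sides do, here is a minimal sketch using plain `std::net` TCP sockets rather than the real WebSocket stack. The helper names `connect_with_nodelay` and `accept_with_nodelay` are hypothetical and do not appear in either PR; only `disable_nagle` and `TCP_NODELAY` come from the description above.

```rust
use std::io;
use std::net::{SocketAddr, TcpListener, TcpStream};

/// Hypothetical client-side helper: connect, then apply the Nagle setting
/// before any frames are written.
fn connect_with_nodelay(addr: SocketAddr, disable_nagle: bool) -> io::Result<TcpStream> {
    let stream = TcpStream::connect(addr)?;
    // Setting TCP_NODELAY to true turns Nagle's algorithm off.
    stream.set_nodelay(disable_nagle)?;
    Ok(stream)
}

/// Hypothetical server-side helper: enable TCP_NODELAY on each accepted socket.
fn accept_with_nodelay(listener: &TcpListener) -> io::Result<TcpStream> {
    let (stream, _peer) = listener.accept()?;
    stream.set_nodelay(true)?;
    Ok(stream)
}
```

Because the option is set per socket and per side, each end can adopt it on its own schedule, which matches the independent deployment described under rollout.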

## Why

Rendezvous carries small, latency-sensitive relay and JSON-RPC frames.
Three staging runs of 30 steady-state `process/read` calls per
configuration measured p50 improving from 139.1 ms to 81.5 ms and p95
from 162.0 ms to 95.8 ms with Nagle disabled.

The expected packet overhead is small at the current connection scale.
We will use existing latency, error, packet, and CPU monitoring and
revert normally if production regresses.

## Rollout and rollback

The client and accepted-socket changes can deploy independently. New
connections receive the setting as each side deploys. Rollback is a
normal code revert; there is no persisted assignment or gate state to
unwind.

## Validation

- `just test -p codex-exec-server --lib`: 164 passed
- `just fix -p codex-exec-server`: passed
- `just fmt`: passed
- independent final review found no actionable issue

richardopenai · 2026-06-29 19:14:47 -05:00

cfead68e5d

[codex] auto-label AWS Bedrock issues (#30607 )

## Summary

AWS Bedrock issues currently fall under broader labels, which makes
provider-specific reports harder to find. The issue tracker now has an
`aws-bedrock` label, but the automated labeler does not know to apply
it.

Teach the issue labeler to select `aws-bedrock` for Amazon Bedrock
provider or Bedrock Mantle issues while excluding generic AWS
references.

Eric Traut · 2026-06-29 11:10:38 -07:00

4808c162ee

Update safety check links (#30491 )

## Summary

Bio/Cyber safety surfaces in the TUI could send users to stale Trusted
Access pages, and safety buffering did not always expose the Help
Center.

This follow-up to #30317 adds the missing Learn more action, refreshes
the Bio access URL and block copy, and updates the affected snapshots
while preserving the existing retry and wait behavior.

Eric Traut · 2026-06-29 11:10:11 -07:00

9d13291955

[codex] Treat max as a first-class reasoning effort (#30467 )

## Why

The Bedrock GPT-5.6 catalog advertises `max`, but Codex treated it as an
opaque custom effort. That made the reasoning picker render it as
lowercase `max` while known efforts use productized labels.

Making `max` a known effort aligns catalog data, parsing, and UI
presentation without changing the `max` wire value or persisted
representation.

## What changed

- Add first-class `ReasoningEffort::Max` parsing and serialization.
- Use the typed effort in the Bedrock catalog and render it as `Max` in
the TUI.
- Preserve forward-compatible custom-effort coverage with a genuinely
unknown `future` value.
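
The shape of the change can be sketched roughly as follows. This is a toy model, not the real type: only `ReasoningEffort::Max`, the `max` wire value, the `Max` label, and the unknown `future` value come from this PR; the `Low` and `High` variants, the `Custom` variant name, and the method names are assumptions for illustration.

```rust
/// Toy model of a reasoning-effort enum with a forward-compatible fallback.
#[derive(Debug, Clone, PartialEq, Eq)]
enum ReasoningEffort {
    Low,  // assumed variant
    High, // assumed variant
    Max,
    Custom(String), // assumed name for the opaque custom-effort case
}

impl ReasoningEffort {
    /// Parse a wire value; unknown strings are kept as custom efforts.
    fn parse(value: &str) -> Self {
        match value {
            "low" => Self::Low,
            "high" => Self::High,
            "max" => Self::Max,
            other => Self::Custom(other.to_string()),
        }
    }

    /// The wire and persisted representation stays lowercase.
    fn wire_value(&self) -> &str {
        match self {
            Self::Low => "low",
            Self::High => "high",
            Self::Max => "max",
            Self::Custom(value) => value.as_str(),
        }
    }

    /// Known efforts get a productized label; custom ones render verbatim.
    fn label(&self) -> String {
        match self {
            Self::Low => "Low".to_string(),
            Self::High => "High".to_string(),
            Self::Max => "Max".to_string(),
            Self::Custom(value) => value.clone(),
        }
    }
}
```

In this sketch `max` round-trips unchanged on the wire while the picker label becomes `Max`, and a value such as `future` still falls through to the custom path.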

### Before
<img width="559" height="124" alt="Screenshot 2026-06-28 at 12 08 47 PM"
src="https://github.com/user-attachments/assets/7c43cf4f-020b-4605-9239-0a9c97eb7364"
/>

### After
<img width="558" height="107" alt="Screenshot 2026-06-28 at 12 09 10 PM"
src="https://github.com/user-attachments/assets/b9cc5ded-c940-43b4-b024-bba25abe0a17"
/>

Shijie Rao · 2026-06-29 09:38:49 -07:00

80f54d1266

Revert "Make auto-review on-request prompt more proactive" (#30508 )

Reverts openai/codex#26496

Dylan Hurd · 2026-06-28 20:40:55 -07:00

ccdfb4f342

[codex] Restore v1 delegation guidance (#30511 )

## Summary

- restore the v1 clarification that requests for depth, research, or
investigation do not authorize subagent spawning
- restore guidance for keeping critical-path, urgent, tightly coupled,
or difficult work local
- update the focused v1 tool-search and spawn-description coverage

## Why

PR #27919 simplified the v1 `spawn_agent` prompt by removing its
delegation decision guidance. That left the authorization rule intact,
but removed the instructions that constrained what should be delegated
after spawning was authorized.

Restore those guardrails while preserving later support for explicit
delegation authorization from applicable AGENTS.md and skill
instructions. Multi-agent v2 prompts are unchanged.

## User impact

Models using the v1 multi-agent tool surface receive clearer guidance to
delegate independent side work while keeping blocking work on the main
rollout.

## Validation

- `just fmt`
- `git diff --check`
- tests not run locally per repository guidance; CI will validate the
focused coverage

Ahmed Ibrahim · 2026-06-28 20:34:47 -07:00

8dac605901

[codex] Use model metadata for skills usage instructions (#29740 )

## Summary

- add a false-by-default `include_skills_usage_instructions` model
metadata field
- enable the field for the bundled `gpt-5.5` model metadata
- consume the metadata in both core and extension skill rendering
- remove hardcoded legacy-model matching and its marker plumbing

ani-oai · 2026-06-29 09:44:36 +09:00

6b5f5743b3

fix(tui): clear completed safety buffering prompt (#30490 )

## Why

The safety-buffering prompt is a modal TUI view, but the normal
successful-turn path only hid the running status indicator. If the turn
completed while the prompt was open, the stale modal remained over the
composer until the user dismissed it or another turn started.

This aligns the TUI with the app behavior: keep the safety notice
visible while the turn is active, then remove it when the turn becomes
terminal. It also prevents the stale retry action from changing the
model and reasoning effort for a future turn after the buffered turn has
already completed.

| New copy |
|---|
| <img width="1014" height="313" alt="CleanShot 2026-06-28 at 20 27 18"
src="https://github.com/user-attachments/assets/f0f37359-5d77-442f-add2-9d1874bdc422"
/> |

## What changed

- Clear the active safety-buffering view and retry state when a turn
completes successfully.
- Update the retry-capable message to say “Hang tight or retry with a
faster model”.
- Extend the safety-buffering regression coverage to verify that the
prompt remains visible after assistant output starts and disappears when
the turn completes.
- Update the TUI snapshot for the revised copy.

This is a follow-up to #29919.

## How to Test

1. Start a TUI turn that receives `model/safetyBuffering/updated` with
`showBufferingUi: true` and a `fasterModel`.
2. Confirm the prompt says “Hang tight or retry with a faster model”.
3. Let the turn continue and confirm the prompt remains visible while
the turn is active.
4. Let the turn finish successfully and confirm the prompt disappears
and the composer is restored without requiring an extra keypress.
5. Confirm a buffering update without a faster model still shows the
shorter non-retry message.

Targeted automated coverage:

- `just test -p codex-tui safety_buffering` — 4 passed.
- `just test -p codex-tui` — 2,951 passed; two unrelated Guardian
feature-flag tests failed identically on `main` in this environment.

The argument-comment lint was also audited manually. The workspace Bazel
invocation was blocked by a missing external LLVM `compiler-rt` BUILD
file, and the packaged per-crate fallback uses a nightly older than the
current `sqlx` minimum Rust version.

Felipe Coury · 2026-06-28 20:55:53 -03:00

850da19dc4

[codex] Enable remote plugins by default (#30297 )

## Summary

- enable the remote plugin feature by default
- promote the remote plugin feature from under development to stable
- preserve the existing `features.remote_plugin` override for explicitly
disabling it
- keep legacy disabled-path coverage explicit in TUI and app-server
tests

## Impact

Remote plugin functionality is enabled by default for configurations
that do not set the feature flag. The existing Codex backend
authentication gate still applies.

## Validation

- `just fmt`
- `just test -p codex-features`
- `just test -p codex-tui
plugins_popup_remote_section_fallback_states_snapshot`
- targeted `codex-app-server` plugin-list and skills-list tests
- `git diff --check`

The full TUI and app-server suites were also exercised locally. All
remote-plugin-related coverage passed; unrelated local
sandbox/test-binary failures remain outside this change.

xl-openai · 2026-06-28 11:46:25 -07:00

e428a12d22

[app-server] increase currentTime/read timeout (#30384 )

## Summary

Increase the external currentTime/read request timeout from 5 seconds to
10 seconds.

## Validation

- `just fmt`
- Focused app-server test build was stopped to defer validation to CI.

rka-oai · 2026-06-27 16:42:03 -07:00

bdd282f3bb

[plugins] Enforce marketplace source policy at runtime (#29691 )

## Summary

- project effective marketplace/plugin config through the enterprise
source policy so blocked installed plugins become inactive
- filter plugin list/read/discovery and CLI marketplace source/snapshot
reporting using the same policy
- enforce source admission for background marketplace cache refreshes
- continue refreshing/upgrading independent marketplaces and plugins
when one entry fails, returning per-entry errors
- include policy-projected plugin state in cache and refresh keys so
requirement changes invalidate stale results

## Stack

This is PR 2 of 2 and is based on #29690. Review the admission model and
source matcher in #29690 first; this PR contains only runtime
enforcement.

## Test plan

- `just test -p codex-core-plugins` (287 tests)
- `just test -p codex-cli
plugin_list_ignores_implicit_system_marketplace_roots_without_manifests`
- `cargo check -p codex-cli -p codex-app-server --tests`

xl-openai · 2026-06-27 15:22:05 -07:00

9dbdb4e2c0

[app-server] expose environment info RPC (#30291 )

## Why

App-server clients that configure named execution environments need to
discover an environment's shell and working directory before selecting
it for a thread or turn. Because the environment can run on a different
operating system than app-server, its working directory is represented
as a canonical `file:` URI rather than a host-local path string. The
probe also needs a bounded response time: an exec-server that completes
initialization but never answers `environment/info` must not hold the
environment serialization queue indefinitely.

## What changed

- Add an experimental `environment/info` app-server RPC for named
environments.
- Route the probe through the managed environment connection and return
target-native shell metadata plus the default working directory as a
`PathUri`.
- Return connection and protocol failures as JSON-RPC errors.
- Bound the exec-server probe response to 30 seconds and remove
timed-out calls from the pending-request table so later environment
mutations can proceed.
- Cover successful responses, omitted working directories, unknown
environments, connection failures, and pending-call cleanup.
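
The timeout-and-cleanup behavior can be illustrated with a small synchronous sketch. It uses `std::sync::mpsc` instead of the async machinery the real exec-server client presumably uses, and the names `Pending`, `call_with_timeout`, and `deliver` are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Sender};
use std::sync::{Arc, Mutex};
use std::time::Duration;

/// Hypothetical pending-request table: request id -> channel awaiting the reply.
type Pending = Arc<Mutex<HashMap<u64, Sender<String>>>>;

/// Register a call, wait a bounded time for its response, and remove the
/// entry on timeout so it cannot linger in the table.
fn call_with_timeout(pending: &Pending, id: u64, timeout: Duration) -> Result<String, String> {
    let (tx, rx) = mpsc::channel();
    pending.lock().unwrap().insert(id, tx);
    match rx.recv_timeout(timeout) {
        Ok(response) => Ok(response),
        Err(_) => {
            // Timed out (or the sender vanished): drop the stale entry.
            pending.lock().unwrap().remove(&id);
            Err(format!("timed out waiting for response after {}s", timeout.as_secs()))
        }
    }
}

/// Route a response to its waiting caller; returns false if nobody is waiting,
/// for example because the call already timed out.
fn deliver(pending: &Pending, id: u64, response: String) -> bool {
    match pending.lock().unwrap().remove(&id) {
        Some(tx) => tx.send(response).is_ok(),
        None => false,
    }
}
```

In this model a late response for a timed-out call is simply dropped, and the table is empty again so later work is not blocked behind the dead entry.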

## Protocol examples

Request:

```json
{
  "id": 42,
  "method": "environment/info",
  "params": {
    "environmentId": "remote-a"
  }
}
```

Successful response:

```json
{
  "id": 42,
  "result": {
    "shell": {
      "name": "zsh",
      "path": "/bin/zsh"
    },
    "cwd": "file:///workspace"
  }
}
```

If the exec-server initializes but does not answer the probe within 30
seconds:

```json
{
  "id": 42,
  "error": {
    "code": -32603,
    "message": "failed to get info for environment `remote-a`: exec-server protocol error: timed out waiting for exec-server `environment/info` response after 30s"
  }
}
```

## Testing

- App-server integration coverage for successful info (including omitted
`cwd`), unknown environments, and connection failures.
- Exec-server RPC coverage verifying a timed-out call is removed from
the pending-request table.

---------

Co-authored-by: Michael Bolin <mbolin@openai.com>

Max Johnson · 2026-06-27 19:34:10 +00:00

e2398d0b16

core: stabilize synthesized call output IDs (#30327 )

## Why

Response item IDs represent stable conversation identity.
`ContextManager::for_prompt` repairs an unmatched call by synthesizing
an `"aborted"` output in the disposable prompt projection, but that
output previously had no ID. Assigning a fresh ID on every prompt build
would make retries and resumes change otherwise identical model context
and reduce prompt-cache reuse.

The concrete bug is that these normalization-created outputs bypass the
regular item-ID allocation path. Even with item IDs enabled, a prompt
could therefore contain an identified call paired with a synthetic
output whose `id` was missing. This change closes that gap by deriving
the output ID from the source call's item ID. For legacy calls that have
no item ID, the output remains ID-less because there is no stable source
identity to derive from.

The originating call already has a stable item ID under the item-ID
model introduced in #28814. A prompt-only output can therefore derive
stable identity from that call without mutating canonical history or
persisted rollouts. This addresses the failure exposed by #30311 while
keeping normalization read-only outside its detached prompt snapshot.

UUIDv5 is intentional here because it is the standard namespaced,
deterministic UUID construction. Using the output kind and source call
ID as the name produces the same UUID on every projection while keeping
output kinds in separate name domains. UUIDv7 would introduce randomness
and time, so keeping it stable would require persisting the synthetic
repair. UUIDv5 uses SHA-1 internally, but this is only an identity
mapping—not an authenticity or security boundary.

## What changed

- Derive a deterministic UUIDv5 ID for each synthesized call output from
the source call item ID.
- Use the Responses API prefix appropriate for function, custom-tool,
tool-search, and local-shell outputs.
- Preserve the existing insertion position immediately after the
unmatched call.
- Keep synthesized outputs prompt-only; no rollout, task-lifecycle,
compaction, or raw-response behavior changes.

## Testing

- `just test -p codex-core
for_prompt_assigns_stable_id_to_synthetic_output_without_reordering_history`
- `just test -p codex-core
synthetic_call_output_id_is_stable_across_resumes`
- `just test -p codex-core normalize_adds_missing_output`
- `just test -p codex-core response_item_ids`

Michael Bolin · 2026-06-27 10:47:54 -07:00

d2885dc3cd

Preserve namespaces on custom tool calls (#30302 )

## Summary

- Preserve the optional namespace on custom tool calls during response
deserialization and app-server replay.
- Use the namespaced tool identifier for streaming argument handling and
tool dispatch.
- Regenerate app-server protocol schemas.
- Add regression tests covering namespace serialization and routing.

## Testing

- Ran affected protocol and app-server test suites.
- Ran the full core test suite; two load-sensitive timing tests passed
when rerun individually.
- Ran Clippy and formatting checks.
- Verified with a local end-to-end app-server replay that the namespace
is preserved through the complete request/response flow.

nhamidi-oai · 2026-06-27 09:54:56 -07:00

328e95110c

Update security check wording (#30317 )

Eric Traut · 2026-06-26 20:05:32 -07:00

c464468493

app-server: structure and test JSON shutdown logs (#30314 )

## Why

`LOG_FORMAT=json` and `RUST_LOG` are supported by app-server, but the
behavior was only covered indirectly. We should verify the actual JSONL
written by both user-facing entry points: `codex app-server` and the
standalone `codex-app-server` binary.

The existing processor shutdown message also always said the channel
closed, even though the processor can exit for several different
reasons. Structured fields make that event more accurate and useful to
log consumers.

## What changed

- Record the processor `exit_reason`, remaining connection count, and
forced-shutdown state as structured tracing fields.
- Add a shared process-test helper that enables JSON logging, validates
every stderr line as JSON, and verifies the top-level timestamp is RFC
3339.
- Cover both `codex app-server` and `codex-app-server`, asserting the
stable `level`, `fields`, and `target` payload.

## Test plan

- `just test -p codex-app-server
standalone_app_server_emits_json_info_events`
- `just test -p codex-cli app_server_emits_json_info_events`

Michael Bolin · 2026-06-26 18:19:56 -07:00

4f1b5a4b73

core: overlap diff root discovery with world state (#30286 )

## Why

Remote diff-root discovery is independent of world-state construction,
but it ran afterward and added filesystem metadata latency before the
first model request. Overlap the independent work so thread-cold turns
do not pay those waits serially.

## What

- Run `record_context_updates_and_set_reference_context_item` and
`turn_diff_display_roots` with `tokio::join!`.
- Reuse the same resolved display roots when constructing
`TurnDiffTracker`; no cache or behavior lifecycle changes are
introduced.

## Validation

In a synthetic executor-skill benchmark with artificial network delay,
thread-cold model-request p50 improved from about 1.79 s to about 1.58 s.

Adam Perry @ OpenAI · 2026-06-26 18:07:41 -07:00

3ae0543fdd

[codex] consume pushed exec-server process events (#30273 )

## Summary

- complete unified-exec processes from the ordered event stream instead
of issuing a final zero-wait `process/read`
- add optional executor sandbox-denial state to `process/exited`
- retain `process/read` as a retained-output and compatibility fallback
for receiver lag, sequence gaps, and legacy servers
- recover sandbox-denial state across transport reconnection
- cover the real `TestCodex` remote-exec path without adding a public
test-only event constructor

## Why

A successful one-shot tool call currently receives its output and
terminal notifications, then pays another wide-area `process/read` round
trip before returning. Staging traces showed that remote response wait
accounted for more than 99.8% of RPC time; local serialization,
queueing, and deserialization were below 0.6 ms.

## Measured impact

A direct staging A/B used the same build and route and changed only
completion mode. Each arm ran three times with 30 one-shot
`/usr/bin/true` calls per run. The table reports the median of the three
per-run percentiles.

| Metric | Final `process/read` | Pushed events | Change |
| --- | ---: | ---: | ---: |
| End-to-end completion p50 | 159.5 ms | 118.7 ms | -40.8 ms (-25.6%) |
| End-to-end completion p95 | 182.4 ms | 131.7 ms | -50.6 ms (-27.8%) |
| Completion-wait p50 | 80.1 ms | 41.5 ms | -38.5 ms (-48.1%) |
| Final `process/read` RPC p50 | 79.9 ms | eliminated | -79.9 ms |

TCP_NODELAY was enabled in both A/B arms, so its effect cancels out. The
successful, complete, in-order event path issued zero final
`process/read` calls.

## Compatibility and recovery

- new servers send `sandboxDenied` on `process/exited`
- legacy servers omit it, which triggers one compatibility
`process/read`
- broadcast lag or a sequence gap triggers a retained-output read
- recovery remains bounded by the server's existing 1 MiB
retained-output window
- complete, in-order event streams issue no completion read
- sandbox denial is attached to the exit event before consumers can
observe process completion
- server-first and client-first rollouts remain wire-compatible;
server-first realizes the latency win immediately
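
The compatibility rules above amount to a small decision table. The following sketch encodes it under stated assumptions: the type and function names (`ExitObservation`, `CompletionAction`, `completion_action`) are hypothetical, and the real client tracks more state than these two fields.

```rust
/// What the client observed by the time `process/exited` arrived (hypothetical shape).
struct ExitObservation {
    /// `Some(..)` when the server sent `sandboxDenied`; `None` for legacy servers.
    sandbox_denied: Option<bool>,
    /// True if the receiver lagged or a sequence gap was detected.
    lagged_or_gap: bool,
}

#[derive(Debug, PartialEq, Eq)]
enum CompletionAction {
    /// Complete, in-order stream: finish without any `process/read`.
    CompleteFromEvents { sandbox_denied: bool },
    /// Legacy server omitted the denial state: one compatibility read.
    CompatibilityRead,
    /// Events were missed: recover from the retained-output window.
    RetainedOutputRead,
}

fn completion_action(obs: &ExitObservation) -> CompletionAction {
    if obs.lagged_or_gap {
        return CompletionAction::RetainedOutputRead;
    }
    match obs.sandbox_denied {
        Some(denied) => CompletionAction::CompleteFromEvents { sandbox_denied: denied },
        None => CompletionAction::CompatibilityRead,
    }
}
```

Checking the gap first is a choice made for this sketch; the PR text does not say which condition takes precedence when both apply.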

## Integration coverage

The `TestCodex` suite exercises four distinct remote-exec contracts:

- complete pushed output/exit/close with zero reads
- direct pushed sandbox denial with zero reads
- legacy missing denial metadata with exactly one compatibility read
- count-bounded replay eviction recovered from retained output without
duplication

## Validation

- `just test -p codex-core
exec_command_consumes_pushed_remote_process_events`: 4 passed
- `just test -p codex-core unified_exec::process_tests::`: 4 passed
- `just test -p codex-exec-server`: 294 passed, 2 skipped
- `just test -p codex-exec-server-protocol`: 5 passed
- `just test -p codex-rmcp-client`: 89 passed, 2 skipped
- focused Bazel `//codex-rs/core:core-all-test`: passed across 16 shards
- scoped `just fix` passed for core and exec-server
- `just fmt` passed

The complete workspace suite was not rerun; focused Cargo and Bazel
coverage passed for the changed behavior.

richardopenai · 2026-06-26 18:05:52 -07:00

d4ec08b8f0

fix(remote-control): avoid server token refresh retry storms (#30201 )

## Why

Remote-control websocket reconnects and pairing requests proactively
refresh their server token. When `/server/refresh` returns a transient
error such as `502`, the still-valid token was discarded as a usable
connection path, causing reconnect failures and repeated refresh
attempts that could amplify an upstream incident.

## What Changed

- Start proactive refresh five minutes before token expiry and
distinguish it from a required refresh for missing or expired tokens.
- Continue websocket and pairing operations with the existing valid
token after `429`, `5xx`, or timeout failures.
- Share an in-memory `next_refresh_at` throttle across websocket and
pairing callers, honoring both `Retry-After` formats and otherwise using
a jittered 24–36 second delay.
- Keep required refreshes strict, preserve `404` enrollment replacement,
and clear token/throttle state for `401` and `403` auth recovery.
- Preserve refresh response metadata internally and add focused
wire-level and integration coverage.
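
A minimal sketch of the refresh policy, assuming hypothetical names (`RefreshKind`, `classify_refresh`, `is_transient`, `next_refresh_delay`) and with the jitter supplied by the caller so the example needs no random-number crate. Timeouts, `Retry-After` parsing, and the `401`/`403`/`404` paths are left out.

```rust
use std::time::Duration;

/// Proactive refresh starts this long before expiry (from the PR text).
const PROACTIVE_WINDOW: Duration = Duration::from_secs(5 * 60);

#[derive(Debug, PartialEq, Eq)]
enum RefreshKind {
    NotNeeded,
    /// Token is still valid; a failed refresh may fall back to it.
    Proactive,
    /// Token is missing or expired; a failed refresh is fatal.
    Required,
}

/// `time_to_expiry` is `None` when the token is missing or already expired.
fn classify_refresh(time_to_expiry: Option<Duration>) -> RefreshKind {
    match time_to_expiry {
        None => RefreshKind::Required,
        Some(left) if left <= PROACTIVE_WINDOW => RefreshKind::Proactive,
        Some(_) => RefreshKind::NotNeeded,
    }
}

/// HTTP statuses treated as transient refresh failures in this sketch.
fn is_transient(status: u16) -> bool {
    status == 429 || (500..=599).contains(&status)
}

/// Delay before the next refresh attempt: honor `Retry-After` when present,
/// otherwise pick a point in the 24 to 36 second window.
/// `jitter` is expected in [0, 1] and is clamped.
fn next_refresh_delay(retry_after: Option<Duration>, jitter: f64) -> Duration {
    match retry_after {
        Some(delay) => delay,
        None => Duration::from_secs_f64(24.0 + 12.0 * jitter.clamp(0.0, 1.0)),
    }
}
```

A caller would continue with the existing token when `classify_refresh` returns `Proactive` and the refresh fails with a transient status, and would record `now + next_refresh_delay(..)` in the shared throttle.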

## Verification

Added behavioral coverage proving that:

- a valid near-expiry token still completes websocket and pairing
requests after transient refresh failures;
- `Retry-After` suppresses a subsequent refresh across websocket and
pairing callers;
- request and response-body timeouts are classified as transient;
- an expired token, including one that expires during refresh, cannot
proceed to websocket connection;
- auth failures clear the attempted token without overwriting a
concurrently rotated token.

Anton Panasenko · 2026-06-26 17:34:52 -07:00

d047c33a1b

feat(protocol): define missing rollout turn items (#30282 )

## Description

This PR adds canonical core `TurnItem` shapes for command execution,
dynamic tool calls, collab agent tool calls, and sub-agent activity, to
be stored in the rollout file soon.

It also teaches app-server protocol / `ThreadHistoryBuilder` how to
render those items, and adds the small legacy fanout helpers needed for
existing event-based consumers. No core producer or rollout persistence
behavior changes here; that will be done in a follow-up.

## Making ThreadHistoryBuilder stateless

This is the first PR in a stack to make `ThreadHistoryBuilder` stateless
enough that we can materialize app-server `ThreadItem`s from only a
given slice of `RolloutItem` history, without ever needing to replay the
whole thread from the beginning.

The persisted legacy `RolloutItem::EventMsg` records are mostly shaped
like live UI events, not like materialized `ThreadItem`s. They work if
we replay the full rollout in order, but they often do not contain
enough stable identity or complete item state to project an arbitrary
suffix on its own.

A few examples:

- `UserMessageEvent` and `AgentMessageEvent` have content, but
historically do not carry the persisted app-server item ID that should
become the SQLite primary key.
- `AgentReasoningEvent` and `AgentReasoningRawContentEvent` are
fragments. `ThreadHistoryBuilder` currently merges them into the last
reasoning item, which means a slice starting in the middle of reasoning
cannot know whether to append to an earlier item or create a new one.
- `WebSearchEndEvent`, `McpToolCallEndEvent`, collab end events, and
similar legacy events can often render a final-looking item, but they
usually rely on prior replay state to know which turn owns the item.
- Begin/end legacy events are partial views of one logical item. The
builder correlates them by `call_id` and mutates prior state to
synthesize the final `ThreadItem`.

That is the problem this direction fixes. A persisted canonical
lifecycle record looks much closer to the read model we actually want
later:

```rust
ItemCompletedEvent {
    turn_id,
    item: TurnItem { id, ...full snapshot... },
    completed_at_ms,
}
```

Once rollout has explicit `turn_id`, stable `item.id`, and a canonical
completed item snapshot, the future SQLite projector can reduce only the
new rollout suffix and upsert the affected `thread_items` rows. It no
longer needs to synthesize `item-N`, infer item ownership from the
active turn, or replay earlier events just to reconstruct the current
item snapshot.

## What changed

- Added core `TurnItem` variants and item structs for command execution,
dynamic tool calls, collab agent tool calls, and sub-agent activity.
- Added conversions from those canonical items back into the legacy
event shapes where current consumers still need them.
- Added app-server v2 `ThreadItem` conversion for the new core item
variants.
- Taught `ThreadHistoryBuilder` and rollout persistence metrics to
recognize the new item variants.

## Follow-up

The next PR https://github.com/openai/codex/pull/30283 switches the live
core producers for these item families onto canonical `ItemStarted` /
`ItemCompleted` events.

Owen Lin · 2026-06-26 16:44:34 -07:00

a107b84967

[codex] group blocking and postmerge CI workflows (#30146 )

## Why

It's hard to change the set of required jobs when they're managed in the
GitHub UI, and when each workflow is responsible for choosing its own
scheduling, it's easy to end up with skew between what we enforce on PRs
vs. on main.

## What

- add a `blocking-ci` caller workflow, triggered by pull requests and
pushes to `main`, for Bazel, blob size, cargo-deny, Codespell,
`repo-checks`, rust CI, and SDK CI
- add an `always()` terminal job named `CI required` that fails unless
every called workflow succeeds
- add a `postmerge-ci` caller workflow for `rust-ci-full` and
`v8-canary`, with a terminal `Postmerge CI results` job
- centralize V8 relevance detection in `v8_canary_changes.py`; unrelated
PR and postmerge runs execute metadata only and skip the expensive build
matrices
- leave `v8-canary` outside the blocking gate and leave the external
`cla` check independent

## Rollout

A repository admin must replace the existing required GitHub Actions
contexts with `CI required` in the main-branch ruleset. Retain `cla` as
a separate required check. Until that change is coordinated, this PR
cannot satisfy the old standalone check names. In-flight PRs will need
to be rebased after this lands.

Adam Perry @ OpenAI · 2026-06-26 15:07:05 -07:00

1168254bd9

[codex] Support npm marketplace plugin sources (#29375 )

## Why

Marketplace source deserialization treated `{"source":"npm", ...}` as
unsupported. The loader logged and skipped the entry, so npm-backed
plugins never appeared in `plugin list --available` and `plugin add`
returned "plugin not found".

Codex plugins are installed from a plugin root, not from an npm
dependency tree. For npm-backed marketplace entries, Codex should fetch
the published package contents without running package scripts or
installing unrelated dependencies.

## What changed

- Add `npm` marketplace plugin sources with `package`, optional semver
`version` or version range, and optional HTTPS `registry`.
- Reject unsafe npm source fields before materialization, including
invalid package names, non-semver version selectors, plaintext or
credential-bearing registry URLs, and registry query/fragment data.
- Materialize npm plugins with `npm pack --ignore-scripts`, then unpack
the resulting tarball through the existing hardened plugin bundle
extractor.
- Enforce npm archive and extracted-size limits, require the standard
npm `package/` archive root, and verify the extracted `package.json`
name matches the requested package before installing.
- Keep plugin listings, install-source descriptions, CLI JSON/human
output, app-server v2 `PluginSource`, TUI source summaries, regenerated
schema fixtures, and app-server documentation in sync.
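
To make the registry checks concrete, here is a deliberately simplified sketch that works on the raw string. The function name `validate_registry` and its error messages are hypothetical, and a real implementation would more likely use a proper URL parser; this version is stricter than necessary in places (for example, it rejects an uppercase scheme).

```rust
/// Simplified, string-based check of an npm `registry` value (illustrative only).
fn validate_registry(url: &str) -> Result<(), &'static str> {
    // Plaintext registries are rejected: the scheme must be HTTPS.
    let rest = url.strip_prefix("https://").ok_or("registry must use https")?;
    // Query strings and fragments are not allowed anywhere in the URL.
    if url.contains('?') || url.contains('#') {
        return Err("registry must not carry query or fragment data");
    }
    // The authority is everything up to the first '/' after the scheme.
    let authority = rest.split('/').next().unwrap_or("");
    if authority.is_empty() {
        return Err("registry host is missing");
    }
    // `user:password@host` style credentials are rejected.
    if authority.contains('@') {
        return Err("registry must not embed credentials");
    }
    Ok(())
}
```

Running such checks before materialization means a rejected source never reaches the `npm pack` step.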

## Impact

Marketplaces can distribute Codex plugins from public or configured
private HTTPS npm registries using the same install flow as existing
materialized plugin sources. `npm` must be available on `PATH` when an
npm-backed plugin is installed.

Fixes #27831

## Validation

- `just write-app-server-schema`
- `just test -p codex-core-plugins -p codex-app-server-protocol -p
codex-app-server -p codex-cli`
  - npm/schema/core-plugin coverage passed in the run.
- The full focused command finished with `1739 passed`, `11 failed`, and
`6 timed out`; the failures were unrelated local app-server environment
failures from `sandbox-exec: sandbox_apply: Operation not permitted`
plus one missing `test_stdio_server` helper binary.
- Installed an npm-published Codex plugin package through a throwaway
local marketplace and throwaway `CODEX_HOME` to exercise the real npm
materialization path end to end.

charlesgong-openai · 2026-06-26 17:24:46 -04:00

6509f3148a

[codex] Classify nested MCP authentication startup errors (#30257 )

## Summary

- classify authentication-required RMCP startup failures, including
errors nested inside `ClientInitializeError::TransportError`
- let `codex-mcp` consume that classification so the existing
`reauthenticationRequired` startup failure reason is emitted
- add a regression test that performs real startup with an expired
persisted OAuth token and no refresh token

## Why

Follow-up to #29877.

RMCP stores streamable HTTP initialization failures inside a dynamic
transport error whose payload is not exposed through the standard Rust
error source chain. The original `anyhow::Error::chain()` check
therefore missed the nested `AuthError::AuthorizationRequired` seen
during real MCP startup and emitted `failureReason: null`.

The transport-specific inspection now lives in `codex-rmcp-client`,
while `codex-mcp` consumes only the domain-level authentication-required
result. This classifier does not distinguish first-time login from
reauthentication; the existing auth-state logic remains responsible for
that distinction.

## User impact

When stored MCP OAuth credentials are expired and cannot be refreshed,
app clients now receive `failureReason: "reauthenticationRequired"` on
the failed startup update and can show the reconnect action. First-time
login and unrelated startup failures remain unchanged.

## Validation

- `just test -p codex-rmcp-client --test streamable_http_oauth_startup
identifies_expired_unrefreshable_token_startup_error`
- `just test -p codex-mcp
startup_outcome_error_identifies_authentication_required`
- `just test -p codex-mcp
mcp_startup_failure_reason_requires_existing_oauth_and_auth_failure`
- `cargo build -p codex-cli --bin codex`
- local app-server probe emitted `failureReason:
"reauthenticationRequired"`
- manual end-to-end reconnect flow confirmed
- `just fmt`

felixxia-oai · 2026-06-26 14:11:13 -07:00

526f495f3a

Close thread persistence when submission channel closes (#30173 )

### Summary

Release live thread persistence when a session ends because its
submission channel closes. This prevents a later same-process resume
from failing with `thread ... already has a live local writer`.

### Details

The issue is in the `codex-core` session teardown path used by Codex
hosts, rather than in Managed Agents API or exec-server itself.

Explicit shutdown already closes the `LiveThread`, which releases the
process-scoped writer held by `LocalThreadStore`. The
submission-channel-close fallback ran runtime and extension teardown but
skipped that persistence shutdown, leaving the thread ID registered as
having a live writer.

This change:

- closes the `LiveThread` on the channel-close fallback path;
- preserves the existing teardown order used by explicit shutdowns;
- extends the lifecycle regression test to assert that the thread store
receives `shutdown_thread`.
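
The failure mode can be reproduced with a toy model of the writer registry. Only the names `LocalThreadStore` and `shutdown_thread` and the error wording come from the description above; the `live_writers` field, the `open_writer` method, and the overall shape are assumptions made for illustration.

```rust
use std::collections::HashSet;

/// Toy stand-in for the process-scoped writer registry.
#[derive(Default)]
struct LocalThreadStore {
    live_writers: HashSet<String>,
}

impl LocalThreadStore {
    /// Claim the writer for a thread; fails while a previous writer is still registered.
    fn open_writer(&mut self, thread_id: &str) -> Result<(), String> {
        if !self.live_writers.insert(thread_id.to_string()) {
            return Err(format!("thread {} already has a live local writer", thread_id));
        }
        Ok(())
    }

    /// Release the writer. Skipping this on any teardown path leaves the
    /// thread ID registered and blocks a later same-process resume.
    fn shutdown_thread(&mut self, thread_id: &str) {
        self.live_writers.remove(thread_id);
    }
}
```

In this model the channel-close fallback corresponds to a teardown path that never called `shutdown_thread`, so the second `open_writer` for the same thread fails until the release is added.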

Context: [original
report](https://openai.slack.com/archives/C0B4NBHQGTV/p1782136364948039),
[recent occurrence
1](https://openai.slack.com/archives/C0B4NBHQGTV/p1782434817895839?thread_ts=1782136364.948039&cid=C0B4NBHQGTV),
[recent occurrence
2](https://openai.slack.com/archives/C0B4NBHQGTV/p1782335107474429?thread_ts=1782136364.948039&cid=C0B4NBHQGTV)

### Testing

- `just test -p codex-core
submission_loop_channel_close_runs_full_thread_teardown`
- `just test -p codex-core --lib` (1,989 passed; 3 skipped)
- `just fix -p codex-core`
- `just fmt`
- Native code review: no findings

I also attempted `just test -p codex-core`. The new regression passed;
79 unrelated integration tests failed in the local harness, primarily
because helper binaries such as `test_stdio_server` were unavailable,
plus local proxy/shell timing failures.

Abdulrahman Alfozan · 2026-06-26 13:56:17 -07:00

c55ce3b51b

feat: add GPT-5.6 variants to Bedrock catalog (#30285 )

## Summary

- add Sol (`openai.gpt-5.6-sol`), Terra (`openai.gpt-5.6-terra`), and
Luna (`openai.gpt-5.6-luna`) to the Amazon Bedrock static model catalog
- derive all three entries from the bundled GPT-5.5 metadata and add the
Bedrock-only `max` reasoning effort
- keep the new entries below the current GPT-5.5 and GPT-5.4 models at
priorities 2, 3, and 4, preserving GPT-5.5 as the default
- add deep-equality coverage for inherited model configuration, catalog
ordering, context windows, and service-tier behavior

Celia Chen · 2026-06-26 20:32:49 +00:00

69596f0e42

Let Codex consult user-level code-review-* skills. (#30143 )

## Why

I use the `$code-review` skill a lot and it'd be nice to add my own
additional review criteria in `$CODEX_HOME/skills/code-review-*`.

## What

Removes the phrase "code-review-* skills in this repository", which in
practice seems to be enough to get Codex to consult my user-level code
review skills in addition to the repo-level ones.

Adam Perry @ OpenAI · 2026-06-26 12:36:40 -07:00

ac85409b7b

feat(app-server): add optional turn_id to thread/fork (#30277 )

## Description

This adds stable optional `turnId` support to `thread/fork`. When
supplied, the fork copies persisted history through that terminal turn,
inclusive, and drops later turns from the new thread.

Omitting or passing `null` preserves the existing full-history fork
behavior, including the interruption marker when the stored source
history ends mid-turn.
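
The truncation rule can be sketched as follows. The `Turn` type, the function name, and the error returned for an unknown ID are assumptions made for illustration; the interruption marker and the requirement that the turn be terminal are not modeled.

```rust
/// Hypothetical, simplified turn record.
#[derive(Clone, Debug, PartialEq)]
struct Turn {
    id: String,
}

/// Returns the turns a fork would copy from `source`.
fn fork_turns(source: &[Turn], turn_id: Option<&str>) -> Result<Vec<Turn>, String> {
    match turn_id {
        // Omitted or null `turnId`: keep the existing full-history behavior.
        None => Ok(source.to_vec()),
        Some(id) => {
            let index = source
                .iter()
                .position(|turn| turn.id == id)
                .ok_or_else(|| format!("unknown turn id: {id}"))?;
            // Copy through the requested turn, inclusive; later turns are dropped.
            Ok(source[..=index].to_vec())
        }
    }
}
```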

## Why

We're deprecating `thread/rollback`, and certain UX use cases that relied
on it can use `thread/fork` with `turnId` instead.

Owen Lin · 2026-06-26 19:35:54 +00:00

f72976a5f1

ensure thread.history_mode is immutable (#30261 )

## Description

This PR makes `thread.history_mode` immutable after the thread's
canonical first `SessionMeta` has been written. Later same-thread
`SessionMeta` lines are compatibility metadata writes, not a new thread
definition.

Without this, an older binary could append a `SessionMeta` that omits
`history_mode`; when a newer binary replays it, serde defaults that
missing field to `legacy` and SQLite could downgrade a paginated thread.
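
A minimal sketch of the intended replay rule, assuming a simplified `SessionMeta` in which `None` stands for a line written by an older binary that omitted the field. The type shapes and function name are hypothetical.

```rust
#[derive(Clone, Copy, Debug, Default, PartialEq)]
enum HistoryMode {
    #[default]
    Legacy,
    Paginated,
}

/// Hypothetical, simplified `SessionMeta`.
struct SessionMeta {
    history_mode: Option<HistoryMode>,
}

/// Only the first (canonical) `SessionMeta` defines the mode.
/// Later same-thread lines are ignored, so they cannot downgrade the thread.
fn replayed_history_mode(metas: &[SessionMeta]) -> Option<HistoryMode> {
    metas
        .first()
        .map(|meta| meta.history_mode.unwrap_or_default())
}
```

With this rule, a paginated thread stays paginated even if a later `SessionMeta` line omits `history_mode`.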

## Why

`history_mode` is the persisted thread storage contract.
Paginated-thread fail-closed behavior and SQLite memory filtering depend
on it staying aligned with canonical rollout metadata, especially when
multiple Codex binary versions can touch the same local rollout.

## What changed

- Stop generic rollout metadata replay from overwriting `history_mode`
from later `SessionMeta` items.
- Remove `history_mode` from `ThreadMetadataPatch`, so mutable metadata
sync and app-server metadata updates cannot rewrite it.
- When local metadata sync has to recreate a missing SQLite row, recover
`history_mode` from the rollout's canonical first `SessionMeta` instead
of from a mutable patch.
- Keep the in-memory thread store using the created thread's canonical
`history_mode` instead of metadata patches.
- Fill the one remaining core test `CreateThreadParams` initializer with
the new `history_mode` field; Bazel CI caught this after the parent
history-mode PR landed.

## Validation

- `just fmt`
- `just test -p codex-thread-store`
- `just test -p codex-state
session_meta_does_not_set_model_or_reasoning_effort`

Owen Lin · 2026-06-26 12:32:31 -07:00

812cd2bb57

[codex] Use managed defaults for TUI threads (#30147 )

## Why

#29683 exposes managed defaults for new-thread model settings through
`configRequirements/read` without applying them server-wide. The TUI is
an app-server client, so it should explicitly consume those defaults
when it creates a fresh thread.

This lets plain `codex` start on the managed model while preserving the
existing ability to change model settings within the thread.

## What changed

- Read `requirements.models.newThread` during TUI app-server bootstrap.
- Apply the managed model, reasoning effort, and service tier to the
initial fresh thread and subsequent `/new` or `/clear` threads.
- Keep explicit launch overrides above the managed defaults.
- Normalize the managed `fast` service tier to the `priority` request
value.
- Leave resumed and forked threads unchanged.

The application logic lives in a small TUI-only module; app-server
`thread/start` behavior remains unchanged for other clients.
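
The precedence described above can be sketched like this. Both function names and the idea of a built-in fallback parameter are assumptions; only the ordering (launch override above managed default) and the `fast` to `priority` mapping come from the description.

```rust
/// Picks the model for a fresh thread: an explicit launch override wins,
/// then the managed default, then a built-in fallback.
fn resolve_new_thread_model<'a>(
    launch_override: Option<&'a str>,
    managed_default: Option<&'a str>,
    builtin_default: &'a str,
) -> &'a str {
    launch_override.or(managed_default).unwrap_or(builtin_default)
}

/// Maps the managed `fast` tier to the `priority` request value;
/// any other tier is passed through unchanged.
fn normalize_service_tier(tier: &str) -> &str {
    if tier == "fast" {
        "priority"
    } else {
        tier
    }
}
```

The same resolution would run again for `/new` and `/clear`, while resumed and forked threads would skip it.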

## User experience

- Plain `codex` starts with the managed new-thread settings.
- A user can still change settings with `/model` or the existing
service-tier controls.
- Starting another fresh thread reapplies the managed defaults.
- Explicit launch choices such as `codex -m <model>` continue to win.

## Validation

- `just test -p codex-tui managed_new_thread_defaults`
- `just fix -p codex-tui`

Depends on #29683.

hefuc-oai · 2026-06-26 19:27:31 +00:00

cf36c688b3

[codex] allow AGENTS.md and skills to authorize delegation (#30274 )

Update the MAv2 prompt to name AGENTS.md and skills more explicitly.

This should mirror https://github.com/openai/codex/pull/27919

Charles Du · 2026-06-26 12:17:26 -07:00

79a8ffdbf7

Overlap executor skill reads with namespace discovery (#30225 )

## Why

Environment skill discovery needs two independent pieces of information:

- plugin namespaces from `plugin.json` files; and
- skill metadata from each `SKILL.md` file.

Today these happen in sequence. Codex waits for every plugin namespace
lookup to finish before it starts reading any skill files. On a remote
executor, that creates an avoidable network-latency barrier.

```text
before: walk -> namespace lookups -> skill reads -> build catalog
after:  walk -> namespace lookups ─┐
             -> skill reads ───────┴-> build catalog
```

## What changes

- Read and parse skill files without waiting for plugin namespace
discovery.
- Resolve root and nested plugin namespaces concurrently.
- Join both results only when constructing the final qualified skill
names.
- Keep the existing 64-skill concurrency bound, output ordering,
warnings, metadata behavior, and namespace rules.
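
The overlap can be sketched with scoped threads from the standard library. This is an illustration only: the two lookup functions are hypothetical stand-ins that derive values from the path instead of reading files, the `namespace:skill` name format is an assumption, and the 64-skill concurrency bound is not modeled.

```rust
use std::thread;

/// Hypothetical stand-in for reading a plugin namespace from `plugin.json`.
fn lookup_namespace(skill_path: &str) -> Option<String> {
    let mut parts = skill_path.split('/');
    match (parts.next(), parts.next()) {
        (Some("plugins"), Some(name)) => Some(name.to_string()),
        _ => None,
    }
}

/// Hypothetical stand-in for reading and parsing a `SKILL.md` file.
fn read_skill_name(skill_path: &str) -> String {
    skill_path.rsplit('/').nth(1).unwrap_or("unknown").to_string()
}

fn build_catalog(skill_paths: &[&str]) -> Vec<String> {
    // Both stages start right after the walk; neither waits for the other.
    let (namespaces, names) = thread::scope(|scope| {
        let ns = scope.spawn(|| {
            skill_paths.iter().map(|p| lookup_namespace(p)).collect::<Vec<_>>()
        });
        let sk = scope.spawn(|| {
            skill_paths.iter().map(|p| read_skill_name(p)).collect::<Vec<_>>()
        });
        (
            ns.join().expect("namespace lookup panicked"),
            sk.join().expect("skill read panicked"),
        )
    });
    // Join the results only when building qualified names, in input order.
    namespaces
        .into_iter()
        .zip(names)
        .map(|(namespace, name)| match namespace {
            Some(namespace) => format!("{namespace}:{name}"),
            None => name,
        })
        .collect()
}
```

Because each stage produces a vector in input order, zipping them preserves the original output ordering.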

## Testing

The regression test makes plugin manifest lookup wait until a `SKILL.md`
read has started. The old serialized pipeline would time out; the new
pipeline completes and still returns the correctly namespaced skill.

`just test -p codex-core-skills` passes all 111 tests.

## Out of scope

This does not add an exec-server endpoint, batch filesystem calls, or
reduce the number of files transferred. A frontmatter-only read or
server-side skill catalog can remain a separate follow-up if benchmarks
show that transferred bytes are the next bottleneck.

jif · 2026-06-26 18:37:59 +00:00

a938d5f607

[codex] Add managed new-thread model settings (#29683 )

## Why

Admins need persistent defaults for the model, reasoning effort, and
service tier shown when the Desktop App creates a new thread. These are
initialization defaults rather than runtime constraints: the App should
use them to initialize its draft while still allowing a user to make an
explicit selection.

The app-server therefore needs to expose the managed values before
thread creation without changing `thread/start` behavior for other
clients.

## What changed

- Parse `model`, `model_reasoning_effort`, and `service_tier` from
`[models.new_thread]` in `requirements.toml`.
- Compose the `models` requirements through the existing
requirements-layer precedence rules.
- Expose the resolved values through `configRequirements/read` as
`requirements.models.newThread`.
- Add the corresponding app-server protocol types and regenerate the
JSON and TypeScript schema fixtures.
- Document the new `configRequirements/read` fields in the app-server
README.

## Scope

This PR is data plumbing only. It does not apply these values during
`thread/start` and does not change thread creation for existing
app-server clients, resumed or forked sessions, internal or subagent
sessions, `codex exec`, or the TUI. A companion Desktop App change owns
draft initialization, sends the effective settings for ordinary and
prewarmed starts, and preserves explicit user changes.

## Validation

- Requirements deserialization coverage for `[models.new_thread]`
- Requirements-layer precedence coverage
- App-server API mapping coverage
- `configRequirements/read` integration coverage
- Regenerated app-server JSON and TypeScript schema fixtures

hefuc-oai · 2026-06-26 18:37:40 +00:00

d9cf931d0e

fix main (#30276 )

Fixes a break on `main` introduced by a merge race around `thread.history_mode`.

Owen Lin · 2026-06-26 18:05:00 +00:00

f91334380e

feat(app-server): add history_mode to thread (#29927 )

## Description

This PR adds a new `historyMode = "legacy" | "paginated"` field to
`Thread`. It is stored in `SessionMeta` in the JSONL rollout file and as
a new column in the existing SQLite `threads` table, and it is exposed on
`thread/start` and on the `Thread` object in app-server.

## What changed

- Added canonical `ThreadHistoryMode` with `legacy` and `paginated`,
defaulting old and new SessionMeta to `legacy`.
- Carried `history_mode` through core session config, ThreadStore stored
metadata, local/in-memory stores, rollout metadata extraction, and the
existing SQLite `threads` table.
- Added experimental `historyMode` to app-server v2 `Thread` and
`thread/start`.
- Made paginated stored threads metadata-discoverable but unsupported
for legacy full-history reads, `load_history`, live resume, and create
paths.
- Regenerated app-server schema fixtures and added
protocol/state/thread-store/app-server coverage for persistence and
fail-closed behavior.

## Compatibility floor
Because users may be running various versions of Codex binaries on the
same machine (TUI, Codex App, etc.), we will need to establish a
compatibility floor for upcoming paginated threads, which will change
how thread storage reads and writes work.

The overall plan here:
```
Release N:
- Add historyMode to SessionMeta / Thread / SQLite metadata.
- Teach binaries to understand paginated threads.
- If a binary sees `historyMode="paginated"` but does not support the paginated contract, it refuses to resume/mutate the thread.
- Default remains `"legacy"`.

Release N+1:
- First-party clients start opting into paginated threads where appropriate.
- Internal dogfood / staged rollout.
- Measure old-client usage and paginated-thread unsupported errors.

Release N+2:
- Only after Release N+ is overwhelmingly deployed, make paginated the default.
- Accept that a small tail of N-1-or-older binaries may not understand paginated threads.
```

The important behavior change is fail-closed handling for a binary that
encounters a persisted `paginated` thread before it knows how to fully
support paginated history. In app-server, if a thread is `paginated`, we
will:

- allow metadata-only discovery paths like `thread/list` and
`thread/read(includeTurns=false)`, so clients can still see the thread
and inspect its `historyMode`
- reject legacy full-history/live-thread paths like
`thread/read(includeTurns=true)` and `thread/resume` with an unsupported
JSON-RPC error
- avoid silently treating an unknown or future `historyMode` as `legacy`

Under the hood, the ThreadStore layer also rejects legacy operations
that would need to load or replay the full thread history for a
paginated thread. That gives us the behavior we want for Release N:
future paginated threads are visible, but this binary fails closed
instead of trying to operate on them as if they were legacy threads.
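
The fail-closed rule can be sketched as a small gate. The enum and function names are hypothetical, and treating unknown modes the same as `paginated` for full-history paths is an assumption consistent with "avoid silently treating an unknown or future `historyMode` as `legacy`".

```rust
#[derive(Debug, PartialEq)]
enum HistoryMode {
    Legacy,
    Paginated,
    /// Any value this binary does not recognize.
    Unknown(String),
}

fn parse_history_mode(raw: &str) -> HistoryMode {
    match raw {
        "legacy" => HistoryMode::Legacy,
        "paginated" => HistoryMode::Paginated,
        // Never fall back to `Legacy` for a value we do not understand.
        other => HistoryMode::Unknown(other.to_string()),
    }
}

#[derive(Clone, Copy)]
enum ThreadOp {
    List,
    ReadMetadataOnly,
    ReadWithTurns,
    Resume,
}

fn check_supported(mode: &HistoryMode, op: ThreadOp) -> Result<(), String> {
    match (mode, op) {
        // Metadata-only discovery stays available for every mode.
        (_, ThreadOp::List | ThreadOp::ReadMetadataOnly) => Ok(()),
        // Full-history and live-thread paths are valid only for legacy threads.
        (HistoryMode::Legacy, _) => Ok(()),
        (other, _) => Err(format!("unsupported history mode: {other:?}")),
    }
}
```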

Owen Lin · 2026-06-26 09:12:42 -07:00

5267e805fb

Relax hooks.json top-level metadata validation (#30229 )

## Summary
- Allow a top-level `description` string in `hooks.json`.
- Continue rejecting unknown top-level keys and root-level hook events;
events must remain under `hooks`.

## Testing
- `just test -p codex-config`

charlesgong-openai · 2026-06-26 11:24:12 -04:00

2c5bc5e284

[codex] narrow unused skills intro export (#29991 )

## Summary

- stop publicly re-exporting the internally used
`SKILLS_INTRO_WITH_ALIASES` constant
- keep the constant and all skills rendering behavior unchanged
- preserve every integration helper, API, fixture, assertion, and module
used by tests

## Scope guardrails

This revision keeps all remote/network-facing functionality and every
line introduced by `jif <jif@openai.com>`.

Following the test-preservation audit, it also restores the in-process
RMCP test transport, the original `codex-mcp` fixture,
`PluginLoadOutcome::effective_skill_roots` and its assertions, the
`EffectiveSkillRoots` API family, the test-only apps renderer, and the
TUI dead-code annotation. Those files now match the PR base exactly.

No test imports or directly references the remaining public skills
export being narrowed.

## Validation

- repository-wide test-reference audit: no test-used code remains
deleted or narrowed
- deleted-line `git blame` audit: zero Jif-authored deletions
- `cargo test -p codex-core-plugins -p codex-mcp -p codex-rmcp-client
--lib`: 467 passed
- `cargo test -p codex-core --lib apps::render`: 2 passed
- `cargo test -p codex-core-skills --lib render::tests`: 19 passed
- `cargo check -p codex-core-skills --all-targets`: passed
- `just fix -p codex-core-skills`: passed
- `just fmt`: passed
- `git diff --check`: passed

The full local `codex-core-skills` suite passed 106/108 tests; two
loader tests detected an ambient repository skills root outside the
package and failed their isolation assertions. The scoped renderer suite
and all-target compile pass, and CI runs in an isolated environment.

Final code delta: 1 insertion, 2 deletions across 2 files.

Ahmed Ibrahim · 2026-06-26 05:52:04 -07:00

914c8eeb4e

Test selected capabilities across unavailable resume (#30215 )

## Why

The selected-capability integration test already covers initial
attachment and cold resume, but it resumes while the selected executor
is still reachable.

That leaves an important World State transition untested: a thread
remembers its selected capability root, resumes while that environment
is unavailable, and later sees the same stable environment return.

## What this tests

This extends the existing end-to-end scenario:

```text
selected executor available
        ↓
app-server stops and the executor goes away
        ↓
thread resumes with the executor unavailable
        ↓
skills, selected MCP tools, and connector attribution are absent
        ↓
the same environment ID is attached again
        ↓
skills, MCP tools, and connector attribution return
```

The test also checks that the unavailable snapshot explicitly tells the
model that no selected-environment skills are currently available. After
reattachment, it invokes the selected skill again and verifies that a
new executor-owned MCP process starts.

## Scope

This is test-only. It keeps the existing assumption that an environment
ID refers to stable capability contents. It does not add package-file
invalidation or live transport reconnect behavior.

jif · 2026-06-26 11:02:27 +01:00

3c03bb4f18

Reuse MCP runtimes when selected availability changes nothing (#30148 )

## Why

MCP runtime reuse was keyed by every ready selected-capability
environment, even when an environment contributed no MCP servers or
connectors.

For example:

1. a global stdio MCP is running;
2. a selected remote environment contains only a skill;
3. that environment becomes ready;
4. the MCP and connector projection stays exactly the same;
5. Codex nevertheless rebuilds the MCP manager and restarts the global
stdio process.

That restart can interrupt active calls and discard process-local state
even though nothing about MCP changed.

## What changes

When selected-environment availability changes, Codex now resolves the
candidate MCP and connector projection before deciding whether to
replace the runtime:

- if the winning MCP servers or their ownership change, rebuild as
before;
- if the selected connector snapshot changes, rebuild as before;
- if an enabled MCP is explicitly bound to an environment whose
availability changed, rebuild as before;
- otherwise, keep the exact live manager and processes, and update only
the availability input remembered by the snapshot.

```text
ready selected environments:  [] -> [skills-env]
resolved MCP servers:          {global_probe} -> {global_probe}
resolved connectors:           {} -> {}
result:                         reuse manager; keep the same process
```

The comparison uses the resolved winning servers and their sources, so
plugin/config ownership remains part of the runtime identity.
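
The decision can be sketched as a comparison of two resolved projections. All type, field, and function names here are hypothetical; the three rebuild conditions follow the list above.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical resolved view of what the MCP runtime would serve.
#[derive(Clone, Debug, PartialEq)]
struct Projection {
    /// Winning MCP server name -> owning source (plugin or config).
    servers: BTreeMap<String, String>,
    /// Selected connector identifiers.
    connectors: BTreeSet<String>,
    /// Environments that enabled MCP servers are explicitly bound to.
    bound_environments: BTreeSet<String>,
}

#[derive(Debug, PartialEq)]
enum Decision {
    Reuse,
    Rebuild,
}

fn decide(
    current: &Projection,
    candidate: &Projection,
    changed_environments: &BTreeSet<String>,
) -> Decision {
    let servers_changed = current.servers != candidate.servers;
    let connectors_changed = current.connectors != candidate.connectors;
    let bound_environment_changed = !candidate
        .bound_environments
        .is_disjoint(changed_environments);
    if servers_changed || connectors_changed || bound_environment_changed {
        Decision::Rebuild
    } else {
        // Nothing MCP-relevant changed: keep the live manager and processes.
        Decision::Reuse
    }
}
```

Keeping the owning source in the `servers` map means a change of plugin or config ownership alone is enough to trigger a rebuild.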

## Existing stack coverage

The integration PR directly below this one already covers both rebuild
boundaries: a selected MCP becomes callable and a selected connector
tool becomes model-visible when their environment becomes available. It
also verifies that an unchanged selected MCP runtime keeps its process.

This PR does not add another remote-attachment integration scenario for
the no-change optimization. `environment/add` returns before readiness,
and app-server does not currently expose a deterministic readiness
signal for an environment that contributes only skills. Keeping a
fixed-delay test would add flake risk; adding a new readiness API would
be outside this fix.

## Scope and assumptions

- This does not change skill discovery, World State rendering, or plugin
metadata caching.
- This does not add file watching or hot reload behavior.
- This does not change disconnect/reconnect handling.
- Selected environment IDs and their capability contents retain the
stack's existing stability assumption.
- Delayed `required = true` executor MCP behavior remains out of scope.

jif · 2026-06-26 09:27:41 +01:00

6d2168f06a

[codex] fix CreateThreadParams test initializer (#30198 )

## Summary

- initialize `selected_capability_roots` in the new
`attach_in_memory_thread_store` test helper
- restore `codex-core` test compilation on `main`

## Root cause

[#30144](https://github.com/openai/codex/pull/30144) added the helper
from commit `0c3d0742`, whose parent was `c38b2e9b`. That branch was
based before [#29856](https://github.com/openai/codex/pull/29856) added
`selected_capability_roots` as a required field on `CreateThreadParams`.

The PR's Rust and Bazel workflows both passed against the stale branch
head `0c3d0742`. When #30144 was squashed onto newer `main`, its
initializer was integrated alongside the required field from #29856,
producing `E0063` in `core/src/session/tests.rs`. Because those
workflows tested the branch head rather than the integrated merge
result, they did not see the version-skew failure before merge.

## Impact

Any job that compiles the `codex-core` library tests fails, which turned
the main-branch `rust-ci-full` and `Bazel` workflows red across
platforms and blocks unrelated focused core tests. This change only
completes the test initializer; it does not alter production behavior or
workflow configuration.

## Validation

- `just fmt`
- `just test -p codex-core
turn_complete_flushes_terminal_event_after_delivery` (1 passed, 2909
skipped)
- `git diff --check`

Adam Perry @ OpenAI · 2026-06-26 08:47:27 +01:00

451c0a437f

[codex] wire process-owned code mode host into core (#30142 )

## Summary

- add the `code_mode_host` feature flag and select
`ProcessOwnedCodeModeSessionProvider` in `CodeModeService` when enabled
- initialize code-mode sessions lazily so a missing host reports a tool
error without failing thread startup
- resolve `codex-code-mode-host` beside the running Codex binary by
default while preserving `CODEX_CODE_MODE_HOST_PATH` as an override
- add unit and end-to-end coverage for host resolution and graceful
missing-host behavior

## Why

This wires the process-owned session client from #30112 into the core
service behind an opt-in rollout gate. Packaged Codex installations can
place the helper in the same `bin` directory as the main executable
without relying on `PATH`, while development and custom installations
can continue to override the helper path.
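
The lookup order can be sketched with the standard library's path helpers. The function name and signature are hypothetical, and platform details such as an `.exe` suffix are not handled here.

```rust
use std::path::{Path, PathBuf};

const HOST_BINARY_NAME: &str = "codex-code-mode-host";

/// Resolves the helper path: an explicit override wins; otherwise look for
/// the helper next to the running Codex binary.
fn resolve_host_path(override_path: Option<&str>, current_exe: &Path) -> PathBuf {
    match override_path {
        Some(path) => PathBuf::from(path),
        // Swap the final component so the helper sits beside the executable.
        None => current_exe.with_file_name(HOST_BINARY_NAME),
    }
}
```

A caller could pass `std::env::var("CODEX_CODE_MODE_HOST_PATH").ok().as_deref()` as the override and the result of `std::env::current_exe()` as the executable path.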

## Stack

- Depends on #30112
- Base branch: `cconger/process-owned-session-runtime-4-client`

## Validation

- Build `codex` and `codex-code-mode-host`.
- Run `CODEX_CODE_MODE_HOST_PATH="$PWD/target/debug/codex-code-mode-host"
./target/debug/codex --enable code_mode_host`.

Channing Conger · 2026-06-26 00:23:33 -07:00

7d8906b478

[codex] add process-owned code-mode session client (#30112 )

## Summary

- add `ProcessOwnedCodeModeSessionProvider` and logical session
generation/rebinding state
- add the supervised child-process connection, reader/writer tasks, and
driver state machine
- make dropped execute/wait/open callers cancellation-safe with explicit
ownership handoff and durable cleanup
- validate cell/delegate lifecycle state and reject invalid protocol
transitions
- add end-to-end stdio coverage for delegates, cancellation, frame
limits, child loss, stale generations, replacement, and long-lived
sessions

## Why

This final stage exposes the process-owned client only after the wire
protocol, host-safe runtime, and standalone host are independently in
place. Transport failure is fail-stop: the client closes local state,
cancels callbacks, reaps the child, and lazily rebuilds a fresh host
generation rather than transactionally recovering the old connection.

## Stack

This is **4 of 4** in the process-owned code-mode session stack.

- Depends on #30111
- Full stack: #30108 → #30110 → #30111 → this PR

## Validation

- `just test -p codex-code-mode -p codex-code-mode-host` — 86 passed
- `just fix -p codex-code-mode`
- `just fix -p codex-code-mode-host`
- `just bazel-lock-update`
- `just bazel-lock-check`
- `bazel test //codex-rs/code-mode:code-mode-unit-tests
//codex-rs/code-mode-host:code-mode-host-unit-tests
//codex-rs/code-mode-host:code-mode-host-stdio-test
//codex-rs/code-mode-protocol:code-mode-protocol-unit-tests` — 4/4
passed
- `just fmt`

Channing Conger · 2026-06-25 23:46:17 -07:00

ab16046c88

Persist Cloudflare affinity cookies for MCP HTTP (#29516 )

[Codex Thread
019ef1f9-36e2-7e91-9337-504f097b9dc1](https://codex-thread-link.openai.chatgpt-team.site/thread/019ef1f9-36e2-7e91-9337-504f097b9dc1)

## Why

Hosted plugin-service Streamable HTTP MCP traffic uses
`https://chatgpt.com/backend-api/ps/mcp` and depends on Cloudflare's
`__cflb` cookie for load-balancer affinity. The local and exec-server
`http/request` path built a fresh reqwest client for each request
without installing Codex's existing shared ChatGPT Cloudflare cookie
store, so affinity could be lost between calls.

This is an affinity-hardening change motivated by an incident
investigation. It does not establish the broader connector-cache
incident RCA or claim to fix that incident in full.

## What changed

- Install the existing process-local, strictly allowlisted ChatGPT
Cloudflare cookie store on the reqwest client used by
`ReqwestHttpClient`.
- Fresh clients now share allowed Cloudflare infrastructure cookies
within the process that originates the local or exec-server network
request.
- Keep the existing HTTPS ChatGPT-host and Cloudflare-cookie-name
restrictions. This does not introduce a general cookie jar or send
ChatGPT Cloudflare cookies to unrelated hosts.
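
The restrictions can be sketched as a single predicate. This is not the real cookie store: the function names are hypothetical, and the exact host and cookie-name allowlists are assumptions (only `chatgpt.com` and `__cflb` appear in the description, and the real lists may be broader).

```rust
/// Assumed allowlist; only `__cflb` is named in the description.
const ALLOWED_COOKIE_NAMES: &[&str] = &["__cflb"];

/// Assumed host rule: exact match on the ChatGPT host.
fn is_chatgpt_host(host: &str) -> bool {
    host.eq_ignore_ascii_case("chatgpt.com")
}

/// Returns true when a cookie may be stored from, or sent to, this URL.
fn cookie_allowed(scheme: &str, host: &str, cookie_name: &str) -> bool {
    scheme == "https"
        && is_chatgpt_host(host)
        && ALLOWED_COOKIE_NAMES.iter().any(|name| *name == cookie_name)
}
```

Applying the same predicate on both the store and send paths is what keeps session cookies out of the jar and keeps `__cflb` away from unrelated hosts.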

## Test coverage

- `codex-client` unit coverage verifies that the existing strict store
accepts and returns `__cflb` for HTTPS ChatGPT URLs.
- The exec-server HTTPS integration test sends four independent
`http/request` calls through a local TLS-intercepting proxy and verifies
that:
  - a `__cflb=west` value received via `Set-Cookie` is sent on the next
  plugin-service request;
  - a later `Set-Cookie: __cflb=central` replaces the stored value;
  - non-Cloudflare session cookies are discarded;
  - no stored ChatGPT Cloudflare cookie is sent to a non-ChatGPT host.
- `just test -p codex-client` — 38 passed.
- `just test -p codex-exec-server --test chatgpt_cloudflare_affinity` —
1 passed.
- `just bazel-lock-check` — passed.

## Non-goals

- No persistence of ChatGPT auth, account, session, residency, or
arbitrary cookies.
- No cookie persistence for third-party MCP servers.
- No special composition of caller-provided `Cookie` headers.
- No plugin-service, connector-cache, Habitat/habicache, routing,
redirect, or API-contract changes.
- No broader incident RCA conclusions.

stevenlee-oai · 2026-06-26 02:23:24 -04:00

b5866eebd6

Retry failed Codex Apps MCP startup (#29920 )

## Problem

The built-in Codex Apps MCP client shares a future for the full startup
operation: connect, complete `initialize`, fetch the initial tools, and
return a usable client. Sharing deduplicates startup work, but it also
memoizes terminal errors.

After a transient connection, handshake, or initial `tools/list`
failure, later tool builds observe the same failed future. The thread
cannot reconnect after the backend recovers and continues serving its
startup-time cached tool snapshot, which may be empty or stale.

## Fix

When Apps MCP startup ends in an error, Codex starts bounded recovery
without putting startup latency on tool-router construction:

1. The current tool build immediately continues with the cached startup
snapshot.
2. After the initial failure is reported, Codex starts one fresh full
startup attempt in the background.
3. Concurrent tool builds share that in-flight attempt and also continue
with cached tools.
4. On success, the recovered client becomes active, refreshes the Apps
tools cache, emits a `Ready` startup status, and is reused by later
operations.
5. On failure, the cache remains unchanged and later tool builds may
start another background attempt after exponential cooldown: 1s, 2s, 4s,
8s, 16s, then 30s maximum.

Each recreated startup performs a fresh MCP `initialize` and uncached
`tools/list`. The MCP client retains its existing bounded retries for
retryable `initialize` and `tools/list` failures.

This avoids adding the Apps startup timeout to every request during a
sustained outage.
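
The cooldown schedule in step 5 can be sketched as a capped doubling. The function name and the use of a consecutive-failure counter are assumptions; the values 1s, 2s, 4s, 8s, 16s, and the 30s cap come from the description.

```rust
use std::time::Duration;

const MAX_COOLDOWN: Duration = Duration::from_secs(30);

/// Cooldown to wait after the given number of consecutive failed attempts.
fn cooldown_after(consecutive_failures: u32) -> Duration {
    // Doubling starts at 1s for the first failure; the exponent is clamped so
    // the shift cannot overflow before the 30s cap is applied.
    let exponent = consecutive_failures.saturating_sub(1).min(16);
    Duration::from_secs(1u64 << exponent).min(MAX_COOLDOWN)
}
```

A later tool build would start another background attempt only once this cooldown has elapsed since the last failure.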

## Scope

This is limited to the built-in Codex Apps MCP client:

- no reconnects for user-configured MCP servers;
- no cache deletion; and
- no proactive refresh for a healthy client with stale tools.

## Tests

Coverage verifies:

- tool builds return cached tools without waiting for a blocked
reconnect;
- concurrent tool builds start only one background reconnect;
- failed reconnects preserve cached tools and respect exponential
cooldown;
- a recovered client is retained and reused; and
- a long-lived thread exposes recovered app tools on a later follow-up.

Validation:

- `just test -p codex-mcp` — 95 passed
- `just test -p codex-core
later_follow_up_uses_background_recovered_apps_after_mid_thread_startup_failures
--no-capture` — passed
- `just fix -p codex-mcp`
- `just fmt`

kbazzi · 2026-06-25 21:31:12 -07:00

92d2e1df70

[codex] fix terminal rollout event durability (#30144 )

Currently, session code does not flush the thread store after appending
the `TurnComplete` / `TurnAborted` events.

This isn't a problem in practice for local storage, because
`append_items` itself effectively blocks. However, any thread store that
buffers in `append_items` and only commits on flush never gets these
events persisted.

The fix adds explicit rollout flushes at the terminal emitters after
normal completion and interruption.
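
A toy buffering store shows why the explicit flush matters. The store type and `finish_turn` are hypothetical; only `append_items`, the flush step, and the `TurnComplete` event name follow the description.

```rust
/// Hypothetical thread store that buffers appends and commits only on flush.
#[derive(Default)]
struct BufferingThreadStore {
    pending: Vec<String>,
    committed: Vec<String>,
}

impl BufferingThreadStore {
    fn append_items(&mut self, items: &[&str]) {
        self.pending.extend(items.iter().map(|item| item.to_string()));
    }

    fn flush(&mut self) {
        // Move everything buffered so far into durable storage.
        self.committed.append(&mut self.pending);
    }
}

/// Terminal emitter: append the terminal event, then flush explicitly.
fn finish_turn(store: &mut BufferingThreadStore) {
    store.append_items(&["TurnComplete"]);
    store.flush();
}
```

Without the `flush` call in `finish_turn`, the terminal event would stay in `pending` and never reach `committed`.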

Added test cases that assert the number of flushes when completing or
aborting turns. These are admittedly a little brittle and I'm open to
better ideas on how to add automated testing.

Tom · 2026-06-25 21:01:11 -07:00

f5f812389e

Test selected capabilities across availability and resume (#30157 )

## Why

This stack crosses World State, executor skills, selected plugin
metadata, MCP processes, connectors, dynamic environments, and resume.
This PR adds two end-to-end scenarios that validate those pieces
together.

Both tests enable `deferred_executor`, so they exercise the real
delayed-environment path.

## Scenario 1: availability across turns and resume

```text
1. Start a thread with one selected plugin root bound to E1.
2. E1 is unavailable.
   - executor skill is absent
   - selected MCP is absent
   - connector has no selected-plugin attribution
3. Start E1 and register the same stable environment ID.
4. Start a new turn.
   - the executor skill appears through World State
   - its body beats a colliding host skill
   - the selected MCP tool is advertised and executes inside E1
   - the connector is attributed to the selected plugin
5. Start another turn without changing E1.
   - the MCP PID stays the same, proving runtime reuse
6. Restart app-server and resume the thread.
   - durable selected-root intent is restored
   - skills, MCP, and connector attribution are restored
   - a new MCP PID proves ephemeral process state was rebuilt
```

## Scenario 2: availability changes inside one turn

```text
1. Start a turn while E1 is unavailable.
2. The first model sample sees no executor skill, MCP, or selected connector.
3. The turn pauses on request_user_input.
4. Start E1 and register it while that same turn is still active.
5. Continue the turn.
6. The very next model sample sees:
   - the executor skill catalog
   - the selected MCP tool
   - selected-plugin connector attribution
7. The model calls the MCP, and its output proves execution happened inside E1.
```

This second scenario specifically protects the aeon-style behavior:
capability state is captured again for every sampling step, not only at
the next user turn.

## Scope

These are integration tests only. They do not add a combinatorial matrix
for unsupported plugin-file mutation, environment generations, transport
disconnects, or delayed `required = true` executor MCPs.

jif · 2026-06-26 03:11:55 +01:00

25f50de6ed

[codex] allow CCA image generation and web search extensions (#29909 )

## Summary

- allow the standalone image-generation and web-search extensions for
the actor-authorized provider shape used by CCA
- preserve builtin `image_generation` and `web_search` for older models
and existing flows
- keep ordinary non-OpenAI providers excluded from both extensions
- remove only the image extension's local managed-AuthManager
requirement, which CCA cannot satisfy
- share actor-authorization detection through `ModelProviderInfo`
- keep Core tests focused on routing behavior and cover header-shape
edge cases in `model-provider-info`
- add a Responses Lite regression that verifies both
`image_gen.imagegen` and `web.run`

## Why

CCA uses a provider named `local` with `requires_openai_auth: false` and
a non-empty `x-openai-actor-authorization` header. Core accepts that
provider shape, but both extension provider-name gates rejected it;
image generation additionally required a Codex-managed login.

The standalone paths must coexist with existing builtin tools. New
Responses Lite models can receive `image_gen.imagegen` and `web.run`,
while older models continue using builtin tools.
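
The provider-shape check can be sketched as a header test. This is not the actual `ModelProviderInfo` logic: the function name, the case-insensitive header match, and the whitespace trimming are assumptions; the header name and the "non-empty" requirement come from the description.

```rust
use std::collections::HashMap;

const ACTOR_AUTHORIZATION_HEADER: &str = "x-openai-actor-authorization";

/// Returns true when the provider carries a non-empty actor-authorization
/// header. Header names are compared case-insensitively (an assumption).
fn has_actor_authorization(headers: &HashMap<String, String>) -> bool {
    headers.iter().any(|(name, value)| {
        name.eq_ignore_ascii_case(ACTOR_AUTHORIZATION_HEADER) && !value.trim().is_empty()
    })
}
```

Sharing one such helper between both extensions would keep their provider gates from drifting apart.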

## Impact

This enables both standalone extensions for CCA once installed
downstream, without removing or changing builtin-tool compatibility for
older models.

## Validation

- `just test -p codex-core
responses_lite_exposes_standalone_tools_for_actor_authorized_provider`
- `just test -p codex-core
responses_lite_uses_standalone_web_search_and_image_generation`
- `just test -p codex-core
hosted_tools_follow_provider_auth_model_and_config_gates`
- `just test -p codex-image-generation-extension`
- `just test -p codex-web-search-extension`
- `just test -p codex-model-provider-info`
- `just fmt`
- `git diff --check`

Won Park · 2026-06-25 18:34:35 -07:00

0d4351c1b8

Expose MCP app identity in app context (#29934 )

## Why

MCP tool-call events need to expose trusted app identity and action
metadata directly so v2 clients do not have to infer it from tool names
or resource URIs.

## What changed

- Add optional `appName`, `templateId`, and `actionName` fields to MCP
tool-call `appContext`.
- Populate `appName` and `templateId` from trusted Codex Apps metadata,
and derive `actionName` from the trusted app resource metadata.
- Preserve all three fields through core events, legacy protocol events,
persisted thread history, resume redaction, and app-server v2 responses.
- Document the public `appContext` fields in
`codex-rs/app-server/README.md`.
- Regenerate app-server JSON and TypeScript schemas and add coverage for
serialization, persistence, redaction, and metadata propagation.

## Validation

- `just test -p codex-app-server-protocol mcp_tool_call`
- `just test -p codex-core
mcp_tool_call_item_metadata_only_trusts_codex_apps_identity
mcp_tool_call_item_includes_app_identity`
- `just write-app-server-schema`

---------

Co-authored-by: Martin Au-Yeung <280153141+martinauyeung-oai@users.noreply.github.com>

Martin Au-Yeung · 2026-06-25 18:31:10 -07:00

ec300bc7bd

Keep MCP elicitation routable across runtime refreshes (#30127 )

## Why

An MCP tool call can still be waiting for an elicitation response when
an environment update replaces the thread's MCP runtime.

Before this change:

```text
runtime A starts a tool call and asks the user
environment becomes ready, so runtime B is published
client answers the prompt through runtime B
runtime B cannot find runtime A's pending responder
```

The response is lost and the original tool call stays blocked.

## What changed

All MCP runtimes for one thread now share a small elicitation router:

```text
runtime A ---\
               shared router: response token -> exact pending responder
runtime B ---/
```

When Codex surfaces an MCP elicitation, it assigns a unique opaque
response token. The router records which pending request owns that
token. A replacement runtime reuses the same router, so the latest
runtime can deliver a response to a request started by the previous
runtime.

The Codex-owned token also prevents two runtime connections that reuse
the same MCP server request ID from receiving each other's responses.

This does not retain or search old MCP managers. Only the pending
responder map is shared.
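
A minimal sketch of the shared-router idea, using only the standard library. The type and method names (`ElicitationRouter`, `register`, `respond`, `Runtime`) are hypothetical, and the counter-based token is a simplification: the PR describes an opaque unique token without saying how it is generated.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::{Arc, Mutex};

/// Hypothetical router shared by every MCP runtime of one thread.
/// Cloning it clones the `Arc`, so all clones see the same pending map.
#[derive(Clone, Default)]
struct ElicitationRouter {
    inner: Arc<Mutex<RouterState>>,
}

#[derive(Default)]
struct RouterState {
    next_token: u64,
    /// response token -> responder of the exact pending request
    pending: HashMap<String, Sender<String>>,
}

impl ElicitationRouter {
    /// Registers a pending elicitation. Returns the token handed to the
    /// client and the receiver on which the originating tool call waits.
    fn register(&self) -> (String, Receiver<String>) {
        let (tx, rx) = channel();
        let mut state = self.inner.lock().unwrap();
        state.next_token += 1;
        let token = format!("elicitation-{}", state.next_token);
        state.pending.insert(token.clone(), tx);
        (token, rx)
    }

    /// Delivers a response to the request that owns `token`. Returns false
    /// when the token is unknown, already answered, or its waiter is gone.
    fn respond(&self, token: &str, response: String) -> bool {
        let sender = self.inner.lock().unwrap().pending.remove(token);
        match sender {
            Some(tx) => tx.send(response).is_ok(),
            None => false,
        }
    }
}

/// Hypothetical runtime handle. A replacement runtime is built with a clone
/// of the same router instead of a fresh one.
struct Runtime {
    router: ElicitationRouter,
}
```

In this sketch, a request registered through runtime A can be answered through runtime B because both hold the same map, and the router-issued token never depends on the MCP server's own request ID.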

## Covered scenario

The integration test exercises the complete failure mode:

1. A thread starts while its selected environment is still unavailable.
2. A configured MCP server starts a tool call and asks the client for
input.
3. The environment becomes ready, causing Codex to publish a replacement
MCP runtime.
4. The client answers the original prompt after the replacement.
5. The original tool call receives that answer and completes.

A focused routing test also creates two runtimes with the same server
request ID and verifies that each response reaches the exact request
that emitted its token.

## Scope

This PR changes only elicitation response routing across MCP runtime
replacement. It does not change when runtimes are rebuilt, which
environments contribute MCP configuration, or how environment
availability is detected.

jif · 2026-06-26 01:28:14 +00:00

fb8598df3f

Reinject missing World State fragments on resume (#30152 )

## Why

World State restores its structured snapshot on resume so unchanged
sections do not have to be rendered again. That is safe only when the
model-visible fragment represented by the snapshot is still present in
retained history.

For selected executor skills, the failing selected-capability scenario
exposed this state:

```text
persisted World State: selected skill catalog is known
retained model history: selected skill catalog message is missing
next diff: unchanged, so emit nothing
```

The model resumes without being told about the selected skill catalog.

## What changed

World State contributions may now optionally describe the concrete
model-visible fragment that must remain in retained history.

When a persisted snapshot is present:

```text
matching retained fragment exists -> trust snapshot, emit nothing
matching retained fragment missing -> treat section as absent, render current state once
```

The skills extension uses this for non-empty selected-environment
catalogs by matching its exact rendered catalog body. Empty or hidden
catalogs do not require a fragment.
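
The decision above can be sketched as a small pure function. All names here (`ResumeAction`, `resume_action`) are hypothetical, and the sketch assumes the section's current state is unchanged since the snapshot, so the only question is whether the snapshot can be trusted.

```rust
/// Hypothetical outcome for one World State section on resume.
#[derive(Debug, PartialEq)]
enum ResumeAction {
    /// Trust the persisted snapshot; the model already sees this section.
    EmitNothing,
    /// Treat the section as absent and render the current state once.
    RenderCurrentState,
}

/// Decides what to do with one section.
///
/// - `has_snapshot`: a persisted snapshot exists for the section.
/// - `required_fragment`: the exact rendered body that must still be in
///   retained history, or `None` when the section needs no fragment
///   (for example an empty or hidden catalog).
/// - `retained_history`: bodies of the model-visible messages that survived.
fn resume_action(
    has_snapshot: bool,
    required_fragment: Option<&str>,
    retained_history: &[&str],
) -> ResumeAction {
    if !has_snapshot {
        // Nothing persisted: the section has to be rendered (assumption).
        return ResumeAction::RenderCurrentState;
    }
    match required_fragment {
        // No fragment requirement: the snapshot alone is enough.
        None => ResumeAction::EmitNothing,
        // Exact-body match, as the skills extension is described as doing.
        Some(fragment) => {
            if retained_history.iter().any(|message| *message == fragment) {
                ResumeAction::EmitNothing
            } else {
                ResumeAction::RenderCurrentState
            }
        }
    }
}
```

Matching on the exact body keeps the check conservative: a retained message that merely resembles the catalog does not count, so the section is rendered again rather than silently skipped.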

## Scope

This does not clear or rebuild the whole World State baseline. It does
not change skill discovery, cache invalidation, environment
availability, or MCP runtime behavior. It only keeps a persisted section
snapshot and its retained model context consistent across resume/history
reconstruction.

## Coverage

A focused World State regression test verifies both sides:

- a missing retained fragment is rendered again
- a matching retained fragment avoids duplicate injection

jif · 2026-06-26 02:18:00 +01:00

723b23efd0

[codex] Attribute app-server analytics by thread originator (#29935 )

## Why

Desktop Work threads and regular Codex threads can share the same
app-server connection. App-server analytics currently copy
`product_client_id` from connection metadata for every thread-scoped
event, so Work thread activity is attributed to the Desktop connection
instead of the thread's resolved originator. This prevents analytics
from distinguishing the two products on a shared connection.

## What changed

- Publish the resolved originator after a thread is materialized,
covering new, resumed, forked, and subagent threads.
- Store that originator in the analytics reducer's existing per-thread
state.
- Override only `app_server_client.product_client_id` for thread, turn,
tool, review, goal, guardian, and compaction events while preserving the
connection's client name, version, and transport metadata.
- Fall back to the connection-wide product client ID when a thread has
no originator override.
- Preserve persisted originators in thread initialization analytics for
resume and fork flows.
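
The override-with-fallback rule in the list above could look roughly like this. It is a sketch only: `AppServerClient`, `AnalyticsReducer`, and the method names are hypothetical simplifications of the reducer's per-thread state, and only `product_client_id` is taken from the PR text.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified connection metadata attached to analytics events.
#[derive(Clone, Debug, PartialEq)]
struct AppServerClient {
    product_client_id: String,
    client_name: String,
    client_version: String,
    transport: String,
}

/// Hypothetical reducer state: thread ID -> resolved originator.
#[derive(Default)]
struct AnalyticsReducer {
    thread_originators: HashMap<String, String>,
}

impl AnalyticsReducer {
    /// Called once a thread is materialized (new, resumed, forked, subagent).
    fn record_originator(&mut self, thread_id: &str, originator: &str) {
        self.thread_originators
            .insert(thread_id.to_string(), originator.to_string());
    }

    /// Builds the client metadata for a thread-scoped event. Only
    /// `product_client_id` is overridden; name, version, and transport
    /// always come from the connection. Threads without a recorded
    /// originator fall back to the connection-wide ID.
    fn client_for_thread(&self, thread_id: &str, connection: &AppServerClient) -> AppServerClient {
        let mut client = connection.clone();
        if let Some(originator) = self.thread_originators.get(thread_id) {
            client.product_client_id = originator.clone();
        }
        client
    }
}
```

Resolving the ID per event, rather than rewriting the connection metadata, is what lets two products share one connection without overwriting each other's attribution.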

## Validation

- `just test -p codex-analytics
thread_originator_overrides_shared_connection_across_thread_events
subagent_events_keep_thread_originator_with_explicit_turn_connection`
- `just test -p codex-app-server
turn_start_tracks_thread_originator_in_analytics
thread_start_tracks_thread_initialized_analytics
thread_fork_tracks_thread_initialized_analytics
thread_resume_tracks_thread_initialized_analytics`
- `just test -p codex-core thread_manager`

alexsong-oai · 2026-06-25 18:15:48 -07:00

841f30598c
