codex

[codex] Add comp_hash to model metadata (#27532)

## Summary
- add optional `comp_hash` metadata to `ModelInfo`
- update `ModelInfo` fixtures for the shared schema change
- keep older model responses compatible by defaulting the field to
`None`

## Why
The models endpoint needs an opaque identifier for compaction-compatible
model configurations. This PR only exposes that value in model metadata;
it does not add it to turn context or change runtime behavior.

Follow-up #27520 carries the value through turn context and rollouts,
then uses it to trigger compaction.

## Stack
- based directly on `main`
- replaces #27519, which was accidentally merged into the wrong base
branch
- functionality follow-up: #27520

## Testing
- `just test -p codex-protocol model_info_defaults_availability_nux_to_none_when_omitted`
- `just fix -p codex-core -p codex-protocol -p codex-analytics -p codex-models-manager`

Ahmed Ibrahim · 2026-06-10 20:42:55 -07:00

e614fad02e

feat: add Bedrock API key as a managed auth mode (#27443)

## Why

Codex needs to manage Amazon Bedrock API key credentials through the
existing auth lifecycle instead of introducing a separate auth manager
or provider-specific credential file. Treating Bedrock API key login as
a primary auth mode gives it the same persistence, keyring, reload, and
logout behavior as the existing OpenAI API key and ChatGPT modes.

The credential is valid only for the `amazon-bedrock` model provider.
OpenAI-compatible providers must reject this auth mode rather than
treating the Bedrock key as an OpenAI bearer token.

## What changed

- Added `bedrockApiKey` as an app-server `AuthMode` and
`CodexAuth::BedrockApiKey` as a primary `AuthManager` mode.
- Added `BedrockApiKeyAuth`, containing the API key and AWS region, to
the existing `AuthDotJson` payload stored in `$CODEX_HOME/auth.json` or
the configured keyring backend.
- Added `login_with_bedrock_api_key(...)`, parallel to
`login_with_api_key(...)`, which replaces the current stored login with
Bedrock credentials.
- Reused generic auth reload and logout behavior instead of adding a
Bedrock-specific auth manager or logout path.
- Updated login restrictions, status reporting, diagnostics, telemetry
classification, generated app-server schemas, and auth fixtures for the
new mode.
- Added explicit errors when Bedrock API key auth is selected with an
OpenAI-compatible model provider.
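
The provider check described in the last bullet could look roughly like the sketch below. The `AuthMode` enum, the `check_auth_for_provider` function, and the error text are hypothetical stand-ins for illustration, not the types or messages added by this PR; only the `amazon-bedrock` provider ID comes from the description above.

```rust
/// Hypothetical stand-in for the auth modes named above.
#[derive(Debug, Clone, PartialEq)]
enum AuthMode {
    ApiKey,
    ChatGpt,
    BedrockApiKey { region: String },
}

/// Returns an explicit error when Bedrock API key auth is paired with a
/// provider other than `amazon-bedrock`. How the other modes interact with
/// the Bedrock provider is not covered by the description, so this sketch
/// leaves them untouched.
fn check_auth_for_provider(mode: &AuthMode, provider_id: &str) -> Result<(), String> {
    match mode {
        AuthMode::BedrockApiKey { .. } if provider_id != "amazon-bedrock" => Err(format!(
            "Bedrock API key auth cannot be used with model provider `{provider_id}`"
        )),
        _ => Ok(()),
    }
}
```

Returning an error here, rather than falling through, is what keeps the Bedrock key from being sent as an OpenAI bearer token.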

This PR establishes managed storage and auth-mode behavior. Routing the
managed key and region into Amazon Bedrock requests will be in follow-up
PRs.

Celia Chen · 2026-06-10 20:42:38 -07:00

06afd63f4a

[codex] Add new context window tool (#27488)

## Why

The token budget feature tells the model how much room remains in the
current context window. When the model decides the current window is no
longer useful, it needs a way to ask Codex to start over with a fresh
context window without spending tokens on a compaction summary.

This PR adds that model-requestable escape hatch on top of #27438.

## What changed

- Added a direct-model-only `new_context` tool behind
`Feature::TokenBudget`.
- Stores the tool request on `AutoCompactWindow` and consumes it after
sampling so the next follow-up request in the same turn starts in the
new window.
- Starts the new window as a no-summary compaction checkpoint that
contains only fresh initial context, not preserved conversation history.
- Keeps the new window aligned with token-budget startup context,
including the `Current context window Z` message.
- Added integration coverage and a snapshot showing the same-turn
`new_context` flow into a fresh full-context follow-up request.
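
The store-then-consume pattern from the second bullet can be sketched as follows. `AutoCompactWindowSketch` and its fields are assumptions made for illustration and do not reflect the real `AutoCompactWindow` layout.

```rust
/// Hypothetical, simplified model of the per-turn window state.
#[derive(Default)]
struct AutoCompactWindowSketch {
    new_context_requested: bool,
    window_index: u32,
}

impl AutoCompactWindowSketch {
    /// Called when the model invokes the `new_context` tool.
    fn request_new_context(&mut self) {
        self.new_context_requested = true;
    }

    /// Called after sampling. Clears the pending request and, if one was
    /// pending, advances to a fresh window. Returns whether a new window
    /// was started.
    fn consume_new_context_request(&mut self) -> bool {
        let requested = std::mem::take(&mut self.new_context_requested);
        if requested {
            self.window_index += 1;
        }
        requested
    }
}
```

Because the request is taken rather than read, a second follow-up in the same turn does not start yet another window unless the tool is called again.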

## Validation

- `just test -p codex-core token_budget`

pakrym-oai · 2026-06-11 03:39:07 +00:00

87ab01834a

tools: simplify default tool search text (#27526)

## Why

Default tool search text currently derives identity from both `ToolName`
and `ToolSpec`. For function and namespace specs, this indexes the same
names more than once and also adds a flattened `{namespace}{name}` token
that is not model-visible.

## What changed

- Derive default search text entirely from `ToolSpec` while preserving
names, descriptions, namespace metadata, and recursive schema metadata.
- Keep the default search-text builder private and remove the unused
`ToolName` argument.
- Add coverage for the exact search text generated for a namespaced tool
with nested schema metadata.

## Example

For the `codex_app` namespace and `automation_update` tool (schema terms
omitted):

- Before: `codex_appautomation_update automation update codex_app
codex_app Manage Codex automations. automation_update automation update
...`
- After: `codex_app Manage Codex automations. automation_update
automation update ...`

## Testing

- `just test -p codex-tools`

sayan-oai · 2026-06-11 03:37:25 +00:00

728b8243a9

[codex] Expand hosted web search citation guidance (#27501)

## Summary

- Expand the hosted web search prompt with explicit Markdown-link
citation guidance.
- Keep internal `turnX` reference IDs out of final responses and place
citations next to supported claims.

## Context


https://openai.slack.com/archives/C0AU83S0ZQU/p1781133381448499?thread_ts=1780352049.512299&cid=C0AU83S0ZQU

## Test plan

- Confirmed `codex-rs/ext/web-search/web_run_description.md` exactly
matches the supplied target prompt.
- `UV_CACHE_DIR=/tmp/codex-uv-cache PATH=/tmp/codex-just/bin:/home/dev-user/.rustup/toolchains/1.95.0-x86_64-unknown-linux-gnu/bin:$PATH python3 scripts/format.py --check`
- `git diff --check`

yuning-oai · 2026-06-11 03:30:44 +00:00

f1f9d0c3b5

[codex] Add token budget context feature (#27438)

## Why

The model should be able to see bounded context-window budget metadata
when the `token_budget` feature is enabled. The full-window message is
only injected with full context, while normal turns get a smaller
follow-up only when reported usage first crosses a budget threshold.

## What changed

- Added the `TokenBudget` feature flag.
- Added `<token_budget>` developer fragments for full context-window
metadata and current-window remaining tokens.
- Inserted the threshold message during normal turn handling by
comparing token usage before and after sampling, avoiding persistent
threshold bookkeeping.
- Added core integration coverage for full-context-only metadata and
25/50/75 percent threshold messages.
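
Comparing usage before and after sampling, as the third bullet describes, might be sketched like this. The function name and the choice to report only the highest newly crossed threshold are assumptions; the 25/50/75 percent values come from the test coverage bullet above.

```rust
/// Budget thresholds as percentages of the context window.
const THRESHOLDS_PERCENT: [u64; 3] = [25, 50, 75];

/// Returns the highest threshold that was first crossed between the two
/// usage readings, or `None` if no threshold was newly crossed. No state
/// is kept between calls: the before/after pair is the only input.
fn newly_crossed_threshold(before: u64, after: u64, window: u64) -> Option<u64> {
    if window == 0 {
        return None;
    }
    THRESHOLDS_PERCENT.iter().rev().copied().find(|pct| {
        let limit = window * pct / 100;
        before < limit && after >= limit
    })
}
```

Since a threshold counts only when `before` is below it and `after` is at or above it, each threshold fires at most once per window without any stored bookkeeping.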

## Verification

- `just test -p codex-core token_budget`
- `git diff --check`

pakrym-oai · 2026-06-10 20:07:06 -07:00

658af936fd

Trim TUI legacy telemetry and migration dependencies (#27487)

## Why

The TUI still reached through `codex-app-server-client::legacy_core` for
process telemetry setup and personality migration, exposing core-only
details after the TUI moved onto the app-server layer.

This is part of our ongoing effort to whittle away at the `legacy_core`
shim that was left over after migrating the TUI to the app server.

This change is only a refactor/rename; it should be behavior-neutral and
low risk.

## What changed

- expose OTEL provider construction through the app-server client and
keep the small process/SQLite telemetry adapters local to the TUI
- collapse personality migration results to the config-reload decision
the TUI needs
- remove the `legacy_core::otel_init` and
`legacy_core::personality_migration` subnamespaces

Eric Traut · 2026-06-10 19:50:57 -07:00

ab4ce40042

core: resize all history images behind a feature flag (#27247)

## Summary

Adds complete client-side image preparation behind the default-off
`resize_all_images` feature flag.

When enabled, local image producers defer decoding and resizing. Images
are prepared centrally before insertion into conversation history,
covering user input, `view_image`, and structured tool-output images.

## Behavior

- Processes base64 `data:` images in messages and function/custom tool
outputs.
- Leaves non-data URLs, including HTTP(S) URLs, unchanged.
- Applies image-detail budgets:
  - `high` and omitted: 2048px maximum dimension and 2.5K 32px patches.
  - `original`: 6000px maximum dimension and 10K 32px patches.
  - `auto`: uses the same 2048px / 2.5K-patch budget as high.
  - `low`: unsupported and replaced with an actionable placeholder.
- Preserves original image bytes when no resize or format conversion is
needed.
- Enforces the shared 1 GiB encoded and decoded data-URL sanity limits.
- Replaces only an image that fails preparation, preserving sibling
content and tool-output metadata.
- Uses bounded placeholders distinguishing generic processing failures,
oversized images, and unsupported `low` detail.
- Prepares resumed and forked history before installing it as live
history without modifying persisted rollouts.
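
To make the image-detail budgets concrete, here is a rough sketch of fitting an image under both a maximum dimension and a 32px patch budget. It assumes "2.5K 32px patches" means at most 2,500 patches of 32x32 pixels, counted as a ceiling-rounded grid; the function names and the iterative shrink step are illustrative and are not the algorithm used in the PR.

```rust
/// Number of 32x32 patches needed to cover an image (partial patches count).
fn patch_count(width: u32, height: u32) -> u64 {
    u64::from(width.div_ceil(32)) * u64::from(height.div_ceil(32))
}

/// Shrinks (width, height) proportionally until the longest side is at most
/// `max_dim` and the patch count is at most `max_patches`.
fn fit_to_budget(width: u32, height: u32, max_dim: u32, max_patches: u64) -> (u32, u32) {
    let max_patches = max_patches.max(1); // guarantees the loop terminates
    let longest = width.max(height).max(1);
    let mut scale = if longest > max_dim {
        f64::from(max_dim) / f64::from(longest)
    } else {
        1.0
    };
    loop {
        let w = ((f64::from(width) * scale).floor() as u32).max(1);
        let h = ((f64::from(height) * scale).floor() as u32).max(1);
        if patch_count(w, h) <= max_patches {
            return (w, h);
        }
        scale *= 0.95; // assumed step size; shrink a little and retry
    }
}
```

Under these assumptions, the `high` budget would correspond to `fit_to_budget(w, h, 2048, 2_500)` and the `original` budget to `fit_to_budget(w, h, 6000, 10_000)`. An image already within both limits is returned with its dimensions unchanged, which matches the bullet about preserving original bytes when no resize is needed.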

## Flag-Off Behavior

When `resize_all_images` is disabled:

- Existing local user-input and `view_image` processing remains
unchanged.
- Existing decoding and error behavior remains unchanged.
- Arbitrary tool-output images are not processed.
- HTTP(S) image URLs continue to be forwarded unchanged.


#### [git stack](https://github.com/magus/git-stack-cli)
- ✅ `1` https://github.com/openai/codex/pull/27245
- 👉 `2` https://github.com/openai/codex/pull/27247
- ⏳ `3` https://github.com/openai/codex/pull/27246
- ⏳ `4` https://github.com/openai/codex/pull/27266

Curtis 'Fjord' Hawthorne · 2026-06-10 19:21:24 -07:00

a6f435ea94

Add session delete commands in CLI and TUI (#27476)

## Summary

The app server exposes `thread/delete`, but users cannot invoke it from
the CLI or TUI. Because deletion is irreversible, the user-facing
commands need deliberate confirmation and safer handling of name-based
targets.

- Add `codex delete <SESSION>` with interactive confirmation,
restricting `--force` to UUID targets.
- Resolve exact names across active and archived sessions, including
renamed sessions, and validate prompted UUID targets before
confirmation.
- Add a `/delete` command with a confirmation popup that warns the
current session and its subagent threads will be permanently deleted.
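
The rule that `--force` applies only to UUID targets could be sketched as below. The `DeletePlan` enum, the function names, and the error text are hypothetical; the real CLI also resolves names and validates that the session exists, which this sketch omits.

```rust
/// Checks the canonical 8-4-4-4-12 hexadecimal UUID layout.
fn looks_like_uuid(s: &str) -> bool {
    let bytes = s.as_bytes();
    bytes.len() == 36
        && bytes.iter().enumerate().all(|(i, b)| match i {
            8 | 13 | 18 | 23 => *b == b'-',
            _ => b.is_ascii_hexdigit(),
        })
}

#[derive(Debug, PartialEq)]
enum DeletePlan {
    DeleteNow,
    AskForConfirmation,
}

/// Decides how to proceed for `codex delete <SESSION> [--force]`.
fn plan_delete(target: &str, force: bool) -> Result<DeletePlan, String> {
    match (force, looks_like_uuid(target)) {
        (true, true) => Ok(DeletePlan::DeleteNow),
        (true, false) => Err(format!(
            "--force requires a session UUID, but `{target}` looks like a name"
        )),
        (false, _) => Ok(DeletePlan::AskForConfirmation),
    }
}
```

Rejecting names under `--force` avoids silently deleting the wrong session when several sessions share a name.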

## Manual testing

- Deleted by UUID with `--force` and verified the rollout, session-index
entry, and database row were removed.
- Exercised name-based confirmation for both cancellation and
affirmative deletion; cancellation preserved the session and
confirmation removed it.
- Verified deletion refuses to proceed without `--force`, while
`--force` rejects names, including duplicate names.
- Verified duplicate-name confirmation displays the concrete UUID
selected.
- Deleted an archived session by name.
- Verified an already-missing UUID fails before displaying a
confirmation prompt.
- Exercised `/delete` in the TUI: the popup defaults to No, cancellation
preserves the session, and confirmation deletes the session and exits.
- Verified that `codex delete` works for both archived and non-archived
sessions.

Eric Traut · 2026-06-10 18:04:02 -07:00

9d87b771ce

Remove TUI legacy core test_support dependencies (#27484)

## Why

The TUI now sits on the app-server layer, but
`app-server-client::legacy_core` still exposed core test helpers solely
for TUI tests. We've been whittling away the remaining dependencies.
This is the next step on that journey.

There is no functional change — just a refactor, and this affects only
test code, so it should be low risk.

## What changed

- remove the `legacy_core::test_support` re-export and call
model-manager test helpers directly
- keep the bundled model-preset cache local to TUI test support
- import constraint types directly from `codex-config`

Eric Traut · 2026-06-10 17:55:49 -07:00

36fc79c6f4

[codex] Remove redundant plugin app auth state (#27465)

## Summary

- remove the redundant `needsAuth` field from `AppSummary` and generated
app-server schemas
- stop `plugin/read` from querying Apps MCP solely to hydrate unused
connector auth state
- preserve `plugin/install.appsNeedingAuth` membership and
`app/list.isAccessible` as the authentication signals

## Why

Codex App and TUI do not consume `plugin/read.plugin.apps[].needsAuth`.
Hydrating it could establish an Apps MCP connection and discover tools
on a cold `plugin/read` request, adding avoidable latency. The plugin
APIs are still marked under development, so removing this wire field is
preferable to retaining a misleading default.

## Verification

- `just write-app-server-schema`
- `just fmt`
- `just test -p codex-app-server-protocol`
- `just test -p codex-app-server plugin_install_uses_remote_apps_needing_auth_response`
- `just test -p codex-app-server plugin_install_returns_apps_needing_auth`
- `just test -p codex-app-server plugin_read_returns_plugin_details_with_bundle_contents`
- `just test -p codex-tui plugin_detail_popup_snapshot_shows_install_actions_and_capability_summaries`
- `$xin-build` simplify and debug reviews

xl-openai · 2026-06-10 17:33:56 -07:00

1a9efd473b

core: cache turn diff rendering (#27489)

## Summary

Turn diff updates repeatedly rendered and serialized the entire
accumulated diff after every `apply_patch`. The event path also rendered
once before updating the tracker solely to test whether a diff existed.
In production feedback CODEX-20PW, 2,589 patches across 72 paths
produced 401 notifications totaling 441 MB, with the hottest paths
patched 518 and 495 times.

This change:

- replaces the pre-update render with a cheap cached-state check
- caches each rendered file diff by path and content revision, so an
update only invokes Myers for affected paths
- caches the deterministic aggregate diff so event emission and turn
completion reuse it without recomputation
- preserves invalidation and net-zero clear notifications
- applies a 100 ms per-file `similar` timeout; ordinary files complete
far below this threshold, while pathological rewrites fall back to a
coarse unified hunk that still represents the exact final contents
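
A minimal sketch of per-path caching keyed by content revision follows. `DiffCache`, its fields, and the placeholder `format!` rendering are assumptions for illustration; the real tracker renders proper unified diffs with `similar` and applies the timeout described above, neither of which is modeled here.

```rust
use std::collections::HashMap;

/// Hypothetical cache: path -> (content revision, rendered diff text).
struct DiffCache {
    rendered: HashMap<String, (u64, String)>,
    render_calls: usize,
}

impl DiffCache {
    fn new() -> Self {
        Self { rendered: HashMap::new(), render_calls: 0 }
    }

    /// Re-renders a file diff only when its revision changed.
    fn render_file(&mut self, path: &str, revision: u64, before: &str, after: &str) -> &str {
        let stale = match self.rendered.get(path) {
            Some((cached_revision, _)) => *cached_revision != revision,
            None => true,
        };
        if stale {
            self.render_calls += 1;
            // Placeholder rendering; a real implementation would run a diff algorithm.
            let text = format!("--- {path}\n+++ {path}\n-{before}\n+{after}\n");
            self.rendered.insert(path.to_string(), (revision, text));
        }
        &self.rendered[path].1
    }

    /// Concatenates cached file diffs in sorted path order so the result is deterministic.
    fn aggregate(&self) -> String {
        let mut paths: Vec<&String> = self.rendered.keys().collect();
        paths.sort();
        paths.iter().map(|p| self.rendered[*p].1.as_str()).collect()
    }
}
```

With this shape, an update touches only the paths whose revision changed, and the aggregate is assembled from cached pieces instead of being recomputed from scratch.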

The 100 ms deadline bounds synchronous tool-completion latency while
leaving substantial headroom for normal diffs. The regression test
applies the fallback diff through the repository's patch parser and
verifies byte-for-byte final contents.

## Validation

- `cargo test -p codex-core turn_diff_tracker::tests` (14 passed)
- `cargo test -p codex-core tools::events::tests` (4 passed)
- `just fix -p codex-core`
- `just fmt`

Focused coverage verifies that 42 updates across two files perform 42
file renders rather than repeatedly rendering the accumulated set,
unchanged paths are not re-diffed, clear events remain correct, and a
48,000-line near-total rewrite returns promptly and applies to the exact
expected result. The full `codex-core` suite was not used as the final
gate because an unrelated existing multi-agent test hit a stack overflow
when run during investigation.

## Bug context

- Sentry feedback: CODEX-20PW
- Correlation IDs: `019eb2a9-13d2-74e0-b690-27ee224ffb6d`,
`019e9ad7-09c3-7cb2-b728-ee3acba103ab`

Jeremy Rose · 2026-06-10 17:17:44 -07:00

b389b950e1

[codex] Preserve build-script dependencies in rules_rs annotations (#27322)

## Why

Bazel compiles Cargo build scripts in the exec configuration. For
`openssl-sys`, that means the target-specific optional `openssl-src`
dependency can disappear when producing musl release binaries, even
though the build script still needs the vendored source crate.

## What changed

Patch `rules_rs` to expose its existing unconditional
`build_script_deps` input through `crate.annotation`, then annotate
`openssl-sys` with the pinned `openssl-src` target. Target-derived build
dependencies continue to use the existing selected dependency path.

## Validation

- `just bazel-lock-check`

Stack: 2 of 6. Follows #27321.

Adam Perry @ OpenAI · 2026-06-10 17:08:35 -07:00

ac9c534c21

[codex-analytics] emit internally started turn events (#27392)

## Why
Currently, the analytics reducer omits `codex_turn_event` for internally
started subagent turns:
- It uses `TurnState.connection_id` to select app-server client and
runtime metadata
- `turn/start` sets this field for client-started turns, while internal
subagent turns bypass that path
- Spawned child threads inherit the correct connection, but turn
emission does not use thread state

## What Changed
- Keeps explicit `TurnState.connection_id` authoritative for
client-started turns
- Falls back to the matching thread’s inherited connection when the turn
connection is absent
- Preserves completeness gates, event schema, and post-emission state
removal
- Extends subagent lifecycle test coverage
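
The fallback order described above can be sketched with `Option` combinators. The struct shapes and the `resolve_connection` name are simplified assumptions, not the reducer's actual types.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
struct ConnectionId(u64);

/// Simplified stand-ins for the reducer's turn and thread state.
struct TurnState {
    connection_id: Option<ConnectionId>,
}

struct ThreadState {
    connection_id: Option<ConnectionId>,
}

/// The turn's explicit connection wins; otherwise fall back to the
/// connection inherited by the matching thread, if any.
fn resolve_connection(turn: &TurnState, thread: Option<&ThreadState>) -> Option<ConnectionId> {
    turn.connection_id
        .or_else(|| thread.and_then(|t| t.connection_id))
}
```

When neither source has a connection, the result is `None`, which leaves it to the existing completeness gates to decide whether an event can be emitted.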

## Verification
- `just test -p codex-analytics` (71 tests passed)
- `just fix -p codex-analytics`
- `just fmt`

marksteinbrick-oai · 2026-06-10 15:35:41 -07:00

b39f943a63

image: add shared data URL preparation utilities (#27245)

## Summary

Add shared image-processing primitives needed for centralized image
preparation in a follow-up PR.

- Add `load_data_url_for_prompt` for decoding and preparing base64 image
data URLs.
- Add configurable maximum-dimension and 32px patch-budget resizing.
- Enforce a 1 GiB sanity limit on both encoded and decoded data-URL
representations.
- Preserve original PNG, JPEG, and WebP bytes when resizing is
unnecessary.
- Preserve the existing GIF-to-PNG behavior.
- Move image utility tests into the existing sidecar test module.

## Behavior

This PR is intended to be runtime behavior-preserving.

Existing production callers continue using
`PromptImageMode::ResizeToFit` and `PromptImageMode::Original` with
their existing semantics. The new data-URL entrypoint and configurable
resize mode have no production callers in this PR; they are used by the
next PR in the stack.

This PR does not change user-input handling, `view_image`, history
insertion, request construction, HTTP image URL forwarding, or
app-server behavior.


#### [git stack](https://github.com/magus/git-stack-cli)
- 👉 `1` https://github.com/openai/codex/pull/27245
- ⏳ `2` https://github.com/openai/codex/pull/27247
- ⏳ `3` https://github.com/openai/codex/pull/27246
- ⏳ `4` https://github.com/openai/codex/pull/27266

Curtis 'Fjord' Hawthorne · 2026-06-10 15:27:34 -07:00

c72205239f

[codex] Add reusable OTEL gauge instruments (#27057)

## Why

Exec-server observability needs current-value measurements in addition
to counters. The reusable OTEL client should expose that primitive
without coupling it to exec-server runtime behavior.

## What changed

- Adds integer gauge instruments, with optional descriptions.
- Caches gauges by name and description so instrument metadata remains
part of the declaration identity.
- Covers gauge values, descriptions, merged attributes, and OTLP HTTP
export.
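
Caching by name and description might look like the following sketch. `GaugeRegistry` and `Gauge` are hypothetical and hold a plain integer; the real client wraps OTEL instruments, which are not modeled here.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an integer gauge instrument.
#[derive(Debug, Default)]
struct Gauge {
    value: i64,
}

/// Caches gauges under the (name, description) pair, so the same name with
/// a different description is treated as a distinct declaration.
#[derive(Default)]
struct GaugeRegistry {
    gauges: HashMap<(String, Option<String>), Gauge>,
}

impl GaugeRegistry {
    fn gauge(&mut self, name: &str, description: Option<&str>) -> &mut Gauge {
        let key = (name.to_string(), description.map(str::to_string));
        self.gauges.entry(key).or_default()
    }
}
```

Including the description in the key means a later declaration with different metadata does not silently reuse an instrument created with the old metadata.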

This PR only adds the gauge primitive. It does not add second-based
duration histograms or exec-server adoption.

## Stack

1. #26091: counter descriptions
2. **#27057: gauge instruments**
3. #27058: second-based duration histograms

Related independent coverage: #27059 tests OTLP HTTP log and trace event
export.

## Validation

- `just test -p codex-otel`
- `just fix -p codex-otel`
- `just fmt`

richardopenai · 2026-06-10 21:36:38 +00:00

7e5e41daea

Forward standalone assistant output to realtime (#27319)

## Why

When a realtime session is open without an active frontend-model
handoff, completed Codex assistant messages are currently dropped. That
prevents the frontend model from hearing orchestrator preambles and
final responses produced by typed turns or other non-handoff work, which
makes the two models present as disconnected personas.

Active handoffs already forward each completed assistant message,
including preambles. This change leaves those V1 and V2 paths intact and
fills only the no-active-handoff gap.

## What changed

- Send standalone V1 assistant messages through
`conversation.handoff.append` with a stable synthetic handoff ID
- Send standalone V2 assistant messages as normal `[BACKEND]`
`conversation.item.create` message items, then enqueue `response.create`
so the frontend model responds
- Preserve the existing active V1 and V2 transport and completion
behavior
- Continue excluding user messages from realtime mirroring
- Skip empty output and cap each complete context injection, including
its V2 prefix, at 1,000 tokens
- Add end-to-end coverage for both wire formats, V2 response creation,
preambles, final responses, and truncation

## Test plan

- CI

guinness-oai · 2026-06-10 21:32:29 +00:00

22dd6ebc7d

[codex] reuse release artifacts for npm staging (#27312)

The release job already downloads every workflow artifact into `dist`,
but npm staging creates a new cache and downloads the six target
artifacts again.

Reuse `dist` as the staging script's artifact cache while preserving the
existing download fallback for missing artifacts and standalone callers.
The script retains ownership of temporary caches but does not delete a
caller-provided directory.

In https://github.com/openai/codex/actions/runs/27242495616, the
duplicate download transferred 3.3 GiB and took 4 minutes 13 seconds.
This should reduce total release time by about 4 minutes.

Tamir Duberstein · 2026-06-10 13:15:43 -07:00

387adc6c4b

[codex] Preserve disabled MCP servers across runtime overlays (#27414)

## Why

Recent MCP runtime overlay changes replace same-name configured server
entries with compatibility or extension-provided configs. Those
replacement configs default to enabled, so an MCP server explicitly
configured with `enabled = false` could be initialized anyway.

The connection manager still filters disabled servers correctly, but the
configured disabled state was lost before initialization reached that
filter.

## What changed

- Remember MCP servers that are disabled in the configured view before
applying runtime fallbacks and extension overlays.
- Restore `enabled = false` for those servers after overlays, while
leaving all other overlay fields and `Remove` precedence unchanged.
- Add focused extension-backed regression coverage for a disabled
`codex_apps` server.

## Testing

- `just fmt`
- `just test -p codex-mcp-extension`
- `just fix -p codex-core`
- `just fix -p codex-mcp-extension`

The full workspace `just test` suite was not run.

e-provencher · 2026-06-10 16:11:20 -04:00

980f60b664

[codex] Skip local curated discovery for remote plugins (#27311)

## Summary

- skip the local `openai-curated` marketplace before marketplace loading
when tool-suggest discovery uses remote plugins
- preserve existing marketplace listing behavior for all other callers
and when remote plugins are disabled
- add regression coverage proving the curated marketplace is excluded
before its malformed manifest can be read

## Why

Tool-suggest discovery previously loaded every local `openai-curated`
plugin manifest and only discarded that marketplace afterward when
remote plugins were enabled. The remote catalog is used in that mode, so
the local scan consumed CPU without contributing discoverable plugins.

## Impact

Remote-plugin tool suggestion discovery no longer reads the local
curated marketplace and its plugin manifests. `openai-bundled`,
configured marketplaces, normal `plugin/list` behavior, and local
curated discovery when remote plugins are disabled are unchanged.

## Validation

- `just test -p codex-core-plugins list_marketplaces_can_skip_openai_curated_before_loading`
- `just test -p codex-core list_tool_suggest_discoverable_plugins_omits_openai_curated_when_remote_enabled`
- `just fmt`
- `git diff --check`

xl-openai · 2026-06-10 13:11:09 -07:00

7011044c1c

[codex] add /import for external agents (#27071)

## Why

External-agent import should be discoverable and deliberate without
blocking startup or claiming the public `codex [PROMPT]` CLI namespace.
The slash command keeps the flow local to the interactive TUI and reuses
the existing app-server import API.

## What changed

- add the user-facing `/import` slash command
- detect external-agent importable items only when the command is
invoked
- run imports through the embedded local app-server
- show start and completion messages, refresh configuration, and block
duplicate imports while one is pending
- reject the flow for unsupported remote and local-daemon sessions

## Validation

- `just test -p codex-tui external_agent_config_migration` (10 passed)
- manually exercised an isolated TUI fixture with existing
external-agent setup and session data using a fresh `CODEX_HOME`
- verified picker customization, plugin and session detection, import
completion, repeated invocation, and imported-session resume context
- the broader `just test -p codex-tui` run passed 2,805 tests, with 2
unrelated guardian feature-flag failures and 4 skipped tests

## Draft follow-ups

- review whether completion messaging should remain attached to the
initiating chat if the user switches chats during an import
- review shutdown semantics for an in-progress background import

## Stack

1. [#27064](https://github.com/openai/codex/pull/27064): remove the
startup migration flow
2. [#27065](https://github.com/openai/codex/pull/27065): extract the
picker renderer
3. [#27070](https://github.com/openai/codex/pull/27070): add the
external-agent import picker UX
4. [#27071](https://github.com/openai/codex/pull/27071): expose the flow
through `/import`

**This PR is stack item 4.** Draft while the lower stack dependencies
are reviewed.

stefanstokic-oai · 2026-06-10 15:53:15 -04:00

b4445f2758

[codex] Move release platform rules into bazel package (#27321)

## Intent

Keep release-specific Bazel helpers out of the shared Rust crate
definitions and colocate them with Bazel platform configuration.

## Implementation

Moves `multiplatform_binaries` and its platform list from `defs.bzl`
into `bazel/platforms/release_binaries.bzl` and updates the CLI load
site. Behavior is unchanged.

## Validation

- `bazel query //codex-rs/cli:release_binaries`

Stack: 1 of 6.

Adam Perry @ OpenAI · 2026-06-10 19:45:29 +00:00

1deae7bd4a

[codex] add external agent import picker UX (#27070)

## Why

Users need to understand what external-agent data Codex detected, what
is selected, and how to proceed before an import begins. The updated
picker makes focus, selection state, and the submission path explicit
while preserving the existing import backend.

## What changed

- replace the old migration prompt with a two-step external-agent import
picker
- add a customize view with explicit item focus, selection state,
counts, and a review action
- separate detected import data into a view model
- add Unix and Windows snapshots for prompt, item-focus, and
action-focus states

## Validation

- `just test -p codex-tui external_agent_config_migration` (10 passed)
- manually exercised an isolated TUI fixture covering customization,
selection toggles, review, import, repeated invocation, and session
resume
- the broader `just test -p codex-tui` run passed 2,805 tests, with 2
unrelated guardian feature-flag failures and 4 skipped tests

## Review note

This is the largest layer in the stack because the interaction state,
rendering changes, and required snapshots move together. It remains a
draft in case reviewers prefer a further presentation/state split.

## Stack

1. [#27064](https://github.com/openai/codex/pull/27064): remove the
startup migration flow
2. [#27065](https://github.com/openai/codex/pull/27065): extract the
picker renderer
3. [#27070](https://github.com/openai/codex/pull/27070): add the
external-agent import picker UX
4. [#27071](https://github.com/openai/codex/pull/27071): expose the flow
through `/import`

**This PR is stack item 3.** Draft while the lower stack dependencies
are reviewed.

stefanstokic-oai · 2026-06-10 15:19:37 -04:00

e3528434cd

Guard core test subprocess cleanup (#27343)

## Why

Local integration-heavy `codex-core` CLI tests can time out or be
interrupted after spawning `codex exec`. Stopping only the direct child
is not enough: `codex exec` can leave grandchildren behind, including
`python3`/`python3.12` processes that get reparented to PID 1 and keep
running after the test is gone.

This PR fixes that failure mode directly for the affected CLI
integration tests, without changing production code or reducing local
test concurrency.

## What

- Run the `cli_stream` `codex exec` subprocesses through a small private
wrapper in `core/tests/suite/cli_stream.rs`.
- Spawn those subprocesses in their own process group before execution.
- Keep `.output()`-style stdout/stderr capture and the existing
30-second timeout behavior.
- Own each spawned process with a drop guard that kills the whole
process group on success, timeout, panic, or other early return.

The switch from `assert_cmd::Command` to `std::process::Command` is only
for these subprocess launches; `assert_cmd` does not expose a pre-spawn
hook for setting the process group.

## Verification

- `just test -p codex-core --test all responses_mode_stream_cli`

This is limited to core integration tests; it does not change production
`src` code paths.

Eric Traut · 2026-06-10 12:19:26 -07:00

13468115fc

feat: make ThreadStore available on ThreadExtensionDependencies (#27439)

Exposes `ThreadStore` on `ThreadExtensionDependencies`, which is
generally useful for extensions.

Michael Bolin · 2026-06-10 15:17:15 -04:00

2e377ce5e5

[plugins] Inject remote_plugin_id into install elicitations (#26409)

## Summary
- Propagate cached remote plugin IDs through Codex plugin discovery.
- Inject `remote_plugin_id` and connector IDs into
`request_plugin_install` elicitation `_meta` from the resolved plugin.
- Keep the remote plugin ID out of the model-facing tool schema,
arguments, and result.

## Validation
- `just test -p codex-tools`
- `just test -p codex-core-plugins`
- `just test -p codex-core list_tool_suggest_discoverable_plugins_includes_cached_remote_global_plugins`
- `just fix -p codex-tools`
- `just fix -p codex-core-plugins`
- `just fix -p codex-core`
- `git diff --check`
- `just test -p codex-core` was also attempted: 2,581 passed, 55 failed,
and 1 timed out across unrelated sandbox/environment-sensitive
integration tests.

Alex Daley · 2026-06-10 12:01:03 -07:00

020bf49346

[codex] extract external agent import picker renderer (#27065)

## Why

The external-agent import picker is easier to review when its rendering
refactor lands separately from new state and interaction behavior. This
layer is intended to be behavior-neutral.

## What changed

- extract external-agent migration rendering into a dedicated `render`
module
- preserve existing behavior while separating presentation from
interaction logic
- establish a smaller foundation for the import picker UX in the next PR

## Validation

- `just test -p codex-tui external_agent_config_migration` (10 passed)

## Stack

1. [#27064](https://github.com/openai/codex/pull/27064): remove the
startup migration flow
2. [#27065](https://github.com/openai/codex/pull/27065): extract the
picker renderer
3. [#27070](https://github.com/openai/codex/pull/27070): add the
external-agent import picker UX
4. [#27071](https://github.com/openai/codex/pull/27071): expose the flow
through `/import`

**This PR is stack item 2.** Draft while the lower stack dependency is
reviewed.

stefanstokic-oai · 2026-06-10 14:48:30 -04:00

72667f4f41

[codex] Retry transient Guardian review failures (#27062)

## Background

Codex can use **Auto Review** for permission requests. Instead of asking
the user immediately, Codex starts a separate locked-down reviewer
session called **Guardian**, which returns a structured `allow` or
`deny` assessment.

The Guardian reviewer is itself a Codex session, so its model request
can fail for transient infrastructure reasons such as model overload,
HTTP connection failure, or response-stream disconnect. Today, any such
failure immediately ends the Auto Review attempt and blocks the action.

This PR adds bounded retries for failures that the existing protocol
explicitly identifies as transient.

Linear context:
[CA-539](https://linear.app/openai/issue/CA-539/retry-auto-review-infrastructure-failures-and-fall-back-to-manual)

## What changes

A Guardian review can now make at most **three total attempts**:

1. Run the review normally.
2. Retry after a jittered delay of roughly 180–220 ms if the first
attempt fails with an eligible error.
3. Retry after a jittered delay of roughly 360–440 ms if the second
attempt also fails with an eligible error.

All attempts share the original review deadline. Jitter spreads retries
from concurrent clients to reduce synchronized load during broader
outages. The retries do not reset the user's maximum wait time, and the
backoff waits terminate early if the review is cancelled or the deadline
expires.
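
The attempt loop described above can be sketched roughly as follows. This is a synchronous illustration under stated assumptions, not the PR's implementation: the 200 ms base delay and 10% jitter are inferred from the quoted ranges, and `Failure`, `retry_delay`, and `run_with_retries` are hypothetical names. The real backoff wait is also cancellable, which this sketch does not model.

```rust
use std::time::{Duration, Instant};

/// Total attempts allowed per review, per the description above.
const MAX_ATTEMPTS: u32 = 3;
/// Assumed values that reproduce the quoted 180–220 ms and 360–440 ms ranges.
const BASE_DELAY_MS: f64 = 200.0;
const JITTER_FRACTION: f64 = 0.10;

/// Delay before retry number `retry` (1 = first retry). `unit` is a random
/// value in [-1.0, 1.0] supplied by the caller, so no RNG crate is needed.
fn retry_delay(retry: u32, unit: f64) -> Duration {
    let base = BASE_DELAY_MS * 2f64.powi(retry as i32 - 1);
    let jittered = base * (1.0 + JITTER_FRACTION * unit.clamp(-1.0, 1.0));
    Duration::from_millis(jittered.round() as u64)
}

/// Hypothetical failure split: only `Transient` is eligible for a retry.
#[derive(Debug, PartialEq)]
enum Failure {
    Transient,
    Permanent,
    DeadlineExceeded,
}

/// Runs `attempt` up to `MAX_ATTEMPTS` times and returns the final result
/// together with the number of attempts made.
fn run_with_retries<T>(
    deadline: Instant,
    mut attempt: impl FnMut(u32) -> Result<T, Failure>,
    mut jitter: impl FnMut() -> f64,
) -> (Result<T, Failure>, u32) {
    let mut attempt_count = 0;
    loop {
        attempt_count += 1;
        match attempt(attempt_count) {
            Err(Failure::Transient) if attempt_count < MAX_ATTEMPTS => {
                let delay = retry_delay(attempt_count, jitter());
                // All attempts share one deadline: never sleep past it.
                if Instant::now() + delay >= deadline {
                    return (Err(Failure::DeadlineExceeded), attempt_count);
                }
                std::thread::sleep(delay);
            }
            // Success, a non-retryable failure, or the third failure ends the loop.
            other => return (other, attempt_count),
        }
    }
}
```

Returning the attempt count alongside the result mirrors the idea of recording a final `attempt_count` without emitting anything for intermediate failures.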

Before retrying, the existing Guardian session lifecycle decides whether
the session remains usable. Healthy trunks are reused, broken trunks are
removed by the existing cleanup path, and ephemeral sessions continue to
clean themselves up.

The review still emits one logical lifecycle to clients. Recoverable
intermediate failures do not produce warnings or terminal events.

## Retry policy

### Retried up to twice

- model/server overload
- HTTP connection failure
- response-stream connection failure
- response-stream disconnect
- internal server error
- a final reviewer message that cannot be parsed as the required
Guardian assessment

### Not retried

- bad or invalid requests
- authentication failures
- usage limits
- cyber-policy failures
- errors without a structured category
- a request that already exhausted the lower-level Responses retry
budget
- a completed Guardian turn with no assessment payload
- prompt-construction failures
- Guardian review timeout
- cancellation or abort
- a valid `deny` assessment

The session-error classification uses `ErrorEvent.codex_error_info`; it
does not inspect error-message strings.
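
A classifier over structured categories might look like the sketch below. The `ErrorCategory` enum is a hypothetical stand-in: its variant names are illustrative and are not the real `CodexErrorInfo` definition. The point it illustrates is that absent and unknown categories default to "not retried".

```rust
/// Hypothetical stand-in for the structured category carried by
/// `ErrorEvent.codex_error_info`; variant names are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ErrorCategory {
    ServerOverloaded,
    HttpConnectionFailed,
    ResponseStreamConnectionFailed,
    ResponseStreamDisconnected,
    InternalServerError,
    BadRequest,
    Unauthorized,
    UsageLimitExceeded,
    CyberPolicy,
    ResponsesRetryBudgetExhausted,
}

/// Returns true only for categories listed as transient above.
/// `None` models an error without a structured category.
fn is_retryable(category: Option<ErrorCategory>) -> bool {
    use ErrorCategory::*;
    match category {
        Some(
            ServerOverloaded
            | HttpConnectionFailed
            | ResponseStreamConnectionFailed
            | ResponseStreamDisconnected
            | InternalServerError,
        ) => true,
        Some(
            BadRequest
            | Unauthorized
            | UsageLimitExceeded
            | CyberPolicy
            | ResponsesRetryBudgetExhausted,
        ) => false,
        None => false,
    }
}
```

Because the match has no wildcard arm, adding a variant to the enum forces an explicit retry decision at compile time.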

## Implementation notes

- `wait_for_guardian_review` preserves the complete `ErrorEvent`,
including structured `codex_error_info`.
- Guardian session failures preserve the original message and optional
structured `CodexErrorInfo`.
- The retry policy classifies the explicitly transient `CodexErrorInfo`
variants; unknown, absent, and deterministic categories are not retried.
- The Guardian session manager receives the caller's deadline rather
than creating a new timeout per attempt.
- Analytics record the final `attempt_count`.
- Retry orchestration does not add a separate session-cleanup protocol;
it relies on the existing trunk and ephemeral lifecycle decisions.

## Automated testing

Focused Guardian coverage verifies:

- every supported transient `CodexErrorInfo` is classified as retryable,
while absent and non-transient categories are not;
- structured transient session failure -> retry -> approval with the
healthy trunk reused;
- two invalid Guardian responses -> third attempt -> approval, with
exactly three requests;
- three invalid responses -> existing fail-closed result, with exactly
three requests and one terminal lifecycle;
- valid denial, missing payload, invalid request, timeout, cancellation,
and prompt/session construction failures are not retried;
- retry eligibility ends after the third attempt;
- retry delays use the shared exponential backoff helper and remain
within the expected jitter bounds;
- cancellation and deadline expiry interrupt the backoff wait;
- healthy trunks are reused across retryable failures;
- broken event streams remove the trunk through the existing lifecycle
cleanup;
- an ephemeral retry does not disturb a concurrent trunk review.

Validation performed:

- `just test -p codex-core guardian_review_
guardian_ephemeral_retry_preserves_parallel_trunk_and_fork_history
run_review_removes_trunk_when_event_stream_is_broken` — **42 passed**;
- `just test -p codex-analytics` — **71 passed**;
- scoped Clippy fixes for `codex-core` and `codex-analytics` passed.

A prior full `codex-core` run had unrelated environment-sensitive
failures outside Guardian coverage.

## Manual QA

The focused integration tests use the local mock Responses server to
inspect exact request counts and emitted lifecycle events. They confirm
that retries are internal, a successful later attempt supplies the final
decision, non-retryable failures issue only one request, and exhausted
retries emit only one terminal result.

kbazzi · 2026-06-10 11:46:57 -07:00

ccf1a18518

[codex] Raise app-server recursion limit (#27421 )

## Summary

Unblock Rust release builds after tracing instrumentation increased the
async future query depth beyond rustc's default limit.

Set the `codex-app-server` crate recursion limit to 256. This changes
compilation only; runtime behavior is unchanged.

## Validation

- `just test -p codex-app-server`
- `cargo build --release --bin codex-app-server`

Adam Perry @ OpenAI · 2026-06-10 11:37:14 -07:00

42415443d0

[codex] remove blocking external agent migration flow (#27064 )

## Why

External-agent import should be initiated deliberately instead of
interrupting eligible TUI startups. This cleanup removes the blocking
startup flow before the replacement import experience is introduced
later in the stack.

## What changed

- remove the startup-blocking external-agent migration prompt
- remove the now-unused external migration feature gate
- remove the obsolete TUI app-server migration wrappers
- retain the dormant picker behind a module-scoped dead-code allowance
until the next stack item wires it back in
- keep normal TUI startup focused on entering Codex immediately

## Validation

- `bazel build --config=clippy //codex-rs/tui:tui
//codex-rs/tui:tui-unit-tests-bin`
- `just test -p codex-tui external_agent_config_migration` (8 passed)
- `just test -p codex-tui` (2,786 passed, 12 unrelated local
environment-sensitive failures, 4 skipped)
- `just fix -p codex-tui`
- `just fmt`

## Stack

1. [#27064](https://github.com/openai/codex/pull/27064): remove the
startup migration flow
2. [#27065](https://github.com/openai/codex/pull/27065): extract the
picker renderer
3. [#27070](https://github.com/openai/codex/pull/27070): add the
external-agent import picker UX
4. [#27071](https://github.com/openai/codex/pull/27071): expose the flow
through `/import`

**This PR is stack item 1.**

stefanstokic-oai · 2026-06-10 14:25:04 -04:00

636cc11398

fix: Auto-recover from corrupted sqlite databases (#26859 )

Further investigation of the SQLite incidents showed that the corruption
was caused by the older version of SQLite that we recently upgraded away
from, and that the data in the root database is truly corrupted, so full
recovery is not possible. Because the data can be reconstructed from the
rollouts on disk, we should automatically back up the database and let
Codex rebuild the rollout info from the on-disk rollouts.

With this change, app-server automatically backs up the corrupt database
and rebuilds it, and logs that it did so. The CLI now shows a message
saying that this happened, along with the paths of the backed-up corrupt
database and the new database. The rebuild info is also exposed as
context so that the desktop app can read it and inform the user.

David de Regt · 2026-06-10 11:24:29 -07:00

3691fe5b76

Add app-server thread/delete API (#25018 )

## Why

Clients can archive and unarchive threads today, but there is no
app-server API for permanently removing a thread. Deletion also needs to
cover the full session tree: deleting a main thread should remove
spawned subagent threads and the related local metadata instead of
leaving orphaned rollout files, goals, or subagent state behind.

## What

- Adds the v2 `thread/delete` request and `thread/deleted` notification,
with the response shape kept consistent with `thread/archive`.
- Implements local hard delete for active and archived rollout files.
- Deletes the requested thread's state DB row as the commit point, then
best-effort cleans associated state including spawned descendants,
goals, spawn edges, logs, dynamic tools, and agent job assignments.
- Updates app-server API docs and generated protocol schema/TypeScript
fixtures.

Eric Traut · 2026-06-10 11:22:12 -07:00

a19d43a40a

Add app-server background terminal process APIs (#26041 )

## Summary

Codex Apps needs app-server as the source of truth for chat-started
background terminals instead of guessing from local process trees.

This PR adds experimental v2 APIs to list and terminate background
terminals for a loaded thread using app-server process ids, so clients
can manage background terminals without local PID discovery.

## Changes

- `thread/backgroundTerminals/list` returns paginated background
terminal records with `itemId`, app-server `processId`, `command`,
`cwd`, nullable `osPid`, nullable `cpuPercent`, and nullable `rssKb`.
- `thread/backgroundTerminals/terminate` terminates one running
background terminal by app-server `processId` and returns whether a
process was terminated.
- Background terminal list and terminate operations use unified-exec
process manager state as their source of truth.
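
As a rough illustration of the record shape and the two operations, consider the sketch below. The field names follow the list above, but the Rust types, the cursor format, and the in-memory `ProcessManager` are assumptions made for the example and are not taken from the PR.

```rust
use std::collections::BTreeMap;

/// Field names follow the description above; the types are assumptions.
#[derive(Debug, Clone, PartialEq)]
struct BackgroundTerminal {
    item_id: String,
    process_id: String,
    command: String,
    cwd: String,
    os_pid: Option<u32>,
    cpu_percent: Option<f32>,
    rss_kb: Option<u64>,
}

/// Hypothetical in-memory stand-in for the process manager state,
/// keyed by app-server process id.
#[derive(Default)]
struct ProcessManager {
    running: BTreeMap<String, BackgroundTerminal>,
}

impl ProcessManager {
    /// Returns one page plus an optional cursor for the next page.
    /// The cursor here is simply the last process id returned (an assumption).
    fn list(&self, cursor: Option<&str>, limit: usize) -> (Vec<BackgroundTerminal>, Option<String>) {
        let mut page: Vec<BackgroundTerminal> = self
            .running
            .values()
            .filter(|t| cursor.map_or(true, |c| t.process_id.as_str() > c))
            .take(limit + 1)
            .cloned()
            .collect();
        let next = if page.len() > limit {
            page.truncate(limit);
            page.last().map(|t| t.process_id.clone())
        } else {
            None
        };
        (page, next)
    }

    /// Removes one terminal and reports whether anything was terminated.
    fn terminate(&mut self, process_id: &str) -> bool {
        self.running.remove(process_id).is_some()
    }
}
```

Keying everything by the app-server process id is what lets a client avoid local PID discovery; `os_pid` stays optional metadata.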

Eric Traut · 2026-06-10 11:18:09 -07:00

a1a8807e9d

[codex] Remove async_trait from ToolExecutor (#27304 )

## Why

We're now [discouraging use of
`async_trait`](https://github.com/openai/codex/pull/20242).

Removing `async_trait` from `ToolExecutor` cuts the `codex_core` debug
test build time by ~78% (from 227.5s to 50.3s) on my machine.

Stacked on #27299, this PR applies the trait change after the handler
bodies have been outlined.

## What

Changed `ToolExecutor::handle` to return an explicit boxed
`ToolExecutorFuture` instead of using `async_trait`.

Updated `ToolExecutor` implementors to return `Box::pin(...)`, reexported
the future alias through `codex-tools` and `codex-extension-api`, and
removed the direct `async-trait` dependency from `codex-tools`.
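
The shape of this change can be sketched with only the standard library. The alias name `ToolExecutorFuture` and the `handle`/`handle_call` split come from this PR and #27299; the argument and output types, the `Echo` handler, and the tiny `block_on` helper are assumptions made so the example is self-contained.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Explicit boxed future type; the `Result<String, String>` output is illustrative.
type ToolExecutorFuture<'a> = Pin<Box<dyn Future<Output = Result<String, String>> + Send + 'a>>;

trait ToolExecutor: Send + Sync {
    /// A plain method returning a boxed future keeps the trait usable as `dyn ToolExecutor`.
    fn handle<'a>(&'a self, call: &'a str) -> ToolExecutorFuture<'a>;
}

/// Hypothetical handler used only for this example.
struct Echo;

impl Echo {
    /// The handler body lives in an inherent async method.
    async fn handle_call(&self, call: &str) -> Result<String, String> {
        Ok(format!("echo: {call}"))
    }
}

impl ToolExecutor for Echo {
    fn handle<'a>(&'a self, call: &'a str) -> ToolExecutorFuture<'a> {
        Box::pin(self.handle_call(call))
    }
}

/// Waker that does nothing; enough for futures that never wait on I/O.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal busy-polling executor, for demonstration only.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}
```

Because the body already lives in `handle_call`, the trait method is a one-line `Box::pin(...)` wrapper, which is why outlining the bodies first keeps this diff small.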

Adam Perry @ OpenAI · 2026-06-10 10:26:53 -07:00

2704ecea9a

Fix compressed rollout search path matching (#27407 )

## Why

`thread/search` found content inside compressed rollouts but could drop
the result when joining it with SQLite-backed thread metadata. Search
returned the physical `.jsonl.zst` path while SQLite retained the
logical `.jsonl` path, so exact path matching failed.

## What changed

- Key rollout search matches by their canonical logical `.jsonl` path,
independent of the on-disk representation.
- Canonicalize thread-list paths before joining them with content-search
matches.
- Update compressed-rollout coverage to assert the logical-path
contract.
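
A minimal sketch of the logical-path idea, assuming the only physical variant is a `.zst` suffix appended to the `.jsonl` name. `logical_rollout_path` and `join_matches` are hypothetical helper names, not functions from the codebase.

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};

/// Maps a physical rollout path to its logical `.jsonl` path.
/// Non-UTF-8 paths and paths without the `.jsonl.zst` suffix are returned unchanged.
fn logical_rollout_path(path: &Path) -> PathBuf {
    match path.to_str().and_then(|s| s.strip_suffix(".jsonl.zst")) {
        Some(prefix) => PathBuf::from(format!("{prefix}.jsonl")),
        None => path.to_path_buf(),
    }
}

/// Keeps the thread paths that have a content-search match, comparing both
/// sides by logical path so the on-disk representation does not matter.
fn join_matches(thread_paths: &[PathBuf], search_matches: &[PathBuf]) -> Vec<PathBuf> {
    let hits: HashSet<PathBuf> = search_matches
        .iter()
        .map(|p| logical_rollout_path(p))
        .collect();
    thread_paths
        .iter()
        .map(|p| logical_rollout_path(p))
        .filter(|p| hits.contains(p))
        .collect()
}
```

Normalizing both sides of the join means a stored `.jsonl` path still matches after the file on disk has been compressed.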

## Validation

- Ran `just fmt`.
- Ran `git diff --check`.
- Tests and Clippy were intentionally left to CI.

jif · 2026-06-10 19:23:42 +02:00

d3abd8774e

Index visible thread list ordering (#27391 )

## Summary

- add partial SQLite indexes for visible thread lists ordered by
creation or update time
- match the `archived` and non-empty `preview` filters used by
`thread/list`
- add query-plan coverage for both supported sort orders

## Query performance

Benchmarked the production query shape on a snapshot of my database with
~10k threads before and after applying these indexes. The query selected
the full thread projection with `archived = 0`, `preview <> ''`, the
`openai` provider filter, and a page size of 201. Results are the mean
of 30 runs after 5 warmups:

| Query | Before | After | Speedup |
| --- | ---: | ---: | ---: |
| First page, `created_at_ms DESC` | 132.3 ms | 15.1 ms | 8.78x |
| First page, `updated_at_ms DESC` | 123.6 ms | 15.5 ms | 7.99x |
| Cursor page near row 4,000, `created_at_ms DESC` | 51.8 ms | 16.8 ms | 3.07x |
| Cursor page near row 4,000, `updated_at_ms DESC` | 52.4 ms | 17.1 ms | 3.06x |

Before this change, SQLite used `idx_threads_archived`, filtered the
candidate rows, and built a temporary B-tree for the requested ordering.
With the partial indexes, SQLite reads matching visible rows directly in
timestamp order and stops at the page limit. `EXPLAIN QUERY PLAN` no
longer reports `USE TEMP B-TREE FOR ORDER BY`.
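
For illustration, a partial index of this kind and a query-plan check could look like the sketch below. The index name, the `threads` table name, and the exact DDL are assumptions, since the description gives only the filters and sort keys. The checker works on the detail strings that `EXPLAIN QUERY PLAN` returns and does not talk to SQLite itself.

```rust
/// Hypothetical DDL: a partial index that matches the visible-thread filters
/// and stores rows in the requested order.
const CREATE_VISIBLE_CREATED_INDEX: &str = "\
CREATE INDEX IF NOT EXISTS idx_threads_visible_created_at
ON threads (created_at_ms DESC)
WHERE archived = 0 AND preview <> ''";

/// Returns true when the plan mentions `index` and contains no temporary
/// B-tree step, i.e. rows are read directly in index order.
fn plan_uses_index_without_sort(plan_rows: &[&str], index: &str) -> bool {
    let uses_index = plan_rows.iter().any(|row| row.contains(index));
    let sorts = plan_rows.iter().any(|row| row.contains("TEMP B-TREE"));
    uses_index && !sorts
}
```

A regression test in this style fails if a later schema or query change makes SQLite fall back to sorting.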

The result rows were identical before and after. The two partial indexes
occupy approximately 168 KiB combined on this snapshot.

## Performance under contention

I noticed this issue on a database under high contention, so I used
simulated contention to validate the performance in that context.

A synthetic SQLite benchmark ran five concurrent readers, matching the
state database pool size, and fetched 101 rows per query. Results are
the median of three runs on fresh copies of the same database snapshot:

| Query | Before | After |
| --- | ---: | ---: |
| `created_at_ms` mean latency under saturation | 328 ms | 12 ms |
| `created_at_ms` throughput | 16 queries/s | 412 queries/s |
| `updated_at_ms` mean latency under saturation | 336 ms | 14 ms |
| `updated_at_ms` throughput | 15 queries/s | 357 queries/s |

For a burst of 100 queries queued through five connections, p95
completion time fell from 6.90 seconds to 226 ms for `created_at_ms`,
and from 6.31 seconds to 473 ms for `updated_at_ms`.

## Validation

- `just test -p codex-state` (135 tests passed)
- query-plan regression covers created-at and updated-at ordering,
requires the corresponding index, and rejects `TEMP B-TREE`
- `just fmt`

Zanie Blue · 2026-06-10 11:52:17 -05:00

2ef007dc1a

[codex] Outline ToolExecutor handler bodies (#27299 )

## Why

We're now [discouraging use of
`async_trait`](https://github.com/openai/codex/pull/20242).

Removing `async_trait` from `ToolExecutor` cuts the `codex_core` debug
test build time by ~78% (from 227.5s to 50.3s) on my machine.

For ease of reviewing, this is a prefactor to extract trait method
implementations to inherent methods. This will prevent changing
indentation from creating a huge diff.

## What

Outlined existing `ToolExecutor::handle` bodies into inherent async
`handle_call` methods across core and extension tool handlers.

The trait methods still use `async_trait` and now delegate to
`self.handle_call(...).await`; handler behavior is unchanged.

Adam Perry @ OpenAI · 2026-06-10 09:40:41 -07:00

db531b4a6c

Reduce archive rollout lookup CPU (#27276 )

## Why

Archiving a thread can spike app-server CPU when the state DB does not
have a usable rollout path. The archive path falls back to locating the
rollout by thread id; because rollout filenames already contain the
UUID, the cheap fallback should find the file directly before invoking
broader file search.

## What Changed

- In `codex-rs/rollout/src/list.rs`, try the exact rollout filename
lookup before `codex-file-search`.
- Keep fuzzy search as the final legacy fallback when no filename match
is found.
- Preserve the legacy fallback when the filename scan hits a traversal
error, so an inaccessible stale subtree does not block lookup elsewhere.
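
The lookup order above can be sketched as follows. This is an assumption-laden illustration: `find_rollout` and `scan` are hypothetical names, the fuzzy search is passed in as a closure, and matching is done by checking that the file name contains the thread id.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Tries the cheap filename scan first and runs `fuzzy_search` only when the
/// scan finds nothing or fails partway through the directory tree.
fn find_rollout(
    root: &Path,
    thread_id: &str,
    fuzzy_search: impl FnOnce() -> Option<PathBuf>,
) -> Option<PathBuf> {
    match scan(root, thread_id) {
        Ok(Some(path)) => Some(path),
        // No match, or a traversal error: fall back instead of giving up.
        Ok(None) | Err(_) => fuzzy_search(),
    }
}

/// Recursively looks for a file whose name contains `thread_id`.
fn scan(dir: &Path, thread_id: &str) -> io::Result<Option<PathBuf>> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let path = entry.path();
        if entry.file_type()?.is_dir() {
            if let Some(found) = scan(&path, thread_id)? {
                return Ok(Some(found));
            }
        } else if path
            .file_name()
            .and_then(|name| name.to_str())
            .is_some_and(|name| name.contains(thread_id))
        {
            return Ok(Some(path));
        }
    }
    Ok(None)
}
```

Treating a traversal error the same as "not found" is what keeps an unreadable subtree from blocking the fallback.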

## Verification

- `just test -p codex-rollout`
- `just test -p codex-thread-store`
- `just test -p codex-app-server thread_archive`

Eric Traut · 2026-06-10 09:28:12 -07:00

c365b8a4ab

[codex] link Windows releases with LLD (#27315 )

Windows x64 release builds spend about 36.5 of 48 minutes in final
LLVM code generation and MSVC linking. Use the existing target-aware
MSVC setup action to select LLD for release builds; the Windows ARM64
archive path already exercises the action and its LLD wrapper.

In https://github.com/openai/codex/actions/runs/27242495616, macOS
becomes the critical path after roughly four minutes of Windows
improvement, so this is expected to reduce total workflow time by about
four minutes.

Tamir Duberstein · 2026-06-10 09:18:19 -07:00

24aee3eb5e

[codex] add io PathUri native conversion APIs (#27280 )

## Why

Discovered some rough edges in the API while making use of it more
widely within exec-server. It would be a lot more convenient for
existing users of `AbsolutePathBuf` if `PathUri` conversion methods
returned `std::io::Result`s.

## What

* `PathUri::to_native_path()` -> `PathUri::to_abs_path()`
* `PathUri::from_file_path()` -> `PathUri::from_abs_path()`

Adam Perry @ OpenAI · 2026-06-10 08:51:17 -07:00

0428e20a0b

[codex] Store compact window id in rollout (#27264 )

## Why

Compaction window identity is part of session history, not model-client
transport state. Persisting it with the compacted rollout item lets
resumed threads continue from the reconstructed window without keeping
mutable window state on `ModelClient`.

## What changed

- Added `window_id` to `CompactedItem` and stamp it when
`replace_compacted_history` installs compacted history.
- Moved auto-compact window id ownership into `AutoCompactWindow` /
`SessionState`; `ModelClient` now receives the request window id from
callers instead of storing it.
- Returned `window_id` from rollout reconstruction for resume.
Reconstruction uses the newest surviving compacted item's stored
`window_id` when present, and falls back to the legacy compacted-item
count when it is absent.
- Kept fork startup at the fresh default window id and updated direct
model-client tests to pass explicit test window ids.
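
The resume fallback can be illustrated with a small sketch. The `u64` id type, the oldest-to-newest ordering of the slice, and the function name `reconstruct_window_id` are assumptions; only the rule itself (stored id if present, legacy count otherwise) comes from the description above.

```rust
/// Simplified stand-in for a compacted rollout item.
struct CompactedItem {
    /// Absent on items written before this change.
    window_id: Option<u64>,
}

/// `surviving` is assumed to be ordered oldest to newest.
fn reconstruct_window_id(surviving: &[CompactedItem]) -> u64 {
    match surviving.last().and_then(|item| item.window_id) {
        // Newer rollouts: trust the id stamped on the newest compacted item.
        Some(id) => id,
        // Legacy rollouts: fall back to counting compacted items.
        None => surviving.len() as u64,
    }
}
```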

## Validation

- `cargo check -p codex-core --tests`

pakrym-oai · 2026-06-10 08:47:16 -07:00

30ddb3325e

Use latest-wins MCP manager replacement (#27259 )

## Summary

We originally addressed startup prewarming holding the read side of
`RwLock<McpConnectionManager>` by snapshotting tool-list state. Review
feedback identified the broader ownership problem: the outer
synchronization should only publish or retrieve the current manager,
while MCP operations rely on the manager's internal synchronization. A
follow-up preserved operation retirement with a separate gate, but
further review questioned whether that synchronization was actually
required and whether we could support latest-wins replacement instead.

This PR now stores the current MCP manager in `ArcSwap`. Each operation
uses `load_full()` to obtain an owned `Arc<McpConnectionManager>`, then
performs MCP I/O without retaining the publication mechanism. Refresh
cancels obsolete startup work, constructs a replacement, and atomically
publishes it. New operations see the latest manager, while operations
that already loaded the previous manager retain a valid handle. Refresh
happens at a turn boundary, so there should be no active user tool calls
to drain.
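
The publication pattern can be approximated with the standard library alone. Note the substitution: the PR uses `ArcSwap`, while this sketch uses a `Mutex<Arc<_>>` that is held only long enough to clone or replace the pointer. `Manager` and `Published` are hypothetical types for the example.

```rust
use std::sync::{Arc, Mutex};

/// Stand-in for the MCP connection manager.
struct Manager {
    generation: u32,
}

/// Holds the current manager. The lock guards only the pointer,
/// never the work done through it.
struct Published {
    current: Mutex<Arc<Manager>>,
}

impl Published {
    fn new(manager: Manager) -> Self {
        Self { current: Mutex::new(Arc::new(manager)) }
    }

    /// Returns an owned handle; the lock is released before the caller does any I/O.
    fn load_full(&self) -> Arc<Manager> {
        let guard = self.current.lock().unwrap();
        Arc::clone(&*guard)
    }

    /// Latest wins: new loads see `next`, existing handles stay valid.
    fn publish(&self, next: Manager) {
        *self.current.lock().unwrap() = Arc::new(next);
    }
}
```

An operation that loaded the previous manager keeps working against it until it drops its `Arc`, which is the "retain a valid handle" behavior described above.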

Git history supports dropping the outer `RwLock`. It was introduced in
`03ffe4d595` on November 17, 2025 for non-blocking MCP startup: the
session published an empty manager, startup initialized that same object
while holding the write lock, and readers waited for initialization.
`7cd2e84026` on February 19, 2026 removed that two-phase initialization
in favor of constructing a fresh manager and swapping it in, explicitly
noting that `Option` or `OnceCell` could replace the placeholder design.
Hot reload later reused the existing lock to publish a replacement, but
I found no indication that the lock was introduced to guarantee
in-flight tool calls finish before refresh or shutdown.

Terminal shutdown remains separate from refresh: it aborts startup
prewarming and active tasks before shutting down the current manager, so
tool calls may be interrupted and no model WebSocket work continues
after shutdown. Focused regression coverage exercises pending tool-list
cancellation, deferred refresh, and startup-prewarm shutdown.

Charlie Marsh · 2026-06-10 08:33:21 -07:00

41b4fabbb4

Remove async-trait from extension contributors (#27383 )

## Why

Extension contributors are registered behind `dyn Trait` objects, so
native `async fn`/RPITIT methods would make these traits
non-object-safe. Spell out the boxed, `Send` future contract directly so
`extension-api` no longer needs `async-trait` while retaining the
existing runtime model.

## What changed

- add a shared `ExtensionFuture` alias and use it for asynchronous
contributor methods
- migrate production and test implementations to return `Box::pin(async
move { ... })`
- remove `async-trait` dependencies where they are no longer used,
keeping it dev-only where unrelated test executors still require it

## Behavior

No behavior change is intended. Contributor futures remain boxed,
`Send`, dynamically dispatched, and lazily executed; cancellation and
callback ordering stay unchanged.
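
The boxed contract and its lazy execution can be demonstrated with a small standard-library sketch. The alias name `ExtensionFuture` comes from this PR; the `Contributor` trait, its `on_start` method, the `Recorder` type, and `poll_once` are hypothetical.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Boxed, `Send` future returned by contributor methods.
type ExtensionFuture<'a, T> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;

/// Hypothetical contributor trait; it stays object-safe because the method
/// returns a concrete boxed type rather than `impl Future`.
trait Contributor: Send + Sync {
    fn on_start<'a>(&'a self) -> ExtensionFuture<'a, ()>;
}

struct Recorder {
    ran: AtomicBool,
}

impl Contributor for Recorder {
    fn on_start<'a>(&'a self) -> ExtensionFuture<'a, ()> {
        // Nothing inside the block runs until the future is polled.
        Box::pin(async move {
            self.ran.store(true, Ordering::SeqCst);
        })
    }
}

/// Waker that does nothing; sufficient for a future that completes immediately.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls the future exactly once.
fn poll_once<T>(fut: &mut ExtensionFuture<'_, T>) -> Poll<T> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    fut.as_mut().poll(&mut cx)
}
```

Calling `on_start` through a `&dyn Contributor` only builds the future; the side effect happens on the first poll, which matches the "lazily executed" behavior listed above.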

## Testing

- `just test -p codex-extension-api` (11 passed)
- affected extension crates (64 passed)
- targeted `codex-core` contributor tests (14 passed)
- `just fmt`
- `just bazel-lock-update`
- `just bazel-lock-check`

A broad local `codex-core` run compiled successfully but encountered
unrelated sandbox and missing test-binary fixture failures; CI will run
the full checks.

jif · 2026-06-10 14:31:09 +02:00

d2f6d23c6c

[codex] Tag multi-agent spawn metrics with version (#27375 )

## Summary
- tag legacy multi-agent spawn metrics with `version=v1`
- tag multi-agent v2 spawn metrics with `version=v2`

## Why
`codex.multi_agent.spawn` is emitted by both runtimes, so the existing
metric cannot distinguish v2 adoption from aggregate multi-agent
spawning. The bounded version tag makes that breakdown directly
queryable without changing the counter's success-only semantics.

## Validation
- `just fmt`
- `git diff --check`
- Tests and Clippy were intentionally left to CI.

jif · 2026-06-10 13:06:48 +02:00

ced1b8aa88

Use plugin-service MCP as the hosted plugin runtime (#27198 )

## Stack

- Base: #27191
- This PR is the third vertical and should be reviewed against
`jif/external-plugins-2`, not `main`.

## Why

#27191 moves the host-owned Apps MCP registration behind an extension
contributor, but deliberately preserves the existing endpoint-selection
feature while that contribution contract lands. App-server can therefore
resolve the server through extensions, yet the hosted plugin endpoint is
still selected through temporary `apps_mcp_path_override` plumbing.

That is not the long-term plugin model. A plugin can bundle skills,
connectors, MCP servers, and hooks, and those components do not all need
the same source or execution environment. In particular, an
authenticated HTTP MCP server can expose plugin capabilities directly
from a backend without an executor or an orchestrator filesystem.

This PR completes that hosted vertical. App-server's MCP extension now
owns the aggregate hosted plugin runtime at `/ps/mcp`. Connector actions
continue to arrive as MCP tools, while backend-provided skills arrive as
MCP resources and use Codex's existing resource list/read paths. No
second backend client, skill filesystem, or generic plugin activation
framework is introduced.

The backend route remains the hosted implementation. This change
replaces Codex's temporary endpoint-selection mechanism, not the service
behind the endpoint.

## What changed

### Hosted plugin runtime

The MCP extension now contributes `codex_apps` as the hosted plugin
runtime rather than as a configurable Apps endpoint:

- `https://chatgpt.com` resolves to
`https://chatgpt.com/backend-api/ps/mcp`;
- a bare custom ChatGPT base resolves to `/api/codex/ps/mcp`;
- the existing product-SKU header and ChatGPT authentication behavior
are preserved;
- executor availability is never consulted for this streamable HTTP
transport.

The same MCP connection carries both component shapes supported by the
hosted endpoint:

- connector actions are discovered and invoked as MCP tools;
- hosted skills are enumerated and read as MCP resources through the
existing `list_mcp_resources` and `read_mcp_resource` paths.

This keeps component access in the subsystem that already owns the
protocol instead of downloading backend skills into an orchestrator
filesystem or inventing a parallel hosted-skill client.

### Explicit runtime ordering

`McpManager` now resolves the reserved `codex_apps` entry in three
ordered phases:

1. install the legacy Apps fallback for compatibility;
2. apply ordered extension `Set` or `Remove` overlays;
3. apply the final ChatGPT-auth gate without synthesizing the server
again.

This ordering is important:

- an ordinary configured or plugin MCP server cannot claim the
auth-bearing `codex_apps` name;
- an extension-contributed hosted runtime wins over the fallback;
- an extension `Remove` remains authoritative;
- a host without the MCP extension retains the legacy Apps endpoint and
current local-only behavior.
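
The three phases can be sketched as a small pure function. The `Set` and `Remove` overlay names come from the description; `Server`, `resolve_codex_apps`, and the single boolean auth gate are simplifying assumptions, and the rule that ordinary servers cannot claim the reserved name is not modeled here.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Server {
    url: String,
}

/// Ordered overlay contributed by an extension.
enum Overlay {
    Set(Server),
    Remove,
}

fn resolve_codex_apps(
    legacy_fallback: Option<Server>,
    overlays: &[Overlay],
    chatgpt_auth: bool,
) -> Option<Server> {
    // Phase 1: start from the legacy fallback.
    let mut server = legacy_fallback;
    // Phase 2: apply overlays in order; the last one wins.
    for overlay in overlays {
        server = match overlay {
            Overlay::Set(s) => Some(s.clone()),
            Overlay::Remove => None,
        };
    }
    // Phase 3: the auth gate can only drop the entry, never re-create it.
    server.filter(|_| chatgpt_auth)
}
```

Since the final phase only filters, a `Remove` overlay stays authoritative even when ChatGPT authentication is present.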

The temporary `legacy_apps_mcp_loader_enabled` coordination flag is no
longer needed.

### Remove the path override

The `apps_mcp_path_override` feature and its runtime plumbing are
removed, including:

- the feature registry entry and structured feature config;
- `Config` and `McpConfig` fields;
- config schema output;
- config-lock materialization;
- URL override handling in `codex-mcp`.

Existing boolean and structured forms still deserialize as ignored
compatibility input. They are omitted from new serialized config, and
config-lock comparison normalizes the removed input so older locks
remain replayable.

### App-server coverage

App-server MCP fixtures now serve the hosted route at
`/api/codex/ps/mcp`. Existing resource-read and tool/elicitation flows
therefore exercise the extension-owned endpoint rather than succeeding
through the legacy fallback.

The stack also adds the missing `codex_chatgpt::connectors` re-export
for the manager-backed connector helper introduced in #27191.

## Compatibility

- App-server installs the extension and uses `/ps/mcp` for the hosted
runtime.
- CLI and other hosts that do not install the extension retain the
legacy Apps endpoint.
- Apps disabled or non-ChatGPT authentication removes `codex_apps` from
the effective runtime view.
- Existing local plugins, local skills, executor-selected skills,
configured MCP servers, and MCP OAuth behavior are otherwise unchanged.
- Backend plugin enablement remains account/workspace state owned by the
hosted endpoint; this PR does not add thread-local backend plugin
selection.

## Architectural fit

The stack now proves two independent runtime shapes:

1. #27184 resolves filesystem-backed skills through the executor that
owns a selected root.
2. #27191 and this PR resolve a backend-hosted HTTP MCP through an
extension with no executor.

Together they preserve the intended separation:

- selection identifies a plugin/root when explicit selection is needed;
- each component's owning extension resolves its concrete access
mechanism;
- execution stays with the runtime required by that component;
- existing skills, MCP, connector, and hook subsystems remain the
downstream consumers.

## Planned follow-ups

1. **Executor stdio MCP:** selecting an executor plugin registers a
manifest-declared stdio MCP server and executes it in the environment
that owns the plugin.
2. **Optional backend selection:** only if CCA needs thread-local
selection distinct from backend account/workspace enablement, add a
concrete backend-owned capability location and surface those selected
skills through the skills catalog.
3. **Connector metadata and hooks:** activate those plugin components
through their existing owning subsystems, with executor hooks remaining
environment-bound.
4. **Propagation and persistence:** define explicit resume, fork,
subagent, refresh, and environment-removal semantics once selected roots
have multiple real consumers.
5. **Local convergence:** migrate legacy local skill, MCP, connector,
and hook paths behind their owning extensions one vertical at a time,
then remove duplicate core managers and compatibility plumbing after
parity.

## Verification

Coverage in this change exercises:

- extension-owned `/backend-api/ps/mcp` registration without an
executor;
- preservation of the legacy endpoint in hosts without the extension;
- extension `Set` and `Remove` precedence over the legacy fallback;
- ChatGPT-auth gating for the reserved server;
- hosted MCP resource reads with and without an active thread;
- connector tool invocation and MCP elicitation through the hosted
route;
- ignored boolean and structured forms of the removed path override;
- config-lock replay compatibility for the removed feature.

`cargo check -p codex-features -p codex-mcp-extension -p
codex-app-server` passes. Tests and Clippy were not run locally under
the current development instruction; CI provides the full validation
pass.

jif · 2026-06-10 12:54:21 +02:00

9cd11e9e62

feat: keep child MCP warnings out of parent transcript (#27174 )

## Why

MCP startup status notifications are thread-owned, but `ChatWidget`
trusted upstream routing. If routing state delivered a tagged child
notification to the active parent widget, the child MCP failure could
still mutate the parent's startup state and transcript. Rejecting it
only inside the MCP handler was also too late because shared
notification handling could already restore and consume the parent's
retry status.

## What changed

- Validate a tagged MCP status notification against the visible
`ChatWidget` thread before shared notification handling mutates any
parent state.
- Cover child `Starting` and `Failed` notifications delivered to a
retrying parent widget, asserting that they preserve its visible retry
error and saved status header while producing no history or MCP status
mutation.
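
The ordering that matters here, validating before any shared handling runs, can be sketched as below. This is a simplified model: `McpStatus`, the fields on this stand-in `ChatWidget`, and the choice to accept untagged notifications are all assumptions made for illustration.

```rust
/// Simplified MCP status notification; `thread_id` is `None` when untagged.
struct McpStatus {
    thread_id: Option<String>,
    message: String,
}

/// Minimal stand-in for the widget state touched by notification handling.
struct ChatWidget {
    thread_id: String,
    retry_status: Option<String>,
    history: Vec<String>,
}

impl ChatWidget {
    /// Returns false, without touching any state, when the notification is
    /// tagged for a different thread.
    fn on_mcp_status(&mut self, status: &McpStatus) -> bool {
        let foreign = status
            .thread_id
            .as_deref()
            .is_some_and(|id| id != self.thread_id.as_str());
        if foreign {
            return false;
        }
        // Shared handling runs only after validation: it consumes the retry
        // status and records the message.
        self.retry_status = None;
        self.history.push(status.message.clone());
        true
    }
}
```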

## User impact

Subagent MCP startup failures remain scoped to the child transcript
instead of appearing as duplicate warnings in the parent transcript.

## Testing

- `just test -p codex-tui mcp_startup_ignores_status_for_other_thread`
- `just test -p codex-tui
primary_thread_ignores_child_mcp_startup_notifications`
- `just fmt`

jif · 2026-06-10 11:45:49 +02:00

0ffcefaf3d

[codex] Make MCP connection startup fallible (#27261 )

## Why

Required MCP server startup was enforced in `Session::new` after
`McpConnectionManager` had already created the clients. That split let
other manager construction paths bypass the same requirement and exposed
manager internals solely so the session could validate them. Keeping
required-server readiness in the constructor gives every caller one
consistent startup contract.

## What changed

- make `McpConnectionManager::new` return `anyhow::Result<Self>` and
fail when an enabled, required server cannot initialize
- pass the startup cancellation token into the constructor so
required-server waits remain cancellable
- propagate constructor failures through resource reads, connector
discovery, and MCP status collection
- preserve the active manager and cancellation token when a refreshed
replacement fails
- keep required-startup failure collection private and cover the
constructor error contract directly
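
A reduced sketch of the constructor contract follows. It is synchronous and uses `String` errors where the PR uses `anyhow::Result`; `ServerConfig`, the `starts_ok` flag that simulates initialization, and the `refresh` helper are hypothetical.

```rust
/// Hypothetical server description; `starts_ok` simulates initialization.
struct ServerConfig {
    name: String,
    enabled: bool,
    required: bool,
    starts_ok: bool,
}

struct Manager {
    ready: Vec<String>,
}

impl Manager {
    /// Fails when any enabled, required server cannot initialize.
    fn new(servers: &[ServerConfig]) -> Result<Self, String> {
        let failed: Vec<&str> = servers
            .iter()
            .filter(|s| s.enabled && s.required && !s.starts_ok)
            .map(|s| s.name.as_str())
            .collect();
        if !failed.is_empty() {
            return Err(format!("required MCP servers failed: {}", failed.join(", ")));
        }
        let ready = servers
            .iter()
            .filter(|s| s.enabled && s.starts_ok)
            .map(|s| s.name.clone())
            .collect();
        Ok(Manager { ready })
    }
}

/// Replaces `active` only when the replacement constructs successfully.
fn refresh(active: &mut Manager, servers: &[ServerConfig]) -> Result<(), String> {
    let replacement = Manager::new(servers)?;
    *active = replacement;
    Ok(())
}
```

Because `refresh` builds the replacement before assigning it, a failed construction leaves the active manager untouched.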

## Validation

- updated the focused connection-manager test to assert the complete
required-server startup error
- local tests not run; relying on CI

Ahmed Ibrahim · 2026-06-10 00:17:58 -07:00

a7b6baecc6

Add spans to run_turn (#27107 )

## Why
Codex app-server latency traces do not granularly cover turn
orchestration, sampling-request preparation, and tool-loading work.
These spans help separate local coordination/setup costs from model
streaming and tool execution.

## What changed
- Add `run_turn.*` spans around sampling-request input preparation and
post-sampling state collection
- Add function-level trace spans around turn setup, hook execution,
compaction, prompt construction, and MCP tool exposure
- Add `built_tools.*` spans around plugin loading and discoverable-tool
loading

## Verification
Trigger a Codex rollout and confirm that the new spans are included.

mchen-oai · 2026-06-10 04:41:06 +00:00

00a25e1e0c

[codex] Fix post-merge analytics integration failures (#27285 )

## Why

Recent merges left `main` with analytics integration build failures.
Local Cargo runs also made the trimmed-skills test depend on
developer-installed skills, while Bazel used an isolated home.

## What changed

- Clone `thread_metadata.thread_source` when constructing goal analytics
event parameters.
- Group app-server thread extension inputs into
`ThreadExtensionDependencies`.
- Isolate the trimmed-skills test home so its exact fixture count is
stable across Cargo and Bazel.

## Validation

- `cargo check -p codex-analytics`
- `just test -p codex-analytics` (71 tests)
- `just test -p codex-app-server` (837 tests; one unrelated zsh-fork
timeout passed on retry)

Adam Perry @ OpenAI · 2026-06-09 20:52:09 -07:00

e0cb4ede4e

[codex-analytics] emit goal lifecycle analytics (#27078 )

## Why
- Currently, there is no analytics event for `/goal` behavior
- Existing events cannot identify goal execution or its resulting
outcome
- The original update in
[#26182](https://github.com/openai/codex/pull/26182) was implemented
before `/goal` moved into `codex-goal-extension`.

## What Changed
- Adds `codex_goal_event` serialization and enrichment to
`codex-analytics`
- Emits goal events from the canonical `codex-goal-extension` mutation
and accounting paths:
  - `created` when a new logical goal is persisted
  - `usage_accounted` when cumulative goal usage is persisted
  - `status_changed` when the stored goal status changes
  - `cleared` when the goal is deleted
- Preserves the causal `turn_id` for turn-driven events and uses null
attribution for external or idle lifecycle events
- Changes goal deletion to return the deleted row so `cleared` retains
the stable goal ID

## Event Details

Includes standard analytics metadata along with goal-specific fields:
- `goal_id`: Stable ID stored in the local SQLite goal row and shared
across the goal's events
- `event_kind`: Observed operation (one of the four lifecycle events
listed above)
- `goal_status`: Resulting or last stored status: `active`, `paused`,
`blocked`, `usage_limited`, etc.
- `has_token_budget`: Indicates whether a token budget is configured
- `turn_id`: Causal turn ID, or null when no causal turn exists
- `cumulative_tokens_accounted`: Cumulative tokens on `usage_accounted`
events; null otherwise
- `cumulative_time_accounted_seconds`: Cumulative active time on
`usage_accounted` events; null otherwise
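
One way to model the "null otherwise" rule is to attach the cumulative values to the event kind itself, as in the sketch below. The field names follow the list above, but the Rust types and the `GoalEventKind` layout are assumptions, not the serialization used by `codex-analytics`.

```rust
/// Lifecycle kinds from the list above; usage totals exist only on `UsageAccounted`.
#[derive(Debug, Clone, PartialEq)]
enum GoalEventKind {
    Created,
    UsageAccounted { cumulative_tokens: u64, cumulative_time_seconds: u64 },
    StatusChanged,
    Cleared,
}

struct GoalEvent {
    goal_id: String,
    kind: GoalEventKind,
    goal_status: String,
    has_token_budget: bool,
    /// `None` for external or idle lifecycle events.
    turn_id: Option<String>,
}

impl GoalEvent {
    fn event_kind(&self) -> &'static str {
        match self.kind {
            GoalEventKind::Created => "created",
            GoalEventKind::UsageAccounted { .. } => "usage_accounted",
            GoalEventKind::StatusChanged => "status_changed",
            GoalEventKind::Cleared => "cleared",
        }
    }

    fn cumulative_tokens_accounted(&self) -> Option<u64> {
        match self.kind {
            GoalEventKind::UsageAccounted { cumulative_tokens, .. } => Some(cumulative_tokens),
            _ => None,
        }
    }

    fn cumulative_time_accounted_seconds(&self) -> Option<u64> {
        match self.kind {
            GoalEventKind::UsageAccounted { cumulative_time_seconds, .. } => {
                Some(cumulative_time_seconds)
            }
            _ => None,
        }
    }
}
```

With this layout, a `created` or `cleared` event cannot carry usage totals by construction.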

## Validation
- `just test -p codex-analytics -p codex-state -p codex-goal-extension`
- `just test -p codex-core -E 'test(/goal/)'`
- `just test -p codex-app-server`
- `cargo build -p codex-analytics -p codex-core -p codex-state -p
codex-app-server`

marksteinbrick-oai · 2026-06-09 18:45:54 -07:00

608b8b1cc6
