Mirror of https://github.com/pchuan98/codex.git
Synced 2026-07-01 00:31:56 +08:00
e614fad02e2ac2d3acf6513c7aa29d180de028cd
7327 Commits
[codex] Add comp_hash to model metadata (#27532)
## Summary
- add optional `comp_hash` metadata to `ModelInfo`
- update `ModelInfo` fixtures for the shared schema change
- keep older model responses compatible by defaulting the field to `None`

## Why
The models endpoint needs an opaque identifier for compaction-compatible model configurations. This PR only exposes that value in model metadata; it does not add it to turn context or change runtime behavior. Follow-up #27520 carries the value through turn context and rollouts, then uses it to trigger compaction.

## Stack
- based directly on `main`
- replaces #27519, which was accidentally merged into the wrong base branch
- functionality follow-up: #27520

## Testing
- `just test -p codex-protocol model_info_defaults_availability_nux_to_none_when_omitted`
- `just fix -p codex-core -p codex-protocol -p codex-analytics -p codex-models-manager`

Ahmed Ibrahim ·
2026-06-10 20:42:55 -07:00
feat: add Bedrock API key as a managed auth mode (#27443)
## Why
Codex needs to manage Amazon Bedrock API key credentials through the existing auth lifecycle instead of introducing a separate auth manager or provider-specific credential file. Treating Bedrock API key login as a primary auth mode gives it the same persistence, keyring, reload, and logout behavior as the existing OpenAI API key and ChatGPT modes.

The credential is valid only for the `amazon-bedrock` model provider. OpenAI-compatible providers must reject this auth mode rather than treating the Bedrock key as an OpenAI bearer token.

## What changed
- Added `bedrockApiKey` as an app-server `AuthMode` and `CodexAuth::BedrockApiKey` as a primary `AuthManager` mode.
- Added `BedrockApiKeyAuth`, containing the API key and AWS region, to the existing `AuthDotJson` payload stored in `$CODEX_HOME/auth.json` or the configured keyring backend.
- Added `login_with_bedrock_api_key(...)`, parallel to `login_with_api_key(...)`, which replaces the current stored login with Bedrock credentials.
- Reused generic auth reload and logout behavior instead of adding a Bedrock-specific auth manager or logout path.
- Updated login restrictions, status reporting, diagnostics, telemetry classification, generated app-server schemas, and auth fixtures for the new mode.
- Added explicit errors when Bedrock API key auth is selected with an OpenAI-compatible model provider.

This PR establishes managed storage and auth-mode behavior. Routing the managed key and region into Amazon Bedrock requests will land in follow-up PRs.

Celia Chen ·
2026-06-10 20:42:38 -07:00
[codex] Add new context window tool (#27488)
## Why
The token budget feature tells the model how much room remains in the current context window. When the model decides the current window is no longer useful, it needs a way to ask Codex to start over with a fresh context window without spending tokens on a compaction summary. This PR adds that model-requestable escape hatch on top of #27438.

## What changed
- Added a direct-model-only `new_context` tool behind `Feature::TokenBudget`.
- Stores the tool request on `AutoCompactWindow` and consumes it after sampling so the next follow-up request in the same turn starts in the new window.
- Starts the new window as a no-summary compaction checkpoint that contains only fresh initial context, not preserved conversation history.
- Keeps the new window aligned with token-budget startup context, including the `Current context window Z` message.
- Added integration coverage and a snapshot showing the same-turn `new_context` flow into a fresh full-context follow-up request.

## Validation
- `just test -p codex-core token_budget`

pakrym-oai ·
2026-06-11 03:39:07 +00:00
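For the `new_context` change above, a minimal sketch of the store-then-consume pattern, assuming a hypothetical `WindowState` type (the real `AutoCompactWindow` fields are not described in the PR). Using `std::mem::take` on a flag makes each request one-shot: it is observed once after sampling and then cleared.

```rust
/// Hypothetical stand-in for per-turn window bookkeeping; not the real
/// `AutoCompactWindow` type.
#[derive(Default)]
pub struct WindowState {
    /// Set when the model requests a fresh window during sampling.
    pending_new_context: bool,
    /// How many context windows have been started so far.
    pub window_index: u32,
}

impl WindowState {
    /// Record the request; it only takes effect after sampling completes.
    pub fn request_new_context(&mut self) {
        self.pending_new_context = true;
    }

    /// Called once sampling finishes. Clears the request (so it fires once)
    /// and reports whether the next follow-up request should open a new window.
    pub fn consume_after_sampling(&mut self) -> bool {
        let requested = std::mem::take(&mut self.pending_new_context);
        if requested {
            self.window_index += 1;
        }
        requested
    }
}
```

Deferring the switch until after sampling matches the description: the current sampling step finishes normally, and only the next follow-up request in the same turn starts in the new window.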
tools: simplify default tool search text (#27526)
## Why
Default tool search text currently derives identity from both `ToolName` and `ToolSpec`. For function and namespace specs, this indexes the same names more than once and also adds a flattened `{namespace}{name}` token that is not model-visible.

## What changed
- Derive default search text entirely from `ToolSpec` while preserving names, descriptions, namespace metadata, and recursive schema metadata.
- Keep the default search-text builder private and remove the unused `ToolName` argument.
- Add coverage for the exact search text generated for a namespaced tool with nested schema metadata.

## Example
For the `codex_app` namespace and `automation_update` tool (schema terms omitted):
- Before: `codex_appautomation_update automation update codex_app codex_app Manage Codex automations. automation_update automation update ...`
- After: `codex_app Manage Codex automations. automation_update automation update ...`

## Testing
- `just test -p codex-tools`

sayan-oai ·
2026-06-11 03:37:25 +00:00
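For the search-text change above, a minimal sketch of deriving search text from the spec alone, using hypothetical simplified types (`NamespaceSketch`, `ToolSketch`, `SchemaProp`) rather than the real `ToolSpec`. The term order is chosen to reproduce the "After" example; the exact order and any normalization inside `codex-tools` are assumptions.

```rust
/// Hypothetical schema node: a named property with nested child properties.
pub struct SchemaProp {
    pub name: String,
    pub description: Option<String>,
    pub children: Vec<SchemaProp>,
}

pub struct ToolSketch {
    pub name: String,
    pub description: String,
    pub params: Vec<SchemaProp>,
}

pub struct NamespaceSketch {
    pub name: String,
    pub description: String,
    pub tools: Vec<ToolSketch>,
}

/// Appends property names and descriptions, recursing into nested metadata.
fn push_schema(terms: &mut Vec<String>, props: &[SchemaProp]) {
    for prop in props {
        terms.push(prop.name.clone());
        if let Some(desc) = &prop.description {
            terms.push(desc.clone());
        }
        push_schema(terms, &prop.children);
    }
}

/// Builds search text from the spec only. The namespace name appears once,
/// each tool name appears as written and split on `_`, and no flattened
/// `{namespace}{name}` token is produced.
pub fn default_search_text(ns: &NamespaceSketch) -> String {
    let mut terms = vec![ns.name.clone(), ns.description.clone()];
    for tool in &ns.tools {
        terms.push(tool.name.clone());
        terms.push(tool.name.replace('_', " "));
        terms.push(tool.description.clone());
        push_schema(&mut terms, &tool.params);
    }
    terms.join(" ")
}
```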
[codex] Expand hosted web search citation guidance (#27501)
## Summary
- Expand the hosted web search prompt with explicit Markdown-link citation guidance.
- Keep internal `turnX` reference IDs out of final responses and place citations next to supported claims.

## Context
https://openai.slack.com/archives/C0AU83S0ZQU/p1781133381448499?thread_ts=1780352049.512299&cid=C0AU83S0ZQU

## Test plan
- Confirmed `codex-rs/ext/web-search/web_run_description.md` exactly matches the supplied target prompt.
- `UV_CACHE_DIR=/tmp/codex-uv-cache PATH=/tmp/codex-just/bin:/home/dev-user/.rustup/toolchains/1.95.0-x86_64-unknown-linux-gnu/bin:$PATH python3 scripts/format.py --check`
- `git diff --check`

yuning-oai ·
2026-06-11 03:30:44 +00:00
[codex] Add token budget context feature (#27438)
## Why
The model should be able to see bounded context-window budget metadata when the `token_budget` feature is enabled. The full-window message is only injected with full context, while normal turns get a smaller follow-up only when reported usage first crosses a budget threshold.

## What changed
- Added the `TokenBudget` feature flag.
- Added `<token_budget>` developer fragments for full context-window metadata and current-window remaining tokens.
- Inserted the threshold message during normal turn handling by comparing token usage before and after sampling, avoiding persistent threshold bookkeeping.
- Added core integration coverage for full-context-only metadata and 25/50/75 percent threshold messages.

## Verification
- `just test -p codex-core token_budget`
- `git diff --check`

pakrym-oai ·
2026-06-10 20:07:06 -07:00
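For the threshold message above, a sketch of the stateless before/after comparison, assuming thresholds are fractions of the context window measured on tokens used. Reporting only the highest threshold when one sampling step jumps past several is an assumption of this sketch, not something the PR states; `crossed_threshold` is a hypothetical name.

```rust
/// Percent-of-window thresholds named in the PR.
const THRESHOLDS: [u64; 3] = [25, 50, 75];

/// Returns the highest threshold crossed during this sampling step, using only
/// the usage before and after sampling (no persistent "already announced"
/// state). `None` means no new threshold was crossed.
pub fn crossed_threshold(window: u64, used_before: u64, used_after: u64) -> Option<u64> {
    THRESHOLDS
        .iter()
        .copied()
        .filter(|&pct| {
            // Integer form of `used / window >= pct / 100`, avoiding floats.
            let was_past = used_before * 100 >= pct * window;
            let is_past = used_after * 100 >= pct * window;
            !was_past && is_past
        })
        .max()
}
```

Because a threshold only fires when the "before" value was below it and the "after" value is at or above it, a threshold that was already crossed in an earlier step is never reported again.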
Trim TUI legacy telemetry and migration dependencies (#27487)
## Why
The TUI still reached through `codex-app-server-client::legacy_core` for process telemetry setup and personality migration, exposing core-only details after the TUI moved onto the app-server layer. This is part of our ongoing effort to whittle away at the `legacy_core` shim that was left over after migrating the TUI to the app server.

This change is only a refactor/rename; it should be behavior-neutral and low risk.

## What changed
- expose OTEL provider construction through the app-server client and keep the small process/SQLite telemetry adapters local to the TUI
- collapse personality migration results to the config-reload decision the TUI needs
- remove the `legacy_core::otel_init` and `legacy_core::personality_migration` subnamespaces

Eric Traut ·
2026-06-10 19:50:57 -07:00
core: resize all history images behind a feature flag (#27247)
## Summary
Adds complete client-side image preparation behind the default-off `resize_all_images` feature flag. When enabled, local image producers defer decoding and resizing. Images are prepared centrally before insertion into conversation history, covering user input, `view_image`, and structured tool-output images.

## Behavior
- Processes base64 `data:` images in messages and function/custom tool outputs.
- Leaves non-data URLs, including HTTP(S) URLs, unchanged.
- Applies image-detail budgets:
  - `high` and omitted: 2048px maximum dimension and 2.5K 32px patches.
  - `original`: 6000px maximum dimension and 10K 32px patches.
  - `auto`: uses the same 2048px / 2.5K-patch budget as high.
  - `low`: unsupported and replaced with an actionable placeholder.
- Preserves original image bytes when no resize or format conversion is needed.
- Enforces the shared 1 GiB encoded and decoded data-URL sanity limits.
- Replaces only an image that fails preparation, preserving sibling content and tool-output metadata.
- Uses bounded placeholders distinguishing generic processing failures, oversized images, and unsupported `low` detail.
- Prepares resumed and forked history before installing it as live history without modifying persisted rollouts.

## Flag-Off Behavior
When `resize_all_images` is disabled:
- Existing local user-input and `view_image` processing remains unchanged.
- Existing decoding and error behavior remains unchanged.
- Arbitrary tool-output images are not processed.
- HTTP(S) image URLs continue to be forwarded unchanged.

#### [git stack](https://github.com/magus/git-stack-cli)
- ✅ `1` https://github.com/openai/codex/pull/27245
- 👉 `2` https://github.com/openai/codex/pull/27247
- ⏳ `3` https://github.com/openai/codex/pull/27246
- ⏳ `4` https://github.com/openai/codex/pull/27266

Curtis 'Fjord' Hawthorne ·
2026-06-10 19:21:24 -07:00
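For the image-detail budgets above, a sketch of fitting an image to a maximum dimension and a 32px patch budget. The 5% shrink step, the reading of "2.5K patches" as 2,500, and the names `fit_to_budget`/`patch_count` are assumptions; only the dimension math is shown, not decoding or re-encoding.

```rust
/// Number of 32x32 patches needed to tile an image; partial patches count.
pub fn patch_count(width: u32, height: u32) -> u64 {
    u64::from(width).div_ceil(32) * u64::from(height).div_ceil(32)
}

/// Chooses output dimensions that respect a maximum side length and a patch
/// budget while roughly preserving aspect ratio. Returns the input unchanged
/// when it already fits both limits.
pub fn fit_to_budget(width: u32, height: u32, max_dim: u32, max_patches: u64) -> (u32, u32) {
    let (mut w, mut h) = (u64::from(width), u64::from(height));
    let longest = w.max(h);
    if longest > u64::from(max_dim) {
        // Scale so the longest side equals max_dim (integer math, rounds down).
        w = (w * u64::from(max_dim) / longest).max(1);
        h = (h * u64::from(max_dim) / longest).max(1);
    }
    // Shrink both sides by 5% per step until the patch budget is met;
    // stop at 1x1 so the loop always terminates.
    while w.div_ceil(32) * h.div_ceil(32) > max_patches && (w > 1 || h > 1) {
        w = (w * 95 / 100).max(1);
        h = (h * 95 / 100).max(1);
    }
    (w as u32, h as u32)
}
```

For example, `fit_to_budget(4096, 2048, 2048, 2500)` yields `(2048, 1024)`, which needs 64 × 32 = 2,048 patches. Returning the input dimensions unchanged corresponds to the case where, per the description, the original bytes can be preserved.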
Add session delete commands in CLI and TUI (#27476)
## Summary
The app server exposes `thread/delete`, but users cannot invoke it from the CLI or TUI. Because deletion is irreversible, the user-facing commands need deliberate confirmation and safer handling of name-based targets.

- Add `codex delete <SESSION>` with interactive confirmation, restricting `--force` to UUID targets.
- Resolve exact names across active and archived sessions, including renamed sessions, and validate prompted UUID targets before confirmation.
- Add a `/delete` command with a confirmation popup that warns that the current session and its subagent threads will be permanently deleted.

## Manual testing
- Deleted by UUID with `--force` and verified the rollout, session-index entry, and database row were removed.
- Exercised name-based confirmation for both cancellation and affirmative deletion; cancellation preserved the session and confirmation removed it.
- Verified deletion refuses to proceed without `--force`, while `--force` rejects names, including duplicate names.
- Verified duplicate-name confirmation displays the concrete UUID selected.
- Deleted an archived session by name.
- Verified an already-missing UUID fails before displaying a confirmation prompt.
- Exercised `/delete` in the TUI: the popup defaults to No, cancellation preserves the session, and confirmation deletes the session and exits.
- Verified that `codex delete` works for both archived and non-archived sessions.

Eric Traut ·
2026-06-10 18:04:02 -07:00
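For the `--force` rule above, a sketch of the gating decision, assuming a hypothetical `plan_delete` helper and a hand-rolled check for the canonical 8-4-4-4-12 hex UUID form (a real CLI would more likely use a UUID crate). Names always go through confirmation because they can be duplicated, which is why `--force` is restricted to UUIDs.

```rust
/// Accepts only the canonical 8-4-4-4-12 hexadecimal UUID text form.
pub fn looks_like_uuid(s: &str) -> bool {
    let groups: Vec<&str> = s.split('-').collect();
    let expected = [8, 4, 4, 4, 12];
    groups.len() == expected.len()
        && groups
            .iter()
            .zip(expected)
            .all(|(g, n)| g.len() == n && g.chars().all(|c| c.is_ascii_hexdigit()))
}

#[derive(Debug, PartialEq)]
pub enum DeletePlan {
    /// Delete immediately (explicit `--force` with a UUID target).
    DeleteWithoutPrompt,
    /// Resolve the target and ask the user to confirm first.
    ConfirmFirst,
}

pub fn plan_delete(target: &str, force: bool) -> Result<DeletePlan, String> {
    match (force, looks_like_uuid(target)) {
        (true, true) => Ok(DeletePlan::DeleteWithoutPrompt),
        // Names can be duplicated, so they never skip confirmation.
        (true, false) => Err(format!("--force requires a session UUID, got `{target}`")),
        (false, _) => Ok(DeletePlan::ConfirmFirst),
    }
}
```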
Remove TUI legacy core test_support dependencies (#27484)
## Why
The TUI now sits on the app-server layer, but `app-server-client::legacy_core` still exposed core test helpers solely for TUI tests. We've been whittling away the remaining dependencies, and this is the next step on that journey. There is no functional change: this is a refactor that affects only test code, so it should be low risk.

## What changed
- remove the `legacy_core::test_support` re-export and call model-manager test helpers directly
- keep the bundled model-preset cache local to TUI test support
- import constraint types directly from `codex-config`

Eric Traut ·
2026-06-10 17:55:49 -07:00
[codex] Remove redundant plugin app auth state (#27465)
## Summary
- remove the redundant `needsAuth` field from `AppSummary` and generated app-server schemas
- stop `plugin/read` from querying Apps MCP solely to hydrate unused connector auth state
- preserve `plugin/install.appsNeedingAuth` membership and `app/list.isAccessible` as the authentication signals

## Why
Codex App and TUI do not consume `plugin/read.plugin.apps[].needsAuth`. Hydrating it could establish an Apps MCP connection and discover tools on a cold `plugin/read` request, adding avoidable latency. The plugin APIs are still marked under development, so removing this wire field is preferable to retaining a misleading default.

## Verification
- `just write-app-server-schema`
- `just fmt`
- `just test -p codex-app-server-protocol`
- `just test -p codex-app-server plugin_install_uses_remote_apps_needing_auth_response`
- `just test -p codex-app-server plugin_install_returns_apps_needing_auth`
- `just test -p codex-app-server plugin_read_returns_plugin_details_with_bundle_contents`
- `just test -p codex-tui plugin_detail_popup_snapshot_shows_install_actions_and_capability_summaries`
- `$xin-build` simplify and debug reviews

xl-openai ·
2026-06-10 17:33:56 -07:00
core: cache turn diff rendering (#27489)
## Summary
Turn diff updates repeatedly rendered and serialized the entire accumulated diff after every `apply_patch`. The event path also rendered once before updating the tracker solely to test whether a diff existed. In production feedback CODEX-20PW, 2,589 patches across 72 paths produced 401 notifications totaling 441 MB, with the hottest paths patched 518 and 495 times.

This change:
- replaces the pre-update render with a cheap cached-state check
- caches each rendered file diff by path and content revision, so an update only invokes Myers for affected paths
- caches the deterministic aggregate diff so event emission and turn completion reuse it without recomputation
- preserves invalidation and net-zero clear notifications
- applies a 100 ms per-file `similar` timeout; ordinary files complete far below this threshold, while pathological rewrites fall back to a coarse unified hunk that still represents the exact final contents

The 100 ms deadline bounds synchronous tool-completion latency while leaving substantial headroom for normal diffs. The regression test applies the fallback diff through the repository's patch parser and verifies byte-for-byte final contents.

## Validation
- `cargo test -p codex-core turn_diff_tracker::tests` (14 passed)
- `cargo test -p codex-core tools::events::tests` (4 passed)
- `just fix -p codex-core`
- `just fmt`

Focused coverage verifies that 42 updates across two files perform 42 file renders rather than repeatedly rendering the accumulated set, unchanged paths are not re-diffed, clear events remain correct, and a 48,000-line near-total rewrite returns promptly and applies to the exact expected result. The full `codex-core` suite was not used as the final gate because an unrelated existing multi-agent test hit a stack overflow when run during investigation.

## Bug context
- Sentry feedback: CODEX-20PW
- Correlation IDs: `019eb2a9-13d2-74e0-b690-27ee224ffb6d`, `019e9ad7-09c3-7cb2-b728-ee3acba103ab`

Jeremy Rose ·
2026-06-10 17:17:44 -07:00
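For the diff-caching change above, a sketch of a per-path cache keyed by content revision, with hypothetical names (`DiffCache`, `update`, `aggregate`). The `render` closure stands in for the expensive Myers diff mentioned in the PR; here it is just a caller-supplied function so the example has no dependencies.

```rust
use std::collections::HashMap;

#[derive(Default)]
pub struct DiffCache {
    /// path -> (content revision, rendered diff for that file)
    per_file: HashMap<String, (u64, String)>,
    /// Combined diff, rebuilt only after an invalidating update.
    aggregate: Option<String>,
    /// Number of per-file renders performed (to observe cache hits).
    pub renders: usize,
}

impl DiffCache {
    /// Records the latest revision of `path`; `render` runs only on a miss.
    pub fn update(&mut self, path: &str, revision: u64, render: impl FnOnce() -> String) {
        let cached = matches!(self.per_file.get(path), Some((rev, _)) if *rev == revision);
        if !cached {
            self.renders += 1;
            self.per_file.insert(path.to_string(), (revision, render()));
            self.aggregate = None; // invalidate the combined diff
        }
    }

    /// Deterministic aggregate (paths sorted), built once and reused until
    /// the next invalidating update.
    pub fn aggregate(&mut self) -> &str {
        if self.aggregate.is_none() {
            let mut paths: Vec<&String> = self.per_file.keys().collect();
            paths.sort();
            let combined: String = paths.iter().map(|p| self.per_file[*p].1.as_str()).collect();
            self.aggregate = Some(combined);
        }
        self.aggregate.as_deref().unwrap_or("")
    }
}
```

Repeated updates to one hot path then cost one render each, instead of re-rendering every accumulated path on every patch.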
[codex] Preserve build-script dependencies in rules_rs annotations (#27322)
## Why
Bazel compiles Cargo build scripts in the exec configuration. For `openssl-sys`, that means the target-specific optional `openssl-src` dependency can disappear when producing musl release binaries, even though the build script still needs the vendored source crate.

## What changed
Patch `rules_rs` to expose its existing unconditional `build_script_deps` input through `crate.annotation`, then annotate `openssl-sys` with the pinned `openssl-src` target. Target-derived build dependencies continue to use the existing selected dependency path.

## Validation
- `just bazel-lock-check`

Stack: 2 of 6. Follows #27321.

Adam Perry @ OpenAI ·
2026-06-10 17:08:35 -07:00
[codex-analytics] emit internally started turn events (#27392)
## Why
Currently, the analytics reducer omits `codex_turn_event` for internally started subagent turns:
- It uses `TurnState.connection_id` to select app-server client and runtime metadata
- `turn/start` sets this field for client-started turns, while internal subagent turns bypass that path
- Spawned child threads inherit the correct connection, but turn emission does not use thread state

## What Changed
- Keeps explicit `TurnState.connection_id` authoritative for client-started turns
- Falls back to the matching thread’s inherited connection when the turn connection is absent
- Preserves completeness gates, event schema, and post-emission state removal
- Extends subagent lifecycle test coverage

## Verification
- `just test -p codex-analytics` (71 tests passed)
- `just fix -p codex-analytics`
- `just fmt`

marksteinbrick-oai ·
2026-06-10 15:35:41 -07:00
image: add shared data URL preparation utilities (#27245)
## Summary
Add shared image-processing primitives needed for centralized image preparation in a follow-up PR.

- Add `load_data_url_for_prompt` for decoding and preparing base64 image data URLs.
- Add configurable maximum-dimension and 32px patch-budget resizing.
- Enforce a 1 GiB sanity limit on both encoded and decoded data-URL representations.
- Preserve original PNG, JPEG, and WebP bytes when resizing is unnecessary.
- Preserve the existing GIF-to-PNG behavior.
- Move image utility tests into the existing sidecar test module.

## Behavior
This PR is intended to be runtime behavior-preserving. Existing production callers continue using `PromptImageMode::ResizeToFit` and `PromptImageMode::Original` with their existing semantics. The new data-URL entrypoint and configurable resize mode have no production callers in this PR; they are used by the next PR in the stack.

This PR does not change user-input handling, `view_image`, history insertion, request construction, HTTP image URL forwarding, or app-server behavior.

#### [git stack](https://github.com/magus/git-stack-cli)
- 👉 `1` https://github.com/openai/codex/pull/27245
- ⏳ `2` https://github.com/openai/codex/pull/27247
- ⏳ `3` https://github.com/openai/codex/pull/27246
- ⏳ `4` https://github.com/openai/codex/pull/27266

Curtis 'Fjord' Hawthorne ·
2026-06-10 15:27:34 -07:00
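For the data-URL limits above, a sketch of checking encoded and decoded sizes before any decoding, using a hypothetical `check_data_url` helper (the signature of `load_data_url_for_prompt` is not shown in the PR). The decoded size is derived from the base64 length: every 4 characters carry 3 bytes, minus any trailing `=` padding.

```rust
#[derive(Debug, PartialEq)]
pub enum DataUrlError {
    NotBase64DataUrl,
    EncodedTooLarge,
    DecodedTooLarge,
}

/// Validates `data:<mime>;base64,<payload>` against both size limits without
/// allocating a decoded buffer. Returns the MIME type and the payload.
pub fn check_data_url(
    url: &str,
    max_encoded: usize,
    max_decoded: usize,
) -> Result<(&str, &str), DataUrlError> {
    let rest = url.strip_prefix("data:").ok_or(DataUrlError::NotBase64DataUrl)?;
    let (mime, payload) = rest
        .split_once(";base64,")
        .ok_or(DataUrlError::NotBase64DataUrl)?;
    if payload.len() > max_encoded {
        return Err(DataUrlError::EncodedTooLarge);
    }
    // Trailing '=' padding (at most two characters) carries no data.
    let padding = payload.bytes().rev().take_while(|&b| b == b'=').count().min(2);
    let decoded_len = (payload.len() * 3 / 4).saturating_sub(padding);
    if decoded_len > max_decoded {
        return Err(DataUrlError::DecodedTooLarge);
    }
    Ok((mime, payload))
}
```

With the PR's limit, both `max_encoded` and `max_decoded` would be `1 << 30` (1 GiB).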
[codex] Add reusable OTEL gauge instruments (#27057)
## Why
Exec-server observability needs current-value measurements in addition to counters. The reusable OTEL client should expose that primitive without coupling it to exec-server runtime behavior.

## What changed
- Adds integer gauge instruments, with optional descriptions.
- Caches gauges by name and description so instrument metadata remains part of the declaration identity.
- Covers gauge values, descriptions, merged attributes, and OTLP HTTP export.

This PR only adds the gauge primitive. It does not add second-based duration histograms or exec-server adoption.

## Stack
1. #26091: counter descriptions
2. **#27057: gauge instruments**
3. #27058: second-based duration histograms

Related independent coverage: #27059 tests OTLP HTTP log and trace event export.

## Validation
- `just test -p codex-otel`
- `just fix -p codex-otel`
- `just fmt`

richardopenai ·
2026-06-10 21:36:38 +00:00
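For the gauge change above, a sketch of caching instruments by `(name, description)` so that the same name declared with a different description yields a distinct instrument. `Gauge` and `GaugeRegistry` are hypothetical and store only the last recorded value; this is not the `opentelemetry` crate's API and omits export entirely.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicI64, Ordering};
use std::sync::Arc;

/// Hypothetical integer gauge that keeps only the most recently recorded value.
#[derive(Default)]
pub struct Gauge {
    value: AtomicI64,
}

impl Gauge {
    pub fn record(&self, value: i64) {
        self.value.store(value, Ordering::Relaxed);
    }

    pub fn current(&self) -> i64 {
        self.value.load(Ordering::Relaxed)
    }
}

/// Caches gauges by (name, description): redeclaring the same pair returns the
/// same instrument, while a different description yields a separate one.
#[derive(Default)]
pub struct GaugeRegistry {
    gauges: HashMap<(String, Option<String>), Arc<Gauge>>,
}

impl GaugeRegistry {
    pub fn gauge(&mut self, name: &str, description: Option<&str>) -> Arc<Gauge> {
        let key = (name.to_string(), description.map(str::to_string));
        Arc::clone(self.gauges.entry(key).or_default())
    }
}
```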
Forward standalone assistant output to realtime (#27319)
## Why
When a realtime session is open without an active frontend-model handoff, completed Codex assistant messages are currently dropped. That prevents the frontend model from hearing orchestrator preambles and final responses produced by typed turns or other non-handoff work, which makes the two models present as disconnected personas.

Active handoffs already forward each completed assistant message, including preambles. This change leaves those V1 and V2 paths intact and fills only the no-active-handoff gap.

## What changed
- Send standalone V1 assistant messages through `conversation.handoff.append` with a stable synthetic handoff ID
- Send standalone V2 assistant messages as normal `[BACKEND]` `conversation.item.create` message items, then enqueue `response.create` so the frontend model responds
- Preserve the existing active V1 and V2 transport and completion behavior
- Continue excluding user messages from realtime mirroring
- Skip empty output and cap each complete context injection, including its V2 prefix, at 1,000 tokens
- Add end-to-end coverage for both wire formats, V2 response creation, preambles, final responses, and truncation

## Test plan
- CI

guinness-oai ·
2026-06-10 21:32:29 +00:00
[codex] reuse release artifacts for npm staging (#27312)
The release job already downloads every workflow artifact into `dist`, but npm staging creates a new cache and downloads the six target artifacts again. Reuse `dist` as the staging script's artifact cache while preserving the existing download fallback for missing artifacts and standalone callers. The script retains ownership of temporary caches but does not delete a caller-provided directory. In https://github.com/openai/codex/actions/runs/27242495616, the duplicate download transferred 3.3 GiB and took 4 minutes 13 seconds. This should reduce total release time by about 4 minutes.
Tamir Duberstein ·
2026-06-10 13:15:43 -07:00
[codex] Preserve disabled MCP servers across runtime overlays (#27414)
## Why
Recent MCP runtime overlay changes replace same-name configured server entries with compatibility or extension-provided configs. Those replacement configs default to enabled, so an MCP server explicitly configured with `enabled = false` could be initialized anyway. The connection manager still filters disabled servers correctly, but the configured disabled state was lost before initialization reached that filter.

## What changed
- Remember MCP servers that are disabled in the configured view before applying runtime fallbacks and extension overlays.
- Restore `enabled = false` for those servers after overlays, while leaving all other overlay fields and `Remove` precedence unchanged.
- Add focused extension-backed regression coverage for a disabled `codex_apps` server.

## Testing
- `just fmt`
- `just test -p codex-mcp-extension`
- `just fix -p codex-core`
- `just fix -p codex-mcp-extension`

The full workspace `just test` suite was not run.

e-provencher ·
2026-06-10 16:11:20 -04:00
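For the MCP fix above, a sketch of the remember-then-restore approach with hypothetical types (`ServerConfig`, `Overlay`, `apply_overlays`). Restoring only entries that survived the overlays keeps `Remove` precedence intact, matching the description.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Clone, Debug, PartialEq)]
pub struct ServerConfig {
    pub command: String,
    pub enabled: bool,
}

/// Hypothetical overlay operations applied on top of the configured servers.
pub enum Overlay {
    Replace(String, ServerConfig),
    Remove(String),
}

pub fn apply_overlays(
    configured: HashMap<String, ServerConfig>,
    overlays: Vec<Overlay>,
) -> HashMap<String, ServerConfig> {
    // 1. Remember which servers the user explicitly disabled.
    let disabled: HashSet<String> = configured
        .iter()
        .filter(|(_, cfg)| !cfg.enabled)
        .map(|(name, _)| name.clone())
        .collect();

    // 2. Apply overlays in order; replacement configs may default to enabled.
    let mut servers = configured;
    for overlay in overlays {
        match overlay {
            Overlay::Replace(name, cfg) => {
                servers.insert(name, cfg);
            }
            Overlay::Remove(name) => {
                servers.remove(&name);
            }
        }
    }

    // 3. Re-disable surviving entries only, so `Remove` still takes precedence.
    for name in &disabled {
        if let Some(cfg) = servers.get_mut(name) {
            cfg.enabled = false;
        }
    }
    servers
}
```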
[codex] Skip local curated discovery for remote plugins (#27311)
## Summary
- skip the local `openai-curated` marketplace before marketplace loading when tool-suggest discovery uses remote plugins
- preserve existing marketplace listing behavior for all other callers and when remote plugins are disabled
- add regression coverage proving the curated marketplace is excluded before its malformed manifest can be read

## Why
Tool-suggest discovery previously loaded every local `openai-curated` plugin manifest and only discarded that marketplace afterward when remote plugins were enabled. The remote catalog is used in that mode, so the local scan consumed CPU without contributing discoverable plugins.

## Impact
Remote-plugin tool suggestion discovery no longer reads the local curated marketplace and its plugin manifests. `openai-bundled`, configured marketplaces, normal `plugin/list` behavior, and local curated discovery when remote plugins are disabled are unchanged.

## Validation
- `just test -p codex-core-plugins list_marketplaces_can_skip_openai_curated_before_loading`
- `just test -p codex-core list_tool_suggest_discoverable_plugins_omits_openai_curated_when_remote_enabled`
- `just fmt`
- `git diff --check`

xl-openai ·
2026-06-10 13:11:09 -07:00
[codex] add /import for external agents (#27071)
## Why
External-agent import should be discoverable and deliberate without blocking startup or claiming the public `codex [PROMPT]` CLI namespace. The slash command keeps the flow local to the interactive TUI and reuses the existing app-server import API.

## What changed
- add the user-facing `/import` slash command
- detect external-agent importable items only when the command is invoked
- run imports through the embedded local app-server
- show start and completion messages, refresh configuration, and block duplicate imports while one is pending
- reject the flow for unsupported remote and local-daemon sessions

## Validation
- `just test -p codex-tui external_agent_config_migration` (10 passed)
- manually exercised an isolated TUI fixture with existing external-agent setup and session data using a fresh `CODEX_HOME`
- verified picker customization, plugin and session detection, import completion, repeated invocation, and imported-session resume context
- the broader `just test -p codex-tui` run passed 2,805 tests, with 2 unrelated guardian feature-flag failures and 4 skipped tests

## Draft follow-ups
- review whether completion messaging should remain attached to the initiating chat if the user switches chats during an import
- review shutdown semantics for an in-progress background import

## Stack
1. [#27064](https://github.com/openai/codex/pull/27064): remove the startup migration flow
2. [#27065](https://github.com/openai/codex/pull/27065): extract the picker renderer
3. [#27070](https://github.com/openai/codex/pull/27070): add the external-agent import picker UX
4. [#27071](https://github.com/openai/codex/pull/27071): expose the flow through `/import`

**This PR is stack item 4.** Draft while the lower stack dependencies are reviewed.

stefanstokic-oai ·
2026-06-10 15:53:15 -04:00
[codex] Move release platform rules into bazel package (#27321)
## Intent
Keep release-specific Bazel helpers out of the shared Rust crate definitions and colocate them with Bazel platform configuration.

## Implementation
Moves `multiplatform_binaries` and its platform list from `defs.bzl` into `bazel/platforms/release_binaries.bzl` and updates the CLI load site. Behavior is unchanged.

## Validation
- `bazel query //codex-rs/cli:release_binaries`

Stack: 1 of 6.

Adam Perry @ OpenAI ·
2026-06-10 19:45:29 +00:00
[codex] add external agent import picker UX (#27070)
## Why
Users need to understand what external-agent data Codex detected, what is selected, and how to proceed before an import begins. The updated picker makes focus, selection state, and the submission path explicit while preserving the existing import backend.

## What changed
- replace the old migration prompt with a two-step external-agent import picker
- add a customize view with explicit item focus, selection state, counts, and a review action
- separate detected import data into a view model
- add Unix and Windows snapshots for prompt, item-focus, and action-focus states

## Validation
- `just test -p codex-tui external_agent_config_migration` (10 passed)
- manually exercised an isolated TUI fixture covering customization, selection toggles, review, import, repeated invocation, and session resume
- the broader `just test -p codex-tui` run passed 2,805 tests, with 2 unrelated guardian feature-flag failures and 4 skipped tests

## Review note
This is the largest layer in the stack because the interaction state, rendering changes, and required snapshots move together. It remains a draft in case reviewers prefer a further presentation/state split.

## Stack
1. [#27064](https://github.com/openai/codex/pull/27064): remove the startup migration flow
2. [#27065](https://github.com/openai/codex/pull/27065): extract the picker renderer
3. [#27070](https://github.com/openai/codex/pull/27070): add the external-agent import picker UX
4. [#27071](https://github.com/openai/codex/pull/27071): expose the flow through `/import`

**This PR is stack item 3.** Draft while the lower stack dependencies are reviewed.

stefanstokic-oai ·
2026-06-10 15:19:37 -04:00
Guard core test subprocess cleanup (#27343)
## Why
Local integration-heavy `codex-core` CLI tests can time out or be interrupted after spawning `codex exec`. Stopping only the direct child is not enough: `codex exec` can leave grandchildren behind, including `python3`/`python3.12` processes that get reparented to PID 1 and keep running after the test is gone.

This PR fixes that failure mode directly for the affected CLI integration tests, without changing production code or reducing local test concurrency.

## What
- Run the `cli_stream` `codex exec` subprocesses through a small private wrapper in `core/tests/suite/cli_stream.rs`.
- Spawn those subprocesses in their own process group before execution.
- Keep `.output()`-style stdout/stderr capture and the existing 30-second timeout behavior.
- Own each spawned process with a drop guard that kills the whole process group on success, timeout, panic, or other early return.

The switch from `assert_cmd::Command` to `std::process::Command` is only for these subprocess launches; `assert_cmd` does not expose a pre-spawn hook for setting the process group.

## Verification
- `just test -p codex-core --test all responses_mode_stream_cli`

This is limited to core integration tests; it does not change production `src` code paths.

Eric Traut ·
2026-06-10 12:19:26 -07:00
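For the subprocess cleanup above, a Unix-only sketch of a drop guard that owns a child spawned in its own process group. `CommandExt::process_group` is a real `std` API (Rust 1.64+); the standard library has no `killpg`, so this sketch shells out to `kill` with a negative PID, whereas the PR's actual wrapper in `core/tests/suite/cli_stream.rs` may use a different mechanism. `ProcessGroupGuard` is a hypothetical name.

```rust
use std::os::unix::process::CommandExt; // Unix-only: provides `process_group`
use std::process::{Child, Command, Stdio};

/// Owns a child spawned as the leader of a new process group and kills the
/// whole group (the child and any descendants still in that group) on drop,
/// whether the caller returned normally, timed out, or panicked.
pub struct ProcessGroupGuard {
    child: Child,
}

impl ProcessGroupGuard {
    pub fn spawn(mut cmd: Command) -> std::io::Result<Self> {
        // 0 means "start a new group whose ID equals the child's PID".
        cmd.process_group(0);
        Ok(Self { child: cmd.spawn()? })
    }

    pub fn pgid(&self) -> u32 {
        self.child.id()
    }
}

impl Drop for ProcessGroupGuard {
    fn drop(&mut self) {
        // A negative PID targets the whole process group. std has no killpg,
        // so this sketch delegates to the shell's `kill` instead of libc.
        let _ = Command::new("sh")
            .arg("-c")
            .arg(format!("kill -9 -{}", self.pgid()))
            .stderr(Stdio::null()) // the group may already be gone
            .status();
        // Reap the direct child so it does not remain a zombie.
        let _ = self.child.wait();
    }
}
```

Because the guard is also dropped during unwinding, a panicking assertion in the test body still triggers the group kill (assuming the default `panic = "unwind"`).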
feat: make ThreadStore available on ThreadExtensionDependencies (#27439)
Generally useful for extensions.
Michael Bolin ·
2026-06-10 15:17:15 -04:00
[plugins] Inject remote_plugin_id into install elicitations (#26409)
Summary
- Propagate cached remote plugin IDs through Codex plugin discovery.
- Inject `remote_plugin_id` and connector IDs into `request_plugin_install` elicitation `_meta` from the resolved plugin.
- Keep the remote plugin ID out of the model-facing tool schema, arguments, and result.

Validation
- `just test -p codex-tools`
- `just test -p codex-core-plugins`
- `just test -p codex-core list_tool_suggest_discoverable_plugins_includes_cached_remote_global_plugins`
- `just fix -p codex-tools`
- `just fix -p codex-core-plugins`
- `just fix -p codex-core`
- `git diff --check`
- `just test -p codex-core` was also attempted: 2,581 passed, 55 failed, and 1 timed out across unrelated sandbox/environment-sensitive integration tests.

Alex Daley ·
2026-06-10 12:01:03 -07:00
[codex] extract external agent import picker renderer (#27065)
## Why
The external-agent import picker is easier to review when its rendering refactor lands separately from new state and interaction behavior. This layer is intended to be behavior-neutral.

## What changed
- extract external-agent migration rendering into a dedicated `render` module
- preserve existing behavior while separating presentation from interaction logic
- establish a smaller foundation for the import picker UX in the next PR

## Validation
- `just test -p codex-tui external_agent_config_migration` (10 passed)

## Stack
1. [#27064](https://github.com/openai/codex/pull/27064): remove the startup migration flow
2. [#27065](https://github.com/openai/codex/pull/27065): extract the picker renderer
3. [#27070](https://github.com/openai/codex/pull/27070): add the external-agent import picker UX
4. [#27071](https://github.com/openai/codex/pull/27071): expose the flow through `/import`

**This PR is stack item 2.** Draft while the lower stack dependency is reviewed.

stefanstokic-oai ·
2026-06-10 14:48:30 -04:00
[codex] Retry transient Guardian review failures (#27062)
## Background
Codex can use **Auto Review** for permission requests. Instead of asking the user immediately, Codex starts a separate locked-down reviewer session called **Guardian**, which returns a structured `allow` or `deny` assessment.

The Guardian reviewer is itself a Codex session, so its model request can fail for transient infrastructure reasons such as model overload, HTTP connection failure, or response-stream disconnect. Today, any such failure immediately ends the Auto Review attempt and blocks the action. This PR adds bounded retries for failures that the existing protocol explicitly identifies as transient.

Linear context: [CA-539](https://linear.app/openai/issue/CA-539/retry-auto-review-infrastructure-failures-and-fall-back-to-manual)

## What changes
A Guardian review can now make at most **three total attempts**:
1. Run the review normally.
2. Retry after a jittered delay of roughly 180–220 ms if the first attempt fails with an eligible error.
3. Retry after a jittered delay of roughly 360–440 ms if the second attempt also fails with an eligible error.

All attempts share the original review deadline. Jitter spreads retries from concurrent clients to reduce synchronized load during broader outages. The retries do not reset the user's maximum wait time, and the backoff waits terminate early if the review is cancelled or the deadline expires.

Before retrying, the existing Guardian session lifecycle decides whether the session remains usable. Healthy trunks are reused, broken trunks are removed by the existing cleanup path, and ephemeral sessions continue to clean themselves up.

The review still emits one logical lifecycle to clients. Recoverable intermediate failures do not produce warnings or terminal events.
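For the retry schedule above, a sketch of jittered exponential backoff consistent with the stated bounds (about 200 ms, then 400 ms, each ±10%) plus a shared-deadline check. The base, doubling factor, and jitter formula are inferred from those bounds, and `backoff_delay`/`next_retry_at` are hypothetical names; the PR's shared backoff helper is not shown. The jitter input is passed in so the sketch stays deterministic.

```rust
use std::time::{Duration, Instant};

/// Delay before retry `retry` (1-based): 200 ms doubled per retry, scaled by
/// a jitter factor in [0.9, 1.1]. `unit` is a caller-supplied value in [0, 1);
/// a real caller would draw it from a random source.
pub fn backoff_delay(retry: u32, unit: f64) -> Duration {
    let base_ms = 200.0 * 2f64.powi(retry.saturating_sub(1) as i32);
    let jitter = 0.9 + 0.2 * unit.clamp(0.0, 1.0);
    Duration::from_secs_f64(base_ms * jitter / 1000.0)
}

/// All attempts share one deadline: a retry is scheduled only if its delay
/// ends before the original review deadline, so retries never extend the wait.
pub fn next_retry_at(now: Instant, deadline: Instant, retry: u32, unit: f64) -> Option<Instant> {
    let at = now + backoff_delay(retry, unit);
    (at < deadline).then_some(at)
}
```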
## Retry policy ### Retried up to twice - model/server overload - HTTP connection failure - response-stream connection failure - response-stream disconnect - internal server error - a final reviewer message that cannot be parsed as the required Guardian assessment ### Not retried - bad or invalid requests - authentication failures - usage limits - cyber-policy failures - errors without a structured category - a request that already exhausted the lower-level Responses retry budget - a completed Guardian turn with no assessment payload - prompt-construction failures - Guardian review timeout - cancellation or abort - a valid `deny` assessment The session-error classification uses `ErrorEvent.codex_error_info`; it does not inspect error-message strings. ## Implementation notes - `wait_for_guardian_review` preserves the complete `ErrorEvent`, including structured `codex_error_info`. - Guardian session failures preserve the original message and optional structured `CodexErrorInfo`. - The retry policy classifies the explicitly transient `CodexErrorInfo` variants; unknown, absent, and deterministic categories are not retried. - The Guardian session manager receives the caller's deadline rather than creating a new timeout per attempt. - Analytics record the final `attempt_count`. - Retry orchestration does not add a separate session-cleanup protocol; it relies on the existing trunk and ephemeral lifecycle decisions. 
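The retry policy above can be sketched as a small pure function. This is a minimal sketch, not the `codex-core` implementation, and it rests on several assumptions. `ErrorCategory` is a hypothetical stand-in for the transient `CodexErrorInfo` variants. The 200 ms base delay with ±10% jitter is inferred from the 180–220 ms and 360–440 ms ranges. The caller supplies the jitter value so the example needs no RNG crate. Deadline handling is also simplified: the sketch declines a retry whose backoff would overrun the shared deadline, instead of interrupting a wait that is already in progress.

```rust
use std::time::{Duration, Instant};

/// Hypothetical error categories; only the first five are treated as transient.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ErrorCategory {
    ServerOverloaded,
    HttpConnectionFailed,
    ResponseStreamConnectionFailed,
    ResponseStreamDisconnected,
    InternalServerError,
    BadRequest,
    Unauthorized,
    UsageLimitExceeded,
}

/// Total attempts, including the first one.
const MAX_ATTEMPTS: u32 = 3;
/// Assumed base delay before the first retry.
const BASE_DELAY_MS: u64 = 200;

/// Retry only explicitly transient categories; a missing category is never retried.
fn is_retryable(category: Option<ErrorCategory>) -> bool {
    matches!(
        category,
        Some(
            ErrorCategory::ServerOverloaded
                | ErrorCategory::HttpConnectionFailed
                | ErrorCategory::ResponseStreamConnectionFailed
                | ErrorCategory::ResponseStreamDisconnected
                | ErrorCategory::InternalServerError
        )
    )
}

/// Delay before retry `retry` (1-based). The base delay doubles per retry and is
/// scaled into [90%, 110%] by `jitter_permille`, a caller-supplied value in 0..=200.
fn backoff_delay(retry: u32, jitter_permille: u64) -> Duration {
    let base_ms = BASE_DELAY_MS << retry.saturating_sub(1);
    let scale = 900 + jitter_permille.min(200);
    Duration::from_millis(base_ms * scale / 1000)
}

/// Returns the delay before the next attempt, or `None` when the review should stop:
/// attempts are exhausted, the failure is not transient, or the backoff would end
/// after the deadline shared by every attempt.
fn next_retry_delay(
    attempts_made: u32,
    category: Option<ErrorCategory>,
    jitter_permille: u64,
    now: Instant,
    deadline: Instant,
) -> Option<Duration> {
    if attempts_made >= MAX_ATTEMPTS || !is_retryable(category) {
        return None;
    }
    let delay = backoff_delay(attempts_made, jitter_permille);
    (now + delay < deadline).then_some(delay)
}
```

Because the policy is a pure function, its jitter bounds can be unit-tested directly. In the implementation described above, the wait itself is additionally interrupted by cancellation or deadline expiry.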
## Automated testing Focused Guardian coverage verifies: - every supported transient `CodexErrorInfo` is classified as retryable, while absent and non-transient categories are not; - structured transient session failure -> retry -> approval with the healthy trunk reused; - two invalid Guardian responses -> third attempt -> approval, with exactly three requests; - three invalid responses -> existing fail-closed result, with exactly three requests and one terminal lifecycle; - valid denial, missing payload, invalid request, timeout, cancellation, and prompt/session construction failures are not retried; - retry eligibility ends after the third attempt; - retry delays use the shared exponential backoff helper and remain within the expected jitter bounds; - cancellation and deadline expiry interrupt the backoff wait; - healthy trunks are reused across retryable failures; - broken event streams remove the trunk through the existing lifecycle cleanup; - an ephemeral retry does not disturb a concurrent trunk review. Validation performed: - `just test -p codex-core guardian_review_ guardian_ephemeral_retry_preserves_parallel_trunk_and_fork_history run_review_removes_trunk_when_event_stream_is_broken` — **42 passed**; - `just test -p codex-analytics` — **71 passed**; - scoped Clippy fixes for `codex-core` and `codex-analytics` passed. A prior full `codex-core` run had unrelated environment-sensitive failures outside Guardian coverage. ## Manual QA The focused integration tests use the local mock Responses server to inspect exact request counts and emitted lifecycle events. They confirm that retries are internal, a successful later attempt supplies the final decision, non-retryable failures issue only one request, and exhausted retries emit only one terminal result.
kbazzi ·
2026-06-10 11:46:57 -07:00 -
[codex] Raise app-server recursion limit (#27421)
## Summary Unblock Rust release builds after tracing instrumentation increased the async future query depth beyond rustc's default limit. Set the `codex-app-server` crate recursion limit to 256. This changes compilation only; runtime behavior is unchanged. ## Validation - `just test -p codex-app-server` - `cargo build --release --bin codex-app-server`
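For reference, a crate-level recursion limit is set with an inner attribute at the top of the crate root. This fragment assumes the root is `src/lib.rs` or `src/main.rs`. rustc's default limit is 128, and raising it only affects how deeply compilation may nest.

```rust
// Must precede all items in the crate root, e.g. `src/main.rs` or `src/lib.rs`.
// rustc's default is 128; this changes compilation limits only, not runtime behavior.
#![recursion_limit = "256"]
```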
Adam Perry @ OpenAI ·
2026-06-10 11:37:14 -07:00 -
[codex] remove blocking external agent migration flow (#27064)
## Why External-agent import should be initiated deliberately instead of interrupting eligible TUI startups. This cleanup removes the blocking startup flow before the replacement import experience is introduced later in the stack. ## What changed - remove the startup-blocking external-agent migration prompt - remove the now-unused external migration feature gate - remove the obsolete TUI app-server migration wrappers - retain the dormant picker behind a module-scoped dead-code allowance until the next stack item wires it back in - keep normal TUI startup focused on entering Codex immediately ## Validation - `bazel build --config=clippy //codex-rs/tui:tui //codex-rs/tui:tui-unit-tests-bin` - `just test -p codex-tui external_agent_config_migration` (8 passed) - `just test -p codex-tui` (2,786 passed, 12 unrelated local environment-sensitive failures, 4 skipped) - `just fix -p codex-tui` - `just fmt` ## Stack 1. [#27064](https://github.com/openai/codex/pull/27064): remove the startup migration flow 2. [#27065](https://github.com/openai/codex/pull/27065): extract the picker renderer 3. [#27070](https://github.com/openai/codex/pull/27070): add the external-agent import picker UX 4. [#27071](https://github.com/openai/codex/pull/27071): expose the flow through `/import` **This PR is stack item 1.**
stefanstokic-oai ·
2026-06-10 14:25:04 -04:00 -
fix: Auto-recover from corrupted sqlite databases (#26859)
Further investigation of the SQLite incidents showed that the problems stem from corruption introduced by the older SQLite version we recently upgraded from. It also showed that the data in the root database is genuinely corrupted, so not all of it can be recovered. Because the data can be reconstructed from the rollouts on disk, we should simply back up the database automatically and let Codex rebuild the rollout information from the on-disk rollouts. With this change, app-server automatically backs up and rebuilds the database, and its logs record both steps. The CLI now displays a message saying that this happened, with the paths of the backed-up corrupt database and the new database. The rebuild information is also attached as context so that the desktop app can read it and inform the user.
David de Regt ·
2026-06-10 11:24:29 -07:00 -
Add app-server `thread/delete` API (#25018)
## Why Clients can archive and unarchive threads today, but there is no app-server API for permanently removing a thread. Deletion also needs to cover the full session tree: deleting a main thread should remove spawned subagent threads and the related local metadata instead of leaving orphaned rollout files, goals, or subagent state behind. ## What - Adds the v2 `thread/delete` request and `thread/deleted` notification, with the response shape kept consistent with `thread/archive`. - Implements local hard delete for active and archived rollout files. - Deletes the requested thread's state DB row as the commit point, then best-effort cleans associated state including spawned descendants, goals, spawn edges, logs, dynamic tools, and agent job assignments. - Updates app-server API docs and generated protocol schema/TypeScript fixtures.
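The commit-point ordering can be sketched with an in-memory store. Everything here is hypothetical: `StateStore`, its fields, and the string thread ids are not the real `codex-state` API. The sketch only illustrates the ordering. Removing the requested thread's row decides whether the delete succeeds, and the descendant cleanup that follows is best-effort and never turns a committed delete into a failure.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical in-memory stand-in for the state DB.
#[derive(Default)]
struct StateStore {
    threads: HashSet<String>,
    /// Spawn edges: parent thread id -> spawned subagent thread ids.
    spawn_edges: HashMap<String, Vec<String>>,
    goals: HashMap<String, String>,
}

impl StateStore {
    /// Deletes `thread_id` and its spawned descendants, returning the removed ids.
    fn delete_thread(&mut self, thread_id: &str) -> Result<Vec<String>, String> {
        // Commit point: success or failure is decided by the requested thread's row.
        if !self.threads.remove(thread_id) {
            return Err(format!("thread {thread_id} not found"));
        }
        let mut removed = vec![thread_id.to_string()];
        let mut pending = vec![thread_id.to_string()];
        // Best-effort cleanup: rows that are already missing are skipped, not reported.
        while let Some(id) = pending.pop() {
            self.goals.remove(&id);
            for child in self.spawn_edges.remove(&id).unwrap_or_default() {
                if self.threads.remove(&child) {
                    removed.push(child.clone());
                }
                pending.push(child);
            }
        }
        Ok(removed)
    }
}
```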
Eric Traut ·
2026-06-10 11:22:12 -07:00 -
Add app-server background terminal process APIs (#26041)
## Summary Codex Apps needs app-server as the source of truth for chat-started background terminals instead of guessing from local process trees. This PR adds experimental v2 APIs to list and terminate background terminals for a loaded thread using app-server process ids, so clients can manage background terminals without local PID discovery. ## Changes - `thread/backgroundTerminals/list` returns paginated background terminal records with `itemId`, app-server `processId`, `command`, `cwd`, nullable `osPid`, nullable `cpuPercent`, and nullable `rssKb`. - `thread/backgroundTerminals/terminate` terminates one running background terminal by app-server `processId` and returns whether a process was terminated. - Background terminal list and terminate operations use unified-exec process manager state as their source of truth.
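The following is a minimal sketch of the list record and cursor pagination. The field names are the listed camelCase wire fields written in snake_case. The `Page` shape, the index-based cursor, and `list_page` itself are assumptions for illustration, and the real app-server protocol types may differ. The `Option` fields model the nullable `osPid`, `cpuPercent`, and `rssKb`.

```rust
/// One background terminal as a client might see it; `Option` fields are nullable on the wire.
#[derive(Debug, Clone, PartialEq)]
struct BackgroundTerminal {
    item_id: String,
    process_id: String, // app-server process id, not an OS pid
    command: String,
    cwd: String,
    os_pid: Option<u32>,
    cpu_percent: Option<f64>,
    rss_kb: Option<u64>,
}

/// Hypothetical page shape: `next_cursor` is `None` on the last page.
#[derive(Debug, PartialEq)]
struct Page {
    data: Vec<BackgroundTerminal>,
    next_cursor: Option<usize>,
}

/// Returns up to `limit` records starting at `cursor` (a plain index in this sketch).
fn list_page(all: &[BackgroundTerminal], cursor: Option<usize>, limit: usize) -> Page {
    let start = cursor.unwrap_or(0).min(all.len());
    let end = start.saturating_add(limit).min(all.len());
    Page {
        data: all[start..end].to_vec(),
        next_cursor: (end < all.len()).then_some(end),
    }
}
```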
Eric Traut ·
2026-06-10 11:18:09 -07:00 -
[codex] Remove async_trait from ToolExecutor (#27304)
## Why We're now [discouraging use of `async_trait`](https://github.com/openai/codex/pull/20242). Removing use of `async_trait` from `ToolExecutor` reduces `codex_core` debug test build time by ~78% (from 227.5s to 50.3s) on my machine. Stacked on #27299, this PR applies the trait change after the handler bodies have been outlined. ## What Changed `ToolExecutor::handle` to return an explicit boxed `ToolExecutorFuture` instead of using `async_trait`. Updated `ToolExecutor` implementors to return `Box::pin(...)`, re-exported the future alias through `codex-tools` and `codex-extension-api`, and removed the direct `async-trait` dependency from `codex-tools`.
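The pattern from this PR and its prefactor (#27299) can be sketched as follows. The handler body lives in an inherent `async fn handle_call`, and the trait method boxes that future explicitly instead of relying on `async_trait`. The trait signature, `EchoTool`, and the `Result<String, String>` output are hypothetical simplifications. `block_on` is a test-only executor for futures that complete without waiting, and it is not part of Codex.

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

/// Boxed, `Send` future returned by the trait method; `#[async_trait]` used to
/// generate this type implicitly.
type ToolExecutorFuture<'a> =
    Pin<Box<dyn Future<Output = Result<String, String>> + Send + 'a>>;

/// Hypothetical, simplified trait; it stays object-safe because `handle` has no type parameters.
trait ToolExecutor: Send + Sync {
    fn handle<'a>(&'a self, input: &'a str) -> ToolExecutorFuture<'a>;
}

struct EchoTool;

impl EchoTool {
    /// Inherent async method holding the outlined handler body.
    async fn handle_call(&self, input: &str) -> Result<String, String> {
        if input.is_empty() {
            return Err("empty input".to_string());
        }
        Ok(format!("echo: {input}"))
    }
}

impl ToolExecutor for EchoTool {
    fn handle<'a>(&'a self, input: &'a str) -> ToolExecutorFuture<'a> {
        // Box the inherent future explicitly instead of using `async_trait`.
        Box::pin(self.handle_call(input))
    }
}

/// Test-only executor: polls with a no-op waker, so it spins on futures that stay pending.
fn block_on<F: Future>(future: F) -> F::Output {
    unsafe fn clone_waker(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    unsafe fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone_waker, noop, noop, noop);

    // SAFETY: every vtable function ignores the (null) data pointer.
    let waker = unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) };
    let mut cx = Context::from_waker(&waker);
    let mut future = std::pin::pin!(future);
    loop {
        if let Poll::Ready(output) = future.as_mut().poll(&mut cx) {
            return output;
        }
    }
}
```

Because `handle` returns a concrete boxed type, tools can still be stored as `Box<dyn ToolExecutor>`. Native `async fn` in the trait would make the trait unusable as a trait object.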
Adam Perry @ OpenAI ·
2026-06-10 10:26:53 -07:00 -
Fix compressed rollout search path matching (#27407)
## Why `thread/search` found content inside compressed rollouts but could drop the result when joining it with SQLite-backed thread metadata. Search returned the physical `.jsonl.zst` path while SQLite retained the logical `.jsonl` path, so exact path matching failed. ## What changed - Key rollout search matches by their canonical logical `.jsonl` path, independent of the on-disk representation. - Canonicalize thread-list paths before joining them with content-search matches. - Update compressed-rollout coverage to assert the logical-path contract. ## Validation - Ran `just fmt`. - Ran `git diff --check`. - Tests and Clippy were intentionally left to CI.
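The canonicalization step can be sketched as follows. The sketch assumes that compressed rollouts differ from their logical paths only by a trailing `.zst`. `logical_rollout_path` is a hypothetical helper, not the function used in `codex-rollout`.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: maps `foo.jsonl.zst` to `foo.jsonl` and leaves any other
/// path unchanged, so search matches and SQLite metadata share one join key.
fn logical_rollout_path(physical: &Path) -> PathBuf {
    let is_zstd = physical.extension().is_some_and(|ext| ext == "zst");
    let without_zst = physical.with_extension("");
    if is_zstd && without_zst.extension().is_some_and(|ext| ext == "jsonl") {
        without_zst
    } else {
        physical.to_path_buf()
    }
}
```

Applying the same mapping to both sides before the join makes exact matching independent of the on-disk representation.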
jif ·
2026-06-10 19:23:42 +02:00 -
Index visible thread list ordering (#27391)
## Summary - add partial SQLite indexes for visible thread lists ordered by creation or update time - match the `archived` and non-empty `preview` filters used by `thread/list` - add query-plan coverage for both supported sort orders ## Query performance Benchmarked the production query shape on a snapshot of my database with ~10k threads before and after applying these indexes. The query selected the full thread projection with `archived = 0`, `preview <> ''`, the `openai` provider filter, and a page size of 201. Results are the mean of 30 runs after 5 warmups:

| Query | Before | After | Speedup |
| --- | ---: | ---: | ---: |
| First page, `created_at_ms DESC` | 132.3 ms | 15.1 ms | 8.78x |
| First page, `updated_at_ms DESC` | 123.6 ms | 15.5 ms | 7.99x |
| Cursor page near row 4,000, `created_at_ms DESC` | 51.8 ms | 16.8 ms | 3.07x |
| Cursor page near row 4,000, `updated_at_ms DESC` | 52.4 ms | 17.1 ms | 3.06x |

Before this change, SQLite used `idx_threads_archived`, filtered the candidate rows, and built a temporary B-tree for the requested ordering. With the partial indexes, SQLite reads matching visible rows directly in timestamp order and stops at the page limit. `EXPLAIN QUERY PLAN` no longer reports `USE TEMP B-TREE FOR ORDER BY`. The result rows were identical before and after. The two partial indexes occupy approximately 168 KiB combined on this snapshot. ## Performance under contention I noticed this issue on a database under high contention and used simulated contention to validate the performance in that setting. A synthetic SQLite benchmark ran five concurrent readers, matching the state database pool size, and fetched 101 rows per query.
Results are the median of three runs on fresh copies of the same database snapshot:

| Query | Before | After |
| --- | ---: | ---: |
| `created_at_ms` mean latency under saturation | 328 ms | 12 ms |
| `created_at_ms` throughput | 16 queries/s | 412 queries/s |
| `updated_at_ms` mean latency under saturation | 336 ms | 14 ms |
| `updated_at_ms` throughput | 15 queries/s | 357 queries/s |

For a burst of 100 queries queued through five connections, p95 completion time fell from 6.90 seconds to 226 ms for `created_at_ms`, and from 6.31 seconds to 473 ms for `updated_at_ms`. ## Validation - `just test -p codex-state` (135 tests passed) - query-plan regression covers created-at and updated-at ordering, requires the corresponding index, and rejects `TEMP B-TREE` - `just fmt`
Zanie Blue ·
2026-06-10 11:52:17 -05:00 -
[codex] Outline ToolExecutor handler bodies (#27299)
## Why We're now [discouraging use of `async_trait`](https://github.com/openai/codex/pull/20242). Removing use of `async_trait` from `ToolExecutor` reduces `codex_core` debug test build time by ~78% (from 227.5s to 50.3s) on my machine. For ease of review, this is a prefactor that extracts trait method implementations into inherent methods, so that the indentation changes in the follow-up do not produce a huge diff. ## What Outlined existing `ToolExecutor::handle` bodies into inherent async `handle_call` methods across core and extension tool handlers. The trait methods still use `async_trait` and now delegate to `self.handle_call(...).await`; handler behavior is unchanged.
Adam Perry @ OpenAI ·
2026-06-10 09:40:41 -07:00 -
Reduce archive rollout lookup CPU (#27276)
## Why Archiving a thread can spike app-server CPU when the state DB does not have a usable rollout path. The archive path falls back to locating the rollout by thread id; because rollout filenames already contain the UUID, the cheap fallback should find the file directly before invoking broader file search. ## What Changed - In `codex-rs/rollout/src/list.rs`, try the exact rollout filename lookup before `codex-file-search`. - Keep fuzzy search as the final legacy fallback when no filename match is found. - Preserve the legacy fallback when the filename scan hits a traversal error, so an inaccessible stale subtree does not block lookup elsewhere. ## Verification - `just test -p codex-rollout` - `just test -p codex-thread-store` - `just test -p codex-app-server thread_archive`
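The lookup order can be sketched as a function over the results of a directory walk. As the description states, the sketch assumes that rollout filenames contain the thread UUID. `locate_rollout`, `Lookup`, and the `fuzzy_search` callback are hypothetical; the callback stands in for the `codex-file-search` fallback. Unreadable entries are skipped so that a stale subtree cannot block the scan, and the fuzzy search runs only when no filename matches.

```rust
use std::io;
use std::path::PathBuf;

#[derive(Debug, PartialEq)]
enum Lookup {
    /// A filename containing the thread id was found directly.
    Exact(PathBuf),
    /// No filename matched; this holds the result of the fuzzy fallback.
    Fallback(Option<PathBuf>),
}

/// Hypothetical helper: scans walk results for a filename containing `thread_id`
/// and runs `fuzzy_search` only when nothing matches.
fn locate_rollout<I, F>(entries: I, thread_id: &str, fuzzy_search: F) -> Lookup
where
    I: IntoIterator<Item = io::Result<PathBuf>>,
    F: FnOnce() -> Option<PathBuf>,
{
    for entry in entries {
        // A traversal error (e.g. an unreadable stale subtree) is skipped, not fatal.
        let Ok(path) = entry else { continue };
        let name_matches = path
            .file_name()
            .and_then(|name| name.to_str())
            .is_some_and(|name| name.contains(thread_id));
        if name_matches {
            return Lookup::Exact(path);
        }
    }
    Lookup::Fallback(fuzzy_search())
}
```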
Eric Traut ·
2026-06-10 09:28:12 -07:00 -
[codex] link Windows releases with LLD (#27315)
Windows x64 release builds spend about 36.5 of 48 minutes in final LLVM code generation and MSVC linking. Use the existing target-aware MSVC setup action to select LLD for release builds; the Windows ARM64 archive path already exercises the action and its LLD wrapper. In https://github.com/openai/codex/actions/runs/27242495616, macOS becomes the critical path after roughly four minutes of Windows improvement, so this is expected to reduce total workflow time by about four minutes.
Tamir Duberstein ·
2026-06-10 09:18:19 -07:00 -
[codex] add io PathUri native conversion APIs (#27280)
## Why Discovered some rough edges in the API while making use of it more widely within exec-server. It would be a lot more convenient for existing users of `AbsolutePathBuf` if `PathUri` conversion methods returned `std::io::Result`s. ## What * `PathUri::to_native_path()` -> `PathUri::to_abs_path()` * `PathUri::from_file_path()` -> `PathUri::from_abs_path()`
Adam Perry @ OpenAI ·
2026-06-10 08:51:17 -07:00 -
[codex] Store compact window id in rollout (#27264)
## Why Compaction window identity is part of session history, not model-client transport state. Persisting it with the compacted rollout item lets resumed threads continue from the reconstructed window without keeping mutable window state on `ModelClient`. ## What changed - Added `window_id` to `CompactedItem` and stamped it when `replace_compacted_history` installs compacted history. - Moved auto-compact window id ownership into `AutoCompactWindow` / `SessionState`; `ModelClient` now receives the request window id from callers instead of storing it. - Returned `window_id` from rollout reconstruction for resume. Reconstruction uses the newest surviving compacted item's stored `window_id` when present, and falls back to the legacy compacted-item count when it is absent. - Kept fork startup at the fresh default window id and updated direct model-client tests to pass explicit test window ids. ## Validation - `cargo check -p codex-core --tests`
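The resume rule can be sketched as follows. `RolloutItem` and the `u64` window id are hypothetical simplifications of the real rollout types. The sketch assumes that the slice already contains only surviving items and that a history with no compacted items maps to window id 0.

```rust
/// Hypothetical, simplified rollout item.
#[allow(dead_code)]
#[derive(Debug)]
enum RolloutItem {
    Message(String),
    /// `window_id` is `None` for items written before it was persisted.
    Compacted { window_id: Option<u64> },
}

/// Uses the newest compacted item's stored window id when present; otherwise
/// falls back to the legacy rule of counting compacted items.
fn reconstruct_window_id(items: &[RolloutItem]) -> u64 {
    let compacted: Vec<Option<u64>> = items
        .iter()
        .filter_map(|item| match item {
            RolloutItem::Compacted { window_id } => Some(*window_id),
            RolloutItem::Message(_) => None,
        })
        .collect();
    match compacted.last() {
        Some(Some(window_id)) => *window_id,
        _ => compacted.len() as u64,
    }
}
```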
pakrym-oai ·
2026-06-10 08:47:16 -07:00 -
Use latest-wins MCP manager replacement (#27259)
## Summary We originally addressed startup prewarming holding the read side of `RwLock<McpConnectionManager>` by snapshotting tool-list state. Review feedback identified the broader ownership problem: the outer synchronization should only publish or retrieve the current manager, while MCP operations rely on the manager's internal synchronization. A follow-up preserved operation retirement with a separate gate, but further review questioned whether that synchronization was actually required and whether we could support latest-wins replacement instead. This PR now stores the current MCP manager in `ArcSwap`. Each operation uses `load_full()` to obtain an owned `Arc<McpConnectionManager>`, then performs MCP I/O without retaining the publication mechanism. Refresh cancels obsolete startup work, constructs a replacement, and atomically publishes it. New operations see the latest manager, while operations that already loaded the previous manager retain a valid handle. Refresh happens at a turn boundary, so there should be no active user tool calls to drain. Git history supports dropping the outer `RwLock`. It was introduced in `03ffe4d595` on November 17, 2025 for non-blocking MCP startup: the session published an empty manager, startup initialized that same object while holding the write lock, and readers waited for initialization. `7cd2e84026` on February 19, 2026 removed that two-phase initialization in favor of constructing a fresh manager and swapping it in, explicitly noting that `Option` or `OnceCell` could replace the placeholder design. Hot reload later reused the existing lock to publish a replacement, but I found no indication that the lock was introduced to guarantee in-flight tool calls finish before refresh or shutdown. Terminal shutdown remains separate from refresh: it aborts startup prewarming and active tasks before shutting down the current manager, so tool calls may be interrupted and no model WebSocket work continues after shutdown. 
Focused regression coverage exercises pending tool-list cancellation, deferred refresh, and startup-prewarm shutdown.
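Latest-wins publication can be illustrated with only the standard library. This is a swapped-in technique: `ArcSwap` avoids locking on reads, whereas this sketch uses an `RwLock<Arc<_>>` that is held only long enough to clone or replace the `Arc`. The ownership behavior is the same. An operation that loaded the old manager keeps a valid handle after a refresh publishes a new one. `Manager` and `CurrentManager` are hypothetical.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical manager; `generation` identifies which refresh produced it.
struct Manager {
    generation: u32,
}

/// std-only stand-in for `ArcSwap<Manager>`: the lock is held only to clone or
/// replace the `Arc`, never across MCP I/O.
struct CurrentManager {
    slot: RwLock<Arc<Manager>>,
}

impl CurrentManager {
    fn new(manager: Manager) -> Self {
        Self { slot: RwLock::new(Arc::new(manager)) }
    }

    /// Like `ArcSwap::load_full`: returns an owned handle to the current manager.
    fn load_full(&self) -> Arc<Manager> {
        Arc::clone(&*self.slot.read().unwrap())
    }

    /// Latest-wins publication; handles loaded earlier keep the old manager alive.
    fn store(&self, next: Manager) {
        *self.slot.write().unwrap() = Arc::new(next);
    }
}
```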
Charlie Marsh ·
2026-06-10 08:33:21 -07:00 -
Remove async-trait from extension contributors (#27383)
## Why Extension contributors are registered behind `dyn Trait` objects, so native `async fn`/RPITIT methods would make these traits non-object-safe. Spell out the boxed, `Send` future contract directly so `extension-api` no longer needs `async-trait` while retaining the existing runtime model. ## What changed - add a shared `ExtensionFuture` alias and use it for asynchronous contributor methods - migrate production and test implementations to return `Box::pin(async move { ... })` - remove `async-trait` dependencies where they are no longer used, keeping it dev-only where unrelated test executors still require it ## Behavior No behavior change is intended. Contributor futures remain boxed, `Send`, dynamically dispatched, and lazily executed; cancellation and callback ordering stay unchanged. ## Testing - `just test -p codex-extension-api` (11 passed) - affected extension crates (64 passed) - targeted `codex-core` contributor tests (14 passed) - `just fmt` - `just bazel-lock-update` - `just bazel-lock-check` A broad local `codex-core` run compiled successfully but encountered unrelated sandbox and missing test-binary fixture failures; CI will run the full checks.
jif ·
2026-06-10 14:31:09 +02:00 -
[codex] Tag multi-agent spawn metrics with version (#27375)
## Summary - tag legacy multi-agent spawn metrics with `version=v1` - tag multi-agent v2 spawn metrics with `version=v2` ## Why `codex.multi_agent.spawn` is emitted by both runtimes, so the existing metric cannot distinguish v2 adoption from aggregate multi-agent spawning. The bounded version tag makes that breakdown directly queryable without changing the counter's success-only semantics. ## Validation - `just fmt` - `git diff --check` - Tests and Clippy were intentionally left to CI.
jif ·
2026-06-10 13:06:48 +02:00 -
Use plugin-service MCP as the hosted plugin runtime (#27198)
## Stack - Base: #27191 - This PR is the third vertical and should be reviewed against `jif/external-plugins-2`, not `main`. ## Why #27191 moves the host-owned Apps MCP registration behind an extension contributor, but deliberately preserves the existing endpoint-selection feature while that contribution contract lands. App-server can therefore resolve the server through extensions, yet the hosted plugin endpoint is still selected through temporary `apps_mcp_path_override` plumbing. That is not the long-term plugin model. A plugin can bundle skills, connectors, MCP servers, and hooks, and those components do not all need the same source or execution environment. In particular, an authenticated HTTP MCP server can expose plugin capabilities directly from a backend without an executor or an orchestrator filesystem. This PR completes that hosted vertical. App-server's MCP extension now owns the aggregate hosted plugin runtime at `/ps/mcp`. Connector actions continue to arrive as MCP tools, while backend-provided skills arrive as MCP resources and use Codex's existing resource list/read paths. No second backend client, skill filesystem, or generic plugin activation framework is introduced. The backend route remains the hosted implementation. This change replaces Codex's temporary endpoint-selection mechanism, not the service behind the endpoint. ## What changed ### Hosted plugin runtime The MCP extension now contributes `codex_apps` as the hosted plugin runtime rather than as a configurable Apps endpoint: - `https://chatgpt.com` resolves to `https://chatgpt.com/backend-api/ps/mcp`; - a bare custom ChatGPT base resolves to `/api/codex/ps/mcp`; - the existing product-SKU header and ChatGPT authentication behavior are preserved; - executor availability is never consulted for this streamable HTTP transport. 
The same MCP connection carries both component shapes supported by the hosted endpoint: - connector actions are discovered and invoked as MCP tools; - hosted skills are enumerated and read as MCP resources through the existing `list_mcp_resources` and `read_mcp_resource` paths. This keeps component access in the subsystem that already owns the protocol instead of downloading backend skills into an orchestrator filesystem or inventing a parallel hosted-skill client. ### Explicit runtime ordering `McpManager` now resolves the reserved `codex_apps` entry in three ordered phases: 1. install the legacy Apps fallback for compatibility; 2. apply ordered extension `Set` or `Remove` overlays; 3. apply the final ChatGPT-auth gate without synthesizing the server again. This ordering is important: - an ordinary configured or plugin MCP server cannot claim the auth-bearing `codex_apps` name; - an extension-contributed hosted runtime wins over the fallback; - an extension `Remove` remains authoritative; - a host without the MCP extension retains the legacy Apps endpoint and current local-only behavior. The temporary `legacy_apps_mcp_loader_enabled` coordination flag is no longer needed. ### Remove the path override The `apps_mcp_path_override` feature and its runtime plumbing are removed, including: - the feature registry entry and structured feature config; - `Config` and `McpConfig` fields; - config schema output; - config-lock materialization; - URL override handling in `codex-mcp`. Existing boolean and structured forms still deserialize as ignored compatibility input. They are omitted from new serialized config, and config-lock comparison normalizes the removed input so older locks remain replayable. ### App-server coverage App-server MCP fixtures now serve the hosted route at `/api/codex/ps/mcp`. Existing resource-read and tool/elicitation flows therefore exercise the extension-owned endpoint rather than succeeding through the legacy fallback. 
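The ordered resolution of the reserved `codex_apps` entry can be sketched as a pure function. `ServerConfig`, `Overlay`, and `resolve_servers` are hypothetical. The sketch assumes that overlays apply in order with the last one winning, and it models only the ChatGPT-auth gate, not the Apps-disabled check.

```rust
use std::collections::BTreeMap;

/// Reserved, auth-bearing server name.
const CODEX_APPS: &str = "codex_apps";

#[derive(Debug, Clone, PartialEq)]
struct ServerConfig {
    url: String,
}

/// Hypothetical extension overlay for the reserved entry.
enum Overlay {
    Set(ServerConfig),
    Remove,
}

fn resolve_servers(
    mut servers: BTreeMap<String, ServerConfig>,
    legacy_fallback: Option<ServerConfig>,
    overlays: &[Overlay],
    has_chatgpt_auth: bool,
) -> BTreeMap<String, ServerConfig> {
    // Configured or plugin servers may not claim the reserved name.
    servers.remove(CODEX_APPS);

    // Phase 1: start from the legacy Apps fallback.
    let mut apps = legacy_fallback;

    // Phase 2: apply extension overlays in order.
    for overlay in overlays {
        apps = match overlay {
            Overlay::Set(config) => Some(config.clone()),
            Overlay::Remove => None,
        };
    }

    // Phase 3: the auth gate can only drop the entry; it never synthesizes one.
    if let Some(config) = apps.filter(|_| has_chatgpt_auth) {
        servers.insert(CODEX_APPS.to_string(), config);
    }
    servers
}
```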
The stack also adds the missing `codex_chatgpt::connectors` re-export for the manager-backed connector helper introduced in #27191. ## Compatibility - App-server installs the extension and uses `/ps/mcp` for the hosted runtime. - CLI and other hosts that do not install the extension retain the legacy Apps endpoint. - Disabling Apps or using non-ChatGPT authentication removes `codex_apps` from the effective runtime view. - Existing local plugins, local skills, executor-selected skills, configured MCP servers, and MCP OAuth behavior are otherwise unchanged. - Backend plugin enablement remains account/workspace state owned by the hosted endpoint; this PR does not add thread-local backend plugin selection. ## Architectural fit The stack now proves two independent runtime shapes: 1. #27184 resolves filesystem-backed skills through the executor that owns a selected root. 2. #27191 and this PR resolve a backend-hosted HTTP MCP through an extension with no executor. Together they preserve the intended separation: - selection identifies a plugin/root when explicit selection is needed; - each component's owning extension resolves its concrete access mechanism; - execution stays with the runtime required by that component; - existing skills, MCP, connector, and hook subsystems remain the downstream consumers. ## Planned follow-ups 1. **Executor stdio MCP:** selecting an executor plugin registers a manifest-declared stdio MCP server and executes it in the environment that owns the plugin. 2. **Optional backend selection:** only if CCA needs thread-local selection distinct from backend account/workspace enablement, add a concrete backend-owned capability location and surface those selected skills through the skills catalog. 3. **Connector metadata and hooks:** activate those plugin components through their existing owning subsystems, with executor hooks remaining environment-bound.
4. **Propagation and persistence:** define explicit resume, fork, subagent, refresh, and environment-removal semantics once selected roots have multiple real consumers. 5. **Local convergence:** migrate legacy local skill, MCP, connector, and hook paths behind their owning extensions one vertical at a time, then remove duplicate core managers and compatibility plumbing after parity. ## Verification Coverage in this change exercises: - extension-owned `/backend-api/ps/mcp` registration without an executor; - preservation of the legacy endpoint in hosts without the extension; - extension `Set` and `Remove` precedence over the legacy fallback; - ChatGPT-auth gating for the reserved server; - hosted MCP resource reads with and without an active thread; - connector tool invocation and MCP elicitation through the hosted route; - ignored boolean and structured forms of the removed path override; - config-lock replay compatibility for the removed feature. `cargo check -p codex-features -p codex-mcp-extension -p codex-app-server` passes. Tests and Clippy were not run locally under the current development instruction; CI provides the full validation pass.
jif ·
2026-06-10 12:54:21 +02:00 -
feat: keep child MCP warnings out of parent transcript (#27174)
## Why MCP startup status notifications are thread-owned, but `ChatWidget` trusted upstream routing. If routing state delivered a tagged child notification to the active parent widget, the child MCP failure could still mutate the parent's startup state and transcript. Rejecting it only inside the MCP handler was also too late because shared notification handling could already restore and consume the parent's retry status. ## What changed - Validate a tagged MCP status notification against the visible `ChatWidget` thread before shared notification handling mutates any parent state. - Cover child `Starting` and `Failed` notifications delivered to a retrying parent widget, asserting that they preserve its visible retry error and saved status header while producing no history or MCP status mutation. ## User impact Subagent MCP startup failures remain scoped to the child transcript instead of appearing as duplicate warnings in the parent transcript. ## Testing - `just test -p codex-tui mcp_startup_ignores_status_for_other_thread` - `just test -p codex-tui primary_thread_ignores_child_mcp_startup_notifications` - `just fmt`
jif ·
2026-06-10 11:45:49 +02:00 -
[codex] Make MCP connection startup fallible (#27261)
## Why Required MCP server startup was enforced in `Session::new` after `McpConnectionManager` had already created the clients. That split let other manager construction paths bypass the same requirement and exposed manager internals solely so the session could validate them. Keeping required-server readiness in the constructor gives every caller one consistent startup contract. ## What changed - make `McpConnectionManager::new` return `anyhow::Result<Self>` and fail when an enabled, required server cannot initialize - pass the startup cancellation token into the constructor so required-server waits remain cancellable - propagate constructor failures through resource reads, connector discovery, and MCP status collection - preserve the active manager and cancellation token when a refreshed replacement fails - keep required-startup failure collection private and cover the constructor error contract directly ## Validation - updated the focused connection-manager test to assert the complete required-server startup error - local tests not run; relying on CI
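The constructor contract and the refresh behavior can be sketched as follows. `ServerSpec`, `Manager`, and `refresh` are hypothetical. Startup success is precomputed in `starts_ok` instead of coming from real, cancellable I/O, and a `String` error stands in for `anyhow::Error`.

```rust
/// Hypothetical server description; `starts_ok` stands in for real initialization.
struct ServerSpec {
    name: &'static str,
    enabled: bool,
    required: bool,
    starts_ok: bool,
}

#[derive(Debug, PartialEq)]
struct Manager {
    running: Vec<&'static str>,
}

impl Manager {
    /// Fails when any enabled, required server cannot initialize; optional
    /// failures are tolerated and simply leave that server out.
    fn new(specs: &[ServerSpec]) -> Result<Self, String> {
        let mut running = Vec::new();
        let mut failed_required = Vec::new();
        for spec in specs.iter().filter(|spec| spec.enabled) {
            if spec.starts_ok {
                running.push(spec.name);
            } else if spec.required {
                failed_required.push(spec.name);
            }
        }
        if failed_required.is_empty() {
            Ok(Self { running })
        } else {
            Err(format!(
                "required MCP servers failed to initialize: {}",
                failed_required.join(", ")
            ))
        }
    }
}

/// Replaces the active manager only if the replacement constructs successfully.
fn refresh(active: &mut Manager, specs: &[ServerSpec]) -> Result<(), String> {
    let replacement = Manager::new(specs)?;
    *active = replacement;
    Ok(())
}
```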
Ahmed Ibrahim ·
2026-06-10 00:17:58 -07:00 -
Add spans to run_turn (#27107)
## Why Codex app-server latency traces do not cover turn orchestration, sampling-request preparation, and tool-loading work at a fine granularity. These spans help separate local coordination/setup costs from model streaming and tool execution. ## What changed - Add `run_turn.*` spans around sampling-request input preparation and post-sampling state collection - Add function-level trace spans around turn setup, hook execution, compaction, prompt construction, and MCP tool exposure - Add `built_tools.*` spans around plugin loading and discoverable-tool loading ## Verification Trigger a Codex rollout and observe that the new spans are included
mchen-oai ·
2026-06-10 04:41:06 +00:00 -
[codex] Fix post-merge analytics integration failures (#27285)
## Why Recent merges left `main` with analytics integration build failures. Local Cargo runs also made the trimmed-skills test depend on developer-installed skills, while Bazel used an isolated home. ## What changed - Clone `thread_metadata.thread_source` when constructing goal analytics event parameters. - Group app-server thread extension inputs into `ThreadExtensionDependencies`. - Isolate the trimmed-skills test home so its exact fixture count is stable across Cargo and Bazel. ## Validation - `cargo check -p codex-analytics` - `just test -p codex-analytics` (71 tests) - `just test -p codex-app-server` (837 tests; one unrelated zsh-fork timeout passed on retry)
Adam Perry @ OpenAI ·
2026-06-09 20:52:09 -07:00 -
[codex-analytics] emit goal lifecycle analytics (#27078)
## Why - Currently, there is no analytics event for `/goal` behavior - Existing events cannot identify goal execution or its resulting outcome - The original update in [#26182](https://github.com/openai/codex/pull/26182) was implemented before `/goal` moved into `codex-goal-extension`. ## What Changed - Adds `codex_goal_event` serialization and enrichment to `codex-analytics` - Emits goal events from the canonical `codex-goal-extension` mutation and accounting paths: - `created` when a new logical goal is persisted - `usage_accounted` when cumulative goal usage is persisted - `status_changed` when the stored goal status changes - `cleared` when the goal is deleted - Preserves the causal `turn_id` for turn-driven events and uses null attribution for external or idle lifecycle events - Changes goal deletion to return the deleted row so `cleared` retains the stable goal ID ## Event Details Includes standard analytics metadata along with goal-specific fields: - `goal_id`: Stable ID stored in the local SQLite goal row and shared across the goal's events - `event_kind`: Observed operation (one of the four lifecycle events listed above) - `goal_status`: Resulting or last stored status: `active`, `paused`, `blocked`, `usage_limited`, etc. - `has_token_budget`: Indicates whether a token budget is configured - `turn_id`: Causal turn ID, or null when no causal turn exists - `cumulative_tokens_accounted`: Cumulative tokens on `usage_accounted` events; null otherwise - `cumulative_time_accounted_seconds`: Cumulative active time on `usage_accounted` events; null otherwise ## Validation - `just test -p codex-analytics -p codex-state -p codex-goal-extension` - `just test -p codex-core -E 'test(/goal/)'` - `just test -p codex-app-server` - `cargo build -p codex-analytics -p codex-core -p codex-state -p codex-app-server`
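The goal-specific payload can be sketched as follows. The field names follow the list above. `GoalEventKind`, `GoalEvent`, and `goal_event` are hypothetical types rather than the `codex-analytics` schema, and the standard analytics metadata is omitted. The usage totals are carried inside the `UsageAccounted` variant, so they cannot be emitted on other event kinds. This matches the "null otherwise" rule.

```rust
/// Hypothetical lifecycle kinds; usage totals exist only on `UsageAccounted`.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum GoalEventKind {
    Created,
    UsageAccounted { tokens: u64, seconds: u64 },
    StatusChanged,
    Cleared,
}

#[derive(Debug, PartialEq)]
struct GoalEvent {
    goal_id: String,
    event_kind: &'static str,
    goal_status: String,
    has_token_budget: bool,
    turn_id: Option<String>, // None for external or idle lifecycle events
    cumulative_tokens_accounted: Option<u64>,
    cumulative_time_accounted_seconds: Option<u64>,
}

fn goal_event(
    goal_id: &str,
    kind: GoalEventKind,
    goal_status: &str,
    token_budget: Option<u64>,
    turn_id: Option<&str>,
) -> GoalEvent {
    let (event_kind, tokens, seconds) = match kind {
        GoalEventKind::Created => ("created", None, None),
        GoalEventKind::UsageAccounted { tokens, seconds } => {
            ("usage_accounted", Some(tokens), Some(seconds))
        }
        GoalEventKind::StatusChanged => ("status_changed", None, None),
        GoalEventKind::Cleared => ("cleared", None, None),
    };
    GoalEvent {
        goal_id: goal_id.to_string(),
        event_kind,
        goal_status: goal_status.to_string(),
        has_token_budget: token_budget.is_some(),
        turn_id: turn_id.map(str::to_string),
        cumulative_tokens_accounted: tokens,
        cumulative_time_accounted_seconds: seconds,
    }
}
```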
marksteinbrick-oai ·
2026-06-09 18:45:54 -07:00