mirror of
https://github.com/pchuan98/codex.git
synced 2026-07-01 00:31:56 +08:00
dev
7919 Commits
-
[codex] disable Nagle on Rendezvous WebSockets (#30269)
## Summary Disable Nagle unconditionally for both exec-server Rendezvous WebSocket connections. - pass `disable_nagle=true` at the executor and harness connection call sites - keep the existing signed URL, protocol, and connection flow unchanged - add no feature flag, rollout schema, path variant, or experiment-specific telemetry The companion internal PR enables `TCP_NODELAY` on accepted Rendezvous sockets: https://github.com/openai/openai/pull/1082463 ## Why Rendezvous carries small, latency-sensitive relay and JSON-RPC frames. Three staging runs of 30 steady-state `process/read` calls per configuration measured p50 improving from 139.1 ms to 81.5 ms and p95 from 162.0 ms to 95.8 ms with Nagle disabled. The expected packet overhead is small at the current connection scale. We will use existing latency, error, packet, and CPU monitoring and revert normally if production regresses. ## Rollout and rollback The client and accepted-socket changes can deploy independently. New connections receive the setting as each side deploys. Rollback is a normal code revert; there is no persisted assignment or gate state to unwind. ## Validation - `just test -p codex-exec-server --lib`: 164 passed - `just fix -p codex-exec-server`: passed - `just fmt`: passed - independent final review found no actionable issue
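A minimal sketch of what disabling Nagle means at the socket level, using only the Rust standard library. The loopback listener stands in for the Rendezvous endpoint, and `connect_without_nagle` is a hypothetical helper, not the `disable_nagle` plumbing this PR touches.

```rust
use std::io;
use std::net::{SocketAddr, TcpListener, TcpStream};

/// Hypothetical helper: connect, then set TCP_NODELAY so small frames are
/// sent immediately instead of being coalesced by Nagle's algorithm.
fn connect_without_nagle(addr: SocketAddr) -> io::Result<TcpStream> {
    let stream = TcpStream::connect(addr)?;
    stream.set_nodelay(true)?;
    Ok(stream)
}

fn main() -> io::Result<()> {
    // Loopback listener standing in for the remote endpoint (assumption).
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let client = connect_without_nagle(listener.local_addr()?)?;

    // The accepting side sets the option on its own socket independently,
    // which is why the two sides can deploy in either order.
    let (accepted, _peer) = listener.accept()?;
    accepted.set_nodelay(true)?;

    println!("client nodelay: {}", client.nodelay()?);
    println!("accepted nodelay: {}", accepted.nodelay()?);
    Ok(())
}
```

Each side sets the option only on its own socket, so an old peer simply keeps its default behavior until it is redeployed.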
richardopenai ·
2026-06-29 19:14:47 -05:00 -
[codex] auto-label AWS Bedrock issues (#30607)
## Summary AWS Bedrock issues currently fall under broader labels, which makes provider-specific reports harder to find. The issue tracker now has an `aws-bedrock` label, but the automated labeler does not know to apply it. Teach the issue labeler to select `aws-bedrock` for Amazon Bedrock provider or Bedrock Mantle issues while excluding generic AWS references.
Eric Traut ·
2026-06-29 11:10:38 -07:00 -
Update safety check links (#30491)
## Summary Bio/Cyber safety surfaces in the TUI could send users to stale Trusted Access pages, and safety buffering did not always expose the Help Center. This follow-up to #30317 adds the missing Learn more action, refreshes the Bio access URL and block copy, and updates the affected snapshots while preserving the existing retry and wait behavior.
Eric Traut ·
2026-06-29 11:10:11 -07:00 -
[codex] Treat max as a first-class reasoning effort (#30467)
## Why The Bedrock GPT-5.6 catalog advertises `max`, but Codex treated it as an opaque custom effort. That made the reasoning picker render it as lowercase `max` while known efforts use productized labels. Making `max` a known effort aligns catalog data, parsing, and UI presentation without changing the `max` wire value or persisted representation. ## What changed - Add first-class `ReasoningEffort::Max` parsing and serialization. - Use the typed effort in the Bedrock catalog and render it as `Max` in the TUI. - Preserve forward-compatible custom-effort coverage with a genuinely unknown `future` value. ### Before <img width="559" height="124" alt="Screenshot 2026-06-28 at 12 08 47 PM" src="https://github.com/user-attachments/assets/7c43cf4f-020b-4605-9239-0a9c97eb7364" /> ### After <img width="558" height="107" alt="Screenshot 2026-06-28 at 12 09 10 PM" src="https://github.com/user-attachments/assets/b9cc5ded-c940-43b4-b024-bba25abe0a17" />
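The parsing change can be illustrated with a simplified enum. This is a sketch under assumptions: only `ReasoningEffort::Max`, the `max` wire value, the `Max` label, and the `future` custom value come from the description above; the other variants and the method names are hypothetical.

```rust
/// Simplified stand-in for the real type; `Low`, `Medium`, and `High` are
/// assumed variants included only to show the known/custom split.
#[derive(Debug, Clone, PartialEq, Eq)]
enum ReasoningEffort {
    Low,
    Medium,
    High,
    Max,
    /// Forward-compatible fallback for efforts this build does not know.
    Custom(String),
}

impl ReasoningEffort {
    /// Parse a wire value; unknown strings are preserved, not rejected.
    fn parse(wire: &str) -> Self {
        match wire {
            "low" => Self::Low,
            "medium" => Self::Medium,
            "high" => Self::High,
            "max" => Self::Max,
            other => Self::Custom(other.to_string()),
        }
    }

    /// Serialize back to the wire value; `Max` still round-trips as "max".
    fn wire_value(&self) -> &str {
        match self {
            Self::Low => "low",
            Self::Medium => "medium",
            Self::High => "high",
            Self::Max => "max",
            Self::Custom(value) => value.as_str(),
        }
    }

    /// Label for the picker: known efforts get a productized label,
    /// custom efforts are shown exactly as received.
    fn label(&self) -> String {
        match self {
            Self::Low => "Low".to_string(),
            Self::Medium => "Medium".to_string(),
            Self::High => "High".to_string(),
            Self::Max => "Max".to_string(),
            Self::Custom(value) => value.clone(),
        }
    }
}

fn main() {
    for wire in ["max", "future"] {
        let effort = ReasoningEffort::parse(wire);
        println!("{} -> {:?} -> label {}", wire, effort, effort.label());
    }
}
```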
Shijie Rao ·
2026-06-29 09:38:49 -07:00 -
Dylan Hurd ·
2026-06-28 20:40:55 -07:00 -
[codex] Restore v1 delegation guidance (#30511)
## Summary - restore the v1 clarification that requests for depth, research, or investigation do not authorize subagent spawning - restore guidance for keeping critical-path, urgent, tightly coupled, or difficult work local - update the focused v1 tool-search and spawn-description coverage ## Why PR #27919 simplified the v1 `spawn_agent` prompt by removing its delegation decision guidance. That left the authorization rule intact, but removed the instructions that constrained what should be delegated after spawning was authorized. Restore those guardrails while preserving later support for explicit delegation authorization from applicable AGENTS.md and skill instructions. Multi-agent v2 prompts are unchanged. ## User impact Models using the v1 multi-agent tool surface receive clearer guidance to delegate independent side work while keeping blocking work on the main rollout. ## Validation - `just fmt` - `git diff --check` - tests not run locally per repository guidance; CI will validate the focused coverage
Ahmed Ibrahim ·
2026-06-28 20:34:47 -07:00 -
[codex] Use model metadata for skills usage instructions (#29740)
## Summary - add a false-by-default `include_skills_usage_instructions` model metadata field - enable the field for the bundled `gpt-5.5` model metadata - consume the metadata in both core and extension skill rendering - remove hardcoded legacy-model matching and its marker plumbing
ani-oai ·
2026-06-29 09:44:36 +09:00 -
fix(tui): clear completed safety buffering prompt (#30490)
## Why The safety-buffering prompt is a modal TUI view, but the normal successful-turn path only hid the running status indicator. If the turn completed while the prompt was open, the stale modal remained over the composer until the user dismissed it or another turn started. This aligns the TUI with the app behavior: keep the safety notice visible while the turn is active, then remove it when the turn becomes terminal. It also prevents the stale retry action from changing the model and reasoning effort for a future turn after the buffered turn has already completed. | New copy | |---| | <img width="1014" height="313" alt="CleanShot 2026-06-28 at 20 27 18" src="https://github.com/user-attachments/assets/f0f37359-5d77-442f-add2-9d1874bdc422" /> | ## What changed - Clear the active safety-buffering view and retry state when a turn completes successfully. - Update the retry-capable message to say “Hang tight or retry with a faster model”. - Extend the safety-buffering regression coverage to verify that the prompt remains visible after assistant output starts and disappears when the turn completes. - Update the TUI snapshot for the revised copy. This is a follow-up to #29919. ## How to Test 1. Start a TUI turn that receives `model/safetyBuffering/updated` with `showBufferingUi: true` and a `fasterModel`. 2. Confirm the prompt says “Hang tight or retry with a faster model”. 3. Let the turn continue and confirm the prompt remains visible while the turn is active. 4. Let the turn finish successfully and confirm the prompt disappears and the composer is restored without requiring an extra keypress. 5. Confirm a buffering update without a faster model still shows the shorter non-retry message. Targeted automated coverage: - `just test -p codex-tui safety_buffering` — 4 passed. - `just test -p codex-tui` — 2,951 passed; two unrelated Guardian feature-flag tests failed identically on `main` in this environment. The argument-comment lint was also audited manually. 
The workspace Bazel invocation was blocked by a missing external LLVM `compiler-rt` BUILD file, and the packaged per-crate fallback uses a nightly older than the current `sqlx` minimum Rust version.
Felipe Coury ·
2026-06-28 20:55:53 -03:00 -
[codex] Enable remote plugins by default (#30297)
## Summary - enable the remote plugin feature by default - promote the remote plugin feature from under development to stable - preserve the existing `features.remote_plugin` override for explicitly disabling it - keep legacy disabled-path coverage explicit in TUI and app-server tests ## Impact Remote plugin functionality is enabled by default for configurations that do not set the feature flag. The existing Codex backend authentication gate still applies. ## Validation - `just fmt` - `just test -p codex-features` - `just test -p codex-tui plugins_popup_remote_section_fallback_states_snapshot` - targeted `codex-app-server` plugin-list and skills-list tests - `git diff --check` The full TUI and app-server suites were also exercised locally. All remote-plugin-related coverage passed; unrelated local sandbox/test-binary failures remain outside this change.
xl-openai ·
2026-06-28 11:46:25 -07:00 -
[app-server] increase currentTime/read timeout (#30384)
## Summary Increase the external currentTime/read request timeout from 5 seconds to 10 seconds. ## Validation - just fmt - Focused app-server test build was stopped to defer validation to CI.
rka-oai ·
2026-06-27 16:42:03 -07:00 -
[plugins] Enforce marketplace source policy at runtime (#29691)
## Summary - project effective marketplace/plugin config through the enterprise source policy so blocked installed plugins become inactive - filter plugin list/read/discovery and CLI marketplace source/snapshot reporting using the same policy - enforce source admission for background marketplace cache refreshes - continue refreshing/upgrading independent marketplaces and plugins when one entry fails, returning per-entry errors - include policy-projected plugin state in cache and refresh keys so requirement changes invalidate stale results ## Stack This is PR 2 of 2 and is based on #29690. Review the admission model and source matcher in #29690 first; this PR contains only runtime enforcement. ## Test plan - `just test -p codex-core-plugins` (287 tests) - `just test -p codex-cli plugin_list_ignores_implicit_system_marketplace_roots_without_manifests` - `cargo check -p codex-cli -p codex-app-server --tests`
xl-openai ·
2026-06-27 15:22:05 -07:00 -
[app-server] expose environment info RPC (#30291)
## Why App-server clients that configure named execution environments need to discover an environment's shell and working directory before selecting it for a thread or turn. Because the environment can run on a different operating system than app-server, its working directory is represented as a canonical `file:` URI rather than a host-local path string. The probe also needs a bounded response time: an exec-server that completes initialization but never answers `environment/info` must not hold the environment serialization queue indefinitely. ## What changed - Add an experimental `environment/info` app-server RPC for named environments. - Route the probe through the managed environment connection and return target-native shell metadata plus the default working directory as a `PathUri`. - Return connection and protocol failures as JSON-RPC errors. - Bound the exec-server probe response to 30 seconds and remove timed-out calls from the pending-request table so later environment mutations can proceed. - Cover successful responses, omitted working directories, unknown environments, connection failures, and pending-call cleanup. ## Protocol examples Request: ```json { "id": 42, "method": "environment/info", "params": { "environmentId": "remote-a" } } ``` Successful response: ```json { "id": 42, "result": { "shell": { "name": "zsh", "path": "/bin/zsh" }, "cwd": "file:///workspace" } } ``` If the exec-server initializes but does not answer the probe within 30 seconds: ```json { "id": 42, "error": { "code": -32603, "message": "failed to get info for environment `remote-a`: exec-server protocol error: timed out waiting for exec-server `environment/info` response after 30s" } } ``` ## Testing - App-server integration coverage for successful info (including omitted `cwd`), unknown environments, and connection failures. - Exec-server RPC coverage verifying a timed-out call is removed from the pending-request table. 
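The bounded probe and the pending-table cleanup can be sketched with standard-library channels. Everything here is illustrative: `PendingRequests`, its fields, and the string payloads are hypothetical, and the real implementation is asynchronous rather than blocking.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Receiver, Sender};
use std::time::Duration;

/// Hypothetical pending-request table keyed by request ID.
struct PendingRequests {
    next_id: u64,
    waiting: HashMap<u64, Sender<String>>,
}

impl PendingRequests {
    fn new() -> Self {
        Self { next_id: 0, waiting: HashMap::new() }
    }

    /// Register a call and hand back the receiver its response arrives on.
    fn register(&mut self) -> (u64, Receiver<String>) {
        let id = self.next_id;
        self.next_id += 1;
        let (tx, rx) = mpsc::channel();
        self.waiting.insert(id, tx);
        (id, rx)
    }

    /// Deliver a response to a waiting call, if it is still pending.
    fn resolve(&mut self, id: u64, response: String) {
        if let Some(tx) = self.waiting.remove(&id) {
            let _ = tx.send(response);
        }
    }

    /// Wait at most `timeout`; the entry is removed on every path so a
    /// silent server cannot leave a stale call in the table.
    fn wait(&mut self, id: u64, rx: &Receiver<String>, timeout: Duration) -> Result<String, String> {
        let outcome = rx.recv_timeout(timeout);
        self.waiting.remove(&id);
        outcome.map_err(|_| format!("timed out waiting for response after {:?}", timeout))
    }
}

fn main() {
    let mut pending = PendingRequests::new();

    // A call that is answered before the wait begins.
    let (id, rx) = pending.register();
    pending.resolve(id, "zsh".to_string());
    println!("{:?}", pending.wait(id, &rx, Duration::from_millis(50)));

    // A call that is never answered: it times out and is cleaned up.
    let (id, rx) = pending.register();
    println!("{:?}", pending.wait(id, &rx, Duration::from_millis(50)));
    println!("entries left: {}", pending.waiting.len());
}
```

Removing the entry on the timeout path is the part that lets later environment mutations proceed; the 30-second bound from the description would replace the short demo timeout.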
--------- Co-authored-by: Michael Bolin <mbolin@openai.com>
Max Johnson ·
2026-06-27 19:34:10 +00:00 -
core: stabilize synthesized call output IDs (#30327)
## Why Response item IDs represent stable conversation identity. `ContextManager::for_prompt` repairs an unmatched call by synthesizing an `"aborted"` output in the disposable prompt projection, but that output previously had no ID. Assigning a fresh ID on every prompt build would make retries and resumes change otherwise identical model context and reduce prompt-cache reuse. The concrete bug is that these normalization-created outputs bypass the regular item-ID allocation path. Even with item IDs enabled, a prompt could therefore contain an identified call paired with a synthetic output whose `id` was missing. This change closes that gap by deriving the output ID from the source call's item ID. For legacy calls that have no item ID, the output remains ID-less because there is no stable source identity to derive from. The originating call already has a stable item ID under the item-ID model introduced in #28814. A prompt-only output can therefore derive stable identity from that call without mutating canonical history or persisted rollouts. This addresses the failure exposed by #30311 while keeping normalization read-only outside its detached prompt snapshot. UUIDv5 is intentional here because it is the standard namespaced, deterministic UUID construction. Using the output kind and source call ID as the name produces the same UUID on every projection while keeping output kinds in separate name domains. UUIDv7 would introduce randomness and time, so keeping it stable would require persisting the synthetic repair. UUIDv5 uses SHA-1 internally, but this is only an identity mapping—not an authenticity or security boundary. ## What changed - Derive a deterministic UUIDv5 ID for each synthesized call output from the source call item ID. - Use the Responses API prefix appropriate for function, custom-tool, tool-search, and local-shell outputs. - Preserve the existing insertion position immediately after the unmatched call. 
- Keep synthesized outputs prompt-only; no rollout, task-lifecycle, compaction, or raw-response behavior changes. ## Testing - `just test -p codex-core for_prompt_assigns_stable_id_to_synthetic_output_without_reordering_history` - `just test -p codex-core synthetic_call_output_id_is_stable_across_resumes` - `just test -p codex-core normalize_adds_missing_output` - `just test -p codex-core response_item_ids`
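To make the determinism argument concrete, here is a self-contained sketch of the UUIDv5 construction (SHA-1 over namespace bytes followed by the name) using only the standard library. The namespace constant is the RFC 4122 DNS namespace used purely as a placeholder, and the `kind:call_id` name encoding and the `synthetic_output_id` helper are assumptions; the real code presumably uses a UUID crate and its own namespace.

```rust
/// Plain SHA-1 (FIPS 180-1); UUIDv5 uses it only as an identity mapping.
fn sha1(data: &[u8]) -> [u8; 20] {
    let mut h: [u32; 5] = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0];
    // Padding: 0x80, zeros up to 56 mod 64, then the bit length (big-endian).
    let mut msg = data.to_vec();
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&((data.len() as u64) * 8).to_be_bytes());

    for block in msg.chunks(64) {
        let mut w = [0u32; 80];
        for i in 0..16 {
            w[i] = u32::from_be_bytes([block[4 * i], block[4 * i + 1], block[4 * i + 2], block[4 * i + 3]]);
        }
        for i in 16..80 {
            w[i] = (w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16]).rotate_left(1);
        }
        let [mut a, mut b, mut c, mut d, mut e] = h;
        for i in 0..80 {
            let (f, k) = match i {
                0..=19 => ((b & c) | (!b & d), 0x5A827999u32),
                20..=39 => (b ^ c ^ d, 0x6ED9EBA1),
                40..=59 => ((b & c) | (b & d) | (c & d), 0x8F1BBCDC),
                _ => (b ^ c ^ d, 0xCA62C1D6),
            };
            let temp = a.rotate_left(5).wrapping_add(f).wrapping_add(e).wrapping_add(k).wrapping_add(w[i]);
            e = d;
            d = c;
            c = b.rotate_left(30);
            b = a;
            a = temp;
        }
        for (state, value) in h.iter_mut().zip([a, b, c, d, e]) {
            *state = state.wrapping_add(value);
        }
    }
    let mut out = [0u8; 20];
    for (i, word) in h.iter().enumerate() {
        out[4 * i..4 * i + 4].copy_from_slice(&word.to_be_bytes());
    }
    out
}

/// RFC 4122 DNS namespace, used here only as a placeholder namespace.
const NAMESPACE_DNS: [u8; 16] = [
    0x6b, 0xa7, 0xb8, 0x10, 0x9d, 0xad, 0x11, 0xd1, 0x80, 0xb4, 0x00, 0xc0, 0x4f, 0xd4, 0x30, 0xc8,
];

/// UUIDv5: hash namespace + name, keep 16 bytes, stamp version and variant.
fn uuid_v5(namespace: [u8; 16], name: &[u8]) -> String {
    let mut input = namespace.to_vec();
    input.extend_from_slice(name);
    let digest = sha1(&input);
    let mut bytes = [0u8; 16];
    bytes.copy_from_slice(&digest[..16]);
    bytes[6] = (bytes[6] & 0x0F) | 0x50; // version 5
    bytes[8] = (bytes[8] & 0x3F) | 0x80; // RFC 4122 variant
    let hex: String = bytes.iter().map(|b| format!("{:02x}", b)).collect();
    format!("{}-{}-{}-{}-{}", &hex[0..8], &hex[8..12], &hex[12..16], &hex[16..20], &hex[20..32])
}

/// Hypothetical derivation: output kind and source call ID form the name.
fn synthetic_output_id(kind: &str, call_item_id: &str) -> String {
    uuid_v5(NAMESPACE_DNS, format!("{}:{}", kind, call_item_id).as_bytes())
}

fn main() {
    // Same inputs always yield the same ID; different kinds do not collide.
    println!("{}", synthetic_output_id("function_call_output", "call-123"));
    println!("{}", synthetic_output_id("function_call_output", "call-123"));
    println!("{}", synthetic_output_id("custom_tool_call_output", "call-123"));
}
```

Because the ID is a pure function of its inputs, rebuilding the prompt on a retry or resume reproduces the same synthetic output without persisting anything.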
Michael Bolin ·
2026-06-27 10:47:54 -07:00 -
Preserve namespaces on custom tool calls (#30302)
## Summary - Preserve the optional namespace on custom tool calls during response deserialization and app-server replay. - Use the namespaced tool identifier for streaming argument handling and tool dispatch. - Regenerate app-server protocol schemas. - Add regression tests covering namespace serialization and routing. ## Testing - Ran affected protocol and app-server test suites. - Ran the full core test suite; two load-sensitive timing tests passed when rerun individually. - Ran Clippy and formatting checks. - Verified with a local end-to-end app-server replay that the namespace is preserved through the complete request/response flow.
nhamidi-oai ·
2026-06-27 09:54:56 -07:00 -
Eric Traut ·
2026-06-26 20:05:32 -07:00 -
app-server: structure and test JSON shutdown logs (#30314)
## Why `LOG_FORMAT=json` and `RUST_LOG` are supported by app-server, but the behavior was only covered indirectly. We should verify the actual JSONL written by both user-facing entry points: `codex app-server` and the standalone `codex-app-server` binary. The existing processor shutdown message also always said the channel closed, even though the processor can exit for several different reasons. Structured fields make that event more accurate and useful to log consumers. ## What changed - Record the processor `exit_reason`, remaining connection count, and forced-shutdown state as structured tracing fields. - Add a shared process-test helper that enables JSON logging, validates every stderr line as JSON, and verifies the top-level timestamp is RFC 3339. - Cover both `codex app-server` and `codex-app-server`, asserting the stable `level`, `fields`, and `target` payload. ## Test plan - `just test -p codex-app-server standalone_app_server_emits_json_info_events` - `just test -p codex-cli app_server_emits_json_info_events`
Michael Bolin ·
2026-06-26 18:19:56 -07:00 -
core: overlap diff root discovery with world state (#30286)
## Why Remote diff-root discovery is independent of world-state construction, but it ran afterward and added filesystem metadata latency before the first model request. Overlap the independent work so thread-cold turns do not pay those waits serially. ## What - Run `record_context_updates_and_set_reference_context_item` and `turn_diff_display_roots` with `tokio::join!`. - Reuse the same resolved display roots when constructing `TurnDiffTracker`; no cache or behavior lifecycle changes are introduced. ## Validation A synthetic executor-skill benchmark with artificial network delay: thread-cold model-request p50 improved from about 1.79 s to 1.58 s.
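The overlap itself is easy to picture outside of Tokio. The sketch below swaps `tokio::join!` for `std::thread::scope` so that it stays dependency-free; the two functions and their sleeps are hypothetical stand-ins for world-state construction and remote root discovery.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Stand-in for world-state construction (hypothetical, simulated latency).
fn build_world_state() -> &'static str {
    thread::sleep(Duration::from_millis(50));
    "world-state"
}

/// Stand-in for remote diff-root discovery (hypothetical, simulated latency).
fn discover_diff_roots() -> Vec<&'static str> {
    thread::sleep(Duration::from_millis(50));
    vec!["/workspace"]
}

/// Run both independent steps at once and return both results, so the
/// resolved roots can be reused later instead of being looked up again.
fn prepare_turn() -> (&'static str, Vec<&'static str>) {
    thread::scope(|scope| {
        let roots = scope.spawn(discover_diff_roots);
        let world = build_world_state();
        (world, roots.join().expect("root discovery panicked"))
    })
}

fn main() {
    let started = Instant::now();
    let (world, roots) = prepare_turn();
    // Elapsed time is close to the slower step, not the sum of both.
    println!("{} {:?} in {:?}", world, roots, started.elapsed());
}
```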
Adam Perry @ OpenAI ·
2026-06-26 18:07:41 -07:00 -
[codex] consume pushed exec-server process events (#30273)
## Summary - complete unified-exec processes from the ordered event stream instead of issuing a final zero-wait `process/read` - add optional executor sandbox-denial state to `process/exited` - retain `process/read` as a retained-output and compatibility fallback for receiver lag, sequence gaps, and legacy servers - recover sandbox-denial state across transport reconnection - cover the real `TestCodex` remote-exec path without adding a public test-only event constructor ## Why A successful one-shot tool call currently receives its output and terminal notifications, then pays another wide-area `process/read` round trip before returning. Staging traces showed that remote response wait accounted for more than 99.8% of RPC time; local serialization, queueing, and deserialization were below 0.6 ms. ## Measured impact A direct staging A/B used the same build and route and changed only completion mode. Each arm ran three times with 30 one-shot `/usr/bin/true` calls per run. The table reports the median of the three per-run percentiles. | Metric | Final `process/read` | Pushed events | Change | | --- | ---: | ---: | ---: | | End-to-end completion p50 | 159.5 ms | 118.7 ms | -40.8 ms (-25.6%) | | End-to-end completion p95 | 182.4 ms | 131.7 ms | -50.6 ms (-27.8%) | | Completion-wait p50 | 80.1 ms | 41.5 ms | -38.5 ms (-48.1%) | | Final `process/read` RPC p50 | 79.9 ms | eliminated | -79.9 ms | TCP_NODELAY was enabled in both A/B arms, so its effect cancels out. The successful, complete, in-order event path issued zero final `process/read` calls. 
## Compatibility and recovery - new servers send `sandboxDenied` on `process/exited` - legacy servers omit it, which triggers one compatibility `process/read` - broadcast lag or a sequence gap triggers a retained-output read - recovery remains bounded by the server's existing 1 MiB retained-output window - complete, in-order event streams issue no completion read - sandbox denial is attached to the exit event before consumers can observe process completion - server-first and client-first rollouts remain wire-compatible; server-first realizes the latency win immediately ## Integration coverage The `TestCodex` suite exercises four distinct remote-exec contracts: - complete pushed output/exit/close with zero reads - direct pushed sandbox denial with zero reads - legacy missing denial metadata with exactly one compatibility read - count-bounded replay eviction recovered from retained output without duplication ## Validation - `just test -p codex-core exec_command_consumes_pushed_remote_process_events`: 4 passed - `just test -p codex-core unified_exec::process_tests::`: 4 passed - `just test -p codex-exec-server`: 294 passed, 2 skipped - `just test -p codex-exec-server-protocol`: 5 passed - `just test -p codex-rmcp-client`: 89 passed, 2 skipped - focused Bazel `//codex-rs/core:core-all-test`: passed across 16 shards - scoped `just fix` passed for core and exec-server - `just fmt` passed The complete workspace suite was not rerun; focused Cargo and Bazel coverage passed for the changed behavior.
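The completion decision described above can be sketched as a pure function over the received events. The types, the sequence numbering starting at 1, and the treatment of a stream that ends before its exit event are assumptions made for illustration; only the three fallback triggers (gap or lag, missing `sandboxDenied`, otherwise no read) come from the description.

```rust
/// Simplified event shapes (hypothetical; the real protocol carries more).
#[derive(Debug, Clone, PartialEq)]
enum ProcessEvent {
    Output { seq: u64, chunk: String },
    Exited { seq: u64, exit_code: i32, sandbox_denied: Option<bool> },
}

#[derive(Debug, PartialEq)]
enum Completion {
    /// Complete, in-order stream: no final `process/read` is needed.
    FromEvents { output: String, exit_code: i32, sandbox_denied: bool },
    /// Fall back to a `process/read`, with the reason for the fallback.
    NeedsRead(&'static str),
}

fn complete_from_events(events: &[ProcessEvent]) -> Completion {
    let mut expected_seq = 1; // assumed starting sequence number
    let mut output = String::new();
    for event in events {
        let seq = match event {
            ProcessEvent::Output { seq, .. } | ProcessEvent::Exited { seq, .. } => *seq,
        };
        if seq != expected_seq {
            // Receiver lag or a dropped event: recover from retained output.
            return Completion::NeedsRead("sequence gap");
        }
        expected_seq += 1;
        match event {
            ProcessEvent::Output { chunk, .. } => output.push_str(chunk),
            ProcessEvent::Exited { exit_code, sandbox_denied, .. } => {
                return match sandbox_denied {
                    Some(denied) => Completion::FromEvents {
                        output,
                        exit_code: *exit_code,
                        sandbox_denied: *denied,
                    },
                    // Legacy servers omit the field: one compatibility read.
                    None => Completion::NeedsRead("legacy server omitted sandboxDenied"),
                };
            }
        }
    }
    // Assumption: a stream without an exit event is also recovered by a read.
    Completion::NeedsRead("stream ended before exit")
}

fn main() {
    let events = vec![
        ProcessEvent::Output { seq: 1, chunk: "hi".to_string() },
        ProcessEvent::Exited { seq: 2, exit_code: 0, sandbox_denied: Some(false) },
    ];
    println!("{:?}", complete_from_events(&events));
}
```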
richardopenai ·
2026-06-26 18:05:52 -07:00 -
fix(remote-control): avoid server token refresh retry storms (#30201)
## Why Remote-control websocket reconnects and pairing requests proactively refresh their server token. When `/server/refresh` returns a transient error such as `502`, the still-valid token was discarded as a usable connection path, causing reconnect failures and repeated refresh attempts that could amplify an upstream incident. ## What Changed - Start proactive refresh five minutes before token expiry and distinguish it from a required refresh for missing or expired tokens. - Continue websocket and pairing operations with the existing valid token after `429`, `5xx`, or timeout failures. - Share an in-memory `next_refresh_at` throttle across websocket and pairing callers, honoring both `Retry-After` formats and otherwise using a jittered 24–36 second delay. - Keep required refreshes strict, preserve `404` enrollment replacement, and clear token/throttle state for `401` and `403` auth recovery. - Preserve refresh response metadata internally and add focused wire-level and integration coverage. ## Verification Added behavioral coverage proving that: - a valid near-expiry token still completes websocket and pairing requests after transient refresh failures; - `Retry-After` suppresses a subsequent refresh across websocket and pairing callers; - request and response-body timeouts are classified as transient; - an expired token, including one that expires during refresh, cannot proceed to websocket connection; - auth failures clear the attempted token without overwriting a concurrently rotated token.
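A rough sketch of the failure classification and throttle delay, assuming a refresh result is summarized as an optional HTTP status (`None` for a timeout) and an optional `Retry-After` in seconds. The names are hypothetical, the HTTP-date form of `Retry-After` is not handled, the `Other` bucket is an assumption, and the jitter is passed in as a value in `[0, 1]` to keep the example deterministic.

```rust
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum RefreshFailure {
    /// 429, 5xx, or timeout: throttle the next refresh attempt.
    Transient { next_refresh_in: Duration },
    /// 404: the enrollment is replaced.
    ReplaceEnrollment,
    /// 401 or 403: clear token and throttle state, then recover auth.
    ClearAuth,
    /// Anything else (handling not specified above; assumption).
    Other,
}

fn classify_refresh_failure(
    status: Option<u16>,
    retry_after_secs: Option<u64>,
    jitter: f64,
) -> RefreshFailure {
    match status {
        None | Some(429) | Some(500..=599) => {
            let delay = match retry_after_secs {
                // A server-provided delay wins over the default.
                Some(secs) => Duration::from_secs(secs),
                // Otherwise pick a point in the 24 to 36 second window.
                None => Duration::from_secs_f64(24.0 + 12.0 * jitter.clamp(0.0, 1.0)),
            };
            RefreshFailure::Transient { next_refresh_in: delay }
        }
        Some(404) => RefreshFailure::ReplaceEnrollment,
        Some(401) | Some(403) => RefreshFailure::ClearAuth,
        Some(_) => RefreshFailure::Other,
    }
}

/// A proactive refresh may fall back to the existing token only when the
/// failure is transient and the token has not expired; a required refresh
/// (missing or expired token) stays strict.
fn can_continue_with_existing_token(failure: &RefreshFailure, token_still_valid: bool) -> bool {
    token_still_valid && matches!(failure, RefreshFailure::Transient { .. })
}

fn main() {
    let failure = classify_refresh_failure(Some(502), None, 0.5);
    println!("{:?}", failure);
    println!("continue: {}", can_continue_with_existing_token(&failure, true));
}
```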
Anton Panasenko ·
2026-06-26 17:34:52 -07:00 -
feat(protocol): define missing rollout turn items (#30282)
## Description This PR adds canonical core `TurnItem` shapes for command execution, dynamic tool calls, collab agent tool calls, and sub-agent activity, to be stored in the rollout file soon. It also teaches app-server protocol / `ThreadHistoryBuilder` how to render those items, and adds the small legacy fanout helpers needed for existing event-based consumers. No core producer or rollout persistence behavior changes here; that will be done in a follow-up. ## Making ThreadHistoryBuilder stateless This is the first PR in a stack to make `ThreadHistoryBuilder` stateless enough that we can materialize app-server `ThreadItem`s from only a given slice of `RolloutItem` history, without ever needing to replay the whole thread from the beginning. The persisted legacy `RolloutItem::EventMsg` records are mostly shaped like live UI events, not like materialized `ThreadItem`s. They work if we replay the full rollout in order, but they often do not contain enough stable identity or complete item state to project an arbitrary suffix on its own. A few examples: - `UserMessageEvent` and `AgentMessageEvent` have content, but historically do not carry the persisted app-server item ID that should become the SQLite primary key. - `AgentReasoningEvent` and `AgentReasoningRawContentEvent` are fragments. `ThreadHistoryBuilder` currently merges them into the last reasoning item, which means a slice starting in the middle of reasoning cannot know whether to append to an earlier item or create a new one. - `WebSearchEndEvent`, `McpToolCallEndEvent`, collab end events, and similar legacy events can often render a final-looking item, but they usually rely on prior replay state to know which turn owns the item. - Begin/end legacy events are partial views of one logical item. The builder correlates them by `call_id` and mutates prior state to synthesize the final `ThreadItem`. That is the problem this direction fixes. 
A persisted canonical lifecycle record looks much closer to the read model we actually want later: ```rust ItemCompletedEvent { turn_id, item: TurnItem { id, ...full snapshot... }, completed_at_ms, } ``` Once rollout has explicit `turn_id`, stable `item.id`, and a canonical completed item snapshot, the future SQLite projector can reduce only the new rollout suffix and upsert the affected `thread_items` rows. It no longer needs to synthesize `item-N`, infer item ownership from the active turn, or replay earlier events just to reconstruct the current item snapshot. ## What changed - Added core `TurnItem` variants and item structs for command execution, dynamic tool calls, collab agent tool calls, and sub-agent activity. - Added conversions from those canonical items back into the legacy event shapes where current consumers still need them. - Added app-server v2 `ThreadItem` conversion for the new core item variants. - Taught `ThreadHistoryBuilder` and rollout persistence metrics to recognize the new item variants. ## Follow-up The next PR https://github.com/openai/codex/pull/30283 switches the live core producers for these item families onto canonical `ItemStarted` / `ItemCompleted` events.
Owen Lin ·
2026-06-26 16:44:34 -07:00 -
[codex] group blocking and postmerge CI workflows (#30146)
## Why It's hard to change the set of required jobs when they're managed in the GitHub UI, and when each workflow is responsible for choosing its own scheduling, it's easy to end up with skew between what we enforce on PRs vs. on main. ## What - add a `blocking-ci` caller workflow, triggered by pull requests and pushes to `main`, for Bazel, blob size, cargo-deny, Codespell, `repo-checks`, rust CI, and SDK CI - add an `always()` terminal job named `CI required` that fails unless every called workflow succeeds - add a `postmerge-ci` caller workflow for `rust-ci-full` and `v8-canary`, with a terminal `Postmerge CI results` job - centralize V8 relevance detection in `v8_canary_changes.py`; unrelated PR and postmerge runs execute metadata only and skip the expensive build matrices - leave `v8-canary` outside the blocking gate and leave the external `cla` check independent ## Rollout A repository admin must replace the existing required GitHub Actions contexts with `CI required` in the main-branch ruleset. Retain `cla` as a separate required check. Until that change is coordinated, this PR cannot satisfy the old standalone check names. In-flight PRs will need to be rebased after this lands.
Adam Perry @ OpenAI ·
2026-06-26 15:07:05 -07:00 -
[codex] Support npm marketplace plugin sources (#29375)
## Why Marketplace source deserialization treated `{"source":"npm", ...}` as unsupported. The loader logged and skipped the entry, so npm-backed plugins never appeared in `plugin list --available` and `plugin add` returned "plugin not found". Codex plugins are installed from a plugin root, not from an npm dependency tree. For npm-backed marketplace entries, Codex should fetch the published package contents without running package scripts or installing unrelated dependencies. ## What changed - Add `npm` marketplace plugin sources with `package`, optional semver `version` or version range, and optional HTTPS `registry`. - Reject unsafe npm source fields before materialization, including invalid package names, non-semver version selectors, plaintext or credential-bearing registry URLs, and registry query/fragment data. - Materialize npm plugins with `npm pack --ignore-scripts`, then unpack the resulting tarball through the existing hardened plugin bundle extractor. - Enforce npm archive and extracted-size limits, require the standard npm `package/` archive root, and verify the extracted `package.json` name matches the requested package before installing. - Keep plugin listings, install-source descriptions, CLI JSON/human output, app-server v2 `PluginSource`, TUI source summaries, regenerated schema fixtures, and app-server documentation in sync. ## Impact Marketplaces can distribute Codex plugins from public or configured private HTTPS npm registries using the same install flow as existing materialized plugin sources. `npm` must be available on `PATH` when an npm-backed plugin is installed. Fixes #27831 ## Validation - `just write-app-server-schema` - `just test -p codex-core-plugins -p codex-app-server-protocol -p codex-app-server -p codex-cli` - npm/schema/core-plugin coverage passed in the run. 
- The full focused command finished with `1739 passed`, `11 failed`, and `6 timed out`; the failures were unrelated local app-server environment failures from `sandbox-exec: sandbox_apply: Operation not permitted` plus one missing `test_stdio_server` helper binary. - Installed an npm-published Codex plugin package through a throwaway local marketplace and throwaway `CODEX_HOME` to exercise the real npm materialization path end to end.
charlesgong-openai ·
2026-06-26 17:24:46 -04:00 -
[codex] Classify nested MCP authentication startup errors (#30257)
## Summary - classify authentication-required RMCP startup failures, including errors nested inside `ClientInitializeError::TransportError` - let `codex-mcp` consume that classification so the existing `reauthenticationRequired` startup failure reason is emitted - add a regression test that performs real startup with an expired persisted OAuth token and no refresh token ## Why Follow-up to #29877. RMCP stores streamable HTTP initialization failures inside a dynamic transport error whose payload is not exposed through the standard Rust error source chain. The original `anyhow::Error::chain()` check therefore missed the nested `AuthError::AuthorizationRequired` seen during real MCP startup and emitted `failureReason: null`. The transport-specific inspection now lives in `codex-rmcp-client`, while `codex-mcp` consumes only the domain-level authentication-required result. This classifier does not distinguish first-time login from reauthentication; the existing auth-state logic remains responsible for that distinction. ## User impact When stored MCP OAuth credentials are expired and cannot be refreshed, app clients now receive `failureReason: "reauthenticationRequired"` on the failed startup update and can show the reconnect action. First-time login and unrelated startup failures remain unchanged. ## Validation - `just test -p codex-rmcp-client --test streamable_http_oauth_startup identifies_expired_unrefreshable_token_startup_error` - `just test -p codex-mcp startup_outcome_error_identifies_authentication_required` - `just test -p codex-mcp mcp_startup_failure_reason_requires_existing_oauth_and_auth_failure` - `cargo build -p codex-cli --bin codex` - local app-server probe emitted `failureReason: "reauthenticationRequired"` - manual end-to-end reconnect flow confirmed - `just fmt`
felixxia-oai ·
2026-06-26 14:11:13 -07:00 -
Close thread persistence when submission channel closes (#30173)
### Summary Release live thread persistence when a session ends because its submission channel closes. This prevents a later same-process resume from failing with `thread ... already has a live local writer`. ### Details The issue is in the `codex-core` session teardown path used by Codex hosts, rather than in Managed Agents API or exec-server itself. Explicit shutdown already closes the `LiveThread`, which releases the process-scoped writer held by `LocalThreadStore`. The submission-channel-close fallback ran runtime and extension teardown but skipped that persistence shutdown, leaving the thread ID registered as having a live writer. This change: - closes the `LiveThread` on the channel-close fallback path; - preserves the existing teardown order used by explicit shutdowns; - extends the lifecycle regression test to assert that the thread store receives `shutdown_thread`. Context: [original report](https://openai.slack.com/archives/C0B4NBHQGTV/p1782136364948039), [recent occurrence 1](https://openai.slack.com/archives/C0B4NBHQGTV/p1782434817895839?thread_ts=1782136364.948039&cid=C0B4NBHQGTV), [recent occurrence 2](https://openai.slack.com/archives/C0B4NBHQGTV/p1782335107474429?thread_ts=1782136364.948039&cid=C0B4NBHQGTV) ### Testing - `just test -p codex-core submission_loop_channel_close_runs_full_thread_teardown` - `just test -p codex-core --lib` (1,989 passed; 3 skipped) - `just fix -p codex-core` - `just fmt` - Native code review: no findings I also attempted `just test -p codex-core`. The new regression passed; 79 unrelated integration tests failed in the local harness, primarily because helper binaries such as `test_stdio_server` were unavailable, plus local proxy/shell timing failures.
Abdulrahman Alfozan ·
2026-06-26 13:56:17 -07:00 -
feat: add GPT-5.6 variants to Bedrock catalog (#30285)
## Summary - add Sol (`openai.gpt-5.6-sol`), Terra (`openai.gpt-5.6-terra`), and Luna (`openai.gpt-5.6-luna`) to the Amazon Bedrock static model catalog - derive all three entries from the bundled GPT-5.5 metadata and add the Bedrock-only `max` reasoning effort - keep the new entries below the current GPT-5.5 and GPT-5.4 models at priorities 2, 3, and 4, preserving GPT-5.5 as the default - add deep-equality coverage for inherited model configuration, catalog ordering, context windows, and service-tier behavior
Celia Chen ·
2026-06-26 20:32:49 +00:00 -
Let Codex consult user-level code-review-* skills. (#30143)
## Why I use the `$code-review` skill a lot and it'd be nice to add my own additional review criteria in `$CODEX_HOME/skills/code-review-*`. ## What Removes phrasing about "code-review-* skills in this repository" which in practice seems like enough to get Codex to consult my user-level code review skills in addition to the repo-level ones.
Adam Perry @ OpenAI ·
2026-06-26 12:36:40 -07:00 -
feat(app-server): add optional turn_id to thread/fork (#30277)
## Description This adds stable optional `turnId` support to `thread/fork`. When supplied, the fork copies persisted history through that terminal turn, inclusive, and drops later turns from the new thread. Omitting or passing `null` preserves the existing full-history fork behavior, including the interruption marker when the stored source history ends mid-turn. ## Why We're deprecating `thread/rollback` and this will help certain UX use cases work around it by using `thread/fork` + `turn_id` instead.
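A minimal sketch of the inclusive cut, assuming a simplified `Turn` type and a hypothetical `fork_history` helper (neither is the app-server's real type). It ignores the interruption marker and the "terminal turn" check, and returning `None` for an unknown turn ID is an assumption, not documented behavior.

```rust
/// Hypothetical, simplified stand-in for a persisted turn.
#[derive(Debug, Clone, PartialEq)]
pub struct Turn {
    pub id: String,
    pub text: String,
}

/// Copies history for a fork.
/// `None` keeps the full history; `Some(id)` keeps history through that
/// turn, inclusive, and drops later turns.
pub fn fork_history(source: &[Turn], turn_id: Option<&str>) -> Option<Vec<Turn>> {
    match turn_id {
        None => Some(source.to_vec()),
        Some(id) => {
            // Position of the requested turn; `?` returns None if it is absent.
            let end = source.iter().position(|t| t.id == id)?;
            Some(source[..=end].to_vec())
        }
    }
}
```

The inclusive range `..=end` is the point of the sketch: the named turn itself is part of the fork.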
Owen Lin ·
2026-06-26 19:35:54 +00:00 -
ensure thread.history_mode is immutable (#30261)
## Description This PR makes `thread.history_mode` immutable after the thread's canonical first `SessionMeta` has been written. Later same-thread `SessionMeta` lines are compatibility metadata writes, not a new thread definition. Without this, an older binary could append a `SessionMeta` that omits `history_mode`; when a newer binary replays it, serde defaults that missing field to `legacy` and SQLite could downgrade a paginated thread. ## Why `history_mode` is the persisted thread storage contract. Paginated-thread fail-closed behavior and SQLite memory filtering depend on it staying aligned with canonical rollout metadata, especially when multiple Codex binary versions can touch the same local rollout. ## What changed - Stop generic rollout metadata replay from overwriting `history_mode` from later `SessionMeta` items. - Remove `history_mode` from `ThreadMetadataPatch`, so mutable metadata sync and app-server metadata updates cannot rewrite it. - When local metadata sync has to recreate a missing SQLite row, recover `history_mode` from the rollout's canonical first `SessionMeta` instead of from a mutable patch. - Keep the in-memory thread store using the created thread's canonical `history_mode` instead of metadata patches. - Fill the one remaining core test `CreateThreadParams` initializer with the new `history_mode` field; Bazel CI caught this after the parent history-mode PR landed. ## Validation - `just fmt` - `just test -p codex-thread-store` - `just test -p codex-state session_meta_does_not_set_model_or_reasoning_effort`
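The downgrade hazard and the first-write-wins rule can be sketched as follows. The types are simplified assumptions (the real `SessionMeta` carries many more fields), and both function names are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HistoryMode {
    Legacy,
    Paginated,
}

/// Simplified SessionMeta line; `None` models an older binary omitting the field.
pub struct SessionMeta {
    pub history_mode: Option<HistoryMode>,
}

/// Last-write-wins replay: a later line without the field downgrades the thread.
pub fn naive_last_wins(lines: &[SessionMeta]) -> Option<HistoryMode> {
    lines
        .last()
        .map(|m| m.history_mode.unwrap_or(HistoryMode::Legacy))
}

/// Immutable replay: only the canonical first SessionMeta defines the mode.
pub fn canonical_history_mode(lines: &[SessionMeta]) -> Option<HistoryMode> {
    lines
        .first()
        .map(|m| m.history_mode.unwrap_or(HistoryMode::Legacy))
}
```

With a paginated first line followed by a line that omits the field, the naive replay yields `Legacy` while the canonical replay keeps `Paginated`.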
Owen Lin ·
2026-06-26 12:32:31 -07:00 -
[codex] Use managed defaults for TUI threads (#30147)
## Why #29683 exposes managed defaults for new-thread model settings through `configRequirements/read` without applying them server-wide. The TUI is an app-server client, so it should explicitly consume those defaults when it creates a fresh thread. This lets plain `codex` start on the managed model while preserving the existing ability to change model settings within the thread. ## What changed - Read `requirements.models.newThread` during TUI app-server bootstrap. - Apply the managed model, reasoning effort, and service tier to the initial fresh thread and subsequent `/new` or `/clear` threads. - Keep explicit launch overrides above the managed defaults. - Normalize the managed `fast` service tier to the `priority` request value. - Leave resumed and forked threads unchanged. The application logic lives in a small TUI-only module; app-server `thread/start` behavior remains unchanged for other clients. ## User experience - Plain `codex` starts with the managed new-thread settings. - A user can still change settings with `/model` or the existing service-tier controls. - Starting another fresh thread reapplies the managed defaults. - Explicit launch choices such as `codex -m <model>` continue to win. ## Validation - `just test -p codex-tui managed_new_thread_defaults` - `just fix -p codex-tui` Depends on #29683.
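The precedence described above (explicit launch overrides first, then managed defaults, with `fast` normalized to `priority`) might look roughly like this. `ThreadSettings` and both functions are hypothetical names for illustration, not the TUI module's actual API.

```rust
/// Hypothetical container for the three managed new-thread settings.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct ThreadSettings {
    pub model: Option<String>,
    pub reasoning_effort: Option<String>,
    pub service_tier: Option<String>,
}

/// Maps the managed `fast` tier to the `priority` request value.
pub fn normalize_service_tier(tier: &str) -> String {
    if tier == "fast" {
        "priority".to_string()
    } else {
        tier.to_string()
    }
}

/// Explicit launch choices win; managed defaults fill the remaining gaps.
pub fn resolve_new_thread(explicit: &ThreadSettings, managed: &ThreadSettings) -> ThreadSettings {
    ThreadSettings {
        model: explicit.model.clone().or_else(|| managed.model.clone()),
        reasoning_effort: explicit
            .reasoning_effort
            .clone()
            .or_else(|| managed.reasoning_effort.clone()),
        service_tier: explicit
            .service_tier
            .clone()
            .or_else(|| managed.service_tier.as_deref().map(normalize_service_tier)),
    }
}
```

Calling `resolve_new_thread` again for each `/new` or `/clear` thread would reapply the managed defaults, while in-thread changes such as `/model` stay outside this function.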
hefuc-oai ·
2026-06-26 19:27:31 +00:00 -
[codex] allow AGENTS.md and skills to authorize delegation (#30274)
Updates the MAv2 prompt to include AGENTS.md and skills more explicitly. The change should mimic https://github.com/openai/codex/pull/27919.
Charles Du ·
2026-06-26 12:17:26 -07:00 -
Overlap executor skill reads with namespace discovery (#30225)
## Why

Environment skill discovery needs two independent pieces of information:

- plugin namespaces from `plugin.json` files; and
- skill metadata from each `SKILL.md` file.

Today these happen in sequence. Codex waits for every plugin namespace lookup to finish before it starts reading any skill files. On a remote executor, that creates an avoidable network-latency barrier.

```text
before:
  walk -> namespace lookups -> skill reads -> build catalog

after:
  walk -> namespace lookups ─┐
       -> skill reads ───────┴-> build catalog
```

## What changes

- Read and parse skill files without waiting for plugin namespace discovery.
- Resolve root and nested plugin namespaces concurrently.
- Join both results only when constructing the final qualified skill names.
- Keep the existing 64-skill concurrency bound, output ordering, warnings, metadata behavior, and namespace rules.

## Testing

The regression test makes plugin manifest lookup wait until a `SKILL.md` read has started. The old serialized pipeline would time out; the new pipeline completes and still returns the correctly namespaced skill.

`just test -p codex-core-skills` passes all 111 tests.

## Out of scope

This does not add an exec-server endpoint, batch filesystem calls, or reduce the number of files transferred. A frontmatter-only read or server-side skill catalog can remain a separate follow-up if benchmarks show that transferred bytes are the next bottleneck.
jif ·
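The overlap in #30225 can be illustrated with scoped threads. This is a sketch under assumptions: `SkillEntry`, `build_catalog`, and the two callbacks are hypothetical, the qualified-name format is invented, and plain threads stand in for whatever concurrency mechanism the real loader uses.

```rust
use std::collections::HashMap;
use std::thread;

/// Hypothetical walk result: a skill and the plugin directory that owns it.
pub struct SkillEntry {
    pub plugin_dir: String,
    pub skill_name: String,
}

/// Runs namespace lookups and skill reads concurrently, then joins both
/// results only when building the qualified names. Output order follows `entries`.
pub fn build_catalog<N, R>(
    entries: &[SkillEntry],
    lookup_namespace: N,
    read_skill: R,
) -> Vec<(String, String)>
where
    N: Fn(&str) -> String + Sync,
    R: Fn(&str) -> String + Sync,
{
    let (namespaces, bodies) = thread::scope(|s| {
        // Branch 1: resolve the namespace of every plugin directory.
        let ns = s.spawn(|| {
            entries
                .iter()
                .map(|e| (e.plugin_dir.clone(), lookup_namespace(&e.plugin_dir)))
                .collect::<HashMap<String, String>>()
        });
        // Branch 2: read skill bodies without waiting for branch 1.
        let bodies = s.spawn(|| {
            entries
                .iter()
                .map(|e| read_skill(&e.skill_name))
                .collect::<Vec<String>>()
        });
        (ns.join().unwrap(), bodies.join().unwrap())
    });

    // Join point: qualified name = namespace + skill name.
    entries
        .iter()
        .zip(bodies)
        .map(|(e, body)| {
            let qualified = format!("{}:{}", namespaces[&e.plugin_dir], e.skill_name);
            (qualified, body)
        })
        .collect()
}
```

Neither branch depends on the other's output, which is why the barrier between them can be removed without changing the final catalog.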
2026-06-26 18:37:59 +00:00 -
[codex] Add managed new-thread model settings (#29683)
## Why Admins need persistent defaults for the model, reasoning effort, and service tier shown when the Desktop App creates a new thread. These are initialization defaults rather than runtime constraints: the App should use them to initialize its draft while still allowing a user to make an explicit selection. The app-server therefore needs to expose the managed values before thread creation without changing `thread/start` behavior for other clients. ## What changed - Parse `model`, `model_reasoning_effort`, and `service_tier` from `[models.new_thread]` in `requirements.toml`. - Compose the `models` requirements through the existing requirements-layer precedence rules. - Expose the resolved values through `configRequirements/read` as `requirements.models.newThread`. - Add the corresponding app-server protocol types and regenerate the JSON and TypeScript schema fixtures. - Document the new `configRequirements/read` fields in the app-server README. ## Scope This PR is data plumbing only. It does not apply these values during `thread/start` and does not change thread creation for existing app-server clients, resumed or forked sessions, internal or subagent sessions, `codex exec`, or the TUI. A companion Desktop App change owns draft initialization, sends the effective settings for ordinary and prewarmed starts, and preserves explicit user changes. ## Validation - Requirements deserialization coverage for `[models.new_thread]` - Requirements-layer precedence coverage - App-server API mapping coverage - `configRequirements/read` integration coverage - Regenerated app-server JSON and TypeScript schema fixtures
hefuc-oai ·
2026-06-26 18:37:40 +00:00 -
fix main (#30276)
Fixes a breakage on `main` introduced by a merge race around `thread.history_mode`.
Owen Lin ·
2026-06-26 18:05:00 +00:00 -
feat(app-server): add history_mode to thread (#29927)
## Description This PR adds a new `historyMode = "legacy" | "paginated"` to `Thread`. This will be stored in `SessionMeta` in the JSONL rollout file and as a new column in the SQLite thread_metadata table, and exposed on `thread/start` and on the `Thread` object in app-server. ## What changed - Added canonical `ThreadHistoryMode` with `legacy` and `paginated`, defaulting old and new SessionMeta to `legacy`. - Carried `history_mode` through core session config, ThreadStore stored metadata, local/in-memory stores, rollout metadata extraction, and the existing SQLite `threads` table. - Added experimental `historyMode` to app-server v2 `Thread` and `thread/start`. - Made paginated stored threads metadata-discoverable but unsupported for legacy full-history reads, `load_history`, live resume, and create paths. - Regenerated app-server schema fixtures and added protocol/state/thread-store/app-server coverage for persistence and fail-closed behavior. ## Compatibility floor Because users may be running various versions of Codex binaries on the same machine (TUI, Codex App, etc.), we will need to establish a compatibility floor for upcoming paginated threads, which will change how thread storage reads and writes work. The overall plan here: ``` Release N: - Add historyMode to SessionMeta / Thread / SQLite metadata. - Teach binaries to understand paginated threads. - If a binary sees `historyMode="paginated"` but does not support the paginated contract, it refuses to resume/mutate the thread. - Default remains `"legacy"`. Release N+1: - First-party clients start opting into paginated threads where appropriate. - Internal dogfood / staged rollout. - Measure old-client usage and paginated-thread unsupported errors. Release N+2: - Only after Release N+ is overwhelmingly deployed, make paginated the default. - Accept that a small tail of N-1-or-older binaries may not understand paginated threads. 
``` The important behavior change is fail-closed handling for a binary that encounters a persisted `paginated` thread before it knows how to fully support paginated history. In app-server, if a thread is `paginated`, we will: - allow metadata-only discovery paths like `thread/list` and `thread/read(includeTurns=false)`, so clients can still see the thread and inspect its `historyMode` - reject legacy full-history/live-thread paths like `thread/read(includeTurns=true)` and `thread/resume` with an unsupported JSON-RPC error - avoid silently treating an unknown or future `historyMode` as `legacy` Under the hood, the ThreadStore layer also rejects legacy operations that would need to load or replay the full thread history for a paginated thread. That gives us the behavior we want for Release N: future paginated threads are visible, but this binary fails closed instead of trying to operate on them as if they were legacy threads.
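The fail-closed rules above can be sketched as two small checks. The `Operation` enum and both function names are assumptions made for illustration; the real app-server returns an unsupported JSON-RPC error rather than a `String`.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HistoryMode {
    Legacy,
    Paginated,
}

/// Parses a stored mode. A missing field defaults to legacy, but an unknown
/// or future value is an error instead of being silently treated as legacy.
pub fn parse_history_mode(raw: Option<&str>) -> Result<HistoryMode, String> {
    match raw {
        None | Some("legacy") => Ok(HistoryMode::Legacy),
        Some("paginated") => Ok(HistoryMode::Paginated),
        Some(other) => Err(format!("unsupported historyMode: {}", other)),
    }
}

/// Hypothetical operation kinds mirroring the paths named in the description.
pub enum Operation {
    ThreadList,
    ThreadReadMetadataOnly,
    ThreadReadWithTurns,
    ThreadResume,
}

/// Metadata-only discovery stays allowed; full-history and live paths are rejected.
pub fn check_supported(mode: HistoryMode, op: &Operation) -> Result<(), String> {
    match (mode, op) {
        (HistoryMode::Paginated, Operation::ThreadReadWithTurns | Operation::ThreadResume) => {
            Err("paginated threads are not supported by this binary".to_string())
        }
        _ => Ok(()),
    }
}
```

Keeping discovery open while rejecting full-history reads lets a client still display the thread and inspect its `historyMode`.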
Owen Lin ·
2026-06-26 09:12:42 -07:00 -
Relax hooks.json top-level metadata validation (#30229)
## Summary - Allow a top-level `description` string in `hooks.json`. - Continue rejecting unknown top-level keys and root-level hook events; events must remain under `hooks`. ## Testing - `just test -p codex-config`
charlesgong-openai ·
2026-06-26 11:24:12 -04:00 -
[codex] narrow unused skills intro export (#29991)
## Summary - stop publicly re-exporting the internally used `SKILLS_INTRO_WITH_ALIASES` constant - keep the constant and all skills rendering behavior unchanged - preserve every integration helper, API, fixture, assertion, and module used by tests ## Scope guardrails This revision keeps all remote/network-facing functionality and every line introduced by `jif <jif@openai.com>`. Following the test-preservation audit, it also restores the in-process RMCP test transport, the original `codex-mcp` fixture, `PluginLoadOutcome::effective_skill_roots` and its assertions, the `EffectiveSkillRoots` API family, the test-only apps renderer, and the TUI dead-code annotation. Those files now match the PR base exactly. No test imports or directly references the remaining public skills export being narrowed. ## Validation - repository-wide test-reference audit: no test-used code remains deleted or narrowed - deleted-line `git blame` audit: zero Jif-authored deletions - `cargo test -p codex-core-plugins -p codex-mcp -p codex-rmcp-client --lib`: 467 passed - `cargo test -p codex-core --lib apps::render`: 2 passed - `cargo test -p codex-core-skills --lib render::tests`: 19 passed - `cargo check -p codex-core-skills --all-targets`: passed - `just fix -p codex-core-skills`: passed - `just fmt`: passed - `git diff --check`: passed The full local `codex-core-skills` suite passed 106/108 tests; two loader tests detected an ambient repository skills root outside the package and failed their isolation assertions. The scoped renderer suite and all-target compile pass, and CI runs in an isolated environment. Final code delta: 1 insertion, 2 deletions across 2 files.
Ahmed Ibrahim ·
2026-06-26 05:52:04 -07:00 -
Test selected capabilities across unavailable resume (#30215)
## Why

The selected-capability integration test already covers initial attachment and cold resume, but it resumes while the selected executor is still reachable. That leaves an important World State transition untested: a thread remembers its selected capability root, resumes while that environment is unavailable, and later sees the same stable environment return.

## What this tests

This extends the existing end-to-end scenario:

```text
selected executor available
  ↓
app-server stops and the executor goes away
  ↓
thread resumes with the executor unavailable
  ↓
skills, selected MCP tools, and connector attribution are absent
  ↓
the same environment ID is attached again
  ↓
skills, MCP tools, and connector attribution return
```

The test also checks that the unavailable snapshot explicitly tells the model that no selected-environment skills are currently available. After reattachment, it invokes the selected skill again and verifies that a new executor-owned MCP process starts.

## Scope

This is test-only. It keeps the existing assumption that an environment ID refers to stable capability contents. It does not add package-file invalidation or live transport reconnect behavior.
jif ·
2026-06-26 11:02:27 +01:00 -
Reuse MCP runtimes when selected availability changes nothing (#30148)
## Why MCP runtime reuse was keyed by every ready selected-capability environment, even when an environment contributed no MCP servers or connectors. For example: 1. a global stdio MCP is running; 2. a selected remote environment contains only a skill; 3. that environment becomes ready; 4. the MCP and connector projection stays exactly the same; 5. Codex nevertheless rebuilds the MCP manager and restarts the global stdio process. That restart can interrupt active calls and discard process-local state even though nothing about MCP changed. ## What changes When selected-environment availability changes, Codex now resolves the candidate MCP and connector projection before deciding whether to replace the runtime: - if the winning MCP servers or their ownership change, rebuild as before; - if the selected connector snapshot changes, rebuild as before; - if an enabled MCP is explicitly bound to an environment whose availability changed, rebuild as before; - otherwise, keep the exact live manager and processes, and update only the availability input remembered by the snapshot. ```text ready selected environments: [] -> [skills-env] resolved MCP servers: {global_probe} -> {global_probe} resolved connectors: {} -> {} result: reuse manager; keep the same process ``` The comparison uses the resolved winning servers and their sources, so plugin/config ownership remains part of the runtime identity. ## Existing stack coverage The integration PR directly below this one already covers both rebuild boundaries: a selected MCP becomes callable and a selected connector tool becomes model-visible when their environment becomes available. It also verifies that an unchanged selected MCP runtime keeps its process. This PR does not add another remote-attachment integration scenario for the no-change optimization. `environment/add` returns before readiness, and app-server does not currently expose a deterministic readiness signal for an environment that contributes only skills. 
Keeping a fixed-delay test would add flake risk; adding a new readiness API would be outside this fix.

## Scope and assumptions

- This does not change skill discovery, World State rendering, or plugin metadata caching.
- This does not add file watching or hot reload behavior.
- This does not change disconnect/reconnect handling.
- Selected environment IDs and their capability contents retain the stack's existing stability assumption.
- Delayed `required = true` executor MCP behavior remains out of scope.
jif ·
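The reuse decision in #30148 can be sketched as a comparison of resolved projections. `McpProjection` and `must_rebuild` are hypothetical names, and the field layout is an assumption; only the three rebuild conditions come from the description.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical resolved MCP and connector projection for a thread.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct McpProjection {
    /// Winning server name -> owning source (plugin or config).
    pub servers: BTreeMap<String, String>,
    /// Selected connector snapshot.
    pub connectors: BTreeSet<String>,
    /// Enabled server name -> environment it is explicitly bound to.
    pub bound_environments: BTreeMap<String, String>,
}

/// Returns true when the runtime must be replaced; otherwise the live
/// manager and its processes can be kept.
pub fn must_rebuild(
    current: &McpProjection,
    candidate: &McpProjection,
    changed_envs: &BTreeSet<String>,
) -> bool {
    current.servers != candidate.servers
        || current.connectors != candidate.connectors
        || candidate
            .bound_environments
            .values()
            .any(|env| changed_envs.contains(env))
}
```

Because `servers` maps each name to its owning source, a change of ownership alone still forces a rebuild, matching the note that ownership is part of the runtime identity.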
2026-06-26 09:27:41 +01:00 -
[codex] fix CreateThreadParams test initializer (#30198)
## Summary - initialize `selected_capability_roots` in the new `attach_in_memory_thread_store` test helper - restore `codex-core` test compilation on `main` ## Root cause [#30144](https://github.com/openai/codex/pull/30144) added the helper from commit `0c3d0742`, whose parent was `c38b2e9b`. That branch was based before [#29856](https://github.com/openai/codex/pull/29856) added `selected_capability_roots` as a required field on `CreateThreadParams`. The PR's Rust and Bazel workflows both passed against the stale branch head `0c3d0742`. When #30144 was squashed onto newer `main`, its initializer was integrated alongside the required field from #29856, producing `E0063` in `core/src/session/tests.rs`. Because those workflows tested the branch head rather than the integrated merge result, they did not see the version-skew failure before merge. ## Impact Any job that compiles the `codex-core` library tests fails, which turned the main-branch `rust-ci-full` and `Bazel` workflows red across platforms and blocks unrelated focused core tests. This change only completes the test initializer; it does not alter production behavior or workflow configuration. ## Validation - `just fmt` - `just test -p codex-core turn_complete_flushes_terminal_event_after_delivery` (1 passed, 2909 skipped) - `git diff --check`
Adam Perry @ OpenAI ·
2026-06-26 08:47:27 +01:00 -
[codex] wire process-owned code mode host into core (#30142)
## Summary - add the `code_mode_host` feature flag and select `ProcessOwnedCodeModeSessionProvider` in `CodeModeService` when enabled - initialize code-mode sessions lazily so a missing host reports a tool error without failing thread startup - resolve `codex-code-mode-host` beside the running Codex binary by default while preserving `CODEX_CODE_MODE_HOST_PATH` as an override - add unit and end-to-end coverage for host resolution and graceful missing-host behavior ## Why This wires the process-owned session client from #30112 into the core service behind an opt-in rollout gate. Packaged Codex installations can place the helper in the same `bin` directory as the main executable without relying on `PATH`, while development and custom installations can continue to override the helper path. ## Stack - Depends on #30112 - Base branch: `cconger/process-owned-session-runtime-4-client` ## Validation Build `codex` and `codex-code-mode-host` `CODEX_CODE_MODE_HOST_PATH="$PWD/target/debug/codex-code-mode-host" ./target/debug/codex --enable code_mode_host`
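The host lookup order described above (environment override first, otherwise a sibling of the running binary) might be sketched like this. Both function names are hypothetical, and treating an empty override as unset is an assumption; platform-specific executable suffixes are deliberately ignored.

```rust
use std::env;
use std::path::{Path, PathBuf};

const HOST_BINARY_NAME: &str = "codex-code-mode-host";

/// Pure resolution logic: an explicit override wins; otherwise look beside
/// the running executable instead of relying on PATH.
pub fn resolve_host_path(override_path: Option<&str>, current_exe: &Path) -> PathBuf {
    if let Some(path) = override_path.filter(|p| !p.is_empty()) {
        return PathBuf::from(path);
    }
    match current_exe.parent() {
        Some(dir) => dir.join(HOST_BINARY_NAME),
        None => PathBuf::from(HOST_BINARY_NAME),
    }
}

/// Reads the real process inputs and delegates to the pure function above.
pub fn resolve_host_path_from_process() -> std::io::Result<PathBuf> {
    let override_path = env::var("CODEX_CODE_MODE_HOST_PATH").ok();
    Ok(resolve_host_path(override_path.as_deref(), &env::current_exe()?))
}
```

Splitting the pure logic from the process lookup keeps the resolution rule unit-testable without touching the environment.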
Channing Conger ·
2026-06-26 00:23:33 -07:00 -
[codex] add process-owned code-mode session client (#30112)
## Summary - add `ProcessOwnedCodeModeSessionProvider` and logical session generation/rebinding state - add the supervised child-process connection, reader/writer tasks, and driver state machine - make dropped execute/wait/open callers cancellation-safe with explicit ownership handoff and durable cleanup - validate cell/delegate lifecycle state and reject invalid protocol transitions - add end-to-end stdio coverage for delegates, cancellation, frame limits, child loss, stale generations, replacement, and long-lived sessions ## Why This final stage exposes the process-owned client only after the wire protocol, host-safe runtime, and standalone host are independently in place. Transport failure is fail-stop: the client closes local state, cancels callbacks, reaps the child, and lazily rebuilds a fresh host generation rather than transactionally recovering the old connection. ## Stack This is **4 of 4** in the process-owned code-mode session stack. - Depends on #30111 - Full stack: #30108 → #30110 → #30111 → this PR ## Validation - `just test -p codex-code-mode -p codex-code-mode-host` — 86 passed - `just fix -p codex-code-mode` - `just fix -p codex-code-mode-host` - `just bazel-lock-update` - `just bazel-lock-check` - `bazel test //codex-rs/code-mode:code-mode-unit-tests //codex-rs/code-mode-host:code-mode-host-unit-tests //codex-rs/code-mode-host:code-mode-host-stdio-test //codex-rs/code-mode-protocol:code-mode-protocol-unit-tests` — 4/4 passed - `just fmt`
Channing Conger ·
2026-06-25 23:46:17 -07:00 -
Persist Cloudflare affinity cookies for MCP HTTP (#29516)
[Codex Thread 019ef1f9-36e2-7e91-9337-504f097b9dc1](https://codex-thread-link.openai.chatgpt-team.site/thread/019ef1f9-36e2-7e91-9337-504f097b9dc1) ## Why Hosted plugin-service Streamable HTTP MCP traffic uses `https://chatgpt.com/backend-api/ps/mcp` and depends on Cloudflare's `__cflb` cookie for load-balancer affinity. The local and exec-server `http/request` path built a fresh reqwest client for each request without installing Codex's existing shared ChatGPT Cloudflare cookie store, so affinity could be lost between calls. This is an affinity-hardening change motivated by an incident investigation. It does not establish the broader connector-cache incident RCA or claim to fix that incident in full. ## What changed - Install the existing process-local, strictly allowlisted ChatGPT Cloudflare cookie store on the reqwest client used by `ReqwestHttpClient`. - Fresh clients now share allowed Cloudflare infrastructure cookies within the process that originates the local or exec-server network request. - Keep the existing HTTPS ChatGPT-host and Cloudflare-cookie-name restrictions. This does not introduce a general cookie jar or send ChatGPT Cloudflare cookies to unrelated hosts. ## Test coverage - `codex-client` unit coverage verifies that the existing strict store accepts and returns `__cflb` for HTTPS ChatGPT URLs. - The exec-server HTTPS integration test sends four independent `http/request` calls through a local TLS-intercepting proxy and verifies that: - `Set-Cookie: __cflb=west` is sent on the next plugin-service request; - a later `Set-Cookie: __cflb=central` replaces the stored value; - non-Cloudflare session cookies are discarded; - no stored ChatGPT Cloudflare cookie is sent to a non-ChatGPT host. - `just test -p codex-client` — 38 passed. - `just test -p codex-exec-server --test chatgpt_cloudflare_affinity` — 1 passed. - `just bazel-lock-check` — passed. ## Non-goals - No persistence of ChatGPT auth, account, session, residency, or arbitrary cookies. 
- No cookie persistence for third-party MCP servers. - No special composition of caller-provided `Cookie` headers. - No plugin-service, connector-cache, Habitat/habicache, routing, redirect, or API-contract changes. - No broader incident RCA conclusions.
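A strictly allowlisted store of the kind #29516 describes could look roughly like the sketch below. `AffinityCookieStore` is a hypothetical type, not the actual `codex-client` store, and the host check (exact `chatgpt.com` or a subdomain, HTTPS only) is an assumption about what "ChatGPT host" means.

```rust
use std::collections::BTreeMap;

/// Only Cloudflare infrastructure cookies are kept; `__cflb` is the one named above.
const ALLOWED_COOKIE_NAMES: &[&str] = &["__cflb"];

/// Assumed host rule: HTTPS only, and chatgpt.com or one of its subdomains.
fn is_allowed_url(url: &str) -> bool {
    match url.strip_prefix("https://") {
        Some(rest) => {
            let host = rest
                .split(|c: char| c == '/' || c == '?' || c == '#')
                .next()
                .unwrap_or("");
            host == "chatgpt.com" || host.ends_with(".chatgpt.com")
        }
        None => false,
    }
}

#[derive(Default)]
pub struct AffinityCookieStore {
    cookies: BTreeMap<String, String>,
}

impl AffinityCookieStore {
    /// Records a Set-Cookie header value if both URL and cookie name are allowed.
    pub fn store(&mut self, url: &str, set_cookie: &str) {
        if !is_allowed_url(url) {
            return;
        }
        // Keep only the leading `name=value` pair; attributes are ignored.
        let pair = set_cookie.split(';').next().unwrap_or("");
        if let Some((name, value)) = pair.split_once('=') {
            let name = name.trim();
            if ALLOWED_COOKIE_NAMES.contains(&name) {
                self.cookies.insert(name.to_string(), value.trim().to_string());
            }
        }
    }

    /// Builds the Cookie header for a request, or None if nothing may be sent.
    pub fn cookie_header(&self, url: &str) -> Option<String> {
        if !is_allowed_url(url) || self.cookies.is_empty() {
            return None;
        }
        let pairs: Vec<String> = self
            .cookies
            .iter()
            .map(|(name, value)| format!("{}={}", name, value))
            .collect();
        Some(pairs.join("; "))
    }
}
```

Sharing one such store across freshly built clients is what preserves affinity between independent requests in the same process.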
stevenlee-oai ·
2026-06-26 02:23:24 -04:00 -
Retry failed Codex Apps MCP startup (#29920)
## Problem The built-in Codex Apps MCP client shares a future for the full startup operation: connect, complete `initialize`, fetch the initial tools, and return a usable client. Sharing deduplicates startup work, but it also memoizes terminal errors. After a transient connection, handshake, or initial `tools/list` failure, later tool builds observe the same failed future. The thread cannot reconnect after the backend recovers and continues serving its startup-time cached tool snapshot, which may be empty or stale. ## Fix When Apps MCP startup ends in an error, Codex starts bounded recovery without putting startup latency on tool-router construction: 1. The current tool build immediately continues with the cached startup snapshot. 2. After the initial failure is reported, Codex starts one fresh full startup attempt in the background. 3. Concurrent tool builds share that in-flight attempt and also continue with cached tools. 4. On success, the recovered client becomes active, refreshes the Apps tools cache, emits a `Ready` startup status, and is reused by later operations. 5. On failure, the cache remains unchanged and later tool builds may start another background attempt after exponential cooldown: 1s, 2s, 4s, 8s, 16s, then 30s maximum. Each recreated startup performs a fresh MCP `initialize` and uncached `tools/list`. The MCP client retains its existing bounded retries for retryable `initialize` and `tools/list` failures. This avoids adding the Apps startup timeout to every request during a sustained outage. ## Scope This is limited to the built-in Codex Apps MCP client: - no reconnects for user-configured MCP servers; - no cache deletion; and - no proactive refresh for a healthy client with stale tools. 
## Tests Coverage verifies: - tool builds return cached tools without waiting for a blocked reconnect; - concurrent tool builds start only one background reconnect; - failed reconnects preserve cached tools and respect exponential cooldown; - a recovered client is retained and reused; and - a long-lived thread exposes recovered app tools on a later follow-up. Validation: - `just test -p codex-mcp` — 95 passed - `just test -p codex-core later_follow_up_uses_background_recovered_apps_after_mid_thread_startup_failures --no-capture` — passed - `just fix -p codex-mcp` - `just fmt`
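The cooldown schedule quoted above (1s, 2s, 4s, 8s, 16s, then 30s maximum) can be written as a small pure function. The name `reconnect_cooldown` and the convention that the argument counts failed attempts starting at 1 are assumptions for illustration.

```rust
use std::time::Duration;

/// Cooldown before the next background startup attempt.
/// `failed_attempts` counts consecutive failures, starting at 1
/// (0 is treated like 1).
pub fn reconnect_cooldown(failed_attempts: u32) -> Duration {
    let exponent = failed_attempts.saturating_sub(1);
    // Doubling from 1s up to 16s, then a fixed 30s ceiling.
    let seconds = if exponent >= 5 { 30 } else { 1u64 << exponent };
    Duration::from_secs(seconds)
}
```

Note that the ceiling is 30s rather than the next power of two, so the last step is a clamp and not a doubling.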
kbazzi ·
2026-06-25 21:31:12 -07:00 -
[codex] fix terminal rollout event durability (#30144)
Currently session code does not flush the thread store after appending the `TurnComplete` / `TurnAborted` events. This isn't a problem in practice for local storage because append_items itself effectively blocks, but any thread stores that buffer in append_items and only commit on flush effectively never get these events persisted. The fix adds explicit rollout flushes at the terminal emitters after normal completion and interruption. Added test cases that assert the number of flushes when completing or aborting turns. These are admittedly a little brittle and I'm open to better ideas on how to add automated testing.
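A minimal sketch of why the explicit flush matters, assuming a synchronous `ThreadStore` trait and a `BufferingStore` that commits only on flush. Both types and `emit_terminal_event` are hypothetical; the real thread store API is not shown in the description beyond the `append_items` name.

```rust
/// Hypothetical, synchronous stand-in for the thread store interface.
pub trait ThreadStore {
    fn append_items(&mut self, items: &[String]);
    fn flush(&mut self);
}

/// A store that buffers appends and commits them only on flush.
#[derive(Default)]
pub struct BufferingStore {
    pending: Vec<String>,
    pub committed: Vec<String>,
    pub flushes: usize,
}

impl ThreadStore for BufferingStore {
    fn append_items(&mut self, items: &[String]) {
        self.pending.extend_from_slice(items);
    }

    fn flush(&mut self) {
        // Moves everything buffered so far into durable storage.
        self.committed.append(&mut self.pending);
        self.flushes += 1;
    }
}

/// Terminal emitter: append the event, then flush so it is durable.
pub fn emit_terminal_event(store: &mut impl ThreadStore, event: &str) {
    store.append_items(&[event.to_string()]);
    store.flush();
}
```

Counting flushes, as the added tests do, is one way to assert this; checking the committed contents, as in this sketch, may be a less brittle alternative.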
Tom ·
2026-06-25 21:01:11 -07:00 -
Test selected capabilities across availability and resume (#30157)
## Why This stack crosses World State, executor skills, selected plugin metadata, MCP processes, connectors, dynamic environments, and resume. This PR adds two end-to-end scenarios that validate those pieces together. Both tests enable `deferred_executor`, so they exercise the real delayed-environment path. ## Scenario 1: availability across turns and resume ```text 1. Start a thread with one selected plugin root bound to E1. 2. E1 is unavailable. - executor skill is absent - selected MCP is absent - connector has no selected-plugin attribution 3. Start E1 and register the same stable environment ID. 4. Start a new turn. - the executor skill appears through World State - its body beats a colliding host skill - the selected MCP tool is advertised and executes inside E1 - the connector is attributed to the selected plugin 5. Start another turn without changing E1. - the MCP PID stays the same, proving runtime reuse 6. Restart app-server and resume the thread. - durable selected-root intent is restored - skills, MCP, and connector attribution are restored - a new MCP PID proves ephemeral process state was rebuilt ``` ## Scenario 2: availability changes inside one turn ```text 1. Start a turn while E1 is unavailable. 2. The first model sample sees no executor skill, MCP, or selected connector. 3. The turn pauses on request_user_input. 4. Start E1 and register it while that same turn is still active. 5. Continue the turn. 6. The very next model sample sees: - the executor skill catalog - the selected MCP tool - selected-plugin connector attribution 7. The model calls the MCP, and its output proves execution happened inside E1. ``` This second scenario specifically protects the aeon-style behavior: capability state is captured again for every sampling step, not only at the next user turn. ## Scope These are integration tests only. 
They do not add a combinatorial matrix for unsupported plugin-file mutation, environment generations, transport disconnects, or delayed `required = true` executor MCPs.
jif ·
2026-06-26 03:11:55 +01:00 -
[codex] allow CCA image generation and web search extensions (#29909)
## Summary - allow the standalone image-generation and web-search extensions for the actor-authorized provider shape used by CCA - preserve builtin `image_generation` and `web_search` for older models and existing flows - keep ordinary non-OpenAI providers excluded from both extensions - remove only the image extension local managed-AuthManager requirement that CCA cannot satisfy - share actor-authorization detection through `ModelProviderInfo` - keep Core tests focused on routing behavior and cover header-shape edge cases in `model-provider-info` - add a Responses Lite regression that verifies both `image_gen.imagegen` and `web.run` ## Why CCA uses a provider named `local` with `requires_openai_auth: false` and a non-empty `x-openai-actor-authorization` header. Core accepts that provider shape, but both extension provider-name gates rejected it; image generation additionally required a Codex-managed login. The standalone paths must coexist with existing builtin tools. New Responses Lite models can receive `image_gen.imagegen` and `web.run`, while older models continue using builtin tools. ## Impact This enables both standalone extensions for CCA once installed downstream, without removing or changing builtin-tool compatibility for older models. ## Validation - `just test -p codex-core responses_lite_exposes_standalone_tools_for_actor_authorized_provider` - `just test -p codex-core responses_lite_uses_standalone_web_search_and_image_generation` - `just test -p codex-core hosted_tools_follow_provider_auth_model_and_config_gates` - `just test -p codex-image-generation-extension` - `just test -p codex-web-search-extension` - `just test -p codex-model-provider-info` - `just fmt` - `git diff --check`
Won Park ·
2026-06-25 18:34:35 -07:00 -
Expose MCP app identity in app context (#29934)
## Why MCP tool-call events need to expose trusted app identity and action metadata directly so v2 clients do not have to infer it from tool names or resource URIs. ## What changed - Add optional `appName`, `templateId`, and `actionName` fields to MCP tool-call `appContext`. - Populate `appName` and `templateId` from trusted Codex Apps metadata, and derive `actionName` from the trusted app resource metadata. - Preserve all three fields through core events, legacy protocol events, persisted thread history, resume redaction, and app-server v2 responses. - Document the public `appContext` fields in `codex-rs/app-server/README.md`. - Regenerate app-server JSON and TypeScript schemas and add coverage for serialization, persistence, redaction, and metadata propagation. ## Validation - `just test -p codex-app-server-protocol mcp_tool_call` - `just test -p codex-core mcp_tool_call_item_metadata_only_trusts_codex_apps_identity mcp_tool_call_item_includes_app_identity` - `just write-app-server-schema` --------- Co-authored-by: Martin Au-Yeung <280153141+martinauyeung-oai@users.noreply.github.com>
Martin Au-Yeung ·
2026-06-25 18:31:10 -07:00 -
Keep MCP elicitation routable across runtime refreshes (#30127)
## Why

An MCP tool call can still be waiting for an elicitation response when an environment update replaces the thread's MCP runtime. Before this change:

```text
runtime A starts a tool call and asks the user
environment becomes ready, so runtime B is published
client answers the prompt through runtime B
runtime B cannot find runtime A's pending responder
```

The response is lost and the original tool call stays blocked.

## What changed

All MCP runtimes for one thread now share a small elicitation router:

```text
runtime A ---\
              shared router: response token -> exact pending responder
runtime B ---/
```

When Codex surfaces an MCP elicitation, it assigns a unique opaque response token. The router records which pending request owns that token. A replacement runtime reuses the same router, so the latest runtime can deliver a response to a request started by the previous runtime.

The Codex-owned token also prevents two runtime connections that reuse the same MCP server request ID from receiving each other's responses.

This does not retain or search old MCP managers. Only the pending responder map is shared.

## Covered scenario

The integration test exercises the complete failure mode:

1. A thread starts while its selected environment is still unavailable.
2. A configured MCP server starts a tool call and asks the client for input.
3. The environment becomes ready, causing Codex to publish a replacement MCP runtime.
4. The client answers the original prompt after the replacement.
5. The original tool call receives that answer and completes.

A focused routing test also creates two runtimes with the same server request ID and verifies that each response reaches the exact request that emitted its token.

## Scope

This PR changes only elicitation response routing across MCP runtime replacement. It does not change when runtimes are rebuilt, which environments contribute MCP configuration, or how environment availability is detected.

jif ·
2026-06-26 01:28:14 +00:00 -
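The shared elicitation router from #30127 can be pictured roughly as follows. This is a minimal sketch using only the standard library, not the actual implementation: `ElicitationRouter`, `McpRuntime`, `register`, and `respond` are hypothetical names, and the counter-based token stands in for whatever opaque token scheme Codex really uses.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::{Arc, Mutex};

/// Hypothetical shared router: response token -> exact pending responder.
#[derive(Default)]
struct ElicitationRouter {
    next_id: AtomicU64,
    pending: Mutex<HashMap<String, Sender<String>>>,
}

impl ElicitationRouter {
    /// Registers a pending elicitation. Returns the router-owned token and
    /// the receiver the blocked tool call waits on.
    fn register(&self) -> (String, Receiver<String>) {
        // A counter keeps the sketch simple; a real token would likely be random.
        let id = self.next_id.fetch_add(1, Ordering::Relaxed);
        let token = format!("elicitation-{id}");
        let (tx, rx) = channel();
        self.pending.lock().unwrap().insert(token.clone(), tx);
        (token, rx)
    }

    /// Delivers an answer to the request that owns `token`.
    /// Returns false for unknown or already-answered tokens.
    fn respond(&self, token: &str, answer: String) -> bool {
        let responder = self.pending.lock().unwrap().remove(token);
        match responder {
            Some(tx) => tx.send(answer).is_ok(),
            None => false,
        }
    }
}

/// Hypothetical runtime handle: every runtime of a thread holds the same router.
struct McpRuntime {
    router: Arc<ElicitationRouter>,
}
```

Because the token is issued by the router rather than by the MCP server, two runtimes that reuse the same server request ID still get distinct tokens, and a replacement runtime can answer a request registered by its predecessor simply by holding a clone of the same `Arc`.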
Reinject missing World State fragments on resume (#30152)
## Why

World State restores its structured snapshot on resume so unchanged sections do not have to be rendered again. That is safe only when the model-visible fragment represented by the snapshot is still present in retained history.

For selected executor skills, the failing selected-capability scenario exposed this state:

```text
persisted World State: selected skill catalog is known
retained model history: selected skill catalog message is missing
next diff: unchanged, so emit nothing
```

The model resumes without being told about the selected skill catalog.

## What changed

World State contributions may now optionally describe the concrete model-visible fragment that must remain in retained history. When a persisted snapshot is present:

```text
matching retained fragment exists -> trust snapshot, emit nothing
matching retained fragment missing -> treat section as absent, render current state once
```

The skills extension uses this for non-empty selected-environment catalogs by matching its exact rendered catalog body. Empty or hidden catalogs do not require a fragment.

## Scope

This does not clear or rebuild the whole World State baseline. It does not change skill discovery, cache invalidation, environment availability, or MCP runtime behavior. It only keeps a persisted section snapshot and its retained model context consistent across resume/history reconstruction.

## Coverage

A focused World State regression test verifies both sides:

- a missing retained fragment is rendered again
- a matching retained fragment avoids duplicate injection
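The resume rule above reduces to a small decision function. The sketch below is illustrative only: `SectionSnapshot`, `ResumeAction`, and `resume_action` are hypothetical names, and checking with a substring match against retained messages is an assumption about how "matching" is decided.

```rust
/// Hypothetical persisted snapshot for one World State section.
struct SectionSnapshot<'a> {
    /// Model-visible fragment that must still be in retained history.
    /// `None` models empty or hidden catalogs, which require no fragment.
    required_fragment: Option<&'a str>,
}

#[derive(Debug, PartialEq)]
enum ResumeAction {
    /// Snapshot is consistent with retained history: emit nothing.
    TrustSnapshot,
    /// Fragment is gone: treat the section as absent and render it once.
    RenderCurrentState,
}

/// Decides what to do with a persisted section snapshot on resume.
fn resume_action(snapshot: &SectionSnapshot<'_>, retained_history: &[&str]) -> ResumeAction {
    match snapshot.required_fragment {
        None => ResumeAction::TrustSnapshot,
        // Assumption: a retained message matches if it contains the exact body.
        Some(fragment) if retained_history.iter().any(|msg| msg.contains(fragment)) => {
            ResumeAction::TrustSnapshot
        }
        Some(_) => ResumeAction::RenderCurrentState,
    }
}
```

The check is per section, which matches the stated scope: one missing fragment causes one section to be rendered again without rebuilding the whole baseline.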
jif ·
2026-06-26 02:18:00 +01:00 -
[codex] Attribute app-server analytics by thread originator (#29935)
## Why

Desktop Work threads and regular Codex threads can share the same app-server connection. App-server analytics currently copy `product_client_id` from connection metadata for every thread-scoped event, so Work thread activity is attributed to the Desktop connection instead of the thread's resolved originator. This prevents analytics from distinguishing the two products on a shared connection.

## What changed

- Publish the resolved originator after a thread is materialized, covering new, resumed, forked, and subagent threads.
- Store that originator in the analytics reducer's existing per-thread state.
- Override only `app_server_client.product_client_id` for thread, turn, tool, review, goal, guardian, and compaction events while preserving the connection's client name, version, and transport metadata.
- Fall back to the connection-wide product client ID when a thread has no originator override.
- Preserve persisted originators in thread initialization analytics for resume and fork flows.

## Validation

- `just test -p codex-analytics thread_originator_overrides_shared_connection_across_thread_events subagent_events_keep_thread_originator_with_explicit_turn_connection`
- `just test -p codex-app-server turn_start_tracks_thread_originator_in_analytics thread_start_tracks_thread_initialized_analytics thread_fork_tracks_thread_initialized_analytics thread_resume_tracks_thread_initialized_analytics`
- `just test -p codex-core thread_manager`
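The override-with-fallback behavior described above might look roughly like this. It is a sketch under assumptions, not the real reducer: `AppServerClient`, `AnalyticsReducer`, `set_thread_originator`, and `client_for_thread` are hypothetical names, and only `product_client_id` comes from the PR text.

```rust
use std::collections::HashMap;

/// Hypothetical connection-level client metadata attached to analytics events.
#[derive(Clone, Debug, PartialEq)]
struct AppServerClient {
    product_client_id: String,
    client_name: String,
    client_version: String,
}

/// Hypothetical reducer state: thread id -> resolved originator.
#[derive(Default)]
struct AnalyticsReducer {
    thread_originators: HashMap<String, String>,
}

impl AnalyticsReducer {
    /// Records the originator once a thread has been materialized.
    fn set_thread_originator(&mut self, thread_id: &str, originator: &str) {
        self.thread_originators
            .insert(thread_id.to_string(), originator.to_string());
    }

    /// Builds the client metadata for a thread-scoped event. Only
    /// `product_client_id` is overridden; name and version stay as the
    /// connection reported them. Threads without an override fall back to
    /// the connection-wide value.
    fn client_for_thread(&self, connection: &AppServerClient, thread_id: &str) -> AppServerClient {
        let mut client = connection.clone();
        if let Some(originator) = self.thread_originators.get(thread_id) {
            client.product_client_id = originator.clone();
        }
        client
    }
}
```

Cloning the connection metadata and replacing a single field keeps the fallback path trivial: with no stored originator, the event carries exactly what the connection reported.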
alexsong-oai ·
2026-06-25 18:15:48 -07:00