mirror of
https://github.com/pchuan98/codex.git
synced 2026-07-01 00:31:56 +08:00
841f30598c05eebb62139bdd75abef1b7b61d5f7
7870 Commits
-
[codex] Attribute app-server analytics by thread originator (#29935)
## Why Desktop Work threads and regular Codex threads can share the same app-server connection. App-server analytics currently copy `product_client_id` from connection metadata for every thread-scoped event, so Work thread activity is attributed to the Desktop connection instead of the thread's resolved originator. This prevents analytics from distinguishing the two products on a shared connection. ## What changed - Publish the resolved originator after a thread is materialized, covering new, resumed, forked, and subagent threads. - Store that originator in the analytics reducer's existing per-thread state. - Override only `app_server_client.product_client_id` for thread, turn, tool, review, goal, guardian, and compaction events while preserving the connection's client name, version, and transport metadata. - Fall back to the connection-wide product client ID when a thread has no originator override. - Preserve persisted originators in thread initialization analytics for resume and fork flows. ## Validation - `just test -p codex-analytics thread_originator_overrides_shared_connection_across_thread_events subagent_events_keep_thread_originator_with_explicit_turn_connection` - `just test -p codex-app-server turn_start_tracks_thread_originator_in_analytics thread_start_tracks_thread_initialized_analytics thread_fork_tracks_thread_initialized_analytics thread_resume_tracks_thread_initialized_analytics` - `just test -p codex-core thread_manager`
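The override-with-fallback rule described above can be sketched as follows. This is a minimal illustration, not the reducer's actual code: `AnalyticsState`, its fields, and the `codex_desktop` value are hypothetical names chosen for the example.

```rust
use std::collections::HashMap;

/// Hypothetical reducer state: the connection-wide client ID plus
/// per-thread originator overrides.
struct AnalyticsState {
    connection_product_client_id: String,
    thread_originators: HashMap<String, String>,
}

impl AnalyticsState {
    fn new(connection_product_client_id: &str) -> Self {
        AnalyticsState {
            connection_product_client_id: connection_product_client_id.to_string(),
            thread_originators: HashMap::new(),
        }
    }

    /// Record the resolved originator once a thread is materialized.
    fn set_thread_originator(&mut self, thread_id: &str, originator: &str) {
        self.thread_originators
            .insert(thread_id.to_string(), originator.to_string());
    }

    /// Product client ID for a thread-scoped event: the thread's originator
    /// if one was published, otherwise the connection-wide value.
    fn product_client_id_for(&self, thread_id: &str) -> &str {
        self.thread_originators
            .get(thread_id)
            .map(String::as_str)
            .unwrap_or(self.connection_product_client_id.as_str())
    }
}

fn main() {
    let mut state = AnalyticsState::new("codex_desktop");
    state.set_thread_originator("thread-work", "codex_work_desktop");
    println!("{}", state.product_client_id_for("thread-work"));
    println!("{}", state.product_client_id_for("thread-plain"));
}
```

Only the product client ID is looked up per thread here; the client name, version, and transport would still come from the connection.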
alexsong-oai ·
2026-06-25 18:15:48 -07:00 -
[codex] implement standalone code-mode process host (#30111)
## Summary - implement the standalone `codex-code-mode-host` stdio service - route sessions, cells, delegate requests, responses, and cancellation through a bounded host peer - supervise request, writer, cell-forwarding, actor, and V8 failure boundaries - bound request/session tombstones and fail-stop the connection on invalid protocol state - add host-only duplex protocol tests and local Cargo/Bazel run recipes ## Why This stage makes the host process independently runnable and reviewable before exposing any remote client in Codex. Transport or runtime failure closes the connection and relies on process replacement rather than transactional recovery. ## Stack This is **3 of 4** in the process-owned code-mode session stack. - Depends on #30110 - The final client PR targets this branch ## Validation - `just test -p codex-code-mode-host` — 7 host-only tests passed - `just fix -p codex-code-mode-host` - `just bazel-lock-update` - `just bazel-lock-check` - `just fmt`
Channing Conger ·
2026-06-25 18:00:39 -07:00 -
Reuse walk inventory for environment skill metadata (#30145)
## Why

Environment skill discovery already asks the executor to run one `fs/walk`. That response contains every regular file path found under the selected root, including any `agents/openai.yaml` files.

Today Core keeps the discovered `SKILL.md` paths but discards the rest of that file inventory. It then sends one `fs/getMetadata` request per skill just to ask whether `agents/openai.yaml` exists. A root with 66 skills and no metadata therefore pays for 66 unnecessary network round trips.

## What changes

- Keep the `fs/walk` file and directory inventory for the duration of the scan.
- Associate each discovered `SKILL.md` with metadata that is known present, known absent, or still requires a fallback probe.
- Read a known `agents/openai.yaml` directly instead of statting it first.
- Skip the metadata request entirely when a complete walk shows that the skill has no `agents` directory.
- Read a known `SKILL.md` and `agents/openai.yaml` concurrently.
- Keep parsing and validation in `core-skills`.

The inventory is scan-local. This does not add another cache or change cache lifetime.

## Network impact

For a complete scan of 66 valid skills with no `agents/openai.yaml`, and one root `.codex-plugin/plugin.json`:

| Operation | Current | After this PR |
| --- | ---: | ---: |
| `fs/walk` | 1 | 1 |
| Read `SKILL.md` | 66 | 66 |
| Stat `agents/openai.yaml` | 66 | 0 |
| Read `agents/openai.yaml` | 0 | 0 |
| Stat plugin manifest | 1 | 1 |
| Read plugin manifest | 1 | 1 |
| **Total executor RPCs** | **135** | **69** |

This removes exactly 66 request/response exchanges from the common cold scan. Warm scans remain at zero discovery RPCs because the thread-level executor catalog cache is unchanged.

When metadata exists, each file still requires one read. This PR removes only the preceding existence check; it does not batch file contents into a new RPC.

## Correctness fallbacks

Absence is trusted only when the walk is complete and the metadata directory was not present. Core keeps the existing `getMetadata` fallback when:

- the walk was truncated;
- the walk reported an error; or
- an `agents` directory was observed but `openai.yaml` was not, which preserves support for file symlinks and traversal boundaries.

## Deliberate scope

This PR changes only the environment skill loader and its existing filesystem-call regression coverage. It does not:

- change `fs/walk` or any exec-server protocol;
- add `readFiles` or a skills-list endpoint;
- change thread caching;
- change local skill discovery;
- change exec-server request concurrency; or
- optimize plugin-manifest lookup.

The plugin-manifest stat is intentionally left in place, which is why this PR reaches 69 calls rather than the broader 68-call estimate. That lookup has separate alternate-path, ancestor, and symlink semantics and should not be mixed into this change.
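The three-way classification and its fallbacks can be sketched like this. It is an illustration under assumptions, not the loader's real types: `WalkInventory`, `MetadataState`, and `classify` are hypothetical names, and the sketch assumes that a metadata file actually seen in the walk counts as present even if the walk was incomplete.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
enum MetadataState {
    KnownPresent, // read the file directly, no stat first
    KnownAbsent,  // skip the metadata request entirely
    NeedsProbe,   // fall back to the existing metadata request
}

/// Hypothetical summary of one `fs/walk` response.
struct WalkInventory {
    files: HashSet<String>,
    dirs: HashSet<String>,
    /// False if the walk was truncated or reported an error.
    complete: bool,
}

fn classify(inventory: &WalkInventory, skill_dir: &str) -> MetadataState {
    let metadata_file = format!("{}/agents/openai.yaml", skill_dir);
    let agents_dir = format!("{}/agents", skill_dir);
    if inventory.files.contains(&metadata_file) {
        // Assumption: a path the walk reported is trusted as present.
        MetadataState::KnownPresent
    } else if !inventory.complete || inventory.dirs.contains(&agents_dir) {
        // Absence is not trusted: incomplete walk, or an `agents` directory
        // was observed without `openai.yaml` (symlinks, traversal boundaries).
        MetadataState::NeedsProbe
    } else {
        MetadataState::KnownAbsent
    }
}

fn main() {
    let inventory = WalkInventory {
        files: vec!["skills/a/SKILL.md", "skills/a/agents/openai.yaml", "skills/b/SKILL.md"]
            .into_iter()
            .map(String::from)
            .collect(),
        dirs: vec!["skills/a/agents", "skills/c/agents"]
            .into_iter()
            .map(String::from)
            .collect(),
        complete: true,
    };
    for skill in &["skills/a", "skills/b", "skills/c"] {
        println!("{} -> {:?}", skill, classify(&inventory, skill));
    }
}
```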
jif ·
2026-06-26 01:47:00 +01:00 -
Project selected plugin runtime by environment availability (#30093)
## Why

Selected plugin metadata is stable, but MCP processes are live runtime state. They need different lifetimes:

- the MCP extension caches manifest, MCP, and connector declarations for each stable selected root;
- each model step projects that cached metadata through the roots that resolved as ready for that exact step;
- the MCP manager is rebuilt only when that availability projection changes.

This matches executor skills: both features consume the same resolved step roots instead of inferring readiness from the turn's selected environments.

## Behavior

```text
E1 not ready for this step
  -> no E1 MCP servers or connectors
  -> cached plugin metadata stays in ext/mcp

E1 becomes ready
  -> reuse cached metadata
  -> publish one MCP runtime containing E1 capabilities

same ready roots on the next step
  -> reuse the exact runtime; no rediscovery and no MCP restart

resume
  -> create new extension thread state and a new MCP runtime
```

All model-facing consumers use the same step snapshot:

```text
resolved selected roots
        |
        v
extension MCP/connector projection
        |
        v
{ MCP config, connector snapshot, MCP manager }
        |
        +-> advertise model tools
        +-> build app/connector tools
        +-> execute MCP calls
```

## Cache contract

The existing MCP extension owns a cache keyed by the full `SelectedCapabilityRoot`:

```rust
let state = thread_store.get_or_init(SelectedExecutorPluginMcpState::default);
```

The cache lives with extension thread state. Environment availability filters projection but does not invalidate metadata. Resume creates new thread state. There is no file watcher or executor generation because contents behind a stable environment/root are assumed stable.

## What changes

- Keeps executor plugin discovery and cached metadata in `ext/mcp`.
- Caches MCP and connector declarations together per selected root.
- Uses the step's already-resolved capability roots, including lazy environments that are not turn environments.
- Reuses the current MCP runtime when the ready-root projection is unchanged.
- Uses the same step MCP manager and connector snapshot for model-visible tools and execution.
- Resolves direct thread-scoped MCP requests from the current selected-root projection.

## Deliberately out of scope

- `app/list` remains based on the latest global host-plugin state; this PR does not make its response or notifications thread-specific.
- `required = true` startup semantics do not apply to delayed executor MCP activation.
- No filesystem/content invalidation.
- No transport-disconnect watcher.
- No executor generations or environment replacement semantics.
- No client sharing across complete manager replacements.

## Stack

1. Extension-owned World State sections.
2. Project executor skills through World State.
3. Pin one MCP runtime to each model step.
4. **This PR:** project selected MCP and connector state from extension-owned metadata.
5. Integration coverage for selected capability availability and resume.

## Verification

- `selected_plugin_servers_use_managed_requirements_for_the_selected_root_id`
- The stacked integration PR covers unavailable to ready activation, unchanged-runtime reuse, skills, MCP tools, connector attribution, and cold resume.

jif ·
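A rough sketch of the two lifetimes in #30093, assuming a simplified model in which roots are plain strings: metadata is cached per root and never invalidated by availability, while the runtime is rebuilt only when the set of ready roots changes. `PluginMcpState`, `servers_for_step`, and the counters are hypothetical and exist only to make the behavior observable.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical per-thread state: cached declarations per selected root
/// plus the projection used to build the current runtime.
#[derive(Default)]
struct PluginMcpState {
    metadata_cache: HashMap<String, Vec<String>>, // root -> declared MCP servers
    discoveries: usize,                           // how often a root was discovered
    last_projection: Option<BTreeSet<String>>,
    runtime_builds: usize,                        // how often the runtime was rebuilt
}

impl PluginMcpState {
    /// Project cached metadata through the roots that are ready for this step.
    fn servers_for_step(&mut self, ready_roots: &[&str]) -> Vec<String> {
        let projection: BTreeSet<String> =
            ready_roots.iter().map(|root| root.to_string()).collect();
        for root in &projection {
            if !self.metadata_cache.contains_key(root) {
                self.discoveries += 1; // stands in for executor discovery
                self.metadata_cache
                    .insert(root.clone(), vec![format!("{}-mcp", root)]);
            }
        }
        if self.last_projection.as_ref() != Some(&projection) {
            self.runtime_builds += 1; // stands in for rebuilding the MCP manager
            self.last_projection = Some(projection.clone());
        }
        projection
            .iter()
            .flat_map(|root| self.metadata_cache[root].iter().cloned())
            .collect()
    }
}

fn main() {
    let mut state = PluginMcpState::default();
    println!("{:?}", state.servers_for_step(&[]));     // E1 not ready
    println!("{:?}", state.servers_for_step(&["E1"])); // E1 becomes ready
    println!("{:?}", state.servers_for_step(&["E1"])); // same ready roots
    println!(
        "discoveries={} runtime_builds={}",
        state.discoveries, state.runtime_builds
    );
}
```

Dropping the whole state value models resume: a new thread state starts with an empty cache and no runtime.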
2026-06-26 01:36:44 +01:00 -
ci: narrow Windows test skips (#30134)
## Why The Windows cross-build skip used the broad `powershell` substring, which hid unrelated Windows tests. Narrowing it exposed the same ConPTY Ctrl-C timeout that is breaking `main`; that test is not reliable in either cross-built or native Windows Bazel CI yet. ## What changed - scope the cross-build PowerShell carve-out to the dedicated parser-process test module - exclude the exact ConPTY Ctrl-C test from Bazel CI while leaving local Windows runs enabled - repeat the exact exclusion in the cross-build config because it replaces the base skip list ## Manual validation - `just test-github-scripts` - queried the PTY test target under both `ci-windows` and `ci-windows-cross` - verified the matcher excludes parser-process and ConPTY tests without excluding unrelated PowerShell tests - [Windows shard 4/4](https://github.com/openai/codex/actions/runs/28204844286/job/83553063859) reproduced the `main` ConPTY timeout before the exact CI-only exclusion was applied
Adam Perry @ OpenAI ·
2026-06-25 17:01:43 -07:00 -
Pin MCP runtimes to model steps (#30101)
## Why

An MCP refresh can replace the session's current manager while a model step is still running. The step must execute calls through the same manager whose tools it advertised.

## Boundary

```text
current session MCP runtime
  |
  | capture once for this model step
  v
StepContext.mcp
  - exact MCP config
  - exact connection manager
  - exact runtime environment context
```

```rust
pub struct McpRuntimeSnapshot {
    config: Arc<McpConfig>,
    manager: Arc<McpConnectionManager>,
    runtime_context: McpRuntimeContext,
}
```

## Example

```text
step A captures runtime A and advertises A's tools
refresh publishes runtime B
step A tool call -> runtime A
next step -> runtime B
```

Capturing the snapshot is only an `Arc` clone. It does not restart MCPs or make an RPC.

## What changes

- Captures one MCP runtime in `StepContext`.
- Uses it for tool planning, tool calls, resources, approvals, connector attribution, and elicitation.
- Publishes replacement runtimes atomically.
- Lets an old runtime live only while an in-flight step or request still holds its `Arc`.

Most of this diff is mechanical routing from the session-global manager to `step_context.mcp`; it does not introduce selected-plugin discovery yet.

## What does not change

- No plugin or extension migration.
- No new MCP cache policy.
- No environment file watching.
- No client sharing between separate managers.

## Stack

1. Extension-owned World State sections.
2. Project executor skills through World State.
3. **This PR:** pin one MCP runtime to each model step.
4. Project selected MCP/app/connector metadata by environment availability.
5. One end-to-end integration scenario.

jif ·
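The capture-once behavior in #30101 can be illustrated with plain `std` types. This is a sketch under assumptions: `McpRuntime` is a stand-in that carries only a label, and `Session` with a `Mutex<Arc<_>>` is a hypothetical holder, not the session type used in Codex.

```rust
use std::sync::{Arc, Mutex};

/// Stand-in for the real snapshot; only a label is kept here.
struct McpRuntime {
    name: &'static str,
}

/// Hypothetical holder for the session's current runtime.
struct Session {
    current: Mutex<Arc<McpRuntime>>,
}

impl Session {
    fn new(runtime: McpRuntime) -> Self {
        Session { current: Mutex::new(Arc::new(runtime)) }
    }

    /// Capture once per model step: only an `Arc` clone, no restart, no RPC.
    fn capture(&self) -> Arc<McpRuntime> {
        let guard = self.current.lock().unwrap();
        Arc::clone(&guard)
    }

    /// Publish a replacement; steps that already captured keep their `Arc`.
    fn publish(&self, runtime: McpRuntime) {
        *self.current.lock().unwrap() = Arc::new(runtime);
    }
}

fn main() {
    let session = Session::new(McpRuntime { name: "A" });
    let step_a = session.capture(); // step A advertises A's tools
    session.publish(McpRuntime { name: "B" }); // refresh publishes runtime B
    println!("step A tool call -> runtime {}", step_a.name);
    println!("next step -> runtime {}", session.capture().name);
    // Runtime A is dropped here, when step A releases the last reference.
}
```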
2026-06-26 00:53:07 +01:00 -
[codex] Propagate traces through exec-server HTTP (#30117)
Fixes distributed trace continuity across exec-server JSON-RPC HTTP egress by adding an executor client span and injecting its W3C context through a reusable `codex-otel` helper. This preserves the caller trace across core/tool → executor → provider/MCP instead of dropping parentage at raw reqwest. Note that this doesn't include the websocket path, which is needed to really get the full story but at least we cover the basic http path with this change.
Tom ·
2026-06-25 23:22:22 +00:00 -
Project executor skills through World State (#30088)
## Why

A selected executor environment can be unavailable in one model step and ready in the next. The model should see its skills only while that environment is ready, without rescanning stable files on every sample.

The product assumption is simple:

- an environment ID names one stable logical environment;
- the selected root contents do not change during the thread.

## Behavior

```text
E1 unavailable -> do not show E1 skills
E1 ready -> discover once, cache, show through World State
E1 unavailable -> hide skills, keep cache
E1 ready again -> reuse cache, show skills again
resume -> create a new thread cache and discover again
```

The cache key is the full `SelectedCapabilityRoot`. Availability does not invalidate it; dropping the extension's thread state does.

The step supplies the ready selected roots directly. They do not have to be turn environments:

```text
turn environment: laptop
selected root: worker:/plugins/lint-fix
worker ready -> lint-fix skills are visible
```

## What changes

- Keeps executor skill catalogs in the existing skills extension.
- Passes the roots resolved as ready for the step into World State contributors.
- Loads each ready selected root at most once per thread.
- Contributes the executor catalog as the `skills` World State section.
- Uses the exact step catalog for explicit skill selection and body reads.
- Leaves host and orchestrator skill behavior where it already lives.

Taking a step snapshot itself does not add an RPC. Executor filesystem calls happen only on the first discovery of a stable root for that thread.

## What does not change

- No filesystem watcher or content-based invalidation.
- No retry/generation framework.
- No skill runtime migration into core.
- No general rewrite of the skills extension.

## Stack

1. Extension-owned World State sections.
2. **This PR:** project cached executor skills through World State.
3. Pin one MCP runtime to each model step.
4. Project selected MCP/app/connector metadata by environment availability.
5. One end-to-end integration scenario.
jif ·
2026-06-26 00:13:43 +01:00 -
[codex] add code-mode host failure supervision hooks (#30110)
## Why A process host should be discarded and rebuilt after critical actor or V8 failure, while the existing in-process production path must keep its current cell-error semantics. This change establishes that failure boundary without adding the host process or remote client. ## What changed - add optional task-failure supervision to the transport-neutral code-mode session runtime - report Tokio cell-actor failures and V8 runtime-thread panics to a host-provided fail-stop handler - preserve the existing handler-less in-process behavior - make host-owned cell ID allocation fail before numeric wraparound ## Follow-up The V8 panic signal surfaced here should also be consumed by the `InProcessCodeModeSession` manager in a future change so it can fail the affected cell. This PR intentionally leaves the handler-less in-process behavior unchanged while putting the required panic tracking in place. ## Stack This is **2 of 4** in the process-owned code-mode session stack. - #30108 is merged into `main` - The next PR targets this branch ## Validation - `just test -p codex-code-mode` — 53 passed - `just argument-comment-lint -p codex-code-mode` - `just fix -p codex-code-mode`
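Failing before numeric wraparound, as described for host-owned cell IDs, can be sketched with `checked_add`. `CellIdAllocator` and its error string are hypothetical; the real ID type and error handling may differ.

```rust
/// Hypothetical allocator: hands out increasing cell IDs and refuses to wrap.
struct CellIdAllocator {
    next: u64,
}

impl CellIdAllocator {
    fn allocate(&mut self) -> Result<u64, &'static str> {
        let id = self.next;
        // `checked_add` returns `None` on overflow instead of wrapping to 0,
        // so the allocator fails rather than reusing an earlier ID.
        self.next = self.next.checked_add(1).ok_or("cell ID space exhausted")?;
        Ok(id)
    }
}

fn main() {
    let mut allocator = CellIdAllocator { next: u64::MAX - 1 };
    println!("{:?}", allocator.allocate()); // last ID that can be handed out
    println!("{:?}", allocator.allocate()); // fails instead of wrapping
}
```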
Channing Conger ·
2026-06-25 15:33:58 -07:00 -
Recognize Work web and mobile thread originators (#29988)
## Summary - recognize `codex_work_web` and `codex_work_mobile` as supported `thread/start.serviceName` values - use the recognized value as the thread-scoped originator, with the same persistence and request propagation added for `codex_work_desktop` - cover precedence over persisted and inherited originators This is the Codex consumer for the service names introduced by [openai/openai#1073178](https://github.com/openai/openai/pull/1073178). ## Rollout / Compatibility The producer is ChatGPT's app-server integration in openai/openai#1073178. This PR is the Codex app-server consumer that converts those service names into the outgoing per-thread `originator`. Until this change is deployed, the new service names are ignored and Codex continues using its fallback originator. Deploy this mapper and the matching codex-backend compatibility change in [openai/openai#1073594](https://github.com/openai/openai/pull/1073594) while the existing Flora egress overwrite remains in place. Remove that overwrite in [openai/openai#1073197](https://github.com/openai/openai/pull/1073197) only after both consumers are deployed. ## Validation - `just test -p codex-core effective_originator_prefers_thread_scoped_sources_before_env_originator` - `just fix -p codex-core` - `just fmt`
chiam-oai ·
2026-06-25 15:30:26 -07:00 -
[codex] Surface MCP reauthentication-required startup failures (#29877)
## Summary - distinguish expired, non-refreshable stored MCP OAuth credentials from first-time missing credentials - carry a typed `failureReason: "reauthenticationRequired"` on the existing `mcpServer/startupStatus/updated` notification only when user action is required - keep the public MCP auth-status API unchanged and regenerate the app-server protocol schemas and documentation ## Why An MCP server with an expired access token and no usable refresh token currently fails startup without giving clients a reliable, typed recovery signal. The existing startup-status notification is the natural place to carry this state. Its nullable `failureReason` keeps the recovery reason attached to the failed startup transition without adding a one-off notification. Internally, Codex distinguishes first-time login from reauthentication and emits the reason only when the startup error itself requires authentication. ## User impact App clients can prompt an existing user to reconnect an MCP server when automatic recovery is impossible by handling a failed `mcpServer/startupStatus/updated` notification whose `failureReason` is `reauthenticationRequired`. Starting, ready, cancelled, unrelated failures, and first-time setup carry no reauthentication reason. ## Companion app PR - openai/openai#1069582 ## Validation - `just test -p codex-app-server-protocol` — 248 passed; schema fixture tests passed - `cargo check -p codex-app-server -p codex-tui` - `just test -p codex-rmcp-client -p codex-mcp` — 184 passed, 2 skipped - `just test -p codex-protocol -p codex-app-server-protocol -p codex-mcp` — 579 passed - `just write-app-server-schema` - `just fmt`
felixxia-oai ·
2026-06-25 21:50:36 +00:00 -
fix(app-server): suppress TUI rollback warning (#30124)
## Why The TUI uses `thread/rollback` internally for user-facing flows such as prompt cancellation/backtracking. After `thread/rollback` was marked deprecated, those internal calls started surfacing `deprecationNotice` messages in the TUI, even though the user did not explicitly call the deprecated app-server API. The endpoint should remain deprecated for external app-server clients, but the built-in `codex-tui` client should not show this implementation-detail warning during normal interaction. ## What changed - Pass the initialized app-server client name into the `thread/rollback` request processor. - Suppress the `thread/rollback` deprecation notice only for `codex-tui`. - Preserve the existing `deprecationNotice` behavior for non-TUI clients. - Add regression coverage for the `codex-tui` suppression path. ## How to Test 1. Start Codex TUI from this branch. 2. Type text into the composer and press `Esc` to cancel/backtrack. 3. Confirm the TUI restores/cancels the prompt without showing `thread/rollback is deprecated and will be removed soon`. 4. Also verify an external app-server client that calls `thread/rollback` still receives `deprecationNotice`. Targeted tests: - `just test -p codex-app-server thread_rollback` - `just argument-comment-lint`
Felipe Coury ·
2026-06-25 18:44:35 -03:00 -
Let extensions contribute World State sections (#30100)
## Why

#29856 already owns the durable thread intent and exact environment binding. This PR adds only the small missing extension boundary: an extension can contribute one named World State section, while core still owns persistence, diffing, and model-visible fragment types.

This lets skills stay in the skills extension instead of moving their runtime into core.

## Shape

```text
extension-owned state
  |
  | contribute section id + JSON snapshot + renderer
  v
core World State
  |
  | compare with the previous snapshot
  v
no message, or one incremental model-visible update
```

The extension API is deliberately small:

```rust
fn contribute_world_state(...) -> Vec<WorldStateSectionContribution>
```

Core adapts the rendered result to `ContextualUserFragment`, records the snapshot, and keeps the existing compaction/resume behavior.

## What changes

- Adds extension-owned World State section contributions.
- Calls those contributors from the existing per-step World State builder.
- Restores durable selected capability roots into extension thread state on resume.
- Keeps the actual model-context fragment and rollout machinery in core.

## What does not change

- No skill or MCP implementation moves out of its extension.
- No new file watcher, generation, or RPC.
- No generic migration of existing World State sections.
- No change to the stable environment-ID assumption from #29856.

## Example

```text
step 1 snapshot: skills = []
step 2 snapshot: skills = [executor-demo:deploy]

core asks the skills extension to render only that change.
```

## Stack

1. **This PR:** let extensions contribute World State sections.
2. Project executor skills through the skills extension.
3. Pin one MCP runtime to each model step.
4. Project selected MCP/app/connector metadata by environment availability.
5. One end-to-end integration scenario.
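The compare-with-previous-snapshot step in #30100 can be sketched as below. This is a simplified illustration: `WorldState::apply` is a hypothetical name, snapshots are plain strings rather than JSON values, and the "renderer" is reduced to a `format!` call.

```rust
use std::collections::BTreeMap;

/// Hypothetical snapshot store: section id -> last recorded snapshot text.
#[derive(Default)]
struct WorldState {
    previous: BTreeMap<String, String>,
}

impl WorldState {
    /// Returns a model-visible update only when the section's snapshot
    /// differs from the one recorded for the previous step.
    fn apply(&mut self, section_id: &str, snapshot: &str) -> Option<String> {
        if self.previous.get(section_id).map(String::as_str) == Some(snapshot) {
            return None; // unchanged: no message
        }
        self.previous
            .insert(section_id.to_string(), snapshot.to_string());
        Some(format!("{} updated: {}", section_id, snapshot))
    }
}

fn main() {
    let mut world = WorldState::default();
    println!("{:?}", world.apply("skills", "[]"));
    println!("{:?}", world.apply("skills", "[]"));
    println!("{:?}", world.apply("skills", "[executor-demo:deploy]"));
}
```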
jif ·
2026-06-25 22:23:51 +01:00 -
[codex] Add managed MCP server matchers (#29648)
## Summary

This PR extends the existing managed `mcp_servers` identity requirement so that one name-qualified rule can use either:

- the released exact command or URL identity;
- an exact stdio executable with an exact-length, ordered argument matcher list; or
- a direct MCP URL matcher.

Matcher-based rules stay under the released `identity` key and use the same `McpServerRequirement` abstraction and `mcp_servers.<server_name>` namespace.

## Behavior

Policy activation and name qualification are unchanged:

- If `mcp_servers` is absent, ordinary configured MCP servers remain unrestricted.
- If `mcp_servers` is present, a server needs a matching same-name requirement.
- `mcp_servers = {}` continues to deny every configured MCP server.
- Existing exact identity requirements keep their released semantics.

Plugin-bundled MCP servers use the same requirement shapes under `plugins.<plugin_name>.mcp_servers.<server_name>`. Top-level non-empty rules continue to govern only ordinary configured servers; plugin rules remain explicitly plugin-scoped. The existing globally empty `mcp_servers = {}` plugin kill switch is preserved.

Requirements layers continue to use the existing regular TOML merge behavior. Atomic replacement of named MCP requirements is intentionally out of scope here and is tracked independently in #30118.

## Requirement contract

The released exact identity contract remains valid:

```toml
[mcp_servers.docs.identity]
command = "codex-mcp"

[mcp_servers.remote.identity]
url = "https://example.com/mcp"
```

Command identities continue to check only `command`; they do not inspect arguments, `cwd`, `env`, or `env_vars`.

A command matcher uses an exact executable plus an exact-length, ordered argument list. Each argument position supports `exact`, `prefix`, or full-value `regex` matching:

```toml
[mcp_servers.internal_mcp_proxy.identity]
command = { executable = "company-cli", args = [
    { match = "exact", value = "mcp" },
    { match = "exact", value = "proxy" },
    { match = "exact", value = "--server" },
    { match = "regex", expression = '^https://[A-Za-z0-9-]+\.mcp\.internal\.example\.com(?::443)?(?:/.*)?$' },
] }
```

Direct streamable HTTP MCP definitions can use the same value matcher types through `identity.url`:

```toml
[mcp_servers.internal_http.identity]
url = { match = "regex", expression = '^https://[A-Za-z0-9-]+\.mcp\.internal\.example\.com(?:/.*)?$', }
```

Plugin-bundled MCP matchers use the same contract inside the plugin-qualified allowlist:

```toml
[plugins."sample@test".mcp_servers.internal_mcp_proxy.identity]
command = { executable = "company-cli", args = [
    { match = "exact", value = "mcp" },
    { match = "exact", value = "proxy" },
] }
```

Regexes are validated while managed requirements are loaded, and regex matching must cover the complete value. Command matchers constrain only the executable and arguments.

## Why

Enterprise administrators need to allow MCP servers by executable and positional-argument shape, including fixed arguments plus constrained values such as internal MCP URLs passed to a proxy.

## Validation

- `just fmt`
- `git diff --check`
- `just test -p codex-config` (198 passed)
- `just test -p codex-core mcp_servers_by_matchers --lib` (2 passed)

felixxia-oai ·
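The exact-length, ordered matching rule can be sketched as follows. This is an illustration only: `ArgMatcher` and `command_matches` are hypothetical names, and the `regex` matcher kind is left out because it would need a crate outside the standard library.

```rust
/// Hypothetical per-position matcher; the `regex` kind is omitted here.
enum ArgMatcher {
    Exact(&'static str),
    Prefix(&'static str),
}

impl ArgMatcher {
    fn matches(&self, value: &str) -> bool {
        match self {
            ArgMatcher::Exact(expected) => value == *expected,
            ArgMatcher::Prefix(prefix) => value.starts_with(*prefix),
        }
    }
}

/// The executable must match exactly, the argument count must equal the
/// matcher count, and every position must satisfy its own matcher.
fn command_matches(
    executable: &str,
    matchers: &[ArgMatcher],
    command: &str,
    args: &[&str],
) -> bool {
    command == executable
        && args.len() == matchers.len()
        && matchers.iter().zip(args).all(|(m, arg)| m.matches(arg))
}

fn main() {
    let matchers = [
        ArgMatcher::Exact("mcp"),
        ArgMatcher::Exact("proxy"),
        ArgMatcher::Exact("--server"),
        ArgMatcher::Prefix("https://"),
    ];
    let allowed = ["mcp", "proxy", "--server", "https://a.mcp.internal.example.com"];
    let extra_arg = ["mcp", "proxy", "--server", "https://a.example.com", "--debug"];
    println!("{}", command_matches("company-cli", &matchers, "company-cli", &allowed));
    println!("{}", command_matches("company-cli", &matchers, "company-cli", &extra_arg));
}
```

Because the lengths must agree, appending an unexpected flag fails the rule even when every listed position matches.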
2026-06-25 22:15:50 +01:00 -
release: consume standalone zsh artifacts (#30116)
## Why Once #30114 publishes zsh independently, regular Rust releases should reuse that protected, versioned artifact set instead of rebuilding identical zsh binaries for every Codex version. Keeping the zsh release tag explicit in the workflow also makes future artifact upgrades deliberate and easy to review. This PR assumes the first standalone artifact release will be published as `codex-zsh-v0.1.0` before this change lands. ## What changed - Added `CODEX_ZSH_RELEASE_TAG` near the top of `.github/workflows/rust-release.yml`, initially pinned to `codex-zsh-v0.1.0`. - Download the standalone release’s generated `codex-zsh` DotSlash manifest before assembling Linux and macOS Codex packages. - Added a `--zsh-manifest` package-builder override so release packaging fetches the matching target archive and verifies the size and SHA-256 digest recorded in that manifest. - Removed the reusable zsh build job from regular Rust releases. - Stopped copying zsh archives into each Rust release and stopped regenerating a zsh DotSlash manifest there. Windows packaging remains unchanged because the patched zsh resource is only shipped for supported Unix targets. ## Testing - Added package-helper coverage that supplies a standalone manifest override and verifies the extracted zsh bytes. - Ran the `scripts/codex_package` unit test suite. - Validated `.github/scripts/build-codex-package-archive.sh` with `bash -n`.
Michael Bolin ·
2026-06-25 14:05:49 -07:00 -
release: publish standalone zsh artifacts (#30114)
## Why The patched zsh artifacts rarely change, but `.github/workflows/rust-release-zsh.yml` currently runs as part of every Rust release. Rebuilding the same four binaries for each Codex version wastes release capacity and ties an independently versioned runtime dependency to the main release cadence. This establishes the producer side of a build-once flow. The existing Rust release workflow remains unchanged until the first standalone artifact release has been published and the checked-in DotSlash manifests can be updated with its URLs and checksums. ## What changed - Run the zsh release workflow for protected `codex-zsh-vX.Y.Z` tags instead of as a reusable workflow. - Validate the semantic release tag before starting the platform builds. - Publish the four zsh archives to a GitHub prerelease so the release never becomes the repository latest release. - Publish the generated `codex-zsh` DotSlash manifest alongside the archives. - Document how to publish the next artifact version after changing the pinned zsh commit or patch. ## Tag protection An active repository tag ruleset named `codex-zsh-v*.*.*` targets `refs/tags/codex-zsh-v*.*.*`. It restricts tag creation, updates, deletion, and non-fast-forward changes; requires linear history; and limits bypass to the configured repository role. This was verified with: ```shell gh api repos/openai/codex/rulesets/18140982 ``` The response reported `"enforcement":"active"`, the expected tag condition, and the `creation`, `update`, `deletion`, `non_fast_forward`, and `required_linear_history` rules. ## Rollout After this lands, publish the first `codex-zsh-vX.Y.Z` release. A follow-up can then update the checked-in DotSlash manifests and remove the zsh rebuild from `.github/workflows/rust-release.yml`. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/30114). * #30116 * __->__ #30114
Michael Bolin ·
2026-06-25 13:56:08 -07:00 -
feat(core, mcp): cache codex_apps tools in memory (#29003)
## Description This makes Codex Apps tool reads use a shared in-memory snapshot instead of rereading the disk cache every time `list_all_tools()` runs. Disk still seeds the cache on startup and gets updated after successful fetches, but it is no longer the live read path. The core change is that `McpManager` now owns a process-scoped `CodexAppsToolsCache`. Codex threads in the same app-server process now share this Codex Apps in-memory tools snapshot. The snapshot is keyed by the Codex home plus the Codex Apps identity: the active Codex auth user/workspace and the effective Codex Apps MCP source config. There's already code to hard-refresh the cache, so we respect it in this PR. ## Local benchmark I ran a local steady-state microbenchmark of the exact repeated Codex Apps cached-tools read this PR removes, using the same real local cache payload in both trees: `3,678,138` bytes and `381` tools. The cache file was already warm in the OS page cache, so this measures same-process reread/deserialization work rather than cold-disk latency or full turn latency. Each run is 25 iterations (mimicking a turn that makes 25 inference calls). | Version | Run 1 | Run 2 | Avg | |---|---:|---:|---:| | `origin/main` disk read + JSON deserialize + `filter_tools` | `50.755 ms` | `52.894 ms` | `51.825 ms` | | This branch in-memory `current_tools` + `filter_tools` | `0.740 ms` | `0.778 ms` | `0.759 ms` | That removes about `51 ms` from each repeated Codex Apps cached-tools read on this machine, roughly `68x` faster for that subpath. It is useful evidence for the hot path this PR changes, but not a claim that every production turn gets `51 ms` faster; end-to-end impact also depends on the rest of `list_all_tools()` and tool-payload construction. This is on my M2 Max macbook, so with a slower disk this would be much worse (and indeed we did see this really blew up turn runtime with a slow disk).
Owen Lin ·
2026-06-25 20:54:48 +00:00 -
[codex] poll external clock during sleep (#30113)
## Summary - make the external app-server time provider establish sleep deadlines using `currentTime/read` - poll the external clock once per second and complete `clock.sleep` when the deadline is reached - keep the system-clock timer and existing steer/agent-message interruption behavior unchanged ## Why This lets training control `clock.sleep` through its existing external simulated clock without adding separate sleep/wake protocol methods. ## Testing - `just fmt` - `just test -p codex-app-server external_sleep_polls_current_time_and_emits_items`
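The deadline-and-poll loop can be sketched without any real timers. The clock and the wait are passed in as closures so a simulated clock can drive the sleep; `sleep_on_external_clock` and both parameter names are hypothetical, and a real caller would wait one second between polls instead of advancing a counter.

```rust
use std::cell::Cell;

/// Establishes the deadline from the external clock, then polls that clock
/// until the deadline is reached. Returns the number of polls performed.
fn sleep_on_external_clock<F, W>(
    mut read_current_time: F,
    duration_secs: u64,
    mut wait_between_polls: W,
) -> u64
where
    F: FnMut() -> u64,
    W: FnMut(),
{
    let deadline = read_current_time() + duration_secs;
    let mut polls = 0;
    loop {
        polls += 1;
        if read_current_time() >= deadline {
            return polls;
        }
        wait_between_polls(); // one second of waiting in a real provider
    }
}

fn main() {
    // Simulated external clock that advances one second per wait.
    let now = Cell::new(100u64);
    let polls = sleep_on_external_clock(|| now.get(), 3, || now.set(now.get() + 1));
    println!("woke at t={} after {} polls", now.get(), polls);
}
```

Since the deadline is expressed in external time, whoever controls that clock can complete the sleep early simply by jumping the clock forward.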
rka-oai ·
2026-06-25 13:46:42 -07:00 -
[codex] Observe remote exec-server lifecycle (#27470)
## Summary - Record bounded duration and outcome metrics for remote environment registration and Noise rendezvous connection attempts. - Count reconnects by bounded reason: disconnect, connection failure, or rejected registration. - Trace registration at the owning client boundary without exporting raw environment or registration identifiers. - Replace the stale pre-Noise WebSocket observability design with the current remote transport model. ## Stack Review and land this stack in order: 1. #27466 — trace exec-server JSON-RPC requests 2. #27467 — record bounded connection, request, and process lifecycle metrics 3. #27470 — observe remote registration and Noise rendezvous lifecycle **(this PR)** ## Validation - `just test -p codex-exec-server --lib` (149 passed) - `just test -p codex-cli --test exec_server` (4 passed) - `just argument-comment-lint` - `just bazel-lock-check` - `just fix -p codex-exec-server -p codex-cli` - `just fmt`
richardopenai ·
2026-06-25 13:42:40 -07:00 -
[codex] extend code-mode host IPC transport (#30108)
## Summary - add an `EncodedFrame` type so IPC payloads are serialized and size-checked before entering bounded queues - add the V1 `operation/cancel` client-to-host message - pin the new wire shape with protocol tests ## Why The process-owned code-mode host needs bounded, pre-encoded outbound messages and a best-effort cancellation signal. Keeping these wire primitives in a protocol-only change lets their compatibility contract be reviewed independently from either endpoint. ## Stack This is **1 of 4** in the process-owned code-mode session stack. The next PR targets this branch. ## Validation - `just test -p codex-code-mode-protocol` — 22 passed - `just fix -p codex-code-mode-protocol` - `just fmt`
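
The idea of encoding and size-checking a payload before it enters a bounded queue can be sketched as below. This is a Python analogy only: the size limit, the queue capacity, and the fields of the cancel message are assumptions, not the actual wire format.

```python
import json
import queue

MAX_FRAME_BYTES = 1024  # assumed limit, for illustration only


class EncodedFrame:
    """Hypothetical frame: serialized and size-checked at construction time."""

    def __init__(self, message):
        payload = json.dumps(message, separators=(",", ":")).encode("utf-8")
        if len(payload) > MAX_FRAME_BYTES:
            raise ValueError(f"frame is {len(payload)} bytes, over the limit")
        self.payload = payload


outbound = queue.Queue(maxsize=8)  # bounded queue; capacity is an assumption

# The message fields are made up; only the method name comes from the text.
outbound.put_nowait(EncodedFrame({"method": "operation/cancel", "id": 1}))

try:
    EncodedFrame({"blob": "x" * 5000})
    oversized_rejected = False
except ValueError:
    oversized_rejected = True
```

Rejecting an oversized payload at construction means nothing in the queue can fail size validation later, when it is written.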
Channing Conger ·
2026-06-25 13:26:47 -07:00 -
[codex] impl delivery_mode: current time reminders on response boundaries (#30033)
## Summary - track user-like input and tool-output boundaries in current-time reminder state - gate reminder injection when delivery_mode is after_user_or_tool_output - preserve interval debounce and forced reminders after context-window changes ## Why Training can request reminders only after user or tool-output items while keeping the existing canonical pre-inference history-injection path. ## Validation - just test -p codex-core current_time_reminders_can_follow_only_user_or_tool_outputs - just test -p codex-core current_time_reminders_follow_time_interval_and_persist_in_history - just test -p codex-core current_time_reminder_is_refreshed_after_compaction - just fix -p codex-core
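
A minimal sketch of the gating described above, assuming the reminder state can be reduced to the kind of the most recent item and the time since the last reminder. The function and parameter names are hypothetical, and it is an assumption here that a forced reminder bypasses only the interval debounce, not the boundary gate.

```python
def should_inject_reminder(delivery_mode, last_item_kind, elapsed_s, interval_s, forced=False):
    """Hypothetical gate for current-time reminders."""
    if delivery_mode == "after_user_or_tool_output":
        # Only boundaries that follow user-like input or tool output are eligible.
        if last_item_kind not in {"user", "tool_output"}:
            return False
    # Interval debounce; a forced reminder (e.g. after a context-window change) skips it.
    return forced or elapsed_s >= interval_s


after_tool = should_inject_reminder("after_user_or_tool_output", "tool_output", 60, 30)
after_assistant = should_inject_reminder("after_user_or_tool_output", "assistant", 60, 30)
any_inference = should_inject_reminder("any_inference", "assistant", 60, 30)
debounced = should_inject_reminder("any_inference", "user", 5, 30)
```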
rka-oai ·
2026-06-25 19:28:50 +00:00 -
[codex] Retry temporarily offline exec-server recovery (#30098)
## Summary - retry ERS `409 environment_offline` responses inside the existing exec-server recovery loop - keep all other registry conflicts terminal - add focused coverage for both cases ## Root cause When an exec server disconnects and reconnects, the client already starts recovery and calls ERS `/connect`. During the transient executor presence gap, ERS can return `409 environment_offline`. The retry classifier treated every 409 as terminal, so the first response aborted the existing 25-second recovery window before the executor came back online. That then caused active processes to be marked lost. This change classifies only the structured `environment_offline` conflict as retryable. Recovery continues with the existing bounded deadline, exponential backoff, and jitter. ## Validation - `just test -p codex-exec-server client::recovery::tests` — 4 passed - `just fix -p codex-exec-server` — passed - `just fmt` — passed - Full `just test -p codex-exec-server` reached unrelated macOS filesystem-sandbox integration failures because nested `/usr/bin/sandbox-exec` is denied in this environment (`sandbox_apply: Operation not permitted`).
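
The classification rule and bounded retry schedule can be sketched like this. It is an illustration under assumptions: the function names, the base and cap delays, and the jitter formula are invented for the example; only the `409 environment_offline` rule and the 25-second window come from the description.

```python
import random


def is_retryable(status, error_code):
    """Only the structured environment_offline conflict is retried."""
    return status == 409 and error_code == "environment_offline"


def backoff_delays(base_s, cap_s, deadline_s, rng):
    """Yield jittered exponential delays that fit within the recovery deadline."""
    elapsed = 0.0
    attempt = 0
    while True:
        delay = min(cap_s, base_s * (2 ** attempt))
        delay = rng.uniform(delay / 2, delay)  # jitter; the exact formula is assumed
        if elapsed + delay > deadline_s:
            return
        elapsed += delay
        attempt += 1
        yield delay


delays = list(backoff_delays(0.5, 5.0, 25.0, random.Random(0)))
```

Any other 409 stays terminal, so a genuine registry conflict still aborts recovery on the first response.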
richardopenai ·
2026-06-25 19:25:04 +00:00 -
[codex] add current time reminder delivery mode config (#30031)
```python
delivery_mode = "any_inference"  # default
delivery_mode = "after_user_or_tool_output"  # new mode
```

## Validation

- just test -p codex-core load_config_resolves_current_time_reminder
- just test -p codex-core lock_contains_prompts_and_materializes_features
rka-oai ·
2026-06-25 19:06:43 +00:00 -
core: expose permission profile to shell tools (#29941)
## tl;dr

Inject a `CODEX_PERMISSION_PROFILE` environment variable with the name of the current permission profile when invoking a shell tool.

## Why

Shell tool owners may need to launch nested commands under the same named permission profile, including through `codex sandbox -P PROFILE --include-managed-config`. Until now, child processes could observe sandbox and network metadata but could not identify the active named permission profile.

The `--include-managed-config` flag is essential when a helper reconstructs the sandbox from a profile name: it ensures the nested sandbox also loads managed enterprise requirements. Without it, using the inherited profile could unintentionally create a sandbox that does not enforce the organization's managed restrictions.

The new environment value is intentionally informational and **must not be treated as trusted input**. Any process in the ancestry can overwrite an environment variable, so a consumer that passes this value to `codex sandbox -P` must first validate it against the profiles that helper is authorized to use.

## Example Use Case

Suppose an organization provides a trusted `remote-bash` wrapper that lets Codex run a command on an approved build host. The local shell command uses the named `:workspace` permission profile:

```toml
default_permissions = ":workspace"
```

The command exposed to the model is a small zsh wrapper. It deliberately delegates with `exec`, preserving the original arguments and process environment:

```zsh
#!/usr/bin/env zsh
exec /opt/codex-tools/remote_bash.py "$@"
```

The model invokes the public wrapper, not its Python implementation:

```sh
/opt/codex-tools/remote-bash \
  --host builder.example.com \
  -- printf '%s' 'hello world'
```

Only the inner implementation is authorized to escape the local sandbox:

```starlark
prefix_rule(
    pattern=["/opt/codex-tools/remote_bash.py"],
    decision="allow",
)
```

With zsh-fork, execution begins with `remote-bash` inside the `:workspace` sandbox.
When the wrapper calls `exec`, the exact prefix rule matches `remote_bash.py`, so that inner script is restarted unsandboxed. The escalated process inherits:

```text
CODEX_PERMISSION_PROFILE=:workspace
```

Inheritance does not make the value trustworthy. `remote_bash.py` independently allowlists both the remote host and the permission profile before using either value. In particular, a forged value such as `:danger-full-access` is rejected before it can reach `codex sandbox -P`:

```python
import argparse
import os
import shlex
import sys

ALLOWED_HOSTS = {"builder.example.com"}
ALLOWED_PROFILES = {":workspace"}

parser = argparse.ArgumentParser()
parser.add_argument("--host", required=True)
separator = sys.argv.index("--")
args = parser.parse_args(sys.argv[1:separator])
command = sys.argv[separator + 1:]

if args.host not in ALLOWED_HOSTS:
    parser.error("host is not allowlisted")
if not command:
    parser.error("the remote command must not be empty")

profile = os.environ.get("CODEX_PERMISSION_PROFILE")
if not profile:
    raise SystemExit("CODEX_PERMISSION_PROFILE must not be empty")
if profile not in ALLOWED_PROFILES:
    raise SystemExit("CODEX_PERMISSION_PROFILE is not allowlisted")

remote_command = shlex.join(command)
sandbox_command = shlex.join([
    "codex",
    "sandbox",
    "-P",
    profile,
    "--include-managed-config",
    "--",
    "bash",
    "-lc",
    remote_command,
])
print(shlex.join(["ssh", args.host, sandbox_command]))
```

This builds each command layer as an argument vector and uses `shlex.join()` at the boundary, rather than interpolating untrusted shell text. After validation and parsing, the nested command has this structure:

```text
ssh argv:
  ["ssh", "builder.example.com", SANDBOX_COMMAND]

SANDBOX_COMMAND argv:
  ["codex", "sandbox", "-P", ":workspace", "--include-managed-config", "--", "bash", "-lc", "printf %s 'hello world'"]

bash -lc payload argv:
  ["printf", "%s", "hello world"]
```

A production implementation could execute that SSH command.
The integration fixture prints it and parses the result back into arguments, verifying the complete flow:

```text
model invokes outer wrapper
  -> zsh-fork starts wrapper under :workspace
  -> wrapper execs allowlisted Python script
  -> prefix rule restarts Python script unsandboxed
  -> Python script inherits CODEX_PERMISSION_PROFILE=:workspace
  -> Python script verifies :workspace is allowlisted
  -> remote command runs codex sandbox -P :workspace with --include-managed-config
  -> nested sandbox honors managed enterprise requirements
```

This gives the trusted helper access to resources outside the local sandbox, such as SSH credentials, while ensuring that it can select only an explicitly authorized profile and that work on the remote host remains subject to the organization's managed requirements.

## What changed

- Inject `CODEX_PERMISSION_PROFILE` after shell environment policy evaluation so the active profile wins over inherited or configured stale values.
- Apply the variable to both `shell_command` and unified `exec_command`, including local, zsh-fork, and remote exec-server paths.
- Remove stale values when the session has no active named profile.
- Preserve the current profile value when loading a shell snapshot so a parent snapshot cannot restore an older profile.

## Testing

- Added classic-shell integration coverage proving an exact prefix rule can run a `require_escalated` script outside the `:workspace` sandbox while preserving `CODEX_PERMISSION_PROFILE=:workspace`.
- Added zsh-fork integration coverage in which the model invokes an outer zsh wrapper, an inner allowlisted `remote_bash.py` runs unsandboxed, and its printed SSH command reconstructs the inherited `:workspace` sandbox with `--include-managed-config` while preserving every argument after `--`.
- The example helper treats `CODEX_PERMISSION_PROFILE` as untrusted and validates it against `ALLOWED_PROFILES` before constructing the nested command.
- Assert that the reconstructed sandbox command includes `--include-managed-config` so nested use of the inherited profile cannot bypass managed enterprise requirements.
- Added coverage for overriding and removing stale profile values.
- Verified `shell_command` receives the selected active profile.
- Added shell snapshot coverage using `printenv CODEX_PERMISSION_PROFILE`.

Michael Bolin ·
2026-06-25 19:00:23 +00:00 -
[codex] current time reminder interval to be set to 0 (#30029)
A zero interval lets callers request a reminder at every otherwise-eligible inference boundary. ## Validation - just test -p codex-core load_config_resolves_current_time_reminder
rka-oai ·
2026-06-25 18:30:53 +00:00 -
cli: rename sandbox permission profile flag (#30095)
## Why

`codex sandbox` accepts a single named permissions profile, so the existing plural `--permissions-profile` spelling is misleading. The canonical flag and its help text should use the singular form without breaking scripts that already use the old spelling.

## What changed

- Make `--permission-profile` the canonical flag for all sandbox backends.
- Keep `--permissions-profile` as a hidden backwards-compatible alias.
- Cover the canonical spelling, legacy alias, and help visibility with regression tests.

## Testing

Ran `just c sandbox --help` and verified I saw:

```shell
-P, --permission-profile <NAME>
    Named permissions profile to apply from the active configuration stack
```

Michael Bolin ·
2026-06-25 11:25:19 -07:00 -
feat: add provider-aware model fallback to thread start (#29942)
## Why

Helper threads such as task title generation can request a model ID that is valid for the default OpenAI provider but unavailable from the active provider. With Amazon Bedrock, `gpt-5.4-mini` is rejected while the provider static catalog exposes Bedrock model IDs such as `openai.gpt-5.5` and `openai.gpt-5.4`. This causes repeated background 404s and can surface a misleading turn error even when the main turn succeeds.

Clients need an explicit way to ask app-server to resolve an unavailable helper model to the active provider default. That fallback must remain limited to providers with an authoritative static catalog so custom or dynamically discovered model IDs are not rewritten based on an incomplete catalog.

Fixes #28741.

## What changed

- Add the experimental `allowProviderModelFallback` option to `thread/start`, defaulting to `false` to preserve existing behavior.
- Thread the option through thread creation and model selection.
- When enabled for a static model manager, preserve requested models present in the catalog and replace unavailable models with the provider default.
- Continue preserving explicit model IDs for dynamic model managers without fetching a catalog solely to validate them.
- Document the new `thread/start` behavior in the app-server API overview.

## Test

Temporary test-client harness:

```
ThreadStartParams {
    model: Some("gpt-5.4-mini".to_string()),
    allow_provider_model_fallback: true,
    ..Default::default()
}
```

Command:

```
CODEX_HOME=/tmp/codex-bedrock-thread-start-home \
CODEX_E2E_BEDROCK_THREAD_START_ONLY=1 \
./target/debug/codex-app-server-test-client \
  --codex-bin ./target/debug/codex \
  -c 'model_provider="amazon-bedrock"' \
  send-message-v2 --experimental-api ignored
```

Relevant output:

```
> "method": "thread/start",
> "params": {
>   "model": "gpt-5.4-mini",
>   "modelProvider": null,
>   "allowProviderModelFallback": true,
>   ...
> }
< "result": {
<   "model": "openai.gpt-5.5",
<   "modelProvider": "amazon-bedrock",
<   ...
< }
```
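
The selection rule in "What changed" can be summarized in a few lines. This sketch is in Python rather than the project's Rust, and `resolve_model` plus the use of `None` to mean "dynamic model manager, no static catalog" are assumptions made for illustration.

```python
def resolve_model(requested, static_catalog, provider_default, allow_fallback):
    """Hypothetical resolver; static_catalog is None for dynamic model managers."""
    if not allow_fallback or static_catalog is None:
        # Default behavior, and dynamic managers: keep the explicit model ID.
        return requested
    if requested in static_catalog:
        return requested
    return provider_default


bedrock_catalog = {"openai.gpt-5.5", "openai.gpt-5.4"}

fallback = resolve_model("gpt-5.4-mini", bedrock_catalog, "openai.gpt-5.5", True)
preserved = resolve_model("openai.gpt-5.4", bedrock_catalog, "openai.gpt-5.5", True)
disabled = resolve_model("gpt-5.4-mini", bedrock_catalog, "openai.gpt-5.5", False)
dynamic = resolve_model("my-custom-model", None, "some-default", True)
```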
Celia Chen ·
2026-06-25 18:24:34 +00:00 -
[codex] Record exec-server lifecycle metrics (#27467)
## Summary - Record bounded connection, request, and process lifecycle metrics. - Report active gauges from callbacks on every collection, including delta exports. - Serialize active-count updates so concurrent starts and finishes cannot publish stale values. - Serialize process exit, explicit termination, and shutdown through the process registry so exactly one completion result wins. - Keep the implementation small with single-owner RAII guards and one real OTLP/HTTP integration test using the existing `wiremock` dependency. ## Root cause Process exit and session shutdown previously used cloned completion state. That avoided duplicate emission, but it duplicated lifecycle ownership and made the ordering harder to reason about. The process registry mutex already defines the lifecycle ordering, so the final implementation stores the metric guard and termination flag directly on the process entry. Whichever path claims the entry first owns the completion result. Production metric export uses delta temporality. Event-only synchronous gauge recordings disappear after the next collection when no count changes, so active counts now use observable callbacks that report current state on every collection. The cleanup also removes the constant `result="accepted"` connection tag, redundant route and response assertions, a custom HTTP collector, and fallback initialization machinery that did not add behavior. ## Stack Review and land this stack in order: 1. #27466 — trace exec-server JSON-RPC requests 2. #27467 — record bounded connection, request, and process lifecycle metrics **(this PR)** 3. #27470 — observe remote registration and Noise rendezvous lifecycle ## Validation - `just test -p codex-exec-server --lib` (158 passed) - `just test -p codex-cli --test exec_server` (3 passed) - `just test -p codex-otel observable_gauge_is_collected_on_every_delta_snapshot` (1 passed) - `CARGO_BUILD_JOBS=1 just fix -p codex-otel -p codex-exec-server` - `just fmt` - `git diff --check`
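
The "whichever path claims the entry first owns the completion result" rule can be illustrated with a lock-protected registry. This is a simplified Python analogy with hypothetical names; the real entry also carries the metric guard and termination flag.

```python
import threading


class ProcessRegistry:
    """Hypothetical registry: the mutex defines lifecycle ordering."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}

    def register(self, process_id):
        with self._lock:
            self._entries[process_id] = object()

    def claim(self, process_id):
        # Removing the entry under the lock means only one caller can succeed.
        with self._lock:
            return self._entries.pop(process_id, None) is not None


registry = ProcessRegistry()
registry.register("proc-1")
winners = []


def finish(reason):
    if registry.claim("proc-1"):
        winners.append(reason)  # only the claiming path reports completion


threads = [
    threading.Thread(target=finish, args=(reason,))
    for reason in ("exit", "terminate", "shutdown")
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Which of the three paths wins depends on scheduling, but exactly one of them does.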
richardopenai ·
2026-06-25 11:02:11 -07:00 -
Persist selected capability roots and resolve availability per model step (#29856)
## Why `selectedCapabilityRoots` is durable thread intent: “use this capability root from environment `worker`.” The important product assumption is: > One environment ID always names the same logical executor and stable contents. `worker` does not silently change from executor A to an unrelated executor B. The process-local connection handle for `worker` can still be replaced while Codex is running, though, for example when `environment/add` registers a fresh handle for the same logical environment. The thread should persist only the stable selection. Each model step should pair that selection with the exact ready handle captured for that step. ## The boundary ```text persisted thread intent plugin@1 -> environment "worker" | | capture the current step v model-step view unavailable, or plugin@1 + worker's exact captured ready handle ``` The environment ID is the stable identity and cache key. The `Arc<Environment>` is only a process-local handle retained so consumers of one model step use the same captured environment. It is never persisted and it does not imply different environment contents. ## What changes ### Persist the stable selection Selected roots are written into `SessionMeta` and restored with the thread. Forked subagents inherit the same selections, including bounded-history forks. Only stable data is persisted: root ID, environment ID, and root path. ### Capture readiness together with the exact handle The environment snapshot records: ```rust environment_id -> Some(Arc<Environment>) // ready in this step environment_id -> None // still starting in this step ``` This prevents readiness and execution from coming from different registry snapshots. 
For example: ```text step snapshot: worker -> handle A, ready environment/add: worker -> fresh handle B for the same logical environment current step: plugin@1 still uses captured handle A ``` Without carrying handle A in the snapshot, the resolver could combine “A was ready” with handle B and treat B as ready before it had finished starting. This does not change cache invalidation. Stable capability metadata remains identified by environment ID and capability root. Replacing a process-local handle under the same stable environment ID does not invalidate or rediscover that metadata. ### Resolve availability per model step - A ready captured environment produces resolved roots using its captured handle. - A starting, missing, or failed environment is omitted from that step. - A selected lazy environment that is outside the turn's captured environment set is asked to start, and a later step can observe it as ready. - No capability files are scanned here. Transient transport disconnects remain the remote client's reconnect concern. This PR models initial attachment/readiness; it does not add live socket-connectivity state. ## Example ```text thread selection: plugin@1 -> environment "worker" step 1: worker is starting -> plugin@1 unavailable step 2: worker is ready -> plugin@1 resolves through worker's captured handle step 3: fresh local handle -> current step remains pinned; a later step captures its own view ``` Temporary unavailability does not discard the durable selection. Later PRs can retain stable metadata caches while projecting only currently available capabilities into model-visible World State. ## Compatibility The app-server request shape does not change. Older rollouts without `selected_capability_roots` deserialize to an empty list. ## Stack 1. **This PR:** persist stable selected roots and resolve them through an exact model-step handle. 2. #29960: cache stable skill metadata and project available skills into World State. 3. 
#29946: cache stable plugin declarations and manage the separate live MCP runtime.

jif ·
2026-06-25 17:49:43 +00:00 -
chore(app-server): mark thread/rollback as deprecated (#29928)
We will drop support for this in the near future due to the complexity it introduces.
Owen Lin ·
2026-06-25 17:15:46 +00:00 -
Test executor-routed MCP OAuth token exchange (#29656)
## Why #28529 proves OAuth discovery uses the selected executor, but its end-to-end test stops before the callback and token exchange. ## What changed - add an executor-only mock token endpoint - complete the OAuth callback using the authorization URL's `state` and `redirect_uri` - assert the PKCE token exchange reaches the executor-only endpoint - assert the completion notification reports the selected thread and succeeds Depends on #28529.
jif ·
2026-06-25 09:45:20 +00:00 -
Support OAuth for HTTP MCP servers from selected executor plugins (#28529)
## Why #28522 routes selected-plugin HTTP MCP traffic through the owning executor, but OAuth bootstrap and refresh still used host-local clients. Executor-only servers therefore cannot complete discovery or login through the same network boundary as the MCP connection. ## What changed - adapt `codex_exec_server::HttpClient` to RMCP 1.8's `OAuthHttpClient` contract - let RMCP own discovery, dynamic registration, PKCE, token exchange, and refresh - route auth status, persisted-token startup, and app-server login through the server runtime while preserving the existing local discovery path - add optional `threadId` to `mcpServer/oauth/login` and echo it in the completion notification - implement RMCP's redirect policy and 1 MiB OAuth response limit over executor HTTP - cover selected-thread OAuth discovery and login through an executor-only route Depends on #28522.
jif ·
2026-06-25 10:31:17 +01:00 -
Support HTTP MCP servers from selected executor plugins (#28522)
## Why Selected executor plugins can declare both stdio and Streamable HTTP MCP servers, but only stdio registrations were retained. That silently drops part of the plugin's tool surface and prevents HTTP traffic from using the owning executor's network. ## What changed - retain selected-plugin Streamable HTTP MCP declarations alongside stdio declarations - route their HTTP clients through the owning executor environment - preserve local auth-header environment references while rejecting them for executor-hosted declarations - cover thread isolation, refresh, and an executor-only HTTP route end to end
jif ·
2026-06-25 10:10:36 +01:00 -
Parallelize environment skill loading (#29990)
## Why Avoid a request waterfall for loading lots of skills at once by hiding latency in concurrent tasks. ## What changed Poll the per-skill parse futures concurrently with an order-preserving stream capped at 64 in-flight loads. Results retain discovery order, and the existing filtering, warnings, and final catalog sorting are unchanged.
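
A comparable pattern in Python, shown only as an analogy to the order-preserving stream described above: a semaphore caps in-flight work and `asyncio.gather` returns results in input order. `load_all` and `fake_parse` are hypothetical names, and the demo uses a limit of 3 instead of 64 so the cap is observable.

```python
import asyncio


async def load_all(paths, parse_one, limit=64):
    """Run parse_one over paths concurrently, capped at `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(path):
        async with semaphore:
            return await parse_one(path)

    # gather returns results in the order of its inputs, not completion order.
    return await asyncio.gather(*(guarded(p) for p in paths))


async def demo():
    in_flight = 0
    peak = 0

    async def fake_parse(path):
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        # Earlier items sleep longer, so completion order differs from input order.
        await asyncio.sleep(0.001 * (10 - int(path[-1])))
        in_flight -= 1
        return path.upper()

    paths = [f"skill{i}" for i in range(10)]
    results = await load_all(paths, fake_parse, limit=3)
    return paths, results, peak


paths, results, peak = asyncio.run(demo())
```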
Adam Perry @ OpenAI ·
2026-06-25 10:02:07 +01:00 -
core: reconcile legacy WorldState sections (#29997)
## Why Older rollouts can retain model-visible context for a WorldState section without having a persisted snapshot for that section. Treating the missing snapshot as definitely absent can duplicate old context or fail to tell the model that it was replaced or removed. This provides a generic migration path for sections moving into WorldState, beginning with AGENTS.md. Builds on #29810. ## What changed - distinguish section state that is absent, known from a persisted snapshot, or unknown because matching legacy context remains in history - let WorldState sections identify their own legacy fragments while `ContextManager` owns history reconciliation and baseline persistence - make AGENTS.md emit one conservative replacement or removal update for legacy history, then deduplicate from the newly persisted baseline - preserve existing environment rendering when persisted section data is missing or malformed ## Testing - `just test -p codex-core world_state` - `just test -p codex-core cold_resume_invalidates_deleted_legacy_agents_md_once -- --exact`
sayan-oai ·
2026-06-25 07:03:52 +00:00 -
core: make AGENTS.md react to environment changes (#29810)
## Why With deferred executors, a turn can begin before a remote environment attaches. AGENTS.md discovery previously ran only during session setup, so instructions from a later environment never reached the model or the session instruction sources. WorldState persistence has now landed, so this uses the durable model-visible baseline directly instead of carrying a temporary resume/fork compatibility path. ## What - Add an `AgentsMdManager` in `SessionServices` to own host instructions, loaded state, and refresh caching. - When `DeferredExecutor` is enabled, refresh AGENTS.md when attached environment selections change and freeze the result in the corresponding `StepContext`. - Represent AGENTS.md as a persisted WorldState section for every session, with bounded initial, replacement, and removal updates. - Remove duplicate AGENTS.md state and rendering from `SessionConfiguration` and `TurnContext`. - Build initial context, per-request updates, and compaction context from the same step-scoped value. - On resume and fork, compare current instructions with the restored WorldState baseline and inject a replacement exactly once when they differ. Builds on #29833, #29835, and #29837. ## Tests - Covers a remote environment becoming ready mid-turn, with AGENTS.md appearing on the next request exactly once and updating canonical instruction sources. - Covers full, unchanged, replaced, and removed AGENTS.md WorldState rendering. - Covers changed instructions across cold resume and fork without duplicate reinjection. - Covers remote-v2 compaction retaining creation-time instructions in the live session and cold resume appending one replacement when the source changed. - Ran focused `codex-core` AGENTS.md, WorldState, and context-update test suites.
sayan-oai ·
2026-06-24 22:57:42 -07:00 -
feat: use run agent task auth for inference (#19051)
## Stack This is PR 3 of the simplified HAI single-run-task stack: - [#19047](https://github.com/openai/codex/pull/19047) Agent Identity assertion and task-registration primitives, including the shared run-task helper used by existing Agent Identity JWT auth. - [#19049](https://github.com/openai/codex/pull/19049) Disabled-by-default ChatGPT auth opt-in that provisions/reuses persisted Agent Identity runtime auth and its single run task. - [#19051](https://github.com/openai/codex/pull/19051) Run-scoped provider auth that uses one backend-owned task id for first-party inference and compaction requests. [#19054](https://github.com/openai/codex/pull/19054) collapsed out of the active stack because the simplified design no longer needs a separate background/control-plane task helper. ## Summary This PR moves Agent Identity usage into provider auth resolution. That keeps `AgentAssertion` auth tied to first-party OpenAI provider requests instead of applying a late session-wide override that could affect local, custom, Bedrock, API-key, or external-bearer providers. 
What changed: - adds a small `ProviderAuthScope` struct carrying the run auth policy and session source needed by provider-scoped auth resolution - lets `Session` opt the existing `ModelClient` into `ChatGptAuth` policy when `use_agent_identity` is enabled, without adding a second model-client constructor - resolves Agent Identity only for first-party OpenAI provider auth paths - uses the persisted run task id from the `AgentIdentityAuth` record to build `AgentAssertion` auth for Responses requests - routes shared request setup through scoped provider auth so unary compact requests use the same run-task assertion path as inference turns - keeps local/custom/Bedrock/env-key/external-bearer provider auth unchanged - lets missing run-task state surface through the existing model-request error path instead of silently falling back to bearer auth This PR intentionally does not create thread-scoped, target-scoped, or background-scoped task identities. The run task is the only task Codex registers in this POC shape. ## Testing - `just test -p codex-model-provider` - `just test -p codex-core client::tests::provider_auth_scope_uses` - `just test -p codex-core remote_compact_uses_agent_identity_assertion`
Adrian ·
2026-06-24 22:31:41 -07:00 -
[codex] route sleep through time providers (#29973)
## Summary - add a cancellable sleep operation to `TimeProvider` - route `clock.sleep` through the configured provider - extend the supported sleep duration to 12 hours - complete the sleep turn item before propagating provider failures ## Why This isolates the core clock abstraction needed by external clock integrations. Existing system and app-server behavior remains wall-clock based in this PR; the stacked follow-up supplies app-server sleeps from an external clock.
rka-oai ·
2026-06-24 22:17:43 -07:00 -
core: raise token budget message limits (#29970)
## Why Token-budget reminder and guidance messages can require more than 1,000 bytes to provide useful model-facing instructions. At the same time, these strings are injected into model-visible context, so their size must remain tightly bounded in response to the P0 context-growth concern. A 2,000-byte runtime cap provides additional room without allowing the substantially larger context growth of a 4 KiB limit. ## What changed - raises the runtime byte limits for token-budget reminder templates and guidance messages from 1,000 to 2,000 - raises the corresponding JSON Schema `maxLength` values to 2,000 - regenerates `codex-rs/core/config.schema.json` ## Testing - `just test -p codex-features` - `just test -p codex-core load_config_resolves_token_budget_config load_config_rejects_invalid_token_budget_reminder_template` The full `codex-core` test run completed 2,858 tests successfully and encountered seven unrelated environment-sensitive failures involving Seatbelt/network environment assertions, MCP capability setup, and abort timing.
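
One subtlety worth illustrating: a byte cap and a JSON Schema `maxLength` are not the same measure, because `maxLength` counts characters. The sketch below assumes the runtime limit is measured in UTF-8 bytes; the function names are hypothetical.

```python
MAX_MESSAGE_BYTES = 2000  # runtime cap from the description
SCHEMA_MAX_LENGTH = 2000  # JSON Schema maxLength counts characters, not bytes


def within_runtime_limit(message):
    # Assumes the runtime check measures the UTF-8 encoded size.
    return len(message.encode("utf-8")) <= MAX_MESSAGE_BYTES


def within_schema_max_length(message):
    return len(message) <= SCHEMA_MAX_LENGTH


ascii_message = "a" * 2000     # 2000 characters, 2000 bytes
accented_message = "é" * 1500  # 1500 characters, 3000 bytes in UTF-8

ascii_ok = within_runtime_limit(ascii_message)
accented_schema_ok = within_schema_max_length(accented_message)
accented_runtime_ok = within_runtime_limit(accented_message)
```

Under that assumption, a non-ASCII message can satisfy the schema and still be rejected at runtime.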
Michael Bolin ·
2026-06-25 05:05:32 +00:00 -
Report MCP error codes with server attribution (#29969)
## Why MCP error-code telemetry special-cased Codex Apps: its reported error codes were retained, while codes from every other MCP server were replaced with `unknown`. Error reporting should behave consistently for every MCP server. The server name already identifies where an error came from, so telemetry does not need a separate Codex Apps classification. This follows up on [#28976](https://github.com/openai/codex/pull/28976), which introduced MCP error-code telemetry. ## What changed - Add the MCP server name to call, duration, and error metrics. - Retain bounded, sanitized tool error codes from every MCP server. - Remove `McpErrorCodeSource` and the Codex Apps ownership lookup from telemetry collection. - Use the same metric-tagging path for blocked, rejected, and executed MCP calls. ## Test plan - Verify the complete metric tag set includes the sanitized MCP server name. - Verify error codes from ordinary MCP servers are retained, bounded, and sanitized. - Preserve coverage for request failures, tool-result failures, nested auth failures, and span attributes.
Ahmed Ibrahim ·
2026-06-24 21:08:39 -07:00 -
[3/3] core: replay persisted world state (#29837)
## Why Persisting `WorldState` snapshots and patches is only useful if resume and fork restore that exact comparison baseline. Rebuilding it from `TurnContextItem` loses section state and can either repeat or suppress model-visible updates. This is the third PR in the WorldState persistence stack, built on #29835. ## What - Replay full WorldState snapshots and RFC 7386 patches through the existing rollout reconstruction segments. - Discard state from rolled-back turns and treat compaction as a baseline reset. - Hydrate `ContextManager` from the reconstructed snapshot on resume and fork. - Remove the synthetic `TurnContextItem` to WorldState conversion path. - Leave legacy or malformed rollouts without a baseline so the next update safely emits a full snapshot. ## Testing - `just test -p codex-core world_state` - `just test -p codex-core rollout_reconstruction_tests` - `just fix -p codex-core` - `just test -p codex-core` *(the changed tests passed; the full run also hit unrelated existing/test-environment failures, primarily a missing `test_stdio_server` binary)*
sayan-oai ·
2026-06-25 03:32:08 +00:00 -
[codex] Add Ultra reasoning effort (#29899)
## Why Ultra should be one user-facing reasoning selection for work that benefits from both maximum reasoning and proactive multi-agent delegation. Without it, clients must coordinate maximum reasoning with the experimental `multiAgentMode` setting, even though the inference backend still expects its existing `max` effort value. This change makes reasoning effort the source of truth: clients select `ultra`, core derives proactive multi-agent behavior when the turn is eligible for multi-agent V2, and inference requests continue to use the backend-compatible `max` value. ## What changed - Add `ultra` as a first-class reasoning effort and preserve model-catalog ordering when exposing it to clients. - Convert `ultra` to `max` at the inference request boundary, including Responses HTTP/WebSocket requests, startup prewarm, compaction, and memory summarization. - Derive effective multi-agent mode per turn from effective reasoning effort: - eligible multi-agent V2 + `ultra` → `proactive` - eligible multi-agent V2 + any other effort → `explicitRequestOnly` - V1 or otherwise ineligible sessions → no multi-agent mode instruction - Keep the derived effective mode in turn context history so successive turns can emit a developer-message update only when the effective mode changes. - Remove selected multi-agent mode from core session configuration, turn construction, thread settings, resume/fork restoration, and subagent spawn plumbing. Subagents inherit reasoning effort and derive their own effective mode. - Retain the experimental app-server `multiAgentMode` fields for wire compatibility while marking them deprecated. Request values are accepted but ignored; compatibility response fields report `explicitRequestOnly`. - Display Ultra in the TUI using the order supplied by `model/list`. ## Validation - `just test -p codex-core ultra_reasoning_uses_max_for_requests` - `just test -p codex-tui model_reasoning_selection_popup`
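
The two derivations above are small enough to sketch directly. The function names are hypothetical, and a boolean stands in for the "eligible for multi-agent V2" check; the string values are the ones named in the description.

```python
def request_effort(effort):
    """Map the user-facing effort to the value sent to the inference backend."""
    return "max" if effort == "ultra" else effort


def effective_multi_agent_mode(effort, multi_agent_v2_eligible):
    """Derive the per-turn multi-agent mode from the effective reasoning effort."""
    if not multi_agent_v2_eligible:
        return None  # V1 or ineligible sessions get no multi-agent mode instruction
    return "proactive" if effort == "ultra" else "explicitRequestOnly"


ultra_request = request_effort("ultra")
high_request = request_effort("high")
ultra_mode = effective_multi_agent_mode("ultra", True)
high_mode = effective_multi_agent_mode("high", True)
ineligible_mode = effective_multi_agent_mode("ultra", False)
```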
Shijie Rao ·
2026-06-24 20:13:52 -07:00 -
[2/3] core: persist world state in rollouts (#29835)
## Why

`WorldState` currently remembers its model-visible diff baseline only in memory. That leaves no durable source for restoring the exact baseline after resume, fork, rollback, or compaction.

This is the second PR in the WorldState persistence stack, built on #29833 and following #29249. It records durable state transitions; the next PR will replay them during rollout reconstruction.

## What

- Add a `world_state` rollout item containing either a full snapshot or an RFC 7386 JSON Merge Patch.
- Persist a full snapshot after initial context and after compaction establishes a new context window.
- Persist non-empty patches when later sampling steps or turns advance the WorldState baseline.
- Write model-visible history before its matching WorldState record, so an interrupted write can only cause a safe repeated update on replay.
- Preserve WorldState records for full-history forks while excluding them from thread previews, metadata, and app-server history materialization.

Older binaries read rollout lines independently, so they skip the unknown `world_state` records while retaining the rest of the thread.

## Testing

- `just test -p codex-core snapshot_merge_patch_changes_and_removes_nested_values`
- `just test -p codex-core world_state_baseline_deduplicates_until_history_is_replaced`
- `just test -p codex-core deferred_executor_compaction_preserves_then_updates_environment_once`
- `just test -p codex-protocol`
- `just test -p codex-rollout`
- `just test -p codex-state`
- `just test -p codex-thread-store`
- `just test -p codex-app-server-protocol`
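For readers unfamiliar with RFC 7386, the merge-patch rule the rollout item relies on can be sketched as follows. This is a standalone illustration using a hand-written `Json` type so that it needs no external crates; the `Json` enum, the `obj` helper, and the example field names are assumptions for the sketch and are not taken from the Codex code.

```rust
use std::collections::BTreeMap;

/// Minimal JSON value, defined here only so the sketch is self-contained.
#[derive(Clone, Debug, PartialEq)]
pub enum Json {
    Null,
    Bool(bool),
    Num(f64),
    Str(String),
    Arr(Vec<Json>),
    Obj(BTreeMap<String, Json>),
}

/// Convenience constructor for object values (hypothetical helper).
pub fn obj(pairs: &[(&str, Json)]) -> Json {
    Json::Obj(pairs.iter().map(|(k, v)| (k.to_string(), v.clone())).collect())
}

/// Apply an RFC 7386 JSON Merge Patch to `target` and return the result.
pub fn merge_patch(target: Json, patch: &Json) -> Json {
    match patch {
        Json::Obj(patch_fields) => {
            // A non-object target is treated as an empty object.
            let mut fields = match target {
                Json::Obj(fields) => fields,
                _ => BTreeMap::new(),
            };
            for (key, value) in patch_fields {
                if matches!(value, Json::Null) {
                    // A null member in the patch removes the key from the target.
                    fields.remove(key);
                } else {
                    // Otherwise recurse; a missing key behaves like a non-object target.
                    let current = fields.remove(key).unwrap_or(Json::Null);
                    fields.insert(key.clone(), merge_patch(current, value));
                }
            }
            Json::Obj(fields)
        }
        // Any non-object patch (including arrays) replaces the target wholesale.
        other => other.clone(),
    }
}
```

Because `null` in a patch means "delete this key", a merge patch cannot set an object field to `null`, while arrays are always replaced whole and may contain `null` elements.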
sayan-oai ·
2026-06-24 20:13:49 -07:00 -
[codex] Populate remote plugin local versions (#29956)
# What

- Carry installed remote release versions through remote plugin summaries as `localVersion`.
- Keep the app-server mapping a pure adapter by populating that value in the remote catalog layer.

# Why

Remote plugin summaries always returned `localVersion: null` even after their versioned bundles had been installed locally. Consumers such as scheduled-task template discovery use `localVersion` to resolve a plugin's materialized root, so templates from remote curated plugins were silently skipped.
Abhinav ·
2026-06-25 03:13:03 +00:00 -
code-mode: define process host wire protocol (#29804)
## Why

The process-owned code mode implementation needs an explicit, bounded wire contract before either side depends on it. Keeping framing and message semantics in `codex-code-mode-protocol` gives the client and sidecar one shared source of truth and makes compatibility failures detectable during connection setup.

## What changed

- adds a versioned client/host handshake with required and optional capabilities
- defines operation requests and responses for session lifecycle and cell control
- defines reverse delegate request, response, cancellation, and cell-closure messages
- adds a four-byte little-endian length-prefixed JSON codec with a hard frame cap
- rejects malformed frames, unknown fields, invalid identifiers, and unsupported protocol states
- locks the wire representation down with explicit JSON round-trip tests

## Testing

- `just test -p codex-code-mode-protocol`

## Stack

Part 1 of 6. Followed by [#29805](https://github.com/openai/codex/pull/29805).
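The framing described above (a four-byte little-endian length prefix with a hard cap) might look roughly like the sketch below. It is not the `codex-code-mode-protocol` codec: the function names, the error type, and the `MAX_FRAME_LEN` value are assumptions, since the description does not state the actual cap.

```rust
/// Hypothetical hard cap on payload size; the real limit is not given in the description.
pub const MAX_FRAME_LEN: usize = 1 << 20;

#[derive(Debug, PartialEq, Eq)]
pub enum FrameError {
    /// The declared or actual payload length exceeds `MAX_FRAME_LEN`.
    TooLarge(usize),
}

/// Prefix `payload` with its length as four little-endian bytes.
pub fn encode_frame(payload: &[u8]) -> Result<Vec<u8>, FrameError> {
    if payload.len() > MAX_FRAME_LEN {
        return Err(FrameError::TooLarge(payload.len()));
    }
    let mut frame = Vec::with_capacity(4 + payload.len());
    // The cast is lossless because the length was checked against the cap above.
    frame.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    frame.extend_from_slice(payload);
    Ok(frame)
}

/// Try to decode one frame from the front of `buf`.
/// Returns `Ok(None)` when more bytes are needed, otherwise the payload and
/// the number of bytes consumed.
pub fn decode_frame(buf: &[u8]) -> Result<Option<(&[u8], usize)>, FrameError> {
    if buf.len() < 4 {
        return Ok(None);
    }
    let len = u32::from_le_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    // Reject oversized frames from the header alone, before buffering the body.
    if len > MAX_FRAME_LEN {
        return Err(FrameError::TooLarge(len));
    }
    let end = 4 + len;
    if buf.len() < end {
        return Ok(None);
    }
    Ok(Some((&buf[4..end], end)))
}
```

Checking the cap as soon as the four header bytes arrive is what makes the limit "hard": a peer cannot force the reader to allocate or buffer an arbitrarily large body.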
Channing Conger ·
2026-06-24 20:03:22 -07:00 -
Represent MCP authentication with an enum (#29924)
## Why

MCP authentication has distinct OAuth and ChatGPT-session flows. Representing that choice as `use_chatgpt_auth` makes one flow implicit and allows the configuration model to express the distinction only through a boolean.

ChatGPT credential forwarding also needs a first-party trust boundary. A configurable `chatgpt_base_url` controls routing, but must not grant an MCP server permission to receive session credentials.

This change builds on #29733, where the boolean was introduced.

## What changed

- Replace `use_chatgpt_auth` with an `auth` field backed by the exhaustive `McpServerAuth` enum.
- Support `auth = "oauth"` and `auth = "chatgpt"`, with OAuth remaining the default.
- Trust only the origin derived from the existing hardcoded `CHATGPT_CODEX_BASE_URL` when granting ChatGPT auth to an MCP server.
- Keep configured bearer tokens and authorization headers ahead of the selected authentication flow.
- Update config writers, schema output, fixtures, and integration-test setup to use the enum.

## Verification

Integration coverage exercises the complete streamable HTTP startup path in two independent configurations:

- A directly constructed MCP configuration verifies that matching an overridden `chatgpt_base_url` does not grant ChatGPT auth.
- A persisted `config.toml` containing an attacker-controlled `chatgpt_base_url` and `auth = "chatgpt"` verifies the same boundary through normal config parsing.

Both tests complete MCP initialization and tool listing and assert that the full captured request sequence contains no authorization headers. Separate integration coverage verifies that configured authorization takes precedence over ChatGPT auth.
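A rough sketch of the enum and the precedence rule follows. Only the `McpServerAuth` name, the `"oauth"` and `"chatgpt"` values, the OAuth default, and the "configured authorization wins" ordering come from the description. The variant names, `ResolvedAuth`, and `resolve_auth` are hypothetical, and the description does not say what an untrusted `chatgpt` request falls back to, so this sketch simply sends no credentials in that case.

```rust
use std::str::FromStr;

/// Authentication flow selected for an MCP server; OAuth is the default.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub enum McpServerAuth {
    #[default]
    OAuth,
    ChatGpt,
}

impl FromStr for McpServerAuth {
    type Err = String;

    /// Parse the `auth` config value; anything else is rejected rather than guessed.
    fn from_str(value: &str) -> Result<Self, Self::Err> {
        match value {
            "oauth" => Ok(Self::OAuth),
            "chatgpt" => Ok(Self::ChatGpt),
            other => Err(format!("unknown MCP auth mode: {other}")),
        }
    }
}

/// What the client ends up attaching to requests (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum ResolvedAuth {
    /// A bearer token or authorization header taken from configuration.
    Configured,
    ChatGptSession,
    OAuth,
    /// Assumption: an untrusted `chatgpt` request receives no credentials.
    NoCredentials,
}

/// Configured authorization is checked first, then the selected flow.
/// `origin_is_trusted` stands for "the server origin matches the hardcoded
/// first-party origin"; how that is computed is outside this sketch.
pub fn resolve_auth(
    auth: McpServerAuth,
    has_configured_authorization: bool,
    origin_is_trusted: bool,
) -> ResolvedAuth {
    if has_configured_authorization {
        return ResolvedAuth::Configured;
    }
    match auth {
        McpServerAuth::OAuth => ResolvedAuth::OAuth,
        McpServerAuth::ChatGpt if origin_is_trusted => ResolvedAuth::ChatGptSession,
        McpServerAuth::ChatGpt => ResolvedAuth::NoCredentials,
    }
}
```

Compared with a boolean, the exhaustive `match` forces every call site to handle each flow explicitly, so adding a third flow later becomes a compile-time change rather than a silent default.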
Ahmed Ibrahim ·
2026-06-24 19:51:51 -07:00 -
Eric Traut ·
2026-06-24 19:50:50 -07:00 -
[1/3] core: make world state snapshots serializable (#29833)
## Why

`WorldState` currently keeps its diff baseline as live Rust objects keyed by process-local `TypeId`. That baseline cannot be written to a rollout or restored after resume, so Codex reconstructs an approximation from `TurnContextItem`.

This is the first change in the WorldState persistence stack. It gives every section a stable persisted identity and a compact serializable comparison snapshot without changing rollout behavior yet.

## What changed

- Require each `WorldStateSection` to define a stable ID and serializable snapshot type.
- Reject duplicate section IDs when constructing `WorldState`.
- Persist a dedicated environment comparison snapshot using model-visible strings instead of runtime path types.
- Store only `WorldStateSnapshot` in `ContextManager`, removing the parallel live-object baseline.
- Render diffs by restoring each section's typed snapshot; invalid snapshots fall back to a full section render.
- Omit null object fields for future RFC 7386 patches while preserving null values inside arrays.

Follow-up PRs will record full snapshots and merge patches, then restore the baseline during resume, fork, and rollback.

## Test plan

- WorldState snapshot tests cover stable IDs, duplicate rejection, null omission, and array preservation.
- Environment tests cover persistence-safe snapshot values and existing diff rendering.
- ContextManager baseline deduplication and session context-update persistence tests.

Related: #29249
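The move from `TypeId` keys to stable string IDs, together with duplicate rejection, can be illustrated with the simplified sketch below. The real `WorldStateSection` trait is described as having a serializable snapshot type; here a plain `String` stands in for it, and the method names, the `Environment` section, and its `"environment"` ID are assumptions for the example.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Simplified stand-in for the section trait described above.
pub trait WorldStateSection {
    /// Stable identifier used as the persisted key, unlike a process-local `TypeId`.
    fn id(&self) -> &'static str;
    /// Comparison snapshot; a `String` replaces the real serializable snapshot type.
    fn snapshot(&self) -> String;
}

pub struct WorldState {
    sections: Vec<Box<dyn WorldStateSection>>,
}

impl WorldState {
    /// Construction fails if two sections claim the same stable ID, because
    /// their persisted snapshots would otherwise overwrite each other.
    pub fn new(sections: Vec<Box<dyn WorldStateSection>>) -> Result<Self, String> {
        let mut seen = BTreeSet::new();
        for section in &sections {
            if !seen.insert(section.id()) {
                return Err(format!("duplicate world state section id: {}", section.id()));
            }
        }
        Ok(Self { sections })
    }

    /// Snapshot keyed by stable ID, suitable for writing out and diffing later.
    pub fn snapshot(&self) -> BTreeMap<&'static str, String> {
        self.sections.iter().map(|s| (s.id(), s.snapshot())).collect()
    }
}

/// Hypothetical section that stores model-visible strings rather than runtime path types.
pub struct Environment {
    pub cwd: String,
}

impl WorldStateSection for Environment {
    fn id(&self) -> &'static str {
        "environment"
    }

    fn snapshot(&self) -> String {
        format!("cwd={}", self.cwd)
    }
}
```

A string ID survives process restarts and recompilation, which a `TypeId` does not, and that is what allows a snapshot written in one session to be matched to the right section in another.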
sayan-oai ·
2026-06-24 19:26:55 -07:00 -
Allow ChatGPT-hosted MCP servers to use session auth (#29733)
## Why

ChatGPT session authentication was inferred from the reserved Codex Apps server name. That couples credential routing to Codex Apps-specific behavior and prevents other MCP endpoints hosted by ChatGPT from explicitly using the current session.

The opt-in also needs a clear security boundary: an arbitrary MCP configuration must not be able to redirect ChatGPT credentials to another origin.

## What changed

- Add `use_chatgpt_auth` to HTTP MCP server configuration, defaulting to `false`.
- Honor the setting only when the parsed server URL has the same HTTP(S) origin as the configured `chatgpt_base_url`; otherwise remove the capability before startup.
- Resolve bearer tokens and static or environment-backed authorization headers before selecting authentication, with configured authorization taking precedence over ChatGPT session auth.
- Enable the setting for the built-in Codex Apps and hosted plugin runtime endpoints while keeping Codex Apps caching and tool normalization scoped to the reserved server.
- Persist the setting through MCP config rewrite paths and expose it in the generated config schema.
- Load the current login state for `codex mcp list` so reported auth status matches runtime behavior.

## Verification

Core integration coverage exercises the complete streamable HTTP MCP startup path and verifies that:

- a same-origin opted-in server receives the current ChatGPT access token;
- an explicitly configured authorization header takes precedence;
- a different-origin server completes MCP initialization and tool listing without receiving any ChatGPT authorization header.
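The same-origin gate can be sketched as a comparison of (scheme, host, port) triples. This is a deliberately small, standard-library-only illustration and not the parser Codex uses: `http_origin` and `may_use_chatgpt_auth` are hypothetical names, the example hosts are placeholders, and the hand-rolled parsing handles neither bracketed IPv6 hosts without a port nor internationalized names (both fail closed). Production code would normally rely on a real URL parser.

```rust
/// (scheme, host, port) triple identifying an HTTP(S) origin.
pub type Origin = (String, String, u16);

/// Extract the origin of an `http` or `https` URL, or `None` if it cannot be parsed.
pub fn http_origin(url: &str) -> Option<Origin> {
    let (scheme, rest) = url.split_once("://")?;
    let scheme = scheme.to_ascii_lowercase();
    let default_port: u16 = match scheme.as_str() {
        "http" => 80,
        "https" => 443,
        _ => return None,
    };
    // The authority ends at the first path, query, or fragment delimiter.
    let authority = rest.split(|c: char| c == '/' || c == '?' || c == '#').next()?;
    // Drop userinfo so `https://trusted.example@evil.example/` resolves to evil.example.
    let host_port = authority.rsplit('@').next()?;
    let (host, port) = match host_port.rsplit_once(':') {
        Some((host, port)) => (host, port.parse::<u16>().ok()?),
        None => (host_port, default_port),
    };
    if host.is_empty() {
        return None;
    }
    Some((scheme, host.to_ascii_lowercase(), port))
}

/// The opt-in is honored only when both URLs parse and share an origin.
pub fn may_use_chatgpt_auth(opted_in: bool, server_url: &str, chatgpt_base_url: &str) -> bool {
    opted_in
        && match (http_origin(server_url), http_origin(chatgpt_base_url)) {
            (Some(server), Some(base)) => server == base,
            _ => false,
        }
}
```

Comparing parsed origins rather than string prefixes matters here: a prefix check would accept `https://chatgpt.example.evil.test/` or a userinfo trick, whereas the triple comparison rejects both.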
Ahmed Ibrahim ·
2026-06-24 19:21:28 -07:00 -
TUI Plugin Sharing 5 - polish remote plugin catalog rows (#26705)
This is the final plugin sharing PR in the 5-PR stack. It applies the remaining TUI polish for remote plugin catalog rows and tabs: admin-disabled plugins now read as blocked/view-only instead of looking toggleable, admin-installed/default-installed plugins count and sort like installed plugins, plugin search matches richer metadata, and an empty successful `Shared with me` section stays hidden.

- Admin-disabled rows use a blocked marker, show `Disabled`, and keep Enter-only detail behavior without a toggle hint.
- Admin-installed/default-installed plugins show as installed in counts, ordering, tabs, and detail copy.
- Plugin search now matches descriptions and keywords in addition to existing row metadata.
- Successful-empty `Shared with me` tabs are hidden, while loading, error, workspace-empty, and real shared-plugin states remain visible.
- Updates coverage in `plugins_popup_snapshot_shows_all_marketplaces_and_sorts_installed_then_name`, `plugins_popup_admin_disabled_installed_plugin_has_no_toggle_hint`, `plugins_popup_search_matches_plugin_descriptions`, and `plugins_popup_remote_section_fallback_states_snapshot`.
- Updates snapshots `plugins_popup_curated_marketplace` and `plugins_popup_empty_shared_section_hidden`.
canvrno-oai ·
2026-06-24 18:48:11 -07:00