mirror of
https://github.com/pchuan98/codex.git
synced 2026-07-01 00:31:56 +08:00
ff78e212155af07ec2db2abbcbd037fbc9d0889f
7780 Commits
-
[codex] Ignore local curated plugins when remote catalog is active (#29765)
## Summary - suppress configured `openai-curated` plugins when the remote plugin feature is enabled and auth uses the Codex backend - preserve `openai-api-curated` and non-Codex-backend behavior while including remote catalog activation in the plugin load cache key - add core plugin coverage and an app-server integration test for runtime feature enablement ## Why The Codex app enables remote plugins through process-local runtime feature enablement, which can happen after app-server startup tasks have already observed legacy local plugin state. The existing conflict logic only preferred a remote plugin when the same plugin was already installed remotely, so a configured legacy-only plugin could continue exposing skills and other capabilities from `openai-curated`. ## Impact When the remote catalog is active, legacy `openai-curated` plugins no longer contribute skills, MCP servers, apps, or hooks. Remote installed plugins continue to load normally, and `openai-api-curated` remains unaffected. This does not change remote fetch, bundle sync, or uninstall behavior. ## Validation - `just test -p codex-core-plugins remote_global_catalog_ignores_local_curated_plugins remote_plugin_feature_keeps_local_curated_without_codex_backend` - `just test -p codex-app-server runtime_remote_plugin_enablement_excludes_local_curated_plugin_skills` - `just fmt` - `git diff --check`
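The suppression rule in the summary can be sketched as a small filter. This is only an illustration: the `Plugin` type, `remote_catalog_active`, and `loadable_plugins` are hypothetical names, and the real logic lives in the Rust `codex-core-plugins` crate and may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plugin:
    name: str
    marketplace: str  # e.g. "openai-curated", "openai-api-curated", "remote"


def remote_catalog_active(remote_feature_enabled: bool, uses_codex_backend: bool) -> bool:
    # Assumption: the remote catalog counts as active only when both hold.
    return remote_feature_enabled and uses_codex_backend


def loadable_plugins(plugins, remote_feature_enabled, uses_codex_backend):
    # Drop legacy `openai-curated` plugins while the remote catalog is active;
    # every other marketplace is left untouched.
    active = remote_catalog_active(remote_feature_enabled, uses_codex_backend)
    return [p for p in plugins if not (active and p.marketplace == "openai-curated")]
```

Because the result depends on `active`, a cache of loaded plugins would need that flag in its key, which matches the cache-key change listed in the summary.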
xl-openai ·
2026-06-23 19:51:31 -07:00 -
[plugins] Add marketplace source requirements (#29690)
## Why Managed deployments need a mergeable way to declare which marketplace sources Codex may use. An enterprise-keyed TOML table avoids array merge ambiguity and lets every requirements layer use the existing config precedence rules without a marketplace-specific merger. ## Requirements shape ```toml [marketplaces] restrict_to_allowed_sources = true [marketplaces.allowed_sources.company_plugins] source = "git" url = "https://github.com/example/company-plugins.git" ref = "main" [marketplaces.allowed_sources.internal_git] source = "host_pattern" host_pattern = "^git\\.example\\.com$" [marketplaces.allowed_sources.local_plugins] source = "local" path = "/opt/company/codex-plugins" ``` `restrict_to_allowed_sources` follows normal scalar precedence. `allowed_sources` follows normal recursive TOML table merge behavior: distinct keys accumulate and fields under the same key use normal layer precedence. The final `source` value later selects which fields the marketplace admission policy interprets. The raw rule fields remain optional while requirements layers are composed, so a higher-priority layer can override only `ref`, `url`, or another individual field. Source-specific validation and normalization intentionally belong to the marketplace admission layer, not requirements merging. This initial shape includes `git`, `host_pattern`, and `local` sources. It does not add npm or path-pattern rules. ## What changed - Add the marketplace requirements TOML shape to `ConfigRequirementsToml`, `ConfigRequirementsWithSources`, and `ConfigRequirements`. - Carry marketplace requirements through the existing regular requirements merge path. - Keep allowed-source entries as raw partial tables for downstream policy interpretation. - Cover partial same-key overlays, source changes, unknown fields, and unmodified local paths. This PR defines and composes the requirements only. Source admission is implemented by the next PR in the stack. ## Stack This is PR 1 of 3. 
#29753 adds source admission on top of this PR; draft #29691 will add runtime enforcement after it is rebased later. ## Test plan - `just test -p codex-config marketplace_`
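The merge behavior described above (distinct keys accumulate, fields under the same key follow layer precedence) can be sketched with plain dictionaries standing in for parsed TOML. `merge_tables` is a hypothetical helper, and the `"release"` override value is invented for illustration; the actual merge is the existing requirements merge path in `codex-config`.

```python
def merge_tables(low: dict, high: dict) -> dict:
    """Recursively merge two parsed TOML tables; `high` wins on scalar conflicts."""
    merged = dict(low)
    for key, value in high.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_tables(merged[key], value)
        else:
            merged[key] = value
    return merged


base_layer = {
    "marketplaces": {
        "restrict_to_allowed_sources": False,
        "allowed_sources": {
            "company_plugins": {
                "source": "git",
                "url": "https://github.com/example/company-plugins.git",
                "ref": "main",
            }
        },
    }
}
# A higher-priority layer overrides only `ref` and adds a new source key.
higher_layer = {
    "marketplaces": {
        "restrict_to_allowed_sources": True,
        "allowed_sources": {
            "company_plugins": {"ref": "release"},
            "local_plugins": {"source": "local", "path": "/opt/company/codex-plugins"},
        },
    }
}
merged = merge_tables(base_layer, higher_layer)
```

Keying sources by name is what makes the partial override of `ref` possible; an array of sources would offer no stable identity to merge on.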
xl-openai ·
2026-06-23 19:42:13 -07:00 -
[codex] Update bundled skill installer guidance (#29768)
## Summary - Update the bundled skill installer's post-install guidance to say the skill will be available on the user's next turn. - Remove the obsolete instruction to restart Codex. ## Why Codex refreshes its skill catalog between turns. The existing bundled instruction predates that behavior and causes the model to recommend an unnecessary restart. ## Impact Released Codex builds will materialize accurate post-install guidance for the bundled system skill. ## Related - Canonical skill change: https://github.com/openai/skills/pull/507 ## Validation - `just fmt` - `git diff --check` - `just test -p codex-app-server skills_changed_notification_is_emitted_after_skill_change` (passed during investigation) No test code was added because the existing live-refresh path and focused integration test already verify that skill changes are picked up without restarting.
sayan-oai ·
2026-06-23 19:36:17 -07:00 -
[codex] Reuse compacted history replacement for new context windows (#29762)
## Why `start_new_context_window` independently replaced in-memory history and persisted a compacted checkpoint instead of using the shared compacted-history path. That bypassed the centralized missing-item-ID assignment when `item_ids` is enabled, so fresh context messages could enter the new context window and its persisted replacement history without IDs. This follows up on the token-budget compaction reset flow introduced in [#29743](https://github.com/openai/codex/pull/29743). ## What changed - Delegate new context-window installation to `replace_compacted_history`. - Reuse its ID assignment, in-memory replacement, world-state baseline, checkpoint persistence, turn-context persistence, and session-start bookkeeping. - Add focused coverage that verifies generated IDs are present in live history and preserved in the persisted replacement history. ## Testing - `just test -p codex-core start_new_context_window_assigns_and_persists_item_ids` - `just test -p codex-core new_context_tool_starts_new_window_before_follow_up`
pakrym-oai ·
2026-06-23 18:53:35 -07:00 -
Let image generation extension hosts control output persistence (#29711)
## Why Some extension hosts need generated images returned without writing them to the local filesystem or giving the model a local path. ## What changed **tl;dr**: we now conduct all extension operations in the image gen extension - Let hosts provide an optional image save root when installing the extension. - Save images and return path hints only when a save root is configured. - Return image data without saving or adding a path hint when no save root is configured. - Preserve the extension-provided `saved_path` instead of persisting extension images again in core. - Leave built-in image generation unchanged. ## Validation - `just test -p codex-image-generation-extension` - `just test -p codex-app-server standalone_image_generation_returns_saved_path_hint_to_model` - `just test -p codex-core extension_tool_uses_granted_turn_permissions_without_local_persistence` - `just test -p codex-core tools::handlers::extension_tools::tests` - tested on CODEX CLI on both save_root: CODEX_HOME and None - tested on CODEX APP on both as well
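A rough sketch of the save-root behavior, under the assumption that the result is a simple mapping. `handle_generated_image` and the `image_base64` key are hypothetical; only `saved_path` and the optional save root come from the description above.

```python
import base64
from pathlib import Path
from typing import Optional


def handle_generated_image(image_bytes: bytes, file_name: str, save_root: Optional[str] = None) -> dict:
    # The image data is always returned to the caller.
    result = {"image_base64": base64.b64encode(image_bytes).decode("ascii")}
    if save_root is not None:
        # Persist and add a path hint only when the host configured a save root.
        path = Path(save_root) / file_name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(image_bytes)
        result["saved_path"] = str(path)
    return result
```

With `save_root=None` nothing touches the filesystem and no path hint is produced, which is the case the PR adds for hosts that must not expose local paths.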
Won Park ·
2026-06-23 18:51:49 -07:00 -
test: add app-server auto environment helper (#29746)
## Why Start moving towards app-server tests defaulting to running against remote & foreign OS executors. To do so we need a point of indirection similar to core integration tests' `build_with_auto_env`, but with the flexibility of letting tests control environment registration if they need to. ## What This adds: - `TestAppServer::new_with_auto_env()` for constructing an app server with a default environment defined by the test runner (e.g. bazel) - `TestAppServer::auto_env_params()` for tests to easily acquire turn env params tailored to the automatic environment - `TestAppServer::send_thread_start_request_with_auto_env()` to make it easy for tests to start a thread using the automatic environment The above methods all fail if the test calling them has set up an environment where the automatic environment configuration conflicts with test-created state. ## Validation Adds a couple of basic smoke tests to the app-server test suite. Follow-ups will migrate more tests to use it.
Adam Perry @ OpenAI ·
2026-06-24 01:06:29 +00:00 -
chore: assign `amsg_` IDs to agent messages (#29750)
## Why The `ItemIds` path fills in missing IDs before response items are persisted and emitted as raw item events. `ResponseItem::AgentMessage` is part of that same response-item stream, but it was skipped by the missing-ID repair path, leaving agent messages without stable item IDs while messages and tool items received generated IDs. Agent messages recorded through `InterAgentCommunication` also need the generated ID to survive rollout persistence and resume. Otherwise clients can observe an `amsg_` ID for the live raw response item, then see that same persisted agent message lose its item ID after restart. ## What changed - Assign missing `ResponseItem::AgentMessage` IDs with the `amsg_` prefix. - Persist the generated item ID on `InterAgentCommunication` and replay it back into the reconstructed `ResponseItem::AgentMessage` on resume. - Keep the persisted ID out of the model-visible inter-agent message envelope. - Keep `CompactionTrigger` and `Other` skipped because they do not get generated item IDs. - Update session/protocol tests for agent-message ID assignment and resume preservation. ## Manual Testing Run the local dev build using `just c --enable item_ids` to ensure this code is exercised: https://github.com/openai/codex/blob/322e33512b2d38d38d705e2ef692a8aca50decac/codex-rs/core/src/session/mod.rs#L2713-L2715 In the `.jsonl` file, I saw entries like: ```json { "timestamp": "2026-06-24T00:44:03.098Z", "type": "inter_agent_communication", "payload": { "id": "amsg_019ef715-849a-7a50-becc-ce63c6a9c994", ``` ## Test plan - `just test -p codex-core record_inter_agent_communication_preserves_item_id_in_rollout_and_resume` - `just test -p codex-core record_inter_agent_communication_sets_turn_id_in_rollout_and_resume` - `just test -p codex-protocol inter_agent_communication_response_input_item_preserves_commentary_phase`
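The missing-ID repair can be pictured as a pass over the response items. The sketch below is an assumption-heavy illustration: the dictionary shape, the snake_case `type` values, and the use of `uuid4` for the suffix are all hypothetical; only the `amsg_` prefix and the skipped `CompactionTrigger` and `Other` variants come from the description.

```python
import uuid

# Hypothetical type tags for the variants that never receive generated IDs.
SKIPPED_TYPES = {"compaction_trigger", "other"}


def assign_missing_ids(items: list) -> list:
    for item in items:
        if item.get("type") in SKIPPED_TYPES or item.get("id"):
            continue  # skipped variants stay as-is; existing IDs are preserved
        if item.get("type") == "agent_message":
            item["id"] = f"amsg_{uuid.uuid4()}"
    return items
```

Persisting the generated `id` alongside the message, rather than regenerating it on resume, is what keeps the value stable across restarts.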
Michael Bolin ·
2026-06-23 17:57:03 -07:00 -
[codex] trace MCP startup latency (#28630)
## Summary - add trace-level instrumentation around per-server MCP setup, client construction, initialization, and initial tool listing - trace Codex Apps tool and server-info cache loads - attach `server_name` to server-scoped spans so slow startup work can be attributed to a specific MCP server ## Why `session_init.mcp_manager_init` can occasionally be slow, but its existing coarse span does not identify whether time is spent loading the Codex Apps cache, constructing a client, initializing a transport, or listing tools. These definition-level spans provide that breakdown without changing startup behavior. ## Validation - `just test -p codex-mcp` (87 passed) - `just test -p codex-rmcp-client` (86 passed, 2 skipped)
rphilizaire-openai ·
2026-06-23 17:46:54 -07:00 -
core: add wait_for_environment for starting environments (#29745)
## Why With `DeferredExecutor`, a sampling request can begin while an environment is still starting. The model can see that pending state, but needs a way to wait for the environment within the same turn before continuing. Environment startup is owned by Core, so the wait tool should use the same request-frozen `StepContext` that advertised the starting environment. This keeps tool registration and execution tied to the exact startup operation the model saw, even if live thread state later changes. Supersedes #29735. ## What - register `wait_for_environment` when the current `StepContext` contains starting environments - wait on the selected `StartingTurnEnvironment` shared resolution and return a bounded ready or failed result - rebuild the next request normally, removing the wait tool and exposing ready environment tools, or reporting the environment as unavailable after failure ## Testing - `just test -p codex-core deferred_executor_` - verifies the wait tool is replaced by environment-backed tools after startup - verifies startup failure removes both the wait tool and unavailable environment tools while notifying the model
sayan-oai ·
2026-06-24 00:35:34 +00:00 -
Support thread-level originator overrides (#29477)
## Why Work(TPP) threads can be launched from the Desktop app, but if they all keep the Desktop app's default originator then downstream attribution cannot distinguish local Work launches from cloud-backed Work launches. `thread/start.serviceName` already carries that launch signal, while `SessionMeta.originator` is the durable thread-level value that survives resume and fork. This change converts the Desktop Work service names into an effective originator at thread creation time, persists that originator with the thread, and keeps using it for later model requests and memory writes. ## What changed - Map `CODEX_WORK_LOCAL` and `CODEX_WORK_CLOUD` service names to per-thread originators, while preserving `CODEX_INTERNAL_ORIGINATOR_OVERRIDE` as the highest-precedence override. - Persist the effective originator in `SessionMeta.originator`, read it back on resume/fork, and inherit the parent originator for subagent spawns when there is no persisted session metadata. - Handle truncated `SpawnAgentForkMode::LastNTurns` forks by falling back to the live parent originator when the forked history no longer includes `SessionMeta`. - Thread the per-thread originator through Responses headers, websocket/compaction request paths, thread-store creation, rollout metadata, and memory stage-one telemetry. ## Verification - `just test -p codex-core agent::control::tests::spawn_thread_subagent_inherits_parent_originator_without_fork agent::control::tests::spawn_thread_subagent_fork_last_n_turns_inherits_parent_originator_without_session_meta thread_manager::tests::originator_override_precedes_service_name_remapping` - `just test -p codex-core agent::control::tests::resume_thread_subagent_restores_stored_metadata_and_effective_multi_agent_mode` - `just test -p codex-memories-write` - `just fix -p codex-core -p codex-memories-write` - `git diff --check`
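The precedence order described above can be sketched as a lookup. The originator strings and the default are placeholders invented for illustration, and treating `CODEX_INTERNAL_ORIGINATOR_OVERRIDE` as a key in an environment-style mapping is an assumption; only the two service names and the override's top precedence come from the description.

```python
from typing import Mapping, Optional

# Hypothetical originator values; the real strings are not given in the description.
SERVICE_NAME_ORIGINATORS = {
    "CODEX_WORK_LOCAL": "example_work_local",
    "CODEX_WORK_CLOUD": "example_work_cloud",
}


def effective_originator(
    service_name: Optional[str],
    env: Mapping[str, str],
    default_originator: str = "example_desktop",
) -> str:
    # 1. The internal override always wins.
    override = env.get("CODEX_INTERNAL_ORIGINATOR_OVERRIDE")
    if override:
        return override
    # 2. Otherwise remap known Work service names; 3. fall back to the default.
    return SERVICE_NAME_ORIGINATORS.get(service_name, default_originator)
```

The value computed here at thread creation is what would then be stored in `SessionMeta.originator` and read back on resume or fork.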
alexsong-oai ·
2026-06-23 17:23:38 -07:00 -
core: reset context for token budget compaction (#29743)
## Why When `Feature::TokenBudget` is enabled, compaction should behave like `new_context`: start a fresh context window with the standard injected context, without asking the server to summarize old history and without carrying prior user or assistant messages into the next model request. This is still a compaction operation from the client lifecycle perspective. Manual `/compact` and auto-compaction should keep the same observable side effects that clients and hooks expect, including compact hooks and `TurnItem::ContextCompaction`. ## What changed - Added `compact_token_budget` to run token-budget manual and inline auto-compaction through a shared compaction lifecycle. - Split pending `new_context` requests from forced context-window startup: `take_new_context_window_request()` consumes pending requests, and `start_new_context_window()` installs a fresh context window. - Routed token-budget manual `/compact` and inline auto-compaction to install a fresh context window locally instead of calling server/local summarization. - Preserved compact lifecycle side effects for token-budget compaction by running pre/post compact hooks and emitting `ContextCompaction` item start/completion events. - Updated token-budget tests to assert fresh window IDs, absence of server-side compaction calls, dropped prior transcript messages/tool output after reset, and compact hook/item lifecycle behavior. ## Testing - `just test -p codex-core token_budget_context_uses_new_window_after_compaction` - `just test -p codex-core token_budget_compaction_runs_compact_hooks` - `just test -p codex-core token_budget_mid_turn_auto_compaction_resets_before_active_follow_up` --------- Co-authored-by: pakrym-oai <pakrym@openai.com>
Michael Bolin ·
2026-06-23 16:59:04 -07:00 -
Andrey Mishchenko ·
2026-06-23 16:52:40 -07:00 -
[codex] rename rollout budget error to session budget error (#29744)
## Summary - rename the rollout-budget exhaustion error from `RolloutBudgetExceeded` to `SessionBudgetExceeded` - expose the matching app-server v2 wire value as `sessionBudgetExceeded` - regenerate JSON/TypeScript schema fixtures and update the app-server docs and focused tests This is a naming-only follow-up to #29715 based on [Pavel's review suggestion](https://github.com/openai/codex/pull/29715#discussion_r3463183480). Runtime behavior is unchanged. ## Tests - `just test -p codex-core rollout_budget` - `just test -p codex-app-server-protocol` - `just fmt` - `just write-app-server-schema`
rka-oai ·
2026-06-23 16:49:13 -07:00 -
fix: scope context remaining to body window (#29665)
## Why With `model_auto_compact_token_limit_scope = "body_after_prefix"`, the persistent prefix should not count against the active body window. `get_context_remaining` and the token-budget reminder should report the same usable body-after-prefix window that auto-compaction uses, rather than the total token count since the session began. This is stacked on #29664 so the mechanical move from `turn.rs` is isolated from the behavior fix. ## What - Extends `ContextWindowTokenStatus` with `context_remaining_tokens`. - Updates `get_context_remaining` to use the shared context-window accounting. - Adds integration coverage for body-after-prefix reminder timing and `get_context_remaining` output. ## Testing - `just test -p codex-core body_after_prefix_window` - `just test -p codex-core auto_compact_body_after_prefix` - `just fix -p codex-core`
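The accounting change amounts to excluding the persistent prefix before comparing against the limit. A minimal sketch, assuming hypothetical parameter names and treating any scope other than `"body_after_prefix"` as counting the full total:

```python
def context_remaining_tokens(limit: int, total_tokens: int, prefix_tokens: int, scope: str) -> int:
    if scope == "body_after_prefix":
        # The persistent prefix does not count against the active body window.
        used = max(total_tokens - prefix_tokens, 0)
    else:
        used = total_tokens
    return max(limit - used, 0)
```

For example, with a limit of 100,000 tokens, 60,000 tokens in total, and a 20,000-token prefix, the body-after-prefix scope reports 60,000 remaining, whereas counting the prefix would report 40,000.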
Michael Bolin ·
2026-06-23 23:08:54 +00:00 -
refactor: extract context window token status (#29664)
## Why This PR keeps the mechanical helper extraction separate from the behavior change in #29665. The follow-up needs the token-window accounting from `turn.rs` in another call path, but reviewing that is much easier when the helper extraction is separate from the semantic change. ## What - Adds `session/context_window.rs` with `ContextWindowTokenStatus`. - Moves the existing auto-compaction token-status calculation out of `session/turn.rs`. - Replaces the duplicated inline remaining-token calculation in `turn.rs` with `tokens_until_compaction()`. This PR is intended to be behavior-preserving. The `get_context_remaining` behavior change is stacked separately in #29665. ## Testing - `just test -p codex-core auto_compact_body_after_prefix` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/29664). * #29665 * __->__ #29664
Michael Bolin ·
2026-06-23 15:49:30 -07:00 -
protocol: separate app and exec RPC ownership (#29714)
## Why The app-server and exec-server expose separate JSON-RPC APIs, but exec-server currently sources its serialized protocol and envelope types through app-server-oriented code. Giving each API an explicit owner makes the crate boundary legible without introducing shared generic envelopes. ## What changed - Added `codex-exec-server-protocol` to own exec DTOs, process IDs, and JSON-RPC envelopes. - Updated exec-server clients, transports, handlers, and tests to use the new crate. - Exposed app-server's existing JSON-RPC types through a public `rpc` module while retaining root re-exports. - Preserved existing wire shapes, including exec `PathUri` behavior. ## Stack This is PR 1 of 6. Next: [PR #29721](https://github.com/openai/codex/pull/29721), which moves auth mode below the app wire boundary. ## Validation - Exec-server protocol and server coverage passed in the focused protocol test runs. - App-server protocol schema fixtures passed.
Adam Perry @ OpenAI ·
2026-06-23 22:37:31 +00:00 -
Load executor skills without host path conversion (#29626)
## Why After #28918, selected skill roots are `PathUri`, but the executor skill provider still converts them to the app-server host's `AbsolutePathBuf`. A foreign Windows root therefore cannot be discovered by a Unix host, and the inverse has the same problem. This PR keeps executor skill discovery and reads on the filesystem that owns the selected root while reusing the existing skill rules. ## What changed - Generalize the existing skill traversal to operate on `PathUri` through `ExecutorFileSystem`, preserving its depth, directory, symlink, and sibling-metadata concurrency behavior. - Add a small environment skill loader that reuses the shared discovery, frontmatter validation, dependency parsing, product policy, and prompt-visibility rules. - Keep the environment id and entrypoint `PathUri` in the skill catalog, then route `skills.read` back through the same environment filesystem. - Preserve the executor's path convention when deriving catalog handles, including literal backslashes in POSIX filenames. - Resolve plugin namespaces from nearby manifests through URI-native filesystem reads. - Cover foreign Windows roots, executor-owned reads, namespaces, metadata, policy, and path identity. ```text selected root (PathUri) | v shared discovery over ExecutorFileSystem | v environment-bound catalog entry --skills.read--> same ExecutorFileSystem ``` No second filesystem abstraction or duplicate traversal implementation is introduced. ## Stack 1. #29614 — add lexical `PathUri` containment. 2. #29620 — share URI-native manifest path resolution. 3. #28918 — keep selected plugin roots and resources URI-native. 4. **This PR** — load executor skills without host path conversion. 5. #29628 — resolve executor MCP working directories without host path conversion.
jif ·
2026-06-23 23:26:06 +01:00 -
code-mode: Remove Session::is_alive() (#29732)
Remove this unused API. It is insidious because it implies that the caller can determine whether a session is alive, and that a preflight check should drive routing. Let's drop it and handle errors from a failed session correctly in the future.
Channing Conger ·
2026-06-23 15:14:13 -07:00 -
[codex] surface rollout budget exhaustion (#29715)
## Summary - surface shared rollout-budget exhaustion as `CodexErr::RolloutBudgetExceeded` instead of a generic interrupted turn - map it through the existing `CodexErrorInfo` and app-server v2 `codexErrorInfo` path - keep local compaction from retrying after the shared rollout budget is exhausted This gives app-server clients a stable `rolloutBudgetExceeded` error they can classify without guessing from `status="interrupted"`. ## Tests - `just test -p codex-core rollout_budget`
rka-oai ·
2026-06-23 15:01:28 -07:00 -
[codex] define code mode host handshake protocol (#29515)
## Summary - add validated protocol-version, capability, and session identifier types - define explicit `ClientToHost` and `HostToClient` JSON envelopes for connection negotiation and session open/close acknowledgements - reject invalid states and unknown fields during decoding, with explicit wire-format and round-trip coverage ## Why This establishes the transport-neutral encoding shape needed to build and test the new code-mode host incrementally. Cell, tool callback, and failure-domain messages are intentionally deferred until their actors and behavior tests establish the required semantics. This is additive protocol scaffolding and does not change the current production code-mode implementation. ## Validation
Channing Conger ·
2026-06-23 14:57:44 -07:00 -
Make selected plugin roots URI-native (#28918)
## Why Selected capability roots belong to the executor filesystem, not the app-server host. Converting their path strings into the host's native `Path` breaks whenever the two machines use different path conventions, such as a Windows executor behind a Unix app-server. This PR establishes `PathUri` as the selected-plugin boundary so the executor remains authoritative for its paths. ## What changed - Require `selectedCapabilityRoots[].location.path` to be a canonical `file:` URI and deserialize it directly as `PathUri`; native path strings are rejected. - Update the app-server schema, generated TypeScript, examples, and request coverage for the URI contract. - Keep selected roots, resolved plugin locations, manifest paths, and manifest resources as `PathUri`. - Inspect and read plugin roots and manifests only through the selected environment's `ExecutorFileSystem`. - Parse executor manifests with the shared URI-native parser from #29620 instead of projecting them onto the host filesystem. - Enforce resource containment lexically and preserve the root URI's POSIX or Windows path convention. - Cover foreign Windows plugin roots and URI-native manifest resources. ```text thread/start selectedCapabilityRoots[].location.path = "file:///C:/plugins/demo" | PathUri v ExecutorFileSystem | +--> plugin.json +--> manifest resources ``` This PR stops at the shared selected-plugin representation. The next two PRs remove the remaining host-path projections in the skill and MCP consumers. ## Stack 1. #29614 — add lexical `PathUri` containment. 2. #29620 — share URI-native manifest path resolution. 3. **This PR** — keep selected plugin roots and resources URI-native. 4. #29626 — load executor skills without host path conversion. 5. #29628 — resolve executor MCP working directories without host path conversion.
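Lexical containment that respects the root URI's path convention can be sketched with Python's pure path classes, which never touch the filesystem. `split_file_uri` and `resolve_within` are hypothetical helpers, the drive-letter check is a simplification, and the sketch returns a pure path rather than a URI; the Rust `PathUri` type will differ in detail.

```python
from pathlib import PurePosixPath, PureWindowsPath
from urllib.parse import unquote, urlsplit


def split_file_uri(uri: str):
    parts = urlsplit(uri)
    if parts.scheme != "file":
        raise ValueError("expected a file: URI")
    path = unquote(parts.path)
    # "/C:/plugins/demo" carries a drive letter, so use the Windows convention.
    if len(path) >= 3 and path[0] == "/" and path[1].isalpha() and path[2] == ":":
        return PureWindowsPath(path[1:])
    return PurePosixPath(path)


def resolve_within(root_uri: str, relative: str):
    root = split_file_uri(root_uri)
    rel = type(root)(relative)  # parse with the root's own convention
    if rel.anchor:
        raise ValueError("absolute paths are not allowed")
    kept = []
    for part in rel.parts:
        if part == "..":
            if not kept:
                raise ValueError("path escapes the plugin root")
            kept.pop()
        else:
            kept.append(part)
    return root.joinpath(*kept)
```

Under a Windows root, `skills\a.md` splits into two components; under a POSIX root the same string is a single filename containing a literal backslash.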
jif ·
2026-06-23 22:51:19 +01:00 -
core: persist initial context window metadata (#29519)
## Why PR #29494 made context-window IDs visible to the model by wrapping the token-budget window payload in `<context_window>`, but rollout JSONL consumers still could not see the initial window identity by tailing the session file. Compacted rollout items carry window IDs only after compaction has happened, so a session with no compaction had no durable JSONL record for window 0. This change gives tailing consumers a stable initial-window record at session creation time. ## What Changed - Added `session_meta.context_window.window_id` for the initial context-window identity. - `CreateThreadParams` now requires `initial_window_id: String`, so thread-store callers cannot accidentally create new threads without window-0 metadata. - Live thread creation derives the persisted initial window ID from the same `AutoCompactWindowIds` used to initialize `SessionState`, keeping runtime state and JSONL metadata aligned. - Rollout reconstruction uses `session_meta.context_window.window_id` as the initial-window fallback and derives `window_number = 0`, `first_window_id = window_id`, and `previous_window_id = None` internally. - Fork reconstruction intentionally uses the same rollout reconstruction path; consumers that need to distinguish copied initial-window metadata can use the rollout `thread_id`. - Legacy compactions without `window_number` still use compaction-count fallback accounting instead of being reset to window 0 by the initial-window fallback. - Compacted rollout metadata still takes precedence once compaction records exist, preserving the richer chain fields there. ## JSONL Shape Real rollout JSONL is one object per line. 
This example is expanded for readability, but shows the new initial `session_meta.context_window` record followed by the existing compacted rollout item shape that also carries window IDs: ```jsonl { "timestamp": "2026-06-22T12:00:00.000Z", "type": "session_meta", "payload": { "session_id": "<THREAD_ID>", "id": "<THREAD_ID>", "timestamp": "2026-06-22T12:00:00.000Z", "cwd": "/repo", "originator": "codex", "cli_version": "0.0.0", "source": "cli", "model_provider": "<MODEL_PROVIDER>", "context_window": { "window_id": "<INITIAL_WINDOW_ID>" } } } ... { "timestamp": "2026-06-22T12:34:56.000Z", "type": "compacted", "payload": { "message": "<COMPACTION_SUMMARY>", "replacement_history": [ "..." ], "window_number": 1, "first_window_id": "<INITIAL_WINDOW_ID>", "previous_window_id": "<INITIAL_WINDOW_ID>", "window_id": "<NEXT_WINDOW_ID>" } } ``` The nested `context_window` object is intentional: it gives rollout consumers a stable namespace for context-window metadata while only writing the non-derivable initial `window_id`. For the initial window, `window_number`, `first_window_id`, and `previous_window_id` are derived internally instead of being written to the rollout. ## Verification - `just test -p codex-protocol` - `just test -p codex-rollout recorder_materializes_on_flush_with_pending_items` - `just test -p codex-core reconstruct_history` - `just test -p codex-core record_initial_history_reconstructs_forked_transcript` - `just test -p codex-thread-store` - `just test -p codex-state` - `just test -p codex-app-server thread_read_returns_summary_without_turns` - `just test -p codex-rollout persistence_metrics`
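A tailing consumer could recover the window chain from these records roughly as follows. This is a sketch against the JSONL shape shown above only; `window_ids` is a hypothetical helper, and real rollout files contain many other record types that it simply ignores.

```python
import json


def window_ids(lines):
    """Return (window_number, window_id) pairs in the order they appear."""
    found = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        payload = record.get("payload", {})
        if record.get("type") == "session_meta":
            context_window = payload.get("context_window") or {}
            if "window_id" in context_window:
                # The initial window is always window 0; its number is derived.
                found.append((0, context_window["window_id"]))
        elif record.get("type") == "compacted" and "window_id" in payload:
            found.append((payload.get("window_number"), payload["window_id"]))
    return found
```

A session that never compacts now still yields one pair, `(0, <initial id>)`, which is the gap this PR closes.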
Michael Bolin ·
2026-06-23 21:50:50 +00:00 -
path-uri: remove legacy path deserialization (#29158)
## Why I'd originally added `PathUri` legacy path deserialization thinking we'd want it for having `PathUri` in public app-server APIs. Since then we've added `LegacyAppPathString` to handle the messy conversions that we need for backcompat. It's confusing for `PathUri` to support deserializing legacy paths when we don't yet want to actually expose app-server callers or rollout storage to the new URI format. Stacked on top of #29472 to avoid breaking compatibility in case those types ended up stored somewhere for someone. ## What changed - Parse deserialized `PathUri` values exclusively as valid `file:` URIs. - Replace legacy acceptance coverage with rejection coverage for top-level filesystem paths and sandbox working directories. - Serialize CWDs in hand-built exec-server process requests as `PathUri` values.
Adam Perry @ OpenAI ·
2026-06-23 21:47:00 +00:00 -
core tests: rename automatic environment builder (#29728)
## Why Use a clearer name for what happens when this helper sets up a test environment. ## What - Rename the builder and its harness wrapper to use `auto_env` instead of `remote_env` because the helper will set up a local environment if configured by the build system.
Adam Perry @ OpenAI ·
2026-06-23 21:45:06 +00:00 -
test: branch on target OS instead of runner flavor (#29712)
## Why Core tests should branch on the executor's operating system, not on runner details such as Docker or Wine. This keeps platform behavior stable as new test backends are added and reserves Wine-specific skips for actual runner debt. ## What - Add `TestTargetOs` and target/host-aware skip helpers while keeping `TestEnvironment` internal. - Replace topology enum access with remote predicates and a narrow Docker accessor. - Migrate OS-semantic Wine skips, preserve runner-specific gaps, and document the skip taxonomy. ## Validation - `just test -p core_test_support` - `just test -p codex-core remote_test_env_can_connect_and_use_filesystem` - `bazel test //codex-rs/core:core-all-wine-exec-test --test_output=errors` reached test execution; unrelated existing view-image, path, and timing failures remain. - `just test -p codex-core` and `just test` reached broad test execution; this checkout has unrelated helper, sandbox, and timing failures.
Adam Perry @ OpenAI ·
2026-06-23 14:27:13 -07:00 -
code-mode: Rename codex_code_mode::CodeModeService (#29716)
Mechanical rename of `CodeModeService` to `InProcessCodeModeSession`. The type already implements `CodeModeSession` as its primary interface to Core. The old name was vestigial _and_ confusing when embedded inside `core::tools::code_mode::CodeModeService`.
Channing Conger ·
2026-06-23 14:17:51 -07:00 -
feat(app-server): thread/turns/items/list -> thread/items/list (#29705)
## Description Rename the experimental app-server item pagination API from `thread/turns/items/list` to `thread/items/list` and make `turnId` optional. Clients can now page persisted items across a thread, or still filter to one turn when needed. ## What changed - Rename the request/response protocol types and JSON-RPC method to `ThreadItemsList*` / `thread/items/list`. - Pass optional `turnId` through to `ThreadStore::list_items`. - Update app-server docs and focused protocol/app-server tests. ## Validation - `just test -p codex-app-server-protocol thread_items_list_round_trips` - `just test -p codex-app-server thread_items_list_returns_unsupported`
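A client-side request for the renamed method might be built like this. The method name and the optional `turnId` come from the description; the `threadId` parameter name and the overall envelope shape are assumptions, and pagination parameters are omitted because they are not described here.

```python
from typing import Optional


def thread_items_list_request(request_id: int, thread_id: str, turn_id: Optional[str] = None) -> dict:
    params = {"threadId": thread_id}  # assumed parameter name
    if turn_id is not None:
        # Only filter to a single turn when the caller asks for it.
        params["turnId"] = turn_id
    return {"id": request_id, "method": "thread/items/list", "params": params}
```

Omitting `turnId` pages items across the whole thread; supplying it restores the per-turn filtering of the old `thread/turns/items/list` method.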
Owen Lin ·
2026-06-23 13:57:08 -07:00 -
[codex] Report the exec-server working directory (#29666)
## Summary - add the exec-server working directory to `environment/info` as an optional `PathUri` - populate it from the executor process's current directory - preserve compatibility with older responses that omit `cwd` ## Why Remote clients currently have no executor-native default working directory. This forces callers such as app-server-backend to assume `/workspace`, which fails for laptop environments. Reporting the cwd alongside the detected shell lets clients use the path convention and location of the actual executor. ## Impact This is backward-compatible: the new response field is optional, and clients can continue handling responses from older exec servers. A follow-up app-server-backend change will consume the value for cwd-less `command/exec` requests. ## Validation - `just test -p codex-exec-server` (275 passed, 2 skipped)
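The backward-compatible handling of an optional `cwd` could look roughly like this. It is a sketch under assumptions: `EnvironmentInfo`, `detect_cwd`, and `effective_cwd` are hypothetical names, and a plain `String` stands in for the `PathUri` type mentioned in the commit.

```rust
/// Hypothetical, simplified view of an `environment/info` response.
/// `cwd` is optional because older exec servers omit it.
#[derive(Debug, Clone, Default)]
pub struct EnvironmentInfo {
    pub shell: Option<String>,
    pub cwd: Option<String>,
}

/// Server side: report the executor process's current directory, if readable.
pub fn detect_cwd() -> Option<String> {
    std::env::current_dir()
        .ok()
        .map(|path| path.to_string_lossy().into_owned())
}

/// Client side: prefer the reported cwd, else fall back to a caller default.
pub fn effective_cwd<'a>(info: &'a EnvironmentInfo, fallback: &'a str) -> &'a str {
    info.cwd.as_deref().unwrap_or(fallback)
}
```

A client that previously assumed `/workspace` can pass that value as `fallback` and keep working against older servers.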
Rasmus Rygaard ·
2026-06-23 13:39:13 -07:00 -
Decouple plugin manifest path resolution (#29620)
## Why Plugin manifests use the same schema whether the package lives on the host or in an executor. Only the path representation differs: host callers need native `Path` inputs and `AbsolutePathBuf` outputs, while executor callers need `PathUri` throughout. Maintaining separate parsing or resolver implementations would duplicate the manifest rules and allow them to drift. This PR instead makes URI-native resolution the single parsing path and keeps host conversion at the boundary. ## What changed - Make `parse_plugin_manifest_uri` the shared manifest parser and resolve every path-bearing field as `PathUri`. - Keep the existing host entrypoint as a thin adapter: convert its native root and manifest path to `PathUri`, run the shared parser, then map resources back to `AbsolutePathBuf`. - Expose `PluginManifest::try_map_resources` so callers can convert the generic resource type without duplicating manifest construction. - Resolve relative manifest paths using the root URI's convention: backslashes are separators for Windows roots and ordinary filename characters for POSIX roots. - Apply lexical containment after URI resolution, rejecting absolute paths and parent traversal outside the plugin root. - Make encoded backslashes fail containment only for Windows URIs; encoded `/` remains unsafe for every convention. - Use a host-native synthetic root for marketplace fallback manifests so the host adapter also works on Windows. ```text host Path --------> PathUri --\ +--> one manifest parser --> PluginManifest<PathUri> executor PathUri -------------/ host result: PluginManifest<PathUri> --> PluginManifest<AbsolutePathBuf> ``` Existing host manifest behavior is preserved; #28918 is the first executor consumer. ## Verification - `just test -p codex-utils-path-uri` - `just test -p codex-plugin` - `just test -p codex-core-plugins` ## Stack 1. #29614 — add lexical `PathUri` containment. 2. **This PR** — share URI-native manifest path resolution. 3. 
#28918 — keep selected plugin roots and resources URI-native. 4. #29626 — load executor skills without host path conversion. 5. #29628 — resolve executor MCP working directories without host path conversion.
jif ·
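The lexical containment rule from the manifest-resolution commit above (#29620) can be sketched as follows. This is an illustration only: `Convention` and `resolve_contained` are hypothetical names, the function works on plain strings rather than `PathUri`, and it does not model percent-encoded separators.

```rust
/// Path convention of the plugin root (hypothetical enum for this sketch).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Convention {
    Posix,
    Windows,
}

/// Lexically resolves `relative` under a plugin root and returns the normalized
/// components, or `None` when the path is absolute or escapes the root.
pub fn resolve_contained(convention: Convention, relative: &str) -> Option<Vec<String>> {
    // Backslash is a separator only for Windows roots; on POSIX it is an
    // ordinary filename character.
    let is_sep = move |c: char| c == '/' || (convention == Convention::Windows && c == '\\');

    // Reject absolute paths: a leading separator, or a drive prefix like `C:`.
    if relative.starts_with(is_sep) {
        return None;
    }
    if convention == Convention::Windows && relative.as_bytes().get(1) == Some(&b':') {
        return None;
    }

    let mut components: Vec<String> = Vec::new();
    for part in relative.split(is_sep) {
        match part {
            "" | "." => {}
            // `..` may cancel an earlier component but never climbs above the root.
            ".." => {
                components.pop()?;
            }
            name => components.push(name.to_string()),
        }
    }
    Some(components)
}
```

Because the check is purely lexical, it never touches the filesystem, which is what allows the same rule to run for host and executor paths alike.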
2026-06-23 20:33:59 +00:00 -
feat(guardian): include connected account email in app reviews (#27045)
## Why Auto review evaluates Codex App tool calls using connector metadata such as the app ID, name, and description. That metadata does not identify the account behind the OAuth connection. For Google Drive, this means auto review cannot distinguish a Drive connection authenticated as `user@email.com` from a personal Drive account. Uploading work data can therefore look like a transfer to a personal destination even though the connector service already knows the authenticated account email. ## What changed - Read `_meta._codex_apps.connected_account_email` while resolving approval metadata for built-in Codex App tools. - Include the connected account email in the structured MCP tool action sent to auto review. - Trim the value and omit the field when it is empty or the connector link has no account email. - Update existing auto review request constructors and add coverage for request construction and JSON serialization. ## Security Only metadata from the trusted built-in `codex_apps` MCP server is accepted. Custom MCP servers cannot inject a connected account email into auto review requests; the new regression test verifies that spoofed metadata is ignored. The email is used only in auto review's private review request. This change does not add it to model-visible tool descriptions, app-server approval events, or auto review assessment/review analytics.
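The trust and trimming rules above might be expressed like this. The function name and its two parameters are hypothetical; only the `codex_apps` server name comes from the commit text.

```rust
/// Name of the trusted built-in MCP server, taken from the commit text.
const TRUSTED_SERVER: &str = "codex_apps";

/// Returns the connected account email to include in a review request.
/// `server_name` and `raw_email` are hypothetical inputs for this sketch.
pub fn connected_account_email(server_name: &str, raw_email: Option<&str>) -> Option<String> {
    // Metadata from any other MCP server is ignored, so it cannot be spoofed.
    if server_name != TRUSTED_SERVER {
        return None;
    }
    // Trim whitespace and omit the field entirely when nothing remains.
    raw_email
        .map(str::trim)
        .filter(|email| !email.is_empty())
        .map(str::to_owned)
}
```

Returning `Option<String>` lets the caller skip serializing the field altogether instead of sending an empty string.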
viyatb-oai ·
2026-06-23 20:33:44 +00:00 -
Add MCP tool call error metrics (#28976)
[Codex Thread 019edc37-5345-7272-92c9-bf5494cf3819](https://codex-thread-link.openai.chatgpt-team.site/thread/019edc37-5345-7272-92c9-bf5494cf3819) ## Summary - count MCP `CallToolResult.isError` responses as failed calls instead of successful transport-level calls - add `codex.mcp.call.error` with bounded `error_type` and trusted plugin-service `error_code` dimensions - record the same error classification on MCP tool-call spans while keeping untrusted server error text out of metric labels ## Scope - no changes to MCP routing, retries, tool behavior, configuration, or public APIs - request failures remain grouped as `mcp_request`; separating connection, timeout, protocol, and JSON-RPC failures requires preserving typed errors through the existing flattened error boundary ## Testing - `just test -p codex-core 'mcp_tool_call::tests::'` (75 passed) - `just fix -p codex-core` - `just fmt` - `just test -p codex-core` (2,676 passed; 80 unrelated environment failures from missing test binaries, sandbox signals, and read-only paths)
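One way to picture the bounded `error_type` dimension is a small classifier. `CallOutcome` is a hypothetical type, and the `tool_result` label is made up for this sketch; only `mcp_request` appears in the commit text.

```rust
/// Hypothetical, simplified outcome of one MCP tool call.
pub enum CallOutcome {
    /// The request itself failed (connection, timeout, protocol, JSON-RPC).
    RequestFailed,
    /// The server answered; `is_error` mirrors `CallToolResult.isError`.
    Completed { is_error: bool },
}

/// Maps an outcome to a bounded `error_type` label, or `None` for success.
/// Returning `&'static str` keeps untrusted server text out of metric labels.
pub fn error_type(outcome: &CallOutcome) -> Option<&'static str> {
    match outcome {
        CallOutcome::RequestFailed => Some("mcp_request"),
        // "tool_result" is an assumed label; the real value is not given above.
        CallOutcome::Completed { is_error: true } => Some("tool_result"),
        CallOutcome::Completed { is_error: false } => None,
    }
}
```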
stevenlee-oai ·
2026-06-23 13:33:23 -07:00 -
core: use current step environments for tools (#29547)
## Why With deferred executors, an environment can become ready between two sampling requests in the same turn. The model-visible environment update, advertised tools, and eventual tool execution must all describe the same request-time view. Otherwise, a request built while only environment B is ready can advertise a tool without an `environment_id`; if higher-priority environment A becomes ready before execution, that call could silently run in A instead. This PR is stacked on #29527. ## Design `run_turn` captures one `Arc<StepContext>` at each sampling-request boundary. That step owns the request's `TurnContext` and environment snapshot. - World-state environment updates and tool planning borrow that same step. - `ToolCallRuntime` retains the `Arc` while asynchronous tool calls execute. - `ToolInvocation` carries the step to handlers; its temporary `turn` compatibility field is derived from the same object. - `ToolRouter` does not retain `StepContext`; it only uses it while constructing the request's tool set. - With `DeferredExecutor` disabled, step capture keeps using the environments frozen at turn start. Simply: every sampling request gets one consistent picture of its environments, from what the model sees through where its tool calls run. ## What changed - Build environment-dependent tool specs from the current request's `StepContext`. - Use that same step for unified exec, legacy shell, `apply_patch`, `view_image`, and `request_permissions` execution. - Hide environment-backed tools, including `request_permissions`, while no environment is attached. - Resolve legacy shell paths and metadata from the selected step environment instead of the stale turn-start environment. - Capture explicit steps at non-turn-loop boundaries such as compaction, prompt debug, and startup prewarm. - Reconcile prompt-debug history from the same step used to build its tools. 
## Follow-up - Bind yielded code-mode cells to the tool runtime that created them, so nested calls made after yielding continue to use the originating request's `StepContext`. ## Test plan - `just test -p codex-core deferred_executor_updates_context_and_tools_after_startup` - `just test -p codex-core environment_count_controls_environment_backed_tools` - `just test -p codex-core build_prompt_input_includes_context_and_user_message`
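The core idea, that a tool call keeps the `Arc` of the step that advertised it, can be sketched with std types only. `StepContext` here is a heavily reduced, hypothetical stand-in, and `PendingToolCall` is an invented name rather than the real `ToolCallRuntime`.

```rust
use std::sync::Arc;

/// Hypothetical, simplified request-time snapshot, highest priority first.
#[derive(Debug)]
pub struct StepContext {
    pub ready_environments: Vec<String>,
}

impl StepContext {
    /// The environment a tool call without an explicit `environment_id` uses.
    pub fn default_environment(&self) -> Option<&str> {
        self.ready_environments.first().map(String::as_str)
    }
}

/// A tool call keeps the `Arc` of the step that advertised it.
pub struct PendingToolCall {
    step: Arc<StepContext>,
}

impl PendingToolCall {
    pub fn new(step: &Arc<StepContext>) -> Self {
        Self { step: Arc::clone(step) }
    }

    /// Resolves against the captured step, not against the live state.
    pub fn target_environment(&self) -> Option<&str> {
        self.step.default_environment()
    }
}
```

Even if a higher-priority environment becomes ready while the call is in flight, the call still resolves to the environment the model was shown.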
sayan-oai ·
2026-06-23 20:21:13 +00:00 -
[codex] Fix stale approval policy in MCP test (#29704)
## Summary - replace the removed `AskForApproval::OnFailure` variant in the MCP shutdown test with `OnRequest` ## Why `OnFailure` was removed from `AskForApproval`, but this test fixture still referenced it, causing Rust and Clippy compilation failures. ## Validation - `just test -p codex-mcp shutdown_continues_after_caller_is_aborted` - `just fmt`
Boyang Niu ·
2026-06-23 13:18:50 -07:00 -
[codex] Fix stale approval policy in MCP test (#29696)
## Summary - replace the stale `AskForApproval::OnFailure` reference in the MCP connection manager test with `AskForApproval::OnRequest` - restore `codex-mcp` test compilation after `OnFailure` was removed in #28418 ## Root cause The test was added on main after the approval-policy removal branch had already updated the other references, so the newly added call site was missed when #28418 merged. ## Validation - `just test -p codex-mcp` (90 passed) - `just fmt`
sayan-oai ·
2026-06-23 19:56:20 +00:00 -
core: resolve view_image paths in selected environment (#29526)
## Why `view_image` needs to work with remote executors that run a different OS than the host. ## What - resolve image paths against the selected environment as `PathUri` and read them through that environment's filesystem - keep app-server's public path field wire-compatible as `LegacyAppPathString`, with purpose-specific UI rendering - cover relative and absolute target-native paths in the core integration test and run the full `view_image` suite under wine-exec without skips
Adam Perry @ OpenAI ·
2026-06-23 19:52:37 +00:00 -
[codex] allow image generation with provider auth (#29513)
## Summary - allow the native Responses API `image_generation` tool when the active provider carries CCA's non-empty `x-openai-actor-authorization` header - preserve the Codex-managed ChatGPT auth path, scoped to providers that actually require OpenAI auth - keep generic custom providers excluded, including when unrelated ChatGPT credentials are cached - retain the existing feature, provider-capability, and image-input-modality gates ## Why CCA authenticates its inference requests through the active provider's `x-openai-actor-authorization` and `ChatGPT-Account-ID` headers, so it does not have a Codex-managed login session. The previous gate therefore hid the native hosted image-generation tool despite an authenticated codex-backend path. This change is intentionally limited to the native hosted tool. It adds no extension, MCP, plugin-service, session-source, token plumbing, or new provider configuration surface. ## Tests - `cargo test -p codex-core hosted_tools_follow_provider_auth_model_and_config_gates` - `cargo fmt --all -- --check` - `git diff --check origin/main`
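The auth portion of the gate described above reduces to a small predicate. The `Provider` struct and function name are hypothetical, and the feature, provider-capability, and image-input-modality gates are deliberately left out of this sketch.

```rust
/// Hypothetical, reduced view of the active model provider.
pub struct Provider<'a> {
    pub requires_openai_auth: bool,
    /// Value of the `x-openai-actor-authorization` header, if configured.
    pub actor_authorization: Option<&'a str>,
}

/// Auth part of the gate only; the other gates are not modeled here.
pub fn auth_allows_image_generation(provider: &Provider<'_>, has_chatgpt_login: bool) -> bool {
    // Provider-carried auth counts only when the header value is non-empty.
    let provider_auth = matches!(provider.actor_authorization, Some(value) if !value.is_empty());
    // Codex-managed ChatGPT auth counts only for providers that require OpenAI auth,
    // so cached credentials do not unlock the tool for generic custom providers.
    let managed_auth = has_chatgpt_login && provider.requires_openai_auth;
    provider_auth || managed_auth
}
```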
richardopenai ·
2026-06-23 12:40:54 -07:00 -
[codex] Preserve proxy state for filesystem sandbox helpers (#29671)
## Why Filesystem helpers intentionally run with a minimal environment that excludes proxy variables. After filesystem operations started using the Windows sandbox wrapper, the wrapper derived an empty proxy configuration from that helper environment and compared it with the persistent sandbox setup marker. When the marker contained proxy ports, every filesystem operation appeared to require a firewall update, which could launch elevated setup, show a UAC or loader dialog, and fail operations such as `apply_patch` with error 1223. Filesystem helpers do not use network access, so they should preserve the proxy/firewall state established by normal sandboxed process launches. ## What changed - Add an explicit Windows sandbox proxy-settings mode for reconciling or preserving persistent proxy state. - Use preserve mode for filesystem helpers while normal process launches continue to reconcile proxy settings from their environment. - Carry the selected proxy state consistently through setup validation, elevated setup, and non-elevated ACL refreshes. - Cover wrapper argument propagation and marker-derived proxy preservation. ## Validation - `cargo build -p codex-cli --bin codex` - `just test -p codex-windows-sandbox preserving_proxy_settings_uses_the_existing_marker` - `just test -p codex-windows-sandbox windows_wrapper_args_round_trip` - `just test -p codex-windows-sandbox setup_request_prefers_explicit_proxy_settings` - `just test -p codex-sandboxing transform_for_direct_spawn_windows` - `just test -p codex-exec-server fs_sandbox::tests` - Ran the same sandboxed `fs/writeFile` reproduction against published `0.142.0-alpha.6` and the new CLI. The published CLI launched elevated setup and failed with `ShellExecuteExW ... 1223`; the new CLI completed without elevation. Related to #28359.
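The difference between reconciling and preserving proxy state can be sketched as below. All names (`ProxyState`, `ProxySettingsMode`, `desired_state`, `needs_firewall_update`) are hypothetical; the real Windows sandbox types are not shown in the commit text.

```rust
/// Hypothetical proxy state: the loopback ports recorded for the sandbox.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct ProxyState {
    pub ports: Vec<u16>,
}

/// How a launch should treat the persistent proxy state.
pub enum ProxySettingsMode {
    /// Normal process launches: derive the state from the launch environment.
    Reconcile(ProxyState),
    /// Filesystem helpers: keep whatever the setup marker already records.
    Preserve,
}

/// Picks the proxy state a launch expects to find in the setup marker.
pub fn desired_state(mode: &ProxySettingsMode, marker: &ProxyState) -> ProxyState {
    match mode {
        ProxySettingsMode::Reconcile(from_env) => from_env.clone(),
        ProxySettingsMode::Preserve => marker.clone(),
    }
}

/// Setup (possibly elevated) is needed only when the marker is out of date.
pub fn needs_firewall_update(mode: &ProxySettingsMode, marker: &ProxyState) -> bool {
    desired_state(mode, marker) != *marker
}
```

With an empty helper environment, `Reconcile` reproduces the spurious update described above, while `Preserve` never asks for one.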
iceweasel-oai ·
2026-06-23 12:29:46 -07:00 -
Separate local and remote plugin analytics IDs (#29495)
## Why Plugin analytics overloaded `plugin_id`: most events used the Codex `<plugin>@<marketplace>` identity, while remote install events used the backend plugin ID. That makes the same field change meaning across event types and complicates downstream identity resolution. This change makes the contract unambiguous: - `plugin_id`: the local Codex `<plugin>@<marketplace>` identity, when resolved - `remote_plugin_id`: the backend plugin identity, when available For a remote install failure that happens before plugin details resolve, `plugin_id` is `null` and `remote_plugin_id` remains populated. ## What changed All six plugin analytics events use the same identity contract: - `codex_plugin_installed` - `codex_plugin_install_failed` - `codex_plugin_uninstalled` - `codex_plugin_enabled` - `codex_plugin_disabled` - `codex_plugin_used` Remote identity is resolved from the current installed-plugin snapshot first, with persisted install metadata as fallback. The telemetry metadata type keeps local identity optional for failures that occur before remote details are available. The app-server test client's manual analytics smokes now find remote mutation events through `remote_plugin_id` and validate that `plugin_id` remains local. ## Remote uninstall Resolve and capture telemetry metadata before removing the local plugin cache, then emit `codex_plugin_uninstalled` after the backend confirms success. The event is also emitted when backend uninstall succeeds but local cache cleanup reports `CacheRemove`. If a concurrent remote-cache refresh removes the local bundle before telemetry capture, the already-fetched remote plugin detail supplies fallback capability metadata. 
## Validation - `just test -p codex-analytics` — 82 passed - `just test -p codex-core-plugins` — 271 passed - `just test -p codex-app-server-test-client` — 5 passed - `just test -p codex-plugin` — 3 passed - `just test -p codex-app-server plugin_install` — 37 passed - `just test -p codex-app-server plugin_uninstall` — 10 passed The production app-server install/uninstall flow was also exercised against `plugins~Plugin_f1b845ac33888191ac156169c58733c2` (`build-ios-apps@openai-curated-remote`), and the plugin's original uninstalled state was restored.
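The identity contract from this commit can be illustrated with two optional fields. The struct and constructor are hypothetical; only the field names `plugin_id` and `remote_plugin_id` and the `<plugin>@<marketplace>` format come from the text.

```rust
/// Identity fields shared by the plugin analytics events in this sketch.
#[derive(Debug, PartialEq)]
pub struct PluginIdentity {
    /// Local `<plugin>@<marketplace>` identity, when resolved.
    pub plugin_id: Option<String>,
    /// Backend plugin identity, when available.
    pub remote_plugin_id: Option<String>,
}

/// Builds the identity; `local` is a hypothetical `(plugin, marketplace)` pair.
pub fn plugin_identity(
    local: Option<(&str, &str)>,
    remote_plugin_id: Option<&str>,
) -> PluginIdentity {
    PluginIdentity {
        plugin_id: local.map(|(plugin, marketplace)| format!("{plugin}@{marketplace}")),
        remote_plugin_id: remote_plugin_id.map(str::to_owned),
    }
}
```

A remote install that fails before plugin details resolve corresponds to `plugin_identity(None, Some(..))`: `plugin_id` stays empty while `remote_plugin_id` is populated.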
jameswt-oai ·
2026-06-23 12:27:14 -07:00 -
Keep managed MITM CA private keys in proxy memory (#29013)
## Why The managed MITM trust bundle must be readable by sandboxed commands. Persisting its sibling CA private key under `$CODEX_HOME/proxy` therefore requires a deny-read sandbox rule, but the Windows unelevated backend rejects deny-read paths and WSL1's legacy Landlock path cannot enforce that rule. A persistent OS credential store also does not provide the same cross-platform boundary from other processes running as the same user. Keeping the signer inside the network proxy process avoids both problems: ordinary sandbox setup stays independent of CA-key state, and no private signing key is exposed through the filesystem or a persistent credential record. ## What - generate one managed CA per proxy process and retain its private signer only in proxy memory - emit only content-addressed public CA certificates and trust bundles under `$CODEX_HOME/proxy` - hold a cross-process lease for each active public certificate and prune artifacts from inactive proxy processes - keep all CA ownership in `codex-network-proxy`; no `codex-core` or sandbox-policy changes - validate generated trust-bundle paths by their content hash - keep the public bundle readable by sandboxed commands on Windows, WSL1, macOS, and Linux The independent startup custom-CA follow-up is #29014. ## Validation - `CODEX_HOME=/private/tmp/codex-test-home-network-proxy just test -p codex-network-proxy` (179 tests) - `just bazel-lock-check` - `just fix -p codex-network-proxy` - `just fmt` --------- Co-authored-by: viyatb-oai <viyatb@openai.com>
Winston Howes ·
2026-06-23 12:20:51 -07:00 -
core: add extra metadata field to Thread struct (#29675)
# Summary Adds a `Thread.extras` field that can hold arbitrary metadata specific to a given thread.
Boyang Niu ·
2026-06-23 19:15:59 +00:00 -
chore(core) rm AskForApproval::OnFailure (#28418)
## Summary Deletes the OnFailure variant of the `AskForApproval` enum. This option has been deprecated since #11631. ## Testing - [x] Tests pass
Dylan Hurd ·
2026-06-23 12:13:54 -07:00 -
Prepare managed network sandbox context (#29456)
## Why Managed network configures commands to use local HTTP and SOCKS proxies. For commands delegated to the exec server, the proxy environment and the sandbox policy were prepared separately. On macOS, that meant a command could receive `HTTPS_PROXY=http://127.0.0.1:43123` while Seatbelt still denied access to port `43123`. ## What changed `NetworkProxy` now prepares the command environment and sandbox context together from the same runtime snapshot: ```text Prepared managed network ├── command environment: HTTPS_PROXY=http://127.0.0.1:43123 └── sandbox context: allow outbound to 127.0.0.1:43123 ``` That context travels with remote exec requests. The exec server preserves the managed proxy and CA environment, and macOS Seatbelt allows only the prepared loopback proxy ports without enabling broad network access or local binding. The protocol field is optional and the existing enforcement flag remains in place, preserving compatibility with callers that do not send the new context.
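Deriving both halves from one snapshot might look like this. The types and the `prepare` function are hypothetical; the real `NetworkProxy` API is not shown above, and only the HTTP proxy is modeled.

```rust
use std::collections::BTreeMap;

/// Hypothetical runtime snapshot of the local proxy listener.
pub struct ProxySnapshot {
    pub http_port: u16,
}

/// Command environment and sandbox allowances derived from one snapshot.
pub struct PreparedManagedNetwork {
    pub env: BTreeMap<String, String>,
    pub allowed_loopback_ports: Vec<u16>,
}

/// Builds both halves together so they cannot disagree about the port.
pub fn prepare(snapshot: &ProxySnapshot) -> PreparedManagedNetwork {
    let mut env = BTreeMap::new();
    env.insert(
        "HTTPS_PROXY".to_string(),
        format!("http://127.0.0.1:{}", snapshot.http_port),
    );
    PreparedManagedNetwork {
        env,
        allowed_loopback_ports: vec![snapshot.http_port],
    }
}
```

Since the port is read once, the sandbox can never deny the port that the environment variable advertises.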
jif ·
2026-06-23 20:07:09 +01:00 -
app-server: document thread and turn IDs are UUID7 (#27714)
It's a very nice property that thread and turn IDs are UUID7s, so document it to make us think twice before moving away from UUID7 in the future.
Owen Lin ·
2026-06-23 11:46:36 -07:00 -
Rasmus Rygaard ·
2026-06-23 18:35:54 +00:00 -
Handle additional tools in rollout persistence metrics (#29669)
## Why The rollout persistence metrics added on current `main` exhaustively match `ResponseItem`, but omit `ResponseItem::AdditionalTools`. That prevents `codex-rollout` and downstream targets from compiling across Cargo and Bazel builds. ## What Map `ResponseItem::AdditionalTools` to the `response.additional_tools` metric label, consistent with the existing exact-variant labels. ## Validation - `just test -p codex-rollout` (76 passed) - `just fix -p codex-rollout`
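The failure mode is the usual one for exhaustive matches. Below is a reduced, hypothetical `ResponseItem` with three variants; only the `response.additional_tools` label comes from the commit text, the other two labels are assumed for illustration.

```rust
/// Reduced, hypothetical version of `ResponseItem` with three variants.
pub enum ResponseItem {
    Message,
    Reasoning,
    AdditionalTools,
}

/// Maps each variant to its metric label.
pub fn metric_label(item: &ResponseItem) -> &'static str {
    // No wildcard arm: adding a variant without a label is a compile error
    // ("non-exhaustive patterns") rather than a silently mislabeled metric.
    match item {
        ResponseItem::Message => "response.message",
        ResponseItem::Reasoning => "response.reasoning",
        ResponseItem::AdditionalTools => "response.additional_tools",
    }
}
```

Omitting the last arm reproduces the kind of build break this commit fixes.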
Winston Howes ·
2026-06-23 11:06:07 -07:00 -
[codex] Handle additional tools in rollout persistence metrics (#29672)
## Summary Handle `ResponseItem::AdditionalTools` in rollout persistence metrics. The persistence metrics match was added after the `AdditionalTools` variant and omitted it, causing release builds to fail with a non-exhaustive pattern error. This assigns the item the `response.additional_tools` metrics label. Release failure: https://github.com/openai/codex/actions/runs/28043786727/job/83016608475 ## Validation - `just fmt` - `just test -p codex-rollout` (76 passed)
rka-oai ·
2026-06-23 18:03:35 +00:00 -
core: use turn-owned world state for inline compaction (#29527)
## Why Follow-up to #29249 and its [compaction review thread](https://github.com/openai/codex/pull/29249#discussion_r3455055101). During a turn, environment readiness can change between sampling requests. Inline compaction must render the same model-visible `WorldState` used by the request it follows. Rebuilding that state during compaction can observe a newer environment, make replacement history disagree with what the model saw, and suppress the next environment update. ## What changed - Make `run_turn` own the current `Arc<WorldState>` and replace it only between sampling requests. - Build each state from an explicitly chosen environment snapshot, diff deferred-executor steps against the turn-owned state, and retain the latest state in `ContextManager` only for cross-turn and resume tracking. - Pass the exact turn-owned state into inline compaction and explicit new-context-window replacement. - Carry that state with `InitialContextInjection::BeforeLastUserMessage`, so replacement context and its stored baseline cannot come from different snapshots. - Remove obsolete state-recapture helpers and ambiguous TurnContext-only WorldState builders. - Add an integration test that moves an environment from starting to ready during a paused turn, triggers compaction, and verifies the next request receives the readiness update exactly once. ## Test plan - `just test -p codex-core deferred_executor_compaction_preserves_then_updates_environment_once` - `just test -p codex-core process_compacted_history` - `just test -p codex-core mid_turn_continuation_compaction` - `just test -p codex-core build_initial_context` - `just test -p codex-core ignores_session_prefix_messages_when_truncating`
sayan-oai ·
2026-06-23 10:33:19 -07:00 -
Shut down superseded MCP managers on refresh (#29608)
## Summary MCP refresh replaced the published connection manager without shutting down the manager it superseded. If another task retained that old manager, its stdio MCP processes stayed alive and accumulated across refreshes. Atomically swap in the refreshed manager, then explicitly shut down the exact manager returned by the swap. Add a process-level regression test that retains the old manager during refresh and verifies its stdio process exits while the replacement remains available. ## Context Explicit cleanup was lost when manager publication moved to `ArcSwap`. Dropping the old manager is not a reliable shutdown boundary because active callers can retain its `Arc` and underlying client process handles.
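The swap-then-shutdown pattern can be sketched with std types. This is not the real code: a `Mutex<Arc<_>>` stands in for `ArcSwap`, and `Manager` and `Published` are hypothetical types with a flag in place of real stdio processes.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical manager; a flag stands in for its stdio MCP processes.
pub struct Manager {
    shut_down: AtomicBool,
}

impl Manager {
    pub fn new() -> Self {
        Self { shut_down: AtomicBool::new(false) }
    }
    pub fn shutdown(&self) {
        self.shut_down.store(true, Ordering::SeqCst);
    }
    pub fn is_shut_down(&self) -> bool {
        self.shut_down.load(Ordering::SeqCst)
    }
}

/// Published manager slot; `Mutex<Arc<_>>` is a std stand-in for `ArcSwap`.
pub struct Published {
    current: Mutex<Arc<Manager>>,
}

impl Published {
    pub fn new(manager: Arc<Manager>) -> Self {
        Self { current: Mutex::new(manager) }
    }
    pub fn current(&self) -> Arc<Manager> {
        let slot = self.current.lock().unwrap();
        Arc::clone(&*slot)
    }
    /// Swaps in the replacement, then shuts down exactly the manager it replaced.
    pub fn refresh(&self, replacement: Arc<Manager>) {
        let superseded = {
            let mut slot = self.current.lock().unwrap();
            std::mem::replace(&mut *slot, replacement)
        };
        // Explicit shutdown: dropping is not enough while other tasks hold the Arc.
        superseded.shutdown();
    }
}
```

The explicit `shutdown` call matters because a retained `Arc` keeps the old manager alive indefinitely, so `Drop` alone would never run.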
jif ·
2026-06-23 18:29:27 +01:00 -
[core] debounce current-time reminders by elapsed time (#29659)
## Summary - rename `reminder_interval_model_requests` to `reminder_interval_seconds` - read the configured time provider before every model request and inject a reminder only after the configured number of seconds has elapsed - preserve immediate first delivery and forced delivery after compaction changes the context window ## Tests - `just test -p codex-core current_time_reminder`
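A time-based debounce with immediate first delivery and a forced reset could be sketched as follows. `ReminderDebounce` and its methods are hypothetical; timestamps are plain seconds supplied by the caller, standing in for the configured time provider.

```rust
/// Decides when to inject a current-time reminder. Times are seconds from the
/// caller's clock; the type and method names are hypothetical.
pub struct ReminderDebounce {
    interval_seconds: u64,
    last_sent_at: Option<u64>,
}

impl ReminderDebounce {
    pub fn new(interval_seconds: u64) -> Self {
        Self { interval_seconds, last_sent_at: None }
    }

    /// Called before every model request; returns true when a reminder is due.
    pub fn should_send(&mut self, now: u64) -> bool {
        let due = match self.last_sent_at {
            // Nothing sent yet (or state was reset): deliver immediately.
            None => true,
            Some(last) => now.saturating_sub(last) >= self.interval_seconds,
        };
        if due {
            self.last_sent_at = Some(now);
        }
        due
    }

    /// Called after compaction changes the context window: forces the next reminder.
    pub fn reset(&mut self) {
        self.last_sent_at = None;
    }
}
```

Unlike a request counter, this keeps reminder frequency stable whether a turn issues one model request per minute or fifty.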
rka-oai ·
2026-06-23 10:13:27 -07:00 -
[codex] Instrument rollout persistence bytes (#29498)
- Add 1%-sampled rollout persistence metrics that report per-item and per-thread JSON byte totals before and after filtering when metrics export is enabled. - Tag each item with its exact response or event variant, including nested turn-item kinds for conditionally persisted completion events, so aggregate cloud-storage impact can be estimated by policy choice.
Tom ·
2026-06-23 09:26:30 -07:00