Mirror of https://github.com/pchuan98/codex.git, synced 2026-07-01 00:31:56 +08:00
Branch: dev · 253 commits
Preserve namespaces on custom tool calls (#30302)
## Summary

- Preserve the optional namespace on custom tool calls during response deserialization and app-server replay.
- Use the namespaced tool identifier for streaming argument handling and tool dispatch.
- Regenerate app-server protocol schemas.
- Add regression tests covering namespace serialization and routing.

## Testing

- Ran affected protocol and app-server test suites.
- Ran the full core test suite; two load-sensitive timing tests passed when rerun individually.
- Ran Clippy and formatting checks.
- Verified with a local end-to-end app-server replay that the namespace is preserved through the complete request/response flow.
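A minimal sketch of why the namespace matters for dispatch, assuming handlers are keyed by a (namespace, name) pair. `ToolKey` and `dispatch` are hypothetical names for illustration, not the types this PR changes: if the namespace were dropped during deserialization, two tools named `search` would collapse into one key.

```rust
use std::collections::HashMap;

/// Hypothetical routing key that keeps the optional namespace next to the
/// tool name, so `search` in two namespaces (or none) never collides.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct ToolKey {
    pub namespace: Option<String>,
    pub name: String,
}

impl ToolKey {
    pub fn new(namespace: Option<&str>, name: &str) -> Self {
        Self {
            namespace: namespace.map(str::to_owned),
            name: name.to_owned(),
        }
    }
}

/// Look up the handler registered for a (namespace, name) pair.
/// Handlers are plain labels here to keep the sketch self-contained.
pub fn dispatch<'a>(
    handlers: &HashMap<ToolKey, &'a str>,
    namespace: Option<&str>,
    name: &str,
) -> Option<&'a str> {
    handlers.get(&ToolKey::new(namespace, name)).copied()
}
```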
nhamidi-oai · 2026-06-27 09:54:56 -07:00
feat(app-server): add history_mode to thread (#29927)
## Description

This PR adds a new `historyMode = "legacy" | "paginated"` to `Thread`. This will be stored in `SessionMeta` in the JSONL rollout file and as a new column in the SQLite thread_metadata table, and exposed on `thread/start` and on the `Thread` object in app-server.

## What changed

- Added canonical `ThreadHistoryMode` with `legacy` and `paginated`, defaulting old and new SessionMeta to `legacy`.
- Carried `history_mode` through core session config, ThreadStore stored metadata, local/in-memory stores, rollout metadata extraction, and the existing SQLite `threads` table.
- Added experimental `historyMode` to app-server v2 `Thread` and `thread/start`.
- Made paginated stored threads metadata-discoverable but unsupported for legacy full-history reads, `load_history`, live resume, and create paths.
- Regenerated app-server schema fixtures and added protocol/state/thread-store/app-server coverage for persistence and fail-closed behavior.

## Compatibility floor

Because users may be running various versions of Codex binaries on the same machine (TUI, Codex App, etc.), we will need to establish a compatibility floor for upcoming paginated threads, which will change how thread storage reads and writes work. The overall plan here:

```
Release N:
- Add historyMode to SessionMeta / Thread / SQLite metadata.
- Teach binaries to understand paginated threads.
- If a binary sees `historyMode="paginated"` but does not support the paginated contract, it refuses to resume/mutate the thread.
- Default remains `"legacy"`.

Release N+1:
- First-party clients start opting into paginated threads where appropriate.
- Internal dogfood / staged rollout.
- Measure old-client usage and paginated-thread unsupported errors.

Release N+2:
- Only after Release N+ is overwhelmingly deployed, make paginated the default.
- Accept that a small tail of N-1-or-older binaries may not understand paginated threads.
```

The important behavior change is fail-closed handling for a binary that encounters a persisted `paginated` thread before it knows how to fully support paginated history.

In app-server, if a thread is `paginated`, we will:

- allow metadata-only discovery paths like `thread/list` and `thread/read(includeTurns=false)`, so clients can still see the thread and inspect its `historyMode`
- reject legacy full-history/live-thread paths like `thread/read(includeTurns=true)` and `thread/resume` with an unsupported JSON-RPC error
- avoid silently treating an unknown or future `historyMode` as `legacy`

Under the hood, the ThreadStore layer also rejects legacy operations that would need to load or replay the full thread history for a paginated thread. That gives us the behavior we want for Release N: future paginated threads are visible, but this binary fails closed instead of trying to operate on them as if they were legacy threads.
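The fail-closed rules above can be sketched as a small gate. This is an illustration under stated assumptions, not the ThreadStore code: `ThreadOp`, `HistoryModeError`, `parse_history_mode`, and `check_op` are hypothetical names, and a missing value stands in for an older `SessionMeta` that predates the field.

```rust
/// Mirror of the two `historyMode` values described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ThreadHistoryMode {
    Legacy,
    Paginated,
}

/// Hypothetical operations a binary may attempt on a stored thread.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ThreadOp {
    List,
    ReadMetadataOnly,
    ReadWithTurns,
    Resume,
}

#[derive(Debug, PartialEq, Eq)]
pub enum HistoryModeError {
    UnknownMode(String),
    Unsupported { mode: ThreadHistoryMode, op: ThreadOp },
}

/// A missing value means an older rollout and defaults to `legacy`; an
/// unrecognized value is an error instead of a silent `legacy`.
pub fn parse_history_mode(raw: Option<&str>) -> Result<ThreadHistoryMode, HistoryModeError> {
    match raw {
        None | Some("legacy") => Ok(ThreadHistoryMode::Legacy),
        Some("paginated") => Ok(ThreadHistoryMode::Paginated),
        Some(other) => Err(HistoryModeError::UnknownMode(other.to_string())),
    }
}

/// Release-N gate: paginated threads stay discoverable, but anything that
/// would load or replay full legacy history fails closed.
pub fn check_op(mode: ThreadHistoryMode, op: ThreadOp) -> Result<(), HistoryModeError> {
    match (mode, op) {
        (ThreadHistoryMode::Legacy, _) => Ok(()),
        (ThreadHistoryMode::Paginated, ThreadOp::List | ThreadOp::ReadMetadataOnly) => Ok(()),
        (ThreadHistoryMode::Paginated, op) => Err(HistoryModeError::Unsupported { mode, op }),
    }
}
```

In app-server terms, an `Unsupported` result would be surfaced as the unsupported JSON-RPC error mentioned above.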
Owen Lin · 2026-06-26 09:12:42 -07:00
Retry failed Codex Apps MCP startup (#29920)
## Problem

The built-in Codex Apps MCP client shares a future for the full startup operation: connect, complete `initialize`, fetch the initial tools, and return a usable client. Sharing deduplicates startup work, but it also memoizes terminal errors. After a transient connection, handshake, or initial `tools/list` failure, later tool builds observe the same failed future. The thread cannot reconnect after the backend recovers and continues serving its startup-time cached tool snapshot, which may be empty or stale.

## Fix

When Apps MCP startup ends in an error, Codex starts bounded recovery without putting startup latency on tool-router construction:

1. The current tool build immediately continues with the cached startup snapshot.
2. After the initial failure is reported, Codex starts one fresh full startup attempt in the background.
3. Concurrent tool builds share that in-flight attempt and also continue with cached tools.
4. On success, the recovered client becomes active, refreshes the Apps tools cache, emits a `Ready` startup status, and is reused by later operations.
5. On failure, the cache remains unchanged and later tool builds may start another background attempt after exponential cooldown: 1s, 2s, 4s, 8s, 16s, then 30s maximum.

Each recreated startup performs a fresh MCP `initialize` and uncached `tools/list`. The MCP client retains its existing bounded retries for retryable `initialize` and `tools/list` failures. This avoids adding the Apps startup timeout to every request during a sustained outage.

## Scope

This is limited to the built-in Codex Apps MCP client:

- no reconnects for user-configured MCP servers;
- no cache deletion; and
- no proactive refresh for a healthy client with stale tools.

## Tests

Coverage verifies:

- tool builds return cached tools without waiting for a blocked reconnect;
- concurrent tool builds start only one background reconnect;
- failed reconnects preserve cached tools and respect exponential cooldown;
- a recovered client is retained and reused; and
- a long-lived thread exposes recovered app tools on a later follow-up.

Validation:

- `just test -p codex-mcp` (95 passed)
- `just test -p codex-core later_follow_up_uses_background_recovered_apps_after_mid_thread_startup_failures --no-capture` (passed)
- `just fix -p codex-mcp`
- `just fmt`
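The recovery cooldown described above can be sketched as follows, assuming the cooldown is indexed by the number of consecutive failed background attempts, so the first background attempt after the initial failure starts immediately. `reconnect_cooldown` and `RecoveryGate` are hypothetical names; the real client also shares one in-flight attempt across concurrent tool builds, which this sketch omits.

```rust
use std::time::{Duration, Instant};

const MAX_COOLDOWN: Duration = Duration::from_secs(30);

/// Cooldown after `failed_attempts` consecutive failed background startups:
/// 1s, 2s, 4s, 8s, 16s, then 30s for every later failure.
pub fn reconnect_cooldown(failed_attempts: u32) -> Duration {
    if failed_attempts == 0 {
        return Duration::ZERO; // the first recovery attempt starts right away
    }
    // Clamp the exponent so large counts cannot overflow the shift.
    let exponent = (failed_attempts - 1).min(5);
    Duration::from_secs(1u64 << exponent).min(MAX_COOLDOWN)
}

/// Hypothetical per-client state deciding whether a tool build may start
/// another background attempt.
#[derive(Debug, Default)]
pub struct RecoveryGate {
    failed_attempts: u32,
    last_failure: Option<Instant>,
}

impl RecoveryGate {
    pub fn record_failure(&mut self, at: Instant) {
        self.failed_attempts = self.failed_attempts.saturating_add(1);
        self.last_failure = Some(at);
    }

    /// A recovered client resets the backoff.
    pub fn record_success(&mut self) {
        *self = Self::default();
    }

    pub fn may_start_attempt(&self, now: Instant) -> bool {
        match self.last_failure {
            None => true,
            Some(at) => {
                now.saturating_duration_since(at) >= reconnect_cooldown(self.failed_attempts)
            }
        }
    }
}
```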
kbazzi · 2026-06-25 21:31:12 -07:00
core: expose permission profile to shell tools (#29941)
## tl;dr

Inject a `CODEX_PERMISSION_PROFILE` environment variable with the name of the current permission profile when invoking a shell tool.

## Why

Shell tool owners may need to launch nested commands under the same named permission profile, including through `codex sandbox -P PROFILE --include-managed-config`. Until now, child processes could observe sandbox and network metadata but could not identify the active named permission profile.

The `--include-managed-config` flag is essential when a helper reconstructs the sandbox from a profile name: it ensures the nested sandbox also loads managed enterprise requirements. Without it, using the inherited profile could unintentionally create a sandbox that does not enforce the organization's managed restrictions.

The new environment value is intentionally informational and **must not be treated as trusted input**. Any process in the ancestry can overwrite an environment variable, so a consumer that passes this value to `codex sandbox -P` must first validate it against the profiles that helper is authorized to use.

## Example Use Case

Suppose an organization provides a trusted `remote-bash` wrapper that lets Codex run a command on an approved build host. The local shell command uses the named `:workspace` permission profile:

```toml
default_permissions = ":workspace"
```

The command exposed to the model is a small zsh wrapper. It deliberately delegates with `exec`, preserving the original arguments and process environment:

```zsh
#!/usr/bin/env zsh
exec /opt/codex-tools/remote_bash.py "$@"
```

The model invokes the public wrapper, not its Python implementation:

```sh
/opt/codex-tools/remote-bash \
  --host builder.example.com \
  -- printf '%s' 'hello world'
```

Only the inner implementation is authorized to escape the local sandbox:

```starlark
prefix_rule(
    pattern=["/opt/codex-tools/remote_bash.py"],
    decision="allow",
)
```

With zsh-fork, execution begins with `remote-bash` inside the `:workspace` sandbox.
When the wrapper calls `exec`, the exact prefix rule matches `remote_bash.py`, so that inner script is restarted unsandboxed. The escalated process inherits:

```text
CODEX_PERMISSION_PROFILE=:workspace
```

Inheritance does not make the value trustworthy. `remote_bash.py` independently allowlists both the remote host and the permission profile before using either value. In particular, a forged value such as `:danger-full-access` is rejected before it can reach `codex sandbox -P`:

```python
import argparse
import os
import shlex
import sys

ALLOWED_HOSTS = {"builder.example.com"}
ALLOWED_PROFILES = {":workspace"}

parser = argparse.ArgumentParser()
parser.add_argument("--host", required=True)
try:
    separator = sys.argv.index("--")
except ValueError:
    # parser.error() exits, so a missing separator never reaches parse_args.
    parser.error("expected -- before the remote command")
args = parser.parse_args(sys.argv[1:separator])
command = sys.argv[separator + 1:]

if args.host not in ALLOWED_HOSTS:
    parser.error("host is not allowlisted")
if not command:
    parser.error("the remote command must not be empty")

profile = os.environ.get("CODEX_PERMISSION_PROFILE")
if not profile:
    raise SystemExit("CODEX_PERMISSION_PROFILE must not be empty")
if profile not in ALLOWED_PROFILES:
    raise SystemExit("CODEX_PERMISSION_PROFILE is not allowlisted")

remote_command = shlex.join(command)
sandbox_command = shlex.join([
    "codex", "sandbox", "-P", profile,
    "--include-managed-config", "--",
    "bash", "-lc", remote_command,
])
print(shlex.join(["ssh", args.host, sandbox_command]))
```

This builds each command layer as an argument vector and uses `shlex.join()` at the boundary, rather than interpolating untrusted shell text. After validation and parsing, the nested command has this structure:

```text
ssh argv:
  ["ssh", "builder.example.com", SANDBOX_COMMAND]

SANDBOX_COMMAND argv:
  ["codex", "sandbox", "-P", ":workspace", "--include-managed-config", "--", "bash", "-lc", "printf %s 'hello world'"]

bash -lc payload argv:
  ["printf", "%s", "hello world"]
```

A production implementation could execute that SSH command.
The integration fixture prints it and parses the result back into arguments, verifying the complete flow:

```text
model invokes outer wrapper
  -> zsh-fork starts wrapper under :workspace
  -> wrapper execs allowlisted Python script
  -> prefix rule restarts Python script unsandboxed
  -> Python script inherits CODEX_PERMISSION_PROFILE=:workspace
  -> Python script verifies :workspace is allowlisted
  -> remote command runs codex sandbox -P :workspace with --include-managed-config
  -> nested sandbox honors managed enterprise requirements
```

This gives the trusted helper access to resources outside the local sandbox, such as SSH credentials, while ensuring that it can select only an explicitly authorized profile and that work on the remote host remains subject to the organization's managed requirements.

## What changed

- Inject `CODEX_PERMISSION_PROFILE` after shell environment policy evaluation so the active profile wins over inherited or configured stale values.
- Apply the variable to both `shell_command` and unified `exec_command`, including local, zsh-fork, and remote exec-server paths.
- Remove stale values when the session has no active named profile.
- Preserve the current profile value when loading a shell snapshot so a parent snapshot cannot restore an older profile.

## Testing

- Added classic-shell integration coverage proving an exact prefix rule can run a `require_escalated` script outside the `:workspace` sandbox while preserving `CODEX_PERMISSION_PROFILE=:workspace`.
- Added zsh-fork integration coverage in which the model invokes an outer zsh wrapper, an inner allowlisted `remote_bash.py` runs unsandboxed, and its printed SSH command reconstructs the inherited `:workspace` sandbox with `--include-managed-config` while preserving every argument after `--`.
- The example helper treats `CODEX_PERMISSION_PROFILE` as untrusted and validates it against `ALLOWED_PROFILES` before constructing the nested command.
- Assert that the reconstructed sandbox command includes `--include-managed-config` so nested use of the inherited profile cannot bypass managed enterprise requirements.
- Added coverage for overriding and removing stale profile values.
- Verified `shell_command` receives the selected active profile.
- Added shell snapshot coverage using `printenv CODEX_PERMISSION_PROFILE`.

Michael Bolin · 2026-06-25 19:00:23 +00:00
feat: add provider-aware model fallback to thread start (#29942)
## Why

Helper threads such as task title generation can request a model ID that is valid for the default OpenAI provider but unavailable from the active provider. With Amazon Bedrock, `gpt-5.4-mini` is rejected while the provider's static catalog exposes Bedrock model IDs such as `openai.gpt-5.5` and `openai.gpt-5.4`. This causes repeated background 404s and can surface a misleading turn error even when the main turn succeeds.

Clients need an explicit way to ask app-server to resolve an unavailable helper model to the active provider default. That fallback must remain limited to providers with an authoritative static catalog so custom or dynamically discovered model IDs are not rewritten based on an incomplete catalog.

Fixes #28741.

## What changed

- Add the experimental `allowProviderModelFallback` option to `thread/start`, defaulting to `false` to preserve existing behavior.
- Thread the option through thread creation and model selection.
- When enabled for a static model manager, preserve requested models present in the catalog and replace unavailable models with the provider default.
- Continue preserving explicit model IDs for dynamic model managers without fetching a catalog solely to validate them.
- Document the new `thread/start` behavior in the app-server API overview.

## Test

Temporary test-client harness:

```rust
ThreadStartParams {
    model: Some("gpt-5.4-mini".to_string()),
    allow_provider_model_fallback: true,
    ..Default::default()
}
```

Command:

```sh
CODEX_HOME=/tmp/codex-bedrock-thread-start-home \
CODEX_E2E_BEDROCK_THREAD_START_ONLY=1 \
./target/debug/codex-app-server-test-client \
  --codex-bin ./target/debug/codex \
  -c 'model_provider="amazon-bedrock"' \
  send-message-v2 --experimental-api ignored
```

Relevant output:

```
> "method": "thread/start",
> "params": {
>   "model": "gpt-5.4-mini",
>   "modelProvider": null,
>   "allowProviderModelFallback": true,
>   ...
> }
< "result": {
<   "model": "openai.gpt-5.5",
<   "modelProvider": "amazon-bedrock",
<   ...
< }
```
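A sketch of the fallback rule, assuming a simplified catalog type; `ModelCatalog` and `resolve_model` are hypothetical names. The design point taken from the PR is that only an authoritative static catalog may rewrite a requested model, and only when the caller opts in.

```rust
/// Hypothetical simplification of the two kinds of model manager.
pub enum ModelCatalog<'a> {
    /// Authoritative, static list of model IDs plus the provider default.
    Static {
        models: &'a [&'a str],
        default_model: &'a str,
    },
    /// Dynamically discovered models; the known list may be incomplete.
    Dynamic,
}

/// Resolve the model for a new thread.
pub fn resolve_model(requested: &str, catalog: &ModelCatalog<'_>, allow_fallback: bool) -> String {
    match catalog {
        ModelCatalog::Static { models, default_model }
            if allow_fallback && !models.iter().any(|m| *m == requested) =>
        {
            default_model.to_string()
        }
        // Catalog hit, fallback disabled, or dynamic catalog: keep the request.
        _ => requested.to_string(),
    }
}
```

A dynamic catalog never rewrites, even with fallback enabled, because its model list may be incomplete.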
Celia Chen · 2026-06-25 18:24:34 +00:00
[codex] Add Ultra reasoning effort (#29899)
## Why

Ultra should be one user-facing reasoning selection for work that benefits from both maximum reasoning and proactive multi-agent delegation. Without it, clients must coordinate maximum reasoning with the experimental `multiAgentMode` setting, even though the inference backend still expects its existing `max` effort value.

This change makes reasoning effort the source of truth: clients select `ultra`, core derives proactive multi-agent behavior when the turn is eligible for multi-agent V2, and inference requests continue to use the backend-compatible `max` value.

## What changed

- Add `ultra` as a first-class reasoning effort and preserve model-catalog ordering when exposing it to clients.
- Convert `ultra` to `max` at the inference request boundary, including Responses HTTP/WebSocket requests, startup prewarm, compaction, and memory summarization.
- Derive effective multi-agent mode per turn from effective reasoning effort:
  - eligible multi-agent V2 + `ultra` → `proactive`
  - eligible multi-agent V2 + any other effort → `explicitRequestOnly`
  - V1 or otherwise ineligible sessions → no multi-agent mode instruction
- Keep the derived effective mode in turn context history so successive turns can emit a developer-message update only when the effective mode changes.
- Remove selected multi-agent mode from core session configuration, turn construction, thread settings, resume/fork restoration, and subagent spawn plumbing. Subagents inherit reasoning effort and derive their own effective mode.
- Retain the experimental app-server `multiAgentMode` fields for wire compatibility while marking them deprecated. Request values are accepted but ignored; compatibility response fields report `explicitRequestOnly`.
- Display Ultra in the TUI using the order supplied by `model/list`.

## Validation

- `just test -p codex-core ultra_reasoning_uses_max_for_requests`
- `just test -p codex-tui model_reasoning_selection_popup`
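A sketch of the effort mapping and per-turn mode derivation described above. The `ReasoningEffort` variant list is illustrative (only `ultra` and the backend value `max` come from the PR), and `wire_value`, `effective_multi_agent_mode`, and `needs_mode_update` are hypothetical names.

```rust
/// Illustrative effort levels; only `Ultra` and the wire value `max` come from the PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ReasoningEffort {
    Low,
    Medium,
    High,
    Max,
    Ultra,
}

impl ReasoningEffort {
    /// Value sent at the inference request boundary.
    pub fn wire_value(self) -> &'static str {
        match self {
            ReasoningEffort::Low => "low",
            ReasoningEffort::Medium => "medium",
            ReasoningEffort::High => "high",
            // The backend has no `ultra`; it receives the existing `max` value.
            ReasoningEffort::Max | ReasoningEffort::Ultra => "max",
        }
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MultiAgentMode {
    Proactive,
    ExplicitRequestOnly,
}

/// Effective mode for one turn; `None` means no multi-agent instruction.
pub fn effective_multi_agent_mode(
    eligible_for_v2: bool,
    effort: ReasoningEffort,
) -> Option<MultiAgentMode> {
    if !eligible_for_v2 {
        return None;
    }
    Some(match effort {
        ReasoningEffort::Ultra => MultiAgentMode::Proactive,
        _ => MultiAgentMode::ExplicitRequestOnly,
    })
}

/// A developer-message update is emitted only when the effective mode changes.
pub fn needs_mode_update(previous: Option<MultiAgentMode>, current: Option<MultiAgentMode>) -> bool {
    previous != current
}
```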
Shijie Rao · 2026-06-24 20:13:52 -07:00
[codex] Inject agent graph store into ThreadManager (#29736)
Pick up the AgentGraphStore migration.

- Inject an explicit optional agent graph store into `ThreadManager`
- Move all calls to spawn, close, recursive resume, and subtree/archive/delete/feedback traversal through it
- Keep using `LocalAgentGraphStore` when SQLite is available

This required some changes to the interface to deal with futures:

- The interface now matches `ThreadStore`'s object-safe pattern by returning a boxed `AgentGraphStoreFuture` directly, allowing `ThreadManager` to hold `Arc<dyn AgentGraphStore>`

*Slight behavior change!* Unfiltered subtree enumeration now performs a single all-status breadth-first traversal, so a closed grandchild beneath an open edge is included; the previous Open-then-Closed traversals could not cross mixed-status paths and silently omitted it.
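To illustrate the behavior change, here is a minimal breadth-first traversal over an edge list where each parent-to-child edge carries an open or closed status. `AgentGraph`, `EdgeStatus`, and `subtree` are hypothetical stand-ins for the store, and the test contrasts the old two-pass filtered traversal with a single unfiltered pass.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EdgeStatus {
    Open,
    Closed,
}

/// Hypothetical parent -> children edge list keyed by agent ID.
#[derive(Default)]
pub struct AgentGraph {
    children: HashMap<u32, Vec<(u32, EdgeStatus)>>,
}

impl AgentGraph {
    pub fn add_edge(&mut self, parent: u32, child: u32, status: EdgeStatus) {
        self.children.entry(parent).or_default().push((child, status));
    }

    /// Breadth-first walk from `root`. With `filter = Some(s)` only edges
    /// with status `s` are followed; with `None` every edge is followed, so
    /// mixed-status paths (open, then closed) are crossed in one pass.
    pub fn subtree(&self, root: u32, filter: Option<EdgeStatus>) -> Vec<u32> {
        let mut seen = HashSet::from([root]);
        let mut queue = VecDeque::from([root]);
        let mut found = Vec::new();
        while let Some(node) = queue.pop_front() {
            for &(child, status) in self.children.get(&node).into_iter().flatten() {
                if filter.map_or(true, |wanted| wanted == status) && seen.insert(child) {
                    found.push(child);
                    queue.push_back(child);
                }
            }
        }
        found
    }
}
```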
Tom · 2026-06-24 13:24:10 -07:00
test: add app-server auto environment helper (#29746)
## Why

Start moving app-server tests toward running against remote and foreign-OS executors by default. To do so, we need a point of indirection similar to core integration tests' `build_with_auto_env`, but with the flexibility of letting tests control environment registration when they need to.

## What

This adds:

- `TestAppServer::new_with_auto_env()` for constructing an app server with a default environment defined by the test runner (e.g. Bazel)
- `TestAppServer::auto_env_params()` for tests to easily acquire turn environment params tailored to the automatic environment
- `TestAppServer::send_thread_start_request_with_auto_env()` to make it easy for tests to start a thread using the automatic environment

All of these methods fail if the calling test has set up an environment where the automatic environment configuration conflicts with test-created state.

## Validation

Adds a couple of basic smoke tests to the app-server test suite. Follow-ups will migrate more tests to these helpers.
Adam Perry @ OpenAI · 2026-06-24 01:06:29 +00:00
core tests: rename automatic environment builder (#29728)
## Why

Use a name that more clearly describes what this helper does when it sets up a test environment.

## What

- Rename the builder and its harness wrapper to use `auto_env` instead of `remote_env`, because the helper sets up a local environment when the build system is configured that way.
Adam Perry @ OpenAI · 2026-06-23 21:45:06 +00:00
test: branch on target OS instead of runner flavor (#29712)
## Why

Core tests should branch on the executor's operating system, not on runner details such as Docker or Wine. This keeps platform behavior stable as new test backends are added and reserves Wine-specific skips for actual runner debt.

## What

- Add `TestTargetOs` and target/host-aware skip helpers while keeping `TestEnvironment` internal.
- Replace topology enum access with remote predicates and a narrow Docker accessor.
- Migrate OS-semantic Wine skips, preserve runner-specific gaps, and document the skip taxonomy.

## Validation

- `just test -p core_test_support`
- `just test -p codex-core remote_test_env_can_connect_and_use_filesystem`
- `bazel test //codex-rs/core:core-all-wine-exec-test --test_output=errors` reached test execution; unrelated existing view-image, path, and timing failures remain.
- `just test -p codex-core` and `just test` reached broad test execution; this checkout has unrelated helper, sandbox, and timing failures.
Adam Perry @ OpenAI · 2026-06-23 14:27:13 -07:00
path-uri: clarify host-native path conversion (#29501)
## Why

Downstream refactors are producing confusing code because this functionality has a very generic name. Encoding the specific conversion approach in the method name makes it clearer.

## What

Rename `PathUri::from_path` to `PathUri::from_host_native_path` and update its Rust call sites.
Adam Perry @ OpenAI · 2026-06-23 00:02:33 +00:00
feat(core): store turn_id on ResponseItem metadata (#28360)
## Description

This PR is a follow-up to https://github.com/openai/codex/pull/28355 and starts assigning `internal_chat_message_metadata_passthrough.turn_id` to durable Responses API items created during a turn. The goal is that those items keep the `turn_id` of the turn that introduced them when Codex resends stateless HTTP context, reconstructs history for resume/fork paths, or reuses websocket response state.

## What changed

- Set `internal_chat_message_metadata_passthrough.turn_id` when missing as response items enter durable history, initial/replacement history, inter-agent communication history, and local compaction summaries.
- Preserve existing item turn IDs instead of overwriting them during persistence, resume reconstruction, compaction, forked history, and websocket incremental reuse.
- Keep `compaction_trigger` fieldless because it is a request control, not a durable response item.
- Update focused history/request assertions and fixtures for stateless requests, websocket incrementals, compaction, thread injection, prompt debug, and related CI coverage.
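The "set when missing, preserve when present" rule can be sketched as follows. The structs are simplified, hypothetical stand-ins; only the field names `internal_chat_message_metadata_passthrough` and `turn_id` come from the PR.

```rust
/// Hypothetical, simplified metadata carried on a history item.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct ChatMessageMetadata {
    pub turn_id: Option<String>,
}

/// Hypothetical, simplified durable history item.
#[derive(Debug, Clone, PartialEq)]
pub struct HistoryItem {
    pub text: String,
    pub internal_chat_message_metadata_passthrough: Option<ChatMessageMetadata>,
}

/// Stamp `turn_id` on items entering durable history, without overwriting
/// an ID an item already carries (e.g. after resume, fork, or compaction).
pub fn stamp_turn_id(items: &mut [HistoryItem], turn_id: &str) {
    for item in items.iter_mut() {
        let metadata = item
            .internal_chat_message_metadata_passthrough
            .get_or_insert_with(ChatMessageMetadata::default);
        // An existing ID records the turn that introduced the item; keep it.
        if metadata.turn_id.is_none() {
            metadata.turn_id = Some(turn_id.to_string());
        }
    }
}
```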
Owen Lin · 2026-06-22 16:45:14 -07:00
core: rename metadata -> internal_chat_message_metadata_passthrough (#28968)
## Description

This PR cuts Codex over from the generic `ResponseItem.metadata` (introduced in https://github.com/openai/codex/pull/28355) to `ResponseItem.internal_chat_message_metadata_passthrough`, which is the blessed path and has strongly typed keys.

For now, we have to drop this MAv2 usage of `metadata` (https://github.com/openai/codex/pull/28561) until we figure out where it should live.
Owen Lin · 2026-06-22 11:11:25 -07:00
Expose thread-level multi-agent mode (#28792)
## Why

Once multi-agent mode can be selected per turn, clients also need to choose the initial selection when creating a thread and observe that selection through lifecycle and settings APIs.

The selected value is intentionally distinct from the effective model-visible value: no client selection is represented as `null`, even though an eligible multi-agent v2 turn derives `explicitRequestOnly` as its effective default.

## What changed

- Add the optional experimental `thread/start.multiAgentMode` parameter and pass it through thread creation.
- Preserve an omitted initial value as an unset selection rather than eagerly storing `explicitRequestOnly`.
- Apply an explicit `thread/start` selection to the first turn through the session configuration established at thread creation.
- Restore the latest persisted effective mode as the selected baseline on cold resume when rollout history contains one.
- Inherit the optional selected mode from a loaded parent when creating related runtime threads.
- Return the current selected `multiAgentMode` from `thread/start`, `thread/resume`, `thread/fork`, and thread settings, using `null` when no mode is selected.
- Keep lifecycle reporting independent from model capability and feature eligibility; core turn construction remains responsible for calculating and persisting the effective mode.

## Not covered

- Clearing an existing loaded-session selection back to unset through `turn/start`; omitted or `null` currently retains the session's selection.
- A TUI control, slash command, or `config.toml` preference.

## Verification

- `CARGO_INCREMENTAL=0 just test -p codex-app-server-protocol`
- `CARGO_INCREMENTAL=0 just test -p codex-app-server multi_agent_mode`

The focused app-server coverage verifies explicit `thread/start` initialization, first-turn prompting, nullable reporting for an omitted selection, and retention of selections that are not currently runtime-eligible.

## Stack

Stacked on #28685. This PR contains only the thread initialization and lifecycle/settings API layer.
Shijie Rao · 2026-06-19 10:50:44 +02:00
[plugins] Refresh plugin and tool caches after remote install (#28951)
Summary

- Refresh the installed remote-plugin snapshot and Codex Apps tools after completing a remote JIT install.
- Gate `completed: true` on every expected `app_connector_id` appearing after the uncached `tools/list` refresh, while continuing to skip local bundle verification for server-side installs.
- Keep the cached recommendations response and filter refreshed installed remote IDs locally, so this does not add another recommendations fetch.
- Add regression coverage for both outcomes: tools that appear after the hard refresh, and tools that remain absent after it.

The resumed model request sees the refreshed tool router when installation completes.

Root Cause

- Remote suggestions from `openai-curated-remote` returned `true` before taking the existing connector refresh path, leaving the resumed turn with the pre-install Apps tool catalog.

Validation

- `just test -p codex-core request_plugin_install`
- `just test -p codex-core-plugins recommended_plugin_candidates_filter_installed_and_disabled_plugins`
- `just test -p codex-core-plugins`
- `just fix -p codex-core-plugins`
- `just fix -p codex-core`
- `just fmt`
- `just test -p codex-core` was not fully clean locally: 2,729 passed, 26 failed, and 16 skipped. The failures were dominated by local Seatbelt/network/timing issues, including plugin-install timeouts under full-suite contention; the focused plugin-install runs pass.
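A sketch of the completion check and the local filtering of cached recommendations, assuming connector and plugin IDs are plain strings; `install_completed` and `filter_installed` are hypothetical names, not this PR's functions.

```rust
use std::collections::HashSet;

/// `completed: true` only when every expected connector ID shows up in the
/// tool list fetched after the uncached refresh.
pub fn install_completed(expected_connector_ids: &[&str], refreshed_connector_ids: &HashSet<String>) -> bool {
    expected_connector_ids
        .iter()
        .all(|id| refreshed_connector_ids.contains(*id))
}

/// Reuse the cached recommendations and drop already-installed IDs locally,
/// instead of fetching recommendations again.
pub fn filter_installed<'a>(cached_recommendations: &[&'a str], installed_ids: &HashSet<String>) -> Vec<&'a str> {
    cached_recommendations
        .iter()
        .copied()
        .filter(|id| !installed_ids.contains(*id))
        .collect()
}
```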
Alex Daley · 2026-06-18 20:08:04 -04:00
core: load AGENTS.md from foreign environments (#28958)
## Why

Make it possible to load AGENTS.md from remote exec-servers whose OS differs from the app-server's.

## What

- keep `AGENTS.md` discovery and provenance as `PathUri`, with root-aware parent and ancestor traversal
- expose lifecycle instruction sources as legacy app-server path strings in events while retaining `PathUri` internally
- preserve and test mixed POSIX and Windows paths in model context and TUI status output
- cover remote Windows loading end to end by seeding the Wine prefix through host filesystem APIs
- fix a bug in `PathUri`'s `parent()` implementation that would erase Windows drive letters
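The PR fixes a `PathUri::parent()` bug that erased Windows drive letters. The sketch below is a hypothetical, string-based `windows_parent` helper, not `PathUri`'s implementation; it shows root-aware parent traversal for both `\` and `/` separators, where the parent of `C:\Users` is `C:\` and the drive root itself has no parent.

```rust
/// Root-aware parent of a Windows-style path string that uses `\` or `/`.
/// Returns `None` at a drive root such as `C:\`, and never trims the
/// drive letter away.
pub fn windows_parent(path: &str) -> Option<&str> {
    let is_sep = |c: char| c == '\\' || c == '/';
    // Byte length of the drive root: "C:\" -> 3, "C:" -> 2, no drive -> 0.
    let root_len = match path.as_bytes() {
        [drive, b':', sep, ..] if drive.is_ascii_alphabetic() && is_sep(*sep as char) => 3,
        [drive, b':', ..] if drive.is_ascii_alphabetic() => 2,
        _ => 0,
    };
    // Ignore trailing separators when deciding which component to drop.
    let trimmed = path.trim_end_matches(is_sep);
    if trimmed.len() <= root_len {
        return None; // already at the root
    }
    let rest = &trimmed[root_len..];
    let parent = match rest.rfind(is_sep) {
        Some(i) => trimmed[..root_len + i].trim_end_matches(is_sep),
        None => "",
    };
    // A single component under the root: return the full root, separator included.
    if parent.len() < root_len {
        Some(&path[..root_len])
    } else {
        Some(parent)
    }
}
```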
Adam Perry @ OpenAI · 2026-06-18 15:06:23 -07:00
Emit Trusted MCP App Identity on Tool-Call Items (#27132)
## Summary

- Add optional `appContext` to app-server MCP tool-call items with trusted `connectorId`, `linkId`, and `mcpAppResourceUri` metadata.
- Preserve that context across tool-call events, persisted history, reconnects, and thread resume.
- Keep the deprecated top-level `mcpAppResourceUri` temporarily for client migration.

The consumer contract is `{ appContext: { connectorId, linkId, mcpAppResourceUri }, tool }`.

## Validation

- Full GitHub Actions suite passes, including CLA, Bazel tests, clippy, release builds, and argument-comment lint.

---------

Co-authored-by: martinauyeung-oai <280153141+martinauyeung-oai@users.noreply.github.com>
martinauyeung-oai · 2026-06-18 14:02:54 -07:00
current time reminders impl for system clock (varlatency 2/n) (#28824)
Stacked on #28822.

## Summary

- add a host-injectable current-time provider with a built-in system implementation
- record UTC developer reminders in history immediately before due model requests
- keep cadence state per session and force a refresh after compaction

This does NOT include the app-server client <-> server clock logic. This PR covers only the reminder message and the system clock that will be used in prod.

## Testing

- `just test -p codex-core varlatency_`
- `just clippy -p codex-core -p codex-app-server -p codex-mcp-server -p codex-thread-manager-sample`
- `just fmt`
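A sketch of the host-injectable clock and per-session cadence described above. `CurrentTimeProvider`, `SystemClock`, and `ReminderCadence` are hypothetical names, and the reminder interval is an assumption because the PR does not state one; formatting the UTC reminder text is left out.

```rust
use std::time::{Duration, SystemTime};

/// Host-injectable clock, so a host (or a test) can supply the time.
pub trait CurrentTimeProvider {
    fn now(&self) -> SystemTime;
}

/// Built-in implementation backed by the system clock.
pub struct SystemClock;

impl CurrentTimeProvider for SystemClock {
    fn now(&self) -> SystemTime {
        SystemTime::now()
    }
}

/// Per-session cadence state for current-time reminders.
pub struct ReminderCadence {
    interval: Duration, // assumed cadence; not specified by the PR
    last_sent: Option<SystemTime>,
}

impl ReminderCadence {
    pub fn new(interval: Duration) -> Self {
        Self { interval, last_sent: None }
    }

    /// Called just before a model request: returns true when a reminder
    /// should be recorded in history, and updates the cadence state.
    pub fn due(&mut self, clock: &dyn CurrentTimeProvider) -> bool {
        self.due_at(clock.now())
    }

    pub fn due_at(&mut self, now: SystemTime) -> bool {
        let due = match self.last_sent {
            None => true,
            // If the clock moved backwards, treat the reminder as due.
            Some(prev) => now
                .duration_since(prev)
                .map_or(true, |elapsed| elapsed >= self.interval),
        };
        if due {
            self.last_sent = Some(now);
        }
        due
    }

    /// Compaction can drop earlier reminders from history, so force the next one.
    pub fn force_refresh(&mut self) {
        self.last_sent = None;
    }
}
```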
rka-oai · 2026-06-18 19:18:42 +00:00
Support `openai/form` extended form elicitations (#27500)
# Summary

Allow App Server clients to opt into `openai/form` MCP elicitations.
Gabriel Peal · 2026-06-18 11:54:49 -07:00
app-server: keep the model cache warm (#28699)
## Why

The app server is long-lived, but its shared model cache otherwise refreshes only when a caller needs it. Once the five-minute cache expires, starting a thread or calling `model/list` can wait for `/models` on the request path. Refresh the cache in the background before it expires so foreground callers normally use fresh local state.

## What changed

- Start an app-server worker that refreshes models immediately and then every three minutes using the existing models-manager API.
- Hold only a weak reference to the models manager between refreshes, so the worker does not extend its lifetime.
- Stop scheduling refreshes when the app-server lifecycle handle is shut down or dropped. A refresh already in progress is allowed to finish.
- Adjust affected app-server test fixtures to distinguish the background `/models` probe from the connection they are testing.

The existing models-manager cache, refresh strategies, auth handling, ETag behavior, and concurrency semantics are unchanged.

## Testing

- `models_refresh_worker::tests::refreshes_immediately_periodically_and_stops_when_dropped`
- `suite::v2::remote_control::listen_off_honors_persisted_remote_control_enable`
- `suite::v2::attestation::attestation_generate_round_trip_adds_header_to_responses_websocket_handshake`
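A minimal sketch of the weak-reference refresh loop, assuming OS threads instead of the app-server's async runtime; `ModelsManager` and `spawn_refresh_worker` are hypothetical. The worker upgrades its `Weak` only for the duration of one refresh, so it never keeps the manager alive, and it exits on its next wake-up after the manager is dropped or shutdown is requested.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::{Arc, Weak};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical stand-in for the shared models manager.
pub struct ModelsManager {
    pub refreshes: AtomicUsize,
}

impl ModelsManager {
    pub fn refresh(&self) {
        // A real manager would fetch `/models` here.
        self.refreshes.fetch_add(1, Ordering::SeqCst);
    }
}

/// Refresh immediately, then every `period`, holding only a `Weak` between
/// refreshes. Stops when `shutdown` is set or the manager has been dropped.
pub fn spawn_refresh_worker(
    manager: Weak<ModelsManager>,
    period: Duration,
    shutdown: Arc<AtomicBool>,
) -> JoinHandle<()> {
    thread::spawn(move || loop {
        if shutdown.load(Ordering::SeqCst) {
            break;
        }
        match manager.upgrade() {
            // The strong reference is dropped at the end of this arm.
            Some(strong) => strong.refresh(),
            None => break, // manager dropped: stop scheduling refreshes
        }
        thread::sleep(period);
    })
}
```

The PR's three-minute period against a five-minute cache lifetime means the cache is normally refreshed before it expires.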
jif · 2026-06-17 16:18:39 +02:00
Clarify model-generated and legacy app path types (#28577)
## Why

`ApiPathString` implies that it can be used anywhere we pull a path out of JSON, but it is not appropriate for tool arguments, where the model might generate relative paths. Prefer `String` for model-generated paths; for now, each feature can handle the conversion itself, and we can define a shared abstraction later if it makes sense.

## What

Rename `ApiPathString` to `AppLegacyPathString` to clarify its role. Expand the `path-types` skill to tell the model to leave tool args as bare strings.
Adam Perry @ OpenAI · 2026-06-16 20:47:43 +00:00
[tests] Keep Apps out of generic core test harness (#28508)
## Summary

- disable the stable Apps feature in the generic `test_codex()` integration-test harness
- keep Apps-specific tests explicit: their builders re-enable Apps and point it at a local mock server

## Why

Generic tests that use dummy ChatGPT auth were also enabling the host-owned `codex_apps` MCP server. That made unrelated tests contact `chatgpt.com` and wait for MCP startup, causing the Bazel timeouts observed on #28368. The generic harness should be hermetic and should not start an external service that the test did not request.

This is test-only; production Apps behavior is unchanged. The broader optional-MCP startup behavior is being handled separately in #28407.

## Testing

- `just test -p codex-core -E 'test(pre_sampling_compact_runs_when_comp_hash_changes) | test(model_switch_to_smaller_model_updates_token_context_window) | test(codex_apps_file_params_upload_local_paths_before_mcp_tool_call)'`
- `just fix -p codex-core`
- `just fmt`
jif · 2026-06-16 13:07:43 +02:00
[codex] Use expect in integration tests (#28441)
The workspace denies `clippy::expect_used` in production. Although `clippy.toml` allows `expect` in tests, Bazel Clippy compiles integration-test helper code in a way that does not receive that exemption, which encouraged verbose `unwrap_or_else(... panic!(...))` and equivalent `match`/`let else` forms.

This allows `clippy::expect_used` once at each integration-test crate root (including aggregated suites and test-support libraries), then replaces manual panic-based Result and Option unwraps with `expect`/`expect_err`. Standalone `tests/*.rs` files remain their own crate roots.

Intentional assertion and unexpected-variant panics remain unchanged, and the production `expect_used = "deny"` lint remains in place. The cleanup is mechanical and net-negative in line count.
pakrym-oai ·
2026-06-15 21:53:47 -07:00 -
Run core integration tests against a Wine-backed Windows executor (#28401)
## Why We want to exercise a Linux app-server against a Windows exec-server without having to repeat every test case. This approach has slight precedent in the remote Docker test setup. ## What Run the shared `codex-core` integration suite against Windows exec-server behavior from Linux. This makes cross-OS path and shell regressions visible while keeping unsupported cases owned by individual tests. - Add `local`, `docker`, and `wine-exec` test environment selection with legacy Docker compatibility. - Extend `codex_rust_crate` to generate a sharded Wine-exec variant using a cross-built Windows server and pinned Bazel Wine/PowerShell runtimes. - Teach remote-aware helpers about Windows paths and track temporary incompatibilities with source-local `skip_if_wine_exec!` calls and follow-up reasons.
Adam Perry @ OpenAI ·
2026-06-16 00:38:41 +00:00 -
feat(core): add metadata field to ResponseItem (#28355)
## Description This PR adds an optional `metadata` field to `ResponseItem` for Responses API calls. Only mechanical plumbing, no actual values populated and sent yet. Turns out just adding a new field to `ResponseItem` has quite a large blast radius already. This change is backwards compatible because `metadata` is optional and omitted when absent, so existing response items and rollout history without it still deserialize and requests that do not set it keep the same wire shape. For provider compatibility, we strip out `metadata` before non-OpenAI Responses requests so Azure and AWS Bedrock never see this field. My followup PR here will actually make use of it to start storing and passing along `turn_id`: https://github.com/openai/codex/pull/28360 ## What changed - Added `ResponseItemMetadata` with optional `turn_id`, plus optional `metadata` on Responses API item variants and inter-agent communication. - Preserved item metadata through response-item rewrites such as truncation, missing tool-output synthesis, compaction history rebuilding, visible-history conversion, rollout/resume, and generated app-server schemas/types. - Strip item metadata from non-OpenAI Responses requests while preserving it for OpenAI-shaped requests. - Updated the mechanical fixture/test construction churn required by the new optional field.
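A simplified sketch of the compatibility rule, assuming stand-in types: `Message`, `Provider`, and `prepare_input` are hypothetical, and the real `ResponseItem` is a serde-serialized enum with many variants. Because the field is an `Option`, items without metadata keep their shape, and non-OpenAI providers get the field cleared before the request is built. The `turn_id` value in the usage is only illustrative, since this PR does not populate it yet.

```rust
/// Hypothetical, simplified shapes; only the optional field matters here.
#[derive(Clone, Debug, PartialEq)]
struct ResponseItemMetadata {
    turn_id: Option<String>,
}

#[derive(Clone, Debug, PartialEq)]
struct Message {
    text: String,
    metadata: Option<ResponseItemMetadata>,
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum Provider {
    OpenAi,
    Azure,
    Bedrock,
}

/// Clear item metadata for providers that should never see it, and leave it
/// untouched for OpenAI-shaped requests.
fn prepare_input(items: &[Message], provider: Provider) -> Vec<Message> {
    items
        .iter()
        .cloned()
        .map(|mut item| {
            if provider != Provider::OpenAi {
                item.metadata = None;
            }
            item
        })
        .collect()
}
```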
Owen Lin ·
2026-06-15 15:05:28 -07:00 -
[codex] exec-server honors remote environment cwd and shell (#28122)
## Why Next slice needed to make progress on the `remote_env_windows` test is to support passing a Windows cwd for the remote environment and using that environment's native shell. This lets the test run a real Windows process instead of only recording an early path or shell mismatch. ## What - change `TurnEnvironmentSelection.cwd` from `AbsolutePathBuf` to `PathUri` - convert local cwd values to URIs when constructing selections - preserve a remote primary cwd instead of replacing it with the local legacy fallback - prefer the selected environment's discovered shell for unified exec, falling back to the session shell when unavailable - convert back to a host-native absolute path at current native-only consumer boundaries - reject or deny unsupported foreign cwd values at the existing request-permissions boundary, with TODOs for its future migration - extend the hermetic Wine test to execute Windows PowerShell in `C:\windows` and verify successful process completion - record the current app-server rejection against the same Wine-backed remote Windows fixture when its cwd is supplied as a native Windows path
Adam Perry @ OpenAI ·
2026-06-14 06:07:46 +00:00 -
[codex] make PathUri::from_abs_path infallible (#27976)
## Why `PathUri::from_abs_path` can fail for absolute paths that do not have a normal `file:` URI representation, forcing filesystem call sites to handle a conversion error even though the original path can be preserved losslessly. ## What Make `from_abs_path` infallible and migrate its callers. Unrepresentable paths use `file:///%00/bad/path/<base64>`, encoding Unix bytes or Windows UTF-16LE; `to_abs_path` validates and decodes that fallback. The leading encoded null reserves a namespace that cannot collide with a real Unix or Windows path, and fallback URIs remain opaque to lexical path operations. ## Validation Added path-URI coverage for Unix null and non-UTF-8 paths, Windows device/verbatim and non-Unicode paths, serialization, malformed fallbacks, opaque lexical operations, invalid native payloads, and literal `/bad/path` collision resistance.
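A sketch of the fallback encoding, assuming URL-safe base64 without padding (the description only says `<base64>`, so the exact variant is an assumption) and hypothetical `fallback_uri` / `is_fallback_uri` helpers. The input is the raw native payload: Unix path bytes, or the UTF-16LE bytes of a Windows path.

```rust
/// URL-safe base64 alphabet; using this variant is an assumption.
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

const FALLBACK_PREFIX: &str = "file:///%00/bad/path/";

fn base64_url_no_pad(bytes: &[u8]) -> String {
    let mut out = String::new();
    for chunk in bytes.chunks(3) {
        // Pack up to three bytes into one 24-bit group.
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let group = (b0 << 16) | (b1 << 8) | b2;
        // n input bytes produce n + 1 output characters when padding is omitted.
        for i in 0..=chunk.len() {
            let index = (group >> (18 - 6 * i)) & 0x3f;
            out.push(ALPHABET[index as usize] as char);
        }
    }
    out
}

/// Hypothetical helper: wrap an unrepresentable native path in the opaque
/// fallback namespace. The encoded null cannot occur in a real path.
fn fallback_uri(native_bytes: &[u8]) -> String {
    format!("{}{}", FALLBACK_PREFIX, base64_url_no_pad(native_bytes))
}

/// Fallback URIs are opaque: callers should not apply lexical path operations.
fn is_fallback_uri(uri: &str) -> bool {
    uri.starts_with(FALLBACK_PREFIX)
}
```

Note that a literal `file:///bad/path/...` URI does not match the prefix, which is the collision resistance the validation section mentions.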
Adam Perry @ OpenAI ·
2026-06-12 16:58:42 -07:00 -
[codex] Load AGENTS.md from all bound environments (#27696)
## Why We already have the machinery to support multiple environments on a single thread, but we only show the model the contents of `AGENTS.md` files in the primary environment. We should show the model all of the relevant project instructions when we know there's more than one environment. ## Known Gaps As discussed in the RFC, this implementation: 1. doesn't handle environments being added to or removed from the thread after its creation 2. doesn't enforce an aggregate context budget across environments, and instead applies the configured project maximum independently to each environment ## Implementation - Discover project instructions in environment order with an independent byte budget per environment and preserve source provenance/order. - Keep the legacy fragment byte-for-byte when exactly one environment contributes project instructions; use environment-labeled sections when two or more environments contribute. - Freeze the complete rendered fragment in `LoadedAgentsMd`, insert it directly into requests, and recognize both layouts in contextual and memory filtering. - Add exact rendering, independent-budget, source-order, creation-snapshot, and consumer coverage without changing app-server schemas.
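A rough sketch of the rendering rules listed above, with hypothetical names (`EnvInstructions`, `render_project_instructions`) and an invented `## Environment:` label format; the real labeled layout is not shown in the description. Each environment is truncated against its own byte budget, and the unlabeled layout is used when exactly one environment contributes.

```rust
/// Hypothetical input: project instructions discovered in one environment.
struct EnvInstructions {
    environment: String,
    contents: String,
}

/// Truncate to at most `max_bytes`, backing off to a UTF-8 character boundary.
fn truncate_to_budget(text: &str, max_bytes: usize) -> &str {
    if text.len() <= max_bytes {
        return text;
    }
    let mut end = max_bytes;
    while !text.is_char_boundary(end) {
        end -= 1;
    }
    &text[..end]
}

/// Apply the budget per environment, keep environment order, and label
/// sections only when two or more environments contribute.
fn render_project_instructions(envs: &[EnvInstructions], max_bytes_per_env: usize) -> Option<String> {
    let contributing: Vec<(&str, &str)> = envs
        .iter()
        .map(|env| (env.environment.as_str(), truncate_to_budget(&env.contents, max_bytes_per_env)))
        .filter(|(_, text)| !text.is_empty())
        .collect();
    match contributing.as_slice() {
        [] => None,
        // Exactly one contributor: unlabeled, legacy-style layout.
        [(_, text)] => Some((*text).to_string()),
        // Several contributors: one labeled section per environment.
        many => Some(
            many.iter()
                .map(|(env, text)| format!("## Environment: {env}\n{text}"))
                .collect::<Vec<_>>()
                .join("\n\n"),
        ),
    }
}
```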
Adam Perry @ OpenAI ·
2026-06-12 00:10:06 -07:00 -
core: Consolidate Responses API Codex metadata (#27122)
## What Introduce a `CodexResponsesMetadata` struct that defines all the core metadata we send to Responses API. Example fields are `thread_id`, `turn_id`, `window_id`, etc. Going forward, `client_metadata["x-codex-turn-metadata"]` will be the canonical way Codex sends metadata to Responses API across both HTTP and websocket transports. For now, we continue to emit the existing top-level HTTP headers and top-level `client_metadata` fields from the same `CodexResponsesMetadata` struct for compatibility reasons. Also, app-server clients who specify additional `responsesapi_client_metadata` via `turn/start` and `turn/steer` will have those fields merged into `client_metadata["x-codex-turn-metadata"]`, but cannot override the reserved fields that core uses (i.e. the fields in `CodexResponsesMetadata`). ## Why Responses API request instrumentation is the source of truth for downstream Codex analytics that join requests by Codex IDs such as session, thread, turn, and context window. Before this change, those values were assembled through several request-specific paths: HTTP request bodies, websocket handshake headers, websocket `response.create` payloads, compaction requests, and the rich `x-codex-turn-metadata` envelope all had their own wiring. That made metadata propagation easy to drift across API-key/direct Responses API requests, ChatGPT-auth/proxied requests, websocket requests, and compaction requests. It also made additions like `window_id` error-prone because a field could be added to one transport projection but missed in another. ## What changed - Added `CodexResponsesMetadata` as the core-owned snapshot for Codex metadata sent to Responses API. - Render `client_metadata["x-codex-turn-metadata"]`, flat `client_metadata` projections, and direct compatibility headers from that same snapshot. 
- Include the known Codex-owned fields in the turn metadata blob, including installation/session/thread/turn/window IDs, request kind, lineage, sandbox/workspace metadata, timing, and compaction details. - Treat app-server `responsesapi_client_metadata` as enrichment for the Codex turn metadata blob while preventing those extras from overriding Codex-owned fields. - Use the same metadata path for normal turns, websocket prewarm, local compaction, remote v1 compaction, and remote v2 compaction. - Keep websocket connection-only preconnect metadata separate so handshakes carry compatibility identity headers without inventing a fake turn metadata blob. ## Verification - `cargo check -p codex-core` - `just fix -p codex-core`
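A minimal sketch of the reserved-field rule, assuming a flat string map in place of the real JSON envelope under `client_metadata["x-codex-turn-metadata"]` and a hypothetical three-field subset of `CodexResponsesMetadata`. Writing the Codex-owned fields last is one simple way to guarantee that client extras can add keys but never override reserved ones.

```rust
use std::collections::BTreeMap;

/// Hypothetical subset of the Codex-owned fields; the real struct carries more.
struct CodexResponsesMetadata {
    thread_id: String,
    turn_id: String,
    window_id: String,
}

impl CodexResponsesMetadata {
    fn reserved_fields(&self) -> BTreeMap<String, String> {
        BTreeMap::from([
            ("thread_id".to_string(), self.thread_id.clone()),
            ("turn_id".to_string(), self.turn_id.clone()),
            ("window_id".to_string(), self.window_id.clone()),
        ])
    }

    /// Start from the client extras, then write Codex-owned fields on top so
    /// any colliding client key is overwritten.
    fn turn_metadata(&self, client_extras: &BTreeMap<String, String>) -> BTreeMap<String, String> {
        let mut merged = client_extras.clone();
        merged.extend(self.reserved_fields());
        merged
    }
}
```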
Owen Lin ·
2026-06-11 13:42:09 -07:00 -
[codex] Load user instructions through an injected provider (#27101)
## Why We want to remove implicit use of `$CODEX_HOME` from `codex-core` and make embedders responsible for supplying user-level instructions. This also ensures user instructions load when no primary environment is selected. ## What changed Stacked on #27415, which makes `codex exec` surface thread-scoped runtime warnings. - Added `UserInstructionsProvider` to `codex-extension-api`, with absolute source attribution and recoverable loading warnings. - Added `codex-home` with the filesystem-backed provider for `AGENTS.override.md` and `AGENTS.md`, preserving precedence, fallback, trimming, lossy UTF-8 handling, and the existing uncapped global instruction size. - Removed global instruction loading from `Config` and require `ThreadManager` callers to inject a provider. - Load provider instructions once for each fresh root runtime, including runtimes without a primary environment. Running sessions retain their snapshot, while child agents inherit the parent snapshot without invoking the provider. - Keep provider instructions separate while loading project `AGENTS.md`, then assemble the model-visible instructions with the existing ordering, source attribution, warning, and turn-context behavior. - Wired the Codex home provider through the CLI, app server, MCP server, core facade, and thread-manager sample. ## Validation - `just test -p codex-home -p codex-extension-api` - `just test -p codex-core agents_md` - `just test -p codex-core guardian` - `just test -p codex-app-server thread_start_without_selected_environment_includes_only_global_instruction_source` - `just test -p codex-exec warning` - `just bazel-lock-check`
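A hypothetical sketch of the precedence described above. The trait is a simplified stand-in for `UserInstructionsProvider` (the real one also reports source paths and warnings), the in-memory `HomeFiles` replaces the filesystem reads, and treating a whitespace-only `AGENTS.override.md` as absent is an assumption about what "fallback" means here.

```rust
/// Simplified stand-in for the injected provider.
trait UserInstructionsProvider {
    fn load(&self) -> Option<String>;
}

/// In-memory stand-in for the two `$CODEX_HOME` files.
struct HomeFiles {
    agents_override_md: Option<Vec<u8>>,
    agents_md: Option<Vec<u8>>,
}

/// Decode lossily, trim, and treat empty or whitespace-only files as absent.
fn non_empty(bytes: &Option<Vec<u8>>) -> Option<String> {
    let text = String::from_utf8_lossy(bytes.as_deref()?).trim().to_string();
    (!text.is_empty()).then_some(text)
}

impl UserInstructionsProvider for HomeFiles {
    fn load(&self) -> Option<String> {
        // AGENTS.override.md takes precedence; otherwise fall back to AGENTS.md.
        non_empty(&self.agents_override_md).or_else(|| non_empty(&self.agents_md))
    }
}
```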
Adam Perry @ OpenAI ·
2026-06-11 19:28:47 +00:00 -
[codex] migrate ExecutorFileSystem paths to PathUri (#27424)
## Why We're moving exec-server to use PathUri for its internal path representations. ## What Move `ExecutorFileSystem` APIs to use `PathUri` instead of `AbsolutePathBuf`. Future changes will convert higher-level parts of exec-server.
Adam Perry @ OpenAI ·
2026-06-11 18:44:18 +00:00 -
Pair thread environment settings (#26687)
## Why Thread cwd and environment selections are a single logical setting in core: updating one without the other can silently desynchronize the next-turn execution context. This change makes that relationship explicit in the internal thread settings flow while preserving the existing app-server public API shape. ## What changed - Moved the cwd/environment pair through internal `ThreadSettingsOverrides.environment_settings` instead of a top-level internal `cwd` field. - Kept `thread/settings/update` public params unchanged, with app-server translating top-level `cwd` into the paired internal settings shape. - Moved `Op::UserInput` environment overrides into thread settings so user turns and settings updates use the same core path. - Updated core, app-server, MCP, memories, sample, and test callsites to construct the paired settings shape. ## Verification - `git diff --check` - Local test run starting after PR creation.
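A sketch of why pairing helps, using hypothetical field types: because `environment_settings` is a single `Option`, an override either replaces the cwd and environment together or leaves both alone. `translate_public_cwd` models the app-server translation under the assumption that a top-level `cwd` update keeps the thread's current environment.

```rust
use std::path::PathBuf;

/// Hypothetical shape: cwd and environment selection travel as one value.
#[derive(Clone, Debug, PartialEq)]
struct EnvironmentSettings {
    cwd: PathBuf,
    environment_id: String,
}

#[derive(Default)]
struct ThreadSettingsOverrides {
    environment_settings: Option<EnvironmentSettings>,
}

struct ThreadSettings {
    environment_settings: EnvironmentSettings,
}

impl ThreadSettings {
    /// Replace both halves of the pair at once, or neither.
    fn apply(&mut self, overrides: ThreadSettingsOverrides) {
        if let Some(pair) = overrides.environment_settings {
            self.environment_settings = pair;
        }
    }
}

/// Public API keeps a top-level `cwd`; internally it becomes the paired shape.
fn translate_public_cwd(cwd: Option<PathBuf>, current: &EnvironmentSettings) -> ThreadSettingsOverrides {
    ThreadSettingsOverrides {
        environment_settings: cwd.map(|cwd| EnvironmentSettings {
            cwd,
            environment_id: current.environment_id.clone(),
        }),
    }
}
```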
pakrym-oai ·
2026-06-08 13:55:15 -07:00 -
fix: preserve approval sandbox decisions in unified exec (#24981)
## Why This PR fixes approval sandbox semantics in the unified-exec path. The zsh-fork runtime exposed the bug because the shell can do meaningful work before any intercepted child `execv(2)` exists: redirections, builtins, globbing, and pipeline setup all happen in the launch process. If the model requested `sandbox_permissions=require_escalated`, or an exec-policy `allow` rule explicitly bypassed the sandbox, that approved sandbox decision needs to be preserved for the launch path and for intercepted execs that use the same approval machinery. The behavior is not only about zsh fork. The production changes are in shared approval/escalation code, so they also affect non-zsh-fork intercepted exec paths that go through the same sandbox decision logic. The narrow intent is to preserve the approval decision while still keeping denied-read profiles and bounded additional-permission requests sandboxed. ## Production Changes - `codex-rs/core/src/tools/runtimes/unified_exec.rs`: derives a `launch_sandbox_permissions` value from the requested sandbox permissions and the runtime filesystem policy, then uses that value for managed-network/env setup and launch sandbox selection. This keeps full approval or policy-bypass decisions visible to the first unified-exec attempt, while still preventing a full sandbox override from discarding denied-read restrictions. Direct unified exec keeps the same decision surface; the important difference is that zsh-fork launch setup no longer accidentally loses the approved parent sandbox decision. - `codex-rs/core/src/tools/runtimes/shell/unix_escalation.rs`: makes intercepted-exec escalation selection explicit for the three sandbox permission modes. `UseDefault` only escalates when an exec-policy decision allows sandbox bypass, `RequireEscalated` escalates when unsandboxed execution is allowed, and `WithAdditionalPermissions` escalates through the bounded additional-permissions path instead of being treated as a full unsandboxed override. 
Unsandboxed intercepted execs now also rebuild the environment as `RequireEscalated`, which strips managed-network proxy variables consistently with other unsandboxed execution. ## Test Coverage Most of the PR is tests. The new coverage verifies: - unified exec preserves parent approval and exec-policy sandbox decisions for zsh-fork launch selection; - bounded `with_additional_permissions` remains sandboxed and permission-profile based; - denied-read profiles are not weakened by parent approval; - explicit prompt rules still prompt for intercepted execs after the parent command is approved; - unsandboxed intercepted execs strip managed-network env vars. No documentation update is needed; this is an internal approval/sandbox correctness fix. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/24981). * #24982 * __->__ #24981
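The intercepted-exec escalation rules for the three modes can be summarized as a single match. This is a sketch only: `Escalation`, `select_escalation`, and its boolean inputs are hypothetical simplifications of the exec-policy and approval state the real code consults.

```rust
/// The three sandbox permission modes named in the description.
#[derive(Clone, Copy, Debug, PartialEq)]
enum SandboxPermissions {
    UseDefault,
    RequireEscalated,
    WithAdditionalPermissions,
}

/// Hypothetical outcome of escalation selection for an intercepted exec.
#[derive(Debug, PartialEq)]
enum Escalation {
    Sandboxed,
    BoundedAdditionalPermissions,
    Unsandboxed,
}

/// `policy_allows_bypass`: an exec-policy `allow` rule bypasses the sandbox.
/// `unsandboxed_allowed`: unsandboxed execution is permitted for this request.
fn select_escalation(mode: SandboxPermissions, policy_allows_bypass: bool, unsandboxed_allowed: bool) -> Escalation {
    match mode {
        SandboxPermissions::UseDefault if policy_allows_bypass => Escalation::Unsandboxed,
        SandboxPermissions::UseDefault => Escalation::Sandboxed,
        SandboxPermissions::RequireEscalated if unsandboxed_allowed => Escalation::Unsandboxed,
        SandboxPermissions::RequireEscalated => Escalation::Sandboxed,
        // Bounded path: never treated as a full unsandboxed override.
        SandboxPermissions::WithAdditionalPermissions => Escalation::BoundedAdditionalPermissions,
    }
}
```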
Michael Bolin ·
2026-06-07 11:33:16 -07:00 -
[codex] Use standalone tools for Responses Lite (#26490)
## Summary Responses Lite does not execute hosted Responses tools, so models using it must route web search and image generation through Codex-owned executors and standalone Responses API endpoints. This PR is stacked on #26487. ## Validation - `cargo test -p codex-core responses_lite_ --lib` - `cargo test -p codex-core standalone_executors_remain_hidden_without_flags_or_responses_lite --lib` - `cargo test -p codex-core hosted_tools_follow_provider_auth_model_and_config_gates --lib` - `cargo test -p codex-web-search-extension -p codex-image-generation-extension` - `cargo test -p codex-app-server --test all standalone_` - `cargo fmt --all -- --check`
rka-oai ·
2026-06-06 00:23:40 +00:00 -
Require absolute cwd in thread settings (#26532)
## Why Thread settings cwd overrides are expected to be resolved before they enter core. Keeping this boundary as a plain `PathBuf` made it easy for core/session code to keep fallback normalization and relative-path resolution logic in places that should only receive an already-resolved cwd. This is intentionally the absolute-cwd-only slice: it does not change environment selection stickiness or cwd-to-default-environment fallback behavior. ## What changed - Changes `ThreadSettingsOverrides.cwd`, `CodexThreadSettingsOverrides.cwd`, and `SessionSettingsUpdate.cwd` to use `AbsolutePathBuf`. - Removes core-side cwd normalization/resolution from session settings updates. - Updates affected core/app-server test helpers and callsites to pass existing absolute cwd values or use `abs()` helpers. ## Validation Opening as draft so CI can start while local validation continues.
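A minimal stand-in for the boundary type, assuming construction simply rejects relative paths; the real `AbsolutePathBuf` is a separate type whose API is not reproduced here, and `SessionSettingsUpdate` below is a hypothetical one-field version. Once the field type guarantees absoluteness, session code has nothing left to normalize.

```rust
use std::path::{Path, PathBuf};

/// Stand-in newtype: a value of this type is always an absolute path.
#[derive(Clone, Debug, PartialEq)]
struct AbsolutePathBuf(PathBuf);

impl TryFrom<PathBuf> for AbsolutePathBuf {
    type Error = PathBuf;

    /// Reject relative paths instead of resolving them later in core.
    fn try_from(path: PathBuf) -> Result<Self, Self::Error> {
        if path.is_absolute() {
            Ok(Self(path))
        } else {
            Err(path)
        }
    }
}

impl AbsolutePathBuf {
    fn as_path(&self) -> &Path {
        &self.0
    }
}

/// Hypothetical update type: callers must resolve the cwd before building it.
struct SessionSettingsUpdate {
    cwd: Option<AbsolutePathBuf>,
}
```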
pakrym-oai ·
2026-06-05 09:29:15 -07:00 -
[codex] Preserve logical paths during AGENTS.md discovery (#26465)
## Intent Follow up on #26205 by avoiding unnecessary filesystem canonicalization during `AGENTS.md` discovery. The configured working directory is already absolute, and canonicalization incorrectly switches symlinked workspaces from their logical parent hierarchy to the target's hierarchy. ## User-facing behavior For a symlinked working directory such as:
```text
test-root/
|-- logical-repo/
|   |-- AGENTS.md ("logical parent doc")
|   `-- workspace ------------> physical-repo/workspace/
`-- physical-repo/
    |-- AGENTS.md ("physical parent doc")
    `-- workspace/
        `-- AGENTS.md ("workspace doc")
```
Before this change, Codex canonicalized `logical-repo/workspace` to `physical-repo/workspace` before discovery. It therefore loaded `physical-repo/AGENTS.md` and `physical-repo/workspace/AGENTS.md`, ignoring the instructions from the repository through which the user entered the workspace. After this change, ancestor discovery walks the configured logical path, so Codex loads `logical-repo/AGENTS.md`. Opening `logical-repo/workspace/AGENTS.md` still follows the symlink through the host filesystem, so the workspace document is also loaded. `physical-repo/AGENTS.md` is not loaded. ## Implementation Use the logical absolute working directory when discovering project instructions and reporting instruction sources. Filesystem reads still follow the working-directory symlink, so an `AGENTS.md` in the target workspace continues to load while ancestor discovery uses the symlink's parents. ## Validation Added integration coverage proving that discovery loads the logical parent's instructions and the target workspace's instructions, but not the target parent's instructions.
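A sketch of logical ancestor discovery, assuming a hypothetical `agents_md_candidates` helper that stops at a given project root. It only uses the lexical `Path::ancestors`, never `canonicalize`, so the symlinked layout described above yields `logical-repo` paths; opening the deepest candidate still follows the symlink on the host filesystem.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: candidate `AGENTS.md` paths from `project_root` down to
/// `logical_cwd`, computed purely lexically so symlinks are not resolved.
fn agents_md_candidates(project_root: &Path, logical_cwd: &Path) -> Vec<PathBuf> {
    let mut dirs: Vec<&Path> = logical_cwd
        .ancestors()
        .take_while(|dir| dir.starts_with(project_root))
        .collect();
    dirs.reverse(); // root first, working directory last
    dirs.into_iter().map(|dir| dir.join("AGENTS.md")).collect()
}
```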
Adam Perry @ OpenAI ·
2026-06-04 15:08:52 -07:00 -
Switch runtime to cloud config bundle (#24622)
## Summary - Adapts the moved `codex-cloud-config` crate from the legacy cloud requirements endpoint to the new config bundle endpoint. - Switches runtime consumers from `CloudRequirementsLoader` to `CloudConfigBundleLoader` so one shared bundle supplies cloud-delivered config and requirements. - Removes the legacy cloud requirements domain loader path. ## Details This intentionally keeps `codex-cloud-config` monolithic for review lineage: the previous PR establishes the crate move, and this PR shows the behavior change against that moved implementation. A follow-up PR splits the module back into focused files. The new bundle path preserves the important cloud requirements loader semantics where intended: account-scoped signed cache, 30 minute TTL, 5 minute refresh cadence, retry/backoff, auth recovery, and fail-closed startup loading. The cached payload changes from a single requirements TOML string to the backend-delivered bundle, and validation rejects malformed config or requirements fragments before cache write/use.
joeflorencio-openai ·
2026-06-02 13:18:59 -07:00 -
[codex] Wait for MCP readiness in core integration tests (#24964)
Ensures MCP-backed `codex-core` integration tests exercise initialized servers instead of racing server startup. I've been idly investigating a few flakes and the failure modes are much more confusing when a tool call fails because of a failed server start than when the failed server start causes the test to fail directly.
Adam Perry @ OpenAI ·
2026-05-29 10:22:27 -07:00 -
[codex] Support ui visibility meta for tools (#24700)
## Summary Adds support for the same `ui.visibility` metadata on tools as on resources ([spec](https://github.com/modelcontextprotocol/ext-apps/blob/main/specification/draft/apps.mdx#resource-discovery)).
Gabriel Peal ·
2026-05-28 10:24:03 -07:00 -
Add experimental turn additional context (#24154)
## Summary Adds experimental `additionalContext` support to `turn/start` and `turn/steer` so clients can provide ephemeral external context, such as browser or automation state, without turning that plumbing into a visible user prompt or triggering user-prompt lifecycle behavior. ## API Shape The parameter shape is:
```ts
additionalContext?: Record<string, {
  value: string
  kind: "untrusted" | "application"
}> | null
```
Example:
```json
{
  "additionalContext": {
    "browser_info": {
      "value": "Active tab is CI failures.",
      "kind": "untrusted"
    },
    "automation_info": {
      "value": "CI rerun is in progress.",
      "kind": "application"
    }
  }
}
```
The keys are opaque and caller-defined. ## Context Injection When provided, accepted entries are inserted into model context as hidden contextual message items, not as visible thread user-message items. `kind: "untrusted"` entries are inserted with role `user`:
```text
<external_${key}>${value}</external_${key}>
```
`kind: "application"` entries are inserted with role `developer`:
```text
<${key}>${value}</${key}>
```
Values are not escaped. Each value is truncated to 1k approximate tokens before wrapping. For `turn/start`, accepted additional context is inserted before normal user input. For `turn/steer`, additional context is merged only when the steer includes non-empty user input; context-only steers still reject as empty input. ## Dedupe Strategy `AdditionalContextStore` lives on session state and stores the latest complete additional-context map. Each `turn/start` or non-empty `turn/steer` treats its `additionalContext` as the current complete set of values. Entries are injected only when the key is new or the exact entry for that key changed, including `value` or `kind`. After merging, the store is replaced with the provided map, so omitted keys are removed from the retained set and can be injected again later if reintroduced.
Omitting `additionalContext`, passing `null`, or passing an empty object resets the store to empty and injects nothing. ## What Changed - Threads experimental v2 `additionalContext` through app-server into core turn start and steer handling. - Adds separate contextual fragment types for untrusted user-role context and application developer-role context. - Uses pending response input items so additional context can be combined with normal user input without treating it as prompt text. - Adds integration coverage for start/steer flow, role routing, dedupe/reset behavior, deletion/re-add behavior, hook-blocked input behavior, empty context-only steer rejection, external-fragment marker matching, and truncation.
pakrym-oai ·
2026-05-26 13:02:34 -07:00 -
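For the experimental `additionalContext` change above, the wrapping and dedupe rules can be sketched with hypothetical type names. The sketch follows the described behavior: untrusted entries become `user` messages wrapped in `external_` tags, application entries become `developer` messages, only new or changed entries are injected, and `None` or an empty map resets the store. Per-value token truncation is omitted.

```rust
use std::collections::BTreeMap;

#[derive(Clone, Copy, Debug, PartialEq)]
enum ContextKind {
    Untrusted,
    Application,
}

#[derive(Clone, Debug, PartialEq)]
struct ContextEntry {
    value: String,
    kind: ContextKind,
}

/// A hidden contextual message: (role, text).
type ContextMessage = (&'static str, String);

/// Wrap one entry; values are deliberately not escaped.
fn wrap(key: &str, entry: &ContextEntry) -> ContextMessage {
    match entry.kind {
        ContextKind::Untrusted => ("user", format!("<external_{key}>{}</external_{key}>", entry.value)),
        ContextKind::Application => ("developer", format!("<{key}>{}</{key}>", entry.value)),
    }
}

#[derive(Default)]
struct AdditionalContextStore {
    latest: BTreeMap<String, ContextEntry>,
}

impl AdditionalContextStore {
    /// Treat `provided` as the complete current set: inject only new or changed
    /// entries, then replace the store so omitted keys can be re-injected later.
    fn merge(&mut self, provided: Option<BTreeMap<String, ContextEntry>>) -> Vec<ContextMessage> {
        let provided = provided.unwrap_or_default();
        let injected: Vec<ContextMessage> = provided
            .iter()
            .filter(|(key, entry)| self.latest.get(*key) != Some(*entry))
            .map(|(key, entry)| wrap(key, entry))
            .collect();
        self.latest = provided;
        injected
    }
}
```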
Move MCP tool naming mode into manager (#21576)
## Why The `non_prefixed_mcp_tool_names` feature should be applied where MCP tools become model-visible, not by remapping names later in core. Keeping the decision in `McpConnectionManager` construction makes `ToolInfo` the single shaped view that spec building, deferred tool search, routing, and unavailable-tool placeholders can consume directly. This also preserves the existing external behavior while the feature is off, and keeps the feature-on behavior for code mode and hooks explicit at the manager boundary. ## What Changed - Add `McpToolNameMode` to `codex-mcp` and flow it through `McpConfig` into `McpConnectionManager::new`. - Normalize MCP `ToolInfo` names in the manager using either legacy-prefixed namespaces or non-prefixed namespaces; the legacy path adds `mcp__` without restoring the old trailing namespace suffix. - Remove the core-side MCP name remapping path so specs, tool search, session resolution, and unavailable-tool placeholder construction use the manager-provided `ToolName` values directly. - Keep code mode flattening on the `__` namespace separator. - Preserve hook compatibility by giving non-prefixed MCP hook names legacy `mcp__...` matcher aliases. - Add/adjust integration and unit coverage for non-prefixed code-mode behavior, hook matching with the feature on and off, and manager-level legacy prefixing. ## Testing - `cargo test -p codex-mcp --lib` - `cargo test -p codex-core --lib tools::spec::tests -- --nocapture` - `cargo test -p codex-core --lib mcp_tools -- --nocapture` - `cargo test -p codex-core --lib mcp_tool_exposure -- --nocapture` - `cargo test -p codex-core --test all mcp_tool -- --nocapture` - `cargo test -p codex-core --test all search_tool -- --nocapture` - `cargo test -p codex-core --test all hooks_mcp -- --nocapture` - `cargo test -p codex-core --test all code_mode_uses_non_prefixed_mcp_tool_names_when_feature_enabled -- --nocapture` - `cargo test -p codex-tools` - `cargo test -p codex-features`
pakrym-oai ·
2026-05-26 08:21:15 -07:00 -
Honor client-resolved service tier defaults (#23537)
## Why Model catalog responses can now advertise a nullable `default_service_tier` for each model. Codex needs to preserve three distinct states all the way from config/app-server inputs to inference: - no explicit service tier, so the client may apply the current model catalog default when FastMode is enabled - explicit `default`, meaning the user intentionally wants standard routing - explicit catalog tier ids such as `priority`, `flex`, or future tiers Keeping those states distinct prevents the UI from showing one tier while core sends another, especially after model switches or app-server `thread/start` / `turn/start` updates. ## What Changed - Plumbed `default_service_tier` through model catalog protocol types, app-server model responses, generated schemas, model cache fixtures, and provider/model-manager conversions. - Added the request-only `default` service tier sentinel and normalized legacy config spelling so `fast` in `config.toml` still materializes as the runtime/request id `priority`. - Moved catalog default resolution to the TUI/client side, including recomputing the effective service tier when model/FastMode-dependent surfaces change. - Updated app-server thread lifecycle config construction so `serviceTier: null` preserves explicit standard-routing intent by mapping to `default` instead of internal `None`. - Kept core responsible for validating explicit tiers against the current model and stripping `default` before `/v1/responses`, without applying catalog defaults itself. ## Validation - `CARGO_INCREMENTAL=0 cargo build -p codex-cli` - `CARGO_INCREMENTAL=0 cargo test -p codex-app-server model_list` - `cargo test -p codex-tui service_tier` - `cargo test -p codex-protocol service_tier_for_request` - `cargo test -p codex-core get_service_tier` - `RUST_MIN_STACK=8388608 CARGO_INCREMENTAL=0 cargo test -p codex-core service_tier`
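A sketch of the three states and where each is resolved, using hypothetical names (`ServiceTierChoice`, `resolve_on_client`, `request_service_tier`); the real protocol types differ. The catalog default only fills an unset choice when FastMode is enabled, legacy `fast` becomes `priority`, and the `default` sentinel never reaches `/v1/responses`.

```rust
/// The three states the description keeps distinct.
#[derive(Clone, Debug, PartialEq)]
enum ServiceTierChoice {
    /// No explicit choice: the client may apply the catalog default.
    Unset,
    /// Explicit `default`: the user wants standard routing.
    Default,
    /// An explicit catalog tier id such as `priority` or `flex`.
    Tier(String),
}

/// Legacy config spelling: `fast` materializes as the request id `priority`.
fn parse_config_tier(raw: &str) -> ServiceTierChoice {
    match raw {
        "default" => ServiceTierChoice::Default,
        "fast" => ServiceTierChoice::Tier("priority".to_string()),
        other => ServiceTierChoice::Tier(other.to_string()),
    }
}

/// Client-side resolution: only an unset choice picks up the catalog default.
fn resolve_on_client(choice: ServiceTierChoice, catalog_default: Option<&str>, fast_mode: bool) -> ServiceTierChoice {
    match (choice, catalog_default) {
        (ServiceTierChoice::Unset, Some(tier)) if fast_mode => ServiceTierChoice::Tier(tier.to_string()),
        (choice, _) => choice,
    }
}

/// Core strips the `default` sentinel before building the request.
fn request_service_tier(choice: &ServiceTierChoice) -> Option<&str> {
    match choice {
        ServiceTierChoice::Tier(tier) => Some(tier.as_str()),
        ServiceTierChoice::Unset | ServiceTierChoice::Default => None,
    }
}
```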
Shijie Rao ·
2026-05-20 15:57:50 -07:00 -
Make local environment optional in EnvironmentManager (#23369)
## Summary - make `EnvironmentManager` local environment/runtime paths optional - simplify constructor surface around snapshot materialization - rename local env accessors to `require_local_environment` / `try_local_environment` ## Validation - devbox Bazel build for touched crate surfaces - `//codex-rs/exec-server:exec-server-unit-tests` - `//codex-rs/app-server-client:app-server-client-unit-tests` - filtered touched `//codex-rs/core:core-unit-tests` cases
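A minimal sketch of the optional local environment, assuming `require_local_environment` reports an error rather than panicking (the description does not say which); `Environment` and this `EnvironmentManager` are hypothetical stand-ins.

```rust
/// Hypothetical stand-ins; the real manager holds far more state.
#[derive(Debug)]
struct Environment {
    id: String,
}

struct EnvironmentManager {
    local: Option<Environment>,
}

impl EnvironmentManager {
    /// For callers that can proceed without a local environment.
    fn try_local_environment(&self) -> Option<&Environment> {
        self.local.as_ref()
    }

    /// For callers that need one; fails instead of assuming it always exists.
    fn require_local_environment(&self) -> Result<&Environment, String> {
        self.local
            .as_ref()
            .ok_or_else(|| "no local environment is configured".to_string())
    }
}
```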
starr-openai ·
2026-05-19 12:55:34 -07:00 -
[5 of 7] Replace OverrideTurnContext with ThreadSettings (#22508)
**Stack position:** [5 of 7] ## Summary This PR adds `Op::ThreadSettings`, a queued settings-only update mechanism for changing stored thread settings without starting a new turn. It also removes the legacy `Op::OverrideTurnContext` in the same layer, so reviewers can see the replacement and deletion together. ## Changes - Add `Op::ThreadSettings` for settings-only queued updates. - Emit `ThreadSettingsApplied` with the effective thread settings snapshot after core applies an update. - Route settings-only updates through the same submission queue as user input. - Migrate remaining `OverrideTurnContext` tests and callers to the queued `Op::ThreadSettings` path. - Delete `Op::OverrideTurnContext` from the core protocol and submission loop. This stack addresses #20656 and #22090. ## Stack 1. [1 of 7] [Add thread settings to UserInput](https://github.com/openai/codex/pull/23080) 2. [2 of 7] [Remove UserInputWithTurnContext](https://github.com/openai/codex/pull/23081) 3. [3 of 7] [Remove UserTurn](https://github.com/openai/codex/pull/23075) 4. [4 of 7] [Placeholder for OverrideTurnContext cleanup](https://github.com/openai/codex/pull/23087) 5. [5 of 7] [Replace OverrideTurnContext with ThreadSettings](https://github.com/openai/codex/pull/22508) (this PR) 6. [6 of 7] [Add app-server thread settings API](https://github.com/openai/codex/pull/22509) 7. [7 of 7] [Sync TUI thread settings](https://github.com/openai/codex/pull/22510)
Eric Traut ·
2026-05-18 21:03:51 -07:00 -
[3 of 7] Remove UserTurn (#23075)
**Stack position:** [3 of 7] ## Summary This PR finishes the input-op consolidation by moving the remaining `Op::UserTurn` callers onto `Op::UserInput` and deleting `Op::UserTurn`. This touches a lot of files, but it is a low-risk mechanical migration. ## Stack 1. [1 of 7] [Add thread settings to UserInput](https://github.com/openai/codex/pull/23080) 2. [2 of 7] [Remove UserInputWithTurnContext](https://github.com/openai/codex/pull/23081) 3. [3 of 7] [Remove UserTurn](https://github.com/openai/codex/pull/23075) (this PR) 4. [4 of 7] [Placeholder for OverrideTurnContext cleanup](https://github.com/openai/codex/pull/23087) 5. [5 of 7] [Replace OverrideTurnContext with ThreadSettings](https://github.com/openai/codex/pull/22508) 6. [6 of 7] [Add app-server thread settings API](https://github.com/openai/codex/pull/22509) 7. [7 of 7] [Sync TUI thread settings](https://github.com/openai/codex/pull/22510)
Eric Traut ·
2026-05-18 19:56:00 -07:00 -
[codex] Remove legacy shell output formatting paths (#22706)
## Why The client and tool pipeline still carried compatibility code for legacy structured shell output. Current shell and apply_patch responses are already plain text for model consumption, so keeping a JSON-serialization path plus shell-item rewrite logic makes the request formatter and tests preserve a format we do not need anymore. ## What Changed - Removed the client-side shell output rewrite from `core/src/client_common.rs`. - Removed the structured exec-output formatter and the shell `freeform` switch so tool emitters use one model-facing formatter. - Collapsed apply_patch/shell serialization tests around the remaining plain-text output expectations and removed duplicate one-variant parameterized cases. - Kept the `ApplyPatchModelOutput::ShellCommandViaHeredoc` compatibility input shape, but no longer treats it as a separate output-format mode. ## Validation - `cargo test -p codex-core client_common` - `cargo test -p codex-core shell_serialization` - `cargo test -p codex-core apply_patch_cli` - `just fix -p codex-core` ## Documentation No external Codex documentation update is needed.
pakrym-oai ·
2026-05-18 09:57:54 -07:00 -
Add `user_input_requested_during_turn` to MCP turn metadata (#22237)
## Why - Similar change to https://github.com/openai/codex/pull/21219 - Without change: MCP tool calls receive `_meta["x-codex-turn-metadata"]` with various key values. - Issue: MCP servers currently do not know whether user input was requested during the turn (e.g., the model decides to prompt the user for approval mid-turn before making a possibly risky tool call). MCP servers may want to know this when tracking latency metrics, because latency is inflated in these turns. ## What Changed - With change: MCP turn metadata now includes `user_input_requested_during_turn` when a model-visible `request_user_input` call happened earlier in the turn, propagated in `_meta["x-codex-turn-metadata"]`. - `mark_turn_user_input_requested()` is called when user input is requested through either MCP elicitation (`mcp.rs`) or the `request_user_input` tool (`mod.rs`). - MCP tool call `_meta` is now built immediately before execution (`mcp_tool_call.rs`) so user input requested earlier in the same turn, including within the same tool call via elicitation, is reflected in the metadata. - Normal `/responses` turn metadata headers are unchanged. ## Verification - `codex-rs/core/src/session/mcp_tests.rs` - `codex-rs/core/src/tools/handlers/request_user_input_tests.rs` - `codex-rs/core/src/turn_metadata_tests.rs` - `codex-rs/core/tests/suite/search_tool.rs`
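A sketch of the per-turn flag, assuming hypothetical `TurnState` and `McpTurnMetadata` types and that the field is omitted, rather than sent as `false`, when no input was requested. The point it illustrates is ordering: the metadata is built right before the MCP call, so an earlier `mark_turn_user_input_requested()` is visible.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical per-turn state shared by the turn's tool handlers.
#[derive(Default)]
struct TurnState {
    user_input_requested: AtomicBool,
}

#[derive(Debug, PartialEq)]
struct McpTurnMetadata {
    turn_id: String,
    user_input_requested_during_turn: Option<bool>,
}

impl TurnState {
    /// Called from both MCP elicitation and the `request_user_input` tool.
    fn mark_turn_user_input_requested(&self) {
        self.user_input_requested.store(true, Ordering::Relaxed);
    }

    /// Built immediately before an MCP call executes, not when the turn starts.
    fn mcp_turn_metadata(&self, turn_id: &str) -> McpTurnMetadata {
        let requested = self.user_input_requested.load(Ordering::Relaxed);
        McpTurnMetadata {
            turn_id: turn_id.to_string(),
            user_input_requested_during_turn: requested.then_some(true),
        }
    }
}
```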
mchen-oai ·
2026-05-15 01:26:50 +00:00 -
Remove SSE fixture loaders (#22684)
## Why The Responses API test support already has structured SSE event builders. Keeping separate JSON fixture loaders made small mock streams harder to read and left an on-disk fixture for a single event. ## What changed - Removed `load_sse_fixture` and `load_sse_fixture_with_id_from_str` from `core_test_support`. - Deleted the only remaining Responses API fixture, `tests/fixtures/incomplete_sse.json`. - Replaced the remaining call sites with `responses::sse(...)` and existing event helpers. ## Validation - `cargo test -p codex-core --test all stream_no_completed::retries_on_early_close` - `cargo test -p codex-core --test all history_dedupes_streamed_and_final_messages_across_turns` - `cargo test -p codex-core --test all review::`
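For readers unfamiliar with the builder style, here is a minimal sketch of what an inline SSE builder replaces a JSON fixture with. It assumes each event is rendered in the standard Server-Sent Events wire format (`event:` line, `data:` line, blank-line terminator); `sse` and `ev_created` here are hypothetical stand-ins, not the real `responses::sse(...)` signature or the existing event helpers.

```rust
/// Hypothetical event helper: returns the SSE event type and its JSON payload.
fn ev_created(id: &str) -> (&'static str, String) {
    (
        "response.created",
        format!("{{\"type\":\"response.created\",\"response\":{{\"id\":\"{id}\"}}}}"),
    )
}

/// Hypothetical stand-in for an SSE body builder: renders each
/// (event type, JSON payload) pair as one SSE event.
fn sse(events: &[(&str, String)]) -> String {
    let mut body = String::new();
    for (kind, data) in events {
        body.push_str("event: ");
        body.push_str(kind);
        body.push('\n');
        body.push_str("data: ");
        body.push_str(data);
        // A blank line terminates each event in the SSE format.
        body.push_str("\n\n");
    }
    body
}

fn main() {
    // A stream that ends without a completion event, written inline in the
    // test instead of being loaded from a JSON file on disk.
    let body = sse(&[ev_created("resp_1")]);
    print!("{body}");
}
```

Keeping the stream next to the assertion that depends on it is the readability gain the commit describes.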
pakrym-oai ·
2026-05-15 00:40:32 +00:00 -
chore(features) rm Feature::ApplyPatchFreeform (#22711)
## Summary Removes the feature, since it is effectively on by default in every case where it should be used, and can otherwise be configured via `models.json`. ## Testing - [x] unit tests pass
Dylan Hurd ·
2026-05-14 16:15:56 -07:00 -
Fix remote environment test fixtures (#22572)
## Why The Docker remote-env coverage was failing before it reached the behavior those tests are meant to exercise. The remote-aware test fixture only registered the remote environment, so tests that intentionally select both `local` and `remote` could not start a turn. After that was fixed, two tests exposed stale fixtures: the approval test was auto-approving under workspace-write, and the remote `view_image` test was writing invalid PNG bytes. ## What Changed - Added `EnvironmentManager::create_for_tests_with_local(...)` so tests can keep the provider default while also selecting `local` explicitly. - Updated `build_remote_aware()` to use that test-only manager when a remote exec-server URL is present. - Changed the remote apply-patch approval helper to use `SandboxPolicy::new_read_only_policy()` so the test actually exercises approval caching per environment. - Replaced the hardcoded remote `view_image` PNG blob with the existing `png_bytes(...)` helper so the test uses a valid image fixture. ## Validation Ran these isolated Docker remote-env tests on the devbox with `$remote-tests` setup: - `suite::remote_env::apply_patch_freeform_routes_to_selected_remote_environment` - `suite::remote_env::apply_patch_approvals_are_remembered_per_environment` - `suite::remote_env::apply_patch_intercepted_exec_command_routes_to_selected_remote_environment` - `suite::remote_env::exec_command_routes_to_selected_remote_environment` - `suite::view_image::view_image_routes_to_selected_remote_environment` All five pass.
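The approval-test fix hinges on the policy actually producing a prompt: if writes are auto-approved, the per-environment cache is never consulted. A minimal sketch of that interaction, assuming approvals are remembered per (environment, path) pair; `ApprovalCache`, `Policy`, and `needs_prompt` are hypothetical and the policy logic is simplified, not the real Codex sandbox or approval code.

```rust
use std::collections::HashSet;

/// Simplified stand-ins for sandbox policies (hypothetical).
#[derive(Clone, Copy)]
enum Policy {
    ReadOnly,
    WorkspaceWrite,
}

/// Hypothetical cache: approvals are keyed by environment and path, so an
/// approval on `remote` does not carry over to `local`.
#[derive(Default)]
struct ApprovalCache {
    approved: HashSet<(String, String)>,
}

impl ApprovalCache {
    fn remember(&mut self, env: &str, path: &str) {
        self.approved.insert((env.to_string(), path.to_string()));
    }

    fn is_approved(&self, env: &str, path: &str) -> bool {
        self.approved.contains(&(env.to_string(), path.to_string()))
    }
}

/// Returns true when the user must be asked before applying a patch.
fn needs_prompt(policy: Policy, cache: &ApprovalCache, env: &str, path: &str) -> bool {
    match policy {
        // Simplification: workspace writes are auto-approved, so the cache is
        // never consulted and a caching test would pass vacuously.
        Policy::WorkspaceWrite => false,
        // Under read-only, every write needs approval unless already remembered.
        Policy::ReadOnly => !cache.is_approved(env, path),
    }
}

fn main() {
    let mut cache = ApprovalCache::default();
    // First patch on `remote` prompts; approving it is remembered.
    assert!(needs_prompt(Policy::ReadOnly, &cache, "remote", "a.txt"));
    cache.remember("remote", "a.txt");
    assert!(!needs_prompt(Policy::ReadOnly, &cache, "remote", "a.txt"));
    // The same path on `local` still prompts: approvals are per environment.
    assert!(needs_prompt(Policy::ReadOnly, &cache, "local", "a.txt"));
    // Under workspace-write nothing prompts, which hid the behavior before.
    assert!(!needs_prompt(Policy::WorkspaceWrite, &cache, "local", "a.txt"));
    println!("ok");
}
```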
starr-openai ·
2026-05-14 12:40:01 -07:00