## Summary
- add the `code_mode_host` feature flag and select
`ProcessOwnedCodeModeSessionProvider` in `CodeModeService` when enabled
- initialize code-mode sessions lazily so a missing host reports a tool
error without failing thread startup
- resolve `codex-code-mode-host` beside the running Codex binary by
default while preserving `CODEX_CODE_MODE_HOST_PATH` as an override
- add unit and end-to-end coverage for host resolution and graceful
missing-host behavior
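
The resolution order described above can be sketched as follows. This is a minimal illustration, not the PR's implementation: `resolve_host_path` and its signature are hypothetical, and the caller is assumed to pass in the value of `CODEX_CODE_MODE_HOST_PATH` and the running executable's path.

```rust
use std::path::{Path, PathBuf};

/// Helper binary name, as given in the summary above.
const HOST_BINARY: &str = "codex-code-mode-host";

/// Hypothetical helper: prefer the override (the value of
/// `CODEX_CODE_MODE_HOST_PATH`), otherwise look beside the running binary.
/// An empty override is treated as unset, which is an assumption.
fn resolve_host_path(override_path: Option<&str>, current_exe: &Path) -> Option<PathBuf> {
    if let Some(path) = override_path.filter(|p| !p.is_empty()) {
        return Some(PathBuf::from(path));
    }
    // `parent()` is `None` for a root or empty path.
    current_exe.parent().map(|dir| dir.join(HOST_BINARY))
}
```

The sketch only computes a candidate path. Checking that the file exists is left to the lazy session initialization, so a missing host surfaces as a tool error rather than a startup failure. Platform-specific suffixes such as `.exe` are not handled here.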
## Why
This wires the process-owned session client from #30112 into the core
service behind an opt-in rollout gate. Packaged Codex installations can
place the helper in the same `bin` directory as the main executable
without relying on `PATH`, while development and custom installations
can continue to override the helper path.
## Stack
- Depends on #30112
- Base branch: `cconger/process-owned-session-runtime-4-client`
## Validation
Build `codex` and `codex-code-mode-host`
`CODEX_CODE_MODE_HOST_PATH="$PWD/target/debug/codex-code-mode-host"
./target/debug/codex --enable code_mode_host`
## Why
MCP authentication has distinct OAuth and ChatGPT-session flows.
Representing that choice as `use_chatgpt_auth` makes one flow implicit
and allows the configuration model to express the distinction only
through a boolean.
ChatGPT credential forwarding also needs a first-party trust boundary. A
configurable `chatgpt_base_url` controls routing, but must not grant an
MCP server permission to receive session credentials.
This change builds on #29733, where the boolean was introduced.
## What changed
- Replace `use_chatgpt_auth` with an `auth` field backed by the
exhaustive `McpServerAuth` enum.
- Support `auth = "oauth"` and `auth = "chatgpt"`, with OAuth remaining
the default.
- Trust only the origin derived from the existing hardcoded
`CHATGPT_CODEX_BASE_URL` when granting ChatGPT auth to an MCP server.
- Keep configured bearer tokens and authorization headers ahead of the
selected authentication flow.
- Update config writers, schema output, fixtures, and integration-test
setup to use the enum.
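
A rough sketch of the enum-backed `auth` field, assuming only what is listed above: two values, with OAuth as the default. The variant names, the `FromStr` implementation, and `parse_auth` are illustrative; the real `McpServerAuth` is presumably deserialized by the config layer.

```rust
use std::str::FromStr;

/// Sketch of the auth selector; the real `McpServerAuth` may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum McpServerAuth {
    #[default]
    OAuth,
    ChatGpt,
}

impl FromStr for McpServerAuth {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Exhaustive over the two documented values; anything else is an error.
        match s {
            "oauth" => Ok(Self::OAuth),
            "chatgpt" => Ok(Self::ChatGpt),
            other => Err(format!("unknown auth value: {other}")),
        }
    }
}

/// Hypothetical helper: an omitted `auth` key falls back to OAuth.
fn parse_auth(value: Option<&str>) -> Result<McpServerAuth, String> {
    match value {
        Some(s) => s.parse(),
        None => Ok(McpServerAuth::default()),
    }
}
```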
## Verification
Integration coverage exercises the complete streamable HTTP startup path
in two independent configurations:
- A directly constructed MCP configuration verifies that matching an
overridden `chatgpt_base_url` does not grant ChatGPT auth.
- A persisted `config.toml` containing an attacker-controlled
`chatgpt_base_url` and `auth = "chatgpt"` verifies the same boundary
through normal config parsing.
Both tests complete MCP initialization and tool listing and assert that
the full captured request sequence contains no authorization headers.
Separate integration coverage verifies that configured authorization
takes precedence over ChatGPT auth.
## Why
ChatGPT session authentication was inferred from the reserved Codex Apps
server name. That couples credential routing to Codex Apps-specific
behavior and prevents other MCP endpoints hosted by ChatGPT from
explicitly using the current session.
The opt-in also needs a clear security boundary: an arbitrary MCP
configuration must not be able to redirect ChatGPT credentials to
another origin.
## What changed
- Add `use_chatgpt_auth` to HTTP MCP server configuration, defaulting to
`false`.
- Honor the setting only when the parsed server URL has the same HTTP(S)
origin as the configured `chatgpt_base_url`; otherwise remove the
capability before startup.
- Resolve bearer tokens and static or environment-backed authorization
headers before selecting authentication, with configured authorization
taking precedence over ChatGPT session auth.
- Enable the setting for the built-in Codex Apps and hosted plugin
runtime endpoints while keeping Codex Apps caching and tool
normalization scoped to the reserved server.
- Persist the setting through MCP config rewrite paths and expose it in
the generated config schema.
- Load the current login state for `codex mcp list` so reported auth
status matches runtime behavior.
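
The precedence and same-origin rules above can be sketched as below. Everything here is illustrative: `origin` is a deliberately simplified parser (no userinfo, no IPv6 literals) standing in for a real URL library, and `Auth` and `select_auth` are hypothetical names.

```rust
/// Simplified origin as (scheme, host, port). Handles only plain
/// `http(s)://host[:port]/...` URLs; anything else yields `None`.
fn origin(url: &str) -> Option<(String, String, u16)> {
    let (scheme, rest) = url.split_once("://")?;
    let scheme = scheme.to_ascii_lowercase();
    let default_port: u16 = match scheme.as_str() {
        "http" => 80,
        "https" => 443,
        _ => return None,
    };
    let authority = rest.split(|c: char| matches!(c, '/' | '?' | '#')).next()?;
    if authority.contains('@') || authority.contains('[') {
        return None; // out of scope for this sketch
    }
    let (host, port) = match authority.split_once(':') {
        Some((host, port)) => (host, port.parse::<u16>().ok()?),
        None => (authority, default_port),
    };
    if host.is_empty() {
        return None;
    }
    Some((scheme, host.to_ascii_lowercase(), port))
}

#[derive(Debug, PartialEq)]
enum Auth {
    Configured(String),
    ChatGptSession,
    None,
}

/// Configured authorization wins; ChatGPT session auth needs both the
/// opt-in and a matching origin; otherwise no credentials are attached.
fn select_auth(
    configured: Option<String>,
    use_chatgpt_auth: bool,
    server_url: &str,
    chatgpt_base_url: &str,
) -> Auth {
    if let Some(header) = configured {
        return Auth::Configured(header);
    }
    let same_origin = match (origin(server_url), origin(chatgpt_base_url)) {
        (Some(a), Some(b)) => a == b,
        _ => false,
    };
    if use_chatgpt_auth && same_origin {
        Auth::ChatGptSession
    } else {
        Auth::None
    }
}
```

An unparseable URL on either side is treated as a mismatch, so the sketch fails closed.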
## Verification
Core integration coverage exercises the complete streamable HTTP MCP
startup path and verifies that:
- a same-origin opted-in server receives the current ChatGPT access
token;
- an explicitly configured authorization header takes precedence;
- a different-origin server completes MCP initialization and tool
listing without receiving any ChatGPT authorization header.
## Summary
- move sleep tool enablement from top-level `[features].sleep_tool` to
`[features.current_time_reminder].sleep_tool`
- remove the standalone `Feature::SleepTool` flag and gate `clock.sleep`
from resolved current-time configuration
- update config schema, config-lock materialization, and existing sleep
coverage
Stacked on #29907.
## Why
With `model_auto_compact_token_limit_scope = "body_after_prefix"`, the
persistent prefix should not count against the active body window.
`get_context_remaining` and the token-budget reminder should report the
same usable body-after-prefix window that auto-compaction uses, rather
than the total token count since the session began.
This is stacked on #29664 so the mechanical move from `turn.rs` is
isolated from the behavior fix.
## What
- Extends `ContextWindowTokenStatus` with `context_remaining_tokens`.
- Updates `get_context_remaining` to use the shared context-window
accounting.
- Adds integration coverage for body-after-prefix reminder timing and
`get_context_remaining` output.
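
As a rough illustration of the body-after-prefix accounting, the remaining budget might be computed as below. The function name, parameters, and exact formula are assumptions; the PR text only states that the persistent prefix should not count against the body window.

```rust
/// Hypothetical accounting: tokens left in the body window once the
/// persistent prefix is excluded. Saturates at zero instead of underflowing.
fn context_remaining_tokens(body_limit: u64, total_tokens: u64, prefix_tokens: u64) -> u64 {
    let body_tokens = total_tokens.saturating_sub(prefix_tokens);
    body_limit.saturating_sub(body_tokens)
}
```

With a 1,000-token body limit, 700 total tokens, and a 500-token prefix, only 200 tokens count against the window, leaving 800.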
## Testing
- `just test -p codex-core body_after_prefix_window`
- `just test -p codex-core auto_compact_body_after_prefix`
- `just fix -p codex-core`
## Summary
- rename `reminder_interval_model_requests` to
`reminder_interval_seconds`
- read the configured time provider before every model request and
inject a reminder only after the configured number of seconds has
elapsed
- preserve immediate first delivery and forced delivery after compaction
changes the context window
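
The delivery rule can be sketched as a small state machine. `ReminderState` and its methods are hypothetical, and timestamps are plain seconds supplied by the caller in place of the configured time provider.

```rust
/// Sketch of the reminder cadence; names and fields are illustrative.
struct ReminderState {
    interval_seconds: u64,
    last_sent_at: Option<u64>,
    force_next: bool,
}

impl ReminderState {
    fn new(interval_seconds: u64) -> Self {
        Self { interval_seconds, last_sent_at: None, force_next: false }
    }

    /// Called when compaction changes the context window.
    fn mark_compacted(&mut self) {
        self.force_next = true;
    }

    /// Called before each model request with the current time in seconds.
    fn should_remind(&mut self, now: u64) -> bool {
        let due = match self.last_sent_at {
            None => true, // immediate first delivery
            Some(last) => now.saturating_sub(last) >= self.interval_seconds,
        };
        if due || self.force_next {
            self.last_sent_at = Some(now);
            self.force_next = false;
            true
        } else {
            false
        }
    }
}
```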
## Tests
- `just test -p codex-core current_time_reminder`
## What
- make Fjord's centralized response-item image preparation unconditional
for new and resumed history
- have local user images and `view_image` outputs always defer decoding
and resizing to that path
- retain `resize_all_images` as an ignored, removed compatibility key
for released clients
- delete the flag-off producer paths and obsolete policy-specific tests
## Why
Centralized preparation is now the intended image path. Keeping the
runtime feature checks also kept two image-processing implementations
alive and allowed client config to select the legacy behavior.
This is a clean replacement for #28975, rebuilt from the latest `main`.
## How
`prepare_response_items` now runs whenever items enter history and
whenever persisted history is reconstructed. Producers emit deferred
image data, so malformed images become the existing model-visible
placeholder instead of failing the session at the producer.
## Test plan
- `just fmt`
- `just fix -p codex-core -p codex-features`
- `just test -p codex-features` — 52 passed
- focused affected `codex-core` set — 20 passed
- `just test -p codex-core handle_accepts_explicit_high_detail` — 1
passed
- full `just test -p codex-core` attempt — 2,723 passed; 88 unrelated
environment failures from read-only `~/.codex` SQLite state and
unavailable integration helper binaries
## Summary
- Add `web_search = "indexed"` alongside `disabled`, `cached`, and
`live`.
- Use that same resolved mode for both hosted and standalone web search.
- For hosted search, send `index_gated_web_access: true` with external
web access enabled only when `indexed` is selected.
- For standalone search, preserve the existing boolean wire values for
existing modes (`cached` maps to `false` and `live` to `true`) and send
`"indexed"` only for `indexed`; `disabled` keeps the tool unavailable.
- Carry the mode through managed configuration requirements and
generated schemas.
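
The mode-to-wire mapping listed above can be sketched as follows. The type and function names are illustrative; only the mapping itself comes from the summary.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WebSearchMode {
    Disabled,
    Cached,
    Live,
    Indexed,
}

/// Standalone search keeps booleans for the existing modes and sends a
/// string only for `indexed`.
#[derive(Debug, PartialEq, Eq)]
enum StandaloneWire {
    Bool(bool),
    Mode(&'static str),
}

/// `None` means the tool stays unavailable.
fn standalone_wire_value(mode: WebSearchMode) -> Option<StandaloneWire> {
    match mode {
        WebSearchMode::Disabled => None,
        WebSearchMode::Cached => Some(StandaloneWire::Bool(false)),
        WebSearchMode::Live => Some(StandaloneWire::Bool(true)),
        WebSearchMode::Indexed => Some(StandaloneWire::Mode("indexed")),
    }
}

/// Hosted search sets `index_gated_web_access` only for `indexed`.
fn hosted_index_gated(mode: WebSearchMode) -> bool {
    mode == WebSearchMode::Indexed
}
```

Deriving both values from one resolved mode is what keeps the hosted and standalone executors from drifting apart.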
## Why
Indexed search provides a middle ground between cached-only search and
unrestricted live page fetching. Search queries can remain live while
direct page fetches are limited to URLs admitted by the server.
The existing `web_search` setting remains the single source of truth, so
hosted and standalone executors cannot drift into different access
modes. Without an explicit `indexed` selection, the existing
model-visible tool and request shapes are unchanged.
```toml
web_search = "indexed"
[features]
standalone_web_search = true
```
## Validation
- `just fmt`
- `just test -p codex-api` (`126 passed`)
- `just test -p codex-web-search-extension` (`7 passed`)
- `just test -p codex-core
code_mode_can_call_indexed_standalone_web_search` (`1 passed`)
- Focused configuration, hosted request, standalone request, and
managed-requirement coverage is included in the PR; remaining suites run
in CI.
The full workspace test suite was not run locally.
## Summary
- expose `clock.curr_time` when current-time reminders are enabled
- query the session's configured time provider with the calling thread
id
- return the existing UTC reminder text for direct model calls
- return `{ "current_time": "YYYY-MM-DD HH:MM:SS UTC" }` in Code Mode
Clock lookup failures remain fatal, matching pre-inference reminder
behavior.
## Testing
- `just test -p codex-core current_time_tool_returns_the_latest_time`
- `just test -p codex-core
code_mode_current_time_returns_structured_result`
- `just fix -p codex-core`
## Summary
Code mode has two separate truncation points: the nested tool result
returned to JavaScript and the code-mode output later recorded for the
model. These tests now verify those behaviors independently.
- Report whether `result.output` was truncated before printing it.
- Verify omitted or sufficiently large nested limits produce `Variable
truncated: False`, while allowing the printed value to be truncated
downstream.
- Verify an explicit nested limit produces `Variable truncated: True`
when the command output exceeds it.
- Use a token-policy model fixture so downstream truncation is visible
as `…N tokens truncated…`.
- Align the explicit nested-truncation expectation with the warning
header.
This PR changes test coverage only; runtime truncation behavior is
unchanged.
## Validation
- `env -u CODEX_SANDBOX_NETWORK_DISABLED RUST_MIN_STACK=8388608 cargo
test -p codex-core --test all code_mode_exec -- --nocapture` (8 passed)
This PR establishes the intended behavior as an executable contract
before a refactor of the cell runtime begins. It also fixes cases where
a second observer or termination request could replace an existing
response channel and leave the original caller unresolved.
### Behavior codified
- A cell can yield output and subsequently resume to completion.
- A caller can run a cell until it has no immediately runnable work,
receive its accumulated output and outstanding tool-call IDs, and then
resume the same cell when the awaited work is available.
- Each cell admits one active observer:
- a second observer receives an explicit busy error
- the existing observer remains registered and is not displaced
- A natural result (the JS module running to completion) that has
already reached the cell controller wins over a later termination
request.
- Otherwise, termination preempts execution and resolves both:
- the active observer, if present
- the caller requesting termination
- Repeated termination requests are rejected while termination is
already in progress.
- Terminal responses are sent only after outstanding callback work has
been handled:
- natural completion drains notifications and cancels outstanding tool
calls
- termination cancels and drains both notification and tool callbacks.
- Cell removal and the `cell_closed` notification happen after callback
cleanup.
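
The single-observer rule from the list above can be sketched as below. `CellSketch`, the integer observer id, and `ObserveError` are hypothetical stand-ins for the real response channels.

```rust
#[derive(Debug, PartialEq, Eq)]
enum ObserveError {
    Busy,
}

/// Sketch of a cell that admits one active observer at a time.
#[derive(Default)]
struct CellSketch {
    observer: Option<u32>,
}

impl CellSketch {
    /// A second observer is rejected instead of replacing the first.
    fn observe(&mut self, id: u32) -> Result<(), ObserveError> {
        if self.observer.is_some() {
            return Err(ObserveError::Busy);
        }
        self.observer = Some(id);
        Ok(())
    }

    /// Resolving the cell releases the registered observer, if any.
    fn resolve(&mut self) -> Option<u32> {
        self.observer.take()
    }
}
```

Rejecting the second caller, rather than overwriting the stored channel, is what keeps the original caller from being left unresolved.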
The workspace denies `clippy::expect_used` in production. Although
`clippy.toml` allows `expect` in tests, Bazel Clippy compiles
integration-test helper code in a way that does not receive that
exemption, which encouraged verbose `unwrap_or_else(... panic!(...))`
and equivalent `match`/`let else` forms.
This allows `clippy::expect_used` once at each integration-test crate
root (including aggregated suites and test-support libraries), then
replaces manual panic-based Result and Option unwraps with
`expect`/`expect_err`. Standalone `tests/*.rs` files remain their own
crate roots. Intentional assertion and unexpected-variant panics remain
unchanged, and the production `expect_used = "deny"` lint remains in
place.
The cleanup is mechanical and net-negative in line count.
## Why
We want to exercise a Linux app-server against a Windows exec-server
without having to repeat every test case. This approach has some
precedent in the remote Docker test setup.
## What
Run the shared `codex-core` integration suite against Windows
exec-server behavior from Linux. This makes cross-OS path and shell
regressions visible while keeping unsupported cases owned by individual
tests.
- Add `local`, `docker`, and `wine-exec` test environment selection with
legacy Docker compatibility.
- Extend `codex_rust_crate` to generate a sharded Wine-exec variant
using a cross-built Windows server and pinned Bazel Wine/PowerShell
runtimes.
- Teach remote-aware helpers about Windows paths and track temporary
incompatibilities with source-local `skip_if_wine_exec!` calls and
follow-up reasons.
Follow-up to #27356.
## Stack note
This PR changes Codex's internal dynamic-tool shape while leaving
`thread/start` unchanged. App-server therefore converts the existing
per-tool input into explicit functions and namespaces before passing it
to core.
[#27371](https://github.com/openai/codex/pull/27371) updates
`thread/start` to use the same explicit shape and removes this temporary
conversion.
## Why
Dynamic tools repeat namespace metadata on every function. Core should
keep one explicit namespace with its member tools so descriptions and
membership stay consistent across sessions and runtime planning.
## What changed
- Represent dynamic tools as top-level functions or explicit namespaces
in protocol and session state.
- Read old flat rollout metadata and write the canonical hierarchy.
- Flatten namespace members only when registering callable tools.
- Keep `thread/start.dynamicTools` flat for now and normalize it at the
app-server boundary.
New builds can read old rollout metadata. Older builds cannot read newly
written hierarchical metadata.
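
A minimal sketch of normalizing the legacy flat shape into the explicit hierarchy. The struct and field names are hypothetical, and so are two policy choices: top-level functions are emitted first, and the first description seen for a namespace wins.

```rust
use std::collections::BTreeMap;

/// Legacy flat entry: namespace metadata repeated on every function.
struct LegacyTool {
    name: String,
    namespace: Option<String>,
    namespace_description: Option<String>,
}

#[derive(Debug, PartialEq)]
enum DynamicTool {
    Function { name: String },
    Namespace { name: String, description: Option<String>, tools: Vec<String> },
}

fn normalize(legacy: Vec<LegacyTool>) -> Vec<DynamicTool> {
    let mut out = Vec::new();
    let mut namespaces: BTreeMap<String, (Option<String>, Vec<String>)> = BTreeMap::new();
    for tool in legacy {
        match tool.namespace {
            None => out.push(DynamicTool::Function { name: tool.name }),
            Some(ns) => {
                let entry = namespaces.entry(ns).or_default();
                if entry.0.is_none() {
                    entry.0 = tool.namespace_description;
                }
                entry.1.push(tool.name);
            }
        }
    }
    // Namespaces follow the top-level functions, sorted by name.
    out.extend(namespaces.into_iter().map(|(name, (description, tools))| {
        DynamicTool::Namespace { name, description, tools }
    }));
    out
}
```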
## Test plan
- `just test -p codex-app-server
thread_start_normalizes_legacy_dynamic_tools_into_model_request`
- `just test -p codex-protocol
session_meta_normalizes_legacy_dynamic_tools`
- `just test -p codex-core
resume_restores_dynamic_tools_from_rollout_with_sqlite_enabled`
- `just test -p codex-core
tool_search_returns_deferred_dynamic_tool_and_routes_follow_up_call`
- `just test -p codex-core code_mode_can_call_hidden_dynamic_tools`
- `just test -p codex-tools`
## Summary
- reject HTTP(S) image URLs from the shared code-mode output-image
normalization path
- return a concise model-visible tool error so the model can recover on
its next turn
- apply the targeted rejection to both `image()` and `generatedImage()`
- leave other non-empty image URL values to existing downstream handling
The returned error is:
> Tool call failed: remote image URLs are not supported in tool outputs.
Pass a base64 data URI instead
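
The targeted rejection can be sketched as below. `check_image_url` is a hypothetical name, and matching the scheme case-insensitively is an assumption; the error text is copied from the message quoted above.

```rust
const REMOTE_IMAGE_ERROR: &str = "Tool call failed: remote image URLs are not supported in tool outputs. Pass a base64 data URI instead";

/// Reject only HTTP(S) URLs; every other non-empty value is passed through
/// to downstream handling unchanged.
fn check_image_url(url: &str) -> Result<&str, &'static str> {
    let lower = url.to_ascii_lowercase();
    if lower.starts_with("http://") || lower.starts_with("https://") {
        Err(REMOTE_IMAGE_ERROR)
    } else {
        Ok(url)
    }
}
```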
## Why
Responses Lite cannot lower a remote image URL emitted from a structured
tool output. Rejecting HTTP(S) values in the Codex harness preserves the
tool-call metadata and gives the model a recoverable next turn instead
of invalidating the sample.
## Test coverage
The regression is covered primarily by a `test_codex()` agent
integration test that simulates the Responses API exchange and asserts
the failed model-visible exec output. A supplemental runtime test covers
both `http://` and `https://` inputs across both image output helpers.
## Test plan
- `cd codex-rs && just test -p codex-code-mode`
- `cd codex-rs && just test -p codex-code-mode-protocol`
- `cd codex-rs && just test -p codex-core
code_mode_image_helper_rejects_remote_url`
- `cd codex-rs && just fmt`
- `git diff --check origin/main...HEAD`
Related context: https://github.com/openai/openai/pull/1022346
## Why
`request_user_input` has direct blocking semantics when invoked by the
model. When it is exposed as a nested code-mode tool, the call has to
flow through code-mode waiting and continuation behavior instead, which
is not the behavior we want for this user-input request surface.
## What changed
- Mark `request_user_input` with `ToolExposure::DirectModelOnly` when
registering the core utility tool.
- Keep `request_user_input` direct-model visible, including in
code-mode-only planning.
- Add focused `spec_plan_tests` coverage that verifies
`request_user_input` remains visible and registered as
direct-model-only, while it is omitted from the nested code-mode tool
description.
No active goal suppression or runtime unavailability behavior is
included in this PR.
## Validation
- No new build/test run for this housekeeping pass, per maintainer
request.
- Earlier targeted run, confirmed from session context: `just test -p
codex-core request_user_input` passed.
## Why
The token budget full-context fragment identifies the current context
window, but not the thread that owns that window. Including the thread
id makes the initial context-window metadata self-contained, and
`get_context_remaining` also needs to be usable from Code Mode without
forcing callers to parse the model-facing fragment string.
## What changed
- Include the session thread id in the initial `<token_budget>` context
fragment.
- Expose `get_context_remaining` as a Code Mode nested tool while
keeping `new_context` direct-model-only.
- Keep direct model-facing `get_context_remaining` output as the
existing `<token_budget>` text fragment.
- Return only `tokens_left` from the Code Mode structured result for
`get_context_remaining`.
- Update token-budget integration tests and add Code Mode coverage for
the structured result.
## Verification
- `just test -p codex-core token_budget`
- `just test -p codex-core
code_mode_get_context_remaining_returns_structured_result`
- `just test -p core_test_support redacted_text_mode_normalizes_uuids`
## Summary
Adds complete client-side image preparation behind the default-off
`resize_all_images` feature flag.
When enabled, local image producers defer decoding and resizing. Images
are prepared centrally before insertion into conversation history,
covering user input, `view_image`, and structured tool-output images.
## Behavior
- Processes base64 `data:` images in messages and function/custom tool
outputs.
- Leaves non-data URLs, including HTTP(S) URLs, unchanged.
- Applies image-detail budgets:
- `high` and omitted: 2048px maximum dimension and 2.5K 32px patches.
- `original`: 6000px maximum dimension and 10K 32px patches.
- `auto`: uses the same 2048px / 2.5K-patch budget as high.
- `low`: unsupported and replaced with an actionable placeholder.
- Preserves original image bytes when no resize or format conversion is
needed.
- Enforces the shared 1 GiB encoded and decoded data-URL sanity limits.
- Replaces only an image that fails preparation, preserving sibling
content and tool-output metadata.
- Uses bounded placeholders distinguishing generic processing failures,
oversized images, and unsupported `low` detail.
- Prepares resumed and forked history before installing it as live
history without modifying persisted rollouts.
## Flag-Off Behavior
When `resize_all_images` is disabled:
- Existing local user-input and `view_image` processing remains
unchanged.
- Existing decoding and error behavior remains unchanged.
- Arbitrary tool-output images are not processed.
- HTTP(S) image URLs continue to be forwarded unchanged.
#### [git stack](https://github.com/magus/git-stack-cli)
- ✅ `1` https://github.com/openai/codex/pull/27245
- 👉 `2` https://github.com/openai/codex/pull/27247
- ⏳ `3` https://github.com/openai/codex/pull/27246
- ⏳ `4` https://github.com/openai/codex/pull/27266
## Why
Thread cwd and environment selections are a single logical setting in
core: updating one without the other can silently desynchronize the
next-turn execution context. This change makes that relationship
explicit in the internal thread settings flow while preserving the
existing app-server public API shape.
## What changed
- Moved the cwd/environment pair through internal
`ThreadSettingsOverrides.environment_settings` instead of a top-level
internal `cwd` field.
- Kept `thread/settings/update` public params unchanged, with app-server
translating top-level `cwd` into the paired internal settings shape.
- Moved `Op::UserInput` environment overrides into thread settings so
user turns and settings updates use the same core path.
- Updated core, app-server, MCP, memories, sample, and test callsites to
construct the paired settings shape.
## Verification
- `git diff --check`
- Local test run starting after PR creation.
## What
- Consume plaintext `output` from standalone search while retaining
optional `encrypted_output` parsing.
- Expose `web.run` to code mode and return search output to nested
JavaScript calls.
- Cover direct and code-mode standalone search paths with integration
tests.
## Why
`/v1/alpha/search` now returns plaintext output, which code mode needs
to consume standalone search results.
## Test plan
- `just test -p codex-api`
- `just test -p codex-web-search-extension`
- `just test -p codex-core code_mode_can_call_standalone_web_search`
- `just test -p codex-app-server
standalone_web_search_round_trips_output`
## Why
Thread settings cwd overrides are expected to be resolved before they
enter core. Keeping this boundary as a plain `PathBuf` made it easy for
core/session code to keep fallback normalization and relative-path
resolution logic in places that should only receive an already-resolved
cwd.
This is intentionally the absolute-cwd-only slice: it does not change
environment selection stickiness or cwd-to-default-environment fallback
behavior.
## What changed
- Changes `ThreadSettingsOverrides.cwd`,
`CodexThreadSettingsOverrides.cwd`, and `SessionSettingsUpdate.cwd` to
use `AbsolutePathBuf`.
- Removes core-side cwd normalization/resolution from session settings
updates.
- Updates affected core/app-server test helpers and callsites to pass
existing absolute cwd values or use `abs()` helpers.
## Validation
Opening as draft so CI can start while local validation continues.
We cross-build for Windows when using Bazel. This causes a problem with
V8: its `mksnapshot` step expects to generate the snapshot on the host
architecture, which did not match during the cross-build. The mismatch
caused segfaults when starting code mode from a cross-built artifact.
This change cross-builds the library, then generates and links the
snapshot on the host machine and architecture, which is Windows. The
result is a functional snapshot and library that can start code mode on
Windows.
This fixes the build and also fixes two test regressions.
## Why
Research and training setups need to control which tool namespaces
appear inside code mode's nested `tools` surface without disabling those
tools entirely. This makes it possible to train against a deliberately
reduced nested-tool setup while preserving the normal direct and
deferred tool paths.
## What
- Extend `features.code_mode` to accept structured configuration while
preserving the existing boolean syntax.
- Add an exact `excluded_tool_namespaces` list under
`[features.code_mode]`:
```toml
[features.code_mode]
enabled = true
excluded_tool_namespaces = ["mcp__codex_apps", "multi_agent_v1"]
```
- Filter matching canonical `ToolName` namespaces when constructing code
mode's nested router and code-mode-specific direct tool descriptions.
- Keep excluded tools registered, directly exposed in mixed code mode,
and discoverable through top-level `tool_search` when otherwise
eligible.
- Derive deferred nested-tool guidance after namespace filtering so the
`exec` description does not advertise excluded-only deferred tools.
- Preserve the boolean/table representation when materializing config
locks and update the generated config schema.
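
The exact-match namespace filter can be sketched as follows. This `ToolName` is a simplified stand-in for the real type, and `nested_tools` is a hypothetical helper that selects only what the nested router would see.

```rust
/// Simplified stand-in for the canonical tool name.
#[derive(Debug, PartialEq)]
struct ToolName {
    namespace: Option<String>,
    name: String,
}

/// Keep tools whose namespace is not in the excluded list. Matching is
/// exact, so a namespace that merely shares a prefix is retained.
fn nested_tools<'a>(tools: &'a [ToolName], excluded: &[&str]) -> Vec<&'a ToolName> {
    tools
        .iter()
        .filter(|tool| match &tool.namespace {
            Some(ns) => !excluded.contains(&ns.as_str()),
            None => true,
        })
        .collect()
}
```

The input slice is left untouched, which mirrors the point that excluded tools stay registered and reachable through the direct and deferred paths.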
## Testing
- `just test -p codex-features`
- `just test -p codex-config`
- `just test -p codex-core load_config_resolves_code_mode_config`
- `just test -p codex-core
lock_contains_prompts_and_materializes_features`
- `just test -p codex-core
excluded_deferred_namespaces_do_not_enable_nested_tool_guidance`
- `just test -p codex-core
code_mode_excludes_configured_nested_tool_namespaces`
- `cargo check -p codex-thread-manager-sample`
## Summary
Remove the dead experimental `persistExtendedHistory` app-server flag
and collapse rollout persistence to the single policy that app-server
already used.
## What Changed
- Removed `persistExtendedHistory` from v2 thread start/resume/fork
params and deleted its deprecation notice path.
- Removed the persistence-mode enums and plumbing through core, rollout,
and thread-store.
- Made rollout filtering mode-free, keeping the existing limited
persisted-history behavior.
## Test Plan
- `just write-app-server-schema`
- `cargo nextest run --no-fail-fast -p codex-app-server-protocol
schema_fixtures`
- `cargo nextest run --no-fail-fast -p codex-app-server
thread_shell_command_history_responses_exclude_persisted_command_executions`
- `cargo nextest run --no-fail-fast -p codex-rollout -p
codex-thread-store`
- final `rg` for removed flag/type names
Ensures MCP-backed `codex-core` integration tests exercise initialized
servers instead of racing server startup.
I've been idly investigating a few flakes and the failure modes are much
more confusing when a tool call fails because of a failed server start
than when the failed server start causes the test to fail directly.
## Summary
Adds experimental `additionalContext` support to `turn/start` and
`turn/steer` so clients can provide ephemeral external context, such as
browser or automation state, without turning that plumbing into a
visible user prompt or triggering user-prompt lifecycle behavior.
## API Shape
The parameter shape is:
```ts
additionalContext?: Record<string, {
value: string
kind: "untrusted" | "application"
}> | null
```
Example:
```json
{
"additionalContext": {
"browser_info": {
"value": "Active tab is CI failures.",
"kind": "untrusted"
},
"automation_info": {
"value": "CI rerun is in progress.",
"kind": "application"
}
}
}
```
The keys are opaque and caller-defined.
## Context Injection
When provided, accepted entries are inserted into model context as
hidden contextual message items, not as visible thread user-message
items.
`kind: "untrusted"` entries are inserted with role `user`:
```text
<external_${key}>${value}</external_${key}>
```
`kind: "application"` entries are inserted with role `developer`:
```text
<${key}>${value}</${key}>
```
Values are not escaped. Each value is truncated to approximately 1k
tokens before wrapping.
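
The role routing and wrapping described in this section can be sketched as below. `Kind` and `wrap` are illustrative names, and truncation is omitted.

```rust
enum Kind {
    Untrusted,
    Application,
}

/// Returns the message role and the wrapped text for one entry.
/// Values are inserted verbatim, matching the "not escaped" note above.
fn wrap(key: &str, value: &str, kind: Kind) -> (&'static str, String) {
    match kind {
        Kind::Untrusted => ("user", format!("<external_{key}>{value}</external_{key}>")),
        Kind::Application => ("developer", format!("<{key}>{value}</{key}>")),
    }
}
```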
For `turn/start`, accepted additional context is inserted before normal
user input. For `turn/steer`, additional context is merged only when the
steer includes non-empty user input; context-only steers still reject as
empty input.
## Dedupe Strategy
`AdditionalContextStore` lives on session state and stores the latest
complete additional-context map.
Each `turn/start` or non-empty `turn/steer` treats its
`additionalContext` as the current complete set of values. Entries are
injected only when the key is new or the exact entry for that key
changed, including `value` or `kind`. After merging, the store is
replaced with the provided map, so omitted keys are removed from the
retained set and can be injected again later if reintroduced.
Omitting `additionalContext`, passing `null`, or passing an empty object
resets the store to empty and injects nothing.
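
The dedupe rule can be sketched as below. `StoreSketch` and `Entry` are hypothetical stand-ins for `AdditionalContextStore` and its entries, and `kind` is kept as a plain string for brevity.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq, Eq)]
struct Entry {
    value: String,
    kind: String,
}

#[derive(Default)]
struct StoreSketch {
    latest: BTreeMap<String, Entry>,
}

impl StoreSketch {
    /// Returns the keys to inject, then replaces the retained map with the
    /// provided one. `None` stands for an omitted or `null` parameter.
    fn merge(&mut self, provided: Option<BTreeMap<String, Entry>>) -> Vec<String> {
        let provided = provided.unwrap_or_default();
        let changed: Vec<String> = provided
            .iter()
            .filter(|(key, entry)| self.latest.get(*key) != Some(*entry))
            .map(|(key, _)| key.clone())
            .collect();
        self.latest = provided;
        changed
    }
}
```

Because the store is replaced rather than extended, a key that disappears for one turn is injected again when it returns.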
## What Changed
- Threads experimental v2 `additionalContext` through app-server into
core turn start and steer handling.
- Adds separate contextual fragment types for untrusted user-role
context and application developer-role context.
- Uses pending response input items so additional context can be
combined with normal user input without treating it as prompt text.
- Adds integration coverage for start/steer flow, role routing,
dedupe/reset behavior, deletion/re-add behavior, hook-blocked input
behavior, empty context-only steer rejection, external-fragment marker
matching, and truncation.
## Why
The `non_prefixed_mcp_tool_names` feature should be applied where MCP
tools become model-visible, not by remapping names later in core.
Keeping the decision in `McpConnectionManager` construction makes
`ToolInfo` the single shaped view that spec building, deferred tool
search, routing, and unavailable-tool placeholders can consume directly.
This also preserves the existing external behavior while the feature is
off, and keeps the feature-on behavior for code mode and hooks explicit
at the manager boundary.
## What Changed
- Add `McpToolNameMode` to `codex-mcp` and flow it through `McpConfig`
into `McpConnectionManager::new`.
- Normalize MCP `ToolInfo` names in the manager using either
legacy-prefixed namespaces or non-prefixed namespaces; the legacy path
adds `mcp__` without restoring the old trailing namespace suffix.
- Remove the core-side MCP name remapping path so specs, tool search,
session resolution, and unavailable-tool placeholder construction use
the manager-provided `ToolName` values directly.
- Keep code mode flattening on the `__` namespace separator.
- Preserve hook compatibility by giving non-prefixed MCP hook names
legacy `mcp__...` matcher aliases.
- Add/adjust integration and unit coverage for non-prefixed code-mode
behavior, hook matching with the feature on and off, and manager-level
legacy prefixing.
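
As a rough illustration of the two naming modes, assuming the namespace is derived from the server name alone: the variant names, `namespace_for`, and `flatten` are hypothetical, and the server and tool names in the usage note are placeholders.

```rust
#[derive(Clone, Copy)]
enum McpToolNameMode {
    LegacyPrefixed,
    NonPrefixed,
}

/// Namespace the manager would assign to a server's tools in each mode.
fn namespace_for(mode: McpToolNameMode, server: &str) -> String {
    match mode {
        McpToolNameMode::LegacyPrefixed => format!("mcp__{server}"),
        McpToolNameMode::NonPrefixed => server.to_string(),
    }
}

/// Code mode flattens namespace and tool name with the `__` separator.
fn flatten(namespace: &str, tool: &str) -> String {
    format!("{namespace}__{tool}")
}
```

For a server named `docs` with a tool named `search`, the sketch yields `mcp__docs__search` in legacy mode and `docs__search` with the feature on.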
## Testing
- `cargo test -p codex-mcp --lib`
- `cargo test -p codex-core --lib tools::spec::tests -- --nocapture`
- `cargo test -p codex-core --lib mcp_tools -- --nocapture`
- `cargo test -p codex-core --lib mcp_tool_exposure -- --nocapture`
- `cargo test -p codex-core --test all mcp_tool -- --nocapture`
- `cargo test -p codex-core --test all search_tool -- --nocapture`
- `cargo test -p codex-core --test all hooks_mcp -- --nocapture`
- `cargo test -p codex-core --test all
code_mode_uses_non_prefixed_mcp_tool_names_when_feature_enabled --
--nocapture`
- `cargo test -p codex-tools`
- `cargo test -p codex-features`
## Summary
Change code-mode stored value updates to merge writes by key instead of
replacing the session's complete stored-value map after each cell
completes.
Previously, each cell received a snapshot of stored values and returned
the complete resulting map. When multiple cells ran concurrently, a
later completion could overwrite values written by another cell because
it committed an older snapshot.
This change moves stored-value ownership into `CodeModeService`:
- Each runtime starts from the service's current stored values.
- Runtime completion reports only keys written by that cell.
- The service merges those writes into the current stored-value map on
successful completion.
- Core no longer replaces its stored-value state from a cell result.
As a result, concurrently executing cells can update different stored
keys without clobbering one another.
The move into `CodeModeService` is motivated by a follow-up PR that will
tie the stored values' lifetime to a new lifetime object on the service
side.
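
The difference between replacing a snapshot and merging writes can be sketched as below. `ServiceSketch` is a hypothetical stand-in for the stored-value state in `CodeModeService`, and values are plain strings for brevity.

```rust
use std::collections::HashMap;

#[derive(Default)]
struct ServiceSketch {
    stored: HashMap<String, String>,
}

impl ServiceSketch {
    /// Each runtime starts from a copy of the current values.
    fn snapshot(&self) -> HashMap<String, String> {
        self.stored.clone()
    }

    /// On successful completion, only the keys written by that cell are
    /// merged, so values written by other cells survive.
    fn commit_writes(&mut self, writes: HashMap<String, String>) {
        self.stored.extend(writes);
    }
}
```

Two cells that start from the same snapshot and write different keys both end up reflected in `stored`, regardless of which completes last.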
## Summary
- route each configured MCP server through an explicit per-server
`environment_id` instead of a manager-wide remote toggle
- default omitted `environment_id` to `local`, resolve named ids through
`EnvironmentManager`, and fail only the affected MCP server when an
explicit id is unknown
- keep local stdio on the existing local launcher path for now, while
named-environment stdio uses the selected environment backend and
requires an absolute `cwd`
- allow local HTTP MCP servers to keep using the ambient HTTP client
when no local `Environment` is configured; named-environment HTTP MCPs
use that environment's HTTP client
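
The per-server resolution can be sketched as below. `Resolved` and `resolve_environment` are hypothetical, the `known` slice stands in for a lookup through `EnvironmentManager`, and treating an explicit `"local"` the same as an omitted id is an assumption.

```rust
#[derive(Debug, PartialEq)]
enum Resolved<'a> {
    Local,
    Named(&'a str),
}

/// Resolve one server's `environment_id`. An error here is meant to fail
/// only that server, not the whole manager.
fn resolve_environment<'a>(
    environment_id: Option<&'a str>,
    known: &[&str],
) -> Result<Resolved<'a>, String> {
    match environment_id {
        None | Some("local") => Ok(Resolved::Local),
        Some(id) if known.contains(&id) => Ok(Resolved::Named(id)),
        Some(id) => Err(format!("unknown environment id: {id}")),
    }
}
```

Calling this once per configured server and keeping each `Result` separate gives the isolation described above: one unknown id does not prevent the other servers from starting.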
## Validation
- devbox Bazel build: `bazel build --bes_backend= --bes_results_url=
//codex-rs/cli:codex //codex-rs/rmcp-client:test_stdio_server
//codex-rs/rmcp-client:test_streamable_http_server`
- devbox app-server config matrix with real `config.toml` /
`environments.toml` files covering omitted local, explicit local,
omitted local under remote default, explicit remote stdio, local HTTP
without local env, explicit remote HTTP, local stdio without local env,
unknown explicit env, and remote stdio without `cwd`
## Why
Code mode can use nested unified exec calls as data sources. When those
calls omit `max_output_tokens`, code mode should receive raw command
output so the script can parse or summarize it itself. When code mode
does provide `max_output_tokens`, that explicit nested budget should be
respected, including values above the default unified exec limit, rather
than being capped before code mode sees the result.
## What
- Preserve direct unified exec truncation behavior, while letting
code-mode exec/write_stdin keep `max_output_tokens` as `None` unless
explicitly supplied.
- Make code-mode tool results use raw output when no explicit limit is
present, and use the explicit nested limit directly when one is
specified.
- Refactor unified exec output formatting so `truncated_output` takes
the caller-selected token budget.
- Add e2e integration coverage for explicit nested exec limits, omitted
nested exec limits, outer exec limit propagation, omitted-limit outputs
that exceed both the default and a small truncation policy, explicit
nested limits above those caps, and high explicit limits that still
compact larger command output.
- Reuse the code-mode turn setup helper while directly asserting the
exact exec output item in each test.
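A rough sketch of the budget selection described above, under stated assumptions: the default value, the clamping of direct calls to that default, and the whitespace-based token count are placeholders, and `Caller`, `effective_budget`, and this `truncated_output` signature are illustrative rather than the real ones.

```rust
/// Assumed default budget; the actual unified exec default is not stated here.
const DEFAULT_MAX_OUTPUT_TOKENS: usize = 10_000;

/// Hypothetical marker for who issued the exec call.
#[derive(Debug, Clone, Copy)]
enum Caller {
    Direct,
    CodeMode,
}

/// Picks the token budget for a call. `None` means "return raw output".
fn effective_budget(caller: Caller, requested: Option<usize>) -> Option<usize> {
    match caller {
        // Assumption: direct calls fall back to the default and are capped by it.
        Caller::Direct => Some(requested.map_or(DEFAULT_MAX_OUTPUT_TOKENS, |n| {
            n.min(DEFAULT_MAX_OUTPUT_TOKENS)
        })),
        // Nested code-mode calls keep the explicit limit as-is, or no limit.
        Caller::CodeMode => requested,
    }
}

/// Truncates output to the caller-selected budget. Tokens are approximated
/// as whitespace-separated words purely for illustration.
fn truncated_output(output: &str, budget: Option<usize>) -> String {
    let Some(limit) = budget else {
        return output.to_string();
    };
    let words: Vec<&str> = output.split_whitespace().collect();
    if words.len() <= limit {
        return output.to_string();
    }
    format!("{} [truncated]", words[..limit].join(" "))
}
```

Passing the budget into the formatter keeps the truncation logic unaware of who called it; the caller decides whether a limit exists at all.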
## Testing
- `just fmt`
- `git diff --check`
- Not run locally per repo guidance; CI should validate the e2e
integration tests.
## Summary
- mark `ToolSearch` as removed and ignore stale config writes for its
legacy key
- make search tool exposure depend only on model capability, not a
feature toggle
- remove app-server enablement support and prune now-obsolete test
coverage/setup
## Verification
- `cargo test -p codex-features`
- `cargo test -p codex-tools`
- `cargo test -p codex-core search_tool_requires_model_capability`
- `cargo test -p codex-app-server experimental_feature_enablement_set_`
## Notes
- This keeps the legacy config key as a no-op for compatibility while
removing the ability to toggle the behavior off cleanly.
- No developer-facing docs update outside the touched app-server README
was needed.
## Summary
- Add optional image detail to user image inputs across core, app-server
v2, thread history/event mapping, and the generated app-server
schemas/types.
- Preserve requested detail when serializing Responses image inputs:
omitted detail stays on the existing `high` default, while explicit
`original` keeps local images on the original-resolution path.
- Support `high`/`original` consistently for tool image outputs,
including MCP `codex/imageDetail`, code-mode image helpers, and
`view_image`.
## Summary
Removes the feature, since it is effectively on by default in every case
where it should be used, or can be configured via `models.json`.
## Testing
- [x] unit tests pass
## Why
Some MCP OAuth providers require a pre-registered public client ID and
cannot rely on dynamic client registration. Codex already supports MCP
OAuth, but it had no way to supply that client ID from config into the
PKCE flow.
## What changed
- add `oauth.client_id` under `[mcp_servers.<server>]` config, including
config editing and schema generation
- thread the configured client ID through CLI, app-server, plugin login,
and MCP skill dependency OAuth entrypoints
- configure RMCP authorization with the explicit client when present,
while preserving the existing dynamic-registration path when it is
absent
- add focused coverage for config parsing/serialization and OAuth URL
generation
## Verification
- `cargo test -p codex-config -p codex-rmcp-client -p codex-mcp -p
codex-core-plugins`
- `cargo test -p codex-core blocking_replace_mcp_servers_round_trips
--lib`
- `cargo test -p codex-core
replace_mcp_servers_streamable_http_serializes_oauth_resource --lib`
- `cargo test -p codex-core config_schema_matches_fixture --lib`
## Notes
Broader local package runs still hit unrelated pre-existing stack
overflows in:
- `codex-app-server::in_process_start_clamps_zero_channel_capacity`
- `codex-core::resume_agent_from_rollout_uses_edge_data_when_descendant_metadata_source_is_stale`
- Build one app-server process ThreadStore from startup config and share
it with ThreadManager and CodexMessageProcessor.
- Remove per-thread/fork store reconstruction so effective thread config
cannot switch the persistence backend.
- Add params to ThreadStore create/resume for specifying thread
metadata; without them, the metadata from store creation would be
used, which is incorrect.
Summary:
- Add codex-thread-manager-sample, a one-shot binary that starts a
ThreadManager thread, submits a prompt, and prints the final assistant
output.
- Pass ThreadStore into ThreadManager::new and expose
thread_store_from_config for existing callsites.
- Build the sample Config directly with only --model and prompt inputs.
Verification:
- just fmt
- cargo check -p codex-thread-manager-sample -p codex-app-server -p
codex-mcp-server
- git diff --check
Tests: Not run per request.
## Summary
- Replace legacy sandbox config setup in delegate and telemetry tests
with direct `PermissionProfile` configuration.
- Move no-sandbox and read-only test turns in `tools.rs`,
`code_mode.rs`, `user_shell_cmd.rs`, and `model_visible_layout.rs` from
legacy `SandboxPolicy` values to `PermissionProfile` helpers, while
leaving the deny-glob read-only compatibility case for a later targeted
cleanup.
- Use `PermissionProfile::read_only()` where tests need managed
read-only behavior and `PermissionProfile::Disabled` where they
intentionally need no sandbox.
- Reduce `SandboxPolicy` references in `codex-rs/core/tests` from 27
files after #20013 to 22 files.
## Testing
- `cargo check -p codex-core --tests`
- `just fmt`
## Why
Per-turn permission overrides should use the same canonical profile
abstraction as session configuration. That lets TUI submissions preserve
exact configured permissions without round-tripping through legacy
sandbox fields.
## What changed
This adds `permission_profile` to user-turn operations, threads it
through TUI/app-server submission paths, fills the new field in existing
test fixtures, and adds coverage that composer submission includes the
configured profile.
## Verification
- `cargo test -p codex-tui permissions -- --nocapture`
- `cargo test -p codex-core --test all permissions_messages --
--nocapture`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18285).
* #18288
* #18287
* #18286
* __->__ #18285
## Summary
- add experimental turn/start.environments params for per-turn
environment id + cwd selections
- pass selections through core protocol ops and resolve them with
EnvironmentManager before TurnContext creation
- treat omitted selections as default behavior, empty selections as no
environment, and the first environment/cwd of a non-empty selection list
as the turn primary
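The three cases for per-turn selections map naturally onto an `Option<Vec<_>>`. The sketch below is illustrative only: `EnvironmentSelection`, `TurnEnvironment`, and `resolve_turn_environment` are hypothetical names, and the real resolution goes through `EnvironmentManager`.

```rust
/// Hypothetical shape of one per-turn selection (environment id + cwd).
#[derive(Debug, Clone, PartialEq)]
struct EnvironmentSelection {
    environment_id: String,
    cwd: String,
}

/// Hypothetical outcome of interpreting the `environments` param.
#[derive(Debug, PartialEq)]
enum TurnEnvironment {
    /// Param omitted: keep the existing default behavior.
    SessionDefault,
    /// Param present but empty: the turn runs with no environment.
    NoEnvironment,
    /// Param non-empty: the first selection becomes the turn primary.
    Primary(EnvironmentSelection),
}

/// Distinguishes "omitted" from "empty" before picking the primary.
fn resolve_turn_environment(
    selections: Option<Vec<EnvironmentSelection>>,
) -> TurnEnvironment {
    match selections {
        None => TurnEnvironment::SessionDefault,
        Some(list) => match list.into_iter().next() {
            None => TurnEnvironment::NoEnvironment,
            Some(first) => TurnEnvironment::Primary(first),
        },
    }
}
```

Modeling the param as an optional list, rather than defaulting it to an empty list, is what keeps the omitted and empty cases distinguishable.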
## Testing
- ran `just fmt`
- ran `just write-app-server-schema`
- not run: unit tests for this stacked PR
---------
Co-authored-by: Codex <noreply@openai.com>
Deferred dynamic tools need to round-trip a namespace so a tool returned
by `tool_search` can be called through the same registry key that core
uses for dispatch.
This change adds namespace support for dynamic tool specs/calls,
persists it through app-server thread state, and routes dynamic tool
calls by full `ToolName` while still sending the app the leaf tool name.
Deferred dynamic tools must provide a namespace; non-deferred dynamic
tools may remain top-level.
It also introduces `LoadableToolSpec` as the shared
function-or-namespace Responses shape used by both `tool_search` output
and dynamic tool registration, so dynamic tools use the same wrapping
logic in both paths.
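As a rough illustration of routing by full name while sending only the leaf name to the app, consider the sketch below. The field layout of `ToolName`, and the `DynamicToolSpec` and `Registry` types, are assumptions made for the example and may not match the real definitions.

```rust
use std::collections::HashMap;

/// Assumed shape of a full tool name: optional namespace plus leaf name.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct ToolName {
    namespace: Option<String>,
    name: String,
}

/// Hypothetical dynamic tool spec as registered by the app.
#[derive(Debug, Clone)]
struct DynamicToolSpec {
    namespace: Option<String>,
    name: String,
    deferred: bool,
}

/// Hypothetical registry keyed by the full tool name.
#[derive(Default)]
struct Registry {
    tools: HashMap<ToolName, DynamicToolSpec>,
}

impl Registry {
    /// Registers a spec; deferred tools must carry a namespace, while
    /// non-deferred tools may remain top-level.
    fn register(&mut self, spec: DynamicToolSpec) -> Result<ToolName, String> {
        if spec.deferred && spec.namespace.is_none() {
            return Err(format!(
                "deferred dynamic tool `{}` requires a namespace",
                spec.name
            ));
        }
        let key = ToolName {
            namespace: spec.namespace.clone(),
            name: spec.name.clone(),
        };
        self.tools.insert(key.clone(), spec);
        Ok(key)
    }

    /// Looks a call up by its full name and returns the leaf name that
    /// would be sent to the app.
    fn dispatch(&self, call: &ToolName) -> Option<&str> {
        self.tools.get(call).map(|spec| spec.name.as_str())
    }
}
```

Because the key includes the namespace, a tool returned by `tool_search` resolves to the same registry entry on the call path, and a bare leaf name does not accidentally match a namespaced tool.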
Validation:
- `cargo test -p codex-tools`
- `cargo test -p codex-core tool_search`
---------
Co-authored-by: Sayan Sisodiya <sayan@openai.com>
- Replace the active models-manager catalog with the deleted core
catalog contents.
- Replace stale hardcoded test model slugs with current bundled model
slugs.
- Keep this as a stacked change on top of the cleanup PR.
## Summary
- hide deferred MCP/app nested tool descriptions from the `exec` prompt
in code mode
- add short guidance that omitted nested tools are still available
through `ALL_TOOLS`
- cover the code_mode_only path with an integration test that discovers
and calls a deferred app tool
## Motivation
`code_mode_only` exposes only top-level `exec`/`wait`, but the `exec`
description could still include a large nested-tool reference. This
keeps deferred nested tools callable while avoiding that prompt bloat.
## Tests
- `just fmt`
- `just fix -p codex-code-mode`
- `just fix -p codex-tools`
- `cargo test -p codex-code-mode
exec_description_mentions_deferred_nested_tools_when_available`
- `cargo test -p codex-tools
create_code_mode_tool_matches_expected_spec`
- `cargo test -p codex-core
code_mode_only_guides_all_tools_search_and_calls_deferred_app_tools`
## Summary
- Add an MCP server environment setting with local as the default.
- Thread the default through config serialization, schema generation,
and existing config fixtures.
## Stack
```text
o #18027 [8/8] Fail exec client operations after disconnect
│
o #18025 [7/8] Cover MCP stdio tests with executor placement
│
o #18089 [6/8] Wire remote MCP stdio through executor
│
o #18088 [5/8] Add executor process transport for MCP stdio
│
o #18087 [4/8] Abstract MCP stdio server launching
│
o #18020 [3/8] Add pushed exec process events
│
o #18086 [2/8] Support piped stdin in exec process API
│
@ #18085 [1/8] Add MCP server environment config
│
o main
```
Co-authored-by: Codex <noreply@openai.com>