codex

Preserve namespaces on custom tool calls (#30302)

## Summary

- Preserve the optional namespace on custom tool calls during response
deserialization and app-server replay.
- Use the namespaced tool identifier for streaming argument handling and
tool dispatch.
- Regenerate app-server protocol schemas.
- Add regression tests covering namespace serialization and routing.

## Testing

- Ran affected protocol and app-server test suites.
- Ran the full core test suite; two load-sensitive timing tests passed
when rerun individually.
- Ran Clippy and formatting checks.
- Verified with a local end-to-end app-server replay that the namespace
is preserved through the complete request/response flow.

nhamidi-oai · 2026-06-27 09:54:56 -07:00

328e95110c

feat(app-server): add history_mode to thread (#29927)

## Description

This PR adds a new `historyMode = "legacy" | "paginated"` to `Thread`.
This will be stored in `SessionMeta` in the JSONL rollout file and as a
new column in the SQLite thread_metadata table, and exposed on
`thread/start` and on the `Thread` object in app-server.

## What changed

- Added canonical `ThreadHistoryMode` with `legacy` and `paginated`,
defaulting old and new SessionMeta to `legacy`.
- Carried `history_mode` through core session config, ThreadStore stored
metadata, local/in-memory stores, rollout metadata extraction, and the
existing SQLite `threads` table.
- Added experimental `historyMode` to app-server v2 `Thread` and
`thread/start`.
- Made paginated stored threads metadata-discoverable but unsupported
for legacy full-history reads, `load_history`, live resume, and create
paths.
- Regenerated app-server schema fixtures and added
protocol/state/thread-store/app-server coverage for persistence and
fail-closed behavior.

## Compatibility floor

Because users may be running various versions of Codex binaries on the
same machine (TUI, Codex App, etc.), we will need to establish a
compatibility floor for upcoming paginated threads, which will change
how thread storage reads and writes work.

The overall plan here:
```
Release N:
- Add historyMode to SessionMeta / Thread / SQLite metadata.
- Teach binaries to understand paginated threads.
- If a binary sees `historyMode="paginated"` but does not support the paginated contract, it refuses to resume/mutate the thread.
- Default remains `"legacy"`.

Release N+1:
- First-party clients start opting into paginated threads where appropriate.
- Internal dogfood / staged rollout.
- Measure old-client usage and paginated-thread unsupported errors.

Release N+2:
- Only after Release N+ is overwhelmingly deployed, make paginated the default.
- Accept that a small tail of N-1-or-older binaries may not understand paginated threads.
```

The important behavior change is fail-closed handling for a binary that
encounters a persisted `paginated` thread before it knows how to fully
support paginated history. In app-server, if a thread is `paginated`, we
will:

- allow metadata-only discovery paths like `thread/list` and
`thread/read(includeTurns=false)`, so clients can still see the thread
and inspect its `historyMode`
- reject legacy full-history/live-thread paths like
`thread/read(includeTurns=true)` and `thread/resume` with an unsupported
JSON-RPC error
- avoid silently treating an unknown or future `historyMode` as `legacy`

Under the hood, the ThreadStore layer also rejects legacy operations
that would need to load or replay the full thread history for a
paginated thread. That gives us the behavior we want for Release N:
future paginated threads are visible, but this binary fails closed
instead of trying to operate on them as if they were legacy threads.
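
The fail-closed rule described above can be sketched roughly as follows. This is a minimal Python illustration, not the Rust implementation: `check_thread_access`, `UnsupportedHistoryMode`, and the `needs_full_history` flag are hypothetical names, and allowing metadata-only access for unknown modes is an assumption.

```python
class UnsupportedHistoryMode(Exception):
    """Hypothetical stand-in for the unsupported JSON-RPC error."""


def check_thread_access(history_mode: str, needs_full_history: bool) -> None:
    # Metadata-only paths (e.g. thread/list, thread/read with
    # includeTurns=false) stay available so clients can inspect historyMode.
    if not needs_full_history:
        return
    # Anything other than "legacy" is rejected, including unknown or future
    # modes, rather than being silently treated as legacy.
    if history_mode != "legacy":
        raise UnsupportedHistoryMode(
            f"historyMode {history_mode!r} is not supported for full-history operations"
        )
```

With this shape, a `paginated` thread remains listable, while a resume or a full read fails with an explicit error.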

Owen Lin · 2026-06-26 09:12:42 -07:00

5267e805fb

Retry failed Codex Apps MCP startup (#29920)

## Problem

The built-in Codex Apps MCP client shares a future for the full startup
operation: connect, complete `initialize`, fetch the initial tools, and
return a usable client. Sharing deduplicates startup work, but it also
memoizes terminal errors.

After a transient connection, handshake, or initial `tools/list`
failure, later tool builds observe the same failed future. The thread
cannot reconnect after the backend recovers and continues serving its
startup-time cached tool snapshot, which may be empty or stale.

## Fix

When Apps MCP startup ends in an error, Codex starts bounded recovery
without putting startup latency on tool-router construction:

1. The current tool build immediately continues with the cached startup
snapshot.
2. After the initial failure is reported, Codex starts one fresh full
startup attempt in the background.
3. Concurrent tool builds share that in-flight attempt and also continue
with cached tools.
4. On success, the recovered client becomes active, refreshes the Apps
tools cache, emits a `Ready` startup status, and is reused by later
operations.
5. On failure, the cache remains unchanged and later tool builds may
start another background attempt after exponential cooldown: 1s, 2s, 4s,
8s, 16s, then 30s maximum.

Each recreated startup performs a fresh MCP `initialize` and uncached
`tools/list`. The MCP client retains its existing bounded retries for
retryable `initialize` and `tools/list` failures.

This avoids adding the Apps startup timeout to every request during a
sustained outage.
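
The cooldown schedule in step 5 can be illustrated with a short sketch. The function name and the convention that the first failure maps to a 1s cooldown are assumptions for illustration only; the real logic lives in the Rust client.

```python
MAX_COOLDOWN_SECONDS = 30


def reconnect_cooldown_seconds(consecutive_failures: int) -> int:
    """Return the cooldown after the given number of failed background attempts."""
    if consecutive_failures < 1:
        raise ValueError("consecutive_failures must be at least 1")
    # Doubles from 1s and is capped at the 30s maximum.
    return min(2 ** (consecutive_failures - 1), MAX_COOLDOWN_SECONDS)


schedule = [reconnect_cooldown_seconds(n) for n in range(1, 8)]
```

Here `schedule` holds the cooldowns for the first seven consecutive failures, and every later failure stays at the cap.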

## Scope

This is limited to the built-in Codex Apps MCP client:

- no reconnects for user-configured MCP servers;
- no cache deletion; and
- no proactive refresh for a healthy client with stale tools.

## Tests

Coverage verifies:

- tool builds return cached tools without waiting for a blocked
reconnect;
- concurrent tool builds start only one background reconnect;
- failed reconnects preserve cached tools and respect exponential
cooldown;
- a recovered client is retained and reused; and
- a long-lived thread exposes recovered app tools on a later follow-up.

Validation:

- `just test -p codex-mcp` — 95 passed
- `just test -p codex-core
later_follow_up_uses_background_recovered_apps_after_mid_thread_startup_failures
--no-capture` — passed
- `just fix -p codex-mcp`
- `just fmt`

kbazzi · 2026-06-25 21:31:12 -07:00

92d2e1df70

core: expose permission profile to shell tools (#29941)

## tl;dr

Inject a `CODEX_PERMISSION_PROFILE` environment variable with the name
of the current permission profile when invoking a shell tool.

## Why

Shell tool owners may need to launch nested commands under the same
named permission profile, including through `codex sandbox -P PROFILE
--include-managed-config`. Until now, child processes could observe
sandbox and network metadata but could not identify the active named
permission profile.

The `--include-managed-config` flag is essential when a helper
reconstructs the sandbox from a profile name: it ensures the nested
sandbox also loads managed enterprise requirements. Without it, using
the inherited profile could unintentionally create a sandbox that does
not enforce the organization's managed restrictions.

The new environment value is intentionally informational and **must not
be treated as trusted input**. Any process in the ancestry can overwrite
an environment variable, so a consumer that passes this value to `codex
sandbox -P` must first validate it against the profiles that helper is
authorized to use.

## Example Use Case

Suppose an organization provides a trusted `remote-bash` wrapper that
lets Codex run a command on an approved build host. The local shell
command uses the named `:workspace` permission profile:

```toml
default_permissions = ":workspace"
```

The command exposed to the model is a small zsh wrapper. It deliberately
delegates with `exec`, preserving the original arguments and process
environment:

```zsh
#!/usr/bin/env zsh
exec /opt/codex-tools/remote_bash.py "$@"
```

The model invokes the public wrapper, not its Python implementation:

```sh
/opt/codex-tools/remote-bash \
  --host builder.example.com \
  -- printf '%s' 'hello world'
```

Only the inner implementation is authorized to escape the local sandbox:

```starlark
prefix_rule(
    pattern=["/opt/codex-tools/remote_bash.py"],
    decision="allow",
)
```

With zsh-fork, execution begins with `remote-bash` inside the
`:workspace` sandbox. When the wrapper calls `exec`, the exact prefix
rule matches `remote_bash.py`, so that inner script is restarted
unsandboxed. The escalated process inherits:

```text
CODEX_PERMISSION_PROFILE=:workspace
```

Inheritance does not make the value trustworthy. `remote_bash.py`
independently allowlists both the remote host and the permission profile
before using either value. In particular, a forged value such as
`:danger-full-access` is rejected before it can reach `codex sandbox
-P`:

```python
import argparse
import os
import shlex
import sys

ALLOWED_HOSTS = {"builder.example.com"}
ALLOWED_PROFILES = {":workspace"}

parser = argparse.ArgumentParser()
parser.add_argument("--host", required=True)
separator = sys.argv.index("--")
args = parser.parse_args(sys.argv[1:separator])
command = sys.argv[separator + 1:]

if args.host not in ALLOWED_HOSTS:
    parser.error("host is not allowlisted")
if not command:
    parser.error("the remote command must not be empty")

profile = os.environ.get("CODEX_PERMISSION_PROFILE")
if not profile:
    raise SystemExit("CODEX_PERMISSION_PROFILE must not be empty")
if profile not in ALLOWED_PROFILES:
    raise SystemExit("CODEX_PERMISSION_PROFILE is not allowlisted")

remote_command = shlex.join(command)
sandbox_command = shlex.join([
    "codex", "sandbox", "-P", profile,
    "--include-managed-config", "--",
    "bash", "-lc", remote_command,
])
print(shlex.join(["ssh", args.host, sandbox_command]))
```

This builds each command layer as an argument vector and uses
`shlex.join()` at the boundary, rather than interpolating untrusted
shell text. After validation and parsing, the nested command has this
structure:

```text
ssh argv:
  ["ssh", "builder.example.com", SANDBOX_COMMAND]

SANDBOX_COMMAND argv:
  ["codex", "sandbox", "-P", ":workspace",
   "--include-managed-config", "--",
   "bash", "-lc", "printf %s 'hello world'"]

bash -lc payload argv:
  ["printf", "%s", "hello world"]
```

A production implementation could execute that SSH command. The
integration fixture prints it and parses the result back into arguments,
verifying the complete flow:

```text
model invokes outer wrapper
  -> zsh-fork starts wrapper under :workspace
  -> wrapper execs allowlisted Python script
  -> prefix rule restarts Python script unsandboxed
  -> Python script inherits CODEX_PERMISSION_PROFILE=:workspace
  -> Python script verifies :workspace is allowlisted
  -> remote command runs codex sandbox -P :workspace
     with --include-managed-config
  -> nested sandbox honors managed enterprise requirements
```
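
The print-and-parse round trip mentioned above can be sketched with the standard library alone. This is an illustration of the idea rather than the actual integration fixture; the values are taken from the example in this description.

```python
import shlex

# Build each layer as an argument vector, joining only at the boundary.
payload = ["printf", "%s", "hello world"]
sandbox_command = shlex.join([
    "codex", "sandbox", "-P", ":workspace",
    "--include-managed-config", "--",
    "bash", "-lc", shlex.join(payload),
])
ssh_line = shlex.join(["ssh", "builder.example.com", sandbox_command])

# Parse the printed line back, one layer at a time.
ssh_argv = shlex.split(ssh_line)
sandbox_argv = shlex.split(ssh_argv[2])
payload_argv = shlex.split(sandbox_argv[-1])
```

Because `shlex.split()` reverses `shlex.join()`, each layer comes back as the same argument vector that was used to build it, including the argument that contains a space.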

This gives the trusted helper access to resources outside the local
sandbox—such as SSH credentials—while ensuring that it can select only
an explicitly authorized profile and that work on the remote host
remains subject to the organization's managed requirements.

## What changed

- Inject `CODEX_PERMISSION_PROFILE` after shell environment policy
evaluation so the active profile wins over inherited or configured stale
values.
- Apply the variable to both `shell_command` and unified `exec_command`,
including local, zsh-fork, and remote exec-server paths.
- Remove stale values when the session has no active named profile.
- Preserve the current profile value when loading a shell snapshot so a
parent snapshot cannot restore an older profile.
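
The override-and-remove behavior in the list above can be sketched as a small pure function. This is a simplified Python illustration; `apply_permission_profile` is a hypothetical name, and the real change is implemented in Rust across the shell tool paths.

```python
from typing import Dict, Optional

ENV_VAR = "CODEX_PERMISSION_PROFILE"


def apply_permission_profile(
    policy_env: Dict[str, str], active_profile: Optional[str]
) -> Dict[str, str]:
    """Apply the active profile on top of an already policy-evaluated environment."""
    env = dict(policy_env)
    if active_profile is None:
        # No active named profile: drop any inherited or configured stale value.
        env.pop(ENV_VAR, None)
    else:
        # Applied last, so the active profile wins over stale values.
        env[ENV_VAR] = active_profile
    return env
```

Running the injection after policy evaluation is what lets the active profile take precedence over whatever the inherited environment contained.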

## Testing

- Added classic-shell integration coverage proving an exact prefix rule
can run a `require_escalated` script outside the `:workspace` sandbox
while preserving `CODEX_PERMISSION_PROFILE=:workspace`.
- Added zsh-fork integration coverage in which the model invokes an
outer zsh wrapper, an inner allowlisted `remote_bash.py` runs
unsandboxed, and its printed SSH command reconstructs the inherited
`:workspace` sandbox with `--include-managed-config` while preserving
every argument after `--`.
- The example helper treats `CODEX_PERMISSION_PROFILE` as untrusted and
validates it against `ALLOWED_PROFILES` before constructing the nested
command.
- Assert that the reconstructed sandbox command includes
`--include-managed-config` so nested use of the inherited profile cannot
bypass managed enterprise requirements.
- Added coverage for overriding and removing stale profile values.
- Verified `shell_command` receives the selected active profile.
- Added shell snapshot coverage using `printenv
CODEX_PERMISSION_PROFILE`.

Michael Bolin · 2026-06-25 19:00:23 +00:00

c65cfeab14

feat: add provider-aware model fallback to thread start (#29942)

## Why

Helper threads such as task title generation can request a model ID that
is valid for the default OpenAI provider but unavailable from the active
provider. With Amazon Bedrock, `gpt-5.4-mini` is rejected while the
provider static catalog exposes Bedrock model IDs such as
`openai.gpt-5.5` and `openai.gpt-5.4`. This causes repeated background
404s and can surface a misleading turn error even when the main turn
succeeds.

Clients need an explicit way to ask app-server to resolve an unavailable
helper model to the active provider default. That fallback must remain
limited to providers with an authoritative static catalog so custom or
dynamically discovered model IDs are not rewritten based on an
incomplete catalog.

Fixes #28741.

## What changed

- Add the experimental `allowProviderModelFallback` option to
`thread/start`, defaulting to `false` to preserve existing behavior.
- Thread the option through thread creation and model selection.
- When enabled for a static model manager, preserve requested models
present in the catalog and replace unavailable models with the provider
default.
- Continue preserving explicit model IDs for dynamic model managers
without fetching a catalog solely to validate them.
- Document the new `thread/start` behavior in the app-server API
overview.
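
The selection rule described in this list can be sketched as follows. This is a hedged Python illustration: `resolve_model` and its parameters are hypothetical, and a dynamic model manager is modeled simply as `static_catalog=None`.

```python
from typing import Optional, Set


def resolve_model(
    requested: str,
    allow_provider_model_fallback: bool,
    static_catalog: Optional[Set[str]],
    provider_default: str,
) -> str:
    # Without the opt-in, or without an authoritative static catalog,
    # the requested model ID is preserved as-is.
    if not allow_provider_model_fallback or static_catalog is None:
        return requested
    # Models present in the catalog are preserved.
    if requested in static_catalog:
        return requested
    # Unavailable models are replaced with the provider default.
    return provider_default
```

For the Bedrock case in this description, a request for `gpt-5.4-mini` with the option enabled would resolve to the provider default, while the same request without the option is left untouched.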

## Test

Temporary test-client harness:
```
ThreadStartParams {
    model: Some("gpt-5.4-mini".to_string()),
    allow_provider_model_fallback: true,
    ..Default::default()
}
```
Command:
```
CODEX_HOME=/tmp/codex-bedrock-thread-start-home \
CODEX_E2E_BEDROCK_THREAD_START_ONLY=1 \
./target/debug/codex-app-server-test-client \
  --codex-bin ./target/debug/codex \
  -c 'model_provider="amazon-bedrock"' \
  send-message-v2 --experimental-api ignored
```
Relevant output:
```
> "method": "thread/start",
> "params": {
>   "model": "gpt-5.4-mini",
>   "modelProvider": null,
>   "allowProviderModelFallback": true,
>   ...
> }

< "result": {
<   "model": "openai.gpt-5.5",
<   "modelProvider": "amazon-bedrock",
<   ...
< }
```

Celia Chen · 2026-06-25 18:24:34 +00:00

6d9dbacf1a

[codex] Add Ultra reasoning effort (#29899)

## Why

Ultra should be one user-facing reasoning selection for work that
benefits from both maximum reasoning and proactive multi-agent
delegation. Without it, clients must coordinate maximum reasoning with
the experimental `multiAgentMode` setting, even though the inference
backend still expects its existing `max` effort value.

This change makes reasoning effort the source of truth: clients select
`ultra`, core derives proactive multi-agent behavior when the turn is
eligible for multi-agent V2, and inference requests continue to use the
backend-compatible `max` value.

## What changed

- Add `ultra` as a first-class reasoning effort and preserve
model-catalog ordering when exposing it to clients.
- Convert `ultra` to `max` at the inference request boundary, including
Responses HTTP/WebSocket requests, startup prewarm, compaction, and
memory summarization.
- Derive effective multi-agent mode per turn from effective reasoning
effort:
  - eligible multi-agent V2 + `ultra` → `proactive`
  - eligible multi-agent V2 + any other effort → `explicitRequestOnly`
  - V1 or otherwise ineligible sessions → no multi-agent mode instruction
- Keep the derived effective mode in turn context history so successive
turns can emit a developer-message update only when the effective mode
changes.
- Remove selected multi-agent mode from core session configuration, turn
construction, thread settings, resume/fork restoration, and subagent
spawn plumbing. Subagents inherit reasoning effort and derive their own
effective mode.
- Retain the experimental app-server `multiAgentMode` fields for wire
compatibility while marking them deprecated. Request values are accepted
but ignored; compatibility response fields report `explicitRequestOnly`.
- Display Ultra in the TUI using the order supplied by `model/list`.
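
The two derivations above (request-boundary conversion and per-turn effective mode) can be sketched as pure functions. The function names are hypothetical, and `None` stands in for "no multi-agent mode instruction"; the actual implementation is in Rust.

```python
from typing import Optional


def request_effort(effort: str) -> str:
    """Map the user-facing effort to the value sent to the inference backend."""
    return "max" if effort == "ultra" else effort


def effective_multi_agent_mode(eligible_for_v2: bool, effort: str) -> Optional[str]:
    """Derive the effective multi-agent mode for a turn from its reasoning effort."""
    if not eligible_for_v2:
        return None  # V1 or otherwise ineligible: no instruction is emitted.
    return "proactive" if effort == "ultra" else "explicitRequestOnly"
```

Keeping both derivations as functions of reasoning effort is what makes effort the single source of truth: nothing else needs to be stored or coordinated by the client.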

## Validation

- `just test -p codex-core ultra_reasoning_uses_max_for_requests`
- `just test -p codex-tui model_reasoning_selection_popup`

Shijie Rao · 2026-06-24 20:13:52 -07:00

df1199fddb

[codex] Inject agent graph store into ThreadManager (#29736)

Pick up the AgentGraphStore migration.

- Inject an explicit optional agent graph store into `ThreadManager`
- Move all calls to spawn, close, recursive resume, and
subtree/archive/delete/feedback traversal through it
- Keep using `LocalAgentGraphStore` when SQLite is available

This required some changes to the interface to deal with futures:

- The interface now matches `ThreadStore`'s object-safe pattern by
returning a boxed `AgentGraphStoreFuture` directly, allowing
`ThreadManager` to hold `Arc<dyn AgentGraphStore>`

*Slight behavior change!* Unfiltered subtree enumeration now performs a
single all-status breadth-first traversal, so a closed grandchild
beneath an open edge is included; the previous Open-then-Closed
traversals could not cross mixed-status paths and silently omitted it.
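
The difference between the two traversal strategies can be seen on a three-node graph. This sketch is illustrative only: the edge list and `descendants` helper are hypothetical and do not mirror the `AgentGraphStore` interface.

```python
from collections import deque

# Hypothetical edges: (parent, child, edge status).
EDGES = [
    ("root", "child", "open"),
    ("child", "grandchild", "closed"),
]


def descendants(root, statuses):
    """Breadth-first traversal that only follows edges with an allowed status."""
    found, visited, queue = [], {root}, deque([root])
    while queue:
        node = queue.popleft()
        for parent, child, status in EDGES:
            if parent == node and status in statuses and child not in visited:
                visited.add(child)
                found.append(child)
                queue.append(child)
    return found


# Previous behavior: separate Open and Closed traversals from the root.
old_result = descendants("root", {"open"}) + descendants("root", {"closed"})
# New behavior: a single traversal across all statuses.
new_result = descendants("root", {"open", "closed"})
```

The separate traversals never reach `grandchild`, because doing so requires crossing an open edge and then a closed one; the single all-status traversal does.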

Tom · 2026-06-24 13:24:10 -07:00

ece1dfece0

test: add app-server auto environment helper (#29746)

## Why

Start moving app-server tests toward running against remote and
foreign-OS executors by default. Doing so needs a point of indirection
similar to the core integration tests' `build_with_auto_env`, with the
added flexibility of letting tests control environment registration when
they need to.

## What

This adds:

- `TestAppServer::new_with_auto_env()` for constructing an app server
with a default environment defined by the test runner (e.g. bazel)
- `TestAppServer::auto_env_params()` for tests to easily acquire turn
env params tailored to the automatic environment
- `TestAppServer::send_thread_start_request_with_auto_env()` to make it
easy for tests to start a thread using the automatic environment

The above methods all fail if the test calling them has set up an
environment where the automatic environment configuration conflicts with
test-created state.

## Validation

Adds a couple of basic smoke tests to the app-server test suite.
Follow-ups will migrate more tests to use it.

Adam Perry @ OpenAI · 2026-06-24 01:06:29 +00:00

283bc4cf01

core tests: rename automatic environment builder (#29728)

## Why

Use a clearer name for what happens when this helper sets up a test
environment.

## What

- Rename the builder and its harness wrapper to use `auto_env` instead
of `remote_env` because the helper will set up a local environment if
configured by the build system.

Adam Perry @ OpenAI · 2026-06-23 21:45:06 +00:00

5283522939

test: branch on target OS instead of runner flavor (#29712)

## Why

Core tests should branch on the executor's operating system, not on
runner details such as Docker or Wine. This keeps platform behavior
stable as new test backends are added and reserves Wine-specific skips
for actual runner debt.

## What

- Add `TestTargetOs` and target/host-aware skip helpers while keeping
`TestEnvironment` internal.
- Replace topology enum access with remote predicates and a narrow
Docker accessor.
- Migrate OS-semantic Wine skips, preserve runner-specific gaps, and
document the skip taxonomy.

## Validation

- `just test -p core_test_support`
- `just test -p codex-core
remote_test_env_can_connect_and_use_filesystem`
- `bazel test //codex-rs/core:core-all-wine-exec-test
--test_output=errors` reached test execution; unrelated existing
view-image, path, and timing failures remain.
- `just test -p codex-core` and `just test` reached broad test
execution; this checkout has unrelated helper, sandbox, and timing
failures.

Adam Perry @ OpenAI · 2026-06-23 14:27:13 -07:00

9a79536e6b

path-uri: clarify host-native path conversion (#29501)

## Why

Downstream refactors are producing confusing code because this
functionality has a very generic name. Encoding the specific conversion
approach in the method name makes call sites clearer.

## What

Rename `PathUri::from_path` to `PathUri::from_host_native_path` and
update its Rust call sites.

Adam Perry @ OpenAI · 2026-06-23 00:02:33 +00:00

11fab432be

feat(core): store turn_id on ResponseItem metadata (#28360)

## Description

This PR is a follow-up to https://github.com/openai/codex/pull/28355 and
starts assigning `internal_chat_message_metadata_passthrough.turn_id` to
durable Responses API items created during a turn.

The goal is that those items keep the `turn_id` that introduced them
when Codex resends stateless HTTP context, reconstructs history for
resume/fork paths, or reuses websocket response state.

## What changed

- Set `internal_chat_message_metadata_passthrough.turn_id` when missing
as response items enter durable history, initial/replacement history,
inter-agent communication history, and local compaction summaries.
- Preserve existing item turn IDs instead of overwriting them during
persistence, resume reconstruction, compaction, forked history, and
websocket incremental reuse.
- Keep `compaction_trigger` fieldless because it is a request control,
not a durable response item.
- Update focused history/request assertions and fixtures for stateless
requests, websocket incrementals, compaction, thread injection, prompt
debug, and related CI coverage.
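
The "set when missing, never overwrite" rule can be sketched as follows. Representing items as plain dictionaries with a `type` key is an assumption made for illustration; the real `ResponseItem` is a Rust type.

```python
PASSTHROUGH_KEY = "internal_chat_message_metadata_passthrough"


def ensure_turn_id(item: dict, turn_id: str) -> dict:
    """Return a copy of the item that carries a turn_id, preserving any existing one."""
    # compaction_trigger is a request control, not a durable item: leave it fieldless.
    if item.get("type") == "compaction_trigger":
        return item
    passthrough = dict(item.get(PASSTHROUGH_KEY) or {})
    if passthrough.get("turn_id") is None:
        passthrough["turn_id"] = turn_id
    return {**item, PASSTHROUGH_KEY: passthrough}
```

Applying the same function at every entry point (persistence, resume, compaction, fork) is idempotent, which is why items keep the `turn_id` of the turn that introduced them.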

Owen Lin · 2026-06-22 16:45:14 -07:00

4a82ecc3c9

core: rename metadata -> internal_chat_message_metadata_passthrough (#28968)

## Description

This PR cuts Codex over from generic `ResponseItem.metadata` (introduced
here: https://github.com/openai/codex/pull/28355) to
`ResponseItem.internal_chat_message_metadata_passthrough`, which is the
blessed path and has strongly-typed keys.

For now we have to drop this MAv2 usage of `metadata`:
https://github.com/openai/codex/pull/28561 until we figure out where
that should live.

Owen Lin · 2026-06-22 11:11:25 -07:00

5b95745eae

Expose thread-level multi-agent mode (#28792)

## Why

Once multi-agent mode can be selected per turn, clients also need to
choose the initial selection when creating a thread and observe that
selection through lifecycle and settings APIs.

The selected value is intentionally distinct from the effective
model-visible value: no client selection is represented as `null`, even
though an eligible multi-agent v2 turn derives `explicitRequestOnly` as
its effective default.

## What changed

- Add the optional experimental `thread/start.multiAgentMode` parameter
and pass it through thread creation.
- Preserve an omitted initial value as an unset selection rather than
eagerly storing `explicitRequestOnly`.
- Apply an explicit `thread/start` selection to the first turn through
the session configuration established at thread creation.
- Restore the latest persisted effective mode as the selected baseline
on cold resume when rollout history contains one.
- Inherit the optional selected mode from a loaded parent when creating
related runtime threads.
- Return the current selected `multiAgentMode` from `thread/start`,
`thread/resume`, `thread/fork`, and thread settings, using `null` when
no mode is selected.
- Keep lifecycle reporting independent from model capability and feature
eligibility; core turn construction remains responsible for calculating
and persisting the effective mode.

## Not covered

- Clearing an existing loaded-session selection back to unset through
`turn/start`; omitted or `null` currently retains the session's
selection.
- A TUI control, slash command, or `config.toml` preference.

## Verification

- `CARGO_INCREMENTAL=0 just test -p codex-app-server-protocol`
- `CARGO_INCREMENTAL=0 just test -p codex-app-server multi_agent_mode`

The focused app-server coverage verifies explicit `thread/start`
initialization, first-turn prompting, nullable reporting for an omitted
selection, and retention of selections that are not currently
runtime-eligible.

## Stack

Stacked on #28685. This PR contains only the thread initialization and
lifecycle/settings API layer.

Shijie Rao · 2026-06-19 10:50:44 +02:00

7abfcf220b

[plugins] Refresh plugin and tool caches after remote install (#28951)

Summary
- Refresh the installed remote-plugin snapshot and Codex Apps tools
after completing a remote JIT install.
- Gate `completed: true` on every expected `app_connector_id` appearing
after the uncached `tools/list` refresh, while continuing to skip local
bundle verification for server-side installs.
- Keep the cached recommendations response and filter refreshed
installed remote IDs locally, so this does not add another
recommendations fetch.
- Add regression coverage for tools appearing after the hard refresh and
remaining absent after the refresh. The resumed model request sees the
refreshed tool router when installation completes.
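
The completion gate can be sketched as a set comparison. This is a simplified illustration: `install_completed` is a hypothetical name, and the tool shape (a dictionary with an `app_connector_id` key) is assumed.

```python
from typing import Iterable, List


def install_completed(expected_connector_ids: Iterable[str], refreshed_tools: List[dict]) -> bool:
    """Report completion only if every expected connector appears after the refresh."""
    available = {
        tool["app_connector_id"]
        for tool in refreshed_tools
        if tool.get("app_connector_id") is not None
    }
    return set(expected_connector_ids).issubset(available)
```

If any expected connector is still missing after the uncached `tools/list` refresh, the install is not reported as complete.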

Root Cause
- Remote suggestions from `openai-curated-remote` returned `true` before
taking the existing connector refresh path, leaving the resumed turn
with the pre-install Apps tool catalog.

Validation
- `just test -p codex-core request_plugin_install`
- `just test -p codex-core-plugins
recommended_plugin_candidates_filter_installed_and_disabled_plugins`
- `just test -p codex-core-plugins`
- `just fix -p codex-core-plugins`
- `just fix -p codex-core`
- `just fmt`
- `just test -p codex-core` was not fully clean locally: 2,729 passed,
26 failed, and 16 skipped. The failures were dominated by local
Seatbelt/network/timing issues, including plugin-install timeouts under
full-suite contention; the focused plugin-install runs pass.

Alex Daley · 2026-06-18 20:08:04 -04:00

7e37354a58

core: load AGENTS.md from foreign environments (#28958)

## Why

Make it possible to load `AGENTS.md` from remote exec-servers whose OS
differs from the app-server's.

## What

- keep `AGENTS.md` discovery and provenance as `PathUri`, with
root-aware parent and ancestor traversal
- expose lifecycle instruction sources as legacy app-server path strings
in events while retaining `PathUri` internally
- preserve and test mixed POSIX and Windows paths in model context and
TUI status output
- cover remote Windows loading end to end by seeding the Wine prefix
through host filesystem APIs
- fix a bug in `PathUri`'s `parent()` implementation that would erase
Windows drive letters
Adam Perry @ OpenAI · 2026-06-18 15:06:23 -07:00

dce673905a

Emit Trusted MCP App Identity on Tool-Call Items (#27132)

## Summary

- Add optional `appContext` to app-server MCP tool-call items with
trusted `connectorId`, `linkId`, and `mcpAppResourceUri` metadata.
- Preserve that context across tool-call events, persisted history,
reconnects, and thread resume.
- Keep the deprecated top-level `mcpAppResourceUri` temporarily for
client migration.

The consumer contract is `{ appContext: { connectorId, linkId,
mcpAppResourceUri }, tool }`.
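
During the migration window, a consumer might read the resource URI as in the sketch below. The function is hypothetical, and falling back to the deprecated top-level field when `appContext` is absent is an assumption about client behavior, not something this PR prescribes.

```python
from typing import Optional


def mcp_app_resource_uri(item: dict) -> Optional[str]:
    """Prefer the trusted appContext value; fall back to the deprecated top-level field."""
    app_context = item.get("appContext") or {}
    uri = app_context.get("mcpAppResourceUri")
    if uri is not None:
        return uri
    # Deprecated location, kept temporarily for client migration.
    return item.get("mcpAppResourceUri")
```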

## Validation

- Full GitHub Actions suite passes, including CLA, Bazel tests, clippy,
release builds, and argument-comment lint.

---------

Co-authored-by: martinauyeung-oai <280153141+martinauyeung-oai@users.noreply.github.com>

martinauyeung-oai · 2026-06-18 14:02:54 -07:00

765309d5a6

current time reminders impl for system clock (varlatency 2/n) (#28824)

Stacked on #28822.

## Summary

- add a host-injectable current-time provider with a built-in system
implementation
- record UTC developer reminders in history immediately before due model
requests
- keep cadence state per session and force a refresh after compaction

This does NOT include the app server client <-> server clock logic. This
PR is only for the reminder message & system clock that will be used in
prod.

## Testing

- `just test -p codex-core varlatency_`
- `just clippy -p codex-core -p codex-app-server -p codex-mcp-server -p
codex-thread-manager-sample`
- `just fmt`

rka-oai · 2026-06-18 19:18:42 +00:00

752ed90d78

Support openai/form extended form elicitations (#27500)

# Summary
Allow App Server clients to opt into `openai/form` MCP elicitations.

Gabriel Peal · 2026-06-18 11:54:49 -07:00

21a599fa56

app-server: keep the model cache warm (#28699)

## Why

The app server is long-lived, but its shared model cache otherwise
refreshes only when a caller needs it. Once the five-minute cache
expires, starting a thread or calling `model/list` can wait for
`/models` on the request path.

Refresh the cache in the background before it expires so foreground
callers normally use fresh local state.

## What changed

- Start an app-server worker that refreshes models immediately and then
every three minutes using the existing models-manager API.
- Hold only a weak reference to the models manager between refreshes, so
the worker does not extend its lifetime.
- Stop scheduling refreshes when the app-server lifecycle handle is shut
down or dropped. A refresh already in progress is allowed to finish.
- Adjust affected app-server test fixtures to distinguish the background
`/models` probe from the connection they are testing.

The existing models-manager cache, refresh strategies, auth handling,
ETag behavior, and concurrency semantics are unchanged.
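
The worker's lifecycle (refresh immediately, refresh periodically, hold only a weak reference, stop on shutdown) can be sketched with Python threads. This is an analogy rather than the Rust implementation: `ModelsManager` here is a stand-in class, and `start_refresh_worker` is a hypothetical name.

```python
import threading
import weakref


class ModelsManager:
    """Stand-in for the real models manager; counts refresh calls."""

    def __init__(self):
        self.refresh_count = 0

    def refresh(self):
        self.refresh_count += 1


def start_refresh_worker(manager, stop, interval_seconds=180.0):
    manager_ref = weakref.ref(manager)  # Weak: the worker must not keep it alive.

    def run():
        while True:
            target = manager_ref()
            if target is None:
                return  # The manager was dropped; nothing left to refresh.
            target.refresh()  # The first iteration refreshes immediately.
            del target  # Drop the strong reference between refreshes.
            if stop.wait(interval_seconds):
                return  # Shutdown requested; no further refreshes are scheduled.

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread
```

The default interval of 180 seconds corresponds to the three-minute cadence, which stays inside the five-minute cache expiry.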

## Testing

- `models_refresh_worker::tests::refreshes_immediately_periodically_and_stops_when_dropped`
- `suite::v2::remote_control::listen_off_honors_persisted_remote_control_enable`
- `suite::v2::attestation::attestation_generate_round_trip_adds_header_to_responses_websocket_handshake`

jif · 2026-06-17 16:18:39 +02:00

5935a90619

Clarify model-generated and legacy app path types (#28577)

## Why

`ApiPathString` kind of implies that it can be used anywhere we pull a
path out of JSON, but it's not really appropriate for tool arguments
when the model might generate relative paths.

Prefer `String` for model-generated paths. For now, each feature handles
the conversion itself; a shared abstraction can be defined later if it
makes sense.

## What

Rename `ApiPathString` to `AppLegacyPathString` to clarify its role.

Expand the `path-types` skill to tell the model to leave tool args as
bare strings.

Adam Perry @ OpenAI · 2026-06-16 20:47:43 +00:00

322b83de5e

[tests] Keep Apps out of generic core test harness (#28508)

## Summary

- disable the stable Apps feature in the generic `test_codex()`
integration-test harness
- keep Apps-specific tests explicit: their builders re-enable Apps and
point it at a local mock server

## Why

Generic tests that use dummy ChatGPT auth were also enabling the
host-owned `codex_apps` MCP server. That made unrelated tests contact
`chatgpt.com` and wait for MCP startup, causing the Bazel timeouts
observed on #28368.

The generic harness should be hermetic and should not start an external
service that the test did not request. This is test-only; production
Apps behavior is unchanged. The broader optional-MCP startup behavior is
being handled separately in #28407.

## Testing

- `just test -p codex-core -E
'test(pre_sampling_compact_runs_when_comp_hash_changes) |
test(model_switch_to_smaller_model_updates_token_context_window) |
test(codex_apps_file_params_upload_local_paths_before_mcp_tool_call)'`
- `just fix -p codex-core`
- `just fmt`

jif · 2026-06-16 13:07:43 +02:00

ef8eb8bdd9

[codex] Use expect in integration tests (#28441)

The workspace denies `clippy::expect_used` in production. Although
`clippy.toml` allows `expect` in tests, Bazel Clippy compiles
integration-test helper code in a way that does not receive that
exemption, which encouraged verbose `unwrap_or_else(... panic!(...))`
and equivalent `match`/`let else` forms.

This allows `clippy::expect_used` once at each integration-test crate
root (including aggregated suites and test-support libraries), then
replaces manual panic-based Result and Option unwraps with
`expect`/`expect_err`. Standalone `tests/*.rs` files remain their own
crate roots. Intentional assertion and unexpected-variant panics remain
unchanged, and the production `expect_used = "deny"` lint remains in
place.
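
As a rough illustration of the mechanical rewrite, the sketch below contrasts the verbose panic form with `expect`/`expect_err`. The `parse_port` helper and its callers are hypothetical stand-ins, not code from this PR, and the crate-root attribute is shown as a comment because its exact placement depends on the test crate.

```rust
// Assumed to sit once at the integration-test crate root:
// #![allow(clippy::expect_used)]

use std::num::ParseIntError;

/// Hypothetical fallible setup step used by a test.
fn parse_port(raw: &str) -> Result<u16, ParseIntError> {
    raw.trim().parse::<u16>()
}

/// Before: the verbose form encouraged by the denied lint.
fn port_verbose(raw: &str) -> u16 {
    parse_port(raw).unwrap_or_else(|err| panic!("port should parse: {err}"))
}

/// After: the same unwrap expressed with `expect`.
fn port_with_expect(raw: &str) -> u16 {
    parse_port(raw).expect("port should parse")
}

/// `expect_err` covers the inverse case, where the test requires a failure.
fn port_error(raw: &str) -> ParseIntError {
    parse_port(raw).expect_err("port should be rejected")
}
```

Both forms panic on failure; the `expect` version simply carries the message without a hand-written closure.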

The cleanup is mechanical and net-negative in line count.

pakrym-oai · 2026-06-15 21:53:47 -07:00

e752f7b4ae

Run core integration tests against a Wine-backed Windows executor (#28401 )

## Why

We want to exercise a Linux app-server against a Windows exec-server
without having to repeat every test case. This approach has some
precedent in the remote Docker test setup.

## What

Run the shared `codex-core` integration suite against Windows
exec-server behavior from Linux. This makes cross-OS path and shell
regressions visible while keeping unsupported cases owned by individual
tests.

- Add `local`, `docker`, and `wine-exec` test environment selection with
legacy Docker compatibility.
- Extend `codex_rust_crate` to generate a sharded Wine-exec variant
using a cross-built Windows server and pinned Bazel Wine/PowerShell
runtimes.
- Teach remote-aware helpers about Windows paths and track temporary
incompatibilities with source-local `skip_if_wine_exec!` calls and
follow-up reasons.

Adam Perry @ OpenAI · 2026-06-16 00:38:41 +00:00

1fe89de576

feat(core): add metadata field to ResponseItem (#28355 )

## Description

This PR adds an optional `metadata` field to `ResponseItem` for
Responses API calls. This is only mechanical plumbing; no actual values
are populated or sent yet. It turns out that just adding a new field to
`ResponseItem` already has quite a large blast radius.

This change is backwards compatible because `metadata` is optional and
omitted when absent, so existing response items and rollout history
without it still deserialize and requests that do not set it keep the
same wire shape. For provider compatibility, we strip out `metadata`
before non-OpenAI Responses requests so Azure and AWS Bedrock never see
this field.

A follow-up PR will make use of it to start storing and passing along
`turn_id`: https://github.com/openai/codex/pull/28360

## What changed

- Added `ResponseItemMetadata` with optional `turn_id`, plus optional
`metadata` on Responses API item variants and inter-agent communication.
- Preserved item metadata through response-item rewrites such as
truncation, missing tool-output synthesis, compaction history
rebuilding, visible-history conversion, rollout/resume, and generated
app-server schemas/types.
- Strip item metadata from non-OpenAI Responses requests while
preserving it for OpenAI-shaped requests.
- Updated the mechanical fixture/test construction churn required by the
new optional field.

Owen Lin · 2026-06-15 15:05:28 -07:00

040dafa32d

[codex] exec-server honors remote environment cwd and shell (#28122 )

## Why

The next slice needed to make progress on the `remote_env_windows` test is
to support passing a Windows cwd for the remote environment and using
that environment's native shell. This lets the test run a real Windows
process instead of only recording an early path or shell mismatch.

## What

- change `TurnEnvironmentSelection.cwd` from `AbsolutePathBuf` to
`PathUri`
- convert local cwd values to URIs when constructing selections
- preserve a remote primary cwd instead of replacing it with the local
legacy fallback
- prefer the selected environment's discovered shell for unified exec,
falling back to the session shell when unavailable
- convert back to a host-native absolute path at current native-only
consumer boundaries
- reject or deny unsupported foreign cwd values at the existing
request-permissions boundary, with TODOs for its future migration
- extend the hermetic Wine test to execute Windows PowerShell in
`C:\windows` and verify successful process completion
- record the current app-server rejection against the same Wine-backed
remote Windows fixture when its cwd is supplied as a native Windows path

Adam Perry @ OpenAI · 2026-06-14 06:07:46 +00:00

efbd00f21f

[codex] make PathUri::from_abs_path infallible (#27976 )

## Why

`PathUri::from_abs_path` can fail for absolute paths that do not have a
normal `file:` URI representation, forcing filesystem call sites to
handle a conversion error even though the original path can be preserved
losslessly.

## What

Make `from_abs_path` infallible and migrate its callers. Unrepresentable
paths use `file:///%00/bad/path/<base64>`, encoding Unix bytes or
Windows UTF-16LE; `to_abs_path` validates and decodes that fallback. The
leading encoded null reserves a namespace that cannot collide with a
real Unix or Windows path, and fallback URIs remain opaque to lexical
path operations.

## Validation

Added path-URI coverage for Unix null and non-UTF-8 paths, Windows
device/verbatim and non-Unicode paths, serialization, malformed
fallbacks, opaque lexical operations, invalid native payloads, and
literal `/bad/path` collision resistance.

Adam Perry @ OpenAI · 2026-06-12 16:58:42 -07:00

968a3ac9c1

[codex] Load AGENTS.md from all bound environments (#27696 )

## Why

We already have the machinery to support multiple environments on a
single thread, but we only show the model the contents of `AGENTS.md`
files in the primary environment.

We should show the model all of the relevant project instructions when
we know there's more than one environment.

## Known Gaps

As discussed in the RFC, this implementation:

1. doesn't handle environments being added/removed to/from the thread
after its creation
2. doesn't enforce an aggregate context budget across environments,
and instead applies the configured project maximum independently to each
environment

## Implementation

- Discover project instructions in environment order with an independent
byte budget per environment and preserve source provenance/order.
- Keep the legacy fragment byte-for-byte when exactly one environment
contributes project instructions; use environment-labeled sections when
two or more environments contribute.
- Freeze the complete rendered fragment in `LoadedAgentsMd`, insert it
directly into requests, and recognize both layouts in contextual and
memory filtering.
- Add exact rendering, independent-budget, source-order,
creation-snapshot, and consumer coverage without changing app-server
schemas.
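
The layout rule above can be sketched roughly as follows. This is a minimal illustration, not the shipped renderer: `EnvironmentDocs`, `render_project_instructions`, the `## Environment:` section label, and plain truncation as the budget mechanism are all assumptions made for the example.

```rust
/// Project instructions discovered in one environment (hypothetical shape).
struct EnvironmentDocs {
    label: String,
    contents: String,
}

/// Truncate to at most `max_bytes` without splitting a UTF-8 character.
fn truncate_to_budget(text: &str, max_bytes: usize) -> &str {
    if text.len() <= max_bytes {
        return text;
    }
    let mut end = max_bytes;
    while !text.is_char_boundary(end) {
        end -= 1;
    }
    &text[..end]
}

/// Apply the budget to each environment independently, in environment order.
/// One contributor keeps the unlabeled (legacy) layout; two or more get
/// labeled sections. The label format here is an assumption.
fn render_project_instructions(envs: &[EnvironmentDocs], max_bytes_per_env: usize) -> String {
    let contributing: Vec<(&str, &str)> = envs
        .iter()
        .map(|env| {
            (
                env.label.as_str(),
                truncate_to_budget(&env.contents, max_bytes_per_env),
            )
        })
        .filter(|(_, text)| !text.is_empty())
        .collect();

    match contributing.as_slice() {
        [] => String::new(),
        [(_, text)] => (*text).to_string(),
        many => many
            .iter()
            .map(|(label, text)| format!("## Environment: {label}\n{text}"))
            .collect::<Vec<_>>()
            .join("\n\n"),
    }
}
```

Note that the budget is never shared: a large document in one environment cannot starve another, which matches the second known gap listed above.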

Adam Perry @ OpenAI · 2026-06-12 00:10:06 -07:00

bf667c7003

core: Consolidate Responses API Codex metadata (#27122 )

## What
Introduce a `CodexResponsesMetadata` struct that defines all the core
metadata we send to Responses API. Example fields are `thread_id`,
`turn_id`, `window_id`, etc.

Going forward, `client_metadata["x-codex-turn-metadata"]` will be the
canonical way Codex sends metadata to Responses API across both HTTP and
websocket transports.

For now, we continue to emit the existing top-level HTTP headers and
top-level `client_metadata` fields from the same
`CodexResponsesMetadata` struct for compatibility reasons.

Also, app-server clients who specify additional
`responsesapi_client_metadata` via `turn/start` and `turn/steer` will
have those fields merged into
`client_metadata["x-codex-turn-metadata"]`, but cannot override the
reserved fields that core uses (i.e. the fields in
`CodexResponsesMetadata`).
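
A minimal sketch of that merge rule, assuming a flat string map: the `RESERVED_KEYS` list and `merge_turn_metadata` are hypothetical names for illustration, and the real reserved set is whatever `CodexResponsesMetadata` defines.

```rust
use std::collections::BTreeMap;

/// Hypothetical subset of the core-owned keys; the real list comes from
/// `CodexResponsesMetadata`.
const RESERVED_KEYS: [&str; 3] = ["thread_id", "turn_id", "window_id"];

/// Merge client-supplied extras into the core metadata without letting
/// them replace reserved or already-populated core fields.
fn merge_turn_metadata(
    core: &BTreeMap<String, String>,
    client_extras: &BTreeMap<String, String>,
) -> BTreeMap<String, String> {
    let mut merged = core.clone();
    for (key, value) in client_extras {
        if RESERVED_KEYS.contains(&key.as_str()) || merged.contains_key(key) {
            continue; // core-owned values always win
        }
        merged.insert(key.clone(), value.clone());
    }
    merged
}
```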

## Why

Responses API request instrumentation is the source of truth for
downstream Codex analytics that join requests by Codex IDs such as
session, thread, turn, and context window. Before this change, those
values were assembled through several request-specific paths: HTTP
request bodies, websocket handshake headers, websocket `response.create`
payloads, compaction requests, and the rich `x-codex-turn-metadata`
envelope all had their own wiring.

That made metadata propagation easy to drift across API-key/direct
Responses API requests, ChatGPT-auth/proxied requests, websocket
requests, and compaction requests. It also made additions like
`window_id` error-prone because a field could be added to one transport
projection but missed in another.

## What changed

- Added `CodexResponsesMetadata` as the core-owned snapshot for Codex
metadata sent to Responses API.
- Render `client_metadata["x-codex-turn-metadata"]`, flat
`client_metadata` projections, and direct compatibility headers from
that same snapshot.
- Include the known Codex-owned fields in the turn metadata blob,
including installation/session/thread/turn/window IDs, request kind,
lineage, sandbox/workspace metadata, timing, and compaction details.
- Treat app-server `responsesapi_client_metadata` as enrichment for the
Codex turn metadata blob while preventing those extras from overriding
Codex-owned fields.
- Use the same metadata path for normal turns, websocket prewarm, local
compaction, remote v1 compaction, and remote v2 compaction.
- Keep websocket connection-only preconnect metadata separate so
handshakes carry compatibility identity headers without inventing a fake
turn metadata blob.

## Verification

- `cargo check -p codex-core`
- `just fix -p codex-core`

Owen Lin · 2026-06-11 13:42:09 -07:00

14df0e8833

[codex] Load user instructions through an injected provider (#27101 )

## Why

We want to remove implicit use of `$CODEX_HOME` from `codex-core` and
make embedders responsible for supplying user-level instructions. This
also ensures user instructions load when no primary environment is
selected.

## What changed

Stacked on #27415, which makes `codex exec` surface thread-scoped
runtime warnings.

- Added `UserInstructionsProvider` to `codex-extension-api`, with
absolute source attribution and recoverable loading warnings.
- Added `codex-home` with the filesystem-backed provider for
`AGENTS.override.md` and `AGENTS.md`, preserving precedence, fallback,
trimming, lossy UTF-8 handling, and the existing uncapped global
instruction size.
- Removed global instruction loading from `Config` and require
`ThreadManager` callers to inject a provider.
- Load provider instructions once for each fresh root runtime, including
runtimes without a primary environment. Running sessions retain their
snapshot, while child agents inherit the parent snapshot without
invoking the provider.
- Keep provider instructions separate while loading project `AGENTS.md`,
then assemble the model-visible instructions with the existing ordering,
source attribution, warning, and turn-context behavior.
- Wired the Codex home provider through the CLI, app server, MCP server,
core facade, and thread-manager sample.

## Validation

- `just test -p codex-home -p codex-extension-api`
- `just test -p codex-core agents_md`
- `just test -p codex-core guardian`
- `just test -p codex-app-server
thread_start_without_selected_environment_includes_only_global_instruction_source`
- `just test -p codex-exec warning`
- `just bazel-lock-check`

Adam Perry @ OpenAI · 2026-06-11 19:28:47 +00:00

236b50125d

[codex] migrate ExecutorFileSystem paths to PathUri (#27424 )

## Why

We're moving exec-server to use PathUri for its internal path
representations.

## What

Move `ExecutorFileSystem` APIs to use `PathUri` instead of
`AbsolutePathBuf`. Future changes will convert higher-level parts of
exec-server.

Adam Perry @ OpenAI · 2026-06-11 18:44:18 +00:00

b2a4e3be27

Pair thread environment settings (#26687 )

## Why

Thread cwd and environment selections are a single logical setting in
core: updating one without the other can silently desynchronize the
next-turn execution context. This change makes that relationship
explicit in the internal thread settings flow while preserving the
existing app-server public API shape.

## What changed

- Moved the cwd/environment pair through internal
`ThreadSettingsOverrides.environment_settings` instead of a top-level
internal `cwd` field.
- Kept `thread/settings/update` public params unchanged, with app-server
translating top-level `cwd` into the paired internal settings shape.
- Moved `Op::UserInput` environment overrides into thread settings so
user turns and settings updates use the same core path.
- Updated core, app-server, MCP, memories, sample, and test callsites to
construct the paired settings shape.

## Verification

- `git diff --check`
- Local test run starting after PR creation.

pakrym-oai · 2026-06-08 13:55:15 -07:00

f3c1283411

fix: preserve approval sandbox decisions in unified exec (#24981 )

## Why

This PR fixes approval sandbox semantics in the unified-exec path. The
zsh-fork runtime exposed the bug because the shell can do meaningful
work before any intercepted child `execv(2)` exists: redirections,
builtins, globbing, and pipeline setup all happen in the launch process.
If the model requested `sandbox_permissions=require_escalated`, or an
exec-policy `allow` rule explicitly bypassed the sandbox, that approved
sandbox decision needs to be preserved for the launch path and for
intercepted execs that use the same approval machinery.

The behavior is not only about zsh fork. The production changes are in
shared approval/escalation code, so they also affect non-zsh-fork
intercepted exec paths that go through the same sandbox decision logic.
The narrow intent is to preserve the approval decision while still
keeping denied-read profiles and bounded additional-permission requests
sandboxed.

## Production Changes

- `codex-rs/core/src/tools/runtimes/unified_exec.rs`: derives a
`launch_sandbox_permissions` value from the requested sandbox
permissions and the runtime filesystem policy, then uses that value for
managed-network/env setup and launch sandbox selection. This keeps full
approval or policy-bypass decisions visible to the first unified-exec
attempt, while still preventing a full sandbox override from discarding
denied-read restrictions. Direct unified exec keeps the same decision
surface; the important difference is that zsh-fork launch setup no
longer accidentally loses the approved parent sandbox decision.

- `codex-rs/core/src/tools/runtimes/shell/unix_escalation.rs`: makes
intercepted-exec escalation selection explicit for the three sandbox
permission modes. `UseDefault` only escalates when an exec-policy
decision allows sandbox bypass, `RequireEscalated` escalates when
unsandboxed execution is allowed, and `WithAdditionalPermissions`
escalates through the bounded additional-permissions path instead of
being treated as a full unsandboxed override. Unsandboxed intercepted
execs now also rebuild the environment as `RequireEscalated`, which
strips managed-network proxy variables consistently with other
unsandboxed execution.
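
The three-way selection described for `unix_escalation.rs` can be sketched as a single match. This is an illustrative reduction, not the actual code: the `Escalation` enum, the boolean inputs, and `select_escalation` are assumptions, and only the three mode names come from the description above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SandboxPermissions {
    UseDefault,
    RequireEscalated,
    WithAdditionalPermissions,
}

/// Hypothetical outcome type for an intercepted exec.
#[derive(Debug, PartialEq, Eq)]
enum Escalation {
    StaySandboxed,
    Unsandboxed,
    BoundedAdditionalPermissions,
}

/// Pick the escalation path for an intercepted exec.
fn select_escalation(
    requested: SandboxPermissions,
    policy_allows_bypass: bool,
    unsandboxed_allowed: bool,
) -> Escalation {
    match requested {
        // Default mode escalates only on an explicit exec-policy bypass.
        SandboxPermissions::UseDefault if policy_allows_bypass => Escalation::Unsandboxed,
        SandboxPermissions::UseDefault => Escalation::StaySandboxed,
        // Approved escalation still requires unsandboxed execution to be allowed.
        SandboxPermissions::RequireEscalated if unsandboxed_allowed => Escalation::Unsandboxed,
        SandboxPermissions::RequireEscalated => Escalation::StaySandboxed,
        // Never treated as a full unsandboxed override.
        SandboxPermissions::WithAdditionalPermissions => Escalation::BoundedAdditionalPermissions,
    }
}
```

Spelling out every arm keeps `WithAdditionalPermissions` from silently falling into the unsandboxed branch, which is the failure mode this PR guards against.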

## Test Coverage

Most of the PR is tests. The new coverage verifies:

- unified exec preserves parent approval and exec-policy sandbox
decisions for zsh-fork launch selection;
- bounded `with_additional_permissions` remains sandboxed and
permission-profile based;
- denied-read profiles are not weakened by parent approval;
- explicit prompt rules still prompt for intercepted execs after the
parent command is approved;
- unsandboxed intercepted execs strip managed-network env vars.

No documentation update is needed; this is an internal approval/sandbox
correctness fix.






Michael Bolin · 2026-06-07 11:33:16 -07:00

e6c470957d

[codex] Use standalone tools for Responses Lite (#26490 )

## Summary

Responses Lite does not execute hosted Responses tools, so models using
it must route web search and image generation through Codex-owned
executors and standalone Responses API endpoints.

This PR is stacked on #26487.

## Validation

- `cargo test -p codex-core responses_lite_ --lib`
- `cargo test -p codex-core
standalone_executors_remain_hidden_without_flags_or_responses_lite
--lib`
- `cargo test -p codex-core
hosted_tools_follow_provider_auth_model_and_config_gates --lib`
- `cargo test -p codex-web-search-extension -p
codex-image-generation-extension`
- `cargo test -p codex-app-server --test all standalone_`
- `cargo fmt --all -- --check`

rka-oai · 2026-06-06 00:23:40 +00:00

ffe90cb5c3

Require absolute cwd in thread settings (#26532 )

## Why

Thread settings cwd overrides are expected to be resolved before they
enter core. Keeping this boundary as a plain `PathBuf` made it easy for
core/session code to keep fallback normalization and relative-path
resolution logic in places that should only receive an already-resolved
cwd.

This is intentionally the absolute-cwd-only slice: it does not change
environment selection stickiness or cwd-to-default-environment fallback
behavior.

## What changed

- Changes `ThreadSettingsOverrides.cwd`,
`CodexThreadSettingsOverrides.cwd`, and `SessionSettingsUpdate.cwd` to
use `AbsolutePathBuf`.
- Removes core-side cwd normalization/resolution from session settings
updates.
- Updates affected core/app-server test helpers and callsites to pass
existing absolute cwd values or use `abs()` helpers.
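
To illustrate why an absolute-only type removes the need for core-side normalization, here is a minimal sketch of a validating newtype. `AbsoluteCwd` is a hypothetical stand-in written for this example; it is not the real `AbsolutePathBuf` API.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for an absolute-only path type.
#[derive(Debug, Clone, PartialEq, Eq)]
struct AbsoluteCwd(PathBuf);

impl AbsoluteCwd {
    /// Reject relative paths at the boundary, so downstream code never
    /// needs fallback resolution logic.
    fn new(path: impl Into<PathBuf>) -> Result<Self, String> {
        let path = path.into();
        if path.is_absolute() {
            Ok(Self(path))
        } else {
            Err(format!("cwd must be absolute: {}", path.display()))
        }
    }

    fn as_path(&self) -> &Path {
        &self.0
    }
}
```

Once construction is the only place a relative path can be rejected, every consumer of the type can assume the cwd is already resolved.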

## Validation

Opening as draft so CI can start while local validation continues.

pakrym-oai · 2026-06-05 09:29:15 -07:00

40c8f1a007

[codex] Preserve logical paths during AGENTS.md discovery (#26465 )

## Intent

Follow up on #26205 by avoiding unnecessary filesystem canonicalization
during `AGENTS.md` discovery. The configured working directory is
already absolute, and canonicalization incorrectly switches symlinked
workspaces from their logical parent hierarchy to the target's
hierarchy.

## User-facing behavior

For a symlinked working directory such as:

```text
test-root/
|-- logical-repo/
|   |-- AGENTS.md              ("logical parent doc")
|   `-- workspace ------------> physical-repo/workspace/
`-- physical-repo/
    |-- AGENTS.md              ("physical parent doc")
    `-- workspace/
        `-- AGENTS.md          ("workspace doc")
```

Before this change, Codex canonicalized `logical-repo/workspace` to
`physical-repo/workspace` before discovery. It therefore loaded
`physical-repo/AGENTS.md` and `physical-repo/workspace/AGENTS.md`,
ignoring the instructions from the repository through which the user
entered the workspace.

After this change, ancestor discovery walks the configured logical path,
so Codex loads `logical-repo/AGENTS.md`. Opening
`logical-repo/workspace/AGENTS.md` still follows the symlink through the
host filesystem, so the workspace document is also loaded.
`physical-repo/AGENTS.md` is not loaded.

## Implementation

Use the logical absolute working directory when discovering project
instructions and reporting instruction sources. Filesystem reads still
follow the working-directory symlink, so an `AGENTS.md` in the target
workspace continues to load while ancestor discovery uses the symlink's
parents.
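
A rough sketch of the lexical walk, under the assumption that discovery considers every ancestor up to the filesystem root (the real implementation may stop at a project root). `candidate_agents_paths` is a hypothetical helper; the key point is that it never calls `canonicalize`.

```rust
use std::path::{Path, PathBuf};

/// Candidate `AGENTS.md` locations derived purely from the logical path,
/// outermost ancestor first. No filesystem access and no canonicalization,
/// so a symlinked workspace keeps its logical parents.
fn candidate_agents_paths(logical_cwd: &Path) -> Vec<PathBuf> {
    let mut candidates: Vec<PathBuf> = logical_cwd
        .ancestors()
        .map(|dir| dir.join("AGENTS.md"))
        .collect();
    candidates.reverse();
    candidates
}
```

For `/test-root/logical-repo/workspace` this yields candidates under `logical-repo` only; reading the innermost candidate still follows the symlink when the file is opened.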

## Validation

Added integration coverage proving that discovery loads the logical
parent's instructions and the target workspace's instructions, but not
the target parent's instructions.

Adam Perry @ OpenAI · 2026-06-04 15:08:52 -07:00

59ca34206b

Switch runtime to cloud config bundle (#24622 )

## Summary

- Adapts the moved `codex-cloud-config` crate from the legacy cloud
requirements endpoint to the new config bundle endpoint.
- Switches runtime consumers from `CloudRequirementsLoader` to
`CloudConfigBundleLoader` so one shared bundle supplies cloud-delivered
config and requirements.
- Removes the legacy cloud requirements domain loader path.

## Details

This intentionally keeps `codex-cloud-config` monolithic for review
lineage: the previous PR establishes the crate move, and this PR shows
the behavior change against that moved implementation. A follow-up PR
splits the module back into focused files.

The new bundle path preserves the important cloud requirements loader
semantics where intended: account-scoped signed cache, 30 minute TTL, 5
minute refresh cadence, retry/backoff, auth recovery, and fail-closed
startup loading. The cached payload changes from a single requirements
TOML string to the backend-delivered bundle, and validation rejects
malformed config or requirements fragments before cache write/use.

joeflorencio-openai · 2026-06-02 13:18:59 -07:00

d45cd26248

[codex] Wait for MCP readiness in core integration tests (#24964 )

Ensures MCP-backed `codex-core` integration tests exercise initialized
servers instead of racing server startup.

I've been idly investigating a few flakes and the failure modes are much
more confusing when a tool call fails because of a failed server start
than when the failed server start causes the test to fail directly.

Adam Perry @ OpenAI · 2026-05-29 10:22:27 -07:00

3e666dd32a

[codex] Support ui visibility meta for tools (#24700 )

## Summary

Adds support on tools for the same `ui.visibility` metadata that resources already support.

[spec](https://github.com/modelcontextprotocol/ext-apps/blob/main/specification/draft/apps.mdx#resource-discovery)

Gabriel Peal · 2026-05-28 10:24:03 -07:00

577ec03bf8

Add experimental turn additional context (#24154 )

## Summary

Adds experimental `additionalContext` support to `turn/start` and
`turn/steer` so clients can provide ephemeral external context, such as
browser or automation state, without turning that plumbing into a
visible user prompt or triggering user-prompt lifecycle behavior.

## API Shape

The parameter shape is:

```ts
additionalContext?: Record<string, {
  value: string
  kind: "untrusted" | "application"
}> | null
```

Example:

```json
{
  "additionalContext": {
    "browser_info": {
      "value": "Active tab is CI failures.",
      "kind": "untrusted"
    },
    "automation_info": {
      "value": "CI rerun is in progress.",
      "kind": "application"
    }
  }
}
```

The keys are opaque and caller-defined.

## Context Injection

When provided, accepted entries are inserted into model context as
hidden contextual message items, not as visible thread user-message
items.

`kind: "untrusted"` entries are inserted with role `user`:

```text
<external_${key}>${value}</external_${key}>
```

`kind: "application"` entries are inserted with role `developer`:

```text
<${key}>${value}</${key}>
```

Values are not escaped. Each value is truncated to approximately 1k
tokens before wrapping.

For `turn/start`, accepted additional context is inserted before normal
user input. For `turn/steer`, additional context is merged only when the
steer includes non-empty user input; context-only steers still reject as
empty input.
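
The role routing and wrapping rules can be sketched as below. `ContextKind` and `render_context_entry` are hypothetical names for this example, and the truncation step is deliberately omitted.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ContextKind {
    Untrusted,
    Application,
}

/// Return the message role and wrapped text for one entry.
/// Values are inserted unescaped, matching the behavior described above;
/// truncation is not modeled in this sketch.
fn render_context_entry(key: &str, value: &str, kind: ContextKind) -> (&'static str, String) {
    match kind {
        ContextKind::Untrusted => (
            "user",
            format!("<external_{key}>{value}</external_{key}>"),
        ),
        ContextKind::Application => ("developer", format!("<{key}>{value}</{key}>")),
    }
}
```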

## Dedupe Strategy

`AdditionalContextStore` lives on session state and stores the latest
complete additional-context map.

Each `turn/start` or non-empty `turn/steer` treats its
`additionalContext` as the current complete set of values. Entries are
injected only when the key is new or the exact entry for that key
changed, including `value` or `kind`. After merging, the store is
replaced with the provided map, so omitted keys are removed from the
retained set and can be injected again later if reintroduced.

Omitting `additionalContext`, passing `null`, or passing an empty object
resets the store to empty and injects nothing.
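
A minimal sketch of the replace-and-diff behavior, assuming string-typed entries: the `ContextEntry` struct and the `merge` method are illustrative, and only the `AdditionalContextStore` name comes from this PR.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq, Eq)]
struct ContextEntry {
    value: String,
    kind: String,
}

#[derive(Default)]
struct AdditionalContextStore {
    latest: BTreeMap<String, ContextEntry>,
}

impl AdditionalContextStore {
    /// Return the keys that should be injected, then replace the retained
    /// set with `provided`. `None` and an empty map both reset the store.
    fn merge(&mut self, provided: Option<BTreeMap<String, ContextEntry>>) -> Vec<String> {
        let provided = provided.unwrap_or_default();
        let to_inject: Vec<String> = provided
            .iter()
            // Inject when the key is new or its exact entry changed.
            .filter(|(key, entry)| self.latest.get(*key) != Some(*entry))
            .map(|(key, _)| key.clone())
            .collect();
        self.latest = provided;
        to_inject
    }
}
```

Because the store is replaced rather than updated, a key that is omitted once and later reintroduced is treated as new and injected again.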

## What Changed

- Threads experimental v2 `additionalContext` through app-server into
core turn start and steer handling.
- Adds separate contextual fragment types for untrusted user-role
context and application developer-role context.
- Uses pending response input items so additional context can be
combined with normal user input without treating it as prompt text.
- Adds integration coverage for start/steer flow, role routing,
dedupe/reset behavior, deletion/re-add behavior, hook-blocked input
behavior, empty context-only steer rejection, external-fragment marker
matching, and truncation.

pakrym-oai · 2026-05-26 13:02:34 -07:00

768848ab6f

Move MCP tool naming mode into manager (#21576 )

## Why

The `non_prefixed_mcp_tool_names` feature should be applied where MCP
tools become model-visible, not by remapping names later in core.
Keeping the decision in `McpConnectionManager` construction makes
`ToolInfo` the single shaped view that spec building, deferred tool
search, routing, and unavailable-tool placeholders can consume directly.

This also preserves the existing external behavior while the feature is
off, and keeps the feature-on behavior for code mode and hooks explicit
at the manager boundary.

## What Changed

- Add `McpToolNameMode` to `codex-mcp` and flow it through `McpConfig`
into `McpConnectionManager::new`.
- Normalize MCP `ToolInfo` names in the manager using either
legacy-prefixed namespaces or non-prefixed namespaces; the legacy path
adds `mcp__` without restoring the old trailing namespace suffix.
- Remove the core-side MCP name remapping path so specs, tool search,
session resolution, and unavailable-tool placeholder construction use
the manager-provided `ToolName` values directly.
- Keep code mode flattening on the `__` namespace separator.
- Preserve hook compatibility by giving non-prefixed MCP hook names
legacy `mcp__...` matcher aliases.
- Add/adjust integration and unit coverage for non-prefixed code-mode
behavior, hook matching with the feature on and off, and manager-level
legacy prefixing.

## Testing

- `cargo test -p codex-mcp --lib`
- `cargo test -p codex-core --lib tools::spec::tests -- --nocapture`
- `cargo test -p codex-core --lib mcp_tools -- --nocapture`
- `cargo test -p codex-core --lib mcp_tool_exposure -- --nocapture`
- `cargo test -p codex-core --test all mcp_tool -- --nocapture`
- `cargo test -p codex-core --test all search_tool -- --nocapture`
- `cargo test -p codex-core --test all hooks_mcp -- --nocapture`
- `cargo test -p codex-core --test all
code_mode_uses_non_prefixed_mcp_tool_names_when_feature_enabled --
--nocapture`
- `cargo test -p codex-tools`
- `cargo test -p codex-features`

pakrym-oai · 2026-05-26 08:21:15 -07:00

ff7513cd83

Honor client-resolved service tier defaults (#23537 )

## Why

Model catalog responses can now advertise a nullable
`default_service_tier` for each model. Codex needs to preserve three
distinct states all the way from config/app-server inputs to inference:

- no explicit service tier, so the client may apply the current model
catalog default when FastMode is enabled
- explicit `default`, meaning the user intentionally wants standard
routing
- explicit catalog tier ids such as `priority`, `flex`, or future tiers

Keeping those states distinct prevents the UI from showing one tier
while core sends another, especially after model switches or app-server
`thread/start` / `turn/start` updates.

## What Changed

- Plumbed `default_service_tier` through model catalog protocol types,
app-server model responses, generated schemas, model cache fixtures, and
provider/model-manager conversions.
- Added the request-only `default` service tier sentinel and normalized
legacy config spelling so `fast` in `config.toml` still materializes as
the runtime/request id `priority`.
- Moved catalog default resolution to the TUI/client side, including
recomputing the effective service tier when model/FastMode-dependent
surfaces change.
- Updated app-server thread lifecycle config construction so
`serviceTier: null` preserves explicit standard-routing intent by
mapping to `default` instead of internal `None`.
- Kept core responsible for validating explicit tiers against the
current model and stripping `default` before `/v1/responses`, without
applying catalog defaults itself.
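
The three states can be modeled as an explicit enum rather than an `Option`. The sketch below is an assumption-heavy illustration: `ServiceTierSelection`, `parse_selection`, and `tier_for_request` are hypothetical names, and it collapses client-side default resolution and core's stripping of `default` into one function for brevity.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
enum ServiceTierSelection {
    /// Nothing configured; the client may apply the catalog default.
    Unset,
    /// The user explicitly asked for standard routing.
    Default,
    /// An explicit catalog tier id such as `priority` or `flex`.
    Explicit(String),
}

/// Map a raw config value to a selection, normalizing the legacy `fast`
/// spelling to the request id `priority`.
fn parse_selection(raw: Option<&str>) -> ServiceTierSelection {
    match raw {
        None => ServiceTierSelection::Unset,
        Some("default") => ServiceTierSelection::Default,
        Some("fast") => ServiceTierSelection::Explicit("priority".to_string()),
        Some(other) => ServiceTierSelection::Explicit(other.to_string()),
    }
}

/// Tier id to place on the request, if any. `Default` is stripped.
fn tier_for_request(
    selection: &ServiceTierSelection,
    catalog_default: Option<&str>,
    fast_mode_enabled: bool,
) -> Option<String> {
    match selection {
        ServiceTierSelection::Unset if fast_mode_enabled => catalog_default.map(str::to_string),
        ServiceTierSelection::Unset | ServiceTierSelection::Default => None,
        ServiceTierSelection::Explicit(id) => Some(id.clone()),
    }
}
```

Keeping `Unset` and `Default` as separate variants is what prevents an explicit standard-routing choice from being overwritten by a catalog default after a model switch.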

## Validation

- `CARGO_INCREMENTAL=0 cargo build -p codex-cli`
- `CARGO_INCREMENTAL=0 cargo test -p codex-app-server model_list`
- `cargo test -p codex-tui service_tier`
- `cargo test -p codex-protocol service_tier_for_request`
- `cargo test -p codex-core get_service_tier`
- `RUST_MIN_STACK=8388608 CARGO_INCREMENTAL=0 cargo test -p codex-core
service_tier`

Shijie Rao · 2026-05-20 15:57:50 -07:00

370b13afc9

Make local environment optional in EnvironmentManager (#23369 )

## Summary
- make `EnvironmentManager` local environment/runtime paths optional
- simplify constructor surface around snapshot materialization
- rename local env accessors to `require_local_environment` /
`try_local_environment`

## Validation
- devbox Bazel build for touched crate surfaces
- `//codex-rs/exec-server:exec-server-unit-tests`
- `//codex-rs/app-server-client:app-server-client-unit-tests`
- filtered touched `//codex-rs/core:core-unit-tests` cases

starr-openai · 2026-05-19 12:55:34 -07:00

5c43a64e2b

[5 of 7] Replace OverrideTurnContext with ThreadSettings (#22508 )

**Stack position:** [5 of 7]

## Summary

This PR adds `Op::ThreadSettings`, a queued settings-only update
mechanism for changing stored thread settings without starting a new
turn. It also removes the legacy `Op::OverrideTurnContext` in the same
layer, so reviewers can see the replacement and deletion together.

## Changes

- Add `Op::ThreadSettings` for settings-only queued updates.
- Emit `ThreadSettingsApplied` with the effective thread settings
snapshot after core applies an update.
- Route settings-only updates through the same submission queue as user
input.
- Migrate remaining `OverrideTurnContext` tests and callers to the
queued `Op::ThreadSettings` path.
- Delete `Op::OverrideTurnContext` from the core protocol and submission
loop.

This stack addresses #20656 and #22090.

## Stack

1. [1 of 7] [Add thread settings to
UserInput](https://github.com/openai/codex/pull/23080)
2. [2 of 7] [Remove
UserInputWithTurnContext](https://github.com/openai/codex/pull/23081)
3. [3 of 7] [Remove
UserTurn](https://github.com/openai/codex/pull/23075)
4. [4 of 7] [Placeholder for OverrideTurnContext
cleanup](https://github.com/openai/codex/pull/23087)
5. [5 of 7] [Replace OverrideTurnContext with
ThreadSettings](https://github.com/openai/codex/pull/22508) (this PR)
6. [6 of 7] [Add app-server thread settings
API](https://github.com/openai/codex/pull/22509)
7. [7 of 7] [Sync TUI thread
settings](https://github.com/openai/codex/pull/22510)

Eric Traut · 2026-05-18 21:03:51 -07:00

a668379abf

[3 of 7] Remove UserTurn (#23075 )

**Stack position:** [3 of 7]

## Summary

This PR finishes the input-op consolidation by moving the remaining
`Op::UserTurn` callers onto `Op::UserInput` and deleting `Op::UserTurn`.
This touches a lot of files, but it is a low-risk mechanical migration.

## Stack

1. [1 of 7] [Add thread settings to
UserInput](https://github.com/openai/codex/pull/23080)
2. [2 of 7] [Remove
UserInputWithTurnContext](https://github.com/openai/codex/pull/23081)
3. [3 of 7] [Remove
UserTurn](https://github.com/openai/codex/pull/23075) (this PR)
4. [4 of 7] [Placeholder for OverrideTurnContext
cleanup](https://github.com/openai/codex/pull/23087)
5. [5 of 7] [Replace OverrideTurnContext with
ThreadSettings](https://github.com/openai/codex/pull/22508)
6. [6 of 7] [Add app-server thread settings
API](https://github.com/openai/codex/pull/22509)
7. [7 of 7] [Sync TUI thread
settings](https://github.com/openai/codex/pull/22510)

Eric Traut · 2026-05-18 19:56:00 -07:00

1a25d8b6e5

[codex] Remove legacy shell output formatting paths (#22706 )

## Why

The client and tool pipeline still carried compatibility code for legacy
structured shell output. Current shell and apply_patch responses are
already plain text for model consumption, so keeping a
JSON-serialization path plus shell-item rewrite logic makes the request
formatter and tests preserve a format we do not need anymore.

## What Changed

- Removed the client-side shell output rewrite from
`core/src/client_common.rs`.
- Removed the structured exec-output formatter and the shell `freeform`
switch so tool emitters use one model-facing formatter.
- Collapsed apply_patch/shell serialization tests around the remaining
plain-text output expectations and removed duplicate one-variant
parameterized cases.
- Kept the `ApplyPatchModelOutput::ShellCommandViaHeredoc` compatibility
input shape, but no longer treats it as a separate output-format mode.

## Validation

- `cargo test -p codex-core client_common`
- `cargo test -p codex-core shell_serialization`
- `cargo test -p codex-core apply_patch_cli`
- `just fix -p codex-core`

## Documentation

No external Codex documentation update is needed.

pakrym-oai · 2026-05-18 09:57:54 -07:00

82061660ae

Add user_input_requested_during_turn to MCP turn metadata (#22237)

## Why
- Similar change to https://github.com/openai/codex/pull/21219
- Without change: MCP tool calls receive
`_meta["x-codex-turn-metadata"]` with various key values.
- Issue: MCP servers currently do not know whether user input was
requested during the turn (e.g., the model prompts the user for approval
mid-turn before making a possibly risky tool call). MCP servers may want
to know this when tracking latency metrics, because latency for these
turns is inflated by the time spent waiting on the user.

## What Changed
- With change: MCP turn metadata now includes
`user_input_requested_during_turn` when a model-visible
`request_user_input` call happened earlier in the turn, propagated in
`_meta["x-codex-turn-metadata"]`.
- `mark_turn_user_input_requested()` is called when user input is
requested through either MCP elicitation (`mcp.rs`) or the
`request_user_input` tool (`mod.rs`).
- MCP tool call `_meta` is now built immediately before execution
(`mcp_tool_call.rs`) so user input requested earlier in the same turn,
including within the same tool call via elicitation, is reflected in the
metadata.
- Normal `/responses` turn metadata headers are unchanged.
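
The ordering described above (mark the turn when input is requested, build `_meta` only right before the tool runs) can be sketched roughly as follows. This is a minimal illustration, not the codex-core implementation: `TurnState`, `build_turn_metadata`, and the plain boolean map are hypothetical stand-ins, and the sketch assumes the key is simply omitted when no input was requested.

```rust
use std::collections::BTreeMap;
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical per-turn state; the real type in codex-core is not shown here.
#[derive(Default)]
struct TurnState {
    user_input_requested: AtomicBool,
}

impl TurnState {
    /// Plays the role of `mark_turn_user_input_requested()` from the PR text.
    fn mark_turn_user_input_requested(&self) {
        self.user_input_requested.store(true, Ordering::SeqCst);
    }

    /// Builds the metadata at call time, so a flag set earlier in the same
    /// turn is visible. Assumption: the key is absent when the flag is unset.
    fn build_turn_metadata(&self) -> BTreeMap<&'static str, bool> {
        let mut meta = BTreeMap::new();
        if self.user_input_requested.load(Ordering::SeqCst) {
            meta.insert("user_input_requested_during_turn", true);
        }
        meta
    }
}

fn main() {
    let turn = TurnState::default();

    // Metadata built before any prompt does not carry the key.
    let early = turn.build_turn_metadata();
    assert!(early.is_empty());

    // An elicitation or `request_user_input` call marks the turn...
    turn.mark_turn_user_input_requested();

    // ...and metadata built immediately before execution reflects it.
    let late = turn.build_turn_metadata();
    assert_eq!(late.get("user_input_requested_during_turn"), Some(&true));
}
```

Building the map lazily is what makes the elicitation case work: a snapshot taken at the start of the tool call would miss a prompt raised inside that same call.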

## Verification
- `codex-rs/core/src/session/mcp_tests.rs`
- `codex-rs/core/src/tools/handlers/request_user_input_tests.rs`
- `codex-rs/core/src/turn_metadata_tests.rs`
- `codex-rs/core/tests/suite/search_tool.rs`

mchen-oai · 2026-05-15 01:26:50 +00:00

10cf1f79dd

Remove SSE fixture loaders (#22684)

## Why

The Responses API test support already has structured SSE event
builders. Keeping separate JSON fixture loaders made small mock streams
harder to read and left an on-disk fixture for a single event.

## What changed

- Removed `load_sse_fixture` and `load_sse_fixture_with_id_from_str`
from `core_test_support`.
- Deleted the one `tests/fixtures/incomplete_sse.json` Responses API
fixture.
- Replaced the remaining call sites with `responses::sse(...)` and
existing event helpers.

## Validation

- `cargo test -p codex-core --test all
stream_no_completed::retries_on_early_close`
- `cargo test -p codex-core --test all
history_dedupes_streamed_and_final_messages_across_turns`
- `cargo test -p codex-core --test all review::`

pakrym-oai · 2026-05-15 00:40:32 +00:00

4bff020a96

chore(features) rm Feature::ApplyPatchFreeform (#22711)

## Summary
Removes the feature flag, since freeform apply_patch is effectively on
by default in all cases where we should use it, and can otherwise be
configured via models.json.

## Testing
- [x] unit tests pass

Dylan Hurd · 2026-05-14 16:15:56 -07:00

51b0e94105

Fix remote environment test fixtures (#22572)

## Why
The Docker remote-env coverage was failing before it reached the
behavior those tests are meant to exercise. The remote-aware test
fixture only registered the remote environment, so tests that
intentionally select both `local` and `remote` could not start a turn.
After that was fixed, two tests exposed stale fixtures: the approval
test was auto-approving under workspace-write, and the remote
`view_image` test was writing invalid PNG bytes.

## What Changed
- Added `EnvironmentManager::create_for_tests_with_local(...)` so tests
can keep the provider default while also selecting `local` explicitly.
- Updated `build_remote_aware()` to use that test-only manager when a
remote exec-server URL is present.
- Changed the remote apply-patch approval helper to use
`SandboxPolicy::new_read_only_policy()` so the test actually exercises
approval caching per environment.
- Replaced the hardcoded remote `view_image` PNG blob with the existing
`png_bytes(...)` helper so the test uses a valid image fixture.
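
The fixture problem in the first two bullets can be illustrated with a small sketch. Everything here is hypothetical: `SketchEnvManager`, its constructors, and `select` are illustrative names, not the `EnvironmentManager` API, and the sketch assumes the provider default is the remote environment.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for an environment registry used by tests.
struct SketchEnvManager {
    registered: BTreeSet<String>,
    default_env: String,
}

impl SketchEnvManager {
    /// Old fixture shape: only the remote environment is registered.
    fn remote_only() -> Self {
        Self {
            registered: BTreeSet::from(["remote".to_string()]),
            default_env: "remote".to_string(),
        }
    }

    /// New fixture shape: same default, but `local` is also selectable.
    fn remote_with_local() -> Self {
        Self {
            registered: BTreeSet::from(["local".to_string(), "remote".to_string()]),
            default_env: "remote".to_string(),
        }
    }

    /// Resolves the requested ids, failing on the first unknown one.
    fn select(&self, ids: &[&str]) -> Result<Vec<String>, String> {
        ids.iter()
            .map(|id| {
                if self.registered.contains(*id) {
                    Ok(id.to_string())
                } else {
                    Err(format!("unknown environment: {id}"))
                }
            })
            .collect()
    }
}

fn main() {
    // With only `remote` registered, a turn selecting both cannot start.
    let old = SketchEnvManager::remote_only();
    assert!(old.select(&["local", "remote"]).is_err());

    // Registering `local` as well lets the same selection succeed,
    // while the default environment is unchanged.
    let new = SketchEnvManager::remote_with_local();
    assert!(new.select(&["local", "remote"]).is_ok());
    assert_eq!(new.default_env, "remote");
}
```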

## Validation
Ran these isolated Docker remote-env tests on the devbox with
`$remote-tests` setup:
- `suite::remote_env::apply_patch_freeform_routes_to_selected_remote_environment`
- `suite::remote_env::apply_patch_approvals_are_remembered_per_environment`
- `suite::remote_env::apply_patch_intercepted_exec_command_routes_to_selected_remote_environment`
- `suite::remote_env::exec_command_routes_to_selected_remote_environment`
- `suite::view_image::view_image_routes_to_selected_remote_environment`

All five pass.

starr-openai · 2026-05-14 12:40:01 -07:00

255748638c
