codex

[codex] Use model metadata for skills usage instructions (#29740)

## Summary

- add a false-by-default `include_skills_usage_instructions` model
metadata field
- enable the field for the bundled `gpt-5.5` model metadata
- consume the metadata in both core and extension skill rendering
- remove hardcoded legacy-model matching and its marker plumbing

ani-oai · 2026-06-29 09:44:36 +09:00

6b5f5743b3

[codex] Enable remote plugins by default (#30297)

## Summary

- enable the remote plugin feature by default
- promote the remote plugin feature from under development to stable
- preserve the existing `features.remote_plugin` override for explicitly
disabling it
- keep legacy disabled-path coverage explicit in TUI and app-server
tests

## Impact

Remote plugin functionality is enabled by default for configurations
that do not set the feature flag. The existing Codex backend
authentication gate still applies.

## Validation

- `just fmt`
- `just test -p codex-features`
- `just test -p codex-tui
plugins_popup_remote_section_fallback_states_snapshot`
- targeted `codex-app-server` plugin-list and skills-list tests
- `git diff --check`

The full TUI and app-server suites were also exercised locally. All
remote-plugin-related coverage passed; unrelated local
sandbox/test-binary failures remain outside this change.

xl-openai · 2026-06-28 11:46:25 -07:00

e428a12d22

[app-server] increase currentTime/read timeout (#30384)

## Summary

Increase the external currentTime/read request timeout from 5 seconds to
10 seconds.

## Validation

- `just fmt`
- Focused app-server test build was stopped to defer validation to CI.

rka-oai · 2026-06-27 16:42:03 -07:00

bdd282f3bb

[app-server] expose environment info RPC (#30291)

## Why

App-server clients that configure named execution environments need to
discover an environment's shell and working directory before selecting
it for a thread or turn. Because the environment can run on a different
operating system than app-server, its working directory is represented
as a canonical `file:` URI rather than a host-local path string. The
probe also needs a bounded response time: an exec-server that completes
initialization but never answers `environment/info` must not hold the
environment serialization queue indefinitely.

## What changed

- Add an experimental `environment/info` app-server RPC for named
environments.
- Route the probe through the managed environment connection and return
target-native shell metadata plus the default working directory as a
`PathUri`.
- Return connection and protocol failures as JSON-RPC errors.
- Bound the exec-server probe response to 30 seconds and remove
timed-out calls from the pending-request table so later environment
mutations can proceed.
- Cover successful responses, omitted working directories, unknown
environments, connection failures, and pending-call cleanup.
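
The bounded wait and pending-table cleanup described above can be sketched with standard-library channels. This is a minimal illustration, not the exec-server client: `PendingRequests`, `wait_bounded`, and the `String` reply type are hypothetical names chosen for the example, and the real code is presumably async.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Receiver, RecvTimeoutError, Sender};
use std::sync::{Arc, Mutex};
use std::time::Duration;

/// Hypothetical pending-request table: request id -> channel that delivers the reply.
#[derive(Clone, Default)]
pub struct PendingRequests {
    inner: Arc<Mutex<HashMap<u64, Sender<String>>>>,
}

impl PendingRequests {
    /// Register a request and return the receiving end for its reply.
    pub fn register(&self, id: u64) -> Receiver<String> {
        let (tx, rx) = mpsc::channel();
        self.inner.lock().unwrap().insert(id, tx);
        rx
    }

    /// Deliver a reply; returns false if no request with that id is pending.
    pub fn resolve(&self, id: u64, reply: String) -> bool {
        match self.inner.lock().unwrap().remove(&id) {
            Some(tx) => tx.send(reply).is_ok(),
            None => false,
        }
    }

    /// Number of requests still waiting for a reply.
    pub fn pending_count(&self) -> usize {
        self.inner.lock().unwrap().len()
    }
}

/// Wait for a reply, removing the table entry on timeout so it cannot linger
/// and block later work.
pub fn wait_bounded(
    pending: &PendingRequests,
    id: u64,
    rx: Receiver<String>,
    timeout: Duration,
) -> Result<String, String> {
    match rx.recv_timeout(timeout) {
        Ok(reply) => Ok(reply),
        Err(RecvTimeoutError::Timeout) => {
            pending.inner.lock().unwrap().remove(&id);
            Err(format!("timed out after {timeout:?}"))
        }
        Err(RecvTimeoutError::Disconnected) => Err("connection closed".to_string()),
    }
}
```

With a 30-second `timeout`, a caller would map the `Err` branch to a JSON-RPC error like the one shown under "Protocol examples".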

## Protocol examples

Request:

```json
{
  "id": 42,
  "method": "environment/info",
  "params": {
    "environmentId": "remote-a"
  }
}
```

Successful response:

```json
{
  "id": 42,
  "result": {
    "shell": {
      "name": "zsh",
      "path": "/bin/zsh"
    },
    "cwd": "file:///workspace"
  }
}
```

If the exec-server initializes but does not answer the probe within 30
seconds:

```json
{
  "id": 42,
  "error": {
    "code": -32603,
    "message": "failed to get info for environment `remote-a`: exec-server protocol error: timed out waiting for exec-server `environment/info` response after 30s"
  }
}
```

## Testing

- App-server integration coverage for successful info (including omitted
`cwd`), unknown environments, and connection failures.
- Exec-server RPC coverage verifying a timed-out call is removed from
the pending-request table.

---------

Co-authored-by: Michael Bolin <mbolin@openai.com>

Max Johnson · 2026-06-27 19:34:10 +00:00

e2398d0b16

app-server: structure and test JSON shutdown logs (#30314)

## Why

`LOG_FORMAT=json` and `RUST_LOG` are supported by app-server, but the
behavior was only covered indirectly. We should verify the actual JSONL
written by both user-facing entry points: `codex app-server` and the
standalone `codex-app-server` binary.

The existing processor shutdown message also always said the channel
closed, even though the processor can exit for several different
reasons. Structured fields make that event more accurate and useful to
log consumers.

## What changed

- Record the processor `exit_reason`, remaining connection count, and
forced-shutdown state as structured tracing fields.
- Add a shared process-test helper that enables JSON logging, validates
every stderr line as JSON, and verifies the top-level timestamp is RFC
3339.
- Cover both `codex app-server` and `codex-app-server`, asserting the
stable `level`, `fields`, and `target` payload.

## Test plan

- `just test -p codex-app-server
standalone_app_server_emits_json_info_events`
- `just test -p codex-cli app_server_emits_json_info_events`

Michael Bolin · 2026-06-26 18:19:56 -07:00

4f1b5a4b73

[codex] Support npm marketplace plugin sources (#29375)

## Why

Marketplace source deserialization treated `{"source":"npm", ...}` as
unsupported. The loader logged and skipped the entry, so npm-backed
plugins never appeared in `plugin list --available` and `plugin add`
returned "plugin not found".

Codex plugins are installed from a plugin root, not from an npm
dependency tree. For npm-backed marketplace entries, Codex should fetch
the published package contents without running package scripts or
installing unrelated dependencies.

## What changed

- Add `npm` marketplace plugin sources with `package`, optional semver
`version` or version range, and optional HTTPS `registry`.
- Reject unsafe npm source fields before materialization, including
invalid package names, non-semver version selectors, plaintext or
credential-bearing registry URLs, and registry query/fragment data.
- Materialize npm plugins with `npm pack --ignore-scripts`, then unpack
the resulting tarball through the existing hardened plugin bundle
extractor.
- Enforce npm archive and extracted-size limits, require the standard
npm `package/` archive root, and verify the extracted `package.json`
name matches the requested package before installing.
- Keep plugin listings, install-source descriptions, CLI JSON/human
output, app-server v2 `PluginSource`, TUI source summaries, regenerated
schema fixtures, and app-server documentation in sync.
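
As a rough illustration of the registry checks listed above (HTTPS only, no embedded credentials, no query or fragment data), a string-level sketch might look like the following. `validate_registry` is a hypothetical helper, not the validator in this PR, and real URL parsing has more edge cases than this handles.

```rust
/// Simplified registry check: HTTPS only, no credentials, no query or fragment.
/// Illustrative only; a production validator should use a real URL parser.
pub fn validate_registry(url: &str) -> Result<(), &'static str> {
    // Anything that is not literally `https://` is rejected (fail closed).
    let rest = url
        .strip_prefix("https://")
        .ok_or("registry must use https")?;
    if rest.contains('?') || rest.contains('#') {
        return Err("registry must not carry query or fragment data");
    }
    // The authority is everything before the first '/'.
    let authority = rest.split('/').next().unwrap_or("");
    if authority.is_empty() {
        return Err("registry must name a host");
    }
    if authority.contains('@') {
        return Err("registry must not embed credentials");
    }
    Ok(())
}
```

Rejecting on the first failed check keeps the behavior fail-closed: a source that does not pass never reaches the `npm pack` step.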

## Impact

Marketplaces can distribute Codex plugins from public or configured
private HTTPS npm registries using the same install flow as existing
materialized plugin sources. `npm` must be available on `PATH` when an
npm-backed plugin is installed.

Fixes #27831

## Validation

- `just write-app-server-schema`
- `just test -p codex-core-plugins -p codex-app-server-protocol -p
codex-app-server -p codex-cli`
  - npm/schema/core-plugin coverage passed in the run.
- The full focused command finished with `1739 passed`, `11 failed`, and
`6 timed out`; the failures were unrelated local app-server environment
failures from `sandbox-exec: sandbox_apply: Operation not permitted`
plus one missing `test_stdio_server` helper binary.
- Installed an npm-published Codex plugin package through a throwaway
local marketplace and throwaway `CODEX_HOME` to exercise the real npm
materialization path end to end.

charlesgong-openai · 2026-06-26 17:24:46 -04:00

6509f3148a

feat(app-server): add optional turn_id to thread/fork (#30277)

## Description

This adds stable optional `turnId` support to `thread/fork`. When
supplied, the fork copies persisted history through that terminal turn,
inclusive, and drops later turns from the new thread.

Omitting or passing `null` preserves the existing full-history fork
behavior, including the interruption marker when the stored source
history ends mid-turn.
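
The inclusive cut can be pictured with a small sketch. `Turn` and `fork_history` are hypothetical names, and returning an error for an unknown turn ID is an assumption; the description above does not say how that case is handled.

```rust
#[derive(Clone, Debug, PartialEq)]
pub struct Turn {
    pub id: String,
}

/// Returns the history a fork would copy.
/// `None` keeps the full history; `Some(id)` keeps turns up to and including `id`.
/// Treating an unknown id as an error is an assumption made for this sketch.
pub fn fork_history(turns: &[Turn], turn_id: Option<&str>) -> Result<Vec<Turn>, String> {
    match turn_id {
        None => Ok(turns.to_vec()),
        Some(id) => {
            let idx = turns
                .iter()
                .position(|t| t.id == id)
                .ok_or_else(|| format!("unknown turn id: {id}"))?;
            // Inclusive: keep the matching turn, drop everything after it.
            Ok(turns[..=idx].to_vec())
        }
    }
}
```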

## Why

We're deprecating `thread/rollback`, and this gives certain UX use cases
a replacement: call `thread/fork` with `turnId` instead.

Owen Lin · 2026-06-26 19:35:54 +00:00

f72976a5f1

[codex] allow AGENTS.md and skills to authorize delegation (#30274)

Update the MAv2 prompt to include AGENTS.md and skills more explicitly.

Should mimic: https://github.com/openai/codex/pull/27919

Charles Du · 2026-06-26 12:17:26 -07:00

79a8ffdbf7

[codex] Add managed new-thread model settings (#29683)

## Why

Admins need persistent defaults for the model, reasoning effort, and
service tier shown when the Desktop App creates a new thread. These are
initialization defaults rather than runtime constraints: the App should
use them to initialize its draft while still allowing a user to make an
explicit selection.

The app-server therefore needs to expose the managed values before
thread creation without changing `thread/start` behavior for other
clients.

## What changed

- Parse `model`, `model_reasoning_effort`, and `service_tier` from
`[models.new_thread]` in `requirements.toml`.
- Compose the `models` requirements through the existing
requirements-layer precedence rules.
- Expose the resolved values through `configRequirements/read` as
`requirements.models.newThread`.
- Add the corresponding app-server protocol types and regenerate the
JSON and TypeScript schema fixtures.
- Document the new `configRequirements/read` fields in the app-server
README.
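
One way to picture composing the three values across requirements layers is a per-field fold. This is a sketch under a stated assumption: the layers are ordered from highest to lowest precedence and the first layer that sets a field wins. The actual requirements-layer rules may differ, and `NewThreadModelSettings` and `compose` are hypothetical names.

```rust
#[derive(Clone, Debug, Default, PartialEq)]
pub struct NewThreadModelSettings {
    pub model: Option<String>,
    pub model_reasoning_effort: Option<String>,
    pub service_tier: Option<String>,
}

/// Fold layers ordered from highest to lowest precedence (an assumption);
/// for each field, the first layer that sets it wins.
pub fn compose(layers: &[NewThreadModelSettings]) -> NewThreadModelSettings {
    layers
        .iter()
        .fold(NewThreadModelSettings::default(), |acc, layer| {
            NewThreadModelSettings {
                model: acc.model.or_else(|| layer.model.clone()),
                model_reasoning_effort: acc
                    .model_reasoning_effort
                    .or_else(|| layer.model_reasoning_effort.clone()),
                service_tier: acc.service_tier.or_else(|| layer.service_tier.clone()),
            }
        })
}
```

Fields that no layer sets stay `None`, which matches the idea that these are optional initialization defaults rather than constraints.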

## Scope

This PR is data plumbing only. It does not apply these values during
`thread/start` and does not change thread creation for existing
app-server clients, resumed or forked sessions, internal or subagent
sessions, `codex exec`, or the TUI. A companion Desktop App change owns
draft initialization, sends the effective settings for ordinary and
prewarmed starts, and preserves explicit user changes.

## Validation

- Requirements deserialization coverage for `[models.new_thread]`
- Requirements-layer precedence coverage
- App-server API mapping coverage
- `configRequirements/read` integration coverage
- Regenerated app-server JSON and TypeScript schema fixtures

hefuc-oai · 2026-06-26 18:37:40 +00:00

d9cf931d0e

feat(app-server): add history_mode to thread (#29927)

## Description

This PR adds a new `historyMode` field (`"legacy" | "paginated"`) to
`Thread`. The value is stored in `SessionMeta` in the JSONL rollout file
and as a new column in the SQLite `thread_metadata` table, and it is
exposed on `thread/start` and on the `Thread` object in app-server.

## What changed

- Added canonical `ThreadHistoryMode` with `legacy` and `paginated`,
defaulting old and new SessionMeta to `legacy`.
- Carried `history_mode` through core session config, ThreadStore stored
metadata, local/in-memory stores, rollout metadata extraction, and the
existing SQLite `threads` table.
- Added experimental `historyMode` to app-server v2 `Thread` and
`thread/start`.
- Made paginated stored threads metadata-discoverable but unsupported
for legacy full-history reads, `load_history`, live resume, and create
paths.
- Regenerated app-server schema fixtures and added
protocol/state/thread-store/app-server coverage for persistence and
fail-closed behavior.

## Compatibility floor

Because users may be running various versions of Codex binaries on the
same machine (TUI, Codex App, etc.), we will need to establish a
compatibility floor for upcoming paginated threads, which will change
how thread storage reads and writes work.

The overall plan here:
```
Release N:
- Add historyMode to SessionMeta / Thread / SQLite metadata.
- Teach binaries to understand paginated threads.
- If a binary sees `historyMode="paginated"` but does not support the paginated contract, it refuses to resume/mutate the thread.
- Default remains `"legacy"`.

Release N+1:
- First-party clients start opting into paginated threads where appropriate.
- Internal dogfood / staged rollout.
- Measure old-client usage and paginated-thread unsupported errors.

Release N+2:
- Only after Release N+ is overwhelmingly deployed, make paginated the default.
- Accept that a small tail of N-1-or-older binaries may not understand paginated threads.
```

The important behavior change is fail-closed handling for a binary that
encounters a persisted `paginated` thread before it knows how to fully
support paginated history. In app-server, if a thread is `paginated`, we
will:

- allow metadata-only discovery paths like `thread/list` and
`thread/read(includeTurns=false)`, so clients can still see the thread
and inspect its `historyMode`
- reject legacy full-history/live-thread paths like
`thread/read(includeTurns=true)` and `thread/resume` with an unsupported
JSON-RPC error
- avoid silently treating an unknown or future `historyMode` as `legacy`

Under the hood, the ThreadStore layer also rejects legacy operations
that would need to load or replay the full thread history for a
paginated thread. That gives us the behavior we want for Release N:
future paginated threads are visible, but this binary fails closed
instead of trying to operate on them as if they were legacy threads.
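
The fail-closed rules above can be summarized in a short sketch. The type and function names (`HistoryMode`, `Operation`, `parse_history_mode`, `check_supported`) are illustrative rather than the ones in the codebase; the behavior follows the bullets: a missing value defaults to `legacy`, an unrecognized value is rejected, and only metadata-only paths stay open for paginated threads.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum HistoryMode {
    Legacy,
    Paginated,
}

/// Parse a persisted mode. A missing value means `legacy` (older rollouts);
/// an unrecognized value is an error rather than silently becoming `legacy`.
pub fn parse_history_mode(raw: Option<&str>) -> Result<HistoryMode, String> {
    match raw {
        None | Some("legacy") => Ok(HistoryMode::Legacy),
        Some("paginated") => Ok(HistoryMode::Paginated),
        Some(other) => Err(format!("unsupported historyMode: {other}")),
    }
}

/// Hypothetical operation kinds, standing in for the app-server request paths.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Operation {
    List,
    ReadMetadata,
    ReadWithTurns,
    Resume,
}

/// Metadata-only paths stay open; full-history paths are rejected for
/// paginated threads.
pub fn check_supported(mode: HistoryMode, op: Operation) -> Result<(), String> {
    match (mode, op) {
        (HistoryMode::Legacy, _) => Ok(()),
        (HistoryMode::Paginated, Operation::List | Operation::ReadMetadata) => Ok(()),
        (HistoryMode::Paginated, op) => {
            Err(format!("{op:?} is unsupported for paginated threads"))
        }
    }
}
```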

Owen Lin · 2026-06-26 09:12:42 -07:00

5267e805fb

Test selected capabilities across unavailable resume (#30215)

## Why

The selected-capability integration test already covers initial
attachment and cold resume, but it resumes while the selected executor
is still reachable.

That leaves an important World State transition untested: a thread
remembers its selected capability root, resumes while that environment
is unavailable, and later sees the same stable environment return.

## What this tests

This extends the existing end-to-end scenario:

```text
selected executor available
        ↓
app-server stops and the executor goes away
        ↓
thread resumes with the executor unavailable
        ↓
skills, selected MCP tools, and connector attribution are absent
        ↓
the same environment ID is attached again
        ↓
skills, MCP tools, and connector attribution return
```

The test also checks that the unavailable snapshot explicitly tells the
model that no selected-environment skills are currently available. After
reattachment, it invokes the selected skill again and verifies that a
new executor-owned MCP process starts.

## Scope

This is test-only. It keeps the existing assumption that an environment
ID refers to stable capability contents. It does not add package-file
invalidation or live transport reconnect behavior.

jif · 2026-06-26 11:02:27 +01:00

3c03bb4f18

Test selected capabilities across availability and resume (#30157)

## Why

This stack crosses World State, executor skills, selected plugin
metadata, MCP processes, connectors, dynamic environments, and resume.
This PR adds two end-to-end scenarios that validate those pieces
together.

Both tests enable `deferred_executor`, so they exercise the real
delayed-environment path.

## Scenario 1: availability across turns and resume

```text
1. Start a thread with one selected plugin root bound to E1.
2. E1 is unavailable.
   - executor skill is absent
   - selected MCP is absent
   - connector has no selected-plugin attribution
3. Start E1 and register the same stable environment ID.
4. Start a new turn.
   - the executor skill appears through World State
   - its body beats a colliding host skill
   - the selected MCP tool is advertised and executes inside E1
   - the connector is attributed to the selected plugin
5. Start another turn without changing E1.
   - the MCP PID stays the same, proving runtime reuse
6. Restart app-server and resume the thread.
   - durable selected-root intent is restored
   - skills, MCP, and connector attribution are restored
   - a new MCP PID proves ephemeral process state was rebuilt
```

## Scenario 2: availability changes inside one turn

```text
1. Start a turn while E1 is unavailable.
2. The first model sample sees no executor skill, MCP, or selected connector.
3. The turn pauses on request_user_input.
4. Start E1 and register it while that same turn is still active.
5. Continue the turn.
6. The very next model sample sees:
   - the executor skill catalog
   - the selected MCP tool
   - selected-plugin connector attribution
7. The model calls the MCP, and its output proves execution happened inside E1.
```

This second scenario specifically protects the aeon-style behavior:
capability state is captured again for every sampling step, not only at
the next user turn.

## Scope

These are integration tests only. They do not add a combinatorial matrix
for unsupported plugin-file mutation, environment generations, transport
disconnects, or delayed `required = true` executor MCPs.

jif · 2026-06-26 03:11:55 +01:00

25f50de6ed

Expose MCP app identity in app context (#29934)

## Why

MCP tool-call events need to expose trusted app identity and action
metadata directly so v2 clients do not have to infer it from tool names
or resource URIs.

## What changed

- Add optional `appName`, `templateId`, and `actionName` fields to MCP
tool-call `appContext`.
- Populate `appName` and `templateId` from trusted Codex Apps metadata,
and derive `actionName` from the trusted app resource metadata.
- Preserve all three fields through core events, legacy protocol events,
persisted thread history, resume redaction, and app-server v2 responses.
- Document the public `appContext` fields in
`codex-rs/app-server/README.md`.
- Regenerate app-server JSON and TypeScript schemas and add coverage for
serialization, persistence, redaction, and metadata propagation.

## Validation

- `just test -p codex-app-server-protocol mcp_tool_call`
- `just test -p codex-core
mcp_tool_call_item_metadata_only_trusts_codex_apps_identity
mcp_tool_call_item_includes_app_identity`
- `just write-app-server-schema`

---------

Co-authored-by: Martin Au-Yeung <280153141+martinauyeung-oai@users.noreply.github.com>

Martin Au-Yeung · 2026-06-25 18:31:10 -07:00

ec300bc7bd

Keep MCP elicitation routable across runtime refreshes (#30127)

## Why

An MCP tool call can still be waiting for an elicitation response when
an environment update replaces the thread's MCP runtime.

Before this change:

```text
runtime A starts a tool call and asks the user
environment becomes ready, so runtime B is published
client answers the prompt through runtime B
runtime B cannot find runtime A's pending responder
```

The response is lost and the original tool call stays blocked.

## What changed

All MCP runtimes for one thread now share a small elicitation router:

```text
runtime A ---\
               shared router: response token -> exact pending responder
runtime B ---/
```

When Codex surfaces an MCP elicitation, it assigns a unique opaque
response token. The router records which pending request owns that
token. A replacement runtime reuses the same router, so the latest
runtime can deliver a response to a request started by the previous
runtime.

The Codex-owned token also prevents two runtime connections that reuse
the same MCP server request ID from receiving each other's responses.

This does not retain or search old MCP managers. Only the pending
responder map is shared.
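
A minimal sketch of such a shared router, using standard-library channels, is shown below. All names (`ElicitationRouter`, `Runtime`, `register`, `respond`) are hypothetical, and a simple counter stands in for the opaque response token; the real token format is not described here.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Receiver, Sender};
use std::sync::{Arc, Mutex};

/// Shared by every runtime of one thread: response token -> pending responder.
#[derive(Clone, Default)]
pub struct ElicitationRouter {
    inner: Arc<Mutex<RouterState>>,
}

#[derive(Default)]
struct RouterState {
    // A counter stands in for the opaque token (an assumption of this sketch).
    next_token: u64,
    pending: HashMap<u64, Sender<String>>,
}

impl ElicitationRouter {
    /// Register a pending elicitation; returns its token and the receiver
    /// that the originating tool call waits on.
    pub fn register(&self) -> (u64, Receiver<String>) {
        let (tx, rx) = mpsc::channel();
        let mut state = self.inner.lock().unwrap();
        let token = state.next_token;
        state.next_token += 1;
        state.pending.insert(token, tx);
        (token, rx)
    }

    /// Deliver a response by token; false if the token is unknown or already used.
    pub fn respond(&self, token: u64, answer: &str) -> bool {
        let responder = self.inner.lock().unwrap().pending.remove(&token);
        match responder {
            Some(tx) => tx.send(answer.to_string()).is_ok(),
            None => false,
        }
    }
}

/// Hypothetical runtime handle; a replacement runtime clones the same router.
pub struct Runtime {
    pub router: ElicitationRouter,
}
```

Because the token, not the MCP server's request ID, is the map key, two runtimes that reuse the same server request ID cannot receive each other's responses.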

## Covered scenario

The integration test exercises the complete failure mode:

1. A thread starts while its selected environment is still unavailable.
2. A configured MCP server starts a tool call and asks the client for
input.
3. The environment becomes ready, causing Codex to publish a replacement
MCP runtime.
4. The client answers the original prompt after the replacement.
5. The original tool call receives that answer and completes.

A focused routing test also creates two runtimes with the same server
request ID and verifies that each response reaches the exact request
that emitted its token.

## Scope

This PR changes only elicitation response routing across MCP runtime
replacement. It does not change when runtimes are rebuilt, which
environments contribute MCP configuration, or how environment
availability is detected.

jif · 2026-06-26 01:28:14 +00:00

fb8598df3f

[codex] Attribute app-server analytics by thread originator (#29935)

## Why

Desktop Work threads and regular Codex threads can share the same
app-server connection. App-server analytics currently copy
`product_client_id` from connection metadata for every thread-scoped
event, so Work thread activity is attributed to the Desktop connection
instead of the thread's resolved originator. This prevents analytics
from distinguishing the two products on a shared connection.

## What changed

- Publish the resolved originator after a thread is materialized,
covering new, resumed, forked, and subagent threads.
- Store that originator in the analytics reducer's existing per-thread
state.
- Override only `app_server_client.product_client_id` for thread, turn,
tool, review, goal, guardian, and compaction events while preserving the
connection's client name, version, and transport metadata.
- Fall back to the connection-wide product client ID when a thread has
no originator override.
- Preserve persisted originators in thread initialization analytics for
resume and fork flows.
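
The override-with-fallback rule can be sketched as a lookup on per-thread state. `AnalyticsState` and `product_client_id_for` are illustrative names, not the analytics reducer's actual types.

```rust
use std::collections::HashMap;

/// Hypothetical reducer state: the connection-wide id plus per-thread originators.
pub struct AnalyticsState {
    pub connection_product_client_id: String,
    pub thread_originators: HashMap<String, String>,
}

impl AnalyticsState {
    /// Product client id to stamp on a thread-scoped event: the thread's
    /// originator when one was published, otherwise the connection-wide id.
    pub fn product_client_id_for(&self, thread_id: &str) -> &str {
        self.thread_originators
            .get(thread_id)
            .map(String::as_str)
            .unwrap_or(self.connection_product_client_id.as_str())
    }
}
```

Only the product client ID is looked up per thread; client name, version, and transport would still come from the connection metadata.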

## Validation

- `just test -p codex-analytics
thread_originator_overrides_shared_connection_across_thread_events
subagent_events_keep_thread_originator_with_explicit_turn_connection`
- `just test -p codex-app-server
turn_start_tracks_thread_originator_in_analytics
thread_start_tracks_thread_initialized_analytics
thread_fork_tracks_thread_initialized_analytics
thread_resume_tracks_thread_initialized_analytics`
- `just test -p codex-core thread_manager`

alexsong-oai · 2026-06-25 18:15:48 -07:00

841f30598c

Project selected plugin runtime by environment availability (#30093)

## Why

Selected plugin metadata is stable, but MCP processes are live runtime
state. They need different lifetimes:

- the MCP extension caches manifest, MCP, and connector declarations for
each stable selected root;
- each model step projects that cached metadata through the roots that
resolved as ready for that exact step;
- the MCP manager is rebuilt only when that availability projection
changes.

This matches executor skills: both features consume the same resolved
step roots instead of inferring readiness from the turn's selected
environments.

## Behavior

```text
E1 not ready for this step
  -> no E1 MCP servers or connectors
  -> cached plugin metadata stays in ext/mcp

E1 becomes ready
  -> reuse cached metadata
  -> publish one MCP runtime containing E1 capabilities

same ready roots on the next step
  -> reuse the exact runtime; no rediscovery and no MCP restart

resume
  -> create new extension thread state and a new MCP runtime
```

All model-facing consumers use the same step snapshot:

```text
resolved selected roots
        |
        v
extension MCP/connector projection
        |
        v
{ MCP config, connector snapshot, MCP manager }
        |
        +-> advertise model tools
        +-> build app/connector tools
        +-> execute MCP calls
```

## Cache contract

The existing MCP extension owns a cache keyed by the full
`SelectedCapabilityRoot`:

```rust
let state = thread_store.get_or_init(SelectedExecutorPluginMcpState::default);
```

The cache lives with extension thread state. Environment availability
filters projection but does not invalidate metadata. Resume creates new
thread state. There is no file watcher or executor generation because
contents behind a stable environment/root are assumed stable.
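
The "rebuild only when the availability projection changes" rule can be sketched by comparing the set of ready roots between steps. `RuntimeProjector`, this simplified `McpRuntime`, and the `generation` counter are hypothetical and exist only to make reuse observable.

```rust
use std::collections::BTreeSet;
use std::sync::Arc;

/// Hypothetical runtime; `generation` only makes rebuilds observable.
#[derive(Debug)]
pub struct McpRuntime {
    pub ready_roots: BTreeSet<String>,
    pub generation: u64,
}

#[derive(Default)]
pub struct RuntimeProjector {
    current: Option<Arc<McpRuntime>>,
    builds: u64,
}

impl RuntimeProjector {
    /// Return the runtime for this step, rebuilding only if the ready-root
    /// set differs from the one the current runtime was built for.
    pub fn runtime_for_step(&mut self, ready_roots: &BTreeSet<String>) -> Arc<McpRuntime> {
        if let Some(current) = &self.current {
            if &current.ready_roots == ready_roots {
                return Arc::clone(current);
            }
        }
        self.builds += 1;
        let runtime = Arc::new(McpRuntime {
            ready_roots: ready_roots.clone(),
            generation: self.builds,
        });
        self.current = Some(Arc::clone(&runtime));
        runtime
    }
}
```

Cached plugin metadata is deliberately absent from the sketch: per the contract above, it lives elsewhere and is not invalidated when availability changes.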

## What changes

- Keeps executor plugin discovery and cached metadata in `ext/mcp`.
- Caches MCP and connector declarations together per selected root.
- Uses the step's already-resolved capability roots, including lazy
environments that are not turn environments.
- Reuses the current MCP runtime when the ready-root projection is
unchanged.
- Uses the same step MCP manager and connector snapshot for
model-visible tools and execution.
- Resolves direct thread-scoped MCP requests from the current
selected-root projection.

## Deliberately out of scope

- `app/list` remains based on the latest global host-plugin state; this
PR does not make its response or notifications thread-specific.
- `required = true` startup semantics do not apply to delayed executor
MCP activation.
- No filesystem/content invalidation.
- No transport-disconnect watcher.
- No executor generations or environment replacement semantics.
- No client sharing across complete manager replacements.

## Stack

1. Extension-owned World State sections.
2. Project executor skills through World State.
3. Pin one MCP runtime to each model step.
4. **This PR:** project selected MCP and connector state from
extension-owned metadata.
5. Integration coverage for selected capability availability and resume.

## Verification

- `selected_plugin_servers_use_managed_requirements_for_the_selected_root_id`
- The stacked integration PR covers unavailable to ready activation,
unchanged-runtime reuse, skills, MCP tools, connector attribution, and
cold resume.

jif · 2026-06-26 01:36:44 +01:00

3095ea9c3d

Pin MCP runtimes to model steps (#30101)

## Why

An MCP refresh can replace the session's current manager while a model
step is still running. The step must execute calls through the same
manager whose tools it advertised.

## Boundary

```text
current session MCP runtime
          |
          | capture once for this model step
          v
StepContext.mcp
  - exact MCP config
  - exact connection manager
  - exact runtime environment context
```

```rust
pub struct McpRuntimeSnapshot {
    config: Arc<McpConfig>,
    manager: Arc<McpConnectionManager>,
    runtime_context: McpRuntimeContext,
}
```

## Example

```text
step A captures runtime A and advertises A's tools
refresh publishes runtime B
step A tool call -> runtime A
next step        -> runtime B
```

Capturing the snapshot is only an `Arc` clone. It does not restart MCPs
or make an RPC.
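
The step A / runtime B example can be reproduced with a small sketch. `SessionMcp` and this stripped-down `McpRuntime` are hypothetical stand-ins, and a `Mutex` around an `Arc` is just one simple way to publish replacements; the real code may use a different primitive.

```rust
use std::sync::{Arc, Mutex};

/// Stand-in for the captured runtime (config, manager, and context omitted).
pub struct McpRuntime {
    pub name: &'static str,
}

/// Session-level slot holding the currently published runtime.
#[derive(Clone)]
pub struct SessionMcp {
    current: Arc<Mutex<Arc<McpRuntime>>>,
}

impl SessionMcp {
    pub fn new(runtime: McpRuntime) -> Self {
        Self {
            current: Arc::new(Mutex::new(Arc::new(runtime))),
        }
    }

    /// Capture the runtime for one model step: only an `Arc` clone.
    pub fn capture(&self) -> Arc<McpRuntime> {
        let guard = self.current.lock().unwrap();
        Arc::clone(&*guard)
    }

    /// Publish a replacement by swapping the shared pointer under the lock.
    pub fn publish(&self, runtime: McpRuntime) {
        *self.current.lock().unwrap() = Arc::new(runtime);
    }
}
```

A step that captured runtime A keeps using it after B is published, and A is dropped once the last holder releases its `Arc`.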

## What changes

- Captures one MCP runtime in `StepContext`.
- Uses it for tool planning, tool calls, resources, approvals, connector
attribution, and elicitation.
- Publishes replacement runtimes atomically.
- Lets an old runtime live only while an in-flight step or request still
holds its `Arc`.

Most of this diff is mechanical routing from the session-global manager
to `step_context.mcp`; it does not introduce selected-plugin discovery
yet.

## What does not change

- No plugin or extension migration.
- No new MCP cache policy.
- No environment file watching.
- No client sharing between separate managers.

## Stack

1. Extension-owned World State sections.
2. Project executor skills through World State.
3. **This PR:** pin one MCP runtime to each model step.
4. Project selected MCP/app/connector metadata by environment
availability.
5. One end-to-end integration scenario.

jif · 2026-06-26 00:53:07 +01:00

ee9e0f6387

[codex] Surface MCP reauthentication-required startup failures (#29877)

## Summary

- distinguish expired, non-refreshable stored MCP OAuth credentials from
first-time missing credentials
- carry a typed `failureReason: "reauthenticationRequired"` on the
existing `mcpServer/startupStatus/updated` notification only when user
action is required
- keep the public MCP auth-status API unchanged and regenerate the
app-server protocol schemas and documentation

## Why

An MCP server with an expired access token and no usable refresh token
currently fails startup without giving clients a reliable, typed
recovery signal.

The existing startup-status notification is the natural place to carry
this state. Its nullable `failureReason` keeps the recovery reason
attached to the failed startup transition without adding a one-off
notification. Internally, Codex distinguishes first-time login from
reauthentication and emits the reason only when the startup error itself
requires authentication.

## User impact

App clients can prompt an existing user to reconnect an MCP server when
automatic recovery is impossible by handling a failed
`mcpServer/startupStatus/updated` notification whose `failureReason` is
`reauthenticationRequired`. Starting, ready, cancelled, unrelated
failures, and first-time setup carry no reauthentication reason.
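
The decision described above reduces to a small classification. The enum and function names below are illustrative and not taken from the codebase; the rule itself follows the text: only expired, non-refreshable stored credentials on an auth-related startup failure produce the reason.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum StoredCredentials {
    /// Nothing stored: first-time setup.
    Missing,
    /// Access token still valid.
    Valid,
    /// Access token expired; `refreshable` says whether a usable refresh token exists.
    Expired { refreshable: bool },
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum FailureReason {
    ReauthenticationRequired,
}

/// Reason to attach to a failed startup. `error_requires_auth` is true only
/// when the startup error itself requires authentication.
pub fn failure_reason(
    creds: StoredCredentials,
    error_requires_auth: bool,
) -> Option<FailureReason> {
    match creds {
        StoredCredentials::Expired { refreshable: false } if error_requires_auth => {
            Some(FailureReason::ReauthenticationRequired)
        }
        _ => None,
    }
}
```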

## Companion app PR

- openai/openai#1069582

## Validation

- `just test -p codex-app-server-protocol` — 248 passed; schema fixture
tests passed
- `cargo check -p codex-app-server -p codex-tui`
- `just test -p codex-rmcp-client -p codex-mcp` — 184 passed, 2 skipped
- `just test -p codex-protocol -p codex-app-server-protocol -p
codex-mcp` — 579 passed
- `just write-app-server-schema`
- `just fmt`

felixxia-oai · 2026-06-25 21:50:36 +00:00

a6d20ed297

fix(app-server): suppress TUI rollback warning (#30124)

## Why

The TUI uses `thread/rollback` internally for user-facing flows such as
prompt cancellation/backtracking. After `thread/rollback` was marked
deprecated, those internal calls started surfacing `deprecationNotice`
messages in the TUI, even though the user did not explicitly call the
deprecated app-server API.

The endpoint should remain deprecated for external app-server clients,
but the built-in `codex-tui` client should not show this
implementation-detail warning during normal interaction.

## What changed

- Pass the initialized app-server client name into the `thread/rollback`
request processor.
- Suppress the `thread/rollback` deprecation notice only for
`codex-tui`.
- Preserve the existing `deprecationNotice` behavior for non-TUI
clients.
- Add regression coverage for the `codex-tui` suppression path.
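
The suppression rule amounts to a check on the initialized client name. The function below is a hypothetical sketch, not the request processor's actual signature; the notice text is the one quoted in the test steps of this PR.

```rust
const ROLLBACK_DEPRECATION: &str = "thread/rollback is deprecated and will be removed soon";

/// Deprecation notice to attach to a `thread/rollback` response, if any.
/// Only the built-in `codex-tui` client is exempt; unknown or missing client
/// names still receive the notice.
pub fn rollback_deprecation_notice(client_name: Option<&str>) -> Option<&'static str> {
    match client_name {
        Some("codex-tui") => None,
        _ => Some(ROLLBACK_DEPRECATION),
    }
}
```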

## How to Test

1. Start Codex TUI from this branch.
2. Type text into the composer and press `Esc` to cancel/backtrack.
3. Confirm the TUI restores/cancels the prompt without showing
`thread/rollback is deprecated and will be removed soon`.
4. Also verify an external app-server client that calls
`thread/rollback` still receives `deprecationNotice`.

Targeted tests:

- `just test -p codex-app-server thread_rollback`
- `just argument-comment-lint`

Felipe Coury · 2026-06-25 18:44:35 -03:00

b80fbb70cd

feat(core, mcp): cache codex_apps tools in memory (#29003)

## Description

This makes Codex Apps tool reads use a shared in-memory snapshot instead
of rereading the disk cache every time `list_all_tools()` runs. Disk
still seeds the cache on startup and gets updated after successful
fetches, but it is no longer the live read path.

The core change is that `McpManager` now owns a process-scoped
`CodexAppsToolsCache`. Codex threads in the same app-server process now
share this Codex Apps in-memory tools snapshot. The snapshot is keyed by
the Codex home plus the Codex Apps identity: the active Codex auth
user/workspace and the effective Codex Apps MCP source config.

There's already code to hard-refresh the cache, so we respect it in this
PR.
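
A rough sketch of a keyed in-memory snapshot follows. `ToolsCache`, `CacheKey`, and the `Vec<String>` tool list are simplifications invented for illustration; in particular, this sketch seeds lazily on first read, whereas the description above says disk seeds the cache on startup.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Simplified key: Codex home plus a string standing in for the Codex Apps identity.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct CacheKey {
    pub codex_home: String,
    pub identity: String,
}

#[derive(Default)]
pub struct ToolsCache {
    snapshots: Mutex<HashMap<CacheKey, Arc<Vec<String>>>>,
}

impl ToolsCache {
    /// Return the in-memory snapshot, calling `load_from_disk` only the first
    /// time a key is seen. Later reads are just an `Arc` clone.
    pub fn current_tools(
        &self,
        key: &CacheKey,
        load_from_disk: impl FnOnce() -> Vec<String>,
    ) -> Arc<Vec<String>> {
        let mut snapshots = self.snapshots.lock().unwrap();
        let entry = snapshots
            .entry(key.clone())
            .or_insert_with(|| Arc::new(load_from_disk()));
        Arc::clone(entry)
    }

    /// Replace the snapshot after a successful fetch (the disk write is omitted).
    pub fn store(&self, key: CacheKey, tools: Vec<String>) {
        self.snapshots.lock().unwrap().insert(key, Arc::new(tools));
    }
}
```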

## Local benchmark

I ran a local steady-state microbenchmark of the exact repeated Codex
Apps cached-tools read this PR removes, using the same real local cache
payload in both trees: `3,678,138` bytes and `381` tools. The cache file
was already warm in the OS page cache, so this measures same-process
reread/deserialization work rather than cold-disk latency or full turn
latency. Each run is 25 iterations (mimicking a turn that makes 25
inference calls).

| Version | Run 1 | Run 2 | Avg |
|---|---:|---:|---:|
| `origin/main` disk read + JSON deserialize + `filter_tools` | `50.755 ms` | `52.894 ms` | `51.825 ms` |
| This branch in-memory `current_tools` + `filter_tools` | `0.740 ms` | `0.778 ms` | `0.759 ms` |

That removes about `51 ms` from each repeated Codex Apps cached-tools
read on this machine, roughly `68x` faster for that subpath. It is
useful evidence for the hot path this PR changes, but not a claim that
every production turn gets `51 ms` faster; end-to-end impact also
depends on the rest of `list_all_tools()` and tool-payload construction.

These numbers are from an M2 Max MacBook; with a slower disk the
disk-read path would be much worse (and we did see it significantly
inflate turn runtime on a slow disk).

Owen Lin · 2026-06-25 20:54:48 +00:00

703793c22e

[codex] poll external clock during sleep (#30113)

## Summary

- make the external app-server time provider establish sleep deadlines
using `currentTime/read`
- poll the external clock once per second and complete `clock.sleep`
when the deadline is reached
- keep the system-clock timer and existing steer/agent-message
interruption behavior unchanged
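
The polling behavior in the summary can be sketched roughly as follows. This is an illustration only: `ExternalClock`, `SteppedClock`, `SleepOutcome`, and the `wait` callback are hypothetical names standing in for the `currentTime/read` request and the existing interruption plumbing.

```rust
use std::time::Duration;

/// Hypothetical stand-in for the external `currentTime/read` request.
pub trait ExternalClock {
    /// Returns the external clock's current time in milliseconds.
    fn current_time_ms(&mut self) -> u64;
}

/// Simulated clock that jumps forward by `step_ms` on every read.
pub struct SteppedClock {
    pub now_ms: u64,
    pub step_ms: u64,
}

impl ExternalClock for SteppedClock {
    fn current_time_ms(&mut self) -> u64 {
        let now = self.now_ms;
        self.now_ms += self.step_ms;
        now
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum SleepOutcome {
    Completed,
    Interrupted,
}

/// Establishes the deadline from the external clock, then polls until the
/// external time reaches it. `wait` runs between polls with the poll
/// interval and returns `true` when new input should interrupt the sleep.
pub fn sleep_on_external_clock(
    clock: &mut dyn ExternalClock,
    duration: Duration,
    mut wait: impl FnMut(Duration) -> bool,
) -> SleepOutcome {
    let deadline_ms = clock
        .current_time_ms()
        .saturating_add(duration.as_millis() as u64);
    loop {
        if clock.current_time_ms() >= deadline_ms {
            return SleepOutcome::Completed;
        }
        if wait(Duration::from_secs(1)) {
            return SleepOutcome::Interrupted;
        }
    }
}
```

Since the deadline and every check come from the external clock, a simulated clock can finish a long sleep quickly by advancing its own time; no separate wake method is needed.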

## Why

This lets training control `clock.sleep` through its existing external
simulated clock without adding separate sleep/wake protocol methods.

## Testing

- `just fmt`
- `just test -p codex-app-server
external_sleep_polls_current_time_and_emits_items`

rka-oai · 2026-06-25 13:46:42 -07:00

62c7f506d9

feat: add provider-aware model fallback to thread start (#29942 )

## Why

Helper threads such as task title generation can request a model ID that
is valid for the default OpenAI provider but unavailable from the active
provider. With Amazon Bedrock, `gpt-5.4-mini` is rejected while the
provider static catalog exposes Bedrock model IDs such as
`openai.gpt-5.5` and `openai.gpt-5.4`. This causes repeated background
404s and can surface a misleading turn error even when the main turn
succeeds.

Clients need an explicit way to ask app-server to resolve an unavailable
helper model to the active provider default. That fallback must remain
limited to providers with an authoritative static catalog so custom or
dynamically discovered model IDs are not rewritten based on an
incomplete catalog.

Fixes #28741.

## What changed

- Add the experimental `allowProviderModelFallback` option to
`thread/start`, defaulting to `false` to preserve existing behavior.
- Thread the option through thread creation and model selection.
- When enabled for a static model manager, preserve requested models
present in the catalog and replace unavailable models with the provider
default.
- Continue preserving explicit model IDs for dynamic model managers
without fetching a catalog solely to validate them.
- Document the new `thread/start` behavior in the app-server API
overview.
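
A minimal sketch of the selection rule described in the list above. `ModelCatalog` and `resolve_thread_model` are hypothetical names for this example, not the actual model-manager API.

```rust
/// Hypothetical view of the active provider's model catalog.
pub enum ModelCatalog<'a> {
    /// Authoritative static catalog plus the provider default.
    Static {
        models: &'a [&'a str],
        default_model: &'a str,
    },
    /// Dynamically discovered models; the catalog may be incomplete.
    Dynamic,
}

/// Keeps the requested model unless fallback is enabled AND the provider
/// has a static catalog that does not contain it.
pub fn resolve_thread_model(
    requested: &str,
    catalog: &ModelCatalog<'_>,
    allow_provider_model_fallback: bool,
) -> String {
    match catalog {
        ModelCatalog::Static { models, default_model }
            if allow_provider_model_fallback
                && !models.iter().any(|model| *model == requested) =>
        {
            (*default_model).to_string()
        }
        // Dynamic managers and disabled fallback preserve the explicit ID.
        _ => requested.to_string(),
    }
}
```

With the Bedrock catalog from the description, `gpt-5.4-mini` resolves to the provider default only when the flag is set; a dynamic manager never rewrites the ID.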

## Test
Temporary test-client harness:
```
ThreadStartParams {
    model: Some("gpt-5.4-mini".to_string()),
    allow_provider_model_fallback: true,
    ..Default::default()
}
```
Command:
```
CODEX_HOME=/tmp/codex-bedrock-thread-start-home \
CODEX_E2E_BEDROCK_THREAD_START_ONLY=1 \
./target/debug/codex-app-server-test-client \
  --codex-bin ./target/debug/codex \
  -c 'model_provider="amazon-bedrock"' \
  send-message-v2 --experimental-api ignored
```
Relevant output:
```
> "method": "thread/start",
> "params": {
>   "model": "gpt-5.4-mini",
>   "modelProvider": null,
>   "allowProviderModelFallback": true,
>   ...
> }

< "result": {
<   "model": "openai.gpt-5.5",
<   "modelProvider": "amazon-bedrock",
<   ...
< }
```

Celia Chen · 2026-06-25 18:24:34 +00:00

6d9dbacf1a

Persist selected capability roots and resolve availability per model step (#29856 )

## Why

`selectedCapabilityRoots` is durable thread intent: “use this capability
root from environment `worker`.”

The important product assumption is:

> One environment ID always names the same logical executor and stable
contents.

`worker` does not silently change from executor A to an unrelated
executor B. The process-local connection handle for `worker` can still
be replaced while Codex is running, though, for example when
`environment/add` registers a fresh handle for the same logical
environment.

The thread should persist only the stable selection. Each model step
should pair that selection with the exact ready handle captured for that
step.

## The boundary

```text
persisted thread intent
plugin@1 -> environment "worker"
|
| capture the current step
v
model-step view
unavailable, or
plugin@1 + worker's exact captured ready handle
```

The environment ID is the stable identity and cache key. The
`Arc<Environment>` is only a process-local handle retained so consumers
of one model step use the same captured environment. It is never
persisted and it does not imply different environment contents.

## What changes

### Persist the stable selection

Selected roots are written into `SessionMeta` and restored with the
thread. Forked subagents inherit the same selections, including
bounded-history forks.

Only stable data is persisted: root ID, environment ID, and root path.

### Capture readiness together with the exact handle

The environment snapshot records:

```rust
environment_id -> Some(Arc<Environment>) // ready in this step
environment_id -> None // still starting in this step
```

This prevents readiness and execution from coming from different
registry snapshots.

For example:

```text
step snapshot: worker -> handle A, ready
environment/add: worker -> fresh handle B for the same logical environment
current step: plugin@1 still uses captured handle A
```

Without carrying handle A in the snapshot, the resolver could combine “A
was ready” with handle B and treat B as ready before it had finished
starting.

This does not change cache invalidation. Stable capability metadata
remains identified by environment ID and capability root. Replacing a
process-local handle under the same stable environment ID does not
invalidate or rediscover that metadata.

### Resolve availability per model step

- A ready captured environment produces resolved roots using its
captured handle.
- A starting, missing, or failed environment is omitted from that step.
- A selected lazy environment that is outside the turn's captured
environment set is asked to start, and a later step can observe it as
ready.
- No capability files are scanned here.
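
The resolution rules above can be sketched like this. The struct names and fields (`SelectedRoot`, `ResolvedRoot`, `handle_name`) are assumptions for illustration, and the lazy-start request for environments outside the captured set is left out.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical process-local environment handle.
#[derive(Debug)]
pub struct Environment {
    pub handle_name: String,
}

/// Durable selection: only stable data, never a handle.
#[derive(Clone, Debug)]
pub struct SelectedRoot {
    pub root_id: String,
    pub environment_id: String,
    pub root_path: String,
}

/// Snapshot captured for one model step:
/// `Some(handle)` means ready, `None` means still starting.
pub type EnvironmentSnapshot = HashMap<String, Option<Arc<Environment>>>;

/// A selection paired with the exact handle captured for this step.
#[derive(Debug)]
pub struct ResolvedRoot {
    pub root_id: String,
    pub environment: Arc<Environment>,
}

pub fn resolve_for_step(
    selected: &[SelectedRoot],
    snapshot: &EnvironmentSnapshot,
) -> Vec<ResolvedRoot> {
    selected
        .iter()
        .filter_map(|root| {
            // Missing or still-starting environments are omitted from this
            // step only; the durable selection itself is untouched.
            let environment = snapshot.get(&root.environment_id)?.as_ref()?;
            Some(ResolvedRoot {
                root_id: root.root_id.clone(),
                environment: Arc::clone(environment),
            })
        })
        .collect()
}
```

Readiness and the handle come from the same map entry, so a step cannot pair "A was ready" with a freshly registered handle B.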

Transient transport disconnects remain the remote client's reconnect
concern. This PR models initial attachment/readiness; it does not add
live socket-connectivity state.

## Example

```text
thread selection: plugin@1 -> environment "worker"

step 1: worker is starting -> plugin@1 unavailable
step 2: worker is ready -> plugin@1 resolves through worker's captured handle
step 3: fresh local handle -> current step remains pinned; a later step captures its own view
```

Temporary unavailability does not discard the durable selection. Later
PRs can retain stable metadata caches while projecting only currently
available capabilities into model-visible World State.

## Compatibility

The app-server request shape does not change. Older rollouts without
`selected_capability_roots` deserialize to an empty list.

## Stack

1. **This PR:** persist stable selected roots and resolve them through
an exact model-step handle.
2. #29960: cache stable skill metadata and project available skills into
World State.
3. #29946: cache stable plugin declarations and manage the separate live
MCP runtime.

jif · 2026-06-25 17:49:43 +00:00

8f02973d25

chore(app-server): mark thread/rollback as deprecated (#29928 )

We will drop support for this in the near future due to the complexity
it introduces.

Owen Lin · 2026-06-25 17:15:46 +00:00

268328001f

Test executor-routed MCP OAuth token exchange (#29656 )

## Why

#28529 proves OAuth discovery uses the selected executor, but its
end-to-end test stops before the callback and token exchange.

## What changed

- add an executor-only mock token endpoint
- complete the OAuth callback using the authorization URL's `state` and
`redirect_uri`
- assert the PKCE token exchange reaches the executor-only endpoint
- assert the completion notification reports the selected thread and
succeeds

Depends on #28529.

jif · 2026-06-25 09:45:20 +00:00

c38b2e9ba6

Support OAuth for HTTP MCP servers from selected executor plugins (#28529 )

## Why

#28522 routes selected-plugin HTTP MCP traffic through the owning
executor, but OAuth bootstrap and refresh still used host-local clients.
Executor-only servers therefore cannot complete discovery or login
through the same network boundary as the MCP connection.

## What changed

- adapt `codex_exec_server::HttpClient` to RMCP 1.8's `OAuthHttpClient`
contract
- let RMCP own discovery, dynamic registration, PKCE, token exchange,
and refresh
- route auth status, persisted-token startup, and app-server login
through the server runtime while preserving the existing local discovery
path
- add optional `threadId` to `mcpServer/oauth/login` and echo it in the
completion notification
- implement RMCP's redirect policy and 1 MiB OAuth response limit over
executor HTTP
- cover selected-thread OAuth discovery and login through an
executor-only route

Depends on #28522.

jif · 2026-06-25 10:31:17 +01:00

b215961a56

Support HTTP MCP servers from selected executor plugins (#28522 )

## Why

Selected executor plugins can declare both stdio and Streamable HTTP MCP
servers, but only stdio registrations were retained. That silently drops
part of the plugin's tool surface and prevents HTTP traffic from using
the owning executor's network.

## What changed

- retain selected-plugin Streamable HTTP MCP declarations alongside
stdio declarations
- route their HTTP clients through the owning executor environment
- preserve local auth-header environment references while rejecting them
for executor-hosted declarations
- cover thread isolation, refresh, and an executor-only HTTP route end
to end

jif · 2026-06-25 10:10:36 +01:00

6368937939

[codex] route sleep through time providers (#29973 )

## Summary

- add a cancellable sleep operation to `TimeProvider`
- route `clock.sleep` through the configured provider
- extend the supported sleep duration to 12 hours
- complete the sleep turn item before propagating provider failures

## Why

This isolates the core clock abstraction needed by external clock
integrations. Existing system and app-server behavior remains wall-clock
based in this PR; the stacked follow-up supplies app-server sleeps from
an external clock.

rka-oai · 2026-06-24 22:17:43 -07:00

f66d793a2d

[codex] Add Ultra reasoning effort (#29899 )

## Why

Ultra should be one user-facing reasoning selection for work that
benefits from both maximum reasoning and proactive multi-agent
delegation. Without it, clients must coordinate maximum reasoning with
the experimental `multiAgentMode` setting, even though the inference
backend still expects its existing `max` effort value.

This change makes reasoning effort the source of truth: clients select
`ultra`, core derives proactive multi-agent behavior when the turn is
eligible for multi-agent V2, and inference requests continue to use the
backend-compatible `max` value.

## What changed

- Add `ultra` as a first-class reasoning effort and preserve
model-catalog ordering when exposing it to clients.
- Convert `ultra` to `max` at the inference request boundary, including
Responses HTTP/WebSocket requests, startup prewarm, compaction, and
memory summarization.
- Derive effective multi-agent mode per turn from effective reasoning
effort:
  - eligible multi-agent V2 + `ultra` → `proactive`
  - eligible multi-agent V2 + any other effort → `explicitRequestOnly`
  - V1 or otherwise ineligible sessions → no multi-agent mode instruction
- Keep the derived effective mode in turn context history so successive
turns can emit a developer-message update only when the effective mode
changes.
- Remove selected multi-agent mode from core session configuration, turn
construction, thread settings, resume/fork restoration, and subagent
spawn plumbing. Subagents inherit reasoning effort and derive their own
effective mode.
- Retain the experimental app-server `multiAgentMode` fields for wire
compatibility while marking them deprecated. Request values are accepted
but ignored; compatibility response fields report `explicitRequestOnly`.
- Display Ultra in the TUI using the order supplied by `model/list`.
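
A small sketch of the two derivations described above, under assumed type names. The enum variants here are a reduced, hypothetical set, not the full reasoning-effort catalog.

```rust
/// Hypothetical, reduced set of reasoning efforts for illustration.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ReasoningEffort {
    High,
    Max,
    Ultra,
}

impl ReasoningEffort {
    /// Value sent at the inference request boundary: the backend still
    /// expects `max`, so `ultra` is converted there.
    pub fn backend_value(self) -> &'static str {
        match self {
            ReasoningEffort::High => "high",
            ReasoningEffort::Max | ReasoningEffort::Ultra => "max",
        }
    }
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MultiAgentMode {
    Proactive,
    ExplicitRequestOnly,
}

/// Effective mode per turn; `None` means no multi-agent mode instruction
/// (V1 or otherwise ineligible sessions).
pub fn effective_multi_agent_mode(
    effort: ReasoningEffort,
    eligible_for_multi_agent_v2: bool,
) -> Option<MultiAgentMode> {
    if !eligible_for_multi_agent_v2 {
        return None;
    }
    Some(if effort == ReasoningEffort::Ultra {
        MultiAgentMode::Proactive
    } else {
        MultiAgentMode::ExplicitRequestOnly
    })
}
```

Keeping the conversion at the request boundary means everything upstream can treat `ultra` as a distinct selection while the backend only ever sees `max`.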

## Validation

- `just test -p codex-core ultra_reasoning_uses_max_for_requests`
- `just test -p codex-tui model_reasoning_selection_popup`

Shijie Rao · 2026-06-24 20:13:52 -07:00

df1199fddb

[codex] Populate remote plugin local versions (#29956 )

# What

- Carry installed remote release versions through remote plugin
summaries as `localVersion`.
- Keep the app-server mapping a pure adapter by populating that value in
the remote catalog layer.

# Why

Remote plugin summaries always returned `localVersion: null` even after
their versioned bundles had been installed locally. Consumers such as
scheduled-task template discovery use `localVersion` to resolve a
plugin's materialized root, so templates from remote curated plugins
were silently skipped.

Abhinav · 2026-06-25 03:13:03 +00:00

6db937275f

[codex] nest sleep config under current time reminder (#29910 )

## Summary

- move sleep tool enablement from top-level `[features].sleep_tool` to
`[features.current_time_reminder].sleep_tool`
- remove the standalone `Feature::SleepTool` flag and gate `clock.sleep`
from resolved current-time configuration
- update config schema, config-lock materialization, and existing sleep
coverage

Stacked on #29907.

rka-oai · 2026-06-24 17:49:00 -07:00

35f5d02464

[codex] namespace sleep under clock (#29907 )

## Summary

- expose the interruptible sleep tool as `clock.sleep` instead of
top-level `sleep`
- keep `clock.curr_time` and `clock.sleep` in the same model-visible
namespace when both features are enabled
- update existing core and app-server integration coverage to issue
namespaced sleep calls

## Why

Sleep is a clock operation. Grouping it with `clock.curr_time` gives the
model a more coherent tool surface without changing the sleep feature
gate or runtime behavior.

## Validation

- `just test -p codex-core sleep_tool_follows_feature_gate`
- `just test -p codex-core any_new_input_interrupts_sleep`
- `just test -p codex-app-server
sleep_emits_started_and_completed_items`

rka-oai · 2026-06-24 17:17:28 -07:00

800529218a

Add a connector declaration snapshot (#29851 )

## Why

Connector declarations currently enter Codex through broad plugin
capability summaries, then MCP setup, turn tooling, and `app/list` each
reconstruct the same information. That makes executor-selected
connectors difficult to add without coupling connector behavior to the
host plugin loader.

This PR introduces a small connector-owned value that later stack layers
can populate before thread startup.

## What changed

- Move the pure app-declaration parser into `codex-connectors`,
preserving declaration order and category cleanup while leaving
host-side validation and deduplication unchanged.
- Add an immutable `ConnectorSnapshot` with ordered connector IDs and
plugin display-name provenance.
- Adapt the existing local-plugin capability summaries into that
snapshot at current consumer boundaries.
- Use the snapshot for MCP tool provenance, turn connector inventory,
and `app/list`.
- Keep the crate API narrow: no test-only snapshot accessors are
exposed.

The externally visible behavior is unchanged. Connector tools still come
from the orchestrator-owned `/ps/mcp` server, and local plugin
enablement remains owned by the existing plugin loader.

## Stack scope

This is the foundation only. It does not read selected executor packages
or change thread startup. #29852 adds the executor-backed declaration
reader, and #29856 composes selected declarations into a thread
snapshot.

jif · 2026-06-24 23:24:01 +01:00

4e0f863df3

[apps] Thread structured icon assets through app list (#29889 )

## Summary

- Add `iconAssets` and `iconDarkAssets` to the app-list protocol.
- Preserve structured icons through directory merging and the connector,
app-server, and TUI boundaries.
- Keep legacy logo URLs unchanged as compatibility fallbacks.
- Update generated protocol schemas and TypeScript types.

Drew · 2026-06-24 13:25:44 -07:00

a33ad93996

[codex] Inject agent graph store into ThreadManager (#29736 )

Pick up the AgentGraphStore migration.

- Inject an explicit optional agent graph store into `ThreadManager`
- Move all calls to spawn, close, recursive resume, and
subtree/archive/delete/feedback traversal through it
- Keep using `LocalAgentGraphStore` when SQLite is available

This required some changes to the interface to deal with futures:

- The interface now matches `ThreadStore`'s object-safe pattern by
returning a boxed `AgentGraphStoreFuture` directly, allowing
`ThreadManager` to hold `Arc<dyn AgentGraphStore>`

*Slight behavior change!* Unfiltered subtree enumeration now performs a
single all-status breadth-first traversal, so a closed grandchild
beneath an open edge is included; the previous Open-then-Closed
traversals could not cross mixed-status paths and silently omitted it.
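
To make the behavior change concrete, here is an in-memory sketch. `SpawnEdges`, `EdgeStatus`, and `descendants` are hypothetical names, and the real store is not a `HashMap`; the point is only the difference between filtered and all-status traversal.

```rust
use std::collections::{HashMap, VecDeque};

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum EdgeStatus {
    Open,
    Closed,
}

/// Hypothetical edge table: parent id -> (child id, edge status).
pub type SpawnEdges = HashMap<String, Vec<(String, EdgeStatus)>>;

/// Breadth-first subtree enumeration. `filter = None` follows every edge;
/// `Some(status)` follows only edges with that status. Assumes the spawn
/// graph is a tree, so no visited set is kept.
pub fn descendants(edges: &SpawnEdges, root: &str, filter: Option<EdgeStatus>) -> Vec<String> {
    let mut found = Vec::new();
    let mut queue = VecDeque::from([root.to_string()]);
    while let Some(parent) = queue.pop_front() {
        for (child, status) in edges.get(&parent).into_iter().flatten() {
            if filter.map_or(true, |wanted| wanted == *status) {
                found.push(child.clone());
                queue.push_back(child.clone());
            }
        }
    }
    found
}
```

With `root -> child` open and `child -> grandchild` closed, an Open-only pass stops at `child` and a Closed-only pass never leaves `root`, so their union misses `grandchild`; the single unfiltered pass reaches it.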

Tom · 2026-06-24 13:24:10 -07:00

ece1dfece0

feat(app-server): list descendant threads by ancestor (#29591 )

## Why

`thread/list` can filter direct children with `parentThreadId`, but
clients cannot request an entire spawned subtree. Discovering every
descendant requires repeated client-side requests and gives up the
database's existing filtering and pagination path.

## What changed

Experimental clients can use `ancestorThreadId` to return strict
descendants at any depth while `parentThreadId` retains its direct-child
meaning. The filters are mutually exclusive, the ancestor is excluded,
and every result preserves its immediate `parentThreadId` so callers can
reconstruct the tree.

## How it works

- **Explicit relationship:** Internal list parameters distinguish direct
children from transitive descendants without changing the meaning of
`parentThreadId`.
- **Existing graph:** Persisted parent-child spawn edges remain the
source of truth, so descendant lookup needs no schema migration or
ancestry cache.
- **Indexed traversal:** A recursive SQLite query starts from the
parent-edge index, walks each generation, and applies thread filters,
sorting, and cursor pagination in the same database request.
- **Reconstructable results:** The response stays flat and normally
ordered while carrying each descendant's immediate parent.
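
The filter semantics can be illustrated with an in-memory sketch. The PR itself uses a recursive SQLite query; the code below swaps in a generation-by-generation scan over a slice, and `RelationFilter`, `ThreadRow`, and the error text are hypothetical.

```rust
/// Hypothetical internal list parameter for the two relationships.
#[derive(Debug, PartialEq, Eq)]
pub enum RelationFilter {
    Any,
    DirectChildrenOf(String),
    DescendantsOf(String),
}

/// Flat result row; `parent_thread_id` is always the immediate parent.
#[derive(Debug, PartialEq, Eq)]
pub struct ThreadRow {
    pub id: String,
    pub parent_thread_id: Option<String>,
}

/// `parentThreadId` and `ancestorThreadId` are mutually exclusive.
pub fn relation_filter(
    parent_thread_id: Option<String>,
    ancestor_thread_id: Option<String>,
) -> Result<RelationFilter, String> {
    match (parent_thread_id, ancestor_thread_id) {
        (Some(_), Some(_)) => {
            Err("parentThreadId and ancestorThreadId are mutually exclusive".to_string())
        }
        (Some(parent), None) => Ok(RelationFilter::DirectChildrenOf(parent)),
        (None, Some(ancestor)) => Ok(RelationFilter::DescendantsOf(ancestor)),
        (None, None) => Ok(RelationFilter::Any),
    }
}

fn has_parent_in(row: &ThreadRow, parents: &[String]) -> bool {
    row.parent_thread_id
        .as_ref()
        .map_or(false, |parent| parents.contains(parent))
}

/// Assumes spawn edges are acyclic (a thread is never its own ancestor).
pub fn list_threads<'a>(rows: &'a [ThreadRow], filter: &RelationFilter) -> Vec<&'a ThreadRow> {
    match filter {
        RelationFilter::Any => rows.iter().collect(),
        RelationFilter::DirectChildrenOf(parent) => rows
            .iter()
            .filter(|row| has_parent_in(row, std::slice::from_ref(parent)))
            .collect(),
        RelationFilter::DescendantsOf(ancestor) => {
            let mut found = Vec::new();
            // The ancestor seeds the walk but is never part of the result.
            let mut frontier = vec![ancestor.clone()];
            while !frontier.is_empty() {
                let generation: Vec<&ThreadRow> = rows
                    .iter()
                    .filter(|row| has_parent_in(row, &frontier))
                    .collect();
                frontier = generation.iter().map(|row| row.id.clone()).collect();
                found.extend(generation);
            }
            found
        }
    }
}
```

Each returned row still carries its immediate parent, so a caller can rebuild the tree from the flat list.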

## Verification

Ran 550 tests across the protocol, state, rollout, and thread-store
crates, then reran the four focused state, store, and app-server
descendant-listing tests after the final diff reduction. Scoped Clippy
and formatting checks passed. Stable and experimental schema generation
was checked; the stable fixtures remain unchanged while the experimental
schema includes the new field.

Brent Traut · 2026-06-24 13:08:14 -07:00

8057603d0c

[codex] show external import result counts (#29567 )

## What changed

- Show per-type import counts in the `/import` review UI and started
message.
- Render completion results as a multi-line summary with total
imported/failed counts and one row per import type.
- Add snapshot coverage for the updated review and completion output.

<img width="537" height="322" alt="Screenshot 2026-06-23 at 9 41 20 PM"
src="https://github.com/user-attachments/assets/166542eb-2097-4b2b-8130-8f6fd8c680ce"
/>


## Why

The TUI previously only reported that Claude Code import started or
finished. Users could not see how many items of each type were selected
or how many actually imported versus failed.

charlesgong-openai · 2026-06-24 08:56:57 -07:00

3694b48a82

test: use automatic environments in app-server integration tests (#29789 )

## Why

Topology-neutral app-server integration tests should exercise automatic
environment selection so the same setup covers local and remote
executors.

## What

Migrate eligible tests to `TestAppServer::new_with_auto_env()` and
`send_thread_start_request_with_auto_env()`. Leave explicit-topology
tests unchanged, and skip the request-permissions case on Windows with a
TODO for cross-platform tool routing.

## Validation

- `just test -p codex-app-server`
- `bazel test //codex-rs/app-server:app-server-all-wine-exec-test
--test_output=errors`

Stacked on #29788.

Adam Perry @ OpenAI · 2026-06-23 22:48:06 -07:00

c2b3e3b4f5

test: run app-server integration tests under Wine (#29788 )

## Why

Made a mistake when carving #29746 out of my local changes and the test
was missing from the build graph. Oops!

## What

Enable the app-server Wine exec test target. Remove the `manual` tag
from generated Wine-exec test variants so wildcard Bazel test
invocations select them. Refactor the smoke test to ensure it passes
with current Windows support.

Adam Perry @ OpenAI · 2026-06-24 05:23:29 +00:00

b17f30eb2a

connectors: own app metadata types (#29723 )

## Why

Connector metadata is consumed by connector discovery, ChatGPT
integration, core, and TUI code. Treating app-server's wire DTO as the
shared domain model reverses the intended dependency direction.

## What changed

- Added connector-owned app branding, review, screenshot, metadata, and
info types.
- Added explicit conversions in app-server and TUI while preserving
app-server's wire payloads.
- Removed production app-server-protocol dependencies from connectors
and ChatGPT connector code.

## Stack

This is PR 4 of 6, stacked on [PR
#29722](https://github.com/openai/codex/pull/29722). Review only the
delta from `codex/split-config-layer-types`. Next: [PR
#29724](https://github.com/openai/codex/pull/29724).

## Validation

- Connector and tools coverage passed.
- App-server app-list coverage passed: 13 tests.

Adam Perry @ OpenAI · 2026-06-23 22:08:23 -07:00

e639e8c4bd

config: own layer provenance types (#29722 )

## Why

Config layer provenance describes how effective configuration was
assembled, so it belongs with the config loader rather than in
app-server's serialized API types.

## What changed

- Moved `ConfigLayerSource`, `ConfigLayerMetadata`, and `ConfigLayer`
ownership into `codex-config`.
- Kept app-server's wire payloads unchanged and added explicit
conversions at the app boundary.
- Removed lower-level app-server-protocol dependencies from config
consumers.

## Stack

This is PR 3 of 6, stacked on [PR
#29721](https://github.com/openai/codex/pull/29721). Review only the
delta from `codex/split-auth-domain-types`. Next: [PR
#29723](https://github.com/openai/codex/pull/29723).

## Validation

- `codex-config` coverage passed.
- App-server config-manager and config RPC coverage passed.

Adam Perry @ OpenAI · 2026-06-24 04:03:04 +00:00

1d65ccabd5

[plugins] Enforce marketplace source admission requirements (#29753 )

## Why

Managed marketplace source requirements only become effective when every
local marketplace mutation path applies the same admission decision.
This change centralizes that decision so CLI, app-server, and
external-agent migration flows cannot add, install from, or refresh a
disallowed source.

## What changed

- Match exact normalized Git repository URLs with an optional exact
`ref`.
- Match Git hosts with managed regular expressions.
- Match local marketplaces by exact absolute path.
- Preserve the expected path/name boundary for managed OpenAI
marketplaces.
- Enforce source admission during marketplace add, plugin install, and
configured Git marketplace upgrade.
- Continue upgrading independent marketplaces when one source is
rejected and return a per-marketplace error.
- Load the effective requirements stack at CLI, app-server, and
external-agent migration entry points.

This PR does not filter already configured marketplaces at runtime; that
remains in draft follow-up #29691.

## Stack

This is PR 2 of 3 and is based on #29690, which introduces the
requirements data shape and merge behavior.

## Test plan

- Source matcher coverage for Git URL/ref, host-pattern, local-path, and
managed marketplace cases.
- Marketplace add and plugin install coverage for allowed and rejected
sources.
- Marketplace upgrade coverage for rejection and per-marketplace
continuation.

xl-openai · 2026-06-23 20:13:11 -07:00

4fe02f4fcf

auth: move domain mode below app wire types (#29721 )

## Why

Authentication mode is a domain concept used by login, model selection,
telemetry, and transports. Keeping the canonical type in app-server
protocol forces those lower-level crates to depend on an unrelated wire
API.

## What changed

- Added canonical `codex_protocol::auth::AuthMode` domain values.
- Kept the app-server wire DTO unchanged and added an explicit app-side
conversion.
- Removed production app-server-protocol dependencies from login,
model-provider-info, models-manager, and otel call paths.

## Stack

This is PR 2 of 6, stacked on [PR
#29714](https://github.com/openai/codex/pull/29714). Review only the
delta from `codex/split-json-rpc-protocols`. Next: [PR
#29722](https://github.com/openai/codex/pull/29722).

## Validation

- Auth and login coverage passed in the focused protocol/domain test
run.
- App-server account and auth conversion coverage passed.

Adam Perry @ OpenAI · 2026-06-24 03:10:20 +00:00

31372078d1

[codex] Ignore local curated plugins when remote catalog is active (#29765 )

## Summary

- suppress configured `openai-curated` plugins when the remote plugin
feature is enabled and auth uses the Codex backend
- preserve `openai-api-curated` and non-Codex-backend behavior while
including remote catalog activation in the plugin load cache key
- add core plugin coverage and an app-server integration test for
runtime feature enablement

## Why

The Codex app enables remote plugins through process-local runtime
feature enablement, which can happen after app-server startup tasks have
already observed legacy local plugin state. The existing conflict logic
only preferred a remote plugin when the same plugin was already
installed remotely, so a configured legacy-only plugin could continue
exposing skills and other capabilities from `openai-curated`.

## Impact

When the remote catalog is active, legacy `openai-curated` plugins no
longer contribute skills, MCP servers, apps, or hooks. Remote installed
plugins continue to load normally, and `openai-api-curated` remains
unaffected. This does not change remote fetch, bundle sync, or uninstall
behavior.

## Validation

- `just test -p codex-core-plugins
remote_global_catalog_ignores_local_curated_plugins
remote_plugin_feature_keeps_local_curated_without_codex_backend`
- `just test -p codex-app-server
runtime_remote_plugin_enablement_excludes_local_curated_plugin_skills`
- `just fmt`
- `git diff --check`

xl-openai · 2026-06-23 19:51:31 -07:00

ff78e21215

Let image generation extension hosts control output persistence (#29711 )

## Why

Some extension hosts need generated images returned without writing them
to the local filesystem or giving the model a local path.

## What changed

**tl;dr**: we now conduct all extension operations in the image gen
extension

- Let hosts provide an optional image save root when installing the
extension.
- Save images and return path hints only when a save root is configured.
- Return image data without saving or adding a path hint when no save
root is configured.
- Preserve the extension-provided `saved_path` instead of persisting
extension images again in core.
- Leave built-in image generation unchanged.
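
A minimal sketch of the save-root rule in the list above. `ImageOutput`, `finish_image`, and the `file_name` parameter are hypothetical; the real extension's types and naming scheme are not shown in this PR description.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical result handed back to the host.
#[derive(Debug)]
pub struct ImageOutput {
    pub data: Vec<u8>,
    /// Path hint for the model; present only when the image was saved.
    pub saved_path: Option<PathBuf>,
}

/// Saves the image and returns a path hint only when the host configured
/// a save root; otherwise returns the data without touching the disk.
pub fn finish_image(
    data: Vec<u8>,
    file_name: &str,
    save_root: Option<&Path>,
) -> io::Result<ImageOutput> {
    let saved_path = match save_root {
        Some(root) => {
            fs::create_dir_all(root)?;
            let path = root.join(file_name);
            fs::write(&path, &data)?;
            Some(path)
        }
        None => None,
    };
    Ok(ImageOutput { data, saved_path })
}
```

Because the extension owns the decision, core can pass `saved_path` through as-is rather than persisting the image a second time.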

## Validation

- `just test -p codex-image-generation-extension`
- `just test -p codex-app-server
standalone_image_generation_returns_saved_path_hint_to_model`
- `just test -p codex-core
extension_tool_uses_granted_turn_permissions_without_local_persistence`
- `just test -p codex-core tools::handlers::extension_tools::tests`
- Tested in the Codex CLI with `save_root` set to `CODEX_HOME` and to
`None`.
- Tested in the Codex app with both settings as well.

Won Park · 2026-06-23 18:51:49 -07:00

61f5a84930

test: add app-server auto environment helper (#29746 )

## Why

Start moving towards app-server tests defaulting to running against
remote & foreign OS executors. To do so we need a point of indirection
similar to core integration tests' `build_with_auto_env`, but with the
flexibility of letting tests control environment registration if they
need to.

## What

This adds:

- `TestAppServer::new_with_auto_env()` for constructing an app server
with a default environment defined by the test runner (e.g. bazel)
- `TestAppServer::auto_env_params()` for tests to easily acquire turn
env params tailored to the automatic environment
- `TestAppServer::send_thread_start_request_with_auto_env()` to make it
easy for tests to start a thread using the automatic environment

The above methods all fail if the test calling them has set up an
environment where the automatic environment configuration conflicts with
test-created state.

## Validation

Adds a couple of basic smoke tests to the app-server test suite.
Follow-ups will migrate more tests to use it.

Adam Perry @ OpenAI · 2026-06-24 01:06:29 +00:00

283bc4cf01

Support thread-level originator overrides (#29477 )

## Why

Work (TPP) threads can be launched from the Desktop app, but if they all
keep the Desktop app's default originator then downstream attribution
cannot distinguish local Work launches from cloud-backed Work launches.
`thread/start.serviceName` already carries that launch signal, while
`SessionMeta.originator` is the durable thread-level value that survives
resume and fork.

This change converts the Desktop Work service names into an effective
originator at thread creation time, persists that originator with the
thread, and keeps using it for later model requests and memory writes.

## What changed

- Map `CODEX_WORK_LOCAL` and `CODEX_WORK_CLOUD` service names to
per-thread originators, while preserving
`CODEX_INTERNAL_ORIGINATOR_OVERRIDE` as the highest-precedence override.
- Persist the effective originator in `SessionMeta.originator`, read it
back on resume/fork, and inherit the parent originator for subagent
spawns when there is no persisted session metadata.
- Handle truncated `SpawnAgentForkMode::LastNTurns` forks by falling
back to the live parent originator when the forked history no longer
includes `SessionMeta`.
- Thread the per-thread originator through Responses headers,
websocket/compaction request paths, thread-store creation, rollout
metadata, and memory stage-one telemetry.
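
The precedence rules above can be sketched as two small functions. The originator strings returned for the Work service names are placeholders invented for this example, as are the function names; only the ordering of the rules comes from the description.

```rust
/// Effective originator at thread creation time.
/// Precedence: internal override, then service-name mapping, then default.
pub fn effective_originator(
    internal_override: Option<&str>,
    service_name: Option<&str>,
    default_originator: &str,
) -> String {
    if let Some(originator) = internal_override {
        return originator.to_string();
    }
    match service_name {
        // Placeholder originator values, not the real strings.
        Some("CODEX_WORK_LOCAL") => "example_work_local".to_string(),
        Some("CODEX_WORK_CLOUD") => "example_work_cloud".to_string(),
        _ => default_originator.to_string(),
    }
}

/// Originator for a resumed or forked thread: prefer the value persisted
/// in session metadata, and fall back to the live parent when a truncated
/// fork no longer carries that metadata.
pub fn originator_for_fork(persisted: Option<&str>, live_parent: &str) -> String {
    persisted.unwrap_or(live_parent).to_string()
}
```

Resolving the value once at creation and persisting it is what lets resume, fork, and subagent spawns agree on attribution later.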

## Verification

- `just test -p codex-core
agent::control::tests::spawn_thread_subagent_inherits_parent_originator_without_fork
agent::control::tests::spawn_thread_subagent_fork_last_n_turns_inherits_parent_originator_without_session_meta
thread_manager::tests::originator_override_precedes_service_name_remapping`
- `just test -p codex-core
agent::control::tests::resume_thread_subagent_restores_stored_metadata_and_effective_multi_agent_mode`
- `just test -p codex-memories-write`
- `just fix -p codex-core -p codex-memories-write`
- `git diff --check`

alexsong-oai · 2026-06-23 17:23:38 -07:00

1acb722e8a

[codex] rename rollout budget error to session budget error (#29744 )

## Summary

- rename the rollout-budget exhaustion error from
`RolloutBudgetExceeded` to `SessionBudgetExceeded`
- expose the matching app-server v2 wire value as
`sessionBudgetExceeded`
- regenerate JSON/TypeScript schema fixtures and update the app-server
docs and focused tests

This is a naming-only follow-up to #29715 based on [Pavel's review
suggestion](https://github.com/openai/codex/pull/29715#discussion_r3463183480).
Runtime behavior is unchanged.

## Tests

- `just test -p codex-core rollout_budget`
- `just test -p codex-app-server-protocol`
- `just fmt`
- `just write-app-server-schema`

rka-oai · 2026-06-23 16:49:13 -07:00

1ec3def0b5

[codex] surface rollout budget exhaustion (#29715 )

## Summary
- surface shared rollout-budget exhaustion as
`CodexErr::RolloutBudgetExceeded` instead of a generic interrupted turn
- map it through the existing `CodexErrorInfo` and app-server v2
`codexErrorInfo` path
- keep local compaction from retrying after the shared rollout budget is
exhausted

This gives app-server clients a stable `rolloutBudgetExceeded` error
they can classify without guessing from `status="interrupted"`.

## Tests
- `just test -p codex-core rollout_budget`

rka-oai · 2026-06-23 15:01:28 -07:00

bbbea91960

Make selected plugin roots URI-native (#28918 )

## Why

Selected capability roots belong to the executor filesystem, not the
app-server host. Converting their path strings into the host's native
`Path` breaks whenever the two machines use different path conventions,
such as a Windows executor behind a Unix app-server.

This PR establishes `PathUri` as the selected-plugin boundary so the
executor remains authoritative for its paths.

## What changed

- Require `selectedCapabilityRoots[].location.path` to be a canonical
`file:` URI and deserialize it directly as `PathUri`; native path
strings are rejected.
- Update the app-server schema, generated TypeScript, examples, and
request coverage for the URI contract.
- Keep selected roots, resolved plugin locations, manifest paths, and
manifest resources as `PathUri`.
- Inspect and read plugin roots and manifests only through the selected
environment's `ExecutorFileSystem`.
- Parse executor manifests with the shared URI-native parser from #29620
instead of projecting them onto the host filesystem.
- Enforce resource containment lexically and preserve the root URI's
POSIX or Windows path convention.
- Cover foreign Windows plugin roots and URI-native manifest resources.

```text
thread/start
  selectedCapabilityRoots[].location.path = "file:///C:/plugins/demo"
                              | PathUri
                              v
                    ExecutorFileSystem
                              |
                              +--> plugin.json
                              +--> manifest resources
```

This PR stops at the shared selected-plugin representation. The next two
PRs remove the remaining host-path projections in the skill and MCP
consumers.
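
For intuition, lexical containment over `file:` URIs might look like the sketch below. It is not the `PathUri` API: the function names are hypothetical, and it assumes an empty URI authority, no percent-encoded dot segments, and exact, case-sensitive segment comparison.

```rust
/// Splits a `file:` URI into path segments. Returns `None` for native path
/// strings, other schemes, or paths with `.` / `..` segments. Purely
/// lexical: nothing is resolved against a filesystem.
fn file_uri_segments(uri: &str) -> Option<Vec<&str>> {
    let path = uri.strip_prefix("file://")?;
    let segments: Vec<&str> = path.split('/').filter(|s| !s.is_empty()).collect();
    if segments.iter().any(|s| *s == "." || *s == "..") {
        return None;
    }
    Some(segments)
}

/// True when `resource_uri` is the root itself or lies beneath it,
/// compared segment by segment so `demo-evil` is not inside `demo`.
pub fn is_lexically_contained(root_uri: &str, resource_uri: &str) -> bool {
    match (file_uri_segments(root_uri), file_uri_segments(resource_uri)) {
        (Some(root), Some(resource)) => {
            resource.len() >= root.len() && resource[..root.len()] == root[..]
        }
        _ => false,
    }
}
```

Working on URI segments rather than a host `Path` is what keeps a Windows root such as `file:///C:/plugins/demo` meaningful on a Unix app-server.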

## Stack

1. #29614 — add lexical `PathUri` containment.
2. #29620 — share URI-native manifest path resolution.
3. **This PR** — keep selected plugin roots and resources URI-native.
4. #29626 — load executor skills without host path conversion.
5. #29628 — resolve executor MCP working directories without host path
conversion.

jif · 2026-06-23 22:51:19 +01:00

2e69966cd8
