codex

[codex] Treat max as a first-class reasoning effort (#30467)

## Why

The Bedrock GPT-5.6 catalog advertises `max`, but Codex treated it as an
opaque custom effort. That made the reasoning picker render it as
lowercase `max` while known efforts use productized labels.

Making `max` a known effort aligns catalog data, parsing, and UI
presentation without changing the `max` wire value or persisted
representation.

## What changed

- Add first-class `ReasoningEffort::Max` parsing and serialization.
- Use the typed effort in the Bedrock catalog and render it as `Max` in
the TUI.
- Preserve forward-compatible custom-effort coverage with a genuinely
unknown `future` value.
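
The parsing rule above can be illustrated with a small Python sketch; the actual change lives in Rust's `ReasoningEffort`. Only `max`, its `Max` label, and the unknown `future` value come from this change. The other enum members and the helper names `parse_effort` and `picker_label` are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union


class KnownEffort(Enum):
    # Only MAX is taken from this change; the other members are assumed.
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    MAX = "max"


@dataclass(frozen=True)
class CustomEffort:
    """An effort string that this build does not recognize."""
    value: str


def parse_effort(raw: str) -> Union[KnownEffort, CustomEffort]:
    """Map a wire value to a known effort, falling back to a custom one."""
    try:
        return KnownEffort(raw)
    except ValueError:
        return CustomEffort(raw)


def picker_label(effort: Union[KnownEffort, CustomEffort]) -> str:
    """Known efforts get a capitalized label; custom ones are shown verbatim."""
    if isinstance(effort, KnownEffort):
        return effort.name.capitalize()
    return effort.value
```

In this sketch the wire value stays `max` in both directions, and a value such as `future` still round-trips through the custom fallback.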

### Before
<img width="559" height="124" alt="Screenshot 2026-06-28 at 12 08 47 PM"
src="https://github.com/user-attachments/assets/7c43cf4f-020b-4605-9239-0a9c97eb7364"
/>

### After
<img width="558" height="107" alt="Screenshot 2026-06-28 at 12 09 10 PM"
src="https://github.com/user-attachments/assets/b9cc5ded-c940-43b4-b024-bba25abe0a17"
/>

Shijie Rao · 2026-06-29 09:38:49 -07:00

80f54d1266

[codex] Restore v1 delegation guidance (#30511)

## Summary

- restore the v1 clarification that requests for depth, research, or
investigation do not authorize subagent spawning
- restore guidance for keeping critical-path, urgent, tightly coupled,
or difficult work local
- update the focused v1 tool-search and spawn-description coverage

## Why

PR #27919 simplified the v1 `spawn_agent` prompt by removing its
delegation decision guidance. That left the authorization rule intact,
but removed the instructions that constrained what should be delegated
after spawning was authorized.

Restore those guardrails while preserving later support for explicit
delegation authorization from applicable AGENTS.md and skill
instructions. Multi-agent v2 prompts are unchanged.

## User impact

Models using the v1 multi-agent tool surface receive clearer guidance to
delegate independent side work while keeping blocking work on the main
rollout.

## Validation

- `just fmt`
- `git diff --check`
- tests not run locally per repository guidance; CI will validate the
focused coverage

Ahmed Ibrahim · 2026-06-28 20:34:47 -07:00

8dac605901

[codex] Use model metadata for skills usage instructions (#29740)

## Summary

- add a false-by-default `include_skills_usage_instructions` model
metadata field
- enable the field for the bundled `gpt-5.5` model metadata
- consume the metadata in both core and extension skill rendering
- remove hardcoded legacy-model matching and its marker plumbing

ani-oai · 2026-06-29 09:44:36 +09:00

6b5f5743b3

core: stabilize synthesized call output IDs (#30327)

## Why

Response item IDs represent stable conversation identity.
`ContextManager::for_prompt` repairs an unmatched call by synthesizing
an `"aborted"` output in the disposable prompt projection, but that
output previously had no ID. Assigning a fresh ID on every prompt build
would make retries and resumes change otherwise identical model context
and reduce prompt-cache reuse.

The concrete bug is that these normalization-created outputs bypass the
regular item-ID allocation path. Even with item IDs enabled, a prompt
could therefore contain an identified call paired with a synthetic
output whose `id` was missing. This change closes that gap by deriving
the output ID from the source call's item ID. For legacy calls that have
no item ID, the output remains ID-less because there is no stable source
identity to derive from.

The originating call already has a stable item ID under the item-ID
model introduced in #28814. A prompt-only output can therefore derive
stable identity from that call without mutating canonical history or
persisted rollouts. This addresses the failure exposed by #30311 while
keeping normalization read-only outside its detached prompt snapshot.

UUIDv5 is intentional here because it is the standard namespaced,
deterministic UUID construction. Using the output kind and source call
ID as the name produces the same UUID on every projection while keeping
output kinds in separate name domains. UUIDv7 would introduce randomness
and time, so keeping it stable would require persisting the synthetic
repair. UUIDv5 uses SHA-1 internally, but this is only an identity
mapping—not an authenticity or security boundary.
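
A minimal Python sketch of the derivation described above, using the standard library's `uuid.uuid5`. The namespace value, the `kind:call_id` name layout, and the function name are assumptions for illustration; the namespace and Responses API prefixes Codex actually uses are not stated here.

```python
import uuid
from typing import Optional

# Hypothetical namespace; the real one is not given in this description.
SYNTHETIC_OUTPUT_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "synthetic-output.example")


def synthetic_output_id(output_kind: str, source_call_id: Optional[str]) -> Optional[str]:
    """Derive a stable ID for a synthesized output from its source call.

    Legacy calls without an item ID yield no ID, because there is no
    stable source identity to derive from.
    """
    if source_call_id is None:
        return None
    # Including the output kind keeps different kinds in separate name domains.
    name = f"{output_kind}:{source_call_id}"
    return str(uuid.uuid5(SYNTHETIC_OUTPUT_NAMESPACE, name))
```

Because the result depends only on the namespace and the name, rebuilding the prompt projection yields the same ID without persisting anything.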

## What changed

- Derive a deterministic UUIDv5 ID for each synthesized call output from
the source call item ID.
- Use the Responses API prefix appropriate for function, custom-tool,
tool-search, and local-shell outputs.
- Preserve the existing insertion position immediately after the
unmatched call.
- Keep synthesized outputs prompt-only; no rollout, task-lifecycle,
compaction, or raw-response behavior changes.

## Testing

- `just test -p codex-core
for_prompt_assigns_stable_id_to_synthetic_output_without_reordering_history`
- `just test -p codex-core
synthetic_call_output_id_is_stable_across_resumes`
- `just test -p codex-core normalize_adds_missing_output`
- `just test -p codex-core response_item_ids`

Michael Bolin · 2026-06-27 10:47:54 -07:00

d2885dc3cd

Preserve namespaces on custom tool calls (#30302)

## Summary

- Preserve the optional namespace on custom tool calls during response
deserialization and app-server replay.
- Use the namespaced tool identifier for streaming argument handling and
tool dispatch.
- Regenerate app-server protocol schemas.
- Add regression tests covering namespace serialization and routing.

## Testing

- Ran affected protocol and app-server test suites.
- Ran the full core test suite; two load-sensitive timing tests passed
when rerun individually.
- Ran Clippy and formatting checks.
- Verified with a local end-to-end app-server replay that the namespace
is preserved through the complete request/response flow.

nhamidi-oai · 2026-06-27 09:54:56 -07:00

328e95110c

[codex] consume pushed exec-server process events (#30273)

## Summary

- complete unified-exec processes from the ordered event stream instead
of issuing a final zero-wait `process/read`
- add optional executor sandbox-denial state to `process/exited`
- keep `process/read` as a retained-output and compatibility fallback
for receiver lag, sequence gaps, and legacy servers
- recover sandbox-denial state across transport reconnection
- cover the real `TestCodex` remote-exec path without adding a public
test-only event constructor

## Why

A successful one-shot tool call currently receives its output and
terminal notifications, then pays another wide-area `process/read` round
trip before returning. Staging traces showed that remote response wait
accounted for more than 99.8% of RPC time; local serialization,
queueing, and deserialization were below 0.6 ms.

## Measured impact

A direct staging A/B used the same build and route and changed only
completion mode. Each arm ran three times with 30 one-shot
`/usr/bin/true` calls per run. The table reports the median of the three
per-run percentiles.

| Metric | Final `process/read` | Pushed events | Change |
| --- | ---: | ---: | ---: |
| End-to-end completion p50 | 159.5 ms | 118.7 ms | -40.8 ms (-25.6%) |
| End-to-end completion p95 | 182.4 ms | 131.7 ms | -50.6 ms (-27.8%) |
| Completion-wait p50 | 80.1 ms | 41.5 ms | -38.5 ms (-48.1%) |
| Final `process/read` RPC p50 | 79.9 ms | eliminated | -79.9 ms |

TCP_NODELAY was enabled in both A/B arms, so its effect cancels out. The
successful, complete, in-order event path issued zero final
`process/read` calls.

## Compatibility and recovery

- new servers send `sandboxDenied` on `process/exited`
- legacy servers omit it, which triggers one compatibility
`process/read`
- broadcast lag or a sequence gap triggers a retained-output read
- recovery remains bounded by the server's existing 1 MiB
retained-output window
- complete, in-order event streams issue no completion read
- sandbox denial is attached to the exit event before consumers can
observe process completion
- server-first and client-first rollouts remain wire-compatible;
server-first realizes the latency win immediately
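
The recovery rules above can be summarized as a small decision function. This is a Python sketch under stated assumptions: sequence numbers are taken to start at 1 and be contiguous, lag or a gap is checked before the legacy case, and the function name and return strings are hypothetical.

```python
from typing import Optional, Sequence


def completion_read(
    event_seqs: Sequence[int],
    sandbox_denied: Optional[bool],
    receiver_lagged: bool = False,
) -> str:
    """Decide which `process/read`, if any, is needed to complete a process.

    `sandbox_denied` is None when a legacy server omitted the field on
    `process/exited`.
    """
    # Assumed numbering: a complete stream is exactly 1, 2, ..., n.
    has_gap = list(event_seqs) != list(range(1, len(event_seqs) + 1))
    if receiver_lagged or has_gap:
        return "retained-output read"
    if sandbox_denied is None:
        return "compatibility read"
    # Complete, in-order stream from a new server: no read at all.
    return "no read"
```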

## Integration coverage

The `TestCodex` suite exercises four distinct remote-exec contracts:

- complete pushed output/exit/close with zero reads
- direct pushed sandbox denial with zero reads
- legacy missing denial metadata with exactly one compatibility read
- count-bounded replay eviction recovered from retained output without
duplication

## Validation

- `just test -p codex-core
exec_command_consumes_pushed_remote_process_events`: 4 passed
- `just test -p codex-core unified_exec::process_tests::`: 4 passed
- `just test -p codex-exec-server`: 294 passed, 2 skipped
- `just test -p codex-exec-server-protocol`: 5 passed
- `just test -p codex-rmcp-client`: 89 passed, 2 skipped
- focused Bazel `//codex-rs/core:core-all-test`: passed across 16 shards
- scoped `just fix` passed for core and exec-server
- `just fmt` passed

The complete workspace suite was not rerun; focused Cargo and Bazel
coverage passed for the changed behavior.

richardopenai · 2026-06-26 18:05:52 -07:00

d4ec08b8f0

[codex] allow AGENTS.md and skills to authorize delegation (#30274)

Update the multi-agent v2 (MAv2) prompt to mention AGENTS.md and skills
more explicitly.

This should mimic https://github.com/openai/codex/pull/27919

Charles Du · 2026-06-26 12:17:26 -07:00

79a8ffdbf7

feat(app-server): add history_mode to thread (#29927)

## Description

This PR adds a new `historyMode` field (`"legacy" | "paginated"`) to
`Thread`. The value is stored in `SessionMeta` in the JSONL rollout file
and as a new column in the SQLite `thread_metadata` table, and it is
exposed on `thread/start` and on the `Thread` object in app-server.

## What changed

- Added canonical `ThreadHistoryMode` with `legacy` and `paginated`,
defaulting old and new SessionMeta to `legacy`.
- Carried `history_mode` through core session config, ThreadStore stored
metadata, local/in-memory stores, rollout metadata extraction, and the
existing SQLite `threads` table.
- Added experimental `historyMode` to app-server v2 `Thread` and
`thread/start`.
- Made paginated stored threads metadata-discoverable but unsupported
for legacy full-history reads, `load_history`, live resume, and create
paths.
- Regenerated app-server schema fixtures and added
protocol/state/thread-store/app-server coverage for persistence and
fail-closed behavior.

## Compatibility floor
Because users may be running various versions of Codex binaries on the
same machine (TUI, Codex App, etc.), we will need to establish a
compatibility floor for upcoming paginated threads, which will change
how thread storage reads and writes work.

The overall plan here:
```
Release N:
- Add historyMode to SessionMeta / Thread / SQLite metadata.
- Teach binaries to understand paginated threads.
- If a binary sees `historyMode="paginated"` but does not support the paginated contract, it refuses to resume/mutate the thread.
- Default remains `"legacy"`.

Release N+1:
- First-party clients start opting into paginated threads where appropriate.
- Internal dogfood / staged rollout.
- Measure old-client usage and paginated-thread unsupported errors.

Release N+2:
- Only after Release N+ is overwhelmingly deployed, make paginated the default.
- Accept that a small tail of N-1-or-older binaries may not understand paginated threads.
```

The important behavior change is fail-closed handling for a binary that
encounters a persisted `paginated` thread before it knows how to fully
support paginated history. In app-server, if a thread is `paginated`, we
will:

- allow metadata-only discovery paths like `thread/list` and
`thread/read(includeTurns=false)`, so clients can still see the thread
and inspect its `historyMode`
- reject legacy full-history/live-thread paths like
`thread/read(includeTurns=true)` and `thread/resume` with an unsupported
JSON-RPC error
- avoid silently treating an unknown or future `historyMode` as `legacy`

Under the hood, the ThreadStore layer also rejects legacy operations
that would need to load or replay the full thread history for a
paginated thread. That gives us the behavior we want for Release N:
future paginated threads are visible, but this binary fails closed
instead of trying to operate on them as if they were legacy threads.
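
The fail-closed rule can be sketched in Python as a request gate. The exception type and function name are hypothetical, and giving unknown modes the same metadata-only treatment as `paginated` is an assumption; the description only says unknown modes must not be treated as `legacy`.

```python
class UnsupportedHistoryMode(Exception):
    """Stands in for the unsupported JSON-RPC error (hypothetical type)."""


def check_thread_request(method: str, history_mode: str, include_turns: bool = False) -> None:
    """Raise unless `method` is safe for a thread with this history mode."""
    if history_mode == "legacy":
        return  # Legacy threads keep their existing behavior.
    # Any other mode, including unknown future values, is never treated as legacy.
    if method == "thread/list":
        return  # Metadata-only discovery.
    if method == "thread/read" and not include_turns:
        return  # Metadata-only read.
    raise UnsupportedHistoryMode(f"{method} is unsupported for historyMode={history_mode!r}")
```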

Owen Lin · 2026-06-26 09:12:42 -07:00

5267e805fb

[codex] wire process-owned code mode host into core (#30142)

## Summary

- add the `code_mode_host` feature flag and select
`ProcessOwnedCodeModeSessionProvider` in `CodeModeService` when enabled
- initialize code-mode sessions lazily so a missing host reports a tool
error without failing thread startup
- resolve `codex-code-mode-host` beside the running Codex binary by
default while preserving `CODEX_CODE_MODE_HOST_PATH` as an override
- add unit and end-to-end coverage for host resolution and graceful
missing-host behavior
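
The resolution order for the helper can be sketched in Python as follows. The function name is hypothetical, and ignoring an empty override is an assumption; the source only states that `CODEX_CODE_MODE_HOST_PATH` overrides the default location beside the running binary.

```python
from pathlib import PurePosixPath
from typing import Mapping

HOST_BINARY = "codex-code-mode-host"
OVERRIDE_VAR = "CODEX_CODE_MODE_HOST_PATH"


def resolve_host_path(current_exe: str, env: Mapping[str, str]) -> PurePosixPath:
    """Pick the helper path: explicit override first, then a sibling of the binary."""
    override = env.get(OVERRIDE_VAR)
    if override:  # Assumption: an empty value counts as unset.
        return PurePosixPath(override)
    return PurePosixPath(current_exe).parent / HOST_BINARY
```

Resolving the path does not check that the file exists, which matches the lazy initialization described above: a missing host only surfaces when a code-mode session is actually needed.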

## Why

This wires the process-owned session client from #30112 into the core
service behind an opt-in rollout gate. Packaged Codex installations can
place the helper in the same `bin` directory as the main executable
without relying on `PATH`, while development and custom installations
can continue to override the helper path.

## Stack

- Depends on #30112
- Base branch: `cconger/process-owned-session-runtime-4-client`

## Validation

Build `codex` and `codex-code-mode-host`, then run
`CODEX_CODE_MODE_HOST_PATH="$PWD/target/debug/codex-code-mode-host"
./target/debug/codex --enable code_mode_host`.

Channing Conger · 2026-06-26 00:23:33 -07:00

7d8906b478

Retry failed Codex Apps MCP startup (#29920)

## Problem

The built-in Codex Apps MCP client shares a future for the full startup
operation: connect, complete `initialize`, fetch the initial tools, and
return a usable client. Sharing deduplicates startup work, but it also
memoizes terminal errors.

After a transient connection, handshake, or initial `tools/list`
failure, later tool builds observe the same failed future. The thread
cannot reconnect after the backend recovers and continues serving its
startup-time cached tool snapshot, which may be empty or stale.

## Fix

When Apps MCP startup ends in an error, Codex starts bounded recovery
without putting startup latency on tool-router construction:

1. The current tool build immediately continues with the cached startup
snapshot.
2. After the initial failure is reported, Codex starts one fresh full
startup attempt in the background.
3. Concurrent tool builds share that in-flight attempt and also continue
with cached tools.
4. On success, the recovered client becomes active, refreshes the Apps
tools cache, emits a `Ready` startup status, and is reused by later
operations.
5. On failure, the cache remains unchanged and later tool builds may
start another background attempt after exponential cooldown: 1s, 2s, 4s,
8s, 16s, then 30s maximum.
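
The cooldown schedule in step 5 is a capped doubling. A minimal Python sketch, assuming the input is the number of consecutive failed startup attempts (the function name is hypothetical):

```python
def reconnect_cooldown_seconds(failed_attempts: int) -> int:
    """Return the cooldown before the next background attempt.

    Doubles from 1s and is capped at 30s.
    """
    if failed_attempts < 1:
        raise ValueError("failed_attempts must be at least 1")
    return min(2 ** (failed_attempts - 1), 30)
```

The sixth step would be 32s under plain doubling, which is where the 30s cap takes over.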

Each recreated startup performs a fresh MCP `initialize` and uncached
`tools/list`. The MCP client retains its existing bounded retries for
retryable `initialize` and `tools/list` failures.

This avoids adding the Apps startup timeout to every request during a
sustained outage.

## Scope

This is limited to the built-in Codex Apps MCP client:

- no reconnects for user-configured MCP servers;
- no cache deletion; and
- no proactive refresh for a healthy client with stale tools.

## Tests

Coverage verifies:

- tool builds return cached tools without waiting for a blocked
reconnect;
- concurrent tool builds start only one background reconnect;
- failed reconnects preserve cached tools and respect exponential
cooldown;
- a recovered client is retained and reused; and
- a long-lived thread exposes recovered app tools on a later follow-up.

Validation:

- `just test -p codex-mcp` — 95 passed
- `just test -p codex-core
later_follow_up_uses_background_recovered_apps_after_mid_thread_startup_failures
--no-capture` — passed
- `just fix -p codex-mcp`
- `just fmt`

kbazzi · 2026-06-25 21:31:12 -07:00

92d2e1df70

[codex] allow CCA image generation and web search extensions (#29909)

## Summary

- allow the standalone image-generation and web-search extensions for
the actor-authorized provider shape used by CCA
- preserve builtin `image_generation` and `web_search` for older models
and existing flows
- keep ordinary non-OpenAI providers excluded from both extensions
- remove only the image extension local managed-AuthManager requirement
that CCA cannot satisfy
- share actor-authorization detection through `ModelProviderInfo`
- keep Core tests focused on routing behavior and cover header-shape
edge cases in `model-provider-info`
- add a Responses Lite regression that verifies both
`image_gen.imagegen` and `web.run`

## Why

CCA uses a provider named `local` with `requires_openai_auth: false` and
a non-empty `x-openai-actor-authorization` header. Core accepts that
provider shape, but both extension provider-name gates rejected it;
image generation additionally required a Codex-managed login.

The standalone paths must coexist with existing builtin tools. New
Responses Lite models can receive `image_gen.imagegen` and `web.run`,
while older models continue using builtin tools.

## Impact

This enables both standalone extensions for CCA once installed
downstream, without removing or changing builtin-tool compatibility for
older models.

## Validation

- `just test -p codex-core
responses_lite_exposes_standalone_tools_for_actor_authorized_provider`
- `just test -p codex-core
responses_lite_uses_standalone_web_search_and_image_generation`
- `just test -p codex-core
hosted_tools_follow_provider_auth_model_and_config_gates`
- `just test -p codex-image-generation-extension`
- `just test -p codex-web-search-extension`
- `just test -p codex-model-provider-info`
- `just fmt`
- `git diff --check`

Won Park · 2026-06-25 18:34:35 -07:00

0d4351c1b8

Expose MCP app identity in app context (#29934)

## Why

MCP tool-call events need to expose trusted app identity and action
metadata directly so v2 clients do not have to infer it from tool names
or resource URIs.

## What changed

- Add optional `appName`, `templateId`, and `actionName` fields to MCP
tool-call `appContext`.
- Populate `appName` and `templateId` from trusted Codex Apps metadata,
and derive `actionName` from the trusted app resource metadata.
- Preserve all three fields through core events, legacy protocol events,
persisted thread history, resume redaction, and app-server v2 responses.
- Document the public `appContext` fields in
`codex-rs/app-server/README.md`.
- Regenerate app-server JSON and TypeScript schemas and add coverage for
serialization, persistence, redaction, and metadata propagation.

## Validation

- `just test -p codex-app-server-protocol mcp_tool_call`
- `just test -p codex-core
mcp_tool_call_item_metadata_only_trusts_codex_apps_identity
mcp_tool_call_item_includes_app_identity`
- `just write-app-server-schema`

---------

Co-authored-by: Martin Au-Yeung <280153141+martinauyeung-oai@users.noreply.github.com>

Martin Au-Yeung · 2026-06-25 18:31:10 -07:00

ec300bc7bd

Pin MCP runtimes to model steps (#30101)

## Why

An MCP refresh can replace the session's current manager while a model
step is still running. The step must execute calls through the same
manager whose tools it advertised.

## Boundary

```text
current session MCP runtime
          |
          | capture once for this model step
          v
StepContext.mcp
  - exact MCP config
  - exact connection manager
  - exact runtime environment context
```

```rust
pub struct McpRuntimeSnapshot {
    config: Arc<McpConfig>,
    manager: Arc<McpConnectionManager>,
    runtime_context: McpRuntimeContext,
}
```

## Example

```text
step A captures runtime A and advertises A's tools
refresh publishes runtime B
step A tool call -> runtime A
next step        -> runtime B
```

Capturing the snapshot is only an `Arc` clone. It does not restart MCPs
or make an RPC.
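
The pinning behavior can be mimicked in Python, where holding an object reference plays the role of the `Arc` clone. This is a sketch only: the `Session` and `RuntimeSnapshot` classes and their methods are hypothetical stand-ins, and just the `StepContext.mcp` field name follows the description.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RuntimeSnapshot:
    name: str
    tools: Tuple[str, ...]


@dataclass(frozen=True)
class StepContext:
    mcp: RuntimeSnapshot  # Captured once for the whole model step.

    def call_tool(self, tool: str) -> str:
        # Calls go through the runtime that advertised the tool.
        if tool not in self.mcp.tools:
            raise KeyError(tool)
        return f"{self.mcp.name}:{tool}"


class Session:
    def __init__(self, runtime: RuntimeSnapshot) -> None:
        self._current = runtime

    def refresh(self, runtime: RuntimeSnapshot) -> None:
        # Publishing a replacement is a single reference swap.
        self._current = runtime

    def begin_step(self) -> StepContext:
        # Capturing copies a reference; nothing is restarted.
        return StepContext(mcp=self._current)
```

A step begun before `refresh` keeps calling its captured runtime, while the next `begin_step` observes the replacement.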

## What changes

- Captures one MCP runtime in `StepContext`.
- Uses it for tool planning, tool calls, resources, approvals, connector
attribution, and elicitation.
- Publishes replacement runtimes atomically.
- Lets an old runtime live only while an in-flight step or request still
holds its `Arc`.

Most of this diff is mechanical routing from the session-global manager
to `step_context.mcp`; it does not introduce selected-plugin discovery
yet.

## What does not change

- No plugin or extension migration.
- No new MCP cache policy.
- No environment file watching.
- No client sharing between separate managers.

## Stack

1. Extension-owned World State sections.
2. Project executor skills through World State.
3. **This PR:** pin one MCP runtime to each model step.
4. Project selected MCP/app/connector metadata by environment
availability.
5. One end-to-end integration scenario.

jif · 2026-06-26 00:53:07 +01:00

ee9e0f6387

[codex] impl delivery_mode: current time reminders on response boundaries (#30033)

## Summary
- track user-like input and tool-output boundaries in current-time
reminder state
- gate reminder injection when `delivery_mode` is
`after_user_or_tool_output`
- preserve interval debounce and forced reminders after context-window
changes
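
A rough Python sketch of the gate combined with the interval debounce. Only the `after_user_or_tool_output` value comes from this change; the item-kind strings, the other mode value, and the function signature are assumptions, and forced reminders after context-window changes are left out.

```python
def should_inject_reminder(
    delivery_mode: str,
    previous_item: str,
    seconds_since_last: float,
    interval_seconds: float,
) -> bool:
    """Decide whether to inject a current-time reminder before inference."""
    # Hypothetical item kinds standing in for "user-like input" and "tool output".
    boundary_kinds = {"user_input", "tool_output"}
    if delivery_mode == "after_user_or_tool_output" and previous_item not in boundary_kinds:
        return False
    # Interval debounce: enough time must have passed since the last reminder.
    return seconds_since_last >= interval_seconds
```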

## Why
Training can request reminders only after user or tool-output items
while keeping the existing canonical pre-inference history-injection
path.

## Validation
- `just test -p codex-core
current_time_reminders_can_follow_only_user_or_tool_outputs`
- `just test -p codex-core
current_time_reminders_follow_time_interval_and_persist_in_history`
- `just test -p codex-core
current_time_reminder_is_refreshed_after_compaction`
- `just fix -p codex-core`

rka-oai · 2026-06-25 19:28:50 +00:00

adccb464d0

core: expose permission profile to shell tools (#29941)

## tl;dr

Inject a `CODEX_PERMISSION_PROFILE` environment variable with the name
of the current permission profile when invoking a shell tool.

## Why

Shell tool owners may need to launch nested commands under the same
named permission profile, including through `codex sandbox -P PROFILE
--include-managed-config`. Until now, child processes could observe
sandbox and network metadata but could not identify the active named
permission profile.

The `--include-managed-config` flag is essential when a helper
reconstructs the sandbox from a profile name: it ensures the nested
sandbox also loads managed enterprise requirements. Without it, using
the inherited profile could unintentionally create a sandbox that does
not enforce the organization's managed restrictions.

The new environment value is intentionally informational and **must not
be treated as trusted input**. Any process in the ancestry can overwrite
an environment variable, so a consumer that passes this value to `codex
sandbox -P` must first validate it against the profiles that helper is
authorized to use.

## Example Use Case

Suppose an organization provides a trusted `remote-bash` wrapper that
lets Codex run a command on an approved build host. The local shell
command uses the named `:workspace` permission profile:

```toml
default_permissions = ":workspace"
```

The command exposed to the model is a small zsh wrapper. It deliberately
delegates with `exec`, preserving the original arguments and process
environment:

```zsh
#!/usr/bin/env zsh
exec /opt/codex-tools/remote_bash.py "$@"
```

The model invokes the public wrapper, not its Python implementation:

```sh
/opt/codex-tools/remote-bash \
  --host builder.example.com \
  -- printf '%s' 'hello world'
```

Only the inner implementation is authorized to escape the local sandbox:

```starlark
prefix_rule(
    pattern=["/opt/codex-tools/remote_bash.py"],
    decision="allow",
)
```

With zsh-fork, execution begins with `remote-bash` inside the
`:workspace` sandbox. When the wrapper calls `exec`, the exact prefix
rule matches `remote_bash.py`, so that inner script is restarted
unsandboxed. The escalated process inherits:

```text
CODEX_PERMISSION_PROFILE=:workspace
```

Inheritance does not make the value trustworthy. `remote_bash.py`
independently allowlists both the remote host and the permission profile
before using either value. In particular, a forged value such as
`:danger-full-access` is rejected before it can reach `codex sandbox
-P`:

```python
import argparse
import os
import shlex
import sys

ALLOWED_HOSTS = {"builder.example.com"}
ALLOWED_PROFILES = {":workspace"}

parser = argparse.ArgumentParser()
parser.add_argument("--host", required=True)
separator = sys.argv.index("--")
args = parser.parse_args(sys.argv[1:separator])
command = sys.argv[separator + 1:]

if args.host not in ALLOWED_HOSTS:
    parser.error("host is not allowlisted")
if not command:
    parser.error("the remote command must not be empty")

profile = os.environ.get("CODEX_PERMISSION_PROFILE")
if not profile:
    raise SystemExit("CODEX_PERMISSION_PROFILE must not be empty")
if profile not in ALLOWED_PROFILES:
    raise SystemExit("CODEX_PERMISSION_PROFILE is not allowlisted")

remote_command = shlex.join(command)
sandbox_command = shlex.join([
    "codex", "sandbox", "-P", profile,
    "--include-managed-config", "--",
    "bash", "-lc", remote_command,
])
print(shlex.join(["ssh", args.host, sandbox_command]))
```

This builds each command layer as an argument vector and uses
`shlex.join()` at the boundary, rather than interpolating untrusted
shell text. After validation and parsing, the nested command has this
structure:

```text
ssh argv:
  ["ssh", "builder.example.com", SANDBOX_COMMAND]

SANDBOX_COMMAND argv:
  ["codex", "sandbox", "-P", ":workspace",
   "--include-managed-config", "--",
   "bash", "-lc", "printf %s 'hello world'"]

bash -lc payload argv:
  ["printf", "%s", "hello world"]
```

A production implementation could execute that SSH command. The
integration fixture prints it and parses the result back into arguments,
verifying the complete flow:

```text
model invokes outer wrapper
  -> zsh-fork starts wrapper under :workspace
  -> wrapper execs allowlisted Python script
  -> prefix rule restarts Python script unsandboxed
  -> Python script inherits CODEX_PERMISSION_PROFILE=:workspace
  -> Python script verifies :workspace is allowlisted
  -> remote command runs codex sandbox -P :workspace
     with --include-managed-config
  -> nested sandbox honors managed enterprise requirements
```

This gives the trusted helper access to resources outside the local
sandbox—such as SSH credentials—while ensuring that it can select only
an explicitly authorized profile and that work on the remote host
remains subject to the organization's managed requirements.

## What changed

- Inject `CODEX_PERMISSION_PROFILE` after shell environment policy
evaluation so the active profile wins over inherited or configured stale
values.
- Apply the variable to both `shell_command` and unified `exec_command`,
including local, zsh-fork, and remote exec-server paths.
- Remove stale values when the session has no active named profile.
- Preserve the current profile value when loading a shell snapshot so a
parent snapshot cannot restore an older profile.
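
The injection order in the first and third bullets can be sketched in Python as a final pass over the already-evaluated environment. The function name is hypothetical; the point is that the active profile is applied last, so it overrides or removes any stale value.

```python
from typing import Dict, Optional

PROFILE_VAR = "CODEX_PERMISSION_PROFILE"


def apply_permission_profile(policy_env: Dict[str, str], active_profile: Optional[str]) -> Dict[str, str]:
    """Return the environment for a shell tool after policy evaluation.

    `policy_env` is the result of shell environment policy evaluation and
    may contain an inherited or configured stale profile value.
    """
    env = dict(policy_env)  # Leave the caller's mapping untouched.
    if active_profile is None:
        env.pop(PROFILE_VAR, None)  # No active named profile: drop stale values.
    else:
        env[PROFILE_VAR] = active_profile  # The active profile always wins.
    return env
```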

## Testing

- Added classic-shell integration coverage proving an exact prefix rule
can run a `require_escalated` script outside the `:workspace` sandbox
while preserving `CODEX_PERMISSION_PROFILE=:workspace`.
- Added zsh-fork integration coverage in which the model invokes an
outer zsh wrapper, an inner allowlisted `remote_bash.py` runs
unsandboxed, and its printed SSH command reconstructs the inherited
`:workspace` sandbox with `--include-managed-config` while preserving
every argument after `--`.
- The example helper treats `CODEX_PERMISSION_PROFILE` as untrusted and
validates it against `ALLOWED_PROFILES` before constructing the nested
command.
- Assert that the reconstructed sandbox command includes
`--include-managed-config` so nested use of the inherited profile cannot
bypass managed enterprise requirements.
- Added coverage for overriding and removing stale profile values.
- Verified `shell_command` receives the selected active profile.
- Added shell snapshot coverage using `printenv
CODEX_PERMISSION_PROFILE`.

Michael Bolin · 2026-06-25 19:00:23 +00:00

c65cfeab14

[codex] current time reminder interval to be set to 0 (#30029)

A zero interval lets callers request a reminder at every
otherwise-eligible inference boundary.

## Validation
- `just test -p codex-core load_config_resolves_current_time_reminder`

rka-oai · 2026-06-25 18:30:53 +00:00

cc78903379

feat: add provider-aware model fallback to thread start (#29942)

## Why

Helper threads such as task title generation can request a model ID that
is valid for the default OpenAI provider but unavailable from the active
provider. With Amazon Bedrock, `gpt-5.4-mini` is rejected while the
provider static catalog exposes Bedrock model IDs such as
`openai.gpt-5.5` and `openai.gpt-5.4`. This causes repeated background
404s and can surface a misleading turn error even when the main turn
succeeds.

Clients need an explicit way to ask app-server to resolve an unavailable
helper model to the active provider default. That fallback must remain
limited to providers with an authoritative static catalog so custom or
dynamically discovered model IDs are not rewritten based on an
incomplete catalog.

Fixes #28741.

## What changed

- Add the experimental `allowProviderModelFallback` option to
`thread/start`, defaulting to `false` to preserve existing behavior.
- Thread the option through thread creation and model selection.
- When enabled for a static model manager, preserve requested models
present in the catalog and replace unavailable models with the provider
default.
- Continue preserving explicit model IDs for dynamic model managers
without fetching a catalog solely to validate them.
- Document the new `thread/start` behavior in the app-server API
overview.
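
The selection rule can be sketched in Python as below. The function name is hypothetical, and representing a dynamic model manager as a missing catalog (`None`) is an assumption made to keep the sketch small.

```python
from typing import Collection, Optional


def resolve_model(
    requested: str,
    provider_default: str,
    static_catalog: Optional[Collection[str]],
    allow_fallback: bool,
) -> str:
    """Pick the model for a new thread.

    `static_catalog` is None for dynamic model managers, whose explicit
    model IDs are always preserved.
    """
    if not allow_fallback or static_catalog is None:
        return requested
    if requested in static_catalog:
        return requested
    # Unavailable in an authoritative static catalog: use the provider default.
    return provider_default
```

With the Bedrock IDs quoted above, a request for `gpt-5.4-mini` with fallback enabled resolves to the provider default, while the same request without the option is passed through unchanged.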

## Test
Temporary test-client harness:
```
ThreadStartParams {
    model: Some("gpt-5.4-mini".to_string()),
    allow_provider_model_fallback: true,
    ..Default::default()
}
```
Command:
```
CODEX_HOME=/tmp/codex-bedrock-thread-start-home \
CODEX_E2E_BEDROCK_THREAD_START_ONLY=1 \
./target/debug/codex-app-server-test-client \
  --codex-bin ./target/debug/codex \
  -c 'model_provider="amazon-bedrock"' \
  send-message-v2 --experimental-api ignored
```
Relevant output:
```
> "method": "thread/start",
> "params": {
>   "model": "gpt-5.4-mini",
>   "modelProvider": null,
>   "allowProviderModelFallback": true,
>   ...
> }

< "result": {
<   "model": "openai.gpt-5.5",
<   "modelProvider": "amazon-bedrock",
<   ...
< }
```

Celia Chen · 2026-06-25 18:24:34 +00:00

6d9dbacf1a

Persist selected capability roots and resolve availability per model step (#29856)

## Why

`selectedCapabilityRoots` is durable thread intent: “use this capability
root from environment `worker`.”

The important product assumption is:

> One environment ID always names the same logical executor and stable
contents.

`worker` does not silently change from executor A to an unrelated
executor B. The process-local connection handle for `worker` can still
be replaced while Codex is running, though, for example when
`environment/add` registers a fresh handle for the same logical
environment.

The thread should persist only the stable selection. Each model step
should pair that selection with the exact ready handle captured for that
step.

## The boundary

```text
persisted thread intent
plugin@1 -> environment "worker"
|
| capture the current step
v
model-step view
unavailable, or
plugin@1 + worker's exact captured ready handle
```

The environment ID is the stable identity and cache key. The
`Arc<Environment>` is only a process-local handle retained so consumers
of one model step use the same captured environment. It is never
persisted and it does not imply different environment contents.

## What changes

### Persist the stable selection

Selected roots are written into `SessionMeta` and restored with the
thread. Forked subagents inherit the same selections, including
bounded-history forks.

Only stable data is persisted: root ID, environment ID, and root path.

### Capture readiness together with the exact handle

The environment snapshot records:

```rust
environment_id -> Some(Arc<Environment>) // ready in this step
environment_id -> None // still starting in this step
```

This prevents readiness and execution from coming from different
registry snapshots.

For example:

```text
step snapshot: worker -> handle A, ready
environment/add: worker -> fresh handle B for the same logical environment
current step: plugin@1 still uses captured handle A
```

Without carrying handle A in the snapshot, the resolver could combine “A
was ready” with handle B and treat B as ready before it had finished
starting.

This does not change cache invalidation. Stable capability metadata
remains identified by environment ID and capability root. Replacing a
process-local handle under the same stable environment ID does not
invalidate or rediscover that metadata.
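
The pinning described above can be sketched as follows. This is a minimal illustration, not the Codex implementation: `Environment`, `Registry`, and `StepSnapshot` are hypothetical stand-ins, and the only property shown is that a snapshot keeps the handle it captured even after the registry swaps in a fresh one for the same environment ID.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for a process-local environment handle.
#[derive(Debug)]
struct Environment {
    label: &'static str,
}

/// Hypothetical live registry: environment ID -> current handle
/// (`None` while the environment is still starting).
#[derive(Default)]
struct Registry {
    handles: HashMap<String, Option<Arc<Environment>>>,
}

/// Hypothetical per-step view: readiness and the exact handle travel together.
struct StepSnapshot {
    handles: HashMap<String, Option<Arc<Environment>>>,
}

impl Registry {
    fn set(&mut self, id: &str, handle: Option<Arc<Environment>>) {
        self.handles.insert(id.to_string(), handle);
    }

    /// Cloning the map clones the `Arc`s, so the snapshot keeps handle A
    /// alive even if the registry later stores handle B under the same ID.
    fn capture(&self) -> StepSnapshot {
        StepSnapshot { handles: self.handles.clone() }
    }
}

impl StepSnapshot {
    /// Returns the captured ready handle, or `None` if the environment was
    /// starting or missing when this step was captured.
    fn resolve(&self, id: &str) -> Option<Arc<Environment>> {
        self.handles.get(id).cloned().flatten()
    }
}
```

Because the snapshot never consults the live registry, "was ready" and "which handle" cannot come from different points in time.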

### Resolve availability per model step

- A ready captured environment produces resolved roots using its
captured handle.
- A starting, missing, or failed environment is omitted from that step.
- A selected lazy environment that is outside the turn's captured
environment set is asked to start, and a later step can observe it as
ready.
- No capability files are scanned here.

Transient transport disconnects remain the remote client's reconnect
concern. This PR models initial attachment/readiness; it does not add
live socket-connectivity state.

## Example

```text
thread selection: plugin@1 -> environment "worker"

step 1: worker is starting -> plugin@1 unavailable
step 2: worker is ready -> plugin@1 resolves through worker's captured handle
step 3: fresh local handle -> current step remains pinned; a later step captures its own view
```

Temporary unavailability does not discard the durable selection. Later
PRs can retain stable metadata caches while projecting only currently
available capabilities into model-visible World State.

## Compatibility

The app-server request shape does not change. Older rollouts without
`selected_capability_roots` deserialize to an empty list.

## Stack

1. **This PR:** persist stable selected roots and resolve them through
an exact model-step handle.
2. #29960: cache stable skill metadata and project available skills into
World State.
3. #29946: cache stable plugin declarations and manage the separate live
MCP runtime.

jif · 2026-06-25 17:49:43 +00:00

8f02973d25

Support OAuth for HTTP MCP servers from selected executor plugins (#28529 )

## Why

#28522 routes selected-plugin HTTP MCP traffic through the owning
executor, but OAuth bootstrap and refresh still used host-local clients.
Executor-only servers therefore cannot complete discovery or login
through the same network boundary as the MCP connection.

## What changed

- adapt `codex_exec_server::HttpClient` to RMCP 1.8's `OAuthHttpClient`
contract
- let RMCP own discovery, dynamic registration, PKCE, token exchange,
and refresh
- route auth status, persisted-token startup, and app-server login
through the server runtime while preserving the existing local discovery
path
- add optional `threadId` to `mcpServer/oauth/login` and echo it in the
completion notification
- implement RMCP's redirect policy and 1 MiB OAuth response limit over
executor HTTP
- cover selected-thread OAuth discovery and login through an
executor-only route

Depends on #28522.

jif · 2026-06-25 10:31:17 +01:00

b215961a56

core: reconcile legacy WorldState sections (#29997 )

## Why

Older rollouts can retain model-visible context for a WorldState section
without having a persisted snapshot for that section. Treating the
missing snapshot as definitely absent can duplicate old context or fail
to tell the model that it was replaced or removed.

This provides a generic migration path for sections moving into
WorldState, beginning with AGENTS.md.

Builds on #29810.

## What changed

- distinguish section state that is absent, known from a persisted
snapshot, or unknown because matching legacy context remains in history
- let WorldState sections identify their own legacy fragments while
`ContextManager` owns history reconciliation and baseline persistence
- make AGENTS.md emit one conservative replacement or removal update for
legacy history, then deduplicate from the newly persisted baseline
- preserve existing environment rendering when persisted section data is
missing or malformed
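
A rough sketch of the three-way distinction above, under stated assumptions: `SectionBaseline`, `Update`, `classify`, and `plan_update` are illustrative names rather than the types in `codex-core`, and the sketch assumes a persisted snapshot wins whenever one exists.

```rust
/// Hypothetical three-way baseline for one WorldState section.
#[derive(Debug, Clone, PartialEq)]
enum SectionBaseline {
    /// No snapshot and no legacy context: the model never saw this section.
    Absent,
    /// A persisted snapshot records exactly what the model last saw.
    Known(String),
    /// No snapshot, but matching legacy context remains in history.
    Unknown,
}

#[derive(Debug, PartialEq)]
enum Update {
    Nothing,
    Initial(String),
    Replacement(String),
    Removal,
}

/// Assumption: a persisted snapshot takes priority over legacy fragments.
fn classify(snapshot: Option<&str>, legacy_fragment_in_history: bool) -> SectionBaseline {
    match (snapshot, legacy_fragment_in_history) {
        (Some(text), _) => SectionBaseline::Known(text.to_string()),
        (None, true) => SectionBaseline::Unknown,
        (None, false) => SectionBaseline::Absent,
    }
}

fn plan_update(baseline: &SectionBaseline, current: Option<&str>) -> Update {
    match (baseline, current) {
        (SectionBaseline::Absent, None) => Update::Nothing,
        (SectionBaseline::Absent, Some(now)) => Update::Initial(now.to_string()),
        (SectionBaseline::Known(old), Some(now)) if old.as_str() == now => Update::Nothing,
        (SectionBaseline::Known(_), Some(now)) => Update::Replacement(now.to_string()),
        (SectionBaseline::Known(_), None) => Update::Removal,
        // Unknown: the old text cannot be compared, so emit one conservative
        // replacement or removal instead of guessing that nothing changed.
        (SectionBaseline::Unknown, Some(now)) => Update::Replacement(now.to_string()),
        (SectionBaseline::Unknown, None) => Update::Removal,
    }
}
```

Once the conservative update has been persisted as a snapshot, the next resume classifies the section as `Known` and identical content produces no further update.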

## Testing

- `just test -p codex-core world_state`
- `just test -p codex-core
cold_resume_invalidates_deleted_legacy_agents_md_once -- --exact`

sayan-oai · 2026-06-25 07:03:52 +00:00

ab80d4d484

core: make AGENTS.md react to environment changes (#29810 )

## Why

With deferred executors, a turn can begin before a remote environment
attaches. AGENTS.md discovery previously ran only during session setup,
so instructions from a later environment never reached the model or the
session instruction sources.

WorldState persistence has now landed, so this uses the durable
model-visible baseline directly instead of carrying a temporary
resume/fork compatibility path.

## What

- Add an `AgentsMdManager` in `SessionServices` to own host
instructions, loaded state, and refresh caching.
- When `DeferredExecutor` is enabled, refresh AGENTS.md when attached
environment selections change and freeze the result in the corresponding
`StepContext`.
- Represent AGENTS.md as a persisted WorldState section for every
session, with bounded initial, replacement, and removal updates.
- Remove duplicate AGENTS.md state and rendering from
`SessionConfiguration` and `TurnContext`.
- Build initial context, per-request updates, and compaction context
from the same step-scoped value.
- On resume and fork, compare current instructions with the restored
WorldState baseline and inject a replacement exactly once when they
differ.

Builds on #29833, #29835, and #29837.

## Tests

- Covers a remote environment becoming ready mid-turn, with AGENTS.md
appearing on the next request exactly once and updating canonical
instruction sources.
- Covers full, unchanged, replaced, and removed AGENTS.md WorldState
rendering.
- Covers changed instructions across cold resume and fork without
duplicate reinjection.
- Covers remote-v2 compaction retaining creation-time instructions in
the live session and cold resume appending one replacement when the
source changed.
- Ran focused `codex-core` AGENTS.md, WorldState, and context-update
test suites.

sayan-oai · 2026-06-24 22:57:42 -07:00

f2f80ef442

feat: use run agent task auth for inference (#19051 )

## Stack

This is PR 3 of the simplified HAI single-run-task stack:

- [#19047](https://github.com/openai/codex/pull/19047) Agent Identity
assertion and task-registration primitives, including the shared
run-task helper used by existing Agent Identity JWT auth.
- [#19049](https://github.com/openai/codex/pull/19049)
Disabled-by-default ChatGPT auth opt-in that provisions/reuses persisted
Agent Identity runtime auth and its single run task.
- [#19051](https://github.com/openai/codex/pull/19051) Run-scoped
provider auth that uses one backend-owned task id for first-party
inference and compaction requests.

[#19054](https://github.com/openai/codex/pull/19054) was collapsed out
of the active stack because the simplified design no longer needs a
separate background/control-plane task helper.

## Summary

This PR moves Agent Identity usage into provider auth resolution. That
keeps `AgentAssertion` auth tied to first-party OpenAI provider requests
instead of applying a late session-wide override that could affect
local, custom, Bedrock, API-key, or external-bearer providers.

What changed:

- adds a small `ProviderAuthScope` struct carrying the run auth policy
and session source needed by provider-scoped auth resolution
- lets `Session` opt the existing `ModelClient` into `ChatGptAuth`
policy when `use_agent_identity` is enabled, without adding a second
model-client constructor
- resolves Agent Identity only for first-party OpenAI provider auth
paths
- uses the persisted run task id from the `AgentIdentityAuth` record to
build `AgentAssertion` auth for Responses requests
- routes shared request setup through scoped provider auth so unary
compact requests use the same run-task assertion path as inference turns
- keeps local/custom/Bedrock/env-key/external-bearer provider auth
unchanged
- lets missing run-task state surface through the existing model-request
error path instead of silently falling back to bearer auth

This PR intentionally does not create thread-scoped, target-scoped, or
background-scoped task identities. The run task is the only task Codex
registers in this POC shape.

## Testing

- `just test -p codex-model-provider`
- `just test -p codex-core client::tests::provider_auth_scope_uses`
- `just test -p codex-core remote_compact_uses_agent_identity_assertion`

Adrian · 2026-06-24 22:31:41 -07:00

51864b0b4b

[codex] route sleep through time providers (#29973 )

## Summary

- add a cancellable sleep operation to `TimeProvider`
- route `clock.sleep` through the configured provider
- extend the supported sleep duration to 12 hours
- complete the sleep turn item before propagating provider failures
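
To make the shape of a cancellable, provider-routed sleep concrete, here is a std-only synchronous sketch. The trait signature, `WallClock`, `SleepOutcome`, and `SleepError` are assumptions for illustration; the real `TimeProvider` is likely async and may differ in every detail except the 12-hour bound stated above.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Upper bound taken from the summary above: 12 hours.
const MAX_SLEEP: Duration = Duration::from_secs(12 * 60 * 60);

#[derive(Debug, PartialEq)]
enum SleepOutcome {
    Completed,
    Cancelled,
}

#[derive(Debug, PartialEq)]
enum SleepError {
    TooLong,
}

/// Hypothetical trait shape; the real `TimeProvider` signature may differ.
trait TimeProvider {
    fn sleep(&self, duration: Duration, cancel: &Receiver<()>) -> Result<SleepOutcome, SleepError>;
}

/// Wall-clock provider: waits on the cancel channel with a timeout, so a
/// cancel message wakes the sleeper early.
struct WallClock;

impl TimeProvider for WallClock {
    fn sleep(&self, duration: Duration, cancel: &Receiver<()>) -> Result<SleepOutcome, SleepError> {
        if duration > MAX_SLEEP {
            return Err(SleepError::TooLong);
        }
        match cancel.recv_timeout(duration) {
            Ok(()) => Ok(SleepOutcome::Cancelled),
            Err(RecvTimeoutError::Timeout) => Ok(SleepOutcome::Completed),
            // Assumption: a dropped canceller is treated as a cancellation.
            Err(RecvTimeoutError::Disconnected) => Ok(SleepOutcome::Cancelled),
        }
    }
}
```

An external clock would implement the same trait and decide for itself when the requested duration has elapsed.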

## Why

This isolates the core clock abstraction needed by external clock
integrations. Existing system and app-server behavior remains wall-clock
based in this PR; the stacked follow-up supplies app-server sleeps from
an external clock.

rka-oai · 2026-06-24 22:17:43 -07:00

f66d793a2d

[codex] Add Ultra reasoning effort (#29899 )

## Why

Ultra should be one user-facing reasoning selection for work that
benefits from both maximum reasoning and proactive multi-agent
delegation. Without it, clients must coordinate maximum reasoning with
the experimental `multiAgentMode` setting, even though the inference
backend still expects its existing `max` effort value.

This change makes reasoning effort the source of truth: clients select
`ultra`, core derives proactive multi-agent behavior when the turn is
eligible for multi-agent V2, and inference requests continue to use the
backend-compatible `max` value.

## What changed

- Add `ultra` as a first-class reasoning effort and preserve
model-catalog ordering when exposing it to clients.
- Convert `ultra` to `max` at the inference request boundary, including
Responses HTTP/WebSocket requests, startup prewarm, compaction, and
memory summarization.
- Derive effective multi-agent mode per turn from effective reasoning
effort:
  - eligible multi-agent V2 + `ultra` → `proactive`
  - eligible multi-agent V2 + any other effort → `explicitRequestOnly`
  - V1 or otherwise ineligible sessions → no multi-agent mode instruction
- Keep the derived effective mode in turn context history so successive
turns can emit a developer-message update only when the effective mode
changes.
- Remove selected multi-agent mode from core session configuration, turn
construction, thread settings, resume/fork restoration, and subagent
spawn plumbing. Subagents inherit reasoning effort and derive their own
effective mode.
- Retain the experimental app-server `multiAgentMode` fields for wire
compatibility while marking them deprecated. Request values are accepted
but ignored; compatibility response fields report `explicitRequestOnly`.
- Display Ultra in the TUI using the order supplied by `model/list`.
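
The conversion and derivation rules above can be sketched as below. The enum is deliberately reduced (only `High`, `Max`, and `Ultra`), and the function names are hypothetical; only the `ultra` to `max` mapping and the mode table come from the description.

```rust
/// Reduced, illustrative effort enum; the real type has more variants.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ReasoningEffort {
    High,
    Max,
    Ultra,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum MultiAgentMode {
    Proactive,
    ExplicitRequestOnly,
}

impl ReasoningEffort {
    /// Value sent at the inference request boundary: `ultra` becomes `max`.
    fn wire_value(self) -> &'static str {
        match self {
            ReasoningEffort::High => "high",
            ReasoningEffort::Max | ReasoningEffort::Ultra => "max",
        }
    }
}

/// Derives the per-turn mode from effort. `None` means no multi-agent mode
/// instruction is emitted (V1 or otherwise ineligible sessions).
fn effective_multi_agent_mode(
    effort: ReasoningEffort,
    eligible_for_v2: bool,
) -> Option<MultiAgentMode> {
    if !eligible_for_v2 {
        return None;
    }
    Some(match effort {
        ReasoningEffort::Ultra => MultiAgentMode::Proactive,
        _ => MultiAgentMode::ExplicitRequestOnly,
    })
}

/// A developer-message update is only needed when the effective mode
/// differs from the one recorded for the previous turn.
fn needs_mode_update(previous: Option<MultiAgentMode>, current: Option<MultiAgentMode>) -> bool {
    previous != current
}
```

Keeping the derivation a pure function of effort and eligibility is what lets subagents inherit only the effort and still arrive at their own effective mode.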

## Validation

- `just test -p codex-core ultra_reasoning_uses_max_for_requests`
- `just test -p codex-tui model_reasoning_selection_popup`

Shijie Rao · 2026-06-24 20:13:52 -07:00

df1199fddb

[2/3] core: persist world state in rollouts (#29835 )

## Why

`WorldState` currently remembers its model-visible diff baseline only in
memory. That leaves no durable source for restoring the exact baseline
after resume, fork, rollback, or compaction.

This is the second PR in the WorldState persistence stack, built on
#29833 and following #29249. It records durable state transitions; the
next PR will replay them during rollout reconstruction.

## What

- Add a `world_state` rollout item containing either a full snapshot or
an RFC 7386 JSON Merge Patch.
- Persist a full snapshot after initial context and after compaction
establishes a new context window.
- Persist non-empty patches when later sampling steps or turns advance
the WorldState baseline.
- Write model-visible history before its matching WorldState record, so
an interrupted write can only cause a safe repeated update on replay.
- Preserve WorldState records for full-history forks while excluding
them from thread previews, metadata, and app-server history
materialization.
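
For readers unfamiliar with RFC 7386, the merge-patch rule that a replay would apply to a snapshot is short enough to sketch. The `Json` enum and `obj` helper below exist only to keep the example dependency-free; they are not the types Codex uses, and the field names in any usage are made up.

```rust
use std::collections::BTreeMap;

/// Minimal JSON value, defined here only to avoid external crates.
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Null,
    Bool(bool),
    Num(f64),
    Str(String),
    Arr(Vec<Json>),
    Obj(BTreeMap<String, Json>),
}

/// Convenience constructor for object literals in examples.
fn obj(fields: &[(&str, Json)]) -> Json {
    Json::Obj(fields.iter().map(|(k, v)| (k.to_string(), v.clone())).collect())
}

/// RFC 7386 `MergePatch(Target, Patch)`.
fn merge_patch(target: Json, patch: &Json) -> Json {
    let Json::Obj(patch_fields) = patch else {
        // A non-object patch replaces the target wholesale.
        return patch.clone();
    };
    // A non-object target is treated as an empty object.
    let mut fields = match target {
        Json::Obj(fields) => fields,
        _ => BTreeMap::new(),
    };
    for (name, value) in patch_fields {
        if *value == Json::Null {
            // `null` in a patch means "remove this member".
            fields.remove(name);
        } else {
            let current = fields.remove(name).unwrap_or(Json::Null);
            fields.insert(name.clone(), merge_patch(current, value));
        }
    }
    Json::Obj(fields)
}
```

Applying the same patch twice yields the same result, which is consistent with the note above that an interrupted write can only cause a safe repeated update.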

Older binaries read rollout lines independently, so they skip the
unknown `world_state` records while retaining the rest of the thread.

## Testing

- `just test -p codex-core
snapshot_merge_patch_changes_and_removes_nested_values`
- `just test -p codex-core
world_state_baseline_deduplicates_until_history_is_replaced`
- `just test -p codex-core
deferred_executor_compaction_preserves_then_updates_environment_once`
- `just test -p codex-protocol`
- `just test -p codex-rollout`
- `just test -p codex-state`
- `just test -p codex-thread-store`
- `just test -p codex-app-server-protocol`

sayan-oai · 2026-06-24 20:13:49 -07:00

fa036d39aa

Represent MCP authentication with an enum (#29924 )

## Why

MCP authentication has distinct OAuth and ChatGPT-session flows.
Representing that choice as `use_chatgpt_auth` makes one flow implicit
and allows the configuration model to express the distinction only
through a boolean.

ChatGPT credential forwarding also needs a first-party trust boundary. A
configurable `chatgpt_base_url` controls routing, but must not grant an
MCP server permission to receive session credentials.

This change builds on #29733, where the boolean was introduced.

## What changed

- Replace `use_chatgpt_auth` with an `auth` field backed by the
exhaustive `McpServerAuth` enum.
- Support `auth = "oauth"` and `auth = "chatgpt"`, with OAuth remaining
the default.
- Trust only the origin derived from the existing hardcoded
`CHATGPT_CODEX_BASE_URL` when granting ChatGPT auth to an MCP server.
- Keep configured bearer tokens and authorization headers ahead of the
selected authentication flow.
- Update config writers, schema output, fixtures, and integration-test
setup to use the enum.
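
The selection order above can be illustrated with a small sketch. Everything except the `McpServerAuth` name and the `"oauth"` / `"chatgpt"` values is an assumption: `TRUSTED_ORIGIN` is a placeholder (the real origin derives from `CHATGPT_CODEX_BASE_URL`, which is not reproduced here), and the `Denied` outcome is just this sketch's way of modeling "ChatGPT auth is not granted".

```rust
/// Exhaustive auth choice; OAuth remains the default.
#[derive(Debug, Clone, Copy, PartialEq, Default)]
enum McpServerAuth {
    #[default]
    OAuth,
    ChatGpt,
}

impl McpServerAuth {
    /// Parses the config value; an absent field falls back to the default.
    fn parse(value: Option<&str>) -> Result<Self, String> {
        match value {
            None => Ok(Self::default()),
            Some("oauth") => Ok(Self::OAuth),
            Some("chatgpt") => Ok(Self::ChatGpt),
            Some(other) => Err(format!("unknown auth value: {other}")),
        }
    }
}

#[derive(Debug, PartialEq)]
enum EffectiveAuth {
    ConfiguredAuthorization,
    ChatGptSession,
    OAuth,
    /// Sketch-only outcome: ChatGPT auth requested but not granted.
    Denied,
}

/// Placeholder, not the real trusted origin.
const TRUSTED_ORIGIN: &str = "https://trusted.example";

fn select_auth(
    auth: McpServerAuth,
    server_origin: &str,
    has_configured_authorization: bool,
) -> EffectiveAuth {
    // Configured bearer tokens and authorization headers come first.
    if has_configured_authorization {
        return EffectiveAuth::ConfiguredAuthorization;
    }
    match auth {
        McpServerAuth::OAuth => EffectiveAuth::OAuth,
        // Only the hardcoded origin is trusted; a configurable base URL
        // plays no part in this decision.
        McpServerAuth::ChatGpt if server_origin == TRUSTED_ORIGIN => EffectiveAuth::ChatGptSession,
        McpServerAuth::ChatGpt => EffectiveAuth::Denied,
    }
}
```

Note that `select_auth` takes no `chatgpt_base_url` parameter at all, which is the point of the trust boundary: there is nothing for an attacker-controlled override to influence.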

## Verification

Integration coverage exercises the complete streamable HTTP startup path
in two independent configurations:

- A directly constructed MCP configuration verifies that matching an
overridden `chatgpt_base_url` does not grant ChatGPT auth.
- A persisted `config.toml` containing an attacker-controlled
`chatgpt_base_url` and `auth = "chatgpt"` verifies the same boundary
through normal config parsing.

Both tests complete MCP initialization and tool listing and assert that
the full captured request sequence contains no authorization headers.
Separate integration coverage verifies that configured authorization
takes precedence over ChatGPT auth.

Ahmed Ibrahim · 2026-06-24 19:51:51 -07:00

f8937b7d86

Allow ChatGPT-hosted MCP servers to use session auth (#29733 )

## Why

ChatGPT session authentication was inferred from the reserved Codex Apps
server name. That couples credential routing to Codex Apps-specific
behavior and prevents other MCP endpoints hosted by ChatGPT from
explicitly using the current session.

The opt-in also needs a clear security boundary: an arbitrary MCP
configuration must not be able to redirect ChatGPT credentials to
another origin.

## What changed

- Add `use_chatgpt_auth` to HTTP MCP server configuration, defaulting to
`false`.
- Honor the setting only when the parsed server URL has the same HTTP(S)
origin as the configured `chatgpt_base_url`; otherwise remove the
capability before startup.
- Resolve bearer tokens and static or environment-backed authorization
headers before selecting authentication, with configured authorization
taking precedence over ChatGPT session auth.
- Enable the setting for the built-in Codex Apps and hosted plugin
runtime endpoints while keeping Codex Apps caching and tool
normalization scoped to the reserved server.
- Persist the setting through MCP config rewrite paths and expose it in
the generated config schema.
- Load the current login state for `codex mcp list` so reported auth
status matches runtime behavior.

## Verification

Core integration coverage exercises the complete streamable HTTP MCP
startup path and verifies that:

- a same-origin opted-in server receives the current ChatGPT access
token;
- an explicitly configured authorization header takes precedence;
- a different-origin server completes MCP initialization and tool
listing without receiving any ChatGPT authorization header.

Ahmed Ibrahim · 2026-06-24 19:21:28 -07:00

4c0706e24a

core: add configurable <context_window_guidance> message (#29936 )

## Why

This PR adds a configurable `<context_window_guidance>` developer
section immediately after `<context_window>`. Harness integrations need
this section to give the model deployment-specific instructions for
preparing for context-window transitions.

## What changed

- Add an optional `features.token_budget.guidance_message` config with a
1,000-byte runtime cap and generated schema support.
- Render configured guidance as a developer `ContextualUserFragment`
wrapped in `<context_window_guidance>` immediately after
`<context_window>`.
- Omit the section when guidance is unset, empty, or whitespace-only.
- Preserve the resolved value in config locks and classify persisted
guidance as contextual developer content.
- Add integration coverage for rendered content and ordering.
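
As a rough illustration of the omit-or-wrap behavior, the sketch below renders the section from an optional message. Several details are assumptions rather than facts from this PR: that over-long guidance is truncated (the real cap may reject it instead), that surrounding whitespace is trimmed, and the exact newline layout inside the tag. `render_guidance` is a hypothetical name.

```rust
/// Runtime cap from the description above.
const GUIDANCE_BYTE_CAP: usize = 1_000;

/// Returns the wrapped section, or `None` when guidance is unset, empty,
/// or whitespace-only.
fn render_guidance(message: Option<&str>) -> Option<String> {
    // Assumption: surrounding whitespace is not significant.
    let text = message?.trim();
    if text.is_empty() {
        return None;
    }
    // Assumption: over-long guidance is truncated. Back up to a UTF-8
    // character boundary so the slice below cannot panic.
    let mut end = text.len().min(GUIDANCE_BYTE_CAP);
    while !text.is_char_boundary(end) {
        end -= 1;
    }
    Some(format!(
        "<context_window_guidance>\n{}\n</context_window_guidance>",
        &text[..end]
    ))
}
```

Returning `Option<String>` lets the caller skip the fragment entirely, so an unset message never produces an empty tag after `<context_window>`.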

Michael Bolin · 2026-06-24 18:03:44 -07:00

f15df624a6

[codex] nest sleep config under current time reminder (#29910 )

## Summary

- move sleep tool enablement from top-level `[features].sleep_tool` to
`[features.current_time_reminder].sleep_tool`
- remove the standalone `Feature::SleepTool` flag and gate `clock.sleep`
from resolved current-time configuration
- update config schema, config-lock materialization, and existing sleep
coverage

Stacked on #29907.

rka-oai · 2026-06-24 17:49:00 -07:00

35f5d02464

[codex] namespace sleep under clock (#29907 )

## Summary

- expose the interruptible sleep tool as `clock.sleep` instead of
top-level `sleep`
- keep `clock.curr_time` and `clock.sleep` in the same model-visible
namespace when both features are enabled
- update existing core and app-server integration coverage to issue
namespaced sleep calls

## Why

Sleep is a clock operation. Grouping it with `clock.curr_time` gives the
model a more coherent tool surface without changing the sleep feature
gate or runtime behavior.

## Validation

- `just test -p codex-core sleep_tool_follows_feature_gate`
- `just test -p codex-core any_new_input_interrupts_sleep`
- `just test -p codex-app-server
sleep_emits_started_and_completed_items`

rka-oai · 2026-06-24 17:17:28 -07:00

800529218a

[plugins] Track plugin install requests by ID (#29684 )

Summary
- Emit `codex_plugin_install_requested` when a validated plugin install
request is made, before the user accepts or declines the elicitation.
- Record the exact model-visible plugin ID, remote plugin ID, required
connector IDs, stable suggestion ID, and `endpoint_recommendation` vs
`legacy_discovery` source.
- Keep `suggest_reason` out of telemetry and leave connector-only
install requests unchanged.

Rollout
- Backend/schema dependency:
https://github.com/openai/openai/pull/1065270
- Land the backend PR before this producer starts sending the event.

Validation
- `just test -p codex-analytics` (83 passed)
- `just test -p codex-core request_plugin_install` (17 passed)
- `just fix -p codex-analytics`
- `just fix -p codex-core`
- `just fmt`
- `git diff --check`

Alex Daley · 2026-06-24 21:29:11 +00:00

24423f5712

[codex] Inject agent graph store into ThreadManager (#29736 )

Pick up the AgentGraphStore migration.

- Inject an explicit optional agent graph store into `ThreadManager`
- Move all calls to spawn, close, recursive resume, and
subtree/archive/delete/feedback traversal through it
- Keep using `LocalAgentGraphStore` when SQLite is available

This required some changes to the interface to deal with futures:

- The interface now matches `ThreadStore`'s object-safe pattern by
returning a boxed `AgentGraphStoreFuture` directly, allowing
`ThreadManager` to hold `Arc<dyn AgentGraphStore>`

*Slight behavior change!* Unfiltered subtree enumeration now performs a
single all-status breadth-first traversal, so a closed grandchild
beneath an open edge is included; the previous Open-then-Closed
traversals could not cross mixed-status paths and silently omitted it.
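
The behavior change is easiest to see on a tiny graph. The sketch below uses a hypothetical in-memory `Graph` and `subtree` function (not the `AgentGraphStore` interface) and assumes the graph is a tree, so no visited set is kept.

```rust
use std::collections::{HashMap, VecDeque};

#[derive(Debug, Clone, Copy, PartialEq)]
enum EdgeStatus {
    Open,
    Closed,
}

/// Hypothetical in-memory graph: parent thread ID -> (child ID, edge status).
type Graph = HashMap<&'static str, Vec<(&'static str, EdgeStatus)>>;

/// Breadth-first subtree enumeration. `filter: None` follows edges of every
/// status in one pass; `Some(status)` follows only edges with that status.
fn subtree(graph: &Graph, root: &'static str, filter: Option<EdgeStatus>) -> Vec<&'static str> {
    let mut found = Vec::new();
    let mut queue = VecDeque::from([root]);
    while let Some(parent) = queue.pop_front() {
        for &(child, status) in graph.get(parent).into_iter().flatten() {
            if filter.map_or(true, |wanted| wanted == status) {
                found.push(child);
                queue.push_back(child);
            }
        }
    }
    found
}
```

With `root -(Open)-> child -(Closed)-> grandchild`, the single unfiltered pass reaches the grandchild, while an Open-only pass stops at `child` and a Closed-only pass never leaves `root`, so concatenating the two filtered passes misses it.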

Tom · 2026-06-24 13:24:10 -07:00

ece1dfece0

[codex] Remove auto-compaction opt-out (#29815 )

## Summary

- remove the default-on `auto_compaction` feature flag and generated
config schema entries
- restore unconditional pre-turn, model-switch/hash, and mid-turn
automatic compaction
- expose `new_context` whenever token-budget tooling is enabled
- remove the disabled-auto-compaction integration coverage introduced by
#28260

## Motivation

Roll back the internal auto-compaction escape hatch added in #28260.
Automatic compaction should no longer be suppressible with `--disable
auto_compaction`; existing manual `/compact` behavior remains unchanged.

## Testing

- `just write-config-schema`
- `just test -p codex-features` — 53 passed
- `just test -p codex-core 'suite::compact::'` — 36 passed
- `just test -p codex-core
suite::token_budget::new_context_tool_starts_new_window_before_follow_up`
— 1 passed
- `just fix -p codex-core -p codex-features`
- `just fmt`
- `just test -p codex-core` — 2,778 passed, 59 failed, 16 skipped;
failures were outside the changed compaction paths and were dominated by
missing first-party test binaries and shell-snapshot timeouts

rhan-oai · 2026-06-24 00:15:04 -07:00

2a320fedb5

test: use automatic environments in app-server integration tests (#29789 )

## Why

Topology-neutral app-server integration tests should exercise automatic
environment selection so the same setup covers local and remote
executors.

## What

Migrate eligible tests to `TestAppServer::new_with_auto_env()` and
`send_thread_start_request_with_auto_env()`. Leave explicit-topology
tests unchanged, and skip the request-permissions case on Windows with a
TODO for cross-platform tool routing.

## Validation

- `just test -p codex-app-server`
- `bazel test //codex-rs/app-server:app-server-all-wine-exec-test
--test_output=errors`

Stacked on #29788.

Adam Perry @ OpenAI · 2026-06-23 22:48:06 -07:00

c2b3e3b4f5

test: run app-server integration tests under Wine (#29788 )

## Why

Made a mistake when carving #29746 out of my local changes and the test
was missing from the build graph. Oops!

## What

Enable the app-server Wine exec test target. Remove the `manual` tag
from generated Wine-exec test variants so wildcard Bazel test
invocations select them. Refactor the smoke test to ensure it passes
with current Windows support.

Adam Perry @ OpenAI · 2026-06-24 05:23:29 +00:00

b17f30eb2a

Let image generation extension hosts control output persistence (#29711 )

## Why

Some extension hosts need generated images returned without writing them
to the local filesystem or giving the model a local path.

## What changed

**tl;dr**: all extension image operations, including persistence, now
happen in the image generation extension

- Let hosts provide an optional image save root when installing the
extension.
- Save images and return path hints only when a save root is configured.
- Return image data without saving or adding a path hint when no save
root is configured.
- Preserve the extension-provided `saved_path` instead of persisting
extension images again in core.
- Leave built-in image generation unchanged.

## Validation

- `just test -p codex-image-generation-extension`
- `just test -p codex-app-server
standalone_image_generation_returns_saved_path_hint_to_model`
- `just test -p codex-core
extension_tool_uses_granted_turn_permissions_without_local_persistence`
- `just test -p codex-core tools::handlers::extension_tools::tests`
- Tested in the Codex CLI with `save_root` set to `CODEX_HOME` and to `None`.
- Tested in the Codex app with both settings as well.

Won Park · 2026-06-23 18:51:49 -07:00

61f5a84930

test: add app-server auto environment helper (#29746 )

## Why

Start moving towards app-server tests defaulting to running against
remote & foreign OS executors. To do so we need a point of indirection
similar to core integration tests' `build_with_auto_env`, but with the
flexibility of letting tests control environment registration if they
need to.

## What

This adds:

- `TestAppServer::new_with_auto_env()` for constructing an app server
with a default environment defined by the test runner (e.g. bazel)
- `TestAppServer::auto_env_params()` for tests to easily acquire turn
env params tailored to the automatic environment
- `TestAppServer::send_thread_start_request_with_auto_env()` to make it
easy for tests to start a thread using the automatic environment

The above methods all fail if the test calling them has set up an
environment where the automatic environment configuration conflicts with
test-created state.

## Validation

Adds a couple of basic smoke tests to the app-server test suite.
Follow-ups will migrate more tests to use it.

Adam Perry @ OpenAI · 2026-06-24 01:06:29 +00:00

283bc4cf01

core: add wait_for_environment for starting environments (#29745 )

## Why

With `DeferredExecutor`, a sampling request can begin while an
environment is still starting. The model can see that pending state, but
needs a way to wait for the environment within the same turn before
continuing.

Environment startup is owned by Core, so the wait tool should use the
same request-frozen `StepContext` that advertised the starting
environment. This keeps tool registration and execution tied to the
exact startup operation the model saw, even if live thread state later
changes.

Supersedes #29735.

## What

- register `wait_for_environment` when the current `StepContext`
contains starting environments
- wait on the selected `StartingTurnEnvironment` shared resolution and
return a bounded ready or failed result
- rebuild the next request normally, removing the wait tool and exposing
ready environment tools, or reporting the environment as unavailable
after failure

## Testing

- `just test -p codex-core deferred_executor_`
- verifies the wait tool is replaced by environment-backed tools after
startup
- verifies startup failure removes both the wait tool and unavailable
environment tools while notifying the model

sayan-oai · 2026-06-24 00:35:34 +00:00

61ff4d087e

Support thread-level originator overrides (#29477 )

## Why

Work (TPP) threads can be launched from the Desktop app, but if they all
keep the Desktop app's default originator then downstream attribution
cannot distinguish local Work launches from cloud-backed Work launches.
`thread/start.serviceName` already carries that launch signal, while
`SessionMeta.originator` is the durable thread-level value that survives
resume and fork.

This change converts the Desktop Work service names into an effective
originator at thread creation time, persists that originator with the
thread, and keeps using it for later model requests and memory writes.

## What changed

- Map `CODEX_WORK_LOCAL` and `CODEX_WORK_CLOUD` service names to
per-thread originators, while preserving
`CODEX_INTERNAL_ORIGINATOR_OVERRIDE` as the highest-precedence override.
- Persist the effective originator in `SessionMeta.originator`, read it
back on resume/fork, and inherit the parent originator for subagent
spawns when there is no persisted session metadata.
- Handle truncated `SpawnAgentForkMode::LastNTurns` forks by falling
back to the live parent originator when the forked history no longer
includes `SessionMeta`.
- Thread the per-thread originator through Responses headers,
websocket/compaction request paths, thread-store creation, rollout
metadata, and memory stage-one telemetry.
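
The precedence rules above reduce to two small decisions, sketched here. The function names are hypothetical, and the two originator strings are placeholders because the description does not state the real mapped values.

```rust
/// Placeholder originator strings; the real mapped values are not given above.
const WORK_LOCAL_ORIGINATOR: &str = "example_work_local";
const WORK_CLOUD_ORIGINATOR: &str = "example_work_cloud";

/// Effective originator at thread creation time.
fn originator_for_new_thread(
    internal_override: Option<&str>,
    service_name: Option<&str>,
    default_originator: &str,
) -> String {
    // `CODEX_INTERNAL_ORIGINATOR_OVERRIDE` has the highest precedence.
    if let Some(value) = internal_override {
        return value.to_string();
    }
    match service_name {
        Some("CODEX_WORK_LOCAL") => WORK_LOCAL_ORIGINATOR.to_string(),
        Some("CODEX_WORK_CLOUD") => WORK_CLOUD_ORIGINATOR.to_string(),
        _ => default_originator.to_string(),
    }
}

/// Originator on resume, fork, or subagent spawn: prefer the value persisted
/// in `SessionMeta`; fall back to the live parent when the (possibly
/// truncated) history no longer contains it.
fn originator_for_fork(persisted: Option<&str>, live_parent: &str) -> String {
    persisted.unwrap_or(live_parent).to_string()
}
```

Resolving the value once at creation and persisting it means later requests never need to re-read `serviceName`.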

## Verification

- `just test -p codex-core
agent::control::tests::spawn_thread_subagent_inherits_parent_originator_without_fork
agent::control::tests::spawn_thread_subagent_fork_last_n_turns_inherits_parent_originator_without_session_meta
thread_manager::tests::originator_override_precedes_service_name_remapping`
- `just test -p codex-core
agent::control::tests::resume_thread_subagent_restores_stored_metadata_and_effective_multi_agent_mode`
- `just test -p codex-memories-write`
- `just fix -p codex-core -p codex-memories-write`
- `git diff --check`

alexsong-oai · 2026-06-23 17:23:38 -07:00

1acb722e8a

core: reset context for token budget compaction (#29743 )

## Why

When `Feature::TokenBudget` is enabled, compaction should behave like
`new_context`: start a fresh context window with the standard injected
context, without asking the server to summarize old history and without
carrying prior user or assistant messages into the next model request.

This is still a compaction operation from the client lifecycle
perspective. Manual `/compact` and auto-compaction should keep the same
observable side effects that clients and hooks expect, including compact
hooks and `TurnItem::ContextCompaction`.

## What changed

- Added `compact_token_budget` to run token-budget manual and inline
auto-compaction through a shared compaction lifecycle.
- Split pending `new_context` requests from forced context-window
startup: `take_new_context_window_request()` consumes pending requests,
and `start_new_context_window()` installs a fresh context window.
- Routed token-budget manual `/compact` and inline auto-compaction to
install a fresh context window locally instead of calling server/local
summarization.
- Preserved compact lifecycle side effects for token-budget compaction
by running pre/post compact hooks and emitting `ContextCompaction` item
start/completion events.
- Updated token-budget tests to assert fresh window IDs, absence of
server-side compaction calls, dropped prior transcript messages/tool
output after reset, and compact hook/item lifecycle behavior.

## Testing

- `just test -p codex-core
token_budget_context_uses_new_window_after_compaction`
- `just test -p codex-core token_budget_compaction_runs_compact_hooks`
- `just test -p codex-core
token_budget_mid_turn_auto_compaction_resets_before_active_follow_up`

---------

Co-authored-by: pakrym-oai <pakrym@openai.com>

Michael Bolin · 2026-06-23 16:59:04 -07:00

32b65bbf7a

[codex] rename rollout budget error to session budget error (#29744 )

## Summary

- rename the rollout-budget exhaustion error from
`RolloutBudgetExceeded` to `SessionBudgetExceeded`
- expose the matching app-server v2 wire value as
`sessionBudgetExceeded`
- regenerate JSON/TypeScript schema fixtures and update the app-server
docs and focused tests

This is a naming-only follow-up to #29715 based on [Pavel's review
suggestion](https://github.com/openai/codex/pull/29715#discussion_r3463183480).
Runtime behavior is unchanged.

## Tests

- `just test -p codex-core rollout_budget`
- `just test -p codex-app-server-protocol`
- `just fmt`
- `just write-app-server-schema`

rka-oai · 2026-06-23 16:49:13 -07:00

1ec3def0b5

fix: scope context remaining to body window (#29665 )

## Why

With `model_auto_compact_token_limit_scope = "body_after_prefix"`, the
persistent prefix should not count against the active body window.
`get_context_remaining` and the token-budget reminder should report the
same usable body-after-prefix window that auto-compaction uses, rather
than the total token count since the session began.

This is stacked on #29664 so the mechanical move from `turn.rs` is
isolated from the behavior fix.

## What

- Extends `ContextWindowTokenStatus` with `context_remaining_tokens`.
- Updates `get_context_remaining` to use the shared context-window
accounting.
- Adds integration coverage for body-after-prefix reminder timing and
`get_context_remaining` output.
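
The accounting difference can be shown with a short arithmetic sketch. `LimitScope`, its `Total` variant, and the function signature are illustrative assumptions; only the `body_after_prefix` idea (the persistent prefix does not count against the body window) comes from the description.

```rust
/// Illustrative scope enum; `Total` is a hypothetical name for the
/// non-scoped behavior.
#[derive(Debug, Clone, Copy, PartialEq)]
enum LimitScope {
    Total,
    BodyAfterPrefix,
}

/// Tokens still usable before the limit is reached.
fn context_remaining_tokens(
    scope: LimitScope,
    limit: u64,
    total_tokens: u64,
    prefix_tokens: u64,
) -> u64 {
    let used = match scope {
        LimitScope::Total => total_tokens,
        // The persistent prefix is excluded from the body window.
        LimitScope::BodyAfterPrefix => total_tokens.saturating_sub(prefix_tokens),
    };
    // Saturate at zero instead of underflowing when the window is exceeded.
    limit.saturating_sub(used)
}
```

With a 100,000-token limit, 70,000 total tokens, and a 30,000-token prefix, the body-after-prefix scope reports 60,000 remaining, where counting the prefix would report 30,000.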

## Testing

- `just test -p codex-core body_after_prefix_window`
- `just test -p codex-core auto_compact_body_after_prefix`
- `just fix -p codex-core`

Michael Bolin · 2026-06-23 23:08:54 +00:00

77e7ce1374

[codex] surface rollout budget exhaustion (#29715)

## Summary
- surface shared rollout-budget exhaustion as
`CodexErr::RolloutBudgetExceeded` instead of a generic interrupted turn
- map it through the existing `CodexErrorInfo` and app-server v2
`codexErrorInfo` path
- keep local compaction from retrying after the shared rollout budget is
exhausted

This gives app-server clients a stable `rolloutBudgetExceeded` error
they can classify without guessing from `status="interrupted"`.

## Tests
- `just test -p codex-core rollout_budget`

rka-oai · 2026-06-23 15:01:28 -07:00

bbbea91960

core: persist initial context window metadata (#29519)

## Why

PR #29494 made context-window IDs visible to the model by wrapping the
token-budget window payload in `<context_window>`, but rollout JSONL
consumers still could not see the initial window identity by tailing the
session file. Compacted rollout items carry window IDs only after
compaction has happened, so a session with no compaction had no durable
JSONL record for window 0.

This change gives tailing consumers a stable initial-window record at
session creation time.

## What Changed

- Added `session_meta.context_window.window_id` for the initial
context-window identity.
- `CreateThreadParams` now requires `initial_window_id: String`, so
thread-store callers cannot accidentally create new threads without
window-0 metadata.
- Live thread creation derives the persisted initial window ID from the
same `AutoCompactWindowIds` used to initialize `SessionState`, keeping
runtime state and JSONL metadata aligned.
- Rollout reconstruction uses `session_meta.context_window.window_id` as
the initial-window fallback and derives `window_number = 0`,
`first_window_id = window_id`, and `previous_window_id = None`
internally.
- Fork reconstruction intentionally uses the same rollout reconstruction
path; consumers that need to distinguish copied initial-window metadata
can use the rollout `thread_id`.
- Legacy compactions without `window_number` still use compaction-count
fallback accounting instead of being reset to window 0 by the
initial-window fallback.
- Compacted rollout metadata still takes precedence once compaction
records exist, preserving the richer chain fields there.

## JSONL Shape

Real rollout JSONL is one object per line. This example is expanded for
readability, but shows the new initial `session_meta.context_window`
record followed by the existing compacted rollout item shape that also
carries window IDs:

```jsonl
{
  "timestamp": "2026-06-22T12:00:00.000Z",
  "type": "session_meta",
  "payload": {
    "session_id": "<THREAD_ID>",
    "id": "<THREAD_ID>",
    "timestamp": "2026-06-22T12:00:00.000Z",
    "cwd": "/repo",
    "originator": "codex",
    "cli_version": "0.0.0",
    "source": "cli",
    "model_provider": "<MODEL_PROVIDER>",
    "context_window": {
      "window_id": "<INITIAL_WINDOW_ID>"
    }
  }
}
...
{
  "timestamp": "2026-06-22T12:34:56.000Z",
  "type": "compacted",
  "payload": {
    "message": "<COMPACTION_SUMMARY>",
    "replacement_history": [
      "..."
    ],
    "window_number": 1,
    "first_window_id": "<INITIAL_WINDOW_ID>",
    "previous_window_id": "<INITIAL_WINDOW_ID>",
    "window_id": "<NEXT_WINDOW_ID>"
  }
}
```

The nested `context_window` object is intentional: it gives rollout
consumers a stable namespace for context-window metadata while only
writing the non-derivable initial `window_id`. For the initial window,
`window_number`, `first_window_id`, and `previous_window_id` are derived
internally instead of being written to the rollout.
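
A consumer tailing the rollout could derive the same fields along these lines. This is a sketch under the rules stated above, not the actual reconstruction code: `WindowChain`, `initial_window_chain`, and `resolve_chain` are hypothetical names, and the legacy compaction-count fallback is deliberately left out.

```rust
/// Hypothetical view of the context-window chain fields.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct WindowChain {
    pub window_number: u64,
    pub first_window_id: String,
    pub previous_window_id: Option<String>,
    pub window_id: String,
}

/// Derive the chain for window 0 from `session_meta.context_window.window_id`.
pub fn initial_window_chain(window_id: &str) -> WindowChain {
    WindowChain {
        window_number: 0,
        first_window_id: window_id.to_string(),
        previous_window_id: None,
        window_id: window_id.to_string(),
    }
}

/// Compacted rollout metadata takes precedence once it exists; otherwise
/// fall back to the derived initial window.
pub fn resolve_chain(
    initial_window_id: &str,
    latest_compacted: Option<WindowChain>,
) -> WindowChain {
    latest_compacted.unwrap_or_else(|| initial_window_chain(initial_window_id))
}
```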

## Verification

- `just test -p codex-protocol`
- `just test -p codex-rollout
recorder_materializes_on_flush_with_pending_items`
- `just test -p codex-core reconstruct_history`
- `just test -p codex-core
record_initial_history_reconstructs_forked_transcript`
- `just test -p codex-thread-store`
- `just test -p codex-state`
- `just test -p codex-app-server
thread_read_returns_summary_without_turns`
- `just test -p codex-rollout persistence_metrics`

Michael Bolin · 2026-06-23 21:50:50 +00:00

01f89c8c59

core tests: rename automatic environment builder (#29728)

## Why

Use a clearer name for what happens when this helper sets up a test
environment.

## What

- Rename the builder and its harness wrapper to use `auto_env` instead
of `remote_env` because the helper will set up a local environment if
configured by the build system.

Adam Perry @ OpenAI · 2026-06-23 21:45:06 +00:00

5283522939

test: branch on target OS instead of runner flavor (#29712)

## Why

Core tests should branch on the executor's operating system, not on
runner details such as Docker or Wine. This keeps platform behavior
stable as new test backends are added and reserves Wine-specific skips
for actual runner debt.

## What

- Add `TestTargetOs` and target/host-aware skip helpers while keeping
`TestEnvironment` internal.
- Replace topology enum access with remote predicates and a narrow
Docker accessor.
- Migrate OS-semantic Wine skips, preserve runner-specific gaps, and
document the skip taxonomy.
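
The idea of branching on the target OS can be sketched as below. Only the name `TestTargetOs` comes from this change; its variants, the `Runner` enum, and both helper functions are assumptions made for illustration, including the mapping of a Docker runner to a Linux target.

```rust
/// Operating system the code under test executes on (variants assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TestTargetOs {
    Linux,
    MacOs,
    Windows,
}

/// Hypothetical runner flavors; the real harness keeps these details internal.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Runner {
    Local,
    Docker,
    Wine,
}

/// Resolve the target OS once, so tests never inspect the runner directly.
pub fn target_os(runner: Runner, host: TestTargetOs) -> TestTargetOs {
    match runner {
        // Wine runs Windows executables regardless of the host.
        Runner::Wine => TestTargetOs::Windows,
        // Assumption for this sketch: Docker executors are Linux containers.
        Runner::Docker => TestTargetOs::Linux,
        Runner::Local => host,
    }
}

/// An OS-semantic skip: phrased in terms of the target, not the runner.
pub fn should_skip_unix_only_test(target: TestTargetOs) -> bool {
    target == TestTargetOs::Windows
}
```

With this shape, a test that depends on Unix semantics skips on any Windows target, whether that target is reached through Wine or a future native Windows backend.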

## Validation

- `just test -p core_test_support`
- `just test -p codex-core
remote_test_env_can_connect_and_use_filesystem`
- `bazel test //codex-rs/core:core-all-wine-exec-test
--test_output=errors` reached test execution; unrelated existing
view-image, path, and timing failures remain.
- `just test -p codex-core` and `just test` reached broad test
execution; this checkout has unrelated helper, sandbox, and timing
failures.

Adam Perry @ OpenAI · 2026-06-23 14:27:13 -07:00

9a79536e6b

core: use current step environments for tools (#29547)

## Why

With deferred executors, an environment can become ready between two
sampling requests in the same turn. The model-visible environment
update, advertised tools, and eventual tool execution must all describe
the same request-time view.

Otherwise, a request built while only environment B is ready can
advertise a tool without an `environment_id`; if higher-priority
environment A becomes ready before execution, that call could silently
run in A instead.

This PR is stacked on #29527.

## Design

`run_turn` captures one `Arc<StepContext>` at each sampling-request
boundary. That step owns the request's `TurnContext` and environment
snapshot.

- World-state environment updates and tool planning borrow that same
step.
- `ToolCallRuntime` retains the `Arc` while asynchronous tool calls
execute.
- `ToolInvocation` carries the step to handlers; its temporary `turn`
compatibility field is derived from the same object.
- `ToolRouter` does not retain `StepContext`; it only uses it while
constructing the request's tool set.
- With `DeferredExecutor` disabled, step capture keeps using the
environments frozen at turn start.

Simply: every sampling request gets one consistent picture of its
environments, from what the model sees through where its tool calls run.
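
The snapshot idea can be illustrated with a small sketch. Only the name `StepContext` and the use of `Arc` come from the design above; `Environment`, `LiveEnvironments`, the numeric `priority` field, and `default_environment` are hypothetical simplifications.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical environment record; a lower `priority` value wins.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Environment {
    pub id: String,
    pub priority: u32,
}

/// Immutable per-request snapshot, heavily simplified.
#[derive(Debug, PartialEq, Eq)]
pub struct StepContext {
    pub ready: Vec<Environment>,
}

/// Live, mutable readiness state shared across the turn (assumed shape).
#[derive(Default)]
pub struct LiveEnvironments {
    ready: RwLock<Vec<Environment>>,
}

impl LiveEnvironments {
    pub fn mark_ready(&self, id: &str, priority: u32) {
        let env = Environment { id: id.to_string(), priority };
        self.ready.write().unwrap().push(env);
    }

    /// Capture one snapshot at a sampling-request boundary.
    pub fn capture_step(&self) -> Arc<StepContext> {
        Arc::new(StepContext { ready: self.ready.read().unwrap().clone() })
    }
}

/// Where a tool call without an explicit `environment_id` runs: resolved
/// from the captured step, never from the live state.
pub fn default_environment(step: &StepContext) -> Option<&str> {
    step.ready.iter().min_by_key(|e| e.priority).map(|e| e.id.as_str())
}
```

If a step is captured while only B is ready and A becomes ready afterwards, `default_environment` on the retained step still resolves to B; only the next captured step sees A.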

## What changed

- Build environment-dependent tool specs from the current request's
`StepContext`.
- Use that same step for unified exec, legacy shell, `apply_patch`,
`view_image`, and `request_permissions` execution.
- Hide environment-backed tools, including `request_permissions`, while
no environment is attached.
- Resolve legacy shell paths and metadata from the selected step
environment instead of the stale turn-start environment.
- Capture explicit steps at non-turn-loop boundaries such as compaction,
prompt debug, and startup prewarm.
- Reconcile prompt-debug history from the same step used to build its
tools.

## Follow-up

- Bind yielded code-mode cells to the tool runtime that created them, so
nested calls made after yielding continue to use the originating
request's `StepContext`.

## Test plan

- `just test -p codex-core
deferred_executor_updates_context_and_tools_after_startup`
- `just test -p codex-core
environment_count_controls_environment_backed_tools`
- `just test -p codex-core
build_prompt_input_includes_context_and_user_message`

sayan-oai · 2026-06-23 20:21:13 +00:00

4cc6a4bab5

core: resolve view_image paths in selected environment (#29526)

## Why

`view_image` needs to support remote executors that run a foreign OS.

## What

- resolve image paths against the selected environment as `PathUri` and
read them through that environment's filesystem
- keep app-server's public path field wire-compatible as
`LegacyAppPathString`, with purpose-specific UI rendering
- cover relative and absolute target-native paths in the core
integration test and run the full `view_image` suite under wine-exec
without skips

Adam Perry @ OpenAI · 2026-06-23 19:52:37 +00:00

510bce9927

chore(core) rm AskForApproval::OnFailure (#28418)

## Summary
Deletes the `OnFailure` variant of the `AskForApproval` enum. This option
has been deprecated since #11631.

## Testing
- [x] Tests pass

Dylan Hurd · 2026-06-23 12:13:54 -07:00

2cf2a6a844

core: use turn-owned world state for inline compaction (#29527)

## Why

Follow-up to #29249 and its [compaction review
thread](https://github.com/openai/codex/pull/29249#discussion_r3455055101).

During a turn, environment readiness can change between sampling
requests. Inline compaction must render the same model-visible
`WorldState` used by the request it follows. Rebuilding that state
during compaction can observe a newer environment, make replacement
history disagree with what the model saw, and suppress the next
environment update.

## What changed

- Make `run_turn` own the current `Arc<WorldState>` and replace it only
between sampling requests.
- Build each state from an explicitly chosen environment snapshot, diff
deferred-executor steps against the turn-owned state, and retain the
latest state in `ContextManager` only for cross-turn and resume
tracking.
- Pass the exact turn-owned state into inline compaction and explicit
new-context-window replacement.
- Carry that state with
`InitialContextInjection::BeforeLastUserMessage`, so replacement context
and its stored baseline cannot come from different snapshots.
- Remove obsolete state-recapture helpers and ambiguous `TurnContext`-only
`WorldState` builders.
- Add an integration test that moves an environment from starting to
ready during a paused turn, triggers compaction, and verifies the next
request receives the readiness update exactly once.
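
The failure mode and the fix can be shown with a minimal diff sketch. It assumes environment updates are computed as the set of environments that are ready now but were not ready in the baseline the model last saw; `newly_ready` and the set-based state are hypothetical simplifications of `WorldState`.

```rust
use std::collections::BTreeSet;

/// Environments that became ready since the baseline the model last saw.
/// Hypothetical helper: the real diff covers more than readiness.
pub fn newly_ready(
    baseline: &BTreeSet<String>,
    current: &BTreeSet<String>,
) -> Vec<String> {
    current.difference(baseline).cloned().collect()
}
```

If compaction keeps the turn-owned baseline (environment still starting), the next request diffs against it and reports the environment as ready once. If compaction instead rebuilt the baseline from a newer snapshot, baseline and current state would already match, the diff would be empty, and the update would be suppressed.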

## Test plan

- `just test -p codex-core
deferred_executor_compaction_preserves_then_updates_environment_once`
- `just test -p codex-core process_compacted_history`
- `just test -p codex-core mid_turn_continuation_compaction`
- `just test -p codex-core build_initial_context`
- `just test -p codex-core
ignores_session_prefix_messages_when_truncating`

sayan-oai · 2026-06-23 10:33:19 -07:00

d1d11cac05
