codex

codex-rs: fix thread resume rejoin semantics (#11756 )

## Summary
- always rejoin an in-memory running thread on `thread/resume`, even
when overrides are present
- reject `thread/resume` when `history` is provided for a running thread
- reject `thread/resume` when `path` mismatches the running thread
rollout path
- warn (but do not fail) on override mismatches for running threads
- add more `thread_resume` integration tests and fixes; including
restart-based resume-with-overrides coverage

## Validation
- `just fmt`
- `cargo test -p codex-app-server --test all thread_resume`
- manual test with app-server-test-client
https://github.com/openai/codex/pull/11755
- manual test both stdio and websocket in app

Max Johnson · 2026-02-13 23:09:58 +00:00

fb0aaf94de

[app-server] add fuzzyFileSearch/sessionCompleted (#11773 )

this is to allow the client to know when to stop showing a spinner.

Jeremy Rose · 2026-02-13 15:08:14 -08:00

e4f8263798

[apps] Fix app loading logic. (#11518 )

When `app/list` is called with `force_refetch=True`, we should seed the
results with what is already cached instead of starting from an empty
list. Otherwise when we send app/list/updated events, the client will
first see an empty list of accessible apps and then get the updated one.

Matthew Zeng · 2026-02-13 03:55:10 +00:00

f93037f55d

Add cwd as an optional field to thread/list (#11651 )

Add's the ability to filter app-server thread/list by cwd

acrognale-oai · 2026-02-13 02:05:04 +00:00

ebe359b876

Persist complete TurnContextItem state via canonical conversion (#11656 )

## Summary

This PR delivers the first small, shippable step toward model-visible
state diffing by making
`TurnContextItem` more complete and standardizing how it is built.

Specifically, it:
- Adds persisted network context to `TurnContextItem`.
- Introduces a single canonical `TurnContext -> TurnContextItem`
conversion path.
- Routes existing rollout write sites through that canonical conversion
helper.

No context injection/diff behavior changes are included in this PR.

## Why this change

The design goal is to make `TurnContextItem` the canonical source of
truth for context-diff
decisions.
Before this PR:
- `TurnContextItem` did not include all TurnContext-derived environment
inputs needed for v1
completeness.
- Construction was duplicated at multiple write sites.

This PR addresses both with a minimal, reviewable change.

## Changes

### 1) Extend `TurnContextItem` with network state
- Added `TurnContextNetworkItem { allowed_domains, denied_domains }`.
- Added `network: Option<TurnContextNetworkItem>` to `TurnContextItem`.
- Kept backward compatibility by making the new field optional and
skipped when absent.

Files:
- `codex-rs/protocol/src/protocol.rs`

### 2) Canonical conversion helper
- Added `TurnContext::to_turn_context_item(collaboration_mode)` in core.
- Added internal helper to derive network fields from
`config_layer_stack.requirements().network`.

Files:
- `codex-rs/core/src/codex.rs`

### 3) Use canonical conversion at rollout write sites
- Replaced ad hoc `TurnContextItem { ... }` construction with
`to_turn_context_item(...)` in:
  - sampling request path
  - compaction path

Files:
- `codex-rs/core/src/codex.rs`
- `codex-rs/core/src/compact.rs`

### 4) Update fixtures/tests for new optional field
- Updated existing `TurnContextItem` literals in tests to include
`network: None`.
- Added protocol tests for:
  - deserializing old payloads with no `network`
  - serializing when `network` is present

Files:
- `codex-rs/core/tests/suite/resume_warning.rs`
- No replay/diff logic changes.
- Persisted rollout `TurnContextItem` now carries additional network
context when available.
- Older rollout lines without `network` remain readable.

Charley Cunningham · 2026-02-12 17:22:44 -08:00

f24669d444

[apps] Add is_enabled to app info. (#11417 )

- [x] Add is_enabled to app info and the response of `app/list`.
- [x] Update TUI to have Enable/Disable button on the app detail page.

Matthew Zeng · 2026-02-13 00:30:52 +00:00

c37560069a

app-server tests: disable shell_snapshot for review suite (#11657 )

## Why


`suite::v2::review::review_start_with_detached_delivery_returns_new_thread_id`
was failing on Windows CI due to an unrelated process crash during shell
snapshot initialization (`tokio-runtime-worker` stack overflow).

This review test suite validates review API behavior and should not
depend on shell snapshot behavior. Keeping shell snapshot enabled in
this fixture made the test flaky for reasons outside the scenario under
test.

## What Changed

- Updated the review suite test config in
`codex-rs/app-server/tests/suite/v2/review.rs` to set:
  - `shell_snapshot = false`

This keeps the review tests focused on review behavior by disabling
shell snapshot initialization in this fixture.

## Verification

- `cargo test -p codex-app-server`
- Confirmed the previously failing Windows CI job for this test now
passes on this PR.

Michael Bolin · 2026-02-12 23:56:43 +00:00

aef4af1079

fix: skip review_start_with_detached_delivery_returns_new_thread_id o… (#11645 )

…n windows

Owen Lin · 2026-02-12 15:12:57 -08:00

76256a8cec

app-server: add fuzzy search sessions for streaming file search (#10268 )

Jeremy Rose · 2026-02-12 10:49:44 -08:00

66e0c3aaa3

feat: make sandbox read access configurable with ReadOnlyAccess (#11387 )

`SandboxPolicy::ReadOnly` previously implied broad read access and could
not express a narrower read surface.
This change introduces an explicit read-access model so we can support
user-configurable read restrictions in follow-up work, while preserving
current behavior today.

It also ensures unsupported backends fail closed for restricted-read
policies instead of silently granting broader access than intended.

## What

- Added `ReadOnlyAccess` in protocol with:
  - `Restricted { include_platform_defaults, readable_roots }`
  - `FullAccess`
- Updated `SandboxPolicy` to carry read-access configuration:
  - `ReadOnly { access: ReadOnlyAccess }`
  - `WorkspaceWrite { ..., read_only_access: ReadOnlyAccess }`
- Preserved existing behavior by defaulting current construction paths
to `ReadOnlyAccess::FullAccess`.
- Threaded the new fields through sandbox policy consumers and call
sites across `core`, `tui`, `linux-sandbox`, `windows-sandbox`, and
related tests.
- Updated Seatbelt policy generation to honor restricted read roots by
emitting scoped read rules when full read access is not granted.
- Added fail-closed behavior on Linux and Windows backends when
restricted read access is requested but not yet implemented there
(`UnsupportedOperation`).
- Regenerated app-server protocol schema and TypeScript artifacts,
including `ReadOnlyAccess`.

## Compatibility / rollout

- Runtime behavior remains unchanged by default (`FullAccess`).
- API/schema changes are in place so future config wiring can enable
restricted read access without another policy-shape migration.

Michael Bolin · 2026-02-11 18:31:14 -08:00

abbd74e2be

test(app-server): stabilize app/list thread feature-flag test by using file-backed MCP OAuth creds (#11521 )

## Why

`suite::v2::app_list::list_apps_uses_thread_feature_flag_when_thread_id_is_provided`
has been flaky in CI. The test exercises `thread/start`, which
initializes `codex_apps`. In CI/Linux, that path can reach OS
keyring-backed MCP OAuth credential lookup (`Codex MCP Credentials`) and
intermittently abort the MCP process (observed stack overflow in
`zbus`), causing the test to fail before the assertion logic runs.

## What Changed

- Updated the test config in
`codex-rs/app-server/tests/suite/v2/app_list.rs` to set
`mcp_oauth_credentials_store = "file"` in both relevant config-writing
paths:
- The in-test config override inside
`list_apps_uses_thread_feature_flag_when_thread_id_is_provided`
- `write_connectors_config(...)`, which is used by the v2 `app_list`
test suite
- This keeps test coverage focused on thread-scoped app feature flags
while removing OS keyring/DBus dependency from this test path.

## How It Was Verified

- `cargo test -p codex-app-server`
- `cargo test -p codex-app-server
list_apps_uses_thread_feature_flag_when_thread_id_is_provided --
--nocapture`

Michael Bolin · 2026-02-11 18:30:18 -08:00

572ab66496

Reapply "Add app-server transport layer with websocket support" (#11370 )

Reapply "Add app-server transport layer with websocket support" with
additional fixes from https://github.com/openai/codex/pull/11313/changes
to avoid deadlocking.

This reverts commit 47356ff83c.

## Summary

To avoid deadlocking when queues are full, we maintain separate tokio
tasks dedicated to incoming vs outgoing event handling
- split the app-server main loop into two tasks in
`run_main_with_transport`
   - inbound handling (`transport_event_rx`)
   - outbound handling (`outgoing_rx` + `thread_created_rx`)
- separate incoming and outgoing websocket tasks

## Validation

Integration tests, testing thoroughly e2e in codex app w/ >10 concurrent
requests

<img width="1365" height="979" alt="Screenshot 2026-02-10 at 2 54 22 PM"
src="https://github.com/user-attachments/assets/47ca2c13-f322-4e5c-bedd-25859cbdc45f"
/>

---------

Co-authored-by: jif-oai <jif@openai.com>

Max Johnson · 2026-02-11 18:13:39 +00:00

7053aa5457

Remove test-support feature from codex-core and replace it with explicit test toggles (#11405 )

## Why

`codex-core` was being built in multiple feature-resolved permutations
because test-only behavior was modeled as crate features. For a large
crate, those permutations increase compile cost and reduce cache reuse.

## Net Change

- Removed the `test-support` crate feature and related feature wiring so
`codex-core` no longer needs separate feature shapes for test consumers.
- Standardized cross-crate test-only access behind
`codex_core::test_support`.
- External test code now imports helpers from
`codex_core::test_support`.
- Underlying implementation hooks are kept internal (`pub(crate)`)
instead of broadly public.

## Outcome

- Fewer `codex-core` build permutations.
- Better incremental cache reuse across test targets.
- No intended production behavior change.

Michael Bolin · 2026-02-10 22:44:02 -08:00

476c1a7160

feat: support multiple rate limits (#11260 )

Added multi-limit support end-to-end by carrying limit_name in
rate-limit snapshots and handling multiple buckets instead of only
codex.
Extended /usage client parsing to consume additional_rate_limits
Updated TUI /status and in-memory state to store/render per-limit
snapshots
Extended app-server rate-limit read response: kept rate_limits and added
rate_limits_by_name.
Adjusted usage-limit error messaging for non-default codex limit buckets

xl-openai · 2026-02-10 20:09:31 -08:00

fdd0cd1de9

chore: persist turn_id in rollout session and make turn_id uuid based (#11246 )

Problem:
1. turn id is constructed in-memory;
2. on resuming threads, turn_id might not be unique;
3. client cannot no the boundary of a turn from rollout files easily.

This PR does three things:
1. persist `task_started` and `task_complete` events;
1. persist `turn_id` in rollout turn events;
5. generate turn_id as unique uuids instead of incrementing it in
memory.

This helps us resolve the issue of clients wanting to have unique turn
ids for resuming a thread, and knowing the boundry of each turn in
rollout files.

example debug logs
```
2026-02-11T00:32:10.746876Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=8 turn=Turn { id: "019c4a07-d809-74c3-bc4b-fd9618487b4b", items: [UserMessage { id: "item-24", content: [Text { text: "hi", text_elements: [] }] }, AgentMessage { id: "item-25", text: "Hi. I’m in the workspace with your current changes loaded and ready. Send the next task and I’ll execute it end-to-end." }], status: Completed, error: None }
2026-02-11T00:32:10.746888Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=9 turn=Turn { id: "019c4a18-1004-76c0-a0fb-a77610f6a9b8", items: [UserMessage { id: "item-26", content: [Text { text: "hello", text_elements: [] }] }, AgentMessage { id: "item-27", text: "Hello. Ready for the next change in `codex-rs`; I can continue from the current in-progress diff or start a new task." }], status: Completed, error: None }
2026-02-11T00:32:10.746899Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=10 turn=Turn { id: "019c4a19-41f0-7db0-ad78-74f1503baeb8", items: [UserMessage { id: "item-28", content: [Text { text: "hello", text_elements: [] }] }, AgentMessage { id: "item-29", text: "Hello. Send the specific change you want in `codex-rs`, and I’ll implement it and run the required checks." }], status: Completed, error: None }
```

backward compatibility:
if you try to resume an old session without task_started and
task_complete event populated, the following happens:
- If you resume and do nothing: those reconstructed historical IDs can
differ next time you resume.
- If you resume and send a new turn: the new turn gets a fresh UUID from
live submission flow and is persisted, so that new turn’s ID is stable
on later resumes.
I think this behavior is fine, because we only care about deterministic
turn id once a turn is triggered.

Celia Chen · 2026-02-11 03:56:01 +00:00

641d5268fa

Prefer websocket transport when model opts in (#11386 )

Summary
- add a `prefer_websockets` field to `ModelInfo`, defaulting to `false`
in all fixtures and constructors
- wire the new flag into websocket selection so models that opt in
always use websocket transport even when the feature gate is off

Testing
- Not run (not requested)

pakrym-oai · 2026-02-10 18:50:48 -08:00

c68999ee6d

Disable very flaky tests (#11394 )

Collected from last 20 builds of main in
https://github.com/openai/codex/commits/main/.

pakrym-oai · 2026-02-10 18:50:11 -08:00

bfd4e2112c

fix: reduce usage of open_if_present (#11344 )

jif-oai · 2026-02-10 19:25:07 +00:00

847a6092e6

feat: opt-out of events in the app-server (#11319 )

Add `optOutNotificationMethods` in the app-server to opt-out events
based on exact method matching

jif-oai · 2026-02-10 18:04:52 +00:00

a364dd8b56

Revert "Add app-server transport layer with websocket support (#10693 )" (#11323 )

Suspected cause of deadlocking bug

Max Johnson · 2026-02-10 17:37:49 +00:00

47356ff83c

[apps] Add thread_id param to optionally load thread config for apps feature check. (#11279 )

- [x] Add thread_id param to optionally load thread config for apps
feature check

Matthew Zeng · 2026-02-09 23:10:26 -08:00

005e040f97

fix(app-server): for external auth, replace id_token with chatgpt_acc… (#11240 )

…ount_id and chatgpt_plan_type

### Summary
Following up on external auth mode which was introduced here:
https://github.com/openai/codex/pull/10012

Turns out some clients have a differently shaped ID token and don't have
a chosen workspace (aka chatgpt_account_id) encoded in their ID token.
So, let's replace `id_token` param with `chatgpt_account_id` and
`chatgpt_plan_type` (optional) when initializing the external ChatGPT
auth mode (`account/login/start` with `chatgptAuthTokens`).

The client was able to test end-to-end with a Codex build from this
branch and verified it worked!

Owen Lin · 2026-02-09 20:48:58 -08:00

53741013ab

Adjust shell command timeouts for Windows (#11247 )

Summary
- add platform-aware defaults for shell command timeouts so Windows
tests get longer waits
- keep medium timeout longer on Windows to ensure flakiness is reduced

Testing
- Not run (not requested)

Dylan Hurd · 2026-02-09 20:03:32 -08:00

168c359b71

test: deflake nextest child-process leak in MCP harnesses (#11263 )

## Summary
- add deterministic child-process cleanup to both test `McpProcess`
helpers
- keep Tokio `kill_on_drop(true)` but also reap via bounded `try_wait()`
polling in `Drop`
- document the failure mode and why this avoids nondeterministic `LEAK`
flakes

## Why
`cargo nextest` leak detection can intermittently report `LEAK` when a
spawned server outlives test teardown, making CI flaky.

## Testing
- `just fmt`
- `cargo test -p codex-app-server`
- `cargo test -p codex-mcp-server`


## Failing CI Reference
- Original failing job:
https://github.com/openai/codex/actions/runs/21845226299/job/63039443593?pr=11245

Josh McKinney · 2026-02-10 03:43:24 +00:00

de59e550c0

feat: extend skills/list to support additional roots. (#10835 )

Add an optional perCwdExtraUserRoots

xl-openai · 2026-02-09 13:30:38 -08:00

a33ee46e3b

Load requirements on windows (#10770 )

We support requirements on Unix, loading from
`/etc/codex/requirements.toml`. On MacOS, we also support MDM.

Now, on Windows, we'll load requirements from
`%ProgramData%\OpenAI\Codex\requirements.toml`

gt-oai · 2026-02-09 16:05:38 +00:00

9fe925b15a

[apps] Improve app loading. (#10994 )

There are two concepts of apps that we load in the harness:

- Directory apps, which is all the apps that the user can install.
- Accessible apps, which is what the user actually installed and can be
$ inserted and be used by the model. These are extracted from the tools
that are loaded through the gateway MCP.

Previously we wait for both sets of apps before returning the full apps
list. Which causes many issues because accessible apps won't be
available to the UI or the model if directory apps aren't loaded or
failed to load.

In this PR we are separating them so that accessible apps can be loaded
separately and are instantly available to be shown in the UI and to be
provided in model context. We also added an app-server event so that
clients can subscribe to also get accessible apps without being blocked
on the full app list.

- [x] Separate accessible apps and directory apps loading.
- [x] `app/list` request will also emit `app/list/updated` notifications
that app-server clients can subscribe. Which allows clients to get
accessible apps list to render in the $ menu without being blocked by
directory apps.
- [x] Cache both accessible and directory apps with 1 hour TTL to avoid
reloading them when creating new threads.
- [x] TUI improvements to redraw $ menu and /apps menu when app list is
updated.

Matthew Zeng · 2026-02-08 15:24:56 -08:00

45b7763c3f

Upgrade rmcp to 0.14 (#10718 )

- [x] Upgrade rmcp to 0.14

Matthew Zeng · 2026-02-08 15:07:53 -08:00

9f1009540b

Defer persistence of rollout file (#11028 )

- Defer rollout persistence for fresh threads (`InitialHistory::New`):
keep rollout events in memory and only materialize rollout file + state
DB row on first `EventMsg::UserMessage`.
- Keep precomputed rollout path available before materialization.
- Change `thread/start` to build thread response from live config
snapshot and optional precomputed path.
- Improve pre-materialization behavior in app-server/TUI: clearer
invalid-request errors for file-backed ops and a friendlier `/fork` “not
ready yet” UX.
- Update tests to match deferred semantics across
start/read/archive/unarchive/fork/resume/review flows.
- Improved resilience of user_shell test, which should be unrelated to
this change but must be affected by timing changes

For Reviewers:
* The primary change is in recorder.rs
* Most of the other changes were to fix up broken assumptions in
existing tests

Testing:
* Manually tested CLI
* Exercised app server paths by manually running IDE Extension with
rebuilt CLI binary
* Only user-visible change is that `/fork` in TUI generates visible
error if used prior to first turn

Eric Traut · 2026-02-07 23:05:03 -08:00

b3de6c7f2b

app-server: treat null mode developer instructions as built-in defaults (#10983 )

## Summary
- make `turn/start` normalize
`collaborationMode.settings.developer_instructions: null` to the
built-in instructions for the selected mode
- prevent app-server clients from accidentally clearing mode-switch
developer instructions by sending `null`
- document this behavior in the v2 protocol and app-server docs

## What changed
- `codex-rs/app-server/src/codex_message_processor.rs`
  - added a small `normalize_turn_start_collaboration_mode` helper
  - in `turn_start`, apply normalization before `OverrideTurnContext`
- `codex-rs/app-server/tests/suite/v2/turn_start.rs`
- extended `turn_start_accepts_collaboration_mode_override_v2` to assert
the outgoing request includes default-mode instruction text when the
client sends `developer_instructions: null`
- `codex-rs/app-server-protocol/src/protocol/v2.rs`
- clarified `TurnStartParams.collaboration_mode` docs:
`settings.developer_instructions: null` means use built-in mode
instructions
- regenerated schema fixture:
- `codex-rs/app-server-protocol/schema/typescript/v2/TurnStartParams.ts`
- docs:
  - `codex-rs/app-server/README.md`
  - `codex-rs/docs/codex_mcp_interface.md`

Charley Cunningham · 2026-02-07 12:59:41 -08:00

e6662d6387

Fixed a flaky test (#10970 )

## Summary

Stabilize v2 review integration tests by making them hermetic with
respect to model discovery.

`app-server` review tests were intermittently timing out in CI
(especially on Windows runners) because their test config allowed remote
model refresh. During `thread/start`, the test process could issue live
`/v1/models` requests, introducing external network latency and
nondeterministic timing before review flow assertions.

This change disables remote model fetching in the review test config
helper used by these tests.

Eric Traut · 2026-02-06 21:26:26 -08:00

4d52428fa2

Treat compaction failure as failure state (#10927 )

- Return compaction errors from local and remote compaction flows.\n-
Stop turns/tasks when auto-compaction fails instead of continuing
execution.

Ahmed Ibrahim · 2026-02-06 13:51:46 -08:00

ba8b5d9018

Add app configs to config.toml (#10822 )

Adds app configs to config.toml + tests

canvrno-oai · 2026-02-06 10:29:08 -08:00

36c16e0c58

Handle required MCP startup failures across components (#10902 )

Summary
- add a `required` flag for MCP servers everywhere config/CLI data is
touched so mandatory helpers can be round-tripped
- have `codex exec` and `codex app-server` thread start/resume fail fast
when required MCPs fail to initialize

jif-oai · 2026-02-06 17:14:37 +01:00

aab61934af

Removed the "remote_compaction" feature flag (#10840 )

This feature is always on now

Eric Traut · 2026-02-05 23:54:57 -08:00

dd80e332c4

feat(app-server): turn/steer API (#10821 )

This PR adds a dedicated `turn/steer` API for appending user input to an
in-flight turn.

## Motivation
Currently, steering in the app is implemented by just calling
`turn/start` while a turn is running. This has some really weird quirks:
- Client gets back a new `turn.id`, even though streamed
events/approvals remained tied to the original active turn ID.
- All the various turn-level override params on `turn/start` do not
apply to the "steer", and would only apply to the next real turn.
- There can also be a race condition where the client thinks the turn is
active but the server has already completed it, so there might be bugs
if the client has baked in some client-specific behavior thinking it's a
steer when in fact the server kicked off a new turn. This is
particularly possible when running a client against a remote app-server.

Having a dedicated `turn/steer` API eliminates all those quirks.

`turn/steer` behavior:
- Requires an active turn on threadId. Returns a JSON-RPC error if there
is no active turn.
- If expectedTurnId is provided, it must match the active turn (more
useful when connecting to a remote app-server).
- Does not emit `turn/started`.
- Does not accept turn overrides (`cwd`, `model`, `sandbox`, etc.) or
`outputSchema` to accurately reflect that these are not applied when
steering.

Owen Lin · 2026-02-06 00:35:04 +00:00

0d8b2b74c4

Add stage field for experimental flags. (#10793 )

- [x] Add stage field for experimental flags.

Matthew Zeng · 2026-02-05 23:31:04 +00:00

729b016515

Add app-server transport layer with websocket support (#10693 )

- Adds --listen <URL> to codex app-server with two listen modes:
      - stdio:// (default, existing behavior)
      - ws://IP:PORT (new websocket transport)
  - Refactors message routing to be connection-aware:
- Tracks per-connection session state (initialize/experimental
capability)
      - Routes responses/errors to the originating connection
- Broadcasts server notifications/requests to initialized connections
- Updates initialization semantics to be per connection (not
process-global), and updates app-server docs accordingly.
- Adds websocket accept/read/write handling (JSON-RPC per text frame,
ping/pong handling, connection lifecycle events).

Testing

- Unit tests for transport URL parsing and targeted response/error
routing.
  - New websocket integration test validating:
      - per-connection initialization requirements
      - no cross-connection response leakage
      - same request IDs on different connections route independently.

Max Johnson · 2026-02-05 20:56:34 +00:00

8473096efb

[app-server] Add a method to list experimental features. (#10721 )

- [x] Add a method to list experimental features.

Matthew Zeng · 2026-02-05 20:04:01 +00:00

7e81f63698

fix: flaky landlock (#10689 )

https://openai.slack.com/archives/C095U48JNL9/p1770243347893959

jif-oai · 2026-02-05 10:30:18 +00:00

c67120f4a0

fix(core,app-server) resume with different model (#10719 )

## Summary
When resuming with a different model, we should also append a developer
message with the model instructions

## Testing
- [x] Added unit tests

Dylan Hurd · 2026-02-05 00:40:05 -08:00

fe8b474acd

chore(config) Default Personality Pragmatic (#10705 )

## Summary
Switch back to Pragmatic personality

## Testing
- [x] Updated unit tests

Dylan Hurd · 2026-02-04 21:22:47 -08:00

a05aadfa1b

chore(core) personality migration tests (#10650 )

## Summary
Adds additional tests for personality edge cases

## Testing
- [x] These are tests

Dylan Hurd · 2026-02-04 19:03:14 -08:00

73f32840c6

feat(app-server, core): allow text + image content items for dynamic tool outputs (#10567 )

Took over the work that @aaronl-openai started here:
https://github.com/openai/codex/pull/10397

Now that app-server clients are able to set up custom tools (called
`dynamic_tools` in app-server), we should expose a way for clients to
pass in not just text, but also image outputs. This is something the
Responses API already supports for function call outputs, where you can
pass in either a string or an array of content outputs (text, image,
file):
https://platform.openai.com/docs/api-reference/responses/create#responses_create-input-input_item_list-item-function_tool_call_output-output-array-input_image

So let's just plumb it through in Codex (with the caveat that we only
support text and image for now). This is implemented end-to-end across
app-server v2 protocol types and core tool handling.

## Breaking API change
NOTE: This introduces a breaking change with dynamic tools, but I think
it's ok since this concept was only recently introduced
(https://github.com/openai/codex/pull/9539) and it's better to get the
API contract correct. I don't think there are any real consumers of this
yet (not even the Codex App).

Old shape:
`{ "output": "dynamic-ok", "success": true }`

New shape:
```
{
    "contentItems": [
      { "type": "inputText", "text": "dynamic-ok" },
      { "type": "inputImage", "imageUrl": "data:image/png;base64,AAA" }
    ]
  "success": true
}
```

Owen Lin · 2026-02-04 16:12:47 -08:00

5ea107a088

Fix test_shell_command_interruption flake (#10649 )

## Human summary
Sandboxing (specifically `LandlockRestrict`) is means that e.g. `sleep
10` fails immediately. Therefore it cannot be interrupted.

In suite::interrupt::test_shell_command_interruption, sleep 10 is issued
at 17:28:16.554 (ToolCall: shell_command {"command":"sleep 10"...}),
then fails at 17:28:16.589 with duration_ms=34, success=false,
exit_code=101, and
    Sandbox(LandlockRestrict).

## Codex summary
- set `sandbox_mode = "danger-full-access"` in `interrupt` and
`v2/turn_interrupt` integration tests
- set `sandbox: Some(SandboxMode::DangerFullAccess)` in
`test_codex_jsonrpc_conversation_flow`
- set `sandbox_policy: Some(SandboxPolicy::DangerFullAccess)` in
`command_execution_notifications_include_process_id`

## Why
On some Linux CI environments, command execution fails immediately with
`LandlockRestrict` when sandboxed. These tests are intended to validate
JSON-RPC/task lifecycle behavior (interrupt semantics, command
notification shape/process id, request flow), but early sandbox startup
failure changes turn flow and can trigger extra follow-up requests,
causing flakes.

This change removes environment-specific sandbox startup dependency from
these tests while preserving their primary intent.

## Testing
- not run in this environment (per request)

gt-oai · 2026-02-04 22:19:06 +00:00

7c6d21a414

Add thread/compact v2 (#10445 )

- add `thread/compact` as a trigger-only v2 RPC that submits
`Op::Compact` and returns `{}` immediately.
- add v2 compaction e2e coverage for success and invalid/unknown thread
ids, and update protocol schemas/docs.

Ahmed Ibrahim · 2026-02-03 18:15:55 -08:00

38a47700b5

Feat: add upgrade to app server modelList (#10556 )

### Summary
* Add model upgrade to listModel app server endpoint to support
dynamically show model upgrade banner.

Shijie Rao · 2026-02-03 14:53:36 -08:00

750ebe154d

fix(app-server): fix approval events in review mode (#10416 )

One of our partners flagged that they were seeing the wrong order of
events when running `review/start` with command exec approvals:
```
{"method":"item/commandExecution/requestApproval","id":0,"params":{"threadId":"019c0b6b-6a42-7c02-99c4-98c80e88ac27","turnId":"0","itemId":"0","reason":"`/bin/zsh -lc 'git show b7a92b4eacf262c575f26b1e1ed621a357642e55 --stat'` requires approval: Xcode-required approval: Require explicit user confirmation for all commands.","proposedExecpolicyAmendment":null}}

{"method":"item/started","params":{"item":{"type":"commandExecution","id":"call_AEjlbHqLYNM7kbU3N6uw1CNi","command":"/bin/zsh -lc 'git show b7a92b4eacf262c575f26b1e1ed621a357642e55 --stat'","cwd":"/Users/devingreen/Desktop/SampleProject","processId":null,"status":"inProgress","commandActions":[{"type":"unknown","command":"git show b7a92b4eacf262c575f26b1e1ed621a357642e55 --stat"}],"aggregatedOutput":null,"exitCode":null,"durationMs":null},"threadId":"019c0b6b-6a42-7c02-99c4-98c80e88ac27","turnId":"0"}}
```

**Key fix**: In the review sub‑agent delegate we were forwarding exec
(and patch) approvals using the parent turn id (`parent_ctx.sub_id`) as
the approval call_id. That made
`item/commandExecution/requestApproval.itemId` differ from the actual
`item/started` id. We now forward the sub‑agent’s `call_id` from the
approval event instead, so the approval item id matches the
commandExecution item id in review flows.

Here’s the expected event order for an inline `review/start` that
triggers an exec approval after this fix:
1. Response to review/start (JSON‑RPC response)
- Includes `turn` (status inProgress) and `review_thread_id` (same as
parent thread for inline).
2. `turn/started` notification
  - turnId is the review turn id (e.g., "0").
3. `item/started` → EnteredReviewMode
  - item.id == turnId, marks entry into review mode.
4. `item/started` → commandExecution
  - item.id == <call_id> (e.g., "review-call-1"), status: inProgress.
5. `item/commandExecution/requestApproval` request
  - JSON‑RPC request (not a notification).
  - params.itemId == <call_id> and params.turnId == turnId.
6. Client replies to approval request (Approved / Declined / etc).
7. If approved:
  - Optional `item/commandExecution/outputDelta` notifications.
  - `item/completed` → commandExecution with status and exitCode.
8. Review finishes:
  - `item/started` → ExitedReviewMode
  - `item/completed` → ExitedReviewMode
  - (Agent message items may also appear, depending on review output.)
9. `turn/completed` notification

The key being #4 and #5 are now in the proper order with the correct
item id.

Owen Lin · 2026-02-03 12:08:17 -08:00

d9ad5c3c49

Cleanup collaboration mode variants (#10404 )

## Summary

This PR simplifies collaboration modes to the visible set `default |
plan`, while preserving backward compatibility for older partners that
may still send legacy mode
names.

Specifically:
- Renames the old Code behavior to **Default**.
- Keeps **Plan** as-is.
- Removes **Custom** mode behavior (fallbacks now resolve to Default).
- Keeps `PairProgramming` and `Execute` internally for compatibility
plumbing, while removing them from schema/API and UI visibility.
- Adds legacy input aliasing so older clients can still send old mode
names.

## What Changed

1. Mode enum and compatibility
- `ModeKind` now uses `Plan` + `Default` as active/public modes.
- `ModeKind::Default` deserialization accepts legacy values:
  - `code`
  - `pair_programming`
  - `execute`
  - `custom`
- `PairProgramming` and `Execute` variants remain in code but are hidden
from protocol/schema generation.
- `Custom` variant is removed; previous custom fallbacks now map to
`Default`.

2. Collaboration presets and templates
- Built-in presets now return only:
  - `Plan`
  - `Default`
- Template rename:
  - `core/templates/collaboration_mode/code.md` -> `default.md`
- `execute.md` and `pair_programming.md` remain on disk but are not
surfaced in visible preset lists.

3. TUI updates
- Updated user-facing naming and prompts from “Code” to “Default”.
- Updated mode-cycle and indicator behavior to reflect only visible
`Plan` and `Default`.
- Updated corresponding tests and snapshots.

4. request_user_input behavior
- `request_user_input` remains allowed only in `Plan` mode.
- Rejection messaging now consistently treats non-plan modes as
`Default`.

5. Schemas
- Regenerated config and app-server schemas.
- Public schema types now advertise mode values as:
  - `plan`
  - `default`

## Backward Compatibility Notes

- Incoming legacy mode names (`code`, `pair_programming`, `execute`,
`custom`) are accepted and coerced to `default`.
- Outgoing/public schema surfaces intentionally expose only `plan |
default`.
- This allows tolerant ingestion of older partner payloads while
standardizing new integrations on the reduced mode set.

## Codex author
`codex fork 019c1fae-693b-7840-b16e-9ad38ea0bd00`

Charley Cunningham · 2026-02-03 09:23:53 -08:00

d509df676b

[Codex][CLI] Gate image inputs by model modalities (#10271 )

###### Summary

- Add input_modalities to model metadata so clients can determine
supported input types.
- Gate image paste/attach in TUI when the selected model does not
support images.
- Block submits that include images for unsupported models and show a
clear warning.
- Propagate modality metadata through app-server protocol/model-list
responses.
  - Update related tests/fixtures.

  ###### Rationale

  - Models support different input modalities.
- Clients need an explicit capability signal to prevent unsupported
requests.
- Backward-compatible defaults preserve existing behavior when modality
metadata is absent.

  ###### Scope

  - codex-rs/protocol, codex-rs/core, codex-rs/tui
  - codex-rs/app-server-protocol, codex-rs/app-server
  - Generated app-server types / schema fixtures

  ###### Trade-offs

- Default behavior assumes text + image when field is absent for
compatibility.
  - Server-side validation remains the source of truth.

  ###### Follow-up

- Non-TUI clients should consume input_modalities to disable unsupported
attachments.
- Model catalogs should explicitly set input_modalities for text-only
models.

  ###### Testing

  - cargo fmt --all
  - cargo test -p codex-tui
  - env -u GITHUB_APP_KEY cargo test -p codex-core --lib
  - just write-app-server-schema
- cargo run -p codex-cli --bin codex -- app-server generate-ts --out
app-server-types
  - test against local backend
  
<img width="695" height="199" alt="image"
src="https://github.com/user-attachments/assets/d22dd04f-5eba-4db9-a7c5-a2506f60ec44"
/>

---------

Co-authored-by: Josh McKinney <joshka@openai.com>

Colin Young · 2026-02-02 18:56:39 -08:00

7e07ec8f73

213 Commits