mirror of
https://github.com/pchuan98/codex.git
synced 2026-07-01 00:31:56 +08:00
main
164 Commits
-
feat(app-server): add history_mode to thread (#29927)
## Description

This PR adds a new `historyMode = "legacy" | "paginated"` to `Thread`. This will be stored in `SessionMeta` in the JSONL rollout file and as a new column in the SQLite thread_metadata table, and exposed on `thread/start` and on the `Thread` object in app-server.

## What changed

- Added canonical `ThreadHistoryMode` with `legacy` and `paginated`, defaulting old and new SessionMeta to `legacy`.
- Carried `history_mode` through core session config, ThreadStore stored metadata, local/in-memory stores, rollout metadata extraction, and the existing SQLite `threads` table.
- Added experimental `historyMode` to app-server v2 `Thread` and `thread/start`.
- Made paginated stored threads metadata-discoverable but unsupported for legacy full-history reads, `load_history`, live resume, and create paths.
- Regenerated app-server schema fixtures and added protocol/state/thread-store/app-server coverage for persistence and fail-closed behavior.

## Compatibility floor

Because users may be running various versions of Codex binaries on the same machine (TUI, Codex App, etc.), we will need to establish a compatibility floor for upcoming paginated threads, which will change how thread storage reads and writes work. The overall plan here:

```
Release N:
- Add historyMode to SessionMeta / Thread / SQLite metadata.
- Teach binaries to understand paginated threads.
- If a binary sees `historyMode="paginated"` but does not support the paginated contract, it refuses to resume/mutate the thread.
- Default remains `"legacy"`.

Release N+1:
- First-party clients start opting into paginated threads where appropriate.
- Internal dogfood / staged rollout.
- Measure old-client usage and paginated-thread unsupported errors.

Release N+2:
- Only after Release N+ is overwhelmingly deployed, make paginated the default.
- Accept that a small tail of N-1-or-older binaries may not understand paginated threads.
```

The important behavior change is fail-closed handling for a binary that encounters a persisted `paginated` thread before it knows how to fully support paginated history. In app-server, if a thread is `paginated`, we will:

- allow metadata-only discovery paths like `thread/list` and `thread/read(includeTurns=false)`, so clients can still see the thread and inspect its `historyMode`
- reject legacy full-history/live-thread paths like `thread/read(includeTurns=true)` and `thread/resume` with an unsupported JSON-RPC error
- avoid silently treating an unknown or future `historyMode` as `legacy`

Under the hood, the ThreadStore layer also rejects legacy operations that would need to load or replay the full thread history for a paginated thread. That gives us the behavior we want for Release N: future paginated threads are visible, but this binary fails closed instead of trying to operate on them as if they were legacy threads.
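The fail-closed rule above can be sketched as a small guard. This is a minimal illustration, not the app-server implementation: the `ThreadMeta` shape and the `isAllowed` helper are hypothetical, and only the method names and the `legacy`/`paginated` values come from the description.

```typescript
// Hypothetical guard; method names mirror those in the PR description.
type Operation = "thread/list" | "thread/read" | "thread/resume";

interface ThreadMeta {
  id: string;
  // Kept as a plain string so an unknown future mode survives parsing.
  historyMode: string;
}

function isAllowed(thread: ThreadMeta, op: Operation, includeTurns = false): boolean {
  // Metadata-only discovery stays available for every history mode.
  const metadataOnly =
    op === "thread/list" || (op === "thread/read" && !includeTurns);
  if (metadataOnly) return true;
  // Full-history and live paths are allowed only for threads known to be
  // legacy. "paginated" and any unrecognized mode fail closed.
  return thread.historyMode === "legacy";
}
```

Comparing against `"legacy"` rather than checking for `"paginated"` is what keeps an unknown mode from being treated as legacy by accident.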
Owen Lin ·
2026-06-26 09:12:42 -07:00 -
Expose MCP app identity in app context (#29934)
## Why MCP tool-call events need to expose trusted app identity and action metadata directly so v2 clients do not have to infer it from tool names or resource URIs. ## What changed - Add optional `appName`, `templateId`, and `actionName` fields to MCP tool-call `appContext`. - Populate `appName` and `templateId` from trusted Codex Apps metadata, and derive `actionName` from the trusted app resource metadata. - Preserve all three fields through core events, legacy protocol events, persisted thread history, resume redaction, and app-server v2 responses. - Document the public `appContext` fields in `codex-rs/app-server/README.md`. - Regenerate app-server JSON and TypeScript schemas and add coverage for serialization, persistence, redaction, and metadata propagation. ## Validation - `just test -p codex-app-server-protocol mcp_tool_call` - `just test -p codex-core mcp_tool_call_item_metadata_only_trusts_codex_apps_identity mcp_tool_call_item_includes_app_identity` - `just write-app-server-schema` --------- Co-authored-by: Martin Au-Yeung <280153141+martinauyeung-oai@users.noreply.github.com>
Martin Au-Yeung ·
2026-06-25 18:31:10 -07:00 -
[codex] Surface MCP reauthentication-required startup failures (#29877)
## Summary - distinguish expired, non-refreshable stored MCP OAuth credentials from first-time missing credentials - carry a typed `failureReason: "reauthenticationRequired"` on the existing `mcpServer/startupStatus/updated` notification only when user action is required - keep the public MCP auth-status API unchanged and regenerate the app-server protocol schemas and documentation ## Why An MCP server with an expired access token and no usable refresh token currently fails startup without giving clients a reliable, typed recovery signal. The existing startup-status notification is the natural place to carry this state. Its nullable `failureReason` keeps the recovery reason attached to the failed startup transition without adding a one-off notification. Internally, Codex distinguishes first-time login from reauthentication and emits the reason only when the startup error itself requires authentication. ## User impact App clients can prompt an existing user to reconnect an MCP server when automatic recovery is impossible by handling a failed `mcpServer/startupStatus/updated` notification whose `failureReason` is `reauthenticationRequired`. Starting, ready, cancelled, unrelated failures, and first-time setup carry no reauthentication reason. ## Companion app PR - openai/openai#1069582 ## Validation - `just test -p codex-app-server-protocol` — 248 passed; schema fixture tests passed - `cargo check -p codex-app-server -p codex-tui` - `just test -p codex-rmcp-client -p codex-mcp` — 184 passed, 2 skipped - `just test -p codex-protocol -p codex-app-server-protocol -p codex-mcp` — 579 passed - `just write-app-server-schema` - `just fmt`
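A client-side check for this signal might look like the following sketch. Only `failureReason` and the value `reauthenticationRequired` come from the description; the `serverName` field and the status labels are assumptions made for illustration.

```typescript
// Assumed payload shape for `mcpServer/startupStatus/updated`.
interface StartupStatusUpdated {
  serverName: string; // hypothetical field
  status: "starting" | "ready" | "failed" | "cancelled"; // assumed wire labels
  failureReason?: "reauthenticationRequired" | null;
}

// Prompt the user to reconnect only for a failed startup that carries the
// typed reauthentication reason. Every other transition is ignored here.
function shouldPromptReconnect(n: StartupStatusUpdated): boolean {
  return n.status === "failed" && n.failureReason === "reauthenticationRequired";
}
```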
felixxia-oai ·
2026-06-25 21:50:36 +00:00 -
Support OAuth for HTTP MCP servers from selected executor plugins (#28529)
## Why #28522 routes selected-plugin HTTP MCP traffic through the owning executor, but OAuth bootstrap and refresh still used host-local clients. Executor-only servers therefore cannot complete discovery or login through the same network boundary as the MCP connection. ## What changed - adapt `codex_exec_server::HttpClient` to RMCP 1.8's `OAuthHttpClient` contract - let RMCP own discovery, dynamic registration, PKCE, token exchange, and refresh - route auth status, persisted-token startup, and app-server login through the server runtime while preserving the existing local discovery path - add optional `threadId` to `mcpServer/oauth/login` and echo it in the completion notification - implement RMCP's redirect policy and 1 MiB OAuth response limit over executor HTTP - cover selected-thread OAuth discovery and login through an executor-only route Depends on #28522.
jif ·
2026-06-25 10:31:17 +01:00 -
[apps] Thread structured icon assets through app list (#29889)
## Summary - Add `iconAssets` and `iconDarkAssets` to the app-list protocol. - Preserve structured icons through directory merging and the connector, app-server, and TUI boundaries. - Keep legacy logo URLs unchanged as compatibility fallbacks. - Update generated protocol schemas and TypeScript types.
Drew ·
2026-06-24 13:25:44 -07:00 -
[codex] rename rollout budget error to session budget error (#29744)
## Summary - rename the rollout-budget exhaustion error from `RolloutBudgetExceeded` to `SessionBudgetExceeded` - expose the matching app-server v2 wire value as `sessionBudgetExceeded` - regenerate JSON/TypeScript schema fixtures and update the app-server docs and focused tests This is a naming-only follow-up to #29715 based on [Pavel's review suggestion](https://github.com/openai/codex/pull/29715#discussion_r3463183480). Runtime behavior is unchanged. ## Tests - `just test -p codex-core rollout_budget` - `just test -p codex-app-server-protocol` - `just fmt` - `just write-app-server-schema`
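For a client that may talk to servers on either side of this rename, a classifier could accept both wire values. The sketch below assumes `codexErrorInfo` arrives as a bare string, which the description does not confirm, and `isBudgetExhausted` is a hypothetical helper name.

```typescript
// Treats the renamed value and the earlier `rolloutBudgetExceeded` value
// from #29715 as the same condition.
function isBudgetExhausted(codexErrorInfo: string | null | undefined): boolean {
  return (
    codexErrorInfo === "sessionBudgetExceeded" ||
    codexErrorInfo === "rolloutBudgetExceeded"
  );
}
```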
rka-oai ·
2026-06-23 16:49:13 -07:00 -
[codex] surface rollout budget exhaustion (#29715)
## Summary - surface shared rollout-budget exhaustion as `CodexErr::RolloutBudgetExceeded` instead of a generic interrupted turn - map it through the existing `CodexErrorInfo` and app-server v2 `codexErrorInfo` path - keep local compaction from retrying after the shared rollout budget is exhausted This gives app-server clients a stable `rolloutBudgetExceeded` error they can classify without guessing from `status="interrupted"`. ## Tests - `just test -p codex-core rollout_budget`
rka-oai ·
2026-06-23 15:01:28 -07:00 -
core: resolve view_image paths in selected environment (#29526)
## Why view_image needs to support foreign OS remote executors. ## What - resolve image paths against the selected environment as `PathUri` and read them through that environment's filesystem - keep app-server's public path field wire-compatible as `LegacyAppPathString`, with purpose-specific UI rendering - cover relative and absolute target-native paths in the core integration test and run the full `view_image` suite under wine-exec without skips
Adam Perry @ OpenAI ·
2026-06-23 19:52:37 +00:00 -
core: add extra metadata field to Thread struct (#29675)
# Summary Adds a field Thread.extras that can be used to hold arbitrary metadata specific to a given thread.
Boyang Niu ·
2026-06-23 19:15:59 +00:00 -
chore(core) rm AskForApproval::OnFailure (#28418)
## Summary Deletes the OnFailure variant of the `AskForApproval` enum. This option has been deprecated since #11631. ## Testing - [x] Tests pass
Dylan Hurd ·
2026-06-23 12:13:54 -07:00 -
app-server: document thread and turn IDs are UUID7 (#27714)
It's actually a very nice property that these are UUID7s, so documenting them so we think twice before changing it away from UUID7s in the future.
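The commit does not say which property it has in mind. One thing UUIDv7 provides (RFC 9562) is a 48-bit Unix timestamp in milliseconds in the leading bits, so an ID carries its creation time. A minimal sketch, with `uuid7TimestampMs` as a hypothetical helper name:

```typescript
// The first 12 hex digits of a UUIDv7 are the Unix timestamp in milliseconds.
function uuid7TimestampMs(id: string): number {
  const hex = id.replace(/-/g, "").slice(0, 12);
  return parseInt(hex, 16);
}

// Example value taken from the UUIDv7 test vector in RFC 9562.
const createdAt = new Date(uuid7TimestampMs("017f22e2-79b0-7cc3-98c4-dc0c0c07398f"));
```

Because the timestamp occupies the most significant bits, sorting such IDs as strings also orders them by creation time to millisecond precision.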
Owen Lin ·
2026-06-23 11:46:36 -07:00 -
Propagate safety buffering treatment metadata (#29473)
## Summary - read the request-scoped safety-buffering treatment from HTTP response headers and per-turn WebSocket metadata through one shared header parser - combine that treatment with Responses API safety-buffering signals - propagate `showBufferingUi` and nullable `fasterModel` through the existing `model/safetyBuffering/updated` app-server notification - update the app-server documentation and generated JSON and TypeScript schemas The public implementation contains no model mapping or real model identifier. Tests and protocol examples use generic `current-model` and `faster-model` placeholders only. ## Dependencies - server-side treatment evaluation: https://github.com/openai/openai/pull/1060247 - initial Responses API safety-buffering propagation: https://github.com/openai/codex/pull/29371 - Codex App UI: https://github.com/openai/openai/pull/1057789 ## Validation - Codex API tests: 129 passed - focused Codex core safety-buffering integration test passed - app-server protocol tests passed after regenerating schema fixtures - Clippy fix and repository formatting completed successfully The broader app-server run compiled all changed crates and completed with 1,269 passing tests. Its remaining failures were unrelated environment limitations: macOS sandbox application was denied, one expected test binary was unavailable, and several existing subprocess tests timed out as a result.
Francis Chalissery ·
2026-06-22 19:51:03 -07:00 -
Simplify multi-agent mode controls (#29324)
## Why

Multi-agent delegation policy was split across `multiAgentMode`, `features.multi_agent_mode`, and `usage_hint_enabled`. These controls could disagree: a requested mode could be downgraded by the feature flag, and disabling usage hints also disabled mode instructions.

Some clients also need multi-agent tools without adding delegation-policy text to model context. The previous two-mode API could not express that directly.

## What changed

`multiAgentMode` is now the only live delegation-policy control:

| Mode | Behavior |
| --- | --- |
| `none` | Keep multi-agent tools available without adding mode instructions. |
| `explicitRequestOnly` | Only delegate after an explicit user request. |
| `proactive` | Delegate when parallel work materially improves speed or quality. |

- new threads default to `explicitRequestOnly`; omitting the mode on later turns keeps the current value
- thread start, resume, fork, and settings responses always report the concrete current mode instead of `null`
- mode selection remains sticky across turns and resume
- usage-hint text no longer controls whether mode instructions apply
- `features.multi_agent_mode` and `usage_hint_enabled` remain accepted as ignored compatibility settings so existing configs continue to load
- app-server documentation and generated schemas describe the three-mode API

## Tests

- `just test -p codex-core multi_agent_mode`
- `just test -p codex-core multi_agent_v2_config_from_feature_table`
- `just test -p codex-core spawn_agent_description`
- `just test -p codex-features`
- `just test -p codex-app-server-protocol`
- `just test -p codex-app-server multi_agent_mode`
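The default and sticky behavior described above reduces to a small resolution rule. The sketch below is illustrative only; `resolveMode` and `DEFAULT_MODE` are hypothetical names, while the three mode values come from the description.

```typescript
type MultiAgentMode = "none" | "explicitRequestOnly" | "proactive";

// New threads default to explicitRequestOnly.
const DEFAULT_MODE: MultiAgentMode = "explicitRequestOnly";

// An explicit request wins, an omitted request keeps the current value,
// and a thread with no value yet falls back to the default.
function resolveMode(
  current: MultiAgentMode | undefined,
  requested?: MultiAgentMode | null,
): MultiAgentMode {
  return requested ?? current ?? DEFAULT_MODE;
}
```

The function always returns a concrete mode, matching the statement that responses report the current mode instead of `null`.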
jif ·
2026-06-22 10:05:36 +02:00 -
Propagate safety buffering events to app-server clients (#29371)
Responses API safety buffering metadata currently stops at the transport boundary, so app-server clients cannot render the in-progress safety review state. This change: - decodes and deduplicates `safety_buffering` metadata from Responses API SSE and WebSocket events without suppressing the original response event - emits a typed core event containing the requested model plus backend use cases and reasons - forwards that event as `turn/safetyBuffering/updated` through app-server v2 and updates generated protocol schemas - keeps the side-channel event out of persisted rollouts and turn timing This supports the Codex Apps buffering UX and depends on the Responses API backend work in https://github.com/openai/openai/pull/1044569 and https://github.com/openai/openai/pull/1044571. Validation: - focused `codex-core` safety-buffering integration test passes - `cargo check -p codex-core -p codex-app-server -p codex-app-server-protocol` - `just fix -p codex-api -p codex-protocol -p codex-core -p codex-app-server-protocol -p codex-app-server -p codex-rollout -p codex-rollout-trace -p codex-otel` - `just fmt` - broad package test run: 4,430/4,492 passed; 62 unrelated local-environment/concurrency failures involved unavailable test binaries, MCP subprocess setup, and app-server timeouts
Francis Chalissery ·
2026-06-22 03:39:14 +00:00 -
Expose thread-level multi-agent mode (#28792)
## Why Once multi-agent mode can be selected per turn, clients also need to choose the initial selection when creating a thread and observe that selection through lifecycle and settings APIs. The selected value is intentionally distinct from the effective model-visible value: no client selection is represented as `null`, even though an eligible multi-agent v2 turn derives `explicitRequestOnly` as its effective default. ## What changed - Add the optional experimental `thread/start.multiAgentMode` parameter and pass it through thread creation. - Preserve an omitted initial value as an unset selection rather than eagerly storing `explicitRequestOnly`. - Apply an explicit `thread/start` selection to the first turn through the session configuration established at thread creation. - Restore the latest persisted effective mode as the selected baseline on cold resume when rollout history contains one. - Inherit the optional selected mode from a loaded parent when creating related runtime threads. - Return the current selected `multiAgentMode` from `thread/start`, `thread/resume`, `thread/fork`, and thread settings, using `null` when no mode is selected. - Keep lifecycle reporting independent from model capability and feature eligibility; core turn construction remains responsible for calculating and persisting the effective mode. ## Not covered - Clearing an existing loaded-session selection back to unset through `turn/start`; omitted or `null` currently retains the session's selection. - A TUI control, slash command, or `config.toml` preference. ## Verification - `CARGO_INCREMENTAL=0 just test -p codex-app-server-protocol` - `CARGO_INCREMENTAL=0 just test -p codex-app-server multi_agent_mode` The focused app-server coverage verifies explicit `thread/start` initialization, first-turn prompting, nullable reporting for an omitted selection, and retention of selections that are not currently runtime-eligible. ## Stack Stacked on #28685. 
This PR contains only the thread initialization and lifecycle/settings API layer.
Shijie Rao ·
2026-06-19 10:50:44 +02:00 -
Emit Trusted MCP App Identity on Tool-Call Items (#27132)
## Summary - Add optional `appContext` to app-server MCP tool-call items with trusted `connectorId`, `linkId`, and `mcpAppResourceUri` metadata. - Preserve that context across tool-call events, persisted history, reconnects, and thread resume. - Keep the deprecated top-level `mcpAppResourceUri` temporarily for client migration. The consumer contract is `{ appContext: { connectorId, linkId, mcpAppResourceUri }, tool }`. ## Validation - Full GitHub Actions suite passes, including CLA, Bazel tests, clippy, release builds, and argument-comment lint. --------- Co-authored-by: martinauyeung-oai <280153141+martinauyeung-oai@users.noreply.github.com>
martinauyeung-oai ·
2026-06-18 14:02:54 -07:00 -
unified-exec: retain PathUri in command events (#28780)
## Why App-server must report command events containing foreign-platform paths without changing existing client or rollout path-string formats. ## What changed - retain `PathUri` through exec command begin/end events - convert cwd values to `LegacyAppPathString` at the app-server compatibility boundary - drop command actions with foreign paths and log them - serialize rollout-trace cwd values using their inferred native path representation - restore Wine coverage for retained Windows cwd values and successful completion
Adam Perry @ OpenAI ·
2026-06-18 05:00:04 +00:00 -
[codex] Restore thread recency with compatible migration history (#28671)
## Summary - Revert #28655, restoring the thread `recencyAt` behavior introduced by #27910. - Move `threads_recency_at` to migration 0039 so it no longer collides with `external_agent_config_imports` at version 0038. - Repair databases that already applied the recency migration as version 38 by moving the matching migration-history row to version 39 before SQLx validation. The current version-38 migration can then apply normally. ## Validation - `just test -p codex-state migrations::tests::repairs_recency_migration_that_was_applied_as_version_38` - `just test -p codex-state -p codex-rollout -p codex-thread-store -p codex-app-server-protocol -p codex-tui`: 3,439 passed; six TUI tests could not open the machine's existing read-only incident database at `~/.codex/sqlite/state_5.sqlite`. - `just fix -p codex-state` - `just fmt` - Verified that state migration versions are unique.
Jeremy Rose ·
2026-06-17 18:52:18 +00:00 -
Revert thread recencyAt for sidebar ordering (#28655)
## Why Revert #27910 to remove the newly introduced thread `recencyAt` persistence and API behavior from `main`. ## What changed This reverts commit `fac3158c2a783095768076489815f361fa9b0db4`, including the state migration, thread-store propagation, app-server API surface, generated schemas, and related tests. ## Validation Not run before opening; relying on CI for the initial fast signal.
pakrym-oai ·
2026-06-16 21:39:30 -07:00 -
Add thread recencyAt for sidebar ordering (#27910)
## Summary Add a server-owned `recencyAt` timestamp and `recency_at` thread-list sort key for product recency ordering while preserving the existing meaning of `updatedAt` as the latest persisted thread mutation. This is the server-side alternative to #27697. Rather than narrowing `updatedAt`, clients can sort the sidebar by `recency_at` and continue treating `updatedAt` as mutation time. Paired Codex Apps PR: [openai/openai#1024599](https://github.com/openai/openai/pull/1024599) ## Contract - `recencyAt` initializes when a thread is created. - A turn start advances `recencyAt` monotonically. - Commentary, agent output, tool results, token/accounting updates, turn completion, archive, unarchive, resume, and generic metadata writes do not advance it. - `updatedAt` retains its existing behavior and continues to advance for persisted thread mutations. - Current servers populate `recencyAt`; the response field is optional in generated TypeScript so clients connected to older servers can fall back to `updatedAt`. - Filesystem-only fallback uses existing updated/mtime ordering when SQLite is unavailable. ## Persistence and compatibility Migration 0038 adds second- and millisecond-precision recency columns, backfills them from the existing updated timestamp, creates list indexes, and includes an insert trigger so older binaries writing to a migrated database seed recency without causing later mutations to advance it. Generic metadata upserts preserve existing recency values. Turn-start updates use a dedicated monotonic touch, and process-local allocation keeps millisecond cursor values unique. State DB list, search, read, filtered-list repair, rollout fallback propagation, and app-server conversions all carry the new field. ## API `Thread` responses include: ```ts recencyAt?: number ``` `thread/list` and `thread/search` accept: ```json { "sortKey": "recency_at" } ``` Generated TypeScript and JSON schemas are included. 
## Validation - `just test -p codex-state` — 146 passed - `just test -p codex-rollout` — 69 passed - `just test -p codex-thread-store` — 81 passed - `just test -p codex-app-server-protocol` — 231 passed - Focused app-server list ordering, response mapping, archive/unarchive, and resume lifecycle tests passed - Scoped `just fix` for state, rollout, thread-store, app-server-protocol, and app-server - `just fmt` - `git diff --check` - Independent correctness, simplicity, elegance, security, and test-quality reviews; actionable ordering, lifecycle, query-projection, and timestamp-uniqueness findings were addressed
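The client fallback and the monotonic touch from the contract can be sketched as follows. `ThreadSummary`, `sortForSidebar`, and `touchRecency` are hypothetical names, and treating the timestamps as plain numbers is an assumption beyond the `recencyAt?: number` field shown above.

```typescript
interface ThreadSummary {
  id: string;
  updatedAt: number;
  recencyAt?: number; // absent when connected to an older server
}

// Newest first by recencyAt, falling back to updatedAt when it is missing.
function sortForSidebar(threads: ThreadSummary[]): ThreadSummary[] {
  const key = (t: ThreadSummary): number => t.recencyAt ?? t.updatedAt;
  return threads.slice().sort((a, b) => key(b) - key(a));
}

// Monotonic touch: a turn start never moves recencyAt backwards.
function touchRecency(current: number, turnStartedAt: number): number {
  return Math.max(current, turnStartedAt);
}
```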
Jeremy Rose ·
2026-06-16 17:06:22 -07:00 -
Clarify model-generated and legacy app path types (#28577)
## Why `ApiPathString` kind of implies that it can be used anywhere we pull a path out of JSON, but it's not really appropriate for tool arguments when the model might generate relative paths. Prefer `String` for model-generated paths and we can handle the conversion per feature for now and define a shared abstraction later if it makes sense. # What Rename `ApiPathString` to `AppLegacyPathString` to clarify its role. Expand the `path-types` skill to tell the model to leave tool args as bare strings.
Adam Perry @ OpenAI ·
2026-06-16 20:47:43 +00:00 -
[codex] Record external agent import results (#28396)
## Summary - restore `externalAgentConfig/import/progress` notifications while keeping `externalAgentConfig/import/completed` as the must-deliver event - persist completed external-agent config imports in state DB by `importId`, including concrete success/failure details for config, AGENTS.md, skills, plugins, MCP servers, subagents, hooks, commands, and sessions - add `externalAgentConfig/import/readHistories` so clients can recover persisted import results after missing the live completion notification - include `errorType` on import failures in protocol responses/notifications and persisted DB JSON so future code can classify failures without another wire/storage shape change ## Validation - `git diff --check` - `just test -p codex-state external_agent_config_imports` - `just test -p codex-app-server-protocol` - `CODEX_SQLITE_HOME=/private/tmp/codex-app-server-sqlite-read-details just test -p codex-app-server external_agent_config_import_sends_completion_notification_for_sync_only_import` Also ran earlier broader checks before publishing: - `just test -p codex-state` - `CODEX_SQLITE_HOME=/private/tmp/codex-app-server-external-agent-test-sqlite just test -p codex-app-server external_agent_config` - `just test -p codex-external-agent-migration`
charlesgong-openai ·
2026-06-15 23:17:24 -07:00 -
[codex] Add interruptible sleep tool (#28429)
## Why Models sometimes need to pause briefly while waiting for external work, but using a shell command for that delay ties the wait to a process and does not naturally resume when new turn input arrives. ## What changed - add a built-in `sleep` tool behind the under-development `sleep_tool` feature - accept a bounded `duration_ms` argument, matching the millisecond convention used by unified exec - end the sleep early when either steered user input or mailbox input arrives - include elapsed wall-clock time in completed and interrupted outputs - emit a dedicated core `SleepItem` through `item/started` and `item/completed` - expose the sleep item as app-server v2 `ThreadItem::Sleep` and retain it in reconstructed thread history - regenerate the configuration schema for the new feature flag - regenerate app-server JSON and TypeScript schema fixtures ## Test plan - `just test -p codex-core sleep_tool_follows_feature_gate` - `just test -p codex-core any_new_input_interrupts_sleep` - `just test -p codex-app-server-protocol` - `just test -p codex-app-server sleep_emits_started_and_completed_items`
pakrym-oai ·
2026-06-15 21:39:21 -07:00 -
Use ApiPathString in app-server filesystem permission paths (#28367)
## Why Clients running an app-server on one OS and an exec-server on another OS need to be able to pass sandbox config to app-server that refers to resources on the executor's foreign OS. ## What `AbsolutePathBuf` can't represent these paths and we don't want users to be exposed to `PathUri` yet, so this moves the public app-server API to be expressed in terms of `ApiPathString`. Stacked on #28165. - change app-server v2 filesystem permission paths, including legacy read/write roots, to `ApiPathString` - localize API paths through `PathUri` when converting into the current native core permission types - make path-bearing permission conversions fallible and surface localization failures instead of silently treating malformed grants as ordinary denials - propagate conversion failures through app-server and TUI approval handling - regenerate the app-server JSON and TypeScript schemas - leave migration TODOs on native-path conversions so they can be removed once core permission paths use `PathUri`
Adam Perry @ OpenAI ·
2026-06-15 19:25:54 -07:00 -
[codex] Add external agent import result accounting (#28008)
## Why External-agent imports can complete synchronously or continue in the background for plugins/sessions. Clients need a stable import id to correlate the immediate response with the eventual completion notification, and the completion payload needs enough accounting to show which artifact types succeeded or failed without hiding partial failures. ## What Changed - `externalAgentConfig/import` now returns an `importId`; `externalAgentConfig/import/completed` includes the same `importId` plus type-level `itemResults`. - Completed `itemResults` report `successCount`, `errorCount`, `successes`, and `rawErrors` for each migrated item type. - Added protocol/schema/TypeScript types for import successes, raw errors, and type-level results. No progress notification is included in the final PR. - `ExternalAgentConfigService::import` now returns an outcome object with synchronous item results and pending plugin imports. - Plugin import outcomes track succeeded/failed marketplaces, plugin ids, and raw errors. Plugin failures can be reported in completed accounting while later migration items continue. - Non-plugin synchronous import failures still fail the request, so invalid config/skills-style failures are not reported as a successful import response. - Session imports now return item results. Successful imports include the source session path and imported thread id; prepare, persist, ledger, and source-validation failures become raw errors in completion accounting where the import can continue. - The request processor generates the `importId`, aggregates synchronous results with background plugin/session results, and sends a single completed notification when all selected work is done. - App-server docs and generated schema fixtures were updated for the new response/completed payload shapes. 
## Validation - `just test -p codex-app-server-protocol` - `just test -p codex-app-server-client event_requires_delivery` - `CODEX_SQLITE_HOME=/private/tmp/codex-app-server-review-sync-error just test -p codex-app-server external_agent_config_import_returns_error_for_failed_sync_import` - `CODEX_SQLITE_HOME=/private/tmp/codex-app-server-review-external-agent just test -p codex-app-server external_agent_config` Note: local sandbox validation used `CODEX_SQLITE_HOME` because the default sqlite state path is read-only in this environment.
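The aggregation step, where synchronous results are combined with background plugin/session results into type-level accounting, might look like this sketch. The four counters and lists are named in the description; the `itemType` discriminator, the string element types, and `mergeItemResults` are assumptions.

```typescript
interface ItemResult {
  itemType: string; // hypothetical discriminator for the migrated item type
  successCount: number;
  errorCount: number;
  successes: string[];
  rawErrors: string[];
}

// Merge synchronous and background results into one entry per item type.
function mergeItemResults(sync: ItemResult[], background: ItemResult[]): ItemResult[] {
  const byType = new Map<string, ItemResult>();
  for (const r of sync.concat(background)) {
    const acc: ItemResult = byType.get(r.itemType) ?? {
      itemType: r.itemType,
      successCount: 0,
      errorCount: 0,
      successes: [],
      rawErrors: [],
    };
    acc.successCount += r.successCount;
    acc.errorCount += r.errorCount;
    acc.successes = acc.successes.concat(r.successes);
    acc.rawErrors = acc.rawErrors.concat(r.rawErrors);
    byType.set(r.itemType, acc);
  }
  return Array.from(byType.values());
}
```

Keeping `errorCount` and `rawErrors` next to the successes is what lets a partial failure show up in the completed payload instead of being hidden.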
charlesgong-openai ·
2026-06-15 13:25:42 -07:00 -
feat: add Bedrock API key as a managed auth mode (#27443)
## Why Codex needs to manage Amazon Bedrock API key credentials through the existing auth lifecycle instead of introducing a separate auth manager or provider-specific credential file. Treating Bedrock API key login as a primary auth mode gives it the same persistence, keyring, reload, and logout behavior as the existing OpenAI API key and ChatGPT modes. The credential is valid only for the `amazon-bedrock` model provider. OpenAI-compatible providers must reject this auth mode rather than treating the Bedrock key as an OpenAI bearer token. ## What changed - Added `bedrockApiKey` as an app-server `AuthMode` and `CodexAuth::BedrockApiKey` as a primary `AuthManager` mode. - Added `BedrockApiKeyAuth`, containing the API key and AWS region, to the existing `AuthDotJson` payload stored in `$CODEX_HOME/auth.json` or the configured keyring backend. - Added `login_with_bedrock_api_key(...)`, parallel to `login_with_api_key(...)`, which replaces the current stored login with Bedrock credentials. - Reused generic auth reload and logout behavior instead of adding a Bedrock-specific auth manager or logout path. - Updated login restrictions, status reporting, diagnostics, telemetry classification, generated app-server schemas, and auth fixtures for the new mode. - Added explicit errors when Bedrock API key auth is selected with an OpenAI-compatible model provider. This PR establishes managed storage and auth-mode behavior. Routing the managed key and region into Amazon Bedrock requests will be in follow-up PRs.
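The provider restriction can be sketched as a single check. Only `bedrockApiKey` and the `amazon-bedrock` provider name come from the description; the other `AuthMode` labels and the `checkAuthForProvider` helper are assumptions for illustration.

```typescript
// "apiKey" and "chatgpt" are assumed labels for the existing modes.
type AuthMode = "apiKey" | "chatgpt" | "bedrockApiKey";

const BEDROCK_PROVIDER = "amazon-bedrock";

// Returns an error message when Bedrock API key auth is paired with any
// provider other than amazon-bedrock, otherwise null.
function checkAuthForProvider(mode: AuthMode, providerId: string): string | null {
  if (mode === "bedrockApiKey" && providerId !== BEDROCK_PROVIDER) {
    return `Bedrock API key auth cannot be used with provider "${providerId}"`;
  }
  return null;
}
```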
Celia Chen ·
2026-06-10 20:42:38 -07:00 -
Add app-server `thread/delete` API (#25018)
## Why Clients can archive and unarchive threads today, but there is no app-server API for permanently removing a thread. Deletion also needs to cover the full session tree: deleting a main thread should remove spawned subagent threads and the related local metadata instead of leaving orphaned rollout files, goals, or subagent state behind. ## What - Adds the v2 `thread/delete` request and `thread/deleted` notification, with the response shape kept consistent with `thread/archive`. - Implements local hard delete for active and archived rollout files. - Deletes the requested thread's state DB row as the commit point, then best-effort cleans associated state including spawned descendants, goals, spawn edges, logs, dynamic tools, and agent job assignments. - Updates app-server API docs and generated protocol schema/TypeScript fixtures.
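Covering the full session tree amounts to walking spawn edges from the requested thread. The sketch below only gathers the IDs to delete. The in-memory `Map` of parent-to-children edges and the `collectThreadTree` name are assumptions, not the state DB layout.

```typescript
// Collect the requested thread and every spawned descendant, root first.
function collectThreadTree(root: string, spawnEdges: Map<string, string[]>): string[] {
  const ordered: string[] = [];
  const seen = new Set<string>();
  const stack: string[] = [root];
  while (stack.length > 0) {
    const id = stack.pop()!;
    if (seen.has(id)) continue; // guard against duplicate or cyclic edges
    seen.add(id);
    ordered.push(id);
    for (const child of spawnEdges.get(id) ?? []) {
      stack.push(child);
    }
  }
  return ordered;
}
```

Returning the root first fits the description's ordering, where the requested thread's row is the commit point and descendant cleanup follows on a best-effort basis.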
Eric Traut ·
2026-06-10 11:22:12 -07:00 -
[codex-analytics] add extensible feature thread sources (#27063)
## Why - `ThreadSource` currently defines a closed set of core-owned values - Product features also create threads for background or scheduled work - Adding every product-specific value to the core enum would require repeated `codex-rs` protocol changes - Feature-backed values let product callers provide precise attribution while preserving the existing core classifications ## What Changed - Adds `ThreadSource::Feature(String)` for app-owned thread source values - Represents all app-server v2 thread sources as scalar strings, so a feature source is supplied as `"automation"` - Persists and emits the feature's plain string label, so `"automation"` produces `thread_source="automation"` in analytics - Keeps `user`, `subagent`, and `memory_consolidation` as explicit core-owned values and regenerates the app-server schemas and TypeScript bindings ## Verification - `just write-app-server-schema` - `cargo check --workspace` - `just test -p codex-protocol feature_thread_source_serializes_as_its_app_owned_label` - `just test -p codex-app-server-protocol thread_sources_round_trip_as_scalar_labels` - `cargo test -p codex-analytics thread_initialized_event_serializes_expected_shape` - `just fmt`
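The scalar-string representation can be sketched as a pair of conversions. The TypeScript union and the `toWire`/`fromWire` helpers are hypothetical. The three core labels are taken as written in the description and may differ from the actual wire casing.

```typescript
type ThreadSource =
  | { kind: "user" }
  | { kind: "subagent" }
  | { kind: "memory_consolidation" }
  | { kind: "feature"; label: string };

// A feature source serializes as its plain label, e.g. "automation".
function toWire(source: ThreadSource): string {
  return source.kind === "feature" ? source.label : source.kind;
}

// Core-owned labels map back to their variants. Any other string is read
// as an app-owned feature label.
function fromWire(label: string): ThreadSource {
  switch (label) {
    case "user":
      return { kind: "user" };
    case "subagent":
      return { kind: "subagent" };
    case "memory_consolidation":
      return { kind: "memory_consolidation" };
    default:
      return { kind: "feature", label };
  }
}
```

One consequence of a scalar encoding in this sketch is that a feature label equal to a core label would round-trip as the core variant.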
marksteinbrick-oai ·
2026-06-09 12:27:10 -07:00 -
multi-agent: add path-based v2 activity tracking (#27007)
## Why Multi-agent v2 identifies agents by canonical paths, but its tool handlers still emitted the larger legacy collaboration begin/end events built around nickname and role metadata. App-server, rollout-trace, analytics, and TUI consumers therefore lacked one compact path-based completion signal that behaved consistently across live events and replay. The TUI also needs a bounded `/agent` status surface for v2 agents. It should use recent local activity for previews, refresh liveness without loading full histories, and keep the legacy picker available when no path-backed v2 agent is known. ## What changed - Replace the v2 `spawn_agent`, `send_message`, `followup_task`, and `interrupt_agent` legacy lifecycle emissions with a success-only `SubAgentActivity` event. The event records the tool call ID, occurrence time, affected thread, canonical agent path, and `started`, `interacted`, or `interrupted` kind. - Expose the activity as a completion-only app-server v2 `subAgentActivity` thread item in live notifications and reconstructed history, regenerate the protocol schemas, and count it in sub-agent tool analytics. - Track canonical paths from live activity and loaded-thread metadata in the TUI, and render the activity in live and replayed transcripts. - Make `/agent` list running path-backed agents with summaries from bounded local event buffers. Each summary is capped at 240 graphemes, the scan is capped at six recent items, only the last three wrapped lines are shown, and command output is omitted. Liveness falls back to metadata-only `thread/read` when local turn state is unavailable. - Persist the activity as a terminal rollout-trace runtime payload and reduce it to the corresponding spawn, send, follow-up, or close interaction edge. `interrupt_agent` is classified as a close-edge operation. - Preserve the legacy picker when no path-backed v2 agent is known. 
## Compatibility App-server v2 clients that consumed `collabAgentToolCall` begin/end pairs for these tools must handle the new completion-only `subAgentActivity` item. Legacy v1 collaboration behavior is unchanged. ## Screenshot <img width="684" height="288" alt="Screenshot 2026-06-08 at 15 40 47" src="https://github.com/user-attachments/assets/194b3cd0-619d-45fb-b587-cf3e2b1b8a1d" /> ## Testing - `just test -p codex-app-server-protocol` - `just test -p codex-rollout-trace` - Added focused coverage for activity analytics, terminal trace serialization, spawn-edge reduction, `interrupt_agent` classification, TUI status rendering without aggregated command output, and clearing stale running state after a completed turn.
jif ·
2026-06-09 12:14:48 +02:00 -
fix(tui): scope MCP startup status by thread (#26639)
## Why MCP startup failures from spawned subagents were rendered as global notifications, so a child thread's failure could pollute the visible parent transcript. Routing the notification to the child exposed two related replay problems: session refresh could discard the buffered event, and a newly created child `ChatWidget` did not know the expected MCP server set, which could leave its startup spinner running after every server had settled. MCP startup diagnostics should remain visible in the thread that owns the startup without affecting other transcripts. The protocol also needs to support a future app-scoped MCP lifecycle where startup is not owned by any thread. ## Reported Behavior The [originating Slack report](https://openai.slack.com/archives/C08JZTV654K/p1780604538859939) called out that using subagents could turn MCP startup failures into a wall of yellow CLI warnings because repeated failures were not deduplicated. The intended behavior is for those diagnostics to remain visible once in the thread that owns the startup, without polluting the parent transcript. ## What Changed - add nullable `threadId` ownership to `mcpServer/startupStatus/updated` - populate it from the app-server conversation ID for the current thread-scoped lifecycle and regenerate the protocol schema and TypeScript artifacts - treat a missing or null `threadId` as app-scoped without injecting it into the active chat transcript - route and buffer thread-owned MCP startup notifications by thread in the TUI - preserve buffered MCP startup events across child session refresh - seed expected MCP servers before replaying a thread snapshot so startup reaches its terminal state - suppress an identical repeated failure warning for the same server within one startup round The owning thread still renders the detailed failure and final `MCP startup incomplete (...)` summary. ## How to Test 1. Configure an optional MCP server named `smoke` that exits during initialization. 2. 
Launch the TUI with multi-agent support enabled. 3. Confirm the main thread's own startup failure renders one detailed `smoke` warning and one incomplete-startup summary. 4. Spawn exactly one subagent. 5. Confirm the parent transcript does not receive the subagent's MCP startup failure. 6. Switch to the subagent thread and confirm it contains exactly one detailed `smoke` failure and one incomplete-startup summary. 7. Confirm the subagent's MCP startup spinner disappears and the thread remains usable. 8. Switch between the parent and subagent and confirm the warnings neither move nor duplicate. Targeted tests: - `just test -p codex-app-server-protocol` - `just test -p codex-app-server thread_start_emits_mcp_server_status_updated_notifications` - `just test -p codex-tui mcp_startup` The parent/child behavior and spinner completion were also exercised manually in tmux. `just argument-comment-lint` was attempted but blocked by an unrelated local Bazel LLVM empty-glob failure; touched Rust callsites were inspected manually.
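The routing and deduplication rules listed under "What Changed" can be sketched roughly as below. `StartupFailure` and `StartupRouter` are hypothetical names for illustration only, and the sketch models a single startup round; it is not the TUI's actual implementation.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for a failed `mcpServer/startupStatus/updated` payload.
struct StartupFailure {
    /// `None` models a missing or null `threadId`, i.e. an app-scoped startup.
    thread_id: Option<String>,
    server: String,
    message: String,
}

/// Per-thread warning buffers plus the (server, message) pairs already shown
/// during the current startup round.
#[derive(Default)]
struct StartupRouter {
    warnings: HashMap<String, Vec<String>>,
    seen: HashMap<String, HashSet<(String, String)>>,
}

impl StartupRouter {
    /// Returns true when the warning was recorded for its owning thread.
    fn route(&mut self, failure: StartupFailure) -> bool {
        // App-scoped notifications are not injected into any chat transcript.
        let Some(thread_id) = failure.thread_id else {
            return false;
        };
        let key = (failure.server.clone(), failure.message.clone());
        let seen = self.seen.entry(thread_id.clone()).or_default();
        if !seen.insert(key) {
            // Identical repeat for the same server in this round: suppress it.
            return false;
        }
        self.warnings
            .entry(thread_id)
            .or_default()
            .push(format!("{}: {}", failure.server, failure.message));
        true
    }

    /// Warnings buffered for one thread; empty for threads with no failures.
    fn warnings_for(&self, thread_id: &str) -> &[String] {
        self.warnings.get(thread_id).map(Vec::as_slice).unwrap_or(&[])
    }
}
```

Keying both maps by thread ID keeps a child's failure out of the parent's buffer, which is the property the manual test steps above check.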
Felipe Coury ·
2026-06-07 20:12:05 -07:00 -
[codex-rs] support v2 personal access tokens (#25731)
## Summary - add v2 personal access token support for `codex login --with-access-token` and `CODEX_ACCESS_TOKEN` - classify opaque `at-` tokens separately from legacy Agent Identity JWTs - hydrate required ChatGPT account metadata through AuthAPI `/v1/user-auth-credential/whoami` - use PATs directly as bearer tokens while preserving existing ChatGPT account surfaces - expose PAT-backed auth as the explicit `personalAccessToken` app-server auth mode ## Implementation PAT auth is intentionally small and stateless. Loading a PAT performs one AuthAPI metadata request, stores the hydrated metadata in the in-memory auth object, and redacts the secret from debug output. Legacy Agent Identity JWT handling remains unchanged. The shared access-token classifier lives in a private neutral module because it dispatches between both credential types. PAT hydration fails closed when AuthAPI omits any required metadata, including email. Hydrated metadata is intentionally not persisted: startup performs a live `whoami` preflight so revoked tokens or changed account metadata are not accepted from a stale cache. ## Workspace restriction scope This change intentionally does **not** apply `forced_chatgpt_workspace_id` to PAT authentication. The setting is a client-side config guardrail, not an authorization boundary, and PAT does not currently require workspace-ID parity. The PAT login and `CODEX_ACCESS_TOKEN` paths therefore validate through AuthAPI without threading workspace-restriction state through access-token loading. Existing workspace checks for non-PAT auth remain on their established paths. ## App-server compatibility The public app-server `AuthMode` is shared across v1 and v2, and PAT-backed auth reports `personalAccessToken` through both APIs. Following human review, this intentionally removes the temporary v1 compatibility mapping that reported PATs as `chatgpt`; the deprecated v1 API is kept in parity with v2 rather than maintaining a separate closed enum. 
Clients with exhaustive auth-mode handling in either API version must add the new case and should generally treat it as ChatGPT-backed unless they need PAT-specific behavior. The v1 auth-status response still omits the raw PAT when `includeToken` is requested because that response cannot carry the account metadata needed to reuse the credential safely. Persisted PAT auth also omits the new enum value so older Codex builds can deserialize `auth.json` and infer PAT auth from the credential field after a rollback. ## Validation Latest review-fix validation: - `CARGO_INCREMENTAL=0 just test -p codex-login` (126 passed) - `CARGO_INCREMENTAL=0 just test -p codex-cli` (263 passed) - `CARGO_INCREMENTAL=0 just test -p codex-cli stored_auth_validation_handles_personal_access_token` - `CARGO_INCREMENTAL=0 just test -p codex-app-server-protocol` (226 passed) - `CARGO_INCREMENTAL=0 just test -p codex-models-manager refresh_available_models_uses_remote_only_catalog_for_chatgpt_auth` - `CARGO_INCREMENTAL=0 just test -p codex-tui existing_non_oauth_chatgpt_login_counts_as_signed_in` - `CARGO_INCREMENTAL=0 just fix -p codex-login -p codex-app-server-protocol -p codex-models-manager -p codex-tui -p codex-cli` - `just fmt` - `git diff --check` The broader `codex-tui` suite previously compiled and ran 2,834 tests. Three unrelated environment-sensitive guardian/IDE-socket tests failed after retries; the PAT-relevant TUI coverage passed.
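A rough sketch of the token classification described above follows. The real classifier lives in a private module and is not shown in this log, so `AccessTokenKind`, `classify_access_token`, and `redact` are hypothetical names. The sketch assumes only that PATs carry the `at-` prefix and that a JWT in compact form has three non-empty dot-separated segments.

```rust
/// Hypothetical classification result for an access-token string.
#[derive(Debug, PartialEq, Eq)]
enum AccessTokenKind {
    PersonalAccessToken,
    AgentIdentityJwt,
    Unknown,
}

fn classify_access_token(token: &str) -> AccessTokenKind {
    let token = token.trim();
    // Opaque v2 personal access tokens are recognized by their `at-` prefix.
    if token.starts_with("at-") && token.len() > "at-".len() {
        return AccessTokenKind::PersonalAccessToken;
    }
    // Assumption: a compact JWT is exactly three non-empty segments split by '.'.
    let segments: Vec<&str> = token.split('.').collect();
    if segments.len() == 3 && segments.iter().all(|s| !s.is_empty()) {
        return AccessTokenKind::AgentIdentityJwt;
    }
    AccessTokenKind::Unknown
}

/// Debug-safe rendering that never includes the secret itself.
fn redact(token: &str) -> String {
    match classify_access_token(token) {
        AccessTokenKind::PersonalAccessToken => "at-<redacted>".to_string(),
        _ => "<redacted>".to_string(),
    }
}
```

Checking the prefix first keeps the two credential types on separate paths, so legacy JWT handling stays untouched.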
cooper-oai ·
2026-06-05 17:36:18 -07:00 -
[codex] Forward turn moderation metadata through app-server (#25710)
## Why First-party backends can supply turn-scoped moderation metadata that app-server clients need for client-side presentation. Exposing this as an experimental typed notification lets opted-in clients consume it without interpreting raw Responses API events. ## What changed - forward `response.metadata.openai_chatgpt_moderation_metadata` from Responses API SSE and WebSocket streams as turn-scoped moderation metadata - emit the experimental app-server v2 `turn/moderationMetadata` notification with `{ threadId, turnId, metadata }` - add app-server integration coverage for the typed moderation metadata notification ## Testing - `just test -p codex-core build_ws_client_metadata_includes_window_lineage_and_turn_metadata` - `just test -p codex-core` (fails locally: 46 failures and 1 timeout, primarily missing `test_stdio_server` and shell snapshot timeouts) - `just test -p codex-app-server-protocol` - `just test -p codex-app-server turn_moderation_metadata_emits_typed_notification_v2` - `just test -p codex-app-server` (fails locally: 792 passed, 10 failed, and 5 timed out; failures are in existing environment-sensitive tests, primarily because nested macOS `sandbox-exec` is not permitted) - `just write-app-server-schema --experimental --schema-root /tmp/codex-app-server-schema-experimental`
carlc-oai ·
2026-06-05 02:41:06 -07:00 -
[codex] Support model-defined reasoning efforts (#26444)
## Summary - accept non-empty model-defined reasoning effort values while preserving built-in effort behavior - propagate the non-Copy effort type through core, app-server, TUI, telemetry, and persistence call sites - preserve string wire encoding and expose an open-string schema for clients - update model selection and shortcut behavior for model-advertised effort values ## Root cause `ReasoningEffort` gained a string-backed custom variant, so it could no longer implement `Copy` or rely on derived closed-enum serialization. Existing consumers still moved effort values from shared references and assumed a fixed built-in value set. ## Validation - `just fmt` - Local tests and compilation were not run per request; relying on CI.
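The root cause above can be illustrated with a small sketch. The built-in variant names (`Low`, `Medium`, `High`) and the helper names are assumptions for illustration; the log does not list the real built-in set.

```rust
/// Hypothetical effort type: a string-backed variant means the enum can
/// derive `Clone` but no longer `Copy`.
#[derive(Debug, Clone, PartialEq, Eq)]
enum ReasoningEffort {
    Low,
    Medium,
    High,
    /// Model-defined value advertised by the model catalog.
    Custom(String),
}

impl ReasoningEffort {
    /// Accepts any non-empty value; built-in labels keep their variants.
    fn parse(value: &str) -> Option<ReasoningEffort> {
        match value {
            "" => None,
            "low" => Some(ReasoningEffort::Low),
            "medium" => Some(ReasoningEffort::Medium),
            "high" => Some(ReasoningEffort::High),
            other => Some(ReasoningEffort::Custom(other.to_string())),
        }
    }

    /// String wire encoding, unchanged for built-in values.
    fn as_str(&self) -> &str {
        match self {
            ReasoningEffort::Low => "low",
            ReasoningEffort::Medium => "medium",
            ReasoningEffort::High => "high",
            ReasoningEffort::Custom(value) => value.as_str(),
        }
    }
}

/// Call sites that used to copy out of a shared reference now clone.
fn effort_for_request(configured: &ReasoningEffort) -> ReasoningEffort {
    configured.clone()
}
```

The explicit `clone()` in `effort_for_request` is the kind of call-site change the summary refers to when it mentions propagating the non-`Copy` type.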
Ahmed Ibrahim ·
2026-06-04 13:36:24 -07:00 -
feat: show enterprise monthly credit limits in status (#24812)
## Summary Enterprise users can have an effective monthly credit limit, but Codex `/status` currently drops that metadata from the account-usage response. This change adds the optional `spend_control.individual_limit` projection to the existing rate-limit snapshot flow. The backend client reads the monthly limit, app-server exposes it as `individualLimit`, and the TUI renders a `Monthly credit limit` row through the existing progress-bar renderer. When the backend does not return an effective monthly limit, existing rate-limit behavior is unchanged. ## Existing backend state The account-usage backend already returns the effective monthly limit and current usage together: ```json { "spend_control": { "reached": false, "individual_limit": { "limit": "25000", "used": "8000", "remaining": "17000", "used_percent": 32, "remaining_percent": 68, "reset_after_seconds": 86400, "reset_at": 1778137680 } } } ``` Before this change, Codex projected rolling `primary` and `secondary` windows plus `credits`. It ignored `spend_control.individual_limit`, so app-server clients and `/status` could not render the monthly cap. The updated flow is: ```text account usage backend -> backend-client reads spend_control.individual_limit -> existing rate-limit snapshot carries optional individual_limit -> app-server exposes optional individualLimit -> TUI renders Monthly credit limit ``` ## App-server contract `account/rateLimits/read` and sparse `account/rateLimits/updated` notifications now include an additive nullable `rateLimits.individualLimit` field: ```json { "individualLimit": { "limit": "25000", "used": "8000", "remainingPercent": 68, "resetsAt": 1778137680 } } ``` In an `account/rateLimits/read` response, `null` means no monthly limit is available. `account/rateLimits/updated` remains a sparse rolling notification: clients merge available values into their most recent `account/rateLimits/read` snapshot or refetch. 
Nullable account metadata in a rolling notification does not clear a previously observed value. ## Design decisions - Extend the existing rate-limit snapshot instead of introducing a separate request or wire-level update protocol. - Keep the Codex projection narrow: `/status` needs the effective limit, current usage, remaining percentage, and reset timestamp. - Render the monthly row through the existing progress-bar renderer, with one optional detail line for `8,000 of 25,000 credits used`. - Keep the backend response optional so existing accounts and older usage states preserve their current behavior. - Preserve cached monthly metadata when sparse rolling notifications omit it. Live account-usage reads remain authoritative and can clear a removed limit. ## Visual evidence ```text Monthly credit limit: [██████████████░░░░░░] 68% left (resets 07:08 on 7 May) 8,000 of 25,000 credits used ``` Snapshot: `codex-rs/tui/src/status/snapshots/codex_tui__status__tests__status_snapshot_includes_enterprise_monthly_credit_limit.snap` ## Testing Tests: generated app-server schema verification, protocol tests, backend-client tests, app-server integration coverage, TUI snapshot coverage, formatting, and workspace lint cleanup.
efrazer-oai ·
2026-06-01 21:25:42 -07:00 -
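For the monthly credit limit entry above (#24812), the client-side merge rule can be sketched as follows. The struct and function names are hypothetical; only the field meanings and the example numbers (limit 25000, used 8000, 68% remaining) come from the PR text.

```rust
/// Hypothetical client-side copy of `rateLimits.individualLimit`.
#[derive(Debug, Clone, PartialEq)]
struct IndividualLimit {
    limit: String,
    used: String,
    remaining_percent: u8,
    resets_at: u64,
}

/// Hypothetical cached snapshot; other rate-limit fields are omitted here.
#[derive(Debug, Clone, Default, PartialEq)]
struct RateLimits {
    individual_limit: Option<IndividualLimit>,
}

/// A full `account/rateLimits/read` is authoritative: `None` clears the limit.
fn apply_read(cache: &mut RateLimits, read: RateLimits) {
    *cache = read;
}

/// A sparse `account/rateLimits/updated` only overwrites values it carries.
fn apply_update(cache: &mut RateLimits, update: RateLimits) {
    if let Some(limit) = update.individual_limit {
        cache.individual_limit = Some(limit);
    }
}

/// Recomputes the remaining percentage from the decimal-string fields.
fn remaining_percent(limit: &str, used: &str) -> Option<u8> {
    let limit: u64 = limit.parse().ok()?;
    let used: u64 = used.parse().ok()?;
    if limit == 0 {
        return None;
    }
    let remaining = limit.saturating_sub(used);
    // Widen before multiplying so very large limits cannot overflow.
    Some((u128::from(remaining) * 100 / u128::from(limit)) as u8)
}
```

Separating `apply_read` from `apply_update` makes the asymmetry explicit: only a live read can remove a previously observed limit.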
store and expose parent_thread_id on Threads (#25113)
## Why This PR https://github.com/openai/codex/pull/24161#discussion_r3325692763 revealed a subagent data modeling issue, where we overloaded `forked_from_id` to also mean `parent_thread_id`. That's incorrect since guardian and review subagents can be a subagent and NOT fork the main thread's history. The solution here is to explicitly store a new `parent_thread_id` on `SessionMeta`, alongside `forked_from_id` which already exists. While we're at it, also expose it in the app-server protocol on the `Thread` object. A thread->subagent relationship and a fork of thread history are orthogonal concepts. ## What Changed - Added top-level `parent_thread_id` persistence on `SessionMeta` and runtime/session plumbing through `SessionConfiguredEvent`, `CodexSpawnArgs`, `SessionConfiguration`, `ThreadConfigSnapshot`, `TurnContext`, and `ModelClient`. - Made turn metadata, request headers, analytics, and subagent-start events read the separate runtime/top-level parent field instead of deriving general parent lineage from `SessionSource` or `forked_from_thread_id`. - Passed parent lineage separately at delegated subagent, review, guardian, agent-job, and multi-agent spawn construction sites; copied-history fork lineage remains derived only from `InitialHistory`. - Persisted and exposed parent lineage through rollout/thread-store projections and app-server v2 `Thread.parentThreadId`. - Updated app-server README text and regenerated app-server schema fixtures for the additive `parentThreadId` response field.
Owen Lin ·
2026-06-01 04:33:20 +00:00 -
Add cloud-managed config layer support (#24620)
## Summary PR 3 of 5 in the cloud-managed config client stack. Adds enterprise-managed cloud config as a first-class config layer source. The layer metadata is preserved through config loading, diagnostics, debug output, hook attribution, and app-server protocol surfaces. ## Details - Enterprise-managed config becomes a normal config layer source with backend-supplied `id` and display `name` attached for provenance. - These layers are designed to behave like non-file managed config: they can surface syntax/type diagnostics by layer name even though there is no physical config file. - Relative path settings are resolved from a stored config base so cloud-delivered config remains consistent with existing MDM-delivered config semantics. - Hook attribution distinguishes config-delivered hooks from requirements-delivered hooks via `HookSource::CloudManagedConfig`. - This remains pull-based and snapshot-oriented; the PR adds layer identity/diagnostics, not dynamic reload behavior. ## Validation Validated through the targeted stack checks after rebasing onto current `main`: - Rust crate tests for config/hooks/cloud-config/backend-client/app-server-protocol - Filtered `codex-core` and `codex-app-server` `cloud_config_bundle` tests - Python generated-file contract test - `cargo shear --deny-warnings` - Targeted `argument-comment-lint` for config/hooks
joeflorencio-openai ·
2026-05-31 15:54:31 -07:00 -
[codex] Add user input client ids (#24653)
## Summary Adds an optional `clientId` field to app-server v2 `UserInput` and carries it through the core `UserInput` model so clients can correlate echoed user input items without relying on payload equality. ## Details - Adds `client_id: Option<String>` to core `UserInput` variants. - Exposes the v2 app-server field as `clientId` on the wire and in generated TypeScript. - Preserves the id when converting between app-server v2 and core protocol types. - Regenerates app-server schema fixtures. ## Validation - `just fmt` - `just write-app-server-schema` - `cargo test -p codex-app-server-protocol` - `cargo test -p codex-protocol` - `just fix -p codex-app-server-protocol` - `just fix -p codex-protocol` - `git diff --check`
Alexi Christakis ·
2026-05-28 14:54:39 -07:00 -
Restore legacy image detail values (#24644)
## Why Older persisted rollouts can contain `input_image.detail` values of `auto` or `low` from before `ImageDetail` was narrowed to `high`/`original`. Current deserialization rejects those values, which can make resume skip later compacted checkpoints and reconstruct an oversized raw suffix before the next compaction attempt. Confirmed Sentry reports fixed by this compatibility path: - [CODEX-1H3F](https://openai.sentry.io/issues/7500642496/) - [CODEX-1H6N](https://openai.sentry.io/issues/7501025347/) - [CODEX-1JDP](https://openai.sentry.io/issues/7504549065/) - [CODEX-1HW6](https://openai.sentry.io/issues/7503407986/) ## Background [openai/codex#20693](https://github.com/openai/codex/pull/20693) added image-detail plumbing for app-server `UserInput` so input images could explicitly request `detail: original`. The Slack discussion behind that PR was about ScreenSpot / bridge evals where user input images were resized, while tool output images already had MCP/code-mode ways to request image detail. In review, the intended new API surface was narrowed to `high` and `original`: default to `high`, allow `original` when callers need unchanged image handling, and avoid encouraging new `auto` or `low` usage. That policy still makes sense for newly emitted values. The missing compatibility piece is persisted history. Older rollouts can already contain `auto` and `low`, and resume reconstructs typed history by deserializing those rollout records. Rejecting old values at that boundary causes valid compacted checkpoints to be skipped. This PR restores `auto` and `low` as real variants so old records deserialize and round-trip without being rewritten as `high`, while product paths can continue to default to `high` and avoid emitting `auto` for new behavior. ## What changed - Restored `ImageDetail::Auto` and `ImageDetail::Low` as first-class protocol values. 
- Preserved `auto`/`low` through rollout deserialization, MCP image metadata, code-mode image output, and schema/type generation. - Kept local image byte handling conservative: only `original` switches to original-resolution loading; `auto`/`low`/`high` continue through the resize-to-fit path while retaining their detail value. - Added regression coverage for enum round-tripping and code-mode `low` detail handling. ## Testing - `just write-app-server-schema` - `just test -p codex-protocol` - `just test -p codex-tools` - `just test -p codex-code-mode` - `just test -p codex-app-server-protocol` - `just test -p codex-core suite::rmcp_client::stdio_image_responses_preserve_original_detail_metadata` - `just test -p codex-core suite::code_mode::code_mode_can_use_mcp_image_result_with_image_helper` - Loaded broken rollouts on local fixed builds, and started/completed new turns. I also attempted `just test -p codex-core`; the local broad run did not finish green: 2559 tests run, 2467 passed, 55 flaky, 91 failed, 1 timed out. The failures were broad timeout/deadline failures across unrelated areas; targeted changed-path core tests above passed.
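The compatibility behavior described above can be sketched as a four-variant enum whose legacy values round-trip unchanged. The method names here are hypothetical, and the real type presumably relies on derived serialization rather than hand-written parsing.

```rust
/// Hypothetical mirror of the restored image-detail values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ImageDetail {
    Auto,
    Low,
    High,
    Original,
}

impl ImageDetail {
    /// Accepts legacy `auto`/`low` so older rollout records still deserialize.
    fn parse(value: &str) -> Option<ImageDetail> {
        match value {
            "auto" => Some(ImageDetail::Auto),
            "low" => Some(ImageDetail::Low),
            "high" => Some(ImageDetail::High),
            "original" => Some(ImageDetail::Original),
            _ => None,
        }
    }

    /// Writes the value back unchanged, so `auto` is not rewritten as `high`.
    fn as_str(self) -> &'static str {
        match self {
            ImageDetail::Auto => "auto",
            ImageDetail::Low => "low",
            ImageDetail::High => "high",
            ImageDetail::Original => "original",
        }
    }

    /// Only `original` switches local images to original-resolution loading.
    fn loads_original_resolution(self) -> bool {
        matches!(self, ImageDetail::Original)
    }
}

/// Omitted detail stays on the existing `high` default.
fn effective_detail(requested: Option<ImageDetail>) -> ImageDetail {
    requested.unwrap_or(ImageDetail::High)
}
```

Keeping the legacy values as real variants, rather than mapping them to `high` at parse time, is what lets old records round-trip without modification.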
rhan-oai ·
2026-05-26 16:24:33 -07:00 -
app-server: drop legacy profile config surface (#24067)
## Why Legacy `[profiles.<name>]` config tables and the legacy `profile` selector are being retired in favor of profile files selected with `--profile <name>`. After #23886 removed the CLI-side legacy profile plumbing, the app-server config surface still exposed those fields and still carried conversion code for the old protocol shape. ## What changed - Remove `profile`, `profiles`, and `ProfileV2` from the app-server config protocol/schema output so `config/read` no longer returns legacy profile config. - Drop the old v1 `UserSavedConfig` profile conversion path from `config`. - Reject new app-server config writes under `profiles.*` with the same migration direction used for `profile`, while still allowing callers to clear existing legacy profile tables. - Refresh app-server config coverage and the experimental API README example around the remaining `Config` nesting path. ## Verification - Added config-manager coverage that `config/read` omits legacy profile config, `profiles.*` writes are rejected, and existing legacy profile tables can still be cleared. - Updated the v2 config RPC test to cover the rejected `profiles.*` batch-write path.
jif-oai ·
2026-05-22 19:41:39 +02:00 -
[codex] Add rollout-backed thread content search (#23519)
## Summary - add experimental `thread/search` for local rollout-backed thread search using `rg` over JSONL rollouts - return search-specific result rows with optional previews instead of storing preview data on `StoredThread` or ordinary `Thread` responses - keep `thread/list` separate from full-content search and document the new app-server surface ## Testing - `cargo test -p codex-app-server-protocol` - `cargo test -p codex-app-server thread_search_returns_content_and_title_matches -- --nocapture`
Francis Chalissery ·
2026-05-21 11:52:24 -07:00 -
[codex] Add plugin id to MCP tool call items (#23737)
Add owning plugin id to MCP tool call items so we can better filter them at plugin level. ## Summary - add optional `plugin_id` to MCP tool-call items and legacy begin/end events - propagate plugin metadata into emitted core items and app-server v2 `ThreadItem::McpToolCall` - preserve plugin ids through app-server replay/redaction paths and regenerate v2 schema fixtures ## Testing - `just write-app-server-schema` - `just fmt` - `just fix -p codex-core` - `cargo test -p codex-protocol -p codex-app-server-protocol` - `cargo test -p codex-app-server-protocol` - `cargo test -p codex-core mcp_tool_call_item_includes_plugin_id --lib` - `cargo check -p codex-tui --tests` - `cargo check -p codex-app-server --tests` - `git diff --check` ## Notes - `just fix -p codex-core` completed with two non-fatal `too_many_arguments` warnings on the touched MCP notification helpers. - A broader `cargo test -p codex-core` run passed core unit tests, then hit shell/sandbox/snapshot failures in the integration target. - A broader app-server downstream run hit the existing `in_process::tests::in_process_start_clamps_zero_channel_capacity` stack overflow; `cargo test -p codex-exec` also hit the existing sandbox expectation mismatch in `thread_lifecycle_params_include_legacy_sandbox_when_no_active_profile`.
Matthew Zeng ·
2026-05-20 17:02:10 -07:00 -
Add SubagentStop hook (#22873)
# What <img width="1792" height="1024" alt="image" src="https://github.com/user-attachments/assets/8f81d232-5813-4994-a61d-e42a05a93a3e" /> `SubagentStop` runs when a thread-spawned subagent turn is about to finish. Thread-spawned subagents use `SubagentStop` instead of the normal root-agent `Stop` hook. Configured handlers match on `agent_type`. Hook input includes the normal stop fields plus: - `agent_id`: the child thread id. - `agent_type`: the resolved subagent type. - `agent_transcript_path`: the child subagent transcript path. - `transcript_path`: the parent thread transcript path. - `last_assistant_message`: the final assistant message from the child turn, when available. - `stop_hook_active`: `true` when the child is already continuing because an earlier stop-like hook blocked completion. `SubagentStop` shares the same completion-control semantics as `Stop`, scoped to the child turn: - No decision allows the child turn to finish. - `decision: "block"` with a non-empty `reason` records that reason as hook feedback and continues the child with that prompt. - `continue: false` stops the child turn. If `stopReason` is present, Codex surfaces it as the stop reason. # Lifecycle Scope Only thread-spawned subagents run `SubagentStop`. Internal/system subagents such as Review, Compact, MemoryConsolidation, and Other do not run normal `Stop` hooks and do not run `SubagentStop`. This avoids exposing synthetic matcher labels for internal implementation paths. # Stack 1. #22782: add `SubagentStart`. 2. This PR: add `SubagentStop`. 3. #22882: add subagent identity to normal hook inputs.
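The completion-control semantics listed above can be sketched as a small decision function. `HookOutput`, `Outcome`, and `evaluate` are hypothetical names. The sketch also assumes that `continue: false` takes precedence over `decision: "block"` and that a block with an empty reason is ignored; the PR text does not specify either point.

```rust
/// Hypothetical subset of a `SubagentStop` hook's output.
#[derive(Debug, Default)]
struct HookOutput {
    decision: Option<String>,
    reason: Option<String>,
    /// Models the `continue` field (`continue` is a Rust keyword).
    continue_turn: Option<bool>,
    stop_reason: Option<String>,
}

#[derive(Debug, PartialEq, Eq)]
enum Outcome {
    /// No decision: the child turn finishes normally.
    Finish,
    /// `decision: "block"`: continue the child with the reason as its prompt.
    ContinueWithPrompt(String),
    /// `continue: false`: stop the child turn, optionally with a stop reason.
    Stop(Option<String>),
}

fn evaluate(output: &HookOutput) -> Outcome {
    // Assumption: an explicit stop wins over a block decision.
    if output.continue_turn == Some(false) {
        return Outcome::Stop(output.stop_reason.clone());
    }
    if output.decision.as_deref() == Some("block") {
        // A block only takes effect with a non-empty reason.
        if let Some(reason) = output.reason.as_deref().filter(|r| !r.trim().is_empty()) {
            return Outcome::ContinueWithPrompt(reason.to_string());
        }
    }
    Outcome::Finish
}
```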
Abhinav ·
2026-05-20 14:59:41 -07:00 -
feat(permissions): resolve permission profile inheritance (#22270)
## Stack This is the foundation PR for the permission-profile inheritance stack. - This PR adds config-level `extends` resolution and merge semantics. - Follow-up: #23705 applies resolved profiles at runtime and updates the active-profile protocol surfaces. ## Why Permission profiles are starting to carry enough policy that copy-pasting near-identical definitions becomes hard to review and easy to drift. Before the runtime can consume inherited profiles, the config layer needs one explicit resolver that can merge parent chains and reject unsafe or invalid inheritance shapes. ## What changed - Add `extends` to permission-profile TOML and resolve parent chains in inheritance order. - Merge inherited profile TOML with the existing config merge behavior while preserving the permission-specific normalization needed for network domain keys. - Keep parent descriptions out of resolved child profiles and record inherited profile names separately for downstream consumers. - Reject undefined parents, unsupported built-in parents, and inheritance cycles with targeted errors. - Cover resolver behavior with TOML fixture tests and refresh the generated config schema. ## Validation - `cargo test -p codex-config` - `cargo test -p codex-core permissions_profiles_`
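Resolving an `extends` chain with the rejection cases named above can be sketched like this. The map-based input, `ResolveError`, and `resolve_chain` are hypothetical simplifications; the sketch covers undefined parents and cycles only, not built-in parents or the TOML merge itself.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq)]
enum ResolveError {
    /// A profile or parent name that is not defined anywhere.
    UndefinedProfile(String),
    /// The chain revisited this profile.
    Cycle(String),
}

/// `extends` maps each defined profile name to its optional parent name.
/// Returns the chain in inheritance order: root ancestor first, then each
/// descendant, ending with `profile` itself.
fn resolve_chain(
    extends: &HashMap<String, Option<String>>,
    profile: &str,
) -> Result<Vec<String>, ResolveError> {
    let mut chain: Vec<String> = Vec::new();
    let mut current = profile.to_string();
    loop {
        if chain.contains(&current) {
            return Err(ResolveError::Cycle(current));
        }
        let parent = extends
            .get(&current)
            .ok_or_else(|| ResolveError::UndefinedProfile(current.clone()))?;
        chain.push(current.clone());
        match parent {
            Some(next) => current = next.clone(),
            None => break,
        }
    }
    // Collected child-to-root; reverse so parents are merged before children.
    chain.reverse();
    Ok(chain)
}
```

Merging the returned chain front to back would let each child override values inherited from its ancestors.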
viyatb-oai ·
2026-05-20 20:12:07 +00:00 -
Add thread/settings/update app-server API (#23502)
## Why App-server clients need a way to update a thread's next-turn settings without starting a turn, adding transcript content, or waiting for turn lifecycle events. This gives settings UI a direct path for durable thread settings while clients observe the eventual effective state through a notification. This is a simplified rework of PR https://github.com/openai/codex/pull/22509. In particular, it changes the `thread/settings/update` api to return immediately rather than waiting and returning the effective (updated) thread settings. This makes the new api consistent with `turn/start` and greatly reduces the complexity of the implementation relative to the earlier attempt. ## What Changed - Adds experimental `thread/settings/update` with partial-update request fields and an empty acknowledgment response. - Adds experimental `thread/settings/updated`, carrying full effective `ThreadSettings` and scoped by `threadId` to subscribed clients for the affected thread. - Shares durable settings validation with `turn/start`, including `sandboxPolicy` plus `permissions` rejection and `serviceTier: null` clearing. - Emits the same settings notification when `turn/start` overrides change the stored effective thread settings. - Regenerates app-server protocol schema fixtures and updates `app-server/README.md`.
Eric Traut ·
2026-05-20 11:03:20 -07:00 -
Add SubagentStart hook (#22782)
# What `SubagentStart` runs once when Codex creates a thread-spawned subagent, before that child sends its first model request. Thread-spawned subagents use `SubagentStart` instead of the normal root-agent `SessionStart` hook. Configured handlers match on the subagent `agent_type`, using the same value passed to `spawn_agent`. When no agent type is specified, Codex uses the default agent type. Hook input includes the normal session-start fields plus: - `agent_id`: the child thread id. - `agent_type`: the resolved subagent type. `SubagentStart` may return `hookSpecificOutput.additionalContext`. That context is added to the child conversation before the first model request. # Lifecycle Scope Only thread-spawned subagents run `SubagentStart`. Internal/system subagents such as Review, Compact, MemoryConsolidation, and Other do not run normal `SessionStart` hooks and do not run `SubagentStart`. This avoids exposing synthetic matcher labels for internal implementation paths. The `SessionStart` hook also no longer fires for subagents, which matches the behavior of other coding agents' implementations. # Stack 1. This PR: add `SubagentStart`. 2. #22873: add `SubagentStop`. 3. #22882: add subagent identity to normal hook inputs.
Abhinav ·
2026-05-19 12:45:08 -07:00 -
Make `deny` canonical for filesystem permission entries (#23493)
## Why Filesystem permission profiles used `none` for deny-read entries, which is less direct than the action the entry actually represents. This change makes `deny` the canonical filesystem permission spelling while preserving compatibility for older configs that still send `none`. ## What changed - rename `FileSystemAccessMode::None` to `Deny` - serialize and generate schemas with `deny` as the canonical value - retain `none` only as a legacy input alias for temporary config compatibility - update filesystem glob diagnostics and regression coverage to use the canonical spelling - refresh config and app-server schema fixtures to match the new wire shape ## Validation - `cargo test -p codex-protocol` - `cargo test -p codex-app-server-protocol` - `cargo test -p codex-core config_toml_deserializes_permission_profiles --lib` - `cargo test -p codex-core read_write_glob_patterns_still_reject_non_subpath_globs --lib` Earlier in the session, a broad `cargo test -p codex-core` run reached unrelated pre-existing failures in timing/snapshot/git-info tests under this environment; the targeted surfaces touched by this PR passed cleanly.
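The canonical-plus-alias behavior described above can be sketched as follows. The `Read` and `Write` variants and the hand-written `parse`/`as_str` helpers are assumptions for illustration; only the `None` to `Deny` rename and the legacy `none` input alias come from the PR text.

```rust
/// Hypothetical mirror of the access-mode enum after the rename.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FileSystemAccessMode {
    Read,
    Write,
    Deny,
}

impl FileSystemAccessMode {
    /// Accepts `none` as a legacy input alias for `deny`.
    fn parse(value: &str) -> Option<FileSystemAccessMode> {
        match value {
            "read" => Some(FileSystemAccessMode::Read),
            "write" => Some(FileSystemAccessMode::Write),
            "deny" | "none" => Some(FileSystemAccessMode::Deny),
            _ => None,
        }
    }

    /// Always emits the canonical spelling, never the legacy alias.
    fn as_str(self) -> &'static str {
        match self {
            FileSystemAccessMode::Read => "read",
            FileSystemAccessMode::Write => "write",
            FileSystemAccessMode::Deny => "deny",
        }
    }
}
```

Because the alias exists only on the input side, an older config that says `none` is read correctly and written back as `deny`.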
viyatb-oai ·
2026-05-19 11:03:47 -07:00 -
goal: pause continuation loops on usage limits and blockers (#23094)
Addresses #22833, #22245, #23067

## Why

`/goal` can keep synthesizing turns even when the next turn cannot make meaningful progress. Hard usage exhaustion can replay failing turns, and repeated permission or external-resource blockers can keep burning tokens while waiting for user or system intervention.

## What changed

- Add resumable `blocked` and `usageLimited` goal states. As with `paused`, goal continuation stops in these states.
- Move to `usageLimited` after usage-limit failures.
- Allow the built-in `update_goal` tool to set `blocked` only under explicit repeated-impasse guidance. The goal continuation prompt now specifies that the agent should use `blocked` only when it has made at least three attempts to get past an impasse.

Most of the files touched by this PR changed because of the small app-server protocol update.

## Validation

I manually reproduced a number of situations where an agent can run into a true impasse and verified that it properly enters the `blocked` state. I then resumed and verified that it entered the `blocked` state again several turns later if the impasse still existed.

I also manually reproduced the usage-limit condition by creating a simulated Responses API endpoint that returns 429 errors with the appropriate error message. Verified that the goal runtime properly moves the goal into the `usageLimited` state and that the TUI updates appropriately. Verified that `/goal resume` resumes (and immediately goes back into the `usageLimited` state if appropriate).

## Follow-up PRs

Small changes will be needed in the GUI clients to properly handle the two new states.
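A minimal sketch of the state handling described in this PR, under stated assumptions: the type and function names (`GoalStatus`, `TurnOutcome`, `next_status`, `resume`) and the `Active` state are hypothetical. In the PR the three-attempt rule is prompt guidance for the model, so expressing it as a code check here is purely illustrative.

```rust
/// `Paused`, `Blocked`, and `UsageLimited` mirror states named in the
/// description; `Active` is an assumed name for the running state.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GoalStatus {
    Active,
    Paused,
    Blocked,
    UsageLimited,
}

impl GoalStatus {
    /// Continuation turns are only synthesized while the goal is active.
    pub fn allows_continuation(self) -> bool {
        matches!(self, GoalStatus::Active)
    }
}

/// Hypothetical summary of how a turn ended.
pub enum TurnOutcome {
    Completed,
    UsageLimitReached,
    /// The agent reports an impasse after `attempts` tries to get past it.
    ReportedImpasse { attempts: u32 },
}

/// Illustrative threshold taken from the "at least three attempts" guidance.
pub const MIN_ATTEMPTS_BEFORE_BLOCKED: u32 = 3;

/// Compute the next goal status after a turn.
pub fn next_status(current: GoalStatus, outcome: TurnOutcome) -> GoalStatus {
    match outcome {
        TurnOutcome::UsageLimitReached => GoalStatus::UsageLimited,
        TurnOutcome::ReportedImpasse { attempts } if attempts >= MIN_ATTEMPTS_BEFORE_BLOCKED => {
            GoalStatus::Blocked
        }
        _ => current,
    }
}

/// All three stopped states are resumable; resuming returns to `Active`.
pub fn resume(status: GoalStatus) -> GoalStatus {
    match status {
        GoalStatus::Paused | GoalStatus::Blocked | GoalStatus::UsageLimited => GoalStatus::Active,
        GoalStatus::Active => GoalStatus::Active,
    }
}
```

If the underlying condition persists after a resume, the next failing turn moves the goal straight back into the stopped state, which matches the behavior reported in the validation notes.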
Eric Traut ·
2026-05-18 11:28:53 -07:00 -
Preserve image detail in app-server inputs (#20693)
## Summary

- Add optional image detail to user image inputs across core, app-server v2, thread history/event mapping, and the generated app-server schemas/types.
- Preserve requested detail when serializing Responses image inputs: omitted detail stays on the existing `high` default, while explicit `original` keeps local images on the original-resolution path.
- Support `high`/`original` consistently for tool image outputs, including MCP `codex/imageDetail`, code-mode image helpers, and `view_image`.
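The defaulting rule in the second bullet can be sketched as follows. The names `ImageDetail`, `resolve_detail`, and `wire_value` are hypothetical and chosen only for this example; the sketch assumes the two detail values the summary mentions, `high` and `original`.

```rust
/// The two detail levels named in the summary.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ImageDetail {
    High,
    Original,
}

impl ImageDetail {
    /// Lowercase spelling as it appears in the summary.
    pub fn wire_value(self) -> &'static str {
        match self {
            ImageDetail::High => "high",
            ImageDetail::Original => "original",
        }
    }
}

/// An omitted detail stays on the existing `high` default;
/// an explicit request is preserved unchanged.
pub fn resolve_detail(requested: Option<ImageDetail>) -> ImageDetail {
    requested.unwrap_or(ImageDetail::High)
}
```

Modelling the input as `Option<ImageDetail>` keeps "not specified" distinct from an explicit `high`, which lets the default be applied at a single point during serialization.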
Curtis 'Fjord' Hawthorne ·
2026-05-15 15:04:04 -07:00 -
feat(app-server): update remote control APIs for better UX (#22877)
## Why

To help improve the `codex remote-control` CLI UX, which I plan to do in a follow-up, this PR adds `server-name` to the various remote control APIs:

- `remoteControl/enable`
- `remoteControl/disable`
- `remoteControl/status/changed`

It also adds a `remoteControl/status/read` API, which will be helpful in the Codex App.
Owen Lin ·
2026-05-15 14:33:24 -07:00 -
feat: Use installation ID in remote enrollments (#21662)
* Pass installation ID for storage on enrollments server for deduping/grouping multiple appservers per installation
* Pass installation ID in remoteControl/status/changed events
David de Regt ·
2026-05-08 17:54:01 +00:00