164 Commits

  • feat(app-server): add history_mode to thread (#29927)
    ## Description
    
    This PR adds a new `historyMode = "legacy" | "paginated"` to `Thread`.
    This will be stored in `SessionMeta` in the JSONL rollout file and as a
    new column in the SQLite thread_metadata table, and exposed on
    `thread/start` and on the `Thread` object in app-server.
    
    ## What changed
    
    - Added canonical `ThreadHistoryMode` with `legacy` and `paginated`,
    defaulting old and new SessionMeta to `legacy`.
    - Carried `history_mode` through core session config, ThreadStore stored
    metadata, local/in-memory stores, rollout metadata extraction, and the
    existing SQLite `threads` table.
    - Added experimental `historyMode` to app-server v2 `Thread` and
    `thread/start`.
    - Made paginated stored threads metadata-discoverable but unsupported
    for legacy full-history reads, `load_history`, live resume, and create
    paths.
    - Regenerated app-server schema fixtures and added
    protocol/state/thread-store/app-server coverage for persistence and
    fail-closed behavior.
    
    ## Compatibility floor
    Because users may be running various versions of Codex binaries on the
    same machine (TUI, Codex App, etc.), we will need to establish a
    compatibility floor for upcoming paginated threads, which will change
    how thread storage reads and writes work.
    
    The overall plan here:
    ```
    Release N:
    - Add historyMode to SessionMeta / Thread / SQLite metadata.
    - Teach binaries to understand paginated threads.
    - If a binary sees `historyMode="paginated"` but does not support the paginated contract, it refuses to resume/mutate the thread.
    - Default remains `"legacy"`.
    
    Release N+1:
    - First-party clients start opting into paginated threads where appropriate.
    - Internal dogfood / staged rollout.
    - Measure old-client usage and paginated-thread unsupported errors.
    
    Release N+2:
    - Only after Release N+ is overwhelmingly deployed, make paginated the default.
    - Accept that a small tail of N-1-or-older binaries may not understand paginated threads.
    ```
    
    The important behavior change is fail-closed handling for a binary that
    encounters a persisted `paginated` thread before it knows how to fully
    support paginated history. In app-server, if a thread is `paginated`, we
    will:
    
    - allow metadata-only discovery paths like `thread/list` and
    `thread/read(includeTurns=false)`, so clients can still see the thread
    and inspect its `historyMode`
    - reject legacy full-history/live-thread paths like
    `thread/read(includeTurns=true)` and `thread/resume` with an unsupported
    JSON-RPC error
    - avoid silently treating an unknown or future `historyMode` as `legacy`
    
    Under the hood, the ThreadStore layer also rejects legacy operations
    that would need to load or replay the full thread history for a
    paginated thread. That gives us the behavior we want for Release N:
    future paginated threads are visible, but this binary fails closed
    instead of trying to operate on them as if they were legacy threads.
  • Expose MCP app identity in app context (#29934)
    ## Why
    
    MCP tool-call events need to expose trusted app identity and action
    metadata directly so v2 clients do not have to infer it from tool names
    or resource URIs.
    
    ## What changed
    
    - Add optional `appName`, `templateId`, and `actionName` fields to MCP
    tool-call `appContext`.
    - Populate `appName` and `templateId` from trusted Codex Apps metadata,
    and derive `actionName` from the trusted app resource metadata.
    - Preserve all three fields through core events, legacy protocol events,
    persisted thread history, resume redaction, and app-server v2 responses.
    - Document the public `appContext` fields in
    `codex-rs/app-server/README.md`.
    - Regenerate app-server JSON and TypeScript schemas and add coverage for
    serialization, persistence, redaction, and metadata propagation.
    
    ## Validation
    
    - `just test -p codex-app-server-protocol mcp_tool_call`
    - `just test -p codex-core
    mcp_tool_call_item_metadata_only_trusts_codex_apps_identity
    mcp_tool_call_item_includes_app_identity`
    - `just write-app-server-schema`
    
    ---------
    
    Co-authored-by: Martin Au-Yeung <280153141+martinauyeung-oai@users.noreply.github.com>
  • [codex] Surface MCP reauthentication-required startup failures (#29877)
    ## Summary
    
    - distinguish expired, non-refreshable stored MCP OAuth credentials from
    first-time missing credentials
    - carry a typed `failureReason: "reauthenticationRequired"` on the
    existing `mcpServer/startupStatus/updated` notification only when user
    action is required
    - keep the public MCP auth-status API unchanged and regenerate the
    app-server protocol schemas and documentation
    
    ## Why
    
    An MCP server with an expired access token and no usable refresh token
    currently fails startup without giving clients a reliable, typed
    recovery signal.
    
    The existing startup-status notification is the natural place to carry
    this state. Its nullable `failureReason` keeps the recovery reason
    attached to the failed startup transition without adding a one-off
    notification. Internally, Codex distinguishes first-time login from
    reauthentication and emits the reason only when the startup error itself
    requires authentication.
    
    ## User impact
    
    App clients can prompt an existing user to reconnect an MCP server when
    automatic recovery is impossible by handling a failed
    `mcpServer/startupStatus/updated` notification whose `failureReason` is
    `reauthenticationRequired`. Starting, ready, cancelled, unrelated
    failures, and first-time setup carry no reauthentication reason.
    
    ## Companion app PR
    
    - openai/openai#1069582
    
    ## Validation
    
    - `just test -p codex-app-server-protocol` — 248 passed; schema fixture
    tests passed
    - `cargo check -p codex-app-server -p codex-tui`
    - `just test -p codex-rmcp-client -p codex-mcp` — 184 passed, 2 skipped
    - `just test -p codex-protocol -p codex-app-server-protocol -p
    codex-mcp` — 579 passed
    - `just write-app-server-schema`
    - `just fmt`
  • Support OAuth for HTTP MCP servers from selected executor plugins (#28529)
    ## Why
    
    #28522 routes selected-plugin HTTP MCP traffic through the owning
    executor, but OAuth bootstrap and refresh still used host-local clients.
    Executor-only servers therefore cannot complete discovery or login
    through the same network boundary as the MCP connection.
    
    ## What changed
    
    - adapt `codex_exec_server::HttpClient` to RMCP 1.8's `OAuthHttpClient`
    contract
    - let RMCP own discovery, dynamic registration, PKCE, token exchange,
    and refresh
    - route auth status, persisted-token startup, and app-server login
    through the server runtime while preserving the existing local discovery
    path
    - add optional `threadId` to `mcpServer/oauth/login` and echo it in the
    completion notification
    - implement RMCP's redirect policy and 1 MiB OAuth response limit over
    executor HTTP
    - cover selected-thread OAuth discovery and login through an
    executor-only route
    
    Depends on #28522.
  • [apps] Thread structured icon assets through app list (#29889)
    ## Summary
    
    - Add `iconAssets` and `iconDarkAssets` to the app-list protocol.
    - Preserve structured icons through directory merging and the connector,
    app-
      server, and TUI boundaries.
    - Keep legacy logo URLs unchanged as compatibility fallbacks.
    - Update generated protocol schemas and TypeScript types.
  • [codex] rename rollout budget error to session budget error (#29744)
    ## Summary
    
    - rename the rollout-budget exhaustion error from
    `RolloutBudgetExceeded` to `SessionBudgetExceeded`
    - expose the matching app-server v2 wire value as
    `sessionBudgetExceeded`
    - regenerate JSON/TypeScript schema fixtures and update the app-server
    docs and focused tests
    
    This is a naming-only follow-up to #29715 based on [Pavel's review
    suggestion](https://github.com/openai/codex/pull/29715#discussion_r3463183480).
    Runtime behavior is unchanged.
    
    ## Tests
    
    - `just test -p codex-core rollout_budget`
    - `just test -p codex-app-server-protocol`
    - `just fmt`
    - `just write-app-server-schema`
  • [codex] surface rollout budget exhaustion (#29715)
    ## Summary
    - surface shared rollout-budget exhaustion as
    `CodexErr::RolloutBudgetExceeded` instead of a generic interrupted turn
    - map it through the existing `CodexErrorInfo` and app-server v2
    `codexErrorInfo` path
    - keep local compaction from retrying after the shared rollout budget is
    exhausted
    
    This gives app-server clients a stable `rolloutBudgetExceeded` error
    they can classify without guessing from `status="interrupted"`.
    
    ## Tests
    - `just test -p codex-core rollout_budget`
  • core: resolve view_image paths in selected environment (#29526)
    ## Why
    
    view_image needs to support foreign OS remote executors.
    
    ## What
    
    - resolve image paths against the selected environment as `PathUri` and
    read them through that environment's filesystem
    - keep app-server's public path field wire-compatible as
    `LegacyAppPathString`, with purpose-specific UI rendering
    - cover relative and absolute target-native paths in the core
    integration test and run the full `view_image` suite under wine-exec
    without skips
  • core: add extra metadata field to Thread struct (#29675)
    # Summary
    
    Adds a field Thread.extras that can be used to hold arbitrary metadata
    specific to a given thread.
  • chore(core) rm AskForApproval::OnFailure (#28418)
    ## Summary
    Deletes the OnFailure variant of the `AskForApproval` enum. This option
    has been deprecated since #11631.
    
    ## Testing
    - [x] Tests pass
  • app-server: document thread and turn IDs are UUID7 (#27714)
    It's actually a very nice property that these are UUID7s, so documenting
    them so we think twice before changing it away from UUID7s in the
    future.
  • Propagate safety buffering treatment metadata (#29473)
    ## Summary
    
    - read the request-scoped safety-buffering treatment from HTTP response
    headers and per-turn WebSocket metadata through one shared header parser
    - combine that treatment with Responses API safety-buffering signals
    - propagate `showBufferingUi` and nullable `fasterModel` through the
    existing `model/safetyBuffering/updated` app-server notification
    - update the app-server documentation and generated JSON and TypeScript
    schemas
    
    The public implementation contains no model mapping or real model
    identifier. Tests and protocol examples use generic `current-model` and
    `faster-model` placeholders only.
    
    ## Dependencies
    
    - server-side treatment evaluation:
    https://github.com/openai/openai/pull/1060247
    - initial Responses API safety-buffering propagation:
    https://github.com/openai/codex/pull/29371
    - Codex App UI: https://github.com/openai/openai/pull/1057789
    
    ## Validation
    
    - Codex API tests: 129 passed
    - focused Codex core safety-buffering integration test passed
    - app-server protocol tests passed after regenerating schema fixtures
    - Clippy fix and repository formatting completed successfully
    
    The broader app-server run compiled all changed crates and completed
    with 1,269 passing tests. Its remaining failures were unrelated
    environment limitations: macOS sandbox application was denied, one
    expected test binary was unavailable, and several existing subprocess
    tests timed out as a result.
  • Simplify multi-agent mode controls (#29324)
    ## Why
    
    Multi-agent delegation policy was split across `multiAgentMode`,
    `features.multi_agent_mode`, and `usage_hint_enabled`. These controls
    could disagree: a requested mode could be downgraded by the feature
    flag, and disabling usage hints also disabled mode instructions.
    
    Some clients also need multi-agent tools without adding
    delegation-policy text to model context. The previous two-mode API could
    not express that directly.
    
    ## What changed
    
    `multiAgentMode` is now the only live delegation-policy control:
    
    | Mode | Behavior |
    | --- | --- |
    | `none` | Keep multi-agent tools available without adding mode
    instructions. |
    | `explicitRequestOnly` | Only delegate after an explicit user request.
    |
    | `proactive` | Delegate when parallel work materially improves speed or
    quality. |
    
    - new threads default to `explicitRequestOnly`; omitting the mode on
    later turns keeps the current value
    - thread start, resume, fork, and settings responses always report the
    concrete current mode instead of `null`
    - mode selection remains sticky across turns and resume
    - usage-hint text no longer controls whether mode instructions apply
    - `features.multi_agent_mode` and `usage_hint_enabled` remain accepted
    as ignored compatibility settings so existing configs continue to load
    - app-server documentation and generated schemas describe the three-mode
    API
    
    ## Tests
    
    - `just test -p codex-core multi_agent_mode`
    - `just test -p codex-core multi_agent_v2_config_from_feature_table`
    - `just test -p codex-core spawn_agent_description`
    - `just test -p codex-features`
    - `just test -p codex-app-server-protocol`
    - `just test -p codex-app-server multi_agent_mode`
  • Propagate safety buffering events to app-server clients (#29371)
    Responses API safety buffering metadata currently stops at the transport
    boundary, so app-server clients cannot render the in-progress safety
    review state.
    
    This change:
    - decodes and deduplicates `safety_buffering` metadata from Responses
    API SSE and WebSocket events without suppressing the original response
    event
    - emits a typed core event containing the requested model plus backend
    use cases and reasons
    - forwards that event as `turn/safetyBuffering/updated` through
    app-server v2 and updates generated protocol schemas
    - keeps the side-channel event out of persisted rollouts and turn timing
    
    This supports the Codex Apps buffering UX and depends on the Responses
    API backend work in https://github.com/openai/openai/pull/1044569 and
    https://github.com/openai/openai/pull/1044571.
    
    Validation:
    - focused `codex-core` safety-buffering integration test passes
    - `cargo check -p codex-core -p codex-app-server -p
    codex-app-server-protocol`
    - `just fix -p codex-api -p codex-protocol -p codex-core -p
    codex-app-server-protocol -p codex-app-server -p codex-rollout -p
    codex-rollout-trace -p codex-otel`
    - `just fmt`
    - broad package test run: 4,430/4,492 passed; 62 unrelated
    local-environment/concurrency failures involved unavailable test
    binaries, MCP subprocess setup, and app-server timeouts
  • Expose thread-level multi-agent mode (#28792)
    ## Why
    
    Once multi-agent mode can be selected per turn, clients also need to
    choose the initial selection when creating a thread and observe that
    selection through lifecycle and settings APIs.
    
    The selected value is intentionally distinct from the effective
    model-visible value: no client selection is represented as `null`, even
    though an eligible multi-agent v2 turn derives `explicitRequestOnly` as
    its effective default.
    
    ## What changed
    
    - Add the optional experimental `thread/start.multiAgentMode` parameter
    and pass it through thread creation.
    - Preserve an omitted initial value as an unset selection rather than
    eagerly storing `explicitRequestOnly`.
    - Apply an explicit `thread/start` selection to the first turn through
    the session configuration established at thread creation.
    - Restore the latest persisted effective mode as the selected baseline
    on cold resume when rollout history contains one.
    - Inherit the optional selected mode from a loaded parent when creating
    related runtime threads.
    - Return the current selected `multiAgentMode` from `thread/start`,
    `thread/resume`, `thread/fork`, and thread settings, using `null` when
    no mode is selected.
    - Keep lifecycle reporting independent from model capability and feature
    eligibility; core turn construction remains responsible for calculating
    and persisting the effective mode.
    
    ## Not covered
    
    - Clearing an existing loaded-session selection back to unset through
    `turn/start`; omitted or `null` currently retains the session's
    selection.
    - A TUI control, slash command, or `config.toml` preference.
    
    ## Verification
    
    - `CARGO_INCREMENTAL=0 just test -p codex-app-server-protocol`
    - `CARGO_INCREMENTAL=0 just test -p codex-app-server multi_agent_mode`
    
    The focused app-server coverage verifies explicit `thread/start`
    initialization, first-turn prompting, nullable reporting for an omitted
    selection, and retention of selections that are not currently
    runtime-eligible.
    
    ## Stack
    
    Stacked on #28685. This PR contains only the thread initialization and
    lifecycle/settings API layer.
  • Emit Trusted MCP App Identity on Tool-Call Items (#27132)
    ## Summary
    
    - Add optional `appContext` to app-server MCP tool-call items with
    trusted `connectorId`, `linkId`, and `mcpAppResourceUri` metadata.
    - Preserve that context across tool-call events, persisted history,
    reconnects, and thread resume.
    - Keep the deprecated top-level `mcpAppResourceUri` temporarily for
    client migration.
    
    The consumer contract is `{ appContext: { connectorId, linkId,
    mcpAppResourceUri }, tool }`.
    
    ## Validation
    
    - Full GitHub Actions suite passes, including CLA, Bazel tests, clippy,
    release builds, and argument-comment lint.
    
    ---------
    
    Co-authored-by: martinauyeung-oai <280153141+martinauyeung-oai@users.noreply.github.com>
  • unified-exec: retain PathUri in command events (#28780)
    ## Why
    
    App-server must report command events containing foreign-platform paths
    without changing existing client or rollout path-string formats.
    
    ## What changed
    
    - retain `PathUri` through exec command begin/end events
    - convert cwd values to `LegacyAppPathString` at the app-server
    compatibility boundary
    - drop command actions with foreign paths and log them
    - serialize rollout-trace cwd values using their inferred native path
    representation
    - restore Wine coverage for retained Windows cwd values and successful
    completion
  • [codex] Restore thread recency with compatible migration history (#28671)
    ## Summary
    
    - Revert #28655, restoring the thread `recencyAt` behavior introduced by
    #27910.
    - Move `threads_recency_at` to migration 0039 so it no longer collides
    with `external_agent_config_imports` at version 0038.
    - Repair databases that already applied the recency migration as version
    38 by moving the matching migration-history row to version 39 before
    SQLx validation. The current version-38 migration can then apply
    normally.
    
    ## Validation
    
    - `just test -p codex-state
    migrations::tests::repairs_recency_migration_that_was_applied_as_version_38`
    - `just test -p codex-state -p codex-rollout -p codex-thread-store -p
    codex-app-server-protocol -p codex-tui`: 3,439 passed; six TUI tests
    could not open the machine's existing read-only incident database at
    `~/.codex/sqlite/state_5.sqlite`.
    - `just fix -p codex-state`
    - `just fmt`
    - Verified that state migration versions are unique.
  • Revert thread recencyAt for sidebar ordering (#28655)
    ## Why
    
    Revert #27910 to remove the newly introduced thread `recencyAt`
    persistence and API behavior from `main`.
    
    ## What changed
    
    This reverts commit `fac3158c2a783095768076489815f361fa9b0db4`,
    including the state migration, thread-store propagation, app-server API
    surface, generated schemas, and related tests.
    
    ## Validation
    
    Not run before opening; relying on CI for the initial fast signal.
  • Add thread recencyAt for sidebar ordering (#27910)
    ## Summary
    
    Add a server-owned `recencyAt` timestamp and `recency_at` thread-list
    sort key for product recency ordering while preserving the existing
    meaning of `updatedAt` as the latest persisted thread mutation.
    
    This is the server-side alternative to #27697. Rather than narrowing
    `updatedAt`, clients can sort the sidebar by `recency_at` and continue
    treating `updatedAt` as mutation time.
    
    Paired Codex Apps PR:
    [openai/openai#1024599](https://github.com/openai/openai/pull/1024599)
    
    ## Contract
    
    - `recencyAt` initializes when a thread is created.
    - A turn start advances `recencyAt` monotonically.
    - Commentary, agent output, tool results, token/accounting updates, turn
    completion, archive, unarchive, resume, and generic metadata writes do
    not advance it.
    - `updatedAt` retains its existing behavior and continues to advance for
    persisted thread mutations.
    - Current servers populate `recencyAt`; the response field is optional
    in generated TypeScript so clients connected to older servers can fall
    back to `updatedAt`.
    - Filesystem-only fallback uses existing updated/mtime ordering when
    SQLite is unavailable.
    
    ## Persistence and compatibility
    
    Migration 0038 adds second- and millisecond-precision recency columns,
    backfills them from the existing updated timestamp, creates list
    indexes, and includes an insert trigger so older binaries writing to a
    migrated database seed recency without causing later mutations to
    advance it.
    
    Generic metadata upserts preserve existing recency values. Turn-start
    updates use a dedicated monotonic touch, and process-local allocation
    keeps millisecond cursor values unique. State DB list, search, read,
    filtered-list repair, rollout fallback propagation, and app-server
    conversions all carry the new field.
    
    ## API
    
    `Thread` responses include:
    
    ```ts
    recencyAt?: number
    ```
    
    `thread/list` and `thread/search` accept:
    
    ```json
    { "sortKey": "recency_at" }
    ```
    
    Generated TypeScript and JSON schemas are included.
    
    ## Validation
    
    - `just test -p codex-state` — 146 passed
    - `just test -p codex-rollout` — 69 passed
    - `just test -p codex-thread-store` — 81 passed
    - `just test -p codex-app-server-protocol` — 231 passed
    - Focused app-server list ordering, response mapping, archive/unarchive,
    and resume lifecycle tests passed
    - Scoped `just fix` for state, rollout, thread-store,
    app-server-protocol, and app-server
    - `just fmt`
    - `git diff --check`
    - Independent correctness, simplicity, elegance, security, and
    test-quality reviews; actionable ordering, lifecycle, query-projection,
    and timestamp-uniqueness findings were addressed
  • Clarify model-generated and legacy app path types (#28577)
    ## Why
    
    `ApiPathString` kind of implies that it can be used anywhere we pull a
    path out of JSON, but it's not really appropriate for tool arguments
    when the model might generate relative paths.
    
    Prefer `String` for model-generated paths and we can handle the
    conversion per feature for now and define a shared abstraction later if
    it makes sense.
    
    # What
    
    Rename `ApiPathString` to `AppLegacyPathString` to clarify its role.
    
    Expand the `path-types` skill to tell the model to leave tool args as
    bare strings.
  • [codex] Record external agent import results (#28396)
    ## Summary
    - restore `externalAgentConfig/import/progress` notifications while
    keeping `externalAgentConfig/import/completed` as the must-deliver event
    - persist completed external-agent config imports in state DB by
    `importId`, including concrete success/failure details for config,
    AGENTS.md, skills, plugins, MCP servers, subagents, hooks, commands, and
    sessions
    - add `externalAgentConfig/import/readHistories` so clients can recover
    persisted import results after missing the live completion notification
    - include `errorType` on import failures in protocol
    responses/notifications and persisted DB JSON so future code can
    classify failures without another wire/storage shape change
    
    ## Validation
    - `git diff --check`
    - `just test -p codex-state external_agent_config_imports`
    - `just test -p codex-app-server-protocol`
    - `CODEX_SQLITE_HOME=/private/tmp/codex-app-server-sqlite-read-details
    just test -p codex-app-server
    external_agent_config_import_sends_completion_notification_for_sync_only_import`
    
    Also ran earlier broader checks before publishing:
    - `just test -p codex-state`
    -
    `CODEX_SQLITE_HOME=/private/tmp/codex-app-server-external-agent-test-sqlite
    just test -p codex-app-server external_agent_config`
    - `just test -p codex-external-agent-migration`
  • [codex] Add interruptible sleep tool (#28429)
    ## Why
    
    Models sometimes need to pause briefly while waiting for external work,
    but using a shell command for that delay ties the wait to a process and
    does not naturally resume when new turn input arrives.
    
    ## What changed
    
    - add a built-in `sleep` tool behind the under-development `sleep_tool`
    feature
    - accept a bounded `duration_ms` argument, matching the millisecond
    convention used by unified exec
    - end the sleep early when either steered user input or mailbox input
    arrives
    - include elapsed wall-clock time in completed and interrupted outputs
    - emit a dedicated core `SleepItem` through `item/started` and
    `item/completed`
    - expose the sleep item as app-server v2 `ThreadItem::Sleep` and retain
    it in reconstructed thread history
    - regenerate the configuration schema for the new feature flag
    - regenerate app-server JSON and TypeScript schema fixtures
    
    ## Test plan
    
    - `just test -p codex-core sleep_tool_follows_feature_gate`
    - `just test -p codex-core any_new_input_interrupts_sleep`
    - `just test -p codex-app-server-protocol`
    - `just test -p codex-app-server
    sleep_emits_started_and_completed_items`
  • Use ApiPathString in app-server filesystem permission paths (#28367)
    ## Why
    
    Clients running an app-server on one OS and an exec-server on another OS
    need to be able to pass sandbox config to app-server that refers to
    resources on the executor's foreign OS.
    
    ## What
    
    `AbsolutePathBuf` can't represent these paths and we don't want users to
    be exposed to `PathUri` yet, so this moves the public app-server API to
    be expressed in terms of `ApiPathString`.
    
    Stacked on #28165.
    
    - change app-server v2 filesystem permission paths, including legacy
    read/write roots, to `ApiPathString`
    - localize API paths through `PathUri` when converting into the current
    native core permission types
    - make path-bearing permission conversions fallible and surface
    localization failures instead of silently treating malformed grants as
    ordinary denials
    - propagate conversion failures through app-server and TUI approval
    handling
    - regenerate the app-server JSON and TypeScript schemas
    - leave migration TODOs on native-path conversions so they can be
    removed once core permission paths use `PathUri`
  • [codex] Add external agent import result accounting (#28008)
    ## Why
    
    External-agent imports can complete synchronously or continue in the
    background for plugins/sessions. Clients need a stable import id to
    correlate the immediate response with the eventual completion
    notification, and the completion payload needs enough accounting to show
    which artifact types succeeded or failed without hiding partial
    failures.
    
    ## What Changed
    
    - `externalAgentConfig/import` now returns an `importId`;
    `externalAgentConfig/import/completed` includes the same `importId` plus
    type-level `itemResults`.
    - Completed `itemResults` report `successCount`, `errorCount`,
    `successes`, and `rawErrors` for each migrated item type.
    - Added protocol/schema/TypeScript types for import successes, raw
    errors, and type-level results. No progress notification is included in
    the final PR.
    - `ExternalAgentConfigService::import` now returns an outcome object
    with synchronous item results and pending plugin imports.
    - Plugin import outcomes track succeeded/failed marketplaces, plugin
    ids, and raw errors. Plugin failures can be reported in completed
    accounting while later migration items continue.
    - Non-plugin synchronous import failures still fail the request, so
    invalid config/skills-style failures are not reported as a successful
    import response.
    - Session imports now return item results. Successful imports include
    the source session path and imported thread id; prepare, persist,
    ledger, and source-validation failures become raw errors in completion
    accounting where the import can continue.
    - The request processor generates the `importId`, aggregates synchronous
    results with background plugin/session results, and sends a single
    completed notification when all selected work is done.
    - App-server docs and generated schema fixtures were updated for the new
    response/completed payload shapes.
    
    ## Validation
    
    - `just test -p codex-app-server-protocol`
    - `just test -p codex-app-server-client event_requires_delivery`
    - `CODEX_SQLITE_HOME=/private/tmp/codex-app-server-review-sync-error
    just test -p codex-app-server
    external_agent_config_import_returns_error_for_failed_sync_import`
    - `CODEX_SQLITE_HOME=/private/tmp/codex-app-server-review-external-agent
    just test -p codex-app-server external_agent_config`
    
    Note: local sandbox validation used `CODEX_SQLITE_HOME` because the
    default sqlite state path is read-only in this environment.
  • feat: add Bedrock API key as a managed auth mode (#27443)
    ## Why
    
    Codex needs to manage Amazon Bedrock API key credentials through the
    existing auth lifecycle instead of introducing a separate auth manager
    or provider-specific credential file. Treating Bedrock API key login as
    a primary auth mode gives it the same persistence, keyring, reload, and
    logout behavior as the existing OpenAI API key and ChatGPT modes.
    
    The credential is valid only for the `amazon-bedrock` model provider.
    OpenAI-compatible providers must reject this auth mode rather than
    treating the Bedrock key as an OpenAI bearer token.
    
    ## What changed
    
    - Added `bedrockApiKey` as an app-server `AuthMode` and
    `CodexAuth::BedrockApiKey` as a primary `AuthManager` mode.
    - Added `BedrockApiKeyAuth`, containing the API key and AWS region, to
    the existing `AuthDotJson` payload stored in `$CODEX_HOME/auth.json` or
    the configured keyring backend.
    - Added `login_with_bedrock_api_key(...)`, parallel to
    `login_with_api_key(...)`, which replaces the current stored login with
    Bedrock credentials.
    - Reused generic auth reload and logout behavior instead of adding a
    Bedrock-specific auth manager or logout path.
    - Updated login restrictions, status reporting, diagnostics, telemetry
    classification, generated app-server schemas, and auth fixtures for the
    new mode.
    - Added explicit errors when Bedrock API key auth is selected with an
    OpenAI-compatible model provider.
    
    This PR establishes managed storage and auth-mode behavior. Routing the
    managed key and region into Amazon Bedrock requests will be in follow-up
    PRs.
  • Add app-server thread/delete API (#25018)
    ## Why
    
    Clients can archive and unarchive threads today, but there is no
    app-server API for permanently removing a thread. Deletion also needs to
    cover the full session tree: deleting a main thread should remove
    spawned subagent threads and the related local metadata instead of
    leaving orphaned rollout files, goals, or subagent state behind.
    
    ## What
    
    - Adds the v2 `thread/delete` request and `thread/deleted` notification,
    with the response shape kept consistent with `thread/archive`.
    - Implements local hard delete for active and archived rollout files.
    - Deletes the requested thread's state DB row as the commit point, then
    best-effort cleans associated state including spawned descendants,
    goals, spawn edges, logs, dynamic tools, and agent job assignments.
    - Updates app-server API docs and generated protocol schema/TypeScript
    fixtures.
  • [codex-analytics] add extensible feature thread sources (#27063)
    ## Why
    - `ThreadSource` currently defines a closed set of core-owned values
    - Product features also create threads for background or scheduled work
    - Adding every product-specific value to the core enum would require
    repeated `codex-rs` protocol changes
    - Feature-backed values let product callers provide precise attribution
    while preserving the existing core classifications
    
    ## What Changed
    - Adds `ThreadSource::Feature(String)` for app-owned thread source
    values
    - Represents all app-server v2 thread sources as scalar strings, so a
    feature source is supplied as `"automation"`
    - Persists and emits the feature's plain string label, so `"automation"`
    produces `thread_source="automation"` in analytics
    - Keeps `user`, `subagent`, and `memory_consolidation` as explicit
    core-owned values and regenerates the app-server schemas and TypeScript
    bindings
    
    ## Verification
    - `just write-app-server-schema`
    - `cargo check --workspace`
    - `just test -p codex-protocol
    feature_thread_source_serializes_as_its_app_owned_label`
    - `just test -p codex-app-server-protocol
    thread_sources_round_trip_as_scalar_labels`
    - `cargo test -p codex-analytics
    thread_initialized_event_serializes_expected_shape`
    - `just fmt`
  • multi-agent: add path-based v2 activity tracking (#27007)
    ## Why
    
    Multi-agent v2 identifies agents by canonical paths, but its tool
    handlers still emitted the larger legacy collaboration begin/end events
    built around nickname and role metadata. App-server, rollout-trace,
    analytics, and TUI consumers therefore lacked one compact path-based
    completion signal that behaved consistently across live events and
    replay.
    
    The TUI also needs a bounded `/agent` status surface for v2 agents. It
    should use recent local activity for previews, refresh liveness without
    loading full histories, and keep the legacy picker available when no
    path-backed v2 agent is known.
    
    ## What changed
    
    - Replace the v2 `spawn_agent`, `send_message`, `followup_task`, and
    `interrupt_agent` legacy lifecycle emissions with a success-only
    `SubAgentActivity` event. The event records the tool call ID, occurrence
    time, affected thread, canonical agent path, and `started`,
    `interacted`, or `interrupted` kind.
    - Expose the activity as a completion-only app-server v2
    `subAgentActivity` thread item in live notifications and reconstructed
    history, regenerate the protocol schemas, and count it in sub-agent tool
    analytics.
    - Track canonical paths from live activity and loaded-thread metadata in
    the TUI, and render the activity in live and replayed transcripts.
    - Make `/agent` list running path-backed agents with summaries from
    bounded local event buffers. Each summary is capped at 240 graphemes,
    the scan is capped at six recent items, only the last three wrapped
    lines are shown, and command output is omitted. Liveness falls back to
    metadata-only `thread/read` when local turn state is unavailable.
    - Persist the activity as a terminal rollout-trace runtime payload and
    reduce it to the corresponding spawn, send, follow-up, or close
    interaction edge. `interrupt_agent` is classified as a close-edge
    operation.
    - Preserve the legacy picker when no path-backed v2 agent is known.
    
    ## Compatibility
    
    App-server v2 clients that consumed `collabAgentToolCall` begin/end
    pairs for these tools must handle the new completion-only
    `subAgentActivity` item. Legacy v1 collaboration behavior is unchanged.
    
    ## Screenshot
    
    <img width="684" height="288" alt="Screenshot 2026-06-08 at 15 40 47"
    src="https://github.com/user-attachments/assets/194b3cd0-619d-45fb-b587-cf3e2b1b8a1d"
    />
    
    ## Testing
    
    - `just test -p codex-app-server-protocol`
    - `just test -p codex-rollout-trace`
    - Added focused coverage for activity analytics, terminal trace
    serialization, spawn-edge reduction, `interrupt_agent` classification,
    TUI status rendering without aggregated command output, and clearing
    stale running state after a completed turn.
  • fix(tui): scope MCP startup status by thread (#26639)
    ## Why
    
    MCP startup failures from spawned subagents were rendered as global
    notifications, so a child thread's failure could pollute the visible
    parent transcript. Routing the notification to the child exposed two
    related replay problems: session refresh could discard the buffered
    event, and a newly created child `ChatWidget` did not know the expected
    MCP server set, which could leave its startup spinner running after
    every server had settled.
    
    MCP startup diagnostics should remain visible in the thread that owns
    the startup without affecting other transcripts. The protocol also needs
    to support a future app-scoped MCP lifecycle where startup is not owned
    by any thread.
    
    ## Reported Behavior
    
    The [originating Slack
    report](https://openai.slack.com/archives/C08JZTV654K/p1780604538859939)
    called out that using subagents could turn MCP startup failures into a
    wall of yellow CLI warnings because repeated failures were not
    deduplicated. The intended behavior is for those diagnostics to remain
    visible once in the thread that owns the startup, without polluting the
    parent transcript.
    
    ## What Changed
    
    - add nullable `threadId` ownership to `mcpServer/startupStatus/updated`
    - populate it from the app-server conversation ID for the current
    thread-scoped lifecycle and regenerate the protocol schema and
    TypeScript artifacts
    - treat a missing or null `threadId` as app-scoped without injecting it
    into the active chat transcript
    - route and buffer thread-owned MCP startup notifications by thread in
    the TUI
    - preserve buffered MCP startup events across child session refresh
    - seed expected MCP servers before replaying a thread snapshot so
    startup reaches its terminal state
    - suppress an identical repeated failure warning for the same server
    within one startup round
    
    The owning thread still renders the detailed failure and final `MCP
    startup incomplete (...)` summary.
    
    ## How to Test
    
    1. Configure an optional MCP server named `smoke` that exits during
    initialization.
    2. Launch the TUI with multi-agent support enabled.
    3. Confirm the main thread's own startup failure renders one detailed
    `smoke` warning and one incomplete-startup summary.
    4. Spawn exactly one subagent.
    5. Confirm the parent transcript does not receive the subagent's MCP
    startup failure.
    6. Switch to the subagent thread and confirm it contains exactly one
    detailed `smoke` failure and one incomplete-startup summary.
    7. Confirm the subagent's MCP startup spinner disappears and the thread
    remains usable.
    8. Switch between the parent and subagent and confirm the warnings
    neither move nor duplicate.
    
    Targeted tests:
    
    - `just test -p codex-app-server-protocol`
    - `just test -p codex-app-server
    thread_start_emits_mcp_server_status_updated_notifications`
    - `just test -p codex-tui mcp_startup`
    
    The parent/child behavior and spinner completion were also exercised
    manually in tmux. `just argument-comment-lint` was attempted but blocked
    by an unrelated local Bazel LLVM empty-glob failure; touched Rust
    callsites were inspected manually.
  • [codex-rs] support v2 personal access tokens (#25731)
    ## Summary
    
    - add v2 personal access token support for `codex login
    --with-access-token` and `CODEX_ACCESS_TOKEN`
    - classify opaque `at-` tokens separately from legacy Agent Identity
    JWTs
    - hydrate required ChatGPT account metadata through AuthAPI
    `/v1/user-auth-credential/whoami`
    - use PATs directly as bearer tokens while preserving existing ChatGPT
    account surfaces
    - expose PAT-backed auth as the explicit `personalAccessToken`
    app-server auth mode
    
    ## Implementation
    
    PAT auth is intentionally small and stateless. Loading a PAT performs
    one AuthAPI metadata request, stores the hydrated metadata in the
    in-memory auth object, and redacts the secret from debug output. Legacy
    Agent Identity JWT handling remains unchanged. The shared access-token
    classifier lives in a private neutral module because it dispatches
    between both credential types.
    
    PAT hydration fails closed when AuthAPI omits any required metadata,
    including email. Hydrated metadata is intentionally not persisted:
    startup performs a live `whoami` preflight so revoked tokens or changed
    account metadata are not accepted from a stale cache.
    
    ## Workspace restriction scope
    
    This change intentionally does **not** apply
    `forced_chatgpt_workspace_id` to PAT authentication. The setting is a
    client-side config guardrail, not an authorization boundary, and PAT
    does not currently require workspace-ID parity. The PAT login and
    `CODEX_ACCESS_TOKEN` paths therefore validate through AuthAPI without
    threading workspace-restriction state through access-token loading.
    Existing workspace checks for non-PAT auth remain on their established
    paths.
    
    ## App-server compatibility
    
    The public app-server `AuthMode` is shared across v1 and v2, and
    PAT-backed auth reports `personalAccessToken` through both APIs.
    Following human review, this intentionally removes the temporary v1
    compatibility mapping that reported PATs as `chatgpt`; the deprecated v1
    API is kept in parity with v2 rather than maintaining a separate closed
    enum. Clients with exhaustive auth-mode handling in either API version
    must add the new case and should generally treat it as ChatGPT-backed
    unless they need PAT-specific behavior.
    
    The v1 auth-status response still omits the raw PAT when `includeToken`
    is requested because that response cannot carry the account metadata
    needed to reuse the credential safely. Persisted PAT auth also omits the
    new enum value so older Codex builds can deserialize `auth.json` and
    infer PAT auth from the credential field after a rollback.
    
    ## Validation
    
    Latest review-fix validation:
    
    - `CARGO_INCREMENTAL=0 just test -p codex-login` (126 passed)
    - `CARGO_INCREMENTAL=0 just test -p codex-cli` (263 passed)
    - `CARGO_INCREMENTAL=0 just test -p codex-cli
    stored_auth_validation_handles_personal_access_token`
    - `CARGO_INCREMENTAL=0 just test -p codex-app-server-protocol` (226
    passed)
    - `CARGO_INCREMENTAL=0 just test -p codex-models-manager
    refresh_available_models_uses_remote_only_catalog_for_chatgpt_auth`
    - `CARGO_INCREMENTAL=0 just test -p codex-tui
    existing_non_oauth_chatgpt_login_counts_as_signed_in`
    - `CARGO_INCREMENTAL=0 just fix -p codex-login -p
    codex-app-server-protocol -p codex-models-manager -p codex-tui -p
    codex-cli`
    - `just fmt`
    - `git diff --check`
    
    The broader `codex-tui` suite previously compiled and ran 2,834 tests.
    Three unrelated environment-sensitive guardian/IDE-socket tests failed
    after retries; the PAT-relevant TUI coverage passed.
  • [codex] Forward turn moderation metadata through app-server (#25710)
    ## Why
    First-party backends can supply turn-scoped moderation metadata that
    app-server clients need for client-side presentation. Exposing this as
    an experimental typed notification lets opted-in clients consume it
    without interpreting raw Responses API events.
    
    ## What changed
    - forward `response.metadata.openai_chatgpt_moderation_metadata` from
    Responses API SSE and WebSocket streams as turn-scoped moderation
    metadata
    - emit the experimental app-server v2 `turn/moderationMetadata`
    notification with `{ threadId, turnId, metadata }`
    - add app-server integration coverage for the typed moderation metadata
    notification
    
    ## Testing
    - `just test -p codex-core
    build_ws_client_metadata_includes_window_lineage_and_turn_metadata`
    - `just test -p codex-core` (fails locally: 46 failures and 1 timeout,
    primarily missing `test_stdio_server` and shell snapshot timeouts)
    - `just test -p codex-app-server-protocol`
    - `just test -p codex-app-server
    turn_moderation_metadata_emits_typed_notification_v2`
    - `just test -p codex-app-server` (fails locally: 792 passed, 10 failed,
    and 5 timed out; failures are in existing environment-sensitive tests,
    primarily because nested macOS `sandbox-exec` is not permitted)
    - `just write-app-server-schema --experimental --schema-root
    /tmp/codex-app-server-schema-experimental`
  • [codex] Support model-defined reasoning efforts (#26444)
    ## Summary
    - accept non-empty model-defined reasoning effort values while
    preserving built-in effort behavior
    - propagate the non-Copy effort type through core, app-server, TUI,
    telemetry, and persistence call sites
    - preserve string wire encoding and expose an open-string schema for
    clients
    - update model selection and shortcut behavior for model-advertised
    effort values
    
    ## Root cause
    `ReasoningEffort` gained a string-backed custom variant, so it could no
    longer implement `Copy` or rely on derived closed-enum serialization.
    Existing consumers still moved effort values from shared references and
    assumed a fixed built-in value set.
    
    ## Validation
    - `just fmt`
    - Local tests and compilation were not run per request; relying on CI.
  • feat: show enterprise monthly credit limits in status (#24812)
    ## Summary
    
    Enterprise users can have an effective monthly credit limit, but Codex
    `/status` currently drops that metadata from the account-usage response.
    
    This change adds the optional `spend_control.individual_limit`
    projection to the existing rate-limit snapshot flow. The backend client
    reads the monthly limit, app-server exposes it as `individualLimit`, and
    the TUI renders a `Monthly credit limit` row through the existing
    progress-bar renderer.
    
    When the backend does not return an effective monthly limit, existing
    rate-limit behavior is unchanged.
    
    ## Existing backend state
    
    The account-usage backend already returns the effective monthly limit
    and current usage together:
    
    ```json
    {
      "spend_control": {
        "reached": false,
        "individual_limit": {
          "limit": "25000",
          "used": "8000",
          "remaining": "17000",
          "used_percent": 32,
          "remaining_percent": 68,
          "reset_after_seconds": 86400,
          "reset_at": 1778137680
        }
      }
    }
    ```
    
    Before this change, Codex projected rolling `primary` and `secondary`
    windows plus `credits`. It ignored `spend_control.individual_limit`, so
    app-server clients and `/status` could not render the monthly cap.
    
    The updated flow is:
    
    ```text
    account usage backend
      -> backend-client reads spend_control.individual_limit
      -> existing rate-limit snapshot carries optional individual_limit
      -> app-server exposes optional individualLimit
      -> TUI renders Monthly credit limit
    ```
    
    ## App-server contract
    
    `account/rateLimits/read` and sparse `account/rateLimits/updated`
    notifications now include an additive nullable
    `rateLimits.individualLimit` field:
    
    ```json
    {
      "individualLimit": {
        "limit": "25000",
        "used": "8000",
        "remainingPercent": 68,
        "resetsAt": 1778137680
      }
    }
    ```
    
    In an `account/rateLimits/read` response, `null` means no monthly limit
    is available. `account/rateLimits/updated` remains a sparse rolling
    notification: clients merge available values into their most recent
    `account/rateLimits/read` snapshot or refetch. Nullable account metadata
    in a rolling notification does not clear a previously observed value.
    
    ## Design decisions
    
    - Extend the existing rate-limit snapshot instead of introducing a
    separate request or wire-level update protocol.
    - Keep the Codex projection narrow: `/status` needs the effective limit,
    current usage, remaining percentage, and reset timestamp.
    - Render the monthly row through the existing progress-bar renderer,
    with one optional detail line for `8,000 of 25,000 credits used`.
    - Keep the backend response optional so existing accounts and older
    usage states preserve their current behavior.
    - Preserve cached monthly metadata when sparse rolling notifications
    omit it. Live account-usage reads remain authoritative and can clear a
    removed limit.
    
    ## Visual evidence
    
    ```text
     Monthly credit limit:   [██████████████░░░░░░] 68% left (resets 07:08 on 7 May)
                             8,000 of 25,000 credits used
    ```
    
    Snapshot:
    `codex-rs/tui/src/status/snapshots/codex_tui__status__tests__status_snapshot_includes_enterprise_monthly_credit_limit.snap`
    
    ## Testing
    
    Tests: generated app-server schema verification, protocol tests,
    backend-client tests, app-server integration coverage, TUI snapshot
    coverage, formatting, and workspace lint cleanup.
  • store and expose parent_thread_id on Threads (#25113)
    ## Why
    
    This PR
    https://github.com/openai/codex/pull/24161#discussion_r3325692763
    revealed a subagent data modeling issue, where we overloaded
    `forked_from_id` to also mean `parent_thread_id`. That's incorrect since
    guardian and review subagents can be a subagent and NOT fork the main
    thread's history.
    
    The solution here is to explicitly store a new `parent_thread_id` on
    `SessionMeta`, alongside `forked_from_id` which already exists. While
    we're at it, also expose it in the app-server protocol on the `Thread`
    object.
    
    A thread->subagent relationship and a fork of thread history are
    orthogonal concepts.
    
    ## What Changed
    
    - Added top-level `parent_thread_id` persistence on `SessionMeta` and
    runtime/session plumbing through `SessionConfiguredEvent`,
    `CodexSpawnArgs`, `SessionConfiguration`, `ThreadConfigSnapshot`,
    `TurnContext`, and `ModelClient`.
    - Made turn metadata, request headers, analytics, and subagent-start
    events read the separate runtime/top-level parent field instead of
    deriving general parent lineage from `SessionSource` or
    `forked_from_thread_id`.
    - Passed parent lineage separately at delegated subagent, review,
    guardian, agent-job, and multi-agent spawn construction sites;
    copied-history fork lineage remains derived only from `InitialHistory`.
    - Persisted and exposed parent lineage through rollout/thread-store
    projections and app-server v2 `Thread.parentThreadId`.
    - Updated app-server README text and regenerated app-server schema
    fixtures for the additive `parentThreadId` response field.
  • Add cloud-managed config layer support (#24620)
    ## Summary
    
    PR 3 of 5 in the cloud-managed config client stack.
    
    Adds enterprise-managed cloud config as a first-class config layer
    source. The layer metadata is preserved through config loading,
    diagnostics, debug output, hook attribution, and app-server protocol
    surfaces.
    
    ## Details
    
    - Enterprise-managed config becomes a normal config layer source with
    backend-supplied `id` and display `name` attached for provenance.
    - These layers are designed to behave like non-file managed config: they
    can surface syntax/type diagnostics by layer name even though there is
    no physical config file.
    - Relative path settings are resolved from a stored config base so
    cloud-delivered config remains consistent with existing MDM-delivered
    config semantics.
    - Hook attribution distinguishes config-delivered hooks from
    requirements-delivered hooks via `HookSource::CloudManagedConfig`.
    - This remains pull-based and snapshot-oriented; the PR adds layer
    identity/diagnostics, not dynamic reload behavior.
    
    ## Validation
    
    Validated through the targeted stack checks after rebasing onto current
    `main`:
    
    - Rust crate tests for
    config/hooks/cloud-config/backend-client/app-server-protocol
    - Filtered `codex-core` and `codex-app-server` `cloud_config_bundle`
    tests
    - Python generated-file contract test
    - `cargo shear --deny-warnings`
    - Targeted `argument-comment-lint` for config/hooks
  • [codex] Add user input client ids (#24653)
    ## Summary
    
    Adds an optional `clientId` field to app-server v2 `UserInput` and
    carries it through the core `UserInput` model so clients can correlate
    echoed user input items without relying on payload equality.
    
    ## Details
    
    - Adds `client_id: Option<String>` to core `UserInput` variants.
    - Exposes the v2 app-server field as `clientId` on the wire and in
    generated TypeScript.
    - Preserves the id when converting between app-server v2 and core
    protocol types.
    - Regenerates app-server schema fixtures.
    
    ## Validation
    
    - `just fmt`
    - `just write-app-server-schema`
    - `cargo test -p codex-app-server-protocol`
    - `cargo test -p codex-protocol`
    - `just fix -p codex-app-server-protocol`
    - `just fix -p codex-protocol`
    - `git diff --check`
  • Restore legacy image detail values (#24644)
    ## Why
    
    Older persisted rollouts can contain `input_image.detail` values of
    `auto` or `low` from before `ImageDetail` was narrowed to
    `high`/`original`. Current deserialization rejects those values, which
    can make resume skip later compacted checkpoints and reconstruct an
    oversized raw suffix before the next compaction attempt.
    
    Confirmed Sentry reports fixed by this compatibility path:
    
    - [CODEX-1H3F](https://openai.sentry.io/issues/7500642496/)
    - [CODEX-1H6N](https://openai.sentry.io/issues/7501025347/)
    - [CODEX-1JDP](https://openai.sentry.io/issues/7504549065/)
    - [CODEX-1HW6](https://openai.sentry.io/issues/7503407986/)
    
    ## Background
    
    [openai/codex#20693](https://github.com/openai/codex/pull/20693) added
    image-detail plumbing for app-server `UserInput` so input images could
    explicitly request `detail: original`. The Slack discussion behind that
    PR was about ScreenSpot / bridge evals where user input images were
    resized, while tool output images already had MCP/code-mode ways to
    request image detail.
    
    In review, the intended new API surface was narrowed to `high` and
    `original`: default to `high`, allow `original` when callers need
    unchanged image handling, and avoid encouraging new `auto` or `low`
    usage. That policy still makes sense for newly emitted values.
    
    The missing compatibility piece is persisted history. Older rollouts can
    already contain `auto` and `low`, and resume reconstructs typed history
    by deserializing those rollout records. Rejecting old values at that
    boundary causes valid compacted checkpoints to be skipped. This PR
    restores `auto` and `low` as real variants so old records deserialize
    and round-trip without being rewritten as `high`, while product paths
    can continue to default to `high` and avoid emitting `auto` for new
    behavior.
    
    ## What changed
    
    - Restored `ImageDetail::Auto` and `ImageDetail::Low` as first-class
    protocol values.
    - Preserved `auto`/`low` through rollout deserialization, MCP image
    metadata, code-mode image output, and schema/type generation.
    - Kept local image byte handling conservative: only `original` switches
    to original-resolution loading; `auto`/`low`/`high` continue through the
    resize-to-fit path while retaining their detail value.
    - Added regression coverage for enum round-tripping and code-mode `low`
    detail handling.
    
    ## Testing
    
    - `just write-app-server-schema`
    - `just test -p codex-protocol`
    - `just test -p codex-tools`
    - `just test -p codex-code-mode`
    - `just test -p codex-app-server-protocol`
    - `just test -p codex-core
    suite::rmcp_client::stdio_image_responses_preserve_original_detail_metadata`
    - `just test -p codex-core
    suite::code_mode::code_mode_can_use_mcp_image_result_with_image_helper`
    - Loaded broken rollouts on local fixed builds, and started/completed
    new turns.
    
    I also attempted `just test -p codex-core`; the local broad run did not
    finish green: 2559 tests run, 2467 passed, 55 flaky, 91 failed, 1 timed
    out. The failures were broad timeout/deadline failures across unrelated
    areas; targeted changed-path core tests above passed.
  • app-server: drop legacy profile config surface (#24067)
    ## Why
    
    Legacy `[profiles.<name>]` config tables and the legacy `profile`
    selector are being retired in favor of profile files selected with
    `--profile <name>`. After #23886 removed the CLI-side legacy profile
    plumbing, the app-server config surface still exposed those fields and
    still carried conversion code for the old protocol shape.
    
    ## What changed
    
    - Remove `profile`, `profiles`, and `ProfileV2` from the app-server
    config protocol/schema output so `config/read` no longer returns legacy
    profile config.
    - Drop the old v1 `UserSavedConfig` profile conversion path from
    `config`.
    - Reject new app-server config writes under `profiles.*` with the same
    migration direction used for `profile`, while still allowing callers to
    clear existing legacy profile tables.
    - Refresh app-server config coverage and the experimental API README
    example around the remaining `Config` nesting path.
    
    ## Verification
    
    - Added config-manager coverage that `config/read` omits legacy profile
    config, `profiles.*` writes are rejected, and existing legacy profile
    tables can still be cleared.
    - Updated the v2 config RPC test to cover the rejected `profiles.*`
    batch-write path.
  • [codex] Add rollout-backed thread content search (#23519)
    ## Summary
    - add experimental `thread/search` for local rollout-backed thread
    search using `rg` over JSONL rollouts
    - return search-specific result rows with optional previews instead of
    storing preview data on `StoredThread` or ordinary `Thread` responses
    - keep `thread/list` separate from full-content search and document the
    new app-server surface
    
    ## Testing
    - `cargo test -p codex-app-server-protocol`
    - `cargo test -p codex-app-server
    thread_search_returns_content_and_title_matches -- --nocapture`
  • [codex] Add plugin id to MCP tool call items (#23737)
    Add owning plugin id to MCP tool call items so we can better filter them
    at plugin level.
    
    ## Summary
    - add optional `plugin_id` to MCP tool-call items and legacy begin/end
    events
    - propagate plugin metadata into emitted core items and app-server v2
    `ThreadItem::McpToolCall`
    - preserve plugin ids through app-server replay/redaction paths and
    regenerate v2 schema fixtures
    
    ## Testing
    - `just write-app-server-schema`
    - `just fmt`
    - `just fix -p codex-core`
    - `cargo test -p codex-protocol -p codex-app-server-protocol`
    - `cargo test -p codex-app-server-protocol`
    - `cargo test -p codex-core mcp_tool_call_item_includes_plugin_id --lib`
    - `cargo check -p codex-tui --tests`
    - `cargo check -p codex-app-server --tests`
    - `git diff --check`
    
    ## Notes
    - `just fix -p codex-core` completed with two non-fatal
    `too_many_arguments` warnings on the touched MCP notification helpers.
    - A broader `cargo test -p codex-core` run passed core unit tests, then
    hit shell/sandbox/snapshot failures in the integration target.
    - A broader app-server downstream run hit the existing
    `in_process::tests::in_process_start_clamps_zero_channel_capacity` stack
    overflow; `cargo test -p codex-exec` also hit the existing sandbox
    expectation mismatch in
    `thread_lifecycle_params_include_legacy_sandbox_when_no_active_profile`.
  • Add SubagentStop hook (#22873)
    # What
    
    <img width="1792" height="1024" alt="image"
    src="https://github.com/user-attachments/assets/8f81d232-5813-4994-a61d-e42a05a93a3e"
    />
    
    `SubagentStop` runs when a thread-spawned subagent turn is about to
    finish. Thread-spawned subagents use `SubagentStop` instead of the
    normal root-agent `Stop` hook.
    
    Configured handlers match on `agent_type`. Hook input includes the
    normal stop fields plus:
    
    - `agent_id`: the child thread id.
    - `agent_type`: the resolved subagent type.
    - `agent_transcript_path`: the child subagent transcript path.
    - `transcript_path`: the parent thread transcript path.
    - `last_assistant_message`: the final assistant message from the child
    turn, when available.
    - `stop_hook_active`: `true` when the child is already continuing
    because an earlier stop-like hook blocked completion.
    
    `SubagentStop` shares the same completion-control semantics as `Stop`,
    scoped to the child turn:
    
    - No decision allows the child turn to finish.
    - `decision: "block"` with a non-empty `reason` records that reason as
    hook feedback and continues the child with that prompt.
    - `continue: false` stops the child turn. If `stopReason` is present,
    Codex surfaces it as the stop reason.
    
    # Lifecycle Scope
    
    Only thread-spawned subagents run `SubagentStop`.
    
    Internal/system subagents such as Review, Compact, MemoryConsolidation,
    and Other do not run normal `Stop` hooks and do not run `SubagentStop`.
    This avoids exposing synthetic matcher labels for internal
    implementation paths.
    
    # Stack
    
    1. #22782: add `SubagentStart`.
    2. This PR: add `SubagentStop`.
    3. #22882: add subagent identity to normal hook inputs.
  • feat(permissions): resolve permission profile inheritance (#22270)
    ## Stack
    
    This is the foundation PR for the permission-profile inheritance stack.
    
    - This PR adds config-level `extends` resolution and merge semantics.
    - Follow-up: #23705 applies resolved profiles at runtime and updates the
    active-profile protocol surfaces.
    
    ## Why
    
    Permission profiles are starting to carry enough policy that
    copy-pasting near-identical definitions becomes hard to review and easy
    to drift. Before the runtime can consume inherited profiles, the config
    layer needs one explicit resolver that can merge parent chains and
    reject unsafe or invalid inheritance shapes.
    
    ## What changed
    
    - Add `extends` to permission-profile TOML and resolve parent chains in
    inheritance order.
    - Merge inherited profile TOML with the existing config merge behavior
    while preserving the permission-specific normalization needed for
    network domain keys.
    - Keep parent descriptions out of resolved child profiles and record
    inherited profile names separately for downstream consumers.
    - Reject undefined parents, unsupported built-in parents, and
    inheritance cycles with targeted errors.
    - Cover resolver behavior with TOML fixture tests and refresh the
    generated config schema.
    
    ## Validation
    
    - `cargo test -p codex-config`
    - `cargo test -p codex-core permissions_profiles_`
  • Add thread/settings/update app-server API (#23502)
    ## Why
    
    App-server clients need a way to update a thread's next-turn settings
    without starting a turn, adding transcript content, or waiting for turn
    lifecycle events. This gives settings UI a direct path for durable
    thread settings while clients observe the eventual effective state
    through a notification.
    
    This is a simplified rework of PR
    https://github.com/openai/codex/pull/22509. In particular, it changes
    the `thread/settings/update` api to return immediately rather than
    waiting and returning the effective (updated) thread settings. This
    makes the new api consistent with `turn/start` and greatly reduces the
    complexity of the implementation relative to the earlier attempt.
    
    ## What Changed
    
    - Adds experimental `thread/settings/update` with partial-update request
    fields and an empty acknowledgment response.
    - Adds experimental `thread/settings/updated`, carrying full effective
    `ThreadSettings` and scoped by `threadId` to subscribed clients for the
    affected thread.
    - Shares durable settings validation with `turn/start`, including
    `sandboxPolicy` plus `permissions` rejection and `serviceTier: null`
    clearing.
    - Emits the same settings notification when `turn/start` overrides
    change the stored effective thread settings.
    - Regenerates app-server protocol schema fixtures and updates
    `app-server/README.md`.
  • Add SubagentStart hook (#22782)
    # What
    
    `SubagentStart` runs once when Codex creates a thread-spawned subagent,
    before that child sends its first model request. Thread-spawned
    subagents use `SubagentStart` instead of the normal root-agent
    `SessionStart` hook.
    
    Configured handlers match on the subagent `agent_type`, using the same
    value passed to `spawn_agent`. When no agent type is specified, Codex
    uses the default agent type.
    
    Hook input includes the normal session-start fields plus:
    
    - `agent_id`: the child thread id.
    - `agent_type`: the resolved subagent type.
    
    `SubagentStart` may return `hookSpecificOutput.additionalContext`. That
    context is added to the child conversation before the first model
    request.
    
    # Lifecycle Scope
    
    Only thread-spawned subagents run `SubagentStart`.
    
    Internal/system subagents such as Review, Compact, MemoryConsolidation,
    and Other do not run normal `SessionStart` hooks and do not run
    `SubagentStart`. This avoids exposing synthetic matcher labels for
    internal implementation paths.
    
    Also the `SessionStart` hook no longer fires for subagents, this matches
    behavior with other coding agents' implementation
    
    # Stack
    
    1. This PR: add `SubagentStart`.
    2. #22873: add `SubagentStop`.
    3. #22882: add subagent identity to normal hook inputs.
  • Make deny canonical for filesystem permission entries (#23493)
    ## Why
    Filesystem permission profiles used `none` for deny-read entries, which
    is less direct than the action the entry actually represents. This
    change makes `deny` the canonical filesystem permission spelling while
    preserving compatibility for older configs that still send `none`.
    
    ## What changed
    - rename `FileSystemAccessMode::None` to `Deny`
    - serialize and generate schemas with `deny` as the canonical value
    - retain `none` only as a legacy input alias for temporary config
    compatibility
    - update filesystem glob diagnostics and regression coverage to use the
    canonical spelling
    - refresh config and app-server schema fixtures to match the new wire
    shape
    
    ## Validation
    - `cargo test -p codex-protocol`
    - `cargo test -p codex-app-server-protocol`
    - `cargo test -p codex-core config_toml_deserializes_permission_profiles
    --lib`
    - `cargo test -p codex-core
    read_write_glob_patterns_still_reject_non_subpath_globs --lib`
    
    Earlier in the session, a broad `cargo test -p codex-core` run reached
    unrelated pre-existing failures in timing/snapshot/git-info tests under
    this environment; the targeted surfaces touched by this PR passed
    cleanly.
  • goal: pause continuation loops on usage limits and blockers (#23094)
    Addresses #22833, #22245, #23067
    
    ## Why
    `/goal` can keep synthesizing turns even when the next turn cannot make
    meaningful progress. Hard usage exhaustion can replay failing turns, and
    repeated permission or external-resource blockers can keep burning
    tokens while waiting for user or system intervention.
    
    ## What changed
    - Add resumable `blocked` and `usageLimited` goal states. As with
    `paused`, goal continuation stops with these states.
    - Move to `usageLimited` after usage-limit failures.
    - Allow the built-in `update_goal` tool to set `blocked` only under
    explicit repeated-impasse guidance. Updated goal continuation prompt to
    specify that agent should use `blocked` only when it has made at least
    three attempts to get past an impasse.
    
    Most of the files touched by this PR are because of the small app server
    protocol update.
    
    ## Validation
    
    I manually reproduced a number of situations where an agent can run into
    a true impasse and verified that it properly enters `blocked` state. I
    then resumed and verified that it once again entered `blocked` state
    several turns later if the impasse still exists.
    
    I also manually reproduced the usage-limit condition by creating a
    simulated responses API endpoint that returns 429 errors with the
    appropriate error message. Verified that the goal runtime properly moves
    the goal into `usageLimited` state and TUI UI updates appropriately.
    Verified that `/goal resume` resumes (and immediately goes back into
    `ussageLImited` state if appropriate).
    
    
    ## Follow-up PRs
    
    Small changes will be needed to the GUI clients to properly handle the
    two new states.
  • Preserve image detail in app-server inputs (#20693)
    ## Summary
    
    - Add optional image detail to user image inputs across core, app-server
    v2, thread history/event mapping, and the generated app-server
    schemas/types.
    - Preserve requested detail when serializing Responses image inputs:
    omitted detail stays on the existing `high` default, while explicit
    `original` keeps local images on the original-resolution path.
    - Support `high`/`original` consistently for tool image outputs,
    including MCP `codex/imageDetail`, code-mode image helpers, and
    `view_image`.
  • feat(app-server): update remote control APIs for better UX (#22877)
    ## Why
    To help improve `codex remote-control` CLI UX which I plan to do in a
    followup, this PR adds `server-name` to the various remote control APIs:
    - `remoteControl/enable`
    - `remoteControl/disable`
    - `remoteControl/status/changed`
    
    Also, add a `remoteControl/status/read` API. This will be helpful in the
    Codex App.
  • feat: Use installation ID in remote enrollments (#21662)
    * Pass installation ID for storage on enrollments server for
    deduping/grouping multiple appservers per installation
    * Pass installation ID in remoteControl/status/changed events