206 Commits

  • [codex] Use model metadata for skills usage instructions (#29740)
    ## Summary
    
    - add a false-by-default `include_skills_usage_instructions` model
    metadata field
    - enable the field for the bundled `gpt-5.5` model metadata
    - consume the metadata in both core and extension skill rendering
    - remove hardcoded legacy-model matching and its marker plumbing
  • app-server: structure and test JSON shutdown logs (#30314)
    ## Why
    
    `LOG_FORMAT=json` and `RUST_LOG` are supported by app-server, but the
    behavior was only covered indirectly. We should verify the actual JSONL
    written by both user-facing entry points: `codex app-server` and the
    standalone `codex-app-server` binary.
    
    The existing processor shutdown message also always said the channel
    closed, even though the processor can exit for several different
    reasons. Structured fields make that event more accurate and useful to
    log consumers.
    
    ## What changed
    
    - Record the processor `exit_reason`, remaining connection count, and
    forced-shutdown state as structured tracing fields.
    - Add a shared process-test helper that enables JSON logging, validates
    every stderr line as JSON, and verifies the top-level timestamp is RFC
    3339.
    - Cover both `codex app-server` and `codex-app-server`, asserting the
    stable `level`, `fields`, and `target` payload.
    
    ## Test plan
    
    - `just test -p codex-app-server
    standalone_app_server_emits_json_info_events`
    - `just test -p codex-cli app_server_emits_json_info_events`
  • feat(app-server): add history_mode to thread (#29927)
    ## Description
    
    This PR adds a new `historyMode` field (`"legacy" | "paginated"`) to
    `Thread`. The value is stored in `SessionMeta` in the JSONL rollout file
    and as a new column in the SQLite thread_metadata table, and it is
    exposed on `thread/start` and on the `Thread` object in app-server.
    
    ## What changed
    
    - Added canonical `ThreadHistoryMode` with `legacy` and `paginated`,
    defaulting old and new SessionMeta to `legacy`.
    - Carried `history_mode` through core session config, ThreadStore stored
    metadata, local/in-memory stores, rollout metadata extraction, and the
    existing SQLite `threads` table.
    - Added experimental `historyMode` to app-server v2 `Thread` and
    `thread/start`.
    - Made paginated stored threads metadata-discoverable but unsupported
    for legacy full-history reads, `load_history`, live resume, and create
    paths.
    - Regenerated app-server schema fixtures and added
    protocol/state/thread-store/app-server coverage for persistence and
    fail-closed behavior.
    
    ## Compatibility floor
    Because users may be running various versions of Codex binaries on the
    same machine (TUI, Codex App, etc.), we will need to establish a
    compatibility floor for upcoming paginated threads, which will change
    how thread storage reads and writes work.
    
    The overall plan here:
    ```
    Release N:
    - Add historyMode to SessionMeta / Thread / SQLite metadata.
    - Teach binaries to understand paginated threads.
    - If a binary sees `historyMode="paginated"` but does not support the paginated contract, it refuses to resume/mutate the thread.
    - Default remains `"legacy"`.
    
    Release N+1:
    - First-party clients start opting into paginated threads where appropriate.
    - Internal dogfood / staged rollout.
    - Measure old-client usage and paginated-thread unsupported errors.
    
    Release N+2:
    - Only after Release N (or newer) is overwhelmingly deployed, make paginated the default.
    - Accept that a small tail of N-1-or-older binaries may not understand paginated threads.
    ```
    
    The important behavior change is fail-closed handling for a binary that
    encounters a persisted `paginated` thread before it knows how to fully
    support paginated history. In app-server, if a thread is `paginated`, we
    will:
    
    - allow metadata-only discovery paths like `thread/list` and
    `thread/read(includeTurns=false)`, so clients can still see the thread
    and inspect its `historyMode`
    - reject legacy full-history/live-thread paths like
    `thread/read(includeTurns=true)` and `thread/resume` with an unsupported
    JSON-RPC error
    - avoid silently treating an unknown or future `historyMode` as `legacy`
    
    Under the hood, the ThreadStore layer also rejects legacy operations
    that would need to load or replay the full thread history for a
    paginated thread. That gives us the behavior we want for Release N:
    future paginated threads are visible, but this binary fails closed
    instead of trying to operate on them as if they were legacy threads.
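    
    The fail-closed rule above can be sketched in a few lines. This is a
    minimal illustration, not the PR's code: `ThreadOp` and
    `check_supported` are hypothetical names, and letting metadata-only
    operations through without parsing the mode is an assumption made for
    the sketch.
    
    ```rust
    use std::str::FromStr;
    
    /// The two history modes named in the PR description.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum ThreadHistoryMode {
        Legacy,
        Paginated,
    }
    
    impl FromStr for ThreadHistoryMode {
        type Err = String;
    
        // Unknown or future values are an error; they never fall back to Legacy.
        fn from_str(raw: &str) -> Result<Self, Self::Err> {
            match raw {
                "legacy" => Ok(Self::Legacy),
                "paginated" => Ok(Self::Paginated),
                other => Err(format!("unsupported historyMode: {other}")),
            }
        }
    }
    
    /// Hypothetical grouping of app-server operations for this sketch.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum ThreadOp {
        List,
        ReadWithoutTurns,
        ReadWithTurns,
        Resume,
    }
    
    /// Hypothetical gate for a binary that only implements the legacy contract.
    fn check_supported(raw_mode: &str, op: ThreadOp) -> Result<(), String> {
        // Metadata-only discovery never loads history, so it stays available.
        if matches!(op, ThreadOp::List | ThreadOp::ReadWithoutTurns) {
            return Ok(());
        }
        match raw_mode.parse::<ThreadHistoryMode>()? {
            ThreadHistoryMode::Legacy => Ok(()),
            ThreadHistoryMode::Paginated => {
                Err(format!("{op:?} is unsupported for paginated threads"))
            }
        }
    }
    ```
    
    Parsing into an enum with no catch-all variant is what keeps an
    unrecognized mode from being handled as `legacy`.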
  • Persist selected capability roots and resolve availability per model step (#29856)
    ## Why
    
    `selectedCapabilityRoots` is durable thread intent: “use this capability
    root from environment `worker`.”
    
    The important product assumption is:
    
    > One environment ID always names the same logical executor and stable
    contents.
    
    `worker` does not silently change from executor A to an unrelated
    executor B. The process-local connection handle for `worker` can still
    be replaced while Codex is running, though, for example when
    `environment/add` registers a fresh handle for the same logical
    environment.
    
    The thread should persist only the stable selection. Each model step
    should pair that selection with the exact ready handle captured for that
    step.
    
    ## The boundary
    
    ```text
    persisted thread intent
      plugin@1 -> environment "worker"
                    |
                    | capture the current step
                    v
    model-step view
      unavailable, or
      plugin@1 + worker's exact captured ready handle
    ```
    
    The environment ID is the stable identity and cache key. The
    `Arc<Environment>` is only a process-local handle retained so consumers
    of one model step use the same captured environment. It is never
    persisted and it does not imply different environment contents.
    
    ## What changes
    
    ### Persist the stable selection
    
    Selected roots are written into `SessionMeta` and restored with the
    thread. Forked subagents inherit the same selections, including
    bounded-history forks.
    
    Only stable data is persisted: root ID, environment ID, and root path.
    
    ### Capture readiness together with the exact handle
    
    The environment snapshot records:
    
    ```rust
    environment_id -> Some(Arc<Environment>) // ready in this step
    environment_id -> None                   // still starting in this step
    ```
    
    This prevents readiness and execution from coming from different
    registry snapshots.
    
    For example:
    
    ```text
    step snapshot: worker -> handle A, ready
    environment/add: worker -> fresh handle B for the same logical environment
    current step: plugin@1 still uses captured handle A
    ```
    
    Without carrying handle A in the snapshot, the resolver could combine “A
    was ready” with handle B and treat B as ready before it had finished
    starting.
    
    This does not change cache invalidation. Stable capability metadata
    remains identified by environment ID and capability root. Replacing a
    process-local handle under the same stable environment ID does not
    invalidate or rediscover that metadata.
    
    ### Resolve availability per model step
    
    - A ready captured environment produces resolved roots using its
    captured handle.
    - A starting, missing, or failed environment is omitted from that step.
    - A selected lazy environment that is outside the turn's captured
    environment set is asked to start, and a later step can observe it as
    ready.
    - No capability files are scanned here.
    
    Transient transport disconnects remain the remote client's reconnect
    concern. This PR models initial attachment/readiness; it does not add
    live socket-connectivity state.
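    
    A rough sketch of per-step resolution, assuming a snapshot shaped like
    the map shown earlier. `Environment`, `SelectedRoot`, `ResolvedRoot`,
    and `resolve_for_step` are stand-in names for illustration and are not
    taken from the PR.
    
    ```rust
    use std::collections::HashMap;
    use std::sync::Arc;
    
    /// Stand-in for the process-local handle; `label` exists only for the demo.
    #[derive(Debug)]
    struct Environment {
        label: &'static str,
    }
    
    /// Durable selection: root ID, environment ID, and root path only.
    #[derive(Debug, Clone)]
    struct SelectedRoot {
        root_id: String,
        environment_id: String,
        root_path: String,
    }
    
    /// Per-step snapshot: `Some(handle)` is ready, `None` is still starting.
    type StepSnapshot = HashMap<String, Option<Arc<Environment>>>;
    
    /// A selection paired with the exact handle captured for this step.
    #[derive(Debug)]
    struct ResolvedRoot {
        root: SelectedRoot,
        handle: Arc<Environment>,
    }
    
    fn resolve_for_step(selected: &[SelectedRoot], snapshot: &StepSnapshot) -> Vec<ResolvedRoot> {
        selected
            .iter()
            .filter_map(|root| {
                // Missing and still-starting environments are both omitted.
                let handle = snapshot.get(&root.environment_id)?.as_ref()?;
                Some(ResolvedRoot {
                    root: root.clone(),
                    handle: Arc::clone(handle),
                })
            })
            .collect()
    }
    ```
    
    Because the snapshot owns an `Arc` clone, replacing the registry entry
    afterwards does not change what the current step resolves to.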
    
    ## Example
    
    ```text
    thread selection: plugin@1 -> environment "worker"
    
    step 1: worker is starting -> plugin@1 unavailable
    step 2: worker is ready    -> plugin@1 resolves through worker's captured handle
    step 3: fresh local handle -> current step remains pinned; a later step captures its own view
    ```
    
    Temporary unavailability does not discard the durable selection. Later
    PRs can retain stable metadata caches while projecting only currently
    available capabilities into model-visible World State.
    
    ## Compatibility
    
    The app-server request shape does not change. Older rollouts without
    `selected_capability_roots` deserialize to an empty list.
    
    ## Stack
    
    1. **This PR:** persist stable selected roots and resolve them through
    an exact model-step handle.
    2. #29960: cache stable skill metadata and project available skills into
    World State.
    3. #29946: cache stable plugin declarations and manage the separate live
    MCP runtime.
  • auth: move domain mode below app wire types (#29721)
    ## Why
    
    Authentication mode is a domain concept used by login, model selection,
    telemetry, and transports. Keeping the canonical type in app-server
    protocol forces those lower-level crates to depend on an unrelated wire
    API.
    
    ## What changed
    
    - Added canonical `codex_protocol::auth::AuthMode` domain values.
    - Kept the app-server wire DTO unchanged and added an explicit app-side
    conversion.
    - Removed production app-server-protocol dependencies from login,
    model-provider-info, models-manager, and otel call paths.
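    
    The dependency direction can be illustrated with a small sketch. The
    variants and the `WireAuthMode` name below are assumptions for
    illustration; only the idea of a domain type plus an explicit app-side
    conversion comes from the description above.
    
    ```rust
    /// Domain value; lower-level crates depend only on this type.
    /// Variants are illustrative, not the real set.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum AuthMode {
        ApiKey,
        Chatgpt,
    }
    
    /// Hypothetical wire DTO owned by the app-server protocol layer.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum WireAuthMode {
        ApiKey,
        Chatgpt,
    }
    
    // The conversion lives on the app side, so the domain crate never
    // needs to know that a wire representation exists.
    impl From<AuthMode> for WireAuthMode {
        fn from(mode: AuthMode) -> Self {
            match mode {
                AuthMode::ApiKey => WireAuthMode::ApiKey,
                AuthMode::Chatgpt => WireAuthMode::Chatgpt,
            }
        }
    }
    ```
    
    An exhaustive `match` means a new domain variant fails to compile
    until the wire mapping is updated.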
    
    ## Stack
    
    This is PR 2 of 6, stacked on [PR
    #29714](https://github.com/openai/codex/pull/29714). Review only the
    delta from `codex/split-json-rpc-protocols`. Next: [PR
    #29722](https://github.com/openai/codex/pull/29722).
    
    ## Validation
    
    - Auth and login coverage passed in the focused protocol/domain test
    run.
    - App-server account and auth conversion coverage passed.
  • test: add app-server auto environment helper (#29746)
    ## Why
    
    This starts moving app-server tests toward running against remote and
    foreign-OS executors by default. Doing so needs a point of indirection
    similar to core integration tests' `build_with_auto_env`, with the
    flexibility to let tests control environment registration when they
    need to.
    
    ## What
    
    This adds:
    
    - `TestAppServer::new_with_auto_env()` for constructing an app server
    with a default environment defined by the test runner (e.g. Bazel)
    - `TestAppServer::auto_env_params()` for tests to easily acquire turn
    env params tailored to the automatic environment
    - `TestAppServer::send_thread_start_request_with_auto_env()` to make it
    easy for tests to start a thread using the automatic environment
    
    The above methods all fail if the test calling them has set up an
    environment where the automatic environment configuration conflicts with
    test-created state.
    
    ## Validation
    
    Adds a couple of basic smoke tests to the app-server test suite.
    Follow-ups will migrate more tests to use it.
  • core: persist initial context window metadata (#29519)
    ## Why
    
    PR #29494 made context-window IDs visible to the model by wrapping the
    token-budget window payload in `<context_window>`, but rollout JSONL
    consumers still could not see the initial window identity by tailing the
    session file. Compacted rollout items carry window IDs only after
    compaction has happened, so a session with no compaction had no durable
    JSONL record for window 0.
    
    This change gives tailing consumers a stable initial-window record at
    session creation time.
    
    ## What Changed
    
    - Added `session_meta.context_window.window_id` for the initial
    context-window identity.
    - `CreateThreadParams` now requires `initial_window_id: String`, so
    thread-store callers cannot accidentally create new threads without
    window-0 metadata.
    - Live thread creation derives the persisted initial window ID from the
    same `AutoCompactWindowIds` used to initialize `SessionState`, keeping
    runtime state and JSONL metadata aligned.
    - Rollout reconstruction uses `session_meta.context_window.window_id` as
    the initial-window fallback and derives `window_number = 0`,
    `first_window_id = window_id`, and `previous_window_id = None`
    internally.
    - Fork reconstruction intentionally uses the same rollout reconstruction
    path; consumers that need to distinguish copied initial-window metadata
    can use the rollout `thread_id`.
    - Legacy compactions without `window_number` still use compaction-count
    fallback accounting instead of being reset to window 0 by the
    initial-window fallback.
    - Compacted rollout metadata still takes precedence once compaction
    records exist, preserving the richer chain fields there.
    
    ## JSONL Shape
    
    Real rollout JSONL is one object per line. This example is expanded for
    readability, but shows the new initial `session_meta.context_window`
    record followed by the existing compacted rollout item shape that also
    carries window IDs:
    
    ```jsonl
    {
      "timestamp": "2026-06-22T12:00:00.000Z",
      "type": "session_meta",
      "payload": {
        "session_id": "<THREAD_ID>",
        "id": "<THREAD_ID>",
        "timestamp": "2026-06-22T12:00:00.000Z",
        "cwd": "/repo",
        "originator": "codex",
        "cli_version": "0.0.0",
        "source": "cli",
        "model_provider": "<MODEL_PROVIDER>",
        "context_window": {
          "window_id": "<INITIAL_WINDOW_ID>"
        }
      }
    }
    ...
    {
      "timestamp": "2026-06-22T12:34:56.000Z",
      "type": "compacted",
      "payload": {
        "message": "<COMPACTION_SUMMARY>",
        "replacement_history": [
          "..."
        ],
        "window_number": 1,
        "first_window_id": "<INITIAL_WINDOW_ID>",
        "previous_window_id": "<INITIAL_WINDOW_ID>",
        "window_id": "<NEXT_WINDOW_ID>"
      }
    }
    ```
    
    The nested `context_window` object is intentional: it gives rollout
    consumers a stable namespace for context-window metadata while only
    writing the non-derivable initial `window_id`. For the initial window,
    `window_number`, `first_window_id`, and `previous_window_id` are derived
    internally instead of being written to the rollout.
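    
    The derivation for window 0 might look like the sketch below.
    `WindowChain`, `initial_window`, and `current_window` are hypothetical
    names, and "latest compacted record wins" is a simplification of the
    precedence rule described above; the legacy compaction-count fallback
    is not modeled.
    
    ```rust
    /// Chain fields as they appear on compacted rollout items.
    #[derive(Debug, Clone, PartialEq, Eq)]
    struct WindowChain {
        window_number: u64,
        window_id: String,
        first_window_id: String,
        previous_window_id: Option<String>,
    }
    
    /// Derives the window-0 chain from the persisted initial `window_id`.
    fn initial_window(window_id: &str) -> WindowChain {
        WindowChain {
            window_number: 0,
            window_id: window_id.to_string(),
            first_window_id: window_id.to_string(),
            previous_window_id: None,
        }
    }
    
    /// Compacted metadata takes precedence; otherwise fall back to window 0.
    fn current_window(initial_window_id: &str, compacted: &[WindowChain]) -> WindowChain {
        compacted
            .last()
            .cloned()
            .unwrap_or_else(|| initial_window(initial_window_id))
    }
    ```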
    
    ## Verification
    
    - `just test -p codex-protocol`
    - `just test -p codex-rollout
    recorder_materializes_on_flush_with_pending_items`
    - `just test -p codex-core reconstruct_history`
    - `just test -p codex-core
    record_initial_history_reconstructs_forked_transcript`
    - `just test -p codex-thread-store`
    - `just test -p codex-state`
    - `just test -p codex-app-server
    thread_read_returns_summary_without_turns`
    - `just test -p codex-rollout persistence_metrics`
  • feat(app-server): thread/turns/items/list -> thread/items/list (#29705)
    ## Description
    
    Rename the experimental app-server item pagination API from
    `thread/turns/items/list` to `thread/items/list` and make `turnId`
    optional. Clients can now page persisted items across a thread, or still
    filter to one turn when needed.
    
    ## What changed
    
    - Rename the request/response protocol types and JSON-RPC method to
    `ThreadItemsList*` / `thread/items/list`.
    - Pass optional `turnId` through to `ThreadStore::list_items`.
    - Update app-server docs and focused protocol/app-server tests.
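    
    As a minimal sketch of what an optional `turnId` means for filtering,
    assuming an in-memory slice of items: `StoredItem` and this free
    function are illustrative only, and paging cursors are left out.
    
    ```rust
    /// Illustrative persisted item; the field names are assumptions.
    #[derive(Debug, Clone, PartialEq, Eq)]
    struct StoredItem {
        turn_id: String,
        item_id: String,
    }
    
    /// `None` lists items across the whole thread; `Some(id)` keeps one turn.
    fn list_items<'a>(items: &'a [StoredItem], turn_id: Option<&str>) -> Vec<&'a StoredItem> {
        items
            .iter()
            .filter(|item| turn_id.map_or(true, |wanted| item.turn_id == wanted))
            .collect()
    }
    ```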
    
    ## Validation
    
    - `just test -p codex-app-server-protocol thread_items_list_round_trips`
    - `just test -p codex-app-server thread_items_list_returns_unsupported`
  • Persist session IDs across thread resume (#29327)
    ## Summary
    
    A cold-resumed subagent kept its durable thread ID but could receive a
    new session ID, splitting one agent tree across multiple sessions after
    a restart.
    
    Persist the root session ID in every rollout `SessionMeta`, carry it
    through thread creation, and restore it before initializing the resumed
    `Session` and `AgentControl`.
    
    ## Behavior
    
    For a nested agent tree:
    
    ```text
    root session R
      parent thread P
        child thread C
    ```
    
    The child rollout stores:
    
    ```text
    session_id:       R
    parent_thread_id: P
    id:               C
    ```
    
    After a cold resume, the child still belongs to root session `R` while
    its immediate parent remains `P`. The integration coverage uses distinct
    values for all three IDs so it catches restoring the session from
    `parent_thread_id`.
    
    ## Legacy rollouts
    
    Previous rollouts have `id` but no `session_id`. `SessionMetaLine`
    deserialization treats a missing `session_id` as `id`, keeping those
    files readable, listable, and resumable. When a legacy subagent is
    resumed through its root, that synthesized child ID no longer overrides
    the inherited root-scoped `AgentControl`. New rollouts always persist
    the explicit root session ID.
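    
    The legacy fallback can be shown with a small sketch. `RawSessionMeta`
    and `root_session_id` are hypothetical names; the real logic lives in
    `SessionMetaLine` deserialization.
    
    ```rust
    /// Illustrative view of the persisted fields discussed above.
    #[derive(Debug)]
    struct RawSessionMeta {
        id: String,
        session_id: Option<String>,
        parent_thread_id: Option<String>,
    }
    
    /// Legacy rollouts have no `session_id`, so the thread `id` stands in.
    /// `parent_thread_id` is deliberately never used as the session ID.
    fn root_session_id(meta: &RawSessionMeta) -> &str {
        meta.session_id.as_deref().unwrap_or(&meta.id)
    }
    ```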
  • [codex] Use expect in integration tests (#28441)
    The workspace denies `clippy::expect_used` in production. Although
    `clippy.toml` allows `expect` in tests, Bazel Clippy compiles
    integration-test helper code in a way that does not receive that
    exemption, which encouraged verbose `unwrap_or_else(... panic!(...))`
    and equivalent `match`/`let else` forms.
    
    This allows `clippy::expect_used` once at each integration-test crate
    root (including aggregated suites and test-support libraries), then
    replaces manual panic-based Result and Option unwraps with
    `expect`/`expect_err`. Standalone `tests/*.rs` files remain their own
    crate roots. Intentional assertion and unexpected-variant panics remain
    unchanged, and the production `expect_used = "deny"` lint remains in
    place.
    
    The cleanup is mechanical and net-negative in line count.
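    
    A before/after sketch of the kind of rewrite described here.
    `parse_port` is a made-up helper. The PR places the allow once at each
    integration-test crate root (the inner-attribute form
    `#![allow(clippy::expect_used)]`); this snippet uses a per-function
    attribute only so that it stands alone.
    
    ```rust
    use std::num::ParseIntError;
    
    /// Made-up helper standing in for any fallible test setup call.
    fn parse_port(raw: &str) -> Result<u16, ParseIntError> {
        raw.parse::<u16>()
    }
    
    // Before: a manual panic to work around the denied lint.
    fn port_before(raw: &str) -> u16 {
        parse_port(raw).unwrap_or_else(|err| panic!("port should parse: {err}"))
    }
    
    // After: `expect` carries the same context in one call.
    #[allow(clippy::expect_used)]
    fn port_after(raw: &str) -> u16 {
        parse_port(raw).expect("port should parse")
    }
    
    // After: `expect_err` replaces a `match` that panicked on `Ok`.
    #[allow(clippy::expect_used)]
    fn error_after(raw: &str) -> ParseIntError {
        parse_port(raw).expect_err("port should be rejected")
    }
    ```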
  • Add realtime speech append control (#27917)
    ## Why
    
    Realtime voice harness tuning needs app-side control over what backend
    Codex text is spoken. Backend orchestrator text is written for a reading
    UI, so automatically speaking every preamble, progress update, or final
    assistant message can make the realtime voice model too chatty.
    
    For experimentation, clients need two simple controls: keep app/client
    text-item injection on the existing item-create path, and add an
    explicit speakable path that app code can call only when it wants
    realtime to speak. Automatic Codex output also needs an opt-in way to
    switch from the protocol's default speakable path to regular realtime
    items, with a caller-provided prefix so prompt wording can be tuned
    outside core.
    
    The default remains unchanged: if a client omits the new start fields
    and never calls `appendSpeech`, automatic backend output continues down
    the existing speakable path for the selected realtime protocol.
    
    ## What Changed
    
    - Adds experimental `thread/realtime/appendSpeech` for app-provided
    speakable text.
    - Keeps existing `thread/realtime/appendText` as the item-create API for
    app-provided realtime text items.
    - Adds `codexResponsesAsItems` / `codex_responses_as_items` on
    `thread/realtime/start` to send automatic Codex responses with
    `conversation.item.create` instead of the protocol's default speakable
    output path.
    - Adds `codexResponseItemPrefix` / `codex_response_item_prefix` so
    clients can prepend experiment instructions to those automatic Codex
    response items.
    - Keeps literal `conversation.handoff.append` routing scoped to the v1
    speakable path; v2 default speech uses its item/function-output plus
    `response.create` behavior.
    - Removes the earlier public silent-context API and hardcoded
    silent-context prefix.
    - Updates realtime tests to cover default automatic speakable behavior,
    opt-in automatic item-create behavior, and explicit `appendSpeech`
    behavior.
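    
    The routing decision for automatic Codex output can be sketched as
    follows. Only the two option names come from the list above; the
    types, the function, and prepending the prefix by plain concatenation
    are assumptions for illustration.
    
    ```rust
    /// Hypothetical destinations for automatic Codex output.
    #[derive(Debug, PartialEq, Eq)]
    enum RealtimeOutput {
        /// The protocol's default speakable path.
        Speakable(String),
        /// A regular realtime item (`conversation.item.create`).
        ItemCreate(String),
    }
    
    /// Start options; both default to the existing behavior when omitted.
    #[derive(Debug, Default)]
    struct RealtimeStartOptions {
        codex_responses_as_items: bool,
        codex_response_item_prefix: Option<String>,
    }
    
    fn route_codex_response(opts: &RealtimeStartOptions, text: &str) -> RealtimeOutput {
        if !opts.codex_responses_as_items {
            // Default: unchanged speakable path.
            return RealtimeOutput::Speakable(text.to_string());
        }
        // Opt-in: send as an item, with the caller-provided prefix if any.
        match opts.codex_response_item_prefix.as_deref() {
            Some(prefix) => RealtimeOutput::ItemCreate(format!("{prefix}{text}")),
            None => RealtimeOutput::ItemCreate(text.to_string()),
        }
    }
    ```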
    
    ## Validation
    
    - `cargo check -p codex-core -p codex-app-server -p codex-api`
    - `just test -p codex-app-server realtime_conversation`
    - `just test -p codex-core realtime_conversation` (50/51 passed in the
    filtered parallel run; the lone failure passed when rerun in isolation)
    - `just test -p codex-core
    conversation_mirrors_assistant_message_text_to_realtime_handoff`
    - `just test -p codex-api
    e2e_connect_and_exchange_events_against_mock_ws_server`
    - `just fix -p codex-core`
    - `just fix -p codex-app-server`
    - `cargo build -p codex-cli`
  • feat(app-server): expose rate-limit reset credits (#28143)
    ## Why
    
    Codex users can earn personal rate-limit reset credits, but app-server
    clients do not currently have an API for reading or redeeming them. This
    adds the backend and protocol foundation used by the `/usage` TUI flow
    in #28154.
    
    ## What changed
    
    - Extend `account/rateLimits/read` with a nullable
    `rateLimitResetCredits` summary sourced from the existing usage
    response.
    - Add backend-client and app-server support for consuming a reset with a
    caller-generated idempotency key. A UUID is recommended, and clients
    reuse the same key when retrying the same logical reset.
    - Return only the consume `outcome`; clients refetch
    `account/rateLimits/read` for updated window state.
    - Document the response field and each consume outcome, and regenerate
    the JSON and TypeScript schema fixtures.
    - Clarify in `AGENTS.md` that new app-server string enum values use
    camelCase on the wire.
    - Update the existing TUI response fixture for the expanded protocol
    shape.
    - Add coverage for authentication, response mapping, backend failures,
    consume outcomes, and request timeout behavior.
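    
    To show why a retry should reuse its idempotency key, here is a toy
    model of a deduplicating backend. Everything in it is hypothetical,
    including the outcome names and the in-memory store; the real
    outcomes are documented with the protocol and are not reproduced
    here.
    
    ```rust
    use std::collections::HashMap;
    
    /// Hypothetical outcomes; the real enum is defined by the protocol.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum ConsumeOutcome {
        Consumed,
        NoCreditsAvailable,
    }
    
    /// Toy backend that remembers the outcome recorded for each key.
    struct ResetCredits {
        available: u32,
        seen: HashMap<String, ConsumeOutcome>,
    }
    
    impl ResetCredits {
        fn consume(&mut self, idempotency_key: &str) -> ConsumeOutcome {
            // A retry with the same key replays the stored outcome.
            if let Some(outcome) = self.seen.get(idempotency_key) {
                return *outcome;
            }
            let outcome = if self.available > 0 {
                self.available -= 1;
                ConsumeOutcome::Consumed
            } else {
                ConsumeOutcome::NoCreditsAvailable
            };
            self.seen.insert(idempotency_key.to_string(), outcome);
            outcome
        }
    }
    ```
    
    Under this model, retrying with the same key spends at most one
    credit, while a fresh key is treated as a new logical reset.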
    
    ## Validation
    
    - `just test -p codex-app-server-protocol` — 231 passed.
    - `just test -p codex-backend-client` — 14 passed.
    - Focused `codex-app-server` reset-credit tests — 5 passed.
    - Focused `codex-tui` protocol response fixture test — passed.
    - `just fix -p codex-backend-client -p codex-app-server-protocol -p
    codex-app-server` — passed.
    - `just fmt` — passed.
  • build: run buildifier from just fmt (#28125)
    ## Intent
    
    Keep Bazel and Starlark files consistently formatted without requiring
    contributors to install or version buildifier themselves.
    
    ## Implementation
    
    - Add a SHA-256-pinned, cross-platform DotSlash manifest for buildifier
    v8.5.1.
    - Run buildifier from the shared `just fmt` and `just fmt-check` driver,
    with Windows-safe explicit DotSlash invocation.
    - Provision DotSlash in formatting CI and contributor devcontainers, and
    document the source-build prerequisite.
    - Apply the initial mechanical buildifier formatting baseline.
  • feat(app-server): enforce managed remote control disable (#27961)
    ## Why
    
    Managed deployments need a reliable deny gate for remote control.
    Persisted enablement and explicit startup requests currently remain able
    to start the transport, while the removed `features.remote_control` key
    is intentionally only a compatibility no-op.
    
    This adds a dedicated requirement that administrators can use to force
    remote control off without deleting the user's persisted preference.
    Removing the requirement and restarting restores the prior choice.
    
    ## What Changed
    
    - Added top-level `allow_remote_control` requirements parsing, sourced
    layer precedence, debug output, and `configRequirements/read` exposure
    as `allowRemoteControl`.
    - Added a typed transport policy captured from the startup requirements
    snapshot. Managed disable forces the initial state to disabled and
    prevents enrollment, refresh, connection, and persisted-preference
    mutation.
    - Rejected every `remoteControl/*` RPC before parameter deserialization
    with JSON-RPC `-32600` and `remote control is disabled by managed
    requirements`.
    - Preserved the existing disabled status notification and the previous
    behavior when the requirement is `true` or omitted.
    - Regenerated app-server protocol schemas and documented the new
    requirement.
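    
    A minimal sketch of the deny gate, assuming it runs on the method name
    before any parameters are parsed. The error code and message are the
    ones quoted above; `JsonRpcError` and `gate_remote_control` are
    illustrative names.
    
    ```rust
    const INVALID_REQUEST: i64 = -32600;
    const MANAGED_DISABLE_MESSAGE: &str = "remote control is disabled by managed requirements";
    
    /// Illustrative error shape; not the app-server's real type.
    #[derive(Debug, PartialEq, Eq)]
    struct JsonRpcError {
        code: i64,
        message: String,
    }
    
    /// `None` (omitted) and `Some(true)` both keep the previous behavior.
    fn gate_remote_control(
        method: &str,
        allow_remote_control: Option<bool>,
    ) -> Result<(), JsonRpcError> {
        let denied = allow_remote_control == Some(false);
        // Only the method name is inspected, so malformed params are
        // rejected with the same policy error.
        if denied && method.starts_with("remoteControl/") {
            return Err(JsonRpcError {
                code: INVALID_REQUEST,
                message: MANAGED_DISABLE_MESSAGE.to_string(),
            });
        }
        Ok(())
    }
    ```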
    
    ## Verification
    
    - Confirmed all remote-control RPCs, including a malformed request,
    return the managed-policy error while the initial status notification
    remains `disabled`.
    - Confirmed explicit ephemeral startup and persisted enablement make no
    backend connection and leave the SQLite preference unchanged.
    - Confirmed `allow_remote_control = true` does not enable or block
    remote control and `configRequirements/read` returns
    `allowRemoteControl: false` for the deny policy.
    
    Related issue: N/A (managed-policy hardening).
  • feat: use encrypted local secrets for CLI auth (#27539)
    ## Why
    
    Windows Credential Manager limits generic credential blobs to 2,560
    bytes. Large serialized ChatGPT auth payloads can exceed that limit, so
    keyring-mode CLI auth needs a backend that keeps only the encryption key
    in the OS keyring and stores the payload in Codex's encrypted
    local-secrets file.
    
    This is the third PR in the encrypted-auth stack:
    
    1. #27504 — feature and config selection
    2. #27535 — auth-specific local-secrets namespaces
    3. This PR — CLI auth implementation and activation
    4. MCP OAuth implementation and activation
    
    ## What Changed
    
    - Added encrypted CLI-auth storage using the `CliAuth` secrets
    namespace.
    - Preserved direct keyring storage for platforms/configurations where it
    remains selected.
    - Selected the backend consistently for login, logout, refresh,
    device-code login, auth loading, and login restrictions.
    - Threaded resolved bootstrap/full config through CLI, exec, TUI,
    app-server account handling, cloud config, and cloud tasks.
    - Removed stale `auth.json` fallback data after successful encrypted
    saves and removed encrypted, direct-keyring, and fallback data during
    logout.
    - Added storage and integration coverage for both direct and encrypted
    keyring modes.
    
    MCP OAuth persistence is intentionally left to the next PR.
    
    ## Validation
    
    - `just test -p codex-login` — 131 passed
    - `just test -p codex-cli` — 280 passed
    - `just test -p codex-app-server v2::account` — 25 passed
    - `just test -p codex-cloud-config service` — 21 passed, 7 skipped
    - `just fix -p codex-login`
    - `just fix -p codex-cli`
    - `just fmt`
  • feat(app-server): persist remote-control desired state (#27445)
    ## Why
    
    Remote-control runtime enablement and persisted enrollment preference
    were represented by separate flags. That made startup rehydration, RPC
    persistence, and new-enrollment seeding race with one another, and it
    did not cleanly distinguish runtime-only CLI or daemon starts from
    durable app-server RPC changes.
    
    ## What Changed
    
    - Replace the parallel enablement, seed, and rehydration flags with one
    transport-owned `RemoteControlDesiredState`.
    - Add nullable enrollment-scoped persistence and preserve existing
    preferences during enrollment upserts.
    - Rehydrate plain startup only after auth and client scope resolve,
    without overwriting a concurrent RPC transition.
    - Make ordinary `remoteControl/enable` and `remoteControl/disable`
    durable while retaining `ephemeral: true` for runtime-only callers.
    - Have the daemon explicitly request ephemeral enablement and regenerate
    the app-server schemas.
    
    ## Verification
    
    - Covered migration and `NULL`/`0`/`1` persistence round trips.
    - Covered plain-start rehydration and runtime-only versus durable
    enrollment seeding.
    - Covered durable enable, durable disable, and ephemeral enable through
    app-server RPC.
    - Covered the daemon's exact `{ "ephemeral": true }` request payload.
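    
    The `NULL`/`0`/`1` round trip can be pictured with the sketch below.
    Reading `NULL` as "no stored preference", `0` as disabled, and `1` as
    enabled is an assumption, as are the function names.
    
    ```rust
    /// Encodes an optional preference as a nullable integer column.
    fn to_column(preference: Option<bool>) -> Option<i64> {
        preference.map(i64::from)
    }
    
    /// Decodes the column, rejecting anything other than NULL, 0, or 1.
    fn from_column(value: Option<i64>) -> Result<Option<bool>, String> {
        match value {
            None => Ok(None),
            Some(0) => Ok(Some(false)),
            Some(1) => Ok(Some(true)),
            Some(other) => Err(format!("unexpected preference value: {other}")),
        }
    }
    ```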
    
    Related issue: N/A (internal remote-control persistence architecture
    change).
  • [codex] Add comp_hash to model metadata (#27532)
    ## Summary
    - add optional `comp_hash` metadata to `ModelInfo`
    - update `ModelInfo` fixtures for the shared schema change
    - keep older model responses compatible by defaulting the field to
    `None`
    
    ## Why
    The models endpoint needs an opaque identifier for compaction-compatible
    model configurations. This PR only exposes that value in model metadata;
    it does not add it to turn context or change runtime behavior.
    
    Follow-up #27520 carries the value through turn context and rollouts,
    then uses it to trigger compaction.
    
    ## Stack
    - based directly on `main`
    - replaces #27519, which was accidentally merged into the wrong base
    branch
    - functionality follow-up: #27520
    
    ## Testing
    - `just test -p codex-protocol
    model_info_defaults_availability_nux_to_none_when_omitted`
    - `just fix -p codex-core -p codex-protocol -p codex-analytics -p
    codex-models-manager`
  • feat: add Bedrock API key as a managed auth mode (#27443)
    ## Why
    
    Codex needs to manage Amazon Bedrock API key credentials through the
    existing auth lifecycle instead of introducing a separate auth manager
    or provider-specific credential file. Treating Bedrock API key login as
    a primary auth mode gives it the same persistence, keyring, reload, and
    logout behavior as the existing OpenAI API key and ChatGPT modes.
    
    The credential is valid only for the `amazon-bedrock` model provider.
    OpenAI-compatible providers must reject this auth mode rather than
    treating the Bedrock key as an OpenAI bearer token.
    
    ## What changed
    
    - Added `bedrockApiKey` as an app-server `AuthMode` and
    `CodexAuth::BedrockApiKey` as a primary `AuthManager` mode.
    - Added `BedrockApiKeyAuth`, containing the API key and AWS region, to
    the existing `AuthDotJson` payload stored in `$CODEX_HOME/auth.json` or
    the configured keyring backend.
    - Added `login_with_bedrock_api_key(...)`, parallel to
    `login_with_api_key(...)`, which replaces the current stored login with
    Bedrock credentials.
    - Reused generic auth reload and logout behavior instead of adding a
    Bedrock-specific auth manager or logout path.
    - Updated login restrictions, status reporting, diagnostics, telemetry
    classification, generated app-server schemas, and auth fixtures for the
    new mode.
    - Added explicit errors when Bedrock API key auth is selected with an
    OpenAI-compatible model provider.
    
    This PR establishes managed storage and auth-mode behavior. Routing the
    managed key and region into Amazon Bedrock requests will be in follow-up
    PRs.
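    
    The provider restriction might be expressed as a check like the one
    below. `AuthKind` and `validate_auth_for_provider` are hypothetical
    names; only the `amazon-bedrock` provider ID and the rejection rule
    come from the description. How other auth modes interact with the
    Bedrock provider is not covered here.
    
    ```rust
    /// Illustrative auth kinds; not the real `CodexAuth` type.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum AuthKind {
        ApiKey,
        Chatgpt,
        BedrockApiKey,
    }
    
    const BEDROCK_PROVIDER_ID: &str = "amazon-bedrock";
    
    /// Rejects a Bedrock key for any provider other than `amazon-bedrock`.
    fn validate_auth_for_provider(auth: AuthKind, provider_id: &str) -> Result<(), String> {
        if auth == AuthKind::BedrockApiKey && provider_id != BEDROCK_PROVIDER_ID {
            return Err(format!(
                "Bedrock API key auth cannot be used with provider `{provider_id}`"
            ));
        }
        Ok(())
    }
    ```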
  • Add app-server thread/delete API (#25018)
    ## Why
    
    Clients can archive and unarchive threads today, but there is no
    app-server API for permanently removing a thread. Deletion also needs to
    cover the full session tree: deleting a main thread should remove
    spawned subagent threads and the related local metadata instead of
    leaving orphaned rollout files, goals, or subagent state behind.
    
    ## What
    
    - Adds the v2 `thread/delete` request and `thread/deleted` notification,
    with the response shape kept consistent with `thread/archive`.
    - Implements local hard delete for active and archived rollout files.
    - Deletes the requested thread's state DB row as the commit point, then
    best-effort cleans associated state including spawned descendants,
    goals, spawn edges, logs, dynamic tools, and agent job assignments.
    - Updates app-server API docs and generated protocol schema/TypeScript
    fixtures.
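    
    The delete ordering described above (remove the requested row first as
    the commit point, then clean up descendants) might look roughly like the
    following. `ThreadTable` is a hypothetical in-memory stand-in for the
    state DB, and the sketch ignores rollout files, goals, and logs.
    
    ```rust
    use std::collections::HashMap;
    
    /// Hypothetical stand-in for the state DB: thread id -> parent thread id.
    struct ThreadTable {
        parent_of: HashMap<String, Option<String>>,
    }
    
    impl ThreadTable {
        /// Deletes `thread_id` first (the commit point), then removes every
        /// spawned descendant. Returns the removed ids in sorted order, or
        /// `None` if the requested thread did not exist.
        fn delete_tree(&mut self, thread_id: &str) -> Option<Vec<String>> {
            self.parent_of.remove(thread_id)?;
            let mut removed = vec![thread_id.to_string()];
            let mut frontier = vec![thread_id.to_string()];
            while let Some(parent) = frontier.pop() {
                // Collect children before mutating the map.
                let children: Vec<String> = self
                    .parent_of
                    .iter()
                    .filter(|(_, p)| p.as_deref() == Some(parent.as_str()))
                    .map(|(id, _)| id.clone())
                    .collect();
                for child in children {
                    self.parent_of.remove(&child);
                    removed.push(child.clone());
                    frontier.push(child);
                }
            }
            removed.sort();
            Some(removed)
        }
    }
    ```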
  • [codex-rs] support v2 personal access tokens (#25731)
    ## Summary
    
    - add v2 personal access token support for `codex login
    --with-access-token` and `CODEX_ACCESS_TOKEN`
    - classify opaque `at-` tokens separately from legacy Agent Identity
    JWTs
    - hydrate required ChatGPT account metadata through AuthAPI
    `/v1/user-auth-credential/whoami`
    - use PATs directly as bearer tokens while preserving existing ChatGPT
    account surfaces
    - expose PAT-backed auth as the explicit `personalAccessToken`
    app-server auth mode
    
    ## Implementation
    
    PAT auth is intentionally small and stateless. Loading a PAT performs
    one AuthAPI metadata request, stores the hydrated metadata in the
    in-memory auth object, and redacts the secret from debug output. Legacy
    Agent Identity JWT handling remains unchanged. The shared access-token
    classifier lives in a private neutral module because it dispatches
    between both credential types.
    
    PAT hydration fails closed when AuthAPI omits any required metadata,
    including email. Hydrated metadata is intentionally not persisted:
    startup performs a live `whoami` preflight so revoked tokens or changed
    account metadata are not accepted from a stale cache.
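    
    As a rough illustration of the classifier described above, the sketch
    below separates opaque `at-` tokens from JWT-shaped values. The enum and
    function names are hypothetical, and the "three non-empty dot-separated
    segments" test for a JWT is an assumption, not the actual detection rule.
    
    ```rust
    /// Hypothetical classifier result; names are illustrative only.
    #[derive(Debug, PartialEq, Eq)]
    enum AccessTokenKind {
        PersonalAccessToken,
        LegacyAgentIdentityJwt,
    }
    
    /// Classifies a token by shape. Unrecognized input yields `None` so the
    /// caller can fail closed.
    fn classify_access_token(token: &str) -> Option<AccessTokenKind> {
        let token = token.trim();
        if token.is_empty() {
            return None;
        }
        if token.starts_with("at-") {
            return Some(AccessTokenKind::PersonalAccessToken);
        }
        // Assumption: a JWT is three non-empty segments separated by dots.
        let segments: Vec<&str> = token.split('.').collect();
        if segments.len() == 3 && segments.iter().all(|s| !s.is_empty()) {
            return Some(AccessTokenKind::LegacyAgentIdentityJwt);
        }
        None
    }
    ```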
    
    ## Workspace restriction scope
    
    This change intentionally does **not** apply
    `forced_chatgpt_workspace_id` to PAT authentication. The setting is a
    client-side config guardrail, not an authorization boundary, and PAT
    does not currently require workspace-ID parity. The PAT login and
    `CODEX_ACCESS_TOKEN` paths therefore validate through AuthAPI without
    threading workspace-restriction state through access-token loading.
    Existing workspace checks for non-PAT auth remain on their established
    paths.
    
    ## App-server compatibility
    
    The public app-server `AuthMode` is shared across v1 and v2, and
    PAT-backed auth reports `personalAccessToken` through both APIs.
    Following human review, this intentionally removes the temporary v1
    compatibility mapping that reported PATs as `chatgpt`; the deprecated v1
    API is kept in parity with v2 rather than maintaining a separate closed
    enum. Clients with exhaustive auth-mode handling in either API version
    must add the new case and should generally treat it as ChatGPT-backed
    unless they need PAT-specific behavior.
    
    The v1 auth-status response still omits the raw PAT when `includeToken`
    is requested because that response cannot carry the account metadata
    needed to reuse the credential safely. Persisted PAT auth also omits the
    new enum value so older Codex builds can deserialize `auth.json` and
    infer PAT auth from the credential field after a rollback.
    
    ## Validation
    
    Latest review-fix validation:
    
    - `CARGO_INCREMENTAL=0 just test -p codex-login` (126 passed)
    - `CARGO_INCREMENTAL=0 just test -p codex-cli` (263 passed)
    - `CARGO_INCREMENTAL=0 just test -p codex-cli
    stored_auth_validation_handles_personal_access_token`
    - `CARGO_INCREMENTAL=0 just test -p codex-app-server-protocol` (226
    passed)
    - `CARGO_INCREMENTAL=0 just test -p codex-models-manager
    refresh_available_models_uses_remote_only_catalog_for_chatgpt_auth`
    - `CARGO_INCREMENTAL=0 just test -p codex-tui
    existing_non_oauth_chatgpt_login_counts_as_signed_in`
    - `CARGO_INCREMENTAL=0 just fix -p codex-login -p
    codex-app-server-protocol -p codex-models-manager -p codex-tui -p
    codex-cli`
    - `just fmt`
    - `git diff --check`
    
    The broader `codex-tui` suite previously compiled and ran 2,834 tests.
    Three unrelated environment-sensitive guardian/IDE-socket tests failed
    after retries; the PAT-relevant TUI coverage passed.
  • feat(app-server): add remote control pairing status RPC (#26450)
    ## What
    
    Exposes the pairing status transport as experimental app-server v2 RPC
    `remoteControl/pairing/status`.
    
    - Adds request/response protocol types for exactly one lookup key:
    `pairingCode` or `manualPairingCode`, returning `{ claimed }`.
    - Registers the RPC with `global_shared_read("remote-control-pairing")`.
    - Wires the method through `MessageProcessor` and
    `RemoteControlRequestProcessor`.
    - Validates missing/conflicting pairing-code params as invalid requests.
    - Documents the RPC in `app-server/README.md`.
    - Adds processor, protocol export, and JSON-RPC integration coverage for
    both code paths.
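    
    The "exactly one lookup key" validation can be sketched as a match over
    both optional fields. The struct, enum, and error strings below are
    hypothetical; only the `pairingCode` / `manualPairingCode` names come
    from the description above.
    
    ```rust
    /// Hypothetical request params mirroring `pairingCode` / `manualPairingCode`.
    struct PairingStatusParams {
        pairing_code: Option<String>,
        manual_pairing_code: Option<String>,
    }
    
    #[derive(Debug, PartialEq, Eq)]
    enum LookupKey {
        PairingCode(String),
        ManualPairingCode(String),
    }
    
    /// Accepts exactly one key; missing or conflicting params are rejected,
    /// which a caller would map to an invalid-request error.
    fn lookup_key(params: PairingStatusParams) -> Result<LookupKey, &'static str> {
        match (params.pairing_code, params.manual_pairing_code) {
            (Some(code), None) => Ok(LookupKey::PairingCode(code)),
            (None, Some(code)) => Ok(LookupKey::ManualPairingCode(code)),
            (None, None) => Err("one of pairingCode or manualPairingCode is required"),
            (Some(_), Some(_)) => Err("pairingCode and manualPairingCode are mutually exclusive"),
        }
    }
    ```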
    
    ## Why
    
    This is the app-server surface the desktop app can poll while the
    QR/manual pairing modal is active.
    
    Depends on https://github.com/openai/codex/pull/26449
    Related backend change: https://github.com/openai/openai/pull/990244
    
    ## Verification
    
    - `cargo test --manifest-path app-server-protocol/Cargo.toml
    remote_control`
    - `cargo test --manifest-path app-server/Cargo.toml remote_control`
    - `cargo fmt --all --check`
    - `git diff --check`
  • [codex] Add use_responses_lite 'override' logic (#26487)
    ## Summary
    
    - add a defaulted `ModelInfo.use_responses_lite` catalog field
    - support serializing `reasoning.context` while preserving the existing
    effort and summary path
    - the field has not been turned on for any models yet
    
    I've added an override to parallel tools if responses_lite is on. I've
    also forced persistent reasoning when using responses_lite. It would be
    ideal if we could centralize all the responses_lite plumbing, but I
    think this is best for now to keep the plumbing & diffs small.
    
    ## Testing
    
    - `cargo test -p codex-protocol
    model_info_defaults_availability_nux_to_none_when_omitted`
    - `RUST_MIN_STACK=8388608 cargo test -p codex-core
    responses_lite_sets_all_turns_context_and_disables_parallel_tool_calls`
    - `RUST_MIN_STACK=8388608 cargo test -p codex-core
    configured_reasoning_summary_is_sent`
    - `cargo check -p codex-core --tests`
    - `RUST_MIN_STACK=8388608 cargo clippy -p codex-core --tests` (passes
    with pre-existing warnings in `codex-code-mode` and
    `codex-core-plugins`)
  • [codex] Support model-defined reasoning efforts (#26444)
    ## Summary
    - accept non-empty model-defined reasoning effort values while
    preserving built-in effort behavior
    - propagate the non-Copy effort type through core, app-server, TUI,
    telemetry, and persistence call sites
    - preserve string wire encoding and expose an open-string schema for
    clients
    - update model selection and shortcut behavior for model-advertised
    effort values
    
    ## Root cause
    `ReasoningEffort` gained a string-backed custom variant, so it could no
    longer implement `Copy` or rely on derived closed-enum serialization.
    Existing consumers still moved effort values from shared references and
    assumed a fixed built-in value set.
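    
    To make the root cause concrete, here is a minimal sketch of an effort
    type with a string-backed variant. The built-in names (`low`, `medium`,
    `high`) and the method names are assumptions for illustration; the real
    `ReasoningEffort` may differ.
    
    ```rust
    /// Sketch only: a `String` payload means the type cannot derive `Copy`.
    #[derive(Debug, Clone, PartialEq, Eq)]
    enum ReasoningEffort {
        Low,
        Medium,
        High,
        Custom(String),
    }
    
    impl ReasoningEffort {
        /// Accepts any non-empty value; unknown strings become `Custom`.
        fn from_wire(value: &str) -> Option<Self> {
            match value {
                "" => None,
                "low" => Some(Self::Low),
                "medium" => Some(Self::Medium),
                "high" => Some(Self::High),
                other => Some(Self::Custom(other.to_string())),
            }
        }
    
        /// Serializes back to the same string, preserving the wire encoding.
        fn as_wire(&self) -> &str {
            match self {
                Self::Low => "low",
                Self::Medium => "medium",
                Self::High => "high",
                Self::Custom(value) => value.as_str(),
            }
        }
    }
    ```
    
    Call sites that previously copied the value out of a shared reference
    need an explicit `.clone()` with a type shaped like this.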
    
    ## Validation
    - `just fmt`
    - Local tests and compilation were not run per request; relying on CI.
  • feat(app-server): add remote control client management RPCs (#25785)
    ## Why
    
    Remote-control clients need to list and revoke controller-device grants
    without enabling or enrolling the local relay. These are signed-in
    account-management operations, so coupling them to websocket, pairing,
    enrollment, or persisted relay state would prevent clients from managing
    stale grants from the picker.
    
    Related enhancement request: N/A. This adds the Codex app-server surface
    for the planned upstream environment-scoped revoke endpoint.
    
    ## What Changed
    
    - Added experimental app-server v2 RPCs:
      - `remoteControl/client/list`
      - `remoteControl/client/revoke`
    - Added picker-oriented protocol types and standard generated schema
    fixtures. The list response intentionally omits backend account id,
    enrollment status, and location fields.
    - Added `app-server-transport/src/transport/remote_control/clients.rs`
    for environment-scoped GET and DELETE requests. It builds escaped URL
    path segments, forwards optional pagination query fields, sends ChatGPT
    auth plus `chatgpt-account-id`, converts RFC3339 `last_seen_at` values
    to Unix seconds, accepts `204 No Content` revoke responses, and retries
    once after a `401`.
    - Extracted shared ChatGPT auth loading and recovery into
    `app-server-transport/src/transport/remote_control/auth.rs` so
    websocket, pairing, and client management use the same account-auth
    boundary.
    - Retained the configured remote-control base URL on
    `RemoteControlHandle` and resolve management URLs lazily, preserving
    deferred validation while relay startup is disabled.
    - Registered list as `global_shared_read("remote-control-clients")` and
    revoke as `global("remote-control-clients")`.
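    
    The retry rule described above (retry once after a `401`, never after a
    `403`) can be sketched with two closures. `send_with_auth_retry` and its
    parameters are hypothetical; the real transport works with HTTP
    responses and shared auth recovery rather than bare status codes.
    
    ```rust
    /// Calls `send`, and if the status is 401, runs `refresh_auth` and sends
    /// exactly one more time. Any other status is returned unchanged.
    fn send_with_auth_retry<S, R>(mut send: S, mut refresh_auth: R) -> u16
    where
        S: FnMut() -> u16,
        R: FnMut(),
    {
        let status = send();
        if status != 401 {
            return status;
        }
        refresh_auth();
        // The second result is final, even if it is another 401.
        send()
    }
    ```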
    
    ## Verification
    
    - Added transport coverage proving list and revoke work while relay
    state is disabled, IDs are escaped, picker-only fields are returned,
    timestamps are converted, revoke accepts `204`, auth headers are
    forwarded, `401` retries exactly once, `403` is not retried, and
    malformed list payloads retain decode context.
    - Added an app-server integration test proving both JSON-RPC methods
    work before relay enablement and successful revoke returns `{}`.
    - Regenerated and validated experimental and standard app-server schema
    fixtures.
  • Add multi-agent runtime metadata types (#25720)
    Stack split from #25708. Original PR intentionally left open. This first
    PR adds the multi-agent runtime metadata types and catalog plumbing used
    by the rest of the stack.
  • feat(remote-control): add pairing start (#25675)
    ## Why
    
    Remote control enrollment authorizes a desktop server, but app-server v2
    did not expose the follow-up pairing operation needed to mint a
    short-lived controller pairing artifact from that enrolled server.
    Clients need a narrow RPC that starts pairing without exposing the
    backend `serverId` or conflating pairing with websocket connection
    state.
    
    Issue: N/A; internal remote-control pairing API change.
    
    ## What Changed
    
    Added experimental app-server v2 `remoteControl/pairing/start` with
    `manualCode` input and `pairingCode`, nullable `manualPairingCode`,
    `environmentId`, and Unix-seconds `expiresAt` output. The method
    serializes under its own `global("remote-control-pairing")` scope and is
    documented in `app-server/README.md`.
    
    Extended the remote-control transport with private `/server/pair`
    request/response types and normalized `pair_url` handling. Pairing uses
    the current enrolled server bearer, refreshes that bearer when needed,
    keeps backend `server_id` private, validates returned `server_id` and
    `environment_id` against the current enrollment, and preserves backend
    status/header/body context for failures and malformed responses.
    
    Wired the request through `RemoteControlRequestProcessor` and
    `MessageProcessor`, mapping unavailable/disabled pairing to
    `invalid_request` and backend failures to internal errors.
    
    ## Verification
    
    - `just test -p codex-app-server-transport`
    - `just test -p codex-app-server
    remote_control_pairing_start_returns_pairing_artifacts`
  • fix: rename McpServer to TestAppServer (#25701)
    This PR brought to you via VS Code rather than Codex...
    
    - opened `codex-rs/app-server/tests/common/mcp_process.rs`
    - put the cursor on `McpServer`
    - hit `F2` and renamed the symbol to `TestAppServer`
    - went to the file tree
    - hit enter and renamed `mcp_process.rs` to `test_app_server.rs`
    - ran **Save All Files** from the Command Palette
    - ran `just fmt`
    
    The End
    
    (Admittedly, most of the local variables for `TestAppServer` are still
    named `mcp`, though.)
  • [codex-rs] auto-review model override (#23767)
    ## Why
    
    Guardian auto-review normally uses the provider-preferred review model
    when one is available. Some parent models need model-catalog metadata to
    select a different review model while keeping older `/models` payloads
    compatible when that metadata is absent.
    
    ## What changed
    
    - Added optional `ModelInfo::auto_review_model_override` metadata to the
    public model payload as a review-model slug.
    - Updated Guardian review model selection to prefer the catalog override
    when present, while preserving the existing provider preferred-model
    path and parent-model fallback when it is omitted.
    - Added focused Guardian coverage for override and no-override model
    selection.
    - Added an `auto_review` core integration suite test that loads override
    metadata from a remote model catalog path and asserts the strict
    auto-review `/responses` request uses the catalog-selected review model.
    - Updated existing `ModelInfo` fixtures and local catalog constructors
    for the new optional field.
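    
    The selection order described above reduces to a short fallback chain.
    The function name and the model slugs used with it are hypothetical;
    this is a sketch of the precedence, not the Guardian implementation.
    
    ```rust
    /// Prefers the catalog override, then the provider-preferred review
    /// model, and finally falls back to the parent model.
    fn select_review_model<'a>(
        catalog_override: Option<&'a str>,
        provider_preferred: Option<&'a str>,
        parent_model: &'a str,
    ) -> &'a str {
        catalog_override
            .or(provider_preferred)
            .unwrap_or(parent_model)
    }
    ```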
    
    ## Validation
    
    - `cargo test -p codex-protocol
    model_info_defaults_availability_nux_to_none_when_omitted`
    - `cargo test -p codex-core guardian_review_uses_`
    - `cargo test -p codex-core
    remote_model_override_uses_catalog_model_for_strict_auto_review --test
    all`
    - `just fix -p codex-protocol`
    - `just fix -p codex-core`
    - `just fmt`
    - `git diff --check`
  • store and expose parent_thread_id on Threads (#25113)
    ## Why
    
    The discussion at
    https://github.com/openai/codex/pull/24161#discussion_r3325692763
    revealed a subagent data modeling issue: we overloaded
    `forked_from_id` to also mean `parent_thread_id`. That's incorrect, since
    guardian and review subagents are subagents but do NOT fork the main
    thread's history.
    
    The solution here is to explicitly store a new `parent_thread_id` on
    `SessionMeta`, alongside `forked_from_id` which already exists. While
    we're at it, also expose it in the app-server protocol on the `Thread`
    object.
    
    A thread->subagent relationship and a fork of thread history are
    orthogonal concepts.
    
    ## What Changed
    
    - Added top-level `parent_thread_id` persistence on `SessionMeta` and
    runtime/session plumbing through `SessionConfiguredEvent`,
    `CodexSpawnArgs`, `SessionConfiguration`, `ThreadConfigSnapshot`,
    `TurnContext`, and `ModelClient`.
    - Made turn metadata, request headers, analytics, and subagent-start
    events read the separate runtime/top-level parent field instead of
    deriving general parent lineage from `SessionSource` or
    `forked_from_thread_id`.
    - Passed parent lineage separately at delegated subagent, review,
    guardian, agent-job, and multi-agent spawn construction sites;
    copied-history fork lineage remains derived only from `InitialHistory`.
    - Persisted and exposed parent lineage through rollout/thread-store
    projections and app-server v2 `Thread.parentThreadId`.
    - Updated app-server README text and regenerated app-server schema
    fixtures for the additive `parentThreadId` response field.
  • [codex] Add model tool mode selector (#25031)
    ## Why
    Some models need to select their code-execution behavior through model
    catalog metadata. Models without that metadata must continue to follow
    the existing `CodeMode` and `CodeModeOnly` feature flags, including when
    a newer server sends an enum value this client does not recognize.
    
    ## What changed
    - add optional `ModelInfo.tool_mode` metadata with `direct`,
    `code_mode`, and `code_mode_only`
    - treat omitted and unknown wire values as `None`
    - resolve `None` from the existing feature flags
    - carry the resolved `ToolMode` directly on `TurnContext`, outside
    `Config`
    - use the resolved value for turn creation, model switches, review
    turns, tool planning, and code execution
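    
    A minimal sketch of the parse-then-resolve flow above. The wire strings
    come from this description; the function names and the precedence
    between the two feature flags (`CodeModeOnly` winning over `CodeMode`)
    are assumptions.
    
    ```rust
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum ToolMode {
        Direct,
        CodeMode,
        CodeModeOnly,
    }
    
    /// Omitted and unrecognized wire values both become `None`.
    fn parse_tool_mode(wire: Option<&str>) -> Option<ToolMode> {
        match wire? {
            "direct" => Some(ToolMode::Direct),
            "code_mode" => Some(ToolMode::CodeMode),
            "code_mode_only" => Some(ToolMode::CodeModeOnly),
            _ => None,
        }
    }
    
    /// Explicit metadata wins; `None` falls back to the feature flags.
    fn resolve_tool_mode(metadata: Option<ToolMode>, code_mode: bool, code_mode_only: bool) -> ToolMode {
        metadata.unwrap_or(if code_mode_only {
            ToolMode::CodeModeOnly
        } else if code_mode {
            ToolMode::CodeMode
        } else {
            ToolMode::Direct
        })
    }
    ```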
    
    ## Coverage
    - add protocol coverage for omitted, known, and unknown enum values
    - add focused coverage for flag fallback and explicit metadata
    overriding feature flags
    - add core integration coverage that fetches remote model metadata
    through `/v1/models` and verifies the outbound `/responses` tools for
    explicit `direct` and `code_mode_only` selectors
    
    ## Stack
    - followed by #25032
  • Add runtime extra skill roots API (#24977)
    ## Summary
    - Add v2 `skills/extraRoots/set` to replace app-server process-local
    standalone skill roots. The setting is not persisted, accepts missing
    roots, and `extraRoots: []` clears the runtime set.
    - Wire runtime roots into core skill discovery for `skills/list` and
    turn loads, clear skill caches on set, and register the roots with the
    skills watcher so later filesystem changes emit `skills/changed`.
    - Update app-server docs, generated JSON/TypeScript schemas, and
    coverage for serialization, missing roots, empty clears, and restart
    behavior.
    
    ## Testing
    - `cargo test -p codex-app-server-protocol`
    - `cargo test -p codex-core-skills`
    - `cargo test -p codex-app-server
    skills_extra_roots_set_updates_process_runtime_roots`
    - `just fix -p codex-app-server-protocol`
    - `just fix -p codex-core-skills`
    - `just fix -p codex-app-server`
  • package: include zsh fork in Codex package (#23756)
    ## Why
    
    The package layout gives Codex a stable place for runtime helpers that
    should travel with the entrypoint. `shell_zsh_fork` still required users
    to configure `zsh_path` manually, even though we already publish
    prebuilt zsh fork artifacts.
    
    This PR builds on #24129 and uses the shared DotSlash artifact fetcher
    to include the zsh fork in Codex packages when a matching target
    artifact exists. Packaged Codex builds can then discover the bundled
    fork automatically; the user/profile `zsh_path` override is removed so
    the feature uses the package-managed artifact instead of a legacy path
    knob.
    
    ## What Changed
    
    - Added `scripts/codex_package/codex-zsh`, a checked-in DotSlash
    manifest for the current macOS arm64 and Linux zsh fork artifacts.
    - Taught `scripts/build_codex_package.py` to fetch the matching zsh fork
    artifact and install it at `codex-resources/zsh/bin/zsh` when available
    for the selected target.
    - Added package layout validation for the optional bundled zsh resource.
    - Added `InstallContext::bundled_zsh_path()` and
    `InstallContext::bundled_zsh_bin_dir()` for package-layout resource
    discovery.
    - Threaded the packaged zsh path through config loading as the runtime
    `zsh_path` for packaged installs, and removed the config/profile/CLI
    override path.
    - Kept the packaged default zsh override typed as `AbsolutePathBuf`
    until the existing runtime `Config::zsh_path` boundary.
    - Updated app-server zsh-fork integration tests to spawn
    `codex-app-server` from a temporary package layout with
    `codex-resources/zsh/bin/zsh`, matching the new packaged discovery path
    instead of setting `zsh_path` in config.
    - Switched package executable copying from metadata-preserving `copy2()`
    to `copyfile()` plus explicit executable bits, which avoids macOS
    file-flag failures when local smoke tests use system binaries as inputs.
    
    ## Testing
    
    To verify that the `zsh` executable from the Codex package is picked up
    correctly, first I ran:
    
    ```shell
    ./scripts/build_codex_package.py
    ```
    
    which created:
    
    ```
    /private/var/folders/vw/x2knqmks50sfhfpy27nftl900000gp/T/codex-package-pms94kdp/
    ```
    
    so then I ran:
    
    ```
    /private/var/folders/vw/x2knqmks50sfhfpy27nftl900000gp/T/codex-package-pms94kdp/bin/codex exec --enable shell_zsh_fork 'run `echo $0`'
    ```
    
    which reported the following, as expected:
    
    ```
    /private/var/folders/vw/x2knqmks50sfhfpy27nftl900000gp/T/codex-package-pms94kdp/codex-resources/zsh/bin/zsh
    ```
    
    
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/23756).
    * #23768
    * __->__ #23756
  • [codex] Add rollout-backed thread content search (#23519)
    ## Summary
    - add experimental `thread/search` for local rollout-backed thread
    search using `rg` over JSONL rollouts
    - return search-specific result rows with optional previews instead of
    storing preview data on `StoredThread` or ordinary `Thread` responses
    - keep `thread/list` separate from full-content search and document the
    new app-server surface
    
    ## Testing
    - `cargo test -p codex-app-server-protocol`
    - `cargo test -p codex-app-server
    thread_search_returns_content_and_title_matches -- --nocapture`
  • Honor client-resolved service tier defaults (#23537)
    ## Why
    
    Model catalog responses can now advertise a nullable
    `default_service_tier` for each model. Codex needs to preserve three
    distinct states all the way from config/app-server inputs to inference:
    
    - no explicit service tier, so the client may apply the current model
    catalog default when FastMode is enabled
    - explicit `default`, meaning the user intentionally wants standard
    routing
    - explicit catalog tier ids such as `priority`, `flex`, or future tiers
    
    Keeping those states distinct prevents the UI from showing one tier
    while core sends another, especially after model switches or app-server
    `thread/start` / `turn/start` updates.
    
    ## What Changed
    
    - Plumbed `default_service_tier` through model catalog protocol types,
    app-server model responses, generated schemas, model cache fixtures, and
    provider/model-manager conversions.
    - Added the request-only `default` service tier sentinel and normalized
    legacy config spelling so `fast` in `config.toml` still materializes as
    the runtime/request id `priority`.
    - Moved catalog default resolution to the TUI/client side, including
    recomputing the effective service tier when model/FastMode-dependent
    surfaces change.
    - Updated app-server thread lifecycle config construction so
    `serviceTier: null` preserves explicit standard-routing intent by
    mapping to `default` instead of internal `None`.
    - Kept core responsible for validating explicit tiers against the
    current model and stripping `default` before `/v1/responses`, without
    applying catalog defaults itself.
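    
    The three states and the split of responsibilities above can be modeled
    roughly as follows. All type and function names are hypothetical; the
    sketch only illustrates the `fast` to `priority` normalization,
    client-side default resolution, and stripping `default` before the
    request.
    
    ```rust
    #[derive(Debug, Clone, PartialEq, Eq)]
    enum ServiceTierSetting {
        /// No explicit tier: the client may apply the catalog default.
        Unset,
        /// Explicit `default`: the user wants standard routing.
        ExplicitDefault,
        /// Explicit catalog tier id such as `priority` or `flex`.
        Tier(String),
    }
    
    /// Normalizes config spelling; legacy `fast` becomes `priority`.
    fn from_config(value: Option<&str>) -> ServiceTierSetting {
        match value {
            None => ServiceTierSetting::Unset,
            Some("default") => ServiceTierSetting::ExplicitDefault,
            Some("fast") => ServiceTierSetting::Tier("priority".to_string()),
            Some(other) => ServiceTierSetting::Tier(other.to_string()),
        }
    }
    
    /// Client side: apply the catalog default only when unset and FastMode is on.
    fn resolve_on_client(
        setting: &ServiceTierSetting,
        catalog_default: Option<&str>,
        fast_mode: bool,
    ) -> ServiceTierSetting {
        match (setting, catalog_default) {
            (ServiceTierSetting::Unset, Some(tier)) if fast_mode => {
                ServiceTierSetting::Tier(tier.to_string())
            }
            (other, _) => other.clone(),
        }
    }
    
    /// Core side: `default` and unset are both omitted from the request.
    fn tier_for_request(setting: &ServiceTierSetting) -> Option<&str> {
        match setting {
            ServiceTierSetting::Tier(id) => Some(id.as_str()),
            _ => None,
        }
    }
    ```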
    
    ## Validation
    
    - `CARGO_INCREMENTAL=0 cargo build -p codex-cli`
    - `CARGO_INCREMENTAL=0 cargo test -p codex-app-server model_list`
    - `cargo test -p codex-tui service_tier`
    - `cargo test -p codex-protocol service_tier_for_request`
    - `cargo test -p codex-core get_service_tier`
    - `RUST_MIN_STACK=8388608 CARGO_INCREMENTAL=0 cargo test -p codex-core
    service_tier`
  • Add thread/settings/update app-server API (#23502)
    ## Why
    
    App-server clients need a way to update a thread's next-turn settings
    without starting a turn, adding transcript content, or waiting for turn
    lifecycle events. This gives settings UI a direct path for durable
    thread settings while clients observe the eventual effective state
    through a notification.
    
    This is a simplified rework of PR
    https://github.com/openai/codex/pull/22509. In particular, it changes
    the `thread/settings/update` API to return immediately rather than
    waiting and returning the effective (updated) thread settings. This
    makes the new API consistent with `turn/start` and greatly reduces the
    complexity of the implementation relative to the earlier attempt.
    
    ## What Changed
    
    - Adds experimental `thread/settings/update` with partial-update request
    fields and an empty acknowledgment response.
    - Adds experimental `thread/settings/updated`, carrying full effective
    `ThreadSettings` and scoped by `threadId` to subscribed clients for the
    affected thread.
    - Shares durable settings validation with `turn/start`, including
    `sandboxPolicy` plus `permissions` rejection and `serviceTier: null`
    clearing.
    - Emits the same settings notification when `turn/start` overrides
    change the stored effective thread settings.
    - Regenerates app-server protocol schema fixtures and updates
    `app-server/README.md`.
  • feat: add permission profile list api (#23412)
    ## Why
    
    Clients need a typed permission-profile catalog instead of
    reconstructing that state from config internals.
    
    ## What changed
    
    - Added `permissionProfile/list` to the app-server v2 protocol with
    cursor pagination and optional `cwd`.
    - The list response includes built-in permission profiles plus
    config-defined `[permissions.<id>]` profiles from the effective config
    for the request context.
    - Permission profiles keep optional `description` metadata for display
    purposes.
    - App-server docs and schema fixtures are updated for the new RPC.
  • [codex] Add installed-plugin mention API (#22448)
    ## Summary
    - add app-server `plugin/installed` for mention-oriented plugin loading
    - return installed plugins plus explicitly requested install-suggestion
    rows
    - keep remote handling on installed-state data instead of the broad
    catalog listing path
    
    ## Why
    The `@` mention surface only needs plugins that are usable now, plus a
    small product-approved set of install suggestions. It does not need the
    full catalog-shaped `plugin/list` payload that the Plugins page uses.
    
    ## Validation
    - `just write-app-server-schema`
    - `just fmt`
    - `cargo test -p codex-app-server-protocol`
    - `cargo test -p codex-core-plugins`
    - `cargo test -p codex-app-server --test all plugin_installed_`
    
    ## Notes
    - The package-wide `cargo test -p codex-app-server` run still hits an
    existing unrelated stack overflow in
    `in_process::tests::in_process_start_clamps_zero_channel_capacity`.
    - Companion webview PR: https://github.com/openai/openai/pull/915672
  • feat(app-server): update remote control APIs for better UX (#22877)
    ## Why
    To support the `codex remote-control` CLI UX improvements planned for a
    follow-up, this PR adds `server-name` to the following remote control
    APIs:
    - `remoteControl/enable`
    - `remoteControl/disable`
    - `remoteControl/status/changed`
    
    It also adds a `remoteControl/status/read` API, which will be helpful in
    the Codex App.
  • enable/disable remote control at runtime, not via features (#22578)
    ## Why
    Reapplies https://github.com/openai/codex/pull/22386, which was
    previously reverted.
    
    Also, introduce `remoteControl/enable` and `remoteControl/disable`
    app-server APIs to toggle on/off remote control at runtime for a given
    running app-server instance.
    
    ## What Changed
    
    - Adds experimental v2 RPCs:
      - `remoteControl/enable`
      - `remoteControl/disable`
    - Adds `RemoteControlRequestProcessor` and routes the new RPCs through
    it instead of `ConfigRequestProcessor`.
    - Adds named `RemoteControlHandle::enable`, `disable`, and `status`
    methods.
    - Makes `remoteControl/enable` return an error when sqlite state DB is
    unavailable, while keeping enrollment/websocket failures as async status
    updates.
    - Adds `AppServerRuntimeOptions.remote_control_enabled` and hidden
    `--remote-control` flags for `codex app-server` and `codex-app-server`.
    - Updates managed daemon startup to use `codex app-server
    --remote-control --listen unix://`.
    - Marks `Feature::RemoteControl` as removed and ignores
    `[features].remote_control`.
    - Updates app-server README entries for the new remote-control methods.
  • feat(app-server, threadstore): Thread pagination APIs and ThreadStore contract (#21566)
    ## Why
    The goal of this PR is to align on app-server and `ThreadStore` API
    updates for paginating through large threads.
    
    
    #### app-server
    ##### `thread/turns/list`
    - Updates `thread/turns/list` to support `itemsView?: "notLoaded" |
    "summary" | "full" | null`, defaulting to `summary`.
    - Implements the current `thread/turns/list` behavior over the existing
    persisted rollout-history fallback:
      - `notLoaded` returns turn envelopes with empty `items`.
      - `summary` returns the first user message and final assistant message
      when available.
      - `full` preserves the existing full item behavior.
    
    Note that this method still uses the naive approach of loading the
    entire rollout file, and returns just the filtered slice of the data.
    Real pagination will come later by leveraging SQLite.
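    
    The three `itemsView` modes above amount to a projection over a turn's
    already-loaded items. The `Item` and `ItemsView` types here are
    hypothetical simplifications of the real protocol types.
    
    ```rust
    /// Hypothetical, reduced item type for illustration.
    #[derive(Debug, Clone, PartialEq, Eq)]
    enum Item {
        UserMessage(String),
        AssistantMessage(String),
        ToolCall(String),
    }
    
    #[derive(Debug, Clone, Copy)]
    enum ItemsView {
        NotLoaded,
        Summary,
        Full,
    }
    
    /// Filters a fully loaded turn down to the requested view.
    fn project_items(items: &[Item], view: ItemsView) -> Vec<Item> {
        match view {
            ItemsView::NotLoaded => Vec::new(),
            ItemsView::Full => items.to_vec(),
            ItemsView::Summary => {
                let first_user = items.iter().find(|i| matches!(i, Item::UserMessage(_)));
                let last_assistant = items
                    .iter()
                    .rev()
                    .find(|i| matches!(i, Item::AssistantMessage(_)));
                // Either side may be absent; keep whichever exists.
                first_user.into_iter().chain(last_assistant).cloned().collect()
            }
        }
    }
    ```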
    
    ##### `thread/turns/items/list`
    - Adds the experimental `thread/turns/items/list` protocol, schema,
    dispatcher, and processor stub. The app-server currently returns
    JSON-RPC `-32601` with `thread/turns/items/list is not supported yet`.
    
    #### ThreadStore
    - Adds `ThreadStore` contract types and stubbed methods for listing
    thread turns and listing items within a turn.
    - Adds a typed `StoredTurnStatus` and `StoredTurnError` to avoid baking
    app-server API enums or lossy string status values into the store-facing
    turn contract.
    
    This also sketches the storage abstraction we expect to need once turns
    are indexed/stored. In particular, `notLoaded` is useful only if
    ThreadStore can eventually list turn metadata without loading every
    persisted item for each turn.
    
    ## Validation
    
    - Added/updated protocol serialization coverage for the new request and
    response shapes.
    - Added app-server integration coverage for `thread/turns/list` default
    summary behavior and all three `itemsView` modes.
    - Added app-server integration coverage that `thread/turns/items/list`
    returns the expected unsupported JSON-RPC error when experimental APIs
    are enabled.
    - Added thread-store coverage that the default trait methods return
    `ThreadStoreError::Unsupported`.
    
    No developers.openai.com documentation update is needed for this
    internal experimental app-server API surface.
  • Disable empty Cargo test targets (#21584)
    ## Summary
    
    `cargo test` entails running both standard Rust tests and doctests.
    It turns out that the doctest discovery is fairly slow, and it's a cost
    you pay even for crates that don't include any doctests.
    
    This PR disables doctests with `doctest = false` for crates that lack
    any doctests.
    
    For the collection of crates below, this speeds up test execution by
    >4x.
    
    E.g., before this PR:
    
    ```
    Benchmark 1: cargo test     -p codex-utils-absolute-path     -p codex-utils-cache     -p codex-utils-cli     -p codex-utils-home-dir     -p codex-utils-output-truncation     -p codex-utils-path     -p codex-utils-string     -p codex-utils-template     -p codex-utils-elapsed     -p codex-utils-json-to-toml
      Time (mean ± σ):      1.849 s ±  4.455 s    [User: 0.752 s, System: 1.367 s]
      Range (min … max):    0.418 s … 14.529 s    10 runs
    ```
    
    And after:
    
    ```
    Benchmark 1: cargo test     -p codex-utils-absolute-path     -p codex-utils-cache     -p codex-utils-cli     -p codex-utils-home-dir     -p codex-utils-output-truncation     -p codex-utils-path     -p codex-utils-string     -p codex-utils-template     -p codex-utils-elapsed     -p codex-utils-json-to-toml
      Time (mean ± σ):     428.6 ms ±   6.9 ms    [User: 187.7 ms, System: 219.7 ms]
      Range (min … max):   418.0 ms … 436.8 ms    10 runs
    ```
    
    For a single crate, the speedup is more than 2x. Before:
    
    ```
    Benchmark 1: cargo test -p codex-utils-string
      Time (mean ± σ):     491.1 ms ±   9.0 ms    [User: 229.8 ms, System: 234.9 ms]
      Range (min … max):   480.9 ms … 512.0 ms    10 runs
    ```
    
    And after:
    
    ```
    Benchmark 1: cargo test -p codex-utils-string
      Time (mean ± σ):     213.9 ms ±   4.3 ms    [User: 112.8 ms, System: 84.0 ms]
      Range (min … max):   206.8 ms … 221.0 ms    13 runs
    ```
    
    Co-authored-by: Codex <noreply@openai.com>
  • [codex-analytics] rework thread_source for thread analytics (#20949)
    ## Summary
    - make `thread_source` an explicit optional thread-level field on
    `thread/start`, `thread/fork`, and returned thread payloads
    - persist `thread_source` in rollout/session metadata so resumed live
    threads retain the original value
    - replace the old best-effort `session_source` -> `thread_source`
    mapping with an explicit caller-supplied analytics classification
    
    ## Why
    Before this change, analytics `thread_source` was populated by a
    best-effort mapping from `session_source`. `session_source` describes
    the runtime/client surface, not the actual thread-level origin, so that
    projection was not accurate enough to distinguish cases such as `user`,
    `subagent`, `memory_consolidation`, and future thread origins reliably.
    
    Making `thread_source` explicit keeps one thread-level analytics field
    while letting callers provide the real classification directly instead
    of recovering it indirectly from `session_source`.
    
    ## Impact
    For new analytics events, `thread_source` now reflects the explicit
    thread-level classification supplied by the caller rather than an
    inferred value derived from `session_source`. Existing protocol fields
    remain optional; callers that omit `threadSource` now produce `null`
    instead of a best-effort inferred value.
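    
    The behavior change can be illustrated with a small std-only sketch.
    The enum, its variant names, and `thread_source_json` are hypothetical
    stand-ins based on the examples above, not the actual analytics types;
    the point is only that an omitted value is reported as `null` rather
    than inferred from `session_source`.
    
    ```rust
    /// Hypothetical thread-level origins, named after the examples above.
    #[derive(Clone, Copy, Debug, PartialEq)]
    pub enum ThreadSource {
        User,
        Subagent,
        MemoryConsolidation,
    }
    
    impl ThreadSource {
        fn as_str(self) -> &'static str {
            match self {
                ThreadSource::User => "user",
                ThreadSource::Subagent => "subagent",
                ThreadSource::MemoryConsolidation => "memory_consolidation",
            }
        }
    }
    
    /// Renders the analytics field as a JSON fragment. A missing value
    /// becomes `null`; nothing is derived from the session source.
    pub fn thread_source_json(thread_source: Option<ThreadSource>) -> String {
        match thread_source {
            Some(source) => format!("\"thread_source\":\"{}\"", source.as_str()),
            None => "\"thread_source\":null".to_string(),
        }
    }
    ```
    
    A caller that passes `Some(ThreadSource::Subagent)` gets the explicit
    classification; a caller that passes `None` gets `null`.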
    
    ## Validation
    - `just write-app-server-schema`
    - `cargo test -p codex-analytics -p codex-core -p
    codex-app-server-protocol --no-run`
    - `cargo test -p codex-app-server-protocol
    generated_ts_optional_nullable_fields_only_in_params`
    - `cargo test -p codex-analytics
    thread_initialized_event_serializes_expected_shape`
    - `cargo test -p codex-core
    resume_stopped_thread_from_rollout_preserves_thread_source`
  • 1- Add model service tiers metadata (#20969)
    ## Why
    
    The model list needs to carry display-ready service tier metadata so
    clients can render tier choices with stable IDs, names, and
    descriptions. A raw speed-tier string list is not enough for richer UI
    copy or future tier labels.
    
    ## What changed
    
    - Added `ModelServiceTier` to shared model metadata with string `id`,
    `name`, and `description` fields.
    - Added `service_tiers` to `ModelInfo` and `ModelPreset`, preserving
    empty defaults for older cached model payloads.
    - Exposed `serviceTiers` on app-server v2 `Model` responses and threaded
    it through TUI app-server model conversion.
    - Marked legacy `additional_speed_tiers` / `additionalSpeedTiers`
    metadata as deprecated in source and generated schema output.
    - Regenerated app-server protocol JSON schema and TypeScript fixtures,
    including `ModelServiceTier.ts`.
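    
    As a rough illustration of the shape described above, here is a
    std-only sketch. `ModelServiceTier` uses the three string fields named
    in this PR; the trimmed-down `ModelInfo`, its `slug` field, and the
    `tier_labels` helper are assumptions made for the example and omit the
    real serialization attributes.
    
    ```rust
    /// Display-ready tier metadata with the three string fields named above.
    #[derive(Clone, Debug, PartialEq)]
    pub struct ModelServiceTier {
        pub id: String,
        pub name: String,
        pub description: String,
    }
    
    /// Hypothetical, trimmed-down stand-in for the real `ModelInfo`.
    #[derive(Clone, Debug, Default, PartialEq)]
    pub struct ModelInfo {
        pub slug: String,
        /// Stays empty when an older cached payload carries no tier metadata.
        pub service_tiers: Vec<ModelServiceTier>,
    }
    
    /// Builds one display label per tier from its name and description.
    pub fn tier_labels(model: &ModelInfo) -> Vec<String> {
        model
            .service_tiers
            .iter()
            .map(|tier| format!("{}: {}", tier.name, tier.description))
            .collect()
    }
    ```
    
    With an empty default, a client can treat "no tiers" and "older cached
    payload" the same way and simply render nothing.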
    
    ## Verification
    
    - Ran `just write-app-server-schema`.
    - Did not run local tests per repo instruction; relying on PR CI.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • [codex] Add unsandboxed process exec API (#19040)
    ## Why
    
    App-server clients sometimes need argv-based local process execution
    while sandbox policy is controlled outside Codex. Those environments can
    reject sandbox-disabling paths before a command ever starts, even when
    the caller intentionally wants unsandboxed execution.
    
    This PR adds a distinct `process/*` API for that use case instead of
    extending `command/exec` with another sandbox-disabling shape. Keeping
    the new surface separate also makes the future removal of `command/exec`
    simpler: clients that need explicit process lifecycle control can move
    to the newer handle-based API without depending on `command/exec`
    business logic.
    
    ## What changed
    
    - Added v2 process lifecycle methods: `process/spawn`,
    `process/writeStdin`, `process/resizePty`, and `process/kill`.
    - Added process notifications: `process/outputDelta` for streamed
    stdout/stderr chunks and `process/exited` for final exit status and
    buffered output.
    - Made `process/spawn` intentionally unsandboxed and omitted
    sandbox-selection fields such as `sandboxPolicy` and
    `permissionProfile`.
    - Added client-supplied, connection-scoped `processHandle` values for
    follow-up control requests and notification routing.
    - Supported cwd, environment overrides, PTY mode and size, stdin
    streaming, stdout/stderr streaming, per-stream output caps, and timeout
    controls.
    - Killed active process sessions when the originating app-server
    connection closes.
    - Wired the implementation through the modular `request_processors/`
    app-server layout, with process-handle request serialization for
    follow-up control calls.
    - Updated generated JSON/TypeScript schema fixtures and documented the
    new API in `codex-rs/app-server/README.md`.
    - Added v2 app-server integration coverage in
    `codex-rs/app-server/tests/suite/v2/process_exec.rs` for spawn
    acknowledgement before exit, buffered output caps, and process
    termination.
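    
    The handle lifecycle above can be modeled with a small std-only
    sketch. `ProcessRegistry` and its methods are hypothetical and do not
    spawn real processes; rejecting a duplicate handle is an assumption of
    this sketch, not something this PR states.
    
    ```rust
    use std::collections::BTreeSet;
    
    /// Hypothetical per-connection registry of client-supplied handles.
    #[derive(Default)]
    pub struct ProcessRegistry {
        active: BTreeSet<String>,
    }
    
    impl ProcessRegistry {
        /// Registers a handle for a newly spawned process.
        /// Assumption: a handle already in use on this connection is rejected.
        pub fn spawn(&mut self, handle: &str) -> Result<(), String> {
            if self.active.insert(handle.to_string()) {
                Ok(())
            } else {
                Err(format!("handle already in use: {}", handle))
            }
        }
    
        /// Stops tracking a handle; returns whether it was active.
        pub fn kill(&mut self, handle: &str) -> bool {
            self.active.remove(handle)
        }
    
        /// Models connection close: every remaining session is ended and
        /// its handle is returned in sorted order.
        pub fn close_connection(&mut self) -> Vec<String> {
            std::mem::take(&mut self.active).into_iter().collect()
        }
    }
    ```
    
    Scoping the registry to one connection is what lets a close event
    clean up every process that connection started.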
    
    ## Verification
    
    - `cargo test -p codex-app-server-protocol`
    - `cargo test -p codex-app-server`
    
    ---------
    
    Co-authored-by: Owen Lin <owen@openai.com>
  • Add remote plugin skill read API (#20150)
    ## Summary
    
    Adds an app-server `plugin/skill/read` method for remote plugin skill
    markdown. The new method calls the plugin-service skill detail endpoint
    and returns `skill_md_contents`, so clients can preview skills for
    remote plugins before the bundle is installed locally.
    
    ## Why
    
    Uninstalled remote plugin skills do not have local `SKILL.md` files.
    Without an on-demand remote read, the desktop plugin details UI cannot
    render the skill details modal for those skills.
    
    ## Validation
    
    - `just write-app-server-schema`
    - `just fmt`
    - `cargo test -p codex-app-server-protocol`
    - `cargo test -p codex-app-server --test all --
    suite::v2::plugin_read::plugin_skill_read_reads_remote_skill_contents_when_remote_plugin_enabled
    --exact`
    - `just fix -p codex-app-server-protocol -p codex-core-plugins -p
    codex-app-server`
  • Sync remote installed plugin bundles (#20268)
    ## Summary
    - Download missing remote installed plugin bundles during app-server
    startup and plugin/list refresh.
    - Upgrade cached remote installed bundles when the backend installed
    version changes.
    - Remove stale remote installed bundle caches without writing remote
    plugin state into config.toml.
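    
    The three cases above amount to reconciling the backend's installed
    set against the local cache. A minimal sketch, assuming both sides can
    be reduced to a map from plugin name to version string; `SyncAction`
    and `plan_sync` are hypothetical names, not the functions added in
    this PR.
    
    ```rust
    use std::collections::BTreeMap;
    
    #[derive(Debug, PartialEq)]
    pub enum SyncAction {
        Download { plugin: String, version: String },
        Upgrade { plugin: String, from: String, to: String },
        Remove { plugin: String },
    }
    
    /// Compares backend-installed versions with the cached bundles and
    /// lists the work needed to bring the cache in line.
    pub fn plan_sync(
        installed: &BTreeMap<String, String>,
        cached: &BTreeMap<String, String>,
    ) -> Vec<SyncAction> {
        let mut actions = Vec::new();
        for (plugin, version) in installed {
            match cached.get(plugin) {
                // Installed remotely but no local bundle yet.
                None => actions.push(SyncAction::Download {
                    plugin: plugin.clone(),
                    version: version.clone(),
                }),
                // Cached bundle no longer matches the installed version.
                Some(old) if old != version => actions.push(SyncAction::Upgrade {
                    plugin: plugin.clone(),
                    from: old.clone(),
                    to: version.clone(),
                }),
                Some(_) => {}
            }
        }
        // Cached bundles the backend no longer reports are stale.
        for plugin in cached.keys() {
            if !installed.contains_key(plugin) {
                actions.push(SyncAction::Remove { plugin: plugin.clone() });
            }
        }
        actions
    }
    ```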
    
    ## Review note
    This is a clean PR branch cut from the current diff on top of latest
    `origin/main`. The diff intentionally has no `codex-rs/core/**` files,
    so CODEOWNERS should not request the core-directory owner review from
    stale PR history.
    
    ## Validation
    Already run on the source branch before creating this clean PR:
    - `just fmt`
    - `cargo test -p codex-core-plugins`
    - `cargo test -p codex-app-server --test all
    app_server_startup_sync_downloads_remote_installed_plugin_bundles --
    --nocapture`
    - `cargo test -p codex-app-server --test all
    plugin_list_sync_upgrades_and_removes_remote_installed_plugin_bundles --
    --nocapture`
    - `cargo test -p codex-app-server --test all
    app_server_startup_remote_plugin_sync_runs_once -- --nocapture`
    - `just fix -p codex-core-plugins`
    - `just fix -p codex-app-server`
    - `git diff --check`
  • Add hooks/list app-server RPC (#19778)
    ## Why
    
    We need a way to list the available hooks to expose via the TUI and App
    so users can view and manage their hooks.
    
    ## What
    
    - Adds `hooks/list` for one or more `cwd` values that returns discovered
    hook metadata
    
    ## Stack
    
    1. openai/codex#19705
    2. This PR - openai/codex#19778
    3. openai/codex#19840
    4. openai/codex#19882
    
    ## Review Notes
    
    The generated schema files account for most of the raw diff; these files
    contain the core change:
    
    - `hooks/src/engine/discovery.rs` builds the inventory entries during
    hook discovery while leaving runtime handlers focused on execution.
    - `app-server/src/codex_message_processor.rs` wires `hooks/list` into
    the app-server flow for each requested `cwd`.
    - `app-server-protocol/src/protocol/v2.rs` defines the new v2
    request/response payloads exposed on the wire.
    
    ### Core Changes
    
    `core/src/plugins/manager.rs` adds `plugins_for_layer_stack(...)` so
    `skills/list` and `hooks/list` can resolve plugin state for each
    requested `cwd`.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • feat: expose provider capability bounds to app server clients (#20049)
    Follow-up to #19442. The app server now exposes provider-derived bounds
    through a new v2 `modelProvider/read` method. The response reports the
    configured provider map key as `modelProvider` and returns the effective
    capability booleans so clients can align their UI with the same
    provider-owned limits used by core.
  • Fix plugin list workspace settings test isolation (#20086)
    Fixes a test that often fails locally when running `cargo test`:
    - Add an app-server test helper that combines managed-config isolation
    with custom env overrides.
    - Isolate `HOME` / `USERPROFILE` in plugin-list workspace settings tests
    so host home marketplaces do not affect results.
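    
    The isolation idea can be sketched with `std::process::Command`. The
    `isolate_home` helper is a hypothetical illustration, not the test
    helper added here; it only shows both variables being overridden on
    the child process so neither the Unix nor the Windows lookup reaches
    the host home directory.
    
    ```rust
    use std::path::Path;
    use std::process::Command;
    
    /// Points both home-directory variables at an isolated directory so the
    /// child process cannot see marketplaces under the host user's home.
    pub fn isolate_home(cmd: &mut Command, home: &Path) {
        cmd.env("HOME", home).env("USERPROFILE", home);
    }
    ```
    
    A test would typically pass a fresh temporary directory as `home`
    before spawning the app-server subprocess.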
  • test: harden app-server integration tests (#19683)
    ## Why
    
    Windows Bazel runs in the permissions stack exposed that app-server
    integration tests were launching normal plugin startup warmups in every
    subprocess. Those warmups can call
    `https://chatgpt.com/backend-api/plugins/featured` when a test is not
    specifically exercising plugin startup, which adds slow background work,
    noisy stderr, and dependence on external network state. The relevant
    startup/featured-plugin behavior was introduced across #15042 and
    #15264.
    
    A few app-server tests also had long optional waits or unbounded cleanup
    paths, making failures expensive to diagnose and contributing to slow
    Windows shards. One external-agent config test from #18246 used a
    GitHub-style marketplace source, which was enough to exercise the
    pending remote-import path but also meant the background completion task
    could attempt a real clone.
    
    ## What Changed
    
    - Adds explicit `AppServerRuntimeOptions` / `PluginStartupTasks`
    plumbing and a hidden debug-only
    `--disable-plugin-startup-tasks-for-tests` app-server flag, so
    integration tests can suppress startup plugin warmups without adding a
    production env-var gate.
    - Has the app-server test harness pass that hidden flag by default,
    while opting plugin-startup coverage back in for tests that
    intentionally exercise startup sync and featured-plugin warmup behavior.
    - Lowers normal app-server subprocess logging from `info`/`debug` to
    `warn` to avoid multi-megabyte stderr output in Bazel logs.
    - Prevents the external-agent config test from attempting a real
    marketplace clone by using an invalid non-local source while still
    exercising the pending-import completion path.
    - Bounds optional filesystem/realtime waits and fake WebSocket
    test-server shutdown so failures produce targeted timeouts instead of
    hanging a shard.
    - Fixes the Unix script-resolution test in `rmcp-client` to exercise
    PATH resolution directly and include the actual spawn error in failures.
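    
    The bounded-wait pattern can be shown with a std-only sketch. The real
    harness may well use async timeouts instead; `recv_bounded` is a
    hypothetical helper built on `std::sync::mpsc::Receiver::recv_timeout`
    that turns an open-ended wait into a targeted timeout error.
    
    ```rust
    use std::sync::mpsc::{Receiver, RecvTimeoutError};
    use std::time::Duration;
    
    /// Waits for one message, but never longer than `limit`. The error names
    /// the step that stalled so a failure points at a specific wait.
    pub fn recv_bounded<T>(rx: &Receiver<T>, limit: Duration, what: &str) -> Result<T, String> {
        match rx.recv_timeout(limit) {
            Ok(value) => Ok(value),
            Err(RecvTimeoutError::Timeout) => {
                Err(format!("timed out after {:?} waiting for {}", limit, what))
            }
            Err(RecvTimeoutError::Disconnected) => {
                Err(format!("channel closed while waiting for {}", what))
            }
        }
    }
    ```
    
    Naming the stalled step in the error keeps a slow shard from failing
    with nothing more than a generic test-runner timeout.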
    
    ## Verification
    
    - `cargo check -p codex-app-server`
    - `cargo clippy -p codex-app-server --tests -- -D warnings`
    - `cargo test -p codex-rmcp-client
    program_resolver::tests::test_unix_executes_script_without_extension`
    - `cargo test -p codex-app-server --test all
    external_agent_config_import_sends_completion_notification_after_pending_plugins_finish
    -- --nocapture`
    - `cargo test -p codex-app-server --test all
    plugin_list_uses_warmed_featured_plugin_ids_cache_on_first_request --
    --nocapture`
    - Windows Local Bazel passed with this test-hardening bundle before it
    was extracted from #19606.
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19683).
    * #19395
    * #19394
    * #19393
    * #19392
    * #19606
    * __->__ #19683