30 Commits

  • fix(remote-control): avoid server token refresh retry storms (#30201)
    ## Why
    
    Remote-control websocket reconnects and pairing requests proactively
    refresh their server token. When `/server/refresh` returned a transient
    error such as `502`, the caller stopped treating the still-valid token
    as a usable connection path. This caused reconnect failures and repeated
    refresh attempts that could amplify an upstream incident.
    
    ## What Changed
    
    - Start proactive refresh five minutes before token expiry and
    distinguish it from a required refresh for missing or expired tokens.
    - Continue websocket and pairing operations with the existing valid
    token after `429`, `5xx`, or timeout failures.
    - Share an in-memory `next_refresh_at` throttle across websocket and
    pairing callers. The throttle honors both `Retry-After` formats
    (delay-seconds and HTTP-date) and otherwise uses a jittered 24–36 second
    delay.
    - Keep required refreshes strict, preserve `404` enrollment replacement,
    and clear token/throttle state for `401` and `403` auth recovery.
    - Preserve refresh response metadata internally and add focused
    wire-level and integration coverage.
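    The refresh policy above can be sketched roughly as follows. This is a
    minimal illustration under stated assumptions, not the PR's code. The
    names `RefreshNeed`, `refresh_need`, `is_transient`, and `throttle_delay`
    are hypothetical. A timeout is modeled as a missing status. The HTTP-date
    form of `Retry-After` is left out because parsing it would require a date
    library. The caller supplies jitter in [0, 1) because `std` has no RNG.

    ```rust
    use std::time::{Duration, SystemTime};

    /// Hypothetical classification of a cached server token before use.
    #[derive(Debug, PartialEq, Eq)]
    enum RefreshNeed {
        /// Missing or expired token: the refresh must succeed before connecting.
        Required,
        /// Valid token inside the proactive window: try to refresh, but keep
        /// using the current token if the refresh fails transiently.
        Proactive,
        /// Token comfortably valid: no refresh.
        NotNeeded,
    }

    /// Proactive refresh begins five minutes before expiry.
    const PROACTIVE_WINDOW: Duration = Duration::from_secs(5 * 60);

    fn refresh_need(now: SystemTime, expires_at: Option<SystemTime>) -> RefreshNeed {
        match expires_at {
            None => RefreshNeed::Required,
            Some(exp) if exp <= now => RefreshNeed::Required,
            Some(exp) if exp <= now + PROACTIVE_WINDOW => RefreshNeed::Proactive,
            Some(_) => RefreshNeed::NotNeeded,
        }
    }

    /// `429`, any `5xx`, and timeouts (modeled here as `None`) are transient.
    fn is_transient(status: Option<u16>) -> bool {
        match status {
            None => true,
            Some(code) => code == 429 || (500..=599).contains(&code),
        }
    }

    /// Delay before the next refresh attempt. A delay-seconds `Retry-After`
    /// wins; otherwise the delay is 24 s plus up to 12 s of jitter.
    fn throttle_delay(retry_after_secs: Option<u64>, jitter: f64) -> Duration {
        match retry_after_secs {
            Some(secs) => Duration::from_secs(secs),
            None => Duration::from_secs_f64(24.0 + 12.0 * jitter.clamp(0.0, 1.0)),
        }
    }
    ```

    This sketch keeps "proactive" and "required" as separate variants so
    that only a proactive refresh can fall back to the existing token after a
    transient failure.
    
    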
    
    ## Verification
    
    Added behavioral coverage proving that:
    
    - a valid near-expiry token still completes websocket and pairing
    requests after transient refresh failures;
    - `Retry-After` suppresses a subsequent refresh across websocket and
    pairing callers;
    - request and response-body timeouts are classified as transient;
    - an expired token, including one that expires during refresh, cannot
    proceed to websocket connection;
    - auth failures clear the attempted token without overwriting a
    concurrently rotated token.
  • [codex] dedupe remote control account header (#29893)
    ## Why
    
    Remote-control HTTP requests applied the authentication headers and then
    appended `ChatGPT-Account-ID` again with
    `reqwest::RequestBuilder::header`. That method appends rather than
    replaces, so the wire request could contain the same header twice.
    Intermediaries may coalesce duplicate values into `uuid,uuid`, which is
    not a valid account ID.
    
    ## What changed
    
    - Build remote-control request authentication headers in one place.
    - Apply provider headers first, then use `HeaderMap::insert` for the
    explicit account ID. This preserves the current account-ID precedence
    and all other authentication headers while ensuring exactly one account
    header is sent.
    - Preserve duplicate HTTP headers in the test harness and assert exactly
    one account header for enroll, refresh, list, and revoke requests.
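    The difference between appending and inserting can be modeled without
    any dependencies. In this hypothetical sketch, `Headers` is a stand-in
    for an HTTP header list, not reqwest's or http's type. Its `append`
    mirrors `RequestBuilder::header`, and its `insert` mirrors
    `HeaderMap::insert`, which removes every existing value for the name.
    The rule that rejects an account ID containing a comma is an assumption
    made for illustration.

    ```rust
    /// Minimal stand-in for a header list that, like the wire format, can
    /// repeat a name. Names compare case-insensitively.
    #[derive(Debug, Default)]
    struct Headers(Vec<(String, String)>);

    impl Headers {
        /// Like `RequestBuilder::header`: adds a value and keeps existing ones.
        fn append(&mut self, name: &str, value: &str) {
            self.0.push((name.to_string(), value.to_string()));
        }

        /// Like `HeaderMap::insert`: drops every value for `name` (any casing),
        /// then stores exactly one.
        fn insert(&mut self, name: &str, value: &str) {
            self.0.retain(|(n, _)| !n.eq_ignore_ascii_case(name));
            self.0.push((name.to_ascii_lowercase(), value.to_string()));
        }

        fn count(&self, name: &str) -> usize {
            self.0.iter().filter(|(n, _)| n.eq_ignore_ascii_case(name)).count()
        }
    }

    /// Hypothetical builder: provider headers first, then the explicit account
    /// ID replaces any provider-supplied copies.
    fn build_auth_headers(provider: &[(&str, &str)], account_id: &str) -> Result<Headers, String> {
        // Assumed validation: reject values that are empty, contain control
        // characters, or contain a comma (ambiguous once coalesced).
        if account_id.is_empty() || account_id.chars().any(|c| c.is_control() || c == ',') {
            return Err(format!("invalid account id: {account_id:?}"));
        }
        let mut headers = Headers::default();
        for (name, value) in provider {
            headers.append(name, value);
        }
        headers.insert("chatgpt-account-id", account_id);
        Ok(headers)
    }
    ```
    
    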
    
    ## Validation
    
    Added focused coverage for:
    
    - Adding the explicit account header when the auth provider omits it.
    - Replacing multiple provider-supplied account values, including a
    differently cased header name.
    - Preserving authorization and routing headers while replacing only the
    account header.
    - Rejecting invalid account header values before sending a request.
    - Emitting exactly one account header for enroll, refresh, list, and
    revoke requests.
    - Maintaining header uniqueness across unauthorized recovery, retry, and
    error-response paths.
    - Emitting exactly one installation header for enroll and refresh
    requests.
    
    Checks run:
    
    - `just test -p codex-app-server-transport request_headers`: 3 passed
    - `just test -p codex-app-server-transport remote_control_http_mode`: 6
    passed
    - `just test -p codex-app-server-transport clients_tests`: 6 passed
    - `just test -p codex-app-server-transport`: 123 passed
    - `cargo test -p codex-app-server-transport`: 123 passed
    - `just clippy -p codex-app-server-transport`
    - `just fmt-check`
    - `bazel test
    //codex-rs/app-server-transport:app-server-transport-unit-tests`
  • auth: move domain mode below app wire types (#29721)
    ## Why
    
    Authentication mode is a domain concept used by login, model selection,
    telemetry, and transports. Keeping the canonical type in app-server
    protocol forces those lower-level crates to depend on an unrelated wire
    API.
    
    ## What changed
    
    - Added canonical `codex_protocol::auth::AuthMode` domain values.
    - Kept the app-server wire DTO unchanged and added an explicit app-side
    conversion.
    - Removed production app-server-protocol dependencies from login,
    model-provider-info, models-manager, and otel call paths.
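    The split between a domain enum and its wire DTO could look roughly like
    this sketch. `DomainAuthMode` stands in for `codex_protocol::auth::AuthMode`
    and `WireAuthMode` stands in for the app-server DTO. Both names and the
    variant list are illustrative assumptions, not the real definitions.

    ```rust
    /// Hypothetical stand-in for the canonical domain type that login,
    /// models-manager, and otel can depend on.
    #[allow(dead_code)]
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum DomainAuthMode {
        ApiKey,
        Chatgpt,
        PersonalAccessToken,
        BedrockApiKey,
    }

    /// Hypothetical stand-in for the unchanged app-server wire DTO.
    #[allow(dead_code)]
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum WireAuthMode {
        ApiKey,
        Chatgpt,
        PersonalAccessToken,
        BedrockApiKey,
    }

    // The conversion lives on the app-server side, so domain crates never
    // import the wire type. The exhaustive match forces a compile error when
    // a new domain variant is added without a wire mapping.
    impl From<DomainAuthMode> for WireAuthMode {
        fn from(mode: DomainAuthMode) -> Self {
            match mode {
                DomainAuthMode::ApiKey => WireAuthMode::ApiKey,
                DomainAuthMode::Chatgpt => WireAuthMode::Chatgpt,
                DomainAuthMode::PersonalAccessToken => WireAuthMode::PersonalAccessToken,
                DomainAuthMode::BedrockApiKey => WireAuthMode::BedrockApiKey,
            }
        }
    }
    ```
    
    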
    
    ## Stack
    
    This is PR 2 of 6, stacked on [PR
    #29714](https://github.com/openai/codex/pull/29714). Review only the
    delta from `codex/split-json-rpc-protocols`. Next: [PR
    #29722](https://github.com/openai/codex/pull/29722).
    
    ## Validation
    
    - Auth and login coverage passed in the focused protocol/domain test
    run.
    - App-server account and auth conversion coverage passed.
  • PAC 2 - Add shared auth system proxy contract (#26707)
    ## Summary
    
    Stacked on #26706.
    
    Adds the shared auth/system-proxy contract that later platform resolver
    PRs plug into. This PR moves Codex-owned auth and startup HTTP clients
    through a common route-aware boundary, but does not yet add Windows or
    macOS system proxy resolution.
    
    The default path remains unchanged when `respect_system_proxy` is absent
    or disabled.
    
    ## Implementation
    
    - Adds `codex-client/src/outbound_proxy.rs` with the shared
    route-selection model:
      - `OutboundProxyConfig`;
      - `ClientRouteClass`;
      - `RouteFailureClass`;
      - `build_reqwest_client_for_route`.
    - Preserves the existing reqwest/default-client behavior when no route
    config is supplied.
    - Uses the fixed MVP routing policy when route config is supplied:
    platform system/PAC/WPAD discovery, then explicit env proxy variables,
    then direct connection.
    - Keeps platform-specific system discovery behind the shared client
    boundary. This PR provides the contract and fallback behavior; later
    resolver PRs plug in Windows and macOS discovery.
    - Adds `login::AuthRouteConfig` so auth call sites depend on a small
    policy type instead of platform resolver details.
    - Maps the resolved `Config.respect_system_proxy` boolean into
    `AuthRouteConfig` for auth-owned clients.
    - Wires the route config through browser login, device-code login,
    access-token login, login status, logout/revoke, token refresh, API-key
    exchange, app-server account login, TUI/app startup, cloud-config
    bootstrap, cloud tasks, plugin auth, and exec startup config loading.
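    The fixed MVP order (system/PAC/WPAD, then env proxy, then direct) and the
    unchanged default path can be sketched as a pure function. `Route` and
    `select_route` are hypothetical names. The platform resolver and the
    environment lookup are injected as closures, in the same spirit as the
    tests that avoid mutating the process environment. The particular env
    variable names checked here are an assumption.

    ```rust
    #[derive(Debug, PartialEq, Eq)]
    enum Route {
        /// Existing reqwest/default-client behavior (no route config).
        Default,
        /// Proxy found by platform system/PAC/WPAD discovery.
        SystemProxy(String),
        /// Proxy from explicit environment variables.
        EnvProxy(String),
        /// No proxy.
        Direct,
    }

    /// `respect_system_proxy` of `None` or `Some(false)` keeps the default path.
    /// `system` stands in for a platform resolver (returns `None` where none
    /// exists yet); `env` is an injected variable lookup.
    fn select_route(
        respect_system_proxy: Option<bool>,
        system: impl Fn() -> Option<String>,
        env: impl Fn(&str) -> Option<String>,
    ) -> Route {
        if respect_system_proxy != Some(true) {
            return Route::Default;
        }
        if let Some(proxy) = system() {
            return Route::SystemProxy(proxy);
        }
        // Assumed variable names and order.
        for var in ["HTTPS_PROXY", "https_proxy", "ALL_PROXY", "all_proxy"] {
            if let Some(proxy) = env(var).filter(|v| !v.is_empty()) {
                return Route::EnvProxy(proxy);
            }
        }
        Route::Direct
    }
    ```
    
    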
    
    ## End-user behavior
    
    - No behavior changes by default.
    - When `respect_system_proxy = true`, auth-owned clients opt into the
    shared route-aware client path.
    - On platforms without a resolver implementation in this PR, system
    discovery is unavailable and the route-aware path falls back to explicit
    env proxy handling, then direct connection.
    - Custom CA handling remains separate from proxy route selection and
    still runs through the shared client builder.
    - No proxy URLs, PAC contents, or resolved platform details are exposed
    through the public config surface introduced here.
    
    ## Tests
    
    Adds or updates coverage for:
    
    - preserving default auth-client fallback behavior when no route config
    is provided;
    - injected environment-proxy fallback without mutating process
    environment;
    - existing login-server E2E flows using explicit `auth_route_config:
    None` to guard unchanged default behavior;
    - updated auth manager, login, logout, cloud-config, startup, and
    plugin-auth call sites passing route config explicitly.
  • Use controlled time for remote initialization timeout test (#29329)
    ## Summary
    
    The remote-control initialization timeout test used a 50 ms wall-clock
    deadline around a 10 ms transport timeout. A busy CI runner could miss
    that outer deadline even when the rollback behavior was correct.
    
    Pause Tokio time and advance it explicitly through the transport timeout
    instead. The test still verifies that initialization fails and emits the
    matching connection-closed event, without depending on scheduler speed.
  • feat: opt ChatGPT auth into agent identity (#19049)
    ## Stack
    
    This is PR 2 of the simplified HAI single-run-task stack:
    
    - [#19047](https://github.com/openai/codex/pull/19047) Agent Identity
    assertion and task-registration primitives, including the shared
    run-task helper used by existing Agent Identity JWT auth.
    - [#19049](https://github.com/openai/codex/pull/19049)
    Disabled-by-default ChatGPT auth opt-in that provisions/reuses persisted
    Agent Identity runtime auth and its single run task.
    - [#19051](https://github.com/openai/codex/pull/19051) Run-scoped
    provider auth that uses one backend-owned task id for first-party
    inference and compaction requests.
    
    [#19054](https://github.com/openai/codex/pull/19054) collapsed out of
    the active stack because the simplified design no longer needs a
    separate background/control-plane task helper.
    
    ## Summary
    
    This PR adds the disabled-by-default path for normal ChatGPT-login Codex
    sessions to obtain Agent Identity runtime auth through the Codex
    backend. Existing Agent Identity JWT startup mode remains a separate
    path and does not require the feature flag.
    
    What changed:
    
    - adds the experimental `use_agent_identity` feature flag and config
    schema entry
    - adds an explicit `AgentIdentityAuthPolicy` so call sites choose
    `JwtOnly` or `ChatGptAuth` instead of passing a bare boolean
    - stores standalone Agent Identity JWT credentials separately from
    backend-registered Agent Identity records
    - persists the registered Agent Identity record, private key, and single
    run task id in `auth.json` so process restarts reuse the same identity
    - derives the agent/task registration base URL from ChatGPT/Codex auth
    config while keeping JWT JWKS lookup separate
    - provisions and caches ChatGPT-derived Agent Identity runtime auth when
    `use_agent_identity` is enabled
    - reuses the shared run-task registration helper from PR1 rather than
    adding a second task-registration path
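    Replacing a bare boolean with a named policy might look like the
    following sketch. The two variant names come from the description above.
    The `from_feature_flag` mapping and the helper method are hypothetical.

    ```rust
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum AgentIdentityAuthPolicy {
        /// Only the standalone Agent Identity JWT path.
        JwtOnly,
        /// ChatGPT login may also provision or reuse backend-registered
        /// Agent Identity runtime auth.
        ChatGptAuth,
    }

    impl AgentIdentityAuthPolicy {
        /// Assumed mapping from the experimental `use_agent_identity` flag.
        fn from_feature_flag(use_agent_identity: bool) -> Self {
            if use_agent_identity {
                Self::ChatGptAuth
            } else {
                Self::JwtOnly
            }
        }

        fn may_provision_from_chatgpt(self) -> bool {
            matches!(self, Self::ChatGptAuth)
        }
    }
    ```

    A call site that reads `load_auth(AgentIdentityAuthPolicy::JwtOnly)` says
    what it means, whereas `load_auth(false)` does not. `load_auth` is
    illustrative only.
    
    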
    
    This PR intentionally does not switch model inference over to
    `AgentAssertion` auth. The provider-auth integration lands in the next
    PR.
    
    ## Testing
    
    - `just test -p codex-login`
  • feat(app-server): enforce managed remote control disable (#27961)
    ## Why
    
    Managed deployments need a reliable deny gate for remote control.
    Persisted enablement and explicit startup requests can currently still
    start the transport, while the removed `features.remote_control` key is
    intentionally only a compatibility no-op.
    
    This adds a dedicated requirement that administrators can use to force
    remote control off without deleting the user's persisted preference.
    Removing the requirement and restarting restores the prior choice.
    
    ## What Changed
    
    - Added top-level `allow_remote_control` requirements parsing, sourced
    layer precedence, debug output, and `configRequirements/read` exposure
    as `allowRemoteControl`.
    - Added a typed transport policy captured from the startup requirements
    snapshot. Managed disable forces the initial state to disabled and
    prevents enrollment, refresh, connection, and persisted-preference
    mutation.
    - Rejected every `remoteControl/*` RPC before parameter deserialization
    with JSON-RPC `-32600` and `remote control is disabled by managed
    requirements`.
    - Preserved the existing disabled status notification and the previous
    behavior when the requirement is `true` or omitted.
    - Regenerated app-server protocol schemas and documented the new
    requirement.
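    The early rejection can be sketched as a check on the raw method name
    that runs before `params` are deserialized. With this ordering, even a
    malformed `remoteControl/*` request receives the policy error. The
    function and struct names are hypothetical. `-32600` is the JSON-RPC 2.0
    "Invalid Request" code.

    ```rust
    /// JSON-RPC 2.0 "Invalid Request" error code.
    const INVALID_REQUEST: i64 = -32600;

    #[derive(Debug, PartialEq, Eq)]
    struct RpcError {
        code: i64,
        message: &'static str,
    }

    /// Hypothetical gate applied to the method name only, before parameter
    /// deserialization. `None` or `Some(true)` keeps previous behavior.
    fn check_remote_control_policy(method: &str, allow_remote_control: Option<bool>) -> Result<(), RpcError> {
        let managed_disable = allow_remote_control == Some(false);
        if managed_disable && method.starts_with("remoteControl/") {
            return Err(RpcError {
                code: INVALID_REQUEST,
                message: "remote control is disabled by managed requirements",
            });
        }
        Ok(())
    }
    ```
    
    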
    
    ## Verification
    
    - Confirmed all remote-control RPCs, including a malformed request,
    return the managed-policy error while the initial status notification
    remains `disabled`.
    - Confirmed explicit ephemeral startup and persisted enablement make no
    backend connection and leave the SQLite preference unchanged.
    - Confirmed `allow_remote_control = true` does not enable or block
    remote control and `configRequirements/read` returns
    `allowRemoteControl: false` for the deny policy.
    
    Related issue: N/A (managed-policy hardening).
  • feat: use encrypted local secrets for CLI auth (#27539)
    ## Why
    
    Windows Credential Manager limits generic credential blobs to 2,560
    bytes. Large serialized ChatGPT auth payloads can exceed that limit, so
    keyring-mode CLI auth needs a backend that keeps only the encryption key
    in the OS keyring and stores the payload in Codex's encrypted
    local-secrets file.
    
    This is the third PR in the encrypted-auth stack:
    
    1. #27504 — feature and config selection
    2. #27535 — auth-specific local-secrets namespaces
    3. This PR — CLI auth implementation and activation
    4. MCP OAuth implementation and activation
    
    ## What Changed
    
    - Added encrypted CLI-auth storage using the `CliAuth` secrets
    namespace.
    - Preserved direct keyring storage for platforms/configurations where it
    remains selected.
    - Selected the backend consistently for login, logout, refresh,
    device-code login, auth loading, and login restrictions.
    - Threaded resolved bootstrap/full config through CLI, exec, TUI,
    app-server account handling, cloud config, and cloud tasks.
    - Removed stale `auth.json` fallback data after successful encrypted
    saves and removed encrypted, direct-keyring, and fallback data during
    logout.
    - Added storage and integration coverage for both direct and encrypted
    keyring modes.
    
    MCP OAuth persistence is intentionally left to the next PR.
    
    ## Validation
    
    - `just test -p codex-login` — 131 passed
    - `just test -p codex-cli` — 280 passed
    - `just test -p codex-app-server v2::account` — 25 passed
    - `just test -p codex-cloud-config service` — 21 passed, 7 skipped
    - `just fix -p codex-login`
    - `just fix -p codex-cli`
    - `just fmt`
  • feat(app-server): persist remote-control desired state (#27445)
    ## Why
    
    Remote-control runtime enablement and persisted enrollment preference
    were represented by separate flags. That made startup rehydration, RPC
    persistence, and new-enrollment seeding race with one another, and it
    did not cleanly distinguish runtime-only CLI or daemon starts from
    durable app-server RPC changes.
    
    ## What Changed
    
    - Replace the parallel enablement, seed, and rehydration flags with one
    transport-owned `RemoteControlDesiredState`.
    - Add nullable enrollment-scoped persistence and preserve existing
    preferences during enrollment upserts.
    - Rehydrate plain startup only after auth and client scope resolve,
    without overwriting a concurrent RPC transition.
    - Make ordinary `remoteControl/enable` and `remoteControl/disable`
    durable while retaining `ephemeral: true` for runtime-only callers.
    - Have the daemon explicitly request ephemeral enablement and regenerate
    the app-server schemas.
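    The nullable preference and the durable-versus-ephemeral split can be
    sketched as follows. This sketch assumes that `NULL` means "no stored
    choice", `0` means disabled, and `1` means enabled. The function names
    and the upsert merge rule are hypothetical.

    ```rust
    /// Option<bool> -> nullable integer column.
    fn pref_to_column(pref: Option<bool>) -> Option<i64> {
        pref.map(i64::from)
    }

    /// Nullable integer column -> Option<bool>, rejecting unknown values.
    fn pref_from_column(col: Option<i64>) -> Result<Option<bool>, String> {
        match col {
            None => Ok(None),
            Some(0) => Ok(Some(false)),
            Some(1) => Ok(Some(true)),
            Some(other) => Err(format!("unexpected preference value {other}")),
        }
    }

    /// Assumed upsert rule: keep an existing preference unless a new one is supplied.
    fn merge_on_upsert(existing: Option<bool>, incoming: Option<bool>) -> Option<bool> {
        incoming.or(existing)
    }

    /// Ordinary enable/disable persist; `ephemeral: true` changes runtime state only.
    /// Returns the new runtime enablement.
    fn apply_rpc(enable: bool, ephemeral: bool, persisted: &mut Option<bool>) -> bool {
        if !ephemeral {
            *persisted = Some(enable);
        }
        enable
    }
    ```
    
    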
    
    ## Verification
    
    - Covered migration and `NULL`/`0`/`1` persistence round trips.
    - Covered plain-start rehydration and runtime-only versus durable
    enrollment seeding.
    - Covered durable enable, durable disable, and ephemeral enable through
    app-server RPC.
    - Covered the daemon's exact `{ "ephemeral": true }` request payload.
    
    Related issue: N/A (internal remote-control persistence architecture
    change).
  • feat: add Bedrock API key as a managed auth mode (#27443)
    ## Why
    
    Codex needs to manage Amazon Bedrock API key credentials through the
    existing auth lifecycle instead of introducing a separate auth manager
    or provider-specific credential file. Treating Bedrock API key login as
    a primary auth mode gives it the same persistence, keyring, reload, and
    logout behavior as the existing OpenAI API key and ChatGPT modes.
    
    The credential is valid only for the `amazon-bedrock` model provider.
    OpenAI-compatible providers must reject this auth mode rather than
    treating the Bedrock key as an OpenAI bearer token.
    
    ## What changed
    
    - Added `bedrockApiKey` as an app-server `AuthMode` and
    `CodexAuth::BedrockApiKey` as a primary `AuthManager` mode.
    - Added `BedrockApiKeyAuth`, containing the API key and AWS region, to
    the existing `AuthDotJson` payload stored in `$CODEX_HOME/auth.json` or
    the configured keyring backend.
    - Added `login_with_bedrock_api_key(...)`, parallel to
    `login_with_api_key(...)`, which replaces the current stored login with
    Bedrock credentials.
    - Reused generic auth reload and logout behavior instead of adding a
    Bedrock-specific auth manager or logout path.
    - Updated login restrictions, status reporting, diagnostics, telemetry
    classification, generated app-server schemas, and auth fixtures for the
    new mode.
    - Added explicit errors when Bedrock API key auth is selected with an
    OpenAI-compatible model provider.
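    The provider restriction amounts to a small compatibility check, sketched
    here under stated assumptions. The provider ID `amazon-bedrock` comes from
    the text above. The enum, the function, and the error wording are
    hypothetical, and this sketch does not model combinations other than
    Bedrock API key auth.

    ```rust
    #[allow(dead_code)]
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum AuthMode {
        ApiKey,
        Chatgpt,
        BedrockApiKey,
    }

    /// Bedrock API key auth is valid only for `amazon-bedrock`; any other
    /// provider gets an explicit error instead of receiving the key as a
    /// bearer token.
    fn check_provider_auth(provider_id: &str, mode: AuthMode) -> Result<(), String> {
        match (mode, provider_id) {
            (AuthMode::BedrockApiKey, "amazon-bedrock") => Ok(()),
            (AuthMode::BedrockApiKey, other) => Err(format!(
                "Bedrock API key auth cannot be used with model provider `{other}`"
            )),
            // Other modes are out of scope for this sketch.
            _ => Ok(()),
        }
    }
    ```
    
    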
    
    This PR establishes managed storage and auth-mode behavior. Routing the
    managed key and region into Amazon Bedrock requests is left to follow-up
    PRs.
  • fix(remote-control): preserve enrollment on generic websocket 404s (#26741)
    ## Why
    
    A remote-control WebSocket handshake can receive a generic HTTP 404 when
    an intermediary routes the request without preserving the WebSocket
    upgrade. Treating every 404 as proof that the remote app server is gone
    clears valid enrollment and causes repeated re-enrollment, new
    environment and server IDs, Habitat churn, and noisy `/server/enroll`
    traffic.
    
    ## What Changed
    
    - Clear enrollment only when a 404 JSON response explicitly contains
    `{"detail":"Remote app server not found"}`.
    - Preserve enrollment for empty, plain-text, malformed, or otherwise
    unrecognized 404 responses, return the transport error, and retry with
    the existing reconnect backoff.
    - Log the status, correlation headers (`request-id` or
    `x-oai-request-id`, plus `cf-ray`), and bounded/redacted response body
    for unrecognized 404s.
    - Cover both explicit missing-server re-enrollment and generic 404
    enrollment preservation/reconnect behavior.
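    The 404 decision and the bounded log preview might be sketched like this.
    The sketch assumes the caller has already decoded any JSON body (for
    example with serde_json) and passes in only the `detail` string, when one
    exists. `On404`, `classify_handshake_404`, and `bounded_preview` are
    hypothetical names.

    ```rust
    #[derive(Debug, PartialEq, Eq)]
    enum On404 {
        ClearEnrollmentAndReenroll,
        KeepEnrollmentAndRetry,
    }

    const MISSING_SERVER_DETAIL: &str = "Remote app server not found";

    /// Only the exact backend `detail` clears enrollment. Empty, plain-text,
    /// malformed, or other JSON bodies keep it and use the reconnect backoff.
    fn classify_handshake_404(detail: Option<&str>) -> On404 {
        match detail {
            Some(MISSING_SERVER_DETAIL) => On404::ClearEnrollmentAndReenroll,
            _ => On404::KeepEnrollmentAndRetry,
        }
    }

    /// Truncates a body preview to at most `max_bytes` without splitting a
    /// UTF-8 character.
    fn bounded_preview(body: &str, max_bytes: usize) -> &str {
        if body.len() <= max_bytes {
            return body;
        }
        let mut end = max_bytes;
        while !body.is_char_boundary(end) {
            end -= 1;
        }
        &body[..end]
    }
    ```
    
    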
    
    ## Verification
    
    `just test -p codex-app-server-transport` passes all 114 tests on the
    rebased branch, including the targeted explicit and generic WebSocket
    404 scenarios.
    
    Related issue: N/A
  • [codex-rs] support v2 personal access tokens (#25731)
    ## Summary
    
    - add v2 personal access token support for `codex login
    --with-access-token` and `CODEX_ACCESS_TOKEN`
    - classify opaque `at-` tokens separately from legacy Agent Identity
    JWTs
    - hydrate required ChatGPT account metadata through AuthAPI
    `/v1/user-auth-credential/whoami`
    - use PATs directly as bearer tokens while preserving existing ChatGPT
    account surfaces
    - expose PAT-backed auth as the explicit `personalAccessToken`
    app-server auth mode
    
    ## Implementation
    
    PAT auth is intentionally small and stateless. Loading a PAT performs
    one AuthAPI metadata request, stores the hydrated metadata in the
    in-memory auth object, and redacts the secret from debug output. Legacy
    Agent Identity JWT handling remains unchanged. The shared access-token
    classifier lives in a private neutral module because it dispatches
    between both credential types.
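    A shared classifier with secret-free debug output could look roughly like
    this. Two parts of the sketch are assumptions: the `at-` prefix check
    follows the summary above, but treating any value with three non-empty
    dot-separated segments as a JWT is a heuristic chosen for illustration.
    The type and function names are hypothetical.

    ```rust
    use std::fmt;

    enum AccessToken {
        /// Opaque v2 personal access token (`at-` prefix).
        PersonalAccessToken(String),
        /// Legacy Agent Identity JWT (header.payload.signature).
        AgentIdentityJwt(String),
    }

    /// Hypothetical classifier dispatching between the two credential types.
    fn classify(raw: &str) -> Result<AccessToken, String> {
        let token = raw.trim();
        if token.starts_with("at-") {
            return Ok(AccessToken::PersonalAccessToken(token.to_string()));
        }
        let parts: Vec<&str> = token.split('.').collect();
        if parts.len() == 3 && parts.iter().all(|p| !p.is_empty()) {
            return Ok(AccessToken::AgentIdentityJwt(token.to_string()));
        }
        // The error deliberately omits the token itself.
        Err("unrecognized access token format".to_string())
    }

    // Hand-written Debug so logs and panics never print the secret.
    impl fmt::Debug for AccessToken {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            match self {
                AccessToken::PersonalAccessToken(_) => f.write_str("PersonalAccessToken(<redacted>)"),
                AccessToken::AgentIdentityJwt(_) => f.write_str("AgentIdentityJwt(<redacted>)"),
            }
        }
    }
    ```
    
    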
    
    PAT hydration fails closed when AuthAPI omits any required metadata,
    including email. Hydrated metadata is intentionally not persisted:
    startup performs a live `whoami` preflight so revoked tokens or changed
    account metadata are not accepted from a stale cache.
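    Fail-closed hydration can be sketched as turning every optional `whoami`
    field into a required one, so that a single gap rejects the token. Only
    `email` is named in the text above. The other fields, the struct names,
    and the error wording are hypothetical.

    ```rust
    /// Hypothetical `whoami` payload in which any field may be absent.
    struct WhoamiResponse {
        account_id: Option<String>,
        email: Option<String>,
        plan_type: Option<String>,
    }

    /// Metadata held only in memory for the loaded PAT auth.
    #[derive(Debug, PartialEq, Eq)]
    struct PatMetadata {
        account_id: String,
        email: String,
        plan_type: String,
    }

    /// A missing or empty field becomes an error naming that field.
    fn require(field: Option<String>, name: &str) -> Result<String, String> {
        field
            .filter(|v| !v.is_empty())
            .ok_or_else(|| format!("whoami response missing `{name}`"))
    }

    /// Fails closed: any missing required field rejects the PAT.
    fn hydrate(resp: WhoamiResponse) -> Result<PatMetadata, String> {
        Ok(PatMetadata {
            account_id: require(resp.account_id, "account_id")?,
            email: require(resp.email, "email")?,
            plan_type: require(resp.plan_type, "plan_type")?,
        })
    }
    ```
    
    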
    
    ## Workspace restriction scope
    
    This change intentionally does **not** apply
    `forced_chatgpt_workspace_id` to PAT authentication. The setting is a
    client-side config guardrail, not an authorization boundary, and PAT
    does not currently require workspace-ID parity. The PAT login and
    `CODEX_ACCESS_TOKEN` paths therefore validate through AuthAPI without
    threading workspace-restriction state through access-token loading.
    Existing workspace checks for non-PAT auth remain on their established
    paths.
    
    ## App-server compatibility
    
    The public app-server `AuthMode` is shared across v1 and v2, and
    PAT-backed auth reports `personalAccessToken` through both APIs.
    Following human review, this intentionally removes the temporary v1
    compatibility mapping that reported PATs as `chatgpt`; the deprecated v1
    API is kept in parity with v2 rather than maintaining a separate closed
    enum. Clients with exhaustive auth-mode handling in either API version
    must add the new case and should generally treat it as ChatGPT-backed
    unless they need PAT-specific behavior.
    
    The v1 auth-status response still omits the raw PAT when `includeToken`
    is requested because that response cannot carry the account metadata
    needed to reuse the credential safely. Persisted PAT auth also omits the
    new enum value so older Codex builds can deserialize `auth.json` and
    infer PAT auth from the credential field after a rollback.
    
    ## Validation
    
    Latest review-fix validation:
    
    - `CARGO_INCREMENTAL=0 just test -p codex-login` (126 passed)
    - `CARGO_INCREMENTAL=0 just test -p codex-cli` (263 passed)
    - `CARGO_INCREMENTAL=0 just test -p codex-cli
    stored_auth_validation_handles_personal_access_token`
    - `CARGO_INCREMENTAL=0 just test -p codex-app-server-protocol` (226
    passed)
    - `CARGO_INCREMENTAL=0 just test -p codex-models-manager
    refresh_available_models_uses_remote_only_catalog_for_chatgpt_auth`
    - `CARGO_INCREMENTAL=0 just test -p codex-tui
    existing_non_oauth_chatgpt_login_counts_as_signed_in`
    - `CARGO_INCREMENTAL=0 just fix -p codex-login -p
    codex-app-server-protocol -p codex-models-manager -p codex-tui -p
    codex-cli`
    - `just fmt`
    - `git diff --check`
    
    The broader `codex-tui` suite previously compiled and ran 2,834 tests.
    Three unrelated environment-sensitive guardian/IDE-socket tests failed
    after retries; the PAT-relevant TUI coverage passed.
  • feat(remote-control): add pairing status transport (#26449)
    ## What
    
    Adds transport support for checking remote-control pairing status
    against the backend.
    
    - Adds the normalized `server/pair/status` backend URL.
    - Adds backend request/response structs for exactly one lookup key:
    `pairing_code` or `manual_pairing_code`, returning `{ claimed }`.
    - Adds `RemoteControlEnrollment::pairing_status` and
    `RemoteControlHandle::pairing_status`.
    - Preserves auth refresh/retry behavior and backend error mapping.
    - Adds transport coverage for pending, claimed, manual-code payloads,
    token refresh, mapped backend errors, malformed responses, and URL
    normalization.
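    The "exactly one lookup key" rule maps naturally to an enum rather than
    two optional fields, as the following sketch shows. The JSON key names
    and the `server/pair/status` path come from the text above. The type, the
    helper names, and the slash-joining rule used for URL normalization are
    assumptions.

    ```rust
    /// Exactly one lookup key, enforced by the type.
    enum PairingLookup {
        PairingCode(String),
        ManualPairingCode(String),
    }

    impl PairingLookup {
        /// The single JSON key and value a request body would carry.
        fn key_and_value(&self) -> (&'static str, &str) {
            match self {
                PairingLookup::PairingCode(code) => ("pairing_code", code.as_str()),
                PairingLookup::ManualPairingCode(code) => ("manual_pairing_code", code.as_str()),
            }
        }
    }

    /// Assumed normalization: join base and path with exactly one slash.
    fn pair_status_url(base: &str) -> String {
        format!("{}/server/pair/status", base.trim_end_matches('/'))
    }
    ```
    
    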
    
    ## Why
    
    Desktop needs a host-authenticated way to poll whether a QR or manual
    pairing code has been claimed.
    
    Related backend change: https://github.com/openai/openai/pull/990244
    
    ## Verification
    
    - `cargo test --manifest-path app-server-transport/Cargo.toml
    remote_control::tests::pairing_tests`
    - `cargo fmt --all --check`
    - `git diff --check`
  • feat(remote-control): allow pairing while disabled (#26215)
    ## Why
    
    `remoteControl/pairing/start` creates authorization for future
    remote-control connections, so it should not require the live websocket
    to already be enabled. Requiring enable first made pairing depend on
    presence instead of the persisted server enrollment that pairing
    actually uses.
    
    Pairing also needs to recover when that persisted server row is stale.
    If `/server/pair` returns `404`, making the first pairing attempt fail
    forces a manual retry even though the client can clear the stale row and
    create a replacement enrollment immediately.
    
    ## What Changed
    
    - Allow `remoteControl/pairing/start` to reuse or create the persisted
    remote-control server enrollment while remote control is disabled.
    - Keep the selected in-memory enrollment across disable and share it
    with websocket connect so a later enable uses the same selected server.
    - Thread the app-server client name through pairing so stdio persistence
    keeps using the websocket-owned enrollment key.
    - Recover pairing server-token auth failures through the existing
    refresh/auth-recovery path.
    - Recover stale pairing enrollment on `/server/pair` `404` by clearing
    the stale selected enrollment, re-enrolling once, and retrying pairing
    once.
    - Add focused disabled-pairing and stale-pairing recovery coverage.
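    The bounded recovery described above (clear, re-enroll once, retry once)
    can be sketched as a generic helper. The closures stand in for the real
    pairing and re-enrollment calls. `PairError` and
    `pair_with_stale_recovery` are hypothetical names.

    ```rust
    #[derive(Debug, PartialEq, Eq)]
    enum PairError {
        NotFound,
        Other(String),
    }

    /// On a `404`, clear the stale enrollment and re-enroll once, then retry
    /// pairing once. A second failure, including another `404`, is returned
    /// as-is, so the loop can never spin.
    fn pair_with_stale_recovery<T>(
        mut pair: impl FnMut() -> Result<T, PairError>,
        mut clear_and_reenroll: impl FnMut() -> Result<(), PairError>,
    ) -> Result<T, PairError> {
        match pair() {
            Err(PairError::NotFound) => {
                clear_and_reenroll()?;
                pair()
            }
            other => other,
        }
    }
    ```
    
    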
    
    ## Verification
    
    - `remote_control_pairing_start_returns_pairing_artifacts_while_disabled`
    exercises pairing before enable.
    - `remote_control_handle_reenrolls_after_stale_pairing_enrollment`
    exercises stale `/server/pair` `404` recovery without a manual retry.
    
    Related: N/A
  • feat(app-server): add remote control client management RPCs (#25785)
    ## Why
    
    Remote-control clients need to list and revoke controller-device grants
    without enabling or enrolling the local relay. These are signed-in
    account-management operations, so coupling them to websocket, pairing,
    enrollment, or persisted relay state would prevent clients from managing
    stale grants from the picker.
    
    Related enhancement request: N/A. This adds the Codex app-server surface
    for the planned upstream environment-scoped revoke endpoint.
    
    ## What Changed
    
    - Added experimental app-server v2 RPCs:
      - `remoteControl/client/list`
      - `remoteControl/client/revoke`
    - Added picker-oriented protocol types and standard generated schema
    fixtures. The list response intentionally omits backend account id,
    enrollment status, and location fields.
    - Added `app-server-transport/src/transport/remote_control/clients.rs`
    for environment-scoped GET and DELETE requests. It builds escaped URL
    path segments, forwards optional pagination query fields, sends ChatGPT
    auth plus `chatgpt-account-id`, converts RFC3339 `last_seen_at` values
    to Unix seconds, accepts `204 No Content` revoke responses, and retries
    once after a `401`.
    - Extracted shared ChatGPT auth loading and recovery into
    `app-server-transport/src/transport/remote_control/auth.rs` so
    websocket, pairing, and client management use the same account-auth
    boundary.
    - Retained the configured remote-control base URL on
    `RemoteControlHandle` and resolve management URLs lazily, preserving
    deferred validation while relay startup is disabled.
    - Registered list as `global_shared_read("remote-control-clients")` and
    revoke as `global("remote-control-clients")`.
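    Escaping IDs as URL path segments can be illustrated with a small
    percent-encoder. This dependency-free sketch keeps RFC 3986 unreserved
    characters and encodes every other byte. Because of this, a `/` or `?`
    inside an ID cannot add path segments or a query. The function name is
    hypothetical, and the real code may use a URL library instead.

    ```rust
    /// Percent-encodes one URL path segment. RFC 3986 unreserved bytes pass
    /// through; all others, including `/`, `?`, `#`, `%`, and non-ASCII
    /// UTF-8 bytes, become `%XX`.
    fn escape_path_segment(segment: &str) -> String {
        let mut out = String::with_capacity(segment.len());
        for byte in segment.bytes() {
            match byte {
                b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                    out.push(byte as char)
                }
                _ => out.push_str(&format!("%{byte:02X}")),
            }
        }
        out
    }
    ```
    
    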
    
    ## Verification
    
    - Added transport coverage proving list and revoke work while relay
    state is disabled, IDs are escaped, picker-only fields are returned,
    timestamps are converted, revoke accepts `204`, auth headers are
    forwarded, `401` retries exactly once, `403` is not retried, and
    malformed list payloads retain decode context.
    - Added an app-server integration test proving both JSON-RPC methods
    work before relay enablement and successful revoke returns `{}`.
    - Regenerated and validated experimental and standard app-server schema
    fixtures.
  • feat(remote-control): add pairing start (#25675)
    ## Why
    
    Remote control enrollment authorizes a desktop server, but app-server v2
    did not expose the follow-up pairing operation needed to mint a
    short-lived controller pairing artifact from that enrolled server.
    Clients need a narrow RPC that starts pairing without exposing the
    backend `serverId` or conflating pairing with websocket connection
    state.
    
    Issue: N/A; internal remote-control pairing API change.
    
    ## What Changed
    
    Added experimental app-server v2 `remoteControl/pairing/start` with
    `manualCode` input and `pairingCode`, nullable `manualPairingCode`,
    `environmentId`, and Unix-seconds `expiresAt` output. The method
    serializes under its own `global("remote-control-pairing")` scope and is
    documented in `app-server/README.md`.
    
    Extended the remote-control transport with private `/server/pair`
    request/response types and normalized `pair_url` handling. Pairing uses
    the current enrolled server bearer, refreshes that bearer when needed,
    keeps backend `server_id` private, validates returned `server_id` and
    `environment_id` against the current enrollment, and preserves backend
    status/header/body context for failures and malformed responses.
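    The enrollment cross-check might be sketched as follows. A pairing
    response minted for a different server or environment is rejected. The
    error names the field that mismatched but not its value. The struct and
    function names here are hypothetical.

    ```rust
    struct Enrollment {
        server_id: String,
        environment_id: String,
    }

    /// Hypothetical subset of a `/server/pair` response.
    struct PairResponse {
        server_id: String,
        environment_id: String,
    }

    fn validate_pair_response(current: &Enrollment, resp: &PairResponse) -> Result<(), String> {
        if resp.server_id != current.server_id {
            return Err("pairing response server_id does not match the current enrollment".into());
        }
        if resp.environment_id != current.environment_id {
            return Err("pairing response environment_id does not match the current enrollment".into());
        }
        Ok(())
    }
    ```
    
    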
    
    Wired the request through `RemoteControlRequestProcessor` and
    `MessageProcessor`, mapping unavailable/disabled pairing to
    `invalid_request` and backend failures to internal errors.
    
    ## Verification
    
    - `just test -p codex-app-server-transport`
    - `just test -p codex-app-server
    remote_control_pairing_start_returns_pairing_artifacts`
  • feat(app-server): migrate remote control to server tokens (#24141)
    ## Why
    
    `codex-backend` now authenticates remote-control server websocket
    connections with short-lived server tokens instead of the user's ChatGPT
    access token. `app-server` needs to mint and refresh those server tokens
    without persisting them, so a restart can reconnect from durable
    enrollment identity while keeping the bearer token memory-only.
    
    ## What Changed
    
    Updated the remote-control transport to consume `remote_control_token`
    and `expires_at` from server enroll responses and added
    `/server/refresh` support for persisted enrollments or expiring cached
    tokens.
    
    Websocket handshakes now send `Authorization: Bearer
    <remote_control_token>` with the existing server identity headers, and
    no longer send the ChatGPT bearer token or `chatgpt-account-id` on that
    websocket path.
    
    The in-memory enrollment state now owns the ephemeral server token
    cache, while SQLite still persists only `server_id`, `environment_id`,
    and `server_name`. Websocket `401`/`403` clears only the cached token
    for refresh on reconnect; websocket or refresh `404` clears stale
    persisted enrollment and re-enrolls. Response body previews redact
    `remote_control_token` before surfacing parse errors.
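    The status handling described in this commit can be summarized as a
    mapping from failure source and status to an action. This sketch
    reflects only this commit. A later change in this list narrows the
    websocket `404` case to an explicit missing-server response. Placing all
    other statuses in a generic reconnect bucket is a simplification, and
    the names are hypothetical.

    ```rust
    #[derive(Debug, PartialEq, Eq)]
    enum AuthFailureAction {
        /// Drop only the in-memory server token; refresh on the next reconnect.
        ClearCachedToken,
        /// Forget the persisted enrollment and enroll again.
        ClearEnrollmentAndReenroll,
        /// Not an auth or enrollment signal; normal reconnect handling.
        Reconnect,
    }

    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum Source {
        WebsocketHandshake,
        Refresh,
    }

    fn action_for(source: Source, status: u16) -> AuthFailureAction {
        match (source, status) {
            (Source::WebsocketHandshake, 401 | 403) => AuthFailureAction::ClearCachedToken,
            (_, 404) => AuthFailureAction::ClearEnrollmentAndReenroll,
            _ => AuthFailureAction::Reconnect,
        }
    }
    ```
    
    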
    
    ## Verification
    
    - `just test -p codex-app-server-transport`
    - Manual prod smoke with an isolated `CODEX_HOME`: `codex remote-control
    --json -c 'chatgpt_base_url="https://chatgpt.com/backend-api"'` reached
    `status:"connected"` with
    `environmentId:"env_i_6a17d9f1d764832986da2e80f4554f1b"`.
  • Uprev Rust toolchain pins to 1.95.0 (#24684)
    ## Summary
    - Bump the workspace Rust toolchain from `1.93.0` to `1.95.0` across
    Cargo, Bazel, CI, release workflows, devcontainers, and the Codex
    environment config.
    - Refresh `MODULE.bazel.lock` so the Bazel Rust toolchain artifacts
    match the new version.
    - Leave purpose-specific toolchains unchanged, including the
    `argument-comment-lint` nightly and the upstream `rusty_v8` `1.91.0`
    build pin.
    - Includes fixes for new lints from `just fix` and a few codex-authored
    fixes for lints without a suggestion.
  • fix(remote-control): surface websocket task stalls (#24473)
    ## Why
    
    When the app-server remote-control websocket path stalls during
    connection setup or teardown, the existing logs do not show where the
    task stopped, and several awaits can keep the task from returning
    promptly. That makes offline or stale-host incidents hard to distinguish
    from expected shutdown or disable flow.
    
    Issue: N/A (internal incident investigation)
    
    ## What Changed
    
    Added structured lifecycle and status logging around remote-control
    enable/disable requests, websocket task startup and exit, connection
    cycles, enrollment context, and status/environment transitions.
    
    Bound websocket connect, transport-event forwarding, and
    connection-worker shutdown waits. On timeout, the code logs the stalled
    operation and stops or aborts workers so the loop can reconnect or exit
    instead of waiting indefinitely. Ping sends now also observe shutdown
    cancellation.
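
    The bounded-wait idea can be sketched with standard-library channels standing in for the async awaits (async code would more likely wrap each await in something like `tokio::time::timeout`). `bounded_wait` and `WaitOutcome` are hypothetical names, and the log line is illustrative rather than the one the PR added.

    ```rust
    use std::sync::mpsc::{Receiver, RecvTimeoutError};
    use std::time::Duration;

    #[derive(Debug, PartialEq, Eq)]
    pub enum WaitOutcome {
        Completed,
        /// The named operation did not finish in time; the caller can stop or
        /// abort the worker and move on to reconnect or exit.
        Stalled(&'static str),
        /// The sender went away without signalling.
        Disconnected,
    }

    /// Wait at most `limit` for `done` instead of blocking indefinitely.
    pub fn bounded_wait(op: &'static str, done: &Receiver<()>, limit: Duration) -> WaitOutcome {
        match done.recv_timeout(limit) {
            Ok(()) => WaitOutcome::Completed,
            Err(RecvTimeoutError::Timeout) => {
                // Name the stalled step so logs show where the task stopped.
                eprintln!("remote-control: {op} did not finish within {limit:?}");
                WaitOutcome::Stalled(op)
            }
            Err(RecvTimeoutError::Disconnected) => WaitOutcome::Disconnected,
        }
    }
    ```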
  • fix(remote-control): cap reconnect backoff (#24164)
    ## Why
    
    Remote-control websocket reconnects currently use the shared exponential
    backoff helper without a local ceiling, so a long failure streak can
    stretch retries out indefinitely and leave the runtime behavior hard to
    inspect from logs.
    
    ## What Changed
    
    Cap the remote-control reconnect delay at 30 seconds, then reset the
    reconnect attempt counter once that capped delay is emitted so the next
    failure starts from the initial jittered delay again.
    
    The reconnect failure log now records the attempt number, chosen delay,
    and whether the cap triggered a reset, with a separate info log when the
    backoff counter is reset after the cap.
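
    A minimal sketch of the capped-and-reset backoff, assuming a caller-supplied initial delay and jitter fraction (the PR states neither value, and the real code builds on a shared backoff helper that is not shown here). `ReconnectBackoff` and `NextDelay` are hypothetical names; `NextDelay` carries the attempt number, delay, and reset flag that the PR says are logged.

    ```rust
    use std::time::Duration;

    #[derive(Debug, PartialEq, Eq)]
    pub struct NextDelay {
        pub attempt: u32,
        pub delay: Duration,
        pub capped_and_reset: bool,
    }

    /// Hypothetical reconnect backoff: exponential growth, a 30 s ceiling,
    /// and a counter reset once the ceiling has been emitted.
    pub struct ReconnectBackoff {
        initial: Duration,
        cap: Duration,
        attempt: u32,
    }

    impl ReconnectBackoff {
        pub fn new(initial: Duration) -> Self {
            Self { initial, cap: Duration::from_secs(30), attempt: 0 }
        }

        /// `jitter` is a fraction in [0, 1) that scales the raw delay upward.
        pub fn next_delay(&mut self, jitter: f64) -> NextDelay {
            let attempt = self.attempt;
            let jitter = if jitter.is_finite() { jitter.clamp(0.0, 0.999) } else { 0.0 };
            // initial * 2^attempt * (1 + jitter), computed in f64 so it cannot overflow.
            let exp = attempt.min(63) as i32;
            let raw_secs = self.initial.as_secs_f64() * 2f64.powi(exp) * (1.0 + jitter);
            if raw_secs >= self.cap.as_secs_f64() {
                // Emit the ceiling once, then start the next streak from `initial`.
                self.attempt = 0;
                NextDelay { attempt, delay: self.cap, capped_and_reset: true }
            } else {
                self.attempt = attempt.saturating_add(1);
                NextDelay { attempt, delay: Duration::from_secs_f64(raw_secs), capped_and_reset: false }
            }
        }
    }
    ```

    Passing the jitter in, rather than drawing it inside `next_delay`, keeps the sketch deterministic; a real caller would pass a random fraction.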
    
    ## Verification
    
    `just test -p codex-app-server-transport`
    
    Related issue: N/A
  • fix(remote-control): retry after auth recovery (#23775)
    ## Why
    
    When remote control hits an auth failure such as a revoked or reused
    refresh token, the websocket loop falls into reconnect backoff. If the
    user fixes auth while that loop is sleeping, remote control can stay
    offline until the old retry timer expires because nothing wakes the loop
    or resets its exhausted auth recovery state.
    
    ## What Changed
    
    Added an auth-change watch on `AuthManager` for refresh-relevant cached
    auth updates.
    
    The remote-control websocket loop now subscribes to that signal, resets
    `UnauthorizedRecovery` and reconnect backoff when auth changes, and
    retries immediately instead of waiting for the previous delay.
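
    The wake-on-auth-change behavior can be sketched as a backoff sleep that doubles as a wait on a signal. Here a `std::sync::mpsc` channel stands in for the auth-change watch on `AuthManager`. `LoopState`, `Wake`, and `sleep_or_auth_change` are hypothetical, and their two fields stand in for the `UnauthorizedRecovery` state and the reconnect backoff.

    ```rust
    use std::sync::mpsc::{Receiver, RecvTimeoutError};
    use std::time::Duration;

    #[derive(Debug, Default)]
    pub struct LoopState {
        /// Stand-in for exhausted `UnauthorizedRecovery`.
        pub unauthorized_recovery_exhausted: bool,
        /// Stand-in for the reconnect backoff counter.
        pub backoff_attempt: u32,
    }

    #[derive(Debug, PartialEq, Eq)]
    pub enum Wake {
        AuthChanged,
        DelayElapsed,
        /// The signal source is gone; the caller should fall back to a plain sleep.
        AuthSourceGone,
    }

    /// Sleep for `delay`, but return early if auth changes in the meantime.
    pub fn sleep_or_auth_change(state: &mut LoopState, auth_changed: &Receiver<()>, delay: Duration) -> Wake {
        match auth_changed.recv_timeout(delay) {
            Ok(()) => {
                // Fresh credentials: forget the exhausted recovery and the backoff
                // streak so the caller retries right away.
                state.unauthorized_recovery_exhausted = false;
                state.backoff_attempt = 0;
                Wake::AuthChanged
            }
            Err(RecvTimeoutError::Timeout) => Wake::DelayElapsed,
            Err(RecvTimeoutError::Disconnected) => Wake::AuthSourceGone,
        }
    }
    ```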
    
    Updated the remote-control transport test to verify that reloading auth
    with the now-available account id wakes enrollment before the prior
    retry delay.
    
    ## Verification
    
    `cargo test -p codex-app-server-transport
    remote_control_waits_for_account_id_before_enrolling`
  • fix: serialize unix app-server startup (#23516)
    # Summary
    
    Unix-socket app-server startup can currently race when multiple launch
    attempts target the same `CODEX_HOME`. Those processes can overlap
    before the control socket exists, which lets them enter SQLite state
    initialization concurrently and reproduce the startup corruption pattern
    seen in SSH mode.
    
    This change makes the app-server own that singleton startup guarantee.
    Unix-socket startup now takes a `CODEX_HOME`-scoped advisory lock before
    SQLite initialization, runs the existing control-socket preparation
    check while holding that lock, returns the established `AddrInUse` error
    when another live listener already owns the socket, and releases the
    lock once the new listener has bound its socket.
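
    The startup ordering above can be sketched in a single process, with a `std::sync::Mutex` swapped in for the `CODEX_HOME`-scoped advisory file lock. A real implementation needs an OS-level lock so that separate processes serialize. `CodexHome`, its fields, and `start_unix_listener` are hypothetical, and the atomics stand in for the control socket and SQLite initialization.

    ```rust
    use std::io;
    use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
    use std::sync::Mutex;

    /// Hypothetical per-`CODEX_HOME` state for the sketch.
    #[derive(Default)]
    pub struct CodexHome {
        /// Stand-in for the advisory lock file under `CODEX_HOME`.
        pub startup_lock: Mutex<()>,
        /// Stand-in for "a live listener already owns the control socket".
        pub live_listener: AtomicBool,
        /// Counts how many launches ran SQLite initialization.
        pub sqlite_inits: AtomicUsize,
    }

    pub fn start_unix_listener(home: &CodexHome) -> io::Result<()> {
        // Take the lock before touching SQLite; a poisoned lock is still usable here.
        let _guard = home.startup_lock.lock().unwrap_or_else(|e| e.into_inner());

        // Socket preparation: a second launch sees the live listener and bails out.
        if home.live_listener.load(Ordering::SeqCst) {
            return Err(io::Error::new(
                io::ErrorKind::AddrInUse,
                "control socket already owned by a live listener",
            ));
        }

        // SQLite initialization now runs in at most one launch at a time.
        home.sqlite_inits.fetch_add(1, Ordering::SeqCst);

        // Bind the socket.
        home.live_listener.store(true, Ordering::SeqCst);

        // `_guard` is dropped on return, so the lock covers only this window,
        // not the app-server's lifetime.
        Ok(())
    }
    ```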
    
    # Design decisions
    
    - The singleton rule lives in `app-server --listen unix://`, not in a
    desktop-only caller path, so every Unix-socket launch gets the same race
    protection.
    - A duplicate raw app-server launch returns an error instead of silently
    succeeding. The attach operation remains `app-server proxy`, which
    continues to connect to an already-running listener.
    - The lock is held only across the dangerous startup window: socket
    preparation, SQLite initialization, and socket bind. It is not held for
    the app-server lifetime.
    - Listener detection stays in `prepare_control_socket_path(...)`, so the
    preexisting live-listener and stale-socket behavior remains the single
    source of truth.
    
    # Testing
    
    Tests: targeted Unix-socket transport tests on the branch checkout, full
    `codex-cli` build on `efrazer-db10`, and an SSH-style smoke on
    `efrazer-db10` covering concurrent app-server starts, explicit
    duplicate-start errors, and absence of SQLite startup-error matches in
    launch logs.
  • feat(app-server): update remote control APIs for better UX (#22877)
    ## Why
    To help improve the `codex remote-control` CLI UX, which I plan to do in a
    follow-up, this PR adds `server-name` to the following remote control APIs:
    - `remoteControl/enable`
    - `remoteControl/disable`
    - `remoteControl/status/changed`
    
    It also adds a `remoteControl/status/read` API, which will be helpful in
    the Codex App.
  • enable/disable remote control at runtime, not via features (#22578)
    ## Why
    Reapplies https://github.com/openai/codex/pull/22386, which was
    previously reverted.
    
    It also introduces the `remoteControl/enable` and `remoteControl/disable`
    app-server APIs to toggle remote control on or off at runtime for a given
    running app-server instance.
    
    ## What Changed
    
    - Adds experimental v2 RPCs:
      - `remoteControl/enable`
      - `remoteControl/disable`
    - Adds `RemoteControlRequestProcessor` and routes the new RPCs through
    it instead of `ConfigRequestProcessor`.
    - Adds named `RemoteControlHandle::enable`, `disable`, and `status`
    methods.
    - Makes `remoteControl/enable` return an error when the SQLite state DB is
    unavailable, while keeping enrollment/websocket failures as async status
    updates.
    - Adds `AppServerRuntimeOptions.remote_control_enabled` and hidden
    `--remote-control` flags for `codex app-server` and `codex-app-server`.
    - Updates managed daemon startup to use `codex app-server
    --remote-control --listen unix://`.
    - Marks `Feature::RemoteControl` as removed and ignores
    `[features].remote_control`.
    - Updates app-server README entries for the new remote-control methods.
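
    The split between synchronous `enable` errors and asynchronous status updates can be sketched as follows. This is a hypothetical shape, not the crate's `RemoteControlHandle`: `RemoteControlHandleSketch`, `RemoteControlStatus`, and `report_connection_result` are illustrative names, and `status_updates` stands in for status notifications sent to clients.

    ```rust
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub enum RemoteControlStatus {
        Disabled,
        Connecting,
        Connected,
        Errored(String),
    }

    pub struct RemoteControlHandleSketch {
        state_db_available: bool,
        status: RemoteControlStatus,
        /// Stand-in for status-change notifications sent to clients.
        pub status_updates: Vec<RemoteControlStatus>,
    }

    impl RemoteControlHandleSketch {
        pub fn new(state_db_available: bool) -> Self {
            Self { state_db_available, status: RemoteControlStatus::Disabled, status_updates: Vec::new() }
        }

        /// Fails synchronously only when the state DB is missing.
        pub fn enable(&mut self) -> Result<(), String> {
            if !self.state_db_available {
                return Err("remote control requires the SQLite state DB".to_string());
            }
            // Enrollment and the websocket connect happen later, off the RPC path.
            self.set_status(RemoteControlStatus::Connecting);
            Ok(())
        }

        pub fn disable(&mut self) {
            self.set_status(RemoteControlStatus::Disabled);
        }

        pub fn status(&self) -> &RemoteControlStatus {
            &self.status
        }

        /// Called by the background task; failures become a status, not an RPC error.
        pub fn report_connection_result(&mut self, result: Result<(), String>) {
            let next = match result {
                Ok(()) => RemoteControlStatus::Connected,
                Err(reason) => RemoteControlStatus::Errored(reason),
            };
            self.set_status(next);
        }

        fn set_status(&mut self, next: RemoteControlStatus) {
            self.status = next.clone();
            self.status_updates.push(next);
        }
    }
    ```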
  • Restore app-server websocket listener with auth guard (#22404)
    ## Why
    PR #21843 removed the TCP websocket app-server listener, but that also
    removed functionality that still needs to exist. Restoring it as-is
    would reopen the old remote exposure problem, so this keeps the restored
    listener while making remote and non-loopback usage require explicit
    auth.
    
    ## What Changed
    - Mostly reverts #21843 and reapplies the small merge-conflict
    resolutions needed on top of current main.
    - Restores `ws://IP:PORT` parsing, the app-server TCP websocket acceptor,
    websocket auth CLI flags, and the associated tests.
    - The only intentional behavior change from the restored code is that
    non-loopback websocket listeners now fail startup unless
    `--ws-auth capability-token` or `--ws-auth signed-bearer-token` is
    configured. Loopback listeners remain available for local and
    SSH-forwarding workflows.
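
    The auth-enforcement rule reduces to a check on the bind address before the acceptor starts. The following is a minimal sketch, assuming the address is already parsed into a `SocketAddr`: `WsAuth` and `check_ws_listener` are hypothetical names, not the crate's helpers, and the error text is illustrative.

    ```rust
    use std::net::SocketAddr;

    /// Hypothetical mirror of the `--ws-auth` choices named above.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum WsAuth {
        NotConfigured,
        CapabilityToken,
        SignedBearerToken,
    }

    /// Reject unauthenticated listeners unless they bind a loopback address.
    pub fn check_ws_listener(addr: SocketAddr, auth: WsAuth) -> Result<(), String> {
        if auth == WsAuth::NotConfigured && !addr.ip().is_loopback() {
            return Err(format!(
                "refusing unauthenticated websocket listener on non-loopback address {addr}"
            ));
        }
        Ok(())
    }
    ```

    `0.0.0.0` is the unspecified address, not a loopback one, so an unauthenticated `ws://0.0.0.0` bind fails this check.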
    
    ## Reviewer Focus
    Please focus review on the small auth-enforcement delta layered on top
    of the revert:
    
    - `codex-rs/app-server-transport/src/transport/websocket.rs`:
    `start_websocket_acceptor` now rejects unauthenticated non-loopback
    websocket binds before accepting connections.
    - `codex-rs/app-server-transport/src/transport/auth.rs`: helper logic
    classifies unauthenticated non-loopback listeners.
    - `codex-rs/app-server/tests/suite/v2/connection_handling_websocket.rs`:
    tests cover unauthenticated `ws://0.0.0.0` startup rejection and
    authenticated non-loopback capability-token startup.
    
    Everything else is intended to be revert/merge-conflict restoration
    rather than new product behavior.
    
    ## Verification
    
    - Manually verified that TUI remoting is restored and that auth is
    enforced for non-localhost URLs.
  • app-server: remove TCP websocket listener (#21843)
    ## Why
    
    The app-server no longer needs to expose a TCP websocket listener.
    Keeping that transport also kept around a separate listener/auth surface
    that is unnecessary now that local clients can use stdio or the
    Unix-domain control socket, while remote connectivity is handled by
    `remote_control`.
    
    ## What Changed
    
    - Removed `ws://IP:PORT` parsing and the `AppServerTransport::WebSocket`
    startup path.
    - Deleted the app-server websocket listener auth module and removed
    related CLI flags/dependencies.
    - Kept websocket framing only where it is still needed: over the
    Unix-domain control socket and in the outbound `remote_control`
    connection.
    - Updated app-server CLI/help text and `app-server/README.md` to
    document only `stdio://`, `unix://`, `unix://PATH`, and `off` for local
    transports.
    - Converted affected app-server integration coverage from TCP websocket
    listeners to UDS-backed websocket connections, and added a parse test
    that rejects `ws://` listen URLs.
    - Removed the now-unused workspace `constant_time_eq` dependency and
    refreshed `Cargo.lock` after `cargo shear` caught the drift.
    - Moved test app-server UDS socket paths to short Unix temp paths so
    macOS Bazel test sandboxes do not exceed Unix socket path limits.
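
    The listen URLs this change leaves in place can be sketched as a small parser. `ListenTarget` and `parse_listen_url` are hypothetical names, and the error messages are illustrative. The `ws://` rejection reflects this commit only, since #22404 later restored the TCP websocket listener.

    ```rust
    use std::path::PathBuf;

    #[derive(Debug, PartialEq, Eq)]
    pub enum ListenTarget {
        Stdio,
        /// `unix://` with no path: use the default control socket location.
        UnixDefault,
        Unix(PathBuf),
        Off,
    }

    pub fn parse_listen_url(url: &str) -> Result<ListenTarget, String> {
        match url {
            "stdio://" => Ok(ListenTarget::Stdio),
            "unix://" => Ok(ListenTarget::UnixDefault),
            "off" => Ok(ListenTarget::Off),
            _ => {
                if let Some(path) = url.strip_prefix("unix://") {
                    Ok(ListenTarget::Unix(PathBuf::from(path)))
                } else if url.starts_with("ws://") {
                    // TCP websocket listeners are gone; only local transports remain.
                    Err(format!("`{url}`: TCP websocket listeners are not supported"))
                } else {
                    Err(format!("unsupported listen URL `{url}`"))
                }
            }
        }
    }
    ```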
    
    ## Verification
    
    - Added/updated tests around UDS websocket transport behavior and
    `ws://` listen URL rejection.
    - `cargo shear`
    - `cargo metadata --no-deps --format-version 1`
    - `cargo test -p codex-app-server unix_socket_transport`
    - `cargo test -p codex-app-server unix_socket_disconnect`
    - `just fix -p codex-app-server`
    - `git diff --check`
    
    Local full Rust test execution was blocked before compilation by an
    external fetch failure for the pinned `nornagon/crossterm` git
    dependency. `just bazel-lock-update` and `just bazel-lock-check` were
    retried after the manifest cleanup but remain blocked by external
    BuildBuddy/V8 fetch timeouts.
  • feat: Use installation ID in remote enrollments (#21662)
    * Pass the installation ID to the enrollments server for storage, so it
    can dedupe/group multiple app-servers per installation
    * Pass the installation ID in `remoteControl/status/changed` events
  • Disable empty Cargo test targets (#21584)
    ## Summary
    
    `cargo test` runs both standard Rust tests and doctests.
    It turns out that the doctest discovery is fairly slow, and it's a cost
    you pay even for crates that don't include any doctests.
    
    This PR disables doctests with `doctest = false` for crates that lack
    any doctests.
    
    For the collection of crates below, this speeds up test execution by
    more than 4x.
    
    E.g., before this PR:
    
    ```
    Benchmark 1: cargo test     -p codex-utils-absolute-path     -p codex-utils-cache     -p codex-utils-cli     -p codex-utils-home-dir     -p codex-utils-output-truncation     -p codex-utils-path     -p codex-utils-string     -p codex-utils-template     -p codex-utils-elapsed     -p codex-utils-json-to-toml
      Time (mean ± σ):      1.849 s ±  4.455 s    [User: 0.752 s, System: 1.367 s]
      Range (min … max):    0.418 s … 14.529 s    10 runs
    ```
    
    And after:
    
    ```
    Benchmark 1: cargo test     -p codex-utils-absolute-path     -p codex-utils-cache     -p codex-utils-cli     -p codex-utils-home-dir     -p codex-utils-output-truncation     -p codex-utils-path     -p codex-utils-string     -p codex-utils-template     -p codex-utils-elapsed     -p codex-utils-json-to-toml
      Time (mean ± σ):     428.6 ms ±   6.9 ms    [User: 187.7 ms, System: 219.7 ms]
      Range (min … max):   418.0 ms … 436.8 ms    10 runs
    ```
    
    For a single crate, with >2x speedup, before:
    
    ```
    Benchmark 1: cargo test -p codex-utils-string
      Time (mean ± σ):     491.1 ms ±   9.0 ms    [User: 229.8 ms, System: 234.9 ms]
      Range (min … max):   480.9 ms … 512.0 ms    10 runs
    ```
    
    And after:
    
    ```
    Benchmark 1: cargo test -p codex-utils-string
      Time (mean ± σ):     213.9 ms ±   4.3 ms    [User: 112.8 ms, System: 84.0 ms]
      Range (min … max):   206.8 ms … 221.0 ms    13 runs
    ```
    
    Co-authored-by: Codex <noreply@openai.com>
  • app-server: move transport into dedicated crate (#20545)
    ## Why
    
    `codex-app-server` currently owns both request-processing code and
    transport implementation details. Splitting the transport layer into its
    own crate makes that boundary explicit, reduces the amount of
    transport-specific dependency surface carried by `codex-app-server`, and
    gives future transport work a narrower place to evolve.
    
    ## What changed
    
    - Added `codex-app-server-transport` and moved the existing transport
    tree into it, including stdio, unix socket, websocket, remote-control
    transport, and websocket auth.
    - Moved shared transport-facing message types into the new crate so both
    the transport implementation and `codex-app-server` use the same
    definitions.
    - Kept processor-facing connection state and outbound routing in
    `codex-app-server`, with the routing tests moved next to that local
    wrapper.
    - Updated workspace metadata, Bazel crate metadata, and
    `codex-app-server` dependencies for the new crate boundary.
    
    ## Validation
    
    - `cargo metadata --locked --no-deps`
    - `git diff --check`
    - Attempted `cargo test -p codex-app-server-transport`, `cargo test -p
    codex-app-server`, `just fix -p codex-app-server-transport`, and `just
    fix -p codex-app-server`; all were blocked before compilation by the
    existing `packageproxy` resolution failure for locked `rustls-webpki =
    0.103.13`.
    - Attempted Bazel build / lockfile validation; those were blocked by
    external fetch failures against BuildBuddy / GitHub while resolving
    `v8`.