codex

fix(remote-control): avoid server token refresh retry storms (#30201)

## Why

Remote-control websocket reconnects and pairing requests proactively
refresh their server token. When `/server/refresh` returned a transient
error such as `502`, the still-valid token was no longer treated as
usable, causing reconnect failures and repeated refresh attempts that
could amplify an upstream incident.

## What Changed

- Start proactive refresh five minutes before token expiry and
distinguish it from a required refresh for missing or expired tokens.
- Continue websocket and pairing operations with the existing valid
token after `429`, `5xx`, or timeout failures.
- Share an in-memory `next_refresh_at` throttle across websocket and
pairing callers, honoring both `Retry-After` formats and otherwise using
a jittered 24–36 second delay.
- Keep required refreshes strict, preserve `404` enrollment replacement,
and clear token/throttle state for `401` and `403` auth recovery.
- Preserve refresh response metadata internally and add focused
wire-level and integration coverage.
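
The refresh rules above can be sketched as a few pure functions. This is a minimal illustration, not the transport's actual code: `RefreshKind`, `refresh_kind`, `may_use_existing_token`, and `next_refresh_at` are hypothetical names, times are plain seconds, timeouts and the HTTP-date form of `Retry-After` are left out, and the jitter is passed in by the caller so the example stays deterministic.

```rust
use std::time::Duration;

/// Proactive refresh starts five minutes before expiry (from the description).
const PROACTIVE_WINDOW: Duration = Duration::from_secs(5 * 60);

/// Hypothetical classification of a refresh attempt.
#[derive(Debug, PartialEq)]
pub enum RefreshKind {
    NotNeeded,
    Proactive,
    Required,
}

/// `now` and `expires_at` are seconds since an arbitrary epoch.
pub fn refresh_kind(now: u64, expires_at: Option<u64>) -> RefreshKind {
    match expires_at {
        // Missing or expired tokens must be refreshed before use.
        None => RefreshKind::Required,
        Some(exp) if exp <= now => RefreshKind::Required,
        // Still valid, but inside the five-minute window.
        Some(exp) if exp - now <= PROACTIVE_WINDOW.as_secs() => RefreshKind::Proactive,
        Some(_) => RefreshKind::NotNeeded,
    }
}

/// `429` and `5xx` are treated as transient refresh failures.
pub fn is_transient(status: u16) -> bool {
    status == 429 || (500..=599).contains(&status)
}

/// Only a proactive refresh may fall back to the existing token.
pub fn may_use_existing_token(kind: &RefreshKind, status: u16) -> bool {
    *kind == RefreshKind::Proactive && is_transient(status)
}

/// Next allowed refresh: `Retry-After` seconds if given, else 24 to 36 seconds.
pub fn next_refresh_at(now: u64, retry_after_secs: Option<u64>, jitter: u64) -> u64 {
    now + retry_after_secs.unwrap_or(24 + jitter % 13)
}
```

Under these assumptions, a token that expires in 200 seconds and hits a `502` on refresh is still used, while an expired token with the same `502` is not.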

## Verification

Added behavioral coverage proving that:

- a valid near-expiry token still completes websocket and pairing
requests after transient refresh failures;
- `Retry-After` suppresses a subsequent refresh across websocket and
pairing callers;
- request and response-body timeouts are classified as transient;
- an expired token, including one that expires during refresh, cannot
proceed to websocket connection;
- auth failures clear the attempted token without overwriting a
concurrently rotated token.

Anton Panasenko · 2026-06-26 17:34:52 -07:00

d047c33a1b

[codex] dedupe remote control account header (#29893)

## Why

Remote-control HTTP requests applied the authentication headers and then
appended `ChatGPT-Account-ID` again with
`reqwest::RequestBuilder::header`. Since reqwest appends, the wire
request could contain the same header twice. Intermediaries may coalesce
duplicate values into `uuid,uuid`, which is not a valid account ID.

## What changed

- Build remote-control request authentication headers in one place.
- Apply provider headers first, then use `HeaderMap::insert` for the
explicit account ID. This preserves the current account-ID precedence
and all other authentication headers while ensuring exactly one account
header is sent.
- Preserve duplicate HTTP headers in the test harness and assert exactly
one account header for enroll, refresh, list, and revoke requests.
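
The difference between appending and inserting can be shown with a standard-library stand-in. This sketch does not use `reqwest` or `HeaderMap`; the `Headers` alias and both helpers are hypothetical, and only the replace-all-then-add-one behavior mirrors what the change describes.

```rust
/// Stand-in for a header map: ordered (name, value) pairs, duplicates allowed.
pub type Headers = Vec<(String, String)>;

/// Replace every existing value of `name` (compared case-insensitively)
/// with a single value, leaving all other headers untouched.
pub fn insert_single(headers: &mut Headers, name: &str, value: &str) {
    headers.retain(|(existing, _)| !existing.eq_ignore_ascii_case(name));
    headers.push((name.to_string(), value.to_string()));
}

/// Count how many entries carry `name`, ignoring ASCII case.
pub fn count(headers: &Headers, name: &str) -> usize {
    headers
        .iter()
        .filter(|(existing, _)| existing.eq_ignore_ascii_case(name))
        .count()
}
```

Starting from provider headers that already contain two differently cased account entries, `insert_single` leaves exactly one account header and keeps `Authorization` intact.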

## Validation

Added focused coverage for:

- Adding the explicit account header when the auth provider omits it.
- Replacing multiple provider-supplied account values, including a
differently cased header name.
- Preserving authorization and routing headers while replacing only the
account header.
- Rejecting invalid account header values before sending a request.
- Emitting exactly one account header for enroll, refresh, list, and
revoke requests.
- Maintaining header uniqueness across unauthorized recovery, retry, and
error-response paths.
- Emitting exactly one installation header for enroll and refresh
requests.

Checks run:

- `just test -p codex-app-server-transport request_headers`: 3 passed
- `just test -p codex-app-server-transport remote_control_http_mode`: 6
passed
- `just test -p codex-app-server-transport clients_tests`: 6 passed
- `just test -p codex-app-server-transport`: 123 passed
- `cargo test -p codex-app-server-transport`: 123 passed
- `just clippy -p codex-app-server-transport`
- `just fmt-check`
- `bazel test
//codex-rs/app-server-transport:app-server-transport-unit-tests`

Shuo · 2026-06-24 15:06:53 -07:00

bb05c1f30f

auth: move domain mode below app wire types (#29721)

## Why

Authentication mode is a domain concept used by login, model selection,
telemetry, and transports. Keeping the canonical type in app-server
protocol forces those lower-level crates to depend on an unrelated wire
API.

## What changed

- Added canonical `codex_protocol::auth::AuthMode` domain values.
- Kept the app-server wire DTO unchanged and added an explicit app-side
conversion.
- Removed production app-server-protocol dependencies from login,
model-provider-info, models-manager, and otel call paths.
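
A rough sketch of the layering: a domain enum that lower-level crates can depend on, and a wire enum that owns the conversion. The two modules below only stand in for the real crates, and the variants and wire strings shown are assumptions for illustration, not the actual definitions.

```rust
/// Stand-in for the lower-level domain crate.
pub mod domain {
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum AuthMode {
        ApiKey,
        Chatgpt,
    }
}

/// Stand-in for the app-server wire crate, which depends on the domain type.
pub mod wire {
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum AuthMode {
        ApiKey,
        Chatgpt,
    }

    /// The explicit app-side conversion lives next to the wire DTO.
    impl From<super::domain::AuthMode> for AuthMode {
        fn from(mode: super::domain::AuthMode) -> Self {
            match mode {
                super::domain::AuthMode::ApiKey => AuthMode::ApiKey,
                super::domain::AuthMode::Chatgpt => AuthMode::Chatgpt,
            }
        }
    }

    impl AuthMode {
        /// Assumed wire spelling; the real schema may differ.
        pub fn as_wire_str(self) -> &'static str {
            match self {
                AuthMode::ApiKey => "apiKey",
                AuthMode::Chatgpt => "chatgpt",
            }
        }
    }
}
```

With this shape, only the wire side knows about serialization names, and the domain enum has no dependency on the wire module.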

## Stack

This is PR 2 of 6, stacked on [PR
#29714](https://github.com/openai/codex/pull/29714). Review only the
delta from `codex/split-json-rpc-protocols`. Next: [PR
#29722](https://github.com/openai/codex/pull/29722).

## Validation

- Auth and login coverage passed in the focused protocol/domain test
run.
- App-server account and auth conversion coverage passed.

Adam Perry @ OpenAI · 2026-06-24 03:10:20 +00:00

31372078d1

PAC 2 - Add shared auth system proxy contract (#26707)

## Summary

Stacked on #26706.

Adds the shared auth/system-proxy contract that later platform resolver
PRs plug into. This PR moves Codex-owned auth and startup HTTP clients
through a common route-aware boundary, but does not yet add Windows or
macOS system proxy resolution.

The default path remains unchanged when `respect_system_proxy` is absent
or disabled.

## Implementation

- Adds `codex-client/src/outbound_proxy.rs` with the shared
route-selection model:
  - `OutboundProxyConfig`;
  - `ClientRouteClass`;
  - `RouteFailureClass`;
  - `build_reqwest_client_for_route`.
- Preserves the existing reqwest/default-client behavior when no route
config is supplied.
- Uses the fixed MVP routing policy when route config is supplied:
platform system/PAC/WPAD discovery, then explicit env proxy variables,
then direct connection.
- Keeps platform-specific system discovery behind the shared client
boundary. This PR provides the contract and fallback behavior; later
resolver PRs plug in Windows and macOS discovery.
- Adds `login::AuthRouteConfig` so auth call sites depend on a small
policy type instead of platform resolver details.
- Maps the resolved `Config.respect_system_proxy` boolean into
`AuthRouteConfig` for auth-owned clients.
- Wires the route config through browser login, device-code login,
access-token login, login status, logout/revoke, token refresh, API-key
exchange, app-server account login, TUI/app startup, cloud-config
bootstrap, cloud tasks, plugin auth, and exec startup config loading.
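
The fixed routing order can be sketched as a single selection function. `Route` and `select_route` are hypothetical names, and the discovery results are passed in as plain options rather than resolved from the platform or the process environment.

```rust
/// Hypothetical outcome of route selection.
#[derive(Debug, PartialEq)]
pub enum Route {
    /// No route config supplied: keep the existing default client behavior.
    Default,
    /// Use the given proxy URL.
    Proxy(String),
    /// Connect directly.
    Direct,
}

/// Order when the route-aware path is enabled:
/// system/PAC/WPAD discovery, then explicit env proxy, then direct.
pub fn select_route(
    respect_system_proxy: bool,
    system_proxy: Option<&str>,
    env_proxy: Option<&str>,
) -> Route {
    if !respect_system_proxy {
        return Route::Default;
    }
    system_proxy
        .or(env_proxy)
        .map(|url| Route::Proxy(url.to_string()))
        .unwrap_or(Route::Direct)
}
```

On a platform without a resolver, `system_proxy` would always be `None`, so the function falls through to the env proxy and then to a direct connection.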

## End-user behavior

- No behavior changes by default.
- When `respect_system_proxy = true`, auth-owned clients opt into the
shared route-aware client path.
- On platforms without a resolver implementation in this PR, system
discovery is unavailable and the route-aware path falls back to explicit
env proxy handling, then direct connection.
- Custom CA handling remains separate from proxy route selection and
still runs through the shared client builder.
- No proxy URLs, PAC contents, or resolved platform details are exposed
through the public config surface introduced here.

## Tests

Adds or updates coverage for:

- preserving default auth-client fallback behavior when no route config
is provided;
- injected environment-proxy fallback without mutating process
environment;
- existing login-server E2E flows using explicit `auth_route_config:
None` to guard unchanged default behavior;
- updated auth manager, login, logout, cloud-config, startup, and
plugin-auth call sites passing route config explicitly.

canvrno-oai · 2026-06-22 13:03:11 -07:00

1659c4a629

Use controlled time for remote initialization timeout test (#29329)

## Summary

The remote-control initialization timeout test used a 50 ms wall-clock
deadline around a 10 ms transport timeout. A busy CI runner could miss
that outer deadline even when the rollback behavior was correct.

Pause Tokio time and advance it explicitly through the transport timeout
instead. The test still verifies that initialization fails and emits the
matching connection-closed event, without depending on scheduler speed.

jif · 2026-06-21 14:53:16 +02:00

6f5dd7b422

feat: opt ChatGPT auth into agent identity (#19049)

## Stack

This is PR 2 of the simplified HAI single-run-task stack:

- [#19047](https://github.com/openai/codex/pull/19047) Agent Identity
assertion and task-registration primitives, including the shared
run-task helper used by existing Agent Identity JWT auth.
- [#19049](https://github.com/openai/codex/pull/19049)
Disabled-by-default ChatGPT auth opt-in that provisions/reuses persisted
Agent Identity runtime auth and its single run task.
- [#19051](https://github.com/openai/codex/pull/19051) Run-scoped
provider auth that uses one backend-owned task id for first-party
inference and compaction requests.

[#19054](https://github.com/openai/codex/pull/19054) was collapsed out
of the active stack because the simplified design no longer needs a
separate background/control-plane task helper.

## Summary

This PR adds the disabled-by-default path for normal ChatGPT-login Codex
sessions to obtain Agent Identity runtime auth through the Codex
backend. Existing Agent Identity JWT startup mode remains a separate
path and does not require the feature flag.

What changed:

- adds the experimental `use_agent_identity` feature flag and config
schema entry
- adds an explicit `AgentIdentityAuthPolicy` so call sites choose
`JwtOnly` or `ChatGptAuth` instead of passing a bare boolean
- stores standalone Agent Identity JWT credentials separately from
backend-registered Agent Identity records
- persists the registered Agent Identity record, private key, and single
run task id in `auth.json` so process restarts reuse the same identity
- derives the agent/task registration base URL from ChatGPT/Codex auth
config while keeping JWT JWKS lookup separate
- provisions and caches ChatGPT-derived Agent Identity runtime auth when
`use_agent_identity` is enabled
- reuses the shared run-task registration helper from PR1 rather than
adding a second task-registration path

This PR intentionally does not switch model inference over to
`AgentAssertion` auth. The provider-auth integration lands in the next
PR.

## Testing

- `just test -p codex-login`

Adrian · 2026-06-18 14:05:27 -07:00

ec848dde0e

feat(app-server): enforce managed remote control disable (#27961)

## Why

Managed deployments need a reliable deny gate for remote control.
Persisted enablement and explicit startup requests can currently still
start the transport, while the removed `features.remote_control` key is
intentionally only a compatibility no-op.

This adds a dedicated requirement that administrators can use to force
remote control off without deleting the user's persisted preference.
Removing the requirement and restarting restores the prior choice.

## What Changed

- Added top-level `allow_remote_control` requirements parsing, sourced
layer precedence, debug output, and `configRequirements/read` exposure
as `allowRemoteControl`.
- Added a typed transport policy captured from the startup requirements
snapshot. Managed disable forces the initial state to disabled and
prevents enrollment, refresh, connection, and persisted-preference
mutation.
- Rejected every `remoteControl/*` RPC before parameter deserialization
with JSON-RPC `-32600` and `remote control is disabled by managed
requirements`.
- Preserved the existing disabled status notification and the previous
behavior when the requirement is `true` or omitted.
- Regenerated app-server protocol schemas and documented the new
requirement.
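
The RPC gate can be sketched as a check that looks only at the method name, so it runs before any parameter deserialization. `managed_gate` is a hypothetical helper; the error code and message are the ones quoted above.

```rust
/// JSON-RPC "invalid request" code used for the managed-policy rejection.
pub const INVALID_REQUEST: i64 = -32600;
pub const MANAGED_DISABLED_MESSAGE: &str =
    "remote control is disabled by managed requirements";

/// `allow_remote_control` is the requirement value: `Some(false)` denies,
/// while `Some(true)` and `None` keep the previous behavior.
pub fn managed_gate(
    allow_remote_control: Option<bool>,
    method: &str,
) -> Result<(), (i64, &'static str)> {
    let denied = allow_remote_control == Some(false);
    if denied && method.starts_with("remoteControl/") {
        // Rejected on the method name alone; params are never parsed.
        Err((INVALID_REQUEST, MANAGED_DISABLED_MESSAGE))
    } else {
        Ok(())
    }
}
```

Because the check never touches the parameters, a malformed `remoteControl/*` request gets the same managed-policy error as a well-formed one.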

## Verification

- Confirmed all remote-control RPCs, including a malformed request,
return the managed-policy error while the initial status notification
remains `disabled`.
- Confirmed explicit ephemeral startup and persisted enablement make no
backend connection and leave the SQLite preference unchanged.
- Confirmed `allow_remote_control = true` does not enable or block
remote control and `configRequirements/read` returns
`allowRemoteControl: false` for the deny policy.

Related issue: N/A (managed-policy hardening).

Anton Panasenko · 2026-06-12 20:10:12 -07:00

b9dc3b7a8b

feat: use encrypted local secrets for CLI auth (#27539)

## Why

Windows Credential Manager limits generic credential blobs to 2,560
bytes. Large serialized ChatGPT auth payloads can exceed that limit, so
keyring-mode CLI auth needs a backend that keeps only the encryption key
in the OS keyring and stores the payload in Codex's encrypted
local-secrets file.

This is the third PR in the encrypted-auth stack:

1. #27504 — feature and config selection
2. #27535 — auth-specific local-secrets namespaces
3. This PR — CLI auth implementation and activation
4. MCP OAuth implementation and activation

## What Changed

- Added encrypted CLI-auth storage using the `CliAuth` secrets
namespace.
- Preserved direct keyring storage for platforms/configurations where it
remains selected.
- Selected the backend consistently for login, logout, refresh,
device-code login, auth loading, and login restrictions.
- Threaded resolved bootstrap/full config through CLI, exec, TUI,
app-server account handling, cloud config, and cloud tasks.
- Removed stale `auth.json` fallback data after successful encrypted
saves and removed encrypted, direct-keyring, and fallback data during
logout.
- Added storage and integration coverage for both direct and encrypted
keyring modes.

MCP OAuth persistence is intentionally left to the next PR.

## Validation

- `just test -p codex-login` — 131 passed
- `just test -p codex-cli` — 280 passed
- `just test -p codex-app-server v2::account` — 25 passed
- `just test -p codex-cloud-config service` — 21 passed, 7 skipped
- `just fix -p codex-login`
- `just fix -p codex-cli`
- `just fmt`

Celia Chen · 2026-06-12 21:23:50 +00:00

56c97e3b5c

feat(app-server): persist remote-control desired state (#27445)

## Why

Remote-control runtime enablement and persisted enrollment preference
were represented by separate flags. That made startup rehydration, RPC
persistence, and new-enrollment seeding race with one another, and it
did not cleanly distinguish runtime-only CLI or daemon starts from
durable app-server RPC changes.

## What Changed

- Replace the parallel enablement, seed, and rehydration flags with one
transport-owned `RemoteControlDesiredState`.
- Add nullable enrollment-scoped persistence and preserve existing
preferences during enrollment upserts.
- Rehydrate plain startup only after auth and client scope resolve,
without overwriting a concurrent RPC transition.
- Make ordinary `remoteControl/enable` and `remoteControl/disable`
durable while retaining `ephemeral: true` for runtime-only callers.
- Have the daemon explicitly request ephemeral enablement and regenerate
the app-server schemas.

## Verification

- Covered migration and `NULL`/`0`/`1` persistence round trips.
- Covered plain-start rehydration and runtime-only versus durable
enrollment seeding.
- Covered durable enable, durable disable, and ephemeral enable through
app-server RPC.
- Covered the daemon's exact `{ "ephemeral": true }` request payload.
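
The nullable persistence and the `NULL`/`0`/`1` round trips can be sketched as a pair of conversions. `DesiredState`, `to_column`, and `from_column` are hypothetical; the real `RemoteControlDesiredState` may carry more than an on/off value.

```rust
/// Simplified stand-in for a persisted enable/disable preference.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DesiredState {
    Enabled,
    Disabled,
}

/// `None` maps to SQL NULL, meaning no durable preference was recorded.
pub fn to_column(preference: Option<DesiredState>) -> Option<i64> {
    preference.map(|state| match state {
        DesiredState::Enabled => 1,
        DesiredState::Disabled => 0,
    })
}

/// Rejects values other than NULL, 0, or 1 instead of guessing.
pub fn from_column(value: Option<i64>) -> Result<Option<DesiredState>, String> {
    match value {
        None => Ok(None),
        Some(0) => Ok(Some(DesiredState::Disabled)),
        Some(1) => Ok(Some(DesiredState::Enabled)),
        Some(other) => Err(format!("unexpected desired-state value: {other}")),
    }
}
```

Keeping `NULL` distinct from `0` is what lets a runtime-only start leave the stored preference untouched.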

Related issue: N/A (internal remote-control persistence architecture
change).

Anton Panasenko · 2026-06-11 21:28:52 -07:00

d61dfeb23a

feat: add Bedrock API key as a managed auth mode (#27443)

## Why

Codex needs to manage Amazon Bedrock API key credentials through the
existing auth lifecycle instead of introducing a separate auth manager
or provider-specific credential file. Treating Bedrock API key login as
a primary auth mode gives it the same persistence, keyring, reload, and
logout behavior as the existing OpenAI API key and ChatGPT modes.

The credential is valid only for the `amazon-bedrock` model provider.
OpenAI-compatible providers must reject this auth mode rather than
treating the Bedrock key as an OpenAI bearer token.

## What changed

- Added `bedrockApiKey` as an app-server `AuthMode` and
`CodexAuth::BedrockApiKey` as a primary `AuthManager` mode.
- Added `BedrockApiKeyAuth`, containing the API key and AWS region, to
the existing `AuthDotJson` payload stored in `$CODEX_HOME/auth.json` or
the configured keyring backend.
- Added `login_with_bedrock_api_key(...)`, parallel to
`login_with_api_key(...)`, which replaces the current stored login with
Bedrock credentials.
- Reused generic auth reload and logout behavior instead of adding a
Bedrock-specific auth manager or logout path.
- Updated login restrictions, status reporting, diagnostics, telemetry
classification, generated app-server schemas, and auth fixtures for the
new mode.
- Added explicit errors when Bedrock API key auth is selected with an
OpenAI-compatible model provider.

This PR establishes managed storage and auth-mode behavior. Routing the
managed key and region into Amazon Bedrock requests will be in follow-up
PRs.

Celia Chen · 2026-06-10 20:42:38 -07:00

06afd63f4a

fix(remote-control): preserve enrollment on generic websocket 404s (#26741)

## Why

A remote-control WebSocket handshake can receive a generic HTTP 404 when
an intermediary routes the request without preserving the WebSocket
upgrade. Treating every 404 as proof that the remote app server is gone
clears valid enrollment and causes repeated re-enrollment, new
environment and server IDs, Habitat churn, and noisy `/server/enroll`
traffic.

## What Changed

- Clear enrollment only when a 404 JSON response explicitly contains
`{"detail":"Remote app server not found"}`.
- Preserve enrollment for empty, plain-text, malformed, or otherwise
unrecognized 404 responses, return the transport error, and retry with
the existing reconnect backoff.
- Log the status, correlation headers (`request-id` or
`x-oai-request-id`, plus `cf-ray`), and bounded/redacted response body
for unrecognized 404s.
- Cover both explicit missing-server re-enrollment and generic 404
enrollment preservation/reconnect behavior.
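
The 404 decision can be sketched as a classifier over the already-extracted `detail` field. This assumes JSON parsing happens elsewhere and yields `None` for empty, plain-text, or malformed bodies; `NotFoundAction` and `classify_404` are hypothetical names.

```rust
/// The only detail string that proves the remote app server is gone.
pub const MISSING_SERVER_DETAIL: &str = "Remote app server not found";

#[derive(Debug, PartialEq)]
pub enum NotFoundAction {
    /// Explicit missing-server response: clear enrollment and re-enroll.
    ClearEnrollmentAndReenroll,
    /// Anything else: keep enrollment and retry with reconnect backoff.
    KeepEnrollmentAndRetry,
}

/// `detail` is the JSON body's `detail` field, if the body parsed at all.
pub fn classify_404(detail: Option<&str>) -> NotFoundAction {
    match detail {
        Some(MISSING_SERVER_DETAIL) => NotFoundAction::ClearEnrollmentAndReenroll,
        _ => NotFoundAction::KeepEnrollmentAndRetry,
    }
}
```

The match is exact on purpose: a near miss such as a different detail string is treated like any other unrecognized 404.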

## Verification

`just test -p codex-app-server-transport` passes all 114 tests on the
rebased branch, including the targeted explicit and generic WebSocket
404 scenarios.

Related issue: N/A

Anton Panasenko · 2026-06-05 22:54:57 -07:00

87b808bb57

[codex-rs] support v2 personal access tokens (#25731)

## Summary

- add v2 personal access token support for `codex login
--with-access-token` and `CODEX_ACCESS_TOKEN`
- classify opaque `at-` tokens separately from legacy Agent Identity
JWTs
- hydrate required ChatGPT account metadata through AuthAPI
`/v1/user-auth-credential/whoami`
- use PATs directly as bearer tokens while preserving existing ChatGPT
account surfaces
- expose PAT-backed auth as the explicit `personalAccessToken`
app-server auth mode
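
A possible shape for the access-token classifier is sketched below. Only the `at-` prefix comes from the summary above; treating any three-segment dotted token as a legacy JWT, and everything else as unknown, is an assumption made for illustration, as are the names `AccessTokenKind` and `classify_access_token`.

```rust
#[derive(Debug, PartialEq)]
pub enum AccessTokenKind {
    /// Opaque v2 personal access token, identified by its `at-` prefix.
    PersonalAccessToken,
    /// Legacy Agent Identity JWT (assumed: three non-empty dot-separated parts).
    AgentIdentityJwt,
    /// Neither shape; the caller decides how to reject it.
    Unknown,
}

pub fn classify_access_token(token: &str) -> AccessTokenKind {
    if token.starts_with("at-") {
        return AccessTokenKind::PersonalAccessToken;
    }
    let segments: Vec<&str> = token.split('.').collect();
    if segments.len() == 3 && segments.iter().all(|part| !part.is_empty()) {
        AccessTokenKind::AgentIdentityJwt
    } else {
        AccessTokenKind::Unknown
    }
}
```

Checking the prefix first keeps the PAT path independent of whatever the JWT check looks like.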

## Implementation

PAT auth is intentionally small and stateless. Loading a PAT performs
one AuthAPI metadata request, stores the hydrated metadata in the
in-memory auth object, and redacts the secret from debug output. Legacy
Agent Identity JWT handling remains unchanged. The shared access-token
classifier lives in a private neutral module because it dispatches
between both credential types.

PAT hydration fails closed when AuthAPI omits any required metadata,
including email. Hydrated metadata is intentionally not persisted:
startup performs a live `whoami` preflight so revoked tokens or changed
account metadata are not accepted from a stale cache.

## Workspace restriction scope

This change intentionally does **not** apply
`forced_chatgpt_workspace_id` to PAT authentication. The setting is a
client-side config guardrail, not an authorization boundary, and PAT
does not currently require workspace-ID parity. The PAT login and
`CODEX_ACCESS_TOKEN` paths therefore validate through AuthAPI without
threading workspace-restriction state through access-token loading.
Existing workspace checks for non-PAT auth remain on their established
paths.

## App-server compatibility

The public app-server `AuthMode` is shared across v1 and v2, and
PAT-backed auth reports `personalAccessToken` through both APIs.
Following human review, this intentionally removes the temporary v1
compatibility mapping that reported PATs as `chatgpt`; the deprecated v1
API is kept in parity with v2 rather than maintaining a separate closed
enum. Clients with exhaustive auth-mode handling in either API version
must add the new case and should generally treat it as ChatGPT-backed
unless they need PAT-specific behavior.

The v1 auth-status response still omits the raw PAT when `includeToken`
is requested because that response cannot carry the account metadata
needed to reuse the credential safely. Persisted PAT auth also omits the
new enum value so older Codex builds can deserialize `auth.json` and
infer PAT auth from the credential field after a rollback.

## Validation

Latest review-fix validation:

- `CARGO_INCREMENTAL=0 just test -p codex-login` (126 passed)
- `CARGO_INCREMENTAL=0 just test -p codex-cli` (263 passed)
- `CARGO_INCREMENTAL=0 just test -p codex-cli
stored_auth_validation_handles_personal_access_token`
- `CARGO_INCREMENTAL=0 just test -p codex-app-server-protocol` (226
passed)
- `CARGO_INCREMENTAL=0 just test -p codex-models-manager
refresh_available_models_uses_remote_only_catalog_for_chatgpt_auth`
- `CARGO_INCREMENTAL=0 just test -p codex-tui
existing_non_oauth_chatgpt_login_counts_as_signed_in`
- `CARGO_INCREMENTAL=0 just fix -p codex-login -p
codex-app-server-protocol -p codex-models-manager -p codex-tui -p
codex-cli`
- `just fmt`
- `git diff --check`

The broader `codex-tui` suite previously compiled and ran 2,834 tests.
Three unrelated environment-sensitive guardian/IDE-socket tests failed
after retries; the PAT-relevant TUI coverage passed.

cooper-oai · 2026-06-05 17:36:18 -07:00

df7818c7d1

feat(remote-control): add pairing status transport (#26449)

## What

Adds transport support for checking remote-control pairing status
against the backend.

- Adds the normalized `server/pair/status` backend URL.
- Adds backend request/response structs for exactly one lookup key:
`pairing_code` or `manual_pairing_code`, returning `{ claimed }`.
- Adds `RemoteControlEnrollment::pairing_status` and
`RemoteControlHandle::pairing_status`.
- Preserves auth refresh/retry behavior and backend error mapping.
- Adds transport coverage for pending, claimed, manual-code payloads,
token refresh, mapped backend errors, malformed responses, and URL
normalization.
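
The "exactly one lookup key" rule maps naturally onto an enum, which makes sending both keys or neither unrepresentable. `PairingLookup` and `field` are hypothetical; only the two key names come from the description above.

```rust
/// Exactly one of the two lookup keys, enforced by the type.
#[derive(Debug, Clone, PartialEq)]
pub enum PairingLookup {
    PairingCode(String),
    ManualPairingCode(String),
}

impl PairingLookup {
    /// The single request field to send: (key name, code value).
    pub fn field(&self) -> (&'static str, &str) {
        match self {
            PairingLookup::PairingCode(code) => ("pairing_code", code.as_str()),
            PairingLookup::ManualPairingCode(code) => {
                ("manual_pairing_code", code.as_str())
            }
        }
    }
}
```

A caller would serialize the returned pair as the only field of the request body and read `claimed` from the response.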

## Why

Desktop needs a host-authenticated way to poll whether a QR or manual
pairing code has been claimed.

Related backend change: https://github.com/openai/openai/pull/990244

## Verification

- `cargo test --manifest-path app-server-transport/Cargo.toml
remote_control::tests::pairing_tests`
- `cargo fmt --all --check`
- `git diff --check`

hefuc-oai · 2026-06-05 10:07:25 -07:00

da490ba9de

feat(remote-control): allow pairing while disabled (#26215)

## Why

`remoteControl/pairing/start` creates authorization for future
remote-control connections, so it should not require the live websocket
to already be enabled. Requiring enable first made pairing depend on
presence instead of the persisted server enrollment that pairing
actually uses.

Pairing also needs to recover when that persisted server row is stale.
If `/server/pair` returns `404`, making the first pairing attempt fail
forces a manual retry even though the client can clear the stale row and
create a replacement enrollment immediately.

## What Changed

- Allow `remoteControl/pairing/start` to reuse or create the persisted
remote-control server enrollment while remote control is disabled.
- Keep the selected in-memory enrollment across disable and share it
with websocket connect so a later enable uses the same selected server.
- Thread the app-server client name through pairing so stdio persistence
keeps using the websocket-owned enrollment key.
- Recover pairing server-token auth failures through the existing
refresh/auth-recovery path.
- Recover stale pairing enrollment on `/server/pair` `404` by clearing
the stale selected enrollment, re-enrolling once, and retrying pairing
once.
- Add focused disabled-pairing and stale-pairing recovery coverage.
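
The stale-enrollment recovery amounts to "re-enroll once, retry once". A minimal synchronous sketch, with closures standing in for the backend calls and a bare status code as the error type (both assumptions; `pair_with_recovery` is a hypothetical name):

```rust
/// `pair` performs one pairing attempt and returns a code or an HTTP status.
/// `reenroll` clears the stale selected enrollment and creates a replacement.
pub fn pair_with_recovery<P, R>(mut pair: P, mut reenroll: R) -> Result<String, u16>
where
    P: FnMut() -> Result<String, u16>,
    R: FnMut(),
{
    match pair() {
        // Stale enrollment: recover once, then retry exactly once.
        Err(404) => {
            reenroll();
            pair()
        }
        // Success or any other failure is returned unchanged.
        other => other,
    }
}
```

A second `404` after re-enrollment is returned to the caller rather than looping, which keeps the recovery bounded.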

## Verification

- `remote_control_pairing_start_returns_pairing_artifacts_while_disabled`
exercises pairing before enable.
- `remote_control_handle_reenrolls_after_stale_pairing_enrollment`
exercises stale `/server/pair` `404` recovery without a manual retry.

Related: N/A

Anton Panasenko · 2026-06-05 05:12:23 +00:00

64e0829cab

feat(app-server): add remote control client management RPCs (#25785)

## Why

Remote-control clients need to list and revoke controller-device grants
without enabling or enrolling the local relay. These are signed-in
account-management operations, so coupling them to websocket, pairing,
enrollment, or persisted relay state would prevent clients from managing
stale grants from the picker.

Related enhancement request: N/A. This adds the Codex app-server surface
for the planned upstream environment-scoped revoke endpoint.

## What Changed

- Added experimental app-server v2 RPCs:
  - `remoteControl/client/list`
  - `remoteControl/client/revoke`
- Added picker-oriented protocol types and standard generated schema
fixtures. The list response intentionally omits backend account id,
enrollment status, and location fields.
- Added `app-server-transport/src/transport/remote_control/clients.rs`
for environment-scoped GET and DELETE requests. It builds escaped URL
path segments, forwards optional pagination query fields, sends ChatGPT
auth plus `chatgpt-account-id`, converts RFC3339 `last_seen_at` values
to Unix seconds, accepts `204 No Content` revoke responses, and retries
once after a `401`.
- Extracted shared ChatGPT auth loading and recovery into
`app-server-transport/src/transport/remote_control/auth.rs` so
websocket, pairing, and client management use the same account-auth
boundary.
- Retained the configured remote-control base URL on
`RemoteControlHandle` and resolve management URLs lazily, preserving
deferred validation while relay startup is disabled.
- Registered list as `global_shared_read("remote-control-clients")` and
revoke as `global("remote-control-clients")`.
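
Escaping IDs before placing them in a URL path can be sketched with a small percent-encoder. This is a standard-library illustration, not the helper used in `clients.rs`; `escape_path_segment` is a hypothetical name, and it keeps only the RFC 3986 unreserved characters unescaped.

```rust
/// Percent-encode every byte that is not an RFC 3986 unreserved character,
/// so an ID containing `/`, `?`, or spaces cannot change the URL structure.
pub fn escape_path_segment(segment: &str) -> String {
    let mut out = String::with_capacity(segment.len());
    for byte in segment.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            other => out.push_str(&format!("%{:02X}", other)),
        }
    }
    out
}
```

Encoding byte by byte means multi-byte UTF-8 characters are escaped as their individual bytes, which is the usual convention for URL paths.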

## Verification

- Added transport coverage proving list and revoke work while relay
state is disabled, IDs are escaped, picker-only fields are returned,
timestamps are converted, revoke accepts `204`, auth headers are
forwarded, `401` retries exactly once, `403` is not retried, and
malformed list payloads retain decode context.
- Added an app-server integration test proving both JSON-RPC methods
work before relay enablement and successful revoke returns `{}`.
- Regenerated and validated experimental and standard app-server schema
fixtures.

Anton Panasenko · 2026-06-02 17:01:02 -07:00

98a62a62ce

feat(remote-control): add pairing start (#25675)

## Why

Remote control enrollment authorizes a desktop server, but app-server v2
did not expose the follow-up pairing operation needed to mint a
short-lived controller pairing artifact from that enrolled server.
Clients need a narrow RPC that starts pairing without exposing the
backend `serverId` or conflating pairing with websocket connection
state.

Issue: N/A; internal remote-control pairing API change.

## What Changed

Added experimental app-server v2 `remoteControl/pairing/start` with
`manualCode` input and `pairingCode`, nullable `manualPairingCode`,
`environmentId`, and Unix-seconds `expiresAt` output. The method
serializes under its own `global("remote-control-pairing")` scope and is
documented in `app-server/README.md`.

Extended the remote-control transport with private `/server/pair`
request/response types and normalized `pair_url` handling. Pairing uses
the current enrolled server bearer, refreshes that bearer when needed,
keeps backend `server_id` private, validates returned `server_id` and
`environment_id` against the current enrollment, and preserves backend
status/header/body context for failures and malformed responses.

Wired the request through `RemoteControlRequestProcessor` and
`MessageProcessor`, mapping unavailable/disabled pairing to
`invalid_request` and backend failures to internal errors.

## Verification

- `just test -p codex-app-server-transport`
- `just test -p codex-app-server
remote_control_pairing_start_returns_pairing_artifacts`

Anton Panasenko · 2026-06-02 01:05:50 +00:00

0002316687

feat(app-server): migrate remote control to server tokens (#24141)

## Why

`codex-backend` now authenticates remote-control server websocket
connections with short-lived server tokens instead of the user's ChatGPT
access token. `app-server` needs to mint and refresh those server tokens
without persisting them, so a restart can reconnect from durable
enrollment identity while keeping the bearer token memory-only.

## What Changed

Updated the remote-control transport to consume `remote_control_token`
and `expires_at` from server enroll responses and added
`/server/refresh` support for persisted enrollments or expiring cached
tokens.

Websocket handshakes now send `Authorization: Bearer
<remote_control_token>` with the existing server identity headers, and
no longer send the ChatGPT bearer token or `chatgpt-account-id` on that
websocket path.

The in-memory enrollment state now owns the ephemeral server token
cache, while SQLite still persists only `server_id`, `environment_id`,
and `server_name`. Websocket `401`/`403` clears only the cached token
for refresh on reconnect; websocket or refresh `404` clears stale
persisted enrollment and re-enrolls. Response body previews redact
`remote_control_token` before surfacing parse errors.
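
The split between the memory-only token and the persisted enrollment can be sketched as a small state update. `EnrollmentState` and `on_websocket_status` are hypothetical, the struct is reduced to two fields, and dropping the token together with the enrollment on `404` is an assumption.

```rust
/// Simplified in-memory view: the enrollment identity is what gets persisted,
/// while the server token only ever lives in memory.
#[derive(Debug, Default, PartialEq)]
pub struct EnrollmentState {
    pub server_id: Option<String>,
    pub token: Option<String>,
}

/// Apply the handling described for websocket handshake failures.
pub fn on_websocket_status(state: &mut EnrollmentState, status: u16) {
    match status {
        // Auth failure: drop only the cached token and refresh on reconnect.
        401 | 403 => state.token = None,
        // Stale enrollment: drop everything so the next cycle re-enrolls.
        404 => {
            state.server_id = None;
            state.token = None;
        }
        _ => {}
    }
}
```

After a `401`, the `server_id` survives, so the reconnect path can call `/server/refresh` instead of enrolling again.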

## Verification

- `just test -p codex-app-server-transport`
- Manual prod smoke with an isolated `CODEX_HOME`: `codex remote-control
--json -c 'chatgpt_base_url="https://chatgpt.com/backend-api"'` reached
`status:"connected"` with
`environmentId:"env_i_6a17d9f1d764832986da2e80f4554f1b"`.

Anton Panasenko · 2026-05-28 15:57:08 -07:00

912d7d4f75

Uprev Rust toolchain pins to 1.95.0 (#24684)

## Summary
- Bump the workspace Rust toolchain from `1.93.0` to `1.95.0` across
Cargo, Bazel, CI, release workflows, devcontainers, and the Codex
environment config.
- Refresh `MODULE.bazel.lock` so the Bazel Rust toolchain artifacts
match the new version.
- Leave purpose-specific toolchains unchanged, including the
`argument-comment-lint` nightly and the upstream `rusty_v8` `1.91.0`
build pin.
- Include fixes for new lints from `just fix` and a few Codex-authored
fixes for lints without a suggestion.

Adam Perry @ OpenAI · 2026-05-26 20:59:47 -07:00

cca1e0ba1d

fix(remote-control): surface websocket task stalls (#24473)

## Why

When the app-server remote-control websocket path stalls during
connection setup or teardown, the existing logs do not show where the
task stopped, and several awaits can keep the task from returning
promptly. That makes offline or stale-host incidents hard to distinguish
from expected shutdown or disable flow.

Issue: N/A (internal incident investigation)

## What Changed

Added structured lifecycle and status logging around remote-control
enable/disable requests, websocket task startup and exit, connection
cycles, enrollment context, and status/environment transitions.

Bound websocket connect, transport-event forwarding, and
connection-worker shutdown waits. On timeout, the code logs the stalled
operation and stops or aborts workers so the loop can reconnect or exit
instead of waiting indefinitely. Ping sends now also observe shutdown
cancellation.

Anton Panasenko · 2026-05-26 13:17:58 -07:00

3da89d4831

fix(remote-control): cap reconnect backoff (#24164)

## Why

Remote-control websocket reconnects currently use the shared exponential
backoff helper without a local ceiling, so a long failure streak can
stretch retries out indefinitely and leave the runtime behavior hard to
inspect from logs.

## What Changed

Cap the remote-control reconnect delay at 30 seconds, then reset the
reconnect attempt counter once that capped delay is emitted so the next
failure starts from the initial jittered delay again.

The reconnect failure log now records the attempt number, chosen delay,
and whether the cap triggered a reset, with a separate info log when the
backoff counter is reset after the cap.
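
The cap-then-reset behavior can be sketched as follows. Only the 30-second cap comes from the description; the one-second initial delay, the doubling factor, and the absence of jitter are simplifying assumptions, and `ReconnectBackoff` is a hypothetical type rather than the shared helper.

```rust
const CAP_MS: u64 = 30_000;
/// Assumed initial delay; the real helper also applies jitter.
const INITIAL_MS: u64 = 1_000;

pub struct ReconnectBackoff {
    attempt: u32,
}

impl ReconnectBackoff {
    pub fn new() -> Self {
        Self { attempt: 0 }
    }

    /// Returns `(delay_ms, reset)`. Once the capped delay is emitted,
    /// the attempt counter resets so the next failure starts over.
    pub fn next_delay(&mut self) -> (u64, bool) {
        let uncapped = INITIAL_MS.saturating_mul(1u64 << self.attempt.min(32));
        if uncapped >= CAP_MS {
            self.attempt = 0;
            (CAP_MS, true)
        } else {
            self.attempt += 1;
            (uncapped, false)
        }
    }
}
```

The returned `reset` flag corresponds to the extra information the failure log records when the cap triggers.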

## Verification

`just test -p codex-app-server-transport`

Related issue: N/A

Anton Panasenko · 2026-05-23 00:38:22 +00:00

03e6c5f600

fix(remote-control): retry after auth recovery (#23775 )

## Why

When remote control hits an auth failure such as a revoked or reused
refresh token, the websocket loop falls into reconnect backoff. If the
user fixes auth while that loop is sleeping, remote control can stay
offline until the old retry timer expires because nothing wakes the loop
or resets its exhausted auth recovery state.

## What Changed

Added an auth-change watch on `AuthManager` for refresh-relevant cached
auth updates.

The remote-control websocket loop now subscribes to that signal, resets
`UnauthorizedRecovery` and reconnect backoff when auth changes, and
retries immediately instead of waiting for the previous delay.

Updated the remote-control transport test to verify that reloading auth
with the now-available account id wakes enrollment before the prior
retry delay.
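
The wake-on-auth-change idea can be sketched with a standard-library channel standing in for the `AuthManager` watch. Everything named here (`LoopState`, `Wake`, `wait_before_retry`) is hypothetical, and the real loop is presumably async rather than blocking.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical per-loop retry state.
#[derive(Debug, PartialEq, Eq)]
pub struct LoopState {
    pub reconnect_attempt: u32,
    pub recovery_exhausted: bool,
}

/// Why the wait ended.
#[derive(Debug, PartialEq, Eq)]
pub enum Wake {
    AuthChanged,
    DelayElapsed,
}

/// Sleeps for `delay` unless an auth-change signal arrives first.
/// An auth change resets the retry state so the caller retries at once.
pub fn wait_before_retry(
    state: &mut LoopState,
    auth_changes: &Receiver<()>,
    delay: Duration,
) -> Wake {
    let started = Instant::now();
    match auth_changes.recv_timeout(delay) {
        Ok(()) => {
            state.reconnect_attempt = 0;
            state.recovery_exhausted = false;
            Wake::AuthChanged
        }
        Err(RecvTimeoutError::Timeout) => Wake::DelayElapsed,
        Err(RecvTimeoutError::Disconnected) => {
            // No signal source remains: sleep out the rest of the delay
            // so the loop does not spin.
            thread::sleep(delay.saturating_sub(started.elapsed()));
            Wake::DelayElapsed
        }
    }
}
```

The point of the design is that the retry timer becomes an upper bound on the wait rather than a fixed sleep.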

## Verification

`cargo test -p codex-app-server-transport
remote_control_waits_for_account_id_before_enrolling`

Anton Panasenko · 2026-05-21 14:38:30 -07:00

58be470d15

fix: serialize unix app-server startup (#23516 )

# Summary

Unix-socket app-server startup can currently race when multiple launch
attempts target the same `CODEX_HOME`. Those processes can overlap
before the control socket exists, which lets them enter SQLite state
initialization concurrently and reproduce the startup corruption pattern
seen in SSH mode.

This change makes the app-server own that singleton startup guarantee.
Unix-socket startup now takes a `CODEX_HOME`-scoped advisory lock before
SQLite initialization, runs the existing control-socket preparation
check while holding that lock, returns the established `AddrInUse` error
when another live listener already owns the socket, and releases the
lock once the new listener has bound its socket.

# Design decisions

- The singleton rule lives in `app-server --listen unix://`, not in a
desktop-only caller path, so every Unix-socket launch gets the same race
protection.
- A duplicate raw app-server launch returns an error instead of silently
succeeding. The attach operation remains `app-server proxy`, which
continues to connect to an already-running listener.
- The lock is held only across the dangerous startup window: socket
preparation, SQLite initialization, and socket bind. It is not held for
the app-server lifetime.
- Listener detection stays in `prepare_control_socket_path(...)`, so the
preexisting live-listener and stale-socket behavior remains the single
source of truth.

# Testing

Tests: targeted Unix-socket transport tests on the branch checkout, full
`codex-cli` build on `efrazer-db10`, and an SSH-style smoke on
`efrazer-db10` covering concurrent app-server starts, explicit
duplicate-start errors, and absence of SQLite startup-error matches in
launch logs.

efrazer-oai · 2026-05-19 14:57:11 -07:00

c2141c7ce0

feat(app-server): update remote control APIs for better UX (#22877 )

## Why
To improve the `codex remote-control` CLI UX, which I plan to do in a
follow-up, this PR adds `server-name` to the various remote control APIs:
- `remoteControl/enable`
- `remoteControl/disable`
- `remoteControl/status/changed`

Also, add a `remoteControl/status/read` API. This will be helpful in the
Codex App.

Owen Lin · 2026-05-15 14:33:24 -07:00

6a331a66eb

enable/disable remote control at runtime, not via features (#22578 )

## Why
Reapplies https://github.com/openai/codex/pull/22386, which was
previously reverted.

Also, introduce `remoteControl/enable` and `remoteControl/disable`
app-server APIs to toggle on/off remote control at runtime for a given
running app-server instance.

## What Changed

- Adds experimental v2 RPCs:
  - `remoteControl/enable`
  - `remoteControl/disable`
- Adds `RemoteControlRequestProcessor` and routes the new RPCs through
it instead of `ConfigRequestProcessor`.
- Adds named `RemoteControlHandle::enable`, `disable`, and `status`
methods.
- Makes `remoteControl/enable` return an error when sqlite state DB is
unavailable, while keeping enrollment/websocket failures as async status
updates.
- Adds `AppServerRuntimeOptions.remote_control_enabled` and hidden
`--remote-control` flags for `codex app-server` and `codex-app-server`.
- Updates managed daemon startup to use `codex app-server
--remote-control --listen unix://`.
- Marks `Feature::RemoteControl` as removed and ignores
`[features].remote_control`.
- Updates app-server README entries for the new remote-control methods.

Owen Lin · 2026-05-14 01:07:46 +00:00

4e368aa2e9

Restore app-server websocket listener with auth guard (#22404 )

## Why
PR #21843 removed the TCP websocket app-server listener, but that also
removed functionality that still needs to exist. Restoring it as-is
would reopen the old remote exposure problem, so this keeps the restored
listener while making remote and non-loopback usage require explicit
auth.

## What Changed
- Mostly reverts #21843 and reapplies the small merge-conflict
resolutions needed on top of current main.
- Restores ws://IP:PORT parsing, the app-server TCP websocket acceptor,
websocket auth CLI flags, and the associated tests.
- The only intentional behavior change from the restored code is that
non-loopback websocket listeners now fail startup unless --ws-auth
capability-token or --ws-auth signed-bearer-token is configured.
Loopback listeners remain available for local and SSH-forwarding
workflows.
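
The startup rule above amounts to a small check on the bind address and the configured auth mode. The sketch below is an assumption about its shape, not the code in `auth.rs`: `WsAuth` and `check_listener` are hypothetical names, and only the two `--ws-auth` modes and the loopback exemption come from the description.

```rust
use std::net::SocketAddr;

/// Hypothetical mirror of the `--ws-auth` options named above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WsAuth {
    None,
    CapabilityToken,
    SignedBearerToken,
}

/// Rejects unauthenticated listeners on non-loopback addresses.
/// Loopback binds stay allowed for local and SSH-forwarding workflows.
pub fn check_listener(addr: SocketAddr, auth: WsAuth) -> Result<(), String> {
    if !addr.ip().is_loopback() && auth == WsAuth::None {
        return Err(format!(
            "refusing to listen on non-loopback address {addr} without --ws-auth"
        ));
    }
    Ok(())
}
```

With this check, `127.0.0.1` and `::1` pass without auth, while `0.0.0.0` passes only when one of the two token modes is configured.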

## Reviewer Focus
Please focus review on the small auth-enforcement delta layered on top
of the revert:

- codex-rs/app-server-transport/src/transport/websocket.rs:
start_websocket_acceptor now rejects unauthenticated non-loopback
websocket binds before accepting connections.
- codex-rs/app-server-transport/src/transport/auth.rs: helper logic
classifies unauthenticated non-loopback listeners.
- codex-rs/app-server/tests/suite/v2/connection_handling_websocket.rs:
tests cover unauthenticated ws://0.0.0.0 startup rejection and
authenticated non-loopback capability-token startup.

Everything else is intended to be revert/merge-conflict restoration
rather than new product behavior.

## Verification

- Manually verified that TUI remoting is restored and that auth is
enforced for non-localhost URLs.

Eric Traut · 2026-05-12 18:40:53 -07:00

51bfb5f3b1

app-server: remove TCP websocket listener (#21843 )

## Why

The app-server no longer needs to expose a TCP websocket listener.
Keeping that transport also kept around a separate listener/auth surface
that is unnecessary now that local clients can use stdio or the
Unix-domain control socket, while remote connectivity is handled by
`remote_control`.

## What Changed

- Removed `ws://IP:PORT` parsing and the `AppServerTransport::WebSocket`
startup path.
- Deleted the app-server websocket listener auth module and removed
related CLI flags/dependencies.
- Kept websocket framing only where it is still needed: over the
Unix-domain control socket and in the outbound `remote_control`
connection.
- Updated app-server CLI/help text and `app-server/README.md` to
document only `stdio://`, `unix://`, `unix://PATH`, and `off` for local
transports.
- Converted affected app-server integration coverage from TCP websocket
listeners to UDS-backed websocket connections, and added a parse test
that rejects `ws://` listen URLs.
- Removed the now-unused workspace `constant_time_eq` dependency and
refreshed `Cargo.lock` after `cargo shear` caught the drift.
- Moved test app-server UDS socket paths to short Unix temp paths so
macOS Bazel test sandboxes do not exceed Unix socket path limits.
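
For illustration, listen-URL parsing as of this change might look like the sketch below. The `ListenTarget` enum and `parse_listen` function are hypothetical and are not the crate's real types. Only the accepted forms (`stdio://`, `unix://`, `unix://PATH`, `off`) and the rejection of `ws://` come from the description.

```rust
use std::path::PathBuf;

/// Hypothetical parsed form of a `--listen` value.
#[derive(Debug, PartialEq, Eq)]
pub enum ListenTarget {
    Stdio,
    /// `unix://` with no path: use the default control socket.
    UnixDefault,
    /// `unix://PATH`: use an explicit socket path.
    UnixPath(PathBuf),
    Off,
}

/// Parses a listen value, rejecting anything outside the documented set.
pub fn parse_listen(value: &str) -> Result<ListenTarget, String> {
    match value {
        "stdio://" => Ok(ListenTarget::Stdio),
        "off" => Ok(ListenTarget::Off),
        "unix://" => Ok(ListenTarget::UnixDefault),
        _ => match value.strip_prefix("unix://") {
            Some(path) => Ok(ListenTarget::UnixPath(PathBuf::from(path))),
            // `ws://IP:PORT` and any other scheme land here.
            None => Err(format!("unsupported listen URL: {value}")),
        },
    }
}
```

Under this sketch, `parse_listen("ws://127.0.0.1:4500")` returns an error, which is the behavior the added parse test is described as covering.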

## Verification

- Added/updated tests around UDS websocket transport behavior and
`ws://` listen URL rejection.
- `cargo shear`
- `cargo metadata --no-deps --format-version 1`
- `cargo test -p codex-app-server unix_socket_transport`
- `cargo test -p codex-app-server unix_socket_disconnect`
- `just fix -p codex-app-server`
- `git diff --check`

Local full Rust test execution was blocked before compilation by an
external fetch failure for the pinned `nornagon/crossterm` git
dependency. `just bazel-lock-update` and `just bazel-lock-check` were
retried after the manifest cleanup but remain blocked by external
BuildBuddy/V8 fetch timeouts.

Ruslan Nigmatullin · 2026-05-11 10:17:26 -07:00

a124ddb854

feat: Use installation ID in remote enrollments (#21662 )

* Pass installation ID for storage on enrollments server for
deduping/grouping multiple appservers per installation
* Pass installation ID in remoteControl/status/changed events

David de Regt · 2026-05-08 17:54:01 +00:00

872b8b15b3

Disable empty Cargo test targets (#21584 )

## Summary

`cargo test` entails running both standard Rust tests and doctests.
It turns out that the doctest discovery is fairly slow, and it's a cost
you pay even for crates that don't include any doctests.

This PR disables doctests with `doctest = false` for crates that lack
any doctests.
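
For reference, the setting lives in the library target section of a crate's `Cargo.toml`. The fragment below is a generic example rather than a copy of any manifest touched by this PR:

```toml
# Cargo.toml of a crate that has no doctests
[lib]
doctest = false
```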

For the collection of crates below, this speeds up test execution by
>4x.

E.g., before this PR:

```
Benchmark 1: cargo test     -p codex-utils-absolute-path     -p codex-utils-cache     -p codex-utils-cli     -p codex-utils-home-dir     -p codex-utils-output-truncation     -p codex-utils-path     -p codex-utils-string     -p codex-utils-template     -p codex-utils-elapsed     -p codex-utils-json-to-toml
  Time (mean ± σ):      1.849 s ±  4.455 s    [User: 0.752 s, System: 1.367 s]
  Range (min … max):    0.418 s … 14.529 s    10 runs
```

And after:

```
Benchmark 1: cargo test     -p codex-utils-absolute-path     -p codex-utils-cache     -p codex-utils-cli     -p codex-utils-home-dir     -p codex-utils-output-truncation     -p codex-utils-path     -p codex-utils-string     -p codex-utils-template     -p codex-utils-elapsed     -p codex-utils-json-to-toml
  Time (mean ± σ):     428.6 ms ±   6.9 ms    [User: 187.7 ms, System: 219.7 ms]
  Range (min … max):   418.0 ms … 436.8 ms    10 runs
```

For a single crate, with >2x speedup, before:

```
Benchmark 1: cargo test -p codex-utils-string
  Time (mean ± σ):     491.1 ms ±   9.0 ms    [User: 229.8 ms, System: 234.9 ms]
  Range (min … max):   480.9 ms … 512.0 ms    10 runs
```

And after:

```
Benchmark 1: cargo test -p codex-utils-string
  Time (mean ± σ):     213.9 ms ±   4.3 ms    [User: 112.8 ms, System: 84.0 ms]
  Range (min … max):   206.8 ms … 221.0 ms    13 runs
```

Co-authored-by: Codex <noreply@openai.com>

Charlie Marsh · 2026-05-07 15:44:17 -07:00

54ef99a365

device-key: clean up unused crate (#21487 )

Ruslan Nigmatullin · 2026-05-07 09:01:44 -07:00

e64a8979b0

app-server: move transport into dedicated crate (#20545 )

## Why

`codex-app-server` currently owns both request-processing code and
transport implementation details. Splitting the transport layer into its
own crate makes that boundary explicit, reduces the amount of
transport-specific dependency surface carried by `codex-app-server`, and
gives future transport work a narrower place to evolve.

## What changed

- Added `codex-app-server-transport` and moved the existing transport
tree into it, including stdio, unix socket, websocket, remote-control
transport, and websocket auth.
- Moved shared transport-facing message types into the new crate so both
the transport implementation and `codex-app-server` use the same
definitions.
- Kept processor-facing connection state and outbound routing in
`codex-app-server`, with the routing tests moved next to that local
wrapper.
- Updated workspace metadata, Bazel crate metadata, and
`codex-app-server` dependencies for the new crate boundary.

## Validation

- `cargo metadata --locked --no-deps`
- `git diff --check`
- Attempted `cargo test -p codex-app-server-transport`, `cargo test -p
codex-app-server`, `just fix -p codex-app-server-transport`, and `just
fix -p codex-app-server`; all were blocked before compilation by the
existing `packageproxy` resolution failure for locked `rustls-webpki =
0.103.13`.
- Attempted Bazel build / lockfile validation; those were blocked by
external fetch failures against BuildBuddy / GitHub while resolving
`v8`.

Ruslan Nigmatullin · 2026-05-01 09:23:47 -07:00

41e171fcf2
