mirror of
https://github.com/pchuan98/codex.git
synced 2026-07-01 00:31:56 +08:00
dev
30 Commits
-
fix(remote-control): avoid server token refresh retry storms (#30201)
## Why Remote-control websocket reconnects and pairing requests proactively refresh their server token. When `/server/refresh` returns a transient error such as `502`, the still-valid token was discarded as a usable connection path, causing reconnect failures and repeated refresh attempts that could amplify an upstream incident. ## What Changed - Start proactive refresh five minutes before token expiry and distinguish it from a required refresh for missing or expired tokens. - Continue websocket and pairing operations with the existing valid token after `429`, `5xx`, or timeout failures. - Share an in-memory `next_refresh_at` throttle across websocket and pairing callers, honoring both `Retry-After` formats and otherwise using a jittered 24–36 second delay. - Keep required refreshes strict, preserve `404` enrollment replacement, and clear token/throttle state for `401` and `403` auth recovery. - Preserve refresh response metadata internally and add focused wire-level and integration coverage. ## Verification Added behavioral coverage proving that: - a valid near-expiry token still completes websocket and pairing requests after transient refresh failures; - `Retry-After` suppresses a subsequent refresh across websocket and pairing callers; - request and response-body timeouts are classified as transient; - an expired token, including one that expires during refresh, cannot proceed to websocket connection; - auth failures clear the attempted token without overwriting a concurrently rotated token.
Anton Panasenko ·
2026-06-26 17:34:52 -07:00 -
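The refresh policy described in #30201 above can be sketched as three small decisions. This is a minimal illustration, not the transport's actual code: the type and function names (`RefreshNeed`, `FailureAction`, `refresh_need`, `on_refresh_failure`, `next_refresh_delay`) are hypothetical, timestamps are simplified to Unix seconds, only the delta-seconds form of `Retry-After` is modelled, and the jitter value is passed in by the caller rather than drawn from a random source.

```rust
use std::time::Duration;

/// Refresh proactively once the token is within five minutes of expiry.
const PROACTIVE_WINDOW_SECS: u64 = 5 * 60;

#[derive(Debug, Clone, Copy, PartialEq)]
enum RefreshNeed {
    NotNeeded,
    Proactive, // token is still valid but close to expiry
    Required,  // token is missing or already expired
}

#[derive(Debug, PartialEq)]
enum FailureAction {
    UseExistingToken,
    ReplaceEnrollment,
    ClearTokenAndRecoverAuth,
    Fail,
}

/// `expires_at` and `now` are Unix seconds; `None` means there is no cached token.
fn refresh_need(expires_at: Option<u64>, now: u64) -> RefreshNeed {
    match expires_at {
        None => RefreshNeed::Required,
        Some(exp) if exp <= now => RefreshNeed::Required,
        Some(exp) if exp - now <= PROACTIVE_WINDOW_SECS => RefreshNeed::Proactive,
        Some(_) => RefreshNeed::NotNeeded,
    }
}

/// `status` is the HTTP status of the failed refresh; `None` models a timeout.
fn on_refresh_failure(need: RefreshNeed, status: Option<u16>) -> FailureAction {
    let transient = match status {
        None | Some(429) => true,
        Some(code) => (500..=599).contains(&code),
    };
    match status {
        Some(404) => FailureAction::ReplaceEnrollment,
        Some(401) | Some(403) => FailureAction::ClearTokenAndRecoverAuth,
        // Only a proactive refresh may fall back to the still-valid token.
        _ if transient && need == RefreshNeed::Proactive => FailureAction::UseExistingToken,
        _ => FailureAction::Fail,
    }
}

/// Delay before the next refresh attempt. `jitter` is a caller-supplied value in [0, 1].
fn next_refresh_delay(retry_after_secs: Option<u64>, jitter: f64) -> Duration {
    match retry_after_secs {
        Some(secs) => Duration::from_secs(secs),
        None => Duration::from_secs_f64(24.0 + 12.0 * jitter.clamp(0.0, 1.0)),
    }
}
```

Under these assumptions, a `502` during a proactive refresh maps to `UseExistingToken`, while the same `502` during a required refresh maps to `Fail`, which matches the "keep required refreshes strict" rule.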
[codex] dedupe remote control account header (#29893)
## Why Remote-control HTTP requests applied the authentication headers and then appended `ChatGPT-Account-ID` again with `reqwest::RequestBuilder::header`. Since reqwest appends, the wire request could contain the same header twice. Intermediaries may coalesce duplicate values into `uuid,uuid`, which is not a valid account ID. ## What changed - Build remote-control request authentication headers in one place. - Apply provider headers first, then use `HeaderMap::insert` for the explicit account ID. This preserves the current account-ID precedence and all other authentication headers while ensuring exactly one account header is sent. - Preserve duplicate HTTP headers in the test harness and assert exactly one account header for enroll, refresh, list, and revoke requests. ## Validation Added focused coverage for: - Adding the explicit account header when the auth provider omits it. - Replacing multiple provider-supplied account values, including a differently cased header name. - Preserving authorization and routing headers while replacing only the account header. - Rejecting invalid account header values before sending a request. - Emitting exactly one account header for enroll, refresh, list, and revoke requests. - Maintaining header uniqueness across unauthorized recovery, retry, and error-response paths. - Emitting exactly one installation header for enroll and refresh requests. Checks run: - `just test -p codex-app-server-transport request_headers`: 3 passed - `just test -p codex-app-server-transport remote_control_http_mode`: 6 passed - `just test -p codex-app-server-transport clients_tests`: 6 passed - `just test -p codex-app-server-transport`: 123 passed - `cargo test -p codex-app-server-transport`: 123 passed - `just clippy -p codex-app-server-transport` - `just fmt-check` - `bazel test //codex-rs/app-server-transport:app-server-transport-unit-tests`
Shuo ·
2026-06-24 15:06:53 -07:00 -
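The append-versus-insert distinction behind #29893 above can be illustrated without any HTTP library. The sketch below uses a plain `Vec` as a stand-in for a header map; `Headers`, `append_header`, `insert_header`, and `count_header` are hypothetical helpers written for this example, not reqwest or transport APIs.

```rust
/// A tiny stand-in for an HTTP header list (hypothetical, for illustration only).
type Headers = Vec<(String, String)>;

/// Appends without checking for an existing entry, which is how duplicates arise.
fn append_header(headers: &mut Headers, name: &str, value: &str) {
    headers.push((name.to_string(), value.to_string()));
}

/// Removes every entry with the same case-insensitive name, then adds exactly one.
fn insert_header(headers: &mut Headers, name: &str, value: &str) {
    headers.retain(|(existing, _)| !existing.eq_ignore_ascii_case(name));
    headers.push((name.to_string(), value.to_string()));
}

/// Counts entries whose name matches case-insensitively.
fn count_header(headers: &Headers, name: &str) -> usize {
    headers
        .iter()
        .filter(|(existing, _)| existing.eq_ignore_ascii_case(name))
        .count()
}
```

Applying provider headers first and then calling the insert-style helper for the account ID leaves other headers untouched and guarantees a single account header, including when the provider supplied a differently cased name.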
auth: move domain mode below app wire types (#29721)
## Why Authentication mode is a domain concept used by login, model selection, telemetry, and transports. Keeping the canonical type in app-server protocol forces those lower-level crates to depend on an unrelated wire API. ## What changed - Added canonical `codex_protocol::auth::AuthMode` domain values. - Kept the app-server wire DTO unchanged and added an explicit app-side conversion. - Removed production app-server-protocol dependencies from login, model-provider-info, models-manager, and otel call paths. ## Stack This is PR 2 of 6, stacked on [PR #29714](https://github.com/openai/codex/pull/29714). Review only the delta from `codex/split-json-rpc-protocols`. Next: [PR #29722](https://github.com/openai/codex/pull/29722). ## Validation - Auth and login coverage passed in the focused protocol/domain test run. - App-server account and auth conversion coverage passed.
Adam Perry @ OpenAI ·
2026-06-24 03:10:20 +00:00 -
PAC 2 - Add shared auth system proxy contract (#26707)
## Summary Stacked on #26706. Adds the shared auth/system-proxy contract that later platform resolver PRs plug into. This PR moves Codex-owned auth and startup HTTP clients through a common route-aware boundary, but does not yet add Windows or macOS system proxy resolution. The default path remains unchanged when `respect_system_proxy` is absent or disabled. ## Implementation - Adds `codex-client/src/outbound_proxy.rs` with the shared route-selection model: - `OutboundProxyConfig`; - `ClientRouteClass`; - `RouteFailureClass`; - `build_reqwest_client_for_route`. - Preserves the existing reqwest/default-client behavior when no route config is supplied. - Uses the fixed MVP routing policy when route config is supplied: platform system/PAC/WPAD discovery, then explicit env proxy variables, then direct connection. - Keeps platform-specific system discovery behind the shared client boundary. This PR provides the contract and fallback behavior; later resolver PRs plug in Windows and macOS discovery. - Adds `login::AuthRouteConfig` so auth call sites depend on a small policy type instead of platform resolver details. - Maps the resolved `Config.respect_system_proxy` boolean into `AuthRouteConfig` for auth-owned clients. - Wires the route config through browser login, device-code login, access-token login, login status, logout/revoke, token refresh, API-key exchange, app-server account login, TUI/app startup, cloud-config bootstrap, cloud tasks, plugin auth, and exec startup config loading. ## End-user behavior - No behavior changes by default. - When `respect_system_proxy = true`, auth-owned clients opt into the shared route-aware client path. - On platforms without a resolver implementation in this PR, system discovery is unavailable and the route-aware path falls back to explicit env proxy handling, then direct connection. - Custom CA handling remains separate from proxy route selection and still runs through the shared client builder. 
- No proxy URLs, PAC contents, or resolved platform details are exposed through the public config surface introduced here. ## Tests Adds or updates coverage for: - preserving default auth-client fallback behavior when no route config is provided; - injected environment-proxy fallback without mutating process environment; - existing login-server E2E flows using explicit `auth_route_config: None` to guard unchanged default behavior; - updated auth manager, login, logout, cloud-config, startup, and plugin-auth call sites passing route config explicitly.
canvrno-oai ·
2026-06-22 13:03:11 -07:00 -
Use controlled time for remote initialization timeout test (#29329)
## Summary The remote-control initialization timeout test used a 50 ms wall-clock deadline around a 10 ms transport timeout. A busy CI runner could miss that outer deadline even when the rollback behavior was correct. Pause Tokio time and advance it explicitly through the transport timeout instead. The test still verifies that initialization fails and emits the matching connection-closed event, without depending on scheduler speed.
jif ·
2026-06-21 14:53:16 +02:00 -
feat: opt ChatGPT auth into agent identity (#19049)
## Stack This is PR 2 of the simplified HAI single-run-task stack: - [#19047](https://github.com/openai/codex/pull/19047) Agent Identity assertion and task-registration primitives, including the shared run-task helper used by existing Agent Identity JWT auth. - [#19049](https://github.com/openai/codex/pull/19049) Disabled-by-default ChatGPT auth opt-in that provisions/reuses persisted Agent Identity runtime auth and its single run task. - [#19051](https://github.com/openai/codex/pull/19051) Run-scoped provider auth that uses one backend-owned task id for first-party inference and compaction requests. [#19054](https://github.com/openai/codex/pull/19054) collapsed out of the active stack because the simplified design no longer needs a separate background/control-plane task helper. ## Summary This PR adds the disabled-by-default path for normal ChatGPT-login Codex sessions to obtain Agent Identity runtime auth through the Codex backend. Existing Agent Identity JWT startup mode remains a separate path and does not require the feature flag. What changed: - adds the experimental `use_agent_identity` feature flag and config schema entry - adds an explicit `AgentIdentityAuthPolicy` so call sites choose `JwtOnly` or `ChatGptAuth` instead of passing a bare boolean - stores standalone Agent Identity JWT credentials separately from backend-registered Agent Identity records - persists the registered Agent Identity record, private key, and single run task id in `auth.json` so process restarts reuse the same identity - derives the agent/task registration base URL from ChatGPT/Codex auth config while keeping JWT JWKS lookup separate - provisions and caches ChatGPT-derived Agent Identity runtime auth when `use_agent_identity` is enabled - reuses the shared run-task registration helper from PR1 rather than adding a second task-registration path This PR intentionally does not switch model inference over to `AgentAssertion` auth. The provider-auth integration lands in the next PR. 
## Testing - `just test -p codex-login`
Adrian ·
2026-06-18 14:05:27 -07:00 -
feat(app-server): enforce managed remote control disable (#27961)
## Why Managed deployments need a reliable deny gate for remote control. Persisted enablement and explicit startup requests currently remain able to start the transport, while the removed `features.remote_control` key is intentionally only a compatibility no-op. This adds a dedicated requirement that administrators can use to force remote control off without deleting the user's persisted preference. Removing the requirement and restarting restores the prior choice. ## What Changed - Added top-level `allow_remote_control` requirements parsing, sourced layer precedence, debug output, and `configRequirements/read` exposure as `allowRemoteControl`. - Added a typed transport policy captured from the startup requirements snapshot. Managed disable forces the initial state to disabled and prevents enrollment, refresh, connection, and persisted-preference mutation. - Rejected every `remoteControl/*` RPC before parameter deserialization with JSON-RPC `-32600` and `remote control is disabled by managed requirements`. - Preserved the existing disabled status notification and the previous behavior when the requirement is `true` or omitted. - Regenerated app-server protocol schemas and documented the new requirement. ## Verification - Confirmed all remote-control RPCs, including a malformed request, return the managed-policy error while the initial status notification remains `disabled`. - Confirmed explicit ephemeral startup and persisted enablement make no backend connection and leave the SQLite preference unchanged. - Confirmed `allow_remote_control = true` does not enable or block remote control and `configRequirements/read` returns `allowRemoteControl: false` for the deny policy. Related issue: N/A (managed-policy hardening).
Anton Panasenko ·
2026-06-12 20:10:12 -07:00 -
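The deny gate described in #27961 above amounts to a check on the method name that runs before any parameters are parsed. A minimal sketch follows, assuming a hypothetical `check_remote_control_policy` function; the error code and message are the ones quoted in the description, everything else is illustrative.

```rust
/// JSON-RPC "invalid request" code, as quoted in the description.
const INVALID_REQUEST: i32 = -32600;
const MANAGED_DISABLE_MESSAGE: &str = "remote control is disabled by managed requirements";

/// `allow_remote_control` mirrors the requirement: `Some(false)` denies, while
/// `Some(true)` or `None` leaves the previous behavior unchanged.
/// Only the method name is inspected, so malformed parameters are rejected too.
fn check_remote_control_policy(
    method: &str,
    allow_remote_control: Option<bool>,
) -> Result<(), (i32, &'static str)> {
    let denied = allow_remote_control == Some(false);
    if denied && method.starts_with("remoteControl/") {
        return Err((INVALID_REQUEST, MANAGED_DISABLE_MESSAGE));
    }
    Ok(())
}
```

Note that `Some(true)` does not enable anything in this sketch; it only declines to block, consistent with the verification notes above.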
feat: use encrypted local secrets for CLI auth (#27539)
## Why Windows Credential Manager limits generic credential blobs to 2,560 bytes. Large serialized ChatGPT auth payloads can exceed that limit, so keyring-mode CLI auth needs a backend that keeps only the encryption key in the OS keyring and stores the payload in Codex's encrypted local-secrets file. This is the third PR in the encrypted-auth stack: 1. #27504 — feature and config selection 2. #27535 — auth-specific local-secrets namespaces 3. This PR — CLI auth implementation and activation 4. MCP OAuth implementation and activation ## What Changed - Added encrypted CLI-auth storage using the `CliAuth` secrets namespace. - Preserved direct keyring storage for platforms/configurations where it remains selected. - Selected the backend consistently for login, logout, refresh, device-code login, auth loading, and login restrictions. - Threaded resolved bootstrap/full config through CLI, exec, TUI, app-server account handling, cloud config, and cloud tasks. - Removed stale `auth.json` fallback data after successful encrypted saves and removed encrypted, direct-keyring, and fallback data during logout. - Added storage and integration coverage for both direct and encrypted keyring modes. MCP OAuth persistence is intentionally left to the next PR. ## Validation - `just test -p codex-login` — 131 passed - `just test -p codex-cli` — 280 passed - `just test -p codex-app-server v2::account` — 25 passed - `just test -p codex-cloud-config service` — 21 passed, 7 skipped - `just fix -p codex-login` - `just fix -p codex-cli` - `just fmt`
Celia Chen ·
2026-06-12 21:23:50 +00:00 -
feat(app-server): persist remote-control desired state (#27445)
## Why

Remote-control runtime enablement and persisted enrollment preference were represented by separate flags. That made startup rehydration, RPC persistence, and new-enrollment seeding race with one another, and it did not cleanly distinguish runtime-only CLI or daemon starts from durable app-server RPC changes.

## What Changed

- Replace the parallel enablement, seed, and rehydration flags with one transport-owned `RemoteControlDesiredState`.
- Add nullable enrollment-scoped persistence and preserve existing preferences during enrollment upserts.
- Rehydrate plain startup only after auth and client scope resolve, without overwriting a concurrent RPC transition.
- Make ordinary `remoteControl/enable` and `remoteControl/disable` durable while retaining `ephemeral: true` for runtime-only callers.
- Have the daemon explicitly request ephemeral enablement and regenerate the app-server schemas.

## Verification

- Covered migration and `NULL`/`0`/`1` persistence round trips.
- Covered plain-start rehydration and runtime-only versus durable enrollment seeding.
- Covered durable enable, durable disable, and ephemeral enable through app-server RPC.
- Covered the daemon's exact `{ "ephemeral": true }` request payload.

Related issue: N/A (internal remote-control persistence architecture change).
Anton Panasenko ·
2026-06-11 21:28:52 -07:00 -
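The `NULL`/`0`/`1` round trip mentioned in #27445 above can be sketched as a pair of conversions between an optional preference and a nullable integer column. The names here (`PersistedPreference`, `to_column`, `from_column`) are hypothetical and are not the real `RemoteControlDesiredState` API; the sketch only assumes that `NULL` means "no stored preference".

```rust
/// Hypothetical persisted preference; `None` at the call site means "never set".
#[derive(Debug, Clone, Copy, PartialEq)]
enum PersistedPreference {
    Enabled,
    Disabled,
}

/// Maps the preference to a nullable integer column value (NULL / 0 / 1).
fn to_column(pref: Option<PersistedPreference>) -> Option<i64> {
    pref.map(|p| match p {
        PersistedPreference::Enabled => 1,
        PersistedPreference::Disabled => 0,
    })
}

/// Reads the column back, rejecting any integer other than 0 or 1.
fn from_column(value: Option<i64>) -> Result<Option<PersistedPreference>, String> {
    match value {
        None => Ok(None),
        Some(0) => Ok(Some(PersistedPreference::Disabled)),
        Some(1) => Ok(Some(PersistedPreference::Enabled)),
        Some(other) => Err(format!("unexpected persisted value: {other}")),
    }
}
```

Keeping `NULL` distinct from `0` is what lets an enrollment upsert preserve an existing preference instead of silently writing "disabled".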
feat: add Bedrock API key as a managed auth mode (#27443)
## Why Codex needs to manage Amazon Bedrock API key credentials through the existing auth lifecycle instead of introducing a separate auth manager or provider-specific credential file. Treating Bedrock API key login as a primary auth mode gives it the same persistence, keyring, reload, and logout behavior as the existing OpenAI API key and ChatGPT modes. The credential is valid only for the `amazon-bedrock` model provider. OpenAI-compatible providers must reject this auth mode rather than treating the Bedrock key as an OpenAI bearer token. ## What changed - Added `bedrockApiKey` as an app-server `AuthMode` and `CodexAuth::BedrockApiKey` as a primary `AuthManager` mode. - Added `BedrockApiKeyAuth`, containing the API key and AWS region, to the existing `AuthDotJson` payload stored in `$CODEX_HOME/auth.json` or the configured keyring backend. - Added `login_with_bedrock_api_key(...)`, parallel to `login_with_api_key(...)`, which replaces the current stored login with Bedrock credentials. - Reused generic auth reload and logout behavior instead of adding a Bedrock-specific auth manager or logout path. - Updated login restrictions, status reporting, diagnostics, telemetry classification, generated app-server schemas, and auth fixtures for the new mode. - Added explicit errors when Bedrock API key auth is selected with an OpenAI-compatible model provider. This PR establishes managed storage and auth-mode behavior. Routing the managed key and region into Amazon Bedrock requests will be in follow-up PRs.
Celia Chen ·
2026-06-10 20:42:38 -07:00 -
fix(remote-control): preserve enrollment on generic websocket 404s (#26741)
## Why

A remote-control WebSocket handshake can receive a generic HTTP 404 when an intermediary routes the request without preserving the WebSocket upgrade. Treating every 404 as proof that the remote app server is gone clears valid enrollment and causes repeated re-enrollment, new environment and server IDs, Habitat churn, and noisy `/server/enroll` traffic.

## What Changed

- Clear enrollment only when a 404 JSON response explicitly contains `{"detail":"Remote app server not found"}`.
- Preserve enrollment for empty, plain-text, malformed, or otherwise unrecognized 404 responses, return the transport error, and retry with the existing reconnect backoff.
- Log the status, correlation headers (`request-id` or `x-oai-request-id`, plus `cf-ray`), and bounded/redacted response body for unrecognized 404s.
- Cover both explicit missing-server re-enrollment and generic 404 enrollment preservation/reconnect behavior.

## Verification

`just test -p codex-app-server-transport` passes all 114 tests on the rebased branch, including the targeted explicit and generic WebSocket 404 scenarios.

Related issue: N/A
Anton Panasenko ·
2026-06-05 22:54:57 -07:00 -
[codex-rs] support v2 personal access tokens (#25731)
## Summary - add v2 personal access token support for `codex login --with-access-token` and `CODEX_ACCESS_TOKEN` - classify opaque `at-` tokens separately from legacy Agent Identity JWTs - hydrate required ChatGPT account metadata through AuthAPI `/v1/user-auth-credential/whoami` - use PATs directly as bearer tokens while preserving existing ChatGPT account surfaces - expose PAT-backed auth as the explicit `personalAccessToken` app-server auth mode ## Implementation PAT auth is intentionally small and stateless. Loading a PAT performs one AuthAPI metadata request, stores the hydrated metadata in the in-memory auth object, and redacts the secret from debug output. Legacy Agent Identity JWT handling remains unchanged. The shared access-token classifier lives in a private neutral module because it dispatches between both credential types. PAT hydration fails closed when AuthAPI omits any required metadata, including email. Hydrated metadata is intentionally not persisted: startup performs a live `whoami` preflight so revoked tokens or changed account metadata are not accepted from a stale cache. ## Workspace restriction scope This change intentionally does **not** apply `forced_chatgpt_workspace_id` to PAT authentication. The setting is a client-side config guardrail, not an authorization boundary, and PAT does not currently require workspace-ID parity. The PAT login and `CODEX_ACCESS_TOKEN` paths therefore validate through AuthAPI without threading workspace-restriction state through access-token loading. Existing workspace checks for non-PAT auth remain on their established paths. ## App-server compatibility The public app-server `AuthMode` is shared across v1 and v2, and PAT-backed auth reports `personalAccessToken` through both APIs. Following human review, this intentionally removes the temporary v1 compatibility mapping that reported PATs as `chatgpt`; the deprecated v1 API is kept in parity with v2 rather than maintaining a separate closed enum. 
Clients with exhaustive auth-mode handling in either API version must add the new case and should generally treat it as ChatGPT-backed unless they need PAT-specific behavior. The v1 auth-status response still omits the raw PAT when `includeToken` is requested because that response cannot carry the account metadata needed to reuse the credential safely. Persisted PAT auth also omits the new enum value so older Codex builds can deserialize `auth.json` and infer PAT auth from the credential field after a rollback. ## Validation Latest review-fix validation: - `CARGO_INCREMENTAL=0 just test -p codex-login` (126 passed) - `CARGO_INCREMENTAL=0 just test -p codex-cli` (263 passed) - `CARGO_INCREMENTAL=0 just test -p codex-cli stored_auth_validation_handles_personal_access_token` - `CARGO_INCREMENTAL=0 just test -p codex-app-server-protocol` (226 passed) - `CARGO_INCREMENTAL=0 just test -p codex-models-manager refresh_available_models_uses_remote_only_catalog_for_chatgpt_auth` - `CARGO_INCREMENTAL=0 just test -p codex-tui existing_non_oauth_chatgpt_login_counts_as_signed_in` - `CARGO_INCREMENTAL=0 just fix -p codex-login -p codex-app-server-protocol -p codex-models-manager -p codex-tui -p codex-cli` - `just fmt` - `git diff --check` The broader `codex-tui` suite previously compiled and ran 2,834 tests. Three unrelated environment-sensitive guardian/IDE-socket tests failed after retries; the PAT-relevant TUI coverage passed.
cooper-oai ·
2026-06-05 17:36:18 -07:00 -
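The access-token classifier described in #25731 above dispatches between opaque `at-` tokens and legacy Agent Identity JWTs. The sketch below is an assumption-laden illustration: `AccessTokenKind` and `classify_access_token` are hypothetical names, and the "three non-empty dot-separated segments" test for a JWT is a common structural heuristic chosen for this example, not something the description specifies.

```rust
#[derive(Debug, PartialEq)]
enum AccessTokenKind {
    PersonalAccessToken,
    AgentIdentityJwt,
    Unknown,
}

/// Classifies a token by shape only; no signature or AuthAPI validation happens here.
fn classify_access_token(token: &str) -> AccessTokenKind {
    let token = token.trim();
    if token.starts_with("at-") {
        // Opaque v2 personal access token prefix, as named in the description.
        AccessTokenKind::PersonalAccessToken
    } else if token.split('.').count() == 3 && token.split('.').all(|part| !part.is_empty()) {
        // Assumed heuristic: header.payload.signature with no empty segment.
        AccessTokenKind::AgentIdentityJwt
    } else {
        AccessTokenKind::Unknown
    }
}
```

Checking the `at-` prefix first means an opaque token that happens to contain dots is never misread as a JWT.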
feat(remote-control): add pairing status transport (#26449)
## What

Adds transport support for checking remote-control pairing status against the backend.

- Adds the normalized `server/pair/status` backend URL.
- Adds backend request/response structs for exactly one lookup key: `pairing_code` or `manual_pairing_code`, returning `{ claimed }`.
- Adds `RemoteControlEnrollment::pairing_status` and `RemoteControlHandle::pairing_status`.
- Preserves auth refresh/retry behavior and backend error mapping.
- Adds transport coverage for pending, claimed, manual-code payloads, token refresh, mapped backend errors, malformed responses, and URL normalization.

## Why

Desktop needs a host-authenticated way to poll whether a QR or manual pairing code has been claimed.

Related backend change: https://github.com/openai/openai/pull/990244

## Verification

- `cargo test --manifest-path app-server-transport/Cargo.toml remote_control::tests::pairing_tests`
- `cargo fmt --all --check`
- `git diff --check`

hefuc-oai ·
2026-06-05 10:07:25 -07:00 -
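The "exactly one lookup key" constraint in #26449 above is easy to enforce at the type level. The following sketch uses a hypothetical `PairingLookup` enum; only the two field names `pairing_code` and `manual_pairing_code` come from the description, and the real request structs may be shaped differently.

```rust
/// Hypothetical request key: an enum makes "both" and "neither" unrepresentable.
enum PairingLookup {
    PairingCode(String),
    ManualPairingCode(String),
}

impl PairingLookup {
    /// Returns the single backend field name and its value for this lookup.
    fn field(&self) -> (&'static str, &str) {
        match self {
            PairingLookup::PairingCode(code) => ("pairing_code", code.as_str()),
            PairingLookup::ManualPairingCode(code) => ("manual_pairing_code", code.as_str()),
        }
    }
}
```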
feat(remote-control): allow pairing while disabled (#26215)
## Why `remoteControl/pairing/start` creates authorization for future remote-control connections, so it should not require the live websocket to already be enabled. Requiring enable first made pairing depend on presence instead of the persisted server enrollment that pairing actually uses. Pairing also needs to recover when that persisted server row is stale. If `/server/pair` returns `404`, making the first pairing attempt fail forces a manual retry even though the client can clear the stale row and create a replacement enrollment immediately. ## What Changed - Allow `remoteControl/pairing/start` to reuse or create the persisted remote-control server enrollment while remote control is disabled. - Keep the selected in-memory enrollment across disable and share it with websocket connect so a later enable uses the same selected server. - Thread the app-server client name through pairing so stdio persistence keeps using the websocket-owned enrollment key. - Recover pairing server-token auth failures through the existing refresh/auth-recovery path. - Recover stale pairing enrollment on `/server/pair` `404` by clearing the stale selected enrollment, re-enrolling once, and retrying pairing once. - Add focused disabled-pairing and stale-pairing recovery coverage. ## Verification - `remote_control_pairing_start_returns_pairing_artifacts_while_disabled` exercises pairing before enable. - `remote_control_handle_reenrolls_after_stale_pairing_enrollment` exercises stale `/server/pair` `404` recovery without a manual retry. Related: N/A
Anton Panasenko ·
2026-06-05 05:12:23 +00:00 -
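The stale-enrollment recovery in #26215 above (clear, re-enroll once, retry pairing once) can be sketched with two closures standing in for the backend calls. `PairError` and `pair_with_recovery` are hypothetical names; `StaleEnrollment` models a `404` from `/server/pair`, and the pairing artifact is simplified to a `String`.

```rust
#[derive(Debug, PartialEq)]
enum PairError {
    /// Models a `404` from `/server/pair`: the persisted enrollment is stale.
    StaleEnrollment,
    Other(String),
}

/// Tries pairing; on a stale enrollment, re-enrolls once and retries pairing once.
/// A second stale result is returned to the caller instead of looping.
fn pair_with_recovery<P, R>(mut pair: P, mut reenroll: R) -> Result<String, PairError>
where
    P: FnMut() -> Result<String, PairError>,
    R: FnMut() -> Result<(), PairError>,
{
    match pair() {
        Err(PairError::StaleEnrollment) => {
            reenroll()?;
            pair()
        }
        other => other,
    }
}
```

Bounding the recovery to a single retry keeps a persistently failing backend from turning one pairing request into an enrollment loop.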
feat(app-server): add remote control client management RPCs (#25785)
## Why Remote-control clients need to list and revoke controller-device grants without enabling or enrolling the local relay. These are signed-in account-management operations, so coupling them to websocket, pairing, enrollment, or persisted relay state would prevent clients from managing stale grants from the picker. Related enhancement request: N/A. This adds the Codex app-server surface for the planned upstream environment-scoped revoke endpoint. ## What Changed - Added experimental app-server v2 RPCs: - `remoteControl/client/list` - `remoteControl/client/revoke` - Added picker-oriented protocol types and standard generated schema fixtures. The list response intentionally omits backend account id, enrollment status, and location fields. - Added `app-server-transport/src/transport/remote_control/clients.rs` for environment-scoped GET and DELETE requests. It builds escaped URL path segments, forwards optional pagination query fields, sends ChatGPT auth plus `chatgpt-account-id`, converts RFC3339 `last_seen_at` values to Unix seconds, accepts `204 No Content` revoke responses, and retries once after a `401`. - Extracted shared ChatGPT auth loading and recovery into `app-server-transport/src/transport/remote_control/auth.rs` so websocket, pairing, and client management use the same account-auth boundary. - Retained the configured remote-control base URL on `RemoteControlHandle` and resolve management URLs lazily, preserving deferred validation while relay startup is disabled. - Registered list as `global_shared_read("remote-control-clients")` and revoke as `global("remote-control-clients")`. ## Verification - Added transport coverage proving list and revoke work while relay state is disabled, IDs are escaped, picker-only fields are returned, timestamps are converted, revoke accepts `204`, auth headers are forwarded, `401` retries exactly once, `403` is not retried, and malformed list payloads retain decode context. 
- Added an app-server integration test proving both JSON-RPC methods work before relay enablement and successful revoke returns `{}`.
- Regenerated and validated experimental and standard app-server schema fixtures.

Anton Panasenko ·
2026-06-02 17:01:02 -07:00 -
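The "escaped URL path segments" point in #25785 above can be illustrated with a small percent-encoder. This is a sketch only: `escape_path_segment` and `client_url` are hypothetical helpers, and the `/environments/{id}/clients/{id}` path shape is invented for the example because the description does not give the real endpoint path.

```rust
/// Percent-encodes everything except RFC 3986 unreserved characters, so that
/// an ID containing `/`, `?`, or spaces cannot change the URL structure.
fn escape_path_segment(segment: &str) -> String {
    let mut out = String::with_capacity(segment.len());
    for byte in segment.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            other => out.push_str(&format!("%{:02X}", other)),
        }
    }
    out
}

/// Builds a management URL. The path layout here is an assumption for illustration.
fn client_url(base: &str, environment_id: &str, client_id: &str) -> String {
    format!(
        "{}/environments/{}/clients/{}",
        base.trim_end_matches('/'),
        escape_path_segment(environment_id),
        escape_path_segment(client_id)
    )
}
```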
feat(remote-control): add pairing start (#25675)
## Why

Remote control enrollment authorizes a desktop server, but app-server v2 did not expose the follow-up pairing operation needed to mint a short-lived controller pairing artifact from that enrolled server. Clients need a narrow RPC that starts pairing without exposing the backend `serverId` or conflating pairing with websocket connection state.

Issue: N/A; internal remote-control pairing API change.

## What Changed

Added experimental app-server v2 `remoteControl/pairing/start` with `manualCode` input and `pairingCode`, nullable `manualPairingCode`, `environmentId`, and Unix-seconds `expiresAt` output. The method serializes under its own `global("remote-control-pairing")` scope and is documented in `app-server/README.md`.

Extended the remote-control transport with private `/server/pair` request/response types and normalized `pair_url` handling. Pairing uses the current enrolled server bearer, refreshes that bearer when needed, keeps backend `server_id` private, validates returned `server_id` and `environment_id` against the current enrollment, and preserves backend status/header/body context for failures and malformed responses.

Wired the request through `RemoteControlRequestProcessor` and `MessageProcessor`, mapping unavailable/disabled pairing to `invalid_request` and backend failures to internal errors.

## Verification

- `just test -p codex-app-server-transport`
- `just test -p codex-app-server remote_control_pairing_start_returns_pairing_artifacts`

Anton Panasenko ·
2026-06-02 01:05:50 +00:00 -
feat(app-server): migrate remote control to server tokens (#24141)
## Why `codex-backend` now authenticates remote-control server websocket connections with short-lived server tokens instead of the user's ChatGPT access token. `app-server` needs to mint and refresh those server tokens without persisting them, so a restart can reconnect from durable enrollment identity while keeping the bearer token memory-only. ## What Changed Updated the remote-control transport to consume `remote_control_token` and `expires_at` from server enroll responses and added `/server/refresh` support for persisted enrollments or expiring cached tokens. Websocket handshakes now send `Authorization: Bearer <remote_control_token>` with the existing server identity headers, and no longer send the ChatGPT bearer token or `chatgpt-account-id` on that websocket path. The in-memory enrollment state now owns the ephemeral server token cache, while SQLite still persists only `server_id`, `environment_id`, and `server_name`. Websocket `401`/`403` clears only the cached token for refresh on reconnect; websocket or refresh `404` clears stale persisted enrollment and re-enrolls. Response body previews redact `remote_control_token` before surfacing parse errors. ## Verification - `just test -p codex-app-server-transport` - Manual prod smoke with an isolated `CODEX_HOME`: `codex remote-control --json -c 'chatgpt_base_url="https://chatgpt.com/backend-api"'` reached `status:"connected"` with `environmentId:"env_i_6a17d9f1d764832986da2e80f4554f1b"`.
Anton Panasenko ·
2026-05-28 15:57:08 -07:00 -
Uprev Rust toolchain pins to 1.95.0 (#24684)
## Summary - Bump the workspace Rust toolchain from `1.93.0` to `1.95.0` across Cargo, Bazel, CI, release workflows, devcontainers, and the Codex environment config. - Refresh `MODULE.bazel.lock` so the Bazel Rust toolchain artifacts match the new version. - Leave purpose-specific toolchains unchanged, including the `argument-comment-lint` nightly and the upstream `rusty_v8` `1.91.0` build pin. - Includes fixes for new lints from `just fix` and a few codex-authored fixes for lints without a suggestion.
Adam Perry @ OpenAI ·
2026-05-26 20:59:47 -07:00 -
fix(remote-control): surface websocket task stalls (#24473)
## Why When the app-server remote-control websocket path stalls during connection setup or teardown, the existing logs do not show where the task stopped, and several awaits can keep the task from returning promptly. That makes offline or stale-host incidents hard to distinguish from expected shutdown or disable flow. Issue: N/A (internal incident investigation) ## What Changed Added structured lifecycle and status logging around remote-control enable/disable requests, websocket task startup and exit, connection cycles, enrollment context, and status/environment transitions. Bound websocket connect, transport-event forwarding, and connection-worker shutdown waits. On timeout, the code logs the stalled operation and stops or aborts workers so the loop can reconnect or exit instead of waiting indefinitely. Ping sends now also observe shutdown cancellation.
Anton Panasenko ·
2026-05-26 13:17:58 -07:00 -
fix(remote-control): cap reconnect backoff (#24164)
## Why Remote-control websocket reconnects currently use the shared exponential backoff helper without a local ceiling, so a long failure streak can stretch retries out indefinitely and leave the runtime behavior hard to inspect from logs. ## What Changed Cap the remote-control reconnect delay at 30 seconds, then reset the reconnect attempt counter once that capped delay is emitted so the next failure starts from the initial jittered delay again. The reconnect failure log now records the attempt number, chosen delay, and whether the cap triggered a reset, with a separate info log when the backoff counter is reset after the cap. ## Verification `just test -p codex-app-server-transport` Related issue: N/A
Anton Panasenko ·
2026-05-23 00:38:22 +00:00 -
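The cap-and-reset behavior from #24164 above can be sketched as a small state machine. Only the 30-second ceiling and the "reset once the capped delay is emitted" rule come from the description; `ReconnectBackoff`, the one-second initial delay, and plain doubling are assumptions for illustration, and jitter is left out to keep the sequence deterministic.

```rust
use std::time::Duration;

/// Ceiling from the description.
const MAX_DELAY: Duration = Duration::from_secs(30);
/// Assumed initial delay; the shared helper's real value is not stated.
const INITIAL_DELAY: Duration = Duration::from_secs(1);

struct ReconnectBackoff {
    attempt: u32,
}

impl ReconnectBackoff {
    fn new() -> Self {
        Self { attempt: 0 }
    }

    /// Returns the next delay and whether the cap triggered a counter reset.
    fn next_delay(&mut self) -> (Duration, bool) {
        let uncapped = INITIAL_DELAY.saturating_mul(2u32.saturating_pow(self.attempt));
        if uncapped >= MAX_DELAY {
            // Emit the capped delay once, then start over from the initial delay.
            self.attempt = 0;
            (MAX_DELAY, true)
        } else {
            self.attempt += 1;
            (uncapped, false)
        }
    }
}
```

With these assumed constants, a long failure streak cycles through 1, 2, 4, 8, 16, and 30 seconds and then restarts at 1 second, so no single wait exceeds the cap.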
fix(remote-control): retry after auth recovery (#23775)
## Why When remote control hits an auth failure such as a revoked or reused refresh token, the websocket loop falls into reconnect backoff. If the user fixes auth while that loop is sleeping, remote control can stay offline until the old retry timer expires because nothing wakes the loop or resets its exhausted auth recovery state. ## What Changed Added an auth-change watch on `AuthManager` for refresh-relevant cached auth updates. The remote-control websocket loop now subscribes to that signal, resets `UnauthorizedRecovery` and reconnect backoff when auth changes, and retries immediately instead of waiting for the previous delay. Updated the remote-control transport test to verify that reloading auth with the now-available account id wakes enrollment before the prior retry delay. ## Verification `cargo test -p codex-app-server-transport remote_control_waits_for_account_id_before_enrolling`
Anton Panasenko ·
2026-05-21 14:38:30 -07:00 -
fix: serialize unix app-server startup (#23516)
# Summary

Unix-socket app-server startup can currently race when multiple launch attempts target the same `CODEX_HOME`. Those processes can overlap before the control socket exists, which lets them enter SQLite state initialization concurrently and reproduce the startup corruption pattern seen in SSH mode.

This change makes the app-server own that singleton startup guarantee. Unix-socket startup now takes a `CODEX_HOME`-scoped advisory lock before SQLite initialization, runs the existing control-socket preparation check while holding that lock, returns the established `AddrInUse` error when another live listener already owns the socket, and releases the lock once the new listener has bound its socket.

# Design decisions

- The singleton rule lives in `app-server --listen unix://`, not in a desktop-only caller path, so every Unix-socket launch gets the same race protection.
- A duplicate raw app-server launch returns an error instead of silently succeeding. The attach operation remains `app-server proxy`, which continues to connect to an already-running listener.
- The lock is held only across the dangerous startup window: socket preparation, SQLite initialization, and socket bind. It is not held for the app-server lifetime.
- Listener detection stays in `prepare_control_socket_path(...)`, so the preexisting live-listener and stale-socket behavior remains the single source of truth.

# Testing

Tests: targeted Unix-socket transport tests on the branch checkout, full `codex-cli` build on `efrazer-db10`, and an SSH-style smoke on `efrazer-db10` covering concurrent app-server starts, explicit duplicate-start errors, and absence of SQLite startup-error matches in launch logs.
efrazer-oai ·
2026-05-19 14:57:11 -07:00 -
feat(app-server): update remote control APIs for better UX (#22877)
## Why

To help improve the `codex remote-control` CLI UX, which I plan to do in a follow-up, this PR adds `server-name` to the following remote-control APIs:

- `remoteControl/enable`
- `remoteControl/disable`
- `remoteControl/status/changed`

It also adds a `remoteControl/status/read` API, which will be helpful in the Codex App.
Owen Lin ·
2026-05-15 14:33:24 -07:00 -
enable/disable remote control at runtime, not via features (#22578)
## Why

Reapplies https://github.com/openai/codex/pull/22386, which was previously reverted.

Also introduces `remoteControl/enable` and `remoteControl/disable` app-server APIs to toggle remote control on and off at runtime for a given running app-server instance.

## What Changed

- Adds experimental v2 RPCs:
  - `remoteControl/enable`
  - `remoteControl/disable`
- Adds `RemoteControlRequestProcessor` and routes the new RPCs through it instead of `ConfigRequestProcessor`.
- Adds named `RemoteControlHandle::enable`, `disable`, and `status` methods.
- Makes `remoteControl/enable` return an error when the sqlite state DB is unavailable, while keeping enrollment/websocket failures as async status updates.
- Adds `AppServerRuntimeOptions.remote_control_enabled` and hidden `--remote-control` flags for `codex app-server` and `codex-app-server`.
- Updates managed daemon startup to use `codex app-server --remote-control --listen unix://`.
- Marks `Feature::RemoteControl` as removed and ignores `[features].remote_control`.
- Updates app-server README entries for the new remote-control methods.
Owen Lin ·
2026-05-14 01:07:46 +00:00 -
Restore app-server websocket listener with auth guard (#22404)
## Why

PR #21843 removed the TCP websocket app-server listener, but that also removed functionality that still needs to exist. Restoring it as-is would reopen the old remote exposure problem, so this keeps the restored listener while making remote and non-loopback usage require explicit auth.

## What Changed

- Mostly reverts #21843 and reapplies the small merge-conflict resolutions needed on top of current main.
- Restores `ws://IP:PORT` parsing, the app-server TCP websocket acceptor, websocket auth CLI flags, and the associated tests.
- The only intentional behavior change from the restored code is that non-loopback websocket listeners now fail startup unless `--ws-auth capability-token` or `--ws-auth signed-bearer-token` is configured. Loopback listeners remain available for local and SSH-forwarding workflows.

## Reviewer Focus

Please focus review on the small auth-enforcement delta layered on top of the revert:

- `codex-rs/app-server-transport/src/transport/websocket.rs`: `start_websocket_acceptor` now rejects unauthenticated non-loopback websocket binds before accepting connections.
- `codex-rs/app-server-transport/src/transport/auth.rs`: helper logic classifies unauthenticated non-loopback listeners.
- `codex-rs/app-server/tests/suite/v2/connection_handling_websocket.rs`: tests cover unauthenticated `ws://0.0.0.0` startup rejection and authenticated non-loopback capability-token startup.

Everything else is intended to be revert/merge-conflict restoration rather than new product behavior.

## Verification

- Manually verified that TUI remoting is restored and that auth is enforced for non-localhost URLs.
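The startup rule for non-loopback listeners can be sketched as a small check. This is an illustration only: `WsAuth` and `check_listener` are hypothetical names, and the real classification lives in the helper logic referenced above, whose exact shape is not shown in this description.

```rust
use std::net::SocketAddr;

/// Hypothetical model of the `--ws-auth` setting.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WsAuth {
    None,
    CapabilityToken,
    SignedBearerToken,
}

/// Rejects an unauthenticated listener unless it is bound to a loopback
/// address. Loopback binds stay available for local and SSH-forwarding use.
pub fn check_listener(addr: SocketAddr, auth: WsAuth) -> Result<(), String> {
    if !addr.ip().is_loopback() && auth == WsAuth::None {
        return Err(format!(
            "refusing to listen on non-loopback address {addr} without --ws-auth"
        ));
    }
    Ok(())
}
```

Note that the wildcard address `0.0.0.0` is not a loopback address, so it falls under the auth requirement in this sketch, matching the `ws://0.0.0.0` rejection test mentioned above.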
Eric Traut ·
2026-05-12 18:40:53 -07:00 -
app-server: remove TCP websocket listener (#21843)
## Why

The app-server no longer needs to expose a TCP websocket listener. Keeping that transport also kept around a separate listener/auth surface that is unnecessary now that local clients can use stdio or the Unix-domain control socket, while remote connectivity is handled by `remote_control`.

## What Changed

- Removed `ws://IP:PORT` parsing and the `AppServerTransport::WebSocket` startup path.
- Deleted the app-server websocket listener auth module and removed related CLI flags/dependencies.
- Kept websocket framing only where it is still needed: over the Unix-domain control socket and in the outbound `remote_control` connection.
- Updated app-server CLI/help text and `app-server/README.md` to document only `stdio://`, `unix://`, `unix://PATH`, and `off` for local transports.
- Converted affected app-server integration coverage from TCP websocket listeners to UDS-backed websocket connections, and added a parse test that rejects `ws://` listen URLs.
- Removed the now-unused workspace `constant_time_eq` dependency and refreshed `Cargo.lock` after `cargo shear` caught the drift.
- Moved test app-server UDS socket paths to short Unix temp paths so macOS Bazel test sandboxes do not exceed Unix socket path limits.

## Verification

- Added/updated tests around UDS websocket transport behavior and `ws://` listen URL rejection.
- `cargo shear`
- `cargo metadata --no-deps --format-version 1`
- `cargo test -p codex-app-server unix_socket_transport`
- `cargo test -p codex-app-server unix_socket_disconnect`
- `just fix -p codex-app-server`
- `git diff --check`

Local full Rust test execution was blocked before compilation by an external fetch failure for the pinned `nornagon/crossterm` git dependency. `just bazel-lock-update` and `just bazel-lock-check` were retried after the manifest cleanup but remain blocked by external BuildBuddy/V8 fetch timeouts.
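As a rough illustration of the listen-URL set documented by this change, a parser could look like the sketch below. `ListenTarget` and `parse_listen` are hypothetical names, not the crate's actual types, and the error text is invented for the example.

```rust
use std::path::PathBuf;

/// Hypothetical model of the local transports listed in this change.
#[derive(Debug, PartialEq, Eq)]
pub enum ListenTarget {
    Stdio,
    /// `unix://` with no path: use the default control socket location.
    UnixDefault,
    /// `unix://PATH`: use an explicit socket path.
    UnixPath(PathBuf),
    Off,
}

/// Accepts only `stdio://`, `unix://`, `unix://PATH`, and `off`.
/// Anything else, including `ws://IP:PORT`, is rejected.
pub fn parse_listen(value: &str) -> Result<ListenTarget, String> {
    match value {
        "off" => Ok(ListenTarget::Off),
        "stdio://" => Ok(ListenTarget::Stdio),
        "unix://" => Ok(ListenTarget::UnixDefault),
        _ => match value.strip_prefix("unix://") {
            Some(path) => Ok(ListenTarget::UnixPath(PathBuf::from(path))),
            None => Err(format!("unsupported listen URL: {value}")),
        },
    }
}
```

For example, `parse_listen("unix:///tmp/codex.sock")` yields `UnixPath("/tmp/codex.sock")`, while `parse_listen("ws://127.0.0.1:4500")` returns an error.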
Ruslan Nigmatullin ·
2026-05-11 10:17:26 -07:00 -
feat: Use installation ID in remote enrollments (#21662)
* Pass installation ID for storage on enrollments server for deduping/grouping multiple appservers per installation
* Pass installation ID in `remoteControl/status/changed` events
David de Regt ·
2026-05-08 17:54:01 +00:00 -
Disable empty Cargo test targets (#21584)
## Summary

`cargo test` entails running both standard Rust tests and doctests. It turns out that doctest discovery is fairly slow, and it is a cost you pay even for crates that don't include any doctests.

This PR disables doctests with `doctest = false` for crates that lack any doctests.

For the collection of crates below, this speeds up test execution by >4x. E.g., before this PR:

```
Benchmark 1: cargo test -p codex-utils-absolute-path -p codex-utils-cache -p codex-utils-cli -p codex-utils-home-dir -p codex-utils-output-truncation -p codex-utils-path -p codex-utils-string -p codex-utils-template -p codex-utils-elapsed -p codex-utils-json-to-toml
  Time (mean ± σ):      1.849 s ±  4.455 s    [User: 0.752 s, System: 1.367 s]
  Range (min … max):    0.418 s … 14.529 s    10 runs
```

And after:

```
Benchmark 1: cargo test -p codex-utils-absolute-path -p codex-utils-cache -p codex-utils-cli -p codex-utils-home-dir -p codex-utils-output-truncation -p codex-utils-path -p codex-utils-string -p codex-utils-template -p codex-utils-elapsed -p codex-utils-json-to-toml
  Time (mean ± σ):     428.6 ms ±   6.9 ms    [User: 187.7 ms, System: 219.7 ms]
  Range (min … max):   418.0 ms … 436.8 ms    10 runs
```

For a single crate, the speedup is >2x. Before:

```
Benchmark 1: cargo test -p codex-utils-string
  Time (mean ± σ):     491.1 ms ±   9.0 ms    [User: 229.8 ms, System: 234.9 ms]
  Range (min … max):   480.9 ms … 512.0 ms    10 runs
```

And after:

```
Benchmark 1: cargo test -p codex-utils-string
  Time (mean ± σ):     213.9 ms ±   4.3 ms    [User: 112.8 ms, System: 84.0 ms]
  Range (min … max):   206.8 ms … 221.0 ms    13 runs
```

Co-authored-by: Codex <noreply@openai.com>
Charlie Marsh ·
2026-05-07 15:44:17 -07:00 -
Ruslan Nigmatullin ·
2026-05-07 09:01:44 -07:00 -
app-server: move transport into dedicated crate (#20545)
## Why

`codex-app-server` currently owns both request-processing code and transport implementation details. Splitting the transport layer into its own crate makes that boundary explicit, reduces the amount of transport-specific dependency surface carried by `codex-app-server`, and gives future transport work a narrower place to evolve.

## What changed

- Added `codex-app-server-transport` and moved the existing transport tree into it, including stdio, unix socket, websocket, remote-control transport, and websocket auth.
- Moved shared transport-facing message types into the new crate so both the transport implementation and `codex-app-server` use the same definitions.
- Kept processor-facing connection state and outbound routing in `codex-app-server`, with the routing tests moved next to that local wrapper.
- Updated workspace metadata, Bazel crate metadata, and `codex-app-server` dependencies for the new crate boundary.

## Validation

- `cargo metadata --locked --no-deps`
- `git diff --check`
- Attempted `cargo test -p codex-app-server-transport`, `cargo test -p codex-app-server`, `just fix -p codex-app-server-transport`, and `just fix -p codex-app-server`; all were blocked before compilation by the existing `packageproxy` resolution failure for locked `rustls-webpki = 0.103.13`.
- Attempted Bazel build / lockfile validation; those were blocked by external fetch failures against BuildBuddy / GitHub while resolving `v8`.
Ruslan Nigmatullin ·
2026-05-01 09:23:47 -07:00