mirror of
https://github.com/pchuan98/codex.git
synced 2026-07-01 00:31:56 +08:00
dev
71 Commits
-
[codex] Enable remote plugins by default (#30297)
## Summary - enable the remote plugin feature by default - promote the remote plugin feature from under development to stable - preserve the existing `features.remote_plugin` override for explicitly disabling it - keep legacy disabled-path coverage explicit in TUI and app-server tests ## Impact Remote plugin functionality is enabled by default for configurations that do not set the feature flag. The existing Codex backend authentication gate still applies. ## Validation - `just fmt` - `just test -p codex-features` - `just test -p codex-tui plugins_popup_remote_section_fallback_states_snapshot` - targeted `codex-app-server` plugin-list and skills-list tests - `git diff --check` The full TUI and app-server suites were also exercised locally. All remote-plugin-related coverage passed; unrelated local sandbox/test-binary failures remain outside this change.
xl-openai ·
2026-06-28 11:46:25 -07:00 -
feat(app-server): list descendant threads by ancestor (#29591)
## Why `thread/list` can filter direct children with `parentThreadId`, but clients cannot request an entire spawned subtree. Discovering every descendant requires repeated client-side requests and gives up the database's existing filtering and pagination path. ## What changed Experimental clients can use `ancestorThreadId` to return strict descendants at any depth while `parentThreadId` retains its direct-child meaning. The filters are mutually exclusive, the ancestor is excluded, and every result preserves its immediate `parentThreadId` so callers can reconstruct the tree. ## How it works - **Explicit relationship:** Internal list parameters distinguish direct children from transitive descendants without changing the meaning of `parentThreadId`. - **Existing graph:** Persisted parent-child spawn edges remain the source of truth, so descendant lookup needs no schema migration or ancestry cache. - **Indexed traversal:** A recursive SQLite query starts from the parent-edge index, walks each generation, and applies thread filters, sorting, and cursor pagination in the same database request. - **Reconstructable results:** The response stays flat and normally ordered while carrying each descendant's immediate parent. ## Verification Ran 550 tests across the protocol, state, rollout, and thread-store crates, then reran the four focused state, store, and app-server descendant-listing tests after the final diff reduction. Scoped Clippy and formatting checks passed. Stable and experimental schema generation was checked; the stable fixtures remain unchanged while the experimental schema includes the new field.
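The descendant lookup described above can be sketched in memory. This is a minimal illustration, not the SQLite query from the PR: `ListedThread`, `list_descendants`, and the `(parent, child)` edge slice are hypothetical names, and the sketch assumes the spawn edges form a tree (no cycles).

```rust
use std::collections::{HashMap, VecDeque};

/// One listed thread: its ID plus its immediate parent, so callers can rebuild the tree.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ListedThread {
    pub id: String,
    pub parent_thread_id: String,
}

/// Hypothetical in-memory stand-in for the persisted parent-child spawn edges.
/// `edges` holds (parent, child) pairs and is assumed to be acyclic.
pub fn list_descendants(edges: &[(&str, &str)], ancestor: &str) -> Vec<ListedThread> {
    // Index children by parent, in the spirit of the parent-edge index.
    let mut children: HashMap<&str, Vec<&str>> = HashMap::new();
    for &(parent, child) in edges {
        children.entry(parent).or_default().push(child);
    }

    let mut out = Vec::new();
    let mut queue = VecDeque::from([ancestor]);
    // Walk one generation at a time; the ancestor itself is never emitted.
    while let Some(current) = queue.pop_front() {
        if let Some(kids) = children.get(current) {
            for &kid in kids {
                out.push(ListedThread {
                    id: kid.to_string(),
                    parent_thread_id: current.to_string(),
                });
                queue.push_back(kid);
            }
        }
    }
    out
}
```

Calling `list_descendants(&edges, "root")` returns a flat list in which every entry still names its immediate parent, which is the property the entry relies on for tree reconstruction. Filtering, sorting, and cursor pagination are left out of this sketch.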
Brent Traut ·
2026-06-24 13:08:14 -07:00 -
Separate local and remote plugin analytics IDs (#29495)
## Why Plugin analytics overloaded `plugin_id`: most events used the Codex `<plugin>@<marketplace>` identity, while remote install events used the backend plugin ID. That makes the same field change meaning across event types and complicates downstream identity resolution. This change makes the contract unambiguous: - `plugin_id`: the local Codex `<plugin>@<marketplace>` identity, when resolved - `remote_plugin_id`: the backend plugin identity, when available For a remote install failure that happens before plugin details resolve, `plugin_id` is `null` and `remote_plugin_id` remains populated. ## What changed All six plugin analytics events use the same identity contract: - `codex_plugin_installed` - `codex_plugin_install_failed` - `codex_plugin_uninstalled` - `codex_plugin_enabled` - `codex_plugin_disabled` - `codex_plugin_used` Remote identity is resolved from the current installed-plugin snapshot first, with persisted install metadata as fallback. The telemetry metadata type keeps local identity optional for failures that occur before remote details are available. The app-server test client's manual analytics smokes now find remote mutation events through `remote_plugin_id` and validate that `plugin_id` remains local. ## Remote uninstall Resolve and capture telemetry metadata before removing the local plugin cache, then emit `codex_plugin_uninstalled` after the backend confirms success. The event is also emitted when backend uninstall succeeds but local cache cleanup reports `CacheRemove`. If a concurrent remote-cache refresh removes the local bundle before telemetry capture, the already-fetched remote plugin detail supplies fallback capability metadata. 
## Validation - `just test -p codex-analytics` — 82 passed - `just test -p codex-core-plugins` — 271 passed - `just test -p codex-app-server-test-client` — 5 passed - `just test -p codex-plugin` — 3 passed - `just test -p codex-app-server plugin_install` — 37 passed - `just test -p codex-app-server plugin_uninstall` — 10 passed The production app-server install/uninstall flow was also exercised against `plugins~Plugin_f1b845ac33888191ac156169c58733c2` (`build-ios-apps@openai-curated-remote`), and the plugin's original uninstalled state was restored.
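The identity contract from this entry can be illustrated with a small sketch. The type and function names (`PluginEventIds`, `local_plugin_id`, `resolve_remote_id`) are assumptions made for illustration and are not taken from the codebase.

```rust
/// Identity fields carried by a plugin analytics event, per the contract described above.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PluginEventIds {
    /// Local `<plugin>@<marketplace>` identity; `None` models the `null` case.
    pub plugin_id: Option<String>,
    /// Backend plugin identity, when available.
    pub remote_plugin_id: Option<String>,
}

/// Hypothetical helper: build the local identity from its two parts.
pub fn local_plugin_id(plugin: &str, marketplace: &str) -> String {
    format!("{plugin}@{marketplace}")
}

/// Hypothetical resolver: prefer the installed-plugin snapshot,
/// then fall back to persisted install metadata.
pub fn resolve_remote_id(snapshot: Option<&str>, persisted: Option<&str>) -> Option<String> {
    snapshot.or(persisted).map(str::to_string)
}
```

A remote install failure that happens before plugin details resolve would then be modeled as `PluginEventIds { plugin_id: None, remote_plugin_id: Some(..) }`.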
jameswt-oai ·
2026-06-23 12:27:14 -07:00 -
[codex] handle request_user_input in app-server test client (#29476)
## Why `codex-app-server-test-client` previously treated `item/tool/requestUserInput` as an unsupported server request and terminated the connection. That made it impossible to use the client for end-to-end testing of interactive turns: an operator could observe the request, but could not answer it and confirm that the same turn resumed. ## What changed - Handle `ToolRequestUserInput` server requests in the test client's central request dispatcher. - Render numbered terminal choices, accept exact option labels, support free-form `Other` and text-only questions, and collect multiple answers. - Send a protocol-native `ToolRequestUserInputResponse` and continue streaming the active turn. - Fail clearly when interactive input is requested without a terminal. - Document the interactive behavior and add focused tests for option selection, free-form answers, multiple questions, and invalid-selection retries. ## Testing - `just test -p codex-app-server-test-client` - `just bazel-lock-check` - Manually exercised the app-server flow, selected `TUI`, observed `serverRequest/resolved`, and verified that the same turn completed with the selected answer.
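The numbered-choice handling described above might look roughly like the following. This is a sketch under assumptions: `Selection` and `parse_selection` are hypothetical names, and the real client also handles free-form `Other` answers and text-only questions, which are omitted here.

```rust
/// Result of interpreting one line of terminal input against numbered options.
#[derive(Debug, PartialEq, Eq)]
pub enum Selection {
    /// Zero-based index into the options list.
    Choice(usize),
    /// Neither a valid number nor an exact label; the caller should prompt again.
    Invalid,
}

/// Accept either a 1-based number or an exact option label.
pub fn parse_selection(input: &str, options: &[&str]) -> Selection {
    let trimmed = input.trim();
    // Numbered choice: 1-based on screen, 0-based internally.
    if let Ok(n) = trimmed.parse::<usize>() {
        if (1..=options.len()).contains(&n) {
            return Selection::Choice(n - 1);
        }
        return Selection::Invalid;
    }
    // Exact (case-sensitive) option label.
    match options.iter().position(|label| *label == trimmed) {
        Some(index) => Selection::Choice(index),
        None => Selection::Invalid,
    }
}
```

Returning `Selection::Invalid` instead of an error keeps the retry loop in the caller, which matches the invalid-selection retry behavior the entry mentions.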
Celia Chen ·
2026-06-22 13:55:32 -07:00 -
Support `openai/form` extended form elicitations (#27500)
# Summary Allow App Server clients to opt into `openai/form` MCP elicitations.
Gabriel Peal ·
2026-06-18 11:54:49 -07:00 -
unified-exec: retain PathUri in command events (#28780)
## Why App-server must report command events containing foreign-platform paths without changing existing client or rollout path-string formats. ## What changed - retain `PathUri` through exec command begin/end events - convert cwd values to `LegacyAppPathString` at the app-server compatibility boundary - drop command actions with foreign paths and log them - serialize rollout-trace cwd values using their inferred native path representation - restore Wine coverage for retained Windows cwd values and successful completion
Adam Perry @ OpenAI ·
2026-06-18 05:00:04 +00:00 -
Scope command approvals by execution environment (#28738)
## Why Command approval cache keys included the command and working directory, but not the execution environment. An approval for `/workspace` locally could therefore be reused for the same command and path on an executor. ## What changed - Include the selected environment ID in shell and unified-exec approval cache keys. - Carry that ID through the normal command approval request so clients can show which environment is being approved. - Expose the environment through app-server as a required nullable `environmentId` and show it in the inline TUI approval prompt. - Keep older recorded approval events compatible when the environment is absent. For example, `echo ok` in local `/workspace` and `echo ok` in executor `/workspace` now produce different approval keys and separate prompts. ## Scope This PR does not change network approvals, Guardian review actions, MCP elicitation, full-screen TUI rendering, or environment-ID validation. Remote `shell_command` execution itself remains in #28722; this PR only makes its approval key environment-aware.
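A minimal sketch of an environment-aware approval key follows. The names `ApprovalKey`, `ApprovalCache`, and `approval_key`, and the example environment IDs, are hypothetical; only the idea of adding the environment ID to the key comes from the entry.

```rust
use std::collections::HashSet;

/// Cache key for a command approval. Including the environment keeps an approval
/// for one environment from being reused in another.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct ApprovalKey {
    pub command: Vec<String>,
    pub cwd: String,
    /// `None` models an absent environment, for example in older recorded events.
    pub environment_id: Option<String>,
}

/// Hypothetical constructor for illustration.
pub fn approval_key(command: &[&str], cwd: &str, environment_id: Option<&str>) -> ApprovalKey {
    ApprovalKey {
        command: command.iter().map(|part| part.to_string()).collect(),
        cwd: cwd.to_string(),
        environment_id: environment_id.map(str::to_string),
    }
}

/// Hypothetical approval cache backed by a set of keys.
#[derive(Default)]
pub struct ApprovalCache {
    approved: HashSet<ApprovalKey>,
}

impl ApprovalCache {
    pub fn approve(&mut self, key: ApprovalKey) {
        self.approved.insert(key);
    }

    pub fn is_approved(&self, key: &ApprovalKey) -> bool {
        self.approved.contains(key)
    }
}
```

With this shape, approving `echo ok` in a local `/workspace` leaves the same command and path on an executor unapproved, so it prompts separately.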
jif ·
2026-06-17 19:52:43 +02:00 -
[codex-app-server-test-client] Plugin Install/Uninstall Analytics Smoke Test (#27100)
## This PR The original [combined remote plugin analytics PR #26281](https://github.com/openai/codex/pull/26281) mixed reusable analytics test infrastructure, two manual smoke workflows, a metadata refactor, and the final identity behavior. This PR adds the account-mutating validation workflow separately so its cleanup and recovery guarantees can be reviewed without the final analytics behavior change. - Add a manually invoked remote plugin install/uninstall smoke workflow. - Require explicit account-mutation confirmation and an initially uninstalled plugin. - Validate the current `codex_plugin_installed` contract, where `plugin_id` is the backend ID. - Restore and verify the original uninstalled state, with a dedicated recovery command. This baseline intentionally does not require `codex_plugin_uninstalled`, because production does not emit that event yet. The final PR will update this smoke to require local `plugin_id`, `remote_plugin_id`, and uninstall emission. Review this PR as the net diff against #27099. ## Testing - `just test -p codex-app-server-test-client` (3 focused capture/validation tests passed) - The live workflow was previously exercised on the green combined reference branch, and the original uninstalled account state was restored. - CI is green across the required platform matrix. ## Split Overview ```text main ├── #27093 Debug analytics capture │ └── #27099 Non-mutating plugin smoke │ └── #27100 Remote install/uninstall smoke ← you are here └── #27102 Plugin telemetry metadata refactor After #27093, #27099, #27100, and #27102 merge: └── Final PR: add remote_plugin_id to plugin analytics ``` Review order and dependencies: 1. [#27093 Add debug-only analytics event capture](https://github.com/openai/codex/pull/27093) (based on `main`) 2. [#27099 Add a plugin analytics smoke workflow](https://github.com/openai/codex/pull/27099) (stacked on #27093) 3. 
[#27100 Add a remote plugin analytics mutation smoke workflow](https://github.com/openai/codex/pull/27100) **(this PR, stacked on #27099)** 4. [#27102 Centralize plugin telemetry metadata construction](https://github.com/openai/codex/pull/27102) (independent, based on `main`) 5. Final remote-ID behavior PR (created after PRs 1-4 merge) The original [#26281](https://github.com/openai/codex/pull/26281) remains open as the green aggregate reference until the final PR is published.
jameswt-oai ·
2026-06-16 12:28:45 -07:00 -
[codex-app-server-test-client & codex-app-server] Plugin Usage Analytics Smoke Test (#27099)
## This PR The original [combined remote plugin analytics PR #26281](https://github.com/openai/codex/pull/26281) mixed reusable analytics test infrastructure, two manual smoke workflows, a metadata refactor, and the final identity behavior. This PR establishes a non-mutating end-to-end plugin smoke workflow before any analytics identity semantics change. - Add `plugin-analytics-smoke` to the existing app-server test client. - Exercise plugin disable, enable, and use through production app-server RPC paths. - Isolate config writes in a temporary file and use a loopback Responses API server. - Capture analytics without sending them to the production analytics backend. - Validate the current local `plugin_id`, names, capability metadata, thread, turn, and model fields. This is intentionally a baseline smoke workflow. It does not assert `remote_plugin_id`; the final PR will update it when that field exists. Review this PR as the net diff against #27093. ## Testing - The test-client target compiles successfully. - The combined reference branch exercised the manual smoke against the live remote plugin service. - CI is green across the required platform matrix. ## Split Overview ```text main ├── #27093 Debug analytics capture │ └── #27099 Non-mutating plugin smoke ← you are here │ └── #27100 Remote install/uninstall smoke └── #27102 Plugin telemetry metadata refactor After #27093, #27099, #27100, and #27102 merge: └── Final PR: add remote_plugin_id to plugin analytics ``` Review order and dependencies: 1. [#27093 Add debug-only analytics event capture](https://github.com/openai/codex/pull/27093) (based on `main`) 2. [#27099 Add a plugin analytics smoke workflow](https://github.com/openai/codex/pull/27099) **(this PR, stacked on #27093)** 3. [#27100 Add a remote plugin analytics mutation smoke workflow](https://github.com/openai/codex/pull/27100) (stacked on this PR) 4. 
[#27102 Centralize plugin telemetry metadata construction](https://github.com/openai/codex/pull/27102) (independent, based on `main`) 5. Final remote-ID behavior PR (created after PRs 1-4 merge) The original [#26281](https://github.com/openai/codex/pull/26281) remains open as the green aggregate reference until the final PR is published.
jameswt-oai ·
2026-06-16 10:11:41 -07:00 -
Expose explicit dynamic tool namespaces in thread start (#27371)
Stacked on #27365. ## Stack note [#27365](https://github.com/openai/codex/pull/27365) kept `thread/start` unchanged and converted its input in `thread_processor`. This PR updates `thread/start` to accept explicit functions and namespaces directly. Legacy per-tool arrays are still accepted and converted while reading the request. As a result, `thread_processor` can validate and pass the tools through directly, which is why some code added in #27365 is removed here. ## Why `thread/start.dynamicTools` still repeats namespace data on each function even though core now stores explicit namespace groups. The request API should use the same shape so each namespace has one description and one member list. ## What changed - Accept top-level functions and explicit namespace objects in `dynamicTools`. - Continue accepting fully legacy flat arrays, including `exposeToContext`. - Reject arrays that mix legacy and canonical entries. - Reuse the protocol types directly and remove the temporary app-server adapter. - Update validation, docs, the test client, and generated schemas. ## Test plan - `just test -p codex-app-server-protocol` - `just test -p codex-app-server dynamic_tool_call_round_trip_sends_text_content_items_to_model` - `just test -p codex-app-server thread_start_normalizes_legacy_dynamic_tools_into_model_request` - `just test -p codex-app-server thread_start_rejects_mixed_dynamic_tool_formats` - `just test -p codex-app-server thread_start_rejects_hidden_dynamic_tools_without_namespace`
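The mixed-format rejection can be sketched as a classification step. The entry shapes below are simplified assumptions (the real protocol types carry more fields), and `DynamicToolEntry`, `ToolFormat`, and `classify_dynamic_tools` are hypothetical names.

```rust
/// Simplified, hypothetical view of one `dynamicTools` entry.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum DynamicToolEntry {
    /// Legacy flat entry that repeats namespace data on each tool.
    Legacy { name: String, namespace: Option<String> },
    /// Canonical top-level function.
    Function { name: String },
    /// Canonical namespace with one description and one member list.
    Namespace { name: String, description: String, members: Vec<String> },
}

#[derive(Debug, PartialEq, Eq)]
pub enum ToolFormat {
    Empty,
    Legacy,
    Canonical,
}

/// Accept fully legacy or fully canonical arrays; reject arrays that mix the two.
pub fn classify_dynamic_tools(entries: &[DynamicToolEntry]) -> Result<ToolFormat, String> {
    let legacy_count = entries
        .iter()
        .filter(|entry| matches!(entry, DynamicToolEntry::Legacy { .. }))
        .count();
    if entries.is_empty() {
        Ok(ToolFormat::Empty)
    } else if legacy_count == entries.len() {
        Ok(ToolFormat::Legacy)
    } else if legacy_count == 0 {
        Ok(ToolFormat::Canonical)
    } else {
        Err("dynamicTools must not mix legacy and canonical entries".to_string())
    }
}
```

Classifying the whole array first means the legacy conversion only ever sees a fully legacy array.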
sayan-oai ·
2026-06-15 15:35:57 +00:00 -
feat(app-server): filter threads by parent (#26662)
## Why Clients that display or coordinate spawned subagents need an authoritative snapshot of a thread's immediate spawned children when they connect to app-server or recover after missing live events. `thread/list` cannot query by parent, so clients must otherwise scan unrelated threads or reconstruct relationships from rollout history and transient events. The direct spawn relationship already exists in persisted `thread_spawn_edges` state. Review and Guardian threads do not participate in that lifecycle and are intentionally outside this filter's scope. ## What changed This adds an experimental `parentThreadId` filter to `thread/list`. Parent-filtered requests return direct spawned children from persisted state while preserving the existing response shape, explicit filters, sorting, and timestamp-only cursor behavior. The lookup does not read rollout transcripts or recursively return descendants. Supersedes #25112 with the narrower `thread/list` filter approach. ## How it works 1. An experimental client passes a valid thread ID as `parentThreadId`. 2. App-server routes the list through the existing thread-store and state-database boundaries. 3. SQLite selects threads whose IDs have a direct persisted spawn edge from that parent. 4. Omitted provider and source filters include all values; explicit filters keep ordinary `thread/list` semantics. 5. Grandchildren, Review threads, and Guardian threads are excluded. ## Verification State (144 tests), rollout (69 tests), and focused app-server thread-list (31 tests) suites passed. Scoped Clippy checks and repository formatting also passed. Coverage includes direct spawned children, omitted grandchildren, pagination, malformed IDs, mixed source kinds, explicit filters, and operation without rollout files.
Brent Traut ·
2026-06-14 00:14:26 -07:00 -
[codex] Add user input client ids (#24653)
## Summary Adds an optional `clientId` field to app-server v2 `UserInput` and carries it through the core `UserInput` model so clients can correlate echoed user input items without relying on payload equality. ## Details - Adds `client_id: Option<String>` to core `UserInput` variants. - Exposes the v2 app-server field as `clientId` on the wire and in generated TypeScript. - Preserves the id when converting between app-server v2 and core protocol types. - Regenerates app-server schema fixtures. ## Validation - `just fmt` - `just write-app-server-schema` - `cargo test -p codex-app-server-protocol` - `cargo test -p codex-protocol` - `just fix -p codex-app-server-protocol` - `just fix -p codex-protocol` - `git diff --check`
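To illustrate why a client-supplied ID helps, here is a sketch of correlating an echoed input with a pending one. It flattens `UserInput` to a single struct for brevity (the entry says the field is added to the core `UserInput` variants), and `match_echo` is a hypothetical helper.

```rust
/// Simplified stand-in for a user input item; the real type has several variants.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct UserInput {
    pub text: String,
    /// Optional client-chosen correlation ID (`clientId` on the wire).
    pub client_id: Option<String>,
}

/// Find the pending item that an echoed input corresponds to, by ID rather than
/// by payload equality. Inputs without an ID never match.
pub fn match_echo<'a>(pending: &'a [UserInput], echoed: &UserInput) -> Option<&'a UserInput> {
    let id = echoed.client_id.as_deref()?;
    pending
        .iter()
        .find(|candidate| candidate.client_id.as_deref() == Some(id))
}
```

Two pending inputs with identical text but different IDs remain distinguishable, which payload equality alone cannot guarantee.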
Alexi Christakis ·
2026-05-28 14:54:39 -07:00 -
[codex] request desktop attestation from app (#20619)
## Summary TL;DR: teaches `codex-rs` / app-server to request a desktop-provided attestation token and attach it as `x-oai-attestation` on the scoped ChatGPT Codex request paths.  ## Details This PR teaches the Codex app-server runtime how to request and attach an attestation token. It does not generate DeviceCheck tokens directly; instead, it relies on the connected desktop app to advertise that it can generate attestation and then asks that app for a fresh header value when needed. The flow is: 1. The Codex desktop app connects to app-server. 2. During `initialize`, the app can advertise that it supports `requestAttestation`. 3. Before app-server calls selected ChatGPT Codex endpoints, it sends the internal server request `attestation/generate` to the app. 4. app-server receives a pre-encoded header value back. 5. app-server forwards that value as `x-oai-attestation` on the scoped outbound requests. The code in this repo is mostly protocol and runtime plumbing: it adds the app-server request/response shape, introduces an attestation provider in core, wires that provider into Responses / compaction / realtime setup paths, and covers the intended scoping with tests. The signed macOS DeviceCheck generation remains owned by the desktop app PR. 
## Related PR - Codex desktop app implementation: https://github.com/openai/openai/pull/878649 ## Validation <details> <summary>Tests run</summary> ```sh cargo test -p codex-app-server-protocol cargo test -p codex-core attestation --lib cargo test -p codex-app-server --lib attestation ``` Also ran: ```sh just fix -p codex-core just fix -p codex-app-server just fix -p codex-app-server-protocol just fmt just write-app-server-schema ``` </details> <details> <summary>E2E DeviceCheck validation</summary> First validated the signed desktop app boundary directly: launched a packaged signed `Codex.app`, sent `attestation/generate`, decoded the returned `v1.` attestation header, and validated the extracted DeviceCheck token with `personal/jm/verify_devicecheck_token.py` using bundle ID `com.openai.codex`. Apple returned `status_code: 200` and `is_ok: true`. Then ran the fuller app + app-server flow. The packaged `Codex.app` launched a current-branch app-server via `CODEX_CLI_PATH`, and a local MITM proxy intercepted outbound `chatgpt.com` traffic. The app-server requested `attestation/generate` from the real Electron app process, and the intercepted `/backend-api/codex/responses` traffic included `x-oai-attestation` on both routes: ```text GET /backend-api/codex/responses Upgrade: websocket x-oai-attestation: present POST /backend-api/codex/responses Upgrade: none x-oai-attestation: present ``` The captured header decoded to a DeviceCheck token that also validated with Apple for `com.openai.codex` (`status_code: 200`, `is_ok: true`, team `2DC432GLL2`). </details> --------- Co-authored-by: Codex <noreply@openai.com>
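The scoping logic in the flow above can be sketched as a pure function. Everything except the `x-oai-attestation` header name is an assumption: `attestation_headers` is a hypothetical helper, the `request_token` closure stands in for the `attestation/generate` round trip, and omitting the header when no token comes back is a guess rather than documented behavior.

```rust
pub const ATTESTATION_HEADER: &str = "x-oai-attestation";

/// Return the extra headers for one outbound request.
///
/// * `client_supports`: the connected app advertised attestation support during `initialize`.
/// * `path_in_scope`: the request targets one of the scoped endpoints.
/// * `request_token`: stand-in for asking the app for a fresh, pre-encoded header value.
pub fn attestation_headers<F>(
    client_supports: bool,
    path_in_scope: bool,
    request_token: F,
) -> Vec<(String, String)>
where
    F: FnOnce() -> Option<String>,
{
    // Only ask the app when the capability exists and the path is in scope.
    if !client_supports || !path_in_scope {
        return Vec::new();
    }
    match request_token() {
        Some(value) => vec![(ATTESTATION_HEADER.to_string(), value)],
        // Assumption: without a token, the header is simply omitted.
        None => Vec::new(),
    }
}
```

Taking the token source as a closure keeps the sketch independent of how the server request is transported.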
Jiaming Zhang ·
2026-05-08 12:36:02 -07:00 -
[codex-analytics] plumb protocol-native review timing (#21434)
## Why We want terminal tool review analytics, but the reducer should not stamp review timing from its own wall clock. This PR plumbs review timing through the real protocol and app-server seams so downstream analytics can consume the emitter's timestamps directly. Guardian reviews keep their enriched `started_at` / `completed_at` analytics fields by deriving those legacy second-based values from the same protocol-native millisecond lifecycle timestamps, rather than sampling a separate analytics clock. ## What changed - add `started_at_ms` to user approval request payloads - add `started_at_ms` / `completed_at_ms` to guardian review notifications - preserve Guardian review `started_at` / `completed_at` enrichment from the protocol-native timing source - stamp typed `ServerResponse` analytics facts with app-server-observed `completed_at_ms` - thread the new timing fields through core, protocol, app-server, TUI, and analytics fixtures ## Verification - `cargo test -p codex-app-server outgoing_message --manifest-path codex-rs/Cargo.toml` - `cargo test -p codex-app-server-protocol guardian --manifest-path codex-rs/Cargo.toml` - `cargo test -p codex-tui guardian --manifest-path codex-rs/Cargo.toml` - `cargo test -p codex-analytics analytics_client_tests --manifest-path codex-rs/Cargo.toml` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/21434). * #18748 * __->__ #21434 * #18747 * #17090 * #17089 * #20514
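Deriving the legacy second-based fields from the millisecond timestamps could look like the sketch below. The `ReviewTiming` type, the `u64` representation, and truncating division are assumptions made for illustration.

```rust
/// Protocol-native lifecycle timestamps for one review, in milliseconds.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ReviewTiming {
    pub started_at_ms: u64,
    pub completed_at_ms: u64,
}

impl ReviewTiming {
    /// Legacy second-based start time, derived from the same source (assumed truncation).
    pub fn started_at(&self) -> u64 {
        self.started_at_ms / 1000
    }

    /// Legacy second-based completion time, derived from the same source.
    pub fn completed_at(&self) -> u64 {
        self.completed_at_ms / 1000
    }

    /// Elapsed review time; saturates at zero if the timestamps are out of order.
    pub fn duration_ms(&self) -> u64 {
        self.completed_at_ms.saturating_sub(self.started_at_ms)
    }
}
```

Because both representations come from one pair of timestamps, the second-based and millisecond-based fields cannot drift apart the way two separately sampled clocks could.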
rhan-oai ·
2026-05-07 20:31:41 -07:00 -
Disable empty Cargo test targets (#21584)
## Summary `cargo test` entails both running standard Rust tests and doctests. It turns out that the doctest discovery is fairly slow, and it's a cost you pay even for crates that don't include any doctests. This PR disables doctests with `doctest = false` for crates that lack any doctests. For the collection of crates below, this speeds up test execution by >4x. E.g., before this PR: ``` Benchmark 1: cargo test -p codex-utils-absolute-path -p codex-utils-cache -p codex-utils-cli -p codex-utils-home-dir -p codex-utils-output-truncation -p codex-utils-path -p codex-utils-string -p codex-utils-template -p codex-utils-elapsed -p codex-utils-json-to-toml Time (mean ± σ): 1.849 s ± 4.455 s [User: 0.752 s, System: 1.367 s] Range (min … max): 0.418 s … 14.529 s 10 runs ``` And after: ``` Benchmark 1: cargo test -p codex-utils-absolute-path -p codex-utils-cache -p codex-utils-cli -p codex-utils-home-dir -p codex-utils-output-truncation -p codex-utils-path -p codex-utils-string -p codex-utils-template -p codex-utils-elapsed -p codex-utils-json-to-toml Time (mean ± σ): 428.6 ms ± 6.9 ms [User: 187.7 ms, System: 219.7 ms] Range (min … max): 418.0 ms … 436.8 ms 10 runs ``` For a single crate, with >2x speedup, before: ``` Benchmark 1: cargo test -p codex-utils-string Time (mean ± σ): 491.1 ms ± 9.0 ms [User: 229.8 ms, System: 234.9 ms] Range (min … max): 480.9 ms … 512.0 ms 10 runs ``` And after: ``` Benchmark 1: cargo test -p codex-utils-string Time (mean ± σ): 213.9 ms ± 4.3 ms [User: 112.8 ms, System: 84.0 ms] Range (min … max): 206.8 ms … 221.0 ms 13 runs ``` Co-authored-by: Codex <noreply@openai.com>
Charlie Marsh ·
2026-05-07 15:44:17 -07:00 -
Update Codex login success page UX (#20136)
## Summary - update the local login success page to match the Codex desktop auth UX - use theme-aware colors and an inline 20px Codex mark - keep the actual localhost success page aligned with the browser auth UX PR ## Tests <img width="1728" height="1117" alt="Screenshot 2026-04-29 at 12 00 34 PM" src="https://github.com/user-attachments/assets/76a40c3f-07c3-452c-97da-e7c43717cd2c" />
rafael-jac ·
2026-04-29 19:14:53 -04:00 -
permissions: remove legacy read-only access modes (#19449)
## Why `ReadOnlyAccess` was a transitional legacy shape on `SandboxPolicy`: `FullAccess` meant the historical read-only/workspace-write modes could read the full filesystem, while `Restricted` tried to carry partial readable roots. The partial-read model now belongs in `FileSystemSandboxPolicy` and `PermissionProfile`, so keeping it on `SandboxPolicy` makes every legacy projection reintroduce lossy read-root bookkeeping and creates unnecessary noise in the rest of the permissions migration. This PR makes the legacy policy model narrower and explicit: `SandboxPolicy::ReadOnly` and `SandboxPolicy::WorkspaceWrite` represent the old full-read sandbox modes only. Split readable roots, deny-read globs, and platform-default/minimal read behavior stay in the runtime permissions model. ## What changed - Removes `ReadOnlyAccess` from `codex_protocol::protocol::SandboxPolicy`, including the generated `access` and `readOnlyAccess` API fields. - Updates legacy policy/profile conversions so restricted filesystem reads are represented only by `FileSystemSandboxPolicy` / `PermissionProfile` entries. - Keeps app-server v2 compatible with legacy `fullAccess` read-access payloads by accepting and ignoring that no-op shape, while rejecting legacy `restricted` read-access payloads instead of silently widening them to full-read legacy policies. - Carries Windows sandbox platform-default read behavior with an explicit override flag instead of depending on `ReadOnlyAccess::Restricted`. - Refreshes generated app-server schema/types and updates tests/docs for the simplified legacy policy shape. ## Verification - `cargo check -p codex-app-server-protocol --tests` - `cargo check -p codex-windows-sandbox --tests` - `cargo test -p codex-app-server-protocol sandbox_policy_` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19449). 
* #19395 * #19394 * #19393 * #19392 * #19391 * __->__ #19449
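The compatibility rule for legacy read-access payloads in the permissions entry above can be sketched as follows. `LegacyReadAccess` and `check_legacy_read_access` are hypothetical names, and the error text is invented for illustration.

```rust
/// Hypothetical model of the legacy read-access payload shapes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LegacyReadAccess {
    FullAccess,
    Restricted,
}

/// Accept and ignore the no-op `fullAccess` shape; reject `restricted` rather than
/// silently widening it to a full-read legacy policy.
pub fn check_legacy_read_access(access: Option<LegacyReadAccess>) -> Result<(), String> {
    match access {
        // Absent or full access: nothing to carry forward.
        None | Some(LegacyReadAccess::FullAccess) => Ok(()),
        Some(LegacyReadAccess::Restricted) => Err(
            "legacy restricted read access is not supported; use a permission profile".to_string(),
        ),
    }
}
```

Rejecting the restricted shape makes the loss of information visible to the caller instead of hiding it behind a broader policy.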
Michael Bolin ·
2026-04-24 17:16:58 -07:00 -
Support multiple cwd filters for thread list (#18502)
## Summary - Teach app-server `thread/list` to accept either a single `cwd` or an array of cwd filters, returning threads whose recorded session cwd matches any requested path - Add `useStateDbOnly` as an explicit opt-in fast path for callers that want to answer `thread/list` from SQLite without scanning JSONL rollout files - Preserve backwards compatibility: by default, `thread/list` still scans JSONL rollouts and repairs SQLite state - Wire the new cwd array and SQLite-only options through app-server, local/remote thread-store, rollout listing, generated TypeScript/schema fixtures, proto output, and docs ## Test Plan - `cargo test -p codex-app-server-protocol` - `cargo test -p codex-rollout` - `cargo test -p codex-thread-store` - `cargo test -p codex-app-server thread_list` - `just fmt` - `just fix -p codex-app-server-protocol -p codex-rollout -p codex-thread-store -p codex-app-server` - `cargo build -p codex-cli --bin codex`
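The single-or-array `cwd` filter can be sketched as an enum with "match any" semantics. `CwdFilter` is a hypothetical name, and comparing paths as exact strings is an assumption; the real implementation may normalize paths.

```rust
/// Either one cwd or several, mirroring the single-value-or-array request shape.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum CwdFilter {
    One(String),
    Many(Vec<String>),
}

impl CwdFilter {
    /// A thread matches when its recorded session cwd equals any requested path.
    /// Paths are compared as exact strings in this sketch.
    pub fn matches(&self, session_cwd: &str) -> bool {
        match self {
            CwdFilter::One(path) => path.as_str() == session_cwd,
            CwdFilter::Many(paths) => paths.iter().any(|path| path.as_str() == session_cwd),
        }
    }
}
```

Keeping the single-value case as its own variant lets existing callers that send one `cwd` behave exactly as before.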
acrognale-oai ·
2026-04-22 06:10:09 -04:00 -
Add sorting/backwardsCursor to thread/list and new thread/turns/list api (#17305)
To improve performance of UI loads from the app, add two main improvements: 1. The `thread/list` api now gets a `sortDirection` request field and a `backwardsCursor` to the response, which lets you paginate forwards and backwards from a window. This lets you fetch the first few items to display immediately while you paginate to fill in history, then can paginate "backwards" on future loads to catch up with any changes since the last UI load without a full reload of the entire data set. 2. Added a new `thread/turns/list` api which also has sortDirection and backwardsCursor for the same behavior as `thread/list`, allowing you the same small-fetch for immediate display followed by background fill-in and resync catchup.
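One way to picture forward and backward pagination is the sketch below. The entry does not spell out the cursor semantics, so everything here is an assumption: cursors are plain timestamps, boundaries are exclusive, timestamps are unique, and `list_page`, `Page`, and `SortDirection` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SortDirection {
    Asc,
    Desc,
}

#[derive(Debug, PartialEq, Eq)]
pub struct Page {
    pub data: Vec<u64>,
    /// Pass back with the same direction to continue past this page.
    pub next_cursor: Option<u64>,
    /// Pass back with the opposite direction to fetch items on the other side of this page.
    pub backwards_cursor: Option<u64>,
}

/// List up to `limit` timestamps strictly past `cursor` in the given direction.
pub fn list_page(
    timestamps: &[u64],
    direction: SortDirection,
    cursor: Option<u64>,
    limit: usize,
) -> Page {
    let mut sorted: Vec<u64> = timestamps.to_vec();
    sorted.sort_unstable();
    if direction == SortDirection::Desc {
        sorted.reverse();
    }

    // Keep only items strictly past the cursor in the requested direction.
    let remaining: Vec<u64> = sorted
        .into_iter()
        .filter(|&t| match (cursor, direction) {
            (None, _) => true,
            (Some(c), SortDirection::Asc) => t > c,
            (Some(c), SortDirection::Desc) => t < c,
        })
        .collect();

    let data: Vec<u64> = remaining.iter().copied().take(limit).collect();
    let next_cursor = if remaining.len() > data.len() {
        data.last().copied()
    } else {
        None
    };
    let backwards_cursor = data.first().copied();
    Page { data, next_cursor, backwards_cursor }
}
```

In this model a UI loads the newest few items with `Desc`, keeps the returned `backwards_cursor`, and on a later load calls `list_page` with `Asc` and that cursor to pick up only the items that arrived since.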
David de Regt ·
2026-04-17 11:49:02 -07:00 -
Add ChatGPT device-code login to app server (#15525)
## Problem App-server clients could only initiate ChatGPT login through the browser callback flow, even though the shared login crate already supports device-code auth. That left VS Code, Codex App, and other app-server clients without a first-class way to use the existing device-code backend when browser redirects are brittle or when the client UX wants to own the login ceremony. ## Mental model This change adds a second ChatGPT login start path to app-server: clients can now call `account/login/start` with `type: "chatgptDeviceCode"`. App-server immediately returns a `loginId` plus the device-code UX payload (`verificationUrl` and `userCode`), then completes the login asynchronously in the background using the existing `codex_login` polling flow. Successful device-code login still resolves to ordinary `chatgpt` auth, and completion continues to flow through the existing `account/login/completed` and `account/updated` notifications. ## Non-goals This does not introduce a new auth mode, a new account shape, or a device-code eligibility discovery API. It also does not add automatic fallback to browser login in core; clients remain responsible for choosing when to request device code and whether to retry with a different UX if the backend/admin policy rejects it. ## Tradeoffs We intentionally keep `login_chatgpt_common` as a local validation helper instead of turning it into a capability probe. Device-code eligibility is checked by actually calling `request_device_code`, which means policy-disabled cases surface as an immediate request error rather than an async completion event. We also keep the active-login state machine minimal: browser and device-code logins share the same public cancel contract, but device-code cancellation is implemented with a local cancel token rather than a larger cross-crate refactor. ## Architecture The protocol grows a new `chatgptDeviceCode` request/response variant in app-server v2. 
On the server side, the new handler reuses the existing ChatGPT login precondition checks, calls `request_device_code`, returns the device-code payload, and then spawns a background task that waits on either cancellation or `complete_device_code_login`. On success, it reuses the existing auth reload and cloud-requirements refresh path before emitting `account/login/completed` success and `account/updated`. On failure or cancellation, it emits only `account/login/completed` failure. The existing `account/login/cancel { loginId }` contract remains unchanged and now works for both browser and device-code attempts. ## Tests Added protocol serialization coverage for the new request/response variant, plus app-server tests for device-code success, failure, cancel, and start-time rejection behavior. Existing browser ChatGPT login coverage remains in place to show that the callback-based flow is unchanged.
daniel-oai ·
2026-03-27 00:27:15 -07:00 -
chore: remove skill metadata from command approval payloads (#15906)
## Why This is effectively a follow-up to [#15812](https://github.com/openai/codex/pull/15812). That change removed the special skill-script exec path, but `skill_metadata` was still being threaded through command-approval payloads even though the approval flow no longer uses it to render prompts or resolve decisions. Keeping it around added extra protocol, schema, and client surface area without changing behavior. Removing it keeps the command-approval contract smaller and avoids carrying a dead field through app-server, TUI, and MCP boundaries. ## What changed - removed `ExecApprovalRequestSkillMetadata` and the corresponding `skillMetadata` field from core approval events and the v2 app-server protocol - removed the generated JSON and TypeScript schema output for that field - updated app-server, MCP server, TUI, and TUI app-server approval plumbing to stop forwarding the field - cleaned up tests that previously constructed or asserted `skillMetadata` ## Testing - `cargo test -p codex-app-server-protocol` - `cargo test -p codex-protocol` - `cargo test -p codex-app-server-test-client` - `cargo test -p codex-mcp-server` - `just argument-comment-lint`
Michael Bolin ·
2026-03-26 15:32:03 -07:00 -
Apply argument comment lint across codex-rs (#14652)
## Why Once the repo-local lint exists, `codex-rs` needs to follow the checked-in convention and CI needs to keep it from drifting. This commit applies the fallback `/*param*/` style consistently across existing positional literal call sites without changing those APIs. The longer-term preference is still to avoid APIs that require comments by choosing clearer parameter types and call shapes. This PR is intentionally the mechanical follow-through for the places where the existing signatures stay in place. After rebasing onto newer `main`, the rollout also had to cover newly introduced `tui_app_server` call sites. That made it clear the first cut of the CI job was too expensive for the common path: it was spending almost as much time installing `cargo-dylint` and re-testing the lint crate as a representative test job spends running product tests. The CI update keeps the full workspace enforcement but trims that extra overhead from ordinary `codex-rs` PRs. ## What changed - keep a dedicated `argument_comment_lint` job in `rust-ci` - mechanically annotate remaining opaque positional literals across `codex-rs` with exact `/*param*/` comments, including the rebased `tui_app_server` call sites that now fall under the lint - keep the checked-in style aligned with the lint policy by using `/*param*/` and leaving string and char literals uncommented - cache `cargo-dylint`, `dylint-link`, and the relevant Cargo registry/git metadata in the lint job - split changed-path detection so the lint crate's own `cargo test` step runs only when `tools/argument-comment-lint/*` or `rust-ci.yml` changes - continue to run the repo wrapper over the `codex-rs` workspace, so product-code enforcement is unchanged Most of the code changes in this commit are intentionally mechanical comment rewrites or insertions driven by the lint itself. 
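As an illustration of the convention described above, here is a minimal sketch of a call site annotated in the `/*param*/` style. The `pad_label` function and its parameters are hypothetical and not taken from `codex-rs`; the only point is that opaque positional literals (numbers, booleans) carry the parameter name, while string and char literals are left uncommented.

```rust
/// Hypothetical helper whose positional parameters are opaque at the call site.
fn pad_label(label: &str, width: usize, uppercase: bool, fill: char) -> String {
    let mut text = if uppercase {
        label.to_uppercase()
    } else {
        label.to_string()
    };
    // Pad on the right until the label is `width` characters wide.
    let missing = width.saturating_sub(text.chars().count());
    text.extend(std::iter::repeat(fill).take(missing));
    text
}

fn main() {
    // Numeric and boolean literals get a `/*param*/` comment naming the parameter;
    // the string and char literals are left uncommented.
    let row = pad_label("id", /*width*/ 6, /*uppercase*/ true, '.');
    println!("{row}");
}
```

Without the comments, `pad_label("id", 6, true, '.')` forces the reader to look up the signature to learn what `6` and `true` mean, which is the problem the lint targets.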
## Verification - `./tools/argument-comment-lint/run.sh --workspace` - `cargo test -p codex-tui-app-server -p codex-tui` - parsed `.github/workflows/rust-ci.yml` locally with PyYAML --- * -> #14652 * #14651
Michael Bolin ·
2026-03-16 16:48:15 -07:00 -
chore(app-server): stop emitting codex/event/ notifications (#14392)
## Description This PR stops emitting legacy `codex/event/*` notifications from the public app-server transports. It's been a long time coming! app-server was still producing a raw notification stream from core, alongside the typed app-server notifications and server requests, for compatibility reasons. Now, external clients should no longer be depending on those legacy notifications, so this change removes them from the stdio and websocket contract and updates the surrounding docs, examples, and tests to match. ### Caveat I left the "in-process" version of app-server alone for now, since `codex exec` was recently based on top of app-server via this in-process form here: https://github.com/openai/codex/pull/14005 It seems that `codex exec` still consumes some legacy notifications internally, so this branch only removes `codex/event/*` from app-server over stdio and websockets. ## Follow-up Once `codex exec` is fully migrated off `codex/event/*` notifications, we'll be able to stop emitting them entirely instead of just filtering them at the external transport boundary.
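The transport-boundary filtering described above can be sketched roughly as follows. This is not the app-server's actual code: the `Transport` enum, the `should_forward` function, and the example method names are all hypothetical, and the sketch only assumes that legacy notifications are recognizable by the `codex/event/` method prefix.

```rust
/// Transports named in the description above; the enum itself is hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Transport {
    Stdio,
    WebSocket,
    InProcess,
}

/// Returns true when a notification with this method should be delivered.
fn should_forward(transport: Transport, method: &str) -> bool {
    let is_legacy = method.starts_with("codex/event/");
    // Legacy notifications are dropped on external transports only; the
    // in-process form still receives them.
    !is_legacy || transport == Transport::InProcess
}

fn main() {
    // The method names below are made-up examples.
    for transport in [Transport::Stdio, Transport::WebSocket, Transport::InProcess] {
        println!(
            "{:?}: legacy={} typed={}",
            transport,
            should_forward(transport, "codex/event/task_started"),
            should_forward(transport, "turn/started"),
        );
    }
}
```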
Owen Lin ·
2026-03-12 00:45:20 +00:00 -
fix(otel): make HTTP trace export survive app-server runtimes (#14300)
## Summary This PR fixes OTLP HTTP trace export in runtimes where the previous exporter setup was unreliable, especially around app-server usage. It also removes the old `codex_otel::otel_provider` compatibility shim and switches remaining call sites over to the crate-root `codex_otel::OtelProvider` export. ## What changed - Use a runtime-safe OTLP HTTP trace exporter path for Tokio runtimes. - Add an async HTTP client path for trace export when we are already inside a multi-thread Tokio runtime. - Make provider shutdown flush traces before tearing down the tracer provider. - Add loopback coverage that verifies traces are actually sent to `/v1/traces`: - outside Tokio - inside a multi-thread Tokio runtime - inside a current-thread Tokio runtime - Remove the `codex_otel::otel_provider` shim and update remaining imports. ## Why I hit cases where spans were being created correctly but never made it to the collector. The issue turned out to be in exporter/runtime behavior rather than the span plumbing itself. This PR narrows that gap and gives us regression coverage for the actual export path.
Owen Lin ·
2026-03-11 12:33:10 -07:00 -
Implemented thread-level atomic elicitation counter for stopwatch pausing (#12296)
### Purpose While trying to build out CLI tools for the agent to use under skills, we have found that those tools sometimes need to invoke a user elicitation. These elicitations are handled out of band of the codex app-server but need to indicate to the exec manager that the running command is not going to progress on the usual timeout horizon. ### Example Model calls universal exec: `$ download-credit-card-history --start-date 2026-01-19 --end-date 2026-02-19 > credit_history.jsonl` `download-credit-card-history` might hit a hosted/preauthenticated service to fetch data. That service might decide that the request requires end-user approval for access to the personal data. It should be able to signal to the running thread that the command in question is blocked on user elicitation. In that case we want the exec to continue but the timeout on the tool call not to expire, essentially freezing time until the user approves or rejects the command, at which point the tool would signal the app-server to decrement the outstanding elicitation count. Timeouts would then proceed as normal. ### What's Added - New v2 RPC methods: - thread/increment_elicitation - thread/decrement_elicitation - Protocol updates in: - codex-rs/app-server-protocol/src/protocol/common.rs - codex-rs/app-server-protocol/src/protocol/v2.rs - App-server handlers wired in: - codex-rs/app-server/src/codex_message_processor.rs ### Behavior - Counter starts at 0 per thread. - increment atomically increases the counter. - decrement atomically decreases the counter; decrement at 0 returns invalid request. - Transition rules: - 0 -> 1: broadcast pause state, pausing all active stopwatches immediately. - \>0 -> >0: remain paused. - 1 -> 0: broadcast unpause state, resuming stopwatches. 
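The counter behavior listed above can be sketched with a standard-library atomic. This is a minimal illustration rather than the PR's implementation: `ElicitationCounter`, `Transition`, and `InvalidRequest` are hypothetical names, and the real code broadcasts pause state to stopwatches instead of returning a value.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// What a caller should broadcast after a counter change (hypothetical names).
#[derive(Debug, PartialEq, Eq)]
enum Transition {
    Paused,      // 0 -> 1: pause all active stopwatches
    StillPaused, // >0 -> >0: nothing new to broadcast
    Resumed,     // 1 -> 0: resume stopwatches
}

/// Stand-in for the "invalid request" error returned when decrementing at 0.
#[derive(Debug, PartialEq, Eq)]
struct InvalidRequest;

/// Per-thread count of outstanding elicitations; starts at 0.
#[derive(Default)]
struct ElicitationCounter {
    outstanding: AtomicU64,
}

impl ElicitationCounter {
    fn increment(&self) -> Transition {
        let previous = self.outstanding.fetch_add(1, Ordering::SeqCst);
        if previous == 0 {
            Transition::Paused
        } else {
            Transition::StillPaused
        }
    }

    fn decrement(&self) -> Result<Transition, InvalidRequest> {
        // `checked_sub` yields `None` at 0, so the stored value is left untouched.
        let previous = self
            .outstanding
            .fetch_update(Ordering::SeqCst, Ordering::SeqCst, |n| n.checked_sub(1))
            .map_err(|_| InvalidRequest)?;
        Ok(if previous == 1 {
            Transition::Resumed
        } else {
            Transition::StillPaused
        })
    }
}

fn main() {
    let counter = ElicitationCounter::default();
    println!("{:?}", counter.increment());
    println!("{:?}", counter.increment());
    println!("{:?}", counter.decrement());
    println!("{:?}", counter.decrement());
    println!("{:?}", counter.decrement());
}
```

Deriving the transition from the previous value returned by the atomic operation means two concurrent callers cannot both observe the 0 -> 1 or 1 -> 0 edge.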
- Core thread/session logic: - codex-rs/core/src/codex_thread.rs - codex-rs/core/src/codex.rs - codex-rs/core/src/mcp_connection_manager.rs ### Exec-server stopwatch integration - Added centralized stopwatch tracking/controller: - codex-rs/exec-server/src/posix/stopwatch_controller.rs - Hooked pause/unpause broadcast handling + stopwatch registration: - codex-rs/exec-server/src/posix/mcp.rs - codex-rs/exec-server/src/posix/stopwatch.rs - codex-rs/exec-server/src/posix.rs
Channing Conger ·
2026-03-09 22:29:26 -07:00 -
app-server: include experimental skill metadata in exec approval requests (#13929)
## Summary This change surfaces skill metadata on command approval requests so app-server clients can tell when an approval came from a skill script and identify the originating `SKILL.md`. - add `skill_metadata` to exec approval events in the shared protocol - thread skill metadata through core shell escalation and delegated approval handling for skill-triggered approvals - expose the field in app-server v2 as experimental `skillMetadata` - regenerate the JSON/TypeScript schemas and cover the new field in protocol, transport, core, and TUI tests ## Why Skill-triggered approvals already carry skill context inside core, but app-server clients could not see which skill caused the prompt. Sending the skill metadata with the approval request makes it possible for clients to present better approval UX and connect the prompt back to the relevant skill definition. ## example event in app-server-v2 verified that we see this event when experimental api is on: ``` < { < "id": 11, < "method": "item/commandExecution/requestApproval", < "params": { < "additionalPermissions": { < "fileSystem": null, < "macos": { < "accessibility": false, < "automations": { < "bundle_ids": [ < "com.apple.Notes" < ] < }, < "calendar": false, < "preferences": "read_only" < }, < "network": null < }, < "approvalId": "25d600ee-5a3c-4746-8d17-e2e61fb4c563", < "availableDecisions": [ < "accept", < "acceptForSession", < "cancel" < ], < "command": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/scripts/notes_info", < "commandActions": [ < { < "command": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/scripts/notes_info", < "type": "unknown" < } < ], < "cwd": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes", < "itemId": "call_jZp3xFpNg4D8iKAD49cvEvZy", 
< "skillMetadata": { < "pathToSkillsMd": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/SKILL.md" < }, < "threadId": "019ccc10-b7d3-7ff2-84fe-3a75e7681e69", < "turnId": "019ccc10-b848-76f1-81b3-4a1fa225493f" < } < }` ``` & verified that this is the event when experimental api is off: ``` < { < "id": 13, < "method": "item/commandExecution/requestApproval", < "params": { < "approvalId": "5fbbf776-261b-4cf8-899b-c125b547f2c0", < "availableDecisions": [ < "accept", < "acceptForSession", < "cancel" < ], < "command": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/scripts/notes_info", < "commandActions": [ < { < "command": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/scripts/notes_info", < "type": "unknown" < } < ], < "cwd": "/Users/celia/code/codex/codex-rs", < "itemId": "call_OV2DHzTgYcbYtWaTTBWlocOt", < "threadId": "019ccc16-2a2b-7be1-8500-e00d45b892d4", < "turnId": "019ccc16-2a8e-7961-98ec-649600e7d06a" < } < } ```Celia Chen ·
2026-03-08 18:07:46 -07:00 -
app-server: Add streaming and tty/pty capabilities to `command/exec` (#13640)
* Add an ability to stream stdin, stdout, and stderr
* Streaming of stdout and stderr has a configurable cap on the total number of transmitted bytes (with an ability to disable it)
* Add support for overriding environment variables
* Add an ability to terminate running applications (using `command/exec/terminate`)
* Add TTY/PTY support, with an ability to resize the terminal (using `command/exec/resize`)
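One way to picture the configurable output cap is a small byte budget that truncates chunks once it is exhausted. The sketch below is an assumption-laden illustration, not the app-server's code: `OutputCap` and `admit` are hypothetical names, and whether the real cap is shared between stdout and stderr or tracked per stream is not stated here.

```rust
/// Tracks how many output bytes may still be streamed to the client.
/// `None` models the "cap disabled" setting.
struct OutputCap {
    remaining: Option<usize>,
}

impl OutputCap {
    /// Returns the prefix of `chunk` that still fits under the cap.
    fn admit<'a>(&mut self, chunk: &'a [u8]) -> &'a [u8] {
        match self.remaining.as_mut() {
            None => chunk,
            Some(remaining) => {
                let allowed = chunk.len().min(*remaining);
                *remaining -= allowed;
                &chunk[..allowed]
            }
        }
    }
}

fn main() {
    let mut cap = OutputCap { remaining: Some(8) };
    for chunk in [&b"hello "[..], &b"world"[..], &b"!"[..]] {
        let sent = cap.admit(chunk);
        println!("{:?}", String::from_utf8_lossy(sent));
    }
}
```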
Ruslan Nigmatullin ·
2026-03-06 17:30:17 -08:00 -
feat(app-server-test-client): OTEL setup for tracing (#13493)
### Overview This PR: - Updates `app-server-test-client` to load OTEL settings from `$CODEX_HOME/config.toml` and initialize its own OTEL provider. - Adds real client root spans to app-server test client traces. This updates `codex-app-server-test-client` so its Datadog traces reflect the full client-driven flow instead of a set of server spans stitched together under a synthetic parent. Before this change, the test client generated a fake `traceparent` once and reused it for every JSON-RPC request. That kept the requests in one trace, but there was no real client span at the top, so Datadog ended up showing the sequence in a slightly misleading way, where all RPCs were anchored under `initialize`. Now the test client: - loads OTEL settings from the normal Codex config path, including `$CODEX_HOME/config.toml` and existing `--config` overrides - initializes tracing the same way other Codex binaries do when trace export is enabled - creates a real client root span for each scripted command - creates per-request client spans for JSON-RPC methods like `initialize`, `thread/start`, and `turn/start` - injects W3C trace context from the current client span into `request.trace` instead of reusing a fabricated carrier This gives us a cleaner trace shape in Datadog: - one trace URL for the whole scripted flow - a visible client root span - proper client/server parent-child relationships for each app-server request
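To make the per-request trace context concrete, here is a minimal sketch of formatting a W3C `traceparent` value (`version-traceid-parentid-flags`) with one trace id shared by the whole flow and a distinct span id per request. The `traceparent` helper and the id values are illustrative assumptions; the actual client derives these from its OTEL spans rather than formatting them by hand.

```rust
/// Formats a W3C `traceparent` value: version-traceid-parentid-flags.
fn traceparent(trace_id: u128, span_id: u64, sampled: bool) -> String {
    let flags: u8 = if sampled { 0x01 } else { 0x00 };
    format!("00-{:032x}-{:016x}-{:02x}", trace_id, span_id, flags)
}

fn main() {
    // One trace id for the whole scripted flow; each request carries the id of
    // its own client span. All ids here are made-up illustration values.
    let trace_id: u128 = 0x4bf92f3577b34da6a3ce929d0e0e4736;
    let requests = [
        ("initialize", 0x00f067aa0ba902b7_u64),
        ("thread/start", 0x1111111111111111),
        ("turn/start", 0x2222222222222222),
    ];
    for (method, span_id) in requests {
        println!("{}: {}", method, traceparent(trace_id, span_id, true));
    }
}
```

Because every request shares the trace id but carries its own parent span id, the server spans attach to the matching client span instead of all hanging off the first request.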
Owen Lin ·
2026-03-04 13:30:09 -08:00 -
Feat: Preserve network access on read-only sandbox policies (#13409)
## Summary `PermissionProfile.network` could not be preserved when additional or compiled permissions resolved to `SandboxPolicy::ReadOnly`, because `ReadOnly` had no `network_access` field. This change makes read-only + network enabled representable directly and threads that through the protocol, app-server v2 mirror, and permission-merging logic. ## What changed - Added `network_access: bool` to `SandboxPolicy::ReadOnly` in the core protocol and app-server v2 protocol. - Kept backward compatibility by defaulting the new field to `false`, so legacy read-only payloads still deserialize unchanged. - Updated `has_full_network_access()` and sandbox summaries to respect read-only network access. - Preserved `PermissionProfile.network` when: - compiling skill permission profiles into sandbox policies - normalizing additional permissions - merging additional permissions into existing sandbox policies - Updated the approval overlay to show network in the rendered permission rule when requested. - Regenerated app-server schema fixtures for the new v2 wire shape.
Celia Chen ·
2026-03-04 02:41:57 +00:00 -
chore(app-server): delete v1 RPC methods and notifications (#13375)
## Summary This removes the old app-server v1 methods and notifications we no longer need, while keeping the small set the main codex app client still depends on for now. The remaining legacy surface is: - `initialize` - `getConversationSummary` - `getAuthStatus` - `gitDiffToRemote` - `fuzzyFileSearch` - `fuzzyFileSearch/sessionStart` - `fuzzyFileSearch/sessionUpdate` - `fuzzyFileSearch/sessionStop` And the raw `codex/event/*` notifications emitted from core. These notifications will be removed in a followup PR. ## What changed - removed deprecated v1 request variants from the protocol and app-server dispatcher - removed deprecated typed notifications: `authStatusChange`, `loginChatGptComplete`, and `sessionConfigured` - updated the app-server test client to use v2 flows instead of deleted v1 flows - deleted legacy-only app-server test suites and added focused coverage for `getConversationSummary` - regenerated app-server schema fixtures and updated the MCP interface docs to match the remaining compatibility surface ## Testing - `just write-app-server-schema` - `cargo test -p codex-app-server-protocol` - `cargo test -p codex-app-server`
Owen Lin ·
2026-03-03 13:18:25 -08:00 -
Owen Lin ·
2026-03-02 17:24:48 -08:00 -
app-server: Add an ability to watch events in the test client (#13080)
Add a `watch` subcommand to `codex-app-server-test-client` binary to help in manual testing of events flow.
Ruslan Nigmatullin ·
2026-02-27 17:19:53 -08:00 -
feat: include available decisions in command approval requests (#12758)
Command-approval clients currently infer which choices to show from side-channel fields like `networkApprovalContext`, `proposedExecpolicyAmendment`, and `additionalPermissions`. That makes the request shape harder to evolve, and it forces each client to replicate the server's heuristics instead of receiving the exact decision list for the prompt. This PR introduces a mapping between `CommandExecutionApprovalDecision` and `codex_protocol::protocol::ReviewDecision`:

```rust
impl From<CoreReviewDecision> for CommandExecutionApprovalDecision {
    fn from(value: CoreReviewDecision) -> Self {
        match value {
            CoreReviewDecision::Approved => Self::Accept,
            CoreReviewDecision::ApprovedExecpolicyAmendment {
                proposed_execpolicy_amendment,
            } => Self::AcceptWithExecpolicyAmendment {
                execpolicy_amendment: proposed_execpolicy_amendment.into(),
            },
            CoreReviewDecision::ApprovedForSession => Self::AcceptForSession,
            CoreReviewDecision::NetworkPolicyAmendment {
                network_policy_amendment,
            } => Self::ApplyNetworkPolicyAmendment {
                network_policy_amendment: network_policy_amendment.into(),
            },
            CoreReviewDecision::Abort => Self::Cancel,
            CoreReviewDecision::Denied => Self::Decline,
        }
    }
}
```

It also updates `CommandExecutionRequestApprovalParams` to have a new field:

```rust
available_decisions: Option<Vec<CommandExecutionApprovalDecision>>
```

which, if specified, should make it easier for clients to display an appropriate list of options in the UI. This makes it possible for `CoreShellActionProvider::prompt()` in `unix_escalation.rs` to specify the `Vec<ReviewDecision>` directly, adding support for `ApprovedForSession` when approving a skill script, which was previously missing in the TUI. Note this results in a significant change to `exec_options()` in `approval_overlay.rs`, as the displayed options are now derived from `available_decisions: &[ReviewDecision]`.
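From a client's point of view, consuming the optional decision list might look like the sketch below. It is only an illustration: `Decision` is a simplified stand-in for `CommandExecutionApprovalDecision` without the payload-carrying variants, `choices_to_render` is a hypothetical helper, and the fallback set used when the field is missing is an assumption, since the real legacy defaults are not spelled out in this description.

```rust
/// Simplified stand-in for `CommandExecutionApprovalDecision` (payload-free variants only).
#[derive(Debug, Clone, PartialEq, Eq)]
enum Decision {
    Accept,
    AcceptForSession,
    Decline,
    Cancel,
}

/// Picks the choices to render, in the order the server sent them.
fn choices_to_render(available_decisions: Option<&[Decision]>) -> Vec<Decision> {
    match available_decisions {
        Some(decisions) => decisions.to_vec(),
        // Assumed legacy fallback for servers that omit the field; the real
        // default set may differ.
        None => vec![Decision::Accept, Decision::Decline, Decision::Cancel],
    }
}

fn main() {
    let from_server = [Decision::Accept, Decision::AcceptForSession, Decision::Cancel];
    println!("{:?}", choices_to_render(Some(&from_server[..])));
    println!("{:?}", choices_to_render(None));
}
```

The client no longer inspects side-channel fields to guess the options; it renders what it was given and only falls back when talking to an older server.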
## What Changed - Add `available_decisions` to [`ExecApprovalRequestEvent`](https://github.com/openai/codex/blob/de00e932dd9801de0a4faac0519162099753f331/codex-rs/protocol/src/approvals.rs#L111-L175), including helpers to derive the legacy default choices when older senders omit the field. - Map `codex_protocol::protocol::ReviewDecision` to app-server `CommandExecutionApprovalDecision` and expose the ordered list as experimental `availableDecisions` in [`CommandExecutionRequestApprovalParams`](https://github.com/openai/codex/blob/de00e932dd9801de0a4faac0519162099753f331/codex-rs/app-server-protocol/src/protocol/v2.rs#L3798-L3807). - Thread optional `available_decisions` through the core approval path so Unix shell escalation can explicitly request `ApprovedForSession` for session-scoped approvals instead of relying on client heuristics. [`unix_escalation.rs`](https://github.com/openai/codex/blob/de00e932dd9801de0a4faac0519162099753f331/codex-rs/core/src/tools/runtimes/shell/unix_escalation.rs#L194-L214) - Update the TUI approval overlay to build its buttons from the ordered decision list, while preserving the legacy fallback when `available_decisions` is missing. - Update the app-server README, test client output, and generated schema artifacts to document and surface the new field. ## Testing - Add `approval_overlay.rs` coverage for explicit decision lists, including the generic `ApprovedForSession` path and network approval options. - Update `chatwidget/tests.rs` and app-server protocol tests to populate the new optional field and keep older event shapes working. ## Developers Docs - If we document `item/commandExecution/requestApproval` on [developers.openai.com/codex](https://developers.openai.com/codex), add experimental `availableDecisions` as the preferred source of approval choices and note that older servers may omit it.
Michael Bolin ·
2026-02-26 01:10:46 +00:00 -
Revert "Add skill approval event/response (#12633)" (#12811)
This reverts commit https://github.com/openai/codex/pull/12633. We no longer need this PR, because we now favor sending a normal exec command approval server request that carries the skill's permissions in `additional_permissions` instead.
Celia Chen ·
2026-02-26 01:02:42 +00:00 -
feat: add search term to thread list (#12578)
Add `searchTerm` to `thread/list`, which searches for a match in the thread titles (the condition being `searchTerm` $\in$ `title`, i.e. the term must occur within the title).
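The matching condition can be pictured with a small in-memory filter. This is a sketch under assumptions: the real filter runs in the thread store rather than over a slice, `filter_titles` is a hypothetical helper, and treating the match as a case-sensitive substring test is a guess, since case handling is not specified here.

```rust
/// Keeps the titles that contain `search_term`; `None` disables the filter.
/// Assumes a case-sensitive substring match.
fn filter_titles<'a>(titles: &[&'a str], search_term: Option<&str>) -> Vec<&'a str> {
    titles
        .iter()
        .copied()
        .filter(|title| search_term.map_or(true, |term| title.contains(term)))
        .collect()
}

fn main() {
    let titles = ["Fix login bug", "Add login tests", "Refactor sandbox"];
    println!("{:?}", filter_titles(&titles, Some("login")));
    println!("{:?}", filter_titles(&titles, None));
}
```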
jif-oai ·
2026-02-25 09:59:41 +00:00 -
feat(ui): add network approval persistence plumbing (#12358)
## Summary - add TUI approval options for persistent network host rules - add app-server v2 approval payload plumbing for network approval context + proposed network policy amendments - add app-server handling to translate `applyNetworkPolicyAmendment` decisions back into core review decisions - update docs/test client output and generated app-server schemas/types
viyatb-oai ·
2026-02-25 07:06:19 +00:00 -
feat: add experimental additionalPermissions to v2 command execution approval requests (#12737)
This adds `additionalPermissions` to the app-server v2 `item/commandExecution/requestApproval` payload as an experimental field. The field is now exposed on `CommandExecutionRequestApprovalParams` and is populated from the existing core approval event when a command requests additional sandbox permissions. This PR also contains changes to make server requests support the experimental API. Sample payloads from a real app-server test client run, with the experimental flag off: ``` < { < "id": 0, < "method": "item/commandExecution/requestApproval", < "params": { < "command": "/bin/zsh -lc 'mkdir -p ~/some/test && touch ~/some/test/file'", < "commandActions": [ < { < "command": "mkdir -p '~/some/test'", < "type": "unknown" < }, < { < "command": "touch '~/some/test/file'", < "type": "unknown" < } < ], < "cwd": "/Users/celia/code/codex/codex-rs", < "itemId": "call_QLp0LWkQ1XkU6VW9T2vUZFWB", < "proposedExecpolicyAmendment": [ < "mkdir", < "-p", < "~/some/test" < ], < "reason": "Do you want to allow creating ~/some/test/file outside the workspace?", < "threadId": "019c9309-e209-7d82-a01b-dcf9556a354d", < "turnId": "019c9309-e27a-7f33-834f-6011e795c2d6" < } < } ``` and with the experimental flag on: ``` < { < "id": 0, < "method": "item/commandExecution/requestApproval", < "params": { < "additionalPermissions": { < "fileSystem": null, < "macos": null, < "network": true < }, < "command": "/bin/zsh -lc 'install -D /dev/null ~/some/test/file'", < "commandActions": [ < { < "command": "install -D /dev/null '~/some/test/file'", < "type": "unknown" < } < ], < "cwd": "/Users/celia/code/codex/codex-rs", < "itemId": "call_K3U4b3dRbj3eMCqslmncbGsq", < "proposedExecpolicyAmendment": [ < "install", < "-D" < ], < "reason": "Do you want to allow creating the file at ~/some/test/file outside the workspace sandbox?", < "threadId": "019c9303-3a8e-76e1-81bf-d67ac446d892", < "turnId": "019c9303-3af1-7143-88a1-73132f771234" < } < } ```
Celia Chen ·
2026-02-25 05:16:35 +00:00 -
Add skill approval event/response (#12633)
Set the stage for skill-level permission approval in addition to command-level. Behind a feature flag.
pakrym-oai ·
2026-02-23 22:28:58 -08:00 -
Refactor network approvals to host/protocol/port scope (#12140)
## Summary Simplify network approvals by removing per-attempt proxy correlation and moving to session-level approval dedupe keyed by (host, protocol, port). Instead of encoding attempt IDs into proxy credentials/URLs, we now treat approvals as a destination policy decision. - Concurrent calls to the same destination share one approval prompt. - Different destinations (or same host on different ports) get separate prompts. - "Allow once" approves the current queued request group only. - "Allow for session" caches that (host, protocol, port) and auto-allows future matching requests. - The "Never" policy continues to deny without prompting. Example: - 3 calls: - a.com:443 - b.com:443 - a.com:443 => 2 prompts total (a, b); the second a.com:443 call waits on the first decision. - a.com:80 is treated separately from a.com:443 ## Testing - `just fmt` (in `codex-rs`) - `cargo test -p codex-core tools::network_approval::tests` - `cargo test -p codex-core` (unit tests pass; existing integration-suite failures remain in this environment)
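The dedupe rule in the example above can be sketched as a set keyed by (host, protocol, port). This is an illustration only: `Destination`, `dest`, and `prompts_needed` are hypothetical names, and the real implementation queues concurrent requests behind a pending decision rather than computing prompts over a slice.

```rust
use std::collections::HashSet;

/// Approval key: requests with the same (host, protocol, port) share a prompt.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct Destination {
    host: String,
    protocol: String,
    port: u16,
}

fn dest(host: &str, protocol: &str, port: u16) -> Destination {
    Destination {
        host: host.to_string(),
        protocol: protocol.to_string(),
        port,
    }
}

/// Returns the destinations that need a prompt, one per distinct key,
/// skipping anything already allowed for the session.
fn prompts_needed(
    requests: &[Destination],
    allowed_for_session: &HashSet<Destination>,
) -> Vec<Destination> {
    let mut prompted = HashSet::new();
    let mut prompts = Vec::new();
    for request in requests {
        if allowed_for_session.contains(request) {
            continue; // auto-allowed, no prompt
        }
        if prompted.insert(request.clone()) {
            prompts.push(request.clone()); // first request for this key
        }
    }
    prompts
}

fn main() {
    let requests = [
        dest("a.com", "https", 443),
        dest("b.com", "https", 443),
        dest("a.com", "https", 443),
    ];
    let prompts = prompts_needed(&requests, &HashSet::new());
    println!("{} prompts: {:?}", prompts.len(), prompts);
}
```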
viyatb-oai ·
2026-02-20 10:39:55 -08:00 -
feat(core): zsh exec bridge (#12052)
zsh fork PR stack: - https://github.com/openai/codex/pull/12051 - https://github.com/openai/codex/pull/12052 👈 ### Summary This PR introduces a feature-gated native shell runtime path that routes shell execution through a patched zsh exec bridge, removing MCP-specific behavior from the shell hot path while preserving existing CommandExecution lifecycle semantics. When shell_zsh_fork is enabled, shell commands run via patched zsh with per-`execve` interception through EXEC_WRAPPER. Core receives wrapper IPC requests over a Unix socket, applies existing approval policy, and returns allow/deny before the subcommand executes. ### What’s included **1) New zsh exec bridge runtime in core** - Wrapper-mode entrypoint (maybe_run_zsh_exec_wrapper_mode) for EXEC_WRAPPER invocations. - Per-execution Unix-socket IPC handling for wrapper requests/responses. - Approval callback integration using existing core approval orchestration. - Streaming stdout/stderr deltas to existing command output event pipeline. - Error handling for malformed IPC, denial/abort, and execution failures. **2) Session lifecycle integration** SessionServices now owns a `ZshExecBridge`. Session startup initializes bridge state; shutdown tears it down cleanly. **3) Shell runtime routing (feature-gated)** When `shell_zsh_fork` is enabled: - Build execution env/spec as usual. - Add wrapper socket env wiring. - Execute via `zsh_exec_bridge.execute_shell_request(...)` instead of the regular shell path. - Non-zsh-fork behavior remains unchanged. **4) Config + feature wiring** - Added `Feature::ShellZshFork` (under development). - Added config support for `zsh_path` (optional absolute path to patched zsh): - `Config`, `ConfigToml`, `ConfigProfile`, overrides, and schema. - Session startup validates that `zsh_path` exists/usable when zsh-fork is enabled. - Added startup test for missing `zsh_path` failure mode. 
**5) Seatbelt/sandbox updates for wrapper IPC** - Extended seatbelt policy generation to optionally allow outbound connection to explicitly permitted Unix sockets. - Wired sandboxing path to pass wrapper socket path through to seatbelt policy generation. - Added/updated seatbelt tests for explicit socket allow rule and argument emission. **6) Runtime entrypoint hooks** - This allows the same binary to act as the zsh wrapper subprocess when invoked via `EXEC_WRAPPER`. **7) Tool selection behavior** - ToolsConfig now prefers ShellCommand type when shell_zsh_fork is enabled. - Added test coverage for precedence with unified-exec enabled.
Owen Lin ·
2026-02-17 20:19:53 -08:00 -
feat(core): plumb distinct approval ids for command approvals (#12051)
zsh fork PR stack: - https://github.com/openai/codex/pull/12051 👈 - https://github.com/openai/codex/pull/12052 With upcoming support for a fork of zsh that allows us to intercept `execve` and run execpolicy checks for each subcommand as part of a `CommandExecution`, it will be possible for there to be multiple approval requests for a shell command like `/path/to/zsh -lc 'git status && rg \"TODO\" src && make test'`. To support that, this PR introduces a new `approval_id` field across core, protocol, and app-server so that we can associate approvals properly for subcommands.
Owen Lin ·
2026-02-18 01:55:57 +00:00 -
codex-rs: fix thread resume rejoin semantics (#11756)
## Summary - always rejoin an in-memory running thread on `thread/resume`, even when overrides are present - reject `thread/resume` when `history` is provided for a running thread - reject `thread/resume` when `path` mismatches the running thread rollout path - warn (but do not fail) on override mismatches for running threads - add more `thread_resume` integration tests and fixes; including restart-based resume-with-overrides coverage ## Validation - `just fmt` - `cargo test -p codex-app-server --test all thread_resume` - manual test with app-server-test-client https://github.com/openai/codex/pull/11755 - manual test both stdio and websocket in app
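The resume rules for a running thread can be summarized as a small decision function. The sketch below is a simplification with hypothetical types (`ResumeRequest`, `ResumeOutcome`) and made-up error strings; it only encodes the four rules listed in the summary and ignores threads that are not currently running.

```rust
/// Hypothetical, simplified view of a `thread/resume` request.
struct ResumeRequest<'a> {
    history_provided: bool,
    path: Option<&'a str>,
    overrides_match_running_thread: bool,
}

#[derive(Debug, PartialEq, Eq)]
enum ResumeOutcome {
    /// Rejoin the in-memory thread, optionally warning about ignored overrides.
    Rejoin { warn_about_overrides: bool },
    /// Reject the request with a reason (strings here are illustrative).
    Reject(&'static str),
}

fn resume_running_thread(running_rollout_path: &str, request: &ResumeRequest) -> ResumeOutcome {
    if request.history_provided {
        return ResumeOutcome::Reject("history cannot be provided for a running thread");
    }
    if let Some(path) = request.path {
        if path != running_rollout_path {
            return ResumeOutcome::Reject("path does not match the running thread's rollout path");
        }
    }
    // Override mismatches warn but never fail.
    ResumeOutcome::Rejoin {
        warn_about_overrides: !request.overrides_match_running_thread,
    }
}

fn main() {
    let request = ResumeRequest {
        history_provided: false,
        path: None,
        overrides_match_running_thread: false,
    };
    println!("{:?}", resume_running_thread("/tmp/rollout.jsonl", &request));
}
```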
Max Johnson ·
2026-02-13 23:09:58 +00:00 -
app-server-test-client websocket client and thread tools (#11755)
- add websocket endpoint mode with default ws://127.0.0.1:4222 while keeping stdio codex-bin path compatibility - add thread-resume (follow stream) and thread-list commands for manual thread lifecycle testing - quickstart docs
Max Johnson ·
2026-02-13 17:34:35 +00:00 -
feat: make sandbox read access configurable with `ReadOnlyAccess` (#11387)
`SandboxPolicy::ReadOnly` previously implied broad read access and could not express a narrower read surface. This change introduces an explicit read-access model so we can support user-configurable read restrictions in follow-up work, while preserving current behavior today. It also ensures unsupported backends fail closed for restricted-read policies instead of silently granting broader access than intended. ## What - Added `ReadOnlyAccess` in protocol with: - `Restricted { include_platform_defaults, readable_roots }` - `FullAccess` - Updated `SandboxPolicy` to carry read-access configuration: - `ReadOnly { access: ReadOnlyAccess }` - `WorkspaceWrite { ..., read_only_access: ReadOnlyAccess }` - Preserved existing behavior by defaulting current construction paths to `ReadOnlyAccess::FullAccess`. - Threaded the new fields through sandbox policy consumers and call sites across `core`, `tui`, `linux-sandbox`, `windows-sandbox`, and related tests. - Updated Seatbelt policy generation to honor restricted read roots by emitting scoped read rules when full read access is not granted. - Added fail-closed behavior on Linux and Windows backends when restricted read access is requested but not yet implemented there (`UnsupportedOperation`). - Regenerated app-server protocol schema and TypeScript artifacts, including `ReadOnlyAccess`. ## Compatibility / rollout - Runtime behavior remains unchanged by default (`FullAccess`). - API/schema changes are in place so future config wiring can enable restricted read access without another policy-shape migration.
Michael Bolin ·
2026-02-11 18:31:14 -08:00 -
fix: remove errant Cargo.lock files (#11526)
These leaked into the repo: - #4905 `codex-rs/windows-sandbox-rs/Cargo.lock` - #5391 `codex-rs/app-server-test-client/Cargo.lock` Note that these affect cache keys such as: https://github.com/openai/codex/blob/9722567a80b4bac81b74f679828747031fd95fa0/.github/workflows/rust-release.yml#L154 so it seems best to remove them.
Michael Bolin ·
2026-02-12 02:28:02 +00:00 -
chore: persist turn_id in rollout session and make turn_id uuid based (#11246)
Problem: 1. the turn id is constructed in memory; 2. on resuming threads, `turn_id` might not be unique; 3. clients cannot easily determine the boundary of a turn from rollout files. This PR does three things: 1. persist `task_started` and `task_complete` events; 2. persist `turn_id` in rollout turn events; 3. generate `turn_id` as a unique UUID instead of incrementing it in memory. This resolves the issue of clients wanting unique turn ids when resuming a thread and needing to know the boundary of each turn in rollout files. Example debug logs: ``` 2026-02-11T00:32:10.746876Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=8 turn=Turn { id: "019c4a07-d809-74c3-bc4b-fd9618487b4b", items: [UserMessage { id: "item-24", content: [Text { text: "hi", text_elements: [] }] }, AgentMessage { id: "item-25", text: "Hi. I’m in the workspace with your current changes loaded and ready. Send the next task and I’ll execute it end-to-end." }], status: Completed, error: None } 2026-02-11T00:32:10.746888Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=9 turn=Turn { id: "019c4a18-1004-76c0-a0fb-a77610f6a9b8", items: [UserMessage { id: "item-26", content: [Text { text: "hello", text_elements: [] }] }, AgentMessage { id: "item-27", text: "Hello. Ready for the next change in `codex-rs`; I can continue from the current in-progress diff or start a new task." }], status: Completed, error: None } 2026-02-11T00:32:10.746899Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=10 turn=Turn { id: "019c4a19-41f0-7db0-ad78-74f1503baeb8", items: [UserMessage { id: "item-28", content: [Text { text: "hello", text_elements: [] }] }, AgentMessage { id: "item-29", text: "Hello. Send the specific change you want in `codex-rs`, and I’ll implement it and run the required checks." }], status: Completed, error: None } ``` Backward compatibility: if you try to resume an old session without `task_started` and `task_complete` events populated, the following happens: - If you resume and do nothing: those reconstructed historical IDs can differ next time you resume. - If you resume and send a new turn: the new turn gets a fresh UUID from the live submission flow and is persisted, so that new turn’s ID is stable on later resumes. I think this behavior is fine, because we only care about deterministic turn ids once a turn is triggered.
Celia Chen ·
2026-02-11 03:56:01 +00:00 -
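To illustrate why persisting `task_started` and `task_complete` with a `turn_id` makes turn boundaries recoverable, here is a minimal sketch of grouping a flat event stream into turns. The `RolloutEvent` and `Turn` types and the `build_turns` function are hypothetical simplifications for illustration only; the real rollout schema and the `thread_history` code are not shown in this log and have more variants and fields.

```rust
// Hypothetical, simplified event model (an assumption, not the real rollout schema).
#[derive(Debug, Clone, PartialEq)]
enum RolloutEvent {
    TaskStarted { turn_id: String },
    Item(String),
    TaskComplete { turn_id: String },
}

// Hypothetical reconstructed turn.
#[derive(Debug, PartialEq)]
struct Turn {
    id: String,
    items: Vec<String>,
    completed: bool,
}

/// Group events into turns using the persisted start/complete boundaries.
fn build_turns(events: &[RolloutEvent]) -> Vec<Turn> {
    let mut turns = Vec::new();
    let mut current: Option<Turn> = None;
    for event in events {
        match event {
            RolloutEvent::TaskStarted { turn_id } => {
                // A new start closes any turn that never saw its completion event.
                if let Some(open) = current.take() {
                    turns.push(open);
                }
                current = Some(Turn {
                    id: turn_id.clone(),
                    items: Vec::new(),
                    completed: false,
                });
            }
            RolloutEvent::Item(text) => {
                // Items outside a persisted boundary are ignored in this sketch.
                if let Some(open) = current.as_mut() {
                    open.items.push(text.clone());
                }
            }
            RolloutEvent::TaskComplete { turn_id } => {
                if let Some(mut open) = current.take() {
                    // Only mark the turn completed when the ids agree.
                    if open.id == *turn_id {
                        open.completed = true;
                    }
                    turns.push(open);
                }
            }
        }
    }
    // Keep a trailing turn that is still in progress.
    if let Some(open) = current {
        turns.push(open);
    }
    turns
}

fn main() {
    let events = vec![
        RolloutEvent::TaskStarted { turn_id: "turn-a".to_string() },
        RolloutEvent::Item("hi".to_string()),
        RolloutEvent::TaskComplete { turn_id: "turn-a".to_string() },
    ];
    for turn in build_turns(&events) {
        println!("{turn:?}");
    }
}
```

Because the id travels with the persisted events, rebuilding the same rollout twice yields the same turn ids, which is the property the in-memory counter could not guarantee.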
feat: opt-out of events in the app-server (#11319)
Add `optOutNotificationMethods` to the app-server so that clients can opt out of events by exact method-name match.
jif-oai ·
2026-02-10 18:04:52 +00:00 -
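As a rough sketch of what "exact method matching" implies for the opt-out above: a notification is suppressed only when its method string equals an opted-out entry, with no prefix or case-insensitive matching. The `NotificationFilter` type and the method names used here are illustrative assumptions, not the app-server's actual implementation or method list.

```rust
use std::collections::HashSet;

/// Hypothetical filter holding the methods a client opted out of.
struct NotificationFilter {
    opted_out: HashSet<String>,
}

impl NotificationFilter {
    fn new<I, S>(methods: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        Self {
            opted_out: methods.into_iter().map(Into::into).collect(),
        }
    }

    /// Exact, case-sensitive match: only identical method strings are dropped.
    fn should_send(&self, method: &str) -> bool {
        !self.opted_out.contains(method)
    }
}

fn main() {
    // Method names below are placeholders chosen for the example.
    let filter = NotificationFilter::new(vec!["thread/started"]);
    for method in ["thread/started", "thread/startedX", "turn/completed"] {
        println!("{method}: send = {}", filter.should_send(method));
    }
}
```

A set lookup keeps the check constant-time per notification, and exact matching avoids accidentally silencing methods that merely share a prefix.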
chore: add `codex debug app-server` tooling (#10367)
`codex debug app-server <user message>` forwards the message through `codex-app-server-test-client`’s `send_message_v2` library entry point, using `std::env::current_exe()` to resolve the codex binary.

For what it looks like, see:
```
celia@com-92114 codex-rs % cargo build -p codex-cli && target/debug/codex debug app-server --help
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.34s
Tooling: helps debug the app server

Usage: codex debug app-server [OPTIONS] <COMMAND>

Commands:
  send-message-v2
  help             Print this message or the help of the given subcommand(s)
```
and
```
celia@com-92114 codex-rs % cargo build -p codex-cli && target/debug/codex debug app-server send-message-v2 "hello world"
   Compiling codex-cli v0.0.0 (/Users/celia/code/codex/codex-rs/cli)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 1.38s
> {
>   "method": "initialize",
>   "id": "f8ba9f60-3a49-4ea9-81d6-4ab6853e3954",
>   "params": {
>     "clientInfo": {
>       "name": "codex-toy-app-server",
>       "title": "Codex Toy App Server",
>       "version": "0.0.0"
>     },
>     "capabilities": {
>       "experimentalApi": true
>     }
>   }
> }
< {
<   "id": "f8ba9f60-3a49-4ea9-81d6-4ab6853e3954",
<   "result": {
<     "userAgent": "codex-toy-app-server/0.0.0 (Mac OS 26.2.0; arm64) vscode/2.4.27 (codex-toy-app-server; 0.0.0)"
<   }
< }
< initialize response: InitializeResponse { user_agent: "codex-toy-app-server/0.0.0 (Mac OS 26.2.0; arm64) vscode/2.4.27 (codex-toy-app-server; 0.0.0)" }
> {
>   "method": "thread/start",
>   "id": "203f1630-beee-4e60-b17b-9eff16b1638b",
>   "params": {
>     "model": null,
>     "modelProvider": null,
>     "cwd": null,
>     "approvalPolicy": null,
>     "sandbox": null,
>     "config": null,
>     "baseInstructions": null,
>     "developerInstructions": null,
>     "personality": null,
>     "ephemeral": null,
>     "dynamicTools": null,
>     "mockExperimentalField": null,
>     "experimentalRawEvents": false
>   }
> }
...
```
Celia Chen ·
2026-02-03 23:17:34 +00:00 -
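The description above mentions resolving the codex binary with `std::env::current_exe()`. A minimal sketch of that idea follows, assuming a hypothetical helper pair (`resolve_codex_binary`, `build_command`) and an optional explicit override; neither the helper names nor the override is taken from the actual change, and the `app-server` argument in `main` is likewise only an assumption about how the binary would be invoked.

```rust
use std::env;
use std::io;
use std::path::{Path, PathBuf};
use std::process::Command;

/// Pick the binary to launch: an explicit override if given (hypothetical),
/// otherwise the path of the currently running executable.
fn resolve_codex_binary(override_path: Option<PathBuf>) -> io::Result<PathBuf> {
    match override_path {
        Some(path) => Ok(path),
        None => env::current_exe(),
    }
}

/// Build (but do not spawn) a command for the resolved binary.
fn build_command(binary: &Path, args: &[&str]) -> Command {
    let mut cmd = Command::new(binary);
    cmd.args(args);
    cmd
}

fn main() -> io::Result<()> {
    let binary = resolve_codex_binary(None)?;
    // "app-server" is an assumed subcommand name used only for illustration.
    let cmd = build_command(&binary, &["app-server"]);
    println!("would spawn: {cmd:?}");
    Ok(())
}
```

Resolving the path from the running process means the debug tool talks to the same build it was compiled with, rather than whatever `codex` happens to be first on `PATH`.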
feat: replace custom mcp-types crate with equivalents from rmcp (#10349)
We started working with MCP in Codex before https://crates.io/crates/rmcp was mature, so we had our own crate for MCP types that was generated from the MCP schema: https://github.com/openai/codex/blob/8b95d3e082376f4cb23e92641705a22afb28a9da/codex-rs/mcp-types/README.md

Now that `rmcp` is more mature, it makes more sense to use its MCP types in Rust, as they handle details (like the `_meta` field) that our custom version ignored. One advantage of our custom types was that the generated types implemented `JsonSchema` and `ts_rs::TS`, whereas the types in `rmcp` do not. As such, part of the work of this PR is leveraging the adapters, introduced in #10356, between `rmcp` types and the serializable types that make up our API (app server and MCP).

Note this PR results in a number of changes to `codex-rs/app-server-protocol/schema`, which merit special attention during review. We must ensure that these changes are still backwards-compatible, which is possible because we have:
```diff
- export type CallToolResult = { content: Array<ContentBlock>, isError?: boolean, structuredContent?: JsonValue, };
+ export type CallToolResult = { content: Array<JsonValue>, structuredContent?: JsonValue, isError?: boolean, _meta?: JsonValue, };
```
so `ContentBlock` has been replaced with the more general `JsonValue`. Note that `ContentBlock` was defined as:
```typescript
export type ContentBlock = TextContent | ImageContent | AudioContent | ResourceLink | EmbeddedResource;
```
so the deletion of those individual variants should not be a cause of great concern.

Similarly, we have the following change in `codex-rs/app-server-protocol/schema/typescript/Tool.ts`:
```diff
- export type Tool = { annotations?: ToolAnnotations, description?: string, inputSchema: ToolInputSchema, name: string, outputSchema?: ToolOutputSchema, title?: string, };
+ export type Tool = { name: string, title?: string, description?: string, inputSchema: JsonValue, outputSchema?: JsonValue, annotations?: JsonValue, icons?: Array<JsonValue>, _meta?: JsonValue, };
```
so:
- `annotations?: ToolAnnotations` ➡️ `JsonValue`
- `inputSchema: ToolInputSchema` ➡️ `JsonValue`
- `outputSchema?: ToolOutputSchema` ➡️ `JsonValue`

and two new fields: `icons?: Array<JsonValue>, _meta?: JsonValue`

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/10349).
* #10357
* __->__ #10349
* #10356
Michael Bolin ·
2026-02-02 17:41:55 -08:00 -
[feat] persist thread_dynamic_tools in db (#10252)
Persist `thread_dynamic_tools` in sqlite and read from it first, falling back to rollout files if the entry is not found. Dynamic tools are persisted to both sqlite and rollout files.

Verified that new sessions are populated in the db correctly and old sessions are backfilled correctly at startup:
```
celia@com-92114 codex-rs % sqlite3 ~/.codex/state.sqlite \
  "select thread_id, position,name,description,input_schema from thread_dynamic_tools;"
019c0cad-ec0d-74b2-a787-e8b33a349117|0|geo_lookup|lookup a city|{"properties":{"city":{"type":"string"}},"required":["city"],"type":"object"}
....
019c10ca-aa4b-7620-ae40-c0919fbd7ea7|0|geo_lookup|lookup a city|{"properties":{"city":{"type":"string"}},"required":["city"],"type":"object"}
```
Celia Chen ·
2026-02-03 00:06:44 +00:00
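The read-first-then-fall-back behavior described for `thread_dynamic_tools` can be sketched as follows. This is an assumption-laden illustration: the `Stores` and `DynamicTool` types are hypothetical, both stores are modeled as in-memory maps rather than sqlite and rollout files, and the startup backfill mentioned above is not modeled.

```rust
use std::collections::HashMap;

/// Hypothetical, trimmed-down tool record (the real row also has a position and input schema).
#[derive(Debug, Clone, PartialEq)]
struct DynamicTool {
    name: String,
    description: String,
}

/// In-memory stand-ins for the sqlite table and the rollout files.
struct Stores {
    db: HashMap<String, Vec<DynamicTool>>,
    rollout: HashMap<String, Vec<DynamicTool>>,
}

impl Stores {
    /// Read from the db first; fall back to the rollout data when the thread is missing.
    fn load_tools(&self, thread_id: &str) -> Option<&[DynamicTool]> {
        self.db
            .get(thread_id)
            .or_else(|| self.rollout.get(thread_id))
            .map(Vec::as_slice)
    }

    /// Writes go to both stores so either one can serve later reads.
    fn persist_tools(&mut self, thread_id: &str, tools: Vec<DynamicTool>) {
        self.db.insert(thread_id.to_string(), tools.clone());
        self.rollout.insert(thread_id.to_string(), tools);
    }
}

fn main() {
    let tool = DynamicTool {
        name: "geo_lookup".to_string(),
        description: "lookup a city".to_string(),
    };
    let mut stores = Stores {
        db: HashMap::new(),
        rollout: HashMap::new(),
    };
    // An "old" session that exists only in the rollout data.
    stores.rollout.insert("old-thread".to_string(), vec![tool.clone()]);
    stores.persist_tools("new-thread", vec![tool]);
    println!("{:?}", stores.load_tools("old-thread"));
    println!("{:?}", stores.load_tools("new-thread"));
}
```

Writing to both stores keeps rollout files usable as a fallback for threads the database has not seen yet.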