mirror of
https://github.com/pchuan98/codex.git
synced 2026-07-01 00:31:56 +08:00
755880ef9c99bf2b3dbdaea7feb4c19f567b63a7
5876 Commits
-
permissions: derive config defaults as profiles (#19772)
## Why This continues the permissions migration by making legacy config default resolution produce the canonical `PermissionProfile` first. The legacy `SandboxPolicy` projection should stay available at compatibility boundaries, but config loading should not create a legacy policy just to immediately convert it back into a profile. Specifically, when `default_permissions` is not specified in `config.toml`, instead of creating a `SandboxPolicy` in `codex-rs/core/src/config/mod.rs` and then trying to derive a `PermissionProfile` from it, we use `derive_permission_profile()` to create a more faithful `PermissionProfile` using the values of `ConfigToml` directly. This also keeps the existing behavior of `sandbox_workspace_write` and extra writable roots after #19841 replaced `:cwd` with `:project_roots`. Legacy workspace-write defaults are represented as symbolic `:project_roots` write access plus symbolic project-root metadata carveouts. Extra absolute writable roots are still added directly and continue to get concrete metadata protections for paths that exist under those roots. The platform sandboxes differ when a symbolic project-root subpath does not exist yet. * **Seatbelt** can encode literal/subpath exclusions directly, so macOS emits project-root metadata subpath policies even if `.git`, `.agents`, or `.codex` do not exist. * **bwrap** has to materialize bind-mount targets. Binding `/dev/null` to a missing `.git` can create a host-visible placeholder that changes Git repo discovery. Binding missing `.agents` would not affect Git discovery, but it would still create a host-visible project metadata placeholder from an automatic compatibility carveout. Linux therefore skips only missing automatic `.git` and `.agents` read-only metadata masks; missing `.codex` remains protected so first-time project config creation goes through the protected-path approval flow. User-authored `read` and `none` subpath rules keep normal bwrap behavior, and `none` can still mask the first missing component to prevent creation under writable roots. ## What Changed - Adds profile-native helpers for legacy workspace-write semantics, including `PermissionProfile::workspace_write_with()`, `FileSystemSandboxPolicy::workspace_write()`, and `FileSystemSandboxPolicy::with_additional_legacy_workspace_writable_roots()`. - Makes `FileSystemSandboxPolicy::workspace_write()` the single legacy workspace-write constructor so both `from_legacy_sandbox_policy()` and `From<&SandboxPolicy>` include the project-root metadata carveouts. - Removes the no-carveout `legacy_workspace_write_base_policy()` path and the `prune_read_entries_under_writable_roots()` cleanup that was only needed by that split construction. - Adds `ConfigToml::derive_permission_profile()` for legacy sandbox-mode fallback resolution; named `default_permissions` profiles continue through the permissions profile pipeline instead of being reconstructed from `sandbox_mode`. - Updates `Config::load()` to start from the derived profile, validate that it still has a legacy compatibility projection, and apply additional writable roots directly to managed workspace-write filesystem policies. - Updates Linux bwrap argument construction so missing automatic `.git`/`.agents` symbolic project-root read-only carveouts are skipped before emitting bind args; missing `.codex`, user-authored `read`/`none` subpath rules, and existing missing writable-root behavior are preserved. - Adds coverage that legacy workspace-write config produces symbolic project-root metadata carveouts, extra legacy workspace writable roots still protect existing metadata paths such as `.git`, and bwrap skips missing `.git`/`.agents` project-root carveouts while preserving missing `.codex` and user-authored missing subpath rules. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19772). * #19776 * #19775 * #19774 * #19773 * __->__ #19772
Michael Bolin ·
2026-04-27 16:50:10 -07:00 -
Streamline review and feedback handlers (#19498)
## Why The remaining review, interrupt, fuzzy search, feedback, and git-diff handlers still had local send-error branches that obscured otherwise simple request handling. This final slice flattens those handlers without changing the public protocol behavior. ## What Changed - Streamlined review start, turn interrupt, fuzzy search session, feedback upload, and git diff handlers in `codex-rs/app-server/src/codex_message_processor.rs`. - Converted validation and upload failures into returned JSON-RPC errors where that avoids nested `send_error`/`return` blocks. - Left unrelated sandbox setup and notification code untouched. ## Verification - `cargo check -p codex-app-server` - `cargo test -p codex-app-server --test all v2::review -- --test-threads=1`
pakrym-oai ·
2026-04-27 16:36:04 -07:00 -
Add MCP app feature flag (#19884)
## Summary - Add the `enable_mcp_apps` feature flag to the `codex-features` registry - Keep it under development and disabled by default ## Testing - Unit tests for `codex-features` passed - Formatting passed
Matthew Zeng ·
2026-04-27 16:28:49 -07:00 -
Show action required in terminal title (#18372)
Implements #18162 This updates the TUI terminal title to show an explicit action-required state when Codex is blocked on user approval or input. The terminal title now uses the activity title item to cover both active work and blocked-on-user states, while still accepting the legacy spinner config value. Changes - Rename the terminal title item from `spinner` to `activity` while preserving legacy config compatibility - Show `[ ! ] Action Required `while approval or input overlays are active, with a blinking `[ . ]` alternate state - Suppress the normal working spinner while Codex is blocked on user action - Add targeted coverage for action-required title behavior and legacy title-item parsing Testing - Trigger an approval or input modal and confirm the tab title alternates between `[ ! ] Action Required` and `[ . ] Action Required` - Disable the activity title item and confirm the action-required title does not appear - Resolve the prompt and confirm the title returns to the normal spinning/idel state https://github.com/user-attachments/assets/e9ecc530-a6be-4fd7-b9a6-d550a790eb2c
canvrno-oai ·
2026-04-27 15:27:11 -07:00 -
Streamline turn and realtime handlers (#19497)
## Why Turn and realtime handlers had nested validation and send-error branches that made the request path longer than the behavior warranted. This slice keeps the same request semantics while letting the handlers return errors from the failing step. ## What Changed - Streamlined turn start, injected item, and turn steer request handling in `codex-rs/app-server/src/codex_message_processor.rs`. - Applied the same result-returning shape to realtime session response handlers. - Preserved existing request validation and thread-manager interactions. ## Verification - `cargo check -p codex-app-server` - `cargo test -p codex-app-server --test all v2::turn_start -- --test-threads=1` - `cargo test -p codex-app-server --test all v2::turn_steer -- --test-threads=1` - `cargo test -p codex-app-server --test all v2::thread_inject_items -- --test-threads=1`
pakrym-oai ·
2026-04-27 15:21:59 -07:00 -
Streamline thread resume and fork handlers (#19495)
## Why Thread resume and fork had some of the deepest error-handling indentation in this area because helpers emitted request errors directly. Returning those failures gives the handlers a single request boundary while preserving the async pending-resume behavior. ## What Changed - Converted thread resume helpers in `codex-rs/app-server/src/codex_message_processor.rs` to return `Result` values for validation and view loading failures. - Applied the same pattern to thread fork request handling. - Simplified pending resume error construction by using the shared JSON-RPC error helpers. ## Verification - `cargo check -p codex-app-server` - `cargo test -p codex-app-server --test all v2::thread_resume -- --test-threads=1` - `cargo test -p codex-app-server --test all v2::thread_fork -- --test-threads=1`
pakrym-oai ·
2026-04-27 15:04:53 -07:00 -
[codex] Trace cancelled inference streams (#19839)
Records cancelled inference streams when Codex stops consuming a provider response before `response.completed`, preserving complete output items observed before cancellation. Also closes still-running inference calls when the owning turn ends, so reduced rollout traces do not leave stale `Running` inference nodes. Covered by focused reducer coverage and a core stream-drop test for partial output preservation.
cassirer-openai ·
2026-04-27 21:58:29 +00:00 -
Streamline thread read handlers (#19494)
## Why The thread read/list handlers mostly assemble views, but their error handling was interleaved with response emission. Returning view-building errors from the helper path keeps those handlers focused on data assembly. ## What Changed - Added a small mapper for `ThreadReadViewError` to JSON-RPC errors in `codex-rs/app-server/src/codex_message_processor.rs`. - Streamlined thread list, loaded-thread, read, turn-list, and summary handlers to produce result values for the request boundary. - Kept the existing invalid-request vs internal-error distinctions for missing or unreadable thread data. ## Verification - `cargo check -p codex-app-server` - `cargo test -p codex-app-server --test all conversation_summary -- --test-threads=1`
pakrym-oai ·
2026-04-27 14:30:24 -07:00 -
Publish Python SDK with Codex-pinned versioning (#18996)
**note**: a large chunk of this diff comes from regenerating Python types after app-server schema changes on `main`. This is PR 3 of 3 for the Python SDK PyPI publishing split. PR #18862 refreshed the generated SDK surface, and PR #18865 made the runtime package publishable as `openai-codex-cli-bin`; this final PR makes the SDK package publishable as `openai-codex-app-server-sdk` and pins both packages to the same Codex runtime version. The key idea is that the published SDK version is the Codex runtime version. That one version now drives the SDK package version, the exact runtime dependency, the client version reported by the SDK, and the bootstrap runtime pin. This keeps release-time versioning in one lane instead of scattering checked-in literals through the package. ## What changed - Rename the SDK distribution from `codex-app-server-sdk` to `openai-codex-app-server-sdk` for conflict-free PyPI publishing. - Use `stage-sdk --codex-version ...` with one Codex version for both the SDK package version and exact `openai-codex-cli-bin` dependency. - Preserve hidden legacy `--runtime-version` / `--sdk-version` args only to reject mismatched versions during staging. - Map PEP 440 package versions back to Codex release tags for runtime setup downloads, e.g. `0.116.0a1` -> `rust-v0.116.0-alpha.1`. - Derive `codex_app_server.__version__`, the default `AppServerConfig.client_version`, and `_runtime_setup.pinned_runtime_version()` from the SDK package/project version instead of hardcoding duplicate version strings. - Carry the current generated SDK refresh from `main` so `generate-types` stays clean after recent app-server schema changes. - Update `sdk/python/uv.lock` for the renamed editable package. ## Validation - `uv run --extra dev pytest` in `sdk/python` -> 59 passed, 37 skipped. - Targeted `uv run ruff check` for the touched SDK files. - `git diff --check`. - Staged runtime with `--codex-version rust-v0.116.0-alpha.1 --platform-tag macosx_11_0_arm64`. - Staged SDK with `--codex-version rust-v0.116.0-alpha.1`. - Built runtime wheel, SDK wheel, and SDK sdist. - `twine check /tmp/codex-python-pr3-build/dist/*` -> passed. - Clean venv smoke installed `openai-codex-app-server-sdk==0.116.0a1` from local dist and pulled `openai-codex-cli-bin==0.116.0a1`. - Smoke imports passed for `Codex` and `bundled_codex_path()`.
Steve Coffey ·
2026-04-27 14:28:46 -07:00 -
[codex] Shard exec Bazel integration test (#19862)
## Summary - shard `//codex-rs/exec:exec-all-test` into 8 Bazel shards - keep the existing `no-sandbox` test tag unchanged ## Why The Windows Bazel lane has been timing out this aggregated integration test target at the default 300s test timeout. The target runs the combined `codex-rs/exec/tests/all.rs` integration binary; sharding lets Bazel split the Rust test cases across parallel test actions instead of running the whole integration suite as one long action. ## Validation Not run locally, per the Codex repo workflow for development-phase changes. Co-authored-by: Codex <noreply@openai.com>
starr-openai ·
2026-04-27 21:22:53 +00:00 -
Streamline thread mutation handlers (#19493)
## Why Thread mutation handlers had many short error branches whose only job was to emit a JSON-RPC error and stop. This slice keeps those errors visible, but lets each handler build a result and return early from validation helpers instead of nesting the main path. ## What Changed - Streamlined thread archive/unarchive, rename, memory, metadata, rollback, compact, background terminal, shell, and guardian handlers in `codex-rs/app-server/src/codex_message_processor.rs`. - Reused shared JSON-RPC error constructors in `codex-rs/app-server/src/bespoke_event_handling.rs` for rollback-related request failures. - Preserved direct `send_error` calls where they remain the simplest boundary for pending async event responses. ## Verification - `cargo check -p codex-app-server` - `cargo test -p codex-app-server --test all v2::thread_rollback -- --test-threads=1`
pakrym-oai ·
2026-04-27 14:18:55 -07:00 -
[codex-backend] Prefer state git metadata in filtered thread lists (#19874)
### Summary - `thread/list` filtered filesystem results already overlay state DB metadata, but the existing merge only filled missing git fields. - Prefer non-null SQLite git metadata over stale non-null rollout values so persisted branch/SHA/origin updates are reflected in filtered thread lists. - Update the focused merge test to cover stale filesystem git metadata being replaced by state-backed values. ### Testing now getting expected icons <img width="426" height="913" alt="Screenshot 2026-04-27 at 1 45 45 PM" src="https://github.com/user-attachments/assets/027fb7e7-f54d-4353-8423-cb76f3c8f5ac" />
joeytrasatti-openai ·
2026-04-27 21:02:40 +00:00 -
Streamline thread start handler (#19492)
## Why The thread start handler mixed request validation, thread construction, dynamic-tool validation, and JSON-RPC error emission in one nested flow. Returning request errors from the helper path makes the successful setup path easier to follow. ## What Changed - Reworked `thread/start` handling in `codex-rs/app-server/src/codex_message_processor.rs` so helper methods return `Result` and the handler emits one result. - Moved dynamic-tool validation failures into returned JSON-RPC errors instead of local `send_error` branches. - Preserved the existing thread creation and task-spawning behavior. ## Verification - `cargo check -p codex-app-server` - `cargo test -p codex-app-server --test all v2::dynamic_tools -- --test-threads=1` - `cargo test -p codex-app-server --test all v2::turn_start -- --test-threads=1`
pakrym-oai ·
2026-04-27 13:56:20 -07:00 -
permissions: remove cwd special path (#19841)
## Why The experimental `PermissionProfile` API had both `:cwd` and `:project_roots` special filesystem paths, which made the permission root ambiguous. This PR removes the unstable `current_working_directory` special path before the permissions API is stabilized, so callers use `:project_roots` for symbolic project-root access. ## What changed - Removes `FileSystemSpecialPath::CurrentWorkingDirectory` from protocol and app-server protocol models, plus regenerated app-server JSON/TypeScript schemas. - Replaces internal `:cwd` permission entries with `:project_roots` entries. - Keeps the existing cwd-update behavior for legacy-shaped workspace-write profiles, while removing the deleted `CurrentWorkingDirectory` case from that compatibility path. - Keeps `PermissionProfile::workspace_write()` as the reusable symbolic workspace-write helper, with docs noting that `:project_roots` entries resolve at enforcement time. - Updates app-server docs/examples and approval UI labeling to stop advertising `:cwd` as a permission token. ## Compatibility Persisted rollout items may contain the old `{"kind":"current_working_directory"}` tag from earlier experimental `permissionProfile` snapshots. This PR keeps that tag as a deserialize-only alias for `ProjectRoots { subpath: None }`, while continuing to serialize only the new `project_roots` tag. ## Follow-up This PR intentionally does not introduce an explicit project-root set on `SessionConfiguration` or runtime sandbox resolution. Today, the resolver still uses the active cwd as the single implicit project root. A follow-up should model project roots separately from tool cwd so `:project_roots` entries can resolve against the configured project roots, and resolve to no entries when there are no project roots. ## Verification - `cargo test -p codex-protocol permissions:: --lib` - `cargo test -p codex-app-server-protocol` - `cargo test -p codex-sandboxing -p codex-exec-server --lib` - `cargo test -p codex-core session_configuration_apply_ --lib` - `cargo test -p codex-app-server command_exec_permission_profile_project_roots_use_command_cwd --test all` - `cargo test -p codex-tui thread_read_session_state_does_not_reuse_primary_permission_profile --lib` - `cargo test -p codex-tui preset_matching_accepts_workspace_write_with_extra_roots --lib` - `cargo test -p codex-config --lib`Michael Bolin ·
2026-04-27 13:41:27 -07:00 -
Preserve TUI markdown list spacing after code blocks (#19706)
## Why Fixes #19702. The TUI markdown renderer could visually attach the next list marker to a fenced code block inside the previous list item, even when the source markdown included a blank line before the next item. That made block-heavy loose lists harder to read, while the desired behavior is still to keep simple lists compact. ## What changed - Track whether the current rendered list item contains a code block. - Preserve one blank separator before the following list marker only when the previous item contained a code block. - Add regression coverage for both paths: code-block list items keep the separator, and simple loose list items stay compact. ## Verification - `cargo test -p codex-tui markdown_render` I also manually verified that the bug exists before and is fixed after. ## Before <img width="437" height="240" alt="Screenshot 2026-04-26 at 1 19 01 PM" src="https://github.com/user-attachments/assets/3bc9d64d-2dba-40d9-9d6b-a1d0b3c0f728" /> ## After <img width="410" height="269" alt="Screenshot 2026-04-26 at 1 18 54 PM" src="https://github.com/user-attachments/assets/19c15bee-da32-455e-a7cb-e05eb85f4ea0" />
Eric Traut ·
2026-04-27 13:40:46 -07:00 -
Delay approval prompts while typing (#19513)
## Why Fixes #7744. Approval modals can currently appear while the user is typing ahead in the TUI composer, which lets plain letters like `y` or `a` get consumed as approval shortcuts instead of staying in the draft input. ## What changed - Track recent composer typing activity in `bottom_pane/mod.rs`. - Delay new approval overlays for 1 second while the composer is active, keeping delayed requests queued until the user is idle. - Preserve the existing active-overlay behavior so approvals that arrive while an approval modal is already open are still queued into that overlay. - Prune delayed approvals when app-server resolution says the request has already been handled. ## Verification Added unit coverage for immediate approvals, delayed approvals, idle deadline reset, typed shortcut letters staying in the composer, shortcut handling after the delay, and resolved delayed-request pruning. Focused `codex-tui` test groups pass locally. The full `cargo test -p codex-tui` run currently aborts in `app::tests::attach_live_thread_for_selection_rejects_unmaterialized_fallback_threads`; that same test also fails when run alone with the same stack overflow. Manual reviewer check: 1. Start the TUI from the repo root: ```bash RUST_LOG=trace just codex \ -c log_dir=<temp-log-dir> \ --ask-for-approval untrusted \ --sandbox workspace-write ``` 2. Submit this prompt: ```text create a file text.txt on my desktop ``` 3. While the agent is preparing the approval request, immediately type text such as `ya this should stay in the composer`. 4. Confirm the typed-ahead `y`/`a` remains in the composer instead of approving the request. 5. Stop typing for about 1 second; the approval modal should then appear. 6. Once the modal is visible, press `y` and confirm the approval shortcut works normally.
Eric Traut ·
2026-04-27 13:20:55 -07:00 -
Fix filtered thread-list resume regression in TUI (#19591)
## Why `codex resume` regressed after [#18502](https://github.com/openai/codex/pull/18502) changed the default `thread/list` scan-and-repair path for metadata-filtered listings. The TUI resume picker uses `thread/list` with source/provider/cwd filters and `useStateDbOnly: false`, which is the intended correctness-preserving mode: it should still consult the filesystem so healthy, missing, or stale SQLite state can be repaired. The regression was that #18502 made that filtered, filesystem-backed path call `reconcile_rollout` for every filesystem hit, and then call it again for each SQLite hit. When `reconcile_rollout` does not already have extracted rollout items, it falls back to loading the full JSONL rollout. That changed the resume picker’s first page from a cheap rollout-head scan plus SQLite read-repair into full-file reads for large sessions, so a few long threads could dominate TUI startup/resume latency. This change addresses the regression by keeping `useStateDbOnly: false` on the correctness-preserving path while avoiding unnecessary full JSONL reads for rows the filesystem scan has already validated. Source/provider/cwd filters can be decided from rollout-head metadata, so non-search resume listings only need the lightweight read-repair path for filesystem hits. Full reconciliation is still used for DB-only filtered rows because those can be stale false positives, and for search listings because search can depend on title metadata that may require scanning the full rollout. This fixes #19483. ## What changed - For non-search filtered listings, repair filesystem hits with the lightweight `read_repair_rollout_path` path instead of full `reconcile_rollout`. - Track thread IDs proven by the filesystem scan and only fully reconcile SQLite-filtered hits that the filesystem scan did not return, preserving stale-DB false-positive cleanup without full-reading every healthy rollout. - Leave search listings on full reconciliation, since search depends on full title metadata rather than only source/provider/cwd metadata from the rollout head. ## Verification - `cargo test -p codex-rollout list_threads` - `cargo test -p codex-app-server thread_list`
Eric Traut ·
2026-04-27 13:02:39 -07:00 -
Cap original-detail image token estimates (#19865)
Clamp original-detail image patch estimates to the current 10k patch budget so large images cannot inflate local context accounting without bound. Add regression coverage for an over-budget image. Fixes openai/codex#19806.
Curtis 'Fjord' Hawthorne ·
2026-04-27 12:39:24 -07:00 -
rhan-oai ·
2026-04-27 19:29:19 +00:00 -
fix: filter dynamic deferred tools from model_visible_specs (#19771)
fixes #19486 ### Problem Right now dynamic deferred tools are filtered at normal-turn prompt building time, rather than upstream while building the `ToolRouter` itself. This causes issues because dynamic deferred tools are then wrongly included in the router's `model_visible_specs`, which is what the compaction request-building flow relies on. ### Fix Move the dynamic deferred tool filtering to `ToolRouter` creation time to solve this problem for every request that relies on `ToolRouter` for `model_visible_specs`, which solves the issue generically. ### Tests Added unit + integration tests to ensure dynamic deferred tools are omitted from `model_visible_specs` and compaction request respectively. Tested against live `/compact` endpoint; raw deferred dynamic tools without `tool_search` returned `400` (current bug), while the filtered payload (this fix) returns `200`.
sayan-oai ·
2026-04-27 19:09:02 +00:00 -
Streamline account and command handlers (#19491)
## Why Account login/logout and command exec handlers were doing local error sends in the middle of each handler. That made these request flows branch heavily even though most of the logic is validate, perform the operation, and return the response. ## What Changed - Converted ChatGPT/API-key login, login cancel, logout, rate-limit, and add-credit handlers in `codex-rs/app-server/src/codex_message_processor.rs` to compute `Result` values and send them once at the request boundary. - Applied the same shape to command exec start/write/resize/terminate handlers. - Kept side-effect notifications in the same places after successful request handling. ## Verification - `cargo check -p codex-app-server` - `cargo test -p codex-app-server --test all v2::account -- --test-threads=1` - `cargo test -p codex-app-server --test all v2::command_exec -- --test-threads=1`
pakrym-oai ·
2026-04-27 12:03:49 -07:00 -
ci: migrate Bazel setup away from archived setup-bazelisk (#19851)
## Why All Bazel CI jobs are currently blocked in the `setup-bazelisk` step while trying to download Bazelisk. [`bazelbuild/setup-bazelisk`](https://github.com/bazelbuild/setup-bazelisk) is archived, and its README now recommends migrating to [`bazel-contrib/setup-bazel`](https://github.com/bazel-contrib/setup-bazel), so leaving our workflows on the archived action leaves CI exposed to exactly this sort of outage. Because `v8-canary` now consumes the shared local `setup-bazel-ci` action, that workflow also needs to trigger when the action changes. Without that follow-up, Bazel bootstrap regressions specific to the V8 canary path could be skipped by the workflow path filters. ## What Changed - Switched `.github/actions/setup-bazel-ci/action.yml` from `bazelbuild/setup-bazelisk` to `bazel-contrib/setup-bazel`, pinned to `0.19.0`. - Left `bazelisk-version` unset so GitHub-hosted runners can use their preinstalled Bazelisk instead of downloading `1.x` at job start. - Updated `.github/workflows/rusty-v8-release.yml` and `.github/workflows/v8-canary.yml` to use the shared `setup-bazel-ci` action instead of referencing `setup-bazelisk` directly. - Added `.github/actions/setup-bazel-ci/**` to the `pull_request` and `push` path filters in `.github/workflows/v8-canary.yml` so changes to the shared Bazel setup action still run the canary workflow. - Kept the existing repository-cache and Windows-specific Bazel setup logic intact. This keeps Bazel version selection anchored by `.bazelversion` while removing the failing dependency on the archived setup action. ## Verification - Searched `.github/` to confirm there are no remaining `setup-bazelisk` references. - Parsed the updated workflow and action YAML locally with Ruby's `YAML.load_file`.
Michael Bolin ·
2026-04-27 11:37:30 -07:00 -
ci: pin npm staging smoke test to a recent rust-release run (#19854)
## Why The `build-test` workflow stages a representative `codex` npm tarball by asking `scripts/stage_npm_packages.py` to look up a past `rust-release` run for a hardcoded release version. That started failing in CI because the representative version in `.github/workflows/ci.yml` was stale: - the workflow was still using `0.115.0` - `stage_npm_packages.py` resolves native artifacts by looking for a `rust-release` run on the `rust-v<version>` branch - that lookup no longer found a matching run for `rust-v0.115.0`, so the smoke test failed before it could stage the package This PR makes that smoke test depend on a known-good recent release run instead of an older branch lookup that is no longer reliable. ## What Changed - Updated the representative release version in `.github/workflows/ci.yml` from `0.115.0` to `0.125.0`. - Added an explicit `WORKFLOW_URL` pointing at a recent successful `rust-release` run: `https://github.com/openai/codex/actions/runs/24901475298`. - Passed that URL to `scripts/stage_npm_packages.py` via `--workflow-url` so the job can reuse the expected native artifacts directly instead of relying on `gh run list --branch rust-v<version>` to discover them. That keeps the npm staging smoke test representative while making it less sensitive to older release branch history disappearing from the GitHub Actions lookup path. ## Verification - Inspected the failing CI log from `build-test` and confirmed the failure came from `scripts/stage_npm_packages.py` being unable to resolve `rust-v0.115.0`. - Confirmed that `https://github.com/openai/codex/actions/runs/24901475298` is a successful `rust-release` run for `rust-v0.125.0`.
Michael Bolin ·
2026-04-27 11:32:48 -07:00 -
refactor: make auth loading async (#19762)
## Summary Auth loading used to expose synchronous construction helpers in several places even though some auth sources now need async work. This PR makes the auth-loading surface async and updates the callers to await it. This is intentionally only plumbing. It does not change how AgentIdentity tokens are decoded, how task runtime ids are allocated, or how JWT signatures are verified. ## Stack 1. **This PR:** [refactor: make auth loading async](https://github.com/openai/codex/pull/19762) 2. [refactor: load AgentIdentity runtime eagerly](https://github.com/openai/codex/pull/19763) 3. [feat: verify AgentIdentity JWTs with JWKS](https://github.com/openai/codex/pull/19764) ## Important call sites | Area | Change | | --- | --- | | `codex-login` auth loading | `CodexAuth` and `AuthManager` construction paths now await auth loading. | | app-server startup | Auth manager construction is awaited during initialization. | | CLI/TUI/exec/MCP/chatgpt callers | Existing auth-loading calls now await the same behavior. | | cloud requirements storage loader | The loader becomes async so it can share the same auth construction path. | | auth tests | Tests that load auth now run in async contexts. | ## Testing Tests: targeted Rust auth test compilation, formatter, scoped Clippy fix, and Bazel lock check.
efrazer-oai ·
2026-04-27 11:00:27 -07:00 -
Streamline plugin, apps, and skills handlers (#19490)
## Why The plugin, app, and skills handlers had a lot of repeated `send_error`/`return` branches that made the success path hard to scan. This slice keeps behavior the same while moving fallible steps into local response-producing helpers, so the request boundary can send one result. ## What Changed - Converted plugin list/install/uninstall handlers in `codex-rs/app-server/src/codex_message_processor/plugins.rs` to return `Result<*Response, JSONRPCErrorError>` from helper methods and call `send_result` once. - Added local error-mapping helpers for plugin install/uninstall and marketplace failures. - Applied the same mechanical shape to app list, skills list/config, and marketplace add/remove/upgrade handlers in `codex-rs/app-server/src/codex_message_processor.rs`. ## Verification - `cargo check -p codex-app-server` - `cargo test -p codex-app-server --test all v2::app_list -- --test-threads=1` - `cargo test -p codex-app-server --test all v2::plugin_ -- --test-threads=1` - `cargo test -p codex-app-server --test all v2::skills_list -- --test-threads=1`
pakrym-oai ·
2026-04-27 10:18:25 -07:00 -
Render delegated patch approval details (#19709)
## Why Fixes #19632. When a delegated agent requests approval for an in-progress file change, the parent TUI handles that request from an inactive thread. The app server already sent the `FileChange` item with the proposed diff, but the inactive-thread approval path was not recovering and rendering it the same way as the active-thread path. The result was an inconsistent approval prompt: main-thread edits show a normal patch preview history item before the approval modal, while delegated edits did not show that preview in the transcript flow. ## What Changed - Recover buffered or historical `FileChange` item changes when building inactive-thread file-change approval requests. - Reuse the app-server file-change conversion helper for both live transcript rendering and inactive-thread approvals. - Render recovered delegated patches as a normal patch preview history cell before the approval modal. - Keep apply-patch approval modals focused on the decision prompt and optional metadata; they do not render a synthetic command line or embed the diff body. ## Manual Repro And Verification I manually reproduced the issue using a file under `~/Desktop` so the write would require approval. Before the fix: 1. Ask the main thread: `Use apply_patch, not shell redirection or Python, to create ~/Desktop/bug1.txt with three short lines.` 2. Observe the expected TUI shape: the transcript shows a normal patch preview such as `• Added ~/Desktop/bug1.txt (+N -0)` above the approval modal, and the modal contains only the approval prompt/options without a synthetic command line. 3. Ask for the delegated path: `Spawn a worker. Have it use apply_patch, not shell redirection or Python, to create ~/Desktop/bug1.txt with four short lines.` 4. Observe the delegated approval is inconsistent: the parent view does not render the proposed patch as the normal transcript preview before the modal, so the diff context is missing from the stream or appears inside the modal instead of in the history flow. After the fix: 1. Repeat the delegated worker prompt with `apply_patch`. 2. Confirm the parent view renders the same normal patch preview history cell (`• Added ~/Desktop/bug1.txt (+N -0)` plus the diff) immediately before the approval modal. 3. Confirm the approval modal remains focused on the decision prompt. For delegated approvals it may show the worker thread label, but it should not show a `$ apply_patch` command line or embed the diff body in the modal.
Eric Traut ·
2026-04-27 10:07:15 -07:00 -
Persist shell mode commands in prompt history (#19618)
## Why `!` shell commands are currently surfaced as "Bash mode", which is misleading for users running shells such as PowerShell or zsh. Those commands also bypass the persistent prompt history path, so they cannot be recalled after starting a new session. Fixes #19613. ## What changed - Rename the TUI footer label and related test wording from "Bash mode" to "Shell mode". - Persist accepted `!` shell commands to prompt history with the leading `!`, so recall restores the composer into shell mode across sessions. - Add coverage for immediate and queued shell-command submissions emitting the prompt-history update. ## Verification - `cargo test -p codex-tui bang_shell` - `cargo test -p codex-tui shell_command_uses_shell_accent_style` - `cargo test -p codex-tui footer_mode_snapshots` - `cargo insta pending-snapshots --manifest-path tui/Cargo.toml` Manually verified fix after confirming presence of bug prior to fix.
Eric Traut ·
2026-04-27 09:54:25 -07:00 -
Hide rewind preview when no user message exists (#19510)
## Why Fixes #19508. In a fresh TUI session, pressing `Esc` twice entered the rewind transcript overlay even though there was no user message to rewind to. That produced an empty header-only transcript view and exposed a rewind flow that could not select a valid target. ## What changed The backtrack flow now checks whether a user-message rewind target exists before opening the transcript preview. If no target exists, Codex stays in the main TUI and shows `No previous message to edit.` instead of opening an empty overlay. The same guard applies when starting rewind preview from the transcript overlay, and the first `Esc` no longer advertises the “edit previous message” hint when there is no previous message available. Snapshot coverage was added for the unavailable rewind info message, along with a small target-detection test.
Eric Traut ·
2026-04-27 09:51:12 -07:00 -
chore: split memories part 1 (#19818)
Extract memories into 2 different crates
jif-oai ·
2026-04-27 16:01:05 +02:00 -
nit: one more fix (#19813)
Fix this: https://github.com/openai/codex/pull/19812#discussion_r3147529230
jif-oai ·
2026-04-27 15:32:31 +02:00 -
Avoid rewriting Phase 2 selection on clean workspace (#19812)
## Why Phase 2 can now claim the global consolidation lock on startup even when the git-backed memory workspace is already clean. The clean-workspace path still finalized through the normal Phase 2 success path, which clears and re-marks `selected_for_phase2` rows. That made no-op startups perform avoidable writes to `stage1_outputs`, creating unnecessary DB I/O and contention when no memory files changed. ## What Changed - Added a preserving-selection Phase 2 finalizer in `codex-state` that only marks the global job row as succeeded. - Kept the existing `mark_global_phase2_job_succeeded` behavior for real consolidation runs, where the selected Phase 2 snapshot must be rewritten. - Switched the `succeeded_no_workspace_changes` branch in `core/src/memories/phase2.rs` to use the preserving-selection finalizer. - Added a regression test that installs a SQLite trigger on `stage1_outputs` and verifies the clean finalizer performs zero updates there. ## Testing - `cargo test -p codex-state` - `cargo test -p codex-core memories::tests::phase2`
jif-oai ·
2026-04-27 15:14:16 +02:00 -
Allow Phase 2 memory claims after retry exhaustion (#19809)
## Why The Phase 2 memories job row is only the global lock for the git-backed memory workspace. Manual memory edits do not enqueue new Stage 1 work, so a Phase 2 row with `retry_remaining = 0` could be skipped before the worker ever claimed the lock and generated `phase2_workspace_diff.md`. That left workspace-only changes unconsolidated after repeated failures, even when retry backoff had elapsed and the filesystem had real diffable work. ## What Changed - Allow `try_claim_global_phase2_job` to claim the Phase 2 lock after the retry budget is exhausted, while still respecting active `retry_at` backoff and fresh running leases. - Treat `SkippedRetryUnavailable` for Phase 2 as backoff-only, and update the outcome docs to match. - Clamp Phase 2 retry bookkeeping at zero when failed attempts are recorded. ## Verification - Added `phase2_global_lock_can_be_claimed_after_retry_budget_is_exhausted` to cover the exhausted-budget lock claim path. - Ran `cargo test -p codex-state`.
jif-oai ·
2026-04-27 14:58:11 +02:00 -
feat: use git-backed workspace diffs for memory consolidation (#18982)
## Why This PR make the `morpheus` agent (memory phase 2) use a git diff to start it's consolidation. The workflow is the following: 1. The agent acquire a lock 2. If `.codex/memories` does not exist or is not a git root, initialize everything (and make a first empty commit) 3. Update `raw_memories.md` and `rollout_summaries/` as before. Basically we select max N phase 1 memories based on a given policy 4. We use git (`gix`) to get a diff between the current state of `.codex/memories` and the last commit. 5. Dump the diff in `phase2_workspace_diff.md` 6. Spawn `morpheus` and point it to `phase2_workspace_diff.md` 7. Wait for `morpheus` to be done 8. Re-create a new `.git` and make one single commit on it. We do this because we don't want to preserve history through `.git` and this is cheap anyway 9. We release the lock On top of this, we keep the retry policies etc etc The goals of this new workflow are: * Better support of any memory extensions such as `chronicle` * Allow the user to manually edit memories and this will be considered by the phase 2 agent As a follow-up we will need to add support for user's edition while `morpheus` is running ## What Changed - Added memory workspace helpers that prepare the git baseline, compute the diff, write `phase2_workspace_diff.md`, and reset the baseline after successful consolidation. - Updated Phase 2 to sync current inputs into `raw_memories.md` and `rollout_summaries/`, prune old extension resources, skip clean workspaces, and run the consolidation subagent only when the workspace has changes. - Tightened Phase 2 job ownership around long-running consolidation with heartbeats and an ownership check before resetting the baseline. - Simplified the prompt and state APIs so DB watermarks are bookkeeping, while workspace dirtiness decides whether consolidation work exists. - Updated the memory pipeline README and tests for workspace diffs, extension-resource cleanup, pollution-driven forgetting, selection ranking, and baseline persistence. ## Verification - Added/updated coverage in `core/src/memories/tests.rs`, `core/src/memories/workspace_tests.rs`, `state/src/runtime/memories.rs`, and `core/tests/suite/memories.rs`. --------- Co-authored-by: Codex <noreply@openai.com>
jif-oai ·
2026-04-27 14:32:44 +02:00 -
multi_agent_v2: move thread cap into feature config (#19792)
## Why `features.multi_agent_v2.max_concurrent_threads_per_session` is meant to be the MultiAgentV2-specific session thread cap: it counts the root thread and all open subagent threads. The previous implementation kept this surface tied to `agents.max_threads`, which made it a global subagent-only cap and allowed the legacy setting to coexist with MultiAgentV2. ## What Changed - Added `max_concurrent_threads_per_session` to `[features.multi_agent_v2]` with default `4`. - Removed the `[agents] max_concurrent_threads_per_session` alias to `agents.max_threads`. - When MultiAgentV2 is enabled, reject `agents.max_threads` and derive the existing internal subagent slot limit as `max_concurrent_threads_per_session - 1`. - Regenerated `core/config.schema.json` and added coverage for the new config semantics. ## Result ``` ➜ codex git:(jif/clean-multi-agent-v2-config) codex -c features.multi_agent_v2.enabled=true -c features.multi_agent_v2.max_concurrent_threads_per_session=3 ╭────────────────────────────────────────────────────╮ │ >_ OpenAI Codex (v0.0.0) │ │ │ │ model: gpt-5.5 xhigh fast /model to change │ │ directory: ~/code/codex │ ╰────────────────────────────────────────────────────╯ Tip: Update Required - This version will no longer be supported starting May 8th. Please upgrade to the latest version (https://github.com/openai/codex/releases/latest) using your preferred package manager. › Can you try to spawn 4 agents • I’ll try to start four lightweight agents at once and report exactly what the runtime accepts. • Spawned Russell [no-apps] (gpt-5.5 xhigh) └ Spawn probe 1: reply briefly that you started, then wait for further instructions. Do not do any repo work. • Spawned Descartes [no-apps] (gpt-5.5 xhigh) └ Spawn probe 2: reply briefly that you started, then wait for further instructions. Do not do any repo work. • Agent spawn failed └ Spawn probe 3: reply briefly that you started, then wait for further instructions. Do not do any repo work. • Agent spawn failed └ Spawn probe 4: reply briefly that you started, then wait for further instructions. Do not do any repo work. • The runtime accepted the first two and rejected the next two with agent thread limit reached. I’m checking whether the two accepted probes have returned cleanly, then I’ll close them if needed. ``` --------- Co-authored-by: Codex <noreply@openai.com>
jif-oai ·
2026-04-27 13:31:56 +02:00 -
Add Codex issue digest skill (#19779)
Problem: Maintainers need a shared way to run Codex GitHub issue digests without copying large prompts or relying on manual GitHub page summaries. Solution: Add a reusable codex-issue-digest skill with a deterministic GitHub collector, owner/all-label windows, reaction-aware activity metrics, scaled attention markers, and focused tests.
Eric Traut ·
2026-04-26 23:16:43 -07:00 -
permissions: derive legacy exec policies at boundaries (#19737)
## Why After config and requirements store canonical profiles, exec requests should not cache a derived `SandboxPolicy`. The cached legacy value can drift from the richer profile state, and most execution paths already have the filesystem and network runtime policies they need. ## What Changed - Removes `sandbox_policy` from `codex_sandboxing::SandboxExecRequest` and `codex_core::sandboxing::ExecRequest`. - Adds an on-demand `ExecRequest::compatibility_sandbox_policy()` helper for the Windows and legacy call sites that still need a `SandboxPolicy` projection. - Updates Windows filesystem override setup and unified exec policy serialization to derive that compatibility policy at the boundary. - Updates Unix escalation reruns and direct shell requests to reconstruct exec requests from `PermissionProfile` plus runtime filesystem/network policy, without carrying a cached legacy policy. - Adjusts sandboxing manager tests to assert the effective profile rather than the removed legacy field. ## Verification - `cargo check -p codex-config -p codex-core -p codex-sandboxing -p codex-app-server -p codex-cli -p codex-tui` - `cargo test -p codex-sandboxing manager` - `cargo test -p codex-core exec_server_params_use_env_policy_overlay_contract` - `cargo test -p codex-core unix_escalation` - `cargo test -p codex-core exec::tests` - `cargo test -p codex-core sandboxing::tests`
Michael Bolin ·
2026-04-26 22:11:49 -07:00 -
Michael Bolin ·
2026-04-26 21:49:30 -07:00 -
Michael Bolin ·
2026-04-26 20:59:58 -07:00 -
Add /auto-review-denials retry approval flow (#19058)
## Why Auto-review can deny an action that the user later decides they want to retry. Today there is no TUI surface for selecting a recent denial and sending explicit approval context back into the session, so users have to restate intent manually and the retry can be reviewed without the original denied action context. This adds a narrow TUI-driven path for approving a recent denied action while still keeping the retry inside the normal auto-review flow. ## What Changed - Added `/auto-review-denials` to open a picker of recent denied auto-review actions. - Added a small in-memory TUI store for the 10 most recent denied auto-review events. - Selecting a denial sends the structured denied event back through the existing core/app-server op path. - Core now injects a developer message containing the approved action JSON rather than the full assessment event. - Auto-review transcript collection now preserves this specific approval developer message so follow-up review sessions can see the user approval context. - Added TUI snapshot/unit coverage for the picker and approval dispatch path. - Added core coverage for retaining the approval developer message in the auto-review transcript. ## Verification - `cargo test -p codex-core collect_guardian_transcript_entries_keeps_manual_approval_developer_message` - `cargo test -p codex-tui auto_review_denials` - `cargo test -p codex-tui approving_recent_denial_emits_structured_core_op_once` ## Notes This intentionally keeps retries going through auto-review. The approval signal is context for the exact previously denied action, not a blanket bypass for similar future actions.
Won Park ·
2026-04-27 03:43:53 +00:00 -
permissions: centralize legacy sandbox projection (#19734)
## Why The remaining migration work still needs `SandboxPolicy` at a few compatibility boundaries, but those projections should come from one canonical path. Keeping ad hoc legacy projections scattered through app-server, CLI, and config code makes it easy for behavior to drift as `PermissionProfile` gains fidelity that the legacy enum cannot represent. ## What Changed - Adds `Permissions::legacy_sandbox_policy(cwd)` and `Config::legacy_sandbox_policy()` as the compatibility projection from the canonical `PermissionProfile`. - Adds `Permissions::can_set_legacy_sandbox_policy()` so legacy inputs are checked after they are converted into profile semantics. - Updates app-server command handling, Windows sandbox setup, session configuration, and sandbox summaries to use the centralized projection helper. - Leaves `SandboxPolicy` in place only for boundary inputs/outputs that still speak the legacy abstraction. ## Verification - `cargo check -p codex-config -p codex-core -p codex-sandboxing -p codex-app-server -p codex-cli -p codex-tui` - `cargo test -p codex-tui permissions_selection_history_snapshot_full_access_to_default -- --nocapture` - `cargo test -p codex-tui permissions_selection_sends_approvals_reviewer_in_override_turn_context -- --nocapture` - `bazel test //codex-rs/tui:tui-unit-tests-bin --test_arg=permissions_selection_history_snapshot_full_access_to_default --test_output=errors` - `bazel test //codex-rs/tui:tui-unit-tests-bin --test_arg=permissions_selection_sends_approvals_reviewer_in_override_turn_context --test_output=errors` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19734). * #19737 * #19736 * #19735 * __->__ #19734
Michael Bolin ·
2026-04-26 20:31:23 -07:00 -
inline hostname resolution for remote sandbox config (#19739)
# Why Requirements support host-specific `remote_sandbox_config.hostname_patterns`, but config loading previously resolved and passed the system hostname through every config-loading path even when no requirements layer used `remote_sandbox_config`. On machines where hostname lookup is slow, startup and app-server config reads paid for a feature that was not active. We only need the hostname when a requirements layer actually declares `remote_sandbox_config`, so this moves hostname resolution to the single requirements merge point and keeps all other config callers unaware of hostname matching. # What - Removed the eager `host_name` plumbing from `load_config_layers_state`, `load_requirements_toml`, `ConfigBuilder`, app-server `ConfigManager`, network proxy loading, and related call sites. - Resolve the hostname inside `merge_requirements_with_remote_sandbox_config` only when the incoming requirements contain `remote_sandbox_config`.
Abhinav ·
2026-04-27 03:18:57 +00:00 -
Michael Bolin ·
2026-04-26 19:42:39 -07:00 -
Andrey Mishchenko ·
2026-04-26 17:56:05 -07:00 -
permissions: remove core legacy policy round trips (#19394)
## Why Several execution paths still converted profile-backed permissions into `SandboxPolicy` and then rebuilt runtime permissions from that legacy shape. Those round trips are unnecessary after the preceding PRs and can lose split filesystem semantics. Core approval and escalation should carry the resolved profile directly. ## What Changed - Removes `sandbox_policy` from `ResolvedPermissionProfile`; the resolved permission object now carries the canonical `PermissionProfile` directly. - Updates exec-policy fallback, shell/unified-exec interception, escalation reruns, and related tests to pass profiles instead of legacy policies. - Removes legacy additional-permission merge helpers that built an effective `SandboxPolicy` before rebuilding runtime permissions. - Keeps legacy projections only at compatibility boundaries that still require `SandboxPolicy`, not in core permission computation. ## Verification - `cargo test -p codex-core direct_write_roots` - `cargo test -p codex-core runtime_roots_to_legacy_projection` - `cargo test -p codex-app-server requested_permissions_trust_project_uses_permission_profile_intent` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19394). * #19737 * #19736 * #19735 * #19734 * #19395 * __->__ #19394
Michael Bolin ·
2026-04-26 17:43:32 -07:00 -
Delete unused ResponseItem::Message.end_turn (#19605)
This field is unused. Delete it.
Andrey Mishchenko ·
2026-04-26 17:18:09 -07:00 -
Split MCP connection modules (#19725)
## Why The MCP connection manager module had grown to mix orchestration, RMCP client startup, elicitation handling, Codex Apps cache and naming behavior, tool qualification and filtering, and runtime data. The previous stacked PRs split these responsibilities incrementally; this PR collapses that work into one self-contained refactor on latest main. ## What changed - Move McpConnectionManager into connection_manager.rs. - Move RMCP client lifecycle, startup, and uncached tool listing into rmcp_client.rs. - Move elicitation request tracking and policy handling into elicitation.rs. - Move Codex Apps cache, key, filtering, and naming helpers into codex_apps.rs. - Rename the tool-name helper module to tools.rs and move ToolInfo, tool filtering, schema masking, and qualification there. - Move runtime and sandbox shared types into runtime.rs. - Preserve latest main PermissionProfile-based MCP elicitation auto-approval behavior. ## Verification - just fmt - cargo check -p codex-mcp - cargo check -p codex-mcp --tests - cargo check -p codex-core --------- Co-authored-by: Codex <noreply@openai.com>
Ahmed Ibrahim ·
2026-04-26 23:23:34 +00:00 -
test: increase core-all-test shard count to 16 (#19727)
## Summary Increase `core-all-test`'s Bazel shard count from `8` to `16`. ## Why [#19609](https://github.com/openai/codex/pull/19609) restored `bazel.yml` to a 30-minute timeout and increased `app-server-all-test`'s shard count because the bigger timeout risk was not just a cold Windows build. The more common problem was a long `rust_test()` shard failing and getting retried multiple times. Recent `main` runs show that `//codex-rs/core:core-all-test` still has the same shape of problem on Windows: - [Run 24943931330](https://github.com/openai/codex/actions/runs/24943931330) reported `//codex-rs/core:core-all-test` as flaky after first-attempt failures in shard `5/8` and shard `8/8`. - Those retries were driven by `suite::cli_stream::responses_mode_stream_cli_supports_openai_base_url_config_override` and `suite::pending_input::steered_user_input_waits_when_tool_output_triggers_compact_before_next_request`. - The failed shard attempts in that run took `272.61s` and `259.27s` before retrying, which is exactly the sort of wall-clock cost that burns through the 30-minute budget. - [Run 24966332583](https://github.com/openai/codex/actions/runs/24966332583) also retried `//codex-rs/tui:tui-unit-tests` after `app::tests::update_memory_settings_updates_current_thread_memory_mode` failed once on Windows. - [Run 24965527138](https://github.com/openai/codex/actions/runs/24965527138) and its linked [BuildBuddy invocation](https://app.buildbuddy.io/invocation/ac1a8265-06fa-4da5-9552-4715b7965bce) show the other half of the problem: when Windows cache reuse is weak, the `bazel test //...` step can already consume `24m11s` on its own, leaving very little headroom for flaky retries. Increasing `core-all-test` to `16` shards does not fix the flaky tests, but it does reduce the wall-clock cost when a single shard has to be retried. That matches the mitigation we already applied to `app-server-all-test` in `#19609`. ## What Changed - Update `codex-rs/core/BUILD.bazel` so `core-all-test` uses `16` shards instead of `8`. - Leave `core-unit-tests` unchanged. ## Follow-up Work This change is meant to buy back CI headroom while we fix the flaky tests themselves in subsequent commits. The recent Windows retries that look worth addressing directly include: - `suite::cli_stream::responses_mode_stream_cli_supports_openai_base_url_config_override` - `suite::pending_input::steered_user_input_waits_when_tool_output_triggers_compact_before_next_request` - `app::tests::update_memory_settings_updates_current_thread_memory_mode` ## Verification - Compared `core-all-test`'s current sharding against the `app-server-all-test` precedent in [#19609](https://github.com/openai/codex/pull/19609). - Inspected recent `main` Bazel workflow logs and the linked BuildBuddy invocation to confirm that Windows retries on long shards are still consuming a meaningful fraction of the 30-minute timeout budget. - Did not run local tests for this change because it only adjusts Bazel sharding metadata.
Michael Bolin ·
2026-04-26 23:10:26 +00:00 -
Fix codex-core config test type paths (#19726)
Summary: - Update config tests to reference config requirement types from codex_config after the loader split. Tests: - just fmt - cargo build -p codex-core --tests - cargo clippy -p codex-core --tests -- -D warnings
pakrym-oai ·
2026-04-26 15:58:17 -07:00 -
permissions: migrate approval and sandbox consumers to profiles (#19393)
## Why Runtime decisions should not infer permissions from the lossy legacy sandbox projection once `PermissionProfile` is available. In particular, `Disabled` and `External` need to remain distinct, and managed profiles with split filesystem or deny-read rules should not be collapsed before approval, network, safety, or analytics code makes decisions. ## What Changed - Changes managed network proxy setup and network approval logic to use `PermissionProfile` when deciding whether a managed sandbox is active. - Migrates patch safety, Guardian/user-shell approval paths, Landlock helper setup, analytics sandbox classification, and selected turn/session code to profile-backed permissions. - Validates command-level profile overrides against the constrained `PermissionProfile` rather than a strict `SandboxPolicy` round trip. - Preserves configured deny-read restrictions when command profiles are narrowed. - Adds coverage for profile-backed trust, network proxy/approval behavior, patch safety, analytics classification, and command-profile narrowing. ## Verification - `cargo test -p codex-core direct_write_roots` - `cargo test -p codex-core runtime_roots_to_legacy_projection` - `cargo test -p codex-app-server requested_permissions_trust_project_uses_permission_profile_intent` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19393). * #19395 * #19394 * __->__ #19393
Michael Bolin ·
2026-04-26 15:30:40 -07:00 -
[codex] Move config loading into codex-config (#19487)
## Why Config loading had become split across crates: `codex-config` owned the config types and merge logic, while `codex-core` still owned the loader that assembled the layer stack. This change consolidates that responsibility in `codex-config`, so the crate that defines config behavior also owns how configs are discovered and loaded. To make that move possible without reintroducing the old dependency cycle, the shell-environment policy types and helpers that `codex-exec-server` needs now live in `codex-protocol` instead of flowing through `codex-config`. This also makes the migrated loader tests more deterministic on machines that already have managed or system Codex config installed by letting tests override the system config and requirements paths instead of reading the host's `/etc/codex`. ## What Changed - moved the config loader implementation from `codex-core` into `codex-config::loader` and deleted the old `core::config_loader` module instead of leaving a compatibility shim - moved shell-environment policy types and helpers into `codex-protocol`, then updated `codex-exec-server` and other downstream crates to import them from their new home - updated downstream callers to use loader/config APIs from `codex-config` - added test-only loader overrides for system config and requirements paths so loader-focused tests do not depend on host-managed config state - cleaned up now-unused dependency entries and platform-specific cfgs that were surfaced by post-push CI ## Testing - `cargo test -p codex-config` - `cargo test -p codex-core config_loader_tests::` - `cargo test -p codex-protocol -p codex-exec-server -p codex-cloud-requirements -p codex-rmcp-client --lib` - `cargo test --lib -p codex-app-server-client -p codex-exec` - `cargo test --no-run --lib -p codex-app-server` - `cargo test -p codex-linux-sandbox --lib` - `cargo shear` - `just bazel-lock-check` ## Notes - I did not chase unrelated full-suite failures outside the migrated loader surface. - `cargo test -p codex-core --lib` still hits unrelated proxy-sensitive failures on this machine, and Windows CI still shows unrelated long-running/timeouting test noise outside the loader migration itself.
pakrym-oai ·
2026-04-26 15:10:53 -07:00