mirror of
https://github.com/pchuan98/codex.git
synced 2026-07-01 00:31:56 +08:00
3cf6f08da562ee1bf6866fbc2db45a3c3af620ff
7054 Commits
-
session: keep startup prewarm aligned with resolved multi-agent runtime (#25841)
## Why Follow-up to #25722. Startup prewarm builds a preview `TurnContext` before the first real turn so it can precompute the initial prompt and tool surface. After the per-thread runtime work landed, that preview path still recomputed multi-agent mode from `model_info` and feature defaults instead of reusing the runtime the session had already resolved from persisted metadata or inheritance. That could leave the prewarmed session primed for a different multi-agent mode than the first real turn, which is especially risky because collaboration tool exposure depends on `turn_context.multi_agent_version`. ## What changed - In the `TurnMultiAgentRuntime::Preview` path, prefer `Session::multi_agent_version()` when it is already known. - Only fall back to `model_info.multi_agent_version` and feature defaults when the session has not resolved a runtime yet. - Keep preview mode read-only: this still avoids storing a runtime during startup prewarm. ## Testing - Not run (small runtime-selection follow-up)
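The selection order described above can be sketched roughly as follows. This is an illustrative sketch, not the repository's code: `MultiAgentVersion`, its variants, and `preview_multi_agent_version` are hypothetical stand-ins, and the real `TurnMultiAgentRuntime::Preview` path may carry more state.

```rust
/// Hypothetical stand-in for the version stored on a turn context.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MultiAgentVersion {
    V1,
    V2,
}

/// Chooses the version for a preview (prewarm) turn context.
///
/// Order of preference, as described in the PR text:
/// 1. the runtime the session has already resolved,
/// 2. the model's advertised version,
/// 3. the feature default.
///
/// The function only reads its inputs, mirroring the read-only preview mode.
pub fn preview_multi_agent_version(
    session_resolved: Option<MultiAgentVersion>,
    model_info_version: Option<MultiAgentVersion>,
    feature_default: MultiAgentVersion,
) -> MultiAgentVersion {
    session_resolved
        .or(model_info_version)
        .unwrap_or(feature_default)
}
```

With this shape, a session that resolved `V2` from persisted metadata keeps `V2` during prewarm even if the model info or feature defaults would suggest `V1`.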
jif-oai ·
2026-06-02 14:35:26 +02:00 -
Resolve per-thread multi-agent runtime (#25722)
Stack split from #25708. Original PR intentionally left open. This third PR resolves the effective per-thread multi-agent runtime from persisted metadata, inherited runtime, and current model selection.
jif-oai ·
2026-06-02 14:31:00 +02:00 -
Persist multi-agent runtime metadata (#25721)
Stack split from #25708. Original PR intentionally left open. This second PR persists multi-agent runtime metadata through thread creation, rollout recording, and thread storage.
jif-oai ·
2026-06-02 13:05:20 +02:00 -
Add multi-agent runtime metadata types (#25720)
Stack split from #25708. Original PR intentionally left open. This first PR adds the multi-agent runtime metadata types and catalog plumbing used by the rest of the stack.
jif-oai ·
2026-06-02 12:10:14 +02:00 -
feat: reuse compressed rollout search snippets (#25814)
## Summary - teach rollout search to return precomputed snippets for compressed rollouts - reuse those snippets in local thread search instead of reopening matching compressed files - keep the no-`rg` fallback single-pass and add regression coverage for the compressed path ## Why `thread/search` currently decodes matching compressed rollouts twice: once to discover the matching path and again to extract the snippet shown in results. That defeats a meaningful part of the compressed-read optimization work. ## Impact Compressed rollout hits now pay one decode pass on the search path while plain `.jsonl` hits keep the existing ripgrep-driven flow. ## Validation - `just test -p codex-rollout` - `just test -p codex-thread-store` - `just fix -p codex-rollout` - `just fix -p codex-thread-store` - `just fmt`
jif-oai ·
2026-06-02 11:32:36 +02:00 -
[codex] Validate plugin skill base names (#25782)
## Summary - Validate skill base name length before plugin namespacing. - Bound the composed `plugin:skill` qualified name to 128 characters. - Keep plugin skill runtime names in the existing `plugin:skill` form. - Add regression tests for the max qualified-name boundary and rejection path. ## Root Cause Plugin skills are represented as `plugin_name:skill_name`, but the loader previously applied the 64-character skill name limit after adding the plugin namespace. Moving that check to the base name fixes valid plugin skills with longer namespaces, and the separate 128-character qualified-name limit keeps model-visible skill names bounded. ## Validation - `just fmt` - `just test -p codex-core-skills plugin_skill_name_length_limit` - `git diff --check`
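A minimal sketch of the two-step check described above, assuming the limits are counted in characters (the PR text does not say whether they are characters or bytes). The function, constant, and error names are hypothetical and are not taken from `codex-core-skills`.

```rust
/// Limits quoted in the PR description.
const MAX_SKILL_BASE_NAME_LEN: usize = 64;
const MAX_QUALIFIED_NAME_LEN: usize = 128;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum SkillNameError {
    BaseNameTooLong { len: usize },
    QualifiedNameTooLong { len: usize },
}

/// Validates the base name first, then namespaces it as `plugin:skill`
/// and bounds the composed name separately.
pub fn qualify_plugin_skill_name(plugin: &str, skill: &str) -> Result<String, SkillNameError> {
    // Assumption: lengths are measured in characters, not bytes.
    let base_len = skill.chars().count();
    if base_len > MAX_SKILL_BASE_NAME_LEN {
        return Err(SkillNameError::BaseNameTooLong { len: base_len });
    }

    let qualified = format!("{}:{}", plugin, skill);
    let qualified_len = qualified.chars().count();
    if qualified_len > MAX_QUALIFIED_NAME_LEN {
        return Err(SkillNameError::QualifiedNameTooLong { len: qualified_len });
    }
    Ok(qualified)
}
```

Checking the base name before namespacing means a long plugin namespace no longer eats into the 64-character skill budget, while the 128-character bound still caps what the model sees.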
xl-openai ·
2026-06-02 06:33:02 +00:00 -
[codex] Move plugin discoverable logic into core-plugins (#25783)
## Summary - Move plugin discoverable recommendation filtering from `codex-core` into `codex-core-plugins` behind `ToolSuggestPluginDiscoveryInput`. - Keep `codex-core` as a thin adapter from `Config` to the core-plugins API and back to `DiscoverablePluginInfo`. - Keep the existing discoverable allowlist private to the core-plugins implementation. ## Validation - `just fmt` - `just test -p codex-core list_tool_suggest_discoverable_plugins` - `git diff --check` - Read-only subagent review: no findings
xl-openai ·
2026-06-01 23:25:37 -07:00 -
[codex] Cache remote plugin catalog for suggestions (#25457)
## Summary - cache the global remote plugin catalog when remote plugin listing runs and warm it during startup - use the cached remote catalog in plugin install recommendations with canonical `plugin@openai-curated-remote` ids - reuse the session `PluginsManager` for plugin recommendations so remote cache state is visible on the recommend path - skip core installed-state verification for remote plugin install suggestions while leaving local plugin and connector verification unchanged ## Testing - `just fmt` - `git diff --check` - `cargo test -p codex-core list_tool_suggest_discoverable_plugins_includes_cached_remote_global_plugins` - `cargo test -p codex-core remote_plugin_install_suggestions_skip_core_installed_verification` - `cargo test -p codex-app-server plugin_list_includes_remote_marketplaces_when_remote_plugin_enabled` Earlier focused checks during the same branch: codex-tools TUI filter test, request_plugin_install tests, and codex-app-server build.
xl-openai ·
2026-06-01 22:10:52 -07:00 -
[codex] Add plugin list JSON output (#25330)
## Summary - add `--json` output to `codex plugin list` with `installed` and `available` arrays - add `--available` for JSON output only; using it without `--json` is rejected - keep the existing non-JSON table output unchanged - add CLI coverage for JSON installed/available output and the `--available`/`--json` requirement ## Validation - `just test -p codex-cli plugin_list` - `just fix -p codex-cli` - `git diff --check` Note: `just fmt` ran Rust formatting first, then failed in the Python ruff step because `openai-codex-cli-bin==0.132.0` has no wheel for this Linux platform.
xl-openai ·
2026-06-01 21:27:06 -07:00 -
feat: show enterprise monthly credit limits in status (#24812)
## Summary Enterprise users can have an effective monthly credit limit, but Codex `/status` currently drops that metadata from the account-usage response. This change adds the optional `spend_control.individual_limit` projection to the existing rate-limit snapshot flow. The backend client reads the monthly limit, app-server exposes it as `individualLimit`, and the TUI renders a `Monthly credit limit` row through the existing progress-bar renderer. When the backend does not return an effective monthly limit, existing rate-limit behavior is unchanged. ## Existing backend state The account-usage backend already returns the effective monthly limit and current usage together: ```json { "spend_control": { "reached": false, "individual_limit": { "limit": "25000", "used": "8000", "remaining": "17000", "used_percent": 32, "remaining_percent": 68, "reset_after_seconds": 86400, "reset_at": 1778137680 } } } ``` Before this change, Codex projected rolling `primary` and `secondary` windows plus `credits`. It ignored `spend_control.individual_limit`, so app-server clients and `/status` could not render the monthly cap. The updated flow is: ```text account usage backend -> backend-client reads spend_control.individual_limit -> existing rate-limit snapshot carries optional individual_limit -> app-server exposes optional individualLimit -> TUI renders Monthly credit limit ``` ## App-server contract `account/rateLimits/read` and sparse `account/rateLimits/updated` notifications now include an additive nullable `rateLimits.individualLimit` field: ```json { "individualLimit": { "limit": "25000", "used": "8000", "remainingPercent": 68, "resetsAt": 1778137680 } } ``` In an `account/rateLimits/read` response, `null` means no monthly limit is available. `account/rateLimits/updated` remains a sparse rolling notification: clients merge available values into their most recent `account/rateLimits/read` snapshot or refetch. 
Nullable account metadata in a rolling notification does not clear a previously observed value. ## Design decisions - Extend the existing rate-limit snapshot instead of introducing a separate request or wire-level update protocol. - Keep the Codex projection narrow: `/status` needs the effective limit, current usage, remaining percentage, and reset timestamp. - Render the monthly row through the existing progress-bar renderer, with one optional detail line for `8,000 of 25,000 credits used`. - Keep the backend response optional so existing accounts and older usage states preserve their current behavior. - Preserve cached monthly metadata when sparse rolling notifications omit it. Live account-usage reads remain authoritative and can clear a removed limit. ## Visual evidence ```text Monthly credit limit: [██████████████░░░░░░] 68% left (resets 07:08 on 7 May) 8,000 of 25,000 credits used ``` Snapshot: `codex-rs/tui/src/status/snapshots/codex_tui__status__tests__status_snapshot_includes_enterprise_monthly_credit_limit.snap` ## Testing Tests: generated app-server schema verification, protocol tests, backend-client tests, app-server integration coverage, TUI snapshot coverage, formatting, and workspace lint cleanup.
efrazer-oai ·
2026-06-01 21:25:42 -07:00 -
Move code review rules into AGENTS (#25738)
## Why Codex Review now supports repository-specific review rules in AGENTS.md. Adding the review prompts there makes the guidance available as repository review rules next to the code it governs while keeping the existing local review skills intact. ## What changed - Added a `## Code Review Rules` section to `AGENTS.md` with the existing review prompts for model context, breaking changes, test authoring, and change size. - Preserved the existing `.codex/skills/code-review*` skill files. ## Verification - `git diff --check origin/main...HEAD`
pakrym-oai ·
2026-06-02 01:41:04 +00:00 -
[codex] Add comprehensive root formatting check (#25683)
## Why The root formatting entrypoints could drift: `just fmt` did not format the Justfile itself, and the CI-facing check recipe only checked Python scripts instead of matching everything formatted by `just fmt`. ## What changed - Add a shared cross-platform Python formatter driver used by both `just fmt` and `just fmt-check`. - Run Justfile, Rust, Python SDK, and internal-script formatter groups concurrently while buffering each formatter group's output until it finishes. - Log formatter starts immediately, then print each formatter group's labeled output when it completes. - Keep the SDK lint-fix and Ruff formatting passes ordered, with source comments explaining their distinct roles and the check-mode equivalents. - Run Ruff through shared `uv run --no-sync --with ruff` overlays so formatting works on clean glibc Linux checkouts without installing the platform-specific SDK runtime wheel. - Show `fmt-check` help text in `just -l` and simplify CI to call the shared driver through `just fmt-check`. - Pin the general CI workflow to `just@1.51.0` so its formatter agrees with the checked-in Justfile. - Add regression coverage for the thin Just recipes and the driver's formatter graph. ## Validation - `just fmt` - `just fmt-check` - `python3 -m pytest sdk/python/tests/test_artifact_workflow_and_binaries.py -k 'root_fmt or root_format' -q` - `pnpm run format` - `git diff --check` - `just -l | rg -n '^ fmt|fmt-check'` - `uvx --from uv==0.7.22 uv run --frozen --project sdk/python --no-sync --with ruff ruff check --diff sdk/python`
Adam Perry @ OpenAI ·
2026-06-02 01:20:25 +00:00 -
feat(remote-control): add pairing start (#25675)
## Why Remote control enrollment authorizes a desktop server, but app-server v2 did not expose the follow-up pairing operation needed to mint a short-lived controller pairing artifact from that enrolled server. Clients need a narrow RPC that starts pairing without exposing the backend `serverId` or conflating pairing with websocket connection state. Issue: N/A; internal remote-control pairing API change. ## What Changed Added experimental app-server v2 `remoteControl/pairing/start` with `manualCode` input and `pairingCode`, nullable `manualPairingCode`, `environmentId`, and Unix-seconds `expiresAt` output. The method serializes under its own `global("remote-control-pairing")` scope and is documented in `app-server/README.md`. Extended the remote-control transport with private `/server/pair` request/response types and normalized `pair_url` handling. Pairing uses the current enrolled server bearer, refreshes that bearer when needed, keeps backend `server_id` private, validates returned `server_id` and `environment_id` against the current enrollment, and preserves backend status/header/body context for failures and malformed responses. Wired the request through `RemoteControlRequestProcessor` and `MessageProcessor`, mapping unavailable/disabled pairing to `invalid_request` and backend failures to internal errors. ## Verification - `just test -p codex-app-server-transport` - `just test -p codex-app-server remote_control_pairing_start_returns_pairing_artifacts`
Anton Panasenko ·
2026-06-02 01:05:50 +00:00 -
Handle invalid plugin skills manifest field (#25717)
## Summary - Treat invalid `plugin.json` `skills` shapes as a field-level warning instead of rejecting the whole manifest - Keep valid string path behavior unchanged and continue falling back to the default `skills/` root - Add regression coverage for array-shaped `skills` ## Tests - `just fmt` - `cargo test -p codex-core-plugins`
xli-oai ·
2026-06-01 17:19:34 -07:00 -
Move cloud requirements crate to cloud config (#24621)
## Summary - Moves the existing `codex-cloud-requirements` crate to `codex-cloud-config`. - Updates workspace dependencies and imports to the new crate name. - Intentionally keeps runtime behavior unchanged: this still fetches the legacy cloud requirements endpoint. ## Details This PR exists to make the lineage obvious before the bundle migration. GitHub should show the old `codex-rs/cloud-requirements/src/lib.rs` implementation as moved to `codex-rs/cloud-config/src/lib.rs`, rather than as unrelated new code. The follow-up PR adapts this moved crate to the new config bundle API and switches runtime consumers over.
joeflorencio-openai ·
2026-06-01 16:43:52 -07:00 -
app-server: remove experimental persist_extended_history bool flag (#25712)
## Summary Remove the dead experimental `persistExtendedHistory` app-server flag and collapse rollout persistence to the single policy app-server already used. ## What Changed - Removed `persistExtendedHistory` from v2 thread start/resume/fork params and deleted its deprecation notice path. - Removed the persistence-mode enums and plumbing through core, rollout, and thread-store. - Made rollout filtering mode-free, keeping the existing limited persisted-history behavior. ## Test Plan - `just write-app-server-schema` - `cargo nextest run --no-fail-fast -p codex-app-server-protocol schema_fixtures` - `cargo nextest run --no-fail-fast -p codex-app-server thread_shell_command_history_responses_exclude_persisted_command_executions` - `cargo nextest run --no-fail-fast -p codex-rollout -p codex-thread-store` - final `rg` for removed flag/type names
Owen Lin ·
2026-06-01 23:33:42 +00:00 -
Wire managed MITM CA trust into child env (#22668)
## Stack 1. Parent PR: #18240 uses named MITM permissions config. 2. This PR wires managed MITM CA trust into spawned child processes. ## Why When Codex terminates HTTPS for limited mode or MITM hooks, child HTTPS clients need to trust Codex's managed MITM CA. Exporting proxy URLs alone is not enough, but blindly replacing user CA settings would be wrong: it can break custom enterprise/test roots, leak unreadable CA files into generated bundles, or make the child env disagree with its sandbox policy. ## Summary 1. Build immutable managed CA bundles under `$CODEX_HOME/proxy` that include native roots, the managed MITM CA, and only inherited or command-scoped CA bundles the child is allowed to read. 2. Export curated CA env vars alongside managed proxy env vars while preserving user CA override semantics, including nested Codex `SSL_CERT_FILE` precedence. 3. Thread generated CA bundle paths into child sandbox readable roots, including debug sandbox execution, so the exported env vars work inside sandboxed commands. 4. Remove only Codex-generated MITM CA bundle env when a child intentionally drops managed proxying for escalation or no-proxy retry. 5. Document the managed CA bundle behavior and cover env injection, per-child bundle generation, sandbox readable roots, and no-proxy cleanup in tests. ## Validation 1. Ran `just test -p codex-network-proxy`. 2. Ran `just test -p codex-protocol`. 3. Ran `just fix -p codex-network-proxy -p codex-protocol`. 4. Tried focused `codex-core` validation, but the crate currently fails to compile in `core/tests/suite/guardian_review.rs` because an existing `Op::UserInput` initializer is missing `additional_context`. --------- Co-authored-by: Eva Wong <evawong@openai.com>
Winston Howes ·
2026-06-01 23:23:59 +00:00 -
Reject directory rollout paths for pathless side chats (#25661)
## Why Fixes openai/codex#20944. Desktop side chats are intentionally ephemeral and pathless. They can still accept live turns while loaded, but after a reload there is no persisted rollout to resume. In the reported failure mode, Desktop could send `$CODEX_HOME` as the resume/fork path for one of these pathless side chats. `thread/resume` and `thread/fork` prefer an explicit `path` over `threadId`, and rollout path lookup only checked that a candidate existed. That let `$CODEX_HOME` pass as a rollout path, so the later rollout reader tried to open a directory and surfaced the low-level `Is a directory` error. ## What Changed - Reject explicit rollout paths that resolve to a directory or other non-file before attempting to read rollout history. - Make `codex_rollout::existing_rollout_path` return only plain or compressed rollout candidates that are actual files. - Add an app-server regression test that creates an ephemeral fork, runs a turn while the side thread is loaded, simulates reload, then verifies both `thread/resume` and `thread/fork` reject `$CODEX_HOME` with `path is a directory` instead of the OS-level directory-read error. - Rebase over the `TestAppServer` rename and update the remaining stale test harness call sites to use `TestAppServer` with `app_server` local variables. 
Relevant code: - `thread-store/src/local/read_thread.rs` validates explicit rollout paths before rollout reading: https://github.com/openai/codex/blob/25b47c8f425d351aaba4baa955a8092064a1707b/codex-rs/thread-store/src/local/read_thread.rs#L146-L165 - `rollout/src/compression.rs` now requires file metadata for plain and compressed rollout candidates: https://github.com/openai/codex/blob/25b47c8f425d351aaba4baa955a8092064a1707b/codex-rs/rollout/src/compression.rs#L940-L950 - The repro test covers the pathless ephemeral side-chat reload case: https://github.com/openai/codex/blob/25b47c8f425d351aaba4baa955a8092064a1707b/codex-rs/app-server/tests/suite/v2/thread_fork.rs#L774-L886 ## Verification - `just test -p codex-app-server pathless_ephemeral_thread_rejects_codex_home_path_after_reload`
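The file-only check described above can be illustrated with the standard library alone. This is a sketch under assumptions: `validate_rollout_path` is a hypothetical name, and the real validation in `thread-store` and `codex_rollout::existing_rollout_path` may return different error types.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Rejects explicit rollout paths that are not regular files before any
/// attempt to read rollout history from them.
pub fn validate_rollout_path(path: &Path) -> io::Result<()> {
    // `fs::metadata` follows symlinks, so a symlink to a directory is
    // rejected as a directory too.
    let metadata = fs::metadata(path)?;
    if metadata.is_dir() {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "path is a directory",
        ));
    }
    if !metadata.is_file() {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "path is not a regular file",
        ));
    }
    Ok(())
}
```

Checking metadata up front turns the confusing OS-level `Is a directory` read failure into an explicit request-level rejection.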
Michael Bolin ·
2026-06-01 16:02:06 -07:00 -
[codex] Publish release symbol artifacts (#25649)
## Why Production Codex binaries are stripped for distribution, which leaves crashes and samples from released builds without the symbols needed for useful stack traces. Publish symbols as separate release assets so production artifacts stay small while released builds remain symbolicateable. ## What changed - Add `.github/scripts/archive-release-symbols-and-strip-binaries.sh` to package platform-native symbols into `codex-symbols-<artifact>.tar.gz` assets while stripping the corresponding Unix binaries before signing. - Build release binaries with full debug information before producing distribution artifacts. - Publish macOS `.dSYM` bundles, Linux `.debug` files with `.gnu_debuglink`, and Windows `.pdb` files. - Strip Linux `bwrap` before computing its packaged-resource digest, but intentionally omit `bwrap` from symbol archives. - Preserve symbols artifacts in the unsigned macOS promotion flow. ## Verification - Ran `shellcheck` and `bash -n` on `.github/scripts/archive-release-symbols-and-strip-binaries.sh`. - Parsed the modified workflow YAML files and ran `git diff --check`. - Built a macOS release smoke binary and verified that the archived `.dSYM` contains DWARF application source information and has the same UUID as the stripped production binary. - Built Linux smoke binaries and verified that the symbol archive contains `codex.debug`, excludes `bwrap.debug`, leaves the expected `.gnu_debuglink` in `codex`, and does not mutate the separately stripped `bwrap` digest. - Staged a Windows smoke archive and verified that it contains the expected `.pdb` file.
Jeremy Rose ·
2026-06-01 15:49:54 -07:00 -
fix(tui): clarify footer shortcut overlay hints (#25625)
## Why The TUI shortcut overlay used static labels for `Tab` and `Ctrl+C`, even though both keys change behavior while a task is running. That made the visible help misleading: idle `Tab` submits rather than queues, and active-turn `Ctrl+C` interrupts rather than exits. Closes #25531. Closes #25564. ## What Changed - Pass task-running state into the shortcut overlay renderer. - Render `Tab` as `submit message` while idle and `queue message` while work is running. - Render `Ctrl+C` as `exit` while idle and `interrupt` while work is running. - Add snapshot coverage for the active-work shortcut overlay and update idle overlay snapshots. ## How to Test 1. Start Codex and open the shortcut overlay with `?` while no task is running. 2. Confirm the overlay shows `tab to submit message` and `ctrl + c to exit`. 3. Start a task, then open or keep the shortcut overlay visible while work is running. 4. Confirm the overlay shows `tab to queue message` and `ctrl + c to interrupt`. 5. Type a follow-up prompt during active work and press `Tab`; confirm it queues rather than submitting immediately. Targeted tests: - `just test -p codex-tui footer_snapshots` - `just test -p codex-tui footer_mode_snapshots` ## Validation Notes `just test -p codex-tui` currently has two unrelated guardian feature-flag test failures on this base: - `app::tests::update_feature_flags_disabling_guardian_clears_manual_review_policy_without_history` - `app::tests::update_feature_flags_disabling_guardian_clears_review_policy_and_restores_default` `just argument-comment-lint codex-rs/tui/src/bottom_pane/footer.rs` could not run locally because the prebuilt wrapper requires `dotslash`; the touched Rust diff was manually inspected for opaque positional literals.
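The state-dependent labels could be modelled along these lines. The label strings come from the PR text; `ShortcutLabels` and `shortcut_labels` are hypothetical names, not the actual items in `codex-rs/tui/src/bottom_pane/footer.rs`.

```rust
/// Labels shown next to `Tab` and `Ctrl+C` in the shortcut overlay.
#[derive(Debug, PartialEq, Eq)]
pub struct ShortcutLabels {
    pub tab: &'static str,
    pub ctrl_c: &'static str,
}

/// Picks overlay labels from the task-running state instead of using
/// static text.
pub fn shortcut_labels(task_running: bool) -> ShortcutLabels {
    if task_running {
        ShortcutLabels {
            tab: "queue message",
            ctrl_c: "interrupt",
        }
    } else {
        ShortcutLabels {
            tab: "submit message",
            ctrl_c: "exit",
        }
    }
}
```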
Felipe Coury ·
2026-06-01 19:41:22 -03:00 -
Move tool search metadata onto ToolExecutor (#25684)
Deferred tools need to be searchable even when they are not implemented inside `codex-core`. Extension-provided tools can be registered for later discovery, but the search metadata path was still owned by core-specific runtime hooks, which meant the shared `ToolExecutor` abstraction could not describe how a deferred extension tool should appear in `tool_search`. ## Changes - Move `ToolSearchEntry` and `ToolSearchInfo` into `codex-tools` and re-export them from the shared tools crate. - Add a default `ToolExecutor::search_info` implementation that derives loadable tool-search metadata from function and namespace specs. - Forward search metadata through extension adapters and exposure overrides while keeping custom search text/source metadata for dynamic, MCP, and multi-agent tools. - Remove the old core-local `tool_search_entry` module now that search metadata lives with the shared executor APIs. ## Testing - Added `deferred_extension_tools_are_discoverable_with_tool_search` coverage in `core/src/tools/spec_plan_tests.rs`.
jif-oai ·
2026-06-02 00:24:41 +02:00 -
Fix stale TestAppServer rename in plugin_list test (#25705)
## Why #25701 renamed the app-server test harness to `TestAppServer`, but it raced with #25681, which added a new `plugin_list` test call site still using the old `McpProcess` name. Once both changes met on `main`, app-server test builds failed before running the suite because `McpProcess` no longer exists in that scope. This PR fixes that CI break by updating the remaining stale call site to the renamed helper. ## What Changed - Replaced the `McpProcess::new(...)` use in `codex-rs/app-server/tests/suite/v2/plugin_list.rs` with `TestAppServer::new(...)`. - Renamed the local variable from `mcp` to `app_server` at the same call site to match the helper rename. Relevant code: https://github.com/openai/codex/blob/aadd9c999b4e0789f7afb2b9b8cc43000bb47e86/codex-rs/app-server/tests/suite/v2/plugin_list.rs#L234-L246 ## Verification Not run locally; this is a compile fix for the app-server test harness rename.
Michael Bolin ·
2026-06-01 15:14:03 -07:00 -
[codex] enable parallel standalone web search calls (#25702)
## Summary - opt the extension-backed standalone `web.run` tool into parallel tool execution - update the existing extension registration test to assert that the tool advertises parallel-call support ## Why The standalone web-search API endpoint now supports parallel requests. The extension executor still inherited the shared serial default, causing multiple `web.run` calls to acquire the exclusive runtime lock. ## Impact Models that emit multiple standalone web-search calls can now execute them concurrently when model-level parallel tool calls are enabled. ## Validation - `just fmt` - `just test -p codex-web-search-extension` - `git diff --check origin/main...HEAD`
sayan-oai ·
2026-06-01 15:04:11 -07:00 -
fix: rename McpServer to TestAppServer (#25701)
This PR brought to you via VS Code rather than Codex... - opened `codex-rs/app-server/tests/common/mcp_process.rs` - put the cursor on `McpServer` - hit `F2` and renamed the symbol to `TestAppServer` - went to the file tree - hit enter and renamed `mcp_process.rs` to `test_app_server.rs` - ran **Save All Files** from the Command Palette - ran `just fmt` The End (Admittedly, most of the local variables for `TestAppServer` are still named `mcp`, though.)
Michael Bolin ·
2026-06-01 21:49:38 +00:00 -
fix: Deduplicate installed local and remote curated plugins (#25681)
## Summary - Deduplicate installed `openai-curated` and `openai-curated-remote` plugin conflicts by feature flag. - Prefer remote when remote plugins are enabled; otherwise prefer local, while preserving one-sided installs. ## Testing - `just fmt` - `git diff --check` - Targeted `just test` was blocked locally because `cargo-nextest` is not installed.
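As a rough illustration of the preference rule, the per-plugin decision might look like the sketch below. `CuratedSource` and `pick_installed_source` are hypothetical names; the real implementation presumably works on full plugin records rather than booleans.

```rust
/// Which curated marketplace an installed plugin entry came from.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CuratedSource {
    Local,  // `openai-curated`
    Remote, // `openai-curated-remote`
}

/// Resolves a local/remote conflict for one plugin.
///
/// When both copies are installed, the feature flag decides; a one-sided
/// install is always preserved regardless of the flag.
pub fn pick_installed_source(
    local_installed: bool,
    remote_installed: bool,
    remote_plugins_enabled: bool,
) -> Option<CuratedSource> {
    match (local_installed, remote_installed) {
        (true, true) if remote_plugins_enabled => Some(CuratedSource::Remote),
        (true, true) => Some(CuratedSource::Local),
        (true, false) => Some(CuratedSource::Local),
        (false, true) => Some(CuratedSource::Remote),
        (false, false) => None,
    }
}
```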
xl-openai ·
2026-06-01 14:27:18 -07:00 -
Add Python version compatibility guidance (#25690)
## Why Python contributions in this repository should target the declared Python 3 runtime instead of carrying Python 2 compatibility patterns forward. When compatibility across Python 3 point releases matters, contributors need a consistent source of truth for the minimum supported version. ## What changed - Added Python development guidance to `AGENTS.md` stating that the repository uses Python 3+ and should not use the `__future__` module. - Documented that contributors should check the nearest `pyproject.toml` `requires-python` field when evaluating Python 3 point-release compatibility. ## Testing Not run (guidance-only change).
Adam Perry @ OpenAI ·
2026-06-01 14:05:54 -07:00 -
[codex] Generalize deferred nested tool guidance (#25689)
## Summary - describe omitted code-mode tools as deferred nested tools instead of MCP/app tools - update the prompt-description assertion to match ## Why Deferred dynamic tools are also callable through `tools` and discoverable in `ALL_TOOLS`, so the previous MCP/app-specific wording was too narrow. ## Validation - `just fmt` - `just test -p codex-code-mode` - `git diff --check`
sayan-oai ·
2026-06-01 21:01:30 +00:00 -
Add rollout compression histograms (#25680)
## Summary Stacked on #25679. Add histogram telemetry for rollout compression runtime, per-file compression time, byte sizes, and compression ratio. ## Changes - Emit `codex.rollout_compression.run.duration_ms` tagged by final run status. - Emit `codex.rollout_compression.file.duration_ms` tagged by file outcome. - Emit source and compressed byte histograms for compression candidates/results. - Emit `codex.rollout_compression.file.compression_ratio` for successful compressions, recorded as integer basis points. ## Validation - `just fmt` - `just test -p codex-rollout` - `just fix -p codex-rollout`
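Recording a ratio as integer basis points avoids floating-point histogram values. The sketch below assumes the ratio is compressed size divided by source size and that division truncates; the PR text does not state either detail, and `compression_ratio_basis_points` is a hypothetical helper name.

```rust
/// Returns the compression ratio in basis points (1 bp = 0.01%),
/// or `None` when the source is empty and the ratio is undefined.
///
/// Assumption: ratio = compressed / source, truncated toward zero.
pub fn compression_ratio_basis_points(source_bytes: u64, compressed_bytes: u64) -> Option<u64> {
    if source_bytes == 0 {
        return None;
    }
    // Widen to u128 so the multiplication cannot overflow.
    let scaled = u128::from(compressed_bytes) * 10_000;
    let ratio = scaled / u128::from(source_bytes);
    // Saturate rather than wrap if the result does not fit in u64.
    Some(ratio.min(u128::from(u64::MAX)) as u64)
}
```

Under these assumptions, a 1,000-byte rollout compressed to 250 bytes is recorded as 2,500 basis points, that is, 25% of the original size.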
jif-oai ·
2026-06-01 22:54:25 +02:00 -
[codex] document out-of-line test module convention (#25682)
## Why New unit test modules should follow one consistent layout so implementation files stay focused and test suites remain easy to locate, without creating cleanup churn in existing inline test modules. ## What changed - Added `AGENTS.md` guidance requiring new test modules to use separate sibling `*_tests.rs` files with an explicit `#[path = "..._tests.rs"]` attribute. - Clarified that existing inline `#[cfg(test)] mod tests { ... }` modules should not be moved solely to follow the new convention. ## Validation - Ran `git diff --check`.
Adam Perry @ OpenAI ·
2026-06-01 13:36:16 -07:00 -
Add rollout compression counters (#25679)
## Summary Add counter telemetry for the local rollout compression worker so we can see when it runs, why it skips, and how individual file/materialization paths resolve. ## Changes - Emit `codex.rollout_compression.run` with statuses for start, completion, failure, duplicate-run skip, and missing runtime skip. - Emit `codex.rollout_compression.file` outcomes for scanned, compressed, skipped, and failed compression candidates. - Emit `codex.rollout_compression.temp_cleanup` and `codex.rollout_compression.materialize` counters for cleanup and decompression paths. ## Validation - `just fmt` - `just test -p codex-rollout` - `just fix -p codex-rollout`
jif-oai ·
2026-06-01 22:26:32 +02:00 -
refactor: hide shell override for zsh fork unified exec (#24980)
## Why When unified exec is configured to launch through the zsh fork, local commands should not let the model override the shell binary with the `shell` parameter. The configured zsh fork is the mechanism that makes `execv(2)` interception reliable, so exposing `shell` for local zsh-fork execution would create a confusing API surface and undermine the composition. Remote environments are different: zsh-fork interception is local-only, so remote unified-exec calls must keep direct unified-exec behavior and still expose `shell` when a remote environment can be selected. ## What Changed - Taught the `exec_command` schema builder to omit the `shell` parameter when requested. - Hid `shell` from the unified-exec tool schema only when zsh-fork unified exec applies to all selectable environments. - Kept `shell` visible when any remote environment can be targeted, because those calls run through direct unified exec. - Made unified exec choose the effective shell mode per selected environment: local environments keep zsh-fork mode, remote environments use direct mode. - Left direct unified-exec behavior unchanged, including support for model-specified shells there. ## Verification - Added schema coverage showing `exec_command` can hide `shell`. - Added planner coverage showing zsh-fork unified exec hides `shell` for local-only execution while direct unified exec still exposes it. - Added planner coverage showing `shell` remains visible when a remote environment is available. - Added handler coverage showing remote environments use direct unified-exec shell mode instead of zsh-fork mode. - Ran the focused `codex-core` shell-parameter and zsh-fork tests. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/24980). * #24982 * #24981 * __->__ #24980
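The visibility rule and the per-environment mode choice can be sketched as two small functions. All names here (`EnvironmentKind`, `ShellMode`, `effective_shell_mode`, `expose_shell_parameter`) are hypothetical and only mirror the behavior described in the PR text.

```rust
/// Where a unified-exec call would run.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EnvironmentKind {
    Local,
    Remote,
}

/// How unified exec launches the command in a given environment.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ShellMode {
    Direct,
    ZshFork,
}

/// zsh-fork interception is local-only, so remote environments always
/// fall back to direct unified exec.
pub fn effective_shell_mode(zsh_fork_enabled: bool, env: EnvironmentKind) -> ShellMode {
    match (zsh_fork_enabled, env) {
        (true, EnvironmentKind::Local) => ShellMode::ZshFork,
        _ => ShellMode::Direct,
    }
}

/// The `shell` parameter is hidden only when zsh-fork applies to every
/// selectable environment; any remote environment keeps it visible.
pub fn expose_shell_parameter(zsh_fork_enabled: bool, selectable: &[EnvironmentKind]) -> bool {
    !zsh_fork_enabled || selectable.iter().any(|env| *env == EnvironmentKind::Remote)
}
```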
Michael Bolin ·
2026-06-01 20:22:28 +00:00 -
feat: gate unified exec zsh fork composition (#24979)
## Why `shell_zsh_fork` and unified exec need to remain independently controllable for enterprise rollouts, but we also need a third mode that composes them. That composed mode is intended to preserve unified exec command lifecycle support while letting the zsh fork provide more accurate `execv(2)` interception. Enabling `unified_exec_zsh_fork` by itself is intentionally not sufficient. It is a composition gate, not a dependency-enabling shortcut: - `unified_exec` selects the PTY-backed unified exec tool. - `shell_zsh_fork` opts into the zsh fork backend. - `unified_exec_zsh_fork` only allows those two already-enabled modes to be composed so local zsh unified exec commands can launch through the zsh fork. This separation is deliberate. Enterprises and staged rollouts must be able to enable or disable unified exec and zsh-fork independently. If `unified_exec_zsh_fork` implied either dependency, then enabling one under-development composition flag would silently activate a shell backend that the configured feature set left disabled. This PR introduces only the configuration and planning gate for that composition. Existing `shell_zsh_fork` behavior continues to use the standalone shell tool unless the new composition feature is explicitly enabled alongside both dependencies. ## What Changed - Added the under-development feature flag `unified_exec_zsh_fork`. - Added `UnifiedExecFeatureMode` so the three input feature flags collapse into `Disabled`, `Direct`, or `ZshFork` mode before tool planning. - Updated tool selection so zsh-fork composition requires `unified_exec`, `shell_zsh_fork`, and `unified_exec_zsh_fork`. - Kept the existing standalone zsh-fork shell tool behavior when only `shell_zsh_fork` is enabled. - Updated config schema output for the new feature flag. ## Verification - Added feature and tool-config coverage for the new gate. - Added planner coverage proving `shell_zsh_fork` remains standalone until composition is explicitly enabled. 
- Ran focused tests for `codex-features`, `codex-tools`, and the affected `codex-core` planner case. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/24979). * #24982 * #24981 * #24980 * __->__ #24979
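A minimal sketch of the gate, assuming the three flags arrive as plain booleans. The enum name and its variants come from the description above; the function name is hypothetical. The description only pins down that `ZshFork` requires all three flags, so mapping every other `unified_exec` combination to `Direct` is an assumption.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum UnifiedExecFeatureMode {
    Disabled,
    Direct,
    ZshFork,
}

/// Collapses the three feature flags into one mode before tool planning.
/// Hypothetical helper; the real resolution may differ in detail.
fn unified_exec_mode(
    unified_exec: bool,
    shell_zsh_fork: bool,
    unified_exec_zsh_fork: bool,
) -> UnifiedExecFeatureMode {
    match (unified_exec, shell_zsh_fork, unified_exec_zsh_fork) {
        // Composition needs both dependencies plus the gate itself.
        (true, true, true) => UnifiedExecFeatureMode::ZshFork,
        // Assumption: any other combination with unified exec on is direct.
        (true, _, _) => UnifiedExecFeatureMode::Direct,
        // The gate never enables unified exec on its own.
        (false, _, _) => UnifiedExecFeatureMode::Disabled,
    }
}
```

Matching on the full tuple makes it visible that `unified_exec_zsh_fork` by itself changes nothing.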
Michael Bolin ·
2026-06-01 13:01:36 -07:00 -
fix: deflake zsh-fork approval test (#25669)
Fixes this flake: https://github.com/openai/codex/actions/runs/26773809591/job/78919970410?pr=25659 This test is about zsh-fork subcommand approval behavior, not workspace sandboxing, so it now runs with `DangerFullAccess` to avoid macOS sandbox setup failures before the second subcommand approval.
jif-oai ·
2026-06-01 21:55:44 +02:00 -
exec-server: canonicalize bound filesystem paths (#25149)
## Summary - add executor filesystem canonicalization as a bound-path operation - route remote canonicalization through the exec-server filesystem RPC surface - keep path normalization attached to the filesystem that owns the path ## Stack - 2/5 in the skills path authority stack extracted from https://github.com/openai/codex/pull/25098 - follows merged https://github.com/openai/codex/pull/25121 ## Validation - `cd /Users/starr/code/codex-worktrees/pr-25098-restack-review-pr1b/codex-rs && just fmt` - Not run: tests/checks (not requested) - GitHub CI pending on rewritten head
starr-openai ·
2026-06-01 11:53:31 -07:00 -
[codex-rs] auto-review model override (#23767)
## Why Guardian auto-review normally uses the provider-preferred review model when one is available. Some parent models need model-catalog metadata to select a different review model while keeping older `/models` payloads compatible when that metadata is absent. ## What changed - Added optional `ModelInfo::auto_review_model_override` metadata to the public model payload as a review-model slug. - Updated Guardian review model selection to prefer the catalog override when present, while preserving the existing provider preferred-model path and parent-model fallback when it is omitted. - Added focused Guardian coverage for override and no-override model selection. - Added an `auto_review` core integration suite test that loads override metadata from a remote model catalog path and asserts the strict auto-review `/responses` request uses the catalog-selected review model. - Updated existing `ModelInfo` fixtures and local catalog constructors for the new optional field. ## Validation - `cargo test -p codex-protocol model_info_defaults_availability_nux_to_none_when_omitted` - `cargo test -p codex-core guardian_review_uses_` - `cargo test -p codex-core remote_model_override_uses_catalog_model_for_strict_auto_review --test all` - `just fix -p codex-protocol` - `just fix -p codex-core` - `just fmt` - `git diff --check`
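The selection order described above (catalog override, then provider-preferred model, then the parent model) can be sketched as a small fallback chain. The function name and the slugs used in the usage note are hypothetical; only the precedence comes from the description.

```rust
/// Picks the review model slug by precedence: catalog override first, then
/// the provider-preferred review model, then the parent model itself.
/// Hypothetical helper for illustration.
fn select_review_model<'a>(
    catalog_override: Option<&'a str>,
    provider_preferred: Option<&'a str>,
    parent_model: &'a str,
) -> &'a str {
    catalog_override
        .or(provider_preferred)
        .unwrap_or(parent_model)
}
```

Because the override is an `Option`, an older `/models` payload that omits the field resolves the same way it did before the change.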
Won Park ·
2026-06-01 11:51:15 -07:00 -
Check root Python script formatting in CI (#25165)
## Why Python files under `scripts/` were not covered by the repository formatting recipe or the CI formatting job, so formatting drift could merge unnoticed. ## What - Add a dedicated `scripts/pyproject.toml` and `scripts/uv.lock` so root-script formatting uses a locked Ruff version. - Extend `just fmt` to format root Python scripts and add `fmt-scripts-check` for CI. - Run `just fmt-scripts-check` from `.github/workflows/ci.yml`, installing `uv` through SHA-pinned `astral-sh/setup-uv` while retaining the `uv` `0.11.3` pin. - Apply Ruff formatting to the root Python scripts, including `scripts/just-shell.py`, and extend `sdk/python/tests/test_artifact_workflow_and_binaries.py` to cover the root formatting recipe. - Update `AGENTS.md` so agents run `just fmt` after code changes anywhere in the repository. ## Validation - Extended the existing Python SDK workflow test to assert that `just fmt` includes root Python scripts.
Adam Perry @ OpenAI ·
2026-06-01 18:50:23 +00:00 -
Throttle repeated rollout compression runs (#25659)
## Why [#25089](https://github.com/openai/codex/pull/25089) introduced the background worker that compresses cold archived rollouts, and [#25654](https://github.com/openai/codex/pull/25654) made that pass faster once it starts. But the worker still deleted `rollout-compression.lock` on successful exit, so the existing six-hour staleness window only helped with overlapping or crashed workers. Each new local thread-store initialization could immediately rescan archived rollouts even if a full pass had just finished. This change keeps the existing marker around long enough to throttle redundant reruns. The worker is still best-effort, but it no longer does repeated startup scans when nothing new is eligible for compression. ## What Changed - Replace the drop-scoped `CompressionLock` with a `CompressionRunMarker` that claims the existing `.tmp/rollout-compression.lock` path and leaves it in place after success. - Reuse the existing six-hour staleness window to block both overlapping starts and immediate reruns, while still letting a stale marker be reclaimed. - Update the worker docs and debug logging to describe the new "already running or recently ran" behavior. - Extend the rollout compression tests to assert that a successful run leaves the marker behind and that a fresh marker suppresses a new run. ## Validation - `just test -p codex-rollout`
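A rough sketch of the throttling decision, assuming the worker only looks at the marker's age. The six-hour window comes from the description above; `MarkerDecision`, `decide`, and the choice to treat an age of exactly six hours as stale are assumptions for illustration.

```rust
use std::time::Duration;

/// Staleness window reused for both overlapping starts and recent reruns.
const STALE_AFTER: Duration = Duration::from_secs(6 * 60 * 60);

/// Hypothetical outcome of inspecting the run marker.
#[derive(Debug, PartialEq, Eq)]
enum MarkerDecision {
    /// No marker exists: claim it and run.
    Claim,
    /// The marker is stale: reclaim it and run.
    Reclaim,
    /// Already running or recently ran: skip this pass.
    Skip,
}

/// Decides what to do from the marker's age, where `None` means the marker
/// file does not exist.
fn decide(marker_age: Option<Duration>) -> MarkerDecision {
    match marker_age {
        None => MarkerDecision::Claim,
        Some(age) if age >= STALE_AFTER => MarkerDecision::Reclaim,
        Some(_) => MarkerDecision::Skip,
    }
}
```

Since a successful run leaves the marker in place, a thread-store initialization a few minutes later falls into the `Skip` arm instead of rescanning.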
jif-oai ·
2026-06-01 20:46:54 +02:00 -
[codex] Consolidate shared prompts in codex-prompts (#25151)
## Why `codex_core` is consistently a bottleneck for incremental builds during iteration. The simplest fix is to make the crate smaller. ## Summary `codex-core` owns several reusable prompt renderers and static prompt assets, which makes the crate harder to split apart. Rename `codex-review-prompts` to `codex-prompts` and move shared review, goal, permissions, compaction, realtime, hierarchical AGENTS.md, and `apply_patch` prompts into it. Move prompt-only tests and update consumers and `CODEOWNERS`. ## Validation - `just test -p codex-prompts -p codex-apply-patch` - `just test -p codex-core prompt_caching` - Bazel builds for the affected crates
Adam Perry @ OpenAI ·
2026-06-01 18:45:07 +00:00 -
[codex] Make justfile recipes Windows-aware (#24983)
## Summary Make the root `justfile` usable from Windows without maintaining a separate Windows copy of most recipes. The repo recipes previously assumed POSIX shell behavior for things like variadic argument forwarding (`"$@"`) and stderr redirection (`2>/dev/null`). That made common workflows such as `just fmt`, `just test`, and `just log` unreliable from Windows. This PR introduces a small cross-platform shell adapter so recipes can stay mostly unified while still expanding the few shell-specific constructs correctly on macOS/Linux and Windows. ## What Changed - Add `scripts/just-shell.py` as the configured `just` shell adapter. - On Unix it invokes `sh -cu`. - On Windows it invokes `pwsh -CommandWithArgs` so arguments containing spaces are preserved. - Add portable recipe placeholders: - `{args}` expands to `"$@"` on Unix and the equivalent PowerShell forwarded-args expression on Windows. - `{stderr-null}` expands to the platform-specific stderr suppression used by `fmt`. - Convert most variadic one-line recipes to the unified `{args}` form, including `codex`, `exec`, `file-search`, `app-server-test-client`, `fix`, `clippy`, `bench`, `mcp-server-run`, `write-app-server-schema`, and `argument-comment-lint-from-source`. - Keep genuinely shell-specific recipes split or Unix-only for now, including recipes backed by `.sh` scripts or recipes whose bodies are more than simple command forwarding. - Add a Windows `just install` path that installs PowerShell via `winget` when `pwsh` is not available, then runs the same basic Rust setup steps. - Update the SDK test that validates the root `fmt` recipe so it recognizes the new portable stderr placeholder. 
## Validation - `just --summary` - `just --dry-run fmt` - `just --dry-run bench-smoke` - `just --dry-run codex foo "bar binky" baz` - `just --dry-run write-hooks-schema` - `just --dry-run bazel-lock-update` - `just --dry-run argument-comment-lint-from-source -- "foo bar"` - `git diff --check -- justfile scripts/just-shell.py sdk/python/tests/test_artifact_workflow_and_binaries.py` - Verified Windows argv preservation through `scripts/just-shell.py` with arguments containing spaces. - `uv run --frozen --project sdk/python --extra dev pytest sdk/python/tests/test_artifact_workflow_and_binaries.py::test_root_fmt_recipe_formats_rust_and_python_sdk`
iceweasel-oai ·
2026-06-01 11:26:36 -07:00 -
Preserve plugin app manifest order (#25491)
## Summary - Preserve app declaration order when loading plugin `.app.json` files. - Keep plugin connector summaries in plugin app order after connector metadata is merged and filtered. - Add regression coverage for `.app.json` order and connector summary order. ## Validation - `just fmt` - `just test -p codex-chatgpt connectors_for_plugin_apps_returns_only_requested_plugin_apps` - `just test -p codex-core-plugins effective_apps_preserves_app_config_order` - `just fix -p codex-core-plugins` (passes with existing clippy `large_enum_variant` warning in `core-plugins/src/manifest.rs`) - `just fix -p codex-chatgpt` - `just bazel-lock-update` - `just bazel-lock-check`
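One way to keep declaration order through a merge-and-filter step is to iterate the declared list and look each entry up, rather than iterating the map. The sketch below assumes app ids and connector names are plain strings; the function name and the sample ids in the usage note are hypothetical.

```rust
use std::collections::HashMap;

/// Returns (app id, connector name) pairs in the order the plugin declared
/// its apps, dropping apps that have no connector metadata.
/// Hypothetical helper for illustration.
fn summaries_in_app_order<'a>(
    declared_app_ids: &[&'a str],
    connectors: &HashMap<&'a str, &'a str>,
) -> Vec<(&'a str, &'a str)> {
    declared_app_ids
        .iter()
        .filter_map(|id| connectors.get(id).map(|name| (*id, *name)))
        .collect()
}
```

Iterating a `HashMap` directly would yield an arbitrary order, which is the kind of reordering the regression tests guard against.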
charlesgong-openai ·
2026-06-01 11:04:21 -07:00 -
[codex] Rename multi-agent v2 assign_task to followup_task (#25636)
## Summary Renames the MultiAgentV2 turn-triggering tool from `assign_task` to `followup_task` so the exposed tool name better describes sending an additional task to an existing agent. This updates the tool spec, handler/module names, registry wiring, default multi-agent v2 usage hints, and tests. Rollout trace classification keeps accepting legacy `assign_task` events so older traces still reduce correctly, while docs show the new tool name. ## Test plan - `just test -p codex-core followup_task` - `just test -p codex-core -E 'test(multi_agent_feature_selects_one_agent_tool_family) | test(multi_agent_v2_can_use_configured_tool_namespace) | test(code_mode_only_can_expose_namespaced_multi_agent_v2_as_normal_tools)'` - `just test -p codex-rollout-trace` - `just fix -p codex-core` - `just fix -p codex-rollout-trace` Notes: `just fmt` ran `cargo fmt` but failed in the Python ruff phase because the local environment could not resolve `hatchling>=1.27.0` from the configured internal registry. A full `just test -p codex-core` also hit unrelated environment-sensitive integration failures involving missing spawned test binaries/sandbox behavior; the changed multi-agent spec/handler tests passed in the filtered runs above.
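The backward-compatible classification mentioned above could look roughly like this. The two tool names come from the description; the function name is hypothetical, and the sketch ignores any tool namespace prefix the real classifier may handle.

```rust
/// Returns true for the current tool name and for the legacy name that
/// older rollout traces still contain. Hypothetical helper for illustration.
fn is_followup_task_call(tool_name: &str) -> bool {
    matches!(tool_name, "followup_task" | "assign_task")
}
```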
jif-oai ·
2026-06-01 19:57:11 +02:00 -
exec-server: add environment path refs (#25121)
## Summary - add public `codex_exec_server::EnvironmentPathRef` - bind an absolute path to its owning executor filesystem - keep path operations in the next review slice ## Stack - 1/5 in the skills path authority stack extracted from https://github.com/openai/codex/pull/25098 ## Validation - `cd /Users/starr/code/codex-worktrees/pr-25098-restack4/codex-rs && just fmt` - GitHub CI pending on rewritten head
starr-openai ·
2026-06-01 10:55:52 -07:00 -
Parallelize cold rollout compression (#25654)
## Why [#25089](https://github.com/openai/codex/pull/25089) added the background worker for compressing cold archived rollouts, but the worker still processed files effectively one at a time: each compression job was sent to `spawn_blocking` and then awaited before the next file started. On machines with a backlog of archived rollouts, that makes catch-up slower than it needs to be even though the actual compression work already runs off the async runtime. ## What Changed - Queue rollout compression work in a `JoinSet` while directory traversal continues. - Cap the worker at two in-flight compression jobs so it can overlap compression without turning the background task into unbounded blocking work. - Drain pending jobs before returning, including the `read_dir.next_entry()` error path, so every launched job still contributes to the final `compressed`, `skipped`, and `failed` stats. - Treat task join failures the same way as compression failures in the worker's warning and failure accounting.
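The bounded-concurrency pattern can be sketched with the standard library alone. Note the swap: this uses `std::thread` handles in a queue instead of tokio's `JoinSet`, so it waits for the oldest job rather than whichever finishes first. `Outcome`, `Stats`, and `compress_all` are hypothetical names; the cap of two and the rule that join failures count as failures come from the description above.

```rust
use std::collections::VecDeque;
use std::thread::{self, JoinHandle};

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Outcome {
    Compressed,
    Skipped,
    Failed,
}

#[derive(Debug, Default, PartialEq, Eq)]
struct Stats {
    compressed: usize,
    skipped: usize,
    failed: usize,
}

impl Stats {
    /// Records one finished job. A panicked job (join error) is counted
    /// the same way as a compression failure.
    fn record(&mut self, joined: thread::Result<Outcome>) {
        match joined {
            Ok(Outcome::Compressed) => self.compressed += 1,
            Ok(Outcome::Skipped) => self.skipped += 1,
            Ok(Outcome::Failed) | Err(_) => self.failed += 1,
        }
    }
}

/// Cap on concurrently running compression jobs.
const MAX_IN_FLIGHT: usize = 2;

/// Runs `job` for every path with at most `MAX_IN_FLIGHT` jobs running,
/// and drains all pending jobs before returning the final stats.
fn compress_all<F>(paths: Vec<String>, job: F) -> Stats
where
    F: Fn(String) -> Outcome + Send + Clone + 'static,
{
    let mut stats = Stats::default();
    let mut in_flight: VecDeque<JoinHandle<Outcome>> = VecDeque::new();
    for path in paths {
        // At the cap: wait for the oldest job before launching another.
        if in_flight.len() == MAX_IN_FLIGHT {
            if let Some(handle) = in_flight.pop_front() {
                stats.record(handle.join());
            }
        }
        let job = job.clone();
        in_flight.push_back(thread::spawn(move || job(path)));
    }
    // Drain whatever is still running so every job reaches the stats.
    for handle in in_flight {
        stats.record(handle.join());
    }
    stats
}
```

The final drain loop is the part that matters for accounting: returning early with jobs still in flight would lose their results.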
jif-oai ·
2026-06-01 19:54:52 +02:00 -
jif-oai ·
2026-06-01 19:48:29 +02:00 -
[codex] Use git CLI for release Cargo fetches (#25644)
## Summary - Configure the rust-release build job with `CARGO_NET_GIT_FETCH_WITH_CLI=true` - Document the macOS SecureTransport/libgit2 failure mode that hit the `libwebrtc`/`libyuv` git submodule fetch ## Root cause The release run at https://github.com/openai/codex/actions/runs/26717498860/job/78745156683 repeatedly failed before compilation because Cargo's libgit2 fetch path could not clone the nested `yuv-sys/libyuv` submodule from `chromium.googlesource.com`, ending with `SecureTransport error: connection closed via error`. ## Validation - `git diff --check` This is a workflow-only change, so I did not run Rust package tests.
Shijie Rao ·
2026-06-01 10:34:12 -07:00 -
Vivian Fang ·
2026-06-01 10:13:56 -07:00 -
Disable SQLite intrinsics for Windows x64 releases (#25490)
## Why Codex 0.135.0 started shipping bundled SQLite 3.51.x via SQLx 0.9.0 to avoid the older WAL corruption bug fixed by #24728. On Windows x64, #25367 reports an immediate `STATUS_ILLEGAL_INSTRUCTION` crash on a Haswell CPU when starting normal Codex paths. Rather than downgrading SQLite, this keeps the newer bundled SQLite source and removes SQLite compiler-intrinsic code paths from the Windows x64 release build. ## What changed For `x86_64-pc-windows-msvc` release builds, export `LIBSQLITE3_FLAGS=SQLITE_DISABLE_INTRINSIC` before `cargo build` in: - `.github/workflows/rust-release.yml` - `.github/workflows/rust-release-windows.yml` Other targets keep their current SQLite build flags. ## Verification - `git diff --check`
Eric Traut ·
2026-06-01 09:49:55 -07:00 -
Compress cold local rollouts (#25089)
## Rollout compression stack This stack splits #24941 into reviewable steps for local rollout compression. The design is intentionally staged: 1. Teach readers, listing, search, and lookup to understand compressed rollouts. 2. Make append and resume paths materialize compressed rollouts back to plain JSONL before writing. 3. Add a disabled-by-default worker that can compress cold archived rollouts behind `local_thread_store_compression`. The key invariant is that writers append to plain `.jsonl`. A `.jsonl.zst` file is a cold/read representation; if a write is needed, the compressed file is materialized back to plain JSONL first. Readers prefer plain `.jsonl` when both forms exist and can fall back to the compressed sibling during transitions. The worker is deliberately the last PR and remains behind an under-development feature flag. It currently scans only `archived_sessions`, not active `sessions`, because active sessions have the highest resume/append race risk. That means this stack does not yet compress most unarchived local history. ## Known race / follow-up The remaining unresolved design question is writer/compressor coordination. Even for archived rollouts, a resume or metadata update can append while the worker is replacing the plain file with `.jsonl.zst`; the current double-stat checks narrow but do not fully eliminate the window where a writer has opened the plain file before unlink. Do not treat the worker PR as production-ready until we either: - prevent append/resume paths from racing archived compression, or - introduce a shared representation/append lock or equivalent coordination. The first two PRs are useful independently: they make compressed rollouts readable and make append paths safely recover back to plain JSONL. The third PR isolates the worker behavior so that coordination issue is reviewable separately. 
## Validation Focused local validation for the stack includes: - `just test -p codex-rollout` - `just test -p codex-thread-store` where thread-store paths were touched - `just test -p codex-features` for the feature flag slice - `just bazel-lock-check` after dependency graph changes - scoped `just fix -p ...` passes for changed crates CI is still the source of truth for the full platform matrix. ## This PR in the stack This is PR 3/3, based on #25088. It adds the under-development feature flag and starts the best-effort background worker when enabled. The worker currently compresses only cold archived rollouts, skips active sessions, verifies compressed output, preserves mtime and permissions, keeps a store-level lock heartbeat, and cleans stale temp files. Stack order: 1. #25087: read compressed local rollouts. 2. #25088: materialize compressed rollouts before append. 3. This PR: add the disabled local compression worker.
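The reader-side rule (prefer plain `.jsonl`, fall back to the `.jsonl.zst` sibling) can be sketched as below. The type and function names are hypothetical, and the existence check is passed in as a closure so the logic stays independent of the real filesystem.

```rust
use std::path::{Path, PathBuf};

/// Which on-disk representation a reader should open.
/// Hypothetical type for illustration.
#[derive(Debug, PartialEq, Eq)]
enum RolloutFile {
    Plain(PathBuf),
    Compressed(PathBuf),
}

/// Builds the compressed sibling path by appending `.zst` to the plain path.
fn compressed_sibling(plain: &Path) -> PathBuf {
    let mut name = plain.as_os_str().to_os_string();
    name.push(".zst");
    PathBuf::from(name)
}

/// Prefers the plain file when both forms exist and falls back to the
/// compressed sibling, which covers the window during a transition.
fn resolve_for_read(plain: &Path, exists: impl Fn(&Path) -> bool) -> Option<RolloutFile> {
    if exists(plain) {
        return Some(RolloutFile::Plain(plain.to_path_buf()));
    }
    let compressed = compressed_sibling(plain);
    if exists(&compressed) {
        return Some(RolloutFile::Compressed(compressed));
    }
    None
}
```

A writer would follow the same lookup but, on getting `Compressed`, materialize the file back to plain JSONL before appending.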
jif-oai ·
2026-06-01 18:35:58 +02:00 -
Preserve renamed thread titles during reconciliation (#25624)
## Summary - preserve existing explicit SQLite thread titles during rollout reconciliation/backfill when the incoming rollout title is only first-message-derived - keep stale inferred-title repair behavior while avoiding session-index scans during startup backfill - add a regression test for renamed titles surviving reconcile ## Testing - `just fmt` - `just test -p codex-rollout` - `just test -p codex-state`
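A rough sketch of the merge rule, assuming each title carries a tag saying where it came from. How the store actually tells an explicit title from an inferred one is not described here, so `TitleSource`, `Title`, and `reconcile_title` are all hypothetical.

```rust
/// Where a thread title came from. Hypothetical type for illustration.
#[derive(Clone, Debug, PartialEq, Eq)]
enum TitleSource {
    /// Set by an explicit rename.
    Explicit,
    /// Derived from the first message of the thread.
    FirstMessage,
}

#[derive(Clone, Debug, PartialEq, Eq)]
struct Title {
    text: String,
    source: TitleSource,
}

/// Keeps an explicit stored title when the incoming rollout title is only
/// first-message-derived; otherwise the incoming title wins, which still
/// repairs a stale inferred title.
fn reconcile_title(existing: Option<Title>, incoming: Title) -> Title {
    match existing {
        Some(current)
            if current.source == TitleSource::Explicit
                && incoming.source == TitleSource::FirstMessage =>
        {
            current
        }
        _ => incoming,
    }
}
```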
jif-oai ·
2026-06-01 18:33:05 +02:00 -
Add reasoning-only status surface item (#25504)
Closes #24886. ## Why Users can configure the TUI status line and terminal title with `model-with-reasoning`, but issue #24886 asks for a compact reasoning-only item. That lets a setup show just `default`, `low`, `medium`, `high`, or `xhigh` without repeating the model name. ## What changed - Added a `reasoning` item for `/statusline` and `/title` setup flows. - Rendered the item from the effective reasoning effort, including collaboration-mode overrides. - Registered `reasoning` with `codex doctor` so Codex-generated terminal-title config is not reported as invalid. - Updated TUI setup snapshots so the picker previews include the new item.
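A minimal sketch of rendering the item, assuming the effort is optional and a collaboration-mode override, when present, replaces the configured value. The five labels come from the description above; the enum, both function names, and the choice to show `default` when no effort is set are assumptions.

```rust
/// Hypothetical reasoning effort levels for illustration.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ReasoningEffort {
    Low,
    Medium,
    High,
    XHigh,
}

/// A collaboration-mode override, when present, wins over the configured
/// effort.
fn effective_effort(
    configured: Option<ReasoningEffort>,
    collaboration_override: Option<ReasoningEffort>,
) -> Option<ReasoningEffort> {
    collaboration_override.or(configured)
}

/// Renders the compact item text without the model name.
fn reasoning_item(effort: Option<ReasoningEffort>) -> &'static str {
    match effort {
        // Assumption: no explicit effort is shown as `default`.
        None => "default",
        Some(ReasoningEffort::Low) => "low",
        Some(ReasoningEffort::Medium) => "medium",
        Some(ReasoningEffort::High) => "high",
        Some(ReasoningEffort::XHigh) => "xhigh",
    }
}
```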
Eric Traut ·
2026-06-01 09:30:20 -07:00