60 Commits

  • test: branch on target OS instead of runner flavor (#29712)
    ## Why
    
    Core tests should branch on the executor's operating system, not on
    runner details such as Docker or Wine. This keeps platform behavior
    stable as new test backends are added and reserves Wine-specific skips
    for actual runner debt.
    
    ## What
    
    - Add `TestTargetOs` and target/host-aware skip helpers while keeping
    `TestEnvironment` internal.
    - Replace topology enum access with remote predicates and a narrow
    Docker accessor.
    - Migrate OS-semantic Wine skips, preserve runner-specific gaps, and
    document the skip taxonomy.
    
    ## Validation
    
    - `just test -p core_test_support`
    - `just test -p codex-core
    remote_test_env_can_connect_and_use_filesystem`
    - `bazel test //codex-rs/core:core-all-wine-exec-test
    --test_output=errors` reached test execution; unrelated existing
    view-image, path, and timing failures remain.
    - `just test -p codex-core` and `just test` reached broad test
    execution; this checkout has unrelated helper, sandbox, and timing
    failures.
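    
    The distinction between target OS and runner flavor can be sketched as follows. This is a minimal illustration, not the harness code: only the name `TestTargetOs` comes from the description above, while the `Runner` enum, its variants, and both functions are hypothetical.

    ```rust
    /// Operating system of the executor that actually runs commands.
    /// The variants are assumed for illustration.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum TestTargetOs {
        Linux,
        MacOs,
        Windows,
    }

    /// Hypothetical runner flavors; the real harness keeps this detail internal.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum Runner {
        Local,
        Docker,
        WineExec,
    }

    /// Derive the target OS from the runner and the host OS.
    pub fn target_os(runner: Runner, host: TestTargetOs) -> TestTargetOs {
        match runner {
            // A local run executes on the host itself.
            Runner::Local => host,
            // Assumption: the Docker runner uses Linux containers.
            Runner::Docker => TestTargetOs::Linux,
            // Wine runs a Windows executor regardless of the host.
            Runner::WineExec => TestTargetOs::Windows,
        }
    }

    /// An OS-semantic skip depends only on the target OS, so it keeps
    /// working when another Windows-capable backend is added later.
    pub fn skip_on_windows_target(runner: Runner, host: TestTargetOs) -> bool {
        target_os(runner, host) == TestTargetOs::Windows
    }
    ```

    A skip that exists only because of a Wine limitation would instead check the runner directly, which keeps that debt visible and separate.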
  • [codex] Use expect in integration tests (#28441)
    The workspace denies `clippy::expect_used` in production. Although
    `clippy.toml` allows `expect` in tests, Bazel Clippy compiles
    integration-test helper code in a way that does not receive that
    exemption, which encouraged verbose `unwrap_or_else(... panic!(...))`
    and equivalent `match`/`let else` forms.
    
    This allows `clippy::expect_used` once at each integration-test crate
    root (including aggregated suites and test-support libraries), then
    replaces manual panic-based Result and Option unwraps with
    `expect`/`expect_err`. Standalone `tests/*.rs` files remain their own
    crate roots. Intentional assertion and unexpected-variant panics remain
    unchanged, and the production `expect_used = "deny"` lint remains in
    place.
    
    The cleanup is mechanical and net-negative in line count.
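    
    A small sketch of the before and after shapes, using a hypothetical `parse_port` helper. The commit places `#![allow(clippy::expect_used)]` once at each integration-test crate root; the item-level `#[allow]` attributes here are used only so the snippet stands alone.

    ```rust
    use std::num::ParseIntError;

    /// Hypothetical helper that returns a `Result`, standing in for test setup code.
    fn parse_port(raw: &str) -> Result<u16, ParseIntError> {
        raw.trim().parse::<u16>()
    }

    /// Verbose form that the lint previously encouraged.
    fn port_verbose(raw: &str) -> u16 {
        parse_port(raw).unwrap_or_else(|err| panic!("port should parse: {err}"))
    }

    /// Equivalent form once `expect` is allowed in test code.
    #[allow(clippy::expect_used)]
    fn port_concise(raw: &str) -> u16 {
        parse_port(raw).expect("port should parse")
    }

    /// `expect_err` covers the case where the test wants the error value.
    #[allow(clippy::expect_used)]
    fn port_error(raw: &str) -> ParseIntError {
        parse_port(raw).expect_err("port should be rejected")
    }
    ```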
  • Run core integration tests against a Wine-backed Windows executor (#28401)
    ## Why
    
    We want to exercise a Linux app-server against a Windows exec-server
    without having to repeat every test case. This approach has some
    precedent in the remote Docker test setup.
    
    ## What
    
    Run the shared `codex-core` integration suite against Windows
    exec-server behavior from Linux. This makes cross-OS path and shell
    regressions visible while keeping unsupported cases owned by individual
    tests.
    
    - Add `local`, `docker`, and `wine-exec` test environment selection with
    legacy Docker compatibility.
    - Extend `codex_rust_crate` to generate a sharded Wine-exec variant
    using a cross-built Windows server and pinned Bazel Wine/PowerShell
    runtimes.
    - Teach remote-aware helpers about Windows paths and track temporary
    incompatibilities with source-local `skip_if_wine_exec!` calls and
    follow-up reasons.
  • core: Consolidate Responses API Codex metadata (#27122)
    ## What
    Introduce a `CodexResponsesMetadata` struct that defines all the core
    metadata we send to Responses API. Example fields are `thread_id`,
    `turn_id`, `window_id`, etc.
    
    Going forward, `client_metadata["x-codex-turn-metadata"]` will be the
    canonical way Codex sends metadata to Responses API across both HTTP and
    websocket transports.
    
    For now, we continue to emit the existing top-level HTTP headers and
    top-level `client_metadata` fields from the same
    `CodexResponsesMetadata` struct for compatibility reasons.
    
    Also, app-server clients who specify additional
    `responsesapi_client_metadata` via `turn/start` and `turn/steer` will
    have those fields merged into
    `client_metadata["x-codex-turn-metadata"]`, but cannot override the
    reserved fields that core uses (i.e. the fields in
    `CodexResponsesMetadata`).
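    
    The merge rule for client-supplied extras can be illustrated roughly as below. This is a sketch under assumptions: the struct carries only three of the fields named above, values are plain strings, and the `reserved` and `turn_metadata` methods are hypothetical names rather than the actual core API.

    ```rust
    use std::collections::BTreeMap;

    /// Hypothetical subset of the core-owned metadata fields.
    pub struct CodexResponsesMetadata {
        pub thread_id: String,
        pub turn_id: String,
        pub window_id: String,
    }

    impl CodexResponsesMetadata {
        /// Reserved key/value pairs that core always controls.
        fn reserved(&self) -> BTreeMap<String, String> {
            BTreeMap::from([
                ("thread_id".to_string(), self.thread_id.clone()),
                ("turn_id".to_string(), self.turn_id.clone()),
                ("window_id".to_string(), self.window_id.clone()),
            ])
        }

        /// Merge client extras, then write the reserved fields last so a
        /// colliding extra can never override a core-owned value.
        pub fn turn_metadata(
            &self,
            extras: &BTreeMap<String, String>,
        ) -> BTreeMap<String, String> {
            let mut merged = extras.clone();
            merged.extend(self.reserved());
            merged
        }
    }
    ```

    With this ordering, an extra such as `team` passes through unchanged, while an extra named `thread_id` is replaced by the core value.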
    
    ## Why
    
    Responses API request instrumentation is the source of truth for
    downstream Codex analytics that join requests by Codex IDs such as
    session, thread, turn, and context window. Before this change, those
    values were assembled through several request-specific paths: HTTP
    request bodies, websocket handshake headers, websocket `response.create`
    payloads, compaction requests, and the rich `x-codex-turn-metadata`
    envelope all had their own wiring.
    
    That made metadata propagation easy to drift across API-key/direct
    Responses API requests, ChatGPT-auth/proxied requests, websocket
    requests, and compaction requests. It also made additions like
    `window_id` error-prone because a field could be added to one transport
    projection but missed in another.
    
    ## What changed
    
    - Added `CodexResponsesMetadata` as the core-owned snapshot for Codex
    metadata sent to Responses API.
    - Render `client_metadata["x-codex-turn-metadata"]`, flat
    `client_metadata` projections, and direct compatibility headers from
    that same snapshot.
    - Include the known Codex-owned fields in the turn metadata blob,
    including installation/session/thread/turn/window IDs, request kind,
    lineage, sandbox/workspace metadata, timing, and compaction details.
    - Treat app-server `responsesapi_client_metadata` as enrichment for the
    Codex turn metadata blob while preventing those extras from overriding
    Codex-owned fields.
    - Use the same metadata path for normal turns, websocket prewarm, local
    compaction, remote v1 compaction, and remote v2 compaction.
    - Keep websocket connection-only preconnect metadata separate so
    handshakes carry compatibility identity headers without inventing a fake
    turn metadata blob.
    
    ## Verification
    
    - `cargo check -p codex-core`
    - `just fix -p codex-core`
  • [codex] Preserve logical paths during AGENTS.md discovery (#26465)
    ## Intent
    
    Follow up on #26205 by avoiding unnecessary filesystem canonicalization
    during `AGENTS.md` discovery. The configured working directory is
    already absolute, and canonicalization incorrectly switches symlinked
    workspaces from their logical parent hierarchy to the target's
    hierarchy.
    
    ## User-facing behavior
    
    For a symlinked working directory such as:
    
    ```text
    test-root/
    |-- logical-repo/
    |   |-- AGENTS.md              ("logical parent doc")
    |   `-- workspace ------------> physical-repo/workspace/
    `-- physical-repo/
        |-- AGENTS.md              ("physical parent doc")
        `-- workspace/
            `-- AGENTS.md          ("workspace doc")
    ```
    
    Before this change, Codex canonicalized `logical-repo/workspace` to
    `physical-repo/workspace` before discovery. It therefore loaded
    `physical-repo/AGENTS.md` and `physical-repo/workspace/AGENTS.md`,
    ignoring the instructions from the repository through which the user
    entered the workspace.
    
    After this change, ancestor discovery walks the configured logical path,
    so Codex loads `logical-repo/AGENTS.md`. Opening
    `logical-repo/workspace/AGENTS.md` still follows the symlink through the
    host filesystem, so the workspace document is also loaded.
    `physical-repo/AGENTS.md` is not loaded.
    
    ## Implementation
    
    Use the logical absolute working directory when discovering project
    instructions and reporting instruction sources. Filesystem reads still
    follow the working-directory symlink, so an `AGENTS.md` in the target
    workspace continues to load while ancestor discovery uses the symlink's
    parents.
    
    ## Validation
    
    Added integration coverage proving that discovery loads the logical
    parent's instructions and the target workspace's instructions, but not
    the target parent's instructions.
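    
    A rough sketch of why skipping canonicalization changes the result: ancestors are derived from the path string itself, so a symlinked working directory keeps its logical parents. The function name `candidate_docs` is hypothetical, and the real discovery presumably stops at a project root rather than walking to `/`.

    ```rust
    use std::path::{Path, PathBuf};

    /// Candidate `AGENTS.md` paths from the outermost ancestor down to the
    /// working directory, computed lexically without touching the filesystem.
    pub fn candidate_docs(logical_cwd: &Path) -> Vec<PathBuf> {
        let mut docs: Vec<PathBuf> = logical_cwd
            .ancestors() // yields the path itself, then each parent in turn
            .map(|dir| dir.join("AGENTS.md"))
            .collect();
        docs.reverse(); // outermost ancestor first
        docs
    }
    ```

    For `/test-root/logical-repo/workspace` the candidates include `/test-root/logical-repo/AGENTS.md` and never mention `physical-repo`. Opening the final candidate still follows the `workspace` symlink when the file is read.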
  • Switch runtime to cloud config bundle (#24622)
    ## Summary
    
    - Adapts the moved `codex-cloud-config` crate from the legacy cloud
    requirements endpoint to the new config bundle endpoint.
    - Switches runtime consumers from `CloudRequirementsLoader` to
    `CloudConfigBundleLoader` so one shared bundle supplies cloud-delivered
    config and requirements.
    - Removes the legacy cloud requirements domain loader path.
    
    ## Details
    
    This intentionally keeps `codex-cloud-config` monolithic for review
    lineage: the previous PR establishes the crate move, and this PR shows
    the behavior change against that moved implementation. A follow-up PR
    splits the module back into focused files.
    
    The new bundle path preserves the important cloud requirements loader
    semantics where intended: account-scoped signed cache, 30-minute TTL,
    5-minute refresh cadence, retry/backoff, auth recovery, and fail-closed
    startup loading. The cached payload changes from a single requirements
    TOML string to the backend-delivered bundle, and validation rejects
    malformed config or requirements fragments before cache write/use.
  • [codex] Wait for MCP readiness in core integration tests (#24964)
    Ensures MCP-backed `codex-core` integration tests exercise initialized
    servers instead of racing server startup.
    
    I've been idly investigating a few flakes and the failure modes are much
    more confusing when a tool call fails because of a failed server start
    than when the failed server start causes the test to fail directly.
  • [5 of 7] Replace OverrideTurnContext with ThreadSettings (#22508)
    **Stack position:** [5 of 7]
    
    ## Summary
    
    This PR adds `Op::ThreadSettings`, a queued settings-only update
    mechanism for changing stored thread settings without starting a new
    turn. It also removes the legacy `Op::OverrideTurnContext` in the same
    layer, so reviewers can see the replacement and deletion together.
    
    ## Changes
    
    - Add `Op::ThreadSettings` for settings-only queued updates.
    - Emit `ThreadSettingsApplied` with the effective thread settings
    snapshot after core applies an update.
    - Route settings-only updates through the same submission queue as user
    input.
    - Migrate remaining `OverrideTurnContext` tests and callers to the
    queued `Op::ThreadSettings` path.
    - Delete `Op::OverrideTurnContext` from the core protocol and submission
    loop.
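    
    The queueing behavior described in the list above might look roughly like this. Only `Op::ThreadSettings` and `ThreadSettingsApplied` are names from this PR; the `model` field, the `UserInput` payload, the `TurnStarted` event, and the `drain` function are simplified stand-ins.

    ```rust
    use std::collections::VecDeque;

    /// Hypothetical, heavily reduced settings snapshot.
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub struct ThreadSettings {
        pub model: String,
    }

    #[derive(Debug)]
    pub enum Op {
        UserInput(String),
        ThreadSettings(ThreadSettings),
    }

    #[derive(Debug, PartialEq, Eq)]
    pub enum Event {
        TurnStarted { input: String, model: String },
        ThreadSettingsApplied(ThreadSettings),
    }

    /// Process one shared queue in submission order. A settings-only op
    /// updates stored settings and emits the effective snapshot without
    /// starting a turn.
    pub fn drain(queue: &mut VecDeque<Op>, settings: &mut ThreadSettings) -> Vec<Event> {
        let mut events = Vec::new();
        while let Some(op) = queue.pop_front() {
            match op {
                Op::ThreadSettings(update) => {
                    *settings = update;
                    events.push(Event::ThreadSettingsApplied(settings.clone()));
                }
                Op::UserInput(input) => events.push(Event::TurnStarted {
                    input,
                    model: settings.model.clone(),
                }),
            }
        }
        events
    }
    ```

    Because both kinds of op share one queue, a settings update submitted between two inputs affects only the later turn.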
    
    This stack addresses #20656 and #22090.
    
    ## Stack
    
    1. [1 of 7] [Add thread settings to
    UserInput](https://github.com/openai/codex/pull/23080)
    2. [2 of 7] [Remove
    UserInputWithTurnContext](https://github.com/openai/codex/pull/23081)
    3. [3 of 7] [Remove
    UserTurn](https://github.com/openai/codex/pull/23075)
    4. [4 of 7] [Placeholder for OverrideTurnContext
    cleanup](https://github.com/openai/codex/pull/23087)
    5. [5 of 7] [Replace OverrideTurnContext with
    ThreadSettings](https://github.com/openai/codex/pull/22508) (this PR)
    6. [6 of 7] [Add app-server thread settings
    API](https://github.com/openai/codex/pull/22509)
    7. [7 of 7] [Sync TUI thread
    settings](https://github.com/openai/codex/pull/22510)
  • Remove SSE fixture loaders (#22684)
    ## Why
    
    The Responses API test support already has structured SSE event
    builders. Keeping separate JSON fixture loaders made small mock streams
    harder to read and left an on-disk fixture for a single event.
    
    ## What changed
    
    - Removed `load_sse_fixture` and `load_sse_fixture_with_id_from_str`
    from `core_test_support`.
    - Deleted the one `tests/fixtures/incomplete_sse.json` Responses API
    fixture.
    - Replaced the remaining call sites with `responses::sse(...)` and
    existing event helpers.
    
    ## Validation
    
    - `cargo test -p codex-core --test all
    stream_no_completed::retries_on_early_close`
    - `cargo test -p codex-core --test all
    history_dedupes_streamed_and_final_messages_across_turns`
    - `cargo test -p codex-core --test all review::`
  • hook trust metadata and enforcement (#20321)
    # Why
    
    We want shared hook trust that both the app and the TUI can build on,
    but the metadata is only useful if runtime behavior agrees with it. This
    PR adds a single backend trust model for hooks so unmanaged hooks cannot
    run until the current definition has been reviewed, while managed hooks
    remain runnable and non-configurable.
    
    # What
    
    - persist `trusted_hash` alongside hook state in `config.toml`
    - expose `currentHash` and derived `trustStatus` through `hooks/list`
    - derive trust from normalized hook definitions so equivalent hooks from
    `config.toml` and `hooks.json` share the same trust identity
    - gate unmanaged hooks on trust before they enter the runnable handler
    set
    
    # Reviewer Notes
    
    - key file to review is `codex-rs/hooks/src/engine/discovery.rs`
    - the only **core** change is schema related
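    
    The trust gate can be pictured with the sketch below. It is illustrative only: the `TrustStatus` variants, the whitespace-collapsing normalization, and the FNV-1a hash are assumptions chosen to keep the example dependency-free, and the PR does not state which hash or normalization it uses.

    ```rust
    /// Assumed status values; the PR only says `trustStatus` is derived.
    #[derive(Debug, PartialEq, Eq)]
    pub enum TrustStatus {
        Managed,
        Trusted,
        Modified,
        Untrusted,
    }

    /// 64-bit FNV-1a, used here only as a deterministic stand-in hash.
    fn fnv1a(bytes: &[u8]) -> u64 {
        let mut hash: u64 = 0xcbf29ce484222325;
        for byte in bytes {
            hash ^= u64::from(*byte);
            hash = hash.wrapping_mul(0x100000001b3);
        }
        hash
    }

    /// Hash of a normalized definition. The normalization here simply
    /// collapses runs of whitespace.
    pub fn current_hash(command: &str) -> String {
        let normalized = command.split_whitespace().collect::<Vec<_>>().join(" ");
        format!("{:016x}", fnv1a(normalized.as_bytes()))
    }

    /// Derive trust from the stored hash and the current definition.
    pub fn trust_status(managed: bool, command: &str, trusted_hash: Option<&str>) -> TrustStatus {
        if managed {
            return TrustStatus::Managed;
        }
        match trusted_hash {
            None => TrustStatus::Untrusted,
            Some(hash) if hash == current_hash(command) => TrustStatus::Trusted,
            Some(_) => TrustStatus::Modified,
        }
    }

    /// Only managed or currently trusted hooks enter the runnable set.
    pub fn is_runnable(status: &TrustStatus) -> bool {
        matches!(status, TrustStatus::Managed | TrustStatus::Trusted)
    }
    ```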
  • core: fix apply_patch request permissions test (#21060)
    ## Why
    
    The Bazel test coverage change exposed
    `approved_folder_write_request_permissions_unblocks_later_apply_patch`,
    and `rust-ci-full.yml` showed the same test failing on `main` on macOS.
    There were two separate classes of problems here.
    
    ### Clean CI failure
    
    The test emits an `apply_patch` tool call, but its config did not enable
    the `apply_patch` tool, so the mocked response completed without an
    `apply-patch-call` output. After enabling the tool, the same path also
    needs the aggregate `codex-core` test binary to dispatch
    `--codex-run-as-fs-helper`; sandboxed `apply_patch` uses that helper
    under macOS Seatbelt.
    
    The test now also canonicalizes the temporary patch target before
    building the patch payload so the path matches normalized grants on
    macOS, where `/var` paths often normalize to `/private/var`.
    
    ### Local/enterprise config isolation
    
    The core test harness now builds its default test config with managed
    config disabled, so host-managed enterprise config cannot alter these
    tests. The request-permissions turns in this test also explicitly use
    the user reviewer path, keeping the assertions focused on
    `request_permissions` behavior rather than reviewer defaults from the
    host.
    
    ## What Changed
    
    - Enable `apply_patch` in
    `approved_folder_write_request_permissions_unblocks_later_apply_patch`.
    - Teach the core integration test binary to dispatch
    `CODEX_FS_HELPER_ARG1`, matching the existing apply-patch and
    linux-sandbox dispatch paths.
    - Canonicalize the tempdir-backed patch target before creating the
    patch.
    - Ignore managed config in default core test configs and explicitly pin
    this test to `ApprovalsReviewer::User`.
    
    ## Verification
    
    Run outside the Codex app sandbox because these macOS tests
    intentionally spawn Seatbelt:
    
    - `cargo test -p codex-core
    approved_folder_write_request_permissions_unblocks_later_apply_patch`
    - `cargo test -p codex-core
    approved_folder_write_request_permissions_unblocks_later_exec_without_sandbox_args`
  • permissions: add built-in default profiles (#19900)
    ## Why
    
    The migration away from `SandboxPolicy` needs new configs to start from
    permissions profiles instead of deriving profiles from legacy sandbox
    modes. Existing users can have empty `config.toml` files, and we should
    not rewrite user-owned config files that may live in shared
    repositories.
    
    This PR introduces built-in profile names so an empty config can resolve
    to a canonical `PermissionProfile`, while explicit named `[permissions]`
    profiles still behave predictably.
    
    ## What changed
    
    - Adds built-in `default_permissions` profile names:
      - `:read-only` maps to `PermissionProfile::read_only()`.
      - `:workspace` maps to the workspace-write profile, including
        project-root metadata carveouts.
      - `:danger-no-sandbox` maps to `PermissionProfile::Disabled`,
        preserving the distinction between no sandbox and a broad managed
        sandbox.
    - Reserves the `:` prefix for built-in profiles so user-defined
    `[permissions]` profiles cannot collide with future built-ins.
    - Allows `default_permissions` to reference a built-in profile without
    requiring a `[permissions]` table.
    - Makes an otherwise empty config choose a built-in profile by
    trust/platform context: trusted or untrusted project roots use
    `:workspace` when the platform supports that sandbox, while roots
    without a trust decision use `:read-only`.
    - Keeps legacy `sandbox_mode` configs on the legacy path, and still
    rejects user-defined `[permissions]` profiles that omit
    `default_permissions` so we do not silently guess among custom profiles.
    - Preserves compatibility behavior for implicit defaults: bare
    `network.enabled = true` allows runtime network without starting the
    managed proxy, explicit profile proxy policy still starts the proxy, and
    implicit workspace/add-dir roots keep legacy metadata carveouts.
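    
    The name resolution rules in the list above can be sketched as follows. The three built-in names and the reserved `:` prefix come from this PR; the `ProfileRef` enum, both function names, and the error strings are hypothetical.

    ```rust
    /// Hypothetical result of resolving a `default_permissions` value.
    #[derive(Debug, PartialEq, Eq)]
    pub enum ProfileRef {
        ReadOnly,
        Workspace,
        DangerNoSandbox,
        UserDefined(String),
    }

    /// Resolve a profile name. Built-ins need no `[permissions]` table, and
    /// any other `:`-prefixed name is rejected rather than looked up.
    pub fn resolve_default_permissions(
        name: &str,
        user_profiles: &[&str],
    ) -> Result<ProfileRef, String> {
        match name {
            ":read-only" => Ok(ProfileRef::ReadOnly),
            ":workspace" => Ok(ProfileRef::Workspace),
            ":danger-no-sandbox" => Ok(ProfileRef::DangerNoSandbox),
            other if other.starts_with(':') => {
                Err(format!("unknown built-in profile `{other}`"))
            }
            other if user_profiles.contains(&other) => {
                Ok(ProfileRef::UserDefined(other.to_string()))
            }
            other => Err(format!("profile `{other}` is not defined under [permissions]")),
        }
    }

    /// User-defined profile names may not use the reserved prefix.
    pub fn validate_user_profile_name(name: &str) -> Result<(), String> {
        if name.starts_with(':') {
            Err(format!("`{name}` uses the reserved `:` prefix"))
        } else {
            Ok(())
        }
    }
    ```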
    
    ## Verification
    
    - `cargo test -p codex-core builtin --lib`
    - `cargo test -p codex-core profile_network_proxy_config`
    - `cargo test -p codex-core
    implicit_builtin_workspace_profile_preserves_add_dir_metadata_carveouts`
    - `cargo test -p codex-core
    permissions_profiles_network_enabled_allows_runtime_network_without_proxy`
    - `cargo test -p codex-core
    permissions_profiles_proxy_policy_starts_managed_network_proxy`
    
    ## Documentation
    
    Public Codex config docs should mention these built-in names when the
    `[permissions]` config format is ready to document as stable.
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19900).
    * #20041
    * #20040
    * #20037
    * #20035
    * #20034
    * #20033
    * #20032
    * #20030
    * #20028
    * #20027
    * #20026
    * #20024
    * #20021
    * #20018
    * #20016
    * #20015
    * #20013
    * #20011
    * #20010
    * #20008
    * __->__ #19900
  • [codex] Apply patches through executor filesystem (#17048)
    ## Summary
    - run apply_patch through the executor filesystem when a remote
    environment is present instead of shelling out to the local process
    - thread the executor FileSystem into apply_patch interception and keep
    existing local behavior for non-remote turns
    - make the apply_patch integration harness use the executor filesystem
    for setup/assertions
    - add remote-aware skips for turn-diff coverage that still reads the
    test-runner filesystem
    
    ## Why
    Remote apply_patch needed to mutate the remote workspace instead of the
    local checkout. The tests also needed to seed and assert workspace state
    through the same filesystem abstraction so local and remote runs
    exercise the same behavior.
    
    ## Validation
    - `just fmt`
    - `git diff --check`
    - `cargo check -p core_test_support --tests`
    - `cargo test -p codex-core --test all
    suite::shell_serialization::apply_patch_custom_tool_call -- --nocapture`
    - `cargo test -p codex-core --test all
    suite::apply_patch_cli::apply_patch_cli_updates_file_appends_trailing_newline
    -- --nocapture`
    - remote `cargo test -p codex-core --test all apply_patch_cli --
    --nocapture` (229 passed)
  • [codex] Migrate apply_patch to executor filesystem (#17027)
    - Migrate apply-patch verification and application internals to use the
    async `ExecutorFileSystem` abstraction from `exec-server`.
    - Convert apply-patch `cwd` handling to `AbsolutePathBuf` through the
    verifier/parser/handler boundary.
    
    This does not change how the tool itself works.
  • ci: sync Bazel clippy lints and fix uncovered violations (#16351)
    ## Why
    
    Follow-up to #16345, the Bazel clippy rollout in #15955, and the cleanup
    pass in #16353.
    
    `cargo clippy` was enforcing the workspace deny-list from
    `codex-rs/Cargo.toml` because the member crates opt into `[lints]
    workspace = true`, but Bazel clippy was only using `rules_rust` plus
    `clippy.toml`. That left the Bazel lane vulnerable to drift:
    `clippy.toml` can tune lint behavior, but it cannot set
    allow/warn/deny/forbid levels.
    
    This PR now closes both sides of the follow-up. It keeps `.bazelrc` in
    sync with `[workspace.lints.clippy]`, and it fixes the real clippy
    violations that the newly-synced Windows Bazel lane surfaced once that
    deny-list started matching Cargo.
    
    ## What Changed
    
    - added `.github/scripts/verify_bazel_clippy_lints.py`, a Python check
    that parses `codex-rs/Cargo.toml` with `tomllib`, reads the Bazel
    `build:clippy` `clippy_flag` entries from `.bazelrc`, and reports
    missing, extra, or mismatched lint levels
    - ran that verifier from the lightweight `ci.yml` workflow so the sync
    check does not depend on a Rust toolchain being installed first
    - expanded the `.bazelrc` comment to explain the Cargo `workspace =
    true` linkage and why Bazel needs the deny-list duplicated explicitly
    - fixed the Windows-only `codex-windows-sandbox` violations that Bazel
    clippy reported after the sync, using the same style as #16353: inline
    `format!` args, method references instead of trivial closures, removed
    redundant clones, and replaced SID conversion `unwrap` and `expect`
    calls with proper errors
    - cleaned up the remaining cross-platform violations the Bazel lane
    exposed in `codex-backend-client` and `core_test_support`
    
    ## Testing
    
    Key new test introduced by this PR:
    
    `python3 .github/scripts/verify_bazel_clippy_lints.py`
  • fix: fix old system bubblewrap compatibility without falling back to vendored bwrap (#15693)
    Fixes #15283.
    
    ## Summary
    Older system bubblewrap builds reject `--argv0`, which makes our Linux
    sandbox fail before the helper can re-exec. This PR keeps using system
    `/usr/bin/bwrap` whenever it exists and only falls back to vendored
    bwrap when the system binary is missing. That matters on stricter
    AppArmor hosts, where the distro bwrap package also provides the policy
    setup needed for user namespaces.
    
    For old system bwrap, we avoid `--argv0` instead of switching binaries:
    - pass the sandbox helper a full-path `argv0`,
    - keep the existing `current_exe() + --argv0` path when the selected
    launcher supports it,
    - otherwise omit `--argv0` and re-exec through the helper's own
    `argv[0]` path, whose basename still dispatches as
    `codex-linux-sandbox`.
    
    Also updates the launcher/warning tests and docs so they match the new
    behavior: present-but-old system bwrap uses the compatibility path, and
    only absent system bwrap falls back to vendored.
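    
    The selection logic described above reduces to a small decision table. The sketch below uses hypothetical names throughout and assumes the vendored bwrap supports `--argv0`; it does not show how the real launcher probes for that support.

    ```rust
    #[derive(Debug, PartialEq, Eq)]
    pub enum Launcher {
        System,
        Vendored,
    }

    #[derive(Debug, PartialEq, Eq)]
    pub enum Argv0Strategy {
        /// Pass `--argv0` to bwrap.
        PassFlag,
        /// Omit `--argv0` and re-exec through the helper's own `argv[0]` path.
        RelyOnHelperPath,
    }

    /// Prefer the system bwrap whenever it exists, even if it is old.
    pub fn plan(system_bwrap_exists: bool, system_supports_argv0: bool) -> (Launcher, Argv0Strategy) {
        if !system_bwrap_exists {
            // Assumption: the vendored build always understands `--argv0`.
            return (Launcher::Vendored, Argv0Strategy::PassFlag);
        }
        if system_supports_argv0 {
            (Launcher::System, Argv0Strategy::PassFlag)
        } else {
            // Old system bwrap: change the arguments, not the binary.
            (Launcher::System, Argv0Strategy::RelyOnHelperPath)
        }
    }
    ```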
    
    ### Validation
    
    1. Install Ubuntu 20.04 in a VM.
    2. Compile codex and run it without bubblewrap installed; a warning
    about falling back to the vendored bwrap appears.
    3. Install bwrap and verify that the version is 0.4.0, which lacks
    `--argv0` support.
    4. Run codex and use the `apply_patch` tool without errors.
    
    ---------
    
    Co-authored-by: Michael Bolin <mbolin@openai.com>
  • Use AbsolutePathBuf for cwd state (#15710)
    Migrate `cwd` and related session/config state to `AbsolutePathBuf` so
    downstream consumers consistently see absolute working directories.
    
    Add test-only `.abs()` helpers for `Path`, `PathBuf`, and `TempDir`, and
    update branch-local tests to use them instead of
    `AbsolutePathBuf::try_from(...)`.
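    
    As a rough picture of what such a helper does, the sketch below defines a stand-in `AbsolutePathBuf` and a hypothetical `PathAbsExt` trait. Neither is the real type or trait from the codebase; the snippet only shows the shape of replacing a fallible conversion with a panicking test shortcut.

    ```rust
    use std::convert::TryFrom;
    use std::path::{Path, PathBuf};

    /// Stand-in newtype that can only hold absolute paths.
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub struct AbsolutePathBuf(PathBuf);

    impl AbsolutePathBuf {
        pub fn as_path(&self) -> &Path {
            &self.0
        }
    }

    impl TryFrom<&Path> for AbsolutePathBuf {
        type Error = String;

        fn try_from(path: &Path) -> Result<Self, Self::Error> {
            if path.is_absolute() {
                Ok(Self(path.to_path_buf()))
            } else {
                Err(format!("not an absolute path: {}", path.display()))
            }
        }
    }

    /// Hypothetical test-only extension trait providing `.abs()`.
    pub trait PathAbsExt {
        fn abs(&self) -> AbsolutePathBuf;
    }

    impl PathAbsExt for Path {
        /// Panics on relative paths, which is acceptable in test code.
        fn abs(&self) -> AbsolutePathBuf {
            AbsolutePathBuf::try_from(self).unwrap_or_else(|err| panic!("{err}"))
        }
    }
    ```

    Because `PathBuf` dereferences to `Path`, a call such as `std::env::temp_dir().abs()` also resolves through this one implementation.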
    
    For the remaining TUI/app-server snapshot coverage that renders absolute
    cwd values, keep the snapshots unchanged and skip the Windows-only cases
    where the platform-specific absolute path layout differs.
  • Add remote env CI matrix and integration test (#14869)
    Setting `CODEX_TEST_REMOTE_ENV` makes `test_codex` start the executor
    "remotely" (inside a Docker container), turning any integration test
    into a remote test.
  • Split features into codex-features crate (#15253)
    - Split the feature system into a new `codex-features` crate.
    - Cut `codex-core` and workspace consumers over to the new config and
    warning APIs.
    
    Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
    Co-authored-by: Codex <noreply@openai.com>
  • feat(core, tracing): create turn spans over websockets (#14632)
    ## Description
    
    Dependent on:
    - [responsesapi] https://github.com/openai/openai/pull/760991 
    - [codex-backend] https://github.com/openai/openai/pull/760985
    
    `codex app-server -> codex-backend -> responsesapi` now reuses a
    persistent websocket connection across many turns. This PR updates
    tracing when using websockets so that each `response.create` websocket
    request propagates the current tracing context, so we can get a holistic
    end-to-end trace for each turn.
    
    Tracing is propagated via special keys (`ws_request_header_traceparent`,
    `ws_request_header_tracestate`) set in the `client_metadata` param in
    Responses API.
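    
    A minimal sketch of attaching the trace context per request. The two key names come from the description above; the `traceparent` formatter follows the W3C Trace Context layout (version, trace ID, parent ID, flags), and both function names and the string-map representation of `client_metadata` are assumptions.

    ```rust
    use std::collections::BTreeMap;

    /// Format a W3C `traceparent` value: `00-<32 hex>-<16 hex>-<2 hex>`.
    pub fn traceparent(trace_id: u128, span_id: u64, sampled: bool) -> String {
        format!("00-{trace_id:032x}-{span_id:016x}-{:02x}", u8::from(sampled))
    }

    /// Add the current trace context to the metadata sent with one
    /// `response.create` request, leaving other entries untouched.
    pub fn with_trace_context(
        mut client_metadata: BTreeMap<String, String>,
        traceparent: String,
        tracestate: Option<String>,
    ) -> BTreeMap<String, String> {
        client_metadata.insert("ws_request_header_traceparent".to_string(), traceparent);
        if let Some(state) = tracestate {
            client_metadata.insert("ws_request_header_tracestate".to_string(), state);
        }
        client_metadata
    }
    ```

    Setting these keys on every request, rather than once per connection, is what lets each turn carry its own span on a reused websocket.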
    
    Before this change, tracing over websockets was partly broken: the
    tracing context was set only at websocket connection time, so it was
    detached from any individual `turn/start` request.
  • Apply argument comment lint across codex-rs (#14652)
    ## Why
    
    Once the repo-local lint exists, `codex-rs` needs to follow the
    checked-in convention and CI needs to keep it from drifting. This commit
    applies the fallback `/*param*/` style consistently across existing
    positional literal call sites without changing those APIs.
    
    The longer-term preference is still to avoid APIs that require comments
    by choosing clearer parameter types and call shapes. This PR is
    intentionally the mechanical follow-through for the places where the
    existing signatures stay in place.
    
    After rebasing onto newer `main`, the rollout also had to cover newly
    introduced `tui_app_server` call sites. That made it clear the first cut
    of the CI job was too expensive for the common path: it was spending
    almost as much time installing `cargo-dylint` and re-testing the lint
    crate as a representative test job spends running product tests. The CI
    update keeps the full workspace enforcement but trims that extra
    overhead from ordinary `codex-rs` PRs.
    
    ## What changed
    
    - keep a dedicated `argument_comment_lint` job in `rust-ci`
    - mechanically annotate remaining opaque positional literals across
    `codex-rs` with exact `/*param*/` comments, including the rebased
    `tui_app_server` call sites that now fall under the lint
    - keep the checked-in style aligned with the lint policy by using
    `/*param*/` and leaving string and char literals uncommented
    - cache `cargo-dylint`, `dylint-link`, and the relevant Cargo
    registry/git metadata in the lint job
    - split changed-path detection so the lint crate's own `cargo test` step
    runs only when `tools/argument-comment-lint/*` or `rust-ci.yml` changes
    - continue to run the repo wrapper over the `codex-rs` workspace, so
    product-code enforcement is unchanged
    
    Most of the code changes in this commit are intentionally mechanical
    comment rewrites or insertions driven by the lint itself.
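    
    For readers unfamiliar with the convention, here is a small illustration of the `/*param*/` style on a hypothetical function. The function and its parameters are invented for the example; only the comment style reflects this PR.

    ```rust
    /// Hypothetical API with positional parameters that are opaque at call sites.
    pub fn retry_delay_ms(base_ms: u64, attempt: u32, jitter: bool) -> u64 {
        let delay = base_ms.saturating_mul(2u64.saturating_pow(attempt));
        if jitter {
            delay + delay / 10
        } else {
            delay
        }
    }

    pub fn third_attempt_delay() -> u64 {
        // Without the comments, `retry_delay_ms(100, 3, false)` says little
        // about which literal is which. Each comment names the parameter exactly.
        retry_delay_ms(/*base_ms*/ 100, /*attempt*/ 3, /*jitter*/ false)
    }
    ```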
    
    ## Verification
    
    - `./tools/argument-comment-lint/run.sh --workspace`
    - `cargo test -p codex-tui-app-server -p codex-tui`
    - parsed `.github/workflows/rust-ci.yml` locally with PyYAML
    
    ---
    
    * -> #14652
    * #14651
  • fix(ci) fix guardian ci (#13911)
    ## Summary
    #13910 was merged with some unused imports; this PR removes them.
    
    ## Testing
    - [x] Let's make sure CI is green
    
    ---------
    
    Co-authored-by: Charles Cunningham <ccunningham@openai.com>
    Co-authored-by: Codex <noreply@openai.com>
  • core: adopt host_executable() rules in zsh-fork (#13046)
    ## Why
    
    [#12964](https://github.com/openai/codex/pull/12964) added
    `host_executable()` support to `codex-execpolicy`, but the zsh-fork
    interception path in `unix_escalation.rs` was still evaluating commands
    with the default exact-token matcher.
    
    That meant an intercepted absolute executable such as `/usr/bin/git
    status` could still miss basename rules like `prefix_rule(pattern =
    ["git", "status"])`, even when the policy also defined a matching
    `host_executable(name = "git", ...)` entry.
    
    This PR adopts the new matching behavior in the zsh-fork runtime only.
    That keeps the rollout intentionally narrow: zsh-fork already requires
    explicit user opt-in, so it is a safer first caller to exercise the new
    `host_executable()` scheme before expanding it to other execpolicy call
    sites.
    
    It also brings zsh-fork back in line with the current `prefix_rule()`
    execution model. Until prefix rules can carry their own permission
    profiles, a matched `prefix_rule()` is expected to rerun the intercepted
    command unsandboxed on `allow`, or after the user accepts `prompt`,
    instead of merely continuing inside the inherited shell sandbox.
    
    ## What Changed
    
    - added `evaluate_intercepted_exec_policy()` in
    `core/src/tools/runtimes/shell/unix_escalation.rs` to centralize
    execpolicy evaluation for intercepted commands
    - switched intercepted direct execs in the zsh-fork path to
    `check_multiple_with_options(...)` with `MatchOptions {
    resolve_host_executables: true }`
    - added `commands_for_intercepted_exec_policy()` so zsh-fork policy
    evaluation works from intercepted `(program, argv)` data instead of
    reconstructing a synthetic command before matching
    - left shell-wrapper parsing intentionally disabled by default behind
    `ENABLE_INTERCEPTED_EXEC_POLICY_SHELL_WRAPPER_PARSING`, so
    path-sensitive matching relies on later direct exec interception rather
    than shell-script parsing
    - made matched `prefix_rule()` decisions rerun intercepted commands with
    `EscalationExecution::Unsandboxed`, while unmatched-command fallback
    keeps the existing sandbox-preserving behavior
    - extracted the zsh-fork test harness into
    `core/tests/common/zsh_fork.rs` so both the skill-focused and
    approval-focused integration suites can exercise the same runtime setup
    - limited this change to the intercepted zsh-fork path rather than
    changing every execpolicy caller at once
    - added runtime coverage in
    `core/src/tools/runtimes/shell/unix_escalation_tests.rs` for allowed and
    disallowed `host_executable()` mappings and the wrapper-parsing modes
    - added integration coverage in `core/tests/suite/approvals.rs` to
    verify a saved `prefix_rule(pattern=["touch"], decision="allow")` reruns
    under zsh-fork outside a restrictive `WorkspaceWrite` sandbox
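    
    The matching gap described under "Why" can be illustrated with a simplified model. This is not the `codex-execpolicy` API: `HostExecutables`, `matches_prefix`, and the rule that an absolute path must appear in the allowed list for its basename are assumptions made for the sketch.

    ```rust
    use std::collections::BTreeMap;
    use std::path::Path;

    /// Hypothetical mapping from executable name to allowed absolute paths.
    pub type HostExecutables = BTreeMap<String, Vec<String>>;

    /// Return true when `argv` starts with `pattern`. With host-executable
    /// resolution enabled, an absolute program path may match a basename
    /// pattern if that path is listed for the name.
    pub fn matches_prefix(
        argv: &[&str],
        pattern: &[&str],
        hosts: &HostExecutables,
        resolve_host_executables: bool,
    ) -> bool {
        if pattern.is_empty() || argv.len() < pattern.len() {
            return false;
        }
        let program = argv[0];
        let program_matches = if program == pattern[0] {
            true // exact-token match, the default behavior
        } else if resolve_host_executables {
            let basename = Path::new(program).file_name().and_then(|name| name.to_str());
            basename == Some(pattern[0])
                && hosts
                    .get(pattern[0])
                    .map_or(false, |paths| paths.iter().any(|path| path == program))
        } else {
            false
        };
        // The remaining pattern tokens must match the arguments exactly.
        program_matches && argv[1..pattern.len()] == pattern[1..]
    }
    ```

    Under this model, `/usr/bin/git status` misses the `["git", "status"]` rule with exact-token matching and hits it once resolution is enabled and `/usr/bin/git` is listed for `git`.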
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13046).
    * #13065
    * __->__ #13046
  • test: vendor zsh fork via DotSlash and stabilize zsh-fork tests (#12518)
    ## Why
    
    The zsh integration tests were still brittle in two ways:
    
    - they relied on `CODEX_TEST_ZSH_PATH` / environment-specific setup, so
    they often did not exercise the patched zsh fork that `shell-tool-mcp`
    ships
    - once the tests consistently used the vendored zsh fork, they exposed
    real Linux-specific zsh-fork issues in CI
    
    In particular, the Linux failures were not just test noise:
    
    - the zsh-fork launch path was dropping `ExecRequest.arg0`, so Linux
    `codex-linux-sandbox` arg0 dispatch did not run and zsh wrapper-mode
    could receive malformed arguments
    - the
    `turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2`
    test uses the zsh exec bridge (which talks to the parent over a Unix
    socket), but Linux restricted sandbox seccomp denies `connect(2)`,
    causing timeouts on `ubuntu-24.04` x86/arm
    
    This PR makes the zsh tests consistently run against the intended
    vendored zsh fork and fixes/hardens the zsh-fork path so the Linux CI
    signal is meaningful.
    
    ## What Changed
    
    - Added a single shared test-only DotSlash file for the patched zsh fork
    at `codex-rs/exec-server/tests/suite/zsh` (analogous to the existing
    `bash` test resource).
    - Updated both app-server and exec-server zsh tests to use that shared
    DotSlash zsh (no duplicate zsh DotSlash file, no `CODEX_TEST_ZSH_PATH`
    dependency).
    - Updated the app-server zsh-fork test helper to resolve the shared
    DotSlash zsh and avoid silently falling back to host zsh.
    - Kept the app-server zsh-fork tests configured via `config.toml`, using
    a test wrapper path where needed to force `zsh -df` (and rewrite `-lc`
    to `-c`) for the subcommand-decline test.
    - Hardened the app-server subcommand-decline zsh-fork test for CI
    variability:
      - tolerate an extra `/responses` POST with a no-op mock response
      - tolerate non-target approval ordering while remaining strict on the
        two `/usr/bin/true` approvals and decline behavior
      - use `DangerFullAccess` on Linux for this one test because it validates
        zsh approval flow, not Linux sandbox socket restrictions
    - Fixed zsh-fork process launching on Linux by preserving `req.arg0` in
    `ZshExecBridge::execute_shell_request(...)` so `codex-linux-sandbox`
    arg0 dispatch continues to work.
    - Moved `maybe_run_zsh_exec_wrapper_mode()` under
    `arg0_dispatch_or_else(...)` in `app-server` and `cli` so wrapper-mode
    handling coexists correctly with arg0-dispatched helper modes.
    - Consolidated duplicated `dotslash -- fetch` resolution logic into
    shared test support (`core/tests/common/lib.rs`).
    - Updated `codex-rs/exec-server/tests/suite/accept_elicitation.rs` to
    use DotSlash zsh and hardened the zsh elicitation test for Bazel/zsh
    differences by:
      - resolving an absolute `git` path
      - running `git init --quiet .`
      - asserting success / `.git` creation instead of relying on banner text
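    
    To show why dropping `arg0` matters, here is a minimal sketch of arg0 dispatch: one binary picks its mode from the name it was invoked under. The `Mode` enum and `mode_for_arg0()` are hypothetical names for illustration; only the `codex-linux-sandbox` name comes from the description above.
    
    ```rust
    use std::path::Path;
    
    /// Modes a multi-call binary might run in (hypothetical).
    #[derive(Debug, PartialEq)]
    enum Mode {
        LinuxSandbox,
        Main,
    }
    
    /// Choose a mode from the basename of argv[0].
    fn mode_for_arg0(arg0: &str) -> Mode {
        let name = Path::new(arg0)
            .file_name()
            .and_then(|n| n.to_str())
            .unwrap_or("");
        match name {
            "codex-linux-sandbox" => Mode::LinuxSandbox,
            _ => Mode::Main,
        }
    }
    
    fn main() {
        // If a launcher discards the requested arg0, the child only ever sees
        // its real executable name here and the sandbox branch is never taken.
        let arg0 = std::env::args().next().unwrap_or_default();
        println!("{:?}", mode_for_arg0(&arg0));
    }
    ```
    
    On Unix, a launcher can pass a custom argv[0] with `std::os::unix::process::CommandExt::arg0`.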
    
    ## Verification
    
    - `cargo test -p codex-app-server turn_start_zsh_fork -- --nocapture`
    - `cargo test -p codex-exec-server accept_elicitation -- --nocapture`
    - `bazel test //codex-rs/exec-server:exec-server-all-test
    --test_output=streamed --test_arg=--nocapture
    --test_arg=accept_elicitation_for_prompt_rule_with_zsh`
    - CI (`rust-ci`) on the final cleaned commit: `Tests — ubuntu-24.04 -
    x86_64-unknown-linux-gnu` and `Tests — ubuntu-24.04-arm -
    aarch64-unknown-linux-gnu` passed in [run
    22291424358](https://github.com/openai/codex/actions/runs/22291424358)
  • chore: remove codex-core public protocol/shell re-exports (#12432)
    ## Why
    
    `codex-rs/core/src/lib.rs` re-exported a broad set of types and modules
    from `codex-protocol` and `codex-shell-command`. That made it easy for
    workspace crates to import those APIs through `codex-core`, which in
    turn hides dependency edges and makes it harder to reduce compile-time
    coupling over time.
    
    This change removes those public re-exports so call sites must import
    from the source crates directly. Even when a crate still depends on
    `codex-core` today, this makes dependency boundaries explicit and
    unblocks future work to drop `codex-core` dependencies where possible.
    
    ## What Changed
    
    - Removed public re-exports from `codex-rs/core/src/lib.rs` for:
      - `codex_protocol::protocol` and related protocol/model types (including
        `InitialHistory`)
      - `codex_protocol::config_types` (`protocol_config_types`)
      - `codex_shell_command::{bash, is_dangerous_command, is_safe_command,
        parse_command, powershell}`
    - Migrated workspace Rust call sites to import directly from:
      - `codex_protocol::protocol`
      - `codex_protocol::config_types`
      - `codex_protocol::models`
      - `codex_shell_command`
    - Added explicit `Cargo.toml` dependencies (`codex-protocol` /
    `codex-shell-command`) in crates that now import those crates directly.
    - Kept `codex-core` internal modules compiling by using `pub(crate)`
    aliases in `core/src/lib.rs` (internal-only, not part of the public
    API).
    - Updated the two utility crates that can already drop a `codex-core`
    dependency edge entirely:
      - `codex-utils-approval-presets`
      - `codex-utils-cli`
    
    ## Verification
    
    - `cargo test -p codex-utils-approval-presets`
    - `cargo test -p codex-utils-cli`
    - `cargo check --workspace --all-targets`
    - `just clippy`
  • bazel: fix snapshot parity for tests/*.rs rust_test targets (#11893)
    ## Summary
    - make `rust_test` targets generated from `tests/*.rs` use Cargo-style
    crate names (file stem) so snapshot names match Cargo (`all__...`
    instead of Bazel-derived names)
    - split lib vs `tests/*.rs` test env wiring in `codex_rust_crate` to
    keep existing lib snapshot behavior while applying Bazel
    runfiles-compatible workspace root for `tests/*.rs`
    - compute the `tests/*.rs` snapshot workspace root from package depth so
    `insta` resolves committed snapshots under Bazel `--noenable_runfiles`
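    
    The depth-based computation in the last bullet might look roughly like the following. It is written in Rust only for illustration (the real logic lives in the Bazel macro), and `workspace_root_from_package()` is a hypothetical helper that assumes the root is reached by walking up one `..` per package path segment.
    
    ```rust
    /// Build a relative path from a package directory back to the workspace root.
    /// Assumption: one `..` per non-empty segment of the package path.
    fn workspace_root_from_package(package: &str) -> String {
        let depth = package.split('/').filter(|s| !s.is_empty()).count();
        if depth == 0 {
            ".".to_string()
        } else {
            vec![".."; depth].join("/")
        }
    }
    
    fn main() {
        println!("{}", workspace_root_from_package("codex-rs/core"));
    }
    ```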
    
    ## Validation
    - `bazelisk test //codex-rs/core:core-all-test
    --test_arg=suite::compact:: --cache_test_results=no`
    - `bazelisk test //codex-rs/core:core-all-test
    --test_arg=suite::compact_remote:: --cache_test_results=no`
  • feat: persist and restore codex app's tools after search (#11780)
    ### What changed
    1. Removed per-turn MCP selection reset in `core/src/tasks/mod.rs`.
    2. Added `SessionState::set_mcp_tool_selection(Vec<String>)` in
    `core/src/state/session.rs` for authoritative restore behavior (deduped,
    order-preserving, empty clears).
    3. Added rollout parsing in `core/src/codex.rs` to recover
    `active_selected_tools` from prior `search_tool_bm25` outputs:
       - tracks matching `call_id`s
       - parses function output text JSON
       - extracts `active_selected_tools`
       - latest valid payload wins
       - malformed/non-matching payloads are ignored
    4. Applied restore logic to resumed and forked startup paths in
    `core/src/codex.rs`.
    5. Updated instruction text to session/thread scope in
    `core/templates/search_tool/tool_description.md`.
    6. Expanded tests in `core/tests/suite/search_tool.rs`, plus unit
    coverage in:
       - `core/src/codex.rs`
       - `core/src/state/session.rs`
    
    ### Behavior after change
    1. Search activates matched tools.
    2. Additional searches union into active selection.
    3. Selection survives new turns in the same thread.
    4. Resume/fork restores selection from rollout history.
    5. Separate threads do not inherit selection unless forked.
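    
    A minimal sketch of the selection semantics listed above (deduped, order-preserving, empty clears, searches union). Only `set_mcp_tool_selection(Vec<String>)` is named in this change; the field layout, the `merge_mcp_tool_selection()` helper, and `dedupe()` are assumptions made for illustration.
    
    ```rust
    use std::collections::HashSet;
    
    #[derive(Default)]
    struct SessionState {
        /// `None` means no tools are currently selected (hypothetical layout).
        active_selected_tools: Option<Vec<String>>,
    }
    
    impl SessionState {
        /// Authoritative restore: replace the selection, dropping duplicates
        /// while keeping first-seen order. An empty list clears the selection.
        fn set_mcp_tool_selection(&mut self, tools: Vec<String>) {
            let deduped = dedupe(tools);
            self.active_selected_tools = if deduped.is_empty() {
                None
            } else {
                Some(deduped)
            };
        }
    
        /// Hypothetical helper: union a new search result into the selection.
        fn merge_mcp_tool_selection(&mut self, tools: Vec<String>) {
            let mut combined = self.active_selected_tools.take().unwrap_or_default();
            combined.extend(tools);
            self.set_mcp_tool_selection(combined);
        }
    }
    
    /// Remove duplicates, keeping the first occurrence of each name.
    fn dedupe(tools: Vec<String>) -> Vec<String> {
        let mut seen = HashSet::new();
        tools.into_iter().filter(|t| seen.insert(t.clone())).collect()
    }
    
    fn main() {
        let mut state = SessionState::default();
        state.merge_mcp_tool_selection(vec!["calendar".into(), "mail".into()]);
        state.merge_mcp_tool_selection(vec!["mail".into(), "drive".into()]);
        println!("{:?}", state.active_selected_tools);
    }
    ```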
  • core: snapshot tests for compaction requests, post-compaction layout, some additional compaction tests (#11487)
    This PR keeps compaction context-layout test coverage separate from
    runtime compaction behavior changes, so runtime logic review can stay
    focused.
    
    ## Included
    - Adds reusable context snapshot helpers in
    `core/tests/common/context_snapshot.rs` for rendering model-visible
    request/history shapes.
    - Standardizes helper naming for readability:
      - `format_request_input_snapshot`
      - `format_response_items_snapshot`
      - `format_labeled_requests_snapshot`
      - `format_labeled_items_snapshot`
    - Expands snapshot coverage for both local and remote compaction flows:
      - pre-turn auto-compaction
      - pre-turn failure/context-window-exceeded paths
      - mid-turn continuation compaction
      - manual `/compact` with and without prior user turns
    - Captures both sides where relevant:
      - compaction request shape
      - post-compaction history layout shape
    - Adds/uses shared request-inspection helpers so assertions target
    structured request content instead of ad-hoc JSON string parsing.
    - Aligns snapshots/assertions to current behavior and leaves explicit
    `TODO(ccunningham)` notes where behavior is known and intentionally
    deferred.
    
    ## Not Included
    - No runtime compaction logic changes.
    - No model-visible context/state behavior changes.
  • Remove test-support feature from codex-core and replace it with explicit test toggles (#11405)
    ## Why
    
    `codex-core` was being built in multiple feature-resolved permutations
    because test-only behavior was modeled as crate features. For a large
    crate, those permutations increase compile cost and reduce cache reuse.
    
    ## Net Change
    
    - Removed the `test-support` crate feature and related feature wiring so
    `codex-core` no longer needs separate feature shapes for test consumers.
    - Standardized cross-crate test-only access behind
    `codex_core::test_support`.
    - External test code now imports helpers from
    `codex_core::test_support`.
    - Underlying implementation hooks are kept internal (`pub(crate)`)
    instead of broadly public.
    
    ## Outcome
    
    - Fewer `codex-core` build permutations.
    - Better incremental cache reuse across test targets.
    - No intended production behavior change.
  • Remove deterministic_process_ids feature to avoid duplicate codex-core builds (#11393)
    ## Why
    
    `codex-core` enabled `deterministic_process_ids` through a self
    dev-dependency. That forced a second feature-resolved build of the same
    crate, which increased compile time and test latency.
    
    ## What Changed
    
    - Removed the `deterministic_process_ids` feature from
    `codex-rs/core/Cargo.toml`.
    - Removed the self dev-dependency on `codex-core` that enabled that
    feature.
    - Removed the Bazel `deterministic_process_ids` crate feature for
    `codex-core`.
    - Added a test-only `AtomicBool` override in unified exec process-id
    allocation.
    - Added a test-support setter for that override and re-exported it from
    `codex-core`.
    - Enabled deterministic process IDs in integration tests via
    `core_test_support` ctor.
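    
    The runtime override that replaces the crate feature could be sketched as below. The static names, `set_deterministic_process_ids()`, and `allocate_process_id()` are hypothetical, and `RandomState` is used only as a std-only stand-in for a real random number generator.
    
    ```rust
    use std::collections::hash_map::RandomState;
    use std::hash::{BuildHasher, Hasher};
    use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
    
    /// Test-only switch; stays `false` in production (hypothetical name).
    static DETERMINISTIC_IDS: AtomicBool = AtomicBool::new(false);
    /// Counter used when deterministic IDs are requested.
    static NEXT_ID: AtomicU64 = AtomicU64::new(1000);
    
    /// Setter that a test-support crate could call once at startup.
    pub fn set_deterministic_process_ids(enabled: bool) {
        DETERMINISTIC_IDS.store(enabled, Ordering::Relaxed);
    }
    
    /// Allocate a process ID: sequential under test, random otherwise.
    fn allocate_process_id() -> u64 {
        if cfg!(test) || DETERMINISTIC_IDS.load(Ordering::Relaxed) {
            NEXT_ID.fetch_add(1, Ordering::Relaxed)
        } else {
            // Stand-in for a real RNG: each RandomState is randomly keyed.
            RandomState::new().build_hasher().finish()
        }
    }
    
    fn main() {
        set_deterministic_process_ids(true);
        let first = allocate_process_id();
        let second = allocate_process_id();
        println!("{first} {second}");
    }
    ```
    
    Because the switch is a runtime flag rather than a Cargo feature, unit tests and integration tests can share a single build of the crate.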
    
    ## Behavior
    
    - Production behavior remains random process IDs.
    - Unit tests remain deterministic via `cfg(test)`.
    - Integration tests remain deterministic via explicit test-support
    initialization.
    
    ## Validation
    
    - `just fmt`
    - `cargo test -p codex-core unified_exec::`
    - `cargo test -p codex-core --test all unified_exec -- --test-threads=1`
    - `cargo tree -p codex-core -e features` (verified the removed feature
    path)
  • Update tests to stop using sse_completed fixture (#10638)
    Summary:
    - replace the `sse_completed` fixture and related JSON template with
    direct `responses::ev_completed` payload builders
    - cascade the new SSE helpers through all affected core tests for
    consistency and clarity
    - remove legacy fixtures that were no longer needed once the helpers are
    in place
    
    Testing:
    - Not run (not requested)
  • feat: replace custom mcp-types crate with equivalents from rmcp (#10349)
    We started working with MCP in Codex before
    https://crates.io/crates/rmcp was mature, so we had our own crate for
    MCP types that was generated from the MCP schema:
    
    
    https://github.com/openai/codex/blob/8b95d3e082376f4cb23e92641705a22afb28a9da/codex-rs/mcp-types/README.md
    
    Now that `rmcp` is more mature, it makes more sense to use their MCP
    types in Rust, as they handle details (like the `_meta` field) that our
    custom version ignored. One advantage of our custom types was that they
    implemented `JsonSchema` and `ts_rs::TS`, whereas the types in `rmcp` do
    not. As such, part of the work of this PR is leveraging the adapters,
    introduced in #10356, between `rmcp` types and the serializable types
    that make up our API (app server and MCP).
    
    Note this PR results in a number of changes to
    `codex-rs/app-server-protocol/schema`, which merit special attention
    during review. We must ensure that these changes are still
    backwards-compatible, which is possible because we have:
    
    ```diff
    - export type CallToolResult = { content: Array<ContentBlock>, isError?: boolean, structuredContent?: JsonValue, };
    + export type CallToolResult = { content: Array<JsonValue>, structuredContent?: JsonValue, isError?: boolean, _meta?: JsonValue, };
    ```
    
    so `ContentBlock` has been replaced with the more general `JsonValue`.
    Note that `ContentBlock` was defined as:
    
    ```typescript
    export type ContentBlock = TextContent | ImageContent | AudioContent | ResourceLink | EmbeddedResource;
    ```
    
    so the deletion of those individual variants should not be a cause of
    great concern.
    
    Similarly, we have the following change in
    `codex-rs/app-server-protocol/schema/typescript/Tool.ts`:
    
    ```diff
    - export type Tool = { annotations?: ToolAnnotations, description?: string, inputSchema: ToolInputSchema, name: string, outputSchema?: ToolOutputSchema, title?: string, };
    + export type Tool = { name: string, title?: string, description?: string, inputSchema: JsonValue, outputSchema?: JsonValue, annotations?: JsonValue, icons?: Array<JsonValue>, _meta?: JsonValue, };
    ```
    
    so:
    
    - `annotations?: ToolAnnotations` ➡️ `JsonValue`
    - `inputSchema: ToolInputSchema` ➡️ `JsonValue`
    - `outputSchema?: ToolOutputSchema` ➡️ `JsonValue`
    
    and two new fields: `icons?: Array<JsonValue>, _meta?: JsonValue`
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/10349).
    * #10357
    * __->__ #10349
    * #10356
  • fix: leverage codex_utils_cargo_bin() in codex-rs/core/tests/suite (#8887)
    This eliminates our dependency on the `escargot` crate and better
    prepares us for Bazel builds: https://github.com/openai/codex/pull/8875.
  • chore: unify conversation with thread name (#8830)
    Done and verified by Codex + refactor feature of RustRover
  • feat: introduce codex-utils-cargo-bin as an alternative to assert_cmd::Command (#8496)
    This PR introduces a `codex-utils-cargo-bin` utility crate that
    wraps/replaces our use of `assert_cmd::Command` and
    `escargot::CargoBuild`.
    
    As you can infer from the introduction of `buck_project_root()` in this
    PR, I am attempting to make it possible to build Codex under
    [Buck2](https://buck2.build) as well as `cargo`. With Buck2, I hope to
    achieve faster incremental local builds (largely due to Buck2's
    [dice](https://buck2.build/docs/insights_and_knowledge/modern_dice/)
    build strategy, as well as benefits from its local build daemon) as well
    as faster CI builds if we invest in remote execution and caching.
    
    See
    https://buck2.build/docs/getting_started/what_is_buck2/#why-use-buck2-key-advantages
    for more details about the performance advantages of Buck2.
    
    Buck2 enforces stronger requirements in terms of build and test
    isolation. It discourages assumptions about absolute paths (which is key
    to enabling remote execution). Because the `CARGO_BIN_EXE_*` environment
    variables that Cargo provides are absolute paths (which
    `assert_cmd::Command` reads), this is a problem for Buck2, which is why
    we need this `codex-utils-cargo-bin` utility.
    
    My WIP-Buck2 setup sets the `CARGO_BIN_EXE_*` environment variables
    passed to a `rust_test()` build rule as relative paths.
    `codex-utils-cargo-bin` will resolve these values to absolute paths,
    when necessary.
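    
    A minimal sketch of that resolution step, assuming relative values are resolved against the current working directory (the real crate may use a different base, such as a project root). `resolve_against()` and `cargo_bin()` are hypothetical names, not the crate's actual API.
    
    ```rust
    use std::env;
    use std::path::{Path, PathBuf};
    
    /// Keep absolute paths as-is; join relative ones onto `base`.
    fn resolve_against(base: &Path, value: &str) -> PathBuf {
        let path = Path::new(value);
        if path.is_absolute() {
            path.to_path_buf()
        } else {
            base.join(path)
        }
    }
    
    /// Look up `CARGO_BIN_EXE_<name>` and resolve it to an absolute path.
    /// Assumption: the current directory is the right base for relative values.
    fn cargo_bin(name: &str) -> Option<PathBuf> {
        let value = env::var(format!("CARGO_BIN_EXE_{name}")).ok()?;
        let base = env::current_dir().ok()?;
        Some(resolve_against(&base, &value))
    }
    
    fn main() {
        match cargo_bin("codex") {
            Some(path) => println!("{}", path.display()),
            None => println!("CARGO_BIN_EXE_codex is not set"),
        }
    }
    ```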
    
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/8496).
    * #8498
    * __->__ #8496
  • chore: migrate from Config::load_from_base_config_with_overrides to ConfigBuilder (#8276)
    https://github.com/openai/codex/pull/8235 introduced `ConfigBuilder` and
    this PR updates all non-test call sites to use it instead of
    `Config::load_from_base_config_with_overrides()`.
    
    This is important because `load_from_base_config_with_overrides()` uses
    an empty `ConfigRequirements`, which is a reasonable default for testing
    so the tests are not influenced by the settings on the host. This method
    is now guarded by `#[cfg(test)]` so it cannot be used by business logic.
    
    Because `ConfigBuilder::build()` is `async`, many of the test methods
    had to be migrated to be `async`, as well. On the bright side, this made
    it possible to eliminate a bunch of `block_on_future()` stuff.
  • Fix unified_exec on windows (#7620)
    Requires removal of the `PSUEDOCONSOLE_INHERIT_CURSOR` flag so child
    processes don't attempt to wait for a cursor position response (and time
    out).
    
    
    https://github.com/wezterm/wezterm/compare/main...pakrym:wezterm:PSUEDOCONSOLE_INHERIT_CURSOR?expand=1
    
    ---------
    
    Co-authored-by: pakrym-oai <pakrym@openai.com>
  • feat(core) Add login to shell_command tool (#6846)
    ## Summary
    Adds the `login` parameter to the `shell_command` tool - optional,
    defaults to true.
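    
    A small sketch of how an optional, default-true `login` flag could be handled. The `ShellCommandParams` struct and the mapping of `login` to `-lc` versus `-c` are assumptions for illustration; this description does not say how the flag is translated into shell arguments.
    
    ```rust
    /// Hypothetical parameter shape for the tool call.
    struct ShellCommandParams {
        command: String,
        /// Omitted by the caller means "use a login shell".
        login: Option<bool>,
    }
    
    /// Build shell arguments, treating a missing `login` as `true`.
    /// Assumption: login shells use `-lc`, non-login shells use `-c`.
    fn shell_args(params: &ShellCommandParams) -> Vec<String> {
        let flag = if params.login.unwrap_or(true) { "-lc" } else { "-c" };
        vec![flag.to_string(), params.command.clone()]
    }
    
    fn main() {
        let params = ShellCommandParams {
            command: "echo hi".to_string(),
            login: None,
        };
        println!("{:?}", shell_args(&params));
    }
    ```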
    
    ## Testing
    - [x] Tested locally
  • [App-server] v2 for account/updated and account/logout (#6175)
    Adds v2 of `account/updated` and `account/logout` for the app server.
    These correspond to the old `authStatusChange` and `LogoutChatGpt`,
    respectively. Follow-up PRs will make other v2 endpoints call
    `account/updated` instead of `authStatusChange` too.
  • Add ItemStarted/ItemCompleted events for UserInputItem (#5306)
    Adds a new ItemStarted event and delivers UserMessage as the first item
    type (more to come).
    
    
    Renames `InputItem` to `UserInput` considering we're using the `Item`
    suffix for actual items.
  • test: reduce time dependency on test harness (#5053)
    Tightened the CLI integration tests to stop relying on wall-clock
    sleeps: a new fs watcher helper waits for session files instead of
    timing out, and SSE mocks/fixtures make the flows deterministic.
  • Make output assertions more explicit (#4784)
    Match using precise regexes.
  • chore: refactor tool handling (#4510)
    # Tool System Refactor
    
    - Centralizes tool definitions and execution in `core/src/tools/*`:
    specs (`spec.rs`), handlers (`handlers/*`), router (`router.rs`),
    registry/dispatch (`registry.rs`), and shared context (`context.rs`).
    One registry now builds the model-visible tool list and binds handlers.
    - Router converts model responses to tool calls; Registry dispatches
    with consistent telemetry via `codex-rs/otel` and unified error
    handling. Function, Local Shell, MCP, and experimental `unified_exec`
    all flow through this path; legacy shell aliases still work.
    - Rationale: reduce per‑tool boilerplate, keep spec/handler in sync, and
    make adding tools predictable and testable.
    
    Example: `read_file`
    - Spec: `core/src/tools/spec.rs` (see `create_read_file_tool`,
    registered by `build_specs`).
    - Handler: `core/src/tools/handlers/read_file.rs` (absolute `file_path`,
    1‑indexed `offset`, `limit`, `L#: ` prefixes, safe truncation).
    - E2E test: `core/tests/suite/read_file.rs` validates the tool returns
    the requested lines.
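    
    The handler behavior listed above (1-indexed `offset`, `limit`, `L#: ` prefixes) can be sketched as follows. `read_lines()` is a hypothetical helper operating on an in-memory string; the real handler reads from an absolute `file_path` and also applies truncation, which is omitted here.
    
    ```rust
    /// Return up to `limit` lines starting at 1-indexed `offset`,
    /// each prefixed with its line number as `L<n>: `.
    fn read_lines(contents: &str, offset: usize, limit: usize) -> Result<Vec<String>, String> {
        if offset == 0 {
            return Err("offset must be a 1-indexed line number".to_string());
        }
        Ok(contents
            .lines()
            .enumerate()
            .skip(offset - 1)
            .take(limit)
            .map(|(index, line)| format!("L{}: {}", index + 1, line))
            .collect())
    }
    
    fn main() {
        let text = "alpha\nbeta\ngamma\ndelta";
        match read_lines(text, 2, 2) {
            Ok(lines) => lines.iter().for_each(|line| println!("{line}")),
            Err(err) => eprintln!("{err}"),
        }
    }
    ```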
    
    ## Next steps:
    - Decompose `handle_container_exec_with_params` 
    - Add parallel tool calls
  • Add codex exec testing helpers (#4254)
    Add a shortcut to create working directories and run codex exec with
    fake server.