32 Commits

  • [codex] Trace exec-server JSON-RPC requests (#27466)
    ## Why
    
    Exec-server JSON-RPC calls can cross local and remote transports, but
    trace context stopped at the RPC boundary. That made client and server
    work difficult to correlate when diagnosing latency or failures.
    
    ## What changed
    
    - Propagate the current W3C trace context on outbound JSON-RPC requests.
    - Parent inbound request spans from received trace context.
    - Record the received JSON-RPC method on server spans and keep each span
    open through response enqueue.
    - Add only the OTEL dependencies required by the exec-server crate.
    
    ## Stack
    
    Review and land this stack in order:
    
    1. #27466 — trace exec-server JSON-RPC requests **(this PR)**
    2. #27467 — record bounded connection, request, and process lifecycle
    metrics
    3. #27470 — observe remote registration and Noise rendezvous lifecycle
    
    ## Validation
    
    - `just test -p codex-exec-server --lib` (153 passed)
    - `just bazel-lock-check`
    - `just fix -p codex-exec-server`
  • protocol: separate app and exec RPC ownership (#29714)
    ## Why
    
    The app-server and exec-server expose separate JSON-RPC APIs, but
    exec-server currently sources its serialized protocol and envelope types
    through app-server-oriented code. Giving each API an explicit owner
    makes the crate boundary legible without introducing shared generic
    envelopes.
    
    ## What changed
    
    - Added `codex-exec-server-protocol` to own exec DTOs, process IDs, and
    JSON-RPC envelopes.
    - Updated exec-server clients, transports, handlers, and tests to use
    the new crate.
    - Exposed app-server's existing JSON-RPC types through a public `rpc`
    module while retaining root re-exports.
    - Preserved existing wire shapes, including exec `PathUri` behavior.
    
    ## Stack
    
    This is PR 1 of 6. Next: [PR
    #29721](https://github.com/openai/codex/pull/29721), which moves auth
    mode below the app wire boundary.
    
    ## Validation
    
    - Exec-server protocol and server coverage passed in the focused
    protocol test runs.
    - App-server protocol schema fixtures passed.
  • Run fs helper through Windows sandbox wrapper (#28359)
    ## Why
    
    This is the final PR in the Windows fs-helper sandbox stack and contains
    the actual bug fix.
    
    The exec-server filesystem helper is a direct-spawn path: it asks
    `SandboxManager` for a `SandboxExecRequest`, then launches the returned
    argv itself. That works on macOS and Linux because the transformed argv
    is already a self-contained sandbox wrapper. On Windows, the transformed
    request carried `WindowsRestrictedToken` metadata, but the direct-spawn
    fs-helper runner still launched the helper argv directly.
    
    That means Windows filesystem built-ins backed by the fs-helper could
    run with the parent Codex process permissions instead of the configured
    Windows sandbox. This PR makes the direct-spawn transform produce a
    self-contained Windows wrapper argv before fs-helper launches it.
    
    ## What Changed
    
    - Added `SandboxManager::transform_for_direct_spawn()` for callers that
    launch the returned argv themselves.
    - Wrapped Windows restricted-token direct-spawn requests with `codex.exe
    --run-as-windows-sandbox` and then marked the outer request as
    unsandboxed, matching the macOS/Linux wrapper argv shape.
    - Updated `exec-server/src/fs_sandbox.rs` to use the direct-spawn
    transform for fs-helper launches.
    - Materialized the inner `codex.exe --codex-run-as-fs-helper` executable
    into `.sandbox-bin` so the sandboxed user can run it.
    - Carried runtime workspace roots through `FileSystemSandboxContext` as
    `PathUri` values so `:workspace_roots` policies resolve correctly
    without sending native client paths over exec-server JSON.
    - Preserved wrapper setup identity environment needed by Windows sandbox
    setup without changing the serialized inner helper environment.
    
    ## Verification
    
    - `just bazel-lock-update`
    - `just bazel-lock-check`
    - `just test -p codex-sandboxing transform_for_direct_spawn_windows`
    - `just test -p codex-exec-server fs_sandbox::tests`
    - `just fix -p codex-windows-sandbox -p codex-sandboxing -p
    codex-exec-server -p codex-core -p codex-file-system`
    
    Local note: `just fmt` completed Rust formatting, but this workstation
    still fails the non-Rust formatter phases because uv cannot open its
    cache and the local buildifier/dotslash path is missing.
  • Resume exec-server sessions after disconnect (#28512)
    Supersedes #28288 (closed).
    
    ## Why
    
    A short WebSocket interruption currently ends every client-side process
    handle, even though exec-server keeps the server session and its
    processes alive for a short time.
    
    This is especially visible for executor-backed stdio MCP servers: a
    temporary connection loss becomes a permanent `Transport closed` error.
    The server already has the information needed to resume the session, but
    the client opens a fresh session instead of using it.
    
    This change reconnects below the process and MCP layers. Existing
    process handles stay valid, missed output is recovered, and the same
    server-side processes continue running.
    
    ## State machine
    
    One logical `ExecServerClient` stays alive while its underlying RPC
    connection changes generations.
    
    ```text
                             transport closes
           +------------------------------------------------+
           |                                                v
    +-------------+                                  +-------------+
    |  Connected  |                                  | Recovering  |
    +-------------+                                  +-------------+
           ^                                                |
           | session resumed, processes caught up           | retryable error
           +------------------------------------------------+ loops until deadline
                                                            |
                                                            | deadline or permanent error
                                                            v
                                                      +-------------+
                                                      |   Failed    |
                                                      +-------------+
    ```
    
    ### `Connected`
    
    - New RPC calls use the current connection.
    - Process notifications are published in sequence order.
    - A disconnect only starts recovery if it came from the current
    connection generation. Late events from older generations cannot replace
    the active connection.
    
    ### `Recovering`
    
    - New calls wait instead of choosing a half-connected RPC client.
    - Existing process handles, wake subscriptions, and event subscriptions
    stay open.
    - Streaming HTTP response bodies fail immediately because their byte
    streams cannot be resumed safely.
    - Recovery first waits for process starts that were already in flight. A
    start whose result became ambiguous is cleaned up after reconnection
    instead of being silently adopted.
    - The client reconnects with the learned `session_id`. The server may
    briefly report that the old connection is still attached, so that error
    is retried until the detach finishes.
    - The notification consumer starts before the resume handshake
    completes. This prevents a busy process from filling the notification
    queue and blocking the initialize response.
    - Before installing the new connection, the client catches up every
    recoverable process with `process/read`.
    
    ### `Failed`
    
    - Recovery stops after 25 seconds or after a permanent error.
    - Waiting calls are released with one stable disconnect error.
    - Existing process sessions receive a terminal failure instead of
    waiting forever.
    
    ## Recovering process events
    
    Output, exit, and close events share one sequence. During normal
    operation, the client buffers early events until every lower sequence
    has been published.
    
    After reconnection, the client reads each process starting after its
    last published sequence:
    
    1. Retained output chunks are inserted by sequence number.
    2. Exit and close state are reconstructed in their sequence positions.
    3. Events already received as live notifications are ignored as
    duplicates.
    4. Newly contiguous events are published in order.
    5. If the server no longer retains enough output to fill a sequence gap,
    only that process is terminated and failed. The recovered connection
    remains usable for other processes.
    
    The server reports its full next event sequence for unbounded reads,
    including exit and close events. Closed processes remain readable for
    the same 30-second window used to retain detached sessions.
    
    ## Other details
    
    - Detached server sessions are retained for 30 seconds, leaving margin
    around the client's 25-second recovery deadline.
    - Session attach and detach update the active notification sender under
    the same attachment lock, so an old connection cannot clear a newly
    attached sender.
    - A dedicated error code distinguishes the temporary "session is still
    attached" race from permanent initialization errors.
    - Process starts are identity-checked on both client and server. Cleanup
    from an older start cannot remove a newer process that reused the same
    ID.
    - Mutating requests that were already in flight when the transport
    closed are not replayed, because the client cannot know whether the
    server applied them. Requests started after recovery is known wait for
    the replacement connection.
    - We assume the server/client version stays in sync (on the before/after
    this PR)
    
    ## User impact
    
    Long-running commands and stdio MCP servers can survive a temporary
    exec-server WebSocket interruption without changing process IDs or
    losing output produced during the outage.
  • [exec-server] serve websocket listener via HTTP upgrade (#21963)
    ## Why
    
    `codex exec-server` should keep the existing public `ws://IP:PORT` URL
    shape while serving that websocket connection through an HTTP upgrade
    path internally. That keeps the client-facing configuration simple and
    allows the listener to work through intermediate HTTP-aware
    infrastructure.
    
    ## What changed
    
    - keep the emitted and configured exec-server URL as `ws://IP:PORT`
    - serve that websocket endpoint through Axum HTTP upgrade handling on
    `/`
    - expose `GET /readyz` from the same listener for readiness checks
    - route upgraded Axum websocket streams through the shared JSON-RPC
    connection machinery
    - initialize the rustls crypto provider before websocket client
    connections
    - preserve inbound binary websocket JSON-RPC parsing for compatibility
    with the prior transport behavior
    
    ## Verification
    
    - `cargo test -p codex-exec-server --test health --test process --test
    websocket --test initialize --test exec_process`
  • fix(exec-server): retain output until streams close (#18946)
    ## Why
    
    A Mac Bazel run hit a flake in
    `server::handler::tests::output_and_exit_are_retained_after_notification_receiver_closes`
    where the read path observed process exit but lost the expected buffered
    stdout (`first\nsecond\n`). See the [GitHub Actions
    job](https://github.com/openai/codex/actions/runs/24758468552/job/72436716505)
    and [BuildBuddy
    invocation](https://app.buildbuddy.io/invocation/37475a12-4ef2-45fb-ab8a-e49a2aba1d59).
    
    The underlying race is that process exit is not the same thing as
    stdout/stderr closure. If a child or grandchild inherits the pipe write
    end, or a process duplicates it with `dup2`, the watched process can
    exit while the stream is still open and more output can still arrive.
    The exec-server was starting exited-process retention cleanup from the
    exit event, so the process entry could be removed before the output
    streams had actually closed.
    
    While stress-testing the exec-server unit suite,
    `server::handler::tests::long_poll_read_fails_after_session_resume`
    exposed a separate test race: it started a short-lived command that
    could exit and wake the pending long-poll read before the session-resume
    assertion observed the resumed-session error. That test is intended to
    cover resume eviction, not process-exit delivery, so this change keeps
    the process alive and quiet while the second connection resumes the
    session.
    
    ## What changed
    
    - Keep exec-server process entries retained until stdout/stderr streams
    close, then start the post-exit retention timer from the closed event.
    - Wake long-poll readers when the closed event is emitted.
    - Add focused `local_process` unit coverage that proves late output is
    still retained after the short test retention interval has elapsed, and
    that closed process entries are eventually evicted.
    - Add a local and remote regression test where a parent exits while a
    child keeps inherited stdout open. The child waits on an explicit
    release file, so the test deterministically observes exit first,
    releases the child, then requires a nonzero-wait read from the exit
    sequence to receive the late output.
    - In `codex-rs/exec-server/src/server/handler/tests.rs`, make
    `long_poll_read_fails_after_session_resume` run a long-lived silent
    command instead of a short command that prints and exits. This isolates
    the test to session-resume behavior and prevents a normal process exit
    from satisfying the pending long-poll read first.
    
    ## Testing
    
    - `cargo test -p codex-exec-server
    exec_process_retains_output_after_exit_until_streams_close`
    - `cargo test -p codex-exec-server local_process::tests`
    - `cargo test -p codex-exec-server`
    - `just fix -p codex-exec-server`
    - `bazel test //codex-rs/exec-server:exec-server-unit-tests
    //codex-rs/exec-server:exec-server-exec_process-test
    //codex-rs/exec-server:exec-server-file_system-test
    //codex-rs/exec-server:exec-server-http_client-test
    //codex-rs/exec-server:exec-server-initialize-test
    //codex-rs/exec-server:exec-server-process-test
    //codex-rs/exec-server:exec-server-websocket-test`
    - `bazel test --runs_per_test=25
    //codex-rs/exec-server:exec-server-unit-tests`
    
    ## Documentation
    
    No docs update needed; this is an internal exec-server correctness fix.
  • exec-server: preserve fs helper runtime env (#18380)
    ## Summary
    - preserve a small fs-helper runtime env allowlist (`PATH`, temp vars)
    instead of launching the sandboxed helper with an empty env
    - add unit coverage for the allowlist and transformed sandbox request
    env
    - add a Linux smoke test that starts the test exec-server with a fake
    `bwrap` on `PATH`, runs a sandboxed fs write through the remote fs
    helper path, and asserts that bwrap path was exercised
    
    ## Validation
    - `cd /tmp/codex-worktrees/fs-helper-env-defaults/codex-rs && export
    PATH=$HOME/code/openai/project/dotslash-gen/bin:$HOME/.local/bin:$PATH
    && bazel test --bes_backend= --bes_results_url=
    //codex-rs/exec-server:exec-server-file_system-test
    --test_filter=sandboxed_file_system_helper_finds_bwrap_on_preserved_path`
    - `cd /tmp/codex-worktrees/fs-helper-env-defaults/codex-rs && export
    PATH=$HOME/code/openai/project/dotslash-gen/bin:$HOME/.local/bin:$PATH
    && bazel test --bes_backend= --bes_results_url=
    //codex-rs/exec-server:exec-server-unit-tests
    --test_filter="helper_env|sandbox_exec_request_carries_helper_env"`
    - earlier on this branch before the smoke-test harness adjustment: `cd
    /tmp/codex-worktrees/fs-helper-env-defaults/codex-rs && export
    PATH=$HOME/code/openai/project/dotslash-gen/bin:$HOME/.local/bin:$PATH
    && bazel test --bes_backend= --bes_results_url=
    //codex-rs/exec-server:all`
    
    Co-authored-by: Codex <noreply@openai.com>
  • Stabilize exec-server filesystem tests in CI (#17671)
    ## Summary\n- add an exec-server package-local test helper binary that
    can run exec-server and fs-helper flows\n- route exec-server filesystem
    tests through that helper instead of cross-crate codex helper
    binaries\n- stop relying on Bazel-only extra binary wiring for these
    tests\n\n## Testing\n- not run (per repo guidance for codex changes)
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Stabilize exec-server process tests (#17605)
    Problem: After #17294 switched exec-server tests to launch the top-level
    `codex exec-server` command, parallel remote exec-process cases can
    flake while waiting for the child server's listen URL or transport
    shutdown.
    
    Solution: Serialize remote exec-server-backed process tests and harden
    the harness so spawned servers are killed on drop and shutdown waits for
    the child process to exit.
  • Run exec-server fs operations through sandbox helper (#17294)
    ## Summary
    - run exec-server filesystem RPCs requiring sandboxing through a
    `codex-fs` arg0 helper over stdin/stdout
    - keep direct local filesystem execution for `DangerFullAccess` and
    external sandbox policies
    - remove the standalone exec-server binary path in favor of top-level
    arg0 dispatch/runtime paths
    - add sandbox escape regression coverage for local and remote filesystem
    paths
    
    ## Validation
    - `just fmt`
    - `git diff --check`
    - remote devbox: `cd codex-rs && bazel test --bes_backend=
    --bes_results_url= //codex-rs/exec-server:all` (6/6 passed)
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • feat: move exec-server ownership (#16344)
    This introduces session-scoped ownership for exec-server so ws
    disconnects no longer immediately kill running remote exec processes,
    and it prepares the protocol for reconnect-based resume.
    - add session_id / resume_session_id to the exec-server initialize
    handshake
      - move process ownership under a shared session registry
    - detach sessions on websocket disconnect and expire them after a TTL
    instead of killing processes immediately (we will resume based on this)
    - allow a new connection to resume an existing session and take over
    notifications/ownership
    - I use UUID to make them not predictable as we don't have auth for now
    - make detached-session expiry authoritative at resume time so teardown
    wins at the TTL boundary
    - reject long-poll process/read calls that get resumed out from under an
    older attachment
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Refactor ExecServer filesystem split between local and remote (#15232)
    For each feature we have:
    1. Trait exposed on environment
    2. **Local Implementation** of the trait
    3. Remote implementation that uses the client to proxy via network
    4. Handler implementation that handles PRC requests and calls into
    **Local Implementation**
  • Remove stdio transport from exec server (#15119)
    Summary
    - delete the deprecated stdio transport plumbing from the exec server
    stack
    - add a basic `exec_server()` harness plus test utilities to start a
    server, send requests, and await events
    - refresh exec-server dependencies, configs, and documentation to
    reflect the new flow
    
    Testing
    - Not run (not requested)
    
    ---------
    
    Co-authored-by: starr-openai <starr@openai.com>
    Co-authored-by: Codex <noreply@openai.com>
  • refactor: delete exec-server and move execve wrapper into shell-escalation (#12632)
    ## Why
    
    We already plan to remove the shell-tool MCP path, and doing that
    cleanup first makes the follow-on `shell-escalation` work much simpler.
    
    This change removes the last remaining reason to keep
    `codex-rs/exec-server` around by moving the `codex-execve-wrapper`
    binary and shared shell test fixtures to the crates/tests that now own
    that functionality.
    
    ## What Changed
    
    ### Delete `codex-rs/exec-server`
    
    - Remove the `exec-server` crate, including the MCP server binary,
    MCP-specific modules, and its test support/test suite
    - Remove `exec-server` from the `codex-rs` workspace and update
    `Cargo.lock`
    
    ### Move `codex-execve-wrapper` into `codex-rs/shell-escalation`
    
    - Move the wrapper implementation into `shell-escalation`
    (`src/unix/execve_wrapper.rs`)
    - Add the `codex-execve-wrapper` binary entrypoint under
    `shell-escalation/src/bin/`
    - Update `shell-escalation` exports/module layout so the wrapper
    entrypoint is hosted there
    - Move the wrapper README content from `exec-server` to
    `shell-escalation/README.md`
    
    ### Move shared shell test fixtures to `app-server`
    
    - Move the DotSlash `bash`/`zsh` test fixtures from
    `exec-server/tests/suite/` to `app-server/tests/suite/`
    - Update `app-server` zsh-fork tests to reference the new fixture paths
    
    ### Keep `shell-tool-mcp` as a shell-assets package
    
    - Update `.github/workflows/shell-tool-mcp.yml` packaging so the npm
    artifact contains only patched Bash/Zsh payloads (no Rust binaries)
    - Update `shell-tool-mcp/package.json`, `shell-tool-mcp/src/index.ts`,
    and docs to reflect the shell-assets-only package shape
    - `shell-tool-mcp-ci.yml` does not need changes because it is already
    JS-only
    
    ## Verification
    
    - `cargo shear`
    - `cargo clippy -p codex-shell-escalation --tests`
    - `just clippy`
  • chore: remove codex-core public protocol/shell re-exports (#12432)
    ## Why
    
    `codex-rs/core/src/lib.rs` re-exported a broad set of types and modules
    from `codex-protocol` and `codex-shell-command`. That made it easy for
    workspace crates to import those APIs through `codex-core`, which in
    turn hides dependency edges and makes it harder to reduce compile-time
    coupling over time.
    
    This change removes those public re-exports so call sites must import
    from the source crates directly. Even when a crate still depends on
    `codex-core` today, this makes dependency boundaries explicit and
    unblocks future work to drop `codex-core` dependencies where possible.
    
    ## What Changed
    
    - Removed public re-exports from `codex-rs/core/src/lib.rs` for:
    - `codex_protocol::protocol` and related protocol/model types (including
    `InitialHistory`)
      - `codex_protocol::config_types` (`protocol_config_types`)
    - `codex_shell_command::{bash, is_dangerous_command, is_safe_command,
    parse_command, powershell}`
    - Migrated workspace Rust call sites to import directly from:
      - `codex_protocol::protocol`
      - `codex_protocol::config_types`
      - `codex_protocol::models`
      - `codex_shell_command`
    - Added explicit `Cargo.toml` dependencies (`codex-protocol` /
    `codex-shell-command`) in crates that now import those crates directly.
    - Kept `codex-core` internal modules compiling by using `pub(crate)`
    aliases in `core/src/lib.rs` (internal-only, not part of the public
    API).
    - Updated the two utility crates that can already drop a `codex-core`
    dependency edge entirely:
      - `codex-utils-approval-presets`
      - `codex-utils-cli`
    
    ## Verification
    
    - `cargo test -p codex-utils-approval-presets`
    - `cargo test -p codex-utils-cli`
    - `cargo check --workspace --all-targets`
    - `just clippy`
  • feat(shell-tool-mcp): add patched zsh build pipeline (#11668)
    ## Summary
    - add `shell-tool-mcp/patches/zsh-exec-wrapper.patch` against upstream
    zsh `77045ef899e53b9598bebc5a41db93a548a40ca6`
    - add `zsh-linux` and `zsh-darwin` jobs to
    `.github/workflows/shell-tool-mcp.yml`
    - stage zsh binaries under `artifacts/vendor/<target>/zsh/<variant>/zsh`
    - include zsh artifact jobs in `package.needs`
    - mark staged zsh binaries executable during packaging
    
    ## Notes
    - zsh source is cloned from `https://git.code.sf.net/p/zsh/code`
    - workflow pins zsh commit `77045ef899e53b9598bebc5a41db93a548a40ca6`
    - zsh build runs `./Util/preconfig` before `./configure`
    
    ## Validation
    - parsed workflow YAML locally (`yaml-ok`)
    - validated zsh patch applies cleanly with `git apply --check` on a
    fresh zsh clone
  • feat: make sandbox read access configurable with ReadOnlyAccess (#11387)
    `SandboxPolicy::ReadOnly` previously implied broad read access and could
    not express a narrower read surface.
    This change introduces an explicit read-access model so we can support
    user-configurable read restrictions in follow-up work, while preserving
    current behavior today.
    
    It also ensures unsupported backends fail closed for restricted-read
    policies instead of silently granting broader access than intended.
    
    ## What
    
    - Added `ReadOnlyAccess` in protocol with:
      - `Restricted { include_platform_defaults, readable_roots }`
      - `FullAccess`
    - Updated `SandboxPolicy` to carry read-access configuration:
      - `ReadOnly { access: ReadOnlyAccess }`
      - `WorkspaceWrite { ..., read_only_access: ReadOnlyAccess }`
    - Preserved existing behavior by defaulting current construction paths
    to `ReadOnlyAccess::FullAccess`.
    - Threaded the new fields through sandbox policy consumers and call
    sites across `core`, `tui`, `linux-sandbox`, `windows-sandbox`, and
    related tests.
    - Updated Seatbelt policy generation to honor restricted read roots by
    emitting scoped read rules when full read access is not granted.
    - Added fail-closed behavior on Linux and Windows backends when
    restricted read access is requested but not yet implemented there
    (`UnsupportedOperation`).
    - Regenerated app-server protocol schema and TypeScript artifacts,
    including `ReadOnlyAccess`.
    
    ## Compatibility / rollout
    
    - Runtime behavior remains unchanged by default (`FullAccess`).
    - API/schema changes are in place so future config wiring can enable
    restricted read access without another policy-shape migration.
  • Upgrade rmcp to 0.14 (#10718)
    - [x] Upgrade rmcp to 0.14
  • feat(linux-sandbox): add bwrap support (#9938)
    ## Summary
    This PR introduces a gated Bubblewrap (bwrap) Linux sandbox path. The
    curent Linux sandbox path relies on in-process restrictions (including
    Landlock). Bubblewrap gives us a more uniform filesystem isolation
    model, especially explicit writable roots with the option to make some
    directories read-only and granular network controls.
    
    This is behind a feature flag so we can validate behavior safely before
    making it the default.
    
    - Added temporary rollout flag:
      - `features.use_linux_sandbox_bwrap`
    - Preserved existing default path when the flag is off.
    - In Bubblewrap mode:
    - Added internal retry without /proc when /proc mount is not permitted
    by the host/container.
  • feat: add support for building with Bazel (#8875)
    This PR configures Codex CLI so it can be built with
    [Bazel](https://bazel.build) in addition to Cargo. The `.bazelrc`
    includes configuration so that remote builds can be done using
    [BuildBuddy](https://www.buildbuddy.io).
    
    If you are familiar with Bazel, things should work as you expect, e.g.,
    run `bazel test //... --keep-going` to run all the tests in the repo,
    but we have also added some new aliases in the `justfile` for
    convenience:
    
    - `just bazel-test` to run tests locally
    - `just bazel-remote-test` to run tests remotely (currently, the remote
    build is for x86_64 Linux regardless of your host platform). Note we are
    currently seeing the following test failures in the remote build, so we
    still need to figure out what is happening here:
    
    ```
    failures:
        suite::compact::manual_compact_twice_preserves_latest_user_messages
        suite::compact_resume_fork::compact_resume_after_second_compaction_preserves_history
        suite::compact_resume_fork::compact_resume_and_fork_preserve_model_history_view
    ```
    
    - `just build-for-release` to build release binaries for all
    platforms/architectures remotely
    
    To setup remote execution:
    - [Create a buildbuddy account](https://app.buildbuddy.io/) (OpenAI
    employees should also request org access at
    https://openai.buildbuddy.io/join/ with their `@openai.com` email
    address.)
    - [Copy your API key](https://app.buildbuddy.io/docs/setup/) to
    `~/.bazelrc` (add the line `build
    --remote_header=x-buildbuddy-api-key=YOUR_KEY`)
    - Use `--config=remote` in your `bazel` invocations (or add `common
    --config=remote` to your `~/.bazelrc`, or use the `just` commands)
    
    ## CI
    
    In terms of CI, this PR introduces `.github/workflows/bazel.yml`, which
    uses Bazel to run the tests _locally_ on Mac and Linux GitHub runners
    (we are working on supporting Windows, but that is not ready yet). Note
    that the failures we are seeing in `just bazel-remote-test` do not occur
    on these GitHub CI jobs, so everything in `.github/workflows/bazel.yml`
    is green right now.
    
    The `bazel.yml` uses extra config in `.github/workflows/ci.bazelrc` so
    that macOS CI jobs build _remotely_ on Linux hosts (using the
    `docker://docker.io/mbolin491/codex-bazel` Docker image declared in the
    root `BUILD.bazel`) using cross-compilation to build the macOS
    artifacts. Then these artifacts are downloaded locally to GitHub's macOS
    runner so the tests can be executed natively. This is the relevant
    config that enables this:
    
    ```
    common:macos --config=remote
    common:macos --strategy=remote
    common:macos --strategy=TestRunner=darwin-sandbox,local
    ```
    
    Because of the remote caching benefits we get from BuildBuddy, these new
    CI jobs can be extremely fast! For example, consider these two jobs that
    ran all the tests on Linux x86_64:
    
    - Bazel 1m37s
    https://github.com/openai/codex/actions/runs/20861063212/job/59940545209?pr=8875
    - Cargo 9m20s
    https://github.com/openai/codex/actions/runs/20861063192/job/59940559592?pr=8875
    
    For now, we will continue to run both the Bazel and Cargo jobs for PRs,
    but once we add support for Windows and running Clippy, we should be
    able to cutover to using Bazel exclusively for PRs, which should still
    speed things up considerably. We will probably continue to run the Cargo
    jobs post-merge for commits that land on `main` as a sanity check.
    
    Release builds will also continue to be done by Cargo for now.
    
    Earlier attempt at this PR: https://github.com/openai/codex/pull/8832
    Earlier attempt to add support for Buck2, now abandoned:
    https://github.com/openai/codex/pull/8504
    
    ---------
    
    Co-authored-by: David Zbarsky <dzbarsky@gmail.com>
    Co-authored-by: Michael Bolin <mbolin@openai.com>
  • fix: make the find_resource! macro responsible for the absolutize() call (#8884)
    https://github.com/openai/codex/pull/8879 introduced the
    `find_resource!` macro, but now that I am about to use it in more
    places, I realize that it should take care of this normalization case
    for callers.
    
    Note the `use $crate::path_absolutize::Absolutize;` line is there so
    that users of `find_resource!` do not have to explicitly include
    `path-absolutize` to their own `Cargo.toml`.
  • feat: introduce find_resource! macro that works with Cargo or Bazel (#8879)
    To support Bazelification in https://github.com/openai/codex/pull/8875,
    this PR introduces a new `find_resource!` macro that we use in place of
    our existing logic in tests that looks for resources relative to the
    compile-time `CARGO_MANIFEST_DIR` env var.
    
    To make this work, we plan to add the following to all `rust_library()`
    and `rust_test()` Bazel rules in the project:
    
    ```
    rustc_env = {
        "BAZEL_PACKAGE": native.package_name(),
    },
    ```
    
    Our new `find_resource!` macro reads this value via
    `option_env!("BAZEL_PACKAGE")` so that the Bazel package _of the code
    using `find_resource!`_ is injected into the code expanded from the
    macro. (If `find_resource()` were a function, then
    `option_env!("BAZEL_PACKAGE")` would always be
    `codex-rs/utils/cargo-bin`, which is not what we want.)
    
    Note we only consider the `BAZEL_PACKAGE` value when the `RUNFILES_DIR`
    environment variable is set at runtime, indicating that the test is
    being run by Bazel. In this case, we have to concatenate the runtime
    `RUNFILES_DIR` with the compile-time `BAZEL_PACKAGE` value to build the
    path to the resource.
    
    In testing this change, I discovered one funky edge case in
    `codex-rs/exec-server/tests/common/lib.rs` where we have to _normalize_
    (but not canonicalize!) the result from `find_resource!` because the
    path contains a `common/..` component that does not exist on disk when
    the test is run under Bazel, so it must be semantically normalized using
    the [`path-absolutize`](https://crates.io/crates/path-absolutize) crate
    before it is passed to `dotslash fetch`.
    
    Because this new behavior may be non-obvious, this PR also updates
    `AGENTS.md` to make humans/Codex aware that this API is preferred.
  • fix: update resource path resolution logic so it works with Bazel (#8861)
    The Bazelification work in-flight over at
    https://github.com/openai/codex/pull/8832 needs this fix so that Bazel
    can find the path to the DotSlash file for `bash`.
    
    With this change, the following almost works:
    
    ```
    bazel test --test_output=errors //codex-rs/exec-server:exec-server-all-test
    ```
    
    That is, now the `list_tools` test passes, but
    `accept_elicitation_for_prompt_rule` still fails because it runs
    Seatbelt itself, so it needs to be run outside Bazel's local sandboxing.
  • feat: introduce codex-utils-cargo-bin as an alternative to assert_cmd::Command (#8496)
    This PR introduces a `codex-utils-cargo-bin` utility crate that
    wraps/replaces our use of `assert_cmd::Command` and
    `escargot::CargoBuild`.
    
    As you can infer from the introduction of `buck_project_root()` in this
    PR, I am attempting to make it possible to build Codex under
    [Buck2](https://buck2.build) as well as `cargo`. With Buck2, I hope to
    achieve faster incremental local builds (largely due to Buck2's
    [dice](https://buck2.build/docs/insights_and_knowledge/modern_dice/)
    build strategy, as well as benefits from its local build daemon) as well
    as faster CI builds if we invest in remote execution and caching.
    
    See
    https://buck2.build/docs/getting_started/what_is_buck2/#why-use-buck2-key-advantages
    for more details about the performance advantages of Buck2.
    
    Buck2 enforces stronger requirements in terms of build and test
    isolation. It discourages assumptions about absolute paths (which is key
    to enabling remote execution). Because the `CARGO_BIN_EXE_*` environment
    variables that Cargo provides are absolute paths (which
    `assert_cmd::Command` reads), this is a problem for Buck2, which is why
    we need this `codex-utils-cargo-bin` utility.
    
    My WIP-Buck2 setup sets the `CARGO_BIN_EXE_*` environment variables
    passed to a `rust_test()` build rule as relative paths.
    `codex-utils-cargo-bin` will resolve these values to absolute paths,
    when necessary.
    
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/8496).
    * #8498
    * __->__ #8496
  • fix: change codex/sandbox-state/update from a notification to a request (#8142)
    Historically, `accept_elicitation_for_prompt_rule()` was flaky because
    we were using a notification to update the sandbox followed by a `shell`
    tool request that we expected to be subject to the new sandbox config,
    but because [rmcp](https://crates.io/crates/rmcp) MCP servers delegate
    each incoming message to a new Tokio task, messages are not guaranteed
    to be processed in order, so sometimes the `shell` tool call would run
    before the notification was processed.
    
    Prior to this PR, we relied on a generous `sleep()` between the
    notification and the request to reduce the change of the test flaking
    out.
    
    This PR implements a proper fix, which is to use a _request_ instead of
    a notification for the sandbox update so that we can wait for the
    response to the sandbox request before sending the request to the
    `shell` tool call. Previously, `rmcp` did not support custom requests,
    but I fixed that in
    https://github.com/modelcontextprotocol/rust-sdk/pull/590, which made it
    into the `0.12.0` release (see #8288).
    
    This PR updates `shell-tool-mcp` to expect
    `"codex/sandbox-state/update"` as a _request_ instead of a notification
    and sends the appropriate ack. Note this behavior is tied to our custom
    `codex/sandbox-state` capability, which Codex honors as an MCP client,
    which is why `core/src/mcp_connection_manager.rs` had to be updated as
    part of this PR, as well.
    
    This PR also updates the docs at `shell-tool-mcp/README.md`.
  • chore: upgrade rmcp crate from 0.10.0 to 0.12.0 (#8288)
    Version `0.12.0` includes
    https://github.com/modelcontextprotocol/rust-sdk/pull/590, which I will
    use in https://github.com/openai/codex/pull/8142.
    
    Changes:
    
    - `rmcp::model::CustomClientNotification` was renamed to
    `rmcp::model::CustomNotification`
    - a bunch of types have a `meta` field now, but it is `Option`, so I
    added `meta: None` to a bunch of things
  • fix: policy/*.codexpolicy -> rules/*.rules (#7888)
    We decided that `*.rules` is a more fitting (and concise) file extension
    than `*.codexpolicy`, so we are changing the file extension for the
    "execpolicy" effort. We are also changing the subfolder of `$CODEX_HOME`
    from `policy` to `rules` to match.
    
    This PR updates the in-repo docs and we will update the public docs once
    the next CLI release goes out.
    
    Locally, I created `~/.codex/rules/default.rules` with the following
    contents:
    
    ```
    prefix_rule(pattern=["gh", "pr", "view"])
    ```
    
    And then I asked Codex to run:
    
    ```
    gh pr view 7888 --json title,body,comments
    ```
    
    and it was able to!
  • fix: ensure accept_elicitation_for_prompt_rule() test passes locally (#7832)
    When I originally introduced `accept_elicitation_for_prompt_rule()` in
    https://github.com/openai/codex/pull/7617, it worked for me locally
    because I had run `codex-rs/exec-server/tests/suite/bash` once myself,
    which had the side-effect of installing the corresponding DotSlash
    artifact.
    
    In CI, I added explicit logic to do this as part of
    `.github/workflows/rust-ci.yml`, which meant the test also passed in CI,
    but this logic should have been done as part of the test so that it
    would work locally for devs who had not installed the DotSlash artifact
    for `codex-rs/exec-server/tests/suite/bash` before. This PR updates the
    test to do this (and deletes the setup logic from `rust-ci.yml`),
    creating a new `DOTSLASH_CACHE` in a temp directory so that this is
    handled independently for each test.
    
    While here, also added a check to ensure that the `codex` binary has
    been built prior to running the test, as we have to ensure it is
    symlinked as `codex-linux-sandbox` on Linux in order for the integration
    test to work on that platform.
  • fix: add integration tests for codex-exec-mcp-server with execpolicy (#7617)
    This PR introduces integration tests that run
    [codex-shell-tool-mcp](https://www.npmjs.com/package/@openai/codex-shell-tool-mcp)
    as a user would. Note that this requires running our fork of Bash, so we
    introduce a [DotSlash](https://dotslash-cli.com/) file for `bash` so
    that we can run the integration tests on multiple platforms without
    having to check the binaries into the repository. (As noted in the
    DotSlash file, it is slightly more heavyweight than necessary, which may
    be worth addressing as disk space in CI is limited:
    https://github.com/openai/codex/pull/7678.)
    
    To start, this PR adds two tests:
    
    - `list_tools()` makes the `list_tools` request to the MCP server and
    verifies we get the expected response
    - `accept_elicitation_for_prompt_rule()` defines a `prefix_rule()` with
    `decision="prompt"` and verifies the elicitation flow works as expected
    
    Though the `accept_elicitation_for_prompt_rule()` test **only works on
    Linux**, as this PR reveals that there are currently issues when running
    the Bash fork in a read-only sandbox on Linux. This will have to be
    fixed in a follow-up PR.
    
    Incidentally, getting this test run to correctly on macOS also requires
    a recent fix we made to `brew` that hasn't hit a mainline release yet,
    so getting CI green in this PR required
    https://github.com/openai/codex/pull/7680.