6 Commits

  • [codex] Keep Bazel startup options stable across commands (#26256)
    ## Why
    
    `just bazel-clippy` ran target discovery with
    `--noexperimental_remote_repo_contents_cache`, then ran the build with
    the workspace default `--experimental_remote_repo_contents_cache`. Bazel
    therefore killed and restarted its server on each transition, slowing
    repeated commands and discarding the in-memory analysis cache. An audit
    found the same class of startup-option variation in several CI command
    sequences.
    
    ## What changed
    
    - Keep local lint target-discovery queries on the workspace-default
    Bazel server, while making CI target discovery explicitly use the CI
    startup options.
    - Normalize GitHub Actions launches through the BuildBuddy wrapper to
    share `BAZEL_OUTPUT_USER_ROOT` and
    `--noexperimental_remote_repo_contents_cache`.
    - Route the CI lockfile check and Windows test-shard query through the
    same startup configuration.
    - Document the startup-option invariant and add wrapper regression
    coverage.
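    
    A minimal sketch of the invariant, assuming a hypothetical helper
    (`bazel_argv` and `startup_options` are illustrative names, not taken
    from the repository): startup options precede the Bazel command, so
    building every command in a sequence from one shared list keeps them
    identical and lets the running server be reused.
    
    ```python
    # Hypothetical helpers; the real scripts may assemble commands differently.
    STARTUP_OPTIONS = ("--noexperimental_remote_repo_contents_cache",)
    
    
    def bazel_argv(command: str, *args: str,
                   startup: tuple[str, ...] = STARTUP_OPTIONS) -> list[str]:
        """Build a Bazel argv with startup options placed before the command."""
        return ["bazel", *startup, command, *args]
    
    
    def startup_options(argv: list[str]) -> list[str]:
        """Return the options between 'bazel' and the command.
    
        Assumes every startup option is written as a single `--flag` or
        `--flag=value` token.
        """
        options = []
        for token in argv[1:]:
            if not token.startswith("-"):
                break
            options.append(token)
        return options
    
    
    query = bazel_argv("query", "//...")
    build = bazel_argv("build", "//...")
    # Identical startup options mean the second command can reuse the server.
    assert startup_options(query) == startup_options(build)
    ```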
    
    ## Validation
    
    - Confirmed consecutive local clippy target-discovery runs retained the
    same Bazel server PID.
  • [codex] Fix Windows BuildBuddy Bazel wrapper execution (#25915)
    ## Why
    
    #25156 moved Bazel CI launches into a shared Python wrapper. On Windows,
    launching Bazel with `os.execvp` can split the spaced
    `--test_env=PATH=...` argument and fail to propagate the eventual Bazel
    exit status, allowing jobs to pass without running tests. This reapplies
    the wrapper after #25909 with a Windows-safe launch path.
    
    ## What changed
    
    Use a waited `subprocess.run` launch on Windows while preserving
    `os.execvp` on Unix. Add a process-level regression test for spaced
    arguments and child exit status, and run it on Windows Bazel shard 1.
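    
    The launch split can be sketched roughly as follows. This is an
    illustration under assumptions, not the wrapper's actual code:
    `run_waited` and `launch` are hypothetical names. Passing `argv` as a
    list keeps a spaced argument as one token, and the waited call returns
    the child's exit status so the caller can propagate it.
    
    ```python
    import os
    import subprocess
    import sys
    
    
    def run_waited(argv: list[str]) -> int:
        """Spawn argv as a child, wait for it, and return its exit code."""
        return subprocess.run(argv, check=False).returncode
    
    
    def launch(argv: list[str]) -> int:
        """Hypothetical entry point: waited launch on Windows, exec on Unix."""
        if os.name == "nt":
            return run_waited(argv)
        # On Unix this replaces the current process and does not return.
        os.execvp(argv[0], argv)
    
    
    if __name__ == "__main__" and len(sys.argv) > 1:
        # Propagate the launched command's exit status to our own caller.
        sys.exit(launch(sys.argv[1:]))
    ```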
    
    ## Experiment
    
    To confirm Bazel was actually invoking tests, patch `87b61d0be6`
    temporarily added an intentionally failing `codex-core` unit test. Bazel
    failed on that sentinel on all three major platforms:
    
    - [Linux Bazel
    test](https://github.com/openai/codex/actions/runs/26841132773/job/79151062486)
    - [macOS Bazel
    test](https://github.com/openai/codex/actions/runs/26841132773/job/79151062362)
    - [Windows Bazel test shard
    1/4](https://github.com/openai/codex/actions/runs/26841132773/job/79151062155)
    
    The sentinel was removed after collecting this evidence. Windows Bazel
    [clippy](https://github.com/openai/codex/actions/runs/26841132773/job/79151062914)
    and [release
    verification](https://github.com/openai/codex/actions/runs/26841132773/job/79151062739)
    also passed.
    
    ## Validation
    
    After removing the sentinel, `just test -p codex-core` no longer
    reported it. The local run retained two unrelated environment-specific
    failures.
  • [codex] Revert shared BuildBuddy Bazel wrapper (#25909)
    ## Why
    
    PR #25905 intentionally adds a failing `codex-core` unit test, but its
    [Bazel test on Windows
    check](https://github.com/openai/codex/actions/runs/26837526950/job/79135369259)
    passed. That shows the Bazel configuration introduced by #25156 is not
    behaving as expected, so revert it while the configuration can be
    investigated separately.
    
    ## What changed
    
    Revert #25156 in full, restoring the previous Bazel remote
    configuration, CI scripts, workflows, `rusty_v8` handling, and
    documentation. This removes the shared BuildBuddy wrapper and its tests.
    
    ## Validation
    
    Not run locally; this exact revert was prioritized for a fast rollback.
  • Route Bazel CI through shared BuildBuddy remote config wrapper (#25156)
    ## Why
    
    Bazel remote configuration was selected in several CI scripts and
    workflow steps. That made the BuildBuddy tenant policy easy to duplicate
    and harder to audit, especially for fork pull requests that must not use
    the OpenAI tenant.
    
    This builds on
    [sluongng/buildbuddy-ci-host-routing](https://github.com/openai/codex/compare/main...sluongng:codex:sluongng/buildbuddy-ci-host-routing)
    and consolidates the policy in one place.
    
    ## What to do if this breaks you
    
    See `codex-rs/docs/bazel.md` for details. TLDR:
    
    1. make a BuildBuddy API key and put it in `~/.bazelrc`
    2. if you're an OpenAI employee, add `common
    --config=buildbuddy-openai-rbe` to `user.bazelrc` in the repo root
    
    Run `just bazel-test` to ensure it works.
    
    Note that `just bazel-remote-test` no longer exists; to use RBE, select
    a remote configuration as documented.
    
    ## What changed
    
    - Add `.github/scripts/run_bazel_with_buildbuddy.py` as the shared Bazel
    wrapper and Python library. It selects the OpenAI host only for trusted
    upstream GitHub Actions runs, routes keyed fork runs to the generic
    host, and falls back to local Bazel execution when no key is available.
    - Move endpoint selection into explicit `.bazelrc` configurations and
    update Bazel CI, query helpers, and `rusty_v8` staging to use the shared
    policy. Loading-phase target-discovery queries remain local.
    - Add wrapper and `rusty_v8` unit coverage, plus `just test-scripts` for
    the `.github/scripts` Python tests.
    - Document local Bazel usage, `user.bazelrc` setup, BuildBuddy
    configurations, and CI behavior in `codex-rs/docs/bazel.md`.
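    
    The routing policy described above can be sketched as a small decision
    function. The `Route` enum and `select_route` are hypothetical names
    used only for illustration; the real wrapper's inputs and outputs may
    differ.
    
    ```python
    from enum import Enum
    
    
    class Route(Enum):
        OPENAI = "openai"    # OpenAI BuildBuddy host
        GENERIC = "generic"  # generic BuildBuddy host
        LOCAL = "local"      # local Bazel execution, no remote endpoint
    
    
    def select_route(trusted_upstream: bool, has_api_key: bool) -> Route:
        """Pick a route from the two facts the policy depends on."""
        if not has_api_key:
            # No key available: fall back to local execution.
            return Route.LOCAL
        if trusted_upstream:
            # Only trusted upstream GitHub Actions runs use the OpenAI host.
            return Route.OPENAI
        # Keyed fork runs go to the generic host.
        return Route.GENERIC
    ```
    
    Keeping the decision in one pure function is what makes the fork rule
    easy to audit and to cover with unit tests.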
    
    ## Validation
    
    - `just test-scripts`
    - `bash -n .github/scripts/run-bazel-ci.sh
    .github/scripts/run-bazel-query-ci.sh
    .github/scripts/run-argument-comment-lint-bazel.sh
    scripts/list-bazel-clippy-targets.sh`
    - `python3 -m py_compile .github/scripts/run_bazel_with_buildbuddy.py
    .github/scripts/test_run_bazel_with_buildbuddy.py
    .github/scripts/test_rusty_v8_bazel.py
    .github/scripts/rusty_v8_bazel.py`
    - `ruff check .github/scripts/run_bazel_with_buildbuddy.py
    .github/scripts/test_run_bazel_with_buildbuddy.py
    .github/scripts/test_rusty_v8_bazel.py
    .github/scripts/rusty_v8_bazel.py`
  • bazel: enforce MODULE.bazel.lock sync with Cargo.lock (#11790)
    ## Why this change
    
    When Cargo dependencies change, it is easy to end up with an unexpected
    local diff in `MODULE.bazel.lock` after running Bazel. That creates
    noisy working copies and pushes lockfile fixes later in the cycle. This
    change addresses that pain point directly.
    
    ## What this change enforces
    
    The expected invariant is: after dependency updates, `MODULE.bazel.lock`
    is already in sync with Cargo resolution. In practice, running
    `bazel mod deps` should not mutate the lockfile in a clean state. If it
    does, the dependency update is incomplete.
    
    ## How this is enforced
    
    This change adds a single lockfile check script that snapshots
    `MODULE.bazel.lock`, runs `bazel mod deps`, and fails if the file
    changes. The same check is wired into local workflow commands
    (`just bazel-lock-update` and `just bazel-lock-check`) and into Bazel CI
    (Linux x86_64 job) so drift is caught early and consistently. The
    developer documentation is updated in `codex-rs/docs/bazel.md` and
    `AGENTS.md` to make the expected flow explicit.
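    
    The snapshot-and-compare idea can be sketched as below. This is a
    hedged illustration, not the script added in this PR:
    `lockfile_unchanged` is a hypothetical name, and the real check may be
    written in another language.
    
    ```python
    import subprocess
    from pathlib import Path
    
    
    def lockfile_unchanged(lockfile: Path, command: list[str]) -> bool:
        """Run command and report whether the lockfile bytes stayed the same."""
        before = lockfile.read_bytes()       # snapshot
        subprocess.run(command, check=True)  # e.g. ["bazel", "mod", "deps"]
        return lockfile.read_bytes() == before
    ```
    
    A caller would exit non-zero when
    `lockfile_unchanged(Path("MODULE.bazel.lock"), ["bazel", "mod", "deps"])`
    returns `False`.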
    
    `MODULE.bazel.lock` is also refreshed in this PR to match the current
    Cargo dependency resolution.
    
    ## Expected developer workflow
    
    After changing `Cargo.toml` or `Cargo.lock`, run
    `just bazel-lock-update`, then run `just bazel-lock-check`, and include
    any resulting `MODULE.bazel.lock` update in the same change.
    
    ## Testing
    
    Ran `just bazel-lock-check` locally.
  • feat: add support for building with Bazel (#8875)
    This PR configures Codex CLI so it can be built with
    [Bazel](https://bazel.build) in addition to Cargo. The `.bazelrc`
    includes configuration so that remote builds can be done using
    [BuildBuddy](https://www.buildbuddy.io).
    
    If you are familiar with Bazel, things should work as you expect, e.g.,
    run `bazel test //... --keep_going` to run all the tests in the repo,
    but we have also added some new aliases in the `justfile` for
    convenience:
    
    - `just bazel-test` to run tests locally
    - `just bazel-remote-test` to run tests remotely (currently, the remote
    build is for x86_64 Linux regardless of your host platform). Note we are
    currently seeing the following test failures in the remote build, so we
    still need to figure out what is happening here:
    
    ```
    failures:
        suite::compact::manual_compact_twice_preserves_latest_user_messages
        suite::compact_resume_fork::compact_resume_after_second_compaction_preserves_history
        suite::compact_resume_fork::compact_resume_and_fork_preserve_model_history_view
    ```
    
    - `just build-for-release` to build release binaries for all
    platforms/architectures remotely
    
    To set up remote execution:
    - [Create a BuildBuddy account](https://app.buildbuddy.io/) (OpenAI
    employees should also request org access at
    https://openai.buildbuddy.io/join/ with their `@openai.com` email
    address.)
    - [Copy your API key](https://app.buildbuddy.io/docs/setup/) to
    `~/.bazelrc` (add the line `build
    --remote_header=x-buildbuddy-api-key=YOUR_KEY`)
    - Use `--config=remote` in your `bazel` invocations (or add `common
    --config=remote` to your `~/.bazelrc`, or use the `just` commands)
    
    ## CI
    
    In terms of CI, this PR introduces `.github/workflows/bazel.yml`, which
    uses Bazel to run the tests _locally_ on Mac and Linux GitHub runners
    (we are working on supporting Windows, but that is not ready yet). Note
    that the failures we are seeing in `just bazel-remote-test` do not occur
    on these GitHub CI jobs, so everything in `.github/workflows/bazel.yml`
    is green right now.
    
    The `bazel.yml` uses extra config in `.github/workflows/ci.bazelrc` so
    that macOS CI jobs build _remotely_ on Linux hosts (using the
    `docker://docker.io/mbolin491/codex-bazel` Docker image declared in the
    root `BUILD.bazel`) using cross-compilation to build the macOS
    artifacts. Then these artifacts are downloaded locally to GitHub's macOS
    runner so the tests can be executed natively. This is the relevant
    config that enables this:
    
    ```
    common:macos --config=remote
    common:macos --strategy=remote
    common:macos --strategy=TestRunner=darwin-sandbox,local
    ```
    
    Because of the remote caching benefits we get from BuildBuddy, these new
    CI jobs can be extremely fast! For example, consider these two jobs that
    ran all the tests on Linux x86_64:
    
    - Bazel 1m37s
    https://github.com/openai/codex/actions/runs/20861063212/job/59940545209?pr=8875
    - Cargo 9m20s
    https://github.com/openai/codex/actions/runs/20861063192/job/59940559592?pr=8875
    
    For now, we will continue to run both the Bazel and Cargo jobs for PRs,
    but once we add support for Windows and running Clippy, we should be
    able to cut over to using Bazel exclusively for PRs, which should still
    speed things up considerably. We will probably continue to run the Cargo
    jobs post-merge for commits that land on `main` as a sanity check.
    
    Release builds will also continue to be done by Cargo for now.
    
    Earlier attempt at this PR: https://github.com/openai/codex/pull/8832
    Earlier attempt to add support for Buck2, now abandoned:
    https://github.com/openai/codex/pull/8504
    
    ---------
    
    Co-authored-by: David Zbarsky <dzbarsky@gmail.com>
    Co-authored-by: Michael Bolin <mbolin@openai.com>