5 Commits

  • Fan out rust-ci-full nextest by platform (#23358)
    ## Why
    
    `rust-ci-full` was paying the full Cargo nextest build-and-run cost once
    per platform, with Windows ARM64 as the long pole. This change moves the
    heavy work into one reusable per-platform flow: build a nextest archive
    once, then replay it across four shards so the platform lane spends less
    time running tests serially. For Windows ARM64, the archive is
    cross-compiled on Windows x64 and replayed on native Windows ARM64
    shards so the slow ARM64 machine is used for execution rather than
    compilation.
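
    The archive-then-replay shape described above can be sketched with stock `cargo nextest` commands. This is a minimal sketch, not the contents of the actual workflow: the function names, the archive file name, and the choice of `hash:` partitioning are assumptions made for illustration, and the commands are only printed, not executed.

    ```bash
    # Hypothetical helpers: print (do not run) the nextest commands for an
    # archive-backed shard layout. The archive name is an assumption; the shard
    # count of four comes from the description above.
    ARCHIVE="nextest-archive.tar.zst"
    SHARDS=4

    # Build step: compile the test binaries once and pack them into an archive.
    archive_command() {
      echo "cargo nextest archive --archive-file $ARCHIVE"
    }

    # Replay step: run one slice of the archived tests without recompiling.
    # $1 is the 1-based shard index.
    shard_command() {
      echo "cargo nextest run --archive-file $ARCHIVE --partition hash:$1/$SHARDS"
    }

    archive_command
    shard=1
    while [ "$shard" -le "$SHARDS" ]; do
      shard_command "$shard"
      shard=$((shard + 1))
    done
    ```

    When an archive is replayed on a different machine or checkout path, nextest's `--workspace-remap` option may also be needed; whether the workflow uses it is not stated here.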
    
    ## What changed
    
    - split the `rust-ci-full` nextest matrix into five explicit
    per-platform reusable-workflow calls
    - add `.github/workflows/rust-ci-full-nextest-platform.yml` to build one
    archive, upload timings/helpers, replay four nextest shards, upload
    per-shard JUnit, and roll the shard status back up per platform
    - add Windows CI helpers for Dev Drive setup and MSVC ARM64 linker
    environment export so the Windows ARM64 archive can be produced on
    Windows x64
    - keep the existing Cargo git CLI fetch hardening inside the reusable
    workflow, since caller workflow-level `env` does not flow through
    `workflow_call`
    - document the archive-backed shard shape in
    `.github/workflows/README.md`
    - raise the default nextest slow timeout to 30s so the sharded full-CI
    path does not treat every >15s test as stuck
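
    Rolling the shard status back up per platform amounts to reducing several job results to one. A minimal sketch, assuming the shard results arrive as the strings GitHub Actions exposes in `needs.<job>.result` (`success`, `failure`, `cancelled`, `skipped`); the function name and the exact policy are hypothetical, not taken from the workflow:

    ```bash
    # Hypothetical rollup: report "success" only if every shard result is
    # "success". Arguments: one result string per shard.
    rollup_shard_status() {
      if [ "$#" -eq 0 ]; then
        echo "failure"   # no shards reported: treat as a failure, not a pass
        return 1
      fi
      for result in "$@"; do
        if [ "$result" != "success" ]; then
          echo "failure"
          return 1
        fi
      done
      echo "success"
    }

    rollup_shard_status success success success success
    rollup_shard_status success success failure success || true
    ```

    Treating `cancelled` and `skipped` as failures is a deliberately strict choice in this sketch; a real gatherer job might handle them differently.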
    
    ## Verification
    
    - validated the archive/shard flow with live GitHub Actions runs on this
    PR branch
    - Windows ARM64 cross-compile latency on completed runs:
      - https://github.com/openai/codex/actions/runs/26118759651: `34m30s`
        lane e2e, `17m16s` archive build, `9m55s` shard phase
      - https://github.com/openai/codex/actions/runs/26120777976: `30m36s`
        lane e2e, `17m21s` archive build, `6m50s` shard phase
      - comparable pre-cross-compile sharded Windows ARM64 runs were `55m01s`,
        `50m21s`, and `46m42s`, so the completed cross-compile runs improved the
        lane by roughly `12m` to `24m` versus the prior range
    - latest corrected cross-compile run:
      https://github.com/openai/codex/actions/runs/26120777976
      - Windows ARM64 archive built successfully on Windows x64
      - native Windows ARM64 shards started immediately after the archive
        upload
      - 3/4 Windows ARM64 shards passed; the failing shard hit the same
        existing `code_mode` test failure seen outside this lane
    - downloaded failed-shard JUnit XML from the validation runs and
    confirmed the remaining red is from known test failures, not
    archive/shard wiring
    - no local Codex tests run per repo guidance
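
    The "roughly `12m` to `24m`" range can be checked from the lane timings quoted in this section. A small worked calculation, with hypothetical helper names, comparing the best prior run against the slower cross-compile run and the worst prior run against the faster one:

    ```bash
    # Convert a duration such as "34m30s" into seconds.
    to_seconds() {
      local minutes=${1%%m*}
      local seconds=${1#*m}
      seconds=${seconds%s}
      # Strip one leading zero so values like "01" or "08" are not read as octal.
      minutes=${minutes#0}
      seconds=${seconds#0}
      echo $(( ${minutes:-0} * 60 + ${seconds:-0} ))
    }

    # Format a number of seconds back into the "MmSSs" style used above.
    format_duration() {
      printf '%dm%02ds\n' $(( $1 / 60 )) $(( $1 % 60 ))
    }

    # Smallest gain: best prior run vs. the slower cross-compile run.
    format_duration $(( $(to_seconds 46m42s) - $(to_seconds 34m30s) ))   # prints 12m12s
    # Largest gain: worst prior run vs. the faster cross-compile run.
    format_duration $(( $(to_seconds 55m01s) - $(to_seconds 30m36s) ))   # prints 24m25s
    ```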
    
    ## Notes
    
    - this PR does not change developers.openai.com documentation
  • ci: align Bazel repo cache and Windows clippy target handling (#16740)
    ## Why
    
    Bazel CI had two independent Windows issues:
    
    - The workflow saved/restored `~/.cache/bazel-repo-cache`, but
    `.bazelrc` configured `common:ci-windows
    --repository_cache=D:/a/.cache/bazel-repo-cache`, so `actions/cache` and
    Bazel could point at different directories.
    - The Windows `Bazel clippy` job passed the full explicit target list
    from `//codex-rs/...`, but some of those explicit targets are
    intentionally incompatible with `//:local_windows`.
    `run-argument-comment-lint-bazel.sh` already handles that with
    `--skip_incompatible_explicit_targets`; the clippy workflow path did
    not.
    
    I also tried switching the workflow cache path to
    `D:\a\.cache\bazel-repo-cache`, but the Windows clippy job repeatedly
    failed with `Failed to restore: Cache service responded with 400`, so
    the final change standardizes on `$HOME/.cache/bazel-repo-cache` and
    makes cache restore non-fatal.
    
    ## What Changed
    
    - Expose one repository-cache path from
    `.github/actions/setup-bazel-ci/action.yml` and export that path as
    `BAZEL_REPOSITORY_CACHE` so `run-bazel-ci.sh` passes it to Bazel after
    `--config=ci-*`.
    - Move `actions/cache/restore` out of the composite action into
    `.github/workflows/bazel.yml`, and make restore failures non-fatal
    there.
    - Save exactly the exported cache path in `.github/workflows/bazel.yml`.
    - Remove `common:ci-windows
    --repository_cache=D:/a/.cache/bazel-repo-cache` from `.bazelrc` so the
    Windows CI config no longer disagrees with the workflow cache path.
    - Pass `--skip_incompatible_explicit_targets` in the Windows `Bazel
    clippy` job so incompatible explicit targets do not fail analysis while
    the lint aspect still traverses compatible Rust dependencies.
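
    The ordering in the first bullet matters because, for single-valued Bazel flags, a later occurrence on the command line overrides an earlier one, including a value expanded from `--config`. A minimal sketch of that ordering, assuming a hypothetical `bazel_ci_args` helper; it is not the contents of `run-bazel-ci.sh`:

    ```bash
    # Hypothetical helper: print Bazel flags with the repository cache placed
    # after --config so the exported path wins over any value set by the config.
    # $1 is the CI config name, e.g. "ci-windows".
    bazel_ci_args() {
      printf '%s' "--config=$1"
      if [ -n "${BAZEL_REPOSITORY_CACHE:-}" ]; then
        printf ' %s' "--repository_cache=$BAZEL_REPOSITORY_CACHE"
      fi
      printf '\n'
    }

    BAZEL_REPOSITORY_CACHE="$HOME/.cache/bazel-repo-cache"
    bazel_ci_args ci-windows
    ```

    Leaving the flag out when `BAZEL_REPOSITORY_CACHE` is unset is a choice made for this sketch, so that local invocations fall back to Bazel's default cache location.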
    
    ## Verification
    
    - Parsed `.github/actions/setup-bazel-ci/action.yml` and
    `.github/workflows/bazel.yml` with Ruby's YAML loader.
    - Resubmitted PR `#16740`; CI is rerunning on the amended commit.
  • bazel: lint rust_test targets in clippy workflow (#16450)
    ## Why
    
    `cargo clippy --tests` was catching warnings in inline `#[cfg(test)]`
    code that the Bazel PR Clippy lane missed. The existing Bazel invocation
    linted `//codex-rs/...`, but that did not apply Clippy to the generated
    manual `rust_test` binaries, so warnings in targets such as
    `//codex-rs/state:state-unit-tests-bin` only surfaced as plain compile
    warnings instead of failing the lint job.
    
    ## What Changed
    
    - added `scripts/list-bazel-clippy-targets.sh` to expand the Bazel
    Clippy target set with the generated manual `rust_test` rules while
    still excluding `//codex-rs/v8-poc:all`
    - updated `.github/workflows/bazel.yml` to use that expanded target list
    in the Bazel Clippy PR job
    - updated `just bazel-clippy` to use the same target expansion locally
    - updated `.github/workflows/README.md` to document that the Bazel PR
    lint lane now covers inline `#[cfg(test)]` code
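
    The target expansion can be pictured as a filter over a list of `rust_test` labels. A minimal sketch, not the actual `scripts/list-bazel-clippy-targets.sh`: the function name is hypothetical, the labels are assumed to arrive one per line on stdin (for example from a `bazel query`), and the prefix filter is just one way to honor the `v8-poc` exclusion.

    ```bash
    # Hypothetical sketch: read rust_test labels (one per line) on stdin and print
    # the Clippy target list: the wildcard, each extra label, then the exclusion.
    clippy_targets() {
      printf '%s\n' "//codex-rs/..."
      while IFS= read -r label; do
        case "$label" in
          "") ;;                      # ignore blank lines
          //codex-rs/v8-poc:*) ;;     # stay consistent with the exclusion
          *) printf '%s\n' "$label" ;;
        esac
      done
      # Negative pattern last, so it subtracts from everything listed before it.
      printf '%s\n' "-//codex-rs/v8-poc:all"
    }

    # The second label is a made-up example of a target that should be dropped.
    printf '%s\n' \
      "//codex-rs/state:state-unit-tests-bin" \
      "//codex-rs/v8-poc:example-unit-tests-bin" |
      clippy_targets
    ```

    Because the list contains a negative pattern, it has to follow a `--` separator on the Bazel command line, as in `bazel build --config=clippy -- <targets>`.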
    
    ## Verification
    
    - `./scripts/list-bazel-clippy-targets.sh` includes
    `//codex-rs/state:state-unit-tests-bin`
    - `bazel build --config=clippy -- //codex-rs/state:state-unit-tests-bin`
    now fails with the same unused import in `state/src/runtime/logs.rs`
    that `cargo clippy --tests` reports
  • ci: split fast PR Rust CI from full post-merge Cargo CI (#16072)
    ## Summary
    
    Split the old all-in-one `rust-ci.yml` into:
    
    - a PR-time Cargo workflow in `rust-ci.yml`
    - a full post-merge Cargo workflow in `rust-ci-full.yml`
    
    This keeps the PR path focused on fast Cargo-native hygiene plus the
    Bazel `build` / `test` / `clippy` coverage in `bazel.yml`, while moving
    the heavyweight Cargo-native matrix to `main`.
    
    ## Why
    
    `bazel.yml` is now the main Rust verification workflow for pull
    requests. It already covers the Bazel build, test, and clippy signal we
    care about pre-merge, and it also runs on pushes to `main` to re-verify
    the merged tree and help keep the BuildBuddy caches warm.
    
    What was still missing was a clean split for the Cargo-native checks
    that Bazel does not replace yet. The old `rust-ci.yml` mixed together:
    
    - fast hygiene checks such as `cargo fmt --check` and `cargo shear`
    - `argument-comment-lint`
    - the full Cargo clippy / nextest / release-build matrix
    
    That made every PR pay for the full Cargo matrix even though most of
    that coverage is better treated as post-merge verification. The goal of
    this change is to leave PRs with the checks we still want before merge,
    while moving the heavier Cargo-native matrix off the review path.
    
    ## What Changed
    
    - Renamed the old heavyweight workflow to `rust-ci-full.yml` and limited
    it to `push` on `main` plus `workflow_dispatch`.
    - Added a new PR-only `rust-ci.yml` that runs:
      - changed-path detection
      - `cargo fmt --check`
      - `cargo shear`
      - `argument-comment-lint` on Linux, macOS, and Windows
      - `tools/argument-comment-lint` package tests when the lint itself or
        its workflow wiring changes
    - Kept the PR workflow's gatherer as the single required Cargo-native
    status so branch protection can stay simple.
    - Added `.github/workflows/README.md` to document the intended split
    between `bazel.yml`, `rust-ci.yml`, and `rust-ci-full.yml`.
    - Preserved the recent Windows `argument-comment-lint` behavior from
    `e02fd6e1d3` in `rust-ci-full.yml`, and mirrored cross-platform lint
    coverage into the PR workflow.
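
    Changed-path detection of this kind reduces to matching the list of changed files against a few path patterns. A minimal sketch with a hypothetical function name; which paths count as "workflow wiring" is an assumption here (only `rust-ci.yml` is checked), and the real workflow may use a different mechanism:

    ```bash
    # Hypothetical check: read changed paths (one per line) on stdin and succeed
    # if the argument-comment-lint package tests should run.
    lint_tests_needed() {
      while IFS= read -r path; do
        case "$path" in
          tools/argument-comment-lint/*) return 0 ;;   # the lint itself changed
          .github/workflows/rust-ci.yml) return 0 ;;   # assumed workflow wiring
        esac
      done
      return 1
    }

    # Both paths below are made-up examples of a changed-file list.
    if printf '%s\n' "codex-rs/core/src/lib.rs" "tools/argument-comment-lint/src/lib.rs" |
      lint_tests_needed; then
      echo "run lint package tests"
    else
      echo "skip lint package tests"
    fi
    ```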
    
    A few details are deliberate:
    
    - The PR workflow still keeps the Linux lint lane on the
    default-targets-only invocation for now, while macOS and Windows use the
    broader released-linter path.
    - This PR does not change `bazel.yml`; it changes the Cargo-native
    workflow around the existing Bazel PR path.
    
    ## Testing
    
    - Rebased this change onto `main` after `e02fd6e1d3`
    - `ruby -e 'require "yaml"; %w[.github/workflows/rust-ci.yml
    .github/workflows/rust-ci-full.yml .github/workflows/bazel.yml].each {
    |f| YAML.load_file(f) }'`