2 Commits

  • ci: cross-compile Windows Bazel tests (#20585)
    ## Status
    
    This is the Bazel PR-CI cross-compilation follow-up to #20485. It is
    intentionally split from the Cargo/cargo-xwin release-build PoC so
    #20485 can stay as the historical release-build exploration. The
    unrelated async-utils test cleanup has been moved to #20686, so this PR
    is focused on the Windows Bazel CI path.
    
    The intended tradeoff is now explicit in `.github/workflows/bazel.yml`:
    pull requests get the fast Windows cross-compiled Bazel test leg, while
    post-merge pushes to `main` run both that fast cross leg and a fully
    native Windows Bazel test leg. The native main-only job keeps full
    V8/code-mode coverage and gets a 40-minute timeout because it is less
    latency-sensitive than PR CI. All other Bazel jobs remain at 30 minutes.
    
    ## Why
    
    Windows Bazel PR CI currently does the expensive part of the build on
    Windows. A native Windows Bazel test job on `main` completed in about
    28m12s, leaving very little headroom under the 30-minute job timeout and
    making Windows the slowest PR signal.
    
    #20485 showed that Windows cross-compilation can be materially faster
    for Cargo release builds, but PR CI needs Bazel because Bazel owns our
    test sharding, flaky-test retries, and integration-test layout. This PR
    applies the same high-level shape we already use for macOS Bazel CI:
    compile with remote Linux execution, then run platform-specific tests on
    the platform runner.
    
    The compromise is deliberately signal-aware: code-mode/V8 changes are
    rare enough that PR CI can accept losing the direct V8/code-mode
    smoke-test signal temporarily, while `main` still runs the native
    Windows job post-merge to catch that class of regression. A follow-up PR
    should investigate making the cross-built Windows gnullvm V8 archive
    pass the direct V8/code-mode tests so this tradeoff can eventually go
    away.
    
    ## What Changed
    
    - Adds a `ci-windows-cross` Bazel config that targets
    `x86_64-pc-windows-gnullvm`, uses Linux RBE for build actions, and keeps
    `TestRunner` actions local on the Windows runner.
    - Adds explicit Windows platform definitions for
    `windows_x86_64_gnullvm`, `windows_x86_64_msvc`, and a bridge toolchain
    that lets gnullvm test targets execute under the Windows MSVC host
    platform.
    - Updates the Windows Bazel PR test leg to opt into the cross-compile
    path via `--windows-cross-compile` and `--remote-download-toplevel`.
    - Adds a `test-windows-native-main` job that runs only for `push` events
    on `refs/heads/main`, uses the native Windows Bazel path, includes
    V8/code-mode smoke tests, and has `timeout-minutes: 40`.
    - Keeps fork/community PRs without `BUILDBUDDY_API_KEY` on the previous
    local Windows MSVC-host fallback, including
    `--host_platform=//:local_windows_msvc` and `--jobs=8`.
    - Preserves the existing integration-test shape on non-gnullvm
    platforms, while generating Windows-cross wrapper targets only for
    `windows_gnullvm`.
    - Resolves `CARGO_BIN_EXE_*` values from runfiles at test runtime,
    avoiding hard-coded Cargo paths and duplicate test runfiles.
    - Extends the V8 Bazel patches enough for the
    `x86_64-pc-windows-gnullvm` target and Linux remote execution path.
    - Makes the Windows sandbox test cwd derive from `INSTA_WORKSPACE_ROOT`
    at runtime when Bazel provides it, because cross-compiled binaries may
    contain Linux compile-time paths.
    - Keeps the direct V8/code-mode unit smoke tests out of the Windows
    cross PR path for now while native Windows CI continues to cover them
    post-merge.
    
    ## Command Shape
    
    The fast Windows PR test leg invokes the normal Bazel CI wrapper like
    this:
    
    ```shell
    ./.github/scripts/run-bazel-ci.sh \
      --print-failed-action-summary \
      --print-failed-test-logs \
      --windows-cross-compile \
      --remote-download-toplevel \
      -- \
      test \
      --test_tag_filters=-argument-comment-lint \
      --test_verbose_timeout_warnings \
      --build_metadata=COMMIT_SHA=${GITHUB_SHA} \
      -- \
      //... \
      -//third_party/v8:all \
      -//codex-rs/code-mode:code-mode-unit-tests \
      -//codex-rs/v8-poc:v8-poc-unit-tests
    ```
    
    With the BuildBuddy secret available on Windows, the wrapper selects
    `--config=ci-windows-cross` and appends the important Windows-cross
    overrides after rc expansion:
    
    ```shell
    --host_platform=//:rbe
    --shell_executable=/bin/bash
    --action_env=PATH=/usr/bin:/bin
    --host_action_env=PATH=/usr/bin:/bin
    --test_env=PATH=${CODEX_BAZEL_WINDOWS_PATH}
    ```
    
    The native post-merge Windows job intentionally omits
    `--windows-cross-compile` and does not exclude the V8/code-mode unit
    targets:
    
    ```shell
    ./.github/scripts/run-bazel-ci.sh \
      --print-failed-action-summary \
      --print-failed-test-logs \
      -- \
      test \
      --test_tag_filters=-argument-comment-lint \
      --test_verbose_timeout_warnings \
      --build_metadata=COMMIT_SHA=${GITHUB_SHA} \
      --build_metadata=TAG_windows_native_main=true \
      -- \
      //... \
      -//third_party/v8:all
    ```
    
    ## Research Notes
    
    The existing macOS Bazel CI config already uses the model we want here:
    build actions run remotely with `--strategy=remote`, but `TestRunner`
    actions execute on the macOS runner. This PR mirrors that pattern for
    Windows with `--strategy=TestRunner=local`.
    
    The important Bazel detail is that `rules_rs` is already targeting
    `x86_64-pc-windows-gnullvm` for Windows Bazel PR tests. This PR changes
    where the build actions execute; it does not switch the Bazel PR test
    target to Cargo, `cargo-nextest`, or the MSVC release target.
    
    Cargo release builds differ from this Bazel path for V8: the normal
    Windows Cargo release target is MSVC, and `rusty_v8` publishes prebuilt
    Windows MSVC `.lib.gz` archives. The Bazel PR path targets
    `windows-gnullvm`; `rusty_v8` does not publish a prebuilt Windows
    GNU/gnullvm archive, so this PR builds that archive in-tree. That
    Linux-RBE-built gnullvm archive currently crashes in direct V8/code-mode
    smoke tests, which is why the workflow keeps native Windows coverage on
    `main`.
    
    The less obvious Bazel detail is test wrapper selection. Bazel chooses
    the Windows test wrapper (`tw.exe`) from the test action execution
    platform, not merely from the Rust target triple. The outer
    `workspace_root_test` therefore declares the default test toolchain and
    uses the bridge toolchain above so the test action executes on Windows
    while its inner Rust binary is built for gnullvm.
    
    The V8 investigation exposed a Windows-client gotcha: even when an
    action execution platform is Linux RBE, Bazel can still derive the
    genrule shell path from the Windows client. That produced remote
    commands trying to run `C:\Program Files\Git\usr\bin\bash.exe` on Linux
    workers. The wrapper now passes `--shell_executable=/bin/bash` with
    `--host_platform=//:rbe` for the Windows cross path.
    
    The same Windows-client/Linux-RBE boundary also affected
    `third_party/v8:binding_cc`: a multiline genrule command can carry CRLF
    line endings into Linux remote bash, which failed as `$'\r'`. That
    genrule now keeps the `sed` command on one physical shell line while
    using an explicit Starlark join so the shell arguments stay readable.
    
    ## Verification
    
    Local checks included:
    
    ```shell
    bash -n .github/scripts/run-bazel-ci.sh
    bash -n workspace_root_test_launcher.sh.tpl
    ruby -e "require %q{yaml}; YAML.load_file(%q{.github/workflows/bazel.yml}); puts %q{ok}"
    RUNNER_OS=Linux ./scripts/list-bazel-clippy-targets.sh
    RUNNER_OS=Windows ./scripts/list-bazel-clippy-targets.sh
    RUNNER_OS=Linux ./tools/argument-comment-lint/list-bazel-targets.sh
    RUNNER_OS=Windows ./tools/argument-comment-lint/list-bazel-targets.sh
    ```
    
    The Linux clippy and argument-comment target lists contain zero
    `*-windows-cross-bin` labels, while the Windows lists still include 47
    Windows-cross internal test binaries.
    
    CI evidence:
    
    - Baseline native Windows Bazel test on `main`: success in about 28m12s,
    https://github.com/openai/codex/actions/runs/25206257208/job/73907325959
    - Green Windows-cross Bazel run on the split PR before adding the
    main-only native leg: Windows test 9m16s, Windows release verify 5m10s,
    Windows clippy 4m43s,
    https://github.com/openai/codex/actions/runs/25231890068
    - The latest SHA adds the explicit PR-vs-main tradeoff in `bazel.yml`;
    CI is rerunning on that focused diff.
    
    ## Follow-Up
    
    A subsequent PR should investigate making a cross-built Windows binary
    work with V8/code-mode enabled. Likely options are either making the
    Linux-RBE-built `windows-gnullvm` V8 archive correct at runtime, or
    evaluating whether a Bazel MSVC target/toolchain can reuse the same
    prebuilt MSVC `rusty_v8` archive shape that Cargo release builds already
    use.
  • ci: derive cache-stable Windows Bazel PATH (#19161)
    ## Why
    
    The BuildBuddy runs for PR #19086 and the later `main` build had the
    same source tree, but their Windows Bazel action and test cache keys did
    not line up. Comparing the downloaded execution logs showed the full
    GitHub-hosted Windows runner `PATH` had changed from
    `apache-maven-3.9.14` to `apache-maven-3.9.15`.
    
    This repo is not using Maven; the Maven entry was just ambient
    hosted-runner state. The problem was that Windows Bazel CI was still
    forwarding the whole runner `PATH` into Bazel via `--action_env=PATH`,
    `--host_action_env=PATH`, and `--test_env=PATH`, which made otherwise
    reusable cache entries sensitive to unrelated runner image churn.
    
    After discussion with the Bazel and BuildBuddy folks, the better shape
    for this change was to stop asking Bazel to inherit the ambient Windows
    `PATH` and instead compute one explicit cache-stable `PATH` in the
    Windows setup action that already prepares the CI toolchain environment.
    
    ## What
    
    - remove Windows `PATH` passthrough from `.bazelrc`
    - export `CODEX_BAZEL_WINDOWS_PATH` from
    `.github/actions/setup-bazel-ci/action.yml`
    - move the PATH derivation logic into
    `.github/scripts/compute-bazel-windows-path.ps1` so the allow-list is
    easier to review and document
    - keep only the Windows tool locations these Bazel jobs actually need:
    MSVC and SDK paths, Git, PowerShell, Node, DotSlash, and the standard
    Windows system directories
    - update `.github/scripts/run-bazel-ci.sh` to require that explicit
    value and forward it to Bazel action, host action, and test environments
    - log the derived `CODEX_BAZEL_WINDOWS_PATH` in the setup step to
    simplify cache-key debugging
    
    ## Verification
    
    - `bash -n .github/scripts/run-bazel-ci.sh`
    - `ruby -e 'require "yaml"; YAML.load_file(ARGV[0])'
    .github/actions/setup-bazel-ci/action.yml`
    - PowerShell parse check for
    `.github/scripts/compute-bazel-windows-path.ps1`
    - simulated a representative Windows `PATH` in PowerShell; the
    allow-list retained MSVC, Git, PowerShell, Node, Windows, and DotSlash
    entries while dropping Maven