docs: document remote executor integration testing (#29790)

## Why

Agents need a clear default for writing remote-compatible integration
tests and reproducible commands for each supported runner.

## What

Expand the `remote-tests` skill with fixture guidance, skip selection,
and Docker and Wine commands. Add always-visible `AGENTS.md` guidance
that points new core and app-server tests toward automatic environment
fixtures.

Stacked on #29789.
Adam Perry @ OpenAI
2026-06-23 22:55:36 -07:00
committed by GitHub
Unverified
parent c2b3e3b4f5
commit 31e428a1ef
2 changed files with 105 additions and 12 deletions
+89 -10
@@ -1,25 +1,104 @@
---
name: remote-tests
description: How to run tests using remote executor.
description: Testing against remote executors in integration tests.
---
Some Codex integration tests select `local`, `docker`, or `wine-exec` through
`CODEX_TEST_ENVIRONMENT`. The legacy `CODEX_TEST_REMOTE_ENV=<container>` still
selects Docker; otherwise execution is local.
Remote executor tests exercise the app-server/exec-server split to ensure that agent features work
in both local and remote execution environments.
Docker container is built and initialized via ./scripts/test-remote-env.sh
Remote executor tests currently require an x86_64 Linux host machine. There are two flavors:
On x86-64 Linux, run Wine exec with
`bazel test //codex-rs/core:core-all-wine-exec-test --test_output=errors`.
1. Docker (Linux exec-server)
2. Wine (Windows exec-server)
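The host requirement above can be checked before starting a long build. The following is a minimal sketch, not part of the repository: `host_supports_remote_tests` is a hypothetical helper name, and it assumes `uname -s` and `uname -m` report `Linux` and `x86_64` on a supported host.

```bash
# Hypothetical helper (not in the repository): succeed only on an x86_64 Linux host.
host_supports_remote_tests() {
  # Arguments default to the current host so other values can be passed for checking.
  local os="${1:-$(uname -s)}"
  local arch="${2:-$(uname -m)}"
  [ "$os" = "Linux" ] && [ "$arch" = "x86_64" ]
}

if host_supports_remote_tests; then
  echo "host can run remote executor tests"
else
  echo "unsupported host: use an x86_64 Linux machine instead"
fi
```

Running the check first gives a clear message on an unsupported machine, before any Docker or Bazel step fails with a less direct error.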
Local execution targets the host OS, Docker targets Linux, and Wine exec targets
Windows. Choose the skip macro by what the test depends on:
## Test Fixtures
Individual test cases must opt in to being run against a remote executor.
### codex_core
Use `TestCodexBuilder::build_with_auto_env()` to opt in to remote execution in core integration
tests unless the test needs more precise control over its executor.
### app-server
Start the server with `TestAppServer::new_with_auto_env()` unless the test defines its own
`$CODEX_HOME/environments.toml` or will define custom environments at runtime.
Start threads with `TestAppServer::send_thread_start_request_with_auto_env()` if you've created the
server with the `auto_env` approach. Omit `ThreadStartParams.environments` (leave it as `None`) when
doing so.
## Test Skips
If a test doesn't pass in a particular remote executor configuration, you can skip it in just that
configuration. Include a string reason for future readers when the selected skip macro supports
one.
Choose the skip macro by what causes the test to fail:
- `skip_if_target_windows!`: Windows target behavior.
- `skip_if_wine_exec!`: Wine-exec runner constraints.
- `skip_if_host_windows!`: Windows host constraints.
- `skip_if_remote!`: Local-only test behavior.
- `skip_if_no_remote_env!`: Remote-only test behavior.
- `skip_if_wine_exec!`: Wine-specific runner debt.
Prefer defining tests that run in all host/target configurations by default. See the `$path-types`
skill for the most common changes required to make tests compatible.
## Docker
The Docker container is built and initialized via `./scripts/test-remote-env.sh`. Sourcing this script
in bash also provides the `codex_remote_env_cleanup` function to use after testing.
To run core integration tests against a Docker remote executor:
```bash
bash -c '
set -euo pipefail
unset CODEX_TEST_REMOTE_EXEC_SERVER_URL
source scripts/test-remote-env.sh
trap codex_remote_env_cleanup EXIT
cd codex-rs
just test -p codex-core --test all
'
```
To run app-server integration tests against a Docker remote executor:
```bash
bash -c '
set -euo pipefail
unset CODEX_TEST_REMOTE_EXEC_SERVER_URL
source scripts/test-remote-env.sh
trap codex_remote_env_cleanup EXIT
cd codex-rs
just test -p codex-app-server --test all
'
```
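The two Docker invocations above differ only in the package name, so they can be generated from one template. This is a sketch under that assumption: `remote_test_script` is a hypothetical helper, not something the repository provides, and it only prints the same command sequence shown above.

```bash
# Hypothetical helper (not in the repository): print the Docker remote-test
# command sequence for one package, e.g. codex-core or codex-app-server.
remote_test_script() {
  local package="${1:?package name required}"
  # Unquoted heredoc delimiter so ${package} is substituted into the script text.
  cat <<EOF
set -euo pipefail
unset CODEX_TEST_REMOTE_EXEC_SERVER_URL
source scripts/test-remote-env.sh
trap codex_remote_env_cleanup EXIT
cd codex-rs
just test -p ${package} --test all
EOF
}

# Print the script for inspection; nothing is executed here.
remote_test_script codex-core
```

To run it, pass the generated text to a fresh shell from the repository root, for example `bash -c "$(remote_test_script codex-app-server)"`. A separate shell keeps `set -euo pipefail` and the cleanup `trap` out of the interactive session.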
## Wine
These tests build an exec-server for Windows and run it under Wine, with the app-server staying on
the Linux host. The cross-platform build dependency means they only run in Bazel.
For core integration tests:
```sh
bazel test //codex-rs/core:core-all-wine-exec-test
```
For app-server integration tests:
```sh
bazel test //codex-rs/app-server:app-server-all-wine-exec-test
```
## Devboxes
If you are on a macOS machine, you can use a devbox to run these tests.
List devboxes via `applied_devbox ls` and pick the one with `codex` in the name.
Connect to the devbox via `ssh <devbox_name>`.
+16 -2
@@ -220,10 +220,13 @@ Use `just bench-smoke` to dry-run the benchmark for a single iteration to ensure
- Under Bazel, binaries and resources may live under runfiles; use `codex_utils_cargo_bin::cargo_bin` to resolve absolute paths that remain stable after `chdir`.
- When locating fixture files or test resources under Bazel, avoid `env!("CARGO_MANIFEST_DIR")`. Prefer `codex_utils_cargo_bin::find_resource!` so paths resolve correctly under both Cargo and Bazel runfiles.
### Integration tests (core)
### Integration tests
#### codex_core integration testing
- Prefer the utilities in `core_test_support::responses` when writing end-to-end Codex tests.
- Use `TestCodexBuilder::build_with_auto_env()` by default to ensure that new tests work with
  foreign app/exec OSes. See `$remote-tests` for details.
- All `mount_sse*` helpers return a `ResponseMock`; hold onto it so you can assert against outbound `/responses` POST bodies.
- Use `ResponseMock::single_request()` when a test should only issue one POST, or `ResponseMock::requests()` to inspect every captured `ResponsesRequest`.
- `ResponsesRequest` exposes helpers (`body_json`, `input`, `function_call_output`, `custom_tool_call_output`, `call_output`, `header`, `path`, `query_param`) so assertions can target structured payloads instead of manual JSON digging.
@@ -247,6 +250,14 @@ Use `just bench-smoke` to dry-run the benchmark for a single iteration to ensure
  // assert using request.function_call_output(call_id) or request.body_json() or other helpers.
```
#### app-server integration testing
- Tests should exercise app-server's public JSON-RPC API.
- Use similar server mocking as for core integration tests.
- Use `TestAppServer::new_with_auto_env()` and `TestAppServer::send_thread_start_request_with_auto_env()`
by default to ensure that new tests work with foreign app/exec OSes. See `$remote-tests` for
details.
## App-server API Development Best Practices
These guidelines apply to app-server protocol work in `codex-rs`, especially:
@@ -307,3 +318,6 @@ closest `pyproject.toml`'s `requires-python` field to see what minimum runtime v
## Platform Support
Tests and features must support Linux, macOS, and Windows unless the feature is explicitly OS-specific.
Codex supports running connected app-server and exec-server on different operating systems. See the
`$remote-tests` skill for details about integration testing these configurations.