mirror of
https://github.com/pchuan98/codex.git
synced 2026-07-01 00:31:56 +08:00
docs: document remote executor integration testing (#29790)
## Why

Agents need a clear default for writing remote-compatible integration tests and reproducible commands for each supported runner.

## What

Expand the `remote-tests` skill with fixture guidance, skip selection, and Docker and Wine commands. Add always-visible `AGENTS.md` guidance that points new core and app-server tests toward automatic environment fixtures.

Stacked on #29789.
committed by GitHub
Unverified
parent c2b3e3b4f5
commit 31e428a1ef
@@ -1,25 +1,104 @@
---
name: remote-tests
description: How to run tests using remote executor.
description: Testing against remote executors in integration tests.
---

Some Codex integration tests select `local`, `docker`, or `wine-exec` through
`CODEX_TEST_ENVIRONMENT`. The legacy `CODEX_TEST_REMOTE_ENV=<container>` still
selects Docker; otherwise execution is local.
Remote executor tests exercise the app-server/exec-server split to ensure that agent features work
in both local and remote execution environments.
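
The selection rules described above can be sketched as a small shell function. This is an illustrative sketch only, not the test harness's actual logic: the function name `resolve_test_environment` is hypothetical, and the precedence (an explicit `CODEX_TEST_ENVIRONMENT` wins over the legacy `CODEX_TEST_REMOTE_ENV`) is an assumption, since the text does not say which variable takes priority when both are set.

```bash
#!/usr/bin/env bash
# Hypothetical helper that mirrors the selection rules described above.
# Assumption: CODEX_TEST_ENVIRONMENT takes precedence over the legacy variable.
resolve_test_environment() {
  case "${CODEX_TEST_ENVIRONMENT:-}" in
    local|docker|wine-exec)
      echo "${CODEX_TEST_ENVIRONMENT}"
      return 0
      ;;
    "")
      # Not set: fall through to the legacy check below.
      ;;
    *)
      echo "unknown CODEX_TEST_ENVIRONMENT: ${CODEX_TEST_ENVIRONMENT}" >&2
      return 1
      ;;
  esac

  # Legacy behavior: a non-empty container name selects Docker.
  if [ -n "${CODEX_TEST_REMOTE_ENV:-}" ]; then
    echo "docker"
  else
    echo "local"
  fi
}
```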

Docker container is built and initialized via ./scripts/test-remote-env.sh
Remote executor tests currently require an x86_64 Linux host machine. There are two flavors:

On x86-64 Linux, run Wine exec with
`bazel test //codex-rs/core:core-all-wine-exec-test --test_output=errors`.
1. Docker (Linux exec-server)
2. Wine (Windows exec-server)

Local execution targets the host OS, Docker targets Linux, and Wine exec targets
Windows. Choose the skip macro by what the test depends on:
## Test Fixtures

Individual test cases must opt-in to being run against a remote executor.

### codex_core

Use `TestCodexBuilder::build_with_auto_env()` to opt-in to remote execution in core integration
tests unless the test needs more precise control over its executor.

### app-server

Start the server with `TestAppServer::new_with_auto_env()` unless the test defines its own
`$CODEX_HOME/environments.toml` or will define custom environments at runtime.

Start threads with `TestAppServer::send_thread_start_request_with_auto_env()` if you've created the
server with the `auto_env` approach. Omit `ThreadStartParams.environments` (leave it as `None`) when
doing so.

## Test Skips

If a test doesn't pass in a particular remote executor configuration you can skip it in just that
configuration. Include a string reason for future readers when the selected skip macro supports
one.

Choose the skip macro by what causes the test to fail:

- `skip_if_target_windows!`: Windows target behavior.
- `skip_if_wine_exec!`: Wine-exec runner constraints.
- `skip_if_host_windows!`: Windows host constraints.
- `skip_if_remote!`: Local-only test behavior.
- `skip_if_no_remote_env!`: Remote-only test behavior.
- `skip_if_wine_exec!`: Wine-specific runner debt.

Prefer defining tests that run in all host/target configurations by default. See the `$path-types`
skill for the most common changes required to make tests compatible.

## Docker

Docker container is built and initialized via ./scripts/test-remote-env.sh. Sourcing this script
in bash also provides the `codex_remote_env_cleanup` function to use after testing.

To run core integration tests against a Docker remote executor:

```bash
bash -c '
set -euo pipefail
unset CODEX_TEST_REMOTE_EXEC_SERVER_URL
source scripts/test-remote-env.sh
trap codex_remote_env_cleanup EXIT

cd codex-rs
just test -p codex-core --test all
'
```

To run app-server integration tests against a Docker remote executor:

```bash
bash -c '
set -euo pipefail
unset CODEX_TEST_REMOTE_EXEC_SERVER_URL
source scripts/test-remote-env.sh
trap codex_remote_env_cleanup EXIT

cd codex-rs
just test -p codex-app-server --test all
'
```
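
The two Docker invocations above differ only in the package passed to `-p`, so they could be folded into one helper. The following is a minimal sketch under stated assumptions: the function name `run_docker_remote_tests` is hypothetical and not part of the repository, and it assumes it is called from the repository root, like the original commands. Everything inside the subshell is taken from the commands above.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the two invocations above; only the package differs.
# Assumes the current directory is the repository root.
run_docker_remote_tests() {
  local package="${1:-}"
  if [ -z "$package" ]; then
    echo "usage: run_docker_remote_tests <cargo-package>" >&2
    return 2
  fi

  # Subshell: keeps the shell options, the EXIT trap, and the `cd`
  # from leaking into the calling shell.
  (
    set -euo pipefail
    unset CODEX_TEST_REMOTE_EXEC_SERVER_URL
    source scripts/test-remote-env.sh
    trap codex_remote_env_cleanup EXIT

    cd codex-rs
    just test -p "$package" --test all
  )
}
```

With this sketch, `run_docker_remote_tests codex-core` and `run_docker_remote_tests codex-app-server` would correspond to the two commands shown above.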

## Wine

These tests build an exec-server for Windows and run it under Wine, with the app-server staying on
the Linux host. The cross-platform build dependency means they only run in Bazel.

For core integration tests:

```sh
bazel test //codex-rs/core:core-all-wine-exec-test
```

For app-server integration tests:

```sh
bazel test //codex-rs/app-server:app-server-all-wine-exec-test
```
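
Because remote executor tests are documented as requiring an x86_64 Linux host, a preflight check before invoking the Bazel targets above may save a failed build. This is a hedged sketch, not something the repository provides: the function name `is_supported_remote_test_host` is hypothetical, and it relies only on `uname -s` and `uname -m`, which report `Linux` and `x86_64` on such hosts.

```bash
#!/usr/bin/env bash
# Hypothetical preflight check. The OS and machine strings can be passed as
# optional arguments so the check can be exercised without changing hosts;
# by default they come from `uname`.
is_supported_remote_test_host() {
  local os="${1:-$(uname -s)}"
  local arch="${2:-$(uname -m)}"
  [ "$os" = "Linux" ] && [ "$arch" = "x86_64" ]
}

if is_supported_remote_test_host; then
  echo "host ok: the bazel wine-exec targets can run here"
else
  echo "unsupported host: $(uname -s) $(uname -m)" >&2
fi
```

On an unsupported host such as macOS, the check only reports the mismatch; it does not attempt to run anything remotely.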

## Devboxes

You can use a devbox to run these tests if you are running on a macOS machine.

You can list devboxes via `applied_devbox ls`, pick the one with `codex` in the name.
Connect to devbox via `ssh <devbox_name>`.
@@ -220,10 +220,13 @@ Use `just bench-smoke` to dry-run the benchmark for a single iteration to ensure
- Under Bazel, binaries and resources may live under runfiles; use `codex_utils_cargo_bin::cargo_bin` to resolve absolute paths that remain stable after `chdir`.
- When locating fixture files or test resources under Bazel, avoid `env!("CARGO_MANIFEST_DIR")`. Prefer `codex_utils_cargo_bin::find_resource!` so paths resolve correctly under both Cargo and Bazel runfiles.

### Integration tests (core)
### Integration tests

#### codex_core integration testing

- Prefer the utilities in `core_test_support::responses` when writing end-to-end Codex tests.

- Use `TestCodexBuilder::build_with_auto_env()` by default to ensure that new tests work with
foreign app/exec OSes. See $remote-tests for details.
- All `mount_sse*` helpers return a `ResponseMock`; hold onto it so you can assert against outbound `/responses` POST bodies.
- Use `ResponseMock::single_request()` when a test should only issue one POST, or `ResponseMock::requests()` to inspect every captured `ResponsesRequest`.
- `ResponsesRequest` exposes helpers (`body_json`, `input`, `function_call_output`, `custom_tool_call_output`, `call_output`, `header`, `path`, `query_param`) so assertions can target structured payloads instead of manual JSON digging.
@@ -247,6 +250,14 @@ Use `just bench-smoke` to dry-run the benchmark for a single iteration to ensure
// assert using request.function_call_output(call_id) or request.json_body() or other helpers.
```

#### app-server integration testing

- Tests should exercise app-server's public JSON-RPC API.
- Use similar server mocking as for core integration tests.
- Use `TestAppServer::new_with_auto_env()` and `TestAppServer::send_thread_start_request_with_auto_env()`
by default to ensure that new tests work with foreign app/exec OSes. See `$remote-tests` for
details.

## App-server API Development Best Practices

These guidelines apply to app-server protocol work in `codex-rs`, especially:
@@ -307,3 +318,6 @@ closest `pyproject.toml`'s `requires-python` field to see what minimum runtime v
## Platform Support

Tests and features must support Linux, macOS and Windows unless feature is explicitly OS-specific.

Codex supports running connected app-server and exec-server on different operating systems. See the
`$remote-tests` skill for details about integration testing these configurations.