codex

test: branch on target OS instead of runner flavor (#29712 )

## Why

Core tests should branch on the executor's operating system, not on
runner details such as Docker or Wine. This keeps platform behavior
stable as new test backends are added and reserves Wine-specific skips
for actual runner debt.

## What

- Add `TestTargetOs` and target/host-aware skip helpers while keeping
`TestEnvironment` internal.
- Replace topology enum access with remote predicates and a narrow
Docker accessor.
- Migrate OS-semantic Wine skips, preserve runner-specific gaps, and
document the skip taxonomy.

## Validation

- `just test -p core_test_support`
- `just test -p codex-core
remote_test_env_can_connect_and_use_filesystem`
- `bazel test //codex-rs/core:core-all-wine-exec-test
--test_output=errors` reached test execution; unrelated existing
view-image, path, and timing failures remain.
- `just test -p codex-core` and `just test` reached broad test
execution; this checkout has unrelated helper, sandbox, and timing
failures.

Adam Perry @ OpenAI · 2026-06-23 14:27:13 -07:00

9a79536e6b

[codex] Use expect in integration tests (#28441 )

The workspace denies `clippy::expect_used` in production. Although
`clippy.toml` allows `expect` in tests, Bazel Clippy compiles
integration-test helper code in a way that does not receive that
exemption, which encouraged verbose `unwrap_or_else(... panic!(...))`
and equivalent `match`/`let else` forms.

This allows `clippy::expect_used` once at each integration-test crate
root (including aggregated suites and test-support libraries), then
replaces manual panic-based Result and Option unwraps with
`expect`/`expect_err`. Standalone `tests/*.rs` files remain their own
crate roots. Intentional assertion and unexpected-variant panics remain
unchanged, and the production `expect_used = "deny"` lint remains in
place.

The cleanup is mechanical and net-negative in line count.

pakrym-oai · 2026-06-15 21:53:47 -07:00

e752f7b4ae

chore(features) rm Feature::ApplyPatchFreeform (#22711 )

## Summary
Removes the feature since this is effectively on by default in all cases
where we should use it, or can be configured via models.json.

## Testing
- [x] unit tests pass

Dylan Hurd · 2026-05-14 16:15:56 -07:00

51b0e94105

Update models.json (#18586 )

- Replace the active models-manager catalog with the deleted core
catalog contents.
- Replace stale hardcoded test model slugs with current bundled model
slugs.
- Keep this as a stacked change on top of the cleanup PR.

Ahmed Ibrahim · 2026-04-20 10:27:01 -07:00

316cf0e90b

fix: preserve platform-specific core shell env vars (#16707 )

## Why

We were seeing failures in the following tests as part of trying to get
all the tests running under Bazel on Windows in CI
(https://github.com/openai/codex/pull/16528):

```
suite::shell_command::unicode_output::with_login
suite::shell_command::unicode_output::without_login
```

Certainly `PATHEXT` should have been included in the extra `CORE_VARS`
list, so we fix that up here, but also take things a step further for
now by forcibly ensuring it is set on Windows in the return value of
`create_env()`. Once we get the Windows Bazel build working reliably
(i.e., after #16528 is merged), we should come back to this and confirm
we can remove the special case in `create_env()`.

## What

- Split core env inheritance into `COMMON_CORE_VARS` plus
platform-specific allowlists for Windows and Unix in
[`exec_env.rs`](https://github.com/openai/codex/blob/1b55c88fbf585b32cd553cb9d02ec817f2ad6ebc/codex-rs/core/src/exec_env.rs#L45-L81).
- Preserve `PATHEXT`, `USERNAME`, and `USERPROFILE` on Windows, and
`HOME` / locale vars on Unix.
- Backfill a default `PATHEXT` in `create_env()` on Windows if the
parent env does not provide one, so child process launch still works in
stripped-down Bazel environments.
- Extend the Windows exec-env test to assert mixed-case `PathExt`
survives case-insensitive core filtering, and document why the
shell-command Unicode test goes through a child process.

## Verification

- `cargo test -p codex-core exec_env::tests`

Michael Bolin · 2026-04-03 12:07:07 -07:00

faab4d39e1

fix: use cmd.exe in Windows unicode shell test (#16668 )

## Why

This is a follow-up to #16665. The Windows `unicode_output` test should
still exercise a child process so it verifies PowerShell's UTF-8 output
configuration, but `$env:COMSPEC` depends on that environment variable
surviving the curated Bazel test environment.

Using `cmd.exe` keeps the child-process coverage while avoiding both
bare `cmd` + `PATHEXT` lookup and `$env:COMSPEC` env passthrough
assumptions.

## What

- Run `cmd.exe /c echo naïve_café` in the Windows branch of
`unicode_output`.

## Verification

- `cargo test -p codex-core unicode_output`

Michael Bolin · 2026-04-03 00:32:08 -07:00

b15c918836

fix: use COMSPEC in Windows unicode shell test (#16665 )

## Why

Windows Bazel shell tests launch PowerShell with a curated environment,
so `PATHEXT` may be absent. The existing `unicode_output` test invokes
bare `cmd`, which can fail before the test exercises UTF-8 child-process
output.

## What

- Use `$env:COMSPEC /c echo naïve_café` in the Windows branch of
`unicode_output`.
- Preserve the external child-process path instead of switching the test
to a PowerShell builtin.

## Verification

- `cargo test -p codex-core unicode_output`

Michael Bolin · 2026-04-02 23:54:02 -07:00

14f95db57b

chore: clean up argument-comment lint and roll out all-target CI on macOS (#16054 )

## Why

`argument-comment-lint` was green in CI even though the repo still had
many uncommented literal arguments. The main gap was target coverage:
the repo wrapper did not force Cargo to inspect test-only call sites, so
examples like the `latest_session_lookup_params(true, ...)` tests in
`codex-rs/tui_app_server/src/lib.rs` never entered the blocking CI path.

This change cleans up the existing backlog, makes the default repo lint
path cover all Cargo targets, and starts rolling that stricter CI
enforcement out on the platform where it is currently validated.

## What changed

- mechanically fixed existing `argument-comment-lint` violations across
the `codex-rs` workspace, including tests, examples, and benches
- updated `tools/argument-comment-lint/run-prebuilt-linter.sh` and
`tools/argument-comment-lint/run.sh` so non-`--fix` runs default to
`--all-targets` unless the caller explicitly narrows the target set
- fixed both wrappers so forwarded cargo arguments after `--` are
preserved with a single separator
- documented the new default behavior in
`tools/argument-comment-lint/README.md`
- updated `rust-ci` so the macOS lint lane keeps the plain wrapper
invocation and therefore enforces `--all-targets`, while Linux and
Windows temporarily pass `-- --lib --bins`

That temporary CI split keeps the stricter all-targets check where it is
already cleaned up, while leaving room to finish the remaining Linux-
and Windows-specific target-gated cleanup before enabling
`--all-targets` on those runners. The Linux and Windows failures on the
intermediate revision were caused by the wrapper forwarding bug, not by
additional lint findings in those lanes.

## Validation

- `bash -n tools/argument-comment-lint/run.sh`
- `bash -n tools/argument-comment-lint/run-prebuilt-linter.sh`
- shell-level wrapper forwarding check for `-- --lib --bins`
- shell-level wrapper forwarding check for `-- --tests`
- `just argument-comment-lint`
- `cargo test` in `tools/argument-comment-lint`
- `cargo test -p codex-terminal-detection`

## Follow-up

- Clean up remaining Linux-only target-gated callsites, then switch the
Linux lint lane back to the plain wrapper invocation.
- Clean up remaining Windows-only target-gated callsites, then switch
the Windows lint lane back to the plain wrapper invocation.

Michael Bolin · 2026-03-27 19:00:44 -07:00

61dfe0b86c

chore(core) Remove Feature::PowershellUtf8 (#15128 )

## Summary
This feature has been enabled for powershell for a while now, let's get
rid of the logic

## Testing
- [x] Unit tests

Dylan Hurd · 2026-03-20 22:03:31 +00:00

ea8b07e680

Split features into codex-features crate (#15253 )

- Split the feature system into a new `codex-features` crate.
- Cut `codex-core` and workspace consumers over to the new config and
warning APIs.

Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
Co-authored-by: Codex <noreply@openai.com>

Ahmed Ibrahim · 2026-03-19 20:12:07 -07:00

2e22885e79

fix(ci) Faster shell_command::unicode_output test (#14114 )

## Summary
Alternative to #14061 - we need to use a child process on windows to
correctly validate Powershell behavior.

## Testing
- [x] These are tests

Dylan Hurd · 2026-03-09 19:09:56 +00:00

0334ddeccb

config: enforce enterprise feature requirements (#13388 )

## Why

Enterprises can already constrain approvals, sandboxing, and web search
through `requirements.toml` and MDM, but feature flags were still only
configurable as managed defaults. That meant an enterprise could suggest
feature values, but it could not actually pin them.

This change closes that gap and makes enterprise feature requirements
behave like the other constrained settings. The effective feature set
now stays consistent with enterprise requirements during config load,
when config writes are validated, and when runtime code mutates feature
flags later in the session.

It also tightens the runtime API for managed features. `ManagedFeatures`
now follows the same constraint-oriented shape as `Constrained<T>`
instead of exposing panic-prone mutation helpers, and production code
can no longer construct it through an unconstrained `From<Features>`
path.

The PR also hardens the `compact_resume_fork` integration coverage on
Windows. After the feature-management changes,
`compact_resume_after_second_compaction_preserves_history` was
overflowing the libtest/Tokio thread stacks on Windows, so the test now
uses an explicit larger-stack harness as a pragmatic mitigation. That
may not be the ideal root-cause fix, and it merits a parallel
investigation into whether part of the async future chain should be
boxed to reduce stack pressure instead.

## What Changed

Enterprises can now pin feature values in `requirements.toml` with the
requirements-side `features` table:

```toml
[features]
personality = true
unified_exec = false
```

Only canonical feature keys are allowed in the requirements `features`
table; omitted keys remain unconstrained.

- Added a requirements-side pinned feature map to
`ConfigRequirementsToml`, threaded it through source-preserving
requirements merge and normalization in `codex-config`, and made the
TOML surface use `[features]` (while still accepting legacy
`[feature_requirements]` for compatibility).
- Exposed `featureRequirements` from `configRequirements/read`,
regenerated the JSON/TypeScript schema artifacts, and updated the
app-server README.
- Wrapped the effective feature set in `ManagedFeatures`, backed by
`ConstrainedWithSource<Features>`, and changed its API to mirror
`Constrained<T>`: `can_set(...)`, `set(...) -> ConstraintResult<()>`,
and result-returning `enable` / `disable` / `set_enabled` helpers.
- Removed the legacy-usage and bulk-map passthroughs from
`ManagedFeatures`; callers that need those behaviors now mutate a plain
`Features` value and reapply it through `set(...)`, so the constrained
wrapper remains the enforcement boundary.
- Removed the production loophole for constructing unconstrained
`ManagedFeatures`. Non-test code now creates it through the configured
feature-loading path, and `impl From<Features> for ManagedFeatures` is
restricted to `#[cfg(test)]`.
- Rejected legacy feature aliases in enterprise feature requirements,
and return a load error when a pinned combination cannot survive
dependency normalization.
- Validated config writes against enterprise feature requirements before
persisting changes, including explicit conflicting writes and
profile-specific feature states that normalize into invalid
combinations.
- Updated runtime and TUI feature-toggle paths to use the constrained
setter API and to persist or apply the effective post-constraint value
rather than the requested value.
- Updated the `core_test_support` Bazel target to include the bundled
core model-catalog fixtures in its runtime data, so helper code that
resolves `core/models.json` through runfiles works in remote Bazel test
environments.
- Renamed the core config test coverage to emphasize that effective
feature values are normalized at runtime, while conflicting persisted
config writes are rejected.
- Ran `compact_resume_after_second_compaction_preserves_history` inside
an explicit 8 MiB test thread and Tokio runtime worker stack, following
the existing larger-stack integration-test pattern, to keep the Windows
`compact_resume_fork` test slice from aborting while a parallel
investigation continues into whether some of the underlying async
futures should be boxed.

## Verification

- `cargo test -p codex-config`
- `cargo test -p codex-core feature_requirements_ -- --nocapture`
- `cargo test -p codex-core
load_requirements_toml_produces_expected_constraints -- --nocapture`
- `cargo test -p codex-core
compact_resume_after_second_compaction_preserves_history -- --nocapture`
- `cargo test -p codex-core compact_resume_fork -- --nocapture`
- Re-ran the built `codex-core` `tests/all` binary with
`RUST_MIN_STACK=262144` for
`compact_resume_after_second_compaction_preserves_history` to confirm
the explicit-stack harness fixes the deterministic low-stack repro.
- `cargo test -p codex-core`
- This still fails locally in unrelated integration areas that expect
the `codex` / `test_stdio_server` binaries or hit existing `search_tool`
wiremock mismatches.

## Docs

`developers.openai.com/codex` should document the requirements-side
`[features]` table for enterprise and MDM-managed configuration,
including that it only accepts canonical feature keys and that
conflicting config writes are rejected.

Michael Bolin · 2026-03-04 04:40:22 +00:00

bfff0c729f

Adjust shell command timeouts for Windows (#11247 )

Summary
- add platform-aware defaults for shell command timeouts so Windows
tests get longer waits
- keep medium timeout longer on Windows to ensure flakiness is reduced

Testing
- Not run (not requested)

Dylan Hurd · 2026-02-09 20:03:32 -08:00

168c359b71

fix: increase timeout for tests that have been flaking with timeout issues (#8932 )

I have seen this test flake out sometimes when running the macOS build
using Bazel in CI: https://github.com/openai/codex/pull/8875. Perhaps
Bazel runs with greater parallelism, inducing a heavier load, causing an
issue?

Michael Bolin · 2026-01-08 20:31:03 +00:00

224c4867dd

feat(windows) start powershell in utf-8 mode (#7902 )

## Summary
Adds a FeatureFlag to enforce UTF8 encoding in powershell, particularly
Windows Powershell v5. This should help address issues like #7290.

Notably, this PR does not include the ability to parse `apply_patch`
invocations within UTF8 shell commands (calls to the freeform tool
should not be impacted). I am leaving this out of scope for now. We
should address before this feature becomes Stable, but those cases are
not the default behavior at this time so we're okay for experimentation
phase. We should continue cleaning up the `apply_patch::invocation`
logic and then can handle it more cleanly.

## Testing
- [x] Adds additional testing

Dylan Hurd · 2025-12-22 09:36:44 -08:00

33e1d0844a

chore(shell_command) fix freeform timeout output (#7791 )

## Summary
Adding an additional integration test for timeout_ms

## Testing
- [x] these are tests

Dylan Hurd · 2025-12-15 19:26:39 -08:00

b9d1a087ee

feat(core) Add login to shell_command tool (#6846 )

## Summary
Adds the `login` parameter to the `shell_command` tool - optional,
defaults to true.

## Testing
- [x] Tested locally

Dylan Hurd · 2025-12-05 11:03:25 -08:00

a8cbbdbc6e

17 Commits