codex

Remove test-support feature from codex-core and replace it with explicit test toggles (#11405 )

## Why

`codex-core` was being built in multiple feature-resolved permutations
because test-only behavior was modeled as crate features. For a large
crate, those permutations increase compile cost and reduce cache reuse.

## Net Change

- Removed the `test-support` crate feature and related feature wiring so
`codex-core` no longer needs separate feature shapes for test consumers.
- Standardized cross-crate test-only access behind
`codex_core::test_support`.
- External test code now imports helpers from
`codex_core::test_support`.
- Underlying implementation hooks are kept internal (`pub(crate)`)
instead of broadly public.
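
A minimal sketch of the pattern described above, assuming a toy crate layout: the `session` module, `Session` type, and both method names are hypothetical and not taken from `codex-core`. It shows an internal hook kept `pub(crate)` and reached from outside only through a single public `test_support` module, with no Cargo feature involved.

```rust
// Hypothetical miniature of a crate that exposes test-only hooks through
// one public module instead of a Cargo feature. All names are illustrative.
mod session {
    pub struct Session {
        deterministic_ids: bool,
    }

    impl Session {
        pub fn new() -> Self {
            Session { deterministic_ids: false }
        }

        // Implementation hook: visible inside this crate only.
        pub(crate) fn set_deterministic_ids(&mut self, enabled: bool) {
            self.deterministic_ids = enabled;
        }

        pub fn next_id(&self) -> String {
            if self.deterministic_ids {
                "id-0".to_string()
            } else {
                // Address-based placeholder for a non-deterministic id.
                format!("id-{:p}", self)
            }
        }
    }
}

pub mod test_support {
    use super::session::Session;

    // The single public entry point that external test code would import.
    pub fn session_with_deterministic_ids() -> Session {
        let mut session = Session::new();
        session.set_deterministic_ids(true);
        session
    }
}
```

Because the module is always compiled, every consumer sees the same crate shape, which is what lets the build cache be shared between test and non-test targets.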

## Outcome

- Fewer `codex-core` build permutations.
- Better incremental cache reuse across test targets.
- No intended production behavior change.

Michael Bolin · 2026-02-10 22:44:02 -08:00

476c1a7160

Remove WebSocket wire format (#10179 )

I'd like WireApi to go away (when chat is removed), and WebSockets is
still the Responses API, just over a different transport.

pakrym-oai · 2026-01-29 13:50:53 -08:00

fbb3a30953

Use test_codex more (#9961 )

Reduces boilerplate.

pakrym-oai · 2026-01-26 18:52:10 -08:00

998e88b12a

feat(core) update Personality on turn (#9644 )

## Summary
Support updating Personality mid-Thread via UserTurn/OverwriteTurn. This
is explicitly unused by the clients so far, to simplify PRs - app-server
and tui implementations will be follow-ups.

## Testing
- [x] added integration tests

Dylan Hurd · 2026-01-22 12:04:23 -08:00

8b3521ee77

Add collaboration_mode override to turns (#9408 )

Ahmed Ibrahim · 2026-01-16 21:51:25 -08:00

146d54cede

Add text element metadata to types (#9235 )

Initial type tweaking PR to make the diff of
https://github.com/openai/codex/pull/9116 smaller

This should not change any behavior, just adds some fields to types

charley-oai · 2026-01-14 16:41:50 -08:00

4a9c2bcc5a

Support response.done and add integration tests (#9129 )

Runs the agent loop over a persistent incremental WebSocket connection.

pakrym-oai · 2026-01-13 16:12:30 +00:00

2d56519ecd

renaming: task to turn (#8963 )

jif-oai · 2026-01-09 17:31:17 +00:00

1aed01e99f

remove get_responses_requests and get_responses_request_bodies to use in-place matcher (#8858 )

Ahmed Ibrahim · 2026-01-08 13:57:48 -08:00

0d3e673019

chore: unify conversation with thread name (#8830 )

Done and verified by Codex + refactor feature of RustRover

jif-oai · 2026-01-07 17:04:53 +00:00

116059c3a0

feat: agent controller (#8783 )

Added an agent control plane that lets sessions spawn or message other
conversations via `AgentControl`.

`AgentBus` (core/src/agent/bus.rs) keeps track of the last known status
of a conversation.

ConversationManager now holds shared state behind an Arc so AgentControl
keeps only a weak back-reference. The goal is just to avoid an explicit
reference cycle.
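
The ownership shape described above can be sketched as follows. This is an illustration only: `ManagerState`, `Manager`, `Control`, and their methods are hypothetical stand-ins, not the real `ConversationManager` or `AgentControl` types.

```rust
use std::sync::{Arc, Mutex, Weak};

// Hypothetical stand-in for the shared state a manager owns.
struct ManagerState {
    conversations: Mutex<Vec<String>>,
}

// The manager holds the only strong reference to the state.
struct Manager {
    state: Arc<ManagerState>,
}

// The controller holds a Weak pointer, so it never keeps the state alive
// and no strong reference cycle can form.
struct Control {
    manager: Weak<ManagerState>,
}

impl Manager {
    fn new() -> Self {
        Manager {
            state: Arc::new(ManagerState {
                conversations: Mutex::new(Vec::new()),
            }),
        }
    }

    fn control(&self) -> Control {
        Control {
            manager: Arc::downgrade(&self.state),
        }
    }
}

impl Control {
    // Registers a conversation and returns the new count,
    // or None once the manager has been dropped.
    fn spawn(&self, name: &str) -> Option<usize> {
        let state = self.manager.upgrade()?;
        let mut conversations = state.conversations.lock().unwrap();
        conversations.push(name.to_string());
        Some(conversations.len())
    }
}
```

Once the `Manager` is dropped, `upgrade()` returns `None`, so a stale controller fails cleanly instead of keeping the state alive.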

Follow-ups:
* Build a small tool in the TUI to be able to see every agent and send
manual messages to each of them
* Handle approval requests in this TUI
* Add tools to spawn/communicate between agents (see related design)
* Define agent types

jif-oai · 2026-01-06 19:08:02 +00:00

1dd1355df3

feat: introduce codex-utils-cargo-bin as an alternative to assert_cmd::Command (#8496 )

This PR introduces a `codex-utils-cargo-bin` utility crate that
wraps/replaces our use of `assert_cmd::Command` and
`escargot::CargoBuild`.

As you can infer from the introduction of `buck_project_root()` in this
PR, I am attempting to make it possible to build Codex under
[Buck2](https://buck2.build) as well as `cargo`. With Buck2, I hope to
achieve faster incremental local builds (largely due to Buck2's
[dice](https://buck2.build/docs/insights_and_knowledge/modern_dice/)
build strategy, as well as benefits from its local build daemon) as well
as faster CI builds if we invest in remote execution and caching.

See
https://buck2.build/docs/getting_started/what_is_buck2/#why-use-buck2-key-advantages
for more details about the performance advantages of Buck2.

Buck2 enforces stronger requirements in terms of build and test
isolation. It discourages assumptions about absolute paths (which is key
to enabling remote execution). Because the `CARGO_BIN_EXE_*` environment
variables that Cargo provides are absolute paths (which
`assert_cmd::Command` reads), this is a problem for Buck2, which is why
we need this `codex-utils-cargo-bin` utility.

My WIP-Buck2 setup sets the `CARGO_BIN_EXE_*` environment variables
passed to a `rust_test()` build rule as relative paths.
`codex-utils-cargo-bin` will resolve these values to absolute paths,
when necessary.
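
A rough sketch of the resolution step described above, assuming the relative value is resolved against some known base directory. The function names and the choice of base are illustrative assumptions, not the actual `codex-utils-cargo-bin` API.

```rust
use std::env;
use std::path::{Path, PathBuf};

// Hypothetical helper: returns `value` unchanged when it is already absolute,
// otherwise joins it onto `base`.
fn resolve_bin_path(value: &Path, base: &Path) -> PathBuf {
    if value.is_absolute() {
        value.to_path_buf()
    } else {
        base.join(value)
    }
}

// Hypothetical lookup: reads CARGO_BIN_EXE_<name> at runtime and resolves it.
// Returns None when the variable is not set.
fn cargo_bin(name: &str, base: &Path) -> Option<PathBuf> {
    let raw = env::var_os(format!("CARGO_BIN_EXE_{name}"))?;
    Some(resolve_bin_path(Path::new(&raw), base))
}
```

Under Cargo the variable already holds an absolute path, so the helper is a pass-through; only the relative values from a Buck2-style setup take the `join` branch.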


---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/8496).
* #8498
* __->__ #8496

Michael Bolin · 2025-12-23 19:29:32 -08:00

e61bae12e3

chore: migrate from Config::load_from_base_config_with_overrides to ConfigBuilder (#8276 )

https://github.com/openai/codex/pull/8235 introduced `ConfigBuilder` and
this PR updates all non-test call sites to use it instead of
`Config::load_from_base_config_with_overrides()`.

This is important because `load_from_base_config_with_overrides()` uses
an empty `ConfigRequirements`, which is a reasonable default for testing
so the tests are not influenced by the settings on the host. This method
is now guarded by `#[cfg(test)]` so it cannot be used by business logic.

Because `ConfigBuilder::build()` is `async`, many of the test methods
had to be migrated to be `async`, as well. On the bright side, this made
it possible to eliminate a bunch of `block_on_future()` stuff.
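
To illustrate the `#[cfg(test)]` guard mentioned above, here is a simplified sketch. The struct bodies, the `entries` field, and both constructor names are hypothetical stand-ins; the real `Config` and `ConfigRequirements` are much larger, and the real builder is `async`.

```rust
// Simplified stand-ins; the field and method names below are illustrative.
#[derive(Debug, Default, PartialEq)]
pub struct ConfigRequirements {
    pub entries: Vec<String>,
}

#[derive(Debug)]
pub struct Config {
    pub requirements: ConfigRequirements,
}

impl Config {
    // Production path: requirements are always supplied by the caller.
    pub fn with_requirements(requirements: ConfigRequirements) -> Self {
        Config { requirements }
    }

    // Test-only path: compiled only when this crate is built for tests,
    // so business logic cannot call it by accident.
    #[cfg(test)]
    pub fn with_empty_requirements() -> Self {
        Config {
            requirements: ConfigRequirements::default(),
        }
    }
}
```

In a non-test build the gated method does not exist, so any production call site that still used it would fail to compile.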

Michael Bolin · 2025-12-18 16:12:52 -08:00

3d4ced3ff5

fix parallel tool calls (#7956 )

Ahmed Ibrahim · 2025-12-16 01:28:27 +00:00

d802b18716

Reimplement skills loading using SkillsManager + skills/list op. (#7914 )

Refactor the way we load and manage skills:
1. Move skill discovery/caching into SkillsManager and reuse it across
sessions.
2. Add the skills/list API (Op::ListSkills/SkillsListResponse) to fetch
skills for one or more cwds. Also update app-server for VSCE/App.
3. Trigger skills/list during session startup so UIs preload skills and
handle errors immediately.

xl-openai · 2025-12-14 09:58:17 -08:00

5d77d4db6b

Inject SKILL.md when it's explicitly mentioned. (#7763 )

1. Skills load once in core at session start; the cached outcome is
reused across core and surfaced to TUI via SessionConfigured.
2. TUI detects explicit skill selections, and core injects the matching
SKILL.md content into the turn when a selected skill is present.

xl-openai · 2025-12-10 13:59:17 -08:00

b36ecb6c32

make model optional in config (#7769 )

- Make Config.model optional and centralize default-selection logic in
ModelsManager, including a default_model helper (with
codex-auto-balanced when available) so sessions now carry an explicit
chosen model separate from the base config.
- Resolve `model` once in `core` and `tui` from config. Then store the
state of it on other structs.
- Move refreshing models to be before resolving the default model
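
A small sketch of the default-selection logic described above, under stated assumptions: the function signatures are hypothetical, and falling back to the first available model when `codex-auto-balanced` is absent is an assumption made for illustration, not something the commit states.

```rust
const PREFERRED_DEFAULT: &str = "codex-auto-balanced";

// Hypothetical default helper: prefer the auto-balanced model when it is
// available, otherwise fall back to the first known model (an assumption).
fn default_model(available: &[&str]) -> Option<String> {
    if available.contains(&PREFERRED_DEFAULT) {
        return Some(PREFERRED_DEFAULT.to_string());
    }
    available.first().map(|model| model.to_string())
}

// Resolve once: an explicitly configured model wins, otherwise use the default.
fn resolve_model(configured: Option<&str>, available: &[&str]) -> Option<String> {
    match configured {
        Some(model) => Some(model.to_string()),
        None => default_model(available),
    }
}
```

Resolving once and storing the result keeps the optional config value separate from the concrete model a session actually runs with.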

Ahmed Ibrahim · 2025-12-10 11:19:00 -08:00

cb9a189857

remove model_family from `config` (#7571 )

- Remove `model_family` from `config`
- Make sure to still override config elements related to `model_family`
like supporting reasoning

Ahmed Ibrahim · 2025-12-04 11:57:58 -08:00

9b2055586d

Migrate model family to models manager (#7565 )

This PR moves `ModelsFamily` to `openai_models`. It also propagates
`ModelsManager` to session services and uses it to drive the model
family. We also make `derive_default_model_family` private because it's
a step towards what we want: one place that gives model configuration.

This is a second step toward having one source of truth for models
information and config: `ModelsManager`.

Next steps would be to remove `ModelsFamily` from config. That is a
large change because it is used in 41 places, mostly before launching
`codex`. Also, we need to make `find_family_for_model` private. That is
also a large change because it is used in 21 places, almost all of them
tests.

Ahmed Ibrahim · 2025-12-03 18:49:47 -08:00

cee37a32b2

fix(apply_patch) tests for shell_command (#7307 )

## Summary
Adds test coverage for invocations of apply_patch via shell_command with
heredoc, to validate behavior.

## Testing
- [x] These are tests

Dylan Hurd · 2025-12-01 15:09:22 -08:00

5b25915d7e

feat: remote compaction (#6795 )

Co-authored-by: pakrym-oai <pakrym@openai.com>

jif-oai · 2025-11-18 16:51:16 +00:00

838531d3e4

chore(core) Add shell_serialization coverage (#6810 )

## Summary
Similar to #6545, this PR updates the shell_serialization test suite to
cover the various `shell` tool invocations we have. Note that this does
not cover unified_exec, which has its own suite of tests. This should
provide some test coverage for when we eventually consolidate
serialization logic.

## Testing
- [x] These are tests

Dylan Hurd · 2025-11-17 19:10:56 -08:00

2b7378ac77

Promote shared helpers for suite tests (#6460 )

## Summary
- add `TestCodex::submit_turn_with_policies` and extend the response
helpers with reusable tool-call utilities
- update the grep_files, read_file, list_dir, shell_serialization, and
tools suites to rely on the shared helpers instead of local copies
- make the list_dir helper return `anyhow::Result` so clippy no longer
warns about `expect`

## Testing
- `just fix -p codex-core`
- `cargo test -p codex-core --test all
suite::grep_files::grep_files_tool_collects_matches`
- `cargo test -p codex-core
suite::grep_files::grep_files_tool_collects_matches -- --ignored`
(filter requests ignored tests so nothing runs, but the build stays
clean)


------
[Codex
Task](https://chatgpt.com/codex/tasks/task_i_69112d53abac83219813cab4d7cb6446)

Ahmed Ibrahim · 2025-11-13 17:12:10 -08:00

2a6e9b20df

chore(core) Consolidate apply_patch tests (#6545 )

## Summary
Consolidates our apply_patch tests into one suite, and ensures each test
case tests the various ways the harness supports apply_patch:
1. Freeform custom tool call
2. JSON function tool
3. Simple shell call
4. Heredoc shell call

There are a few test cases that are specific to a particular variant,
I've left those alone.

## Testing
- [x] This adds a significant number of tests

Dylan Hurd · 2025-11-13 15:52:39 -08:00

2c1b693da4

Migrate prompt caching tests to test_codex (#6605 )

To hopefully fix the flakiness

pakrym-oai · 2025-11-13 09:19:38 -08:00

041d6ad902

Set verbosity to low for 5.1 (#6568 )

And improve test coverage

pakrym-oai · 2025-11-13 01:40:52 +00:00

f97874093e

chore: testing on freeform apply_patch (#5952 )

## Summary
Duplicates the tests in `apply_patch_cli.rs`, but tests the freeform
apply_patch tool as opposed to the function call path. The good news is
that all the tests pass with zero logical changes, with the exception of
the heredoc, which doesn't really make sense in the freeform tool
context anyway.

@jif-oai since you wrote the original tests in #5557, I'd love your
opinion on the right way to DRY these test cases between the two. Happy
to set up a more sophisticated harness, but didn't want to go down the
rabbit hole until we agreed on the right pattern

## Testing
- [x] These are tests

Dylan Hurd · 2025-10-30 10:40:48 -07:00

4a55646a02

feat: deprecation warning (#5825 )

<img width="955" height="311" alt="Screenshot 2025-10-28 at 14 26 25"
src="https://github.com/user-attachments/assets/99729b3d-3bc9-4503-aab3-8dc919220ab4"
/>

jif-oai · 2025-10-29 12:29:28 +00:00

060637b4d4

chore: testing on apply_patch (#5557 )

jif-oai · 2025-10-23 17:00:48 +01:00

6745b12427

Add a baseline test for resume initial messages (#5466 )

pakrym-oai · 2025-10-21 11:45:01 -07:00

5cd8803998

fix: apply_patch shell_serialization tests (#4786 )

## Summary
Adds additional shell_serialization tests specifically for apply_patch
and other cases.

## Test Plan
- [x] These are all tests

Dylan · 2025-10-14 13:00:49 -07:00

0a0a10d8b3

[MCP] Add support for MCP Oauth credentials (#4517 )

This PR adds OAuth login support for streamable HTTP servers when
`experimental_use_rmcp_client` is enabled.

This PR is large but represents the minimal amount of work required for
this to work. To keep this PR smaller, login can only be done with
`codex mcp login` and `codex mcp logout` but it doesn't appear in `/mcp`
or `codex mcp list` yet. Fingers crossed that this is the last large MCP
PR and that subsequent PRs can be smaller.

Under the hood, credentials are stored using platform credential
managers using the [keyring crate](https://crates.io/crates/keyring).
When the keyring isn't available, it falls back to storing credentials
in `CODEX_HOME/.credentials.json` which is consistent with how other
coding agents handle authentication.
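
The fallback behavior described above can be sketched roughly as follows. This does not use the `keyring` crate's API: `CredentialStore`, `UnavailableKeyring`, `FileStore`, and `FallbackStore` are hypothetical types, and the file store is modeled in memory to keep the example self-contained.

```rust
use std::collections::HashMap;

// Hypothetical abstraction over a place to keep credentials.
trait CredentialStore {
    fn save(&mut self, server: &str, token: &str) -> Result<(), String>;
    fn load(&self, server: &str) -> Option<String>;
}

// Stand-in for a platform keyring that is not available on this host.
struct UnavailableKeyring;

impl CredentialStore for UnavailableKeyring {
    fn save(&mut self, _server: &str, _token: &str) -> Result<(), String> {
        Err("keyring unavailable".to_string())
    }
    fn load(&self, _server: &str) -> Option<String> {
        None
    }
}

// In-memory stand-in for a credentials file on disk.
#[derive(Default)]
struct FileStore {
    entries: HashMap<String, String>,
}

impl CredentialStore for FileStore {
    fn save(&mut self, server: &str, token: &str) -> Result<(), String> {
        self.entries.insert(server.to_string(), token.to_string());
        Ok(())
    }
    fn load(&self, server: &str) -> Option<String> {
        self.entries.get(server).cloned()
    }
}

// Tries the primary store first and uses the fallback only on failure.
struct FallbackStore<P, F> {
    primary: P,
    fallback: F,
}

impl<P: CredentialStore, F: CredentialStore> CredentialStore for FallbackStore<P, F> {
    fn save(&mut self, server: &str, token: &str) -> Result<(), String> {
        match self.primary.save(server, token) {
            Ok(()) => Ok(()),
            Err(_) => self.fallback.save(server, token),
        }
    }
    fn load(&self, server: &str) -> Option<String> {
        match self.primary.load(server) {
            Some(token) => Some(token),
            None => self.fallback.load(server),
        }
    }
}
```

Keeping both stores behind one trait means callers do not need to know which backend ended up holding the credentials.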

I tested this on macOS, Windows, WSL (Ubuntu), and Linux. I wasn't able
to test the dbus store on Linux but did verify that the fallback works.

One quirk is that during development every build produces its own
ad-hoc binary, so if you have stored credentials the keyring won't
recognize the reader as being the same as the writer and may ask for
the user's password. I may add an override to disable this or allow
users/enterprises to opt out of the keyring storage if it causes issues.

<img width="5064" height="686" alt="CleanShot 2025-09-30 at 19 31 40"
src="https://github.com/user-attachments/assets/9573f9b4-07f1-4160-83b8-2920db287e2d"
/>
<img width="745" height="486" alt="image"
src="https://github.com/user-attachments/assets/9562649b-ea5f-4f22-ace2-d0cb438b143e"
/>

Gabriel Peal · 2025-10-03 13:43:12 -04:00

1d17ca1fa3

Add notifier tests (#4064 )

Proposal:
1. Use anyhow for tests and avoid unwrap
2. Extract a helper for starting a test instance of codex

pakrym-oai · 2025-09-23 14:25:46 +00:00

5c7d9e27b1

33 Commits