codex

Route image extension reads through turn environments v2 (#27498 )

## Why

Image generation used `std::fs::read` for referenced image paths, which
did not support environment-backed filesystems or their sandbox context.

## What changed

- Expose optional turn environments to extension tool calls.
- Include each environment’s ID, working directory, filesystem, and
sandbox context.
- Read referenced images through the selected environment filesystem.
- Keep sandbox usage at the extension call site so extensions can choose
the appropriate access mode.
- Consolidate image request construction into one async function.
- Add coverage for successful environment reads and read failures.

## Validation

- `cargo check -p codex-image-generation-extension --tests`
- `just fmt`
- `just bazel-lock-update`
- `just bazel-lock-check`

`just test -p codex-image-generation-extension` could not complete
because the build exhausted available disk space.

Won Park · 2026-06-11 16:32:52 -07:00

19ce6394af

skills: expose remote skill resource tools (#27388 )

## Why

PR #27387 makes backend plugin skills discoverable and invocable without
an executor, but resources referenced by those skills still sit behind
the generic MCP resource surface. The model needs a skills-owned API
that preserves the provider authority and package boundary instead of
treating remote resources like local files.

This is stacked on #27387.

## What

- Adds one `skills` namespace with bounded `list` and `read` tools for
remote skill providers.
- Revalidates `authority + package` against the live remote catalog on
every read, then routes the opaque resource ID back through that
provider.
- Allows the backend provider to read canonical child `skill://`
resources while rejecting cross-package, non-canonical, query, fragment,
and traversal-shaped URIs.
- Caps each serialized tool result at 8 KB. Lists are paginated; reads
return an opaque continuation cursor.
- Marks the JSON output as external context so memory generation can
apply its normal suppression policy.
- Deliberately does not add `skills.search`; that waits for a bounded
plugin-service search contract.

## Tool contract

Pseudo-Python matching the wire shape:

```python
from typing import Literal, NotRequired, TypedDict


class RemoteSkillAuthority(TypedDict):
    kind: Literal["remote"]
    id: str  # e.g. "codex_apps"


class RemoteSkill(TypedDict):
    authority: RemoteSkillAuthority
    package: str  # opaque provider-owned package ID
    name: str
    description: str
    main_resource: str  # opaque provider-owned SKILL.md ID


class SkillsListParams(TypedDict):
    cursor: NotRequired[str]


class SkillsListResult(TypedDict):
    skills: list[RemoteSkill]
    next_cursor: str | None
    warnings: list[str]
    truncated: bool


class SkillsReadParams(TypedDict):
    authority: RemoteSkillAuthority  # copied from skills.list
    package: str  # copied from skills.list
    resource: str  # provider-owned child resource ID
    cursor: NotRequired[str]  # copy next_cursor to continue


class SkillsReadResult(TypedDict):
    resource: str
    contents: str
    next_cursor: str | None
    truncated: bool


class Skills:
    def list(self, params: SkillsListParams) -> SkillsListResult: ...
    def read(self, params: SkillsReadParams) -> SkillsReadResult: ...
```

There is one namespace for all remote skills, not one tool or MCP server
per skill. No resource ID is converted into a filesystem path.

## Backend dependency

`/ps/mcp` must support direct reads of child resources such as
`skill://plugin_demo/deploy/references/deploy.md`. This PR implements
and tests the Codex side of that contract; production child reads remain
dependent on the corresponding plugin-service support. Search remains
out of scope until that service exposes a bounded search/resource API.

## Validation

- Added an app-server integration test covering `skills.list` followed
by `skills.read` with no executor.
- Ran `just fmt`.
- Ran `just bazel-lock-update` and `just bazel-lock-check`.
- Did not run Rust tests or Clippy locally, per request; CI will run
them.

jif · 2026-06-11 12:38:04 +02:00

7b3eb8e4bc

[codex] Add comp_hash to model metadata (#27532 )

## Summary
- add optional `comp_hash` metadata to `ModelInfo`
- update `ModelInfo` fixtures for the shared schema change
- keep older model responses compatible by defaulting the field to
`None`

## Why
The models endpoint needs an opaque identifier for compaction-compatible
model configurations. This PR only exposes that value in model metadata;
it does not add it to turn context or change runtime behavior.

Follow-up #27520 carries the value through turn context and rollouts,
then uses it to trigger compaction.

## Stack
- based directly on `main`
- replaces #27519, which was accidentally merged into the wrong base
branch
- functionality follow-up: #27520

## Testing
- `just test -p codex-protocol
model_info_defaults_availability_nux_to_none_when_omitted`
- `just fix -p codex-core -p codex-protocol -p codex-analytics -p
codex-models-manager`

Ahmed Ibrahim · 2026-06-10 20:42:55 -07:00

e614fad02e

tools: simplify default tool search text (#27526 )

## Why

Default tool search text currently derives identity from both `ToolName`
and `ToolSpec`. For function and namespace specs, this indexes the same
names more than once and also adds a flattened `{namespace}{name}` token
that is not model-visible.

## What changed

- Derive default search text entirely from `ToolSpec` while preserving
names, descriptions, namespace metadata, and recursive schema metadata.
- Keep the default search-text builder private and remove the unused
`ToolName` argument.
- Add coverage for the exact search text generated for a namespaced tool
with nested schema metadata.

## Example

For the `codex_app` namespace and `automation_update` tool (schema terms
omitted):

- Before: `codex_appautomation_update automation update codex_app
codex_app Manage Codex automations. automation_update automation update
...`
- After: `codex_app Manage Codex automations. automation_update
automation update ...`

## Testing

- `just test -p codex-tools`

sayan-oai · 2026-06-11 03:37:25 +00:00

728b8243a9

[plugins] Inject remote_plugin_id into install elicitations (#26409 )

Summary
- Propagate cached remote plugin IDs through Codex plugin discovery.
- Inject `remote_plugin_id` and connector IDs into
`request_plugin_install` elicitation `_meta` from the resolved plugin.
- Keep the remote plugin ID out of the model-facing tool schema,
arguments, and result.

Validation
- `just test -p codex-tools`
- `just test -p codex-core-plugins`
- `just test -p codex-core
list_tool_suggest_discoverable_plugins_includes_cached_remote_global_plugins`
- `just fix -p codex-tools`
- `just fix -p codex-core-plugins`
- `just fix -p codex-core`
- `git diff --check`
- `just test -p codex-core` was also attempted: 2,581 passed, 55 failed,
and 1 timed out across unrelated sandbox/environment-sensitive
integration tests.

Alex Daley · 2026-06-10 12:01:03 -07:00

020bf49346

[codex] Remove async_trait from ToolExecutor (#27304 )

## Why

We're now [discouraging use of
`async_trait`](https://github.com/openai/codex/pull/20242).

Removing use of `async_trait` from `ToolExecutor` yields a `codex_core`
debug test build speedup of ~78% (from 227.5s to 50.3s) on my machine.

Stacked on #27299, this PR applies the trait change after the handler
bodies have been outlined.

## What

Changed `ToolExecutor::handle` to return an explicit boxed
`ToolExecutorFuture` instead of using `async_trait`.

Updated ToolExecutor implementors to return `Box::pin(...)`, reexported
the future alias through `codex-tools` and `codex-extension-api`, and
removed `codex-tools` direct `async-trait` dependency.

Adam Perry @ OpenAI · 2026-06-10 10:26:53 -07:00

2704ecea9a

chore: preserve one more schema layer during large tool compaction (#27084 )

## Summary

Some customer MCP tools expose large input schemas that exceed Codex's
compact schema budget even after description stripping. Today, the final
compaction pass collapses complex schemas starting at depth 2, which can
erase important shallow call structure such as small `anyOf` branches,
required fields, and help-mode entry points. In one reported case, this
degraded a tool schema into `query: any | any`, leaving the model
without enough structure to discover the required help call.

This change raises the deep-schema collapse boundary from depth 2 to
depth 3. That preserves one additional layer of the tool contract while
still collapsing deeper expensive subtrees to `{}` when a schema remains
over budget.

## What Changed

- Increased `MAX_COMPACT_TOOL_SCHEMA_DEPTH` from `2` to `3`.
- Updated the schema compaction traversal test to assert the new
collapse boundary.
- The resulting compacted shape keeps useful shallow structure, for
example:
  - top-level argument names
  - shallow `anyOf` branches
  - required object fields
  - nested property names one level deeper than before

## Validation

- Ran `just test -p codex-tools`: 81 tests passed.
- Ran a golden schema corpus comparison over 214 discovered tool input
schemas under `golden_schemas/*/mcp_tools/*/input_schema.json`.
- Depth 2 and depth 3 had identical percentile token counts across the
corpus.
  - Both ended with `0 / 214` schemas over 1k tokens.
- Both ended with `0 / 214` schemas over the 4,000-byte compact JSON
budget.
- Only one golden schema changed, increasing from 49 to 56 tokens, so
this does not appear to introduce a meaningful corpus-wide regression.

Corpus percentile results:

| Percentile | Depth 2 | Depth 3 |
|---|---:|---:|
| p0 | 9 | 9 |
| p10 | 31 | 31 |
| p25 | 54 | 54 |
| p50 | 81 | 81 |
| p75 | 143 | 143 |
| p90 | 290 | 290 |
| p95 | 431 | 431 |
| p99 | 600 | 600 |
| max | 832 | 832 |

Celia Chen · 2026-06-08 23:07:56 +00:00

6042e5810e

feat: support oneOf and allOf in tool input schemas (#24118 )

## Why

Some connector golden schemas use JSON Schema composition keywords
beyond `anyOf`, specifically top-level or nested `oneOf` and `allOf`.
Codex currently needs to preserve those shapes when parsing MCP tool
input schemas so connector tools do not lose valid schema structure
during normalization.

To prevent an increased Responses API error rate, this PR will be merged
after the Responses API supports top-level `oneOf`/`allOf`.

## What Changed

- Adds `oneOf` and `allOf` support to `JsonSchema`, matching the
existing `anyOf` handling.
- Traverses `oneOf` and `allOf` anywhere schema children are visited,
including sanitization, definition reachability, description stripping,
and deep schema compaction.
- Adds a final large-schema compaction pass that prunes schema objects
containing `anyOf`, `oneOf`, or `allOf` to `{}` if earlier compaction
passes still leave the schema over budget.

## Validation
Golden schema token validation over `2,025` schemas under
`golden_schemas`, all parsed successfully. Token count is `o200k_base`
over compact JSON from `parse_tool_input_schema`.

| Percentile | Before PR | After oneOf/allOf | After pruning |
|---|---:|---:|---:|
| p0 | 9 | 9 | 9 |
| p10 | 63 | 64 | 64 |
| p25 | 86 | 87 | 87 |
| p50 | 125 | 128 | 128 |
| p75 | 203 | 206 | 206 |
| p90 | 327 | 333 | 333 |
| p95 | 460 | 473 | 473 |
| p99 | 763 | 779 | 779 |
| max | 891 | 955 | 955 |

Totals:

| Parser state | Total tokens |
|---|---:|
| Before PR | 345,713 |
| After oneOf/allOf | 352,686 |
| After pruning | 352,686 |

The pruning column matches the oneOf/allOf column for this corpus
because no parsed compact golden schema remains over the `4,000`
compact-byte budget after the earlier compaction passes.

Celia Chen · 2026-06-08 21:41:05 +00:00

b89d91f6ff

[codex] Exclude external tool output from memories (#26821 )

## Summary

- add contains_external_context() to tool output so other tools can be
opted out of influencing memory when disable_on_external_context=true
- Classify standalone web-search output as external context (to match
behavior as hosted web search)
- Verify with integration test

rka-oai · 2026-06-08 16:53:04 +00:00

26d9329833

Encrypt multi-agent v2 message payloads (#26210 )

## Why

Multi-agent v2 currently routes agent instructions through normal tool
arguments and inter-agent context. That means the parent model can emit
plaintext task text, Codex can persist it in history/rollouts, and the
recipient can receive it as ordinary assistant-message JSON.

This changes the v2 path so agent instructions stay encrypted between
model calls: Responses encrypts the `message` argument returned by the
model, Codex forwards only that ciphertext, and Responses decrypts it
internally for the recipient model.

## What changed

- Mark the v2 `message` parameter as encrypted for `spawn_agent`,
`send_message`, and `followup_task`.
- Treat multi-agent v2 tool `message` values as ciphertext
unconditionally.
- Store v2 inter-agent task text in
`InterAgentCommunication.encrypted_content` with empty plaintext
`content`.
- Convert encrypted inter-agent communications into the Responses
`agent_message` input item before sending the child request.
- Preserve `agent_message` items across history, rollout, compaction,
telemetry, and app-server schema paths.
- Leave multi-agent v1 unchanged.

## Message shape

The model still calls the v2 tools with a `message` argument, but that
value is now ciphertext:

```json
{
  "name": "spawn_agent",
  "arguments": {
    "task_name": "worker",
    "message": "<ciphertext>"
  }
}
```

Codex stores the task as encrypted inter-agent communication:

```json
{
  "author": "/root",
  "recipient": "/root/worker",
  "content": "",
  "encrypted_content": "<ciphertext>",
  "trigger_turn": true
}
```

When Codex builds the recipient request, it forwards the ciphertext
using the new Responses input item:

```json
{
  "type": "agent_message",
  "author": "/root",
  "recipient": "/root/worker",
  "content": [
    {
      "type": "encrypted_content",
      "encrypted_content": "<ciphertext>"
    }
  ]
}
```

Responses decrypts that item internally for the recipient model.

## Context impact

- Parent context no longer carries plaintext v2 agent task instructions
from these tool arguments.
- Codex rollout/history stores ciphertext for v2 agent instructions.
- Recipient requests receive an `agent_message` item instead of
assistant commentary JSON for encrypted task delivery.
- Plaintext completion/status notifications are still plaintext because
they are Codex-generated status messages, not encrypted model tool
arguments.

## Validation

- `just test -p codex-tools`
- `just test -p codex-protocol`
- `just test -p codex-rollout`
- `just test -p codex-rollout-trace`
- `just test -p codex-otel`
- `just write-app-server-schema`

jif · 2026-06-05 10:25:57 +02:00

5f4d06ef18

[codex] Add use_responses_lite 'override' logic (#26487 )

## Summary

- add a defaulted `ModelInfo.use_responses_lite` catalog field
- support serializing `reasoning.context` while preserving the existing
effort and summary path
- has not been turned on for any models yet

I've added an override to parallel tools if responses_lite is on. I've
also forced persistent reasoning when using responses_lite. It would be
ideal if we could centralize all the responses_lite plumbing, but I
think this is best for now to keep the plumbing & diffs small.

## Testing

- `cargo test -p codex-protocol
model_info_defaults_availability_nux_to_none_when_omitted`
- `RUST_MIN_STACK=8388608 cargo test -p codex-core
responses_lite_sets_all_turns_context_and_disables_parallel_tool_calls`
- `RUST_MIN_STACK=8388608 cargo test -p codex-core
configured_reasoning_summary_is_sent`
- `cargo check -p codex-core --tests`
- `RUST_MIN_STACK=8388608 cargo clippy -p codex-core --tests` (passes
with pre-existing warnings in `codex-code-mode` and
`codex-core-plugins`)

rka-oai · 2026-06-04 18:49:51 -07:00

e0096db6dc

Route standalone image generation through host finalization md (#25176 )

## Why

Standalone image-generation extensions emitted turn items through the
low-level event path, bypassing host-owned finalization such as image
persistence and contributor processing. At the same time, the
generated-image save-path hint must remain visible to the model through
the extension tool's `FunctionCallOutput`, rather than the legacy
built-in developer-message path.

## What changed

- Extended `ExtensionTurnItem` to support image-generation items while
keeping the extension-facing emitter API limited to `emit_started` and
`emit_completed`.
- Routed extension completion through core `finalize_turn_item`, so
standalone image-generation items receive host-owned processing and
persisted `saved_path` values before publication.
- Kept legacy built-in image generation on its existing
developer-message hint path, while standalone image generation returns
its deterministic saved-path hint in `FunctionCallOutput`.
- Shared the image artifact path and output-hint formatting used by core
and the image-generation extension.
- Passed thread identity through extension tool calls so standalone
image generation can construct the same intended artifact path as core.
- Added an app-server integration test covering real standalone image
generation, saved artifact publication, model-visible output hint
wiring, and absence of the legacy developer-message hint.

## Validation

- `just fmt`
- `just test -p codex-image-generation-extension`
- `just test -p codex-web-search-extension`
- `just test -p codex-goal-extension`
- `just test -p codex-memories-extension`
- Targeted `codex-core` tests for image save history, extension
completion finalization, and contributor execution
- `just test -p codex-app-server
standalone_image_generation_returns_saved_path_hint_to_model`
- `just fix -p codex-core`
- `just fix -p codex-image-generation-extension`
- `just bazel-lock-update`
- `just bazel-lock-check`

Won Park · 2026-06-02 12:00:04 -07:00

57f337a8e9

Add multi-agent runtime metadata types (#25720 )

Stack split from #25708. Original PR intentionally left open. This first
PR adds the multi-agent runtime metadata types and catalog plumbing used
by the rest of the stack.

jif-oai · 2026-06-02 12:10:14 +02:00

3f1fb7ed8b

Move tool search metadata onto ToolExecutor (#25684 )

Deferred tools need to be searchable even when they are not implemented
inside `codex-core`. Extension-provided tools can be registered for
later discovery, but the search metadata path was still owned by
core-specific runtime hooks, which meant the shared `ToolExecutor`
abstraction could not describe how a deferred extension tool should
appear in `tool_search`.

## Changes

- Move `ToolSearchEntry` and `ToolSearchInfo` into `codex-tools` and
re-export them from the shared tools crate.
- Add a default `ToolExecutor::search_info` implementation that derives
loadable tool-search metadata from function and namespace specs.
- Forward search metadata through extension adapters and exposure
overrides while keeping custom search text/source metadata for dynamic,
MCP, and multi-agent tools.
- Remove the old core-local `tool_search_entry` module now that search
metadata lives with the shared executor APIs.

## Testing

- Added `deferred_extension_tools_are_discoverable_with_tool_search`
coverage in `core/src/tools/spec_plan_tests.rs`.

jif-oai · 2026-06-02 00:24:41 +02:00

8d720feb69

feat: gate unified exec zsh fork composition (#24979 )

## Why

`shell_zsh_fork` and unified exec need to remain independently
controllable for enterprise rollouts, but we also need a third mode that
composes them. That composed mode is intended to preserve unified exec
command lifecycle support while letting the zsh fork provide more
accurate `execv(2)` interception.

Enabling `unified_exec_zsh_fork` by itself is intentionally not
sufficient. It is a composition gate, not a dependency-enabling
shortcut:

- `unified_exec` selects the PTY-backed unified exec tool.
- `shell_zsh_fork` opts into the zsh fork backend.
- `unified_exec_zsh_fork` only allows those two already-enabled modes to
be composed so local zsh unified exec commands can launch through the
zsh fork.

This separation is deliberate. Enterprises and staged rollouts must be
able to enable or disable unified exec and zsh-fork independently. If
`unified_exec_zsh_fork` implied either dependency, then enabling one
under-development composition flag would silently activate a shell
backend that the configured feature set left disabled.

This PR introduces only the configuration and planning gate for that
composition. Existing `shell_zsh_fork` behavior continues to use the
standalone shell tool unless the new composition feature is explicitly
enabled alongside both dependencies.

## What Changed

- Added the under-development feature flag `unified_exec_zsh_fork`.
- Added `UnifiedExecFeatureMode` so the three input feature flags
collapse into `Disabled`, `Direct`, or `ZshFork` mode before tool
planning.
- Updated tool selection so zsh-fork composition requires
`unified_exec`, `shell_zsh_fork`, and `unified_exec_zsh_fork`.
- Kept the existing standalone zsh-fork shell tool behavior when only
`shell_zsh_fork` is enabled.
- Updated config schema output for the new feature flag.

## Verification

- Added feature and tool-config coverage for the new gate.
- Added planner coverage proving `shell_zsh_fork` remains standalone
until composition is explicitly enabled.
- Ran focused tests for `codex-features`, `codex-tools`, and the
affected `codex-core` planner case.





---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/24979).
* #24982
* #24981
* #24980
* __->__ #24979

Michael Bolin · 2026-06-01 13:01:36 -07:00

d6748f741a

[codex-rs] auto-review model override (#23767 )

## Why

Guardian auto-review normally uses the provider-preferred review model
when one is available. Some parent models need model-catalog metadata to
select a different review model while keeping older `/models` payloads
compatible when that metadata is absent.

## What changed

- Added optional `ModelInfo::auto_review_model_override` metadata to the
public model payload as a review-model slug.
- Updated Guardian review model selection to prefer the catalog override
when present, while preserving the existing provider preferred-model
path and parent-model fallback when it is omitted.
- Added focused Guardian coverage for override and no-override model
selection.
- Added an `auto_review` core integration suite test that loads override
metadata from a remote model catalog path and asserts the strict
auto-review `/responses` request uses the catalog-selected review model.
- Updated existing `ModelInfo` fixtures and local catalog constructors
for the new optional field.

## Validation

- `cargo test -p codex-protocol
model_info_defaults_availability_nux_to_none_when_omitted`
- `cargo test -p codex-core guardian_review_uses_`
- `cargo test -p codex-core
remote_model_override_uses_catalog_model_for_strict_auto_review --test
all`
- `just fix -p codex-protocol`
- `just fix -p codex-core`
- `just fmt`
- `git diff --check`

Won Park · 2026-06-01 11:51:15 -07:00

f1609d9fb6

nit: drop todo (#25655 )

jif-oai · 2026-06-01 19:48:29 +02:00

e6eb462f07

[codex] Require model for standalone web search (#25131 )

## Why

The standalone `/v1/alpha/search` request now requires a `model`, but
the `web.run` extension currently omits it.

Adds `model` to extension `ToolCall` invocation.

Follow-up to #23823.

## What changed

- Make `SearchRequest.model` required.
- Expose the effective per-turn model on extension tool calls and pass
it in standalone web-search requests.
- Assert the model is forwarded in the app-server round-trip test.

## Testing

- `just test -p codex-api -p codex-tools -p codex-web-search-extension
-p codex-memories-extension -p codex-goal-extension`
- `just test -p codex-core -E
'test(passes_turn_fields_and_scoped_turn_item_emitter_to_extension_call)'`
- `just test -p codex-app-server -E
'test(standalone_web_search_round_trips_encrypted_output)'`

sayan-oai · 2026-05-29 12:03:04 -07:00

1f93706e99

Route extension image generation through the native image completion pipeline (#24972 )

## Why

The standalone `image_gen.imagegen` extension should behave like native
image generation for artifact persistence and UI completion, while
returning its save-location guidance as part of the tool result instead
of injecting a developer message.

## What Changed

- Added an image-generation completion hook for extension tools so core
can persist generated images and emit the existing `ImageGeneration`
lifecycle events.
- Reused core image artifact persistence for extension output and
removed extension-local save-path/file-writing logic.
- Split shared image persistence from built-in finalization so native
image generation keeps its existing developer-message instruction
behavior.
- Returned the generated image save-location instruction through the
extension `FunctionCallOutput`, alongside the generated image input for
model follow-up.
- Preserved the existing image-generation event shape for current UI and
replay compatibility.
- Avoided cloning the full generated-image base64 payload when emitting
the in-progress image item.
- Removed dependencies no longer needed after moving persistence out of
the extension crate.

## Fast Follow
- Adjust the existing Extension API and add a general `TurnItem`
finalization path for re-usability of code

## Validation

- Ran `just fmt`.
- Ran `just bazel-lock-update`.
- Ran `just bazel-lock-check`.
- Ran `just test -p codex-tools -p codex-extension-api -p
codex-image-generation-extension`.
- Ran `just test -p codex-core
image_generation_publication_is_finalized_by_core`.
- Ran `just test -p codex-core
handle_output_item_done_records_image_save_history_message`.
- Ran `just fix -p codex-tools -p codex-extension-api -p codex-core -p
codex-image-generation-extension`.

Won Park · 2026-05-29 17:33:13 +00:00

10b0399034

[codex] Add model tool mode selector (#25031 )

## Why
Some models need to select their code-execution behavior through model
catalog metadata. Models without that metadata must continue to follow
the existing `CodeMode` and `CodeModeOnly` feature flags, including when
a newer server sends an enum value this client does not recognize.

## What changed
- add optional `ModelInfo.tool_mode` metadata with `direct`,
`code_mode`, and `code_mode_only`
- treat omitted and unknown wire values as `None`
- resolve `None` from the existing feature flags
- carry the resolved `ToolMode` directly on `TurnContext`, outside
`Config`
- use the resolved value for turn creation, model switches, review
turns, tool planning, and code execution

## Coverage
- add protocol coverage for omitted, known, and unknown enum values
- add focused coverage for flag fallback and explicit metadata
overriding feature flags
- add core integration coverage that fetches remote model metadata
through `/v1/models` and verifies the outbound `/responses` tools for
explicit `direct` and `code_mode_only` selectors

## Stack
- followed by #25032

Ahmed Ibrahim · 2026-05-29 09:05:05 -07:00

5577a9e148

extension-api: add TurnItemEmitter to tool calls (#24813 )

## Why
Extension-contributed tools need to emit visible turn items through
Codex's normal event and persistence pipeline.

## What
- Add `TurnItemEmitter` to extension `ToolCall`s and route the core
implementation through `Session::emit_turn_item_*`.
- Hold weak session and turn references so retained tool calls cannot
keep host state alive.
- Provide a no-op emitter for extension test callers.

## Test Plan
- `just test -p codex-core -E
'test(passes_turn_fields_and_scoped_turn_item_emitter_to_extension_call)'`

---------

Co-authored-by: jif-oai <jif@openai.com>

sayan-oai · 2026-05-28 09:13:43 -07:00

2066874415

Update rmcp to 1.7.0 (#24763 )

WIll make it easier to uprev when the new draft spec is supported.

Also updates reqwest where needed for compatibility but doesn't update
it everywhere since this is already a large diff.

The new version of rmcp handles certain kinds of authentication failures
differently, this patch includes support for identifying the failing scope
in a WWW-Authenticate header.

Adam Perry @ OpenAI · 2026-05-27 14:52:06 -07:00

910578792f

fix: dont compact standalone websearch schema (#24660 )

add new `parse_tool_input_schema_without_compaction` to bypass the
existing compaction/trimming of client-provided tool schemas that are
over 4k bytes.

we want this for standalone web search to keep field guidance/metadata
on certain fields; this keeps us closer to parity with existing hosted
tool schema (which didnt go through this 4k byte filter).

sayan-oai · 2026-05-27 01:05:19 +00:00

9fe55d68e6

Restore legacy image detail values (#24644 )

## Why

Older persisted rollouts can contain `input_image.detail` values of
`auto` or `low` from before `ImageDetail` was narrowed to
`high`/`original`. Current deserialization rejects those values, which
can make resume skip later compacted checkpoints and reconstruct an
oversized raw suffix before the next compaction attempt.

Confirmed Sentry reports fixed by this compatibility path:

- [CODEX-1H3F](https://openai.sentry.io/issues/7500642496/)
- [CODEX-1H6N](https://openai.sentry.io/issues/7501025347/)
- [CODEX-1JDP](https://openai.sentry.io/issues/7504549065/)
- [CODEX-1HW6](https://openai.sentry.io/issues/7503407986/)

## Background

[openai/codex#20693](https://github.com/openai/codex/pull/20693) added
image-detail plumbing for app-server `UserInput` so input images could
explicitly request `detail: original`. The Slack discussion behind that
PR was about ScreenSpot / bridge evals where user input images were
resized, while tool output images already had MCP/code-mode ways to
request image detail.

In review, the intended new API surface was narrowed to `high` and
`original`: default to `high`, allow `original` when callers need
unchanged image handling, and avoid encouraging new `auto` or `low`
usage. That policy still makes sense for newly emitted values.

The missing compatibility piece is persisted history. Older rollouts can
already contain `auto` and `low`, and resume reconstructs typed history
by deserializing those rollout records. Rejecting old values at that
boundary causes valid compacted checkpoints to be skipped. This PR
restores `auto` and `low` as real variants so old records deserialize
and round-trip without being rewritten as `high`, while product paths
can continue to default to `high` and avoid emitting `auto` for new
behavior.

## What changed

- Restored `ImageDetail::Auto` and `ImageDetail::Low` as first-class
protocol values.
- Preserved `auto`/`low` through rollout deserialization, MCP image
metadata, code-mode image output, and schema/type generation.
- Kept local image byte handling conservative: only `original` switches
to original-resolution loading; `auto`/`low`/`high` continue through the
resize-to-fit path while retaining their detail value.
- Added regression coverage for enum round-tripping and code-mode `low`
detail handling.

## Testing

- `just write-app-server-schema`
- `just test -p codex-protocol`
- `just test -p codex-tools`
- `just test -p codex-code-mode`
- `just test -p codex-app-server-protocol`
- `just test -p codex-core
suite::rmcp_client::stdio_image_responses_preserve_original_detail_metadata`
- `just test -p codex-core
suite::code_mode::code_mode_can_use_mcp_image_result_with_image_helper`
- Loaded broken rollouts on local fixed builds, and started/completed
new turns.

I also attempted `just test -p codex-core`; the local broad run did not
finish green: 2559 tests run, 2467 passed, 55 flaky, 91 failed, 1 timed
out. The failures were broad timeout/deadline failures across unrelated
areas; targeted changed-path core tests above passed.

rhan-oai · 2026-05-26 16:24:33 -07:00

dc4e54d061

standalone websearch extension (#23823 )

## Summary

Add the extension-backed standalone `web.run` tool so Codex can call the
standalone search endpoint through the `codex-api` search client and
return its encrypted output to Responses.

- gate the new tool behind `standalone_web_search`
- install the extension in the app-server thread registry and hide
hosted `web_search` when standalone search is enabled for OpenAI
providers so the two paths stay mutually exclusive
- build search context from persisted history using a small tail
heuristic: previous user message, assistant text between the last two
user turns capped at about 1k tokens, and current user message

## Test Plan

- `cargo test -p codex-web-search-extension`
- `cargo test -p codex-api`
- `cargo test -p codex-core
hosted_tools_follow_provider_auth_model_and_config_gates`

sayan-oai · 2026-05-26 11:12:24 -07:00

a22706dfae

Move MCP tool naming mode into manager (#21576 )

## Why

The `non_prefixed_mcp_tool_names` feature should be applied where MCP
tools become model-visible, not by remapping names later in core.
Keeping the decision in `McpConnectionManager` construction makes
`ToolInfo` the single shaped view that spec building, deferred tool
search, routing, and unavailable-tool placeholders can consume directly.

This also preserves the existing external behavior while the feature is
off, and keeps the feature-on behavior for code mode and hooks explicit
at the manager boundary.

## What Changed

- Add `McpToolNameMode` to `codex-mcp` and flow it through `McpConfig`
into `McpConnectionManager::new`.
- Normalize MCP `ToolInfo` names in the manager using either
legacy-prefixed namespaces or non-prefixed namespaces; the legacy path
adds `mcp__` without restoring the old trailing namespace suffix.
- Remove the core-side MCP name remapping path so specs, tool search,
session resolution, and unavailable-tool placeholder construction use
the manager-provided `ToolName` values directly.
- Keep code mode flattening on the `__` namespace separator.
- Preserve hook compatibility by giving non-prefixed MCP hook names
legacy `mcp__...` matcher aliases.
- Add/adjust integration and unit coverage for non-prefixed code-mode
behavior, hook matching with the feature on and off, and manager-level
legacy prefixing.

## Testing

- `cargo test -p codex-mcp --lib`
- `cargo test -p codex-core --lib tools::spec::tests -- --nocapture`
- `cargo test -p codex-core --lib mcp_tools -- --nocapture`
- `cargo test -p codex-core --lib mcp_tool_exposure -- --nocapture`
- `cargo test -p codex-core --test all mcp_tool -- --nocapture`
- `cargo test -p codex-core --test all search_tool -- --nocapture`
- `cargo test -p codex-core --test all hooks_mcp -- --nocapture`
- `cargo test -p codex-core --test all
code_mode_uses_non_prefixed_mcp_tool_names_when_feature_enabled --
--nocapture`
- `cargo test -p codex-tools`
- `cargo test -p codex-features`

pakrym-oai · 2026-05-26 08:21:15 -07:00

ff7513cd83

chore: add JSON schema policy fixture coverage (#24152 )

## Why

Before changing the Codex Bridge JSON schema policy, add integration
coverage around real connector-like MCP tool schemas. The existing unit
tests cover individual sanitizer behaviors, but they do not make it easy
to see whether full fixture schemas keep model-visible guidance, prune
only unreachable definitions, drop unsupported JSON Schema fields, and
stay within the Responses API schema budget.

## What Changed

- Added `tools/tests/json_schema_policy_fixtures.rs`, which converts MCP
tool fixtures through `mcp_tool_to_responses_api_tool` and validates the
resulting Responses tool parameters.
- Added connector-style fixtures for Slack, Google Calendar, Google
Drive, Notion, and Microsoft Outlook Email under
`tools/tests/fixtures/json_schema_policy/`.
- Added fixture assertions for preserved guidance, pruned definitions,
expected field drops after `JsonSchema` conversion, marker count
baselines, and dangling local `$ref` prevention.
- Added a real oversized golden Notion `create_page` input schema
fixture to exercise the compaction path that strips descriptions, drops
root `$defs`, rewrites local refs, and fits the compacted schema under
the budget.

Celia Chen · 2026-05-22 16:31:33 -07:00

10ac2781eb

feat: best-effort compact large tool schemas (#23904 )

## Why

The `dev/cc/ref-def` branch preserves richer JSON Schema detail for
connector tools, including `$defs` and nested shapes. That improves
fidelity, but it pushes the largest connector schemas well past the
intended tool-schema budget. This PR adds a best-effort compaction pass
for unusually large tool input schemas so the p99 and max tails stay
small while ordinary schemas are left alone.

## What Changed

- Added best-effort large-schema compaction in
`codex-rs/tools/src/json_schema.rs` after schema sanitization and
definition pruning.
- Compaction runs as a waterfall only while the compact JSON budget
proxy is exceeded:
  1. Strip schema `description` metadata.
  2. Drop root `$defs` / `definitions`.
  3. Collapse deep nested complex schema objects to `{}`.
- Kept top-level argument names and immediate schema shape where
possible.

## Corpus Results

Scope: 2,025 schemas under `golden_schemas`, all parsed successfully.
Token count is `o200k_base` over compact JSON from
`parse_tool_input_schema`.

| Percentile | Before `origin/main` `4dbca61e20` | After branch
`dev/cc/ref-def` `f9bf071758` | After this PR |
|---|---:|---:|---:|
| p0 | 9 | 9 | 9 |
| p10 | 59 | 63 | 63 |
| p25 | 81 | 86 | 86 |
| p50 | 114 | 127 | 125 |
| p75 | 174 | 205 | 202 |
| p90 | 295 | 335 | 322 |
| p95 | 391 | 526 | 422 |
| p99 | 794 | 1,303 | 689 |
| max | 2,836 | 3,337 | 887 |

After this PR, `0 / 2,025` schemas are over 1k tokens.

### Compaction Savings

These are cumulative waterfall stages over the same corpus. Later passes
only run for schemas that are still over the compact JSON budget proxy.

| Stage | Total tokens | Step savings | Schemas changed by step |
|---|---:|---:|---:|
| No compaction | 391,862 | - | - |
| Strip schema `description` metadata | 350,961 | 40,901 | 66 |
| Drop root `$defs` / `definitions` | 340,683 | 10,278 | 13 |
| Collapse deep complex schemas to `{}` | 335,875 | 4,808 | 6 |

Celia Chen · 2026-05-22 01:26:17 +00:00

464ab40dfa

Expose conversation history to extension tools (#23963 )

## Why

Extension tools that need conversation context should be able to read it
from the live tool invocation instead of reaching into thread
persistence themselves.

## What changed

- Add a `ConversationHistory` snapshot to extension `ToolCall`s and
populate it from the current raw in-memory response history.
- Expose all history items at this boundary so each extension can filter
and bound the subset it needs before consuming or forwarding it.
- Cover the adapter and registry dispatch paths and update existing
extension tests that construct `ToolCall` literals.

## Test plan

- `cargo test -p codex-tools`
- `cargo test -p codex-extension-api`
- `cargo test -p codex-goal-extension`
- `cargo test -p codex-memories-extension`
- `cargo test -p codex-core passes_turn_fields_to_extension_call`
- `cargo test -p codex-core
extension_tool_executors_are_model_visible_and_dispatchable`

sayan-oai · 2026-05-22 01:11:47 +00:00

7e802b22f1

feat: support local refs and defs in tool input schemas (#23357 )

# Why

Some connector tool input schemas use local JSON Schema references and
definition tables to avoid duplicating large nested shapes. Codex
previously lowered these schemas into the supported subset in a way that
could discard `$ref`-only schema objects and lose the corresponding
definitions, which made non-strict tool registration less faithful than
the original connector schema.

This keeps the existing minimal-lowering policy: Codex still does not
raw-pass through arbitrary JSON Schema, but it now preserves local
reference structure that fits the Responses-compatible subset and prunes
definition entries that cannot be reached by following `$ref`s from the
root schema after sanitization, including refs found transitively inside
other reachable definitions. The pruning matters because Responses
parses definition tables even when entries are unused, so keeping dead
definitions wastes prompt tokens.

# What changed

- Added `$ref`, `$defs`, and legacy `definitions` fields to the tool
`JsonSchema` representation.
- Updated `parse_tool_input_schema` lowering so `$ref`-only schema
objects survive sanitization instead of becoming `{}`.
- Sanitized definition tables recursively and dropped malformed
definition tables so non-strict registration degrades gracefully.
- Added reachability pruning for root definition tables by starting from
refs outside definition tables, then following refs inside reachable
definitions.
- Added JSON Pointer decoding for local definition refs such as
`#/$defs/Foo~1Bar`.

# Verification
ran local golden-schema probes against representative connector schemas
to validate behavior on real generated schemas:

| Golden schema | Before bytes | After bytes | `$defs` before -> after |
`$ref` before -> after | Result |
|---|---:|---:|---:|---:|---|
| `google_calendar/create_space` | 7111 | 4526 | 7 -> 7 | 7 -> 7 | all
definitions preserved because all are reachable |
| `figma/apply_file_variable_changes` | 4609 | 999 | 8 -> 5 | 8 -> 5 |
unused defs pruned after unsupported `oneOf` shapes lower away |
| `snowflake/list_catalog_integrations` | 1380 | 404 | 3 -> 0 | 0 -> 0 |
all defs pruned because none are referenced |
| `dropbox/create_shared_link` | 8894 | 1836 | 14 -> 4 | 9 -> 4 | only
defs reachable from the root schema after sanitization are retained,
including transitively through other retained defs |

Token increase across golden schema due to this change:
<img width="817" height="366" alt="Screenshot 2026-05-19 at 1 47 04 PM"
src="https://github.com/user-attachments/assets/d5c80fe9-da85-41e6-8ac7-a01d1e0b0f71"
/>

Celia Chen · 2026-05-22 00:32:14 +00:00

0cec508148

Make tool executor specs mandatory (#23870 )

## Why

`ToolExecutor` is the runtime contract that keeps a callable tool and
its model-visible spec together. Leaving `spec()` optional lets a
registered runtime silently omit that half of the contract, and it also
overloads a missing spec as an exposure decision for tools that should
stay dispatchable without being shown to the model.

## What

- Make `ToolExecutor::spec()` required and update core, extension, and
test tool executors to return a concrete `ToolSpec`.
- Add `ToolExposure::Hidden` for dispatch-only tools. The legacy
`shell_command` runtime in unified-exec sessions now uses that explicit
exposure instead of hiding itself by omitting a spec.
- Build MCP tool specs when `McpHandler` is constructed so invalid MCP
specs are skipped before the handler is registered.
- Keep tool planning aligned with the new contract for direct, deferred,
hidden, code-mode, dynamic, and namespaced tool paths.

## Testing

- Added tool-plan coverage that invalid MCP tool specs are not
registered.
- Updated shell-family coverage for the hidden legacy `shell_command`
runtime and the affected tool executor test fixtures.

jif-oai · 2026-05-21 15:25:56 +02:00

516f134641

Honor client-resolved service tier defaults (#23537 )

## Why

Model catalog responses can now advertise a nullable
`default_service_tier` for each model. Codex needs to preserve three
distinct states all the way from config/app-server inputs to inference:

- no explicit service tier, so the client may apply the current model
catalog default when FastMode is enabled
- explicit `default`, meaning the user intentionally wants standard
routing
- explicit catalog tier ids such as `priority`, `flex`, or future tiers

Keeping those states distinct prevents the UI from showing one tier
while core sends another, especially after model switches or app-server
`thread/start` / `turn/start` updates.

## What Changed

- Plumbed `default_service_tier` through model catalog protocol types,
app-server model responses, generated schemas, model cache fixtures, and
provider/model-manager conversions.
- Added the request-only `default` service tier sentinel and normalized
legacy config spelling so `fast` in `config.toml` still materializes as
the runtime/request id `priority`.
- Moved catalog default resolution to the TUI/client side, including
recomputing the effective service tier when model/FastMode-dependent
surfaces change.
- Updated app-server thread lifecycle config construction so
`serviceTier: null` preserves explicit standard-routing intent by
mapping to `default` instead of internal `None`.
- Kept core responsible for validating explicit tiers against the
current model and stripping `default` before `/v1/responses`, without
applying catalog defaults itself.

## Validation

- `CARGO_INCREMENTAL=0 cargo build -p codex-cli`
- `CARGO_INCREMENTAL=0 cargo test -p codex-app-server model_list`
- `cargo test -p codex-tui service_tier`
- `cargo test -p codex-protocol service_tier_for_request`
- `cargo test -p codex-core get_service_tier`
- `RUST_MIN_STACK=8388608 CARGO_INCREMENTAL=0 cargo test -p codex-core
service_tier`

Shijie Rao · 2026-05-20 15:57:50 -07:00

370b13afc9

feat: add turn_id and truncation_policy to extension tool calls (#23666 )

## Why

Extension-owned tools currently receive a stripped `ToolCall` with only
`call_id`, `tool_name`, and `payload`.
That makes extension work that needs turn-local execution context
awkward, especially web-search extension work that needs the active
`truncation_policy` at tool invocation time.

Reconstructing that value from config or `ExtensionData` would be
indirect and could drift from the actual turn context, so the cleaner
fix is to pass the needed turn metadata directly on the extension-facing
invocation type.

## What changed

- added `turn_id` and `truncation_policy` to `codex_tools::ToolCall`
- populated those fields when core adapts `ToolInvocation` into an
extension tool call
- added a focused adapter test that verifies extension executors receive
the forwarded turn metadata
- updated the memories extension tests to construct the richer
`ToolCall`
- added the `codex-utils-output-truncation` dependency to `codex-tools`
and refreshed lockfiles

## Testing

- `cargo test -p codex-tools`
- `cargo test -p codex-memories-extension`
- `cargo test -p codex-core passes_turn_fields_to_extension_call`
- `just bazel-lock-update`
- `just bazel-lock-check`

jif-oai · 2026-05-20 20:14:41 +02:00

c5bd131567

add encryptedcontent to functioncalloutput (#23500 )

add new `EncryptedContent` variant to `FunctionCallOutputContentItem`
ahead of standalone websearch.

we need to be able to receive and pass encrypted function call output
from the new web search endpoint back to responsesapi, as we cannot
expose direct search results.

sayan-oai · 2026-05-19 23:47:48 -07:00

34aad43684

Split plugin install discovery into list and request tools (#23372 )

## Summary
- Add `list_available_plugins_to_install` as the inventory step for
plugin and connector install suggestions.
- Slim `request_plugin_install` so it only handles the actual
elicitation, instead of carrying the full discoverable list in its
prompt.
- Emit send-time telemetry when an install elicitation is dispatched,
including requested tool identity in the event payload.
- Emit install-result telemetry through `SessionTelemetry`, including
tool type, user response action, and completion status.
- Update registration and tests to cover the new two-step flow while
keeping the existing `tool_suggest` feature gate unchanged.

## Testing
- `just fmt`
- `cargo test -p codex-tools`
- `cargo test -p codex-core request_plugin_install`
- `cargo test -p codex-core list_available_plugins_to_install`
- `cargo test -p codex-core
install_suggestion_tools_can_be_registered_without_search_tool`
- `cargo test -p codex-otel
manager_records_plugin_install_suggestion_metric`
- `cargo test -p codex-otel
manager_records_plugin_install_elicitation_sent_metric`
- `just fix -p codex-core`
- `just fix -p codex-tools`
- `just fix -p codex-otel`
- `cargo check -p codex-core`

Matthew Zeng · 2026-05-19 14:45:37 -07:00

8335b56c33

Remove ToolsConfig from tool planning (#22835 )

## Why

`codex-tools` is meant to hold reusable tool primitives, but
`ToolsConfig` had become a second copy of core runtime decisions instead
of a small shared contract. It carried provider capabilities, auth/model
gates, permission and environment state, web/search/image feature gates,
multi-agent settings, and goal availability from core into `codex-tools`
([definition](https://github.com/openai/codex/blob/22dd9ad3929253ed24d7ee4f10f238e95ab25f37/codex-rs/tools/src/tool_config.rs#L97),
[stored on each
`TurnContext`](https://github.com/openai/codex/blob/22dd9ad3929253ed24d7ee4f10f238e95ab25f37/codex-rs/core/src/session/turn_context.rs#L87)).
Every session/context variant then had to build and mutate that snapshot
before assembling tools.

This PR removes that master object instead of renaming it. Tool planning
now reads the live `TurnContext`, where `codex-core` already owns those
decisions, while `codex-tools` keeps only reusable primitives and a
generic `ToolSetBuilder`/`ToolSet` accumulator.

## What Changed

- Removed `ToolsConfig` / `ToolsConfigParams` from `codex-tools`; the
crate keeps the shared helpers that still belong there, including
request-user-input mode selection, shell backend/type resolution,
`UnifiedExecShellMode`, and `ToolEnvironmentMode`.
- Replaced config-snapshot planning with `ToolRouter::from_turn_context`
and a `spec_plan` pipeline over `CoreToolPlanContext`, deriving provider
capabilities, auth gates, model support, feature gates, environment
count, goal support, multi-agent options, web search, and image
generation from the authoritative turn state.
- Added generic `codex_tools::ToolSetBuilder` / `ToolSet`, plus the
small core adapter needed to accumulate `CoreToolRuntime` values and
hosted model specs.
- Added the `tool_family::shell` registration module and moved
shell/unified-exec/memory accounting call sites to read the narrow
per-turn fields directly.
- Narrowed `TurnContext` to the remaining explicit per-turn fields
needed by planning: `available_models`, `unified_exec_shell_mode`, and
`goal_tools_supported`.
- Reworked MCP exposure and tool-search setup so deferred/direct MCP
behavior is driven by the current turn rather than a precomputed config
snapshot.
- Replaced the large expected-spec fixture tests with focused
behavior-level coverage for shell tools, environments, goal and
agent-job gates, MCP direct/deferred exposure, tool search,
request-plugin-install, code mode, multi-agent mode, hosted tools, and
extension executor dispatch.

## Verification

- `cargo check -p codex-tools`
- `cargo check -p codex-core --lib`
- `cargo test -p codex-tools`
- `cargo test -p codex-core spec_plan --lib`
- `cargo test -p codex-core router --lib`

jif-oai · 2026-05-19 11:24:09 +02:00

05e171094d

Remove ToolSearch feature toggle (#23389 )

## Summary
- mark `ToolSearch` as removed and ignore stale config writes for its
legacy key
- make search tool exposure depend only on model capability, not a
feature toggle
- remove app-server enablement support and prune now-obsolete test
coverage/setup

## Verification
- `cargo test -p codex-features`
- `cargo test -p codex-tools`
- `cargo test -p codex-core search_tool_requires_model_capability`
- `cargo test -p codex-app-server experimental_feature_enablement_set_`

## Notes
- This keeps the legacy config key as a no-op for compatibility while
removing the ability to toggle the behavior off cleanly.
- No developer-facing docs update outside the touched app-server README
was needed.

sayan-oai · 2026-05-19 01:24:39 +00:00

daa11820b0

fix: default unknown tool schemas to empty schemas (#22380 )

## Why

Some tool providers, especially MCP servers and dynamic tool sources,
can supply schema nodes that omit `type` and have no recognized JSON
Schema shape hints. Previously, `sanitize_json_schema` filled those
unknown nodes in as `string`, which made the schema parseable but
invented a scalar constraint that the provider did not specify. For
description-only fields, that could incorrectly steer tool arguments
away from the provider's actual accepted shape.

The Responses API accepts permissive empty schemas such as `{}` at
nested property positions, so Codex should preserve that permissive
meaning instead of coercing unknown schema nodes into a misleading
scalar type.

## What Changed

- Changed the no-hints fallback in `codex-rs/tools/src/json_schema.rs`
to clear unrecognized object schema nodes to `{}`.
- Empty schemas now remain `{}` rather than becoming `type: "string"`.
- Description-only or otherwise metadata-only nested property schemas
now become `{}` while surrounding object/array/string/number inference
still applies when recognized hints are present.
- Updated `codex-tools` and `codex-core` tests to cover top-level empty
schemas, nested empty schemas, metadata-only malformed schemas, dynamic
tools, and MCP tool specs.

## Verification

- `cargo test -p codex-tools`
- `cargo test -p codex-core
test_mcp_tool_property_missing_type_defaults_to_empty_schema`
- Manually verified the real Responses API behavior for both
empty-schema positions:
- Top-level function `parameters: {}` is accepted and echoed back as
`{"type":"object","properties":{}}`; when forced to call the tool,
Responses emitted empty object arguments: `"arguments": "{}"`.
- Nested property schema `{}` is accepted and preserved as `{}`; when
forced to call a tool with `metadata.extra`, Responses emitted
`"arguments": "{\"metadata\":{\"extra\":\"codex schema sanitizer
behavior\"}}"`.

Celia Chen · 2026-05-18 12:41:10 -07:00

4dbca61e20

Make multi-agent v2 tool namespace configurable (#23147 )

## Summary
- Add `features.multi_agent_v2.tool_namespace` with config/schema
validation for Responses-compatible namespace values.
- Thread the resolved namespace into `ToolsConfig` for normal turns and
review turns.
- Wrap MultiAgentV2 tool specs and registry names in the configured
namespace when namespace tools are supported, while falling back to the
plain tool names when they are not.

## Validation
- `just fmt`
- `just write-config-schema`
- `cargo test -p codex-features multi_agent_v2_feature_config --
--nocapture`
- `cargo test -p codex-core test_build_specs_multi_agent_v2 --
--nocapture`
- `cargo test -p codex-core multi_agent_v2_config -- --nocapture`
- `cargo test -p codex-core
multi_agent_v2_rejects_invalid_tool_namespace -- --nocapture`
- `cargo test -p codex-tools`
- `git diff --check`

jif-oai · 2026-05-17 15:27:43 +02:00

545ede569c

Preserve image detail in app-server inputs (#20693 )

## Summary

- Add optional image detail to user image inputs across core, app-server
v2, thread history/event mapping, and the generated app-server
schemas/types.
- Preserve requested detail when serializing Responses image inputs:
omitted detail stays on the existing `high` default, while explicit
`original` keeps local images on the original-resolution path.
- Support `high`/`original` consistently for tool image outputs,
including MCP `codex/imageDetail`, code-mode image helpers, and
`view_image`.

Curtis 'Fjord' Hawthorne · 2026-05-15 15:04:04 -07:00

8543e39885

Simplify tool executor and registry plumbing (#22636 )

## Why

The tool runtime path still had a typed output associated type on
`ToolExecutor`, plus a core-only `RegisteredTool` adapter and
extension-only executor aliases. That made every new shared tool runtime
carry extra adapter plumbing before it could participate in core
dispatch, extension tools, hook payloads, telemetry, and model-visible
spec generation.

This PR moves output erasure to the shared executor boundary so core and
extension tools can use the same execution contract directly.

## What Changed

- Changed `codex_tools::ToolExecutor` to return `Box<dyn ToolOutput>`
instead of an associated `Output` type.
- Removed the extension-specific `ExtensionToolExecutor` /
`ExtensionToolOutput` aliases and exposed `ToolExecutor<ToolCall>` plus
`ToolOutput` through `codex-extension-api`.
- Reworked core tool registration around `CoreToolRuntime` and
`ToolRegistry::from_tools`, removing the extra `RegisteredTool` /
`ToolRegistryBuilder` layer.
- Consolidated model-visible spec planning and registry construction in
`core/src/tools/spec_plan.rs`, including deferred tool search and
code-mode-only filtering.
- Added `ToolOutput` helpers for post-tool-use hook ids and inputs so
MCP, unified exec, extension, and other boxed outputs preserve the same
hook payload behavior.
- Updated core handlers, memories tools, and the related
registry/spec/router tests to use the simplified contract.

## Test Coverage

- Updated coverage for tool spec planning, registry lookup, deferred
tool search registration, extension tool routing, post-tool-use hook
payloads, dispatch tracing, guardian output extraction, and memories
extension tool execution.

jif-oai · 2026-05-15 11:47:54 +02:00

6f1a01fbdd

chore(features) rm Feature::ApplyPatchFreeform (#22711 )

## Summary
Removes the feature since this is effectively on by default in all cases
where we should use it, or can be configured via models.json.

## Testing
- [x] unit tests pass

Dylan Hurd · 2026-05-14 16:15:56 -07:00

51b0e94105

feat: make ToolExecutor an async trait (#22560 )

## Why

`codex_tools::ToolExecutor` keeps a tool spec attached to its runtime
handler, but extension tools still carried a parallel
`ExtensionToolFuture` / `ExtensionToolExecutor` shape. That made
extension-owned tools look different from host tools even though
routing, registration, and execution need the same abstraction.

This PR makes the shared executor contract directly async and lets
extension tools implement it too, so host tools and extension tools can
move through the same registration path.

## What changed

- Changed `ToolExecutor::handle` to an `async fn` using `async-trait`,
and updated built-in tool handlers to implement the async trait
directly.
- Replaced the bespoke `ExtensionToolFuture` contract with a marker
`ExtensionToolExecutor` over `ToolExecutor<ToolCall, Output =
JsonToolOutput>`, re-exporting `ToolExecutor` from
`codex-extension-api`.
- Updated the memories extension tools to implement the shared executor
trait.
- Split tool-router construction into collected executors plus hosted
model specs, keeping hosted tools like web search and image generation
separate from executable handlers.
- Updated spec/router tests and extension-tool stubs for the new
executor shape.

## Verification

- Not run locally.

jif-oai · 2026-05-14 11:23:57 +02:00

6d65686313

Make multi_agent_v2 wait_agent timeouts configurable (#22528 )

## Why

`multi_agent_v2` already allowed configuring the minimum `wait_agent`
timeout, but the default timeout and upper bound were still hard-coded.
That made it hard to tune waits for subagent mailbox activity in
sessions that need either faster wakeups or longer waits, and it meant
the model-visible `wait_agent` schema could not fully reflect the
resolved runtime limits.

## What Changed

- Added `features.multi_agent_v2.max_wait_timeout_ms` and
`features.multi_agent_v2.default_wait_timeout_ms` alongside the existing
`min_wait_timeout_ms` setting.
- Validated all three timeouts in config as `0..=3_600_000`, with
`min_wait_timeout_ms <= default_wait_timeout_ms <= max_wait_timeout_ms`.
- Thread and review session tool config now passes the resolved
min/default/max values into the `wait_agent` tool schema.
- `wait_agent` now uses the configured default when `timeout_ms` is
omitted and rejects explicit values outside the configured min/max range
instead of silently clamping them.
- Updated the generated config schema and config-lock test coverage for
the new fields.

Andrey Mishchenko · 2026-05-13 14:43:06 -07:00

7c57a59f51

feat: expose multi-agent v2 as model-only tools (#22514 )

## Why

`code_mode_only` filters code-mode nested tools out of the top-level
tool list. For multi-agent v2, we need a rollout shape where the
collaboration tools remain callable as normal model tools without also
being embedded into the code-mode `exec` tool declaration.

Related to this:
https://openai-corpws.slack.com/archives/C0AQLHB4U75/p1778660267922549

## What Changed

- Adds `features.multi_agent_v2.non_code_mode_only`, including config
resolution, profile override handling, and generated schema coverage.
- Introduces `ToolExposure::DirectModelOnly` so a tool can be included
in the initial model-visible list while staying out of the nested
code-mode tool surface.
- Applies that exposure to the multi-agent v2 tools when the new flag is
set: `spawn_agent`, `send_message`, `followup_task`, `wait_agent`,
`close_agent`, and `list_agents`.
- Updates code-mode-only filtering so direct-model-only tools remain
visible while ordinary nested code-mode tools are still hidden.

## Verification

- Added config parsing/profile tests for `non_code_mode_only`.
- Added tool spec coverage for the code-mode-only multi-agent v2
exposure behavior.

jif-oai · 2026-05-13 19:49:47 +02:00

fc26af377f

[codex] Remove unused legacy shell tools (#22246 )

## Why

Recent session history showed no active use of the raw `shell`,
`local_shell`, or `container.exec` execution surfaces. Keeping those
handlers/specs wired into core leaves duplicate shell execution paths
alongside the supported `shell_command` and unified exec tools.

## What changed

- Removed the raw `shell` handler/spec and its `ShellToolCallParams`
protocol helper.
- Removed the legacy `local_shell` and `container.exec` handler/spec
plumbing while preserving persisted-history compatibility for old
response items.
- Normalized model/config `default` and `local` shell selections to
`shell_command`.
- Pruned tests that exercised removed raw-shell/local-shell/apply-patch
variants and kept coverage on `shell_command`, unified exec, and
freeform `apply_patch`.

## Verification

- `git diff --check`
- `cargo test -p codex-protocol`
- `cargo test -p codex-tools`
- `cargo test -p codex-core tools::handlers::shell`
- `cargo test -p codex-core tools::spec`
- `cargo test -p codex-core tools::router`
- `cargo test -p codex-core
active_call_preserves_triggering_command_context`
- `cargo test -p codex-core guardian_tests`
- `cargo test -p codex-core --test all shell_serialization`
- `cargo test -p codex-core --test all apply_patch_cli`
- `cargo test -p codex-core --test all shell_command_`
- `cargo test -p codex-core --test all local_shell`
- `cargo test -p codex-core --test all otel::`
- `cargo test -p codex-core --test all hooks::`
- `just fix -p codex-core`
- `just fix -p codex-tools`

pakrym-oai · 2026-05-13 16:43:25 +00:00

83decfa300

Introduce tool exposure for deferred registration (#22489 )

## Why

Deferred tools were tracked with separate side-channel filtering after
tool specs had already been assembled. That made the registry
responsible for executing tools while the router/spec planner separately
decided whether those same tools should be exposed to the model up
front.

This PR makes exposure part of the tool handler contract so direct
versus deferred availability travels with the executable tool
registration.

Next step will be to simplify registration

## What Changed

- Adds `ToolExposure` to `codex-tools` and exposes it through
`ToolExecutor`, defaulting tools to `Direct`.
- Teaches dynamic tools and MCP handlers to mark deferred tools as
`Deferred` at construction time.
- Renames the registry object-safe wrapper from `AnyToolHandler` to
`RegisteredTool` and uses `ToolExposure` when deciding whether to
include a handler's spec in the initial model-visible tool list.
- Refactors tool spec planning to derive direct specs and deferred
search entries from registered handlers, removing the router's
special-case deferred dynamic tool filtering.

## Verification

- Not run.

jif-oai · 2026-05-13 18:16:51 +02:00

fdda59c00b

Refactor extension tools onto shared ToolExecutor (#22369 )

## Why

Extension tools were split across two public runtime contracts:
`codex-tool-api` exposed `ToolBundle` plus its own call/spec/error
types, while core native tools used `codex_tools::ToolExecutor`. That
made contributed tool specs and execution behavior easy to drift apart
and added another crate boundary for what should be one executable-tool
seam.

This PR makes `ToolExecutor` the single runtime contract and keeps
extension-specific pinning in `codex-extension-api`.

## Remaining todo

https://github.com/openai/codex/pull/22369/changes#diff-b935ea8245c3ce568a30cff660175fa6390b66b872ae409e1e2e965738250741R5
Either generic `Invocation` or sub-extract the `ToolCall` and clean
`ToolInvocation`

## What changed

- Removed the `codex-tool-api` workspace crate and its dependencies from
core and `codex-extension-api`.
- Made `codex_tools::ToolExecutor` object-safe with `async_trait` so
extension contributors can return a dyn executor.
- Added the extension-facing aliases under
`ext/extension-api/src/contributors/tools.rs`, including
`ExtensionToolExecutor = dyn ToolExecutor<ToolCall, Output =
ExtensionToolOutput>`.
- Changed `ToolContributor::tools` to return extension executors
directly instead of `ToolBundle`s.
- Updated core’s extension tool handler/registry/router path to adapt
those extension executors into the existing native `ToolInvocation`
runtime path.
- Added focused coverage for extension tools being registered,
model-visible, dispatchable, and not replacing built-in tools.

## Verification

- `cargo test -p codex-tools`
- `cargo test -p codex-extension-api`

jif-oai · 2026-05-13 12:12:06 +02:00

9c5dfa7b1a

feat: extract shared tool executor interface (#22359 )

## Why

Codex still models model-visible tools and executable behavior largely
inside `codex-core`, which makes it harder to evolve the tool system
toward a single reusable abstraction for built-ins, MCP-backed tools,
dynamic tools, and later tools injected from outside core.

This PR takes the next incremental step in that direction by moving the
common execution-facing pieces out of core and separating them from
core-only orchestration. The intent is to let shared tool abstractions
improve in one place, while `codex-core` keeps the parts that are still
inherently host-specific today, such as `ToolInvocation`, dispatch
wiring, and hook integration.

This PR is mostly moving things around. The only interesting piece is
this abstraction:
https://github.com/openai/codex/pull/22359/changes#diff-81af519002548ba51ed102bdaaf77e081d40a1e73a6e5f9b104bbbc96a6f1b3dR13

## What changed

- Added `codex_tools::ToolExecutor<Invocation>` as the shared execution
trait for model-visible tools.
- Moved the reusable execution support types from `codex-core` into
`codex-tools`:
  - `FunctionCallError`
  - `ToolPayload`
  - `ToolOutput`
- Refactored core tool implementations so that execution behavior lives
on `ToolExecutor<ToolInvocation>`, while `ToolHandler` remains the
core-local extension point for hook payloads, telemetry tags, diff
consumers, and other orchestration concerns.
- Kept the registry and dispatch flow behaviorally unchanged while
making the shared/extracted boundary explicit across built-in, MCP,
dynamic, extension-backed, shell, and multi-agent tool handlers.

## Verification

- `cargo test -p codex-tools`
- `just fix -p codex-tools`
- `just fix -p codex-core`
- `cargo test -p codex-core` progressed through the updated tool
surfaces and then hit the existing unrelated multi-agent stack overflow
in
`tools::handlers::multi_agents::tests::tool_handlers_cascade_close_and_resume_and_keep_explicitly_closed_subtrees_closed`.

jif-oai · 2026-05-13 11:31:27 +02:00

1824685a00

Encapsulate tool search entries in handlers (#22261 )

## Why

This builds on the handler-owned spec refactor by moving deferred
tool-search metadata to the same handlers that already own tool specs.
The registry builder no longer needs a separate prebuilt
`tool_search_entries` path; it can collect searchable entries from
deferred handlers directly.

## What changed

- Added `search_info()` to tool handlers and implemented it for MCP and
dynamic handlers.
- Reused handler `spec()` output when constructing tool-search entries,
adapting it into the deferred `LoadableToolSpec` shape expected by
`tool_search`.
- Simplified `build_tool_registry_builder(...)` so `tool_search`
registration is based on deferred handlers with search info.
- Removed the old standalone search-entry builders and now-unused
`codex-tools` discovery helper exports.

## Verification

- `cargo test -p codex-core tools::handlers::tool_search::tests:: --
--nocapture`
- `cargo test -p codex-core tools::spec_plan::tests::search_tool --
--nocapture`
- `cargo test -p codex-core tools::spec::tests:: -- --nocapture`
- `cargo test -p codex-core tools::spec_plan::tests:: -- --nocapture`
- `cargo test -p codex-tools`
- `just fix -p codex-core`
- `just fix -p codex-tools`

pakrym-oai · 2026-05-12 20:48:02 -07:00

104fc14956

141 Commits