Commit Graph

127 Commits

  • Route standalone image generation through host finalization md (#25176)
    ## Why
    
    Standalone image-generation extensions emitted turn items through the
    low-level event path, bypassing host-owned finalization such as image
    persistence and contributor processing. At the same time, the
    generated-image save-path hint must remain visible to the model through
    the extension tool's `FunctionCallOutput`, rather than the legacy
    built-in developer-message path.
    
    ## What changed
    
    - Extended `ExtensionTurnItem` to support image-generation items while
    keeping the extension-facing emitter API limited to `emit_started` and
    `emit_completed`.
    - Routed extension completion through core `finalize_turn_item`, so
    standalone image-generation items receive host-owned processing and
    persisted `saved_path` values before publication.
    - Kept legacy built-in image generation on its existing
    developer-message hint path, while standalone image generation returns
    its deterministic saved-path hint in `FunctionCallOutput`.
    - Shared the image artifact path and output-hint formatting used by core
    and the image-generation extension.
    - Passed thread identity through extension tool calls so standalone
    image generation can construct the same intended artifact path as core.
    - Added an app-server integration test covering real standalone image
    generation, saved artifact publication, model-visible output hint
    wiring, and absence of the legacy developer-message hint.
    
    ## Validation
    
    - `just fmt`
    - `just test -p codex-image-generation-extension`
    - `just test -p codex-web-search-extension`
    - `just test -p codex-goal-extension`
    - `just test -p codex-memories-extension`
    - Targeted `codex-core` tests for image save history, extension
    completion finalization, and contributor execution
    - `just test -p codex-app-server
    standalone_image_generation_returns_saved_path_hint_to_model`
    - `just fix -p codex-core`
    - `just fix -p codex-image-generation-extension`
    - `just bazel-lock-update`
    - `just bazel-lock-check`
  • Add multi-agent runtime metadata types (#25720)
    Stack split from #25708. Original PR intentionally left open. This first
    PR adds the multi-agent runtime metadata types and catalog plumbing used
    by the rest of the stack.
  • Move tool search metadata onto ToolExecutor (#25684)
    Deferred tools need to be searchable even when they are not implemented
    inside `codex-core`. Extension-provided tools can be registered for
    later discovery, but the search metadata path was still owned by
    core-specific runtime hooks, which meant the shared `ToolExecutor`
    abstraction could not describe how a deferred extension tool should
    appear in `tool_search`.
    
    ## Changes
    
    - Move `ToolSearchEntry` and `ToolSearchInfo` into `codex-tools` and
    re-export them from the shared tools crate.
    - Add a default `ToolExecutor::search_info` implementation that derives
    loadable tool-search metadata from function and namespace specs.
    - Forward search metadata through extension adapters and exposure
    overrides while keeping custom search text/source metadata for dynamic,
    MCP, and multi-agent tools.
    - Remove the old core-local `tool_search_entry` module now that search
    metadata lives with the shared executor APIs.
    
    ## Testing
    
    - Added `deferred_extension_tools_are_discoverable_with_tool_search`
    coverage in `core/src/tools/spec_plan_tests.rs`.
  • feat: gate unified exec zsh fork composition (#24979)
    ## Why
    
    `shell_zsh_fork` and unified exec need to remain independently
    controllable for enterprise rollouts, but we also need a third mode that
    composes them. That composed mode is intended to preserve unified exec
    command lifecycle support while letting the zsh fork provide more
    accurate `execv(2)` interception.
    
    Enabling `unified_exec_zsh_fork` by itself is intentionally not
    sufficient. It is a composition gate, not a dependency-enabling
    shortcut:
    
    - `unified_exec` selects the PTY-backed unified exec tool.
    - `shell_zsh_fork` opts into the zsh fork backend.
    - `unified_exec_zsh_fork` only allows those two already-enabled modes to
    be composed so local zsh unified exec commands can launch through the
    zsh fork.
    
    This separation is deliberate. Enterprises and staged rollouts must be
    able to enable or disable unified exec and zsh-fork independently. If
    `unified_exec_zsh_fork` implied either dependency, then enabling one
    under-development composition flag would silently activate a shell
    backend that the configured feature set left disabled.
    
    This PR introduces only the configuration and planning gate for that
    composition. Existing `shell_zsh_fork` behavior continues to use the
    standalone shell tool unless the new composition feature is explicitly
    enabled alongside both dependencies.
    
    ## What Changed
    
    - Added the under-development feature flag `unified_exec_zsh_fork`.
    - Added `UnifiedExecFeatureMode` so the three input feature flags
    collapse into `Disabled`, `Direct`, or `ZshFork` mode before tool
    planning.
    - Updated tool selection so zsh-fork composition requires
    `unified_exec`, `shell_zsh_fork`, and `unified_exec_zsh_fork`.
    - Kept the existing standalone zsh-fork shell tool behavior when only
    `shell_zsh_fork` is enabled.
    - Updated config schema output for the new feature flag.
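    
    The flag collapse described above can be sketched as follows. This is a
    minimal illustration, not the PR's code: `ExecFlags` and `resolve_mode`
    are hypothetical names, and mapping "unified exec plus zsh fork without
    the gate" to `Direct` is an assumption, since the planner may instead
    keep the standalone zsh-fork shell tool in that case.
    
    ```rust
    /// Hypothetical bundle of the three input feature flags.
    #[derive(Clone, Copy, Debug, Default)]
    struct ExecFlags {
        unified_exec: bool,
        shell_zsh_fork: bool,
        unified_exec_zsh_fork: bool,
    }
    
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    enum UnifiedExecFeatureMode {
        Disabled,
        Direct,
        ZshFork,
    }
    
    /// Collapse the three flags into one mode before tool planning.
    fn resolve_mode(flags: ExecFlags) -> UnifiedExecFeatureMode {
        if !flags.unified_exec {
            // The composition gate never enables unified exec on its own.
            UnifiedExecFeatureMode::Disabled
        } else if flags.shell_zsh_fork && flags.unified_exec_zsh_fork {
            // Composition requires all three flags.
            UnifiedExecFeatureMode::ZshFork
        } else {
            // Assumption: without the gate, unified exec runs without the fork.
            UnifiedExecFeatureMode::Direct
        }
    }
    ```
    
    Under these assumptions, enabling only `unified_exec_zsh_fork` resolves
    to `Disabled`, which matches the "composition gate, not a
    dependency-enabling shortcut" rule.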
    
    ## Verification
    
    - Added feature and tool-config coverage for the new gate.
    - Added planner coverage proving `shell_zsh_fork` remains standalone
    until composition is explicitly enabled.
    - Ran focused tests for `codex-features`, `codex-tools`, and the
    affected `codex-core` planner case.
    
    
    
    
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/24979).
    * #24982
    * #24981
    * #24980
    * __->__ #24979
  • [codex-rs] auto-review model override (#23767)
    ## Why
    
    Guardian auto-review normally uses the provider-preferred review model
    when one is available. Some parent models need model-catalog metadata to
    select a different review model while keeping older `/models` payloads
    compatible when that metadata is absent.
    
    ## What changed
    
    - Added optional `ModelInfo::auto_review_model_override` metadata to the
    public model payload as a review-model slug.
    - Updated Guardian review model selection to prefer the catalog override
    when present, while preserving the existing provider preferred-model
    path and parent-model fallback when it is omitted.
    - Added focused Guardian coverage for override and no-override model
    selection.
    - Added an `auto_review` core integration suite test that loads override
    metadata from a remote model catalog path and asserts the strict
    auto-review `/responses` request uses the catalog-selected review model.
    - Updated existing `ModelInfo` fixtures and local catalog constructors
    for the new optional field.
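    
    The selection order above (catalog override, then provider-preferred
    model, then parent model) might look like the sketch below. The
    function name and parameters are hypothetical; only the precedence
    comes from this description.
    
    ```rust
    /// Pick the review model slug: the catalog override wins, then the
    /// provider-preferred model, then the parent model as the fallback.
    fn select_review_model<'a>(
        catalog_override: Option<&'a str>,
        provider_preferred: Option<&'a str>,
        parent_model: &'a str,
    ) -> &'a str {
        catalog_override
            .or(provider_preferred)
            .unwrap_or(parent_model)
    }
    ```
    
    When the override is omitted, as in older `/models` payloads, the
    function reduces to the pre-existing two-step behavior.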
    
    ## Validation
    
    - `cargo test -p codex-protocol
    model_info_defaults_availability_nux_to_none_when_omitted`
    - `cargo test -p codex-core guardian_review_uses_`
    - `cargo test -p codex-core
    remote_model_override_uses_catalog_model_for_strict_auto_review --test
    all`
    - `just fix -p codex-protocol`
    - `just fix -p codex-core`
    - `just fmt`
    - `git diff --check`
  • [codex] Require model for standalone web search (#25131)
    ## Why
    
    The standalone `/v1/alpha/search` request now requires a `model`, but
    the `web.run` extension currently omits it.
    
    This PR adds `model` to the extension `ToolCall` invocation.
    
    Follow-up to #23823.
    
    ## What changed
    
    - Make `SearchRequest.model` required.
    - Expose the effective per-turn model on extension tool calls and pass
    it in standalone web-search requests.
    - Assert the model is forwarded in the app-server round-trip test.
    
    ## Testing
    
    - `just test -p codex-api -p codex-tools -p codex-web-search-extension
    -p codex-memories-extension -p codex-goal-extension`
    - `just test -p codex-core -E
    'test(passes_turn_fields_and_scoped_turn_item_emitter_to_extension_call)'`
    - `just test -p codex-app-server -E
    'test(standalone_web_search_round_trips_encrypted_output)'`
  • Route extension image generation through the native image completion pipeline (#24972)
    ## Why
    
    The standalone `image_gen.imagegen` extension should behave like native
    image generation for artifact persistence and UI completion, while
    returning its save-location guidance as part of the tool result instead
    of injecting a developer message.
    
    ## What Changed
    
    - Added an image-generation completion hook for extension tools so core
    can persist generated images and emit the existing `ImageGeneration`
    lifecycle events.
    - Reused core image artifact persistence for extension output and
    removed extension-local save-path/file-writing logic.
    - Split shared image persistence from built-in finalization so native
    image generation keeps its existing developer-message instruction
    behavior.
    - Returned the generated image save-location instruction through the
    extension `FunctionCallOutput`, alongside the generated image input for
    model follow-up.
    - Preserved the existing image-generation event shape for current UI and
    replay compatibility.
    - Avoided cloning the full generated-image base64 payload when emitting
    the in-progress image item.
    - Removed dependencies no longer needed after moving persistence out of
    the extension crate.
    
    ## Fast Follow
    - Adjust the existing Extension API and add a general `TurnItem`
    finalization path so the code can be reused
    
    ## Validation
    
    - Ran `just fmt`.
    - Ran `just bazel-lock-update`.
    - Ran `just bazel-lock-check`.
    - Ran `just test -p codex-tools -p codex-extension-api -p
    codex-image-generation-extension`.
    - Ran `just test -p codex-core
    image_generation_publication_is_finalized_by_core`.
    - Ran `just test -p codex-core
    handle_output_item_done_records_image_save_history_message`.
    - Ran `just fix -p codex-tools -p codex-extension-api -p codex-core -p
    codex-image-generation-extension`.
  • [codex] Add model tool mode selector (#25031)
    ## Why
    Some models need to select their code-execution behavior through model
    catalog metadata. Models without that metadata must continue to follow
    the existing `CodeMode` and `CodeModeOnly` feature flags, including when
    a newer server sends an enum value this client does not recognize.
    
    ## What changed
    - add optional `ModelInfo.tool_mode` metadata with `direct`,
    `code_mode`, and `code_mode_only`
    - treat omitted and unknown wire values as `None`
    - resolve `None` from the existing feature flags
    - carry the resolved `ToolMode` directly on `TurnContext`, outside
    `Config`
    - use the resolved value for turn creation, model switches, review
    turns, tool planning, and code execution
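    
    A rough sketch of the parsing and fallback rules listed above. The
    function names are hypothetical, the real type presumably deserializes
    through the protocol layer rather than a hand-written `match`, and the
    precedence of `CodeModeOnly` over `CodeMode` in the flag fallback is an
    assumption.
    
    ```rust
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    enum ToolMode {
        Direct,
        CodeMode,
        CodeModeOnly,
    }
    
    /// Omitted and unknown wire values both become `None`.
    fn parse_tool_mode(wire: Option<&str>) -> Option<ToolMode> {
        match wire? {
            "direct" => Some(ToolMode::Direct),
            "code_mode" => Some(ToolMode::CodeMode),
            "code_mode_only" => Some(ToolMode::CodeModeOnly),
            _ => None,
        }
    }
    
    /// Explicit metadata wins; `None` falls back to the feature flags.
    fn resolve_tool_mode(
        metadata: Option<ToolMode>,
        code_mode_flag: bool,
        code_mode_only_flag: bool,
    ) -> ToolMode {
        metadata.unwrap_or(if code_mode_only_flag {
            ToolMode::CodeModeOnly
        } else if code_mode_flag {
            ToolMode::CodeMode
        } else {
            ToolMode::Direct
        })
    }
    ```
    
    Treating unknown values as `None` means a client that does not
    recognize a newer enum value behaves exactly as if the metadata were
    absent.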
    
    ## Coverage
    - add protocol coverage for omitted, known, and unknown enum values
    - add focused coverage for flag fallback and explicit metadata
    overriding feature flags
    - add core integration coverage that fetches remote model metadata
    through `/v1/models` and verifies the outbound `/responses` tools for
    explicit `direct` and `code_mode_only` selectors
    
    ## Stack
    - followed by #25032
  • extension-api: add TurnItemEmitter to tool calls (#24813)
    ## Why
    Extension-contributed tools need to emit visible turn items through
    Codex's normal event and persistence pipeline.
    
    ## What
    - Add `TurnItemEmitter` to extension `ToolCall`s and route the core
    implementation through `Session::emit_turn_item_*`.
    - Hold weak session and turn references so retained tool calls cannot
    keep host state alive.
    - Provide a no-op emitter for extension test callers.
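    
    The weak-reference design can be illustrated with a small,
    single-threaded sketch. `Session` here is a hypothetical stand-in that
    only records strings, and the real emitter also holds a turn
    reference; the point is that an emitter built from `Weak` cannot
    extend the session's lifetime.
    
    ```rust
    use std::cell::RefCell;
    use std::rc::{Rc, Weak};
    
    /// Hypothetical stand-in for host session state.
    struct Session {
        items: RefCell<Vec<String>>,
    }
    
    struct TurnItemEmitter {
        session: Weak<Session>,
    }
    
    impl TurnItemEmitter {
        fn new(session: &Rc<Session>) -> Self {
            Self { session: Rc::downgrade(session) }
        }
    
        /// No-op emitter for tests: it never points at a live session.
        fn noop() -> Self {
            Self { session: Weak::new() }
        }
    
        /// Returns false when the session is already gone.
        fn emit_started(&self, item: &str) -> bool {
            match self.session.upgrade() {
                Some(session) => {
                    session.items.borrow_mut().push(format!("started:{item}"));
                    true
                }
                None => false,
            }
        }
    }
    ```
    
    A retained `TurnItemEmitter` leaves the strong count at one, so
    dropping the session frees it and later emits return `false`.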
    
    ## Test Plan
    - `just test -p codex-core -E
    'test(passes_turn_fields_and_scoped_turn_item_emitter_to_extension_call)'`
    
    ---------
    
    Co-authored-by: jif-oai <jif@openai.com>
  • Update rmcp to 1.7.0 (#24763)
    Will make it easier to uprev when the new draft spec is supported.
    
    Also updates reqwest where needed for compatibility but doesn't update
    it everywhere since this is already a large diff.
    
    The new version of rmcp handles certain kinds of authentication failures
    differently; this patch includes support for identifying the failing scope
    in a WWW-Authenticate header.
  • fix: dont compact standalone websearch schema (#24660)
    Add a new `parse_tool_input_schema_without_compaction` function that
    bypasses the existing compaction/trimming of client-provided tool
    schemas over 4k bytes.
    
    We want this for standalone web search so that field guidance/metadata
    is kept on certain fields; this keeps us closer to parity with the
    existing hosted tool schema (which didn't go through this 4k-byte
    filter).
  • Restore legacy image detail values (#24644)
    ## Why
    
    Older persisted rollouts can contain `input_image.detail` values of
    `auto` or `low` from before `ImageDetail` was narrowed to
    `high`/`original`. Current deserialization rejects those values, which
    can make resume skip later compacted checkpoints and reconstruct an
    oversized raw suffix before the next compaction attempt.
    
    Confirmed Sentry reports fixed by this compatibility path:
    
    - [CODEX-1H3F](https://openai.sentry.io/issues/7500642496/)
    - [CODEX-1H6N](https://openai.sentry.io/issues/7501025347/)
    - [CODEX-1JDP](https://openai.sentry.io/issues/7504549065/)
    - [CODEX-1HW6](https://openai.sentry.io/issues/7503407986/)
    
    ## Background
    
    [openai/codex#20693](https://github.com/openai/codex/pull/20693) added
    image-detail plumbing for app-server `UserInput` so input images could
    explicitly request `detail: original`. The Slack discussion behind that
    PR was about ScreenSpot / bridge evals where user input images were
    resized, while tool output images already had MCP/code-mode ways to
    request image detail.
    
    In review, the intended new API surface was narrowed to `high` and
    `original`: default to `high`, allow `original` when callers need
    unchanged image handling, and avoid encouraging new `auto` or `low`
    usage. That policy still makes sense for newly emitted values.
    
    The missing compatibility piece is persisted history. Older rollouts can
    already contain `auto` and `low`, and resume reconstructs typed history
    by deserializing those rollout records. Rejecting old values at that
    boundary causes valid compacted checkpoints to be skipped. This PR
    restores `auto` and `low` as real variants so old records deserialize
    and round-trip without being rewritten as `high`, while product paths
    can continue to default to `high` and avoid emitting `auto` for new
    behavior.
    
    ## What changed
    
    - Restored `ImageDetail::Auto` and `ImageDetail::Low` as first-class
    protocol values.
    - Preserved `auto`/`low` through rollout deserialization, MCP image
    metadata, code-mode image output, and schema/type generation.
    - Kept local image byte handling conservative: only `original` switches
    to original-resolution loading; `auto`/`low`/`high` continue through the
    resize-to-fit path while retaining their detail value.
    - Added regression coverage for enum round-tripping and code-mode `low`
    detail handling.
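    
    As a rough illustration of the two rules above (all four values
    round-trip, but only `original` changes byte handling), consider the
    sketch below. The method names are hypothetical, and the real enum
    presumably relies on derived serialization rather than hand-written
    conversions.
    
    ```rust
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    enum ImageDetail {
        Auto,
        Low,
        High,
        Original,
    }
    
    impl ImageDetail {
        /// Parse a persisted `input_image.detail` value.
        fn from_wire(value: &str) -> Option<Self> {
            match value {
                "auto" => Some(Self::Auto),
                "low" => Some(Self::Low),
                "high" => Some(Self::High),
                "original" => Some(Self::Original),
                _ => None,
            }
        }
    
        /// Write the value back without rewriting legacy variants as `high`.
        fn as_wire(self) -> &'static str {
            match self {
                Self::Auto => "auto",
                Self::Low => "low",
                Self::High => "high",
                Self::Original => "original",
            }
        }
    
        /// Only `original` skips the resize-to-fit path.
        fn loads_original_resolution(self) -> bool {
            matches!(self, Self::Original)
        }
    }
    ```
    
    With this shape, a legacy `low` record deserializes, keeps its value
    on re-serialization, and still goes through the resize-to-fit path.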
    
    ## Testing
    
    - `just write-app-server-schema`
    - `just test -p codex-protocol`
    - `just test -p codex-tools`
    - `just test -p codex-code-mode`
    - `just test -p codex-app-server-protocol`
    - `just test -p codex-core
    suite::rmcp_client::stdio_image_responses_preserve_original_detail_metadata`
    - `just test -p codex-core
    suite::code_mode::code_mode_can_use_mcp_image_result_with_image_helper`
    - Loaded broken rollouts on local fixed builds, and started/completed
    new turns.
    
    I also attempted `just test -p codex-core`; the local broad run did not
    finish green: 2559 tests run, 2467 passed, 55 flaky, 91 failed, 1 timed
    out. The failures were broad timeout/deadline failures across unrelated
    areas; targeted changed-path core tests above passed.
  • standalone websearch extension (#23823)
    ## Summary
    
    Add the extension-backed standalone `web.run` tool so Codex can call the
    standalone search endpoint through the `codex-api` search client and
    return its encrypted output to Responses.
    
    - gate the new tool behind `standalone_web_search`
    - install the extension in the app-server thread registry and hide
    hosted `web_search` when standalone search is enabled for OpenAI
    providers so the two paths stay mutually exclusive
    - build search context from persisted history using a small tail
    heuristic: previous user message, assistant text between the last two
    user turns capped at about 1k tokens, and current user message
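    
    The tail heuristic in the last bullet could be sketched roughly as
    below. Everything here is illustrative: `Role`, `SearchContext`, and
    `build_context` are hypothetical names, the cap is expressed in
    characters as a stand-in for the roughly 1k-token limit, and keeping
    the head of the assistant text when truncating is an assumption.
    
    ```rust
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    enum Role {
        User,
        Assistant,
    }
    
    #[derive(Debug, PartialEq, Eq)]
    struct SearchContext {
        previous_user: Option<String>,
        assistant_between: String,
        current_user: String,
    }
    
    /// Build context from the tail of the history; `None` if no user message.
    fn build_context(history: &[(Role, &str)], max_chars: usize) -> Option<SearchContext> {
        // The last user message is the current one.
        let current = history.iter().rposition(|(role, _)| *role == Role::User)?;
        // The user message before it, if any.
        let previous = history[..current]
            .iter()
            .rposition(|(role, _)| *role == Role::User);
        let assistant_between = match previous {
            Some(prev) => history[prev + 1..current]
                .iter()
                .filter(|(role, _)| *role == Role::Assistant)
                .map(|(_, text)| *text)
                .collect::<Vec<_>>()
                .join("\n")
                .chars()
                .take(max_chars) // character cap standing in for a token cap
                .collect(),
            None => String::new(),
        };
        Some(SearchContext {
            previous_user: previous.map(|prev| history[prev].1.to_string()),
            assistant_between,
            current_user: history[current].1.to_string(),
        })
    }
    ```
    
    Older turns are ignored entirely, so the context stays bounded
    regardless of how long the thread is.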
    
    ## Test Plan
    
    - `cargo test -p codex-web-search-extension`
    - `cargo test -p codex-api`
    - `cargo test -p codex-core
    hosted_tools_follow_provider_auth_model_and_config_gates`
  • Move MCP tool naming mode into manager (#21576)
    ## Why
    
    The `non_prefixed_mcp_tool_names` feature should be applied where MCP
    tools become model-visible, not by remapping names later in core.
    Keeping the decision in `McpConnectionManager` construction makes
    `ToolInfo` the single shaped view that spec building, deferred tool
    search, routing, and unavailable-tool placeholders can consume directly.
    
    This also preserves the existing external behavior while the feature is
    off, and keeps the feature-on behavior for code mode and hooks explicit
    at the manager boundary.
    
    ## What Changed
    
    - Add `McpToolNameMode` to `codex-mcp` and flow it through `McpConfig`
    into `McpConnectionManager::new`.
    - Normalize MCP `ToolInfo` names in the manager using either
    legacy-prefixed namespaces or non-prefixed namespaces; the legacy path
    adds `mcp__` without restoring the old trailing namespace suffix.
    - Remove the core-side MCP name remapping path so specs, tool search,
    session resolution, and unavailable-tool placeholder construction use
    the manager-provided `ToolName` values directly.
    - Keep code mode flattening on the `__` namespace separator.
    - Preserve hook compatibility by giving non-prefixed MCP hook names
    legacy `mcp__...` matcher aliases.
    - Add/adjust integration and unit coverage for non-prefixed code-mode
    behavior, hook matching with the feature on and off, and manager-level
    legacy prefixing.
    
    ## Testing
    
    - `cargo test -p codex-mcp --lib`
    - `cargo test -p codex-core --lib tools::spec::tests -- --nocapture`
    - `cargo test -p codex-core --lib mcp_tools -- --nocapture`
    - `cargo test -p codex-core --lib mcp_tool_exposure -- --nocapture`
    - `cargo test -p codex-core --test all mcp_tool -- --nocapture`
    - `cargo test -p codex-core --test all search_tool -- --nocapture`
    - `cargo test -p codex-core --test all hooks_mcp -- --nocapture`
    - `cargo test -p codex-core --test all
    code_mode_uses_non_prefixed_mcp_tool_names_when_feature_enabled --
    --nocapture`
    - `cargo test -p codex-tools`
    - `cargo test -p codex-features`
  • feat: best-effort compact large tool schemas (#23904)
    ## Why
    
    The `dev/cc/ref-def` branch preserves richer JSON Schema detail for
    connector tools, including `$defs` and nested shapes. That improves
    fidelity, but it pushes the largest connector schemas well past the
    intended tool-schema budget. This PR adds a best-effort compaction pass
    for unusually large tool input schemas so the p99 and max tails stay
    small while ordinary schemas are left alone.
    
    ## What Changed
    
    - Added best-effort large-schema compaction in
    `codex-rs/tools/src/json_schema.rs` after schema sanitization and
    definition pruning.
    - Compaction runs as a waterfall only while the compact JSON budget
    proxy is exceeded:
      1. Strip schema `description` metadata.
      2. Drop root `$defs` / `definitions`.
      3. Collapse deep nested complex schema objects to `{}`.
    - Kept top-level argument names and immediate schema shape where
    possible.
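    
    The waterfall can be sketched as a list of passes that stops as soon
    as the schema fits the budget. This is an illustration under several
    assumptions: `Node` is a toy stand-in for a JSON value, `size` is a
    crude byte-count proxy, the depth limit of 2 is arbitrary, and the
    description pass does not special-case a property that is literally
    named `description`.
    
    ```rust
    use std::collections::BTreeMap;
    
    /// Toy stand-in for a JSON value (hypothetical).
    #[derive(Clone, Debug, PartialEq)]
    enum Node {
        Text(String),
        Object(BTreeMap<String, Node>),
    }
    
    fn text(s: &str) -> Node {
        Node::Text(s.to_string())
    }
    
    fn obj(entries: Vec<(&str, Node)>) -> Node {
        Node::Object(entries.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
    }
    
    /// Crude budget proxy: approximate length of a compact encoding.
    fn size(node: &Node) -> usize {
        match node {
            Node::Text(s) => s.len() + 2,
            Node::Object(map) => 2 + map.iter().map(|(k, v)| k.len() + 3 + size(v)).sum::<usize>(),
        }
    }
    
    /// Pass 1: strip `description` keys at every level.
    fn strip_descriptions(node: &mut Node) {
        if let Node::Object(map) = node {
            map.remove("description");
            map.values_mut().for_each(strip_descriptions);
        }
    }
    
    /// Pass 2: drop root-level definition tables.
    fn drop_root_defs(node: &mut Node) {
        if let Node::Object(map) = node {
            map.remove("$defs");
            map.remove("definitions");
        }
    }
    
    /// Pass 3: collapse objects nested deeper than `max_depth` to `{}`.
    fn collapse_deep(node: &mut Node, depth: usize, max_depth: usize) {
        if let Node::Object(map) = node {
            if depth > max_depth {
                map.clear();
            } else {
                for child in map.values_mut() {
                    collapse_deep(child, depth + 1, max_depth);
                }
            }
        }
    }
    
    fn collapse_default(node: &mut Node) {
        collapse_deep(node, 0, 2);
    }
    
    /// Run each pass only while the schema is still over budget.
    fn compact(node: &mut Node, budget: usize) {
        let passes: [fn(&mut Node); 3] = [strip_descriptions, drop_root_defs, collapse_default];
        for pass in passes {
            if size(node) <= budget {
                return;
            }
            pass(node);
        }
    }
    ```
    
    Because the budget check precedes every pass, a schema that already
    fits is returned untouched, and one that fits after losing its
    descriptions keeps its `$defs`.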
    
    ## Corpus Results
    
    Scope: 2,025 schemas under `golden_schemas`, all parsed successfully.
    Token count is `o200k_base` over compact JSON from
    `parse_tool_input_schema`.
    
    | Percentile | Before `origin/main` `4dbca61e20` | After branch `dev/cc/ref-def` `f9bf071758` | After this PR |
    |---|---:|---:|---:|
    | p0 | 9 | 9 | 9 |
    | p10 | 59 | 63 | 63 |
    | p25 | 81 | 86 | 86 |
    | p50 | 114 | 127 | 125 |
    | p75 | 174 | 205 | 202 |
    | p90 | 295 | 335 | 322 |
    | p95 | 391 | 526 | 422 |
    | p99 | 794 | 1,303 | 689 |
    | max | 2,836 | 3,337 | 887 |
    
    After this PR, `0 / 2,025` schemas are over 1k tokens.
    
    ### Compaction Savings
    
    These are cumulative waterfall stages over the same corpus. Later passes
    only run for schemas that are still over the compact JSON budget proxy.
    
    | Stage | Total tokens | Step savings | Schemas changed by step |
    |---|---:|---:|---:|
    | No compaction | 391,862 | - | - |
    | Strip schema `description` metadata | 350,961 | 40,901 | 66 |
    | Drop root `$defs` / `definitions` | 340,683 | 10,278 | 13 |
    | Collapse deep complex schemas to `{}` | 335,875 | 4,808 | 6 |
  • Expose conversation history to extension tools (#23963)
    ## Why
    
    Extension tools that need conversation context should be able to read it
    from the live tool invocation instead of reaching into thread
    persistence themselves.
    
    ## What changed
    
    - Add a `ConversationHistory` snapshot to extension `ToolCall`s and
    populate it from the current raw in-memory response history.
    - Expose all history items at this boundary so each extension can filter
    and bound the subset it needs before consuming or forwarding it.
    - Cover the adapter and registry dispatch paths and update existing
    extension tests that construct `ToolCall` literals.
    
    ## Test plan
    
    - `cargo test -p codex-tools`
    - `cargo test -p codex-extension-api`
    - `cargo test -p codex-goal-extension`
    - `cargo test -p codex-memories-extension`
    - `cargo test -p codex-core passes_turn_fields_to_extension_call`
    - `cargo test -p codex-core
    extension_tool_executors_are_model_visible_and_dispatchable`
  • feat: support local refs and defs in tool input schemas (#23357)
    # Why
    
    Some connector tool input schemas use local JSON Schema references and
    definition tables to avoid duplicating large nested shapes. Codex
    previously lowered these schemas into the supported subset in a way that
    could discard `$ref`-only schema objects and lose the corresponding
    definitions, which made non-strict tool registration less faithful than
    the original connector schema.
    
    This keeps the existing minimal-lowering policy: Codex still does not
    raw-pass through arbitrary JSON Schema, but it now preserves local
    reference structure that fits the Responses-compatible subset and prunes
    definition entries that cannot be reached by following `$ref`s from the
    root schema after sanitization, including refs found transitively inside
    other reachable definitions. The pruning matters because Responses
    parses definition tables even when entries are unused, so keeping dead
    definitions wastes prompt tokens.
    
    # What changed
    
    - Added `$ref`, `$defs`, and legacy `definitions` fields to the tool
    `JsonSchema` representation.
    - Updated `parse_tool_input_schema` lowering so `$ref`-only schema
    objects survive sanitization instead of becoming `{}`.
    - Sanitized definition tables recursively and dropped malformed
    definition tables so non-strict registration degrades gracefully.
    - Added reachability pruning for root definition tables by starting from
    refs outside definition tables, then following refs inside reachable
    definitions.
    - Added JSON Pointer decoding for local definition refs such as
    `#/$defs/Foo~1Bar`.
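    
    The last two bullets can be sketched as follows. This is not the
    implementation in `json_schema.rs`: both function names are
    hypothetical, the ref graph is passed in pre-extracted rather than
    walked from a schema, and refs that point below a definition are
    simply rejected. The escape handling follows JSON Pointer (RFC 6901),
    where `~1` decodes to `/` and `~0` decodes to `~`, in that order.
    
    ```rust
    use std::collections::{BTreeMap, BTreeSet};
    
    /// Decode `#/$defs/<name>` or `#/definitions/<name>` into the name.
    fn decode_local_def_ref(reference: &str) -> Option<String> {
        let token = reference
            .strip_prefix("#/$defs/")
            .or_else(|| reference.strip_prefix("#/definitions/"))?;
        if token.contains('/') {
            // Simplification: refs into a definition's interior are rejected.
            return None;
        }
        // Order matters: `~1` first, then `~0`, as RFC 6901 requires.
        Some(token.replace("~1", "/").replace("~0", "~"))
    }
    
    /// Definitions reachable from refs outside the definition table,
    /// following refs inside reachable definitions transitively.
    fn reachable_defs<'a>(
        root_refs: &[&'a str],
        def_refs: &BTreeMap<&'a str, Vec<&'a str>>,
    ) -> BTreeSet<&'a str> {
        let mut seen = BTreeSet::new();
        let mut stack: Vec<&'a str> = root_refs.to_vec();
        while let Some(name) = stack.pop() {
            // Skip refs to missing definitions and already-visited names.
            if let Some(children) = def_refs.get(name) {
                if seen.insert(name) {
                    stack.extend(children.iter().copied());
                }
            }
        }
        seen
    }
    ```
    
    Any definition absent from the returned set can be pruned, including
    definitions that reference reachable ones but are never referenced
    themselves.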
    
    # Verification
    Ran local golden-schema probes against representative connector schemas
    to validate behavior on real generated schemas:
    
    | Golden schema | Before bytes | After bytes | `$defs` before -> after | `$ref` before -> after | Result |
    |---|---:|---:|---:|---:|---|
    | `google_calendar/create_space` | 7111 | 4526 | 7 -> 7 | 7 -> 7 | all definitions preserved because all are reachable |
    | `figma/apply_file_variable_changes` | 4609 | 999 | 8 -> 5 | 8 -> 5 | unused defs pruned after unsupported `oneOf` shapes lower away |
    | `snowflake/list_catalog_integrations` | 1380 | 404 | 3 -> 0 | 0 -> 0 | all defs pruned because none are referenced |
    | `dropbox/create_shared_link` | 8894 | 1836 | 14 -> 4 | 9 -> 4 | only defs reachable from the root schema after sanitization are retained, including transitively through other retained defs |
    
    Token increase across the golden schemas due to this change:
    <img width="817" height="366" alt="Screenshot 2026-05-19 at 1 47 04 PM"
    src="https://github.com/user-attachments/assets/d5c80fe9-da85-41e6-8ac7-a01d1e0b0f71"
    />
  • Make tool executor specs mandatory (#23870)
    ## Why
    
    `ToolExecutor` is the runtime contract that keeps a callable tool and
    its model-visible spec together. Leaving `spec()` optional lets a
    registered runtime silently omit that half of the contract, and it also
    overloads a missing spec as an exposure decision for tools that should
    stay dispatchable without being shown to the model.
    
    ## What
    
    - Make `ToolExecutor::spec()` required and update core, extension, and
    test tool executors to return a concrete `ToolSpec`.
    - Add `ToolExposure::Hidden` for dispatch-only tools. The legacy
    `shell_command` runtime in unified-exec sessions now uses that explicit
    exposure instead of hiding itself by omitting a spec.
    - Build MCP tool specs when `McpHandler` is constructed so invalid MCP
    specs are skipped before the handler is registered.
    - Keep tool planning aligned with the new contract for direct, deferred,
    hidden, code-mode, dynamic, and namespaced tool paths.
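    
    A simplified view of the new contract: the spec is a required trait
    method and visibility is a separate, explicit decision. Only
    `ToolExecutor::spec()` and `ToolExposure::Hidden` are named in this
    description; the other variants, the `exposure` method, the `ToolSpec`
    fields, and the sample executors are assumptions made for the sketch.
    
    ```rust
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    enum ToolExposure {
        Direct,   // assumed variant: shown to the model immediately
        Deferred, // assumed variant: discoverable later
        Hidden,   // dispatchable, never shown to the model
    }
    
    /// Minimal stand-in for a model-visible tool spec.
    struct ToolSpec {
        name: String,
    }
    
    trait ToolExecutor {
        /// Required: every executor carries a concrete spec.
        fn spec(&self) -> ToolSpec;
    
        /// Exposure is explicit instead of being inferred from a missing spec.
        fn exposure(&self) -> ToolExposure {
            ToolExposure::Direct
        }
    }
    
    /// Hypothetical executor that is shown to the model.
    struct ExampleTool;
    
    impl ToolExecutor for ExampleTool {
        fn spec(&self) -> ToolSpec {
            ToolSpec { name: "example_tool".to_string() }
        }
    }
    
    /// Dispatch-only runtime, in the spirit of the legacy `shell_command`.
    struct LegacyShell;
    
    impl ToolExecutor for LegacyShell {
        fn spec(&self) -> ToolSpec {
            ToolSpec { name: "shell_command".to_string() }
        }
    
        fn exposure(&self) -> ToolExposure {
            ToolExposure::Hidden
        }
    }
    
    /// Names of tools the model sees up front.
    fn model_visible_names(tools: &[Box<dyn ToolExecutor>]) -> Vec<String> {
        tools
            .iter()
            .filter(|tool| tool.exposure() == ToolExposure::Direct)
            .map(|tool| tool.spec().name)
            .collect()
    }
    ```
    
    Both executors stay registered and dispatchable, but only
    `example_tool` appears in the model-visible list.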
    
    ## Testing
    
    - Added tool-plan coverage that invalid MCP tool specs are not
    registered.
    - Updated shell-family coverage for the hidden legacy `shell_command`
    runtime and the affected tool executor test fixtures.
  • Honor client-resolved service tier defaults (#23537)
    ## Why
    
    Model catalog responses can now advertise a nullable
    `default_service_tier` for each model. Codex needs to preserve three
    distinct states all the way from config/app-server inputs to inference:
    
    - no explicit service tier, so the client may apply the current model
    catalog default when FastMode is enabled
    - explicit `default`, meaning the user intentionally wants standard
    routing
    - explicit catalog tier ids such as `priority`, `flex`, or future tiers
    
    Keeping those states distinct prevents the UI from showing one tier
    while core sends another, especially after model switches or app-server
    `thread/start` / `turn/start` updates.
    
    ## What Changed
    
    - Plumbed `default_service_tier` through model catalog protocol types,
    app-server model responses, generated schemas, model cache fixtures, and
    provider/model-manager conversions.
    - Added the request-only `default` service tier sentinel and normalized
    legacy config spelling so `fast` in `config.toml` still materializes as
    the runtime/request id `priority`.
    - Moved catalog default resolution to the TUI/client side, including
    recomputing the effective service tier when model/FastMode-dependent
    surfaces change.
    - Updated app-server thread lifecycle config construction so
    `serviceTier: null` preserves explicit standard-routing intent by
    mapping to `default` instead of internal `None`.
    - Kept core responsible for validating explicit tiers against the
    current model and stripping `default` before `/v1/responses`, without
    applying catalog defaults itself.
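    
    The three states and the split between client and core might be
    modeled as in the sketch below. All names are hypothetical; the sketch
    only encodes the rules stated above: `fast` normalizes to `priority`,
    the client applies the catalog default only when nothing explicit is
    set and FastMode is on, and the `default` sentinel is stripped before
    the request.
    
    ```rust
    #[derive(Clone, Debug, PartialEq, Eq)]
    enum ServiceTierChoice {
        /// No explicit tier; the client may apply the catalog default.
        Unset,
        /// Explicit `default`: the user wants standard routing.
        ExplicitDefault,
        /// Explicit catalog tier id such as `priority` or `flex`.
        Tier(String),
    }
    
    /// Legacy config spelling: `fast` means the request id `priority`.
    fn normalize_config_tier(raw: &str) -> &str {
        if raw == "fast" { "priority" } else { raw }
    }
    
    /// Client side: resolve the effective tier for the current model.
    fn resolve_effective(
        choice: &ServiceTierChoice,
        fast_mode: bool,
        catalog_default: Option<&str>,
    ) -> ServiceTierChoice {
        match (choice, fast_mode, catalog_default) {
            (ServiceTierChoice::Unset, true, Some(tier)) => {
                ServiceTierChoice::Tier(tier.to_string())
            }
            _ => choice.clone(),
        }
    }
    
    /// Core side: the sentinel never reaches `/v1/responses`.
    fn request_tier(choice: &ServiceTierChoice) -> Option<&str> {
        match choice {
            ServiceTierChoice::Tier(id) => Some(id.as_str()),
            ServiceTierChoice::ExplicitDefault | ServiceTierChoice::Unset => None,
        }
    }
    ```
    
    Keeping `ExplicitDefault` distinct from `Unset` is what lets an
    explicit standard-routing choice survive a model switch without being
    replaced by the new model's catalog default.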
    
    ## Validation
    
    - `CARGO_INCREMENTAL=0 cargo build -p codex-cli`
    - `CARGO_INCREMENTAL=0 cargo test -p codex-app-server model_list`
    - `cargo test -p codex-tui service_tier`
    - `cargo test -p codex-protocol service_tier_for_request`
    - `cargo test -p codex-core get_service_tier`
    - `RUST_MIN_STACK=8388608 CARGO_INCREMENTAL=0 cargo test -p codex-core
    service_tier`
  • feat: add turn_id and truncation_policy to extension tool calls (#23666)
    ## Why
    
    Extension-owned tools currently receive a stripped `ToolCall` with only
    `call_id`, `tool_name`, and `payload`.
    That makes extension work that needs turn-local execution context
    awkward, especially web-search extension work that needs the active
    `truncation_policy` at tool invocation time.
    
    Reconstructing that value from config or `ExtensionData` would be
    indirect and could drift from the actual turn context, so the cleaner
    fix is to pass the needed turn metadata directly on the extension-facing
    invocation type.
    
    ## What changed
    
    - added `turn_id` and `truncation_policy` to `codex_tools::ToolCall`
    - populated those fields when core adapts `ToolInvocation` into an
    extension tool call
    - added a focused adapter test that verifies extension executors receive
    the forwarded turn metadata
    - updated the memories extension tests to construct the richer
    `ToolCall`
    - added the `codex-utils-output-truncation` dependency to `codex-tools`
    and refreshed lockfiles
    
    ## Testing
    
    - `cargo test -p codex-tools`
    - `cargo test -p codex-memories-extension`
    - `cargo test -p codex-core passes_turn_fields_to_extension_call`
    - `just bazel-lock-update`
    - `just bazel-lock-check`
  • add encryptedcontent to functioncalloutput (#23500)
    Add a new `EncryptedContent` variant to `FunctionCallOutputContentItem`
    ahead of standalone web search.
    
    We need to be able to receive encrypted function call output from the
    new web search endpoint and pass it back to the Responses API, as we
    cannot expose direct search results.
  • Split plugin install discovery into list and request tools (#23372)
    ## Summary
    - Add `list_available_plugins_to_install` as the inventory step for
    plugin and connector install suggestions.
    - Slim `request_plugin_install` so it only handles the actual
    elicitation, instead of carrying the full discoverable list in its
    prompt.
    - Emit send-time telemetry when an install elicitation is dispatched,
    including requested tool identity in the event payload.
    - Emit install-result telemetry through `SessionTelemetry`, including
    tool type, user response action, and completion status.
    - Update registration and tests to cover the new two-step flow while
    keeping the existing `tool_suggest` feature gate unchanged.
    
    ## Testing
    - `just fmt`
    - `cargo test -p codex-tools`
    - `cargo test -p codex-core request_plugin_install`
    - `cargo test -p codex-core list_available_plugins_to_install`
    - `cargo test -p codex-core
    install_suggestion_tools_can_be_registered_without_search_tool`
    - `cargo test -p codex-otel
    manager_records_plugin_install_suggestion_metric`
    - `cargo test -p codex-otel
    manager_records_plugin_install_elicitation_sent_metric`
    - `just fix -p codex-core`
    - `just fix -p codex-tools`
    - `just fix -p codex-otel`
    - `cargo check -p codex-core`
  • Remove ToolsConfig from tool planning (#22835)
    ## Why
    
    `codex-tools` is meant to hold reusable tool primitives, but
    `ToolsConfig` had become a second copy of core runtime decisions instead
    of a small shared contract. It carried provider capabilities, auth/model
    gates, permission and environment state, web/search/image feature gates,
    multi-agent settings, and goal availability from core into `codex-tools`
    ([definition](https://github.com/openai/codex/blob/22dd9ad3929253ed24d7ee4f10f238e95ab25f37/codex-rs/tools/src/tool_config.rs#L97),
    [stored on each
    `TurnContext`](https://github.com/openai/codex/blob/22dd9ad3929253ed24d7ee4f10f238e95ab25f37/codex-rs/core/src/session/turn_context.rs#L87)).
    Every session/context variant then had to build and mutate that snapshot
    before assembling tools.
    
    This PR removes that master object instead of renaming it. Tool planning
    now reads the live `TurnContext`, where `codex-core` already owns those
    decisions, while `codex-tools` keeps only reusable primitives and a
    generic `ToolSetBuilder`/`ToolSet` accumulator.
    
    ## What Changed
    
    - Removed `ToolsConfig` / `ToolsConfigParams` from `codex-tools`; the
    crate keeps the shared helpers that still belong there, including
    request-user-input mode selection, shell backend/type resolution,
    `UnifiedExecShellMode`, and `ToolEnvironmentMode`.
    - Replaced config-snapshot planning with `ToolRouter::from_turn_context`
    and a `spec_plan` pipeline over `CoreToolPlanContext`, deriving provider
    capabilities, auth gates, model support, feature gates, environment
    count, goal support, multi-agent options, web search, and image
    generation from the authoritative turn state.
    - Added generic `codex_tools::ToolSetBuilder` / `ToolSet`, plus the
    small core adapter needed to accumulate `CoreToolRuntime` values and
    hosted model specs.
    - Added the `tool_family::shell` registration module and moved
    shell/unified-exec/memory accounting call sites to read the narrow
    per-turn fields directly.
    - Narrowed `TurnContext` to the remaining explicit per-turn fields
    needed by planning: `available_models`, `unified_exec_shell_mode`, and
    `goal_tools_supported`.
    - Reworked MCP exposure and tool-search setup so deferred/direct MCP
    behavior is driven by the current turn rather than a precomputed config
    snapshot.
    - Replaced the large expected-spec fixture tests with focused
    behavior-level coverage for shell tools, environments, goal and
    agent-job gates, MCP direct/deferred exposure, tool search,
    request-plugin-install, code mode, multi-agent mode, hosted tools, and
    extension executor dispatch.
    
    ## Verification
    
    - `cargo check -p codex-tools`
    - `cargo check -p codex-core --lib`
    - `cargo test -p codex-tools`
    - `cargo test -p codex-core spec_plan --lib`
    - `cargo test -p codex-core router --lib`
  • Remove ToolSearch feature toggle (#23389)
    ## Summary
    - mark `ToolSearch` as removed and ignore stale config writes for its
    legacy key
    - make search tool exposure depend only on model capability, not a
    feature toggle
    - remove app-server enablement support and prune now-obsolete test
    coverage/setup
    
    ## Verification
    - `cargo test -p codex-features`
    - `cargo test -p codex-tools`
    - `cargo test -p codex-core search_tool_requires_model_capability`
    - `cargo test -p codex-app-server experimental_feature_enablement_set_`
    
    ## Notes
    - This keeps the legacy config key as a no-op for compatibility while
    removing the ability to toggle the behavior off.
    - No developer-facing docs update outside the touched app-server README
    was needed.
  • fix: default unknown tool schemas to empty schemas (#22380)
    ## Why
    
    Some tool providers, especially MCP servers and dynamic tool sources,
    can supply schema nodes that omit `type` and have no recognized JSON
    Schema shape hints. Previously, `sanitize_json_schema` filled those
    unknown nodes in as `string`, which made the schema parseable but
    invented a scalar constraint that the provider did not specify. For
    description-only fields, that could incorrectly steer tool arguments
    away from the provider's actual accepted shape.
    
    The Responses API accepts permissive empty schemas such as `{}` at
    nested property positions, so Codex should preserve that permissive
    meaning instead of coercing unknown schema nodes into a misleading
    scalar type.
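    
    The fallback described above can be sketched roughly as follows. This
    is not the code in `json_schema.rs`: the `Json` enum is a tiny stand-in
    for a real JSON value, `sanitize` and `obj` are hypothetical names, and
    the sketch only models `type` and `properties` as recognized hints.
    
    ```rust
    use std::collections::BTreeMap;

    /// Tiny stand-in for a JSON value, just enough for this sketch.
    #[derive(Debug, Clone, PartialEq)]
    enum Json {
        Str(String),
        Obj(BTreeMap<String, Json>),
    }

    /// Hypothetical sanitizer: keeps nodes that declare a `type`, infers
    /// `object` when `properties` is present, and clears the rest to `{}`.
    fn sanitize(node: &Json) -> Json {
        let Json::Obj(map) = node else {
            return node.clone();
        };
        let has_type = map.contains_key("type");
        let has_props = map.contains_key("properties");
        if !has_type && !has_props {
            // No recognized hints: keep the permissive meaning as `{}`
            // instead of inventing `type: "string"`.
            return Json::Obj(BTreeMap::new());
        }
        let mut out = map.clone();
        if !has_type {
            out.insert("type".to_string(), Json::Str("object".to_string()));
        }
        if let Some(Json::Obj(props)) = map.get("properties") {
            // Recurse so nested metadata-only properties are cleared too.
            let cleaned = props
                .iter()
                .map(|(name, schema)| (name.clone(), sanitize(schema)))
                .collect();
            out.insert("properties".to_string(), Json::Obj(cleaned));
        }
        Json::Obj(out)
    }

    /// Convenience constructor for object nodes.
    fn obj(entries: &[(&str, Json)]) -> Json {
        Json::Obj(entries.iter().map(|(k, v)| (k.to_string(), v.clone())).collect())
    }
    ```
    
    Under these assumptions, a description-only nested property becomes
    `{}`, while its parent is still inferred as an object because it
    carries `properties`.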
    
    ## What Changed
    
    - Changed the no-hints fallback in `codex-rs/tools/src/json_schema.rs`
    to clear unrecognized object schema nodes to `{}`.
    - Empty schemas now remain `{}` rather than becoming `type: "string"`.
    - Description-only or otherwise metadata-only nested property schemas
    now become `{}` while surrounding object/array/string/number inference
    still applies when recognized hints are present.
    - Updated `codex-tools` and `codex-core` tests to cover top-level empty
    schemas, nested empty schemas, metadata-only malformed schemas, dynamic
    tools, and MCP tool specs.
    
    ## Verification
    
    - `cargo test -p codex-tools`
    - `cargo test -p codex-core
    test_mcp_tool_property_missing_type_defaults_to_empty_schema`
    - Manually verified the real Responses API behavior for both
    empty-schema positions:
    - Top-level function `parameters: {}` is accepted and echoed back as
    `{"type":"object","properties":{}}`; when forced to call the tool,
    Responses emitted empty object arguments: `"arguments": "{}"`.
    - Nested property schema `{}` is accepted and preserved as `{}`; when
    forced to call a tool with `metadata.extra`, Responses emitted
    `"arguments": "{\"metadata\":{\"extra\":\"codex schema sanitizer
    behavior\"}}"`.
  • Make multi-agent v2 tool namespace configurable (#23147)
    ## Summary
    - Add `features.multi_agent_v2.tool_namespace` with config/schema
    validation for Responses-compatible namespace values.
    - Thread the resolved namespace into `ToolsConfig` for normal turns and
    review turns.
    - Wrap MultiAgentV2 tool specs and registry names in the configured
    namespace when namespace tools are supported, while falling back to the
    plain tool names when they are not.
    
    ## Validation
    - `just fmt`
    - `just write-config-schema`
    - `cargo test -p codex-features multi_agent_v2_feature_config --
    --nocapture`
    - `cargo test -p codex-core test_build_specs_multi_agent_v2 --
    --nocapture`
    - `cargo test -p codex-core multi_agent_v2_config -- --nocapture`
    - `cargo test -p codex-core
    multi_agent_v2_rejects_invalid_tool_namespace -- --nocapture`
    - `cargo test -p codex-tools`
    - `git diff --check`
  • Preserve image detail in app-server inputs (#20693)
    ## Summary
    
    - Add optional image detail to user image inputs across core, app-server
    v2, thread history/event mapping, and the generated app-server
    schemas/types.
    - Preserve requested detail when serializing Responses image inputs:
    omitted detail stays on the existing `high` default, while explicit
    `original` keeps local images on the original-resolution path.
    - Support `high`/`original` consistently for tool image outputs,
    including MCP `codex/imageDetail`, code-mode image helpers, and
    `view_image`.
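    
    As a rough illustration of the defaulting rule, assuming a hypothetical
    `ImageDetail` enum and helper names that are not taken from the
    codebase (unknown strings returning `None` is also an assumption):
    
    ```rust
    /// Hypothetical enum covering the two detail levels named above.
    #[derive(Debug, Clone, Copy, PartialEq)]
    enum ImageDetail {
        High,
        Original,
    }

    /// Parses a requested detail string; anything else is treated as unset here.
    fn parse_detail(raw: &str) -> Option<ImageDetail> {
        match raw {
            "high" => Some(ImageDetail::High),
            "original" => Some(ImageDetail::Original),
            _ => None,
        }
    }

    /// Omitted detail stays on the `high` default; explicit values are preserved.
    fn resolve_detail(requested: Option<ImageDetail>) -> ImageDetail {
        requested.unwrap_or(ImageDetail::High)
    }
    ```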
  • Simplify tool executor and registry plumbing (#22636)
    ## Why
    
    The tool runtime path still had a typed output associated type on
    `ToolExecutor`, plus a core-only `RegisteredTool` adapter and
    extension-only executor aliases. That made every new shared tool runtime
    carry extra adapter plumbing before it could participate in core
    dispatch, extension tools, hook payloads, telemetry, and model-visible
    spec generation.
    
    This PR moves output erasure to the shared executor boundary so core and
    extension tools can use the same execution contract directly.
    
    ## What Changed
    
    - Changed `codex_tools::ToolExecutor` to return `Box<dyn ToolOutput>`
    instead of an associated `Output` type.
    - Removed the extension-specific `ExtensionToolExecutor` /
    `ExtensionToolOutput` aliases and exposed `ToolExecutor<ToolCall>` plus
    `ToolOutput` through `codex-extension-api`.
    - Reworked core tool registration around `CoreToolRuntime` and
    `ToolRegistry::from_tools`, removing the extra `RegisteredTool` /
    `ToolRegistryBuilder` layer.
    - Consolidated model-visible spec planning and registry construction in
    `core/src/tools/spec_plan.rs`, including deferred tool search and
    code-mode-only filtering.
    - Added `ToolOutput` helpers for post-tool-use hook ids and inputs so
    MCP, unified exec, extension, and other boxed outputs preserve the same
    hook payload behavior.
    - Updated core handlers, memories tools, and the related
    registry/spec/router tests to use the simplified contract.
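    
    To illustrate why erasing the output type removes adapter plumbing,
    here is a simplified, synchronous sketch. The real `ToolExecutor` is
    async and its methods differ; `to_model_text`, `TextOutput`, `EchoTool`,
    `dispatch`, and the fields of `ToolCall` are all hypothetical.
    
    ```rust
    /// Simplified stand-in for the shared output trait.
    trait ToolOutput {
        fn to_model_text(&self) -> String;
    }

    /// Synchronous sketch of an executor that returns a boxed output
    /// instead of an associated `Output` type.
    trait ToolExecutor<Invocation> {
        fn name(&self) -> &str;
        fn handle(&self, invocation: Invocation) -> Result<Box<dyn ToolOutput>, String>;
    }

    struct TextOutput(String);

    impl ToolOutput for TextOutput {
        fn to_model_text(&self) -> String {
            self.0.clone()
        }
    }

    /// Hypothetical invocation type carrying raw arguments.
    struct ToolCall {
        arguments: String,
    }

    struct EchoTool;

    impl ToolExecutor<ToolCall> for EchoTool {
        fn name(&self) -> &str {
            "echo"
        }

        fn handle(&self, call: ToolCall) -> Result<Box<dyn ToolOutput>, String> {
            Ok(Box::new(TextOutput(call.arguments)))
        }
    }

    /// With the output erased, one dispatcher serves every executor.
    fn dispatch(tool: &dyn ToolExecutor<ToolCall>, call: ToolCall) -> Result<String, String> {
        tool.handle(call).map(|output| output.to_model_text())
    }
    ```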
    
    ## Test Coverage
    
    - Updated coverage for tool spec planning, registry lookup, deferred
    tool search registration, extension tool routing, post-tool-use hook
    payloads, dispatch tracing, guardian output extraction, and memories
    extension tool execution.
  • chore(features) rm Feature::ApplyPatchFreeform (#22711)
    ## Summary
    Removes the feature, since it is effectively on by default in all cases
    where we should use it and can otherwise be configured via `models.json`.
    
    ## Testing
    - [x] unit tests pass
  • feat: make ToolExecutor an async trait (#22560)
    ## Why
    
    `codex_tools::ToolExecutor` keeps a tool spec attached to its runtime
    handler, but extension tools still carried a parallel
    `ExtensionToolFuture` / `ExtensionToolExecutor` shape. That made
    extension-owned tools look different from host tools even though
    routing, registration, and execution need the same abstraction.
    
    This PR makes the shared executor contract directly async and lets
    extension tools implement it too, so host tools and extension tools can
    move through the same registration path.
    
    ## What changed
    
    - Changed `ToolExecutor::handle` to an `async fn` using `async-trait`,
    and updated built-in tool handlers to implement the async trait
    directly.
    - Replaced the bespoke `ExtensionToolFuture` contract with a marker
    `ExtensionToolExecutor` over `ToolExecutor<ToolCall, Output =
    JsonToolOutput>`, re-exporting `ToolExecutor` from
    `codex-extension-api`.
    - Updated the memories extension tools to implement the shared executor
    trait.
    - Split tool-router construction into collected executors plus hosted
    model specs, keeping hosted tools like web search and image generation
    separate from executable handlers.
    - Updated spec/router tests and extension-tool stubs for the new
    executor shape.
    
    ## Verification
    
    - Not run locally.
  • Make multi_agent_v2 wait_agent timeouts configurable (#22528)
    ## Why
    
    `multi_agent_v2` already allowed configuring the minimum `wait_agent`
    timeout, but the default timeout and upper bound were still hard-coded.
    That made it hard to tune waits for subagent mailbox activity in
    sessions that need either faster wakeups or longer waits, and it meant
    the model-visible `wait_agent` schema could not fully reflect the
    resolved runtime limits.
    
    ## What Changed
    
    - Added `features.multi_agent_v2.max_wait_timeout_ms` and
    `features.multi_agent_v2.default_wait_timeout_ms` alongside the existing
    `min_wait_timeout_ms` setting.
    - Validated all three timeouts in config as `0..=3_600_000`, with
    `min_wait_timeout_ms <= default_wait_timeout_ms <= max_wait_timeout_ms`.
    - The tool config for thread and review sessions now passes the resolved
    min/default/max values into the `wait_agent` tool schema.
    - `wait_agent` now uses the configured default when `timeout_ms` is
    omitted and rejects explicit values outside the configured min/max range
    instead of silently clamping them.
    - Updated the generated config schema and config-lock test coverage for
    the new fields.
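    
    The validation and resolution rules above can be sketched as follows.
    The `WaitTimeouts` type, its methods, and the error strings are
    hypothetical; only the range, the ordering constraint, and the
    reject-instead-of-clamp behavior come from the description.
    
    ```rust
    /// Upper bound for every timeout, per the description: `0..=3_600_000` ms.
    const MAX_ALLOWED_MS: u64 = 3_600_000;

    /// Hypothetical holder for the three resolved settings.
    #[derive(Debug, Clone, Copy, PartialEq)]
    struct WaitTimeouts {
        min_ms: u64,
        default_ms: u64,
        max_ms: u64,
    }

    impl WaitTimeouts {
        /// Validates the allowed range and the `min <= default <= max` ordering.
        fn new(min_ms: u64, default_ms: u64, max_ms: u64) -> Result<Self, String> {
            for value in [min_ms, default_ms, max_ms] {
                if value > MAX_ALLOWED_MS {
                    return Err(format!("timeout {} ms exceeds {} ms", value, MAX_ALLOWED_MS));
                }
            }
            if !(min_ms <= default_ms && default_ms <= max_ms) {
                return Err("expected min <= default <= max".to_string());
            }
            Ok(Self { min_ms, default_ms, max_ms })
        }

        /// Uses the default when `timeout_ms` is omitted and rejects
        /// out-of-range values rather than clamping them.
        fn resolve(&self, requested_ms: Option<u64>) -> Result<u64, String> {
            match requested_ms {
                None => Ok(self.default_ms),
                Some(ms) if (self.min_ms..=self.max_ms).contains(&ms) => Ok(ms),
                Some(ms) => Err(format!(
                    "timeout_ms {} is outside {}..={}",
                    ms, self.min_ms, self.max_ms
                )),
            }
        }
    }
    ```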
  • feat: expose multi-agent v2 as model-only tools (#22514)
    ## Why
    
    `code_mode_only` filters code-mode nested tools out of the top-level
    tool list. For multi-agent v2, we need a rollout shape where the
    collaboration tools remain callable as normal model tools without also
    being embedded into the code-mode `exec` tool declaration.
    
    Related to this:
    https://openai-corpws.slack.com/archives/C0AQLHB4U75/p1778660267922549
    
    ## What Changed
    
    - Adds `features.multi_agent_v2.non_code_mode_only`, including config
    resolution, profile override handling, and generated schema coverage.
    - Introduces `ToolExposure::DirectModelOnly` so a tool can be included
    in the initial model-visible list while staying out of the nested
    code-mode tool surface.
    - Applies that exposure to the multi-agent v2 tools when the new flag is
    set: `spawn_agent`, `send_message`, `followup_task`, `wait_agent`,
    `close_agent`, and `list_agents`.
    - Updates code-mode-only filtering so direct-model-only tools remain
    visible while ordinary nested code-mode tools are still hidden.
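    
    A rough model of the filtering, under simplifying assumptions: every
    `Direct` tool is treated as nestable in code mode, deferred tools are
    left out of both lists, and the `Tool` struct and both function names
    are hypothetical. Only the `ToolExposure` variant names are taken from
    the commits in this history.
    
    ```rust
    #[derive(Debug, Clone, Copy, PartialEq)]
    enum ToolExposure {
        Direct,
        Deferred,
        DirectModelOnly,
    }

    /// Hypothetical registration record.
    struct Tool {
        name: &'static str,
        exposure: ToolExposure,
    }

    /// Names in the initial model-visible list. With `code_mode_only`,
    /// ordinary direct tools are hidden but direct-model-only tools remain.
    fn top_level_tools(tools: &[Tool], code_mode_only: bool) -> Vec<&'static str> {
        tools
            .iter()
            .filter(|tool| match tool.exposure {
                ToolExposure::DirectModelOnly => true,
                ToolExposure::Direct => !code_mode_only,
                ToolExposure::Deferred => false,
            })
            .map(|tool| tool.name)
            .collect()
    }

    /// Names embedded in the nested code-mode surface; direct-model-only
    /// tools are never included.
    fn nested_code_mode_tools(tools: &[Tool]) -> Vec<&'static str> {
        tools
            .iter()
            .filter(|tool| tool.exposure == ToolExposure::Direct)
            .map(|tool| tool.name)
            .collect()
    }
    ```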
    
    ## Verification
    
    - Added config parsing/profile tests for `non_code_mode_only`.
    - Added tool spec coverage for the code-mode-only multi-agent v2
    exposure behavior.
  • [codex] Remove unused legacy shell tools (#22246)
    ## Why
    
    Recent session history showed no active use of the raw `shell`,
    `local_shell`, or `container.exec` execution surfaces. Keeping those
    handlers/specs wired into core leaves duplicate shell execution paths
    alongside the supported `shell_command` and unified exec tools.
    
    ## What changed
    
    - Removed the raw `shell` handler/spec and its `ShellToolCallParams`
    protocol helper.
    - Removed the legacy `local_shell` and `container.exec` handler/spec
    plumbing while preserving persisted-history compatibility for old
    response items.
    - Normalized model/config `default` and `local` shell selections to
    `shell_command`.
    - Pruned tests that exercised removed raw-shell/local-shell/apply-patch
    variants and kept coverage on `shell_command`, unified exec, and
    freeform `apply_patch`.
    
    ## Verification
    
    - `git diff --check`
    - `cargo test -p codex-protocol`
    - `cargo test -p codex-tools`
    - `cargo test -p codex-core tools::handlers::shell`
    - `cargo test -p codex-core tools::spec`
    - `cargo test -p codex-core tools::router`
    - `cargo test -p codex-core
    active_call_preserves_triggering_command_context`
    - `cargo test -p codex-core guardian_tests`
    - `cargo test -p codex-core --test all shell_serialization`
    - `cargo test -p codex-core --test all apply_patch_cli`
    - `cargo test -p codex-core --test all shell_command_`
    - `cargo test -p codex-core --test all local_shell`
    - `cargo test -p codex-core --test all otel::`
    - `cargo test -p codex-core --test all hooks::`
    - `just fix -p codex-core`
    - `just fix -p codex-tools`
  • Introduce tool exposure for deferred registration (#22489)
    ## Why
    
    Deferred tools were tracked with separate side-channel filtering after
    tool specs had already been assembled. That made the registry
    responsible for executing tools while the router/spec planner separately
    decided whether those same tools should be exposed to the model up
    front.
    
    This PR makes exposure part of the tool handler contract so direct
    versus deferred availability travels with the executable tool
    registration.
    
    The next step will be to simplify registration.
    
    ## What Changed
    
    - Adds `ToolExposure` to `codex-tools` and exposes it through
    `ToolExecutor`, defaulting tools to `Direct`.
    - Teaches dynamic tools and MCP handlers to mark deferred tools as
    `Deferred` at construction time.
    - Renames the registry object-safe wrapper from `AnyToolHandler` to
    `RegisteredTool` and uses `ToolExposure` when deciding whether to
    include a handler's spec in the initial model-visible tool list.
    - Refactors tool spec planning to derive direct specs and deferred
    search entries from registered handlers, removing the router's
    special-case deferred dynamic tool filtering.
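    
    One way to picture exposure travelling with the registration is a
    default trait method that deferred tools override. The trait below is a
    stripped-down stand-in, not the real `ToolExecutor`; `BuiltinTool`,
    `McpTool`, and `plan` are hypothetical.
    
    ```rust
    #[derive(Debug, Clone, Copy, PartialEq)]
    enum ToolExposure {
        Direct,
        Deferred,
    }

    /// Stripped-down stand-in for the executor contract.
    trait ToolExecutor {
        fn name(&self) -> String;

        /// Tools are exposed directly unless they say otherwise.
        fn exposure(&self) -> ToolExposure {
            ToolExposure::Direct
        }
    }

    struct BuiltinTool(&'static str);

    impl ToolExecutor for BuiltinTool {
        fn name(&self) -> String {
            self.0.to_string()
        }
    }

    /// Exposure is fixed at construction time for externally provided tools.
    struct McpTool {
        name: String,
        deferred: bool,
    }

    impl ToolExecutor for McpTool {
        fn name(&self) -> String {
            self.name.clone()
        }

        fn exposure(&self) -> ToolExposure {
            if self.deferred {
                ToolExposure::Deferred
            } else {
                ToolExposure::Direct
            }
        }
    }

    /// Derives (direct spec names, deferred search entries) from the
    /// registered tools, with no separate side-channel filter.
    fn plan(tools: &[Box<dyn ToolExecutor>]) -> (Vec<String>, Vec<String>) {
        let mut direct = Vec::new();
        let mut deferred = Vec::new();
        for tool in tools {
            match tool.exposure() {
                ToolExposure::Direct => direct.push(tool.name()),
                ToolExposure::Deferred => deferred.push(tool.name()),
            }
        }
        (direct, deferred)
    }
    ```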
    
    ## Verification
    
    - Not run.
  • Refactor extension tools onto shared ToolExecutor (#22369)
    ## Why
    
    Extension tools were split across two public runtime contracts:
    `codex-tool-api` exposed `ToolBundle` plus its own call/spec/error
    types, while core native tools used `codex_tools::ToolExecutor`. That
    made contributed tool specs and execution behavior easy to drift apart
    and added another crate boundary for what should be one executable-tool
    seam.
    
    This PR makes `ToolExecutor` the single runtime contract and keeps
    extension-specific pinning in `codex-extension-api`.
    
    ## Remaining todo
    
    https://github.com/openai/codex/pull/22369/changes#diff-b935ea8245c3ce568a30cff660175fa6390b66b872ae409e1e2e965738250741R5
    Either generic `Invocation` or sub-extract the `ToolCall` and clean
    `ToolInvocation`
    
    ## What changed
    
    - Removed the `codex-tool-api` workspace crate and its dependencies from
    core and `codex-extension-api`.
    - Made `codex_tools::ToolExecutor` object-safe with `async_trait` so
    extension contributors can return a dyn executor.
    - Added the extension-facing aliases under
    `ext/extension-api/src/contributors/tools.rs`, including
    `ExtensionToolExecutor = dyn ToolExecutor<ToolCall, Output =
    ExtensionToolOutput>`.
    - Changed `ToolContributor::tools` to return extension executors
    directly instead of `ToolBundle`s.
    - Updated core’s extension tool handler/registry/router path to adapt
    those extension executors into the existing native `ToolInvocation`
    runtime path.
    - Added focused coverage for extension tools being registered,
    model-visible, dispatchable, and not replacing built-in tools.
    
    ## Verification
    
    - `cargo test -p codex-tools`
    - `cargo test -p codex-extension-api`
  • feat: extract shared tool executor interface (#22359)
    ## Why
    
    Codex still models model-visible tools and executable behavior largely
    inside `codex-core`, which makes it harder to evolve the tool system
    toward a single reusable abstraction for built-ins, MCP-backed tools,
    dynamic tools, and later tools injected from outside core.
    
    This PR takes the next incremental step in that direction by moving the
    common execution-facing pieces out of core and separating them from
    core-only orchestration. The intent is to let shared tool abstractions
    improve in one place, while `codex-core` keeps the parts that are still
    inherently host-specific today, such as `ToolInvocation`, dispatch
    wiring, and hook integration.
    
    This PR is mostly moving things around. The only interesting piece is
    this abstraction:
    https://github.com/openai/codex/pull/22359/changes#diff-81af519002548ba51ed102bdaaf77e081d40a1e73a6e5f9b104bbbc96a6f1b3dR13
    
    ## What changed
    
    - Added `codex_tools::ToolExecutor<Invocation>` as the shared execution
    trait for model-visible tools.
    - Moved the reusable execution support types from `codex-core` into
    `codex-tools`:
      - `FunctionCallError`
      - `ToolPayload`
      - `ToolOutput`
    - Refactored core tool implementations so that execution behavior lives
    on `ToolExecutor<ToolInvocation>`, while `ToolHandler` remains the
    core-local extension point for hook payloads, telemetry tags, diff
    consumers, and other orchestration concerns.
    - Kept the registry and dispatch flow behaviorally unchanged while
    making the shared/extracted boundary explicit across built-in, MCP,
    dynamic, extension-backed, shell, and multi-agent tool handlers.
    
    ## Verification
    
    - `cargo test -p codex-tools`
    - `just fix -p codex-tools`
    - `just fix -p codex-core`
    - `cargo test -p codex-core` progressed through the updated tool
    surfaces and then hit the existing unrelated multi-agent stack overflow
    in
    `tools::handlers::multi_agents::tests::tool_handlers_cascade_close_and_resume_and_keep_explicitly_closed_subtrees_closed`.
  • Encapsulate tool search entries in handlers (#22261)
    ## Why
    
    This builds on the handler-owned spec refactor by moving deferred
    tool-search metadata to the same handlers that already own tool specs.
    The registry builder no longer needs a separate prebuilt
    `tool_search_entries` path; it can collect searchable entries from
    deferred handlers directly.
    
    ## What changed
    
    - Added `search_info()` to tool handlers and implemented it for MCP and
    dynamic handlers.
    - Reused handler `spec()` output when constructing tool-search entries,
    adapting it into the deferred `LoadableToolSpec` shape expected by
    `tool_search`.
    - Simplified `build_tool_registry_builder(...)` so `tool_search`
    registration is based on deferred handlers with search info.
    - Removed the old standalone search-entry builders and now-unused
    `codex-tools` discovery helper exports.
    
    ## Verification
    
    - `cargo test -p codex-core tools::handlers::tool_search::tests:: --
    --nocapture`
    - `cargo test -p codex-core tools::spec_plan::tests::search_tool --
    --nocapture`
    - `cargo test -p codex-core tools::spec::tests:: -- --nocapture`
    - `cargo test -p codex-core tools::spec_plan::tests:: -- --nocapture`
    - `cargo test -p codex-tools`
    - `just fix -p codex-core`
    - `just fix -p codex-tools`
  • [codex] Make handlers own parallel tool support (#22254)
    ## Why
    
    `ToolRouter::tool_supports_parallel()` was still consulting configured
    specs when a handler lookup missed, even though parallel schedulability
    is really a property of the executable handler. Keeping that metadata on
    `ConfiguredToolSpec` duplicated state between the model-visible spec
    layer and the runtime handler layer.
    
    This change makes handlers the sole source of truth for parallel tool
    support and removes the extra spec wrapper that only existed to carry
    duplicated metadata.
    
    ## What changed
    
    - removed `ConfiguredToolSpec` and store plain `ToolSpec` values in the
    registry/router builder path
    - changed `ToolRouter::tool_supports_parallel()` to consult only the
    handler registry and fall back to `false`
    - simplified spec collection and test helpers to operate directly on
    `ToolSpec`
    - updated router/spec tests to cover handler-owned parallel behavior and
    the no-handler fallback
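    
    A minimal sketch of the handler-only lookup with its `false` fallback.
    The trait method name `supports_parallel`, the two handler types, and
    the `HashMap`-backed router are assumptions; only
    `ToolRouter::tool_supports_parallel()` is named in the description.
    
    ```rust
    use std::collections::HashMap;

    /// Stand-in handler trait; the method name is an assumption.
    trait ToolHandler {
        fn supports_parallel(&self) -> bool;
    }

    struct ReadOnlyLookup;

    impl ToolHandler for ReadOnlyLookup {
        fn supports_parallel(&self) -> bool {
            true
        }
    }

    struct PatchWriter;

    impl ToolHandler for PatchWriter {
        fn supports_parallel(&self) -> bool {
            false
        }
    }

    /// Hypothetical router that holds only the handler registry.
    struct ToolRouter {
        handlers: HashMap<String, Box<dyn ToolHandler>>,
    }

    impl ToolRouter {
        /// Consults only the handler; a missing handler means `false`.
        fn tool_supports_parallel(&self, name: &str) -> bool {
            self.handlers
                .get(name)
                .is_some_and(|handler| handler.supports_parallel())
        }
    }
    ```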
    
    ## Validation
    
    - `cargo test -p codex-tools`
    - `cargo test -p codex-core mcp_parallel_support_uses_handler_data`
    - `cargo test -p codex-core
    deferred_responses_api_tool_serializes_with_defer_loading`
    - `cargo test -p codex-core
    tools_without_handlers_do_not_support_parallel`
    - `cargo test -p codex-core
    request_plugin_install_can_be_registered_without_search_tool`
    
    ## Docs
    
    No documentation updates needed.
  • [codex] Delete function-style apply_patch (#21651)
    ## Why
    
    `apply_patch` is now a freeform/custom tool. Keeping the old
    JSON/function-style registration and parsing path left another way for
    models and tests to invoke `apply_patch`, which made the tool surface
    harder to reason about.
    
    ## What changed
    
    - Removed the `ApplyPatchToolType::Function` variant, JSON `apply_patch`
    spec, and handler support for function payloads.
    - Kept `apply_patch_tool_type = freeform` as the supported model
    metadata path, including Bedrock catalog metadata.
    - Migrated `apply_patch` tests and SSE fixtures to custom/freeform tool
    calls.
    
    ## Verification
    
    - `cargo test -p codex-tools -p codex-protocol -p codex-model-provider`
    - `cargo test -p codex-core tools::handlers::apply_patch --lib`
    - `cargo test -p codex-core --test all
    apply_patch_tool_executes_and_emits_patch_events`
    - `cargo test -p codex-core --test all
    apply_patch_reports_parse_diagnostics`
    - `cargo test -p codex-exec test_apply_patch_tool`
    - `just fix -p codex-core`
    - `just fix -p codex-tools -p codex-protocol -p codex-model-provider -p
    codex-exec`
  • [codex] Enable apply_patch freeform by default (#21687)
    ## Summary
    - enable `apply_patch_freeform` by default in the feature registry
    
    ## Why
    - make the freeform `apply_patch` tool available by default when model
    metadata does not explicitly opt into another mode
    
    ## Validation
    - `just fmt`
    - did not run tests
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • [codex] Move tool specs into core handlers (#21416)
    ## Why
    
    This is the first mechanical slice of moving tool spec ownership toward
    the handlers. `codex-tools` should keep shared primitives and conversion
    helpers, while builtin tool specs and registration planning live in
    `codex-core` with the handlers that own those tools.
    
    Keeping this PR to relocation and import updates isolates the copy/move
    review from the later logic change that wires specs through registered
    handlers.
    
    ## What changed
    
    - Moved builtin tool spec constructors from `codex-rs/tools/src` into
    `codex-rs/core/src/tools/handlers/*_spec.rs` or nearby core tool
    modules.
    - Moved the registry planning code into
    `codex-rs/core/src/tools/spec_plan.rs` and its associated types/tests
    into core.
    - Kept shared primitives in `codex-tools`, including `ToolSpec`,
    schema/types, discovery/config primitives, dynamic/MCP conversion
    helpers, and code-mode collection helpers.
    - Updated handlers that referenced moved argument types or tool-name
    constants to use the core spec modules.
    - Moved spec tests next to the moved spec modules.
    
    ## Verification
    
    - `cargo check -p codex-tools`
    - `cargo check -p codex-core`
    - `cargo test -p codex-tools`
    - `cargo test -p codex-core _spec::tests`
    - `cargo test -p codex-core tools::spec_plan::tests`
    - `just fix -p codex-tools`
    - `just fix -p codex-core`
    
    Note: I also tried the broader `cargo test -p codex-core tools::`; it
    reached the moved spec-plan/spec tests successfully, then aborted with a
    stack overflow in
    `tools::handlers::multi_agents::tests::tool_handlers_cascade_close_and_resume_and_keep_explicitly_closed_subtrees_closed`,
    which is outside this spec relocation.
  • [codex] Split tool handlers by tool name (#20687)
    ## Why
    
    Tool registration used to bind a tool name to a handler externally,
    which left ownership split between the registry plan and the handler
    implementation. Some built-in handlers also multiplexed multiple in-core
    tools by switching on the invoked tool name internally.
    
    This moves the registry identity onto the handler itself and makes
    built-in multi-tool areas use separate concrete handlers, so each
    registered handler instance owns exactly one tool name and one dispatch
    path.
    
    ## What Changed
    
    - Added `ToolHandler::tool_name()` and changed
    `ToolRegistryBuilder::register_handler` to derive the registry key from
    the handler.
    - Split built-in multiplexed handlers into concrete per-tool handlers
    for unified exec, shell/local shell/container exec, MCP resources, goal
    tools, and agent job tools.
    - Kept name-carrying handler instances only where the runtime target is
    inherently external or dynamic, such as MCP tools, dynamic tools, and
    unavailable placeholders.
    - Updated `ToolHandlerKind` and registry-plan construction so plan
    entries map directly to concrete handler registrations.
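    
    The registration change can be sketched like this. `tool_name()` and
    `register_handler` are named in the description, but their signatures
    here are guesses, and the three handler types are illustrative
    stand-ins rather than the real implementations.
    
    ```rust
    use std::collections::HashMap;

    /// Stand-in trait; the real signature may differ.
    trait ToolHandler {
        fn tool_name(&self) -> String;
    }

    /// One concrete handler per built-in tool name.
    struct ExecCommandHandler;

    impl ToolHandler for ExecCommandHandler {
        fn tool_name(&self) -> String {
            "exec_command".to_string()
        }
    }

    struct WriteStdinHandler;

    impl ToolHandler for WriteStdinHandler {
        fn tool_name(&self) -> String {
            "write_stdin".to_string()
        }
    }

    /// Name-carrying handler for targets that are only known at runtime.
    struct McpToolHandler {
        name: String,
    }

    impl ToolHandler for McpToolHandler {
        fn tool_name(&self) -> String {
            self.name.clone()
        }
    }

    #[derive(Default)]
    struct ToolRegistryBuilder {
        handlers: HashMap<String, Box<dyn ToolHandler>>,
    }

    impl ToolRegistryBuilder {
        /// The registry key comes from the handler, not from the caller.
        fn register_handler(&mut self, handler: Box<dyn ToolHandler>) {
            let key = handler.tool_name();
            self.handlers.insert(key, handler);
        }
    }
    ```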
    
    ## Verification
    
    - `cargo test -p codex-tools tool_registry_plan`
    - `cargo test -p codex-core --lib tools::registry_tests`
    - `just fix -p codex-tools`
    - `just fix -p codex-core`
  • Route process tools to selected environments (#20647)
    ## Why
    When a turn exposes multiple selected environments, shell-style tools
    need a model-facing way to identify the intended target environment and
    handlers need to resolve that target before parsing cwd-relative
    permission fields or launching processes.
    
    This PR scopes that rollout to process tools. Filesystem-oriented tools
    such as `apply_patch`, `view_image`, and `list_dir` are intentionally
    left for follow-up slices.
    
    ## What Changed
    - Adds an `include_environment_id` option to shell-style tool schema
    builders.
    - Exposes optional `environment_id` on `shell`, `shell_command`, and
    `exec_command` only when `ToolEnvironmentMode::Multiple` is active.
    - Adds a shared handler helper that parses `environment_id` and
    `workdir` from JSON function-call arguments and returns the selected
    `Environment` plus effective absolute cwd.
    - Uses that helper in `shell`, `shell_command`, and `exec_command`
    handling so process execution uses the selected environment filesystem
    and cwd.
    - Changes `ExecCommandRequest` to carry a required resolved `cwd`,
    removing the process-manager fallback to the primary turn cwd for new
    exec commands.
    - Leaves `write_stdin` unchanged because it targets an existing process
    id, not a new environment.
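    
    A simplified sketch of the resolution step. The `Environment` struct,
    the `resolve_target` name, the error strings, and the choice to treat
    the first environment as the primary one are all assumptions; the real
    helper parses JSON arguments and works with richer types.
    
    ```rust
    use std::path::PathBuf;

    /// Hypothetical environment record with an id and a base cwd.
    #[derive(Debug, Clone, PartialEq)]
    struct Environment {
        id: String,
        cwd: PathBuf,
    }

    /// Picks the target environment and the effective cwd for a new process.
    /// Assumes the first selected environment is the primary one.
    fn resolve_target<'a>(
        environments: &'a [Environment],
        environment_id: Option<&str>,
        workdir: Option<&str>,
    ) -> Result<(&'a Environment, PathBuf), String> {
        let env = match environment_id {
            Some(id) => environments
                .iter()
                .find(|env| env.id == id)
                .ok_or_else(|| format!("unknown environment id: {}", id))?,
            None => environments
                .first()
                .ok_or_else(|| "no environment selected".to_string())?,
        };
        // `join` resolves a relative workdir against the environment cwd;
        // an absolute workdir replaces the base entirely.
        let cwd = match workdir {
            Some(dir) => env.cwd.join(dir),
            None => env.cwd.clone(),
        };
        Ok((env, cwd))
    }
    ```
    
    Resolving the environment before touching any path is what lets
    cwd-relative fields be interpreted against the selected environment
    rather than the primary turn cwd.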
    
    ## Testing
    - Added unit coverage for process-tool schema exposure, selected
    environment resolution, primary fallback, no-environment handling,
    unknown environment ids, and resolving cwd-relative permission paths
    against the selected environment cwd.
    - Added a remote-suite e2e test case for `exec_command` routing across
    explicit zero environments, one local environment, and local+remote
    environments.
    - Ran `just fmt` and `git diff --check`.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • tools: remove unused experimental list_dir tool (#21170)
    ## Why
    `list_dir` still carries a full spec/handler/test path, but nothing in
    the current model catalog advertises it via
    `experimental_supported_tools`. That leaves us maintaining an
    environment-backed tool surface that is effectively unused.
    
    ## What changed
    - delete the `list_dir` handler and its tests from `codex-core`
    - remove the `list_dir` spec builder, handler kind, and registry wiring
    from `codex-tools`
    - clean up the remaining internal README and registry tests so they no
    longer mention the removed tool
  • 1- Add model service tiers metadata (#20969)
    ## Why
    
    The model list needs to carry display-ready service tier metadata so
    clients can render tier choices with stable IDs, names, and
    descriptions. A raw speed-tier string list is not enough for richer UI
    copy or future tier labels.
    
    ## What changed
    
    - Added `ModelServiceTier` to shared model metadata with string `id`,
    `name`, and `description` fields.
    - Added `service_tiers` to `ModelInfo` and `ModelPreset`, preserving
    empty defaults for older cached model payloads.
    - Exposed `serviceTiers` on app-server v2 `Model` responses and threaded
    it through TUI app-server model conversion.
    - Marked legacy `additional_speed_tiers` / `additionalSpeedTiers`
    metadata as deprecated in source and generated schema output.
    - Regenerated app-server protocol JSON schema and TypeScript fixtures,
    including `ModelServiceTier.ts`.
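    
    As a rough illustration of the metadata shape listed above, the sketch
    below uses only the standard library. The `slug` field and `tier_labels`
    helper are hypothetical, and the real types presumably carry
    serialization attributes that are omitted here; `Default` stands in for
    the "empty default for older payloads" behavior.
    
    ```rust
    /// Display-ready tier metadata with the three string fields named above.
    #[derive(Debug, Clone, Default, PartialEq)]
    pub struct ModelServiceTier {
        pub id: String,
        pub name: String,
        pub description: String,
    }
    
    /// Assumed, simplified model record; only `service_tiers` and the legacy
    /// speed-tier list come from the description, `slug` is illustrative.
    #[derive(Debug, Clone, Default, PartialEq)]
    pub struct ModelInfo {
        pub slug: String,
        /// Legacy metadata that the change marks as deprecated.
        pub additional_speed_tiers: Vec<String>,
        /// Empty when the payload predates service tier metadata.
        pub service_tiers: Vec<ModelServiceTier>,
    }
    
    /// Hypothetical helper: turn tiers into UI labels.
    pub fn tier_labels(model: &ModelInfo) -> Vec<String> {
        model
            .service_tiers
            .iter()
            .map(|tier| format!("{}: {}", tier.name, tier.description))
            .collect()
    }
    ```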
    
    ## Verification
    
    - Ran `just write-app-server-schema`.
    - Did not run local tests per repo instruction; relying on PR CI.
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Use MCP server instructions in deferred namespace descriptions (#21053)
    ## Why
    
    MCP servers can provide `instructions` that explain what their tools are
    for. Directly exposed MCP namespaces already use those instructions when
    a connector description is not available, but deferred `tool_search`
    results did not preserve that fallback. The direct path falls back from
    connector metadata to server instructions, while the deferred path only
    carried `connector_description` and otherwise fell back to generic
    namespace text.
    
    That meant a plain MCP server could provide useful model-facing guidance
    and still appear as `Tools in the X namespace.` whenever it was
    discovered lazily through `tool_search`.
    
    ## What changed
    
    - Store one model-facing `namespace_description` on `ToolInfo`, using
    connector descriptions for connector-backed tools and server
    instructions for plain MCP servers.
    - Thread that namespace description through the `tool_search` source
    list, search indexing, and returned namespace metadata.
    - Add an end-to-end regression test for deferred non-app MCP search
    results exposing server instructions as the namespace description.
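    
    The fallback order described above can be sketched as a single function.
    This is not the actual `ToolInfo` code: the function name, its signature,
    and the choice to treat blank strings as missing are assumptions made for
    illustration.
    
    ```rust
    /// Treat missing or whitespace-only text as absent (an assumption).
    fn non_empty(text: Option<&str>) -> Option<&str> {
        text.map(str::trim).filter(|t| !t.is_empty())
    }
    
    /// Choose the model-facing namespace description: connector description
    /// first, then MCP server instructions, then the generic text.
    pub fn namespace_description(
        namespace: &str,
        connector_description: Option<&str>,
        server_instructions: Option<&str>,
    ) -> String {
        non_empty(connector_description)
            .or_else(|| non_empty(server_instructions))
            .map(str::to_string)
            .unwrap_or_else(|| format!("Tools in the {namespace} namespace."))
    }
    ```
    
    Computing the value once and storing it means the direct and deferred
    paths cannot drift apart on which fallback they apply.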
    
    ## Verification
    
    - `cargo test -p codex-tools
    search_tool_description_lists_each_mcp_source_once --lib`
    - `cargo test -p codex-core --test all
    tool_search_uses_non_app_mcp_server_instructions_as_namespace_description`
  • Prepare selected environment plumbing (#20669)
    ## Why
    This is a prep PR in the multi-environment process-tool stack. It
    separates ownership/config cleanup from the behavior change that teaches
    process tools to route by selected environment, so the follow-up PR can
    focus on model-facing `environment_id` behavior.
    
    ## Stack
    1. https://github.com/openai/codex/pull/20646 - `EnvironmentContext`
    rendering for selected environments
    2. https://github.com/openai/codex/pull/20669 - selected-environment
    ownership and tool config prep (this PR)
    3. https://github.com/openai/codex/pull/20647 - process-tool
    `environment_id` routing
    
    ## What Changed
    - keep the resolved turn environment list wrapped in
    `ResolvedTurnEnvironments` through `TurnContext` instead of unwrapping
    it back to a raw `Vec`
    - add `TurnContext::resolve_path_against` so cwd-relative path
    resolution has one shared helper
    - replace the old tool config boolean with `ToolEnvironmentMode::{None,
    Single, Multiple}`
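    
    A minimal sketch of the last two items, under stated assumptions: the
    `from_count` constructor is hypothetical, and `resolve_path_against` is
    shown as a free function with an assumed signature rather than as the
    real `TurnContext` method.
    
    ```rust
    use std::path::{Path, PathBuf};
    
    /// Three-state replacement for a boolean "has environment" flag.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum ToolEnvironmentMode {
        None,
        Single,
        Multiple,
    }
    
    impl ToolEnvironmentMode {
        /// Hypothetical constructor: derive the mode from how many
        /// environments were resolved for the turn.
        pub fn from_count(count: usize) -> Self {
            match count {
                0 => Self::None,
                1 => Self::Single,
                _ => Self::Multiple,
            }
        }
    }
    
    /// Resolve `path` against `base`: absolute paths are kept as-is,
    /// relative paths are joined onto the base cwd.
    pub fn resolve_path_against(base: &Path, path: &Path) -> PathBuf {
        if path.is_absolute() {
            path.to_path_buf()
        } else {
            base.join(path)
        }
    }
    ```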
    
    ## Testing
    - Tests not run locally; this prep refactor is covered by GitHub CI for
    the stack.
    
    Co-authored-by: Codex <noreply@openai.com>
  • [tool_suggest] More prompt polishes. (#20566)
    `tool_suggest` still misfires when the model needs `tool_search`, so this
    updates the prompts to further disambiguate the two:
    
    - [x] rename it from `tool_suggest` to `request_plugin_install`
    - [x] rephrase "suggestion" to "install" in the tool descriptions.
    - [x] disambiguate "the tool" vs "the plugin/connector".
    
    Tested with the Codex App and verified it still works.
  • Gate multi-agent v2 tools independently of collab (#20246)
    ## Why
    
    `multi_agents_v2` is meant to be independently gated from the older
    `collab` feature. The tool registry still treated the
    collaboration-style agent tools as `collab`-only, so enabling
    `multi_agents_v2` without `collab` omitted the v2 agent tools. Review
    and guardian sub-sessions also need to keep agent spawning disabled even
    when the outer session has `multi_agents_v2` enabled.
    
    ## What changed
    
    - Include the collab-backed agent tools when either `multi_agents_v2` or
    `collab` is enabled.
    - Explicitly disable `multi_agents_v2` for review and guardian review
    sub-sessions, matching the existing `spawn_csv` and `collab`
    restrictions.
    - Add a registry test that enables `multi_agents_v2`, disables `collab`,
    and verifies the v2 agent tools are present while legacy `send_input`
    and `resume_agent` remain hidden.
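    
    The gating rule above can be sketched as follows. The `Features` struct
    and its method names are hypothetical; only the three feature names and
    the either-or condition come from the description, and the legacy-tool
    hiding is not modeled here.
    
    ```rust
    /// Hypothetical, simplified feature set for a session.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub struct Features {
        pub multi_agents_v2: bool,
        pub collab: bool,
        pub spawn_csv: bool,
    }
    
    impl Features {
        /// Agent tools are registered when either feature is on.
        pub fn agent_tools_enabled(self) -> bool {
            self.multi_agents_v2 || self.collab
        }
    
        /// Review and guardian sub-sessions turn off every agent-spawning
        /// feature, whatever the outer session enabled.
        pub fn for_review_sub_session(mut self) -> Self {
            self.multi_agents_v2 = false;
            self.collab = false;
            self.spawn_csv = false;
            self
        }
    }
    ```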
    
    ## Testing
    
    - Added
    `test_build_specs_multi_agent_v2_does_not_require_collab_feature`.