154 Commits

  • [codex] Use model metadata for skills usage instructions (#29740)
    ## Summary
    
    - add a false-by-default `include_skills_usage_instructions` model
    metadata field
    - enable the field for the bundled `gpt-5.5` model metadata
    - consume the metadata in both core and extension skill rendering
    - remove hardcoded legacy-model matching and its marker plumbing
  • Preserve namespaces on custom tool calls (#30302)
    ## Summary
    
    - Preserve the optional namespace on custom tool calls during response
    deserialization and app-server replay.
    - Use the namespaced tool identifier for streaming argument handling and
    tool dispatch.
    - Regenerate app-server protocol schemas.
    - Add regression tests covering namespace serialization and routing.
    
    ## Testing
    
    - Ran affected protocol and app-server test suites.
    - Ran the full core test suite; two load-sensitive timing tests passed
    when rerun individually.
    - Ran Clippy and formatting checks.
    - Verified with a local end-to-end app-server replay that the namespace
    is preserved through the complete request/response flow.
  • mcp: keep elicitation requests below app wire types (#29724)
    ## Why
    
    Core and tools need to request MCP elicitation without constructing
    app-server wire payloads. The request should remain a neutral protocol
    concept until app-server serializes it for a client.
    
    ## What changed
    
    - Switched core and tools to
    `codex_protocol::approvals::ElicitationRequest`.
    - Derived turn and server context inside core instead of carrying
    app-server request types through lower layers.
    - Kept the app-server payload unchanged through an explicit boundary
    conversion.
    - Removed the remaining production app-server-protocol dependency from
    tools.
    
    ## Stack
    
    This is PR 5 of 6, stacked on [PR
    #29723](https://github.com/openai/codex/pull/29723). Review only the
    delta from `codex/split-connector-metadata-types`. Next: [PR
    #29725](https://github.com/openai/codex/pull/29725).
    
    ## Validation
    
    - `codex-core` MCP coverage passed: 87 tests.
    - Tools elicitation and app-server round-trip coverage passed.
  • [apps] Thread structured icon assets through app list (#29889)
    ## Summary
    
    - Add `iconAssets` and `iconDarkAssets` to the app-list protocol.
    - Preserve structured icons through directory merging and the connector,
    app-server, and TUI boundaries.
    - Keep legacy logo URLs unchanged as compatibility fallbacks.
    - Update generated protocol schemas and TypeScript types.
  • connectors: own app metadata types (#29723)
    ## Why
    
    Connector metadata is consumed by connector discovery, ChatGPT
    integration, core, and TUI code. Treating app-server's wire DTO as the
    shared domain model reverses the intended dependency direction.
    
    ## What changed
    
    - Added connector-owned app branding, review, screenshot, metadata, and
    info types.
    - Added explicit conversions in app-server and TUI while preserving
    app-server's wire payloads.
    - Removed production app-server-protocol dependencies from connectors
    and ChatGPT connector code.
    
    ## Stack
    
    This is PR 4 of 6, stacked on [PR
    #29722](https://github.com/openai/codex/pull/29722). Review only the
    delta from `codex/split-config-layer-types`. Next: [PR
    #29724](https://github.com/openai/codex/pull/29724).
    
    ## Validation
    
    - Connector and tools coverage passed.
    - App-server app-list coverage passed: 13 tests.
  • Let image generation extension hosts control output persistence (#29711)
    ## Why
    
    Some extension hosts need generated images returned without writing them
    to the local filesystem or giving the model a local path.
    
    ## What changed
    
    **tl;dr**: the image generation extension now performs all extension
    image operations itself
    
    - Let hosts provide an optional image save root when installing the
    extension.
    - Save images and return path hints only when a save root is configured.
    - Return image data without saving or adding a path hint when no save
    root is configured.
    - Preserve the extension-provided `saved_path` instead of persisting
    extension images again in core.
    - Leave built-in image generation unchanged.
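    
    The save-root rule above can be sketched roughly as follows. This is a
    minimal illustration, not the extension's code: `ImageOutput`,
    `finish_image`, and the injected `write` callback are hypothetical names
    chosen so the sketch performs no real I/O.
    
    ```rust
    use std::path::{Path, PathBuf};
    
    /// Hypothetical result type: image bytes plus an optional path hint.
    #[derive(Debug, PartialEq, Eq)]
    pub struct ImageOutput {
        pub data: Vec<u8>,
        pub saved_path: Option<PathBuf>,
    }
    
    /// Save and return a path hint only when the host configured a save root.
    /// `write` is injected (an assumption of this sketch) to avoid real I/O.
    pub fn finish_image(
        data: Vec<u8>,
        save_root: Option<&Path>,
        file_name: &str,
        mut write: impl FnMut(&Path, &[u8]) -> std::io::Result<()>,
    ) -> std::io::Result<ImageOutput> {
        let saved_path = match save_root {
            Some(root) => {
                let path = root.join(file_name);
                write(&path, &data)?;
                Some(path)
            }
            // No save root: return the data without saving or hinting a path.
            None => None,
        };
        Ok(ImageOutput { data, saved_path })
    }
    ```
    
    With no save root the callback is never invoked, so the host receives only
    the image data.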
    
    ## Validation
    
    - `just test -p codex-image-generation-extension`
    - `just test -p codex-app-server
    standalone_image_generation_returns_saved_path_hint_to_model`
    - `just test -p codex-core
    extension_tool_uses_granted_turn_permissions_without_local_persistence`
    - `just test -p codex-core tools::handlers::extension_tools::tests`
    - Tested in the Codex CLI with `save_root` set to `CODEX_HOME` and to `None`.
    - Tested in the Codex app with both settings as well.
  • core: rename metadata -> internal_chat_message_metadata_passthrough (#28968)
    ## Description
    This PR cuts Codex over from generic `ResponseItem.metadata` (introduced
    here: https://github.com/openai/codex/pull/28355) to
    `ResponseItem.internal_chat_message_metadata_passthrough`, which is the
    blessed path and has strongly-typed keys.
    
    For now we have to drop this MAv2 usage of `metadata`:
    https://github.com/openai/codex/pull/28561 until we figure out where
    that should live.
  • Add indexed web search mode (#28489)
    ## Summary
    
    - Add `web_search = "indexed"` alongside `disabled`, `cached`, and
    `live`.
    - Use that same resolved mode for both hosted and standalone web search.
    - For hosted search, send `index_gated_web_access: true` with external
    web access enabled only when `indexed` is selected.
    - For standalone search, preserve the existing boolean wire values for
    existing modes (`cached` maps to `false` and `live` to `true`) and send
    `"indexed"` only for `indexed`; `disabled` keeps the tool unavailable.
    - Carry the mode through managed configuration requirements and
    generated schemas.
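    
    The mode-to-wire mapping described above could look roughly like this.
    It is a sketch under stated assumptions: `WebSearchMode`, `StandaloneWire`,
    and the method names are hypothetical and do not mirror the real Codex
    types.
    
    ```rust
    /// Illustrative mirror of the `web_search` setting; not the real type.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum WebSearchMode {
        Disabled,
        Cached,
        Live,
        Indexed,
    }
    
    /// Value sent to the standalone executor: legacy boolean or new string.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum StandaloneWire {
        Bool(bool),
        Text(&'static str),
    }
    
    impl WebSearchMode {
        /// Parse the configuration string; unknown values yield `None`.
        pub fn from_config(value: &str) -> Option<Self> {
            match value {
                "disabled" => Some(Self::Disabled),
                "cached" => Some(Self::Cached),
                "live" => Some(Self::Live),
                "indexed" => Some(Self::Indexed),
                _ => None,
            }
        }
    
        /// Standalone wire value; `None` means the tool stays unavailable.
        pub fn standalone_wire(self) -> Option<StandaloneWire> {
            match self {
                Self::Disabled => None,
                Self::Cached => Some(StandaloneWire::Bool(false)),
                Self::Live => Some(StandaloneWire::Bool(true)),
                Self::Indexed => Some(StandaloneWire::Text("indexed")),
            }
        }
    
        /// Hosted search sets `index_gated_web_access` only for `indexed`.
        pub fn index_gated_web_access(self) -> bool {
            matches!(self, Self::Indexed)
        }
    }
    ```
    
    Because both executors read the same enum value, they cannot resolve to
    different access modes.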
    
    ## Why
    
    Indexed search provides a middle ground between cached-only search and
    unrestricted live page fetching. Search queries can remain live while
    direct page fetches are limited to URLs admitted by the server.
    
    The existing `web_search` setting remains the single source of truth, so
    hosted and standalone executors cannot drift into different access
    modes. Without an explicit `indexed` selection, the existing
    model-visible tool and request shapes are unchanged.
    
    ```toml
    web_search = "indexed"
    
    [features]
    standalone_web_search = true
    ```
    
    ## Validation
    
    - `just fmt`
    - `just test -p codex-api` (`126 passed`)
    - `just test -p codex-web-search-extension` (`7 passed`)
    - `just test -p codex-core
    code_mode_can_call_indexed_standalone_web_search` (`1 passed`)
    - Focused configuration, hosted request, standalone request, and
    managed-requirement coverage is included in the PR; remaining suites run
    in CI.
    
    The full workspace test suite was not run locally.
  • [codex] [4/4] Simplify recommended plugin install schema (#28403)
    ## Summary
    - Simplify recommendation-context `request_plugin_install` arguments to
    `plugin_id` and `suggest_reason`.
    - Derive plugin type and install action from the matched candidate while
    preserving Codex-owned elicitation metadata.
    - Keep the legacy list-backed schema unchanged and accept resumed calls
    that still use `tool_id`.
    
    ## Stack
    - #28399
    - #28400
    - #27704
    - This PR
    
    ## Validation
    - `just test -p codex-tools -p codex-core request_plugin_install` (25
    passed)
    - `just fix -p codex-tools -p codex-core`
    - `just fmt`
    - `git diff --check`
  • [codex] Use expect in integration tests (#28441)
    The workspace denies `clippy::expect_used` in production. Although
    `clippy.toml` allows `expect` in tests, Bazel Clippy compiles
    integration-test helper code in a way that does not receive that
    exemption, which encouraged verbose `unwrap_or_else(... panic!(...))`
    and equivalent `match`/`let else` forms.
    
    This allows `clippy::expect_used` once at each integration-test crate
    root (including aggregated suites and test-support libraries), then
    replaces manual panic-based Result and Option unwraps with
    `expect`/`expect_err`. Standalone `tests/*.rs` files remain their own
    crate roots. Intentional assertion and unexpected-variant panics remain
    unchanged, and the production `expect_used = "deny"` lint remains in
    place.
    
    The cleanup is mechanical and net-negative in line count.
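    
    A before/after sketch of the mechanical replacement, assuming a
    hypothetical `parse_port` helper. In a real integration-test crate the
    root would also carry `#![allow(clippy::expect_used)]`, shown here only as
    a comment.
    
    ```rust
    // At the integration-test crate root (not repeated per module):
    // #![allow(clippy::expect_used)]
    
    /// Hypothetical helper under test.
    fn parse_port(text: &str) -> Result<u16, std::num::ParseIntError> {
        text.trim().parse()
    }
    
    /// Before: manual panic-based unwrap.
    pub fn port_before(text: &str) -> u16 {
        parse_port(text).unwrap_or_else(|err| panic!("port should parse: {err}"))
    }
    
    /// After: the same intent expressed with `expect`.
    pub fn port_after(text: &str) -> u16 {
        parse_port(text).expect("port should parse")
    }
    
    /// After: `expect_err` covers the failure-path equivalent.
    pub fn error_after(text: &str) -> std::num::ParseIntError {
        parse_port(text).expect_err("port should be rejected")
    }
    ```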
  • feat(core): add metadata field to ResponseItem (#28355)
    ## Description
    
    This PR adds an optional `metadata` field to `ResponseItem` for
    Responses API calls. Only mechanical plumbing, no actual values
    populated and sent yet. Turns out just adding a new field to
    `ResponseItem` has quite a large blast radius already.
    
    This change is backwards compatible because `metadata` is optional and
    omitted when absent, so existing response items and rollout history
    without it still deserialize and requests that do not set it keep the
    same wire shape. For provider compatibility, we strip out `metadata`
    before non-OpenAI Responses requests so Azure and AWS Bedrock never see
    this field.
    
    My followup PR here will actually make use of it to start storing and
    passing along `turn_id`: https://github.com/openai/codex/pull/28360
    
    ## What changed
    
    - Added `ResponseItemMetadata` with optional `turn_id`, plus optional
    `metadata` on Responses API item variants and inter-agent communication.
    - Preserved item metadata through response-item rewrites such as
    truncation, missing tool-output synthesis, compaction history
    rebuilding, visible-history conversion, rollout/resume, and generated
    app-server schemas/types.
    - Strip item metadata from non-OpenAI Responses requests while
    preserving it for OpenAI-shaped requests.
    - Updated the mechanical fixture/test construction churn required by the
    new optional field.
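    
    The provider-dependent stripping can be sketched as below. The types are
    simplified stand-ins (`Item`, `ItemMetadata`, `Provider`, and
    `prepare_items` are hypothetical), and the sketch ignores serialization
    entirely.
    
    ```rust
    /// Simplified stand-in for `ResponseItemMetadata`.
    #[derive(Debug, Clone, PartialEq, Eq, Default)]
    pub struct ItemMetadata {
        pub turn_id: Option<String>,
    }
    
    /// Simplified stand-in for a response item with optional metadata.
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub struct Item {
        pub text: String,
        pub metadata: Option<ItemMetadata>,
    }
    
    /// Hypothetical provider tag used only by this sketch.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum Provider {
        OpenAi,
        Azure,
        Bedrock,
    }
    
    /// Keep metadata for OpenAI-shaped requests; strip it for other providers.
    pub fn prepare_items(items: &[Item], provider: Provider) -> Vec<Item> {
        items
            .iter()
            .cloned()
            .map(|mut item| {
                if provider != Provider::OpenAi {
                    item.metadata = None;
                }
                item
            })
            .collect()
    }
    ```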
  • core: cache the tool search handler per session (#27258)
    ## Why
    
    Tool router construction rebuilds the deferred-tool BM25 index during
    session initialization and before each sampling continuation, even when
    the searchable tool metadata is unchanged. Local profiling measured
    `append_tool_search_executor` at roughly 113 ms per continuation, making
    repeated index construction the largest measured router-building cost.
    
    ## What changed
    
    - Add a session-scoped `ToolSearchHandlerCache` so continuations and
    user turns can reuse the existing handler.
    - Key reuse on the complete ordered `Vec<ToolSearchInfo>`, rebuilding
    when searchable text, loadable tool specs, source metadata, or ordering
    changes.
    - Build handlers outside the cache lock and recheck before publishing
    them, avoiding holding the mutex during index construction.
    
    ## Verification
    
    - `cache_reuses_identical_search_infos_and_rebuilds_changed_inputs`
    covers exact cache reuse and invalidation when the ordered search
    metadata changes.
    - Local rollout profiling showed the initial router build populating the
    cache and unchanged later continuations reusing it:
      - uncached: 118 ms median across 14 spans from 3 rollouts
      - cached: 4 ms median across 12 spans from 3 rollouts
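    
    The "build outside the lock, recheck before publishing" pattern might be
    sketched like this. `SearchInfo`, `Handler`, `HandlerCache`, and
    `build_handler` are hypothetical stand-ins, and the index build is reduced
    to counting entries.
    
    ```rust
    use std::sync::{Arc, Mutex};
    
    /// Hypothetical stand-in for `ToolSearchInfo`.
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub struct SearchInfo {
        pub name: String,
        pub text: String,
    }
    
    /// Stand-in for the handler that owns the (expensive) search index.
    #[derive(Debug)]
    pub struct Handler {
        pub indexed: usize,
    }
    
    /// Placeholder for the costly index construction.
    fn build_handler(infos: &[SearchInfo]) -> Handler {
        Handler { indexed: infos.len() }
    }
    
    /// Single-slot cache keyed on the complete ordered input list.
    #[derive(Default)]
    pub struct HandlerCache {
        slot: Mutex<Option<(Vec<SearchInfo>, Arc<Handler>)>>,
    }
    
    impl HandlerCache {
        fn lookup(&self, infos: &[SearchInfo]) -> Option<Arc<Handler>> {
            let slot = self.slot.lock().expect("cache mutex poisoned");
            match slot.as_ref() {
                Some((key, handler)) if key.as_slice() == infos => Some(Arc::clone(handler)),
                _ => None,
            }
        }
    
        pub fn get_or_build(&self, infos: &[SearchInfo]) -> Arc<Handler> {
            if let Some(hit) = self.lookup(infos) {
                return hit;
            }
            // Build without holding the mutex.
            let built = Arc::new(build_handler(infos));
            let mut slot = self.slot.lock().expect("cache mutex poisoned");
            // Recheck: another caller may have published the same key meanwhile.
            if let Some((key, handler)) = slot.as_ref() {
                if key.as_slice() == infos {
                    return Arc::clone(handler);
                }
            }
            *slot = Some((infos.to_vec(), Arc::clone(&built)));
            built
        }
    }
    ```
    
    Comparing the whole ordered `Vec` means a change in ordering alone forces
    a rebuild, matching the keying rule above.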
  • Represent dynamic tools with explicit namespaces internally (#27365)
    Follow-up to #27356.
    
    ## Stack note
    
    This PR changes Codex's internal dynamic-tool shape while leaving
    `thread/start` unchanged. App-server therefore converts the existing
    per-tool input into explicit functions and namespaces before passing it
    to core.
    
    [#27371](https://github.com/openai/codex/pull/27371) updates
    `thread/start` to use the same explicit shape and removes this temporary
    conversion.
    
    ## Why
    
    Dynamic tools repeat namespace metadata on every function. Core should
    keep one explicit namespace with its member tools so descriptions and
    membership stay consistent across sessions and runtime planning.
    
    ## What changed
    
    - Represent dynamic tools as top-level functions or explicit namespaces
    in protocol and session state.
    - Read old flat rollout metadata and write the canonical hierarchy.
    - Flatten namespace members only when registering callable tools.
    - Keep `thread/start.dynamicTools` flat for now and normalize it at the
    app-server boundary.
    
    New builds can read old rollout metadata. Older builds cannot read newly
    written hierarchical metadata.
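    
    Reading the old flat shape into the hierarchy could be sketched as
    follows. `FlatTool`, `DynamicTool`, and `normalize` are hypothetical names
    with simplified fields; the first description seen for a namespace wins,
    which is an assumption of this sketch.
    
    ```rust
    /// Hypothetical legacy shape: namespace metadata repeated on every tool.
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub struct FlatTool {
        pub name: String,
        pub namespace: Option<String>,
        pub namespace_description: Option<String>,
    }
    
    /// Hypothetical canonical shape: top-level function or explicit namespace.
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub enum DynamicTool {
        Function { name: String },
        Namespace { name: String, description: Option<String>, tools: Vec<String> },
    }
    
    /// Group flat tools by namespace, preserving first-seen order.
    pub fn normalize(flat: Vec<FlatTool>) -> Vec<DynamicTool> {
        let mut out: Vec<DynamicTool> = Vec::new();
        for tool in flat {
            let Some(ns) = tool.namespace else {
                out.push(DynamicTool::Function { name: tool.name });
                continue;
            };
            // Look for a namespace entry that was already emitted.
            let existing = out.iter_mut().find_map(|entry| match entry {
                DynamicTool::Namespace { name, description, tools } if *name == ns => {
                    Some((description, tools))
                }
                _ => None,
            });
            match existing {
                Some((description, tools)) => {
                    if description.is_none() {
                        *description = tool.namespace_description;
                    }
                    tools.push(tool.name);
                }
                None => out.push(DynamicTool::Namespace {
                    name: ns,
                    description: tool.namespace_description,
                    tools: vec![tool.name],
                }),
            }
        }
        out
    }
    ```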
    
    ## Test plan
    
    - `just test -p codex-app-server
    thread_start_normalizes_legacy_dynamic_tools_into_model_request`
    - `just test -p codex-protocol
    session_meta_normalizes_legacy_dynamic_tools`
    - `just test -p codex-core
    resume_restores_dynamic_tools_from_rollout_with_sqlite_enabled`
    - `just test -p codex-core
    tool_search_returns_deferred_dynamic_tool_and_routes_follow_up_call`
    - `just test -p codex-core code_mode_can_call_hidden_dynamic_tools`
    - `just test -p codex-tools`
  • Route image extension reads through turn environments v2 (#27498)
    ## Why
    
    Image generation used `std::fs::read` for referenced image paths, which
    did not support environment-backed filesystems or their sandbox context.
    
    ## What changed
    
    - Expose optional turn environments to extension tool calls.
    - Include each environment’s ID, working directory, filesystem, and
    sandbox context.
    - Read referenced images through the selected environment filesystem.
    - Keep sandbox usage at the extension call site so extensions can choose
    the appropriate access mode.
    - Consolidate image request construction into one async function.
    - Add coverage for successful environment reads and read failures.
    
    ## Validation
    
    - `cargo check -p codex-image-generation-extension --tests`
    - `just fmt`
    - `just bazel-lock-update`
    - `just bazel-lock-check`
    
    `just test -p codex-image-generation-extension` could not complete
    because the build exhausted available disk space.
  • skills: expose remote skill resource tools (#27388)
    ## Why
    
    PR #27387 makes backend plugin skills discoverable and invocable without
    an executor, but resources referenced by those skills still sit behind
    the generic MCP resource surface. The model needs a skills-owned API
    that preserves the provider authority and package boundary instead of
    treating remote resources like local files.
    
    This is stacked on #27387.
    
    ## What
    
    - Adds one `skills` namespace with bounded `list` and `read` tools for
    remote skill providers.
    - Revalidates `authority + package` against the live remote catalog on
    every read, then routes the opaque resource ID back through that
    provider.
    - Allows the backend provider to read canonical child `skill://`
    resources while rejecting cross-package, non-canonical, query, fragment,
    and traversal-shaped URIs.
    - Caps each serialized tool result at 8 KB. Lists are paginated; reads
    return an opaque continuation cursor.
    - Marks the JSON output as external context so memory generation can
    apply its normal suppression policy.
    - Deliberately does not add `skills.search`; that waits for a bounded
    plugin-service search contract.
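    
    The URI rejection rules could be approximated as below. This is only a
    sketch: `is_allowed_child_resource` is a hypothetical helper, and it
    assumes the package root is itself a `skill://` URI prefix and that
    percent-encoded or backslash segments count as non-canonical.
    
    ```rust
    /// Accept only canonical child URIs under `package_root`.
    /// Assumption: `package_root` looks like `skill://<plugin>/<skill>`.
    pub fn is_allowed_child_resource(package_root: &str, uri: &str) -> bool {
        if !package_root.starts_with("skill://") {
            return false;
        }
        // Query- and fragment-shaped URIs are rejected outright.
        if uri.contains('?') || uri.contains('#') {
            return false;
        }
        // Cross-package URIs do not share the package root prefix.
        let Some(rest) = uri.strip_prefix(package_root) else {
            return false;
        };
        // Require a path separator so `deploy-evil` cannot match `deploy`.
        let Some(path) = rest.strip_prefix('/') else {
            return false;
        };
        // Reject empty, dot, and traversal segments, plus encoded forms.
        !path.is_empty()
            && path.split('/').all(|segment| {
                !segment.is_empty()
                    && segment != "."
                    && segment != ".."
                    && !segment.contains('%')
                    && !segment.contains('\\')
            })
    }
    ```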
    
    ## Tool contract
    
    Pseudo-Python matching the wire shape:
    
    ```python
    from typing import Literal, NotRequired, TypedDict
    
    
    class RemoteSkillAuthority(TypedDict):
        kind: Literal["remote"]
        id: str  # e.g. "codex_apps"
    
    
    class RemoteSkill(TypedDict):
        authority: RemoteSkillAuthority
        package: str  # opaque provider-owned package ID
        name: str
        description: str
        main_resource: str  # opaque provider-owned SKILL.md ID
    
    
    class SkillsListParams(TypedDict):
        cursor: NotRequired[str]
    
    
    class SkillsListResult(TypedDict):
        skills: list[RemoteSkill]
        next_cursor: str | None
        warnings: list[str]
        truncated: bool
    
    
    class SkillsReadParams(TypedDict):
        authority: RemoteSkillAuthority  # copied from skills.list
        package: str  # copied from skills.list
        resource: str  # provider-owned child resource ID
        cursor: NotRequired[str]  # copy next_cursor to continue
    
    
    class SkillsReadResult(TypedDict):
        resource: str
        contents: str
        next_cursor: str | None
        truncated: bool
    
    
    class Skills:
        def list(self, params: SkillsListParams) -> SkillsListResult: ...
        def read(self, params: SkillsReadParams) -> SkillsReadResult: ...
    ```
    
    There is one namespace for all remote skills, not one tool or MCP server
    per skill. No resource ID is converted into a filesystem path.
    
    ## Backend dependency
    
    `/ps/mcp` must support direct reads of child resources such as
    `skill://plugin_demo/deploy/references/deploy.md`. This PR implements
    and tests the Codex side of that contract; production child reads remain
    dependent on the corresponding plugin-service support. Search remains
    out of scope until that service exposes a bounded search/resource API.
    
    ## Validation
    
    - Added an app-server integration test covering `skills.list` followed
    by `skills.read` with no executor.
    - Ran `just fmt`.
    - Ran `just bazel-lock-update` and `just bazel-lock-check`.
    - Did not run Rust tests or Clippy locally, per request; CI will run
    them.
  • [codex] Add comp_hash to model metadata (#27532)
    ## Summary
    - add optional `comp_hash` metadata to `ModelInfo`
    - update `ModelInfo` fixtures for the shared schema change
    - keep older model responses compatible by defaulting the field to
    `None`
    
    ## Why
    The models endpoint needs an opaque identifier for compaction-compatible
    model configurations. This PR only exposes that value in model metadata;
    it does not add it to turn context or change runtime behavior.
    
    Follow-up #27520 carries the value through turn context and rollouts,
    then uses it to trigger compaction.
    
    ## Stack
    - based directly on `main`
    - replaces #27519, which was accidentally merged into the wrong base
    branch
    - functionality follow-up: #27520
    
    ## Testing
    - `just test -p codex-protocol
    model_info_defaults_availability_nux_to_none_when_omitted`
    - `just fix -p codex-core -p codex-protocol -p codex-analytics -p
    codex-models-manager`
  • tools: simplify default tool search text (#27526)
    ## Why
    
    Default tool search text currently derives identity from both `ToolName`
    and `ToolSpec`. For function and namespace specs, this indexes the same
    names more than once and also adds a flattened `{namespace}{name}` token
    that is not model-visible.
    
    ## What changed
    
    - Derive default search text entirely from `ToolSpec` while preserving
    names, descriptions, namespace metadata, and recursive schema metadata.
    - Keep the default search-text builder private and remove the unused
    `ToolName` argument.
    - Add coverage for the exact search text generated for a namespaced tool
    with nested schema metadata.
    
    ## Example
    
    For the `codex_app` namespace and `automation_update` tool (schema terms
    omitted):
    
    - Before: `codex_appautomation_update automation update codex_app
    codex_app Manage Codex automations. automation_update automation update
    ...`
    - After: `codex_app Manage Codex automations. automation_update
    automation update ...`
    
    ## Testing
    
    - `just test -p codex-tools`
  • [plugins] Inject remote_plugin_id into install elicitations (#26409)
    ## Summary
    - Propagate cached remote plugin IDs through Codex plugin discovery.
    - Inject `remote_plugin_id` and connector IDs into
    `request_plugin_install` elicitation `_meta` from the resolved plugin.
    - Keep the remote plugin ID out of the model-facing tool schema,
    arguments, and result.
    
    ## Validation
    - `just test -p codex-tools`
    - `just test -p codex-core-plugins`
    - `just test -p codex-core
    list_tool_suggest_discoverable_plugins_includes_cached_remote_global_plugins`
    - `just fix -p codex-tools`
    - `just fix -p codex-core-plugins`
    - `just fix -p codex-core`
    - `git diff --check`
    - `just test -p codex-core` was also attempted: 2,581 passed, 55 failed,
    and 1 timed out across unrelated sandbox/environment-sensitive
    integration tests.
  • [codex] Remove async_trait from ToolExecutor (#27304)
    ## Why
    
    We're now [discouraging use of
    `async_trait`](https://github.com/openai/codex/pull/20242).
    
    Removing `async_trait` from `ToolExecutor` cuts the `codex_core` debug
    test build time by ~78% (from 227.5s to 50.3s) on my machine.
    
    Stacked on #27299, this PR applies the trait change after the handler
    bodies have been outlined.
    
    ## What
    
    Changed `ToolExecutor::handle` to return an explicit boxed
    `ToolExecutorFuture` instead of using `async_trait`.
    
    Updated `ToolExecutor` implementors to return `Box::pin(...)`, re-exported
    the future alias through `codex-tools` and `codex-extension-api`, and
    removed the direct `async-trait` dependency from `codex-tools`.
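    
    The shape of the change can be sketched as follows. `ToolFuture`,
    `Executor`, and `Echo` are hypothetical names, the output type is
    simplified to `Result<String, String>`, and `block_on` is a demo-only
    driver standing in for a real async runtime.
    
    ```rust
    use std::future::Future;
    use std::pin::Pin;
    use std::sync::Arc;
    use std::task::{Context, Poll, Wake, Waker};
    
    /// Hypothetical alias in the spirit of `ToolExecutorFuture`.
    pub type ToolFuture<'a> = Pin<Box<dyn Future<Output = Result<String, String>> + Send + 'a>>;
    
    /// Trait without `async_trait`: `handle` returns an explicit boxed future.
    pub trait Executor {
        fn handle<'a>(&'a self, input: &'a str) -> ToolFuture<'a>;
    }
    
    pub struct Echo;
    
    impl Executor for Echo {
        fn handle<'a>(&'a self, input: &'a str) -> ToolFuture<'a> {
            Box::pin(async move {
                let reply: Result<String, String> = Ok(format!("echo: {input}"));
                reply
            })
        }
    }
    
    /// Waker that does nothing; enough for futures that never truly suspend.
    struct NoopWake;
    
    impl Wake for NoopWake {
        fn wake(self: Arc<Self>) {}
    }
    
    /// Demo-only driver: polls in a loop until the future is ready.
    pub fn block_on<F: Future>(future: F) -> F::Output {
        let mut future = Box::pin(future);
        let waker = Waker::from(Arc::new(NoopWake));
        let mut cx = Context::from_waker(&waker);
        loop {
            if let Poll::Ready(output) = future.as_mut().poll(&mut cx) {
                return output;
            }
            std::thread::yield_now();
        }
    }
    ```
    
    Each implementor wraps its body in `Box::pin(...)` by hand, so the
    `async_trait` macro no longer has to expand every implementation.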
  • chore: preserve one more schema layer during large tool compaction (#27084)
    ## Summary
    
    Some customer MCP tools expose large input schemas that exceed Codex's
    compact schema budget even after description stripping. Today, the final
    compaction pass collapses complex schemas starting at depth 2, which can
    erase important shallow call structure such as small `anyOf` branches,
    required fields, and help-mode entry points. In one reported case, this
    degraded a tool schema into `query: any | any`, leaving the model
    without enough structure to discover the required help call.
    
    This change raises the deep-schema collapse boundary from depth 2 to
    depth 3. That preserves one additional layer of the tool contract while
    still collapsing deeper expensive subtrees to `{}` when a schema remains
    over budget.
    
    ## What Changed
    
    - Increased `MAX_COMPACT_TOOL_SCHEMA_DEPTH` from `2` to `3`.
    - Updated the schema compaction traversal test to assert the new
    collapse boundary.
    - The resulting compacted shape keeps useful shallow structure, for
    example:
      - top-level argument names
      - shallow `anyOf` branches
      - required object fields
      - nested property names one level deeper than before
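    
    The effect of moving the collapse boundary can be illustrated with a toy
    schema tree. This is a sketch only: `Schema` and `collapse` are
    hypothetical, the root is assumed to be depth 0, and primitives are
    assumed never to collapse; the real traversal may count depth differently.
    
    ```rust
    /// Tiny stand-in for a JSON Schema node; not the real `JsonSchema` type.
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub enum Schema {
        /// `{}`: accepts anything.
        Any,
        /// A primitive such as "string".
        Primitive(&'static str),
        /// An object with named properties.
        Object(Vec<(&'static str, Schema)>),
        /// An `anyOf` composition.
        AnyOf(Vec<Schema>),
    }
    
    /// Collapse complex nodes at `depth >= max_depth` to `{}`.
    pub fn collapse(schema: &Schema, depth: usize, max_depth: usize) -> Schema {
        match schema {
            Schema::Any | Schema::Primitive(_) => schema.clone(),
            _ if depth >= max_depth => Schema::Any,
            Schema::Object(props) => Schema::Object(
                props
                    .iter()
                    .map(|(name, child)| (*name, collapse(child, depth + 1, max_depth)))
                    .collect(),
            ),
            Schema::AnyOf(branches) => Schema::AnyOf(
                branches
                    .iter()
                    .map(|branch| collapse(branch, depth + 1, max_depth))
                    .collect(),
            ),
        }
    }
    ```
    
    Under these assumptions, a `query` property holding two object branches
    degrades to `any | any` at a boundary of 2 and keeps its branch property
    names at a boundary of 3.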
    
    ## Validation
    
    - Ran `just test -p codex-tools`: 81 tests passed.
    - Ran a golden schema corpus comparison over 214 discovered tool input
    schemas under `golden_schemas/*/mcp_tools/*/input_schema.json`.
    - Depth 2 and depth 3 had identical percentile token counts across the
    corpus.
      - Both ended with `0 / 214` schemas over 1k tokens.
      - Both ended with `0 / 214` schemas over the 4,000-byte compact JSON
      budget.
    - Only one golden schema changed, increasing from 49 to 56 tokens, so
    this does not appear to introduce a meaningful corpus-wide regression.
    
    Corpus percentile results:
    
    | Percentile | Depth 2 | Depth 3 |
    |---|---:|---:|
    | p0 | 9 | 9 |
    | p10 | 31 | 31 |
    | p25 | 54 | 54 |
    | p50 | 81 | 81 |
    | p75 | 143 | 143 |
    | p90 | 290 | 290 |
    | p95 | 431 | 431 |
    | p99 | 600 | 600 |
    | max | 832 | 832 |
  • feat: support oneOf and allOf in tool input schemas (#24118)
    ## Why
    
    Some connector golden schemas use JSON Schema composition keywords
    beyond `anyOf`, specifically top-level or nested `oneOf` and `allOf`.
    Codex currently needs to preserve those shapes when parsing MCP tool
    input schemas so connector tools do not lose valid schema structure
    during normalization.
    
    To prevent an increased Responses API error rate, this PR will be merged
    after the Responses API supports top-level `oneOf`/`allOf`.
    
    ## What Changed
    
    - Adds `oneOf` and `allOf` support to `JsonSchema`, matching the
    existing `anyOf` handling.
    - Traverses `oneOf` and `allOf` anywhere schema children are visited,
    including sanitization, definition reachability, description stripping,
    and deep schema compaction.
    - Adds a final large-schema compaction pass that prunes schema objects
    containing `anyOf`, `oneOf`, or `allOf` to `{}` if earlier compaction
    passes still leave the schema over budget.
    
    ## Validation
    Golden schema token validation over `2,025` schemas under
    `golden_schemas`, all parsed successfully. Token count is `o200k_base`
    over compact JSON from `parse_tool_input_schema`.
    
    | Percentile | Before PR | After oneOf/allOf | After pruning |
    |---|---:|---:|---:|
    | p0 | 9 | 9 | 9 |
    | p10 | 63 | 64 | 64 |
    | p25 | 86 | 87 | 87 |
    | p50 | 125 | 128 | 128 |
    | p75 | 203 | 206 | 206 |
    | p90 | 327 | 333 | 333 |
    | p95 | 460 | 473 | 473 |
    | p99 | 763 | 779 | 779 |
    | max | 891 | 955 | 955 |
    
    Totals:
    
    | Parser state | Total tokens |
    |---|---:|
    | Before PR | 345,713 |
    | After oneOf/allOf | 352,686 |
    | After pruning | 352,686 |
    
    The pruning column matches the oneOf/allOf column for this corpus
    because no parsed compact golden schema remains over the `4,000`
    compact-byte budget after the earlier compaction passes.
  • [codex] Exclude external tool output from memories (#26821)
    ## Summary
    
    - Add `contains_external_context()` to tool output so that any tool can
    mark its output as external context, which keeps it from influencing
    memory when `disable_on_external_context=true`.
    - Classify standalone web-search output as external context, matching the
    behavior of hosted web search.
    - Verify the behavior with an integration test.
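    
    One way to picture the opt-out is a trait method with a `false` default.
    Only the method name `contains_external_context` comes from this PR; the
    `ToolOutput` trait shape, the two output types, and `eligible_for_memory`
    are hypothetical.
    
    ```rust
    /// Hypothetical tool-output trait; only the method name is from the PR.
    pub trait ToolOutput {
        fn text(&self) -> &str;
    
        /// Defaults to `false`; tools returning external data override it.
        fn contains_external_context(&self) -> bool {
            false
        }
    }
    
    pub struct ShellOutput(pub String);
    pub struct WebSearchOutput(pub String);
    
    impl ToolOutput for ShellOutput {
        fn text(&self) -> &str {
            &self.0
        }
    }
    
    impl ToolOutput for WebSearchOutput {
        fn text(&self) -> &str {
            &self.0
        }
    
        fn contains_external_context(&self) -> bool {
            true
        }
    }
    
    /// Hypothetical policy check: skip memory when any output is external
    /// and `disable_on_external_context` is set.
    pub fn eligible_for_memory(
        outputs: &[&dyn ToolOutput],
        disable_on_external_context: bool,
    ) -> bool {
        let has_external = outputs.iter().any(|output| output.contains_external_context());
        !(disable_on_external_context && has_external)
    }
    ```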
  • Encrypt multi-agent v2 message payloads (#26210)
    ## Why
    
    Multi-agent v2 currently routes agent instructions through normal tool
    arguments and inter-agent context. That means the parent model can emit
    plaintext task text, Codex can persist it in history/rollouts, and the
    recipient can receive it as ordinary assistant-message JSON.
    
    This changes the v2 path so agent instructions stay encrypted between
    model calls: Responses encrypts the `message` argument returned by the
    model, Codex forwards only that ciphertext, and Responses decrypts it
    internally for the recipient model.
    
    ## What changed
    
    - Mark the v2 `message` parameter as encrypted for `spawn_agent`,
    `send_message`, and `followup_task`.
    - Treat multi-agent v2 tool `message` values as ciphertext
    unconditionally.
    - Store v2 inter-agent task text in
    `InterAgentCommunication.encrypted_content` with empty plaintext
    `content`.
    - Convert encrypted inter-agent communications into the Responses
    `agent_message` input item before sending the child request.
    - Preserve `agent_message` items across history, rollout, compaction,
    telemetry, and app-server schema paths.
    - Leave multi-agent v1 unchanged.
    
    ## Message shape
    
    The model still calls the v2 tools with a `message` argument, but that
    value is now ciphertext:
    
    ```json
    {
      "name": "spawn_agent",
      "arguments": {
        "task_name": "worker",
        "message": "<ciphertext>"
      }
    }
    ```
    
    Codex stores the task as encrypted inter-agent communication:
    
    ```json
    {
      "author": "/root",
      "recipient": "/root/worker",
      "content": "",
      "encrypted_content": "<ciphertext>",
      "trigger_turn": true
    }
    ```
    
    When Codex builds the recipient request, it forwards the ciphertext
    using the new Responses input item:
    
    ```json
    {
      "type": "agent_message",
      "author": "/root",
      "recipient": "/root/worker",
      "content": [
        {
          "type": "encrypted_content",
          "encrypted_content": "<ciphertext>"
        }
      ]
    }
    ```
    
    Responses decrypts that item internally for the recipient model.
    
    ## Context impact
    
    - Parent context no longer carries plaintext v2 agent task instructions
    from these tool arguments.
    - Codex rollout/history stores ciphertext for v2 agent instructions.
    - Recipient requests receive an `agent_message` item instead of
    assistant commentary JSON for encrypted task delivery.
    - Plaintext completion/status notifications are still plaintext because
    they are Codex-generated status messages, not encrypted model tool
    arguments.
    
    ## Validation
    
    - `just test -p codex-tools`
    - `just test -p codex-protocol`
    - `just test -p codex-rollout`
    - `just test -p codex-rollout-trace`
    - `just test -p codex-otel`
    - `just write-app-server-schema`
  • [codex] Add use_responses_lite 'override' logic (#26487)
    ## Summary
    
    - add a defaulted `ModelInfo.use_responses_lite` catalog field
    - support serializing `reasoning.context` while preserving the existing
    effort and summary path
    - the field has not been turned on for any models yet
    
    I've added an override that disables parallel tool calls when
    `responses_lite` is on, and I've forced persistent reasoning in that mode
    as well. It would be ideal to centralize all the `responses_lite`
    plumbing, but I think this is best for now to keep the plumbing and diffs
    small.
    
    ## Testing
    
    - `cargo test -p codex-protocol
    model_info_defaults_availability_nux_to_none_when_omitted`
    - `RUST_MIN_STACK=8388608 cargo test -p codex-core
    responses_lite_sets_all_turns_context_and_disables_parallel_tool_calls`
    - `RUST_MIN_STACK=8388608 cargo test -p codex-core
    configured_reasoning_summary_is_sent`
    - `cargo check -p codex-core --tests`
    - `RUST_MIN_STACK=8388608 cargo clippy -p codex-core --tests` (passes
    with pre-existing warnings in `codex-code-mode` and
    `codex-core-plugins`)
  • Route standalone image generation through host finalization md (#25176)
    ## Why
    
    Standalone image-generation extensions emitted turn items through the
    low-level event path, bypassing host-owned finalization such as image
    persistence and contributor processing. At the same time, the
    generated-image save-path hint must remain visible to the model through
    the extension tool's `FunctionCallOutput`, rather than the legacy
    built-in developer-message path.
    
    ## What changed
    
    - Extended `ExtensionTurnItem` to support image-generation items while
    keeping the extension-facing emitter API limited to `emit_started` and
    `emit_completed`.
    - Routed extension completion through core `finalize_turn_item`, so
    standalone image-generation items receive host-owned processing and
    persisted `saved_path` values before publication.
    - Kept legacy built-in image generation on its existing
    developer-message hint path, while standalone image generation returns
    its deterministic saved-path hint in `FunctionCallOutput`.
    - Shared the image artifact path and output-hint formatting used by core
    and the image-generation extension.
    - Passed thread identity through extension tool calls so standalone
    image generation can construct the same intended artifact path as core.
    - Added an app-server integration test covering real standalone image
    generation, saved artifact publication, model-visible output hint
    wiring, and absence of the legacy developer-message hint.
    
    ## Validation
    
    - `just fmt`
    - `just test -p codex-image-generation-extension`
    - `just test -p codex-web-search-extension`
    - `just test -p codex-goal-extension`
    - `just test -p codex-memories-extension`
    - Targeted `codex-core` tests for image save history, extension
    completion finalization, and contributor execution
    - `just test -p codex-app-server
    standalone_image_generation_returns_saved_path_hint_to_model`
    - `just fix -p codex-core`
    - `just fix -p codex-image-generation-extension`
    - `just bazel-lock-update`
    - `just bazel-lock-check`
  • Add multi-agent runtime metadata types (#25720)
    Stack split from #25708. Original PR intentionally left open. This first
    PR adds the multi-agent runtime metadata types and catalog plumbing used
    by the rest of the stack.
  • Move tool search metadata onto ToolExecutor (#25684)
    Deferred tools need to be searchable even when they are not implemented
    inside `codex-core`. Extension-provided tools can be registered for
    later discovery, but the search metadata path was still owned by
    core-specific runtime hooks, which meant the shared `ToolExecutor`
    abstraction could not describe how a deferred extension tool should
    appear in `tool_search`.
    
    ## Changes
    
    - Move `ToolSearchEntry` and `ToolSearchInfo` into `codex-tools` and
    re-export them from the shared tools crate.
    - Add a default `ToolExecutor::search_info` implementation that derives
    loadable tool-search metadata from function and namespace specs.
    - Forward search metadata through extension adapters and exposure
    overrides while keeping custom search text/source metadata for dynamic,
    MCP, and multi-agent tools.
    - Remove the old core-local `tool_search_entry` module now that search
    metadata lives with the shared executor APIs.
    
    ## Testing
    
    - Added `deferred_extension_tools_are_discoverable_with_tool_search`
    coverage in `core/src/tools/spec_plan_tests.rs`.
  • feat: gate unified exec zsh fork composition (#24979)
    ## Why
    
    `shell_zsh_fork` and unified exec need to remain independently
    controllable for enterprise rollouts, but we also need a third mode that
    composes them. That composed mode is intended to preserve unified exec
    command lifecycle support while letting the zsh fork provide more
    accurate `execv(2)` interception.
    
    Enabling `unified_exec_zsh_fork` by itself is intentionally not
    sufficient. It is a composition gate, not a dependency-enabling
    shortcut:
    
    - `unified_exec` selects the PTY-backed unified exec tool.
    - `shell_zsh_fork` opts into the zsh fork backend.
    - `unified_exec_zsh_fork` only allows those two already-enabled modes to
    be composed so local zsh unified exec commands can launch through the
    zsh fork.
    
    This separation is deliberate. Enterprises and staged rollouts must be
    able to enable or disable unified exec and zsh-fork independently. If
    `unified_exec_zsh_fork` implied either dependency, then enabling one
    under-development composition flag would silently activate a shell
    backend that the configured feature set left disabled.
    
    This PR introduces only the configuration and planning gate for that
    composition. Existing `shell_zsh_fork` behavior continues to use the
    standalone shell tool unless the new composition feature is explicitly
    enabled alongside both dependencies.
    
    ## What Changed
    
    - Added the under-development feature flag `unified_exec_zsh_fork`.
    - Added `UnifiedExecFeatureMode` so the three input feature flags
    collapse into `Disabled`, `Direct`, or `ZshFork` mode before tool
    planning.
    - Updated tool selection so zsh-fork composition requires
    `unified_exec`, `shell_zsh_fork`, and `unified_exec_zsh_fork`.
    - Kept the existing standalone zsh-fork shell tool behavior when only
    `shell_zsh_fork` is enabled.
    - Updated config schema output for the new feature flag.
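    
    The flag collapse described above could look roughly like the sketch
    below. The enum mirrors the `UnifiedExecFeatureMode` name from this PR,
    but `resolve_mode` and the exact mapping of the non-composed
    combinations are assumptions for illustration, not the actual
    implementation.
    
    ```rust
    /// Illustrative stand-in for the mode type named in this PR.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum UnifiedExecFeatureMode {
        Disabled,
        Direct,
        ZshFork,
    }
    
    /// Hypothetical helper: collapse the three input flags into one mode.
    /// The composition gate alone never enables anything; it only allows
    /// two already-enabled features to be combined.
    fn resolve_mode(
        unified_exec: bool,
        shell_zsh_fork: bool,
        unified_exec_zsh_fork: bool,
    ) -> UnifiedExecFeatureMode {
        match (unified_exec, shell_zsh_fork, unified_exec_zsh_fork) {
            // Without unified exec there is nothing to compose.
            (false, _, _) => UnifiedExecFeatureMode::Disabled,
            // All three flags are required for the composed mode.
            (true, true, true) => UnifiedExecFeatureMode::ZshFork,
            // Assumption: every other unified-exec combination stays direct.
            (true, _, _) => UnifiedExecFeatureMode::Direct,
        }
    }
    ```
    
    Under these assumptions, `resolve_mode(false, false, true)` yields
    `Disabled`, which matches the stated intent that the composition flag
    is not a dependency-enabling shortcut.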
    
    ## Verification
    
    - Added feature and tool-config coverage for the new gate.
    - Added planner coverage proving `shell_zsh_fork` remains standalone
    until composition is explicitly enabled.
    - Ran focused tests for `codex-features`, `codex-tools`, and the
    affected `codex-core` planner case.
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/24979).
    * #24982
    * #24981
    * #24980
    * __->__ #24979
  • [codex-rs] auto-review model override (#23767)
    ## Why
    
    Guardian auto-review normally uses the provider-preferred review model
    when one is available. Some parent models need model-catalog metadata to
    select a different review model while keeping older `/models` payloads
    compatible when that metadata is absent.
    
    ## What changed
    
    - Added optional `ModelInfo::auto_review_model_override` metadata to the
    public model payload as a review-model slug.
    - Updated Guardian review model selection to prefer the catalog override
    when present, while preserving the existing provider preferred-model
    path and parent-model fallback when it is omitted.
    - Added focused Guardian coverage for override and no-override model
    selection.
    - Added an `auto_review` core integration suite test that loads override
    metadata from a remote model catalog path and asserts the strict
    auto-review `/responses` request uses the catalog-selected review model.
    - Updated existing `ModelInfo` fixtures and local catalog constructors
    for the new optional field.
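    
    The selection order described above (catalog override, then provider
    preferred model, then parent model) can be sketched as a simple
    fallback chain. `select_review_model` and its parameters are
    hypothetical names, not the Guardian code itself.
    
    ```rust
    /// Hypothetical helper: pick the review model slug.
    /// Order: catalog override, provider-preferred model, parent model.
    fn select_review_model<'a>(
        catalog_override: Option<&'a str>,
        provider_preferred: Option<&'a str>,
        parent_model: &'a str,
    ) -> &'a str {
        catalog_override
            .or(provider_preferred)
            .unwrap_or(parent_model)
    }
    ```
    
    When the override is absent, the result is identical to the
    pre-existing behavior, which is what keeps older `/models` payloads
    compatible.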
    
    ## Validation
    
    - `cargo test -p codex-protocol
    model_info_defaults_availability_nux_to_none_when_omitted`
    - `cargo test -p codex-core guardian_review_uses_`
    - `cargo test -p codex-core
    remote_model_override_uses_catalog_model_for_strict_auto_review --test
    all`
    - `just fix -p codex-protocol`
    - `just fix -p codex-core`
    - `just fmt`
    - `git diff --check`
  • [codex] Require model for standalone web search (#25131)
    ## Why
    
    The standalone `/v1/alpha/search` request now requires a `model`, but
    the `web.run` extension currently omits it.
    
    Adds `model` to extension `ToolCall` invocation.
    
    Follow-up to #23823.
    
    ## What changed
    
    - Make `SearchRequest.model` required.
    - Expose the effective per-turn model on extension tool calls and pass
    it in standalone web-search requests.
    - Assert the model is forwarded in the app-server round-trip test.
    
    ## Testing
    
    - `just test -p codex-api -p codex-tools -p codex-web-search-extension
    -p codex-memories-extension -p codex-goal-extension`
    - `just test -p codex-core -E
    'test(passes_turn_fields_and_scoped_turn_item_emitter_to_extension_call)'`
    - `just test -p codex-app-server -E
    'test(standalone_web_search_round_trips_encrypted_output)'`
  • Route extension image generation through the native image completion pipeline (#24972)
    ## Why
    
    The standalone `image_gen.imagegen` extension should behave like native
    image generation for artifact persistence and UI completion, while
    returning its save-location guidance as part of the tool result instead
    of injecting a developer message.
    
    ## What Changed
    
    - Added an image-generation completion hook for extension tools so core
    can persist generated images and emit the existing `ImageGeneration`
    lifecycle events.
    - Reused core image artifact persistence for extension output and
    removed extension-local save-path/file-writing logic.
    - Split shared image persistence from built-in finalization so native
    image generation keeps its existing developer-message instruction
    behavior.
    - Returned the generated image save-location instruction through the
    extension `FunctionCallOutput`, alongside the generated image input for
    model follow-up.
    - Preserved the existing image-generation event shape for current UI and
    replay compatibility.
    - Avoided cloning the full generated-image base64 payload when emitting
    the in-progress image item.
    - Removed dependencies no longer needed after moving persistence out of
    the extension crate.
    
    ## Fast Follow
    
    - Adjust the existing Extension API and add a general `TurnItem`
    finalization path so the finalization code can be reused.
    
    ## Validation
    
    - Ran `just fmt`.
    - Ran `just bazel-lock-update`.
    - Ran `just bazel-lock-check`.
    - Ran `just test -p codex-tools -p codex-extension-api -p
    codex-image-generation-extension`.
    - Ran `just test -p codex-core
    image_generation_publication_is_finalized_by_core`.
    - Ran `just test -p codex-core
    handle_output_item_done_records_image_save_history_message`.
    - Ran `just fix -p codex-tools -p codex-extension-api -p codex-core -p
    codex-image-generation-extension`.
  • [codex] Add model tool mode selector (#25031)
    ## Why
    Some models need to select their code-execution behavior through model
    catalog metadata. Models without that metadata must continue to follow
    the existing `CodeMode` and `CodeModeOnly` feature flags, including when
    a newer server sends an enum value this client does not recognize.
    
    ## What changed
    - add optional `ModelInfo.tool_mode` metadata with `direct`,
    `code_mode`, and `code_mode_only`
    - treat omitted and unknown wire values as `None`
    - resolve `None` from the existing feature flags
    - carry the resolved `ToolMode` directly on `TurnContext`, outside
    `Config`
    - use the resolved value for turn creation, model switches, review
    turns, tool planning, and code execution
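    
    A minimal sketch of the parse-then-resolve flow described above. The
    wire values come from this PR; the function names and the precedence
    between the two feature flags are assumptions made for illustration.
    
    ```rust
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum ToolMode {
        Direct,
        CodeMode,
        CodeModeOnly,
    }
    
    /// Hypothetical parser: omitted and unknown wire values both become `None`.
    fn parse_tool_mode(wire: Option<&str>) -> Option<ToolMode> {
        match wire? {
            "direct" => Some(ToolMode::Direct),
            "code_mode" => Some(ToolMode::CodeMode),
            "code_mode_only" => Some(ToolMode::CodeModeOnly),
            // A newer server may send a value this client does not know.
            _ => None,
        }
    }
    
    /// Hypothetical resolver: explicit metadata wins, otherwise fall back
    /// to the feature flags (assumed precedence: only-mode over code mode).
    fn resolve_tool_mode(
        metadata: Option<ToolMode>,
        code_mode: bool,
        code_mode_only: bool,
    ) -> ToolMode {
        metadata.unwrap_or(if code_mode_only {
            ToolMode::CodeModeOnly
        } else if code_mode {
            ToolMode::CodeMode
        } else {
            ToolMode::Direct
        })
    }
    ```
    
    Resolving once and carrying the result on the turn means later stages
    never need to re-read the flags or the catalog.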
    
    ## Coverage
    - add protocol coverage for omitted, known, and unknown enum values
    - add focused coverage for flag fallback and explicit metadata
    overriding feature flags
    - add core integration coverage that fetches remote model metadata
    through `/v1/models` and verifies the outbound `/responses` tools for
    explicit `direct` and `code_mode_only` selectors
    
    ## Stack
    - followed by #25032
  • extension-api: add TurnItemEmitter to tool calls (#24813)
    ## Why
    Extension-contributed tools need to emit visible turn items through
    Codex's normal event and persistence pipeline.
    
    ## What
    - Add `TurnItemEmitter` to extension `ToolCall`s and route the core
    implementation through `Session::emit_turn_item_*`.
    - Hold weak session and turn references so retained tool calls cannot
    keep host state alive.
    - Provide a no-op emitter for extension test callers.
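    
    The weak-reference idea can be illustrated with the standard library
    alone. Everything below (`Session`, its `items` field, the method
    names and return values) is a simplified, hypothetical stand-in for
    the real types; it only shows why a retained emitter does not keep
    host state alive.
    
    ```rust
    use std::sync::{Arc, Mutex, Weak};
    
    /// Hypothetical stand-in for host session state.
    struct Session {
        items: Mutex<Vec<String>>,
    }
    
    /// Emitter that holds only a weak reference to the session.
    struct TurnItemEmitter {
        session: Weak<Session>,
    }
    
    impl TurnItemEmitter {
        fn new(session: &Arc<Session>) -> Self {
            Self { session: Arc::downgrade(session) }
        }
    
        /// No-op emitter, e.g. for tests: it never points at a session.
        fn noop() -> Self {
            Self { session: Weak::new() }
        }
    
        /// Returns `true` if the item was delivered, `false` if the
        /// session is gone (or never existed).
        fn emit_completed(&self, item: &str) -> bool {
            match self.session.upgrade() {
                Some(session) => {
                    session.items.lock().unwrap().push(item.to_string());
                    true
                }
                None => false,
            }
        }
    }
    ```
    
    Because the emitter never holds a strong reference between calls,
    dropping the last `Arc<Session>` frees the session even if an
    extension still holds the tool call.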
    
    ## Test Plan
    - `just test -p codex-core -E
    'test(passes_turn_fields_and_scoped_turn_item_emitter_to_extension_call)'`
    
    ---------
    
    Co-authored-by: jif-oai <jif@openai.com>
  • Update rmcp to 1.7.0 (#24763)
    Will make it easier to uprev when the new draft spec is supported.
    
    Also updates reqwest where needed for compatibility but doesn't update
    it everywhere since this is already a large diff.
    
    The new version of rmcp handles certain kinds of authentication failures
    differently; this patch includes support for identifying the failing scope
    in a WWW-Authenticate header.
  • fix: dont compact standalone websearch schema (#24660)
    Add `parse_tool_input_schema_without_compaction` to bypass the
    existing compaction/trimming of client-provided tool schemas that are
    over 4k bytes.
    
    We want this for standalone web search to keep field guidance/metadata
    on certain fields; this keeps us closer to parity with the existing hosted
    tool schema (which didn't go through this 4k byte filter).
  • Restore legacy image detail values (#24644)
    ## Why
    
    Older persisted rollouts can contain `input_image.detail` values of
    `auto` or `low` from before `ImageDetail` was narrowed to
    `high`/`original`. Current deserialization rejects those values, which
    can make resume skip later compacted checkpoints and reconstruct an
    oversized raw suffix before the next compaction attempt.
    
    Confirmed Sentry reports fixed by this compatibility path:
    
    - [CODEX-1H3F](https://openai.sentry.io/issues/7500642496/)
    - [CODEX-1H6N](https://openai.sentry.io/issues/7501025347/)
    - [CODEX-1JDP](https://openai.sentry.io/issues/7504549065/)
    - [CODEX-1HW6](https://openai.sentry.io/issues/7503407986/)
    
    ## Background
    
    [openai/codex#20693](https://github.com/openai/codex/pull/20693) added
    image-detail plumbing for app-server `UserInput` so input images could
    explicitly request `detail: original`. The Slack discussion behind that
    PR was about ScreenSpot / bridge evals where user input images were
    resized, while tool output images already had MCP/code-mode ways to
    request image detail.
    
    In review, the intended new API surface was narrowed to `high` and
    `original`: default to `high`, allow `original` when callers need
    unchanged image handling, and avoid encouraging new `auto` or `low`
    usage. That policy still makes sense for newly emitted values.
    
    The missing compatibility piece is persisted history. Older rollouts can
    already contain `auto` and `low`, and resume reconstructs typed history
    by deserializing those rollout records. Rejecting old values at that
    boundary causes valid compacted checkpoints to be skipped. This PR
    restores `auto` and `low` as real variants so old records deserialize
    and round-trip without being rewritten as `high`, while product paths
    can continue to default to `high` and avoid emitting `auto` for new
    behavior.
    
    ## What changed
    
    - Restored `ImageDetail::Auto` and `ImageDetail::Low` as first-class
    protocol values.
    - Preserved `auto`/`low` through rollout deserialization, MCP image
    metadata, code-mode image output, and schema/type generation.
    - Kept local image byte handling conservative: only `original` switches
    to original-resolution loading; `auto`/`low`/`high` continue through the
    resize-to-fit path while retaining their detail value.
    - Added regression coverage for enum round-tripping and code-mode `low`
    detail handling.
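    
    As a rough illustration of the restored variants, the sketch below
    round-trips all four wire values and keeps original-resolution loading
    tied to `original` only. The method names are hypothetical, and the
    real type presumably derives its (de)serialization rather than
    hand-writing it.
    
    ```rust
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum ImageDetail {
        Auto,
        Low,
        High,
        Original,
    }
    
    impl ImageDetail {
        /// Hypothetical parser: legacy `auto`/`low` are accepted again.
        fn from_wire(value: &str) -> Option<Self> {
            match value {
                "auto" => Some(Self::Auto),
                "low" => Some(Self::Low),
                "high" => Some(Self::High),
                "original" => Some(Self::Original),
                _ => None,
            }
        }
    
        /// Serializes back to the same value, so old records are not
        /// rewritten as `high`.
        fn as_wire(self) -> &'static str {
            match self {
                Self::Auto => "auto",
                Self::Low => "low",
                Self::High => "high",
                Self::Original => "original",
            }
        }
    
        /// Only `original` switches to original-resolution loading; the
        /// other values keep using the resize-to-fit path.
        fn loads_original_resolution(self) -> bool {
            matches!(self, Self::Original)
        }
    }
    ```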
    
    ## Testing
    
    - `just write-app-server-schema`
    - `just test -p codex-protocol`
    - `just test -p codex-tools`
    - `just test -p codex-code-mode`
    - `just test -p codex-app-server-protocol`
    - `just test -p codex-core
    suite::rmcp_client::stdio_image_responses_preserve_original_detail_metadata`
    - `just test -p codex-core
    suite::code_mode::code_mode_can_use_mcp_image_result_with_image_helper`
    - Loaded broken rollouts on local fixed builds, and started/completed
    new turns.
    
    I also attempted `just test -p codex-core`; the local broad run did not
    finish green: 2559 tests run, 2467 passed, 55 flaky, 91 failed, 1 timed
    out. The failures were broad timeout/deadline failures across unrelated
    areas; targeted changed-path core tests above passed.
  • standalone websearch extension (#23823)
    ## Summary
    
    Add the extension-backed standalone `web.run` tool so Codex can call the
    standalone search endpoint through the `codex-api` search client and
    return its encrypted output to Responses.
    
    - gate the new tool behind `standalone_web_search`
    - install the extension in the app-server thread registry and hide
    hosted `web_search` when standalone search is enabled for OpenAI
    providers so the two paths stay mutually exclusive
    - build search context from persisted history using a small tail
    heuristic: previous user message, assistant text between the last two
    user turns capped at about 1k tokens, and current user message
    
    ## Test Plan
    
    - `cargo test -p codex-web-search-extension`
    - `cargo test -p codex-api`
    - `cargo test -p codex-core
    hosted_tools_follow_provider_auth_model_and_config_gates`
  • Move MCP tool naming mode into manager (#21576)
    ## Why
    
    The `non_prefixed_mcp_tool_names` feature should be applied where MCP
    tools become model-visible, not by remapping names later in core.
    Keeping the decision in `McpConnectionManager` construction makes
    `ToolInfo` the single shaped view that spec building, deferred tool
    search, routing, and unavailable-tool placeholders can consume directly.
    
    This also preserves the existing external behavior while the feature is
    off, and keeps the feature-on behavior for code mode and hooks explicit
    at the manager boundary.
    
    ## What Changed
    
    - Add `McpToolNameMode` to `codex-mcp` and flow it through `McpConfig`
    into `McpConnectionManager::new`.
    - Normalize MCP `ToolInfo` names in the manager using either
    legacy-prefixed namespaces or non-prefixed namespaces; the legacy path
    adds `mcp__` without restoring the old trailing namespace suffix.
    - Remove the core-side MCP name remapping path so specs, tool search,
    session resolution, and unavailable-tool placeholder construction use
    the manager-provided `ToolName` values directly.
    - Keep code mode flattening on the `__` namespace separator.
    - Preserve hook compatibility by giving non-prefixed MCP hook names
    legacy `mcp__...` matcher aliases.
    - Add/adjust integration and unit coverage for non-prefixed code-mode
    behavior, hook matching with the feature on and off, and manager-level
    legacy prefixing.
    
    ## Testing
    
    - `cargo test -p codex-mcp --lib`
    - `cargo test -p codex-core --lib tools::spec::tests -- --nocapture`
    - `cargo test -p codex-core --lib mcp_tools -- --nocapture`
    - `cargo test -p codex-core --lib mcp_tool_exposure -- --nocapture`
    - `cargo test -p codex-core --test all mcp_tool -- --nocapture`
    - `cargo test -p codex-core --test all search_tool -- --nocapture`
    - `cargo test -p codex-core --test all hooks_mcp -- --nocapture`
    - `cargo test -p codex-core --test all
    code_mode_uses_non_prefixed_mcp_tool_names_when_feature_enabled --
    --nocapture`
    - `cargo test -p codex-tools`
    - `cargo test -p codex-features`
  • chore: add JSON schema policy fixture coverage (#24152)
    ## Why
    
    Before changing the Codex Bridge JSON schema policy, add integration
    coverage around real connector-like MCP tool schemas. The existing unit
    tests cover individual sanitizer behaviors, but they do not make it easy
    to see whether full fixture schemas keep model-visible guidance, prune
    only unreachable definitions, drop unsupported JSON Schema fields, and
    stay within the Responses API schema budget.
    
    ## What Changed
    
    - Added `tools/tests/json_schema_policy_fixtures.rs`, which converts MCP
    tool fixtures through `mcp_tool_to_responses_api_tool` and validates the
    resulting Responses tool parameters.
    - Added connector-style fixtures for Slack, Google Calendar, Google
    Drive, Notion, and Microsoft Outlook Email under
    `tools/tests/fixtures/json_schema_policy/`.
    - Added fixture assertions for preserved guidance, pruned definitions,
    expected field drops after `JsonSchema` conversion, marker count
    baselines, and dangling local `$ref` prevention.
    - Added a real oversized golden Notion `create_page` input schema
    fixture to exercise the compaction path that strips descriptions, drops
    root `$defs`, rewrites local refs, and fits the compacted schema under
    the budget.
  • feat: best-effort compact large tool schemas (#23904)
    ## Why
    
    The `dev/cc/ref-def` branch preserves richer JSON Schema detail for
    connector tools, including `$defs` and nested shapes. That improves
    fidelity, but it pushes the largest connector schemas well past the
    intended tool-schema budget. This PR adds a best-effort compaction pass
    for unusually large tool input schemas so the p99 and max tails stay
    small while ordinary schemas are left alone.
    
    ## What Changed
    
    - Added best-effort large-schema compaction in
    `codex-rs/tools/src/json_schema.rs` after schema sanitization and
    definition pruning.
    - Compaction runs as a waterfall only while the compact JSON budget
    proxy is exceeded:
      1. Strip schema `description` metadata.
      2. Drop root `$defs` / `definitions`.
      3. Collapse deep nested complex schema objects to `{}`.
    - Kept top-level argument names and immediate schema shape where
    possible.
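    
    The waterfall above can be sketched on a toy schema model. The
    `Schema` struct, the byte-count size proxy, the depth threshold of 2,
    and the `compact` function are all assumptions made for this example;
    the actual code operates on the tool `JsonSchema` type and its own
    budget proxy.
    
    ```rust
    /// Toy schema model, used only for this illustration.
    #[derive(Debug, Clone, Default, PartialEq)]
    struct Schema {
        description: Option<String>,
        defs: Vec<(String, Schema)>,
        properties: Vec<(String, Schema)>,
    }
    
    impl Schema {
        /// Crude size proxy: bytes of names and descriptions plus braces.
        fn size(&self) -> usize {
            let own = self.description.as_ref().map_or(0, |d| d.len());
            let defs: usize = self.defs.iter().map(|(k, s)| k.len() + s.size()).sum();
            let props: usize = self.properties.iter().map(|(k, s)| k.len() + s.size()).sum();
            2 + own + defs + props
        }
    
        /// Stage 1: strip `description` metadata everywhere.
        fn strip_descriptions(&mut self) {
            self.description = None;
            for (_, child) in self.defs.iter_mut().chain(self.properties.iter_mut()) {
                child.strip_descriptions();
            }
        }
    
        /// Stage 3: replace schemas `depth` levels down with an empty schema.
        fn collapse_below(&mut self, depth: usize) {
            if depth == 0 {
                *self = Schema::default();
                return;
            }
            for (_, child) in self.properties.iter_mut() {
                child.collapse_below(depth - 1);
            }
        }
    }
    
    /// Each stage runs only while the schema is still over budget.
    fn compact(mut schema: Schema, budget: usize) -> Schema {
        if schema.size() > budget {
            schema.strip_descriptions();
        }
        if schema.size() > budget {
            schema.defs.clear(); // Stage 2: drop root definitions.
        }
        if schema.size() > budget {
            schema.collapse_below(2); // Keeps top-level argument names.
        }
        schema
    }
    ```
    
    Ordering the stages from least to most destructive means ordinary
    schemas are returned untouched and only the largest ones lose
    structure.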
    
    ## Corpus Results
    
    Scope: 2,025 schemas under `golden_schemas`, all parsed successfully.
    Token count is `o200k_base` over compact JSON from
    `parse_tool_input_schema`.
    
    | Percentile | Before `origin/main` `4dbca61e20` | After branch `dev/cc/ref-def` `f9bf071758` | After this PR |
    |---|---:|---:|---:|
    | p0 | 9 | 9 | 9 |
    | p10 | 59 | 63 | 63 |
    | p25 | 81 | 86 | 86 |
    | p50 | 114 | 127 | 125 |
    | p75 | 174 | 205 | 202 |
    | p90 | 295 | 335 | 322 |
    | p95 | 391 | 526 | 422 |
    | p99 | 794 | 1,303 | 689 |
    | max | 2,836 | 3,337 | 887 |
    
    After this PR, `0 / 2,025` schemas are over 1k tokens.
    
    ### Compaction Savings
    
    These are cumulative waterfall stages over the same corpus. Later passes
    only run for schemas that are still over the compact JSON budget proxy.
    
    | Stage | Total tokens | Step savings | Schemas changed by step |
    |---|---:|---:|---:|
    | No compaction | 391,862 | - | - |
    | Strip schema `description` metadata | 350,961 | 40,901 | 66 |
    | Drop root `$defs` / `definitions` | 340,683 | 10,278 | 13 |
    | Collapse deep complex schemas to `{}` | 335,875 | 4,808 | 6 |
  • Expose conversation history to extension tools (#23963)
    ## Why
    
    Extension tools that need conversation context should be able to read it
    from the live tool invocation instead of reaching into thread
    persistence themselves.
    
    ## What changed
    
    - Add a `ConversationHistory` snapshot to extension `ToolCall`s and
    populate it from the current raw in-memory response history.
    - Expose all history items at this boundary so each extension can filter
    and bound the subset it needs before consuming or forwarding it.
    - Cover the adapter and registry dispatch paths and update existing
    extension tests that construct `ToolCall` literals.
    
    ## Test plan
    
    - `cargo test -p codex-tools`
    - `cargo test -p codex-extension-api`
    - `cargo test -p codex-goal-extension`
    - `cargo test -p codex-memories-extension`
    - `cargo test -p codex-core passes_turn_fields_to_extension_call`
    - `cargo test -p codex-core
    extension_tool_executors_are_model_visible_and_dispatchable`
  • feat: support local refs and defs in tool input schemas (#23357)
    # Why
    
    Some connector tool input schemas use local JSON Schema references and
    definition tables to avoid duplicating large nested shapes. Codex
    previously lowered these schemas into the supported subset in a way that
    could discard `$ref`-only schema objects and lose the corresponding
    definitions, which made non-strict tool registration less faithful than
    the original connector schema.
    
    This keeps the existing minimal-lowering policy: Codex still does not
    raw-pass through arbitrary JSON Schema, but it now preserves local
    reference structure that fits the Responses-compatible subset and prunes
    definition entries that cannot be reached by following `$ref`s from the
    root schema after sanitization, including refs found transitively inside
    other reachable definitions. The pruning matters because Responses
    parses definition tables even when entries are unused, so keeping dead
    definitions wastes prompt tokens.
    
    # What changed
    
    - Added `$ref`, `$defs`, and legacy `definitions` fields to the tool
    `JsonSchema` representation.
    - Updated `parse_tool_input_schema` lowering so `$ref`-only schema
    objects survive sanitization instead of becoming `{}`.
    - Sanitized definition tables recursively and dropped malformed
    definition tables so non-strict registration degrades gracefully.
    - Added reachability pruning for root definition tables by starting from
    refs outside definition tables, then following refs inside reachable
    definitions.
    - Added JSON Pointer decoding for local definition refs such as
    `#/$defs/Foo~1Bar`.
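    
    The pointer decoding and reachability steps above can be sketched as
    follows. Both function names and the simplified input (a map from
    definition name to the `$ref` strings found inside it) are assumptions
    for illustration; the real pass walks the sanitized schema tree.
    
    ```rust
    use std::collections::{BTreeMap, BTreeSet};
    
    /// Hypothetical helper: decode a local definition ref into its name.
    /// JSON Pointer escapes are applied in RFC 6901 order: `~1` then `~0`.
    fn decode_local_def_ref(reference: &str) -> Option<String> {
        let token = reference
            .strip_prefix("#/$defs/")
            .or_else(|| reference.strip_prefix("#/definitions/"))?;
        if token.contains('/') {
            return None; // Nested pointers are out of scope for this sketch.
        }
        Some(token.replace("~1", "/").replace("~0", "~"))
    }
    
    /// Hypothetical helper: names reachable from refs outside the
    /// definition table, following refs inside reachable definitions.
    fn reachable_defs(
        root_refs: &[&str],
        def_refs: &BTreeMap<String, Vec<String>>,
    ) -> BTreeSet<String> {
        let mut reachable = BTreeSet::new();
        let mut pending: Vec<String> = root_refs
            .iter()
            .filter_map(|r| decode_local_def_ref(r))
            .collect();
        while let Some(name) = pending.pop() {
            // Skip unknown names and names that were already visited.
            if !def_refs.contains_key(&name) || !reachable.insert(name.clone()) {
                continue;
            }
            for inner in &def_refs[&name] {
                if let Some(next) = decode_local_def_ref(inner) {
                    pending.push(next);
                }
            }
        }
        reachable
    }
    ```
    
    Any definition not in the returned set can be dropped, which is the
    pruning that saves prompt tokens.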
    
    # Verification
    Ran local golden-schema probes against representative connector schemas
    to validate behavior on real generated schemas:
    
    | Golden schema | Before bytes | After bytes | `$defs` before -> after | `$ref` before -> after | Result |
    |---|---:|---:|---:|---:|---|
    | `google_calendar/create_space` | 7111 | 4526 | 7 -> 7 | 7 -> 7 | all definitions preserved because all are reachable |
    | `figma/apply_file_variable_changes` | 4609 | 999 | 8 -> 5 | 8 -> 5 | unused defs pruned after unsupported `oneOf` shapes lower away |
    | `snowflake/list_catalog_integrations` | 1380 | 404 | 3 -> 0 | 0 -> 0 | all defs pruned because none are referenced |
    | `dropbox/create_shared_link` | 8894 | 1836 | 14 -> 4 | 9 -> 4 | only defs reachable from the root schema after sanitization are retained, including transitively through other retained defs |
    
    Token increase across golden schema due to this change:
    <img width="817" height="366" alt="Screenshot 2026-05-19 at 1 47 04 PM"
    src="https://github.com/user-attachments/assets/d5c80fe9-da85-41e6-8ac7-a01d1e0b0f71"
    />
  • Make tool executor specs mandatory (#23870)
    ## Why
    
    `ToolExecutor` is the runtime contract that keeps a callable tool and
    its model-visible spec together. Leaving `spec()` optional lets a
    registered runtime silently omit that half of the contract, and it also
    overloads a missing spec as an exposure decision for tools that should
    stay dispatchable without being shown to the model.
    
    ## What
    
    - Make `ToolExecutor::spec()` required and update core, extension, and
    test tool executors to return a concrete `ToolSpec`.
    - Add `ToolExposure::Hidden` for dispatch-only tools. The legacy
    `shell_command` runtime in unified-exec sessions now uses that explicit
    exposure instead of hiding itself by omitting a spec.
    - Build MCP tool specs when `McpHandler` is constructed so invalid MCP
    specs are skipped before the handler is registered.
    - Keep tool planning aligned with the new contract for direct, deferred,
    hidden, code-mode, dynamic, and namespaced tool paths.
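    
    A simplified sketch of the contract described above: `spec()` has no
    default, and hiding a tool is an explicit exposure choice. The trait
    shape, the `Direct` variant, the `exposure` method, and the example
    tools are hypothetical simplifications, not the real `codex-tools`
    API.
    
    ```rust
    #[derive(Debug, Clone, PartialEq, Eq)]
    struct ToolSpec {
        name: String,
    }
    
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum ToolExposure {
        Direct,
        Hidden,
    }
    
    trait ToolExecutor {
        /// Required: every runtime must describe itself.
        fn spec(&self) -> ToolSpec;
    
        /// Exposure is decided separately from whether a spec exists.
        fn exposure(&self) -> ToolExposure {
            ToolExposure::Direct
        }
    }
    
    /// Dispatch-only tool: it has a spec but is not shown to the model.
    struct LegacyShellCommand;
    
    impl ToolExecutor for LegacyShellCommand {
        fn spec(&self) -> ToolSpec {
            ToolSpec { name: "shell_command".to_string() }
        }
        fn exposure(&self) -> ToolExposure {
            ToolExposure::Hidden
        }
    }
    
    /// Ordinary tool that relies on the default exposure.
    struct Echo;
    
    impl ToolExecutor for Echo {
        fn spec(&self) -> ToolSpec {
            ToolSpec { name: "echo".to_string() }
        }
    }
    
    /// Specs that would be sent to the model: everything not hidden.
    fn model_visible_specs(tools: &[&dyn ToolExecutor]) -> Vec<ToolSpec> {
        tools
            .iter()
            .filter(|tool| tool.exposure() != ToolExposure::Hidden)
            .map(|tool| tool.spec())
            .collect()
    }
    ```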
    
    ## Testing
    
    - Added tool-plan coverage that invalid MCP tool specs are not
    registered.
    - Updated shell-family coverage for the hidden legacy `shell_command`
    runtime and the affected tool executor test fixtures.
  • Honor client-resolved service tier defaults (#23537)
    ## Why
    
    Model catalog responses can now advertise a nullable
    `default_service_tier` for each model. Codex needs to preserve three
    distinct states all the way from config/app-server inputs to inference:
    
    - no explicit service tier, so the client may apply the current model
    catalog default when FastMode is enabled
    - explicit `default`, meaning the user intentionally wants standard
    routing
    - explicit catalog tier ids such as `priority`, `flex`, or future tiers
    
    Keeping those states distinct prevents the UI from showing one tier
    while core sends another, especially after model switches or app-server
    `thread/start` / `turn/start` updates.
    
    ## What Changed
    
    - Plumbed `default_service_tier` through model catalog protocol types,
    app-server model responses, generated schemas, model cache fixtures, and
    provider/model-manager conversions.
    - Added the request-only `default` service tier sentinel and normalized
    legacy config spelling so `fast` in `config.toml` still materializes as
    the runtime/request id `priority`.
    - Moved catalog default resolution to the TUI/client side, including
    recomputing the effective service tier when model/FastMode-dependent
    surfaces change.
    - Updated app-server thread lifecycle config construction so
    `serviceTier: null` preserves explicit standard-routing intent by
    mapping to `default` instead of internal `None`.
    - Kept core responsible for validating explicit tiers against the
    current model and stripping `default` before `/v1/responses`, without
    applying catalog defaults itself.
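    
    The three states and the client/core split can be sketched with plain
    `Option` values. All three function names are hypothetical, and the
    string-based representation is a simplification of whatever typed
    service tier the real code uses.
    
    ```rust
    /// Legacy config spelling `fast` maps to the request id `priority`.
    fn normalize_config_tier(raw: &str) -> &str {
        if raw == "fast" { "priority" } else { raw }
    }
    
    /// Client side: an explicit tier (including `default`) is kept as is;
    /// the catalog default applies only when nothing is set and FastMode
    /// is enabled.
    fn client_effective_tier(
        explicit: Option<&str>,
        catalog_default: Option<&str>,
        fast_mode: bool,
    ) -> Option<String> {
        match explicit {
            Some(tier) => Some(normalize_config_tier(tier).to_string()),
            None if fast_mode => catalog_default.map(str::to_string),
            None => None,
        }
    }
    
    /// Core side: the `default` sentinel is request-only intent, so it is
    /// stripped before the tier is sent to `/v1/responses`.
    fn tier_for_request(tier: Option<&str>) -> Option<&str> {
        tier.filter(|t| *t != "default")
    }
    ```
    
    Keeping `default` distinct from `None` until the last step is what
    lets a user opt out of the catalog default without the UI and core
    disagreeing.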
    
    ## Validation
    
    - `CARGO_INCREMENTAL=0 cargo build -p codex-cli`
    - `CARGO_INCREMENTAL=0 cargo test -p codex-app-server model_list`
    - `cargo test -p codex-tui service_tier`
    - `cargo test -p codex-protocol service_tier_for_request`
    - `cargo test -p codex-core get_service_tier`
    - `RUST_MIN_STACK=8388608 CARGO_INCREMENTAL=0 cargo test -p codex-core
    service_tier`
  • feat: add turn_id and truncation_policy to extension tool calls (#23666)
    ## Why
    
    Extension-owned tools currently receive a stripped `ToolCall` with only
    `call_id`, `tool_name`, and `payload`.
    That makes extension work that needs turn-local execution context
    awkward, especially web-search extension work that needs the active
    `truncation_policy` at tool invocation time.
    
    Reconstructing that value from config or `ExtensionData` would be
    indirect and could drift from the actual turn context, so the cleaner
    fix is to pass the needed turn metadata directly on the extension-facing
    invocation type.
    
    ## What changed
    
    - added `turn_id` and `truncation_policy` to `codex_tools::ToolCall`
    - populated those fields when core adapts `ToolInvocation` into an
    extension tool call
    - added a focused adapter test that verifies extension executors receive
    the forwarded turn metadata
    - updated the memories extension tests to construct the richer
    `ToolCall`
    - added the `codex-utils-output-truncation` dependency to `codex-tools`
    and refreshed lockfiles
    
    ## Testing
    
    - `cargo test -p codex-tools`
    - `cargo test -p codex-memories-extension`
    - `cargo test -p codex-core passes_turn_fields_to_extension_call`
    - `just bazel-lock-update`
    - `just bazel-lock-check`
  • add encryptedcontent to functioncalloutput (#23500)
    Add a new `EncryptedContent` variant to `FunctionCallOutputContentItem`
    ahead of standalone web search.
    
    We need to be able to receive encrypted function call output from the
    new web search endpoint and pass it back to the Responses API, as we
    cannot expose direct search results.
  • Split plugin install discovery into list and request tools (#23372)
    ## Summary
    - Add `list_available_plugins_to_install` as the inventory step for
    plugin and connector install suggestions.
    - Slim `request_plugin_install` so it only handles the actual
    elicitation, instead of carrying the full discoverable list in its
    prompt.
    - Emit send-time telemetry when an install elicitation is dispatched,
    including requested tool identity in the event payload.
    - Emit install-result telemetry through `SessionTelemetry`, including
    tool type, user response action, and completion status.
    - Update registration and tests to cover the new two-step flow while
    keeping the existing `tool_suggest` feature gate unchanged.
    
    ## Testing
    - `just fmt`
    - `cargo test -p codex-tools`
    - `cargo test -p codex-core request_plugin_install`
    - `cargo test -p codex-core list_available_plugins_to_install`
    - `cargo test -p codex-core
    install_suggestion_tools_can_be_registered_without_search_tool`
    - `cargo test -p codex-otel
    manager_records_plugin_install_suggestion_metric`
    - `cargo test -p codex-otel
    manager_records_plugin_install_elicitation_sent_metric`
    - `just fix -p codex-core`
    - `just fix -p codex-tools`
    - `just fix -p codex-otel`
    - `cargo check -p codex-core`
  • Remove ToolsConfig from tool planning (#22835)
    ## Why
    
    `codex-tools` is meant to hold reusable tool primitives, but
    `ToolsConfig` had become a second copy of core runtime decisions instead
    of a small shared contract. It carried provider capabilities, auth/model
    gates, permission and environment state, web/search/image feature gates,
    multi-agent settings, and goal availability from core into `codex-tools`
    ([definition](https://github.com/openai/codex/blob/22dd9ad3929253ed24d7ee4f10f238e95ab25f37/codex-rs/tools/src/tool_config.rs#L97),
    [stored on each
    `TurnContext`](https://github.com/openai/codex/blob/22dd9ad3929253ed24d7ee4f10f238e95ab25f37/codex-rs/core/src/session/turn_context.rs#L87)).
    Every session/context variant then had to build and mutate that snapshot
    before assembling tools.
    
    This PR removes that master object instead of renaming it. Tool planning
    now reads the live `TurnContext`, where `codex-core` already owns those
    decisions, while `codex-tools` keeps only reusable primitives and a
    generic `ToolSetBuilder`/`ToolSet` accumulator.
    
    ## What Changed
    
    - Removed `ToolsConfig` / `ToolsConfigParams` from `codex-tools`; the
    crate keeps the shared helpers that still belong there, including
    request-user-input mode selection, shell backend/type resolution,
    `UnifiedExecShellMode`, and `ToolEnvironmentMode`.
    - Replaced config-snapshot planning with `ToolRouter::from_turn_context`
    and a `spec_plan` pipeline over `CoreToolPlanContext`, deriving provider
    capabilities, auth gates, model support, feature gates, environment
    count, goal support, multi-agent options, web search, and image
    generation from the authoritative turn state.
    - Added generic `codex_tools::ToolSetBuilder` / `ToolSet`, plus the
    small core adapter needed to accumulate `CoreToolRuntime` values and
    hosted model specs.
    - Added the `tool_family::shell` registration module and moved
    shell/unified-exec/memory accounting call sites to read the narrow
    per-turn fields directly.
    - Narrowed `TurnContext` to the remaining explicit per-turn fields
    needed by planning: `available_models`, `unified_exec_shell_mode`, and
    `goal_tools_supported`.
    - Reworked MCP exposure and tool-search setup so deferred/direct MCP
    behavior is driven by the current turn rather than a precomputed config
    snapshot.
    - Replaced the large expected-spec fixture tests with focused
    behavior-level coverage for shell tools, environments, goal and
    agent-job gates, MCP direct/deferred exposure, tool search,
    request-plugin-install, code mode, multi-agent mode, hosted tools, and
    extension executor dispatch.
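    
    As a rough illustration of the accumulator pattern, here is a
    minimal sketch of planning tools directly from per-turn state. The
    names `ToolSetBuilder`, `ToolSet`, `TurnContext`, and
    `goal_tools_supported` appear in this PR; the method signatures, the
    `model_supports_search` field, the `plan_tools` function, and the
    string tool names are hypothetical simplifications.
    
    ```rust
    // Hypothetical generic accumulator; the real `codex_tools` API may differ.
    #[derive(Debug, PartialEq)]
    struct ToolSet<T> {
        tools: Vec<T>,
    }
    
    struct ToolSetBuilder<T> {
        tools: Vec<T>,
    }
    
    impl<T> ToolSetBuilder<T> {
        fn new() -> Self {
            Self { tools: Vec::new() }
        }
    
        /// Adds one tool and returns the builder for chaining.
        fn push(&mut self, tool: T) -> &mut Self {
            self.tools.push(tool);
            self
        }
    
        fn build(self) -> ToolSet<T> {
            ToolSet { tools: self.tools }
        }
    }
    
    // Assumed stand-in for the live turn state that core owns.
    struct TurnContext {
        goal_tools_supported: bool,
        model_supports_search: bool, // hypothetical field
    }
    
    /// Reads gates from the turn itself instead of a precomputed snapshot.
    fn plan_tools(turn: &TurnContext) -> ToolSet<&'static str> {
        let mut builder = ToolSetBuilder::new();
        builder.push("shell");
        if turn.goal_tools_supported {
            builder.push("goal");
        }
        if turn.model_supports_search {
            builder.push("tool_search");
        }
        builder.build()
    }
    ```
    
    Because the builder is generic over `T`, the shared crate does not
    need to know about core runtime types; core supplies them when it
    accumulates the set.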
    
    ## Verification
    
    - `cargo check -p codex-tools`
    - `cargo check -p codex-core --lib`
    - `cargo test -p codex-tools`
    - `cargo test -p codex-core spec_plan --lib`
    - `cargo test -p codex-core router --lib`
  • Remove ToolSearch feature toggle (#23389)
    ## Summary
    - mark `ToolSearch` as removed and ignore stale config writes for its
    legacy key
    - make search tool exposure depend only on model capability, not a
    feature toggle
    - remove app-server enablement support and prune now-obsolete test
    coverage/setup
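    
    The resulting behavior might look something like the sketch below:
    writes to the removed key are accepted but dropped, and exposure is
    decided by the model alone. All names here (`FeatureConfig`,
    `ModelInfo`, `supports_search_tool`, `search_tool_exposed`, and the
    `"tool_search"` key string) are assumptions for illustration.
    
    ```rust
    use std::collections::BTreeMap;
    
    // Hypothetical list of removed feature keys; the real key name is not
    // given in this PR description.
    const REMOVED_KEYS: &[&str] = &["tool_search"];
    
    #[derive(Default)]
    struct FeatureConfig {
        enabled: BTreeMap<String, bool>,
    }
    
    impl FeatureConfig {
        /// Stores a feature flag. Writes to removed keys are ignored;
        /// the return value says whether anything was stored.
        fn set(&mut self, key: &str, value: bool) -> bool {
            if REMOVED_KEYS.iter().any(|removed| *removed == key) {
                return false;
            }
            self.enabled.insert(key.to_string(), value);
            true
        }
    }
    
    // Assumed stand-in for model capability metadata.
    struct ModelInfo {
        supports_search_tool: bool,
    }
    
    /// Exposure depends only on model capability; the config is not consulted.
    fn search_tool_exposed(model: &ModelInfo, _config: &FeatureConfig) -> bool {
        model.supports_search_tool
    }
    ```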
    
    ## Verification
    - `cargo test -p codex-features`
    - `cargo test -p codex-tools`
    - `cargo test -p codex-core search_tool_requires_model_capability`
    - `cargo test -p codex-app-server experimental_feature_enablement_set_`
    
    ## Notes
    - This keeps the legacy config key as a no-op for compatibility; the
    behavior itself can no longer be toggled off.
    - No developer-facing docs update outside the touched app-server README
    was needed.