37 Commits

  • [codex] Use tool search for MCP tools by default (#29486)
    ## Why
    
    MCP tools were only placed behind `tool_search` when a feature flag was
    enabled or when there were at least 100 tools. That made the model's
    tool flow depend on both rollout configuration and the number of
    installed tools.
    
    The searched-tool flow is now the intended behavior. Making it
    unconditional when the model and provider support it gives every
    supported setup the same behavior and lets us retire the feature flag
    safely.
    
    ## What changed
    
    - Defer all effective MCP tools when `tool_search` and namespaced tools
    are supported.
    - Keep exposing MCP tools directly when search cannot be used, so older
    or unsupported model/provider combinations still work.
    - Mark `tool_search_always_defer_mcp_tools` as removed and ignore old
    configured values.
    - Keep plugin filtering, app-only filtering, file handling, and MCP
    calls working through the searched-tool flow.
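
    The exposure rule above can be sketched as a small decision function.
    This is an illustration only: `Capabilities`, `McpExposure`, and
    `plan_mcp_exposure` are hypothetical names, not identifiers from the
    codebase.

    ```rust
    /// Hypothetical capability flags; the real code derives these from the
    /// model and provider configuration.
    #[derive(Clone, Copy)]
    struct Capabilities {
        supports_tool_search: bool,
        supports_namespaced_tools: bool,
    }

    /// How MCP tools are presented to the model in its first request.
    #[derive(Debug, PartialEq)]
    enum McpExposure {
        /// Tools sit behind `tool_search` and are surfaced on demand.
        Deferred,
        /// Tools are listed directly, as on unsupported combinations.
        Direct,
    }

    /// Defer every MCP tool when search can be used; otherwise expose the
    /// tools directly. Neither a feature flag nor the tool count is consulted.
    fn plan_mcp_exposure(caps: Capabilities) -> McpExposure {
        if caps.supports_tool_search && caps.supports_namespaced_tools {
            McpExposure::Deferred
        } else {
            McpExposure::Direct
        }
    }
    ```

    Under these assumptions, a setup that lacks either capability keeps the
    direct listing, which matches the fallback described in the second bullet.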
    
    ## Why many tests changed
    
    Many tests used to act as if the model could see MCP tools in its first
    request and call them immediately. That is no longer the real flow: the
    model first receives `tool_search`, searches for a tool, receives the
    matching MCP tool, and then calls it in the next request.
    
    The tests therefore needed an extra search step, and checks for tool
    names, descriptions, and input fields had to move from the first request
    to the search result. These are not separate product changes; they make
    the tests follow what the model will actually see after this change. The
    plugin tests still check which tools are allowed and where they came
    from, the file tests still check upload fields and behavior, and the MCP
    round-trip test still checks a successful call from start to finish.
    
    ## Tests
    
    - `just test -p codex-features`
    - Focused `codex-core` tests for MCP exposure and tool planning
    - `just test -p codex-core explicit_plugin_mentions`
    - `just test -p codex-core stdio_server_round_trip`
    - Focused `codex-core` tests for tool search, app-only tools, and MCP
    file uploads
  • [codex] Use expect in integration tests (#28441)
    The workspace denies `clippy::expect_used` in production. Although
    `clippy.toml` allows `expect` in tests, Bazel Clippy compiles
    integration-test helper code in a way that does not receive that
    exemption, which encouraged verbose `unwrap_or_else(... panic!(...))`
    and equivalent `match`/`let else` forms.
    
    This allows `clippy::expect_used` once at each integration-test crate
    root (including aggregated suites and test-support libraries), then
    replaces manual panic-based Result and Option unwraps with
    `expect`/`expect_err`. Standalone `tests/*.rs` files remain their own
    crate roots. Intentional assertion and unexpected-variant panics remain
    unchanged, and the production `expect_used = "deny"` lint remains in
    place.
    
    The cleanup is mechanical and net-negative in line count.
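
    A minimal before/after sketch of the kind of rewrite described above.
    The helper names (`parse_port_verbose`, `parse_port`, `parse_error`) are
    hypothetical and not taken from the test suites.

    ```rust
    // In an integration-test crate the allowance would sit at the crate root
    // as `#![allow(clippy::expect_used)]`. It is shown here as a comment so
    // the snippet can be pasted into any module.

    /// Verbose form that avoids `expect` by panicking manually.
    fn parse_port_verbose(raw: &str) -> u16 {
        raw.parse::<u16>()
            .unwrap_or_else(|err| panic!("port should parse: {err}"))
    }

    /// Equivalent form using `expect`; the error is still printed on panic.
    fn parse_port(raw: &str) -> u16 {
        raw.parse::<u16>().expect("port should parse")
    }

    /// `expect_err` replaces a manual `match` that panics on `Ok`.
    fn parse_error(raw: &str) -> std::num::ParseIntError {
        raw.parse::<u16>().expect_err("input should be rejected")
    }
    ```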
  • [codex] Dedupe plugin MCPs by app declaration name (#27607)
    ## Context
    
    This is the next step in the plugin auth-routing stack. The earlier PRs
    make `PluginsManager` auth-aware and move the broad App/MCP surface
    decision into that layer. This PR narrows the ChatGPT/SIWC behavior so
    we only hide a plugin MCP server when it conflicts with an App
    declaration of the same name.
    
    In product terms: if a plugin exposes both an App route and MCP route
    for `foo`, ChatGPT/SIWC sessions should use the App route for `foo`. If
    the same plugin also exposes a separate MCP server like `foo2`, that MCP
    server should remain available.
    
    ```json
    // .app.json
    {
      "apps": {
        "foo": {
          "id": "connector_abc"
        }
      }
    }
    ```
    
    ```json
    // .mcp.json
    {
      "mcpServers": {
        "foo": {
          "url": "https://mcp.foo.com/mcp"
        },
        "foo2": {
          "url": "https://mcp.foo2.com/mcp"
        }
      }
    }
    ```
    
    ## Stack
    
    - PR1: #27652 seed plugin manager auth at construction.
    - PR2: #27459 route plugin surfaces by auth mode.
    - PR3: #27607 dedupe plugin MCP servers by App declaration name.
    - PR4: #27602 preserve plugin Apps in connector listings.
    - PR5: #27461 skip install-time plugin MCP OAuth for matching App
    routes.
    
    ## Summary
    
    - Preserve App declaration names in loaded plugin metadata.
    - Keep public effective App outputs as deduped connector IDs for
    existing callers.
    - For ChatGPT/SIWC, suppress only plugin MCP servers whose names match
    declared App names.
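
    The name-based rule can be sketched as a filter over MCP server names.
    The function `effective_mcp_servers` and its `is_chatgpt_auth` parameter
    are hypothetical; the real projection lives in `PluginsManager` and may
    be shaped differently.

    ```rust
    use std::collections::BTreeSet;

    /// Returns the MCP server names that stay visible for a plugin.
    /// Only servers whose name equals a declared App name are hidden, and
    /// only on the ChatGPT/SIWC route.
    fn effective_mcp_servers(
        is_chatgpt_auth: bool,
        declared_app_names: &[&str],
        mcp_server_names: &[&str],
    ) -> Vec<String> {
        let declared: BTreeSet<&str> = declared_app_names.iter().copied().collect();
        mcp_server_names
            .iter()
            .copied()
            .filter(|name| !is_chatgpt_auth || !declared.contains(name))
            .map(str::to_owned)
            .collect()
    }
    ```

    With the `.app.json` and `.mcp.json` shown earlier, a ChatGPT/SIWC
    session would keep only `foo2` under this sketch, while an API-key
    session would keep both `foo` and `foo2`.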
    
    ## Validation
    
    ```bash
    cargo fmt --all
    cargo test -p codex-core-plugins plugin_auth_projection
    cargo test -p codex-core-plugins effective_apps
    cargo test -p codex-core-plugins read_plugin_for_config_installed_git_source_reads_from_cache_without_cloning
    cargo test -p codex-core explicit_plugin_mentions_use_apps_for_chatgpt_dual_surface_plugins
    cargo test -p codex-core explicit_plugin_mentions_keep_non_conflicting_mcp_for_chatgpt_auth
    cargo test -p codex-app-server --test all plugin_install_filters_disallowed_apps_needing_auth
    git diff --check
    ```
    
    ---------
    
    Co-authored-by: Xin Lin <xl@openai.com>
  • [codex] Gate plugin MCP servers by auth route (#27459)
    ## Context
    
    Some plugins expose both Apps and MCP servers. This PR moves auth-aware
    surface projection into `core-plugins::PluginsManager`, so callers get a
    consistent effective plugin view. Later PRs narrow the conflict rule and
    update listing/install paths.
    
    The high-level goal of this PR is to set up the plumbing to
    conditionally filter App/MCP surfaces in the plugin manager layer. We start by
    removing MCP servers when using SIWC/Codex-backend auth, and removing
    Apps when using API-key-style auth.
    
    This PR is now stacked on #27652, which contains only the constructor
    plumbing for seeding `PluginsManager` with the current auth mode.
    
    ## Stack
    
    - PR1: #27652 seed plugin manager auth at construction.
    - PR2: #27459 route plugin surfaces by auth mode.
    - PR3: #27607 dedupe plugin MCP servers by App declaration name.
    - PR4: #27602 preserve plugin Apps in connector listings.
    - PR5: #27461 skip install-time plugin MCP OAuth for matching App
    routes.
    
    ## Summary
    
    - API-key/non-ChatGPT routes hide plugin Apps and keep plugin MCPs.
    - ChatGPT/SIWC with Apps enabled keeps plugin Apps and suppresses MCPs
    for dual-surface plugins.
    - MCP-only plugins stay available for ChatGPT/SIWC sessions.
    - Cached plugin load outcomes are re-projected when auth mode changes.
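
    A rough sketch of the projection rules listed above. `AuthMode`,
    `PluginSurfaces`, and `project` are hypothetical names, and the branch
    for ChatGPT/SIWC with Apps disabled is an assumption, since the summary
    does not cover that case.

    ```rust
    #[derive(Clone, Copy)]
    enum AuthMode {
        ChatGpt,
        ApiKey,
    }

    /// The App and MCP surfaces a single plugin declares.
    #[derive(Clone, Debug, Default, PartialEq)]
    struct PluginSurfaces {
        apps: Vec<String>,
        mcp_servers: Vec<String>,
    }

    /// Projects the loaded surfaces onto what the current auth route may see.
    fn project(auth: AuthMode, apps_enabled: bool, loaded: &PluginSurfaces) -> PluginSurfaces {
        match auth {
            // API-key routes hide Apps and keep MCP servers.
            AuthMode::ApiKey => PluginSurfaces {
                apps: Vec::new(),
                mcp_servers: loaded.mcp_servers.clone(),
            },
            // Dual-surface plugin on ChatGPT/SIWC: keep Apps, suppress MCPs.
            AuthMode::ChatGpt if apps_enabled && !loaded.apps.is_empty() => PluginSurfaces {
                apps: loaded.apps.clone(),
                mcp_servers: Vec::new(),
            },
            // MCP-only plugins stay available. Passing everything through
            // when Apps are disabled is an assumption of this sketch.
            AuthMode::ChatGpt => loaded.clone(),
        }
    }
    ```

    Because `project` is a pure function of the loaded surfaces and the auth
    mode, a cached load outcome can be re-projected when the auth mode
    changes without reloading the plugin.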
    
    ## Validation
    
    ```bash
    cargo test -p codex-core-plugins plugin_auth_projection
    cargo test -p codex-core list_tool_suggest_discoverable_plugins
    git diff --check
    ```
  • fix(plugins) rm plugin descriptions (#23254)
    ## Summary
    Removes plugin descriptions from the developer message, since the
    descriptions of skills and MCPs already cover the capabilities offered
    by the plugin.
    
    ## Testing
    - [x] Updates unit tests
  • Pair thread environment settings (#26687)
    ## Why
    
    Thread cwd and environment selections are a single logical setting in
    core: updating one without the other can silently desynchronize the
    next-turn execution context. This change makes that relationship
    explicit in the internal thread settings flow while preserving the
    existing app-server public API shape.
    
    ## What changed
    
    - Moved the cwd/environment pair through internal
    `ThreadSettingsOverrides.environment_settings` instead of a top-level
    internal `cwd` field.
    - Kept `thread/settings/update` public params unchanged, with app-server
    translating top-level `cwd` into the paired internal settings shape.
    - Moved `Op::UserInput` environment overrides into thread settings so
    user turns and settings updates use the same core path.
    - Updated core, app-server, MCP, memories, sample, and test callsites to
    construct the paired settings shape.
    
    ## Verification
    
    - `git diff --check`
    - Local test run starting after PR creation.
  • log plugin MCP server names (#26002)
    ## Summary
    - emit the plugin capability summary's exact MCP server names in
    `codex_plugin_used`
    
    ## Test
    - `just test -p codex-analytics`
    - `just test -p codex-core
    explicit_plugin_mentions_track_plugin_used_analytics`
    - `just fix -p codex-analytics`
  • flake: Keep plugin test homes alive (#25857)
    ## Summary
    
    Keep the full `TestCodex` harness alive in plugin integration tests
    instead of returning only the `CodexThread`.
    
    ## Why
    
    The helper was moving a temporary `codex_home` into `TestCodex`, then
    immediately dropping the harness and returning only the thread. For
    plugin MCP tests, the MCP server cwd is inside that temporary home. If
    the temp directory is removed while MCP startup is still racing, the
    server launch can fail with `No such file or directory`.
    
    Keeping the harness in scope keeps the temp home alive for the test
    duration and removes the lifetime race behind the recent
    `explicit_plugin_mentions_inject_plugin_guidance` flake.
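
    The lifetime issue can be reproduced with a small RAII sketch. `TempHome`
    and `Harness` are hypothetical stand-ins for the temporary-directory
    guard and `TestCodex`; the real types differ.

    ```rust
    use std::fs;
    use std::path::PathBuf;

    /// Stand-in for a temp-dir guard: removes the directory when dropped.
    struct TempHome {
        path: PathBuf,
    }

    impl TempHome {
        fn create(name: &str) -> std::io::Result<Self> {
            let path = std::env::temp_dir().join(format!("{name}-{}", std::process::id()));
            fs::create_dir_all(&path)?;
            Ok(Self { path })
        }
    }

    impl Drop for TempHome {
        fn drop(&mut self) {
            // Best-effort cleanup, as a temp-dir guard would do.
            let _ = fs::remove_dir_all(&self.path);
        }
    }

    /// Stand-in for the harness that owns the temp home.
    struct Harness {
        #[allow(dead_code)]
        home: TempHome,
        thread_cwd: PathBuf,
    }

    /// New shape: the caller keeps the harness, so the directory stays alive.
    fn build_harness(name: &str) -> std::io::Result<Harness> {
        let home = TempHome::create(name)?;
        let thread_cwd = home.path.clone();
        Ok(Harness { home, thread_cwd })
    }

    /// Old shape: only the cwd escapes; the harness is dropped on return,
    /// which removes the directory the cwd points at.
    fn build_thread_only(name: &str) -> std::io::Result<PathBuf> {
        let harness = build_harness(name)?;
        Ok(harness.thread_cwd.clone())
    }
    ```

    In the old shape the returned path already refers to a deleted directory,
    which is the window in which a server launch could fail.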
    
    ## Validation
    
    - `just fmt`
    - `just test -p codex-core
    explicit_plugin_mentions_inject_plugin_guidance`
  • [codex] Wait for MCP readiness in core integration tests (#24964)
    Ensures MCP-backed `codex-core` integration tests exercise initialized
    servers instead of racing server startup.
    
    I've been idly investigating a few flakes and the failure modes are much
    more confusing when a tool call fails because of a failed server start
    than when the failed server start causes the test to fail directly.
  • Add experimental turn additional context (#24154)
    ## Summary
    
    Adds experimental `additionalContext` support to `turn/start` and
    `turn/steer` so clients can provide ephemeral external context, such as
    browser or automation state, without turning that plumbing into a
    visible user prompt or triggering user-prompt lifecycle behavior.
    
    ## API Shape
    
    The parameter shape is:
    
    ```ts
    additionalContext?: Record<string, {
      value: string
      kind: "untrusted" | "application"
    }> | null
    ```
    
    Example:
    
    ```json
    {
      "additionalContext": {
        "browser_info": {
          "value": "Active tab is CI failures.",
          "kind": "untrusted"
        },
        "automation_info": {
          "value": "CI rerun is in progress.",
          "kind": "application"
        }
      }
    }
    ```
    
    The keys are opaque and caller-defined.
    
    ## Context Injection
    
    When provided, accepted entries are inserted into model context as
    hidden contextual message items, not as visible thread user-message
    items.
    
    `kind: "untrusted"` entries are inserted with role `user`:
    
    ```text
    <external_${key}>${value}</external_${key}>
    ```
    
    `kind: "application"` entries are inserted with role `developer`:
    
    ```text
    <${key}>${value}</${key}>
    ```
    
    Values are not escaped. Each value is truncated to approximately 1k
    tokens before wrapping.
    
    For `turn/start`, accepted additional context is inserted before normal
    user input. For `turn/steer`, additional context is merged only when the
    steer includes non-empty user input; context-only steers are still
    rejected as empty input.
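
    The wrapping rules can be sketched as a small rendering function.
    `ContextKind` and `render_entry` are hypothetical names, and truncation
    is left out of the sketch.

    ```rust
    #[derive(Clone, Copy, Debug, PartialEq)]
    enum ContextKind {
        Untrusted,
        Application,
    }

    /// Returns the message role and the wrapped text for one entry.
    /// The value is inserted verbatim, mirroring the "not escaped" rule.
    fn render_entry(key: &str, value: &str, kind: ContextKind) -> (&'static str, String) {
        match kind {
            ContextKind::Untrusted => (
                "user",
                format!("<external_{key}>{value}</external_{key}>"),
            ),
            ContextKind::Application => ("developer", format!("<{key}>{value}</{key}>")),
        }
    }
    ```

    For the example payload above, `browser_info` would render with role
    `user` as
    `<external_browser_info>Active tab is CI failures.</external_browser_info>`.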
    
    ## Dedupe Strategy
    
    `AdditionalContextStore` lives on session state and stores the latest
    complete additional-context map.
    
    Each `turn/start` or non-empty `turn/steer` treats its
    `additionalContext` as the current complete set of values. Entries are
    injected only when the key is new or the exact entry for that key
    changed, including `value` or `kind`. After merging, the store is
    replaced with the provided map, so omitted keys are removed from the
    retained set and can be injected again later if reintroduced.
    
    Omitting `additionalContext`, passing `null`, or passing an empty object
    resets the store to empty and injects nothing.
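
    A minimal sketch of the dedupe behavior, assuming a flat map keyed by the
    caller-defined key. `AdditionalContextStore` is named in this PR, but the
    `merge` method, `Entry`, and `Kind` shown here are hypothetical.

    ```rust
    use std::collections::BTreeMap;

    #[derive(Clone, Copy, Debug, PartialEq)]
    enum Kind {
        Untrusted,
        Application,
    }

    #[derive(Clone, Debug, PartialEq)]
    struct Entry {
        value: String,
        kind: Kind,
    }

    #[derive(Default)]
    struct AdditionalContextStore {
        latest: BTreeMap<String, Entry>,
    }

    impl AdditionalContextStore {
        /// Returns the entries to inject, then replaces the retained set
        /// with the provided map. `None` behaves like an empty map.
        fn merge(&mut self, provided: Option<BTreeMap<String, Entry>>) -> Vec<(String, Entry)> {
            let provided = provided.unwrap_or_default();
            let to_inject: Vec<(String, Entry)> = provided
                .iter()
                // Inject when the key is new or its value/kind changed.
                .filter(|(key, entry)| self.latest.get(*key) != Some(*entry))
                .map(|(key, entry)| (key.clone(), entry.clone()))
                .collect();
            self.latest = provided;
            to_inject
        }
    }
    ```

    Because the store is replaced rather than extended, a key that is
    omitted in one turn and reintroduced in a later turn is injected again.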
    
    ## What Changed
    
    - Threads experimental v2 `additionalContext` through app-server into
    core turn start and steer handling.
    - Adds separate contextual fragment types for untrusted user-role
    context and application developer-role context.
    - Uses pending response input items so additional context can be
    combined with normal user input without treating it as prompt text.
    - Adds integration coverage for start/steer flow, role routing,
    dedupe/reset behavior, deletion/re-add behavior, hook-blocked input
    behavior, empty context-only steer rejection, external-fragment marker
    matching, and truncation.
  • Move MCP tool naming mode into manager (#21576)
    ## Why
    
    The `non_prefixed_mcp_tool_names` feature should be applied where MCP
    tools become model-visible, not by remapping names later in core.
    Keeping the decision in `McpConnectionManager` construction makes
    `ToolInfo` the single shaped view that spec building, deferred tool
    search, routing, and unavailable-tool placeholders can consume directly.
    
    This also preserves the existing external behavior while the feature is
    off, and keeps the feature-on behavior for code mode and hooks explicit
    at the manager boundary.
    
    ## What Changed
    
    - Add `McpToolNameMode` to `codex-mcp` and flow it through `McpConfig`
    into `McpConnectionManager::new`.
    - Normalize MCP `ToolInfo` names in the manager using either
    legacy-prefixed namespaces or non-prefixed namespaces; the legacy path
    adds `mcp__` without restoring the old trailing namespace suffix.
    - Remove the core-side MCP name remapping path so specs, tool search,
    session resolution, and unavailable-tool placeholder construction use
    the manager-provided `ToolName` values directly.
    - Keep code mode flattening on the `__` namespace separator.
    - Preserve hook compatibility by giving non-prefixed MCP hook names
    legacy `mcp__...` matcher aliases.
    - Add/adjust integration and unit coverage for non-prefixed code-mode
    behavior, hook matching with the feature on and off, and manager-level
    legacy prefixing.
    
    ## Testing
    
    - `cargo test -p codex-mcp --lib`
    - `cargo test -p codex-core --lib tools::spec::tests -- --nocapture`
    - `cargo test -p codex-core --lib mcp_tools -- --nocapture`
    - `cargo test -p codex-core --lib mcp_tool_exposure -- --nocapture`
    - `cargo test -p codex-core --test all mcp_tool -- --nocapture`
    - `cargo test -p codex-core --test all search_tool -- --nocapture`
    - `cargo test -p codex-core --test all hooks_mcp -- --nocapture`
    - `cargo test -p codex-core --test all
    code_mode_uses_non_prefixed_mcp_tool_names_when_feature_enabled --
    --nocapture`
    - `cargo test -p codex-tools`
    - `cargo test -p codex-features`
  • [1 of 7] Add thread settings to UserInput (#23080)
    **Stack position:** [1 of 7]
    
    ## Summary
    
    The first three PRs in this stack are a cleanup pass before the actual
    thread settings API work.
    
    Today, core has several overlapping "user input" ops: `UserInput`,
    `UserInputWithTurnContext`, and `UserTurn`. They differ mostly in how
    much next-turn state they carry, which makes the later queued thread
    settings update harder to reason about and review.
    
    This PR starts that cleanup by adding the shared
    `ThreadSettingsOverrides` payload and allowing `Op::UserInput` to carry
    it. Existing variants remain in place here, so this layer is mostly a
    behavior-preserving API shape change plus mechanical constructor
    updates.
    
    ## End State After PR3
    
    By the end of PR3, `Op::UserInput` is the only "user input" core op. It
    can carry optional thread settings overrides for callers that need to
    update stored defaults with a turn, while callers without updates use
    empty settings. `Op::UserInputWithTurnContext` and `Op::UserTurn` are
    deleted.
    
    ## End State After PR5
    
    By the end of PR5, core will have only two ops for this area:
    
    - `Op::UserInput` for user-input-bearing submissions.
    - `Op::ThreadSettings` for settings-only updates.
    
    ## Stack
    
    1. [1 of 7] [Add thread settings to
    UserInput](https://github.com/openai/codex/pull/23080) (this PR)
    2. [2 of 7] [Remove
    UserInputWithTurnContext](https://github.com/openai/codex/pull/23081)
    3. [3 of 7] [Remove
    UserTurn](https://github.com/openai/codex/pull/23075)
    4. [4 of 7] [Placeholder for OverrideTurnContext
    cleanup](https://github.com/openai/codex/pull/23087)
    5. [5 of 7] [Replace OverrideTurnContext with
    ThreadSettings](https://github.com/openai/codex/pull/22508)
    6. [6 of 7] [Add app-server thread settings
    API](https://github.com/openai/codex/pull/22509)
    7. [7 of 7] [Sync TUI thread
    settings](https://github.com/openai/codex/pull/22510)
  • Remove core MCP list tools op (#21281)
    ## Why
    
    The core `Op::ListMcpTools` request path is no longer needed. Keeping it
    around left a dead request/response surface alongside the app-server MCP
    inventory APIs that own current server status listing.
    
    ## What Changed
    
    - Removed `Op::ListMcpTools`, `EventMsg::McpListToolsResponse`, and the
    core handler that built the MCP snapshot response.
    - Removed the now-unused `codex-mcp` snapshot wrapper/export and passive
    event handling arms in rollout and MCP-server consumers.
    - Updated tests that used the old op as a synchronization hook to wait
    on existing startup/skills events, and deleted the plugin test that only
    exercised the removed listing op.
    
    ## Validation
    
    - `cargo test -p codex-protocol`
    - `cargo test -p codex-mcp`
    - `cargo test -p codex-rollout -p codex-rollout-trace -p
    codex-mcp-server`
    - `cargo test -p codex-core --test all
    pending_input::queued_inter_agent_mail`
    - `cargo test -p codex-core --test all
    rmcp_client::stdio_mcp_tool_call_includes_sandbox_state_meta`
    - `cargo test -p codex-core --test all
    rmcp_client::stdio_image_responses`
    - `just fix -p codex-core -p codex-protocol -p codex-mcp -p
    codex-rollout -p codex-rollout-trace -p codex-mcp-server`
  • Support Codex Apps auth elicitations (#19193)
    ## Summary
    
    - request URL-mode MCP elicitations when Codex Apps tool calls fail with
    connector auth metadata
    - route Codex Apps auth URL elicitations into the TUI app-link flow
    
    ## Test plan
    
    - `just fmt`
    - `cargo test -p codex-core mcp_tool_call::tests`
    - `cargo test -p codex-mcp`
    - `cargo test -p codex-tui bottom_pane::app_link_view::tests`
    - `just fix -p codex-core`
    - `just fix -p codex-mcp`
    - `just fix -p codex-tui`
    
    Also attempted broader local runs:
    
    - `cargo test -p codex-core` fails in unrelated
    config/request-permission/proxy-sensitive tests under the current Codex
    Desktop environment.
    - `cargo test -p codex-tui` fails in unrelated status
    snapshots/trust-default tests because the ambient environment renders
    workspace-write/network permission defaults.
  • Stabilize plugin MCP fixture tests (#19452)
    ## Why
    
    Recent `main` CI had repeated flakes in the plugin fixture tests:
    
    - `codex-core::all
    suite::plugins::explicit_plugin_mentions_inject_plugin_guidance` failed
    in runs
    [24909500958](https://github.com/openai/codex/actions/runs/24909500958),
    [24908076251](https://github.com/openai/codex/actions/runs/24908076251),
    [24906197645](https://github.com/openai/codex/actions/runs/24906197645),
    and
    [24898949647](https://github.com/openai/codex/actions/runs/24898949647).
    - `codex-core::all suite::plugins::plugin_mcp_tools_are_listed` failed
    in runs
    [24909500958](https://github.com/openai/codex/actions/runs/24909500958),
    [24908076251](https://github.com/openai/codex/actions/runs/24908076251),
    and
    [24898949647](https://github.com/openai/codex/actions/runs/24898949647).
    
    The failures were in the same plugin/MCP fixture family: assertions
    expected sample plugin guidance or tool inventory, but the test could
    observe the session before the sample MCP server had finished startup.
    
    ## Root Cause
    
    `explicit_plugin_mentions_inject_plugin_guidance` submitted the user
    turn immediately after constructing the session. MCP startup is
    asynchronous, so on a slower or busier CI runner the prompt could be
    built before the sample plugin MCP server had reported its tools. That
    made the test depend on scheduler timing rather than the fixture being
    ready.
    
    `plugin_mcp_tools_are_listed` already needed the same readiness
    condition, but its wait logic was local to that test.
    
    ## What Changed
    
    - Added a shared `wait_for_sample_mcp_ready` helper for the plugin
    fixture tests.
    - Wait for `McpStartupComplete` before submitting the explicit plugin
    mention turn.
    - Reuse the same readiness helper in the MCP tool-listing test.
    
    ## Why This Should Be Reliable
    
    The tests now wait for the explicit readiness signal from the sample MCP
    server before asserting guidance or tools derived from that server. This
    removes the startup race while still exercising the real fixture path,
    so the assertions should only run after the plugin inventory is
    deterministic.
    
    ## Verification
    
    - `cargo test -p codex-core --test all plugins::`
    - GitHub CI for this PR is passing.
  • Stabilize plugin MCP tools test (#19191)
    ## Summary
    
    The plugin MCP tool-listing test could hide MCP startup failures by
    polling `ListMcpTools` until its own 30s deadline. If the plugin MCP
    server startup had already failed or timed out, the session-owned MCP
    manager would keep returning an empty tool list, so CI only reported
    `discovered tools: []` instead of the startup state that mattered.
    
    This makes the test synchronize on `McpStartupComplete` for the sample
    plugin MCP server before asserting listed tools, and gives the
    Bazel-launched test server a larger startup window.
    
    ## Notes
    
    Confidence is about 80%. The source path strongly supports the RCA: a
    failed MCP startup is represented as an empty tool list through
    `ListMcpTools`, so the old polling contract could not distinguish "not
    ready yet" from "startup already failed." I could not retrieve the CI
    execution-log artifact to confirm the exact hidden startup error, but
    the observed Ubuntu Bazel failure matches this path: repeated
    `ListMcpTools` responses with no tools until the test-local timeout
    fired.
    
    I think this is the right solution because it keeps plugin behavior
    unchanged and fixes only the test contract. Future startup failures
    should now report the `McpStartupComplete` failure/cancellation instead
    of timing out on an empty tool snapshot.
    
    This test was introduced in https://github.com/openai/codex/pull/12864.
  • Add turn-scoped environment selections (#18416)
    ## Summary
    - add experimental `turn/start.environments` params for per-turn
    environment id + cwd selections
    - pass selections through core protocol ops and resolve them with
    `EnvironmentManager` before `TurnContext` creation
    - treat omitted selections as default behavior, empty selections as no
    environment, and non-empty selections as using the first
    environment/cwd as the turn primary
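
    The three cases in the last bullet can be sketched with an optional
    slice. `EnvironmentSelection`, its field names, `ResolvedEnvironment`,
    and `resolve_turn_environment` are hypothetical and only illustrate the
    omitted/empty/non-empty distinction.

    ```rust
    #[derive(Clone, Debug, PartialEq)]
    struct EnvironmentSelection {
        environment_id: String,
        cwd: String,
    }

    #[derive(Debug, PartialEq)]
    enum ResolvedEnvironment {
        /// Selections were omitted: keep the default behavior.
        Default,
        /// An empty list was passed: run the turn with no environment.
        NoEnvironment,
        /// The first selection becomes the turn primary.
        Primary(EnvironmentSelection),
    }

    fn resolve_turn_environment(
        selections: Option<&[EnvironmentSelection]>,
    ) -> ResolvedEnvironment {
        match selections {
            None => ResolvedEnvironment::Default,
            Some([]) => ResolvedEnvironment::NoEnvironment,
            Some([first, ..]) => ResolvedEnvironment::Primary(first.clone()),
        }
    }
    ```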
    
    ## Testing
    - ran `just fmt`
    - ran `just write-app-server-schema`
    - not run: unit tests for this stacked PR
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Update models.json (#18586)
    - Replace the active models-manager catalog with the deleted core
    catalog contents.
    - Replace stale hardcoded test model slugs with current bundled model
    slugs.
    - Keep this as a stacked change on top of the cleanup PR.
  • register all mcp tools with namespace (#17404)
    stacked on #17402.
    
    MCP tools returned by `tool_search` (deferred tools) get registered in
    our `ToolRegistry` with a different format than directly available
    tools. this leads to two different ways of accessing MCP tools from our
    tool catalog, only one of which works for any given tool. fix this by
    registering all MCP tools with the namespace format, since this info is
    already available.
    
    also, direct MCP tools are registered with the Responses API without a
    namespace, while deferred MCP tools have a namespace. this means we can
    receive MCP `FunctionCall`s in both formats (with and without a
    namespace). fix this by always registering MCP tools with a namespace,
    regardless of deferral status.
    
    make code mode track `ToolName` provenance of tools so it can map the
    literal JS function name string to the correct `ToolName` for
    invocation, rather than supporting both in core.
    
    this lets us unify to a single canonical `ToolName` representation for
    each MCP tool and force everywhere to use that one, without supporting
    fallbacks.
  • Forward app-server turn clientMetadata to Responses (#16009)
    ## Summary
    App-server v2 already receives turn-scoped `clientMetadata`, but the
    Rust app-server was dropping it before the outbound Responses request.
    This change keeps the fix lightweight by threading that metadata through
    the existing turn-metadata path rather than inventing a new transport.
    
    ## What we're trying to do and why
    We want turn-scoped metadata from the app-server protocol layer,
    especially fields like Hermes/GAAS run IDs, to survive all the way to
    the actual Responses API request so it is visible in downstream
    websocket request logging and analytics.
    
    The specific bug was:
    - app-server protocol uses camelCase `clientMetadata`
    - Responses transport already has an existing turn metadata carrier:
    `x-codex-turn-metadata`
    - websocket transport already rewrites that header into
    `request.request_body.client_metadata["x-codex-turn-metadata"]`
    - but the Rust app-server never parsed or stored `clientMetadata`, so
    nothing from the app-server request was making it into that existing
    path
    
    This PR fixes that without adding a new header or a second metadata
    channel.
    
    ## How we did it
    ### Protocol surface
    - Add optional `clientMetadata` to v2 `TurnStartParams` and
    `TurnSteerParams`
    - Regenerate the JSON schema / TypeScript fixtures
    - Update app-server docs to describe the field and its behavior
    
    ### Runtime plumbing
    - Add a dedicated core op for app-server user input carrying turn-scoped
    metadata: `Op::UserInputWithClientMetadata`
    - Wire `turn/start` and `turn/steer` through that op / signature path
    instead of dropping the metadata at the message-processor boundary
    - Store the metadata in `TurnMetadataState`
    
    ### Transport behavior
    - Reuse the existing serialized `x-codex-turn-metadata` payload
    - Merge the new app-server `clientMetadata` into that JSON additively
    - Do **not** replace built-in reserved fields already present in the
    turn metadata payload
    - Keep websocket behavior unchanged at the outer shape level: it still
    sends only `client_metadata["x-codex-turn-metadata"]`, but that JSON
    string now contains the merged fields
    - Keep HTTP fallback behavior unchanged except that the existing
    `x-codex-turn-metadata` header now includes the merged fields too
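
    The additive merge can be sketched with a flat string map. This is a
    simplification: the real payload is serialized JSON, and
    `merge_client_metadata` is a hypothetical name. The sketch also assumes
    that any key already present in the built-in payload counts as reserved.

    ```rust
    use std::collections::BTreeMap;

    /// Adds client-supplied fields to the built-in turn metadata without
    /// replacing any key that is already present.
    fn merge_client_metadata(
        built_in: &BTreeMap<String, String>,
        client: &BTreeMap<String, String>,
    ) -> BTreeMap<String, String> {
        let mut merged = built_in.clone();
        for (key, value) in client {
            // `or_insert_with` leaves existing (reserved) entries untouched.
            merged.entry(key.clone()).or_insert_with(|| value.clone());
        }
        merged
    }
    ```

    Under this sketch, a client that sends its own `turn_id` alongside
    `origin` would see `origin` added while the built-in `turn_id` is kept.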
    
    ### Request shape before / after
    Before, a websocket `response.create` looked like:
    ```json
    {
      "type": "response.create",
      "client_metadata": {
        "x-codex-turn-metadata": "{\"session_id\":\"...\",\"turn_id\":\"...\"}"
      }
    }
    ```
    Even if the app-server caller supplied `clientMetadata`, it was not
    represented there.
    
    After, the same request shape is preserved, but the serialized payload
    now includes the new turn-scoped fields:
    ```json
    {
      "type": "response.create",
      "client_metadata": {
        "x-codex-turn-metadata": "{\"session_id\":\"...\",\"turn_id\":\"...\",\"fiber_run_id\":\"fiber-start-123\",\"origin\":\"gaas\"}"
      }
    }
    ```
    
    ## Validation
    ### Targeted tests added / updated
    - protocol round-trip coverage for `clientMetadata` on `turn/start` and
    `turn/steer`
    - protocol round-trip coverage for `Op::UserInputWithClientMetadata`
    - `TurnMetadataState` merge test proving client metadata is added
    without overwriting reserved built-in fields
    - websocket request-shape test proving outbound `response.create`
    contains merged metadata inside
    `client_metadata["x-codex-turn-metadata"]`
    - app-server integration tests proving:
      - `turn/start` forwards `clientMetadata` into the outbound Responses
        request path
      - websocket warmup + real turn request both behave correctly
      - `turn/steer` updates the follow-up request metadata
    
    ### Commands run
    - `just write-app-server-schema`
    - `cargo test -p codex-app-server-protocol`
    - `cargo test -p codex-protocol`
    - `cargo test -p codex-core
    turn_metadata_state_merges_client_metadata_without_replacing_reserved_fields
    --lib`
    - `cargo test -p codex-core --test all
    responses_websocket_preserves_custom_turn_metadata_fields`
    - `cargo test -p codex-app-server --test all client_metadata`
    - `cargo test -p codex-app-server --test all
    turn_start_forwards_client_metadata_to_responses_websocket_request_body_v2
    -- --nocapture`
    - `just fmt`
    - `just fix -p codex-core -p codex-protocol -p codex-app-server-protocol
    -p codex-app-server`
    - `just fix -p codex-exec -p codex-tui-app-server`
    - `just argument-comment-lint`
    
    ### Full suite note
    `cargo test` in `codex-rs` still fails in:
    - `suite::v2::turn_interrupt::turn_interrupt_resolves_pending_command_approval_request`
    
    I verified that same failure on a clean detached `HEAD` worktree with an
    isolated `CARGO_TARGET_DIR`, so it is not caused by this patch.
  • core: remove cross-crate re-exports from lib.rs (#16512)
    ## Why
    
    `codex-core` was re-exporting APIs owned by sibling `codex-*` crates,
    which made downstream crates depend on `codex-core` as a proxy module
    instead of the actual owner crate.
    
    Removing those forwards makes crate boundaries explicit and lets leaf
    crates drop unnecessary `codex-core` dependencies. In this PR, the
    following manifests replace their dependency on `codex-core` with one on
    `codex-login`:
    
    ```
    codex-rs/backend-client/Cargo.toml
    codex-rs/mcp-server/tests/common/Cargo.toml
    ```
    
    ## What
    
    - Remove `codex-rs/core/src/lib.rs` re-exports for symbols owned by
    `codex-login`, `codex-mcp`, `codex-rollout`, `codex-analytics`,
    `codex-protocol`, `codex-shell-command`, `codex-sandboxing`,
    `codex-tools`, and `codex-utils-path`.
    - Delete the `default_client` forwarding shim in `codex-rs/core`.
    - Update in-crate and downstream callsites to import directly from the
    owning `codex-*` crate.
    - Add direct Cargo dependencies where callsites now target the owner
    crate, and remove `codex-core` from `codex-rs/backend-client`.
  • [codex-analytics] thread events (#15690)
    - add event for thread initialization
    - thread/start, thread/fork, thread/resume
    - feature flagged behind `FeatureFlag::GeneralAnalytics`
    - does not yet support threads started by subagents
    
    PR stack:
    - --> [[telemetry] thread events
    #15690](https://github.com/openai/codex/pull/15690)
    - [[telemetry] subagent events
    #15915](https://github.com/openai/codex/pull/15915)
    - [[telemetry] turn events
    #15591](https://github.com/openai/codex/pull/15591)
    - [[telemetry] steer events
    #15697](https://github.com/openai/codex/pull/15697)
    - [[telemetry] queued prompt data
    #15804](https://github.com/openai/codex/pull/15804)
    
    
    Sample extracted logs in Codex-backend
    ```
    INFO     | 2026-03-29 16:39:37 | codex_backend.routers.analytics_events | analytics_events.track_analytics_events:398 | Tracked analytics event codex_thread_initialized thread_id=019d3bf7-9f5f-7f82-9877-6d48d1052531 product_surface=codex product_client_id=CODEX_CLI client_name=codex-tui client_version=0.0.0 rpc_transport=in_process experimental_api_enabled=True codex_rs_version=0.0.0 runtime_os=macos runtime_os_version=26.4.0 runtime_arch=aarch64 model=gpt-5.3-codex ephemeral=False thread_source=user initialization_mode=new subagent_source=None parent_thread_id=None created_at=1774827577 | 
    INFO     | 2026-03-29 16:45:46 | codex_backend.routers.analytics_events | analytics_events.track_analytics_events:398 | Tracked analytics event codex_thread_initialized thread_id=019d3b84-5731-79d0-9b3b-9c6efe5f5066 product_surface=codex product_client_id=CODEX_CLI client_name=codex-tui client_version=0.0.0 rpc_transport=in_process experimental_api_enabled=True codex_rs_version=0.0.0 runtime_os=macos runtime_os_version=26.4.0 runtime_arch=aarch64 model=gpt-5.3-codex ephemeral=False thread_source=user initialization_mode=resumed subagent_source=None parent_thread_id=None created_at=1774820022 | 
    INFO     | 2026-03-29 16:45:49 | codex_backend.routers.analytics_events | analytics_events.track_analytics_events:398 | Tracked analytics event codex_thread_initialized thread_id=019d3bfd-4cd6-7c12-a13e-48cef02e8c4d product_surface=codex product_client_id=CODEX_CLI client_name=codex-tui client_version=0.0.0 rpc_transport=in_process experimental_api_enabled=True codex_rs_version=0.0.0 runtime_os=macos runtime_os_version=26.4.0 runtime_arch=aarch64 model=gpt-5.3-codex ephemeral=False thread_source=user initialization_mode=forked subagent_source=None parent_thread_id=None created_at=1774827949 | 
    INFO     | 2026-03-29 17:20:29 | codex_backend.routers.analytics_events | analytics_events.track_analytics_events:398 | Tracked analytics event codex_thread_initialized thread_id=019d3c1d-0412-7ed2-ad24-c9c0881a36b0 product_surface=codex product_client_id=CODEX_SERVICE_EXEC client_name=codex_exec client_version=0.0.0 rpc_transport=in_process experimental_api_enabled=True codex_rs_version=0.0.0 runtime_os=macos runtime_os_version=26.4.0 runtime_arch=aarch64 model=gpt-5.3-codex ephemeral=False thread_source=user initialization_mode=new subagent_source=None parent_thread_id=None created_at=1774830027 | 
    ```
    
    Notes
    - `product_client_id` gets canonicalized in codex-backend
    - subagent threads are addressed in a following pr
  • Split features into codex-features crate (#15253)
    - Split the feature system into a new `codex-features` crate.
    - Cut `codex-core` and workspace consumers over to the new config and
    warning APIs.
    
    Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
    Co-authored-by: Codex <noreply@openai.com>
  • [apps] Add tool call meta. (#14647)
    - [x] Add resource_uri and other things to _meta to shortcut resource
    lookup and speed things up.
  • move plugin/skill instructions into dev msg and reorder (#14609)
    Move the general `Apps`, `Skills` and `Plugins` instructions blocks out
    of `user_instructions` and into the developer message, with new `Apps ->
    Skills -> Plugins` order for better clarity.
    
    Also wrap those sections in stable XML-style instruction tags (like
    other sections) and update prompt-layout tests/snapshots. This makes the
    tests less brittle in snapshot output (we can parse the sections), and
    it consolidates the capability instructions in one place.
    
    #### Tests
    Updated snapshots, added tests.
    
    `<AGENTS_MD>` disappearing in snapshots is expected: before this change,
    the wrapped user-instructions message was kept alive by `Skills`
    content. Now that `Skills` and `Plugins` are in the developer message,
    that wrapper only appears when there is real
    project-doc/user-instructions content.
    
    ---------
    
    Co-authored-by: Charley Cunningham <ccunningham@openai.com>
  • Normalize MCP tool names to code-mode safe form (#14605)
    Code mode doesn't allow `-` in names and it's better if function names
    and code-mode names are the same.
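    
    A minimal sketch of this kind of normalization, assuming the rule is "replace anything that is not an ASCII letter, digit, or `_` with `_`". The function name and the exact rule are assumptions; the PR's real rules may differ.
    
    ```rust
    /// Hypothetical sketch: map a tool name to an identifier-safe form by
    /// replacing every character other than ASCII letters, digits, and `_`
    /// with `_`. The real normalization rules may differ.
    pub fn normalize_tool_name(name: &str) -> String {
        name.chars()
            .map(|c| if c.is_ascii_alphanumeric() || c == '_' { c } else { '_' })
            .collect()
    }
    ```
    
    One consequence of any such mapping is that distinct names can collide after normalization (for example `a-b` and `a_b`), so a caller would need its own way to detect or resolve duplicates.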
  • chore: clarify plugin + app copy in model instructions (#14541)
    - clarify app mentions are in user messages
    - clarify what it means for tools to be provided via `codex_apps` MCP
    - add plugin descriptions (with basic sanitization) to top-level `##
    Plugins` section alongside the corresponding plugin names
    - explain that skills from plugins are prefixed with `plugin_name:` in
    top-level `## Plugins` section
    
    changes to more logically organize `Apps`, `Skills`, and `Plugins`
    instructions will be in a separate PR, as that shuffles dev + user
    instructions in ways that change tests broadly.
    
    ### Tests
    confirmed in local rollout, some new tests.
  • Add plugin usage telemetry (#14531)
    adding metrics including: 
    * plugin used
    * plugin installed/uninstalled
    * plugin enabled/disabled
  • feat: search_tool migrate to bring you own tool of Responses API (#14274)
    ## Why
    
    To support the new bring-your-own search tool in the Responses API
    (https://developers.openai.com/api/docs/guides/tools-tool-search#client-executed-tool-search),
    we are migrating our bm25 search tool to the official way of executing
    search on the client and communicating additional tools to the model.
    
    ## What
    - replace the legacy `search_tool_bm25` flow with client-executed
    `tool_search`
    - add protocol, SSE, history, and normalization support for
    `tool_search_call` and `tool_search_output`
    - return namespaced Codex Apps search results and wire namespaced
    follow-up tool calls back into MCP dispatch
  • [apps] Fix apps enablement condition. (#14011)
    - [x] Fix apps enablement condition to check both the feature flag and
    that the user is not an API key user.
  • feat: structured plugin parsing (#13711)
    #### What
    
    Add structured `@plugin` parsing and TUI support for plugin mentions.
    
    - Core: switch from plain-text `@display_name` parsing to structured
    `plugin://...` mentions via `UserInput::Mention` and
    `[$...](plugin://...)` links in text, same pattern as apps/skills.
    - TUI: add plugin mention popup, autocomplete, and chips when typing
    `$`. Load plugin capability summaries and feed them into the composer;
    plugin mentions appear alongside skills and apps.
    - Generalize mention parsing to a sigil parameter, still defaults to `$`
    
    <img width="797" height="119" alt="image"
    src="https://github.com/user-attachments/assets/f0fe2658-d908-4927-9139-73f850805ceb"
    />
    
    Builds on #13510. Currently clients have to build their own `id` via
    `plugin@marketplace` and filter plugins to show by `enabled`, but we
    will add `id` and `available` as fields returned from `plugin/list`
    soon.
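    
    The link form `[$...](plugin://...)` with a configurable sigil can be illustrated with a small standalone scanner. This is a sketch under stated assumptions, not the parser in the PR: the `Mention` struct, its field names, and `parse_plugin_mentions` are hypothetical, and the real code routes mentions through `UserInput::Mention`.
    
    ```rust
    /// A structured mention found in text. Field names are hypothetical.
    #[derive(Debug, PartialEq)]
    pub struct Mention {
        pub name: String,
        pub path: String,
    }
    
    /// Collect links of the form `[<sigil>name](plugin://...)` from `text`.
    /// Links with a different scheme or a different sigil are ignored.
    pub fn parse_plugin_mentions(text: &str, sigil: char) -> Vec<Mention> {
        let mut mentions = Vec::new();
        let mut rest = text;
        while let Some(open) = rest.find('[') {
            let after_open = &rest[open + 1..];
            // Step past this `[` by default so the loop always makes progress.
            rest = after_open;
            let Some(name_part) = after_open.strip_prefix(sigil) else { continue };
            let Some(close) = name_part.find("](") else { continue };
            let name = &name_part[..close];
            let target = &name_part[close + 2..];
            let Some(end) = target.find(')') else { continue };
            let path = &target[..end];
            if name.is_empty() || name.contains('[') || !path.starts_with("plugin://") {
                continue;
            }
            mentions.push(Mention { name: name.to_string(), path: path.to_string() });
            // Resume scanning after the closing `)` of the accepted link.
            rest = &target[end + 1..];
        }
        mentions
    }
    ```
    
    Passing the sigil as a parameter keeps the default `$` behavior while allowing another character to be tried without touching the scanner itself.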
    
    #### Tests
    
    Added tests, verified locally.
  • add @plugin mentions (#13510)
    ## Note-- added plugin mentions via @, but that conflicts with file
    mentions
    
    Depends on and builds upon #13433.
    
    - introduces explicit `@plugin` mentions. this injects the plugin's mcp
    servers, app names, and skill name format into turn context as a dev
    message.
    - we do not yet have UI for these mentions, so we currently parse raw
    text (unlike skills and apps, which have UI chips, autocomplete, etc.).
    UI support depends on an upcoming `plugins/list` app-server endpoint
    that we can feed the UI with.
    - also annotate mcp and app tool descriptions with the plugin(s) they
    come from. this gives the model a first class way of understanding what
    tools come from which plugins, which will help implicit invocation.
    
    ### Tests
    Added and updated tests, unit and integration. Also confirmed locally a
    raw `@plugin` injects the dev message, and the model knows about its
    apps, mcps, and skills.
  • feat: track plugins mcps/apps and add plugin info to user_instructions (#13433)
    ### first half of changes, followed by #13510
    
    Track plugin capabilities as derived summaries on `PluginLoadOutcome`
    for enabled plugins with at least one skill/app/mcp.
    
    Also add `Plugins` section to `user_instructions` injected on session
    start. These introduce the plugins concept and list enabled plugins, but
    do NOT currently include paths to enabled plugins or details on what
    apps/mcps the plugins contain (current plan is to inject this on
    @-mention). That can be adjusted in a follow-up, based on evals.
    
    ### tests
    Added/updated tests, confirmed locally that new `Plugins` section +
    currently enabled plugins show up in `user_instructions`.
  • config: enforce enterprise feature requirements (#13388)
    ## Why
    
    Enterprises can already constrain approvals, sandboxing, and web search
    through `requirements.toml` and MDM, but feature flags were still only
    configurable as managed defaults. That meant an enterprise could suggest
    feature values, but it could not actually pin them.
    
    This change closes that gap and makes enterprise feature requirements
    behave like the other constrained settings. The effective feature set
    now stays consistent with enterprise requirements during config load,
    when config writes are validated, and when runtime code mutates feature
    flags later in the session.
    
    It also tightens the runtime API for managed features. `ManagedFeatures`
    now follows the same constraint-oriented shape as `Constrained<T>`
    instead of exposing panic-prone mutation helpers, and production code
    can no longer construct it through an unconstrained `From<Features>`
    path.
    
    The PR also hardens the `compact_resume_fork` integration coverage on
    Windows. After the feature-management changes,
    `compact_resume_after_second_compaction_preserves_history` was
    overflowing the libtest/Tokio thread stacks on Windows, so the test now
    uses an explicit larger-stack harness as a pragmatic mitigation. That
    may not be the ideal root-cause fix, and it merits a parallel
    investigation into whether part of the async future chain should be
    boxed to reduce stack pressure instead.
    
    ## What Changed
    
    Enterprises can now pin feature values in `requirements.toml` with the
    requirements-side `features` table:
    
    ```toml
    [features]
    personality = true
    unified_exec = false
    ```
    
    Only canonical feature keys are allowed in the requirements `features`
    table; omitted keys remain unconstrained.
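    
    The pinning semantics (a pinned key rejects conflicting writes, an omitted key stays free) can be modeled in a few lines. This is a simplified stand-in, not the real `ManagedFeatures` API: `PinnedFeatures`, its string keys, and its `Result<(), String>` errors are assumptions made for illustration.
    
    ```rust
    use std::collections::BTreeMap;
    
    /// Simplified stand-in for a feature set constrained by pinned
    /// requirements. Features are plain string keys here; the real types differ.
    pub struct PinnedFeatures {
        pinned: BTreeMap<String, bool>,
        effective: BTreeMap<String, bool>,
    }
    
    impl PinnedFeatures {
        /// Start from the pinned values, so every pin holds from the beginning.
        pub fn new(pinned: BTreeMap<String, bool>) -> Self {
            let effective = pinned.clone();
            Self { pinned, effective }
        }
    
        /// A write is allowed when the key is unpinned or matches the pin.
        pub fn can_set(&self, key: &str, enabled: bool) -> Result<(), String> {
            match self.pinned.get(key) {
                Some(&required) if required != enabled => Err(format!(
                    "feature `{key}` is pinned to {required} by requirements"
                )),
                _ => Ok(()),
            }
        }
    
        /// Apply a write only after it passes the constraint check.
        pub fn set_enabled(&mut self, key: &str, enabled: bool) -> Result<(), String> {
            self.can_set(key, enabled)?;
            self.effective.insert(key.to_string(), enabled);
            Ok(())
        }
    
        /// Effective value, if the feature has one.
        pub fn is_enabled(&self, key: &str) -> Option<bool> {
            self.effective.get(key).copied()
        }
    }
    ```
    
    With the pins from the TOML example, trying to enable `unified_exec` returns an error and leaves the effective value at `false`, while a key that is not pinned can be toggled freely.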
    
    - Added a requirements-side pinned feature map to
    `ConfigRequirementsToml`, threaded it through source-preserving
    requirements merge and normalization in `codex-config`, and made the
    TOML surface use `[features]` (while still accepting legacy
    `[feature_requirements]` for compatibility).
    - Exposed `featureRequirements` from `configRequirements/read`,
    regenerated the JSON/TypeScript schema artifacts, and updated the
    app-server README.
    - Wrapped the effective feature set in `ManagedFeatures`, backed by
    `ConstrainedWithSource<Features>`, and changed its API to mirror
    `Constrained<T>`: `can_set(...)`, `set(...) -> ConstraintResult<()>`,
    and result-returning `enable` / `disable` / `set_enabled` helpers.
    - Removed the legacy-usage and bulk-map passthroughs from
    `ManagedFeatures`; callers that need those behaviors now mutate a plain
    `Features` value and reapply it through `set(...)`, so the constrained
    wrapper remains the enforcement boundary.
    - Removed the production loophole for constructing unconstrained
    `ManagedFeatures`. Non-test code now creates it through the configured
    feature-loading path, and `impl From<Features> for ManagedFeatures` is
    restricted to `#[cfg(test)]`.
    - Rejected legacy feature aliases in enterprise feature requirements,
    and return a load error when a pinned combination cannot survive
    dependency normalization.
    - Validated config writes against enterprise feature requirements before
    persisting changes, including explicit conflicting writes and
    profile-specific feature states that normalize into invalid
    combinations.
    - Updated runtime and TUI feature-toggle paths to use the constrained
    setter API and to persist or apply the effective post-constraint value
    rather than the requested value.
    - Updated the `core_test_support` Bazel target to include the bundled
    core model-catalog fixtures in its runtime data, so helper code that
    resolves `core/models.json` through runfiles works in remote Bazel test
    environments.
    - Renamed the core config test coverage to emphasize that effective
    feature values are normalized at runtime, while conflicting persisted
    config writes are rejected.
    - Ran `compact_resume_after_second_compaction_preserves_history` inside
    an explicit 8 MiB test thread and Tokio runtime worker stack, following
    the existing larger-stack integration-test pattern, to keep the Windows
    `compact_resume_fork` test slice from aborting while a parallel
    investigation continues into whether some of the underlying async
    futures should be boxed.
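    
    The explicit larger-stack pattern mentioned in the last bullet can be sketched with the standard library alone. This is an assumption-laden illustration, not the harness in the PR: `run_with_stack` and `depth_sum` are hypothetical helpers, and the sketch covers only the test thread. Tokio's runtime builder has a corresponding `thread_stack_size` setting for worker threads, which is not shown here.
    
    ```rust
    use std::thread;
    
    /// Run `body` on a dedicated thread with an explicit stack size and
    /// return its result. A panic inside `body` is re-raised in the caller.
    pub fn run_with_stack<T, F>(stack_bytes: usize, body: F) -> T
    where
        F: FnOnce() -> T + Send + 'static,
        T: Send + 'static,
    {
        let handle = thread::Builder::new()
            .name("large-stack-test".to_string())
            .stack_size(stack_bytes)
            .spawn(body)
            .expect("failed to spawn thread");
        match handle.join() {
            Ok(value) => value,
            Err(panic) => std::panic::resume_unwind(panic),
        }
    }
    
    /// Deliberately recursive helper that consumes stack space per call.
    pub fn depth_sum(n: u64) -> u64 {
        if n == 0 { 0 } else { n + depth_sum(n - 1) }
    }
    ```
    
    A test body would be wrapped as `run_with_stack(8 * 1024 * 1024, || ...)`, which makes the stack budget independent of the libtest default and of `RUST_MIN_STACK`.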
    
    ## Verification
    
    - `cargo test -p codex-config`
    - `cargo test -p codex-core feature_requirements_ -- --nocapture`
    - `cargo test -p codex-core
    load_requirements_toml_produces_expected_constraints -- --nocapture`
    - `cargo test -p codex-core
    compact_resume_after_second_compaction_preserves_history -- --nocapture`
    - `cargo test -p codex-core compact_resume_fork -- --nocapture`
    - Re-ran the built `codex-core` `tests/all` binary with
    `RUST_MIN_STACK=262144` for
    `compact_resume_after_second_compaction_preserves_history` to confirm
    the explicit-stack harness fixes the deterministic low-stack repro.
    - `cargo test -p codex-core`
    - This still fails locally in unrelated integration areas that expect
    the `codex` / `test_stdio_server` binaries or hit existing `search_tool`
    wiremock mismatches.
    
    ## Docs
    
    `developers.openai.com/codex` should document the requirements-side
    `[features]` table for enterprise and MDM-managed configuration,
    including that it only accepts canonical feature keys and that
    conflicting config writes are rejected.
  • feat: load plugin apps (#13401)
    load plugin-apps from `.app.json`.
    
    make apps runtime-mentionable iff `codex_apps` MCP actually exposes
    tools for that `connector_id`.
    
    if the app isn't available, it's filtered out of runtime connector set,
    so no tools are added and no app-mentions resolve.
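    
    The filtering rule can be sketched as a set membership check. The types below are hypothetical stand-ins (the real connector and tool types are not shown in this description); only the `connector_id` concept comes from the text above.
    
    ```rust
    use std::collections::HashSet;
    
    /// Hypothetical stand-in for an app declared by a plugin.
    pub struct PluginApp {
        pub connector_id: String,
    }
    
    /// Hypothetical stand-in for a tool exposed by the `codex_apps` MCP server.
    pub struct McpTool {
        pub name: String,
        pub connector_id: Option<String>,
    }
    
    /// Keep only the apps whose `connector_id` has at least one exposed tool.
    pub fn mentionable_apps<'a>(apps: &'a [PluginApp], tools: &[McpTool]) -> Vec<&'a PluginApp> {
        let exposed: HashSet<&str> = tools
            .iter()
            .filter_map(|tool| tool.connector_id.as_deref())
            .collect();
        apps.iter()
            .filter(|app| exposed.contains(app.connector_id.as_str()))
            .collect()
    }
    ```
    
    An app with no matching tool simply drops out of the result, which mirrors the behavior described above: no tools are added and no mention resolves for it.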
    
    right now we don't have a clean cli-side error for an app not being
    installed. can look at this after.
    
    ### Tests
    Added tests, tested locally that using a plugin that bundles an app
    picks up the app.
  • Refactor plugin config and cache path (#13333)
    Update config.toml plugin entries to use
    `<plugin_name>@<marketplace_name>` as the key.
    Plugins are now stored in
    `plugins/cache/marketplace-name/plugin-name/$version/`.
    Clean up the plugin code structure.
    Add plugin install functionality (not used yet).
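    
    The key format and cache layout above can be sketched as two small helpers. The function names and the `root` parameter are assumptions for illustration; the actual base directory and validation rules are not stated here.
    
    ```rust
    use std::path::{Path, PathBuf};
    
    /// Split a `<plugin_name>@<marketplace_name>` key at the first `@`.
    /// Returns `None` when the `@` is missing or either side is empty.
    pub fn parse_plugin_key(key: &str) -> Option<(&str, &str)> {
        let (plugin, marketplace) = key.split_once('@')?;
        if plugin.is_empty() || marketplace.is_empty() {
            return None;
        }
        Some((plugin, marketplace))
    }
    
    /// Build `<root>/plugins/cache/<marketplace>/<plugin>/<version>`.
    /// `root` is a hypothetical base directory supplied by the caller.
    pub fn plugin_cache_dir(root: &Path, key: &str, version: &str) -> Option<PathBuf> {
        let (plugin, marketplace) = parse_plugin_key(key)?;
        Some(
            root.join("plugins")
                .join("cache")
                .join(marketplace)
                .join(plugin)
                .join(version),
        )
    }
    ```
    
    Note the order swap: the key puts the plugin name first, while the cache path puts the marketplace directory first.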
  • feat: load from plugins (#12864)
    Support loading plugins.
    
    Plugins can now be enabled via [plugins.<name>] in config.toml. They are
    loaded as first-class entities through PluginsManager, and their default
    skills/ and .mcp.json contributions are integrated into the existing
    skills and MCP flows.