94 Commits

  • [codex] wire process-owned code mode host into core (#30142)
    ## Summary
    
    - add the `code_mode_host` feature flag and select
    `ProcessOwnedCodeModeSessionProvider` in `CodeModeService` when enabled
    - initialize code-mode sessions lazily so a missing host reports a tool
    error without failing thread startup
    - resolve `codex-code-mode-host` beside the running Codex binary by
    default while preserving `CODEX_CODE_MODE_HOST_PATH` as an override
    - add unit and end-to-end coverage for host resolution and graceful
    missing-host behavior
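    
    The lookup order above can be sketched in a few lines of std-only Rust. This is a hedged illustration, not the PR's code. `resolve_code_mode_host` and its signature are hypothetical; only the `codex-code-mode-host` binary name and the `CODEX_CODE_MODE_HOST_PATH` override come from the description. The override and the executable path are passed in as parameters so the sketch stays testable. A caller would pass `std::env::var_os("CODEX_CODE_MODE_HOST_PATH").as_deref()` and `std::env::current_exe()?`.
    
    ```rust
    use std::ffi::OsStr;
    use std::path::{Path, PathBuf};
    
    /// Helper binary name (from the PR description).
    const HOST_BINARY: &str = "codex-code-mode-host";
    
    /// Hypothetical resolver: an explicit override wins; otherwise look for the
    /// helper in the same directory as the running Codex binary.
    fn resolve_code_mode_host(env_override: Option<&OsStr>, current_exe: &Path) -> Option<PathBuf> {
        // Assumption: an empty override is treated as unset.
        if let Some(path) = env_override.filter(|p| !p.is_empty()) {
            return Some(PathBuf::from(path));
        }
        // EXE_SUFFIX is ".exe" on Windows and "" elsewhere.
        let file_name = format!("{}{}", HOST_BINARY, std::env::consts::EXE_SUFFIX);
        current_exe.parent().map(|dir| dir.join(file_name))
    }
    ```
    
    The sketch intentionally does not check whether the file exists. This matches the lazy initialization described above: a missing host surfaces as a tool error when a session first starts, not during thread startup.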
    
    ## Why
    
    This wires the process-owned session client from #30112 into the core
    service behind an opt-in rollout gate. Packaged Codex installations can
    place the helper in the same `bin` directory as the main executable
    without relying on `PATH`, while development and custom installations
    can continue to override the helper path.
    
    ## Stack
    
    - Depends on #30112
    - Base branch: `cconger/process-owned-session-runtime-4-client`
    
    ## Validation
    
    - Build `codex` and `codex-code-mode-host`.
    - Run `CODEX_CODE_MODE_HOST_PATH="$PWD/target/debug/codex-code-mode-host" ./target/debug/codex --enable code_mode_host`.
  • Represent MCP authentication with an enum (#29924)
    ## Why
    
    MCP authentication has distinct OAuth and ChatGPT-session flows.
    Representing that choice as `use_chatgpt_auth` makes one flow implicit
    and allows the configuration model to express the distinction only
    through a boolean.
    
    ChatGPT credential forwarding also needs a first-party trust boundary. A
    configurable `chatgpt_base_url` controls routing, but must not grant an
    MCP server permission to receive session credentials.
    
    This change builds on #29733, where the boolean was introduced.
    
    ## What changed
    
    - Replace `use_chatgpt_auth` with an `auth` field backed by the
    exhaustive `McpServerAuth` enum.
    - Support `auth = "oauth"` and `auth = "chatgpt"`, with OAuth remaining
    the default.
    - Trust only the origin derived from the existing hardcoded
    `CHATGPT_CODEX_BASE_URL` when granting ChatGPT auth to an MCP server.
    - Keep configured bearer tokens and authorization headers ahead of the
    selected authentication flow.
    - Update config writers, schema output, fixtures, and integration-test
    setup to use the enum.
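    
    Below is a minimal sketch of the enum and the selection order described above. It uses a hand-written parser in place of derived deserialization. `McpServerAuth` and the `"oauth"`/`"chatgpt"` values come from the PR. `ResolvedAuth`, `select_auth`, and its parameters are hypothetical. Falling back to OAuth on an untrusted origin is also an assumption; the PR only states that no ChatGPT credentials are sent in that case. The origin strings in the usage are placeholders, not the value of `CHATGPT_CODEX_BASE_URL`.
    
    ```rust
    /// Exhaustive auth selection; OAuth is the default (per the PR).
    #[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
    enum McpServerAuth {
        #[default]
        OAuth,
        ChatGpt,
    }
    
    impl McpServerAuth {
        /// Parses the `auth = "..."` value; a missing key means OAuth.
        fn from_config(value: Option<&str>) -> Result<Self, String> {
            match value {
                None | Some("oauth") => Ok(Self::OAuth),
                Some("chatgpt") => Ok(Self::ChatGpt),
                Some(other) => Err(format!("unknown MCP auth mode: {other}")),
            }
        }
    }
    
    /// Hypothetical outcome of auth resolution for one HTTP MCP server.
    #[derive(Debug, PartialEq, Eq)]
    enum ResolvedAuth {
        /// A configured bearer token or authorization header.
        Configured(String),
        /// Forward the current ChatGPT session credentials.
        ChatGptSession,
        /// Default OAuth flow.
        OAuth,
    }
    
    /// Configured authorization always wins. ChatGPT session auth is granted only
    /// when the server origin equals the trusted first-party origin.
    fn select_auth(
        configured_authorization: Option<&str>,
        auth: McpServerAuth,
        server_origin: &str,
        trusted_origin: &str,
    ) -> ResolvedAuth {
        if let Some(value) = configured_authorization {
            return ResolvedAuth::Configured(value.to_string());
        }
        match auth {
            McpServerAuth::ChatGpt if server_origin == trusted_origin => ResolvedAuth::ChatGptSession,
            // Assumption: an untrusted ChatGPT request falls back to the default flow.
            _ => ResolvedAuth::OAuth,
        }
    }
    ```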
    
    ## Verification
    
    Integration coverage exercises the complete streamable HTTP startup path
    in two independent configurations:
    
    - A directly constructed MCP configuration verifies that matching an
    overridden `chatgpt_base_url` does not grant ChatGPT auth.
    - A persisted `config.toml` containing an attacker-controlled
    `chatgpt_base_url` and `auth = "chatgpt"` verifies the same boundary
    through normal config parsing.
    
    Both tests complete MCP initialization and tool listing and assert that
    the full captured request sequence contains no authorization headers.
    Separate integration coverage verifies that configured authorization
    takes precedence over ChatGPT auth.
  • Allow ChatGPT-hosted MCP servers to use session auth (#29733)
    ## Why
    
    ChatGPT session authentication was inferred from the reserved Codex Apps
    server name. That couples credential routing to Codex Apps-specific
    behavior and prevents other MCP endpoints hosted by ChatGPT from
    explicitly using the current session.
    
    The opt-in also needs a clear security boundary: an arbitrary MCP
    configuration must not be able to redirect ChatGPT credentials to
    another origin.
    
    ## What changed
    
    - Add `use_chatgpt_auth` to HTTP MCP server configuration, defaulting to
    `false`.
    - Honor the setting only when the parsed server URL has the same HTTP(S)
    origin as the configured `chatgpt_base_url`; otherwise remove the
    capability before startup.
    - Resolve bearer tokens and static or environment-backed authorization
    headers before selecting authentication, with configured authorization
    taking precedence over ChatGPT session auth.
    - Enable the setting for the built-in Codex Apps and hosted plugin
    runtime endpoints while keeping Codex Apps caching and tool
    normalization scoped to the reserved server.
    - Persist the setting through MCP config rewrite paths and expose it in
    the generated config schema.
    - Load the current login state for `codex mcp list` so reported auth
    status matches runtime behavior.
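    
    Same-origin checking compares the (scheme, host, port) tuple rather than URL prefixes. A prefix check would let `https://chatgpt.com.evil.example` pass as `https://chatgpt.com`; a tuple comparison does not. The parser below is a simplified, hypothetical sketch; `Origin`, `origin_of`, and `may_use_chatgpt_auth` are not the PR's names. It accepts only `http` and `https`, normalizes default ports, and does not handle userinfo or IPv6 literals. Real code would use a full URL parser.
    
    ```rust
    /// (scheme, host, port) triple used for same-origin comparison.
    #[derive(Debug, PartialEq, Eq)]
    struct Origin {
        scheme: String,
        host: String,
        port: u16,
    }
    
    /// Extracts the origin of an HTTP(S) URL, or `None` for anything else.
    /// Simplified: no userinfo, no IPv6 literals, no percent-decoding.
    fn origin_of(url: &str) -> Option<Origin> {
        let (scheme, rest) = url.split_once("://")?;
        let scheme = scheme.to_ascii_lowercase();
        let default_port: u16 = match scheme.as_str() {
            "http" => 80,
            "https" => 443,
            _ => return None,
        };
        // The authority ends at the first '/', '?' or '#'.
        let authority = rest.split(['/', '?', '#']).next().unwrap_or("");
        let (host, port) = match authority.rsplit_once(':') {
            Some((host, port)) => (host, port.parse::<u16>().ok()?),
            None => (authority, default_port),
        };
        if host.is_empty() {
            return None;
        }
        Some(Origin { scheme, host: host.to_ascii_lowercase(), port })
    }
    
    /// ChatGPT session auth is honored only for a same-origin server URL.
    fn may_use_chatgpt_auth(server_url: &str, chatgpt_base_url: &str) -> bool {
        match (origin_of(server_url), origin_of(chatgpt_base_url)) {
            (Some(server), Some(base)) => server == base,
            _ => false,
        }
    }
    ```
    
    #29924, earlier in this log, later replaces the configurable `chatgpt_base_url` in this comparison with the origin derived from the hardcoded `CHATGPT_CODEX_BASE_URL`.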
    
    ## Verification
    
    Core integration coverage exercises the complete streamable HTTP MCP
    startup path and verifies that:
    
    - a same-origin opted-in server receives the current ChatGPT access
    token;
    - an explicitly configured authorization header takes precedence;
    - a different-origin server completes MCP initialization and tool
    listing without receiving any ChatGPT authorization header.
  • [codex] nest sleep config under current time reminder (#29910)
    ## Summary
    
    - move sleep tool enablement from top-level `[features].sleep_tool` to
    `[features.current_time_reminder].sleep_tool`
    - remove the standalone `Feature::SleepTool` flag and gate `clock.sleep`
    from resolved current-time configuration
    - update config schema, config-lock materialization, and existing sleep
    coverage
    
    Stacked on #29907.
  • fix: scope context remaining to body window (#29665)
    ## Why
    
    With `model_auto_compact_token_limit_scope = "body_after_prefix"`, the
    persistent prefix should not count against the active body window.
    `get_context_remaining` and the token-budget reminder should report the
    same usable body-after-prefix window that auto-compaction uses, rather
    than the total token count since the session began.
    
    This is stacked on #29664 so the mechanical move from `turn.rs` is
    isolated from the behavior fix.
    
    ## What
    
    - Extends `ContextWindowTokenStatus` with `context_remaining_tokens`.
    - Updates `get_context_remaining` to use the shared context-window
    accounting.
    - Adds integration coverage for body-after-prefix reminder timing and
    `get_context_remaining` output.
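    
    One plausible reading of the fix, offered as a hedged sketch. Under `body_after_prefix` scope, the persistent prefix is excluded from the tokens counted against the auto-compaction limit, so the remaining-token figure matches what auto-compaction uses. The PR does not spell out the arithmetic. `LimitScope`, its `Total` variant, and `context_remaining_tokens` as written here are assumptions for illustration.
    
    ```rust
    /// Hypothetical mirror of `model_auto_compact_token_limit_scope`.
    #[derive(Debug, Clone, Copy)]
    enum LimitScope {
        /// Count every token currently in the context window.
        Total,
        /// Count only tokens after the persistent prefix (`body_after_prefix`).
        BodyAfterPrefix,
    }
    
    /// Tokens left before the auto-compaction limit is reached.
    fn context_remaining_tokens(limit: u64, tokens_in_context: u64, prefix_tokens: u64, scope: LimitScope) -> u64 {
        let counted = match scope {
            LimitScope::Total => tokens_in_context,
            LimitScope::BodyAfterPrefix => tokens_in_context.saturating_sub(prefix_tokens),
        };
        limit.saturating_sub(counted)
    }
    ```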
    
    ## Testing
    
    - `just test -p codex-core body_after_prefix_window`
    - `just test -p codex-core auto_compact_body_after_prefix`
    - `just fix -p codex-core`
  • [core] debounce current-time reminders by elapsed time (#29659)
    ## Summary
    - rename `reminder_interval_model_requests` to
    `reminder_interval_seconds`
    - read the configured time provider before every model request and
    inject a reminder only after the configured number of seconds has
    elapsed
    - preserve immediate first delivery and forced delivery after compaction
    changes the context window
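    
    The elapsed-time debounce can be modeled as a small state machine. This sketch assumes the time provider returns Unix seconds, and it reads "after the configured number of seconds" as `>=`. `ReminderDebouncer` and its methods are hypothetical names, not the PR's.
    
    ```rust
    /// Hypothetical debouncer for current-time reminders keyed on elapsed seconds.
    struct ReminderDebouncer {
        interval_seconds: u64,
        /// Time of the last injected reminder; `None` before the first one.
        last_delivered: Option<u64>,
        /// Set when compaction changes the context window.
        force_next: bool,
    }
    
    impl ReminderDebouncer {
        fn new(interval_seconds: u64) -> Self {
            Self { interval_seconds, last_delivered: None, force_next: false }
        }
    
        /// Called after compaction so the next request carries a fresh reminder.
        fn on_compaction(&mut self) {
            self.force_next = true;
        }
    
        /// Called before every model request with the provider's current time.
        /// Returns `true` when a reminder should be injected.
        fn should_inject(&mut self, now: u64) -> bool {
            let due = match self.last_delivered {
                None => true, // first delivery is immediate
                Some(last) => self.force_next || now.saturating_sub(last) >= self.interval_seconds,
            };
            if due {
                self.last_delivered = Some(now);
                self.force_next = false;
            }
            due
        }
    }
    ```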
    
    ## Tests
    - `just test -p codex-core current_time_reminder`
  • remove flag for image preparation (#29429)
    ## What
    
    - make Fjord's centralized response-item image preparation unconditional
    for new and resumed history
    - have local user images and `view_image` outputs always defer decoding
    and resizing to that path
    - retain `resize_all_images` as an ignored, removed compatibility key
    for released clients
    - delete the flag-off producer paths and obsolete policy-specific tests
    
    ## Why
    
    Centralized preparation is now the intended image path. Keeping the
    runtime feature checks also kept two image-processing implementations
    alive and allowed client config to select the legacy behavior.
    
    This is a clean replacement for #28975, rebuilt from the latest `main`.
    
    ## How
    
    `prepare_response_items` now runs whenever items enter history and
    whenever persisted history is reconstructed. Producers emit deferred
    image data, so malformed images become the existing model-visible
    placeholder instead of failing the session at the producer.
    
    ## Test plan
    
    - `just fmt`
    - `just fix -p codex-core -p codex-features`
    - `just test -p codex-features` — 52 passed
    - focused affected `codex-core` set — 20 passed
    - `just test -p codex-core handle_accepts_explicit_high_detail` — 1
    passed
    - full `just test -p codex-core` attempt — 2,723 passed; 88 unrelated
    environment failures from read-only `~/.codex` SQLite state and
    unavailable integration helper binaries
  • Add indexed web search mode (#28489)
    ## Summary
    
    - Add `web_search = "indexed"` alongside `disabled`, `cached`, and
    `live`.
    - Use that same resolved mode for both hosted and standalone web search.
    - For hosted search, send `index_gated_web_access: true` with external
    web access enabled only when `indexed` is selected.
    - For standalone search, preserve the existing boolean wire values for
    existing modes (`cached` maps to `false` and `live` to `true`) and send
    `"indexed"` only for `indexed`; `disabled` keeps the tool unavailable.
    - Carry the mode through managed configuration requirements and
    generated schemas.
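    
    The mode-to-wire mapping described above, as a hedged Rust sketch. These details come from the PR:
    
    - the four mode names
    - the boolean values for `cached` and `live`
    - the `"indexed"` string
    - the `index_gated_web_access` flag
    
    `StandaloneWire` and the method names are hypothetical, and JSON serialization is omitted.
    
    ```rust
    /// The four `web_search` modes.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum WebSearchMode {
        Disabled,
        Cached,
        Live,
        Indexed,
    }
    
    /// Hypothetical model of the standalone-search wire value.
    #[derive(Debug, PartialEq, Eq)]
    enum StandaloneWire {
        /// Existing boolean encoding: `false` = cached, `true` = live.
        Bool(bool),
        /// The string `"indexed"`.
        Indexed,
    }
    
    impl WebSearchMode {
        fn parse(value: &str) -> Option<Self> {
            match value {
                "disabled" => Some(Self::Disabled),
                "cached" => Some(Self::Cached),
                "live" => Some(Self::Live),
                "indexed" => Some(Self::Indexed),
                _ => None,
            }
        }
    
        /// `None` means the standalone tool is not offered at all.
        fn standalone_wire(self) -> Option<StandaloneWire> {
            match self {
                Self::Disabled => None,
                Self::Cached => Some(StandaloneWire::Bool(false)),
                Self::Live => Some(StandaloneWire::Bool(true)),
                Self::Indexed => Some(StandaloneWire::Indexed),
            }
        }
    
        /// Hosted search sends `index_gated_web_access: true` only for `indexed`.
        fn hosted_index_gated(self) -> bool {
            self == Self::Indexed
        }
    }
    ```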
    
    ## Why
    
    Indexed search provides a middle ground between cached-only search and
    unrestricted live page fetching. Search queries can remain live while
    direct page fetches are limited to URLs admitted by the server.
    
    The existing `web_search` setting remains the single source of truth, so
    hosted and standalone executors cannot drift into different access
    modes. Without an explicit `indexed` selection, the existing
    model-visible tool and request shapes are unchanged.
    
    ```toml
    web_search = "indexed"
    
    [features]
    standalone_web_search = true
    ```
    
    ## Validation
    
    - `just fmt`
    - `just test -p codex-api` (`126 passed`)
    - `just test -p codex-web-search-extension` (`7 passed`)
    - `just test -p codex-core
    code_mode_can_call_indexed_standalone_web_search` (`1 passed`)
    - Focused configuration, hosted request, standalone request, and
    managed-requirement coverage is included in the PR; remaining suites run
    in CI.
    
    The full workspace test suite was not run locally.
  • [codex] add clock current-time tool (#29011)
    ## Summary
    - expose `clock.curr_time` when current-time reminders are enabled
    - query the session's configured time provider with the calling thread
    id
    - return the existing UTC reminder text for direct model calls
    - return `{ "current_time": "YYYY-MM-DD HH:MM:SS UTC" }` in Code Mode
    
    Clock lookup failures remain fatal, matching pre-inference reminder
    behavior.
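    
    The Code Mode result shape can be sketched without a date library. This is a hypothetical illustration under several assumptions:
    
    - the function names are invented
    - the time provider is replaced by a plain Unix-seconds argument
    - the date conversion uses Howard Hinnant's days-to-civil algorithm
    
    A real implementation would more likely use a date crate and a JSON serializer. The direct-call reminder text is not reproduced because the PR does not show it.
    
    ```rust
    /// Converts days since 1970-01-01 to (year, month, day) in the proleptic
    /// Gregorian calendar (Howard Hinnant's `civil_from_days`).
    fn civil_from_days(days: i64) -> (i64, u32, u32) {
        let z = days + 719_468;
        let era = (if z >= 0 { z } else { z - 146_096 }) / 146_097;
        let doe = z - era * 146_097; // day of era, [0, 146096]
        let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // [0, 399]
        let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year, March-based
        let mp = (5 * doy + 2) / 153; // month index starting at March, [0, 11]
        let day = (doy - (153 * mp + 2) / 5 + 1) as u32;
        let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
        let year = yoe + era * 400 + i64::from(month <= 2);
        (year, month, day)
    }
    
    /// Formats Unix seconds as `YYYY-MM-DD HH:MM:SS UTC`.
    fn format_utc(unix_seconds: i64) -> String {
        let (y, m, d) = civil_from_days(unix_seconds.div_euclid(86_400));
        let secs = unix_seconds.rem_euclid(86_400);
        format!(
            "{:04}-{:02}-{:02} {:02}:{:02}:{:02} UTC",
            y, m, d, secs / 3_600, (secs % 3_600) / 60, secs % 60
        )
    }
    
    /// Hypothetical Code Mode payload: `{ "current_time": "..." }`.
    fn code_mode_current_time(unix_seconds: i64) -> String {
        format!("{{ \"current_time\": \"{}\" }}", format_utc(unix_seconds))
    }
    ```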
    
    ## Testing
    - `just test -p codex-core current_time_tool_returns_the_latest_time`
    - `just test -p codex-core
    code_mode_current_time_returns_structured_result`
    - `just fix -p codex-core`
  • [codex] Test code-mode variable truncation (#28471)
    ## Summary
    
    Code mode has two separate truncation points: the nested tool result
    returned to JavaScript and the code-mode output later recorded for the
    model. These tests now verify those behaviors independently.
    
    - Report whether `result.output` was truncated before printing it.
    - Verify omitted or sufficiently large nested limits produce `Variable
    truncated: False`, while allowing the printed value to be truncated
    downstream.
    - Verify an explicit nested limit produces `Variable truncated: True`
    when the command output exceeds it.
    - Use a token-policy model fixture so downstream truncation is visible
    as `…N tokens truncated…`.
    - Align the explicit nested-truncation expectation with the warning
    header.
    
    This PR changes test coverage only; runtime truncation behavior is
    unchanged.
    
    ## Validation
    
    - `env -u CODEX_SANDBOX_NETWORK_DISABLED RUST_MIN_STACK=8388608 cargo
    test -p codex-core --test all code_mode_exec -- --nocapture` (8 passed)
  • code-mode: extend test coverage to lock in cell lifecycle (#28468)
    This PR establishes the intended behavior as an executable contract
    before a refactor of the cell runtime begins. It also fixes cases where
    a second observer or termination request could replace an existing
    response channel and leave the original caller unresolved.
    
    ### Behavior codified
    - A cell can yield output and subsequently resume to completion.
    - A caller can run a cell until it has no immediately runnable work,
    receive its accumulated output and outstanding tool-call IDs, and then
    resume the same cell when the awaited work is available.
    - Each cell admits one active observer:
      - a second observer receives an explicit busy error
      - the existing observer remains registered and is not displaced
    - A natural result (the conclusion of the JS module) that has already
    reached the cell controller wins over a later termination request.
    - Otherwise, termination preempts execution and resolves both:
      - the active observer, if present
      - the caller requesting termination
    - Repeated termination requests are rejected while termination is
    already in progress.
    - Terminal responses are sent only after outstanding callback work has
    been handled:
      - natural completion drains notifications and cancels outstanding
      tool calls
      - termination cancels and drains both notification and tool callbacks
    - Cell removal and the `cell_closed` notification happen after callback
    cleanup.
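    
    The observer and termination rules above can be condensed into a synchronous state machine. This is a simplification under stated assumptions. The real runtime is asynchronous and channel-based, and here callback cleanup appears only as a comment. `CellController`, `CellError`, and `TerminationOutcome` are hypothetical names.
    
    ```rust
    /// Errors surfaced to callers (hypothetical names).
    #[derive(Debug, PartialEq, Eq)]
    enum CellError {
        /// A second observer tried to attach while one is active.
        Busy,
        /// Termination was requested while termination is already in progress.
        AlreadyTerminating,
    }
    
    #[derive(Debug, PartialEq, Eq)]
    enum TerminationOutcome {
        /// The natural result had already reached the controller, so it wins.
        NaturalResultWon,
        /// Execution was preempted; `resolved_observer` reports whether an
        /// active observer was resolved alongside the terminating caller.
        Terminated { resolved_observer: bool },
    }
    
    #[derive(Default)]
    struct CellController {
        observer_active: bool,
        natural_result_ready: bool,
        terminating: bool,
    }
    
    impl CellController {
        fn attach_observer(&mut self) -> Result<(), CellError> {
            if self.observer_active {
                // The existing observer stays registered; the newcomer is rejected.
                return Err(CellError::Busy);
            }
            self.observer_active = true;
            Ok(())
        }
    
        fn on_natural_result(&mut self) {
            self.natural_result_ready = true;
        }
    
        fn request_termination(&mut self) -> Result<TerminationOutcome, CellError> {
            if self.terminating {
                return Err(CellError::AlreadyTerminating);
            }
            if self.natural_result_ready {
                return Ok(TerminationOutcome::NaturalResultWon);
            }
            self.terminating = true;
            // Real code would cancel and drain notification and tool callbacks
            // here, before any terminal response is sent.
            let resolved_observer = std::mem::take(&mut self.observer_active);
            Ok(TerminationOutcome::Terminated { resolved_observer })
        }
    }
    ```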
  • [codex] Warn clearly when code mode output is truncated (#28467)
    ## Summary
    
    - make `formatted_truncate_text` prepend `Warning: truncated output
    (original token count: N)` above the existing `Total output lines`
    header
    - update direct formatter, unified-exec, user-shell, and code-mode
    expectations
    - add core unit coverage that runs in Bazel without requiring the
    skipped V8-backed code-mode integration suite
    
    ## Validation
    
    - `cargo test -p codex-utils-output-truncation -- --nocapture` (17
    passed)
    - `cargo test -p codex-core --lib
    truncated_text_output_starts_with_warning -- --nocapture`
    - `cargo test -p codex-core --test all
    clamps_model_requested_max_output_tokens_to_policy -- --nocapture` (2
    passed)
    - `cargo test -p codex-core --test all
    unified_exec_formats_large_output_summary -- --nocapture`
    - `cargo test -p codex-core --test all
    user_shell_command_output_is_truncated_in_history -- --nocapture`
    - Bazel CI exercises the shared formatter and downstream integration
    expectations
  • [codex] Use expect in integration tests (#28441)
    The workspace denies `clippy::expect_used` in production. Although
    `clippy.toml` allows `expect` in tests, Bazel Clippy compiles
    integration-test helper code in a way that does not receive that
    exemption, which encouraged verbose `unwrap_or_else(... panic!(...))`
    and equivalent `match`/`let else` forms.
    
    This allows `clippy::expect_used` once at each integration-test crate
    root (including aggregated suites and test-support libraries), then
    replaces manual panic-based Result and Option unwraps with
    `expect`/`expect_err`. Standalone `tests/*.rs` files remain their own
    crate roots. Intentional assertion and unexpected-variant panics remain
    unchanged, and the production `expect_used = "deny"` lint remains in
    place.
    
    The cleanup is mechanical and net-negative in line count.
  • Run core integration tests against a Wine-backed Windows executor (#28401)
    ## Why
    
    We want to exercise a Linux app-server against a Windows exec-server
    without repeating every test case. This approach has some precedent in
    the remote Docker test setup.
    
    ## What
    
    Run the shared `codex-core` integration suite against Windows
    exec-server behavior from Linux. This makes cross-OS path and shell
    regressions visible while keeping unsupported cases owned by individual
    tests.
    
    - Add `local`, `docker`, and `wine-exec` test environment selection with
    legacy Docker compatibility.
    - Extend `codex_rust_crate` to generate a sharded Wine-exec variant
    using a cross-built Windows server and pinned Bazel Wine/PowerShell
    runtimes.
    - Teach remote-aware helpers about Windows paths and track temporary
    incompatibilities with source-local `skip_if_wine_exec!` calls and
    follow-up reasons.
  • Represent dynamic tools with explicit namespaces internally (#27365)
    Follow-up to #27356.
    
    ## Stack note
    
    This PR changes Codex's internal dynamic-tool shape while leaving
    `thread/start` unchanged. App-server therefore converts the existing
    per-tool input into explicit functions and namespaces before passing it
    to core.
    
    [#27371](https://github.com/openai/codex/pull/27371) updates
    `thread/start` to use the same explicit shape and removes this temporary
    conversion.
    
    ## Why
    
    Dynamic tools repeat namespace metadata on every function. Core should
    keep one explicit namespace with its member tools so descriptions and
    membership stay consistent across sessions and runtime planning.
    
    ## What changed
    
    - Represent dynamic tools as top-level functions or explicit namespaces
    in protocol and session state.
    - Read old flat rollout metadata and write the canonical hierarchy.
    - Flatten namespace members only when registering callable tools.
    - Keep `thread/start.dynamicTools` flat for now and normalize it at the
    app-server boundary.
    
    New builds can read old rollout metadata. Older builds cannot read newly
    written hierarchical metadata.
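    
    The flat-to-hierarchical conversion and the flattening at registration time can be sketched as follows. The types and functions are hypothetical (`FlatTool`, `DynamicToolEntry`, `normalize`, `callable_tools`). Namespaces keep first-seen order, and the first member's namespace description wins. The PR does not say how conflicting descriptions are resolved.
    
    ```rust
    /// Legacy flat shape: every function repeats its namespace metadata.
    #[derive(Debug, Clone)]
    struct FlatTool {
        namespace: Option<String>,
        namespace_description: Option<String>,
        name: String,
    }
    
    /// Canonical hierarchy: top-level functions or explicit namespaces.
    #[derive(Debug, PartialEq)]
    enum DynamicToolEntry {
        Function { name: String },
        Namespace { name: String, description: Option<String>, tools: Vec<String> },
    }
    
    /// Groups flat tools by namespace, preserving first-seen order.
    fn normalize(flat: Vec<FlatTool>) -> Vec<DynamicToolEntry> {
        let mut out: Vec<DynamicToolEntry> = Vec::new();
        for tool in flat {
            match tool.namespace {
                None => out.push(DynamicToolEntry::Function { name: tool.name }),
                Some(ns) => {
                    let existing = out.iter_mut().find_map(|entry| match entry {
                        DynamicToolEntry::Namespace { name, tools, .. } if *name == ns => Some(tools),
                        _ => None,
                    });
                    match existing {
                        Some(tools) => tools.push(tool.name),
                        None => out.push(DynamicToolEntry::Namespace {
                            name: ns,
                            description: tool.namespace_description,
                            tools: vec![tool.name],
                        }),
                    }
                }
            }
        }
        out
    }
    
    /// Flattens the hierarchy into callable (namespace, name) pairs.
    fn callable_tools(entries: &[DynamicToolEntry]) -> Vec<(Option<&str>, &str)> {
        let mut out = Vec::new();
        for entry in entries {
            match entry {
                DynamicToolEntry::Function { name } => out.push((None, name.as_str())),
                DynamicToolEntry::Namespace { name, tools, .. } => {
                    for tool in tools {
                        out.push((Some(name.as_str()), tool.as_str()));
                    }
                }
            }
        }
        out
    }
    ```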
    
    ## Test plan
    
    - `just test -p codex-app-server
    thread_start_normalizes_legacy_dynamic_tools_into_model_request`
    - `just test -p codex-protocol
    session_meta_normalizes_legacy_dynamic_tools`
    - `just test -p codex-core
    resume_restores_dynamic_tools_from_rollout_with_sqlite_enabled`
    - `just test -p codex-core
    tool_search_returns_deferred_dynamic_tool_and_routes_follow_up_call`
    - `just test -p codex-core code_mode_can_call_hidden_dynamic_tools`
    - `just test -p codex-tools`
  • [code-mode] Reject remote image URLs from output helpers (#27732)
    ## Summary
    
    - reject HTTP(S) image URLs from the shared code-mode output-image
    normalization path
    - return a concise model-visible tool error so the model can recover on
    its next turn
    - apply the targeted rejection to both `image()` and `generatedImage()`
    - leave other non-empty image URL values to existing downstream handling
    
    The returned error is:
    
    > Tool call failed: remote image URLs are not supported in tool outputs.
    Pass a base64 data URI instead
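    
    A hedged sketch of the shared check. The error text is quoted from the description. The PR does not say whether the real code adds the `Tool call failed:` prefix itself or a caller does, so the sketch includes the full text. The function name is hypothetical. Matching the scheme case-insensitively is an assumption, consistent with URL schemes being case-insensitive.
    
    ```rust
    /// Error text quoted in the PR description.
    const REMOTE_IMAGE_ERROR: &str =
        "Tool call failed: remote image URLs are not supported in tool outputs. Pass a base64 data URI instead";
    
    /// Hypothetical check shared by `image()` and `generatedImage()`: HTTP(S)
    /// URLs are rejected, and any other value passes through unchanged to the
    /// existing downstream handling.
    fn check_output_image_url(url: &str) -> Result<&str, &'static str> {
        let is_remote = ["http://", "https://"].iter().any(|scheme| {
            url.get(..scheme.len())
                .is_some_and(|prefix| prefix.eq_ignore_ascii_case(scheme))
        });
        if is_remote { Err(REMOTE_IMAGE_ERROR) } else { Ok(url) }
    }
    ```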
    
    ## Why
    
    Responses Lite cannot lower a remote image URL emitted from a structured
    tool output. Rejecting HTTP(S) values in the Codex harness preserves the
    tool-call metadata and gives the model a recoverable next turn instead
    of invalidating the sample.
    
    ## Test coverage
    
    The regression is covered primarily by a `test_codex()` agent
    integration test that simulates the Responses API exchange and asserts
    the failed model-visible exec output. A supplemental runtime test covers
    both `http://` and `https://` inputs across both image output helpers.
    
    ## Test plan
    
    - `cd codex-rs && just test -p codex-code-mode`
    - `cd codex-rs && just test -p codex-code-mode-protocol`
    - `cd codex-rs && just test -p codex-core
    code_mode_image_helper_rejects_remote_url`
    - `cd codex-rs && just fmt`
    - `git diff --check origin/main...HEAD`
    
    Related context: https://github.com/openai/openai/pull/1022346
  • Keep request_user_input direct-model only (#27316)
    ## Why
    
    `request_user_input` has direct blocking semantics when invoked by the
    model. When it is exposed as a nested code-mode tool, the call has to
    flow through code-mode waiting and continuation behavior instead, which
    is not the behavior we want for this user-input request surface.
    
    ## What changed
    
    - Mark `request_user_input` with `ToolExposure::DirectModelOnly` when
    registering the core utility tool.
    - Keep `request_user_input` direct-model visible, including in
    code-mode-only planning.
    - Add focused `spec_plan_tests` coverage that verifies
    `request_user_input` remains visible and registered as
    direct-model-only, while it is omitted from the nested code-mode tool
    description.
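    
    Exposure filtering can be sketched as one registry that feeds two views. Only `ToolExposure::DirectModelOnly` is named in the PR. The `Everywhere` variant, `ToolSpec`, and both functions are hypothetical.
    
    ```rust
    /// Hypothetical exposure modes; only `DirectModelOnly` is named in the PR.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum ToolExposure {
        /// Callable directly by the model and from nested code-mode calls.
        Everywhere,
        /// Callable directly by the model, never listed inside code mode.
        DirectModelOnly,
    }
    
    struct ToolSpec {
        name: &'static str,
        exposure: ToolExposure,
    }
    
    /// Tools the model can call directly (every registered tool in this sketch).
    fn direct_model_tools(tools: &[ToolSpec]) -> Vec<&'static str> {
        tools.iter().map(|t| t.name).collect()
    }
    
    /// Tools listed in the nested code-mode tool description.
    fn nested_code_mode_tools(tools: &[ToolSpec]) -> Vec<&'static str> {
        tools
            .iter()
            .filter(|t| t.exposure != ToolExposure::DirectModelOnly)
            .map(|t| t.name)
            .collect()
    }
    ```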
    
    No active goal suppression or runtime unavailability behavior is
    included in this PR.
    
    ## Validation
    
    - No new build/test run for this housekeeping pass, per maintainer
    request.
    - Earlier targeted run, confirmed from session context: `just test -p
    codex-core request_user_input` passed.
  • Include thread id in token budget context (#27663)
    ## Why
    
    The token budget full-context fragment identifies the current context
    window, but not the thread that owns that window. Including the thread
    id makes the initial context-window metadata self-contained, and
    `get_context_remaining` also needs to be usable from Code Mode without
    forcing callers to parse the model-facing fragment string.
    
    ## What changed
    
    - Include the session thread id in the initial `<token_budget>` context
    fragment.
    - Expose `get_context_remaining` as a Code Mode nested tool while
    keeping `new_context` direct-model-only.
    - Keep direct model-facing `get_context_remaining` output as the
    existing `<token_budget>` text fragment.
    - Return only `tokens_left` from the Code Mode structured result for
    `get_context_remaining`.
    - Update token-budget integration tests and add Code Mode coverage for
    the structured result.
    
    ## Verification
    
    - `just test -p codex-core token_budget`
    - `just test -p codex-core
    code_mode_get_context_remaining_returns_structured_result`
    - `just test -p core_test_support redacted_text_mode_normalizes_uuids`
  • core: resize all history images behind a feature flag (#27247)
    ## Summary
    
    Adds complete client-side image preparation behind the default-off
    `resize_all_images` feature flag.
    
    When enabled, local image producers defer decoding and resizing. Images
    are prepared centrally before insertion into conversation history,
    covering user input, `view_image`, and structured tool-output images.
    
    ## Behavior
    
    - Processes base64 `data:` images in messages and function/custom tool
    outputs.
    - Leaves non-data URLs, including HTTP(S) URLs, unchanged.
    - Applies image-detail budgets:
      - `high` and omitted: 2048px maximum dimension and 2.5K 32px patches.
      - `original`: 6000px maximum dimension and 10K 32px patches.
      - `auto`: uses the same 2048px / 2.5K-patch budget as high.
      - `low`: unsupported and replaced with an actionable placeholder.
    - Preserves original image bytes when no resize or format conversion is
    needed.
    - Enforces the shared 1 GiB encoded and decoded data-URL sanity limits.
    - Replaces only an image that fails preparation, preserving sibling
    content and tool-output metadata.
    - Uses bounded placeholders distinguishing generic processing failures,
    oversized images, and unsupported `low` detail.
    - Prepares resumed and forked history before installing it as live
    history without modifying persisted rollouts.
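    
    The detail budgets can be sketched as a lookup plus a downscaling loop. The numeric limits come from the list above. "2.5K" and "10K" are read as 2,500 and 10,000 patches, which is an assumption. `budget`, `target_size`, and the 5%-per-step shrink are hypothetical. They only illustrate satisfying both limits while preserving aspect ratio. Returning `None` stands for keeping the original bytes.
    
    ```rust
    /// Detail levels named in the PR.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum ImageDetail {
        Auto,
        Low,
        High,
        Original,
    }
    
    const PATCH: u64 = 32;
    
    /// (max dimension in px, max number of 32px patches) per detail level.
    fn budget(detail: Option<ImageDetail>) -> Result<(u32, u64), &'static str> {
        match detail {
            None | Some(ImageDetail::High) | Some(ImageDetail::Auto) => Ok((2_048, 2_500)),
            Some(ImageDetail::Original) => Ok((6_000, 10_000)),
            Some(ImageDetail::Low) => Err("`low` image detail is not supported"),
        }
    }
    
    fn patches(width: u32, height: u32) -> u64 {
        u64::from(width).div_ceil(PATCH) * u64::from(height).div_ceil(PATCH)
    }
    
    /// `None` when the image already fits; otherwise a downscaled size that
    /// satisfies both the dimension and the patch limits.
    fn target_size(width: u32, height: u32, max_dim: u32, max_patches: u64) -> Option<(u32, u32)> {
        if width.max(height) <= max_dim && patches(width, height) <= max_patches {
            return None;
        }
        let mut scale = (f64::from(max_dim) / f64::from(width.max(height))).min(1.0);
        loop {
            let w = ((f64::from(width) * scale).floor() as u32).max(1);
            let h = ((f64::from(height) * scale).floor() as u32).max(1);
            if patches(w, h) <= max_patches {
                return Some((w, h));
            }
            // Shrink gradually until the patch budget is met (1x1 always fits).
            scale *= 0.95;
        }
    }
    ```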
    
    ## Flag-Off Behavior
    
    When `resize_all_images` is disabled:
    
    - Existing local user-input and `view_image` processing remains
    unchanged.
    - Existing decoding and error behavior remains unchanged.
    - Arbitrary tool-output images are not processed.
    - HTTP(S) image URLs continue to be forwarded unchanged.
    
    
    #### [git stack](https://github.com/magus/git-stack-cli)
    -  `1` https://github.com/openai/codex/pull/27245
    - 👉 `2` https://github.com/openai/codex/pull/27247
    -  `3` https://github.com/openai/codex/pull/27246
    -  `4` https://github.com/openai/codex/pull/27266
  • Pair thread environment settings (#26687)
    ## Why
    
    Thread cwd and environment selections are a single logical setting in
    core: updating one without the other can silently desynchronize the
    next-turn execution context. This change makes that relationship
    explicit in the internal thread settings flow while preserving the
    existing app-server public API shape.
    
    ## What changed
    
    - Moved the cwd/environment pair through internal
    `ThreadSettingsOverrides.environment_settings` instead of a top-level
    internal `cwd` field.
    - Kept `thread/settings/update` public params unchanged, with app-server
    translating top-level `cwd` into the paired internal settings shape.
    - Moved `Op::UserInput` environment overrides into thread settings so
    user turns and settings updates use the same core path.
    - Updated core, app-server, MCP, memories, sample, and test callsites to
    construct the paired settings shape.
    
    ## Verification
    
    - `git diff --check`
    - Local test run starting after PR creation.
  • [codex] Enable standalone web search in code mode (#26719)
    ## What
    
    - Consume plaintext `output` from standalone search while retaining
    optional `encrypted_output` parsing.
    - Expose `web.run` to code mode and return search output to nested
    JavaScript calls.
    - Cover direct and code-mode standalone search paths with integration
    tests.
    
    ## Why
    
    `/v1/alpha/search` now returns plaintext output, which code mode needs
    to consume standalone search results.
    
    ## Test plan
    
    - `just test -p codex-api`
    - `just test -p codex-web-search-extension`
    - `just test -p codex-core code_mode_can_call_standalone_web_search`
    - `just test -p codex-app-server
    standalone_web_search_round_trips_output`
  • Require absolute cwd in thread settings (#26532)
    ## Why
    
    Thread settings cwd overrides are expected to be resolved before they
    enter core. Keeping this boundary as a plain `PathBuf` made it easy for
    core/session code to keep fallback normalization and relative-path
    resolution logic in places that should only receive an already-resolved
    cwd.
    
    This is intentionally the absolute-cwd-only slice: it does not change
    environment selection stickiness or cwd-to-default-environment fallback
    behavior.
    
    ## What changed
    
    - Changes `ThreadSettingsOverrides.cwd`,
    `CodexThreadSettingsOverrides.cwd`, and `SessionSettingsUpdate.cwd` to
    use `AbsolutePathBuf`.
    - Removes core-side cwd normalization/resolution from session settings
    updates.
    - Updates affected core/app-server test helpers and callsites to pass
    existing absolute cwd values or use `abs()` helpers.
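    
    The type-level boundary can be sketched with a newtype that can only be constructed from an absolute path. Codex's actual type is `AbsolutePathBuf`. This sketch uses a hypothetical `AbsoluteCwd` rather than guess that type's API, and the usage paths assume a Unix-style filesystem.
    
    ```rust
    use std::path::{Path, PathBuf};
    
    /// Hypothetical newtype: only an absolute path can construct it, so code
    /// that receives one needs no relative-path resolution of its own.
    #[derive(Debug, Clone, PartialEq, Eq)]
    struct AbsoluteCwd(PathBuf);
    
    impl AbsoluteCwd {
        /// Returns the rejected path on failure so callers can report it.
        fn new(path: impl Into<PathBuf>) -> Result<Self, PathBuf> {
            let path = path.into();
            if path.is_absolute() { Ok(Self(path)) } else { Err(path) }
        }
    
        fn as_path(&self) -> &Path {
            &self.0
        }
    }
    
    /// Hypothetical settings update: a relative cwd is unrepresentable.
    struct SettingsUpdate {
        cwd: Option<AbsoluteCwd>,
    }
    ```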
    
    ## Validation
    
    Opening as draft so CI can start while local validation continues.
  • ci: test windows cross build (#25000)
    We cross-build when using Bazel for Windows. This caused a couple of
    hiccups: V8's `mksnapshot` step expects to snapshot on the host
    architecture, which did not match when cross-building. As a result,
    starting code mode from a cross-built artifact segfaulted.
    
    This change cross-builds the library and then generates and links the
    snapshot on the host machine and architecture, which is Windows. This
    yields a functional snapshot and library that can start code mode on
    Windows.
    
    This fixes the build and two test regressions we had.
  • core: allow excluding tool namespaces from code mode (#26320)
    ## Why
    
    Research and training setups need to control which tool namespaces
    appear inside code mode's nested `tools` surface without disabling those
    tools entirely. This makes it possible to train against a deliberately
    reduced nested-tool setup while preserving the normal direct and
    deferred tool paths.
    
    ## What
    
    - Extend `features.code_mode` to accept structured configuration while
    preserving the existing boolean syntax.
    - Add an exact `excluded_tool_namespaces` list under
    `[features.code_mode]`:
    
      ```toml
      [features.code_mode]
      enabled = true
      excluded_tool_namespaces = ["mcp__codex_apps", "multi_agent_v1"]
      ```
    
    - Filter matching canonical `ToolName` namespaces when constructing code
    mode's nested router and code-mode-specific direct tool descriptions.
    - Keep excluded tools registered, directly exposed in mixed code mode,
    and discoverable through top-level `tool_search` when otherwise
    eligible.
    - Derive deferred nested-tool guidance after namespace filtering so the
    `exec` description does not advertise excluded-only deferred tools.
    - Preserve the boolean/table representation when materializing config
    locks and update the generated config schema.
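    
    A sketch of the two config shapes and exact-match namespace filtering. The config keys and example namespaces come from the TOML above. `CodeModeConfig`, `ToolId`, `nested_tools`, and the tool names are hypothetical. The filter affects only the nested surface; registration and top-level exposure are untouched, as described.
    
    ```rust
    use std::collections::HashSet;
    
    /// `features.code_mode` accepts either a boolean or a table.
    #[derive(Debug)]
    enum CodeModeConfig {
        Bool(bool),
        Table { enabled: bool, excluded_tool_namespaces: Vec<String> },
    }
    
    impl CodeModeConfig {
        fn enabled(&self) -> bool {
            match self {
                Self::Bool(enabled) | Self::Table { enabled, .. } => *enabled,
            }
        }
    
        fn excluded(&self) -> HashSet<&str> {
            match self {
                Self::Bool(_) => HashSet::new(),
                Self::Table { excluded_tool_namespaces, .. } => {
                    excluded_tool_namespaces.iter().map(String::as_str).collect()
                }
            }
        }
    }
    
    /// Hypothetical tool identifier: optional namespace plus name.
    struct ToolId {
        namespace: Option<&'static str>,
        name: &'static str,
    }
    
    /// Tools exposed through code mode's nested `tools`; exact namespace
    /// matches are dropped, and tools without a namespace are kept.
    fn nested_tools<'a>(all: &'a [ToolId], config: &CodeModeConfig) -> Vec<&'a ToolId> {
        if !config.enabled() {
            return Vec::new();
        }
        let excluded = config.excluded();
        all.iter()
            .filter(|tool| tool.namespace.map_or(true, |ns| !excluded.contains(ns)))
            .collect()
    }
    ```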
    
    ## Testing
    
    - `just test -p codex-features`
    - `just test -p codex-config`
    - `just test -p codex-core load_config_resolves_code_mode_config`
    - `just test -p codex-core
    lock_contains_prompts_and_materializes_features`
    - `just test -p codex-core
    excluded_deferred_namespaces_do_not_enable_nested_tool_guidance`
    - `just test -p codex-core
    code_mode_excludes_configured_nested_tool_namespaces`
    - `cargo check -p codex-thread-manager-sample`
  • app-server: remove experimental persist_extended_history bool flag (#25712)
    ## Summary
    
    Remove the dead experimental `persistExtendedHistory` app-server flag
    and collapse rollout persistence to the single policy app-server already
    used.
    
    ## What Changed
    
    - Removed `persistExtendedHistory` from v2 thread start/resume/fork
    params and deleted its deprecation notice path.
    - Removed the persistence-mode enums and plumbing through core, rollout,
    and thread-store.
    - Made rollout filtering mode-free, keeping the existing limited
    persisted-history behavior.
    
    ## Test Plan
    
    - `just write-app-server-schema`
    - `cargo nextest run --no-fail-fast -p codex-app-server-protocol
    schema_fixtures`
    - `cargo nextest run --no-fail-fast -p codex-app-server
    thread_shell_command_history_responses_exclude_persisted_command_executions`
    - `cargo nextest run --no-fail-fast -p codex-rollout -p
    codex-thread-store`
    - final `rg` for removed flag/type names
  • [codex] Wait for MCP readiness in core integration tests (#24964)
    Ensures MCP-backed `codex-core` integration tests exercise initialized
    servers instead of racing server startup.
    
    I've been idly investigating a few flakes and the failure modes are much
    more confusing when a tool call fails because of a failed server start
    than when the failed server start causes the test to fail directly.
  • Add experimental turn additional context (#24154)
    ## Summary
    
    Adds experimental `additionalContext` support to `turn/start` and
    `turn/steer` so clients can provide ephemeral external context, such as
    browser or automation state, without turning that plumbing into a
    visible user prompt or triggering user-prompt lifecycle behavior.
    
    ## API Shape
    
    The parameter shape is:
    
    ```ts
    additionalContext?: Record<string, {
      value: string
      kind: "untrusted" | "application"
    }> | null
    ```
    
    Example:
    
    ```json
    {
      "additionalContext": {
        "browser_info": {
          "value": "Active tab is CI failures.",
          "kind": "untrusted"
        },
        "automation_info": {
          "value": "CI rerun is in progress.",
          "kind": "application"
        }
      }
    }
    ```
    
    The keys are opaque and caller-defined.
    
    ## Context Injection
    
    When provided, accepted entries are inserted into model context as
    hidden contextual message items, not as visible thread user-message
    items.
    
    `kind: "untrusted"` entries are inserted with role `user`:
    
    ```text
    <external_${key}>${value}</external_${key}>
    ```
    
    `kind: "application"` entries are inserted with role `developer`:
    
    ```text
    <${key}>${value}</${key}>
    ```
    
    Values are not escaped. Each value is truncated to 1k approximate tokens
    before wrapping.
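    The two wrapping rules above can be sketched as follows. This is a
    minimal TypeScript illustration, not the core implementation:
    `ContextMessage` and `wrapAdditionalContext` are hypothetical names, and
    the ~1k-token truncation step is deliberately left out.

    ```ts
    // Mirrors the documented `additionalContext` entry shape.
    type AdditionalContextEntry = {
      value: string;
      kind: "untrusted" | "application";
    };

    // Hypothetical hidden message item; the real core types differ.
    type ContextMessage = {
      role: "user" | "developer";
      text: string;
    };

    // Wrap one entry according to its kind. Values are not escaped, matching
    // the described behavior; truncation to ~1k tokens is not modeled here.
    function wrapAdditionalContext(
      key: string,
      entry: AdditionalContextEntry,
    ): ContextMessage {
      if (entry.kind === "untrusted") {
        // Untrusted context becomes a user-role item with an `external_` tag.
        return {
          role: "user",
          text: `<external_${key}>${entry.value}</external_${key}>`,
        };
      }
      // Application context becomes a developer-role item with a bare tag.
      return { role: "developer", text: `<${key}>${entry.value}</${key}>` };
    }

    // Using the example payload above:
    const browser = wrapAdditionalContext("browser_info", {
      value: "Active tab is CI failures.",
      kind: "untrusted",
    });
    // browser.text === "<external_browser_info>Active tab is CI failures.</external_browser_info>"
    ```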
    
    For `turn/start`, accepted additional context is inserted before normal
    user input. For `turn/steer`, additional context is merged only when the
    steer includes non-empty user input; context-only steers are still
    rejected as empty input.
    
    ## Dedupe Strategy
    
    `AdditionalContextStore` lives on session state and stores the latest
    complete additional-context map.
    
    Each `turn/start` or non-empty `turn/steer` treats its
    `additionalContext` as the current complete set of values. An entry is
    injected only when its key is new or the entry for that key changed
    (either its `value` or its `kind`). After merging, the store is replaced
    with the provided map, so omitted keys are removed from the retained set
    and can be injected again if they are later reintroduced.
    
    Omitting `additionalContext`, passing `null`, or passing an empty object
    resets the store to empty and injects nothing.
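    A minimal sketch of this replace-then-diff behavior, assuming a plain
    in-memory map. `AdditionalContextStoreSketch` and `merge` are
    hypothetical names, and the caller is assumed to skip `merge` for
    context-only steers, which are rejected before this point.

    ```ts
    type AdditionalContextEntry = {
      value: string;
      kind: "untrusted" | "application";
    };
    type AdditionalContext = Record<string, AdditionalContextEntry>;

    // Hypothetical stand-in for the session-scoped `AdditionalContextStore`.
    class AdditionalContextStoreSketch {
      private latest = new Map<string, AdditionalContextEntry>();

      // Called for `turn/start` and non-empty `turn/steer`. Returns the keys
      // whose entries should be injected, then replaces the retained map.
      merge(incoming?: AdditionalContext | null): string[] {
        // Omitted, `null`, and `{}` all mean "no current context".
        const source: AdditionalContext = incoming ?? {};
        const next = new Map<string, AdditionalContextEntry>(Object.entries(source));
        const toInject: string[] = [];
        next.forEach((entry, key) => {
          const previous = this.latest.get(key);
          // Inject new keys and entries whose value or kind changed.
          if (
            previous === undefined ||
            previous.value !== entry.value ||
            previous.kind !== entry.kind
          ) {
            toInject.push(key);
          }
        });
        // Replace rather than merge, so omitted keys drop out and are
        // injected again if they are reintroduced later.
        this.latest = next;
        return toInject;
      }
    }

    const store = new AdditionalContextStoreSketch();
    const browser = { value: "Active tab is CI failures.", kind: "untrusted" as const };
    store.merge({ browser_info: browser }); // ["browser_info"]: new key
    store.merge({ browser_info: browser }); // []: unchanged
    store.merge(null); // []: store reset
    store.merge({ browser_info: browser }); // ["browser_info"]: re-added
    ```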
    
    ## What Changed
    
    - Threads experimental v2 `additionalContext` through app-server into
    core turn start and steer handling.
    - Adds separate contextual fragment types for untrusted user-role
    context and application developer-role context.
    - Uses pending response input items so additional context can be
    combined with normal user input without treating it as prompt text.
    - Adds integration coverage for start/steer flow, role routing,
    dedupe/reset behavior, deletion/re-add behavior, hook-blocked input
    behavior, empty context-only steer rejection, external-fragment marker
    matching, and truncation.
  • Move MCP tool naming mode into manager (#21576)
    ## Why
    
    The `non_prefixed_mcp_tool_names` feature should be applied where MCP
    tools become model-visible, not by remapping names later in core.
    Keeping the decision in `McpConnectionManager` construction makes
    `ToolInfo` the single shaped view that spec building, deferred tool
    search, routing, and unavailable-tool placeholders can consume directly.
    
    This also preserves the existing external behavior while the feature is
    off, and keeps the feature-on behavior for code mode and hooks explicit
    at the manager boundary.
    
    ## What Changed
    
    - Add `McpToolNameMode` to `codex-mcp` and flow it through `McpConfig`
    into `McpConnectionManager::new`.
    - Normalize MCP `ToolInfo` names in the manager using either
    legacy-prefixed namespaces or non-prefixed namespaces; the legacy path
    adds `mcp__` without restoring the old trailing namespace suffix.
    - Remove the core-side MCP name remapping path so specs, tool search,
    session resolution, and unavailable-tool placeholder construction use
    the manager-provided `ToolName` values directly.
    - Keep code mode flattening on the `__` namespace separator.
    - Preserve hook compatibility by giving non-prefixed MCP hook names
    legacy `mcp__...` matcher aliases.
    - Add/adjust integration and unit coverage for non-prefixed code-mode
    behavior, hook matching with the feature on and off, and manager-level
    legacy prefixing.
    
    ## Testing
    
    - `cargo test -p codex-mcp --lib`
    - `cargo test -p codex-core --lib tools::spec::tests -- --nocapture`
    - `cargo test -p codex-core --lib mcp_tools -- --nocapture`
    - `cargo test -p codex-core --lib mcp_tool_exposure -- --nocapture`
    - `cargo test -p codex-core --test all mcp_tool -- --nocapture`
    - `cargo test -p codex-core --test all search_tool -- --nocapture`
    - `cargo test -p codex-core --test all hooks_mcp -- --nocapture`
    - `cargo test -p codex-core --test all
    code_mode_uses_non_prefixed_mcp_tool_names_when_feature_enabled --
    --nocapture`
    - `cargo test -p codex-tools`
    - `cargo test -p codex-features`
  • code-mode: merge stored values by key (#24159)
    ## Summary
    
    Change code-mode stored value updates to merge writes by key instead of
    replacing the session's complete stored-value map after each cell
    completes.
    
    Previously, each cell received a snapshot of stored values and returned
    the complete resulting map. When multiple cells ran concurrently, a
    later completion could overwrite values written by another cell because
    it committed an older snapshot.
    
    This change moves stored-value ownership into `CodeModeService`:
    
    - Each runtime starts from the service's current stored values.
    - Runtime completion reports only keys written by that cell.
    - The service merges those writes into the current stored-value map on
    successful completion.
    - Core no longer replaces its stored-value state from a cell result.
    
    As a result, concurrently executing cells can update different stored
    keys without clobbering one another.
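    The ownership model above can be sketched as a service that hands each
    cell a snapshot and merges back only that cell's writes. This is a
    hypothetical TypeScript illustration (`StoredValueServiceSketch` and
    `CellSketch` are not real types), assuming a failed cell is simply never
    passed to `completeCell`.

    ```ts
    // One running cell: reads see its own writes, then the start snapshot.
    class CellSketch {
      readonly writes = new Map<string, unknown>();
      private readonly view: ReadonlyMap<string, unknown>;

      constructor(view: ReadonlyMap<string, unknown>) {
        this.view = view;
      }

      load(key: string): unknown {
        return this.writes.has(key) ? this.writes.get(key) : this.view.get(key);
      }

      store(key: string, value: unknown): void {
        this.writes.set(key, value);
      }
    }

    // Hypothetical stand-in for the stored values owned by `CodeModeService`.
    class StoredValueServiceSketch {
      private readonly values = new Map<string, unknown>();

      // Each cell starts from a copy of the service's current values.
      startCell(): CellSketch {
        return new CellSketch(new Map(this.values));
      }

      // On successful completion, merge only the keys this cell wrote.
      completeCell(cell: CellSketch): void {
        cell.writes.forEach((value, key) => {
          this.values.set(key, value);
        });
      }

      get(key: string): unknown {
        return this.values.get(key);
      }
    }

    // Two cells start from the same state and write different keys.
    const service = new StoredValueServiceSketch();
    const first = service.startCell();
    const second = service.startCell();
    first.store("a", 1);
    second.store("b", 2);
    service.completeCell(second);
    service.completeCell(first); // merging `a` does not drop `b`
    ```

    Under the previous full-map replacement, completing `first` last would
    have committed its older snapshot and lost `b`.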
    
    Moving ownership into `CodeModeService` also prepares for a subsequent PR
    that will tie the stored values' lifetime to a new lifetime object there.
  • Route MCP servers through explicit environments (#23583)
    ## Summary
    - route each configured MCP server through an explicit per-server
    `environment_id` instead of a manager-wide remote toggle
    - default omitted `environment_id` to `local`, resolve named ids through
    `EnvironmentManager`, and fail only the affected MCP server when an
    explicit id is unknown
    - keep local stdio on the existing local launcher path for now, while
    named-environment stdio uses the selected environment backend and
    requires an absolute `cwd`
    - allow local HTTP MCP servers to keep using the ambient HTTP client
    when no local `Environment` is configured; named-environment HTTP MCPs
    use that environment's HTTP client
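    The resolution rules above can be sketched as a per-server routing pass.
    This is a hypothetical TypeScript illustration, not the Rust
    implementation: the type and function names are invented, `local` is
    treated as always resolvable because local stdio and local HTTP keep their
    existing paths, and "absolute `cwd`" is approximated with a POSIX-style
    leading `/`.

    ```ts
    type McpServerSketch = {
      name: string;
      transport: "stdio" | "http";
      environmentId?: string; // omitted means "local"
      cwd?: string;
    };

    type RouteSketch =
      | { server: string; ok: true; environmentId: string }
      | { server: string; ok: false; error: string };

    function routeMcpServers(
      servers: McpServerSketch[],
      namedEnvironments: Set<string>,
    ): RouteSketch[] {
      // Each server is resolved independently, so one bad id fails only that server.
      return servers.map((server): RouteSketch => {
        const environmentId = server.environmentId ?? "local";
        if (environmentId === "local") {
          return { server: server.name, ok: true, environmentId };
        }
        if (!namedEnvironments.has(environmentId)) {
          return { server: server.name, ok: false, error: `unknown environment_id: ${environmentId}` };
        }
        // Named-environment stdio needs an absolute cwd (POSIX-style check here).
        const absoluteCwd = server.cwd !== undefined && server.cwd.startsWith("/");
        if (server.transport === "stdio" && !absoluteCwd) {
          return { server: server.name, ok: false, error: "named-environment stdio requires an absolute cwd" };
        }
        return { server: server.name, ok: true, environmentId };
      });
    }

    const routes = routeMcpServers(
      [
        { name: "docs", transport: "http" },
        { name: "repo", transport: "stdio", environmentId: "remote", cwd: "/work" },
        { name: "typo", transport: "stdio", environmentId: "remot", cwd: "/work" },
      ],
      new Set(["remote"]),
    );
    ```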
    
    ## Validation
    - devbox Bazel build: `bazel build --bes_backend= --bes_results_url=
    //codex-rs/cli:codex //codex-rs/rmcp-client:test_stdio_server
    //codex-rs/rmcp-client:test_streamable_http_server`
    - devbox app-server config matrix with real `config.toml` /
    `environments.toml` files covering omitted local, explicit local,
    omitted local under remote default, explicit remote stdio, local HTTP
    without local env, explicit remote HTTP, local stdio without local env,
    unknown explicit env, and remote stdio without `cwd`
  • [codex] Preserve raw code-mode exec output by default (#23564)
    ## Why
    Code mode can use nested unified exec calls as data sources. When those
    calls omit `max_output_tokens`, code mode should receive raw command
    output so the script can parse or summarize it itself. When code mode
    does provide `max_output_tokens`, that explicit nested budget should be
    respected, including values above the default unified exec limit, rather
    than being capped before code mode sees the result.
    
    ## What
    - Preserve direct unified exec truncation behavior, while letting
    code-mode exec/write_stdin keep `max_output_tokens` as `None` unless
    explicitly supplied.
    - Make code-mode tool results use raw output when no explicit limit is
    present, and use the explicit nested limit directly when one is
    specified.
    - Refactor unified exec output formatting so `truncated_output` takes
    the caller-selected token budget.
    - Add e2e integration coverage for explicit nested exec limits, omitted
    nested exec limits, outer exec limit propagation, omitted-limit outputs
    that exceed both the default and a small truncation policy, explicit
    nested limits above those caps, and high explicit limits that still
    compact larger command output.
    - Reuse the code-mode turn setup helper while directly asserting the
    exact exec output item in each test.
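    The code-mode budget selection above can be sketched like this. It is a
    hypothetical TypeScript illustration: the function names are invented,
    and truncation approximates 4 characters per token and keeps a plain
    prefix, which is a simplification of the real token-based formatting.

    ```ts
    // Hypothetical approximation: 4 characters per token.
    const APPROX_CHARS_PER_TOKEN = 4;

    // Truncate to a caller-selected token budget (prefix-only simplification).
    function truncatedOutput(output: string, maxTokens: number): string {
      const maxChars = maxTokens * APPROX_CHARS_PER_TOKEN;
      return output.length <= maxChars ? output : output.slice(0, maxChars);
    }

    // Code-mode exec result: raw output unless the script supplied a budget.
    function codeModeExecOutput(rawOutput: string, maxOutputTokens?: number): string {
      if (maxOutputTokens === undefined) {
        // Omitted limit: the script parses or summarizes the raw output itself.
        return rawOutput;
      }
      // Explicit limit: used directly, even above the default unified exec cap.
      return truncatedOutput(rawOutput, maxOutputTokens);
    }
    ```

    Keeping the budget as a caller-supplied argument is what lets direct
    unified exec keep its default while code mode passes `undefined` or its
    own explicit value.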
    
    ## Testing
    - `just fmt`
    - `git diff --check`
    - Not run locally per repo guidance; CI should validate the e2e
    integration tests.
  • [3 of 7] Remove UserTurn (#23075)
    **Stack position:** [3 of 7]
    
    ## Summary
    
    This PR finishes the input-op consolidation by moving the remaining
    `Op::UserTurn` callers onto `Op::UserInput` and deleting `Op::UserTurn`.
    This touches a lot of files, but it is a low-risk mechanical migration.
    
    ## Stack
    
    1. [1 of 7] [Add thread settings to
    UserInput](https://github.com/openai/codex/pull/23080)
    2. [2 of 7] [Remove
    UserInputWithTurnContext](https://github.com/openai/codex/pull/23081)
    3. [3 of 7] [Remove
    UserTurn](https://github.com/openai/codex/pull/23075) (this PR)
    4. [4 of 7] [Placeholder for OverrideTurnContext
    cleanup](https://github.com/openai/codex/pull/23087)
    5. [5 of 7] [Replace OverrideTurnContext with
    ThreadSettings](https://github.com/openai/codex/pull/22508)
    6. [6 of 7] [Add app-server thread settings
    API](https://github.com/openai/codex/pull/22509)
    7. [7 of 7] [Sync TUI thread
    settings](https://github.com/openai/codex/pull/22510)
  • Remove ToolSearch feature toggle (#23389)
    ## Summary
    - mark `ToolSearch` as removed and ignore stale config writes for its
    legacy key
    - make search tool exposure depend only on model capability, not a
    feature toggle
    - remove app-server enablement support and prune now-obsolete test
    coverage/setup
    
    ## Verification
    - `cargo test -p codex-features`
    - `cargo test -p codex-tools`
    - `cargo test -p codex-core search_tool_requires_model_capability`
    - `cargo test -p codex-app-server experimental_feature_enablement_set_`
    
    ## Notes
    - This keeps the legacy config key as a no-op for compatibility; the
    behavior can no longer be toggled off.
    - No developer-facing docs update outside the touched app-server README
    was needed.
  • Preserve image detail in app-server inputs (#20693)
    ## Summary
    
    - Add optional image detail to user image inputs across core, app-server
    v2, thread history/event mapping, and the generated app-server
    schemas/types.
    - Preserve requested detail when serializing Responses image inputs:
    omitted detail stays on the existing `high` default, while explicit
    `original` keeps local images on the original-resolution path.
    - Support `high`/`original` consistently for tool image outputs,
    including MCP `codex/imageDetail`, code-mode image helpers, and
    `view_image`.
  • chore(features) rm Feature::ApplyPatchFreeform (#22711)
    ## Summary
    Removes the feature flag: freeform `apply_patch` is effectively on by
    default wherever we should use it, and otherwise it can be configured via
    `models.json`.
    
    ## Testing
    - [x] unit tests pass
  • Support explicit MCP OAuth client IDs (#22575)
    ## Why
    Some MCP OAuth providers require a pre-registered public client ID and
    cannot rely on dynamic client registration. Codex already supports MCP
    OAuth, but it had no way to supply that client ID from config into the
    PKCE flow.
    
    ## What changed
    - add `oauth.client_id` under `[mcp_servers.<server>]` config, including
    config editing and schema generation
    - thread the configured client ID through CLI, app-server, plugin login,
    and MCP skill dependency OAuth entrypoints
    - configure RMCP authorization with the explicit client when present,
    while preserving the existing dynamic-registration path when it is
    absent
    - add focused coverage for config parsing/serialization and OAuth URL
    generation
    
    ## Verification
    - `cargo test -p codex-config -p codex-rmcp-client -p codex-mcp -p
    codex-core-plugins`
    - `cargo test -p codex-core blocking_replace_mcp_servers_round_trips
    --lib`
    - `cargo test -p codex-core
    replace_mcp_servers_streamable_http_serializes_oauth_resource --lib`
    - `cargo test -p codex-core config_schema_matches_fixture --lib`
    
    ## Notes
    Broader local package runs still hit unrelated pre-existing stack
    overflows in:
    - `codex-app-server::in_process_start_clamps_zero_channel_capacity`
    - `codex-core::resume_agent_from_rollout_uses_edge_data_when_descendant_metadata_source_is_stale`
  • Prune unused code-mode globals (#20542)
    Hide `Atomics`, `SharedArrayBuffer`, and `WebAssembly` from the code-mode
    runtime, since the harness neither exposes worker support nor needs
    those APIs.
  • Make thread store process-scoped (#19474)
    - Build one app-server process ThreadStore from startup config and share
    it with ThreadManager and CodexMessageProcessor.
    - Remove per-thread/fork store reconstruction so effective thread config
    cannot switch the persistence backend.
    - Add params to ThreadStore create/resume for specifying thread
    metadata; without them, the metadata captured at store creation would
    be used, which is incorrect.
  • Add ThreadManager sample crate (#20141)
    Summary:
    - Add codex-thread-manager-sample, a one-shot binary that starts a
    ThreadManager thread, submits a prompt, and prints the final assistant
    output.
    - Pass ThreadStore into ThreadManager::new and expose
    thread_store_from_config for existing callsites.
    - Build the sample Config directly with only --model and prompt inputs.
    
    Verification:
    - just fmt
    - cargo check -p codex-thread-manager-sample -p codex-app-server -p
    codex-mcp-server
    - git diff --check
    
    Tests: Not run per request.
  • core tests: configure profiles directly (#20015)
    ## Summary
    - Replace legacy sandbox config setup in delegate and telemetry tests
    with direct `PermissionProfile` configuration.
    - Move no-sandbox and read-only test turns in `tools.rs`,
    `code_mode.rs`, `user_shell_cmd.rs`, and `model_visible_layout.rs` from
    legacy `SandboxPolicy` values to `PermissionProfile` helpers, while
    leaving the deny-glob read-only compatibility case for a later targeted
    cleanup.
    - Use `PermissionProfile::read_only()` where tests need managed
    read-only behavior and `PermissionProfile::Disabled` where they
    intentionally need no sandbox.
    - Reduce the number of files in `codex-rs/core/tests` that reference
    `SandboxPolicy` from 27 (after #20013) to 22.
    
    ## Testing
    - `cargo check -p codex-core --tests`
    - `just fmt`
  • tui: carry permission profiles on user turns (#18285)
    ## Why
    
    Per-turn permission overrides should use the same canonical profile
    abstraction as session configuration. That lets TUI submissions preserve
    exact configured permissions without round-tripping through legacy
    sandbox fields.
    
    ## What changed
    
    This adds `permission_profile` to user-turn operations, threads it
    through TUI/app-server submission paths, fills the new field in existing
    test fixtures, and adds coverage that composer submission includes the
    configured profile.
    
    ## Verification
    
    - `cargo test -p codex-tui permissions -- --nocapture`
    - `cargo test -p codex-core --test all permissions_messages --
    --nocapture`
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18285).
    * #18288
    * #18287
    * #18286
    * __->__ #18285
  • Add turn-scoped environment selections (#18416)
    ## Summary
    - add experimental turn/start.environments params for per-turn
    environment id + cwd selections
    - pass selections through core protocol ops and resolve them with
    EnvironmentManager before TurnContext creation
    - treat omitted selections as default behavior and empty selections as
    no environment; for non-empty selections, use the first
    environment/cwd as the turn primary
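    The three selection cases above can be sketched as a small resolver. This
    is a hypothetical TypeScript illustration: the selection shape
    (`environmentId` + `cwd`) and all names are assumptions, and resolution
    against `EnvironmentManager` is not modeled.

    ```ts
    // Hypothetical shape for one selection: an environment id plus a cwd.
    type EnvironmentSelectionSketch = { environmentId: string; cwd: string };

    type TurnEnvironmentsSketch =
      | { kind: "default" } // omitted: keep default behavior
      | { kind: "none" } // empty list: no environment for this turn
      | {
          kind: "selected";
          primary: EnvironmentSelectionSketch; // first entry is the turn primary
          all: EnvironmentSelectionSketch[];
        };

    function resolveTurnEnvironments(
      selections?: EnvironmentSelectionSketch[],
    ): TurnEnvironmentsSketch {
      if (selections === undefined) {
        return { kind: "default" };
      }
      if (selections.length === 0) {
        return { kind: "none" };
      }
      return { kind: "selected", primary: selections[0], all: selections };
    }
    ```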
    
    ## Testing
    - ran `just fmt`
    - ran `just write-app-server-schema`
    - not run: unit tests for this stacked PR
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • [tool search] support namespaced deferred dynamic tools (#18413)
    Deferred dynamic tools need to round-trip a namespace so a tool returned
    by `tool_search` can be called through the same registry key that core
    uses for dispatch.
    
    This change adds namespace support for dynamic tool specs/calls,
    persists it through app-server thread state, and routes dynamic tool
    calls by full `ToolName` while still sending the app the leaf tool name.
    Deferred dynamic tools must provide a namespace; non-deferred dynamic
    tools may remain top-level.
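    The routing rule above can be sketched as a registry keyed by the full
    name that hands back only the leaf name. This is a hypothetical
    TypeScript illustration: `ToolNameSketch` stands in for the Rust
    `ToolName`, and the JSON-encoded map key is an invented way to combine
    namespace and name without ambiguity.

    ```ts
    // Hypothetical stand-in for `ToolName`: optional namespace plus leaf name.
    type ToolNameSketch = { namespace: string | null; name: string };

    class DynamicToolRegistrySketch {
      private readonly tools = new Map<string, ToolNameSketch>();

      // Encode the full name so ("a", "b") and (null, "a/b") cannot collide.
      private static key(tool: ToolNameSketch): string {
        return JSON.stringify([tool.namespace, tool.name]);
      }

      register(spec: ToolNameSketch & { deferred: boolean }): void {
        // Deferred tools must be namespaced; non-deferred ones may be top-level.
        if (spec.deferred && spec.namespace === null) {
          throw new Error(`deferred dynamic tool ${spec.name} must provide a namespace`);
        }
        this.tools.set(DynamicToolRegistrySketch.key(spec), {
          namespace: spec.namespace,
          name: spec.name,
        });
      }

      // Route by the full name, but return only the leaf name for the app.
      dispatch(call: ToolNameSketch): string {
        const tool = this.tools.get(DynamicToolRegistrySketch.key(call));
        if (tool === undefined) {
          throw new Error(`unknown dynamic tool ${call.name}`);
        }
        return tool.name;
      }
    }
    ```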
    
    It also introduces `LoadableToolSpec` as the shared
    function-or-namespace Responses shape used by both `tool_search` output
    and dynamic tool registration, so dynamic tools use the same wrapping
    logic in both paths.
    
    Validation:
    - `cargo test -p codex-tools`
    - `cargo test -p codex-core tool_search`
    
    ---------
    
    Co-authored-by: Sayan Sisodiya <sayan@openai.com>
  • Update models.json (#18586)
    - Replace the active models-manager catalog with the deleted core
    catalog contents.
    - Replace stale hardcoded test model slugs with current bundled model
    slugs.
    - Keep this as a stacked change on top of the cleanup PR.
  • Update image outputs to default to high detail (#18386)
    Do not assume the default `detail`.
  • Add server-level approval defaults for custom MCP servers (#17843)
    ## Summary
    - Add `default_tools_approval_mode` support for custom MCP server
    configs, matching the existing `codex_apps` behavior
    - Apply approval precedence as per-tool override, then server default,
    then `auto`
    - Update config serialization, CLI display, schema generation, docs, and
    tests
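    The precedence rule above can be sketched in one expression. This is a
    hypothetical TypeScript view of the settings, not the Rust config types:
    only `default_tools_approval_mode` and the `auto` fallback come from this
    PR, while the per-tool override map and field names are assumptions.

    ```ts
    // Only `auto` is named in this PR; other modes stay plain strings here.
    type ApprovalMode = string;

    // Hypothetical view of a custom MCP server's approval settings.
    interface McpServerApprovalSketch {
      defaultToolsApprovalMode?: ApprovalMode; // mirrors `default_tools_approval_mode`
      perToolApprovalMode?: ReadonlyMap<string, ApprovalMode>; // hypothetical overrides
    }

    // Per-tool override, then server default, then `auto`.
    function effectiveApprovalMode(
      server: McpServerApprovalSketch,
      tool: string,
    ): ApprovalMode {
      return (
        server.perToolApprovalMode?.get(tool) ??
        server.defaultToolsApprovalMode ??
        "auto"
      );
    }
    ```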
    
    ## Testing
    - `cargo check -p codex-config`
    - `cargo check -p codex-core`
    - `just write-config-schema`
    - `just fmt`
    - `cargo test -p codex-config`
    - Targeted `codex-core` tests for config parsing, config writes, and MCP
    approval precedence
    - `just fix -p codex-config -p codex-core`
  • [code mode] defer mcp tools from exec description (#17287)
    ## Summary
    - hide deferred MCP/app nested tool descriptions from the `exec` prompt
    in code mode
    - add short guidance that omitted nested tools are still available
    through `ALL_TOOLS`
    - cover the code_mode_only path with an integration test that discovers
    and calls a deferred app tool
    
    ## Motivation
    `code_mode_only` exposes only top-level `exec`/`wait`, but the `exec`
    description could still include a large nested-tool reference. This
    keeps deferred nested tools callable while avoiding that prompt bloat.
    
    ## Tests
    - `just fmt`
    - `just fix -p codex-code-mode`
    - `just fix -p codex-tools`
    - `cargo test -p codex-code-mode
    exec_description_mentions_deferred_nested_tools_when_available`
    - `cargo test -p codex-tools
    create_code_mode_tool_matches_expected_spec`
    - `cargo test -p codex-core
    code_mode_only_guides_all_tools_search_and_calls_deferred_app_tools`
  • [1/8] Add MCP server environment config (#18085)
    ## Summary
    - Add an MCP server environment setting with local as the default.
    - Thread the default through config serialization, schema generation,
    and existing config fixtures.
    
    ## Stack
    ```text
    o  #18027 [8/8] Fail exec client operations after disconnect
    │
    o  #18025 [7/8] Cover MCP stdio tests with executor placement
    │
    o  #18089 [6/8] Wire remote MCP stdio through executor
    │
    o  #18088 [5/8] Add executor process transport for MCP stdio
    │
    o  #18087 [4/8] Abstract MCP stdio server launching
    │
    o  #18020 [3/8] Add pushed exec process events
    │
    o  #18086 [2/8] Support piped stdin in exec process API
    │
    @  #18085 [1/8] Add MCP server environment config
    │
    o  main
    ```
    
    Co-authored-by: Codex <noreply@openai.com>