253 Commits

  • Preserve namespaces on custom tool calls (#30302)
    ## Summary
    
    - Preserve the optional namespace on custom tool calls during response
    deserialization and app-server replay.
    - Use the namespaced tool identifier for streaming argument handling and
    tool dispatch.
    - Regenerate app-server protocol schemas.
    - Add regression tests covering namespace serialization and routing.
    
    ## Testing
    
    - Ran affected protocol and app-server test suites.
    - Ran the full core test suite; two load-sensitive timing tests passed
    when rerun individually.
    - Ran Clippy and formatting checks.
    - Verified with a local end-to-end app-server replay that the namespace
    is preserved through the complete request/response flow.
  • feat(app-server): add history_mode to thread (#29927)
    ## Description
    
    This PR adds a new `historyMode = "legacy" | "paginated"` field to `Thread`.
    It is stored in `SessionMeta` in the JSONL rollout file and as a
    new column in the SQLite thread_metadata table, and exposed on
    `thread/start` and on the `Thread` object in app-server.
    
    ## What changed
    
    - Added canonical `ThreadHistoryMode` with `legacy` and `paginated`,
    defaulting old and new SessionMeta to `legacy`.
    - Carried `history_mode` through core session config, ThreadStore stored
    metadata, local/in-memory stores, rollout metadata extraction, and the
    existing SQLite `threads` table.
    - Added experimental `historyMode` to app-server v2 `Thread` and
    `thread/start`.
    - Made paginated stored threads metadata-discoverable but unsupported
    for legacy full-history reads, `load_history`, live resume, and create
    paths.
    - Regenerated app-server schema fixtures and added
    protocol/state/thread-store/app-server coverage for persistence and
    fail-closed behavior.
    
    ## Compatibility floor
    Because users may be running various versions of Codex binaries on the
    same machine (TUI, Codex App, etc.), we will need to establish a
    compatibility floor for upcoming paginated threads, which will change
    how thread storage reads and writes work.
    
    The overall plan here:
    ```
    Release N:
    - Add historyMode to SessionMeta / Thread / SQLite metadata.
    - Teach binaries to understand paginated threads.
    - If a binary sees `historyMode="paginated"` but does not support the paginated contract, it refuses to resume/mutate the thread.
    - Default remains `"legacy"`.
    
    Release N+1:
    - First-party clients start opting into paginated threads where appropriate.
    - Internal dogfood / staged rollout.
    - Measure old-client usage and paginated-thread unsupported errors.
    
    Release N+2:
    - Only after Release N+ is overwhelmingly deployed, make paginated the default.
    - Accept that a small tail of N-1-or-older binaries may not understand paginated threads.
    ```
    
    The important behavior change is fail-closed handling for a binary that
    encounters a persisted `paginated` thread before it knows how to fully
    support paginated history. In app-server, if a thread is `paginated`, we
    will:
    
    - allow metadata-only discovery paths like `thread/list` and
    `thread/read(includeTurns=false)`, so clients can still see the thread
    and inspect its `historyMode`
    - reject legacy full-history/live-thread paths like
    `thread/read(includeTurns=true)` and `thread/resume` with an unsupported
    JSON-RPC error
    - avoid silently treating an unknown or future `historyMode` as `legacy`
    
    Under the hood, the ThreadStore layer also rejects legacy operations
    that would need to load or replay the full thread history for a
    paginated thread. That gives us the behavior we want for Release N:
    future paginated threads are visible, but this binary fails closed
    instead of trying to operate on them as if they were legacy threads.
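
    The fail-closed rules above can be sketched roughly as follows. This is
    a minimal illustration, not the app-server implementation:
    `check_operation`, `UnsupportedHistoryMode`, and the operation strings
    are hypothetical names, and the sketch assumes any non-`legacy` value
    (including an unknown future one) is gated the same way as `paginated`.

    ```python
    class UnsupportedHistoryMode(Exception):
        """Stand-in for the unsupported JSON-RPC error (hypothetical name)."""


    # Hypothetical labels for the metadata-only discovery paths.
    METADATA_ONLY_OPS = {"thread/list", "thread/read(includeTurns=false)"}


    def check_operation(history_mode, operation):
        """Return True if `operation` may run on a thread with `history_mode`."""
        if history_mode == "legacy":
            return True
        # Anything that is not "legacy" is never treated as legacy:
        # only metadata-only discovery is allowed.
        if operation in METADATA_ONLY_OPS:
            return True
        raise UnsupportedHistoryMode(
            f"{operation} is unsupported for historyMode={history_mode!r}"
        )
    ```

    Under these assumptions, `check_operation("paginated", "thread/list")`
    succeeds, while `check_operation("paginated", "thread/resume")` raises.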
  • Retry failed Codex Apps MCP startup (#29920)
    ## Problem
    
    The built-in Codex Apps MCP client shares a future for the full startup
    operation: connect, complete `initialize`, fetch the initial tools, and
    return a usable client. Sharing deduplicates startup work, but it also
    memoizes terminal errors.
    
    After a transient connection, handshake, or initial `tools/list`
    failure, later tool builds observe the same failed future. The thread
    cannot reconnect after the backend recovers and continues serving its
    startup-time cached tool snapshot, which may be empty or stale.
    
    ## Fix
    
    When Apps MCP startup ends in an error, Codex starts bounded recovery
    without putting startup latency on tool-router construction:
    
    1. The current tool build immediately continues with the cached startup
    snapshot.
    2. After the initial failure is reported, Codex starts one fresh full
    startup attempt in the background.
    3. Concurrent tool builds share that in-flight attempt and also continue
    with cached tools.
    4. On success, the recovered client becomes active, refreshes the Apps
    tools cache, emits a `Ready` startup status, and is reused by later
    operations.
    5. On failure, the cache remains unchanged and later tool builds may
    start another background attempt after exponential cooldown: 1s, 2s, 4s,
    8s, 16s, then 30s maximum.
    
    Each recreated startup performs a fresh MCP `initialize` and uncached
    `tools/list`. The MCP client retains its existing bounded retries for
    retryable `initialize` and `tools/list` failures.
    
    This avoids adding the Apps startup timeout to every request during a
    sustained outage.
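
    The cooldown schedule and the single in-flight attempt described above
    could be modeled like this. It is only a sketch under assumptions:
    `cooldown_seconds` and `may_start_attempt` are hypothetical helpers, and
    the real client may track its state differently.

    ```python
    MAX_COOLDOWN_SECONDS = 30


    def cooldown_seconds(consecutive_failures):
        """Cooldown after the Nth consecutive failed attempt (N >= 1)."""
        return min(2 ** (consecutive_failures - 1), MAX_COOLDOWN_SECONDS)


    def may_start_attempt(now, last_failure_at, consecutive_failures, in_flight):
        """Decide whether a tool build may start a new background attempt."""
        if in_flight:
            # Concurrent tool builds share the attempt that is already running.
            return False
        if consecutive_failures == 0:
            return True
        return now - last_failure_at >= cooldown_seconds(consecutive_failures)
    ```

    With this model, `[cooldown_seconds(n) for n in range(1, 8)]` yields
    `[1, 2, 4, 8, 16, 30, 30]`, matching the schedule listed in step 5.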
    
    ## Scope
    
    This is limited to the built-in Codex Apps MCP client:
    
    - no reconnects for user-configured MCP servers;
    - no cache deletion; and
    - no proactive refresh for a healthy client with stale tools.
    
    ## Tests
    
    Coverage verifies:
    
    - tool builds return cached tools without waiting for a blocked
    reconnect;
    - concurrent tool builds start only one background reconnect;
    - failed reconnects preserve cached tools and respect exponential
    cooldown;
    - a recovered client is retained and reused; and
    - a long-lived thread exposes recovered app tools on a later follow-up.
    
    Validation:
    
    - `just test -p codex-mcp` — 95 passed
    - `just test -p codex-core
    later_follow_up_uses_background_recovered_apps_after_mid_thread_startup_failures
    --no-capture` — passed
    - `just fix -p codex-mcp`
    - `just fmt`
  • core: expose permission profile to shell tools (#29941)
    ## tl;dr
    
    Inject a `CODEX_PERMISSION_PROFILE` environment variable with the name
    of the current permission profile when invoking a shell tool.
    
    ## Why
    
    Shell tool owners may need to launch nested commands under the same
    named permission profile, including through `codex sandbox -P PROFILE
    --include-managed-config`. Until now, child processes could observe
    sandbox and network metadata but could not identify the active named
    permission profile.
    
    The `--include-managed-config` flag is essential when a helper
    reconstructs the sandbox from a profile name: it ensures the nested
    sandbox also loads managed enterprise requirements. Without it, using
    the inherited profile could unintentionally create a sandbox that does
    not enforce the organization's managed restrictions.
    
    The new environment value is intentionally informational and **must not
    be treated as trusted input**. Any process in the ancestry can overwrite
    an environment variable, so a consumer that passes this value to `codex
    sandbox -P` must first validate it against the profiles that helper is
    authorized to use.
    
    ## Example Use Case
    
    Suppose an organization provides a trusted `remote-bash` wrapper that
    lets Codex run a command on an approved build host. The local shell
    command uses the named `:workspace` permission profile:
    
    ```toml
    default_permissions = ":workspace"
    ```
    
    The command exposed to the model is a small zsh wrapper. It deliberately
    delegates with `exec`, preserving the original arguments and process
    environment:
    
    ```zsh
    #!/usr/bin/env zsh
    exec /opt/codex-tools/remote_bash.py "$@"
    ```
    
    The model invokes the public wrapper, not its Python implementation:
    
    ```sh
    /opt/codex-tools/remote-bash \
      --host builder.example.com \
      -- printf '%s' 'hello world'
    ```
    
    Only the inner implementation is authorized to escape the local sandbox:
    
    ```starlark
    prefix_rule(
        pattern=["/opt/codex-tools/remote_bash.py"],
        decision="allow",
    )
    ```
    
    With zsh-fork, execution begins with `remote-bash` inside the
    `:workspace` sandbox. When the wrapper calls `exec`, the exact prefix
    rule matches `remote_bash.py`, so that inner script is restarted
    unsandboxed. The escalated process inherits:
    
    ```text
    CODEX_PERMISSION_PROFILE=:workspace
    ```
    
    Inheritance does not make the value trustworthy. `remote_bash.py`
    independently allowlists both the remote host and the permission profile
    before using either value. In particular, a forged value such as
    `:danger-full-access` is rejected before it can reach `codex sandbox
    -P`:
    
    ```python
    import argparse
    import os
    import shlex
    import sys
    
    ALLOWED_HOSTS = {"builder.example.com"}
    ALLOWED_PROFILES = {":workspace"}
    
    parser = argparse.ArgumentParser()
    parser.add_argument("--host", required=True)
    separator = sys.argv.index("--")
    args = parser.parse_args(sys.argv[1:separator])
    command = sys.argv[separator + 1:]
    
    if args.host not in ALLOWED_HOSTS:
        parser.error("host is not allowlisted")
    if not command:
        parser.error("the remote command must not be empty")
    
    profile = os.environ.get("CODEX_PERMISSION_PROFILE")
    if not profile:
        raise SystemExit("CODEX_PERMISSION_PROFILE must not be empty")
    if profile not in ALLOWED_PROFILES:
        raise SystemExit("CODEX_PERMISSION_PROFILE is not allowlisted")
    
    remote_command = shlex.join(command)
    sandbox_command = shlex.join([
        "codex", "sandbox", "-P", profile,
        "--include-managed-config", "--",
        "bash", "-lc", remote_command,
    ])
    print(shlex.join(["ssh", args.host, sandbox_command]))
    ```
    
    This builds each command layer as an argument vector and uses
    `shlex.join()` at the boundary, rather than interpolating untrusted
    shell text. After validation and parsing, the nested command has this
    structure:
    
    ```text
    ssh argv:
      ["ssh", "builder.example.com", SANDBOX_COMMAND]
    
    SANDBOX_COMMAND argv:
      ["codex", "sandbox", "-P", ":workspace",
       "--include-managed-config", "--",
       "bash", "-lc", "printf %s 'hello world'"]
    
    bash -lc payload argv:
      ["printf", "%s", "hello world"]
    ```
    
    A production implementation could execute that SSH command. The
    integration fixture prints it and parses the result back into arguments,
    verifying the complete flow:
    
    ```text
    model invokes outer wrapper
      -> zsh-fork starts wrapper under :workspace
      -> wrapper execs allowlisted Python script
      -> prefix rule restarts Python script unsandboxed
      -> Python script inherits CODEX_PERMISSION_PROFILE=:workspace
      -> Python script verifies :workspace is allowlisted
      -> remote command runs codex sandbox -P :workspace
         with --include-managed-config
      -> nested sandbox honors managed enterprise requirements
    ```
    
    This gives the trusted helper access to resources outside the local
    sandbox—such as SSH credentials—while ensuring that it can select only
    an explicitly authorized profile and that work on the remote host
    remains subject to the organization's managed requirements.
    
    ## What changed
    
    - Inject `CODEX_PERMISSION_PROFILE` after shell environment policy
    evaluation so the active profile wins over inherited or configured stale
    values.
    - Apply the variable to both `shell_command` and unified `exec_command`,
    including local, zsh-fork, and remote exec-server paths.
    - Remove stale values when the session has no active named profile.
    - Preserve the current profile value when loading a shell snapshot so a
    parent snapshot cannot restore an older profile.
    
    ## Testing
    
    - Added classic-shell integration coverage proving an exact prefix rule
    can run a `require_escalated` script outside the `:workspace` sandbox
    while preserving `CODEX_PERMISSION_PROFILE=:workspace`.
    - Added zsh-fork integration coverage in which the model invokes an
    outer zsh wrapper, an inner allowlisted `remote_bash.py` runs
    unsandboxed, and its printed SSH command reconstructs the inherited
    `:workspace` sandbox with `--include-managed-config` while preserving
    every argument after `--`.
    - The example helper treats `CODEX_PERMISSION_PROFILE` as untrusted and
    validates it against `ALLOWED_PROFILES` before constructing the nested
    command.
    - Assert that the reconstructed sandbox command includes
    `--include-managed-config` so nested use of the inherited profile cannot
    bypass managed enterprise requirements.
    - Added coverage for overriding and removing stale profile values.
    - Verified `shell_command` receives the selected active profile.
    - Added shell snapshot coverage using `printenv
    CODEX_PERMISSION_PROFILE`.
  • feat: add provider-aware model fallback to thread start (#29942)
    ## Why
    
    Helper threads such as task title generation can request a model ID that
    is valid for the default OpenAI provider but unavailable from the active
    provider. With Amazon Bedrock, `gpt-5.4-mini` is rejected while the
    provider static catalog exposes Bedrock model IDs such as
    `openai.gpt-5.5` and `openai.gpt-5.4`. This causes repeated background
    404s and can surface a misleading turn error even when the main turn
    succeeds.
    
    Clients need an explicit way to ask app-server to resolve an unavailable
    helper model to the active provider default. That fallback must remain
    limited to providers with an authoritative static catalog so custom or
    dynamically discovered model IDs are not rewritten based on an
    incomplete catalog.
    
    Fixes #28741.
    
    ## What changed
    
    - Add the experimental `allowProviderModelFallback` option to
    `thread/start`, defaulting to `false` to preserve existing behavior.
    - Thread the option through thread creation and model selection.
    - When enabled for a static model manager, preserve requested models
    present in the catalog and replace unavailable models with the provider
    default.
    - Continue preserving explicit model IDs for dynamic model managers
    without fetching a catalog solely to validate them.
    - Document the new `thread/start` behavior in the app-server API
    overview.
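
    As a rough illustration of the selection rule, assuming a hypothetical
    `resolve_model` helper in which `static_catalog` is `None` for dynamic
    model managers (the catalog contents and default below are example
    values, not the authoritative Bedrock catalog):

    ```python
    def resolve_model(requested, allow_fallback, static_catalog, provider_default):
        """Pick the model to use for a new thread.

        `static_catalog` is a set of model IDs for providers with an
        authoritative static catalog, or None for dynamic model managers.
        """
        if not allow_fallback or static_catalog is None:
            # Existing behavior: explicit model IDs are preserved as-is.
            return requested
        if requested in static_catalog:
            return requested
        return provider_default


    # Example values only.
    catalog = {"openai.gpt-5.5", "openai.gpt-5.4"}
    resolved = resolve_model("gpt-5.4-mini", True, catalog, "openai.gpt-5.5")
    ```

    Here `resolved` is `"openai.gpt-5.5"`; with `allow_fallback=False` or a
    `None` catalog, the requested ID is returned unchanged.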
    
    ## Test
    Temporary test-client harness:
    ```
    ThreadStartParams {
        model: Some("gpt-5.4-mini".to_string()),
        allow_provider_model_fallback: true,
        ..Default::default()
    }
    ```
    Command:
    ```
    CODEX_HOME=/tmp/codex-bedrock-thread-start-home \
    CODEX_E2E_BEDROCK_THREAD_START_ONLY=1 \
    ./target/debug/codex-app-server-test-client \
      --codex-bin ./target/debug/codex \
      -c 'model_provider="amazon-bedrock"' \
      send-message-v2 --experimental-api ignored
    ```
    Relevant output:
    ```
    > "method": "thread/start",
    > "params": {
    >   "model": "gpt-5.4-mini",
    >   "modelProvider": null,
    >   "allowProviderModelFallback": true,
    >   ...
    > }
    
    < "result": {
    <   "model": "openai.gpt-5.5",
    <   "modelProvider": "amazon-bedrock",
    <   ...
    < }
    ```
  • [codex] Add Ultra reasoning effort (#29899)
    ## Why
    
    Ultra should be one user-facing reasoning selection for work that
    benefits from both maximum reasoning and proactive multi-agent
    delegation. Without it, clients must coordinate maximum reasoning with
    the experimental `multiAgentMode` setting, even though the inference
    backend still expects its existing `max` effort value.
    
    This change makes reasoning effort the source of truth: clients select
    `ultra`, core derives proactive multi-agent behavior when the turn is
    eligible for multi-agent V2, and inference requests continue to use the
    backend-compatible `max` value.
    
    ## What changed
    
    - Add `ultra` as a first-class reasoning effort and preserve
    model-catalog ordering when exposing it to clients.
    - Convert `ultra` to `max` at the inference request boundary, including
    Responses HTTP/WebSocket requests, startup prewarm, compaction, and
    memory summarization.
    - Derive effective multi-agent mode per turn from effective reasoning
    effort:
      - eligible multi-agent V2 + `ultra` → `proactive`
      - eligible multi-agent V2 + any other effort → `explicitRequestOnly`
      - V1 or otherwise ineligible sessions → no multi-agent mode instruction
    - Keep the derived effective mode in turn context history so successive
    turns can emit a developer-message update only when the effective mode
    changes.
    - Remove selected multi-agent mode from core session configuration, turn
    construction, thread settings, resume/fork restoration, and subagent
    spawn plumbing. Subagents inherit reasoning effort and derive their own
    effective mode.
    - Retain the experimental app-server `multiAgentMode` fields for wire
    compatibility while marking them deprecated. Request values are accepted
    but ignored; compatibility response fields report `explicitRequestOnly`.
    - Display Ultra in the TUI using the order supplied by `model/list`.
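
    The two derivations described above can be summarized in a small sketch.
    The function names are hypothetical and the effort values are plain
    strings here; the actual core types are not shown in this description.

    ```python
    def inference_effort(effort):
        """Map the client-facing effort to the backend-compatible value."""
        return "max" if effort == "ultra" else effort


    def effective_multi_agent_mode(effort, eligible_for_v2):
        """Derive the per-turn multi-agent mode from the reasoning effort."""
        if not eligible_for_v2:
            # V1 or otherwise ineligible: no multi-agent mode instruction.
            return None
        return "proactive" if effort == "ultra" else "explicitRequestOnly"
    ```

    For example, an eligible turn with `ultra` sends `max` to the backend
    and runs in `proactive` mode, while the same turn with `high` keeps
    `high` and runs in `explicitRequestOnly` mode.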
    
    ## Validation
    
    - `just test -p codex-core ultra_reasoning_uses_max_for_requests`
    - `just test -p codex-tui model_reasoning_selection_popup`
  • [codex] Inject agent graph store into ThreadManager (#29736)
    Pick up the AgentGraphStore migration.
    
    - Inject an explicit optional agent graph store into `ThreadManager` 
    - Move all calls to spawn, close, recursive resume, and
    subtree/archive/delete/feedback traversal through it
    - Keep using `LocalAgentGraphStore` when SQLite is available
    
    This required some changes to the interface to deal with futures:
    
    - The interface now matches `ThreadStore`'s object-safe pattern by
    returning a boxed `AgentGraphStoreFuture` directly, allowing
    `ThreadManager` to hold `Arc<dyn AgentGraphStore>`
    
    *Slight behavior change!* Unfiltered subtree enumeration now performs a
    single all-status breadth-first traversal, so a closed grandchild
    beneath an open edge is included; the previous Open-then-Closed
    traversals could not cross mixed-status paths and silently omitted it.
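
    A toy example makes the difference concrete. It assumes a hypothetical
    representation in which each parent maps to `(child, status)` edges; the
    real `AgentGraphStore` interface is not shown here.

    ```python
    from collections import deque


    def descendants(edges, root, statuses):
        """Breadth-first walk that follows only edges whose status is allowed."""
        found = []
        visited = {root}
        queue = deque([root])
        while queue:
            parent = queue.popleft()
            for child, status in edges.get(parent, []):
                if status in statuses and child not in visited:
                    visited.add(child)
                    found.append(child)
                    queue.append(child)
        return found


    # root -(open)-> child -(closed)-> grandchild
    edges = {"root": [("child", "open")], "child": [("grandchild", "closed")]}

    # Previous approach: one Open traversal, then one Closed traversal.
    old = descendants(edges, "root", {"open"}) + descendants(edges, "root", {"closed"})
    # New approach: a single all-status traversal.
    new = descendants(edges, "root", {"open", "closed"})
    ```

    In this graph `old` is `["child"]`, because neither single-status pass
    can cross the mixed path, while `new` is `["child", "grandchild"]`.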
  • test: add app-server auto environment helper (#29746)
    ## Why
    
    Start moving app-server tests towards running by default against
    remote and foreign-OS executors. To do so, we need a point of indirection
    similar to core integration tests' `build_with_auto_env`, but flexible
    enough to let tests control environment registration when they need to.
    
    ## What
    
    This adds:
    
    - `TestAppServer::new_with_auto_env()` for constructing an app server
    with a default environment defined by the test runner (e.g. bazel)
    - `TestAppServer::auto_env_params()` for tests to easily acquire turn
    env params tailored to the automatic environment
    - `TestAppServer::send_thread_start_request_with_auto_env()` to make it
    easy for tests to start a thread using the automatic environment
    
    The above methods all fail if the test calling them has set up an
    environment where the automatic environment configuration conflicts with
    test-created state.
    
    ## Validation
    
    Adds a couple of basic smoke tests to the app-server test suite.
    Follow-ups will migrate more tests to use it.
  • core tests: rename automatic environment builder (#29728)
    ## Why
    
    Use a clearer name for what happens when this helper sets up a test
    environment.
    
    ## What
    
    - Rename the builder and its harness wrapper to use `auto_env` instead
    of `remote_env` because the helper will set up a local environment if
    configured by the build system.
  • test: branch on target OS instead of runner flavor (#29712)
    ## Why
    
    Core tests should branch on the executor's operating system, not on
    runner details such as Docker or Wine. This keeps platform behavior
    stable as new test backends are added and reserves Wine-specific skips
    for actual runner debt.
    
    ## What
    
    - Add `TestTargetOs` and target/host-aware skip helpers while keeping
    `TestEnvironment` internal.
    - Replace topology enum access with remote predicates and a narrow
    Docker accessor.
    - Migrate OS-semantic Wine skips, preserve runner-specific gaps, and
    document the skip taxonomy.
    
    ## Validation
    
    - `just test -p core_test_support`
    - `just test -p codex-core
    remote_test_env_can_connect_and_use_filesystem`
    - `bazel test //codex-rs/core:core-all-wine-exec-test
    --test_output=errors` reached test execution; unrelated existing
    view-image, path, and timing failures remain.
    - `just test -p codex-core` and `just test` reached broad test
    execution; this checkout has unrelated helper, sandbox, and timing
    failures.
  • path-uri: clarify host-native path conversion (#29501)
    ## Why
    
    Downstream refactors are producing confusing code because this
    functionality has a very generic name. Encoding the specific
    conversion approach in the method name makes the intent clearer.
    
    ## What
    
    Rename `PathUri::from_path` to `PathUri::from_host_native_path` and
    update its Rust call sites.
  • feat(core): store turn_id on ResponseItem metadata (#28360)
    ## Description
    
    This PR is a followup to https://github.com/openai/codex/pull/28355 and
    starts assigning `internal_chat_message_metadata_passthrough.turn_id` to
    durable Responses API items created during a turn.
    
    The goal is that those items keep the `turn_id` that introduced them
    when Codex resends stateless HTTP context, reconstructs history for
    resume/fork paths, or reuses websocket response state.
    
    ## What changed
    
    - Set `internal_chat_message_metadata_passthrough.turn_id` when missing
    as response items enter durable history, initial/replacement history,
    inter-agent communication history, and local compaction summaries.
    - Preserve existing item turn IDs instead of overwriting them during
    persistence, resume reconstruction, compaction, forked history, and
    websocket incremental reuse.
    - Keep `compaction_trigger` fieldless because it is a request control,
    not a durable response item.
    - Update focused history/request assertions and fixtures for stateless
    requests, websocket incrementals, compaction, thread injection, prompt
    debug, and related CI coverage.
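
    The "set when missing, preserve when present" rule can be pictured with
    the sketch below. Items are modeled as plain dicts and `stamp_turn_id`
    is a hypothetical helper; only the field name
    `internal_chat_message_metadata_passthrough.turn_id` comes from this PR.

    ```python
    PASSTHROUGH_KEY = "internal_chat_message_metadata_passthrough"


    def stamp_turn_id(item, turn_id):
        """Return a copy of `item` that carries a turn_id, without overwriting one."""
        if item.get("type") == "compaction_trigger":
            # Request control, not a durable response item: left fieldless.
            return item
        metadata = dict(item.get(PASSTHROUGH_KEY) or {})
        if metadata.get("turn_id") is None:
            metadata["turn_id"] = turn_id
        return {**item, PASSTHROUGH_KEY: metadata}
    ```

    Applying it to an item that already has a `turn_id` leaves that value
    untouched, which is what lets resume, fork, and compaction paths keep
    the ID of the turn that introduced the item.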
  • core: rename metadata -> internal_chat_message_metadata_passthrough (#28968)
    ## Description
    This PR cuts Codex over from generic `ResponseItem.metadata` (introduced
    here: https://github.com/openai/codex/pull/28355) to
    `ResponseItem.internal_chat_message_metadata_passthrough`, which is the
    blessed path and has strongly-typed keys.
    
    For now we have to drop this MAv2 usage of `metadata`:
    https://github.com/openai/codex/pull/28561 until we figure out where
    that should live.
  • Expose thread-level multi-agent mode (#28792)
    ## Why
    
    Once multi-agent mode can be selected per turn, clients also need to
    choose the initial selection when creating a thread and observe that
    selection through lifecycle and settings APIs.
    
    The selected value is intentionally distinct from the effective
    model-visible value: no client selection is represented as `null`, even
    though an eligible multi-agent v2 turn derives `explicitRequestOnly` as
    its effective default.
    
    ## What changed
    
    - Add the optional experimental `thread/start.multiAgentMode` parameter
    and pass it through thread creation.
    - Preserve an omitted initial value as an unset selection rather than
    eagerly storing `explicitRequestOnly`.
    - Apply an explicit `thread/start` selection to the first turn through
    the session configuration established at thread creation.
    - Restore the latest persisted effective mode as the selected baseline
    on cold resume when rollout history contains one.
    - Inherit the optional selected mode from a loaded parent when creating
    related runtime threads.
    - Return the current selected `multiAgentMode` from `thread/start`,
    `thread/resume`, `thread/fork`, and thread settings, using `null` when
    no mode is selected.
    - Keep lifecycle reporting independent from model capability and feature
    eligibility; core turn construction remains responsible for calculating
    and persisting the effective mode.
    
    ## Not covered
    
    - Clearing an existing loaded-session selection back to unset through
    `turn/start`; omitted or `null` currently retains the session's
    selection.
    - A TUI control, slash command, or `config.toml` preference.
    
    ## Verification
    
    - `CARGO_INCREMENTAL=0 just test -p codex-app-server-protocol`
    - `CARGO_INCREMENTAL=0 just test -p codex-app-server multi_agent_mode`
    
    The focused app-server coverage verifies explicit `thread/start`
    initialization, first-turn prompting, nullable reporting for an omitted
    selection, and retention of selections that are not currently
    runtime-eligible.
    
    ## Stack
    
    Stacked on #28685. This PR contains only the thread initialization and
    lifecycle/settings API layer.
  • [plugins] Refresh plugin and tool caches after remote install (#28951)
    Summary
    - Refresh the installed remote-plugin snapshot and Codex Apps tools
    after completing a remote JIT install.
    - Gate `completed: true` on every expected `app_connector_id` appearing
    after the uncached `tools/list` refresh, while continuing to skip local
    bundle verification for server-side installs.
    - Keep the cached recommendations response and filter refreshed
    installed remote IDs locally, so this does not add another
    recommendations fetch.
    - Add regression coverage for tools appearing after the hard refresh and
    remaining absent after the refresh. The resumed model request sees the
    refreshed tool router when installation completes.
    
    Root Cause
    - Remote suggestions from `openai-curated-remote` returned `true` before
    taking the existing connector refresh path, leaving the resumed turn
    with the pre-install Apps tool catalog.
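
    The completion gate amounts to a subset check against the refreshed tool
    list. The sketch below assumes a hypothetical tool shape in which each
    refreshed tool is a dict with an `app_connector_id` key:

    ```python
    def install_completed(expected_connector_ids, refreshed_tools):
        """True only if every expected connector shows up after the refresh."""
        present = {tool["app_connector_id"] for tool in refreshed_tools}
        return set(expected_connector_ids) <= present
    ```

    If any expected connector is still absent after the uncached
    `tools/list` refresh, the function returns `False`.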
    
    Validation
    - `just test -p codex-core request_plugin_install`
    - `just test -p codex-core-plugins
    recommended_plugin_candidates_filter_installed_and_disabled_plugins`
    - `just test -p codex-core-plugins`
    - `just fix -p codex-core-plugins`
    - `just fix -p codex-core`
    - `just fmt`
    - `just test -p codex-core` was not fully clean locally: 2,729 passed,
    26 failed, and 16 skipped. The failures were dominated by local
    Seatbelt/network/timing issues, including plugin-install timeouts under
    full-suite contention; the focused plugin-install runs pass.
  • core: load AGENTS.md from foreign environments (#28958)
    ## Why
    
    Make it possible to load AGENTS.md from remote exec-servers whose OS
    differs from the app-server's.
    
    ## What
    
    - keep `AGENTS.md` discovery and provenance as `PathUri`, with
    root-aware parent and ancestor traversal
    - expose lifecycle instruction sources as legacy app-server path strings
    in events while retaining `PathUri` internally
    - preserve and test mixed POSIX and Windows paths in model context and
    TUI status output
    - cover remote Windows loading end to end by seeding the Wine prefix
    through host filesystem APIs
    - fix a bug in `PathUri`'s `parent()` implementation that would erase
    Windows drive letters
  • Emit Trusted MCP App Identity on Tool-Call Items (#27132)
    ## Summary
    
    - Add optional `appContext` to app-server MCP tool-call items with
    trusted `connectorId`, `linkId`, and `mcpAppResourceUri` metadata.
    - Preserve that context across tool-call events, persisted history,
    reconnects, and thread resume.
    - Keep the deprecated top-level `mcpAppResourceUri` temporarily for
    client migration.
    
    The consumer contract is `{ appContext: { connectorId, linkId,
    mcpAppResourceUri }, tool }`.
    
    ## Validation
    
    - Full GitHub Actions suite passes, including CLA, Bazel tests, clippy,
    release builds, and argument-comment lint.
    
    ---------
    
    Co-authored-by: martinauyeung-oai <280153141+martinauyeung-oai@users.noreply.github.com>
  • current time reminders impl for system clock (varlatency 2/n) (#28824)
    Stacked on #28822.
    
    ## Summary
    
    - add a host-injectable current-time provider with a built-in system
    implementation
    - record UTC developer reminders in history immediately before due model
    requests
    - keep cadence state per session and force a refresh after compaction
    
    This does NOT include the app server client <-> server clock logic. This
    PR is only for the reminder message & system clock that will be used in
    prod.
    
    ## Testing
    
    - `just test -p codex-core varlatency_`
    - `just clippy -p codex-core -p codex-app-server -p codex-mcp-server -p
    codex-thread-manager-sample`
    - `just fmt`
  • Support openai/form extended form elicitations (#27500)
    # Summary
    Allow App Server clients to opt into `openai/form` MCP elicitations.
  • app-server: keep the model cache warm (#28699)
    ## Why
    
    The app server is long-lived, but its shared model cache otherwise
    refreshes only when a caller needs it. Once the five-minute cache
    expires, starting a thread or calling `model/list` can wait for
    `/models` on the request path.
    
    Refresh the cache in the background before it expires so foreground
    callers normally use fresh local state.
    
    ## What changed
    
    - Start an app-server worker that refreshes models immediately and then
    every three minutes using the existing models-manager API.
    - Hold only a weak reference to the models manager between refreshes, so
    the worker does not extend its lifetime.
    - Stop scheduling refreshes when the app-server lifecycle handle is shut
    down or dropped. A refresh already in progress is allowed to finish.
    - Adjust affected app-server test fixtures to distinguish the background
    `/models` probe from the connection they are testing.
    
    The existing models-manager cache, refresh strategies, auth handling,
    ETag behavior, and concurrency semantics are unchanged.
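
    The weak-reference pattern can be illustrated with a single scheduled
    tick. This is a simplified sketch: `ModelsManager` and `refresh_tick`
    are hypothetical stand-ins, and timing, shutdown handles, and the real
    models-manager API are omitted.

    ```python
    import weakref


    class ModelsManager:
        """Hypothetical stand-in that only counts refreshes."""

        def __init__(self):
            self.refresh_count = 0

        def refresh(self):
            self.refresh_count += 1


    def refresh_tick(manager_ref):
        """Run one scheduled refresh; return False once the manager is gone."""
        # Hold a strong reference only for the duration of this tick.
        manager = manager_ref()
        if manager is None:
            return False
        manager.refresh()
        return True


    manager = ModelsManager()
    manager_ref = weakref.ref(manager)
    first_tick = refresh_tick(manager_ref)
    ```

    Because the worker keeps only `manager_ref` between ticks, dropping the
    last strong reference lets the manager be collected, and a later tick
    reports that it should stop.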
    
    ## Testing
    
    -
    `models_refresh_worker::tests::refreshes_immediately_periodically_and_stops_when_dropped`
    -
    `suite::v2::remote_control::listen_off_honors_persisted_remote_control_enable`
    -
    `suite::v2::attestation::attestation_generate_round_trip_adds_header_to_responses_websocket_handshake`
  • Clarify model-generated and legacy app path types (#28577)
    ## Why
    
    `ApiPathString` implies that it can be used anywhere we pull a path
    out of JSON, but it is not appropriate for tool arguments, where the
    model might generate relative paths.
    
    Prefer `String` for model-generated paths. For now, each feature
    handles the conversion itself; a shared abstraction can be defined
    later if it makes sense.
    
    # What
    
    Rename `ApiPathString` to `AppLegacyPathString` to clarify its role.
    
    Expand the `path-types` skill to tell the model to leave tool args as
    bare strings.
  • [tests] Keep Apps out of generic core test harness (#28508)
    ## Summary
    
    - disable the stable Apps feature in the generic `test_codex()`
    integration-test harness
    - keep Apps-specific tests explicit: their builders re-enable Apps and
    point it at a local mock server
    
    ## Why
    
    Generic tests that use dummy ChatGPT auth were also enabling the
    host-owned `codex_apps` MCP server. That made unrelated tests contact
    `chatgpt.com` and wait for MCP startup, causing the Bazel timeouts
    observed on #28368.
    
    The generic harness should be hermetic and should not start an external
    service that the test did not request. This is test-only; production
    Apps behavior is unchanged. The broader optional-MCP startup behavior is
    being handled separately in #28407.
    
    ## Testing
    
    - `just test -p codex-core -E
    'test(pre_sampling_compact_runs_when_comp_hash_changes) |
    test(model_switch_to_smaller_model_updates_token_context_window) |
    test(codex_apps_file_params_upload_local_paths_before_mcp_tool_call)'`
    - `just fix -p codex-core`
    - `just fmt`
  • [codex] Use expect in integration tests (#28441)
    The workspace denies `clippy::expect_used` in production. Although
    `clippy.toml` allows `expect` in tests, Bazel Clippy compiles
    integration-test helper code in a way that does not receive that
    exemption, which encouraged verbose `unwrap_or_else(... panic!(...))`
    and equivalent `match`/`let else` forms.
    
    This allows `clippy::expect_used` once at each integration-test crate
    root (including aggregated suites and test-support libraries), then
    replaces manual panic-based Result and Option unwraps with
    `expect`/`expect_err`. Standalone `tests/*.rs` files remain their own
    crate roots. Intentional assertion and unexpected-variant panics remain
    unchanged, and the production `expect_used = "deny"` lint remains in
    place.
    
    The cleanup is mechanical and net-negative in line count.
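
    As a rough before/after illustration of the replacement (the helper
    names are hypothetical, and the allowance is shown as an outer
    attribute only so the snippet compiles on its own; the PR applies it
    once at each crate root):

    ```rust
    // Assumption: in the real workspace the allowance is the crate-root inner
    // attribute `#![allow(clippy::expect_used)]`. An outer attribute per helper
    // is used here only to keep the sketch self-contained.

    /// Before: a manual panic-based unwrap.
    fn parse_port_verbose(raw: &str) -> u16 {
        raw.parse::<u16>()
            .unwrap_or_else(|err| panic!("port should parse: {}", err))
    }

    /// After: the same intent expressed with `expect`.
    #[allow(clippy::expect_used)]
    fn parse_port(raw: &str) -> u16 {
        raw.parse::<u16>().expect("port should parse")
    }

    /// `expect_err` covers the failure-path equivalent.
    #[allow(clippy::expect_used)]
    fn parse_error_message(raw: &str) -> String {
        raw.parse::<u16>()
            .expect_err("input should be rejected")
            .to_string()
    }
    ```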
  • Run core integration tests against a Wine-backed Windows executor (#28401)
    ## Why
    
    We want to exercise a Linux app-server against a Windows exec-server
    without having to repeat every test case. This approach has some
    precedent in the remote Docker test setup.
    
    ## What
    
    Run the shared `codex-core` integration suite against Windows
    exec-server behavior from Linux. This makes cross-OS path and shell
    regressions visible while keeping unsupported cases owned by individual
    tests.
    
    - Add `local`, `docker`, and `wine-exec` test environment selection with
    legacy Docker compatibility.
    - Extend `codex_rust_crate` to generate a sharded Wine-exec variant
    using a cross-built Windows server and pinned Bazel Wine/PowerShell
    runtimes.
    - Teach remote-aware helpers about Windows paths and track temporary
    incompatibilities with source-local `skip_if_wine_exec!` calls and
    follow-up reasons.
  • feat(core): add metadata field to ResponseItem (#28355)
    ## Description
    
    This PR adds an optional `metadata` field to `ResponseItem` for
    Responses API calls. It is mechanical plumbing only: no values are
    populated or sent yet. Even so, adding a new field to `ResponseItem`
    already has quite a large blast radius.
    
    This change is backwards compatible because `metadata` is optional and
    omitted when absent, so existing response items and rollout history
    without it still deserialize and requests that do not set it keep the
    same wire shape. For provider compatibility, we strip out `metadata`
    before non-OpenAI Responses requests so Azure and AWS Bedrock never see
    this field.
    
    My followup PR here will actually make use of it to start storing and
    passing along `turn_id`: https://github.com/openai/codex/pull/28360
    
    ## What changed
    
    - Added `ResponseItemMetadata` with optional `turn_id`, plus optional
    `metadata` on Responses API item variants and inter-agent communication.
    - Preserved item metadata through response-item rewrites such as
    truncation, missing tool-output synthesis, compaction history
    rebuilding, visible-history conversion, rollout/resume, and generated
    app-server schemas/types.
    - Strip item metadata from non-OpenAI Responses requests while
    preserving it for OpenAI-shaped requests.
    - Updated the mechanical fixture/test construction churn required by the
    new optional field.
  • [codex] exec-server honors remote environment cwd and shell (#28122)
    ## Why
    
    The next slice needed to make progress on the `remote_env_windows` test
    is support for passing a Windows cwd for the remote environment and
    using that environment's native shell. This lets the test run a real
    Windows process instead of only recording an early path or shell
    mismatch.
    
    ## What
    
    - change `TurnEnvironmentSelection.cwd` from `AbsolutePathBuf` to
    `PathUri`
    - convert local cwd values to URIs when constructing selections
    - preserve a remote primary cwd instead of replacing it with the local
    legacy fallback
    - prefer the selected environment's discovered shell for unified exec,
    falling back to the session shell when unavailable
    - convert back to a host-native absolute path at current native-only
    consumer boundaries
    - reject or deny unsupported foreign cwd values at the existing
    request-permissions boundary, with TODOs for its future migration
    - extend the hermetic Wine test to execute Windows PowerShell in
    `C:\windows` and verify successful process completion
    - record the current app-server rejection against the same Wine-backed
    remote Windows fixture when its cwd is supplied as a native Windows path
  • [codex] make PathUri::from_abs_path infallible (#27976)
    ## Why
    
    `PathUri::from_abs_path` can fail for absolute paths that do not have a
    normal `file:` URI representation, forcing filesystem call sites to
    handle a conversion error even though the original path can be preserved
    losslessly.
    
    ## What
    
    Make `from_abs_path` infallible and migrate its callers. Unrepresentable
    paths use `file:///%00/bad/path/<base64>`, encoding Unix bytes or
    Windows UTF-16LE; `to_abs_path` validates and decodes that fallback. The
    leading encoded null reserves a namespace that cannot collide with a
    real Unix or Windows path, and fallback URIs remain opaque to lexical
    path operations.
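
    A minimal sketch of the fallback shape, for illustration only. The
    prefix comes from the description above; everything else is an
    assumption: the function names are hypothetical, and the URL-safe,
    unpadded base64 alphabet is a guess, since the description only says
    `<base64>`.

    ```rust
    /// Prefix from the description; the encoded NUL segment cannot appear in a
    /// real Unix or Windows path.
    const FALLBACK_PREFIX: &str = "file:///%00/bad/path/";

    const ALPHABET: &[u8; 64] =
        b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

    /// Assumption: URL-safe base64 without padding. The real alphabet and
    /// padding rules may differ.
    fn base64_url(bytes: &[u8]) -> String {
        let mut out = String::new();
        for chunk in bytes.chunks(3) {
            let b0 = u32::from(chunk[0]);
            let b1 = u32::from(chunk.get(1).copied().unwrap_or(0));
            let b2 = u32::from(chunk.get(2).copied().unwrap_or(0));
            let triple = (b0 << 16) | (b1 << 8) | b2;
            out.push(ALPHABET[((triple >> 18) & 63) as usize] as char);
            out.push(ALPHABET[((triple >> 12) & 63) as usize] as char);
            if chunk.len() > 1 {
                out.push(ALPHABET[((triple >> 6) & 63) as usize] as char);
            }
            if chunk.len() > 2 {
                out.push(ALPHABET[(triple & 63) as usize] as char);
            }
        }
        out
    }

    /// Build the opaque fallback URI for path bytes that have no normal
    /// `file:` representation (Unix bytes, or Windows UTF-16LE bytes).
    fn fallback_uri(raw_path_bytes: &[u8]) -> String {
        format!("{}{}", FALLBACK_PREFIX, base64_url(raw_path_bytes))
    }

    /// Lexical path operations should treat anything matching this as opaque.
    fn is_fallback_uri(uri: &str) -> bool {
        uri.starts_with(FALLBACK_PREFIX)
    }
    ```

    Note that a literal `/bad/path` directory still maps to
    `file:///bad/path/...`, which lacks the `%00` segment and so is not
    mistaken for a fallback.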
    
    ## Validation
    
    Added path-URI coverage for Unix null and non-UTF-8 paths, Windows
    device/verbatim and non-Unicode paths, serialization, malformed
    fallbacks, opaque lexical operations, invalid native payloads, and
    literal `/bad/path` collision resistance.
  • [codex] Load AGENTS.md from all bound environments (#27696)
    ## Why
    
    We already have the machinery to support multiple environments on a
    single thread, but we only show the model the contents of `AGENTS.md`
    files in the primary environment.
    
    We should show the model all of the relevant project instructions when
    we know there's more than one environment.
    
    ## Known Gaps
    
    As discussed in the RFC, this implementation:
    
    1. doesn't handle environments being added to or removed from the
    thread after its creation
    2. doesn't enforce an aggregate context budget across environments;
    instead it applies the configured project maximum independently to each
    environment
    
    ## Implementation
    
    - Discover project instructions in environment order with an independent
    byte budget per environment and preserve source provenance/order.
    - Keep the legacy fragment byte-for-byte when exactly one environment
    contributes project instructions; use environment-labeled sections when
    two or more environments contribute.
    - Freeze the complete rendered fragment in `LoadedAgentsMd`, insert it
    directly into requests, and recognize both layouts in contextual and
    memory filtering.
    - Add exact rendering, independent-budget, source-order,
    creation-snapshot, and consumer coverage without changing app-server
    schemas.
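
    The layout rule above (legacy text for a single contributor, labeled
    sections for two or more, one budget per environment) might be
    sketched like this. The function names and the `## Environment: ...`
    heading are hypothetical; the real label format is not given here.

    ```rust
    /// Truncate `text` to at most `max_bytes`, backing up to a char boundary.
    fn truncate_to_budget(text: &str, max_bytes: usize) -> &str {
        if text.len() <= max_bytes {
            return text;
        }
        let mut end = max_bytes;
        while !text.is_char_boundary(end) {
            end -= 1;
        }
        &text[..end]
    }

    /// `environments` holds `(label, instructions)` pairs in environment order.
    /// Each environment gets its own budget; there is no aggregate cap.
    fn render_project_instructions(
        environments: &[(&str, &str)],
        max_bytes_per_env: usize,
    ) -> String {
        let contributing: Vec<(&str, &str)> = environments
            .iter()
            .map(|(label, text)| (*label, truncate_to_budget(text, max_bytes_per_env)))
            .filter(|(_, text)| !text.is_empty())
            .collect();

        if contributing.len() == 1 {
            // Exactly one contributor: emit its text unchanged (legacy layout).
            return contributing[0].1.to_string();
        }

        // Two or more contributors: environment-labeled sections, in order.
        // The heading format is an assumption made for this sketch.
        contributing
            .iter()
            .map(|(label, text)| format!("## Environment: {}\n{}", label, text))
            .collect::<Vec<_>>()
            .join("\n\n")
    }
    ```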
  • core: Consolidate Responses API Codex metadata (#27122)
    ## What
    Introduce a `CodexResponsesMetadata` struct that defines all the core
    metadata we send to Responses API. Example fields are `thread_id`,
    `turn_id`, `window_id`, etc.
    
    Going forward, `client_metadata["x-codex-turn-metadata"]` will be the
    canonical way Codex sends metadata to Responses API across both HTTP and
    websocket transports.
    
    For now, we continue to emit the existing top-level HTTP headers and
    top-level `client_metadata` fields from the same
    `CodexResponsesMetadata` struct for compatibility reasons.
    
    Also, app-server clients who specify additional
    `responsesapi_client_metadata` via `turn/start` and `turn/steer` will
    have those fields merged into
    `client_metadata["x-codex-turn-metadata"]`, but cannot override the
    reserved fields that core uses (i.e. the fields in
    `CodexResponsesMetadata`).
    
    ## Why
    
    Responses API request instrumentation is the source of truth for
    downstream Codex analytics that join requests by Codex IDs such as
    session, thread, turn, and context window. Before this change, those
    values were assembled through several request-specific paths: HTTP
    request bodies, websocket handshake headers, websocket `response.create`
    payloads, compaction requests, and the rich `x-codex-turn-metadata`
    envelope all had their own wiring.
    
    That made metadata propagation easy to drift across API-key/direct
    Responses API requests, ChatGPT-auth/proxied requests, websocket
    requests, and compaction requests. It also made additions like
    `window_id` error-prone because a field could be added to one transport
    projection but missed in another.
    
    ## What changed
    
    - Added `CodexResponsesMetadata` as the core-owned snapshot for Codex
    metadata sent to Responses API.
    - Render `client_metadata["x-codex-turn-metadata"]`, flat
    `client_metadata` projections, and direct compatibility headers from
    that same snapshot.
    - Include the known Codex-owned fields in the turn metadata blob,
    including installation/session/thread/turn/window IDs, request kind,
    lineage, sandbox/workspace metadata, timing, and compaction details.
    - Treat app-server `responsesapi_client_metadata` as enrichment for the
    Codex turn metadata blob while preventing those extras from overriding
    Codex-owned fields.
    - Use the same metadata path for normal turns, websocket prewarm, local
    compaction, remote v1 compaction, and remote v2 compaction.
    - Keep websocket connection-only preconnect metadata separate so
    handshakes carry compatibility identity headers without inventing a fake
    turn metadata blob.
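
    The enrichment rule for app-server extras can be illustrated with a
    small sketch. The flat string map, the function name, and the
    three-entry reserved list are simplifying assumptions; the real
    reserved set is whatever `CodexResponsesMetadata` defines.

    ```rust
    use std::collections::BTreeMap;

    /// Hypothetical subset of the Codex-owned keys.
    const RESERVED_KEYS: [&str; 3] = ["thread_id", "turn_id", "window_id"];

    /// Merge client-supplied extras into the core-owned metadata without
    /// letting the extras replace any Codex-owned field.
    fn merge_turn_metadata(
        core: &BTreeMap<String, String>,
        client_extras: &BTreeMap<String, String>,
    ) -> BTreeMap<String, String> {
        let mut merged = core.clone();
        for (key, value) in client_extras {
            // Skip reserved names and anything core has already set.
            if RESERVED_KEYS.contains(&key.as_str()) || merged.contains_key(key) {
                continue;
            }
            merged.insert(key.clone(), value.clone());
        }
        merged
    }
    ```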
    
    ## Verification
    
    - `cargo check -p codex-core`
    - `just fix -p codex-core`
  • [codex] Load user instructions through an injected provider (#27101)
    ## Why
    
    We want to remove implicit use of `$CODEX_HOME` from `codex-core` and
    make embedders responsible for supplying user-level instructions. This
    also ensures user instructions load when no primary environment is
    selected.
    
    ## What changed
    
    Stacked on #27415, which makes `codex exec` surface thread-scoped
    runtime warnings.
    
    - Added `UserInstructionsProvider` to `codex-extension-api`, with
    absolute source attribution and recoverable loading warnings.
    - Added `codex-home` with the filesystem-backed provider for
    `AGENTS.override.md` and `AGENTS.md`, preserving precedence, fallback,
    trimming, lossy UTF-8 handling, and the existing uncapped global
    instruction size.
    - Removed global instruction loading from `Config` and require
    `ThreadManager` callers to inject a provider.
    - Load provider instructions once for each fresh root runtime, including
    runtimes without a primary environment. Running sessions retain their
    snapshot, while child agents inherit the parent snapshot without
    invoking the provider.
    - Keep provider instructions separate while loading project `AGENTS.md`,
    then assemble the model-visible instructions with the existing ordering,
    source attribution, warning, and turn-context behavior.
    - Wired the Codex home provider through the CLI, app server, MCP server,
    core facade, and thread-manager sample.
    
    ## Validation
    
    - `just test -p codex-home -p codex-extension-api`
    - `just test -p codex-core agents_md`
    - `just test -p codex-core guardian`
    - `just test -p codex-app-server
    thread_start_without_selected_environment_includes_only_global_instruction_source`
    - `just test -p codex-exec warning`
    - `just bazel-lock-check`
  • [codex] migrate ExecutorFileSystem paths to PathUri (#27424)
    ## Why
    
    We're moving exec-server to use PathUri for its internal path
    representations.
    
    ## What
    
    Move `ExecutorFileSystem` APIs to use `PathUri` instead of
    `AbsolutePathBuf`. Future changes will convert higher-level parts of
    exec-server.
  • Pair thread environment settings (#26687)
    ## Why
    
    Thread cwd and environment selections are a single logical setting in
    core: updating one without the other can silently desynchronize the
    next-turn execution context. This change makes that relationship
    explicit in the internal thread settings flow while preserving the
    existing app-server public API shape.
    
    ## What changed
    
    - Moved the cwd/environment pair through internal
    `ThreadSettingsOverrides.environment_settings` instead of a top-level
    internal `cwd` field.
    - Kept `thread/settings/update` public params unchanged, with app-server
    translating top-level `cwd` into the paired internal settings shape.
    - Moved `Op::UserInput` environment overrides into thread settings so
    user turns and settings updates use the same core path.
    - Updated core, app-server, MCP, memories, sample, and test callsites to
    construct the paired settings shape.
    
    ## Verification
    
    - `git diff --check`
    - Local test run starting after PR creation.
  • fix: preserve approval sandbox decisions in unified exec (#24981)
    ## Why
    
    This PR fixes approval sandbox semantics in the unified-exec path. The
    zsh-fork runtime exposed the bug because the shell can do meaningful
    work before any intercepted child `execv(2)` exists: redirections,
    builtins, globbing, and pipeline setup all happen in the launch process.
    If the model requested `sandbox_permissions=require_escalated`, or an
    exec-policy `allow` rule explicitly bypassed the sandbox, that approved
    sandbox decision needs to be preserved for the launch path and for
    intercepted execs that use the same approval machinery.
    
    The behavior is not only about zsh fork. The production changes are in
    shared approval/escalation code, so they also affect non-zsh-fork
    intercepted exec paths that go through the same sandbox decision logic.
    The narrow intent is to preserve the approval decision while still
    keeping denied-read profiles and bounded additional-permission requests
    sandboxed.
    
    ## Production Changes
    
    - `codex-rs/core/src/tools/runtimes/unified_exec.rs`: derives a
    `launch_sandbox_permissions` value from the requested sandbox
    permissions and the runtime filesystem policy, then uses that value for
    managed-network/env setup and launch sandbox selection. This keeps full
    approval or policy-bypass decisions visible to the first unified-exec
    attempt, while still preventing a full sandbox override from discarding
    denied-read restrictions. Direct unified exec keeps the same decision
    surface; the important difference is that zsh-fork launch setup no
    longer accidentally loses the approved parent sandbox decision.
    
    - `codex-rs/core/src/tools/runtimes/shell/unix_escalation.rs`: makes
    intercepted-exec escalation selection explicit for the three sandbox
    permission modes. `UseDefault` only escalates when an exec-policy
    decision allows sandbox bypass, `RequireEscalated` escalates when
    unsandboxed execution is allowed, and `WithAdditionalPermissions`
    escalates through the bounded additional-permissions path instead of
    being treated as a full unsandboxed override. Unsandboxed intercepted
    execs now also rebuild the environment as `RequireEscalated`, which
    strips managed-network proxy variables consistently with other
    unsandboxed execution.
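
    The three-mode selection described for `unix_escalation.rs` could be
    sketched as a single match. The mode names come from the description;
    the `Escalation` enum, the function signature, and the choice to stay
    sandboxed when a condition is not met are assumptions made for
    illustration.

    ```rust
    #[derive(Debug, Clone, Copy, PartialEq)]
    enum SandboxPermissions {
        UseDefault,
        RequireEscalated,
        WithAdditionalPermissions,
    }

    /// Hypothetical result type for this sketch.
    #[derive(Debug, PartialEq)]
    enum Escalation {
        StaySandboxed,
        Unsandboxed,
        BoundedAdditionalPermissions,
    }

    fn select_escalation(
        requested: SandboxPermissions,
        policy_allows_bypass: bool,
        unsandboxed_allowed: bool,
    ) -> Escalation {
        match requested {
            // Only an exec-policy decision can lift the default sandbox.
            SandboxPermissions::UseDefault if policy_allows_bypass => Escalation::Unsandboxed,
            SandboxPermissions::UseDefault => Escalation::StaySandboxed,
            // An approved escalation still needs unsandboxed execution allowed.
            SandboxPermissions::RequireEscalated if unsandboxed_allowed => Escalation::Unsandboxed,
            SandboxPermissions::RequireEscalated => Escalation::StaySandboxed,
            // Never treated as a full unsandboxed override.
            SandboxPermissions::WithAdditionalPermissions => {
                Escalation::BoundedAdditionalPermissions
            }
        }
    }
    ```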
    
    ## Test Coverage
    
    Most of the PR is tests. The new coverage verifies:
    
    - unified exec preserves parent approval and exec-policy sandbox
    decisions for zsh-fork launch selection;
    - bounded `with_additional_permissions` remains sandboxed and
    permission-profile based;
    - denied-read profiles are not weakened by parent approval;
    - explicit prompt rules still prompt for intercepted execs after the
    parent command is approved;
    - unsandboxed intercepted execs strip managed-network env vars.
    
    No documentation update is needed; this is an internal approval/sandbox
    correctness fix.
    
    ---
    [//]: # (BEGIN SAPLING FOOTER)
    Stack created with [Sapling](https://sapling-scm.com). Best reviewed
    with [ReviewStack](https://reviewstack.dev/openai/codex/pull/24981).
    * #24982
    * __->__ #24981
  • [codex] Use standalone tools for Responses Lite (#26490)
    ## Summary
    
    Responses Lite does not execute hosted Responses tools, so models using
    it must route web search and image generation through Codex-owned
    executors and standalone Responses API endpoints.
    
    This PR is stacked on #26487.
    
    ## Validation
    
    - `cargo test -p codex-core responses_lite_ --lib`
    - `cargo test -p codex-core
    standalone_executors_remain_hidden_without_flags_or_responses_lite
    --lib`
    - `cargo test -p codex-core
    hosted_tools_follow_provider_auth_model_and_config_gates --lib`
    - `cargo test -p codex-web-search-extension -p
    codex-image-generation-extension`
    - `cargo test -p codex-app-server --test all standalone_`
    - `cargo fmt --all -- --check`
  • Require absolute cwd in thread settings (#26532)
    ## Why
    
    Thread settings cwd overrides are expected to be resolved before they
    enter core. Keeping this boundary as a plain `PathBuf` made it easy for
    core/session code to keep fallback normalization and relative-path
    resolution logic in places that should only receive an already-resolved
    cwd.
    
    This is intentionally the absolute-cwd-only slice: it does not change
    environment selection stickiness or cwd-to-default-environment fallback
    behavior.
    
    ## What changed
    
    - Changes `ThreadSettingsOverrides.cwd`,
    `CodexThreadSettingsOverrides.cwd`, and `SessionSettingsUpdate.cwd` to
    use `AbsolutePathBuf`.
    - Removes core-side cwd normalization/resolution from session settings
    updates.
    - Updates affected core/app-server test helpers and callsites to pass
    existing absolute cwd values or use `abs()` helpers.
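
    The value of typing the boundary can be shown with a small newtype
    sketch. This `AbsolutePathBuf` is a hypothetical stand-in, not the
    real type, and the `ThreadSettingsOverrides` struct is reduced to the
    one field discussed here.

    ```rust
    use std::path::{Path, PathBuf};

    /// Hypothetical stand-in: a path that was checked once, at construction.
    #[derive(Debug, Clone, PartialEq)]
    struct AbsolutePathBuf(PathBuf);

    impl AbsolutePathBuf {
        fn new(path: PathBuf) -> Result<Self, String> {
            if path.is_absolute() {
                Ok(AbsolutePathBuf(path))
            } else {
                Err(format!("cwd must be absolute: {}", path.display()))
            }
        }

        fn as_path(&self) -> &Path {
            &self.0
        }
    }

    /// Reduced to the single field under discussion. Because the field type
    /// already guarantees an absolute path, code that receives these overrides
    /// has no relative-path case left to normalize.
    struct ThreadSettingsOverrides {
        cwd: Option<AbsolutePathBuf>,
    }
    ```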
    
    ## Validation
    
    Opening as draft so CI can start while local validation continues.
  • [codex] Preserve logical paths during AGENTS.md discovery (#26465)
    ## Intent
    
    Follow up on #26205 by avoiding unnecessary filesystem canonicalization
    during `AGENTS.md` discovery. The configured working directory is
    already absolute, and canonicalization incorrectly switches symlinked
    workspaces from their logical parent hierarchy to the target's
    hierarchy.
    
    ## User-facing behavior
    
    For a symlinked working directory such as:
    
    ```text
    test-root/
    |-- logical-repo/
    |   |-- AGENTS.md              ("logical parent doc")
    |   `-- workspace ------------> physical-repo/workspace/
    `-- physical-repo/
        |-- AGENTS.md              ("physical parent doc")
        `-- workspace/
            `-- AGENTS.md          ("workspace doc")
    ```
    
    Before this change, Codex canonicalized `logical-repo/workspace` to
    `physical-repo/workspace` before discovery. It therefore loaded
    `physical-repo/AGENTS.md` and `physical-repo/workspace/AGENTS.md`,
    ignoring the instructions from the repository through which the user
    entered the workspace.
    
    After this change, ancestor discovery walks the configured logical path,
    so Codex loads `logical-repo/AGENTS.md`. Opening
    `logical-repo/workspace/AGENTS.md` still follows the symlink through the
    host filesystem, so the workspace document is also loaded.
    `physical-repo/AGENTS.md` is not loaded.
    
    ## Implementation
    
    Use the logical absolute working directory when discovering project
    instructions and reporting instruction sources. Filesystem reads still
    follow the working-directory symlink, so an `AGENTS.md` in the target
    workspace continues to load while ancestor discovery uses the symlink's
    parents.
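
    A rough sketch of walking the logical path lexically, with no
    canonicalization step. The function name is hypothetical, and walking
    all the way to the filesystem root is an assumption; the real
    discovery may stop earlier.

    ```rust
    use std::path::{Path, PathBuf};

    /// List `AGENTS.md` candidates for a logical working directory, outermost
    /// ancestor first. `Path::ancestors` is purely lexical, so a symlinked
    /// workspace keeps the parents of the path the user entered through.
    fn agents_md_candidates(logical_cwd: &Path) -> Vec<PathBuf> {
        let mut candidates: Vec<PathBuf> = logical_cwd
            .ancestors()
            .map(|dir| dir.join("AGENTS.md"))
            .collect();
        candidates.reverse();
        candidates
    }
    ```

    Reading each candidate still goes through the host filesystem, which
    is why the workspace document behind the symlink continues to load.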
    
    ## Validation
    
    Added integration coverage proving that discovery loads the logical
    parent's instructions and the target workspace's instructions, but not
    the target parent's instructions.
  • Switch runtime to cloud config bundle (#24622)
    ## Summary
    
    - Adapts the moved `codex-cloud-config` crate from the legacy cloud
    requirements endpoint to the new config bundle endpoint.
    - Switches runtime consumers from `CloudRequirementsLoader` to
    `CloudConfigBundleLoader` so one shared bundle supplies cloud-delivered
    config and requirements.
    - Removes the legacy cloud requirements domain loader path.
    
    ## Details
    
    This intentionally keeps `codex-cloud-config` monolithic for review
    lineage: the previous PR establishes the crate move, and this PR shows
    the behavior change against that moved implementation. A follow-up PR
    splits the module back into focused files.
    
    The new bundle path preserves the important cloud requirements loader
    semantics where intended: account-scoped signed cache, 30 minute TTL, 5
    minute refresh cadence, retry/backoff, auth recovery, and fail-closed
    startup loading. The cached payload changes from a single requirements
    TOML string to the backend-delivered bundle, and validation rejects
    malformed config or requirements fragments before cache write/use.
  • [codex] Wait for MCP readiness in core integration tests (#24964)
    Ensures MCP-backed `codex-core` integration tests exercise initialized
    servers instead of racing server startup.
    
    I've been idly investigating a few flakes and the failure modes are much
    more confusing when a tool call fails because of a failed server start
    than when the failed server start causes the test to fail directly.
  • Add experimental turn additional context (#24154)
    ## Summary
    
    Adds experimental `additionalContext` support to `turn/start` and
    `turn/steer` so clients can provide ephemeral external context, such as
    browser or automation state, without turning that plumbing into a
    visible user prompt or triggering user-prompt lifecycle behavior.
    
    ## API Shape
    
    The parameter shape is:
    
    ```ts
    additionalContext?: Record<string, {
      value: string
      kind: "untrusted" | "application"
    }> | null
    ```
    
    Example:
    
    ```json
    {
      "additionalContext": {
        "browser_info": {
          "value": "Active tab is CI failures.",
          "kind": "untrusted"
        },
        "automation_info": {
          "value": "CI rerun is in progress.",
          "kind": "application"
        }
      }
    }
    ```
    
    The keys are opaque and caller-defined.
    
    ## Context Injection
    
    When provided, accepted entries are inserted into model context as
    hidden contextual message items, not as visible thread user-message
    items.
    
    `kind: "untrusted"` entries are inserted with role `user`:
    
    ```text
    <external_${key}>${value}</external_${key}>
    ```
    
    `kind: "application"` entries are inserted with role `developer`:
    
    ```text
    <${key}>${value}</${key}>
    ```
    
    Values are not escaped. Each value is truncated to 1k approximate tokens
    before wrapping.
    
    For `turn/start`, accepted additional context is inserted before normal
    user input. For `turn/steer`, additional context is merged only when the
    steer includes non-empty user input; context-only steers still reject as
    empty input.
    
    ## Dedupe Strategy
    
    `AdditionalContextStore` lives on session state and stores the latest
    complete additional-context map.
    
    Each `turn/start` or non-empty `turn/steer` treats its
    `additionalContext` as the current complete set of values. Entries are
    injected only when the key is new or the exact entry for that key
    changed, including `value` or `kind`. After merging, the store is
    replaced with the provided map, so omitted keys are removed from the
    retained set and can be injected again later if reintroduced.
    
    Omitting `additionalContext`, passing `null`, or passing an empty object
    resets the store to empty and injects nothing.
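
    The wrapping and dedupe rules above can be sketched as follows. The
    type and method names other than `AdditionalContextStore` are
    hypothetical, and truncation to the token limit is left out to keep
    the example short.

    ```rust
    use std::collections::BTreeMap;

    #[derive(Debug, Clone, Copy, PartialEq)]
    enum ContextKind {
        Untrusted,
        Application,
    }

    #[derive(Debug, Clone, PartialEq)]
    struct ContextEntry {
        value: String,
        kind: ContextKind,
    }

    /// Wrap one entry in its tag; values are not escaped.
    fn wrap(key: &str, entry: &ContextEntry) -> String {
        match entry.kind {
            ContextKind::Untrusted => {
                format!("<external_{k}>{v}</external_{k}>", k = key, v = entry.value)
            }
            ContextKind::Application => format!("<{k}>{v}</{k}>", k = key, v = entry.value),
        }
    }

    #[derive(Default)]
    struct AdditionalContextStore {
        latest: BTreeMap<String, ContextEntry>,
    }

    impl AdditionalContextStore {
        /// Return the fragments to inject (new or changed keys only), then
        /// replace the retained set with the provided map.
        fn merge(&mut self, provided: BTreeMap<String, ContextEntry>) -> Vec<String> {
            let injected: Vec<String> = provided
                .iter()
                .filter(|(key, entry)| self.latest.get(*key) != Some(*entry))
                .map(|(key, entry)| wrap(key, entry))
                .collect();
            self.latest = provided;
            injected
        }
    }
    ```

    Because the store is replaced rather than updated, passing an empty
    map clears it, and a key that reappears later is injected again.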
    
    ## What Changed
    
    - Threads experimental v2 `additionalContext` through app-server into
    core turn start and steer handling.
    - Adds separate contextual fragment types for untrusted user-role
    context and application developer-role context.
    - Uses pending response input items so additional context can be
    combined with normal user input without treating it as prompt text.
    - Adds integration coverage for start/steer flow, role routing,
    dedupe/reset behavior, deletion/re-add behavior, hook-blocked input
    behavior, empty context-only steer rejection, external-fragment marker
    matching, and truncation.
  • Move MCP tool naming mode into manager (#21576)
    ## Why
    
    The `non_prefixed_mcp_tool_names` feature should be applied where MCP
    tools become model-visible, not by remapping names later in core.
    Keeping the decision in `McpConnectionManager` construction makes
    `ToolInfo` the single shaped view that spec building, deferred tool
    search, routing, and unavailable-tool placeholders can consume directly.
    
    This also preserves the existing external behavior while the feature is
    off, and keeps the feature-on behavior for code mode and hooks explicit
    at the manager boundary.
    
    ## What Changed
    
    - Add `McpToolNameMode` to `codex-mcp` and flow it through `McpConfig`
    into `McpConnectionManager::new`.
    - Normalize MCP `ToolInfo` names in the manager using either
    legacy-prefixed namespaces or non-prefixed namespaces; the legacy path
    adds `mcp__` without restoring the old trailing namespace suffix.
    - Remove the core-side MCP name remapping path so specs, tool search,
    session resolution, and unavailable-tool placeholder construction use
    the manager-provided `ToolName` values directly.
    - Keep code mode flattening on the `__` namespace separator.
    - Preserve hook compatibility by giving non-prefixed MCP hook names
    legacy `mcp__...` matcher aliases.
    - Add/adjust integration and unit coverage for non-prefixed code-mode
    behavior, hook matching with the feature on and off, and manager-level
    legacy prefixing.
    
    ## Testing
    
    - `cargo test -p codex-mcp --lib`
    - `cargo test -p codex-core --lib tools::spec::tests -- --nocapture`
    - `cargo test -p codex-core --lib mcp_tools -- --nocapture`
    - `cargo test -p codex-core --lib mcp_tool_exposure -- --nocapture`
    - `cargo test -p codex-core --test all mcp_tool -- --nocapture`
    - `cargo test -p codex-core --test all search_tool -- --nocapture`
    - `cargo test -p codex-core --test all hooks_mcp -- --nocapture`
    - `cargo test -p codex-core --test all
    code_mode_uses_non_prefixed_mcp_tool_names_when_feature_enabled --
    --nocapture`
    - `cargo test -p codex-tools`
    - `cargo test -p codex-features`
  • Honor client-resolved service tier defaults (#23537)
    ## Why
    
    Model catalog responses can now advertise a nullable
    `default_service_tier` for each model. Codex needs to preserve three
    distinct states all the way from config/app-server inputs to inference:
    
    - no explicit service tier, so the client may apply the current model
    catalog default when FastMode is enabled
    - explicit `default`, meaning the user intentionally wants standard
    routing
    - explicit catalog tier ids such as `priority`, `flex`, or future tiers
    
    Keeping those states distinct prevents the UI from showing one tier
    while core sends another, especially after model switches or app-server
    `thread/start` / `turn/start` updates.
    
    ## What Changed
    
    - Plumbed `default_service_tier` through model catalog protocol types,
    app-server model responses, generated schemas, model cache fixtures, and
    provider/model-manager conversions.
    - Added the request-only `default` service tier sentinel and normalized
    legacy config spelling so `fast` in `config.toml` still materializes as
    the runtime/request id `priority`.
    - Moved catalog default resolution to the TUI/client side, including
    recomputing the effective service tier when model/FastMode-dependent
    surfaces change.
    - Updated app-server thread lifecycle config construction so
    `serviceTier: null` preserves explicit standard-routing intent by
    mapping to `default` instead of internal `None`.
    - Kept core responsible for validating explicit tiers against the
    current model and stripping `default` before `/v1/responses`, without
    applying catalog defaults itself.
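
    The three states and where each is resolved might be modeled like
    this. It is a sketch under assumptions: the enum and function names
    are hypothetical, and validation of explicit tiers against the
    current model is omitted.

    ```rust
    #[derive(Debug, Clone, PartialEq)]
    enum ServiceTierSelection {
        /// No explicit tier; the client may apply the catalog default.
        Unset,
        /// Explicit `default`: the user wants standard routing.
        ExplicitDefault,
        /// Explicit catalog tier id such as `priority` or `flex`.
        Tier(String),
    }

    /// Config parsing, including the legacy `fast` spelling.
    fn parse_config_tier(raw: Option<&str>) -> ServiceTierSelection {
        match raw {
            None => ServiceTierSelection::Unset,
            Some("default") => ServiceTierSelection::ExplicitDefault,
            Some("fast") => ServiceTierSelection::Tier("priority".to_string()),
            Some(other) => ServiceTierSelection::Tier(other.to_string()),
        }
    }

    /// Client side: only an unset selection picks up the catalog default,
    /// and only when FastMode is enabled.
    fn resolve_on_client(
        selection: &ServiceTierSelection,
        fast_mode: bool,
        catalog_default: Option<&str>,
    ) -> ServiceTierSelection {
        match selection {
            ServiceTierSelection::Unset if fast_mode => match catalog_default {
                Some(tier) => ServiceTierSelection::Tier(tier.to_string()),
                None => ServiceTierSelection::Unset,
            },
            other => other.clone(),
        }
    }

    /// Core side: the `default` sentinel is stripped before the request and
    /// no catalog default is applied here.
    fn tier_for_request(selection: &ServiceTierSelection) -> Option<&str> {
        match selection {
            ServiceTierSelection::Tier(id) => Some(id.as_str()),
            _ => None,
        }
    }
    ```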
    
    ## Validation
    
    - `CARGO_INCREMENTAL=0 cargo build -p codex-cli`
    - `CARGO_INCREMENTAL=0 cargo test -p codex-app-server model_list`
    - `cargo test -p codex-tui service_tier`
    - `cargo test -p codex-protocol service_tier_for_request`
    - `cargo test -p codex-core get_service_tier`
    - `RUST_MIN_STACK=8388608 CARGO_INCREMENTAL=0 cargo test -p codex-core
    service_tier`
  • Make local environment optional in EnvironmentManager (#23369)
    ## Summary
    - make `EnvironmentManager` local environment/runtime paths optional
    - simplify constructor surface around snapshot materialization
    - rename local env accessors to `require_local_environment` /
    `try_local_environment`
    
    ## Validation
    - devbox Bazel build for touched crate surfaces
    - `//codex-rs/exec-server:exec-server-unit-tests`
    - `//codex-rs/app-server-client:app-server-client-unit-tests`
    - filtered touched `//codex-rs/core:core-unit-tests` cases
  • [5 of 7] Replace OverrideTurnContext with ThreadSettings (#22508)
    **Stack position:** [5 of 7]
    
    ## Summary
    
    This PR adds `Op::ThreadSettings`, a queued settings-only update
    mechanism for changing stored thread settings without starting a new
    turn. It also removes the legacy `Op::OverrideTurnContext` in the same
    layer, so reviewers can see the replacement and deletion together.
    
    ## Changes
    
    - Add `Op::ThreadSettings` for settings-only queued updates.
    - Emit `ThreadSettingsApplied` with the effective thread settings
    snapshot after core applies an update.
    - Route settings-only updates through the same submission queue as user
    input.
    - Migrate remaining `OverrideTurnContext` tests and callers to the
    queued `Op::ThreadSettings` path.
    - Delete `Op::OverrideTurnContext` from the core protocol and submission
    loop.
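    
    The queued behavior listed above can be illustrated with a simplified
    model. The `Op::UserInput`, `Op::ThreadSettings`, and
    `ThreadSettingsApplied` names come from this PR; every field, the
    `Session` type, and the `TurnStarted` event are hypothetical, and the
    real submission loop is asynchronous rather than a plain `VecDeque`.
    
    ```rust
    use std::collections::VecDeque;
    
    /// Hypothetical settings snapshot; the real one has more fields.
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub struct ThreadSettings {
        pub model: String,
    }
    
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub enum Op {
        UserInput { text: String },
        /// Settings-only update: changes stored settings, starts no turn.
        ThreadSettings { model: Option<String> },
    }
    
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub enum Event {
        TurnStarted { text: String, model: String },
        ThreadSettingsApplied(ThreadSettings),
    }
    
    pub struct Session {
        queue: VecDeque<Op>,
        settings: ThreadSettings,
    }
    
    impl Session {
        pub fn new(model: &str) -> Self {
            Self {
                queue: VecDeque::new(),
                settings: ThreadSettings { model: model.to_string() },
            }
        }
    
        /// Both op kinds share one queue, so ordering is preserved.
        pub fn submit(&mut self, op: Op) {
            self.queue.push_back(op);
        }
    
        /// Processes queued ops in submission order.
        pub fn drain(&mut self) -> Vec<Event> {
            let mut events = Vec::new();
            while let Some(op) = self.queue.pop_front() {
                match op {
                    Op::ThreadSettings { model } => {
                        if let Some(model) = model {
                            self.settings.model = model;
                        }
                        // Emit the effective snapshot after applying the update.
                        events.push(Event::ThreadSettingsApplied(self.settings.clone()));
                    }
                    Op::UserInput { text } => events.push(Event::TurnStarted {
                        text,
                        model: self.settings.model.clone(),
                    }),
                }
            }
            events
        }
    }
    ```
    
    Because the update travels through the same queue, input submitted
    before it still sees the old settings and input submitted after it sees
    the new ones.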
    
    This stack addresses #20656 and #22090.
    
    ## Stack
    
    1. [1 of 7] [Add thread settings to
    UserInput](https://github.com/openai/codex/pull/23080)
    2. [2 of 7] [Remove
    UserInputWithTurnContext](https://github.com/openai/codex/pull/23081)
    3. [3 of 7] [Remove
    UserTurn](https://github.com/openai/codex/pull/23075)
    4. [4 of 7] [Placeholder for OverrideTurnContext
    cleanup](https://github.com/openai/codex/pull/23087)
    5. [5 of 7] [Replace OverrideTurnContext with
    ThreadSettings](https://github.com/openai/codex/pull/22508) (this PR)
    6. [6 of 7] [Add app-server thread settings
    API](https://github.com/openai/codex/pull/22509)
    7. [7 of 7] [Sync TUI thread
    settings](https://github.com/openai/codex/pull/22510)
  • [3 of 7] Remove UserTurn (#23075)
    **Stack position:** [3 of 7]
    
    ## Summary
    
    This PR finishes the input-op consolidation by moving the remaining
    `Op::UserTurn` callers onto `Op::UserInput` and deleting `Op::UserTurn`.
    This touches a lot of files, but it is a low-risk mechanical migration.
    
    ## Stack
    
    1. [1 of 7] [Add thread settings to
    UserInput](https://github.com/openai/codex/pull/23080)
    2. [2 of 7] [Remove
    UserInputWithTurnContext](https://github.com/openai/codex/pull/23081)
    3. [3 of 7] [Remove
    UserTurn](https://github.com/openai/codex/pull/23075) (this PR)
    4. [4 of 7] [Placeholder for OverrideTurnContext
    cleanup](https://github.com/openai/codex/pull/23087)
    5. [5 of 7] [Replace OverrideTurnContext with
    ThreadSettings](https://github.com/openai/codex/pull/22508)
    6. [6 of 7] [Add app-server thread settings
    API](https://github.com/openai/codex/pull/22509)
    7. [7 of 7] [Sync TUI thread
    settings](https://github.com/openai/codex/pull/22510)
  • [codex] Remove legacy shell output formatting paths (#22706)
    ## Why
    
    The client and tool pipeline still carried compatibility code for legacy
    structured shell output. Current shell and apply_patch responses are
    already plain text for model consumption, so keeping a
    JSON-serialization path plus shell-item rewrite logic makes the request
    formatter and tests preserve a format we do not need anymore.
    
    ## What Changed
    
    - Removed the client-side shell output rewrite from
    `core/src/client_common.rs`.
    - Removed the structured exec-output formatter and the shell `freeform`
    switch so tool emitters use one model-facing formatter.
    - Collapsed apply_patch/shell serialization tests around the remaining
    plain-text output expectations and removed duplicate one-variant
    parameterized cases.
    - Kept the `ApplyPatchModelOutput::ShellCommandViaHeredoc` compatibility
    input shape, but no longer treats it as a separate output-format mode.
    
    ## Validation
    
    - `cargo test -p codex-core client_common`
    - `cargo test -p codex-core shell_serialization`
    - `cargo test -p codex-core apply_patch_cli`
    - `just fix -p codex-core`
    
    ## Documentation
    
    No external Codex documentation update is needed.
  • Add user_input_requested_during_turn to MCP turn metadata (#22237)
    ## Why
    - Similar change to https://github.com/openai/codex/pull/21219
    - Without this change: MCP tool calls receive
    `_meta["x-codex-turn-metadata"]` with various keys and values.
    - Issue: MCP servers currently cannot tell whether user input was
    requested during the turn (e.g., the model decides to prompt the user
    for approval mid-turn before making a possibly risky tool call). MCP
    servers may want to know this when tracking latency metrics, because
    the measured latency of such turns is inflated.
    
    ## What Changed
    - With this change: MCP turn metadata now includes
    `user_input_requested_during_turn` when a model-visible
    `request_user_input` call happened earlier in the turn, propagated in
    `_meta["x-codex-turn-metadata"]`.
    - `mark_turn_user_input_requested()` is called when user input is
    requested through either MCP elicitation (`mcp.rs`) or the
    `request_user_input` tool (`mod.rs`).
    - MCP tool call `_meta` is now built immediately before execution
    (`mcp_tool_call.rs`) so user input requested earlier in the same turn,
    including within the same tool call via elicitation, is reflected in the
    metadata.
    - Normal `/responses` turn metadata headers are unchanged.
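    
    To show why building `_meta` immediately before execution matters, here
    is a minimal sketch. `mark_turn_user_input_requested` and the
    `user_input_requested_during_turn` key come from this PR; `TurnState`,
    `build_tool_call_meta`, the `turn_id` key, and the string-valued map are
    assumptions for illustration only.
    
    ```rust
    use std::collections::BTreeMap;
    use std::sync::atomic::{AtomicBool, Ordering};
    
    /// Hypothetical per-turn state shared by the elicitation and tool paths.
    #[derive(Debug, Default)]
    pub struct TurnState {
        user_input_requested: AtomicBool,
    }
    
    impl TurnState {
        /// Called whenever user input is requested during the turn.
        pub fn mark_turn_user_input_requested(&self) {
            self.user_input_requested.store(true, Ordering::SeqCst);
        }
    
        /// Builds metadata at call time, so it reflects requests made
        /// earlier in the same turn. The key is present only when set.
        pub fn build_tool_call_meta(&self, turn_id: &str) -> BTreeMap<String, String> {
            let mut meta = BTreeMap::new();
            // `turn_id` is an illustrative key, not taken from the PR.
            meta.insert("turn_id".to_string(), turn_id.to_string());
            if self.user_input_requested.load(Ordering::SeqCst) {
                meta.insert(
                    "user_input_requested_during_turn".to_string(),
                    "true".to_string(),
                );
            }
            meta
        }
    }
    ```
    
    If the metadata were instead computed once at turn start, a request
    made mid-turn would never appear in it.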
    
    ## Verification
    - `codex-rs/core/src/session/mcp_tests.rs`
    - `codex-rs/core/src/tools/handlers/request_user_input_tests.rs`
    - `codex-rs/core/src/turn_metadata_tests.rs`
    - `codex-rs/core/tests/suite/search_tool.rs`
  • Remove SSE fixture loaders (#22684)
    ## Why
    
    The Responses API test support already has structured SSE event
    builders. Keeping separate JSON fixture loaders made small mock streams
    harder to read and left an on-disk fixture for a single event.
    
    ## What changed
    
    - Removed `load_sse_fixture` and `load_sse_fixture_with_id_from_str`
    from `core_test_support`.
    - Deleted the one `tests/fixtures/incomplete_sse.json` Responses API
    fixture.
    - Replaced the remaining call sites with `responses::sse(...)` and
    existing event helpers.
    
    ## Validation
    
    - `cargo test -p codex-core --test all
    stream_no_completed::retries_on_early_close`
    - `cargo test -p codex-core --test all
    history_dedupes_streamed_and_final_messages_across_turns`
    - `cargo test -p codex-core --test all review::`
  • chore(features) rm Feature::ApplyPatchFreeform (#22711)
    ## Summary
    Removes `Feature::ApplyPatchFreeform`, since freeform apply_patch is
    effectively on by default in all cases where we should use it, or can
    be configured via models.json.
    
    ## Testing
    - [x] unit tests pass
  • Fix remote environment test fixtures (#22572)
    ## Why
    The Docker remote-env coverage was failing before it reached the
    behavior those tests are meant to exercise. The remote-aware test
    fixture only registered the remote environment, so tests that
    intentionally select both `local` and `remote` could not start a turn.
    After that was fixed, two tests exposed stale fixtures: the approval
    test was auto-approving under workspace-write, and the remote
    `view_image` test was writing invalid PNG bytes.
    
    ## What Changed
    - Added `EnvironmentManager::create_for_tests_with_local(...)` so tests
    can keep the provider default while also selecting `local` explicitly.
    - Updated `build_remote_aware()` to use that test-only manager when a
    remote exec-server URL is present.
    - Changed the remote apply-patch approval helper to use
    `SandboxPolicy::new_read_only_policy()` so the test actually exercises
    approval caching per environment.
    - Replaced the hardcoded remote `view_image` PNG blob with the existing
    `png_bytes(...)` helper so the test uses a valid image fixture.
    
    ## Validation
    Ran these isolated Docker remote-env tests on the devbox with
    `$remote-tests` setup:
    - `suite::remote_env::apply_patch_freeform_routes_to_selected_remote_environment`
    - `suite::remote_env::apply_patch_approvals_are_remembered_per_environment`
    - `suite::remote_env::apply_patch_intercepted_exec_command_routes_to_selected_remote_environment`
    - `suite::remote_env::exec_command_routes_to_selected_remote_environment`
    - `suite::view_image::view_image_routes_to_selected_remote_environment`
    
    All five pass.