3916 Commits

  • Allow codex sandbox to consume MCP sandbox state (#29358)
    ## Summary
    
    - let `codex sandbox` accept the JSON value from
    `codex/sandbox-state-meta`
    - require the payload `permissionProfile` instead of falling back to
    ambient permissions
    - reuse the existing macOS, Linux, and Windows launch paths, treating
    external sandbox state conservatively as read-only
    - let opaque forwarders add runtime read roots and disable direct
    network access without decoding the payload
    
    Builds on #29113, which is now on `main`.
    
    ## Tests
    
    - `just test -p codex-cli debug_sandbox::tests`
    - `cargo build -p codex-rmcp-client --bin test_stdio_server`
    - `just test -p codex-core
    stdio_mcp_tool_call_includes_sandbox_state_meta`
    - `just test -p codex-mcp`
    - `just fmt`
  • Centralize Codex Apps client handling (#29528)
    ## Why
    
    Codex Apps-specific behavior is currently distributed across cache
    helpers, startup, tool conversion, and model-visible annotation. Each
    layer independently checks the reserved server name, which obscures the
    boundary between trusted host-owned connector metadata and regular MCP
    server data.
    
    Classifying the server once when `AsyncManagedClient` is created gives
    the client a single source of truth and makes the two processing paths
    explicit.
    
    ## What changed
    
    - Record whether an `AsyncManagedClient` represents the Codex Apps
    server at construction time.
    - Route startup cache loading, cache persistence, and cache telemetry
    through the Codex Apps branch.
    - Split uncached tool conversion between Codex Apps normalization and
    regular MCP metadata sanitization.
    - Split model-visible schema and plugin provenance handling along the
    same boundary.
    - Remove redundant server-name guards from helpers that are now called
    only from the Codex Apps branch.
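
    A minimal sketch of the classify-once idea, not the actual
    `AsyncManagedClient` code. The `ManagedClient` and `ServerKind` types, the
    `codex_apps` name, and the `convert_tool` behavior are all assumptions made
    for illustration:

    ```rust
    // Hypothetical reserved name; the real constant lives in the Codex sources.
    const CODEX_APPS_SERVER_NAME: &str = "codex_apps";

    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum ServerKind {
        CodexApps,
        RegularMcp,
    }

    // Stand-in for the managed client; only the classification is modeled.
    struct ManagedClient {
        server_name: String,
        kind: ServerKind,
    }

    impl ManagedClient {
        // The reserved-name check happens exactly once, at construction.
        fn new(server_name: &str) -> Self {
            let kind = if server_name == CODEX_APPS_SERVER_NAME {
                ServerKind::CodexApps
            } else {
                ServerKind::RegularMcp
            };
            Self { server_name: server_name.to_string(), kind }
        }

        fn server_name(&self) -> &str {
            &self.server_name
        }

        // Later stages branch on the stored kind instead of re-checking the name.
        fn convert_tool(&self, raw_title: &str) -> String {
            match self.kind {
                // Illustrative "normalization" for host-owned connector data.
                ServerKind::CodexApps => format!("apps:{}", raw_title.trim()),
                // Illustrative "sanitization" for regular MCP data.
                ServerKind::RegularMcp => {
                    raw_title.chars().filter(|c| !c.is_control()).collect()
                }
            }
        }
    }
    ```

    Helpers reached only through the `CodexApps` arm no longer need their own
    server-name guard, which is the redundancy this change removes.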
    
    ## Verification
    
    - Preserve behavioral coverage that verifies Codex Apps connector
    metadata and the complete converted `ToolInfo` shape.
    
    ## Stack
    
    Depends on #29518.
  • [codex] Use input items for Responses Lite tools (#27946)
    When using Responses Lite, we should use `additional_tools` and a
    developer item instead of the top-level tools array and instructions
    field. This keeps things 1-to-1.
    
    Forced namespacing for _all_ tools will land in a following PR after
    some coordination & fixes in Responses API (around collisions & return
    items).
    
    The goal is to eventually expand the scope of this to _all_ requests
    from codex, but that will require larger coordination across providers &
    slower rollout.
  • Remove redundant Codex Apps manager flag (#29518)
    ## Why
    
    Codex Apps server admission is already decided before
    `McpConnectionManager` is constructed. `effective_mcp_servers` and
    `effective_mcp_servers_from_configured` remove the server when the apps
    feature or required authentication is unavailable, so storing the same
    decision on the manager duplicates state that can drift from the
    effective server map.
    
    ## What changed
    
    - Remove `host_owned_codex_apps_enabled` from `McpConnectionManager` and
    its constructor.
    - Identify the host-owned Codex Apps server by its reserved server name
    once it is present in the effective server map.
    - Remove the now-unused flag calculations and constructor arguments from
    production and test callsites.
  • [codex] stylistic changes (#29068)
    ## Summary
    
    - express remote compaction result handling as an exhaustive match
    - preserve the special `TurnAborted` path without emitting a generic
    compaction error
    - rely on the standard `test_codex` provider setup in the compaction
    budget test
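
    A rough sketch of the exhaustive-match shape described above. The
    `CompactionOutcome` and `Event` types are hypothetical stand-ins, not the
    real `codex-core` types:

    ```rust
    // Hypothetical result type for a remote compaction attempt.
    #[derive(Debug, PartialEq)]
    enum CompactionOutcome {
        Completed(String),
        TurnAborted,
        Failed(String),
    }

    // Hypothetical events the caller may emit.
    #[derive(Debug, PartialEq)]
    enum Event {
        HistoryReplaced(String),
        CompactionError(String),
    }

    // No wildcard arm: adding a variant forces this match to be revisited.
    fn handle(outcome: CompactionOutcome) -> Option<Event> {
        match outcome {
            CompactionOutcome::Completed(summary) => Some(Event::HistoryReplaced(summary)),
            // The abort path stays special: no generic compaction error is emitted.
            CompactionOutcome::TurnAborted => None,
            CompactionOutcome::Failed(reason) => Some(Event::CompactionError(reason)),
        }
    }
    ```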
    
    Follow-up to review feedback on #28707.
    
    ## Testing
    
    - `just test -p codex-core
    compaction_budget_exhaustion_aborts_without_error_or_retry`
    - `just fmt`
  • [codex] Expose service tier and reasoning effort in OTEL (#29155)
    ## Summary
    
    NVIDIA asked to measure Fast mode usage and reasoning effort from Codex
    CLI OTEL logs. Add the finalized `service_tier` and
    `model_reasoning_effort` to the existing `codex.sse_event`
    `response.completed` record.
    
    This intentionally reuses the existing completion event and leaves
    transport APIs and shared telemetry plumbing unchanged.
    
    ## Testing
    
    - `cargo build -p codex-cli --bin codex`
    - `just test -p codex-core responses_api_emits_api_request_event`
    - End-to-end with the built CLI and a local OTLP/HTTP collector:
    - Fast/high emitted `service_tier=priority` and
    `model_reasoning_effort=high` with token usage.
    - Standard/low omitted `service_tier` and emitted
    `model_reasoning_effort=low` with token usage.
  • Propagate safety buffering treatment metadata (#29473)
    ## Summary
    
    - read the request-scoped safety-buffering treatment from HTTP response
    headers and per-turn WebSocket metadata through one shared header parser
    - combine that treatment with Responses API safety-buffering signals
    - propagate `showBufferingUi` and nullable `fasterModel` through the
    existing `model/safetyBuffering/updated` app-server notification
    - update the app-server documentation and generated JSON and TypeScript
    schemas
    
    The public implementation contains no model mapping or real model
    identifier. Tests and protocol examples use generic `current-model` and
    `faster-model` placeholders only.
    
    ## Dependencies
    
    - server-side treatment evaluation:
    https://github.com/openai/openai/pull/1060247
    - initial Responses API safety-buffering propagation:
    https://github.com/openai/codex/pull/29371
    - Codex App UI: https://github.com/openai/openai/pull/1057789
    
    ## Validation
    
    - Codex API tests: 129 passed
    - focused Codex core safety-buffering integration test passed
    - app-server protocol tests passed after regenerating schema fixtures
    - Clippy fix and repository formatting completed successfully
    
    The broader app-server run compiled all changed crates and completed
    with 1,269 passing tests. Its remaining failures were unrelated
    environment limitations: macOS sandbox application was denied, one
    expected test binary was unavailable, and several existing subprocess
    tests timed out as a result.
  • mcp: accept foreign absolute cwd for remote stdio (#29493)
    ## Why
    
    Remote stdio MCP servers can run in an environment whose path convention
    differs from the Codex host. A Windows cwd such as
    `C:\Users\openai\share` is absolute for the executor but was rejected by
    a POSIX orchestrator.
    
    Built on #29501, now merged, which only clarifies the host-native
    `PathUri` constructor name.
    
    ## What changed
    
    - Deserialize MCP cwd values as `LegacyAppPathString` so config does not
    apply host path rules.
    - Interpret that spelling as host-native for local launches and convert
    it to `PathUri` at executor launch.
    - Skip host filesystem and command resolution checks for remote stdio in
    `codex doctor`.
    - Add host-independent config and executor-boundary coverage using the
    foreign path convention for each test platform.
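
    To illustrate why host path rules are the wrong check here, the sketch
    below tests absoluteness per convention instead of using the host's
    `Path::is_absolute`. The helper names are hypothetical, and this is not how
    `LegacyAppPathString` or `PathUri` are implemented:

    ```rust
    /// True for drive-letter paths such as `C:\Users\x` or `C:/Users/x`,
    /// and for UNC paths such as `\\server\share`.
    fn is_windows_absolute(path: &str) -> bool {
        let bytes = path.as_bytes();
        let drive = bytes.len() >= 3
            && bytes[0].is_ascii_alphabetic()
            && bytes[1] == b':'
            && (bytes[2] == b'\\' || bytes[2] == b'/');
        drive || path.starts_with(r"\\")
    }

    /// True for paths rooted at `/`.
    fn is_posix_absolute(path: &str) -> bool {
        path.starts_with('/')
    }

    /// A remote cwd is acceptable if it is absolute under either convention,
    /// regardless of which platform the orchestrator runs on.
    fn is_absolute_in_any_convention(path: &str) -> bool {
        is_windows_absolute(path) || is_posix_absolute(path)
    }
    ```

    On a POSIX host, `std::path::Path::new(r"C:\Users\openai\share").is_absolute()`
    is false, which matches the rejection described above.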
    
    ## Validation
    
    - `just test -p codex-utils-path-uri -p codex-config -p codex-mcp -p
    codex-rmcp-client` (408 passed)
    - `just test -p codex-cli -p codex-rmcp-client` (372 passed)
    - `cargo check --workspace --tests`
    - `just test` (11,311 passed; 43 unrelated environment/timing failures)
    - `just fix -p codex-cli -p codex-config -p codex-core -p codex-mcp -p
    codex-mcp-extension -p codex-rmcp-client -p codex-tui`
  • chore: improve expired Bedrock credential errors (#28992)
    ## Why
    
    Amazon Bedrock returns a `401 Unauthorized` response containing
    `Signature expired:` when an AWS credential, including a short-lived
    `AWS_BEARER_TOKEN_BEDROCK`, has expired. Codex currently surfaces that
    response as a generic `unexpected status` error, which does not explain
    how to recover.
    
    Environment-provided bearer tokens cannot be refreshed automatically, so
    the error should direct users to refresh their AWS credentials or
    replace or remove the environment token and restart Codex. This
    classification belongs to the Amazon Bedrock provider so similar
    responses from other providers retain their existing behavior.
    
    ## What changed
    
    - Add a synchronous `ModelProvider::map_api_error` hook that defaults to
    the existing provider-neutral API error mapping, and route model
    request, stream, WebSocket, and terminal unauthorized errors through the
    active provider.
    - Override the hook for Amazon Bedrock. After preserving the structured
    status, body, URL, and request metadata, recognize `401` responses
    containing `Signature expired:` and attach actionable credential
    guidance.
    - Keep `codex-protocol` provider-neutral by representing the guidance as
    an optional `user_message`. Error rendering prefers this message while
    continuing to append the URL, request ID, Cloudflare ray, and
    authorization diagnostics.
    - Add model-provider coverage for expired signatures and negative cases,
    core coverage for provider dispatch after unauthorized recovery, and a
    TUI snapshot for the rendered error.
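
    A simplified sketch of the hook pattern, assuming a much smaller `ApiError`
    than the real one. The struct fields, the provider structs, and the
    guidance wording are illustrative assumptions; only the
    `ModelProvider::map_api_error` name, the `401` status, the
    `Signature expired:` marker, and the optional `user_message` come from the
    description above:

    ```rust
    // Reduced, hypothetical error type; the real one also carries URL and
    // request metadata.
    #[derive(Debug, Clone, PartialEq)]
    struct ApiError {
        status: u16,
        body: String,
        user_message: Option<String>,
    }

    trait ModelProvider {
        // Default: provider-neutral mapping, modeled here as a pass-through.
        fn map_api_error(&self, error: ApiError) -> ApiError {
            error
        }
    }

    struct GenericProvider;
    impl ModelProvider for GenericProvider {}

    struct BedrockProvider;
    impl ModelProvider for BedrockProvider {
        fn map_api_error(&self, mut error: ApiError) -> ApiError {
            // Status and body are preserved; only the guidance is attached.
            if error.status == 401 && error.body.contains("Signature expired:") {
                error.user_message = Some(
                    "Refresh your AWS credentials, or replace or remove \
                     AWS_BEARER_TOKEN_BEDROCK, then restart Codex."
                        .to_string(),
                );
            }
            error
        }
    }
    ```

    Because the override lives on the Bedrock provider, a similar `401` from
    any other provider keeps its existing behavior.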
    
    ## Testing
    Tested with a real request using an expired Bedrock key:
    <img width="962" height="126" alt="Screenshot 2026-06-22 at 3 56 51 PM"
    src="https://github.com/user-attachments/assets/7e21cc7c-798e-4662-8467-7f304a2f2b59"
    />
  • fix: world state response item test (#29504)
    Seems to be a merge conflict on main:
    
    > pakrym-oai introduced the stale initializer in commit 3b32d861c5, PR
    #29249.
    > Context: Owen Lin renamed metadata to
    internal_chat_message_metadata_passthrough in PR #28968. PR #29249 then
    landed afterward with the old field name, causing the compile/Clippy
    failure.
  • path-uri: clarify host-native path conversion (#29501)
    ## Why
    
    Downstream refactors are producing confusing code because this
    functionality has a very generic name. Encoding the specific
    conversion approach in the method name makes it clearer.
    
    ## What
    
    Rename `PathUri::from_path` to `PathUri::from_host_native_path` and
    update its Rust call sites.
  • [codex] Use tool search for MCP tools by default (#29486)
    ## Why
    
    MCP tools were only placed behind `tool_search` when a feature flag was
    enabled or when there were at least 100 tools. That made the model's
    tool flow depend on both rollout configuration and the number of
    installed tools.
    
    The searched-tool flow is now the intended behavior. Making it
    unconditional when the model and provider support it gives every
    supported setup the same behavior and lets us retire the feature flag
    safely.
    
    ## What changed
    
    - Defer all effective MCP tools when `tool_search` and namespaced tools
    are supported.
    - Keep exposing MCP tools directly when search cannot be used, so older
    or unsupported model/provider combinations still work.
    - Mark `tool_search_always_defer_mcp_tools` as removed and ignore old
    configured values.
    - Keep plugin filtering, app-only filtering, file handling, and MCP
    calls working through the searched-tool flow.
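
    The new rule can be sketched as a single capability check. The
    `Capabilities` struct and `McpToolExposure` enum are hypothetical names,
    not the real planner types:

    ```rust
    #[derive(Debug, PartialEq)]
    enum McpToolExposure {
        // Tools are discovered through `tool_search` first.
        DeferredBehindSearch,
        // Tools are listed directly in the request.
        Direct,
    }

    // Hypothetical view of what the model/provider pair supports.
    struct Capabilities {
        tool_search: bool,
        namespaced_tools: bool,
    }

    // No feature flag and no tool-count threshold: only capabilities decide.
    fn mcp_tool_exposure(caps: &Capabilities) -> McpToolExposure {
        if caps.tool_search && caps.namespaced_tools {
            McpToolExposure::DeferredBehindSearch
        } else {
            McpToolExposure::Direct
        }
    }
    ```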
    
    ## Why many tests changed
    
    Many tests used to act as if the model could see MCP tools in its first
    request and call them immediately. That is no longer the real flow: the
    model first receives `tool_search`, searches for a tool, receives the
    matching MCP tool, and then calls it in the next request.
    
    The tests therefore needed an extra search step, and checks for tool
    names, descriptions, and input fields had to move from the first request
    to the search result. These are not separate product changes; they make
    the tests follow what the model will actually see after this change. The
    plugin tests still check which tools are allowed and where they came
    from, the file tests still check upload fields and behavior, and the MCP
    round-trip test still checks a successful call from start to finish.
    
    ## Tests
    
    - `just test -p codex-features`
    - Focused `codex-core` tests for MCP exposure and tool planning
    - `just test -p codex-core explicit_plugin_mentions`
    - `just test -p codex-core stdio_server_round_trip`
    - Focused `codex-core` tests for tool search, app-only tools, and MCP
    file uploads
  • feat(core): store turn_id on ResponseItem metadata (#28360)
    ## Description
    
    This PR is a follow-up to https://github.com/openai/codex/pull/28355 and
    starts assigning `internal_chat_message_metadata_passthrough.turn_id` to
    durable Responses API items created during a turn.
    
    The goal is that those items keep the `turn_id` that introduced them
    when Codex resends stateless HTTP context, reconstructs history for
    resume/fork paths, or reuses websocket response state.
    
    ## What changed
    
    - Set `internal_chat_message_metadata_passthrough.turn_id` when missing
    as response items enter durable history, initial/replacement history,
    inter-agent communication history, and local compaction summaries.
    - Preserve existing item turn IDs instead of overwriting them during
    persistence, resume reconstruction, compaction, forked history, and
    websocket incremental reuse.
    - Keep `compaction_trigger` fieldless because it is a request control,
    not a durable response item.
    - Update focused history/request assertions and fixtures for stateless
    requests, websocket incrementals, compaction, thread injection, prompt
    debug, and related CI coverage.
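
    A small sketch of the set-when-missing, preserve-when-present rule. The
    `Item` and `ItemMetadata` types are reduced, hypothetical stand-ins for
    `ResponseItem` and its passthrough metadata:

    ```rust
    #[derive(Debug, Default, Clone, PartialEq)]
    struct ItemMetadata {
        turn_id: Option<String>,
    }

    #[derive(Debug, Default, Clone, PartialEq)]
    struct Item {
        metadata: Option<ItemMetadata>,
    }

    // Stamp the current turn only if the item does not already carry one.
    fn stamp_turn_id(item: &mut Item, current_turn_id: &str) {
        let meta = item.metadata.get_or_insert_with(ItemMetadata::default);
        if meta.turn_id.is_none() {
            meta.turn_id = Some(current_turn_id.to_string());
        }
    }
    ```

    Running the same function during resume or fork reconstruction is then
    harmless: items keep the turn that introduced them.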
  • [codex] replace remote images with model-visible error text (#29417)
    ## What
    
    This PR extends the existing centralized image-preparation path to
    replace HTTP(S) image inputs with a model-visible error message. It
    does not break existing rollouts, but it deprecates support for the
    pathway. App-server clients that want to upgrade should no longer use
    HTTP image URLs.
    
    The HTTP image URL pathway is currently resolved in the Responses API,
    which is slow and not recommended.
    
    ## Behavior
    
    - HTTP(S) image URL: replace with `input_text`
    - data URL: use the existing decode and resize path
    - other image URL schemes: leave unchanged
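
    The three cases above can be sketched as a scheme dispatch. `ImageAction`
    and `classify_image_url` are hypothetical names, and the real
    image-preparation path does more than classify:

    ```rust
    #[derive(Debug, PartialEq)]
    enum ImageAction {
        // Swap the image for an `input_text` error message.
        ReplaceWithInputText,
        // Keep using the existing decode and resize path.
        DecodeAndResize,
        // Pass the item through untouched.
        LeaveUnchanged,
    }

    fn classify_image_url(url: &str) -> ImageAction {
        // URL schemes are case-insensitive, so normalize before matching.
        let scheme = url.split_once(':').map(|(s, _)| s.to_ascii_lowercase());
        match scheme.as_deref() {
            Some("http") | Some("https") => ImageAction::ReplaceWithInputText,
            Some("data") => ImageAction::DecodeAndResize,
            _ => ImageAction::LeaveUnchanged,
        }
    }
    ```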
    
    This intentionally does not change app-server ingress. That validation
    remains a follow-up.
    
    ## Test plan
    
    - `just test -p codex-core -E
    'test(/image_preparation|prepares_image_failures_before_history_insertion|prepares_resumed_history_before_installing_it|responses_lite_prepares_images/)'`
    — 7 passed
    - `just fix -p codex-core`
    - `just fmt`
  • core: wrap token budget window context (#29494)
    Token-budget initial context carries thread and context-window lineage
    that the model should treat as one structured context-window block.
    Wrapping it in `<context_window>` makes that boundary explicit while
    preserving the existing window id content.
    
    Before this change, the window identifiers were injected as an untagged
    developer text fragment:
    
    ```text
    Thread id <THREAD_ID>.
    First context window id: <FIRST_WINDOW_ID>
    Current context window id: <WINDOW_ID>
    Previous context window id: <PREVIOUS_WINDOW_ID>
    ```
    
    After this change, the same payload is wrapped as a context-window
    block:
    
    ```text
    <context_window>
    Thread id: <THREAD_ID>
    First context window id: <FIRST_WINDOW_ID>
    Current context window id: <WINDOW_ID>
    Previous context window id: <PREVIOUS_WINDOW_ID>
    </context_window>
    ```
    
    This adds shared `CONTEXT_WINDOW_*_TAG` protocol constants, updates
    `TokenBudgetContext` to render with those markers, treats the new
    wrapper as contextual developer content when mapping history, and
    refreshes the token-budget request-shape assertions and snapshot.
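
    As a rough sketch of the rendering, assuming the constants are named
    `CONTEXT_WINDOW_OPEN_TAG` and `CONTEXT_WINDOW_CLOSE_TAG` (the description
    only gives the `CONTEXT_WINDOW_*_TAG` pattern) and that
    `TokenBudgetContext` holds four string ids:

    ```rust
    // Assumed constant names; only the `CONTEXT_WINDOW_*_TAG` pattern is given.
    const CONTEXT_WINDOW_OPEN_TAG: &str = "<context_window>";
    const CONTEXT_WINDOW_CLOSE_TAG: &str = "</context_window>";

    // Hypothetical field layout for illustration.
    struct TokenBudgetContext<'a> {
        thread_id: &'a str,
        first_window_id: &'a str,
        current_window_id: &'a str,
        previous_window_id: &'a str,
    }

    impl TokenBudgetContext<'_> {
        // Render the four id lines between the opening and closing tags.
        fn render(&self) -> String {
            format!(
                "{}\nThread id: {}\nFirst context window id: {}\n\
                 Current context window id: {}\nPrevious context window id: {}\n{}",
                CONTEXT_WINDOW_OPEN_TAG,
                self.thread_id,
                self.first_window_id,
                self.current_window_id,
                self.previous_window_id,
                CONTEXT_WINDOW_CLOSE_TAG,
            )
        }
    }
    ```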
    
    Verification:
    - `just test -p codex-core token_budget`
    - `just test -p codex-core
    recognizes_context_window_as_contextual_developer_content`
  • [codex] migrate environment context to model world state (#29249)
    ## Why
    
    Environment context is model-visible state, but it is currently
    assembled from transient turn values and diffed through
    environment-specific paths. That makes initial injection, turn-to-turn
    updates, and changes that happen within a turn use different baselines.
    
    This PR introduces the smallest useful model world-state slice:
    environments only, with one in-memory baseline and one renderer for full
    state and diffs.
    
    ## What changed
    
    - Add a typed `WorldState` container whose sections render fragments
    relative to an optional previous value. Full rendering uses the same
    diff path with no previous state.
    - Replace the parallel `EnvironmentContext` representation with an
    `EnvironmentsState` section keyed by environment ID and rendered in
    deterministic order.
    - Preserve the legacy single-environment output while supporting
    multiple environments, starting environments, unavailable tombstones,
    and changes to persisted turn-context values.
    - Store the latest complete `WorldState` on `ContextManager` and use it
    for both turn-boundary and mid-turn environment diffs.
    - Build initial and post-compaction context from the same world-state
    builder, then retain the rendered state as the next baseline.
    - Seed the in-memory baseline from the latest `TurnContextItem` when
    resuming an existing rollout; the world state itself is not serialized.
    - Keep non-world settings updates on their existing path and merge
    rendered world-state fragments at the session consumer.
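
    A minimal sketch of the one-renderer idea: full rendering is the diff path
    with no previous state. The field layout (environment ID mapped to a cwd
    string) and the fragment text are assumptions, not the real
    `EnvironmentsState`:

    ```rust
    use std::collections::BTreeMap;

    #[derive(Debug, Clone, PartialEq)]
    struct EnvironmentsState {
        // Environment ID -> cwd. BTreeMap iteration gives deterministic order.
        environments: BTreeMap<String, String>,
    }

    impl EnvironmentsState {
        // Render only what changed relative to `previous`.
        fn render_diff(&self, previous: Option<&EnvironmentsState>) -> Vec<String> {
            let mut out = Vec::new();
            for (id, cwd) in &self.environments {
                let unchanged = previous.and_then(|p| p.environments.get(id)) == Some(cwd);
                if !unchanged {
                    out.push(format!("environment {id}: cwd={cwd}"));
                }
            }
            // Environments that disappeared are reported as tombstones.
            if let Some(prev) = previous {
                for id in prev.environments.keys() {
                    if !self.environments.contains_key(id) {
                        out.push(format!("environment {id}: unavailable"));
                    }
                }
            }
            out
        }

        // Full rendering reuses the diff path with no baseline.
        fn render_full(&self) -> Vec<String> {
            self.render_diff(None)
        }
    }
    ```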
    
    ## Known limitation
    
    A legacy `TurnContextItem` only reconstructs the primary environment as
    `local`; it cannot faithfully recover a remote-primary environment ID
    after resume. Live state uses the exact environment IDs once a complete
    baseline is established.
    
    ## Test plan
    
    - `just test -p codex-core world_state`
    - `just test -p codex-core record_context_updates`
    - `just test -p codex-core deferred_executor_`
    - `just test -p codex-core build_initial_context`
    - `just test -p codex-core rollout_reconstruction`
    - `just test -p codex-core
    process_compacted_history_reinjects_full_initial_context`
  • Register full CDP requirements feature (#28769)
    Register the full CDP requirements feature flag.
  • fix(config): address permission profile review follow-ups (#29479)
    ## Summary
    
    - rename `Config::permission_profile_allowed` to
    `is_permission_profile_allowed`
    - use `BUILT_IN_PERMISSION_PROFILE_DANGER_FULL_ACCESS` in the TUI and
    its assertion
    - follow up on the late review comments from #26678
    
    The previous `:danger-no-sandbox` value was an invalid built-in profile
    ID. #26678 corrected it to `:danger-full-access`; this PR centralizes
    the value to prevent future drift.
    
    ## Testing
    
    - Not run per request; `cargo fmt` only
    
    Co-authored-by: Codex <noreply@openai.com>
  • permission profiles: expose availability to clients (#26678)
    ## Why
    
    `permissionProfile/list` currently advertises every built-in and
    configured profile even when effective enterprise requirements prevent
    selecting it. That forces each client to reconstruct policy from
    lower-level requirement fields, which is easy to miss and difficult to
    keep consistent.
    
    The catalog should remain complete so clients can explain that an option
    was disabled by an administrator, while also reporting whether each
    profile is selectable.
    
    ## What
    
    - Add an `allowed` field to each permission profile summary.
    - Build a shared catalog from the effective config and current
    requirements, including `allowed_sandbox_modes`, `allowed_permissions`,
    and filesystem restrictions.
    - Use the shared catalog in app-server and the TUI so disallowed
    profiles remain visible but cannot be selected.
    - Use the canonical `:danger-full-access` profile ID in the TUI.
    - Update the app-server schemas, API documentation, behavioral tests,
    and TUI snapshots.
    
    ## Scope
    
    This PR targets `main` directly and is independent of #24852. It
    preserves the current behavior where built-in profiles are constrained
    by sandbox-mode requirements and `allowed_permissions` applies to
    configured profiles.
    
    ## Testing
    
    - `just test -p codex-core
    permission_profile_catalog_marks_profiles_disallowed_by_requirements`
    - `just test -p codex-app-server permission_profile_list`
    - `just test -p codex-app-server-protocol`
    - `just test -p codex-tui profile_permissions`
    - `just fix -p codex-core`
    - `just fix -p codex-app-server-protocol`
    - `just fix -p codex-app-server`
    - `just fix -p codex-tui`
    - `just fmt`
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
    Co-authored-by: Joey Trasatti <joey.trasatti@openai.com>
  • [codex] configure rollout budget reminder thresholds (#29423)
    ## Summary
    
    Instead of:
    
        reminder_interval_tokens = 65_536
    
    allow users to configure explicit remaining-token reminder thresholds:
    
        reminder_at_remaining_tokens = [65_536, 32_768, 16_384, 8_192, 4_096,
        2_048, 1_024, 512]
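
    One way to read explicit thresholds, sketched under the assumption that a
    reminder fires when the remaining budget drops to or below a threshold it
    was previously above. The `crossed_thresholds` helper is hypothetical and
    may not match the actual reminder logic:

    ```rust
    // Return the thresholds crossed when the remaining budget moves from
    // `remaining_before` down to `remaining_after`, largest first.
    fn crossed_thresholds(
        thresholds: &[u64],
        remaining_before: u64,
        remaining_after: u64,
    ) -> Vec<u64> {
        let mut crossed: Vec<u64> = thresholds
            .iter()
            .copied()
            .filter(|&t| remaining_after <= t && t < remaining_before)
            .collect();
        crossed.sort_unstable_by(|a, b| b.cmp(a));
        crossed
    }
    ```

    Unlike a fixed interval, a descending list lets reminders become more
    frequent as the budget runs out.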
    
    ## Validation
    
    - `CARGO_INCREMENTAL=0 just test -p codex-core rollout_budget` (9 passed)
    - `just fix -p codex-core`
    - `just fmt`
  • PAC 2 - Add shared auth system proxy contract (#26707)
    ## Summary
    
    Stacked on #26706.
    
    Adds the shared auth/system-proxy contract that later platform resolver
    PRs plug into. This PR moves Codex-owned auth and startup HTTP clients
    through a common route-aware boundary, but does not yet add Windows or
    macOS system proxy resolution.
    
    The default path remains unchanged when `respect_system_proxy` is absent
    or disabled.
    
    ## Implementation
    
    - Adds `codex-client/src/outbound_proxy.rs` with the shared
    route-selection model:
      - `OutboundProxyConfig`;
      - `ClientRouteClass`;
      - `RouteFailureClass`;
      - `build_reqwest_client_for_route`.
    - Preserves the existing reqwest/default-client behavior when no route
    config is supplied.
    - Uses the fixed MVP routing policy when route config is supplied:
    platform system/PAC/WPAD discovery, then explicit env proxy variables,
    then direct connection.
    - Keeps platform-specific system discovery behind the shared client
    boundary. This PR provides the contract and fallback behavior; later
    resolver PRs plug in Windows and macOS discovery.
    - Adds `login::AuthRouteConfig` so auth call sites depend on a small
    policy type instead of platform resolver details.
    - Maps the resolved `Config.respect_system_proxy` boolean into
    `AuthRouteConfig` for auth-owned clients.
    - Wires the route config through browser login, device-code login,
    access-token login, login status, logout/revoke, token refresh, API-key
    exchange, app-server account login, TUI/app startup, cloud-config
    bootstrap, cloud tasks, plugin auth, and exec startup config loading.
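
    The fixed routing order can be sketched as a fallback chain. The `Route`
    enum and `select_route` function are hypothetical and are not the API of
    `outbound_proxy.rs`; the proxy values are passed in rather than read from
    the process environment:

    ```rust
    #[derive(Debug, Clone, PartialEq)]
    enum Route {
        SystemProxy(String),
        EnvProxy(String),
        Direct,
    }

    // Returns None when the route-aware path is not opted into, meaning the
    // caller keeps the existing default client behavior.
    fn select_route(
        respect_system_proxy: bool,
        system_proxy: Option<&str>,
        env_proxy: Option<&str>,
    ) -> Option<Route> {
        if !respect_system_proxy {
            return None;
        }
        // Order: platform discovery, then explicit env proxy, then direct.
        let route = system_proxy
            .map(|url| Route::SystemProxy(url.to_string()))
            .or_else(|| env_proxy.map(|url| Route::EnvProxy(url.to_string())))
            .unwrap_or(Route::Direct);
        Some(route)
    }
    ```

    On a platform without a resolver, `system_proxy` is always `None`, so the
    chain starts at the env proxy step.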
    
    ## End-user behavior
    
    - No behavior changes by default.
    - When `respect_system_proxy = true`, auth-owned clients opt into the
    shared route-aware client path.
    - On platforms without a resolver implementation in this PR, system
    discovery is unavailable and the route-aware path falls back to explicit
    env proxy handling, then direct connection.
    - Custom CA handling remains separate from proxy route selection and
    still runs through the shared client builder.
    - No proxy URLs, PAC contents, or resolved platform details are exposed
    through the public config surface introduced here.
    
    ## Tests
    
    Adds or updates coverage for:
    
    - preserving default auth-client fallback behavior when no route config
    is provided;
    - injected environment-proxy fallback without mutating process
    environment;
    - existing login-server E2E flows using explicit `auth_route_config:
    None` to guard unchanged default behavior;
    - updated auth manager, login, logout, cloud-config, startup, and
    plugin-auth call sites passing route config explicitly.
  • core: remove unused permissions cwd plumbing (#29468)
    ## Why
    
    `compile_scoped_filesystem_pattern()` accepted a `_policy_cwd` parameter
    even though scoped glob compilation no longer uses the policy working
    directory. Keeping that unused argument forced the surrounding
    permissions compilation path to keep forwarding `policy_cwd` through
    call sites that did not need it, making the API look more dependent on
    cwd resolution than it is.
    
    ## What changed
    
    Removed the unused cwd parameter from
    `compile_scoped_filesystem_pattern()` and the callers that only
    forwarded it: `compile_filesystem_permission()`,
    `compile_permission_profile()`, and
    `compile_permission_profile_selection()`. Workspace root resolution
    still keeps `policy_cwd`, because that path still resolves relative
    roots against the active policy cwd.
    
    Relevant code:
    [`codex-rs/core/src/config/permissions.rs`](https://github.com/openai/codex/blob/b8b9816102e064dae4488ec130cf560f63c1ab78/codex-rs/core/src/config/permissions.rs#L346).
    
    ## Verification
    
    - `just test -p codex-core config::permissions`
    - `just test -p codex-core` was also run after building
    `test_stdio_server`; it passed the touched permissions coverage but
    still reported unrelated existing failures in `cli_stream` and shell
    snapshot tests.
  • [codex] Start the guardian child session when parent session is started (#27982)
    ## Why
    
    The first auto-review currently creates its Guardian child session on
    demand, adding avoidable latency before the review can begin. Creating
    the ordinary Guardian child during parent-session initialization lets
    that child use the existing session startup WebSocket prewarm before the
    first escalation. This does not introduce a Guardian-specific prewarm
    mechanism.
    
    ## What changed
    
    - initialize the existing Guardian review-session manager owned by
    `Session` when a thread starts with auto-review enabled and an approval
    policy that routes to Guardian
    - use the standard Guardian child-session construction and the existing
    session startup WebSocket prewarm
    - preserve the existing reuse-key invalidation and lazy creation
    fallback when startup initialization fails or the effective review
    configuration changes
    - add an integration test that verifies normal root-session startup
    emits a Guardian `generate=false` prewarm request
    
    ## Benchmark
    
    I compared release builds against main. Each prompt first ran a
    non-escalated `sleep 3`, then requested an escalated marker command.
    
    | binary | count | avg Guardian duration | median Guardian duration | avg Guardian TTFT |
    |---|---:|---:|---:|---:|
    | origin-main | 10 | 4008.7 ms | 3949.5 ms | 3746.5 ms |
    | session-fix | 10 | 2865.0 ms | 2594.0 ms | 2492.7 ms |
    
    Guardian duration fell by 28.5% and Guardian TTFT fell by 33.5%. These
    measurements cover Guardian review latency; they do not measure parent
    thread-start latency.
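
    The percentages follow from the averages in the table. A small check,
    with `percent_reduction` as an illustrative helper:

    ```rust
    // Relative reduction from `before` to `after`, in percent.
    fn percent_reduction(before: f64, after: f64) -> f64 {
        (before - after) / before * 100.0
    }

    // Averages taken from the benchmark table, in milliseconds.
    fn guardian_duration_reduction() -> f64 {
        percent_reduction(4008.7, 2865.0)
    }

    fn guardian_ttft_reduction() -> f64 {
        percent_reduction(3746.5, 2492.7)
    }
    ```

    Rounded to one decimal place, these give 28.5 and 33.5.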
  • core: rename metadata -> internal_chat_message_metadata_passthrough (#28968)
    ## Description
    This PR cuts Codex over from generic `ResponseItem.metadata` (introduced
    here: https://github.com/openai/codex/pull/28355) to
    `ResponseItem.internal_chat_message_metadata_passthrough`, which is the
    blessed path and has strongly-typed keys.
    
    For now we have to drop this MAv2 usage of `metadata`:
    https://github.com/openai/codex/pull/28561 until we figure out where
    that should live.
  • Report remote sandbox denials semantically (#29424)
    ## Why
    
    #29113 moved remote sandbox setup and enforcement to the exec server.
    That gives the executor ownership of the platform-specific work: a Linux
    executor chooses and runs a Linux sandbox even when the Codex
    orchestrator is running on macOS or Windows.
    
    It also means the orchestrator no longer knows which concrete sandbox
    the executor selected. When that sandbox blocks a remote command, the
    orchestrator currently sees only a failed process and can treat the
    denial as an ordinary command failure. The existing sandbox approval and
    retry path is then skipped.
    
    This PR lets the executor report one portable fact:
    
    > This command probably failed because the executor sandbox blocked it.
    
    The executor keeps its concrete sandbox type private. The protocol sends
    only the semantic result.
    
    ## Example
    
    Suppose a local macOS Codex session asks a Linux devbox to write outside
    the allowed workspace.
    
    Before this PR:
    
    ```text
    Linux sandbox blocks the write
        -> remote process exits with "Permission denied"
        -> local orchestrator sees an ordinary command failure
        -> the normal sandbox approval and retry path can be skipped
    ```
    
    With this PR:
    
    ```text
    Linux sandbox blocks the write
        -> executor reports sandboxDenied: true
        -> unified exec returns UnifiedExecError::SandboxDenied
        -> the existing approval prompt is shown
        -> an approved retry runs through the existing unsandboxed retry path
    ```
    
    ## What changes
    
    ### The executor remembers its selected sandbox
    
    The prepared remote process now retains the executor-selected
    `SandboxType`. This value never crosses the executor boundary.
    
    Commands started without a sandbox retain `SandboxType::None` and are
    never reported as sandbox denials.
    
    ### The executor uses the existing denial heuristic
    
    The existing local denial heuristic moves from `codex-core` into the
    shared `codex-sandboxing` crate.
    
    When a sandboxed remote process exits, the executor:
    
    1. waits the same short output grace period used by local unified exec;
    2. reads the output currently available in the existing retained output
    buffer;
    3. runs the existing heuristic using the exit code and common denial
    messages;
    4. stores the yes/no result before publishing the process exit.
    
    This deliberately matches the old local unified-exec behavior. It does
    not add a new streaming classifier, another output buffer, or stronger
    output-retention guarantees.
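
    A simplified sketch of an exit-code-plus-message heuristic of this kind.
    The marker list, the `Platform` variant, and the function name are
    assumptions; the real heuristic in `codex-sandboxing` may check different
    codes and messages:

    ```rust
    #[derive(Debug, Clone, Copy, PartialEq)]
    enum SandboxType {
        None,
        // Stand-in for any concrete platform sandbox.
        Platform,
    }

    // Illustrative markers only; not the list used by Codex.
    const DENIAL_MARKERS: [&str; 3] = [
        "permission denied",
        "operation not permitted",
        "read-only file system",
    ];

    // Decide from the exit code and retained output whether the sandbox
    // probably blocked the command.
    fn is_likely_sandbox_denied(sandbox: SandboxType, exit_code: i32, output: &str) -> bool {
        // Unsandboxed commands and successful exits are never denials.
        if sandbox == SandboxType::None || exit_code == 0 {
            return false;
        }
        let lower = output.to_lowercase();
        DENIAL_MARKERS.iter().any(|marker| lower.contains(marker))
    }
    ```

    Only the boolean result would leave the executor; the `SandboxType` stays
    on its side of the boundary.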
    
    ### The protocol reports a portable boolean
    
    `process/read` gains `sandboxDenied`:
    
    ```json
    {
      "exited": true,
      "exitCode": 1,
      "closed": false,
      "sandboxDenied": true
    }
    ```
    
    The field defaults to `false` when an older executor omits it. The
    response does not expose the executor sandbox implementation or
    executor-native paths.
    
    ### Unified exec uses the existing error path
    
    The exec-server client carries `sandboxDenied` into the unified process
    state. If it is true, unified exec returns the existing `SandboxDenied`
    error instead of trying to classify remote output using an
    orchestrator-side sandbox type.
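
    On the client side this can be pictured as below. The
    `ProcessReadResponse` struct is a reduced, hypothetical mirror of the
    `process/read` payload, and `classify_exit` is an illustrative helper;
    only `sandboxDenied` and `UnifiedExecError::SandboxDenied` come from the
    description:

    ```rust
    // Reduced mirror of the `process/read` response.
    struct ProcessReadResponse {
        exited: bool,
        exit_code: Option<i32>,
        // None models an older executor that omits `sandboxDenied`.
        sandbox_denied: Option<bool>,
    }

    #[derive(Debug, PartialEq)]
    enum UnifiedExecError {
        SandboxDenied,
    }

    // Ok(None): still running. Ok(Some(code)): ordinary exit.
    // Err(SandboxDenied): the executor reported a probable sandbox denial.
    fn classify_exit(resp: &ProcessReadResponse) -> Result<Option<i32>, UnifiedExecError> {
        if !resp.exited {
            return Ok(None);
        }
        // A missing field defaults to false.
        if resp.sandbox_denied.unwrap_or(false) {
            return Err(UnifiedExecError::SandboxDenied);
        }
        Ok(resp.exit_code)
    }
    ```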
    
    Remote process exit remains visible as soon as the process exits. This
    PR does not wait for stdout or stderr to close and does not change the
    existing process lifecycle.
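
    A minimal sketch of how a client could consume the flag, assuming a
    hypothetical decoded response type and a stand-in error enum (neither
    name comes from the codebase). A missing field is treated as `false`,
    and the `-1` fallback for a missing exit code is also an assumption.

    ```rust
    /// Hypothetical decoded `process/read` response. `sandbox_denied` is
    /// `None` when an older executor omitted the field.
    #[derive(Debug, Clone, Copy)]
    pub struct ProcessReadResponse {
        pub exited: bool,
        pub exit_code: Option<i32>,
        pub sandbox_denied: Option<bool>,
    }

    /// Stand-in for the existing unified-exec error type (assumption).
    #[derive(Debug, PartialEq, Eq)]
    pub enum ExecError {
        SandboxDenied { exit_code: i32 },
    }

    /// `Ok(None)` while running, `Ok(Some(code))` on an ordinary exit,
    /// `Err(SandboxDenied)` when the executor flagged a denial.
    pub fn classify_exit(resp: ProcessReadResponse) -> Result<Option<i32>, ExecError> {
        if !resp.exited {
            return Ok(None);
        }
        let exit_code = resp.exit_code.unwrap_or(-1);
        // Older executors omit the field, which reads as "not denied".
        if resp.sandbox_denied.unwrap_or(false) {
            return Err(ExecError::SandboxDenied { exit_code });
        }
        Ok(Some(exit_code))
    }
    ```

    Note that the classification uses only the boolean from the executor;
    no orchestrator-side sandbox type is consulted.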
    
    ## Scope
    
    This PR is intentionally limited to matching the existing local
    unified-exec behavior for the initial command execution path.
    
    It does not add:
    
    - incremental denial tracking across the full output stream;
    - new denial handling for commands completed later through
    `write_stdin`;
    - new guarantees for preserving the semantic flag during the narrow
    reconnect-recovery race.
    
    Those can be considered separately if the same behavior is added for
    local execution.
    
    ## Test coverage
    
    One remote end-to-end integration test covers the complete intended
    flow:
    
    ```text
    remote read-only sandbox
        -> denied write
        -> executor reports the denial
        -> Codex requests approval
        -> user approves
        -> retry succeeds on the remote executor
    ```
    
    Existing lifecycle coverage continues to verify that remote process exit
    is reported before late output streams close.
  • [codex] Centralize Plugin Analytics Metadata (#27102)
    This PR moves construction of `PluginTelemetryMetadata` from loader and
    model helpers into `PluginsManager`, which already owns installed plugin
    state and will eventually perform remote identity enrichment. The
    metadata type remains in `codex-plugin`, and serialized analytics events
    remain unchanged.
    
    ## Before
    
    ```mermaid
    flowchart LR
        subgraph Events["Analytics event paths"]
            direction TB
            Lifecycle["Local install / uninstall"]
            Config["Enable / disable"]
            Remote["Remote install"]
            Used["Plugin used"]
        end
    
        subgraph Construction["Metadata construction"]
            direction TB
            Loader["Loader telemetry helpers"]
            Summary["PluginCapabilitySummary::telemetry_metadata"]
            Override["Caller adds remote_plugin_id"]
        end
    
        Metadata["PluginTelemetryMetadata"]
    
        Lifecycle --> Loader
        Config --> Loader
        Remote --> Loader
        Loader -->|"local events"| Metadata
        Loader -->|"remote install"| Override
        Override --> Metadata
        Used --> Summary
        Summary --> Metadata
    ```
    
    Telemetry metadata was constructed through loader helpers, a
    capability-summary method, and a remote-install call-site override.
    
    ## After
    
    ```mermaid
    flowchart LR
        subgraph Events["Analytics event paths"]
            direction TB
            Lifecycle["Local install / uninstall"]
            Config["Enable / disable"]
            Remote["Remote install"]
            Used["Plugin used"]
        end
    
        Manager["PluginsManager — single construction owner"]
        Metadata["PluginTelemetryMetadata"]
    
        Lifecycle --> Manager
        Config --> Manager
        Remote -->|"authoritative remote ID"| Manager
        Used -->|"capability summary"| Manager
        Manager --> Metadata
    ```
    
    Every analytics path delegates metadata construction to
    `PluginsManager`. Remote install still supplies its authoritative
    backend ID explicitly.
    
    ## What Changes
    
    - Make loader code return a focused plugin capability summary instead of
    constructing analytics metadata.
    - Centralize immutable plugin telemetry metadata construction in
    `PluginsManager`.
    - Route local install/uninstall, remote install, enable/disable, and
    plugin-used emitters through the manager.
    - Preserve the current serialized analytics contract exactly.
    
    Normal metadata still has no remote override. Remote install continues
    to provide its authoritative backend ID explicitly, so the existing
    serializer continues reporting that ID through `plugin_id`.
    Snapshot-based enrichment is intentionally deferred to the final PR.
    
    ## Testing
    
    - `just test -p codex-core-plugins` (238 tests passed)
    - `just test -p codex-plugin` (3 tests passed)
    - Scoped Clippy/compile checks passed for `codex-plugin`,
    `codex-core-plugins`, `codex-app-server`, and `codex-core`.
    
    ## Split Overview
    
    ```text
    main
    ├── #27093  Debug analytics capture                 (merged)
    ├── #27099  Non-mutating plugin smoke               (merged)
    ├── #27100  Remote install/uninstall smoke          (merged)
    └── #27102  Plugin telemetry metadata refactor      ← you are here
        └── #27669  Persist remote plugin identity
    
    After #27102 and #27669 merge:
    └── Final PR: add explicit local and remote IDs to plugin analytics
    ```
    
    Review order and dependencies:
    
    1. [#27093 Add debug-only analytics event
    capture](https://github.com/openai/codex/pull/27093) (merged)
    2. [#27099 Add a plugin analytics smoke
    workflow](https://github.com/openai/codex/pull/27099) (merged)
    3. [#27100 Add a remote plugin analytics mutation smoke
    workflow](https://github.com/openai/codex/pull/27100) (merged)
    4. This metadata refactor, independent and based on `main`
    5. [#27669 Persist remote plugin
    identity](https://github.com/openai/codex/pull/27669), stacked on this
    PR
    6. Final remote-ID behavior PR, created after the prerequisites merge
    
    The original [#26281](https://github.com/openai/codex/pull/26281)
    remains open as the aggregate reference until the final replacement PR
    is published.
  • remove flag for image preparation (#29429)
    ## What
    
    - make Fjord's centralized response-item image preparation unconditional
    for new and resumed history
    - have local user images and `view_image` outputs always defer decoding
    and resizing to that path
    - retain `resize_all_images` as an ignored, removed compatibility key
    for released clients
    - delete the flag-off producer paths and obsolete policy-specific tests
    
    ## Why
    
    Centralized preparation is now the intended image path. Keeping the
    runtime feature checks also kept two image-processing implementations
    alive and allowed client config to select the legacy behavior.
    
    This is a clean replacement for #28975, rebuilt from the latest `main`.
    
    ## How
    
    `prepare_response_items` now runs whenever items enter history and
    whenever persisted history is reconstructed. Producers emit deferred
    image data, so malformed images become the existing model-visible
    placeholder instead of failing the session at the producer.
    
    ## Test plan
    
    - `just fmt`
    - `just fix -p codex-core -p codex-features`
    - `just test -p codex-features` — 52 passed
    - focused affected `codex-core` set — 20 passed
    - `just test -p codex-core handle_accepts_explicit_high_detail` — 1
    passed
    - full `just test -p codex-core` attempt — 2,723 passed; 88 unrelated
    environment failures from read-only `~/.codex` SQLite state and
    unavailable integration helper binaries
  • fix(core): restore thread_source in x-codex-turn-metadata (#29455)
    ## Description
    
    Restore `thread_source` in `x-codex-turn-metadata`.
    
    https://github.com/openai/codex/pull/27122 inadvertently removed
    `thread_source` from `x-codex-turn-metadata`. It is a top-level thread
    app-server API field, so it is not passed in
    `responsesapi_client_metadata`.
    
    This also reserves the key so `responsesapi_client_metadata` cannot
    override it.
  • core: refresh environment context before sampling (#29073)
    ## Why
    
    Nonblocking environment snapshots allow a turn to reach the model while
    a remote environment is still starting. The initial context can describe
    that environment as still loading, but nothing currently refreshes the
    model-visible environment context when startup finishes during the same
    turn.
    
    This adds the first request-scoped reconciliation slice on top of
    #28683. It is gated by `DeferredExecutor` and intentionally updates only
    model-visible environment context; tools and other environment-derived
    state will migrate separately.
    
    ## What
    
    - Add a minimal `StepContext` containing the environment snapshot
    captured before each sampling request.
    - Render attached environments with their resolved shell and starting
    environments with `still loading`.
    - Track the latest environment state recorded in model history and
    append a bounded update only when it changes.
    - Seed that baseline from full initial context so ready-at-start
    environments are not duplicated.
    - Clear the in-memory baseline when history is rewritten so replacement
    history can be refreshed safely.
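
    The baseline tracking can be sketched roughly as follows. This is an
    illustration under assumptions, not the implementation: `EnvState`,
    `EnvBaseline`, and the rendered `shell: ...` text are hypothetical
    names and formats; only the `still loading` wording comes from the
    description above.

    ```rust
    /// Hypothetical environment snapshot captured before a sampling request.
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub enum EnvState {
        Starting,
        Ready { shell: String },
    }

    impl EnvState {
        pub fn render(&self) -> String {
            match self {
                EnvState::Starting => "still loading".to_string(),
                EnvState::Ready { shell } => format!("shell: {shell}"),
            }
        }
    }

    /// Remembers the last environment state recorded in model history.
    #[derive(Debug, Default)]
    pub struct EnvBaseline {
        last_recorded: Option<EnvState>,
    }

    impl EnvBaseline {
        /// Returns text to append, or `None` when the state is unchanged.
        pub fn update(&mut self, snapshot: &EnvState) -> Option<String> {
            if self.last_recorded.as_ref() == Some(snapshot) {
                return None;
            }
            self.last_recorded = Some(snapshot.clone());
            Some(snapshot.render())
        }

        /// Called when history is rewritten so the state is emitted again.
        pub fn clear(&mut self) {
            self.last_recorded = None;
        }
    }
    ```

    Seeding the baseline from the full initial context corresponds to
    calling `update` once with the state that context already describes.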
    
    ## Testing
    
    - `just test -p codex-core deferred_executor`
    - `just test -p codex-core
    environment_context_baseline_deduplicates_until_history_is_replaced`
    
    The integration coverage verifies that a pending environment reaches the
    first request, the ready state reaches the next request, later requests
    do not duplicate it, and ready-at-start environments remain
    single-injected.
    
    <details>
    <summary>Live verification</summary>
    
    - Connected to a real remote executor with startup deliberately delayed
    and forced three sampling requests in one turn.
    - Inspected the raw model inputs: request 1 showed the remote
    environment as `still loading`, request 2 appended its ready shell and
    cwd, and request 3 contained no duplicate ready update.
    - With the feature disabled, startup waited for the delayed executor and
    the first request contained only the ready environment.
    - With a synchronously ready environment and the feature enabled, the
    first request contained one environment context with no duplicate.
    - Executed `pwd` and read a marker file through the remote process
    runner; the command exited successfully and returned the remote cwd and
    marker contents.
    
    </details>
  • Apply sandbox intent inside remote exec servers (#29113)
    ## Why
    
    PR #29108 lets the orchestrator send sandbox intent with `process/start`
    without wrapping the command for its own operating system.
    
    This PR completes that boundary by making the executor interpret and
    enforce the intent using its own filesystem paths and sandbox
    implementation.
    
    For example, a macOS TUI targeting a Linux devbox sends `/bin/bash -lc
    pwd`. The Linux executor turns that into its own `codex-linux-sandbox
    ... /bin/bash -lc pwd` launch.
    
    ## What changes
    
    - Keep `process/start` unchanged when no sandbox intent is present.
    - Convert sandbox `PathUri` values into native paths on the executor.
    - Bind symbolic `:workspace_roots` permissions to the executor's native
    sandbox cwd.
    - Select the sandbox implementation on the executor and wrap the
    original command immediately before spawning it.
    - Reject sandbox-required execution before spawning when the executor
    cannot enforce the intent.
    - Pass exec-server runtime paths into process creation so Linux can
    locate `codex-linux-sandbox`.
    
    The boundary is therefore:
    
    ```text
    orchestrator                         executor
    original argv + sandbox intent  ->  select and enforce local sandbox
    ```
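
    A minimal sketch of the executor-side decision, under stated
    assumptions: `prepare_argv` and `launcher_prefix` are hypothetical
    names, and the prefix stands in for whatever launcher arguments the
    executor's own sandbox needs (they are not spelled out here).

    ```rust
    /// Wraps the original argv with the executor's own sandbox launcher,
    /// or rejects the request when sandboxing was requested but cannot be
    /// enforced. `launcher_prefix` is `None` when no sandbox is available.
    pub fn prepare_argv(
        argv: &[String],
        sandbox_requested: bool,
        launcher_prefix: Option<&[String]>,
    ) -> Result<Vec<String>, String> {
        // No sandbox intent: the request is passed through unchanged.
        if !sandbox_requested {
            return Ok(argv.to_vec());
        }
        match launcher_prefix {
            Some(prefix) => {
                let mut wrapped = prefix.to_vec();
                wrapped.extend_from_slice(argv);
                Ok(wrapped)
            }
            // Reject before spawning rather than run unsandboxed.
            None => Err("executor cannot enforce the requested sandbox".to_string()),
        }
    }
    ```

    The original argv always appears unmodified at the tail of the
    wrapped command, which keeps the orchestrator unaware of the wrapper.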
    
    This PR intentionally treats a denied remote command as an ordinary
    command failure. Draft follow-up #29424 carries a semantic
    `sandboxDenied` result back to unified exec for the existing approval
    and retry flow.
    
    ## Platform scope
    
    Linux and macOS use their existing direct-spawn sandbox transforms.
    
    Windows sandboxed remote process launch is intentionally unsupported in
    this PR. The current Windows direct-spawn wrapper does not correctly
    preserve arbitrary argv or TTY behavior, and it does not pass the full
    child environment out of band. The executor rejects the request instead
    of running it incorrectly or unsandboxed.
    
    ## Known follow-ups
    
    - The transported permission profile can still contain
    orchestrator-materialized helper or explicit paths. A `TODO(jif)` marks
    where the executor boundary should receive pre-host-materialization
    permission intent.
    - The sandbox wrapper currently replaces a requested custom inner
    `arg0`. A `TODO(jif)` marks where this must be preserved or rejected
    explicitly.
    - Draft PR #29424 contains the deferred sandbox-denial classification
    and approval/retry behavior.
    
    ## Rollout assumption
    
    This executor-sandbox stack is unreleased and its client and executor
    are expected to move together. This PR does not add mixed-version
    negotiation with older exec servers.
  • Simplify multi-agent mode controls (#29324)
    ## Why
    
    Multi-agent delegation policy was split across `multiAgentMode`,
    `features.multi_agent_mode`, and `usage_hint_enabled`. These controls
    could disagree: a requested mode could be downgraded by the feature
    flag, and disabling usage hints also disabled mode instructions.
    
    Some clients also need multi-agent tools without adding
    delegation-policy text to model context. The previous two-mode API could
    not express that directly.
    
    ## What changed
    
    `multiAgentMode` is now the only live delegation-policy control:
    
    | Mode | Behavior |
    | --- | --- |
    | `none` | Keep multi-agent tools available without adding mode instructions. |
    | `explicitRequestOnly` | Only delegate after an explicit user request. |
    | `proactive` | Delegate when parallel work materially improves speed or quality. |
    
    - new threads default to `explicitRequestOnly`; omitting the mode on
    later turns keeps the current value
    - thread start, resume, fork, and settings responses always report the
    concrete current mode instead of `null`
    - mode selection remains sticky across turns and resume
    - usage-hint text no longer controls whether mode instructions apply
    - `features.multi_agent_mode` and `usage_hint_enabled` remain accepted
    as ignored compatibility settings so existing configs continue to load
    - app-server documentation and generated schemas describe the three-mode
    API
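
    The default and the sticky behavior can be sketched as below. The
    enum and function names are assumptions for illustration; only the
    three wire names and the `explicitRequestOnly` default come from the
    description above.

    ```rust
    /// The three delegation-policy modes (type name is hypothetical).
    #[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
    pub enum MultiAgentMode {
        None,
        #[default]
        ExplicitRequestOnly,
        Proactive,
    }

    impl MultiAgentMode {
        /// Wire name as listed in the mode table.
        pub fn as_wire(self) -> &'static str {
            match self {
                MultiAgentMode::None => "none",
                MultiAgentMode::ExplicitRequestOnly => "explicitRequestOnly",
                MultiAgentMode::Proactive => "proactive",
            }
        }
    }

    /// Sticky resolution: a requested mode replaces the current one, and
    /// omitting it on a later turn keeps the current value.
    pub fn resolve_mode(
        current: MultiAgentMode,
        requested: Option<MultiAgentMode>,
    ) -> MultiAgentMode {
        requested.unwrap_or(current)
    }
    ```

    Because resolution always yields a concrete mode, responses never
    need to report `null`.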
    
    ## Tests
    
    - `just test -p codex-core multi_agent_mode`
    - `just test -p codex-core multi_agent_v2_config_from_feature_table`
    - `just test -p codex-core spawn_agent_description`
    - `just test -p codex-features`
    - `just test -p codex-app-server-protocol`
    - `just test -p codex-app-server multi_agent_mode`
  • Persist session IDs across thread resume (#29327)
    ## Summary
    
    A cold-resumed subagent kept its durable thread ID but could receive a
    new session ID, splitting one agent tree across multiple sessions after
    a restart.
    
    Persist the root session ID in every rollout `SessionMeta`, carry it
    through thread creation, and restore it before initializing the resumed
    `Session` and `AgentControl`.
    
    ## Behavior
    
    For a nested agent tree:
    
    ```text
    root session R
      parent thread P
        child thread C
    ```
    
    The child rollout stores:
    
    ```text
    session_id:       R
    parent_thread_id: P
    id:               C
    ```
    
    After a cold resume, the child still belongs to root session `R` while
    its immediate parent remains `P`. The integration coverage uses distinct
    values for all three IDs so it catches restoring the session from
    `parent_thread_id`.
    
    ## Legacy rollouts
    
    Rollouts written before this change have `id` but no `session_id`.
    `SessionMetaLine`
    deserialization treats a missing `session_id` as `id`, keeping those
    files readable, listable, and resumable. When a legacy subagent is
    resumed through its root, that synthesized child ID no longer overrides
    the inherited root-scoped `AgentControl`. New rollouts always persist
    the explicit root session ID.
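
    The legacy fallback amounts to the following sketch. The struct
    shapes and `normalize` are hypothetical, and IDs are plain strings
    here for brevity; the real deserialization may be structured
    differently.

    ```rust
    /// Hypothetical on-disk shape; `session_id` is absent in legacy files.
    pub struct RawSessionMeta {
        pub id: String,
        pub session_id: Option<String>,
        pub parent_thread_id: Option<String>,
    }

    /// Normalized shape where `session_id` is always present.
    pub struct SessionMeta {
        pub id: String,
        pub session_id: String,
        pub parent_thread_id: Option<String>,
    }

    /// Treats a missing `session_id` as the rollout's own `id`.
    pub fn normalize(raw: RawSessionMeta) -> SessionMeta {
        let RawSessionMeta { id, session_id, parent_thread_id } = raw;
        let session_id = session_id.unwrap_or_else(|| id.clone());
        SessionMeta { id, session_id, parent_thread_id }
    }
    ```

    The session is never derived from `parent_thread_id`, which is the
    mistake the distinct-ID integration coverage is meant to catch.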
  • chore: fix merge race (auto-compaction feature access) (#29393)
    ## Summary
    
    - read the `AutoCompaction` feature flag through `TurnContext::config`
    - fix both the mid-turn and pre-sampling compaction checks
    
    ## Why
    
    #28260 was validated against an older base where `TurnContext` exposed a
    direct `features` field. It was then merged after that field had moved
    under `config`, leaving the merge result unable to compile with `E0609`
    on `turn_context.features`.
    
    This restores compilation for Bazel, SDK, and argument-comment-lint jobs
    that build `codex-core`. Behavior is unchanged: disabling
    `auto_compaction` still skips automatic compaction.
    
    ## Validation
    
    - `just fmt`
    - `CODEX_HOME=/private/tmp/codex-fix-auto-compaction-test-home just test
    -p codex-core auto_compaction_feature_disabled` — 4 passed
    - `just test -p codex-core` — `codex-core` compiled; 2,722 passed and 89
    unrelated local-environment failures remained because the sandbox could
    not write the default Codex SQLite/proxy paths and some first-party test
    binaries were unavailable
  • Propagate safety buffering events to app-server clients (#29371)
    Responses API safety buffering metadata currently stops at the transport
    boundary, so app-server clients cannot render the in-progress safety
    review state.
    
    This change:
    - decodes and deduplicates `safety_buffering` metadata from Responses
    API SSE and WebSocket events without suppressing the original response
    event
    - emits a typed core event containing the requested model plus backend
    use cases and reasons
    - forwards that event as `turn/safetyBuffering/updated` through
    app-server v2 and updates generated protocol schemas
    - keeps the side-channel event out of persisted rollouts and turn timing
    
    This supports the Codex Apps buffering UX and depends on the Responses
    API backend work in https://github.com/openai/openai/pull/1044569 and
    https://github.com/openai/openai/pull/1044571.
    
    Validation:
    - focused `codex-core` safety-buffering integration test passes
    - `cargo check -p codex-core -p codex-app-server -p
    codex-app-server-protocol`
    - `just fix -p codex-api -p codex-protocol -p codex-core -p
    codex-app-server-protocol -p codex-app-server -p codex-rollout -p
    codex-rollout-trace -p codex-otel`
    - `just fmt`
    - broad package test run: 4,430/4,492 passed; 62 unrelated
    local-environment/concurrency failures involved unavailable test
    binaries, MCP subprocess setup, and app-server timeouts
  • [codex] Add internal auto-compaction opt-out (#28260)
    ## Summary
    
    - add a default-on `auto_compaction` feature flag as an internal escape
    hatch
    - skip pre-turn, model-switch/hash, and mid-turn automatic compaction
    when the flag is disabled
    - preserve manual `/compact` behavior and surface the existing
    context-window error when the provider runs out of room
    - add integration coverage for disabled pre-turn and mid-turn compaction
    
    ## Motivation
    
    Long-running SPO optimization rollouts need the option to preserve their
    full context and fail on context exhaustion instead of entering another
    compaction window. This deliberately uses the existing feature-flag
    mechanism rather than adding a dedicated public config or app-server
    API.
    
    Disable it with:
    
    ```sh
    codex --disable auto_compaction
    ```
    
    ## Testing
    
    - `just test -p codex-features` — 51 passed
    - `just test -p codex-core auto_compaction_feature_disabled` — 2 passed
    - `just fix -p codex-core -p codex-features`
    - `just write-config-schema`
    - `just test -p codex-core` — the new compaction tests passed; the
    overall local run had 54 unrelated environment failures, primarily
    missing first-party test binaries and shell-snapshot timeouts
  • Carry sandbox intent to remote exec servers (#29108)
    ## What changed
    
    PR #29099 stopped sending the orchestrator's concrete sandbox wrapper to
    a remote exec-server. Remote commands now arrive as plain native argv.
    
    This PR adds the next piece: Codex also sends portable sandbox intent
    next to that plain argv.
    
    For a remote unified-exec command, the request can now include:
    
    - the canonical permission profile before local workspace-root
    materialization
    - the sandbox cwd and workspace roots as `PathUri` values
    - Windows sandbox settings
    - the legacy Landlock setting
    - whether managed networking must be enforced
    
    The important part is that symbolic entries such as `:workspace_roots`
    stay symbolic while crossing the boundary. The executor can then bind
    them to its own workspace-root paths instead of receiving
    orchestrator-local absolute paths.
    
    The data travels through `ExecRequest` into `ExecParams`. Older
    exec-servers can still deserialize requests because the new fields have
    defaults.
    
    ## Why
    
    The orchestrator should not decide how another machine implements
    sandboxing.
    
    For example:
    
    - a local macOS Codex would normally build a Seatbelt command
    - a remote Linux executor needs a Linux sandbox command instead
    
    The orchestrator now sends the plain command plus the policy it intended
    to enforce. A later PR can let the exec-server choose and build the
    correct sandbox for its own operating system.
    
    ## Important detail
    
    This keeps the portable intent separate from the local `SandboxType`.
    
    `SandboxType::None` is ambiguous:
    
    - it can mean the command was explicitly approved to run without a
    sandbox
    - it can also mean the orchestrator host has no concrete sandbox
    implementation available
    
    Those cases are different for remote execution. This PR adds
    `sandbox_requested` so an executor can still receive sandbox intent when
    the orchestrator cannot build a local wrapper. Explicit unsandboxed
    retries still send no sandbox context.
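
    One way to picture the distinction is the sketch below, which is an
    assumption-laden illustration rather than the real types:
    `LocalSelection` and `NoSandboxReason` are hypothetical, and the
    actual code carries this as a `sandbox_requested` value rather than
    an enum.

    ```rust
    /// Why the orchestrator ended up with no local sandbox (hypothetical).
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum NoSandboxReason {
        ApprovedUnsandboxed,
        NoLocalImplementation,
    }

    /// Hypothetical summary of the orchestrator's local sandbox selection.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum LocalSelection {
        Wrapped,
        NoSandbox(NoSandboxReason),
    }

    /// Sandbox intent is sent unless the command was explicitly approved
    /// to run unsandboxed; a missing local implementation does not matter.
    pub fn sandbox_requested(selection: LocalSelection) -> bool {
        !matches!(
            selection,
            LocalSelection::NoSandbox(NoSandboxReason::ApprovedUnsandboxed)
        )
    }
    ```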
    
    ## Behavior today
    
    This PR only transports the intent. The exec-server accepts the new
    fields but does not apply them yet.
    
    Remote commands therefore remain unsandboxed after this PR, just as they
    are after PR #29099.
    
    ## Follow-up
    
    The next PR will make exec-server read this portable intent, bind
    symbolic workspace permissions to executor-native roots, choose the
    sandbox for its own operating system, build the wrapper locally, and
    then spawn the command.
  • [codex] simplify token budget context (#29295)
    ## Why
    
    The token-budget feature currently adds remaining-token messages
    whenever usage crosses the 25%, 50%, and 75% thresholds. Those periodic
    inserts create prompt churn without requiring action, while the
    near-compaction reminder and explicit `get_context_remaining` tool
    already cover actionable and on-demand budget information.
    
    The context-window lineage block is also easier to scan as plain labeled
    text than as a `<token_budget>`-wrapped fragment.
    
    ## What changed
    
    - Stop recording automatic remaining-token messages at percentage
    thresholds.
    - Render context-window lineage in `First`, `Current`, `Previous` order
    with colon-separated labels.
    - Omit the `Previous` line for the first context window.
    - Remove `<token_budget>` wrappers from newly rendered lineage,
    near-compaction reminders, and `get_context_remaining` output.
    - Keep recognizing legacy wrapped fragments so existing rollouts remain
    compatible.
    - Remove the post-sampling token snapshot that was only needed by the
    periodic threshold path.
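
    A minimal sketch of the new rendering order, assuming hypothetical
    names and an exact label format of `Label: value` on separate lines.
    The real text may differ in spacing or wording.

    ```rust
    /// Hypothetical lineage data; `previous` is `None` for the first window.
    pub struct WindowLineage {
        pub first: String,
        pub current: String,
        pub previous: Option<String>,
    }

    /// Renders `First`, `Current`, `Previous` in that order as plain
    /// labeled text, omitting `Previous` for the first context window.
    pub fn render_lineage(lineage: &WindowLineage) -> String {
        let mut lines = vec![
            format!("First: {}", lineage.first),
            format!("Current: {}", lineage.current),
        ];
        if let Some(previous) = &lineage.previous {
            lines.push(format!("Previous: {previous}"));
        }
        lines.join("\n")
    }
    ```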
    
    ## Testing
    
    - `just test -p codex-core token_budget` (11 tests passed)
  • [codex] add configurable token budget compaction reminder (#29255)
    ## Why
    
    The token-budget feature reports coarse remaining-context milestones,
    but it does not give the model a configurable wrap-up prompt before
    automatic compaction. A strict threshold-crossing check can also miss
    resumed or reconfigured windows that are already inside the threshold.
    
    ## What changed
    
    - Add structured `[features.token_budget]` configuration for an absolute
    `reminder_threshold_tokens` and bounded `reminder_message_template`;
    `{n_remaining}` is expanded when the reminder is delivered.
    - Compute remaining tokens against the next effective auto-compaction
    boundary, including scoped `body_after_prefix` accounting and the full
    context-window limit.
    - Make reminder delivery level-triggered before and after sampling, with
    one-shot state owned by `AutoCompactWindow` and re-armed on compaction,
    `new_context`, restore, or history replacement.
    - Leave the existing initial full-window token-budget context, 25/50/75%
    notices, and token-budget tools unchanged.
    - Persist the resolved feature configuration in the session config lock
    and regenerate the config schema.
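
    The level-triggered, one-shot delivery can be sketched as follows.
    `Reminder`, `check`, and `rearm` are hypothetical names, and firing
    at "at or below" the threshold is an assumption; in the PR the
    one-shot state is owned by `AutoCompactWindow`.

    ```rust
    /// Hypothetical reminder state for one context window.
    pub struct Reminder {
        threshold_tokens: u64,
        template: String,
        delivered: bool,
    }

    impl Reminder {
        pub fn new(threshold_tokens: u64, template: &str) -> Self {
            Self { threshold_tokens, template: template.to_string(), delivered: false }
        }

        /// Level-triggered: fires whenever the remaining budget is inside
        /// the threshold and nothing was delivered yet, so a window that
        /// is resumed already inside the threshold is not missed.
        pub fn check(&mut self, remaining: u64) -> Option<String> {
            if self.delivered || remaining > self.threshold_tokens {
                return None;
            }
            self.delivered = true;
            Some(self.template.replace("{n_remaining}", &remaining.to_string()))
        }

        /// Re-arms after compaction, restore, or history replacement.
        pub fn rearm(&mut self) {
            self.delivered = false;
        }
    }
    ```

    An edge-triggered check would only fire on the transition across the
    threshold, which is the case the level-triggered form avoids.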
    
    ## Validation
    
    - `just test -p codex-core token_budget`
    - `just test -p codex-core
    token_budget_reminder_emits_after_crossing_compaction_threshold`
    - `just test -p codex-core auto_compact_window`
    - `just test -p codex-core
    lock_contains_prompts_and_materializes_features`
    - `just test -p codex-features`
    - `just test -p codex-config`
  • [codex] prototype mcp_history thread hint injection (#29259)
    ## Why
    
    Prototype whether the harness can invoke the `mcp_history` MCP while
    constructing full initial context and expose its thread hint to the
    model without requiring a model-issued tool call.
    
    The prototype builds on the context-window lineage added by #29256 and
    is now based directly on `main`.
    
    ## What changed
    
    - Call `mcp_history/thread_hint` with no arguments while building the
    full `<token_budget>` context.
    - Pass the current `threadId` through MCP request metadata, matching the
    normal MCP tool-call path.
    - Serialize only the unstructured `content` result and append it inside
    `<token_budget>` when the call succeeds.
    - Omit the additional context when the MCP call or content serialization
    fails.
    
    ## Prototype limitations
    
    - The direct call bypasses the normal model-initiated MCP approval,
    lifecycle-event, telemetry, and result-sanitization path.
    - The call has no prototype-specific timeout, result-size cap, or
    per-window cache.
    - MCP latency is added to full-context construction, including
    applicable compaction paths.
    
    ## Validation
    
    - `just test -p codex-core token_budget`
  • core: add context window lineage IDs (#29256)
    ## Why
    
    The rendered `<token_budget>` fragment identifies the thread and current
    context window, but it does not expose enough lineage to identify the
    first window in the thread or the immediately preceding window. Those
    IDs also need to remain stable across compaction, resume, and rollback.
    
    ## What changed
    
    - Track first, previous, and current UUIDv7 context-window IDs in
    auto-compaction state.
    - Render `thread_id`, `first_window_id`, `previous_window_id`, and the
    current window ID in the full `<token_budget>` fragment.
    - Persist the first and previous window IDs in compacted rollout
    checkpoints and restore them during rollout reconstruction.
    - Preserve compatibility with older compacted records that do not
    contain the new optional fields.
    - Update focused state, rendering, reconstruction, rollback, and
    serialization coverage.
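
    The ID bookkeeping can be pictured with this sketch. The type and
    method names are hypothetical, and plain strings stand in for the
    UUIDv7 values; persistence and restore are out of scope here.

    ```rust
    /// Hypothetical first/previous/current window IDs for one thread.
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub struct WindowIds {
        pub first: String,
        pub previous: Option<String>,
        pub current: String,
    }

    impl WindowIds {
        /// Starts the thread's first window: no previous window exists yet.
        pub fn start(first_id: &str) -> Self {
            Self {
                first: first_id.to_string(),
                previous: None,
                current: first_id.to_string(),
            }
        }

        /// Opens a new window, for example after compaction. `first`
        /// stays fixed while the old current ID becomes `previous`.
        pub fn advance(&mut self, new_id: &str) {
            let finished = std::mem::replace(&mut self.current, new_id.to_string());
            self.previous = Some(finished);
        }
    }
    ```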
    
    ## Validation
    
    - `just test -p codex-core token_budget`
    - `just test -p codex-protocol compacted_item::tests`
    - `just test -p codex-core tracks_prefill_and_window_boundaries`
    - `just test -p codex-core
    reconstruct_history_uses_replacement_history_verbatim`
    - `just test -p codex-core
    thread_rollback_restores_cleared_reference_context_item_after_compaction`
  • Keep remote exec commands native to the executor (#29099)
    ## Summary
    
    - Remote unified-exec now sends the original command argv to exec-server
    instead of materializing the orchestrator's sandbox wrapper first.
    - Local unified-exec keeps the existing sandbox path unchanged.
    - Add a focused regression test for a macOS-selected sandbox producing
    plain remote argv.
    
    Before:
    
        macOS orchestrator -> /usr/bin/sandbox-exec ... -> Linux exec-server
    
    After:
    
        macOS orchestrator -> /bin/bash -lc pwd -> Linux exec-server
    
    This is intentionally only the first cleanup step. Remote unified-exec
    commands are sent without a process sandbox until the targeted
    follow-ups below land. For the macOS-to-Linux path this is not a
    practical regression: the old sandboxed attempt failed before process
    launch because the Linux executor could not spawn macOS sandbox paths.
    
    ## Targeted follow-ups
    
    1. Carry sandbox intent separately from argv.
       - Add an optional sandbox field to exec-server process params.
       - Reuse `FileSystemSandboxContext` rather than introducing a new
         sandbox model.
       - Carry managed-network enforcement as one explicit bit.
       - Keep argv plain.
    
    2. Apply that intent inside exec-server.
       - Add a small process-start adapter before `LocalProcess::exec`.
       - Reuse the existing `codex-sandboxing` `SandboxManager` and
         exec-server runtime paths.
       - Follow the same shape already used by exec-server filesystem
         sandboxing.
       - Do not duplicate or move the sandbox implementations.
    
    3. Report the sandbox actually used.
       - Return the executor-selected sandbox type from `process/start`.
       - Use that value in core for sandbox-denial detection and retry
         behavior.
    
    ## End state
    
    The orchestrator sends plain commands plus portable sandbox intent. The
    executor chooses and applies its own native sandbox: Linux executors use
    Linux sandboxing, macOS executors use Seatbelt, and Windows executors
    use Windows sandboxing. Concrete wrapper argv, helper paths, and sandbox
    env markers never cross the executor boundary.
  • Add config toggles for orchestrator skills and MCP (#28942)
    ## Why
    
    Orchestrator-provided skills and Codex Apps MCP tools add model-visible
    instructions, resources, and tools beyond the local workspace. Hosts
    need config-level switches to disable those orchestrator-owned surfaces
    independently, without disabling regular skills or regular MCP servers.
    
    ## What changed
    
    - Adds `[orchestrator.skills].enabled` and `[orchestrator.mcp].enabled`
    config entries, both defaulting to `true`.
    - Includes the new settings in `config.schema.json` and in the config
    lock so resolved thread configuration preserves the same orchestrator
    exposure decisions.
    - Threads `orchestrator.skills.enabled` through the app-server skills
    extension so disabled orchestrator skills do not expose the `skills`
    namespace or inject orchestrator skill context.
    - Gates Codex Apps MCP exposure, app instructions, and app auth
    eligibility on `orchestrator.mcp.enabled` while leaving non-Codex-Apps
    MCP tools available.
    - Updates the thread-manager sample config to disable both
    orchestrator-owned surfaces.
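
    A minimal sketch of the defaulting and the MCP gate, with
    hypothetical types throughout: `OrchestratorToml`,
    `OrchestratorConfig`, `Tool`, and `exposed_tools` are illustrative
    names, not the real config structures.

    ```rust
    /// Hypothetical raw config: `None` means the key was absent.
    #[derive(Debug, Default, Clone, Copy)]
    pub struct OrchestratorToml {
        pub skills_enabled: Option<bool>,
        pub mcp_enabled: Option<bool>,
    }

    /// Resolved settings; both toggles default to `true`.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub struct OrchestratorConfig {
        pub skills_enabled: bool,
        pub mcp_enabled: bool,
    }

    impl From<OrchestratorToml> for OrchestratorConfig {
        fn from(raw: OrchestratorToml) -> Self {
            Self {
                skills_enabled: raw.skills_enabled.unwrap_or(true),
                mcp_enabled: raw.mcp_enabled.unwrap_or(true),
            }
        }
    }

    /// Hypothetical tool descriptor: a name plus whether Codex Apps owns it.
    #[derive(Debug, Clone, Copy)]
    pub struct Tool {
        pub name: &'static str,
        pub from_codex_apps: bool,
    }

    /// Drops Codex Apps tools when the MCP toggle is off and always keeps
    /// regular MCP tools.
    pub fn exposed_tools(config: OrchestratorConfig, tools: &[Tool]) -> Vec<&'static str> {
        tools
            .iter()
            .filter(|tool| config.mcp_enabled || !tool.from_codex_apps)
            .map(|tool| tool.name)
            .collect()
    }
    ```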
    
    ## Verification
    
    - Added config parsing, loading, defaulting, and schema coverage for the
    new settings.
    - Added MCP exposure coverage that `orchestrator.mcp.enabled = false`
    removes Codex Apps tools while preserving regular MCP tools.
    - Added app-server coverage that `orchestrator.skills.enabled = false`
    prevents orchestrator skill tools, prompts, and resource reads from
    reaching the model turn.
  • Add indexed web search mode (#28489)
    ## Summary
    
    - Add `web_search = "indexed"` alongside `disabled`, `cached`, and
    `live`.
    - Use that same resolved mode for both hosted and standalone web search.
    - For hosted search, send `index_gated_web_access: true`, together with
    external web access enabled, only when `indexed` is selected.
    - For standalone search, preserve the existing boolean wire values for
    existing modes (`cached` maps to `false` and `live` to `true`) and send
    `"indexed"` only for `indexed`; `disabled` keeps the tool unavailable.
    - Carry the mode through managed configuration requirements and
    generated schemas.
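    
    The mode-to-wire mapping above can be sketched as follows. The enum and
    function names are hypothetical, and the wire values are shown as raw
    JSON text rather than through the real request types.
    
    ```rust
    /// The four `web_search` modes named in this PR.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum WebSearchMode {
        Disabled,
        Cached,
        Indexed,
        Live,
    }
    
    /// Parse the `web_search` config string (hypothetical helper).
    pub fn parse_mode(value: &str) -> Option<WebSearchMode> {
        match value {
            "disabled" => Some(WebSearchMode::Disabled),
            "cached" => Some(WebSearchMode::Cached),
            "indexed" => Some(WebSearchMode::Indexed),
            "live" => Some(WebSearchMode::Live),
            _ => None,
        }
    }
    
    /// Standalone wire value as JSON text; `None` means the tool is unavailable.
    pub fn standalone_wire_value(mode: WebSearchMode) -> Option<&'static str> {
        match mode {
            WebSearchMode::Disabled => None,
            WebSearchMode::Cached => Some("false"),
            WebSearchMode::Live => Some("true"),
            WebSearchMode::Indexed => Some("\"indexed\""),
        }
    }
    
    /// Hosted search sends `index_gated_web_access: true` only for `indexed`.
    pub fn hosted_index_gated(mode: WebSearchMode) -> bool {
        mode == WebSearchMode::Indexed
    }
    ```
    
    Because both functions take the same resolved `WebSearchMode`, the
    hosted and standalone paths are derived from a single setting.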
    
    ## Why
    
    Indexed search provides a middle ground between cached-only search and
    unrestricted live page fetching. Search queries can remain live while
    direct page fetches are limited to URLs admitted by the server.
    
    The existing `web_search` setting remains the single source of truth, so
    hosted and standalone executors cannot drift into different access
    modes. Without an explicit `indexed` selection, the existing
    model-visible tool and request shapes are unchanged.
    
    ```toml
    web_search = "indexed"
    
    [features]
    standalone_web_search = true
    ```
    
    ## Validation
    
    - `just fmt`
    - `just test -p codex-api` (`126 passed`)
    - `just test -p codex-web-search-extension` (`7 passed`)
    - `just test -p codex-core
    code_mode_can_call_indexed_standalone_web_search` (`1 passed`)
    - Focused configuration, hosted request, standalone request, and
    managed-requirement coverage is included in the PR; remaining suites run
    in CI.
    
    The full workspace test suite was not run locally.
  • Scope network approvals by environment (#28899)
    Stacked on #28766.
    
    ## Why
    
    Network approvals are environment-scoped: allowing a host in one
    execution environment should not allow the same host in another
    environment.
    
    #28766 adds the inert IDs and constructor plumbing. This PR applies the
    behavior on top.
    
    ## What changed
    
    - Route managed network traffic through per-environment HTTP and SOCKS
    proxy listeners.
    - Stamp HTTP, HTTPS CONNECT, SOCKS TCP, and SOCKS UDP policy requests
    with the source environment at the proxy boundary.
    - Carry the selected execution environment through shell, unified exec,
    zsh-fork, and sandbox transform paths.
    - Include the environment in pending, approved-for-session, and
    denied-for-session network approval cache keys.
    - Include the environment in approval IDs and approval prompts.
    - Preserve legacy fallback for unattributed requests, but deny when
    active-call attribution is ambiguous.
    - Fail closed if an environment-specific proxy endpoint cannot be
    prepared.
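    
    As an illustration of environment-scoped cache keys, here is a minimal
    sketch. `ApprovalKey` and `SessionApprovals` are hypothetical names, and
    the real key likely carries more fields than environment, host, and
    port.
    
    ```rust
    use std::collections::HashSet;
    
    /// Hypothetical cache key: the environment is part of the key's identity.
    #[derive(Debug, Clone, PartialEq, Eq, Hash)]
    pub struct ApprovalKey {
        pub environment_id: String,
        pub host: String,
        pub port: u16,
    }
    
    impl ApprovalKey {
        pub fn new(environment_id: &str, host: &str, port: u16) -> Self {
            Self {
                environment_id: environment_id.to_string(),
                host: host.to_string(),
                port,
            }
        }
    }
    
    /// Hypothetical approved-for-session cache.
    #[derive(Debug, Default)]
    pub struct SessionApprovals {
        approved: HashSet<ApprovalKey>,
    }
    
    impl SessionApprovals {
        pub fn approve(&mut self, key: ApprovalKey) {
            self.approved.insert(key);
        }
    
        /// A host approved in one environment is not approved in another.
        pub fn is_approved(&self, key: &ApprovalKey) -> bool {
            self.approved.contains(key)
        }
    }
    ```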
    
    ## Validation
    
    - `just fmt`
    - CI will run tests and clippy
  • [codex] abort turns when rollout budgets expire (token budget 3/3) (#28707)
    ## Stack
    
    Depends on #28494.
    
    ## Description
    
    This PR propagates shared rollout-budget exhaustion through the existing
    `CodexErr::TurnAborted` task result.
    
    Each thread records its model usage against the same ledger. Once the
    ledger is exhausted, that usage update and all later usage updates
    return `TurnAborted`. The task wrapper emits the normal aborted-turn
    event and lifecycle instead of completing the turn.
    
    This is intentionally a soft boundary: there is no cross-thread
    `Op::Interrupt` fanout. An in-flight thread can finish its current
    response before it observes the exhausted ledger, but every thread
    aborts at its next usage-accounting boundary.
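    
    A minimal sketch of the soft boundary, assuming a token-count ledger
    behind a mutex. `BudgetLedger`, `record_usage`, and the local
    `TurnAborted` type are hypothetical stand-ins (the real result is
    `CodexErr::TurnAborted`), and treating "exhausted" as "remaining budget
    reaches zero" is an assumption of this sketch.
    
    ```rust
    use std::sync::{Arc, Mutex};
    
    /// Stand-in for the aborted-turn result.
    #[derive(Debug, PartialEq, Eq)]
    pub struct TurnAborted;
    
    /// Hypothetical shared ledger; clones share the same remaining budget.
    #[derive(Debug, Clone)]
    pub struct BudgetLedger {
        remaining: Arc<Mutex<u64>>,
    }
    
    impl BudgetLedger {
        pub fn new(total_tokens: u64) -> Self {
            Self { remaining: Arc::new(Mutex::new(total_tokens)) }
        }
    
        /// Record usage at a usage-accounting boundary. The update that
        /// exhausts the budget and every later update return `TurnAborted`.
        pub fn record_usage(&self, tokens: u64) -> Result<(), TurnAborted> {
            let mut remaining = self.remaining.lock().unwrap();
            *remaining = (*remaining).saturating_sub(tokens);
            if *remaining == 0 {
                Err(TurnAborted)
            } else {
                Ok(())
            }
        }
    }
    ```
    
    Nothing here interrupts other threads; each holder of a clone only
    learns about exhaustion the next time it calls `record_usage`.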
    
    ## Tests
    
    The integration coverage verifies that:
    
    - the response that exhausts the budget aborts its turn;
    - a later response also aborts because the shared ledger remains
    exhausted;
    - sub-agent usage draws from the same shared ledger; and
    - local and remote-v2 compaction abort without retrying or emitting a
    generic error.
    
    Local checks:
    
    - `just test -p codex-core
    exhausted_budget_aborts_current_and_later_turns`
    - `just test -p codex-core subagent_usage_draws_from_the_shared_budget`
    - `just test -p codex-core
    abort_regular_task_emits_marker_before_turn_aborted`
    - `just test -p codex-core
    compaction_budget_exhaustion_aborts_without_error_or_retry`
    - `just fix -p codex-core`
    - `just fmt`
    - `git diff --check`
    
    The full workspace test suite was not run locally.
  • Expose thread-level multi-agent mode (#28792)
    ## Why
    
    Once multi-agent mode can be selected per turn, clients also need to
    choose the initial selection when creating a thread and observe that
    selection through lifecycle and settings APIs.
    
    The selected value is intentionally distinct from the effective
    model-visible value: no client selection is represented as `null`, even
    though an eligible multi-agent v2 turn derives `explicitRequestOnly` as
    its effective default.
    
    ## What changed
    
    - Add the optional experimental `thread/start.multiAgentMode` parameter
    and pass it through thread creation.
    - Preserve an omitted initial value as an unset selection rather than
    eagerly storing `explicitRequestOnly`.
    - Apply an explicit `thread/start` selection to the first turn through
    the session configuration established at thread creation.
    - Restore the latest persisted effective mode as the selected baseline
    on cold resume when rollout history contains one.
    - Inherit the optional selected mode from a loaded parent when creating
    related runtime threads.
    - Return the current selected `multiAgentMode` from `thread/start`,
    `thread/resume`, `thread/fork`, and thread settings, using `null` when
    no mode is selected.
    - Keep lifecycle reporting independent from model capability and feature
    eligibility; core turn construction remains responsible for calculating
    and persisting the effective mode.
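    
    The selected-value semantics can be sketched as below. `ThreadSelection`
    and its methods are hypothetical; the sketch only models that an omitted
    value stays unset, that an omitted `turn/start` value retains the
    current selection, and that an unset selection is reported as `null`.
    
    ```rust
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum MultiAgentMode {
        ExplicitRequestOnly,
        Proactive,
    }
    
    /// Hypothetical holder for the client-selected (not effective) mode.
    #[derive(Debug, Default)]
    pub struct ThreadSelection {
        selected: Option<MultiAgentMode>,
    }
    
    impl ThreadSelection {
        /// `thread/start`: an omitted value is stored as unset, not as a default.
        pub fn start(initial: Option<MultiAgentMode>) -> Self {
            Self { selected: initial }
        }
    
        /// `turn/start`: an omitted or `null` value retains the current selection.
        pub fn on_turn_start(&mut self, requested: Option<MultiAgentMode>) {
            if let Some(mode) = requested {
                self.selected = Some(mode);
            }
        }
    
        /// JSON text reported by lifecycle and settings responses.
        pub fn reported(&self) -> &'static str {
            match self.selected {
                None => "null",
                Some(MultiAgentMode::ExplicitRequestOnly) => "\"explicitRequestOnly\"",
                Some(MultiAgentMode::Proactive) => "\"proactive\"",
            }
        }
    }
    ```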
    
    ## Not covered
    
    - Clearing an existing loaded-session selection back to unset through
    `turn/start`; omitted or `null` currently retains the session's
    selection.
    - A TUI control, slash command, or `config.toml` preference.
    
    ## Verification
    
    - `CARGO_INCREMENTAL=0 just test -p codex-app-server-protocol`
    - `CARGO_INCREMENTAL=0 just test -p codex-app-server multi_agent_mode`
    
    The focused app-server coverage verifies explicit `thread/start`
    initialization, first-turn prompting, nullable reporting for an omitted
    selection, and retention of selections that are not currently
    runtime-eligible.
    
    ## Stack
    
    Stacked on #28685. This PR contains only the thread initialization and
    lifecycle/settings API layer.
  • Add per-turn multi-agent mode (#28685)
    ## Why
    
    Multi-agent v2 currently carries an explicit-request-only delegation
    rule in its static usage hint. That provides a safe default, but it
    prevents clients from selecting proactive delegation per turn without
    changing static guidance or rewriting prior model context.
    
    This change makes delegation mode a session selection that can be
    updated through `turn/start`, while deriving the effective model-visible
    mode separately for each turn. Eligible multi-agent v2 turns remain
    explicit-request-only unless proactive mode is both selected and
    enabled.
    
    ## What changed
    
    - Add the experimental `turn/start.multiAgentMode` parameter with
    `explicitRequestOnly` and `proactive` values. Omission retains the
    loaded session's current optional selection.
    - Add the default-off `features.multi_agent_mode` feature gate. Eligible
    multi-agent v2 turns use the selected mode when enabled; an unset
    selection or disabled gate resolves to `explicitRequestOnly`.
    - Treat mode prompting as inapplicable for multi-agent v1 and other
    unsupported session configurations, producing no multi-agent mode
    developer message rather than rejecting the turn.
    - Move the explicit-request-only rule out of the static v2 usage hint
    and into a bounded, tagged developer context fragment.
    - Emit the effective mode in initial context and only when that
    effective mode changes on later turns.
    - Persist the effective mode in `TurnContextItem` as the durable
    baseline for resume and context-update comparisons.
    
    Historical rollout items are not rewritten. Later mode developer
    messages establish the current rule incrementally.
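    
    A sketch of how the effective mode could be derived and when a mode
    message would be emitted. The function names and the `eligible_v2` and
    `gate_enabled` parameters are hypothetical simplifications of the
    eligibility and feature-gate checks described above.
    
    ```rust
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum MultiAgentMode {
        ExplicitRequestOnly,
        Proactive,
    }
    
    /// Effective model-visible mode for one turn.
    /// `None` means mode prompting does not apply (for example multi-agent v1).
    pub fn effective_mode(
        eligible_v2: bool,
        gate_enabled: bool,
        selected: Option<MultiAgentMode>,
    ) -> Option<MultiAgentMode> {
        if !eligible_v2 {
            return None;
        }
        if gate_enabled {
            // An unset selection resolves to the safe default.
            Some(selected.unwrap_or(MultiAgentMode::ExplicitRequestOnly))
        } else {
            // A disabled gate ignores the selection.
            Some(MultiAgentMode::ExplicitRequestOnly)
        }
    }
    
    /// Emit a mode developer message in initial context and on change only.
    /// `previous` is the persisted baseline, `None` before the first turn.
    pub fn should_emit(previous: Option<MultiAgentMode>, current: Option<MultiAgentMode>) -> bool {
        current.is_some() && previous != current
    }
    ```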
    
    ## Not covered
    
    - Initial selection through `thread/start` and selected-mode reporting
    from thread lifecycle/settings APIs; those are isolated in the stacked
    #28792.
    - A TUI control or slash command for selecting the mode.
    - Persisting a preferred mode to `config.toml`; selection remains
    session/turn scoped.
    - Changes to multi-agent concurrency limits, tool availability, or model
    catalog capability declarations.
    - Rewriting historical rollout prompt items. Cold resume restores the
    latest persisted effective mode when available while leaving historical
    developer messages intact.
    
    ## Verification
    
    - `CARGO_INCREMENTAL=0 just test -p codex-core multi_agent_mode`
    - Focused app-server coverage verifies that `turn/start.multiAgentMode`
    produces proactive developer instructions for an eligible v2 turn.
    
    ## Stack
    
    Followed by #28792, which adds `thread/start` initialization and
    lifecycle/settings observability.
  • [3/3] app-server: configure environment connection timeout (#29025)
    ## Why
    
    Remote environments registered through `environment/add` currently use
    the fixed 10-second WebSocket connection timeout. Slow-starting
    executors need a caller-selected connection window, but this should not
    add retry policy or couple exec-server behavior to Core’s
    `deferred_executor` feature.
    
    Make the timeout an optional part of the existing experimental request.
    Existing clients continue using the current default, while callers that
    know an executor may take longer can request a larger window explicitly.
    
    Depends on #28683.
    
    ## What changed
    
    - Add optional `connectTimeoutMs` to `EnvironmentAddParams` and document
    it in the app-server README.
    - Pass the optional timeout through `EnvironmentRequestProcessor` into
    one `EnvironmentManager::upsert_environment()` path; the manager applies
    the existing default when it is omitted.
    - Preserve the existing single-attempt lifecycle. The configured value
    controls WebSocket connection and handshake time for both initial
    connection and later reconnects; initialization retains its separate
    timeout.
    - Add an app-server integration test that sends the real JSON-RPC
    request and verifies a stalled handshake observes the requested timeout.
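    
    The defaulting rule can be sketched in a few lines. The constant and
    function names are hypothetical; only the 10-second default and the
    optional millisecond parameter come from this PR.
    
    ```rust
    use std::time::Duration;
    
    /// Existing default applied when `connectTimeoutMs` is omitted or `null`.
    pub const DEFAULT_CONNECT_TIMEOUT: Duration = Duration::from_secs(10);
    
    /// Resolve the WebSocket connection and handshake window for one attempt.
    pub fn resolve_connect_timeout(connect_timeout_ms: Option<u64>) -> Duration {
        connect_timeout_ms
            .map(Duration::from_millis)
            .unwrap_or(DEFAULT_CONNECT_TIMEOUT)
    }
    ```
    
    The resolved value bounds a single attempt; it adds no retry behavior.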
    
    ## Test plan
    
    - `just test -p codex-app-server-protocol`
    - `just test -p codex-exec-server`
    - `just test -p codex-app-server
    environment_add_applies_connect_timeout`
    
    ## Rollout
    
    This is additive and does not enable `deferred_executor`. Callers should
    send a non-default timeout only after a compatible app-server is
    deployed; omitted or `null` values retain the existing 10-second
    default.