Commit Graph

7833 Commits

  • [codex] route sleep through time providers (#29973)
    ## Summary
    
    - add a cancellable sleep operation to `TimeProvider`
    - route `clock.sleep` through the configured provider
    - extend the supported sleep duration to 12 hours
    - complete the sleep turn item before propagating provider failures
    
    ## Why
    
    This isolates the core clock abstraction needed by external clock
    integrations. Existing system and app-server behavior remains wall-clock
    based in this PR; the stacked follow-up supplies app-server sleeps from
    an external clock.
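
    A minimal sketch of what a provider-routed, cancellable sleep could
    look like. The trait shape, the `WallClock` type, the error and outcome
    enums, and the blocking poll loop are all assumptions for illustration;
    only the 12-hour bound comes from the summary above, and the real
    implementation is presumably asynchronous.

    ```rust
    use std::sync::atomic::{AtomicBool, Ordering};
    use std::thread;
    use std::time::{Duration, Instant};

    /// Upper bound taken from the PR summary: 12 hours.
    const MAX_SLEEP: Duration = Duration::from_secs(12 * 60 * 60);

    #[derive(Debug, PartialEq)]
    enum SleepOutcome {
        Completed,
        Cancelled,
    }

    #[derive(Debug, PartialEq)]
    enum SleepError {
        DurationTooLong,
    }

    /// Hypothetical provider interface with a cancellable sleep.
    trait TimeProvider {
        fn sleep(&self, duration: Duration, cancel: &AtomicBool)
            -> Result<SleepOutcome, SleepError>;
    }

    /// Wall-clock provider that polls the cancellation flag in short slices.
    struct WallClock;

    impl TimeProvider for WallClock {
        fn sleep(&self, duration: Duration, cancel: &AtomicBool)
            -> Result<SleepOutcome, SleepError> {
            if duration > MAX_SLEEP {
                return Err(SleepError::DurationTooLong);
            }
            let deadline = Instant::now() + duration;
            loop {
                if cancel.load(Ordering::SeqCst) {
                    return Ok(SleepOutcome::Cancelled);
                }
                let now = Instant::now();
                if now >= deadline {
                    return Ok(SleepOutcome::Completed);
                }
                // Sleep in slices of at most 10 ms so cancellation is noticed quickly.
                thread::sleep((deadline - now).min(Duration::from_millis(10)));
            }
        }
    }
    ```

    An external clock integration would supply its own `TimeProvider`
    implementation in place of `WallClock`.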
  • core: raise token budget message limits (#29970)
    ## Why
    
    Token-budget reminder and guidance messages can require more than 1,000
    bytes to provide useful model-facing instructions. At the same time,
    these strings are injected into model-visible context, so their size
    must remain tightly bounded in response to the P0 context-growth
    concern. A 2,000-byte runtime cap provides additional room without
    allowing the substantially larger context growth of a 4 KiB limit.
    
    ## What changed
    
    - raises the runtime byte limits for token-budget reminder templates and
    guidance messages from 1,000 to 2,000
    - raises the corresponding JSON Schema `maxLength` values to 2,000
    - regenerates `codex-rs/core/config.schema.json`
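
    The runtime check can be pictured roughly as follows. This is a sketch
    only: `validate_budget_message` and `MessageTooLong` are hypothetical
    names, and the 2,000-byte constant is the one value taken from this PR.

    ```rust
    /// Runtime cap from the PR description, measured in bytes, not characters.
    const MAX_MESSAGE_BYTES: usize = 2_000;

    #[derive(Debug, PartialEq)]
    struct MessageTooLong {
        actual_bytes: usize,
    }

    /// Hypothetical validator: rejects messages whose UTF-8 encoding exceeds the cap.
    fn validate_budget_message(message: &str) -> Result<(), MessageTooLong> {
        // `str::len` returns the length in bytes.
        let actual_bytes = message.len();
        if actual_bytes > MAX_MESSAGE_BYTES {
            Err(MessageTooLong { actual_bytes })
        } else {
            Ok(())
        }
    }
    ```

    JSON Schema `maxLength` counts characters rather than bytes, so a
    byte-based runtime check like this one can still reject a non-ASCII
    string that the schema accepts.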
    
    ## Testing
    
    - `just test -p codex-features`
    - `just test -p codex-core load_config_resolves_token_budget_config
    load_config_rejects_invalid_token_budget_reminder_template`
    
    The full `codex-core` test run completed 2,858 tests successfully and
    encountered seven unrelated environment-sensitive failures involving
    Seatbelt/network environment assertions, MCP capability setup, and abort
    timing.
  • Report MCP error codes with server attribution (#29969)
    ## Why
    
    MCP error-code telemetry special-cased Codex Apps: its reported error
    codes were retained, while codes from every other MCP server were
    replaced with `unknown`. Error reporting should behave consistently for
    every MCP server. The server name already identifies where an error came
    from, so telemetry does not need a separate Codex Apps classification.
    
    This follows up on [#28976](https://github.com/openai/codex/pull/28976),
    which introduced MCP error-code telemetry.
    
    ## What changed
    
    - Add the MCP server name to call, duration, and error metrics.
    - Retain bounded, sanitized tool error codes from every MCP server.
    - Remove `McpErrorCodeSource` and the Codex Apps ownership lookup from
    telemetry collection.
    - Use the same metric-tagging path for blocked, rejected, and executed
    MCP calls.
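
    As an illustration of "bounded, sanitized" tag values, the sketch below
    applies one possible policy uniformly to every server. The 64-character
    bound, the allowed character set, the `unknown` fallback for empty
    input, and the tag names are assumptions; the PR does not spell them
    out.

    ```rust
    /// Assumed bound for illustration; the actual limit is not stated in the PR.
    const MAX_TAG_VALUE_LEN: usize = 64;

    /// Hypothetical sanitizer: keeps ASCII alphanumerics, `_`, `-`, and `.`,
    /// replaces every other character with `_`, and truncates to the bound.
    fn sanitize_tag_value(raw: &str) -> String {
        let cleaned: String = raw
            .chars()
            .take(MAX_TAG_VALUE_LEN)
            .map(|c| {
                if c.is_ascii_alphanumeric() || matches!(c, '_' | '-' | '.') {
                    c
                } else {
                    '_'
                }
            })
            .collect();
        if cleaned.is_empty() {
            "unknown".to_string()
        } else {
            cleaned
        }
    }

    /// Builds the same tag set for any MCP server; the tag names are illustrative.
    fn error_metric_tags(server_name: &str, raw_error_code: &str) -> Vec<(&'static str, String)> {
        vec![
            ("mcp_server", sanitize_tag_value(server_name)),
            ("error_code", sanitize_tag_value(raw_error_code)),
        ]
    }
    ```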
    
    ## Test plan
    
    - Verify the complete metric tag set includes the sanitized MCP server
    name.
    - Verify error codes from ordinary MCP servers are retained, bounded,
    and sanitized.
    - Preserve coverage for request failures, tool-result failures, nested
    auth failures, and span attributes.
  • [3/3] core: replay persisted world state (#29837)
    ## Why
    
    Persisting `WorldState` snapshots and patches is only useful if resume
    and fork restore that exact comparison baseline. Rebuilding it from
    `TurnContextItem` loses section state and can either repeat or suppress
    model-visible updates.
    
    This is the third PR in the WorldState persistence stack, built on
    #29835.
    
    ## What
    
    - Replay full WorldState snapshots and RFC 7386 patches through the
    existing rollout reconstruction segments.
    - Discard state from rolled-back turns and treat compaction as a
    baseline reset.
    - Hydrate `ContextManager` from the reconstructed snapshot on resume and
    fork.
    - Remove the synthetic `TurnContextItem` to WorldState conversion path.
    - Leave legacy or malformed rollouts without a baseline so the next
    update safely emits a full snapshot.
    
    ## Testing
    
    - `just test -p codex-core world_state`
    - `just test -p codex-core rollout_reconstruction_tests`
    - `just fix -p codex-core`
    - `just test -p codex-core` *(the changed tests passed; the full run
    also hit unrelated existing/test-environment failures, primarily a
    missing `test_stdio_server` binary)*
  • [codex] Add Ultra reasoning effort (#29899)
    ## Why
    
    Ultra should be one user-facing reasoning selection for work that
    benefits from both maximum reasoning and proactive multi-agent
    delegation. Without it, clients must coordinate maximum reasoning with
    the experimental `multiAgentMode` setting, even though the inference
    backend still expects its existing `max` effort value.
    
    This change makes reasoning effort the source of truth: clients select
    `ultra`, core derives proactive multi-agent behavior when the turn is
    eligible for multi-agent V2, and inference requests continue to use the
    backend-compatible `max` value.
    
    ## What changed
    
    - Add `ultra` as a first-class reasoning effort and preserve
    model-catalog ordering when exposing it to clients.
    - Convert `ultra` to `max` at the inference request boundary, including
    Responses HTTP/WebSocket requests, startup prewarm, compaction, and
    memory summarization.
    - Derive effective multi-agent mode per turn from effective reasoning
    effort:
      - eligible multi-agent V2 + `ultra` → `proactive`
      - eligible multi-agent V2 + any other effort → `explicitRequestOnly`
      - V1 or otherwise ineligible sessions → no multi-agent mode instruction
    - Keep the derived effective mode in turn context history so successive
    turns can emit a developer-message update only when the effective mode
    changes.
    - Remove selected multi-agent mode from core session configuration, turn
    construction, thread settings, resume/fork restoration, and subagent
    spawn plumbing. Subagents inherit reasoning effort and derive their own
    effective mode.
    - Retain the experimental app-server `multiAgentMode` fields for wire
    compatibility while marking them deprecated. Request values are accepted
    but ignored; compatibility response fields report `explicitRequestOnly`.
    - Display Ultra in the TUI using the order supplied by `model/list`.
    
    ## Validation
    
    - `just test -p codex-core ultra_reasoning_uses_max_for_requests`
    - `just test -p codex-tui model_reasoning_selection_popup`
  • [2/3] core: persist world state in rollouts (#29835)
    ## Why
    
    `WorldState` currently remembers its model-visible diff baseline only in
    memory. That leaves no durable source for restoring the exact baseline
    after resume, fork, rollback, or compaction.
    
    This is the second PR in the WorldState persistence stack, built on
    #29833 and following #29249. It records durable state transitions; the
    next PR will replay them during rollout reconstruction.
    
    ## What
    
    - Add a `world_state` rollout item containing either a full snapshot or
    an RFC 7386 JSON Merge Patch.
    - Persist a full snapshot after initial context and after compaction
    establishes a new context window.
    - Persist non-empty patches when later sampling steps or turns advance
    the WorldState baseline.
    - Write model-visible history before its matching WorldState record, so
    an interrupted write can only cause a safe repeated update on replay.
    - Preserve WorldState records for full-history forks while excluding
    them from thread previews, metadata, and app-server history
    materialization.
    
    Older binaries read rollout lines independently, so they skip the
    unknown `world_state` records while retaining the rest of the thread.
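
    For readers unfamiliar with JSON Merge Patch, the following sketch
    shows the merge algorithm on a small hand-rolled JSON type. The `Json`
    enum and the `object` helper are illustrative stand-ins, not the types
    used by `WorldState`.

    ```rust
    use std::collections::BTreeMap;

    #[derive(Clone, Debug, PartialEq)]
    enum Json {
        Null,
        Bool(bool),
        Number(f64),
        String(String),
        Array(Vec<Json>),
        Object(BTreeMap<String, Json>),
    }

    /// Applies `patch` to `target` following the JSON Merge Patch algorithm.
    fn merge_patch(target: Json, patch: &Json) -> Json {
        let Json::Object(patch_fields) = patch else {
            // A non-object patch (including an array) replaces the target wholesale.
            return patch.clone();
        };
        // A non-object target is treated as an empty object.
        let mut fields = match target {
            Json::Object(fields) => fields,
            _ => BTreeMap::new(),
        };
        for (key, value) in patch_fields {
            if *value == Json::Null {
                // A null member in the patch removes the key from the target.
                fields.remove(key);
            } else {
                let current = fields.remove(key).unwrap_or(Json::Null);
                fields.insert(key.clone(), merge_patch(current, value));
            }
        }
        Json::Object(fields)
    }

    /// Small helper for building example objects.
    fn object(pairs: Vec<(&str, Json)>) -> Json {
        Json::Object(pairs.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
    }
    ```

    Because a null member means "remove", a patch cannot set an object
    field to null, whereas arrays are replaced as whole values and may
    contain nulls.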
    
    ## Testing
    
    - `just test -p codex-core
    snapshot_merge_patch_changes_and_removes_nested_values`
    - `just test -p codex-core
    world_state_baseline_deduplicates_until_history_is_replaced`
    - `just test -p codex-core
    deferred_executor_compaction_preserves_then_updates_environment_once`
    - `just test -p codex-protocol`
    - `just test -p codex-rollout`
    - `just test -p codex-state`
    - `just test -p codex-thread-store`
    - `just test -p codex-app-server-protocol`
  • [codex] Populate remote plugin local versions (#29956)
    # What
    
    - Carry installed remote release versions through remote plugin
    summaries as `localVersion`.
    - Keep the app-server mapping a pure adapter by populating that value in
    the remote catalog layer.
    
    # Why
    
    Remote plugin summaries always returned `localVersion: null` even after
    their versioned bundles had been installed locally. Consumers such as
    scheduled-task template discovery use `localVersion` to resolve a
    plugin's materialized root, so templates from remote curated plugins
    were silently skipped.
  • code-mode: define process host wire protocol (#29804)
    ## Why
    
    The process-owned code mode implementation needs an explicit, bounded
    wire contract before either side depends on it. Keeping framing and
    message semantics in `codex-code-mode-protocol` gives the client and
    sidecar one shared source of truth and makes compatibility failures
    detectable during connection setup.
    
    ## What changed
    
    - adds a versioned client/host handshake with required and optional
    capabilities
    - defines operation requests and responses for session lifecycle and
    cell control
    - defines reverse delegate request, response, cancellation, and
    cell-closure messages
    - adds a four-byte little-endian length-prefixed JSON codec with a hard
    frame cap
    - rejects malformed frames, unknown fields, invalid identifiers, and
    unsupported protocol states
    - locks the wire representation down with explicit JSON round-trip tests
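
    A rough sketch of the framing layer described above: a four-byte
    little-endian length prefix followed by the JSON payload bytes. The
    1 MiB cap, the `FrameError` type, and the function names are
    assumptions; the PR only states that the cap is hard.

    ```rust
    /// Assumed cap for illustration; the actual value is not given in the PR.
    const MAX_FRAME_BYTES: usize = 1 << 20;
    const HEADER_BYTES: usize = 4;

    #[derive(Debug, PartialEq)]
    enum FrameError {
        TooLarge(usize),
    }

    /// Prefixes `payload` with its length as a little-endian `u32`.
    fn encode_frame(payload: &[u8]) -> Result<Vec<u8>, FrameError> {
        if payload.len() > MAX_FRAME_BYTES {
            return Err(FrameError::TooLarge(payload.len()));
        }
        let mut frame = Vec::with_capacity(HEADER_BYTES + payload.len());
        frame.extend_from_slice(&(payload.len() as u32).to_le_bytes());
        frame.extend_from_slice(payload);
        Ok(frame)
    }

    /// Returns `Ok(None)` when more bytes are needed, otherwise the payload
    /// and the number of bytes consumed from `buffer`.
    fn decode_frame(buffer: &[u8]) -> Result<Option<(&[u8], usize)>, FrameError> {
        let Some(header) = buffer.get(..HEADER_BYTES) else {
            return Ok(None);
        };
        let len = u32::from_le_bytes([header[0], header[1], header[2], header[3]]) as usize;
        // Reject oversized frames before waiting for (or allocating) the body.
        if len > MAX_FRAME_BYTES {
            return Err(FrameError::TooLarge(len));
        }
        let end = HEADER_BYTES + len;
        match buffer.get(HEADER_BYTES..end) {
            Some(payload) => Ok(Some((payload, end))),
            None => Ok(None),
        }
    }
    ```

    Checking the declared length against the cap before reading the body
    is what lets a reader refuse a hostile or corrupt frame without
    buffering it.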
    
    ## Testing
    
    - `just test -p codex-code-mode-protocol`
    
    ## Stack
    
    Part 1 of 6. Followed by
    [#29805](https://github.com/openai/codex/pull/29805).
  • Represent MCP authentication with an enum (#29924)
    ## Why
    
    MCP authentication has distinct OAuth and ChatGPT-session flows.
    Representing that choice as `use_chatgpt_auth` makes one flow implicit
    and allows the configuration model to express the distinction only
    through a boolean.
    
    ChatGPT credential forwarding also needs a first-party trust boundary. A
    configurable `chatgpt_base_url` controls routing, but must not grant an
    MCP server permission to receive session credentials.
    
    This change builds on #29733, where the boolean was introduced.
    
    ## What changed
    
    - Replace `use_chatgpt_auth` with an `auth` field backed by the
    exhaustive `McpServerAuth` enum.
    - Support `auth = "oauth"` and `auth = "chatgpt"`, with OAuth remaining
    the default.
    - Trust only the origin derived from the existing hardcoded
    `CHATGPT_CODEX_BASE_URL` when granting ChatGPT auth to an MCP server.
    - Keep configured bearer tokens and authorization headers ahead of the
    selected authentication flow.
    - Update config writers, schema output, fixtures, and integration-test
    setup to use the enum.
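
    The enum and the precedence rules can be sketched as follows.
    `McpServerAuth`, the `"oauth"` and `"chatgpt"` values, the OAuth
    default, and the precedence of configured authorization come from this
    description; `ResolvedAuth`, `resolve_auth`, and the `Unauthenticated`
    outcome for an untrusted origin are assumptions made for illustration.

    ```rust
    use std::str::FromStr;

    #[derive(Clone, Copy, Debug, Default, PartialEq)]
    enum McpServerAuth {
        #[default]
        OAuth,
        ChatGpt,
    }

    impl FromStr for McpServerAuth {
        type Err = String;

        fn from_str(value: &str) -> Result<Self, Self::Err> {
            match value {
                "oauth" => Ok(Self::OAuth),
                "chatgpt" => Ok(Self::ChatGpt),
                other => Err(format!("unsupported MCP auth value: {other}")),
            }
        }
    }

    /// Hypothetical result of selecting credentials for one server.
    #[derive(Debug, PartialEq)]
    enum ResolvedAuth {
        ConfiguredAuthorization,
        ChatGptSession,
        OAuth,
        Unauthenticated,
    }

    fn resolve_auth(
        auth: McpServerAuth,
        has_configured_authorization: bool,
        origin_is_first_party: bool,
    ) -> ResolvedAuth {
        // Configured bearer tokens and headers stay ahead of the selected flow.
        if has_configured_authorization {
            return ResolvedAuth::ConfiguredAuthorization;
        }
        match auth {
            McpServerAuth::OAuth => ResolvedAuth::OAuth,
            McpServerAuth::ChatGpt if origin_is_first_party => ResolvedAuth::ChatGptSession,
            // Assumption: an untrusted origin simply receives no session credentials.
            McpServerAuth::ChatGpt => ResolvedAuth::Unauthenticated,
        }
    }
    ```

    Matching on the enum exhaustively means a future third flow cannot be
    added without the compiler flagging every place that selects
    credentials.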
    
    ## Verification
    
    Integration coverage exercises the complete streamable HTTP startup path
    in two independent configurations:
    
    - A directly constructed MCP configuration verifies that matching an
    overridden `chatgpt_base_url` does not grant ChatGPT auth.
    - A persisted `config.toml` containing an attacker-controlled
    `chatgpt_base_url` and `auth = "chatgpt"` verifies the same boundary
    through normal config parsing.
    
    Both tests complete MCP initialization and tool listing and assert that
    the full captured request sequence contains no authorization headers.
    Separate integration coverage verifies that configured authorization
    takes precedence over ChatGPT auth.
  • [1/3] core: make world state snapshots serializable (#29833)
    ## Why
    
    `WorldState` currently keeps its diff baseline as live Rust objects
    keyed by process-local `TypeId`. That baseline cannot be written to a
    rollout or restored after resume, so Codex reconstructs an approximation
    from `TurnContextItem`.
    
    This is the first change in the WorldState persistence stack. It gives
    every section a stable persisted identity and a compact serializable
    comparison snapshot without changing rollout behavior yet.
    
    ## What changed
    
    - Require each `WorldStateSection` to define a stable ID and
    serializable snapshot type.
    - Reject duplicate section IDs when constructing `WorldState`.
    - Persist a dedicated environment comparison snapshot using
    model-visible strings instead of runtime path types.
    - Store only `WorldStateSnapshot` in `ContextManager`, removing the
    parallel live-object baseline.
    - Render diffs by restoring each section's typed snapshot; invalid
    snapshots fall back to a full section render.
    - Omit null object fields for future RFC 7386 patches while preserving
    null values inside arrays.
    
    Follow-up PRs will record full snapshots and merge patches, then restore
    the baseline during resume, fork, and rollback.
    
    ## Test plan
    
    - WorldState snapshot tests cover stable IDs, duplicate rejection, null
    omission, and array preservation.
    - Environment tests cover persistence-safe snapshot values and existing
    diff rendering.
    - ContextManager baseline deduplication and session context-update
    persistence tests.
    
    Related: #29249
  • Allow ChatGPT-hosted MCP servers to use session auth (#29733)
    ## Why
    
    ChatGPT session authentication was inferred from the reserved Codex Apps
    server name. That couples credential routing to Codex Apps-specific
    behavior and prevents other MCP endpoints hosted by ChatGPT from
    explicitly using the current session.
    
    The opt-in also needs a clear security boundary: an arbitrary MCP
    configuration must not be able to redirect ChatGPT credentials to
    another origin.
    
    ## What changed
    
    - Add `use_chatgpt_auth` to HTTP MCP server configuration, defaulting to
    `false`.
    - Honor the setting only when the parsed server URL has the same HTTP(S)
    origin as the configured `chatgpt_base_url`; otherwise remove the
    capability before startup.
    - Resolve bearer tokens and static or environment-backed authorization
    headers before selecting authentication, with configured authorization
    taking precedence over ChatGPT session auth.
    - Enable the setting for the built-in Codex Apps and hosted plugin
    runtime endpoints while keeping Codex Apps caching and tool
    normalization scoped to the reserved server.
    - Persist the setting through MCP config rewrite paths and expose it in
    the generated config schema.
    - Load the current login state for `codex mcp list` so reported auth
    status matches runtime behavior.
    
    ## Verification
    
    Core integration coverage exercises the complete streamable HTTP MCP
    startup path and verifies that:
    
    - a same-origin opted-in server receives the current ChatGPT access
    token;
    - an explicitly configured authorization header takes precedence;
    - a different-origin server completes MCP initialization and tool
    listing without receiving any ChatGPT authorization header.
  • TUI Plugin Sharing 5 - polish remote plugin catalog rows (#26705)
    This is the final plugin sharing PR in the 5-PR stack. It applies the
    remaining TUI polish for remote plugin catalog rows and tabs:
    admin-disabled plugins now read as blocked/view-only instead of looking
    toggleable, admin-installed/default-installed plugins count and sort
    like installed plugins, plugin search matches richer metadata, and an
    empty successful `Shared with me` section stays hidden.
    
    - Admin-disabled rows use a blocked marker, show `Disabled`, and keep
    Enter-only detail behavior without a toggle hint.
    - Admin-installed/default-installed plugins show as installed in counts,
    ordering, tabs, and detail copy.
    - Plugin search now matches descriptions and keywords in addition to
    existing row metadata.
    - Successful-empty `Shared with me` tabs are hidden, while loading,
    error, workspace-empty, and real shared-plugin states remain visible.
    - Updates coverage in
    `plugins_popup_snapshot_shows_all_marketplaces_and_sorts_installed_then_name`,
    `plugins_popup_admin_disabled_installed_plugin_has_no_toggle_hint`,
    `plugins_popup_search_matches_plugin_descriptions`, and
    `plugins_popup_remote_section_fallback_states_snapshot`.
    - Updates snapshots `plugins_popup_curated_marketplace` and
    `plugins_popup_empty_shared_section_hidden`.
    
    
  • core: add configurable <context_window_guidance> message (#29936)
    ## Why
    
    This PR adds a configurable `<context_window_guidance>` developer
    section immediately after `<context_window>`. Harness integrations need
    this section to give the model deployment-specific instructions for
    preparing for context-window transitions.
    
    ## What changed
    
    - Add an optional `features.token_budget.guidance_message` config with a
    1,000-byte runtime cap and generated schema support.
    - Render configured guidance as a developer `ContextualUserFragment`
    wrapped in `<context_window_guidance>` immediately after
    `<context_window>`.
    - Omit the section when guidance is unset, empty, or whitespace-only.
    - Preserve the resolved value in config locks and classify persisted
    guidance as contextual developer content.
    - Add integration coverage for rendered content and ordering.
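
    The omission rule can be illustrated with a small sketch. The function
    name and the exact line layout inside the tags are assumptions; the
    real fragment is rendered through `ContextualUserFragment`.

    ```rust
    /// Returns the wrapped section, or `None` when guidance is unset,
    /// empty, or whitespace-only.
    fn render_context_window_guidance(guidance: Option<&str>) -> Option<String> {
        let text = guidance?;
        if text.trim().is_empty() {
            return None;
        }
        Some(format!(
            "<context_window_guidance>\n{text}\n</context_window_guidance>"
        ))
    }
    ```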
  • feat(remote-control): add daemon pairing command (#29913)
    ## Why
    
    Users who run Codex remote control through daemon mode can keep the
    daemon running, but they do not have a CLI path to mint the short-lived
    manual pairing code needed to connect another device. Without this
    command, they need to speak app-server JSON-RPC directly.
    
    Related: #25675
    
    ## What Changed
    
    - Added `codex remote-control pair`, which connects to the existing
    daemon control socket and calls `remoteControl/pairing/start` with
    `manualCode: true`.
    - Kept the command non-lifecycle-mutating: it does not start, enable, or
    restart the daemon.
    - Human output labels the manual code as `Pairing code: ...`; `--json`
    preserves the full pairing response.
    - Added daemon socket-client, CLI formatting, and parser coverage.
    
    ## Verification
    
    - `remote_control_client::tests::start_pairing_requests_manual_code`
    verifies the daemon client sends `{ "manualCode": true }` and parses the
    complete response.
    - `remote_control_cmd::tests::remote_control_pairing_human_output_labels_the_manual_code`
    verifies the human-facing output.
  • [codex] nest sleep config under current time reminder (#29910)
    ## Summary
    
    - move sleep tool enablement from top-level `[features].sleep_tool` to
    `[features.current_time_reminder].sleep_tool`
    - remove the standalone `Feature::SleepTool` flag and gate `clock.sleep`
    from resolved current-time configuration
    - update config schema, config-lock materialization, and existing sleep
    coverage
    
    Stacked on #29907.
  • [codex] namespace sleep under clock (#29907)
    ## Summary
    
    - expose the interruptible sleep tool as `clock.sleep` instead of
    top-level `sleep`
    - keep `clock.curr_time` and `clock.sleep` in the same model-visible
    namespace when both features are enabled
    - update existing core and app-server integration coverage to issue
    namespaced sleep calls
    
    ## Why
    
    Sleep is a clock operation. Grouping it with `clock.curr_time` gives the
    model a more coherent tool surface without changing the sleep feature
    gate or runtime behavior.
    
    ## Validation
    
    - `just test -p codex-core sleep_tool_follows_feature_gate`
    - `just test -p codex-core any_new_input_interrupts_sleep`
    - `just test -p codex-app-server
    sleep_emits_started_and_completed_items`
  • Isolate curated plugin sync Git environment (#29785)
    ## Why
    
    Several users have reported data loss from this bug, including tracked
    files being deleted or replaced and branches appearing to be reset to
    the curated plugins repository. This can happen during startup, before
    the model chooses to edit anything.
    
    Ambient repository variables such as `GIT_DIR` and `GIT_WORK_TREE` can
    override the repository selected by `git -C`, redirecting startup sync's
    `git reset --hard` and `git clean -fdx` into the user's active
    workspace.
    
    ## What
    
    Route every startup-sync Git invocation through a shared command builder
    that removes repository-local environment variables before execution.
    Add regression coverage to keep those variables isolated.
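
    A minimal sketch of such a command builder, using only the standard
    library. The function name is hypothetical and the variable list is an
    illustrative subset: `GIT_DIR` and `GIT_WORK_TREE` are named above, and
    the others are additional repository-local Git variables, not
    necessarily the list this PR uses.

    ```rust
    use std::path::Path;
    use std::process::Command;

    /// Illustrative subset of Git's repository-local environment variables.
    const REPO_LOCAL_GIT_VARS: &[&str] = &[
        "GIT_DIR",
        "GIT_WORK_TREE",
        "GIT_INDEX_FILE",
        "GIT_COMMON_DIR",
        "GIT_OBJECT_DIRECTORY",
    ];

    /// Builds `git -C <repo>` with ambient repository overrides removed, so
    /// the repository passed via `-C` is the one Git actually operates on.
    fn isolated_git_command(repo: &Path) -> Command {
        let mut command = Command::new("git");
        command.arg("-C").arg(repo);
        for name in REPO_LOCAL_GIT_VARS {
            // `env_remove` stops the child from inheriting the variable.
            command.env_remove(name);
        }
        command
    }
    ```

    Callers would then append the subcommand, for example
    `isolated_git_command(repo).args(["reset", "--hard"])`, instead of
    constructing `Command::new("git")` directly.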
    
    Fixes #27416
  • Read connector declarations from executor plugins (#29852)
    ## Why
    
    Selected capability roots can live on a different executor and operating
    system from app-server. Their connector declarations must therefore be
    read through the executor that owns the package, without converting
    executor URIs into host paths.
    
    This PR adds that authority-bound reader without activating connectors
    or changing thread startup.
    
    ## What changed
    
    - Add a small `codex-connectors-extension` crate for executor-owned
    connector I/O.
    - Read only the app configuration explicitly declared by the resolved
    plugin manifest.
    - Read through the `ExecutorFileSystem` retained by
    `ResolvedExecutorPlugin`; there is no host-filesystem fallback or
    default-file probe.
    - Keep `PathUri` values intact so Windows, Unix, and remote executor
    paths work from any orchestrator OS.
    - Return full `AppDeclaration` values so the caller retains declaration
    names and categories for routing.
    - Preserve the selected plugin ID and exact executor URI in read and
    parse errors.
    
    The contract is intentionally narrow: selected packages are assumed to
    be trusted and valid, and packages that provide connectors explicitly
    declare their app configuration.
    
    ## Stack scope
    
    This PR is stacked on #29851. It only provides the executor-backed
    reader. #29856 resolves selected roots at thread start, freezes their
    connector snapshot, and contains the remote-capable end-to-end authority
    test for the complete path.
  • path-uri: normalize parent segments in absolute joins (#29903)
    ## Why
    
    `PathUri::join` normalized `..` for relative paths, but its
    absolute-path branch rebuilt URIs through `url::PathSegmentsMut::push`,
    which skips dot segments. `/tmp/a/../b` therefore resolved to `/tmp/a/b`
    instead of `/tmp/b`.
    
    ## What changed
    
    Normalize absolute native path segments before constructing the file
    URI. Parent traversal now clamps at POSIX roots, Windows drive roots,
    and UNC share roots, including paths with repeated separators.
    
    Add platform-independent coverage for POSIX, drive, UNC, root-clamping,
    and repeated-separator cases.
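
    The POSIX case of the normalization can be sketched as below. This is
    an illustration under simplifying assumptions (string paths, POSIX
    roots only, `.` segments dropped); the function name is hypothetical
    and drive and UNC roots are not modeled.

    ```rust
    /// Resolves `..` in an absolute POSIX path, clamping at the root and
    /// ignoring empty segments produced by repeated separators.
    fn normalize_posix_absolute(path: &str) -> String {
        let mut segments: Vec<&str> = Vec::new();
        for segment in path.split('/') {
            match segment {
                "" | "." => {}
                ".." => {
                    // Popping an empty stack is a no-op, which clamps at `/`.
                    segments.pop();
                }
                other => segments.push(other),
            }
        }
        format!("/{}", segments.join("/"))
    }
    ```

    With this sketch, `/tmp/a/../b` becomes `/tmp/b`, and `/../../etc`
    clamps to `/etc`.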
    
    ## Manual validation
    
    - `just test -p codex-utils-path-uri`
  • Add a connector declaration snapshot (#29851)
    ## Why
    
    Connector declarations currently enter Codex through broad plugin
    capability summaries, then MCP setup, turn tooling, and `app/list` each
    reconstruct the same information. That makes executor-selected
    connectors difficult to add without coupling connector behavior to the
    host plugin loader.
    
    This PR introduces a small connector-owned value that later stack layers
    can populate before thread startup.
    
    ## What changed
    
    - Move the pure app-declaration parser into `codex-connectors`,
    preserving declaration order and category cleanup while leaving
    host-side validation and deduplication unchanged.
    - Add an immutable `ConnectorSnapshot` with ordered connector IDs and
    plugin display-name provenance.
    - Adapt the existing local-plugin capability summaries into that
    snapshot at current consumer boundaries.
    - Use the snapshot for MCP tool provenance, turn connector inventory,
    and `app/list`.
    - Keep the crate API narrow: no test-only snapshot accessors are
    exposed.
    
    The externally visible behavior is unchanged. Connector tools still come
    from the orchestrator-owned `/ps/mcp` server, and local plugin
    enablement remains owned by the existing plugin loader.
    
    ## Stack scope
    
    This is the foundation only. It does not read selected executor packages
    or change thread startup. #29852 adds the executor-backed declaration
    reader, and #29856 composes selected declarations into a thread
    snapshot.
  • [codex] dedupe remote control account header (#29893)
    ## Why
    
    Remote-control HTTP requests applied the authentication headers and then
    appended `ChatGPT-Account-ID` again with
    `reqwest::RequestBuilder::header`. Since reqwest appends, the wire
    request could contain the same header twice. Intermediaries may coalesce
    duplicate values into `uuid,uuid`, which is not a valid account ID.
    
    ## What changed
    
    - Build remote-control request authentication headers in one place.
    - Apply provider headers first, then use `HeaderMap::insert` for the
    explicit account ID. This preserves the current account-ID precedence
    and all other authentication headers while ensuring exactly one account
    header is sent.
    - Preserve duplicate HTTP headers in the test harness and assert exactly
    one account header for enroll, refresh, list, and revoke requests.
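
    The difference between appending and inserting a header can be shown
    with a toy model. The `Headers` alias and both functions are stand-ins
    written for this illustration; they are not the reqwest or `HeaderMap`
    APIs, only a model of their append and replace semantics.

    ```rust
    /// Toy header list: order-preserving and duplicate-permitting.
    type Headers = Vec<(String, String)>;

    /// Append semantics: any existing value with the same name is kept.
    fn append_header(headers: &mut Headers, name: &str, value: &str) {
        headers.push((name.to_string(), value.to_string()));
    }

    /// Insert semantics: existing values with the same name (compared
    /// case-insensitively) are dropped, leaving exactly one.
    fn insert_header(headers: &mut Headers, name: &str, value: &str) {
        headers.retain(|(existing, _)| !existing.eq_ignore_ascii_case(name));
        headers.push((name.to_string(), value.to_string()));
    }

    /// Counts how many values are present for `name`.
    fn count_header(headers: &Headers, name: &str) -> usize {
        headers
            .iter()
            .filter(|(existing, _)| existing.eq_ignore_ascii_case(name))
            .count()
    }
    ```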
    
    ## Validation
    
    Added focused coverage for:
    
    - Adding the explicit account header when the auth provider omits it.
    - Replacing multiple provider-supplied account values, including a
    differently cased header name.
    - Preserving authorization and routing headers while replacing only the
    account header.
    - Rejecting invalid account header values before sending a request.
    - Emitting exactly one account header for enroll, refresh, list, and
    revoke requests.
    - Maintaining header uniqueness across unauthorized recovery, retry, and
    error-response paths.
    - Emitting exactly one installation header for enroll and refresh
    requests.
    
    Checks run:
    
    - `just test -p codex-app-server-transport request_headers`: 3 passed
    - `just test -p codex-app-server-transport remote_control_http_mode`: 6
    passed
    - `just test -p codex-app-server-transport clients_tests`: 6 passed
    - `just test -p codex-app-server-transport`: 123 passed
    - `cargo test -p codex-app-server-transport`: 123 passed
    - `just clippy -p codex-app-server-transport`
    - `just fmt-check`
    - `bazel test
    //codex-rs/app-server-transport:app-server-transport-unit-tests`
  • Pipeline bounded AGENTS.md and Git root probes (#29870)
    ## Why
    
    When Codex uses a remote `ExecutorFileSystem`, every `get_metadata` call
    is an exec-server round trip. Upward discovery currently pays those
    round trips serially in two latency-sensitive places:
    
    - session startup, while locating the configured project root before
    loading `AGENTS.md`; and
    - Git-root discovery, which runs before per-turn Git diff enrichment.
    
    The goal is to remove the serial ancestor dependency without adding a
    new filesystem RPC, JSON-RPC batch method, Git executable dependency, or
    cache.
    
    ## Example
    
    Assume this layout, with `.git` as the configured project-root marker:
    
    ```text
    /workspace/repo/.git
    /workspace/repo/AGENTS.md
    /workspace/repo/crates/core/    <- cwd
    ```
    
    The marker probes have this required precedence:
    
    ```text
    1. /workspace/repo/crates/core/.git
    2. /workspace/repo/crates/.git
    3. /workspace/repo/.git
    4. /workspace/.git
    5. /.git
    ```
    
    Previously, probe 2 was not sent until probe 1 returned, and probe 3 was
    not sent until probe 2 returned. With this change, the client lazily
    keeps up to eight ordinary `fs/getMetadata` requests in flight, but
    consumes their results in the order above. Codex must still learn that
    probes 1 and 2 are absent before accepting probe 3, so the nearest root
    always wins. Once probe 3 succeeds, the client has its answer and stops
    awaiting probes 4 and 5. Requests that were already sent may still
    finish on the worker.
    
    For the marker phase alone, with a 50 ms client-to-worker round trip and
    fast local metadata calls, finding the root at probe 3 changes from
    roughly three serialized round trips (150 ms) to one round trip plus
    worker processing. The later `AGENTS.md` candidate phase remains
    separate and ordered.
    
    Only after `/workspace/repo` is selected does `AGENTS.md` discovery
    check instruction candidates, in root-to-cwd order:
    
    ```text
    /workspace/repo/AGENTS.override.md
    /workspace/repo/AGENTS.md
    /workspace/repo/crates/AGENTS.override.md
    /workspace/repo/crates/AGENTS.md
    /workspace/repo/crates/core/AGENTS.override.md
    /workspace/repo/crates/core/AGENTS.md
    ```
    
    The first configured candidate found in each directory wins. These
    checks remain ordered and no instruction candidate above
    `/workspace/repo` is issued. Git-root discovery uses the same bounded
    lookup with only `.git` as the marker.
    
    ## What changed
    
    - Added a client-side find-up helper that generates `ancestor x marker`
    probes lazily, nearest directory first and configured marker order
    within each directory.
    - Uses an ordered concurrency window of eight scalar metadata requests.
    This bounds executor load while preserving nearest-root and marker
    precedence.
    - Reuses the helper for both configured project-root discovery and
    remote Git-root discovery.
    - Keeps Git ancestor and marker construction in `AbsolutePathBuf`,
    converting only each complete `.git` probe to `PathUri`. This preserves
    native paths that require an opaque URI fallback, such as Windows
    namespace paths.
    - Preserves existing error behavior: `AGENTS.md` discovery propagates
    non-`NotFound` metadata errors, while Git discovery treats a failed
    marker probe as absent and continues upward.
    - Reads each discovered `AGENTS.md` directly instead of statting it a
    second time.
    
    No filesystem trait or exec-server protocol method is added. An empty
    `project_root_markers` list performs no ancestor-marker I/O and checks
    instruction candidates only in `cwd`. This change also deliberately does
    not cache roots across turns.
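
    The ordered window can be sketched with plain threads, as below. This
    is an illustration only: the function names are hypothetical, a single
    marker is assumed, threads stand in for in-flight `fs/getMetadata`
    requests, and the real helper is presumably asynchronous.

    ```rust
    use std::collections::VecDeque;
    use std::path::{Path, PathBuf};
    use std::sync::Arc;
    use std::thread::{self, JoinHandle};

    /// Lexical marker probes for `cwd`, nearest directory first.
    fn marker_probes(cwd: &Path, marker: &str) -> Vec<PathBuf> {
        cwd.ancestors().map(|dir| dir.join(marker)).collect()
    }

    /// Keeps up to `window` probes in flight but consumes results strictly
    /// in candidate order, so the nearest match always wins.
    fn find_first_ordered<F>(candidates: Vec<PathBuf>, window: usize, probe: F) -> Option<PathBuf>
    where
        F: Fn(&Path) -> bool + Send + Sync + 'static,
    {
        let probe = Arc::new(probe);
        let mut pending = candidates.into_iter();
        let mut in_flight: VecDeque<(PathBuf, JoinHandle<bool>)> = VecDeque::new();
        loop {
            // Lazily top up the window; later probes are only issued on demand.
            while in_flight.len() < window.max(1) {
                let Some(path) = pending.next() else { break };
                let probe = Arc::clone(&probe);
                let target = path.clone();
                in_flight.push_back((path, thread::spawn(move || probe(&target))));
            }
            // An empty queue means every candidate was absent.
            let (path, handle) = in_flight.pop_front()?;
            // A panicked probe is treated as absent in this sketch.
            if handle.join().unwrap_or(false) {
                // Remaining probes are no longer awaited; they may still finish.
                return Some(path);
            }
        }
    }
    ```

    Because `Path::ancestors` walks textual parents without touching the
    filesystem, the candidate list is lexical in the sense described in the
    Symlinks section of this PR.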
    
    ## Symlinks
    
    Upward traversal remains **lexical**. The helper does not canonicalize
    `cwd`; it appends marker names to the supplied path and walks that
    path's textual parents. The filesystem performs the actual metadata/read
    operation, and the current local and exec-server implementations follow
    live symlink targets.
    
    For example:
    
    ```text
    /tmp/pkg -> /workspace/repo/packages/pkg
    cwd = /tmp/pkg/src
    actual Git marker = /workspace/repo/.git
    ```
    
    The lexical probes are `/tmp/pkg/src/.git`, `/tmp/pkg/.git`,
    `/tmp/.git`, and `/.git`. They do not jump from `/tmp/pkg` to the
    target's parent `/workspace/repo`, so this spelling of `cwd` does not
    discover `/workspace/repo/.git`. That is the existing behavior and is
    unchanged by this PR.
    
    Conversely, if `/tmp/repo -> /workspace/repo`, then probing
    `/tmp/repo/.git` follows the directory symlink and finds
    `/workspace/repo/.git`; the reported root remains the lexical path
    `/tmp/repo`. A live symlink used directly as `.git`, another configured
    marker, or `AGENTS.md` is also followed. A symlinked `AGENTS.md` is
    loaded when its target is a regular file, while a broken symlink behaves
    as `NotFound`.
  • [plugins] Track plugin install requests by ID (#29684)
    Summary
    - Emit `codex_plugin_install_requested` when a validated plugin install
    request is made, before the user accepts or declines the elicitation.
    - Record the exact model-visible plugin ID, remote plugin ID, required
    connector IDs, stable suggestion ID, and `endpoint_recommendation` vs
    `legacy_discovery` source.
    - Keep `suggest_reason` out of telemetry and leave connector-only
    install requests unchanged.
    
    Rollout
    - Backend/schema dependency:
    https://github.com/openai/openai/pull/1065270
    - Land the backend PR before this producer starts sending the event.
    
    Validation
    - `just test -p codex-analytics` (83 passed)
    - `just test -p codex-core request_plugin_install` (17 passed)
    - `just fix -p codex-analytics`
    - `just fix -p codex-core`
    - `just fmt`
    - `git diff --check`
  • mcp: keep elicitation requests below app wire types (#29724)
    ## Why
    
    Core and tools need to request MCP elicitation without constructing
    app-server wire payloads. The request should remain a neutral protocol
    concept until app-server serializes it for a client.
    
    ## What changed
    
    - Switched core and tools to
    `codex_protocol::approvals::ElicitationRequest`.
    - Derived turn and server context inside core instead of carrying
    app-server request types through lower layers.
    - Kept the app-server payload unchanged through an explicit boundary
    conversion.
    - Removed the remaining production app-server-protocol dependency from
    tools.
    
    ## Stack
    
    This is PR 5 of 6, stacked on [PR
    #29723](https://github.com/openai/codex/pull/29723). Review only the
    delta from `codex/split-connector-metadata-types`. Next: [PR
    #29725](https://github.com/openai/codex/pull/29725).
    
    ## Validation
    
    - `codex-core` MCP coverage passed: 87 tests.
    - Tools elicitation and app-server round-trip coverage passed.
  • [apps] Thread structured icon assets through app list (#29889)
    ## Summary
    
    - Add `iconAssets` and `iconDarkAssets` to the app-list protocol.
    - Preserve structured icons through directory merging and the connector,
    app-server, and TUI boundaries.
    - Keep legacy logo URLs unchanged as compatibility fallbacks.
    - Update generated protocol schemas and TypeScript types.
  • [codex] Inject agent graph store into ThreadManager (#29736)
    Pick up the AgentGraphStore migration.
    
    - Inject an explicit optional agent graph store into `ThreadManager`
    - Move all calls to spawn, close, recursive resume, and
    subtree/archive/delete/feedback traversal through it
    - Keep using `LocalAgentGraphStore` when SQLite is available
    
    This required some changes to the interface to deal with futures:
    
    - The interface now matches `ThreadStore`'s object-safe pattern by
    returning a boxed `AgentGraphStoreFuture` directly, allowing
    `ThreadManager` to hold `Arc<dyn AgentGraphStore>`
    
    *Slight behavior change!* Unfiltered subtree enumeration now performs a
    single all-status breadth-first traversal, so a closed grandchild
    beneath an open edge is included; the previous Open-then-Closed
    traversals could not cross mixed-status paths and silently omitted it.
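    
    The behavior change can be illustrated with a small sketch. It assumes
    the graph is a tree whose edges carry an open or closed status;
    `EdgeStatus`, `Graph`, and `descendants` are hypothetical names, not the
    `AgentGraphStore` interface.
    
    ```rust
    use std::collections::{HashMap, VecDeque};
    
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    enum EdgeStatus {
        Open,
        Closed,
    }
    
    /// Parent id -> (child id, status of the edge to that child).
    type Graph = HashMap<&'static str, Vec<(&'static str, EdgeStatus)>>;
    
    /// Breadth-first enumeration of descendants. With `only = None` every edge
    /// is crossed; with `Some(status)` only edges of that status are crossed,
    /// so nodes behind an edge of the other status are never reached.
    fn descendants(
        graph: &Graph,
        root: &'static str,
        only: Option<EdgeStatus>,
    ) -> Vec<&'static str> {
        let mut found = Vec::new();
        let mut queue = VecDeque::from([root]);
        while let Some(parent) = queue.pop_front() {
            for &(child, status) in graph.get(parent).into_iter().flatten() {
                if only.map_or(true, |wanted| wanted == status) {
                    found.push(child);
                    queue.push_back(child);
                }
            }
        }
        found
    }
    ```
    
    For `root -(open)-> child -(closed)-> grandchild`, the all-status
    traversal returns both descendants, while the union of an Open-only and
    a Closed-only traversal returns only `child`.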
  • feat(network-proxy): experimental local credential broker (#28034)
    ## Why
    
    Codex child processes can inherit injectable local credentials directly,
    which lets commands read and exfiltrate the real values. This
    experimental slice keeps supported workflows working while moving those
    credentials behind the managed network proxy.
    
    This PR contains only the proxy-owned broker implementation. The Codex
    config and runtime integration is stacked separately in #29752.
    
    ## What changed
    
    - discover supported credentials during child setup, retain real values
    only in the in-memory proxy broker, and replace them with shaped dummy
    values
    - require a presented dummy to select a stored credential and preserve
    unrelated explicit authorization headers
    - bind GitHub cloud, GitHub Enterprise, and OpenAI credentials to their
    intended hosts
    - inject credentials only into TLS traffic by default; plaintext
    injection requires the explicit dangerous opt-in
    - use TLS ClientHello routing for CONNECT so non-TLS protocols remain
    opaque tunnels
    - expose a pure API that identifies environment keys still holding
    broker-generated dummies without mutating the caller's environment
    
    ## Scope
    
    - supported credentials: `GH_TOKEN`, `GITHUB_TOKEN`,
    `GH_ENTERPRISE_TOKEN`, `GITHUB_ENTERPRISE_TOKEN`, and `OPENAI_API_KEY`
    - GitHub cloud credentials match `github.com`, `api.github.com`, and
    `*.ghe.com`
    - GitHub Enterprise credentials match only the normalized non-cloud
    `GH_HOST`
    - OpenAI API keys match only `api.openai.com`
    - this does not cover SSH agents, kube client certificates, filesystem
    secret discovery, or context-injected secret scrubbing
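    
    The host bindings listed above could be expressed as a single predicate.
    This is a sketch under stated assumptions, not the broker's code:
    `CredentialKind` and `host_allowed` are hypothetical names, matching is
    assumed to be ASCII case-insensitive, `*.ghe.com` is assumed to require
    a non-empty subdomain, and `gh_host` is assumed to be already normalized.
    
    ```rust
    #[derive(Clone, Copy)]
    enum CredentialKind<'a> {
        GitHubCloud,
        /// `gh_host` is the normalized, non-cloud `GH_HOST` value.
        GitHubEnterprise { gh_host: &'a str },
        OpenAi,
    }
    
    /// Returns true when a credential of `kind` may be injected for `host`.
    fn host_allowed(kind: CredentialKind<'_>, host: &str) -> bool {
        let host = host.to_ascii_lowercase();
        match kind {
            CredentialKind::GitHubCloud => {
                host == "github.com"
                    || host == "api.github.com"
                    // Suffix match on a label boundary: `evilghe.com` and the
                    // bare `ghe.com` are both rejected.
                    || host
                        .strip_suffix(".ghe.com")
                        .is_some_and(|subdomain| !subdomain.is_empty())
            }
            CredentialKind::GitHubEnterprise { gh_host } => {
                host == gh_host.to_ascii_lowercase()
            }
            CredentialKind::OpenAi => host == "api.openai.com",
        }
    }
    ```
    
    Matching on the full host, or on a suffix that starts with a dot, is
    what keeps look-alike hosts such as `github.com.evil.example` from
    selecting a stored credential in this sketch.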
    
    ## Validation
    
    - `just test -p codex-network-proxy` (191 passed)
    - focused opaque CONNECT, plaintext opt-in, dummy-selection, and
    child-isolation regressions passed
    - scoped Clippy check for `codex-network-proxy` passed
    
    ---------
    
    Co-authored-by: viyatb-oai <viyatb@openai.com>
    Co-authored-by: Codex <noreply@openai.com>
  • feat(app-server): list descendant threads by ancestor (#29591)
    ## Why
    
    `thread/list` can filter direct children with `parentThreadId`, but
    clients cannot request an entire spawned subtree. Discovering every
    descendant requires repeated client-side requests and gives up the
    database's existing filtering and pagination path.
    
    ## What changed
    
    Experimental clients can use `ancestorThreadId` to return strict
    descendants at any depth while `parentThreadId` retains its direct-child
    meaning. The filters are mutually exclusive, the ancestor is excluded,
    and every result preserves its immediate `parentThreadId` so callers can
    reconstruct the tree.
    
    ## How it works
    
    - **Explicit relationship:** Internal list parameters distinguish direct
    children from transitive descendants without changing the meaning of
    `parentThreadId`.
    - **Existing graph:** Persisted parent-child spawn edges remain the
    source of truth, so descendant lookup needs no schema migration or
    ancestry cache.
    - **Indexed traversal:** A recursive SQLite query starts from the
    parent-edge index, walks each generation, and applies thread filters,
    sorting, and cursor pagination in the same database request.
    - **Reconstructable results:** The response stays flat and normally
    ordered while carrying each descendant's immediate parent.
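    
    A client could rebuild the tree from the flat response along these
    lines. The sketch assumes each returned row exposes its own id and its
    immediate parent id; `ThreadSummary`, `children_by_parent`, and
    `preorder` are hypothetical names rather than the generated client types.
    
    ```rust
    use std::collections::BTreeMap;
    
    /// One row of a flat descendant listing (hypothetical shape).
    struct ThreadSummary {
        id: String,
        parent_thread_id: String,
    }
    
    /// Groups rows by immediate parent, keeping the response order of siblings.
    fn children_by_parent(rows: &[ThreadSummary]) -> BTreeMap<&str, Vec<&str>> {
        let mut children: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
        for row in rows {
            children
                .entry(row.parent_thread_id.as_str())
                .or_default()
                .push(row.id.as_str());
        }
        children
    }
    
    /// Depth-first (pre-order) listing of everything below `ancestor`.
    /// The ancestor itself is excluded, matching the filter's semantics.
    fn preorder<'a>(
        ancestor: &'a str,
        children: &BTreeMap<&'a str, Vec<&'a str>>,
    ) -> Vec<&'a str> {
        let mut out = Vec::new();
        let mut stack: Vec<&str> =
            children.get(ancestor).into_iter().flatten().rev().copied().collect();
        while let Some(id) = stack.pop() {
            out.push(id);
            stack.extend(children.get(id).into_iter().flatten().rev().copied());
        }
        out
    }
    ```
    
    When results are paginated, the grouping only becomes complete after
    the final page has been merged in.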
    
    ## Verification
    
    Ran 550 tests across the protocol, state, rollout, and thread-store
    crates, then reran the four focused state, store, and app-server
    descendant-listing tests after the final diff reduction. Scoped Clippy
    and formatting checks passed. Stable and experimental schema generation
    was checked; the stable fixtures remain unchanged while the experimental
    schema includes the new field.
  • Skip credential refresh for WindowsApps launch failures (#29637)
    ## Summary
    
    - keep the child error 1312 credential retry for normal executables
    - return WindowsApps/AppX launch errors directly instead of rotating
    sandbox credentials and retrying the same command
    
    ## Why
    
    Windows AppX activation can return `ERROR_NO_SUCH_LOGON_SESSION` (1312)
    even when the sandbox token is healthy. For executables under
    `WindowsApps`, refreshing the sandbox account password cannot fix that
    activation failure; it only triggers elevated setup before the same
    command fails again.
    
    This is a focused follow-up to #29624.
  • Follow directory symlinks in filesystem walks (#29844)
    Stack 3 of 3. Stacked on #29842.
    
    ## What changes
    
    Adds an opt-in `followDirectorySymlinks` setting to `fs/walk`.
    
    When enabled, the walk follows directory symlinks but continues to
    ignore symlinked files. Canonical directory identities prevent symlink
    cycles, while normal paths keep their existing spelling.
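    
    The cycle guard can be sketched against the local filesystem as below.
    This is an illustration only: `walk` is a hypothetical function, it
    omits the depth, entry, and size limits of `fs/walk`, and reporting a
    directory symlink whose target was already visited (without descending
    into it) is an assumption of the sketch.
    
    ```rust
    use std::collections::HashSet;
    use std::fs;
    use std::io;
    use std::path::{Path, PathBuf};
    
    /// Lists entries below `root`. Symlinked files are always skipped;
    /// directory symlinks are followed only when opted in. Canonical paths are
    /// used purely as visited keys, so reported paths keep their own spelling.
    fn walk(root: &Path, follow_directory_symlinks: bool) -> io::Result<Vec<PathBuf>> {
        let mut visited = HashSet::from([fs::canonicalize(root)?]);
        let mut entries = Vec::new();
        let mut pending = vec![root.to_path_buf()];
        while let Some(dir) = pending.pop() {
            for entry in fs::read_dir(&dir)? {
                let path = entry?.path();
                let is_symlink = fs::symlink_metadata(&path)?.file_type().is_symlink();
                // `fs::metadata` follows links; a broken link counts as a non-directory.
                let targets_dir = fs::metadata(&path).map(|m| m.is_dir()).unwrap_or(false);
                if is_symlink && !(follow_directory_symlinks && targets_dir) {
                    continue;
                }
                // Descend only the first time a canonical directory is seen.
                if targets_dir && visited.insert(fs::canonicalize(&path)?) {
                    pending.push(path.clone());
                }
                entries.push(path);
            }
        }
        entries.sort();
        Ok(entries)
    }
    ```
    
    A link such as `root/a/back -> root` is listed once and not descended
    into, because the canonical identity of `root` is already in the set.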
    
    Environment skill discovery enables the setting so symlinked skill
    directories continue to work with the new single-RPC scan.
  • [codex] Trace exec-server JSON-RPC requests (#27466)
    ## Why
    
    Exec-server JSON-RPC calls can cross local and remote transports, but
    trace context stopped at the RPC boundary. That made client and server
    work difficult to correlate when diagnosing latency or failures.
    
    ## What changed
    
    - Propagate the current W3C trace context on outbound JSON-RPC requests.
    - Parent inbound request spans from received trace context.
    - Record the received JSON-RPC method on server spans and keep each span
    open through response enqueue.
    - Add only the OTEL dependencies required by the exec-server crate.
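    
    For reference, the W3C `traceparent` value that such propagation
    carries has the form `00-<trace-id>-<parent-id>-<flags>`. The sketch
    below formats and parses version `00` only; `TraceParent` is a
    hypothetical type, and where the value is placed in the JSON-RPC
    request is not shown because the PR text does not specify it.
    
    ```rust
    /// A version-00 W3C `traceparent` value.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    struct TraceParent {
        trace_id: u128,
        parent_id: u64,
        flags: u8,
    }
    
    /// True for a string of exactly `len` lowercase hex digits.
    fn is_lower_hex(field: &str, len: usize) -> bool {
        field.len() == len && field.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
    }
    
    impl TraceParent {
        fn to_header(self) -> String {
            format!("00-{:032x}-{:016x}-{:02x}", self.trace_id, self.parent_id, self.flags)
        }
    
        /// Rejects other versions, malformed fields, and all-zero ids.
        fn parse(header: &str) -> Option<Self> {
            let mut parts = header.split('-');
            let version = parts.next()?;
            let trace = parts.next()?;
            let parent = parts.next()?;
            let flags = parts.next()?;
            let well_formed = version == "00"
                && parts.next().is_none()
                && is_lower_hex(trace, 32)
                && is_lower_hex(parent, 16)
                && is_lower_hex(flags, 2);
            if !well_formed {
                return None;
            }
            let trace_id = u128::from_str_radix(trace, 16).ok()?;
            let parent_id = u64::from_str_radix(parent, 16).ok()?;
            let flags = u8::from_str_radix(flags, 16).ok()?;
            (trace_id != 0 && parent_id != 0).then_some(Self { trace_id, parent_id, flags })
        }
    }
    ```
    
    On the receiving side, the parsed trace id and parent id are what allow
    a server span to be attached to the client span that issued the request.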
    
    ## Stack
    
    Review and land this stack in order:
    
    1. #27466 — trace exec-server JSON-RPC requests **(this PR)**
    2. #27467 — record bounded connection, request, and process lifecycle
    metrics
    3. #27470 — observe remote registration and Noise rendezvous lifecycle
    
    ## Validation
    
    - `just test -p codex-exec-server --lib` (153 passed)
    - `just bazel-lock-check`
    - `just fix -p codex-exec-server`
  • Preserve Windows sandbox identity during credential retry (#29624)
    ## Summary
    
    - recognize stale Windows sandbox credentials from both runner logon and
    child startup failures
    - refresh credentials once without changing the original command,
    permissions, file rules, desktop mode, or managed-network identity
    - add a Windows regression test that forces error 1312 and inspects the
    real retry arguments
    
    ## Why
    
    Elevated unified exec starts commands in two steps:
    
    ```text
    Codex -> sandbox command runner -> requested command
    ```
    
    Either process start can fail when Windows invalidates the sandbox logon
    session. The child-side failure was previously returned as text, so the
    parent could not reliably recognize Windows error 1312.
    
    The existing retry also refreshed credentials with `proxy_enforced =
    false`, even when the original request used managed networking. That
    could change the selected Windows sandbox identity from offline to
    online during the retry.
    
    ## How
    
    - carry the failure stage and numeric Windows error code through the
    command-runner IPC protocol
    - preserve native `CreateProcessAsUserW` error codes instead of parsing
    error messages
    - keep every retry-sensitive field in one request and use it for both
    attempts
    - retry exactly once after refreshing credentials, then return the
    second failure
    - share the retry rule with the elevated capture path
    
    The Windows test injects error 1312 on both attempts and verifies:
    
    - two spawn attempts and one credential refresh
    - stale credentials are replaced by refreshed credentials
    - both attempts receive the same command, environment, cwd, permissions,
    roots, deny paths, TTY settings, and private-desktop mode
    - credential refresh receives the original `proxy_enforced` value
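    
    The retry rule can be sketched in a platform-neutral way. The types and
    the `spawn` and `refresh` callbacks below are hypothetical stand-ins for
    the sandbox internals; only the numeric code 1312
    (`ERROR_NO_SUCH_LOGON_SESSION`) comes from Windows itself.
    
    ```rust
    const ERROR_NO_SUCH_LOGON_SESSION: u32 = 1312;
    
    /// Every retry-sensitive field lives in one request reused by both attempts.
    #[derive(Debug, Clone, PartialEq)]
    struct SpawnRequest {
        command: Vec<String>,
        proxy_enforced: bool,
    }
    
    #[derive(Debug, PartialEq)]
    struct SpawnError {
        code: u32,
    }
    
    /// Tries once, refreshes credentials at most once on error 1312, retries
    /// with the unchanged request, and returns the second result as-is.
    fn spawn_with_one_refresh<C, T>(
        request: &SpawnRequest,
        credentials: C,
        mut spawn: impl FnMut(&SpawnRequest, &C) -> Result<T, SpawnError>,
        refresh: impl FnOnce(bool) -> C,
    ) -> Result<T, SpawnError> {
        match spawn(request, &credentials) {
            Err(err) if err.code == ERROR_NO_SUCH_LOGON_SESSION => {
                // The refresh sees the original `proxy_enforced` value, so the
                // selected sandbox identity cannot change between attempts.
                let refreshed = refresh(request.proxy_enforced);
                spawn(request, &refreshed)
            }
            other => other,
        }
    }
    ```
    
    Taking `refresh` as `FnOnce` makes the single-refresh rule a
    compile-time property of the sketch rather than a runtime counter.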
    
    ## Tests
    
    - `just test -p codex-windows-sandbox`
    - the new Windows-only regression test is included in the Windows
    nextest CI archive
  • [codex] suppress low usage remaining warnings when credits are available (#28593)
    ## Why
    
    The TUI computed proactive `Heads up, you have less than ...` warnings
    before considering workspace credits. As a result, users could see
    included-limit warnings even when they could continue using Codex with
    workspace credits.
    
    `has_credits` alone is not sufficient to determine whether finite
    credits are usable: a spend-control hard limit can cap the reported
    balance to zero while `has_credits` still reflects the workspace's raw
    balance. Unlimited credits are the opposite case: they are usable even
    though no numeric balance is reported.
    
    ## What changed
    
    - suppress proactive TUI rate-limit usage warnings and the lower-cost
    model nudge when usable workspace credits are available
    - treat credits as usable when `has_credits` is true and either
    `unlimited` is true or the parsed balance is positive
    - continue showing warnings when the usable balance is zero, including
    when a spend-control limit has capped otherwise available workspace
    credits
    - add regression coverage for zero-balance, positive-balance, and
    unlimited workspace-credit snapshots
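    
    The usability rule reduces to one predicate. The sketch below assumes
    the balance arrives as an optional decimal string; `CreditsSnapshot` and
    `has_usable_credits` are hypothetical names, not the TUI's types.
    
    ```rust
    /// Workspace-credit fields relevant to the warning decision (assumed shape).
    struct CreditsSnapshot {
        has_credits: bool,
        unlimited: bool,
        balance: Option<String>,
    }
    
    /// Credits are usable when `has_credits` is set and the workspace is either
    /// unlimited or reports a balance that parses to a positive number.
    fn has_usable_credits(credits: &CreditsSnapshot) -> bool {
        let positive_balance = credits
            .balance
            .as_deref()
            .and_then(|raw| raw.trim().parse::<f64>().ok())
            .is_some_and(|value| value > 0.0);
        credits.has_credits && (credits.unlimited || positive_balance)
    }
    ```
    
    A snapshot with `has_credits = true` and a balance of `"0"` is therefore
    not usable, which covers the spend-control case where the reported
    balance is capped to zero.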
    
    ## Validation
    
    - `just test -p codex-tui rate_limit_usage_warnings_`
  • [codex] fix Windows ConPTY input handling (#29734)
    ## Why
    
    Windows unified-exec TTY input did not behave like the non-Windows PTY
    path. ConPTY sessions could receive the wrong line ending or mishandle
    backspace, especially when sending input to a foreground program through
    PowerShell or cmd. The local, legacy restricted, and elevated paths also
    handled this normalization separately.
    
    ## What changed
    
    - share one stateful Windows TTY input normalizer across local, legacy
    restricted, and elevated runner paths
    - translate LF and split CRLF into one Windows terminal Enter, encode
    backspace as DEL, and preserve UTF-8 and control bytes such as Ctrl-C
    - add Windows integration coverage for Unicode input, backspace, Enter,
    and PowerShell foreground-child Ctrl-C behavior
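    
    A stateful normalizer of this kind might look like the sketch below. It
    assumes that Enter is sent as a single CR (`0x0D`) and DEL is `0x7F`;
    `TtyInputNormalizer` is a hypothetical name, and the shared normalizer
    may handle further cases.
    
    ```rust
    /// Normalizes TTY input byte by byte, remembering across calls whether the
    /// previous byte was CR so that a CRLF split over two writes is one Enter.
    #[derive(Default)]
    struct TtyInputNormalizer {
        last_was_cr: bool,
    }
    
    impl TtyInputNormalizer {
        fn normalize(&mut self, input: &[u8]) -> Vec<u8> {
            let mut out = Vec::with_capacity(input.len());
            for &byte in input {
                let after_cr = std::mem::replace(&mut self.last_was_cr, byte == b'\r');
                match byte {
                    // LF directly after CR: the CR already produced the Enter.
                    b'\n' if after_cr => {}
                    b'\n' | b'\r' => out.push(b'\r'),
                    // Backspace is encoded as DEL.
                    0x08 => out.push(0x7f),
                    // UTF-8 bytes and control bytes such as Ctrl-C pass through.
                    other => out.push(other),
                }
            }
            out
        }
    }
    ```
    
    Working on bytes rather than characters means multi-byte UTF-8
    sequences are never split or re-encoded by the normalizer.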
    
    ## Validation
    
    - `just test -p codex-utils-pty` (13 tests passed; the Unicode
    integration test retried once)
    - the Unicode integration test passed five consecutive runs with retries
    disabled
    - integration coverage sends `cafeé 漢字` through cmd and PowerShell and
    verifies that Ctrl-C interrupts a running PowerShell foreground child
  • Fix environment skill discovery after merge (#29887)
    ## Why
    
    The merge of #29831 with the new `fs/walk` environment discovery path
    left three `SkillFileDiscovery` initializers without the new namespace
    fields. This makes `codex-core-skills` fail to compile and breaks CI for
    every PR based on current `main`.
    
    ## What changed
    
    - collect plugin roots from the directory entries already returned by
    `fs/walk`
    - keep the selected root as the namespace fallback
    - initialize empty discovery results with empty namespace sets
    
    This preserves the bounded `fs/walk` implementation while restoring the
    namespace caching added by #29831.
  • ci: fail jobs that dirty the worktree (#29720)
    ## Why
    
    CI jobs should not silently leave tracked changes or untracked files in
    the repository worktree.
    
    ## What
    
    - Add a shared final worktree-cleanliness action to 19 checkout-bearing
    PR and main CI jobs.
    - Ignore the intentional SDK scratch directory and nested V8 checkout.
    - Pin Bazelisk in shared CI setup so `.bazelversion` remains
    authoritative, avoiding `MODULE.bazel.lock` deltas on Windows runners.
    - Leave `rust-ci-full` and release-only workflows unchanged.
    - Update `AGENTS.md` to discourage review bots from asking for
    `MODULE.bazel.lock` changes.
  • Cache plugin namespace during executor skill discovery (#29831)
    ## Why
    
    Executor skill discovery runs before the remote skills catalog is
    available. For a remote environment, each `ExecutorFileSystem` operation
    becomes an exec-server RPC.
    
    Previously, every discovered `SKILL.md` independently resolved its
    plugin namespace by walking its ancestors and probing both supported
    manifest locations. In the common `plugin/skills/<skill>/SKILL.md`
    layout, that repeats 8 RPCs per skill even though every skill under the
    plugin root uses the same namespace. These lookups happen while skills
    are parsed, so their cost grows linearly with the skill count and adds
    directly to first-turn latency.
    
    A selected capability root can also contain standalone skills, multiple
    sibling plugins, nested plugins, or symlinked directories. The
    optimization therefore needs to retain the nearest-ancestor namespace
    for each skill rather than assuming the selected root represents exactly
    one plugin.
    
    ## What changed
    
    - record plugin-root candidates from directory entries already returned
    during skill discovery
    - prune candidates that are not ancestors of any discovered `SKILL.md`
    before reading manifests
    - resolve each relevant plugin root once, with one fallback lookup per
    canonical traversal root for symlinked directories
    - select the nearest cached plugin namespace for each discovered skill
    - avoid namespace lookup entirely when the root contains no skills
    
    No additional directory traversal is required. Namespace work now scales
    with the number of plugin roots that contain discovered skills, rather
    than the total number of skills or unrelated sibling plugins. Standalone
    and nested-plugin names keep their previous behavior.
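    
    Selecting the nearest cached namespace can be sketched with a lexical
    ancestor walk. The map below stands in for the cache of resolved plugin
    roots; `nearest_namespace` is a hypothetical name, and the symlink
    fallback mentioned above is not modeled.
    
    ```rust
    use std::collections::HashMap;
    use std::path::{Path, PathBuf};
    
    /// Returns the namespace of the closest ancestor directory of `skill` that
    /// is a cached plugin root, or `None` for a standalone skill.
    fn nearest_namespace<'a>(
        skill: &Path,
        namespaces: &'a HashMap<PathBuf, String>,
    ) -> Option<&'a str> {
        skill
            .ancestors()
            .skip(1) // the first "ancestor" is the SKILL.md path itself
            .find_map(|dir| namespaces.get(dir).map(String::as_str))
    }
    ```
    
    Each lookup is an in-memory map probe, so resolving a namespace for a
    skill issues no filesystem request once the plugin roots are cached.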
    
    ## Benchmarks
    
    I used a temporary counting `ExecutorFileSystem` around the real local
    filesystem. Each filesystem operation was counted as one remote RPC and
    given 1 ms of injected latency. Each variant ran three times; times
    below are medians.
    
    ### One plugin with 100 skills
    
    | Operation | Before | After | Delta |
    | --- | ---: | ---: | ---: |
    | `get_metadata` | 1,002 | 303 | -699 |
    | `read_file` | 200 | 101 | -99 |
    | `read_directory` | 102 | 102 | 0 |
    | **Total filesystem RPCs** | **1,304** | **506** | **-798 (-61.2%)** |
    | **Median load time** | **2.890 s** | **0.997 s** | **2.90× faster** |
    
    The namespace-specific work drops from 800 RPCs to 2 in this layout.
    
    ### Multiple plugins under one selected root
    
    These runs compare the correct pre-optimization implementation with the
    final nearest-plugin-root cache. The total plugin skill count stays at
    100 while the number of plugin roots changes.
    
    | Layout | Before RPCs | After RPCs | Reduction | Before | After | Speedup |
    | --- | ---: | ---: | ---: | ---: | ---: | ---: |
    | 2 plugins × 50 skills | 1,312 | 530 | 59.6% | 1,819 ms | 711 ms | 2.56× |
    | 10 plugins × 10 skills | 1,344 | 578 | 57.0% | 1,850 ms | 778 ms | 2.38× |
    | 50 plugins × 2 skills | 1,504 | 818 | 45.6% | 2,094 ms | 1,086 ms | 1.93× |
    | 10 plugins × 10 skills + 10 standalone skills | 1,596 | 630 | 60.5% | 2,209 ms | 860 ms | 2.57× |
    
    The remaining cost grows with the number of relevant plugin manifests.
    Each relevant manifest is read once instead of once per skill, while
    sibling plugins with no discovered skills are not read. Absolute latency
    savings depend on the executor's real RPC latency.
    
    ## Tests
    
    - `just test -p codex-core-skills` (109 passed across the library and
    integration-test binaries)
    - one integration test covers standalone, outer-plugin, nested-plugin,
    and unused sibling-plugin layouts, and asserts the exact set of
    manifests read
  • [codex] show external import result counts (#29567)
    ## What changed
    
    - Show per-type import counts in the `/import` review UI and started
    message.
    - Render completion results as a multi-line summary with total
    imported/failed counts and one row per import type.
    - Add snapshot coverage for the updated review and completion output.
    
    <img width="537" height="322" alt="Screenshot 2026-06-23 at 9 41 20 PM"
    src="https://github.com/user-attachments/assets/166542eb-2097-4b2b-8130-8f6fd8c680ce"
    />
    
    
    ## Why
    
    The TUI previously only reported that Claude Code import started or
    finished. Users could not see how many items of each type were selected
    or how many actually imported versus failed.
  • Use fs/walk for environment skill discovery (#29842)
    Stack 2 of 3. Base: #29841. Follow-up: #29844.
    
    ## What changes
    
    Environment skill discovery currently walks remote filesystems through
    repeated `readDirectory` and `getMetadata` calls. This switches that
    scan to the bounded `fs/walk` operation from the base PR.
    
    ```text
    Before: readDirectory(root) -> getMetadata(...) -> readDirectory(child) -> ...
    After:  fs/walk(root, limits) -> filter the result for SKILL.md
    ```
    
    This makes environment skill discovery one RPC while preserving
    traversal warnings and the existing depth and directory limits. The scan
    also has an explicit entry limit. The follow-up restores
    directory-symlink traversal.
  • Add a bounded filesystem walk RPC (#29841)
    Stack 1 of 3. Follow-ups: #29842 and #29844.
    
    ## What changes
    
    Adds a general bounded `fs/walk` operation to the exec server.
    
    The operation returns file and directory entries plus recoverable
    per-path errors. It skips symlinks, preserves the existing filesystem
    sandbox routing, and enforces depth, directory, entry, and response-size
    limits.
    
    This PR only defines and wires the filesystem operation. It does not
    change any callers yet.
  • Persist agent messages as response items (#29829)
    ## Why
    
    Inter-agent messages are recorded in live history as
    `ResponseItem::AgentMessage`, but rollouts stored
    `InterAgentCommunication` and rebuilt the response item during resume.
    This made the rollout differ from the actual Responses history.
    
    ## What changed
    
    - store the prepared `agent_message` response item directly
    - keep `trigger_turn` in a small local metadata record for fork
    truncation
    - keep reading older `inter_agent_communication` rollout items
  • [codex] Emit implicit skill usage for support reads (#29731)
    ## Summary
    - Index all enabled skills for command-based usage detection, regardless
    of `allow_implicit_invocation`.
    - Preserve `allow_implicit_invocation` for the model-visible implicit
    routing list.
    - Add regression coverage for a support/preflight skill whose `SKILL.md`
    is read and whose script is run while implicit invocation is disabled.
    
    ## Root cause
    `allow_implicit_invocation` was used for both model routing and
    command-based usage-event detection. That meant support skills like
    `data-analytics:user-context` could be read or run by other skills, but
    those accesses could not emit implicit usage events.
    
    ## Validation
    - `just fmt`
    - `just test -p codex-core-skills
    service::tests::skills_for_config_indexes_usage_detection_for_non_implicit_skills`
    - `just test -p codex-core-skills` now has the new test passing, but 3
    unrelated local tests fail because
    `/Users/alexsong/.agents/skills/test/SKILL.md` is invalid/missing YAML
    frontmatter.
  • Keep executor plugin MCP paths URI-native (#29628)
    ## Why
    
    Executor-owned plugin roots are `PathUri`, but MCP config normalization
    still converts them into a native `Path` using the app-server host's
    rules. Relative `cwd` values can therefore resolve against the wrong
    filesystem when host and executor path conventions differ.
    
    This PR keeps executor MCP paths URI-native until the selected
    environment launches the server, while retaining the existing host
    parser behavior.
    
    ## What changed
    
    - Keep one shared MCP normalization path with narrow host-`Path` and
    executor-`PathUri` entrypoints.
    - Preserve native host resolution for locally installed plugin MCP
    configs.
    - For executor configs, default `cwd` to the plugin root and resolve
    relative working directories with the root URI's path convention.
    - Accept explicit executor `file:` URIs only when they remain within the
    selected plugin root.
    - Preserve the selected environment id and existing remote
    environment-variable ownership rules.
    - Route the executor plugin provider through the URI-native entrypoint
    without converting the root on the host.
    - Ensure `codex doctor` does not probe executor-owned stdio commands or
    foreign working directories on the host.
    - Cover foreign Windows roots, relative and absolute executor working
    directories, traversal rejection, runtime resolution, and doctor
    behavior.
    
    ```text
    plugin root:    file:///C:/plugins/demo
    configured cwd: scripts
                      |
                      v
    resolved cwd:  file:///C:/plugins/demo/scripts
                      |
                      v
    launch through the selected executor
    ```
    
    No new provider or filesystem abstraction is introduced.
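    
    The resolution shown in the diagram can be sketched as plain string
    handling on the URI. This is not the `PathUri` API: the function name
    is hypothetical, only `/`-separated relative values and explicit `file:`
    URIs are handled, percent-encoding is ignored, and native absolute paths
    are rejected because they are out of scope for the sketch.
    
    ```rust
    /// Resolves an executor `cwd` against the plugin root URI without touching
    /// host paths. Returns `None` when the result would leave the root.
    fn resolve_executor_cwd(root_uri: &str, cwd: Option<&str>) -> Option<String> {
        let root = root_uri.trim_end_matches('/');
        // A missing `cwd` defaults to the plugin root.
        let Some(cwd) = cwd else {
            return Some(root.to_string());
        };
        let relative = if cwd.starts_with("file:") {
            // An explicit URI must be the root or sit below it on a segment
            // boundary, so `.../demo-evil` does not count as inside `.../demo`.
            let rest = cwd.strip_prefix(root)?;
            if !(rest.is_empty() || rest.starts_with('/')) {
                return None;
            }
            rest
        } else if cwd.starts_with('/') {
            return None; // native absolute paths: not handled in this sketch
        } else {
            cwd
        };
        let mut segments: Vec<&str> = Vec::new();
        for segment in relative.split('/') {
            match segment {
                "" | "." => {}
                // Popping past the root means the path escapes it.
                ".." => {
                    segments.pop()?;
                }
                name => segments.push(name),
            }
        }
        let mut resolved = root.to_string();
        for segment in segments {
            resolved.push('/');
            resolved.push_str(segment);
        }
        Some(resolved)
    }
    ```
    
    With the root `file:///C:/plugins/demo` and the configured `cwd`
    `scripts`, the sketch yields `file:///C:/plugins/demo/scripts`, and
    `../other` is rejected.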
    
    ## Stack
    
    1. #29614 — add lexical `PathUri` containment.
    2. #29620 — share URI-native manifest path resolution.
    3. #28918 — keep selected plugin roots and resources URI-native.
    4. #29626 — load executor skills without host path conversion.
    5. **This PR** — resolve executor MCP working directories without host
    path conversion.
  • [codex] Remove auto-compaction opt-out (#29815)
    ## Summary
    
    - remove the default-on `auto_compaction` feature flag and generated
    config schema entries
    - restore unconditional pre-turn, model-switch/hash, and mid-turn
    automatic compaction
    - expose `new_context` whenever token-budget tooling is enabled
    - remove the disabled-auto-compaction integration coverage introduced by
    #28260
    
    ## Motivation
    
    Roll back the internal auto-compaction escape hatch added in #28260.
    Automatic compaction should no longer be suppressible with `--disable
    auto_compaction`; existing manual `/compact` behavior remains unchanged.
    
    ## Testing
    
    - `just write-config-schema`
    - `just test -p codex-features` — 53 passed
    - `just test -p codex-core 'suite::compact::'` — 36 passed
    - `just test -p codex-core
    suite::token_budget::new_context_tool_starts_new_window_before_follow_up`
    — 1 passed
    - `just fix -p codex-core -p codex-features`
    - `just fmt`
    - `just test -p codex-core` — 2,778 passed, 59 failed, 16 skipped;
    failures were outside the changed compaction paths and were dominated by
    missing first-party test binaries and shell-snapshot timeouts
  • docs: document remote executor integration testing (#29790)
    ## Why
    
    Agents need a clear default for writing remote-compatible integration
    tests and reproducible commands for each supported runner.
    
    ## What
    
    Expand the `remote-tests` skill with fixture guidance, skip selection,
    and Docker and Wine commands. Add always-visible `AGENTS.md` guidance
    that points new core and app-server tests toward automatic environment
    fixtures.
    
    Stacked on #29789.
  • test: use automatic environments in app-server integration tests (#29789)
    ## Why
    
    Topology-neutral app-server integration tests should exercise automatic
    environment selection so the same setup covers local and remote
    executors.
    
    ## What
    
    Migrate eligible tests to `TestAppServer::new_with_auto_env()` and
    `send_thread_start_request_with_auto_env()`. Leave explicit-topology
    tests unchanged, and skip the request-permissions case on Windows with a
    TODO for cross-platform tool routing.
    
    ## Validation
    
    - `just test -p codex-app-server`
    - `bazel test //codex-rs/app-server:app-server-all-wine-exec-test
    --test_output=errors`
    
    Stacked on #29788.
  • test: run app-server integration tests under Wine (#29788)
    ## Why
    
    I made a mistake when carving #29746 out of my local changes, and the
    test was missing from the build graph. Oops!
    
    ## What
    
    Enable the app-server Wine exec test target. Remove the `manual` tag
    from generated Wine-exec test variants so wildcard Bazel test
    invocations select them. Refactor the smoke test to ensure it passes
    with current Windows support.
  • connectors: own app metadata types (#29723)
    ## Why
    
    Connector metadata is consumed by connector discovery, ChatGPT
    integration, core, and TUI code. Treating app-server's wire DTO as the
    shared domain model reverses the intended dependency direction.
    
    ## What changed
    
    - Added connector-owned app branding, review, screenshot, metadata, and
    info types.
    - Added explicit conversions in app-server and TUI while preserving
    app-server's wire payloads.
    - Removed production app-server-protocol dependencies from connectors
    and ChatGPT connector code.
    
    ## Stack
    
    This is PR 4 of 6, stacked on [PR
    #29722](https://github.com/openai/codex/pull/29722). Review only the
    delta from `codex/split-config-layer-types`. Next: [PR
    #29724](https://github.com/openai/codex/pull/29724).
    
    ## Validation
    
    - Connector and tools coverage passed.
    - App-server app-list coverage passed: 13 tests.
  • config: own layer provenance types (#29722)
    ## Why
    
    Config layer provenance describes how effective configuration was
    assembled, so it belongs with the config loader rather than in
    app-server's serialized API types.
    
    ## What changed
    
    - Moved `ConfigLayerSource`, `ConfigLayerMetadata`, and `ConfigLayer`
    ownership into `codex-config`.
    - Kept app-server's wire payloads unchanged and added explicit
    conversions at the app boundary.
    - Removed lower-level app-server-protocol dependencies from config
    consumers.
    
    ## Stack
    
    This is PR 3 of 6, stacked on [PR
    #29721](https://github.com/openai/codex/pull/29721). Review only the
    delta from `codex/split-auth-domain-types`. Next: [PR
    #29723](https://github.com/openai/codex/pull/29723).
    
    ## Validation
    
    - `codex-config` coverage passed.
    - App-server config-manager and config RPC coverage passed.