198 Commits

  • [codex] disable Nagle on Rendezvous WebSockets (#30269)
    ## Summary
    
    Disable Nagle unconditionally for both exec-server Rendezvous WebSocket
    connections.
    
    - pass `disable_nagle=true` at the executor and harness connection call
    sites
    - keep the existing signed URL, protocol, and connection flow unchanged
    - add no feature flag, rollout schema, path variant, or
    experiment-specific telemetry
    
    The companion internal PR enables `TCP_NODELAY` on accepted Rendezvous
    sockets: https://github.com/openai/openai/pull/1082463
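    
    For reference, disabling Nagle on a TCP stream amounts to setting
    `TCP_NODELAY` on the socket. The sketch below uses only the Rust
    standard library and a plain `TcpStream`. The helper name
    `connect_without_nagle` is hypothetical and is not taken from this PR,
    whose call sites pass `disable_nagle=true` through the WebSocket
    connection code instead.
    
    ```rust
    use std::io;
    use std::net::{TcpStream, ToSocketAddrs};
    
    /// Hypothetical helper: open a TCP connection and set TCP_NODELAY so
    /// small frames are sent immediately instead of being coalesced by
    /// Nagle's algorithm.
    pub fn connect_without_nagle<A: ToSocketAddrs>(addr: A) -> io::Result<TcpStream> {
        let stream = TcpStream::connect(addr)?;
        stream.set_nodelay(true)?;
        Ok(stream)
    }
    ```
    
    Calling `stream.nodelay()` afterwards reads the option back, which is a
    cheap way to confirm the setting in a test.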
    
    ## Why
    
    Rendezvous carries small, latency-sensitive relay and JSON-RPC frames.
    Three staging runs of 30 steady-state `process/read` calls per
    configuration measured p50 improving from 139.1 ms to 81.5 ms and p95
    from 162.0 ms to 95.8 ms with Nagle disabled.
    
    The expected packet overhead is small at the current connection scale.
    We will use existing latency, error, packet, and CPU monitoring and
    revert normally if production regresses.
    
    ## Rollout and rollback
    
    The client and accepted-socket changes can deploy independently. New
    connections receive the setting as each side deploys. Rollback is a
    normal code revert; there is no persisted assignment or gate state to
    unwind.
    
    ## Validation
    
    - `just test -p codex-exec-server --lib`: 164 passed
    - `just fix -p codex-exec-server`: passed
    - `just fmt`: passed
    - independent final review found no actionable issue
  • [app-server] expose environment info RPC (#30291)
    ## Why
    
    App-server clients that configure named execution environments need to
    discover an environment's shell and working directory before selecting
    it for a thread or turn. Because the environment can run on a different
    operating system than app-server, its working directory is represented
    as a canonical `file:` URI rather than a host-local path string. The
    probe also needs a bounded response time: an exec-server that completes
    initialization but never answers `environment/info` must not hold the
    environment serialization queue indefinitely.
    
    ## What changed
    
    - Add an experimental `environment/info` app-server RPC for named
    environments.
    - Route the probe through the managed environment connection and return
    target-native shell metadata plus the default working directory as a
    `PathUri`.
    - Return connection and protocol failures as JSON-RPC errors.
    - Bound the exec-server probe response to 30 seconds and remove
    timed-out calls from the pending-request table so later environment
    mutations can proceed.
    - Cover successful responses, omitted working directories, unknown
    environments, connection failures, and pending-call cleanup.
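    
    The timeout-and-cleanup behavior described above can be sketched with
    standard-library channels. This is an illustration under stated
    assumptions, not the exec-server implementation: `PendingRequests`, its
    methods, and the `String` payload are hypothetical, and the real code
    is presumably asynchronous.
    
    ```rust
    use std::collections::HashMap;
    use std::sync::mpsc::{self, Receiver, RecvTimeoutError, Sender};
    use std::sync::Mutex;
    use std::time::Duration;
    
    /// Hypothetical pending-request table keyed by JSON-RPC request id.
    #[derive(Default)]
    pub struct PendingRequests {
        waiters: Mutex<HashMap<u64, Sender<String>>>,
    }
    
    impl PendingRequests {
        /// Register a call and return the receiver its response arrives on.
        pub fn register(&self, id: u64) -> Receiver<String> {
            let (tx, rx) = mpsc::channel();
            self.waiters.lock().unwrap().insert(id, tx);
            rx
        }
    
        /// Deliver a response; returns false if the call is no longer pending.
        pub fn complete(&self, id: u64, response: String) -> bool {
            match self.waiters.lock().unwrap().remove(&id) {
                Some(tx) => tx.send(response).is_ok(),
                None => false,
            }
        }
    
        /// Wait for the response. On timeout or disconnect the entry is
        /// removed so it cannot block later work on the same table.
        pub fn wait(&self, id: u64, rx: Receiver<String>, timeout: Duration) -> Result<String, String> {
            match rx.recv_timeout(timeout) {
                Ok(response) => Ok(response),
                Err(err) => {
                    self.waiters.lock().unwrap().remove(&id);
                    Err(match err {
                        RecvTimeoutError::Timeout => {
                            format!("timed out waiting for response after {:?}", timeout)
                        }
                        RecvTimeoutError::Disconnected => "connection closed".to_string(),
                    })
                }
            }
        }
    
        /// Number of calls still waiting for a response.
        pub fn len(&self) -> usize {
            self.waiters.lock().unwrap().len()
        }
    }
    ```
    
    With a 30-second `timeout`, a probe that is never answered returns an
    error and leaves `len()` at zero, which is the property the
    pending-call cleanup test checks.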
    
    ## Protocol examples
    
    Request:
    
    ```json
    {
      "id": 42,
      "method": "environment/info",
      "params": {
        "environmentId": "remote-a"
      }
    }
    ```
    
    Successful response:
    
    ```json
    {
      "id": 42,
      "result": {
        "shell": {
          "name": "zsh",
          "path": "/bin/zsh"
        },
        "cwd": "file:///workspace"
      }
    }
    ```
    
    If the exec-server initializes but does not answer the probe within 30
    seconds:
    
    ```json
    {
      "id": 42,
      "error": {
        "code": -32603,
        "message": "failed to get info for environment `remote-a`: exec-server protocol error: timed out waiting for exec-server `environment/info` response after 30s"
      }
    }
    ```
    
    ## Testing
    
    - App-server integration coverage for successful info (including omitted
    `cwd`), unknown environments, and connection failures.
    - Exec-server RPC coverage verifying a timed-out call is removed from
    the pending-request table.
    
    ---------
    
    Co-authored-by: Michael Bolin <mbolin@openai.com>
  • [codex] consume pushed exec-server process events (#30273)
    ## Summary
    
    - complete unified-exec processes from the ordered event stream instead
    of issuing a final zero-wait `process/read`
    - add optional executor sandbox-denial state to `process/exited`
    - retain `process/read` as a retained-output and compatibility fallback
    for receiver lag, sequence gaps, and legacy servers
    - recover sandbox-denial state across transport reconnection
    - cover the real `TestCodex` remote-exec path without adding a public
    test-only event constructor
    
    ## Why
    
    A successful one-shot tool call currently receives its output and
    terminal notifications, then pays another wide-area `process/read` round
    trip before returning. Staging traces showed that remote response wait
    accounted for more than 99.8% of RPC time; local serialization,
    queueing, and deserialization were below 0.6 ms.
    
    ## Measured impact
    
    A direct staging A/B used the same build and route and changed only
    completion mode. Each arm ran three times with 30 one-shot
    `/usr/bin/true` calls per run. The table reports the median of the three
    per-run percentiles.
    
    | Metric | Final `process/read` | Pushed events | Change |
    | --- | ---: | ---: | ---: |
    | End-to-end completion p50 | 159.5 ms | 118.7 ms | -40.8 ms (-25.6%) |
    | End-to-end completion p95 | 182.4 ms | 131.7 ms | -50.6 ms (-27.8%) |
    | Completion-wait p50 | 80.1 ms | 41.5 ms | -38.5 ms (-48.1%) |
    | Final `process/read` RPC p50 | 79.9 ms | eliminated | -79.9 ms |
    
    TCP_NODELAY was enabled in both A/B arms, so its effect cancels out. The
    successful, complete, in-order event path issued zero final
    `process/read` calls.
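    
    The aggregation behind the table (per-run percentiles, then the median
    across three runs) can be sketched as follows. The nearest-rank
    percentile definition is an assumption, since the PR does not say which
    definition was used, and all function names are hypothetical.
    
    ```rust
    /// Nearest-rank percentile of a sample; `p` must be in (0, 100].
    /// Assumed method: the PR does not state its percentile definition.
    pub fn percentile(samples: &[f64], p: f64) -> Option<f64> {
        if samples.is_empty() || !(p > 0.0 && p <= 100.0) {
            return None;
        }
        let mut sorted = samples.to_vec();
        sorted.sort_by(|a, b| a.total_cmp(b));
        let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
        Some(sorted[rank.max(1) - 1])
    }
    
    /// Median of exactly three per-run values.
    pub fn median_of_three(mut runs: [f64; 3]) -> f64 {
        runs.sort_by(|a, b| a.total_cmp(b));
        runs[1]
    }
    
    /// Relative change from `before` to `after`, in percent.
    pub fn relative_change_percent(before: f64, after: f64) -> f64 {
        (after - before) / before * 100.0
    }
    ```
    
    For example, `relative_change_percent(159.5, 118.7)` is about -25.6,
    matching the end-to-end p50 row.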
    
    ## Compatibility and recovery
    
    - new servers send `sandboxDenied` on `process/exited`
    - legacy servers omit it, which triggers one compatibility
    `process/read`
    - broadcast lag or a sequence gap triggers a retained-output read
    - recovery remains bounded by the server's existing 1 MiB
    retained-output window
    - complete, in-order event streams issue no completion read
    - sandbox denial is attached to the exit event before consumers can
    observe process completion
    - server-first and client-first rollouts remain wire-compatible;
    server-first realizes the latency win immediately
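    
    The rules above reduce to a small decision function. The sketch below
    is illustrative only: the type and field names are hypothetical, and it
    assumes event sequence numbers increase by exactly one, which the PR
    does not spell out.
    
    ```rust
    /// What the client does once it has seen the exit event.
    #[derive(Debug, PartialEq, Eq)]
    pub enum CompletionAction {
        /// Complete, in-order stream from a new server: no `process/read`.
        NoRead,
        /// Legacy server omitted `sandboxDenied`: one compatibility read.
        CompatibilityRead,
        /// Broadcast lag or a sequence gap: recover from retained output.
        RetainedOutputRead,
    }
    
    /// Hypothetical summary of what the event receiver observed.
    pub struct ObservedStream {
        /// Sequence numbers of received events, in arrival order.
        pub seqs: Vec<u64>,
        /// True if the receiver reported that it lagged and dropped events.
        pub lagged: bool,
        /// `sandboxDenied` from `process/exited`; `None` for legacy servers.
        pub sandbox_denied: Option<bool>,
    }
    
    /// True if consecutive events are not numbered consecutively
    /// (assumes sequence numbers step by one).
    fn has_sequence_gap(seqs: &[u64]) -> bool {
        seqs.windows(2).any(|pair| pair[1] != pair[0] + 1)
    }
    
    pub fn completion_action(stream: &ObservedStream) -> CompletionAction {
        if stream.lagged || has_sequence_gap(&stream.seqs) {
            CompletionAction::RetainedOutputRead
        } else if stream.sandbox_denied.is_none() {
            CompletionAction::CompatibilityRead
        } else {
            CompletionAction::NoRead
        }
    }
    ```
    
    Checking lag and gaps first means a legacy server with a damaged stream
    still issues a single recovery read rather than two.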
    
    ## Integration coverage
    
    The `TestCodex` suite exercises four distinct remote-exec contracts:
    
    - complete pushed output/exit/close with zero reads
    - direct pushed sandbox denial with zero reads
    - legacy missing denial metadata with exactly one compatibility read
    - count-bounded replay eviction recovered from retained output without
    duplication
    
    ## Validation
    
    - `just test -p codex-core
    exec_command_consumes_pushed_remote_process_events`: 4 passed
    - `just test -p codex-core unified_exec::process_tests::`: 4 passed
    - `just test -p codex-exec-server`: 294 passed, 2 skipped
    - `just test -p codex-exec-server-protocol`: 5 passed
    - `just test -p codex-rmcp-client`: 89 passed, 2 skipped
    - focused Bazel `//codex-rs/core:core-all-test`: passed across 16 shards
    - scoped `just fix` passed for core and exec-server
    - `just fmt` passed
    
    The complete workspace suite was not rerun; focused Cargo and Bazel
    coverage passed for the changed behavior.
  • Persist Cloudflare affinity cookies for MCP HTTP (#29516)
    [Codex Thread
    019ef1f9-36e2-7e91-9337-504f097b9dc1](https://codex-thread-link.openai.chatgpt-team.site/thread/019ef1f9-36e2-7e91-9337-504f097b9dc1)
    
    ## Why
    
    Hosted plugin-service Streamable HTTP MCP traffic uses
    `https://chatgpt.com/backend-api/ps/mcp` and depends on Cloudflare's
    `__cflb` cookie for load-balancer affinity. The local and exec-server
    `http/request` path built a fresh reqwest client for each request
    without installing Codex's existing shared ChatGPT Cloudflare cookie
    store, so affinity could be lost between calls.
    
    This is an affinity-hardening change motivated by an incident
    investigation. It does not establish the broader connector-cache
    incident RCA or claim to fix that incident in full.
    
    ## What changed
    
    - Install the existing process-local, strictly allowlisted ChatGPT
    Cloudflare cookie store on the reqwest client used by
    `ReqwestHttpClient`.
    - Fresh clients now share allowed Cloudflare infrastructure cookies
    within the process that originates the local or exec-server network
    request.
    - Keep the existing HTTPS ChatGPT-host and Cloudflare-cookie-name
    restrictions. This does not introduce a general cookie jar or send
    ChatGPT Cloudflare cookies to unrelated hosts.
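    
    To make the allowlisting concrete, here is a minimal sketch of a
    strictly allowlisted cookie store. It is not the Codex implementation:
    `AffinityCookieStore`, its methods, and the contents of both allowlists
    are assumptions chosen for illustration, and the real store plugs into
    reqwest rather than taking scheme and host strings.
    
    ```rust
    use std::collections::HashMap;
    
    /// Assumed allowlists for illustration only.
    const ALLOWED_HOSTS: &[&str] = &["chatgpt.com"];
    const ALLOWED_COOKIE_NAMES: &[&str] = &["__cflb"];
    
    #[derive(Default)]
    pub struct AffinityCookieStore {
        /// Cookie name mapped to its most recent value.
        cookies: HashMap<String, String>,
    }
    
    impl AffinityCookieStore {
        fn url_allowed(scheme: &str, host: &str) -> bool {
            scheme == "https" && ALLOWED_HOSTS.contains(&host)
        }
    
        /// Record one `Set-Cookie` value (`name=value; attributes`) when
        /// both the URL and the cookie name are allowlisted.
        pub fn store(&mut self, scheme: &str, host: &str, set_cookie: &str) {
            if !Self::url_allowed(scheme, host) {
                return;
            }
            let pair = set_cookie.split(';').next().unwrap_or("");
            if let Some((name, value)) = pair.split_once('=') {
                let name = name.trim();
                if ALLOWED_COOKIE_NAMES.contains(&name) {
                    self.cookies.insert(name.to_string(), value.trim().to_string());
                }
            }
        }
    
        /// Build the `Cookie` header value for a request, if any applies.
        pub fn cookie_header(&self, scheme: &str, host: &str) -> Option<String> {
            if !Self::url_allowed(scheme, host) || self.cookies.is_empty() {
                return None;
            }
            let mut pairs: Vec<String> = self
                .cookies
                .iter()
                .map(|(name, value)| format!("{}={}", name, value))
                .collect();
            pairs.sort();
            Some(pairs.join("; "))
        }
    }
    ```
    
    Because every fresh client would consult the same store, a value
    learned on one request is available to the next, while cookies with
    other names and requests to other hosts never touch it.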
    
    ## Test coverage
    
    - `codex-client` unit coverage verifies that the existing strict store
    accepts and returns `__cflb` for HTTPS ChatGPT URLs.
    - The exec-server HTTPS integration test sends four independent
    `http/request` calls through a local TLS-intercepting proxy and verifies
    that:
      - a `__cflb=west` cookie received via `Set-Cookie` is sent on the next
        plugin-service request;
      - a later `Set-Cookie: __cflb=central` replaces the stored value;
      - non-Cloudflare session cookies are discarded;
      - no stored ChatGPT Cloudflare cookie is sent to a non-ChatGPT host.
    - `just test -p codex-client` — 38 passed.
    - `just test -p codex-exec-server --test chatgpt_cloudflare_affinity` —
    1 passed.
    - `just bazel-lock-check` — passed.
    
    ## Non-goals
    
    - No persistence of ChatGPT auth, account, session, residency, or
    arbitrary cookies.
    - No cookie persistence for third-party MCP servers.
    - No special composition of caller-provided `Cookie` headers.
    - No plugin-service, connector-cache, Habitat/habicache, routing,
    redirect, or API-contract changes.
    - No broader incident RCA conclusions.
  • Test selected capabilities across availability and resume (#30157)
    ## Why
    
    This stack crosses World State, executor skills, selected plugin
    metadata, MCP processes, connectors, dynamic environments, and resume.
    This PR adds two end-to-end scenarios that validate those pieces
    together.
    
    Both tests enable `deferred_executor`, so they exercise the real
    delayed-environment path.
    
    ## Scenario 1: availability across turns and resume
    
    ```text
    1. Start a thread with one selected plugin root bound to E1.
    2. E1 is unavailable.
       - executor skill is absent
       - selected MCP is absent
       - connector has no selected-plugin attribution
    3. Start E1 and register the same stable environment ID.
    4. Start a new turn.
       - the executor skill appears through World State
       - its body beats a colliding host skill
       - the selected MCP tool is advertised and executes inside E1
       - the connector is attributed to the selected plugin
    5. Start another turn without changing E1.
       - the MCP PID stays the same, proving runtime reuse
    6. Restart app-server and resume the thread.
       - durable selected-root intent is restored
       - skills, MCP, and connector attribution are restored
       - a new MCP PID proves ephemeral process state was rebuilt
    ```
    
    ## Scenario 2: availability changes inside one turn
    
    ```text
    1. Start a turn while E1 is unavailable.
    2. The first model sample sees no executor skill, MCP, or selected connector.
    3. The turn pauses on request_user_input.
    4. Start E1 and register it while that same turn is still active.
    5. Continue the turn.
    6. The very next model sample sees:
       - the executor skill catalog
       - the selected MCP tool
       - selected-plugin connector attribution
    7. The model calls the MCP, and its output proves execution happened inside E1.
    ```
    
    This second scenario specifically protects the aeon-style behavior:
    capability state is captured again for every sampling step, not only at
    the next user turn.
    
    ## Scope
    
    These are integration tests only. They do not add a combinatorial matrix
    for unsupported plugin-file mutation, environment generations, transport
    disconnects, or delayed `required = true` executor MCPs.
  • [codex] Propagate traces through exec-server HTTP (#30117)
    Fixes distributed trace continuity across exec-server JSON-RPC HTTP
    egress by adding an executor client span and injecting its W3C context
    through a reusable `codex-otel` helper.
    
    This preserves the caller trace across core/tool → executor →
    provider/MCP instead of dropping parentage at raw reqwest.
    
    This change does not cover the WebSocket path, which is still needed
    for full end-to-end trace continuity. It covers only the basic HTTP
    path.
  • [codex] Observe remote exec-server lifecycle (#27470)
    ## Summary
    
    - Record bounded duration and outcome metrics for remote environment
    registration and Noise rendezvous connection attempts.
    - Count reconnects by bounded reason: disconnect, connection failure, or
    rejected registration.
    - Trace registration at the owning client boundary without exporting raw
    environment or registration identifiers.
    - Replace the stale pre-Noise WebSocket observability design with the
    current remote transport model.
    
    ## Stack
    
    Review and land this stack in order:
    
    1. #27466 — trace exec-server JSON-RPC requests
    2. #27467 — record bounded connection, request, and process lifecycle
    metrics
    3. #27470 — observe remote registration and Noise rendezvous lifecycle
    **(this PR)**
    
    ## Validation
    
    - `just test -p codex-exec-server --lib` (149 passed)
    - `just test -p codex-cli --test exec_server` (4 passed)
    - `just argument-comment-lint`
    - `just bazel-lock-check`
    - `just fix -p codex-exec-server -p codex-cli`
    - `just fmt`
  • [codex] Retry temporarily offline exec-server recovery (#30098)
    ## Summary
    
    - retry ERS `409 environment_offline` responses inside the existing
    exec-server recovery loop
    - keep all other registry conflicts terminal
    - add focused coverage for both cases
    
    ## Root cause
    
    When an exec server disconnects and reconnects, the client already
    starts recovery and calls ERS `/connect`. During the transient executor
    presence gap, ERS can return `409 environment_offline`. The retry
    classifier treated every 409 as terminal, so the first response aborted
    the existing 25-second recovery window before the executor came back
    online. That then caused active processes to be marked lost.
    
    This change classifies only the structured `environment_offline`
    conflict as retryable. Recovery continues with the existing bounded
    deadline, exponential backoff, and jitter.
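    
    A minimal sketch of that classification and of capped exponential
    backoff follows. The function names and parameters are hypothetical,
    jitter is left out to keep the example deterministic, and non-409
    statuses are out of scope here.
    
    ```rust
    use std::time::Duration;
    
    /// Only the structured `environment_offline` conflict is retryable;
    /// every other 409 stays terminal. `code` is the structured error code
    /// from the response body, if one was present.
    pub fn conflict_is_retryable(status: u16, code: Option<&str>) -> bool {
        status == 409 && code == Some("environment_offline")
    }
    
    /// Exponential backoff: `base * 2^attempt`, capped at `max`.
    /// The real loop also adds jitter and stops at its recovery deadline.
    pub fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
        let factor = 2u32.saturating_pow(attempt);
        base.saturating_mul(factor).min(max)
    }
    ```
    
    A caller would keep retrying while `conflict_is_retryable` holds and
    the recovery deadline has not passed, sleeping for `backoff_delay`
    between attempts.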
    
    ## Validation
    
    - `just test -p codex-exec-server client::recovery::tests` — 4 passed
    - `just fix -p codex-exec-server` — passed
    - `just fmt` — passed
    - Full `just test -p codex-exec-server` reached unrelated macOS
    filesystem-sandbox integration failures because nested
    `/usr/bin/sandbox-exec` is denied in this environment (`sandbox_apply:
    Operation not permitted`).
  • [codex] Record exec-server lifecycle metrics (#27467)
    ## Summary
    
    - Record bounded connection, request, and process lifecycle metrics.
    - Report active gauges from callbacks on every collection, including
    delta exports.
    - Serialize active-count updates so concurrent starts and finishes
    cannot publish stale values.
    - Serialize process exit, explicit termination, and shutdown through the
    process registry so exactly one completion result wins.
    - Keep the implementation small with single-owner RAII guards and one
    real OTLP/HTTP integration test using the existing `wiremock`
    dependency.
    
    ## Root cause
    
    Process exit and session shutdown previously used cloned completion
    state. That avoided duplicate emission, but it duplicated lifecycle
    ownership and made the ordering harder to reason about. The process
    registry mutex already defines the lifecycle ordering, so the final
    implementation stores the metric guard and termination flag directly on
    the process entry. Whichever path claims the entry first owns the
    completion result.
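    
    The "first claimer wins" rule can be illustrated with a registry whose
    mutex guards entry removal. This is a simplified sketch with
    hypothetical names; the real entry also carries the metric guard and
    termination flag.
    
    ```rust
    use std::collections::HashMap;
    use std::sync::Mutex;
    
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum Completion {
        Exited,
        Terminated,
        Shutdown,
    }
    
    /// Hypothetical process entry, reduced to a name for illustration.
    struct ProcessEntry {
        name: String,
    }
    
    #[derive(Default)]
    pub struct ProcessRegistry {
        entries: Mutex<HashMap<u32, ProcessEntry>>,
    }
    
    impl ProcessRegistry {
        pub fn insert(&self, id: u32, name: &str) {
            let entry = ProcessEntry { name: name.to_string() };
            self.entries.lock().unwrap().insert(id, entry);
        }
    
        /// Remove the entry under the registry lock. Only the first caller
        /// gets `Some`, so exactly one completion result is reported.
        pub fn claim(&self, id: u32, completion: Completion) -> Option<(String, Completion)> {
            let entry = self.entries.lock().unwrap().remove(&id)?;
            Some((entry.name, completion))
        }
    }
    ```
    
    Exit, explicit termination, and shutdown can all call `claim`
    concurrently; the two losers observe `None` and emit nothing.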
    
    Production metric export uses delta temporality. Event-only synchronous
    gauge recordings disappear after the next collection when no count
    changes, so active counts now use observable callbacks that report
    current state on every collection.
    
    The cleanup also removes the constant `result="accepted"` connection
    tag, redundant route and response assertions, a custom HTTP collector,
    and fallback initialization machinery that did not add behavior.
    
    ## Stack
    
    Review and land this stack in order:
    
    1. #27466 — trace exec-server JSON-RPC requests
    2. #27467 — record bounded connection, request, and process lifecycle
    metrics **(this PR)**
    3. #27470 — observe remote registration and Noise rendezvous lifecycle
    
    ## Validation
    
    - `just test -p codex-exec-server --lib` (158 passed)
    - `just test -p codex-cli --test exec_server` (3 passed)
    - `just test -p codex-otel
    observable_gauge_is_collected_on_every_delta_snapshot` (1 passed)
    - `CARGO_BUILD_JOBS=1 just fix -p codex-otel -p codex-exec-server`
    - `just fmt`
    - `git diff --check`
  • Persist selected capability roots and resolve availability per model step (#29856)
    ## Why
    
    `selectedCapabilityRoots` is durable thread intent: “use this capability
    root from environment `worker`.”
    
    The important product assumption is:
    
    > One environment ID always names the same logical executor and stable
    contents.
    
    `worker` does not silently change from executor A to an unrelated
    executor B. The process-local connection handle for `worker` can still
    be replaced while Codex is running, though, for example when
    `environment/add` registers a fresh handle for the same logical
    environment.
    
    The thread should persist only the stable selection. Each model step
    should pair that selection with the exact ready handle captured for that
    step.
    
    ## The boundary
    
    ```text
    persisted thread intent
      plugin@1 -> environment "worker"
                    |
                    | capture the current step
                    v
    model-step view
      unavailable, or
      plugin@1 + worker's exact captured ready handle
    ```
    
    The environment ID is the stable identity and cache key. The
    `Arc<Environment>` is only a process-local handle retained so consumers
    of one model step use the same captured environment. It is never
    persisted and it does not imply different environment contents.
    
    ## What changes
    
    ### Persist the stable selection
    
    Selected roots are written into `SessionMeta` and restored with the
    thread. Forked subagents inherit the same selections, including
    bounded-history forks.
    
    Only stable data is persisted: root ID, environment ID, and root path.
    
    ### Capture readiness together with the exact handle
    
    The environment snapshot records:
    
    ```rust
    environment_id -> Some(Arc<Environment>) // ready in this step
    environment_id -> None                   // still starting in this step
    ```
    
    This prevents readiness and execution from coming from different
    registry snapshots.
    
    For example:
    
    ```text
    step snapshot: worker -> handle A, ready
    environment/add: worker -> fresh handle B for the same logical environment
    current step: plugin@1 still uses captured handle A
    ```
    
    Without carrying handle A in the snapshot, the resolver could combine “A
    was ready” with handle B and treat B as ready before it had finished
    starting.
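    
    The pinning behavior can be sketched as follows. `Registry`,
    `StepSnapshot`, and `resolve` are hypothetical names, and `Environment`
    is a stand-in for the real handle type; only the
    `Option<Arc<Environment>>` shape comes from the description above.
    
    ```rust
    use std::collections::HashMap;
    use std::sync::Arc;
    
    /// Stand-in for the real environment handle.
    pub struct Environment {
        pub label: &'static str,
    }
    
    /// Per-step snapshot: `Some(handle)` is ready, `None` is still starting.
    pub type StepSnapshot = HashMap<String, Option<Arc<Environment>>>;
    
    /// Hypothetical registry of process-local handles keyed by the stable
    /// environment ID.
    #[derive(Default)]
    pub struct Registry {
        handles: HashMap<String, Option<Arc<Environment>>>,
    }
    
    impl Registry {
        pub fn set(&mut self, id: &str, handle: Option<Arc<Environment>>) {
            self.handles.insert(id.to_string(), handle);
        }
    
        /// Capture readiness and the exact handle in one step.
        pub fn capture(&self) -> StepSnapshot {
            self.handles.clone()
        }
    }
    
    /// Resolve a selection against one step's snapshot. Missing or
    /// starting environments yield `None`, so the root is omitted.
    pub fn resolve(snapshot: &StepSnapshot, environment_id: &str) -> Option<Arc<Environment>> {
        snapshot.get(environment_id).cloned().flatten()
    }
    ```
    
    A snapshot taken before `environment/add` keeps returning handle A for
    `worker`, while a snapshot captured afterwards returns handle B.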
    
    This does not change cache invalidation. Stable capability metadata
    remains identified by environment ID and capability root. Replacing a
    process-local handle under the same stable environment ID does not
    invalidate or rediscover that metadata.
    
    ### Resolve availability per model step
    
    - A ready captured environment produces resolved roots using its
    captured handle.
    - A starting, missing, or failed environment is omitted from that step.
    - A selected lazy environment that is outside the turn's captured
    environment set is asked to start, and a later step can observe it as
    ready.
    - No capability files are scanned here.
    
    Transient transport disconnects remain the remote client's reconnect
    concern. This PR models initial attachment/readiness; it does not add
    live socket-connectivity state.
    
    ## Example
    
    ```text
    thread selection: plugin@1 -> environment "worker"
    
    step 1: worker is starting -> plugin@1 unavailable
    step 2: worker is ready    -> plugin@1 resolves through worker's captured handle
    step 3: fresh local handle -> current step remains pinned; a later step captures its own view
    ```
    
    Temporary unavailability does not discard the durable selection. Later
    PRs can retain stable metadata caches while projecting only currently
    available capabilities into model-visible World State.
    
    ## Compatibility
    
    The app-server request shape does not change. Older rollouts without
    `selected_capability_roots` deserialize to an empty list.
    
    ## Stack
    
    1. **This PR:** persist stable selected roots and resolve them through
    an exact model-step handle.
    2. #29960: cache stable skill metadata and project available skills into
    World State.
    3. #29946: cache stable plugin declarations and manage the separate live
    MCP runtime.
  • Support OAuth for HTTP MCP servers from selected executor plugins (#28529)
    ## Why
    
    #28522 routes selected-plugin HTTP MCP traffic through the owning
    executor, but OAuth bootstrap and refresh still used host-local clients.
    Executor-only servers therefore cannot complete discovery or login
    through the same network boundary as the MCP connection.
    
    ## What changed
    
    - adapt `codex_exec_server::HttpClient` to RMCP 1.8's `OAuthHttpClient`
    contract
    - let RMCP own discovery, dynamic registration, PKCE, token exchange,
    and refresh
    - route auth status, persisted-token startup, and app-server login
    through the server runtime while preserving the existing local discovery
    path
    - add optional `threadId` to `mcpServer/oauth/login` and echo it in the
    completion notification
    - implement RMCP's redirect policy and 1 MiB OAuth response limit over
    executor HTTP
    - cover selected-thread OAuth discovery and login through an
    executor-only route
    
    Depends on #28522.
  • Follow directory symlinks in filesystem walks (#29844)
    Stack 3 of 3. Stacked on #29842.
    
    ## What changes
    
    Adds an opt-in `followDirectorySymlinks` setting to `fs/walk`.
    
    When enabled, the walk follows directory symlinks but continues to
    ignore symlinked files. Canonical directory identities prevent symlink
    cycles, while normal paths keep their existing spelling.
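    
    A rough sketch of the cycle guard, using only the standard library. It
    assumes canonicalized paths serve as the directory identity, which may
    differ from what `fs/walk` actually uses, and it omits the walk's
    limits and sandbox routing. Function names are hypothetical.
    
    ```rust
    use std::collections::HashSet;
    use std::fs;
    use std::io;
    use std::path::{Path, PathBuf};
    
    /// Collect regular files under `root`, following directory symlinks
    /// but ignoring symlinked files. Paths keep their original spelling.
    pub fn walk_following_dir_symlinks(root: &Path) -> io::Result<Vec<PathBuf>> {
        let mut visited = HashSet::new(); // canonical directory identities
        let mut files = Vec::new();
        visit(root, &mut visited, &mut files)?;
        files.sort();
        Ok(files)
    }
    
    fn visit(dir: &Path, visited: &mut HashSet<PathBuf>, files: &mut Vec<PathBuf>) -> io::Result<()> {
        // A directory already seen under any spelling is not entered again.
        if !visited.insert(fs::canonicalize(dir)?) {
            return Ok(());
        }
        for entry in fs::read_dir(dir)? {
            let path = entry?.path();
            let link_meta = fs::symlink_metadata(&path)?;
            if link_meta.file_type().is_symlink() {
                // Follow the link only when it resolves to a directory.
                match fs::metadata(&path) {
                    Ok(target) if target.is_dir() => visit(&path, visited, files)?,
                    _ => {} // symlinked file or dangling link: ignored
                }
            } else if link_meta.is_dir() {
                visit(&path, visited, files)?;
            } else if link_meta.is_file() {
                files.push(path);
            }
        }
        Ok(())
    }
    ```
    
    When a directory is reachable under two spellings, this sketch reports
    its files under whichever spelling the directory iteration reaches
    first.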
    
    Environment skill discovery enables the setting so symlinked skill
    directories continue to work with the new single-RPC scan.
  • [codex] Trace exec-server JSON-RPC requests (#27466)
    ## Why
    
    Exec-server JSON-RPC calls can cross local and remote transports, but
    trace context stopped at the RPC boundary. That made client and server
    work difficult to correlate when diagnosing latency or failures.
    
    ## What changed
    
    - Propagate the current W3C trace context on outbound JSON-RPC requests.
    - Parent inbound request spans from received trace context.
    - Record the received JSON-RPC method on server spans and keep each span
    open through response enqueue.
    - Add only the OTEL dependencies required by the exec-server crate.
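    
    For readers unfamiliar with the wire format, a W3C Trace Context
    `traceparent` value has the form `00-<trace-id>-<parent-id>-<flags>`.
    The helpers below are a hand-rolled, lenient sketch of that format.
    They are not the `codex-otel` or OpenTelemetry API, and where the value
    is carried on a JSON-RPC request is not shown in this description.
    
    ```rust
    /// Format a version-00 W3C `traceparent` value.
    pub fn format_traceparent(trace_id: u128, span_id: u64, sampled: bool) -> String {
        format!("00-{:032x}-{:016x}-{:02x}", trace_id, span_id, u8::from(sampled))
    }
    
    /// Parse a version-00 `traceparent` into (trace_id, span_id, sampled).
    pub fn parse_traceparent(value: &str) -> Option<(u128, u64, bool)> {
        let mut parts = value.split('-');
        let version = parts.next()?;
        let trace = parts.next()?;
        let span = parts.next()?;
        let flags = parts.next()?;
        if parts.next().is_some()
            || version != "00"
            || trace.len() != 32
            || span.len() != 16
            || flags.len() != 2
        {
            return None;
        }
        let trace_id = u128::from_str_radix(trace, 16).ok()?;
        let span_id = u64::from_str_radix(span, 16).ok()?;
        let flags = u8::from_str_radix(flags, 16).ok()?;
        // All-zero trace and span ids are invalid per the specification.
        if trace_id == 0 || span_id == 0 {
            return None;
        }
        Some((trace_id, span_id, flags & 0x01 == 0x01))
    }
    ```
    
    The client formats the value from its current span, and the server
    parses it to parent the inbound request span.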
    
    ## Stack
    
    Review and land this stack in order:
    
    1. #27466 — trace exec-server JSON-RPC requests **(this PR)**
    2. #27467 — record bounded connection, request, and process lifecycle
    metrics
    3. #27470 — observe remote registration and Noise rendezvous lifecycle
    
    ## Validation
    
    - `just test -p codex-exec-server --lib` (153 passed)
    - `just bazel-lock-check`
    - `just fix -p codex-exec-server`
  • Add a bounded filesystem walk RPC (#29841)
    Stack 1 of 3. Follow-ups: #29842 and #29844.
    
    ## What changes
    
    Adds a general bounded `fs/walk` operation to the exec server.
    
    The operation returns file and directory entries plus recoverable
    per-path errors. It skips symlinks, preserves the existing filesystem
    sandbox routing, and enforces depth, directory, entry, and response-size
    limits.
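    
    The shape of such a walk can be sketched with the standard library.
    This is an illustration under assumptions: `WalkLimits`, `WalkResult`,
    and `bounded_walk` are hypothetical names, only depth and entry limits
    are modeled, and sandbox routing is omitted.
    
    ```rust
    use std::fs;
    use std::path::{Path, PathBuf};
    
    /// Hypothetical limits; the real operation also bounds directory count
    /// and response size.
    pub struct WalkLimits {
        /// Directory levels below the root that may be descended into.
        pub max_depth: usize,
        pub max_entries: usize,
    }
    
    #[derive(Default)]
    pub struct WalkResult {
        pub entries: Vec<PathBuf>,
        /// Recoverable per-path errors, reported instead of aborting.
        pub errors: Vec<(PathBuf, String)>,
        /// True if the entry limit stopped the walk early.
        pub truncated: bool,
    }
    
    pub fn bounded_walk(root: &Path, limits: &WalkLimits) -> WalkResult {
        let mut result = WalkResult::default();
        let mut stack = vec![(root.to_path_buf(), 0usize)];
        while let Some((dir, depth)) = stack.pop() {
            let reader = match fs::read_dir(&dir) {
                Ok(reader) => reader,
                Err(err) => {
                    result.errors.push((dir, err.to_string()));
                    continue;
                }
            };
            for entry in reader {
                let (path, file_type) = match entry.and_then(|e| Ok((e.path(), e.file_type()?))) {
                    Ok(found) => found,
                    Err(err) => {
                        result.errors.push((dir.clone(), err.to_string()));
                        continue;
                    }
                };
                if file_type.is_symlink() {
                    continue; // symlinks are skipped
                }
                if result.entries.len() >= limits.max_entries {
                    result.truncated = true;
                    return result;
                }
                if file_type.is_dir() && depth < limits.max_depth {
                    stack.push((path.clone(), depth + 1));
                }
                result.entries.push(path);
            }
        }
        result
    }
    ```
    
    Returning errors alongside entries lets a caller use a partial listing
    even when one subdirectory is unreadable.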
    
    This PR only defines and wires the filesystem operation. It does not
    change any callers yet.
  • test: add app-server auto environment helper (#29746)
    ## Why
    
    This starts moving app-server tests towards running against remote and
    foreign-OS executors by default. That requires a point of indirection
    similar to `build_with_auto_env` in the core integration tests, while
    still letting tests control environment registration when they need
    to.
    
    ## What
    
    This adds:
    
    - `TestAppServer::new_with_auto_env()` for constructing an app server
    with a default environment defined by the test runner (e.g. bazel)
    - `TestAppServer::auto_env_params()` for tests to easily acquire turn
    env params tailored to the automatic environment
    - `TestAppServer::send_thread_start_request_with_auto_env()` to make it
    easy for tests to start a thread using the automatic environment
    
    The above methods all fail if the test calling them has set up an
    environment where the automatic environment configuration conflicts with
    test-created state.
    
    ## Validation
    
    Adds a couple of basic smoke tests to the app-server test suite.
    Follow-ups will migrate more tests to use it.
  • protocol: separate app and exec RPC ownership (#29714)
    ## Why
    
    The app-server and exec-server expose separate JSON-RPC APIs, but
    exec-server currently sources its serialized protocol and envelope types
    through app-server-oriented code. Giving each API an explicit owner
    makes the crate boundary legible without introducing shared generic
    envelopes.
    
    ## What changed
    
    - Added `codex-exec-server-protocol` to own exec DTOs, process IDs, and
    JSON-RPC envelopes.
    - Updated exec-server clients, transports, handlers, and tests to use
    the new crate.
    - Exposed app-server's existing JSON-RPC types through a public `rpc`
    module while retaining root re-exports.
    - Preserved existing wire shapes, including exec `PathUri` behavior.
    
    ## Stack
    
    This is PR 1 of 6. Next: [PR
    #29721](https://github.com/openai/codex/pull/29721), which moves auth
    mode below the app wire boundary.
    
    ## Validation
    
    - Exec-server protocol and server coverage passed in the focused
    protocol test runs.
    - App-server protocol schema fixtures passed.
  • path-uri: remove legacy path deserialization (#29158)
    ## Why
    
    I originally added legacy path deserialization to `PathUri` on the
    assumption that we would want `PathUri` in public app-server APIs.
    Since then we've added `LegacyAppPathString` to handle the messy
    conversions needed for backward compatibility. It's confusing for
    `PathUri` to support deserializing legacy paths when we don't yet want
    to expose app-server callers or rollout storage to the new URI format.
    
    Stacked on top of #29472 to avoid breaking compatibility in case values
    of those types have been persisted somewhere.
    
    ## What changed
    
    - Parse deserialized `PathUri` values exclusively as valid `file:` URIs.
    - Replace legacy acceptance coverage with rejection coverage for
    top-level filesystem paths and sandbox working directories.
    - Serialize CWDs in hand-built exec-server process requests as `PathUri`
    values.
  • [codex] Report the exec-server working directory (#29666)
    ## Summary
    
    - add the exec-server working directory to `environment/info` as an
    optional `PathUri`
    - populate it from the executor process's current directory
    - preserve compatibility with older responses that omit `cwd`
    
    ## Why
    
    Remote clients currently have no executor-native default working
    directory. This forces callers such as app-server-backend to assume
    `/workspace`, which fails for laptop environments. Reporting the cwd
    alongside the detected shell lets clients use the path convention and
    location of the actual executor.
    
    ## Impact
    
    This is backward-compatible: the new response field is optional, and
    clients can continue handling responses from older exec servers. A
    follow-up app-server-backend change will consume the value for cwd-less
    `command/exec` requests.
    
    ## Validation
    
    - `just test -p codex-exec-server` (275 passed, 2 skipped)
  • [codex] Preserve proxy state for filesystem sandbox helpers (#29671)
    ## Why
    
    Filesystem helpers intentionally run with a minimal environment that
    excludes proxy variables. After filesystem operations started using the
    Windows sandbox wrapper, the wrapper derived an empty proxy
    configuration from that helper environment and compared it with the
    persistent sandbox setup marker. When the marker contained proxy ports,
    every filesystem operation appeared to require a firewall update, which
    could launch elevated setup, show a UAC or loader dialog, and fail
    operations such as `apply_patch` with error 1223.
    
    Filesystem helpers do not use network access, so they should preserve
    the proxy/firewall state established by normal sandboxed process
    launches.
    
    ## What changed
    
    - Add an explicit Windows sandbox proxy-settings mode for reconciling or
    preserving persistent proxy state.
    - Use preserve mode for filesystem helpers while normal process launches
    continue to reconcile proxy settings from their environment.
    - Carry the selected proxy state consistently through setup validation,
    elevated setup, and non-elevated ACL refreshes.
    - Cover wrapper argument propagation and marker-derived proxy
    preservation.
    
    ## Validation
    
    - `cargo build -p codex-cli --bin codex`
    - `just test -p codex-windows-sandbox
    preserving_proxy_settings_uses_the_existing_marker`
    - `just test -p codex-windows-sandbox windows_wrapper_args_round_trip`
    - `just test -p codex-windows-sandbox
    setup_request_prefers_explicit_proxy_settings`
    - `just test -p codex-sandboxing transform_for_direct_spawn_windows`
    - `just test -p codex-exec-server fs_sandbox::tests`
    - Ran the same sandboxed `fs/writeFile` reproduction against published
    `0.142.0-alpha.6` and the new CLI. The published CLI launched elevated
    setup and failed with `ShellExecuteExW ... 1223`; the new CLI completed
    without elevation.
    
    Related to #28359.
  • Prepare managed network sandbox context (#29456)
    ## Why
    
    Managed network configures commands to use local HTTP and SOCKS proxies.
    For commands delegated to the exec server, the proxy environment and the
    sandbox policy were prepared separately. On macOS, that meant a command
    could receive `HTTPS_PROXY=http://127.0.0.1:43123` while Seatbelt still
    denied access to port `43123`.
    
    ## What changed
    
    `NetworkProxy` now prepares the command environment and sandbox context
    together from the same runtime snapshot:
    
    ```text
    Prepared managed network
    ├── command environment: HTTPS_PROXY=http://127.0.0.1:43123
    └── sandbox context: allow outbound to 127.0.0.1:43123
    ```
    
    That context travels with remote exec requests. The exec server
    preserves the managed proxy and CA environment, and macOS Seatbelt
    allows only the prepared loopback proxy ports without enabling broad
    network access or local binding.
    
    The protocol field is optional and the existing enforcement flag remains
    in place, preserving compatibility with callers that do not send the new
    context.
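
    A minimal sketch of the "prepare both from one snapshot" idea. The types and the function name below are hypothetical and are not the real `NetworkProxy` API; only the `HTTPS_PROXY` value and the loopback port come from the example above.

    ```rust
    use std::collections::BTreeMap;

    /// Hypothetical snapshot of the running managed proxy.
    struct ProxySnapshot {
        http_port: u16,
    }

    /// Hypothetical pairing of the two outputs that must agree.
    struct PreparedManagedNetwork {
        env: BTreeMap<String, String>,
        allowed_loopback_ports: Vec<u16>,
    }

    /// Derive the command environment and the sandbox allowance from the
    /// same snapshot so they cannot name different ports.
    fn prepare_managed_network(snapshot: &ProxySnapshot) -> PreparedManagedNetwork {
        let mut env = BTreeMap::new();
        env.insert(
            "HTTPS_PROXY".to_string(),
            format!("http://127.0.0.1:{}", snapshot.http_port),
        );
        PreparedManagedNetwork {
            env,
            allowed_loopback_ports: vec![snapshot.http_port],
        }
    }
    ```

    Because both values are computed in one place, a port change in the snapshot updates the environment and the sandbox context together.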
  • path-uri: clarify host-native path conversion (#29501)
    ## Why
    
    Downstream refactors are producing confusing code because this
    functionality has a very generic name. Encoding the specific
    conversion approach in the method name makes the intent clearer.
    
    ## What
    
    Rename `PathUri::from_path` to `PathUri::from_host_native_path` and
    update its Rust call sites.
  • Report remote sandbox denials semantically (#29424)
    ## Why
    
    #29113 moved remote sandbox setup and enforcement to the exec server.
    That gives the executor ownership of the platform-specific work: a Linux
    executor chooses and runs a Linux sandbox even when the Codex
    orchestrator is running on macOS or Windows.
    
    It also means the orchestrator no longer knows which concrete sandbox
    the executor selected. When that sandbox blocks a remote command, the
    orchestrator currently sees only a failed process and can treat the
    denial as an ordinary command failure. The existing sandbox approval and
    retry path is then skipped.
    
    This PR lets the executor report one portable fact:
    
    > This command probably failed because the executor sandbox blocked it.
    
    The executor keeps its concrete sandbox type private. The protocol sends
    only the semantic result.
    
    ## Example
    
    Suppose a local macOS Codex session asks a Linux devbox to write outside
    the allowed workspace.
    
    Before this PR:
    
    ```text
    Linux sandbox blocks the write
        -> remote process exits with "Permission denied"
        -> local orchestrator sees an ordinary command failure
        -> the normal sandbox approval and retry path can be skipped
    ```
    
    With this PR:
    
    ```text
    Linux sandbox blocks the write
        -> executor reports sandboxDenied: true
        -> unified exec returns UnifiedExecError::SandboxDenied
        -> the existing approval prompt is shown
        -> an approved retry runs through the existing unsandboxed retry path
    ```
    
    ## What changes
    
    ### The executor remembers its selected sandbox
    
    The prepared remote process now retains the executor-selected
    `SandboxType`. This value never crosses the executor boundary.
    
    Commands started without a sandbox retain `SandboxType::None` and are
    never reported as sandbox denials.
    
    ### The executor uses the existing denial heuristic
    
    The existing local denial heuristic moves from `codex-core` into the
    shared `codex-sandboxing` crate.
    
    When a sandboxed remote process exits, the executor:
    
    1. waits the same short output grace period used by local unified exec;
    2. reads the output currently available in the existing retained output
    buffer;
    3. runs the existing heuristic using the exit code and common denial
    messages;
    4. stores the yes/no result before publishing the process exit.
    
    This deliberately matches the old local unified-exec behavior. It does
    not add a new streaming classifier, another output buffer, or stronger
    output-retention guarantees.
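
    The classification in steps 2 to 4 can be sketched roughly as follows. This is an illustration only: `SandboxKind`, `DENIAL_HINTS`, and `is_likely_sandbox_denied` are hypothetical names, the hint strings are assumed examples, and the real heuristic in `codex-sandboxing` is not shown in this description. The grace-period wait from step 1 is omitted.

    ```rust
    /// Hypothetical stand-in for the executor-private sandbox type.
    #[derive(Clone, Copy, PartialEq, Eq)]
    enum SandboxKind {
        None,
        Sandboxed,
    }

    /// Assumed example messages; the real list is not part of this description.
    const DENIAL_HINTS: [&str; 2] = ["permission denied", "operation not permitted"];

    /// Returns a yes/no guess from the exit code and the retained output.
    fn is_likely_sandbox_denied(kind: SandboxKind, exit_code: i32, retained_output: &str) -> bool {
        // Unsandboxed commands and successful exits are never reported as denials.
        if kind == SandboxKind::None || exit_code == 0 {
            return false;
        }
        let lowered = retained_output.to_lowercase();
        DENIAL_HINTS.iter().any(|hint| lowered.contains(hint))
    }
    ```

    The result is a single boolean, which is all that needs to be stored before the process exit is published.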
    
    ### The protocol reports a portable boolean
    
    `process/read` gains `sandboxDenied`:
    
    ```json
    {
      "exited": true,
      "exitCode": 1,
      "closed": false,
      "sandboxDenied": true
    }
    ```
    
    The field defaults to `false` when an older executor omits it. The
    response does not expose the executor sandbox implementation or
    executor-native paths.
    
    ### Unified exec uses the existing error path
    
    The exec-server client carries `sandboxDenied` into the unified process
    state. If it is true, unified exec returns the existing `SandboxDenied`
    error instead of trying to classify remote output using an
    orchestrator-side sandbox type.
    
    Remote process exit remains visible as soon as the process exits. This
    PR does not wait for stdout or stderr to close and does not change the
    existing process lifecycle.
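
    A simplified sketch of the client-side mapping, assuming a hypothetical `ReadResponse` struct and a reduced `UnifiedExecError` with a data-free `SandboxDenied` variant (the real types likely carry more fields):

    ```rust
    /// Hypothetical subset of a `process/read` response.
    struct ReadResponse {
        exited: bool,
        exit_code: Option<i32>,
        /// `None` models an older executor that omits the field.
        sandbox_denied: Option<bool>,
    }

    /// Reduced stand-in for the existing error type.
    #[derive(Debug, PartialEq)]
    enum UnifiedExecError {
        SandboxDenied,
    }

    /// Returns the exit code once the process has exited, or the existing
    /// sandbox-denied error when the executor reported a denial.
    fn classify_exit(resp: &ReadResponse) -> Result<Option<i32>, UnifiedExecError> {
        if !resp.exited {
            return Ok(None);
        }
        // A missing field defaults to `false`.
        if resp.sandbox_denied.unwrap_or(false) {
            return Err(UnifiedExecError::SandboxDenied);
        }
        Ok(resp.exit_code)
    }
    ```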
    
    ## Scope
    
    This PR is intentionally limited to matching the existing local
    unified-exec behavior for the initial command execution path.
    
    It does not add:
    
    - incremental denial tracking across the full output stream;
    - new denial handling for commands completed later through
    `write_stdin`;
    - new guarantees for preserving the semantic flag during the narrow
    reconnect-recovery race.
    
    Those can be considered separately if the same behavior is added for
    local execution.
    
    ## Test coverage
    
    One remote end-to-end integration test covers the complete intended
    flow:
    
    ```text
    remote read-only sandbox
        -> denied write
        -> executor reports the denial
        -> Codex requests approval
        -> user approves
        -> retry succeeds on the remote executor
    ```
    
    Existing lifecycle coverage continues to verify that remote process exit
    is reported before late output streams close.
  • Apply sandbox intent inside remote exec servers (#29113)
    ## Why
    
    PR #29108 lets the orchestrator send sandbox intent with `process/start`
    without wrapping the command for its own operating system.
    
    This PR completes that boundary by making the executor interpret and
    enforce the intent using its own filesystem paths and sandbox
    implementation.
    
    For example, a macOS TUI targeting a Linux devbox sends `/bin/bash -lc
    pwd`. The Linux executor turns that into its own `codex-linux-sandbox
    ... /bin/bash -lc pwd` launch.
    
    ## What changes
    
    - Keep `process/start` unchanged when no sandbox intent is present.
    - Convert sandbox `PathUri` values into native paths on the executor.
    - Bind symbolic `:workspace_roots` permissions to the executor's native
    sandbox cwd.
    - Select the sandbox implementation on the executor and wrap the
    original command immediately before spawning it.
    - Reject sandbox-required execution before spawning when the executor
    cannot enforce the intent.
    - Pass exec-server runtime paths into process creation so Linux can
    locate `codex-linux-sandbox`.
    
    The boundary is therefore:
    
    ```text
    orchestrator                         executor
    original argv + sandbox intent  ->  select and enforce local sandbox
    ```
    
    This PR intentionally treats a denied remote command as an ordinary
    command failure. Draft follow-up #29424 carries a semantic
    `sandboxDenied` result back to unified exec for the existing approval
    and retry flow.
    
    ## Platform scope
    
    Linux and macOS use their existing direct-spawn sandbox transforms.
    
    Windows sandboxed remote process launch is intentionally unsupported in
    this PR. The current Windows direct-spawn wrapper does not correctly
    preserve arbitrary argv or TTY behavior, and it does not pass the full
    child environment out of band. The executor rejects the request instead
    of running it incorrectly or unsandboxed.
    
    ## Known follow-ups
    
    - The transported permission profile can still contain
    orchestrator-materialized helper or explicit paths. A `TODO(jif)` marks
    where the executor boundary should receive pre-host-materialization
    permission intent.
    - The sandbox wrapper currently replaces a requested custom inner
    `arg0`. A `TODO(jif)` marks where this must be preserved or rejected
    explicitly.
    - Draft PR #29424 contains the deferred sandbox-denial classification
    and approval/retry behavior.
    
    ## Rollout assumption
    
    This executor-sandbox stack is unreleased and its client and executor
    are expected to move together. This PR does not add mixed-version
    negotiation with older exec servers.
  • Test pipelined scalar exec-server requests (#29325)
    ## Summary
    
    This adds focused coverage for the simpler same-connection scalar
    request path.
    
    The exec-server connection already supports multiple in-flight JSON-RPC
    scalar requests on one connection. This test locks in that behavior by
    sending two normal requests before reading either response, without
    adding a batch frame or any new API surface.
    
    ## What changed
    
    - Added a processor-level test that initializes an exec-server
    connection.
    - Sends two scalar `environment/info` requests back-to-back on the same
    connection.
    - Verifies both responses come back on the same connection by request
    id.
    
    Checked locally with:
    
    - `just test -p codex-exec-server
    connection_accepts_pipelined_scalar_requests`
  • Carry sandbox intent to remote exec servers (#29108)
    ## What changed
    
    PR #29099 stopped sending the orchestrator's concrete sandbox wrapper to
    a remote exec-server. Remote commands now arrive as plain native argv.
    
    This PR adds the next piece: Codex also sends portable sandbox intent
    next to that plain argv.
    
    For a remote unified-exec command, the request can now include:
    
    - the canonical permission profile before local workspace-root
    materialization
    - the sandbox cwd and workspace roots as `PathUri` values
    - Windows sandbox settings
    - the legacy Landlock setting
    - whether managed networking must be enforced
    
    The important part is that symbolic entries such as `:workspace_roots`
    stay symbolic while crossing the boundary. The executor can then bind
    them to its own workspace-root paths instead of receiving
    orchestrator-local absolute paths.
    
    The data travels through `ExecRequest` into `ExecParams`. Older
    exec-servers can still deserialize requests because the new fields have
    defaults.
    
    ## Why
    
    The orchestrator should not decide how another machine implements
    sandboxing.
    
    For example:
    
    - a local macOS Codex would normally build a Seatbelt command
    - a remote Linux executor needs a Linux sandbox command instead
    
    The orchestrator now sends the plain command plus the policy it intended
    to enforce. A later PR can let the exec-server choose and build the
    correct sandbox for its own operating system.
    
    ## Important detail
    
    This keeps the portable intent separate from the local `SandboxType`.
    
    `SandboxType::None` is ambiguous:
    
    - it can mean the command was explicitly approved to run without a
    sandbox
    - it can also mean the orchestrator host has no concrete sandbox
    implementation available
    
    Those cases are different for remote execution. This PR adds
    `sandbox_requested` so an executor can still receive sandbox intent when
    the orchestrator cannot build a local wrapper. Explicit unsandboxed
    retries still send no sandbox context.
    
    ## Behavior today
    
    This PR only transports the intent. The exec-server accepts the new
    fields but does not apply them yet.
    
    Remote commands therefore remain unsandboxed after this PR, just as they
    are after PR #29099.
    
    ## Follow-up
    
    The next PR will make exec-server read this portable intent, bind
    symbolic workspace permissions to executor-native roots, choose the
    sandbox for its own operating system, build the wrapper locally, and
    then spawn the command.
  • [3/3] app-server: configure environment connection timeout (#29025)
    ## Why
    
    Remote environments registered through `environment/add` currently use
    the fixed 10-second WebSocket connection timeout. Slow-starting
    executors need a caller-selected connection window, but this should not
    add retry policy or couple exec-server behavior to Core’s
    `deferred_executor` feature.
    
    Make the timeout an optional part of the existing experimental request.
    Existing clients continue using the current default, while callers that
    know an executor may take longer can request a larger window explicitly.
    
    Depends on #28683.
    
    ## What changed
    
    - Add optional `connectTimeoutMs` to `EnvironmentAddParams` and document
    it in the app-server README.
    - Pass the optional timeout through `EnvironmentRequestProcessor` into
    one `EnvironmentManager::upsert_environment()` path; the manager applies
    the existing default when it is omitted.
    - Preserve the existing single-attempt lifecycle. The configured value
    controls WebSocket connection and handshake time for both initial
    connection and later reconnects; initialization retains its separate
    timeout.
    - Add an app-server integration test that sends the real JSON-RPC
    request and verifies a stalled handshake observes the requested timeout.
    
    ## Test plan
    
    - `just test -p codex-app-server-protocol`
    - `just test -p codex-exec-server`
    - `just test -p codex-app-server
    environment_add_applies_connect_timeout`
    
    ## Rollout
    
    This is additive and does not enable `deferred_executor`. Callers should
    send a non-default timeout only after a compatible app-server is
    deployed; omitted or `null` values retain the existing 10-second
    default.
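
    The defaulting rule can be sketched in a few lines. The function and constant names are hypothetical; only the 10-second default and the optional `connectTimeoutMs` value come from this description.

    ```rust
    use std::time::Duration;

    /// The existing default described above.
    const DEFAULT_CONNECT_TIMEOUT: Duration = Duration::from_secs(10);

    /// An omitted or `null` `connectTimeoutMs` (modeled as `None`) keeps
    /// the default; any provided value is used as-is.
    fn effective_connect_timeout(connect_timeout_ms: Option<u64>) -> Duration {
        connect_timeout_ms
            .map(Duration::from_millis)
            .unwrap_or(DEFAULT_CONNECT_TIMEOUT)
    }
    ```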
  • [1/3] core: add remote environment connection lifecycle (#28674)
    ## Why
    
    Remote environments can be registered before their exec-server is first
    used. Starting the connection at registration time uses that startup
    window, while sharing one startup result prevents background work and
    capability calls from opening competing connections.
    
    Keep initial startup simple: each environment makes one connection
    attempt using its configured transport timeout. A failed initial attempt
    is final for that environment, while an environment that disconnects
    after connecting can still recover on a later operation.
    
    ## What changed
    
    - Start URL and Noise environments in the background when they are added
    to `EnvironmentManager`. Provider snapshots are fully validated before
    connection work begins.
    - Share one initial connection attempt and its saved result across
    metadata, process, filesystem, and HTTP callers.
    - Keep configured stdio environments lazy until first use so
    registration does not launch a process.
    - Tie background startup work to the environment lifetime so replacing
    or dropping an environment cancels unfinished work.
    - After an established client disconnects, share one fresh connection
    attempt across concurrent callers. A failed attempt fails the current
    operation without permanently preventing a later attempt.
    - Store the shared lazy client directly on `Environment` and expose
    small methods for starting, observing, and awaiting startup.
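
    As a rough, synchronous illustration of "one shared initial attempt with a saved result", the sketch below uses `std::sync::OnceLock`. The real code is presumably asynchronous and structured differently; this `Environment` struct, its fields, and the string-typed client are assumptions for the example.

    ```rust
    use std::sync::atomic::{AtomicUsize, Ordering};
    use std::sync::OnceLock;

    /// Hypothetical environment that stores one startup result.
    struct Environment {
        startup: OnceLock<Result<String, String>>,
        attempts: AtomicUsize,
    }

    impl Environment {
        fn new() -> Self {
            Self {
                startup: OnceLock::new(),
                attempts: AtomicUsize::new(0),
            }
        }

        /// Only the first caller runs `connect`; every later caller sees
        /// the same saved result, including a saved failure.
        fn client(&self, connect: impl FnOnce() -> Result<String, String>) -> Result<&str, &str> {
            let saved = self.startup.get_or_init(|| {
                self.attempts.fetch_add(1, Ordering::SeqCst);
                connect()
            });
            saved.as_ref().map(String::as_str).map_err(String::as_str)
        }
    }
    ```

    This models only the initial attempt, where a failure is final. Recovery after an established client disconnects would need a resettable slot rather than a `OnceLock`.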
    
    ## Test plan
    
    - `just test -p codex-exec-server`
    - `just test -p codex-app-server
    turn_start_resolves_sticky_thread_local_environment_and_turn_overrides`
  • core: load AGENTS.md from foreign environments (#28958)
    ## Why
    
    Make it possible to load AGENTS.md from remote exec-servers whose OS
    differs from the app-server's.
    
    ## What
    
    - keep `AGENTS.md` discovery and provenance as `PathUri`, with
    root-aware parent and ancestor traversal
    - expose lifecycle instruction sources as legacy app-server path strings
    in events while retaining `PathUri` internally
    - preserve and test mixed POSIX and Windows paths in model context and
    TUI status output
    - cover remote Windows loading end to end by seeding the Wine prefix
    through host filesystem APIs
    - fix a bug in `PathUri`'s `parent()` implementation that erased
    Windows drive letters
  • [codex] Initialize exec-server OpenTelemetry at startup (#25019)
    ## Summary
    
    - Initialize stderr tracing and the configured OpenTelemetry provider
    for local and remote `codex exec-server` startup.
    - Instrument the local and remote server entrypoints with a root runtime
    span.
    - Keep raw Noise environment, registration, and stream identifiers out
    of exported spans while preserving them in local debug events.
    - Keep telemetry setup in a focused CLI module instead of growing the
    top-level command entrypoint.
    
    ## Stack
    
    - Previous: none (`#27058` has merged)
    - Next: #27466
    
    ## Validation
    
    - `just test -p codex-exec-server --lib` (139 passed)
    - `just test -p codex-cli --test exec_server` (3 passed)
    - `just bazel-lock-check`
    - `just fix -p codex-exec-server -p codex-cli`
    - `just fmt`
    
    ---------
    
    Co-authored-by: Richard Lee <richardlee@openai.com>
  • Recover exec process stdin writes (#28895)
    ## Summary
    
    Remote stdio MCP servers send tool calls by writing JSON-RPC bytes
    through `process/write`.
    
    When the exec-server websocket drops at the wrong time, the remote
    process can survive session recovery, but the stdin write can still fail
    back to RMCP as a transport send error. RMCP then closes the stdio MCP
    transport, so tools like `node_repl` are lost even though the
    process/session recovery path is working.
    
    This changes `process/write` to be safe to retry across exec-server
    recovery:
    
    - adds a required `writeId` to `process/write`
    - retries remote `Session::write` with the same `writeId` after
    reconnect
    - remembers accepted write ids per process so duplicate retries return
    `Accepted` without writing the same bytes to child stdin again
    - covers both the client retry path and server-side write id dedupe with
    tests
    
    In simple terms:
    
    ```text
    before:
    write to MCP stdin -> websocket closes -> write errors -> RMCP closes node_repl
    
    after:
    write to MCP stdin -> websocket closes -> reconnect -> retry same writeId
    server either writes once or recognizes it already did
    ```
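
    The server-side dedupe can be sketched as follows. `ProcessStdin` and `WriteStatus` are hypothetical names, and the child's stdin is modeled as a byte buffer; only the `writeId` idea and the `Accepted` result come from this description.

    ```rust
    use std::collections::HashSet;

    #[derive(Debug, PartialEq)]
    enum WriteStatus {
        Accepted,
    }

    /// Hypothetical per-process state kept by the server.
    #[derive(Default)]
    struct ProcessStdin {
        accepted_write_ids: HashSet<String>,
        /// Stand-in for the child's stdin pipe.
        stdin: Vec<u8>,
    }

    impl ProcessStdin {
        /// Writes the bytes only the first time a write id is seen.
        /// A retried write with the same id still reports `Accepted`.
        fn write(&mut self, write_id: &str, bytes: &[u8]) -> WriteStatus {
            if self.accepted_write_ids.insert(write_id.to_string()) {
                self.stdin.extend_from_slice(bytes);
            }
            WriteStatus::Accepted
        }
    }
    ```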
  • Add network environment ID plumbing (#28766)
    ## Why
    
    Prepare network approval scoping to distinguish execution environments
    without changing behavior yet.
    
    ## What changed
    
    - Add optional environment IDs to network policy requests.
    - Add optional network environment IDs to exec and sandbox request
    structs.
    - Thread default `None` values through existing construction points.
    - Fix stale constructor call sites that caused the CI compile failures.
    
    ## Not included
    
    - Per-environment proxy listeners.
    - Network approval cache or prompt behavior changes.
    - Ambiguous request attribution handling.
    
    Those behavior changes moved to stacked follow-up #28899.
    
    ## Validation
    
    - `just fmt`
    - CI will run tests and clippy
  • Refresh signed exec-server URLs on reconnect (#28374)
    ## Summary
    
    - add a provider API that supplies a fresh signed WebSocket URL for each
    remote exec-server connection
    - refresh the signed URL after disconnects and retry once when a
    handshake returns `401 Unauthorized`
    - allow `EnvironmentManager` consumers to register remote environments
    backed by the URL provider
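
    A minimal sketch of the retry-once rule, assuming a hypothetical `ConnectError` enum and closures standing in for the URL provider and the WebSocket connect call:

    ```rust
    #[derive(Debug, PartialEq)]
    enum ConnectError {
        /// The handshake returned `401 Unauthorized`.
        Unauthorized,
        Other(String),
    }

    /// Asks the provider for a fresh signed URL before every attempt and
    /// retries exactly once when the first handshake is unauthorized.
    fn connect_with_fresh_url<C>(
        mut fresh_url: impl FnMut() -> String,
        mut connect: impl FnMut(&str) -> Result<C, ConnectError>,
    ) -> Result<C, ConnectError> {
        match connect(&fresh_url()) {
            Err(ConnectError::Unauthorized) => connect(&fresh_url()),
            other => other,
        }
    }
    ```

    Other failures are returned without a second attempt, so the sketch does not add any retry policy beyond the single refresh.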
    
    ## Tests
    
    - `just test -p codex-exec-server -E
    'test(remote_websocket_client_refreshes_url_after_unauthorized_handshake)
    | test(remote_websocket_client_refreshes_url_after_disconnect)'` — 2
    passed
    - `cargo check -p codex-core-api` — passed
    - `just fix -p codex-exec-server` — passed
    - `just fix -p codex-core-api` — no test targets; no-op
    - `just fmt` — passed
    - `just test -p codex-exec-server` — 187 passed; 32 unrelated macOS
    sandbox tests could not invoke nested `sandbox-exec` (`Operation not
    permitted`)
  • feat(exec-server): add Noise rendezvous environment (#28774)
    ## Why
    
    Codex can run a remote exec server through the Noise relay, but the
    normal
    environment-manager path could not establish an
    environment-registry-backed
    harness connection. Signed rendezvous URLs and harness authorizations
    are
    short-lived, so reconnects must fetch a fresh bundle instead of
    retaining
    stale connection credentials. A stalled registry request must also fail
    within
    the regular remote connection deadline, without exposing these
    credentials in
    debug logs.
    
    Issue: N/A (internal environment-service integration).
    
    ## What Changed
    
    - Add environment-manager configuration for a registry-backed Noise
    rendezvous environment.
    - Request a fresh bundle from
    `/cloud/environment/{environment_id}/connect` for every physical harness
    connection, using the existing 10-second remote connection timeout.
    - Share the Environment Registry register, connect, and validate wire
    payloads through `codex-exec-server` and `codex-core-api`.
    - Redact the signed rendezvous URL and harness authorization from the
    public connect response's `Debug` output.
    - Add focused coverage for registry bundle retrieval, stalled requests,
    and credential redaction.
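
    One common way to redact fields from `Debug` output is a hand-written `fmt::Debug` implementation, sketched here. The struct and its field names are hypothetical and do not reflect the actual wire payload.

    ```rust
    use std::fmt;

    /// Hypothetical connect response; field names are illustrative.
    struct ConnectResponse {
        environment_id: String,
        rendezvous_url: String,
        harness_authorization: String,
    }

    impl fmt::Debug for ConnectResponse {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            // Print a placeholder in place of each credential.
            f.debug_struct("ConnectResponse")
                .field("environment_id", &self.environment_id)
                .field("rendezvous_url", &"<redacted>")
                .field("harness_authorization", &"<redacted>")
                .finish()
        }
    }
    ```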
  • exec-server: expose environment registry payloads (#28651)
    ## Why
    
    Services that proxy the exec-server environment registry endpoints need
    to deserialize and forward the same Noise registration and harness-key
    validation payloads. Those wire models currently live as private,
    serialize-only structs in `exec-server`, which forces consumers to
    duplicate the contract.
    
    ## What changed
    
    - Add owned serde models for registration and harness-key validation
    requests and responses.
    - Use those models in the existing exec-server registry client.
    - Re-export the models from `codex-exec-server` and `codex-core-api`.
    - Keep the harness authorization request free of a derived `Debug`
    implementation so it is not accidentally logged.
    
    ## Testing
    
    - Focused exec-server registration and harness-key validation tests: 2
    passed.
    - `cargo check -p codex-core-api`
    
    The full `codex-exec-server` suite compiled and ran 254 tests: 222
    passed, while 32 existing filesystem sandbox tests could not run under
    the nested macOS sandbox (`sandbox_apply: Operation not permitted`).
    
    Co-authored-by: Codex <noreply@openai.com>
  • unified-exec: preserve PathUri through exec-server (#28681)
    ## Why
    
    It should be possible for app-server to handle "foreign" OS paths in
    unified_exec working directories so that, for example, a Linux
    app-server can run processes on a Windows exec-server.
    
    ## What
    
    Convert the core unified_exec cwd values to use `PathUri`.
    
    Adds fallible path conversion in several places to limit the scope of
    this change. Errors from converting `PathUri` to an `AbsolutePathBuf`
    are suppressed only when the turn is configured with no sandboxing at
    all, which lets us keep making progress on testing without
    sandboxing.
    
    Future changes to apply_patch and sandboxing will clean up these error
    paths.
    
    A tool's cwd is resolved by joining a model-provided workdir to the
    environment's cwd. When using `AbsolutePathBuf::join()`, an
    absolute-path workdir would overwrite the environment's cwd and we would
    resolve permissions/sandboxing against the model-provided path. This
    change extends `PathUri::join()` to also treat an absolute rhs as an
    override of the base/lhs.
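
    The override rule can be illustrated with a small string-based model. This is not `PathUri`'s implementation: the helper names are hypothetical, paths are plain strings rather than `file:` URIs, and the separator choice is a simplification for the example.

    ```rust
    /// Treats POSIX roots (`/...`) and Windows drive roots (`C:\...` or
    /// `C:/...`) as absolute, independent of the host OS.
    fn is_absolute_foreign(path: &str) -> bool {
        let bytes = path.as_bytes();
        path.starts_with('/')
            || (bytes.len() >= 3
                && bytes[0].is_ascii_alphabetic()
                && bytes[1] == b':'
                && (bytes[2] == b'\\' || bytes[2] == b'/'))
    }

    /// An absolute workdir overrides the environment cwd; a relative one
    /// is appended to it.
    fn resolve_tool_cwd(env_cwd: &str, workdir: &str) -> String {
        if is_absolute_foreign(workdir) {
            return workdir.to_string();
        }
        let sep = if env_cwd.contains('\\') { '\\' } else { '/' };
        format!("{}{}{}", env_cwd.trim_end_matches(sep), sep, workdir)
    }
    ```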
    
    This also removes some coverage from the remove_env_windows tests until
    a follow-up converts foreign paths in command exec events correctly.
    
    ## Breaking Changes
    
    When using `AbsolutePathBuf::join()` for workdir resolution, we ended up
    resolving tilde-prefixed paths against the app-server's `$HOME`, e.g.
    `~/foo/bar` becomes `/home/anp/foo/bar`. It's difficult to do this with
    `PathUri` joining, so after offline discussion this PR no longer
    implements it.
    
    A quick check of some power users' rollouts suggests that models don't
    actually generate home-prefixed absolute working directories for their
    spawns, so this shouldn't have any real blast radius.
  • Run fs helper through Windows sandbox wrapper (#28359)
    ## Why
    
    This is the final PR in the Windows fs-helper sandbox stack and contains
    the actual bug fix.
    
    The exec-server filesystem helper is a direct-spawn path: it asks
    `SandboxManager` for a `SandboxExecRequest`, then launches the returned
    argv itself. That works on macOS and Linux because the transformed argv
    is already a self-contained sandbox wrapper. On Windows, the transformed
    request carried `WindowsRestrictedToken` metadata, but the direct-spawn
    fs-helper runner still launched the helper argv directly.
    
    That means Windows filesystem built-ins backed by the fs-helper could
    run with the parent Codex process permissions instead of the configured
    Windows sandbox. This PR makes the direct-spawn transform produce a
    self-contained Windows wrapper argv before fs-helper launches it.
    
    ## What Changed
    
    - Added `SandboxManager::transform_for_direct_spawn()` for callers that
    launch the returned argv themselves.
    - Wrapped Windows restricted-token direct-spawn requests with `codex.exe
    --run-as-windows-sandbox` and then marked the outer request as
    unsandboxed, matching the macOS/Linux wrapper argv shape.
    - Updated `exec-server/src/fs_sandbox.rs` to use the direct-spawn
    transform for fs-helper launches.
    - Materialized the inner `codex.exe --codex-run-as-fs-helper` executable
    into `.sandbox-bin` so the sandboxed user can run it.
    - Carried runtime workspace roots through `FileSystemSandboxContext` as
    `PathUri` values so `:workspace_roots` policies resolve correctly
    without sending native client paths over exec-server JSON.
    - Preserved wrapper setup identity environment needed by Windows sandbox
    setup without changing the serialized inner helper environment.
    
    ## Verification
    
    - `just bazel-lock-update`
    - `just bazel-lock-check`
    - `just test -p codex-sandboxing transform_for_direct_spawn_windows`
    - `just test -p codex-exec-server fs_sandbox::tests`
    - `just fix -p codex-windows-sandbox -p codex-sandboxing -p
    codex-exec-server -p codex-core -p codex-file-system`
    
    Local note: `just fmt` completed Rust formatting, but this workstation
    still fails the non-Rust formatter phases because uv cannot open its
    cache and the local buildifier/dotslash path is missing.
  • Back off registry retries during exec recovery (#28546)
    ## Why
    
    PR #28512 retries a failed session recovery every 100 ms. Every Noise
    recovery attempt first asks the environment registry for a fresh
    connection bundle, even when the eventual failure comes from the
    WebSocket or initialize handshake. During an outage, that could make
    each disconnected client call the registry about 250 times during the
    25-second recovery window.
    
    ## What changes
    
    All retryable Noise recovery failures now use a separate backoff
    schedule:
    
    ```text
    base:    500 ms -> 1 s -> 2 s -> 4 s -> 5 s maximum
    actual:  500-750 ms, 1-1.5 s, 2-3 s, 4-6 s, 5-7.5 s
    ```
    
    The extra 0-50% is deterministic per-session jitter so disconnected
    clients do not retry together. Direct WebSocket recovery keeps the
    existing 100 ms retry because it does not re-enter the registry.
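
    The schedule can be sketched as below. How the real jitter is derived is not stated here, so this example assumes a fixed per-session fraction obtained by hashing a session id with the standard library's `DefaultHasher`; all function names are hypothetical.

    ```rust
    use std::collections::hash_map::DefaultHasher;
    use std::hash::{Hash, Hasher};
    use std::time::Duration;

    const BASE_MS: u64 = 500;
    const MAX_BASE_MS: u64 = 5_000;

    /// 500 ms, 1 s, 2 s, 4 s, then capped at 5 s.
    fn base_delay_ms(attempt: u32) -> u64 {
        BASE_MS
            .saturating_mul(1u64 << attempt.min(16))
            .min(MAX_BASE_MS)
    }

    /// Assumed jitter source: a stable value in 0..=500 (0% to 50%) per session.
    fn jitter_permille(session_id: &str) -> u64 {
        let mut hasher = DefaultHasher::new();
        session_id.hash(&mut hasher);
        hasher.finish() % 501
    }

    /// Base delay plus the session's extra 0-50%.
    fn retry_delay(session_id: &str, attempt: u32) -> Duration {
        let base = base_delay_ms(attempt);
        Duration::from_millis(base + base * jitter_permille(session_id) / 1000)
    }
    ```

    With the 5 s cap and at most 50% jitter, the longest delay in this sketch is 7.5 s, matching the table above.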
  • Resume exec-server sessions after disconnect (#28512)
    Supersedes #28288 (closed).
    
    ## Why
    
    A short WebSocket interruption currently ends every client-side process
    handle, even though exec-server keeps the server session and its
    processes alive for a short time.
    
    This is especially visible for executor-backed stdio MCP servers: a
    temporary connection loss becomes a permanent `Transport closed` error.
    The server already has the information needed to resume the session, but
    the client opens a fresh session instead of using it.
    
    This change reconnects below the process and MCP layers. Existing
    process handles stay valid, missed output is recovered, and the same
    server-side processes continue running.
    
    ## State machine
    
    One logical `ExecServerClient` stays alive while its underlying RPC
    connection changes generations.
    
    ```text
                             transport closes
           +------------------------------------------------+
           |                                                v
    +-------------+                                  +-------------+
    |  Connected  |                                  | Recovering  |
    +-------------+                                  +-------------+
           ^                                                |
           | session resumed, processes caught up           | retryable error
           +------------------------------------------------+ loops until deadline
                                                            |
                                                            | deadline or permanent error
                                                            v
                                                      +-------------+
                                                      |   Failed    |
                                                      +-------------+
    ```
    
    ### `Connected`
    
    - New RPC calls use the current connection.
    - Process notifications are published in sequence order.
    - A disconnect only starts recovery if it came from the current
    connection generation. Late events from older generations cannot replace
    the active connection.
    
    ### `Recovering`
    
    - New calls wait instead of choosing a half-connected RPC client.
    - Existing process handles, wake subscriptions, and event subscriptions
    stay open.
    - Streaming HTTP response bodies fail immediately because their byte
    streams cannot be resumed safely.
    - Recovery first waits for process starts that were already in flight. A
    start whose result became ambiguous is cleaned up after reconnection
    instead of being silently adopted.
    - The client reconnects with the learned `session_id`. The server may
    briefly report that the old connection is still attached, so that error
    is retried until the detach finishes.
    - The notification consumer starts before the resume handshake
    completes. This prevents a busy process from filling the notification
    queue and blocking the initialize response.
    - Before installing the new connection, the client catches up every
    recoverable process with `process/read`.
    
    ### `Failed`
    
    - Recovery stops after 25 seconds or after a permanent error.
    - Waiting calls are released with one stable disconnect error.
    - Existing process sessions receive a terminal failure instead of
    waiting forever.
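
    The three states and their transitions can be written as a small pure function. The enum and event names are hypothetical, and the sketch leaves out everything the states actually do (queued calls, catch-up reads, deadlines).

    ```rust
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum ConnState {
        Connected,
        Recovering,
        Failed,
    }

    #[derive(Debug, Clone, Copy)]
    enum Event {
        /// A disconnect, tagged with whether it came from the current generation.
        TransportClosed { from_current_generation: bool },
        RetryableError,
        Resumed,
        DeadlineOrPermanentError,
    }

    fn next_state(state: ConnState, event: Event) -> ConnState {
        use ConnState::*;
        match (state, event) {
            // Only a disconnect from the current generation starts recovery.
            (Connected, Event::TransportClosed { from_current_generation: true }) => Recovering,
            (Recovering, Event::RetryableError) => Recovering,
            (Recovering, Event::Resumed) => Connected,
            (Recovering, Event::DeadlineOrPermanentError) => Failed,
            // Late events from older generations and anything after `Failed` change nothing.
            (unchanged, _) => unchanged,
        }
    }
    ```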
    
    ## Recovering process events
    
    Output, exit, and close events share one sequence. During normal
    operation, the client buffers early events until every lower sequence
    has been published.
    
    After reconnection, the client reads each process starting after its
    last published sequence:
    
    1. Retained output chunks are inserted by sequence number.
    2. Exit and close state are reconstructed in their sequence positions.
    3. Events already received as live notifications are ignored as
    duplicates.
    4. Newly contiguous events are published in order.
    5. If the server no longer retains enough output to fill a sequence gap,
    only that process is terminated and failed. The recovered connection
    remains usable for other processes.
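
    The steps above amount to a per-process reorder buffer. A minimal
    sketch, assuming sequences start at 1 and events are plain strings;
    `EventBuffer` and its methods are hypothetical and not taken from the
    PR:

    ```rust
    use std::collections::BTreeMap;

    /// Hypothetical per-process reorder buffer.
    struct EventBuffer {
        /// Highest sequence number already published (0 means none yet).
        last_published: u64,
        /// Events received ahead of a gap, keyed by sequence number.
        pending: BTreeMap<u64, String>,
    }

    impl EventBuffer {
        fn new() -> Self {
            EventBuffer { last_published: 0, pending: BTreeMap::new() }
        }

        /// Accepts a live or recovered event and returns the events that
        /// became publishable, in sequence order.
        fn insert(&mut self, seq: u64, event: &str) -> Vec<String> {
            // Already published or already buffered: ignore as a duplicate.
            if seq <= self.last_published || self.pending.contains_key(&seq) {
                return Vec::new();
            }
            self.pending.insert(seq, event.to_string());
            let mut ready = Vec::new();
            while let Some(next) = self.pending.remove(&(self.last_published + 1)) {
                self.last_published += 1;
                ready.push(next);
            }
            ready
        }

        /// After catch-up, reports whether sequences below the server's next
        /// sequence are still missing, meaning the gap cannot be filled.
        fn has_unfillable_gap(&self, server_next_seq: u64) -> bool {
            self.last_published + 1 < server_next_seq
        }
    }
    ```

    In this sketch a process whose buffer still reports a gap after the
    catch-up read would be the one that is terminated and failed, while
    buffers for other processes are unaffected.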
    
    The server reports its full next event sequence for unbounded reads,
    including exit and close events. Closed processes remain readable for
    the same 30-second window used to retain detached sessions.
    
    ## Other details
    
    - Detached server sessions are retained for 30 seconds, leaving margin
    around the client's 25-second recovery deadline.
    - Session attach and detach update the active notification sender under
    the same attachment lock, so an old connection cannot clear a newly
    attached sender.
    - A dedicated error code distinguishes the temporary "session is still
    attached" race from permanent initialization errors.
    - Process starts are identity-checked on both client and server. Cleanup
    from an older start cannot remove a newer process that reused the same
    ID.
    - Mutating requests that were already in flight when the transport
    closed are not replayed, because the client cannot know whether the
    server applied them. Requests started after recovery is known wait for
    the replacement connection.
    - We assume the server and client versions stay in sync: both are
    either before or after this PR.
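
    The identity check on process starts can be illustrated with a small
    registry in which each start receives its own token. This is a sketch
    under assumed names (`ProcessRegistry`, `start`, `cleanup`); the PR
    does not show how identity is actually represented.

    ```rust
    use std::collections::HashMap;

    /// Hypothetical registry: process ID -> token of the start that owns it.
    struct ProcessRegistry {
        next_token: u64,
        entries: HashMap<String, u64>,
    }

    impl ProcessRegistry {
        fn new() -> Self {
            ProcessRegistry { next_token: 0, entries: HashMap::new() }
        }

        /// Registers a start, replacing any earlier entry for the same ID,
        /// and returns the token that identifies this particular start.
        fn start(&mut self, process_id: &str) -> u64 {
            self.next_token += 1;
            self.entries.insert(process_id.to_string(), self.next_token);
            self.next_token
        }

        /// Removes the entry only if it still belongs to the given start.
        fn cleanup(&mut self, process_id: &str, token: u64) -> bool {
            if self.entries.get(process_id) == Some(&token) {
                self.entries.remove(process_id);
                true
            } else {
                false
            }
        }
    }
    ```

    Cleanup that carries the token of an older start leaves a newer
    process with the same ID in place.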
    
    ## User impact
    
    Long-running commands and stdio MCP servers can survive a temporary
    exec-server WebSocket interruption without changing process IDs or
    losing output produced during the outage.
  • [codex] exec-server: stream files in chunks (#28354)
    ## Why
    
    `fs/readFile` buffers the entire file in one response, which makes large
    remote reads expensive and prevents callers from applying backpressure.
    We need an opt-in streaming path with bounded block sizes while
    preserving the existing single-call API for small and sandboxed reads.
    
    ## What changed
    
    - Add `ExecServerClient::stream`, returning a named `FileReadStream`
    that implements `futures::Stream` and yields immutable 1 MiB byte
    blocks.
    - Add internal `fs/open`, `fs/readBlock`, and `fs/close` RPCs.
    `fs/readBlock` accepts an explicit offset and length.
    - Keep unsandboxed files open between block reads, cap open handles per
    connection, and clean them up on EOF, error, stream drop, explicit
    close, or connection shutdown.
    - Reject platform-sandboxed streaming opens instead of turning the
    one-shot sandbox helper into a persistent server. Existing `fs/readFile`
    behavior is unchanged.
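
    To illustrate block reads with an explicit offset and length, here is
    a minimal sketch over any `Read + Seek` source. `read_block` and
    `read_all_blocks` are hypothetical helpers, not the RPC handlers added
    by this PR; only the 1 MiB block size comes from the description.

    ```rust
    use std::io::{self, Read, Seek, SeekFrom};

    /// Block size used by the stream described above (1 MiB).
    const BLOCK_SIZE: usize = 1024 * 1024;

    /// Reads up to `len` bytes starting at `offset`. A short or empty
    /// result means the end of the file was reached.
    fn read_block<R: Read + Seek>(file: &mut R, offset: u64, len: usize) -> io::Result<Vec<u8>> {
        file.seek(SeekFrom::Start(offset))?;
        let mut block = Vec::with_capacity(len.min(BLOCK_SIZE));
        file.by_ref().take(len as u64).read_to_end(&mut block)?;
        Ok(block)
    }

    /// Reads the whole source as consecutive blocks, stopping at the first
    /// empty block so an exact block-boundary EOF yields no empty chunk.
    fn read_all_blocks<R: Read + Seek>(file: &mut R, block_size: usize) -> io::Result<Vec<Vec<u8>>> {
        let mut blocks = Vec::new();
        let mut offset = 0u64;
        loop {
            let block = read_block(file, offset, block_size)?;
            if block.is_empty() {
                break;
            }
            offset += block.len() as u64;
            blocks.push(block);
        }
        Ok(blocks)
    }
    ```

    Because each call names its own offset, the same helper also serves
    non-sequential reads; a real stream would pull one block at a time so
    the caller can apply backpressure.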
    
    ## Testing
    
    - `just test -p codex-exec-server`
    - Integration coverage for 1 MiB chunking, exact block-boundary EOF,
    sandbox rejection, and continued reads from the opened file after path
    replacement.
    - Handle-manager coverage for non-sequential offsets, variable block
    lengths, the 128-handle limit, and capacity release after close.
  • path-uri: clarify invalid host path errors (#28473)
    ## Why
    
    Ensure a consistent string format when exposing path conversion errors
    to the model.
    
    ## What
    
    - Render `PathUriParseError::InvalidFileUriPath` as `'$PATH' is invalid
    on '$OS'`.
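
    A sketch of that rendering, assuming the variant carries the path and
    OS name as strings. The field names are hypothetical, and the real
    `PathUriParseError` may be shaped differently.

    ```rust
    use std::fmt;

    /// Hypothetical shape of the error; only the variant name and message
    /// format come from the description above.
    #[derive(Debug)]
    enum PathUriParseError {
        InvalidFileUriPath { path: String, os: String },
    }

    impl fmt::Display for PathUriParseError {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            match self {
                PathUriParseError::InvalidFileUriPath { path, os } => {
                    write!(f, "'{path}' is invalid on '{os}'")
                }
            }
        }
    }
    ```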
  • [codex] Use expect in integration tests (#28441)
    The workspace denies `clippy::expect_used` in production. Although
    `clippy.toml` allows `expect` in tests, Bazel Clippy compiles
    integration-test helper code in a way that does not receive that
    exemption, which encouraged verbose `unwrap_or_else(... panic!(...))`
    and equivalent `match`/`let else` forms.
    
    This allows `clippy::expect_used` once at each integration-test crate
    root (including aggregated suites and test-support libraries), then
    replaces manual panic-based Result and Option unwraps with
    `expect`/`expect_err`. Standalone `tests/*.rs` files remain their own
    crate roots. Intentional assertion and unexpected-variant panics remain
    unchanged, and the production `expect_used = "deny"` lint remains in
    place.
    
    The cleanup is mechanical and net-negative in line count.
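
    As an illustration of the mechanical change, the two helpers below
    behave the same on success. The function names are hypothetical, and
    the lint is allowed on a module here only so the sketch can be pasted
    anywhere; at an integration-test crate root the equivalent would be
    the inner attribute `#![allow(clippy::expect_used)]`.

    ```rust
    #[allow(clippy::expect_used)]
    mod parse_helpers {
        /// Verbose form that the cleanup replaces.
        pub fn port_verbose(raw: &str) -> u16 {
            raw.parse::<u16>()
                .unwrap_or_else(|err| panic!("port should parse: {err}"))
        }

        /// Equivalent form once `expect` is allowed in the test crate.
        pub fn port_with_expect(raw: &str) -> u16 {
            raw.parse::<u16>().expect("port should parse")
        }

        /// `expect_err` covers the case where failure is the expected result.
        pub fn parse_failure(raw: &str) -> std::num::ParseIntError {
            raw.parse::<u16>().expect_err("non-numeric port should fail")
        }
    }
    ```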
  • exec-server: default remote transport to Noise (#26245)
    ## Why
    
    The transport in
    [openai/codex#26242](https://github.com/openai/codex/pull/26242) needs
    to be used by every remote orchestrator-to-executor connection before
    JSON-RPC traffic starts.
    
    ## Changes
    
    - Generates one executor Noise identity when remote exec-server starts
    and registers its public key.
    - Creates a harness identity for each physical remote environment
    connection.
    - Fetches a fresh registry bundle before connecting and validates the
    authenticated harness key before completing the executor handshake.
    - Multiplexes encrypted logical streams over the existing executor
    WebSocket.
    - Adds bounded stream, handshake-failure, and reassembly state.
    - Adds safe lifecycle diagnostics without logging keys, authorizations,
    plaintext, or ciphertext.
    - Covers reconnects, replay rejection, validation failure, framing
    limits, and encrypted JSON-RPC tool traffic.
    
    ## Stack
    
    1. [openai/codex#26242](https://github.com/openai/codex/pull/26242):
    Noise channel and relay transport
    2. **[openai/codex#26245](https://github.com/openai/codex/pull/26245)**:
    remote registration and runtime activation
    
    ## Verification
    
    - `just test -p codex-exec-server`
    - `just fix -p codex-exec-server`
    - `just bazel-lock-check`
    - `cargo shear`
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • Run core integration tests against a Wine-backed Windows executor (#28401)
    ## Why
    
    We want to exercise a Linux app-server against a Windows exec-server
    without having to repeat every test case. This approach has some
    precedent in the remote Docker test setup.
    
    ## What
    
    Run the shared `codex-core` integration suite against Windows
    exec-server behavior from Linux. This makes cross-OS path and shell
    regressions visible while keeping unsupported cases owned by individual
    tests.
    
    - Add `local`, `docker`, and `wine-exec` test environment selection with
    legacy Docker compatibility.
    - Extend `codex_rust_crate` to generate a sharded Wine-exec variant
    using a cross-built Windows server and pinned Bazel Wine/PowerShell
    runtimes.
    - Teach remote-aware helpers about Windows paths and track temporary
    incompatibilities with source-local `skip_if_wine_exec!` calls and
    follow-up reasons.
  • Use PathUri in filesystem permission paths for exec-server (#28165)
    ## Why
    
    Progress towards letting app-server and exec-server run on different
    platforms, specifically for sandbox configuration.
    
    ## What
    
    - Make the filesystem path containment hierarchy generic, defaulting to
    `AbsolutePathBuf` for now.
    - Have clients specify `AbsolutePathBuf` or `PathUri` directly where
    needed.
    - Use `PathUri` throughout exec-server filesystem protocol and trait
    boundaries.
    - Implement `From` for conversion to path URIs and `TryFrom` for
    fallible conversion to absolute paths through the generic type
    hierarchy.
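
    The infallible and fallible directions can be sketched with simplified
    stand-in types. This is not the crate's implementation: `AbsolutePath`
    and `PathUri` below are hypothetical string wrappers, the host is
    assumed to be POSIX, and percent-encoding and URI host components are
    ignored.

    ```rust
    use std::convert::TryFrom;

    /// Stand-in for an absolute POSIX path (hypothetical, string-backed).
    #[derive(Debug, Clone, PartialEq)]
    struct AbsolutePath(String);

    /// Stand-in for a `file:` URI (hypothetical, string-backed).
    #[derive(Debug, Clone, PartialEq)]
    struct PathUri(String);

    /// Any absolute host path can be expressed as a URI, so this cannot fail.
    impl From<AbsolutePath> for PathUri {
        fn from(path: AbsolutePath) -> Self {
            PathUri(format!("file://{}", path.0))
        }
    }

    /// A URI may describe a path from another OS, so this direction can fail.
    impl TryFrom<PathUri> for AbsolutePath {
        type Error = String;

        fn try_from(uri: PathUri) -> Result<Self, Self::Error> {
            let path = uri
                .0
                .strip_prefix("file://")
                .ok_or_else(|| format!("not a file URI: {}", uri.0))?;
            // A Windows drive path appears as `/C:/...` in a file URI.
            let bytes = path.as_bytes();
            let looks_windows = bytes.len() >= 3
                && bytes[0] == b'/'
                && bytes[1].is_ascii_alphabetic()
                && bytes[2] == b':';
            if !path.starts_with('/') || looks_windows {
                return Err(format!("'{path}' is not a POSIX absolute path"));
            }
            Ok(AbsolutePath(path.to_string()))
        }
    }
    ```

    Keeping the fallible step in `TryFrom` pushes the "is this path native
    to this host?" decision to the boundary where a native path is
    actually required.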
  • exec-server: add Noise relay transport (#26242)
    ## Why
    
    Rendezvous forwards traffic between the orchestrator and exec-server.
    The endpoints need to authenticate each other and encrypt that traffic
    without trusting Rendezvous with plaintext or endpoint keys.
    
    ## Changes
    
    - Adds a hybrid Noise IK channel through Clatter using X25519,
    ML-KEM-768, AES-256-GCM, and SHA-256.
    - Binds each handshake to `environment_id`, `executor_registration_id`,
    and `stream_id`.
    - Pins the registry-provided executor key and carries the harness
    authorization inside the encrypted handshake.
    - Orders relay frames before consuming Noise nonces and fragments large
    JSON-RPC messages into bounded records.
    - Bounds handshake payloads, frames, streams, and message reassembly.
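
    Fragmenting a large message into bounded records, with a capped
    reassembly buffer on the receiving side, might look roughly like this.
    The `Record` layout, the `last` flag, and the limits are assumptions
    for illustration; encryption of each record is omitted, and the PR's
    actual framing is not shown in this description.

    ```rust
    /// Hypothetical record: a payload slice plus an end-of-message marker.
    #[derive(Debug, Clone, PartialEq)]
    struct Record {
        last: bool,
        payload: Vec<u8>,
    }

    /// Splits `message` into records of at most `max_record` bytes each.
    fn fragment(message: &[u8], max_record: usize) -> Vec<Record> {
        assert!(max_record > 0, "record size must be positive");
        if message.is_empty() {
            return vec![Record { last: true, payload: Vec::new() }];
        }
        let count = message.chunks(max_record).count();
        message
            .chunks(max_record)
            .enumerate()
            .map(|(i, chunk)| Record { last: i + 1 == count, payload: chunk.to_vec() })
            .collect()
    }

    /// Rebuilds messages while refusing to buffer more than `max_message` bytes.
    struct Reassembler {
        max_message: usize,
        buffer: Vec<u8>,
    }

    impl Reassembler {
        /// Returns `Ok(Some(message))` when the final record arrives.
        fn push(&mut self, record: Record) -> Result<Option<Vec<u8>>, &'static str> {
            if self.buffer.len() + record.payload.len() > self.max_message {
                self.buffer.clear();
                return Err("message exceeds reassembly limit");
            }
            self.buffer.extend_from_slice(&record.payload);
            if record.last {
                Ok(Some(std::mem::take(&mut self.buffer)))
            } else {
                Ok(None)
            }
        }
    }
    ```

    The size check runs before any bytes are buffered, so a peer cannot
    force the receiver to hold more than the configured limit.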
    
    Runtime activation is in
    [openai/codex#26245](https://github.com/openai/codex/pull/26245).
    
    ## Stack
    
    1. **[openai/codex#26242](https://github.com/openai/codex/pull/26242)**:
    Noise channel and relay transport
    2. [openai/codex#26245](https://github.com/openai/codex/pull/26245):
    remote registration and runtime activation
    
    ## Verification
    
    - `just test -p codex-exec-server`
    - Oversized initiator payload regression coverage
    - `just fix -p codex-exec-server`
    - `just bazel-lock-check`
    - `cargo shear`
    
    ---------
    
    Co-authored-by: Codex <noreply@openai.com>
  • chore: restore exec-server relay keepalives (#28286)
    ## Why
    
    The ws pump refactor removed the relay keepalive timers that had been
    added to keep idle rendezvous connections alive. An idle relay could
    therefore be closed by the rendezvous service or a load balancer,
    disconnecting executor-backed MCP processes.
    
    ## What
    
    - restore periodic WebSocket ping frames on both rendezvous relay
    endpoints
    - keep missed-tick behavior bounded with `MissedTickBehavior::Skip`
    - cover the harness and remote-environment pumps with focused
    traffic-after-keepalive tests
  • [codex] exec-server honors remote environment cwd and shell (#28122)
    ## Why
    
    The next slice needed to make progress on the `remote_env_windows`
    test is support for passing a Windows cwd for the remote environment
    and using that environment's native shell. This lets the test run a
    real Windows process instead of only recording an early path or shell
    mismatch.
    
    ## What
    
    - change `TurnEnvironmentSelection.cwd` from `AbsolutePathBuf` to
    `PathUri`
    - convert local cwd values to URIs when constructing selections
    - preserve a remote primary cwd instead of replacing it with the local
    legacy fallback
    - prefer the selected environment's discovered shell for unified exec,
    falling back to the session shell when unavailable
    - convert back to a host-native absolute path at current native-only
    consumer boundaries
    - reject or deny unsupported foreign cwd values at the existing
    request-permissions boundary, with TODOs for its future migration
    - extend the hermetic Wine test to execute Windows PowerShell in
    `C:\windows` and verify successful process completion
    - record the current app-server rejection against the same Wine-backed
    remote Windows fixture when its cwd is supplied as a native Windows path
  • build: run buildifier from just fmt (#28125)
    ## Intent
    
    Keep Bazel and Starlark files consistently formatted without requiring
    contributors to install or version buildifier themselves.
    
    ## Implementation
    
    - Add a SHA-256-pinned, cross-platform DotSlash manifest for buildifier
    v8.5.1.
    - Run buildifier from the shared `just fmt` and `just fmt-check` driver,
    with Windows-safe explicit DotSlash invocation.
    - Provision DotSlash in formatting CI and contributor devcontainers, and
    document the source-build prerequisite.
    - Apply the initial mechanical buildifier formatting baseline.
  • [codex] Carry exec-server cwd as PathUri (#28032)
    ## Why
    
    This is the second-to-last place in the exec-server protocol that needs
    to migrate to URIs to support cross-OS operation.
    
    ## What
    
    - Change `ExecParams.cwd` to `PathUri`.
    - Keep the cwd URI-shaped through core and rmcp producers, converting it
    to `AbsolutePathBuf` only in `LocalProcess::start_process`.
    - Reject non-native cwd URIs before launch and update the affected
    protocol documentation and call sites.
  • [codex] Add hermetic Wine exec-server test (#27937)
    ## Why
    
    We want to make it possible for an app-server orchestrator on one OS to
    control an exec-server on another host running a different OS. In
    practice this already works when the two hosts happen to share a path
    format, but we mangle many operations if either end is Windows.
    
    This test starts exercising that interaction, although right now the
    initial bootstrap fails. Future changes will expand the test's
    assertions to match improved support.
    
    ## What
    
    Stacked on #27964. This adds a small Windows exec-server fixture and a
    Linux protocol smoke test using the reusable Wine harness, covering
    Windows environment discovery, non-TTY `cmd.exe` execution, output, exit
    status, and working directory.
    
    Once the full codex binary cross-builds under Bazel, we could consider
    moving to the real binary instead of the stripped-down
    exec-server-only binary used here.