jif cf17e1bc20 Resume exec-server sessions after disconnect (#28512)
Supersedes #28288 (closed).

## Why

A short WebSocket interruption currently ends every client-side process
handle, even though exec-server keeps the server session and its
processes alive for a short time.

This is especially visible for executor-backed stdio MCP servers: a
temporary connection loss becomes a permanent `Transport closed` error.
The server already has the information needed to resume the session, but
the client opens a fresh session instead of using it.

This change reconnects below the process and MCP layers. Existing
process handles stay valid, missed output is recovered, and the same
server-side processes continue running.

## State machine

One logical `ExecServerClient` stays alive while its underlying RPC
connection changes generations.

```text
                         transport closes
       +------------------------------------------------+
       |                                                v
+-------------+                                  +-------------+
|  Connected  |                                  | Recovering  |
+-------------+                                  +-------------+
       ^                                                |
       | session resumed, processes caught up           | retryable error
       +------------------------------------------------+ loops until deadline
                                                        |
                                                        | deadline or permanent error
                                                        v
                                                  +-------------+
                                                  |   Failed    |
                                                  +-------------+
```
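
As a reading aid, the diagram can be written down as a small transition function. This is a minimal sketch, not the client's actual types: `ConnState`, `ConnEvent`, and `next_state` are hypothetical names chosen for illustration, and only the three states and the edge labels come from the diagram.

```rust
/// Hypothetical names; only the three states come from the diagram.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ConnState {
    Connected,
    Recovering,
    Failed,
}

/// Hypothetical event names, one per edge label in the diagram.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ConnEvent {
    TransportClosed,
    SessionResumed,
    RetryableError,
    DeadlineExceeded,
    PermanentError,
}

/// Returns the state that follows `state` when `event` is observed.
fn next_state(state: ConnState, event: ConnEvent) -> ConnState {
    use ConnEvent::*;
    use ConnState::*;
    match (state, event) {
        (Connected, TransportClosed) => Recovering,
        (Recovering, SessionResumed) => Connected,
        // Retryable errors keep the client in recovery until the deadline.
        (Recovering, RetryableError) => Recovering,
        (Recovering, DeadlineExceeded) | (Recovering, PermanentError) => Failed,
        // Every other pair leaves the state unchanged; Failed is terminal.
        (other, _) => other,
    }
}
```

In this sketch `Failed` has no outgoing edges, matching the diagram.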

### `Connected`

- New RPC calls use the current connection.
- Process notifications are published in sequence order.
- A disconnect only starts recovery if it came from the current
connection generation. Late events from older generations cannot replace
the active connection.
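
The generation rule can be illustrated with a small guard. This is a sketch under assumptions: `GenerationGuard` and its methods are hypothetical, and a plain counter stands in for whatever the client actually uses to tell connections apart.

```rust
/// Hypothetical sketch: tracks which connection generation is active.
struct GenerationGuard {
    current: u64,
    recovering: bool,
}

impl GenerationGuard {
    fn new() -> Self {
        Self { current: 0, recovering: false }
    }

    /// Called when a replacement connection is installed.
    /// Returns the generation assigned to that connection.
    fn install_connection(&mut self) -> u64 {
        self.current += 1;
        self.recovering = false;
        self.current
    }

    /// Returns true only if this disconnect should start recovery.
    /// Events from older generations, or repeats while recovery is
    /// already running, are ignored.
    fn on_disconnect(&mut self, generation: u64) -> bool {
        if generation != self.current || self.recovering {
            return false;
        }
        self.recovering = true;
        true
    }
}
```

With this shape, a late close event from generation 1 arriving after generation 2 is installed returns `false` and leaves the active connection alone.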

### `Recovering`

- New calls wait instead of choosing a half-connected RPC client.
- Existing process handles, wake subscriptions, and event subscriptions
stay open.
- Streaming HTTP response bodies fail immediately because their byte
streams cannot be resumed safely.
- Recovery first waits for process starts that were already in flight. A
start whose result became ambiguous is cleaned up after reconnection
instead of being silently adopted.
- The client reconnects with the learned `session_id`. The server may
briefly report that the old connection is still attached, so that error
is retried until the detach finishes.
- The notification consumer starts before the resume handshake
completes. This prevents a busy process from filling the notification
queue and blocking the initialize response.
- Before installing the new connection, the client catches up every
recoverable process with `process/read`.
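
The retry behaviour around the "still attached" race might look roughly like the loop below. It is a sketch, not the real client code: `ResumeError`, `resume_with_retry`, and the fixed backoff are assumptions made for illustration, and the closure stands in for the actual resume handshake.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical error type for a resume attempt.
#[derive(Debug, Clone, PartialEq, Eq)]
enum ResumeError {
    /// Temporary: the server has not finished detaching the old connection.
    StillAttached,
    /// Anything that retrying cannot fix.
    Permanent(String),
    /// The recovery deadline passed before a resume succeeded.
    DeadlineExceeded,
}

/// Calls `attempt` until it succeeds, fails permanently, or the next
/// retry would overrun `deadline`.
fn resume_with_retry<T>(
    mut attempt: impl FnMut() -> Result<T, ResumeError>,
    deadline: Duration,
    backoff: Duration,
) -> Result<T, ResumeError> {
    let started = Instant::now();
    loop {
        match attempt() {
            Ok(value) => return Ok(value),
            Err(ResumeError::StillAttached) => {
                // Only the temporary race is retried.
                if started.elapsed() + backoff >= deadline {
                    return Err(ResumeError::DeadlineExceeded);
                }
                sleep(backoff);
            }
            // Permanent errors end recovery immediately.
            Err(other) => return Err(other),
        }
    }
}
```

A caller would pass the 25-second recovery deadline, for example `resume_with_retry(try_resume, Duration::from_secs(25), Duration::from_millis(100))`, where `try_resume` and the 100 ms backoff are placeholders.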

### `Failed`

- Recovery stops after 25 seconds or after a permanent error.
- Waiting calls are released with one stable disconnect error.
- Existing process sessions receive a terminal failure instead of
waiting forever.
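
One way to picture how calls wait during `Recovering` and are released with a single error on `Failed` is a condition-variable gate. The sketch below uses blocking `std` primitives for brevity; `Gate`, `GateState`, and `Disconnected` are hypothetical names, and the real client may well use async primitives instead.

```rust
use std::sync::{Arc, Condvar, Mutex};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum GateState {
    Connected,
    Recovering,
    Failed,
}

/// The single, stable error every released caller receives.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Disconnected;

/// Hypothetical gate that new RPC calls pass through before sending.
#[derive(Clone)]
struct Gate {
    inner: Arc<(Mutex<GateState>, Condvar)>,
}

impl Gate {
    fn new(state: GateState) -> Self {
        Self { inner: Arc::new((Mutex::new(state), Condvar::new())) }
    }

    /// Updates the state and wakes every waiting caller.
    fn set(&self, state: GateState) {
        let (lock, cv) = &*self.inner;
        *lock.lock().unwrap() = state;
        cv.notify_all();
    }

    /// Blocks while recovering; returns Ok once connected, or the
    /// stable disconnect error once recovery has failed.
    fn wait_ready(&self) -> Result<(), Disconnected> {
        let (lock, cv) = &*self.inner;
        let mut state = lock.lock().unwrap();
        while *state == GateState::Recovering {
            state = cv.wait(state).unwrap();
        }
        match *state {
            GateState::Connected => Ok(()),
            _ => Err(Disconnected),
        }
    }
}
```

Because every waiter reads the same terminal state, all of them observe the same error value rather than whichever transport error happened to occur last.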

## Recovering process events

Output, exit, and close events share one sequence. During normal
operation, the client buffers early events until every lower sequence
has been published.

After reconnection, the client reads each process starting after its
last published sequence:

1. Retained output chunks are inserted by sequence number.
2. Exit and close state are reconstructed in their sequence positions.
3. Events already received as live notifications are ignored as
duplicates.
4. Newly contiguous events are published in order.
5. If the server no longer retains enough output to fill a sequence gap,
only that process is terminated and failed. The recovered connection
remains usable for other processes.
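
The steps above amount to a per-process reorder buffer keyed by sequence number. The following is a minimal sketch under assumptions: `ReorderBuffer`, `ProcessEvent`, and the `oldest_retained` parameter are hypothetical, and the way the server actually reports its retention window is not specified here.

```rust
use std::collections::BTreeMap;

/// Hypothetical event shape; output, exit, and close share one sequence.
#[derive(Debug, Clone, PartialEq, Eq)]
enum ProcessEvent {
    Output(Vec<u8>),
    Exit(i32),
    Closed,
}

/// Buffers early events until every lower sequence has been published.
struct ReorderBuffer {
    /// Sequence number of the next event to publish.
    next_seq: u64,
    pending: BTreeMap<u64, ProcessEvent>,
}

impl ReorderBuffer {
    fn new(first_seq: u64) -> Self {
        Self { next_seq: first_seq, pending: BTreeMap::new() }
    }

    /// Accepts an event from a live notification or a catch-up read and
    /// returns the events that became publishable, in sequence order.
    fn accept(&mut self, seq: u64, event: ProcessEvent) -> Vec<ProcessEvent> {
        // Already published or already buffered: ignore as a duplicate.
        if seq < self.next_seq || self.pending.contains_key(&seq) {
            return Vec::new();
        }
        self.pending.insert(seq, event);
        let mut ready = Vec::new();
        while let Some(next) = self.pending.remove(&self.next_seq) {
            ready.push(next);
            self.next_seq += 1;
        }
        ready
    }

    /// True if the server's oldest retained sequence is past the next
    /// one needed, so the gap can never be filled for this process.
    fn has_unfillable_gap(&self, oldest_retained: u64) -> bool {
        self.next_seq < oldest_retained
    }
}
```

Keeping one buffer per process is what lets a gap fail only that process while the others continue on the recovered connection.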

The server reports its full next event sequence for unbounded reads,
including exit and close events. Closed processes remain readable for
the same 30-second window used to retain detached sessions.

## Other details

- Detached server sessions are retained for 30 seconds, leaving margin
around the client's 25-second recovery deadline.
- Session attach and detach update the active notification sender under
the same attachment lock, so an old connection cannot clear a newly
attached sender.
- A dedicated error code distinguishes the temporary "session is still
attached" race from permanent initialization errors.
- Process starts are identity-checked on both client and server. Cleanup
from an older start cannot remove a newer process that reused the same
ID.
- Mutating requests that were already in flight when the transport
closed are not replayed, because the client cannot know whether the
server applied them. Requests started after recovery is known wait for
the replacement connection.
- We assume the client and server versions stay in sync: both are either
before or after this PR.
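
The identity check on process starts can be sketched as a table that stores a per-start token next to each process ID. `ProcessTable`, `StartToken`, and the counter-based token are assumptions for illustration; the source only states that cleanup from an older start must not remove a newer process with the same ID.

```rust
use std::collections::HashMap;

/// Hypothetical token that is unique per start attempt, even when the
/// process ID string is reused.
type StartToken = u64;

#[derive(Default)]
struct ProcessTable {
    next_token: StartToken,
    entries: HashMap<String, StartToken>,
}

impl ProcessTable {
    /// Registers a start under `process_id` and returns its identity token.
    fn register(&mut self, process_id: &str) -> StartToken {
        self.next_token += 1;
        self.entries.insert(process_id.to_string(), self.next_token);
        self.next_token
    }

    /// Removes the entry only if it still belongs to the given start.
    /// Returns true if something was removed.
    fn cleanup(&mut self, process_id: &str, token: StartToken) -> bool {
        if self.entries.get(process_id) == Some(&token) {
            self.entries.remove(process_id);
            true
        } else {
            false
        }
    }
}
```

Here a delayed cleanup that carries the token of an earlier start is a no-op once the ID has been registered again.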

## User impact

Long-running commands and stdio MCP servers can survive a temporary
exec-server WebSocket interruption without changing process IDs or
losing output produced during the outage.
cf17e1bc20 · 2026-06-17 10:20:39 +02:00

Codex CLI is a coding agent from OpenAI that runs locally on your computer.



If you want Codex in your code editor (VS Code, Cursor, Windsurf), install in your IDE.
If you want the desktop app experience, run `codex app` or visit the Codex App page.
If you are looking for the cloud-based agent from OpenAI, Codex Web, go to chatgpt.com/codex.


Quickstart

Installing and running Codex CLI

Run the following on Mac or Linux to install Codex CLI:

curl -fsSL https://chatgpt.com/codex/install.sh | sh

Run the following on Windows to install Codex CLI:

powershell -ExecutionPolicy ByPass -c "irm https://chatgpt.com/codex/install.ps1 | iex"

Codex CLI can also be installed via the following package managers:

# Install using npm
npm install -g @openai/codex
# Install using Homebrew
brew install --cask codex

Then run `codex` to get started.

You can also go to the latest GitHub Release and download the appropriate binary for your platform.

Each GitHub Release contains many executables, but in practice, you likely want one of these:

  • macOS
    • Apple Silicon/arm64: codex-aarch64-apple-darwin.tar.gz
    • x86_64 (older Mac hardware): codex-x86_64-apple-darwin.tar.gz
  • Linux
    • x86_64: codex-x86_64-unknown-linux-musl.tar.gz
    • arm64: codex-aarch64-unknown-linux-musl.tar.gz

Each archive contains a single entry with the platform baked into the name (e.g., codex-x86_64-unknown-linux-musl), so you likely want to rename it to codex after extracting it.

Using Codex with your ChatGPT plan

Run `codex` and select Sign in with ChatGPT. We recommend signing into your ChatGPT account to use Codex as part of your Plus, Pro, Business, Edu, or Enterprise plan. Learn more about what's included in your ChatGPT plan.

You can also use Codex with an API key, but this requires additional setup.

Docs

This repository is licensed under the Apache-2.0 License.
