mirror of
https://github.com/pchuan98/codex.git
synced 2026-07-01 00:31:56 +08:00
b128da272e
## Why

Remote-control app-server sessions can reconnect every 5-7 seconds when the shared transport-event queue fills. The queue's consumer handled `ConnectionClosed` by awaiting all in-flight RPCs for the disconnected connection. A stuck RPC therefore blocked processing of replacement connection and initialize events until remote-control forwarding hit its five-second timeout and reconnected again.

Related issue: N/A (internal remote-control incident investigation).

## What Changed

- Split fast RPC admission closure from draining: `ConnectionRpcGate::close()` rejects queued and future RPCs, while `shutdown()` continues waiting for RPCs that already started.
- Close a disconnected connection's RPC gate before spawning the existing RPC drain and resource cleanup in a tracked background task, so the transport-event consumer remains available without waiting for active RPCs.
- Reap completed cleanup tasks during normal operation, drain them during graceful shutdown, and abort them during forced shutdown.
- Add regression coverage for closing with an active RPC, rejecting post-close requests without polling them, and preserving the existing shutdown wait behavior.

## Verification

`just test -p codex-app-server --lib connection_rpc_gate` passes all 6 tests, including the new close-versus-drain regression coverage.
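## Illustrative sketch

The close-versus-drain split described under "What Changed" can be sketched roughly as follows. This is not the real `ConnectionRpcGate`: the type `RpcGate`, the permit type, `try_enter`, `active`, and `demo` are hypothetical names, and the sketch uses blocking `std` threads with a `Condvar` rather than the async runtime the app server presumably uses. It also models only rejection of future RPCs, not of already queued ones.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

// Hypothetical gate state: whether admission is closed, and how many
// admitted RPCs are still running.
struct GateState {
    closed: bool,
    active: usize,
}

// Hypothetical stand-in for the gate; cloning shares the same state.
#[derive(Clone)]
struct RpcGate {
    inner: Arc<(Mutex<GateState>, Condvar)>,
}

// Held while an admitted RPC runs; dropping it marks the RPC finished.
struct RpcPermit {
    inner: Arc<(Mutex<GateState>, Condvar)>,
}

impl Drop for RpcPermit {
    fn drop(&mut self) {
        let (lock, cvar) = &*self.inner;
        let mut state = lock.lock().unwrap();
        state.active -= 1;
        // Wake any thread blocked in `shutdown`.
        cvar.notify_all();
    }
}

impl RpcGate {
    fn new() -> Self {
        let state = GateState { closed: false, active: 0 };
        RpcGate { inner: Arc::new((Mutex::new(state), Condvar::new())) }
    }

    // Admit an RPC unless the gate has been closed.
    fn try_enter(&self) -> Option<RpcPermit> {
        let (lock, _) = &*self.inner;
        let mut state = lock.lock().unwrap();
        if state.closed {
            return None;
        }
        state.active += 1;
        Some(RpcPermit { inner: Arc::clone(&self.inner) })
    }

    // Fast path: stop admitting new RPCs and return without waiting.
    fn close(&self) {
        let (lock, _) = &*self.inner;
        lock.lock().unwrap().closed = true;
    }

    // Slow path: close, then block until every admitted RPC has finished.
    fn shutdown(&self) {
        self.close();
        let (lock, cvar) = &*self.inner;
        let mut state = lock.lock().unwrap();
        while state.active > 0 {
            state = cvar.wait(state).unwrap();
        }
    }

    // Number of admitted RPCs that have not finished yet.
    fn active(&self) -> usize {
        let (lock, _) = &*self.inner;
        let state = lock.lock().unwrap();
        state.active
    }
}

// Returns (post-close request rejected, active count right after close,
// active count after the background drain completes).
fn demo() -> (bool, usize, usize) {
    let gate = RpcGate::new();
    let permit = gate.try_enter().expect("gate starts open");

    // Returns immediately even though one RPC is still active.
    gate.close();
    let rejected = gate.try_enter().is_none();
    let active_after_close = gate.active();

    // Drain on a background thread so the caller stays available.
    let drain_gate = gate.clone();
    let drain = thread::spawn(move || drain_gate.shutdown());

    // Simulate the in-flight RPC finishing a little later.
    thread::sleep(Duration::from_millis(10));
    drop(permit);
    drain.join().unwrap();

    (rejected, active_after_close, gate.active())
}
```

Calling `demo()` from a `main` function exercises the sequence: `close()` returns while one RPC is still in flight, a later `try_enter()` is rejected, and the background `shutdown()` completes only after the in-flight permit is dropped. Keeping the wait out of `close()` is what would let an event consumer move on to the next connection event.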
b128da272e · 2026-06-08 10:20:54 -07:00