Anton Panasenko b128da272e fix(app-server): avoid blocking connection cleanup (#26852)
## Why

Remote-control app-server sessions can reconnect every 5-7 seconds when
the shared transport-event queue fills. The queue's consumer handled
`ConnectionClosed` by awaiting all in-flight RPCs for the disconnected
connection. A stuck RPC therefore blocked processing of replacement
connection and initialize events until remote-control forwarding hit its
five-second timeout and reconnected again.
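
To make the head-of-line blocking concrete, here is a minimal, synchronous sketch using only the standard library. It is not the app-server code: `TransportEvent`, `consume`, and the `stuck_rpc` channel are hypothetical stand-ins, and threads replace async tasks. If the `ConnectionClosed` arm waited inline on `stuck_rpc.recv()`, the loop would never reach the `Initialize` event behind it. The sketch instead hands the wait to a background thread, which is the general shape of the fix in this commit.

```rust
use std::sync::mpsc::{channel, Receiver};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

// Hypothetical event type; the real transport events are not shown here.
#[derive(Debug)]
enum TransportEvent {
    // `stuck_rpc` stands in for an in-flight RPC that finishes only
    // when something is sent on (or the sender of) the channel goes away.
    ConnectionClosed { id: u32, stuck_rpc: Receiver<()> },
    Initialize { id: u32 },
}

type Log = Arc<Mutex<Vec<String>>>;

// Single consumer for the shared event queue. Cleanup that must wait for
// in-flight RPCs runs on a tracked background thread, so later events
// are still processed promptly.
fn consume(events: Vec<TransportEvent>, log: &Log) -> Vec<JoinHandle<()>> {
    let mut cleanups = Vec::new();
    for event in events {
        match event {
            TransportEvent::ConnectionClosed { id, stuck_rpc } => {
                let log = Arc::clone(log);
                cleanups.push(thread::spawn(move || {
                    // Waiting here inline would stall the whole loop.
                    let _ = stuck_rpc.recv();
                    log.lock().unwrap().push(format!("cleaned up {id}"));
                }));
            }
            TransportEvent::Initialize { id } => {
                log.lock().unwrap().push(format!("initialized {id}"));
            }
        }
    }
    cleanups
}

fn main() {
    let (release, stuck_rpc) = channel();
    let log: Log = Arc::new(Mutex::new(Vec::new()));
    let events = vec![
        TransportEvent::ConnectionClosed { id: 1, stuck_rpc },
        TransportEvent::Initialize { id: 2 },
    ];

    let cleanups = consume(events, &log);
    // The replacement connection was initialized while the RPC is still stuck.
    assert_eq!(*log.lock().unwrap(), vec!["initialized 2".to_string()]);

    // Let the stuck RPC finish, then drain the tracked cleanup threads.
    release.send(()).unwrap();
    for cleanup in cleanups {
        cleanup.join().unwrap();
    }
    println!("{:?}", log.lock().unwrap());
}
```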

Related issue: N/A (internal remote-control incident investigation).

## What Changed

- Split fast closure of RPC admission from draining:
`ConnectionRpcGate::close()` rejects queued and future RPCs, while
`shutdown()` continues to wait for RPCs that have already started.
- Close a disconnected connection's RPC gate before spawning the
existing RPC drain and resource cleanup in a tracked background task, so
the transport-event consumer remains available without waiting for
active RPCs.
- Reap completed cleanup tasks during normal operation, drain them
during graceful shutdown, and abort them during forced shutdown.
- Add regression coverage for closing with an active RPC, rejecting
post-close requests without polling them, and preserving the existing
shutdown wait behavior.
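
The close-versus-drain split can be illustrated with a small synchronous model. This is a sketch under stated assumptions, not the actual `ConnectionRpcGate`: the real gate is part of an async server, whereas `RpcGate` below uses a `Mutex` and `Condvar`, and `try_enter`, `run`, `active`, and `RpcPermit` are hypothetical names. Only the `close()` and `shutdown()` method names come from the description above.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

#[derive(Default)]
struct State {
    closed: bool,
    active: usize,
}

// Hypothetical synchronous stand-in for the gate described above.
#[derive(Clone, Default)]
struct RpcGate {
    inner: Arc<(Mutex<State>, Condvar)>,
}

// Held for the lifetime of one admitted RPC; dropping it marks the RPC done.
struct RpcPermit {
    gate: RpcGate,
}

impl RpcGate {
    // Admit an RPC unless the gate has been closed.
    fn try_enter(&self) -> Option<RpcPermit> {
        let mut state = self.inner.0.lock().unwrap();
        if state.closed {
            return None;
        }
        state.active += 1;
        Some(RpcPermit { gate: self.clone() })
    }

    // Run `rpc` only if admitted; a rejected RPC is never invoked.
    fn run<T>(&self, rpc: impl FnOnce() -> T) -> Option<T> {
        let _permit = self.try_enter()?;
        Some(rpc())
    }

    // Fast path: stop admitting RPCs and return without waiting.
    fn close(&self) {
        self.inner.0.lock().unwrap().closed = true;
    }

    // Slow path: stop admitting RPCs, then block until active ones finish.
    fn shutdown(&self) {
        let (lock, drained) = &*self.inner;
        let mut state = lock.lock().unwrap();
        state.closed = true;
        while state.active > 0 {
            state = drained.wait(state).unwrap();
        }
    }

    fn active(&self) -> usize {
        self.inner.0.lock().unwrap().active
    }
}

impl Drop for RpcPermit {
    fn drop(&mut self) {
        let (lock, drained) = &*self.gate.inner;
        let mut state = lock.lock().unwrap();
        state.active -= 1;
        if state.active == 0 {
            drained.notify_all();
        }
    }
}

fn main() {
    let gate = RpcGate::default();
    let permit = gate.try_enter().expect("gate starts open");

    // close() returns immediately even though one RPC is still active.
    gate.close();
    let mut invoked = false;
    assert!(gate.run(|| invoked = true).is_none());
    assert!(!invoked);
    assert_eq!(gate.active(), 1);

    // The drain runs in the background and completes once the RPC ends.
    let drain = {
        let gate = gate.clone();
        thread::spawn(move || gate.shutdown())
    };
    drop(permit);
    drain.join().unwrap();
    assert_eq!(gate.active(), 0);
}
```

In this model the event consumer would call `close()` inline and leave `shutdown()` to a tracked background task. That task can be joined on graceful shutdown. Forced shutdown needs an abortable task, which plain threads do not provide.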

## Verification

`just test -p codex-app-server --lib connection_rpc_gate` passes all 6
tests, including the new close-versus-drain regression coverage.
b128da272e · 2026-06-08 10:20:54 -07:00