## Why
`selectedCapabilityRoots` is durable thread intent: “use this capability
root from environment `worker`.”
The important product assumption is:
> One environment ID always names the same logical executor and stable
contents.
`worker` does not silently change from executor A to an unrelated
executor B. The process-local connection handle for `worker` can still
be replaced while Codex is running, though, for example when
`environment/add` registers a fresh handle for the same logical
environment.
The thread should persist only the stable selection. Each model step
should pair that selection with the exact ready handle captured for that
step.
## The boundary
```text
persisted thread intent
plugin@1 -> environment "worker"
|
| capture the current step
v
model-step view
unavailable, or
plugin@1 + worker's exact captured ready handle
```
The environment ID is the stable identity and cache key. The
`Arc<Environment>` is only a process-local handle retained so consumers
of one model step use the same captured environment. It is never
persisted and it does not imply different environment contents.
## What changes
### Persist the stable selection
Selected roots are written into `SessionMeta` and restored with the
thread. Forked subagents inherit the same selections, including
bounded-history forks.
Only stable data is persisted: root ID, environment ID, and root path.
### Capture readiness together with the exact handle
The environment snapshot records:
```rust
environment_id -> Some(Arc<Environment>) // ready in this step
environment_id -> None // still starting in this step
```
This prevents readiness and execution from coming from different
registry snapshots.
For example:
```text
step snapshot: worker -> handle A, ready
environment/add: worker -> fresh handle B for the same logical environment
current step: plugin@1 still uses captured handle A
```
Without carrying handle A in the snapshot, the resolver could combine “A
was ready” with handle B and treat B as ready before it had finished
starting.
This does not change cache invalidation. Stable capability metadata
remains identified by environment ID and capability root. Replacing a
process-local handle under the same stable environment ID does not
invalidate or rediscover that metadata.
### Resolve availability per model step
- A ready captured environment produces resolved roots using its
captured handle.
- A starting, missing, or failed environment is omitted from that step.
- A selected lazy environment that is outside the turn's captured
environment set is asked to start, and a later step can observe it as
ready.
- No capability files are scanned here.
Transient transport disconnects remain the remote client's reconnect
concern. This PR models initial attachment/readiness; it does not add
live socket-connectivity state.
## Example
```text
thread selection: plugin@1 -> environment "worker"
step 1: worker is starting -> plugin@1 unavailable
step 2: worker is ready -> plugin@1 resolves through worker's captured handle
step 3: fresh local handle -> current step remains pinned; a later step captures its own view
```
Temporary unavailability does not discard the durable selection. Later
PRs can retain stable metadata caches while projecting only currently
available capabilities into model-visible World State.
## Compatibility
The app-server request shape does not change. Older rollouts without
`selected_capability_roots` deserialize to an empty list.
## Stack
1. **This PR:** persist stable selected roots and resolve them through
an exact model-step handle.
2. #29960: cache stable skill metadata and project available skills into
World State.
3. #29946: cache stable plugin declarations and manage the separate live
MCP runtime.
codex-exec-server
codex-exec-server is the library backing codex exec-server, a small
JSON-RPC server for spawning and controlling subprocesses through
codex-utils-pty.
It provides:
- a CLI entrypoint:
codex exec-server - a Rust client:
ExecServerClient - a small protocol module with shared request/response types
This crate owns the transport, protocol, and filesystem/process handlers. The
top-level codex binary owns hidden helper dispatch for sandboxed
filesystem operations and codex-linux-sandbox.
Transport
The server speaks the exec-specific codex-exec-server-protocol message
envelope on the wire.
The CLI entrypoint supports:
ws://IP:PORT(default)--remote URL --environment-id ID [--name NAME]
Remote mode registers the local exec-server with the environment registry,
then reconnects to the service-provided rendezvous websocket as the environment.
Remote communication uses the Noise relay contract; the registry and harness
must support it.
It uses the standard Codex ChatGPT sign-in state; run codex login first when
remote registration needs authentication. Containerized callers that receive an
Agent Identity JWT in CODEX_ACCESS_TOKEN can opt into that auth path with
--use-agent-identity-auth; Codex then registers an Agent task and sends the
derived AgentAssertion headers on the registry request.
Alternatively, API users can instead use CODEX_API_KEY;
Codex sends it as a bearer token on the registration request. For example:
CODEX_API_KEY="$OPENAI_API_KEY" \
codex exec-server \
--remote ... \
--environment-id "$ENVIRONMENT_ID"
Wire framing:
- local websocket: one JSON-RPC message per websocket frame
- Noise remote websocket: binary protobuf relay frames carrying encrypted payloads
Remote Relay Message Format
In remote mode, the harness and environment communicate through rendezvous using
codex.exec_server.relay.v1.RelayMessageFrame; the checked-in schema is in
src/proto/codex.exec_server.relay.v1.proto. The relay frame carries stream
identity plus endpoint-owned reliability metadata:
version
stream_id
body // handshake | data | ack_frame | resume | reset | heartbeat
ack // highest contiguous peer segment seq received
ack_bits // bitset for peer segment seqs after ack
seq // data only: segment sequence number
segment_index // data only: 0-based index within message
segment_count // data only: number of segments in message
payload // handshake bytes or encrypted data record
next_seq // resume only: next sender seq
reason // reset only: reset reason
stream_id identifies one virtual harness/environment JSON-RPC session on the
environment websocket. The harness generates a UUIDv4 stream_id; the environment
demuxes frames by stream_id and runs an independent ConnectionProcessor per
stream.
Use segment-level sequence numbers for reliability:
seq = 0, 1, 2, 3, ...
Use contiguous segment sequence ranges to identify and stitch a segmented application message:
message_start_seq = seq - segment_index
segment_index = 0
segment_count = 1
message_start_seq is derived by the receiver, not sent on the wire. For
unsplit messages, message_start_seq == seq, segment_index == 0, and
segment_count == 1.
Use cumulative ack plus fixed-size ack_bits instead of variable ack ranges:
ack = highest contiguous received segment seq
bit i in ack_bits acknowledges seq = ack + 1 + i
Send ack and ack_bits redundantly on every outbound frame. Acks are not
themselves acked. Acks, retries, duplicate suppression, segmentation, and
reassembly are endpoint responsibilities; rendezvous only routes relay frames
by stream_id.
Lifecycle
Each connection follows this sequence:
- Send
initialize. - Wait for the
initializeresponse. - Send
initialized. - Call process or filesystem RPCs.
If the server receives any notification other than initialized, it replies
with an error using request id -1.
If the websocket connection closes, the server terminates any remaining managed processes for that client connection.
API
initialize
Initial handshake request.
Request params:
{
"clientName": "my-client"
}
Response:
{}
initialized
Handshake acknowledgement notification sent by the client after a successful
initialize response.
Params are currently ignored. Sending any other notification method is treated as an invalid request.
process/start
Starts a new managed process.
Request params:
{
"processId": "proc-1",
"argv": ["bash", "-lc", "printf 'hello\\n'"],
"cwd": "file:///absolute/working/directory",
"env": {
"PATH": "/usr/bin:/bin"
},
"tty": true,
"pipeStdin": false,
"arg0": null
}
Field definitions:
processId: caller-chosen stable id for this process within the connection.argv: command vector. It must be non-empty.cwd:file:URI for the child process working directory.env: environment variables passed to the child process.tty: whentrue, spawn a PTY-backed interactive process.pipeStdin: whentrue, keep non-PTY stdin writable viaprocess/write.arg0: optional argv0 override forwarded tocodex-utils-pty.
Response:
{
"processId": "proc-1"
}
Behavior notes:
- Reusing an existing
processIdis rejected. - PTY-backed processes accept later writes through
process/write. - Non-PTY processes reject writes unless
pipeStdinistrue. - Output is streamed asynchronously via
process/output. - Exit is reported asynchronously via
process/exited.
process/read
Reads buffered output and terminal state for a managed process.
Request params:
{
"processId": "proc-1",
"afterSeq": null,
"maxBytes": 65536,
"waitMs": 1000
}
Field definitions:
processId: managed process id returned byprocess/start.afterSeq: optional sequence number cursor; when present, only newer chunks are returned.maxBytes: optional response byte budget.waitMs: optional long-poll timeout in milliseconds.
Response:
{
"chunks": [],
"nextSeq": 1,
"exited": false,
"exitCode": null,
"closed": false,
"failure": null
}
process/write
Writes raw bytes to a running process stdin.
Request params:
{
"processId": "proc-1",
"chunk": "aGVsbG8K"
}
chunk is base64-encoded raw bytes. In the example above it is hello\n.
Response:
{
"status": "accepted"
}
Behavior notes:
- Writes to an unknown
processIdare rejected. - Writes to a non-PTY process are rejected unless it started with
pipeStdin.
process/terminate
Terminates a running managed process.
Request params:
{
"processId": "proc-1"
}
Response:
{
"running": true
}
If the process is already unknown or already removed, the server responds with:
{
"running": false
}
Notifications
process/output
Streaming output chunk from a running process.
Params:
{
"processId": "proc-1",
"seq": 1,
"stream": "stdout",
"chunk": "aGVsbG8K"
}
Fields:
processId: process identifierseq: per-process output sequence numberstream:"stdout","stderr", or"pty"chunk: base64-encoded output bytes
process/exited
Final process exit notification.
Params:
{
"processId": "proc-1",
"seq": 2,
"exitCode": 0
}
process/closed
Notification emitted after process output is closed and the process handle is removed.
Params:
{
"processId": "proc-1"
}
Filesystem RPCs
Filesystem methods require valid file: URI strings and return JSON-RPC errors
for invalid or unavailable paths. Native absolute path strings are rejected;
callers must convert them to file: URIs before sending requests:
fs/readFilefs/open,fs/readBlock, andfs/close(internal transport forExecutorFileSystem::read_file_stream)fs/writeFilefs/createDirectoryfs/getMetadatafs/canonicalizefs/readDirectoryfs/removefs/copy
Each filesystem request accepts an optional sandbox object. When sandbox
contains a ReadOnly or WorkspaceWrite policy, the operation runs in a
hidden helper process launched from the top-level codex executable and
prepared through the shared sandbox transform path. Helper requests and
responses are passed over stdin/stdout.
Errors
The server returns JSON-RPC errors with these codes:
-32600: invalid request-32602: invalid params-32603: internal error
Typical error cases:
- unknown method
- malformed params
- empty
argv - duplicate
processId - writes to unknown processes
- writes to non-PTY processes
- sandbox-denied filesystem operations
Rust surface
The crate exports:
ExecServerClientExecServerErrorExecServerClientConnectOptionsRemoteExecServerConnectArgs- protocol request/response structs for process and filesystem RPCs
DEFAULT_LISTEN_URLandExecServerListenUrlParseErrorExecServerRuntimePathsrun_main()for embedding the websocket serverRemoteEnvironmentConfigandrun_remote_environment()for embedding remote registration mode
Callers must pass ExecServerRuntimePaths to run_main(). The top-level
codex exec-server command builds these paths from the codex arg0 dispatch
state. RemoteEnvironmentConfig::new(...) also takes the auth provider that
remote registration should use; the CLI builds that provider from Codex auth
state before starting remote mode.
Example session
Initialize:
{"id":1,"method":"initialize","params":{"clientName":"example-client"}}
{"id":1,"result":{}}
{"method":"initialized","params":{}}
Start a process:
{"id":2,"method":"process/start","params":{"processId":"proc-1","argv":["bash","-lc","printf 'ready\\n'; while IFS= read -r line; do printf 'echo:%s\\n' \"$line\"; done"],"cwd":"file:///tmp","env":{"PATH":"/usr/bin:/bin"},"tty":true,"pipeStdin":false,"arg0":null}}
{"id":2,"result":{"processId":"proc-1"}}
{"method":"process/output","params":{"processId":"proc-1","seq":1,"stream":"stdout","chunk":"cmVhZHkK"}}
Write to the process:
{"id":3,"method":"process/write","params":{"processId":"proc-1","chunk":"aGVsbG8K"}}
{"id":3,"result":{"status":"accepted"}}
{"method":"process/output","params":{"processId":"proc-1","seq":2,"stream":"stdout","chunk":"ZWNobzpoZWxsbwo="}}
Terminate it:
{"id":4,"method":"process/terminate","params":{"processId":"proc-1"}}
{"id":4,"result":{"running":true}}
{"method":"process/exited","params":{"processId":"proc-1","seq":3,"exitCode":0}}
{"method":"process/closed","params":{"processId":"proc-1"}}