codex/codex-rs/exec-server/tests/selected_capability_roots.rs
jif 8f02973d25 Persist selected capability roots and resolve availability per model step (#29856)
## Why

`selectedCapabilityRoots` is durable thread intent: “use this capability
root from environment `worker`.”

The important product assumption is:

> One environment ID always names the same logical executor and stable
contents.

`worker` does not silently change from executor A to an unrelated
executor B. The process-local connection handle for `worker` can still
be replaced while Codex is running, however: for example, when
`environment/add` registers a fresh handle for the same logical
environment.

The thread should persist only the stable selection. Each model step
should pair that selection with the exact ready handle captured for that
step.

## The boundary

```text
persisted thread intent
  plugin@1 -> environment "worker"
                |
                | capture the current step
                v
model-step view
  unavailable, or
  plugin@1 + worker's exact captured ready handle
```

The environment ID is the stable identity and cache key. The
`Arc<Environment>` is only a process-local handle retained so consumers
of one model step use the same captured environment. It is never
persisted and it does not imply different environment contents.

## What changes

### Persist the stable selection

Selected roots are written into `SessionMeta` and restored with the
thread. Forked subagents inherit the same selections, including
bounded-history forks.

Only stable data is persisted: root ID, environment ID, and root path.

### Capture readiness together with the exact handle

The environment snapshot records:

```rust
environment_id -> Some(Arc<Environment>) // ready in this step
environment_id -> None                   // still starting in this step
```

This prevents readiness and execution from coming from different
registry snapshots.

For example:

```text
step snapshot: worker -> handle A, ready
environment/add: worker -> fresh handle B for the same logical environment
current step: plugin@1 still uses captured handle A
```

Without carrying handle A in the snapshot, the resolver could combine “A
was ready” with handle B and treat B as ready before it had finished
starting.
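
The pinning behavior described above can be sketched in a few lines. This is a minimal illustration, not the code in this PR: `Environment`, `Registry`, `StepSnapshot`, and `capture` are hypothetical stand-ins, and readiness is modeled as a plain `bool` on the handle rather than the real readiness check.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for the process-local environment handle.
#[derive(Debug)]
pub struct Environment {
    /// Assumption: readiness is a fixed flag here; the real handle tracks it asynchronously.
    pub ready: bool,
}

/// Hypothetical registry: stable environment ID -> current process-local handle.
#[derive(Default)]
pub struct Registry {
    handles: HashMap<String, Arc<Environment>>,
}

impl Registry {
    /// Registers a fresh handle under `id`, replacing any previous handle.
    pub fn upsert(&mut self, id: &str, ready: bool) -> Arc<Environment> {
        let handle = Arc::new(Environment { ready });
        self.handles.insert(id.to_string(), Arc::clone(&handle));
        handle
    }

    /// Returns the handle currently registered under `id`, if any.
    pub fn current(&self, id: &str) -> Option<Arc<Environment>> {
        self.handles.get(id).cloned()
    }
}

/// One model step's view: `Some(handle)` is ready, `None` is still starting.
pub type StepSnapshot = HashMap<String, Option<Arc<Environment>>>;

/// Captures readiness and the exact handle in a single registry read,
/// so the two can never come from different registry states.
pub fn capture(registry: &Registry, id: &str) -> StepSnapshot {
    let entry = registry.current(id).filter(|env| env.ready);
    HashMap::from([(id.to_string(), entry)])
}
```

In this sketch, a snapshot taken while handle A is ready keeps pointing at A (checkable with `Arc::ptr_eq`) even after `upsert` registers a not-yet-ready handle B under the same ID; only a later `capture` observes B.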

This does not change cache invalidation. Stable capability metadata
remains identified by environment ID and capability root. Replacing a
process-local handle under the same stable environment ID does not
invalidate or rediscover that metadata.

### Resolve availability per model step

- A ready captured environment produces resolved roots using its
captured handle.
- A starting, missing, or failed environment is omitted from that step.
- A selected lazy environment that is outside the turn's captured
environment set is asked to start, and a later step can observe it as
ready.
- No capability files are scanned here.
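
As a rough sketch of the rules in this list, the resolver can be viewed as a pure function from the persisted selection plus one step snapshot to the roots visible in that step. All names below (`SelectedRoot`, `ResolvedRoot`, `Resolution`, `resolve`) are hypothetical and simplified; in particular, the sketch assumes every environment absent from the snapshot is a lazy one that can be asked to start, and it only records that request instead of starting anything.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for the process-local environment handle.
#[derive(Debug)]
pub struct Environment {
    pub name: String,
}

/// Stable, persistable selection: root ID, environment ID, and root path.
#[derive(Debug, Clone, PartialEq)]
pub struct SelectedRoot {
    pub id: String,
    pub environment_id: String,
    pub path: String,
}

/// A selection paired with the exact handle captured for this step.
#[derive(Debug)]
pub struct ResolvedRoot {
    pub selected: SelectedRoot,
    pub environment: Arc<Environment>,
}

/// `Some(handle)` is ready in this step; `None` is still starting.
pub type StepSnapshot = HashMap<String, Option<Arc<Environment>>>;

#[derive(Debug)]
pub struct Resolution {
    /// Roots available in this step.
    pub resolved: Vec<ResolvedRoot>,
    /// Environment IDs outside the snapshot that should be asked to start.
    pub start_requested: Vec<String>,
}

/// Resolves selections against one step snapshot without touching the filesystem.
pub fn resolve(selected: &[SelectedRoot], snapshot: &StepSnapshot) -> Resolution {
    let mut resolved = Vec::new();
    let mut start_requested: Vec<String> = Vec::new();
    for root in selected {
        match snapshot.get(&root.environment_id) {
            // Ready: pair the selection with the captured handle.
            Some(Some(env)) => resolved.push(ResolvedRoot {
                selected: root.clone(),
                environment: Arc::clone(env),
            }),
            // Still starting: omit from this step, keep the selection untouched.
            Some(None) => {}
            // Outside the captured set: request a start once; a later step may see it ready.
            None => {
                if !start_requested.contains(&root.environment_id) {
                    start_requested.push(root.environment_id.clone());
                }
            }
        }
    }
    Resolution { resolved, start_requested }
}
```

Because the function never mutates `selected`, an unavailable root simply does not appear in `resolved` for that step, and the durable selection is unaffected.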

Transient transport disconnects remain the remote client's reconnect
concern. This PR models initial attachment/readiness; it does not add
live socket-connectivity state.

## Example

```text
thread selection: plugin@1 -> environment "worker"

step 1: worker is starting -> plugin@1 unavailable
step 2: worker is ready    -> plugin@1 resolves through worker's captured handle
step 3: fresh local handle -> current step remains pinned; a later step captures its own view
```

Temporary unavailability does not discard the durable selection. Later
PRs can retain stable metadata caches while projecting only currently
available capabilities into model-visible World State.

## Compatibility

The app-server request shape does not change. Older rollouts without
`selected_capability_roots` deserialize to an empty list.

## Stack

1. **This PR:** persist stable selected roots and resolve them through
an exact model-step handle.
2. #29960: cache stable skill metadata and project available skills into
World State.
3. #29946: cache stable plugin declarations and manage the separate live
MCP runtime.
2026-06-25 17:49:43 +00:00

70 lines
2.2 KiB
Rust

#![cfg(unix)]

mod common;

use std::collections::HashMap;
use std::sync::Arc;

use codex_exec_server::EnvironmentManager;
use codex_protocol::capabilities::CapabilityRootLocation;
use codex_protocol::capabilities::SelectedCapabilityRoot;
use codex_utils_path_uri::PathUri;
use common::exec_server::exec_server;
use pretty_assertions::assert_eq;

#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
async fn selected_capability_roots_use_captured_handle_after_replacement() -> anyhow::Result<()> {
    let mut executor = exec_server().await?;
    let manager = EnvironmentManager::without_environments();
    let selected_root = SelectedCapabilityRoot {
        id: "demo@1".to_string(),
        location: CapabilityRootLocation::Environment {
            environment_id: "tools".to_string(),
            path: PathUri::parse("file:///plugins/demo")?,
        },
    };

    manager.upsert_environment(
        "tools".to_string(),
        executor.websocket_url().to_string(),
        /*connect_timeout*/ None,
    )?;
    let environment_a = manager
        .get_environment("tools")
        .expect("executor A should be registered");
    environment_a.wait_until_ready().await?;

    let unavailable = manager
        .resolve_selected_capability_roots(
            std::slice::from_ref(&selected_root),
            &HashMap::from([("tools".to_string(), None)]),
        )
        .await;
    assert!(unavailable.is_empty());

    let captured_environments =
        HashMap::from([("tools".to_string(), Some(Arc::clone(&environment_a)))]);

    // Replace only the process-local handle; the stable environment ID and executor stay the same.
    manager.upsert_environment(
        "tools".to_string(),
        executor.websocket_url().to_string(),
        /*connect_timeout*/ None,
    )?;

    let available = manager
        .resolve_selected_capability_roots(
            std::slice::from_ref(&selected_root),
            &captured_environments,
        )
        .await;
    let [resolved] = available.as_slice() else {
        anyhow::bail!("selected root should resolve through its stable environment");
    };
    assert_eq!(resolved.selected_root(), &selected_root);
    assert!(Arc::ptr_eq(resolved.environment(), &environment_a));

    executor.shutdown().await?;
    Ok(())
}