jif 39aab9fc45 Pipeline bounded AGENTS.md and Git root probes (#29870)
## Why

When Codex uses a remote `ExecutorFileSystem`, every `get_metadata` call
is an exec-server round trip. Upward discovery currently pays those
round trips serially in two latency-sensitive places:

- session startup, while locating the configured project root before
loading `AGENTS.md`; and
- Git-root discovery, which runs before per-turn Git diff enrichment.

The goal is to remove the serial ancestor dependency without adding a
new filesystem RPC, JSON-RPC batch method, Git executable dependency, or
cache.

## Example

Assume this layout, with `.git` as the configured project-root marker:

```text
/workspace/repo/.git
/workspace/repo/AGENTS.md
/workspace/repo/crates/core/    <- cwd
```

The marker probes have this required precedence:

```text
1. /workspace/repo/crates/core/.git
2. /workspace/repo/crates/.git
3. /workspace/repo/.git
4. /workspace/.git
5. /.git
```
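This precedence is the cross product of ancestors and markers, taken nearest directory first and then in configured marker order. Below is a minimal sketch of such a lazy generator. It uses plain `std::path` types instead of the `AbsolutePathBuf`/`PathUri` types the PR mentions, and `marker_probes` is a hypothetical name, not the helper's real one:

```rust
use std::path::{Path, PathBuf};

/// Lazily yields `ancestor.join(marker)` for every lexical ancestor of `cwd`
/// (nearest first) and every marker (in configured order). Nothing is
/// probed here; the caller decides how many paths to pull and send.
fn marker_probes<'a>(
    cwd: &'a Path,
    markers: &'a [&'a str],
) -> impl Iterator<Item = PathBuf> + 'a {
    cwd.ancestors()
        .flat_map(move |dir| markers.iter().map(move |marker| dir.join(marker)))
}

fn main() {
    let cwd = Path::new("/workspace/repo/crates/core");
    // Prints the five probes listed above, in the same order.
    for probe in marker_probes(cwd, &[".git"]) {
        println!("{}", probe.display());
    }
}
```

Because the iterator is lazy, probes far above the eventual root are generated only if the consumer asks for them.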

Previously, probe 2 was not sent until probe 1 returned, and probe 3 was
not sent until probe 2 returned. With this change, the client lazily
keeps up to eight ordinary `fs/getMetadata` requests in flight, but
consumes their results in the order above. Codex must still learn that
probes 1 and 2 are absent before accepting probe 3, so the nearest root
always wins. Once probe 3 succeeds, the client has its answer and stops
awaiting probes 4 and 5. Requests that were already sent may still
finish on the worker.
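The ordered window can be sketched locally. This assumes OS threads as a stand-in for in-flight `fs/getMetadata` requests and a closure as a stand-in for the metadata call. `find_first_in_order` is a hypothetical name, and the PR's client is not claimed to work this way internally:

```rust
use std::collections::VecDeque;
use std::path::PathBuf;
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Returns the first item, in iteration order, for which `probe` is true,
/// while keeping at most `window` probes in flight at once.
fn find_first_in_order<I, F>(items: I, window: usize, probe: F) -> Option<PathBuf>
where
    I: IntoIterator<Item = PathBuf>,
    F: Fn(&PathBuf) -> bool + Send + Sync + 'static,
{
    assert!(window > 0, "window must be at least 1");
    let probe = Arc::new(probe);
    let mut pending = items.into_iter();
    let mut in_flight: VecDeque<(PathBuf, JoinHandle<bool>)> = VecDeque::new();

    loop {
        // Lazily top up the window: pull new items only while there is room.
        while in_flight.len() < window {
            let Some(item) = pending.next() else { break };
            let probe = Arc::clone(&probe);
            let arg = item.clone();
            in_flight.push_back((item, thread::spawn(move || probe(&arg))));
        }
        // Consume results strictly in order, so a nearer hit always wins.
        // An empty queue means every item was probed without a hit.
        let (item, handle) = in_flight.pop_front()?;
        // A panicked probe is treated as "absent" in this sketch.
        if handle.join().unwrap_or(false) {
            // Dropping `in_flight` detaches the remaining threads: they may
            // still finish, but their results are never awaited.
            return Some(item);
        }
    }
}

fn main() {
    let probes = [
        "/workspace/repo/crates/core/.git",
        "/workspace/repo/crates/.git",
        "/workspace/repo/.git",
        "/workspace/.git",
        "/.git",
    ]
    .into_iter()
    .map(PathBuf::from);
    // Deeper (nearer) probes are made slower, and `/.git` also "exists",
    // yet the nearest hit still wins because results are consumed in order.
    let hit = find_first_in_order(probes, 8, |p: &PathBuf| {
        let depth = p.components().count() as u64;
        thread::sleep(Duration::from_millis(depth * 10));
        p.as_os_str() == "/workspace/repo/.git" || p.as_os_str() == "/.git"
    });
    println!("{hit:?}");
}
```

A window of one degenerates to the previous serial behavior; a larger window only changes how many requests overlap, never which answer is accepted.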

For the marker phase alone, with a 50 ms client-to-worker round trip and
fast local metadata calls, finding the root at probe 3 changes from
roughly three serialized round trips (150 ms) to one round trip plus
worker processing. The later `AGENTS.md` candidate phase remains
separate and ordered.
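The arithmetic above can be written as an idealized model. It assumes every metadata call costs exactly one fixed round trip and ignores worker processing time, which the text notes is added on top. The function names are hypothetical and the numbers are not measurements:

```rust
/// Serial probing: probe k is sent only after probes 1..k-1 have returned.
fn serial_latency_ms(hit_index: u32, rtt_ms: u32) -> u32 {
    hit_index * rtt_ms
}

/// Ordered window: probes 1..=window leave together, and each consumed
/// result frees a slot for the next probe, so probe k resolves after
/// ceil(k / window) round trips.
fn windowed_latency_ms(hit_index: u32, window: u32, rtt_ms: u32) -> u32 {
    (hit_index + window - 1) / window * rtt_ms
}

fn main() {
    // Root found at probe 3, 50 ms round trip, window of eight.
    println!("serial:   {} ms", serial_latency_ms(3, 50)); // 150 ms
    println!("windowed: {} ms", windowed_latency_ms(3, 8, 50)); // 50 ms
}
```

Under this model a hit at probe 9 would cost two round trips, which shows how the window of eight bounds concurrency without reintroducing full serialization.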

Only after `/workspace/repo` is selected does `AGENTS.md` discovery
check instruction candidates, in root-to-cwd order:

```text
/workspace/repo/AGENTS.override.md
/workspace/repo/AGENTS.md
/workspace/repo/crates/AGENTS.override.md
/workspace/repo/crates/AGENTS.md
/workspace/repo/crates/core/AGENTS.override.md
/workspace/repo/crates/core/AGENTS.md
```

The first configured candidate found in each directory wins. These
checks remain ordered, and no instruction-candidate probe is issued above
`/workspace/repo`. Git-root discovery uses the same bounded
lookup with only `.git` as the marker.
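The candidate phase can be sketched as follows, assuming the two candidate names from the example (the real list is configurable) and an `exists` closure standing in for the filesystem call. `CANDIDATES` and `instruction_files` are hypothetical names. Passing `None` for the root models the case where no project root was selected, in which only `cwd` is checked:

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};

/// Per-directory candidate names, highest priority first (assumed).
const CANDIDATES: [&str; 2] = ["AGENTS.override.md", "AGENTS.md"];

/// Returns the instruction files to load, in root-to-cwd order.
fn instruction_files(
    root: Option<&Path>,
    cwd: &Path,
    exists: impl Fn(&Path) -> bool,
) -> Vec<PathBuf> {
    let mut dirs: Vec<&Path> = match root {
        // Ancestors of cwd that are still inside root, nearest first...
        Some(root) => cwd.ancestors().take_while(|d| d.starts_with(root)).collect(),
        None => vec![cwd],
    };
    dirs.reverse(); // ...then flipped to root-to-cwd order.

    dirs.into_iter()
        .filter_map(|dir| {
            // The first configured candidate found in this directory wins.
            CANDIDATES
                .iter()
                .map(|name| dir.join(name))
                .find(|p| exists(p.as_path()))
        })
        .collect()
}

fn main() {
    let existing: HashSet<PathBuf> = [
        "/workspace/AGENTS.md", // above the root: never probed
        "/workspace/repo/AGENTS.md",
        "/workspace/repo/crates/core/AGENTS.override.md",
        "/workspace/repo/crates/core/AGENTS.md",
    ]
    .into_iter()
    .map(PathBuf::from)
    .collect();
    let files = instruction_files(
        Some(Path::new("/workspace/repo")),
        Path::new("/workspace/repo/crates/core"),
        |p| existing.contains(p),
    );
    for f in &files {
        println!("{}", f.display());
    }
}
```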

## What changed

- Adds a client-side find-up helper that generates `ancestor x marker`
probes lazily, nearest directory first and configured marker order
within each directory.
- Uses an ordered concurrency window of eight scalar metadata requests.
This bounds executor load while preserving nearest-root and marker
precedence.
- Reuses the helper for both configured project-root discovery and
remote Git-root discovery.
- Keeps Git ancestor and marker construction in `AbsolutePathBuf`,
converting only each complete `.git` probe to `PathUri`. This preserves
native paths that require an opaque URI fallback, such as Windows
namespace paths.
- Preserves existing error behavior: `AGENTS.md` discovery propagates
non-`NotFound` metadata errors, while Git discovery treats a failed
marker probe as absent and continues upward.
- Reads each discovered `AGENTS.md` directly instead of statting it a
second time.
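The two error policies can be sketched as small classifiers over a `std::io::Result`, where `Ok` stands in for a successful metadata or read call. Both function names are hypothetical, and the real code may express this differently:

```rust
use std::io::{self, ErrorKind};

/// AGENTS.md policy: `NotFound` means "absent"; every other error is
/// propagated to the caller.
fn agents_probe<T>(result: io::Result<T>) -> io::Result<bool> {
    match result {
        Ok(_) => Ok(true),
        Err(err) if err.kind() == ErrorKind::NotFound => Ok(false),
        Err(err) => Err(err),
    }
}

/// Git-root policy: any failed marker probe counts as "absent", so the
/// caller simply moves on to the next ancestor.
fn git_probe<T>(result: io::Result<T>) -> bool {
    result.is_ok()
}

fn main() {
    let not_found = || io::Error::new(ErrorKind::NotFound, "no such file");
    let denied = || io::Error::new(ErrorKind::PermissionDenied, "permission denied");

    // NotFound is "absent" for both lookups.
    println!("{:?}", agents_probe::<()>(Err(not_found())).ok()); // Some(false)
    println!("{}", git_probe::<()>(Err(not_found()))); // false
    // Other errors stop AGENTS.md discovery but not Git discovery.
    println!("{:?}", agents_probe::<()>(Err(denied())).map_err(|e| e.kind())); // Err(PermissionDenied)
    println!("{}", git_probe::<()>(Err(denied()))); // false
}
```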

No filesystem trait or exec-server protocol method is added. An empty
`project_root_markers` list performs no ancestor-marker I/O and checks
instruction candidates only in `cwd`. This change also deliberately does
not cache roots across turns.

## Symlinks

Upward traversal remains **lexical**. The helper does not canonicalize
`cwd`; it appends marker names to the supplied path and walks that
path's textual parents. The filesystem performs the actual metadata/read
operation, and the current local and exec-server implementations follow
live symlink targets.

For example:

```text
/tmp/pkg -> /workspace/repo/packages/pkg
cwd = /tmp/pkg/src
actual Git marker = /workspace/repo/.git
```

The lexical probes are `/tmp/pkg/src/.git`, `/tmp/pkg/.git`,
`/tmp/.git`, and `/.git`. They do not jump from `/tmp/pkg` to the
target's parent `/workspace/repo`, so this spelling of `cwd` does not
discover `/workspace/repo/.git`. That is the existing behavior and is
unchanged by this PR.

Conversely, if `/tmp/repo -> /workspace/repo`, then probing
`/tmp/repo/.git` follows the directory symlink and finds
`/workspace/repo/.git`; the reported root remains the lexical path
`/tmp/repo`. A live symlink used directly as `.git`, another configured
marker, or `AGENTS.md` is also followed. A symlinked `AGENTS.md` is
loaded when its target is a regular file, while a broken symlink behaves
as `NotFound`.
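The lexical-versus-followed distinction can be checked locally. Below is a Unix-only sketch that uses `std::fs::metadata` as a stand-in for the executor's metadata call. `lexical_git_root`, `run_demo`, and the directory names are hypothetical and only mirror the examples above:

```rust
// Unix-only sketch: relies on std::os::unix::fs::symlink.
use std::fs;
use std::io::{self, ErrorKind};
use std::os::unix::fs::symlink;
use std::path::{Path, PathBuf};

/// First lexical ancestor of `cwd` whose `.git` has metadata. There is no
/// canonicalization: `fs::metadata` follows live symlinks, but the returned
/// root is the textual ancestor itself.
fn lexical_git_root(cwd: &Path) -> Option<PathBuf> {
    cwd.ancestors()
        .find(|dir| fs::metadata(dir.join(".git")).is_ok())
        .map(Path::to_path_buf)
}

struct Outcome {
    repo: PathBuf,                   // real directory containing `.git`
    via_repo_alias: Option<PathBuf>, // lookup through `repo-alias -> repo`
    via_pkg_alias: Option<PathBuf>,  // lookup through `pkg-alias -> repo/packages/pkg`
    dangling_is_not_found: bool,     // metadata on a broken symlink
}

/// Builds the example layout under `base` and runs the lookups.
fn run_demo(base: &Path) -> io::Result<Outcome> {
    let repo = base.join("workspace/repo");
    fs::create_dir_all(repo.join(".git"))?;
    fs::create_dir_all(repo.join("packages/pkg/src"))?;

    let repo_alias = base.join("repo-alias");
    let pkg_alias = base.join("pkg-alias");
    symlink(&repo, &repo_alias)?;
    symlink(repo.join("packages/pkg"), &pkg_alias)?;
    symlink(base.join("missing"), base.join("dangling"))?;

    Ok(Outcome {
        via_repo_alias: lexical_git_root(&repo_alias.join("packages/pkg/src")),
        via_pkg_alias: lexical_git_root(&pkg_alias.join("src")),
        dangling_is_not_found: matches!(
            fs::metadata(base.join("dangling")),
            Err(e) if e.kind() == ErrorKind::NotFound
        ),
        repo,
    })
}

fn main() -> io::Result<()> {
    let base = std::env::temp_dir().join(format!("find-up-demo-{}", std::process::id()));
    let _ = fs::remove_dir_all(&base); // start from a clean directory
    let outcome = run_demo(&base);
    fs::remove_dir_all(&base)?; // removes the symlinks, not their targets' contents twice
    let outcome = outcome?;
    println!("real root:         {}", outcome.repo.display());
    println!("via repo alias:    {:?}", outcome.via_repo_alias);
    println!("via pkg alias:     {:?}", outcome.via_pkg_alias);
    println!("dangling NotFound: {}", outcome.dangling_is_not_found);
    Ok(())
}
```

The repo alias is reported as the root even though `.git` physically lives under `workspace/repo`, while the package alias never reaches that `.git` because its textual parents lead elsewhere.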
39aab9fc45 · 2026-06-24 22:58:34 +01:00
2026-04-27 18:48:57 -07:00

# codex-git-utils

Helpers for interacting with git, including patch application. The crate also exposes a lightweight baseline API for internal directories that use git only as a resettable diff mechanism:

- `ensure_git_baseline_repository` preserves a usable `root/.git` baseline, or creates one when it is missing or unusable;
- `reset_git_repository` replaces `root/.git` with a fresh one-commit baseline; and
- `diff_since_latest_init` returns structured file changes plus a unified diff from that baseline to the current directory contents.

```rust
use std::path::Path;

use codex_git_utils::{apply_git_patch, ApplyGitRequest};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let repo = Path::new("/path/to/repo");

    // Apply a patch (omitted here) to the repository.
    let request = ApplyGitRequest {
        cwd: repo.to_path_buf(),
        diff: String::from("...diff contents..."),
        revert: false,
        preflight: false,
    };
    // `?` needs an enclosing function that returns a compatible `Result`.
    let _result = apply_git_patch(&request)?;
    Ok(())
}
```