Merge pull request #204 from Lum1104/feat/semantic-batching-and-output-chunking

fix(#159): semantic batching + bundled importMap + Phase 1 speedup
2026-06-22 10:58:03 +08:00 · 2026-05-24 20:12:14 +08:00
parent 42d70c3f9c
commit a59a573a1d
30 changed files with 12235 additions and 307 deletions
@@ -1,7 +1,7 @@
 {
  "name": "understand-anything",
  "description": "AI-powered codebase understanding — analyze, visualize, and explain any project",
-  "version": "2.7.4",
+  "version": "2.7.5",
  "author": {
    "name": "Lum1104"
  },
@@ -1,7 +1,7 @@
 {
  "name": "understand-anything",
  "description": "AI-powered codebase understanding — analyze, visualize, and explain any project",
-  "version": "2.7.4",
+  "version": "2.7.5",
  "author": {
    "name": "Lum1104"
  },
@@ -2,7 +2,7 @@
  "name": "understand-anything",
  "displayName": "Understand Anything",
  "description": "AI-powered codebase understanding — analyze, visualize, and explain any project",
-  "version": "2.7.4",
+  "version": "2.7.5",
  "author": {
    "name": "Lum1104"
  },
@@ -33,4 +33,4 @@ jobs:
        run: pnpm --filter @understand-anything/core test

      - name: Test skill
-        run: pnpm --filter @understand-anything/skill test
+        run: pnpm test
@@ -35,7 +35,7 @@ An open-source tool combining LLM intelligence + static analysis to produce inte
 - `pnpm --filter @understand-anything/core build` — Build the core package
 - `pnpm --filter @understand-anything/core test` — Run core tests
 - `pnpm --filter @understand-anything/skill build` — Build the plugin package
-- `pnpm --filter @understand-anything/skill test` — Run plugin tests
+- `pnpm test` — Run all tests (skill tests live at repo-root `tests/skill/`, picked up by root `vitest.config.ts`)
 - `pnpm --filter @understand-anything/dashboard build` — Build the dashboard
 - `pnpm dev:dashboard` — Start dashboard dev server
 - `pnpm lint` — Run ESLint across the project
@@ -0,0 +1,587 @@
+# Semantic Batching and Output Chunking Design
+
+**Date:** 2026-05-24
+**Status:** Draft
+**Branch:** `feat/semantic-batching-and-output-chunking`
+**Issue:** [#159](https://github.com/Lum1104/Understand-Anything/issues/159) — Frequently seeing output limit exceeded
+
+---
+
+## Problem
+
+The `/understand` skill's Phase 2 dispatches `file-analyzer` subagents in batches of 20-30 files each (`skills/understand/SKILL.md:282`). Two issues compound on output-constrained LLM backends (notably Bedrock OPUS with default max_tokens of 4096-8192):
+
+1. **Output cap pressure.** Each `file-analyzer` writes one `batch-<N>.json` containing all nodes (file + functions + classes) and edges for its batch. For 25 dense files the JSON content easily exceeds the per-turn `Write(content=...)` token budget. The agent improvises by entering an undefined "minimal output mode" and drops nodes/edges silently. Issue #159 reports this for OPUS on Bedrock at the 100-file scale.
+
+2. **Count-based batching breaks module semantics.** Files are batched by count, not by logical relationship. Files that import each other (and would together form an `auth` module, an `api` module, etc.) get split across batches. The file-analyzer only sees within-batch edges confidently; `calls`/`related`/`inherits`/`implements` edges between modules get dropped at batch boundaries.
+
+The existing `recover_imports_from_scan` in `merge-batch-graphs.py:913` is a deterministic safety net for `imports` edges — but it cannot recover semantic edges (calls / related / inherits / implements). Those are lost.
+
+---
+
+## Goals
+
+- Eliminate "Batch X failed (output limit)" from `/understand` runs on Bedrock OPUS for projects up to 500 files.
+- Improve cross-batch semantic edge coverage by replacing count-based batching with Louvain community detection on the import graph.
+- Maintain `imports` edge coverage parity (no regression on existing safety net).
+- Stay within one PR — defer broader refactors to follow-ups (Section "Out of scope").
+
+## Non-goals
+
+- Refactoring Phase 1 / 2 tree-sitter usage to deduplicate per-batch extraction.
+- Adding LLM-generated file summaries to neighborMap.
+- Auto-tuning output thresholds per provider.
+
+---
+
+## Architecture
+
+Pipeline before:
+
+```
+Phase 1      project-scanner           → scan-result.json (files + importMap)
+Phase 2      file-analyzer (×N concur) → batch-<i>.json (one per batch; SKILL.md prose batching)
+Phase 2 end  merge-batch-graphs.py     → assembled-graph.json
+```
+
+Pipeline after:
+
+```
+Phase 1      project-scanner           → scan-result.json (unchanged)
+Phase 1.5    compute-batches.mjs       → batches.json (NEW — semantic batching + neighborMap)
+Phase 2      file-analyzer (×N concur) → batch-<i>.json (single) OR batch-<i>-part-<k>.json (split)
+Phase 2 end  merge-batch-graphs.py     → assembled-graph.json (verified, no code change)
+```
+
+**Phase 1.5 single responsibility:** topology decision + neighborMap construction. Pure algorithm — reads `scan-result.json`, writes `batches.json`, no LLM calls.
+
+**Phase 2 changes:** SKILL.md stops doing prose batching; iterates `batches.json` and dispatches one file-analyzer per batch.
+
+**file-analyzer changes:** consumes neighborMap; self-checks output size before writing; splits into `batch-<i>-part-<k>.json` when above thresholds.
+
+**merge-batch-graphs.py:** no code changes are needed for correctness, because the `batch-*.json` glob and sort-key regex already accept multi-part naming. The only additions are a test fixture and an enhanced stderr report.
+
+---
+
+## Component 1 — `compute-batches.mjs`
+
+**Location:** `understand-anything-plugin/skills/understand/compute-batches.mjs`
+
+**Invocation:** `node <SKILL_DIR>/compute-batches.mjs $PROJECT_ROOT [--changed-files=<path>]`
+
+**Input:** `$PROJECT_ROOT/.understand-anything/intermediate/scan-result.json`
+
+**Output:** `$PROJECT_ROOT/.understand-anything/intermediate/batches.json`
+
+### Dependencies
+
+Added to `understand-anything-plugin/package.json`:
+
+- `graphology` (~10KB)
+- `graphology-communities-louvain` (~30KB)
+
+Reuses `@understand-anything/core`'s `TreeSitterPlugin` and `PluginRegistry` (already imported by `extract-structure.mjs`).
+
+### Algorithm
+
+```
+1. Load scan-result.json.
+
+2. Partition files by fileCategory:
+   - codeFiles = files where fileCategory === "code"
+   - nonCodeFiles = the rest
+
+3. Code batching (Louvain on import graph):
+   a. Build undirected graph: nodes = codeFiles, edges = importMap relations
+      (weight=1, undirected so import and imported-by both count).
+   b. Run graphology-communities-louvain → community assignment per file.
+   c. For any community with size > 35 (max): split via edge-betweenness greedy
+      cut (or simpler weakly-connected-component partition) until each
+      sub-community ≤ 35. Log warning per split.
+      (Whether this branch fires is decided by the implementation prototype
+      step — see "Prototype-first implementation" below.)
+   d. Communities with size < 5 are kept as-is. Wasted dispatches are
+      bounded by the 5-concurrent cap, and the alternative ("merge small")
+      adds edge cases without proportional value.
+
+4. Non-code batching (hardcoded heuristics, moved from SKILL.md prose):
+   - Group A: For each directory containing a `Dockerfile`, bundle that
+     directory's `Dockerfile` + any `docker-compose.*` + any
+     `.dockerignore` → one batch per such directory (so multi-service
+     repos with several Dockerfiles get one batch per service).
+   - Group B: `.github/workflows/*.yml` files → one batch.
+   - Group C: `.gitlab-ci.yml` + files under `.circleci/` → one batch.
+   - Group D: SQL files under any `migrations/` or `migration/` directory,
+     sorted by filename → one batch per directory.
+   - Group E: All other non-code files grouped by their immediate parent
+     directory, max 20 per batch.
+
+5. Assign batchIndex: code communities first (1..N), non-code groups
+   second (N+1..M).
+
+6. Exports extraction:
+   - For each code file, run TreeSitterPlugin.extract() and collect
+     top-level exports (function names, class names, exported const names).
+   - Per-file failures: catch, set exports = [], emit warning.
+   - Non-code files: exports = [].
+
+7. Construct neighborMap (1-hop):
+   For each file F in batch B:
+     neighborMap[F.path] = [
+       { path: G.path, batchIndex: G.batch, symbols: G.exports }
+       for G in importMap[F.path] ∪ reverseImportMap[F.path]
+       where G.batch ≠ B
+     ]
+   If neighborMap[F.path].length > 50, truncate to top 50 by neighbor
+   degree (highest-imported neighbors kept), emit warning.
+
+8. Construct batchImportData:
+   For each batch B:
+     batchImportData[F.path] = importMap[F.path]  for F in B.files
+
+9. Write batches.json.
+
+Fallback (script-internal): If steps 3a-3c throw, catch → emit warning
+→ assign batches by alphabetical chunking (12 files per code batch).
+Steps 4, 6, 7, 8 still run normally. Set `algorithm: "count-fallback"`
+in the output.
+```
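Reviewer sketch (not part of the commit): step 7 of the algorithm above can be illustrated in a few lines of Python. The real script is Node; the function name `build_neighbor_map`, the `batch_of` and `exports` dictionaries, and the reading of "neighbor degree" as in-degree are assumptions made only for this illustration.

```python
from collections import defaultdict


def build_neighbor_map(import_map, batch_of, exports, max_neighbors=50):
    """Return 1-hop cross-batch neighbors per file (hypothetical helper)."""
    # Reverse map: for each file, the set of files that import it.
    reverse = defaultdict(set)
    for src, targets in import_map.items():
        for tgt in targets:
            reverse[tgt].add(src)

    def degree(path):
        # Assumption: "neighbor degree" means how often the file is imported.
        return len(reverse.get(path, ()))

    neighbor_map = {}
    for path, batch in batch_of.items():
        candidates = set(import_map.get(path, ())) | reverse.get(path, set())
        # Keep only project files that landed in a different batch.
        cross = [g for g in candidates if g in batch_of and batch_of[g] != batch]
        # Most-imported neighbors first; path breaks ties deterministically.
        cross.sort(key=lambda g: (-degree(g), g))
        neighbor_map[path] = [
            {"path": g, "batchIndex": batch_of[g], "symbols": exports.get(g, [])}
            for g in cross[:max_neighbors]
        ]
    return neighbor_map


import_map = {"src/auth/login.ts": ["src/auth/session.ts", "src/db/users.ts"]}
batch_of = {"src/auth/login.ts": 1, "src/auth/session.ts": 1, "src/db/users.ts": 3}
exports = {"src/db/users.ts": ["User", "findById", "createUser"]}
nm = build_neighbor_map(import_map, batch_of, exports)
```

With this input, `login.ts` lists only `users.ts` as a neighbor, because `session.ts` shares its batch. The relation is symmetric: `users.ts` lists `login.ts` through the reverse import map.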
+
+### Louvain implementation
+
+Use `graphology-communities-louvain`'s default modularity-greedy algorithm:
+
+```js
+import Graph from 'graphology';
+import louvain from 'graphology-communities-louvain';
+
+const graph = new Graph({ type: 'undirected' });
+for (const file of codeFiles) graph.addNode(file.path);
+for (const [src, targets] of Object.entries(importMap)) {
+  for (const tgt of targets) {
+    if (graph.hasNode(src) && graph.hasNode(tgt) && !graph.hasEdge(src, tgt)) {
+      graph.addEdge(src, tgt);
+    }
+  }
+}
+const communities = louvain(graph); // { nodeId: communityId }
+```
+
+### Output schema (`batches.json`)
+
+```json
+{
+  "schemaVersion": 1,
+  "algorithm": "louvain",
+  "totalFiles": 100,
+  "totalBatches": 7,
+  "batches": [
+    {
+      "batchIndex": 1,
+      "files": [
+        { "path": "src/auth/login.ts", "language": "typescript",
+          "sizeLines": 120, "fileCategory": "code" }
+      ],
+      "batchImportData": {
+        "src/auth/login.ts": ["src/auth/session.ts", "src/db/users.ts"]
+      },
+      "neighborMap": {
+        "src/auth/login.ts": [
+          { "path": "src/db/users.ts", "batchIndex": 3,
+            "symbols": ["User", "findById", "createUser"] }
+        ]
+      }
+    }
+  ]
+}
+```
+
+`algorithm` is `"louvain"` on the happy path, `"count-fallback"` when the Louvain branch crashed.
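Reviewer sketch (not part of the commit): the `"count-fallback"` path described in the algorithm amounts to alphabetical chunking with 12 files per code batch. The helper name `count_fallback_batches` is hypothetical, and the real script is Node rather than Python.

```python
def count_fallback_batches(code_paths, batch_size=12):
    """Chunk code files alphabetically into fixed-size batches (hypothetical helper)."""
    ordered = sorted(code_paths)
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


paths = [f"src/file{i:02d}.ts" for i in range(30)]
fallback = count_fallback_batches(paths)
sizes = [len(b) for b in fallback]
```

Thirty files yield batches of 12, 12, and 6; only the last batch may be short.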
+
+### `--changed-files` mode
+
+When invoked with `--changed-files=<path>`, the script:
+
+- Loads file paths from `<path>` (one per line).
+- Still builds the full project import graph (for accurate neighborMap construction).
+- Only emits batches containing changed files.
+- neighborMap entries reference unchanged files with their batchIndex from the deterministic full-graph Louvain re-run. The seed is fixed so the assignment is reproducible across incremental invocations.
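Reviewer sketch (not part of the commit): one way to read "only emits batches containing changed files" is to compute all batches over the full graph and then filter. This assumes whole batches are emitted rather than trimmed to the changed files, which the design does not state explicitly; `filter_changed_batches` is a hypothetical name.

```python
def filter_changed_batches(batches, changed_paths):
    """Keep batches that contain at least one changed file (hypothetical helper)."""
    changed = set(changed_paths)
    # batchIndex values are left untouched so neighborMap references stay valid.
    return [b for b in batches if any(f["path"] in changed for f in b["files"])]


all_batches = [
    {"batchIndex": 1, "files": [{"path": "src/auth/login.ts"}, {"path": "src/auth/session.ts"}]},
    {"batchIndex": 2, "files": [{"path": "src/api/routes.ts"}]},
    {"batchIndex": 3, "files": [{"path": "src/db/users.ts"}]},
]
emitted = filter_changed_batches(all_batches, ["src/auth/session.ts", "src/db/users.ts"])
emitted_indices = [b["batchIndex"] for b in emitted]
```

Batches 1 and 3 survive and keep their full-graph indices; batch 2 is dropped because none of its files changed.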
+
+### Prototype-first implementation
+
+Before writing the full script, build a minimal skeleton:
+
+1. Load `scan-result.json` from this repo's `.understand-anything/` directory (if absent, generate via `/understand --full`).
+2. Run Louvain only — no size enforcement, no neighborMap.
+3. Print community size distribution.
+4. Decide: do real-world communities cluster in [5, 35]? If yes, size enforcement branch may be unnecessary or trivially defensive. If no, implement edge-betweenness split.
+
+This gates the more speculative code (size enforcement) on empirical observation rather than upfront design.
+
+---
+
+## Component 2 — `skills/understand/SKILL.md` changes
+
+### Add — Phase 1.5 section (after Phase 1)
+
+```markdown
+## Phase 1.5 — BATCH
+
+Report: `[Phase 1.5/7] Computing semantic batches...`
+
+Run the bundled batching script:
+\`\`\`bash
+node <SKILL_DIR>/compute-batches.mjs $PROJECT_ROOT
+\`\`\`
+
+Reads `.understand-anything/intermediate/scan-result.json`, writes
+`.understand-anything/intermediate/batches.json`.
+
+Capture stderr. Append any line starting with `Warning:` to
+$PHASE_WARNINGS for the final report.
+
+If the script exits non-zero, the failure is hard — relay the full
+stderr to the user as a Phase 1.5 failure. Do not attempt to recover;
+the script's internal fallback (count-based) already handles recoverable
+issues. A non-zero exit means a fundamental problem (missing input file,
+malformed JSON, etc.).
+```
+
+### Replace — Phase 2 ANALYZE section (current SKILL.md:280-332)
+
+Delete the existing "Batch the file list from Phase 1 into groups of 20-30 files each" prose, the non-code grouping prose (now in compute-batches), and the dispatch-time `batchImportData` construction prose (now provided in batches.json). Replace with:
+
+```markdown
+## Phase 2 — ANALYZE
+
+### Full analysis path
+
+Load `.understand-anything/intermediate/batches.json` (produced by
+Phase 1.5). Iterate the `batches[]` array.
+
+Report: `[Phase 2/7] Analyzing files — <totalFiles> files in
+<totalBatches> batches (up to 5 concurrent)...`
+
+For each batch, dispatch a `file-analyzer` subagent (up to 5
+concurrent). Dispatch prompt template:
+
+> Analyze these files and produce GraphNode and GraphEdge objects.
+> Project root: `$PROJECT_ROOT`
+> Project: `<projectName>`
+> Languages: `<languages>`
+> Batch: `<batchIndex>/<totalBatches>`
+> Skill directory: `<SKILL_DIR>`
+> Output: write to
+> `$PROJECT_ROOT/.understand-anything/intermediate/batch-<batchIndex>.json`
+> (single-file mode) OR `batch-<batchIndex>-part-<k>.json` (split mode,
+> per Step B of your output protocol).
+>
+> Pre-resolved import data (use directly — do NOT re-resolve from source):
+> \`\`\`json
+> <batchImportData JSON inline from batches.json[i].batchImportData>
+> \`\`\`
+>
+> Cross-batch neighbors with their exported symbols (confidence boost
+> for cross-batch edges):
+> \`\`\`json
+> <neighborMap JSON inline from batches.json[i].neighborMap>
+> \`\`\`
+>
+> Files to analyze:
+> 1. `<path>` (<sizeLines> lines, language: `<language>`,
+>    fileCategory: `<fileCategory>`)
+> ...
+
+$LANGUAGE_DIRECTIVE
+
+After ALL batches complete, run the merge-and-normalize script:
+\`\`\`bash
+python <SKILL_DIR>/merge-batch-graphs.py $PROJECT_ROOT
+\`\`\`
+
+(Rest of Phase 2 unchanged.)
+```
+
+### Replace — Incremental update path (current SKILL.md:355-366)
+
+```markdown
+### Incremental update path
+
+Run compute-batches.mjs with `--changed-files=<path>`, where `<path>`
+is a temp file listing changed file paths (one per line). The script
+reuses the full project's import graph for neighborMap computation
+but only emits batches containing changed files. Dispatch file-analyzer
+subagents per the same template as the full path.
+```
+
+### Line budget
+
+Net added LLM-context prose: Phase 1.5 (~12 lines) + Phase 2 template clarifications (~5 lines) − removed batching prose (~15 lines) − removed batchImportData construction prose (~6 lines) ≈ **−4 lines**.
+
+---
+
+## Component 3 — `agents/file-analyzer.md` changes
+
+### Add — Cross-batch context section
+
+Insert after "Step 1: Input file construction":
+
+```markdown
+### Cross-batch context (neighborMap)
+
+Your dispatch prompt includes a `neighborMap` — for each file in your
+batch, it lists project-internal neighbors in OTHER batches (files that
+import yours or that you import), with their exported symbols.
+
+Use neighborMap as a confidence boost for cross-batch edges (`calls`,
+`related`, `inherits`, `implements` to nodes outside your batch):
+
+- If your source clearly references a symbol that appears in some
+  `neighbor.symbols`, emit the edge to
+  `function:<neighbor.path>:<symbol>` or
+  `class:<neighbor.path>:<symbol>` with confidence.
+- If your source references a cross-batch symbol that is NOT in
+  neighborMap (the project-scanner may not have extracted it), you may
+  still emit the edge if you saw it explicitly in the imported file's
+  surface — but prefer matching neighborMap symbols when available.
+- Imports continue to use `batchImportData` (fully resolved), not
+  neighborMap.
+
+The merge script's dangling-edge dropper is the safety net for
+genuinely unresolvable targets.
+```
+
+### Replace — Writing Results section (current file-analyzer.md:467-475)
+
+```markdown
+## Writing Results — single or multi-part
+
+**Step A — Compute totals.**
+\`\`\`
+nodeCount = nodes.length
+edgeCount = edges.length
+\`\`\`
+
+**Step B — Decide split.**
+- If `nodeCount ≤ 60` AND `edgeCount ≤ 120`: write ONE file to
+  `.understand-anything/intermediate/batch-<batchIndex>.json`. Done.
+  Skip to Step E.
+- Otherwise: `parts = ceil(max(nodeCount / 60, edgeCount / 120))`.
+
+**Step C — Partition.**
+Sort files in your batch alphabetically by path. Chunk them sequentially
+into `parts` groups of size `ceil(N / parts)`, where `N` is the number of
+files in your batch. For each part:
+- All nodes whose `filePath` is in this part's files (for non-file
+  nodes like `module`/`concept`, use the file they belong to).
+- All edges whose `source` is in this part's nodes (target may be
+  anywhere — same part, different part of same batch, different batch).
+
+**Step D — Write each part.**
+Write part `k` (1-indexed) to
+`.understand-anything/intermediate/batch-<batchIndex>-part-<k>.json`.
+Each part is a valid GraphFragment: `{ "nodes": [...], "edges": [...] }`.
+
+**Step E — Self-validate.**
+For each file written, verify:
+- Valid JSON.
+- `nodes` array exists and is well-formed.
+- For every edge: `source` and `target` both appear as either (a) a
+  node `id` in this part's nodes, OR (b) a `file:<path>` reference
+  where `<path>` is in `neighborMap` or `batchImportData`, OR (c) a
+  `function:<path>:<symbol>` / `class:<path>:<symbol>` reference where
+  `<symbol>` is in some `neighbor.symbols`.
+
+If validation fails on a part, do NOT silently rebuild. Respond with
+an explicit error stating which part failed, which edge(s) failed
+validation, and why. The dispatching session can then retry.
+
+**Step F — Respond.**
+Respond with ONLY a brief text summary: parts written (1 or more),
+total nodes/edges across all parts, any files skipped. Do NOT include
+JSON content in the response.
+```
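Reviewer sketch (not part of the commit): Steps B and C of the protocol above reduce to a small amount of arithmetic. The function names are hypothetical; the thresholds are the 60-node and 120-edge limits from the text.

```python
import math

MAX_NODES, MAX_EDGES = 60, 120  # per-part thresholds from Step B


def plan_parts(node_count, edge_count):
    """Number of output parts for a batch (Step B)."""
    if node_count <= MAX_NODES and edge_count <= MAX_EDGES:
        return 1
    return math.ceil(max(node_count / MAX_NODES, edge_count / MAX_EDGES))


def partition_files(paths, parts):
    """Alphabetical, sequential chunks of size ceil(N / parts) (Step C)."""
    ordered = sorted(paths)
    size = max(1, math.ceil(len(ordered) / parts))
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


files = [f"src/f{i}.ts" for i in range(7)]
groups = partition_files(files, plan_parts(150, 200))
```

Here 150 nodes and 200 edges give `ceil(max(2.5, 1.67)) = 3` parts, and seven files are chunked as 3, 3, and 1. Because the chunk size is rounded up, this scheme can produce fewer groups than `parts` (four files in three parts give two groups of two).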
+
+### Threshold rationale
+
+`60 nodes / 120 edges per part` derives from:
+
+- File node JSON serialized ≈ 150-300 chars; function/class ≈ 80-150 chars; edge ≈ 100-150 chars.
+- 60 nodes + 120 edges ≈ 25-35KB JSON ≈ 7000-9000 output tokens (JSON tokenization is dense).
+- Bedrock OPUS default `max_tokens` 4096-8192 → ~10% safety margin.
+
+These constants live as file-analyzer.md prose for now. Auto-tuning per provider is deferred to follow-up.
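Reviewer sketch (not part of the commit): the per-item sizes quoted above bracket the serialized size of one full part. Only character counts are computed here; converting to tokens depends on the tokenizer and is deliberately left out.

```python
NODES, EDGES = 60, 120

# Per-item character ranges quoted in the rationale:
# function/class nodes are the smallest (80), file nodes the largest (300).
NODE_CHARS = (80, 300)
EDGE_CHARS = (100, 150)

low_chars = NODES * NODE_CHARS[0] + EDGES * EDGE_CHARS[0]
high_chars = NODES * NODE_CHARS[1] + EDGES * EDGE_CHARS[1]
```

This gives roughly 16.8 KB to 36 KB per part, so the 25-35KB estimate sits in the upper portion of the bracket, which is what a conservative threshold would assume.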
+
+---
+
+## Component 4 — `merge-batch-graphs.py` (verify-only)
+
+### Confirmed compatibility
+
+The existing glob and sort-key already handle multi-part files transparently:
+
+- `intermediate_dir.glob("batch-*.json")` matches `batch-3-part-1.json`.
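+- `re.search(r"batch-(\d+)", p.stem)` extracts `3` from `batch-3-part-1`, giving the same sort key as `batch-3.json`. Python `sorted` is stable, so parts that share a key keep the order of the input sequence (the glob result, which is filesystem-dependent) rather than a guaranteed lexicographic order; as the next bullet notes, this does not affect correctness.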
+- `merge_and_normalize` walks `all_nodes.extend(...)` / `all_edges.extend(...)`; load order does not affect dedup correctness.
+- `recover_imports_from_scan` operates on the merged graph — transparent to multi-part inputs.
+- `link_tests` operates on the merged node pool — transparent.
+
+No code change required for correctness.
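Reviewer sketch (not part of the commit): the sort-key behavior can be checked in isolation. `batch_sort_key` is a hypothetical stand-in for whatever key function the merge script uses; only the regex is taken from the text above.

```python
import re
from pathlib import Path


def batch_sort_key(path):
    """Numeric batch index from a batch file name (hypothetical stand-in)."""
    m = re.search(r"batch-(\d+)", path.stem)
    return int(m.group(1)) if m else 0


inputs = [
    Path("batch-10.json"),
    Path("batch-3-part-2.json"),
    Path("batch-3-part-1.json"),
    Path("batch-2.json"),
]
ordered_names = [p.name for p in sorted(inputs, key=batch_sort_key)]
```

Both parts of batch 3 get key `3` and sort between batches 2 and 10 (numerically, not as strings). Ties keep their input order, which is harmless here because merge correctness does not depend on load order.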
+
+### Add — Multi-part awareness in stderr report
+
+`merge-batch-graphs.py:1026` currently prints `Found {N} batch files:`. Enhance:
+
+```python
+from collections import defaultdict
+by_batch = defaultdict(list)
+for f in batch_files:
+    m = re.match(r"batch-(\d+)(?:-part-(\d+))?\.json", f.name)
+    if m:
+        by_batch[int(m.group(1))].append(f.name)
+
+logical_count = len(by_batch)
+multi_part = sum(1 for files in by_batch.values() if len(files) > 1)
+print(
+    f"Found {len(batch_files)} batch files "
+    f"({logical_count} logical batches, {multi_part} multi-part)",
+    file=sys.stderr,
+)
+```
+
+### Add — Missing-part warning
+
+After grouping, detect logical batches whose part numbers do not form a contiguous run starting at 1 (e.g. parts `{2, 3}` present but `1` missing) and emit:
+
+```
+Warning: merge: batch <i> has parts {<set>} but missing part {<missing>}
+  — possible truncated write — affected nodes/edges may be lost
+```
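Reviewer sketch (not part of the commit): gap detection can be done by comparing the observed part numbers against the range `1..max`. The function name `find_missing_parts` is hypothetical.

```python
import re
from collections import defaultdict


def find_missing_parts(file_names):
    """Map batch index -> sorted missing part numbers (hypothetical helper)."""
    parts_by_batch = defaultdict(set)
    for name in file_names:
        m = re.fullmatch(r"batch-(\d+)-part-(\d+)\.json", name)
        if m:
            parts_by_batch[int(m.group(1))].add(int(m.group(2)))

    missing = {}
    for batch, parts in parts_by_batch.items():
        # Expected parts are 1..max(parts); anything absent is a gap.
        gaps = set(range(1, max(parts) + 1)) - parts
        if gaps:
            missing[batch] = sorted(gaps)
    return missing


names = [
    "batch-1-part-2.json",
    "batch-1-part-3.json",
    "batch-2.json",
    "batch-3-part-1.json",
    "batch-3-part-2.json",
]
gaps_found = find_missing_parts(names)
```

Only batch 1 is reported, with part 1 missing. A limitation of this approach is that a missing trailing part (for example, part 3 of 3 never written) cannot be detected from file names alone, since the intended part count is not recorded anywhere.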
+
+---
+
+## Failure modes & observability
+
+| Failure point | Behavior | Safety net | Required warning text |
+|---|---|---|---|
+| Louvain library throws | exception | Script-internal: catch → count-based fallback (12 files/batch); neighborMap still built | `Warning: compute-batches: Louvain failed (<msg>) — falling back to count-based grouping (12 files/batch) — module semantic boundaries lost` |
+| tree-sitter exports per-file failure | empty exports | symbols=[] in neighborMap | `Warning: compute-batches: exports extraction failed for <path> (<msg>) — symbols=[] in neighborMap — cross-batch edges to this file limited to file-level` |
+| Louvain produces oversized community | size > 35 | Edge-betweenness split | `Warning: compute-batches: community size <N> > max 35 — splitting via edge-betweenness — modularity may decrease` |
+| compute-batches complete crash | exit non-zero, no batches.json | SKILL.md surfaces full stderr to user; no Phase 2 fallback | (script's own error to stderr; SKILL.md relays verbatim) |
+| neighborMap truncation | > 50 neighbors | Top-50 by degree kept | `Warning: compute-batches: neighborMap for <path> truncated from <N> to top 50 (by neighbor degree)` |
+| file-analyzer part JSON malformed | `load_batch` skips | Existing `load_batch:139` warns and skips | (existing — verify the warning is not swallowed) |
+| Missing part in multi-part batch | gap in parts | merge detects and warns | `Warning: merge: batch <i> has parts {<set>} but missing part {<missing>} — possible truncated write — affected nodes/edges may be lost` |
+| file-analyzer dangling edges | source/target missing | merge drops, adds to `unfixable` (existing) | (existing) |
+| file-analyzer dispatch fails | subagent error | existing retry-once mechanism | (existing) |
+
+### Observability invariant
+
+Every fallback / degrade / drop MUST:
+
+1. Write a stderr line in `Warning: <component>: <what happened> — <why> — <impact>` format.
+2. Bubble up to `$PHASE_WARNINGS` (SKILL.md existing mechanism) → user-facing Phase 7 final report.
+3. Never use silent `catch {}` / `except: pass`. Code review treats this as a blocker.
+
+### Invariants
+
+1. **scan-result.json is source of truth.** Any batching/topology change preserves importMap; `recover_imports_from_scan` always restores `imports` edges.
+2. **Dangling-edge dropper is final defense.** No batch-generated edge can connect to a nonexistent node in the assembled graph.
+3. **No silent fallback.** `batches.json` missing → loud failure. Internal compute-batches fallback → loud warning that bubbles to user.
+
+---
+
+## Testing
+
+### Unit tests — `compute-batches.mjs`
+
+New file: `understand-anything-plugin/skills/understand/test_compute_batches.test.mjs` (Vitest).
+
+Required cases:
+
+- **Louvain basic:** 3 disjoint cliques → 3 batches.
+- **Empty importMap:** independent files → count-fallback batches by alphabetical chunking.
+- **Oversized community:** 50-node complete graph → split triggered, all sub-batches ≤ 35.
+- **Non-code grouping A:** `Dockerfile` + `docker-compose.yml` + `.dockerignore` siblings → one batch per directory cluster.
+- **Non-code grouping B:** `.github/workflows/*.yml` → one batch.
+- **Non-code grouping D:** SQL migrations under `migrations/` → one batch per directory.
+- **Mixed code + non-code:** non-code batchIndex follows code batches.
+- **neighborMap correctness:** file A imports file B across batches → `neighborMap[A]` contains `{path: B, batchIndex: B's, symbols: B's exports}`.
+- **neighborMap excludes same-batch:** A and C in same batch → `neighborMap[A]` does not contain C.
+- **Exports failure tolerance:** mock TreeSitter to throw on one file → `exports = []` for that file, others unaffected.
+- **`--changed-files`:** input subset → output contains only batches with changed files; neighborMap may reference unchanged files.
+- **Fallback triggers:** mock Louvain throw → `algorithm` field = `"count-fallback"`, warning in stderr.
+- **Warning assertion per fallback:** for each of {Louvain crash, exports failure, oversize split, neighborMap truncation}, assert the exact warning string appears in stderr.
+
+### Unit tests — `merge-batch-graphs.py`
+
+New test class `TestMultiPart` in `test_merge_batch_graphs.py`:
+
+- Two parts of one logical batch: `batch-1-part-1.json` + `batch-1-part-2.json` → assembled contains all nodes/edges from both.
+- Three parts of one logical batch.
+- Cross-part edges: edge with source in part-1, target node in part-2 → connected after merge.
+- Malformed part-1 + valid part-2: part-1 skipped with warning, part-2 contents present.
+- Mixed single-batch and multi-part inputs.
+- Missing part detection: `batch-1-part-2.json` + `batch-1-part-3.json` (no part-1) → warning emitted with exact text.
+- stderr format: assert `"X logical batches, Y multi-part"` appears.
+
+### Integration — PR acceptance gate (manual)
+
+Documented in the PR's Test plan:
+
+- [ ] `pnpm install` (graphology installs cleanly).
+- [ ] `pnpm --filter @understand-anything/core build`.
+- [ ] Run `/understand --full` on this repo (Understand-Anything itself):
+  - `batches.json` generated; community size distribution sanity-check (mix of small and medium batches).
+  - At least one batch produces multi-part output.
+  - `assembled-graph.json` node/edge counts within expected range vs current main.
+  - Dashboard renders normally.
+  - Phase 7 final report includes any `$PHASE_WARNINGS` from compute-batches (visually verify warnings reach user-facing output, not just stderr).
+- [ ] Run on a ~100-file repo matching ayushghosh's scenario; confirm no "output limit" errors.
+- [ ] Run on a 5-10 file small repo: fallback path (all one batch) works correctly.
+
+### Not tested
+
+- Louvain algorithm correctness (trust `graphology-communities-louvain`'s own tests).
+- Performance benchmarks (sub-second on 100-500 files is empirical; not gated).
+- Multiple LLM provider output-cap variations (thresholds are conservative for Bedrock OPUS; first-party Anthropic is more permissive).
+
+---
+
+## Out of scope (tracked for follow-up)
+
+### Tree-sitter deduplication
+
+Currently Phase 1 (project-scanner), Phase 1.5 (compute-batches), and Phase 2 (file-analyzer per-batch) each run tree-sitter independently. Consolidating into a single Phase 1.5 structure extraction would simplify file-analyzer and save time on large projects. Defer because it requires reorganizing file-analyzer's protocol significantly.
+
+### neighborMap LLM summaries
+
+Adding one-sentence summaries per file to neighborMap would enable file-analyzer to emit `related` edges across batches with semantic justification. Requires a new lightweight summary-pass agent; defer until the tree-sitter dedup lands (Phase 1.5 will already have full structure → cheaper to add).
+
+### Adaptive thresholds
+
+`60 nodes / 120 edges` are conservative for Bedrock OPUS. Anthropic first-party supports much larger output caps. Adding a `--output-cap=<N>` CLI to compute-batches and propagating to file-analyzer would unlock larger parts on permissive backends. Track real-world part counts before implementing.
+
+### Cross-batch edge audit
+
+A post-merge audit comparing neighborMap-suggested edges vs actually-emitted edges would surface gaps. Mirror the existing `recover_imports_from_scan` pattern. Requires preserving `batches.json` for merge-time consumption.
+
+### Multi-language monorepo handling
+
+Multi-language repos (TS + Python) tend to naturally split via Louvain (no cross-language imports). Bridge files (OpenAPI, protobuf) might create odd communities. Address only if real reports surface.
+
+---
+
+## Implementation order
+
+1. **Prototype:** minimal `compute-batches.mjs` skeleton — load scan-result.json, run Louvain, print community sizes. Run against this repo's `scan-result.json` (generate if missing via `/understand --full`). Decide whether size-enforcement branch is needed; if needed, choose between edge-betweenness and weakly-connected-component split.
+2. Add exports extraction (reuse TreeSitterPlugin).
+3. Add neighborMap construction + batchImportData passthrough.
+4. Add non-code grouping heuristics (Groups A-E).
+5. Add fallback path + warning emissions for every failure mode listed in the Failure modes table.
+6. Write unit tests for compute-batches (per Testing section), including warning-text assertions.
+7. Modify `agents/file-analyzer.md` — add Cross-batch context section, replace Writing Results.
+8. Modify `skills/understand/SKILL.md` — add Phase 1.5, replace Phase 2 ANALYZE batching prose, replace incremental path.
+9. Add multi-part stderr report + missing-part warning to `merge-batch-graphs.py`.
+10. Write unit tests for `merge-batch-graphs.py` multi-part handling.
+11. Add `graphology` + `graphology-communities-louvain` to `understand-anything-plugin/package.json`.
+12. Run integration acceptance gate.
+13. Bump version in all five `package.json` / `plugin.json` files per the project's CLAUDE.md versioning rule.
@@ -7,7 +7,7 @@
  "scripts": {
    "prepare": "pnpm --filter @understand-anything/core build",
    "build": "pnpm -r build",
-    "test": "vitest",
+    "test": "vitest run",
    "dev:dashboard": "pnpm --filter @understand-anything/dashboard dev",
    "lint": "eslint ."
  },
@@ -38,6 +38,12 @@ importers:
      '@understand-anything/core':
        specifier: workspace:*
        version: link:packages/core
+      graphology:
+        specifier: ~0.26.0
+        version: 0.26.0(graphology-types@0.24.8)
+      graphology-communities-louvain:
+        specifier: ^2.0.2
+        version: 2.0.2(graphology-types@0.24.8)
    devDependencies:
      '@types/node':
        specifier: ^22.0.0
@@ -1861,6 +1867,11 @@ packages:
    peerDependencies:
      graphology-types: '>=0.24.0'

+  graphology@0.26.0:
+    resolution: {integrity: sha512-8SSImzgUUYC89Z042s+0r/vMibY7GX/Emz4LDO5e7jYXhuoWfHISPFJYjpRLUSJGq6UQ6xlenvX1p/hJdfXuXg==}
+    peerDependencies:
+      graphology-types: '>=0.24.0'
+
  h3@1.15.11:
    resolution: {integrity: sha512-L3THSe2MPeBwgIZVSH5zLdBBU90TOxarvhK9d04IDY2AmVS8j2Jz2LIWtwsGOU3lu2I5jCN7FNvVfY2+XyF+mg==}

@@ -4966,6 +4977,11 @@ snapshots:
      graphology-types: 0.24.8
      obliterator: 2.0.5

+  graphology@0.26.0(graphology-types@0.24.8):
+    dependencies:
+      events: 3.3.0
+      graphology-types: 0.24.8
+
  h3@1.15.11:
    dependencies:
      cookie-es: 1.2.3
@@ -0,0 +1,31 @@
+{
+  "name": "fixture-3-cliques",
+  "description": "Three disjoint import cliques for Louvain testing",
+  "languages": ["typescript"],
+  "frameworks": [],
+  "files": [
+    {"path": "src/auth/login.ts", "language": "typescript", "sizeLines": 50, "fileCategory": "code"},
+    {"path": "src/auth/session.ts", "language": "typescript", "sizeLines": 40, "fileCategory": "code"},
+    {"path": "src/auth/tokens.ts", "language": "typescript", "sizeLines": 60, "fileCategory": "code"},
+    {"path": "src/api/handlers.ts", "language": "typescript", "sizeLines": 80, "fileCategory": "code"},
+    {"path": "src/api/middleware.ts", "language": "typescript", "sizeLines": 30, "fileCategory": "code"},
+    {"path": "src/api/routes.ts", "language": "typescript", "sizeLines": 45, "fileCategory": "code"},
+    {"path": "src/db/users.ts", "language": "typescript", "sizeLines": 70, "fileCategory": "code"},
+    {"path": "src/db/queries.ts", "language": "typescript", "sizeLines": 55, "fileCategory": "code"},
+    {"path": "src/db/migrations.ts", "language": "typescript", "sizeLines": 35, "fileCategory": "code"}
+  ],
+  "totalFiles": 9,
+  "filteredByIgnore": 0,
+  "estimatedComplexity": "small",
+  "importMap": {
+    "src/auth/login.ts": ["src/auth/session.ts", "src/auth/tokens.ts"],
+    "src/auth/session.ts": ["src/auth/tokens.ts"],
+    "src/auth/tokens.ts": [],
+    "src/api/handlers.ts": ["src/api/middleware.ts", "src/api/routes.ts"],
+    "src/api/middleware.ts": ["src/api/routes.ts", "src/auth/session.ts"],
+    "src/api/routes.ts": [],
+    "src/db/users.ts": ["src/db/queries.ts", "src/db/migrations.ts"],
+    "src/db/queries.ts": ["src/db/migrations.ts"],
+    "src/db/migrations.ts": []
+  }
+}
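One way to sanity-check this fixture is to treat `importMap` as a graph and count how many edges stay inside a `src/<dir>/` group versus how many cross between groups. The sketch below is illustrative only: `countEdges` is a hypothetical helper that does not appear in the PR, and `compute-batches.mjs` itself clusters with Louvain rather than by directory name.

```javascript
// Hypothetical helper (not from the PR): classify importMap edges by whether
// both endpoints share the same src/<dir>/ segment.
const cliqueImportMap = {
  'src/auth/login.ts': ['src/auth/session.ts', 'src/auth/tokens.ts'],
  'src/auth/session.ts': ['src/auth/tokens.ts'],
  'src/auth/tokens.ts': [],
  'src/api/handlers.ts': ['src/api/middleware.ts', 'src/api/routes.ts'],
  'src/api/middleware.ts': ['src/api/routes.ts', 'src/auth/session.ts'],
  'src/api/routes.ts': [],
  'src/db/users.ts': ['src/db/queries.ts', 'src/db/migrations.ts'],
  'src/db/queries.ts': ['src/db/migrations.ts'],
  'src/db/migrations.ts': [],
};

function countEdges(map) {
  let intra = 0;
  const cross = [];
  for (const [from, targets] of Object.entries(map)) {
    for (const to of targets) {
      // Segment 1 of "src/<dir>/file.ts" is the directory name.
      if (from.split('/')[1] === to.split('/')[1]) {
        intra += 1;
      } else {
        cross.push(`${from} -> ${to}`);
      }
    }
  }
  return { intra, cross };
}

const edgeSummary = countEdges(cliqueImportMap);
console.log(edgeSummary.intra, edgeSummary.cross);
```

The fixture has nine intra-directory edges and a single cross edge from `src/api/middleware.ts` to `src/auth/session.ts`, which is presumably why a modularity-based method is still expected to return three batches.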
@@ -0,0 +1,233 @@
+{
+  "name": "fixture-merge-respects-non-mergeable",
+  "description": "Regression guard for mergeSmallBatches: a small non-mergeable batch (Dockerfile cluster, marked mergeable=false by buildNonCodeBatches Group A) must NOT be pooled into the misc bucket alongside isolated code singletons, even though its size (1) is well below MIN_BATCH_SIZE=3. Pooling Dockerfiles into misc would destroy the semantic atom — an LLM analyzing the misc batch loses the per-service infra context.",
+  "languages": [
+    "typescript",
+    "dockerfile"
+  ],
+  "frameworks": [],
+  "files": [
+    {
+      "path": "src/leaf000.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf001.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf002.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf003.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf004.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf005.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf006.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf007.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf008.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf009.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf010.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf011.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf012.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf013.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf014.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf015.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf016.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf017.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf018.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf019.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf020.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf021.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf022.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf023.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf024.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf025.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf026.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf027.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf028.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf029.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "services/api/Dockerfile",
+      "language": "dockerfile",
+      "sizeLines": 18,
+      "fileCategory": "infra"
+    }
+  ],
+  "totalFiles": 31,
+  "filteredByIgnore": 0,
+  "estimatedComplexity": "moderate",
+  "importMap": {
+    "src/leaf000.ts": [],
+    "src/leaf001.ts": [],
+    "src/leaf002.ts": [],
+    "src/leaf003.ts": [],
+    "src/leaf004.ts": [],
+    "src/leaf005.ts": [],
+    "src/leaf006.ts": [],
+    "src/leaf007.ts": [],
+    "src/leaf008.ts": [],
+    "src/leaf009.ts": [],
+    "src/leaf010.ts": [],
+    "src/leaf011.ts": [],
+    "src/leaf012.ts": [],
+    "src/leaf013.ts": [],
+    "src/leaf014.ts": [],
+    "src/leaf015.ts": [],
+    "src/leaf016.ts": [],
+    "src/leaf017.ts": [],
+    "src/leaf018.ts": [],
+    "src/leaf019.ts": [],
+    "src/leaf020.ts": [],
+    "src/leaf021.ts": [],
+    "src/leaf022.ts": [],
+    "src/leaf023.ts": [],
+    "src/leaf024.ts": [],
+    "src/leaf025.ts": [],
+    "src/leaf026.ts": [],
+    "src/leaf027.ts": [],
+    "src/leaf028.ts": [],
+    "src/leaf029.ts": [],
+    "services/api/Dockerfile": []
+  }
+}
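The rule this fixture guards can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's `mergeSmallBatches`: the batch shape (`files`, `mergeable`), the `misc` flag, the `miscChunkSize` parameter and the function name `mergeSmallBatchesSketch` are all hypothetical, and only `MIN_BATCH_SIZE = 3` and the `mergeable=false` marker come from the fixture description. Batch ordering is not modeled.

```javascript
// Sketch only: shapes and names are assumptions, not the PR's implementation.
const MIN_BATCH_SIZE = 3; // value quoted in the fixture description

function mergeSmallBatchesSketch(batches, miscChunkSize) {
  const kept = [];
  const pooled = [];
  for (const batch of batches) {
    const isSmall = batch.files.length < MIN_BATCH_SIZE;
    if (isSmall && batch.mergeable !== false) {
      // Small, mergeable batches give up their files to the misc pool.
      pooled.push(...batch.files);
    } else {
      // Large batches and non-mergeable batches are kept untouched.
      kept.push(batch);
    }
  }
  // Re-chunk the pooled files into misc batches of at most miscChunkSize.
  for (let i = 0; i < pooled.length; i += miscChunkSize) {
    kept.push({ files: pooled.slice(i, i + miscChunkSize), mergeable: true, misc: true });
  }
  return kept;
}

// 30 isolated code singletons plus one non-mergeable Dockerfile batch,
// mirroring the fixture's file list.
const singletonBatches = Array.from({ length: 30 }, (_, i) => ({
  files: [`src/leaf${String(i).padStart(3, '0')}.ts`],
  mergeable: true,
}));
const mergedBatches = mergeSmallBatchesSketch(
  [...singletonBatches, { files: ['services/api/Dockerfile'], mergeable: false }],
  30, // assumed chunk size, chosen only for this example
);
console.log(mergedBatches.map(b => b.files.length));
```

In this sketch the Dockerfile batch survives as its own one-file batch even though it is below `MIN_BATCH_SIZE`, while the 30 leaf files are pooled together.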
@@ -0,0 +1,38 @@
+{
+  "name": "fixture-non-code",
+  "description": "Mix of non-code files exercising Groups A-E. The src/ clique has 3 mutually-importing files so it survives merge-small (size >= MIN_BATCH_SIZE=3) and stays a pure-code batch — required by the 'non-code batch indices follow code batches' assertion.",
+  "languages": ["typescript", "dockerfile", "yaml", "sql", "markdown"],
+  "frameworks": [],
+  "files": [
+    {"path": "src/index.ts", "language": "typescript", "sizeLines": 10, "fileCategory": "code"},
+    {"path": "src/server.ts", "language": "typescript", "sizeLines": 15, "fileCategory": "code"},
+    {"path": "src/router.ts", "language": "typescript", "sizeLines": 12, "fileCategory": "code"},
+    {"path": "Dockerfile", "language": "dockerfile", "sizeLines": 20, "fileCategory": "infra"},
+    {"path": "docker-compose.yml", "language": "yaml", "sizeLines": 15, "fileCategory": "infra"},
+    {"path": ".dockerignore", "language": "config", "sizeLines": 5, "fileCategory": "config"},
+    {"path": "services/api/Dockerfile", "language": "dockerfile", "sizeLines": 18, "fileCategory": "infra"},
+    {"path": "services/api/docker-compose.yml", "language": "yaml", "sizeLines": 12, "fileCategory": "infra"},
+    {"path": ".github/workflows/ci.yml", "language": "yaml", "sizeLines": 30, "fileCategory": "infra"},
+    {"path": ".github/workflows/deploy.yml", "language": "yaml", "sizeLines": 25, "fileCategory": "infra"},
+    {"path": ".gitlab-ci.yml", "language": "yaml", "sizeLines": 20, "fileCategory": "infra"},
+    {"path": ".circleci/config.yml", "language": "yaml", "sizeLines": 25, "fileCategory": "infra"},
+    {"path": "migrations/001_init.sql", "language": "sql", "sizeLines": 40, "fileCategory": "data"},
+    {"path": "migrations/002_users.sql", "language": "sql", "sizeLines": 20, "fileCategory": "data"},
+    {"path": "docs/getting-started.md", "language": "markdown", "sizeLines": 100, "fileCategory": "docs"},
+    {"path": "README.md", "language": "markdown", "sizeLines": 200, "fileCategory": "docs"}
+  ],
+  "totalFiles": 16,
+  "filteredByIgnore": 0,
+  "estimatedComplexity": "small",
+  "importMap": {
+    "src/index.ts": ["src/server.ts", "src/router.ts"],
+    "src/server.ts": ["src/router.ts"],
+    "src/router.ts": [],
+    "Dockerfile": [], "docker-compose.yml": [], ".dockerignore": [],
+    "services/api/Dockerfile": [], "services/api/docker-compose.yml": [],
+    ".github/workflows/ci.yml": [], ".github/workflows/deploy.yml": [],
+    ".gitlab-ci.yml": [], ".circleci/config.yml": [],
+    "migrations/001_init.sql": [], "migrations/002_users.sql": [],
+    "docs/getting-started.md": [], "README.md": []
+  }
+}
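The per-directory Docker clustering that this fixture exercises (Group A) could look roughly like the sketch below. It is an assumption-laden illustration: `groupDockerByDir` and the `DOCKER_BASENAMES` set are hypothetical, and the real matching rules in `buildNonCodeBatches` may differ. Only the expected groupings come from the fixture and its tests.

```javascript
// Assumed set of basenames that belong to a Docker cluster (not from the PR).
const DOCKER_BASENAMES = new Set(['Dockerfile', 'docker-compose.yml', '.dockerignore']);

// Hypothetical helper: bucket Docker-related files by their containing directory.
function groupDockerByDir(paths) {
  const byDir = new Map();
  for (const p of paths) {
    const slash = p.lastIndexOf('/');
    const dir = slash === -1 ? '' : p.slice(0, slash);
    const base = p.slice(slash + 1);
    if (!DOCKER_BASENAMES.has(base)) continue;
    if (!byDir.has(dir)) byDir.set(dir, []);
    byDir.get(dir).push(p);
  }
  // One group per directory, with paths sorted for stable output.
  return [...byDir.values()].map(files => files.sort());
}

const dockerGroups = groupDockerByDir([
  'src/index.ts',
  'Dockerfile',
  'docker-compose.yml',
  '.dockerignore',
  'services/api/Dockerfile',
  'services/api/docker-compose.yml',
  '.github/workflows/ci.yml',
  'README.md',
]);
console.log(dockerGroups);
```

Keying on the directory keeps the root cluster and the `services/api` cluster apart, which is the separation the fixture is set up to check.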
@@ -0,0 +1,715 @@
+{
+  "name": "fixture-singletons",
+  "description": "100 isolated TS files that should merge into ~4 misc batches",
+  "languages": [
+    "typescript"
+  ],
+  "frameworks": [],
+  "files": [
+    {
+      "path": "src/leaf000.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf001.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf002.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf003.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf004.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf005.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf006.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf007.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf008.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf009.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf010.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf011.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf012.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf013.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf014.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf015.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf016.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf017.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf018.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf019.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf020.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf021.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf022.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf023.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf024.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf025.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf026.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf027.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf028.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf029.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf030.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf031.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf032.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf033.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf034.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf035.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf036.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf037.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf038.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf039.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf040.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf041.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf042.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf043.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf044.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf045.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf046.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf047.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf048.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf049.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf050.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf051.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf052.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf053.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf054.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf055.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf056.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf057.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf058.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf059.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf060.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf061.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf062.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf063.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf064.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf065.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf066.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf067.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf068.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf069.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf070.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf071.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf072.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf073.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf074.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf075.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf076.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf077.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf078.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf079.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf080.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf081.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf082.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf083.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf084.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf085.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf086.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf087.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf088.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf089.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf090.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf091.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf092.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf093.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf094.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf095.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf096.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf097.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf098.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    },
+    {
+      "path": "src/leaf099.ts",
+      "language": "typescript",
+      "sizeLines": 10,
+      "fileCategory": "code"
+    }
+  ],
+  "totalFiles": 100,
+  "filteredByIgnore": 0,
+  "estimatedComplexity": "moderate",
+  "importMap": {
+    "src/leaf000.ts": [],
+    "src/leaf001.ts": [],
+    "src/leaf002.ts": [],
+    "src/leaf003.ts": [],
+    "src/leaf004.ts": [],
+    "src/leaf005.ts": [],
+    "src/leaf006.ts": [],
+    "src/leaf007.ts": [],
+    "src/leaf008.ts": [],
+    "src/leaf009.ts": [],
+    "src/leaf010.ts": [],
+    "src/leaf011.ts": [],
+    "src/leaf012.ts": [],
+    "src/leaf013.ts": [],
+    "src/leaf014.ts": [],
+    "src/leaf015.ts": [],
+    "src/leaf016.ts": [],
+    "src/leaf017.ts": [],
+    "src/leaf018.ts": [],
+    "src/leaf019.ts": [],
+    "src/leaf020.ts": [],
+    "src/leaf021.ts": [],
+    "src/leaf022.ts": [],
+    "src/leaf023.ts": [],
+    "src/leaf024.ts": [],
+    "src/leaf025.ts": [],
+    "src/leaf026.ts": [],
+    "src/leaf027.ts": [],
+    "src/leaf028.ts": [],
+    "src/leaf029.ts": [],
+    "src/leaf030.ts": [],
+    "src/leaf031.ts": [],
+    "src/leaf032.ts": [],
+    "src/leaf033.ts": [],
+    "src/leaf034.ts": [],
+    "src/leaf035.ts": [],
+    "src/leaf036.ts": [],
+    "src/leaf037.ts": [],
+    "src/leaf038.ts": [],
+    "src/leaf039.ts": [],
+    "src/leaf040.ts": [],
+    "src/leaf041.ts": [],
+    "src/leaf042.ts": [],
+    "src/leaf043.ts": [],
+    "src/leaf044.ts": [],
+    "src/leaf045.ts": [],
+    "src/leaf046.ts": [],
+    "src/leaf047.ts": [],
+    "src/leaf048.ts": [],
+    "src/leaf049.ts": [],
+    "src/leaf050.ts": [],
+    "src/leaf051.ts": [],
+    "src/leaf052.ts": [],
+    "src/leaf053.ts": [],
+    "src/leaf054.ts": [],
+    "src/leaf055.ts": [],
+    "src/leaf056.ts": [],
+    "src/leaf057.ts": [],
+    "src/leaf058.ts": [],
+    "src/leaf059.ts": [],
+    "src/leaf060.ts": [],
+    "src/leaf061.ts": [],
+    "src/leaf062.ts": [],
+    "src/leaf063.ts": [],
+    "src/leaf064.ts": [],
+    "src/leaf065.ts": [],
+    "src/leaf066.ts": [],
+    "src/leaf067.ts": [],
+    "src/leaf068.ts": [],
+    "src/leaf069.ts": [],
+    "src/leaf070.ts": [],
+    "src/leaf071.ts": [],
+    "src/leaf072.ts": [],
+    "src/leaf073.ts": [],
+    "src/leaf074.ts": [],
+    "src/leaf075.ts": [],
+    "src/leaf076.ts": [],
+    "src/leaf077.ts": [],
+    "src/leaf078.ts": [],
+    "src/leaf079.ts": [],
+    "src/leaf080.ts": [],
+    "src/leaf081.ts": [],
+    "src/leaf082.ts": [],
+    "src/leaf083.ts": [],
+    "src/leaf084.ts": [],
+    "src/leaf085.ts": [],
+    "src/leaf086.ts": [],
+    "src/leaf087.ts": [],
+    "src/leaf088.ts": [],
+    "src/leaf089.ts": [],
+    "src/leaf090.ts": [],
+    "src/leaf091.ts": [],
+    "src/leaf092.ts": [],
+    "src/leaf093.ts": [],
+    "src/leaf094.ts": [],
+    "src/leaf095.ts": [],
+    "src/leaf096.ts": [],
+    "src/leaf097.ts": [],
+    "src/leaf098.ts": [],
+    "src/leaf099.ts": []
+  }
+}
@@ -0,0 +1,602 @@
+import { describe, it, expect, beforeEach, afterEach } from 'vitest';
+import { mkdtempSync, mkdirSync, writeFileSync, readFileSync, rmSync } from 'node:fs';
+import { tmpdir } from 'node:os';
+import { join } from 'node:path';
+import { spawnSync } from 'node:child_process';
+import { fileURLToPath } from 'node:url';
+import { dirname, resolve } from 'node:path';
+
+const __dirname = dirname(fileURLToPath(import.meta.url));
+const SCRIPT = resolve(__dirname, '../../../understand-anything-plugin/skills/understand/compute-batches.mjs');
+const FIXTURES = resolve(__dirname, 'fixtures');
+
+function runScript(projectRoot, extraArgs = []) {
+  return spawnSync('node', [SCRIPT, projectRoot, ...extraArgs], {
+    encoding: 'utf-8',
+  });
+}
+
+function setupProject(fixtureName) {
+  const root = mkdtempSync(join(tmpdir(), 'ua-cb-test-'));
+  mkdirSync(join(root, '.understand-anything', 'intermediate'), { recursive: true });
+  const fixturePath = join(FIXTURES, fixtureName);
+  const dest = join(root, '.understand-anything', 'intermediate', 'scan-result.json');
+  writeFileSync(dest, readFileSync(fixturePath, 'utf-8'));
+  return root;
+}
+
+function readBatches(projectRoot) {
+  const p = join(projectRoot, '.understand-anything', 'intermediate', 'batches.json');
+  return JSON.parse(readFileSync(p, 'utf-8'));
+}
+
+describe('compute-batches.mjs — Louvain basic', () => {
+  let projectRoot;
+
+  beforeEach(() => {
+    projectRoot = setupProject('scan-result-3-cliques.json');
+  });
+
+  afterEach(() => {
+    if (projectRoot) rmSync(projectRoot, { recursive: true, force: true });
+  });
+
+  it('produces 3 batches for 3 import cliques', () => {
+    const result = runScript(projectRoot);
+    expect(result.status).toBe(0);
+
+    const batches = readBatches(projectRoot);
+    expect(batches.algorithm).toBe('louvain');
+    expect(batches.totalFiles).toBe(9);
+    expect(batches.batches.length).toBe(3);
+    expect(batches.schemaVersion).toBe(1);
+    expect(batches.totalBatches).toBe(3);
+    expect(batches.batches.map(b => b.batchIndex)).toEqual([1, 2, 3]);
+
+    // Each batch should contain exactly one clique (3 files)
+    for (const b of batches.batches) {
+      expect(b.files.length).toBe(3);
+      const dirs = new Set(b.files.map(f => f.path.split('/')[1]));
+      expect(dirs.size).toBe(1); // all files in the batch share src/<dir>/
+    }
+  });
+
+  it('produces deterministic output across runs', () => {
+    const r1 = runScript(projectRoot);
+    expect(r1.status).toBe(0);
+    const json1 = readFileSync(
+      join(projectRoot, '.understand-anything', 'intermediate', 'batches.json'),
+      'utf-8',
+    );
+
+    const r2 = runScript(projectRoot);
+    expect(r2.status).toBe(0);
+    const json2 = readFileSync(
+      join(projectRoot, '.understand-anything', 'intermediate', 'batches.json'),
+      'utf-8',
+    );
+
+    expect(json1).toBe(json2);
+  });
+});
+
+describe('compute-batches.mjs — size enforcement', () => {
+  let projectRoot;
+
+  beforeEach(() => {
+    projectRoot = setupProject('scan-result-large-community.json');
+  });
+
+  afterEach(() => {
+    if (projectRoot) rmSync(projectRoot, { recursive: true, force: true });
+  });
+
+  it('splits a 40-node clique into batches ≤ 35', () => {
+    const result = runScript(projectRoot);
+    expect(result.status).toBe(0);
+
+    const batches = readBatches(projectRoot);
+    expect(batches.algorithm).toBe('louvain');  // confirm fallback didn't fire
+    expect(batches.totalFiles).toBe(40);
+    expect(batches.batches.length).toBe(2);
+    expect(batches.batches.map(b => b.files.length).sort((a, b) => a - b)).toEqual([20, 20]);
+    // Sum of all batch file counts equals total files
+    const sum = batches.batches.reduce((acc, b) => acc + b.files.length, 0);
+    expect(sum).toBe(40);
+    // Warning was emitted to stderr
+    expect(result.stderr).toMatch(/Warning: compute-batches: community size 40 > max 35/);
+  });
+});
+
+describe('compute-batches.mjs — exports extraction', () => {
+  let root;
+
+  afterEach(() => {
+    if (root) rmSync(root, { recursive: true, force: true });
+  });
+
+  it('populates exports for code files via tree-sitter', () => {
+    root = mkdtempSync(join(tmpdir(), 'ua-cb-exp-'));
+    mkdirSync(join(root, '.understand-anything', 'intermediate'), { recursive: true });
+    mkdirSync(join(root, 'src'), { recursive: true });
+    writeFileSync(join(root, 'src', 'a.ts'),
+      'export function greet(name: string) { return "hi " + name; }\n' +
+      'export class Greeter { greet(n: string) { return "hi " + n; } }\n');
+    writeFileSync(join(root, 'src', 'b.ts'),
+      'import { greet } from "./a";\nexport const helper = () => greet("world");\n');
+
+    const scan = {
+      name: 'exports-test',
+      description: '',
+      languages: ['typescript'],
+      frameworks: [],
+      files: [
+        { path: 'src/a.ts', language: 'typescript', sizeLines: 2, fileCategory: 'code' },
+        { path: 'src/b.ts', language: 'typescript', sizeLines: 2, fileCategory: 'code' },
+      ],
+      totalFiles: 2, filteredByIgnore: 0, estimatedComplexity: 'small',
+      importMap: { 'src/a.ts': [], 'src/b.ts': ['src/a.ts'] },
+    };
+    writeFileSync(
+      join(root, '.understand-anything', 'intermediate', 'scan-result.json'),
+      JSON.stringify(scan));
+
+    const result = runScript(root);
+    expect(result.status).toBe(0);
+
+    const batches = readBatches(root);
+    expect(batches.exportsByPath).toBeDefined();
+    expect(batches.exportsByPath['src/a.ts']).toEqual(
+      expect.arrayContaining(['greet', 'Greeter']));
+    expect(batches.exportsByPath['src/b.ts']).toEqual(
+      expect.arrayContaining(['helper']));
+  });
+
+  it('emits warning when file is missing from disk (read error path)', () => {
+    root = mkdtempSync(join(tmpdir(), 'ua-cb-exp-err-'));
+    mkdirSync(join(root, '.understand-anything', 'intermediate'), { recursive: true });
+    // Note: src/missing.ts is deliberately NOT created on disk. scan-result.json
+    // still references it, so the read-error branch fires.
+    const scan = {
+      name: 'missing-file-test',
+      description: '',
+      languages: ['typescript'],
+      frameworks: [],
+      files: [
+        { path: 'src/missing.ts', language: 'typescript', sizeLines: 1, fileCategory: 'code' },
+      ],
+      totalFiles: 1, filteredByIgnore: 0, estimatedComplexity: 'small',
+      importMap: { 'src/missing.ts': [] },
+    };
+    writeFileSync(
+      join(root, '.understand-anything', 'intermediate', 'scan-result.json'),
+      JSON.stringify(scan));
+
+    const result = runScript(root);
+    expect(result.status).toBe(0);  // script must still succeed
+    expect(result.stderr).toMatch(
+      /Warning: compute-batches: exports extraction failed for src\/missing\.ts \(read error:/);
+
+    const batches = readBatches(root);
+    expect(batches.exportsByPath['src/missing.ts']).toEqual([]);
+  });
+});
+
+describe('compute-batches.mjs — non-code grouping', () => {
+  let root;
+  let batches;
+
+  beforeEach(() => {
+    root = setupProject('scan-result-non-code.json');
+    const result = runScript(root);
+    expect(result.status).toBe(0);
+    batches = readBatches(root);
+  });
+
+  afterEach(() => {
+    if (root) rmSync(root, { recursive: true, force: true });
+  });
+
+  it('Group A: bundles Dockerfile cluster per directory', () => {
+    // Root-level cluster: Dockerfile + docker-compose.yml + .dockerignore → one batch
+    const rootDockerBatch = batches.batches.find(b =>
+      b.files.some(f => f.path === 'Dockerfile'));
+    expect(rootDockerBatch).toBeDefined();
+    const paths = rootDockerBatch.files.map(f => f.path).sort();
+    expect(paths).toEqual(['.dockerignore', 'Dockerfile', 'docker-compose.yml']);
+
+    // services/api cluster is a separate batch
+    const apiDockerBatch = batches.batches.find(b =>
+      b.files.some(f => f.path === 'services/api/Dockerfile'));
+    expect(apiDockerBatch).toBeDefined();
+    expect(apiDockerBatch).not.toBe(rootDockerBatch);
+    expect(apiDockerBatch.files.map(f => f.path).sort()).toEqual([
+      'services/api/Dockerfile', 'services/api/docker-compose.yml',
+    ]);
+  });
+
+  it('Group B: .github/workflows/* all in one batch', () => {
+    const wfBatch = batches.batches.find(b =>
+      b.files.some(f => f.path.startsWith('.github/workflows/')));
+    expect(wfBatch).toBeDefined();
+    const wfPaths = wfBatch.files.map(f => f.path).filter(p => p.startsWith('.github/workflows/'));
+    expect(wfPaths.sort()).toEqual([
+      '.github/workflows/ci.yml', '.github/workflows/deploy.yml',
+    ]);
+  });
+
+  it('Group C: .gitlab-ci.yml + .circleci/* in one batch', () => {
+    const ciBatch = batches.batches.find(b =>
+      b.files.some(f => f.path === '.gitlab-ci.yml'));
+    expect(ciBatch).toBeDefined();
+    const ciPaths = ciBatch.files.map(f => f.path).sort();
+    expect(ciPaths).toEqual(['.circleci/config.yml', '.gitlab-ci.yml']);
+  });
+
+  it('Group D: SQL migrations under migrations/ in one batch', () => {
+    const migBatch = batches.batches.find(b =>
+      b.files.some(f => f.path.startsWith('migrations/')));
+    expect(migBatch).toBeDefined();
+    const migPaths = migBatch.files.map(f => f.path).filter(p => p.startsWith('migrations/'));
+    expect(migPaths.sort()).toEqual([
+      'migrations/001_init.sql', 'migrations/002_users.sql',
+    ]);
+  });
+
+  it('non-code batch indices follow code batches', () => {
+    const codeBatches = batches.batches.filter(b =>
+      b.files.every(f => f.fileCategory === 'code'));
+    const nonCodeBatches = batches.batches.filter(b =>
+      b.files.some(f => f.fileCategory !== 'code'));
+    expect(codeBatches.length).toBeGreaterThan(0);
+    expect(nonCodeBatches.length).toBeGreaterThan(0);
+    const maxCodeIdx = Math.max(...codeBatches.map(b => b.batchIndex));
+    const minNonCodeIdx = Math.min(...nonCodeBatches.map(b => b.batchIndex));
+    expect(minNonCodeIdx).toBeGreaterThan(maxCodeIdx);
+  });
+});
+
+describe('compute-batches.mjs — Group E MAX_E split', () => {
+  let root;
+
+  afterEach(() => {
+    if (root) rmSync(root, { recursive: true, force: true });
+  });
+
+  it('splits 25 .md files under docs/ into [20, 5]', () => {
+    root = mkdtempSync(join(tmpdir(), 'ua-cb-maxe-'));
+    mkdirSync(join(root, '.understand-anything', 'intermediate'), { recursive: true });
+
+    const files = [];
+    const importMap = {};
+    for (let i = 0; i < 25; i++) {
+      const p = `docs/page${String(i).padStart(2, '0')}.md`;
+      files.push({ path: p, language: 'markdown', sizeLines: 10, fileCategory: 'docs' });
+      importMap[p] = [];
+    }
+    const scan = {
+      name: 'maxe-test', description: '',
+      languages: ['markdown'], frameworks: [],
+      files, totalFiles: 25, filteredByIgnore: 0,
+      estimatedComplexity: 'small', importMap,
+    };
+    writeFileSync(
+      join(root, '.understand-anything', 'intermediate', 'scan-result.json'),
+      JSON.stringify(scan));
+
+    const result = runScript(root);
+    expect(result.status).toBe(0);
+
+    const batches = readBatches(root);
+    // All 25 docs/ files go through Group E with MAX_E = 20, split into [20, 5].
+    const docsBatches = batches.batches.filter(b =>
+      b.files.every(f => f.path.startsWith('docs/')));
+    expect(docsBatches.length).toBe(2);
+    const sizes = docsBatches.map(b => b.files.length).sort((a, b) => b - a);
+    expect(sizes).toEqual([20, 5]);
+  });
+});
+
+describe('compute-batches.mjs — neighborMap + batchImportData', () => {
+  let batches;
+  let batchOf;  // path → batchIndex
+  let projectRoot;
+
+  beforeEach(() => {
+    projectRoot = setupProject('scan-result-3-cliques.json');
+    const result = runScript(projectRoot);
+    expect(result.status).toBe(0);
+    batches = readBatches(projectRoot);
+    batchOf = new Map();
+    for (const b of batches.batches) {
+      for (const f of b.files) batchOf.set(f.path, b.batchIndex);
+    }
+  });
+
+  afterEach(() => {
+    if (projectRoot) rmSync(projectRoot, { recursive: true, force: true });
+  });
+
+  it('batchImportData mirrors scan importMap per batch', () => {
+    for (const b of batches.batches) {
+      for (const f of b.files) {
+        expect(b.batchImportData[f.path]).toBeDefined();
+        expect(Array.isArray(b.batchImportData[f.path])).toBe(true);
+      }
+    }
+    // src/auth/login.ts imports src/auth/session.ts and src/auth/tokens.ts
+    const loginBatch = batches.batches.find(b =>
+      b.files.some(f => f.path === 'src/auth/login.ts'));
+    expect(loginBatch.batchImportData['src/auth/login.ts'].sort()).toEqual([
+      'src/auth/session.ts', 'src/auth/tokens.ts',
+    ]);
+  });
+
+  it('neighborMap excludes same-batch files', () => {
+    // The fixture's three cliques each go into one batch — all imports are
+    // intra-batch, so no neighbor map should reference any same-batch file.
+    for (const b of batches.batches) {
+      const sameBatchPaths = new Set(b.files.map(f => f.path));
+      for (const [, neighbors] of Object.entries(b.neighborMap)) {
+        for (const n of neighbors) {
+          expect(sameBatchPaths.has(n.path)).toBe(false);
+        }
+      }
+    }
+  });
+
+  it('neighborMap entries carry symbols when target has exports', () => {
+    const root = mkdtempSync(join(tmpdir(), 'ua-cb-nbr-'));
+    mkdirSync(join(root, '.understand-anything', 'intermediate'), { recursive: true });
+    mkdirSync(join(root, 'src', 'a'), { recursive: true });
+    mkdirSync(join(root, 'src', 'b'), { recursive: true });
+
+    // Cluster A: 3 tightly-imported files. a/core.ts exports symbols.
+    writeFileSync(join(root, 'src', 'a', 'core.ts'),
+      'export function findUser(id: string) { return null; }\nexport class User {}\n');
+    writeFileSync(join(root, 'src', 'a', 'helper1.ts'),
+      'import { findUser } from "./core";\nexport const h1 = () => findUser("x");\n');
+    writeFileSync(join(root, 'src', 'a', 'helper2.ts'),
+      'import { User } from "./core";\nimport { h1 } from "./helper1";\nexport const h2 = () => h1();\n');
+
+    // Cluster B: 3 tightly-imported files. b/entry.ts has ONE cross-cluster import to a/core.ts.
+    writeFileSync(join(root, 'src', 'b', 'entry.ts'),
+      'import { findUser } from "../a/core";\nexport const entry = () => findUser("y");\n');
+    writeFileSync(join(root, 'src', 'b', 'middle.ts'),
+      'import { entry } from "./entry";\nexport const middle = () => entry();\n');
+    writeFileSync(join(root, 'src', 'b', 'leaf.ts'),
+      'import { middle } from "./middle";\nexport const leaf = () => middle();\n');
+
+    const files = [
+      { path: 'src/a/core.ts',    language: 'typescript', sizeLines: 2, fileCategory: 'code' },
+      { path: 'src/a/helper1.ts', language: 'typescript', sizeLines: 2, fileCategory: 'code' },
+      { path: 'src/a/helper2.ts', language: 'typescript', sizeLines: 3, fileCategory: 'code' },
+      { path: 'src/b/entry.ts',   language: 'typescript', sizeLines: 2, fileCategory: 'code' },
+      { path: 'src/b/middle.ts',  language: 'typescript', sizeLines: 2, fileCategory: 'code' },
+      { path: 'src/b/leaf.ts',    language: 'typescript', sizeLines: 2, fileCategory: 'code' },
+    ];
+    const scan = {
+      name: 't', description: '',
+      languages: ['typescript'], frameworks: [],
+      files,
+      totalFiles: 6, filteredByIgnore: 0, estimatedComplexity: 'small',
+      importMap: {
+        'src/a/core.ts': [],
+        'src/a/helper1.ts': ['src/a/core.ts'],
+        'src/a/helper2.ts': ['src/a/core.ts', 'src/a/helper1.ts'],
+        'src/b/entry.ts': ['src/a/core.ts'],  // CROSS-CLUSTER
+        'src/b/middle.ts': ['src/b/entry.ts'],
+        'src/b/leaf.ts': ['src/b/middle.ts'],
+      },
+    };
+    writeFileSync(
+      join(root, '.understand-anything', 'intermediate', 'scan-result.json'),
+      JSON.stringify(scan));
+
+    const result = runScript(root);
+    expect(result.status).toBe(0);
+    const out = readBatches(root);
+
+    // Expect 2 communities (cluster A and cluster B). Verify that some batch's
+    // neighborMap entry references src/a/core.ts with its symbols.
+    let sawSymbols = false;
+    for (const batch of out.batches) {
+      for (const [, neighbors] of Object.entries(batch.neighborMap)) {
+        for (const n of neighbors) {
+          if (n.path === 'src/a/core.ts') {
+            expect(n.symbols).toEqual(expect.arrayContaining(['findUser', 'User']));
+            sawSymbols = true;
+          }
+        }
+      }
+    }
+    expect(sawSymbols).toBe(true);
+
+    rmSync(root, { recursive: true, force: true });
+  });
+});
+
+describe('compute-batches.mjs — neighborMap truncation', () => {
+  let root;
+
+  afterEach(() => {
+    if (root) rmSync(root, { recursive: true, force: true });
+  });
+
+  it('truncates and warns when neighbors > 50', () => {
+    root = mkdtempSync(join(tmpdir(), 'ua-cb-trunc-'));
+    mkdirSync(join(root, '.understand-anything', 'intermediate'), { recursive: true });
+    // hub.ts imported by 60 other files
+    const files = [{ path: 'src/hub.ts', language: 'typescript', sizeLines: 1, fileCategory: 'code' }];
+    const importMap = { 'src/hub.ts': [] };
+    for (let i = 0; i < 60; i++) {
+      const p = `src/leaf${i}.ts`;
+      files.push({ path: p, language: 'typescript', sizeLines: 1, fileCategory: 'code' });
+      importMap[p] = ['src/hub.ts'];
+    }
+    const scan = {
+      name: 't', description: '', languages: ['typescript'], frameworks: [],
+      files, totalFiles: files.length, filteredByIgnore: 0,
+      estimatedComplexity: 'moderate', importMap,
+    };
+    writeFileSync(
+      join(root, '.understand-anything', 'intermediate', 'scan-result.json'),
+      JSON.stringify(scan));
+    const result = runScript(root);
+    expect(result.status).toBe(0);
+    expect(result.stderr).toMatch(
+      /neighborMap for src\/hub\.ts has high 1-hop degree 60 — exceeds soft cap of 50/);
+    const out = readBatches(root);
+    // Find hub.ts and confirm its neighbor list capped at 50 (in whichever batch it landed)
+    for (const b of out.batches) {
+      const nbrs = b.neighborMap['src/hub.ts'];
+      if (nbrs) expect(nbrs.length).toBeLessThanOrEqual(50);
+    }
+  });
+});
+
+describe('compute-batches.mjs — fallback', () => {
+  let root;
+
+  afterEach(() => {
+    if (root) rmSync(root, { recursive: true, force: true });
+  });
+
+  it('falls back to count-based when Louvain throws (env-injected mock)', () => {
+    // We can't easily monkey-patch louvain mid-script in Vitest because the
+    // script runs in a subprocess. Instead, set an env var the script honors:
+    // UA_COMPUTE_BATCHES_FORCE_LOUVAIN_THROW=1 → script throws inside its
+    // Louvain branch, exercising the fallback path.
+    root = setupProject('scan-result-3-cliques.json');
+    const result = spawnSync('node',
+      [SCRIPT, root],
+      { encoding: 'utf-8', env: { ...process.env, UA_COMPUTE_BATCHES_FORCE_LOUVAIN_THROW: '1' } },
+    );
+    expect(result.status).toBe(0);
+    expect(result.stderr).toMatch(
+      /Warning: compute-batches: Louvain failed.*falling back to count-based grouping/);
+    const out = readBatches(root);
+    expect(out.algorithm).toBe('count-fallback');
+    expect(out.totalFiles).toBe(9);
+    // Count-based: 12 files per batch → all 9 fit in one batch
+    const codeBatchFileCount = out.batches
+      .filter(b => b.files.every(f => f.fileCategory === 'code'))
+      .reduce((sum, b) => sum + b.files.length, 0);
+    expect(codeBatchFileCount).toBe(9);
+  });
+});
+
+describe('compute-batches.mjs — merge-small', () => {
+  let projectRoot;
+
+  beforeEach(() => {
+    projectRoot = setupProject('scan-result-singletons.json');
+  });
+
+  afterEach(() => {
+    if (projectRoot) rmSync(projectRoot, { recursive: true, force: true });
+  });
+
+  it('merges 100 isolated singletons into a small number of misc batches', () => {
+    const result = runScript(projectRoot);
+    expect(result.status).toBe(0);
+
+    const batches = readBatches(projectRoot);
+    expect(batches.totalFiles).toBe(100);
+
+    // Without merge: 100 singletons → 100 batches.
+    // With merge-small (MAX_MERGE_TARGET=25): ceil(100 / 25) = exactly 4 misc
+    // batches. Pin the exact count — a loose >=4 && <=8 would mask off-by-one
+    // regressions in the slice math (e.g., a stride miscalculation that
+    // splintered the pool into 5-7 underfull buckets).
+    expect(batches.batches.length).toBe(4);
+
+    // All files accounted for
+    const totalAssigned = batches.batches.reduce((sum, b) => sum + b.files.length, 0);
+    expect(totalAssigned).toBe(100);
+
+    // Bucket-fullness check: 100 singletons evenly divisible by
+    // MAX_MERGE_TARGET=25, so every bucket must be exactly 25 — not just
+    // ≤ 25. Drift toward [25, 25, 25, 24, 1] etc. would slip past a
+    // ≤25 bound while indicating a stride bug.
+    for (const b of batches.batches) {
+      expect(b.files.length).toBe(25);
+    }
+
+    // Info: (not Warning:) — merge-small is a routine optimization, not a
+    // fallback path. See compute-batches.mjs mergeSmallBatches WHY comment.
+    expect(result.stderr).toMatch(
+      /Info: compute-batches: merged \d+ small batches \(\d+ files\) into \d+ misc batches/);
+    expect(result.stderr).not.toMatch(/Warning: compute-batches: merged \d+ small batches/);
+  });
+
+  it('preserves non-mergeable batches: Dockerfile cluster not pooled into misc', () => {
+    // Dedicated fixture: 30 isolated TS singletons + 1 Dockerfile-only cluster.
+    // Group A marks the Dockerfile batch mergeable=false; even though its size
+    // (1) is below MIN_BATCH_SIZE=3, mergeSmallBatches must leave it intact.
+    const altRoot = setupProject('scan-result-merge-respects-non-mergeable.json');
+    try {
+      const result = runScript(altRoot);
+      expect(result.status).toBe(0);
+
+      const out = readBatches(altRoot);
+      expect(out.totalFiles).toBe(31);
+
+      const dockerBatch = out.batches.find(b =>
+        b.files.some(f => f.path === 'services/api/Dockerfile'));
+      expect(dockerBatch).toBeDefined();
+      // Standalone: exactly the Dockerfile, nothing pooled in alongside it.
+      expect(dockerBatch.files.length).toBe(1);
+      expect(dockerBatch.files[0].path).toBe('services/api/Dockerfile');
+
+      // The TS singletons must still merge into at least one misc batch —
+      // and that misc batch must NOT contain the Dockerfile.
+      const miscBatches = out.batches.filter(b =>
+        b.files.some(f => f.path.startsWith('src/leaf')));
+      expect(miscBatches.length).toBeGreaterThanOrEqual(1);
+      for (const m of miscBatches) {
+        for (const f of m.files) {
+          expect(f.path).not.toBe('services/api/Dockerfile');
+        }
+      }
+
+      // Every TS singleton accounted for across the misc bucket(s).
+      const tsInMisc = miscBatches.flatMap(b => b.files.map(f => f.path))
+        .filter(p => p.startsWith('src/leaf'));
+      expect(tsInMisc.length).toBe(30);
+    } finally {
+      rmSync(altRoot, { recursive: true, force: true });
+    }
+  });
+});
+
+describe('compute-batches.mjs — --changed-files', () => {
+  let root;
+
+  afterEach(() => {
+    if (root) rmSync(root, { recursive: true, force: true });
+  });
+
+  it('emits only batches containing changed files', () => {
+    root = setupProject('scan-result-3-cliques.json');
+    const changedPath = join(root, 'changed.txt');
+    // Only the auth clique is changed
+    writeFileSync(changedPath, ['src/auth/login.ts', 'src/auth/tokens.ts'].join('\n'));
+
+    const result = runScript(root, [`--changed-files=${changedPath}`]);
+    expect(result.status).toBe(0);
+
+    const out = readBatches(root);
+    // Auth files are in batches; other cliques' batches must be omitted
+    const allPaths = out.batches.flatMap(b => b.files.map(f => f.path));
+    expect(allPaths).toContain('src/auth/login.ts');
+    expect(allPaths).toContain('src/auth/tokens.ts');
+    expect(allPaths).not.toContain('src/api/handlers.ts');
+    expect(allPaths).not.toContain('src/db/users.ts');
+
+    // neighborMap may still reference unchanged files (with their full-graph batchIndex)
+    const loginBatch = out.batches.find(b =>
+      b.files.some(f => f.path === 'src/auth/login.ts'));
+    expect(loginBatch).toBeDefined();
+  });
+});
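The Group E and merge-small tests in this file pin exact bucket sizes: 25 docs files with `MAX_E = 20` must yield `[20, 5]`, and 100 singletons with `MAX_MERGE_TARGET = 25` must yield four buckets of 25. A minimal sketch of the arithmetic those assertions imply is below. It assumes plain contiguous slicing, which the tests are consistent with but do not prove is what `compute-batches.mjs` does; the `chunk` helper is hypothetical and written in Python purely for illustration.

```python
def chunk(items, size):
    """Split items into consecutive slices of at most `size` elements.

    Hypothetical helper: illustrates the bucket sizes the tests expect,
    not the actual implementation in compute-batches.mjs.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


# Same shapes as the fixtures built in the tests above.
docs = [f"docs/page{i:02d}.md" for i in range(25)]
singletons = [f"src/leaf{i}.ts" for i in range(100)]

print([len(c) for c in chunk(docs, 20)])        # [20, 5]
print([len(c) for c in chunk(singletons, 25)])  # [25, 25, 25, 25]
```

Under this scheme only the last bucket can be underfull, which is why the merge-small test asserts an exact size of 25 per bucket rather than an upper bound.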
@@ -2,8 +2,8 @@
 """
 test_merge_batch_graphs.py — Tests for the deterministic tested_by linker.

-Run from this directory:
-    python -m unittest test_merge_batch_graphs.py -v
+Run from the repo root:
+    python -m unittest tests.skill.understand.test_merge_batch_graphs -v
 """

 from __future__ import annotations
@@ -20,7 +20,14 @@ from typing import Any
 # directly. Load it via importlib so we can call its module-level helpers.

 _HERE = Path(__file__).resolve().parent
-_MODULE_PATH = _HERE / "merge-batch-graphs.py"
+_REPO_ROOT = _HERE.parent.parent.parent
+_MODULE_PATH = (
+    _REPO_ROOT
+    / "understand-anything-plugin"
+    / "skills"
+    / "understand"
+    / "merge-batch-graphs.py"
+)


 def _load_module() -> Any:
@@ -941,5 +948,240 @@ class MergeEdgeDirectionTests(unittest.TestCase):
        self.assertEqual(edges[0]["weight"], 0.9)


+# ── Multi-part batch handling ─────────────────────────────────────────────
+
+
+class TestMultiPart(unittest.TestCase):
+    """End-to-end tests for batch-<i>-part-<k>.json input handling.
+
+    These tests invoke merge-batch-graphs.py as a subprocess in a temp
+    directory so we exercise the full path: glob → load → merge → write.
+    """
+
+    def setUp(self) -> None:
+        import tempfile
+        self.tmp = Path(tempfile.mkdtemp(prefix="ua-mbg-"))
+        self.intermediate = self.tmp / ".understand-anything" / "intermediate"
+        self.intermediate.mkdir(parents=True, exist_ok=True)
+
+    def tearDown(self) -> None:
+        import shutil
+        shutil.rmtree(self.tmp, ignore_errors=True)
+
+    def _write_batch(self, name: str, nodes: list, edges: list) -> None:
+        import json as _j
+        (self.intermediate / name).write_text(
+            _j.dumps({"nodes": nodes, "edges": edges}),
+            encoding="utf-8",
+        )
+
+    def _run_merge(self) -> tuple[int, str, dict]:
+        import subprocess
+        import json as _j
+        result = subprocess.run(
+            ["python3", str(_MODULE_PATH), str(self.tmp)],
+            capture_output=True, text=True,
+        )
+        out_path = self.intermediate / "assembled-graph.json"
+        assembled = _j.loads(out_path.read_text()) if out_path.exists() else {}
+        return result.returncode, result.stderr, assembled
+
+    def test_two_parts_of_one_logical_batch_merge(self) -> None:
+        self._write_batch("batch-1-part-1.json",
+            [_file_node("src/a.ts")],
+            [{"source": "file:src/a.ts", "target": "file:src/b.ts",
+              "type": "imports", "direction": "forward", "weight": 0.7}])
+        self._write_batch("batch-1-part-2.json",
+            [_file_node("src/b.ts")],
+            [])
+        rc, _stderr, assembled = self._run_merge()
+        self.assertEqual(rc, 0)
+        node_ids = {n["id"] for n in assembled["nodes"]}
+        self.assertEqual(node_ids, {"file:src/a.ts", "file:src/b.ts"})
+        # Cross-part edge survived
+        edge_keys = {(e["source"], e["target"], e["type"]) for e in assembled["edges"]}
+        self.assertIn(
+            ("file:src/a.ts", "file:src/b.ts", "imports"), edge_keys)
+
+    def test_three_parts_of_one_logical_batch_merge(self) -> None:
+        for k, path in enumerate(["src/a.ts", "src/b.ts", "src/c.ts"], start=1):
+            self._write_batch(f"batch-1-part-{k}.json",
+                [_file_node(path)], [])
+        rc, _stderr, assembled = self._run_merge()
+        self.assertEqual(rc, 0)
+        node_ids = {n["id"] for n in assembled["nodes"]}
+        self.assertEqual(node_ids,
+            {"file:src/a.ts", "file:src/b.ts", "file:src/c.ts"})
+
+    def test_malformed_part_is_skipped_with_warning(self) -> None:
+        (self.intermediate / "batch-1-part-1.json").write_text(
+            "{ this is not valid json", encoding="utf-8")
+        self._write_batch("batch-1-part-2.json",
+            [_file_node("src/b.ts")], [])
+        rc, stderr, assembled = self._run_merge()
+        self.assertEqual(rc, 0)
+        # The skip warning is from existing load_batch logic
+        self.assertIn("skipping batch-1-part-1.json", stderr)
+        # part-2 content still made it in
+        node_ids = {n["id"] for n in assembled["nodes"]}
+        self.assertEqual(node_ids, {"file:src/b.ts"})
+
+    def test_mixed_single_and_multi_part(self) -> None:
+        self._write_batch("batch-1.json",
+            [_file_node("src/single.ts")], [])
+        self._write_batch("batch-2-part-1.json",
+            [_file_node("src/multi-a.ts")], [])
+        self._write_batch("batch-2-part-2.json",
+            [_file_node("src/multi-b.ts")], [])
+        self._write_batch("batch-3.json",
+            [_file_node("src/another-single.ts")], [])
+        rc, _stderr, assembled = self._run_merge()
+        self.assertEqual(rc, 0)
+        node_ids = {n["id"] for n in assembled["nodes"]}
+        self.assertEqual(node_ids, {
+            "file:src/single.ts", "file:src/multi-a.ts",
+            "file:src/multi-b.ts", "file:src/another-single.ts",
+        })
+
+    def test_missing_part_emits_warning(self) -> None:
+        # parts {2, 3} present, part-1 missing
+        self._write_batch("batch-1-part-2.json",
+            [_file_node("src/b.ts")], [])
+        self._write_batch("batch-1-part-3.json",
+            [_file_node("src/c.ts")], [])
+        rc, stderr, assembled = self._run_merge()
+        self.assertEqual(rc, 0)
+        self.assertRegex(stderr,
+            r"Warning: merge: batch 1 has parts \[2, 3\] but "
+            r"missing part \[1\] — possible truncated write")
+
+    def test_stderr_report_format(self) -> None:
+        self._write_batch("batch-1.json", [_file_node("src/a.ts")], [])
+        self._write_batch("batch-2-part-1.json", [_file_node("src/b.ts")], [])
+        self._write_batch("batch-2-part-2.json", [_file_node("src/c.ts")], [])
+        rc, stderr, _assembled = self._run_merge()
+        self.assertEqual(rc, 0)
+        # 3 files on disk, 2 logical batches, 1 multi-part
+        self.assertIn(
+            "Found 3 batch files (2 logical batches, 1 multi-part)", stderr)
+
+
+# ── Unrecognized batch filename handling ───────────────────────────────────
+
+
+class TestUnrecognizedBatchFilename(unittest.TestCase):
+    """File-analyzer fuses multiple batches into one output (e.g.,
+    `batch-fused-8-13.json`, `batch-8-13.json`). The merge script's regex
+    accepts only `batch-<N>.json` or `batch-<N>-part-<K>.json`, so such files
+    would otherwise be dropped silently. The script must warn loudly and
+    surface the drop in its report so the downstream review step catches it.
+    """
+
+    def setUp(self) -> None:
+        import tempfile
+        self.tmp = Path(tempfile.mkdtemp(prefix="ua-mbg-unrec-"))
+        self.intermediate = self.tmp / ".understand-anything" / "intermediate"
+        self.intermediate.mkdir(parents=True, exist_ok=True)
+
+    def tearDown(self) -> None:
+        import shutil
+        shutil.rmtree(self.tmp, ignore_errors=True)
+
+    def _write_batch(self, name: str, nodes: list, edges: list) -> None:
+        import json as _j
+        (self.intermediate / name).write_text(
+            _j.dumps({"nodes": nodes, "edges": edges}),
+            encoding="utf-8",
+        )
+
+    def _run_merge(self) -> tuple[int, str, dict]:
+        import subprocess
+        import json as _j
+        result = subprocess.run(
+            ["python3", str(_MODULE_PATH), str(self.tmp)],
+            capture_output=True, text=True,
+        )
+        out_path = self.intermediate / "assembled-graph.json"
+        assembled = _j.loads(out_path.read_text()) if out_path.exists() else {}
+        return result.returncode, result.stderr, assembled
+
+    def test_fused_filename_emits_stderr_warning(self) -> None:
+        # `batch-fused-3-5.json` does not match the merge regex —
+        # script must warn on stderr (not silently drop).
+        self._write_batch("batch-1.json", [_file_node("src/a.ts")], [])
+        self._write_batch("batch-2.json", [_file_node("src/b.ts")], [])
+        self._write_batch(
+            "batch-fused-3-5.json",
+            [_file_node("src/c.ts"), _file_node("src/d.ts"), _file_node("src/e.ts")],
+            [],
+        )
+        rc, stderr, _assembled = self._run_merge()
+        self.assertEqual(rc, 0)
+        self.assertIn("Warning: merge-batch-graphs:", stderr)
+        self.assertIn("unrecognized filenames", stderr)
+        self.assertIn("batch-fused-3-5.json", stderr)
+        # Remediation hint must be present so users know what to fix.
+        self.assertIn("file-analyzer", stderr)
+        self.assertIn("batch-<N>.json", stderr)
+
+    def test_fused_filename_surfaces_in_report(self) -> None:
+        # The merge report (printed after the per-file load lines) must
+        # also flag the drop so Phase 3 review picks it up.
+        self._write_batch("batch-1.json", [_file_node("src/a.ts")], [])
+        self._write_batch(
+            "batch-fused-2-4.json", [_file_node("src/x.ts")], [],
+        )
+        rc, stderr, _assembled = self._run_merge()
+        self.assertEqual(rc, 0)
+        # "dropped N batch file(s) with unrecognized filenames" appears in the
+        # report section (printed after "Output: ..." line).
+        self.assertIn("dropped 1 batch file(s) with unrecognized filenames", stderr)
+        self.assertIn("batch-fused-2-4.json", stderr)
+        self.assertIn(
+            "every node/edge in these files was excluded from the final graph",
+            stderr,
+        )
+
+    def test_recognized_batches_still_loaded(self) -> None:
+        # With both recognized and unrecognized files present, recognized
+        # ones must still produce a valid assembled graph.
+        self._write_batch("batch-1.json", [_file_node("src/a.ts")], [])
+        self._write_batch("batch-2.json", [_file_node("src/b.ts")], [])
+        self._write_batch(
+            "batch-fused-3-5.json",
+            [_file_node("src/dropped-c.ts")],
+            [],
+        )
+        rc, _stderr, assembled = self._run_merge()
+        self.assertEqual(rc, 0)
+        node_ids = {n["id"] for n in assembled["nodes"]}
+        # batch-1 + batch-2 survive
+        self.assertIn("file:src/a.ts", node_ids)
+        self.assertIn("file:src/b.ts", node_ids)
+        # batch-fused-3-5.json content is excluded
+        self.assertNotIn("file:src/dropped-c.ts", node_ids)
+        self.assertEqual(node_ids, {"file:src/a.ts", "file:src/b.ts"})
+
+    def test_range_filename_also_unrecognized(self) -> None:
+        # A bare range like `batch-8-13.json` is just as broken as
+        # `batch-fused-8-13.json` — both must be flagged. The regex
+        # `batch-(\d+)(?:-part-(\d+))?\.json` requires the literal
+        # `-part-` separator before a second number.
+        self._write_batch("batch-1.json", [_file_node("src/a.ts")], [])
+        self._write_batch(
+            "batch-8-13.json",
+            [_file_node("src/x.ts"), _file_node("src/y.ts")],
+            [],
+        )
+        rc, stderr, assembled = self._run_merge()
+        self.assertEqual(rc, 0)
+        self.assertIn("Warning: merge-batch-graphs:", stderr)
+        self.assertIn("batch-8-13.json", stderr)
+        # Content is dropped
+        node_ids = {n["id"] for n in assembled["nodes"]}
+        self.assertNotIn("file:src/x.ts", node_ids)
+        self.assertNotIn("file:src/y.ts", node_ids)
+
+
 if __name__ == "__main__":
    unittest.main()
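The multi-part and unrecognized-filename tests above all hinge on the pattern quoted in `test_range_filename_also_unrecognized`, `batch-(\d+)(?:-part-(\d+))?\.json`. The sketch below shows how that pattern separates logical batches, part numbers, and unrecognized names. The helper names `classify` and `missing_parts` are hypothetical, and matching the whole filename with `fullmatch` is an assumption; the real `merge-batch-graphs.py` may structure this differently.

```python
import re

# Pattern as quoted in the test comment; whole-name matching is an assumption.
BATCH_RE = re.compile(r"batch-(\d+)(?:-part-(\d+))?\.json")


def classify(filenames):
    """Group batch filenames by logical batch index.

    Returns (parts_by_batch, unrecognized). parts_by_batch maps a batch
    index to its sorted part numbers (an empty list for a single-file
    batch); unrecognized lists the names the pattern rejects.
    """
    parts_by_batch = {}
    unrecognized = []
    for name in sorted(filenames):
        match = BATCH_RE.fullmatch(name)
        if match is None:
            unrecognized.append(name)
            continue
        parts = parts_by_batch.setdefault(int(match.group(1)), [])
        if match.group(2) is not None:
            parts.append(int(match.group(2)))
    return {i: sorted(p) for i, p in parts_by_batch.items()}, unrecognized


def missing_parts(parts):
    """Part numbers absent from 1..max(parts); empty for single-file batches."""
    if not parts:
        return []
    present = set(parts)
    return [k for k in range(1, max(parts) + 1) if k not in present]


names = [
    "batch-1.json", "batch-2-part-1.json", "batch-2-part-2.json",
    "batch-fused-3-5.json", "batch-8-13.json",
]
grouped, rejected = classify(names)
print(grouped)                # {1: [], 2: [1, 2]}
print(rejected)               # ['batch-8-13.json', 'batch-fused-3-5.json']
print(missing_parts([2, 3]))  # [1]
```

Both `batch-fused-3-5.json` and `batch-8-13.json` are rejected because a second number is accepted only after the literal `-part-` separator, which matches what the tests assert about dropped content.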
@@ -0,0 +1,738 @@
+import { describe, it, expect, afterEach } from 'vitest';
+import {
+  mkdtempSync,
+  mkdirSync,
+  writeFileSync,
+  readFileSync,
+  rmSync,
+  chmodSync,
+  existsSync,
+} from 'node:fs';
+import { tmpdir } from 'node:os';
+import { join, dirname, resolve } from 'node:path';
+import { spawnSync } from 'node:child_process';
+import { fileURLToPath } from 'node:url';
+
+const __dirname = dirname(fileURLToPath(import.meta.url));
+const SCRIPT = resolve(
+  __dirname,
+  '../../../understand-anything-plugin/skills/understand/scan-project.mjs',
+);
+
+/**
+ * Build a project tree from a `{ relPath: contents }` object. Creates parent
+ * directories as needed. Initializes a real git repo so the script's preferred
+ * `git ls-files` enumeration path is exercised; tests that need the walker
+ * fallback can pass `{ gitInit: false }`.
+ */
+function setupTree(files, { gitInit = true } = {}) {
+  const root = mkdtempSync(join(tmpdir(), 'ua-scan-test-'));
+  for (const [relPath, contents] of Object.entries(files)) {
+    const abs = join(root, relPath);
+    mkdirSync(dirname(abs), { recursive: true });
+    writeFileSync(abs, contents, 'utf-8');
+  }
+  if (gitInit) {
+    // `git ls-files -co --exclude-standard` lists BOTH cached and untracked
+    // files (minus gitignored ones), so `git add` is unnecessary here: a
+    // plain `git init` is enough for ls-files to enumerate the tree.
+    const init = spawnSync('git', ['init', '-q'], { cwd: root, encoding: 'utf-8' });
+    if (init.status !== 0) {
+      // CI without git: continue without it; the walker fallback will fire.
+    }
+  }
+  return root;
+}
+
+/**
+ * Tracks every temp output dir created by runScript() so the global
+ * cleanup can sweep them between tests. The output file must live
+ * OUTSIDE projectRoot because the project's default ignore patterns
+ * do NOT exclude `.understand-anything/` (the dir is reserved for
+ * persistent state, not transient scratch). If we wrote inside
+ * projectRoot, the second call in the determinism test would
+ * enumerate the first call's output file and produce drift.
+ */
+const _runScriptOutputDirs = [];
+
+/**
+ * Run scan-project.mjs against `projectRoot`. Returns
+ * { status, stdout, stderr, output } where `output` is the parsed JSON
+ * written by the script (or null on failure).
+ */
+function runScript(projectRoot) {
+  const outputDir = mkdtempSync(join(tmpdir(), 'ua-scan-out-'));
+  _runScriptOutputDirs.push(outputDir);
+  const outputPath = join(outputDir, 'scan-output.json');
+  const result = spawnSync('node', [SCRIPT, projectRoot, outputPath], {
+    encoding: 'utf-8',
+  });
+  let output = null;
+  try {
+    output = JSON.parse(readFileSync(outputPath, 'utf-8'));
+  } catch {
+    /* output missing on hard failure */
+  }
+  return { status: result.status, stdout: result.stdout, stderr: result.stderr, output };
+}
+
+/**
+ * Look up the `files[]` entry for a given path. Returns undefined if not
+ * present — callers should `expect(byPath('x')).toBeDefined()` first.
+ */
+function byPath(output, path) {
+  return output.files.find(f => f.path === path);
+}
+
+// Remove every output dir created during a test so nothing leaks between
+// tests. The top-level afterEach fires after each `it()` regardless of which
+// describe block it lives in, so a single hook covers the whole file.
+afterEach(() => {
+  while (_runScriptOutputDirs.length) {
+    const d = _runScriptOutputDirs.pop();
+    rmSync(d, { recursive: true, force: true });
+  }
+});
+
+describe('scan-project.mjs — language detection', () => {
+  let projectRoot;
+
+  afterEach(() => {
+    if (projectRoot) {
+      rmSync(projectRoot, { recursive: true, force: true });
+      projectRoot = null;
+    }
+  });
+
+  it('maps TypeScript/JavaScript extensions to typescript/javascript', () => {
+    projectRoot = setupTree({
+      'a.ts': 'export const a = 1;\n',
+      'b.tsx': 'export const B = () => null;\n',
+      'c.js': 'module.exports = {};\n',
+      'd.jsx': 'export default () => null;\n',
+      'e.mjs': 'export const e = 1;\n',
+      'f.cjs': 'module.exports = 1;\n',
+    });
+    const r = runScript(projectRoot);
+    expect(r.status).toBe(0);
+    expect(byPath(r.output, 'a.ts').language).toBe('typescript');
+    expect(byPath(r.output, 'b.tsx').language).toBe('typescript');
+    expect(byPath(r.output, 'c.js').language).toBe('javascript');
+    expect(byPath(r.output, 'd.jsx').language).toBe('javascript');
+    expect(byPath(r.output, 'e.mjs').language).toBe('javascript');
+    expect(byPath(r.output, 'f.cjs').language).toBe('javascript');
+  });
+
+  it('maps Python, Go, Rust, Java, Kotlin, C# to their language ids', () => {
+    projectRoot = setupTree({
+      'a.py': 'x = 1\n',
+      'b.go': 'package main\n',
+      'c.rs': 'fn main() {}\n',
+      'd.java': 'class D {}\n',
+      'e.kt': 'fun main() {}\n',
+      'f.cs': 'class F {}\n',
+    });
+    const r = runScript(projectRoot);
+    expect(r.status).toBe(0);
+    expect(byPath(r.output, 'a.py').language).toBe('python');
+    expect(byPath(r.output, 'b.go').language).toBe('go');
+    expect(byPath(r.output, 'c.rs').language).toBe('rust');
+    expect(byPath(r.output, 'd.java').language).toBe('java');
+    expect(byPath(r.output, 'e.kt').language).toBe('kotlin');
+    expect(byPath(r.output, 'f.cs').language).toBe('csharp');
+  });
+
+  it('maps Ruby, PHP, C, C++ to their language ids', () => {
+    projectRoot = setupTree({
+      'a.rb': 'puts 1\n',
+      'b.php': '<?php echo 1;\n',
+      'c.c': 'int main() { return 0; }\n',
+      'd.h': 'void f();\n',
+      'e.cpp': 'int main() {}\n',
+      'f.hpp': 'class F {};\n',
+    });
+    const r = runScript(projectRoot);
+    expect(r.status).toBe(0);
+    expect(byPath(r.output, 'a.rb').language).toBe('ruby');
+    expect(byPath(r.output, 'b.php').language).toBe('php');
+    expect(byPath(r.output, 'c.c').language).toBe('c');
+    expect(byPath(r.output, 'd.h').language).toBe('c');
+    expect(byPath(r.output, 'e.cpp').language).toBe('cpp');
+    expect(byPath(r.output, 'f.hpp').language).toBe('cpp');
+  });
+
+  it('maps web markup (HTML, CSS) to their language ids', () => {
+    projectRoot = setupTree({
+      'a.html': '<!doctype html><html></html>\n',
+      'b.htm': '<html></html>\n',
+      'c.css': '.a { }\n',
+      'd.scss': '$x: 1;\n',
+    });
+    const r = runScript(projectRoot);
+    expect(r.status).toBe(0);
+    expect(byPath(r.output, 'a.html').language).toBe('html');
+    expect(byPath(r.output, 'b.htm').language).toBe('html');
+    expect(byPath(r.output, 'c.css').language).toBe('css');
+    expect(byPath(r.output, 'd.scss').language).toBe('css');
+  });
+
+  it('maps configuration formats (YAML, JSON, JSONC, TOML, XML, Markdown) to their language ids', () => {
+    projectRoot = setupTree({
+      'a.yaml': 'x: 1\n',
+      'b.yml': 'x: 1\n',
+      'c.json': '{}\n',
+      'd.jsonc': '{ /* c */ }\n',
+      'e.toml': 'x = 1\n',
+      'f.xml': '<x/>\n',
+      'g.md': '# h\n',
+    });
+    const r = runScript(projectRoot);
+    expect(r.status).toBe(0);
+    expect(byPath(r.output, 'a.yaml').language).toBe('yaml');
+    expect(byPath(r.output, 'b.yml').language).toBe('yaml');
+    expect(byPath(r.output, 'c.json').language).toBe('json');
+    expect(byPath(r.output, 'd.jsonc').language).toBe('jsonc');
+    expect(byPath(r.output, 'e.toml').language).toBe('toml');
+    expect(byPath(r.output, 'f.xml').language).toBe('xml');
+    expect(byPath(r.output, 'g.md').language).toBe('markdown');
+  });
+
+  it('maps shell + batch + Dockerfile (no extension) to their language ids', () => {
+    projectRoot = setupTree({
+      'a.sh': 'echo 1\n',
+      'b.bat': '@echo off\n',
+      Dockerfile: 'FROM node:22\n',
+      'Dockerfile.dev': 'FROM node:22\n',
+    });
+    const r = runScript(projectRoot);
+    expect(r.status).toBe(0);
+    expect(byPath(r.output, 'a.sh').language).toBe('shell');
+    expect(byPath(r.output, 'b.bat').language).toBe('batch');
+    expect(byPath(r.output, 'Dockerfile').language).toBe('dockerfile');
+    expect(byPath(r.output, 'Dockerfile.dev').language).toBe('dockerfile');
+  });
+
+  it('falls back to "unknown" for files with no extension and no filename match', () => {
+    projectRoot = setupTree({
+      WEIRD_FILE: 'mystery contents\n',
+    });
+    const r = runScript(projectRoot);
+    expect(r.status).toBe(0);
+    expect(byPath(r.output, 'WEIRD_FILE').language).toBe('unknown');
+  });
+
+  it('falls back to bare extension (without dot) for unknown extensions', () => {
+    projectRoot = setupTree({
+      'data.weirdext': 'some data\n',
+    });
+    const r = runScript(projectRoot);
+    expect(r.status).toBe(0);
+    expect(byPath(r.output, 'data.weirdext').language).toBe('weirdext');
+  });
+});
+
+describe('scan-project.mjs — category assignment (project-scanner.md Step 4)', () => {
+  let projectRoot;
+
+  afterEach(() => {
+    if (projectRoot) {
+      rmSync(projectRoot, { recursive: true, force: true });
+      projectRoot = null;
+    }
+  });
+
+  it('assigns code to TypeScript, JavaScript, Python, Go, Rust source files', () => {
+    projectRoot = setupTree({
+      'src/a.ts': 'export const a = 1;\n',
+      'src/b.py': 'def b(): pass\n',
+      'src/c.go': 'package main\n',
+      'src/d.rs': 'fn main() {}\n',
+    });
+    const r = runScript(projectRoot);
+    expect(r.status).toBe(0);
+    expect(byPath(r.output, 'src/a.ts').fileCategory).toBe('code');
+    expect(byPath(r.output, 'src/b.py').fileCategory).toBe('code');
+    expect(byPath(r.output, 'src/c.go').fileCategory).toBe('code');
+    expect(byPath(r.output, 'src/d.rs').fileCategory).toBe('code');
+  });
+
+  it('assigns config to JSON/YAML/TOML/INI/XML', () => {
+    projectRoot = setupTree({
+      'package.json': '{}\n',
+      'tsconfig.json': '{}\n',
+      'pyproject.toml': '[project]\nname = "p"\n',
+      'config.yaml': 'x: 1\n',
+      'app.ini': '[s]\nk=v\n',
+      'data.xml': '<x/>\n',
+    });
+    const r = runScript(projectRoot);
+    expect(r.status).toBe(0);
+    expect(byPath(r.output, 'package.json').fileCategory).toBe('config');
+    expect(byPath(r.output, 'tsconfig.json').fileCategory).toBe('config');
+    expect(byPath(r.output, 'pyproject.toml').fileCategory).toBe('config');
+    expect(byPath(r.output, 'config.yaml').fileCategory).toBe('config');
+    expect(byPath(r.output, 'app.ini').fileCategory).toBe('config');
+    expect(byPath(r.output, 'data.xml').fileCategory).toBe('config');
+  });
+
+  it('assigns docs to .md / .rst / .txt (but NOT to LICENSE)', () => {
+    projectRoot = setupTree({
+      'README.md': '# x\n',
+      'docs/guide.rst': 'Guide\n=====\n',
+      'NOTES.txt': 'notes\n',
+      LICENSE: 'Apache-2.0\n',
+    });
+    const r = runScript(projectRoot);
+    expect(r.status).toBe(0);
+    expect(byPath(r.output, 'README.md').fileCategory).toBe('docs');
+    expect(byPath(r.output, 'docs/guide.rst').fileCategory).toBe('docs');
+    expect(byPath(r.output, 'NOTES.txt').fileCategory).toBe('docs');
+    // LICENSE exception: must NOT be docs. The default ignore filter
+    // normally drops LICENSE entirely, so we re-include it via
+    // `!LICENSE` so the category test can fire.
+    writeFileSync(join(projectRoot, '.understandignore'), '!LICENSE\n');
+    const r2 = runScript(projectRoot);
+    const license = byPath(r2.output, 'LICENSE');
+    expect(license).toBeDefined();
+    expect(license.fileCategory).not.toBe('docs');
+  });
+
+  it('assigns infra to Dockerfile, docker-compose, .gitlab-ci.yml, .tf, .github/workflows/, Makefile, Jenkinsfile, k8s paths', () => {
+    projectRoot = setupTree({
+      Dockerfile: 'FROM node:22\n',
+      'docker-compose.yml': 'services: {}\n',
+      '.gitlab-ci.yml': 'stages: []\n',
+      'infra/main.tf': 'resource "x" "y" {}\n',
+      '.github/workflows/ci.yml': 'name: ci\n',
+      Makefile: 'all:\n\t@echo hi\n',
+      Jenkinsfile: 'pipeline { }\n',
+      'k8s/deploy.yaml': 'kind: Deployment\n',
+      'kubernetes/svc.yaml': 'kind: Service\n',
+      'foo.k8s.yaml': 'kind: ConfigMap\n',
+    });
+    const r = runScript(projectRoot);
+    expect(r.status).toBe(0);
+    expect(byPath(r.output, 'Dockerfile').fileCategory).toBe('infra');
+    expect(byPath(r.output, 'docker-compose.yml').fileCategory).toBe('infra');
+    expect(byPath(r.output, '.gitlab-ci.yml').fileCategory).toBe('infra');
+    expect(byPath(r.output, 'infra/main.tf').fileCategory).toBe('infra');
+    expect(byPath(r.output, '.github/workflows/ci.yml').fileCategory).toBe('infra');
+    expect(byPath(r.output, 'Makefile').fileCategory).toBe('infra');
+    expect(byPath(r.output, 'Jenkinsfile').fileCategory).toBe('infra');
+    expect(byPath(r.output, 'k8s/deploy.yaml').fileCategory).toBe('infra');
+    expect(byPath(r.output, 'kubernetes/svc.yaml').fileCategory).toBe('infra');
+    expect(byPath(r.output, 'foo.k8s.yaml').fileCategory).toBe('infra');
+  });
+
+  it('assigns data to SQL, GraphQL, Proto, Prisma, CSV', () => {
+    projectRoot = setupTree({
+      'db/schema.sql': 'CREATE TABLE x (id INT);\n',
+      'api/schema.graphql': 'type X { id: ID! }\n',
+      'api/types.proto': 'syntax = "proto3";\n',
+      'prisma/schema.prisma': 'model X { id Int @id }\n',
+      'data/seed.csv': 'a,b\n1,2\n',
+    });
+    const r = runScript(projectRoot);
+    expect(r.status).toBe(0);
+    expect(byPath(r.output, 'db/schema.sql').fileCategory).toBe('data');
+    expect(byPath(r.output, 'api/schema.graphql').fileCategory).toBe('data');
+    expect(byPath(r.output, 'api/types.proto').fileCategory).toBe('data');
+    expect(byPath(r.output, 'prisma/schema.prisma').fileCategory).toBe('data');
+    expect(byPath(r.output, 'data/seed.csv').fileCategory).toBe('data');
+  });
+
+  it('assigns script to shell + batch files (.sh, .bash, .ps1, .bat)', () => {
+    projectRoot = setupTree({
+      'scripts/build.sh': '#!/bin/bash\necho 1\n',
+      'scripts/run.bash': '#!/bin/bash\necho run\n',
+      'scripts/build.ps1': 'Write-Output 1\n',
+      'scripts/setup.bat': '@echo off\n',
+    });
+    const r = runScript(projectRoot);
+    expect(r.status).toBe(0);
+    expect(byPath(r.output, 'scripts/build.sh').fileCategory).toBe('script');
+    expect(byPath(r.output, 'scripts/run.bash').fileCategory).toBe('script');
+    expect(byPath(r.output, 'scripts/build.ps1').fileCategory).toBe('script');
+    expect(byPath(r.output, 'scripts/setup.bat').fileCategory).toBe('script');
+  });
+
+  it('assigns markup to HTML + CSS variants', () => {
+    projectRoot = setupTree({
+      'public/index.html': '<!doctype html>\n',
+      'public/page.htm': '<html></html>\n',
+      'styles/app.css': 'body { }\n',
+      'styles/app.scss': '$x: 1;\n',
+      'styles/app.less': '@x: 1;\n',
+    });
+    const r = runScript(projectRoot);
+    expect(r.status).toBe(0);
+    expect(byPath(r.output, 'public/index.html').fileCategory).toBe('markup');
+    expect(byPath(r.output, 'public/page.htm').fileCategory).toBe('markup');
+    expect(byPath(r.output, 'styles/app.css').fileCategory).toBe('markup');
+    expect(byPath(r.output, 'styles/app.scss').fileCategory).toBe('markup');
+    expect(byPath(r.output, 'styles/app.less').fileCategory).toBe('markup');
+  });
+
+  it('priority: docker-compose.yml maps to infra, not config', () => {
+    // The .yml extension would normally route to `config`, but the
+    // docker-compose.* filename rule fires first per Step 4 priority.
+    projectRoot = setupTree({
+      'docker-compose.yml': 'services: {}\n',
+    });
+    const r = runScript(projectRoot);
+    expect(r.status).toBe(0);
+    expect(byPath(r.output, 'docker-compose.yml').fileCategory).toBe('infra');
+    expect(byPath(r.output, 'docker-compose.yml').language).toBe('yaml');
+  });
+
+  // Regression: path.extname returns '' for `.env` and '.local' for
+  // `.env.local`; neither hits CATEGORY_BY_EXT['.env']. Dotfile-style
+  // configs were falling through to `code` / `unknown`. Caught by Codex
+  // review on PR #204.
+  it('dotfile configs (.env, .env.local, .env.production) map to config category + config language', () => {
+    projectRoot = setupTree({
+      '.env': 'API_KEY=abc\n',
+      '.env.local': 'LOCAL=1\n',
+      '.env.production': 'PROD=1\n',
+    });
+    const r = runScript(projectRoot);
+    expect(r.status).toBe(0);
+    for (const p of ['.env', '.env.local', '.env.production']) {
+      expect(byPath(r.output, p).fileCategory).toBe('config');
+      // LANGUAGE_BY_EXT['.env'] -> 'config' (the language id itself; not
+      // a typo — the language for env files is the 'config' bucket).
+      expect(byPath(r.output, p).language).toBe('config');
+    }
+  });
+});
+
+describe('scan-project.mjs — .understandignore handling', () => {
+  let projectRoot;
+
+  afterEach(() => {
+    if (projectRoot) {
+      rmSync(projectRoot, { recursive: true, force: true });
+      projectRoot = null;
+    }
+  });
+
+  it('respects .understandignore patterns and increments filteredByIgnore', () => {
+    // `*.log` is already covered by the hardcoded defaults, so use a custom
+    // pattern (`fixtures/`) to exercise user-driven drops.
+    projectRoot = setupTree({
+      '.understandignore': 'fixtures/\n',
+      'src/index.ts': 'export const x = 1;\n',
+      'fixtures/snap1.json': '{ "a": 1 }\n',
+      'fixtures/snap2.json': '{ "b": 2 }\n',
+    });
+    const r = runScript(projectRoot);
+    expect(r.status).toBe(0);
+    // fixtures/ files dropped
+    expect(byPath(r.output, 'fixtures/snap1.json')).toBeUndefined();
+    expect(byPath(r.output, 'fixtures/snap2.json')).toBeUndefined();
+    // Counted as user-driven
+    expect(r.output.filteredByIgnore).toBe(2);
+  });
+
+  it('supports `!pattern` negation to re-include defaults-excluded files', () => {
+    // `*.log` is in the hardcoded defaults; the user re-includes a
+    // specific file with `!keep.log`. After the override, keep.log MUST
+    // appear in the output. It is NOT counted in filteredByIgnore (it
+    // was re-included, not additionally filtered).
+    projectRoot = setupTree({
+      '.understandignore': '!keep.log\n',
+      'src/index.ts': 'export const x = 1;\n',
+      'keep.log': 'important diagnostic\n',
+      'drop.log': 'noise\n',
+    });
+    const r = runScript(projectRoot);
+    expect(r.status).toBe(0);
+    expect(byPath(r.output, 'keep.log')).toBeDefined();
+    // drop.log still excluded by defaults (no negation for it)
+    expect(byPath(r.output, 'drop.log')).toBeUndefined();
+    // The defaults dropped drop.log — that's a baseline default drop,
+    // NOT a user-driven drop. filteredByIgnore should be 0.
+    expect(r.output.filteredByIgnore).toBe(0);
+  });
+});
+
+describe('scan-project.mjs — special-file recognition', () => {
+  let projectRoot;
+
+  afterEach(() => {
+    if (projectRoot) {
+      rmSync(projectRoot, { recursive: true, force: true });
+      projectRoot = null;
+    }
+  });
+
+  it('Dockerfile (no extension) is language=dockerfile, category=infra', () => {
+    projectRoot = setupTree({
+      Dockerfile: 'FROM alpine:3\nCMD ["sh"]\n',
+    });
+    const r = runScript(projectRoot);
+    expect(r.status).toBe(0);
+    const entry = byPath(r.output, 'Dockerfile');
+    expect(entry).toBeDefined();
+    expect(entry.language).toBe('dockerfile');
+    expect(entry.fileCategory).toBe('infra');
+  });
+});
+
+describe('scan-project.mjs — determinism', () => {
+  let projectRoot;
+
+  afterEach(() => {
+    if (projectRoot) {
+      rmSync(projectRoot, { recursive: true, force: true });
+      projectRoot = null;
+    }
+  });
+
+  it('produces byte-identical output across runs for the same input tree', () => {
+    projectRoot = setupTree({
+      'README.md': '# project\n',
+      'src/a.ts': 'export const a = 1;\n',
+      'src/b.ts': 'export const b = 2;\n',
+      'src/lib/c.ts': 'export const c = 3;\n',
+      'package.json': '{}\n',
+      'tsconfig.json': '{}\n',
+    });
+    const r1 = runScript(projectRoot);
+    const r2 = runScript(projectRoot);
+    expect(r1.status).toBe(0);
+    expect(r2.status).toBe(0);
+    expect(JSON.stringify(r1.output)).toBe(JSON.stringify(r2.output));
+  });
+});
+
+describe('scan-project.mjs — empty repo', () => {
+  let projectRoot;
+
+  afterEach(() => {
+    if (projectRoot) {
+      rmSync(projectRoot, { recursive: true, force: true });
+      projectRoot = null;
+    }
+  });
+
+  it('handles a project with zero files without crashing', () => {
+    projectRoot = setupTree({}, { gitInit: true });
+    const r = runScript(projectRoot);
+    expect(r.status).toBe(0);
+    expect(r.output.scriptCompleted).toBe(true);
+    expect(r.output.totalFiles).toBe(0);
+    expect(r.output.files).toEqual([]);
+    expect(r.output.filteredByIgnore).toBe(0);
+    expect(r.output.estimatedComplexity).toBe('small');
+  });
+});
+
+describe('scan-project.mjs — per-file failure resilience', () => {
+  let projectRoot;
+
+  afterEach(() => {
+    if (projectRoot) {
+      // Restore permissions on any chmod'd file before delete, so cleanup
+      // succeeds even when a test left a 000-permission file behind.
+      try {
+        const f = join(projectRoot, 'src/unreadable.ts');
+        if (existsSync(f)) chmodSync(f, 0o644);
+      } catch { /* best-effort */ }
+      rmSync(projectRoot, { recursive: true, force: true });
+      projectRoot = null;
+    }
+  });
+
+  it('emits a Warning: and skips a file with unreadable permissions; other files survive', () => {
+    if (process.platform === 'win32') {
+      // chmod permission bits don't apply on Windows the same way; skip.
+      return;
+    }
+    if (process.getuid && process.getuid() === 0) {
+      // Running as root bypasses permission checks, so the file stays
+      // readable and the assertions below would fail spuriously. Return
+      // early (the test is then reported as passed, not skipped).
+      return;
+    }
+    projectRoot = setupTree({
+      'src/good.ts': 'export const good = 1;\n',
+      'src/unreadable.ts': 'export const bad = 2;\n',
+    });
+    // Strip read permission on the synthetic file.
+    chmodSync(join(projectRoot, 'src/unreadable.ts'), 0o000);
+
+    const r = runScript(projectRoot);
+    expect(r.status).toBe(0);
+    expect(r.output.scriptCompleted).toBe(true);
+    // The good file is in the output.
+    expect(byPath(r.output, 'src/good.ts')).toBeDefined();
+    // The unreadable file is dropped.
+    expect(byPath(r.output, 'src/unreadable.ts')).toBeUndefined();
+    // A visible warning was emitted with the documented prefix.
+    expect(r.stderr).toMatch(
+      /Warning: scan-project: src\/unreadable\.ts — line count failed/,
+    );
+    expect(r.stderr).toMatch(/file skipped from output/);
+    // Final summary line still fires.
+    expect(r.stderr).toMatch(
+      /scan-project: filesScanned=1 filteredByIgnore=0 complexity=small/,
+    );
+  });
+});
+
+describe('scan-project.mjs — estimatedComplexity thresholds', () => {
+  let projectRoot;
+
+  afterEach(() => {
+    if (projectRoot) {
+      rmSync(projectRoot, { recursive: true, force: true });
+      projectRoot = null;
+    }
+  });
+
+  /**
+   * Build a tree with exactly N .ts files at the top level. Used to
+   * lock in the complexity-tier boundary points from project-scanner.md
+   * Step 7: small (≤30), moderate (31-150), large (151-500), very-large
+   * (>500).
+   */
+  function setupNFiles(n) {
+    const tree = {};
+    for (let i = 0; i < n; i++) {
+      // Pad indices so localeCompare gives the natural order for any N.
+      tree[`f${String(i).padStart(4, '0')}.ts`] = 'export const x = 1;\n';
+    }
+    return setupTree(tree);
+  }
+
+  it('30 files -> small (upper boundary of small)', () => {
+    projectRoot = setupNFiles(30);
+    const r = runScript(projectRoot);
+    expect(r.status).toBe(0);
+    expect(r.output.totalFiles).toBe(30);
+    expect(r.output.estimatedComplexity).toBe('small');
+  });
+
+  it('31 files -> moderate (lower boundary of moderate)', () => {
+    projectRoot = setupNFiles(31);
+    const r = runScript(projectRoot);
+    expect(r.status).toBe(0);
+    expect(r.output.totalFiles).toBe(31);
+    expect(r.output.estimatedComplexity).toBe('moderate');
+  });
+
+  it('150 files -> moderate (upper boundary of moderate)', () => {
+    projectRoot = setupNFiles(150);
+    const r = runScript(projectRoot);
+    expect(r.status).toBe(0);
+    expect(r.output.totalFiles).toBe(150);
+    expect(r.output.estimatedComplexity).toBe('moderate');
+  });
+
+  it('151 files -> large (lower boundary of large)', () => {
+    projectRoot = setupNFiles(151);
+    const r = runScript(projectRoot);
+    expect(r.status).toBe(0);
+    expect(r.output.totalFiles).toBe(151);
+    expect(r.output.estimatedComplexity).toBe('large');
+  });
+
+  it('501 files -> very-large (lower boundary of very-large)', () => {
+    projectRoot = setupNFiles(501);
+    const r = runScript(projectRoot);
+    expect(r.status).toBe(0);
+    expect(r.output.totalFiles).toBe(501);
+    expect(r.output.estimatedComplexity).toBe('very-large');
+  });
+});
+
+describe('scan-project.mjs — CLI entry guard + invocation', () => {
+  let projectRoot;
+
+  afterEach(() => {
+    if (projectRoot) {
+      rmSync(projectRoot, { recursive: true, force: true });
+      projectRoot = null;
+    }
+  });
+
+  it('invokes successfully via subprocess and produces a parseable output file', () => {
+    projectRoot = setupTree({
+      'README.md': '# proj\n',
+      'src/index.ts': 'export const x = 1;\n',
+    });
+    const r = runScript(projectRoot);
+    expect(r.status).toBe(0);
+    expect(r.output).not.toBeNull();
+    expect(r.output.scriptCompleted).toBe(true);
+    // Stats summary line fires on stderr.
+    expect(r.stderr).toMatch(
+      /scan-project: filesScanned=2 filteredByIgnore=0 complexity=small/,
+    );
+    // Two files captured.
+    expect(r.output.totalFiles).toBe(2);
+  });
+
+  it('fails fast with usage message when projectRoot is missing', () => {
+    const result = spawnSync('node', [SCRIPT], { encoding: 'utf-8' });
+    expect(result.status).toBe(1);
+    expect(result.stderr).toMatch(/Usage: node scan-project\.mjs/);
+  });
+});
+
+describe('scan-project.mjs — output schema invariants', () => {
+  let projectRoot;
+
+  afterEach(() => {
+    if (projectRoot) {
+      rmSync(projectRoot, { recursive: true, force: true });
+      projectRoot = null;
+    }
+  });
+
+  it('emits the documented top-level fields with correct shapes', () => {
+    projectRoot = setupTree({
+      'src/a.ts': 'export const a = 1;\n',
+      'README.md': '# x\n',
+      'package.json': '{}\n',
+    });
+    const r = runScript(projectRoot);
+    expect(r.status).toBe(0);
+    const out = r.output;
+    expect(out.scriptCompleted).toBe(true);
+    expect(Array.isArray(out.files)).toBe(true);
+    expect(typeof out.totalFiles).toBe('number');
+    expect(out.totalFiles).toBe(out.files.length);
+    expect(typeof out.filteredByIgnore).toBe('number');
+    expect(['small', 'moderate', 'large', 'very-large']).toContain(
+      out.estimatedComplexity,
+    );
+    expect(out.stats).toBeDefined();
+    expect(out.stats.filesScanned).toBe(out.files.length);
+    expect(typeof out.stats.byCategory).toBe('object');
+    expect(typeof out.stats.byLanguage).toBe('object');
+    // Per-file shape
+    for (const f of out.files) {
+      expect(typeof f.path).toBe('string');
+      expect(typeof f.language).toBe('string');
+      expect(typeof f.sizeLines).toBe('number');
+      expect([
+        'code', 'config', 'docs', 'infra', 'data', 'script', 'markup',
+      ]).toContain(f.fileCategory);
+    }
+  });
+
+  it('files[] is sorted by path.localeCompare', () => {
+    projectRoot = setupTree({
+      'zzz.ts': '\n',
+      'aaa.ts': '\n',
+      'mmm.ts': '\n',
+      'subdir/file.ts': '\n',
+    });
+    const r = runScript(projectRoot);
+    expect(r.status).toBe(0);
+    const paths = r.output.files.map(f => f.path);
+    const sortedPaths = [...paths].sort((a, b) => a.localeCompare(b));
+    expect(paths).toEqual(sortedPaths);
+  });
+});
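The tier boundaries exercised by the `estimatedComplexity` tests above can be summarized in one small function. This is a minimal sketch, assuming the thresholds quoted in the test comment (small ≤30, moderate 31-150, large 151-500, very-large >500). `estimateComplexity` is a hypothetical name; the real logic lives in `scan-project.mjs`, which this excerpt does not show.

```javascript
// Hypothetical helper, not taken from scan-project.mjs.
// Maps a file count to the complexity tier named in the test comment.
function estimateComplexity(totalFiles) {
  if (totalFiles <= 30) return 'small';
  if (totalFiles <= 150) return 'moderate';
  if (totalFiles <= 500) return 'large';
  return 'very-large';
}
```

Note that the tests cover 30, 31, 150, 151, and 501 files; under the stated thresholds a 500-file tree would fall in `large`, a boundary the suite does not currently check.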
@@ -1,7 +1,7 @@
 {
  "name": "understand-anything",
  "description": "AI-powered codebase understanding — analyze, visualize, and explain any project",
-  "version": "2.7.4",
+  "version": "2.7.5",
  "author": {
    "name": "Lum1104"
  },
@@ -52,6 +52,18 @@ cat > $PROJECT_ROOT/.understand-anything/tmp/ua-file-analyzer-input-<batchIndex>
 ENDJSON
 ```

+### Cross-batch context (neighborMap)
+
+Your dispatch prompt includes a `neighborMap` — for each file in your batch, it lists project-internal neighbors in OTHER batches (files that import yours or that you import), with their exported symbols.
+
+Use neighborMap as a confidence boost for cross-batch edges (`calls`, `related`, `inherits`, `implements` to nodes outside your batch):
+
+- If your source clearly references a symbol that appears in some `neighbor.symbols`, emit the edge to `function:<neighbor.path>:<symbol>` or `class:<neighbor.path>:<symbol>` with confidence.
+- If your source references a cross-batch symbol that is NOT in neighborMap (the project-scanner may not have extracted it), you may still emit the edge if you saw it explicitly in the imported file's surface — but prefer matching neighborMap symbols when available.
+- Imports continue to use `batchImportData` (fully resolved), not neighborMap.
+
+The merge script's dangling-edge dropper is the safety net for genuinely unresolvable targets.
+
 ### Step 2 — Execute the bundled extraction script

 Run the bundled `extract-structure.mjs` script. The `<SKILL_DIR>` path is provided in your dispatch prompt.
@@ -464,12 +476,46 @@ Use these hints for common edge patterns:
 - NEVER create self-referencing edges (where source equals target).
 - Trust the script's structural extraction. Do NOT re-read source files to re-extract functions, classes, or imports that the script already captured. Only re-read a file if you need deeper understanding for writing a summary.

-## Writing Results
+## Writing Results — single or multi-part

-After producing the JSON:
+### Output File Naming — STRICT

-1. Write the JSON to: `<project-root>/.understand-anything/intermediate/batch-<batchIndex>.json`
-2. The project root and batch index will be provided in your prompt.
-3. Respond with ONLY a brief text summary: number of nodes created (by type), number of edges created, and any files that were skipped.
+**For EVERY batch in your input, write a separate output file using ONLY one of these two filename patterns:**

-Do NOT include the full JSON in your text response.
+- `batch-<batchIndex>.json` — single-part output for batch `<batchIndex>`
+- `batch-<batchIndex>-part-<k>.json` — multi-part output when `nodes > 60` or `edges > 120` (per Step B below)
+
+`<batchIndex>` is the **ORIGINAL integer batch index** from the input `batches.json`. Even if your dispatch prompt fused multiple batches into one call (e.g., for token efficiency — input may be labeled `fused-8-13` or contain `batches: [{batchIndex: 8}, {batchIndex: 9}, ...]`), you MUST split your output back into per-batch files using each original `batchIndex`.
+
+**NEVER use these patterns:** `batch-fused-*`, `batch-merged-*`, `batch-N-M-*` (range like `batch-8-13.json`), `batches-*`, or any other variant. The downstream merge script (`merge-batch-graphs.py`) only picks up filenames matching the regex `batch-(\d+)(?:-part-(\d+))?\.json`; anything else is **silently dropped from the final graph**, losing every node and edge in that file with no error.
+
+**Example.** If your input contained 6 batches (indices 8 through 13), you write EXACTLY 6 output files: `batch-8.json`, `batch-9.json`, `batch-10.json`, `batch-11.json`, `batch-12.json`, `batch-13.json`. Not one combined `batch-fused-8-13.json`. Not one `batch-8-13.json`. Six files, one per original `batchIndex`. Run Steps A–F below independently for each batch's nodes/edges.
+
+**Step A — Compute totals.**
+```
+nodeCount = nodes.length
+edgeCount = edges.length
+```
+
+**Step B — Decide split.**
+- If `nodeCount ≤ 60` AND `edgeCount ≤ 120`: write ONE file to `.understand-anything/intermediate/batch-<batchIndex>.json`. Done. Skip to Step F.
+- Otherwise: `parts = ceil(max(nodeCount / 60, edgeCount / 120))`.
+
+**Step C — Partition.**
+Sort files in your batch alphabetically by path. Chunk them sequentially into `parts` groups of size `ceil(N / parts)`, where `N` is the number of files in the batch. For each part, include:
+- All nodes whose `filePath` is in this part's files (for non-file nodes like `module`/`concept`, use the file they belong to).
+- All edges whose `source` is in this part's nodes (target may be anywhere — same part, different part of same batch, different batch).
+
+**Step D — Write each part.**
+Write part `k` (1-indexed) to `.understand-anything/intermediate/batch-<batchIndex>-part-<k>.json`. Each part is a valid GraphFragment: `{ "nodes": [...], "edges": [...] }`.
+
+**Step E — Self-validate.**
+For each file written, verify:
+- Valid JSON.
+- `nodes` array exists and is well-formed.
+- For every edge: `source` and `target` both appear as either (a) a node `id` in this part's nodes, OR (b) a `file:<path>` reference where `<path>` is in `neighborMap` or `batchImportData`, OR (c) a `function:<path>:<symbol>` / `class:<path>:<symbol>` reference where `<symbol>` is in some `neighbor.symbols`.
+
+If validation fails on a part, do NOT silently rebuild. Respond with an explicit error stating which part failed, which edge(s) failed validation, and why. The dispatching session can then retry.
+
+**Step F — Respond.**
+Respond with ONLY a brief text summary: parts written (1 or more), total nodes/edges across all parts, any files skipped. Do NOT include JSON content in the response.
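The split rule in Steps B and C, and the filename constraint, can be sketched in JavaScript. This is an illustration rather than code from the PR: `partCount`, `partitionFiles`, and `isValidBatchFilename` are hypothetical names, the default string sort stands in for "alphabetically", and anchoring the regex with `^` and `$` is an assumption about how the merge script matches whole filenames.

```javascript
// Hypothetical helpers illustrating Steps B-C; not part of the PR.
const MAX_NODES = 60;
const MAX_EDGES = 120;

// Step B: how many parts a batch needs.
function partCount(nodeCount, edgeCount) {
  if (nodeCount <= MAX_NODES && edgeCount <= MAX_EDGES) return 1;
  return Math.ceil(Math.max(nodeCount / MAX_NODES, edgeCount / MAX_EDGES));
}

// Step C: sort file paths, then chunk sequentially into groups of
// size ceil(N / parts). Uses the default (code-unit) string sort.
function partitionFiles(filePaths, parts) {
  const sorted = [...filePaths].sort();
  // Guard against a zero chunk size when the batch has no files.
  const size = Math.max(1, Math.ceil(sorted.length / parts));
  const groups = [];
  for (let i = 0; i < sorted.length; i += size) {
    groups.push(sorted.slice(i, i + size));
  }
  return groups;
}

// Filename check mirroring the documented pattern (anchoring assumed).
const BATCH_FILE_RE = /^batch-(\d+)(?:-part-(\d+))?\.json$/;
function isValidBatchFilename(name) {
  return BATCH_FILE_RE.test(name);
}
```

Because the chunk size is rounded up, the number of groups produced can be smaller than `parts` (for example, 5 files with `parts = 4` gives chunks of 2 and only 3 groups). The prompt does not say how that case should be handled, so the sketch simply returns the groups it gets.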
@@ -12,246 +12,59 @@ You are a meticulous project inventory specialist. Your job is to scan a codebas

 ## Task

-Scan the project directory provided in the prompt and produce a JSON inventory. You will accomplish this in two phases: first, write and execute a discovery script that performs all deterministic file scanning; second, review the script's results and add a human-readable project description.
+Scan the project directory provided in the prompt and produce a JSON inventory. The work splits into deterministic and LLM-driven parts:
+
+- **Deterministic** (file enumeration, language detection, category assignment, line counting, complexity estimation, `.understandignore` filtering, import resolution) is handled by two bundled scripts: `scan-project.mjs` and `extract-import-map.mjs`. Do NOT re-implement any of this logic.
+- **LLM** (reading README + manifests for the narrative `name` / `description` / `frameworks` / `languages` story) is what you contribute.

 **Language directive:** If the dispatch prompt includes a language directive (e.g., "Generate all textual content in **Chinese**"), apply it to the `description` field you synthesize in Phase 2. Write the description in the specified language using natural, native-level phrasing. Keep technical terms in English when no standard translation exists (e.g., "middleware", "hook", "barrel").

 ---

-## Phase 1 -- Discovery Script
+## Phase 1 -- Discovery (bundled scan + LLM narrative)

-Write a script that discovers all project files (including non-code files like configs, docs, and infrastructure), detects languages and frameworks, counts lines, and produces structured JSON. Prefer Node.js for the script; fall back to Python if Node.js is unavailable. Avoid bash for this task — import resolution requires file reading and path manipulation that bash handles poorly. The script must handle errors gracefully and never crash on unexpected input.
+Phase 1 has three orchestrated steps. Steps **B** and **C** run bundled scripts; step **A** is the only LLM work in this phase.

-### Script Requirements
+### Step A (LLM) -- Read manifests and README for narrative fields

-1. **Accept** the project root directory as `$1` (bash) or `process.argv[2]` (Node.js) or `sys.argv[1]` (Python).
-2. **Write** results JSON to the path given as `$2` / `process.argv[3]` / `sys.argv[2]`.
-3. **Exit 0** on success.
-4. **Exit 1** on fatal error (cannot access directory, etc.). Print the error to stderr.
+Read the top-level project files to gather narrative metadata. Do NOT walk the file tree or count files yourself — that is Step B's job.

-### What the Script Must Do
+Read whichever of these exist at the project root:
+- `README.md` (or `README.rst`, `README`) — capture the first ~10 lines for narrative grounding
+- `package.json` — extract `name`, `description`, plus `dependencies` / `devDependencies` keys for framework detection
+- `pyproject.toml`, `setup.py`, `setup.cfg`, `Pipfile`, `requirements.txt` — Python framework signals
+- `Cargo.toml` — Rust project name + `[dependencies]`
+- `go.mod` — Go module name + `require` block
+- `Gemfile` — Ruby framework signals
+- `pom.xml`, `build.gradle`, `build.gradle.kts` — JVM project signals
+- `composer.json` — PHP project signals

-**Step 1 -- File Discovery**
+From these, synthesize:

-Discover all tracked files. In order of preference:
- Run `git ls-files` in the project root (most reliable for git repos)
- Fall back to a recursive file listing with exclusions if not a git repo
+- **`name`** -- in priority order: `package.json` `name`, `Cargo.toml` `[package].name`, `go.mod` module path's last segment, `pyproject.toml` `[project].name` or `[tool.poetry].name`, else the directory name of the project root.
+- **`rawDescription`** -- the `description` field from `package.json` (or its equivalent in the matching manifest), or `""` if none.
+- **`readmeHead`** -- the first ~10 lines of `README.md` (or equivalent), or `""` if no README exists.
+- **`frameworks`** -- match dependency names against known frameworks: `react`, `vue`, `svelte`, `@angular/core`, `express`, `fastify`, `koa`, `next`, `nuxt`, `vite`, `vitest`, `jest`, `mocha`, `tailwindcss`, `prisma`, `typeorm`, `sequelize`, `mongoose`, `redux`, `zustand`, `mobx`; Python: `django`, `djangorestframework`, `fastapi`, `flask`, `sqlalchemy`, `alembic`, `celery`, `pydantic`, `uvicorn`, `gunicorn`, `aiohttp`, `tornado`, `starlette`, `pytest`, `hypothesis`, `channels`; Ruby: `rails`, `railties`, `sinatra`, `grape`, `rspec`, `sidekiq`, `activerecord`, `actionpack`, `devise`, `pundit`; Go: `github.com/gin-gonic/gin`, `github.com/labstack/echo`, `github.com/gofiber/fiber`, `github.com/go-chi/chi`, `gorm.io/gorm`; Rust: `actix-web`, `axum`, `rocket`, `diesel`, `tokio`, `serde`, `warp`; JVM: `spring-boot`, `spring-web`, `spring-data`, `quarkus`, `micronaut`, `hibernate`, `jakarta`, `junit`, `ktor`. Also infer infrastructure tools from manifest presence: add `Docker` if `Dockerfile` exists in the file list, `Docker Compose` if `docker-compose.yml`/`docker-compose.yaml` exists, `Terraform` if any `*.tf`, `GitHub Actions` if `.github/workflows/*.yml`, `GitLab CI` if `.gitlab-ci.yml`, `Jenkins` if `Jenkinsfile`.
+- **`languages`** -- the deduplicated, alphabetically-sorted top-level language set you observe across the manifests + the bundled script's per-file language tally (you will read this from Step B's output).

-**Step 2 -- Exclusion Filtering**
+If a manifest is missing or malformed, leave the corresponding field empty rather than guessing.
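
For illustration, the `name` priority order and the JavaScript part of the `frameworks` match might look like the sketch below. The helper names (`pickProjectName`, `detectJsFrameworks`) and the `manifests` object holding already-parsed manifest values are hypothetical. Step A is LLM work, so this only makes the rules concrete and is not code the skill ships.

```javascript
// Sketch only: `manifests` is an assumed container of already-parsed manifests.
function pickProjectName(manifests, projectRoot) {
  // Candidates in the priority order listed for `name`.
  const candidates = [
    manifests.packageJson?.name,
    manifests.cargoToml?.package?.name,
    manifests.goModulePath?.split("/").pop(), // last segment of the go.mod module path
    manifests.pyproject?.project?.name,
    manifests.pyproject?.tool?.poetry?.name,
  ];
  const fromManifest = candidates.find(
    (c) => typeof c === "string" && c.length > 0
  );
  if (fromManifest) return fromManifest;
  // Fallback: directory name of the project root.
  return projectRoot.replace(/[\\/]+$/, "").split(/[\\/]/).pop();
}

// JavaScript/TypeScript framework names copied from the list above.
const KNOWN_JS_FRAMEWORKS = new Set([
  "react", "vue", "svelte", "@angular/core", "express", "fastify", "koa",
  "next", "nuxt", "vite", "vitest", "jest", "mocha", "tailwindcss", "prisma",
  "typeorm", "sequelize", "mongoose", "redux", "zustand", "mobx",
]);

function detectJsFrameworks(packageJson) {
  // Only dependency keys matter; version ranges are ignored.
  const deps = {
    ...packageJson?.dependencies,
    ...packageJson?.devDependencies,
  };
  return Object.keys(deps)
    .filter((dep) => KNOWN_JS_FRAMEWORKS.has(dep))
    .sort();
}
```

The sketch returns the raw dependency names. How they map to display names is not specified here, so that step is left out.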

-Remove ALL files matching these patterns:
- **Dependency directories:** paths containing `node_modules/`, `.git/`, `vendor/`, `venv/`, `.venv/`, `__pycache__/`
- **Build output:** paths with a directory segment matching `dist/`, `build/`, `out/`, `coverage/`, `.next/`, `.cache/`, `.turbo/`, `target/` (Rust), `obj/` (.NET) — match full directory segments only, not substrings (e.g., `buildSrc/` should NOT be excluded). Note: `bin/` is NOT excluded by default because Node.js and Ruby projects use `bin/` for CLI launchers; .NET users can add `bin/` to `.understandignore`.
- **Lock files:** `*.lock`, `package-lock.json`, `yarn.lock`, `pnpm-lock.yaml`
- **Binary/asset files:** `.png`, `.jpg`, `.jpeg`, `.gif`, `.svg`, `.ico`, `.woff`, `.woff2`, `.ttf`, `.eot`, `.mp3`, `.mp4`, `.pdf`, `.zip`, `.tar`, `.gz`
- **Generated files:** `*.min.js`, `*.min.css`, `*.map`, `*.generated.*` (note: do NOT exclude `*.d.ts` — many projects have hand-written declaration files)
- **IDE/editor config:** paths containing `.idea/`, `.vscode/`
- **Misc non-source:** `LICENSE`, `.gitignore`, `.editorconfig`, `.prettierrc`, `.eslintrc*`, `*.log`
+### Step B (bundled `scan-project.mjs`) -- File enumeration + language + category + lines

-**IMPORTANT:** Do NOT exclude non-code project files. The following MUST be kept:
- Documentation: `*.md`, `*.rst`, `*.txt` (except `LICENSE`)
- Configuration: `*.yaml`, `*.yml`, `*.json`, `*.toml`, `*.xml`, `*.cfg`, `*.ini`, `*.env`, `*.env.example` (include `.env` in the file list but downstream agents should NEVER include `.env` variable values in summaries or output)
- Infrastructure: `Dockerfile`, `docker-compose.*`, `*.tf`, `Makefile`, `Jenkinsfile`, `Procfile`, `Vagrantfile`
- CI/CD: `.github/workflows/*`, `.gitlab-ci.yml`, `.circleci/*`, `Jenkinsfile`
- Data/Schema: `*.sql`, `*.graphql`, `*.gql`, `*.proto`, `*.prisma`, `*.schema.json`
- Web markup: `*.html`, `*.css`, `*.scss`, `*.sass`, `*.less`
- Shell scripts: `*.sh`, `*.bash`, `*.ps1`, `*.bat`
- Kubernetes: `*.k8s.yaml`, `*.k8s.yml`, paths containing `k8s/`, paths containing `kubernetes/`
+Invoke the bundled scan script. It walks the project (preferring `git ls-files`, falling back to a recursive walk for non-git directories), applies `.understandignore` filtering (defaults + user patterns), assigns `language` and `fileCategory` per the canonical tables, counts lines, and writes deterministic JSON. You do not see or maintain those tables — they live in the script.

-**Note on package manifests:** Config files read for framework detection (`package.json`, `tsconfig.json`, `Cargo.toml`, `go.mod`, `pyproject.toml`, etc.) should also appear in the file list with `fileCategory: "config"`.
-
-**Step 2.5 -- User-Configured Filtering (.understandignore)**
-
-When `.understandignore` files exist, **replace** Step 2's hardcoded filtering with a unified filter that combines defaults and user patterns in a single pass. This ensures `!` negation patterns can override defaults.
-
-1. Check if `$PROJECT_ROOT/.understand-anything/.understandignore` exists. If so, read it.
-2. Check if `$PROJECT_ROOT/.understandignore` exists. If so, read it.
-3. If neither file exists, skip this step entirely — Step 2's hardcoded filtering is sufficient.
-4. If at least one file exists, re-filter the **original file list from Step 1** (not the Step 2 output) using the `createIgnoreFilter` function from `@understand-anything/core`, which merges hardcoded defaults and user patterns into a single `.gitignore`-compatible matcher. This ensures `!` negation in user files can override hardcoded defaults (e.g., `!dist/` force-includes dist/ files).
-5. Track the count of additional files removed beyond Step 2's baseline as `filteredByIgnore`.
-
-This filtering must be deterministic (not LLM-based). Use a Node.js script with the `ignore` npm package from `@understand-anything/core`.
-
-**Step 3 -- Language Detection**
-
-Map file extensions to language identifiers:
-
-| Extensions | Language ID |
-|---|---|
-| `.ts`, `.tsx` | `typescript` |
-| `.js`, `.jsx` | `javascript` |
-| `.py` | `python` |
-| `.go` | `go` |
-| `.rs` | `rust` |
-| `.java` | `java` |
-| `.rb` | `ruby` |
-| `.cpp`, `.cc`, `.cxx`, `.h`, `.hpp` | `cpp` |
-| `.c` | `c` |
-| `.cs` | `csharp` |
-| `.swift` | `swift` |
-| `.kt` | `kotlin` |
-| `.php` | `php` |
-| `.vue` | `vue` |
-| `.svelte` | `svelte` |
-| `.sh`, `.bash` | `shell` |
-| `.ps1` | `powershell` |
-| `.bat`, `.cmd` | `batch` |
-| `.md`, `.rst` | `markdown` |
-| `.yaml`, `.yml` | `yaml` |
-| `.json` | `json` |
-| `.jsonc` | `jsonc` |
-| `.toml` | `toml` |
-| `.sql` | `sql` |
-| `.graphql`, `.gql` | `graphql` |
-| `.proto` | `protobuf` |
-| `.tf`, `.tfvars` | `terraform` |
-| `.html`, `.htm` | `html` |
-| `.css`, `.scss`, `.sass`, `.less` | `css` |
-| `.xml` | `xml` |
-| `.cfg`, `.ini`, `.env` | `config` |
-| `Dockerfile` (no extension) | `dockerfile` |
-| `Makefile` (no extension) | `makefile` |
-| `Jenkinsfile` (no extension) | `jenkinsfile` |
-
-**Fallback:** If a file's extension is not in the table above, set `language` to the lowercased extension (without the leading dot), or `"unknown"` if there is no extension. Never emit `null` — downstream consumers rely on this field being a string.
-
-Collect unique languages, sorted alphabetically.
-
-**Step 4 -- File Category Detection**
-
-Assign a `fileCategory` to each discovered file based on its extension and path:
-
-| Pattern | Category |
-|---|---|
-| `.md`, `.rst`, `.txt` (except `LICENSE`) | `docs` |
-| `.yaml`, `.yml`, `.json`, `.jsonc`, `.toml`, `.xml`, `.cfg`, `.ini`, `.env`, `tsconfig.json`, `package.json`, `pyproject.toml`, `Cargo.toml`, `go.mod` | `config` |
-| `Dockerfile`, `docker-compose.*`, `.tf`, `.tfvars`, `Makefile`, `Jenkinsfile`, `Procfile`, `Vagrantfile`, `.github/workflows/*`, `.gitlab-ci.yml`, `.circleci/*`, `*.k8s.yaml`, `*.k8s.yml`, paths in `k8s/` or `kubernetes/` | `infra` |
-| `.sql`, `.graphql`, `.gql`, `.proto`, `.prisma`, `*.schema.json`, `.csv` | `data` |
-| `.sh`, `.bash`, `.ps1`, `.bat` | `script` |
-| `.html`, `.htm`, `.css`, `.scss`, `.sass`, `.less` | `markup` |
-| All other extensions (`.ts`, `.tsx`, `.js`, `.py`, `.go`, `.rs`, etc.) | `code` |
-
-**Priority rule:** When a file matches multiple categories, use the first match from the table above (most specific wins). For example, `docker-compose.yml` is `infra`, not `config`.
-
-**Step 5 -- Line Counting**
-
-For each file, count lines using `wc -l`. For efficiency:
- If fewer than 500 files, count all of them
- If 500+ files, count all of them but batch the `wc -l` calls (pass multiple files per invocation to avoid spawning thousands of processes)
-
-**Step 6 -- Framework Detection**
-
-Read config files (if they exist) and extract framework information:
- `package.json` -- parse JSON, extract `name`, `description`, `dependencies`, `devDependencies`. Match dependency names against known frameworks: `react`, `vue`, `svelte`, `@angular/core`, `express`, `fastify`, `koa`, `next`, `nuxt`, `vite`, `vitest`, `jest`, `mocha`, `tailwindcss`, `prisma`, `typeorm`, `sequelize`, `mongoose`, `redux`, `zustand`, `mobx`
- `tsconfig.json` -- if present, confirms TypeScript usage
- `Cargo.toml` -- if present, confirms Rust project; extract `[package].name`
- `go.mod` -- if present, confirms Go project; extract module name
- `requirements.txt` -- if present, confirms Python project; read line by line and match package names (strip version specifiers) against known Python frameworks: `django`, `djangorestframework`, `fastapi`, `flask`, `sqlalchemy`, `alembic`, `celery`, `pydantic`, `uvicorn`, `gunicorn`, `aiohttp`, `tornado`, `starlette`, `pytest`, `hypothesis`, `channels`
- `pyproject.toml` -- if present, confirms Python project; parse the `[project].dependencies` or `[tool.poetry.dependencies]` section and apply the same Python framework keyword matching as above. Also check for `[tool.pytest.ini_options]` (confirms pytest) and `[tool.django]` (confirms Django).
- `setup.py` / `setup.cfg` / `Pipfile` -- if present, confirms Python project; read and apply Python framework keyword matching
- `Gemfile` -- if present, confirms Ruby project; read and match gem names against known Ruby frameworks: `rails`, `railties`, `sinatra`, `grape`, `rspec`, `sidekiq`, `activerecord`, `actionpack`, `devise`, `pundit`
- `go.mod` dependencies -- if present, read the `require` block and match module paths against known Go frameworks: `github.com/gin-gonic/gin`, `github.com/labstack/echo`, `github.com/gofiber/fiber`, `github.com/go-chi/chi`, `gorm.io/gorm`
- `Cargo.toml` dependencies -- if present, read `[dependencies]` and match crate names against known Rust frameworks: `actix-web`, `axum`, `rocket`, `diesel`, `tokio`, `serde`, `warp`
- `pom.xml` / `build.gradle` / `build.gradle.kts` -- if present, confirms Java/Kotlin project; match dependency names against known JVM frameworks: `spring-boot`, `spring-web`, `spring-data`, `quarkus`, `micronaut`, `hibernate`, `jakarta`, `junit`, `ktor`
-
-Also detect infrastructure tooling from discovered files:
- Presence of `Dockerfile` -> add `Docker` to frameworks
- Presence of `docker-compose.yml` or `docker-compose.yaml` -> add `Docker Compose` to frameworks
- Presence of `*.tf` files -> add `Terraform` to frameworks
- Presence of `.github/workflows/*.yml` -> add `GitHub Actions` to frameworks
- Presence of `.gitlab-ci.yml` -> add `GitLab CI` to frameworks
- Presence of `Jenkinsfile` -> add `Jenkins` to frameworks
-
-**Step 7 -- Complexity Estimation**
-
-Classify by total file count (including non-code files):
- `small`: 1-30 files
- `moderate`: 31-150 files
- `large`: 151-500 files
- `very-large`: >500 files
-
-**Step 8 -- Project Name**
-
-Extract from (in priority order):
-1. `package.json` `name` field
-2. `Cargo.toml` `[package].name`
-3. `go.mod` module path (last segment)
-4. `pyproject.toml` -- check `[project].name` first, then `[tool.poetry].name`
-5. Directory name of project root
-
-**Step 9 -- Import Resolution**
-
-For each **code-category** file in the discovered list (`fileCategory === "code"`), extract and resolve relative import statements. The goal is to produce a map from each file's path to the list of project-internal files it imports. External package imports are ignored.
-
-**Non-code files** (config, docs, infra, data, script, markup) should have an empty array `[]` in the import map — they do not participate in code-level import resolution.
-
-For each code file, read its content and extract import paths using language-appropriate patterns:
-
-| Language | Import patterns to match |
-|---|---|
-| TypeScript/JavaScript | Relative: `import ... from './...'` or `'../'`, `require('./...')` or `require('../...')`. **Plus path aliases** from `tsconfig.json` `compilerOptions.paths` and `baseUrl` (e.g. `@/foo` → `<baseUrl>/foo`, `~/foo` → `<baseUrl>/foo`). Read tsconfig.json (if present) and resolve every alias prefix against the discovered file list with the standard extension probes. |
-| Python | Both relative AND absolute. Relative: `from .x import y`, `from ..x import y`, `from . import x`. Absolute: `import a.b.c`, `from a.b.c import x[, y, ...]` — try every dotted path against the discovered file list (see resolution algorithm below) and keep matches; non-matches are external packages and are dropped. |
-| Go | Paths in `import (...)` blocks that start with the module path from `go.mod` |
-| Rust | `use crate::`, `use super::`, `mod x` (within the same crate) |
-| Java | `import com.example.foo.Bar;` — try `**/com/example/foo/Bar.java` against the discovered file list; keep matches |
-| Kotlin | `import com.example.foo.Bar` — try `**/com/example/foo/Bar.kt` against the discovered file list; keep matches |
-| Ruby | Relative: `require_relative '...'` paths. **Plus** `require 'foo/bar'` (load-path) — try `lib/foo/bar.rb`, `app/foo/bar.rb`, `foo/bar.rb` against the discovered file list. |
-| PHP | `use Vendor\Pkg\Class;` — read `composer.json` `autoload.psr-4` map (e.g. `"App\\": "src/"`), translate the namespace prefix to its directory, then try `<dir>/Pkg/Class.php` against the discovered file list. Skip imports whose namespace prefix isn't in the autoload map. |
-| C / C++ | `#include "foo.h"` (relative to the includer's directory) and `#include <foo.h>` — for both, also probe `include/foo.h`, `src/foo.h`, and the bare path against the discovered file list. Match `.h`, `.hpp`, `.hxx`, `.cuh`. |
-
-For each extracted import path:
-1. Compute the resolved file path relative to project root:
-   - For relative imports (`./x`, `../x`): resolve from the importing file's directory
-   - Try these extension variants in order if the import has no extension: `.ts`, `.tsx`, `.js`, `.jsx`, `/index.ts`, `/index.js`, `/index.tsx`, `/index.jsx`, `.py`, `.go`, `.rs`, `.rb`
-2. Check if the resolved path exists in the discovered file list
-3. If yes: add to this file's resolved imports list
-4. If no: skip (external, unresolvable, or dynamic import)
-
-**Python absolute imports — resolution algorithm.** This is the dominant import style in real Python projects, so it MUST be handled:
-
-For `import a.b.c`, try (in order, take first match in the discovered file list):
- `a/b/c.py`
- `a/b/c/__init__.py`
-
-For `from a.b.c import x, y, z`, try (in order, take first match for the module path):
- `a/b/c.py`
- `a/b/c/__init__.py`
-
-If the module path matched as a package (`__init__.py`), additionally probe each imported name `x`/`y`/`z` against:
- `a/b/c/x.py`
- `a/b/c/x/__init__.py`
-
-so that `from package import submodule` resolves to the submodule file. Skip names that don't match (they're class/function imports from inside the package, already covered by the `__init__.py` match).
-
-If NO probe matches, the import is external — drop it.
-
-**Worked example.** Discovered files include `src/utils/formatter.py`, `src/utils/__init__.py`. The line `from src.utils import formatter` resolves to `src/utils/__init__.py` (module match) AND `src/utils/formatter.py` (submodule probe). Both are added to the importer's resolved list.
-
-Output format in the script result:
-```json
-"importMap": {
-  "src/index.ts": ["src/utils.ts", "src/config.ts"],
-  "src/utils.ts": [],
-  "README.md": [],
-  "Dockerfile": [],
-  "src/components/App.tsx": ["src/hooks/useAuth.ts", "src/store/index.ts"]
-}
+```bash
+mkdir -p "$PROJECT_ROOT/.understand-anything/tmp"
+node "$PLUGIN_ROOT/skills/understand/scan-project.mjs" \
+  "$PROJECT_ROOT" \
+  "$PROJECT_ROOT/.understand-anything/tmp/ua-scan-files.json"
 ```

-Keys are project-relative paths. Values are arrays of resolved project-relative paths. Every key in the file list must appear in `importMap` (use an empty array `[]` if no imports were resolved). External packages and unresolvable imports are omitted entirely.
-
-### Script Output Format
-
-The script must write this exact JSON structure to the output file:
+Output JSON shape (you will read this verbatim and merge into the final scan-result):

 ```json
 {
  "scriptCompleted": true,
-  "name": "project-name",
-  "rawDescription": "Description from package.json or empty string",
-  "readmeHead": "First 10 lines of README.md or empty string",
-  "languages": ["javascript", "markdown", "typescript", "yaml"],
-  "frameworks": ["React", "Vite", "Vitest", "Docker"],
  "files": [
    {"path": "src/index.ts", "language": "typescript", "sizeLines": 150, "fileCategory": "code"},
    {"path": "README.md", "language": "markdown", "sizeLines": 45, "fileCategory": "docs"},
@@ -261,50 +74,106 @@ The script must write this exact JSON structure to the output file:
  "totalFiles": 42,
  "filteredByIgnore": 0,
  "estimatedComplexity": "moderate",
-  "importMap": {
-    "src/index.ts": ["src/utils.ts", "src/config.ts"],
-    "src/utils.ts": [],
-    "README.md": [],
-    "Dockerfile": [],
-    "package.json": []
+  "stats": {
+    "filesScanned": 42,
+    "byCategory": {"code": 28, "config": 6, "docs": 4, "infra": 2, "script": 2},
+    "byLanguage": {"typescript": 22, "javascript": 6, "json": 5, "markdown": 4, "yaml": 3, "shell": 2}
  }
 }
 ```

- `scriptCompleted` (boolean) -- always `true` when the script finishes normally
- `name` (string) -- project name extracted from config or directory name
- `rawDescription` (string) -- raw description from `package.json` or empty string
- `readmeHead` (string) -- first 10 lines of `README.md` or empty string if no README exists
- `languages` (string[]) -- deduplicated, sorted alphabetically
- `frameworks` (string[]) -- only confirmed frameworks; empty array if none detected
- `files` (object[]) -- every discovered file, sorted by `path` alphabetically
- `files[].fileCategory` (string) -- one of: `code`, `config`, `docs`, `infra`, `data`, `script`, `markup`
- `totalFiles` (integer) -- must equal `files.length`
- `filteredByIgnore` (integer) -- count of files removed by `.understandignore` patterns in Step 2.5; 0 if no `.understandignore` file exists
- `estimatedComplexity` (string) -- one of `small`, `moderate`, `large`, `very-large`
- `importMap` (object) -- map from every file path to its list of resolved project-internal import paths; empty array for non-code files and files with no resolved imports; external packages excluded
+The script:
+- sorts `files` by `path.localeCompare` (deterministic)
+- emits `fileCategory ∈ {code, config, docs, infra, data, script, markup}` per file (priority-ordered per the rules below)
+- emits `language` as a non-null string for every file (canonical id for known extensions, lowercased extension for unknowns, `"unknown"` for no-extension files that don't match `Dockerfile` / `Makefile` / `Jenkinsfile`)
+- counts `filteredByIgnore` as the delta beyond hardcoded defaults — `!`-negation in `.understandignore` correctly re-includes files
+- emits `Warning: scan-project: <path> — <reason> — file skipped from output` on stderr for per-file failures (permission denied, malformed unicode, vanished file). Capture these and append to phase warnings.
+- emits `scan-project: filesScanned=… filteredByIgnore=… complexity=…` as the final stderr summary line; informational only.
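
The sorting and `language` fallback rules in the bullets above can be pictured with this small sketch. It is not the contents of `scan-project.mjs`: the lookup table is a tiny illustrative subset, the function names are hypothetical, and the treatment of dotfiles (the text after the leading dot is treated as the extension) is an assumption.

```javascript
// Illustrative subset only; the authoritative tables live in the bundled script.
const CANONICAL_LANGUAGES = new Map([
  [".ts", "typescript"],
  [".tsx", "typescript"],
  [".js", "javascript"],
  [".py", "python"],
  [".md", "markdown"],
]);
const BARE_FILENAMES = new Map([
  ["Dockerfile", "dockerfile"],
  ["Makefile", "makefile"],
  ["Jenkinsfile", "jenkinsfile"],
]);

function detectLanguage(filePath) {
  const base = filePath.split("/").pop();
  // Extensionless files with a canonical id.
  if (BARE_FILENAMES.has(base)) return BARE_FILENAMES.get(base);
  const dot = base.lastIndexOf(".");
  if (dot === -1) return "unknown"; // no extension at all
  const ext = base.slice(dot).toLowerCase();
  if (ext.length <= 1) return "unknown"; // name ends with a bare dot
  // Known extension: canonical id. Unknown: lowercased extension without the dot.
  return CANONICAL_LANGUAGES.get(ext) ?? ext.slice(1);
}

function sortFiles(files) {
  // Deterministic ordering by path, without mutating the input array.
  return [...files].sort((a, b) => a.path.localeCompare(b.path));
}
```

Because every branch returns a string, `language` is never `null`, which is the property the bullet list calls out.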

-### Executing the Script
+**Canonical category table** (for the record — the script is authoritative; do NOT re-derive these rules in your prompt):

-After writing the script, execute it. `$PROJECT_ROOT` is the project root directory provided in your dispatch prompt:
+| Pattern | Category |
+|---|---|
+| `LICENSE` | `code` (exception — not docs) |
+| `Dockerfile`, `Dockerfile.*`, `docker-compose.*`, `compose.yml`/`compose.yaml`, `Makefile`, `Jenkinsfile`, `Procfile`, `Vagrantfile`, `.gitlab-ci.yml`, `.dockerignore`, `.github/workflows/*`, `.circleci/*`, paths in `k8s/` or `kubernetes/`, `*.k8s.yml`/`*.k8s.yaml` | `infra` |
+| `.md`, `.mdx`, `.rst`, `.txt`, `.text` (except `LICENSE`) | `docs` |
+| `.yaml`, `.yml`, `.json`, `.jsonc`, `.toml`, `.xml`, `.xsl`, `.xsd`, `.plist`, `.cfg`, `.ini`, `.env`, `.properties`, `.csproj`, `.sln`, `.mod`, `.sum`, `.gradle` | `config` |
+| `.tf`, `.tfvars` | `infra` |
+| `.sql`, `.graphql`, `.gql`, `.proto`, `.prisma`, `.csv`, `.tsv` | `data` |
+| `.sh`, `.bash`, `.zsh`, `.ps1`, `.psm1`, `.psd1`, `.bat`, `.cmd` | `script` |
+| `.html`, `.htm`, `.css`, `.scss`, `.sass`, `.less` | `markup` |
+| Everything else | `code` |
+
+**Priority rule:** most-specific wins. Filename / path rules fire before extension rules — e.g., `docker-compose.yml` is `infra` (not `config`); `.github/workflows/ci.yml` is `infra` (not `config`); `LICENSE` is `code` (not `docs`).
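
To make the priority rule concrete, here is a minimal sketch of a classifier that checks filename and path rules before extension rules. It covers only a subset of the table, the name `categorize` is hypothetical, and the real logic in `scan-project.mjs` may be structured differently.

```javascript
// Illustrative subset of the category table; the bundled script is authoritative.
const INFRA_FILENAMES = new Set([
  "Dockerfile", "Makefile", "Jenkinsfile", "Procfile", "Vagrantfile",
  ".gitlab-ci.yml", ".dockerignore", "compose.yml", "compose.yaml",
]);
const EXTENSION_CATEGORIES = new Map([
  [".md", "docs"], [".txt", "docs"],
  [".yml", "config"], [".yaml", "config"], [".json", "config"], [".toml", "config"],
  [".tf", "infra"],
  [".sql", "data"], [".csv", "data"],
  [".sh", "script"],
  [".html", "markup"], [".css", "markup"],
]);

function categorize(filePath) {
  const base = filePath.split("/").pop();

  // 1. Filename and path rules fire first.
  if (base === "LICENSE") return "code";
  if (
    INFRA_FILENAMES.has(base) ||
    base.startsWith("Dockerfile.") ||
    base.startsWith("docker-compose.") ||
    /\.k8s\.ya?ml$/.test(base) ||
    filePath.startsWith(".github/workflows/") ||
    filePath.startsWith(".circleci/") ||
    /(^|\/)(k8s|kubernetes)\//.test(filePath)
  ) {
    return "infra";
  }

  // 2. Extension rules, with `code` as the catch-all.
  const dot = base.lastIndexOf(".");
  const ext = dot === -1 ? "" : base.slice(dot).toLowerCase();
  return EXTENSION_CATEGORIES.get(ext) ?? "code";
}
```

Ordering the checks this way is what keeps `docker-compose.yml` and `.github/workflows/ci.yml` in `infra` even though their `.yml` extension would otherwise map to `config`.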
+
+**`.understandignore` behavior:** the bundled script reads `.understandignore` and `.understand-anything/.understandignore` if present and merges them with the hardcoded defaults via `createIgnoreFilter`. `!`-negation overrides defaults (`!dist/` would re-include `dist/` files). The `filteredByIgnore` counter measures only user-driven drops, not baseline default drops.
+
+If the script exits with a non-zero status, read stderr to diagnose. You have up to 2 retry attempts (re-invocations) before failing the phase. Do NOT attempt to substitute a custom scanner — there is no second-source replacement.
+
+### Step C -- Import Resolution (bundled `extract-import-map.mjs`)
+
+After Step B has produced the file list, invoke the bundled `extract-import-map.mjs` script for deterministic import extraction across all supported code languages. It uses tree-sitter for parsing and applies language-specific resolution rules in code (see `<SKILL_DIR>/extract-import-map.mjs`).
+
+**Do not** attempt to re-implement import patterns. Step B emits `path`/`language`/`fileCategory` for every file; this script consumes that list and produces the `importMap`.
+
+Write the input JSON for the bundled script (the `files[]` array is exactly Step B's `files[]` — pass it through verbatim):

 ```bash
-node $PROJECT_ROOT/.understand-anything/tmp/ua-project-scan.js "$PROJECT_ROOT" "$PROJECT_ROOT/.understand-anything/tmp/ua-scan-results.json"
+mkdir -p "$PROJECT_ROOT/.understand-anything/tmp"
+cat > "$PROJECT_ROOT/.understand-anything/tmp/ua-import-map-input.json" << 'ENDJSON'
+{
+  "projectRoot": "<absolute-project-root>",
+  "files": [
+    {"path": "src/index.ts", "language": "typescript", "fileCategory": "code"},
+    {"path": "README.md", "language": "markdown", "fileCategory": "docs"}
+  ]
+}
+ENDJSON
 ```

-(Or the equivalent for Python, depending on which language you chose.)
+Then run:

-If the script exits with a non-zero code, read stderr, diagnose the issue, fix the script, and re-run. You have up to 2 retry attempts.
+```bash
+node "$PLUGIN_ROOT/skills/understand/extract-import-map.mjs" \
+  "$PROJECT_ROOT/.understand-anything/tmp/ua-import-map-input.json" \
+  "$PROJECT_ROOT/.understand-anything/tmp/ua-import-map-output.json"
+```
+
+The output JSON has shape:
+
+```json
+{
+  "scriptCompleted": true,
+  "stats": { "filesScanned": 314, "filesWithImports": 142, "totalEdges": 487 },
+  "importMap": {
+    "src/index.ts": ["src/utils.ts", "src/config.ts"],
+    "src/utils.ts": [],
+    "README.md": [],
+    "Dockerfile": []
+  }
+}
+```
+
+Read the output JSON and merge the `importMap` field directly into your final scan-result.json (under the same key — `importMap`). The format matches the project-scanner contract: every input file has an entry; non-code files have empty arrays; resolved internal paths only (external packages are dropped).
+
+**Capture stderr** when you run the bundled script. Any line starting with `Warning:` should be appended to phase warnings — the SKILL.md orchestrator captures these for the final report. The script also writes a one-line summary `extract-import-map: filesScanned=… filesWithImports=… totalEdges=…` on completion; you can ignore that line or surface it as informational.
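
As a rough sketch of the stderr handling described above, assuming the captured stderr is available as a single string (the helper name `collectWarnings` is hypothetical):

```javascript
// Sketch only: keep the lines that start with "Warning:" and drop the rest,
// including the informational one-line summary.
function collectWarnings(stderrText) {
  return stderrText
    .split(/\r?\n/)
    .filter((line) => line.startsWith("Warning:"));
}
```

The returned lines can then be appended to the phase warnings unchanged.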
+
+**Languages supported.** The bundled script natively handles import resolution for: TypeScript, JavaScript (including CJS `require()`), Python (relative + absolute + `__init__.py`), Go (go.mod prefix stripping), Rust (`use crate::`, `use super::`, `use self::`, and `mod x;` declarations), Java, Kotlin, C#, Ruby (`require` + `require_relative`), PHP (composer.json PSR-4 autoload), C, and C++ (`#include` with relative + include/ + src/ probes). Languages outside this set get empty arrays — there is no LLM-based fallback.

 ---

 ## Phase 2 -- Description and Final Assembly

-After the script completes, read `$PROJECT_ROOT/.understand-anything/tmp/ua-scan-results.json`. Do NOT re-run file discovery commands or re-count lines -- trust the script's results entirely.
+After Steps A + B + C have all completed, read:
+1. `$PROJECT_ROOT/.understand-anything/tmp/ua-scan-files.json` — output of `scan-project.mjs` (file list with language, sizeLines, fileCategory; plus `totalFiles`, `filteredByIgnore`, `estimatedComplexity`).
+2. `$PROJECT_ROOT/.understand-anything/tmp/ua-import-map-output.json` — output of `extract-import-map.mjs` (the `importMap` field).
+3. Your Step A in-memory notes (`name`, `rawDescription`, `readmeHead`, `frameworks`, `languages` narrative).

-**IMPORTANT:** The final output must NOT contain the `scriptCompleted`, `rawDescription`, or `readmeHead` fields. These are intermediate script fields only. Strip them when assembling the final JSON. All other fields — including `importMap` — MUST be preserved exactly as output by the script.
+Do NOT re-walk the file tree, re-count lines, or re-derive categories — trust `scan-project.mjs` entirely. Do NOT re-implement import resolution — trust `extract-import-map.mjs` entirely.

-Your only task in this phase is to produce the final `description` field:
+**IMPORTANT:** The final output must NOT contain the `scriptCompleted` or `stats` fields from either bundled script, nor your transient `rawDescription` / `readmeHead` work-strings. Strip them when assembling the final JSON. The final `importMap` MUST equal the `importMap` field from `extract-import-map.mjs` verbatim (do not edit, re-sort, or filter it). The final `files` array MUST equal Step B's `files` array verbatim (do not re-order, drop, or augment it).
+
+Your only synthesis task in this phase is the final `description` field:

 1. If `rawDescription` is non-empty, use it as the basis. Clean it up if needed (remove marketing fluff, ensure it is 1-2 sentences).
 2. If `rawDescription` is empty but `readmeHead` is non-empty, synthesize a 1-2 sentence description from the README content.
@@ -334,25 +203,25 @@ Then assemble the final output JSON:
 ```

 **Field requirements:**
- `name` (string): directly from script output
+- `name` (string): from your Step A narrative work
 - `description` (string): your synthesized 1-2 sentence description
- `languages` (string[]): directly from script output
- `frameworks` (string[]): directly from script output
- `files` (object[]): directly from script output, including `fileCategory` per file
- `totalFiles` (integer): directly from script output
- `filteredByIgnore` (integer): directly from script output
- `estimatedComplexity` (string): directly from script output
- `importMap` (object): directly from script output
+- `languages` (string[]): from your Step A narrative work (deduplicated, sorted alphabetically; cross-checked against Step B's `stats.byLanguage` keys)
+- `frameworks` (string[]): from your Step A narrative work; only confirmed frameworks (empty array if none detected)
+- `files` (object[]): directly from Step B's `files[]` (verbatim, including `fileCategory`)
+- `totalFiles` (integer): directly from Step B
+- `filteredByIgnore` (integer): directly from Step B
+- `estimatedComplexity` (string): directly from Step B
+- `importMap` (object): directly from Step C's `importMap` field
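
The field requirements above amount to a merge of three sources. The sketch below shows one way to picture it. The function name `assembleScanResult` and its parameters are hypothetical, and the key order is an assumption. Building the result from an explicit allow-list of fields means `scriptCompleted`, `stats`, `rawDescription`, and `readmeHead` are never copied across.

```javascript
// Sketch only: stepA is the in-memory narrative work, scanOutput is Step B's
// JSON, importMapOutput is Step C's JSON, description is the synthesized text.
function assembleScanResult(stepA, scanOutput, importMapOutput, description) {
  const result = {
    name: stepA.name,
    description,
    languages: [...new Set(stepA.languages)].sort(), // deduplicated, alphabetical
    frameworks: stepA.frameworks,
    files: scanOutput.files, // passed through verbatim
    totalFiles: scanOutput.totalFiles,
    filteredByIgnore: scanOutput.filteredByIgnore,
    estimatedComplexity: scanOutput.estimatedComplexity,
    importMap: importMapOutput.importMap, // passed through verbatim
  };
  // Guard for the constraint that totalFiles equals files.length.
  if (result.totalFiles !== result.files.length) {
    throw new Error(
      `totalFiles (${result.totalFiles}) does not match files.length (${result.files.length})`
    );
  }
  return result;
}
```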

 ## Critical Constraints

- NEVER invent or guess file paths. Every `path` in the `files` array must come from the script's file discovery, which in turn comes from `git ls-files` or a real directory listing.
+- NEVER invent or guess file paths. Every `path` in the `files` array must come from `scan-project.mjs`'s output (which itself comes from `git ls-files` or a real directory listing).
 - NEVER include files that do not exist on disk.
 - ALWAYS validate that `totalFiles` matches the actual length of the `files` array.
- ALWAYS sort `files` by `path` for deterministic output.
- Include ALL discovered project files in `files` -- code, configs, docs, infrastructure, and data files. Only exclude binaries, lock files, generated files, and dependency directories.
- Every file MUST have a `fileCategory` field with one of: `code`, `config`, `docs`, `infra`, `data`, `script`, `markup`.
- Trust the script's output for all structural data. Your only contribution is the `description` field.
+- Trust Step B for file enumeration + language detection + category assignment + line counts + complexity. Trust Step C for `importMap`. Your only synthesis is the `description` field (plus the Step A narrative fields: `name`, `frameworks`, `languages`).
+- Do NOT re-implement file enumeration, language detection, or category assignment in your discovery script. Use the bundled `scan-project.mjs`. If the table doesn't cover your project type, file an issue rather than ad-hoc handling.
+- Do NOT attempt to re-implement import resolution. The bundled `extract-import-map.mjs` handles all 12 supported code languages (TS, JS, Python, Go, Rust, Java, Kotlin, C#, Ruby, PHP, C, C++) deterministically via tree-sitter + per-language resolvers.
+- Every file MUST have a `fileCategory` field with one of: `code`, `config`, `docs`, `infra`, `data`, `script`, `markup` — `scan-project.mjs` guarantees this; just don't strip it.

 ## Writing Results

@@ -1,15 +1,17 @@
 {
  "name": "@understand-anything/skill",
-  "version": "2.7.4",
+  "version": "2.7.5",
  "type": "module",
  "main": "dist/index.js",
  "types": "dist/index.d.ts",
  "scripts": {
    "build": "tsc",
-    "test": "vitest run"
+    "test": "node -e \"console.log('skill tests live at <repo-root>/tests/skill — run via root \\`pnpm test\\`')\""
  },
  "dependencies": {
-    "@understand-anything/core": "workspace:*"
+    "@understand-anything/core": "workspace:*",
+    "graphology": "~0.26.0",
+    "graphology-communities-louvain": "^2.0.2"
  },
  "devDependencies": {
    "@types/node": "^22.0.0",
@@ -0,0 +1,7 @@
+import { defineConfig } from 'vitest/config';
+
+export default defineConfig({
+  test: {
+    include: ['src/**/*.test.{ts,tsx,mjs}'],
+  },
+});
@@ -275,26 +275,32 @@ If the scan result includes `filteredByIgnore > 0`, report:

 ---

+## Phase 1.5 — BATCH
+
+Report: `[Phase 1.5/7] Computing semantic batches...`
+
+Run the bundled batching script:
+```bash
+node <SKILL_DIR>/compute-batches.mjs $PROJECT_ROOT
+```
+
+Reads `.understand-anything/intermediate/scan-result.json`, writes `.understand-anything/intermediate/batches.json`.
+
+Capture stderr. Append any line starting with `Warning:` to `$PHASE_WARNINGS` for the final report.
+
+If the script exits non-zero, treat it as a hard failure: relay the full stderr to the user as a Phase 1.5 failure. Do not attempt to recover; the script's internal fallback (count-based) already handles recoverable issues. A non-zero exit means a fundamental problem (missing input file, malformed JSON, etc.).
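
The stderr handling described above can be sketched as follows. This is an illustration only, not part of the change: `collectWarnings` and the sample text are hypothetical, and the sketch assumes the captured stderr is available as a single string.

```javascript
// Hypothetical helper (not in the PR): keep only the lines that Phase 1.5
// asks the caller to append to $PHASE_WARNINGS.
function collectWarnings(stderrText) {
  return stderrText
    .split(/\r?\n/)
    .filter(line => line.startsWith('Warning:'));
}

// Made-up stderr capture, for illustration only.
const sampleStderr = [
  'Loaded 10 files (6 code).',
  'Warning: compute-batches: example warning line',
  'Info: compute-batches: example info line',
].join('\n');

console.log(collectWarnings(sampleStderr));
```

Only the `Warning:` line survives the filter; `Info:` lines and progress messages stay out of the final report.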
+
+---
+
 ## Phase 2 — ANALYZE

 ### Full analysis path

-Batch the file list from Phase 1 into groups of **20-30 files each** (aim for ~25 files per batch for balanced sizes).
+Load `.understand-anything/intermediate/batches.json` (produced by Phase 1.5). Iterate the `batches[]` array.

-**Batching strategy for non-code files:**
- Group related non-code files together in the same batch when possible:
-  - Dockerfile + docker-compose.yml + .dockerignore → same batch
-  - SQL migration files → same batch (ordered by filename)
-  - CI/CD config files (.github/workflows/*) → same batch
-  - Documentation files (docs/*.md) → same batch
- This allows the file-analyzer to create cross-file edges (e.g., docker-compose `depends_on` Dockerfile)
- Non-code files can be mixed with code files in the same batch if batch sizes are small
- Each file's `fileCategory` from Phase 1 must be included in the batch file list
+Report: `[Phase 2/7] Analyzing files — <totalFiles> files in <totalBatches> batches (up to 5 concurrent)...`

-After batching, report the plan to the user:
-> `[Phase 2/7] Analyzing files — <totalFiles> files in <totalBatches> batches (up to 5 concurrent)...`
-
-For each batch, dispatch a subagent using the `file-analyzer` agent definition (at `agents/file-analyzer.md`). Run up to **5 subagents concurrently** using parallel dispatch. Append the following additional context:
+For each batch, dispatch a subagent using the `file-analyzer` agent definition (at `agents/file-analyzer.md`). Run up to **5 subagents concurrently**. Append the following additional context:

 > **Additional context from main session:**
 >
@@ -303,14 +309,7 @@ For each batch, dispatch a subagent using the `file-analyzer` agent definition (
 >
 > $LANGUAGE_DIRECTIVE

-Before dispatching each batch, construct `batchImportData` from `$IMPORT_MAP`:
-```json
-batchImportData = {}
-for each file in this batch:
-  batchImportData[file.path] = $IMPORT_MAP[file.path] ?? []
-```
-
-Fill in batch-specific parameters below and dispatch:
+Dispatch prompt template (fill in batch-specific values from `batches.json[i]`):

 > Analyze these files and produce GraphNode and GraphEdge objects.
 > Project root: `$PROJECT_ROOT`
@@ -318,11 +317,16 @@ Fill in batch-specific parameters below and dispatch:
 > Languages: `<languages>`
 > Batch: `<batchIndex>/<totalBatches>`
 > Skill directory (for bundled scripts): `<SKILL_DIR>`
-> Write output to: `$PROJECT_ROOT/.understand-anything/intermediate/batch-<batchIndex>.json`
+> Output: write to `$PROJECT_ROOT/.understand-anything/intermediate/batch-<batchIndex>.json` (single-file mode) OR `batch-<batchIndex>-part-<k>.json` (split mode, per Step B of your output protocol).
 >
-> Pre-resolved import data for this batch (use this for all import edge creation — do NOT re-resolve imports from source):
+> Pre-resolved import data for this batch (use directly — do NOT re-resolve imports from source):
 > ```json
-> <batchImportData JSON>
+> <batchImportData JSON from batches.json[i].batchImportData>
+> ```
+>
+> Cross-batch neighbors with their exported symbols (confidence boost for cross-batch edges):
+> ```json
+> <neighborMap JSON from batches.json[i].neighborMap>
 > ```
 >
 > Files to analyze in this batch (every entry MUST be passed through to `batchFiles` with all four fields — `path`, `language`, `sizeLines`, `fileCategory`):
@@ -330,6 +334,8 @@ Fill in batch-specific parameters below and dispatch:
 > 2. `<path>` (<sizeLines> lines, language: `<language>`, fileCategory: `<fileCategory>`)
 > ...

+**Output naming is per-batchIndex — no fusion.** If you fuse multiple small batches into a single file-analyzer dispatch for token efficiency, the dispatched agent must STILL write one output file per original `batchIndex` using `batch-<batchIndex>.json` or `batch-<batchIndex>-part-<k>.json`. The merge script's regex (`batch-(\d+)(?:-part-(\d+))?\.json`) silently drops any other naming (e.g., `batch-fused-8-13.json`, `batch-8-13.json`), losing every node and edge in that file. After each dispatch returns, verify each `batchIndex` in the dispatched input has a corresponding `batch-<batchIndex>.json` (or `batch-<batchIndex>-part-*.json`) on disk before proceeding to the next dispatch.
+
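
The per-`batchIndex` check described above might look like the sketch below. It is an illustration under assumptions, not code from this change: `missingBatchIndices` is a hypothetical helper, the file names are passed in as plain strings rather than read from disk, and the pattern is anchored at both ends, which is slightly stricter than the pattern quoted from the merge script.

```javascript
// Pattern quoted in the paragraph above, anchored at both ends for this sketch.
const BATCH_FILE_RE = /^batch-(\d+)(?:-part-(\d+))?\.json$/;

// Hypothetical helper: returns the expected batch indices that have no
// recognizable output file (neither single-file nor split mode).
function missingBatchIndices(expectedIndices, fileNames) {
  const seen = new Set();
  for (const name of fileNames) {
    const m = BATCH_FILE_RE.exec(name);
    if (m) seen.add(Number(m[1]));
  }
  return expectedIndices.filter(i => !seen.has(i));
}

console.log(missingBatchIndices(
  [8, 9, 10],
  ['batch-8.json', 'batch-9-part-1.json', 'batch-9-part-2.json', 'batch-fused-8-13.json'],
));
```

With these inputs, batch 10 is reported as missing: the fused file name does not match the pattern, so it contributes nothing.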
 After ALL batches complete, report to the user: `Phase 2 complete. All <totalBatches> batches analyzed.`

 Run the merge-and-normalize script bundled with this skill (located next to this SKILL.md file — use the skill directory path, not the project root):
@@ -337,7 +343,7 @@ Run the merge-and-normalize script bundled with this skill (located next to this
 python <SKILL_DIR>/merge-batch-graphs.py $PROJECT_ROOT
 ```

-This script reads all `batch-*.json` files from `$PROJECT_ROOT/.understand-anything/intermediate/`, then in one pass:
+This script reads all `batch-*.json` files (including `batch-<i>-part-<k>.json` produced by file-analyzers that split their output) from `$PROJECT_ROOT/.understand-anything/intermediate/`, then in one pass:
 - Combines all nodes and edges across batches
 - Normalizes node IDs (strips double prefixes, project-name prefixes, adds missing prefixes)
 - Normalizes complexity values (`low`→`simple`, `medium`→`moderate`, `high`→`complex`, etc.)
@@ -346,7 +352,7 @@ This script reads all `batch-*.json` files from `$PROJECT_ROOT/.understand-anyth
 - Drops dangling edges referencing missing nodes
 - Logs all corrections and dropped items to stderr

-The merge script also runs a `tested_by` linker that canonicalizes test-coverage edges in two passes. **Pass 1** walks LLM-emitted `tested_by` edges and flips inverted ones in place (the LLM systematically emits `test → production` because it sees the import only when analyzing the test file); semantically broken edges (test↔test, prod↔prod, orphan endpoints) are dropped. **Pass 2** supplements with path-convention pairings (`X.ts` ↔ `X.test.ts`, JS/TS `__tests__/` and `<dir>/test/` walk-out, Python in-package `tests/`, Go `_test.go` sibling, Maven/Gradle `src/test/...` ↔ `src/main/...`, .NET `<svc>/tests/` ↔ `<svc>/src/...` and `<App>.Tests/` ↔ `<App>/`). Production nodes that end up sourcing any `tested_by` edge get a `"tested"` tag. All resulting edges run `production → test`.
+The merge script also runs a `tested_by` linker that canonicalizes test-coverage edges in two passes. **Pass 1** walks LLM-emitted `tested_by` edges and flips inverted ones in place; semantically broken edges (test↔test, prod↔prod, orphan endpoints) are dropped. **Pass 2** supplements with path-convention pairings. Production nodes that end up sourcing any `tested_by` edge get a `"tested"` tag. All resulting edges run `production → test`.

 Output: `$PROJECT_ROOT/.understand-anything/intermediate/assembled-graph.json`

@@ -354,7 +360,20 @@ Include the script's warnings in `$PHASE_WARNINGS` for the reviewer.

 ### Incremental update path

-Use the changed files list from Phase 0. Batch and dispatch file-analyzer subagents using the same process as above (20-30 files per batch, up to 5 concurrent, with batchImportData constructed from $IMPORT_MAP), but only for changed files.
+Write the changed-files list (one path per line) to a temp file:
+```bash
+git diff <lastCommitHash>..HEAD --name-only > $PROJECT_ROOT/.understand-anything/tmp/changed-files.txt
+```
+
+Run compute-batches with `--changed-files`:
+```bash
+node <SKILL_DIR>/compute-batches.mjs $PROJECT_ROOT \
+  --changed-files=$PROJECT_ROOT/.understand-anything/tmp/changed-files.txt
+```
+
+This produces a `batches.json` that contains only the batches with changed files. Their neighborMap entries still reference unchanged files (with their full-graph batchIndex), so cross-batch edges can still be emitted.
+
+Then dispatch file-analyzer subagents per the same template as the full path.
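
A minimal sketch of the filtering behavior described above, assuming batches shaped like the entries in `batches.json`. The helper name and the sample paths are hypothetical. Kept batches retain their original `batchIndex` rather than being renumbered.

```javascript
// Hypothetical helper: keep only batches that contain at least one changed
// file, leaving batchIndex untouched.
function filterChangedBatches(batches, changedPaths) {
  const changed = new Set(changedPaths);
  return batches.filter(b => b.files.some(f => changed.has(f.path)));
}

// Made-up, trimmed-down batches for illustration.
const sampleBatches = [
  { batchIndex: 1, files: [{ path: 'src/a.ts' }, { path: 'src/b.ts' }] },
  { batchIndex: 2, files: [{ path: 'src/c.ts' }] },
  { batchIndex: 3, files: [{ path: 'docs/readme.md' }] },
];

const kept = filterChangedBatches(sampleBatches, ['src/c.ts', 'docs/readme.md']);
console.log(kept.map(b => b.batchIndex));
```

Batches 2 and 3 are kept under their original indices, so a neighborMap entry that points at batch 1 still resolves against the full-graph numbering.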

 After batches complete:
 1. Remove old nodes whose `filePath` matches any changed file from the existing graph
@@ -0,0 +1,555 @@
+#!/usr/bin/env node
+/**
+ * compute-batches.mjs — Phase 1.5 of /understand
+ *
+ * Reads scan-result.json, runs Louvain community detection on the import
+ * graph, and writes batches.json containing batches + neighborMap.
+ *
+ * Usage:
+ *   node compute-batches.mjs <project-root> [--changed-files=<path>]
+ *
+ * Input:  <project-root>/.understand-anything/intermediate/scan-result.json
+ * Output: <project-root>/.understand-anything/intermediate/batches.json
+ */
+
+import { readFileSync, writeFileSync, existsSync, realpathSync } from 'node:fs';
+import { dirname, join, resolve } from 'node:path';
+import { fileURLToPath, pathToFileURL } from 'node:url';
+import { createRequire } from 'node:module';
+
+const __filename = fileURLToPath(import.meta.url);
+const PLUGIN_ROOT = resolve(dirname(__filename), '../..');
+const require = createRequire(resolve(PLUGIN_ROOT, 'package.json'));
+
+let core;
+try {
+  core = await import(pathToFileURL(require.resolve('@understand-anything/core')).href);
+} catch {
+  core = await import(pathToFileURL(resolve(PLUGIN_ROOT, 'packages/core/dist/index.js')).href);
+}
+const { TreeSitterPlugin, PluginRegistry, builtinLanguageConfigs, registerAllParsers } = core;
+
+import Graph from 'graphology';
+import louvain from 'graphology-communities-louvain';
+
+/**
+ * For each code file, returns its top-level exported symbol names (functions,
+ * classes, exported consts). Per-file errors are swallowed into [] with a
+ * visible warning so a single bad file does not abort batching.
+ *
+ * Returns Map<path, string[]>.
+ */
+async function extractExports(projectRoot, codeFiles) {
+  let registry;
+  try {
+    const tsConfigs = builtinLanguageConfigs.filter(c => c.treeSitter);
+    const tsPlugin = new TreeSitterPlugin(tsConfigs);
+    await tsPlugin.init();
+    registry = new PluginRegistry();
+    registry.register(tsPlugin);
+    registerAllParsers(registry);
+  } catch (err) {
+    process.stderr.write(
+      `Warning: compute-batches: tree-sitter init failed (${err.message}) ` +
+      `— all symbols=[] in neighborMap — cross-batch edges limited to file-level\n`,
+    );
+    return new Map(codeFiles.map(f => [f.path, []]));
+  }
+
+  const exportsByPath = new Map();
+  for (const file of codeFiles) {
+    const abs = join(projectRoot, file.path);
+    let content;
+    try {
+      content = readFileSync(abs, 'utf-8');
+    } catch (err) {
+      process.stderr.write(
+        `Warning: compute-batches: exports extraction failed for ${file.path} ` +
+        `(read error: ${err.message}) — symbols=[] in neighborMap — ` +
+        `cross-batch edges to this file limited to file-level\n`,
+      );
+      exportsByPath.set(file.path, []);
+      continue;
+    }
+    try {
+      const analysis = registry.analyzeFile(file.path, content);
+      const names = (analysis?.exports || []).map(e => e.name).filter(Boolean);
+      exportsByPath.set(file.path, names);
+    } catch (err) {
+      process.stderr.write(
+        `Warning: compute-batches: exports extraction failed for ${file.path} ` +
+        `(analyze error: ${err.message}) — symbols=[] in neighborMap — ` +
+        `cross-batch edges to this file limited to file-level\n`,
+      );
+      exportsByPath.set(file.path, []);
+    }
+  }
+  return exportsByPath;
+}
+
+/**
+ * Build batches for non-code files per Groups A-E in the design spec.
+ * Returns Array<{ files: FileMeta[], mergeable: boolean }> — caller assigns
+ * batchIndex. `mergeable=false` for semantic Groups A-D (Dockerfile clusters,
+ * .github/workflows, .gitlab-ci/.circleci, SQL migrations) preserves their
+ * boundary intent across the merge-small pass; Group E (catch-all parent-dir
+ * grouping) is `mergeable=true` so its tiny singletons can be pooled.
+ */
+function buildNonCodeBatches(nonCodeFiles) {
+  const byPath = new Map(nonCodeFiles.map(f => [f.path, f]));
+  const consumed = new Set();
+  const groups = [];
+
+  const dirOf = p => p.includes('/') ? p.slice(0, p.lastIndexOf('/')) : '';
+  const baseOf = p => p.includes('/') ? p.slice(p.lastIndexOf('/') + 1) : p;
+
+  // Group A: per-directory Dockerfile clusters.
+  const dirsWithDockerfile = new Set(
+    [...byPath.keys()]
+      .filter(p => baseOf(p) === 'Dockerfile')
+      .map(dirOf),
+  );
+  for (const dir of [...dirsWithDockerfile].sort()) {
+    const inDir = [...byPath.keys()].filter(p => dirOf(p) === dir);
+    const cluster = inDir.filter(p => {
+      const b = baseOf(p);
+      return b === 'Dockerfile'
+        || b === '.dockerignore'
+        || b.startsWith('docker-compose.');
+    });
+    if (cluster.length) {
+      groups.push({ files: cluster.map(p => byPath.get(p)), mergeable: false });
+      cluster.forEach(p => consumed.add(p));
+    }
+  }
+
+  // Group B: .github/workflows/*
+  const ghWorkflows = [...byPath.keys()].filter(
+    p => p.startsWith('.github/workflows/') && (p.endsWith('.yml') || p.endsWith('.yaml')),
+  ).filter(p => !consumed.has(p));
+  if (ghWorkflows.length) {
+    groups.push({ files: ghWorkflows.map(p => byPath.get(p)), mergeable: false });
+    ghWorkflows.forEach(p => consumed.add(p));
+  }
+
+  // Group C: .gitlab-ci.yml + .circleci/*
+  const ciFiles = [...byPath.keys()].filter(
+    p => (p === '.gitlab-ci.yml' || p.startsWith('.circleci/'))
+      && !consumed.has(p),
+  );
+  if (ciFiles.length) {
+    groups.push({ files: ciFiles.map(p => byPath.get(p)), mergeable: false });
+    ciFiles.forEach(p => consumed.add(p));
+  }
+
+  // Group D: SQL migrations per migrations/ or migration/ directory.
+  // Defensive consumed.has check: no upstream group consumes SQL today, but
+  // future Group additions could; keep the check for forward-compat.
+  const migrationDirs = new Set(
+    [...byPath.keys()]
+      .filter(p => p.endsWith('.sql'))
+      .map(dirOf)
+      .filter(d => /(^|\/)migrations?$/.test(d)),
+  );
+  for (const dir of migrationDirs) {
+    const sqls = [...byPath.keys()]
+      .filter(p => dirOf(p) === dir && p.endsWith('.sql') && !consumed.has(p))
+      .sort();
+    if (sqls.length) {
+      groups.push({ files: sqls.map(p => byPath.get(p)), mergeable: false });
+      sqls.forEach(p => consumed.add(p));
+    }
+  }
+
+  // Group E: all remaining grouped by immediate parent dir, max 20 per batch
+  const remainingByDir = new Map();
+  for (const p of [...byPath.keys()].sort()) {
+    if (consumed.has(p)) continue;
+    const dir = dirOf(p);
+    if (!remainingByDir.has(dir)) remainingByDir.set(dir, []);
+    remainingByDir.get(dir).push(p);
+  }
+  // Per design spec: max files per parent-dir batch for Group E.
+  const MAX_E = 20;
+  for (const [, paths] of remainingByDir) {
+    for (let i = 0; i < paths.length; i += MAX_E) {
+      const slice = paths.slice(i, i + MAX_E);
+      groups.push({ files: slice.map(p => byPath.get(p)), mergeable: true });
+    }
+  }
+
+  return groups;
+}
+
+/**
+ * Build a lookup map from file path → batchIndex across all batches (code +
+ * non-code). Used to resolve cross-batch neighbor references in neighborMap.
+ */
+function buildBatchOfMap(allBatches) {
+  const m = new Map();
+  for (const b of allBatches) {
+    for (const f of b.files) m.set(f.path, b.batchIndex);
+  }
+  return m;
+}
+
+/**
+ * Returns Map<path, communityId> via Louvain. May throw — caller must catch
+ * and fall back if it does. Honors UA_COMPUTE_BATCHES_FORCE_LOUVAIN_THROW=1
+ * to allow tests to exercise the fallback path.
+ */
+function runLouvain(codeFiles, importMap) {
+  if (process.env.UA_COMPUTE_BATCHES_FORCE_LOUVAIN_THROW === '1') {
+    throw new Error('forced throw via UA_COMPUTE_BATCHES_FORCE_LOUVAIN_THROW');
+  }
+  const g = new Graph({ type: 'undirected', allowSelfLoops: false });
+  for (const f of codeFiles) g.addNode(f.path);
+  for (const [src, targets] of Object.entries(importMap)) {
+    if (!g.hasNode(src)) continue;
+    for (const tgt of targets) {
+      if (!g.hasNode(tgt) || src === tgt || g.hasEdge(src, tgt)) continue;
+      g.addEdge(src, tgt);
+    }
+  }
+  const cs = louvain(g);  // { nodeId: communityId }
+  return new Map(Object.entries(cs));
+}
+
+/**
+ * Returns Map<path, communityId> via alphabetical chunking of `batchSize`
+ * files per batch. Deterministic, used as fallback when Louvain fails.
+ */
+function countBasedAssignment(codeFiles, batchSize = 12) {
+  const out = new Map();
+  const sorted = [...codeFiles].map(f => f.path).sort();
+  for (let i = 0; i < sorted.length; i++) {
+    out.set(sorted[i], `count_${Math.floor(i / batchSize)}`);
+  }
+  return out;
+}
+
+/**
+ * Pool small mergeable batches into "misc" batches to reduce dispatch overhead.
+ * Preserves semantic groupings (non-code Groups A-D, marked `mergeable=false`)
+ * regardless of size; only merges code Louvain singletons / orphans and
+ * Group E parent-dir batches that fall below MIN_BATCH_SIZE.
+ *
+ * On a 314-file microservices-demo run, vanilla Louvain produced 87 singleton
+ * communities → 87 dispatch tasks of size 1. This pass collapses them into
+ * ceil(pooledFileCount / MAX_MERGE_TARGET) misc batches, drastically cutting
+ * orchestration overhead while leaving the high-modularity communities untouched.
+ *
+ * Returns the rewritten batch list with reassigned batchIndex (1-based,
+ * keepers first preserving their relative order, misc batches appended).
+ */
+function mergeSmallBatches(bareBatches) {
+  // MIN_BATCH_SIZE=3: below this, file-analyzer dispatch overhead (subagent
+  // spin-up, prompt setup) dwarfs the per-file analysis cost — not worth a
+  // standalone batch.
+  const MIN_BATCH_SIZE = 3;
+  // MAX_MERGE_TARGET=25: stays below MAX_COMMUNITY_SIZE=35 so the misc-batch
+  // agent retains headroom for neighborMap context without overflowing.
+  const MAX_MERGE_TARGET = 25;
+
+  const keepers = [];
+  const smallMergeable = [];
+  for (const b of bareBatches) {
+    if (b.mergeable && b.files.length < MIN_BATCH_SIZE) {
+      smallMergeable.push(b);
+    } else {
+      keepers.push(b);
+    }
+  }
+
+  if (smallMergeable.length === 0) {
+    // Nothing to merge — strip mergeable flag and renumber for cleanliness.
+    return keepers.map((b, i) => ({
+      batchIndex: i + 1,
+      files: b.files,
+    }));
+  }
+
+  // Pool and sort deterministically by path so repeated runs match byte-for-byte.
+  const pooledFiles = smallMergeable
+    .flatMap(b => b.files)
+    .sort((a, b) => a.path.localeCompare(b.path));
+
+  const miscBatches = [];
+  for (let i = 0; i < pooledFiles.length; i += MAX_MERGE_TARGET) {
+    miscBatches.push({ files: pooledFiles.slice(i, i + MAX_MERGE_TARGET) });
+  }
+
+  // Use `Info:` rather than `Warning:` — singleton consolidation is a
+  // routine optimization, not a fallback/degrade path. Per
+  // [[feedback_visible_warnings]] only fallbacks should bubble as Warning:
+  // to the Phase 7 final report. Real warnings would get drowned out if
+  // every normal Louvain run with singletons (i.e. almost every run) added
+  // a Warning: line.
+  process.stderr.write(
+    `Info: compute-batches: merged ${smallMergeable.length} small batches ` +
+    `(${pooledFiles.length} files) into ${miscBatches.length} misc batches ` +
+    `— singletons and orphans consolidated\n`,
+  );
+
+  const final = [...keepers, ...miscBatches];
+  return final.map((b, i) => ({
+    batchIndex: i + 1,
+    files: b.files,
+  }));
+}
+
+// ── Main: load → Louvain (or count-fallback) → enrich → write batches.json ─
+async function main() {
+  const projectRoot = process.argv[2];
+  if (!projectRoot) {
+    process.stderr.write('Usage: node compute-batches.mjs <project-root> [--changed-files=<path>]\n');
+    process.exit(1);
+  }
+
+  let changedFiles = null;
+  for (const arg of process.argv.slice(3)) {
+    const m = arg.match(/^--changed-files=(.+)$/);
+    if (m) {
+      const p = m[1];
+      let content;
+      try {
+        content = readFileSync(p, 'utf-8');
+      } catch (err) {
+        process.stderr.write(
+          `Error: compute-batches: --changed-files path not readable: ${p} (${err.message})\n`,
+        );
+        process.exit(1);
+      }
+      const lines = content
+        .split('\n')
+        .map(s => s.trim())
+        .filter(Boolean);
+      changedFiles = new Set(lines);
+    }
+  }
+
+  const scanPath = join(projectRoot, '.understand-anything', 'intermediate', 'scan-result.json');
+  if (!existsSync(scanPath)) {
+    process.stderr.write(`Error: scan-result.json not found at ${scanPath}\n`);
+    process.exit(1);
+  }
+
+  const scan = JSON.parse(readFileSync(scanPath, 'utf-8'));
+  const files = scan.files || [];
+  const codeFiles = files.filter(f => f.fileCategory === 'code');
+  const nonCodeFiles = files.filter(f => f.fileCategory !== 'code');
+  const importMap = scan.importMap || {};
+
+  process.stderr.write(`Loaded ${files.length} files (${codeFiles.length} code).\n`);
+
+  const exportsByPath = await extractExports(projectRoot, codeFiles);
+
+  let algorithm = 'louvain';
+  let perFileCommunity;
+  try {
+    perFileCommunity = runLouvain(codeFiles, importMap);
+  } catch (err) {
+    process.stderr.write(
+      `Warning: compute-batches: Louvain failed (${err.message}) ` +
+      `— falling back to count-based grouping (12 files/batch) ` +
+      `— module semantic boundaries lost\n`,
+    );
+    perFileCommunity = countBasedAssignment(codeFiles, 12);
+    algorithm = 'count-fallback';
+  }
+
+  // Group files by community id
+  const filesByCommunity = new Map();
+  for (const [path, cid] of perFileCommunity) {
+    if (!filesByCommunity.has(cid)) filesByCommunity.set(cid, []);
+    filesByCommunity.get(cid).push(path);
+  }
+
+  // Size enforcement only on louvain output. count-fallback already chunked.
+  const MAX_COMMUNITY_SIZE = 35;
+  const splitCommunities = new Map();
+  let nextSyntheticId = 0;
+  if (algorithm === 'louvain') {
+    for (const [cid, paths] of filesByCommunity) {
+      if (paths.length <= MAX_COMMUNITY_SIZE) {
+        splitCommunities.set(cid, paths);
+        continue;
+      }
+      process.stderr.write(
+        `Warning: compute-batches: community size ${paths.length} > max ${MAX_COMMUNITY_SIZE} ` +
+        `— splitting via alphabetical chunking — modularity may decrease\n`,
+      );
+      const sorted = [...paths].sort();
+      const parts = Math.ceil(paths.length / MAX_COMMUNITY_SIZE);
+      const perPart = Math.ceil(paths.length / parts);
+      for (let i = 0; i < parts; i++) {
+        const slice = sorted.slice(i * perPart, (i + 1) * perPart);
+        const synthId = `__split_${cid}_${nextSyntheticId++}`;
+        splitCommunities.set(synthId, slice);
+      }
+    }
+  } else {
+    for (const [cid, paths] of filesByCommunity) splitCommunities.set(cid, paths);
+  }
+
+  // Sort communities by size desc, then by min-path asc for determinism
+  const sortedCommunities = [...splitCommunities.entries()]
+    .sort((a, b) => {
+      if (b[1].length !== a[1].length) return b[1].length - a[1].length;
+      const minA = [...a[1]].sort()[0];
+      const minB = [...b[1]].sort()[0];
+      return minA.localeCompare(minB);
+    });
+
+  // Build per-batch file list with full file metadata from scan
+  const fileMetaByPath = new Map(files.map(f => [f.path, f]));
+  // Safe: every path in a community comes from codeFiles (added as graph nodes
+  // in runLouvain, or chunked in countBasedAssignment), and codeFiles is a
+  // subset of files, so fileMetaByPath.get() never returns undefined here.
+
+  // First-pass: assemble bare batches (no batchImportData/neighborMap yet).
+  // All Louvain communities are mergeable=true so the merge-small pass can
+  // collapse singletons / 2-file orphans. Non-code groups carry per-group
+  // mergeable flags from buildNonCodeBatches (false for semantic Groups A-D,
+  // true for Group E catch-all).
+  const codeBatchObjsBare = sortedCommunities.map(([, paths], idx) => ({
+    batchIndex: idx + 1,
+    files: paths.sort().map(p => fileMetaByPath.get(p)),
+    mergeable: true,
+  }));
+  const nonCodeGroups = buildNonCodeBatches(nonCodeFiles);
+  const nonCodeBatchObjsBare = nonCodeGroups.map((g, i) => ({
+    batchIndex: codeBatchObjsBare.length + i + 1,
+    files: g.files,
+    mergeable: g.mergeable,
+  }));
+  const bareBatches = [...codeBatchObjsBare, ...nonCodeBatchObjsBare];
+  const mergedBareBatches = mergeSmallBatches(bareBatches);
+  const batchOf = buildBatchOfMap(mergedBareBatches);
+
+  // Build reverse import map: target → [sources that import target]
+  const reverseImportMap = new Map();
+  for (const [src, targets] of Object.entries(importMap)) {
+    for (const tgt of targets) {
+      if (!reverseImportMap.has(tgt)) reverseImportMap.set(tgt, []);
+      reverseImportMap.get(tgt).push(src);
+    }
+  }
+
+  // Compute neighbor degree (number of import relations) per path, used to rank
+  // neighbors when a file's raw 1-hop neighbor count exceeds MAX_NEIGHBORS.
+  const NEIGHBOR_DEGREE = new Map();
+  for (const f of codeFiles) {
+    const outDeg = (importMap[f.path] || []).length;
+    const inDeg = (reverseImportMap.get(f.path) || []).length;
+    NEIGHBOR_DEGREE.set(f.path, outDeg + inDeg);
+  }
+
+  const MAX_NEIGHBORS = 50;
+
+  // Second-pass: enrich each batch with batchImportData + neighborMap
+  const batches = mergedBareBatches.map(b => {
+    const batchPaths = new Set(b.files.map(f => f.path));
+    const batchImportData = {};
+    const neighborMap = {};
+    for (const f of b.files) {
+      batchImportData[f.path] = (importMap[f.path] || []).slice();
+
+      // 1-hop neighbors: imports out + imported-by in, excluding same batch.
+      // Note on truncation: we measure "popularity" by total raw 1-hop neighbor
+      // count (rawCount), not kept.length. A widely-imported hub like a logger
+      // module may have N>50 inbound imports but, after Louvain + size
+      // enforcement, only some land in other batches — kept.length can be < 50
+      // while the file is still a high-degree hub whose missing relationships
+      // matter for downstream cross-batch edge confidence. Warning on rawCount
+      // surfaces this; truncation on kept ensures the JSON stays bounded.
+      const outNeighbors = importMap[f.path] || [];
+      const inNeighbors = reverseImportMap.get(f.path) || [];
+      const all = new Set([...outNeighbors, ...inNeighbors]);
+      const rawCount = all.size;
+      const filtered = [...all].filter(p => batchOf.has(p) && !batchPaths.has(p));
+
+      let kept = filtered.map(p => ({
+        path: p,
+        batchIndex: batchOf.get(p),
+        symbols: exportsByPath.get(p) || [],
+      }));
+
+      if (rawCount > MAX_NEIGHBORS) {
+        kept.sort((a, b2) => (NEIGHBOR_DEGREE.get(b2.path) || 0)
+                            - (NEIGHBOR_DEGREE.get(a.path) || 0)
+                            || a.path.localeCompare(b2.path));  // deterministic tiebreak
+        const beforeSlice = kept.length;
+        kept = kept.slice(0, MAX_NEIGHBORS);
+        process.stderr.write(
+          `Warning: compute-batches: neighborMap for ${f.path} has high 1-hop degree ${rawCount} ` +
+          `— exceeds soft cap of ${MAX_NEIGHBORS} — keeping top ${kept.length} cross-batch entries ` +
+          `(${beforeSlice - kept.length} dropped by degree sort)\n`,
+        );
+      }
+
+      if (kept.length) neighborMap[f.path] = kept;
+    }
+    return { batchIndex: b.batchIndex, files: b.files, batchImportData, neighborMap };
+  });
+
+  let finalBatches = batches;
+  if (changedFiles) {
+    finalBatches = batches.filter(b => b.files.some(f => changedFiles.has(f.path)));
+    // batchIndex on filtered batches retains the full-graph assignment
+    // (the design says neighborMap should still reference unchanged files'
+    // full-graph batchIndex). No renumbering.
+  }
+
+  // Note: under --changed-files mode, totalFiles is the FULL project file
+  // count (unchanged from the input scan) while totalBatches reflects only
+  // the filtered set written to disk. batchIndex values on the kept batches
+  // preserve the full-graph assignment so neighborMap references resolve.
+  const output = {
+    schemaVersion: 1,
+    algorithm,
+    totalFiles: files.length,
+    totalBatches: finalBatches.length,
+    exportsByPath: Object.fromEntries(exportsByPath),
+    batches: finalBatches,
+  };
+
+  const outPath = join(projectRoot, '.understand-anything', 'intermediate', 'batches.json');
+  writeFileSync(outPath, JSON.stringify(output, null, 2), 'utf-8');
+  const batchSizes = finalBatches.map(b => b.files.length);
+  const maxSize = batchSizes.length ? Math.max(...batchSizes) : 0;
+  const minSize = batchSizes.length ? Math.min(...batchSizes) : 0;
+  process.stderr.write(
+    `Wrote ${finalBatches.length} batches (sizes: max=${maxSize}, min=${minSize}) to ${outPath}\n`,
+  );
+}
+
+// ---------------------------------------------------------------------------
+// Run only when executed directly as a CLI; importing the module (e.g. from
+// tests) must not trigger main().
+//
+// Canonicalize both sides through realpathSync. Node ESM resolves
+// import.meta.url through symlinks but pathToFileURL(process.argv[1]) preserves
+// them, so a raw equality check silently no-ops when the script is invoked via
+// a symlinked plugin install path (the default in Claude Code / Copilot CLI
+// caches). See GitHub issue #162.
+// ---------------------------------------------------------------------------
+function isCliEntry() {
+  if (!process.argv[1]) return false;
+  try {
+    const modulePath = realpathSync(fileURLToPath(import.meta.url));
+    const argvPath = realpathSync(process.argv[1]);
+    return modulePath === argvPath;
+  } catch {
+    return false;
+  }
+}
+
+if (isCliEntry()) {
+  try {
+    await main();
+  } catch (err) {
+    process.stderr.write(`compute-batches.mjs failed: ${err.message}\n${err.stack}\n`);
+    process.exit(1);
+  }
+}
@@ -1023,11 +1023,74 @@ def main() -> None:
        print("Error: no batch-*.json files found in intermediate/", file=sys.stderr)
        sys.exit(1)

-    print(f"Found {len(batch_files)} batch files:", file=sys.stderr)
+    # Group by logical batch index so the report distinguishes single-batch
+    # files from multi-part file-analyzer outputs. Files that don't match the
+    # `batch-<N>.json` / `batch-<N>-part-<K>.json` pattern (e.g. fused
+    # `batch-fused-8-13.json`, range `batch-8-13.json`) would otherwise be
+    # silently dropped during load — flag them loudly instead so the user
+    # can fix the file-analyzer agent.
+    from collections import defaultdict as _dd
+    by_batch = _dd(list)
+    unrecognized_batch_files: list[str] = []
+    for f in batch_files:
+        m = re.fullmatch(r"batch-(\d+)(?:-part-(\d+))?\.json", f.name)
+        if m:
+            by_batch[int(m.group(1))].append((f.name, int(m.group(2)) if m.group(2) else None))
+        else:
+            unrecognized_batch_files.append(f.name)

-    # Load batches
+    if unrecognized_batch_files:
+        preview = ", ".join(unrecognized_batch_files[:5])
+        suffix = (
+            f" (+{len(unrecognized_batch_files) - 5} more)"
+            if len(unrecognized_batch_files) > 5
+            else ""
+        )
+        print(
+            f"Warning: merge-batch-graphs: {len(unrecognized_batch_files)} "
+            f"batch file(s) with unrecognized filenames will be DROPPED — "
+            f"files: {preview}{suffix} — fix the file-analyzer agent to use "
+            f"only batch-<N>.json or batch-<N>-part-<K>.json patterns",
+            file=sys.stderr,
+        )
+
+    logical_count = len(by_batch)
+    multi_part = sum(1 for entries in by_batch.values() if len(entries) > 1)
+    print(
+        f"Found {len(batch_files)} batch files "
+        f"({logical_count} logical batches, {multi_part} multi-part):",
+        file=sys.stderr,
+    )
+
+    # Missing-part detection: for any logical batch that has part files, the
+    # part numbers MUST be contiguous from 1 up to the highest part present
+    # (a missing trailing part cannot be detected). Gaps suggest a truncated
+    # write, so warn on stderr and collect into `missing_part_warnings` so the
+    # gaps also surface in the final phase report, not just the load output.
+    missing_part_warnings: list[str] = []
+    for idx, entries in by_batch.items():
+        part_nums = [p for (_n, p) in entries if p is not None]
+        if not part_nums:
+            continue
+        present = set(part_nums)
+        expected = set(range(1, max(part_nums) + 1))
+        missing = sorted(expected - present)
+        if missing:
+            msg = (
+                f"batch {idx} has parts {sorted(present)} but "
+                f"missing part(s) {missing} — possible truncated write — "
+                f"affected nodes/edges may be lost"
+            )
+            print(f"Warning: merge: {msg}", file=sys.stderr)
+            missing_part_warnings.append(msg)
+
+    # Load batches — skip unrecognized filenames so they don't pollute the
+    # merged graph with content the agent labeled incorrectly.
+    unrecognized_set = set(unrecognized_batch_files)
    batches: list[dict[str, Any]] = []
    for f in batch_files:
+        if f.name in unrecognized_set:
+            continue
        batch = load_batch(f)
        if batch is not None:
            batches.append(batch)
@@ -1042,6 +1105,38 @@ def main() -> None:
    # Merge and normalize
    assembled, report = merge_and_normalize(batches)

+    # Surface missing multi-part files to the phase report (parallel to
+    # unrecognized-filename handling below). Stderr lines emitted during
+    # batch discovery get buried under per-batch load output — re-emitting
+    # via the report list ensures the Phase 4 review and final summary see
+    # the data-loss signal.
+    if missing_part_warnings:
+        report.append("")
+        report.append(
+            f"Warning: {len(missing_part_warnings)} batch(es) with missing parts "
+            f"— some nodes/edges silently dropped:"
+        )
+        for w in missing_part_warnings:
+            report.append(f"  - {w}")
+
+    # Surface unrecognized-filename drops to the phase report so the
+    # downstream review step sees them, not just stderr.
+    if unrecognized_batch_files:
+        preview = ", ".join(unrecognized_batch_files[:5])
+        suffix = (
+            f" (+{len(unrecognized_batch_files) - 5} more)"
+            if len(unrecognized_batch_files) > 5
+            else ""
+        )
+        report.append("")
+        report.append(
+            f"Warning: dropped {len(unrecognized_batch_files)} batch file(s) "
+            f"with unrecognized filenames — files: {preview}{suffix} — "
+            f"fix the file-analyzer agent to use only batch-<N>.json or "
+            f"batch-<N>-part-<K>.json patterns (every node/edge in these "
+            f"files was excluded from the final graph)"
+        )
+
    # Recover any imports edges file-analyzer batches dropped despite
    # `batchImportData` containing them. The project-scanner's importMap
    # is the deterministic source of truth.
@@ -0,0 +1,794 @@
+#!/usr/bin/env node
+/**
+ * scan-project.mjs
+ *
+ * Deterministic file enumeration + language/category detection for the
+ * project-scanner agent. Replaces the LLM-written prose scanner that used to
+ * (a) author a per-run Node.js script (`tmp/ua-project-scan.js`), (b) walk the
+ * file tree, and (c) classify each file via lookup tables in LLM context — a
+ * pure rule-lookup pass that was being billed at LLM rates and adding many
+ * minutes of per-run latency on mid-sized monorepos.
+ *
+ * What the LLM still owns (Step A of project-scanner.md Phase 1):
+ *   - Reading README + top-level manifests to synthesize `name`,
+ *     `rawDescription`, `readmeHead`, `frameworks`, and the high-level
+ *     `languages` narrative.
+ *
+ * What this script owns:
+ *   - File enumeration (git ls-files preferred, recursive walk fallback)
+ *   - `.understandignore` filtering (delegated to core's createIgnoreFilter)
+ *   - Per-file language detection (extension + filename table)
+ *   - Per-file category assignment (priority-ordered rules from
+ *     project-scanner.md Step 4)
+ *   - Line counting
+ *   - Complexity estimation (project-scanner.md Step 7 thresholds)
+ *
+ * Usage:
+ *   node scan-project.mjs <projectRoot> <outputPath>
+ *
+ * Output JSON (subset of what project-scanner.md Phase 1 expects — the LLM
+ * agent merges this with Step A's narrative fields and Step C's importMap to
+ * produce the final scan-result.json):
+ *   {
+ *     "scriptCompleted": true,
+ *     "files": [{ "path": "...", "language": "...", "sizeLines": N, "fileCategory": "..." }, ...],
+ *     "totalFiles": N,
+ *     "filteredByIgnore": M,
+ *     "estimatedComplexity": "small" | "moderate" | "large" | "very-large",
+ *     "stats": { "filesScanned": N, "byCategory": {...}, "byLanguage": {...} }
+ *   }
+ *
+ * Logging: stderr only (stdout reserved for piped tooling).
+ * Per-file resilience: read/stat failures emit
+ *   `Warning: scan-project: <path> — <reason> — file skipped from output`
+ * to stderr and the file is dropped; the rest of the scan completes.
+ *
+ * Determinism: files are sorted by `path.localeCompare` before emission, and
+ * the underlying enumeration is deterministic (git ls-files returns a stable
+ * order; the fallback walker sorts each directory's entries).
+ */
+
+import { createRequire } from 'node:module';
+import { dirname, resolve, join, basename, extname, relative, sep } from 'node:path';
+import { fileURLToPath, pathToFileURL } from 'node:url';
+import {
+  existsSync,
+  readFileSync,
+  readdirSync,
+  realpathSync,
+  statSync,
+  writeFileSync,
+} from 'node:fs';
+import { spawnSync } from 'node:child_process';
+
+const __dirname = dirname(fileURLToPath(import.meta.url));
+// skills/understand/ -> plugin root is two dirs up
+const pluginRoot = resolve(__dirname, '../..');
+const require = createRequire(resolve(pluginRoot, 'package.json'));
+
+// ---------------------------------------------------------------------------
+// Resolve @understand-anything/core
+//
+// Two-step resolution: try the workspace-linked package first, fall back to
+// the installed plugin cache layout. pathToFileURL() is required on Windows
+// because dynamic import() of raw "C:\..." paths throws
+// ERR_UNSUPPORTED_ESM_URL_SCHEME (Node parses "C:" as a URL scheme).
+// ---------------------------------------------------------------------------
+let core;
+try {
+  core = await import(pathToFileURL(require.resolve('@understand-anything/core')).href);
+} catch {
+  core = await import(pathToFileURL(resolve(pluginRoot, 'packages/core/dist/index.js')).href);
+}
+
+const { createIgnoreFilter } = core;
+
+// ---------------------------------------------------------------------------
+// Language detection
+//
+// Mirrors the canonical extension list from
+// understand-anything-plugin/packages/core/src/languages/configs/* and the
+// project-scanner.md Step 3 table. Extensions are matched lowercase;
+// filenames (Dockerfile, Makefile, etc.) are matched case-sensitively because
+// real-world projects use the canonical capitalization.
+//
+// Where the core configs and project-scanner.md diverge (rare), project-
+// scanner.md wins because it is the user-facing contract.
+// ---------------------------------------------------------------------------
+
+/**
+ * Extension -> language id. Lowercase keys; looked up via `extname(path).toLowerCase()`.
+ * Includes the legacy Step-3 mapping (.cfg/.ini/.env -> `config`) — note
+ * that `config` is a language id here, not a category. Category routing
+ * for these extensions is handled separately in CATEGORY_BY_EXT.
+ */
+const LANGUAGE_BY_EXT = Object.freeze({
+  // TypeScript / JavaScript
+  '.ts': 'typescript',
+  '.tsx': 'typescript',
+  '.js': 'javascript',
+  '.jsx': 'javascript',
+  '.mjs': 'javascript',
+  '.cjs': 'javascript',
+  // Python
+  '.py': 'python',
+  '.pyi': 'python',
+  // Go / Rust / Java / Kotlin / C# / Swift / Lua
+  '.go': 'go',
+  '.rs': 'rust',
+  '.java': 'java',
+  '.kt': 'kotlin',
+  '.kts': 'kotlin',
+  '.cs': 'csharp',
+  '.swift': 'swift',
+  '.lua': 'lua',
+  // Ruby / PHP
+  '.rb': 'ruby',
+  '.rake': 'ruby',
+  '.php': 'php',
+  // C / C++
+  '.c': 'c',
+  '.h': 'c',
+  '.cpp': 'cpp',
+  '.cc': 'cpp',
+  '.cxx': 'cpp',
+  '.hpp': 'cpp',
+  '.hxx': 'cpp',
+  // Vue / Svelte (no tree-sitter extractor, but project-scanner contract
+  // lists them as code languages — downstream import map will return [])
+  '.vue': 'vue',
+  '.svelte': 'svelte',
+  // Shell / Batch / PowerShell
+  '.sh': 'shell',
+  '.bash': 'shell',
+  '.zsh': 'shell',
+  '.ps1': 'powershell',
+  '.psm1': 'powershell',
+  '.psd1': 'powershell',
+  '.bat': 'batch',
+  '.cmd': 'batch',
+  // Markup / docs
+  '.html': 'html',
+  '.htm': 'html',
+  '.css': 'css',
+  '.scss': 'css',
+  '.sass': 'css',
+  '.less': 'css',
+  '.md': 'markdown',
+  '.mdx': 'markdown',
+  '.rst': 'markdown',
+  // Config / data
+  '.yaml': 'yaml',
+  '.yml': 'yaml',
+  '.json': 'json',
+  '.jsonc': 'jsonc',
+  '.toml': 'toml',
+  '.xml': 'xml',
+  '.xsl': 'xml',
+  '.xsd': 'xml',
+  '.plist': 'xml',
+  '.cfg': 'config',
+  '.ini': 'config',
+  '.env': 'config',
+  // Data / schema
+  '.sql': 'sql',
+  '.graphql': 'graphql',
+  '.gql': 'graphql',
+  '.proto': 'protobuf',
+  '.prisma': 'prisma',
+  '.csv': 'csv',
+  '.tsv': 'csv',
+  // Infra
+  '.tf': 'terraform',
+  '.tfvars': 'terraform',
+  // JVM build files (categorized via filename-or-extension)
+  '.gradle': 'gradle',
+  // .NET project files (mapped to extension-derived ids; downstream
+  // treats them as config — see CATEGORY_BY_EXT)
+  '.csproj': 'csproj',
+  '.sln': 'sln',
+  '.properties': 'properties',
+  '.mod': 'mod',
+  '.sum': 'sum',
+});
+
+/**
+ * Filename (no extension) -> language id. Compared case-sensitively against
+ * basename(path). Includes the most common no-extension conventions; anything
+ * NOT in this table with no extension falls back to `unknown`.
+ *
+ * Dockerfile.* variants (Dockerfile.dev, Dockerfile.prod) are handled by a
+ * startsWith check in `detectLanguage()` so we don't have to enumerate every
+ * possible suffix.
+ */
+const LANGUAGE_BY_FILENAME = Object.freeze({
+  Dockerfile: 'dockerfile',
+  Makefile: 'makefile',
+  GNUmakefile: 'makefile',
+  makefile: 'makefile',
+  Jenkinsfile: 'jenkinsfile',
+  Procfile: 'procfile',
+  Vagrantfile: 'vagrantfile',
+});
+
+/**
+ * Detect the language of a file by its path. Lowercase extension lookup,
+ * then no-extension filename lookup. Never returns null — falls back to
+ * the lowercased extension (without dot) or 'unknown' if there is no
+ * extension. Downstream consumers rely on this field always being a string
+ * (see project-scanner.md Step 3 "Fallback" note).
+ */
+export function detectLanguage(filePath) {
+  const base = basename(filePath);
+  const ext = extname(filePath).toLowerCase();
+
+  // Dockerfile.dev, Dockerfile.prod, etc. — common variant form.
+  if (base === 'Dockerfile' || base.startsWith('Dockerfile.')) return 'dockerfile';
+
+  // Dotfile names like .env, .env.local — path.extname returns '' for
+  // single-segment dotfiles (e.g. '.env') and the SECOND segment for
+  // compound dotfiles (e.g. '.local' for '.env.local'). Neither hits the
+  // intended LANGUAGE_BY_EXT['.env'] mapping. Try the leading dotfile
+  // portion first so `.env`, `.env.local`, `.env.production` all map.
+  const dotKey = dotfileKey(base);
+  if (dotKey && LANGUAGE_BY_EXT[dotKey]) return LANGUAGE_BY_EXT[dotKey];
+
+  if (ext) {
+    const byExt = LANGUAGE_BY_EXT[ext];
+    if (byExt) return byExt;
+    // Unknown extension → drop the dot; a bare "." (e.g. "name.") → 'unknown'.
+    return ext.slice(1) || 'unknown';
+  }
+
+  // No-extension file: try the filename table (own keys only, so names like
+  // `constructor` do not resolve to Object.prototype members).
+  if (Object.hasOwn(LANGUAGE_BY_FILENAME, base)) return LANGUAGE_BY_FILENAME[base];
+
+  return 'unknown';
+}
+
+/**
+ * Extract the canonical dotfile "extension" from a basename, or null.
+ *
+ * `.env`          -> `.env`
+ * `.env.local`    -> `.env`
+ * `.bashrc`       -> `.bashrc`
+ * `package.json`  -> null (not a dotfile)
+ *
+ * Used by both detectLanguage and detectCategory so dotfile-style configs
+ * (e.g., `.env`, `.env.local`, `.env.production`) get their leading
+ * segment treated as the implicit extension instead of falling through
+ * to `unknown` / `code`.
+ */
+function dotfileKey(base) {
+  if (!base.startsWith('.')) return null;
+  const m = base.match(/^(\.[a-z0-9]+)/i);
+  return m ? m[1].toLowerCase() : null;
+}
+
+// ---------------------------------------------------------------------------
+// Category detection
+//
+// Implements the priority-ordered rules from project-scanner.md Step 4.
+// Order matters: more specific rules must run before more general ones
+// (e.g. `docker-compose.yml` is infra, not config).
+//
+// Categories: code | config | docs | infra | data | script | markup
+// ---------------------------------------------------------------------------
+
+/**
+ * Extension -> category. Used only after the higher-priority path-based
+ * checks (infra/docs exclusions) in `detectCategory()`. Plain extension
+ * lookup is intentionally last-resort — many configs need their full path
+ * inspected first.
+ */
+const CATEGORY_BY_EXT = Object.freeze({
+  // docs
+  '.md': 'docs',
+  '.mdx': 'docs',
+  '.rst': 'docs',
+  '.txt': 'docs',
+  '.text': 'docs',
+  // config
+  '.yaml': 'config',
+  '.yml': 'config',
+  '.json': 'config',
+  '.jsonc': 'config',
+  '.toml': 'config',
+  '.xml': 'config',
+  '.xsl': 'config',
+  '.xsd': 'config',
+  '.plist': 'config',
+  '.cfg': 'config',
+  '.ini': 'config',
+  '.env': 'config',
+  '.properties': 'config',
+  '.csproj': 'config',
+  '.sln': 'config',
+  '.mod': 'config',
+  '.sum': 'config',
+  '.gradle': 'config',
+  // infra
+  '.tf': 'infra',
+  '.tfvars': 'infra',
+  // data
+  '.sql': 'data',
+  '.graphql': 'data',
+  '.gql': 'data',
+  '.proto': 'data',
+  '.prisma': 'data',
+  '.csv': 'data',
+  '.tsv': 'data',
+  // script
+  '.sh': 'script',
+  '.bash': 'script',
+  '.zsh': 'script',
+  '.ps1': 'script',
+  '.psm1': 'script',
+  '.psd1': 'script',
+  '.bat': 'script',
+  '.cmd': 'script',
+  // markup
+  '.html': 'markup',
+  '.htm': 'markup',
+  '.css': 'markup',
+  '.scss': 'markup',
+  '.sass': 'markup',
+  '.less': 'markup',
+});
+
+/**
+ * Filenames (no extension or full filename with extension) that always
+ * map to `infra` regardless of their extension. Compared case-sensitively
+ * against basename(path).
+ */
+const INFRA_FILENAMES = new Set([
+  'Dockerfile',
+  '.dockerignore',
+  'Makefile',
+  'GNUmakefile',
+  'makefile',
+  'Jenkinsfile',
+  'Procfile',
+  'Vagrantfile',
+  '.gitlab-ci.yml',
+]);
+
+/**
+ * Detect the project-scanner category for a file. Priority order matches
+ * project-scanner.md Step 4 "Priority rule" — most specific wins.
+ *
+ * 1. LICENSE -> code (per the spec note "except LICENSE"). The Step-2
+ *    exclusion table normally removes LICENSE, but if a project chooses to
+ *    re-include it via `.understandignore` negation, it should NOT land in
+ *    docs. We classify as `code` rather than inventing a new bucket.
+ * 2. Filename-based infra (Dockerfile and Dockerfile.* variants, Makefile,
+ *    Jenkinsfile, docker-compose.*, compose.yml/.yaml, Vagrantfile, Procfile,
+ *    .gitlab-ci.yml, .dockerignore).
+ * 3. Path-based infra (.github/workflows/, .circleci/, k8s/, kubernetes/,
+ *    *.k8s.yml, *.k8s.yaml).
+ * 4. Extension-based mapping (CATEGORY_BY_EXT), then the dotfile key (.env*).
+ * 5. Fallback: `code` (matches the spec — "All other extensions").
+ */
+export function detectCategory(filePath) {
+  const base = basename(filePath);
+  const ext = extname(filePath).toLowerCase();
+  const posix = filePath.split(sep).join('/');
+
+  // Rule 1: LICENSE exception (project-scanner.md Step 4 table comment).
+  if (base === 'LICENSE') return 'code';
+
+  // Rule 2: infra by filename — Dockerfile + variants, Makefile,
+  // Jenkinsfile, docker-compose.*, Procfile, Vagrantfile, .gitlab-ci.yml,
+  // .dockerignore.
+  if (INFRA_FILENAMES.has(base)) return 'infra';
+  if (base === 'Dockerfile' || base.startsWith('Dockerfile.')) return 'infra';
+  if (base.startsWith('docker-compose.')) return 'infra';
+  if (base === 'compose.yml' || base === 'compose.yaml') return 'infra';
+
+  // Rule 3: infra by path.
+  if (posix.startsWith('.github/workflows/')) return 'infra';
+  if (posix.startsWith('.circleci/')) return 'infra';
+  // Match a `k8s/` or `kubernetes/` segment anywhere in the path.
+  if (/(^|\/)(k8s|kubernetes)\//.test(posix)) return 'infra';
+  // `*.k8s.yml` and `*.k8s.yaml` — Kubernetes-flavored YAML.
+  if (/\.k8s\.(ya?ml)$/i.test(base)) return 'infra';
+
+  // Rule 4: extension-based lookup.
+  if (ext) {
+    const byExt = CATEGORY_BY_EXT[ext];
+    if (byExt) return byExt;
+  }
+
+  // Rule 4.5: dotfile-style configs (.env, .env.local, .env.production).
+  // path.extname misses these — see dotfileKey docstring.
+  const dotKey = dotfileKey(base);
+  if (dotKey) {
+    const byDot = CATEGORY_BY_EXT[dotKey];
+    if (byDot) return byDot;
+  }
+
+  // Rule 5: fallback. There is no filename-based table for no-extension
+  // config files (only the infra filenames in Rule 2 are enumerated), so
+  // such files are not special-cased here; their language id still comes
+  // from detectLanguage().
+  // Anything not matched above falls through to `code`.
+  return 'code';
+}
+
+// ---------------------------------------------------------------------------
+// Complexity estimation (project-scanner.md Step 7)
+// ---------------------------------------------------------------------------
+
+/**
+ * Map a total file count to a complexity tier. Each tier's upper bound is
+ * inclusive:
+ *   - small:      1-30
+ *   - moderate:   31-150
+ *   - large:      151-500
+ *   - very-large: >500
+ *
+ * Edge case: 0 files maps to `small` (the lowest tier) so the field is
+ * always set even on empty repos. Downstream consumers treat 0 files as
+ * a sentinel for "nothing to analyze" via `totalFiles`, not complexity.
+ */
+export function estimateComplexity(totalFiles) {
+  if (totalFiles <= 30) return 'small';
+  if (totalFiles <= 150) return 'moderate';
+  if (totalFiles <= 500) return 'large';
+  return 'very-large';
+}
+
+// ---------------------------------------------------------------------------
+// File enumeration
+// ---------------------------------------------------------------------------
+
+/**
+ * Normalize a path to forward-slash POSIX. The project-scanner contract
+ * emits POSIX paths; we re-normalize so the output is stable across
+ * Windows/macOS/Linux.
+ */
+function toPosix(p) {
+  return p.split(sep).join('/');
+}
+
+/**
+ * Enumerate all files in `projectRoot` via `git ls-files`. Returns an
+ * array of project-relative POSIX paths, or null if the directory is not
+ * a git repository (or git is not installed). Caller falls back to the
+ * recursive walker.
+ *
+ * Why git ls-files first: it respects the repo's `.gitignore`, handles
+ * submodules sensibly, and gives a fast, deterministic listing. The walker
+ * has no .gitignore awareness, so it usually emits more paths than git
+ * would and the ignore filter has to do more work in the fallback path.
+ */
+function enumerateViaGit(projectRoot) {
+  const result = spawnSync('git', ['ls-files', '-co', '--exclude-standard', '-z'], {
+    cwd: projectRoot,
+    encoding: 'utf-8',
+    maxBuffer: 256 * 1024 * 1024, // 256MB — huge monorepos can produce >10MB of paths
+  });
+  if (result.status !== 0 || !result.stdout) return null;
+  // -z emits NUL-terminated, unquoted paths; without it git C-quotes non-ASCII
+  // names (core.quotePath). Paths are project-relative with forward slashes on
+  // every OS, so toPosix is only a safety net.
+  return result.stdout
+    .split('\0')
+    .filter(Boolean)
+    .map(toPosix);
+}
+
+/**
+ * Recursive directory walker — fallback when `git ls-files` is unavailable
+ * (no git, not a repo, or git refused). Skips hard-coded "obviously bad"
+ * directory names BEFORE invoking the ignore filter so we don't waste cycles
+ * descending into `node_modules/` etc. on huge trees.
+ *
+ * Returns project-relative POSIX paths in directory-sorted order, so the
+ * walk itself is deterministic; main() still sorts the final entries by path.
+ */
+function enumerateViaWalk(projectRoot) {
+  // Hard skip — these directories are universally non-source and skipping
+  // at the walker level avoids materializing thousands of node_modules
+  // paths before the ignore filter drops them. The ignore filter still
+  // runs on everything else.
+  const HARD_SKIP_DIRS = new Set([
+    'node_modules',
+    '.git',
+    '.svn',
+    '.hg',
+    '__pycache__',
+  ]);
+
+  const out = [];
+
+  function walk(absDir) {
+    let entries;
+    try {
+      entries = readdirSync(absDir, { withFileTypes: true });
+    } catch (err) {
+      process.stderr.write(
+        `Warning: scan-project: ${toPosix(relative(projectRoot, absDir)) || '.'} ` +
+        `— directory read failed (${err.message}) — subtree skipped\n`,
+      );
+      return;
+    }
+    // Sort deterministically by name; mix files and dirs together so the
+    // final output (after the path sort) is identical regardless of
+    // OS-specific readdir order.
+    entries.sort((a, b) => a.name.localeCompare(b.name));
+    for (const ent of entries) {
+      if (ent.isDirectory()) {
+        if (HARD_SKIP_DIRS.has(ent.name)) continue;
+        walk(join(absDir, ent.name));
+      } else if (ent.isFile()) {
+        const rel = toPosix(relative(projectRoot, join(absDir, ent.name)));
+        if (rel) out.push(rel);
+      }
+      // Symlinks intentionally ignored — git ls-files doesn't follow them
+      // either, and following them is a classic recursion-bomb footgun.
+    }
+  }
+
+  walk(projectRoot);
+  return out;
+}
+
+/**
+ * Enumerate all candidate files in `projectRoot`. Tries git ls-files first;
+ * falls back to a recursive walk if git is unavailable or this is not a
+ * repo. Returns an array of project-relative POSIX paths in unspecified
+ * order — caller is responsible for sorting + filtering.
+ */
+function enumerateFiles(projectRoot) {
+  const fromGit = enumerateViaGit(projectRoot);
+  if (fromGit !== null) return fromGit;
+  process.stderr.write(
+    `scan-project: git ls-files unavailable — falling back to recursive walk\n`,
+  );
+  return enumerateViaWalk(projectRoot);
+}
+
+// ---------------------------------------------------------------------------
+// Filter accounting
+//
+// The project-scanner.md contract requires `filteredByIgnore` to count files
+// dropped *specifically* by user `.understandignore` patterns (the delta
+// beyond what the hardcoded defaults would have removed). We accomplish this
+// by building TWO filters:
+//   - `defaultOnly`: defaults only, no user patterns
+//   - `combined`: defaults + user patterns (createIgnoreFilter)
+// and counting paths that the combined filter excludes but the defaults-only
+// filter would have kept.
+//
+// Negation (`!pattern`) is correctly handled by the combined filter — a file
+// re-included via `!` won't be in the combined-excluded set, so it WON'T be
+// counted in filteredByIgnore (it's "kept", not "additionally filtered").
+// ---------------------------------------------------------------------------
+
+/**
+ * Build a defaults-only IgnoreFilter — same patterns as createIgnoreFilter
+ * would apply, minus any user .understandignore content. We synthesize this
+ * via a temp directory with no .understandignore files so the core function
+ * still drives the matcher. (Re-implementing the ignore-package wiring here
+ * would risk subtle behavior drift from core's matcher.)
+ */
+function buildDefaultsOnlyFilter() {
+  // Call createIgnoreFilter with a root that we KNOW has no .understandignore.
+  // A fresh `os.tmpdir()`-based path guarantees no user patterns leak in.
+  // The directory doesn't need to exist on disk because createIgnoreFilter
+  // only checks existsSync() before reading.
+  const fakeProjectRoot = join(
+    require('node:os').tmpdir(),
+    `ua-scan-defaults-${process.pid}-${Date.now()}`,
+  );
+  return createIgnoreFilter(fakeProjectRoot);
+}
+
+/**
+ * Determine whether `projectRoot` has any user .understandignore files.
+ * When neither file exists, the combined and defaults-only filters are
+ * identical, so we can skip the dual-filter accounting entirely.
+ */
+function hasUserIgnoreFile(projectRoot) {
+  return (
+    existsSync(join(projectRoot, '.understandignore'))
+    || existsSync(join(projectRoot, '.understand-anything', '.understandignore'))
+  );
+}
+
+// ---------------------------------------------------------------------------
+// Line counting
+// ---------------------------------------------------------------------------
+
+/**
+ * Count newline-delimited lines in a file. Returns the number of `\n`
+ * characters; this matches `wc -l` semantics (which counts newlines, not
+ * "lines of content"). Files without a trailing newline therefore report
+ * one fewer than the visible line count — same behavior as wc.
+ *
+ * Per-file failure: emits a Warning: and returns null. Caller decides
+ * whether to drop the file or keep it with sizeLines=0.
+ */
+function countLines(absPath, posixPath) {
+  try {
+    const buf = readFileSync(absPath);
+    // Manual newline count beats split('\n').length on large files — no
+    // intermediate array allocation. We count the `\n` byte (0x0a) directly.
+    let count = 0;
+    for (let i = 0; i < buf.length; i++) {
+      if (buf[i] === 0x0a) count++;
+    }
+    return count;
+  } catch (err) {
+    process.stderr.write(
+      `Warning: scan-project: ${posixPath} — line count failed ` +
+      `(${err.message}) — file skipped from output\n`,
+    );
+    return null;
+  }
+}
+
+// ---------------------------------------------------------------------------
+// Main
+// ---------------------------------------------------------------------------
+
+async function main() {
+  const [, , projectRoot, outputPath] = process.argv;
+  if (!projectRoot || !outputPath) {
+    process.stderr.write(
+      'Usage: node scan-project.mjs <projectRoot> <outputPath>\n',
+    );
+    process.exit(1);
+  }
+
+  if (!existsSync(projectRoot)) {
+    process.stderr.write(
+      `scan-project.mjs failed: projectRoot does not exist: ${projectRoot}\n`,
+    );
+    process.exit(1);
+  }
+  const projectRootStat = statSync(projectRoot);
+  if (!projectRootStat.isDirectory()) {
+    process.stderr.write(
+      `scan-project.mjs failed: projectRoot is not a directory: ${projectRoot}\n`,
+    );
+    process.exit(1);
+  }
+
+  // 1. Enumerate. Either git ls-files or recursive walk.
+  const candidates = enumerateFiles(projectRoot);
+
+  // 2. Filter via createIgnoreFilter (defaults + user .understandignore).
+  //    Build a defaults-only filter alongside it to count user-driven drops.
+  const combined = createIgnoreFilter(projectRoot);
+  const userIgnoresPresent = hasUserIgnoreFile(projectRoot);
+  const defaultsOnly = userIgnoresPresent ? buildDefaultsOnlyFilter() : combined;
+
+  let filteredByIgnore = 0;
+  const kept = [];
+  for (const rel of candidates) {
+    const isIgnoredCombined = combined.isIgnored(rel);
+    if (!isIgnoredCombined) {
+      kept.push(rel);
+      continue;
+    }
+    // Dropped by combined filter. If defaults-only would have ALSO dropped
+    // it, this is a baseline default drop — not counted. If defaults-only
+    // would have KEPT it, this drop is attributable to the user's
+    // .understandignore content.
+    if (userIgnoresPresent && !defaultsOnly.isIgnored(rel)) {
+      filteredByIgnore++;
+    }
+  }
+
+  // 3. Per-file: language + category + line count.
+  //    Drop files that fail line counting (per-file resilience).
+  const fileEntries = [];
+  for (const rel of kept) {
+    const absPath = join(projectRoot, rel);
+    // Stat first: git ls-files can list paths missing from disk (deleted but
+    // still indexed, or removed mid-scan). The walker shouldn't, but stay defensive.
+    try {
+      const st = statSync(absPath);
+      if (!st.isFile()) {
+        // Symlinks-to-dir, special files, etc. — skip silently. Not a
+        // warning condition because git wouldn't have tracked it as a file.
+        continue;
+      }
+    } catch (err) {
+      process.stderr.write(
+        `Warning: scan-project: ${rel} — stat failed (${err.message}) ` +
+        `— file skipped from output\n`,
+      );
+      continue;
+    }
+    const sizeLines = countLines(absPath, rel);
+    if (sizeLines === null) {
+      // countLines already emitted the Warning: line.
+      continue;
+    }
+    fileEntries.push({
+      path: rel,
+      language: detectLanguage(rel),
+      sizeLines,
+      fileCategory: detectCategory(rel),
+    });
+  }
+
+  // 4. Determinism: sort by path.localeCompare.
+  fileEntries.sort((a, b) => a.path.localeCompare(b.path));
+
+  // 5. Stats.
+  const byCategory = Object.create(null);
+  const byLanguage = Object.create(null); // language may be e.g. 'constructor'
+  for (const f of fileEntries) {
+    byCategory[f.fileCategory] = (byCategory[f.fileCategory] || 0) + 1;
+    byLanguage[f.language] = (byLanguage[f.language] || 0) + 1;
+  }
+
+  const estimatedComplexity = estimateComplexity(fileEntries.length);
+
+  const output = {
+    scriptCompleted: true,
+    files: fileEntries,
+    totalFiles: fileEntries.length,
+    filteredByIgnore,
+    estimatedComplexity,
+    stats: {
+      filesScanned: fileEntries.length,
+      byCategory,
+      byLanguage,
+    },
+  };
+
+  writeFileSync(outputPath, JSON.stringify(output, null, 2), 'utf-8');
+
+  if (!existsSync(outputPath)) {
+    throw new Error(`output file missing after write: ${outputPath}`);
+  }
+
+  process.stderr.write(
+    `scan-project: filesScanned=${fileEntries.length} ` +
+    `filteredByIgnore=${filteredByIgnore} ` +
+    `complexity=${estimatedComplexity}\n`,
+  );
+}
+
+// ---------------------------------------------------------------------------
+// Run only when executed directly as a CLI; importing the module (e.g. from
+// tests) must not trigger main().
+//
+// Canonicalize both sides through realpathSync. Node ESM resolves
+// import.meta.url through symlinks but pathToFileURL(process.argv[1]) preserves
+// them, so a raw equality check silently no-ops when the script is invoked via
+// a symlinked plugin install path (the default in Claude Code / Copilot CLI
+// caches). See GitHub issue #162.
+// ---------------------------------------------------------------------------
+function isCliEntry() {
+  if (!process.argv[1]) return false;
+  try {
+    const modulePath = realpathSync(fileURLToPath(import.meta.url));
+    const argvPath = realpathSync(process.argv[1]);
+    return modulePath === argvPath;
+  } catch {
+    return false;
+  }
+}
+
+if (isCliEntry()) {
+  try {
+    await main();
+  } catch (err) {
+    process.stderr.write(`scan-project.mjs failed: ${err.message}\n${err.stack}\n`);
+    process.exit(1);
+  }
+}
+
+// Default export of helpers for testability.
+export default {
+  detectLanguage,
+  detectCategory,
+  estimateComplexity,
+};
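The `isCliEntry()` guard above relies on canonicalizing both paths before comparing them. The following is a minimal standalone sketch of that idea, not code from this commit: the `sameFile` helper, the temp-directory layout, and the file names are hypothetical, and it uses CommonJS `require` so it can run as a plain script, whereas `scan-project.mjs` itself uses ESM imports. It assumes a platform where creating symlinks is permitted.

```javascript
// Hypothetical demo: why a raw path comparison fails across a symlink
// while a realpathSync-based comparison succeeds.
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Compare two paths after resolving symlinks on both sides.
function sameFile(a, b) {
  try {
    return fs.realpathSync(a) === fs.realpathSync(b);
  } catch {
    // realpathSync throws for a missing path; treat that as "not the same".
    return false;
  }
}

// Build a throwaway directory holding a real file and a symlink to it.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cli-entry-'));
const target = path.join(dir, 'script.mjs');
const link = path.join(dir, 'link.mjs');
fs.writeFileSync(target, '// placeholder\n', 'utf-8');
fs.symlinkSync(target, link);

const rawEqual = target === link;             // string comparison only
const canonicalEqual = sameFile(target, link); // symlink resolved first
const missing = sameFile(target, path.join(dir, 'nope.mjs'));

console.log({ rawEqual, canonicalEqual, missing });

// Clean up the temporary directory.
fs.rmSync(dir, { recursive: true, force: true });
```

In this sketch the raw comparison reports a mismatch while the canonicalized one matches, which mirrors the symlinked-install case described in the comment block. Returning `false` on a failed lookup keeps the guard from throwing at import time.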
@@ -0,0 +1,14 @@
+import { defineConfig } from 'vitest/config';
+
+// The plugin package no longer ships any test files — they were relocated
+// to the repo-root `tests/` tree so they no longer ride along with the
+// plugin marketplace bundle. This config exists solely to shadow the
+// repo-root vitest.config.ts (which would otherwise be inherited via
+// upward config discovery from this cwd) and explicitly resolve no tests.
+//
+// Run skill tests from the repo root with `pnpm test` instead.
+export default defineConfig({
+  test: {
+    include: [],
+  },
+});
@@ -0,0 +1,25 @@
+import { defineConfig } from 'vitest/config';
+
+// Single-config aggregation for the whole monorepo. Picks up:
+//   - tests/**                                          — relocated skill tests (out-of-plugin so they
+//                                                         do not ship via the marketplace bundle)
+//   - understand-anything-plugin/src/**                 — skill TS source tests
+//   - understand-anything-plugin/packages/dashboard/**  — dashboard utils tests
+//
+// The `@understand-anything/core` package owns its own vitest.config.ts and is
+// invoked separately via `pnpm --filter @understand-anything/core test`; its
+// files are excluded here to avoid double-counting.
+export default defineConfig({
+  test: {
+    include: [
+      'tests/**/*.test.{js,mjs,ts}',
+      'understand-anything-plugin/src/**/*.test.{js,mjs,ts}',
+      'understand-anything-plugin/packages/dashboard/**/*.test.{js,mjs,ts,tsx}',
+    ],
+    exclude: [
+      '**/node_modules/**',
+      '**/dist/**',
+      'understand-anything-plugin/packages/core/**',
+    ],
+  },
+});