fix(hooks/auto-update): make fingerprints merge unambiguous in Phase 3d

Fixes #152. Phase 3d step 3 instructed the LLM to "merge with existing
fingerprints (keep unchanged files as-is)" but the prose was vague
enough that the LLM-written script frequently wrote only the freshly
re-analyzed batch entries to fingerprints.json, discarding every other
file's fingerprint. The next auto-update saw N-batch_size files with
no stored fingerprint → classified as STRUCTURAL → exceeded the 30-file
threshold → FULL_UPDATE permanently, burning hundreds of thousands of
tokens on every subsequent commit.

Replace the four-bullet description with an explicit LOAD-PATCH-SAVE
script template:

  1. LOAD ALL existing entries from fingerprints.json (never skip).
  2. PATCH or REMOVE each path in filesToReanalyze (inline deletion
     handling so the spec doesn't need a separate deletedFiles list).
  3. GUARD: if the file existed and was non-empty but loaded as {},
     abort the write — silent load failure would otherwise clobber
     every fingerprint.
  4. SAVE the full dict back.

The reporter's dry-run showed this restores 81/97 files to COSMETIC
classification on their project (zero LLM tokens) instead of all 97
incorrectly forced into STRUCTURAL.

Note: a related ordering bug exists in skills/understand/SKILL.md
Phase 7 (meta.json written before fingerprints.json — silent failure
in step 2.5 leaves stale fingerprints). That's a separate fix in a
different file and is intentionally not bundled here.
This commit is contained in:
Lum1104
2026-05-18 09:58:45 +08:00
Unverified
parent 5304ff06f3
commit dd8b724c99
@@ -240,12 +240,54 @@ Perform lightweight validation (no graph-reviewer agent):
}
```
3. **Update fingerprints:** Write and execute a Node.js script that:
- Reads the existing `fingerprints.json`
- For each re-analyzed file: computes new content hash and extracts structural elements via regex
- For deleted files: removes their entries
- Merges with existing fingerprints (keep unchanged files as-is)
- Writes updated `fingerprints.json`
3. **Update fingerprints (LOAD-PATCH-SAVE, not OVERWRITE).**
The most common failure mode here: writing only the freshly-computed batch entries to `fingerprints.json`, discarding every other file's fingerprint. The next auto-update then sees all those files as new (no stored fingerprint), classifies them as STRUCTURAL, and escalates to FULL_UPDATE permanently (issue #152). The script must LOAD ALL existing entries, PATCH only the re-analyzed ones, and SAVE the full dict back.
Write and execute a Node.js script in this exact ordering:
```javascript
import { readFileSync, writeFileSync, existsSync } from 'node:fs';
import { createHash } from 'node:crypto';
import path from 'node:path';
const fpPath = path.join(PROJECT_ROOT, '.understand-anything', 'fingerprints.json');
const existedAndNonEmpty = existsSync(fpPath) && readFileSync(fpPath, 'utf-8').trim().length > 0;
// 1. LOAD ALL existing entries (NEVER skip — preserves un-analyzed files)
const all = existedAndNonEmpty
? JSON.parse(readFileSync(fpPath, 'utf-8'))
: {};
const before = Object.keys(all).length;
// 2. PATCH (file still exists) or REMOVE (file deleted) for each re-analyzed path.
// `filesToReanalyze` may include paths that were deleted in this commit —
// handle both branches inline rather than expecting a separate deleted list.
for (const filePath of filesToReanalyze) {
const fullPath = path.join(PROJECT_ROOT, filePath);
if (!existsSync(fullPath)) {
delete all[filePath];
continue;
}
const content = readFileSync(fullPath, 'utf-8');
const contentHash = createHash('sha256').update(content).digest('hex');
// Extract functions, classes, imports, exports via the same regex as Phase 1.
all[filePath] = { contentHash, functions, classes, imports, exports };
}
// 3. GUARD against silent load failure: if fingerprints.json existed and was
// non-empty but `before` came out as 0, refuse to overwrite — something
// went wrong reading the file and writing now would clobber every entry.
if (existedAndNonEmpty && before === 0) {
throw new Error('fingerprints.json existed and was non-empty but loaded as {} — refusing to overwrite');
}
// 4. SAVE ALL entries back (full dict — not just the patched subset)
writeFileSync(fpPath, JSON.stringify(all, null, 2));
console.log(`Fingerprints: ${before} → ${Object.keys(all).length}`);
```
The `existedAndNonEmpty && before === 0` guard catches the silent-load-failure case before it corrupts the store. If the count shrinks from N to a small number that matches the batch size, the LOAD step was skipped — abort the write rather than persist the wrong dict.
4. Clean up intermediate files:
```bash