Mirror of https://github.com/Egonex-AI/Understand-Anything.git, synced 2026-06-22 10:58:03 +08:00
fix(hooks/auto-update): make fingerprints merge unambiguous in Phase 3d
Fixes #152.

Phase 3d step 3 instructed the LLM to "merge with existing fingerprints (keep unchanged files as-is)", but the prose was vague enough that the LLM-written script frequently wrote only the freshly re-analyzed batch entries to fingerprints.json, discarding every other file's fingerprint. The next auto-update saw N - batch_size files with no stored fingerprint → classified as STRUCTURAL → exceeded the 30-file threshold → FULL_UPDATE permanently, burning hundreds of thousands of tokens on every subsequent commit.

Replace the five-bullet description with an explicit LOAD-PATCH-SAVE script template:

1. LOAD ALL existing entries from fingerprints.json (never skip).
2. PATCH or REMOVE each path in filesToReanalyze (inline deletion handling, so the spec doesn't need a separate deletedFiles list).
3. GUARD: if the file existed and was non-empty but loaded as {}, abort the write; a silent load failure would otherwise clobber every fingerprint.
4. SAVE the full dict back.

The reporter's dry-run showed this restores 81/97 files to COSMETIC classification on their project (zero LLM tokens) instead of all 97 incorrectly forced into STRUCTURAL.

Note: a related ordering bug exists in skills/understand/SKILL.md Phase 7 (meta.json is written before fingerprints.json, so a silent failure in step 2.5 leaves stale fingerprints). That's a separate fix in a different file and is intentionally not bundled here.
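The escalation described in this message can be reproduced with a small sketch. Only the 30-file threshold and the rule "no stored fingerprint means STRUCTURAL" come from the text above. The function names, the `INCREMENTAL` label, the strict `>` comparison, and the 100-file / 5-file numbers are assumptions made for illustration, not the project's actual code.

```javascript
// Hypothetical sketch of the escalation path; not the repository's implementation.
const FULL_UPDATE_THRESHOLD = 30; // threshold quoted in the commit message

// A file with no stored fingerprint is treated as STRUCTURAL.
function countMissingFingerprints(projectFiles, stored) {
  return projectFiles.filter(
    (p) => !Object.prototype.hasOwnProperty.call(stored, p)
  ).length;
}

// Assumption: the update escalates when the STRUCTURAL count exceeds the threshold.
function chooseUpdateMode(projectFiles, stored, threshold = FULL_UPDATE_THRESHOLD) {
  const structural = countMissingFingerprints(projectFiles, stored);
  return structural > threshold ? 'FULL_UPDATE' : 'INCREMENTAL';
}

// Illustrative data: 100 tracked files, a re-analyzed batch of 5.
const projectFiles = Array.from({ length: 100 }, (_, i) => `src/file${i}.js`);
const fullStore = Object.fromEntries(
  projectFiles.map((p) => [p, { contentHash: 'placeholder' }])
);
const batch = projectFiles.slice(0, 5);

// The bug: only the batch entries are written back, so 95 fingerprints vanish.
const clobberedStore = Object.fromEntries(batch.map((p) => [p, fullStore[p]]));

console.log(chooseUpdateMode(projectFiles, fullStore));      // INCREMENTAL
console.log(chooseUpdateMode(projectFiles, clobberedStore)); // FULL_UPDATE
```

Because the clobbered store is what gets persisted, every later run starts from the same shrunken state, which is why the escalation never clears on its own.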
@@ -240,12 +240,54 @@ Perform lightweight validation (no graph-reviewer agent):
    }
    ```
 
-3. **Update fingerprints:** Write and execute a Node.js script that:
-   - Reads the existing `fingerprints.json`
-   - For each re-analyzed file: computes new content hash and extracts structural elements via regex
-   - For deleted files: removes their entries
-   - Merges with existing fingerprints (keep unchanged files as-is)
-   - Writes updated `fingerprints.json`
+3. **Update fingerprints (LOAD-PATCH-SAVE, not OVERWRITE).**
+
+   The most common failure mode here: writing only the freshly-computed batch entries to `fingerprints.json`, discarding every other file's fingerprint. The next auto-update then sees all those files as new (no stored fingerprint), classifies them as STRUCTURAL, and escalates to FULL_UPDATE permanently (issue #152). The script must LOAD ALL existing entries, PATCH only the re-analyzed ones, and SAVE the full dict back.
+
+   Write and execute a Node.js script in this exact ordering:
+
+   ```javascript
+   import { readFileSync, writeFileSync, existsSync } from 'node:fs';
+   import { createHash } from 'node:crypto';
+   import path from 'node:path';
+
+   const fpPath = path.join(PROJECT_ROOT, '.understand-anything', 'fingerprints.json');
+   const existedAndNonEmpty = existsSync(fpPath) && readFileSync(fpPath, 'utf-8').trim().length > 0;
+
+   // 1. LOAD ALL existing entries (NEVER skip: this preserves un-analyzed files)
+   const all = existedAndNonEmpty
+     ? JSON.parse(readFileSync(fpPath, 'utf-8'))
+     : {};
+   const before = Object.keys(all).length;
+
+   // 2. PATCH (file still exists) or REMOVE (file deleted) for each re-analyzed path.
+   //    `filesToReanalyze` may include paths that were deleted in this commit;
+   //    handle both branches inline rather than expecting a separate deleted list.
+   for (const filePath of filesToReanalyze) {
+     const fullPath = path.join(PROJECT_ROOT, filePath);
+     if (!existsSync(fullPath)) {
+       delete all[filePath];
+       continue;
+     }
+     const content = readFileSync(fullPath, 'utf-8');
+     const contentHash = createHash('sha256').update(content).digest('hex');
+     // Extract functions, classes, imports, exports via the same regex as Phase 1.
+     all[filePath] = { contentHash, functions, classes, imports, exports };
+   }
+
+   // 3. GUARD against silent load failure: if fingerprints.json existed and was
+   //    non-empty but `before` came out as 0, refuse to overwrite: something
+   //    went wrong reading the file and writing now would clobber every entry.
+   if (existedAndNonEmpty && before === 0) {
+     throw new Error('fingerprints.json existed and was non-empty but loaded as {}; refusing to overwrite');
+   }
+
+   // 4. SAVE ALL entries back (full dict, not just the patched subset)
+   writeFileSync(fpPath, JSON.stringify(all, null, 2));
+   console.log(`Fingerprints: ${before} → ${Object.keys(all).length}`);
+   ```
+
+   The `existedAndNonEmpty && before === 0` guard catches the silent-load-failure case before it corrupts the store. If the count shrinks from N to a small number that matches the batch size, the LOAD step was skipped; abort the write rather than persist the wrong dict.
 
 4. Clean up intermediate files:
    ```bash
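The PATCH and GUARD steps in the template above can also be tried without touching the filesystem. The following is a minimal sketch under stated assumptions: `patchFingerprints` and `assertSafeToSave` are hypothetical names that do not appear in the repository, the batch is modeled as a map from path to new entry (or `null` for a deleted file), and the shrink check is one possible reading of the "count shrinks to the batch size" warning rather than something the commit implements.

```javascript
// Hypothetical helpers; these names are illustrative and not from the repository.

// PATCH: start from a copy of EVERY existing entry, then update or remove
// only the paths in the batch. `batch` maps path -> new entry, or null if deleted.
function patchFingerprints(existing, batch) {
  const all = { ...existing };
  for (const [filePath, entry] of Object.entries(batch)) {
    if (entry === null) {
      delete all[filePath];
    } else {
      all[filePath] = entry;
    }
  }
  return all;
}

// GUARD: throw instead of saving when the result looks clobbered.
function assertSafeToSave(rawText, loaded, result, batch) {
  const before = Object.keys(loaded).length;
  const after = Object.keys(result).length;
  const batchSize = Object.keys(batch).length;
  // Same condition as the template: non-empty file that loaded as {}.
  if (rawText.trim().length > 0 && before === 0) {
    throw new Error('non-empty fingerprints file loaded as {}');
  }
  // Assumed extra check: a correct patch removes at most `batchSize` entries.
  if (after < before - batchSize) {
    throw new Error(`fingerprint count dropped from ${before} to ${after}`);
  }
}

const existing = {
  'a.js': { contentHash: '1' },
  'b.js': { contentHash: '2' },
  'c.js': { contentHash: '3' },
};
const batch = { 'a.js': { contentHash: '9' }, 'c.js': null };

const merged = patchFingerprints(existing, batch);
assertSafeToSave(JSON.stringify(existing), existing, merged, batch); // passes
console.log(Object.keys(merged)); // [ 'a.js', 'b.js' ]
```

Passing the batch as data keeps the merge logic separate from hashing and file I/O, so the case that matters for issue #152 (an untouched file such as `b.js` surviving the update) is easy to verify in isolation.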