Files
Understand-Anything/.copilot-plugin
T
Lum1104 c49c46d974 fix(pipeline): close 12 sources of silent data loss in graph extraction
A deep audit of the project-scanner โ†’ file-analyzer โ†’ merge pipeline
turned up a wide range of silent data-loss bugs. Each one alone is
small; together they were producing graphs with very few import edges,
missing sub-file nodes for non-code formats, and inconsistent metrics.

Root-cause fixes (high impact):

- project-scanner.md: extend import-pattern table to resolve absolute
  imports for Python (`from a.b.c import x`), TS/JS (tsconfig.json
  paths/baseUrl aliases), Java/Kotlin (`com.foo.Bar` โ†” file paths),
  Ruby (`require 'foo/bar'` load-path), PHP (composer PSR-4 namespaces),
  and C/C++ (`#include` headers). Was relative-only, which produced
  empty importMap entries for the majority of real projects.
- project-scanner.md: add `.ps1`, `.bat`, `.cmd`, `.jsonc` to language
  table; require non-null `language` field with an explicit fallback.
- file-analyzer.md: document `sections`, `definitions`, `services`,
  `endpoints`, `steps`, `resources` in the extraction-output schema and
  spell out the sub-file node-creation rules per category. Was missing,
  so per-table / endpoint / resource nodes were never created from
  SQL / OpenAPI / Terraform / K8s / Dockerfile parser output.
- file-analyzer.md: add explicit source-reading fallback rules for
  PowerShell, Batch, Bash, Swift, Kotlin (no tree-sitter coverage).
- yaml-parser: declare `kubernetes`, `docker-compose`, `github-actions`,
  `openapi` languages so files the language-registry tags with those
  ids actually get section extraction. Recognize quoted top-level keys
  (e.g. `"on":` in GitHub Actions). Emit one section per entry for
  array-root YAML documents.
- json-parser: declare `json-schema`, `openapi`; add `stripJsoncSyntax`
  helper that removes line / block comments and trailing commas before
  parse so `.jsonc` files (wrangler, tsconfig with comments) parse cleanly.
- shell-parser: declare `jenkinsfile`. Tighten function-detection regex
  to require a reachable `{` brace so `name() echo hi` and patterns
  appearing inside heredocs are no longer false-positives.
- markdown-parser: track fenced-code-block state and skip headings
  inside ``` / ~~~ blocks (`# install` shell comments were being
  emitted as level-1 sections).
- merge-batch-graphs.py: add `article`, `entity`, `topic`, `claim`,
  `source` to VALID_NODE_PREFIXES and TYPE_TO_PREFIX so knowledge-base
  node types stop being flagged unknown / coerced to `file:`. Add
  `direction` to the edge dedup key so `forward` and `bidirectional`
  variants of the same (src, tgt, type) don't overwrite each other.
  Use a placeholder in bare-id fallback when `filePath` is missing on
  function/class nodes so unrelated `parse()` functions don't merge.
- typescript-extractor: actually compute `isDefault` for default
  exports (was always emitted as `false` from buildResult).
- extract-structure.mjs: match `wc -l` semantics for `totalLines` so
  the scanner's `sizeLines` and the extractor's `totalLines` agree on
  POSIX text files. Filter the parser-imports fallback to relative-only
  so `importCount` semantics stay *internal-import* whether the scanner
  resolved them or not. Drop unused `isCode` local.

Tests: +19 cases covering JSONC parsing, markdown fenced-code skip,
YAML quoted-keys / array-root, shell function false-positives,
extract-structure import fallback semantics + totalLines off-by-one.
764 passing (was 745).

Bumps version to 2.6.2 across the five tracked manifests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
c49c46d974 ยท 2026-05-07 21:44:24 +08:00
History
..