Add mixin_declaration handling to extractStructure, folding mixins into
classes[] (same convention as class_definition). The `on` constraint
sibling is intentionally ignored for graph purposes.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add constructorName() helper and extend collectClassBody() to surface
unnamed constructors as "ClassName", named constructors as "Class.named",
and factory named constructors as "Class.named" in methods[]/functions[].
Probe confirmed plan's AST shapes match exactly; extractReturnType returns
undefined for all constructor forms (factory keyword is an unnamed node).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add TDD tests and implement extractTopLevelFunction with helpers for
extracting function name, params, and return type (including generics
where the grammar emits type_identifier + type_arguments as siblings).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Empty extractor that satisfies the LanguageExtractor interface so the
plugin pipeline can load it. Real extraction logic lands in subsequent
TDD commits.
Adds the Dart language config and wires it into builtinLanguageConfigs
so .dart files are recognized by the language registry. References the
vendored @understand-anything/tree-sitter-dart-wasm package for grammar
loading.
No extractor yet — structural extraction lands in the next commit.
The upstream tree-sitter-dart@1.0.0 ships a pre-`dylink.0` wasm that
fails to load in web-tree-sitter@0.26.x. The grammar source itself is
sound — rebuilding with the current tree-sitter-cli + wasi-sdk produces
a working dylink.0 wasm. Vendor that artifact as a workspace-internal
package so @understand-anything/core can depend on it via workspace:*.
BUILD.md documents the provenance and rebuild instructions.
Thirteen-task TDD plan walking from vendoring the workspace wasm package
through scaffolding the extractor and adding extraction logic in
test-first slices: functions, classes, constructors, mixins, extensions,
enums, imports, exports, visibility, and call graph.
Every code block reflects AST shapes confirmed via a live probe against
a freshly-built tree-sitter-dart wasm in the project's own
web-tree-sitter at 0.26.x. No placeholder code, no "fill in later"
references.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Live verification during planning surfaced two facts that change the
shipping strategy:
1. tree-sitter-dart@1.0.0's prebuilt wasm uses the pre-`dylink.0` format
and fails to load in web-tree-sitter@0.26.x (the version this project
uses). Verified by directly loading the upstream wasm and catching
the failure in getDylinkMetadata.
2. The grammar source itself is sound — rebuilding with the current
tree-sitter-cli@0.26.x + wasi-sdk-29 toolchain produces a working
dylink.0-format wasm that parses every construct the extractor needs.
Revised packaging: ship the freshly-built wasm as a workspace-internal
package (@understand-anything/tree-sitter-dart-wasm) rather than
depending on the broken upstream npm artifact. No loader changes
required; existing TreeSitterPlugin resolves it the same way it
resolves other tree-sitter packages.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the brainstormed design for landing deep Dart support at parity with
the recent Kotlin add (PR #347): LanguageConfig + tree-sitter WASM grammar
(tree-sitter-dart@1.0.0, verified ships a prebuilt .wasm in its tarball) +
DartExtractor + ~22 vitest cases. Six file changes, no edits to shared
schemas/registries.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 7's `rm -rf` of the just-created `intermediate/` and `tmp/` dirs
trips destructive-action gates on hardened hosts (e.g. freshness-window
checks that flag deleting paths created moments earlier). Move them into
a timestamped `.trash-<epoch>/` instead; Phase 0 reclaims the space once
the trash is older than 7 days, well past any freshness window. Behavior
on normal hosts is unchanged — disk usage is identical after the next
run's purge.
Closes#301
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses the regression flagged by ZebangCheng on #346: under the
parallelised `buildResolutionContext`, `loadTsConfigs` /
`loadGoModules` / `loadPhpAutoloads` ran concurrently but each wrote
warnings to stderr inline as it iterated read results, so a fixture
with both a malformed `tsconfig.json` and a malformed `composer.json`
could emit `composer, tsconfig` instead of the pre-PR `tsconfig,
composer` depending on I/O timing.
Each loader now buffers its warnings into a returned array and the
caller drains them in canonical order (tsconfig → go → php) after
`Promise.all`, restoring byte-identical stderr output. Added a
regression test that fixtures both malformed configs and asserts the
tsconfig warning precedes the composer warning in stderr.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When /understand runs with no --language flag and no stored outputLanguage,
step 3.6 now infers the conversation language and — only when it is non-English
— confirms once before generating, then persists the choice to config.json.
English conversations keep the exact same silent `en` path; --language flag and
stored config still take priority. README documents the behavior; version
bumped 2.7.5 -> 2.7.6 across all five manifests (user-visible behavior change).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Resolve conflict in tests/skill/understand/test_extract_import_map.test.mjs
by keeping both new test groups — they cover independent fixes that should
coexist:
- upstream #214: tsconfig path-alias targets with leading "./"
- this PR #294: NodeNext .js → .ts rewrite for ESM TypeScript imports
The extract-import-map.mjs script auto-merged cleanly; both fixes are
already present in the merged source.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fixes the silent near-edgeless-graph regression on any modern ESM
TypeScript project. Reported in #294 with full repro + root-cause
analysis.
### Why this matters
Under `moduleResolution: NodeNext` (or `Node16` / `Bundler` with
explicit extensions — the default for new TS-ESM projects since 2023),
TypeScript does NOT rewrite import specifiers during compilation:
// src/index.ts — real, idiomatic NodeNext source
import { x } from './config.js'; // on disk: config.ts
Before this fix, `probeWithExtensions` only tried APPENDING extensions
to the import specifier:
'./config.js' → not in fileSet
'./config.js.ts', './config.js.tsx', './config.js.js', ... → all miss
→ returns null → edge dropped at merge as dangling
Net result on the reporter's repro: a knowledge graph with hundreds of
file nodes and almost no `imports` edges between them — silently
removing exactly the dependency structure the graph is meant to show.
### Fix
New `NODENEXT_REWRITES` table maps each compiled-output extension to
the TypeScript source extensions that could have produced it:
.js → [.ts, .tsx, .js, .jsx]
.jsx → [.tsx, .jsx]
.mjs → [.mts, .mjs, .ts]
.cjs → [.cts, .cjs, .ts]
`probeWithExtensions` now applies the rewrite when the import already
ends with one of these extensions and no such file exists on disk. The
rewrite runs BEFORE the legacy append-extensions loop — otherwise
`./foo.js` would generate the nonsense candidate `foo.js.ts` and the
append loop would never reach the actual `foo.ts`.
### Disambiguation
If both `config.ts` and `config.js` exist on disk (rare, but possible
during a partial migration), `import './config.js'` still resolves to
the .js — that's an exact-disk match and what NodeNext compilation
actually does. The rewrite only kicks in when the .js doesn't exist.
### Tests
6 new tests in `test_extract_import_map.test.mjs`:
- The main #294 case (`.js → .ts`)
- `.jsx → .tsx` and `.mjs → .mts` rewrites
- Disambiguation when both `.ts` and `.js` exist on disk
- Pure-JS projects still work (real `.js → .js` imports)
- Historical no-extension probes unaffected
- Missing files still return null (rewrite can't invent targets)
Total: 202 tests passing (was 196).
Closes#294
Closes a cluster of community-profile gaps (#248, #249, #251, #252) in one
PR rather than four micro-PRs that all touch the same surface area.
### Templates (#251, #252)
- .github/ISSUE_TEMPLATE/bug_report.yml — required fields for repro
(plugin version, platform, OS, project language, file count); the four
pieces of context that are missing from ~every current bug report.
- .github/ISSUE_TEMPLATE/feature_request.yml — leads with the *problem*
rather than the proposed solution, which keeps maintainer review focused
on whether to solve, not just how.
- .github/ISSUE_TEMPLATE/question.yml — separate from bug to keep the
bug queue triagable.
- .github/ISSUE_TEMPLATE/config.yml — disables blank issues and routes
general discussion to README + Discussions.
- .github/PULL_REQUEST_TEMPLATE.md — includes the version-bump checklist
that CLAUDE.md says must stay in sync across 5 manifests; otherwise
every contributor learns this rule by getting their PR bounced.
### Community files
- CODE_OF_CONDUCT.md — short, project-specific document that names the
expectations and reporting path. Not a verbatim Contributor Covenant
to keep it readable.
- SECURITY.md — describes the project's local-only threat model
explicitly so reporters know what's in / out of scope before they
spend time on a writeup. Points at GitHub private vulnerability
reporting as the primary channel.
### CI (#249)
- ci.yml now also runs on pushes to main, not only PRs. Without this,
a direct push to main (which happens when maintainers merge a PR
branch locally) doesn't trigger CI, so a regression can land green-
looking and stay broken for days.
- Added a concurrency group that cancels stale runs for the same ref.
Saves runner minutes and keeps the per-ref status meaningful.
- Used `github.ref` (a controlled value), not user-controlled input,
so no script-injection surface.
### package.json (#248)
- Added description, license, repository, bugs, homepage, keywords —
the standard set for npm package discoverability and so GitHub's
community-profile check shows the project at 100%.
Wires Kotlin into the existing tree-sitter pipeline so .kt and .kts
files now produce functions, classes, data classes, sealed classes,
interfaces, objects, imports, exports, and call-graph edges — matching
the behavior of the other language extractors.
## Why @tree-sitter-grammars/tree-sitter-kotlin
The standard `tree-sitter-kotlin` (v0.3.8) ships only native bindings.
The new `@tree-sitter-grammars/tree-sitter-kotlin@1.1.0` ships a
prebuilt `.wasm` (loads cleanly with `web-tree-sitter@^0.26.6`,
nodeTypeCount=289, parses class_declaration / function_declaration as
expected). Same shape that PR1 used for Swift, just a different
publisher because the repomix WASM bundle does not include Kotlin.
`@tree-sitter-grammars` is the official tree-sitter org's GitHub
account, so this is the canonical upstream WASM source for Kotlin.
## Notes for reviewers
- `kotlinConfig` already existed as a stub (no `treeSitter` field), so
Android / JVM / Gradle codebases currently produce no structural
edges between `.kt` files. This PR adds the `treeSitter` field; the
existing plugin loader picks it up unchanged.
- **Visibility rule differs from Swift**: Kotlin's default visibility
is `public`, so the extractor treats *every* declaration with no
modifier as exported. Only an explicit `private` opts out. `internal`
and `protected` remain exported in the project-graph sense because
they are still resolvable from other files (within the module / via
inheritance).
- `class_declaration` in tree-sitter-kotlin is overloaded for class,
data class, sealed class, and interface (distinguished by the keyword
child and `modifiers > class_modifier`). The extractor handles all
four uniformly.
- `object_declaration` is a separate node type (Kotlin singletons) —
treated as a class-like entry with its own `name` and members.
- Primary-constructor parameters marked `val` / `var` are surfaced as
class properties; plain `parameter`s without `val/var` are
constructor-only and are NOT counted as properties (matching Kotlin
semantics).
- Import handling distinguishes the three forms: plain dotted
(`import a.b.C`), wildcard (`import a.b.*` → specifier `"*"`), and
aliased (`import a.b.C as Foo` → specifier `"Foo"`).
## Verification
- `pnpm lint` clean
- `pnpm --filter @understand-anything/core build` clean
- `pnpm --filter @understand-anything/skill build` clean
- `pnpm --filter @understand-anything/core test`: **692/692** (+22 new
Kotlin tests, matching the bar set by go-extractor.test.ts /
swift-extractor.test.ts)
- `pnpm test`: 196/196 (no regressions)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The /understand pipeline reads every code file twice during analysis:
once in compute-batches (`extractExports` for the cross-batch neighbour
map) and once again in extract-import-map (per-language config loaders).
Both sites used sequential `readFileSync` loops, so on the iOS repo in
issue #226 (~15k files) the disk-read time was effectively serialised
behind a single libuv thread while the rest of the pool sat idle.
## Changes
- `extractExports` now batches files into `IO_PARALLELISM = 64` slices
and issues all `readFile` calls in each slice through `Promise.all`,
letting libuv's worker-thread pool overlap disk reads. The
tree-sitter parse stays on the main thread because `web-tree-sitter`
is single-threaded WASM — pipelining the I/O while parses run is
where the wall-time savings come from.
- `loadTsConfigs`, `loadGoModules`, `loadPhpAutoloads` and
`buildResolutionContext` switch to async / `Promise.all` for the
same reason. `buildResolutionContext` also runs the three loader
passes concurrently (`Promise.all([...])`) since they're independent.
- A small `readFilesParallel(paths)` helper is added at the top of
`extract-import-map.mjs` so the three loaders share the same
error-preserving shape.
## Why behavior stays identical
- Each loader collects its candidate paths in `files[]` order *before*
issuing reads, then iterates `reads` in the same order to emit
warnings + populate output maps. So stderr order and the final map
contents are byte-identical to the previous sequential loops.
- `extractExports` collects per-file errors in-place in the
`Promise.all` callbacks and emits warnings during the post-read
serial loop, again in chunk order — so warning text and order match
the previous implementation.
- Tree-sitter parsing is unchanged: parses still run serially on the
main thread, just with reads pipelined alongside.
## What's NOT in this PR
- `buildFingerprintStore` and `analyzeChanges` in `core/fingerprint.ts`
have the same sequential pattern. They're left alone here because
they're part of the public `@understand-anything/core` API; making
them async would be a breaking change worth its own discussion.
Internal-only `.mjs` scripts are safe to refactor without API churn.
- No change to scan-project: most of its sync I/O is `statSync`
(metadata, not content) plus a handful of small `.gitignore` /
`.understandignore` reads. The parallelism win is marginal there.
## Verification
- `pnpm lint` clean
- `pnpm --filter @understand-anything/core build` clean
- `pnpm --filter @understand-anything/skill build` clean
- `pnpm test`: 196/196 — including
`test_compute_batches.test.mjs` (19 tests) and
`test_extract_import_map.test.mjs` (40 tests), which exercise both
changed pipelines end-to-end with fixture projects. No output
diff vs main.
Refs #76
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`enumerateViaGit` ran `git ls-files -co --exclude-standard` (newline-separated
output) and then `split('\n').map(trim)` on the result. Without `-z`,
`git ls-files` C-escapes any byte outside the locale's "safe" set and wraps
the path in double quotes — for example, a directory named `30. 🏗️ docs/`
comes back as `"30. \360\237\217\227\357\270\217 docs/"`. Downstream
consumers then can't round-trip those octal-quoted strings to real disk
paths, so every file under such directories is silently dropped from the
scan.
This is particularly biting on Windows (where the issue surfaces even with
UTF-8 locale settings) and for any project that uses emoji, accented
characters, or CJK codepoints in directory names — which is increasingly
common in design/spec/journal trees.
The fix is to use `-z` (NUL-terminated output), the same approach git
itself documents for downstream consumers (e.g. `xargs -0`). NUL-separated
chunks are raw bytes, so every codepoint round-trips back to its real disk
path on every platform. Split on `\0` instead of `\n`; drop the now-
unnecessary `.trim()`.
Verified on a real project with emoji-prefixed directory names:
bare `git ls-files`:
"30. \360\237\217\227\357\270\217\360\237\247\231\342\200\215..."
`git ls-files -z`:
30. 🏗️🧙♂️🔮 BD-CCSP/01. Demo's/DEMO--...
Discovered during a multi-agent scan of an Atlas Intelligence spoke repo;
~33 design-intent files in `30. 🏗️ BD-{app}/` directories were silently
dropped per scan. Full report: atlas-intelligence-io/fleet-feedback#491.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
PR #199 documented the community-reported Cursor Settings → Plugins
workaround in README.md only; the seven READMEs/* locale files still
said auto-discovery always works. Mirror that fallback here so non-English
readers hit the same fix without hunting issue #172.
Several users have reported that cloning the repo and opening it in Cursor
doesn't always trigger auto-discovery of the .cursor-plugin manifest. Add
the community-reported workaround (Cursor Settings → Plugins, paste repo
URL) to the Cursor section so it isn't only discoverable from issue
threads.
Closes#172
`model: inherit` is a Claude Code-specific keyword that means "use the
parent session's model." Other tools that read the same agent frontmatter
(opencode, codex, etc.) don't understand it and instead try to use
`inherit` as a literal model id, which the configured provider rejects.
Reproduction (from #167): opencode + deepseek runs `/understand`, the
project-scanner subagent dispatches with `model: inherit`, deepseek
returns `ProviderModelNotFoundError`, and the pipeline halts on every
subagent dispatch.
With the field omitted, each platform falls back to its own configured
default:
- Claude Code: user's default subagent model
- opencode / codex / etc.: globally configured model
Note for Claude Code Opus users: subagents will no longer auto-inherit
the Opus session model. If you want the previous behavior, set your
default subagent model globally — that single setting now controls all
nine agents.
Closes#167
Several users have reported that cloning the repo and opening it in Cursor
doesn't always trigger auto-discovery of the .cursor-plugin manifest. Add
the community-reported workaround (Cursor Settings → Plugins, paste repo
URL) to the Cursor section so it isn't only discoverable from issue
threads.
Closes#172
Add to all 8 READMEs (English + 7 translations):
- "Keep learning" section gets inline commands for incremental re-runs, the --auto-update post-commit hook, and scoping /understand to a subdirectory for huge monorepos
- "Under the Hood" gets a new "Tree-sitter + LLM hybrid" subsection explaining the deterministic-vs-semantic split that powers the pipeline
Adds phase status lines, batch progress with total count, and phase
completion confirmations to the skill definition. Users now see
[Phase N/7] headers and Batch X/N during analysis instead of
unnumbered batch lines with no context.
Fixes#182