Commit Graph

580 Commits

  • feat(core): DartExtractor — mixin declarations
    Add mixin_declaration handling to extractStructure, folding mixins into
    classes[] (same convention as class_definition). The `on` constraint
    sibling is intentionally ignored for graph purposes.
    
    Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
  • feat(core): DartExtractor — constructor naming (default/named/factory)
    Add constructorName() helper and extend collectClassBody() to surface
    unnamed constructors as "ClassName", named constructors as "ClassName.named",
    and factory named constructors as "ClassName.named" in methods[]/functions[].
    Probe confirmed plan's AST shapes match exactly; extractReturnType returns
    undefined for all constructor forms (factory keyword is an unnamed node).
    
    Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
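
The naming convention in the commit above can be sketched as follows. This is only an illustration: the `ConstructorInfo` shape is hypothetical, and the signature shown for `constructorName` is an assumption, since the real helper reads these facts from tree-sitter nodes rather than a plain object.

```typescript
// Hypothetical input shape; the real extractor derives these from AST nodes.
interface ConstructorInfo {
  className: string;
  // Identifier after the dot, e.g. "fromJson" in `Point.fromJson(...)`.
  // Undefined for the unnamed (default) constructor.
  named?: string;
  isFactory: boolean;
}

// Unnamed constructors surface as "ClassName"; named constructors and
// factory named constructors both surface as "ClassName.named".
// The factory flag does not change the emitted name.
function constructorName(info: ConstructorInfo): string {
  return info.named ? `${info.className}.${info.named}` : info.className;
}
```

Under these assumptions, `Point()`, `Point.origin()` and `factory Point.fromJson()` would be listed as `Point`, `Point.origin` and `Point.fromJson`.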
  • feat(core): DartExtractor — top-level function extraction
    Add TDD tests and implement extractTopLevelFunction with helpers for
    extracting function name, params, and return type (including generics
    where the grammar emits type_identifier + type_arguments as siblings).
    
    Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
  • feat(core): scaffold DartExtractor + register in builtinExtractors
    Empty extractor that satisfies the LanguageExtractor interface so the
    plugin pipeline can load it. Real extraction logic lands in subsequent
    TDD commits.
  • feat(core): register dart LanguageConfig
    Adds the Dart language config and wires it into builtinLanguageConfigs
    so .dart files are recognized by the language registry. References the
    vendored @understand-anything/tree-sitter-dart-wasm package for grammar
    loading.
    
    No extractor yet — structural extraction lands in the next commit.
  • feat(tree-sitter-dart-wasm): vendor freshly-built dart WASM grammar
    The upstream tree-sitter-dart@1.0.0 ships a pre-`dylink.0` wasm that
    fails to load in web-tree-sitter@0.26.x. The grammar source itself is
    sound — rebuilding with the current tree-sitter-cli + wasi-sdk produces
    a working dylink.0 wasm. Vendor that artifact as a workspace-internal
    package so @understand-anything/core can depend on it via workspace:*.
    
    BUILD.md documents the provenance and rebuild instructions.
  • docs: implementation plan for Dart language support
    Thirteen-task TDD plan walking from vendoring the workspace wasm package
    through scaffolding the extractor and adding extraction logic in
    test-first slices: functions, classes, constructors, mixins, extensions,
    enums, imports, exports, visibility, and call graph.
    
    Every code block reflects AST shapes confirmed via a live probe against
    a freshly-built tree-sitter-dart wasm in the project's own
    web-tree-sitter at 0.26.x. No placeholder code, no "fill in later"
    references.
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
  • docs: revise Dart spec for workspace-vendored wasm
    Live verification during planning surfaced two facts that change the
    shipping strategy:
    
    1. tree-sitter-dart@1.0.0's prebuilt wasm uses the pre-`dylink.0` format
       and fails to load in web-tree-sitter@0.26.x (the version this project
       uses). Verified by directly loading the upstream wasm and catching
       the failure in getDylinkMetadata.
    
    2. The grammar source itself is sound — rebuilding with the current
       tree-sitter-cli@0.26.x + wasi-sdk-29 toolchain produces a working
       dylink.0-format wasm that parses every construct the extractor needs.
    
    Revised packaging: ship the freshly-built wasm as a workspace-internal
    package (@understand-anything/tree-sitter-dart-wasm) rather than
    depending on the broken upstream npm artifact. No loader changes
    required; existing TreeSitterPlugin resolves it the same way it
    resolves other tree-sitter packages.
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
  • docs: design spec for Dart language support
    Adds the brainstormed design for landing deep Dart support at parity with
    the recent Kotlin addition (PR #347): LanguageConfig + tree-sitter WASM grammar
    (tree-sitter-dart@1.0.0, verified to ship a prebuilt .wasm in its tarball) +
    DartExtractor + ~22 vitest cases. Six file changes, no edits to shared
    schemas/registries.
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
  • Merge pull request #419 from chengyongru/feat/nanobot-support
    feat(install): add Nanobot platform support
  • Merge pull request #347 from tirth8205/feat/kotlin-extractor
    feat(core): add Kotlin structural analysis via tree-sitter
  • Merge pull request #359 from tirth8205/fix/import-resolver-nodenext-rewrite
    fix(extract-import-map): apply NodeNext .js→.ts rewrite (#294)
  • Merge pull request #387 from tirth8205/fix/phase7-cleanup-mv-trash
    fix(skill): use mv-to-trash + delayed purge for Phase 7 cleanup (#301)
  • Merge pull request #346 from tirth8205/perf/understand-pipeline
    perf(understand): parallelise file I/O in compute-batches + extract-import-map (#76)
  • fix(skill): use mv-to-trash + delayed purge for Phase 7 cleanup (#301)
    Phase 7's `rm -rf` of the just-created `intermediate/` and `tmp/` dirs
    trips destructive-action gates on hardened hosts (e.g. freshness-window
    checks that flag deleting paths created moments earlier). Move them into
    a timestamped `.trash-<epoch>/` instead; Phase 0 reclaims the space once
    the trash is older than 7 days, well past any freshness window. Behavior
    on normal hosts is unchanged — disk usage is identical after the next
    run's purge.
    
    Closes #301
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
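
A minimal sketch of the retention check described above, assuming the `<epoch>` suffix is a Unix timestamp in seconds (the commit does not say whether it is seconds or milliseconds). The function names are hypothetical, and the sketch only decides which directory names qualify; it does not touch the filesystem.

```typescript
const TRASH_PREFIX = ".trash-";
const SEVEN_DAYS_SECONDS = 7 * 24 * 60 * 60;

// Assumption: the suffix is a whole number of seconds since the Unix epoch.
function trashDirName(nowSeconds: number): string {
  return `${TRASH_PREFIX}${Math.floor(nowSeconds)}`;
}

// Returns the entries that look like trash directories and are older than
// the retention window. Anything else is left alone.
function purgeableTrash(
  entries: string[],
  nowSeconds: number,
  maxAgeSeconds: number = SEVEN_DAYS_SECONDS
): string[] {
  return entries.filter((name) => {
    if (name.indexOf(TRASH_PREFIX) !== 0) return false;
    const suffix = name.slice(TRASH_PREFIX.length);
    // Ignore names whose suffix is not a plain numeric epoch.
    if (!/^\d+$/.test(suffix)) return false;
    return nowSeconds - Number(suffix) > maxAgeSeconds;
  });
}
```

Phase 7 would move `intermediate/` and `tmp/` into the directory named by `trashDirName`, and a later Phase 0 would delete only what `purgeableTrash` returns.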
  • fix(extract-import-map): preserve deterministic stderr order across concurrent loaders
    Addresses the regression flagged by ZebangCheng on #346: under the
    parallelised `buildResolutionContext`, `loadTsConfigs` /
    `loadGoModules` / `loadPhpAutoloads` ran concurrently but each wrote
    warnings to stderr inline as it iterated read results, so a fixture
    with both a malformed `tsconfig.json` and a malformed `composer.json`
    could emit `composer, tsconfig` instead of the pre-PR `tsconfig,
    composer` depending on I/O timing.
    
    Each loader now buffers its warnings into a returned array and the
    caller drains them in canonical order (tsconfig → go → php) after
    `Promise.all`, restoring byte-identical stderr output. Added a
    regression test whose fixture contains both malformed configs and asserts
    that the tsconfig warning precedes the composer warning in stderr.
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
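
The buffer-then-drain pattern described above can be sketched like this. `LoaderResult`, `Loader` and `runLoadersInOrder` are hypothetical names for illustration; the real loaders also return their parsed configuration, which is omitted here.

```typescript
// Hypothetical result shape: each loader returns its warnings instead of
// writing them to stderr while it runs.
interface LoaderResult {
  warnings: string[];
}

type Loader = () => Promise<LoaderResult>;

// Runs all loaders concurrently, then emits their warnings in the order the
// loaders were listed. Promise.all preserves input order in its result
// array, so completion timing cannot reorder the output.
async function runLoadersInOrder(
  loaders: Loader[],
  emit: (line: string) => void
): Promise<void> {
  const results = await Promise.all(loaders.map((load) => load()));
  for (const result of results) {
    for (const warning of result.warnings) {
      emit(warning);
    }
  }
}
```

Passing the loaders as `[tsconfig, go, php]` gives the canonical order even when the PHP loader finishes first.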
  • Merge pull request #378 from BozhengLong/feat/understand-language-auto-detection
    feat(understand): auto-detect conversation language on first run
  • feat(understand): auto-detect conversation language on first run
    When /understand runs with no --language flag and no stored outputLanguage,
    step 3.6 now infers the conversation language and — only when it is non-English
    — confirms once before generating, then persists the choice to config.json.
    English conversations keep the exact same silent `en` path; --language flag and
    stored config still take priority. README documents the behavior; version
    bumped 2.7.5 -> 2.7.6 across all five manifests (user-visible behavior change).
    
    Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
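
One way to read the precedence described above, as a hedged sketch. All names here are hypothetical, and placing the `--language` flag ahead of the stored config is an assumption; the commit only says that both take priority over inference.

```typescript
interface LanguageInputs {
  flag?: string;    // value of --language, if given
  stored?: string;  // outputLanguage from config.json, if present
  inferred: string; // language inferred from the conversation
}

interface LanguageDecision {
  language: string;
  // True when the user should be asked once before generating.
  needsConfirmation: boolean;
}

function resolveOutputLanguage(inputs: LanguageInputs): LanguageDecision {
  // Explicit choices win and never prompt.
  if (inputs.flag) return { language: inputs.flag, needsConfirmation: false };
  if (inputs.stored) return { language: inputs.stored, needsConfirmation: false };
  // English conversations keep the silent `en` path.
  if (inputs.inferred === "en") return { language: "en", needsConfirmation: false };
  // Non-English inference is confirmed once; persisting the confirmed
  // choice to config.json is left to the caller.
  return { language: inputs.inferred, needsConfirmation: true };
}
```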
  • Merge pull request #350 from tirth8205/chore/repo-hygiene
    chore(repo): community templates, CoC, SECURITY, package metadata, CI on main (#248, #249, #251, #252).
  • Merge pull request #332 from ZebangCheng/fix/phase7-preserve-scan-result
    fix(skill): preserve scan-result.json across Phase 7 cleanup for incremental runs (#293)
  • Merge branch 'main' into fix/import-resolver-nodenext-rewrite
    Resolve conflict in tests/skill/understand/test_extract_import_map.test.mjs
    by keeping both new test groups — they cover independent fixes that should
    coexist:
      - upstream #214: tsconfig path-alias targets with leading "./"
      - this PR  #294: NodeNext .js → .ts rewrite for ESM TypeScript imports
    
    The extract-import-map.mjs script auto-merged cleanly; both fixes are
    already present in the merged source.
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
  • fix(extract-import-map): apply NodeNext .js→.ts rewrite (#294)
    Fixes the silent near-edgeless-graph regression on any modern ESM
    TypeScript project. Reported in #294 with full repro + root-cause
    analysis.
    
    ### Why this matters
    
    Under `moduleResolution: NodeNext` (or `Node16` / `Bundler` with
    explicit extensions — the default for new TS-ESM projects since 2023),
    TypeScript does NOT rewrite import specifiers during compilation:
    
      // src/index.ts — real, idiomatic NodeNext source
      import { x } from './config.js';   // on disk: config.ts
    
    Before this fix, `probeWithExtensions` only tried APPENDING extensions
    to the import specifier:
    
      './config.js' → not in fileSet
      './config.js.ts', './config.js.tsx', './config.js.js', ... → all miss
      → returns null → edge dropped at merge as dangling
    
    Net result on the reporter's repro: a knowledge graph with hundreds of
    file nodes and almost no `imports` edges between them — silently
    removing exactly the dependency structure the graph is meant to show.
    
    ### Fix
    
    New `NODENEXT_REWRITES` table maps each compiled-output extension to
    the TypeScript source extensions that could have produced it:
    
      .js   → [.ts, .tsx, .js, .jsx]
      .jsx  → [.tsx, .jsx]
      .mjs  → [.mts, .mjs, .ts]
      .cjs  → [.cts, .cjs, .ts]
    
    `probeWithExtensions` now applies the rewrite when the import already
    ends with one of these extensions and no such file exists on disk. The
    rewrite runs BEFORE the legacy append-extensions loop — otherwise
    `./foo.js` would generate the nonsense candidate `foo.js.ts` and the
    append loop would never reach the actual `foo.ts`.
    
    ### Disambiguation
    
    If both `config.ts` and `config.js` exist on disk (rare, but possible
    during a partial migration), `import './config.js'` still resolves to
    the .js — that's an exact-disk match and what NodeNext compilation
    actually does. The rewrite only kicks in when the .js doesn't exist.
    
    ### Tests
    
    6 new tests in `test_extract_import_map.test.mjs`:
    - The main #294 case (`.js → .ts`)
    - `.jsx → .tsx` and `.mjs → .mts` rewrites
    - Disambiguation when both `.ts` and `.js` exist on disk
    - Pure-JS projects still work (real `.js → .js` imports)
    - Historical no-extension probes unaffected
    - Missing files still return null (rewrite can't invent targets)
    
    Total: 202 tests passing (was 196).
    
    Closes #294
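
The probe order described above (exact match, then NodeNext rewrite, then legacy append) can be sketched as below. The `NODENEXT_REWRITES` table is copied from the commit message; the function signature, the `APPEND_EXTENSIONS` list and the use of a `Set` of already-normalised project paths are assumptions for illustration, not the script's actual code.

```typescript
// Compiled-output extension -> source extensions that could have produced it.
const NODENEXT_REWRITES: { [ext: string]: string[] } = {
  ".js": [".ts", ".tsx", ".js", ".jsx"],
  ".jsx": [".tsx", ".jsx"],
  ".mjs": [".mts", ".mjs", ".ts"],
  ".cjs": [".cts", ".cjs", ".ts"],
};

// Assumption: a stand-in for the legacy append list; the real list may differ.
const APPEND_EXTENSIONS = [".ts", ".tsx", ".js", ".jsx"];

function probeWithExtensions(specifierPath: string, fileSet: Set<string>): string | null {
  // 1. An exact on-disk match always wins (the disambiguation rule).
  if (fileSet.has(specifierPath)) return specifierPath;

  // 2. NodeNext rewrite: swap a compiled-output extension for a source one.
  const dot = specifierPath.lastIndexOf(".");
  const slash = specifierPath.lastIndexOf("/");
  if (dot > slash) {
    const base = specifierPath.slice(0, dot);
    const sources = NODENEXT_REWRITES[specifierPath.slice(dot)];
    if (sources) {
      for (const sourceExt of sources) {
        if (fileSet.has(base + sourceExt)) return base + sourceExt;
      }
    }
  }

  // 3. Legacy behaviour: append extensions to the specifier as written.
  for (const ext of APPEND_EXTENSIONS) {
    if (fileSet.has(specifierPath + ext)) return specifierPath + ext;
  }

  // The rewrite cannot invent targets that are not on disk.
  return null;
}
```

Running the rewrite before the append loop is what lets `src/config.js` reach `src/config.ts` instead of stopping at candidates such as `src/config.js.ts`.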
  • chore(repo): add issue/PR templates, SECURITY.md, CoC, package metadata; widen CI triggers
    Closes a cluster of community-profile gaps (#248, #249, #251, #252) in one
    PR rather than four micro-PRs that all touch the same surface area.
    
    ### Templates (#251, #252)
    
    - .github/ISSUE_TEMPLATE/bug_report.yml — required fields for repro
      (plugin version, platform, OS, project language, file count); the five
      pieces of context that are missing from nearly every current bug report.
    - .github/ISSUE_TEMPLATE/feature_request.yml — leads with the *problem*
      rather than the proposed solution, which keeps maintainer review focused
      on whether to solve, not just how.
    - .github/ISSUE_TEMPLATE/question.yml — separate from bug to keep the
      bug queue triagable.
    - .github/ISSUE_TEMPLATE/config.yml — disables blank issues and routes
      general discussion to README + Discussions.
    - .github/PULL_REQUEST_TEMPLATE.md — includes the version-bump checklist
      that CLAUDE.md says must stay in sync across 5 manifests; otherwise
      every contributor learns this rule by getting their PR bounced.
    
    ### Community files
    
    - CODE_OF_CONDUCT.md — short, project-specific document that names the
      expectations and reporting path. Not a verbatim Contributor Covenant
      to keep it readable.
    - SECURITY.md — describes the project's local-only threat model
      explicitly so reporters know what's in / out of scope before they
      spend time on a writeup. Points at GitHub private vulnerability
      reporting as the primary channel.
    
    ### CI (#249)
    
    - ci.yml now also runs on pushes to main, not only PRs. Without this,
      a direct push to main (which happens when maintainers merge a PR
      branch locally) doesn't trigger CI, so a regression can land green-
      looking and stay broken for days.
    - Added a concurrency group that cancels stale runs for the same ref.
      Saves runner minutes and keeps the per-ref status meaningful.
    - Used `github.ref` (a controlled value), not user-controlled input,
      so no script-injection surface.
    
    ### package.json (#248)
    
    - Added description, license, repository, bugs, homepage, keywords —
      the standard set for npm package discoverability and so GitHub's
      community-profile check shows the project at 100%.
  • feat(core): add Kotlin structural analysis via tree-sitter
    Wires Kotlin into the existing tree-sitter pipeline so .kt and .kts
    files now produce functions, classes, data classes, sealed classes,
    interfaces, objects, imports, exports, and call-graph edges — matching
    the behavior of the other language extractors.
    
    ## Why @tree-sitter-grammars/tree-sitter-kotlin
    
    The standard `tree-sitter-kotlin` (v0.3.8) ships only native bindings.
    The new `@tree-sitter-grammars/tree-sitter-kotlin@1.1.0` ships a
    prebuilt `.wasm` (loads cleanly with `web-tree-sitter@^0.26.6`,
    nodeTypeCount=289, parses class_declaration / function_declaration as
    expected). Same shape that PR1 used for Swift, just a different
    publisher because the repomix WASM bundle does not include Kotlin.
    
    `@tree-sitter-grammars` is the official tree-sitter org's GitHub
    account, so this is the canonical upstream WASM source for Kotlin.
    
    ## Notes for reviewers
    
    - `kotlinConfig` already existed as a stub (no `treeSitter` field), so
      Android / JVM / Gradle codebases currently produce no structural
      edges between `.kt` files. This PR adds the `treeSitter` field; the
      existing plugin loader picks it up unchanged.
    - **Visibility rule differs from Swift**: Kotlin's default visibility
      is `public`, so the extractor treats *every* declaration with no
      modifier as exported. Only an explicit `private` opts out. `internal`
      and `protected` remain exported in the project-graph sense because
      they are still resolvable from other files (within the module / via
      inheritance).
    - `class_declaration` in tree-sitter-kotlin is overloaded for class,
      data class, sealed class, and interface (distinguished by the keyword
      child and `modifiers > class_modifier`). The extractor handles all
      four uniformly.
    - `object_declaration` is a separate node type (Kotlin singletons) —
      treated as a class-like entry with its own `name` and members.
    - Primary-constructor parameters marked `val` / `var` are surfaced as
      class properties; plain `parameter`s without `val/var` are
      constructor-only and are NOT counted as properties (matching Kotlin
      semantics).
    - Import handling distinguishes the three forms: plain dotted
      (`import a.b.C`), wildcard (`import a.b.*` → specifier `"*"`), and
      aliased (`import a.b.C as Foo` → specifier `"Foo"`).
    
    ## Verification
    
    - `pnpm lint` clean
    - `pnpm --filter @understand-anything/core build` clean
    - `pnpm --filter @understand-anything/skill build` clean
    - `pnpm --filter @understand-anything/core test`: **692/692** (+22 new
      Kotlin tests, matching the bar set by go-extractor.test.ts /
      swift-extractor.test.ts)
    - `pnpm test`: 196/196 (no regressions)
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
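
The visibility rule and the three import forms from the reviewer notes can be illustrated with a small sketch. It works on plain strings rather than tree-sitter nodes, so it is not the extractor's implementation; `parseKotlinImport`, `isExported` and the `KotlinImport` shape are hypothetical, and using the last path segment as the specifier for a plain import is an assumption.

```typescript
interface KotlinImport {
  path: string;      // dotted path as written, without any trailing ".*"
  specifier: string; // local name, or "*" for a wildcard import
}

const KOTLIN_IMPORT_RE =
  /^\s*import\s+([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)(\.\*)?(?:\s+as\s+([A-Za-z_]\w*))?\s*$/;

function parseKotlinImport(line: string): KotlinImport | null {
  const match = KOTLIN_IMPORT_RE.exec(line);
  if (!match) return null;
  const path = match[1];
  if (match[2]) return { path, specifier: "*" };      // import a.b.*
  if (match[3]) return { path, specifier: match[3] }; // import a.b.C as Foo
  const segments = path.split(".");
  // Assumption: a plain import binds the last segment as its local name.
  return { path, specifier: segments[segments.length - 1] };
}

// Kotlin's default visibility is public, so only an explicit `private`
// opts out; `internal` and `protected` still count as exported.
function isExported(modifiers: string[]): boolean {
  return modifiers.indexOf("private") === -1;
}
```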
  • perf(understand): parallelise file I/O in compute-batches + extract-import-map (#76)
    The /understand pipeline reads every code file twice during analysis:
    once in compute-batches (`extractExports` for the cross-batch neighbour
    map) and once again in extract-import-map (per-language config loaders).
    Both sites used sequential `readFileSync` loops, so on the iOS repo in
    issue #226 (~15k files) the disk reads were serialised on the main
    thread while libuv's worker-thread pool sat idle.
    
    ## Changes
    
    - `extractExports` now batches files into `IO_PARALLELISM = 64` slices
      and issues all `readFile` calls in each slice through `Promise.all`,
      letting libuv's worker-thread pool overlap disk reads. The
      tree-sitter parse stays on the main thread because `web-tree-sitter`
      is single-threaded WASM — pipelining the I/O while parses run is
      where the wall-time savings come from.
    - `loadTsConfigs`, `loadGoModules`, `loadPhpAutoloads` and
      `buildResolutionContext` switch to async / `Promise.all` for the
      same reason. `buildResolutionContext` also runs the three loader
      passes concurrently (`Promise.all([...])`) since they're independent.
    - A small `readFilesParallel(paths)` helper is added at the top of
      `extract-import-map.mjs` so the three loaders share the same
      error-preserving shape.
    
    ## Why behavior stays identical
    
    - Each loader collects its candidate paths in `files[]` order *before*
      issuing reads, then iterates `reads` in the same order to emit
      warnings + populate output maps. So stderr order and the final map
      contents are byte-identical to the previous sequential loops.
    - `extractExports` collects per-file errors in-place in the
      `Promise.all` callbacks and emits warnings during the post-read
      serial loop, again in chunk order — so warning text and order match
      the previous implementation.
    - Tree-sitter parsing is unchanged: parses still run serially on the
      main thread, just with reads pipelined alongside.
    
    ## What's NOT in this PR
    
    - `buildFingerprintStore` and `analyzeChanges` in `core/fingerprint.ts`
      have the same sequential pattern. They're left alone here because
      they're part of the public `@understand-anything/core` API; making
      them async would be a breaking change worth its own discussion.
      Internal-only `.mjs` scripts are safe to refactor without API churn.
    - No change to scan-project: most of its sync I/O is `statSync`
      (metadata, not content) plus a handful of small `.gitignore` /
      `.understandignore` reads. The parallelism win is marginal there.
    
    ## Verification
    
    - `pnpm lint` clean
    - `pnpm --filter @understand-anything/core build` clean
    - `pnpm --filter @understand-anything/skill build` clean
    - `pnpm test`: 196/196 — including
      `test_compute_batches.test.mjs` (19 tests) and
      `test_extract_import_map.test.mjs` (40 tests), which exercise both
      changed pipelines end-to-end with fixture projects. No output
      diff vs main.
    
    Refs #76
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
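
A minimal sketch of the slice-and-`Promise.all` pattern described above. The name `readFilesParallel` and the value `IO_PARALLELISM = 64` come from the commit message, but the signature is an assumption: the reader is injected as a parameter so the sketch stays self-contained, and the `ReadResult` shape is hypothetical.

```typescript
const IO_PARALLELISM = 64;

// Hypothetical error-preserving shape: a failed read keeps its slot in the
// output instead of rejecting the whole batch.
interface ReadResult {
  path: string;
  content?: string;
  error?: unknown;
}

async function readFilesParallel(
  paths: string[],
  readFile: (path: string) => Promise<string>,
  parallelism: number = IO_PARALLELISM
): Promise<ReadResult[]> {
  const results: ReadResult[] = [];
  for (let start = 0; start < paths.length; start += parallelism) {
    const slice = paths.slice(start, start + parallelism);
    // All reads in a slice are in flight at once; Promise.all returns them
    // in input order, so later processing sees the same order as a
    // sequential loop would.
    const settled = await Promise.all(
      slice.map((path) =>
        readFile(path).then(
          (content): ReadResult => ({ path, content }),
          (error): ReadResult => ({ path, error })
        )
      )
    );
    for (const result of settled) results.push(result);
  }
  return results;
}
```

In Node.js the injected reader could be something like `(p) => fs.promises.readFile(p, "utf8")`. Warnings would then be emitted while iterating the returned array, which keeps their order stable.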
  • Merge pull request #235 from ZebangCheng/feat/add-trae-platform
    feat(install): add Trae (ByteDance AI IDE) platform support (#229)
  • Merge pull request #227 from ZebangCheng/fix/tsconfig-path-leading-dot-slash
    fix(extract-import-map): normalize tsconfig path-alias candidates with leading "./" (#214)
  • Merge pull request #208 from evanclan/docs/cursor-manual-install-i18n
    docs(readme): add Cursor manual install fallback to 7 translated READMEs
  • Merge pull request #231 from atlas-architect/fix/scan-project-non-ascii-paths
    fix(scan-project): preserve non-ASCII path bytes via `git ls-files -z`
  • fix(scan-project): preserve non-ASCII path bytes via git ls-files -z
    `enumerateViaGit` ran `git ls-files -co --exclude-standard` (newline-separated
    output) and then `split('\n').map(trim)` on the result. Without `-z`,
    `git ls-files` C-escapes any byte outside the locale's "safe" set and wraps
    the path in double quotes — for example, a directory named `30. 🏗️ docs/`
    comes back as `"30. \360\237\217\227\357\270\217 docs/"`. Downstream
    consumers then can't round-trip those octal-quoted strings to real disk
    paths, so every file under such directories is silently dropped from the
    scan.
    
    This is particularly painful on Windows (where the issue surfaces even with
    UTF-8 locale settings) and for any project that uses emoji, accented
    characters, or CJK codepoints in directory names — which is increasingly
    common in design/spec/journal trees.
    
    The fix is to use `-z` (NUL-terminated output), the same approach git
    itself documents for downstream consumers (e.g. `xargs -0`). NUL-separated
    chunks are raw bytes, so every codepoint round-trips back to its real disk
    path on every platform. Split on `\0` instead of `\n`; drop the now-
    unnecessary `.trim()`.
    
    Verified on a real project with emoji-prefixed directory names:
    
      bare `git ls-files`:
        "30. \360\237\217\227\357\270\217\360\237\247\231\342\200\215..."
    
      `git ls-files -z`:
        30. 🏗️🧙‍♂️🔮 BD-CCSP/01. Demo's/DEMO--...
    
    Discovered during a multi-agent scan of an Atlas Intelligence spoke repo;
    ~33 design-intent files in `30. 🏗️ BD-{app}/` directories were silently
    dropped per scan. Full report: atlas-intelligence-io/fleet-feedback#491.
    
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
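
The parsing change described above amounts to splitting on NUL and leaving each entry untouched. A minimal sketch, with a hypothetical function name; dropping empty entries is an assumption added here to handle the trailing NUL that terminates the last path.

```typescript
// Parses the stdout of `git ls-files -z`: entries are NUL-terminated and
// unquoted, so each chunk is the path exactly as it exists on disk.
function parseLsFilesZ(stdout: string): string[] {
  // No trim(): leading or trailing spaces are legitimate parts of a path.
  return stdout.split("\0").filter((entry) => entry.length > 0);
}
```

With this shape, a path such as `30. 🏗️ docs/a.md` comes back as written rather than as an octal-escaped, double-quoted string.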
  • docs(readme): add Cursor manual install fallback to 7 translated READMEs (#172)
    PR #199 documented the community-reported Cursor Settings → Plugins
    workaround in README.md only; the seven READMEs/* locale files still
    said auto-discovery always works. Mirror that fallback here so non-English
    readers hit the same fix without hunting issue #172.
  • Merge pull request #200 from AsimRaza10/fix/agent-model-omit-inherit
    fix(agents): omit `model: inherit` so non-Claude tools don't see a bad model id
  • docs(readme): note Cursor manual install fallback when auto-discovery fails (#199)
    Several users have reported that cloning the repo and opening it in Cursor
    doesn't always trigger auto-discovery of the .cursor-plugin manifest. Add
    the community-reported workaround (Cursor Settings → Plugins, paste repo
    URL) to the Cursor section so it isn't only discoverable from issue
    threads.
    
    Closes #172
  • Merge pull request #204 from Lum1104/feat/semantic-batching-and-output-chunking
    fix(#159): semantic batching + bundled importMap + Phase 1 speedup
  • fix(agents): omit model: inherit so non-Claude tools don't see a bad model id
    `model: inherit` is a Claude Code-specific keyword that means "use the
    parent session's model." Other tools that read the same agent frontmatter
    (opencode, codex, etc.) don't understand it and instead try to use
    `inherit` as a literal model id, which the configured provider rejects.
    
    Reproduction (from #167): opencode + deepseek runs `/understand`, the
    project-scanner subagent dispatches with `model: inherit`, deepseek
    returns `ProviderModelNotFoundError`, and the pipeline halts on every
    subagent dispatch.
    
    With the field omitted, each platform falls back to its own configured
    default:
    - Claude Code: user's default subagent model
    - opencode / codex / etc.: globally configured model
    
    Note for Claude Code Opus users: subagents will no longer auto-inherit
    the Opus session model. If you want the previous behavior, set your
    default subagent model globally — that single setting now controls all
    nine agents.
    
    Closes #167
  • docs(readme): note Cursor manual install fallback when auto-discovery fails
    Several users have reported that cloning the repo and opening it in Cursor
    doesn't always trigger auto-discovery of the .cursor-plugin manifest. Add
    the community-reported workaround (Cursor Settings → Plugins, paste repo
    URL) to the Cursor section so it isn't only discoverable from issue
    threads.
    
    Closes #172
  • Merge pull request #186 from AsimRaza10/fix/tailwind-source-detection
    fix(dashboard): explicit @source for Tailwind v4 (fixes #179)
  • Merge pull request #187 from devangpratap/fix/progress-reporting
    fix(ux): add progress reporting to /understand pipeline
  • docs(readme): document incremental updates, subdir scoping, and tree-sitter+LLM split
    Add to all 8 READMEs (English + 7 translations):
    - "Keep learning" section gets inline commands for incremental re-runs, the --auto-update post-commit hook, and scoping /understand to a subdirectory for huge monorepos
    - "Under the Hood" gets a new "Tree-sitter + LLM hybrid" subsection explaining the deterministic-vs-semantic split that powers the pipeline
  • fix(ux): add progress reporting to /understand pipeline
    Adds phase status lines, batch progress with total count, and phase
    completion confirmations to the skill definition. Users now see
    [Phase N/7] headers and Batch X/N during analysis instead of
    unnumbered batch lines with no context.
    
    Fixes #182