diff --git a/docs/superpowers/specs/2026-04-09-understand-knowledge-design.md b/docs/superpowers/specs/2026-04-09-understand-knowledge-design.md new file mode 100644 index 0000000..2c63252 --- /dev/null +++ b/docs/superpowers/specs/2026-04-09-understand-knowledge-design.md @@ -0,0 +1,335 @@ +# /understand-knowledge — Personal Knowledge Base Plugin Design + +## Overview + +A new `/understand-knowledge` skill within the existing Understand Anything plugin that takes any folder of markdown notes and produces an interactive knowledge graph visualized in the existing dashboard. + +Inspired by Andrej Karpathy's LLM Wiki pattern — where an LLM compiles and maintains a structured wiki from raw sources — this plugin goes further by adding typed relationship discovery and interactive graph visualization that tools like Obsidian and Logseq cannot provide. + +### Goals + +- Accept any markdown-based knowledge base (Obsidian vault, Logseq graph, Dendron workspace, Foam, Karpathy-style LLM wiki, Zettelkasten, or plain markdown) +- Auto-detect the format and adapt parsing accordingly +- Use LLM analysis to discover implicit relationships beyond explicit links +- Produce a knowledge graph with typed nodes and edges +- Visualize in the existing dashboard with knowledge-specific layout, sidebar, and reading mode + +### Non-Goals + +- Real-time sync with the knowledge base tool (Obsidian, Logseq, etc.) +- Replacing the user's existing PKM tool — this is a visualization/analysis layer on top +- Supporting non-markdown formats (PDFs, bookmarks) in v1 + +--- + +## Schema Extensions + +### New Node Types (5) + +Added to the existing `NodeType` union (currently 16 types): + +```typescript +export type NodeType = + // existing (16) + | "file" | "function" | "class" | "module" | "concept" + | "config" | "document" | "service" | "table" | "endpoint" + | "pipeline" | "schema" | "resource" + | "domain" | "flow" | "step" + // knowledge (5 new → 21 total) + | "article" | "entity" | "topic" | "claim" | "source"; +``` + +| Type | What it represents | Example | +|------|-------------------|---------| +| `article` | A wiki/note page — the primary content unit | "LLM Knowledge Bases.md" | +| `entity` | A named thing: person, tool, paper, org, project | "Andrej Karpathy", "Obsidian" | +| `topic` | A thematic cluster grouping related articles | "Personal Knowledge Management" | +| `claim` | A specific assertion, insight, or takeaway | "RAG loses context at chunk boundaries" | +| `source` | Raw/reference material that articles are compiled from | A paper URL, a raw PDF reference | + +### New Edge Types (6) + +Added to the existing `EdgeType` union (currently 29 types): + +```typescript +export type EdgeType = + // existing (29) + | ... + // knowledge (6 new → 35 total) + | "cites" | "contradicts" | "builds_on" + | "exemplifies" | "categorized_under" | "authored_by"; +``` + +| Type | Direction | Meaning | +|------|-----------|---------| +| `cites` | article → source | References or draws from | +| `contradicts` | claim → claim | Conflicts or disagrees with | +| `builds_on` | article → article | Extends, refines, or deepens | +| `exemplifies` | entity → concept/topic | Is a concrete example of | +| `categorized_under` | article/entity → topic | Belongs to this theme | +| `authored_by` | article → entity | Written or created by | + +### New Metadata Interface + +```typescript +export interface KnowledgeMeta { + format?: "obsidian" | "logseq" | "dendron" | "foam" | "karpathy" | "zettelkasten" | "plain"; + wikilinks?: string[]; + backlinks?: string[]; + frontmatter?: Record; + sourceUrl?: string; + confidence?: number; // 0-1, for LLM-inferred relationships +} +``` + +Added as an optional field on `GraphNode`: + +```typescript +export interface GraphNode { + // ...existing fields + knowledgeMeta?: KnowledgeMeta; +} +``` + +### Graph-Level Kind Flag + +```typescript +export interface KnowledgeGraph { + version: string; + kind: "codebase" | "knowledge"; // NEW + project: ProjectMeta; + nodes: GraphNode[]; + edges: GraphEdge[]; + layers: Layer[]; + tour: TourStep[]; +} +``` + +The `kind` field tells the dashboard which layout, sidebar, and visual styling to use. For backward compatibility, graphs without a `kind` field default to `"codebase"`. + +--- + +## Format Detection & Format Guides + +### Auto-Detection Logic + +Scans the target directory for signature files/patterns. Priority order (first match wins): + +| Priority | Signal | Detected Format | +|----------|--------|----------------| +| 1 | `.obsidian/` directory | Obsidian | +| 2 | `logseq/` + `pages/` directories | Logseq | +| 3 | `.dendron.yml` or `*.schema.yml` | Dendron | +| 4 | `.foam/` or `.vscode/foam.json` | Foam | +| 5 | `raw/` + `wiki/` + `index.md` | Karpathy | +| 6 | `[[wikilinks]]` + unique ID prefixes in filenames | Zettelkasten | +| 7 | Fallback | Plain markdown | + +### Format Guides + +Located at `skills/understand-knowledge/formats/`. Each guide tells the LLM agents how to parse that format: + +``` +skills/understand-knowledge/ + SKILL.md + formats/ + obsidian.md — [[wikilinks]], [[note|alias]], [[note#heading]], + #tags, YAML frontmatter, .obsidian/ config, + dataview annotations, canvas files + logseq.md — block-based outliner, ((block-refs)), + journals/YYYY_MM_DD.md, pages/, + property:: value syntax, TODO/DONE states + dendron.md — dot-delimited hierarchy (a.b.c.md), + .schema.yml for structure validation, + cross-vault links, refactoring rules + foam.md — [[wikilinks]] + link reference definitions + at file bottom, .foam/config, placeholder links + karpathy.md — raw/ → wiki/ pipeline, index.md master map, + log.md append-only record, _meta/ state, + LLM-maintained cross-references + zettelkasten.md — atomic notes, unique ID prefixes (timestamps), + typed semantic links, one idea per note + plain.md — standard [markdown](links), folder hierarchy, + heading structure, no special conventions +``` + +Each format guide covers: +- How to parse links (wikilinks vs standard vs block refs) +- Where metadata lives (frontmatter vs inline properties vs block properties) +- What the folder structure means (journals/ = daily notes, pages/ = permanent notes) +- What conventions to respect vs what to infer + +### Format Guide Authoring Process + +Format guides must be research-backed. During implementation, the agent building each format guide must: +1. Read the official documentation for that format (Obsidian Help, Logseq docs, Dendron wiki, Foam docs, etc.) +2. Study real-world examples of that format's structure +3. Write the guide based on verified behavior, not assumptions + +--- + +## Agent Pipeline + +``` +knowledge-scanner → format-detector → article-analyzer → relationship-builder → graph-reviewer +``` + +### Agent Definitions + +| Agent | Input | Output | Model | +|-------|-------|--------|-------| +| `knowledge-scanner` | Target directory path | File manifest: all `.md` files with paths, sizes, first 20 lines preview | `inherit` | +| `format-detector` | File manifest + directory structure | Detected format + format-specific parsing hints | `inherit` | +| `article-analyzer` | Individual `.md` file + format guide | Per-file nodes (article, entities, claims) + explicit edges (wikilinks, tags) | `inherit` | +| `relationship-builder` | All per-file results | Cross-file implicit edges (builds_on, contradicts, categorized_under) + topic clustering + layers | `inherit` | +| `graph-reviewer` | Assembled graph | Validated graph — deduped entities, consistent edge weights, orphan detection | `inherit` | + +### Key Differences from Codebase Pipeline + +- **No tree-sitter** — markdown parsing is simpler, mostly regex + LLM interpretation +- **format-detector** replaces framework detection — picks the right format guide +- **article-analyzer** replaces file-analyzer — extracts knowledge concepts instead of code structure +- **relationship-builder** is the heavy LLM step — discovers implicit connections across files that explicit links miss +- **graph-reviewer** stays similar — validates the assembled graph for consistency + +### Intermediate Files + +Same pattern as codebase analysis: + +``` +.understand-anything/intermediate/ + knowledge-manifest.json — scanner output + format-detection.json — detected format + hints + article-*.json — per-file analysis + relationships.json — cross-file edges + knowledge-graph.json — final assembled graph +``` + +Intermediate files are cleaned up after graph assembly (same as codebase flow). + +### Incremental Mode (`--ingest`) + +When the user runs `/understand-knowledge --ingest path/to/new-source.md`: + +1. **knowledge-scanner** — runs on just the new file(s) +2. **format-detector** — skipped (format already known from initial scan) +3. **article-analyzer** — processes only new/changed files +4. **relationship-builder** — runs on new nodes against the existing graph, finds connections to what's already there +5. **graph-reviewer** — validates the merged result + +Existing nodes are preserved; only new nodes/edges are added or updated. + +--- + +## Dashboard Changes + +All changes are scoped to graphs with `"kind": "knowledge"`. + +### Vertical Flow Layout + +- Default to top-down vertical layout (like existing domain/business flow view) +- Topics at top → articles in middle → entities/claims/sources at bottom +- Reads like a knowledge hierarchy: broad themes flow down into specifics +- User can still switch to horizontal or force-directed layout via controls + +### Knowledge Sidebar + +Replaces NodeInfo when a knowledge graph is loaded: + +| Selection | Sidebar Shows | +|-----------|---------------| +| Nothing selected | ProjectOverview: format detected, total articles/entities/topics/claims/sources | +| Article node | Title, summary, tags, frontmatter metadata, backlinks list (clickable), outgoing links, related topics | +| Entity node | Name, type (person/tool/paper/org), articles that mention it, relationships to other entities | +| Topic node | Description, child articles, child entities, cross-topic connections | +| Claim node | Assertion text, supporting articles, contradicting claims (if any), confidence score | +| Source node | Original URL/path, articles that cite it, ingestion date | + +### Reading Mode + +- Clicking an article node triggers a reading panel that slides up from the bottom (same pattern as current code viewer overlay) +- Shows the full compiled markdown rendered as HTML +- Includes a mini backlinks sidebar within the panel +- Clicking a `[[wikilink]]` or entity reference in the reading panel navigates the graph to that node + +### Node Visual Styling + +| Node Type | Shape | Color Accent | +|-----------|-------|-------------| +| `article` | Rounded rectangle | Warm amber | +| `entity` | Circle | Soft blue | +| `topic` | Large rounded rectangle | Muted gold | +| `claim` | Diamond | Green/red depending on contradictions | +| `source` | Small square | Gray | + +### Edge Visual Styling + +| Edge Type | Style | +|-----------|-------| +| `cites` | Dashed line | +| `contradicts` | Red line | +| `builds_on` | Solid with arrow | +| `categorized_under` | Thin gray | +| `authored_by` | Dotted blue | +| `exemplifies` | Dotted green | + +--- + +## Skill Interface + +### Usage + +```bash +# Full scan — first time or rescan +/understand-knowledge + +# Point at a specific directory +/understand-knowledge path/to/my-notes + +# Incremental ingest — add new sources to existing graph +/understand-knowledge --ingest path/to/new-note.md +/understand-knowledge --ingest path/to/new-folder/ +``` + +### Behavior + +1. Auto-detects format (Obsidian, Logseq, Karpathy, etc.) +2. Announces: "Detected Obsidian vault with 342 notes. Scanning..." +3. Runs the agent pipeline (scanner → detector → analyzer → relationship-builder → reviewer) +4. Writes `knowledge-graph.json` to `.understand-anything/` with `"kind": "knowledge"` +5. Auto-triggers `/understand-dashboard` after completion + +### File Structure + +``` +skills/understand-knowledge/ + SKILL.md — skill entry point, orchestration logic + formats/ + obsidian.md + logseq.md + dendron.md + foam.md + karpathy.md + zettelkasten.md + plain.md +``` + +### Coexistence with `/understand` + +- `/understand` produces `"kind": "codebase"` graphs +- `/understand-knowledge` produces `"kind": "knowledge"` graphs +- Both write to `.understand-anything/knowledge-graph.json` +- Running one replaces the other +- To scope knowledge analysis to a subdirectory (e.g., `docs/` within a code repo), use `/understand-knowledge path/to/docs` + +--- + +## What This Enables That Nothing Else Does + +| Existing Tools | Limitation | Our Advantage | +|---------------|-----------|---------------| +| Obsidian graph view | Untyped edges — all links look the same | Typed edges: cites, contradicts, builds_on | +| Logseq graph | Only shows explicit links | LLM discovers implicit relationships | +| All PKM tools | Single-format only | Cross-format support with auto-detection | +| Karpathy LLM Wiki | Flat text wiki, no visualization | Interactive graph dashboard with guided tours | +| None | No knowledge graph tours | Tour mode walks through a knowledge base step by step |