Python: replace pre-commit with prek, add PEP 723 script deps, clean up dev dependencies (#3748)

* python: replace pre-commit with prek, add PEP 723 script deps, clean up dev dependencies - Replace pre-commit with prek (Rust-native, faster pre-commit alternative) - Move supported hooks to repo: builtin for zero-clone speed - Add new builtin hooks: trailing-whitespace, check-merge-conflict, detect-private-key, check-added-large-files - Update all hook versions to latest (pre-commit-hooks v6, pyupgrade v3.21.2, bandit 1.9.3, uv-pre-commit 0.10.0) - Add PEP 723 inline script metadata to 34 samples with external deps - Remove autogen-agentchat/autogen-ext from dev deps (now declared per-sample) - Remove unused dev deps: pytest-env, tomli-w - Add agent-framework-core>=1.0.0b260130 lower bound to all 21 packages - Update CI workflow to use j178/prek-action - Update docs: DEV_SETUP.md, AGENTS.md, CODING_STANDARD.md, SAMPLE_GUIDELINES.md * updated lock * python: fix prek config paths for local execution and CI workflow Remove global 'files: ^python/' filter and strip python/ prefix from all path patterns in .pre-commit-config.yaml so prek finds files when run from the python/ directory. Update CI workflow to use --cd python instead of --config path. Include trailing whitespace fixes and dev dependency cleanup. * python: move helper scripts to scripts/ folder and exclude from checks * python: exclude AGENTS.md from prek markdown code lint * python: exclude AGENTS.md and azure_ai_search sample from markdown lint * fix m365 sample * python: ignore CPY rule for samples with PEP 723 headers * fix in dev_setup * python: replace aiofiles with regular open in samples * python: suppress reportUnusedImport in markdown code block checker * python: use samples pyright config for markdown code block checker Write a temp pyrightconfig.json matching pyrightconfig.samples.json rules (typeCheckingMode=off, only reportMissingImports and reportAttributeAccessIssue). Filter output to only fail on these rules since syntax-level errors (top-level await, undefined vars) are expected in README documentation snippets. * python: use markdown-code-lint with fixed globs instead of prek file list The prek-markdown-code-lint task received all changed files including non-README markdown and files with pre-existing broken imports. Replace with the standard markdown-code-lint task which uses the correct glob patterns (README.md, packages/**/README.md, samples/**/*.md). * python: exclude READMEs with pre-existing broken imports from markdown lint * python: fix broken README code snippets instead of excluding them - ag-ui: replace TextContent (removed) with content.type == 'text' - durabletask: fix import path to durabletask.worker.TaskHubGrpcWorker - orchestrations: use constructor params instead of .participants() method - observability: mark deprecated code blocks as plain text, filter reportMissingImports to agent_framework modules only - remove README excludes from markdown-code-lint task * add revision to gaia download * feat(python): parallelize checks across packages Run (package × task) cross-product in parallel using ThreadPoolExecutor and subprocesses. Key changes: - Add scripts/task_runner.py with shared parallel execution engine - Update run_tasks_in_packages_if_exists.py to accept multiple tasks - Update run_tasks_in_changed_packages.py with --files flag and parallel support - Add check-packages poe task (fmt+lint+pyright+mypy in parallel) - Add prek-markdown-code-lint and prek-samples-check with change detection - Split CI code quality workflow into parallel prek and mypy jobs - Update DEV_SETUP.md to document new parallel behavior Core package changes still trigger checks on all packages. * feat(ci): split code quality into 4 parallel jobs Split the single prek job into parallel jobs: - pre-commit-hooks: lightweight hooks (SKIP=poe-check) - package-checks: fmt/lint/pyright/mypy via check-packages - samples-markdown: samples-lint, samples-syntax, markdown-code-lint - mypy: change-detected mypy checks All 4 jobs run concurrently (×2 Python versions = 8 runners). * feat(ci): use only Python 3.10 for code quality checks * refactor(python): add future annotations and remove quoted types Add `from __future__ import annotations` to 93 package files that used quoted string annotations, then run pyupgrade --py310-plus to remove the now-unnecessary quotes. Fixes https://github.com/microsoft/agent-framework/issues/3578
2026-06-16 21:04:09 +08:00 · 2026-02-09 18:51:01 +01:00
parent ad0dac3c86
commit 977c3adfb2
177 changed files with 1373 additions and 1010 deletions
@@ -0,0 +1,157 @@
+# Copyright (c) Microsoft. All rights reserved.
+
+"""Check code blocks in Markdown files for syntax errors."""
+
+import argparse
+from enum import Enum
+import glob
+import logging
+import os
+import tempfile
+import subprocess  # nosec
+
+from pygments import highlight  # type: ignore
+from pygments.formatters import TerminalFormatter
+from pygments.lexers import PythonLexer
+
+logger = logging.getLogger(__name__)
+logger.addHandler(logging.StreamHandler())
+logger.setLevel(logging.INFO)
+
+
+class Colors(str, Enum):
+    CEND = "\33[0m"
+    CRED = "\33[31m"
+    CREDBG = "\33[41m"
+    CGREEN = "\33[32m"
+    CGREENBG = "\33[42m"
+    CVIOLET = "\33[35m"
+    CGREY = "\33[90m"
+
+
+def with_color(text: str, color: Colors) -> str:
+    """Prints a string with the specified color."""
+    return f"{color.value}{text}{Colors.CEND.value}"
+
+
+def expand_file_patterns(patterns: list[str], skip_glob: bool = False) -> list[str]:
+    """Expand glob patterns to actual file paths."""
+    all_files: list[str] = []
+    for pattern in patterns:
+        if skip_glob:
+            # When skip_glob is True, treat patterns as literal file paths
+            # Only include if it's a markdown file
+            if pattern.endswith('.md'):
+                matches = glob.glob(pattern, recursive=False)
+                all_files.extend(matches)
+        else:
+            # Handle both relative and absolute paths with glob expansion
+            matches = glob.glob(pattern, recursive=True)
+            all_files.extend(matches)
+    return sorted(set(all_files))  # Remove duplicates and sort
+
+
+def extract_python_code_blocks(markdown_file_path: str) -> list[tuple[str, int]]:
+    """Extract Python code blocks from a Markdown file."""
+    with open(markdown_file_path, encoding="utf-8") as file:
+        lines = file.readlines()
+
+    code_blocks: list[tuple[str, int]] = []
+    in_code_block = False
+    current_block: list[str] = []
+
+    for i, line in enumerate(lines):
+        if line.strip().startswith("```python"):
+            in_code_block = True
+            current_block = []
+        elif line.strip().startswith("```"):
+            in_code_block = False
+            code_blocks.append(("\n".join(current_block), i - len(current_block) + 1))
+        elif in_code_block:
+            current_block.append(line)
+
+    return code_blocks
+
+
+def check_code_blocks(markdown_file_paths: list[str], exclude_patterns: list[str] | None = None) -> None:
+    """Check Python code blocks in a Markdown file for syntax errors."""
+    files_with_errors: list[str] = []
+    exclude_patterns = exclude_patterns or []
+
+    for markdown_file_path in markdown_file_paths:
+        # Skip files that match any exclude pattern
+        if any(pattern in markdown_file_path for pattern in exclude_patterns):
+            logger.info(f"Skipping {markdown_file_path} (matches exclude pattern)")
+            continue
+        code_blocks = extract_python_code_blocks(markdown_file_path)
+        had_errors = False
+        for code_block, line_no in code_blocks:
+            markdown_file_path_with_line_no = f"{markdown_file_path}:{line_no}"
+            logger.info("Checking a code block in %s...", markdown_file_path_with_line_no)
+
+            # Skip blocks that don't import agent_framework modules or import lab modules
+            if (all(
+                all(import_code not in code_block for import_code in [f"import {module}", f"from {module}"])
+                for module in ["agent_framework"]
+            ) or "agent_framework.lab" in code_block):
+                logger.info(f' {with_color("OK[ignored]", Colors.CGREENBG)}')
+                continue
+
+            with tempfile.TemporaryDirectory() as tmp_dir:
+                # Use the same rules as pyrightconfig.samples.json:
+                # typeCheckingMode=off, only reportMissingImports and reportAttributeAccessIssue enabled.
+                pyright_cfg = os.path.join(tmp_dir, "pyrightconfig.json")
+                with open(pyright_cfg, "w") as cfg:
+                    cfg.write(
+                        '{"include":["."],"typeCheckingMode":"off",'
+                        '"reportMissingImports":"error","reportAttributeAccessIssue":"error"}'
+                    )
+                tmp_file = os.path.join(tmp_dir, "snippet.py")
+                with open(tmp_file, "w", encoding="utf-8") as f:
+                    f.write(code_block)
+
+                result = subprocess.run(["uv", "run", "pyright", "-p", tmp_dir], capture_output=True, text=True, cwd=".")  # nosec
+                # Filter to only errors from our config rules; syntax-level errors
+                # (top-level await, etc.) are expected in README documentation snippets.
+                # Only flag reportMissingImports for agent_framework modules, not third-party packages.
+                relevant_errors = [
+                    line for line in result.stdout.splitlines()
+                    if ("reportMissingImports" in line and "agent_framework" in line)
+                    or "reportAttributeAccessIssue" in line
+                ]
+                if relevant_errors:
+                    highlighted_code = highlight(code_block, PythonLexer(), TerminalFormatter())  # type: ignore
+                    logger.info(
+                        f" {with_color('FAIL', Colors.CREDBG)}\n"
+                        f"{with_color('========================================================', Colors.CGREY)}\n"
+                        f"{with_color('Error', Colors.CRED)}: Pyright found issues in {with_color(markdown_file_path_with_line_no, Colors.CVIOLET)}:\n"
+                        f"{with_color('--------------------------------------------------------', Colors.CGREY)}\n"
+                        f"{highlighted_code}\n"
+                        f"{with_color('--------------------------------------------------------', Colors.CGREY)}\n"
+                        "\n"
+                        f"{with_color('pyright output:', Colors.CVIOLET)}\n"
+                        f"{with_color(result.stdout, Colors.CRED)}"
+                        f"{with_color('========================================================', Colors.CGREY)}\n"
+                    )
+                    had_errors = True
+                else:
+                    logger.info(f" {with_color('OK', Colors.CGREENBG)}")
+
+        if had_errors:
+            files_with_errors.append(markdown_file_path)
+
+    if files_with_errors:
+        raise RuntimeError("Syntax errors found in the following files:\n" + "\n".join(files_with_errors))
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description="Check code blocks in Markdown files for syntax errors.")
+    # Argument is a list of markdown files containing glob patterns
+    parser.add_argument("markdown_files", nargs="+", help="Markdown files to check (supports glob patterns).")
+    parser.add_argument("--exclude", action="append", help="Exclude files containing this pattern.")
+    parser.add_argument("--no-glob", action="store_true", help="Treat file arguments as literal paths (no glob expansion).")
+    args = parser.parse_args()
+
+    # Expand glob patterns to actual file paths (or skip if --no-glob)
+    expanded_files = expand_file_patterns(args.markdown_files, skip_glob=args.no_glob)
+    check_code_blocks(expanded_files, args.exclude)
@@ -0,0 +1,81 @@
+# Copyright (c) Microsoft. All rights reserved.
+
+"""Run task(s) only in packages that have changed files, in parallel by default."""
+
+import argparse
+from pathlib import Path
+
+from rich import print
+from task_runner import build_work_items, discover_projects, run_tasks
+
+
+def get_changed_packages(projects: list[Path], changed_files: list[str], workspace_root: Path) -> set[Path]:
+    """Determine which packages have changed files."""
+    changed_packages: set[Path] = set()
+    core_package_changed = False
+
+    for file_path in changed_files:
+        # Strip 'python/' prefix if present (when git diff is run from repo root)
+        file_path_str = str(file_path)
+        if file_path_str.startswith("python/"):
+            file_path_str = file_path_str[7:]  # Remove 'python/' prefix
+
+        # Convert to absolute path if relative
+        abs_path = Path(file_path_str)
+        if not abs_path.is_absolute():
+            abs_path = workspace_root / file_path_str
+
+        # Check which package this file belongs to
+        for project in projects:
+            project_abs = workspace_root / project
+            try:
+                # Check if the file is within this project directory
+                abs_path.relative_to(project_abs)
+                changed_packages.add(project)
+                # Check if the core package was changed
+                if project == Path("packages/core"):
+                    core_package_changed = True
+                break
+            except ValueError:
+                # File is not in this project
+                continue
+
+    # If core package changed, check all packages
+    if core_package_changed:
+        print("[yellow]Core package changed - checking all packages[/yellow]")
+        return set(projects)
+
+    return changed_packages
+
+
+def main() -> None:
+    parser = argparse.ArgumentParser(description="Run task(s) in changed packages, in parallel by default.")
+    parser.add_argument("tasks", nargs="+", help="Task name(s) to run")
+    parser.add_argument("--files", nargs="*", default=None, help="Changed files to determine which packages to run")
+    parser.add_argument("--seq", action="store_true", help="Run sequentially instead of in parallel")
+    args = parser.parse_args()
+
+    pyproject_file = Path(__file__).parent.parent / "pyproject.toml"
+    workspace_root = pyproject_file.parent
+    projects = discover_projects(pyproject_file)
+
+    # Determine which packages to check
+    if not args.files or args.files == ["."]:
+        task_list = ", ".join(args.tasks)
+        print(f"[yellow]No specific files provided, running {task_list} in all packages[/yellow]")
+        target_packages = sorted(set(projects))
+    else:
+        changed_packages = get_changed_packages(projects, args.files, workspace_root)
+        if changed_packages:
+            print(f"[cyan]Detected changes in packages: {', '.join(str(p) for p in sorted(changed_packages))}[/cyan]")
+        else:
+            print(f"[yellow]No changes detected in any package, skipping[/yellow]")
+            return
+        target_packages = sorted(changed_packages)
+
+    work_items = build_work_items(target_packages, args.tasks)
+    run_tasks(work_items, workspace_root, sequential=args.seq)
+
+
+if __name__ == "__main__":
+    main()
@@ -0,0 +1,29 @@
+# Copyright (c) Microsoft. All rights reserved.
+
+"""Run poe task(s) across all workspace packages, in parallel by default."""
+
+import argparse
+import sys
+from pathlib import Path
+
+from task_runner import build_work_items, discover_projects, run_tasks
+
+
+def main() -> None:
+    parser = argparse.ArgumentParser(
+        description="Run poe task(s) across all workspace packages, in parallel by default."
+    )
+    parser.add_argument("tasks", nargs="+", help="Task name(s) to run across packages")
+    parser.add_argument("--seq", action="store_true", help="Run sequentially instead of in parallel")
+    args = parser.parse_args()
+
+    pyproject_file = Path(__file__).parent.parent / "pyproject.toml"
+    workspace_root = pyproject_file.parent
+    projects = discover_projects(pyproject_file)
+
+    work_items = build_work_items(projects, args.tasks)
+    run_tasks(work_items, workspace_root, sequential=args.seq)
+
+
+if __name__ == "__main__":
+    main()
@@ -0,0 +1,150 @@
+# Copyright (c) Microsoft. All rights reserved.
+
+"""Shared utilities for running poe tasks across workspace packages in parallel."""
+
+import concurrent.futures
+import glob
+import os
+import subprocess
+import sys
+import time
+from pathlib import Path
+
+import tomli
+from rich import print
+
+
+def discover_projects(workspace_pyproject_file: Path) -> list[Path]:
+    """Discover all workspace projects from pyproject.toml."""
+    with workspace_pyproject_file.open("rb") as f:
+        data = tomli.load(f)
+
+    projects = data["tool"]["uv"]["workspace"]["members"]
+    exclude = data["tool"]["uv"]["workspace"].get("exclude", [])
+
+    all_projects: list[Path] = []
+    for project in projects:
+        if "*" in project:
+            globbed = glob.glob(str(project), root_dir=workspace_pyproject_file.parent)
+            globbed_paths = [Path(p) for p in globbed]
+            all_projects.extend(globbed_paths)
+        else:
+            all_projects.append(Path(project))
+
+    for project in exclude:
+        if "*" in project:
+            globbed = glob.glob(str(project), root_dir=workspace_pyproject_file.parent)
+            globbed_paths = [Path(p) for p in globbed]
+            all_projects = [p for p in all_projects if p not in globbed_paths]
+        else:
+            all_projects = [p for p in all_projects if p != Path(project)]
+
+    return all_projects
+
+
+def extract_poe_tasks(file: Path) -> set[str]:
+    """Extract poe task names from a pyproject.toml file."""
+    with file.open("rb") as f:
+        data = tomli.load(f)
+
+    tasks = set(data.get("tool", {}).get("poe", {}).get("tasks", {}).keys())
+
+    # Check if there is an include too
+    include: str | None = data.get("tool", {}).get("poe", {}).get("include", None)
+    if include:
+        include_file = file.parent / include
+        if include_file.exists():
+            tasks = tasks.union(extract_poe_tasks(include_file))
+
+    return tasks
+
+
+def build_work_items(projects: list[Path], task_names: list[str]) -> list[tuple[Path, str]]:
+    """Build cross-product of (package, task) for packages that define the task."""
+    work_items: list[tuple[Path, str]] = []
+    for project in projects:
+        available_tasks = extract_poe_tasks(project / "pyproject.toml")
+        for task in task_names:
+            if task in available_tasks:
+                work_items.append((project, task))
+    return work_items
+
+
+def _run_task_subprocess(project: Path, task: str, workspace_root: Path) -> tuple[Path, str, int, str, str, float]:
+    """Run a single poe task in a project directory via subprocess."""
+    start = time.monotonic()
+    cwd = workspace_root / project
+    result = subprocess.run(
+        ["uv", "run", "poe", task],
+        cwd=cwd,
+        capture_output=True,
+        text=True,
+    )
+    elapsed = time.monotonic() - start
+    return (project, task, result.returncode, result.stdout, result.stderr, elapsed)
+
+
+def _run_sequential(work_items: list[tuple[Path, str]]) -> None:
+    """Run tasks sequentially using in-process PoeThePoet (streaming output)."""
+    from poethepoet.app import PoeThePoet
+
+    for project, task in work_items:
+        print(f"Running task {task} in {project}")
+        app = PoeThePoet(cwd=project)
+        result = app(cli_args=[task])
+        if result:
+            sys.exit(result)
+
+
+def _run_parallel(work_items: list[tuple[Path, str]], workspace_root: Path) -> None:
+    """Run all (package × task) combinations in parallel via subprocesses."""
+    max_workers = min(len(work_items), os.cpu_count() or 4)
+    failures: list[tuple[Path, str, str, str]] = []
+    completed = 0
+    total = len(work_items)
+
+    print(f"[cyan]Running {total} task(s) in parallel (max {max_workers} workers)...[/cyan]")
+
+    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
+        futures = {
+            executor.submit(_run_task_subprocess, project, task, workspace_root): (project, task)
+            for project, task in work_items
+        }
+        for future in concurrent.futures.as_completed(futures):
+            project, task, returncode, stdout, stderr, elapsed = future.result()
+            completed += 1
+            progress = f"[{completed}/{total}]"
+            if returncode == 0:
+                print(f"  [green]✓[/green] {progress} {task} in {project} ({elapsed:.1f}s)")
+            else:
+                print(f"  [red]✗[/red] {progress} {task} in {project} ({elapsed:.1f}s)")
+                failures.append((project, task, stdout, stderr))
+
+    if failures:
+        print(f"\n[red]{len(failures)} task(s) failed:[/red]")
+        for project, task, stdout, stderr in failures:
+            print(f"\n[red]{'='*60}[/red]")
+            print(f"[red]FAILED: {task} in {project}[/red]")
+            if stdout.strip():
+                print(stdout)
+            if stderr.strip():
+                sys.stderr.write(stderr)
+        sys.exit(1)
+
+    print(f"\n[green]All {total} task(s) passed ✓[/green]")
+
+
+def run_tasks(work_items: list[tuple[Path, str]], workspace_root: Path, *, sequential: bool = False) -> None:
+    """Run work items either in parallel or sequentially.
+
+    Single items use in-process PoeThePoet for streaming output.
+    Multiple items use parallel subprocesses by default.
+    """
+    if not work_items:
+        print("[yellow]No matching tasks found in any package[/yellow]")
+        return
+
+    if sequential or len(work_items) == 1:
+        _run_sequential(work_items)
+    else:
+        _run_parallel(work_items, workspace_root)