Self-Reflection Evaluation Sample
This sample demonstrates the self-reflection pattern using Agent Framework and Azure AI Foundry's Groundedness Evaluator. For details, see Reflexion: Language Agents with Verbal Reinforcement Learning (NeurIPS 2023).
Overview
What it demonstrates:
- Iterative self-reflection loop that automatically improves responses based on groundedness evaluation
- Batch processing of prompts from JSONL files with progress tracking
- Using `AzureOpenAIChatClient` with Azure CLI authentication
- Comprehensive summary statistics and detailed result tracking
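The batch-processing bullet above can be sketched with the standard library alone. This is a minimal illustration, not the sample's code. The input record schema is not documented in this README, so the `query` field and the `load_prompts`, `write_results`, and `run_batch` helpers are hypothetical, and the per-prompt step is a placeholder where the real sample runs its self-reflection loop.

```python
from __future__ import annotations

import json
from pathlib import Path


# Hypothetical helpers: the real sample's function names and record schema may differ.
def load_prompts(path: Path, limit: int | None = None) -> list[dict]:
    """Read one JSON object per non-blank line and keep only the first `limit` records."""
    with path.open(encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]
    return records if limit is None else records[:limit]


def write_results(path: Path, results: list[dict]) -> None:
    """Write each result dict as one JSON line."""
    with path.open("w", encoding="utf-8") as f:
        for result in results:
            f.write(json.dumps(result) + "\n")


def run_batch(input_path: Path, output_path: Path, limit: int | None = None) -> list[dict]:
    prompts = load_prompts(input_path, limit)
    results = []
    for i, record in enumerate(prompts):
        # Progress line shaped like the sample's example output.
        print(f"[{i + 1}/{len(prompts)}] Processing prompt {i}...")
        # Placeholder: the real sample runs the self-reflection loop here.
        results.append({"prompt_index": i, **record})
    write_results(output_path, results)
    return results
```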
Prerequisites
Azure Resources
- Azure OpenAI: Deploy models (default: gpt-4.1 for both agent and judge)
- Azure CLI: Run `az login` to authenticate
Python Environment
```bash
pip install agent-framework-core azure-ai-projects pandas --pre
```
Environment Variables
```bash
# .env file
AZURE_AI_PROJECT_ENDPOINT=https://<your-ai-resource>.services.ai.azure.com/api/projects/<your-ai-project>/
```
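A script that depends on this variable can fail fast when it is missing. The sketch below only reads the process environment. Whether the sample loads the `.env` file itself (for example with a package such as python-dotenv) is an assumption not confirmed here, and `get_project_endpoint` is a hypothetical helper.

```python
import os


def get_project_endpoint() -> str:
    """Hypothetical helper: return AZURE_AI_PROJECT_ENDPOINT or fail early with a clear message."""
    endpoint = os.environ.get("AZURE_AI_PROJECT_ENDPOINT", "").strip()
    if not endpoint:
        raise RuntimeError("Set AZURE_AI_PROJECT_ENDPOINT (for example in a .env file) before running.")
    if not endpoint.startswith("https://"):
        raise RuntimeError(f"Expected an https:// endpoint, got {endpoint!r}")
    return endpoint
```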
Running the Sample
```bash
# Basic usage
python self_reflection.py

# With options
python self_reflection.py --input my_prompts.jsonl \
  --output results.jsonl \
  --max-reflections 5 \
  -n 10
```
CLI Options:
- `--input`, `-i`: Input JSONL file
- `--output`, `-o`: Output JSONL file
- `--agent-model`, `-m`: Agent model name (default: gpt-4.1)
- `--judge-model`, `-e`: Evaluator model name (default: gpt-4.1)
- `--max-reflections`: Maximum number of iterations (default: 3)
- `--limit`, `-n`: Process only the first N prompts
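The options above map naturally onto Python's `argparse`. The parser below is a hypothetical reconstruction from this list, not the sample's actual parser; in particular, the default input and output file names are not documented in this README, so the sketch leaves them as `None`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented CLI options."""
    parser = argparse.ArgumentParser(description="Self-reflection groundedness evaluation (sketch)")
    # Default file names are not documented here, so they stay None in this sketch.
    parser.add_argument("--input", "-i", default=None, help="Input JSONL file")
    parser.add_argument("--output", "-o", default=None, help="Output JSONL file")
    parser.add_argument("--agent-model", "-m", default="gpt-4.1", help="Agent model name")
    parser.add_argument("--judge-model", "-e", default="gpt-4.1", help="Evaluator model name")
    parser.add_argument("--max-reflections", type=int, default=3, help="Max iterations")
    parser.add_argument("--limit", "-n", type=int, default=None, help="Process only first N prompts")
    return parser
```

argparse derives attribute names from the long options, so `--agent-model` becomes `args.agent_model` and `--max-reflections` becomes `args.max_reflections`.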
Understanding Results
The agent iteratively improves responses:
- Generate initial response
- Evaluate groundedness (1-5 scale)
- If score < 5, provide feedback and retry
- Stop when the score reaches 5/5 or the maximum number of iterations is reached
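The steps above amount to a generate, evaluate, retry loop that keeps the best-scoring attempt. In this sketch, `generate` and `evaluate` are hypothetical callables standing in for the agent call and the Azure AI Foundry Groundedness Evaluator, and the feedback wording is invented for illustration; the sample's real prompts and evaluator wiring may differ.

```python
from __future__ import annotations

from typing import Callable


def self_reflect(
    prompt: str,
    generate: Callable[[str, str | None], str],
    evaluate: Callable[[str, str], int],
    max_reflections: int = 3,
) -> dict:
    """Retry generation with feedback until groundedness is 5/5 or attempts run out.

    `generate` and `evaluate` are hypothetical stand-ins for the agent call and
    the groundedness evaluator; they are not Agent Framework APIs.
    """
    best = {"response": None, "score": 0, "iteration": 0}
    feedback: str | None = None  # no feedback on the first attempt
    for iteration in range(1, max_reflections + 1):
        print(f"Self-reflection iteration {iteration}/{max_reflections}...")
        response = generate(prompt, feedback)
        score = evaluate(prompt, response)  # groundedness on a 1-5 scale
        print(f"Groundedness score: {score}/5")
        # Keep the best attempt so a worse retry never replaces a better answer.
        if score > best["score"]:
            best = {"response": response, "score": score, "iteration": iteration}
        if score == 5:
            print("✓ Perfect groundedness score achieved!")
            break
        # Hypothetical feedback text passed to the next attempt.
        feedback = f"Your previous answer scored {score}/5 for groundedness. Use only the provided context."
    return best
```

Keeping the best attempt rather than the last one means a later retry that scores lower cannot overwrite a better earlier response.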
Example output:
```text
[1/31] Processing prompt 0...
Self-reflection iteration 1/3...
Groundedness score: 3/5
Self-reflection iteration 2/3...
Groundedness score: 5/5
✓ Perfect groundedness score achieved!
✓ Completed with score: 5/5 (best at iteration 2/3)
```
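Summary statistics like those mentioned in the overview can be computed from per-prompt results such as the one above. The sample installs pandas and may use it for this; the sketch below swaps in the standard-library `statistics` module instead, and the result field names (`score`, `iteration`) and the `summarize` helper are hypothetical.

```python
from statistics import mean


def summarize(results: list[dict]) -> dict:
    """Aggregate hypothetical per-prompt results shaped like {"score": 5, "iteration": 2}.

    Expects at least one result: statistics.mean raises StatisticsError on empty input.
    """
    scores = [r["score"] for r in results]
    return {
        "prompts": len(results),
        "mean_best_score": mean(scores),
        "perfect_scores": sum(1 for s in scores if s == 5),
        "mean_best_iteration": mean(r["iteration"] for r in results),
    }
```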