Self-Reflection Evaluation Sample
This sample demonstrates the self-reflection pattern using Agent Framework and Azure AI Foundry's Groundedness Evaluator. For details, see Reflexion: Language Agents with Verbal Reinforcement Learning (NeurIPS 2023).
Overview
What it demonstrates:
- Iterative self-reflection loop that automatically improves responses based on groundedness evaluation
- Batch processing of prompts from JSONL files with progress tracking
- Using AzureOpenAIChatClient with Azure CLI authentication
- Comprehensive summary statistics and detailed result tracking
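The batch-processing step listed above can be pictured with a small sketch. This is not the sample's own code: the helper names iter_prompts and process_batch and the "query" field used in the usage note are hypothetical, and the agent call is replaced by a placeholder.

```python
import json
from typing import Iterable, Iterator, Optional


def iter_prompts(lines: Iterable[str], limit: Optional[int] = None) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line, stopping after `limit` records."""
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        if limit is not None and count >= limit:
            break
        yield json.loads(line)
        count += 1


def process_batch(lines: Iterable[str], limit: Optional[int] = None) -> list:
    """Walk the prompts in order and print a simple progress counter."""
    prompts = list(iter_prompts(lines, limit))
    results = []
    for index, record in enumerate(prompts):
        print(f"[{index + 1}/{len(prompts)}] Processing prompt {index}...")
        # Placeholder (assumption): the real sample would run the agent here.
        results.append({"index": index, "record": record})
    return results
```

An open file handle can be passed directly as lines, for example process_batch(open("my_prompts.jsonl", encoding="utf-8"), limit=10), since iterating a text file yields one line at a time.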
Prerequisites
Azure Resources
- Azure OpenAI: Deploy models (default: gpt-4.1 for both agent and judge)
- Azure CLI: Run az login to authenticate
Python Environment
pip install agent-framework-core azure-ai-projects pandas --pre
Environment Variables
# .env file
AZURE_AI_PROJECT_ENDPOINT=https://<your-ai-resource>.services.ai.azure.com/api/projects/<your-ai-project>/
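A script can fail early with a clear message when the variable is missing. The helper below is a hypothetical sketch, not part of the sample; it assumes the variable is already present in the process environment (Python does not read a .env file on its own, so a loader or an exported shell variable is needed).

```python
import os
from typing import Mapping, Optional


def get_project_endpoint(env: Optional[Mapping[str, str]] = None) -> str:
    """Return AZURE_AI_PROJECT_ENDPOINT, or raise a clear error if it is unset."""
    env = os.environ if env is None else env
    endpoint = env.get("AZURE_AI_PROJECT_ENDPOINT", "").strip()
    if not endpoint:
        raise RuntimeError(
            "AZURE_AI_PROJECT_ENDPOINT is not set; define it in your .env file "
            "or export it in the shell before running the sample."
        )
    return endpoint
```

Accepting an optional mapping keeps the check easy to exercise without touching the real environment.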
Running the Sample
# Basic usage
python self_reflection.py
# With options
python self_reflection.py --input my_prompts.jsonl \
--output results.jsonl \
--max-reflections 5 \
-n 10
CLI Options:
- --input, -i: Input JSONL file
- --output, -o: Output JSONL file
- --agent-model, -m: Agent model name (default: gpt-4.1)
- --judge-model, -e: Evaluator model name (default: gpt-4.1)
- --max-reflections: Max iterations (default: 3)
- --limit, -n: Process only first N prompts
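For readers who want to see how such an interface maps onto argparse, here is a minimal sketch that mirrors the flags and defaults listed above. It is an illustration rather than the parser in self_reflection.py; the function name build_parser is hypothetical, and because the defaults for --input and --output are not stated here, the sketch leaves them unset.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the flags documented for the sample (illustrative)."""
    parser = argparse.ArgumentParser(description="Self-reflection evaluation (sketch)")
    parser.add_argument("--input", "-i", help="Input JSONL file")
    parser.add_argument("--output", "-o", help="Output JSONL file")
    parser.add_argument("--agent-model", "-m", default="gpt-4.1", help="Agent model name")
    parser.add_argument("--judge-model", "-e", default="gpt-4.1", help="Evaluator model name")
    parser.add_argument("--max-reflections", type=int, default=3, help="Max iterations")
    parser.add_argument("--limit", "-n", type=int, default=None, help="Process only first N prompts")
    return parser
```

argparse converts dashes in long option names to underscores, so --max-reflections is read back as args.max_reflections.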
Understanding Results
The agent iteratively improves responses:
- Generate initial response
- Evaluate groundedness (1-5 scale)
- If score < 5, provide feedback and retry
- Stop at max iterations or perfect score (5/5)
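The steps above can be sketched as a plain loop. This is a simplified illustration under stated assumptions, not the sample's implementation: generate and evaluate are hypothetical callables standing in for the agent call and the groundedness evaluator, and the returned dictionary keys are invented for the example.

```python
from typing import Callable, Optional, Tuple

PERFECT_SCORE = 5  # top of the 1-5 groundedness scale


def self_reflect(
    prompt: str,
    generate: Callable[[str, Optional[str]], str],
    evaluate: Callable[[str, str], Tuple[int, str]],
    max_reflections: int = 3,
) -> dict:
    """Generate, score, and retry until a perfect score or the iteration cap."""
    best = {"response": None, "score": 0, "iteration": 0}
    feedback: Optional[str] = None  # no feedback exists before the first attempt
    iterations_run = 0
    for iteration in range(1, max_reflections + 1):
        iterations_run = iteration
        response = generate(prompt, feedback)
        score, feedback = evaluate(prompt, response)
        if score > best["score"]:
            # Keep the best attempt so far, since a later retry may score lower.
            best = {"response": response, "score": score, "iteration": iteration}
        if score >= PERFECT_SCORE:
            break
    best["iterations_run"] = iterations_run
    return best
```

Tracking the best attempt separately from the last one means a retry that scores lower than an earlier response does not replace it.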
Example output:
[1/31] Processing prompt 0...
Self-reflection iteration 1/3...
Groundedness score: 3/5
Self-reflection iteration 2/3...
Groundedness score: 5/5
✓ Perfect groundedness score achieved!
✓ Completed with score: 5/5 (best at iteration 2/3)
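Per-prompt outcomes like the one above can be rolled up into summary statistics. The sketch below uses only the standard library and assumes each result is a dictionary with hypothetical best_score and best_iteration keys; the sample's actual output schema may differ.

```python
from statistics import mean


def summarize(results: list) -> dict:
    """Aggregate per-prompt results into a few headline numbers."""
    if not results:
        return {"count": 0, "mean_score": None, "perfect_rate": None, "mean_best_iteration": None}
    scores = [r["best_score"] for r in results]
    return {
        "count": len(results),
        "mean_score": mean(scores),
        # Share of prompts that reached the top score of 5.
        "perfect_rate": sum(s == 5 for s in scores) / len(scores),
        "mean_best_iteration": mean(r["best_iteration"] for r in results),
    }
```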