mirror of
https://github.com/microsoft/agent-framework.git
synced 2026-06-16 21:04:09 +08:00
0521f5bed8
* [BREAKING] Rename ChatAgent -> Agent, ChatMessage -> Message, ChatClientProtocol -> SupportsChatGetResponse Simplify the public API by removing redundant 'Chat' prefix from core types: - ChatAgent -> Agent - RawChatAgent -> RawAgent - ChatMessage -> Message - ChatClientProtocol -> SupportsChatGetResponse Also renamed internal WorkflowMessage (was Message in _runner_context) to avoid collision. No backward compatibility aliases - this is a clean breaking change. * [BREAKING] Rename Agent chat_client parameter to client * Fix rebase issues: WorkflowMessage references and broken markdown links * Fix formatting and lint issues from code quality checks * Fix import ordering in workflow sample files * fixed rebase * Fix test failures: use WorkflowMessage and A2AMessage after ChatMessage→Message rename - Replace Message(data=..., source_id=...) with WorkflowMessage(...) in workflow tests - Fix isinstance check in A2A agent to use A2AMessage instead of Message - Fix import in test_workflow_observability.py (Message→WorkflowMessage) * Fix lint, fmt, and sample errors after ChatMessage→Message rename - Auto-fix 70+ ruff lint issues across samples (ChatMessage→Message refs) - Fix HostedVectorStoreContent→Content.from_hosted_vector_store in file search sample - Fix _normalize_messages→normalize_messages in custom agent sample - Fix context.terminate→raise MiddlewareTermination in middleware samples - Fix with_update_hook→with_transform_hook in override middleware sample - Add TOptions_co import back to custom_chat_client sample - Add noqa for FastAPI File() default in chatkit sample - Fix B023 loop variable capture in weather agent sample * fix: update Agent constructor calls from chat_client to client in declaration-only tool tests * fix: add register_cleanup to devui lazy-loading proxy and type stub * fixed tests and updated new pieces * fix agui typevar * fix merge errors * fix merge conflicts * fiux merge * Remove unused links --------- Co-authored-by: Evan Mattson <evan.mattson@microsoft.com>
0521f5bed8
·
2026-02-10 23:04:32 +00:00
History
Self-Reflection Evaluation Sample
This sample demonstrates the self-reflection pattern using Agent Framework and Azure AI Foundry's Groundedness Evaluator. For details, see Reflexion: Language Agents with Verbal Reinforcement Learning (NeurIPS 2023).
Overview
What it demonstrates:
- Iterative self-reflection loop that automatically improves responses based on groundedness evaluation
- Batch processing of prompts from JSONL files with progress tracking
- Using
AzureOpenAIChatClientwith Azure CLI authentication - Comprehensive summary statistics and detailed result tracking
Prerequisites
Azure Resources
- Azure OpenAI: Deploy models (default: gpt-4.1 for both agent and judge)
- Azure CLI: Run
az loginto authenticate
Python Environment
pip install agent-framework-core azure-ai-projects pandas --pre
Environment Variables
# .env file
AZURE_AI_PROJECT_ENDPOINT=https://<your-ai-resource>.services.ai.azure.com/api/projects/<your-ai-project>/
Running the Sample
# Basic usage
python self_reflection.py
# With options
python self_reflection.py --input my_prompts.jsonl \
--output results.jsonl \
--max-reflections 5 \
-n 10
CLI Options:
--input,-i: Input JSONL file--output,-o: Output JSONL file--agent-model,-m: Agent model name (default: gpt-4.1)--judge-model,-e: Evaluator model name (default: gpt-4.1)--max-reflections: Max iterations (default: 3)--limit,-n: Process only first N prompts
Understanding Results
The agent iteratively improves responses:
- Generate initial response
- Evaluate groundedness (1-5 scale)
- If score < 5, provide feedback and retry
- Stop at max iterations or perfect score (5/5)
Example output:
[1/31] Processing prompt 0...
Self-reflection iteration 1/3...
Groundedness score: 3/5
Self-reflection iteration 2/3...
Groundedness score: 5/5
✓ Perfect groundedness score achieved!
✓ Completed with score: 5/5 (best at iteration 2/3)