

Self-Reflection Evaluation Sample

This sample demonstrates the self-reflection pattern using Agent Framework and Azure AI Foundry's Groundedness Evaluator. For details, see Reflexion: Language Agents with Verbal Reinforcement Learning (NeurIPS 2023).

Overview

What it demonstrates:

- An iterative self-reflection loop that automatically improves responses based on groundedness evaluation
- Batch processing of prompts from JSONL files, with progress tracking
- Use of `AzureOpenAIChatClient` with Azure CLI authentication
- Comprehensive summary statistics and detailed per-prompt result tracking
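Batch input is plain JSONL: one JSON object per line. The sketch below shows one way to load such a file while honoring a `--limit`-style cap. The helper name `load_prompts` and the `query` field used in the usage note are hypothetical; the actual record schema expected by `self_reflection.py` is not described here.

```python
import json
from typing import Iterable, Optional


def load_prompts(lines: Iterable[str], limit: Optional[int] = None) -> list[dict]:
    """Parse JSONL records (hypothetical helper), skipping blank lines.

    If `limit` is given, at most `limit` records are returned, mirroring
    the sample's `-n/--limit` option.
    """
    records: list[dict] = []
    for line in lines:
        # Check the cap before parsing so limit=0 returns nothing.
        if limit is not None and len(records) >= limit:
            break
        line = line.strip()
        if not line:
            continue  # tolerate blank or trailing lines
        records.append(json.loads(line))
    return records
```

An open file works as the `lines` argument, e.g. `with open("my_prompts.jsonl", encoding="utf-8") as f: prompts = load_prompts(f, limit=10)`.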

Prerequisites

Azure Resources

- Azure OpenAI: deploy the models to use (default: `gpt-4.1` for both the agent and the judge)
- Azure CLI: run `az login` to authenticate

Python Environment

```bash
pip install agent-framework-core azure-ai-projects pandas --pre
```

Environment Variables

```bash
# .env file
AZURE_AI_PROJECT_ENDPOINT=https://<your-ai-resource>.services.ai.azure.com/api/projects/<your-ai-project>/
```

Running the Sample

```bash
# Basic usage
python self_reflection.py

# With options
python self_reflection.py --input my_prompts.jsonl \
                          --output results.jsonl \
                          --max-reflections 5 \
                          -n 10
```

CLI options:

- `--input`, `-i`: input JSONL file
- `--output`, `-o`: output JSONL file
- `--agent-model`, `-m`: agent model name (default: `gpt-4.1`)
- `--judge-model`, `-e`: evaluator model name (default: `gpt-4.1`)
- `--max-reflections`: maximum number of iterations (default: 3)
- `--limit`, `-n`: process only the first N prompts
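The options above map directly onto the standard `argparse` module. This is a minimal sketch assuming only the flags and defaults listed; it is not the sample's own parser. The input and output defaults are not stated here, so they are left as `None` rather than guessed.

```python
import argparse
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Sketch of a parser for the documented CLI options."""
    parser = argparse.ArgumentParser(description="Self-reflection evaluation sample")
    parser.add_argument("--input", "-i", help="Input JSONL file")
    parser.add_argument("--output", "-o", help="Output JSONL file")
    parser.add_argument("--agent-model", "-m", default="gpt-4.1", help="Agent model name")
    parser.add_argument("--judge-model", "-e", default="gpt-4.1", help="Evaluator model name")
    parser.add_argument("--max-reflections", type=int, default=3, help="Max iterations")
    parser.add_argument("--limit", "-n", type=int, default=None, help="Process only first N prompts")
    return parser


def parse(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    # argv=None makes argparse read sys.argv[1:]
    return build_parser().parse_args(argv)
```

`argparse` derives attribute names from the first long flag, so `--max-reflections` becomes `args.max_reflections` and `--agent-model` becomes `args.agent_model`.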

Understanding Results

The agent iteratively improves its responses:

1. Generate an initial response.
2. Evaluate groundedness on a 1-5 scale.
3. If the score is below 5, provide feedback and retry.
4. Stop at the maximum number of iterations or at a perfect score (5/5).
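The steps above can be sketched without any Azure dependency by passing generation and evaluation in as callables. Here `generate` and `evaluate` are hypothetical stand-ins for the agent call and the Groundedness Evaluator, and the feedback wording is an assumption. The sketch keeps the best-scoring response, matching the "best at iteration" line in the example output below.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ReflectionResult:
    best_response: str
    best_score: int
    best_iteration: int  # 1-based iteration that produced best_score
    iterations_run: int


def self_reflect(
    prompt: str,
    generate: Callable[[str, Optional[str]], str],  # hypothetical agent call
    evaluate: Callable[[str, str], int],            # hypothetical 1-5 judge
    max_reflections: int = 3,
    perfect_score: int = 5,
) -> ReflectionResult:
    feedback: Optional[str] = None
    best_response, best_score, best_iteration = "", 0, 0
    iterations_run = 0
    for i in range(1, max_reflections + 1):
        iterations_run = i
        response = generate(prompt, feedback)
        score = evaluate(prompt, response)
        if score > best_score:
            best_response, best_score, best_iteration = response, score, i
        if score >= perfect_score:
            break  # perfect groundedness: stop early
        # Hypothetical feedback passed to the next attempt.
        feedback = (
            f"Your previous answer scored {score}/{perfect_score} for groundedness. "
            "Revise it so every claim is supported by the provided context."
        )
    return ReflectionResult(best_response, best_score, best_iteration, iterations_run)
```

With a stub judge that returns 3 and then 5, the loop stops after the second attempt and reports iteration 2 as the best, the same pattern as the example output.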

Example output:

```text
[1/31] Processing prompt 0...
  Self-reflection iteration 1/3...
  Groundedness score: 3/5
  Self-reflection iteration 2/3...
  Groundedness score: 5/5
  ✓ Perfect groundedness score achieved!
  ✓ Completed with score: 5/5 (best at iteration 2/3)
```
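Per-prompt results like the one above can be rolled up into summary statistics. This sketch assumes hypothetical result records with `best_score` and `best_iteration` keys; the sample's actual output fields may differ. It uses only the standard `statistics` and `collections` modules.

```python
from collections import Counter
from statistics import mean


def summarize(results: list[dict], perfect_score: int = 5) -> dict:
    """Aggregate per-prompt results (hypothetical 'best_score'/'best_iteration' keys)."""
    if not results:
        return {"count": 0}
    scores = [r["best_score"] for r in results]
    return {
        "count": len(scores),
        "mean_score": mean(scores),
        "perfect": sum(1 for s in scores if s >= perfect_score),
        # Score -> number of prompts, sorted by score
        "score_distribution": dict(sorted(Counter(scores).items())),
        "mean_best_iteration": mean(r["best_iteration"] for r in results),
    }
```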
