.NET: Add Foundry Evaluation samples (Safety + Quality) (#3697)

* Initial plan

* Add Foundry evaluation samples for Red Teaming and Self-Reflection

Co-authored-by: rogerbarreto <19890735+rogerbarreto@users.noreply.github.com>

* Refactor evaluation samples with real implementations in local functions

Co-authored-by: rogerbarreto <19890735+rogerbarreto@users.noreply.github.com>

* Uncomment function signatures and bodies, keep only invocations commented

Co-authored-by: rogerbarreto <19890735+rogerbarreto@users.noreply.github.com>

* Update Foundry evaluation samples with observability support

* Restructure evaluation samples to follow FoundryAgents naming convention

- Rename Evaluation/Evaluation_StepXX to FoundryAgents_Evaluations_StepXX
- Add evaluation projects to slnx
- Fix var usage, apply dotnet format, use DefaultAzureCredential
- Add try/finally for agent cleanup
- Fix evaluator deployment name separation in Step02
- Update README references

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Rewrite Step01 to use Azure.AI.Projects RedTeam API and address review comments

- Replace safety evaluator sample with actual Red Teaming using AIProjectClient.RedTeams
- Use AttackStrategy (Easy, Moderate, Jailbreak) and RiskCategory from Azure.AI.Projects
- Remove Microsoft.Extensions.AI.Evaluation.Safety dependency from Step01
- Add DefaultAzureCredential warning comments to Step02
- Remove unused bestResponse variable in Step02
- Add session isolation comments in self-reflection loop
- Fix stale directory references in READMEs
- Fix misleading evaluation overview link in main README

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add note about agent-targeted red teaming limitations in README

The .NET RedTeam API currently only supports model deployment targets via AzureOpenAIModelConfiguration. Agent-targeted red teaming with AzureAIAgentTarget is documented in concept docs but not yet available in the SDK's RedTeam constructor. Results appear in classic portal view.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add classic Foundry disclaimer to red teaming sample README

Clarify that this sample uses the classic Azure AI Foundry red teaming API (/redTeams/runs). The new Foundry portal uses a separate evaluation-based API not yet available in the .NET SDK. AzureAIAgentTarget exists in the SDK but is consumed by the Evaluation Taxonomy API, not RedTeam.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address PR review comments on Step02 SelfReflection

- Pass full prompt (with context) to evaluator messages instead of just the question, so evaluator input matches what the agent received
- Include previous response text in self-reflection refinement prompt so the LLM can meaningfully improve its answer across iterations
- Inline CreateKnowledgeAgent helper (single use, single statement)
- Add comment clarifying why RunCombinedQualityAndSafetyEvaluation intentionally passes only the question (no context)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: rogerbarreto <19890735+rogerbarreto@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-16 21:04:09 +08:00 · 2026-02-18 13:52:26 +00:00
parent a97e42a989
commit 4b3df9ad89
9 changed files with 668 additions and 0 deletions
@@ -63,6 +63,9 @@
    <!-- Microsoft.Extensions.* -->
    <PackageVersion Include="Microsoft.Extensions.AI" Version="10.3.0" />
    <PackageVersion Include="Microsoft.Extensions.AI.Abstractions" Version="10.3.0" />
+    <PackageVersion Include="Microsoft.Extensions.AI.Evaluation" Version="10.3.0" />
+    <PackageVersion Include="Microsoft.Extensions.AI.Evaluation.Quality" Version="10.3.0" />
+    <PackageVersion Include="Microsoft.Extensions.AI.Evaluation.Safety" Version="10.3.0-preview.1.26109.11" />
    <PackageVersion Include="Microsoft.Extensions.AI.OpenAI" Version="10.3.0" />
    <PackageVersion Include="Microsoft.Extensions.Caching.Memory" Version="10.0.0" />
    <PackageVersion Include="Microsoft.Extensions.Configuration" Version="10.0.0" />
@@ -176,6 +176,8 @@
    <Project Path="samples/GettingStarted/FoundryAgents/FoundryAgents_Step13_Plugins/FoundryAgents_Step13_Plugins.csproj" />
    <Project Path="samples/GettingStarted/FoundryAgents/FoundryAgents_Step14_CodeInterpreter/FoundryAgents_Step14_CodeInterpreter.csproj" />
    <Project Path="samples/GettingStarted/FoundryAgents/FoundryAgents_Step15_ComputerUse/FoundryAgents_Step15_ComputerUse.csproj" />
+    <Project Path="samples/GettingStarted/FoundryAgents/FoundryAgents_Evaluations_Step01_RedTeaming/FoundryAgents_Evaluations_Step01_RedTeaming.csproj" />
+    <Project Path="samples/GettingStarted/FoundryAgents/FoundryAgents_Evaluations_Step02_SelfReflection/FoundryAgents_Evaluations_Step02_SelfReflection.csproj" />
  </Folder>
  <Folder Name="/Samples/GettingStarted/ModelContextProtocol/">
    <File Path="samples/GettingStarted/ModelContextProtocol/README.md" />
@@ -0,0 +1,16 @@
+<Project Sdk="Microsoft.NET.Sdk">
+
+  <PropertyGroup>
+    <OutputType>Exe</OutputType>
+    <TargetFrameworks>net10.0</TargetFrameworks>
+
+    <Nullable>enable</Nullable>
+    <ImplicitUsings>enable</ImplicitUsings>
+  </PropertyGroup>
+
+  <ItemGroup>
+    <PackageReference Include="Azure.AI.Projects" />
+    <PackageReference Include="Azure.Identity" />
+  </ItemGroup>
+
+</Project>
@@ -0,0 +1,100 @@
+// Copyright (c) Microsoft. All rights reserved.
+
+// This sample demonstrates how to use Azure AI Foundry's Red Teaming service to assess
+// the safety and resilience of an AI model against adversarial attacks.
+//
+// It uses the RedTeam API from Azure.AI.Projects to run automated attack simulations
+// with various attack strategies (encoding, obfuscation, jailbreaks) across multiple
+// risk categories (Violence, HateUnfairness, Sexual, SelfHarm).
+//
+// For more details, see:
+// https://learn.microsoft.com/azure/ai-foundry/concepts/ai-red-teaming-agent
+
+using Azure.AI.Projects;
+using Azure.Identity;
+
+string endpoint = Environment.GetEnvironmentVariable("AZURE_FOUNDRY_PROJECT_ENDPOINT") ?? throw new InvalidOperationException("AZURE_FOUNDRY_PROJECT_ENDPOINT is not set.");
+string deploymentName = Environment.GetEnvironmentVariable("AZURE_FOUNDRY_PROJECT_DEPLOYMENT_NAME") ?? "gpt-4o-mini";
+
+Console.WriteLine(new string('=', 80));
+Console.WriteLine("RED TEAMING EVALUATION SAMPLE");
+Console.WriteLine(new string('=', 80));
+Console.WriteLine();
+
+// Initialize Azure credentials and clients
+// WARNING: DefaultAzureCredential is convenient for development but requires careful consideration in production.
+// In production, consider using a specific credential (e.g., ManagedIdentityCredential) to avoid
+// latency issues, unintended credential probing, and potential security risks from fallback mechanisms.
+DefaultAzureCredential credential = new();
+AIProjectClient aiProjectClient = new(new Uri(endpoint), credential);
+
+// Configure the target model for red teaming
+AzureOpenAIModelConfiguration targetConfig = new(deploymentName);
+
+// Create the red team run configuration
+RedTeam redTeamConfig = new(targetConfig)
+{
+    DisplayName = "FinancialAdvisor-RedTeam",
+    ApplicationScenario = "A financial advisor assistant that provides general financial advice and information.",
+    NumTurns = 3,
+    RiskCategories =
+    {
+        RiskCategory.Violence,
+        RiskCategory.HateUnfairness,
+        RiskCategory.Sexual,
+        RiskCategory.SelfHarm,
+    },
+    AttackStrategies =
+    {
+        AttackStrategy.Easy,
+        AttackStrategy.Moderate,
+        AttackStrategy.Jailbreak,
+    },
+};
+
+Console.WriteLine($"Target model: {deploymentName}");
+Console.WriteLine("Risk categories: Violence, HateUnfairness, Sexual, SelfHarm");
+Console.WriteLine("Attack strategies: Easy, Moderate, Jailbreak");
+Console.WriteLine($"Simulation turns: {redTeamConfig.NumTurns}");
+Console.WriteLine();
+
+// Submit the red team run to the service
+Console.WriteLine("Submitting red team run...");
+RedTeam redTeamRun = await aiProjectClient.RedTeams.CreateAsync(redTeamConfig);
+
+Console.WriteLine($"Red team run created: {redTeamRun.Name}");
+Console.WriteLine($"Status: {redTeamRun.Status}");
+Console.WriteLine();
+
+// Poll for completion
+Console.WriteLine("Waiting for red team run to complete (this may take several minutes)...");
+while (redTeamRun.Status != "Completed" && redTeamRun.Status != "Failed" && redTeamRun.Status != "Canceled")
+{
+    await Task.Delay(TimeSpan.FromSeconds(15));
+    redTeamRun = await aiProjectClient.RedTeams.GetAsync(redTeamRun.Name);
+    Console.WriteLine($"  Status: {redTeamRun.Status}");
+}
+
+Console.WriteLine();
+
+if (redTeamRun.Status == "Completed")
+{
+    Console.WriteLine("Red team run completed successfully!");
+    Console.WriteLine();
+    Console.WriteLine("Results:");
+    Console.WriteLine(new string('-', 80));
+    Console.WriteLine($"  Run name:    {redTeamRun.Name}");
+    Console.WriteLine($"  Display name: {redTeamRun.DisplayName}");
+    Console.WriteLine($"  Status:      {redTeamRun.Status}");
+
+    Console.WriteLine();
+    Console.WriteLine("Review the detailed results in the Azure AI Foundry portal:");
+    Console.WriteLine($"  {endpoint}");
+}
+else
+{
+    Console.WriteLine($"Red team run ended with status: {redTeamRun.Status}");
+}
+
+Console.WriteLine();
+Console.WriteLine(new string('=', 80));
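The polling loop in this file waits until the run reports a terminal status, with no upper bound on the wait. A minimal sketch of the same loop with a deadline is shown below, assuming a hypothetical `PollUntilTerminalAsync` helper and a fake in-memory status source; neither is part of `Azure.AI.Projects`, and the terminal status names are simply the ones the sample compares against.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not an SDK API): polls a status source until it reports
// a terminal state, or throws OperationCanceledException when the timeout elapses.
static async Task<string> PollUntilTerminalAsync(
    Func<CancellationToken, Task<string>> getStatusAsync,
    TimeSpan interval,
    TimeSpan timeout)
{
    string[] terminalStates = { "Completed", "Failed", "Canceled" };
    using CancellationTokenSource cts = new(timeout);

    while (true)
    {
        string status = await getStatusAsync(cts.Token);
        if (Array.IndexOf(terminalStates, status) >= 0)
        {
            return status;
        }

        // Task.Delay observes the token, so the wait ends once the deadline passes.
        await Task.Delay(interval, cts.Token);
    }
}

// Stand-in for the service: reports "Running" twice, then "Completed".
Queue<string> fakeStatuses = new(new[] { "Running", "Running", "Completed" });

string finalStatus = await PollUntilTerminalAsync(
    _ => Task.FromResult(fakeStatuses.Dequeue()),
    interval: TimeSpan.FromMilliseconds(10),
    timeout: TimeSpan.FromSeconds(5));

Console.WriteLine($"Final status: {finalStatus}"); // prints "Final status: Completed"
```

In the sample, the delegate would presumably wrap the existing `aiProjectClient.RedTeams.GetAsync(redTeamRun.Name)` call and return its `Status`; the interval and timeout values are illustrative only.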
@@ -0,0 +1,101 @@
+# Red Teaming with Azure AI Foundry (Classic)
+
+> [!IMPORTANT]
+> This sample uses the **classic Azure AI Foundry** red teaming API (`/redTeams/runs`) via `Azure.AI.Projects`. Results are viewable in the classic Foundry portal experience. The **new Foundry** portal's red teaming feature uses a different evaluation-based API that is not yet available in the .NET SDK.
+
+This sample demonstrates how to use Azure AI Foundry's Red Teaming service to assess the safety and resilience of an AI model against adversarial attacks.
+
+## What this sample demonstrates
+
+- Configuring a red team run targeting an Azure OpenAI model deployment
+- Using multiple `AttackStrategy` options (Easy, Moderate, Jailbreak)
+- Evaluating across `RiskCategory` categories (Violence, HateUnfairness, Sexual, SelfHarm)
+- Submitting a red team scan and polling for completion
+- Reviewing results in the Azure AI Foundry portal
+
+## Prerequisites
+
+Before you begin, ensure you have the following prerequisites:
+
+- .NET 10 SDK or later
+- Azure AI Foundry project (hub and project created)
+- Azure OpenAI deployment (e.g., gpt-4o or gpt-4o-mini)
+- Azure CLI installed and authenticated (for Azure credential authentication)
+
+### Regional Requirements
+
+Red teaming is only available in regions that support risk and safety evaluators:
+- **East US 2**, **Sweden Central**, **North Central US**, **France Central**, **Switzerland West**
+
+### Environment Variables
+
+Set the following environment variables:
+
+```powershell
+$env:AZURE_FOUNDRY_PROJECT_ENDPOINT="https://your-project.services.ai.azure.com/api/projects/your-project" # Replace with your Azure Foundry project endpoint
+$env:AZURE_FOUNDRY_PROJECT_DEPLOYMENT_NAME="gpt-4o-mini"  # Optional, defaults to gpt-4o-mini
+```
+
+## Run the sample
+
+Navigate to the sample directory and run:
+
+```powershell
+cd dotnet/samples/GettingStarted/FoundryAgents/FoundryAgents_Evaluations_Step01_RedTeaming
+dotnet run
+```
+
+## Expected behavior
+
+The sample will:
+
+1. Configure a `RedTeam` run targeting the specified model deployment
+2. Define risk categories and attack strategies
+3. Submit the scan to Azure AI Foundry's Red Teaming service
+4. Poll for completion (this may take several minutes)
+5. Display the run status and direct you to the Azure AI Foundry portal for detailed results
+
+## Understanding Red Teaming
+
+### Attack Strategies
+
+| Strategy | Description |
+|----------|-------------|
+| Easy | Simple encoding/obfuscation attacks (ROT13, Leetspeak, etc.) |
+| Moderate | Moderate complexity attacks requiring an LLM for orchestration |
+| Jailbreak | Crafted prompts designed to bypass AI safeguards (UPIA) |
+
+### Risk Categories
+
+| Category | Description |
+|----------|-------------|
+| Violence | Content related to violence |
+| HateUnfairness | Hate speech or unfair content |
+| Sexual | Sexual content |
+| SelfHarm | Self-harm related content |
+
+### Interpreting Results
+
+- Results are available in the Azure AI Foundry portal (**classic view** — toggle at top-right) under the red teaming section
+- Lower Attack Success Rate (ASR) is better — target ASR < 5% for production
+- Review individual attack conversations to understand vulnerabilities
+
+### Current Limitations
+
+> [!NOTE]
+> - The .NET Red Teaming API (`Azure.AI.Projects`) currently supports targeting **model deployments only** via `AzureOpenAIModelConfiguration`. The `AzureAIAgentTarget` type exists in the SDK but is consumed by the **Evaluation Taxonomy** API (`/evaluationtaxonomies`), not by the Red Teaming API (`/redTeams/runs`).
+> - Agent-targeted red teaming with agent-specific risk categories (Prohibited actions, Sensitive data leakage, Task adherence) is documented in the [concept docs](https://learn.microsoft.com/azure/ai-foundry/concepts/ai-red-teaming-agent) but is not yet available via the public REST API or .NET SDK.
+> - Results from this API appear in the **classic** Azure AI Foundry portal view. The new Foundry portal uses a separate evaluation-based system with `eval_*` identifiers.
+
+## Related Resources
+
+- [Azure AI Red Teaming Agent](https://learn.microsoft.com/azure/ai-foundry/concepts/ai-red-teaming-agent)
+- [RedTeam .NET API Reference](https://learn.microsoft.com/dotnet/api/azure.ai.projects.redteam?view=azure-dotnet-preview)
+- [Risk and Safety Evaluations](https://learn.microsoft.com/azure/ai-foundry/concepts/evaluation-metrics-built-in#risk-and-safety-evaluators)
+
+## Next Steps
+
+After running red teaming:
+1. Review attack results and strengthen agent guardrails
+2. Explore the Self-Reflection sample (FoundryAgents_Evaluations_Step02_SelfReflection) for quality assessment
+3. Set up continuous red teaming in your CI/CD pipeline
@@ -0,0 +1,25 @@
+<Project Sdk="Microsoft.NET.Sdk">
+
+  <PropertyGroup>
+    <OutputType>Exe</OutputType>
+    <TargetFrameworks>net10.0</TargetFrameworks>
+
+    <Nullable>enable</Nullable>
+    <ImplicitUsings>enable</ImplicitUsings>
+  </PropertyGroup>
+
+  <ItemGroup>
+    <PackageReference Include="Azure.AI.OpenAI" />
+    <PackageReference Include="Azure.AI.Projects" />
+    <PackageReference Include="Azure.Identity" />
+    <PackageReference Include="Microsoft.Extensions.AI.Evaluation" />
+    <PackageReference Include="Microsoft.Extensions.AI.Evaluation.Quality" />
+    <PackageReference Include="Microsoft.Extensions.AI.Evaluation.Safety" />
+    <PackageReference Include="Microsoft.Extensions.AI.OpenAI" />
+  </ItemGroup>
+
+  <ItemGroup>
+    <ProjectReference Include="..\..\..\..\src\Microsoft.Agents.AI.AzureAI\Microsoft.Agents.AI.AzureAI.csproj" />
+  </ItemGroup>
+
+</Project>
@@ -0,0 +1,292 @@
+// Copyright (c) Microsoft. All rights reserved.
+
+// This sample demonstrates how to use Microsoft.Extensions.AI.Evaluation.Quality to evaluate
+// an Agent Framework agent's response quality with a self-reflection loop.
+//
+// It uses GroundednessEvaluator, RelevanceEvaluator, and CoherenceEvaluator to score responses,
+// then iteratively asks the agent to improve based on evaluation feedback.
+//
+// Based on: Reflexion: Language Agents with Verbal Reinforcement Learning (NeurIPS 2023)
+// Reference: https://arxiv.org/abs/2303.11366
+//
+// For more details, see:
+// https://learn.microsoft.com/dotnet/ai/evaluation/libraries
+
+using Azure.AI.OpenAI;
+using Azure.AI.Projects;
+using Azure.Identity;
+using Microsoft.Agents.AI;
+using Microsoft.Extensions.AI;
+using Microsoft.Extensions.AI.Evaluation;
+using Microsoft.Extensions.AI.Evaluation.Quality;
+using Microsoft.Extensions.AI.Evaluation.Safety;
+
+using ChatMessage = Microsoft.Extensions.AI.ChatMessage;
+using ChatRole = Microsoft.Extensions.AI.ChatRole;
+
+string endpoint = Environment.GetEnvironmentVariable("AZURE_FOUNDRY_PROJECT_ENDPOINT") ?? throw new InvalidOperationException("AZURE_FOUNDRY_PROJECT_ENDPOINT is not set.");
+string deploymentName = Environment.GetEnvironmentVariable("AZURE_FOUNDRY_PROJECT_DEPLOYMENT_NAME") ?? "gpt-4o-mini";
+string openAiEndpoint = Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT") ?? throw new InvalidOperationException("AZURE_OPENAI_ENDPOINT is not set.");
+string evaluatorDeploymentName = Environment.GetEnvironmentVariable("AZURE_OPENAI_DEPLOYMENT_NAME") ?? deploymentName;
+
+Console.WriteLine(new string('=', 80));
+Console.WriteLine("SELF-REFLECTION EVALUATION SAMPLE");
+Console.WriteLine(new string('=', 80));
+Console.WriteLine();
+
+// Initialize Azure credentials and client
+// WARNING: DefaultAzureCredential is convenient for development but requires careful consideration in production.
+// In production, consider using a specific credential (e.g., ManagedIdentityCredential) to avoid
+// latency issues, unintended credential probing, and potential security risks from fallback mechanisms.
+DefaultAzureCredential credential = new();
+AIProjectClient aiProjectClient = new(new Uri(endpoint), credential);
+
+// Set up the LLM-based chat client for quality evaluators
+IChatClient chatClient = new AzureOpenAIClient(new Uri(openAiEndpoint), credential)
+    .GetChatClient(evaluatorDeploymentName)
+    .AsIChatClient();
+
+// Configure evaluation: quality evaluators use the LLM, safety evaluators use Azure AI Foundry
+ContentSafetyServiceConfiguration safetyConfig = new(
+    credential: credential,
+    endpoint: new Uri(endpoint));
+
+ChatConfiguration chatConfiguration = safetyConfig.ToChatConfiguration(
+    originalChatConfiguration: new ChatConfiguration(chatClient));
+
+// Create a test agent
+AIAgent agent = await aiProjectClient.CreateAIAgentAsync(
+    name: "KnowledgeAgent",
+    model: deploymentName,
+    instructions: "You are a helpful assistant. Answer questions accurately based on the provided context.");
+Console.WriteLine($"Created agent: {agent.Name}");
+Console.WriteLine();
+
+// Example question and grounding context
+const string Question = """
+    What are the main benefits of using Azure AI Foundry for building AI applications?
+    """;
+
+const string Context = """
+    Azure AI Foundry is a comprehensive platform for building, deploying, and managing AI applications.
+    Key benefits include:
+    1. Unified development environment with support for multiple AI frameworks and models
+    2. Built-in safety and security features including content filtering and red teaming tools
+    3. Scalable infrastructure that handles deployment and monitoring automatically
+    4. Integration with Azure services like Azure OpenAI, Cognitive Services, and Machine Learning
+    5. Evaluation tools for assessing model quality, safety, and performance
+    6. Support for RAG (Retrieval-Augmented Generation) patterns with vector search
+    7. Enterprise-grade compliance and governance features
+    """;
+
+Console.WriteLine("Question:");
+Console.WriteLine(Question);
+Console.WriteLine();
+
+// Run evaluations
+try
+{
+    await RunSelfReflectionWithGroundedness(agent, Question, Context, chatConfiguration);
+    await RunQualityEvaluation(agent, Question, Context, chatConfiguration);
+    await RunCombinedQualityAndSafetyEvaluation(agent, Question, chatConfiguration);
+}
+finally
+{
+    // Cleanup
+    await aiProjectClient.Agents.DeleteAgentAsync(agent.Name);
+    Console.WriteLine();
+    Console.WriteLine("Cleanup: Agent deleted.");
+}
+
+// ============================================================================
+// Implementation Functions
+// ============================================================================
+
+static async Task RunSelfReflectionWithGroundedness(
+    AIAgent agent, string question, string context, ChatConfiguration chatConfiguration)
+{
+    Console.WriteLine("Running Self-Reflection with Groundedness Evaluation...");
+    Console.WriteLine();
+
+    GroundednessEvaluator groundednessEvaluator = new();
+    GroundednessEvaluatorContext groundingContext = new(context);
+
+    const int MaxReflections = 3;
+    double bestScore = 0;
+
+    string currentPrompt = $"Context: {context}\n\nQuestion: {question}";
+
+    for (int i = 0; i < MaxReflections; i++)
+    {
+        Console.WriteLine($"Iteration {i + 1}/{MaxReflections}:");
+        Console.WriteLine(new string('-', 40));
+
+        // Create a new session for each reflection iteration so that
+        // conversation context does not carry over between runs. This keeps
+        // each evaluation independent and avoids biasing groundedness scores.
+        AgentSession session = await agent.CreateSessionAsync();
+        AgentResponse agentResponse = await agent.RunAsync(currentPrompt, session);
+        string responseText = agentResponse.Text;
+
+        Console.WriteLine($"Response: {responseText[..Math.Min(150, responseText.Length)]}...");
+
+        List<ChatMessage> messages =
+        [
+            new(ChatRole.User, currentPrompt),
+        ];
+        ChatResponse chatResponse = new(new ChatMessage(ChatRole.Assistant, responseText));
+
+        EvaluationResult result = await groundednessEvaluator.EvaluateAsync(
+            messages,
+            chatResponse,
+            chatConfiguration,
+            additionalContext: [groundingContext]);
+
+        NumericMetric groundedness = result.Get<NumericMetric>(GroundednessEvaluator.GroundednessMetricName);
+        double score = groundedness.Value ?? 0;
+        string rating = groundedness.Interpretation?.Rating.ToString() ?? "N/A";
+
+        Console.WriteLine($"Groundedness score: {score:F1}/5 (Rating: {rating})");
+        Console.WriteLine();
+
+        if (score > bestScore)
+        {
+            bestScore = score;
+        }
+
+        if (score >= 4.0 || i == MaxReflections - 1)
+        {
+            if (score >= 4.0)
+            {
+                Console.WriteLine("Good groundedness achieved!");
+            }
+
+            break;
+        }
+
+        // Ask for improvement in the next iteration, including the previous response
+        // so the LLM knows what to improve on (each iteration uses a new session).
+        currentPrompt = $"""
+            Context: {context}
+
+            Your previous answer scored {score}/5 on groundedness.
+            Your previous answer was:
+            {responseText}
+
+            Please improve your answer to be more grounded in the provided context.
+            Only include information that is directly supported by the context.
+
+            Question: {question}
+            """;
+        Console.WriteLine("Requesting improvement...");
+        Console.WriteLine();
+    }
+
+    Console.WriteLine($"Best groundedness score: {bestScore:F1}/5");
+    Console.WriteLine(new string('=', 80));
+    Console.WriteLine();
+}
+
+static async Task RunQualityEvaluation(
+    AIAgent agent, string question, string context, ChatConfiguration chatConfiguration)
+{
+    Console.WriteLine("Running Quality Evaluation (Relevance, Coherence, Groundedness)...");
+    Console.WriteLine();
+
+    IEvaluator[] evaluators =
+    [
+        new RelevanceEvaluator(),
+        new CoherenceEvaluator(),
+        new GroundednessEvaluator(),
+    ];
+
+    CompositeEvaluator compositeEvaluator = new(evaluators);
+    GroundednessEvaluatorContext groundingContext = new(context);
+
+    string prompt = $"Context: {context}\n\nQuestion: {question}";
+
+    AgentSession session = await agent.CreateSessionAsync();
+    AgentResponse agentResponse = await agent.RunAsync(prompt, session);
+    string responseText = agentResponse.Text;
+
+    Console.WriteLine($"Response: {responseText[..Math.Min(150, responseText.Length)]}...");
+    Console.WriteLine();
+
+    List<ChatMessage> messages =
+    [
+        new(ChatRole.User, prompt),
+    ];
+    ChatResponse chatResponse = new(new ChatMessage(ChatRole.Assistant, responseText));
+
+    EvaluationResult result = await compositeEvaluator.EvaluateAsync(
+        messages,
+        chatResponse,
+        chatConfiguration,
+        additionalContext: [groundingContext]);
+
+    foreach (EvaluationMetric metric in result.Metrics.Values)
+    {
+        if (metric is NumericMetric n)
+        {
+            string rating = n.Interpretation?.Rating.ToString() ?? "N/A";
+            Console.WriteLine($"  {n.Name,-20} Score: {n.Value:F1}/5  Rating: {rating}");
+        }
+    }
+
+    Console.WriteLine(new string('=', 80));
+    Console.WriteLine();
+}
+
+static async Task RunCombinedQualityAndSafetyEvaluation(
+    AIAgent agent, string question, ChatConfiguration chatConfiguration)
+{
+    Console.WriteLine("Running Combined Quality + Safety Evaluation...");
+    Console.WriteLine();
+
+    IEvaluator[] evaluators =
+    [
+        new RelevanceEvaluator(),
+        new CoherenceEvaluator(),
+        new ContentHarmEvaluator(),
+        new ProtectedMaterialEvaluator(),
+    ];
+
+    CompositeEvaluator compositeEvaluator = new(evaluators);
+
+    AgentSession session = await agent.CreateSessionAsync();
+    AgentResponse agentResponse = await agent.RunAsync(question, session);
+    string responseText = agentResponse.Text;
+
+    Console.WriteLine($"Response: {responseText[..Math.Min(150, responseText.Length)]}...");
+    Console.WriteLine();
+
+    List<ChatMessage> messages =
+    [
+        new(ChatRole.User, question), // No context in this evaluation — testing quality and safety on raw question
+    ];
+    ChatResponse chatResponse = new(new ChatMessage(ChatRole.Assistant, responseText));
+
+    EvaluationResult result = await compositeEvaluator.EvaluateAsync(
+        messages,
+        chatResponse,
+        chatConfiguration);
+
+    Console.WriteLine("Quality Metrics:");
+    foreach (EvaluationMetric metric in result.Metrics.Values)
+    {
+        if (metric is NumericMetric n)
+        {
+            string rating = n.Interpretation?.Rating.ToString() ?? "N/A";
+            bool failed = n.Interpretation?.Failed ?? false;
+            // Alignment must precede the format specifier: {value,alignment:format}.
+            Console.WriteLine($"  {n.Name,-25} Score: {n.Value,-6:F1} Rating: {rating,-15} Failed: {failed}");
+        }
+        else if (metric is BooleanMetric b)
+        {
+            string rating = b.Interpretation?.Rating.ToString() ?? "N/A";
+            bool failed = b.Interpretation?.Failed ?? false;
+            Console.WriteLine($"  {b.Name,-25} Value: {b.Value,-6} Rating: {rating,-15} Failed: {failed}");
+        }
+    }
+
+    Console.WriteLine(new string('=', 80));
+}
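The metric printouts in this file rely on column alignment inside interpolated strings. In C# the interpolation hole is written `{value,alignment:format}`, so the alignment comes before the format specifier; anything placed after the colon is treated as part of the format string. A small sketch with made-up values (the `score` and `rating` variables are illustrative, not taken from an evaluation run):

```csharp
using System;
using System.Globalization;

// Illustrative values only; the sample reads these from NumericMetric instances.
double? score = 4.5;
string rating = "Good";

// -6 left-aligns the score in a 6-character column, F1 formats it with one decimal.
// InvariantCulture keeps the decimal separator stable across machines.
string line = string.Create(
    CultureInfo.InvariantCulture,
    $"Score: {score,-6:F1} Rating: {rating,-15}|");

Console.WriteLine(line);
```

The trailing `|` only makes the padding visible when the line is printed.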
@@ -0,0 +1,118 @@
+# Self-Reflection Evaluation with Groundedness Assessment
+
+This sample demonstrates the self-reflection pattern using Agent Framework with `Microsoft.Extensions.AI.Evaluation.Quality` evaluators. The agent iteratively improves its responses based on real groundedness evaluation scores.
+
+For details on the self-reflection approach, see [Reflexion: Language Agents with Verbal Reinforcement Learning](https://arxiv.org/abs/2303.11366) (NeurIPS 2023).
+
+## What this sample demonstrates
+
+- Self-reflection loop that improves responses using real `GroundednessEvaluator` scores
+- Using `RelevanceEvaluator` and `CoherenceEvaluator` for multi-metric quality assessment
+- Combining quality and safety evaluators with `CompositeEvaluator`
+- Configuring `ContentSafetyServiceConfiguration` for safety evaluators alongside LLM-based quality evaluators
+- Tracking improvement across iterations
+
+## Prerequisites
+
+Before you begin, ensure you have the following prerequisites:
+
+- .NET 10 SDK or later
+- Azure AI Foundry project (hub and project created)
+- Azure OpenAI deployment (e.g., gpt-4o or gpt-4o-mini)
+- Azure CLI installed and authenticated (for Azure credential authentication)
+
+**Note**: This sample authenticates with `DefaultAzureCredential`, which can pick up your Azure CLI login. Make sure you're logged in with `az login` and have access to the Azure AI Foundry resource. For more information, see the [Azure CLI documentation](https://learn.microsoft.com/cli/azure/authenticate-azure-cli-interactively).
+
+### Azure Resources Required
+
+1. **Azure AI Hub and Project**: Create these in the Azure Portal
+   - Follow: https://learn.microsoft.com/azure/ai-foundry/how-to/create-projects
+2. **Azure OpenAI Deployment**: Deploy a model (e.g., gpt-4o or gpt-4o-mini)
+   - Agent model: Used to generate responses
+   - Evaluator model: Quality evaluators use an LLM; best results with GPT-4o
+3. **Azure CLI**: Install and authenticate with `az login`
+
+### Environment Variables
+
+Set the following environment variables:
+
+```powershell
+$env:AZURE_FOUNDRY_PROJECT_ENDPOINT="https://your-project.services.ai.azure.com/api/projects/your-project"  # Azure AI Foundry project endpoint
+$env:AZURE_OPENAI_ENDPOINT="https://your-openai.openai.azure.com/"         # Azure OpenAI endpoint (for quality evaluators)
+$env:AZURE_FOUNDRY_PROJECT_DEPLOYMENT_NAME="gpt-4o-mini"                   # Optional agent model deployment (defaults to gpt-4o-mini); also used by the evaluators unless AZURE_OPENAI_DEPLOYMENT_NAME is set
+```
+
+**Note**: For best evaluation results, use GPT-4o or GPT-4o-mini as the evaluator model. The groundedness evaluator has been tested and tuned for these models.
+
+## Run the sample
+
+Navigate to the sample directory and run:
+
+```powershell
+cd dotnet/samples/GettingStarted/FoundryAgents/FoundryAgents_Evaluations_Step02_SelfReflection
+dotnet run
+```
+
+## Expected behavior
+
+The sample runs three evaluation scenarios:
+
+### 1. Self-Reflection with Groundedness
+- Asks a question with grounding context
+- Evaluates response groundedness using `GroundednessEvaluator`
+- If score is below 4/5, asks the agent to improve with feedback
+- Repeats up to 3 iterations
+- Tracks and reports the best score achieved
+
+### 2. Quality Evaluation
+- Evaluates a single response with multiple quality evaluators:
+  - `RelevanceEvaluator` — is the response relevant to the question?
+  - `CoherenceEvaluator` — is the response logically coherent?
+  - `GroundednessEvaluator` — is the response grounded in the provided context?
+
+### 3. Combined Quality + Safety Evaluation
+- Runs both quality and safety evaluators together:
+  - `RelevanceEvaluator`, `CoherenceEvaluator` (quality)
+  - `ContentHarmEvaluator` (safety — violence, hate, sexual, self-harm)
+  - `ProtectedMaterialEvaluator` (safety — copyrighted content detection)
+
+## Understanding the Evaluation
+
+### Groundedness Score (1-5 scale)
+
+The `GroundednessEvaluator` measures how well the agent's response is grounded in the provided context:
+
+- **5** = Excellent - Response is fully grounded in context
+- **4** = Good - Mostly grounded with minor deviations
+- **3** = Fair - Partially grounded but includes unsupported claims
+- **2** = Poor - Significant amount of ungrounded content
+- **1** = Very Poor - Response is largely unsupported by context
+
+### Self-Reflection Process
+
+1. **Initial Response**: Agent generates answer based on question + context
+2. **Evaluation**: `GroundednessEvaluator` scores the response (1-5)
+3. **Feedback**: If score < 4, the agent receives the score and its previous answer in a new session and is asked to improve
+4. **Iteration**: Process repeats until good score or max iterations
+
+## Best Practices
+
+1. **Provide Complete Context**: Ensure grounding context contains all information needed to answer the question
+2. **Clear Instructions**: Give the agent clear instructions about staying grounded in context
+3. **Use Quality Models**: GPT-4o recommended for evaluation tasks
+4. **Multiple Evaluators**: Use combination of evaluators (groundedness + relevance + coherence)
+5. **Batch Processing**: For production, process multiple questions in batch
+
+## Related Resources
+
+- [Reflexion Paper (NeurIPS 2023)](https://arxiv.org/abs/2303.11366)
+- [Microsoft.Extensions.AI.Evaluation Libraries](https://learn.microsoft.com/dotnet/ai/evaluation/libraries)
+- [GroundednessEvaluator API Reference](https://learn.microsoft.com/dotnet/api/microsoft.extensions.ai.evaluation.quality.groundednessevaluator)
+- [Azure AI Foundry Evaluation Service](https://learn.microsoft.com/azure/ai-foundry/how-to/develop/evaluate-sdk)
+
+## Next Steps
+
+After running self-reflection evaluation:
+1. Implement similar patterns for other quality metrics (relevance, coherence, fluency)
+2. Integrate into CI/CD pipeline for continuous quality assurance
+3. Explore the Red Teaming sample (FoundryAgents_Evaluations_Step01_RedTeaming) for adversarial safety assessment
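The self-reflection process described in this README (answer, score, feed the score and previous answer back, repeat until a threshold or an iteration cap) can be sketched independently of Azure. The version below is a hedged illustration, not the sample's implementation: `ReflectAsync`, the toy responder, and the toy scorer are hypothetical stand-ins for `agent.RunAsync` and `GroundednessEvaluator.EvaluateAsync`, and unlike the sample it also keeps the best-scoring answer.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: runs the answer/score/refine loop with pluggable delegates.
static async Task<(string Answer, double Score)> ReflectAsync(
    Func<string, Task<string>> answerAsync,
    Func<string, Task<double>> scoreAsync,
    string question,
    double threshold,
    int maxIterations)
{
    string prompt = question;
    string bestAnswer = string.Empty;
    double bestScore = double.MinValue;

    for (int i = 0; i < maxIterations; i++)
    {
        string answer = await answerAsync(prompt);
        double score = await scoreAsync(answer);

        if (score > bestScore)
        {
            (bestAnswer, bestScore) = (answer, score);
        }

        if (score >= threshold)
        {
            break; // Good enough; stop refining.
        }

        // Feed the score and the previous answer back so the next attempt can improve on it.
        prompt = $"Your previous answer scored {score}/5:\n{answer}\n\nPlease improve it.\n\nQuestion: {question}";
    }

    return (bestAnswer, bestScore);
}

// Toy stand-ins: the first draft scores 2.0, every later draft scores 4.0.
int calls = 0;
(string finalAnswer, double finalScore) = await ReflectAsync(
    answerAsync: _ => Task.FromResult($"draft {++calls}"),
    scoreAsync: a => Task.FromResult(a == "draft 1" ? 2.0 : 4.0),
    question: "What are the main benefits?",
    threshold: 4.0,
    maxIterations: 3);

Console.WriteLine($"Best answer: {finalAnswer}, score: {finalScore}, agent calls: {calls}");
```

With these toy delegates the loop stops after the second draft, because that draft meets the threshold before the iteration cap is reached.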
@@ -60,6 +60,17 @@ Before you begin, ensure you have the following prerequisites:
 |[Computer use](./FoundryAgents_Step15_ComputerUse/)|This sample demonstrates how to use computer use capabilities with a Foundry agent|
 |[Local MCP](./FoundryAgents_Step27_LocalMCP/)|This sample demonstrates how to use a local MCP client with a Foundry agent|

+## Evaluation Samples
+
+Evaluation is critical for building trustworthy and high-quality AI applications. The evaluation samples demonstrate how to assess agent safety, quality, and performance using Azure AI Foundry's evaluation capabilities.
+
+|Sample|Description|
+|---|---|
+|[Red Team Evaluation](./FoundryAgents_Evaluations_Step01_RedTeaming/)|This sample demonstrates how to use Azure AI Foundry's Red Teaming service to assess model safety against adversarial attacks|
+|[Self-Reflection with Groundedness](./FoundryAgents_Evaluations_Step02_SelfReflection/)|This sample demonstrates the self-reflection pattern where agents iteratively improve responses based on groundedness evaluation|
+
+For details on safety evaluation, see the [Red Team Evaluation README](./FoundryAgents_Evaluations_Step01_RedTeaming/README.md).
+
 ## Running the samples from the console

 To run the samples, navigate to the desired sample directory, e.g.