mirror of https://github.com/microsoft/agent-framework.git synced 2026-06-16 21:04:09 +08:00

Ben Thomas 981726cc15 .NET: feat(evals): add ground_truth/expected_output support for workflow evaluation (#5755)

* .NET: feat(evals): add ground_truth/expected_output support for workflow eval

Brings .NET to parity with Python PR #5234 for issue #5135:

- Add expectedOutput parameter to Run.EvaluateAsync (workflow) and stamp on the overall EvalItem.ExpectedOutput.
- Map EvalItem.ExpectedOutput -> ground_truth in the Foundry JSONL payload, item_schema, and data_mapping for similarity.
- Add GroundTruthEvaluators set (currently builtin.similarity) and a FindMissingGroundTruthEvaluators helper.
- Fail fast with InvalidOperationException when a ground-truth evaluator is selected but no item provides an ExpectedOutput, instead of surfacing a remote provider error.
- Add tests in FoundryEvalConverterTests and WorkflowEvaluationTests.
- Add Evaluation_WorkflowExpectedOutputs sample (workflow + Foundry similarity).

Fixes microsoft/agent-framework#5135 (.NET side).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address review: relax BuildOverallItem events to IReadOnlyList<WorkflowEvent>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Sample: disable per-agent breakdown when using reference-based evaluator

Per-agent EvalItems are intentionally left without ExpectedOutput, so the new fail-fast validation in FoundryEvals would throw when Similarity is invoked for per-agent items. Pass includePerAgent: false in the workflow + similarity sample, and document this gotcha in the EvaluateAsync XML doc.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix BuildOverallItem: fall back to last ExecutorCompletedEvent

AgentResponseEvent is only emitted when AIAgentHostOptions.EmitAgentResponseEvents is enabled, which is not the default for WorkflowBuilder(agent).AddEdge(...). When it is absent, fall back to the last non-internal ExecutorCompletedEvent whose Data is an AgentResponse / ChatMessage / string so the overall EvalItem (and any expectedOutput) is produced. Without this, samples wired up the standard way returned 0 evaluation items.

Update test to cover the fallback path.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Sample: enable EmitAgentResponseEvents; eval throws clear error when no overall response found

Root cause of '0 results': AIAgentHostExecutor only emits AgentResponseEvent when AIAgentHostOptions.EmitAgentResponseEvents is true (default false). For ordinary AIAgent executors the runtime's ExecutorCompletedEvent.Data is null, so the prior fallback couldn't find a final response either.

Sample now builds executors with EmitAgentResponseEvents=true via BindAsExecutor(hostOptions). EvaluateAsync now throws InvalidOperationException with a remediation hint when the user supplies expectedOutput but no overall final response can be located, instead of silently returning 0/0.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Guard against null sample/error/usage/datasource_item in ParseDetailedItem

Foundry eval responses can have these properties present with JSON null
or non-object values, which caused JsonElement.TryGetProperty to throw
'requires Object, has Null'. Check ValueKind == Object before drilling in.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
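
The guard described in this item can be sketched with plain System.Text.Json. `JsonElement.TryGetProperty` throws `InvalidOperationException` when called on an element whose `ValueKind` is not `Object`, so the check has to come first. The helper name `TryGetObject`, the class name, and the sample payload (including the `total_tokens` field) are illustrative assumptions, not the code in `ParseDetailedItem`.

```csharp
using System;
using System.Text.Json;

public static class JsonGuards
{
    // Hypothetical helper: true only when 'parent' is a JSON object that has
    // 'name' as a property whose value is itself a JSON object.
    public static bool TryGetObject(JsonElement parent, string name, out JsonElement value)
    {
        value = default;
        return parent.ValueKind == JsonValueKind.Object
            && parent.TryGetProperty(name, out value)
            && value.ValueKind == JsonValueKind.Object;
    }

    public static void Main()
    {
        // Assumed payload shape: "sample" is JSON null, "usage" is an object.
        using JsonDocument doc = JsonDocument.Parse(
            "{\"sample\": null, \"usage\": {\"total_tokens\": 12}}");
        JsonElement item = doc.RootElement;

        if (TryGetObject(item, "usage", out JsonElement usage)
            && usage.TryGetProperty("total_tokens", out JsonElement tokens))
        {
            Console.WriteLine("total_tokens = " + tokens.GetInt32());
        }

        // "sample" is present but null, so the guard reports false instead of
        // letting a nested TryGetProperty call throw.
        bool sampleIsObject = TryGetObject(item, "sample", out _);
        Console.WriteLine("sample is object: " + sampleIsObject);
    }
}
```

Folding the `ValueKind` check into one helper keeps each call site to a single condition, which matters when several optional properties (sample, error, usage, datasource_item) need the same treatment.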

* Address PR review: reorder expectedOutput, tighten ground-truth check, add fail-fast test

* WorkflowEvaluationExtensions.EvaluateAsync: move 'expectedOutput' to
  after 'splitter' so the original positional contract of (splitter,
  cancellationToken) is preserved for existing callers.
* FoundryEvals: require ALL items to carry ExpectedOutput when a
  ground-truth evaluator is selected (e.g. similarity), not just any.
  Reference-based evaluators score per-item, so a single missing GT
  would still surface as a provider-side validation error. Updated
  fail-fast message accordingly.
* WorkflowEvaluationTests: add EvaluateAsync_WithExpectedOutputButNoFinalResponse_ThrowsAsync
  to verify the InvalidOperationException is thrown (and that the
  message mentions EmitAgentResponseEvents).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
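
The "all items, not just any" rule above can be illustrated with a small standalone sketch. It reuses the names `GroundTruthEvaluators` and `FindMissingGroundTruthEvaluators` from the commit message, but the signatures, the `EvalItemSketch` record, the exception text, and the choice to treat an empty string as missing are assumptions made for illustration; the real FoundryEvals code may differ.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the framework's EvalItem; only the fields this check needs.
public sealed record EvalItemSketch(string Query, string? ExpectedOutput);

public static class GroundTruthCheck
{
    // Evaluators that need a reference answer; the commit lists builtin.similarity as the only one so far.
    private static readonly HashSet<string> GroundTruthEvaluators =
        new HashSet<string>(StringComparer.Ordinal) { "builtin.similarity" };

    // Returns the selected evaluators that cannot run because some item has no ExpectedOutput.
    public static List<string> FindMissingGroundTruthEvaluators(
        IEnumerable<string> selected, IEnumerable<EvalItemSketch> items)
    {
        // Assumption: an empty string is treated the same as a missing value.
        bool everyItemHasGroundTruth = items.All(i => !string.IsNullOrEmpty(i.ExpectedOutput));
        if (everyItemHasGroundTruth)
        {
            return new List<string>();
        }
        return selected.Where(e => GroundTruthEvaluators.Contains(e)).ToList();
    }

    // Throws locally, before any request would be sent to the provider.
    public static void EnsureGroundTruth(IEnumerable<string> selected, IEnumerable<EvalItemSketch> items)
    {
        List<string> missing = FindMissingGroundTruthEvaluators(selected, items);
        if (missing.Count > 0)
        {
            string names = string.Join(", ", missing);
            throw new InvalidOperationException(
                "Evaluators [" + names + "] require ExpectedOutput on every item.");
        }
    }

    public static void Main()
    {
        var items = new[]
        {
            new EvalItemSketch("Capital of France?", "Paris"),
            new EvalItemSketch("Capital of Spain?", null), // one missing reference is enough to fail
        };
        try
        {
            EnsureGroundTruth(new[] { "builtin.similarity" }, items);
        }
        catch (InvalidOperationException ex)
        {
            Console.WriteLine(ex.Message);
        }
    }
}
```

Checking with `All` rather than `Any` matches the reasoning in the item above: a reference-based evaluator scores each item separately, so a single item without a reference would still fail on the provider side.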

* Fail-fast on missing overall item regardless of expectedOutput; harden BuildOverallItem default

* EvaluateAsync now throws InvalidOperationException whenever 'includeOverall'
  is requested but BuildOverallItem cannot produce an item, instead of only
  when 'expectedOutput' is supplied. Same misconfiguration (agents not bound
  with EmitAgentResponseEvents) used to silently return empty results — now
  it surfaces a clear, actionable error in both cases.
* BuildOverallItem switch default now throws instead of returning null. The
  preceding for-loop already constrains Data to AgentResponse/ChatMessage/
  string, so reaching default would indicate a contract drift; throw to make
  the bug visible.
* Test renamed and broadened to verify the throw fires without expectedOutput.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: alliscode <25218250+alliscode@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

981726cc15 · 2026-05-13 19:03:27 +00:00

Name	Last commit	Date
_StartHere	.NET: Add error checking to workflow samples (#5175)	2026-04-16 20:03:16 +00:00
Agents	.NET: Add error checking to workflow samples (#5175)	2026-04-16 20:03:16 +00:00
Checkpoint	.NET: Add error checking to workflow samples (#5175)	2026-04-16 20:03:16 +00:00
Concurrent	.NET: Add error checking to workflow samples (#5175)	2026-04-16 20:03:16 +00:00
ConditionalEdges	.NET: Add error checking to workflow samples (#5175)	2026-04-16 20:03:16 +00:00
Declarative	.NET: Add declarative HttpRequestAction sample (#5572)	2026-04-29 19:19:31 +00:00
Evaluation	.NET: feat(evals): add ground_truth/expected_output support for workflow evaluation (#5755)	2026-05-13 19:03:27 +00:00
HumanInTheLoop/HumanInTheLoopBasic	.NET: Add error checking to workflow samples (#5175)	2026-04-16 20:03:16 +00:00
Loop	.NET: Add error checking to workflow samples (#5175)	2026-04-16 20:03:16 +00:00
Observability	.NET: Add error checking to workflow samples (#5175)	2026-04-16 20:03:16 +00:00
Orchestration/Handoff	.NET: Add Handoff sample (#5245)	2026-04-16 20:02:31 +00:00
Resources	Python / .NET Samples - Restructure and Improve Samples (Feature Branc… (#4092)	2026-02-26 00:56:10 +00:00
SharedStates	.NET: Add error checking to workflow samples (#5175)	2026-04-16 20:03:16 +00:00
Visualization	Python / .NET Samples - Restructure and Improve Samples (Feature Branc… (#4092)	2026-02-26 00:56:10 +00:00
README.md	.NET: Add Handoff sample (#5245)	2026-04-16 20:02:31 +00:00

README.md

Workflow Getting Started Samples

The workflow samples demonstrate the fundamental concepts and functionality of workflows in Agent Framework.

Samples Overview

Foundational Concepts - Start Here

Please begin with the Start Here samples and work through them in order. They introduce the core concepts of executors, edges, agents in workflows, streaming, and workflow construction.

The folder name starts with an underscore (_StartHere) to ensure it appears first in the explorer view.

Sample	Concepts
Streaming	Extends workflows with event streaming
Agents	Use agents in workflows
Agentic Workflow Patterns	Demonstrates common agentic workflow patterns
Multi-Service Workflows	Shows how to use multiple AI services in the same workflow
Sub-Workflows	Demonstrates composing workflows hierarchically by embedding workflows as executors
Mixed Workflow with Agents and Executors	Shows how to mix agents and executors, using the adapter pattern for type conversion and protocol handling
Writer-Critic Workflow	Demonstrates iterative refinement with quality gates, max iteration safety, multiple message handlers, and conditional routing for feedback loops

Once completed, please proceed to the other samples listed below.

Agents

Sample	Concepts
Foundry Agents in Workflows	Demonstrates using Microsoft Foundry agents in a workflow through `ChatClientAgent`
Custom Agent Executors	Shows how to create a custom agent executor for more complex scenarios
Workflow as an Agent	Illustrates how to encapsulate a workflow as an agent
Group Chat with Tool Approval	Shows multi-agent group chat with tool approval requests and human-in-the-loop interaction

Concurrent Execution

Sample	Concepts
Fan-Out and Fan-In	Introduces parallel processing with fan-out and fan-in patterns

Loop

Sample	Concepts
Looping	Shows how to create a loop within a workflow

Workflow Shared States

Sample	Concepts
Shared States	Demonstrates shared states between executors for data sharing and coordination

Conditional Edges

Sample	Concepts
Edge Conditions	Introduces conditional edges for dynamic routing based on executor outputs
Switch-Case Routing	Extends conditional edges with switch-case routing for multiple paths
Multi-Selection Routing	Demonstrates multi-selection routing where one executor can trigger multiple downstream executors

Orchestration Patterns

Sample	Concepts
Handoff Orchestration	Introduces the Handoff Orchestration pattern