mirror of https://github.com/microsoft/agent-framework.git synced 2026-06-16 21:04:09 +08:00

Ben Thomas 981726cc15 .NET: feat(evals): add ground_truth/expected_output support for workflow evaluation (#5755)

* .NET: feat(evals): add ground_truth/expected_output support for workflow eval

Brings .NET to parity with Python PR #5234 for issue #5135:

- Add expectedOutput parameter to Run.EvaluateAsync (workflow) and stamp on the overall EvalItem.ExpectedOutput.
- Map EvalItem.ExpectedOutput -> ground_truth in the Foundry JSONL payload, item_schema, and data_mapping for similarity.
- Add GroundTruthEvaluators set (currently builtin.similarity) and a FindMissingGroundTruthEvaluators helper.
- Fail fast with InvalidOperationException when a ground-truth evaluator is selected but no item provides an ExpectedOutput, instead of surfacing a remote provider error.
- Add tests in FoundryEvalConverterTests and WorkflowEvaluationTests.
- Add Evaluation_WorkflowExpectedOutputs sample (workflow + Foundry similarity).
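The ExpectedOutput -> ground_truth mapping can be pictured with a minimal sketch. This is not the FoundryEvals converter: `ToJsonlRow` is a hypothetical helper, the `query` and `response` field names are assumptions, and omitting `ground_truth` when no expected output exists is an illustrative choice. Only the `ground_truth` name comes from the notes above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// One JSONL row per evaluation item.
Console.WriteLine(ToJsonlRow("What is 2 + 2?", "4", "4"));
Console.WriteLine(ToJsonlRow("Say hello.", "Hello!", null));

// Hypothetical helper: serializes one item as a single JSON line and adds
// "ground_truth" only when an expected output was supplied.
static string ToJsonlRow(string query, string response, string? expectedOutput)
{
    var row = new Dictionary<string, string>
    {
        ["query"] = query,       // assumed field name
        ["response"] = response, // assumed field name
    };

    if (expectedOutput is not null)
    {
        row["ground_truth"] = expectedOutput;
    }

    return JsonSerializer.Serialize(row);
}
```

A reference-based evaluator such as similarity would then read the `ground_truth` column through its data mapping, which is why rows without that field are a problem for it.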

Fixes microsoft/agent-framework#5135 (.NET side).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address review: relax BuildOverallItem events to IReadOnlyList<WorkflowEvent>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Sample: disable per-agent breakdown when using reference-based evaluator

Per-agent EvalItems are intentionally left without ExpectedOutput, so the new fail-fast validation in FoundryEvals would throw when Similarity is invoked for per-agent items. Pass includePerAgent: false in the workflow + similarity sample, and document this gotcha in the EvaluateAsync XML doc.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix BuildOverallItem: fall back to last ExecutorCompletedEvent

AgentResponseEvent is only emitted when AIAgentHostOptions.EmitAgentResponseEvents is enabled, which is not the default for WorkflowBuilder(agent).AddEdge(...). When it is absent, fall back to the last non-internal ExecutorCompletedEvent whose Data is an AgentResponse / ChatMessage / string so the overall EvalItem (and any expectedOutput) is produced. Without this, samples wired up the standard way returned 0 evaluation items.

Update test to cover the fallback path.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Sample: enable EmitAgentResponseEvents; eval throws clear error when no overall response found

Root cause of '0 results': AIAgentHostExecutor only emits AgentResponseEvent when AIAgentHostOptions.EmitAgentResponseEvents is true (default false). For ordinary AIAgent executors the runtime's ExecutorCompletedEvent.Data is null, so the prior fallback couldn't find a final response either.

Sample now builds executors with EmitAgentResponseEvents=true via BindAsExecutor(hostOptions). EvaluateAsync now throws InvalidOperationException with a remediation hint when the user supplies expectedOutput but no overall final response can be located, instead of silently returning 0/0.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Guard against null sample/error/usage/datasource_item in ParseDetailedItem

Foundry eval responses can have these properties present with JSON null
or non-object values, which caused JsonElement.TryGetProperty to throw
'requires Object, has Null'. Check ValueKind == Object before drilling in.
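A minimal sketch of this guard, using only System.Text.Json. `ReadInt` and the sample payload are hypothetical and are not taken from ParseDetailedItem; the point is that `JsonElement.TryGetProperty` throws InvalidOperationException on a non-object element, so `ValueKind` is checked first.

```csharp
using System;
using System.Text.Json;

// Hypothetical payload: "sample" is JSON null, "usage" is a real object.
using JsonDocument doc = JsonDocument.Parse(
    "{\"sample\": null, \"usage\": {\"total_tokens\": 42}}");
JsonElement item = doc.RootElement;

Console.WriteLine(ReadInt(item, "sample", "total_tokens") is null); // True
Console.WriteLine(ReadInt(item, "usage", "total_tokens"));          // 42

// Reads parent[objectName][fieldName] as an int, returning null when any
// step is missing, JSON null, or not of the expected kind.
static int? ReadInt(JsonElement parent, string objectName, string fieldName)
{
    if (parent.ValueKind == JsonValueKind.Object
        && parent.TryGetProperty(objectName, out JsonElement child)
        && child.ValueKind == JsonValueKind.Object // guard before drilling in
        && child.TryGetProperty(fieldName, out JsonElement value)
        && value.ValueKind == JsonValueKind.Number
        && value.TryGetInt32(out int number))
    {
        return number;
    }

    return null;
}
```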

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address PR review: reorder expectedOutput, tighten ground-truth check, add fail-fast test

* WorkflowEvaluationExtensions.EvaluateAsync: move 'expectedOutput' to
  after 'splitter' so the original positional contract of (splitter,
  cancellationToken) is preserved for existing callers.
* FoundryEvals: require ALL items to carry ExpectedOutput when a
  ground-truth evaluator is selected (e.g. similarity), not just any.
  Reference-based evaluators score per-item, so a single missing GT
  would still surface as a provider-side validation error. Updated
  fail-fast message accordingly.
* WorkflowEvaluationTests: add EvaluateAsync_WithExpectedOutputButNoFinalResponse_ThrowsAsync
  to verify the InvalidOperationException is thrown (and that the
  message mentions EmitAgentResponseEvents).
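The "all items, not just any" rule can be sketched as follows. This is an illustration under stated assumptions, not the FoundryEvals code: `EnsureGroundTruth`, the tuple standing in for EvalItem, and the error text are all hypothetical. Only the evaluator name builtin.similarity and the use of InvalidOperationException come from the notes above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Evaluators that need a reference answer (illustrative set).
var referenceEvaluators = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
{
    "builtin.similarity",
};

// Hypothetical stand-in for EvalItem: (query, response, expected output).
var items = new List<(string Query, string Response, string? ExpectedOutput)>
{
    ("What is 2 + 2?", "4", "4"),
    ("What is the capital of France?", "Paris", null), // missing ground truth
};

try
{
    EnsureGroundTruth(new[] { "builtin.similarity" }, items, referenceEvaluators);
}
catch (InvalidOperationException ex)
{
    Console.WriteLine(ex.Message);
}

// Throws before any remote call when a reference-based evaluator is selected
// and at least one item has no expected output.
static void EnsureGroundTruth(
    IEnumerable<string> selectedEvaluators,
    IReadOnlyList<(string Query, string Response, string? ExpectedOutput)> evalItems,
    ISet<string> groundTruthEvaluators)
{
    List<string> needGroundTruth = selectedEvaluators
        .Where(name => groundTruthEvaluators.Contains(name))
        .ToList();
    if (needGroundTruth.Count == 0)
    {
        return; // No reference-based evaluator selected, nothing to check.
    }

    int missing = evalItems.Count(e => string.IsNullOrEmpty(e.ExpectedOutput));
    if (missing > 0)
    {
        throw new InvalidOperationException(
            $"{missing} of {evalItems.Count} item(s) have no ExpectedOutput, " +
            $"but these evaluators require one: {string.Join(", ", needGroundTruth)}.");
    }
}
```

Checking every item rather than any single one matches the per-item scoring described above: one row without ground truth would otherwise still fail on the provider side.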

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fail-fast on missing overall item regardless of expectedOutput; harden BuildOverallItem default

* EvaluateAsync now throws InvalidOperationException whenever 'includeOverall'
  is requested but BuildOverallItem cannot produce an item, instead of only
  when 'expectedOutput' is supplied. Same misconfiguration (agents not bound
  with EmitAgentResponseEvents) used to silently return empty results — now
  it surfaces a clear, actionable error in both cases.
* BuildOverallItem switch default now throws instead of returning null. The
  preceding for-loop already constrains Data to AgentResponse/ChatMessage/
  string, so reaching default would indicate a contract drift; throw to make
  the bug visible.
* Test renamed and broadened to verify the throw fires without expectedOutput.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: alliscode <25218250+alliscode@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

981726cc15 · 2026-05-13 19:03:27 +00:00


01-get-started

Fix typo:sesionEleme -> sessionElement (#5674 )

2026-05-07 21:05:41 +00:00

02-agents

Fixing FoundryToolboxMcp sample to use created toolbox. (#5786)

2026-05-13 16:53:16 +00:00

03-workflows

.NET: feat(evals): add ground_truth/expected_output support for workflow evaluation (#5755)

2026-05-13 19:03:27 +00:00

04-hosting

.NET: Hosted Agents - RAG Sample with Azure AI Search (#5693) (#5701)

2026-05-11 13:59:42 +00:00

05-end-to-end

.NET: Fix OpenAIResponsesAgentClient to include agentName in endpoint path (#5748)

2026-05-12 17:16:47 +00:00

.editorconfig

Update analyzers for .NET 10 SDK (#2611)

2025-12-10 10:34:56 +00:00

AGENTS.md

.NET: Update models used in dotnet samples to gpt-5.4-mini (#5080)

2026-04-07 15:34:00 +00:00

Directory.Build.props

.NET: Simplify store=false scenario for responses (#4124)

2026-02-23 12:24:32 +00:00

README.md

.NET: [Breaking] Migrate A2A agent and hosting to A2A SDK v1 (#5423)

2026-04-23 07:53:00 +00:00

README.md

Agent Framework Samples

The Agent Framework samples are designed to help you get started building AI-powered agents on top of various providers.

The Agent Framework supports building agents on various inference and inference-style services. All of these are supported through the single ChatClientAgent class.

The Agent Framework also supports creating proxy agents that let you access remote agents as if they were local. These are supported through various AIAgent subclasses.

Sample Structure

| Folder | Description |
| --- | --- |
| `01-get-started/` | Progressive tutorial: hello agent → hosting |
| `02-agents/` | Deep-dive by concept: tools, middleware, providers, orchestrations |
| `03-workflows/` | Workflow patterns: sequential, concurrent, state, declarative |
| `04-hosting/` | Deployment: Azure Functions, Durable Tasks |
| `05-end-to-end/` | Full applications, evaluation, demos |

Getting Started

Start with 01-get-started/ and work through the numbered files:

01_hello_agent — Create and run your first agent
02_add_tools — Add function tools
03_multi_turn — Multi-turn conversations with AgentSession
04_memory — Agent memory with AIContextProvider
05_first_workflow — Build a workflow with executors and edges
06_host_your_agent — Host your agent via Azure Functions

Additional Samples

Some additional samples of note include:

Agents: Basic steps to get started with the Agent Framework. These samples demonstrate the framework's fundamental concepts and functionality through AIAgent, and they work with any underlying service that provides an AIAgent implementation.
Agent Providers: Shows how to create an AIAgent instance for a selection of providers.
Agent Telemetry: A demo that showcases the integration of OpenTelemetry with the Microsoft Agent Framework, using Azure OpenAI and the .NET Aspire Dashboard for telemetry visualization.
Durable Agents - Azure Functions: Samples for using the Microsoft Agent Framework with Azure Functions via the durable task extension.
Durable Agents - Console Apps: Samples demonstrating durable agents in console applications.

Migration from Semantic Kernel

If you are migrating from Semantic Kernel to the Microsoft Agent Framework, the following resources provide guidance and side-by-side examples to help you transition your existing agents, tools, and orchestration patterns. The migration samples map Semantic Kernel primitives (such as ChatCompletionAgent and Team orchestrations) to their Agent Framework equivalents (such as ChatClientAgent and workflow builders).

For an in-depth migration guide, see the official migration documentation.

Prerequisites

See each set of samples for its specific prerequisites.