* Foundry Evals integration for .NET
- Core evaluation framework: EvalItem, LocalEvaluator, FunctionEvaluator, EvalChecks
- IAgentEvaluator interface with MeaiEvaluatorAdapter bridge
- AgentEvaluationExtensions for agent.EvaluateAsync() overloads
- FoundryEvals wrapping MEAI quality/safety evaluators
- ConversationSplitters (LastTurn, Full) and IConversationSplitter
- EvalItem.PerTurnItems() for multi-turn decomposition
- HasImageContent for multimodal content detection
- WorkflowEvaluationExtensions for per-agent workflow evaluation
- 7 eval samples mirroring Python parity:
02-agents/Evaluation: SimpleEval, ExpectedOutputs, Multimodal
03-workflows/Evaluation: WorkflowEval
05-end-to-end/Evaluation: FoundryQuality, MixedProviders, ConversationSplits
- Comprehensive unit tests (1958 passing)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Rewrite FoundryEvals to use real Foundry Evals API
Replace MEAI evaluator shim with actual OpenAI EvaluationClient protocol
methods. FoundryEvals now creates eval definitions, submits runs, polls
for completion, and fetches per-item results server-side.
- New constructor: FoundryEvals(AIProjectClient, model, evaluators)
- Add FoundryEvalConverter for MEAI ChatMessage -> Foundry JSON format
- Add EvalId, RunId, ReportUrl to AgentEvaluationResults
- All 20 built-in evaluator constants now work (agent, tool, quality, safety)
- Remove Microsoft.Extensions.AI.Evaluation.Quality/Safety dependencies
- Update all samples for new constructor (no more ChatConfiguration)
- Replace BuildEvaluators tests with ResolveEvaluator tests
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Add response output to CustomEvals and ExpectedOutputs samples
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Address review: pagination, validation, error handling, tests
FoundryEvals fixes:
- Add pagination for output items (has_more/after cursor)
- Add guard clauses for pollIntervalSeconds/timeoutSeconds <= 0
- Fix double TryGetProperty for passed field parsing
- Throw on all-tool-evaluators with no tool definitions
- Fix XML doc (default 300s, not 180s)
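Illustrative sketch of the has_more/after cursor loop named above. This is not the FoundryEvals code: `FetchAllItems` and the `fetchPage` delegate are hypothetical stand-ins for the real "list output items" call, and the tuple shape is an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: `fetchPage` stands in for one "list output items" call.
// It receives the `after` cursor (null for the first page) and returns the
// page's items, the has_more flag, and the ID of the last item on that page.
static List<string> FetchAllItems(
    Func<string?, (IReadOnlyList<string> Items, bool HasMore, string? LastId)> fetchPage)
{
    var all = new List<string>();
    string? after = null;
    while (true)
    {
        var page = fetchPage(after);
        all.AddRange(page.Items);

        // Stop when the server reports no further pages, or when it gives no
        // cursor to continue from (avoids looping forever on a bad response).
        if (!page.HasMore || page.LastId is null)
        {
            break;
        }

        after = page.LastId;
    }

    return all;
}
```

The loop always requests the first page with a null cursor, then feeds each page's last ID back in as `after` until has_more is false.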
New tests (30 added, 1989 total):
- EvalChecks: NonEmpty, ContainsExpected (pass/fail/skip/case),
HasImageContent, ToolCallsPresent
- FoundryEvalConverter: ConvertMessage (text, image, function call,
function results fan-out, empty fallback, mixed content),
ConvertEvalItem, BuildTestingCriteria (quality/agent/tool/groundedness
data mappings), BuildItemSchema
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Fix review: null-refs, Data.ToString() bug, ContainsExpected, add tests
- Fix NullReferenceException in sample Response display (pattern matching)
- Fix WorkflowEvaluationExtensions Data?.ToString() producing type names
instead of message text (pattern-match ChatMessage/AgentResponse/list)
- Change EvalChecks.ContainsExpected to return Passed=false when no
ExpectedOutput (was silently passing, masking misconfiguration)
- Add EvalItem constructor tests with LastTurn/Full/null splitters
- Add FoundryEvalConverter.ConvertMessage DataContent (base64 image) test
- Add ExtractAgentData tests with ChatMessage, list, and AgentResponse data
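Illustrative sketch of the ContainsExpected behavior change (a missing ExpectedOutput now fails instead of passing). `CheckContainsExpected` is a hypothetical name, and the case-insensitive comparison and the reason strings are assumptions, not the actual EvalChecks implementation.

```csharp
using System;

// Hypothetical check: fails loudly when no expected output is configured,
// rather than passing silently and masking a misconfigured eval item.
static (bool Passed, string Reason) CheckContainsExpected(string response, string? expectedOutput)
{
    if (string.IsNullOrEmpty(expectedOutput))
    {
        return (false, "No ExpectedOutput was provided for this item.");
    }

    // Assumption: substring match that ignores case. IndexOf is used because
    // it is available on netstandard2.0 as well as newer targets.
    bool found = response.IndexOf(expectedOutput, StringComparison.OrdinalIgnoreCase) >= 0;
    return (found, found ? "Expected output found." : "Expected output not found.");
}
```

Returning a reason alongside the flag lets a report distinguish "not configured" from "configured but not matched".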
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Fix review: conversation fidelity, eval caching, fallback tests
- WorkflowEvaluationExtensions: preserve full response messages (tool calls,
intermediate) instead of synthetic 2-message conversation. Cast completed
Data to AgentResponse and use Messages when available; fall back to text.
- FoundryEvals: cache evalId per schema shape (hasContext, hasTools) so
subsequent EvaluateAsync calls create runs under the same eval definition.
- MeaiEvaluatorAdapter: code already correctly passes queryMessages (not full
conversation) to IEvaluator — no change needed, verified by inspection.
- Add tests: AgentResponse full messages preservation, unknown object
ToString() fallback for ExtractAgentData.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Rename AzureAI→Foundry: move eval files, update references
- Move FoundryEvals.cs and FoundryEvalConverter.cs from
Microsoft.Agents.AI.AzureAI to Microsoft.Agents.AI.Foundry
- Update namespace from AzureAI to Foundry in both files
- Add explicit usings required by Foundry project (no implicit usings)
- Move FoundryEvalConverter tests to Foundry.UnitTests project
(avoids ReplacingRedactor type conflict from dual project refs)
- Update all sample csproj references and using statements
- Remove Foundry project reference from AI UnitTests
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* PR review round 4: wire up tool extraction, remove eval cache, fix null safety
- BuildEvalItem: extract tools from agent via GetService<ChatOptions>() into EvalItem.Tools (Python parity)
- FoundryEvals: remove eval ID cache - each call creates fresh definition (matches Python behavior)
- FoundryEvals: replace null-forgiving operators with descriptive InvalidOperationException
- MixedProviders sample: remove unnecessary explicit PackageReferences (transitively provided)
- FoundryEvalConverter: document that tool results take precedence over text content
- Add LocalEvaluator zero-checks test documenting 0 metrics = failed behavior
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Python-dotnet parity: 9 feature gaps filled
New checks:
- ToolCallArgsMatch() — verify tool call names + argument subset match
- ToolCalledCheck(ToolCalledMode.Any, ...) — match any of the specified tools
- ToolCalledMode enum (All/Any)
FoundryEvals enhancements:
- Default evaluators now [Relevance, Coherence, TaskAdherence] (was Relevance, Coherence)
- Auto-add ToolCallAccuracy when items have tool definitions
- EvaluateTracesAsync — evaluate by response_ids, trace_ids, or agent_id
- EvaluateFoundryTargetAsync — evaluate deployed Foundry targets
Result type enrichment:
- AgentEvaluationResults: added Status, Error, PerEvaluator, DetailedItems
- New EvalItemResult/EvalScoreResult/PerEvaluatorResult types
- FoundryEvals populates all new fields from API responses
Workflow fix:
- Skip internal executors (_*, input-conversation, end-conversation, end)
Tests: 8 new tests covering ToolCallArgsMatch, ToolCalledMode.Any, internal executor filtering
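Illustrative sketch of the "argument subset match" idea behind ToolCallArgsMatch. `ArgsAreSubset` is a hypothetical helper, not the real check. It assumes arguments are plain CLR values compared with object.Equals; arguments that arrive as JSON elements would need value-based comparison instead.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: returns true when every expected argument is present
// in the actual arguments with an equal value. Extra actual arguments are
// ignored, which is what makes this a subset match rather than an exact one.
static bool ArgsAreSubset(
    IReadOnlyDictionary<string, object?> expected,
    IReadOnlyDictionary<string, object?> actual)
{
    foreach (KeyValuePair<string, object?> pair in expected)
    {
        if (!actual.TryGetValue(pair.Key, out object? value) || !object.Equals(pair.Value, value))
        {
            return false;
        }
    }

    return true;
}
```

An empty set of expected arguments matches any call, so a check built this way would still need to compare tool names separately.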
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Add MeaiEvaluatorAdapter and PerTurnItems edge case tests
- 3 tests for MeaiEvaluatorAdapter: query message forwarding, synthetic
response fallback, multiple items aggregation
- 3 tests for EvalItem.PerTurnItems: empty conversation, no user messages,
system+assistant only
- StubEvaluator and StubChatClient test helpers
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Blocking link check for outdated package in DevUI.
* Replace Dictionary<string, object> payloads with typed wire models
Introduce internal FoundryEvalWireModels.cs with compile-time-safe types
for the OpenAI Evals API wire format. The OpenAI .NET SDK (2.9.1) only
provides protocol-level methods with BinaryContent/ClientResult — no
typed request models. These internal models replace scattered dictionary
literals with [JsonPropertyName]-annotated classes, giving:
- Compile-time safety (typos become build errors)
- Single point of change when the API evolves
- IntelliSense discoverability
- Cleaner serialization via JsonPolymorphic for content items
Models: WireContentItem hierarchy (text, image, tool_call, tool_result),
WireMessage, WireEvalItemPayload, WireTestingCriterion, WireItemSchema,
WireCreateEvalRequest, WireCreateRunRequest, and data source variants.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Skip metric when Foundry returns neither score nor passed
When an evaluator returns no score and no passed value, the previous
code created BooleanMetric(name, false), which falsely failed items
via ItemPassed. Now we skip the MEAI metric entirely for indeterminate
results — the raw data remains available in DetailedItems for diagnostics.
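Illustrative sketch of skipping indeterminate results. `KeepDeterminateResults` and the tuple shape are hypothetical; the real code builds MEAI metrics rather than returning names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: returns the names of evaluator results that carry a
// score or a passed flag. A result with neither is skipped instead of being
// recorded as a failure.
static List<string> KeepDeterminateResults(
    IEnumerable<(string Name, double? Score, bool? Passed)> results)
{
    var kept = new List<string>();
    foreach (var result in results)
    {
        if (result.Score is null && result.Passed is null)
        {
            // Indeterminate: no metric is produced for this evaluator.
            continue;
        }

        kept.Add(result.Name);
    }

    return kept;
}
```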
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Address PR #4914 review comments: fix tool evaluator bug and add tests
- Fix duplicate ToolCallAccuracy: resolve evaluator names before checking
against ToolEvaluators set (Comment 2)
- Make FilterToolEvaluators internal for testability; add tests for the
ArgumentException edge case when all evaluators are tool-type (Comment 3)
- Add CancellationToken test for LocalEvaluator (Comment 4)
- Add EvaluateAsync integration test on Run with sequential workflow and
per-agent SubResults verification (Comment 5)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Address Peter's review comments on PR #4914
- Add trailing newline to Evaluation_FoundryQuality.csproj (Comment 6)
- Make evaluator name lookups case-insensitive: switch BuiltinEvaluators,
ToolEvaluators, AgentEvaluators, and ResolveEvaluator's StartsWith check
from Ordinal to OrdinalIgnoreCase (Comment 7)
- Add Trace.TraceWarning when Foundry returns fewer results than submitted
items, indicating expected vs actual count before padding (Comment 8)
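Illustrative sketch of the Ordinal to OrdinalIgnoreCase switch. The helper names, evaluator names, and prefix used here are hypothetical; the real constants are not shown in this log.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: a name set whose lookups ignore case.
static HashSet<string> CreateEvaluatorSet(params string[] names) =>
    new HashSet<string>(names, StringComparer.OrdinalIgnoreCase);

// Hypothetical helper: a prefix test that ignores case.
static bool HasPrefix(string name, string prefix) =>
    name.StartsWith(prefix, StringComparison.OrdinalIgnoreCase);
```

Passing the comparer to the set constructor keeps every Contains call consistent, so callers do not have to normalize case themselves.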
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Add Microsoft.Extensions.AI.Evaluation packages to Directory.Packages.props
These were removed in #5269 as unused, but are needed by the Foundry
and core evaluation integration added in this PR.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---------
Co-authored-by: alliscode <bentho@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Rename authored identifiers, XML docs, log messages, and comments
from 'folder' to 'directory' across the file skills codebase for
consistency with the agentskills.io specification and .NET conventions.
Public API changes (experimental):
- ScriptFolders → ScriptDirectories
- ResourceFolders → ResourceDirectories
.NET BCL API calls (Directory.Exists, Path.GetDirectoryName, etc.)
were already using 'directory' and are unchanged.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* support reflection for discovery of resources and scripts in class-based skills
* fix format issues
* refactor samples to use reflection
* Validate resource member signatures during discovery
Add discovery-time validation in AgentClassSkill.DiscoverResources() to
fail fast when [AgentSkillResource] is applied to members with incompatible
signatures:
- Reject indexer properties (getter has parameters)
- Reject methods with parameters other than IServiceProvider or
CancellationToken
Throws InvalidOperationException with actionable error messages instead of
allowing silent runtime failures when ReadAsync invokes the AIFunction with
no named arguments.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* prevent duplicates
---------
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* .NET: Add JsonSerializerOptions support to programmatic skill APIs
Allow callers to pass custom JsonSerializerOptions when creating inline
resources and scripts via AgentInlineSkill, AgentClassSkill,
AgentInlineSkillResource, and AgentInlineSkillScript. A skill-level
default can be set on AgentInlineSkill and overridden per-resource/
script call.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Update dotnet/tests/Microsoft.Agents.AI.UnitTests/AgentSkills/TestSkillTypes.cs
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* add class-based skills
* address formatting issues
* Remove generated filtered-unit.slnx and add to .gitignore
The filtered solution file is generated dynamically by
eng/scripts/New-FilteredSolution.ps1 during CI. Checking it in
risks it becoming stale and out-of-sync with the real solution.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* discover scripts and resources from folders defined in spec
* Remove Step05 and Step06 DI skill samples
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* address review comments
* fix build error
* Fix mixed path separators in skill folder discovery on .NET Framework
Path.Combine with forward-slash folder names (e.g. "scripts/f1") produces
mixed separators on Windows, causing the StartsWith containment check to
fail against Path.GetFullPath-resolved file paths. Wrap in Path.GetFullPath
to canonicalize separators before the containment comparison.
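Illustrative sketch of the containment check after canonicalizing with Path.GetFullPath. `IsUnderDirectory` is a hypothetical helper, and the ordinal comparison and trailing-separator handling are assumptions, not necessarily what the skill discovery code does.

```csharp
using System;
using System.IO;

// Hypothetical helper: true when filePath lies under skillRoot/relativeDir.
static bool IsUnderDirectory(string skillRoot, string relativeDir, string filePath)
{
    // Path.Combine keeps the '/' inside relativeDir as-is; Path.GetFullPath
    // then normalizes it to the platform separator on Windows.
    string dir = Path.GetFullPath(Path.Combine(skillRoot, relativeDir));
    string file = Path.GetFullPath(filePath);

    // Append a trailing separator so "scripts/f1" does not match "scripts/f10".
    string separator = Path.DirectorySeparatorChar.ToString();
    if (!dir.EndsWith(separator, StringComparison.Ordinal))
    {
        dir += separator;
    }

    return file.StartsWith(dir, StringComparison.Ordinal);
}
```

Both sides go through Path.GetFullPath, so the comparison sees the same separator style regardless of how the folder name was written.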
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* address comment
---------
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* add class-based skills
* address formatting issues
* Remove generated filtered-unit.slnx and add to .gitignore
The filtered solution file is generated dynamically by
eng/scripts/New-FilteredSolution.ps1 during CI. Checking it in
risks it becoming stale and out-of-sync with the real solution.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* consolidate DI samples into one
* fix file encoding
* suppress compatibility warning
---------
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Add API breaking change validation for RC packages
Enable .NET Package Validation for release candidate packages to detect
API breaking changes in CI. This follows the same pattern used by
Semantic Kernel, centralized through nuget-package.props.
Changes:
- Enable EnablePackageValidation for IsReleaseCandidate packages
- Update PackageValidationBaselineVersion to 1.0.0-rc4 (latest published)
- Generate CompatibilitySuppressions.xml for existing known API changes
in 5 packages (AI, AzureAI, OpenAI, Workflows, Workflows.Declarative.AzureAI)
- Opt out Workflows.Declarative.Mcp (not yet published to NuGet)
- Add breaking changes guidance to CONTRIBUTING.md
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Address PR review feedback
- Remove unnecessary empty PackageValidationBaselineVersion override
in Workflows.Declarative.Mcp.csproj (EnablePackageValidation=false
is sufficient)
- Tighten CONTRIBUTING.md wording to clarify opt-out possibility
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Enable package validation for GA packages (no VersionSuffix)
Expand the EnablePackageValidation condition to also cover future GA
packages that have no VersionSuffix, not just RC packages.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Fix EnablePackageValidation GA condition to check PackageVersion
The previous condition VersionSuffix=='' matched all packages (preview
included) since VersionSuffix defaults to empty. Now uses two separate
conditions: one for RC, one for true GA (PackageVersion == VersionPrefix).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Add IsGeneralAvailable flag for package validation
Replace fragile PackageVersion condition with explicit IsGeneralAvailable
property, following the same per-project self-declaration pattern as
IsReleaseCandidate.
- Directory.Build.props: Add IsGeneralAvailable=false default
- nuget-package.props: EnablePackageValidation on RC OR GA
- CONTRIBUTING.md: Update docs to mention both flags
When packages go GA, they set IsGeneralAvailable=true in their .csproj.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Rename IsGeneralAvailable to IsGenerallyAvailable
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* add inline skills
* Fix IDE1006 and IDE0004 formatting errors in test files
- Add 'Async' suffix to async test methods in FilteringAgentSkillsSourceTests,
DeduplicatingAgentSkillsSourceTests, and AgentInMemorySkillsSourceTests
- Use pragma to suppress false-positive IDE0004 on casts needed for overload
disambiguation in AgentInlineSkillTests and AgentInlineSkillResourceTests
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* address issues
* address comments
* make inline skills script and resource model classes internal
---------
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Persist messages during the Function Call Loop
* Revert version reset
* Fix bugs and improve sample
* Fix formatting issues
* Also updating conversation id during run
* Update based on ADR feedback
* Fix FileAgentSkillsProvider accepting SkillsInstructionPrompt without {0} placeholder (#4638)
BuildSkillsInstructionPrompt validated only format-string syntax via
string.Format(template, ""), which silently accepted templates without a
{0} placeholder. The generated skills list was then dropped from the final
instructions.
Tighten validation to format with a sentinel string and verify it appears
in the output, rejecting templates that do not reference argument 0 with
an ArgumentException.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Fix netstandard2.0 compat and simplify prompt template validation (#4638)
- Replace string.Contains(string, StringComparison) with IndexOf for
netstandard2.0/net472 compatibility
- Remove sentinel round-trip check; validate {0} directly on the raw
template string using IndexOf
- Add positive test verifying custom SkillsInstructionPrompt with {0}
is accepted and applied to output
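Illustrative sketch of validating {0} directly on the raw template with IndexOf. `ValidateSkillsPromptTemplate` and the error message are hypothetical. A plain IndexOf test like this does not recognize aligned or formatted placeholders such as {0,10} or {0:x}, and it would accept an escaped {{0}}.

```csharp
using System;

// Hypothetical validator: rejects templates that never reference argument 0.
// IndexOf with a StringComparison is used instead of
// string.Contains(string, StringComparison), which is not available on
// netstandard2.0 or net472.
static void ValidateSkillsPromptTemplate(string template)
{
    if (template is null)
    {
        throw new ArgumentNullException(nameof(template));
    }

    if (template.IndexOf("{0}", StringComparison.Ordinal) < 0)
    {
        throw new ArgumentException(
            "The prompt template must contain a {0} placeholder for the skills list.",
            nameof(template));
    }
}
```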
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Fix filter combine logic for ChatHistoryMemoryProvider
* Replace var with explicit types in filter building code and test
Address PR review nit: use explicit types instead of var for better
readability in the filter-building logic and the new combined filter
compilation test.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Fix style issues
---------
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* support script execution by code interpreter
* improve the instruction prompt
* Add DefaultAzureCredential production warning to AgentSkills samples
Add the standard three-line WARNING comment about DefaultAzureCredential
production considerations to both AgentSkills sample Program.cs files,
matching the convention used in all other GettingStarted/Agents samples.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* address pr review comments
* address feedback
* rename Skill* types to FileAgentSkill* prefix for consistency
- Rename SkillFrontmatter -> FileAgentSkillFrontmatter
- Rename SkillScriptExecutor -> FileAgentSkillScriptExecutor
- Add FileAgentSkillScriptExecutionContext and FileAgentSkillScriptExecutionDetails
- Update sample, provider, loader, and tests accordingly
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* reorder usings
* use set for props initialization instead of init
* rename HostedCodeInterpreterSkillScriptExecutor
---------
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When converting base AgentRunOptions to ChatClientAgentRunOptions, the middleware
now preserves AllowBackgroundResponses, ContinuationToken, and AdditionalProperties
in addition to ResponseFormat.
Added unit test verifying all properties are preserved during the conversion.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Add ChatClient decorator for calling AIContextProviders
* Format new files
* Address PR comments
* Revert problematic change
* Rename Use to UseAIContextProvider
Extract 11 private const string fields for vector store property names
(Key, Role, MessageId, AuthorName, ApplicationId, AgentId, UserId,
SessionId, Content, CreatedAt, ContentEmbedding) and replace all inline
usages across the collection definition, store dictionary, search result
access, and filter expressions.
Fixes #3801
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>