mirror of https://github.com/microsoft/agent-framework.git synced 2026-06-16 21:04:09 +08:00

T

Ben Thomas e0d0ad16a0 Python: feat(evals): Foundry Adaptive Evals integration (rubric-generation) (#6101 )

* Python: feat(evals): RubricScore type + EvalScoreResult.dimensions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: feat(foundry-evals): RubricDimension + GeneratedEvaluatorRef + accept in evaluators=

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: feat(evals): parse rubric_scores from output items + assertion helpers

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: feat(evals): BaseAgent.as_eval_source / Workflow.as_eval_source

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: feat(foundry-evals): EvalGenerationSource + generate_rubric helper

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: feat(foundry-evals): YAML config loader + sample

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: fix(evals): address PR review feedback

Addresses 4 Copilot review comments on PR #6101:

1. assert_dimension_score_at_least: drop the (not evaluator or found_any) guard so require_applicable=True correctly raises when the named evaluator produces no entries for the dimension. Adds TestRubricAssertions covering the regression.

2. GeneratedEvaluatorRef docstring: reword to describe actual behaviour (pinning recommended, not required) so it matches the dataclass default and FoundryEvals warning path.

3. _poll_generation_job: switch from asyncio.get_event_loop() to get_running_loop() and bound the per-iteration sleep by remaining time, matching _poll_eval_run.

4. generate_rubric: type category as Literal['quality','safety'] and validate at the entry point with a ValueError; drop the silent 'invalid -> quality' rewrite in _generation_job_to_ref. Adds a regression test.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: feat(foundry-evals): hosted-agent-aware rubric generation

* Auto-detect hosted Foundry agents in agent_as_eval_source: when the
  agent's chat_client exposes a string agent_name (the convention used
  by RawFoundryAgentChatClient for PromptAgents/HostedAgents), emit a
  type='agent' EvalGenerationSource so the service fetches instructions
  and tools from the agent registry instead of relying on the local
  wrapper (which holds neither for hosted agents).
* Add hosted_agent_version kwarg and a new agent_version field on
  EvalGenerationSource so PromptAgent runs can pin to a specific hosted
  version for reproducible rubric generation.
* Add force_prompt_source escape hatch to bypass auto-detection and
  always emit a rendered prompt dossier - useful when the local wrapper
  carries overrides the service-side agent doesnt see.
* Fix _to_sdk_source for dataset sources: SDK ctor takes name=/version=,
  not dataset_name=/dataset_version=. The mismatch would raise TypeError
  against the real azure-ai-projects 2.3.0a* SDK; only unmocked
  integration paths were affected.

Tests cover: auto-detection happy path, versionless hosted agent,
explicit hosted_agent_version forwarding, force_prompt_source override,
non-string chat_client attrs (MagicMock test doubles) not mis-detected,
agent_version forwarded through _to_sdk_source, and the corrected
dataset SDK kwarg names.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(foundry-evals): accept canonical dimension_scores key per docs

The published Foundry rubric-evaluator output (Microsoft Learn 'Rubric evaluators' reference) places per-dimension breakdowns under properties.dimension_scores, not properties.rubric_scores. The parser now tries dimension_scores first and falls back to rubric_scores for preview-build compatibility, and tolerates non-list payloads (e.g. MagicMock auto-attrs) by trying the next candidate when parsing yields zero entries.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat(foundry-evals): add manual create_rubric_evaluator

Adds FoundryEvals.create_rubric_evaluator as the agent-framework surface over project_client.beta.evaluators.create_version. This is the manual counterpart to generate_rubric: callers supply RubricDimension instances (authored locally, ported from another framework, or hand-tuned) and we POST a RubricBasedEvaluatorDefinition. The service auto-attaches the non-editable residual dimension (general_quality for quality, general_policy_compliance for safety).

Per the Microsoft Learn 'Rubric evaluators' reference, the auto-generation path (create_generation_job) is primarily a portal/UI feature; external SDK clients with rich local agent context are better served by manual create_version. This keeps generate_rubric for users who want to round-trip through a Foundry-registered agent.

Validation up front: weight must be in [1,10], ids unique, descriptions non-empty, pass_threshold in [0,1]. The returned GeneratedEvaluatorRef is identical in shape to one obtained from generate_rubric, so downstream evaluators= lists work unchanged.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* samples(foundry-evals): manual rubric sample + namespace re-exports

Adds evaluate_with_manual_rubric_sample.py demonstrating the end-to-end dev scenario for FoundryEvals.create_rubric_evaluator: hand-author a list of RubricDimension, register via create_rubric_evaluator, then use the pinned GeneratedEvaluatorRef alongside built-in evaluators in an agent regression run.

Also re-exports RubricDimension, GeneratedEvaluatorRef, build_sources, and load_evals_config from agent_framework.foundry (both the lazy runtime shim and the type stub) so the rubric samples can import everything from a single namespace; the auto-generate sample was previously broken because the shim was missing build_sources / load_evals_config.

Updates the foundry-evals README with a chooser entry for the two rubric paths.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat(foundry-evals): remove rubric creation flows; keep consumption only

Reframes agent-framework as a pure consumer of Foundry rubric evaluators: scoring against rubrics that already exist (authored in the Foundry portal or via the dedicated SDK / REST surface) instead of creating them from the SDK.

Removed creation surface area:

- FoundryEvals.generate_rubric (auto-generate path) and create_rubric_evaluator (manual path), plus all _GenerationSdkTypes / _ManualRubricSdkTypes / _to_sdk_dimensions / _coalesce_generation_sources / _to_sdk_source / _poll_generation_job / _generation_job_to_ref / _evaluator_version_to_ref / _get_beta_evaluators / _import_*_sdk_types helpers.

- EvalGenerationSource (the input source discriminator), RubricDimension (the input dimension type), agent_as_eval_source / workflow_as_eval_source / _detect_hosted_foundry_agent helpers, and the YAML-config loader (_evals_config.py with RubricGenerationSpec / RubricSourceSpec / parse_evals_config / load_evals_config / build_sources).

- BaseAgent.as_eval_source / Workflow.as_eval_source plus the _render_agent_dossier / _render_workflow_dossier helpers in core. These existed only to feed the now-removed generation pipeline.

- Samples evaluate_with_generated_rubric_sample.py, evaluate_with_manual_rubric_sample.py, and evaluators.yaml. Replaced with a short README section showing how to reference an existing rubric evaluator via GeneratedEvaluatorRef.

Kept (consumption surface):

- GeneratedEvaluatorRef, slimmed to (name, version, display_name). Still accepted alongside built-in evaluator strings in FoundryEvals(evaluators=[...]). Versionless refs still warn.

- RubricScore on EvalScoreResult.dimensions plus EvalResults.assert_dimension_score_at_least for per-dimension CI gates.

- _parse_dimension_entries / _extract_rubric_scores output parsing (both canonical dimension_scores and the legacy rubric_scores key).

Tests: 160/160 foundry unit tests and 71/71 core local-eval tests pass; pyright is clean across changed files. The pre-existing tests/core/test_telemetry.py::test_detect_hosted_fallback_import_error failure is unrelated and reproduces on the prior commit.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* samples(foundry-evals): add evaluate_with_rubric_sample

Adds a runnable end-to-end sample showing how to consume a pre-existing rubric evaluator created in Foundry: reference it with GeneratedEvaluatorRef(name, version), mix it with built-in evaluators in FoundryEvals, and gate CI with assert_dimension_score_at_least on a specific dimension.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(foundry-evals): satisfy mypy on _fetch_output_items

mypy infers OutputItemListResponse.sample as dict[str, object] | None while pyright correctly infers the typed Sample model. Cast to Any so both type checkers accept the attribute access pattern, rename the local to avoid shadowing the inner-loop sample binding, and drop the now-stale pyright suppressions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs(foundry-evals): drop unpublished rubric-evaluators learn.microsoft.com link

The Adaptive Evals authoring docs are not yet published on Microsoft Learn, so the link 404s. Keep the descriptive text without the broken hyperlink; we can re-add it once the docs ship.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* test(foundry-evals): hoist repeated local imports to module top

Per code review feedback (eavanvalkenburg): the test file repeated 'from agent_framework_foundry._foundry_evals import ...' inside 22 test bodies and 'from agent_framework_foundry import GeneratedEvaluatorRef' inside 8 more. Move all of them to the existing top-level imports; the symbols are the same across tests and the local imports were redundant.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Ben Thomas <25218250+alliscode@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

e0d0ad16a0 · 2026-06-01 23:01:56 +00:00

2,206 Commits

.devcontainer

Python: Add a HarnessAgent with available features and sample (#6041 )

2026-05-27 14:54:00 +01:00

.github

.NET - Fix missing id on function_call_output in Foundry Hosting (#6246 )

2026-06-01 18:43:45 +00:00

declarative-agents

Python: Move workflow-samples and agent-samples under declarative-agents directory (#5011 )

2026-04-02 09:34:33 +00:00

docs

.NET: Add Hosted-MemoryAgent sample with isolation key plumbing (#5692 ) (#5702 )

2026-05-15 05:42:12 +00:00

dotnet

.NET: Update hosted agents (#6243 )

2026-06-01 21:27:29 +00:00

python

Python: feat(evals): Foundry Adaptive Evals integration (rubric-generation) (#6101 )

2026-06-01 23:01:56 +00:00

schemas

.NET: Add orchestration ID to durable agent entity state (#2137 )

2025-11-26 16:32:32 +00:00

.gitattributes

Python: Post-Fix GH008 - Re-Upload missing favicons via LFS (#1490 )

2025-10-15 16:35:50 +00:00

.gitignore

.NET: Update hosted agents (#6243 )

2026-06-01 21:27:29 +00:00

CODE_OF_CONDUCT.md

CODE_OF_CONDUCT.md committed

2025-04-28 12:54:41 -07:00

COMMUNITY.md

Add community readme (#1817 )

2025-10-30 20:29:01 +00:00

CONTRIBUTING.md

Improve CONTRIBUTING.md with dev setup links and docs guidance (#5000 )

2026-03-31 16:42:52 +00:00

LICENSE

LICENSE committed

2025-04-28 12:54:43 -07:00

README.md

docs: enhance README with 1.0 features and improved structure (#5534 )

2026-05-05 01:19:57 +00:00

SECURITY.md

SECURITY.md committed

2025-04-28 12:54:42 -07:00

SUPPORT.md

.NET: Update hosted agents (#6243 )

2026-06-01 21:27:29 +00:00

TRANSPARENCY_FAQ.md

Python: docs: Update Python orchestration documentation (#2087 )

2025-11-27 00:13:15 +00:00

wf-source-gen-plan.md

[BREAKING] Obsoleting ReflectingExecutor in favor of source gen (#3380 )

2026-02-04 20:07:43 +00:00

README.md

Welcome to Microsoft Agent Framework!

Microsoft Agent Framework (MAF) is an open, multi-language framework for building production-grade AI agents and multi-agent workflows in .NET and Python.

Microsoft Agent Framework is built for teams taking agents from prototype to production. It provides a consistent foundation for building, orchestrating, and operating agent systems across Python and .NET, while keeping architecture choices open as requirements evolve, and supports a broad ecosystem including Microsoft Foundry, Azure OpenAI, OpenAI, and the GitHub Copilot SDK, with samples and hosting patterns for both local development and cloud deployment.

Watch the full Agent Framework introduction (30 min)

Is this the right framework for you?

MAF is a strong fit if you:

are building agents and workflows you expect to run in production,
need orchestration beyond a single prompt or stateless chat loop,
want graph-based patterns such as sequential, concurrent, handoff, and group collaboration,
care about durability, restartability, observability, governance, or human-in-the-loop control,
need provider flexibility so your architecture can evolve without major rewrites.

Key Features

Explore new MAF capabilities and real implementation patterns on the official blog.

Python and C#/.NET Support: Full framework support for both Python and C#/.NET implementations with consistent APIs
- Python packages | .NET source
Multiple Agent Provider Support: Support for various LLM providers with more being added continuously
- Python examples | .NET examples
Middleware: Flexible middleware system for request/response processing, exception handling, and custom pipelines
- Python middleware | .NET middleware
Orchestration Patterns & Workflows: Build multi-agent systems with graph-based workflows supporting sequential, concurrent, handoff, and group collaboration patterns; includes checkpointing, streaming, human-in-the-loop, and time-travel
- Python workflows | .NET workflows
Foundry Hosted Agents (new): Deploy and host your agents to Foundry-hosted infrastructure with just 2 additional lines of code
- Python samples | .NET samples
Observability: Built-in OpenTelemetry integration for distributed tracing, monitoring, and debugging
- Python observability | .NET telemetry
Declarative Agents: Define agents using YAML for faster setup and versioning
- Declarative agent samples
Agent Skills: Build domain-specific knowledge bases from multiple sources—files, inline code, class libraries—for agents to discover and use
- Skills design
AF Labs: Experimental packages for cutting-edge features including benchmarking, reinforcement learning, and research initiatives
- Labs directory
DevUI: Interactive developer UI for agent development, testing, and debugging workflows
- See the DevUI in action

Getting Started
More Examples & Samples
Community & Feedback
Troubleshooting
Contributor Resources

Getting Started

Installation

Python

pip install agent-framework
# This will install all sub-packages, see `python/packages` for individual packages.
# It may take a minute on first install on Windows.

.NET

dotnet add package Microsoft.Agents.AI
# For Foundry integration (used in the .NET quickstart below):
dotnet add package Microsoft.Agents.AI.Foundry
dotnet add package Azure.AI.Projects
dotnet add package Azure.Identity

Learning Resources

Overview - High level overview of the framework
Quick Start - Get started with a simple agent
Tutorials - Step by step tutorials
User Guide - In-depth user guide for building agents and workflows
Migration from Semantic Kernel - Guide to migrate from Semantic Kernel
Migration from AutoGen - Guide to migrate from AutoGen

Quickstart

Basic Agent - Python

Create a simple Azure Responses Agent that writes a haiku about the Microsoft Agent Framework

# pip install agent-framework
# Use `az login` to authenticate with Azure CLI
import os
import asyncio
from agent_framework import Agent
from agent_framework.foundry import FoundryChatClient
from azure.identity import AzureCliCredential


async def main():
    # Initialize a chat agent with Microsoft Foundry
    # the endpoint, deployment name, and api version can be set via environment variables
    # or they can be passed in directly to the FoundryChatClient constructor
    agent = Agent(
      client=FoundryChatClient(
          credential=AzureCliCredential(),
          # project_endpoint=os.environ["FOUNDRY_PROJECT_ENDPOINT"],
          # model=os.environ["FOUNDRY_MODEL_DEPLOYMENT_NAME"],
      ),
      name="HaikuAgent",
      instructions="You are an upbeat assistant that writes beautifully.",
    )

    print(await agent.run("Write a haiku about Microsoft Agent Framework."))

if __name__ == "__main__":
    asyncio.run(main())

Basic Agent - .NET

Create a simple Agent, using Microsoft Foundry that writes a haiku about the Microsoft Agent Framework

// This sample shows how to create and run a basic agent with AIProjectClient.AsAIAgent(...).

using Azure.AI.Projects;
using Azure.Identity;
using Microsoft.Agents.AI;

string endpoint = Environment.GetEnvironmentVariable("AZURE_AI_PROJECT_ENDPOINT") ?? throw new InvalidOperationException("AZURE_AI_PROJECT_ENDPOINT is not set.");
string deploymentName = Environment.GetEnvironmentVariable("AZURE_AI_MODEL_DEPLOYMENT_NAME") ?? "gpt-5.4-mini";

AIAgent agent =
    new AIProjectClient(new Uri(endpoint), new DefaultAzureCredential())
    .AsAIAgent(model: deploymentName, instructions: "You are an upbeat assistant that writes beautifully.", name: "HaikuAgent");

// Once you have the agent, you can invoke it like any other AIAgent.
Console.WriteLine(await agent.RunAsync("Write a haiku about Microsoft Agent Framework."));

More Examples & Samples

Python

Getting Started: progressive tutorial from hello-world to hosting
Agent Concepts: deep-dive samples by topic (tools, middleware, providers, etc.)
Workflows: workflow creation and integration with agents
Hosting: A2A, Azure Functions, Durable Task hosting
End-to-End: full applications, evaluation, and demos

.NET

Getting Started: progressive tutorial from hello agent to hosting
Agent Concepts: basic agent creation and tool usage
Agent Providers: samples showing different agent providers
Workflows: advanced multi-agent patterns and workflow orchestration
Hosting: A2A, Durable Agents, Durable Workflows
End-to-End: full applications and demos

Community & Feedback

Found a bug? File a GitHub issue to help us improve.
Enjoying MAF? to show your support and help others discover the project.
Have questions? Join our Discord or visit weekly office hours.

Troubleshooting

Authentication

Problem	Cause	Fix
Authentication errors when using Azure credentials	Not signed in to Azure CLI	Run `az login` before starting your app
API key errors	Wrong or missing API key	Verify the key and ensure it's for the correct resource/provider

Tip: DefaultAzureCredential is convenient for development but in production, consider using a specific credential (e.g., ManagedIdentityCredential) to avoid latency issues, unintended credential probing, and potential security risks from fallback mechanisms.

Environment Variables

For environment variable configuration specific to each sample, refer to the README in the sample directory (Python samples | .NET samples).

Contributor Resources

Important Notes

Important

If you use Microsoft Agent Framework to build applications that operate with any third-party servers, agents, code, or non-Azure Direct models (“Third-Party Systems”), you do so at your own risk. Third-Party Systems are Non-Microsoft Products under the Microsoft Product Terms and are governed by their own third-party license terms. You are responsible for any usage and associated costs.

We recommend reviewing all data being shared with and received from Third-Party Systems and being cognizant of third-party practices for handling, sharing, retention and location of data. It is your responsibility to manage whether your data will flow outside of your organization’s Azure compliance and geographic boundaries and any related implications, and that appropriate permissions, boundaries and approvals are provisioned.

You are responsible for carefully reviewing and testing applications you build using Microsoft Agent Framework in the context of your specific use cases, and making all appropriate decisions and customizations. This includes implementing your own responsible AI mitigations such as metaprompt, content filters, or other safety systems, and ensuring your applications meet appropriate quality, reliability, security, and trustworthiness standards. See also: Transparency FAQ

Languages

Python 50.9%

C# 45.8%

TypeScript 2.7%

HTML 0.2%

PowerShell 0.1%

Other 0.1%

README.md Unescape Escape

Welcome to Microsoft Agent Framework!

Is this the right framework for you?

Key Features

Table of Contents

Getting Started

Installation

Learning Resources

Quickstart

Basic Agent - Python

Basic Agent - .NET

More Examples & Samples

Python

.NET

Community & Feedback

Troubleshooting

Authentication

Environment Variables

Contributor Resources

Important Notes

README.md