mirror of https://github.com/microsoft/agent-framework.git synced 2026-06-16 21:04:09 +08:00


Ben Thomas e0d0ad16a0 Python: feat(evals): Foundry Adaptive Evals integration (rubric-generation) (#6101)

* Python: feat(evals): RubricScore type + EvalScoreResult.dimensions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: feat(foundry-evals): RubricDimension + GeneratedEvaluatorRef + accept in evaluators=

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: feat(evals): parse rubric_scores from output items + assertion helpers

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: feat(evals): BaseAgent.as_eval_source / Workflow.as_eval_source

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: feat(foundry-evals): EvalGenerationSource + generate_rubric helper

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: feat(foundry-evals): YAML config loader + sample

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: fix(evals): address PR review feedback

Addresses 4 Copilot review comments on PR #6101:

1. assert_dimension_score_at_least: drop the (not evaluator or found_any) guard so require_applicable=True correctly raises when the named evaluator produces no entries for the dimension. Adds TestRubricAssertions covering the regression.

2. GeneratedEvaluatorRef docstring: reword to describe actual behaviour (pinning recommended, not required) so it matches the dataclass default and FoundryEvals warning path.

3. _poll_generation_job: switch from asyncio.get_event_loop() to get_running_loop() and bound the per-iteration sleep by remaining time, matching _poll_eval_run.

4. generate_rubric: type category as Literal['quality','safety'] and validate at the entry point with a ValueError; drop the silent 'invalid -> quality' rewrite in _generation_job_to_ref. Adds a regression test.
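Item 3 describes a deadline-bounded polling loop. The following is a minimal sketch of that pattern, not the actual `_poll_generation_job` code; `poll_until` and its `check` / `timeout` / `interval` parameters are hypothetical names chosen for illustration.

```python
from __future__ import annotations

import asyncio
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


async def poll_until(
    check: Callable[[], Awaitable[T | None]],
    timeout: float,
    interval: float = 2.0,
) -> T:
    """Await `check` until it returns a non-None value or `timeout` seconds pass.

    Hypothetical helper illustrating the pattern; not the library's code.
    """
    # get_running_loop() is the correct call inside a coroutine; it raises
    # instead of silently creating a new loop the way get_event_loop() could.
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        result = await check()
        if result is not None:
            return result
        remaining = deadline - loop.time()
        if remaining <= 0:
            raise TimeoutError(f"job did not finish within {timeout} seconds")
        # Bound each wait by the time left so the loop never oversleeps the deadline.
        await asyncio.sleep(min(interval, remaining))
```

Bounding the sleep by `remaining` means a long `interval` cannot push the timeout check far past the caller's deadline.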

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: feat(foundry-evals): hosted-agent-aware rubric generation

* Auto-detect hosted Foundry agents in agent_as_eval_source: when the
  agent's chat_client exposes a string agent_name (the convention used
  by RawFoundryAgentChatClient for PromptAgents/HostedAgents), emit a
  type='agent' EvalGenerationSource so the service fetches instructions
  and tools from the agent registry instead of relying on the local
  wrapper (which holds neither for hosted agents).
* Add hosted_agent_version kwarg and a new agent_version field on
  EvalGenerationSource so PromptAgent runs can pin to a specific hosted
  version for reproducible rubric generation.
* Add force_prompt_source escape hatch to bypass auto-detection and
  always emit a rendered prompt dossier, useful when the local wrapper
  carries overrides the service-side agent doesn't see.
* Fix _to_sdk_source for dataset sources: SDK ctor takes name=/version=,
  not dataset_name=/dataset_version=. The mismatch would raise TypeError
  against the real azure-ai-projects 2.3.0a* SDK; only unmocked
  integration paths were affected.

Tests cover: auto-detection happy path, versionless hosted agent,
explicit hosted_agent_version forwarding, force_prompt_source override,
non-string chat_client attrs (MagicMock test doubles) not mis-detected,
agent_version forwarded through _to_sdk_source, and the corrected
dataset SDK kwarg names.
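The detection rule in the first bullet, and the MagicMock case the tests guard against, can be sketched as below. `detect_hosted_agent_name` is hypothetical (the PR's own `_detect_hosted_foundry_agent` helper was removed in a later commit); only the `chat_client` and `agent_name` attribute names come from the text above.

```python
from __future__ import annotations

from typing import Any


def detect_hosted_agent_name(agent: Any) -> str | None:
    """Return the hosted agent name if the agent's chat client exposes one.

    Only a real, non-empty str counts. A MagicMock test double auto-creates
    any attribute that is read, so a hasattr() or truthiness check alone
    would mis-detect it as a hosted agent.
    """
    client = getattr(agent, "chat_client", None)
    name = getattr(client, "agent_name", None)
    return name if isinstance(name, str) and name else None
```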

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(foundry-evals): accept canonical dimension_scores key per docs

The published Foundry rubric-evaluator output (Microsoft Learn 'Rubric evaluators' reference) places per-dimension breakdowns under properties.dimension_scores, not properties.rubric_scores. The parser now tries dimension_scores first and falls back to rubric_scores for preview-build compatibility, and tolerates non-list payloads (e.g. MagicMock auto-attrs) by trying the next candidate when parsing yields zero entries.
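The key lookup order described above can be sketched over a plain `properties` dict. `extract_dimension_scores`, `parse_entries`, and the entry shape (dicts with `id` and `score` keys) are assumptions for illustration; the real `_extract_rubric_scores` / `_parse_dimension_entries` helpers and the service payload may differ.

```python
from __future__ import annotations

from typing import Any

# Canonical key first, preview-build key second.
CANDIDATE_KEYS = ("dimension_scores", "rubric_scores")


def parse_entries(payload: Any) -> list[dict[str, Any]]:
    """Keep only well-formed dict entries; any non-list payload yields []."""
    if not isinstance(payload, list):
        return []
    return [e for e in payload if isinstance(e, dict) and "id" in e and "score" in e]


def extract_dimension_scores(properties: dict[str, Any]) -> list[dict[str, Any]]:
    """Try each candidate key in order and return the first non-empty parse."""
    for key in CANDIDATE_KEYS:
        entries = parse_entries(properties.get(key))
        if entries:
            return entries
        # Zero entries (missing key, wrong type, junk items): try the next key.
    return []
```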

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat(foundry-evals): add manual create_rubric_evaluator

Adds FoundryEvals.create_rubric_evaluator as the agent-framework surface over project_client.beta.evaluators.create_version. This is the manual counterpart to generate_rubric: callers supply RubricDimension instances (authored locally, ported from another framework, or hand-tuned) and we POST a RubricBasedEvaluatorDefinition. The service auto-attaches the non-editable residual dimension (general_quality for quality, general_policy_compliance for safety).

Per the Microsoft Learn 'Rubric evaluators' reference, the auto-generation path (create_generation_job) is primarily a portal/UI feature; external SDK clients with rich local agent context are better served by manual create_version. This keeps generate_rubric for users who want to round-trip through a Foundry-registered agent.

Validation up front: weight must be in [1,10], ids unique, descriptions non-empty, pass_threshold in [0,1]. The returned GeneratedEvaluatorRef is identical in shape to one obtained from generate_rubric, so downstream evaluators= lists work unchanged.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* samples(foundry-evals): manual rubric sample + namespace re-exports

Adds evaluate_with_manual_rubric_sample.py demonstrating the end-to-end dev scenario for FoundryEvals.create_rubric_evaluator: hand-author a list of RubricDimension, register via create_rubric_evaluator, then use the pinned GeneratedEvaluatorRef alongside built-in evaluators in an agent regression run.

Also re-exports RubricDimension, GeneratedEvaluatorRef, build_sources, and load_evals_config from agent_framework.foundry (both the lazy runtime shim and the type stub) so the rubric samples can import everything from a single namespace; the auto-generate sample was previously broken because the shim was missing build_sources / load_evals_config.

Updates the foundry-evals README with a chooser entry for the two rubric paths.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat(foundry-evals): remove rubric creation flows; keep consumption only

Reframes agent-framework as a pure consumer of Foundry rubric evaluators: scoring against rubrics that already exist (authored in the Foundry portal or via the dedicated SDK / REST surface) instead of creating them from the SDK.

Removed creation surface area:

- FoundryEvals.generate_rubric (auto-generate path) and create_rubric_evaluator (manual path), plus all _GenerationSdkTypes / _ManualRubricSdkTypes / _to_sdk_dimensions / _coalesce_generation_sources / _to_sdk_source / _poll_generation_job / _generation_job_to_ref / _evaluator_version_to_ref / _get_beta_evaluators / _import_*_sdk_types helpers.

- EvalGenerationSource (the input source discriminator), RubricDimension (the input dimension type), agent_as_eval_source / workflow_as_eval_source / _detect_hosted_foundry_agent helpers, and the YAML-config loader (_evals_config.py with RubricGenerationSpec / RubricSourceSpec / parse_evals_config / load_evals_config / build_sources).

- BaseAgent.as_eval_source / Workflow.as_eval_source plus the _render_agent_dossier / _render_workflow_dossier helpers in core. These existed only to feed the now-removed generation pipeline.

- Samples evaluate_with_generated_rubric_sample.py, evaluate_with_manual_rubric_sample.py, and evaluators.yaml. Replaced with a short README section showing how to reference an existing rubric evaluator via GeneratedEvaluatorRef.

Kept (consumption surface):

- GeneratedEvaluatorRef, slimmed to (name, version, display_name). Still accepted alongside built-in evaluator strings in FoundryEvals(evaluators=[...]). Versionless refs still warn.

- RubricScore on EvalScoreResult.dimensions plus EvalResults.assert_dimension_score_at_least for per-dimension CI gates.

- _parse_dimension_entries / _extract_rubric_scores output parsing (both canonical dimension_scores and the legacy rubric_scores key).
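The per-dimension gate, including the require_applicable behaviour fixed in the earlier review round, can be sketched over plain data. `DimensionEntry` and `check_dimension_at_least` are hypothetical stand-ins, not the `RubricScore` / `EvalResults.assert_dimension_score_at_least` API, and the parameter defaults here are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DimensionEntry:
    evaluator: str  # which rubric evaluator produced the score
    dimension: str  # rubric dimension id
    score: float


def check_dimension_at_least(
    entries: list[DimensionEntry],
    dimension: str,
    threshold: float,
    evaluator: str | None = None,
    require_applicable: bool = True,
) -> None:
    """Raise AssertionError if any matching score falls below `threshold`.

    With require_applicable=True, finding no matching entry is itself a
    failure, including when a specific evaluator name was given.
    """
    matches = [
        e for e in entries
        if e.dimension == dimension and (evaluator is None or e.evaluator == evaluator)
    ]
    if not matches:
        if require_applicable:
            raise AssertionError(f"no scores found for dimension {dimension!r}")
        return
    failing = [e for e in matches if e.score < threshold]
    if failing:
        worst = min(e.score for e in failing)
        raise AssertionError(f"dimension {dimension!r} scored {worst} < {threshold}")
```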

Tests: 160/160 foundry unit tests and 71/71 core local-eval tests pass; pyright is clean across changed files. The pre-existing tests/core/test_telemetry.py::test_detect_hosted_fallback_import_error failure is unrelated and reproduces on the prior commit.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* samples(foundry-evals): add evaluate_with_rubric_sample

Adds a runnable end-to-end sample showing how to consume a pre-existing rubric evaluator created in Foundry: reference it with GeneratedEvaluatorRef(name, version), mix it with built-in evaluators in FoundryEvals, and gate CI with assert_dimension_score_at_least on a specific dimension.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(foundry-evals): satisfy mypy on _fetch_output_items

mypy infers OutputItemListResponse.sample as dict[str, object] | None while pyright correctly infers the typed Sample model. Cast to Any so both type checkers accept the attribute access pattern, rename the local to avoid shadowing the inner-loop sample binding, and drop the now-stale pyright suppressions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs(foundry-evals): drop unpublished rubric-evaluators learn.microsoft.com link

The Adaptive Evals authoring docs are not yet published on Microsoft Learn, so the link 404s. Keep the descriptive text without the broken hyperlink; we can re-add it once the docs ship.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* test(foundry-evals): hoist repeated local imports to module top

Per code review feedback (eavanvalkenburg): the test file repeated 'from agent_framework_foundry._foundry_evals import ...' inside 22 test bodies and 'from agent_framework_foundry import GeneratedEvaluatorRef' inside 8 more. Move all of them to the existing top-level imports; the symbols are the same across tests and the local imports were redundant.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Ben Thomas <25218250+alliscode@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

e0d0ad16a0 · 2026-06-01 23:01:56 +00:00

.github · [BREAKING] Python: Enable instrumentation by default (#5865) · 2026-05-20 11:52:08 +00:00

.vscode · Python: Simplify Python Poe tasks and unify package selectors (#4722) · 2026-03-18 18:39:11 +00:00

agent_framework_meta · Add metapackage metadata stub to restore flit builds (#1043) · 2025-10-01 07:20:33 +00:00

packages · Python: feat(evals): Foundry Adaptive Evals integration (rubric-generation) (#6101) · 2026-06-01 23:01:56 +00:00

samples · Python: feat(evals): Foundry Adaptive Evals integration (rubric-generation) (#6101) · 2026-06-01 23:01:56 +00:00

scripts · Python: Adding AgentFileStore and FileAccessProvider to support file access operations. (#6099) · 2026-05-28 20:09:50 +00:00

tests/samples · Python: refresh dev dependencies and validate runtime bounds (#6238) · 2026-06-01 17:53:56 +00:00

.cspell.json · Python: Foundry hosted agent V2 (#5379) · 2026-04-21 05:21:27 +00:00

.env.example · Python: Add Mistral AI embedding client package (#5480) · 2026-05-29 07:20:56 +00:00

.pre-commit-config.yaml · Python: Bump uv from 0.11.3 to 0.11.6 in /python/packages/lab (#5469) · 2026-04-28 07:24:43 +00:00

AGENTS.md · Python: feat(foundry): add to_prompt_agent / deploy_as_prompt_agent (experimental) (#5959) · 2026-05-27 13:31:21 +00:00

CHANGELOG.md · Bump Python package versions for 1.7.0 release (#6142) · 2026-05-28 19:45:31 +09:00

CODING_STANDARD.md · [BREAKING] Python: Enable instrumentation by default (#5865) · 2026-05-20 11:52:08 +00:00

DEV_SETUP.md · Fix environment variable set statement in py DEV_SETUP (#5006) · 2026-03-31 18:34:04 +00:00

devsetup.sh · Python: replace pre-commit with prek, add PEP 723 script deps, clean up dev dependencies (#3748) · 2026-02-09 17:51:01 +00:00

LICENSE · Python: Packaging fixes (#1056) · 2025-10-01 11:54:26 +00:00

PACKAGE_STATUS.md · Python: Add Mistral AI embedding client package (#5480) · 2026-05-29 07:20:56 +00:00

pyproject.toml · Python: refresh dev dependencies and validate runtime bounds (#6238) · 2026-06-01 17:53:56 +00:00

pyrightconfig.samples.json · Python: restructure: Python samples into progressive 01-05 layout (#3862) · 2026-02-12 17:36:36 +00:00

pyrightconfig.samples.py310.json · Python: chore(python): improve dependency range automation (#4343) · 2026-03-13 12:32:37 +00:00

README.md · Python: [BREAKING] update to v1.0.0 (#5062) · 2026-04-02 15:26:30 +00:00

shared_tasks.toml · Python: Simplify Python Poe tasks and unify package selectors (#4722) · 2026-03-18 18:39:11 +00:00

uv.lock · Python: refresh dev dependencies and validate runtime bounds (#6238) · 2026-06-01 17:53:56 +00:00

README.md

Get Started with Microsoft Agent Framework for Python Developers

Quick Install

We recommend two common installation paths depending on your use case.

1. Development mode

If you are exploring or developing locally, install the entire framework with all sub-packages:

pip install agent-framework

This installs the core and every integration package, making sure that all features are available without additional steps. This is the simplest way to get started.

2. Selective install

If you only need specific integrations, you can install at a more granular level. This keeps dependencies lighter and focuses on what you actually plan to use. Some examples:

# Core only
# includes Azure OpenAI and OpenAI support by default
# also includes workflows and orchestrations
pip install agent-framework-core

# Core + Azure AI Foundry integration
pip install agent-framework-foundry

# Core + Microsoft Copilot Studio integration (preview package)
pip install agent-framework-copilotstudio --pre

# Core + both Microsoft Copilot Studio and Azure AI Foundry integration
pip install --pre agent-framework-copilotstudio agent-framework-foundry

This selective approach is useful when you know which integrations you need, and it is the recommended way to set up lightweight environments. Released packages such as agent-framework, agent-framework-core, and agent-framework-foundry no longer require --pre, while preview connectors such as agent-framework-copilotstudio still do.

Supported Platforms:

Python: 3.10+
OS: Windows, macOS, Linux

1. Set Up API Keys

Set as environment variables, or create a .env file at your project root:

OPENAI_API_KEY=sk-...
OPENAI_MODEL=...
...
AZURE_OPENAI_API_KEY=...
AZURE_OPENAI_ENDPOINT=...
AZURE_OPENAI_MODEL=...
...
FOUNDRY_PROJECT_ENDPOINT=...
FOUNDRY_MODEL=...

For the generic OpenAI clients (OpenAIChatClient and OpenAIChatCompletionClient), configuration resolves in this order:

1. Explicit Azure inputs, such as credential or azure_endpoint
2. OPENAI_API_KEY, or an OpenAI API key passed explicitly as a parameter
3. Azure environment fallback, such as AZURE_OPENAI_ENDPOINT and AZURE_OPENAI_API_KEY

As a result, a shell that sets both OPENAI_API_KEY and the AZURE_OPENAI_* variables defaults to OpenAI. To force Azure routing, pass an explicit Azure input such as credential=AzureCliCredential().
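The precedence rules above can be modeled as a small pure function. `resolve_backend` is a hypothetical illustration of the documented order only; it is not the clients' actual resolution code and ignores every input not named in the list.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any


def resolve_backend(
    env: Mapping[str, str],
    *,
    credential: Any = None,
    azure_endpoint: str | None = None,
    api_key: str | None = None,
) -> str:
    """Return "azure" or "openai" following the documented precedence."""
    # 1. Explicit Azure inputs always win.
    if credential is not None or azure_endpoint:
        return "azure"
    # 2. An explicit OpenAI API key, or OPENAI_API_KEY in the environment.
    if api_key or env.get("OPENAI_API_KEY"):
        return "openai"
    # 3. Fall back to Azure environment settings.
    if env.get("AZURE_OPENAI_ENDPOINT") or env.get("AZURE_OPENAI_API_KEY"):
        return "azure"
    raise ValueError("no OpenAI or Azure OpenAI configuration found")
```

Calling `resolve_backend(os.environ)` in a shell with both sets of variables returns `"openai"`; adding `credential=...` flips it to `"azure"`, which is why passing an explicit Azure input is the documented way to force Azure routing.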

You can also override environment variables by explicitly passing configuration parameters to the chat client constructor:

from agent_framework.openai import OpenAIChatClient

client = OpenAIChatClient(
    api_key='',
    azure_endpoint='',
    model='',
    api_version='',
)

See the following setup guide for more information.

2. Create a Simple Agent

Create agents and invoke them directly:

import asyncio
from agent_framework import Agent
from agent_framework.openai import OpenAIChatClient

async def main():
    agent = Agent(
        client=OpenAIChatClient(),
        instructions="""
        1) A robot may not injure a human being...
        2) A robot must obey orders given it by human beings...
        3) A robot must protect its own existence...

        Give me the TLDR in exactly 5 words.
        """
    )

    result = await agent.run("Summarize the Three Laws of Robotics")
    print(result)

asyncio.run(main())
# Example output: Protect humans, obey, self-preserve, prioritized.

3. Directly Use Chat Clients (No Agent Required)

You can use the chat client classes directly for advanced workflows:

import asyncio
from agent_framework import Message
from agent_framework.openai import OpenAIChatClient

async def main():
    client = OpenAIChatClient()

    messages = [
        Message("system", ["You are a helpful assistant."]),
        Message("user", ["Write a haiku about Agent Framework."])
    ]

    response = await client.get_response(messages)
    print(response.messages[0].text)

    # Example output (model responses vary):
    # Agents work in sync,
    # Framework threads through each task—
    # Code sparks collaboration.

asyncio.run(main())

4. Build an Agent with Tools and Functions

Enhance your agent with custom tools and function calling:

import asyncio
from typing import Annotated
from random import randint
from pydantic import Field
from agent_framework import Agent
from agent_framework.openai import OpenAIChatClient


def get_weather(
    location: Annotated[str, Field(description="The location to get the weather for.")],
) -> str:
    """Get the weather for a given location."""
    conditions = ["sunny", "cloudy", "rainy", "stormy"]
    return f"The weather in {location} is {conditions[randint(0, 3)]} with a high of {randint(10, 30)}°C."


def get_menu_specials() -> str:
    """Get today's menu specials."""
    return """
    Special Soup: Clam Chowder
    Special Salad: Cobb Salad
    Special Drink: Chai Tea
    """


async def main():
    agent = Agent(
        client=OpenAIChatClient(),
        instructions="You are a helpful assistant that can provide weather and restaurant information.",
        tools=[get_weather, get_menu_specials]
    )

    response = await agent.run("What's the weather in Amsterdam and what are today's specials?")
    print(response)

    # Example output (the weather tool returns random values):
    # The weather in Amsterdam is sunny with a high of 22°C. Today's specials include
    # Clam Chowder soup, Cobb Salad, and Chai Tea as the special drink.

if __name__ == "__main__":
    asyncio.run(main())
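Plain functions can serve as tools because a tool's name, docstring, and parameter annotations carry enough information to describe it to the model. The stdlib-only sketch below illustrates that idea; `describe_tool` is hypothetical and far simpler than the framework's own schema generation. It reads a `description` attribute from `Annotated` metadata, which is where `pydantic.Field(description=...)` stores it.

```python
from __future__ import annotations

import inspect
from typing import Annotated, Any, get_args, get_origin, get_type_hints

_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def describe_tool(func: Any) -> dict[str, Any]:
    """Build a simplified, JSON-schema-like description of a plain function."""
    hints = get_type_hints(func, include_extras=True)
    properties: dict[str, Any] = {}
    required: list[str] = []
    for name, param in inspect.signature(func).parameters.items():
        hint = hints.get(name, str)
        description = None
        if get_origin(hint) is Annotated:
            base, *metadata = get_args(hint)
            # pydantic.Field(...) returns an object with a `description` attribute.
            description = next(
                (m.description for m in metadata if getattr(m, "description", None)), None
            )
        else:
            base = hint
        prop: dict[str, Any] = {"type": _JSON_TYPES.get(base, "string")}
        if description:
            prop["description"] = description
        properties[name] = prop
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }
```

Applied to `get_weather` above, this yields one required string parameter, `location`, carrying the `Field` description; `get_menu_specials` yields an empty parameter list.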

You can explore additional agent samples here.

5. Multi-Agent Orchestration

Coordinate multiple agents to collaborate on complex tasks using orchestration patterns:

import asyncio
from agent_framework import Agent
from agent_framework.openai import OpenAIChatClient


async def main():
    # Create specialized agents
    writer = Agent(
        client=OpenAIChatClient(),
        name="Writer",
        instructions="You are a creative content writer. Generate and refine slogans based on feedback."
    )

    reviewer = Agent(
        client=OpenAIChatClient(),
        name="Reviewer",
        instructions="You are a critical reviewer. Provide detailed feedback on proposed slogans."
    )

    # Sequential workflow: Writer creates, Reviewer provides feedback
    task = "Create a slogan for a new electric SUV that is affordable and fun to drive."

    # Step 1: Writer creates initial slogan
    initial_result = await writer.run(task)
    print(f"Writer: {initial_result}")

    # Step 2: Reviewer provides feedback
    feedback_request = f"Please review this slogan: {initial_result}"
    feedback = await reviewer.run(feedback_request)
    print(f"Reviewer: {feedback}")

    # Step 3: Writer refines based on feedback
    refinement_request = f"Please refine this slogan based on the feedback: {initial_result}\nFeedback: {feedback}"
    final_result = await writer.run(refinement_request)
    print(f"Final Slogan: {final_result}")

    # Example Output:
    # Writer: "Charge Forward: Affordable Adventure Awaits!"
    # Reviewer: "Good energy, but 'Charge Forward' is overused in EV marketing..."
    # Final Slogan: "Power Up Your Adventure: Premium Feel, Smart Price!"

if __name__ == "__main__":
    asyncio.run(main())
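The three steps above generalize into a loop with a configurable number of review rounds. `refine` is a hypothetical helper that relies only on the async `run()` method and string conversion used in the sample; `EchoAgent` is an offline stand-in so the sketch runs without a model. For production pipelines, the framework's built-in orchestrations mentioned below are likely the better fit.

```python
from __future__ import annotations

import asyncio
from typing import Any, Protocol


class RunnableAgent(Protocol):
    async def run(self, prompt: str) -> Any: ...


async def refine(writer: RunnableAgent, reviewer: RunnableAgent, task: str, rounds: int = 2) -> str:
    """Draft once, then alternate reviewer feedback and writer revisions."""
    draft = str(await writer.run(task))
    for _ in range(rounds):
        feedback = str(await reviewer.run(f"Please review this slogan: {draft}"))
        draft = str(
            await writer.run(
                f"Please refine this slogan based on the feedback: {draft}\nFeedback: {feedback}"
            )
        )
    return draft


class EchoAgent:
    """Offline stand-in: records prompts and returns a numbered canned reply."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.prompts: list[str] = []

    async def run(self, prompt: str) -> str:
        self.prompts.append(prompt)
        return f"{self.name} reply {len(self.prompts)}"


if __name__ == "__main__":
    writer, reviewer = EchoAgent("Writer"), EchoAgent("Reviewer")
    print(asyncio.run(refine(writer, reviewer, "Create a slogan for an electric SUV.")))
```

Swapping the stubs for the `Agent` instances from the sample keeps the same call pattern; each extra round costs one reviewer call and one writer call.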

For more advanced orchestration patterns including Sequential, Concurrent, Group Chat, Handoff, and Magentic orchestrations, see the orchestration samples.

More Examples & Samples

- Getting Started with Agents: Basic agent creation and tool usage
- Chat Client Examples: Direct chat client usage patterns
- Foundry Integration: Microsoft Foundry integration
- Workflow Samples: Advanced multi-agent patterns
